All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ilya Dryomov <idryomov@gmail.com>
To: Xiubo Li <xiubli@redhat.com>
Cc: ceph-devel@vger.kernel.org, jlayton@kernel.org,
	vshankar@redhat.com,  mchangir@redhat.com
Subject: Re: [PATCH v3 3/3] libceph: just wait for more data to be available on the socket
Date: Tue, 16 Jan 2024 11:00:05 +0100	[thread overview]
Message-ID: <CAOi1vP8xOFA4QgMwjGyzTJuAC6T6+XDypXW3Dhhin0RnUh-ZAQ@mail.gmail.com> (raw)
In-Reply-To: <a1d6e998-f496-4408-9d76-3671ee73e054@redhat.com>

On Tue, Jan 16, 2024 at 5:09 AM Xiubo Li <xiubli@redhat.com> wrote:
>
>
> On 1/16/24 06:38, Ilya Dryomov wrote:
>
> On Fri, Dec 15, 2023 at 1:22 AM <xiubli@redhat.com> wrote:
>
> From: Xiubo Li <xiubli@redhat.com>
>
> The messages from ceph maybe split into multiple socket packages
> and we just need to wait for all the data to be availiable on the
> sokcet.
>
> This will add 'sr_total_resid' to record the total length for all
> data items for sparse-read message and 'sr_resid_elen' to record
> the current extent total length.
>
> URL: https://tracker.ceph.com/issues/63586
> Signed-off-by: Xiubo Li <xiubli@redhat.com>
> ---
>  include/linux/ceph/messenger.h |  2 ++
>  net/ceph/messenger.c           |  1 +
>  net/ceph/messenger_v1.c        | 21 ++++++++++++++++-----
>  net/ceph/osd_client.c          |  1 +
>  4 files changed, 20 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/ceph/messenger.h b/include/linux/ceph/messenger.h
> index 2eaaabbe98cb..05e9b39a58f8 100644
> --- a/include/linux/ceph/messenger.h
> +++ b/include/linux/ceph/messenger.h
> @@ -231,10 +231,12 @@ struct ceph_msg_data {
>
>  struct ceph_msg_data_cursor {
>         size_t                  total_resid;    /* across all data items */
> +       size_t                  sr_total_resid; /* across all data items for sparse-read */
>
>         struct ceph_msg_data    *data;          /* current data item */
>         size_t                  resid;          /* bytes not yet consumed */
>         int                     sr_resid;       /* residual sparse_read len */
> +       int                     sr_resid_elen;  /* total sparse_read elen */
>
> Hi Xiubo,
>
> Is adding yet another sparse-read field to the cursor really needed?
> Would making read_partial_sparse_msg_extent() decrement sr_total_resid
> as it goes just like sr_resid is decremented work?
>
> This can be improved by removing it. Let me fix it.
>
>         bool                    need_crc;       /* crc update needed */
>         union {
>  #ifdef CONFIG_BLOCK
> diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c
> index 3c8b78d9c4d1..eafd592af382 100644
> --- a/net/ceph/messenger.c
> +++ b/net/ceph/messenger.c
> @@ -1073,6 +1073,7 @@ void ceph_msg_data_cursor_init(struct ceph_msg_data_cursor *cursor,
>         cursor->total_resid = length;
>         cursor->data = msg->data;
>         cursor->sr_resid = 0;
> +       cursor->sr_resid_elen = 0;
>
>         __ceph_msg_data_cursor_init(cursor);
>  }
> diff --git a/net/ceph/messenger_v1.c b/net/ceph/messenger_v1.c
> index 4cb60bacf5f5..7425fa26e4c3 100644
> --- a/net/ceph/messenger_v1.c
> +++ b/net/ceph/messenger_v1.c
> @@ -160,7 +160,9 @@ static size_t sizeof_footer(struct ceph_connection *con)
>  static void prepare_message_data(struct ceph_msg *msg, u32 data_len)
>  {
>         /* Initialize data cursor if it's not a sparse read */
> -       if (!msg->sparse_read)
> +       if (msg->sparse_read)
> +               msg->cursor.sr_total_resid = data_len;
> +       else
>                 ceph_msg_data_cursor_init(&msg->cursor, msg, data_len);
>  }
>
> @@ -1032,18 +1034,25 @@ static int read_partial_sparse_msg_data(struct ceph_connection *con)
>         bool do_datacrc = !ceph_test_opt(from_msgr(con->msgr), NOCRC);
>         u32 crc = 0;
>         int ret = 1;
> +       int len;
>
>         if (do_datacrc)
>                 crc = con->in_data_crc;
>
> -       do {
> -               if (con->v1.in_sr_kvec.iov_base)
> +       while (cursor->sr_total_resid && ret > 0) {
> +               len = 0;
> +               if (con->v1.in_sr_kvec.iov_base) {
>                         ret = read_partial_message_chunk(con,
>                                                          &con->v1.in_sr_kvec,
>                                                          con->v1.in_sr_len,
>                                                          &crc);
> -               else if (cursor->sr_resid > 0)
> +                       if (ret == 1)
> +                               len = con->v1.in_sr_len;
> +               } else if (cursor->sr_resid > 0) {
>                         ret = read_partial_sparse_msg_extent(con, &crc);
> +                       if (ret == 1)
> +                               len = cursor->sr_resid_elen;
> +               }
>
> I was waiting for Jeff's review since this is his code, but it's been
> a while so I'll just comment.
>
> To me, it's a sign of suboptimal structure that you needed to add new
> fields to the cursor just to be able tell whether this function is done
> with the message, because it's something that is already tracked but
> gets lost between the loops here.
>
> Currently this function is structured as:
>
>     do {
>         if (set up for kvec)
>            read as much as possible into kvec
>         else if (set up for pages)
>            read as much as possible into pages
>
>         if (short read)
>             bail, will be called again later
>
>          invoke con->ops->sparse_read() for processing the thing what
>          was read and setting up the next read OR setting up the initial
>          read
>     } until (con->ops->sparse_read() returns "done")
>
> If it was me, I would pursue refactoring this to:
>
>     do {
>         if (set up for kvec)
>            read as much as possible into kvec
>         else if (set up for pages)
>            read as much as possible into pages
>         else
>            bail
>
> Why bail here ? For the new sparse read both the 'kvec' and 'pages' shouldn't be set, so the following '->sparse_read()' will set up them and continue.
>
> Or you just want the 'read_partial_sparse_msg_data()' to read data but not the first time to trigger the '->sparse_read()' ? Before I tried a similar approach and it was not as optional as this one as I do.

Hi Xiubo,

I addressed that in the previous message:

    ... and invoke con->ops->sparse_read() for the first time elsewhere,
    likely in prepare_message_data().  The rationale is that the state
    machine inside con->ops->sparse_read() is conceptually a cursor and
    prepare_message_data() is where the cursor is initialized for normal
    reads.

The benefit is that no additional state would be needed.

> The 'cursor->sr_total_resid', which is similar with 'cursor->total_resid', will just record the total data length for current sparse-read request and will determine whether should we skip parsing the "(page) data" in 'read_partial_message()'.

I understand the intent behind cursor->sr_total_resid, but it would be
nice to do without it because of its inherent redundancy.

Did you try calling con->ops->sparse_read() for the first time exactly
where cursor->sr_total_resid is initialized in your patch?

Thanks,

                Ilya

  parent reply	other threads:[~2024-01-16 10:00 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-12-15  0:20 [PATCH v3 0/3] libceph: fix sparse-read failure bug xiubli
2023-12-15  0:20 ` [PATCH v3 1/3] libceph: fail the sparse-read if there still has data in socket xiubli
2023-12-15 17:06   ` Jeff Layton
2023-12-18  1:23     ` Xiubo Li
2023-12-15  0:20 ` [PATCH v3 2/3] libceph: rename read_sparse_msg_XX to read_partial_sparse_msg_XX xiubli
2023-12-15  0:20 ` [PATCH v3 3/3] libceph: just wait for more data to be available on the socket xiubli
2024-01-15 22:38   ` Ilya Dryomov
     [not found]     ` <a1d6e998-f496-4408-9d76-3671ee73e054@redhat.com>
2024-01-16 10:00       ` Ilya Dryomov [this message]
     [not found]         ` <35849fda-29b2-47ad-bf49-f2715efc7b8c@redhat.com>
2024-01-16 12:06           ` Ilya Dryomov
     [not found]         ` <68e4cf5a-f64f-4545-87b0-762ab920d9ba@redhat.com>
2024-01-22 19:13           ` Ilya Dryomov
2024-01-22 21:26             ` Ilya Dryomov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAOi1vP8xOFA4QgMwjGyzTJuAC6T6+XDypXW3Dhhin0RnUh-ZAQ@mail.gmail.com \
    --to=idryomov@gmail.com \
    --cc=ceph-devel@vger.kernel.org \
    --cc=jlayton@kernel.org \
    --cc=mchangir@redhat.com \
    --cc=vshankar@redhat.com \
    --cc=xiubli@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.