From: Jeff Layton <jlayton@kernel.org>
To: Xiubo Li <xiubli@redhat.com>, ceph-devel@vger.kernel.org
Cc: idryomov@gmail.com, vshankar@redhat.com, mchangir@redhat.com
Subject: Re: [PATCH v4 3/3] libceph: just wait for more data to be available on the socket
Date: Fri, 19 Jan 2024 06:09:56 -0500	[thread overview]
Message-ID: <f0c7ec2741851ff71e77f2e7598c0de665cce4ac.camel@kernel.org> (raw)
In-Reply-To: <ede93dec-3faf-48d1-859e-5edf4323fd15@redhat.com>

On Fri, 2024-01-19 at 12:35 +0800, Xiubo Li wrote:
> On 1/19/24 02:24, Jeff Layton wrote:
> > On Thu, 2024-01-18 at 18:50 +0800, xiubli@redhat.com wrote:
> > > From: Xiubo Li <xiubli@redhat.com>
> > > 
> > > The messages from ceph may be split across multiple socket
> > > packets, and we just need to wait for all the data to be
> > > available on the socket.
> > > 
> > > This adds 'sr_total_resid' to record the total length of all
> > > data items for a sparse-read message, and 'sr_resid_elen' to
> > > record the current extent's total length.
> > > 
> > > URL: https://tracker.ceph.com/issues/63586
> > > Signed-off-by: Xiubo Li <xiubli@redhat.com>
> > > ---
> > >   include/linux/ceph/messenger.h |  1 +
> > >   net/ceph/messenger_v1.c        | 32 +++++++++++++++++++++-----------
> > >   2 files changed, 22 insertions(+), 11 deletions(-)
> > > 
> > > diff --git a/include/linux/ceph/messenger.h b/include/linux/ceph/messenger.h
> > > index 2eaaabbe98cb..ca6f82abed62 100644
> > > --- a/include/linux/ceph/messenger.h
> > > +++ b/include/linux/ceph/messenger.h
> > > @@ -231,6 +231,7 @@ struct ceph_msg_data {
> > >   
> > >   struct ceph_msg_data_cursor {
> > >   	size_t			total_resid;	/* across all data items */
> > > +	size_t			sr_total_resid;	/* across all data items for sparse-read */
> > >   
> > >   	struct ceph_msg_data	*data;		/* current data item */
> > >   	size_t			resid;		/* bytes not yet consumed */
> > > diff --git a/net/ceph/messenger_v1.c b/net/ceph/messenger_v1.c
> > > index 4cb60bacf5f5..2733da891688 100644
> > > --- a/net/ceph/messenger_v1.c
> > > +++ b/net/ceph/messenger_v1.c
> > > @@ -160,7 +160,9 @@ static size_t sizeof_footer(struct ceph_connection *con)
> > >   static void prepare_message_data(struct ceph_msg *msg, u32 data_len)
> > >   {
> > >   	/* Initialize data cursor if it's not a sparse read */
> > > -	if (!msg->sparse_read)
> > > +	if (msg->sparse_read)
> > > +		msg->cursor.sr_total_resid = data_len;
> > > +	else
> > >   		ceph_msg_data_cursor_init(&msg->cursor, msg, data_len);
> > >   }
> > >   
> > > @@ -1032,35 +1034,43 @@ static int read_partial_sparse_msg_data(struct ceph_connection *con)
> > >   	bool do_datacrc = !ceph_test_opt(from_msgr(con->msgr), NOCRC);
> > >   	u32 crc = 0;
> > >   	int ret = 1;
> > > +	int len;
> > >   
> > >   	if (do_datacrc)
> > >   		crc = con->in_data_crc;
> > >   
> > > -	do {
> > > -		if (con->v1.in_sr_kvec.iov_base)
> > > +	while (cursor->sr_total_resid) {
> > > +		len = 0;
> > > +		if (con->v1.in_sr_kvec.iov_base) {
> > > +			len = con->v1.in_sr_kvec.iov_len;
> > >   			ret = read_partial_message_chunk(con,
> > >   							 &con->v1.in_sr_kvec,
> > >   							 con->v1.in_sr_len,
> > >   							 &crc);
> > > -		else if (cursor->sr_resid > 0)
> > > +			len = con->v1.in_sr_kvec.iov_len - len;
> > > +		} else if (cursor->sr_resid > 0) {
> > > +			len = cursor->sr_resid;
> > >   			ret = read_partial_sparse_msg_extent(con, &crc);
> > > -
> > > -		if (ret <= 0) {
> > > -			if (do_datacrc)
> > > -				con->in_data_crc = crc;
> > > -			return ret;
> > > +			len -= cursor->sr_resid;
> > >   		}
> > > +		cursor->sr_total_resid -= len;
> > > +		if (ret <= 0)
> > > +			break;
> > >   
> > >   		memset(&con->v1.in_sr_kvec, 0, sizeof(con->v1.in_sr_kvec));
> > >   		ret = con->ops->sparse_read(con, cursor,
> > >   				(char **)&con->v1.in_sr_kvec.iov_base);
> > > +		if (ret <= 0) {
> > > +			ret = ret ? : 1; /* must return > 0 to indicate success */
> > > +			break;
> > > +		}
> > >   		con->v1.in_sr_len = ret;
> > > -	} while (ret > 0);
> > > +	}
> > >   
> > >   	if (do_datacrc)
> > >   		con->in_data_crc = crc;
> > >   
> > > -	return ret < 0 ? ret : 1;  /* must return > 0 to indicate success */
> > > +	return ret;
> > >   }
> > >   
> > >   static int read_partial_msg_data(struct ceph_connection *con)
> > Looking back over this code...
> > 
> > The way it works today, once we determine it's a sparse read, we call
> > read_sparse_msg_data. At that point we call either
> > read_partial_message_chunk (to read into the kvec) or
> > read_sparse_msg_extent if sr_resid is already set (indicating that we're
> > receiving an extent).
> > 
> > read_sparse_msg_extent calls ceph_tcp_recvpage in a loop until
> > cursor->sr_resid bytes have been received. The exception is when
> > ceph_tcp_recvpage returns <= 0.
> > 
> > ceph_tcp_recvpage returns 0 if sock_recvmsg returns -EAGAIN (maybe also
> > in other cases). So it sounds like the client just timed out on a read
> > from the socket or caught a signal or something?
> > 
> > If that's correct, then do we know what ceph_tcp_recvpage returned when
> > the problem happened?
> 
> It should just return partial data, and we should continue from here in 
> the next loop when the rest of the data arrives.
> 

Tracking this extra length seems like the wrong fix. We're already
looping in read_sparse_msg_extent until the sr_resid goes to 0. ISTM
that it's just that read_sparse_msg_extent is returning inappropriately
in the face of timeouts.

IOW, it does this:

                ret = ceph_tcp_recvpage(con->sock, rpage, (int)off, len);
                if (ret <= 0)
                        return ret;

...should it just not be returning there when ret == 0? Maybe it should
be retrying the recvpage instead?
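
To make the two behaviors concrete, here is a small userspace model
(all names are hypothetical stand-ins for ceph_tcp_recvpage() and
read_sparse_msg_extent(); this is a sketch of the idea, not kernel
code). fake_recvpage() maps an empty socket to a 0 return the way
ceph_tcp_recvpage() maps -EAGAIN, and the two loops show bailing out
versus retrying:

```c
/* Stand-in for ceph_tcp_recvpage(): returns bytes read, 0 when the
 * socket is empty (the -EAGAIN case), negative on a real error. */
static int fake_recvpage(int *on_socket, int want)
{
	if (*on_socket <= 0)
		return 0;			/* would be -EAGAIN */
	int got = want < *on_socket ? want : *on_socket;
	*on_socket -= got;
	return got;
}

/* Today's shape: loop until the extent (resid bytes) is consumed,
 * but return as soon as recvpage gives us <= 0, even mid-extent. */
static int extent_bail(int *on_socket, int *resid)
{
	while (*resid > 0) {
		int ret = fake_recvpage(on_socket, *resid);
		if (ret <= 0)
			return ret;
		*resid -= ret;
	}
	return 1;
}

/* The alternative floated above: treat 0 as "wait for more data and
 * retry" instead of an exit condition; only a real error aborts.
 * Here "waiting" is modeled by refilling the socket. */
static int extent_retry(int *on_socket, int *resid, int refill)
{
	while (*resid > 0) {
		int ret = fake_recvpage(on_socket, *resid);
		if (ret < 0)
			return ret;
		if (ret == 0) {		/* -EAGAIN: wait, then retry */
			*on_socket += refill;
			continue;
		}
		*resid -= ret;
	}
	return 1;
}
```

With a 300-byte extent and only 100 bytes on the socket, extent_bail()
returns 0 with 200 bytes of the extent still owed, while extent_retry()
rides out the empty socket and finishes the extent.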
-- 
Jeff Layton <jlayton@kernel.org>


Thread overview: 20+ messages
2024-01-18 10:50 [PATCH v4 0/3] libceph: fix sparse-read failure bug xiubli
2024-01-18 10:50 ` [PATCH v4 1/3] libceph: fail the sparse-read if there still has data in socket xiubli
2024-01-18 14:03   ` Jeff Layton
2024-01-19  4:07     ` Xiubo Li
2024-01-19 11:03       ` Jeff Layton
2024-01-22  3:17         ` Xiubo Li
2024-01-18 10:50 ` [PATCH v4 2/3] libceph: rename read_sparse_msg_XX to read_partial_sparse_msg_XX xiubli
2024-01-18 14:04   ` Jeff Layton
2024-01-18 10:50 ` [PATCH v4 3/3] libceph: just wait for more data to be available on the socket xiubli
2024-01-18 14:36   ` Jeff Layton
2024-01-18 18:24   ` Jeff Layton
2024-01-19  4:35     ` Xiubo Li
2024-01-19 11:09       ` Jeff Layton [this message]
2024-01-22  2:52         ` Xiubo Li
2024-01-22 11:44           ` Jeff Layton
2024-01-22 15:02   ` Jeff Layton
2024-01-22 16:55     ` Ilya Dryomov
2024-01-22 17:14       ` Jeff Layton
2024-01-22 19:41         ` Ilya Dryomov
2024-01-23  0:53           ` Xiubo Li
