All of lore.kernel.org
 help / color / mirror / Atom feed
From: Xiubo Li <xiubli@redhat.com>
To: Luis Henriques <lhenriques@suse.de>
Cc: ceph-devel@vger.kernel.org, idryomov@gmail.com,
	jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com
Subject: Re: [PATCH v6 3/3] libceph: just wait for more data to be available on the socket
Date: Wed, 28 Feb 2024 08:45:41 +0800	[thread overview]
Message-ID: <831bfb4a-6213-4e32-8c68-252be354342e@redhat.com> (raw)
In-Reply-To: <871q8x4yac.fsf@suse.de>


On 2/28/24 01:22, Luis Henriques wrote:
> Hi Xiubo!
>
> xiubli@redhat.com writes:
>
>> From: Xiubo Li <xiubli@redhat.com>
>>
>> A short read may occur while reading the message footer from the
>> socket.  Later, when the socket is ready for another read, the
>> messenger shoudl invoke all read_partial* handlers, except the
>> read_partial_sparse_msg_data().  The contract between the messenger
>> and these handlers is that the handlers should bail if the area
>> of the message is responsible for is already processed.  So,
>> in this case, it's expected that read_sparse_msg_data() would bail,
>> allowing the messenger to invoke read_partial() for the footer and
>> pick up where it left off.
>>
>> However read_partial_sparse_msg_data() violates that contract and
>> ends up calling into the state machine in the OSD client. The
>> sparse-read state machine just assumes that it's a new op and
>> interprets some piece of the footer as the sparse-read extents/data
>> and then returns bogus extents/data length, etc.
>>
>> This will just reuse the 'total_resid' to determine whether should
>> the read_partial_sparse_msg_data() bail out or not. Because once
>> it reaches to zero that means all the extents and data have been
>> successfully received in last read, else it could break out when
>> partially reading any of the extents and data. And then the
>> osd_sparse_read() could continue where it left off.
> I'm seeing an issue with fstest generic/580, which seems to enter an
> infinite loop effectively rendering the testing VM unusable.  It's pretty
> easy to reproduce, just run the test ensuring to be using msgv2 (I'm
> mounting the filesystem with 'ms_mode=crc'), and you should see the
> following on the logs:
>
> [...]
>    libceph: prepare_sparse_read_cont: ret 0x1000 total_resid 0x0 resid 0x0
>    libceph: osd1 (2)192.168.155.1:6810 read processing error
>    libceph: mon0 (2)192.168.155.1:40608 session established
>    libceph: bad late_status 0x1
>    libceph: osd1 (2)192.168.155.1:6810 protocol error, bad epilogue
>    libceph: mon0 (2)192.168.155.1:40608 session established
>    libceph: prepare_sparse_read_cont: ret 0x1000 total_resid 0x0 resid 0x0
>    libceph: osd1 (2)192.168.155.1:6810 read processing error
>    libceph: mon0 (2)192.168.155.1:40608 session established
>    libceph: bad late_status 0x1
> [...]
>
> Reverting this patch (commit 8e46a2d068c9 ("libceph: just wait for more
> data to be available on the socket")) seems to fix.  I haven't
> investigated it further, but since it'll take me some time to refresh my
> memory, I thought I should report it immediately.  Maybe someone has any
> idea.

The following patch should fix it. I haven't test it yet. Will do it 
later today:


diff --git a/net/ceph/messenger_v2.c b/net/ceph/messenger_v2.c
index a0ca5414b333..2c32ad4d9774 100644
--- a/net/ceph/messenger_v2.c
+++ b/net/ceph/messenger_v2.c
@@ -1860,10 +1860,10 @@ static int prepare_read_control_remainder(struct 
ceph_connection *con)
  static int prepare_read_data(struct ceph_connection *con)
  {
         struct bio_vec bv;
+       u64 len = con->in_msg->sparse_read_total ? : data_len(con->in_msg);

         con->in_data_crc = -1;
-       ceph_msg_data_cursor_init(&con->v2.in_cursor, con->in_msg,
-                                 data_len(con->in_msg));
+       ceph_msg_data_cursor_init(&con->v2.in_cursor, con->in_msg, len);

         get_bvec_at(&con->v2.in_cursor, &bv);
         if (ceph_test_opt(from_msgr(con->msgr), RXBOUNCE)) {


> Cheers,


      parent reply	other threads:[~2024-02-28  0:45 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-25  2:39 [PATCH v6 0/3] libceph: fix sparse-read failure bug xiubli
2024-01-25  2:39 ` [PATCH v6 1/3] libceph: fail the sparse-read if the data length doesn't match xiubli
2024-01-25  2:39 ` [PATCH v6 2/3] libceph: rename read_sparse_msg_XX to read_partial_sparse_msg_XX xiubli
2024-01-25  2:39 ` [PATCH v6 3/3] libceph: just wait for more data to be available on the socket xiubli
2024-02-27 17:22   ` Luis Henriques
2024-02-28  0:22     ` Xiubo Li
2024-02-28  0:45     ` Xiubo Li [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=831bfb4a-6213-4e32-8c68-252be354342e@redhat.com \
    --to=xiubli@redhat.com \
    --cc=ceph-devel@vger.kernel.org \
    --cc=idryomov@gmail.com \
    --cc=jlayton@kernel.org \
    --cc=lhenriques@suse.de \
    --cc=mchangir@redhat.com \
    --cc=vshankar@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.