From: "Jim Schutt" <jaschut@sandia.gov>
To: Sage Weil <sage@newdream.net>
Cc: "ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>
Subject: Re: Client reconnect failing: reader gets bad tag
Date: Fri, 6 May 2011 15:56:39 -0600
Message-ID: <4DC46E97.6000901@sandia.gov>
In-Reply-To: <Pine.LNX.4.64.1105051215050.26811@cobra.newdream.net>

[-- Attachment #1: Type: text/plain, Size: 2176 bytes --]


Hi Sage,

Sage Weil wrote:
> On Wed, 4 May 2011, Jim Schutt wrote:
>> Hi,
>>
>> I'm seeing clients having trouble reconnecting after timed-out
>> requests.  When they get in this state, sometimes they manage
>> to reconnect after several attempts; sometimes they never seem
>> to be able to reconnect.
> 
> Hmm, the interesting line is
> 
>> 2011-05-04 16:00:59.710971 7f15d6948940 -- 172.17.40.30:6806/12583 >>
>> 172.17.40.49:0/302440129 pipe(0x213fa000 sd=91 pgs=430 cs=1 l=1).reader bad
>> tag 0
> 
> That _should_ mean the server side (osd) closes out the connection 
> immediately, which should generate a disconnect error on the client and an 
> immediate reconnect.  So it's strange that you're also seeing timeouts.
> 
> Of course, we shouldn't be getting bad tags anyway, so something else is 
> clearly wrong and may be contributing to both problems.  
> 
> How easy is this to reproduce?  It's right after a fresh connection, so 
> the number of possibly offending code paths is pretty small, at least!
> 
> There is client side debugging to turn on, but it's very chatty.  Maybe 
> you can just enable a few key lines, like the connect handshake ones, and 
> any point where we queue/send a tag.  It's a bit tedious to enable 
> the individual dout lines in messenger.c, sadly, but unless you have a 
> very fast netconsole or something that's probably the only way to go...

Here are some logs of a client-server interaction that hangs.

My dd started on the client at 14:38:22.

The first bad tag can be seen in the osd6 log at 14:39:40.655544.

AFAICS, the client had written a stripe into its socket,
and the OSD got as far as reading the msg tag and header
when the client gave up on the message, closed the socket,
and reconnected.  The OSD got a bad tag on the new pipe.

After that, the client continued to retry the send, but
for many retries it always sent a bad tag, and it seems
to have done this without closing/opening the socket.

Then, the client does close/open the socket, and a valid
msg tag is sent, and things work fine.
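
To make the sequence above concrete, here is a minimal sketch in C
of one hypothetical way a reader could see "bad tag 0" right after
a reconnect: the sender keeps a stale write cursor, so the first
byte on the new connection comes from the middle of the old message
instead of its tag.  This is an illustration only, not a diagnosis
and not libceph code.  The struct, the function names, the buffer
size, and the tag value 7 are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative tag value; an assumption, not taken from the logs. */
enum { TAG_MSG = 7 };

#define WIRE_LEN 16

/* Hypothetical sender state: one framed message plus a write cursor. */
struct sender {
    unsigned char wire[WIRE_LEN]; /* tag byte, then zero-filled payload */
    size_t        sent;           /* bytes already written to the socket */
};

static void sender_init(struct sender *s)
{
    memset(s->wire, 0, sizeof(s->wire));
    s->wire[0] = TAG_MSG;
    s->sent = 0;
}

/* Simulate a partial write of n bytes on the current socket. */
static void sender_write(struct sender *s, size_t n)
{
    assert(s->sent + n <= WIRE_LEN);
    s->sent += n;
}

/* Reconnect; if reset_cursor is 0, the stale cursor survives. */
static void sender_reconnect(struct sender *s, int reset_cursor)
{
    if (reset_cursor)
        s->sent = 0;
}

/* First byte a reader on the fresh connection would receive. */
static unsigned char first_byte_on_new_conn(const struct sender *s)
{
    assert(s->sent < WIRE_LEN);
    return s->wire[s->sent];
}

/* Reader-side check: returns 1 if the byte is an acceptable tag. */
static int reader_tag_ok(unsigned char tag)
{
    return tag == TAG_MSG;
}
```

In this sketch, calling sender_write() for a few bytes and then
sender_reconnect() with reset_cursor == 0 makes
first_byte_on_new_conn() return a payload byte (0 here), which
reader_tag_ok() rejects.  Reconnecting with reset_cursor == 1
restarts the message at its tag byte, which the reader accepts.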

FWIW, I think the client-side messenger isn't doing a
good job distinguishing a busy OSD from a dead OSD.

-- Jim

> 
> sage
> 
> 


[-- Attachment #2: client.full.log.bz2 --]
[-- Type: application/x-bzip, Size: 15714 bytes --]

[-- Attachment #3: client.log.bz2 --]
[-- Type: application/x-bzip, Size: 1844 bytes --]

[-- Attachment #4: server.log.bz2 --]
[-- Type: application/x-bzip, Size: 20897 bytes --]
