From: Gregory Farnum <greg@inktank.com>
To: Yann Dupont <Yann.Dupont@univ-nantes.fr>
Cc: Sam Just <sam.just@inktank.com>, ceph-devel <ceph-devel@vger.kernel.org>
Subject: Re: domino-style OSD crash
Date: Fri, 6 Jul 2012 10:01:28 -0700
Message-ID: <CAPYLRzgb7KU5jBjqWS7GiYc2KqNUXXjcOv=kRoD4cEavotaX0Q@mail.gmail.com>
In-Reply-To: <4FF6919C.8080201@univ-nantes.fr>

On Fri, Jul 6, 2012 at 12:19 AM, Yann Dupont <Yann.Dupont@univ-nantes.fr> wrote:
> On 05/07/2012 23:32, Gregory Farnum wrote:
>
> [...]
>
>>> OK, so as all nodes were identical, I probably hit a btrfs bug (like
>>> an erroneous out-of-space condition) at more or less the same time.
>>> And when 1 OSD was out,
>
>
> OH, I didn't finish the sentence... When 1 OSD was out, the missing data
> was copied to other nodes, probably accelerating the btrfs problem on those
> nodes (I suspect erroneous out-of-space conditions).

Ah. How full are/were the disks?
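
For context on the "erroneous out of space" suspicion: btrfs can return
ENOSPC when its metadata chunks are exhausted even though plain df still
shows free space, so the overall usage number alone doesn't tell the whole
story. A minimal sketch of checking both views (Python; assumes the
btrfs-progs userspace tools are installed, and the mount point below is a
hypothetical placeholder for the real OSD data directory):

  #!/usr/bin/env python
  # Compare overall filesystem usage with btrfs's own chunk accounting.
  # btrfs can hit ENOSPC on metadata even when df reports free space.
  import os
  import subprocess

  MOUNT_POINT = "/data/osd.0"  # hypothetical OSD mount point, adjust as needed

  # Overall usage, as df would report it.
  st = os.statvfs(MOUNT_POINT)
  total = st.f_blocks * st.f_frsize
  avail = st.f_bavail * st.f_frsize
  print("statvfs: %.1f%% used" % (100.0 * (total - avail) / total))

  # btrfs-specific breakdown (data / metadata / system chunks).
  out = subprocess.check_output(["btrfs", "filesystem", "df", MOUNT_POINT])
  print(out.decode())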

>
> I've reformatted the OSDs with xfs. Performance is slightly worse for the
> moment (well, it depends on the workload, and maybe the lack of syncfs is
> to blame), but at least I hope to have the storage layer rock-solid. BTW,
> I've managed to keep the faulty btrfs volumes.
>
> [...]
>
>
>>>> I wonder if maybe there's a confounding factor here — are all your nodes
>>>> similar to each other,
>>>
>>> Yes. I designed the cluster that way. All nodes are identical hardware
>>> (PowerEdge M610, 10G Intel Ethernet + Emulex Fibre Channel attached to
>>> storage; 1 array for 2 OSD nodes, 1 controller dedicated to each OSD).
>>
>> Oh, interesting. Are the broken nodes all on the same set of arrays?
>
>
> No. There are 4 completely independent RAID arrays, in 4 different
> locations. They are similar (same brand & model, but slightly different
> disks, and 1 different firmware), and all arrays are multipathed. I don't
> think the RAID arrays are the problem. We have been using those particular
> models for 2-3 years, and in the logs I don't see any problem that could be
> caused by the storage itself (like SCSI or multipath errors).

I must have misunderstood then. What did you mean by "1 Array for 2 OSD nodes"?
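
A side note on the syncfs point raised above: sync(2) flushes every mounted
filesystem on the host, while syncfs(2) flushes only the filesystem that a
given file descriptor lives on, which is what an OSD really wants for its
own data directory when several busy filesystems share the box. A rough
sketch of the difference (Python, calling the glibc wrapper through ctypes;
assumes Linux with glibc 2.14+, and the path is a hypothetical placeholder):

  # sync(2) vs. syncfs(2): syncfs limits the flush to one filesystem.
  import ctypes
  import os

  libc = ctypes.CDLL("libc.so.6", use_errno=True)

  osd_dir = "/data/osd.0"  # hypothetical OSD data directory
  fd = os.open(osd_dir, os.O_RDONLY)
  try:
      if libc.syncfs(fd) != 0:  # flush only the filesystem holding osd_dir
          err = ctypes.get_errno()
          raise OSError(err, os.strerror(err))
  finally:
      os.close(fd)

  # Heavyweight fallback when syncfs is unavailable: flush everything.
  libc.sync()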

Thread overview: 25+ messages
2012-06-04  8:44 domino-style OSD crash Yann Dupont
2012-06-04 16:16 ` Tommi Virtanen
2012-06-04 17:40   ` Sam Just
2012-06-04 18:34     ` Greg Farnum
2012-07-03  8:40     ` Yann Dupont
2012-07-03 19:42       ` Tommi Virtanen
2012-07-03 20:54         ` Yann Dupont
2012-07-03 21:38           ` Tommi Virtanen
2012-07-04  8:06             ` Yann Dupont
2012-07-04 16:21               ` Gregory Farnum
2012-07-04 17:53                 ` Yann Dupont
2012-07-05 21:32                   ` Gregory Farnum
2012-07-06  7:19                     ` Yann Dupont
2012-07-06 17:01                       ` Gregory Farnum [this message]
2012-07-07  8:19                         ` Yann Dupont
2012-07-09 17:14                           ` Samuel Just
2012-07-10  9:46                             ` Yann Dupont
2012-07-10 15:56                               ` Tommi Virtanen
2012-07-10 16:39                                 ` Yann Dupont
2012-07-10 17:11                                   ` Tommi Virtanen
2012-07-10 17:36                                     ` Yann Dupont
2012-07-10 18:16                                       ` Tommi Virtanen
2012-07-09 17:43               ` Tommi Virtanen
2012-07-09 19:05                 ` Yann Dupont
2012-07-09 19:48                   ` Tommi Virtanen
