From: James Bottomley <James.Bottomley@HansenPartnership.com>
To: david@lang.hm
Cc: Willy Tarreau <w@1wt.eu>,
	Bart Van Assche <bart.vanassche@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Philipp Reisner <philipp.reisner@linbit.com>,
	linux-kernel@vger.kernel.org, Jens Axboe <jens.axboe@oracle.com>,
	Greg KH <gregkh@suse.de>, Neil Brown <neilb@suse.de>,
	Sam Ravnborg <sam@ravnborg.org>, Dave Jones <davej@redhat.com>,
	Nikanth Karthikesan <knikanth@suse.de>,
	Lars Marowsky-Bree <lmb@suse.de>,
	Kyle Moffett <kyle@moffetthome.net>,
	Lars Ellenberg <lars.ellenberg@linbit.com>
Subject: Re: [PATCH 00/16] DRBD: a block device for HA clusters
Date: Sun, 03 May 2009 10:09:31 -0500
Message-ID: <1241363371.5596.45.camel@mulgrave.int.hansenpartnership.com>
In-Reply-To: <alpine.DEB.1.10.0905030753380.15782@asgard>

On Sun, 2009-05-03 at 07:56 -0700, david@lang.hm wrote:
> On Sun, 3 May 2009, James Bottomley wrote:
> 
> > Subject: Re: [PATCH 00/16] DRBD: a block device for HA clusters
> > 
> > On Sun, 2009-05-03 at 07:36 -0700, david@lang.hm wrote:
> >> On Sun, 3 May 2009, James Bottomley wrote:
> >>
> >>> Subject: Re: [PATCH 00/16] DRBD: a block device for HA clusters
> >>>
> >>> On Sat, 2009-05-02 at 22:40 -0700, david@lang.hm wrote:
> >>>> On Sun, 3 May 2009, Willy Tarreau wrote:
> >>>>
> >>>>> On Sat, May 02, 2009 at 09:33:35AM +0200, Bart Van Assche wrote:
> >>>>>> On Fri, May 1, 2009 at 10:59 AM, Andrew Morton
> >>>>>> <akpm@linux-foundation.org> wrote:
> >>>>>>> On Thu, 30 Apr 2009 13:26:36 +0200 Philipp Reisner <philipp.reisner@linbit.com> wrote:
> >>>>>>>
> >>>>>>>> This is a repost of DRBD
> >>>>>>>
> >>>>>>> Is it being used anywhere for anything?  If so, where and what?
> >>>>>>
> >>>>>> One popular application is to run iSCSI and HA software on top of DRBD
> >>>>>> in order to build a highly available iSCSI storage target.
> >>>>>
> >>>>> Confirmed, I have several customers who're doing exactly that.
> >>>>
> >>>> I will also say that there are a lot of us out here who would have a use
> >>>> for DRBD in our HA setups, but have held off implementing it specifically
> >>>> because it's not yet in the upstream kernel.
> >>>
> >>> Actually, that's not a particularly strong reason because we already
> >>> have an in-kernel replicator that has much of the functionality of drbd
> >>> that you could use.  The main reason for wanting drbd in kernel is that
> >>> it has a *current* user base.
> >>>
> >>> Both the in kernel md/nbd and drbd do sync and async replication with
> >>> primary side bitmaps.  The main differences are:
> >>>
> >>>      * md/nbd can do 1 to N replication,
> >>>      * drbd can do active/active replication (useful for cluster
> >>>        filesystems)
> >>>      * The chunk size of the md/nbd is tunable
> >>>      * With the updated nbd-tools, current md/nbd can do point in time
> >>>        rollback on transaction logged secondaries (a BCS requirement)
> >>>      * drbd manages the mirror state explicitly, md/nbd needs a user
> >>>        space helper
> >>>
> >>> And probably a few others I forget.
> >>
> >> one very big one:
> >>
> >> DRBD has better support for dealing with split brain situations and
> >> recovering from them.
> >
> > I don't really think so.  The decision about which (or if a) node should
> > be killed lies with the HA harness outside of the province of the
> > replication.
> >
> > One could argue that the symmetric active mode of drbd allows both nodes
> > to continue rather than having the harness make a kill decision about
> > one.  However, if they both alter the same data, you get an
> > irreconcilable data corruption fault which, one can argue, is directly
> > counter to HA principles and so allowing drbd continuation is arguably
> > the wrong thing to do.
> 
> but the issue is that at the time the failure is taking place, neither 
> side _knows_ that the other side is running. In fact, they both think that 
> the other side is dead.

Resolving this is the job of the HA harness, as I said ... the usual
solution being either third node pings or confirmable switchover.
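
As a rough illustration of the third node idea (a sketch only, not taken
from any existing HA harness): the third node can act as an arbiter that
lets at most one side go active once the two sides lose sight of each
other. The names Witness, request_active and decide are hypothetical,
and a real harness would also need timeouts, fencing and lease expiry.

```python
from typing import Optional


class Witness:
    """Hypothetical third node that grants the active role to one node only."""

    def __init__(self) -> None:
        self.holder: Optional[str] = None

    def request_active(self, node: str) -> bool:
        # First requester wins; repeat requests from the holder succeed.
        if self.holder is None or self.holder == node:
            self.holder = node
            return True
        return False


def decide(node: str, peer_reachable: bool, witness: Optional[Witness]) -> str:
    """Return what this node should do, as seen from its side of the link.

    witness is None when the third node cannot be reached (an assumption
    made here to keep the sketch free of real networking).
    """
    if peer_reachable:
        return "continue"      # no partition observed, carry on as before
    if witness is None:
        return "standby"       # isolated: cannot prove the peer is dead
    return "active" if witness.request_active(node) else "standby"
```

With this, two nodes that each believe the other is dead cannot both end
up active, because only the first one to reach the witness is granted the
role.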

> with DRBD, when the two sides start talking again they will discover that 
> they are different and complain, loudly, to the sysadmin that they need 
> help

The object of HA is to prevent data becoming toast, not to point it out
to the sysadmin after the fact.

> with md/nbd you have the situation where both sides will try to resync to 
> the other side as soon as the packets can get through. this can end up 
> corrupting both sides if it's not caught fast enough

Actually, that's just your implementation: md/nbd does nothing to
re-establish the replication, it has to be done by the HA harness after
split brain resolution.  What a correct harness would do is to compare
the HA event log and the intent logs to see if there had been activity
to both sides after loss of contact and, if there had, to flag the data
corruption problem and not resume replication.
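
The check described above might look roughly like the following sketch.
It is an assumption-laden simplification: each intent log is modelled as
a list of write timestamps (real intent logs are bitmaps of dirty
regions), the loss-of-contact time is assumed to come from the HA event
log, and NodeLog, wrote_after and resync_decision are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeLog:
    name: str
    write_times: List[float]   # simplified intent log: one entry per write


def wrote_after(log: NodeLog, lost_contact_at: float) -> bool:
    """True if this node accepted any write after contact was lost."""
    return any(t > lost_contact_at for t in log.write_times)


def resync_decision(a: NodeLog, b: NodeLog, lost_contact_at: float) -> str:
    """Decide whether replication may resume, and in which direction."""
    a_dirty = wrote_after(a, lost_contact_at)
    b_dirty = wrote_after(b, lost_contact_at)
    if a_dirty and b_dirty:
        # Both sides diverged: flag it, never resync automatically.
        return "split-brain: do not resume replication"
    if a_dirty:
        return f"resync {b.name} from {a.name}"
    if b_dirty:
        return f"resync {a.name} from {b.name}"
    return "in sync: resume replication"
```

The point of the ordering is that the divergence test runs before any
resync direction is chosen, so a double write is reported rather than
overwritten.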

This corruption situation isn't unique to replication ... any time you
may potentially have allowed both sides to write to a data store, you
get it, that's why it's the job of the HA harness to sort out whether a
split brain happened and what to do about it *first*.

James



