linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jamie Lokier <jamie@shareable.org>
To: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Cc: Sage Weil <sage@newdream.net>, Jeff Garzik <jeff@garzik.org>,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	linux-fsdevel@vger.kernel.org
Subject: Re: POHMELFS high performance network filesystem. Transactions, failover, performance.
Date: Wed, 14 May 2008 15:31:05 +0100	[thread overview]
Message-ID: <20080514143105.GB14987@shareable.org> (raw)
In-Reply-To: <20080514135156.GA23131@2ka.mipt.ru>

Evgeniy Polyakov wrote:
> > For writes, Paxos is actually more or less optimal (in the non-failure 
> > cases, at least).  Reads are trickier, but there are ways to keep that 
> > fast as well.  FWIW, Ceph extends basic Paxos with a leasing mechanism to 
> > keep reads fast, consistent, and distributed.  It's only used for cluster 
> > state, though, not file data.
> 
> Well, it depends... If we are talking about single node perfromance,
> then any protocol, which requries to wait for authorization (or any
> approach, which waits for acknowledge just after data was sent) is slow.
>
> If we are talking about agregate parallel perfromance, then its basic
> protocol with 2 messages is (probably) optimal, but still I'm not
> convinced, that 2 messages case is a good choise, I want one :)

Look up "one-phase commit" or even "zero-phase commit".  (The
terminology is cheating a bit.)  As I've understood it, all commit
protocols have a step where each node guarantees it can commit if
asked and node failure at that point does not invalidate the guarantee
if the node recovers (if it can't maintain the guarantee, the node
doesn't recover in a technical sense and a higher level protocol may
reintegrate the node).  One/zero-phase commit extends that to
guaranteeing a certain amounts and types of data can be written before
it knows what the data is, so write messages within that window are
sufficient for global commits.  Guarantees can be acquired
asynchronously in advance of need, and can have time and other limits.
These guarantees are no different in principle from the 1-bit
guarantee offered by the "can you commit" phase of other commit
protocols, so they aren't as weak as they seem.

Now combine it with a quorum protocol like Paxos, you can commit with
async guarantees from a subset of nodes.  Guarantees can be
piggybacked on earlier requests.  There, single node write
performance with quorum robustness.

-- Jamie

  reply	other threads:[~2008-05-14 14:31 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-05-13 17:45 POHMELFS high performance network filesystem. Transactions, failover, performance Evgeniy Polyakov
2008-05-13 19:09 ` Jeff Garzik
2008-05-13 20:51   ` Evgeniy Polyakov
2008-05-14  0:52     ` Jamie Lokier
2008-05-14  1:16       ` Florian Wiessner
2008-05-14  8:10         ` Evgeniy Polyakov
2008-05-14  7:57       ` Evgeniy Polyakov
2008-05-14 13:35     ` Sage Weil
2008-05-14 13:52       ` Evgeniy Polyakov
2008-05-14 14:31         ` Jamie Lokier [this message]
2008-05-14 15:00           ` Evgeniy Polyakov
2008-05-14 19:08             ` Jeff Garzik
2008-05-14 19:32               ` Evgeniy Polyakov
2008-05-14 20:37                 ` Jeff Garzik
2008-05-14 21:19                   ` Evgeniy Polyakov
2008-05-14 21:34                     ` Jeff Garzik
2008-05-14 21:32             ` Jamie Lokier
2008-05-14 21:37               ` Jeff Garzik
2008-05-14 21:43                 ` Jamie Lokier
2008-05-14 22:02               ` Evgeniy Polyakov
2008-05-14 22:28                 ` Jamie Lokier
2008-05-14 22:45                   ` Evgeniy Polyakov
2008-05-15  1:10                     ` Jamie Lokier
2008-05-15  7:34                       ` Evgeniy Polyakov
2008-05-14 19:05           ` Jeff Garzik
2008-05-14 21:38             ` Jamie Lokier
2008-05-14 19:03         ` Jeff Garzik
2008-05-14 19:38           ` Evgeniy Polyakov
2008-05-14 21:57             ` Jamie Lokier
2008-05-14 22:06               ` Jeff Garzik
2008-05-14 22:41                 ` Evgeniy Polyakov
2008-05-14 22:50                   ` Evgeniy Polyakov
2008-05-14 22:32               ` Evgeniy Polyakov
2008-05-14 14:09       ` Jamie Lokier
2008-05-14 16:09         ` Sage Weil
2008-05-14 19:11           ` Jeff Garzik
2008-05-14 21:19           ` Jamie Lokier
2008-05-14 18:24       ` Jeff Garzik
2008-05-14 20:00         ` Sage Weil
2008-05-14 21:49           ` Jeff Garzik
2008-05-14 22:26             ` Sage Weil
2008-05-14 22:35               ` Jamie Lokier
2008-05-14  6:33 ` Andrew Morton
2008-05-14  7:40   ` Evgeniy Polyakov
2008-05-14  8:01     ` Andrew Morton
2008-05-14  8:31       ` Evgeniy Polyakov
2008-05-14  8:08     ` Evgeniy Polyakov
2008-05-14 13:41       ` Sage Weil
2008-05-14 13:56         ` Evgeniy Polyakov
2008-05-14 17:56         ` Andrew Morton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080514143105.GB14987@shareable.org \
    --to=jamie@shareable.org \
    --cc=jeff@garzik.org \
    --cc=johnpol@2ka.mipt.ru \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=sage@newdream.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).