linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jamie Lokier <jamie@shareable.org>
To: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Cc: Jeff Garzik <jeff@garzik.org>, Sage Weil <sage@newdream.net>,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	linux-fsdevel@vger.kernel.org
Subject: Re: POHMELFS high performance network filesystem. Transactions, failover, performance.
Date: Wed, 14 May 2008 22:57:06 +0100	[thread overview]
Message-ID: <20080514215704.GE23758@shareable.org> (raw)
In-Reply-To: <20080514193843.GB10165@2ka.mipt.ru>

Evgeniy Polyakov wrote:
> > Quite true, but IMO single-node performance is largely an academic 
> > exercise today.  What production system is run without backups or 
> > replication?
> 
> If cluster is made out of 2-3-4-10 machines, it does want to get maximum
> single node performance. But I agree that in some cases we have to
> sacrifice of something in order to find something new. And the larger
> cluster becomes, for more things we can close eyes on.

With the right topology and hardware, you can get _faster_ than single
node performance with as many nodes as you like, except when there is
a node/link failure and the network pauses briefly to reorganise - and
even that is solvable.

Consider:

    Client <-> A <-> B <-> C <-> D

A to D are servers.  <-> are independent network links.  Each server
has hardware which can forward a packet at the same time it's being
received like the best switches (wormhole routing), while performing
minor transformations on it (I did say the right hardware ;-)

Client sends a request message.  It is forwarded along the whole
chain, and reaches D with just a few microseconds of delay compared
with A.

All servers process the message, and produce a response in about the
same time.  However, (think of RAID) they don't all process all data
in the message, just part they are responsible for, so they might do
it faster than a single node would processing the whole message.

The aggregate response is a function of all of them.  D sends its
response.  C forwards that packet while modifying the answer to
include its own response.  B, A do the same.  The answer at Client
arrives just a few microseconds later than it would have with just a
single server.

If desired, arrange it in a tree to reduce even the microseconds.

Such network hardware is quite feasible, indeed quite easy with an
FPGA based NIC.

Enjoy the speed :-)

-- Jamie

  reply	other threads:[~2008-05-14 21:57 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-05-13 17:45 POHMELFS high performance network filesystem. Transactions, failover, performance Evgeniy Polyakov
2008-05-13 19:09 ` Jeff Garzik
2008-05-13 20:51   ` Evgeniy Polyakov
2008-05-14  0:52     ` Jamie Lokier
2008-05-14  1:16       ` Florian Wiessner
2008-05-14  8:10         ` Evgeniy Polyakov
2008-05-14  7:57       ` Evgeniy Polyakov
2008-05-14 13:35     ` Sage Weil
2008-05-14 13:52       ` Evgeniy Polyakov
2008-05-14 14:31         ` Jamie Lokier
2008-05-14 15:00           ` Evgeniy Polyakov
2008-05-14 19:08             ` Jeff Garzik
2008-05-14 19:32               ` Evgeniy Polyakov
2008-05-14 20:37                 ` Jeff Garzik
2008-05-14 21:19                   ` Evgeniy Polyakov
2008-05-14 21:34                     ` Jeff Garzik
2008-05-14 21:32             ` Jamie Lokier
2008-05-14 21:37               ` Jeff Garzik
2008-05-14 21:43                 ` Jamie Lokier
2008-05-14 22:02               ` Evgeniy Polyakov
2008-05-14 22:28                 ` Jamie Lokier
2008-05-14 22:45                   ` Evgeniy Polyakov
2008-05-15  1:10                     ` Jamie Lokier
2008-05-15  7:34                       ` Evgeniy Polyakov
2008-05-14 19:05           ` Jeff Garzik
2008-05-14 21:38             ` Jamie Lokier
2008-05-14 19:03         ` Jeff Garzik
2008-05-14 19:38           ` Evgeniy Polyakov
2008-05-14 21:57             ` Jamie Lokier [this message]
2008-05-14 22:06               ` Jeff Garzik
2008-05-14 22:41                 ` Evgeniy Polyakov
2008-05-14 22:50                   ` Evgeniy Polyakov
2008-05-14 22:32               ` Evgeniy Polyakov
2008-05-14 14:09       ` Jamie Lokier
2008-05-14 16:09         ` Sage Weil
2008-05-14 19:11           ` Jeff Garzik
2008-05-14 21:19           ` Jamie Lokier
2008-05-14 18:24       ` Jeff Garzik
2008-05-14 20:00         ` Sage Weil
2008-05-14 21:49           ` Jeff Garzik
2008-05-14 22:26             ` Sage Weil
2008-05-14 22:35               ` Jamie Lokier
2008-05-14  6:33 ` Andrew Morton
2008-05-14  7:40   ` Evgeniy Polyakov
2008-05-14  8:01     ` Andrew Morton
2008-05-14  8:31       ` Evgeniy Polyakov
2008-05-14  8:08     ` Evgeniy Polyakov
2008-05-14 13:41       ` Sage Weil
2008-05-14 13:56         ` Evgeniy Polyakov
2008-05-14 17:56         ` Andrew Morton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080514215704.GE23758@shareable.org \
    --to=jamie@shareable.org \
    --cc=jeff@garzik.org \
    --cc=johnpol@2ka.mipt.ru \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=sage@newdream.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).