* raid over ethernet
@ 2011-01-29  1:58 Roberto Spadim
  2011-01-29  5:41 ` Jérôme Poulin
  2011-01-29  6:42 ` Mikael Abrahamsson
  0 siblings, 2 replies; 22+ messages in thread
From: Roberto Spadim @ 2011-01-29  1:58 UTC (permalink / raw)
  To: Linux-RAID

Hi guys, I was thinking about RAID over Ethernet... is there a solution
for making a synchronous replica of my filesystem? It's no problem if my
primary server goes down: I can mount my replica, fsck it, and continue
with the available data.
I was reading about nbd; does anyone have more ideas?
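
For what it's worth, the nbd+mdadm idea can be sketched roughly like this (host names, the port, and the device names are assumptions, not a tested recipe; this uses the old-style nbd-server invocation):

```shell
# On the replica server: export a partition over NBD.
nbd-server 2000 /dev/sdb1

# On the primary server: import the remote partition as /dev/nbd0...
modprobe nbd
nbd-client replica-host 2000 /dev/nbd0

# ...and mirror the local disk onto it with MD RAID1.
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/nbd0
mkfs.ext3 /dev/md0 && mount /dev/md0 /data
```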

-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: raid over ethernet
  2011-01-29  1:58 raid over ethernet Roberto Spadim
@ 2011-01-29  5:41 ` Jérôme Poulin
  2011-01-29  6:42   ` Roberto Spadim
  2011-01-29  6:42 ` Mikael Abrahamsson
  1 sibling, 1 reply; 22+ messages in thread
From: Jérôme Poulin @ 2011-01-29  5:41 UTC (permalink / raw)
  To: Roberto Spadim; +Cc: Linux-RAID

DRBD: http://www.drbd.org/
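
For reference, a minimal two-node DRBD resource might look like this (host names, devices, and addresses below are made-up examples, not a complete configuration):

```conf
# /etc/drbd.d/r0.res -- hypothetical two-node resource
resource r0 {
  protocol C;                # fully synchronous replication
  on alpha {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   192.168.1.1:7789;
    meta-disk internal;
  }
  on beta {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   192.168.1.2:7789;
    meta-disk internal;
  }
}
```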

Sent from my mobile device.

Jérôme Poulin
Solutions G.A.

On 2011-01-28, at 20:58, Roberto Spadim <roberto@spadim.com.br> wrote:

> Hi guys, I was thinking about RAID over Ethernet... is there a solution
> for making a synchronous replica of my filesystem? It's no problem if my
> primary server goes down: I can mount my replica, fsck it, and continue
> with the available data.
> I was reading about nbd; does anyone have more ideas?
>
> --
> Roberto Spadim
> Spadim Technology / SPAEmpresarial
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

* Re: raid over ethernet
  2011-01-29  5:41 ` Jérôme Poulin
@ 2011-01-29  6:42   ` Roberto Spadim
  2011-01-29 13:29     ` Alexander Schreiber
  0 siblings, 1 reply; 22+ messages in thread
From: Roberto Spadim @ 2011-01-29  6:42 UTC (permalink / raw)
  To: Jérôme Poulin; +Cc: Linux-RAID

is it better than nbd+mdadm?

2011/1/29 Jérôme Poulin <jeromepoulin@gmail.com>:
> DRBD: http://www.drbd.org/
>
> Sent from my mobile device.
>
> Jérôme Poulin
> Solutions G.A.
>
> On 2011-01-28, at 20:58, Roberto Spadim <roberto@spadim.com.br> wrote:
>
>> Hi guys, I was thinking about RAID over Ethernet... is there a solution
>> for making a synchronous replica of my filesystem? It's no problem if my
>> primary server goes down: I can mount my replica, fsck it, and continue
>> with the available data.
>> I was reading about nbd; does anyone have more ideas?
>>
>> --
>> Roberto Spadim
>> Spadim Technology / SPAEmpresarial
>



-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial

* Re: raid over ethernet
  2011-01-29  1:58 raid over ethernet Roberto Spadim
  2011-01-29  5:41 ` Jérôme Poulin
@ 2011-01-29  6:42 ` Mikael Abrahamsson
  2011-01-29  6:44   ` Roberto Spadim
  2011-01-29 18:34   ` David Brown
  1 sibling, 2 replies; 22+ messages in thread
From: Mikael Abrahamsson @ 2011-01-29  6:42 UTC (permalink / raw)
  To: Roberto Spadim; +Cc: Linux-RAID

On Fri, 28 Jan 2011, Roberto Spadim wrote:

> Hi guys, I was thinking about RAID over Ethernet... is there a solution
> for making a synchronous replica of my filesystem? It's no problem if my
> primary server goes down: I can mount my replica, fsck it, and continue
> with the available data.
> I was reading about nbd; does anyone have more ideas?

Look into AoE (ATA over Ethernet).
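
As a rough sketch of the AoE route using the vblade userspace target from aoetools (shelf/slot numbers, interface, and device names are assumptions):

```shell
# On the target host: export /dev/sdb as AoE shelf 0, slot 1 on eth0.
vbladed 0 1 eth0 /dev/sdb

# On the initiator: load the aoe driver; the disk appears under /dev/etherd/.
modprobe aoe
aoe-discover
ls /dev/etherd/        # e.g. e0.1
```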

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se


* Re: raid over ethernet
  2011-01-29  6:42 ` Mikael Abrahamsson
@ 2011-01-29  6:44   ` Roberto Spadim
  2011-01-29  6:48     ` Roberto Spadim
                       ` (2 more replies)
  2011-01-29 18:34   ` David Brown
  1 sibling, 3 replies; 22+ messages in thread
From: Roberto Spadim @ 2011-01-29  6:44 UTC (permalink / raw)
  To: Mikael Abrahamsson; +Cc: Linux-RAID

faster than nbd?

2011/1/29 Mikael Abrahamsson <swmike@swm.pp.se>:
> On Fri, 28 Jan 2011, Roberto Spadim wrote:
>
>> Hi guys, I was thinking about RAID over Ethernet... is there a solution
>> for making a synchronous replica of my filesystem? It's no problem if my
>> primary server goes down: I can mount my replica, fsck it, and continue
>> with the available data.
>> I was reading about nbd; does anyone have more ideas?
>
> Look into AoE (ATA over Ethernet).
>
> --
> Mikael Abrahamsson    email: swmike@swm.pp.se
>



-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial

* Re: raid over ethernet
  2011-01-29  6:44   ` Roberto Spadim
@ 2011-01-29  6:48     ` Roberto Spadim
       [not found]       ` <AANLkTikdahgMoJjGr2otTS70LSM77GNpW_vAkZf15Kph@mail.gmail.com>
  2011-01-29 13:34     ` Alexander Schreiber
  2011-01-29 15:30     ` Spelic
  2 siblings, 1 reply; 22+ messages in thread
From: Roberto Spadim @ 2011-01-29  6:48 UTC (permalink / raw)
  To: Mikael Abrahamsson; +Cc: Linux-RAID

better than drbd?

2011/1/29 Roberto Spadim <roberto@spadim.com.br>:
> faster than nbd?
>
> 2011/1/29 Mikael Abrahamsson <swmike@swm.pp.se>:
>> On Fri, 28 Jan 2011, Roberto Spadim wrote:
>>
>>> Hi guys, I was thinking about RAID over Ethernet... is there a solution
>>> for making a synchronous replica of my filesystem? It's no problem if my
>>> primary server goes down: I can mount my replica, fsck it, and continue
>>> with the available data.
>>> I was reading about nbd; does anyone have more ideas?
>>
>> Look into AoE (ATA over Ethernet).
>>
>> --
>> Mikael Abrahamsson    email: swmike@swm.pp.se
>>
>
>
>
> --
> Roberto Spadim
> Spadim Technology / SPAEmpresarial
>



-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial

* Re: raid over ethernet
       [not found]       ` <AANLkTikdahgMoJjGr2otTS70LSM77GNpW_vAkZf15Kph@mail.gmail.com>
@ 2011-01-29 11:47         ` Roberto Spadim
  0 siblings, 0 replies; 22+ messages in thread
From: Roberto Spadim @ 2011-01-29 11:47 UTC (permalink / raw)
  To: Peter Chacko; +Cc: Mikael Abrahamsson, Linux-RAID

Managing the combination of nbd and mdraid is complicated.

Complicated = better to use drbd instead?

2011/1/29 Peter Chacko <peterchacko35@gmail.com>:
> AoE is not routable, and has no replication. It's not a replacement for DRBD or NBD.
> AoE is best if you want to implement the cheapest SAN on the local network.
> For the original purpose, DRBD is the best. Managing the combination of nbd
> and mdraid is complicated.
> Thanks.
> Peter Chacko,
> Athinio data systems.
>
> On Sat, Jan 29, 2011 at 12:18 PM, Roberto Spadim <roberto@spadim.com.br>
> wrote:
>>
>> better than drbd?
>>
>> 2011/1/29 Roberto Spadim <roberto@spadim.com.br>:
>> > faster than nbd?
>> >
>> > 2011/1/29 Mikael Abrahamsson <swmike@swm.pp.se>:
>> >> On Fri, 28 Jan 2011, Roberto Spadim wrote:
>> >>
>> >>> Hi guys, I was thinking about RAID over Ethernet... is there a solution
>> >>> for making a synchronous replica of my filesystem? It's no problem if my
>> >>> primary server goes down: I can mount my replica, fsck it, and continue
>> >>> with the available data.
>> >>> I was reading about nbd; does anyone have more ideas?
>> >>
>> >> Look into AoE (ATA over Ethernet).
>> >>
>> >> --
>> >> Mikael Abrahamsson    email: swmike@swm.pp.se
>> >>
>> >
>> >
>> >
>> > --
>> > Roberto Spadim
>> > Spadim Technology / SPAEmpresarial
>> >
>>
>>
>>
>> --
>> Roberto Spadim
>> Spadim Technology / SPAEmpresarial
>
>



-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial

* Re: raid over ethernet
  2011-01-29  6:42   ` Roberto Spadim
@ 2011-01-29 13:29     ` Alexander Schreiber
  0 siblings, 0 replies; 22+ messages in thread
From: Alexander Schreiber @ 2011-01-29 13:29 UTC (permalink / raw)
  To: Roberto Spadim; +Cc: Jérôme Poulin, Linux-RAID

On Sat, Jan 29, 2011 at 04:42:16AM -0200, Roberto Spadim wrote:
> is it better than nbd+mdadm?

Definitely. We are using drbd replicated disks on a _lot_ of machines,
with all kinds of outside events: disk failures, network failures,
machine failures of various interesting variants. Despite this kind of
pounding, drbd turned out to be very robust, with data loss happening
very rarely (well, with some combined failures you are just plain
screwed - that's why one has backups).

Kind regards,
           Alex.
-- 
"Opportunity is missed by most people because it is dressed in overalls and
 looks like work."                                      -- Thomas A. Edison


* Re: raid over ethernet
  2011-01-29  6:44   ` Roberto Spadim
  2011-01-29  6:48     ` Roberto Spadim
@ 2011-01-29 13:34     ` Alexander Schreiber
       [not found]       ` <AANLkTi=6ridRPnHpfdOC=f2_ESndSARmQRkvT_shYO3s@mail.gmail.com>
  2011-01-29 15:30     ` Spelic
  2 siblings, 1 reply; 22+ messages in thread
From: Alexander Schreiber @ 2011-01-29 13:34 UTC (permalink / raw)
  To: Roberto Spadim; +Cc: Mikael Abrahamsson, Linux-RAID

On Sat, Jan 29, 2011 at 04:44:05AM -0200, Roberto Spadim wrote:
> faster than nbd?

I don't know how drbd compares in speed to nbd, but drbd is obviously
slower than plain disks, especially if you care about your data. In the
only sensible operating mode (synchronous writes to the underlying block
devices), the speed (both bandwidth and latency) depends on your disks
and your network connection (so you better get at least a Gigabit link).
Depending on your particular setup, you'll probably get 50-60% of the
plain disk performance for writes, while reads should be reasonably
close to the plain disk performance - drbd optimizes reads by just reading
from the local disk if it can.
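
The "only sensible operating mode" described here corresponds to DRBD's protocol C; for comparison, the replication protocol is selected in the resource configuration (a sketch, not a complete config; resource name assumed):

```conf
resource r0 {
  # protocol A;   # async: write completes after local disk + local TCP send
  # protocol B;   # memory-synchronous: completes after remote buffer ack
  protocol C;     # synchronous: completes only after the remote *disk* ack
}
```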

Kind regards,
          Alex.
-- 
"Opportunity is missed by most people because it is dressed in overalls and
 looks like work."                                      -- Thomas A. Edison


* raid over ethernet
       [not found]       ` <AANLkTi=6ridRPnHpfdOC=f2_ESndSARmQRkvT_shYO3s@mail.gmail.com>
@ 2011-01-29 14:25         ` Denis
  2011-01-29 21:08         ` Alexander Schreiber
  1 sibling, 0 replies; 22+ messages in thread
From: Denis @ 2011-01-29 14:25 UTC (permalink / raw)
  To: Linux-RAID; +Cc: Roberto Spadim, Mikael Abrahamsson, Alexander Schreiber

ouch, html. - my bad.

---------- Forwarded message ----------
From: Denis <denismpa@gmail.com>
Date: 2011/1/29
Subject: Re: raid over ethernet
To: Alexander Schreiber <als@thangorodrim.de>
Cc: Roberto Spadim <roberto@spadim.com.br>, Mikael Abrahamsson
<swmike@swm.pp.se>, Linux-RAID <linux-raid@vger.kernel.org>




2011/1/29 Roberto Spadim <roberto@spadim.com.br>
>
> Managing the combination of nbd and mdraid is complicated.
>
> complicated = drbd work?

I have been using drbd for a long time and it's quite easy to
implement, manage, and use. The main purpose of all the applications I
have used it for was high availability, and it works just fine. And it's
really cool to see it integrated with heartbeat, which will manage to
mount the partition on one node or the other, according to your policy
and node availability.
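
The heartbeat integration mentioned here is, in v1-style configuration, a single haresources line; the node name, drbd resource name, mount point, and service IP below are made-up examples:

```conf
# /etc/ha.d/haresources -- preferred node, drbd resource, filesystem, service IP
node1 drbddisk::r0 Filesystem::/dev/drbd0::/data::ext3 IPaddr::192.168.1.100
```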

2011/1/29 Alexander Schreiber <als@thangorodrim.de>
>
> plain disk performance for writes, while reads should be reasonably
> close to the plain disk performance - drbd optimizes reads by just reading
> from the local disk if it can.
>

However, I have not used it in active-active fashion. Have you? If
yes, what is your overall experience?

>
> Kind regards,
>          Alex.
> --
> "Opportunity is missed by most people because it is dressed in overalls and
>  looks like work."                                      -- Thomas A. Edison

Cheers,

--
Denis Anjos,
www.versatushpc.com.br

* Re: raid over ethernet
  2011-01-29  6:44   ` Roberto Spadim
  2011-01-29  6:48     ` Roberto Spadim
  2011-01-29 13:34     ` Alexander Schreiber
@ 2011-01-29 15:30     ` Spelic
  2 siblings, 0 replies; 22+ messages in thread
From: Spelic @ 2011-01-29 15:30 UTC (permalink / raw)
  To: linux-raid

On 01/29/2011 07:44 AM, Roberto Spadim wrote:
> faster than nbd?

NBD is fast but has one problem: if you lose network connectivity for a 
while (TCP drops), there is no recovery that I am aware of. I think it 
unmaps the disk until user intervention. Or at least that was the 
situation a couple of years ago.
Actually, for RAID this might even be a good thing, but keep it in mind.
iSCSI seems an obvious alternative. And you can put almost anything under 
MD, I think, but DRBD (without MD) is probably better because it's made 
exactly for that purpose.
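
As a sketch of the iSCSI route with open-iscsi (the portal address and target IQN below are made-up examples):

```shell
# Discover targets on the portal, then log in; the LUN then appears as a
# local /dev/sdX and can be put under MD like any other disk.
iscsiadm -m discovery -t sendtargets -p 192.168.1.2
iscsiadm -m node -T iqn.2011-01.example:disk0 -p 192.168.1.2 --login
```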



* Re: raid over ethernet
  2011-01-29  6:42 ` Mikael Abrahamsson
  2011-01-29  6:44   ` Roberto Spadim
@ 2011-01-29 18:34   ` David Brown
  1 sibling, 0 replies; 22+ messages in thread
From: David Brown @ 2011-01-29 18:34 UTC (permalink / raw)
  To: linux-raid

On 29/01/11 07:42, Mikael Abrahamsson wrote:
> On Fri, 28 Jan 2011, Roberto Spadim wrote:
>
>> Hi guys, I was thinking about RAID over Ethernet... is there a solution
>> for making a synchronous replica of my filesystem? It's no problem if my
>> primary server goes down: I can mount my replica, fsck it, and continue
>> with the available data.
>> I was reading about nbd; does anyone have more ideas?
>
> Look into AoE (ATA over Ethernet).
>

I think AoE is limited to fairly direct connections - it doesn't use IP, 
and can't be routed (at least not easily - I'm sure it is possible if 
you try hard enough).  The alternative is iSCSI, which does use IP and 
can therefore be routed and passed around over networks.  AoE is 
therefore slightly more efficient, and iSCSI more flexible.

If you are looking at making a raid1 with an iSCSI or AoE target as one 
of the disks, consider using a write-intent bitmap and the 
--write-mostly and --write-behind flags.
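
A sketch of that suggestion, with assumed device names (/dev/sdb local, /dev/nbd0 the network-imported disk):

```shell
# RAID1 with an internal write-intent bitmap; reads are steered away from
# the remote member (--write-mostly) and writes to it may lag by up to 256
# outstanding requests (--write-behind, which requires the bitmap).
mdadm --create /dev/md0 --level=1 --raid-devices=2 \
      --bitmap=internal --write-behind=256 \
      /dev/sdb --write-mostly /dev/nbd0
```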



* Re: raid over ethernet
       [not found]       ` <AANLkTi=6ridRPnHpfdOC=f2_ESndSARmQRkvT_shYO3s@mail.gmail.com>
  2011-01-29 14:25         ` Denis
@ 2011-01-29 21:08         ` Alexander Schreiber
  2011-01-29 21:54           ` John Robinson
  2011-01-31  8:42           ` Denis
  1 sibling, 2 replies; 22+ messages in thread
From: Alexander Schreiber @ 2011-01-29 21:08 UTC (permalink / raw)
  To: Denis; +Cc: Roberto Spadim, Mikael Abrahamsson, Linux-RAID

On Sat, Jan 29, 2011 at 12:23:14PM -0200, Denis wrote:
> 2011/1/29 Alexander Schreiber <als@thangorodrim.de>
> 
> >
> > plain disk performance for writes, while reads should be reasonably
> > close to the plain disk performance - drbd optimizes reads by just reading
> > from the local disk if it can.
> >
> >
>  However, I have not used it with active-active fashion. Have you? if yes,
> what is your overall experience?

We are using drbd to provide mirrored disks for virtual machines running
under Xen. 99% of the time, the drbd devices run in primary/secondary
mode (aka active/passive), but they are switched to primary/primary
(aka active/active) for live migrations of domains, as that needs the
disks to be available on both nodes. From our experience, if the drbd
device is healthy, this is very reliable. No experience with running
drbd in primary/primary config for any extended period of time, though
(the live migrations are usually over after a few seconds to a minute at
most, then the drbd devices go back to primary/secondary).

Kind regards,
          Alex.
-- 
"Opportunity is missed by most people because it is dressed in overalls and
 looks like work."                                      -- Thomas A. Edison


* Re: raid over ethernet
  2011-01-29 21:08         ` Alexander Schreiber
@ 2011-01-29 21:54           ` John Robinson
  2011-01-29 23:04             ` Stan Hoeppner
                               ` (2 more replies)
  2011-01-31  8:42           ` Denis
  1 sibling, 3 replies; 22+ messages in thread
From: John Robinson @ 2011-01-29 21:54 UTC (permalink / raw)
  To: Alexander Schreiber; +Cc: Linux-RAID

On 29/01/2011 21:08, Alexander Schreiber wrote:
> On Sat, Jan 29, 2011 at 12:23:14PM -0200, Denis wrote:
>> 2011/1/29 Alexander Schreiber<als@thangorodrim.de>
>>
>>>
>>> plain disk performance for writes, while reads should be reasonably
>>> close to the plain disk performance - drbd optimizes reads by just reading
>>> from the local disk if it can.
>>>
>>>
>>   However, I have not used it with active-active fashion. Have you? if yes,
>> what is your overall experience?
>
> We are using drbd to provide mirrored disks for virtual machines running
> under Xen. 99% of the time, the drbd devices run in primary/secondary
> mode (aka active/passive), but they are switched to primary/primary
> (aka active/active) for live migrations of domains, as that needs the
> disks to be available on both nodes. From our experience, if the drbd
> device is healthy, this is very reliable. No experience with running
> drbd in primary/primary config for any extended period of time, though
> (the live migrations are usually over after a few seconds to a minute at
> most, then the drbd devices go back to primary/secondary).

Now that is interesting, to me at least. More as a thought experiment 
for now, I was wondering how one would go about setting up a small 
cluster of commodity servers (maybe 8 machines) running Xen (or perhaps 
now KVM) VMs, such that if one (or potentially two) of the machines 
died, the VMs could be picked up by the other machines in the cluster, 
and only using locally-attached SATA/SAS discs in each machine.

I guess I'm talking about RAIN or RAIS rather than RAID so maybe I'd 
better start reading the Wikipedia pages on those and not talk about it 
on this list...

Cheers,

John.



* Re: raid over ethernet
  2011-01-29 21:54           ` John Robinson
@ 2011-01-29 23:04             ` Stan Hoeppner
  2011-01-29 23:06             ` Miles Fidelman
  2011-01-30  1:43             ` Alexander Schreiber
  2 siblings, 0 replies; 22+ messages in thread
From: Stan Hoeppner @ 2011-01-29 23:04 UTC (permalink / raw)
  To: John Robinson; +Cc: Alexander Schreiber, Linux-RAID

John Robinson put forth on 1/29/2011 3:54 PM:

> Now that is interesting, to me at least. More as a thought experiment for now, I
> was wondering how one would go about setting up a small cluster of commodity
> servers (maybe 8 machines) running Xen (or perhaps now KVM) VMs, such that if
> one (or potentially two) of the machines died, the VMs could be picked up by the
> other machines in the cluster, and only using locally-attached SATA/SAS discs in
> each machine.

Doing N-way active replication with DRBD increases network utilization
substantially.  With two DRBD active nodes you will have a maximum of _2_
simultaneous data streams, one in each direction.  With 8 active nodes you will
have a maximum of _56_ simultaneous data streams.  Your scenario requires all
nodes be active.

This may work for a hobby cluster or something with very low volume of data
being written to disk.  This solution most likely won't scale for a cluster with
any amount of real traffic.  GbE peaks at 100 MB/s.  Therefore each node will
have only about 12 MB/s of bidirectional bandwidth for each other cluster member
if my math is correct.  A single SATA disk runs at about 80-120 MB/s, so your
network DRBD disk bandwidth is about 1/7th to 1/10th that of a single local
disk.  In a 2-node cluster it's closer to 1:1.  For your scenario to actually be
feasible, you'd need at least bonded quad GbE interfaces if not single 10 GbE
interfaces to get all the bandwidth you'd need.
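
The arithmetic above, spelled out (the 100 MB/s GbE figure is the post's own assumption):

```shell
nodes=8
# Each of the 8 active nodes replicates to the other 7, in both directions:
echo $(( nodes * (nodes - 1) ))    # 56 simultaneous data streams
# Usable GbE throughput split across the 7 remote peers:
echo $(( 100 / (nodes - 1) ))      # 14 MB/s per peer (integer division),
                                   # close to the ~12 MB/s figure above
```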

You'd be _MUCH_ better off using 2 active DRBD mirrored NFS servers with GFS2
filesystems and having the aforementioned 8 nodes do their data sharing via NFS.
 In this setup each node only writes once (to NFS) dramatically reducing network
bandwidth required per node, with only 16 maximum data streams instead of 56.
If you need more bandwidth or IOPS than a single disk NFS server can produce,
simply RAID 4-10 disks on each NFS server via RAID 10, then mirror the two RAIDs
with DRBD.

You may need 2-4 GbE interfaces between the two NFS servers just for DRBD
traffic, but the cost of that is much less than having the same number of
interfaces in each of 8 cluster nodes.  This will also give you much better
performance after a node or two fails and you have to boot their VM guests on
other hosts.  Having fast central RAID storage will allow those guests to boot
much more quickly and without causing degraded performance on the other nodes
due to lack of disk bandwidth in your suggested model.

-- 
Stan


* Re: raid over ethernet
  2011-01-29 21:54           ` John Robinson
  2011-01-29 23:04             ` Stan Hoeppner
@ 2011-01-29 23:06             ` Miles Fidelman
  2011-01-30  1:43             ` Alexander Schreiber
  2 siblings, 0 replies; 22+ messages in thread
From: Miles Fidelman @ 2011-01-29 23:06 UTC (permalink / raw)
  Cc: Linux-RAID

John Robinson wrote:
> Now that is interesting, to me at least. More as a thought experiment 
> for now, I was wondering how one would go about setting up a small 
> cluster of commodity servers (maybe 8 machines) running Xen (or 
> perhaps now KVM) VMs, such that if one (or potentially two) of the 
> machines died, the VMs could be picked up by the other machines in the 
> cluster, and only using locally-attached SATA/SAS discs in each machine.
I do that now - albeit only on a 2-node cluster.  DRBD works just fine 
using locally attached drives.

-- 
In theory, there is no difference between theory and practice.
In<fnord>  practice, there is.   .... Yogi Berra




* Re: raid over ethernet
  2011-01-29 21:54           ` John Robinson
  2011-01-29 23:04             ` Stan Hoeppner
  2011-01-29 23:06             ` Miles Fidelman
@ 2011-01-30  1:43             ` Alexander Schreiber
  2 siblings, 0 replies; 22+ messages in thread
From: Alexander Schreiber @ 2011-01-30  1:43 UTC (permalink / raw)
  To: John Robinson; +Cc: Linux-RAID

On Sat, Jan 29, 2011 at 09:54:55PM +0000, John Robinson wrote:
> On 29/01/2011 21:08, Alexander Schreiber wrote:
> >On Sat, Jan 29, 2011 at 12:23:14PM -0200, Denis wrote:
> >>2011/1/29 Alexander Schreiber<als@thangorodrim.de>
> >>
> >>>
> >>>plain disk performance for writes, while reads should be reasonably
> >>>close to the plain disk performance - drbd optimizes reads by just reading
> >>>from the local disk if it can.
> >>>
> >>>
> >>  However, I have not used it with active-active fashion. Have you? if yes,
> >>what is your overall experience?
> >
> >We are using drbd to provide mirrored disks for virtual machines running
> >under Xen. 99% of the time, the drbd devices run in primary/secondary
> >mode (aka active/passive), but they are switched to primary/primary
> >(aka active/active) for live migrations of domains, as that needs the
> >disks to be available on both nodes. From our experience, if the drbd
> >device is healthy, this is very reliable. No experience with running
> >drbd in primary/primary config for any extended period of time, though
> >(the live migrations are usually over after a few seconds to a minute at
> >most, then the drbd devices go back to primary/secondary).
> 
> Now that is interesting, to me at least. More as a thought
> experiment for now, I was wondering how one would go about setting
> up a small cluster of commodity servers (maybe 8 machines) running
> Xen (or perhaps now KVM) VMs, such that if one (or potentially two)
> of the machines died, the VMs could be picked up by the other
> machines in the cluster, and only using locally-attached SATA/SAS
> discs in each machine.
> 
> I guess I'm talking about RAIN or RAIS rather than RAID so maybe I'd
> better start reading the Wikipedia pages on those and not talk about
> it on this list...

For the "survive single node total machine failure" case, your problem has
already been solved: http://code.google.com/p/ganeti/

We run a large number of clusters with that and the VMs routinely survive
disk failures and recover (come back from what looks like a power failure
to the VM) from node failure.

Kind regards,
           Alex.
-- 
"Opportunity is missed by most people because it is dressed in overalls and
 looks like work."                                      -- Thomas A. Edison


* Re: raid over ethernet
  2011-01-29 21:08         ` Alexander Schreiber
  2011-01-29 21:54           ` John Robinson
@ 2011-01-31  8:42           ` Denis
  2011-01-31 13:03             ` Alexander Schreiber
  1 sibling, 1 reply; 22+ messages in thread
From: Denis @ 2011-01-31  8:42 UTC (permalink / raw)
  To: Alexander Schreiber; +Cc: Roberto Spadim, Mikael Abrahamsson, Linux-RAID

2011/1/29 Alexander Schreiber <als@thangorodrim.de>:
> On Sat, Jan 29, 2011 at 12:23:14PM -0200, Denis wrote:
>> 2011/1/29 Alexander Schreiber <als@thangorodrim.de>
>>
>> >
>> > plain disk performance for writes, while reads should be reasonably
>> > close to the plain disk performance - drbd optimizes reads by just reading
>> > from the local disk if it can.
>> >
>> >
>>  However, I have not used it with active-active fashion. Have you? if yes,
>> what is your overall experience?
>
> We are using drbd to provide mirrored disks for virtual machines running
> under Xen. 99% of the time, the drbd devices run in primary/secondary
> mode (aka active/passive), but they are switched to primary/primary
> (aka active/active) for live migrations of domains, as that needs the
> disks to be available on both nodes. From our experience, if the drbd
> device is healthy, this is very reliable. No experience with running
> drbd in primary/primary config for any extended period of time, though
> (the live migrations are usually over after a few seconds to a minute at
> most, then the drbd devices go back to primary/secondary).

What filesystem are you using to enable the primary-primary mode? Have
you evaluated it against any other available option?
>
> Kind regards,
>          Alex.
> --
> "Opportunity is missed by most people because it is dressed in overalls and
>  looks like work."                                      -- Thomas A. Edison
>

cheers!

-- 
Denis Anjos,
www.versatushpc.com.br

* Re: raid over ethernet
  2011-01-31  8:42           ` Denis
@ 2011-01-31 13:03             ` Alexander Schreiber
  2011-01-31 14:45               ` Roberto Spadim
  0 siblings, 1 reply; 22+ messages in thread
From: Alexander Schreiber @ 2011-01-31 13:03 UTC (permalink / raw)
  To: Denis; +Cc: Roberto Spadim, Mikael Abrahamsson, Linux-RAID

On Mon, Jan 31, 2011 at 06:42:44AM -0200, Denis wrote:
> 2011/1/29 Alexander Schreiber <als@thangorodrim.de>:
> > On Sat, Jan 29, 2011 at 12:23:14PM -0200, Denis wrote:
> >> 2011/1/29 Alexander Schreiber <als@thangorodrim.de>
> >>
> >> >
> >> > plain disk performance for writes, while reads should be reasonably
> >> > close to the plain disk performance - drbd optimizes reads by just reading
> >> > from the local disk if it can.
> >> >
> >> >
> >>  However, I have not used it with active-active fashion. Have you? if yes,
> >> what is your overall experience?
> >
> > We are using drbd to provide mirrored disks for virtual machines running
> > under Xen. 99% of the time, the drbd devices run in primary/secondary
> > mode (aka active/passive), but they are switched to primary/primary
> > (aka active/active) for live migrations of domains, as that needs the
> > disks to be available on both nodes. From our experience, if the drbd
> > device is healthy, this is very reliable. No experience with running
> > drbd in primary/primary config for any extended period of time, though
> > (the live migrations are usually over after a few seconds to a minute at
> > most, then the drbd devices go back to primary/secondary).
> 
> What filesystem are you using to enable the primary-primary mode? Have
> you evaluated it against any other available option?

The filesystem is whatever the VM is using, usually ext3. But the
filesystem doesn't matter in our use case at all, because:
 - the backing store for drbd  are logical volumes
 - the drbd block devices are directly exported as block devices
   to the VMs
The filesystem is only active inside the VM - and the VM is not aware of
the drbd primary/secondary -> primary/primary -> primary/secondary dance
that happens "outside" to enable live migration.
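
In Xen terms, that export is just the domU's disk line pointing at the drbd device (a hypothetical config fragment; device and target names are assumptions):

```conf
# The VM sees a plain block device (xvda); drbd is invisible to it.
disk = [ 'phy:/dev/drbd0,xvda,w' ]
```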

Kind regards,
           Alex.
-- 
"Opportunity is missed by most people because it is dressed in overalls and
 looks like work."                                      -- Thomas A. Edison

* Re: raid over ethernet
  2011-01-31 13:03             ` Alexander Schreiber
@ 2011-01-31 14:45               ` Roberto Spadim
  2011-01-31 16:15                 ` Alexander Schreiber
  0 siblings, 1 reply; 22+ messages in thread
From: Roberto Spadim @ 2011-01-31 14:45 UTC (permalink / raw)
  To: Alexander Schreiber; +Cc: Denis, Mikael Abrahamsson, Linux-RAID

i think the filesystem is a problem...
you can't have two writers on a filesystem that allows only one, or
you will get filesystem corruption (a lot of fsck repairs... local
caches and other features), maybe gfs, ocfs or another cluster
filesystem is a better solution...

2011/1/31 Alexander Schreiber <als@thangorodrim.de>:
> On Mon, Jan 31, 2011 at 06:42:44AM -0200, Denis wrote:
>> 2011/1/29 Alexander Schreiber <als@thangorodrim.de>:
>> > On Sat, Jan 29, 2011 at 12:23:14PM -0200, Denis wrote:
>> >> 2011/1/29 Alexander Schreiber <als@thangorodrim.de>
>> >>
>> >> >
>> >> > plain disk performance for writes, while reads should be reasonably
>> >> > close to the plain disk performance - drbd optimizes reads by just reading
>> >> > from the local disk if it can.
>> >> >
>> >> >
>> >>  However, I have not used it with active-active fashion. Have you? if yes,
>> >> what is your overall experience?
>> >
>> > We are using drbd to provide mirrored disks for virtual machines running
>> > under Xen. 99% of the time, the drbd devices run in primary/secondary
>> > mode (aka active/passive), but they are switched to primary/primary
>> > (aka active/active) for live migrations of domains, as that needs the
>> > disks to be available on both nodes. From our experience, if the drbd
>> > device is healthy, this is very reliable. No experience with running
>> > drbd in primary/primary config for any extended period of time, though
>> > (the live migrations are usually over after a few seconds to a minute at
>> > most, then the drbd devices go back to primary/secondary).
>>
>> What filesystem are you using to enable the primary-primary mode? Have
>> you evaluated it against any other available option?
>
> The filesystem is whatever the VM is using, usually ext3. But the
> filesystem doesn't matter in our use case at all, because:
>  - the backing store for drbd  are logical volumes
>  - the drbd block devices are directly exported as block devices
>   to the VMs
> The filesystem is only active inside the VM - and the VM is not aware of
> the drbd primary/secondary -> primary/primary -> primary/secondary dance
> that happens "outside" to enable live migration.
>
> Kind regards,
>           Alex.
> --
> "Opportunity is missed by most people because it is dressed in overalls and
>  looks like work."                                      -- Thomas A. Edison
>
>



-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: raid over ethernet
  2011-01-31 14:45               ` Roberto Spadim
@ 2011-01-31 16:15                 ` Alexander Schreiber
  2011-01-31 17:37                   ` Roberto Spadim
  0 siblings, 1 reply; 22+ messages in thread
From: Alexander Schreiber @ 2011-01-31 16:15 UTC (permalink / raw)
  To: Roberto Spadim; +Cc: Denis, Mikael Abrahamsson, Linux-RAID

On Mon, Jan 31, 2011 at 12:45:31PM -0200, Roberto Spadim wrote:
> i think filesystem is a problem...
> you can't have two writers over a filesystem that allow only one, or
> you will have filesystem crash (a lot of fsck repair... local cache
> and other's features), maybe a gfs ocfs or another is a better
> solution...

No, for _our_ use case (replicated disks for VMs running under Xen
with live migration) the filesystem just _does_ _not_ _matter_ _at_
_all_. Due to the way Xen live migration works, there is only one
writer at any one time: the VM "owning" the virtual disk provided
by drbd. 

To illustrate the point, a very short summary of what happens during
Xen live migration in our setup:
 - VM is to be migrated from host A to host B, with the virtual block
   device for the instance being provided by a drbd pair running on
   those hosts
 - host A/B are configured primary/secondary
 - we reconfigure drbd to primary/primary
 - start Xen live migration
 - Xen creates a target VM on host B, this VM is not yet running
 - Xen syncs live VM memory from host A to host B
 - when most of the memory is synced over, Xen suspends execution of
   the VM on host A
 - Xen copies the remaining dirty VM memory from host A to host B
 - Xen resumes VM execution on host B, destroys the source VM
   on host A, Xen live migration is completed
 - we reconfigure drbd on hosts A/B to secondary/primary
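
On the drbd side, the dance above boils down to a couple of drbdadm calls
wrapped around the migration. A sketch (resource name r0, VM name and
hostnames are hypothetical, and the resource needs allow-two-primaries
set in its net section for the temporary dual-primary to work):

```
# On host B: promote the secondary, so both sides are primary
ssh hostB drbdadm primary r0

# Live-migrate the VM from host A to host B
xm migrate --live vm1 hostB

# On host A: demote the old primary back to secondary
drbdadm secondary r0
```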

There is no concurrent access to the virtual block device here anywhere.
And the only reason we go primary/primary during live migration is that
for Xen to attach the disks to the target VM, they have to be available
and accessible on the target node - as well as on the source node where
they are currently attached to the source VM.

Now, if you were doing things like, say, using a primary/primary drbd
setup for NFS servers serving in parallel from two hosts, then yes, 
you'd have to take special steps with a proper parallel filesystem
to avoid corruption. But this is a completely different problem.
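
For that kind of long-lived dual-primary setup, the usual approach is a
cluster filesystem on top of the drbd device, e.g. OCFS2. A sketch
(device path and mount point are hypothetical; OCFS2 also needs its own
cluster stack, o2cb, configured on both nodes first):

```
# Once, from either node, with drbd connected:
mkfs.ocfs2 -L shared /dev/drbd0

# Then on both nodes, each running drbd in primary mode:
mount -t ocfs2 /dev/drbd0 /srv/shared
```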

Kind regards,
          Alex.
> 
> 2011/1/31 Alexander Schreiber <als@thangorodrim.de>:
> > On Mon, Jan 31, 2011 at 06:42:44AM -0200, Denis wrote:
> >> 2011/1/29 Alexander Schreiber <als@thangorodrim.de>:
> >> > On Sat, Jan 29, 2011 at 12:23:14PM -0200, Denis wrote:
> >> >> 2011/1/29 Alexander Schreiber <als@thangorodrim.de>
> >> >>
> >> >> >
> >> >> > plain disk performance for writes, while reads should be reasonably
> >> >> > close to the plain disk performance - drbd optimizes reads by just reading
> >> >> > from the local disk if it can.
> >> >> >
> >> >> >
> >> >>  However, I have not used it with active-active fashion. Have you? if yes,
> >> >> what is your overall experience?
> >> >
> >> > We are using drbd to provide mirrored disks for virtual machines running
> >> > under Xen. 99% of the time, the drbd devices run in primary/secondary
> >> > mode (aka active/passive), but they are switched to primary/primary
> >> > (aka active/active) for live migrations of domains, as that needs the
> >> > disks to be available on both nodes. From our experience, if the drbd
> >> > device is healthy, this is very reliable. No experience with running
> >> > drbd in primary/primary config for any extended period of time, though
> >> > (the live migrations are usually over after a few seconds to a minute at
> >> > most, then the drbd devices go back to primary/secondary).
> >>
> >> What filesystem are you using to enable the primary-primary mode? Have
> >> you evaluated it against any other available option?
> >
> > The filesystem is whatever the VM is using, usually ext3. But the
> > filesystem doesn't matter in our use case at all, because:
> >  - the backing store for drbd  are logical volumes
> >  - the drbd block devices are directly exported as block devices
> >   to the VMs
> > The filesystem is only active inside the VM - and the VM is not aware of
> > the drbd primary/secondary -> primary/primary -> primary/secondary dance
> > that happens "outside" to enable live migration.

-- 
"Opportunity is missed by most people because it is dressed in overalls and
 looks like work."                                      -- Thomas A. Edison
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: raid over ethernet
  2011-01-31 16:15                 ` Alexander Schreiber
@ 2011-01-31 17:37                   ` Roberto Spadim
  0 siblings, 0 replies; 22+ messages in thread
From: Roberto Spadim @ 2011-01-31 17:37 UTC (permalink / raw)
  To: Alexander Schreiber; +Cc: Denis, Mikael Abrahamsson, Linux-RAID

nice, you don't have two writers.

2011/1/31 Alexander Schreiber <als@thangorodrim.de>:
> On Mon, Jan 31, 2011 at 12:45:31PM -0200, Roberto Spadim wrote:
>> i think filesystem is a problem...
>> you can't have two writers over a filesystem that allow only one, or
>> you will have filesystem crash (a lot of fsck repair... local cache
>> and other's features), maybe a gfs ocfs or another is a better
>> solution...
>
> No, for _our_ use case (replicated disks for VMs running under Xen
> with live migration) the fileystem just _does_ _not_ _matter_ _at_
> _all_. Due to the way Xen live migration works, there is only one
> writer at any one time: the VM "owning" the virtual disk provided
> by drbd.
>
> To illustrate the point, a very short summary of what happens during
> Xen live migration in our setup:
>  - VM is to be migrated from host A to host B, with the virtual block
>   device for the instance being provided by a drbd pair running on
>   those hosts
>  - host A/B are configured primary/secondary
>  - we reconfigure drbd to primary/primary
>  - start Xen live migration
>  - Xen creates a target VM on host B, this VM is not yet running
>  - Xen syncs live VM memory from host A to host B
>  - when most of the memory is synced over, Xen suspends execution of
>   the VM on host A
>  - Xen copies the remaining dirty VM memory from host A to host B
>  - Xen resumes VM execution on host B, destroys the source VM
>   on host A, Xen live migration is completed
>  - we reconfigure drbd on hosts A/B to secondary/primary
>
> There is no concurrent access to the virtual block device here anywhere.
> And the only reason we go primary/primary during live migration is that
> for Xen to attach the disks to the target VM, they have to be available
> and accessible on the target node - as well as on the source node where
> they are currently attached to the source VM.
>
> Now, if you were doing things like, say, use an primary/primary drbd
> setup for NFS servers serving in parallel from two hosts, then yes,
> you'd have to take special steps with a proper parallel filesystem
> to avoid corruption. But this is a completely different problem.
>
> Kidn regards,
>          Alex.
>>
>> 2011/1/31 Alexander Schreiber <als@thangorodrim.de>:
>> > On Mon, Jan 31, 2011 at 06:42:44AM -0200, Denis wrote:
>> >> 2011/1/29 Alexander Schreiber <als@thangorodrim.de>:
>> >> > On Sat, Jan 29, 2011 at 12:23:14PM -0200, Denis wrote:
>> >> >> 2011/1/29 Alexander Schreiber <als@thangorodrim.de>
>> >> >>
>> >> >> >
>> >> >> > plain disk performance for writes, while reads should be reasonably
>> >> >> > close to the plain disk performance - drbd optimizes reads by just reading
>> >> >> > from the local disk if it can.
>> >> >> >
>> >> >> >
>> >> >>  However, I have not used it with active-active fashion. Have you? if yes,
>> >> >> what is your overall experience?
>> >> >
>> >> > We are using drbd to provide mirrored disks for virtual machines running
>> >> > under Xen. 99% of the time, the drbd devices run in primary/secondary
>> >> > mode (aka active/passive), but they are switched to primary/primary
>> >> > (aka active/active) for live migrations of domains, as that needs the
>> >> > disks to be available on both nodes. From our experience, if the drbd
>> >> > device is healthy, this is very reliable. No experience with running
>> >> > drbd in primary/primary config for any extended period of time, though
>> >> > (the live migrations are usually over after a few seconds to a minute at
>> >> > most, then the drbd devices go back to primary/secondary).
>> >>
>> >> What filesystem are you using to enable the primary-primary mode? Have
>> >> you evaluated it against any other available option?
>> >
>> > The filesystem is whatever the VM is using, usually ext3. But the
>> > filesystem doesn't matter in our use case at all, because:
>> >  - the backing store for drbd  are logical volumes
>> >  - the drbd block devices are directly exported as block devices
>> >   to the VMs
>> > The filesystem is only active inside the VM - and the VM is not aware of
>> > the drbd primary/secondary -> primary/primary -> primary/secondary dance
>> > that happens "outside" to enable live migration.
>
> --
> "Opportunity is missed by most people because it is dressed in overalls and
>  looks like work."                                      -- Thomas A. Edison
>
>



-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2011-01-31 17:37 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-01-29  1:58 raid over ethernet Roberto Spadim
2011-01-29  5:41 ` Jérôme Poulin
2011-01-29  6:42   ` Roberto Spadim
2011-01-29 13:29     ` Alexander Schreiber
2011-01-29  6:42 ` Mikael Abrahamsson
2011-01-29  6:44   ` Roberto Spadim
2011-01-29  6:48     ` Roberto Spadim
     [not found]       ` <AANLkTikdahgMoJjGr2otTS70LSM77GNpW_vAkZf15Kph@mail.gmail.com>
2011-01-29 11:47         ` Roberto Spadim
2011-01-29 13:34     ` Alexander Schreiber
     [not found]       ` <AANLkTi=6ridRPnHpfdOC=f2_ESndSARmQRkvT_shYO3s@mail.gmail.com>
2011-01-29 14:25         ` Denis
2011-01-29 21:08         ` Alexander Schreiber
2011-01-29 21:54           ` John Robinson
2011-01-29 23:04             ` Stan Hoeppner
2011-01-29 23:06             ` Miles Fidelman
2011-01-30  1:43             ` Alexander Schreiber
2011-01-31  8:42           ` Denis
2011-01-31 13:03             ` Alexander Schreiber
2011-01-31 14:45               ` Roberto Spadim
2011-01-31 16:15                 ` Alexander Schreiber
2011-01-31 17:37                   ` Roberto Spadim
2011-01-29 15:30     ` Spelic
2011-01-29 18:34   ` David Brown
