* raid over ethernet
@ 2011-01-29  1:58 Roberto Spadim
  2011-01-29  5:41 ` Jérôme Poulin
  2011-01-29  6:42 ` Mikael Abrahamsson
  0 siblings, 2 replies; 22+ messages in thread
From: Roberto Spadim @ 2011-01-29  1:58 UTC (permalink / raw)
To: Linux-RAID

Hi guys, I was thinking about RAID over Ethernet... is there a solution
to make a synchronous replica of my filesystem? It's no problem if my
primary server goes down: I can mount my replica, fsck it, and continue
with the available data.
I was reading about nbd; does anyone have more ideas?

-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial
* Re: raid over ethernet
  2011-01-29  1:58 raid over ethernet Roberto Spadim
@ 2011-01-29  5:41 ` Jérôme Poulin
  2011-01-29  6:42   ` Roberto Spadim
  2011-01-29  6:42 ` Mikael Abrahamsson
  1 sibling, 1 reply; 22+ messages in thread
From: Jérôme Poulin @ 2011-01-29  5:41 UTC (permalink / raw)
To: Roberto Spadim; +Cc: Linux-RAID

DRBD: http://www.drbd.org/

Sent from my mobile device.

Jérôme Poulin
Solutions G.A.

On 2011-01-28, at 20:58, Roberto Spadim <roberto@spadim.com.br> wrote:

> hi guys, i was thinking about raid over ethernet... there's a solution
> to make a syncronous replica of my filesystem? no problem if my
> primary server get down, i can mout my replica fsck it and continue
> with available data
> i was reading about nbd, anyone have more ideas?
>
> --
> Roberto Spadim
> Spadim Technology / SPAEmpresarial
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: raid over ethernet
  2011-01-29  5:41 ` Jérôme Poulin
@ 2011-01-29  6:42   ` Roberto Spadim
  2011-01-29 13:29     ` Alexander Schreiber
  0 siblings, 1 reply; 22+ messages in thread
From: Roberto Spadim @ 2011-01-29  6:42 UTC (permalink / raw)
To: Jérôme Poulin; +Cc: Linux-RAID

Is it better than nbd+mdadm?

2011/1/29 Jérôme Poulin <jeromepoulin@gmail.com>:
> DRBD: http://www.drbd.org/
>
> Envoyé de mon appareil mobile.
>
> Jérôme Poulin
> Solutions G.A.
>
> On 2011-01-28, at 20:58, Roberto Spadim <roberto@spadim.com.br> wrote:
>
>> hi guys, i was thinking about raid over ethernet... there's a solution
>> to make a syncronous replica of my filesystem? no problem if my
>> primary server get down, i can mout my replica fsck it and continue
>> with available data
>> i was reading about nbd, anyone have more ideas?
>>
>> --
>> Roberto Spadim
>> Spadim Technology / SPAEmpresarial

-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial
* Re: raid over ethernet
  2011-01-29  6:42   ` Roberto Spadim
@ 2011-01-29 13:29     ` Alexander Schreiber
  0 siblings, 0 replies; 22+ messages in thread
From: Alexander Schreiber @ 2011-01-29 13:29 UTC (permalink / raw)
To: Roberto Spadim; +Cc: Jérôme Poulin, Linux-RAID

On Sat, Jan 29, 2011 at 04:42:16AM -0200, Roberto Spadim wrote:
> is it better than nbd+mdadm?

Definitely. We are using drbd replicated disks on a _lot_ of machines,
with all kinds of outside events: disk failures, network failures,
machine failures of various interesting variants. Despite this kind of
pounding, drbd turned out to be very robust, with data loss happening
very rarely (well, with some combined failures you are just plain
screwed - that's why one has backups).

Kind regards,
          Alex.
-- 
"Opportunity is missed by most people because it is dressed in overalls and
 looks like work."                                      -- Thomas A. Edison
* Re: raid over ethernet
  2011-01-29  1:58 raid over ethernet Roberto Spadim
  2011-01-29  5:41 ` Jérôme Poulin
@ 2011-01-29  6:42 ` Mikael Abrahamsson
  2011-01-29  6:44   ` Roberto Spadim
  2011-01-29 18:34   ` David Brown
  1 sibling, 2 replies; 22+ messages in thread
From: Mikael Abrahamsson @ 2011-01-29  6:42 UTC (permalink / raw)
To: Roberto Spadim; +Cc: Linux-RAID

On Fri, 28 Jan 2011, Roberto Spadim wrote:

> hi guys, i was thinking about raid over ethernet... there's a solution
> to make a syncronous replica of my filesystem? no problem if my
> primary server get down, i can mout my replica fsck it and continue
> with available data
> i was reading about nbd, anyone have more ideas?

Look into AoE (ATA over Ethernet).

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se
* Re: raid over ethernet
  2011-01-29  6:42 ` Mikael Abrahamsson
@ 2011-01-29  6:44   ` Roberto Spadim
  2011-01-29  6:48     ` Roberto Spadim
                        ` (2 more replies)
  2011-01-29 18:34   ` David Brown
  1 sibling, 3 replies; 22+ messages in thread
From: Roberto Spadim @ 2011-01-29  6:44 UTC (permalink / raw)
To: Mikael Abrahamsson; +Cc: Linux-RAID

Faster than nbd?

2011/1/29 Mikael Abrahamsson <swmike@swm.pp.se>:
> On Fri, 28 Jan 2011, Roberto Spadim wrote:
>
>> hi guys, i was thinking about raid over ethernet... there's a solution
>> to make a syncronous replica of my filesystem? no problem if my
>> primary server get down, i can mout my replica fsck it and continue
>> with available data
>> i was reading about nbd, anyone have more ideas?
>
> Look into AoE (ATA over Ethernet).
>
> --
> Mikael Abrahamsson    email: swmike@swm.pp.se

-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial
* Re: raid over ethernet
  2011-01-29  6:44   ` Roberto Spadim
@ 2011-01-29  6:48     ` Roberto Spadim
       [not found]       ` <AANLkTikdahgMoJjGr2otTS70LSM77GNpW_vAkZf15Kph@mail.gmail.com>
  2011-01-29 13:34     ` Alexander Schreiber
  2011-01-29 15:30   ` Spelic
  2 siblings, 1 reply; 22+ messages in thread
From: Roberto Spadim @ 2011-01-29  6:48 UTC (permalink / raw)
To: Mikael Abrahamsson; +Cc: Linux-RAID

Better than drbd?

2011/1/29 Roberto Spadim <roberto@spadim.com.br>:
> faster than nbd?
>
> 2011/1/29 Mikael Abrahamsson <swmike@swm.pp.se>:
>> On Fri, 28 Jan 2011, Roberto Spadim wrote:
>>
>>> hi guys, i was thinking about raid over ethernet... there's a solution
>>> to make a syncronous replica of my filesystem? no problem if my
>>> primary server get down, i can mout my replica fsck it and continue
>>> with available data
>>> i was reading about nbd, anyone have more ideas?
>>
>> Look into AoE (ATA over Ethernet).
>>
>> --
>> Mikael Abrahamsson    email: swmike@swm.pp.se

-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial
[parent not found: <AANLkTikdahgMoJjGr2otTS70LSM77GNpW_vAkZf15Kph@mail.gmail.com>]
* Re: raid over ethernet
       [not found] ` <AANLkTikdahgMoJjGr2otTS70LSM77GNpW_vAkZf15Kph@mail.gmail.com>
@ 2011-01-29 11:47   ` Roberto Spadim
  0 siblings, 0 replies; 22+ messages in thread
From: Roberto Spadim @ 2011-01-29 11:47 UTC (permalink / raw)
To: Peter Chacko; +Cc: Mikael Abrahamsson, Linux-RAID

> Managing the combination of nbd and mdraid is complicated.

Complicated = drbd works?

2011/1/29 Peter Chacko <peterchacko35@gmail.com>:
> AoE is not routable, and has no replication. It's not used for DRBD or NBD.
> AoE is best if you want to implement the cheapest SAN in the local network.
> For the original purpose, DRBD is the best. Managing the combination of nbd
> and mdraid is complicated.
> thanks.
> Peter Chacko,
> Athinio data systems.
>
> On Sat, Jan 29, 2011 at 12:18 PM, Roberto Spadim <roberto@spadim.com.br>
> wrote:
>>
>> better than drbd?
>>
>> 2011/1/29 Roberto Spadim <roberto@spadim.com.br>:
>>> faster than nbd?
>>>
>>> 2011/1/29 Mikael Abrahamsson <swmike@swm.pp.se>:
>>>> On Fri, 28 Jan 2011, Roberto Spadim wrote:
>>>>
>>>>> hi guys, i was thinking about raid over ethernet... there's a solution
>>>>> to make a syncronous replica of my filesystem? no problem if my
>>>>> primary server get down, i can mout my replica fsck it and continue
>>>>> with available data
>>>>> i was reading about nbd, anyone have more ideas?
>>>>
>>>> Look into AoE (ATA over Ethernet).
>>>>
>>>> --
>>>> Mikael Abrahamsson    email: swmike@swm.pp.se

-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial
* Re: raid over ethernet
  2011-01-29  6:44   ` Roberto Spadim
  2011-01-29  6:48     ` Roberto Spadim
@ 2011-01-29 13:34     ` Alexander Schreiber
       [not found]       ` <AANLkTi=6ridRPnHpfdOC=f2_ESndSARmQRkvT_shYO3s@mail.gmail.com>
  2011-01-29 15:30   ` Spelic
  2 siblings, 1 reply; 22+ messages in thread
From: Alexander Schreiber @ 2011-01-29 13:34 UTC (permalink / raw)
To: Roberto Spadim; +Cc: Mikael Abrahamsson, Linux-RAID

On Sat, Jan 29, 2011 at 04:44:05AM -0200, Roberto Spadim wrote:
> faster than nbd?

I don't know how drbd compares in speed to nbd, but drbd is obviously
slower than plain disks, especially if you care about your data. In the
only sensible operating mode (synchronous writes to the underlying block
devices), the speed (both bandwidth and latency) depends on your disks
and your network connection (so you had better get at least a Gigabit
link). Depending on your particular setup, you'll probably get 50-60% of
the plain disk performance for writes, while reads should be reasonably
close to the plain disk performance - drbd optimizes reads by just
reading from the local disk if it can.

Kind regards,
          Alex.
-- 
"Opportunity is missed by most people because it is dressed in overalls and
 looks like work."                                      -- Thomas A. Edison
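[For reference, the fully synchronous mode Alexander describes is DRBD's
protocol C. A minimal resource definition might look roughly like the
sketch below (hostnames, addresses, and device names are invented for
illustration; check drbd.conf(5) for your DRBD version):]

```
resource r0 {
  protocol C;              # fully synchronous: write completes only after
                           # both nodes have it on stable storage
  device    /dev/drbd0;    # the replicated block device
  disk      /dev/sdb1;     # local backing store
  meta-disk internal;
  on alpha { address 192.168.1.1:7788; }
  on beta  { address 192.168.1.2:7788; }
}
```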
[parent not found: <AANLkTi=6ridRPnHpfdOC=f2_ESndSARmQRkvT_shYO3s@mail.gmail.com>]
* raid over ethernet
       [not found]     ` <AANLkTi=6ridRPnHpfdOC=f2_ESndSARmQRkvT_shYO3s@mail.gmail.com>
@ 2011-01-29 14:25       ` Denis
  2011-01-29 21:08       ` Alexander Schreiber
  1 sibling, 0 replies; 22+ messages in thread
From: Denis @ 2011-01-29 14:25 UTC (permalink / raw)
To: Linux-RAID; +Cc: Roberto Spadim, Mikael Abrahamsson, Alexander Schreiber

Ouch, HTML - my bad.

---------- Forwarded message ----------
From: Denis <denismpa@gmail.com>
Date: 2011/1/29
Subject: Re: raid over ethernet
To: Alexander Schreiber <als@thangorodrim.de>
Cc: Roberto Spadim <roberto@spadim.com.br>, Mikael Abrahamsson
<swmike@swm.pp.se>, Linux-RAID <linux-raid@vger.kernel.org>

2011/1/29 Roberto Spadim <roberto@spadim.com.br>
>
>> Manging the combination of nbd and mdraid is complicated.
>
> complicated = drbd work?

I have been using drbd for a long time and it is quite easy to
implement, manage, and use. The main purpose of all the applications I
have used it for was high availability, and it works just fine. It is
also really cool to see it integrated with heartbeat, which will manage
mounting the partition on one node or the other, according to your
policy and node availability.

2011/1/29 Alexander Schreiber <als@thangorodrim.de>
>
> plain disk performance for writes, while reads should be reasonably
> close to the plain disk performance - drbd optimizes reads by just reading
> from the local disk if it can.
>

However, I have not used it in an active-active fashion. Have you? If
yes, what is your overall experience?

Cheers,
-- 
Denis Anjos,
www.versatushpc.com.br
* Re: raid over ethernet
       [not found]     ` <AANLkTi=6ridRPnHpfdOC=f2_ESndSARmQRkvT_shYO3s@mail.gmail.com>
  2011-01-29 14:25       ` Denis
@ 2011-01-29 21:08       ` Alexander Schreiber
  2011-01-29 21:54         ` John Robinson
  2011-01-31  8:42         ` Denis
  1 sibling, 2 replies; 22+ messages in thread
From: Alexander Schreiber @ 2011-01-29 21:08 UTC (permalink / raw)
To: Denis; +Cc: Roberto Spadim, Mikael Abrahamsson, Linux-RAID

On Sat, Jan 29, 2011 at 12:23:14PM -0200, Denis wrote:
> 2011/1/29 Alexander Schreiber <als@thangorodrim.de>
>
>> plain disk performance for writes, while reads should be reasonably
>> close to the plain disk performance - drbd optimizes reads by just reading
>> from the local disk if it can.
>
> However, I have not used it with active-active fashion. Have you? if yes,
> what is your overall experience?

We are using drbd to provide mirrored disks for virtual machines running
under Xen. 99% of the time, the drbd devices run in primary/secondary
mode (aka active/passive), but they are switched to primary/primary
(aka active/active) for live migrations of domains, as that needs the
disks to be available on both nodes. From our experience, if the drbd
device is healthy, this is very reliable. No experience with running
drbd in primary/primary config for any extended period of time, though
(the live migrations are usually over after a few seconds to a minute at
most, then the drbd devices go back to primary/secondary).

Kind regards,
          Alex.
-- 
"Opportunity is missed by most people because it is dressed in overalls and
 looks like work."                                      -- Thomas A. Edison
* Re: raid over ethernet
  2011-01-29 21:08       ` Alexander Schreiber
@ 2011-01-29 21:54         ` John Robinson
  2011-01-29 23:04           ` Stan Hoeppner
                              ` (2 more replies)
  1 sibling, 3 replies; 22+ messages in thread
From: John Robinson @ 2011-01-29 21:54 UTC (permalink / raw)
To: Alexander Schreiber; +Cc: Linux-RAID

On 29/01/2011 21:08, Alexander Schreiber wrote:
> We are using drbd to provide mirrored disks for virtual machines running
> under Xen. 99% of the time, the drbd devices run in primary/secondary
> mode (aka active/passive), but they are switched to primary/primary
> (aka active/active) for live migrations of domains, as that needs the
> disks to be available on both nodes. From our experience, if the drbd
> device is healthy, this is very reliable. No experience with running
> drbd in primary/primary config for any extended period of time, though
> (the live migrations are usually over after a few seconds to a minute at
> most, then the drbd devices go back to primary/secondary).

Now that is interesting, to me at least. More as a thought experiment
for now, I was wondering how one would go about setting up a small
cluster of commodity servers (maybe 8 machines) running Xen (or perhaps
now KVM) VMs, such that if one (or potentially two) of the machines
died, the VMs could be picked up by the other machines in the cluster,
and only using locally-attached SATA/SAS discs in each machine.

I guess I'm talking about RAIN or RAIS rather than RAID so maybe I'd
better start reading the Wikipedia pages on those and not talk about it
on this list...

Cheers,

John.
* Re: raid over ethernet
  2011-01-29 21:54         ` John Robinson
@ 2011-01-29 23:04           ` Stan Hoeppner
  2011-01-29 23:06           ` Miles Fidelman
  2011-01-30  1:43           ` Alexander Schreiber
  2 siblings, 0 replies; 22+ messages in thread
From: Stan Hoeppner @ 2011-01-29 23:04 UTC (permalink / raw)
To: John Robinson; +Cc: Alexander Schreiber, Linux-RAID

John Robinson put forth on 1/29/2011 3:54 PM:

> Now that is interesting, to me at least. More as a thought experiment for
> now, I was wondering how one would go about setting up a small cluster of
> commodity servers (maybe 8 machines) running Xen (or perhaps now KVM) VMs,
> such that if one (or potentially two) of the machines died, the VMs could
> be picked up by the other machines in the cluster, and only using
> locally-attached SATA/SAS discs in each machine.

Doing N-way active replication with DRBD increases network utilization
substantially. With two DRBD active nodes you will have a maximum of _2_
simultaneous data streams, one in each direction. With 8 active nodes
you will have a maximum of _56_ simultaneous data streams. Your scenario
requires all nodes be active. This may work for a hobby cluster or
something with a very low volume of data being written to disk. This
solution most likely won't scale for a cluster with any amount of real
traffic.

GbE peaks at 100 MB/s. Therefore each node will have only about 12 MB/s
of bidirectional bandwidth for each other cluster member, if my math is
correct. A single SATA disk runs at about 80-120 MB/s, so your networked
DRBD disk bandwidth is about 1/7th to 1/10th that of a single local
disk. In a 2-node cluster it's closer to 1:1. For your scenario to
actually be feasible, you'd need at least bonded quad GbE interfaces if
not single 10 GbE interfaces to get all the bandwidth you'd need.

You'd be _MUCH_ better off using 2 active DRBD mirrored NFS servers with
GFS2 filesystems and having the aforementioned 8 nodes do their data
sharing via NFS. In this setup each node only writes once (to NFS),
dramatically reducing the network bandwidth required per node, with only
16 maximum data streams instead of 56. If you need more bandwidth or
IOPS than a single-disk NFS server can produce, simply RAID 4-10 disks
on each NFS server via RAID 10, then mirror the two RAIDs with DRBD. You
may need 2-4 GbE interfaces between the two NFS servers just for DRBD
traffic, but the cost of that is much less than having the same number
of interfaces in each of 8 cluster nodes.

This will also give you much better performance after a node or two
fails and you have to boot their VM guests on other hosts. Having fast
central RAID storage will allow those guests to boot much more quickly
and without causing degraded performance on the other nodes due to lack
of disk bandwidth in your suggested model.

-- 
Stan
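[Stan's arithmetic can be checked quickly; the 100 MB/s GbE figure and
the 8-node count are his assumptions, reused here as-is:]

```shell
NODES=8        # active DRBD nodes in John's thought experiment
GBE_MBS=100    # rough usable GbE throughput, MB/s (Stan's figure)

# every node replicates to every other node, in both directions
STREAMS=$(( NODES * (NODES - 1) ))

# each node's single GbE link split roughly NODES ways
PER_PEER=$(( GBE_MBS / NODES ))

echo "streams=$STREAMS per_peer=${PER_PEER}MB/s"
```

This reproduces the 56 simultaneous streams and the roughly 12 MB/s
per peer that Stan quotes.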
* Re: raid over ethernet
  2011-01-29 21:54         ` John Robinson
  2011-01-29 23:04           ` Stan Hoeppner
@ 2011-01-29 23:06           ` Miles Fidelman
  2011-01-30  1:43           ` Alexander Schreiber
  2 siblings, 0 replies; 22+ messages in thread
From: Miles Fidelman @ 2011-01-29 23:06 UTC (permalink / raw)
Cc: Linux-RAID

John Robinson wrote:
> Now that is interesting, to me at least. More as a thought experiment
> for now, I was wondering how one would go about setting up a small
> cluster of commodity servers (maybe 8 machines) running Xen (or
> perhaps now KVM) VMs, such that if one (or potentially two) of the
> machines died, the VMs could be picked up by the other machines in the
> cluster, and only using locally-attached SATA/SAS discs in each machine.

I do that now - albeit only on a 2-node cluster. DRBD works just fine
using locally attached drives.

-- 
In theory, there is no difference between theory and practice.
In<fnord> practice, there is.   .... Yogi Berra
* Re: raid over ethernet
  2011-01-29 21:54         ` John Robinson
  2011-01-29 23:04           ` Stan Hoeppner
  2011-01-29 23:06           ` Miles Fidelman
@ 2011-01-30  1:43           ` Alexander Schreiber
  2 siblings, 0 replies; 22+ messages in thread
From: Alexander Schreiber @ 2011-01-30  1:43 UTC (permalink / raw)
To: John Robinson; +Cc: Linux-RAID

On Sat, Jan 29, 2011 at 09:54:55PM +0000, John Robinson wrote:
> Now that is interesting, to me at least. More as a thought
> experiment for now, I was wondering how one would go about setting
> up a small cluster of commodity servers (maybe 8 machines) running
> Xen (or perhaps now KVM) VMs, such that if one (or potentially two)
> of the machines died, the VMs could be picked up by the other
> machines in the cluster, and only using locally-attached SATA/SAS
> discs in each machine.
>
> I guess I'm talking about RAIN or RAIS rather than RAID so maybe I'd
> better start reading the Wikipedia pages on those and not talk about
> it on this list...

For the "survive single node total machine failure" case your problem
has already been solved: http://code.google.com/p/ganeti/

We run a large number of clusters with that, and the VMs routinely
survive disk failures and recover (come back from what looks like a
power failure to the VM) from node failure.

Kind regards,
          Alex.
-- 
"Opportunity is missed by most people because it is dressed in overalls and
 looks like work."                                      -- Thomas A. Edison
* Re: raid over ethernet
  2011-01-29 21:08       ` Alexander Schreiber
  2011-01-29 21:54         ` John Robinson
@ 2011-01-31  8:42         ` Denis
  2011-01-31 13:03           ` Alexander Schreiber
  1 sibling, 1 reply; 22+ messages in thread
From: Denis @ 2011-01-31  8:42 UTC (permalink / raw)
To: Alexander Schreiber; +Cc: Roberto Spadim, Mikael Abrahamsson, Linux-RAID

2011/1/29 Alexander Schreiber <als@thangorodrim.de>:
> On Sat, Jan 29, 2011 at 12:23:14PM -0200, Denis wrote:
>> However, I have not used it with active-active fashion. Have you? if yes,
>> what is your overall experience?
>
> We are using drbd to provide mirrored disks for virtual machines running
> under Xen. 99% of the time, the drbd devices run in primary/secondary
> mode (aka active/passive), but they are switched to primary/primary
> (aka active/active) for live migrations of domains, as that needs the
> disks to be available on both nodes. From our experience, if the drbd
> device is healthy, this is very reliable. No experience with running
> drbd in primary/primary config for any extended period of time, though
> (the live migrations are usually over after a few seconds to a minute at
> most, then the drbd devices go back to primary/secondary).

What filesystem are you using to enable the primary-primary mode? Have
you evaluated it against any other available option?

Cheers!
-- 
Denis Anjos,
www.versatushpc.com.br
* Re: raid over ethernet
  2011-01-31  8:42         ` Denis
@ 2011-01-31 13:03           ` Alexander Schreiber
  2011-01-31 14:45             ` Roberto Spadim
  0 siblings, 1 reply; 22+ messages in thread
From: Alexander Schreiber @ 2011-01-31 13:03 UTC (permalink / raw)
To: Denis; +Cc: Roberto Spadim, Mikael Abrahamsson, Linux-RAID

On Mon, Jan 31, 2011 at 06:42:44AM -0200, Denis wrote:
> What filesystem are you using to enable the primary-primary mode? Have
> you evaluated it against any other available option?

The filesystem is whatever the VM is using, usually ext3. But the
filesystem doesn't matter in our use case at all, because:
 - the backing store for drbd are logical volumes
 - the drbd block devices are directly exported as block devices
   to the VMs

The filesystem is only active inside the VM - and the VM is not aware of
the drbd primary/secondary -> primary/primary -> primary/secondary dance
that happens "outside" to enable live migration.

Kind regards,
          Alex.
-- 
"Opportunity is missed by most people because it is dressed in overalls and
 looks like work."                                      -- Thomas A. Edison
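[As an aside on exporting a DRBD device straight into a Xen domU as
Alexander describes: DRBD ships a Xen block helper script (block-drbd)
that lets the domU config name the resource directly, so Xen itself can
drive the primary/secondary transitions. A sketch of the relevant domU
config line, with the resource name "r0" invented here; see the DRBD
User's Guide chapter on Xen before relying on this:]

```
# domU config fragment; assumes DRBD's block-drbd script is installed
# under /etc/xen/scripts/ on both nodes
disk = [ 'drbd:r0,xvda,w' ]
```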
* Re: raid over ethernet
  2011-01-31 13:03           ` Alexander Schreiber
@ 2011-01-31 14:45             ` Roberto Spadim
  2011-01-31 16:15               ` Alexander Schreiber
  0 siblings, 1 reply; 22+ messages in thread
From: Roberto Spadim @ 2011-01-31 14:45 UTC (permalink / raw)
To: Alexander Schreiber; +Cc: Denis, Mikael Abrahamsson, Linux-RAID

I think the filesystem is a problem... you can't have two writers on a
filesystem that allows only one, or you will get filesystem corruption
(a lot of fsck repairs - local caches and other features); maybe GFS,
OCFS2, or another cluster filesystem is a better solution...

2011/1/31 Alexander Schreiber <als@thangorodrim.de>:
> The filesystem is whatever the VM is using, usually ext3. But the
> filesystem doesn't matter in our use case at all, because:
> - the backing store for drbd are logical volumes
> - the drbd block devices are directly exported as block devices
>   to the VMs
> The filesystem is only active inside the VM - and the VM is not aware of
> the drbd primary/secondary -> primary/primary -> primary/secondary dance
> that happens "outside" to enable live migration.

-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial
* Re: raid over ethernet
  2011-01-31 14:45             ` Roberto Spadim
@ 2011-01-31 16:15               ` Alexander Schreiber
  2011-01-31 17:37                 ` Roberto Spadim
  0 siblings, 1 reply; 22+ messages in thread
From: Alexander Schreiber @ 2011-01-31 16:15 UTC (permalink / raw)
To: Roberto Spadim; +Cc: Denis, Mikael Abrahamsson, Linux-RAID

On Mon, Jan 31, 2011 at 12:45:31PM -0200, Roberto Spadim wrote:
> i think filesystem is a problem...
> you can't have two writers over a filesystem that allow only one, or
> you will have filesystem crash (a lot of fsck repair... local cache
> and other's features), maybe a gfs ocfs or another is a better
> solution...

No, for _our_ use case (replicated disks for VMs running under Xen with
live migration) the filesystem just _does_ _not_ _matter_ _at_ _all_.
Due to the way Xen live migration works, there is only one writer at any
one time: the VM "owning" the virtual disk provided by drbd.

To illustrate the point, a very short summary of what happens during Xen
live migration in our setup:
 - VM is to be migrated from host A to host B, with the virtual block
   device for the instance being provided by a drbd pair running on
   those hosts
 - host A/B are configured primary/secondary
 - we reconfigure drbd to primary/primary
 - start Xen live migration
 - Xen creates a target VM on host B, this VM is not yet running
 - Xen syncs live VM memory from host A to host B
 - when most of the memory is synced over, Xen suspends execution of
   the VM on host A
 - Xen copies the remaining dirty VM memory from host A to host B
 - Xen resumes VM execution on host B, destroys the source VM on
   host A, Xen live migration is completed
 - we reconfigure drbd on hosts A/B to secondary/primary

There is no concurrent access to the virtual block device here anywhere.
And the only reason we go primary/primary during live migration is that for Xen to attach the disks to the target VM, they have to be available and accessible on the target node - as well as on the source node where they are currently attached to the source VM. Now, if you were doing things like, say, use an primary/primary drbd setup for NFS servers serving in parallel from two hosts, then yes, you'd have to take special steps with a proper parallel filesystem to avoid corruption. But this is a completely different problem. Kidn regards, Alex. > > 2011/1/31 Alexander Schreiber <als@thangorodrim.de>: > > On Mon, Jan 31, 2011 at 06:42:44AM -0200, Denis wrote: > >> 2011/1/29 Alexander Schreiber <als@thangorodrim.de>: > >> > On Sat, Jan 29, 2011 at 12:23:14PM -0200, Denis wrote: > >> >> 2011/1/29 Alexander Schreiber <als@thangorodrim.de> > >> >> > >> >> > > >> >> > plain disk performance for writes, while reads should be reasonably > >> >> > close to the plain disk performance - drbd optimizes reads by just reading > >> >> > from the local disk if it can. > >> >> > > >> >> > > >> >> However, I have not used it with active-active fashion. Have you? if yes, > >> >> what is your overall experience? > >> > > >> > We are using drbd to provide mirrored disks for virtual machines running > >> > under Xen. 99% of the time, the drbd devices run in primary/secondary > >> > mode (aka active/passive), but they are switched to primary/primary > >> > (aka active/active) for live migrations of domains, as that needs the > >> > disks to be available on both nodes. From our experience, if the drbd > >> > device is healthy, this is very reliable. No experience with running > >> > drbd in primary/primary config for any extended period of time, though > >> > (the live migrations are usually over after a few seconds to a minute at > >> > most, then the drbd devices go back to primary/secondary). > >> > >> What filesystem are you using to enable the primary-primary mode? 
Have > >> you evaluated it against any other available option? > > > > The filesystem is whatever the VM is using, usually ext3. But the > > filesystem doesn't matter in our use case at all, because: > > - the backing store for drbd are logical volumes > > - the drbd block devices are directly exported as block devices > > to the VMs > > The filesystem is only active inside the VM - and the VM is not aware of > > the drbd primary/secondary -> primary/primary -> primary/secondary dance > > that happens "outside" to enable live migration. -- "Opportunity is missed by most people because it is dressed in overalls and looks like work." -- Thomas A. Edison -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 22+ messages in thread
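[Editor's note] The primary/secondary -> primary/primary -> secondary/primary dance described above boils down to a pair of drbdadm calls bracketing the migration. A rough sketch only - the resource name r0, domain name vm1 and host names are hypothetical, and the drbd resource must allow dual-primary for the intermediate step:

```sh
# drbd.conf (drbd 8.x syntax) must permit dual-primary for the window:
#   resource r0 { net { allow-two-primaries; } ... }

# on host B (currently secondary): promote, giving primary/primary
drbdadm primary r0

# on host A: live-migrate the Xen domain to host B
xm migrate --live vm1 hostB

# on host A (the VM now runs on host B): demote, leaving the pair
# secondary/primary with host B as the single writer
drbdadm secondary r0
```

These commands require a running drbd/Xen pair and are shown only to make the sequence of role changes concrete.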
* Re: raid over ethernet 2011-01-31 16:15 ` Alexander Schreiber @ 2011-01-31 17:37 ` Roberto Spadim 0 siblings, 0 replies; 22+ messages in thread From: Roberto Spadim @ 2011-01-31 17:37 UTC (permalink / raw) To: Alexander Schreiber; +Cc: Denis, Mikael Abrahamsson, Linux-RAID

nice, you don't have two writers.

2011/1/31 Alexander Schreiber <als@thangorodrim.de>:
> On Mon, Jan 31, 2011 at 12:45:31PM -0200, Roberto Spadim wrote:
>> i think the filesystem is a problem...
>> you can't have two writers over a filesystem that allows only one, or
>> you will have filesystem crashes (a lot of fsck repair... local cache
>> and other features), maybe gfs, ocfs or another is a better
>> solution...
>
> No, for _our_ use case (replicated disks for VMs running under Xen
> with live migration) the filesystem just _does_ _not_ _matter_ _at_
> _all_. Due to the way Xen live migration works, there is only one
> writer at any one time: the VM "owning" the virtual disk provided
> by drbd.
>
> To illustrate the point, a very short summary of what happens during
> Xen live migration in our setup:
> - VM is to be migrated from host A to host B, with the virtual block
>   device for the instance being provided by a drbd pair running on
>   those hosts
> - hosts A/B are configured primary/secondary
> - we reconfigure drbd to primary/primary
> - start Xen live migration
> - Xen creates a target VM on host B; this VM is not yet running
> - Xen syncs live VM memory from host A to host B
> - when most of the memory is synced over, Xen suspends execution of
>   the VM on host A
> - Xen copies the remaining dirty VM memory from host A to host B
> - Xen resumes VM execution on host B, destroys the source VM
>   on host A; Xen live migration is completed
> - we reconfigure drbd on hosts A/B to secondary/primary
>
> There is no concurrent access to the virtual block device here anywhere.
> And the only reason we go primary/primary during live migration is that
> for Xen to attach the disks to the target VM, they have to be available
> and accessible on the target node - as well as on the source node where
> they are currently attached to the source VM.
>
> Now, if you were doing things like, say, using a primary/primary drbd
> setup for NFS servers serving in parallel from two hosts, then yes,
> you'd have to take special steps with a proper parallel filesystem
> to avoid corruption. But this is a completely different problem.
>
> Kind regards,
> Alex.
>>
>> 2011/1/31 Alexander Schreiber <als@thangorodrim.de>:
>> > On Mon, Jan 31, 2011 at 06:42:44AM -0200, Denis wrote:
>> >> 2011/1/29 Alexander Schreiber <als@thangorodrim.de>:
>> >> > On Sat, Jan 29, 2011 at 12:23:14PM -0200, Denis wrote:
>> >> >> 2011/1/29 Alexander Schreiber <als@thangorodrim.de>
>> >> >>
>> >> >> >
>> >> >> > plain disk performance for writes, while reads should be reasonably
>> >> >> > close to the plain disk performance - drbd optimizes reads by just reading
>> >> >> > from the local disk if it can.
>> >> >> >
>> >> >> >
>> >> >> However, I have not used it with active-active fashion. Have you? if yes,
>> >> >> what is your overall experience?
>> >> >
>> >> > We are using drbd to provide mirrored disks for virtual machines running
>> >> > under Xen. 99% of the time, the drbd devices run in primary/secondary
>> >> > mode (aka active/passive), but they are switched to primary/primary
>> >> > (aka active/active) for live migrations of domains, as that needs the
>> >> > disks to be available on both nodes. From our experience, if the drbd
>> >> > device is healthy, this is very reliable. No experience with running
>> >> > drbd in primary/primary config for any extended period of time, though
>> >> > (the live migrations are usually over after a few seconds to a minute at
>> >> > most, then the drbd devices go back to primary/secondary).
>> >>
>> >> What filesystem are you using to enable the primary-primary mode? Have
>> >> you evaluated it against any other available option?
>> >
>> > The filesystem is whatever the VM is using, usually ext3. But the
>> > filesystem doesn't matter in our use case at all, because:
>> > - the backing store for drbd are logical volumes
>> > - the drbd block devices are directly exported as block devices
>> >   to the VMs
>> > The filesystem is only active inside the VM - and the VM is not aware of
>> > the drbd primary/secondary -> primary/primary -> primary/secondary dance
>> > that happens "outside" to enable live migration.
>
> --
> "Opportunity is missed by most people because it is dressed in overalls and
> looks like work." -- Thomas A. Edison
>

--
Roberto Spadim
Spadim Technology / SPAEmpresarial

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: raid over ethernet 2011-01-29 6:44 ` Roberto Spadim 2011-01-29 6:48 ` Roberto Spadim 2011-01-29 13:34 ` Alexander Schreiber @ 2011-01-29 15:30 ` Spelic 2 siblings, 0 replies; 22+ messages in thread From: Spelic @ 2011-01-29 15:30 UTC (permalink / raw) To: linux-raid

On 01/29/2011 07:44 AM, Roberto Spadim wrote:
> faster than nbd?

NBD is fast but has one problem: if you lose network connectivity for a
while (TCP drops), there is no recovery I am aware of. I think it unmaps
the disk until user intervention. Or at least that was the situation a
couple of years ago. Actually, for RAID this might even be a good point,
but keep it in mind.

iSCSI seems an obvious alternative. And you can put anything under MD,
I think, but DRBD (without MD) is probably better because it's made
exactly for that purpose.

^ permalink raw reply	[flat|nested] 22+ messages in thread
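[Editor's note] The nbd+md combination discussed in this thread would look roughly like the following. This is only a sketch - the device names, port number and host name are invented - and the reconnect limitation described above still applies if the nbd link drops:

```sh
# on the remote box: export a partition over the network
# (old-style nbd-server invocation: port, then the exported device)
nbd-server 2000 /dev/sdb1

# on the local box: attach the export as a local block device
nbd-client remotehost 2000 /dev/nbd0

# mirror a local partition with the network device; the write-intent
# bitmap keeps resync short after the nbd device is dropped and re-added
mdadm --create /dev/md0 --level=1 --raid-devices=2 \
      --bitmap=internal /dev/sda1 /dev/nbd0
```

These commands require real block devices and root privileges; they are shown only to make the nbd+md layering concrete.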
* Re: raid over ethernet 2011-01-29 6:42 ` Mikael Abrahamsson 2011-01-29 6:44 ` Roberto Spadim @ 2011-01-29 18:34 ` David Brown 1 sibling, 0 replies; 22+ messages in thread From: David Brown @ 2011-01-29 18:34 UTC (permalink / raw) To: linux-raid

On 29/01/11 07:42, Mikael Abrahamsson wrote:
> On Fri, 28 Jan 2011, Roberto Spadim wrote:
>
>> hi guys, i was thinking about raid over ethernet... is there a solution
>> to make a synchronous replica of my filesystem? no problem if my
>> primary server goes down, i can mount my replica, fsck it and continue
>> with the available data
>> i was reading about nbd, anyone have more ideas?
>
> Look into AoE (ATA over Ethernet).
>

I think AoE is limited to fairly direct connections - it doesn't use IP,
and can't be routed (at least not easily - I'm sure it is possible if you
try hard enough). The alternative is iSCSI, which does use IP and can
therefore be routed and passed around over networks. AoE is therefore
slightly more efficient, and iSCSI more flexible.

If you are looking at making a raid1 with an iSCSI or AoE target as one
of the disks, consider using a write-intent bitmap and the --write-mostly
and --write-behind flags.

^ permalink raw reply	[flat|nested] 22+ messages in thread
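[Editor's note] David's suggestion might look like this in practice - a sketch with invented device names, assuming the AoE target appears as /dev/etherd/e0.0. --write-mostly steers reads to the local disk, and --write-behind (which requires a bitmap) lets writes to the remote leg complete asynchronously:

```sh
# local disk first; the network-backed device is marked write-mostly
# so reads are served locally whenever possible, and up to 256 writes
# to it may be outstanding before the array blocks
mdadm --create /dev/md0 --level=1 --raid-devices=2 \
      --bitmap=internal --write-behind=256 \
      /dev/sda1 --write-mostly /dev/etherd/e0.0
```

Note that --write-behind trades durability for latency: an acknowledged write may not yet be on the remote copy when the local node fails.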
end of thread, other threads:[~2011-01-31 17:37 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-01-29  1:58 raid over ethernet Roberto Spadim
2011-01-29  5:41 ` Jérôme Poulin
2011-01-29  6:42 ` Roberto Spadim
2011-01-29 13:29 ` Alexander Schreiber
2011-01-29  6:42 ` Mikael Abrahamsson
2011-01-29  6:44 ` Roberto Spadim
2011-01-29  6:48 ` Roberto Spadim
[not found] ` <AANLkTikdahgMoJjGr2otTS70LSM77GNpW_vAkZf15Kph@mail.gmail.com>
2011-01-29 11:47 ` Roberto Spadim
2011-01-29 13:34 ` Alexander Schreiber
[not found] ` <AANLkTi=6ridRPnHpfdOC=f2_ESndSARmQRkvT_shYO3s@mail.gmail.com>
2011-01-29 14:25 ` Denis
2011-01-29 21:08 ` Alexander Schreiber
2011-01-29 21:54 ` John Robinson
2011-01-29 23:04 ` Stan Hoeppner
2011-01-29 23:06 ` Miles Fidelman
2011-01-30  1:43 ` Alexander Schreiber
2011-01-31  8:42 ` Denis
2011-01-31 13:03 ` Alexander Schreiber
2011-01-31 14:45 ` Roberto Spadim
2011-01-31 16:15 ` Alexander Schreiber
2011-01-31 17:37 ` Roberto Spadim
2011-01-29 15:30 ` Spelic
2011-01-29 18:34 ` David Brown