* Infiniband 40GB
@ 2012-06-03 8:10 Stefan Priebe
2012-06-03 12:56 ` Mark Nelson
0 siblings, 1 reply; 36+ messages in thread
From: Stefan Priebe @ 2012-06-03 8:10 UTC (permalink / raw)
To: ceph-devel
Hi List,
has anybody already tried CEPH over Infiniband 40GB?
Stefan
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Infiniband 40GB
2012-06-03 8:10 Infiniband 40GB Stefan Priebe
@ 2012-06-03 12:56 ` Mark Nelson
2012-06-04 6:22 ` Hannes Reinecke
0 siblings, 1 reply; 36+ messages in thread
From: Mark Nelson @ 2012-06-03 12:56 UTC (permalink / raw)
To: Stefan Priebe; +Cc: ceph-devel
On 6/3/12 3:10 AM, Stefan Priebe wrote:
> Hi List,
>
> has anybody already tried CEPH over Infiniband 40GB?
>
> Stefan
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Stefan,
A couple of folks have done DDR IB. For now you are limited to ipoib
though. If you have the hardware available I'd be really curious what
kind of throughput/latencies you see.
Mark
* Re: Infiniband 40GB
2012-06-03 12:56 ` Mark Nelson
@ 2012-06-04 6:22 ` Hannes Reinecke
2012-06-04 7:26 ` Stefan Priebe - Profihost AG
2012-06-04 12:28 ` Mark Nelson
0 siblings, 2 replies; 36+ messages in thread
From: Hannes Reinecke @ 2012-06-04 6:22 UTC (permalink / raw)
To: Mark Nelson; +Cc: Stefan Priebe, ceph-devel
On 06/03/2012 02:56 PM, Mark Nelson wrote:
> On 6/3/12 3:10 AM, Stefan Priebe wrote:
>> Hi List,
>>
>> has anybody already tried CEPH over Infiniband 40GB?
>>
>> Stefan
>
> Hi Stefan,
>
> A couple of folks have done DDR IB. For now you are limited to
> ipoib though. If you have the hardware available I'd be really
> curious what kind of throughput/latencies you see.
>
Hehe.
Good luck with that.
We've tried on 10GigE with _disastrous_ results.
Up to the point where 1GigE was actually _faster_.
So far we've uncovered two issues:
- intel_idle was/is seriously broken (we've tried on 3.0-stable,
so might've been fixed by now)
- osd-server is calling 'fsync' on each and every write request.
Does wonders for performance ...
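For illustration, the per-request sync cost is easy to reproduce with plain dd (a hypothetical sketch, not the actual osd code path; the file and sizes are made up):

```shell
#!/bin/sh
# Write 400 kB as 100 x 4 kB requests, once buffered and once forcing the
# data to stable storage on every request (oflag=dsync), and time both.
# This only mimics the per-write-sync pattern described above.
f=$(mktemp)
for flags in "" "oflag=dsync"; do
    t0=$(date +%s)
    dd if=/dev/zero of="$f" bs=4k count=100 $flags 2>/dev/null
    echo "flags='$flags' took $(( $(date +%s) - t0 ))s"
done
rm -f "$f"
```

On a rotating disk the dsync pass is typically orders of magnitude slower, which is exactly the effect described above.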
Cheers,
Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare@suse.de +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
* Re: Infiniband 40GB
2012-06-04 6:22 ` Hannes Reinecke
@ 2012-06-04 7:26 ` Stefan Priebe - Profihost AG
2012-06-04 7:39 ` Hannes Reinecke
2012-06-04 12:28 ` Mark Nelson
1 sibling, 1 reply; 36+ messages in thread
From: Stefan Priebe - Profihost AG @ 2012-06-04 7:26 UTC (permalink / raw)
To: Hannes Reinecke; +Cc: Mark Nelson, ceph-devel
On 04.06.2012 08:22, Hannes Reinecke wrote:
> Hehe.
> Good luck with that.
>
> We've tried on 10GigE with _disastrous_ results.
> Up to the point where 1GigE was actually _faster_.
So you mean you've tried 10GbE, or 10Gb IPoIB over Infiniband?
> - osd-server is calling 'fsync' on each and every write request.
> Does wonders for performance ...
Already talked to the ceph guys?
Stefan
* Re: Infiniband 40GB
2012-06-04 7:26 ` Stefan Priebe - Profihost AG
@ 2012-06-04 7:39 ` Hannes Reinecke
2012-06-04 7:53 ` Stefan Priebe - Profihost AG
0 siblings, 1 reply; 36+ messages in thread
From: Hannes Reinecke @ 2012-06-04 7:39 UTC (permalink / raw)
To: Stefan Priebe - Profihost AG; +Cc: Mark Nelson, ceph-devel
On 06/04/2012 09:26 AM, Stefan Priebe - Profihost AG wrote:
> On 04.06.2012 08:22, Hannes Reinecke wrote:
>> Hehe.
>> Good luck with that.
>>
>> We've tried on 10GigE with _disastrous_ results.
>> Up to the point where 1GigE was actually _faster_.
>
> So you mean you've tried 10GbE, or 10Gb IPoIB over Infiniband?
>
>> - osd-server is calling 'fsync' on each and every write request.
>> Does wonders for performance ...
> Already talked to the ceph guys?
>
Still not there yet. Still need to figure out the exact details;
performance regressions are notoriously hard to track.
But yeah, rumours have it we are in contact.
Project management on our side could be improved, though ;)
Cheers,
Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare@suse.de +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
* Re: Infiniband 40GB
2012-06-04 7:39 ` Hannes Reinecke
@ 2012-06-04 7:53 ` Stefan Priebe - Profihost AG
2012-06-04 8:02 ` Hannes Reinecke
0 siblings, 1 reply; 36+ messages in thread
From: Stefan Priebe - Profihost AG @ 2012-06-04 7:53 UTC (permalink / raw)
To: Hannes Reinecke; +Cc: Mark Nelson, ceph-devel
On 04.06.2012 09:39, Hannes Reinecke wrote:
> On 06/04/2012 09:26 AM, Stefan Priebe - Profihost AG wrote:
>> On 04.06.2012 08:22, Hannes Reinecke wrote:
>>> Hehe.
>>> Good luck with that.
>>>
>>> We've tried on 10GigE with _disastrous_ results.
>>> Up to the point where 1GigE was actually _faster_.
>>
>> So you mean you've tried 10GbE, or 10Gb IPoIB over Infiniband?
Could you please answer this question too? Thx.
Cheers,
Stefan
* Re: Infiniband 40GB
2012-06-04 7:53 ` Stefan Priebe - Profihost AG
@ 2012-06-04 8:02 ` Hannes Reinecke
2012-06-04 8:23 ` Stefan Majer
0 siblings, 1 reply; 36+ messages in thread
From: Hannes Reinecke @ 2012-06-04 8:02 UTC (permalink / raw)
To: Stefan Priebe - Profihost AG; +Cc: Mark Nelson, ceph-devel
On 06/04/2012 09:53 AM, Stefan Priebe - Profihost AG wrote:
> On 04.06.2012 09:39, Hannes Reinecke wrote:
>> On 06/04/2012 09:26 AM, Stefan Priebe - Profihost AG wrote:
>>> On 04.06.2012 08:22, Hannes Reinecke wrote:
>>>> Hehe.
>>>> Good luck with that.
>>>>
>>>> We've tried on 10GigE with _disastrous_ results.
>>>> Up to the point where 1GigE was actually _faster_.
>>>
>>> So you mean you've tried 10GbE, or 10Gb IPoIB over Infiniband?
>
> Could you please answer this question too? Thx.
>
This was plain 10GigE, i.e. TCP/IP. Not Infiniband, I'm afraid.
However, given that our problems have not been related to the actual
transport, I'd be very much surprised if they did not occur on
Infiniband as well.
And I would _definitely_ like to hear if someone managed to get any
decent speed (notably write speed) on fast interconnects.
There's always a chance we've messed things up and were just
measuring our crap setup ...
Cheers,
Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare@suse.de +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
* Re: Infiniband 40GB
2012-06-04 8:02 ` Hannes Reinecke
@ 2012-06-04 8:23 ` Stefan Majer
2012-06-04 9:21 ` Yann Dupont
2012-06-05 8:54 ` Stefan Priebe - Profihost AG
0 siblings, 2 replies; 36+ messages in thread
From: Stefan Majer @ 2012-06-04 8:23 UTC (permalink / raw)
To: Hannes Reinecke; +Cc: Stefan Priebe - Profihost AG, Mark Nelson, ceph-devel
Hi Hannes,
our production environment is running on a 10GbE infrastructure. We had a
lot of trouble until we got to where we are today.
We use Intel X520-D2 cards on our OSDs and Nexus switch
infrastructure. All other cards we tested failed horribly.
Some of the problems we encountered have been:
- page allocation failures in the ixgbe driver --> fixed upstream
- problems with jumbo frames; we had to disable TSO, GRO and LRO -->
this is the most obscure thing
- various tuning via sysctl in the net.core and net.ipv4 area --> this
was also the outcome of Stefan's benchmarking odyssey.
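As a sketch, the offload workaround above would look something like this (a dry run that only prints the commands; the interface name and the sysctl values are placeholders, not the exact settings that came out of the benchmarking):

```shell
#!/bin/sh
# Print (rather than apply) the jumbo-frame offload workaround described
# above; run the printed ethtool/sysctl lines as root on the real NIC.
IF=${1:-eth2}
for feat in tso gro lro; do
    echo "ethtool -K $IF $feat off"
done
# Illustrative net.core/net.ipv4 buffer tuning (values are placeholders):
echo "sysctl -w net.core.rmem_max=16777216"
echo "sysctl -w net.ipv4.tcp_rmem='4096 87380 16777216'"
```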
But after all this we are actually quite happy and are only limited by
the speed of the drives (2TB SATA).
The fsync is in fact an fdatasync, which is available in newer glibc. If
you don't use btrfs (we use xfs), you need a recent glibc with
fdatasync support.
hope this helps
Greetings
Stefan
On Mon, Jun 4, 2012 at 10:02 AM, Hannes Reinecke <hare@suse.de> wrote:
> On 06/04/2012 09:53 AM, Stefan Priebe - Profihost AG wrote:
>> On 04.06.2012 09:39, Hannes Reinecke wrote:
>>> On 06/04/2012 09:26 AM, Stefan Priebe - Profihost AG wrote:
>>>> On 04.06.2012 08:22, Hannes Reinecke wrote:
>>>>> Hehe.
>>>>> Good luck with that.
>>>>>
>>>>> We've tried on 10GigE with _disastrous_ results.
>>>>> Up to the point where 1GigE was actually _faster_.
>>>>
>>>> So you mean you've tried 10GbE, or 10Gb IPoIB over Infiniband?
>>
>> Could you please answer this question too? Thx.
>>
> This was plain 10GigE, ie TCP/IP. Not infiniband, I'm afraid.
>
> However, given that our problems have not been related to the actual
> transport I'd be very much surprised if they would not occur on
> Infiniband.
>
> And I would _definitely_ like to hear if someone managed to get any
> decent speed (notably write speed) on fast interconnects.
> There's always a chance we've messed things up and were just
> measuring our crap setup ...
>
> Cheers,
>
> Hannes
> --
> Dr. Hannes Reinecke zSeries & Storage
> hare@suse.de +49 911 74053 688
> SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
Stefan Majer
* Re: Infiniband 40GB
2012-06-04 8:23 ` Stefan Majer
@ 2012-06-04 9:21 ` Yann Dupont
2012-06-04 9:35 ` Alexandre DERUMIER
2012-06-04 9:47 ` Amon Ott
2012-06-05 8:54 ` Stefan Priebe - Profihost AG
1 sibling, 2 replies; 36+ messages in thread
From: Yann Dupont @ 2012-06-04 9:21 UTC (permalink / raw)
To: Stefan Majer
Cc: Hannes Reinecke, Stefan Priebe - Profihost AG, Mark Nelson, ceph-devel
On 04/06/2012 10:23, Stefan Majer wrote:
> Hi Hannes,
>
> our production environment is running on a 10GbE infrastructure. We had a
> lot of trouble until we got to where we are today.
> We use Intel X520-D2 cards on our OSDs and Nexus switch
> infrastructure. All other cards we tested failed horribly.
>
we have Intel Corporation 82599EB 10 Gigabit Dual Port Backplane
Connection (rev 01)... Don't know the 'commercial name'. ixgbe driver.
> Some of the problems we encountered have been:
> - page allocation failures in the ixgbe driver --> fixed upstream
> - problems with jumbo frames; we had to disable TSO, GRO and LRO -->
> this is the most obscure thing
> - various tuning via sysctl in the net.core and net.ipv4 area --> this
> was also the outcome of Stefan's benchmarking odyssey.
Some tuning we made:
-> Turning off the virtualisation extensions in the BIOS. Don't know why,
but they gave us crappy performance. We usually turn them on, because we
use KVM a lot. In our case the OSDs are on bare metal, and disabling the
virtualisation extensions gave us a very big boost.
It may be a BIOS bug in our machines (Dell M610).
-> One of my colleagues played with receive flow steering; the Intel
card supports multiple queues, so it seems we can gain a little with it:
#!/bin/sh
for x in $(seq 0 23); do echo FFFFFFFF > /sys/class/net/eth2/queues/rx-${x}/rps_cpus; done
echo 16384 > /proc/sys/net/core/rps_sock_flow_entries
for x in $(seq 0 23); do echo 16384 > /sys/class/net/eth2/queues/rx-${x}/rps_flow_cnt; done
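A small variant of the script above that derives the CPU mask from the online CPU count instead of hard-coding FFFFFFFF (a sketch; it only prints the mask, and the sysfs path in the comment is the standard one):

```shell
#!/bin/sh
# Build an "all online CPUs" bitmask suitable for rps_cpus instead of
# hard-coding FFFFFFFF (which assumes 32 CPUs).
ncpu=$(getconf _NPROCESSORS_ONLN)
mask=$(printf '%x' $(( (1 << ncpu) - 1 )))
echo "rps_cpus mask for $ncpu CPUs: $mask"
# To apply (as root), write it to each queue, e.g.:
#   for q in /sys/class/net/eth2/queues/rx-*; do echo $mask > $q/rps_cpus; done
```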
>
> But after all this we are actually quite happy and are only limited by
> the speed of the drives (2TB SATA).
> The fsync is in fact an fdatasync, which is available in newer glibc. If
> you don't use btrfs (we use xfs), you need a recent glibc with
> fdatasync support.
Might that explain why we see lousy performance with xfs right now?
That's the main reason we're stuck with btrfs for the moment.
We're using Debian 'stable'; libc is
libc6 2.11.3-3
probably too old?
Cheers,
--
Yann Dupont - Service IRTS, DSI Université de Nantes
Tel : 02.53.48.49.20 - Mail/Jabber : Yann.Dupont@univ-nantes.fr
* Re: Infiniband 40GB
2012-06-04 9:21 ` Yann Dupont
@ 2012-06-04 9:35 ` Alexandre DERUMIER
2012-06-04 9:53 ` Yann Dupont
2012-06-04 9:47 ` Amon Ott
1 sibling, 1 reply; 36+ messages in thread
From: Alexandre DERUMIER @ 2012-06-04 9:35 UTC (permalink / raw)
To: Yann Dupont
Cc: Hannes Reinecke, Stefan Priebe - Profihost AG, Mark Nelson,
ceph-devel, Stefan Majer
Hi,
about this:
>> Turning off the virtualisation extensions in the BIOS. Don't know why,
>> but they gave us crappy performance. We usually turn them on, because we
>> use KVM a lot. In our case the OSDs are on bare metal, and disabling the
>> virtualisation extensions gave us a very big boost.
>> It may be a BIOS bug in our machines (Dell M610).
It could be related to the IOMMU, if you pass intel_iommu=on in grub.
I have already had this kind of problem.
With intel_iommu=on, Linux (completely unrelated to KVM) adds a new level
of protection which didn't exist without an IOMMU: the network card, which
without an IOMMU could write (via DMA) to any memory location, is now
only allowed to write to the memory locations the OS
wants it to write to. Theoretically, this can protect the OS against
various kinds of attacks. But what happens now is that every time
Linux passes a new buffer to the card, it needs to change the IOMMU
mappings. This noticeably slows down I/O, unfortunately.
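A trivial way to check whether this theory applies to a given box (assuming the standard /proc/cmdline location):

```shell
#!/bin/sh
# Report whether the kernel was booted with intel_iommu=on, the condition
# under which the extra IOMMU mapping work described above kicks in.
if grep -qw 'intel_iommu=on' /proc/cmdline 2>/dev/null; then
    echo "intel_iommu=on: every DMA buffer change goes through IOMMU mappings"
else
    echo "intel_iommu not forced on here"
fi
```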
--
Alexandre Derumier
Ingénieur Système
Fixe : 03 20 68 88 90
Fax : 03 20 68 90 81
45 Bvd du Général Leclerc 59100 Roubaix - France
12 rue Marivaux 75002 Paris - France
* Re: Infiniband 40GB
2012-06-04 9:21 ` Yann Dupont
2012-06-04 9:35 ` Alexandre DERUMIER
@ 2012-06-04 9:47 ` Amon Ott
2012-06-04 9:58 ` Yann Dupont
` (3 more replies)
1 sibling, 4 replies; 36+ messages in thread
From: Amon Ott @ 2012-06-04 9:47 UTC (permalink / raw)
To: Yann Dupont; +Cc: ceph-devel
[-- Attachment #1: Type: text/plain, Size: 3097 bytes --]
On Monday 04 June 2012 you wrote:
> On 04/06/2012 10:23, Stefan Majer wrote:
> > Hi Hannes,
> >
> > our production environment is running on a 10GbE infrastructure. We had a
> > lot of trouble until we got to where we are today.
> > We use Intel X520-D2 cards on our OSDs and Nexus switch
> > infrastructure. All other cards we tested failed horribly.
>
> we have Intel Corporation 82599EB 10 Gigabit Dual Port Backplane
> Connection (rev 01)... Don't know the 'commercial name'. ixgbe driver.
>
> > Some of the problems we encountered have been:
> > - page allocation failures in the ixgbe driver --> fixed upstream
> > - problems with jumbo frames; we had to disable TSO, GRO and LRO -->
> > this is the most obscure thing
> > - various tuning via sysctl in the net.core and net.ipv4 area --> this
> > was also the outcome of Stefan's benchmarking odyssey.
>
> Some tuning we made:
>
> -> Turning off the virtualisation extensions in the BIOS. Don't know why,
> but they gave us crappy performance. We usually turn them on, because we
> use KVM a lot. In our case the OSDs are on bare metal, and disabling the
> virtualisation extensions gave us a very big boost.
> It may be a BIOS bug in our machines (Dell M610).
>
> -> One of my colleagues played with receive flow steering; the Intel
> card supports multiple queues, so it seems we can gain a little with it:
>
> #!/bin/sh
> for x in $(seq 0 23); do echo FFFFFFFF > /sys/class/net/eth2/queues/rx-${x}/rps_cpus; done
> echo 16384 > /proc/sys/net/core/rps_sock_flow_entries
> for x in $(seq 0 23); do echo 16384 > /sys/class/net/eth2/queues/rx-${x}/rps_flow_cnt; done
>
> > But after all this we are actually quite happy and are only limited by
> > the speed of the drives (2TB SATA).
> > The fsync is in fact an fdatasync, which is available in newer glibc. If
> > you don't use btrfs (we use xfs), you need a recent glibc with
> > fdatasync support.
>
> Might that explain why we see lousy performance with xfs right now?
> That's the main reason we're stuck with btrfs for the moment.
>
> We're using Debian 'stable'; libc is
> libc6 2.11.3-3
> probably too old?
One reason for performance problems with that libc6 version is missing
syncfs() support. I backported a patch for 2.13, originally by Andreas
Schwab (schwab@redhat.com), to the Debian stable code. The patch is attached.
Copy the patch to eglibc's debian/patches/, add it to debian/patches/series,
rebuild the eglibc packages (including libc6) with dpkg-buildpackage, install
the new libc6-dev, rebuild the ceph packages against it, install, and retry.
AFAIK, not even the libc6 in Debian experimental has syncfs() support.
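The rebuild procedure above, sketched as a dry run (the script only prints the steps; the eglibc source directory name and the patches/any/ subdirectory are assumptions for Debian stable of that era):

```shell
#!/bin/sh
# Print the backport/rebuild steps described above instead of running them;
# directory names such as eglibc-2.11.3 are assumptions.
cat <<'EOF'
apt-get source eglibc
cp syncfs.diff eglibc-2.11.3/debian/patches/any/
echo any/syncfs.diff >> eglibc-2.11.3/debian/patches/series
( cd eglibc-2.11.3 && dpkg-buildpackage -us -uc )
dpkg -i ../libc6_*.deb ../libc6-dev_*.deb
apt-get source ceph
( cd ceph-* && dpkg-buildpackage -us -uc )
dpkg -i ../ceph_*.deb
EOF
```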
Also see thread "OSD deadlock with cephfs client and OSD on same machine"
Amon Ott
--
Dr. Amon Ott
m-privacy GmbH Tel: +49 30 24342334
Am Köllnischen Park 1 Fax: +49 30 24342336
10179 Berlin http://www.m-privacy.de
Amtsgericht Charlottenburg, HRB 84946
Geschäftsführer:
Dipl.-Kfm. Holger Maczkowsky,
Roman Maczkowsky
GnuPG-Key-ID: 0x2DD3A649
[-- Attachment #2: syncfs.diff --]
[-- Type: text/x-diff, Size: 4110 bytes --]
Versions.def | 1 +
misc/Makefile | 4 ++--
misc/Versions | 3 +++
misc/syncfs.c | 33 +++++++++++++++++++++++++++++++++
posix/unistd.h | 9 ++++++++-
sysdeps/unix/syscalls.list | 1 +
6 files changed, 48 insertions(+), 3 deletions(-)
create mode 100644 misc/syncfs.c
diff --git a/Versions.def b/Versions.def
index 0ccda50..e478fdd 100644
--- a/Versions.def
+++ b/Versions.def
@@ -30,5 +30,6 @@ libc {
GLIBC_2.11
GLIBC_2.12
+ GLIBC_2.14
%ifdef USE_IN_LIBIO
HURD_CTHREADS_0.3
%endif
diff --git a/misc/Makefile b/misc/Makefile
index ee69361..52b13da 100644
--- a/misc/Makefile
+++ b/misc/Makefile
@@ -1,4 +1,4 @@
-# Copyright (C) 1991-2006, 2007, 2009 Free Software Foundation, Inc.
+# Copyright (C) 1991-2006, 2007, 2009, 2011 Free Software Foundation, Inc.
# This file is part of the GNU C Library.
# The GNU C Library is free software; you can redistribute it and/or
@@ -45,7 +45,7 @@ routines := brk sbrk sstk ioctl \
getdtsz \
gethostname sethostname getdomain setdomain \
select pselect \
- acct chroot fsync sync fdatasync reboot \
+ acct chroot fsync sync fdatasync syncfs reboot \
gethostid sethostid \
vhangup \
swapon swapoff mktemp mkstemp mkstemp64 mkdtemp \
diff --git a/misc/Versions b/misc/Versions
index 3ffe3d1..3a31c7f 100644
--- a/misc/Versions
+++ b/misc/Versions
@@ -143,4 +143,7 @@ libc {
GLIBC_2.11 {
mkstemps; mkstemps64; mkostemps; mkostemps64;
}
+ GLIBC_2.14 {
+ syncfs;
+ }
}
diff --git a/misc/syncfs.c b/misc/syncfs.c
new file mode 100644
index 0000000..bd7328c
--- /dev/null
+++ b/misc/syncfs.c
@@ -0,0 +1,33 @@
+/* Copyright (C) 2011 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, write to the Free
+ Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
+ 02111-1307 USA. */
+
+#include <errno.h>
+#include <unistd.h>
+
+/* Make all changes done to all files on the file system associated
+ with FD actually appear on disk. */
+int
+syncfs (int fd)
+{
+ __set_errno (ENOSYS);
+ return -1;
+}
+
+
+stub_warning (syncfs)
+#include <stub-tag.h>
diff --git a/posix/unistd.h b/posix/unistd.h
index 5ebcaf1..aa11860 100644
--- a/posix/unistd.h
+++ b/posix/unistd.h
@@ -1,4 +1,4 @@
-/* Copyright (C) 1991-2006, 2007, 2008, 2009 Free Software Foundation, Inc.
+/* Copyright (C) 1991-2009, 2010, 2011 Free Software Foundation, Inc.
This file is part of the GNU C Library.
The GNU C Library is free software; you can redistribute it and/or
@@ -974,6 +974,13 @@ extern int fsync (int __fd);
#endif /* Use BSD || X/Open || Unix98. */
+#ifdef __USE_GNU
+/* Make all changes done to all files on the file system associated
+ with FD actually appear on disk. */
+extern int syncfs (int __fd) __THROW;
+#endif
+
+
#if defined __USE_BSD || defined __USE_XOPEN_EXTENDED
/* Return identifier for the current host. */
diff --git a/sysdeps/unix/syscalls.list b/sysdeps/unix/syscalls.list
index 04ed63c..ad49170 100644
--- a/sysdeps/unix/syscalls.list
+++ b/sysdeps/unix/syscalls.list
@@ -55,6 +55,7 @@ swapoff - swapoff i:s swapoff
swapon - swapon i:s swapon
symlink - symlink i:ss __symlink symlink
sync - sync i: sync
+syncfs - syncfs i:i syncfs
sys_fstat fxstat fstat i:ip __syscall_fstat
sys_mknod xmknod mknod i:sii __syscall_mknod
sys_stat xstat stat i:sp __syscall_stat
--
1.7.4
* Re: Infiniband 40GB
2012-06-04 9:35 ` Alexandre DERUMIER
@ 2012-06-04 9:53 ` Yann Dupont
0 siblings, 0 replies; 36+ messages in thread
From: Yann Dupont @ 2012-06-04 9:53 UTC (permalink / raw)
To: Alexandre DERUMIER
Cc: Hannes Reinecke, Stefan Priebe - Profihost AG, Mark Nelson,
ceph-devel, Stefan Majer
On 04/06/2012 11:35, Alexandre DERUMIER wrote:
> Hi,
> about this:
>>> Turning off the virtualisation extensions in the BIOS. Don't know why,
>>> but they gave us crappy performance. We usually turn them on, because we
>>> use KVM a lot. In our case the OSDs are on bare metal, and disabling the
>>> virtualisation extensions gave us a very big boost.
>>> It may be a BIOS bug in our machines (Dell M610).
>
> It could be related to iommu, if you pass intel_iommu=on in grub.
> I have already had this kind of problem.
>
> With intel_iommu=on, Linux (completely unrelated to KVM) adds a new level
> of protection which didn't exist without an IOMMU: the network card, which
> without an IOMMU could write (via DMA) to any memory location, is now
> only allowed to write to the memory locations the OS
> wants it to write to. Theoretically, this can protect the OS against
> various kinds of attacks. But what happens now is that every time
> Linux passes a new buffer to the card, it needs to change the IOMMU
> mappings. This noticeably slows down I/O, unfortunately.
>
>
Unfortunately, this is not the case. The Intel card supports it, but
the Dell M610 doesn't.
And I just checked: our Linux command line doesn't include intel_iommu=on.
BTW, it seems that turning on virtualisation in the BIOS kills performance
with the in-kernel ixgbe driver; the SourceForge one seems less affected. Our
tests were circa kernel 3.2, so it may have changed since.
Cheers,
--
Yann Dupont - Service IRTS, DSI Université de Nantes
Tel : 02.53.48.49.20 - Mail/Jabber : Yann.Dupont@univ-nantes.fr
* Re: Infiniband 40GB
2012-06-04 9:47 ` Amon Ott
@ 2012-06-04 9:58 ` Yann Dupont
2012-06-04 11:40 ` Alexandre DERUMIER
` (2 subsequent siblings)
3 siblings, 0 replies; 36+ messages in thread
From: Yann Dupont @ 2012-06-04 9:58 UTC (permalink / raw)
To: Amon Ott; +Cc: ceph-devel
On 04/06/2012 11:47, Amon Ott wrote:
> even libc6 in Debian experimental has syncfs() support.
>
> Also see thread "OSD deadlock with cephfs client and OSD on same machine"
Great, thanks for the explanation.
... lots of tests to do this afternoon :) I need to convert my OSDs to
xfs, benchmark with the standard libc, then rebuild libc with your patch
and retest.
Thanks,
cheers,
--
Yann Dupont - Service IRTS, DSI Université de Nantes
Tel : 02.53.48.49.20 - Mail/Jabber : Yann.Dupont@univ-nantes.fr
* Re: Infiniband 40GB
2012-06-04 9:47 ` Amon Ott
2012-06-04 9:58 ` Yann Dupont
@ 2012-06-04 11:40 ` Alexandre DERUMIER
2012-06-04 12:59 ` Mark Nelson
2012-06-04 15:42 ` Stefan Priebe
2012-06-06 10:48 ` Stefan Priebe - Profihost AG
3 siblings, 1 reply; 36+ messages in thread
From: Alexandre DERUMIER @ 2012-06-04 11:40 UTC (permalink / raw)
To: Amon Ott; +Cc: ceph-devel, Yann Dupont
Hi,
I'm currently doing some tests with xfs on Debian wheezy, with the standard libc6 (2.11.3-3) and a 3.2 kernel.
I'm running iostat (3 nodes with 5 OSDs), and I see constant writes to the disks (as the data is flushed from the journal to disk each second).
The journal is big enough (20GB tmpfs) to handle 30s of writes.
Do you think this is related to the missing syncfs() support?
-Alexandre
--
--
Alexandre Derumier
Ingénieur Système
Fixe : 03 20 68 88 90
Fax : 03 20 68 90 81
45 Bvd du Général Leclerc 59100 Roubaix - France
12 rue Marivaux 75002 Paris - France
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Infiniband 40GB
2012-06-04 6:22 ` Hannes Reinecke
2012-06-04 7:26 ` Stefan Priebe - Profihost AG
@ 2012-06-04 12:28 ` Mark Nelson
2012-06-04 12:34 ` Tomasz Paszkowski
1 sibling, 1 reply; 36+ messages in thread
From: Mark Nelson @ 2012-06-04 12:28 UTC (permalink / raw)
To: Hannes Reinecke; +Cc: Stefan Priebe, ceph-devel
On 6/4/12 1:22 AM, Hannes Reinecke wrote:
> On 06/03/2012 02:56 PM, Mark Nelson wrote:
>> On 6/3/12 3:10 AM, Stefan Priebe wrote:
>>> Hi List,
>>>
>>> has anybody already tried CEPH over Infiniband 40GB?
>>>
>>> Stefan
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe
>>> ceph-devel" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>> Hi Stefan,
>>
>> A couple of folks have done DDR IB. For now you are limited to
>> ipoib though. If you have the hardware available I'd be really
>> curious what kind of throughput/latencies you see.
>>
> Hehe.
>
> Good luck with that.
>
> We've tried on 10GigE with _disastrous_ results.
> Up to the point where 1GigE was actually _faster_.
Strange! Do you see good results with something like iperf? Internally
we have 10GE on some of our test nodes and I can get up to around
600MB/s per node during rados bench testing.
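For scale, that figure can be put against the link's nominal rate; a quick conversion sketch (ignoring protocol overhead, taking 1 MB as 10^6 bytes):

```shell
# Convert Mark's 600 MB/s rados bench figure to Gbit/s and compare with
# the 10 Gbit/s nominal line rate of a 10GbE link.
gbit=$(awk 'BEGIN { printf "%.1f", 600 * 8 / 1000 }')
echo "$gbit of 10 Gbit/s"
```

So 600 MB/s is a bit under half of what the wire could theoretically carry.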
> So far we've uncovered two issues:
> - intel_idle was/is seriously broken (we've tried on 3.0-stable,
> so might've been fixed by now)
> - osd-server is calling 'fsync' on each and every write request.
> Does wonders for performance ...
For syncfs support, upgrade to a distro with glibc 2.13+ (i.e. Ubuntu precise).
I've noticed a significant improvement in our spinning disk performance
going from oneiric and kernel 3.3 to precise and kernel 3.4. I think
part of this is related to the raid drivers for the cards we have in our
test boxes though. I'm actually recording blktrace and seekwatcher
results for all of our tests to specifically look at syncs and disk seek
behavior...
>
> Cheers,
>
> Hannes
Mark
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Infiniband 40GB
2012-06-04 12:28 ` Mark Nelson
@ 2012-06-04 12:34 ` Tomasz Paszkowski
2012-06-04 12:40 ` Mark Nelson
0 siblings, 1 reply; 36+ messages in thread
From: Tomasz Paszkowski @ 2012-06-04 12:34 UTC (permalink / raw)
To: Mark Nelson; +Cc: Hannes Reinecke, Stefan Priebe, ceph-devel
On Mon, Jun 4, 2012 at 2:28 PM, Mark Nelson <mark.nelson@inktank.com> wrote:
>
> For syncfs support, upgrade to a distro with glibc 2.13+ (ie precise). I've
> noticed a significant improvement in our spinning disk performance going
> from oneiric and kernel 3.3 to precise and kernel 3.4. I think part of this
> is related to the raid drivers for the cards we have in our test boxes
> though. I'm actually recording blktrace and seekwatcher results for all of
> our tests to specifically look at syncs and disk seek behavior...
>
Correct me if I'm wrong, but AFAIR precise is running a 3.2 kernel.
--
Tomasz Paszkowski
SS7, Asterisk, SAN, Datacenter, Cloud Computing
+48500166299
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Infiniband 40GB
2012-06-04 12:34 ` Tomasz Paszkowski
@ 2012-06-04 12:40 ` Mark Nelson
0 siblings, 0 replies; 36+ messages in thread
From: Mark Nelson @ 2012-06-04 12:40 UTC (permalink / raw)
To: Tomasz Paszkowski; +Cc: Hannes Reinecke, Stefan Priebe, ceph-devel
On 6/4/12 7:34 AM, Tomasz Paszkowski wrote:
> On Mon, Jun 4, 2012 at 2:28 PM, Mark Nelson<mark.nelson@inktank.com> wrote:
>>
>> For syncfs support, upgrade to a distro with glibc 2.13+ (ie precise). I've
>> noticed a significant improvement in our spinning disk performance going
>> from oneiric and kernel 3.3 to precise and kernel 3.4. I think part of this
>> is related to the raid drivers for the cards we have in our test boxes
>> though. I'm actually recording blktrace and seekwatcher results for all of
>> our tests to specifically look at syncs and disk seek behavior...
>>
>
> Correct me if I'm wrong, but AFAIR precise is running a 3.2 kernel.
Sorry, I should have been more clear. We were running oneiric with our
own kernel 3.3 build and are now running precise with our own kernel 3.4
build (available on gitbuilder.ceph.com).
Mark
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Infiniband 40GB
2012-06-04 11:40 ` Alexandre DERUMIER
@ 2012-06-04 12:59 ` Mark Nelson
2012-06-04 13:07 ` Alexandre DERUMIER
2012-06-06 16:05 ` Alexandre DERUMIER
0 siblings, 2 replies; 36+ messages in thread
From: Mark Nelson @ 2012-06-04 12:59 UTC (permalink / raw)
To: Alexandre DERUMIER; +Cc: Amon Ott, ceph-devel, Yann Dupont
On 6/4/12 6:40 AM, Alexandre DERUMIER wrote:
> Hi,
>
> I'm currently doing some tests with xfs, debian wheezy with standard libc6 (2.11.3-3) and 3.2 kernel.
>
> I'm watching iostat (3 nodes with 5 OSDs), and I see constant writes to the disks (as the data is flushed from journal to disk each second).
>
> Journal is big enough (20GB tmpfs) to handle 30s of write.
>
> Do you think it's related to the missing syncfs() support ?
>
> -Alexandre
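A quick sanity check on the sizing above (assuming "30s of write" means sustained throughput): a 20 GB journal that absorbs 30 seconds of writes implies roughly 680 MB/s sustained, so an undersized journal is indeed unlikely to be the issue:

```shell
# Sustained write rate implied by a 20 GB (20*1024 MB) journal sized to
# absorb 30 s of writes.
rate=$(awk 'BEGIN { printf "%.0f", 20 * 1024 / 30 }')
echo "$rate MB/s"
```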
Hi Alexandre,
I've included some seekwatcher results for rados bench tests using 16
concurrent 4MB writes on an XFS OSD. One shows ubuntu oneiric and the
other precise (ie no syncfs support vs syncfs support in libc).
Unfortunately the original test was on 0.46 and the second test was on
0.47.2, so multiple things changed between the tests. Both were tested
with kernel 3.4. Interestingly the seeks/second don't seem to drop much
but the overall performance has about doubled. This was using a single
7200rpm disk for the OSD data disk and a separate 7200rpm disk for the
journal in both cases. I'd definitely try 0.47.2 with a new libc though
and see how that works for you.
ceph 0.46/oneiric:
http://nhm.ceph.com/movies/mailinglist-tests/xfs-osd0-oneiric-3.4.mpg
ceph 0.47.2/precise:
http://nhm.ceph.com/movies/mailinglist-tests/xfs-osd0-precise-3.4.mpg
Mark
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Infiniband 40GB
2012-06-04 12:59 ` Mark Nelson
@ 2012-06-04 13:07 ` Alexandre DERUMIER
2012-06-04 13:28 ` Mark Nelson
2012-06-06 16:05 ` Alexandre DERUMIER
1 sibling, 1 reply; 36+ messages in thread
From: Alexandre DERUMIER @ 2012-06-04 13:07 UTC (permalink / raw)
To: Mark Nelson; +Cc: Amon Ott, ceph-devel, Yann Dupont
Thanks Mark,
I'll rebuild my cluster with ubuntu precise tomorrow. (Don't have time to backport/maintain libc6 ;)
BTW, do you mainly use Ubuntu at Inktank for your tests?
I'd like to have a setup as close as possible to the Inktank setup.
----- Mail original -----
De: "Mark Nelson" <mark.nelson@inktank.com>
À: "Alexandre DERUMIER" <aderumier@odiso.com>
Cc: "Amon Ott" <a.ott@m-privacy.de>, ceph-devel@vger.kernel.org, "Yann Dupont" <Yann.Dupont@univ-nantes.fr>
Envoyé: Lundi 4 Juin 2012 14:59:58
Objet: Re: Infiniband 40GB
On 6/4/12 6:40 AM, Alexandre DERUMIER wrote:
> Hi,
>
> I'm currently doing some tests with xfs, debian wheezy with standard libc6 (2.11.3-3) and 3.2 kernel.
>
> I'm doing some iostats(3 nodes with 5 osd), and I see constant writes to disks.(as the datas are flushed each second from journal to disk).
>
> Journal is big enough (20GB tmpfs) to handle 30s of write.
>
> Do you think it's related to the missing syncfs() support ?
>
> -Alexandre
Hi Alexandre,
I've included some seekwatcher results for rados bench tests using 16
concurrent 4MB writes on XFS OSD. One shows ubuntu oneiric and the
other precise (ie no syncfs support vs syncfs support in libc).
Unfortunately the original test was on 0.46 and the second test was on
0.47.2, so multiple things changed between the tests. Both were tested
with kernel 3.4. Interestingly the seeks/second don't seem to drop much
but the overall performance has about doubled. This was using a single
7200rpm disk for the OSD data disk and a separate 7200rpm disk for the
journal in both cases. I'd definitely try 0.47.2 with a new libc though
and see how that works for you.
ceph 0.46/oneiric:
http://nhm.ceph.com/movies/mailinglist-tests/xfs-osd0-oneiric-3.4.mpg
ceph 0.47.2/precise:
http://nhm.ceph.com/movies/mailinglist-tests/xfs-osd0-precise-3.4.mpg
Mark
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Infiniband 40GB
2012-06-04 13:07 ` Alexandre DERUMIER
@ 2012-06-04 13:28 ` Mark Nelson
2012-06-04 15:11 ` Gregory Farnum
0 siblings, 1 reply; 36+ messages in thread
From: Mark Nelson @ 2012-06-04 13:28 UTC (permalink / raw)
To: Alexandre DERUMIER; +Cc: Amon Ott, ceph-devel, Yann Dupont
Hi Alexandre,
A lot of our testing is on Ubuntu right now. I'm using the ceph and
kernel debs from ceph.gitbuilder.com for my tests. Post some results to
the list once you get your cluster setup!
Thanks,
Mark
On 6/4/12 8:07 AM, Alexandre DERUMIER wrote:
> Thanks Mark,
> I'll rebuild my cluster with ubuntu precise tomorrow. (Don't have time to backport/maintain libc6 ;)
>
>
> BTW, do you mainly use Ubuntu at Inktank for your tests?
>
> I'd like to have a setup as close as possible to the Inktank setup.
>
>
> ----- Mail original -----
>
> De: "Mark Nelson"<mark.nelson@inktank.com>
> À: "Alexandre DERUMIER"<aderumier@odiso.com>
> Cc: "Amon Ott"<a.ott@m-privacy.de>, ceph-devel@vger.kernel.org, "Yann Dupont"<Yann.Dupont@univ-nantes.fr>
> Envoyé: Lundi 4 Juin 2012 14:59:58
> Objet: Re: Infiniband 40GB
>
> On 6/4/12 6:40 AM, Alexandre DERUMIER wrote:
>> Hi,
>>
>> I'm currently doing some tests with xfs, debian wheezy with standard libc6 (2.11.3-3) and 3.2 kernel.
>>
>> I'm doing some iostats(3 nodes with 5 osd), and I see constant writes to disks.(as the datas are flushed each second from journal to disk).
>>
>> Journal is big enough (20GB tmpfs) to handle 30s of write.
>>
>> Do you think it's related to the missing syncfs() support ?
>>
>> -Alexandre
>
> Hi Alexandre,
>
> I've included some seekwatcher results for rados bench tests using 16
> concurrent 4MB writes on XFS OSD. One shows ubuntu oneiric and the
> other precise (ie no syncfs support vs syncfs support in libc).
> Unfortunately the original test was on 0.46 and the second test was on
> 0.47.2, so multiple things changed between the tests. Both were tested
> with kernel 3.4. Interestingly the seeks/second don't seem to drop much
> but the overall performance has about doubled. This was using a single
> 7200rpm disk for the OSD data disk and a separate 7200rpm disk for the
> journal in both cases. I'd definitely try 0.47.2 with a new libc though
> and see how that works for you.
>
> ceph 0.46/oneiric:
> http://nhm.ceph.com/movies/mailinglist-tests/xfs-osd0-oneiric-3.4.mpg
>
> ceph 0.47.2/precise:
> http://nhm.ceph.com/movies/mailinglist-tests/xfs-osd0-precise-3.4.mpg
>
> Mark
>
>
>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Infiniband 40GB
2012-06-04 13:28 ` Mark Nelson
@ 2012-06-04 15:11 ` Gregory Farnum
2012-06-04 15:34 ` Mark Nelson
0 siblings, 1 reply; 36+ messages in thread
From: Gregory Farnum @ 2012-06-04 15:11 UTC (permalink / raw)
To: ceph-devel; +Cc: Alexandre DERUMIER, Amon Ott, Yann Dupont, Mark Nelson
On Monday, June 4, 2012 at 6:28 AM, Mark Nelson wrote:
> Hi Alexandre,
>
> A lot of our testing is on Ubuntu right now. I'm using the ceph and
> kernel debs from ceph.gitbuilder.com (http://ceph.gitbuilder.com) for my tests. Post some results to
> the list once you get your cluster setup!
>
I think he means gitbuilder.ceph.com. ;)
-Greg
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Infiniband 40GB
2012-06-04 15:11 ` Gregory Farnum
@ 2012-06-04 15:34 ` Mark Nelson
0 siblings, 0 replies; 36+ messages in thread
From: Mark Nelson @ 2012-06-04 15:34 UTC (permalink / raw)
To: Gregory Farnum; +Cc: ceph-devel, Alexandre DERUMIER, Amon Ott, Yann Dupont
On 06/04/2012 10:11 AM, Gregory Farnum wrote:
> On Monday, June 4, 2012 at 6:28 AM, Mark Nelson wrote:
>> Hi Alexandre,
>>
>> A lot of our testing is on Ubuntu right now. I'm using the ceph and
>> kernel debs from ceph.gitbuilder.com (http://ceph.gitbuilder.com) for my tests. Post some results to
>> the list once you get your cluster setup!
>>
>
> I think he means gitbuilder.ceph.com. ;)
> -Greg
>
Doh! This is why I need caffeine before writing emails. Thanks Greg. :)
Mark
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Infiniband 40GB
2012-06-04 9:47 ` Amon Ott
2012-06-04 9:58 ` Yann Dupont
2012-06-04 11:40 ` Alexandre DERUMIER
@ 2012-06-04 15:42 ` Stefan Priebe
2012-06-05 7:08 ` Amon Ott
2012-06-06 10:48 ` Stefan Priebe - Profihost AG
3 siblings, 1 reply; 36+ messages in thread
From: Stefan Priebe @ 2012-06-04 15:42 UTC (permalink / raw)
To: Amon Ott; +Cc: Yann Dupont, ceph-devel
Hi Amon,
thanks for your backported patch. At least it doesn't cleanly apply to
Debian squeeze stable, as it wants a glibc 2.12 in Versions.def but Debian
is only at 2.11. Do you use another patch too?
Stefan
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Infiniband 40GB
2012-06-04 15:42 ` Stefan Priebe
@ 2012-06-05 7:08 ` Amon Ott
2012-06-05 7:46 ` Stefan Priebe - Profihost AG
0 siblings, 1 reply; 36+ messages in thread
From: Amon Ott @ 2012-06-05 7:08 UTC (permalink / raw)
To: Stefan Priebe; +Cc: Yann Dupont, ceph-devel
On Monday 04 June 2012 wrote Stefan Priebe:
> Hi Amon,
>
> thanks for your backported patch. At least it doesn't cleanly apply to
> Debian squeeze stable, as it wants a glibc 2.12 in Versions.def but Debian
> is only at 2.11. Do you use another patch too?
I ripped the patch right out of our previously built 2.11.3-3 source tree. It
needs to be last in the series file, because several existing Debian patches
modify the sources at various places. I could also make our compiled packages
available to you for download.
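The ordering requirement can be sketched like this (patch file names below are placeholders for illustration, not the real Debian patch names); quilt applies debian/patches/series top to bottom, so the backport must be appended, not prepended:

```shell
# Illustrates the series-file ordering Amon describes: the syncfs
# backport goes LAST so it applies after every existing Debian patch.
dir=$(mktemp -d)
printf 'any/local-sample.diff\nany/submitted-sample.diff\n' > "$dir/series"
echo 'local-syncfs-backport.patch' >> "$dir/series"   # append, don't prepend
last=$(tail -n 1 "$dir/series")
echo "$last"
rm -rf "$dir"
```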
Amon Ott
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Infiniband 40GB
2012-06-05 7:08 ` Amon Ott
@ 2012-06-05 7:46 ` Stefan Priebe - Profihost AG
0 siblings, 0 replies; 36+ messages in thread
From: Stefan Priebe - Profihost AG @ 2012-06-05 7:46 UTC (permalink / raw)
To: Amon Ott; +Cc: Yann Dupont, ceph-devel
Am 05.06.2012 09:08, schrieb Amon Ott:
> On Monday 04 June 2012 wrote Stefan Priebe:
>> Hi Amon,
>>
> >> thanks for your backported patch. At least it doesn't cleanly apply to
> >> Debian squeeze stable, as it wants a glibc 2.12 in Versions.def but Debian
> >> is only at 2.11. Do you use another patch too?
>
> I ripped the patch right out of our previously built 2.11.3-3 source tree. It
> needs to be last in the series file, because several existing Debian patches
> modify the sources at various places. I could also make our compiled packages
> available to you for download.
Sorry, I had added the patch at the front of the series file...
Thanks
Stefan
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Infiniband 40GB
2012-06-04 8:23 ` Stefan Majer
2012-06-04 9:21 ` Yann Dupont
@ 2012-06-05 8:54 ` Stefan Priebe - Profihost AG
1 sibling, 0 replies; 36+ messages in thread
From: Stefan Priebe - Profihost AG @ 2012-06-05 8:54 UTC (permalink / raw)
To: Stefan Majer; +Cc: Hannes Reinecke, Mark Nelson, ceph-devel
Hi Stefan,
Am 04.06.2012 10:23, schrieb Stefan Majer:
> our production environment is running on 10GB infrastructure. We had a
> lot of troubles till we got to where we are today.
> We use Intel X520 D2 cards on our OSD´s and nexus switch
> infrastructure. All other cards we where testing failed horrible.
Have you also tried emulex cards? (also used by HP)
Stefan
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Infiniband 40GB
2012-06-04 9:47 ` Amon Ott
` (2 preceding siblings ...)
2012-06-04 15:42 ` Stefan Priebe
@ 2012-06-06 10:48 ` Stefan Priebe - Profihost AG
2012-06-06 10:57 ` Amon Ott
3 siblings, 1 reply; 36+ messages in thread
From: Stefan Priebe - Profihost AG @ 2012-06-06 10:48 UTC (permalink / raw)
To: Amon Ott; +Cc: Yann Dupont, ceph-devel
Hi Amon,
i've added your patch:
# strings /lib/libc-2.11.3.so |grep -i syncfs
syncfs
But configure of ceph still claims there is no syncfs support.
# ./configure |grep -i sync
checking for syncfs... no
checking for sync_file_range... yes
Any ideas?
Hint: I'm compiling my packages in an OpenVZ RHEL6-based virtual
container, so the kernel I'm compiling on does not support syncfs.
Is this the reason?
Stefan
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Infiniband 40GB
2012-06-06 10:48 ` Stefan Priebe - Profihost AG
@ 2012-06-06 10:57 ` Amon Ott
2012-06-06 11:02 ` Stefan Priebe - Profihost AG
0 siblings, 1 reply; 36+ messages in thread
From: Amon Ott @ 2012-06-06 10:57 UTC (permalink / raw)
To: Stefan Priebe - Profihost AG; +Cc: ceph-devel
On Wednesday 06 June 2012 wrote Stefan Priebe - Profihost AG:
> Hi Amon,
>
> i've added your patch:
> # strings /lib/libc-2.11.3.so |grep -i syncfs
> syncfs
>
> But configure of ceph still claims there is no syncfs support.
>
> # ./configure |grep -i sync
> checking for syncfs... no
> checking for sync_file_range... yes
>
> Any ideas?
Did you also install the new libc6-dev, which contains the new header files?
Amon Ott
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Infiniband 40GB
2012-06-06 10:57 ` Amon Ott
@ 2012-06-06 11:02 ` Stefan Priebe - Profihost AG
2012-06-07 11:33 ` Amon Ott
0 siblings, 1 reply; 36+ messages in thread
From: Stefan Priebe - Profihost AG @ 2012-06-06 11:02 UTC (permalink / raw)
To: Amon Ott; +Cc: ceph-devel
Am 06.06.2012 12:57, schrieb Amon Ott:
> On Wednesday 06 June 2012 wrote Stefan Priebe - Profihost AG:
>> Hi Amon,
>>
>> i've added your patch:
>> # strings /lib/libc-2.11.3.so |grep -i syncfs
>> syncfs
>>
>> But configure of ceph still claims there is no syncfs support.
>>
>> # ./configure |grep -i sync
>> checking for syncfs... no
>> checking for sync_file_range... yes
>>
>> Any ideas?
>
> Did you also install the new libc6-dev, which contains the new header files?
Yes.
/usr/include/unistd.h:
extern int syncfs (int __fd) __THROW;
/usr/include/gnu/stubs-64.h:
#define __stub_syncfs
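That `__stub_syncfs` define is the likely culprit: autoconf's function check deliberately fails when a `__stub_<name>` macro is defined, because glibc then ships only an ENOSYS stub. (Presumably the backport updated the 32-bit stubs header only, which would match Amon's 32-bit build working.) A sketch of the check, run against a local sample file rather than the real stubs-64.h:

```shell
# Mimic the part of autoconf's AC_CHECK_FUNCS test that makes
# "checking for syncfs" report no: if the headers define __stub_syncfs,
# the generated test program refuses to build even though the prototype
# exists in unistd.h.
sample=$(mktemp)
printf '#define __stub_syncfs\n' > "$sample"
if grep -q '__stub_syncfs' "$sample"; then
  verdict="stub only - configure will say no"
else
  verdict="real syncfs"
fi
echo "$verdict"
rm -f "$sample"
```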
Stefan
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Infiniband 40GB
2012-06-04 12:59 ` Mark Nelson
2012-06-04 13:07 ` Alexandre DERUMIER
@ 2012-06-06 16:05 ` Alexandre DERUMIER
2012-06-06 16:43 ` Mark Nelson
1 sibling, 1 reply; 36+ messages in thread
From: Alexandre DERUMIER @ 2012-06-06 16:05 UTC (permalink / raw)
To: Mark Nelson; +Cc: Amon Ott, ceph-devel, Yann Dupont
Hi, I have rebuilt my cluster with ubuntu precise,
-kernel 3.2
-ceph 0.47.2
-libc6 2.15
-3 nodes - 5 OSDs (xfs) per node and 1 tmpfs with 5 journal files.
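For reference, a journal-on-tmpfs layout like that would look roughly like the following ceph.conf fragment (paths and the journal size are assumptions, not Alexandre's actual file):

```ini
[osd]
    ; journal is a plain file on tmpfs, size in MB
    osd journal size = 4096
[osd.0]
    host = node1
    osd data = /srv/osd.0
    osd journal = /mnt/tmpfs/journal.osd.0
```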
I launched rados bench,
and again I see constant writes to xfs...
Maybe this is related to tmpfs ?
I'll retry with kernel 3.4 from Inktank tomorrow.
I'll also try with the journal on a physical disk with an xfs partition.
I'll keep you in touch.
----- Mail original -----
De: "Mark Nelson" <mark.nelson@inktank.com>
À: "Alexandre DERUMIER" <aderumier@odiso.com>
Cc: "Amon Ott" <a.ott@m-privacy.de>, ceph-devel@vger.kernel.org, "Yann Dupont" <Yann.Dupont@univ-nantes.fr>
Envoyé: Lundi 4 Juin 2012 14:59:58
Objet: Re: Infiniband 40GB
On 6/4/12 6:40 AM, Alexandre DERUMIER wrote:
> Hi,
>
> I'm currently doing some tests with xfs, debian wheezy with standard libc6 (2.11.3-3) and 3.2 kernel.
>
> I'm doing some iostats(3 nodes with 5 osd), and I see constant writes to disks.(as the datas are flushed each second from journal to disk).
>
> Journal is big enough (20GB tmpfs) to handle 30s of write.
>
> Do you think it's related to the missing syncfs() support ?
>
> -Alexandre
Hi Alexandre,
I've included some seekwatcher results for rados bench tests using 16
concurrent 4MB writes on XFS OSD. One shows ubuntu oneiric and the
other precise (ie no syncfs support vs syncfs support in libc).
Unfortunately the original test was on 0.46 and the second test was on
0.47.2, so multiple things changed between the tests. Both were tested
with kernel 3.4. Interestingly the seeks/second don't seem to drop much
but the overall performance has about doubled. This was using a single
7200rpm disk for the OSD data disk and a separate 7200rpm disk for the
journal in both cases. I'd definitely try 0.47.2 with a new libc though
and see how that works for you.
ceph 0.46/oneiric:
http://nhm.ceph.com/movies/mailinglist-tests/xfs-osd0-oneiric-3.4.mpg
ceph 0.47.2/precise:
http://nhm.ceph.com/movies/mailinglist-tests/xfs-osd0-precise-3.4.mpg
Mark
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Infiniband 40GB
2012-06-06 16:05 ` Alexandre DERUMIER
@ 2012-06-06 16:43 ` Mark Nelson
0 siblings, 0 replies; 36+ messages in thread
From: Mark Nelson @ 2012-06-06 16:43 UTC (permalink / raw)
To: Alexandre DERUMIER; +Cc: Amon Ott, ceph-devel, Yann Dupont
Hi Alexandre,
If you can run blktrace during your test on one of the OSD data disks
and send me the results I can take a look at them. Also, the rados
bench settings and output would be useful too.
Thanks,
Mark
On 6/6/12 11:05 AM, Alexandre DERUMIER wrote:
> Hi, I have rebuilt my cluster with ubuntu precise,
>
> -kernel 3.2
> -ceph 0.47.2
> -libc6 2.15
> -3 nodes - 5 OSDs (xfs) per node and 1 tmpfs with 5 journal files.
>
> I launched rados bench,
> and again I see constant writes to xfs...
>
> Maybe this is related to tmpfs ?
>
>
> I'll retry with kernel 3.4 from Inktank tomorrow.
> I'll also try with journal on a physical disk with xfs partition.
>
> I'll keep you in touch.
>
>
> ----- Mail original -----
>
> De: "Mark Nelson"<mark.nelson@inktank.com>
> À: "Alexandre DERUMIER"<aderumier@odiso.com>
> Cc: "Amon Ott"<a.ott@m-privacy.de>, ceph-devel@vger.kernel.org, "Yann Dupont"<Yann.Dupont@univ-nantes.fr>
> Envoyé: Lundi 4 Juin 2012 14:59:58
> Objet: Re: Infiniband 40GB
>
> On 6/4/12 6:40 AM, Alexandre DERUMIER wrote:
>> Hi,
>>
>> I'm currently doing some tests with xfs, debian wheezy with standard libc6 (2.11.3-3) and 3.2 kernel.
>>
>> I'm doing some iostats(3 nodes with 5 osd), and I see constant writes to disks.(as the datas are flushed each second from journal to disk).
>>
>> Journal is big enough (20GB tmpfs) to handle 30s of write.
>>
>> Do you think it's related to the missing syncfs() support ?
>>
>> -Alexandre
>
> Hi Alexandre,
>
> I've included some seekwatcher results for rados bench tests using 16
> concurrent 4MB writes on XFS OSD. One shows ubuntu oneiric and the
> other precise (ie no syncfs support vs syncfs support in libc).
> Unfortunately the original test was on 0.46 and the second test was on
> 0.47.2, so multiple things changed between the tests. Both were tested
> with kernel 3.4. Interestingly the seeks/second don't seem to drop much
> but the overall performance has about doubled. This was using a single
> 7200rpm disk for the OSD data disk and a separate 7200rpm disk for the
> journal in both cases. I'd definitely try 0.47.2 with a new libc though
> and see how that works for you.
>
> ceph 0.46/oneiric:
> http://nhm.ceph.com/movies/mailinglist-tests/xfs-osd0-oneiric-3.4.mpg
>
> ceph 0.47.2/precise:
> http://nhm.ceph.com/movies/mailinglist-tests/xfs-osd0-precise-3.4.mpg
>
> Mark
>
>
>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Infiniband 40GB
2012-06-06 11:02 ` Stefan Priebe - Profihost AG
@ 2012-06-07 11:33 ` Amon Ott
2012-06-07 12:44 ` Stefan Priebe - Profihost AG
0 siblings, 1 reply; 36+ messages in thread
From: Amon Ott @ 2012-06-07 11:33 UTC (permalink / raw)
To: Stefan Priebe - Profihost AG; +Cc: ceph-devel
On Wednesday 06 June 2012 wrote Stefan Priebe - Profihost AG:
> Am 06.06.2012 12:57, schrieb Amon Ott:
> > On Wednesday 06 June 2012 wrote Stefan Priebe - Profihost AG:
> >> Hi Amon,
> >>
> >> i've added your patch:
> >> # strings /lib/libc-2.11.3.so |grep -i syncfs
> >> syncfs
> >>
> >> But configure of ceph still claims there is no syncfs support.
> >>
> >> # ./configure |grep -i sync
> >> checking for syncfs... no
> >> checking for sync_file_range... yes
> >>
> >> Any ideas?
> >
> > Did you also install the new libc6-dev, which contains the new header
> > files?
>
> Yes.
>
> /usr/include/unistd.h:
> extern int syncfs (int __fd) __THROW;
>
> /usr/include/gnu/stubs-64.h:
> #define __stub_syncfs
Are you building on 32 or 64 Bit? We have 32 here.
Amon Ott
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Infiniband 40GB
2012-06-07 11:33 ` Amon Ott
@ 2012-06-07 12:44 ` Stefan Priebe - Profihost AG
0 siblings, 0 replies; 36+ messages in thread
From: Stefan Priebe - Profihost AG @ 2012-06-07 12:44 UTC (permalink / raw)
To: Amon Ott; +Cc: ceph-devel
Am 07.06.2012 13:33, schrieb Amon Ott:
> On Wednesday 06 June 2012 wrote Stefan Priebe - Profihost AG:
>> Am 06.06.2012 12:57, schrieb Amon Ott:
>>> On Wednesday 06 June 2012 wrote Stefan Priebe - Profihost AG:
>> /usr/include/unistd.h:
>> extern int syncfs (int __fd) __THROW;
>>
>> /usr/include/gnu/stubs-64.h:
>> #define __stub_syncfs
>
> Are you building on 32 or 64 Bit? We have 32 here.
64bit but does this make a difference?
Stefan
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Infiniband 40GB
2012-06-07 11:25 ` Alexandre DERUMIER
@ 2012-06-07 17:15 ` Mark Nelson
0 siblings, 0 replies; 36+ messages in thread
From: Mark Nelson @ 2012-06-07 17:15 UTC (permalink / raw)
To: Alexandre DERUMIER; +Cc: Amon Ott, ceph-devel, Yann Dupont
On 6/7/12 6:25 AM, Alexandre DERUMIER wrote:
> Other tests done today (kernel 3.4 - ubuntu precise):
>
> 3 nodes with 5 osd with btrfs, 1GB journal in tmpfs forced to writeahead
> 3 nodes with 1 osd with xfs, 8GB journal in tmpfs
> 3 nodes with 1 osd with btrfs, 8GB journal in tmpfs forced to writeahead
>
> 3 nodes with 5 osd with btrfs, 20GB journal on disk forced to writeahead
> 3 nodes with 1 osd with xfs, 20GB journal on disk
> 3 nodes with 1 osd with btrfs, 20GB journal on disk forced to writeahead
>
>
>
>
> Same behaviour in all cases: writes to disk are constant.
>
> benched with:
> rados -p pool3 bench 60 write -t 16
>
> also with
> fio, bonnie, and random/seq writes from a guest VM with different block sizes.
>
>
>
>
> ----- Mail original -----
>
> De: "Alexandre DERUMIER"<aderumier@odiso.com>
> À: "Mark Nelson"<mark.nelson@inktank.com>
> Cc: "Amon Ott"<a.ott@m-privacy.de>, ceph-devel@vger.kernel.org, "Yann Dupont"<Yann.Dupont@univ-nantes.fr>
> Envoyé: Jeudi 7 Juin 2012 05:31:15
> Objet: Re: Infiniband 40GB
>
> Hi again,
> I have done some tests with journals on a real disk; I see the same behaviour.
>
> iostat show constant write to journal and write to disks at the same time since the beginning of the benchmark.
>
>
> Maybe I can try to use a different partition for each journal? (Currently I have one partition holding the five journal files, one per OSD.)
>
> -Alexandre
>
>
>
> ----- Mail original -----
>
> De: "Alexandre DERUMIER"<aderumier@odiso.com>
> À: "Mark Nelson"<mark.nelson@inktank.com>
> Cc: "Amon Ott"<a.ott@m-privacy.de>, ceph-devel@vger.kernel.org, "Yann Dupont"<Yann.Dupont@univ-nantes.fr>
> Envoyé: Jeudi 7 Juin 2012 05:11:15
> Objet: Re: Infiniband 40GB
>
> Hi Mark,
> I have attached a blktrace of /dev/sdb1 of node1 (osd.0)
>
> and also iostat (showing constant writes)
>
> bench used:
>
> rados -p pool3 bench 60 write -t 16
>
>
> kernel used: 3.4 from Inktank
>
> I'll do tests with journal on an xfs partition today
>
Hi Alexandre,
I'll try to take a look at the data you sent me later today.
Thanks!
Mark
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Infiniband 40GB
2012-06-07 3:31 ` Alexandre DERUMIER
@ 2012-06-07 11:25 ` Alexandre DERUMIER
2012-06-07 17:15 ` Mark Nelson
0 siblings, 1 reply; 36+ messages in thread
From: Alexandre DERUMIER @ 2012-06-07 11:25 UTC (permalink / raw)
To: Mark Nelson; +Cc: Amon Ott, ceph-devel, Yann Dupont
Other tests done today (kernel 3.4 - ubuntu precise):
3 nodes with 5 osd with btrfs, 1GB journal in tmpfs forced to writeahead
3 nodes with 1 osd with xfs, 8GB journal in tmpfs
3 nodes with 1 osd with btrfs, 8GB journal in tmpfs forced to writeahead
3 nodes with 5 osd with btrfs, 20GB journal on disk forced to writeahead
3 nodes with 1 osd with xfs, 20GB journal on disk
3 nodes with 1 osd with btrfs, 20GB journal on disk forced to writeahead
Same behaviour in all cases: writes to disk are constant.
benched with:
rados -p pool3 bench 60 write -t 16
also with
fio, bonnie, and random/seq writes from a guest VM with different block sizes.
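One knob worth checking when the journal seems to make no difference is the filestore sync interval, which bounds how long data may sit in the journal before being committed to the data disk. The values below are my recollection of the 0.47-era defaults (verify against your build); with a max of 5 s, steady writes to the data disk are expected during a sustained benchmark no matter how large the journal is:

```ini
[osd]
    ; seconds between filestore commits (journal -> data disk)
    filestore min sync interval = 0.01
    filestore max sync interval = 5
```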
----- Mail original -----
De: "Alexandre DERUMIER" <aderumier@odiso.com>
À: "Mark Nelson" <mark.nelson@inktank.com>
Cc: "Amon Ott" <a.ott@m-privacy.de>, ceph-devel@vger.kernel.org, "Yann Dupont" <Yann.Dupont@univ-nantes.fr>
Envoyé: Jeudi 7 Juin 2012 05:31:15
Objet: Re: Infiniband 40GB
Hi again,
I have done some tests with journals on a real disk; I see the same behaviour.
iostat show constant write to journal and write to disks at the same time since the beginning of the benchmark.
maybe can I try to use differents partitions for each journal ? (currently I have 1 partition with 5 journal files of each osd)
-Alexandre
----- Original message -----
From: "Alexandre DERUMIER" <aderumier@odiso.com>
To: "Mark Nelson" <mark.nelson@inktank.com>
Cc: "Amon Ott" <a.ott@m-privacy.de>, ceph-devel@vger.kernel.org, "Yann Dupont" <Yann.Dupont@univ-nantes.fr>
Sent: Thursday 7 June 2012 05:11:15
Subject: Re: Infiniband 40GB
Hi Mark,
I have attached a blktrace of /dev/sdb1 of node1 (osd.0)
and also iostat (showing constant writes)
bench used:
rados -p pool3 bench 60 write -t 16
kernel used: 3.4 from Inktank
I'll do tests with journal on an xfs partition today
----- Original message -----
From: "Mark Nelson" <mark.nelson@inktank.com>
To: "Alexandre DERUMIER" <aderumier@odiso.com>
Cc: "Amon Ott" <a.ott@m-privacy.de>, ceph-devel@vger.kernel.org, "Yann Dupont" <Yann.Dupont@univ-nantes.fr>
Sent: Wednesday 6 June 2012 18:43:50
Subject: Re: Infiniband 40GB
Hi Alexandre,
If you can run blktrace during your test on one of the OSD data disks
and send me the results I can take a look at them. Also, the rados
bench settings and output would be useful too.
Thanks,
Mark
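
Mark's blktrace request above can be carried out with something like this (a sketch; the device name /dev/sdb, the 60-second window, and the output names are assumptions, not values from the thread):

```shell
# Trace block-layer activity on the OSD data disk for 60 seconds,
# then post-process the binary trace and pack the results for sending.
blktrace -d /dev/sdb -w 60 -o osd0-trace
blkparse -i osd0-trace -d osd0-trace.bin > osd0-trace.txt
tar czf osd0-trace.tgz osd0-trace.*
```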
On 6/6/12 11:05 AM, Alexandre DERUMIER wrote:
> Hi, I have rebuilt my cluster with Ubuntu Precise:
>
> -kernel 3.2
> -ceph 0.47.2
> -libc6 2.15
> -3 nodes - 5 OSDs (xfs) per node and 1 tmpfs with 5 journal files.
>
> I launched rados bench,
> and again I see constant writes to xfs...
>
> Maybe this is related to tmpfs ?
>
>
> I'll retry with kernel 3.4 from Inktank tomorrow.
> I'll also try with journal on a physical disk with xfs partition.
>
> I'll keep you in touch.
>
>
> ----- Original message -----
>
> From: "Mark Nelson"<mark.nelson@inktank.com>
> To: "Alexandre DERUMIER"<aderumier@odiso.com>
> Cc: "Amon Ott"<a.ott@m-privacy.de>, ceph-devel@vger.kernel.org, "Yann Dupont"<Yann.Dupont@univ-nantes.fr>
> Sent: Monday 4 June 2012 14:59:58
> Subject: Re: Infiniband 40GB
>
> On 6/4/12 6:40 AM, Alexandre DERUMIER wrote:
>> Hi,
>>
>> I'm currently doing some tests with xfs on Debian Wheezy, with the standard libc6 (2.11.3-3) and a 3.2 kernel.
>>
>> I'm watching iostat (3 nodes with 5 OSDs each), and I see constant writes to the disks (the data is flushed from journal to disk every second).
>>
>> The journal is big enough (20GB tmpfs) to hold 30s of writes.
>>
>> Do you think it's related to the missing syncfs() support ?
>>
>> -Alexandre
>
> Hi Alexandre,
>
> I've included some seekwatcher results for rados bench tests using 16
> concurrent 4MB writes on an XFS OSD. One shows Ubuntu Oneiric and the
> other Precise (i.e. no syncfs support vs. syncfs support in libc).
> Unfortunately the original test was on 0.46 and the second test was on
> 0.47.2, so multiple things changed between the tests. Both were tested
> with kernel 3.4. Interestingly the seeks/second don't seem to drop much
> but the overall performance has about doubled. This was using a single
> 7200rpm disk for the OSD data disk and a separate 7200rpm disk for the
> journal in both cases. I'd definitely try 0.47.2 with a new libc though
> and see how that works for you.
>
> ceph 0.46/oneiric:
> http://nhm.ceph.com/movies/mailinglist-tests/xfs-osd0-oneiric-3.4.mpg
>
> ceph 0.47.2/precise:
> http://nhm.ceph.com/movies/mailinglist-tests/xfs-osd0-precise-3.4.mpg
>
> Mark
>
>
>
--
--
Alexandre Derumier
Ingénieur Système
Fixe : 03 20 68 88 90
Fax : 03 20 68 90 81
45 Bvd du Général Leclerc 59100 Roubaix - France
12 rue Marivaux 75002 Paris - France
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Infiniband 40GB
[not found] <a81f3855-1c7d-447b-9bbf-6a891e372909@mailpro>
@ 2012-06-07 3:31 ` Alexandre DERUMIER
2012-06-07 11:25 ` Alexandre DERUMIER
0 siblings, 1 reply; 36+ messages in thread
From: Alexandre DERUMIER @ 2012-06-07 3:31 UTC (permalink / raw)
To: Mark Nelson; +Cc: Amon Ott, ceph-devel, Yann Dupont
Hi again,
I have done some tests with the journals on a real disk; I see the same behaviour.
iostat shows constant writes to the journal and to the disks at the same time, right from the start of the benchmark.
Maybe I should try a separate partition for each journal? (Currently I have 1 partition holding the 5 journal files, one per OSD.)
-Alexandre
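
Giving each OSD its own journal partition would look roughly like this in ceph.conf (a sketch with hypothetical device names; the thread's actual setup uses one shared partition with 5 journal files):

```ini
; Hypothetical layout: one raw journal partition per OSD,
; instead of 5 journal files on a single shared partition.
[osd.0]
    osd journal = /dev/sdc1
[osd.1]
    osd journal = /dev/sdc2
```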
----- Original message -----
From: "Alexandre DERUMIER" <aderumier@odiso.com>
To: "Mark Nelson" <mark.nelson@inktank.com>
Cc: "Amon Ott" <a.ott@m-privacy.de>, ceph-devel@vger.kernel.org, "Yann Dupont" <Yann.Dupont@univ-nantes.fr>
Sent: Thursday 7 June 2012 05:11:15
Subject: Re: Infiniband 40GB
Hi Mark,
I have attached a blktrace of /dev/sdb1 of node1 (osd.0)
and also iostat (showing constant writes)
bench used:
rados -p pool3 bench 60 write -t 16
kernel used: 3.4 from Inktank
I'll do tests with journal on an xfs partition today
----- Original message -----
From: "Mark Nelson" <mark.nelson@inktank.com>
To: "Alexandre DERUMIER" <aderumier@odiso.com>
Cc: "Amon Ott" <a.ott@m-privacy.de>, ceph-devel@vger.kernel.org, "Yann Dupont" <Yann.Dupont@univ-nantes.fr>
Sent: Wednesday 6 June 2012 18:43:50
Subject: Re: Infiniband 40GB
Hi Alexandre,
If you can run blktrace during your test on one of the OSD data disks
and send me the results I can take a look at them. Also, the rados
bench settings and output would be useful too.
Thanks,
Mark
On 6/6/12 11:05 AM, Alexandre DERUMIER wrote:
> Hi, I have rebuilt my cluster with Ubuntu Precise:
>
> -kernel 3.2
> -ceph 0.47.2
> -libc6 2.15
> -3 nodes - 5 OSDs (xfs) per node and 1 tmpfs with 5 journal files.
>
> I launched rados bench,
> and again I see constant writes to xfs...
>
> Maybe this is related to tmpfs ?
>
>
> I'll retry with kernel 3.4 from Inktank tomorrow.
> I'll also try with journal on a physical disk with xfs partition.
>
> I'll keep you in touch.
>
>
> ----- Original message -----
>
> From: "Mark Nelson"<mark.nelson@inktank.com>
> To: "Alexandre DERUMIER"<aderumier@odiso.com>
> Cc: "Amon Ott"<a.ott@m-privacy.de>, ceph-devel@vger.kernel.org, "Yann Dupont"<Yann.Dupont@univ-nantes.fr>
> Sent: Monday 4 June 2012 14:59:58
> Subject: Re: Infiniband 40GB
>
> On 6/4/12 6:40 AM, Alexandre DERUMIER wrote:
>> Hi,
>>
>> I'm currently doing some tests with xfs on Debian Wheezy, with the standard libc6 (2.11.3-3) and a 3.2 kernel.
>>
>> I'm watching iostat (3 nodes with 5 OSDs each), and I see constant writes to the disks (the data is flushed from journal to disk every second).
>>
>> The journal is big enough (20GB tmpfs) to hold 30s of writes.
>>
>> Do you think it's related to the missing syncfs() support ?
>>
>> -Alexandre
>
> Hi Alexandre,
>
> I've included some seekwatcher results for rados bench tests using 16
> concurrent 4MB writes on an XFS OSD. One shows Ubuntu Oneiric and the
> other Precise (i.e. no syncfs support vs. syncfs support in libc).
> Unfortunately the original test was on 0.46 and the second test was on
> 0.47.2, so multiple things changed between the tests. Both were tested
> with kernel 3.4. Interestingly the seeks/second don't seem to drop much
> but the overall performance has about doubled. This was using a single
> 7200rpm disk for the OSD data disk and a separate 7200rpm disk for the
> journal in both cases. I'd definitely try 0.47.2 with a new libc though
> and see how that works for you.
>
> ceph 0.46/oneiric:
> http://nhm.ceph.com/movies/mailinglist-tests/xfs-osd0-oneiric-3.4.mpg
>
> ceph 0.47.2/precise:
> http://nhm.ceph.com/movies/mailinglist-tests/xfs-osd0-precise-3.4.mpg
>
> Mark
>
>
>
--
--
Alexandre Derumier
Ingénieur Système
Fixe : 03 20 68 88 90
Fax : 03 20 68 90 81
45 Bvd du Général Leclerc 59100 Roubaix - France
12 rue Marivaux 75002 Paris - France
^ permalink raw reply [flat|nested] 36+ messages in thread
end of thread, other threads:[~2012-06-07 17:15 UTC | newest]
Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-06-03 8:10 Infiniband 40GB Stefan Priebe
2012-06-03 12:56 ` Mark Nelson
2012-06-04 6:22 ` Hannes Reinecke
2012-06-04 7:26 ` Stefan Priebe - Profihost AG
2012-06-04 7:39 ` Hannes Reinecke
2012-06-04 7:53 ` Stefan Priebe - Profihost AG
2012-06-04 8:02 ` Hannes Reinecke
2012-06-04 8:23 ` Stefan Majer
2012-06-04 9:21 ` Yann Dupont
2012-06-04 9:35 ` Alexandre DERUMIER
2012-06-04 9:53 ` Yann Dupont
2012-06-04 9:47 ` Amon Ott
2012-06-04 9:58 ` Yann Dupont
2012-06-04 11:40 ` Alexandre DERUMIER
2012-06-04 12:59 ` Mark Nelson
2012-06-04 13:07 ` Alexandre DERUMIER
2012-06-04 13:28 ` Mark Nelson
2012-06-04 15:11 ` Gregory Farnum
2012-06-04 15:34 ` Mark Nelson
2012-06-06 16:05 ` Alexandre DERUMIER
2012-06-06 16:43 ` Mark Nelson
2012-06-04 15:42 ` Stefan Priebe
2012-06-05 7:08 ` Amon Ott
2012-06-05 7:46 ` Stefan Priebe - Profihost AG
2012-06-06 10:48 ` Stefan Priebe - Profihost AG
2012-06-06 10:57 ` Amon Ott
2012-06-06 11:02 ` Stefan Priebe - Profihost AG
2012-06-07 11:33 ` Amon Ott
2012-06-07 12:44 ` Stefan Priebe - Profihost AG
2012-06-05 8:54 ` Stefan Priebe - Profihost AG
2012-06-04 12:28 ` Mark Nelson
2012-06-04 12:34 ` Tomasz Paszkowski
2012-06-04 12:40 ` Mark Nelson
[not found] <a81f3855-1c7d-447b-9bbf-6a891e372909@mailpro>
2012-06-07 3:31 ` Alexandre DERUMIER
2012-06-07 11:25 ` Alexandre DERUMIER
2012-06-07 17:15 ` Mark Nelson