* NFS over Ceph
@ 2012-04-23 23:50 Calvin Morrow
  2012-04-24  0:12 ` Tommi Virtanen
  2012-04-24  3:01 ` Sage Weil
  0 siblings, 2 replies; 5+ messages in thread
From: Calvin Morrow @ 2012-04-23 23:50 UTC (permalink / raw)
  To: ceph-devel

I've been testing a couple different use scenarios with Ceph 0.45
(two-node cluster, single mon, active/standby mds).  I have a pair of
KVM virtual machines acting as ceph clients to re-export iSCSI over
RBD block devices, and also NFS over a Ceph mount (mount -t ceph).
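
For reference, the relevant pieces look roughly like the following; the
monitor address, keyring path, and client network are placeholders
rather than my exact values:

  # kernel ceph mount on the NFS gateway VM
  mount -t ceph 192.168.1.10:6789:/ /mnt/ceph \
      -o name=admin,secretfile=/etc/ceph/admin.secret

  # /etc/exports entry on the same host (fsid= since nfsd has no
  # device UUID to derive an export id from)
  /mnt/ceph  192.168.1.0/24(rw,no_subtree_check,fsid=1)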

The iSCSI re-export is going very well.  So far I haven't had any
issues to speak of (even while testing Pacemaker based failover).

The NFS re-export isn't going nearly as well.  I'm running into
several issues with reliability, speed, etc.  To start with, file
operations seem painfully slow.  Copying over multiple 20 KB files
takes > 10 seconds per file.  A "dd if=/dev/zero of=..." goes very
fast once the data transfer starts, but the actual opening of the file
can take nearly as long (or longer depending on size).

I've also run into cases where the directory mounted as ceph
(/mnt/ceph) "hangs" on the NFS server requiring a reboot of the NFS
server.

That said, are there any special recommendations regarding exporting
Ceph through NFS?  I know that the wiki, and also the kernel source
(still present as of 3.3.3), indicate:

* NFS export support
*
* NFS re-export of a ceph mount is, at present, only semireliable.
* The basic issue is that the Ceph architecture doesn't lend itself
* well to generating filehandles that will remain valid forever.

Should I be trying this a different way?  NFS export of a filesystem
(ext4 / xfs) on RBD?  Other options?  Also, does the filehandle
limitation specified above apply to more than NFS (such as a KVM image
using a file on a ceph mount for storage backing)?

Any insight would be appreciated.

Calvin


* Re: NFS over Ceph
  2012-04-23 23:50 NFS over Ceph Calvin Morrow
@ 2012-04-24  0:12 ` Tommi Virtanen
  2012-04-24  3:01 ` Sage Weil
  1 sibling, 0 replies; 5+ messages in thread
From: Tommi Virtanen @ 2012-04-24  0:12 UTC (permalink / raw)
  To: Calvin Morrow; +Cc: ceph-devel

On Mon, Apr 23, 2012 at 16:50, Calvin Morrow <calvin.morrow@gmail.com> wrote:
> Should I be trying this a different way?  NFS export of a filesystem
> (ext4 / xfs) on RBD?  Other options?  Also, does the filehandle
> limitation specified above apply to more than NFS (such as a KVM image
> using a file on a ceph mount for storage backing)?

I can't really help you with most of your email right now, but I
wanted to add a tidbit that wasn't mentioned in your message:

There's been work to export a Ceph filesystem, without mounting &
re-exporting, by using nfs-ganesha and either a custom libceph mapper,
or the FUSE mapping. Ceph has a FUSE daemon, and ganesha claims to be
able to work with just about any FUSE filesystem... so if you're
actually investing R&D into this, that route might be worth exploring.

http://sourceforge.net/apps/trac/nfs-ganesha/
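
If it helps, a ganesha export block for the Ceph FSAL is roughly this
shape; the option names are from ganesha's sample configs as I remember
them and may not match the release you end up with, so treat it as a
sketch rather than a working config:

  EXPORT
  {
      Export_Id = 1;
      Path = "/";
      Pseudo = "/ceph";
      Access_Type = RW;
      FSAL {
          Name = CEPH;
      }
  }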

Perhaps someone else can fill in their experiences with that line of study.


* Re: NFS over Ceph
  2012-04-23 23:50 NFS over Ceph Calvin Morrow
  2012-04-24  0:12 ` Tommi Virtanen
@ 2012-04-24  3:01 ` Sage Weil
  2012-04-24  5:19   ` Calvin Morrow
  1 sibling, 1 reply; 5+ messages in thread
From: Sage Weil @ 2012-04-24  3:01 UTC (permalink / raw)
  To: Calvin Morrow; +Cc: ceph-devel

On Mon, 23 Apr 2012, Calvin Morrow wrote:
> I've been testing a couple different use scenarios with Ceph 0.45
> (two-node cluster, single mon, active/standby mds).  I have a pair of
> KVM virtual machines acting as ceph clients to re-export iSCSI over
> RBD block devices, and also NFS over a Ceph mount (mount -t ceph).
> 
> The iSCSI re-export is going very well.  So far I haven't had any
> issues to speak of (even while testing Pacemaker based failover).
> 
> The NFS re-export isn't going nearly as well.  I'm running into
> several issues with reliability, speed, etc.  To start with, file
> operations seem painfully slow.  Copying over multiple 20 KB files
> takes > 10 seconds per file.  A "dd if=/dev/zero of=..." goes very
> fast once the data transfer starts, but the actual opening of the file
> can take nearly as long (or longer depending on size).

Can you try with the 'async' option in your exports file?  I think the 
main problem with the slowness is because of what nfsd is doing with 
syncs, but want to confirm that.
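
For example, something along these lines in /etc/exports (the client
network is a placeholder):

  /mnt/ceph  192.168.1.0/24(rw,async,no_subtree_check)

followed by 'exportfs -ra' to pick up the change.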

Generally speaking, there is an unfortunate disconnect between the NFS and 
Ceph metadata protocols.  Ceph tries to do lots of operations and sync 
periodically and on-demand (e.g., when you fsync() a directory).  NFS, 
OTOH, says you should sync every operation, which is usually pretty 
horrible for performance unless you have NVRAM or an SSD or something.

We haven't invested much time/thought into what the best behavior should 
be here... NFS is pretty far down our list at the moment.

sage

> 
> I've also run into cases where the directory mounted as ceph
> (/mnt/ceph) "hangs" on the NFS server requiring a reboot of the NFS
> server.
> 
> That said, are there any special recommendations regarding exporting
> Ceph through NFS?  I know that the wiki, and also the kernel source
> (still present as of 3.3.3), indicate:
> 
> * NFS export support
> *
> * NFS re-export of a ceph mount is, at present, only semireliable.
> * The basic issue is that the Ceph architecture doesn't lend itself
> * well to generating filehandles that will remain valid forever.
> 
> Should I be trying this a different way?  NFS export of a filesystem
> (ext4 / xfs) on RBD?  Other options?  Also, does the filehandle
> limitation specified above apply to more than NFS (such as a KVM image
> using a file on a ceph mount for storage backing)?
> 
> Any insight would be appreciated.
> 
> Calvin


* Re: NFS over Ceph
  2012-04-24  3:01 ` Sage Weil
@ 2012-04-24  5:19   ` Calvin Morrow
  2012-04-24 13:29     ` Sage Weil
  0 siblings, 1 reply; 5+ messages in thread
From: Calvin Morrow @ 2012-04-24  5:19 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

On Mon, Apr 23, 2012 at 9:01 PM, Sage Weil <sage@newdream.net> wrote:
> On Mon, 23 Apr 2012, Calvin Morrow wrote:
>> I've been testing a couple different use scenarios with Ceph 0.45
>> (two-node cluster, single mon, active/standby mds).  I have a pair of
>> KVM virtual machines acting as ceph clients to re-export iSCSI over
>> RBD block devices, and also NFS over a Ceph mount (mount -t ceph).
>>
>> The iSCSI re-export is going very well.  So far I haven't had any
>> issues to speak of (even while testing Pacemaker based failover).
>>
>> The NFS re-export isn't going nearly as well.  I'm running into
>> several issues with reliability, speed, etc.  To start with, file
>> operations seem painfully slow.  Copying over multiple 20 KB files
>> takes > 10 seconds per file.  A "dd if=/dev/zero of=..." goes very
>> fast once the data transfer starts, but the actual opening of the file
>> can take nearly as long (or longer depending on size).
>
> Can you try with the 'async' option in your exports file?  I think the
> main problem with the slowness is because of what nfsd is doing with
> syncs, but want to confirm that.
>

async didn't make a difference.  I thought this pretty strange, so I
decided to try mounting a separate dir with the ceph-fuse client
instead of the native kernel client.  The result was a night and
day difference.  I pushed a good 79 GB (my home directory) through the
nfs server (sync) attached to the fuse client at an average speed of
~68 MB / sec over consumer gigabit.
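
The fuse mount itself was nothing special, roughly (monitor address is
a placeholder):

  ceph-fuse -m 192.168.1.10:6789 /mnt/ceph-fuse

exported with 'sync' as mentioned above.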

Just for completeness, I re-exported the native kernel client (after
verifying it could browse ok, read / write files, etc.) and I was back
to __very__ slow metadata ops (just a simple `ls` takes > 1 min).

Calvin

> Generally speaking, there is an unfortunate disconnect between the NFS and
> Ceph metadata protocols.  Ceph tries to do lots of operations and sync
> periodically and on-demand (e.g., when you fsync() a directory).  NFS,
> OTOH, says you should sync every operation, which is usually pretty
> horrible for performance unless you have NVRAM or an SSD or something.
>
> We haven't invested much time/thought into what the best behavior should
> be here... NFS is pretty far down our list at the moment.
>
> sage
>
>>
>> I've also run into cases where the directory mounted as ceph
>> (/mnt/ceph) "hangs" on the NFS server requiring a reboot of the NFS
>> server.
>>
>> That said, are there any special recommendations regarding exporting
>> Ceph through NFS?  I know that the wiki, and also the kernel source
>> (still present as of 3.3.3), indicate:
>>
>> * NFS export support
>> *
>> * NFS re-export of a ceph mount is, at present, only semireliable.
>> * The basic issue is that the Ceph architecture doesn't lend itself
>> * well to generating filehandles that will remain valid forever.
>>
>> Should I be trying this a different way?  NFS export of a filesystem
>> (ext4 / xfs) on RBD?  Other options?  Also, does the filehandle
>> limitation specified above apply to more than NFS (such as a KVM image
>> using a file on a ceph mount for storage backing)?
>>
>> Any insight would be appreciated.
>>
>> Calvin


* Re: NFS over Ceph
  2012-04-24  5:19   ` Calvin Morrow
@ 2012-04-24 13:29     ` Sage Weil
  0 siblings, 0 replies; 5+ messages in thread
From: Sage Weil @ 2012-04-24 13:29 UTC (permalink / raw)
  To: Calvin Morrow; +Cc: ceph-devel


On Mon, 23 Apr 2012, Calvin Morrow wrote:
> On Mon, Apr 23, 2012 at 9:01 PM, Sage Weil <sage@newdream.net> wrote:
> > On Mon, 23 Apr 2012, Calvin Morrow wrote:
> >> I've been testing a couple different use scenarios with Ceph 0.45
> >> (two-node cluster, single mon, active/standby mds).  I have a pair of
> >> KVM virtual machines acting as ceph clients to re-export iSCSI over
> >> RBD block devices, and also NFS over a Ceph mount (mount -t ceph).
> >>
> >> The iSCSI re-export is going very well.  So far I haven't had any
> >> issues to speak of (even while testing Pacemaker based failover).
> >>
> >> The NFS re-export isn't going nearly as well.  I'm running into
> >> several issues with reliability, speed, etc.  To start with, file
> >> operations seem painfully slow.  Copying over multiple 20 KB files
> >> takes > 10 seconds per file.  A "dd if=/dev/zero of=..." goes very
> >> fast once the data transfer starts, but the actual opening of the file
> >> can take nearly as long (or longer depending on size).
> >
> > Can you try with the 'async' option in your exports file?  I think the
> > main problem with the slowness is because of what nfsd is doing with
> > syncs, but want to confirm that.
> >
> 
> async didn't make a difference.  I thought this pretty strange, so I
> decided to try mounting a separate dir with the ceph-fuse client
> instead of the native kernel client.  The result was a night and
> day difference.  I pushed a good 79 GB (my home directory) through the
> nfs server (sync) attached to the fuse client at an average speed of
> ~68 MB / sec over consumer gigabit.
> 
> Just for completeness, I re-exported the native kernel client (after
> verifying it could browse ok, read / write files, etc.) and I was back
> to __very__ slow metadata ops (just a simple `ls` takes > 1 min).

Can you generate an mds log with 'debug ms = 1' in the [mds] section of 
your config so we can see which operations are taking so long?
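
In ceph.conf that would be something like:

  [mds]
      debug ms = 1

followed by an mds restart so it takes effect.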

A kernel client log would also be helpful.  If you run the script 
src/script/kcon_most.sh on the re-exporting host, ceph will spam the kernel
debug log with copious amounts of information that should show up in your 
kern.log.
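
If you'd rather poke at it by hand, that is roughly equivalent to
enabling dynamic debug for the ceph modules (the path below assumes
debugfs is mounted in the usual place):

  echo 'module ceph +p' > /sys/kernel/debug/dynamic_debug/control
  echo 'module libceph +p' > /sys/kernel/debug/dynamic_debug/control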

Thanks!
sage



> 
> Calvin
> 
> > Generally speaking, there is an unfortunate disconnect between the NFS and
> > Ceph metadata protocols.  Ceph tries to do lots of operations and sync
> > periodically and on-demand (e.g., when you fsync() a directory).  NFS,
> > OTOH, says you should sync every operation, which is usually pretty
> > horrible for performance unless you have NVRAM or an SSD or something.
> >
> > We haven't invested much time/thought into what the best behavior should
> > be here... NFS is pretty far down our list at the moment.
> >
> > sage
> >
> >>
> >> I've also run into cases where the directory mounted as ceph
> >> (/mnt/ceph) "hangs" on the NFS server requiring a reboot of the NFS
> >> server.
> >>
> >> That said, are there any special recommendations regarding exporting
> >> Ceph through NFS?  I know that the wiki, and also the kernel source
> >> (still present as of 3.3.3), indicate:
> >>
> >> * NFS export support
> >> *
> >> * NFS re-export of a ceph mount is, at present, only semireliable.
> >> * The basic issue is that the Ceph architecture doesn't lend itself
> >> * well to generating filehandles that will remain valid forever.
> >>
> >> Should I be trying this a different way?  NFS export of a filesystem
> >> (ext4 / xfs) on RBD?  Other options?  Also, does the filehandle
> >> limitation specified above apply to more than NFS (such as a KVM image
> >> using a file on a ceph mount for storage backing)?
> >>
> >> Any insight would be appreciated.
> >>
> >> Calvin

