* CephFS usability
From: John Spray @ 2016-07-21 12:11 UTC
  To: Ceph Development

Dear list,

I'm collecting ideas for making CephFS easier to use.  This list
includes some preexisting stuff, as well as some recent ideas from
people working on the code.  I'm looking for feedback on what's here,
and any extra ideas people have.

Some of the items here are dependent on ceph-mgr (like the enhanced
status views, client statistics), some aren't.  The general theme is
to make things less arcane, and make the state of the system easier to
understand.

Please share your thoughts.

Cheers,
John


Simpler kernel client setup
 * Allow mount.ceph to use the same keyring file that the FUSE client
uses, instead of requiring users to strip the secret out of that file
manually. (http://tracker.ceph.com/issues/16656)
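As a rough sketch of the parsing involved (the keyring is plain
INI-style text, so a helper only has to pull out the "key =" line for
the right entity; the path and entity name below are just assumptions):

import re

def secret_from_keyring(keyring_path, entity="client.admin"):
    """Return the base64 secret for `entity` from an INI-style Ceph keyring."""
    section = None
    with open(keyring_path) as f:
        for line in f:
            line = line.strip()
            header = re.match(r"\[(.+)\]$", line)
            if header:
                section = header.group(1)
            elif section == entity:
                found = re.match(r"key\s*=\s*(\S+)", line)
                if found:
                    return found.group(1)
    raise KeyError("no key for %s in %s" % (entity, keyring_path))

# e.g. secret_from_keyring("/etc/ceph/ceph.client.admin.keyring")

The real change would of course live in mount.ceph itself; the point is
only that no separate secretfile should be needed.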

Simpler multi-fs use from ceph-fuse
 * A nicer syntax than having to pass --client_mds_namespace
 * A way to specify the chosen filesystem in fstab

Mount-less administrative shell/commands:
 * A lightweight python shell enabling admins to manipulate their
filesystem without a full blown client mount
 * Friendlier commands than current setxattr syntax for layouts and quotas
 * Enable administrators to inspect the filesystem (ls, cd, stat, etc)
 * Enable administrators to configure things like directories for
users with quotas, mapping directories to pools
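To illustrate the kind of operations such a shell would wrap, here is a
minimal sketch using the python-cephfs (libcephfs) bindings; the path,
pool name and quota value are assumptions, and the constructor keywords
vary a little between releases:

import cephfs  # python-cephfs / libcephfs bindings

fs = cephfs.LibCephFS(conffile="/etc/ceph/ceph.conf")
fs.mount()
try:
    # Create a project directory, pin it to a data pool, give it a quota:
    fs.mkdir("/projects/alpha", 0o755)
    fs.setxattr("/projects/alpha", "ceph.dir.layout.pool", b"cephfs_data_ssd", 0)
    fs.setxattr("/projects/alpha", "ceph.quota.max_bytes", b"107374182400", 0)
    # ...and inspect it, all without a kernel or FUSE mount.
    print(fs.stat("/projects/alpha"))
finally:
    fs.unmount()

A friendlier CLI would hide the virtual xattr names behind commands
like "quota set" or "layout set".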

CephFS daemon/recovery status view:
 * Currently we see the text status of replay/clientreplay etc in "ceph status"
 * A more detailed "ceph fs status" view that breaks down each MDS
daemon's state
 * What we'd really like to see in these modes is progress (% of
segments replayed, % of clients replayed, % of clients reconnected)
and timing information (e.g. in reconnect, something like "waiting for
1 client for another 30 seconds")
 * Maybe also display some other high level perf stats per-MDS like
client requests per second within this view.
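For the "ceph fs status" breakdown, the raw data is already in the
FSMap; a rough sketch of pulling it out by shelling out to the existing
CLI (the JSON field names here are assumptions and have shifted between
releases):

import json
import subprocess

dump = json.loads(subprocess.check_output(
    ["ceph", "fs", "dump", "--format=json"]).decode())

for fs in dump.get("filesystems", []):
    mdsmap = fs.get("mdsmap", {})
    print("fs: %s" % mdsmap.get("fs_name"))
    for daemon in mdsmap.get("info", {}).values():
        # States like "up:replay", "up:reconnect", "up:clientreplay", "up:active"
        print("  mds.%s  %s" % (daemon.get("name"), daemon.get("state")))

The missing pieces are the progress and timing figures next to each
state.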

CephFS full system dstat-like view
 * Currently have "daemonperf mds.<foo>" asok mechanism, which is
local to one MDS and does not give OSD statistics
 * Add a "ceph fs perf" command that fuses multi-mds data with OSD
data to give users a single view of the level of metadata and data IO
across the system
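A sketch of the polling loop behind such a view, sampling one MDS over
its admin socket; the daemon name and the counter path
(mds_server.handle_client_request) are assumptions, and a real
"ceph fs perf" would merge this across MDS daemons and add OSD pool
statistics:

import json
import subprocess
import time

MDS = "mds.a"  # assumed daemon name

def sample():
    out = subprocess.check_output(["ceph", "daemon", MDS, "perf", "dump"])
    return json.loads(out.decode())

prev = sample()
while True:
    time.sleep(1)
    cur = sample()
    delta = (cur["mds_server"]["handle_client_request"]
             - prev["mds_server"]["handle_client_request"])
    print("%s: %d client requests/s" % (MDS, delta))
    prev = cur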

Client statistics
 * Implement the "live performance probes" mechanism
http://tracker.ceph.com/projects/ceph/wiki/Live_Performance_Probes
 * This is the same infrastructure as would be used for e.g. "rbd top"
image listing.
 * Initially could just be a "client top" view with 5-10 key stats per
client, where we collect data for the busiest 10-20 clients (on
modest-size systems this is likely to be all clients in practice); a
rough sketch follows this list
 * Full feature would have per-path filtering, so that admin could say
"which subtree is busy?  OK, which client is busy within that
subtree?".

Orchestrated backward scrub (aka cephfs-data-scan, #12143):
 * Wrap it in a central CLI that runs a pool of workers
 * Those workers could be embedded in standby mgrs, in standby mdss,
or standalone
 * Need a work queue type mechanism, probably via RADOS objects.
 * This is http://tracker.ceph.com/issues/12143
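As a sketch of the work queue idea: each work unit is a RADOS object,
and a worker claims one with an exclusive lock before running the
corresponding cephfs-data-scan invocation.  Pool and object names are
assumptions, and lock_exclusive()/ObjectBusy assume a rados binding
version that exposes them:

import rados

POOL = "cephfs_metadata"     # assumed pool used to hold the queue
PREFIX = "scan_workitem."    # assumed object-name prefix for work units

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
ioctx = cluster.open_ioctx(POOL)

def enqueue(n_workers):
    # One object per work unit; the payload here is just "worker_n worker_m".
    for i in range(n_workers):
        ioctx.write_full(PREFIX + str(i), ("%d %d" % (i, n_workers)).encode())

def claim_and_run(worker_id):
    for obj in ioctx.list_objects():
        if not obj.key.startswith(PREFIX):
            continue
        try:
            ioctx.lock_exclusive(obj.key, "datascan", "worker-%d" % worker_id)
        except rados.ObjectBusy:
            continue  # another worker already owns this item
        payload = ioctx.read(obj.key).decode()
        # A real worker would invoke cephfs-data-scan with arguments derived
        # from the payload; here we only report what would run.
        print("worker %d claimed %s (%s)" % (worker_id, obj.key, payload))
        ioctx.remove_object(obj.key)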

Single command hard client eviction:
 * Wrap the process of blacklisting a client *and* evicting it from
all MDS daemons
 * Similar procedure currently done in CephFSVolumeClient.evict
 * This is http://tracker.ceph.com/issues/9754
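Roughly what the wrapper has to do (this mirrors
CephFSVolumeClient.evict; the exact "tell ... session evict" spelling
differs between releases, so the commands below are assumptions):

import subprocess

def ceph(*args):
    return subprocess.check_output(("ceph",) + args).decode()

def evict_client(client_id, client_addr, mds_names):
    # 1. Blacklist the client's address so it can no longer talk to OSDs.
    ceph("osd", "blacklist", "add", client_addr)
    # 2. Drop its session on every MDS daemon.
    for mds in mds_names:
        ceph("tell", mds, "session", "evict", "id=%d" % client_id)

# evict_client(4305, "10.0.0.5:0/123456789", ["mds.a", "mds.b"])  # assumed values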

Simplified client auth caps creation:
 * Wrap the process of creating a client identity that has just the
right MDS+OSD capabilities for accessing a particular filesystem
("ceph fs authorize client.foo" instead of "ceph auth get-or-create
... ...")


* Re: CephFS usability
From: Eric Eastman @ 2016-07-21 14:33 UTC
  To: John Spray; +Cc: Ceph Development

I have been playing with CephFS since before Firefly and I am now
trying to put it into production. You listed a lot of good ideas, but
many of them seem aimed at developers and at those trying to get the
absolute best performance out of CephFS, more than at helping with
day-to-day administration or with using CephFS in a typical
environment.  The three things I really need are:

1. Documentation.  The documents for administering CephFS on the Ceph
site and the man pages are incomplete.  As an example, having to open
a ticket to find the magic options to get ACL support on a FUSE mount
(see ticket #15783) is very inefficient for both the end user and the
Ceph engineers. There are a huge number of mount options between the
kernel and FUSE clients, with very little explanation of what they do
and what the performance costs of using them are.  The new cap
protections are great, if you can figure them out.  The same goes for
the new recovery tools.

2. Snapshots.  Without snapshots I cannot put the file system into
production for all my use cases.

3. Full support for the Samba and Ganesha CephFS modules, including
HA.  Although these modules are not owned by the Ceph team, they need
to be solid for CephFS to be usable at a lot of sites. The GlusterFS
team seems to be doing a lot of work on these interfaces, and it would
be nice if that work were also being done for Ceph.

To make CephFS easier to use in the field, I would really like to see
the base functionality well supported and documented before focusing
on new things. Thank you for everything you are doing.

Eric Eastman


* Re: CephFS usability
From: John Spray @ 2016-07-21 15:03 UTC
  To: Eric Eastman; +Cc: Ceph Development

On Thu, Jul 21, 2016 at 3:33 PM, Eric Eastman
<eric.eastman@keepertech.com> wrote:
> [...]
> To make CephFS easier to use in the field, I would really like to see
> the base functionality well supported and documented before focusing
> on new things. Thank you for everything you are doing.

Let me clarify a bit: this isn't about prioritising usability work vs.
anything else, it's about gathering a list of tasks so that we have a
good to-do list when folks are working in that area.  There are
increasingly many people working on CephFS (hooray!), so it has become
more important to have a nicely primed queue of work on lots of
different fronts so that we can work on them in parallel.

Documentation is an ongoing issue in most projects (especially open
source).  The challenge in getting a big overhaul of the docs is that
most vendors have their own downstream/product documentation, which
can leave the upstream documentation a bit less loved than we would
like.  I steer people towards contributing to the upstream
documentation wherever possible.

Snapshots and Samba/NFS are of course full blown features rather than
usability items.  There is work going on for all of these (Zheng
recently made lots of snapshot fixes, and a Ganesha engineer is
currently working on building NFS-Ganesha into our continuous
integration).

John

* Re: CephFS usability
From: Zhi Zhang @ 2016-07-25  5:40 UTC
  To: John Spray; +Cc: Ceph Development

Hi John,

Thanks for sharing the CephFS to-do list with us. It helps CephFS
users better understand the direction CephFS is heading in. We have
worked with kernel CephFS for more than a year and have already put it
into production to serve some user scenarios, and we are planning to
adopt kernel CephFS to serve more of our users.

Here are 3 items from my past experience with kernel CephFS. I think
they could make CephFS easier and faster to adopt.

1. Backport fixes from newer kernels to standard Linux distribution
kernels
* Most Linux distributions still ship a 3.10.x kernel. Kernel CephFS
has some functional and stability issues on 3.10.x, even on the latest
3.10.101, and AFAIK some companies, including mine, cannot easily
upgrade their kernels to 4.x. So I have had to backport nearly 3+
years' worth of fixes into 3.10.x gradually, and to deal with
incompatibilities between those fixes across kernel versions. It costs
a lot of time and effort to backport, test and verify them, but the
benefit is obvious: kernel CephFS has become quite stable and
well-performing for us. So I think that if each fix could be backported
to older kernel versions when it is committed, it would save a lot of
effort and make things much easier.

2. Differentiate log levels
* Currently Ceph's kernel modules have very few log levels. Once we
enable logging to reproduce a performance or stability bug, the whole
system might hang because of the log flood, so I have had to add my own
logging on critical paths. If we could differentiate log levels, I
think it would benefit and help developers.
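For reference, the only runtime knob today is dynamic debug, which is
per module (or per file/line) and effectively all-or-nothing; a sketch,
assuming a kernel built with CONFIG_DYNAMIC_DEBUG and debugfs mounted
in the usual place:

CONTROL = "/sys/kernel/debug/dynamic_debug/control"

def set_ceph_kernel_debug(enable=True):
    # The ceph/libceph modules log via pr_debug, so this turns everything
    # on or off per module; there is no per-level filtering in between.
    flag = "+p" if enable else "-p"
    for module in ("ceph", "libceph"):
        with open(CONTROL, "w") as f:
            f.write("module %s %s\n" % (module, flag))

# set_ceph_kernel_debug(True)  # beware: this is exactly the log flood above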

3. Limited quota support in kernel CephFS
* I know that implementing quota support in kernel CephFS the way
ceph-fuse does it is quite difficult. What I mention here is more of an
idea, to see whether there is any possibility of at least limiting the
number of files per directory. We are facing the issue that one
directory may contain millions of files.
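For comparison, this is all it takes on the ceph-fuse/libcephfs side
today, since quotas there are driven by a virtual xattr (the mount path
and limit are assumptions, and enforcement may also need the client
quota option enabled); the kernel client does not enforce it:

import os

DIR = "/mnt/cephfs/projects/alpha"   # assumed ceph-fuse mount path
os.setxattr(DIR, "ceph.quota.max_files", b"1000000")  # cap directory entries
print(os.getxattr(DIR, "ceph.quota.max_files"))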

Thank you for everything you have done for CephFS.

Regards,
Zhi Zhang (David)
Contact: zhang.david2011@gmail.com
              zhangz.david@outlook.com

* Re: CephFS usability
From: Ilya Dryomov @ 2016-07-25  8:15 UTC
  To: Zhi Zhang; +Cc: John Spray, Ceph Development

On Mon, Jul 25, 2016 at 7:40 AM, Zhi Zhang <zhang.david2011@gmail.com> wrote:
> [...]
>
> 1. Backport fixes from newer kernels to standard Linux distribution
> kernels
> * Most Linux distributions still ship a 3.10.x kernel.

I don't think that's true ;)

> Kernel CephFS has some functional and stability issues on 3.10.x,
> even on the latest 3.10.101. [...] So I think that if each fix could
> be backported to older kernel versions when it is committed, it would
> save a lot of effort and make things much easier.

While we can certainly do a better job of backporting select CephFS
fixes to upstream stable kernels, it can only get us so far,
especially if the target is a 3-year-old kernel.

Have you considered using RHEL7 kernels?  They are 3.10-based, with RBD
and CephFS regularly rebased to later upstream versions.  They include
a large number of upstream fixes, and some features too.

>
> 2. Differentiate log levels
> * Currently Ceph's kernel modules have very few log levels. Once we
> enable logging to reproduce a performance or stability bug, the whole
> system might hang because of the log flood, so I have had to add my
> own logging on critical paths. If we could differentiate log levels,
> I think it would benefit and help developers.

That's definitely a problem, especially with rate limiting and the
widespread adoption of journald.  I've been wanting to add an ftrace
option, but that depends on a kernel patch which I need to finish.
Different log levels or at least tracepoints in functions like send()
and handle_reply() (so you don't have to enable logging) are on the
TODO list.

Thanks,

                Ilya

