From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx3-rdu2.redhat.com ([66.187.233.73]:59996 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S2389725AbeHARM6 (ORCPT ); Wed, 1 Aug 2018 13:12:58 -0400 Subject: [PATCH 23/33] vfs: Provide documentation for new mount API [ver #11] From: David Howells To: viro@zeniv.linux.org.uk Cc: torvalds@linux-foundation.org, dhowells@redhat.com, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Date: Wed, 01 Aug 2018 16:26:42 +0100 Message-ID: <153313720207.13253.5684200137365900841.stgit@warthog.procyon.org.uk> In-Reply-To: <153313703562.13253.5766498657900728120.stgit@warthog.procyon.org.uk> References: <153313703562.13253.5766498657900728120.stgit@warthog.procyon.org.uk> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit Sender: linux-fsdevel-owner@vger.kernel.org List-ID: Provide documentation for the new mount API. Signed-off-by: David Howells --- Documentation/filesystems/mount_api.txt | 706 +++++++++++++++++++++++++++++++ 1 file changed, 706 insertions(+) create mode 100644 Documentation/filesystems/mount_api.txt diff --git a/Documentation/filesystems/mount_api.txt b/Documentation/filesystems/mount_api.txt new file mode 100644 index 000000000000..5fec78eed4f4 --- /dev/null +++ b/Documentation/filesystems/mount_api.txt @@ -0,0 +1,706 @@ + ==================== + FILESYSTEM MOUNT API + ==================== + +CONTENTS + + (1) Overview. + + (2) The filesystem context. + + (3) The filesystem context operations. + + (4) Filesystem context security. + + (5) VFS filesystem context operations. + + (6) Parameter description. + + (7) Parameter helper functions. + + +======== +OVERVIEW +======== + +The creation of new mounts is now to be done in a multistep process: + + (1) Create a filesystem context. + + (2) Parse the parameters and attach them to the context. Parameters are + expected to be passed individually from userspace, though legacy binary + parameters can also be handled. + + (3) Validate and pre-process the context. + + (4) Get or create a superblock and mountable root. + + (5) Perform the mount. + + (6) Return an error message attached to the context. + + (7) Destroy the context. + +To support this, the file_system_type struct gains a new field: + + int (*init_fs_context)(struct fs_context *fc, struct dentry *reference); + +which is invoked to set up the filesystem-specific parts of a filesystem +context, including the additional space. The reference parameter is used to +convey a superblock and an automount point or a point to reconfigure from which +the filesystem may draw extra information (such as namespaces) for submount +(FS_CONTEXT_FOR_SUBMOUNT) or reconfiguration (FS_CONTEXT_FOR_RECONFIGURE) +purposes - otherwise it will be NULL. + +Note that security initialisation is done *after* the filesystem is called so +that the namespaces may be adjusted first. + +And the super_operations struct gains one field: + + int (*reconfigure)(struct super_block *, struct fs_context *); + +This shadows the ->reconfigure() operation and takes a prepared filesystem +context instead of the mount flags and data page. It may modify the sb_flags +in the context for the caller to pick up. + +[NOTE] reconfigure is intended as a replacement for remount_fs. + + +====================== +THE FILESYSTEM CONTEXT +====================== + +The creation and reconfiguration of a superblock is governed by a filesystem +context. This is represented by the fs_context structure: + + struct fs_context { + const struct fs_context_operations *ops; + struct file_system_type *fs_type; + void *fs_private; + struct dentry *root; + struct user_namespace *user_ns; + struct net *net_ns; + const struct cred *cred; + char *source; + char *subtype; + void *security; + void *s_fs_info; + unsigned int sb_flags; + enum fs_context_purpose purpose:8; + bool sloppy:1; + bool silent:1; + ... + }; + +The fs_context fields are as follows: + + (*) const struct fs_context_operations *ops + + These are operations that can be done on a filesystem context (see + below). This must be set by the ->init_fs_context() file_system_type + operation. + + (*) struct file_system_type *fs_type + + A pointer to the file_system_type of the filesystem that is being + constructed or reconfigured. This retains a reference on the type owner. + + (*) void *fs_private + + A pointer to the file system's private data. This is where the filesystem + will need to store any options it parses. + + (*) struct dentry *root + + A pointer to the root of the mountable tree (and indirectly, the + superblock thereof). This is filled in by the ->get_tree() op. If this + is set, an active reference on root->d_sb must also be held. + + (*) struct user_namespace *user_ns + (*) struct net *net_ns + + There are a subset of the namespaces in use by the invoking process. They + retain references on each namespace. The subscribed namespaces may be + replaced by the filesystem to reflect other sources, such as the parent + mount superblock on an automount. + + (*) const struct cred *cred + + The mounter's credentials. This retains a reference on the credentials. + + (*) char *source + + This specifies the source. It may be a block device (e.g. /dev/sda1) or + something more exotic, such as the "host:/path" that NFS desires. + + (*) char *subtype + + This is a string to be added to the type displayed in /proc/mounts to + qualify it (used by FUSE). This is available for the filesystem to set if + desired. + + (*) void *security + + A place for the LSMs to hang their security data for the superblock. The + relevant security operations are described below. + + (*) void *s_fs_info + + The proposed s_fs_info for a new superblock, set in the superblock by + sget_fc(). This can be used to distinguish superblocks. + + (*) unsigned int sb_flags + + This holds the SB_* flags to be set in super_block::s_flags. + + (*) enum fs_context_purpose + + This indicates the purpose for which the context is intended. The + available values are: + + FS_CONTEXT_FOR_USER_MOUNT, -- New superblock for user-specified mount + FS_CONTEXT_FOR_KERNEL_MOUNT, -- New superblock for kernel-internal mount + FS_CONTEXT_FOR_SUBMOUNT -- New automatic submount of extant mount + FS_CONTEXT_FOR_RECONFIGURE -- Change an existing mount + + (*) bool sloppy + (*) bool silent + + These are set if the sloppy or silent mount options are given. + + [NOTE] sloppy is probably unnecessary when userspace passes over one + option at a time since the error can just be ignored if userspace deems it + to be unimportant. + + [NOTE] silent is probably redundant with sb_flags & SB_SILENT. + +The mount context is created by calling vfs_new_fs_context(), vfs_sb_reconfig() +or vfs_dup_fs_context() and is destroyed with put_fs_context(). Note that the +structure is not refcounted. + +VFS, security and filesystem mount options are set individually with +vfs_parse_mount_option(). Options provided by the old mount(2) system call as +a page of data can be parsed with generic_parse_monolithic(). + +When mounting, the filesystem is allowed to take data from any of the pointers +and attach it to the superblock (or whatever), provided it clears the pointer +in the mount context. + +The filesystem is also allowed to allocate resources and pin them with the +mount context. For instance, NFS might pin the appropriate protocol version +module. + + +================================= +THE FILESYSTEM CONTEXT OPERATIONS +================================= + +The filesystem context points to a table of operations: + + struct fs_context_operations { + void (*free)(struct fs_context *fc); + int (*dup)(struct fs_context *fc, struct fs_context *src_fc); + int (*parse_param)(struct fs_context *fc, + struct struct fs_parameter *param); + int (*parse_monolithic)(struct fs_context *fc, void *data, + size_t data_size); + int (*validate)(struct fs_context *fc); + int (*get_tree)(struct fs_context *fc); + }; + +These operations are invoked by the various stages of the mount procedure to +manage the filesystem context. They are as follows: + + (*) void (*free)(struct fs_context *fc); + + Called to clean up the filesystem-specific part of the filesystem context + when the context is destroyed. It should be aware that parts of the + context may have been removed and NULL'd out by ->get_tree(). + + (*) int (*dup)(struct fs_context *fc, struct fs_context *src_fc); + + Called when a filesystem context has been duplicated to duplicate the + filesystem-private data. An error may be returned to indicate failure to + do this. + + [!] Note that even if this fails, put_fs_context() will be called + immediately thereafter, so ->dup() *must* make the + filesystem-private data safe for ->free(). + + (*) int (*parse_param)(struct fs_context *fc, + struct struct fs_parameter *param); + + Called when a parameter is being added to the filesystem context. param + points to the key name and maybe a value object. VFS-specific options + will have been weeded out and fc->sb_flags updated in the context. + Security options will also have been weeded out and fc->security updated. + + The parameter can be parsed with fs_parse() and fs_lookup_param(). Note + that the source(s) are presented as parameters named "source". + + If successful, 0 should be returned or a negative error code otherwise. + + (*) int (*parse_monolithic)(struct fs_context *fc, + void *data, size_t data_size); + + Called when the mount(2) system call is invoked to pass the entire data + page in one go. If this is expected to be just a list of "key[=val]" + items separated by commas, then this may be set to NULL. + + The return value is as for ->parse_param(). + + If the filesystem (e.g. NFS) needs to examine the data first and then + finds it's the standard key-val list then it may pass it off to + generic_parse_monolithic(). + + (*) int (*validate)(struct fs_context *fc); + + Called when all the options have been applied and the mount is about to + take place. It is should check for inconsistencies from mount options and + it is also allowed to do preliminary resource acquisition. For instance, + the core NFS module could load the NFS protocol module here. + + Note that if fc->purpose == FS_CONTEXT_FOR_RECONFIGURE, some of the + options necessary for a new mount may not be set. + + The return value is as for ->parse_option(). + + (*) int (*get_tree)(struct fs_context *fc); + + Called to get or create the mountable root and superblock, using the + information stored in the filesystem context (reconfiguration goes via a + different vector). It may detach any resources it desires from the + filesystem context and transfer them to the superblock it creates. + + On success it should set fc->root to the mountable root and return 0. In + the case of an error, it should return a negative error code. + + The phase on a userspace-driven context will be set to only allow this to + be called once on any particular context. + + +=========================== +FILESYSTEM CONTEXT SECURITY +=========================== + +The filesystem context contains a security pointer that the LSMs can use for +building up a security context for the superblock to be mounted. There are a +number of operations used by the new mount code for this purpose: + + (*) int security_fs_context_alloc(struct fs_context *fc, + struct dentry *reference); + + Called to initialise fc->security (which is preset to NULL) and allocate + any resources needed. It should return 0 on success or a negative error + code on failure. + + reference will be non-NULL if the context is being created for superblock + reconfiguration (FS_CONTEXT_FOR_RECONFIGURE) in which case it indicates + the root dentry of the superblock to be reconfigured. It will also be + non-NULL in the case of a submount (FS_CONTEXT_FOR_SUBMOUNT) in which case + it indicates the automount point. + + (*) int security_fs_context_dup(struct fs_context *fc, + struct fs_context *src_fc); + + Called to initialise fc->security (which is preset to NULL) and allocate + any resources needed. The original filesystem context is pointed to by + src_fc and may be used for reference. It should return 0 on success or a + negative error code on failure. + + (*) void security_fs_context_free(struct fs_context *fc); + + Called to clean up anything attached to fc->security. Note that the + contents may have been transferred to a superblock and the pointer cleared + during get_tree. + + (*) int security_fs_context_parse_param(struct fs_context *fc, + struct fs_parameter *param); + + Called for each mount parameter, including the source. The arguments are + as for the ->parse_param() method. It should return 0 to indicate that + the parameter should be passed on to the filesystem, 1 to indicate that + the parameter should be discarded or an error to indicate that the + parameter should be rejected. + + The value pointed to by param may be modified (if a string) or stolen + (provided the value pointer is NULL'd out). If it is stolen, 1 must be + returned to prevent it being passed to the filesystem. + + (*) int security_fs_context_validate(struct fs_context *fc); + + Called after all the options have been parsed to validate the collection + as a whole and to do any necessary allocation so that + security_sb_get_tree() is less likely to fail. It should return 0 or a + negative error code. + + (*) int security_sb_get_tree(struct fs_context *fc); + + Called during the mount procedure to verify that the specified superblock + is allowed to be mounted and to transfer the security data there. It + should return 0 or a negative error code. + + (*) int security_sb_mountpoint(struct fs_context *fc, struct path *mountpoint, + unsigned int mnt_flags); + + Called during the mount procedure to verify that the root dentry attached + to the context is permitted to be attached to the specified mountpoint. + It should return 0 on success or a negative error code on failure. + + +================================= +VFS FILESYSTEM CONTEXT OPERATIONS +================================= + +There are four operations for creating a filesystem context and +one for destroying a context: + + (*) struct fs_context *vfs_new_fs_context(struct file_system_type *fs_type, + struct dentry *reference, + unsigned int sb_flags, + enum fs_context_purpose purpose); + + Create a filesystem context for a given filesystem type and purpose. This + allocates the filesystem context, sets the flags, initialises the security + and calls fs_type->init_fs_context() to initialise the filesystem private + data. + + reference can be NULL or it may indicate the root dentry of a superblock + that is going to be reconfigured (FS_CONTEXT_FOR_RECONFIGURE) or the + automount point that triggered a submount (FS_CONTEXT_FOR_SUBMOUNT). This + is provided as a source of namespace information. + + (*) struct fs_context *vfs_sb_reconfig(struct vfsmount *mnt, + unsigned int sb_flags); + + Create a filesystem context from the same filesystem as an extant mount + and initialise the mount parameters from the superblock underlying that + mount. This is for use by superblock parameter reconfiguration. + + (*) struct fs_context *vfs_dup_fs_context(struct fs_context *src_fc); + + Duplicate a filesystem context, copying any options noted and duplicating + or additionally referencing any resources held therein. This is available + for use where a filesystem has to get a mount within a mount, such as NFS4 + does by internally mounting the root of the target server and then doing a + private pathwalk to the target directory. + + (*) void put_fs_context(struct fs_context *fc); + + Destroy a filesystem context, releasing any resources it holds. This + calls the ->free() operation. This is intended to be called by anyone who + created a filesystem context. + + [!] filesystem contexts are not refcounted, so this causes unconditional + destruction. + +In all the above operations, apart from the put op, the return is a mount +context pointer or a negative error code. + +For the remaining operations, if an error occurs, a negative error code will be +returned. + + (*) int vfs_get_tree(struct fs_context *fc); + + Get or create the mountable root and superblock, using the parameters in + the filesystem context to select/configure the superblock. This invokes + the ->validate() op and then the ->get_tree() op. + + [NOTE] ->validate() could perhaps be rolled into ->get_tree() and + ->reconfigure(). + + (*) struct vfsmount *vfs_create_mount(struct fs_context *fc); + + Create a mount given the parameters in the specified filesystem context. + Note that this does not attach the mount to anything. + + (*) int vfs_parse_fs_param(struct fs_context *fc, + struct fs_parameter *param); + + Supply a single mount parameter to the filesystem context. This include + the specification of the source/device which is specified as the "source" + parameter (which may be specified multiple times if the filesystem + supports that). + + param specifies the parameter key name and the value. The parameter is + first checked to see if it corresponds to a standard mount flag (in which + case it is used to set an SB_xxx flag and consumed) or a security option + (in which case the LSM consumes it) before it is passed on to the + filesystem. + + The parameter value is typed and can be one of: + + fs_value_is_flag, Parameter not given a value. + fs_value_is_string, Value is a string + fs_value_is_blob, Value is a binary blob + fs_value_is_filename, Value is a filename* + dirfd + fs_value_is_filename_empty, Value is a filename* + dirfd + AT_EMPTY_PATH + fs_value_is_file, Value is an open file (file*) + + If there is a value, that value is stored in a union in the struct in one + of param->{string,blob,name,file}. Note that the function may steal and + clear the pointer, but then becomes responsible for disposing of the + object. + + (*) int vfs_parse_fs_string(struct fs_context *fc, char *key, + const char *value, size_t v_size); + + A wrapper around vfs_parse_fs_param() that just passes a constant string. + + (*) int generic_parse_monolithic(struct fs_context *fc, + void *data, size_t data_len); + + Parse a sys_mount() data page, assuming the form to be a text list + consisting of key[=val] options separated by commas. Each item in the + list is passed to vfs_mount_option(). This is the default when the + ->parse_monolithic() operation is NULL. + + +===================== +PARAMETER DESCRIPTION +===================== + +Parameters are described using structures defined in linux/fs_parser.h. +There's a core description struct that links everything together: + + struct fs_parameter_description { + const char name[16]; + u8 nr_params; + u8 nr_keys; + u8 nr_enums; + bool ignore_unknown; + bool no_source; + const struct constant_table *keys; + const struct fs_parameter_spec *specs; + const struct fs_parameter_enum *enums; + }; + +For example: + + enum afs_param { + Opt_autocell, + Opt_bar, + Opt_dyn, + Opt_foo, + Opt_source, + nr__afs_params + }; + + static const struct fs_parameter_description afs_fs_parameters = { + .name = "kAFS", + .nr_params = nr__afs_params, + .nr_keys = ARRAY_SIZE(afs_param_keys), + .nr_enums = ARRAY_SIZE(afs_param_enums), + .keys = afs_param_keys, + .specs = afs_param_specs, + .enums = afs_param_enums, + }; + +The members are as follows: + + (1) const char name[16]; + + The name to be used in error messages generated by the parse helper + functions. + + (2) u8 nr_params; + + The number of discrete parameter identifiers. This indicates the number + of elements in the ->types[] array and also limits the values that may be + used in the values that the ->keys[] array maps to. + + It is expected that, for example, two parameters that are related, say + "acl" and "noacl" with have the same ID, but will be flagged to indicate + that one is the inverse of the other. The value can then be picked out + from the parse result. + + (3) const struct fs_parameter_specification *specs; + + Table of parameter specifications, where the entries are of type: + + struct fs_parameter_type { + enum fs_parameter_spec type:8; + u8 flags; + }; + + and the parameter identifier is the index to the array. 'type' indicates + the desired value type and must be one of: + + TYPE NAME EXPECTED VALUE RESULT IN + ======================= ======================= ===================== + fs_param_takes_no_value No value n/a + fs_param_is_bool Boolean value result->boolean + fs_param_is_u32 32-bit unsigned int result->uint_32 + fs_param_is_u32_octal 32-bit octal int result->uint_32 + fs_param_is_u32_hex 32-bit hex int result->uint_32 + fs_param_is_s32 32-bit signed int result->int_32 + fs_param_is_enum Enum value name result->uint_32 + fs_param_is_string Arbitrary string param->string + fs_param_is_blob Binary blob param->blob + fs_param_is_blockdev Blockdev path * Needs lookup + fs_param_is_path Path * Needs lookup + fs_param_is_fd File descriptor param->file + + And each parameter can be qualified with 'flags': + + fs_param_v_optional The value is optional + fs_param_neg_with_no If key name is prefixed with "no", it is false + fs_param_neg_with_empty If value is "", it is false + fs_param_deprecated The parameter is deprecated. + + For example: + + static const struct fs_parameter_spec afs_param_specs[nr__afs_params] = { + [Opt_autocell] = { fs_param_takes_no_value }, + [Opt_bar] = { fs_param_is_enum }, + [Opt_dyn] = { fs_param_takes_no_value }, + [Opt_foo] = { fs_param_is_bool, fs_param_neg_with_no }, + [Opt_source] = { fs_param_is_string }, + }; + + Note that if the value is of fs_param_is_bool type, fs_parse() will try + to match any string value against "0", "1", "no", "yes", "false", "true". + + (4) const struct constant_table *keys; + u8 nr_keys; + + Table of key name to parameter ID mappings and the number of elements in + the table. This is optional if ->nr_params is 0. The table is just an + array of { name, integer } pairs, e.g.: + + static const struct constant_table afs_param_keys[] = { + { "autocell", Opt_autocell }, + { "bar", Opt_bar }, + { "dyn", Opt_dyn }, + { "foo", Opt_foo }, + { "source", Opt_source }, + }; + + [!] NOTE that the table must be sorted for bsearch() to use strcmp() to + compare the entries. + + The parameter ID can also be fs_param_key_removed to indicate that a + deprecated parameter has been removed and that an error will be given. + This differs from fs_param_deprecated where the parameter may still have + an effect. + + (5) const struct fs_parameter_enum *enums; + u8 nr_enums; + + Table of enum value names to integer mappings and the number of elements + stored therein. This is of type: + + struct fs_parameter_enum { + u8 param_id; + char name[14]; + u8 value; + }; + + Where the array is an unsorted list of { parameter ID, name }-keyed + elements that indicate the value to map to, e.g.: + + static const struct fs_parameter_enum afs_param_enums[] = { + { Opt_bar, "x", 1}, + { Opt_bar, "y", 23}, + { Opt_bar, "z", 42}, + }; + + If a parameter of type fs_param_is_enum is encountered, fs_parse() will + try to look the value up in the enum table and the result will be stored + in the parse result. + + (6) bool ignore_unknown; + + If this is set, fs_parse() will not generate an error for unknown + parameters, but will rather return 0. + + (7) bool no_source; + + If this is set, fs_parse() will ignore any "source" parameter and not + pass it to the filesystem. + +The parser should be pointed to by the parser pointer in the file_system_type +struct as this will provide validation on registration (if +CONFIG_VALIDATE_FS_PARSER=y) and will allow the description to be queried from +userspace using the fsinfo() syscall. + + +========================== +PARAMETER HELPER FUNCTIONS +========================== + +A number of helper functions are provided to help a filesystem or an LSM +process the parameters it is given. + + (*) int lookup_constant(const struct constant_table tbl[], + const char *name, int not_found); + + Look up a constant by name in a table of name -> integer mappings. The + table is an array of elements of the following type: + + struct constant_table { + const char *name; + int value; + }; + + and it must be sorted such that it can be searched using bsearch() using + strcmp(). If a match is found, the corresponding value is returned. If a + match isn't found, the not_found value is returned instead. + + (*) bool validate_constant_table(const struct constant_table *tbl, + size_t tbl_size, + int low, int high, int special); + + Validate a constant table. Checks that all the elements are appropriately + ordered, that there are no duplicates and that the values are between low + and high inclusive, though provision is made for one allowable special + value outside of that range. If no special value is required, special + should just be set to lie inside the low-to-high range. + + If all is good, true is returned. If the table is invalid, errors are + logged to dmesg, the stack is dumped and false is returned. + + (*) int fs_parse(struct fs_context *fc, + const struct fs_param_parser *parser, + struct fs_parameter *param, + struct fs_param_parse_result *result); + + This is the main interpreter of parameters. It uses the parameter + description (parser) to look up the name of the parameter to use and to + convert that to a parameter ID (stored in result->key). + + If successful, and if the parameter type indicates the result is a + boolean, integer or enum type, the value is converted by this function and + the result stored in result->{boolean,int_32,uint_32}. + + If a match isn't initially made, the key is prefixed with "no" and no + value is present then an attempt will be made to look up the key with the + prefix removed. If this matches a parameter for which the type has flag + fs_param_neg_with_no set, then a match will be made and the value will be + set to false/0/NULL. + + If the parameter is successfully matched and, optionally, parsed + correctly, 1 is returned. If the parameter isn't matched and + parser->ignore_unknown is set, then 0 is returned. Otherwise -EINVAL is + returned. + + (*) bool fs_validate_parser(const char *name, + const struct fs_param_parser *parser); + + This is validates the parameter description. 'name' is used to illuminate + any error messages that are logged. It returns true if the description is + good and false if it is not. + + (*) int fs_lookup_param(struct fs_context *fc, + const struct fs_param_parser *parser, + struct fs_parameter *value, + struct fs_parse_result *result, + struct path *_path); + + This takes the result of fs_parse() for string and filename type + parameters and attemps to do a path lookup on them. If the parameter + expects a blockdev, a check is made that the inode actually represents + one. + + Returns 0 if successful and *_path will be set; returns a negative error + code if not.