linux-fsdevel.vger.kernel.org archive mirror
* [PATCH 00/24] orangefs: page cache
@ 2018-03-20 17:02 Martin Brandenburg
  2018-03-20 17:02 ` [PATCH 01/24] orangefs: make several *_operations structs static Martin Brandenburg
                   ` (23 more replies)
  0 siblings, 24 replies; 25+ messages in thread
From: Martin Brandenburg @ 2018-03-20 17:02 UTC (permalink / raw)
  To: hubcap, linux-fsdevel; +Cc: Martin Brandenburg

This series is significantly reworked since the last posting.

It starts with some unrelated cleanups, which will be submitted for
4.17 in any case.

There is nothing too unusual here, but this is a big change, and it is
time to get it posted.

Martin Brandenburg (24):
  orangefs: make several *_operations structs static
  orangefs: remove unused code
  orangefs: create uapi interface
  orangefs: open code short single-use functions
  orangefs: implement vm_ops->fault
  orangefs: implement xattr cache
  orangefs: simpler installation documentation
  orangefs: add tracepoint for service_operation
  orangefs: tracepoints for orangefs_devreq_{read,write_iter,poll}
  orangefs: do not invalidate attributes on inode create
  orangefs: simplify orangefs_inode_getattr interface
  orangefs: update attributes rather than relying on server
  orangefs: hold i_lock during inode_getattr
  orangefs: set up and use backing_dev_info
  orangefs: let setattr write to cached inode
  orangefs: reorganize setattr functions to track attribute changes
  orangefs: remove orangefs_readpages
  orangefs: service ops done for writeback are not killable
  orangefs: migrate to generic_file_read_iter
  orangefs: implement writepage
  orangefs: skip inode writeout if nothing to write
  orangefs: write range tracking
  orangefs: tracepoints for readpage and writeback
  orangefs: tracepoints for getattr, setattr, and write_inode

 Documentation/filesystems/orangefs.txt |  84 ++--
 fs/orangefs/Makefile                   |   4 +-
 fs/orangefs/acl.c                      |   5 +-
 fs/orangefs/dcache.c                   |   4 +-
 fs/orangefs/devorangefs-req.c          |  66 +--
 fs/orangefs/dir.c                      |   4 +-
 fs/orangefs/downcall.h                 | 137 ------
 fs/orangefs/file.c                     | 295 ++++---------
 fs/orangefs/inode.c                    | 471 +++++++++++++++------
 fs/orangefs/namei.c                    |  54 ++-
 fs/orangefs/orangefs-bufmap.c          |   1 -
 fs/orangefs/orangefs-cache.c           |   1 -
 fs/orangefs/orangefs-debug.h           |  33 --
 fs/orangefs/orangefs-debugfs.c         |   8 +-
 fs/orangefs/orangefs-dev-proto.h       |  61 ---
 fs/orangefs/orangefs-kernel.h          | 136 ++----
 fs/orangefs/orangefs-mod.c             |   1 -
 fs/orangefs/orangefs-sysfs.c           |  12 +-
 fs/orangefs/orangefs-trace.c           |   3 +
 fs/orangefs/orangefs-trace.h           | 241 +++++++++++
 fs/orangefs/orangefs-utils.c           | 192 +++++----
 fs/orangefs/protocol.h                 | 339 ---------------
 fs/orangefs/super.c                    |  50 ++-
 fs/orangefs/symlink.c                  |   1 -
 fs/orangefs/upcall.h                   | 260 ------------
 fs/orangefs/waitqueue.c                |  36 +-
 fs/orangefs/xattr.c                    | 112 ++++-
 include/uapi/linux/orangefs.h          | 732 +++++++++++++++++++++++++++++++++
 28 files changed, 1825 insertions(+), 1518 deletions(-)
 delete mode 100644 fs/orangefs/downcall.h
 delete mode 100644 fs/orangefs/orangefs-dev-proto.h
 create mode 100644 fs/orangefs/orangefs-trace.c
 create mode 100644 fs/orangefs/orangefs-trace.h
 delete mode 100644 fs/orangefs/upcall.h
 create mode 100644 include/uapi/linux/orangefs.h

-- 
2.16.2


* [PATCH 01/24] orangefs: make several *_operations structs static
  2018-03-20 17:02 [PATCH 00/24] orangefs: page cache Martin Brandenburg
@ 2018-03-20 17:02 ` Martin Brandenburg
  2018-03-20 17:02 ` [PATCH 02/24] orangefs: remove unused code Martin Brandenburg
                   ` (22 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Martin Brandenburg @ 2018-03-20 17:02 UTC (permalink / raw)
  To: hubcap, linux-fsdevel; +Cc: Martin Brandenburg

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
---
 fs/orangefs/devorangefs-req.c | 52 +++++++++++++++++++++----------------------
 fs/orangefs/inode.c           |  4 ++--
 fs/orangefs/orangefs-kernel.h |  3 ---
 3 files changed, 28 insertions(+), 31 deletions(-)

diff --git a/fs/orangefs/devorangefs-req.c b/fs/orangefs/devorangefs-req.c
index b03057afac2a..04da19bf000e 100644
--- a/fs/orangefs/devorangefs-req.c
+++ b/fs/orangefs/devorangefs-req.c
@@ -779,9 +779,35 @@ static long orangefs_devreq_compat_ioctl(struct file *filp, unsigned int cmd,
 
 #endif /* CONFIG_COMPAT is in .config */
 
+static __poll_t orangefs_devreq_poll(struct file *file,
+				      struct poll_table_struct *poll_table)
+{
+	__poll_t poll_revent_mask = 0;
+
+	poll_wait(file, &orangefs_request_list_waitq, poll_table);
+
+	if (!list_empty(&orangefs_request_list))
+		poll_revent_mask |= EPOLLIN;
+	return poll_revent_mask;
+}
+
 /* the assigned character device major number */
 static int orangefs_dev_major;
 
+static const struct file_operations orangefs_devreq_file_operations = {
+	.owner = THIS_MODULE,
+	.read = orangefs_devreq_read,
+	.write_iter = orangefs_devreq_write_iter,
+	.open = orangefs_devreq_open,
+	.release = orangefs_devreq_release,
+	.unlocked_ioctl = orangefs_devreq_ioctl,
+
+#ifdef CONFIG_COMPAT		/* CONFIG_COMPAT is in .config */
+	.compat_ioctl = orangefs_devreq_compat_ioctl,
+#endif
+	.poll = orangefs_devreq_poll
+};
+
 /*
  * Initialize orangefs device specific state:
  * Must be called at module load time only
@@ -814,29 +840,3 @@ void orangefs_dev_cleanup(void)
 		     "*** /dev/%s character device unregistered ***\n",
 		     ORANGEFS_REQDEVICE_NAME);
 }
-
-static __poll_t orangefs_devreq_poll(struct file *file,
-				      struct poll_table_struct *poll_table)
-{
-	__poll_t poll_revent_mask = 0;
-
-	poll_wait(file, &orangefs_request_list_waitq, poll_table);
-
-	if (!list_empty(&orangefs_request_list))
-		poll_revent_mask |= EPOLLIN;
-	return poll_revent_mask;
-}
-
-const struct file_operations orangefs_devreq_file_operations = {
-	.owner = THIS_MODULE,
-	.read = orangefs_devreq_read,
-	.write_iter = orangefs_devreq_write_iter,
-	.open = orangefs_devreq_open,
-	.release = orangefs_devreq_release,
-	.unlocked_ioctl = orangefs_devreq_ioctl,
-
-#ifdef CONFIG_COMPAT		/* CONFIG_COMPAT is in .config */
-	.compat_ioctl = orangefs_devreq_compat_ioctl,
-#endif
-	.poll = orangefs_devreq_poll
-};
diff --git a/fs/orangefs/inode.c b/fs/orangefs/inode.c
index fe1d705ad91f..79c61da8b1bc 100644
--- a/fs/orangefs/inode.c
+++ b/fs/orangefs/inode.c
@@ -138,7 +138,7 @@ static ssize_t orangefs_direct_IO(struct kiocb *iocb,
 }
 
 /** ORANGEFS2 implementation of address space operations */
-const struct address_space_operations orangefs_address_operations = {
+static const struct address_space_operations orangefs_address_operations = {
 	.readpage = orangefs_readpage,
 	.readpages = orangefs_readpages,
 	.invalidatepage = orangefs_invalidatepage,
@@ -307,7 +307,7 @@ int orangefs_update_time(struct inode *inode, struct timespec *time, int flags)
 }
 
 /* ORANGEDS2 implementation of VFS inode operations for files */
-const struct inode_operations orangefs_file_inode_operations = {
+static const struct inode_operations orangefs_file_inode_operations = {
 	.get_acl = orangefs_get_acl,
 	.set_acl = orangefs_set_acl,
 	.setattr = orangefs_setattr,
diff --git a/fs/orangefs/orangefs-kernel.h b/fs/orangefs/orangefs-kernel.h
index eebbaece85ef..f49d53de8901 100644
--- a/fs/orangefs/orangefs-kernel.h
+++ b/fs/orangefs/orangefs-kernel.h
@@ -487,14 +487,11 @@ extern struct list_head *orangefs_htable_ops_in_progress;
 extern spinlock_t orangefs_htable_ops_in_progress_lock;
 extern int hash_table_size;
 
-extern const struct address_space_operations orangefs_address_operations;
-extern const struct inode_operations orangefs_file_inode_operations;
 extern const struct file_operations orangefs_file_operations;
 extern const struct inode_operations orangefs_symlink_inode_operations;
 extern const struct inode_operations orangefs_dir_inode_operations;
 extern const struct file_operations orangefs_dir_operations;
 extern const struct dentry_operations orangefs_dentry_operations;
-extern const struct file_operations orangefs_devreq_file_operations;
 
 extern wait_queue_head_t orangefs_bufmap_init_waitq;
 
-- 
2.16.2


* [PATCH 02/24] orangefs: remove unused code
  2018-03-20 17:02 [PATCH 00/24] orangefs: page cache Martin Brandenburg
  2018-03-20 17:02 ` [PATCH 01/24] orangefs: make several *_operations structs static Martin Brandenburg
@ 2018-03-20 17:02 ` Martin Brandenburg
  2018-03-20 17:02 ` [PATCH 03/24] orangefs: create uapi interface Martin Brandenburg
                   ` (21 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Martin Brandenburg @ 2018-03-20 17:02 UTC (permalink / raw)
  To: hubcap, linux-fsdevel; +Cc: Martin Brandenburg

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
---
 fs/orangefs/orangefs-debug.h  |  6 ----
 fs/orangefs/orangefs-kernel.h | 77 -------------------------------------------
 fs/orangefs/protocol.h        | 45 -------------------------
 3 files changed, 128 deletions(-)

diff --git a/fs/orangefs/orangefs-debug.h b/fs/orangefs/orangefs-debug.h
index c7db56a31b92..6e079d4230d0 100644
--- a/fs/orangefs/orangefs-debug.h
+++ b/fs/orangefs/orangefs-debug.h
@@ -43,12 +43,6 @@
 #define GOSSIP_MAX_NR                 16
 #define GOSSIP_MAX_DEBUG              (((__u64)1 << GOSSIP_MAX_NR) - 1)
 
-/*function prototypes*/
-__u64 ORANGEFS_kmod_eventlog_to_mask(const char *event_logging);
-__u64 ORANGEFS_debug_eventlog_to_mask(const char *event_logging);
-char *ORANGEFS_debug_mask_to_eventlog(__u64 mask);
-char *ORANGEFS_kmod_mask_to_eventlog(__u64 mask);
-
 /* a private internal type */
 struct __keyword_mask_s {
 	const char *keyword;
diff --git a/fs/orangefs/orangefs-kernel.h b/fs/orangefs/orangefs-kernel.h
index f49d53de8901..c29bb0ebc6bb 100644
--- a/fs/orangefs/orangefs-kernel.h
+++ b/fs/orangefs/orangefs-kernel.h
@@ -65,11 +65,7 @@
 #define ORANGEFS_REQDEVICE_NAME          "pvfs2-req"
 
 #define ORANGEFS_DEVREQ_MAGIC             0x20030529
-#define ORANGEFS_LINK_MAX                 0x000000FF
 #define ORANGEFS_PURGE_RETRY_COUNT     0x00000005
-#define ORANGEFS_MAX_NUM_OPTIONS          0x00000004
-#define ORANGEFS_MAX_MOUNT_OPT_LEN        0x00000080
-#define ORANGEFS_MAX_FSKEY_LEN            64
 
 #define MAX_DEV_REQ_UPSIZE (2 * sizeof(__s32) +   \
 sizeof(__u64) + sizeof(struct orangefs_upcall_s))
@@ -112,15 +108,6 @@ extern const struct xattr_handler *orangefs_xattr_handlers[];
 extern struct posix_acl *orangefs_get_acl(struct inode *inode, int type);
 extern int orangefs_set_acl(struct inode *inode, struct posix_acl *acl, int type);
 
-/*
- * Redefine xtvec structure so that we could move helper functions out of
- * the define
- */
-struct xtvec {
-	__kernel_off_t xtv_off;		/* must be off_t */
-	__kernel_size_t xtv_len;	/* must be size_t */
-};
-
 /*
  * orangefs data structures
  */
@@ -224,39 +211,6 @@ struct orangefs_sb_info_s {
 	struct list_head list;
 };
 
-/*
- * structure that holds the state of any async I/O operation issued
- * through the VFS. Needed especially to handle cancellation requests
- * or even completion notification so that the VFS client-side daemon
- * can free up its vfs_request slots.
- */
-struct orangefs_kiocb_s {
-	/* the pointer to the task that initiated the AIO */
-	struct task_struct *tsk;
-
-	/* pointer to the kiocb that kicked this operation */
-	struct kiocb *kiocb;
-
-	/* buffer index that was used for the I/O */
-	struct orangefs_bufmap *bufmap;
-	int buffer_index;
-
-	/* orangefs kernel operation type */
-	struct orangefs_kernel_op_s *op;
-
-	/* set to indicate the type of the operation */
-	int rw;
-
-	/* file offset */
-	loff_t offset;
-
-	/* and the count in bytes */
-	size_t bytes_to_be_copied;
-
-	ssize_t bytes_copied;
-	int needs_cleanup;
-};
-
 struct orangefs_stats {
 	unsigned long cache_hits;
 	unsigned long cache_misses;
@@ -305,21 +259,6 @@ static inline struct orangefs_khandle *get_khandle_from_ino(struct inode *inode)
 	return &(ORANGEFS_I(inode)->refn.khandle);
 }
 
-static inline ino_t get_ino_from_khandle(struct inode *inode)
-{
-	struct orangefs_khandle *khandle;
-	ino_t ino;
-
-	khandle = get_khandle_from_ino(inode);
-	ino = orangefs_khandle_to_ino(khandle);
-	return ino;
-}
-
-static inline ino_t get_parent_ino_from_dentry(struct dentry *dentry)
-{
-	return get_ino_from_khandle(dentry->d_parent->d_inode);
-}
-
 static inline int is_root_handle(struct inode *inode)
 {
 	gossip_debug(GOSSIP_DCACHE_DEBUG,
@@ -391,7 +330,6 @@ void fsid_key_table_finalize(void);
 /*
  * defined in inode.c
  */
-__u32 convert_to_orangefs_mask(unsigned long lite_mask);
 struct inode *orangefs_new_inode(struct super_block *sb,
 			      struct inode *dir,
 			      int mode,
@@ -410,17 +348,6 @@ int orangefs_update_time(struct inode *, struct timespec *, int);
 /*
  * defined in xattr.c
  */
-int orangefs_setxattr(struct dentry *dentry,
-		   const char *name,
-		   const void *value,
-		   size_t size,
-		   int flags);
-
-ssize_t orangefs_getxattr(struct dentry *dentry,
-		       const char *name,
-		       void *buffer,
-		       size_t size);
-
 ssize_t orangefs_listxattr(struct dentry *dentry, char *buffer, size_t size);
 
 /*
@@ -467,8 +394,6 @@ int orangefs_inode_check_changed(struct inode *inode);
 
 int orangefs_inode_setattr(struct inode *inode, struct iattr *iattr);
 
-int orangefs_unmount_sb(struct super_block *sb);
-
 bool orangefs_cancel_op_in_progress(struct orangefs_kernel_op_s *op);
 
 int orangefs_normalize_to_errno(__s32 error_code);
@@ -493,8 +418,6 @@ extern const struct inode_operations orangefs_dir_inode_operations;
 extern const struct file_operations orangefs_dir_operations;
 extern const struct dentry_operations orangefs_dentry_operations;
 
-extern wait_queue_head_t orangefs_bufmap_init_waitq;
-
 /*
  * misc convenience macros
  */
diff --git a/fs/orangefs/protocol.h b/fs/orangefs/protocol.h
index dc6e3e6269c3..61ee8d64c842 100644
--- a/fs/orangefs/protocol.h
+++ b/fs/orangefs/protocol.h
@@ -5,11 +5,6 @@
 #include <linux/slab.h>
 #include <linux/ioctl.h>
 
-/* pvfs2-config.h ***********************************************************/
-#define ORANGEFS_VERSION_MAJOR 2
-#define ORANGEFS_VERSION_MINOR 9
-#define ORANGEFS_VERSION_SUB 0
-
 /* khandle stuff  ***********************************************************/
 
 /*
@@ -70,16 +65,6 @@ static inline void ORANGEFS_khandle_from(struct orangefs_khandle *kh,
 }
 
 /* pvfs2-types.h ************************************************************/
-typedef __u32 ORANGEFS_uid;
-typedef __u32 ORANGEFS_gid;
-typedef __s32 ORANGEFS_fs_id;
-typedef __u32 ORANGEFS_permissions;
-typedef __u64 ORANGEFS_time;
-typedef __s64 ORANGEFS_size;
-typedef __u64 ORANGEFS_flags;
-typedef __u64 ORANGEFS_ds_position;
-typedef __s32 ORANGEFS_error;
-typedef __s64 ORANGEFS_offset;
 
 #define ORANGEFS_SUPER_MAGIC 0x20030528
 
@@ -145,7 +130,6 @@ typedef __s64 ORANGEFS_offset;
 #define ORANGEFS_APPEND_FL    FS_APPEND_FL
 #define ORANGEFS_NOATIME_FL   FS_NOATIME_FL
 #define ORANGEFS_MIRROR_FL    0x01000000ULL
-#define ORANGEFS_O_EXECUTE (1 << 0)
 #define ORANGEFS_FS_ID_NULL       ((__s32)0)
 #define ORANGEFS_ATTR_SYS_UID                   (1 << 0)
 #define ORANGEFS_ATTR_SYS_GID                   (1 << 1)
@@ -229,35 +213,6 @@ enum orangefs_ds_type {
 	ORANGEFS_TYPE_INTERNAL = (1 << 5)	/* for the server's private use */
 };
 
-/*
- * ORANGEFS_certificate simply stores a buffer with the buffer size.
- * The buffer can be converted to an OpenSSL X509 struct for use.
- */
-struct ORANGEFS_certificate {
-	__u32 buf_size;
-	unsigned char *buf;
-};
-
-/*
- * A credential identifies a user and is signed by the client/user
- * private key.
- */
-struct ORANGEFS_credential {
-	__u32 userid;	/* user id */
-	__u32 num_groups;	/* length of group_array */
-	__u32 *group_array;	/* groups for which the user is a member */
-	char *issuer;		/* alias of the issuing server */
-	__u64 timeout;	/* seconds after epoch to time out */
-	__u32 sig_size;	/* length of the signature in bytes */
-	unsigned char *signature;	/* digital signature */
-	struct ORANGEFS_certificate certificate;	/* user certificate buffer */
-};
-#define extra_size_ORANGEFS_credential (ORANGEFS_REQ_LIMIT_GROUPS	*	\
-				    sizeof(__u32)		+	\
-				    ORANGEFS_REQ_LIMIT_ISSUER	+	\
-				    ORANGEFS_REQ_LIMIT_SIGNATURE	+	\
-				    extra_size_ORANGEFS_certificate)
-
 /* This structure is used by the VFS-client interaction alone */
 struct ORANGEFS_keyval_pair {
 	char key[ORANGEFS_MAX_XATTR_NAMELEN];
-- 
2.16.2


* [PATCH 03/24] orangefs: create uapi interface
  2018-03-20 17:02 [PATCH 00/24] orangefs: page cache Martin Brandenburg
  2018-03-20 17:02 ` [PATCH 01/24] orangefs: make several *_operations structs static Martin Brandenburg
  2018-03-20 17:02 ` [PATCH 02/24] orangefs: remove unused code Martin Brandenburg
@ 2018-03-20 17:02 ` Martin Brandenburg
  2018-03-20 17:02 ` [PATCH 04/24] orangefs: open code short single-use functions Martin Brandenburg
                   ` (20 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Martin Brandenburg @ 2018-03-20 17:02 UTC (permalink / raw)
  To: hubcap, linux-fsdevel; +Cc: Martin Brandenburg

This formally separates userspace visible and kernel internal data
structures.  The sole consumer of this would be the pvfs2-client-core,
which currently simply defines all these structures itself.  Most of
these are shared with the rest of the OrangeFS userspace client and
server code.

As the userspace OrangeFS client and server progress, they are beginning
to change their "internal" definitions.  This will allow us to separate
data structures which are sent to the kernel from data structures
which belong to userspace.

This is accomplished by moving everything relevant from fs/orangefs/*.h
to include/uapi/linux/orangefs.h.
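
As a rough illustration of the kind of definition that ends up in the
shared header, here is a minimal userspace sketch.  The two struct
layouts mirror the ones this patch removes from fs/orangefs/protocol.h,
but int32_t stands in for __s32, and the static assertions and the
make_kref() helper are hypothetical additions for illustration only;
they are not part of this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirrors struct orangefs_khandle; the kernel spells this __aligned(8). */
struct orangefs_khandle {
	unsigned char u[16];
} __attribute__((aligned(8)));

/* Mirrors struct orangefs_object_kref; int32_t stands in for __s32. */
struct orangefs_object_kref {
	struct orangefs_khandle khandle;
	int32_t fs_id;
	int32_t __pad1;
};

/*
 * Hypothetical layout checks: both sides of the device interface must
 * agree on these sizes and offsets regardless of word size.
 */
static_assert(sizeof(struct orangefs_khandle) == 16,
	      "khandle must be 16 bytes");
static_assert(offsetof(struct orangefs_object_kref, fs_id) == 16,
	      "fs_id must follow the khandle directly");
static_assert(sizeof(struct orangefs_object_kref) == 24,
	      "explicit padding keeps the size a multiple of 8");

/* Hypothetical helper: return a zeroed object reference for fs_id. */
static inline struct orangefs_object_kref make_kref(int32_t fs_id)
{
	struct orangefs_object_kref ref;

	memset(&ref, 0, sizeof(ref));
	ref.fs_id = fs_id;
	return ref;
}
```

The explicit __pad1 member is what keeps the layout identical for 32-bit
and 64-bit userspace, which is presumably why such structures are worth
pinning down in a single header.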

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
---
 fs/orangefs/acl.c                |   1 -
 fs/orangefs/dcache.c             |   1 -
 fs/orangefs/devorangefs-req.c    |   2 -
 fs/orangefs/dir.c                |   1 -
 fs/orangefs/downcall.h           | 137 --------
 fs/orangefs/file.c               |   1 -
 fs/orangefs/inode.c              |   1 -
 fs/orangefs/namei.c              |   1 -
 fs/orangefs/orangefs-bufmap.c    |   1 -
 fs/orangefs/orangefs-cache.c     |   1 -
 fs/orangefs/orangefs-debug.h     |  27 --
 fs/orangefs/orangefs-debugfs.c   |   4 +-
 fs/orangefs/orangefs-dev-proto.h |  61 ----
 fs/orangefs/orangefs-kernel.h    |   3 +-
 fs/orangefs/orangefs-mod.c       |   1 -
 fs/orangefs/orangefs-sysfs.c     |   1 -
 fs/orangefs/orangefs-utils.c     |   2 -
 fs/orangefs/protocol.h           | 294 ----------------
 fs/orangefs/super.c              |   1 -
 fs/orangefs/symlink.c            |   1 -
 fs/orangefs/upcall.h             | 260 --------------
 fs/orangefs/waitqueue.c          |   1 -
 fs/orangefs/xattr.c              |   1 -
 include/uapi/linux/orangefs.h    | 732 +++++++++++++++++++++++++++++++++++++++
 24 files changed, 736 insertions(+), 800 deletions(-)
 delete mode 100644 fs/orangefs/downcall.h
 delete mode 100644 fs/orangefs/orangefs-dev-proto.h
 delete mode 100644 fs/orangefs/upcall.h
 create mode 100644 include/uapi/linux/orangefs.h

diff --git a/fs/orangefs/acl.c b/fs/orangefs/acl.c
index 480ea059a680..796c22f80b78 100644
--- a/fs/orangefs/acl.c
+++ b/fs/orangefs/acl.c
@@ -5,7 +5,6 @@
  * See COPYING in top-level directory.
  */
 
-#include "protocol.h"
 #include "orangefs-kernel.h"
 #include "orangefs-bufmap.h"
 #include <linux/posix_acl_xattr.h>
diff --git a/fs/orangefs/dcache.c b/fs/orangefs/dcache.c
index fe484cf93e5c..8e8e15850e39 100644
--- a/fs/orangefs/dcache.c
+++ b/fs/orangefs/dcache.c
@@ -9,7 +9,6 @@
  *  Implementation of dentry (directory cache) functions.
  */
 
-#include "protocol.h"
 #include "orangefs-kernel.h"
 
 /* Returns 1 if dentry can still be trusted, else 0. */
diff --git a/fs/orangefs/devorangefs-req.c b/fs/orangefs/devorangefs-req.c
index 04da19bf000e..f4a1eff35e59 100644
--- a/fs/orangefs/devorangefs-req.c
+++ b/fs/orangefs/devorangefs-req.c
@@ -8,9 +8,7 @@
  * See COPYING in top-level directory.
  */
 
-#include "protocol.h"
 #include "orangefs-kernel.h"
-#include "orangefs-dev-proto.h"
 #include "orangefs-bufmap.h"
 #include "orangefs-debugfs.h"
 
diff --git a/fs/orangefs/dir.c b/fs/orangefs/dir.c
index e2c2699d8016..e760315acd2a 100644
--- a/fs/orangefs/dir.c
+++ b/fs/orangefs/dir.c
@@ -3,7 +3,6 @@
  * Copyright 2017 Omnibond Systems, L.L.C.
  */
 
-#include "protocol.h"
 #include "orangefs-kernel.h"
 #include "orangefs-bufmap.h"
 
diff --git a/fs/orangefs/downcall.h b/fs/orangefs/downcall.h
deleted file mode 100644
index ea2332e16af9..000000000000
--- a/fs/orangefs/downcall.h
+++ /dev/null
@@ -1,137 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-/*
- * (C) 2001 Clemson University and The University of Chicago
- *
- * See COPYING in top-level directory.
- */
-
-/*
- *  Definitions of downcalls used in Linux kernel module.
- */
-
-#ifndef __DOWNCALL_H
-#define __DOWNCALL_H
-
-/*
- * Sanitized the device-client core interaction
- * for clean 32-64 bit usage
- */
-struct orangefs_io_response {
-	__s64 amt_complete;
-};
-
-struct orangefs_lookup_response {
-	struct orangefs_object_kref refn;
-};
-
-struct orangefs_create_response {
-	struct orangefs_object_kref refn;
-};
-
-struct orangefs_symlink_response {
-	struct orangefs_object_kref refn;
-};
-
-struct orangefs_getattr_response {
-	struct ORANGEFS_sys_attr_s attributes;
-	char link_target[ORANGEFS_NAME_MAX];
-};
-
-struct orangefs_mkdir_response {
-	struct orangefs_object_kref refn;
-};
-
-struct orangefs_statfs_response {
-	__s64 block_size;
-	__s64 blocks_total;
-	__s64 blocks_avail;
-	__s64 files_total;
-	__s64 files_avail;
-};
-
-struct orangefs_fs_mount_response {
-	__s32 fs_id;
-	__s32 id;
-	struct orangefs_khandle root_khandle;
-};
-
-/* the getxattr response is the attribute value */
-struct orangefs_getxattr_response {
-	__s32 val_sz;
-	__s32 __pad1;
-	char val[ORANGEFS_MAX_XATTR_VALUELEN];
-};
-
-/* the listxattr response is an array of attribute names */
-struct orangefs_listxattr_response {
-	__s32 returned_count;
-	__s32 __pad1;
-	__u64 token;
-	char key[ORANGEFS_MAX_XATTR_LISTLEN * ORANGEFS_MAX_XATTR_NAMELEN];
-	__s32 keylen;
-	__s32 __pad2;
-	__s32 lengths[ORANGEFS_MAX_XATTR_LISTLEN];
-};
-
-struct orangefs_param_response {
-	union {
-		__s64 value64;
-		__s32 value32[2];
-	} u;
-};
-
-#define PERF_COUNT_BUF_SIZE 4096
-struct orangefs_perf_count_response {
-	char buffer[PERF_COUNT_BUF_SIZE];
-};
-
-#define FS_KEY_BUF_SIZE 4096
-struct orangefs_fs_key_response {
-	__s32 fs_keylen;
-	__s32 __pad1;
-	char fs_key[FS_KEY_BUF_SIZE];
-};
-
-/* 2.9.6 */
-struct orangefs_features_response {
-	__u64 features;
-};
-
-struct orangefs_downcall_s {
-	__s32 type;
-	__s32 status;
-	/* currently trailer is used only by readdir */
-	__s64 trailer_size;
-	char *trailer_buf;
-
-	union {
-		struct orangefs_io_response io;
-		struct orangefs_lookup_response lookup;
-		struct orangefs_create_response create;
-		struct orangefs_symlink_response sym;
-		struct orangefs_getattr_response getattr;
-		struct orangefs_mkdir_response mkdir;
-		struct orangefs_statfs_response statfs;
-		struct orangefs_fs_mount_response fs_mount;
-		struct orangefs_getxattr_response getxattr;
-		struct orangefs_listxattr_response listxattr;
-		struct orangefs_param_response param;
-		struct orangefs_perf_count_response perf_count;
-		struct orangefs_fs_key_response fs_key;
-		struct orangefs_features_response features;
-	} resp;
-};
-
-/*
- * The readdir response comes in the trailer.  It is followed by the
- * directory entries as described in dir.c.
- */
-
-struct orangefs_readdir_response_s {
-	__u64 token;
-	__u64 directory_version;
-	__u32 __pad2;
-	__u32 orangefs_dirent_outcount;
-};
-
-#endif /* __DOWNCALL_H */
diff --git a/fs/orangefs/file.c b/fs/orangefs/file.c
index 0d228cd087e6..15ac531f19cf 100644
--- a/fs/orangefs/file.c
+++ b/fs/orangefs/file.c
@@ -9,7 +9,6 @@
  *  Linux VFS file operations.
  */
 
-#include "protocol.h"
 #include "orangefs-kernel.h"
 #include "orangefs-bufmap.h"
 #include <linux/fs.h>
diff --git a/fs/orangefs/inode.c b/fs/orangefs/inode.c
index 79c61da8b1bc..a922c9da80d6 100644
--- a/fs/orangefs/inode.c
+++ b/fs/orangefs/inode.c
@@ -10,7 +10,6 @@
  */
 
 #include <linux/bvec.h>
-#include "protocol.h"
 #include "orangefs-kernel.h"
 #include "orangefs-bufmap.h"
 
diff --git a/fs/orangefs/namei.c b/fs/orangefs/namei.c
index 6e3134e6d98a..3ba6e153f769 100644
--- a/fs/orangefs/namei.c
+++ b/fs/orangefs/namei.c
@@ -9,7 +9,6 @@
  *  Linux VFS namei operations.
  */
 
-#include "protocol.h"
 #include "orangefs-kernel.h"
 
 /*
diff --git a/fs/orangefs/orangefs-bufmap.c b/fs/orangefs/orangefs-bufmap.c
index 59f444dced9b..ee11b29c3a9e 100644
--- a/fs/orangefs/orangefs-bufmap.c
+++ b/fs/orangefs/orangefs-bufmap.c
@@ -4,7 +4,6 @@
  *
  * See COPYING in top-level directory.
  */
-#include "protocol.h"
 #include "orangefs-kernel.h"
 #include "orangefs-bufmap.h"
 
diff --git a/fs/orangefs/orangefs-cache.c b/fs/orangefs/orangefs-cache.c
index 3b6982bf6bcf..a02288721060 100644
--- a/fs/orangefs/orangefs-cache.c
+++ b/fs/orangefs/orangefs-cache.c
@@ -5,7 +5,6 @@
  * See COPYING in top-level directory.
  */
 
-#include "protocol.h"
 #include "orangefs-kernel.h"
 
 /* tags assigned to kernel upcall operations */
diff --git a/fs/orangefs/orangefs-debug.h b/fs/orangefs/orangefs-debug.h
index 6e079d4230d0..383b709d41a1 100644
--- a/fs/orangefs/orangefs-debug.h
+++ b/fs/orangefs/orangefs-debug.h
@@ -13,35 +13,8 @@
 #ifndef __ORANGEFS_DEBUG_H
 #define __ORANGEFS_DEBUG_H
 
-#ifdef __KERNEL__
 #include <linux/types.h>
 #include <linux/kernel.h>
-#else
-#include <stdint.h>
-#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))
-#endif
-
-#define	GOSSIP_NO_DEBUG			(__u64)0
-
-#define GOSSIP_SUPER_DEBUG		((__u64)1 << 0)
-#define GOSSIP_INODE_DEBUG		((__u64)1 << 1)
-#define GOSSIP_FILE_DEBUG		((__u64)1 << 2)
-#define GOSSIP_DIR_DEBUG		((__u64)1 << 3)
-#define GOSSIP_UTILS_DEBUG		((__u64)1 << 4)
-#define GOSSIP_WAIT_DEBUG		((__u64)1 << 5)
-#define GOSSIP_ACL_DEBUG		((__u64)1 << 6)
-#define GOSSIP_DCACHE_DEBUG		((__u64)1 << 7)
-#define GOSSIP_DEV_DEBUG		((__u64)1 << 8)
-#define GOSSIP_NAME_DEBUG		((__u64)1 << 9)
-#define GOSSIP_BUFMAP_DEBUG		((__u64)1 << 10)
-#define GOSSIP_CACHE_DEBUG		((__u64)1 << 11)
-#define GOSSIP_DEBUGFS_DEBUG		((__u64)1 << 12)
-#define GOSSIP_XATTR_DEBUG		((__u64)1 << 13)
-#define GOSSIP_INIT_DEBUG		((__u64)1 << 14)
-#define GOSSIP_SYSFS_DEBUG		((__u64)1 << 15)
-
-#define GOSSIP_MAX_NR                 16
-#define GOSSIP_MAX_DEBUG              (((__u64)1 << GOSSIP_MAX_NR) - 1)
 
 /* a private internal type */
 struct __keyword_mask_s {
diff --git a/fs/orangefs/orangefs-debugfs.c b/fs/orangefs/orangefs-debugfs.c
index 6e35f2f3c897..af14e80211c9 100644
--- a/fs/orangefs/orangefs-debugfs.c
+++ b/fs/orangefs/orangefs-debugfs.c
@@ -40,9 +40,9 @@
 
 #include <linux/uaccess.h>
 
-#include "orangefs-debugfs.h"
-#include "protocol.h"
 #include "orangefs-kernel.h"
+#include "orangefs-debug.h"
+#include "orangefs-debugfs.h"
 
 #define DEBUG_HELP_STRING_SIZE 4096
 #define HELP_STRING_UNINITIALIZED \
diff --git a/fs/orangefs/orangefs-dev-proto.h b/fs/orangefs/orangefs-dev-proto.h
deleted file mode 100644
index dc6609824965..000000000000
--- a/fs/orangefs/orangefs-dev-proto.h
+++ /dev/null
@@ -1,61 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-/*
- * (C) 2001 Clemson University and The University of Chicago
- *
- * See COPYING in top-level directory.
- */
-
-#ifndef _ORANGEFS_DEV_PROTO_H
-#define _ORANGEFS_DEV_PROTO_H
-
-/*
- * types and constants shared between user space and kernel space for
- * device interaction using a common protocol
- */
-
-/*
- * valid orangefs kernel operation types
- */
-#define ORANGEFS_VFS_OP_INVALID           0xFF000000
-#define ORANGEFS_VFS_OP_FILE_IO        0xFF000001
-#define ORANGEFS_VFS_OP_LOOKUP         0xFF000002
-#define ORANGEFS_VFS_OP_CREATE         0xFF000003
-#define ORANGEFS_VFS_OP_GETATTR        0xFF000004
-#define ORANGEFS_VFS_OP_REMOVE         0xFF000005
-#define ORANGEFS_VFS_OP_MKDIR          0xFF000006
-#define ORANGEFS_VFS_OP_READDIR        0xFF000007
-#define ORANGEFS_VFS_OP_SETATTR        0xFF000008
-#define ORANGEFS_VFS_OP_SYMLINK        0xFF000009
-#define ORANGEFS_VFS_OP_RENAME         0xFF00000A
-#define ORANGEFS_VFS_OP_STATFS         0xFF00000B
-#define ORANGEFS_VFS_OP_TRUNCATE       0xFF00000C
-#define ORANGEFS_VFS_OP_RA_FLUSH       0xFF00000D
-#define ORANGEFS_VFS_OP_FS_MOUNT       0xFF00000E
-#define ORANGEFS_VFS_OP_FS_UMOUNT      0xFF00000F
-#define ORANGEFS_VFS_OP_GETXATTR       0xFF000010
-#define ORANGEFS_VFS_OP_SETXATTR          0xFF000011
-#define ORANGEFS_VFS_OP_LISTXATTR         0xFF000012
-#define ORANGEFS_VFS_OP_REMOVEXATTR       0xFF000013
-#define ORANGEFS_VFS_OP_PARAM          0xFF000014
-#define ORANGEFS_VFS_OP_PERF_COUNT     0xFF000015
-#define ORANGEFS_VFS_OP_CANCEL            0xFF00EE00
-#define ORANGEFS_VFS_OP_FSYNC          0xFF00EE01
-#define ORANGEFS_VFS_OP_FSKEY             0xFF00EE02
-#define ORANGEFS_VFS_OP_READDIRPLUS       0xFF00EE03
-#define ORANGEFS_VFS_OP_FEATURES	0xFF00EE05 /* 2.9.6 */
-
-/* features is a 64-bit unsigned bitmask */
-#define ORANGEFS_FEATURE_READAHEAD 1
-
-/*
- * Misc constants. Please retain them as multiples of 8!
- * Otherwise 32-64 bit interactions will be messed up :)
- */
-#define ORANGEFS_MAX_DEBUG_STRING_LEN	0x00000800
-
-#define ORANGEFS_MAX_DIRENT_COUNT_READDIR 512
-
-#include "upcall.h"
-#include "downcall.h"
-
-#endif
diff --git a/fs/orangefs/orangefs-kernel.h b/fs/orangefs/orangefs-kernel.h
index c29bb0ebc6bb..4f26fcbb9c83 100644
--- a/fs/orangefs/orangefs-kernel.h
+++ b/fs/orangefs/orangefs-kernel.h
@@ -54,7 +54,8 @@
 
 #include <asm/unaligned.h>
 
-#include "orangefs-dev-proto.h"
+#include <linux/orangefs.h>
+#include "protocol.h"
 
 #define ORANGEFS_DEFAULT_OP_TIMEOUT_SECS       20
 
diff --git a/fs/orangefs/orangefs-mod.c b/fs/orangefs/orangefs-mod.c
index 85ef87245a87..cf9f3259ad93 100644
--- a/fs/orangefs/orangefs-mod.c
+++ b/fs/orangefs/orangefs-mod.c
@@ -7,7 +7,6 @@
  * See COPYING in top-level directory.
  */
 
-#include "protocol.h"
 #include "orangefs-kernel.h"
 #include "orangefs-debugfs.h"
 #include "orangefs-sysfs.h"
diff --git a/fs/orangefs/orangefs-sysfs.c b/fs/orangefs/orangefs-sysfs.c
index 079a465796f3..71177ef3d8b6 100644
--- a/fs/orangefs/orangefs-sysfs.c
+++ b/fs/orangefs/orangefs-sysfs.c
@@ -135,7 +135,6 @@
 #include <linux/module.h>
 #include <linux/init.h>
 
-#include "protocol.h"
 #include "orangefs-kernel.h"
 #include "orangefs-sysfs.h"
 
diff --git a/fs/orangefs/orangefs-utils.c b/fs/orangefs/orangefs-utils.c
index ea6256d136d1..0a08c7bd25ca 100644
--- a/fs/orangefs/orangefs-utils.c
+++ b/fs/orangefs/orangefs-utils.c
@@ -5,9 +5,7 @@
  * See COPYING in top-level directory.
  */
 #include <linux/kernel.h>
-#include "protocol.h"
 #include "orangefs-kernel.h"
-#include "orangefs-dev-proto.h"
 #include "orangefs-bufmap.h"
 
 __s32 fsid_of_op(struct orangefs_kernel_op_s *op)
diff --git a/fs/orangefs/protocol.h b/fs/orangefs/protocol.h
index 61ee8d64c842..e16d09e5b030 100644
--- a/fs/orangefs/protocol.h
+++ b/fs/orangefs/protocol.h
@@ -7,27 +7,6 @@
 
 /* khandle stuff  ***********************************************************/
 
-/*
- * The 2.9 core will put 64 bit handles in here like this:
- *    1234 0000 0000 5678
- * The 3.0 and beyond cores will put 128 bit handles in here like this:
- *    1234 5678 90AB CDEF
- * The kernel module will always use the first four bytes and
- * the last four bytes as an inum.
- */
-struct orangefs_khandle {
-	unsigned char u[16];
-}  __aligned(8);
-
-/*
- * kernel version of an object ref.
- */
-struct orangefs_object_kref {
-	struct orangefs_khandle khandle;
-	__s32 fs_id;
-	__s32 __pad1;
-};
-
 /*
  * compare 2 khandles assumes little endian thus from large address to
  * small address
@@ -68,286 +47,13 @@ static inline void ORANGEFS_khandle_from(struct orangefs_khandle *kh,
 
 #define ORANGEFS_SUPER_MAGIC 0x20030528
 
-/*
- * ORANGEFS error codes are a signed 32-bit integer. Error codes are negative, but
- * the sign is stripped before decoding.
- */
-
-/* Bit 31 is not used since it is the sign. */
-
-/*
- * Bit 30 specifies that this is a ORANGEFS error. A ORANGEFS error is either an
- * encoded errno value or a ORANGEFS protocol error.
- */
-#define ORANGEFS_ERROR_BIT (1 << 30)
-
-/*
- * Bit 29 specifies that this is a ORANGEFS protocol error and not an encoded
- * errno value.
- */
-#define ORANGEFS_NON_ERRNO_ERROR_BIT (1 << 29)
-
-/*
- * Bits 9, 8, and 7 specify the error class, which encodes the section of
- * server code the error originated in for logging purposes. It is not used
- * in the kernel except to be masked out.
- */
-#define ORANGEFS_ERROR_CLASS_BITS 0x380
-
-/* Bits 6 - 0 are reserved for the actual error code. */
-#define ORANGEFS_ERROR_NUMBER_BITS 0x7f
-
-/* Encoded errno values decoded by PINT_errno_mapping in orangefs-utils.c. */
-
-/* Our own ORANGEFS protocol error codes. */
-#define ORANGEFS_ECANCEL    (1|ORANGEFS_NON_ERRNO_ERROR_BIT|ORANGEFS_ERROR_BIT)
-#define ORANGEFS_EDEVINIT   (2|ORANGEFS_NON_ERRNO_ERROR_BIT|ORANGEFS_ERROR_BIT)
-#define ORANGEFS_EDETAIL    (3|ORANGEFS_NON_ERRNO_ERROR_BIT|ORANGEFS_ERROR_BIT)
-#define ORANGEFS_EHOSTNTFD  (4|ORANGEFS_NON_ERRNO_ERROR_BIT|ORANGEFS_ERROR_BIT)
-#define ORANGEFS_EADDRNTFD  (5|ORANGEFS_NON_ERRNO_ERROR_BIT|ORANGEFS_ERROR_BIT)
-#define ORANGEFS_ENORECVR   (6|ORANGEFS_NON_ERRNO_ERROR_BIT|ORANGEFS_ERROR_BIT)
-#define ORANGEFS_ETRYAGAIN  (7|ORANGEFS_NON_ERRNO_ERROR_BIT|ORANGEFS_ERROR_BIT)
-#define ORANGEFS_ENOTPVFS   (8|ORANGEFS_NON_ERRNO_ERROR_BIT|ORANGEFS_ERROR_BIT)
-#define ORANGEFS_ESECURITY  (9|ORANGEFS_NON_ERRNO_ERROR_BIT|ORANGEFS_ERROR_BIT)
-
-/* permission bits */
-#define ORANGEFS_O_EXECUTE (1 << 0)
-#define ORANGEFS_O_WRITE   (1 << 1)
-#define ORANGEFS_O_READ    (1 << 2)
-#define ORANGEFS_G_EXECUTE (1 << 3)
-#define ORANGEFS_G_WRITE   (1 << 4)
-#define ORANGEFS_G_READ    (1 << 5)
-#define ORANGEFS_U_EXECUTE (1 << 6)
-#define ORANGEFS_U_WRITE   (1 << 7)
-#define ORANGEFS_U_READ    (1 << 8)
-/* no ORANGEFS_U_VTX (sticky bit) */
-#define ORANGEFS_G_SGID    (1 << 10)
-#define ORANGEFS_U_SUID    (1 << 11)
-
-#define ORANGEFS_ITERATE_START    2147483646
-#define ORANGEFS_ITERATE_END      2147483645
-#define ORANGEFS_IMMUTABLE_FL FS_IMMUTABLE_FL
-#define ORANGEFS_APPEND_FL    FS_APPEND_FL
-#define ORANGEFS_NOATIME_FL   FS_NOATIME_FL
-#define ORANGEFS_MIRROR_FL    0x01000000ULL
-#define ORANGEFS_FS_ID_NULL       ((__s32)0)
-#define ORANGEFS_ATTR_SYS_UID                   (1 << 0)
-#define ORANGEFS_ATTR_SYS_GID                   (1 << 1)
-#define ORANGEFS_ATTR_SYS_PERM                  (1 << 2)
-#define ORANGEFS_ATTR_SYS_ATIME                 (1 << 3)
-#define ORANGEFS_ATTR_SYS_CTIME                 (1 << 4)
-#define ORANGEFS_ATTR_SYS_MTIME                 (1 << 5)
-#define ORANGEFS_ATTR_SYS_TYPE                  (1 << 6)
-#define ORANGEFS_ATTR_SYS_ATIME_SET             (1 << 7)
-#define ORANGEFS_ATTR_SYS_MTIME_SET             (1 << 8)
-#define ORANGEFS_ATTR_SYS_SIZE                  (1 << 20)
-#define ORANGEFS_ATTR_SYS_LNK_TARGET            (1 << 24)
-#define ORANGEFS_ATTR_SYS_DFILE_COUNT           (1 << 25)
-#define ORANGEFS_ATTR_SYS_DIRENT_COUNT          (1 << 26)
-#define ORANGEFS_ATTR_SYS_BLKSIZE               (1 << 28)
-#define ORANGEFS_ATTR_SYS_MIRROR_COPIES_COUNT   (1 << 29)
-#define ORANGEFS_ATTR_SYS_COMMON_ALL	\
-	(ORANGEFS_ATTR_SYS_UID	|	\
-	 ORANGEFS_ATTR_SYS_GID	|	\
-	 ORANGEFS_ATTR_SYS_PERM	|	\
-	 ORANGEFS_ATTR_SYS_ATIME	|	\
-	 ORANGEFS_ATTR_SYS_CTIME	|	\
-	 ORANGEFS_ATTR_SYS_MTIME	|	\
-	 ORANGEFS_ATTR_SYS_TYPE)
-
-#define ORANGEFS_ATTR_SYS_ALL_SETABLE		\
-(ORANGEFS_ATTR_SYS_COMMON_ALL-ORANGEFS_ATTR_SYS_TYPE)
-
-#define ORANGEFS_ATTR_SYS_ALL_NOHINT			\
-	(ORANGEFS_ATTR_SYS_COMMON_ALL		|	\
-	 ORANGEFS_ATTR_SYS_SIZE			|	\
-	 ORANGEFS_ATTR_SYS_LNK_TARGET		|	\
-	 ORANGEFS_ATTR_SYS_DFILE_COUNT		|	\
-	 ORANGEFS_ATTR_SYS_MIRROR_COPIES_COUNT	|	\
-	 ORANGEFS_ATTR_SYS_DIRENT_COUNT		|	\
-	 ORANGEFS_ATTR_SYS_BLKSIZE)
-
-#define ORANGEFS_XATTR_REPLACE 0x2
-#define ORANGEFS_XATTR_CREATE  0x1
-#define ORANGEFS_MAX_SERVER_ADDR_LEN 256
-#define ORANGEFS_NAME_MAX                256
-/*
- * max extended attribute name len as imposed by the VFS and exploited for the
- * upcall request types.
- * NOTE: Please retain them as multiples of 8 even if you wish to change them
- * This is *NECESSARY* for supporting 32 bit user-space binaries on a 64-bit
- * kernel. Due to implementation within DBPF, this really needs to be
- * ORANGEFS_NAME_MAX, which it was the same value as, but no reason to let it
- * break if that changes in the future.
- */
-#define ORANGEFS_MAX_XATTR_NAMELEN   ORANGEFS_NAME_MAX	/* Not the same as
-						 * XATTR_NAME_MAX defined
-						 * by <linux/xattr.h>
-						 */
-#define ORANGEFS_MAX_XATTR_VALUELEN  8192	/* Not the same as XATTR_SIZE_MAX
-					 * defined by <linux/xattr.h>
-					 */
-#define ORANGEFS_MAX_XATTR_LISTLEN   16	/* Not the same as XATTR_LIST_MAX
-					 * defined by <linux/xattr.h>
-					 */
-/*
- * ORANGEFS I/O operation types, used in both system and server interfaces.
- */
-enum ORANGEFS_io_type {
-	ORANGEFS_IO_READ = 1,
-	ORANGEFS_IO_WRITE = 2
-};
-
-/*
- * If this enum is modified the server parameters related to the precreate pool
- * batch and low threshold sizes may need to be modified  to reflect this
- * change.
- */
-enum orangefs_ds_type {
-	ORANGEFS_TYPE_NONE = 0,
-	ORANGEFS_TYPE_METAFILE = (1 << 0),
-	ORANGEFS_TYPE_DATAFILE = (1 << 1),
-	ORANGEFS_TYPE_DIRECTORY = (1 << 2),
-	ORANGEFS_TYPE_SYMLINK = (1 << 3),
-	ORANGEFS_TYPE_DIRDATA = (1 << 4),
-	ORANGEFS_TYPE_INTERNAL = (1 << 5)	/* for the server's private use */
-};
-
-/* This structure is used by the VFS-client interaction alone */
-struct ORANGEFS_keyval_pair {
-	char key[ORANGEFS_MAX_XATTR_NAMELEN];
-	__s32 key_sz;	/* __s32 for portable, fixed-size structures */
-	__s32 val_sz;
-	char val[ORANGEFS_MAX_XATTR_VALUELEN];
-};
-
-/* pvfs2-sysint.h ***********************************************************/
-/* Describes attributes for a file, directory, or symlink. */
-struct ORANGEFS_sys_attr_s {
-	__u32 owner;
-	__u32 group;
-	__u32 perms;
-	__u64 atime;
-	__u64 mtime;
-	__u64 ctime;
-	__s64 size;
-
-	/* NOTE: caller must free if valid */
-	char *link_target;
-
-	/* Changed to __s32 so that size of structure does not change */
-	__s32 dfile_count;
-
-	/* Changed to __s32 so that size of structure does not change */
-	__s32 distr_dir_servers_initial;
-
-	/* Changed to __s32 so that size of structure does not change */
-	__s32 distr_dir_servers_max;
-
-	/* Changed to __s32 so that size of structure does not change */
-	__s32 distr_dir_split_size;
-
-	__u32 mirror_copies_count;
-
-	/* NOTE: caller must free if valid */
-	char *dist_name;
-
-	/* NOTE: caller must free if valid */
-	char *dist_params;
-
-	__s64 dirent_count;
-	enum orangefs_ds_type objtype;
-	__u64 flags;
-	__u32 mask;
-	__s64 blksize;
-};
-
-#define ORANGEFS_LOOKUP_LINK_NO_FOLLOW 0
-
-/* pint-dev.h ***************************************************************/
-
-/* parameter structure used in ORANGEFS_DEV_DEBUG ioctl command */
-struct dev_mask_info_s {
-	enum {
-		KERNEL_MASK,
-		CLIENT_MASK,
-	} mask_type;
-	__u64 mask_value;
-};
-
-struct dev_mask2_info_s {
-	__u64 mask1_value;
-	__u64 mask2_value;
-};
-
 /* pvfs2-util.h *************************************************************/
 __s32 ORANGEFS_util_translate_mode(int mode);
 
-/* pvfs2-debug.h ************************************************************/
-#include "orangefs-debug.h"
-
 /* pvfs2-internal.h *********************************************************/
 #define llu(x) (unsigned long long)(x)
 #define lld(x) (long long)(x)
 
-/* pint-dev-shared.h ********************************************************/
-#define ORANGEFS_DEV_MAGIC 'k'
-
-#define ORANGEFS_READDIR_DEFAULT_DESC_COUNT  5
-
-#define DEV_GET_MAGIC           0x1
-#define DEV_GET_MAX_UPSIZE      0x2
-#define DEV_GET_MAX_DOWNSIZE    0x3
-#define DEV_MAP                 0x4
-#define DEV_REMOUNT_ALL         0x5
-#define DEV_DEBUG               0x6
-#define DEV_UPSTREAM            0x7
-#define DEV_CLIENT_MASK         0x8
-#define DEV_CLIENT_STRING       0x9
-#define DEV_MAX_NR              0xa
-
-/* supported ioctls, codes are with respect to user-space */
-enum {
-	ORANGEFS_DEV_GET_MAGIC = _IOW(ORANGEFS_DEV_MAGIC, DEV_GET_MAGIC, __s32),
-	ORANGEFS_DEV_GET_MAX_UPSIZE =
-	    _IOW(ORANGEFS_DEV_MAGIC, DEV_GET_MAX_UPSIZE, __s32),
-	ORANGEFS_DEV_GET_MAX_DOWNSIZE =
-	    _IOW(ORANGEFS_DEV_MAGIC, DEV_GET_MAX_DOWNSIZE, __s32),
-	ORANGEFS_DEV_MAP = _IO(ORANGEFS_DEV_MAGIC, DEV_MAP),
-	ORANGEFS_DEV_REMOUNT_ALL = _IO(ORANGEFS_DEV_MAGIC, DEV_REMOUNT_ALL),
-	ORANGEFS_DEV_DEBUG = _IOR(ORANGEFS_DEV_MAGIC, DEV_DEBUG, __s32),
-	ORANGEFS_DEV_UPSTREAM = _IOW(ORANGEFS_DEV_MAGIC, DEV_UPSTREAM, int),
-	ORANGEFS_DEV_CLIENT_MASK = _IOW(ORANGEFS_DEV_MAGIC,
-				    DEV_CLIENT_MASK,
-				    struct dev_mask2_info_s),
-	ORANGEFS_DEV_CLIENT_STRING = _IOW(ORANGEFS_DEV_MAGIC,
-				      DEV_CLIENT_STRING,
-				      char *),
-	ORANGEFS_DEV_MAXNR = DEV_MAX_NR,
-};
-
-/*
- * version number for use in communicating between kernel space and user
- * space. Zero signifies the upstream version of the kernel module.
- */
-#define ORANGEFS_KERNEL_PROTO_VERSION 0
-#define ORANGEFS_MINIMUM_USERSPACE_VERSION 20903
-
-/*
- * describes memory regions to map in the ORANGEFS_DEV_MAP ioctl.
- * NOTE: See devorangefs-req.c for 32 bit compat structure.
- * Since this structure has a variable-sized layout that is different
- * on 32 and 64 bit platforms, we need to normalize to a 64 bit layout
- * on such systems before servicing ioctl calls from user-space binaries
- * that may be 32 bit!
- */
-struct ORANGEFS_dev_map_desc {
-	void *ptr;
-	__s32 total_size;
-	__s32 size;
-	__s32 count;
-};
-
 /* gossip.h *****************************************************************/
 
 extern __u64 orangefs_gossip_debug_mask;
diff --git a/fs/orangefs/super.c b/fs/orangefs/super.c
index 3ae5fdba0225..6183f6f6db53 100644
--- a/fs/orangefs/super.c
+++ b/fs/orangefs/super.c
@@ -5,7 +5,6 @@
  * See COPYING in top-level directory.
  */
 
-#include "protocol.h"
 #include "orangefs-kernel.h"
 #include "orangefs-bufmap.h"
 
diff --git a/fs/orangefs/symlink.c b/fs/orangefs/symlink.c
index db107fe91ab3..106c0dfd48bf 100644
--- a/fs/orangefs/symlink.c
+++ b/fs/orangefs/symlink.c
@@ -5,7 +5,6 @@
  * See COPYING in top-level directory.
  */
 
-#include "protocol.h"
 #include "orangefs-kernel.h"
 #include "orangefs-bufmap.h"
 
diff --git a/fs/orangefs/upcall.h b/fs/orangefs/upcall.h
deleted file mode 100644
index 16118452aa12..000000000000
--- a/fs/orangefs/upcall.h
+++ /dev/null
@@ -1,260 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-/*
- * (C) 2001 Clemson University and The University of Chicago
- *
- * See COPYING in top-level directory.
- */
-
-#ifndef __UPCALL_H
-#define __UPCALL_H
-
-/*
- * Sanitized this header file to fix
- * 32-64 bit interaction issues between
- * client-core and device
- */
-struct orangefs_io_request_s {
-	__s32 __pad1;
-	__s32 buf_index;
-	__s32 count;
-	__s32 __pad2;
-	__s64 offset;
-	struct orangefs_object_kref refn;
-	enum ORANGEFS_io_type io_type;
-	__s32 readahead_size;
-};
-
-struct orangefs_lookup_request_s {
-	__s32 sym_follow;
-	__s32 __pad1;
-	struct orangefs_object_kref parent_refn;
-	char d_name[ORANGEFS_NAME_MAX];
-};
-
-struct orangefs_create_request_s {
-	struct orangefs_object_kref parent_refn;
-	struct ORANGEFS_sys_attr_s attributes;
-	char d_name[ORANGEFS_NAME_MAX];
-};
-
-struct orangefs_symlink_request_s {
-	struct orangefs_object_kref parent_refn;
-	struct ORANGEFS_sys_attr_s attributes;
-	char entry_name[ORANGEFS_NAME_MAX];
-	char target[ORANGEFS_NAME_MAX];
-};
-
-struct orangefs_getattr_request_s {
-	struct orangefs_object_kref refn;
-	__u32 mask;
-	__u32 __pad1;
-};
-
-struct orangefs_setattr_request_s {
-	struct orangefs_object_kref refn;
-	struct ORANGEFS_sys_attr_s attributes;
-};
-
-struct orangefs_remove_request_s {
-	struct orangefs_object_kref parent_refn;
-	char d_name[ORANGEFS_NAME_MAX];
-};
-
-struct orangefs_mkdir_request_s {
-	struct orangefs_object_kref parent_refn;
-	struct ORANGEFS_sys_attr_s attributes;
-	char d_name[ORANGEFS_NAME_MAX];
-};
-
-struct orangefs_readdir_request_s {
-	struct orangefs_object_kref refn;
-	__u64 token;
-	__s32 max_dirent_count;
-	__s32 buf_index;
-};
-
-struct orangefs_readdirplus_request_s {
-	struct orangefs_object_kref refn;
-	__u64 token;
-	__s32 max_dirent_count;
-	__u32 mask;
-	__s32 buf_index;
-	__s32 __pad1;
-};
-
-struct orangefs_rename_request_s {
-	struct orangefs_object_kref old_parent_refn;
-	struct orangefs_object_kref new_parent_refn;
-	char d_old_name[ORANGEFS_NAME_MAX];
-	char d_new_name[ORANGEFS_NAME_MAX];
-};
-
-struct orangefs_statfs_request_s {
-	__s32 fs_id;
-	__s32 __pad1;
-};
-
-struct orangefs_truncate_request_s {
-	struct orangefs_object_kref refn;
-	__s64 size;
-};
-
-struct orangefs_ra_cache_flush_request_s {
-	struct orangefs_object_kref refn;
-};
-
-struct orangefs_fs_mount_request_s {
-	char orangefs_config_server[ORANGEFS_MAX_SERVER_ADDR_LEN];
-};
-
-struct orangefs_fs_umount_request_s {
-	__s32 id;
-	__s32 fs_id;
-	char orangefs_config_server[ORANGEFS_MAX_SERVER_ADDR_LEN];
-};
-
-struct orangefs_getxattr_request_s {
-	struct orangefs_object_kref refn;
-	__s32 key_sz;
-	__s32 __pad1;
-	char key[ORANGEFS_MAX_XATTR_NAMELEN];
-};
-
-struct orangefs_setxattr_request_s {
-	struct orangefs_object_kref refn;
-	struct ORANGEFS_keyval_pair keyval;
-	__s32 flags;
-	__s32 __pad1;
-};
-
-struct orangefs_listxattr_request_s {
-	struct orangefs_object_kref refn;
-	__s32 requested_count;
-	__s32 __pad1;
-	__u64 token;
-};
-
-struct orangefs_removexattr_request_s {
-	struct orangefs_object_kref refn;
-	__s32 key_sz;
-	__s32 __pad1;
-	char key[ORANGEFS_MAX_XATTR_NAMELEN];
-};
-
-struct orangefs_op_cancel_s {
-	__u64 op_tag;
-};
-
-struct orangefs_fsync_request_s {
-	struct orangefs_object_kref refn;
-};
-
-enum orangefs_param_request_type {
-	ORANGEFS_PARAM_REQUEST_SET = 1,
-	ORANGEFS_PARAM_REQUEST_GET = 2
-};
-
-enum orangefs_param_request_op {
-	ORANGEFS_PARAM_REQUEST_OP_ACACHE_TIMEOUT_MSECS = 1,
-	ORANGEFS_PARAM_REQUEST_OP_ACACHE_HARD_LIMIT = 2,
-	ORANGEFS_PARAM_REQUEST_OP_ACACHE_SOFT_LIMIT = 3,
-	ORANGEFS_PARAM_REQUEST_OP_ACACHE_RECLAIM_PERCENTAGE = 4,
-	ORANGEFS_PARAM_REQUEST_OP_PERF_TIME_INTERVAL_SECS = 5,
-	ORANGEFS_PARAM_REQUEST_OP_PERF_HISTORY_SIZE = 6,
-	ORANGEFS_PARAM_REQUEST_OP_PERF_RESET = 7,
-	ORANGEFS_PARAM_REQUEST_OP_NCACHE_TIMEOUT_MSECS = 8,
-	ORANGEFS_PARAM_REQUEST_OP_NCACHE_HARD_LIMIT = 9,
-	ORANGEFS_PARAM_REQUEST_OP_NCACHE_SOFT_LIMIT = 10,
-	ORANGEFS_PARAM_REQUEST_OP_NCACHE_RECLAIM_PERCENTAGE = 11,
-	ORANGEFS_PARAM_REQUEST_OP_STATIC_ACACHE_TIMEOUT_MSECS = 12,
-	ORANGEFS_PARAM_REQUEST_OP_STATIC_ACACHE_HARD_LIMIT = 13,
-	ORANGEFS_PARAM_REQUEST_OP_STATIC_ACACHE_SOFT_LIMIT = 14,
-	ORANGEFS_PARAM_REQUEST_OP_STATIC_ACACHE_RECLAIM_PERCENTAGE = 15,
-	ORANGEFS_PARAM_REQUEST_OP_CLIENT_DEBUG = 16,
-	ORANGEFS_PARAM_REQUEST_OP_CCACHE_TIMEOUT_SECS = 17,
-	ORANGEFS_PARAM_REQUEST_OP_CCACHE_HARD_LIMIT = 18,
-	ORANGEFS_PARAM_REQUEST_OP_CCACHE_SOFT_LIMIT = 19,
-	ORANGEFS_PARAM_REQUEST_OP_CCACHE_RECLAIM_PERCENTAGE = 20,
-	ORANGEFS_PARAM_REQUEST_OP_CAPCACHE_TIMEOUT_SECS = 21,
-	ORANGEFS_PARAM_REQUEST_OP_CAPCACHE_HARD_LIMIT = 22,
-	ORANGEFS_PARAM_REQUEST_OP_CAPCACHE_SOFT_LIMIT = 23,
-	ORANGEFS_PARAM_REQUEST_OP_CAPCACHE_RECLAIM_PERCENTAGE = 24,
-	ORANGEFS_PARAM_REQUEST_OP_TWO_MASK_VALUES = 25,
-	ORANGEFS_PARAM_REQUEST_OP_READAHEAD_SIZE = 26,
-	ORANGEFS_PARAM_REQUEST_OP_READAHEAD_COUNT = 27,
-	ORANGEFS_PARAM_REQUEST_OP_READAHEAD_COUNT_SIZE = 28,
-	ORANGEFS_PARAM_REQUEST_OP_READAHEAD_READCNT = 29,
-};
-
-struct orangefs_param_request_s {
-	enum orangefs_param_request_type type;
-	enum orangefs_param_request_op op;
-	union {
-		__s64 value64;
-		__s32 value32[2];
-	} u;
-	char s_value[ORANGEFS_MAX_DEBUG_STRING_LEN];
-};
-
-enum orangefs_perf_count_request_type {
-	ORANGEFS_PERF_COUNT_REQUEST_ACACHE = 1,
-	ORANGEFS_PERF_COUNT_REQUEST_NCACHE = 2,
-	ORANGEFS_PERF_COUNT_REQUEST_CAPCACHE = 3,
-};
-
-struct orangefs_perf_count_request_s {
-	enum orangefs_perf_count_request_type type;
-	__s32 __pad1;
-};
-
-struct orangefs_fs_key_request_s {
-	__s32 fsid;
-	__s32 __pad1;
-};
-
-/* 2.9.6 */
-struct orangefs_features_request_s {
-	__u64 features;
-};
-
-struct orangefs_upcall_s {
-	__s32 type;
-	__u32 uid;
-	__u32 gid;
-	int pid;
-	int tgid;
-	/* Trailers unused but must be retained for protocol compatibility. */
-	__s64 trailer_size;
-	char *trailer_buf;
-
-	union {
-		struct orangefs_io_request_s io;
-		struct orangefs_lookup_request_s lookup;
-		struct orangefs_create_request_s create;
-		struct orangefs_symlink_request_s sym;
-		struct orangefs_getattr_request_s getattr;
-		struct orangefs_setattr_request_s setattr;
-		struct orangefs_remove_request_s remove;
-		struct orangefs_mkdir_request_s mkdir;
-		struct orangefs_readdir_request_s readdir;
-		struct orangefs_readdirplus_request_s readdirplus;
-		struct orangefs_rename_request_s rename;
-		struct orangefs_statfs_request_s statfs;
-		struct orangefs_truncate_request_s truncate;
-		struct orangefs_ra_cache_flush_request_s ra_cache_flush;
-		struct orangefs_fs_mount_request_s fs_mount;
-		struct orangefs_fs_umount_request_s fs_umount;
-		struct orangefs_getxattr_request_s getxattr;
-		struct orangefs_setxattr_request_s setxattr;
-		struct orangefs_listxattr_request_s listxattr;
-		struct orangefs_removexattr_request_s removexattr;
-		struct orangefs_op_cancel_s cancel;
-		struct orangefs_fsync_request_s fsync;
-		struct orangefs_param_request_s param;
-		struct orangefs_perf_count_request_s perf_count;
-		struct orangefs_fs_key_request_s fs_key;
-		struct orangefs_features_request_s features;
-	} req;
-};
-
-#endif /* __UPCALL_H */
diff --git a/fs/orangefs/waitqueue.c b/fs/orangefs/waitqueue.c
index 0577d6dba8c8..1992a2647b8a 100644
--- a/fs/orangefs/waitqueue.c
+++ b/fs/orangefs/waitqueue.c
@@ -13,7 +13,6 @@
  *  In-kernel waitqueue operations.
  */
 
-#include "protocol.h"
 #include "orangefs-kernel.h"
 #include "orangefs-bufmap.h"
 
diff --git a/fs/orangefs/xattr.c b/fs/orangefs/xattr.c
index 03bcb871544d..b3b0db56b408 100644
--- a/fs/orangefs/xattr.c
+++ b/fs/orangefs/xattr.c
@@ -9,7 +9,6 @@
  *  Linux VFS extended attribute operations.
  */
 
-#include "protocol.h"
 #include "orangefs-kernel.h"
 #include "orangefs-bufmap.h"
 #include <linux/posix_acl_xattr.h>
diff --git a/include/uapi/linux/orangefs.h b/include/uapi/linux/orangefs.h
new file mode 100644
index 000000000000..eb201fa68c43
--- /dev/null
+++ b/include/uapi/linux/orangefs.h
@@ -0,0 +1,732 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * (C) 2001 Clemson University and The University of Chicago
+ * Copyright 2018 Omnibond Systems, L.L.C.
+ *
+ * See COPYING in top-level directory.
+ */
+
+#ifndef _UAPI_LINUX_ORANGEFS_H
+#define _UAPI_LINUX_ORANGEFS_H
+
+#include <linux/ioctl.h>
+#include <linux/types.h>
+
+/*
+ * valid orangefs kernel operation types
+ */
+#define ORANGEFS_VFS_OP_INVALID      0xFF000000
+#define ORANGEFS_VFS_OP_FILE_IO      0xFF000001
+#define ORANGEFS_VFS_OP_LOOKUP       0xFF000002
+#define ORANGEFS_VFS_OP_CREATE       0xFF000003
+#define ORANGEFS_VFS_OP_GETATTR      0xFF000004
+#define ORANGEFS_VFS_OP_REMOVE       0xFF000005
+#define ORANGEFS_VFS_OP_MKDIR        0xFF000006
+#define ORANGEFS_VFS_OP_READDIR      0xFF000007
+#define ORANGEFS_VFS_OP_SETATTR      0xFF000008
+#define ORANGEFS_VFS_OP_SYMLINK      0xFF000009
+#define ORANGEFS_VFS_OP_RENAME       0xFF00000A
+#define ORANGEFS_VFS_OP_STATFS       0xFF00000B
+#define ORANGEFS_VFS_OP_TRUNCATE     0xFF00000C
+#define ORANGEFS_VFS_OP_RA_FLUSH     0xFF00000D
+#define ORANGEFS_VFS_OP_FS_MOUNT     0xFF00000E
+#define ORANGEFS_VFS_OP_FS_UMOUNT    0xFF00000F
+#define ORANGEFS_VFS_OP_GETXATTR     0xFF000010
+#define ORANGEFS_VFS_OP_SETXATTR     0xFF000011
+#define ORANGEFS_VFS_OP_LISTXATTR    0xFF000012
+#define ORANGEFS_VFS_OP_REMOVEXATTR  0xFF000013
+#define ORANGEFS_VFS_OP_PARAM        0xFF000014
+#define ORANGEFS_VFS_OP_PERF_COUNT   0xFF000015
+#define ORANGEFS_VFS_OP_CANCEL       0xFF00EE00
+#define ORANGEFS_VFS_OP_FSYNC        0xFF00EE01
+#define ORANGEFS_VFS_OP_FSKEY        0xFF00EE02
+#define ORANGEFS_VFS_OP_READDIRPLUS  0xFF00EE03
+#define ORANGEFS_VFS_OP_FEATURES     0xFF00EE05 /* 2.9.6 */
+
+/* features is a 64-bit unsigned bitmask */
+#define ORANGEFS_FEATURE_READAHEAD 1
+
+/*
+ * Misc constants. Please retain them as multiples of 8!
+ * Otherwise 32-64 bit interactions will be messed up :)
+ */
+#define ORANGEFS_MAX_DEBUG_STRING_LEN 0x00000800
+
+#define ORANGEFS_MAX_DIRENT_COUNT_READDIR 512
+
+/*
+ * The 2.9 core will put 64 bit handles in here like this:
+ *    1234 0000 0000 5678
+ * The 3.0 and beyond cores will put 128 bit handles in here like this:
+ *    1234 5678 90AB CDEF
+ * The kernel module will always use the first four bytes and
+ * the last four bytes as an inum.
+ */
+struct orangefs_khandle {
+	unsigned char u[16];
+} __attribute__((aligned(8)));
+
+/*
+ * kernel version of an object ref.
+ */
+struct orangefs_object_kref {
+	struct orangefs_khandle khandle;
+	__s32 fs_id;
+	__s32 __pad1;
+};
+
+/*
+ * ORANGEFS error codes are signed 32-bit integers. Error codes are negative,
+ * but the sign is stripped before decoding.
+ */
+
+/* Bit 31 is not used since it is the sign. */
+
+/*
+ * Bit 30 specifies that this is an ORANGEFS error. An ORANGEFS error is
+ * either an encoded errno value or an ORANGEFS protocol error.
+ */
+#define ORANGEFS_ERROR_BIT (1 << 30)
+
+/*
+ * Bit 29 specifies that this is an ORANGEFS protocol error and not an
+ * encoded errno value.
+ */
+#define ORANGEFS_NON_ERRNO_ERROR_BIT (1 << 29)
+
+/*
+ * Bits 9, 8, and 7 specify the error class, which encodes the section of
+ * server code the error originated in for logging purposes. It is not used
+ * in the kernel except to be masked out.
+ */
+#define ORANGEFS_ERROR_CLASS_BITS 0x380
+
+/* Bits 6 - 0 are reserved for the actual error code. */
+#define ORANGEFS_ERROR_NUMBER_BITS 0x7f
+
+/* Encoded errno values decoded by PINT_errno_mapping in orangefs-utils.c. */
+
+/* Our own ORANGEFS protocol error codes. */
+#define ORANGEFS_ECANCEL    (1|ORANGEFS_NON_ERRNO_ERROR_BIT|ORANGEFS_ERROR_BIT)
+#define ORANGEFS_EDEVINIT   (2|ORANGEFS_NON_ERRNO_ERROR_BIT|ORANGEFS_ERROR_BIT)
+#define ORANGEFS_EDETAIL    (3|ORANGEFS_NON_ERRNO_ERROR_BIT|ORANGEFS_ERROR_BIT)
+#define ORANGEFS_EHOSTNTFD  (4|ORANGEFS_NON_ERRNO_ERROR_BIT|ORANGEFS_ERROR_BIT)
+#define ORANGEFS_EADDRNTFD  (5|ORANGEFS_NON_ERRNO_ERROR_BIT|ORANGEFS_ERROR_BIT)
+#define ORANGEFS_ENORECVR   (6|ORANGEFS_NON_ERRNO_ERROR_BIT|ORANGEFS_ERROR_BIT)
+#define ORANGEFS_ETRYAGAIN  (7|ORANGEFS_NON_ERRNO_ERROR_BIT|ORANGEFS_ERROR_BIT)
+#define ORANGEFS_ENOTPVFS   (8|ORANGEFS_NON_ERRNO_ERROR_BIT|ORANGEFS_ERROR_BIT)
+#define ORANGEFS_ESECURITY  (9|ORANGEFS_NON_ERRNO_ERROR_BIT|ORANGEFS_ERROR_BIT)
+
+/* permission bits */
+#define ORANGEFS_O_EXECUTE (1 << 0)
+#define ORANGEFS_O_WRITE   (1 << 1)
+#define ORANGEFS_O_READ    (1 << 2)
+#define ORANGEFS_G_EXECUTE (1 << 3)
+#define ORANGEFS_G_WRITE   (1 << 4)
+#define ORANGEFS_G_READ    (1 << 5)
+#define ORANGEFS_U_EXECUTE (1 << 6)
+#define ORANGEFS_U_WRITE   (1 << 7)
+#define ORANGEFS_U_READ    (1 << 8)
+/* no ORANGEFS_U_VTX (sticky bit) */
+#define ORANGEFS_G_SGID    (1 << 10)
+#define ORANGEFS_U_SUID    (1 << 11)
+
+#define ORANGEFS_ITERATE_START  2147483646
+#define ORANGEFS_ITERATE_END    2147483645
+#define ORANGEFS_IMMUTABLE_FL   FS_IMMUTABLE_FL
+#define ORANGEFS_APPEND_FL      FS_APPEND_FL
+#define ORANGEFS_NOATIME_FL     FS_NOATIME_FL
+#define ORANGEFS_MIRROR_FL      0x01000000ULL
+#define ORANGEFS_FS_ID_NULL     ((__s32)0)
+#define ORANGEFS_ATTR_SYS_UID                  (1 << 0)
+#define ORANGEFS_ATTR_SYS_GID                  (1 << 1)
+#define ORANGEFS_ATTR_SYS_PERM                 (1 << 2)
+#define ORANGEFS_ATTR_SYS_ATIME                (1 << 3)
+#define ORANGEFS_ATTR_SYS_CTIME                (1 << 4)
+#define ORANGEFS_ATTR_SYS_MTIME                (1 << 5)
+#define ORANGEFS_ATTR_SYS_TYPE                 (1 << 6)
+#define ORANGEFS_ATTR_SYS_ATIME_SET            (1 << 7)
+#define ORANGEFS_ATTR_SYS_MTIME_SET            (1 << 8)
+#define ORANGEFS_ATTR_SYS_SIZE                 (1 << 20)
+#define ORANGEFS_ATTR_SYS_LNK_TARGET           (1 << 24)
+#define ORANGEFS_ATTR_SYS_DFILE_COUNT          (1 << 25)
+#define ORANGEFS_ATTR_SYS_DIRENT_COUNT         (1 << 26)
+#define ORANGEFS_ATTR_SYS_BLKSIZE              (1 << 28)
+#define ORANGEFS_ATTR_SYS_MIRROR_COPIES_COUNT  (1 << 29)
+#define ORANGEFS_ATTR_SYS_COMMON_ALL	\
+	(ORANGEFS_ATTR_SYS_UID	|	\
+	 ORANGEFS_ATTR_SYS_GID	|	\
+	 ORANGEFS_ATTR_SYS_PERM	|	\
+	 ORANGEFS_ATTR_SYS_ATIME	|	\
+	 ORANGEFS_ATTR_SYS_CTIME	|	\
+	 ORANGEFS_ATTR_SYS_MTIME	|	\
+	 ORANGEFS_ATTR_SYS_TYPE)
+
+#define ORANGEFS_ATTR_SYS_ALL_SETABLE		\
+(ORANGEFS_ATTR_SYS_COMMON_ALL-ORANGEFS_ATTR_SYS_TYPE)
+
+#define ORANGEFS_ATTR_SYS_ALL_NOHINT			\
+	(ORANGEFS_ATTR_SYS_COMMON_ALL		|	\
+	 ORANGEFS_ATTR_SYS_SIZE			|	\
+	 ORANGEFS_ATTR_SYS_LNK_TARGET		|	\
+	 ORANGEFS_ATTR_SYS_DFILE_COUNT		|	\
+	 ORANGEFS_ATTR_SYS_MIRROR_COPIES_COUNT	|	\
+	 ORANGEFS_ATTR_SYS_DIRENT_COUNT		|	\
+	 ORANGEFS_ATTR_SYS_BLKSIZE)
+
+#define ORANGEFS_XATTR_REPLACE 0x2
+#define ORANGEFS_XATTR_CREATE  0x1
+#define ORANGEFS_MAX_SERVER_ADDR_LEN 256
+#define ORANGEFS_NAME_MAX                256
+/*
+ * max extended attribute name len as imposed by the VFS and exploited for the
+ * upcall request types.
+ * NOTE: Please retain them as multiples of 8 even if you wish to change them.
+ * This is *NECESSARY* for supporting 32 bit user-space binaries on a 64-bit
+ * kernel. Due to the implementation within DBPF, this really needs to equal
+ * ORANGEFS_NAME_MAX. It is defined in terms of that constant so that it
+ * does not break if ORANGEFS_NAME_MAX changes in the future.
+ */
+#define ORANGEFS_MAX_XATTR_NAMELEN   ORANGEFS_NAME_MAX	/* Not the same as
+						 * XATTR_NAME_MAX defined
+						 * by <linux/xattr.h>
+						 */
+#define ORANGEFS_MAX_XATTR_VALUELEN  8192	/* Not the same as XATTR_SIZE_MAX
+					 * defined by <linux/xattr.h>
+					 */
+#define ORANGEFS_MAX_XATTR_LISTLEN   16	/* Not the same as XATTR_LIST_MAX
+					 * defined by <linux/xattr.h>
+					 */
+/*
+ * ORANGEFS I/O operation types, used in both system and server interfaces.
+ */
+enum ORANGEFS_io_type {
+	ORANGEFS_IO_READ = 1,
+	ORANGEFS_IO_WRITE = 2
+};
+
+/*
+ * If this enum is modified, the server parameters related to the precreate
+ * pool batch and low threshold sizes may need to be modified to reflect
+ * this change.
+ */
+enum orangefs_ds_type {
+	ORANGEFS_TYPE_NONE = 0,
+	ORANGEFS_TYPE_METAFILE = (1 << 0),
+	ORANGEFS_TYPE_DATAFILE = (1 << 1),
+	ORANGEFS_TYPE_DIRECTORY = (1 << 2),
+	ORANGEFS_TYPE_SYMLINK = (1 << 3),
+	ORANGEFS_TYPE_DIRDATA = (1 << 4),
+	ORANGEFS_TYPE_INTERNAL = (1 << 5)	/* for the server's private use */
+};
+
+/* This structure is used by the VFS-client interaction alone */
+struct ORANGEFS_keyval_pair {
+	char key[ORANGEFS_MAX_XATTR_NAMELEN];
+	__s32 key_sz;	/* __s32 for portable, fixed-size structures */
+	__s32 val_sz;
+	char val[ORANGEFS_MAX_XATTR_VALUELEN];
+};
+
+/* pvfs2-sysint.h ***********************************************************/
+/* Describes attributes for a file, directory, or symlink. */
+struct ORANGEFS_sys_attr_s {
+	__u32 owner;
+	__u32 group;
+	__u32 perms;
+	__u64 atime;
+	__u64 mtime;
+	__u64 ctime;
+	__s64 size;
+
+	/* NOTE: caller must free if valid */
+	char *link_target;
+
+	/* Changed to __s32 so that size of structure does not change */
+	__s32 dfile_count;
+
+	/* Changed to __s32 so that size of structure does not change */
+	__s32 distr_dir_servers_initial;
+
+	/* Changed to __s32 so that size of structure does not change */
+	__s32 distr_dir_servers_max;
+
+	/* Changed to __s32 so that size of structure does not change */
+	__s32 distr_dir_split_size;
+
+	__u32 mirror_copies_count;
+
+	/* NOTE: caller must free if valid */
+	char *dist_name;
+
+	/* NOTE: caller must free if valid */
+	char *dist_params;
+
+	__s64 dirent_count;
+	enum orangefs_ds_type objtype;
+	__u64 flags;
+	__u32 mask;
+	__s64 blksize;
+};
+
+#define ORANGEFS_LOOKUP_LINK_NO_FOLLOW 0
+
+/* pint-dev.h ***************************************************************/
+
+/* parameter structure used in ORANGEFS_DEV_DEBUG ioctl command */
+struct dev_mask_info_s {
+	enum {
+		KERNEL_MASK,
+		CLIENT_MASK,
+	} mask_type;
+	__u64 mask_value;
+};
+
+struct dev_mask2_info_s {
+	__u64 mask1_value;
+	__u64 mask2_value;
+};
+
+#define	GOSSIP_NO_DEBUG			((__u64)0)
+
+#define GOSSIP_SUPER_DEBUG		((__u64)1 << 0)
+#define GOSSIP_INODE_DEBUG		((__u64)1 << 1)
+#define GOSSIP_FILE_DEBUG		((__u64)1 << 2)
+#define GOSSIP_DIR_DEBUG		((__u64)1 << 3)
+#define GOSSIP_UTILS_DEBUG		((__u64)1 << 4)
+#define GOSSIP_WAIT_DEBUG		((__u64)1 << 5)
+#define GOSSIP_ACL_DEBUG		((__u64)1 << 6)
+#define GOSSIP_DCACHE_DEBUG		((__u64)1 << 7)
+#define GOSSIP_DEV_DEBUG		((__u64)1 << 8)
+#define GOSSIP_NAME_DEBUG		((__u64)1 << 9)
+#define GOSSIP_BUFMAP_DEBUG		((__u64)1 << 10)
+#define GOSSIP_CACHE_DEBUG		((__u64)1 << 11)
+#define GOSSIP_DEBUGFS_DEBUG		((__u64)1 << 12)
+#define GOSSIP_XATTR_DEBUG		((__u64)1 << 13)
+#define GOSSIP_INIT_DEBUG		((__u64)1 << 14)
+#define GOSSIP_SYSFS_DEBUG		((__u64)1 << 15)
+
+#define GOSSIP_MAX_NR                 16
+#define GOSSIP_MAX_DEBUG              (((__u64)1 << GOSSIP_MAX_NR) - 1)
+
+/* pint-dev-shared.h ********************************************************/
+#define ORANGEFS_DEV_MAGIC 'k'
+
+#define ORANGEFS_READDIR_DEFAULT_DESC_COUNT  5
+
+#define DEV_GET_MAGIC           0x1
+#define DEV_GET_MAX_UPSIZE      0x2
+#define DEV_GET_MAX_DOWNSIZE    0x3
+#define DEV_MAP                 0x4
+#define DEV_REMOUNT_ALL         0x5
+#define DEV_DEBUG               0x6
+#define DEV_UPSTREAM            0x7
+#define DEV_CLIENT_MASK         0x8
+#define DEV_CLIENT_STRING       0x9
+#define DEV_MAX_NR              0xa
+
+/* supported ioctls, codes are with respect to user-space */
+enum {
+	ORANGEFS_DEV_GET_MAGIC = _IOW(ORANGEFS_DEV_MAGIC, DEV_GET_MAGIC, __s32),
+	ORANGEFS_DEV_GET_MAX_UPSIZE =
+	    _IOW(ORANGEFS_DEV_MAGIC, DEV_GET_MAX_UPSIZE, __s32),
+	ORANGEFS_DEV_GET_MAX_DOWNSIZE =
+	    _IOW(ORANGEFS_DEV_MAGIC, DEV_GET_MAX_DOWNSIZE, __s32),
+	ORANGEFS_DEV_MAP = _IO(ORANGEFS_DEV_MAGIC, DEV_MAP),
+	ORANGEFS_DEV_REMOUNT_ALL = _IO(ORANGEFS_DEV_MAGIC, DEV_REMOUNT_ALL),
+	ORANGEFS_DEV_DEBUG = _IOR(ORANGEFS_DEV_MAGIC, DEV_DEBUG, __s32),
+	ORANGEFS_DEV_UPSTREAM = _IOW(ORANGEFS_DEV_MAGIC, DEV_UPSTREAM, int),
+	ORANGEFS_DEV_CLIENT_MASK = _IOW(ORANGEFS_DEV_MAGIC,
+				    DEV_CLIENT_MASK,
+				    struct dev_mask2_info_s),
+	ORANGEFS_DEV_CLIENT_STRING = _IOW(ORANGEFS_DEV_MAGIC,
+				      DEV_CLIENT_STRING,
+				      char *),
+	ORANGEFS_DEV_MAXNR = DEV_MAX_NR,
+};
+
+/*
+ * version number for use in communicating between kernel space and user
+ * space. Zero signifies the upstream version of the kernel module.
+ */
+#define ORANGEFS_KERNEL_PROTO_VERSION 0
+#define ORANGEFS_MINIMUM_USERSPACE_VERSION 20903
+
+/*
+ * describes memory regions to map in the ORANGEFS_DEV_MAP ioctl.
+ * NOTE: See devorangefs-req.c for 32 bit compat structure.
+ * Since this structure has a variable-sized layout that is different
+ * on 32 and 64 bit platforms, we need to normalize to a 64 bit layout
+ * on such systems before servicing ioctl calls from user-space binaries
+ * that may be 32 bit!
+ */
+struct ORANGEFS_dev_map_desc {
+	void *ptr;
+	__s32 total_size;
+	__s32 size;
+	__s32 count;
+};
+
+struct orangefs_io_response {
+	__s64 amt_complete;
+};
+
+struct orangefs_lookup_response {
+	struct orangefs_object_kref refn;
+};
+
+struct orangefs_create_response {
+	struct orangefs_object_kref refn;
+};
+
+struct orangefs_symlink_response {
+	struct orangefs_object_kref refn;
+};
+
+struct orangefs_getattr_response {
+	struct ORANGEFS_sys_attr_s attributes;
+	char link_target[ORANGEFS_NAME_MAX];
+};
+
+struct orangefs_mkdir_response {
+	struct orangefs_object_kref refn;
+};
+
+struct orangefs_statfs_response {
+	__s64 block_size;
+	__s64 blocks_total;
+	__s64 blocks_avail;
+	__s64 files_total;
+	__s64 files_avail;
+};
+
+struct orangefs_fs_mount_response {
+	__s32 fs_id;
+	__s32 id;
+	struct orangefs_khandle root_khandle;
+};
+
+/* the getxattr response is the attribute value */
+struct orangefs_getxattr_response {
+	__s32 val_sz;
+	__s32 __pad1;
+	char val[ORANGEFS_MAX_XATTR_VALUELEN];
+};
+
+/* the listxattr response is an array of attribute names */
+struct orangefs_listxattr_response {
+	__s32 returned_count;
+	__s32 __pad1;
+	__u64 token;
+	char key[ORANGEFS_MAX_XATTR_LISTLEN * ORANGEFS_MAX_XATTR_NAMELEN];
+	__s32 keylen;
+	__s32 __pad2;
+	__s32 lengths[ORANGEFS_MAX_XATTR_LISTLEN];
+};
+
+struct orangefs_param_response {
+	union {
+		__s64 value64;
+		__s32 value32[2];
+	} u;
+};
+
+#define PERF_COUNT_BUF_SIZE 4096
+struct orangefs_perf_count_response {
+	char buffer[PERF_COUNT_BUF_SIZE];
+};
+
+#define FS_KEY_BUF_SIZE 4096
+struct orangefs_fs_key_response {
+	__s32 fs_keylen;
+	__s32 __pad1;
+	char fs_key[FS_KEY_BUF_SIZE];
+};
+
+/* 2.9.6 */
+struct orangefs_features_response {
+	__u64 features;
+};
+
+struct orangefs_downcall_s {
+	__s32 type;
+	__s32 status;
+	/* currently trailer is used only by readdir */
+	__s64 trailer_size;
+	char *trailer_buf;
+
+	union {
+		struct orangefs_io_response io;
+		struct orangefs_lookup_response lookup;
+		struct orangefs_create_response create;
+		struct orangefs_symlink_response sym;
+		struct orangefs_getattr_response getattr;
+		struct orangefs_mkdir_response mkdir;
+		struct orangefs_statfs_response statfs;
+		struct orangefs_fs_mount_response fs_mount;
+		struct orangefs_getxattr_response getxattr;
+		struct orangefs_listxattr_response listxattr;
+		struct orangefs_param_response param;
+		struct orangefs_perf_count_response perf_count;
+		struct orangefs_fs_key_response fs_key;
+		struct orangefs_features_response features;
+	} resp;
+};
+
+/*
+ * The readdir response comes in the trailer.  It is followed by the
+ * directory entries as described in dir.c.
+ */
+
+struct orangefs_readdir_response_s {
+	__u64 token;
+	__u64 directory_version;
+	__u32 __pad2;
+	__u32 orangefs_dirent_outcount;
+};
+
+struct orangefs_io_request_s {
+	__s32 __pad1;
+	__s32 buf_index;
+	__s32 count;
+	__s32 __pad2;
+	__s64 offset;
+	struct orangefs_object_kref refn;
+	enum ORANGEFS_io_type io_type;
+	__s32 readahead_size;
+};
+
+struct orangefs_lookup_request_s {
+	__s32 sym_follow;
+	__s32 __pad1;
+	struct orangefs_object_kref parent_refn;
+	char d_name[ORANGEFS_NAME_MAX];
+};
+
+struct orangefs_create_request_s {
+	struct orangefs_object_kref parent_refn;
+	struct ORANGEFS_sys_attr_s attributes;
+	char d_name[ORANGEFS_NAME_MAX];
+};
+
+struct orangefs_symlink_request_s {
+	struct orangefs_object_kref parent_refn;
+	struct ORANGEFS_sys_attr_s attributes;
+	char entry_name[ORANGEFS_NAME_MAX];
+	char target[ORANGEFS_NAME_MAX];
+};
+
+struct orangefs_getattr_request_s {
+	struct orangefs_object_kref refn;
+	__u32 mask;
+	__u32 __pad1;
+};
+
+struct orangefs_setattr_request_s {
+	struct orangefs_object_kref refn;
+	struct ORANGEFS_sys_attr_s attributes;
+};
+
+struct orangefs_remove_request_s {
+	struct orangefs_object_kref parent_refn;
+	char d_name[ORANGEFS_NAME_MAX];
+};
+
+struct orangefs_mkdir_request_s {
+	struct orangefs_object_kref parent_refn;
+	struct ORANGEFS_sys_attr_s attributes;
+	char d_name[ORANGEFS_NAME_MAX];
+};
+
+struct orangefs_readdir_request_s {
+	struct orangefs_object_kref refn;
+	__u64 token;
+	__s32 max_dirent_count;
+	__s32 buf_index;
+};
+
+struct orangefs_readdirplus_request_s {
+	struct orangefs_object_kref refn;
+	__u64 token;
+	__s32 max_dirent_count;
+	__u32 mask;
+	__s32 buf_index;
+	__s32 __pad1;
+};
+
+struct orangefs_rename_request_s {
+	struct orangefs_object_kref old_parent_refn;
+	struct orangefs_object_kref new_parent_refn;
+	char d_old_name[ORANGEFS_NAME_MAX];
+	char d_new_name[ORANGEFS_NAME_MAX];
+};
+
+struct orangefs_statfs_request_s {
+	__s32 fs_id;
+	__s32 __pad1;
+};
+
+struct orangefs_truncate_request_s {
+	struct orangefs_object_kref refn;
+	__s64 size;
+};
+
+struct orangefs_ra_cache_flush_request_s {
+	struct orangefs_object_kref refn;
+};
+
+struct orangefs_fs_mount_request_s {
+	char orangefs_config_server[ORANGEFS_MAX_SERVER_ADDR_LEN];
+};
+
+struct orangefs_fs_umount_request_s {
+	__s32 id;
+	__s32 fs_id;
+	char orangefs_config_server[ORANGEFS_MAX_SERVER_ADDR_LEN];
+};
+
+struct orangefs_getxattr_request_s {
+	struct orangefs_object_kref refn;
+	__s32 key_sz;
+	__s32 __pad1;
+	char key[ORANGEFS_MAX_XATTR_NAMELEN];
+};
+
+struct orangefs_setxattr_request_s {
+	struct orangefs_object_kref refn;
+	struct ORANGEFS_keyval_pair keyval;
+	__s32 flags;
+	__s32 __pad1;
+};
+
+struct orangefs_listxattr_request_s {
+	struct orangefs_object_kref refn;
+	__s32 requested_count;
+	__s32 __pad1;
+	__u64 token;
+};
+
+struct orangefs_removexattr_request_s {
+	struct orangefs_object_kref refn;
+	__s32 key_sz;
+	__s32 __pad1;
+	char key[ORANGEFS_MAX_XATTR_NAMELEN];
+};
+
+struct orangefs_op_cancel_s {
+	__u64 op_tag;
+};
+
+struct orangefs_fsync_request_s {
+	struct orangefs_object_kref refn;
+};
+
+enum orangefs_param_request_type {
+	ORANGEFS_PARAM_REQUEST_SET = 1,
+	ORANGEFS_PARAM_REQUEST_GET = 2
+};
+
+enum orangefs_param_request_op {
+	ORANGEFS_PARAM_REQUEST_OP_ACACHE_TIMEOUT_MSECS = 1,
+	ORANGEFS_PARAM_REQUEST_OP_ACACHE_HARD_LIMIT = 2,
+	ORANGEFS_PARAM_REQUEST_OP_ACACHE_SOFT_LIMIT = 3,
+	ORANGEFS_PARAM_REQUEST_OP_ACACHE_RECLAIM_PERCENTAGE = 4,
+	ORANGEFS_PARAM_REQUEST_OP_PERF_TIME_INTERVAL_SECS = 5,
+	ORANGEFS_PARAM_REQUEST_OP_PERF_HISTORY_SIZE = 6,
+	ORANGEFS_PARAM_REQUEST_OP_PERF_RESET = 7,
+	ORANGEFS_PARAM_REQUEST_OP_NCACHE_TIMEOUT_MSECS = 8,
+	ORANGEFS_PARAM_REQUEST_OP_NCACHE_HARD_LIMIT = 9,
+	ORANGEFS_PARAM_REQUEST_OP_NCACHE_SOFT_LIMIT = 10,
+	ORANGEFS_PARAM_REQUEST_OP_NCACHE_RECLAIM_PERCENTAGE = 11,
+	ORANGEFS_PARAM_REQUEST_OP_STATIC_ACACHE_TIMEOUT_MSECS = 12,
+	ORANGEFS_PARAM_REQUEST_OP_STATIC_ACACHE_HARD_LIMIT = 13,
+	ORANGEFS_PARAM_REQUEST_OP_STATIC_ACACHE_SOFT_LIMIT = 14,
+	ORANGEFS_PARAM_REQUEST_OP_STATIC_ACACHE_RECLAIM_PERCENTAGE = 15,
+	ORANGEFS_PARAM_REQUEST_OP_CLIENT_DEBUG = 16,
+	ORANGEFS_PARAM_REQUEST_OP_CCACHE_TIMEOUT_SECS = 17,
+	ORANGEFS_PARAM_REQUEST_OP_CCACHE_HARD_LIMIT = 18,
+	ORANGEFS_PARAM_REQUEST_OP_CCACHE_SOFT_LIMIT = 19,
+	ORANGEFS_PARAM_REQUEST_OP_CCACHE_RECLAIM_PERCENTAGE = 20,
+	ORANGEFS_PARAM_REQUEST_OP_CAPCACHE_TIMEOUT_SECS = 21,
+	ORANGEFS_PARAM_REQUEST_OP_CAPCACHE_HARD_LIMIT = 22,
+	ORANGEFS_PARAM_REQUEST_OP_CAPCACHE_SOFT_LIMIT = 23,
+	ORANGEFS_PARAM_REQUEST_OP_CAPCACHE_RECLAIM_PERCENTAGE = 24,
+	ORANGEFS_PARAM_REQUEST_OP_TWO_MASK_VALUES = 25,
+	ORANGEFS_PARAM_REQUEST_OP_READAHEAD_SIZE = 26,
+	ORANGEFS_PARAM_REQUEST_OP_READAHEAD_COUNT = 27,
+	ORANGEFS_PARAM_REQUEST_OP_READAHEAD_COUNT_SIZE = 28,
+	ORANGEFS_PARAM_REQUEST_OP_READAHEAD_READCNT = 29,
+};
+
+struct orangefs_param_request_s {
+	enum orangefs_param_request_type type;
+	enum orangefs_param_request_op op;
+	union {
+		__s64 value64;
+		__s32 value32[2];
+	} u;
+	char s_value[ORANGEFS_MAX_DEBUG_STRING_LEN];
+};
+
+enum orangefs_perf_count_request_type {
+	ORANGEFS_PERF_COUNT_REQUEST_ACACHE = 1,
+	ORANGEFS_PERF_COUNT_REQUEST_NCACHE = 2,
+	ORANGEFS_PERF_COUNT_REQUEST_CAPCACHE = 3,
+};
+
+struct orangefs_perf_count_request_s {
+	enum orangefs_perf_count_request_type type;
+	__s32 __pad1;
+};
+
+struct orangefs_fs_key_request_s {
+	__s32 fsid;
+	__s32 __pad1;
+};
+
+/* 2.9.6 */
+struct orangefs_features_request_s {
+	__u64 features;
+};
+
+struct orangefs_upcall_s {
+	__s32 type;
+	__u32 uid;
+	__u32 gid;
+	int pid;
+	int tgid;
+	/* Trailers unused but must be retained for protocol compatibility. */
+	__s64 trailer_size;
+	char *trailer_buf;
+
+	union {
+		struct orangefs_io_request_s io;
+		struct orangefs_lookup_request_s lookup;
+		struct orangefs_create_request_s create;
+		struct orangefs_symlink_request_s sym;
+		struct orangefs_getattr_request_s getattr;
+		struct orangefs_setattr_request_s setattr;
+		struct orangefs_remove_request_s remove;
+		struct orangefs_mkdir_request_s mkdir;
+		struct orangefs_readdir_request_s readdir;
+		struct orangefs_readdirplus_request_s readdirplus;
+		struct orangefs_rename_request_s rename;
+		struct orangefs_statfs_request_s statfs;
+		struct orangefs_truncate_request_s truncate;
+		struct orangefs_ra_cache_flush_request_s ra_cache_flush;
+		struct orangefs_fs_mount_request_s fs_mount;
+		struct orangefs_fs_umount_request_s fs_umount;
+		struct orangefs_getxattr_request_s getxattr;
+		struct orangefs_setxattr_request_s setxattr;
+		struct orangefs_listxattr_request_s listxattr;
+		struct orangefs_removexattr_request_s removexattr;
+		struct orangefs_op_cancel_s cancel;
+		struct orangefs_fsync_request_s fsync;
+		struct orangefs_param_request_s param;
+		struct orangefs_perf_count_request_s perf_count;
+		struct orangefs_fs_key_request_s fs_key;
+		struct orangefs_features_request_s features;
+	} req;
+};
+
+#endif /* _UAPI_LINUX_ORANGEFS_H */
-- 
2.16.2


* [PATCH 04/24] orangefs: open code short single-use functions
  2018-03-20 17:02 [PATCH 00/24] orangefs: page cache Martin Brandenburg
                   ` (2 preceding siblings ...)
  2018-03-20 17:02 ` [PATCH 03/24] orangefs: create uapi interface Martin Brandenburg
@ 2018-03-20 17:02 ` Martin Brandenburg
  2018-03-20 17:02 ` [PATCH 05/24] orangefs: implement vm_ops->fault Martin Brandenburg
                   ` (19 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Martin Brandenburg @ 2018-03-20 17:02 UTC (permalink / raw)
  To: hubcap, linux-fsdevel; +Cc: Martin Brandenburg

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
---
 fs/orangefs/file.c | 95 +++++++++++-------------------------------------------
 1 file changed, 19 insertions(+), 76 deletions(-)

diff --git a/fs/orangefs/file.c b/fs/orangefs/file.c
index 15ac531f19cf..739b7b14bf68 100644
--- a/fs/orangefs/file.c
+++ b/fs/orangefs/file.c
@@ -40,70 +40,6 @@ static int flush_racache(struct inode *inode)
 	return ret;
 }
 
-/*
- * Copy to client-core's address space from the buffers specified
- * by the iovec upto total_size bytes.
- * NOTE: the iovector can either contain addresses which
- *       can futher be kernel-space or user-space addresses.
- *       or it can pointers to struct page's
- */
-static int precopy_buffers(int buffer_index,
-			   struct iov_iter *iter,
-			   size_t total_size)
-{
-	int ret = 0;
-	/*
-	 * copy data from application/kernel by pulling it out
-	 * of the iovec.
-	 */
-
-
-	if (total_size) {
-		ret = orangefs_bufmap_copy_from_iovec(iter,
-						      buffer_index,
-						      total_size);
-		if (ret < 0)
-		gossip_err("%s: Failed to copy-in buffers. Please make sure that the pvfs2-client is running. %ld\n",
-			   __func__,
-			   (long)ret);
-	}
-
-	if (ret < 0)
-		gossip_err("%s: Failed to copy-in buffers. Please make sure that the pvfs2-client is running. %ld\n",
-			__func__,
-			(long)ret);
-	return ret;
-}
-
-/*
- * Copy from client-core's address space to the buffers specified
- * by the iovec upto total_size bytes.
- * NOTE: the iovector can either contain addresses which
- *       can futher be kernel-space or user-space addresses.
- *       or it can pointers to struct page's
- */
-static int postcopy_buffers(int buffer_index,
-			    struct iov_iter *iter,
-			    size_t total_size)
-{
-	int ret = 0;
-	/*
-	 * copy data to application/kernel by pushing it out to
-	 * the iovec. NOTE; target buffers can be addresses or
-	 * struct page pointers.
-	 */
-	if (total_size) {
-		ret = orangefs_bufmap_copy_to_iovec(iter,
-						    buffer_index,
-						    total_size);
-		if (ret < 0)
-			gossip_err("%s: Failed to copy-out buffers. Please make sure that the pvfs2-client is running (%ld)\n",
-				__func__,
-				(long)ret);
-	}
-	return ret;
-}
-
 /*
  * Post and wait for the I/O upcall to finish
  */
@@ -156,14 +92,15 @@ static ssize_t wait_for_direct_io(enum ORANGEFS_io_type type, struct inode *inod
 		     total_size);
 	/*
 	 * Stage 1: copy the buffers into client-core's address space
-	 * precopy_buffers only pertains to writes.
 	 */
-	if (type == ORANGEFS_IO_WRITE) {
-		ret = precopy_buffers(buffer_index,
-				      iter,
-				      total_size);
-		if (ret < 0)
+	if (type == ORANGEFS_IO_WRITE && total_size) {
+		ret = orangefs_bufmap_copy_from_iovec(iter, buffer_index,
+		    total_size);
+		if (ret < 0) {
+			gossip_err("%s: Failed to copy-in buffers. Please make sure that the pvfs2-client is running. %ld\n",
+			    __func__, (long)ret);
 			goto out;
+		}
 	}
 
 	gossip_debug(GOSSIP_FILE_DEBUG,
@@ -259,14 +196,20 @@ static ssize_t wait_for_direct_io(enum ORANGEFS_io_type type, struct inode *inod
 
 	/*
 	 * Stage 3: Post copy buffers from client-core's address space
-	 * postcopy_buffers only pertains to reads.
 	 */
-	if (type == ORANGEFS_IO_READ) {
-		ret = postcopy_buffers(buffer_index,
-				       iter,
-				       new_op->downcall.resp.io.amt_complete);
-		if (ret < 0)
+	if (type == ORANGEFS_IO_READ && new_op->downcall.resp.io.amt_complete) {
+		/*
+		 * NOTE: the iovector can contain either kernel-space or
+		 *       user-space addresses, or it can contain pointers
+		 *       to struct pages.
+		 */
+		ret = orangefs_bufmap_copy_to_iovec(iter, buffer_index,
+		    new_op->downcall.resp.io.amt_complete);
+		if (ret < 0) {
+			gossip_err("%s: Failed to copy-out buffers. Please make sure that the pvfs2-client is running (%ld)\n",
+			    __func__, (long)ret);
 			goto out;
+		}
 	}
 	gossip_debug(GOSSIP_FILE_DEBUG,
 	    "%s(%pU): Amount %s, returned by the sys-io call:%d\n",
-- 
2.16.2


* [PATCH 05/24] orangefs: implement vm_ops->fault
  2018-03-20 17:02 [PATCH 00/24] orangefs: page cache Martin Brandenburg
                   ` (3 preceding siblings ...)
  2018-03-20 17:02 ` [PATCH 04/24] orangefs: open code short single-use functions Martin Brandenburg
@ 2018-03-20 17:02 ` Martin Brandenburg
  2018-03-20 17:02 ` [PATCH 06/24] orangefs: implement xattr cache Martin Brandenburg
                   ` (18 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Martin Brandenburg @ 2018-03-20 17:02 UTC (permalink / raw)
  To: hubcap, linux-fsdevel; +Cc: Martin Brandenburg

Must retrieve size before running filemap_fault so the kernel has
an up-to-date size.

This should have been caught by xfstests generic/246, but it was masked
by orangefs_new_inode, which set i_size to PAGE_SIZE.  When nothing
caused a getattr prior to a pagefault, i_size was still PAGE_SIZE.
Since xfstests only read 10 bytes, it did not catch this bug.

When orangefs_new_inode was modified to perform a getattr instead,
i_size was set to zero, as it was a newly created file.  Then
orangefs_file_write_iter did NOT set i_size, instead preferring to
invalidate the attribute cache and let the next caller retrieve
i_size.  But the fault handler did not know it was supposed to retrieve
i_size.  So during xfstests, i_size was still zero, and filemap_fault
returned VM_FAULT_SIGBUS.

Fixes xfstests generic/080, generic/141, generic/215, generic/247,
generic/248, generic/437, and generic/452.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
---
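The ordering problem described above can be illustrated with a small
userspace model.  This is only a sketch, not the kernel code: every
toy_* name is hypothetical, and toy_filemap_fault assumes the generic
path rejects any fault whose page offset is at or past the end of file
as computed from the cached size.

```c
#define TOY_PAGE_SIZE 4096UL
#define TOY_FAULT_OK 0
#define TOY_FAULT_SIGBUS 1

/* Hypothetical stand-in for an inode whose size lives on a server. */
struct toy_inode {
	unsigned long cached_size;	/* what the kernel currently believes */
	unsigned long server_size;	/* what the server would report */
};

/* Models a getattr that refreshes only the cached size. */
void toy_getattr_size(struct toy_inode *inode)
{
	inode->cached_size = inode->server_size;
}

/* Models the generic fault path: a fault at or past EOF fails. */
int toy_filemap_fault(const struct toy_inode *inode, unsigned long pgoff)
{
	unsigned long max_pgoff =
	    (inode->cached_size + TOY_PAGE_SIZE - 1) / TOY_PAGE_SIZE;

	return pgoff >= max_pgoff ? TOY_FAULT_SIGBUS : TOY_FAULT_OK;
}

/* Models the new handler: refresh the size, then take the generic path. */
int toy_fault(struct toy_inode *inode, unsigned long pgoff)
{
	toy_getattr_size(inode);
	return toy_filemap_fault(inode, pgoff);
}
```

With cached_size still zero and server_size at 10, calling
toy_filemap_fault directly on page 0 fails, while toy_fault succeeds
because it refreshes the size first.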
 fs/orangefs/file.c | 30 ++++++++++++++++++++++++++++--
 1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/fs/orangefs/file.c b/fs/orangefs/file.c
index 739b7b14bf68..f78a5902d5f7 100644
--- a/fs/orangefs/file.c
+++ b/fs/orangefs/file.c
@@ -527,6 +527,28 @@ static long orangefs_ioctl(struct file *file, unsigned int cmd, unsigned long ar
 	return ret;
 }
 
+static int orangefs_fault(struct vm_fault *vmf)
+{
+	struct file *file = vmf->vma->vm_file;
+	int rc;
+	rc = orangefs_inode_getattr(file->f_mapping->host, 0, 1,
+	    STATX_SIZE);
+	if (rc == -ESTALE)
+		rc = -EIO;
+	if (rc) {
+		gossip_err("%s: orangefs_inode_getattr failed, "
+		    "rc:%d:.\n", __func__, rc);
+		return VM_FAULT_SIGBUS;
+	}
+	return filemap_fault(vmf);
+}
+
+const struct vm_operations_struct orangefs_file_vm_ops = {
+	.fault = orangefs_fault,
+	.map_pages = filemap_map_pages,
+	.page_mkwrite = filemap_page_mkwrite,
+};
+
 /*
  * Memory map a region of a file.
  */
@@ -538,12 +560,16 @@ static int orangefs_file_mmap(struct file *file, struct vm_area_struct *vma)
 			(char *)file->f_path.dentry->d_name.name :
 			(char *)"Unknown"));
 
+	if ((vma->vm_flags & VM_SHARED) && (vma->vm_flags & VM_MAYWRITE))
+		return -EINVAL;
+
 	/* set the sequential readahead hint */
 	vma->vm_flags |= VM_SEQ_READ;
 	vma->vm_flags &= ~VM_RAND_READ;
 
-	/* Use readonly mmap since we cannot support writable maps. */
-	return generic_file_readonly_mmap(file, vma);
+	file_accessed(file);
+	vma->vm_ops = &orangefs_file_vm_ops;
+	return 0;
 }
 
 #define mapping_nrpages(idata) ((idata)->nrpages)
-- 
2.16.2


* [PATCH 06/24] orangefs: implement xattr cache
  2018-03-20 17:02 [PATCH 00/24] orangefs: page cache Martin Brandenburg
                   ` (4 preceding siblings ...)
  2018-03-20 17:02 ` [PATCH 05/24] orangefs: implement vm_ops->fault Martin Brandenburg
@ 2018-03-20 17:02 ` Martin Brandenburg
  2018-03-20 17:02 ` [PATCH 07/24] orangefs: simpler installation documentation Martin Brandenburg
                   ` (17 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Martin Brandenburg @ 2018-03-20 17:02 UTC (permalink / raw)
  To: hubcap, linux-fsdevel; +Cc: Martin Brandenburg

This uses the same timeout as the getattr cache.  This substantially
increases performance when writing files with smaller buffer sizes.

When writing, the size is (often) changed, which causes a call to
notify_change which calls security_inode_need_killpriv which needs a
getxattr.  Caching it reduces traffic to the server.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
---
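For readers who want the shape of the cache without the kernel hlist
helpers, here is a rough userspace sketch.  It is an illustration under
assumptions, not the patch itself: the toy_* names and TOY_NAME_MAX are
made up, time is passed in as a plain tick counter instead of jiffies,
and counter wraparound is ignored.  What it does share with the patch
is the 16-bucket table, the additive hash over the key, expiry on
lookup, and a length of -1 to remember that an attribute does not
exist.

```c
#include <stdlib.h>
#include <string.h>

#define TOY_BUCKETS 16		/* same bucket count as a 4-bit hashtable */
#define TOY_NAME_MAX 64		/* hypothetical limit, for illustration only */

struct toy_xattr {
	struct toy_xattr *next;
	char key[TOY_NAME_MAX];
	long length;		/* -1 records "attribute does not exist" */
	unsigned long expires;	/* tick at which the entry becomes stale */
};

struct toy_cache {
	struct toy_xattr *bucket[TOY_BUCKETS];
};

/* Additive hash over the key bytes, stopping at the terminating NUL. */
unsigned int toy_hash(const char *key)
{
	unsigned int sum = 0;

	while (*key)
		sum += (unsigned char)*key++;
	return sum % TOY_BUCKETS;
}

/* Insert an entry valid for ttl ticks; returns 0 on success, -1 on failure. */
int toy_insert(struct toy_cache *c, const char *key, long length,
	       unsigned long now, unsigned long ttl)
{
	struct toy_xattr *x;
	unsigned int b;

	if (strlen(key) >= TOY_NAME_MAX)
		return -1;
	x = malloc(sizeof(*x));
	if (!x)
		return -1;
	strcpy(x->key, key);
	x->length = length;
	x->expires = now + ttl;
	b = toy_hash(key);
	x->next = c->bucket[b];
	c->bucket[b] = x;
	return 0;
}

/* Look up key, freeing any expired entries met along the way. */
struct toy_xattr *toy_find(struct toy_cache *c, const char *key,
			   unsigned long now)
{
	struct toy_xattr **pp = &c->bucket[toy_hash(key)];

	while (*pp) {
		struct toy_xattr *x = *pp;

		if (now >= x->expires) {
			*pp = x->next;	/* unlink and drop the stale entry */
			free(x);
			continue;
		}
		if (!strcmp(x->key, key))
			return x;
		pp = &x->next;
	}
	return NULL;
}
```

A negative entry is what makes the write path cheaper here: a repeated
lookup of an attribute that is absent is answered from the cache until
the entry expires, instead of going to the server each time.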
 fs/orangefs/inode.c           |  1 +
 fs/orangefs/orangefs-kernel.h | 10 +++++
 fs/orangefs/super.c           |  9 ++++
 fs/orangefs/xattr.c           | 97 +++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 117 insertions(+)

diff --git a/fs/orangefs/inode.c b/fs/orangefs/inode.c
index a922c9da80d6..26381d451039 100644
--- a/fs/orangefs/inode.c
+++ b/fs/orangefs/inode.c
@@ -363,6 +363,7 @@ static int orangefs_set_inode(struct inode *inode, void *data)
 	struct orangefs_object_kref *ref = (struct orangefs_object_kref *) data;
 	ORANGEFS_I(inode)->refn.fs_id = ref->fs_id;
 	ORANGEFS_I(inode)->refn.khandle = ref->khandle;
+	hash_init(ORANGEFS_I(inode)->xattr_cache);
 	return 0;
 }
 
diff --git a/fs/orangefs/orangefs-kernel.h b/fs/orangefs/orangefs-kernel.h
index 4f26fcbb9c83..b2046d29a116 100644
--- a/fs/orangefs/orangefs-kernel.h
+++ b/fs/orangefs/orangefs-kernel.h
@@ -195,6 +195,8 @@ struct orangefs_inode_s {
 
 	unsigned long getattr_time;
 	u32 getattr_mask;
+
+	DECLARE_HASHTABLE(xattr_cache, 4);
 };
 
 /* per superblock private orangefs info */
@@ -219,6 +221,14 @@ struct orangefs_stats {
 	unsigned long writes;
 };
 
+struct orangefs_cached_xattr {
+	struct hlist_node node;
+	char key[ORANGEFS_MAX_XATTR_NAMELEN];
+	char val[ORANGEFS_MAX_XATTR_VALUELEN];
+	ssize_t length;
+	unsigned long timeout;
+};
+
 extern struct orangefs_stats orangefs_stats;
 
 /*
diff --git a/fs/orangefs/super.c b/fs/orangefs/super.c
index 6183f6f6db53..b7718530bde4 100644
--- a/fs/orangefs/super.c
+++ b/fs/orangefs/super.c
@@ -127,6 +127,15 @@ static void orangefs_i_callback(struct rcu_head *head)
 {
 	struct inode *inode = container_of(head, struct inode, i_rcu);
 	struct orangefs_inode_s *orangefs_inode = ORANGEFS_I(inode);
+	struct orangefs_cached_xattr *cx;
+	struct hlist_node *tmp;
+	int i;
+
+	hash_for_each_safe(orangefs_inode->xattr_cache, i, tmp, cx, node) {
+		hlist_del(&cx->node);
+		kfree(cx);
+	}
+
 	kmem_cache_free(orangefs_inode_cache, orangefs_inode);
 }
 
diff --git a/fs/orangefs/xattr.c b/fs/orangefs/xattr.c
index b3b0db56b408..59bd382eb6dc 100644
--- a/fs/orangefs/xattr.c
+++ b/fs/orangefs/xattr.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 /*
  * (C) 2001 Clemson University and The University of Chicago
+ * Copyright 2018 Omnibond Systems, L.L.C.
  *
  * See COPYING in top-level directory.
  */
@@ -49,6 +50,35 @@ static inline int convert_to_internal_xattr_flags(int setxattr_flags)
 	return internal_flag;
 }
 
+static unsigned int xattr_key(const char *key)
+{
+	unsigned int i = 0;
+	while (*key)
+		i += *key++;
+	return i % 16;
+}
+
+static struct orangefs_cached_xattr *find_cached_xattr(struct inode *inode,
+    const char *key)
+{
+	struct orangefs_inode_s *orangefs_inode = ORANGEFS_I(inode);
+	struct orangefs_cached_xattr *cx;
+	struct hlist_head *h;
+	struct hlist_node *tmp;
+	h = &orangefs_inode->xattr_cache[xattr_key(key)];
+	if (hlist_empty(h))
+		return NULL;
+	hlist_for_each_entry_safe(cx, tmp, h, node) {
+		if (!time_before(jiffies, cx->timeout)) {
+			hlist_del(&cx->node);
+			kfree(cx);
+			continue;
+		}
+		if (!strcmp(cx->key, key))
+			return cx;
+	}
+	return NULL;
+}
 
 /*
  * Tries to get a specified key's attributes of a given
@@ -64,6 +94,7 @@ ssize_t orangefs_inode_getxattr(struct inode *inode, const char *name,
 {
 	struct orangefs_inode_s *orangefs_inode = ORANGEFS_I(inode);
 	struct orangefs_kernel_op_s *new_op = NULL;
+	struct orangefs_cached_xattr *cx;
 	ssize_t ret = -ENOMEM;
 	ssize_t length = 0;
 	int fsuid;
@@ -92,6 +123,27 @@ ssize_t orangefs_inode_getxattr(struct inode *inode, const char *name,
 
 	down_read(&orangefs_inode->xattr_sem);
 
+	cx = find_cached_xattr(inode, name);
+	if (cx) {
+		if (cx->length == -1) {
+			ret = -ENODATA;
+			goto out_unlock;
+		} else {
+			if (size == 0) {
+				ret = cx->length;
+				goto out_unlock;
+			}
+			if (cx->length > size) {
+				ret = -ERANGE;
+				goto out_unlock;
+			}
+			memcpy(buffer, cx->val, cx->length);
+			memset(buffer + cx->length, 0, size - cx->length);
+			ret = cx->length;
+			goto out_unlock;
+		}
+	}
+
 	new_op = op_alloc(ORANGEFS_VFS_OP_GETXATTR);
 	if (!new_op)
 		goto out_unlock;
@@ -116,6 +168,15 @@ ssize_t orangefs_inode_getxattr(struct inode *inode, const char *name,
 				     " does not exist!\n",
 				     get_khandle_from_ino(inode),
 				     (char *)new_op->upcall.req.getxattr.key);
+			cx = kmalloc(sizeof *cx, GFP_KERNEL);
+			if (cx) {
+				strcpy(cx->key, name);
+				cx->length = -1;
+				cx->timeout = jiffies +
+				    orangefs_getattr_timeout_msecs*HZ/1000;
+				hash_add(orangefs_inode->xattr_cache, &cx->node,
+				    xattr_key(cx->key));
+			}
 		}
 		goto out_release_op;
 	}
@@ -155,6 +216,16 @@ ssize_t orangefs_inode_getxattr(struct inode *inode, const char *name,
 
 	ret = length;
 
+	cx = kmalloc(sizeof *cx, GFP_KERNEL);
+	if (cx) {
+		strcpy(cx->key, name);
+		memcpy(cx->val, buffer, length);
+		cx->length = length;
+		cx->timeout = jiffies + HZ;
+		hash_add(orangefs_inode->xattr_cache, &cx->node,
+		    xattr_key(cx->key));
+	}
+
 out_release_op:
 	op_release(new_op);
 out_unlock:
@@ -167,6 +238,9 @@ static int orangefs_inode_removexattr(struct inode *inode, const char *name,
 {
 	struct orangefs_inode_s *orangefs_inode = ORANGEFS_I(inode);
 	struct orangefs_kernel_op_s *new_op = NULL;
+	struct orangefs_cached_xattr *cx;
+	struct hlist_head *h;
+	struct hlist_node *tmp;
 	int ret = -ENOMEM;
 
 	if (strlen(name) >= ORANGEFS_MAX_XATTR_NAMELEN)
@@ -208,6 +282,16 @@ static int orangefs_inode_removexattr(struct inode *inode, const char *name,
 		     "orangefs_inode_removexattr: returning %d\n", ret);
 
 	op_release(new_op);
+
+	h = &orangefs_inode->xattr_cache[xattr_key(name)];
+	hlist_for_each_entry_safe(cx, tmp, h, node) {
+		if (!strcmp(cx->key, name)) {
+			hlist_del(&cx->node);
+			kfree(cx);
+			break;
+		}
+	}
+
 out_unlock:
 	up_write(&orangefs_inode->xattr_sem);
 	return ret;
@@ -225,6 +309,9 @@ int orangefs_inode_setxattr(struct inode *inode, const char *name,
 	struct orangefs_inode_s *orangefs_inode = ORANGEFS_I(inode);
 	struct orangefs_kernel_op_s *new_op;
 	int internal_flag = 0;
+	struct orangefs_cached_xattr *cx;
+	struct hlist_head *h;
+	struct hlist_node *tmp;
 	int ret = -ENOMEM;
 
 	gossip_debug(GOSSIP_XATTR_DEBUG,
@@ -286,6 +373,16 @@ int orangefs_inode_setxattr(struct inode *inode, const char *name,
 
 	/* when request is serviced properly, free req op struct */
 	op_release(new_op);
+
+	h = &orangefs_inode->xattr_cache[xattr_key(name)];
+	hlist_for_each_entry_safe(cx, tmp, h, node) {
+		if (!strcmp(cx->key, name)) {
+			hlist_del(&cx->node);
+			kfree(cx);
+			break;
+		}
+	}
+
 out_unlock:
 	up_write(&orangefs_inode->xattr_sem);
 	return ret;
-- 
2.16.2


* [PATCH 07/24] orangefs: simpler installation documentation
  2018-03-20 17:02 [PATCH 00/24] orangefs: page cache Martin Brandenburg
                   ` (5 preceding siblings ...)
  2018-03-20 17:02 ` [PATCH 06/24] orangefs: implement xattr cache Martin Brandenburg
@ 2018-03-20 17:02 ` Martin Brandenburg
  2018-03-20 17:02 ` [PATCH 08/24] orangefs: add tracepoint for service_operation Martin Brandenburg
                   ` (16 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Martin Brandenburg @ 2018-03-20 17:02 UTC (permalink / raw)
  To: hubcap, linux-fsdevel; +Cc: Martin Brandenburg

Unless one is working on the userspace code, there's no need to compile
OrangeFS.  The package works just fine.

Also document the process to run xfstests against OrangeFS.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
---
 Documentation/filesystems/orangefs.txt | 84 ++++++++++++++++++++++------------
 1 file changed, 56 insertions(+), 28 deletions(-)

diff --git a/Documentation/filesystems/orangefs.txt b/Documentation/filesystems/orangefs.txt
index e2818b60a5c2..a8696a43efe2 100644
--- a/Documentation/filesystems/orangefs.txt
+++ b/Documentation/filesystems/orangefs.txt
@@ -42,48 +42,76 @@ Orangefs versions prior to 2.9.3 would not be compatible with the
 upstream version of the kernel client.
 
 
-BUILDING THE USERSPACE FILESYSTEM ON A SINGLE SERVER
-====================================================
+RUNNING ORANGEFS ON A SINGLE SERVER
+===================================
 
-You can omit --prefix if you don't care that things are sprinkled around in
-/usr/local. As of version 2.9.6, Orangefs uses Berkeley DB by default, we
-will probably be changing the default to lmdb soon.
+OrangeFS is usually run in large installations with multiple servers and
+clients, but a complete filesystem can be run on a single machine for
+development and testing.
 
-./configure --prefix=/opt/ofs --with-db-backend=lmdb
+On Fedora, install orangefs and orangefs-server.
 
-make
+dnf -y install orangefs orangefs-server
 
-make install
+There is an example server configuration file in
+/etc/orangefs/orangefs.conf.  Change localhost to your hostname.
 
-Create an orangefs config file:
-/opt/ofs/bin/pvfs2-genconfig /etc/pvfs2.conf
+To generate a filesystem to run xfstests against, see below.
 
-  for "Enter hostnames", use the hostname, don't let it default to
-  localhost.
+There is an example client configuration file in /etc/pvfs2tab.  It is a
+single line.  Uncomment it and change the hostname if necessary.  This
+controls clients which use libpvfs2.  This does not control the
+pvfs2-client-core.
 
-create a pvfs2tab file in /etc:
-cat /etc/pvfs2tab
-tcp://myhostname:3334/orangefs /mymountpoint pvfs2 defaults,noauto 0 0
+Create the filesystem.
 
-create the mount point you specified in the tab file if needed:
-mkdir /mymountpoint
+pvfs2-server -f /etc/orangefs/orangefs.conf
 
-bootstrap the server:
-/opt/ofs/sbin/pvfs2-server /etc/pvfs2.conf -f
+Start the server.
 
-start the server:
-/opt/osf/sbin/pvfs2-server /etc/pvfs2.conf
+systemctl start orangefs-server
 
-Now the server is running. At this point you might like to
-prove things are working with:
+Test the server.
 
-/opt/osf/bin/pvfs2-ls /mymountpoint
+pvfs2-ping -m /pvfsmnt
 
-If stuff seems to be working, turn on the client core:
-/opt/osf/sbin/pvfs2-client -p /opt/osf/sbin/pvfs2-client-core
+Start the client.  The module must be compiled in or loaded before this
+point.
 
-Mount your filesystem.
-mount -t pvfs2 tcp://myhostname:3334/orangefs /mymountpoint
+systemctl start orangefs-client
+
+Mount the filesystem.
+
+mount -t pvfs2 tcp://localhost:3334/orangefs /pvfsmnt
+
+
+RUNNING XFSTESTS
+================
+
+It is useful to use a scratch filesystem with xfstests.  This can be
+done with only one server.
+
+Make a second copy of the FileSystem section in the server configuration
+file, which is /etc/orangefs/orangefs.conf.  Change the Name to scratch.
+Change the ID to something other than the ID of the first FileSystem
+section (2 is usually a good choice).
+
+Then there are two FileSystem sections: orangefs and scratch.
+
+This change should be made before creating the filesystem.
+
+pvfs2-server -f /etc/orangefs/orangefs.conf
+
+To run xfstests, create /etc/xfsqa.config.
+
+TEST_DIR=/orangefs
+TEST_DEV=tcp://localhost:3334/orangefs
+SCRATCH_MNT=/scratch
+SCRATCH_DEV=tcp://localhost:3334/scratch
+
+Then xfstests can be run.
+
+./check -pvfs2
 
 
 OPTIONS
-- 
2.16.2


* [PATCH 08/24] orangefs: add tracepoint for service_operation
  2018-03-20 17:02 [PATCH 00/24] orangefs: page cache Martin Brandenburg
                   ` (6 preceding siblings ...)
  2018-03-20 17:02 ` [PATCH 07/24] orangefs: simpler installation documentation Martin Brandenburg
@ 2018-03-20 17:02 ` Martin Brandenburg
  2018-03-20 17:02 ` [PATCH 09/24] orangefs: tracepoints for orangefs_devreq_{read,write_iter,poll} Martin Brandenburg
                   ` (15 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Martin Brandenburg @ 2018-03-20 17:02 UTC (permalink / raw)
  To: hubcap, linux-fsdevel; +Cc: Martin Brandenburg

This is the first tracepoint for OrangeFS.

Remove the op_name argument to service_operation.  It is only used for
debug messages, and equally useful information can be extracted from the
op.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
---
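As a rough illustration of why the string argument is redundant, the
sketch below derives a printable name from the type already stored in
the op.  The toy_* names and the three op types are hypothetical and
stand in for the real ORANGEFS_VFS_OP_* values; this is not the helper
the module uses.

```c
#include <stddef.h>

/* Hypothetical op types; the real ones are the ORANGEFS_VFS_OP_* values. */
enum toy_op_type {
	TOY_OP_LOOKUP,
	TOY_OP_READDIR,
	TOY_OP_GETXATTR,
	TOY_OP_COUNT
};

struct toy_op {
	enum toy_op_type type;
	unsigned long long tag;
};

/* Map the type already stored in the op to a printable name. */
const char *toy_op_name(const struct toy_op *op)
{
	static const char *const names[TOY_OP_COUNT] = {
		[TOY_OP_LOOKUP] = "lookup",
		[TOY_OP_READDIR] = "readdir",
		[TOY_OP_GETXATTR] = "getxattr",
	};

	/* A NULL op or an out-of-range type gets a fixed fallback name. */
	if (!op || (unsigned int)op->type >= TOY_OP_COUNT)
		return "unknown";
	return names[op->type];
}
```

Since the name comes from the op, a caller cannot pass a string that
disagrees with the operation it is actually submitting.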
 fs/orangefs/Makefile           |  4 +++-
 fs/orangefs/dcache.c           |  3 +--
 fs/orangefs/dir.c              |  3 +--
 fs/orangefs/file.c             | 10 ++--------
 fs/orangefs/inode.c            |  2 +-
 fs/orangefs/namei.c            | 12 +++++-------
 fs/orangefs/orangefs-debugfs.c |  4 +---
 fs/orangefs/orangefs-kernel.h  |  4 +---
 fs/orangefs/orangefs-sysfs.c   | 11 ++---------
 fs/orangefs/orangefs-trace.c   |  3 +++
 fs/orangefs/orangefs-trace.h   | 41 +++++++++++++++++++++++++++++++++++++++++
 fs/orangefs/orangefs-utils.c   |  9 +++------
 fs/orangefs/super.c            | 12 ++++++------
 fs/orangefs/waitqueue.c        | 15 +++++++--------
 fs/orangefs/xattr.c            | 14 ++++----------
 15 files changed, 81 insertions(+), 66 deletions(-)
 create mode 100644 fs/orangefs/orangefs-trace.c
 create mode 100644 fs/orangefs/orangefs-trace.h

diff --git a/fs/orangefs/Makefile b/fs/orangefs/Makefile
index 9b6c50bb173b..072388cdf0a3 100644
--- a/fs/orangefs/Makefile
+++ b/fs/orangefs/Makefile
@@ -8,4 +8,6 @@ obj-$(CONFIG_ORANGEFS_FS) += orangefs.o
 orangefs-objs := acl.o file.o orangefs-cache.o orangefs-utils.o xattr.o \
 		 dcache.o inode.o orangefs-sysfs.o orangefs-mod.o super.o \
 		 devorangefs-req.o namei.o symlink.o dir.o orangefs-bufmap.o \
-		 orangefs-debugfs.o waitqueue.o
+		 orangefs-debugfs.o waitqueue.o orangefs-trace.o
+
+CFLAGS_orangefs-trace.o += -I$(src)
diff --git a/fs/orangefs/dcache.c b/fs/orangefs/dcache.c
index 8e8e15850e39..606235415351 100644
--- a/fs/orangefs/dcache.c
+++ b/fs/orangefs/dcache.c
@@ -41,8 +41,7 @@ static int orangefs_revalidate_lookup(struct dentry *dentry)
 		     __LINE__,
 		     get_interruptible_flag(parent_inode));
 
-	err = service_operation(new_op, "orangefs_lookup",
-			get_interruptible_flag(parent_inode));
+	err = service_operation(new_op, get_interruptible_flag(parent_inode));
 
 	/* Positive dentry: reject if error or not the same inode. */
 	if (inode) {
diff --git a/fs/orangefs/dir.c b/fs/orangefs/dir.c
index e760315acd2a..567177dd956c 100644
--- a/fs/orangefs/dir.c
+++ b/fs/orangefs/dir.c
@@ -85,8 +85,7 @@ static int do_readdir(struct orangefs_inode_s *oi,
 
 	op->upcall.req.readdir.buf_index = bufi;
 
-	r = service_operation(op, "orangefs_readdir",
-	    get_interruptible_flag(dentry->d_inode));
+	r = service_operation(op, get_interruptible_flag(dentry->d_inode));
 
 	orangefs_readdir_index_put(bufi);
 
diff --git a/fs/orangefs/file.c b/fs/orangefs/file.c
index f78a5902d5f7..5a66d521d9df 100644
--- a/fs/orangefs/file.c
+++ b/fs/orangefs/file.c
@@ -30,8 +30,7 @@ static int flush_racache(struct inode *inode)
 		return -ENOMEM;
 	new_op->upcall.req.ra_cache_flush.refn = orangefs_inode->refn;
 
-	ret = service_operation(new_op, "orangefs_flush_racache",
-	    get_interruptible_flag(inode));
+	ret = service_operation(new_op, get_interruptible_flag(inode));
 
 	gossip_debug(GOSSIP_UTILS_DEBUG, "%s: got return value of %d\n",
 	    __func__, ret);
@@ -110,11 +109,7 @@ static ssize_t wait_for_direct_io(enum ORANGEFS_io_type type, struct inode *inod
 		     llu(new_op->tag));
 
 	/* Stage 2: Service the I/O operation */
-	ret = service_operation(new_op,
-				type == ORANGEFS_IO_WRITE ?
-					"file_write" :
-					"file_read",
-				get_interruptible_flag(inode));
+	ret = service_operation(new_op, get_interruptible_flag(inode));
 
 	/*
 	 * If service_operation() returns -EAGAIN #and# the operation was
@@ -627,7 +622,6 @@ static int orangefs_fsync(struct file *file,
 	new_op->upcall.req.fsync.refn = orangefs_inode->refn;
 
 	ret = service_operation(new_op,
-			"orangefs_fsync",
 			get_interruptible_flag(file_inode(file)));
 
 	gossip_debug(GOSSIP_FILE_DEBUG,
diff --git a/fs/orangefs/inode.c b/fs/orangefs/inode.c
index 26381d451039..bfd3add254c1 100644
--- a/fs/orangefs/inode.c
+++ b/fs/orangefs/inode.c
@@ -180,7 +180,7 @@ static int orangefs_setattr_size(struct inode *inode, struct iattr *iattr)
 	new_op->upcall.req.truncate.refn = orangefs_inode->refn;
 	new_op->upcall.req.truncate.size = (__s64) iattr->ia_size;
 
-	ret = service_operation(new_op, __func__,
+	ret = service_operation(new_op,
 				get_interruptible_flag(inode));
 
 	/*
diff --git a/fs/orangefs/namei.c b/fs/orangefs/namei.c
index 3ba6e153f769..7dfdf6ad15f6 100644
--- a/fs/orangefs/namei.c
+++ b/fs/orangefs/namei.c
@@ -42,7 +42,7 @@ static int orangefs_create(struct inode *dir,
 	strncpy(new_op->upcall.req.create.d_name,
 		dentry->d_name.name, ORANGEFS_NAME_MAX - 1);
 
-	ret = service_operation(new_op, __func__, get_interruptible_flag(dir));
+	ret = service_operation(new_op, get_interruptible_flag(dir));
 
 	gossip_debug(GOSSIP_NAME_DEBUG,
 		     "%s: %pd: handle:%pU: fsid:%d: new_op:%p: ret:%d:\n",
@@ -150,7 +150,7 @@ static struct dentry *orangefs_lookup(struct inode *dir, struct dentry *dentry,
 		     &new_op->upcall.req.lookup.parent_refn.khandle,
 		     new_op->upcall.req.lookup.parent_refn.fs_id);
 
-	ret = service_operation(new_op, __func__, get_interruptible_flag(dir));
+	ret = service_operation(new_op, get_interruptible_flag(dir));
 
 	gossip_debug(GOSSIP_NAME_DEBUG,
 		     "Lookup Got %pU, fsid %d (ret=%d)\n",
@@ -245,8 +245,7 @@ static int orangefs_unlink(struct inode *dir, struct dentry *dentry)
 	strncpy(new_op->upcall.req.remove.d_name, dentry->d_name.name,
 		ORANGEFS_NAME_MAX - 1);
 
-	ret = service_operation(new_op, "orangefs_unlink",
-				get_interruptible_flag(inode));
+	ret = service_operation(new_op, get_interruptible_flag(inode));
 
 	gossip_debug(GOSSIP_NAME_DEBUG,
 		     "%s: service_operation returned:%d:\n",
@@ -302,7 +301,7 @@ static int orangefs_symlink(struct inode *dir,
 		ORANGEFS_NAME_MAX - 1);
 	strncpy(new_op->upcall.req.sym.target, symname, ORANGEFS_NAME_MAX - 1);
 
-	ret = service_operation(new_op, __func__, get_interruptible_flag(dir));
+	ret = service_operation(new_op, get_interruptible_flag(dir));
 
 	gossip_debug(GOSSIP_NAME_DEBUG,
 		     "Symlink Got ORANGEFS handle %pU on fsid %d (ret=%d)\n",
@@ -373,7 +372,7 @@ static int orangefs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode
 	strncpy(new_op->upcall.req.mkdir.d_name,
 		dentry->d_name.name, ORANGEFS_NAME_MAX - 1);
 
-	ret = service_operation(new_op, __func__, get_interruptible_flag(dir));
+	ret = service_operation(new_op, get_interruptible_flag(dir));
 
 	gossip_debug(GOSSIP_NAME_DEBUG,
 		     "Mkdir Got ORANGEFS handle %pU on fsid %d\n",
@@ -458,7 +457,6 @@ static int orangefs_rename(struct inode *old_dir,
 		ORANGEFS_NAME_MAX - 1);
 
 	ret = service_operation(new_op,
-				"orangefs_rename",
 				get_interruptible_flag(old_dentry->d_inode));
 
 	gossip_debug(GOSSIP_NAME_DEBUG,
diff --git a/fs/orangefs/orangefs-debugfs.c b/fs/orangefs/orangefs-debugfs.c
index af14e80211c9..61f1957e1e70 100644
--- a/fs/orangefs/orangefs-debugfs.c
+++ b/fs/orangefs/orangefs-debugfs.c
@@ -512,9 +512,7 @@ static ssize_t orangefs_debug_write(struct file *file,
 			c_mask.mask2);
 
 		/* service_operation returns 0 on success... */
-		rc = service_operation(new_op,
-				       "orangefs_param",
-					ORANGEFS_OP_INTERRUPTIBLE);
+		rc = service_operation(new_op, ORANGEFS_OP_INTERRUPTIBLE);
 
 		if (rc)
 			gossip_debug(GOSSIP_DEBUGFS_DEBUG,
diff --git a/fs/orangefs/orangefs-kernel.h b/fs/orangefs/orangefs-kernel.h
index b2046d29a116..7e2ad1590d1e 100644
--- a/fs/orangefs/orangefs-kernel.h
+++ b/fs/orangefs/orangefs-kernel.h
@@ -439,9 +439,7 @@ extern const struct dentry_operations orangefs_dentry_operations;
 #define ORANGEFS_OP_NO_MUTEX      8   /* don't acquire request_mutex */
 #define ORANGEFS_OP_ASYNC         16  /* Queue it, but don't wait */
 
-int service_operation(struct orangefs_kernel_op_s *op,
-		      const char *op_name,
-		      int flags);
+int service_operation(struct orangefs_kernel_op_s *, int);
 
 #define get_interruptible_flag(inode) \
 	((ORANGEFS_SB(inode->i_sb)->flags & ORANGEFS_OPT_INTR) ? \
diff --git a/fs/orangefs/orangefs-sysfs.c b/fs/orangefs/orangefs-sysfs.c
index 71177ef3d8b6..f1de0bba657b 100644
--- a/fs/orangefs/orangefs-sysfs.c
+++ b/fs/orangefs/orangefs-sysfs.c
@@ -303,7 +303,6 @@ static ssize_t sysfs_service_op_show(struct kobject *kobj,
 {
 	struct orangefs_kernel_op_s *new_op = NULL;
 	int rc = 0;
-	char *ser_op_type = NULL;
 	__u32 op_alloc_type;
 
 	gossip_debug(GOSSIP_SYSFS_DEBUG,
@@ -461,17 +460,11 @@ static ssize_t sysfs_service_op_show(struct kobject *kobj,
 		goto out;
 	}
 
-
-	if (strcmp(kobj->name, PC_KOBJ_ID))
-		ser_op_type = "orangefs_param";
-	else
-		ser_op_type = "orangefs_perf_count";
-
 	/*
 	 * The service_operation will return an errno return code on
 	 * error, and zero on success.
 	 */
-	rc = service_operation(new_op, ser_op_type, ORANGEFS_OP_INTERRUPTIBLE);
+	rc = service_operation(new_op, ORANGEFS_OP_INTERRUPTIBLE);
 
 out:
 	if (!rc) {
@@ -792,7 +785,7 @@ static ssize_t sysfs_service_op_store(struct kobject *kobj,
 	 * The service_operation will return a errno return code on
 	 * error, and zero on success.
 	 */
-	rc = service_operation(new_op, "orangefs_param", ORANGEFS_OP_INTERRUPTIBLE);
+	rc = service_operation(new_op, ORANGEFS_OP_INTERRUPTIBLE);
 
 	if (rc < 0) {
 		gossip_err("sysfs_service_op_store: service op returned:%d:\n",
diff --git a/fs/orangefs/orangefs-trace.c b/fs/orangefs/orangefs-trace.c
new file mode 100644
index 000000000000..f4e0a1d04577
--- /dev/null
+++ b/fs/orangefs/orangefs-trace.c
@@ -0,0 +1,3 @@
+#include "orangefs-kernel.h"
+#define CREATE_TRACE_POINTS
+#include "orangefs-trace.h"
diff --git a/fs/orangefs/orangefs-trace.h b/fs/orangefs/orangefs-trace.h
new file mode 100644
index 000000000000..73feffc43d93
--- /dev/null
+++ b/fs/orangefs/orangefs-trace.h
@@ -0,0 +1,41 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright 2018 Omnibond Systems, L.L.C.
+ */
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM orangefs
+
+#if !defined(_TRACE_ORANGEFS_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_ORANGEFS_H
+
+#include <linux/tracepoint.h>
+
+#define OP_NAME_LEN 64
+
+TRACE_EVENT(orangefs_service_operation,
+    TP_PROTO(struct orangefs_kernel_op_s *op, int flags),
+    TP_ARGS(op, flags),
+    TP_STRUCT__entry(
+        __array(char, op_name, OP_NAME_LEN)
+        __field(int, flags)
+        __field(int, attempts)
+    ),
+    TP_fast_assign(
+        strlcpy(__entry->op_name, get_opname_string(op), OP_NAME_LEN);
+        __entry->flags = flags;
+        __entry->attempts = op->attempts;
+    ),
+    TP_printk(
+        "op_name=%s flags=%d attempts=%d", __entry->op_name, __entry->flags,
+        __entry->attempts
+    )
+);
+
+#endif
+
+#undef TRACE_INCLUDE_PATH
+#define TRACE_INCLUDE_PATH .
+#define TRACE_INCLUDE_FILE orangefs-trace
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/fs/orangefs/orangefs-utils.c b/fs/orangefs/orangefs-utils.c
index 0a08c7bd25ca..f8cbbdd7dd7f 100644
--- a/fs/orangefs/orangefs-utils.c
+++ b/fs/orangefs/orangefs-utils.c
@@ -304,8 +304,7 @@ int orangefs_inode_getattr(struct inode *inode, int new, int bypass,
 		new_op->upcall.req.getattr.mask =
 		    ORANGEFS_ATTR_SYS_ALL_NOHINT & ~ORANGEFS_ATTR_SYS_SIZE;
 
-	ret = service_operation(new_op, __func__,
-	    get_interruptible_flag(inode));
+	ret = service_operation(new_op, get_interruptible_flag(inode));
 	if (ret != 0)
 		goto out;
 
@@ -419,8 +418,7 @@ int orangefs_inode_check_changed(struct inode *inode)
 	new_op->upcall.req.getattr.mask = ORANGEFS_ATTR_SYS_TYPE |
 	    ORANGEFS_ATTR_SYS_LNK_TARGET;
 
-	ret = service_operation(new_op, __func__,
-	    get_interruptible_flag(inode));
+	ret = service_operation(new_op, get_interruptible_flag(inode));
 	if (ret != 0)
 		goto out;
 
@@ -451,8 +449,7 @@ int orangefs_inode_setattr(struct inode *inode, struct iattr *iattr)
 		       &new_op->upcall.req.setattr.attributes,
 		       iattr);
 	if (ret >= 0) {
-		ret = service_operation(new_op, __func__,
-				get_interruptible_flag(inode));
+		ret = service_operation(new_op, get_interruptible_flag(inode));
 
 		gossip_debug(GOSSIP_UTILS_DEBUG,
 			     "orangefs_inode_setattr: returning %d\n",
diff --git a/fs/orangefs/super.c b/fs/orangefs/super.c
index b7718530bde4..f4ff3aec9989 100644
--- a/fs/orangefs/super.c
+++ b/fs/orangefs/super.c
@@ -176,7 +176,7 @@ static int orangefs_statfs(struct dentry *dentry, struct kstatfs *buf)
 	if (ORANGEFS_SB(sb)->flags & ORANGEFS_OPT_INTR)
 		flags = ORANGEFS_OP_INTERRUPTIBLE;
 
-	ret = service_operation(new_op, "orangefs_statfs", flags);
+	ret = service_operation(new_op, flags);
 
 	if (new_op->downcall.status < 0)
 		goto out_op_release;
@@ -258,7 +258,7 @@ int orangefs_remount(struct orangefs_sb_info_s *orangefs_sb)
 	 * request_mutex to prevent other operations from bypassing
 	 * this one
 	 */
-	ret = service_operation(new_op, "orangefs_remount",
+	ret = service_operation(new_op,
 		ORANGEFS_OP_PRIORITY | ORANGEFS_OP_NO_MUTEX);
 	gossip_debug(GOSSIP_SUPER_DEBUG,
 		     "orangefs_remount: mount got return value of %d\n",
@@ -280,7 +280,7 @@ int orangefs_remount(struct orangefs_sb_info_s *orangefs_sb)
 		if (!new_op)
 			return -ENOMEM;
 		new_op->upcall.req.features.features = 0;
-		ret = service_operation(new_op, "orangefs_features",
+		ret = service_operation(new_op,
 		    ORANGEFS_OP_PRIORITY | ORANGEFS_OP_NO_MUTEX);
 		if (!ret)
 			orangefs_features =
@@ -392,7 +392,7 @@ static int orangefs_unmount(int id, __s32 fs_id, const char *devname)
 	op->upcall.req.fs_umount.fs_id = fs_id;
 	strncpy(op->upcall.req.fs_umount.orangefs_config_server,
 	    devname, ORANGEFS_MAX_SERVER_ADDR_LEN - 1);
-	r = service_operation(op, "orangefs_fs_umount", 0);
+	r = service_operation(op, 0);
 	/* Not much to do about an error here. */
 	if (r)
 		gossip_err("orangefs_unmount: service_operation %d\n", r);
@@ -492,7 +492,7 @@ struct dentry *orangefs_mount(struct file_system_type *fst,
 		     "Attempting ORANGEFS Mount via host %s\n",
 		     new_op->upcall.req.fs_mount.orangefs_config_server);
 
-	ret = service_operation(new_op, "orangefs_mount", 0);
+	ret = service_operation(new_op, 0);
 	gossip_debug(GOSSIP_SUPER_DEBUG,
 		     "orangefs_mount: mount got return value of %d\n", ret);
 	if (ret)
@@ -553,7 +553,7 @@ struct dentry *orangefs_mount(struct file_system_type *fst,
 		if (!new_op)
 			return ERR_PTR(-ENOMEM);
 		new_op->upcall.req.features.features = 0;
-		ret = service_operation(new_op, "orangefs_features", 0);
+		ret = service_operation(new_op, 0);
 		orangefs_features = new_op->downcall.resp.features.features;
 		op_release(new_op);
 	} else {
diff --git a/fs/orangefs/waitqueue.c b/fs/orangefs/waitqueue.c
index 1992a2647b8a..c345a1d7fde2 100644
--- a/fs/orangefs/waitqueue.c
+++ b/fs/orangefs/waitqueue.c
@@ -15,6 +15,7 @@
 
 #include "orangefs-kernel.h"
 #include "orangefs-bufmap.h"
+#include "orangefs-trace.h"
 
 static int wait_for_matching_downcall(struct orangefs_kernel_op_s *, long, bool);
 static void orangefs_clean_up_interrupted_operation(struct orangefs_kernel_op_s *);
@@ -57,9 +58,7 @@ void purge_waiting_ops(void)
  *
  * Returns contents of op->downcall.status for convenience
  */
-int service_operation(struct orangefs_kernel_op_s *op,
-		      const char *op_name,
-		      int flags)
+int service_operation(struct orangefs_kernel_op_s *op, int flags)
 {
 	long timeout = MAX_SCHEDULE_TIMEOUT;
 	int ret = 0;
@@ -74,10 +73,11 @@ int service_operation(struct orangefs_kernel_op_s *op,
 	gossip_debug(GOSSIP_WAIT_DEBUG,
 		     "%s: %s op:%p: process:%s: pid:%d:\n",
 		     __func__,
-		     op_name,
+		     get_opname_string(op),
 		     op,
 		     current->comm,
 		     current->pid);
+	trace_orangefs_service_operation(op, flags);
 
 	/*
 	 * If ORANGEFS_OP_NO_MUTEX was set in flags, we need to avoid
@@ -159,8 +159,7 @@ int service_operation(struct orangefs_kernel_op_s *op,
 	/* failed to get matching downcall */
 	if (ret == -ETIMEDOUT) {
 		gossip_err("%s: %s -- wait timed out; aborting attempt.\n",
-			   __func__,
-			   op_name);
+			   __func__, get_opname_string(op));
 	}
 
 	/*
@@ -178,7 +177,7 @@ int service_operation(struct orangefs_kernel_op_s *op,
 			     "orangefs: tag %llu (%s)"
 			     " -- operation to be retried (%d attempt)\n",
 			     llu(op->tag),
-			     op_name,
+			     get_opname_string(op),
 			     op->attempts);
 
 		/*
@@ -194,7 +193,7 @@ int service_operation(struct orangefs_kernel_op_s *op,
 	gossip_debug(GOSSIP_WAIT_DEBUG,
 		     "%s: %s returning: %d for %p.\n",
 		     __func__,
-		     op_name,
+		     get_opname_string(op),
 		     ret,
 		     op);
 	return ret;
diff --git a/fs/orangefs/xattr.c b/fs/orangefs/xattr.c
index 59bd382eb6dc..3cef454e7afa 100644
--- a/fs/orangefs/xattr.c
+++ b/fs/orangefs/xattr.c
@@ -158,8 +158,7 @@ ssize_t orangefs_inode_getxattr(struct inode *inode, const char *name,
 	 */
 	new_op->upcall.req.getxattr.key_sz = strlen(name) + 1;
 
-	ret = service_operation(new_op, "orangefs_inode_getxattr",
-				get_interruptible_flag(inode));
+	ret = service_operation(new_op, get_interruptible_flag(inode));
 	if (ret != 0) {
 		if (ret == -ENOENT) {
 			ret = -ENODATA;
@@ -265,9 +264,7 @@ static int orangefs_inode_removexattr(struct inode *inode, const char *name,
 		     (char *)new_op->upcall.req.removexattr.key,
 		     (int)new_op->upcall.req.removexattr.key_sz);
 
-	ret = service_operation(new_op,
-				"orangefs_inode_removexattr",
-				get_interruptible_flag(inode));
+	ret = service_operation(new_op, get_interruptible_flag(inode));
 	if (ret == -ENOENT) {
 		/*
 		 * Request to replace a non-existent attribute is an error.
@@ -363,9 +360,7 @@ int orangefs_inode_setxattr(struct inode *inode, const char *name,
 		     (int)new_op->upcall.req.setxattr.keyval.key_sz,
 		     size);
 
-	ret = service_operation(new_op,
-				"orangefs_inode_setxattr",
-				get_interruptible_flag(inode));
+	ret = service_operation(new_op, get_interruptible_flag(inode));
 
 	gossip_debug(GOSSIP_XATTR_DEBUG,
 		     "orangefs_inode_setxattr: returning %d\n",
@@ -427,8 +422,7 @@ ssize_t orangefs_listxattr(struct dentry *dentry, char *buffer, size_t size)
 	new_op->upcall.req.listxattr.token = token;
 	new_op->upcall.req.listxattr.requested_count =
 	    (size == 0) ? 0 : ORANGEFS_MAX_XATTR_LISTLEN;
-	ret = service_operation(new_op, __func__,
-				get_interruptible_flag(inode));
+	ret = service_operation(new_op, get_interruptible_flag(inode));
 	if (ret != 0)
 		goto done;
 
-- 
2.16.2


* [PATCH 09/24] orangefs: tracepoints for orangefs_devreq_{read,write_iter,poll}
  2018-03-20 17:02 [PATCH 00/24] orangefs: page cache Martin Brandenburg
                   ` (7 preceding siblings ...)
  2018-03-20 17:02 ` [PATCH 08/24] orangefs: add tracepoint for service_operation Martin Brandenburg
@ 2018-03-20 17:02 ` Martin Brandenburg
  2018-03-20 17:02 ` [PATCH 10/24] orangefs: do not invalidate attributes on inode create Martin Brandenburg
                   ` (14 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Martin Brandenburg @ 2018-03-20 17:02 UTC (permalink / raw)
  To: hubcap, linux-fsdevel; +Cc: Martin Brandenburg

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
---
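Reading aid (not part of the commit): in the diff below, the four
orangefs_devreq_read call sites pass (success, empty, op) as
(0, 1, NULL) when the request list is empty before locking,
(0, 0, NULL) when no op is found under the lock, (1, 0, cur_op) on
success, and (0, 0, cur_op) on the error path.  The userspace sketch
below mirrors how those arguments end up in the trace line.  The
demo_* names are hypothetical, and snprintf stands in for the kernel's
strlcpy.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DEMO_OP_NAME_LEN 64

/* Hypothetical mirror of the fields recorded by orangefs_devreq_read. */
struct demo_read_entry {
	int success;
	int empty;
	char op_name[DEMO_OP_NAME_LEN];
};

/* Mirror of the TP_fast_assign logic: a NULL op yields an empty name. */
static void demo_fill(struct demo_read_entry *e, int success, int empty,
		      const char *op_name)
{
	e->success = success;
	e->empty = empty;
	if (op_name)
		snprintf(e->op_name, sizeof(e->op_name), "%s", op_name);
	else
		e->op_name[0] = 0;
}

/* Mirror of the TP_printk format string. */
static int demo_format(const struct demo_read_entry *e, char *buf, size_t len)
{
	return snprintf(buf, len, "success=%d empty=%d op_name=%s",
			e->success, e->empty, e->op_name);
}

/* Small self-check of the sketch. */
static void demo_selftest(void)
{
	struct demo_read_entry e;
	char buf[128];

	/* Empty request list before locking. */
	demo_fill(&e, 0, 1, NULL);
	demo_format(&e, buf, sizeof(buf));
	assert(strcmp(buf, "success=0 empty=1 op_name=") == 0);

	/* Successful read of an op (the name is invented). */
	demo_fill(&e, 1, 0, "DEMO_OP_LOOKUP");
	demo_format(&e, buf, sizeof(buf));
	assert(strcmp(buf, "success=1 empty=0 op_name=DEMO_OP_LOOKUP") == 0);
}
```

Note that success=0 empty=0 covers two cases, which the trace line
distinguishes only by whether op_name is empty.
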
 fs/orangefs/devorangefs-req.c | 14 ++++++++++--
 fs/orangefs/orangefs-trace.h  | 50 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 62 insertions(+), 2 deletions(-)

diff --git a/fs/orangefs/devorangefs-req.c b/fs/orangefs/devorangefs-req.c
index f4a1eff35e59..e33bfeac92e8 100644
--- a/fs/orangefs/devorangefs-req.c
+++ b/fs/orangefs/devorangefs-req.c
@@ -11,6 +11,7 @@
 #include "orangefs-kernel.h"
 #include "orangefs-bufmap.h"
 #include "orangefs-debugfs.h"
+#include "orangefs-trace.h"
 
 #include <linux/debugfs.h>
 #include <linux/slab.h>
@@ -180,8 +181,10 @@ static ssize_t orangefs_devreq_read(struct file *file,
 	}
 
 	/* Check for an empty list before locking. */
-	if (list_empty(&orangefs_request_list))
+	if (list_empty(&orangefs_request_list)) {
+		trace_orangefs_devreq_read(0, 1, NULL);
 		return -EAGAIN;
+	}
 
 restart:
 	cur_op = NULL;
@@ -250,6 +253,7 @@ static ssize_t orangefs_devreq_read(struct file *file,
 	 */
 	if (!cur_op) {
 		spin_unlock(&orangefs_request_list_lock);
+		trace_orangefs_devreq_read(0, 0, NULL);
 		return -EAGAIN;
 	}
 
@@ -314,6 +318,7 @@ static ssize_t orangefs_devreq_read(struct file *file,
 	spin_unlock(&cur_op->lock);
 	spin_unlock(&orangefs_htable_ops_in_progress_lock);
 
+	trace_orangefs_devreq_read(1, 0, cur_op);
 	/* The client only asks to read one size buffer. */
 	return MAX_DEV_REQ_UPSIZE;
 error:
@@ -340,6 +345,7 @@ static ssize_t orangefs_devreq_read(struct file *file,
 		complete(&cur_op->waitq);
 	}
 	spin_unlock(&orangefs_request_list_lock);
+	trace_orangefs_devreq_read(0, 0, cur_op);
 	return -EFAULT;
 }
 
@@ -474,6 +480,7 @@ static ssize_t orangefs_devreq_write_iter(struct kiocb *iocb,
 	}
 
 wakeup:
+	trace_orangefs_devreq_write_iter(op);
 	/*
 	 * Return to vfs waitqueue, and back to service_operation
 	 * through wait_for_matching_downcall. 
@@ -781,11 +788,14 @@ static __poll_t orangefs_devreq_poll(struct file *file,
 				      struct poll_table_struct *poll_table)
 {
 	__poll_t poll_revent_mask = 0;
+	int empty;
 
 	poll_wait(file, &orangefs_request_list_waitq, poll_table);
 
-	if (!list_empty(&orangefs_request_list))
+	empty = list_empty(&orangefs_request_list);
+	if (!empty)
 		poll_revent_mask |= EPOLLIN;
+	trace_orangefs_devreq_poll(empty);
 	return poll_revent_mask;
 }
 
diff --git a/fs/orangefs/orangefs-trace.h b/fs/orangefs/orangefs-trace.h
index 73feffc43d93..16e2b5a86071 100644
--- a/fs/orangefs/orangefs-trace.h
+++ b/fs/orangefs/orangefs-trace.h
@@ -13,6 +13,56 @@
 
 #define OP_NAME_LEN 64
 
+TRACE_EVENT(orangefs_devreq_poll,
+    TP_PROTO(int empty),
+    TP_ARGS(empty),
+    TP_STRUCT__entry(
+        __field(int, empty)
+    ),
+    TP_fast_assign(
+        __entry->empty = empty;
+    ),
+    TP_printk(
+        "empty=%d", __entry->empty
+    )
+);
+
+TRACE_EVENT(orangefs_devreq_read,
+    TP_PROTO(int success, int empty, struct orangefs_kernel_op_s *op),
+    TP_ARGS(success, empty, op),
+    TP_STRUCT__entry(
+        __field(int, success)
+        __field(int, empty)
+        __array(char, op_name, OP_NAME_LEN)
+    ),
+    TP_fast_assign(
+        __entry->success = success;
+        __entry->empty = empty;
+        if (op)
+            strlcpy(__entry->op_name, get_opname_string(op), OP_NAME_LEN);
+        else
+            __entry->op_name[0] = 0;
+    ),
+    TP_printk(
+        "success=%d empty=%d op_name=%s", __entry->success, __entry->empty,
+        __entry->op_name
+    )
+);
+
+TRACE_EVENT(orangefs_devreq_write_iter,
+    TP_PROTO(struct orangefs_kernel_op_s *op),
+    TP_ARGS(op),
+    TP_STRUCT__entry(
+        __array(char, op_name, OP_NAME_LEN)
+    ),
+    TP_fast_assign(
+        strlcpy(__entry->op_name, get_opname_string(op), OP_NAME_LEN);
+    ),
+    TP_printk(
+        "op_name=%s", __entry->op_name
+    )
+);
+
 TRACE_EVENT(orangefs_service_operation,
     TP_PROTO(struct orangefs_kernel_op_s *op, int flags),
     TP_ARGS(op, flags),
-- 
2.16.2


* [PATCH 10/24] orangefs: do not invalidate attributes on inode create
  2018-03-20 17:02 [PATCH 00/24] orangefs: page cache Martin Brandenburg
                   ` (8 preceding siblings ...)
  2018-03-20 17:02 ` [PATCH 09/24] orangefs: tracepoints for orangefs_devreq_{read,write_iter,poll} Martin Brandenburg
@ 2018-03-20 17:02 ` Martin Brandenburg
  2018-03-20 17:02 ` [PATCH 11/24] orangefs: simply orangefs_inode_getattr interface Martin Brandenburg
                   ` (13 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Martin Brandenburg @ 2018-03-20 17:02 UTC (permalink / raw)
  To: hubcap, linux-fsdevel; +Cc: Martin Brandenburg

When an inode is created, we fetch attributes from the server.  There is
no need to turn around and invalidate them.

There is also no need to initialize attributes by hand after the
getattr.  The hand-set values are either identical to what the server
returned or different and therefore wrong.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
---
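Reading aid (not part of the commit): the removed lines set
getattr_time to jiffies - 1, which makes the cached attributes look
expired immediately.  The userspace sketch below shows why, using the
same wraparound-safe comparison as the kernel's time_before().  The
demo_* names, the tick rate, and the timeout value are assumptions
chosen for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define DEMO_HZ 250                      /* assumed tick rate */
#define DEMO_GETATTR_TIMEOUT_MSECS 2000  /* assumed cache timeout */

/* Same wraparound-safe comparison as the kernel's time_before(). */
static bool demo_time_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/* Expiry as set at the end of a successful getattr. */
static unsigned long demo_expiry_after_getattr(unsigned long now)
{
	return now + DEMO_GETATTR_TIMEOUT_MSECS * DEMO_HZ / 1000;
}

/* Expiry as set by the removed lines: getattr_time = jiffies - 1. */
static unsigned long demo_expiry_invalidated(unsigned long now)
{
	return now - 1;
}

/* Cached attributes are usable while "now" is before the expiry. */
static bool demo_cache_valid(unsigned long now, unsigned long expiry)
{
	return demo_time_before(now, expiry);
}

/* Small self-check of the sketch. */
static void demo_selftest(void)
{
	unsigned long now = 1000;

	/* Freshly fetched attributes stay valid. */
	assert(demo_cache_valid(now, demo_expiry_after_getattr(now)));
	/* The removed assignment throws them away at once. */
	assert(!demo_cache_valid(now, demo_expiry_invalidated(now)));
	/* The comparison still works when the counter wraps. */
	assert(demo_cache_valid(ULONG_MAX,
				demo_expiry_after_getattr(ULONG_MAX)));
}
```

With the invalidation removed, the first lookup after create can reuse
the attributes that were just fetched from the server.
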
 fs/orangefs/inode.c | 6 ------
 fs/orangefs/namei.c | 6 ------
 2 files changed, 12 deletions(-)

diff --git a/fs/orangefs/inode.c b/fs/orangefs/inode.c
index bfd3add254c1..109bdd101e04 100644
--- a/fs/orangefs/inode.c
+++ b/fs/orangefs/inode.c
@@ -448,12 +448,6 @@ struct inode *orangefs_new_inode(struct super_block *sb, struct inode *dir,
 		goto out_iput;
 
 	orangefs_init_iops(inode);
-
-	inode->i_mode = mode;
-	inode->i_uid = current_fsuid();
-	inode->i_gid = current_fsgid();
-	inode->i_atime = inode->i_mtime = inode->i_ctime = current_time(inode);
-	inode->i_size = PAGE_SIZE;
 	inode->i_rdev = dev;
 
 	error = insert_inode_locked4(inode, hash, orangefs_test_inode, ref);
diff --git a/fs/orangefs/namei.c b/fs/orangefs/namei.c
index 7dfdf6ad15f6..8f1c6e6d3ee1 100644
--- a/fs/orangefs/namei.c
+++ b/fs/orangefs/namei.c
@@ -77,8 +77,6 @@ static int orangefs_create(struct inode *dir,
 	d_instantiate(dentry, inode);
 	unlock_new_inode(inode);
 	orangefs_set_timeout(dentry);
-	ORANGEFS_I(inode)->getattr_time = jiffies - 1;
-	ORANGEFS_I(inode)->getattr_mask = STATX_BASIC_STATS;
 
 	gossip_debug(GOSSIP_NAME_DEBUG,
 		     "%s: dentry instantiated for %pd\n",
@@ -333,8 +331,6 @@ static int orangefs_symlink(struct inode *dir,
 	d_instantiate(dentry, inode);
 	unlock_new_inode(inode);
 	orangefs_set_timeout(dentry);
-	ORANGEFS_I(inode)->getattr_time = jiffies - 1;
-	ORANGEFS_I(inode)->getattr_mask = STATX_BASIC_STATS;
 
 	gossip_debug(GOSSIP_NAME_DEBUG,
 		     "Inode (Symlink) %pU -> %pd\n",
@@ -403,8 +399,6 @@ static int orangefs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode
 	d_instantiate(dentry, inode);
 	unlock_new_inode(inode);
 	orangefs_set_timeout(dentry);
-	ORANGEFS_I(inode)->getattr_time = jiffies - 1;
-	ORANGEFS_I(inode)->getattr_mask = STATX_BASIC_STATS;
 
 	gossip_debug(GOSSIP_NAME_DEBUG,
 		     "Inode (Directory) %pU -> %pd\n",
-- 
2.16.2


* [PATCH 11/24] orangefs: simply orangefs_inode_getattr interface
  2018-03-20 17:02 [PATCH 00/24] orangefs: page cache Martin Brandenburg
                   ` (9 preceding siblings ...)
  2018-03-20 17:02 ` [PATCH 10/24] orangefs: do not invalidate attributes on inode create Martin Brandenburg
@ 2018-03-20 17:02 ` Martin Brandenburg
  2018-03-20 17:02 ` [PATCH 12/24] orangefs: update attributes rather than relying on server Martin Brandenburg
                   ` (12 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Martin Brandenburg @ 2018-03-20 17:02 UTC (permalink / raw)
  To: hubcap, linux-fsdevel; +Cc: Martin Brandenburg

No need to store the received mask.  It is either STATX_BASIC_STATS or
STATX_BASIC_STATS & ~STATX_SIZE.  If STATX_SIZE is requested, the cache
is bypassed anyway, so the cached mask is unnecessary to decide whether
to do a real getattr.

This is a behavior change.  Previously a getattr that wanted size could
be satisfied from the cached size.  None of the in-kernel callers that
wanted size wanted a cached size.  Now a getattr that wants size never
uses the cache.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
---
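Reading aid (not part of the commit): the new flags interface can be
modelled in userspace C as follows.  This is a sketch of the decision
logic only.  The demo_* names are hypothetical; the two flag values
mirror ORANGEFS_GETATTR_NEW and ORANGEFS_GETATTR_SIZE from the diff,
and DEMO_STATX_SIZE mirrors STATX_SIZE from the statx uapi header.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_GETATTR_NEW  1
#define DEMO_GETATTR_SIZE 2
#define DEMO_STATX_MODE   0x002u  /* mirrors STATX_MODE */
#define DEMO_STATX_SIZE   0x200u  /* mirrors STATX_SIZE */

/* Same wraparound-safe comparison as the kernel's time_before(). */
static bool demo_time_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/* How orangefs_getattr() maps a statx request mask to flags. */
static int demo_flags_from_mask(unsigned int request_mask)
{
	return (request_mask & DEMO_STATX_SIZE) ? DEMO_GETATTR_SIZE : 0;
}

/* Any flag bypasses the cache; otherwise the cache timeout decides. */
static bool demo_needs_upcall(int flags, unsigned long now,
			      unsigned long expiry)
{
	return flags || !demo_time_before(now, expiry);
}

/* Size is requested from the server whenever any flag is set. */
static bool demo_fetches_size(int flags)
{
	return flags != 0;
}

/* The staleness check is skipped only for a new inode. */
static bool demo_checks_stale(int flags)
{
	return !(flags & DEMO_GETATTR_NEW);
}

/* Small self-check of the sketch. */
static void demo_selftest(void)
{
	/* No flags, cache still fresh: no upcall. */
	assert(!demo_needs_upcall(0, 100, 200));
	/* No flags, cache expired: upcall without size. */
	assert(demo_needs_upcall(0, 300, 200));
	assert(!demo_fetches_size(0));
	/* Size wanted: always an upcall, even with a fresh cache. */
	assert(demo_needs_upcall(DEMO_GETATTR_SIZE, 100, 200));
	assert(demo_fetches_size(DEMO_GETATTR_SIZE));
	/* New inode: upcall with size, no staleness check. */
	assert(demo_fetches_size(DEMO_GETATTR_NEW));
	assert(!demo_checks_stale(DEMO_GETATTR_NEW));
}
```

Because a size request always bypasses the cache, the cached copy never
needs to record whether it contains a valid size, which is why
getattr_mask can be dropped.
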
 fs/orangefs/file.c            | 12 ++++++------
 fs/orangefs/inode.c           | 11 ++++++-----
 fs/orangefs/orangefs-kernel.h |  7 ++++---
 fs/orangefs/orangefs-utils.c  | 31 ++++++++++---------------------
 4 files changed, 26 insertions(+), 35 deletions(-)

diff --git a/fs/orangefs/file.c b/fs/orangefs/file.c
index 5a66d521d9df..e1b9880e2b5f 100644
--- a/fs/orangefs/file.c
+++ b/fs/orangefs/file.c
@@ -418,8 +418,8 @@ static ssize_t orangefs_file_write_iter(struct kiocb *iocb, struct iov_iter *ite
 
 	/* Make sure generic_write_checks sees an up to date inode size. */
 	if (file->f_flags & O_APPEND) {
-		rc = orangefs_inode_getattr(file->f_mapping->host, 0, 1,
-		    STATX_SIZE);
+		rc = orangefs_inode_getattr(file->f_mapping->host,
+		    ORANGEFS_GETATTR_SIZE);
 		if (rc == -ESTALE)
 			rc = -EIO;
 		if (rc) {
@@ -526,8 +526,8 @@ static int orangefs_fault(struct vm_fault *vmf)
 {
 	struct file *file = vmf->vma->vm_file;
 	int rc;
-	rc = orangefs_inode_getattr(file->f_mapping->host, 0, 1,
-	    STATX_SIZE);
+	rc = orangefs_inode_getattr(file->f_mapping->host,
+	    ORANGEFS_GETATTR_SIZE);
 	if (rc == -ESTALE)
 		rc = -EIO;
 	if (rc) {
@@ -652,8 +652,8 @@ static loff_t orangefs_file_llseek(struct file *file, loff_t offset, int origin)
 		 * NOTE: We are only interested in file size here,
 		 * so we set mask accordingly.
 		 */
-		ret = orangefs_inode_getattr(file->f_mapping->host, 0, 1,
-		    STATX_SIZE);
+		ret = orangefs_inode_getattr(file->f_mapping->host,
+		    ORANGEFS_GETATTR_SIZE);
 		if (ret == -ESTALE)
 			ret = -EIO;
 		if (ret) {
diff --git a/fs/orangefs/inode.c b/fs/orangefs/inode.c
index 109bdd101e04..6222f029f93a 100644
--- a/fs/orangefs/inode.c
+++ b/fs/orangefs/inode.c
@@ -161,7 +161,7 @@ static int orangefs_setattr_size(struct inode *inode, struct iattr *iattr)
 		     iattr->ia_size);
 
 	/* Ensure that we have a up to date size, so we know if it changed. */
-	ret = orangefs_inode_getattr(inode, 0, 1, STATX_SIZE);
+	ret = orangefs_inode_getattr(inode, ORANGEFS_GETATTR_SIZE);
 	if (ret == -ESTALE)
 		ret = -EIO;
 	if (ret) {
@@ -255,7 +255,8 @@ int orangefs_getattr(const struct path *path, struct kstat *stat,
 		     "orangefs_getattr: called on %pd\n",
 		     path->dentry);
 
-	ret = orangefs_inode_getattr(inode, 0, 0, request_mask);
+	ret = orangefs_inode_getattr(inode,
+	    request_mask & STATX_SIZE ? ORANGEFS_GETATTR_SIZE : 0);
 	if (ret == 0) {
 		generic_fillattr(inode, stat);
 
@@ -282,7 +283,7 @@ int orangefs_permission(struct inode *inode, int mask)
 	gossip_debug(GOSSIP_INODE_DEBUG, "%s: refreshing\n", __func__);
 
 	/* Make sure the permission (and other common attrs) are up to date. */
-	ret = orangefs_inode_getattr(inode, 0, 0, STATX_MODE);
+	ret = orangefs_inode_getattr(inode, 0);
 	if (ret < 0)
 		return ret;
 
@@ -398,7 +399,7 @@ struct inode *orangefs_iget(struct super_block *sb, struct orangefs_object_kref
 	if (!inode || !(inode->i_state & I_NEW))
 		return inode;
 
-	error = orangefs_inode_getattr(inode, 1, 1, STATX_ALL);
+	error = orangefs_inode_getattr(inode, ORANGEFS_GETATTR_NEW);
 	if (error) {
 		iget_failed(inode);
 		return ERR_PTR(error);
@@ -443,7 +444,7 @@ struct inode *orangefs_new_inode(struct super_block *sb, struct inode *dir,
 	orangefs_set_inode(inode, ref);
 	inode->i_ino = hash;	/* needed for stat etc */
 
-	error = orangefs_inode_getattr(inode, 1, 1, STATX_ALL);
+	error = orangefs_inode_getattr(inode, ORANGEFS_GETATTR_NEW);
 	if (error)
 		goto out_iput;
 
diff --git a/fs/orangefs/orangefs-kernel.h b/fs/orangefs/orangefs-kernel.h
index 7e2ad1590d1e..03a2a042132f 100644
--- a/fs/orangefs/orangefs-kernel.h
+++ b/fs/orangefs/orangefs-kernel.h
@@ -194,7 +194,6 @@ struct orangefs_inode_s {
 	sector_t last_failed_block_index_read;
 
 	unsigned long getattr_time;
-	u32 getattr_mask;
 
 	DECLARE_HASHTABLE(xattr_cache, 4);
 };
@@ -398,8 +397,10 @@ int orangefs_inode_setxattr(struct inode *inode,
 			 size_t size,
 			 int flags);
 
-int orangefs_inode_getattr(struct inode *inode, int new, int bypass,
-    u32 request_mask);
+#define ORANGEFS_GETATTR_NEW 1
+#define ORANGEFS_GETATTR_SIZE 2
+
+int orangefs_inode_getattr(struct inode *, int);
 
 int orangefs_inode_check_changed(struct inode *inode);
 
diff --git a/fs/orangefs/orangefs-utils.c b/fs/orangefs/orangefs-utils.c
index f8cbbdd7dd7f..ab1be285f89d 100644
--- a/fs/orangefs/orangefs-utils.c
+++ b/fs/orangefs/orangefs-utils.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 /*
  * (C) 2001 Clemson University and The University of Chicago
+ * Copyright 2018 Omnibond Systems, L.L.C.
  *
  * See COPYING in top-level directory.
  */
@@ -268,8 +269,7 @@ static int orangefs_inode_is_stale(struct inode *inode,
 	return 0;
 }
 
-int orangefs_inode_getattr(struct inode *inode, int new, int bypass,
-    u32 request_mask)
+int orangefs_inode_getattr(struct inode *inode, int flags)
 {
 	struct orangefs_inode_s *orangefs_inode = ORANGEFS_I(inode);
 	struct orangefs_kernel_op_s *new_op;
@@ -279,16 +279,9 @@ int orangefs_inode_getattr(struct inode *inode, int new, int bypass,
 	gossip_debug(GOSSIP_UTILS_DEBUG, "%s: called on inode %pU\n", __func__,
 	    get_khandle_from_ino(inode));
 
-	if (!new && !bypass) {
-		/*
-		 * Must have all the attributes in the mask and be within cache
-		 * time.
-		 */
-		if ((request_mask & orangefs_inode->getattr_mask) ==
-		    request_mask &&
-		    time_before(jiffies, orangefs_inode->getattr_time))
-			return 0;
-	}
+	/* Skip the upcall if no flags are set and we are within cache time. */
+	if (!flags && time_before(jiffies, orangefs_inode->getattr_time))
+		return 0;
 
 	new_op = op_alloc(ORANGEFS_VFS_OP_GETATTR);
 	if (!new_op)
@@ -298,7 +291,7 @@ int orangefs_inode_getattr(struct inode *inode, int new, int bypass,
 	 * Size is the hardest attribute to get.  The incremental cost of any
 	 * other attribute is essentially zero.
 	 */
-	if (request_mask & STATX_SIZE || new)
+	if (flags)
 		new_op->upcall.req.getattr.mask = ORANGEFS_ATTR_SYS_ALL_NOHINT;
 	else
 		new_op->upcall.req.getattr.mask =
@@ -308,7 +301,7 @@ int orangefs_inode_getattr(struct inode *inode, int new, int bypass,
 	if (ret != 0)
 		goto out;
 
-	if (!new) {
+	if (!(flags & ORANGEFS_GETATTR_NEW)) {
 		ret = orangefs_inode_is_stale(inode,
 		    &new_op->downcall.resp.getattr.attributes,
 		    new_op->downcall.resp.getattr.link_target);
@@ -324,7 +317,7 @@ int orangefs_inode_getattr(struct inode *inode, int new, int bypass,
 	case S_IFREG:
 		inode->i_flags = orangefs_inode_flags(&new_op->
 		    downcall.resp.getattr.attributes);
-		if (request_mask & STATX_SIZE || new) {
+		if (flags) {
 			inode_size = (loff_t)new_op->
 			    downcall.resp.getattr.attributes.size;
 			rounded_up_size =
@@ -340,7 +333,7 @@ int orangefs_inode_getattr(struct inode *inode, int new, int bypass,
 		}
 		break;
 	case S_IFDIR:
-		if (request_mask & STATX_SIZE || new) {
+		if (flags) {
 			inode->i_size = PAGE_SIZE;
 			orangefs_inode->blksize = i_blocksize(inode);
 			spin_lock(&inode->i_lock);
@@ -350,7 +343,7 @@ int orangefs_inode_getattr(struct inode *inode, int new, int bypass,
 		set_nlink(inode, 1);
 		break;
 	case S_IFLNK:
-		if (new) {
+		if (flags & ORANGEFS_GETATTR_NEW) {
 			inode->i_size = (loff_t)strlen(new_op->
 			    downcall.resp.getattr.link_target);
 			orangefs_inode->blksize = i_blocksize(inode);
@@ -392,10 +385,6 @@ int orangefs_inode_getattr(struct inode *inode, int new, int bypass,
 
 	orangefs_inode->getattr_time = jiffies +
 	    orangefs_getattr_timeout_msecs*HZ/1000;
-	if (request_mask & STATX_SIZE || new)
-		orangefs_inode->getattr_mask = STATX_BASIC_STATS;
-	else
-		orangefs_inode->getattr_mask = STATX_BASIC_STATS & ~STATX_SIZE;
 	ret = 0;
 out:
 	op_release(new_op);
-- 
2.16.2

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 12/24] orangefs: update attributes rather than relying on server
  2018-03-20 17:02 [PATCH 00/24] orangefs: page cache Martin Brandenburg
                   ` (10 preceding siblings ...)
  2018-03-20 17:02 ` [PATCH 11/24] orangefs: simply orangefs_inode_getattr interface Martin Brandenburg
@ 2018-03-20 17:02 ` Martin Brandenburg
  2018-03-20 17:02 ` [PATCH 13/24] orangefs: hold i_lock during inode_getattr Martin Brandenburg
                   ` (11 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Martin Brandenburg @ 2018-03-20 17:02 UTC (permalink / raw)
  To: hubcap, linux-fsdevel; +Cc: Martin Brandenburg

This should be a no-op for now, but once inode writeback works, the
dirty inode must hold the correct attributes.

Previously the attribute cache timeout was invalidated so that the next
getattr fetched the updated attributes from the server.  When the inode
is dirty, the server cannot be consulted, since it does not yet know
about the pending setattr.
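
As a rough illustration of the file.c change, the size update after a
write can be sketched in plain userspace C.  The names cached_inode and
update_size_after_write are hypothetical stand-ins and not part of the
patch; the real code uses i_size_read() and i_size_write() on the inode.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the cached inode; only the size matters here. */
struct cached_inode {
	int64_t size;	/* cached file size in bytes */
};

/*
 * After a write that ended at end_offset, grow the cached size if the
 * write extended the file.  The size never shrinks here; truncation is
 * a separate operation.
 */
static void update_size_after_write(struct cached_inode *ci,
    int64_t end_offset)
{
	assert(ci != NULL && end_offset >= 0);
	if (end_offset > ci->size)
		ci->size = end_offset;
}
```

A write that ends inside the existing file leaves the cached size alone;
only a write past the current end changes it.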

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
---
 fs/orangefs/file.c  | 10 ++--------
 fs/orangefs/namei.c |  7 ++++++-
 2 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/fs/orangefs/file.c b/fs/orangefs/file.c
index e1b9880e2b5f..ad615d149683 100644
--- a/fs/orangefs/file.c
+++ b/fs/orangefs/file.c
@@ -321,14 +321,8 @@ static ssize_t do_readv_writev(enum ORANGEFS_io_type type, struct file *file,
 			file_accessed(file);
 		} else {
 			file_update_time(file);
-			/*
-			 * Must invalidate to ensure write loop doesn't
-			 * prevent kernel from reading updated
-			 * attribute.  Size probably changed because of
-			 * the write, and other clients could update
-			 * any other attribute.
-			 */
-			orangefs_inode->getattr_time = jiffies - 1;
+			if (*offset > i_size_read(inode))
+				i_size_write(inode, *offset);
 		}
 	}
 
diff --git a/fs/orangefs/namei.c b/fs/orangefs/namei.c
index 8f1c6e6d3ee1..b6c3d742f35b 100644
--- a/fs/orangefs/namei.c
+++ b/fs/orangefs/namei.c
@@ -425,6 +425,7 @@ static int orangefs_rename(struct inode *old_dir,
 			unsigned int flags)
 {
 	struct orangefs_kernel_op_s *new_op;
+	struct iattr iattr;
 	int ret;
 
 	if (flags)
@@ -434,7 +435,11 @@ static int orangefs_rename(struct inode *old_dir,
 		     "orangefs_rename: called (%pd2 => %pd2) ct=%d\n",
 		     old_dentry, new_dentry, d_count(new_dentry));
 
-	ORANGEFS_I(new_dentry->d_parent->d_inode)->getattr_time = jiffies - 1;
+	new_dir->i_mtime = new_dir->i_ctime = current_time(new_dir);
+	memset(&iattr, 0, sizeof iattr);
+	iattr.ia_valid |= ATTR_MTIME;
+	orangefs_inode_setattr(new_dir, &iattr);
+	mark_inode_dirty_sync(new_dir);
 
 	new_op = op_alloc(ORANGEFS_VFS_OP_RENAME);
 	if (!new_op)
-- 
2.16.2

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 13/24] orangefs: hold i_lock during inode_getattr
  2018-03-20 17:02 [PATCH 00/24] orangefs: page cache Martin Brandenburg
                   ` (11 preceding siblings ...)
  2018-03-20 17:02 ` [PATCH 12/24] orangefs: update attributes rather than relying on server Martin Brandenburg
@ 2018-03-20 17:02 ` Martin Brandenburg
  2018-03-20 17:02 ` [PATCH 14/24] orangefs: set up and use backing_dev_info Martin Brandenburg
                   ` (10 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Martin Brandenburg @ 2018-03-20 17:02 UTC (permalink / raw)
  To: hubcap, linux-fsdevel; +Cc: Martin Brandenburg

This should be a no-op now.  When inode writeback works, this will
prevent a getattr from overwriting inode data while an inode is
transitioning to dirty.
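
The pattern used here (check under the lock, drop the lock for the
server round trip, then re-check before copying the result in) can be
sketched in userspace C.  This is only an approximation under stated
assumptions: a pthread mutex stands in for i_lock, and cached_attr,
cache_usable, refresh_attr and fetch_stub are hypothetical names that
do not appear in the patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cached attribute protected by a lock. */
struct cached_attr {
	pthread_mutex_t lock;
	bool dirty;	/* local changes not yet written back */
	long expires;	/* cache is valid while now < expires */
	int mode;	/* the cached attribute itself */
};

/* Caller holds the lock.  True if the cached value must be kept. */
static bool cache_usable(const struct cached_attr *a, long now, bool force)
{
	return a->dirty || (!force && now < a->expires);
}

/* Hypothetical stand-in for the blocking server request. */
static int fetch_stub(void)
{
	return 0600;
}

/* Returns 1 if the cache was updated from the server, 0 otherwise. */
static int refresh_attr(struct cached_attr *a, long now, long timeout,
    bool force, int (*fetch)(void))
{
	int fresh, updated = 0;

	assert(a != NULL && fetch != NULL);

	pthread_mutex_lock(&a->lock);
	if (cache_usable(a, now, force)) {
		pthread_mutex_unlock(&a->lock);
		return 0;
	}
	pthread_mutex_unlock(&a->lock);

	/* The slow operation runs without the lock held. */
	fresh = fetch();

	pthread_mutex_lock(&a->lock);
	/* The state may have changed while we waited, so check again. */
	if (!cache_usable(a, now, force)) {
		a->mode = fresh;
		a->expires = now + timeout;
		updated = 1;
	}
	pthread_mutex_unlock(&a->lock);
	return updated;
}
```

The second check is what keeps a slow getattr from overwriting an
attribute that was changed locally while the request was in flight.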

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
---
 fs/orangefs/inode.c          |  4 ++--
 fs/orangefs/orangefs-utils.c | 33 +++++++++++++++++++++++----------
 2 files changed, 25 insertions(+), 12 deletions(-)

diff --git a/fs/orangefs/inode.c b/fs/orangefs/inode.c
index 6222f029f93a..d77787f7b2f3 100644
--- a/fs/orangefs/inode.c
+++ b/fs/orangefs/inode.c
@@ -252,8 +252,8 @@ int orangefs_getattr(const struct path *path, struct kstat *stat,
 	struct orangefs_inode_s *orangefs_inode = NULL;
 
 	gossip_debug(GOSSIP_INODE_DEBUG,
-		     "orangefs_getattr: called on %pd\n",
-		     path->dentry);
+		     "orangefs_getattr: called on %pd mask %u\n",
+		     path->dentry, request_mask);
 
 	ret = orangefs_inode_getattr(inode,
 	    request_mask & STATX_SIZE ? ORANGEFS_GETATTR_SIZE : 0);
diff --git a/fs/orangefs/orangefs-utils.c b/fs/orangefs/orangefs-utils.c
index ab1be285f89d..8b13f1d15999 100644
--- a/fs/orangefs/orangefs-utils.c
+++ b/fs/orangefs/orangefs-utils.c
@@ -276,12 +276,17 @@ int orangefs_inode_getattr(struct inode *inode, int flags)
 	loff_t inode_size, rounded_up_size;
 	int ret, type;
 
-	gossip_debug(GOSSIP_UTILS_DEBUG, "%s: called on inode %pU\n", __func__,
-	    get_khandle_from_ino(inode));
+	gossip_debug(GOSSIP_UTILS_DEBUG, "%s: called on inode %pU flags %d\n",
+	    __func__, get_khandle_from_ino(inode), flags);
 
+	spin_lock(&inode->i_lock);
 	/* Must have all the attributes in the mask and be within cache time. */
-	if (!flags && time_before(jiffies, orangefs_inode->getattr_time))
+	if ((!flags && time_before(jiffies, orangefs_inode->getattr_time)) ||
+	    inode->i_state & I_DIRTY) {
+		spin_unlock(&inode->i_lock);
 		return 0;
+	}
+	spin_unlock(&inode->i_lock);
 
 	new_op = op_alloc(ORANGEFS_VFS_OP_GETATTR);
 	if (!new_op)
@@ -301,13 +306,23 @@ int orangefs_inode_getattr(struct inode *inode, int flags)
 	if (ret != 0)
 		goto out;
 
+	spin_lock(&inode->i_lock);
+	/* Must have all the attributes in the mask and be within cache time. */
+	if ((!flags && time_before(jiffies, orangefs_inode->getattr_time)) ||
+	    inode->i_state & I_DIRTY) {
+		gossip_debug(GOSSIP_UTILS_DEBUG, "%s: in cache or dirty\n",
+		    __func__);
+		ret = 0;
+		goto out_unlock;
+	}
+
 	if (!(flags & ORANGEFS_GETATTR_NEW)) {
 		ret = orangefs_inode_is_stale(inode,
 		    &new_op->downcall.resp.getattr.attributes,
 		    new_op->downcall.resp.getattr.link_target);
 		if (ret) {
 			ret = -ESTALE;
-			goto out;
+			goto out_unlock;
 		}
 	}
 
@@ -325,20 +340,16 @@ int orangefs_inode_getattr(struct inode *inode, int flags)
 			inode->i_size = inode_size;
 			orangefs_inode->blksize =
 			    new_op->downcall.resp.getattr.attributes.blksize;
-			spin_lock(&inode->i_lock);
 			inode->i_bytes = inode_size;
 			inode->i_blocks =
 			    (unsigned long)(rounded_up_size / 512);
-			spin_unlock(&inode->i_lock);
 		}
 		break;
 	case S_IFDIR:
 		if (flags) {
 			inode->i_size = PAGE_SIZE;
 			orangefs_inode->blksize = i_blocksize(inode);
-			spin_lock(&inode->i_lock);
 			inode_set_bytes(inode, inode->i_size);
-			spin_unlock(&inode->i_lock);
 		}
 		set_nlink(inode, 1);
 		break;
@@ -352,7 +363,7 @@ int orangefs_inode_getattr(struct inode *inode, int flags)
 			    ORANGEFS_NAME_MAX);
 			if (ret == -E2BIG) {
 				ret = -EIO;
-				goto out;
+				goto out_unlock;
 			}
 			inode->i_link = orangefs_inode->link_target;
 		}
@@ -362,7 +373,7 @@ int orangefs_inode_getattr(struct inode *inode, int flags)
 		/* XXX: ESTALE?  This is what is done if it is not new. */
 		orangefs_make_bad_inode(inode);
 		ret = -ESTALE;
-		goto out;
+		goto out_unlock;
 	}
 
 	inode->i_uid = make_kuid(&init_user_ns, new_op->
@@ -386,6 +397,8 @@ int orangefs_inode_getattr(struct inode *inode, int flags)
 	orangefs_inode->getattr_time = jiffies +
 	    orangefs_getattr_timeout_msecs*HZ/1000;
 	ret = 0;
+out_unlock:
+	spin_unlock(&inode->i_lock);
 out:
 	op_release(new_op);
 	return ret;
-- 
2.16.2

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 14/24] orangefs: set up and use backing_dev_info
  2018-03-20 17:02 [PATCH 00/24] orangefs: page cache Martin Brandenburg
                   ` (12 preceding siblings ...)
  2018-03-20 17:02 ` [PATCH 13/24] orangefs: hold i_lock during inode_getattr Martin Brandenburg
@ 2018-03-20 17:02 ` Martin Brandenburg
  2018-03-20 17:02 ` [PATCH 15/24] orangefs: let setattr write to cached inode Martin Brandenburg
                   ` (9 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Martin Brandenburg @ 2018-03-20 17:02 UTC (permalink / raw)
  To: hubcap, linux-fsdevel; +Cc: Martin Brandenburg

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
---
 fs/orangefs/super.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/fs/orangefs/super.c b/fs/orangefs/super.c
index f4ff3aec9989..1251d201b3c9 100644
--- a/fs/orangefs/super.c
+++ b/fs/orangefs/super.c
@@ -404,15 +404,11 @@ static int orangefs_fill_sb(struct super_block *sb,
 		struct orangefs_fs_mount_response *fs_mount,
 		void *data, int silent)
 {
-	int ret = -EINVAL;
-	struct inode *root = NULL;
-	struct dentry *root_dentry = NULL;
+	int ret;
+	struct inode *root;
+	struct dentry *root_dentry;
 	struct orangefs_object_kref root_object;
 
-	/* alloc and init our private orangefs sb info */
-	sb->s_fs_info = kzalloc(sizeof(struct orangefs_sb_info_s), GFP_KERNEL);
-	if (!ORANGEFS_SB(sb))
-		return -ENOMEM;
 	ORANGEFS_SB(sb)->sb = sb;
 
 	ORANGEFS_SB(sb)->root_khandle = fs_mount->root_khandle;
@@ -435,6 +431,10 @@ static int orangefs_fill_sb(struct super_block *sb,
 	sb->s_blocksize_bits = orangefs_bufmap_shift_query();
 	sb->s_maxbytes = MAX_LFS_FILESIZE;
 
+	ret = super_setup_bdi(sb);
+	if (ret)
+		return ret;
+
 	root_object.khandle = ORANGEFS_SB(sb)->root_khandle;
 	root_object.fs_id = ORANGEFS_SB(sb)->fs_id;
 	gossip_debug(GOSSIP_SUPER_DEBUG,
@@ -513,6 +513,13 @@ struct dentry *orangefs_mount(struct file_system_type *fst,
 		goto free_op;
 	}
 
+	/* alloc and init our private orangefs sb info */
+	sb->s_fs_info = kzalloc(sizeof(struct orangefs_sb_info_s), GFP_KERNEL);
+	if (!ORANGEFS_SB(sb)) {
+		d = ERR_PTR(-ENOMEM);
+		goto free_op;
+	}
+
 	ret = orangefs_fill_sb(sb,
 	      &new_op->downcall.resp.fs_mount, data,
 	      flags & SB_SILENT ? 1 : 0);
-- 
2.16.2

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 15/24] orangefs: let setattr write to cached inode
  2018-03-20 17:02 [PATCH 00/24] orangefs: page cache Martin Brandenburg
                   ` (13 preceding siblings ...)
  2018-03-20 17:02 ` [PATCH 14/24] orangefs: set up and use backing_dev_info Martin Brandenburg
@ 2018-03-20 17:02 ` Martin Brandenburg
  2018-03-20 17:02 ` [PATCH 16/24] orangefs: reorganize setattr functions to track attribute changes Martin Brandenburg
                   ` (8 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Martin Brandenburg @ 2018-03-20 17:02 UTC (permalink / raw)
  To: hubcap, linux-fsdevel; +Cc: Martin Brandenburg

This is a fairly big change, but ultimately it's not a lot of code.

Implement write_inode and then avoid the call to orangefs_inode_setattr
within orangefs_setattr.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
---
 fs/orangefs/inode.c | 10 +++-------
 fs/orangefs/super.c | 16 ++++++++++++++++
 2 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/fs/orangefs/inode.c b/fs/orangefs/inode.c
index d77787f7b2f3..181723e16a94 100644
--- a/fs/orangefs/inode.c
+++ b/fs/orangefs/inode.c
@@ -207,8 +207,8 @@ static int orangefs_setattr_size(struct inode *inode, struct iattr *iattr)
  */
 int orangefs_setattr(struct dentry *dentry, struct iattr *iattr)
 {
-	int ret = -EINVAL;
 	struct inode *inode = dentry->d_inode;
+	int ret;
 
 	gossip_debug(GOSSIP_INODE_DEBUG,
 		     "orangefs_setattr: called on %pd\n",
@@ -227,15 +227,11 @@ int orangefs_setattr(struct dentry *dentry, struct iattr *iattr)
 	setattr_copy(inode, iattr);
 	mark_inode_dirty(inode);
 
-	ret = orangefs_inode_setattr(inode, iattr);
-	gossip_debug(GOSSIP_INODE_DEBUG,
-		     "orangefs_setattr: inode_setattr returned %d\n",
-		     ret);
-
-	if (!ret && (iattr->ia_valid & ATTR_MODE))
+	if (iattr->ia_valid & ATTR_MODE)
 		/* change mod on a file that has ACLs */
 		ret = posix_acl_chmod(inode, inode->i_mode);
 
+	ret = 0;
 out:
 	gossip_debug(GOSSIP_INODE_DEBUG, "orangefs_setattr: returning %d\n", ret);
 	return ret;
diff --git a/fs/orangefs/super.c b/fs/orangefs/super.c
index 1251d201b3c9..56e85d8c04bd 100644
--- a/fs/orangefs/super.c
+++ b/fs/orangefs/super.c
@@ -150,6 +150,21 @@ static void orangefs_destroy_inode(struct inode *inode)
 	call_rcu(&inode->i_rcu, orangefs_i_callback);
 }
 
+int orangefs_write_inode(struct inode *inode, struct writeback_control *wbc)
+{
+	struct iattr iattr;
+	gossip_debug(GOSSIP_SUPER_DEBUG, "orangefs_write_inode\n");
+	iattr.ia_valid = ATTR_MODE | ATTR_UID | ATTR_GID | ATTR_ATIME |
+	    ATTR_ATIME_SET | ATTR_MTIME | ATTR_MTIME_SET | ATTR_CTIME;
+	iattr.ia_mode = inode->i_mode;
+	iattr.ia_uid = inode->i_uid;
+	iattr.ia_gid = inode->i_gid;
+	iattr.ia_atime = inode->i_atime;
+	iattr.ia_mtime = inode->i_mtime;
+	iattr.ia_ctime = inode->i_ctime;
+	return orangefs_inode_setattr(inode, &iattr);
+}
+
 /*
  * NOTE: information filled in here is typically reflected in the
  * output of the system command 'df'
@@ -307,6 +322,7 @@ void fsid_key_table_finalize(void)
 static const struct super_operations orangefs_s_ops = {
 	.alloc_inode = orangefs_alloc_inode,
 	.destroy_inode = orangefs_destroy_inode,
+	.write_inode = orangefs_write_inode,
 	.drop_inode = generic_delete_inode,
 	.statfs = orangefs_statfs,
 	.remount_fs = orangefs_remount_fs,
-- 
2.16.2

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 16/24] orangefs: reorganize setattr functions to track attribute changes
  2018-03-20 17:02 [PATCH 00/24] orangefs: page cache Martin Brandenburg
                   ` (14 preceding siblings ...)
  2018-03-20 17:02 ` [PATCH 15/24] orangefs: let setattr write to cached inode Martin Brandenburg
@ 2018-03-20 17:02 ` Martin Brandenburg
  2018-03-20 17:02 ` [PATCH 17/24] orangefs: remove orangefs_readpages Martin Brandenburg
                   ` (7 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Martin Brandenburg @ 2018-03-20 17:02 UTC (permalink / raw)
  To: hubcap, linux-fsdevel; +Cc: Martin Brandenburg

OrangeFS accepts a mask indicating which attributes were changed.  The
kernel must not set any bits except those that were actually changed.
The kernel must set the uid/gid of the request to the actual uid/gid
responsible for the change.

The code path for setattrs initiated by notify_change is

orangefs_setattr(dentry, iattr)
-> __orangefs_setattr(inode, iattr)

In-kernel changes are initiated by calling __orangefs_setattr directly.

Code path for writeback is

orangefs_write_inode
-> orangefs_inode_setattr

attr_valid, attr_uid, and attr_gid change together under i_lock.
I_DIRTY changes separately.

__orangefs_setattr
	lock
	if needs to be cleaned first, unlock and retry
	set attr_valid
	copy data in
	unlock
	mark_inode_dirty

orangefs_inode_setattr
	lock
	copy attributes out
	clear attr_valid
	unlock
	do server operation
	expire getattr_time on success
	# __writeback_single_inode clears dirty

orangefs_inode_getattr
	# possible to get here with attr_valid set and not dirty
	lock
	if getattr_time ok or attr_valid set, unlock and return
	unlock
	do server operation
	# another thread may getattr or setattr, so check for that
	lock
	if getattr_time ok or attr_valid, unlock and return
	else, copy in
	update getattr_time
	unlock
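
The attr_valid bookkeeping above can be approximated by a small
single-threaded userspace sketch.  Everything in it is hypothetical
(the sk_ names, the plain unsigned uid, the flush counter), locking is
left out, and it ORs new bits into the pending mask, which is a choice
made for this sketch; the patch itself assigns iattr->ia_valid.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical attribute bits, standing in for ATTR_*. */
#define SK_ATTR_MODE	(1u << 0)
#define SK_ATTR_MTIME	(1u << 1)
#define SK_ATTR_CTIME	(1u << 2)

struct sk_inode {
	unsigned int attr_valid;	/* changes not yet sent to the server */
	unsigned int attr_uid;		/* who is responsible for them */
	unsigned int flushes;		/* number of simulated writebacks */
};

/* Simulated writeback: "send" the pending mask, then clear it. */
static unsigned int sk_flush(struct sk_inode *ino)
{
	unsigned int sent = ino->attr_valid;

	ino->attr_valid = 0;
	ino->flushes++;
	return sent;
}

/* Record a change made by uid, flushing first if another uid is pending. */
static void sk_setattr(struct sk_inode *ino, unsigned int changed,
    unsigned int uid)
{
	assert(ino != NULL);
	if (ino->attr_valid && ino->attr_uid != uid)
		sk_flush(ino);	/* needs to be cleaned first */
	ino->attr_valid |= changed;
	ino->attr_uid = uid;
}
```

Two changes by the same uid are batched into one pending mask; a change
by a different uid forces the earlier ones out first, so each request
carries only the bits its own uid changed.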

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
---
 fs/orangefs/acl.c             |   4 +-
 fs/orangefs/inode.c           |  76 +++++++++++++++++++++++------
 fs/orangefs/namei.c           |  36 ++++++--------
 fs/orangefs/orangefs-kernel.h |   8 ++-
 fs/orangefs/orangefs-utils.c  | 110 +++++++++++++++++-------------------------
 fs/orangefs/super.c           |  11 +----
 6 files changed, 130 insertions(+), 115 deletions(-)

diff --git a/fs/orangefs/acl.c b/fs/orangefs/acl.c
index 796c22f80b78..bacd676ed133 100644
--- a/fs/orangefs/acl.c
+++ b/fs/orangefs/acl.c
@@ -142,7 +142,7 @@ int orangefs_set_acl(struct inode *inode, struct posix_acl *acl, int type)
 			rc = __orangefs_set_acl(inode, acl, type);
 		} else {
 			iattr.ia_valid = ATTR_MODE;
-			rc = orangefs_inode_setattr(inode, &iattr);
+			rc = __orangefs_setattr(inode, &iattr);
 		}
 
 		return rc;
@@ -181,7 +181,7 @@ int orangefs_init_acl(struct inode *inode, struct inode *dir)
 		inode->i_mode = mode;
 		iattr.ia_mode = mode;
 		iattr.ia_valid |= ATTR_MODE;
-		orangefs_inode_setattr(inode, &iattr);
+		__orangefs_setattr(inode, &iattr);
 	}
 
 	return error;
diff --git a/fs/orangefs/inode.c b/fs/orangefs/inode.c
index 181723e16a94..d3ff038c7694 100644
--- a/fs/orangefs/inode.c
+++ b/fs/orangefs/inode.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 /*
  * (C) 2001 Clemson University and The University of Chicago
+ * Copyright 2018 Omnibond Systems, L.L.C.
  *
  * See COPYING in top-level directory.
  */
@@ -202,21 +203,31 @@ static int orangefs_setattr_size(struct inode *inode, struct iattr *iattr)
 	return ret;
 }
 
-/*
- * Change attributes of an object referenced by dentry.
- */
-int orangefs_setattr(struct dentry *dentry, struct iattr *iattr)
+int __orangefs_setattr(struct inode *inode, struct iattr *iattr)
 {
-	struct inode *inode = dentry->d_inode;
 	int ret;
 
-	gossip_debug(GOSSIP_INODE_DEBUG,
-		     "orangefs_setattr: called on %pd\n",
-		     dentry);
-
-	ret = setattr_prepare(dentry, iattr);
-	if (ret)
-		goto out;
+	if (iattr->ia_valid & ATTR_MODE) {
+		if (iattr->ia_mode & (S_ISVTX)) {
+			if (is_root_handle(inode)) {
+				/*
+				 * allow sticky bit to be set on root (since
+				 * it shows up that way by default anyhow),
+				 * but don't show it to the server
+				 */
+				iattr->ia_mode -= S_ISVTX;
+			} else {
+				gossip_debug(GOSSIP_UTILS_DEBUG,
+					     "User attempted to set sticky bit on non-root directory; returning EINVAL.\n");
+				return -EINVAL;
+			}
+		}
+		if (iattr->ia_mode & (S_ISUID)) {
+			gossip_debug(GOSSIP_UTILS_DEBUG,
+				     "Attempting to set setuid bit (not supported); returning EINVAL.\n");
+			return -EINVAL;
+		}
+	}
 
 	if (iattr->ia_valid & ATTR_SIZE) {
 		ret = orangefs_setattr_size(inode, iattr);
@@ -224,7 +235,24 @@ int orangefs_setattr(struct dentry *dentry, struct iattr *iattr)
 			goto out;
 	}
 
+again:
+	spin_lock(&inode->i_lock);
+	if (ORANGEFS_I(inode)->attr_valid) {
+		if (uid_eq(ORANGEFS_I(inode)->attr_uid, current_fsuid()) &&
+		    gid_eq(ORANGEFS_I(inode)->attr_gid, current_fsgid())) {
+			ORANGEFS_I(inode)->attr_valid = iattr->ia_valid;
+		} else {
+			spin_unlock(&inode->i_lock);
+			write_inode_now(inode, 1);
+			goto again;
+		}
+	} else {
+		ORANGEFS_I(inode)->attr_valid = iattr->ia_valid;
+		ORANGEFS_I(inode)->attr_uid = current_fsuid();
+		ORANGEFS_I(inode)->attr_gid = current_fsgid();
+	}
 	setattr_copy(inode, iattr);
+	spin_unlock(&inode->i_lock);
 	mark_inode_dirty(inode);
 
 	if (iattr->ia_valid & ATTR_MODE)
@@ -233,7 +261,26 @@ int orangefs_setattr(struct dentry *dentry, struct iattr *iattr)
 
 	ret = 0;
 out:
-	gossip_debug(GOSSIP_INODE_DEBUG, "orangefs_setattr: returning %d\n", ret);
+	return ret;
+
+}
+
+/*
+ * Change attributes of an object referenced by dentry.
+ */
+int orangefs_setattr(struct dentry *dentry, struct iattr *iattr)
+{
+	int ret;
+	gossip_debug(GOSSIP_INODE_DEBUG, "orangefs_setattr: called on %pd\n",
+	    dentry);
+	ret = setattr_prepare(dentry, iattr);
+	if (ret)
+		goto out;
+	ret = __orangefs_setattr(d_inode(dentry), iattr);
+	sync_inode_metadata(d_inode(dentry), 1);
+out:
+	gossip_debug(GOSSIP_INODE_DEBUG, "orangefs_setattr: returning %d\n",
+	    ret);
 	return ret;
 }
 
@@ -299,7 +346,7 @@ int orangefs_update_time(struct inode *inode, struct timespec *time, int flags)
 		iattr.ia_valid |= ATTR_CTIME;
 	if (flags & S_MTIME)
 		iattr.ia_valid |= ATTR_MTIME;
-	return orangefs_inode_setattr(inode, &iattr);
+	return __orangefs_setattr(inode, &iattr);
 }
 
 /* ORANGEFS2 implementation of VFS inode operations for files */
@@ -360,6 +407,7 @@ static int orangefs_set_inode(struct inode *inode, void *data)
 	struct orangefs_object_kref *ref = (struct orangefs_object_kref *) data;
 	ORANGEFS_I(inode)->refn.fs_id = ref->fs_id;
 	ORANGEFS_I(inode)->refn.khandle = ref->khandle;
+	ORANGEFS_I(inode)->attr_valid = 0;
 	hash_init(ORANGEFS_I(inode)->xattr_cache);
 	return 0;
 }
diff --git a/fs/orangefs/namei.c b/fs/orangefs/namei.c
index b6c3d742f35b..f8e151cae44f 100644
--- a/fs/orangefs/namei.c
+++ b/fs/orangefs/namei.c
@@ -83,11 +83,10 @@ static int orangefs_create(struct inode *dir,
 		     __func__,
 		     dentry);
 
-	dir->i_mtime = dir->i_ctime = current_time(dir);
 	memset(&iattr, 0, sizeof iattr);
-	iattr.ia_valid |= ATTR_MTIME;
-	orangefs_inode_setattr(dir, &iattr);
-	mark_inode_dirty_sync(dir);
+	iattr.ia_valid |= ATTR_MTIME | ATTR_CTIME;
+	iattr.ia_mtime = iattr.ia_ctime = current_time(dir);
+	__orangefs_setattr(dir, &iattr);
 	ret = 0;
 out:
 	gossip_debug(GOSSIP_NAME_DEBUG,
@@ -255,11 +254,11 @@ static int orangefs_unlink(struct inode *dir, struct dentry *dentry)
 	if (!ret) {
 		drop_nlink(inode);
 
-		dir->i_mtime = dir->i_ctime = current_time(dir);
 		memset(&iattr, 0, sizeof iattr);
-		iattr.ia_valid |= ATTR_MTIME;
-		orangefs_inode_setattr(dir, &iattr);
-		mark_inode_dirty_sync(dir);
+		iattr.ia_valid |= ATTR_MTIME | ATTR_CTIME;
+		iattr.ia_mtime = iattr.ia_ctime = current_time(dir);
+		__orangefs_setattr(dir, &iattr);
+		ret = 0;
 	}
 	return ret;
 }
@@ -337,11 +336,10 @@ static int orangefs_symlink(struct inode *dir,
 		     get_khandle_from_ino(inode),
 		     dentry);
 
-	dir->i_mtime = dir->i_ctime = current_time(dir);
 	memset(&iattr, 0, sizeof iattr);
-	iattr.ia_valid |= ATTR_MTIME;
-	orangefs_inode_setattr(dir, &iattr);
-	mark_inode_dirty_sync(dir);
+	iattr.ia_valid |= ATTR_MTIME | ATTR_CTIME;
+	iattr.ia_mtime = iattr.ia_ctime = current_time(dir);
+	__orangefs_setattr(dir, &iattr);
 	ret = 0;
 out:
 	return ret;
@@ -409,11 +407,10 @@ static int orangefs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode
 	 * NOTE: we have no good way to keep nlink consistent for directories
 	 * across clients; keep constant at 1.
 	 */
-	dir->i_mtime = dir->i_ctime = current_time(dir);
 	memset(&iattr, 0, sizeof iattr);
-	iattr.ia_valid |= ATTR_MTIME;
-	orangefs_inode_setattr(dir, &iattr);
-	mark_inode_dirty_sync(dir);
+	iattr.ia_valid |= ATTR_MTIME | ATTR_CTIME;
+	iattr.ia_mtime = iattr.ia_ctime = current_time(dir);
+	__orangefs_setattr(dir, &iattr);
 out:
 	return ret;
 }
@@ -435,11 +432,10 @@ static int orangefs_rename(struct inode *old_dir,
 		     "orangefs_rename: called (%pd2 => %pd2) ct=%d\n",
 		     old_dentry, new_dentry, d_count(new_dentry));
 
-	new_dir->i_mtime = new_dir->i_ctime = current_time(new_dir);
 	memset(&iattr, 0, sizeof iattr);
-	iattr.ia_valid |= ATTR_MTIME;
-	orangefs_inode_setattr(new_dir, &iattr);
-	mark_inode_dirty_sync(new_dir);
+	iattr.ia_valid |= ATTR_MTIME | ATTR_CTIME;
+	iattr.ia_mtime = iattr.ia_ctime = current_time(new_dir);
+	__orangefs_setattr(new_dir, &iattr);
 
 	new_op = op_alloc(ORANGEFS_VFS_OP_RENAME);
 	if (!new_op)
diff --git a/fs/orangefs/orangefs-kernel.h b/fs/orangefs/orangefs-kernel.h
index 03a2a042132f..dedc96aa69fc 100644
--- a/fs/orangefs/orangefs-kernel.h
+++ b/fs/orangefs/orangefs-kernel.h
@@ -194,6 +194,9 @@ struct orangefs_inode_s {
 	sector_t last_failed_block_index_read;
 
 	unsigned long getattr_time;
+	int attr_valid;
+	kuid_t attr_uid;
+	kgid_t attr_gid;
 
 	DECLARE_HASHTABLE(xattr_cache, 4);
 };
@@ -346,7 +349,8 @@ struct inode *orangefs_new_inode(struct super_block *sb,
 			      dev_t dev,
 			      struct orangefs_object_kref *ref);
 
-int orangefs_setattr(struct dentry *dentry, struct iattr *iattr);
+int __orangefs_setattr(struct inode *, struct iattr *);
+int orangefs_setattr(struct dentry *, struct iattr *);
 
 int orangefs_getattr(const struct path *path, struct kstat *stat,
 		     u32 request_mask, unsigned int flags);
@@ -404,7 +408,7 @@ int orangefs_inode_getattr(struct inode *, int);
 
 int orangefs_inode_check_changed(struct inode *inode);
 
-int orangefs_inode_setattr(struct inode *inode, struct iattr *iattr);
+int orangefs_inode_setattr(struct inode *inode);
 
 bool orangefs_cancel_op_in_progress(struct orangefs_kernel_op_s *op);
 
diff --git a/fs/orangefs/orangefs-utils.c b/fs/orangefs/orangefs-utils.c
index 8b13f1d15999..d34b9a90f6d7 100644
--- a/fs/orangefs/orangefs-utils.c
+++ b/fs/orangefs/orangefs-utils.c
@@ -134,51 +134,37 @@ static int orangefs_inode_perms(struct ORANGEFS_sys_attr_s *attrs)
  * NOTE: in kernel land, we never use the sys_attr->link_target for
  * anything, so don't bother copying it into the sys_attr object here.
  */
-static inline int copy_attributes_from_inode(struct inode *inode,
-					     struct ORANGEFS_sys_attr_s *attrs,
-					     struct iattr *iattr)
+static inline void copy_attributes_from_inode(struct inode *inode,
+    struct ORANGEFS_sys_attr_s *attrs)
 {
-	umode_t tmp_mode;
-
-	if (!iattr || !inode || !attrs) {
-		gossip_err("NULL iattr (%p), inode (%p), attrs (%p) "
-			   "in copy_attributes_from_inode!\n",
-			   iattr,
-			   inode,
-			   attrs);
-		return -EINVAL;
-	}
-	/*
-	 * We need to be careful to only copy the attributes out of the
-	 * iattr object that we know are valid.
-	 */
+	struct orangefs_inode_s *orangefs_inode = ORANGEFS_I(inode);
 	attrs->mask = 0;
-	if (iattr->ia_valid & ATTR_UID) {
-		attrs->owner = from_kuid(&init_user_ns, iattr->ia_uid);
+	if (orangefs_inode->attr_valid & ATTR_UID) {
+		attrs->owner = from_kuid(&init_user_ns, inode->i_uid);
 		attrs->mask |= ORANGEFS_ATTR_SYS_UID;
 		gossip_debug(GOSSIP_UTILS_DEBUG, "(UID) %d\n", attrs->owner);
 	}
-	if (iattr->ia_valid & ATTR_GID) {
-		attrs->group = from_kgid(&init_user_ns, iattr->ia_gid);
+	if (orangefs_inode->attr_valid & ATTR_GID) {
+		attrs->group = from_kgid(&init_user_ns, inode->i_gid);
 		attrs->mask |= ORANGEFS_ATTR_SYS_GID;
 		gossip_debug(GOSSIP_UTILS_DEBUG, "(GID) %d\n", attrs->group);
 	}
 
-	if (iattr->ia_valid & ATTR_ATIME) {
+	if (orangefs_inode->attr_valid & ATTR_ATIME) {
 		attrs->mask |= ORANGEFS_ATTR_SYS_ATIME;
-		if (iattr->ia_valid & ATTR_ATIME_SET) {
-			attrs->atime = (time64_t)iattr->ia_atime.tv_sec;
+		if (orangefs_inode->attr_valid & ATTR_ATIME_SET) {
+			attrs->atime = (time64_t)inode->i_atime.tv_sec;
 			attrs->mask |= ORANGEFS_ATTR_SYS_ATIME_SET;
 		}
 	}
-	if (iattr->ia_valid & ATTR_MTIME) {
+	if (orangefs_inode->attr_valid & ATTR_MTIME) {
 		attrs->mask |= ORANGEFS_ATTR_SYS_MTIME;
-		if (iattr->ia_valid & ATTR_MTIME_SET) {
-			attrs->mtime = (time64_t)iattr->ia_mtime.tv_sec;
+		if (orangefs_inode->attr_valid & ATTR_MTIME_SET) {
+			attrs->mtime = (time64_t)inode->i_mtime.tv_sec;
 			attrs->mask |= ORANGEFS_ATTR_SYS_MTIME_SET;
 		}
 	}
-	if (iattr->ia_valid & ATTR_CTIME)
+	if (orangefs_inode->attr_valid & ATTR_CTIME)
 		attrs->mask |= ORANGEFS_ATTR_SYS_CTIME;
 
 	/*
@@ -187,34 +173,10 @@ static inline int copy_attributes_from_inode(struct inode *inode,
 	 * ATTR_SIZE
 	 */
 
-	if (iattr->ia_valid & ATTR_MODE) {
-		tmp_mode = iattr->ia_mode;
-		if (tmp_mode & (S_ISVTX)) {
-			if (is_root_handle(inode)) {
-				/*
-				 * allow sticky bit to be set on root (since
-				 * it shows up that way by default anyhow),
-				 * but don't show it to the server
-				 */
-				tmp_mode -= S_ISVTX;
-			} else {
-				gossip_debug(GOSSIP_UTILS_DEBUG,
-					     "User attempted to set sticky bit on non-root directory; returning EINVAL.\n");
-				return -EINVAL;
-			}
-		}
-
-		if (tmp_mode & (S_ISUID)) {
-			gossip_debug(GOSSIP_UTILS_DEBUG,
-				     "Attempting to set setuid bit (not supported); returning EINVAL.\n");
-			return -EINVAL;
-		}
-
-		attrs->perms = ORANGEFS_util_translate_mode(tmp_mode);
+	if (orangefs_inode->attr_valid & ATTR_MODE) {
+		attrs->perms = ORANGEFS_util_translate_mode(inode->i_mode);
 		attrs->mask |= ORANGEFS_ATTR_SYS_PERM;
 	}
-
-	return 0;
 }
 
 static int orangefs_inode_type(enum orangefs_ds_type objtype)
@@ -279,10 +241,16 @@ int orangefs_inode_getattr(struct inode *inode, int flags)
 	gossip_debug(GOSSIP_UTILS_DEBUG, "%s: called on inode %pU flags %d\n",
 	    __func__, get_khandle_from_ino(inode), flags);
 
+again:
 	spin_lock(&inode->i_lock);
 	/* Must have all the attributes in the mask and be within cache time. */
 	if ((!flags && time_before(jiffies, orangefs_inode->getattr_time)) ||
-	    inode->i_state & I_DIRTY) {
+	    orangefs_inode->attr_valid) {
+		if (orangefs_inode->attr_valid) {
+			spin_unlock(&inode->i_lock);
+			write_inode_now(inode, 1);
+			goto again;
+		}
 		spin_unlock(&inode->i_lock);
 		return 0;
 	}
@@ -306,10 +274,16 @@ int orangefs_inode_getattr(struct inode *inode, int flags)
 	if (ret != 0)
 		goto out;
 
+again2:
 	spin_lock(&inode->i_lock);
 	/* Must have all the attributes in the mask and be within cache time. */
 	if ((!flags && time_before(jiffies, orangefs_inode->getattr_time)) ||
-	    inode->i_state & I_DIRTY) {
+	    orangefs_inode->attr_valid) {
+		if (orangefs_inode->attr_valid) {
+			spin_unlock(&inode->i_lock);
+			write_inode_now(inode, 1);
+			goto again2;
+		}
 		gossip_debug(GOSSIP_UTILS_DEBUG, "%s: in cache or dirty\n",
 		    __func__);
 		ret = 0;
@@ -436,7 +410,7 @@ int orangefs_inode_check_changed(struct inode *inode)
  * issues a orangefs setattr request to make sure the new attribute values
  * take effect if successful.  returns 0 on success; -errno otherwise
  */
-int orangefs_inode_setattr(struct inode *inode, struct iattr *iattr)
+int orangefs_inode_setattr(struct inode *inode)
 {
 	struct orangefs_inode_s *orangefs_inode = ORANGEFS_I(inode);
 	struct orangefs_kernel_op_s *new_op;
@@ -446,23 +420,25 @@ int orangefs_inode_setattr(struct inode *inode, struct iattr *iattr)
 	if (!new_op)
 		return -ENOMEM;
 
+	spin_lock(&inode->i_lock);
+	new_op->upcall.uid = from_kuid(&init_user_ns, orangefs_inode->attr_uid);
+	new_op->upcall.gid = from_kgid(&init_user_ns, orangefs_inode->attr_gid);
 	new_op->upcall.req.setattr.refn = orangefs_inode->refn;
-	ret = copy_attributes_from_inode(inode,
-		       &new_op->upcall.req.setattr.attributes,
-		       iattr);
-	if (ret >= 0) {
-		ret = service_operation(new_op, get_interruptible_flag(inode));
+	copy_attributes_from_inode(inode,
+	    &new_op->upcall.req.setattr.attributes);
+	orangefs_inode->attr_valid = 0;
+	spin_unlock(&inode->i_lock);
 
-		gossip_debug(GOSSIP_UTILS_DEBUG,
-			     "orangefs_inode_setattr: returning %d\n",
-			     ret);
-	}
+	ret = service_operation(new_op, get_interruptible_flag(inode));
+	gossip_debug(GOSSIP_UTILS_DEBUG,
+	    "orangefs_inode_setattr: returning %d\n", ret);
+	if (ret)
+		orangefs_make_bad_inode(inode);
 
 	op_release(new_op);
 
 	if (ret == 0)
 		orangefs_inode->getattr_time = jiffies - 1;
-
 	return ret;
 }
 
diff --git a/fs/orangefs/super.c b/fs/orangefs/super.c
index 56e85d8c04bd..c4c4fc4340bb 100644
--- a/fs/orangefs/super.c
+++ b/fs/orangefs/super.c
@@ -152,17 +152,8 @@ static void orangefs_destroy_inode(struct inode *inode)
 
 int orangefs_write_inode(struct inode *inode, struct writeback_control *wbc)
 {
-	struct iattr iattr;
 	gossip_debug(GOSSIP_SUPER_DEBUG, "orangefs_write_inode\n");
-	iattr.ia_valid = ATTR_MODE | ATTR_UID | ATTR_GID | ATTR_ATIME |
-	    ATTR_ATIME_SET | ATTR_MTIME | ATTR_MTIME_SET | ATTR_CTIME;
-	iattr.ia_mode = inode->i_mode;
-	iattr.ia_uid = inode->i_uid;
-	iattr.ia_gid = inode->i_gid;
-	iattr.ia_atime = inode->i_atime;
-	iattr.ia_mtime = inode->i_mtime;
-	iattr.ia_ctime = inode->i_ctime;
-	return orangefs_inode_setattr(inode, &iattr);
+	return orangefs_inode_setattr(inode);
 }
 
 /*
-- 
2.16.2


* [PATCH 17/24] orangefs: remove orangefs_readpages
  2018-03-20 17:02 [PATCH 00/24] orangefs: page cache Martin Brandenburg
                   ` (15 preceding siblings ...)
  2018-03-20 17:02 ` [PATCH 16/24] orangefs: reorganize setattr functions to track attribute changes Martin Brandenburg
@ 2018-03-20 17:02 ` Martin Brandenburg
  2018-03-20 17:02 ` [PATCH 18/24] orangefs: service ops done for writeback are not killable Martin Brandenburg
                   ` (6 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Martin Brandenburg @ 2018-03-20 17:02 UTC (permalink / raw)
  To: hubcap, linux-fsdevel; +Cc: Martin Brandenburg

It's a copy of the loop which would run in read_pages from
mm/readahead.c.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
---
 fs/orangefs/inode.c | 39 +--------------------------------------
 1 file changed, 1 insertion(+), 38 deletions(-)

diff --git a/fs/orangefs/inode.c b/fs/orangefs/inode.c
index d3ff038c7694..be7f2cdb3342 100644
--- a/fs/orangefs/inode.c
+++ b/fs/orangefs/inode.c
@@ -14,7 +14,7 @@
 #include "orangefs-kernel.h"
 #include "orangefs-bufmap.h"
 
-static int read_one_page(struct page *page)
+static int orangefs_readpage(struct file *file, struct page *page)
 {
 	int ret;
 	int max_block;
@@ -59,42 +59,6 @@ static int read_one_page(struct page *page)
 	return ret;
 }
 
-static int orangefs_readpage(struct file *file, struct page *page)
-{
-	return read_one_page(page);
-}
-
-static int orangefs_readpages(struct file *file,
-			   struct address_space *mapping,
-			   struct list_head *pages,
-			   unsigned nr_pages)
-{
-	int page_idx;
-	int ret;
-
-	gossip_debug(GOSSIP_INODE_DEBUG, "orangefs_readpages called\n");
-
-	for (page_idx = 0; page_idx < nr_pages; page_idx++) {
-		struct page *page;
-
-		page = list_entry(pages->prev, struct page, lru);
-		list_del(&page->lru);
-		if (!add_to_page_cache(page,
-				       mapping,
-				       page->index,
-				       readahead_gfp_mask(mapping))) {
-			ret = read_one_page(page);
-			gossip_debug(GOSSIP_INODE_DEBUG,
-				"failure adding page to cache, read_one_page returned: %d\n",
-				ret);
-	      } else {
-			put_page(page);
-	      }
-	}
-	BUG_ON(!list_empty(pages));
-	return 0;
-}
-
 static void orangefs_invalidatepage(struct page *page,
 				 unsigned int offset,
 				 unsigned int length)
@@ -140,7 +104,6 @@ static ssize_t orangefs_direct_IO(struct kiocb *iocb,
 /** ORANGEFS2 implementation of address space operations */
 static const struct address_space_operations orangefs_address_operations = {
 	.readpage = orangefs_readpage,
-	.readpages = orangefs_readpages,
 	.invalidatepage = orangefs_invalidatepage,
 	.releasepage = orangefs_releasepage,
 	.direct_IO = orangefs_direct_IO,
-- 
2.16.2


* [PATCH 18/24] orangefs: service ops done for writeback are not killable
  2018-03-20 17:02 [PATCH 00/24] orangefs: page cache Martin Brandenburg
                   ` (16 preceding siblings ...)
  2018-03-20 17:02 ` [PATCH 17/24] orangefs: remove orangefs_readpages Martin Brandenburg
@ 2018-03-20 17:02 ` Martin Brandenburg
  2018-03-20 17:02 ` [PATCH 19/24] orangefs: migrate to generic_file_read_iter Martin Brandenburg
                   ` (5 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Martin Brandenburg @ 2018-03-20 17:02 UTC (permalink / raw)
  To: hubcap, linux-fsdevel; +Cc: Martin Brandenburg

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
---
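Editorial note, not part of the commit: the patch below makes
wait_for_matching_downcall choose its wait primitive from the flags.
A minimal user-space sketch of that selection follows.  The enum and
pick_wait_mode() are hypothetical names used only for illustration;
OP_WRITEBACK mirrors the value added by this patch, and the value of
OP_INTERRUPTIBLE is an assumption.

```c
#include <assert.h>

#define OP_INTERRUPTIBLE 1   /* assumed value, for illustration only */
#define OP_WRITEBACK     32  /* matches ORANGEFS_OP_WRITEBACK in this patch */

/* Hypothetical labels for the three wait primitives used in the patch. */
enum wait_mode {
	WAIT_IO,            /* wait_for_completion_io_timeout */
	WAIT_INTERRUPTIBLE, /* wait_for_completion_interruptible_timeout */
	WAIT_KILLABLE       /* wait_for_completion_killable_timeout */
};

/* Writeback takes priority over the interruptible flag. */
static enum wait_mode pick_wait_mode(int flags)
{
	assert(flags >= 0);
	if (flags & OP_WRITEBACK)
		return WAIT_IO;
	if (flags & OP_INTERRUPTIBLE)
		return WAIT_INTERRUPTIBLE;
	return WAIT_KILLABLE;
}
```

The point of the ordering is that an op issued for writeback is waited
on uninterruptibly even if the caller also passed the interruptible
flag.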
 fs/orangefs/orangefs-kernel.h |  1 +
 fs/orangefs/orangefs-utils.c  |  6 ++++--
 fs/orangefs/waitqueue.c       | 20 +++++++++++---------
 3 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/fs/orangefs/orangefs-kernel.h b/fs/orangefs/orangefs-kernel.h
index dedc96aa69fc..525656d928d4 100644
--- a/fs/orangefs/orangefs-kernel.h
+++ b/fs/orangefs/orangefs-kernel.h
@@ -443,6 +443,7 @@ extern const struct dentry_operations orangefs_dentry_operations;
 #define ORANGEFS_OP_CANCELLATION  4   /* this is a cancellation */
 #define ORANGEFS_OP_NO_MUTEX      8   /* don't acquire request_mutex */
 #define ORANGEFS_OP_ASYNC         16  /* Queue it, but don't wait */
+#define ORANGEFS_OP_WRITEBACK     32
 
 int service_operation(struct orangefs_kernel_op_s *, int);
 
diff --git a/fs/orangefs/orangefs-utils.c b/fs/orangefs/orangefs-utils.c
index d34b9a90f6d7..0e52208726fd 100644
--- a/fs/orangefs/orangefs-utils.c
+++ b/fs/orangefs/orangefs-utils.c
@@ -270,7 +270,8 @@ int orangefs_inode_getattr(struct inode *inode, int flags)
 		new_op->upcall.req.getattr.mask =
 		    ORANGEFS_ATTR_SYS_ALL_NOHINT & ~ORANGEFS_ATTR_SYS_SIZE;
 
-	ret = service_operation(new_op, get_interruptible_flag(inode));
+	ret = service_operation(new_op,
+	    get_interruptible_flag(inode));
 	if (ret != 0)
 		goto out;
 
@@ -429,7 +430,8 @@ int orangefs_inode_setattr(struct inode *inode)
 	orangefs_inode->attr_valid = 0;
 	spin_unlock(&inode->i_lock);
 
-	ret = service_operation(new_op, get_interruptible_flag(inode));
+	ret = service_operation(new_op,
+	    get_interruptible_flag(inode) | ORANGEFS_OP_WRITEBACK);
 	gossip_debug(GOSSIP_UTILS_DEBUG,
 	    "orangefs_inode_setattr: returning %d\n", ret);
 	if (ret)
diff --git a/fs/orangefs/waitqueue.c b/fs/orangefs/waitqueue.c
index c345a1d7fde2..b02ca891f999 100644
--- a/fs/orangefs/waitqueue.c
+++ b/fs/orangefs/waitqueue.c
@@ -17,7 +17,7 @@
 #include "orangefs-bufmap.h"
 #include "orangefs-trace.h"
 
-static int wait_for_matching_downcall(struct orangefs_kernel_op_s *, long, bool);
+static int wait_for_matching_downcall(struct orangefs_kernel_op_s *, long, int);
 static void orangefs_clean_up_interrupted_operation(struct orangefs_kernel_op_s *);
 
 /*
@@ -138,9 +138,7 @@ int service_operation(struct orangefs_kernel_op_s *op, int flags)
 	if (!(flags & ORANGEFS_OP_NO_MUTEX))
 		mutex_unlock(&orangefs_request_mutex);
 
-	ret = wait_for_matching_downcall(op, timeout,
-					 flags & ORANGEFS_OP_INTERRUPTIBLE);
-
+	ret = wait_for_matching_downcall(op, timeout, flags);
 	gossip_debug(GOSSIP_WAIT_DEBUG,
 		     "%s: wait_for_matching_downcall returned %d for %p\n",
 		     __func__,
@@ -311,10 +309,12 @@ static void
  * Returns with op->lock taken.
  */
 static int wait_for_matching_downcall(struct orangefs_kernel_op_s *op,
-				      long timeout,
-				      bool interruptible)
+				      long timeout, int flags)
 {
 	long n;
+	int writeback = flags & ORANGEFS_OP_WRITEBACK,
+	    interruptible = flags & ORANGEFS_OP_INTERRUPTIBLE;
+
 
 	/*
 	 * There's a "schedule_timeout" inside of these wait
@@ -322,10 +322,12 @@ static int wait_for_matching_downcall(struct orangefs_kernel_op_s *op,
 	 * user process that needs something done and is being
 	 * manipulated by the client-core process.
 	 */
-	if (interruptible)
+	if (writeback)
+		n = wait_for_completion_io_timeout(&op->waitq, timeout);
+	else if (!writeback && interruptible)
 		n = wait_for_completion_interruptible_timeout(&op->waitq,
-							      timeout);
-	else
+								      timeout);
+	else /* !writeback && !interruptible but compiler complains */
 		n = wait_for_completion_killable_timeout(&op->waitq, timeout);
 
 	spin_lock(&op->lock);
-- 
2.16.2


* [PATCH 19/24] orangefs: migrate to generic_file_read_iter
  2018-03-20 17:02 [PATCH 00/24] orangefs: page cache Martin Brandenburg
                   ` (17 preceding siblings ...)
  2018-03-20 17:02 ` [PATCH 18/24] orangefs: service ops done for writeback are not killable Martin Brandenburg
@ 2018-03-20 17:02 ` Martin Brandenburg
  2018-03-20 17:02 ` [PATCH 20/24] orangefs: implement writepage Martin Brandenburg
                   ` (4 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Martin Brandenburg @ 2018-03-20 17:02 UTC (permalink / raw)
  To: hubcap, linux-fsdevel; +Cc: Martin Brandenburg

Remove orangefs_inode_read.  It was used by readpage.  Calling
wait_for_direct_io directly serves the purpose just as well.  There is
now no check of the bufmap size in the readpage path.  There are already
other places the bufmap size is assumed to be greater than PAGE_SIZE.

It is now important to call truncate_inode_pages in the write path so
that a subsequent read sees the new data.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
---
 fs/orangefs/file.c            | 68 +++++--------------------------------------
 fs/orangefs/inode.c           | 63 +++++++++++++++------------------------
 fs/orangefs/orangefs-kernel.h | 13 +++++----
 3 files changed, 38 insertions(+), 106 deletions(-)

diff --git a/fs/orangefs/file.c b/fs/orangefs/file.c
index ad615d149683..708ccc4d0691 100644
--- a/fs/orangefs/file.c
+++ b/fs/orangefs/file.c
@@ -42,7 +42,7 @@ static int flush_racache(struct inode *inode)
 /*
  * Post and wait for the I/O upcall to finish
  */
-static ssize_t wait_for_direct_io(enum ORANGEFS_io_type type, struct inode *inode,
+ssize_t wait_for_direct_io(enum ORANGEFS_io_type type, struct inode *inode,
 		loff_t *offset, struct iov_iter *iter,
 		size_t total_size, loff_t readahead_size)
 {
@@ -234,7 +234,7 @@ static ssize_t wait_for_direct_io(enum ORANGEFS_io_type type, struct inode *inod
  * augmented/extended metadata attached to the file.
  * Note: File extended attributes override any mount options.
  */
-static ssize_t do_readv_writev(enum ORANGEFS_io_type type, struct file *file,
+ssize_t do_readv_writev(enum ORANGEFS_io_type type, struct file *file,
 		loff_t *offset, struct iov_iter *iter)
 {
 	struct inode *inode = file->f_mapping->host;
@@ -335,67 +335,11 @@ static ssize_t do_readv_writev(enum ORANGEFS_io_type type, struct file *file,
 	return ret;
 }
 
-/*
- * Read data from a specified offset in a file (referenced by inode).
- * Data may be placed either in a user or kernel buffer.
- */
-ssize_t orangefs_inode_read(struct inode *inode,
-			    struct iov_iter *iter,
-			    loff_t *offset,
-			    loff_t readahead_size)
+static ssize_t orangefs_file_read_iter(struct kiocb *iocb,
+    struct iov_iter *iter)
 {
-	struct orangefs_inode_s *orangefs_inode = ORANGEFS_I(inode);
-	size_t count = iov_iter_count(iter);
-	size_t bufmap_size;
-	ssize_t ret = -EINVAL;
-
 	orangefs_stats.reads++;
-
-	bufmap_size = orangefs_bufmap_size_query();
-	if (count > bufmap_size) {
-		gossip_debug(GOSSIP_FILE_DEBUG,
-			     "%s: count is too large (%zd/%zd)!\n",
-			     __func__, count, bufmap_size);
-		return -EINVAL;
-	}
-
-	gossip_debug(GOSSIP_FILE_DEBUG,
-		     "%s(%pU) %zd@%llu\n",
-		     __func__,
-		     &orangefs_inode->refn.khandle,
-		     count,
-		     llu(*offset));
-
-	ret = wait_for_direct_io(ORANGEFS_IO_READ, inode, offset, iter,
-			count, readahead_size);
-	if (ret > 0)
-		*offset += ret;
-
-	gossip_debug(GOSSIP_FILE_DEBUG,
-		     "%s(%pU): Value(%zd) returned.\n",
-		     __func__,
-		     &orangefs_inode->refn.khandle,
-		     ret);
-
-	return ret;
-}
-
-static ssize_t orangefs_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
-{
-	struct file *file = iocb->ki_filp;
-	loff_t pos = iocb->ki_pos;
-	ssize_t rc = 0;
-
-	BUG_ON(iocb->private);
-
-	gossip_debug(GOSSIP_FILE_DEBUG, "orangefs_file_read_iter\n");
-
-	orangefs_stats.reads++;
-
-	rc = do_readv_writev(ORANGEFS_IO_READ, file, &pos, iter);
-	iocb->ki_pos = pos;
-
-	return rc;
+	return generic_file_read_iter(iocb, iter);
 }
 
 static ssize_t orangefs_file_write_iter(struct kiocb *iocb, struct iov_iter *iter)
@@ -406,6 +350,8 @@ static ssize_t orangefs_file_write_iter(struct kiocb *iocb, struct iov_iter *ite
 
 	BUG_ON(iocb->private);
 
+	truncate_inode_pages(file->f_mapping, 0);
+
 	gossip_debug(GOSSIP_FILE_DEBUG, "orangefs_file_write_iter\n");
 
 	inode_lock(file->f_mapping->host);
diff --git a/fs/orangefs/inode.c b/fs/orangefs/inode.c
index be7f2cdb3342..81b8ef565f88 100644
--- a/fs/orangefs/inode.c
+++ b/fs/orangefs/inode.c
@@ -16,37 +16,25 @@
 
 static int orangefs_readpage(struct file *file, struct page *page)
 {
-	int ret;
-	int max_block;
-	ssize_t bytes_read = 0;
 	struct inode *inode = page->mapping->host;
-	const __u32 blocksize = PAGE_SIZE;	/* inode->i_blksize */
-	const __u32 blockbits = PAGE_SHIFT;	/* inode->i_blkbits */
-	struct iov_iter to;
-	struct bio_vec bv = {.bv_page = page, .bv_len = PAGE_SIZE};
-
-	iov_iter_bvec(&to, ITER_BVEC | READ, &bv, 1, PAGE_SIZE);
-
-	gossip_debug(GOSSIP_INODE_DEBUG,
-		    "orangefs_readpage called with page %p\n",
-		     page);
-
-	max_block = ((inode->i_size / blocksize) + 1);
-
-	if (page->index < max_block) {
-		loff_t blockptr_offset = (((loff_t) page->index) << blockbits);
-
-		bytes_read = orangefs_inode_read(inode,
-						 &to,
-						 &blockptr_offset,
-						 inode->i_size);
-	}
+	struct iov_iter iter;
+	struct bio_vec bv;
+	ssize_t ret;
+	loff_t off;
+
+	off = page_offset(page);
+	bv.bv_page = page;
+	bv.bv_len = PAGE_SIZE;
+	bv.bv_offset = 0;
+	iov_iter_bvec(&iter, ITER_BVEC | READ, &bv, 1, PAGE_SIZE);
+
+	ret = wait_for_direct_io(ORANGEFS_IO_READ, inode, &off, &iter,
+	    PAGE_SIZE, inode->i_size);
 	/* this will only zero remaining unread portions of the page data */
-	iov_iter_zero(~0U, &to);
+	iov_iter_zero(~0U, &iter);
 	/* takes care of potential aliasing */
 	flush_dcache_page(page);
-	if (bytes_read < 0) {
-		ret = bytes_read;
+	if (ret < 0) {
 		SetPageError(page);
 	} else {
 		SetPageUptodate(page);
@@ -83,22 +71,17 @@ static int orangefs_releasepage(struct page *page, gfp_t foo)
 	return 0;
 }
 
-/*
- * Having a direct_IO entry point in the address_space_operations
- * struct causes the kernel to allows us to use O_DIRECT on
- * open. Nothing will ever call this thing, but in the future we
- * will need to be able to use O_DIRECT on open in order to support
- * AIO. Modeled after NFS, they do this too.
- */
-
 static ssize_t orangefs_direct_IO(struct kiocb *iocb,
 				  struct iov_iter *iter)
 {
-	gossip_debug(GOSSIP_INODE_DEBUG,
-		     "orangefs_direct_IO: %pD\n",
-		     iocb->ki_filp);
-
-	return -EINVAL;
+	struct file *file = iocb->ki_filp;
+	loff_t pos = *(&iocb->ki_pos);
+	/*
+	 * This cannot happen until write_iter becomes
+	 * generic_file_write_iter.
+	 */
+	BUG_ON(iov_iter_rw(iter) != READ);
+	return do_readv_writev(ORANGEFS_IO_READ, file, &pos, iter);
 }
 
 /** ORANGEFS2 implementation of address space operations */
diff --git a/fs/orangefs/orangefs-kernel.h b/fs/orangefs/orangefs-kernel.h
index 525656d928d4..aa839cde2a9b 100644
--- a/fs/orangefs/orangefs-kernel.h
+++ b/fs/orangefs/orangefs-kernel.h
@@ -370,11 +370,6 @@ ssize_t orangefs_listxattr(struct dentry *dentry, char *buffer, size_t size);
 struct inode *orangefs_iget(struct super_block *sb,
 			 struct orangefs_object_kref *ref);
 
-ssize_t orangefs_inode_read(struct inode *inode,
-			    struct iov_iter *iter,
-			    loff_t *offset,
-			    loff_t readahead_size);
-
 /*
  * defined in devorangefs-req.c
  */
@@ -385,6 +380,14 @@ void orangefs_dev_cleanup(void);
 int is_daemon_in_service(void);
 bool __is_daemon_in_service(void);
 
+/*
+ * defined in file.c
+ */
+ssize_t wait_for_direct_io(enum ORANGEFS_io_type, struct inode *, loff_t *,
+    struct iov_iter *, size_t, loff_t);
+ssize_t do_readv_writev(enum ORANGEFS_io_type, struct file *, loff_t *,
+    struct iov_iter *);
+
 /*
  * defined in orangefs-utils.c
  */
-- 
2.16.2


* [PATCH 20/24] orangefs: implement writepage
  2018-03-20 17:02 [PATCH 00/24] orangefs: page cache Martin Brandenburg
                   ` (18 preceding siblings ...)
  2018-03-20 17:02 ` [PATCH 19/24] orangefs: migrate to generic_file_read_iter Martin Brandenburg
@ 2018-03-20 17:02 ` Martin Brandenburg
  2018-03-20 17:02 ` [PATCH 21/24] orangefs: skip inode writeout if nothing to write Martin Brandenburg
                   ` (3 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Martin Brandenburg @ 2018-03-20 17:02 UTC (permalink / raw)
  To: hubcap, linux-fsdevel; +Cc: Martin Brandenburg

Now orangefs_inode_getattr fills from cache if an inode has dirty pages.

Also, if attr_valid is set, the inode has dirty pages, and !flags, we
spin on inode writeback before returning if the pages are still dirty
afterward.  Should it be the other way around?

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
---
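Editorial note, not part of the commit: the cache decision described
above can be sketched in user space as below.  struct inode_state,
enum getattr_action and getattr_decide() are hypothetical names for
illustration; cache_fresh stands in for
time_before(jiffies, getattr_time) and dirty_pages for
inode->i_state & I_DIRTY_PAGES.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes of the check at the top of getattr. */
enum getattr_action {
	GETATTR_FLUSH_AND_RETRY, /* write_inode_now(), then check again */
	GETATTR_USE_CACHE,       /* return 0 without asking the server */
	GETATTR_ASK_SERVER       /* issue the getattr upcall */
};

/* Hypothetical snapshot of the state examined under i_lock. */
struct inode_state {
	bool attr_valid;  /* locally changed attributes not yet written */
	bool dirty_pages; /* inode has dirty pages in the page cache */
	bool cache_fresh; /* still within the attribute cache time */
};

static enum getattr_action getattr_decide(const struct inode_state *st,
    int flags)
{
	assert(st);
	/* Pending attribute changes are written out first. */
	if (st->attr_valid)
		return GETATTR_FLUSH_AND_RETRY;
	/* Dirty pages mean the cached attributes are used as they are. */
	if (st->dirty_pages)
		return GETATTR_USE_CACHE;
	/* Otherwise the cache only counts if no specific flags were asked. */
	if (!flags && st->cache_fresh)
		return GETATTR_USE_CACHE;
	return GETATTR_ASK_SERVER;
}
```

Note that in this sketch dirty pages short-circuit the request even
when flags is nonzero, which is what the hunks in orangefs-utils.c
below do.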
 fs/orangefs/file.c           | 77 +++++++++-----------------------------------
 fs/orangefs/inode.c          | 59 +++++++++++++++++++++++++++++----
 fs/orangefs/orangefs-utils.c | 12 +++++--
 3 files changed, 78 insertions(+), 70 deletions(-)

diff --git a/fs/orangefs/file.c b/fs/orangefs/file.c
index 708ccc4d0691..2aec109e3574 100644
--- a/fs/orangefs/file.c
+++ b/fs/orangefs/file.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 /*
  * (C) 2001 Clemson University and The University of Chicago
+ * Copyright 2018 Omnibond Systems, L.L.C.
  *
  * See COPYING in top-level directory.
  */
@@ -342,65 +343,11 @@ static ssize_t orangefs_file_read_iter(struct kiocb *iocb,
 	return generic_file_read_iter(iocb, iter);
 }
 
-static ssize_t orangefs_file_write_iter(struct kiocb *iocb, struct iov_iter *iter)
+static ssize_t orangefs_file_write_iter(struct kiocb *iocb,
+    struct iov_iter *iter)
 {
-	struct file *file = iocb->ki_filp;
-	loff_t pos;
-	ssize_t rc;
-
-	BUG_ON(iocb->private);
-
-	truncate_inode_pages(file->f_mapping, 0);
-
-	gossip_debug(GOSSIP_FILE_DEBUG, "orangefs_file_write_iter\n");
-
-	inode_lock(file->f_mapping->host);
-
-	/* Make sure generic_write_checks sees an up to date inode size. */
-	if (file->f_flags & O_APPEND) {
-		rc = orangefs_inode_getattr(file->f_mapping->host,
-		    ORANGEFS_GETATTR_SIZE);
-		if (rc == -ESTALE)
-			rc = -EIO;
-		if (rc) {
-			gossip_err("%s: orangefs_inode_getattr failed, "
-			    "rc:%zd:.\n", __func__, rc);
-			goto out;
-		}
-	}
-
-	rc = generic_write_checks(iocb, iter);
-
-	if (rc <= 0) {
-		gossip_err("%s: generic_write_checks failed, rc:%zd:.\n",
-			   __func__, rc);
-		goto out;
-	}
-
-	/*
-	 * if we are appending, generic_write_checks would have updated
-	 * pos to the end of the file, so we will wait till now to set
-	 * pos...
-	 */
-	pos = iocb->ki_pos;
-
-	rc = do_readv_writev(ORANGEFS_IO_WRITE,
-			     file,
-			     &pos,
-			     iter);
-	if (rc < 0) {
-		gossip_err("%s: do_readv_writev failed, rc:%zd:.\n",
-			   __func__, rc);
-		goto out;
-	}
-
-	iocb->ki_pos = pos;
 	orangefs_stats.writes++;
-
-out:
-
-	inode_unlock(file->f_mapping->host);
-	return rc;
+	return generic_file_write_iter(iocb, iter);
 }
 
 /*
@@ -495,9 +442,6 @@ static int orangefs_file_mmap(struct file *file, struct vm_area_struct *vma)
 			(char *)file->f_path.dentry->d_name.name :
 			(char *)"Unknown"));
 
-	if ((vma->vm_flags & VM_SHARED) && (vma->vm_flags & VM_MAYWRITE))
-		return -EINVAL;
-
 	/* set the sequential readahead hint */
 	vma->vm_flags |= VM_SEQ_READ;
 	vma->vm_flags &= ~VM_RAND_READ;
@@ -537,8 +481,6 @@ static int orangefs_file_release(struct inode *inode, struct file *file)
 			gossip_debug(GOSSIP_INODE_DEBUG,
 			    "flush_racache finished\n");
 		}
-		truncate_inode_pages(file_inode(file)->i_mapping,
-				     0);
 	}
 	return 0;
 }
@@ -556,6 +498,11 @@ static int orangefs_fsync(struct file *file,
 		ORANGEFS_I(file_inode(file));
 	struct orangefs_kernel_op_s *new_op = NULL;
 
+	ret = filemap_write_and_wait_range(file_inode(file)->i_mapping,
+	    start, end);
+	if (ret)
+		return ret;
+
 	new_op = op_alloc(ORANGEFS_VFS_OP_FSYNC);
 	if (!new_op)
 		return -ENOMEM;
@@ -636,6 +583,11 @@ static int orangefs_lock(struct file *filp, int cmd, struct file_lock *fl)
 	return rc;
 }
 
+int orangefs_flush(struct file *file, fl_owner_t id)
+{
+	return vfs_fsync(file, 0);
+}
+
 /** ORANGEFS implementation of VFS file operations */
 const struct file_operations orangefs_file_operations = {
 	.llseek		= orangefs_file_llseek,
@@ -645,6 +597,7 @@ const struct file_operations orangefs_file_operations = {
 	.unlocked_ioctl	= orangefs_ioctl,
 	.mmap		= orangefs_file_mmap,
 	.open		= generic_file_open,
+	.flush		= orangefs_flush,
 	.release	= orangefs_file_release,
 	.fsync		= orangefs_fsync,
 };
diff --git a/fs/orangefs/inode.c b/fs/orangefs/inode.c
index 81b8ef565f88..b9b9d659a3e1 100644
--- a/fs/orangefs/inode.c
+++ b/fs/orangefs/inode.c
@@ -14,6 +14,44 @@
 #include "orangefs-kernel.h"
 #include "orangefs-bufmap.h"
 
+static int orangefs_writepage(struct page *page, struct writeback_control *wbc)
+{
+	struct inode *inode = page->mapping->host;
+	struct iov_iter iter;
+	struct bio_vec bv;
+	size_t len, wlen;
+	ssize_t ret;
+	loff_t off;
+
+	set_page_writeback(page);
+
+	off = page_offset(page);
+	len = i_size_read(inode);
+	if (off + PAGE_SIZE > len)
+		wlen = len - off;
+	else
+		wlen = PAGE_SIZE;
+
+	bv.bv_page = page;
+	bv.bv_len = wlen;
+	bv.bv_offset = 0;
+	if (wlen == 0)
+		dump_stack();
+	iov_iter_bvec(&iter, ITER_BVEC | WRITE, &bv, 1, wlen);
+
+	ret = wait_for_direct_io(ORANGEFS_IO_WRITE, inode, &off, &iter, wlen,
+	    len);
+	if (ret < 0) {
+		SetPageError(page);
+		mapping_set_error(page->mapping, ret);
+	} else {
+		ret = 0;
+	}
+	end_page_writeback(page);
+	unlock_page(page);
+	return ret;
+}
+
 static int orangefs_readpage(struct file *file, struct page *page)
 {
 	struct inode *inode = page->mapping->host;
@@ -47,6 +85,15 @@ static int orangefs_readpage(struct file *file, struct page *page)
 	return ret;
 }
 
+int orangefs_write_end(struct file *file, struct address_space *mapping,
+    loff_t pos, unsigned len, unsigned copied, struct page *page, void *fsdata)
+{
+	int r;
+	r = simple_write_end(file, mapping, pos, len, copied, page, fsdata);
+	mark_inode_dirty_sync(file_inode(file));
+	return r;
+}
+
 static void orangefs_invalidatepage(struct page *page,
 				 unsigned int offset,
 				 unsigned int length)
@@ -76,17 +123,17 @@ static ssize_t orangefs_direct_IO(struct kiocb *iocb,
 {
 	struct file *file = iocb->ki_filp;
 	loff_t pos = *(&iocb->ki_pos);
-	/*
-	 * This cannot happen until write_iter becomes
-	 * generic_file_write_iter.
-	 */
-	BUG_ON(iov_iter_rw(iter) != READ);
-	return do_readv_writev(ORANGEFS_IO_READ, file, &pos, iter);
+	return do_readv_writev(iov_iter_rw(iter) == WRITE ?
+	    ORANGEFS_IO_WRITE : ORANGEFS_IO_READ, file, &pos, iter);
 }
 
 /** ORANGEFS2 implementation of address space operations */
 static const struct address_space_operations orangefs_address_operations = {
+	.writepage = orangefs_writepage,
 	.readpage = orangefs_readpage,
+	.set_page_dirty = __set_page_dirty_nobuffers,
+	.write_begin = simple_write_begin,
+	.write_end = orangefs_write_end,
 	.invalidatepage = orangefs_invalidatepage,
 	.releasepage = orangefs_releasepage,
 	.direct_IO = orangefs_direct_IO,
diff --git a/fs/orangefs/orangefs-utils.c b/fs/orangefs/orangefs-utils.c
index 0e52208726fd..e12ac6ae9894 100644
--- a/fs/orangefs/orangefs-utils.c
+++ b/fs/orangefs/orangefs-utils.c
@@ -245,12 +245,16 @@ int orangefs_inode_getattr(struct inode *inode, int flags)
 	spin_lock(&inode->i_lock);
 	/* Must have all the attributes in the mask and be within cache time. */
 	if ((!flags && time_before(jiffies, orangefs_inode->getattr_time)) ||
-	    orangefs_inode->attr_valid) {
+	    orangefs_inode->attr_valid || inode->i_state & I_DIRTY_PAGES) {
 		if (orangefs_inode->attr_valid) {
 			spin_unlock(&inode->i_lock);
 			write_inode_now(inode, 1);
 			goto again;
 		}
+		if (inode->i_state & I_DIRTY_PAGES) {
+			spin_unlock(&inode->i_lock);
+			return 0;
+		}
 		spin_unlock(&inode->i_lock);
 		return 0;
 	}
@@ -279,12 +283,16 @@ int orangefs_inode_getattr(struct inode *inode, int flags)
 	spin_lock(&inode->i_lock);
 	/* Must have all the attributes in the mask and be within cache time. */
 	if ((!flags && time_before(jiffies, orangefs_inode->getattr_time)) ||
-	    orangefs_inode->attr_valid) {
+	    orangefs_inode->attr_valid || inode->i_state & I_DIRTY_PAGES) {
 		if (orangefs_inode->attr_valid) {
 			spin_unlock(&inode->i_lock);
 			write_inode_now(inode, 1);
 			goto again2;
 		}
+		if (inode->i_state & I_DIRTY_PAGES) {
+			spin_unlock(&inode->i_lock);
+			return 0;
+		}
 		gossip_debug(GOSSIP_UTILS_DEBUG, "%s: in cache or dirty\n",
 		    __func__);
 		ret = 0;
-- 
2.16.2


* [PATCH 21/24] orangefs: skip inode writeout if nothing to write
  2018-03-20 17:02 [PATCH 00/24] orangefs: page cache Martin Brandenburg
                   ` (19 preceding siblings ...)
  2018-03-20 17:02 ` [PATCH 20/24] orangefs: implement writepage Martin Brandenburg
@ 2018-03-20 17:02 ` Martin Brandenburg
  2018-03-20 17:02 ` [PATCH 22/24] orangefs: write range tracking Martin Brandenburg
                   ` (2 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Martin Brandenburg @ 2018-03-20 17:02 UTC (permalink / raw)
  To: hubcap, linux-fsdevel; +Cc: Martin Brandenburg

This would happen if an inode is dirty but the change that dirtied it
is not something that can be written out to OrangeFS.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
---
 fs/orangefs/orangefs-utils.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/fs/orangefs/orangefs-utils.c b/fs/orangefs/orangefs-utils.c
index e12ac6ae9894..4d079635fa4b 100644
--- a/fs/orangefs/orangefs-utils.c
+++ b/fs/orangefs/orangefs-utils.c
@@ -436,6 +436,11 @@ int orangefs_inode_setattr(struct inode *inode)
 	copy_attributes_from_inode(inode,
 	    &new_op->upcall.req.setattr.attributes);
 	orangefs_inode->attr_valid = 0;
+	if (!new_op->upcall.req.setattr.attributes.mask) {
+		spin_unlock(&inode->i_lock);
+		op_release(new_op);
+		return 0;
+	}
 	spin_unlock(&inode->i_lock);
 
 	ret = service_operation(new_op,
-- 
2.16.2


* [PATCH 22/24] orangefs: write range tracking
  2018-03-20 17:02 [PATCH 00/24] orangefs: page cache Martin Brandenburg
                   ` (20 preceding siblings ...)
  2018-03-20 17:02 ` [PATCH 21/24] orangefs: skip inode writeout if nothing to write Martin Brandenburg
@ 2018-03-20 17:02 ` Martin Brandenburg
  2018-03-20 17:02 ` [PATCH 23/24] orangefs: tracepoints for readpage and writeback Martin Brandenburg
  2018-03-20 17:02 ` [PATCH 24/24] orangefs: tracepoints for getattr, setattr, and write_inode Martin Brandenburg
  23 siblings, 0 replies; 25+ messages in thread
From: Martin Brandenburg @ 2018-03-20 17:02 UTC (permalink / raw)
  To: hubcap, linux-fsdevel; +Cc: Martin Brandenburg

This is necessary to ensure the uid/gid responsible for the write is
communicated to the server.  Only one uid/gid may have outstanding
changes at a time.  If another uid/gid writes while there are
outstanding changes, the changes must be written out before the new
data is put into the page.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
---
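Editorial note, not part of the commit: the rule described above can
be sketched in user space as below.  struct write_range,
must_flush_first() and merge_range() are hypothetical stand-ins for
struct orangefs_write_request and the helpers in this patch, and plain
unsigned integers are assumed in place of kuid_t/kgid_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-page pending write record. */
struct write_range {
	long long pos; /* start of the pending range, in bytes */
	size_t len;    /* length of the pending range, in bytes */
	unsigned uid;  /* writer responsible for the pending range */
	unsigned gid;
};

/*
 * True if the pending range must be written out before a new write of
 * [pos, pos + len) by uid/gid may touch the page: either the two
 * ranges leave a gap between them, or the writer is different.
 */
static bool must_flush_first(const struct write_range *wr, long long pos,
    size_t len, unsigned uid, unsigned gid)
{
	assert(wr);
	return pos + (long long)len < wr->pos ||
	    wr->pos + (long long)wr->len < pos ||
	    uid != wr->uid || gid != wr->gid;
}

/* Grow the pending range to the union of itself and [pos, pos + len). */
static void merge_range(struct write_range *wr, long long pos, size_t len)
{
	long long end = wr->pos + (long long)wr->len;
	long long new_end = pos + (long long)len;

	if (pos < wr->pos)
		wr->pos = pos;
	if (new_end > end)
		end = new_end;
	wr->len = (size_t)(end - wr->pos);
}
```

A caller would test must_flush_first() and, only if it returns false,
call merge_range().  Ranges that merely touch (one ends where the other
begins) count as contiguous here, as in do_writepage_if_necessary.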
 fs/orangefs/file.c            |  12 ++-
 fs/orangefs/inode.c           | 243 +++++++++++++++++++++++++++++++++++++-----
 fs/orangefs/orangefs-kernel.h |  12 ++-
 3 files changed, 237 insertions(+), 30 deletions(-)

diff --git a/fs/orangefs/file.c b/fs/orangefs/file.c
index 2aec109e3574..1bbc40af67ee 100644
--- a/fs/orangefs/file.c
+++ b/fs/orangefs/file.c
@@ -44,8 +44,8 @@ static int flush_racache(struct inode *inode)
  * Post and wait for the I/O upcall to finish
  */
 ssize_t wait_for_direct_io(enum ORANGEFS_io_type type, struct inode *inode,
-		loff_t *offset, struct iov_iter *iter,
-		size_t total_size, loff_t readahead_size)
+    loff_t *offset, struct iov_iter *iter, size_t total_size,
+    loff_t readahead_size, struct orangefs_write_request *wr)
 {
 	struct orangefs_inode_s *orangefs_inode = ORANGEFS_I(inode);
 	struct orangefs_khandle *handle = &orangefs_inode->refn.khandle;
@@ -101,6 +101,10 @@ ssize_t wait_for_direct_io(enum ORANGEFS_io_type type, struct inode *inode,
 			    __func__, (long)ret);
 			goto out;
 		}
+		if (wr) {
+			new_op->upcall.uid = from_kuid(&init_user_ns, wr->uid);
+			new_op->upcall.gid = from_kgid(&init_user_ns, wr->gid);
+		}
 	}
 
 	gossip_debug(GOSSIP_FILE_DEBUG,
@@ -286,7 +290,7 @@ ssize_t do_readv_writev(enum ORANGEFS_io_type type, struct file *file,
 			     (int)*offset);
 
 		ret = wait_for_direct_io(type, inode, offset, iter,
-				each_count, 0);
+				each_count, 0, NULL);
 		gossip_debug(GOSSIP_FILE_DEBUG,
 			     "%s(%pU): return from wait_for_io:%d\n",
 			     __func__,
@@ -428,7 +432,7 @@ static int orangefs_fault(struct vm_fault *vmf)
 const struct vm_operations_struct orangefs_file_vm_ops = {
 	.fault = orangefs_fault,
 	.map_pages = filemap_map_pages,
-	.page_mkwrite = filemap_page_mkwrite,
+	.page_mkwrite = orangefs_page_mkwrite,
 };
 
 /*
diff --git a/fs/orangefs/inode.c b/fs/orangefs/inode.c
index b9b9d659a3e1..aae9dc91f836 100644
--- a/fs/orangefs/inode.c
+++ b/fs/orangefs/inode.c
@@ -14,9 +14,11 @@
 #include "orangefs-kernel.h"
 #include "orangefs-bufmap.h"
 
-static int orangefs_writepage(struct page *page, struct writeback_control *wbc)
+static int orangefs_writepage_locked(struct page *page,
+    struct writeback_control *wbc)
 {
 	struct inode *inode = page->mapping->host;
+	struct orangefs_write_request *wr;
 	struct iov_iter iter;
 	struct bio_vec bv;
 	size_t len, wlen;
@@ -25,33 +27,175 @@ static int orangefs_writepage(struct page *page, struct writeback_control *wbc)
 
 	set_page_writeback(page);
 
-	off = page_offset(page);
-	len = i_size_read(inode);
-	if (off + PAGE_SIZE > len)
-		wlen = len - off;
-	else
-		wlen = PAGE_SIZE;
+	if (PagePrivate(page)) {
+		wr = (struct orangefs_write_request *)page_private(page);
+		BUG_ON(!wr);
+		if (wr->mwrite) {
+			off = page_offset(page);
+			len = i_size_read(inode);
+			if (off + PAGE_SIZE > len)
+				wlen = len - off;
+			else
+				wlen = PAGE_SIZE;
+		} else {
+			off = wr->pos;
+			wlen = wr->len;
+			len = i_size_read(inode);
+		}
+	} else {
+/*		BUG();*/
+		/* It's not private so there's nothing to write, right? */
+		printk("writepage not private!\n");
+		end_page_writeback(page);
+		return 0;
+
+	}
 
 	bv.bv_page = page;
 	bv.bv_len = wlen;
 	bv.bv_offset = 0;
-	if (wlen == 0)
-		dump_stack();
 	iov_iter_bvec(&iter, ITER_BVEC | WRITE, &bv, 1, wlen);
 
 	ret = wait_for_direct_io(ORANGEFS_IO_WRITE, inode, &off, &iter, wlen,
-	    len);
+	    len, wr);
 	if (ret < 0) {
 		SetPageError(page);
 		mapping_set_error(page->mapping, ret);
 	} else {
 		ret = 0;
+		if (wr) {
+			ClearPagePrivate(page);
+			kfree(wr);
+		}
 	}
 	end_page_writeback(page);
-	unlock_page(page);
 	return ret;
 }
 
+static int do_writepage_if_necessary(struct page *page, loff_t pos,
+    unsigned len)
+{
+	struct orangefs_write_request *wr;
+	struct writeback_control wbc = {
+		.sync_mode = WB_SYNC_ALL,
+		.nr_to_write = 0,
+	};
+	int r;
+	if (PagePrivate(page)) {
+		wr = (struct orangefs_write_request *)page_private(page);
+		BUG_ON(!wr);
+		/*
+		 * If the new request is not contiguous with the last one or if
+		 * the uid or gid is different, the page must be written out
+		 * before continuing.
+		 */
+		if (pos + len < wr->pos || wr->pos + wr->len < pos ||
+		    !uid_eq(current_fsuid(), wr->uid) ||
+		    !gid_eq(current_fsgid(), wr->gid)) {
+			wbc.range_start = page_file_offset(page);
+			wbc.range_end = wbc.range_start + PAGE_SIZE - 1;
+			wait_on_page_writeback(page);
+			if (clear_page_dirty_for_io(page)) {
+				r = orangefs_writepage_locked(page, &wbc);
+				if (r)
+					return r;
+			}
+			BUG_ON(PagePrivate(page));
+		}
+	}
+	return 0;
+}
+
+static int update_wr(struct page *page, loff_t pos, unsigned len, int mwrite)
+{
+	struct orangefs_write_request *wr;
+	if (PagePrivate(page)) {
+		wr = (struct orangefs_write_request *)page_private(page);
+		BUG_ON(!wr);
+		if (mwrite) {
+			wr->mwrite = 1;
+			return 0;
+		}
+		if (pos < wr->pos) {
+			wr->len += wr->pos - pos;
+			wr->pos = pos;
+		}
+		if (pos + len > wr->pos + wr->len)
+			wr->len = pos + len - wr->pos;
+		else
+			wr->len = wr->pos + wr->len - wr->pos;
+	} else {
+		wr = kmalloc(sizeof *wr, GFP_KERNEL);
+		if (wr) {
+			wr->pos = pos;
+			wr->len = len;
+			wr->uid = current_fsuid();
+			wr->gid = current_fsgid();
+			wr->mwrite = mwrite;
+			SetPagePrivate(page);
+			set_page_private(page, (unsigned long)wr);
+		} else {
+			return -ENOMEM;
+		}
+	}
+	return 0;
+}
+
+int orangefs_page_mkwrite(struct vm_fault *vmf)
+{
+	struct page *page = vmf->page;
+	struct inode *inode = file_inode(vmf->vma->vm_file);
+	unsigned len;
+	int r;
+
+	/* Do not write past the file size. */
+	len = i_size_read(inode) - page_file_offset(page);
+	if (len > PAGE_SIZE)
+		len = PAGE_SIZE;
+
+	lock_page(page);
+	r = do_writepage_if_necessary(page, page_file_offset(page),
+	    len);
+	if (r) {
+		r = VM_FAULT_RETRY;
+		unlock_page(vmf->page);
+		return r;
+	}
+	r = update_wr(page, page_file_offset(page), len, 1);
+	if (r) {
+		r = VM_FAULT_RETRY;
+		unlock_page(vmf->page);
+		return r;
+	}
+
+	r = VM_FAULT_LOCKED;
+	sb_start_pagefault(inode->i_sb);
+	file_update_time(vmf->vma->vm_file);
+	if (page->mapping != inode->i_mapping) {
+		unlock_page(page);
+		r = VM_FAULT_NOPAGE;
+		goto out;
+	}
+	/*
+	 * We mark the page dirty already here so that when freeze is in
+	 * progress, we are guaranteed that writeback during freezing will
+	 * see the dirty page and writeprotect it again.
+	 */
+	set_page_dirty(page);
+	wait_for_stable_page(page);
+out:
+	sb_end_pagefault(inode->i_sb);
+	return r;
+}
+
+static int orangefs_writepage(struct page *page, struct writeback_control *wbc)
+{
+	int r;
+	r = orangefs_writepage_locked(page, wbc);
+	unlock_page(page);
+	return r;
+}
+
 static int orangefs_readpage(struct file *file, struct page *page)
 {
 	struct inode *inode = page->mapping->host;
@@ -67,7 +211,7 @@ static int orangefs_readpage(struct file *file, struct page *page)
 	iov_iter_bvec(&iter, ITER_BVEC | READ, &bv, 1, PAGE_SIZE);
 
 	ret = wait_for_direct_io(ORANGEFS_IO_READ, inode, &off, &iter,
-	    PAGE_SIZE, inode->i_size);
+	    PAGE_SIZE, inode->i_size, NULL);
 	/* this will only zero remaining unread portions of the page data */
 	iov_iter_zero(~0U, &iter);
 	/* takes care of potential aliasing */
@@ -85,10 +229,26 @@ static int orangefs_readpage(struct file *file, struct page *page)
 	return ret;
 }
 
+static int orangefs_write_begin(struct file *file,
+    struct address_space *mapping, loff_t pos, unsigned len, unsigned flags,
+    struct page **pagep, void **fsdata)
+{
+	int r;
+	r = simple_write_begin(file, mapping, pos, len, flags, pagep, fsdata);
+	if (r)
+		return r;
+	r = do_writepage_if_necessary(*pagep, pos, len);
+	if (r)
+		unlock_page(*pagep);
+	return r;
+}
+
 int orangefs_write_end(struct file *file, struct address_space *mapping,
     loff_t pos, unsigned len, unsigned copied, struct page *page, void *fsdata)
 {
 	int r;
+	if (update_wr(page, pos, len, 0))
+		return -ENOMEM;
 	r = simple_write_end(file, mapping, pos, len, copied, page, fsdata);
 	mark_inode_dirty_sync(file_inode(file));
 	return r;
@@ -98,24 +258,57 @@ static void orangefs_invalidatepage(struct page *page,
 				 unsigned int offset,
 				 unsigned int length)
 {
-	gossip_debug(GOSSIP_INODE_DEBUG,
-		     "orangefs_invalidatepage called on page %p "
-		     "(offset is %u)\n",
-		     page,
-		     offset);
-
-	ClearPageUptodate(page);
-	ClearPageMappedToDisk(page);
+	struct orangefs_write_request *wr;
+	/* XXX move to releasepage and call + rebase */
+	struct writeback_control wbc = {
+		.sync_mode = WB_SYNC_ALL,
+		.nr_to_write = 0,
+	};
+	int r;
+	if (PagePrivate(page)) {
+		wr = (struct orangefs_write_request *)page_private(page);
+		BUG_ON(!wr);
+/* XXX prove */
+		if (offset == 0 && length == PAGE_SIZE) {
+			ClearPagePrivate(page);
+			kfree(wr);
+		} else if (wr->pos - page_offset(page) < offset &&
+		    wr->pos - page_offset(page) + wr->len > offset + length) {
+			wbc.range_start = page_file_offset(page);
+			wbc.range_end = wbc.range_start + PAGE_SIZE - 1;
+			wait_on_page_writeback(page);
+			if (clear_page_dirty_for_io(page)) {
+				r = orangefs_writepage_locked(page, &wbc);
+				if (r)
+					return;
+			} else {
+				ClearPagePrivate(page);
+				kfree(wr);
+			}
+		} else if (wr->pos - page_offset(page) < offset &&
+		    wr->pos - page_offset(page) + wr->len <= offset + length) {
+			wr->len = offset;
+		} else if (wr->pos - page_offset(page) >= offset &&
+		    wr->pos - page_offset(page) + wr->len > offset + length) {
+			wr->pos += length - wr->pos + page_offset(page);
+			wr->len -= length - wr->pos + page_offset(page);
+		} else {
+			/*
+			 * Invalidate range is bigger than write range but
+			 * entire write range is to be invalidated.
+			 */
+			ClearPagePrivate(page);
+			kfree(wr);
+		}
+	}
 	return;
 
 }
 
 static int orangefs_releasepage(struct page *page, gfp_t foo)
 {
-	gossip_debug(GOSSIP_INODE_DEBUG,
-		     "orangefs_releasepage called on page %p\n",
-		     page);
-	return 0;
+	BUG();
+	return !PagePrivate(page);
 }
 
 static ssize_t orangefs_direct_IO(struct kiocb *iocb,
@@ -132,7 +325,7 @@ static const struct address_space_operations orangefs_address_operations = {
 	.writepage = orangefs_writepage,
 	.readpage = orangefs_readpage,
 	.set_page_dirty = __set_page_dirty_nobuffers,
-	.write_begin = simple_write_begin,
+	.write_begin = orangefs_write_begin,
 	.write_end = orangefs_write_end,
 	.invalidatepage = orangefs_invalidatepage,
 	.releasepage = orangefs_releasepage,
diff --git a/fs/orangefs/orangefs-kernel.h b/fs/orangefs/orangefs-kernel.h
index aa839cde2a9b..e0656bd9a87c 100644
--- a/fs/orangefs/orangefs-kernel.h
+++ b/fs/orangefs/orangefs-kernel.h
@@ -179,6 +179,14 @@ static inline void set_op_state_purged(struct orangefs_kernel_op_s *op)
 	}
 }
 
+struct orangefs_write_request {
+	loff_t pos;
+	unsigned len;
+	kuid_t uid;
+	kgid_t gid;
+	int mwrite;
+};
+
 /* per inode private orangefs info */
 struct orangefs_inode_s {
 	struct orangefs_object_kref refn;
@@ -343,6 +351,8 @@ void fsid_key_table_finalize(void);
 /*
  * defined in inode.c
  */
+int orangefs_page_mkwrite(struct vm_fault *);
+
 struct inode *orangefs_new_inode(struct super_block *sb,
 			      struct inode *dir,
 			      int mode,
@@ -384,7 +394,7 @@ bool __is_daemon_in_service(void);
  * defined in file.c
  */
 ssize_t wait_for_direct_io(enum ORANGEFS_io_type, struct inode *, loff_t *,
-    struct iov_iter *, size_t, loff_t);
+    struct iov_iter *, size_t, loff_t, struct orangefs_write_request *);
 ssize_t do_readv_writev(enum ORANGEFS_io_type, struct file *, loff_t *,
     struct iov_iter *);
 
-- 
2.16.2
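
The range tracking done by do_writepage_if_necessary() and update_wr() in
the patch above can be sketched in user space. This is a minimal sketch
and not the kernel code: struct write_range, range_is_noncontiguous() and
range_merge() are hypothetical names, and the merge shows the behavior the
patch appears to intend (grow the tracked range to cover both writes).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for struct orangefs_write_request. */
struct write_range {
	int64_t pos;	/* file offset of the first dirty byte */
	uint32_t len;	/* number of dirty bytes */
};

/*
 * True when [pos, pos + len) neither overlaps nor touches the tracked
 * range. Same comparison as in do_writepage_if_necessary().
 */
static bool range_is_noncontiguous(const struct write_range *wr,
    int64_t pos, uint32_t len)
{
	return pos + len < wr->pos || wr->pos + wr->len < pos;
}

/* Grow the tracked range to the smallest range covering both writes. */
static void range_merge(struct write_range *wr, int64_t pos, uint32_t len)
{
	int64_t end = wr->pos + wr->len;

	if (pos + len > end)
		end = pos + len;
	if (pos < wr->pos)
		wr->pos = pos;
	wr->len = (uint32_t)(end - wr->pos);
}
```

A write that starts exactly where the tracked range ends counts as
contiguous, so it is merged rather than forcing the page out early.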


* [PATCH 23/24] orangefs: tracepoints for readpage and writeback
  2018-03-20 17:02 [PATCH 00/24] orangefs: page cache Martin Brandenburg
                   ` (21 preceding siblings ...)
  2018-03-20 17:02 ` [PATCH 22/24] orangefs: write range tracking Martin Brandenburg
@ 2018-03-20 17:02 ` Martin Brandenburg
  2018-03-20 17:02 ` [PATCH 24/24] orangefs: tracepoints for getattr, setattr, and write_inode Martin Brandenburg
  23 siblings, 0 replies; 25+ messages in thread
From: Martin Brandenburg @ 2018-03-20 17:02 UTC (permalink / raw)
  To: hubcap, linux-fsdevel; +Cc: Martin Brandenburg

trace_orangefs_writepage and trace_orangefs_readpage are self-explanatory.

trace_orangefs_early_writeback will be used to measure the cost of being
unable to cache, within a single page, writes made under different uids
or gids and writes that are not contiguous.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
---
 fs/orangefs/inode.c          | 11 +++++++++-
 fs/orangefs/orangefs-trace.h | 50 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/fs/orangefs/inode.c b/fs/orangefs/inode.c
index aae9dc91f836..8c67cdab2b12 100644
--- a/fs/orangefs/inode.c
+++ b/fs/orangefs/inode.c
@@ -13,6 +13,7 @@
 #include <linux/bvec.h>
 #include "orangefs-kernel.h"
 #include "orangefs-bufmap.h"
+#include "orangefs-trace.h"
 
 static int orangefs_writepage_locked(struct page *page,
     struct writeback_control *wbc)
@@ -51,6 +52,8 @@ static int orangefs_writepage_locked(struct page *page,
 
 	}
 
+	trace_orangefs_writepage(off, wlen, wr->mwrite);
+
 	bv.bv_page = page;
 	bv.bv_len = wlen;
 	bv.bv_offset = 0;
@@ -82,20 +85,24 @@ static int do_writepage_if_necessary(struct page *page, loff_t pos,
 	};
 	int r;
 	if (PagePrivate(page)) {
+		int noncontig;
 		wr = (struct orangefs_write_request *)page_private(page);
 		BUG_ON(!wr);
+		noncontig = pos + len < wr->pos || wr->pos + wr->len < pos;
 		/*
 		 * If the new request is not contiguous with the last one or if
 		 * the uid or gid is different, the page must be written out
 		 * before continuing.
 		 */
-		if (pos + len < wr->pos || wr->pos + wr->len < pos ||
+		if (noncontig ||
 		    !uid_eq(current_fsuid(), wr->uid) ||
 		    !gid_eq(current_fsgid(), wr->gid)) {
 			wbc.range_start = page_file_offset(page);
 			wbc.range_end = wbc.range_start + PAGE_SIZE - 1;
 			wait_on_page_writeback(page);
 			if (clear_page_dirty_for_io(page)) {
+				trace_orangefs_early_writeback(noncontig ?
+				    1 : 2);
 				r = orangefs_writepage_locked(page, &wbc);
 				if (r)
 					return r;
@@ -205,6 +212,7 @@ static int orangefs_readpage(struct file *file, struct page *page)
 	loff_t off;
 
 	off = page_offset(page);
+	trace_orangefs_readpage(off, PAGE_SIZE);
 	bv.bv_page = page;
 	bv.bv_len = PAGE_SIZE;
 	bv.bv_offset = 0;
@@ -278,6 +286,7 @@ static void orangefs_invalidatepage(struct page *page,
 			wbc.range_end = wbc.range_start + PAGE_SIZE - 1;
 			wait_on_page_writeback(page);
 			if (clear_page_dirty_for_io(page)) {
+				trace_orangefs_early_writeback(0);
 				r = orangefs_writepage_locked(page, &wbc);
 				if (r)
 					return;
diff --git a/fs/orangefs/orangefs-trace.h b/fs/orangefs/orangefs-trace.h
index 16e2b5a86071..faf09b26d9ba 100644
--- a/fs/orangefs/orangefs-trace.h
+++ b/fs/orangefs/orangefs-trace.h
@@ -63,6 +63,21 @@ TRACE_EVENT(orangefs_devreq_write_iter,
     )
 );
 
+TRACE_EVENT(orangefs_early_writeback,
+    TP_PROTO(int reason),
+    TP_ARGS(reason),
+    TP_STRUCT__entry(
+        __field(int, reason)
+    ),
+    TP_fast_assign(
+        __entry->reason = reason;
+    ),
+    TP_printk(
+        "%s", __entry->reason == 0 ? "invalidatepage" :
+            (__entry->reason == 1 ? "noncontiguous" : "uid/gid")
+    )
+);
+
 TRACE_EVENT(orangefs_service_operation,
     TP_PROTO(struct orangefs_kernel_op_s *op, int flags),
     TP_ARGS(op, flags),
@@ -82,6 +97,41 @@ TRACE_EVENT(orangefs_service_operation,
     )
 );
 
+TRACE_EVENT(orangefs_readpage,
+    TP_PROTO(loff_t off, size_t len),
+    TP_ARGS(off, len),
+    TP_STRUCT__entry(
+        __field(loff_t, off)
+        __field(size_t, len)
+    ),
+    TP_fast_assign(
+        __entry->off = off;
+        __entry->len = len;
+    ),
+    TP_printk(
+        "off=%lld len=%zu", __entry->off, __entry->len
+    )
+);
+
+TRACE_EVENT(orangefs_writepage,
+    TP_PROTO(loff_t off, size_t len, int mwrite),
+    TP_ARGS(off, len, mwrite),
+    TP_STRUCT__entry(
+        __field(loff_t, off)
+        __field(size_t, len)
+        __field(int, mwrite)
+    ),
+    TP_fast_assign(
+        __entry->off = off;
+        __entry->len = len;
+        __entry->mwrite = mwrite;
+    ),
+    TP_printk(
+        "off=%lld len=%zu mwrite=%s", __entry->off, __entry->len,
+            __entry->mwrite ? "yes" : "no"
+    )
+);
+
 #endif
 
 #undef TRACE_INCLUDE_PATH
-- 
2.16.2
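
The reason codes passed to trace_orangefs_early_writeback in the patch
above are bare integers. A minimal user-space sketch of the mapping
follows. The enum names and write_path_reason() are hypothetical and do
not appear in the patch; only the 0/1/2 values and the strings are taken
from it.

```c
#include <stdbool.h>

/* Hypothetical names for the integer reasons passed to the tracepoint. */
enum early_writeback_reason {
	EARLY_WB_INVALIDATEPAGE = 0,
	EARLY_WB_NONCONTIGUOUS = 1,
	EARLY_WB_UID_GID = 2,
};

/* Same mapping as the TP_printk() expression in orangefs_early_writeback. */
static const char *early_writeback_reason_str(int reason)
{
	return reason == 0 ? "invalidatepage" :
	    (reason == 1 ? "noncontiguous" : "uid/gid");
}

/*
 * Reason reported from do_writepage_if_necessary(). A noncontiguous write
 * takes precedence, so a write that is both noncontiguous and made under
 * a different uid or gid is counted as noncontiguous.
 */
static int write_path_reason(bool noncontig)
{
	return noncontig ? EARLY_WB_NONCONTIGUOUS : EARLY_WB_UID_GID;
}
```

Because of that precedence, the uid/gid count read from the trace is a
lower bound on how often ownership changes alone force an early writeback.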


* [PATCH 24/24] orangefs: tracepoints for getattr, setattr, and write_inode
  2018-03-20 17:02 [PATCH 00/24] orangefs: page cache Martin Brandenburg
                   ` (22 preceding siblings ...)
  2018-03-20 17:02 ` [PATCH 23/24] orangefs: tracepoints for readpage and writeback Martin Brandenburg
@ 2018-03-20 17:02 ` Martin Brandenburg
  23 siblings, 0 replies; 25+ messages in thread
From: Martin Brandenburg @ 2018-03-20 17:02 UTC (permalink / raw)
  To: hubcap, linux-fsdevel; +Cc: Martin Brandenburg

trace_orangefs_early_setattr will be used to determine the cost of early
attribute writes.

The other tracepoints are self-explanatory.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
---
 fs/orangefs/inode.c          |   5 +++
 fs/orangefs/orangefs-trace.h | 100 +++++++++++++++++++++++++++++++++++++++++++
 fs/orangefs/orangefs-utils.c |   6 ++-
 3 files changed, 110 insertions(+), 1 deletion(-)

diff --git a/fs/orangefs/inode.c b/fs/orangefs/inode.c
index 8c67cdab2b12..5286fa96dfc2 100644
--- a/fs/orangefs/inode.c
+++ b/fs/orangefs/inode.c
@@ -438,6 +438,8 @@ int __orangefs_setattr(struct inode *inode, struct iattr *iattr)
 			ORANGEFS_I(inode)->attr_valid = iattr->ia_valid;
 		} else {
 			spin_unlock(&inode->i_lock);
+			trace_orangefs_early_setattr(inode,
+			    ORANGEFS_I(inode)->attr_valid, iattr->ia_valid);
 			write_inode_now(inode, 1);
 			goto again;
 		}
@@ -446,6 +448,8 @@ int __orangefs_setattr(struct inode *inode, struct iattr *iattr)
 		ORANGEFS_I(inode)->attr_uid = current_fsuid();
 		ORANGEFS_I(inode)->attr_gid = current_fsgid();
 	}
+	trace_orangefs_setattr(inode, ORANGEFS_I(inode)->attr_valid,
+	    iattr->ia_valid);
 	setattr_copy(inode, iattr);
 	spin_unlock(&inode->i_lock);
 	mark_inode_dirty(inode);
@@ -492,6 +496,7 @@ int orangefs_getattr(const struct path *path, struct kstat *stat,
 	gossip_debug(GOSSIP_INODE_DEBUG,
 		     "orangefs_getattr: called on %pd mask %u\n",
 		     path->dentry, request_mask);
+	trace_orangefs_getattr(inode, request_mask);
 
 	ret = orangefs_inode_getattr(inode,
 	    request_mask & STATX_SIZE ? ORANGEFS_GETATTR_SIZE : 0);
diff --git a/fs/orangefs/orangefs-trace.h b/fs/orangefs/orangefs-trace.h
index faf09b26d9ba..76b37e18d133 100644
--- a/fs/orangefs/orangefs-trace.h
+++ b/fs/orangefs/orangefs-trace.h
@@ -63,6 +63,32 @@ TRACE_EVENT(orangefs_devreq_write_iter,
     )
 );
 
+TRACE_EVENT(orangefs_early_setattr,
+    TP_PROTO(struct inode *inode, int attr_valid, int ia_valid),
+    TP_ARGS(inode, attr_valid, ia_valid),
+    TP_STRUCT__entry(
+        __array(unsigned char, u, 16)
+        __field(__s32, fs_id)
+        __field(int, attr_valid)
+        __field(int, ia_valid)
+    ),
+    TP_fast_assign(
+        memcpy(__entry->u, ORANGEFS_I(inode)->refn.khandle.u, 16);
+        __entry->fs_id = ORANGEFS_I(inode)->refn.fs_id;
+        __entry->attr_valid = attr_valid;
+        __entry->ia_valid = ia_valid;
+    ),
+    TP_printk(
+        "khandle=%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
+            "%02x%02x%02x%02x%02x%02x fs_id=%d attr_valid=%d ia_valid=%d",
+            __entry->u[0], __entry->u[1], __entry->u[2], __entry->u[3],
+            __entry->u[4], __entry->u[5], __entry->u[6], __entry->u[7],
+            __entry->u[8], __entry->u[9], __entry->u[10], __entry->u[11],
+            __entry->u[12], __entry->u[13], __entry->u[14], __entry->u[15],
+            __entry->fs_id, __entry->attr_valid, __entry->ia_valid
+    )
+);
+
 TRACE_EVENT(orangefs_early_writeback,
     TP_PROTO(int reason),
     TP_ARGS(reason),
@@ -78,6 +104,56 @@ TRACE_EVENT(orangefs_early_writeback,
     )
 );
 
+TRACE_EVENT(orangefs_getattr,
+    TP_PROTO(struct inode *inode, int request_mask),
+    TP_ARGS(inode, request_mask),
+    TP_STRUCT__entry(
+        __array(unsigned char, u, 16)
+        __field(__s32, fs_id)
+        __field(int, request_mask)
+    ),
+    TP_fast_assign(
+        memcpy(__entry->u, ORANGEFS_I(inode)->refn.khandle.u, 16);
+        __entry->fs_id = ORANGEFS_I(inode)->refn.fs_id;
+        __entry->request_mask = request_mask;
+    ),
+    TP_printk(
+        "khandle=%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
+            "%02x%02x%02x%02x%02x%02x fs_id=%d request_mask=%d",
+            __entry->u[0], __entry->u[1], __entry->u[2], __entry->u[3],
+            __entry->u[4], __entry->u[5], __entry->u[6], __entry->u[7],
+            __entry->u[8], __entry->u[9], __entry->u[10], __entry->u[11],
+            __entry->u[12], __entry->u[13], __entry->u[14], __entry->u[15],
+            __entry->fs_id, __entry->request_mask
+    )
+);
+
+TRACE_EVENT(orangefs_setattr,
+    TP_PROTO(struct inode *inode, int attr_valid, int ia_valid),
+    TP_ARGS(inode, attr_valid, ia_valid),
+    TP_STRUCT__entry(
+        __array(unsigned char, u, 16)
+        __field(__s32, fs_id)
+        __field(int, attr_valid)
+        __field(int, ia_valid)
+    ),
+    TP_fast_assign(
+        memcpy(__entry->u, ORANGEFS_I(inode)->refn.khandle.u, 16);
+        __entry->fs_id = ORANGEFS_I(inode)->refn.fs_id;
+        __entry->attr_valid = attr_valid;
+        __entry->ia_valid = ia_valid;
+    ),
+    TP_printk(
+        "khandle=%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
+            "%02x%02x%02x%02x%02x%02x fs_id=%d attr_valid=%d ia_valid=%d",
+            __entry->u[0], __entry->u[1], __entry->u[2], __entry->u[3],
+            __entry->u[4], __entry->u[5], __entry->u[6], __entry->u[7],
+            __entry->u[8], __entry->u[9], __entry->u[10], __entry->u[11],
+            __entry->u[12], __entry->u[13], __entry->u[14], __entry->u[15],
+            __entry->fs_id, __entry->attr_valid, __entry->ia_valid
+    )
+);
+
 TRACE_EVENT(orangefs_service_operation,
     TP_PROTO(struct orangefs_kernel_op_s *op, int flags),
     TP_ARGS(op, flags),
@@ -113,6 +189,30 @@ TRACE_EVENT(orangefs_readpage,
     )
 );
 
+TRACE_EVENT(orangefs_write_inode,
+    TP_PROTO(struct inode *inode, int attr_valid),
+    TP_ARGS(inode, attr_valid),
+    TP_STRUCT__entry(
+        __array(unsigned char, u, 16)
+        __field(__s32, fs_id)
+        __field(int, attr_valid)
+    ),
+    TP_fast_assign(
+        memcpy(__entry->u, ORANGEFS_I(inode)->refn.khandle.u, 16);
+        __entry->fs_id = ORANGEFS_I(inode)->refn.fs_id;
+        __entry->attr_valid = attr_valid;
+    ),
+    TP_printk(
+        "khandle=%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
+            "%02x%02x%02x%02x%02x%02x fs_id=%d attr_valid=%d",
+            __entry->u[0], __entry->u[1], __entry->u[2], __entry->u[3],
+            __entry->u[4], __entry->u[5], __entry->u[6], __entry->u[7],
+            __entry->u[8], __entry->u[9], __entry->u[10], __entry->u[11],
+            __entry->u[12], __entry->u[13], __entry->u[14], __entry->u[15],
+            __entry->fs_id, __entry->attr_valid
+    )
+);
+
 TRACE_EVENT(orangefs_writepage,
     TP_PROTO(loff_t off, size_t len, int mwrite),
     TP_ARGS(off, len, mwrite),
diff --git a/fs/orangefs/orangefs-utils.c b/fs/orangefs/orangefs-utils.c
index 4d079635fa4b..24bd1c7d797a 100644
--- a/fs/orangefs/orangefs-utils.c
+++ b/fs/orangefs/orangefs-utils.c
@@ -8,6 +8,7 @@
 #include <linux/kernel.h>
 #include "orangefs-kernel.h"
 #include "orangefs-bufmap.h"
+#include "orangefs-trace.h"
 
 __s32 fsid_of_op(struct orangefs_kernel_op_s *op)
 {
@@ -423,7 +424,7 @@ int orangefs_inode_setattr(struct inode *inode)
 {
 	struct orangefs_inode_s *orangefs_inode = ORANGEFS_I(inode);
 	struct orangefs_kernel_op_s *new_op;
-	int ret;
+	int attr_valid, ret;
 
 	new_op = op_alloc(ORANGEFS_VFS_OP_SETATTR);
 	if (!new_op)
@@ -435,6 +436,7 @@ int orangefs_inode_setattr(struct inode *inode)
 	new_op->upcall.req.setattr.refn = orangefs_inode->refn;
 	copy_attributes_from_inode(inode,
 	    &new_op->upcall.req.setattr.attributes);
+	attr_valid = orangefs_inode->attr_valid;
 	orangefs_inode->attr_valid = 0;
 	if (!new_op->upcall.req.setattr.attributes.mask) {
 		spin_unlock(&inode->i_lock);
@@ -443,6 +445,8 @@ int orangefs_inode_setattr(struct inode *inode)
 	}
 	spin_unlock(&inode->i_lock);
 
+	trace_orangefs_write_inode(inode, attr_valid);
+
 	ret = service_operation(new_op,
 	    get_interruptible_flag(inode) | ORANGEFS_OP_WRITEBACK);
 	gossip_debug(GOSSIP_UTILS_DEBUG,
-- 
2.16.2
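
Each tracepoint in the patch above prints the 16-byte khandle in the same
8-4-4-4-12 hex layout. A minimal user-space sketch of that formatting
follows. format_khandle() and KHANDLE_STR_LEN are hypothetical names used
only for illustration; the format string is the one from the TP_printk()
calls.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical: 32 hex digits, 4 dashes, and the terminating NUL. */
#define KHANDLE_STR_LEN 37

/*
 * Format a 16-byte khandle the way the tracepoints print it.
 * Returns the number of characters snprintf() would have written.
 */
static int format_khandle(char *buf, size_t size, const unsigned char u[16])
{
	return snprintf(buf, size,
	    "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
	    "%02x%02x%02x%02x%02x%02x",
	    u[0], u[1], u[2], u[3],
	    u[4], u[5], u[6], u[7],
	    u[8], u[9], u[10], u[11],
	    u[12], u[13], u[14], u[15]);
}
```

A helper of this kind could also be used to turn a khandle copied from
trace output back into a form comparable with other tools' output.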
