* [PATCH V3 0/7] Orangefs: kernel client introduction
@ 2015-07-17 17:16 Mike Marshall
  2015-07-17 17:16 ` [PATCH V3 1/7] Orangefs: kernel client part 1 Mike Marshall
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: Mike Marshall @ 2015-07-17 17:16 UTC (permalink / raw)
  To: viro; +Cc: Mike Marshall, linux-fsdevel

From: Mike Marshall <hubcap@omnibond.com>

Here is version three of the Orangefs kernel client patch series.

Changes and who suggested them:

  * addition of this 0/X part of the series - Boaz Harrosh
  * addition of orangefs.txt file - Boaz Harrosh
  * Kconfig cleanup - Randy Dunlap
  * transition from procfs to debugfs and sysfs - Greg K-H
  * fixups to extended attribute code - Walt Ligon
  * "future proofing" of interface between kernel module and userspace.

OrangeFS is an LGPL-licensed, scale-out, parallel storage system that runs
in userspace. It is aimed at the large storage problems faced by HPC,
Big Data, genomics, bioinformatics, video streaming and rendering.

Features:

  * Distributed Metadata, including Giga+-inspired distributed metadata
    for directory entries
  * Support for multiple network infrastructures leveraging BMI to adapt
    to TCP, IB, Portals and others.
  * Stateless Servers
  * User-level Implementation
  * Multiple interface integration levels for easy system and application
    integration
  * Server-to-server collective communication for improved metadata
    operation scalability
  * Scalable, Portable and Flexible
  * Proven Research Platform
  * Integrated capability-based security
  * Extensive Documentation
  * Optimized MPI-IO Support

In addition to the Linux kernel client, OrangeFS supports multiple client
platforms and applications, including WebDAV, S3, Windows and Hadoop
integration.

History:
OrangeFS is the third development phase of the PVFS project. PVFS was
first developed in 1993 by Walt Ligon and Eric Blumer as a parallel file
system for the Parallel Virtual Machine (PVM), as part of a NASA grant to
study the I/O patterns of parallel programs. Between 2001 and 2004 a
complete rewrite, PVFS2, was developed by Walt Ligon, Phil Carnes,
Pete Wyckoff, Neil Miller, Rob Latham, Sam Lang and others, introducing
many of the modern distributed and parallel file system concepts.
Omnibond is leading the third phase of development with the OrangeFS
Community.


* [PATCH V3 1/7] Orangefs: kernel client part 1
  2015-07-17 17:16 [PATCH V3 0/7] Orangefs: kernel client introduction Mike Marshall
@ 2015-07-17 17:16 ` Mike Marshall
  2015-07-17 17:16 ` [PATCH V3 2/7] Orangefs: kernel client part 2 Mike Marshall
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Mike Marshall @ 2015-07-17 17:16 UTC (permalink / raw)
  To: viro; +Cc: Mike Marshall, linux-fsdevel

From: Mike Marshall <hubcap@omnibond.com>

OrangeFS (formerly PVFS) is an LGPL-licensed userspace networked parallel
file system. OrangeFS can be accessed through its included system
utilities, user integration libraries and MPI-IO, and can be used by the
Hadoop ecosystem as an alternative to HDFS. OrangeFS is widely used for
parallel science, data analytics and engineering applications.

While applications often don't require Orangefs to be mounted into
the VFS, users do like to be able to access their files in the normal way.
The Orangefs kernel client allows Orangefs filesystems to be mounted
through the VFS like any other filesystem. The kernel client communicates
with a userspace daemon, which in turn communicates with the Orangefs
server daemons that implement the filesystem. The server daemons (there's
almost always more than one) need not be running on the same host as the
kernel client.
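
To make that exchange concrete, here is a rough sketch (not part of the
patch itself) of how the kernel side might consume one of the downcalls
defined in this series, using the statfs response as an example. The
pvfs2_downcall_s layout and the PVFS2_VFS_OP_STATFS constant come from
the headers added below; the function itself, and the assumption that
pvfs2-kernel.h pulls in everything it needs, are hypothetical and only
illustrate the protocol.

/*
 * Illustrative sketch only, not code from this patch.  The kernel
 * describes a VFS operation in an upcall, the userspace client-core
 * daemon picks it up from the /dev/pvfs2-req character device,
 * performs it against the Orangefs servers, and answers by writing
 * back a pvfs2_downcall_s whose type field selects a member of the
 * resp union.
 */
#include "pvfs2-kernel.h"	/* assumed to pull in the downcall definitions */

static void example_consume_statfs_downcall(const struct pvfs2_downcall_s *down,
					    struct kstatfs *buf)
{
	/* a nonzero status means the daemon could not service the upcall */
	if (down->status || down->type != PVFS2_VFS_OP_STATFS)
		return;

	buf->f_bsize  = down->resp.statfs.block_size;
	buf->f_blocks = down->resp.statfs.blocks_total;
	buf->f_bfree  = down->resp.statfs.blocks_avail;
	buf->f_bavail = down->resp.statfs.blocks_avail;
	buf->f_files  = down->resp.statfs.files_total;
	buf->f_ffree  = down->resp.statfs.files_avail;
}

The same pattern applies to the other members of the resp union: the
type field in the downcall tells the kernel which response structure
the client-core daemon filled in.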

Orangefs filesystems can also be mounted with FUSE, and we
ship code and instructions to facilitate that, but most of our users
report preferring to use our kernel module instead. Further, as an example
of a problem we can't solve with FUSE, we have in the works a
not-yet-ready-for-prime-time version of a file_operations lock function
that accounts for the server daemons being distributed across more
than one running kernel.

Many people and organizations, including Clemson University,
Argonne National Laboratory and Acxiom Corporation, have
helped to create what has become Orangefs over more than twenty
years. Some of the more recent contributors to the kernel client
include:

  Mike Marshall
  Christoph Hellwig
  Randy Martin
  Becky Ligon
  Walt Ligon
  Michael Moore
  Rob Ross
  Phil Carnes

Signed-off-by: Mike Marshall <hubcap@omnibond.com>
---
 fs/orangefs/downcall.h        | 138 +++++++
 fs/orangefs/protocol.h        | 681 +++++++++++++++++++++++++++++++++
 fs/orangefs/pvfs2-bufmap.h    |  76 ++++
 fs/orangefs/pvfs2-debug.h     | 290 ++++++++++++++
 fs/orangefs/pvfs2-debugfs.h   |   3 +
 fs/orangefs/pvfs2-dev-proto.h | 102 +++++
 fs/orangefs/pvfs2-kernel.h    | 864 ++++++++++++++++++++++++++++++++++++++++++
 fs/orangefs/pvfs2-sysfs.h     |   2 +
 fs/orangefs/upcall.h          | 255 +++++++++++++
 9 files changed, 2411 insertions(+)
 create mode 100644 fs/orangefs/downcall.h
 create mode 100644 fs/orangefs/protocol.h
 create mode 100644 fs/orangefs/pvfs2-bufmap.h
 create mode 100644 fs/orangefs/pvfs2-debug.h
 create mode 100644 fs/orangefs/pvfs2-debugfs.h
 create mode 100644 fs/orangefs/pvfs2-dev-proto.h
 create mode 100644 fs/orangefs/pvfs2-kernel.h
 create mode 100644 fs/orangefs/pvfs2-sysfs.h
 create mode 100644 fs/orangefs/upcall.h

diff --git a/fs/orangefs/downcall.h b/fs/orangefs/downcall.h
new file mode 100644
index 0000000..a79129f
--- /dev/null
+++ b/fs/orangefs/downcall.h
@@ -0,0 +1,138 @@
+/*
+ * (C) 2001 Clemson University and The University of Chicago
+ *
+ * See COPYING in top-level directory.
+ */
+
+/*
+ *  Definitions of downcalls used in Linux kernel module.
+ */
+
+#ifndef __DOWNCALL_H
+#define __DOWNCALL_H
+
+/*
+ * Sanitized the device-client core interaction
+ * for clean 32-64 bit usage
+ */
+struct pvfs2_io_response {
+	__s64 amt_complete;
+};
+
+struct pvfs2_iox_response {
+	__s64 amt_complete;
+};
+
+struct pvfs2_lookup_response {
+	struct pvfs2_object_kref refn;
+};
+
+struct pvfs2_create_response {
+	struct pvfs2_object_kref refn;
+};
+
+struct pvfs2_symlink_response {
+	struct pvfs2_object_kref refn;
+};
+
+struct pvfs2_getattr_response {
+	struct PVFS_sys_attr_s attributes;
+	char link_target[PVFS2_NAME_LEN];
+};
+
+struct pvfs2_mkdir_response {
+	struct pvfs2_object_kref refn;
+};
+
+/*
+ * duplication of some system interface structures so that I don't have
+ * to allocate extra memory
+ */
+struct pvfs2_dirent {
+	char *d_name;
+	int d_length;
+	struct pvfs2_khandle khandle;
+};
+
+struct pvfs2_statfs_response {
+	__s64 block_size;
+	__s64 blocks_total;
+	__s64 blocks_avail;
+	__s64 files_total;
+	__s64 files_avail;
+};
+
+struct pvfs2_fs_mount_response {
+	__s32 fs_id;
+	__s32 id;
+	struct pvfs2_khandle root_khandle;
+};
+
+/* the getxattr response is the attribute value */
+struct pvfs2_getxattr_response {
+	__s32 val_sz;
+	__s32 __pad1;
+	char val[PVFS_MAX_XATTR_VALUELEN];
+};
+
+/* the listxattr response is an array of attribute names */
+struct pvfs2_listxattr_response {
+	__s32 returned_count;
+	__s32 __pad1;
+	__u64 token;
+	char key[PVFS_MAX_XATTR_LISTLEN * PVFS_MAX_XATTR_NAMELEN];
+	__s32 keylen;
+	__s32 __pad2;
+	__s32 lengths[PVFS_MAX_XATTR_LISTLEN];
+};
+
+struct pvfs2_param_response {
+	__s64 value;
+};
+
+#define PERF_COUNT_BUF_SIZE 4096
+struct pvfs2_perf_count_response {
+	char buffer[PERF_COUNT_BUF_SIZE];
+};
+
+#define FS_KEY_BUF_SIZE 4096
+struct pvfs2_fs_key_response {
+	__s32 fs_keylen;
+	__s32 __pad1;
+	char fs_key[FS_KEY_BUF_SIZE];
+};
+
+struct pvfs2_downcall_s {
+	__s32 type;
+	__s32 status;
+	/* currently trailer is used only by readdir */
+	__s64 trailer_size;
+	char *trailer_buf;
+
+	union {
+		struct pvfs2_io_response io;
+		struct pvfs2_iox_response iox;
+		struct pvfs2_lookup_response lookup;
+		struct pvfs2_create_response create;
+		struct pvfs2_symlink_response sym;
+		struct pvfs2_getattr_response getattr;
+		struct pvfs2_mkdir_response mkdir;
+		struct pvfs2_statfs_response statfs;
+		struct pvfs2_fs_mount_response fs_mount;
+		struct pvfs2_getxattr_response getxattr;
+		struct pvfs2_listxattr_response listxattr;
+		struct pvfs2_param_response param;
+		struct pvfs2_perf_count_response perf_count;
+		struct pvfs2_fs_key_response fs_key;
+	} resp;
+};
+
+struct pvfs2_readdir_response_s {
+	__u64 token;
+	__u64 directory_version;
+	__u32 __pad2;
+	__u32 pvfs_dirent_outcount;
+	struct pvfs2_dirent *dirent_array;
+};
+
+#endif /* __DOWNCALL_H */
diff --git a/fs/orangefs/protocol.h b/fs/orangefs/protocol.h
new file mode 100644
index 0000000..2fb3a63
--- /dev/null
+++ b/fs/orangefs/protocol.h
@@ -0,0 +1,681 @@
+#include <linux/spinlock_types.h>
+#include <linux/types.h>
+#include <linux/slab.h>
+
+extern struct client_debug_mask *cdm_array;
+extern char *debug_help_string;
+extern int help_string_initialized;
+extern struct dentry *debug_dir;
+extern struct dentry *help_file_dentry;
+extern struct dentry *client_debug_dentry;
+extern const struct file_operations debug_help_fops;
+extern int client_all_index;
+extern int client_verbose_index;
+extern int cdm_element_count;
+#define DEBUG_HELP_STRING_SIZE 4096
+#define HELP_STRING_UNINITIALIZED \
+	"Client Debug Keywords are unknown until the first time\n" \
+	"the client is started after boot.\n"
+#define ORANGEFS_KMOD_DEBUG_HELP_FILE "debug-help"
+#define ORANGEFS_KMOD_DEBUG_FILE "kernel-debug"
+#define ORANGEFS_CLIENT_DEBUG_FILE "client-debug"
+#define PVFS2_VERBOSE "verbose"
+#define PVFS2_ALL "all"
+
+/* pvfs2-config.h ***********************************************************/
+#define PVFS2_VERSION_MAJOR 2
+#define PVFS2_VERSION_MINOR 9
+#define PVFS2_VERSION_SUB 0
+
+/* khandle stuff  ***********************************************************/
+
+/*
+ * The 2.9 core will put 64 bit handles in here like this:
+ *    1234 0000 0000 5678
+ * The 3.0 and beyond cores will put 128 bit handles in here like this:
+ *    1234 5678 90AB CDEF
+ * The kernel module will always use the first four bytes and
+ * the last four bytes as an inum.
+ */
+struct pvfs2_khandle {
+	unsigned char u[16];
+}  __aligned(8);
+
+/*
+ * kernel version of an object ref.
+ */
+struct pvfs2_object_kref {
+	struct pvfs2_khandle khandle;
+	__s32 fs_id;
+	__s32 __pad1;
+};
+
+/*
+ * compare 2 khandles assumes little endian thus from large address to
+ * small address
+ */
+static inline int PVFS_khandle_cmp(const struct pvfs2_khandle *kh1,
+				   const struct pvfs2_khandle *kh2)
+{
+	int i;
+
+	for (i = 15; i >= 0; i--) {
+		if (kh1->u[i] > kh2->u[i])
+			return 1;
+		if (kh1->u[i] < kh2->u[i])
+			return -1;
+	}
+
+	return 0;
+}
+
+/* copy a khandle to a field of arbitrary size */
+static inline void PVFS_khandle_to(const struct pvfs2_khandle *kh,
+				   void *p, int size)
+{
+	int i;
+	unsigned char *c = p;
+
+	memset(p, 0, size);
+
+	for (i = 0; i < 16 && i < size; i++)
+		c[i] = kh->u[i];
+}
+
+/* copy a khandle from a field of arbitrary size */
+static inline void PVFS_khandle_from(struct pvfs2_khandle *kh,
+				     void *p, int size)
+{
+	int i;
+	unsigned char *c = p;
+
+	memset(kh, 0, 16);
+
+	for (i = 0; i < 16 && i < size; i++)
+		kh->u[i] = c[i];
+}
+
+/* pvfs2-types.h ************************************************************/
+typedef __u32 PVFS_uid;
+typedef __u32 PVFS_gid;
+typedef __s32 PVFS_fs_id;
+typedef __u32 PVFS_permissions;
+typedef __u64 PVFS_time;
+typedef __s64 PVFS_size;
+typedef __u64 PVFS_flags;
+typedef __u64 PVFS_ds_position;
+typedef __s32 PVFS_error;
+typedef __s64 PVFS_offset;
+
+#define PVFS2_SUPER_MAGIC 0x20030528
+#define PVFS_ERROR_BIT           (1 << 30)
+#define PVFS_NON_ERRNO_ERROR_BIT (1 << 29)
+#define IS_PVFS_ERROR(__error)   ((__error)&(PVFS_ERROR_BIT))
+#define IS_PVFS_NON_ERRNO_ERROR(__error)  \
+(((__error)&(PVFS_NON_ERRNO_ERROR_BIT)) && IS_PVFS_ERROR(__error))
+#define PVFS_ERROR_TO_ERRNO(__error) PVFS_get_errno_mapping(__error)
+
+/* 7 bits are used for the errno mapped error codes */
+#define PVFS_ERROR_CODE(__error) \
+((__error) & (__s32)(0x7f|PVFS_ERROR_BIT))
+#define PVFS_ERROR_CLASS(__error) \
+((__error) & ~((__s32)(0x7f|PVFS_ERROR_BIT|PVFS_NON_ERRNO_ERROR_BIT)))
+#define PVFS_NON_ERRNO_ERROR_CODE(__error) \
+((__error) & (__s32)(127|PVFS_ERROR_BIT|PVFS_NON_ERRNO_ERROR_BIT))
+
+/* PVFS2 error codes, compliments of asm/errno.h */
+#define PVFS_EPERM            E(1)	/* Operation not permitted */
+#define PVFS_ENOENT           E(2)	/* No such file or directory */
+#define PVFS_EINTR            E(3)	/* Interrupted system call */
+#define PVFS_EIO              E(4)	/* I/O error */
+#define PVFS_ENXIO            E(5)	/* No such device or address */
+#define PVFS_EBADF            E(6)	/* Bad file number */
+#define PVFS_EAGAIN           E(7)	/* Try again */
+#define PVFS_ENOMEM           E(8)	/* Out of memory */
+#define PVFS_EFAULT           E(9)	/* Bad address */
+#define PVFS_EBUSY           E(10)	/* Device or resource busy */
+#define PVFS_EEXIST          E(11)	/* File exists */
+#define PVFS_ENODEV          E(12)	/* No such device */
+#define PVFS_ENOTDIR         E(13)	/* Not a directory */
+#define PVFS_EISDIR          E(14)	/* Is a directory */
+#define PVFS_EINVAL          E(15)	/* Invalid argument */
+#define PVFS_EMFILE          E(16)	/* Too many open files */
+#define PVFS_EFBIG           E(17)	/* File too large */
+#define PVFS_ENOSPC          E(18)	/* No space left on device */
+#define PVFS_EROFS           E(19)	/* Read-only file system */
+#define PVFS_EMLINK          E(20)	/* Too many links */
+#define PVFS_EPIPE           E(21)	/* Broken pipe */
+#define PVFS_EDEADLK         E(22)	/* Resource deadlock would occur */
+#define PVFS_ENAMETOOLONG    E(23)	/* File name too long */
+#define PVFS_ENOLCK          E(24)	/* No record locks available */
+#define PVFS_ENOSYS          E(25)	/* Function not implemented */
+#define PVFS_ENOTEMPTY       E(26)	/* Directory not empty */
+					/*
+#define PVFS_ELOOP           E(27)	 * Too many symbolic links encountered
+					 */
+#define PVFS_EWOULDBLOCK     E(28)	/* Operation would block */
+#define PVFS_ENOMSG          E(29)	/* No message of desired type */
+#define PVFS_EUNATCH         E(30)	/* Protocol driver not attached */
+#define PVFS_EBADR           E(31)	/* Invalid request descriptor */
+#define PVFS_EDEADLOCK       E(32)
+#define PVFS_ENODATA         E(33)	/* No data available */
+#define PVFS_ETIME           E(34)	/* Timer expired */
+#define PVFS_ENONET          E(35)	/* Machine is not on the network */
+#define PVFS_EREMOTE         E(36)	/* Object is remote */
+#define PVFS_ECOMM           E(37)	/* Communication error on send */
+#define PVFS_EPROTO          E(38)	/* Protocol error */
+#define PVFS_EBADMSG         E(39)	/* Not a data message */
+					/*
+#define PVFS_EOVERFLOW       E(40)	 * Value too large for defined data
+					 * type
+					 */
+					/*
+#define PVFS_ERESTART        E(41)	 * Interrupted system call should be
+					 * restarted
+					 */
+#define PVFS_EMSGSIZE        E(42)	/* Message too long */
+#define PVFS_EPROTOTYPE      E(43)	/* Protocol wrong type for socket */
+#define PVFS_ENOPROTOOPT     E(44)	/* Protocol not available */
+#define PVFS_EPROTONOSUPPORT E(45)	/* Protocol not supported */
+					/*
+#define PVFS_EOPNOTSUPP      E(46)	 * Operation not supported on transport
+					 * endpoint
+					 */
+#define PVFS_EADDRINUSE      E(47)	/* Address already in use */
+#define PVFS_EADDRNOTAVAIL   E(48)	/* Cannot assign requested address */
+#define PVFS_ENETDOWN        E(49)	/* Network is down */
+#define PVFS_ENETUNREACH     E(50)	/* Network is unreachable */
+					/*
+#define PVFS_ENETRESET       E(51)	 * Network dropped connection because
+					 * of reset
+					 */
+#define PVFS_ENOBUFS         E(52)	/* No buffer space available */
+#define PVFS_ETIMEDOUT       E(53)	/* Connection timed out */
+#define PVFS_ECONNREFUSED    E(54)	/* Connection refused */
+#define PVFS_EHOSTDOWN       E(55)	/* Host is down */
+#define PVFS_EHOSTUNREACH    E(56)	/* No route to host */
+#define PVFS_EALREADY        E(57)	/* Operation already in progress */
+#define PVFS_EACCES          E(58)	/* Access not allowed */
+#define PVFS_ECONNRESET      E(59)	/* Connection reset by peer */
+#define PVFS_ERANGE          E(60)	/* Math out of range or buf too small */
+
+/***************** non-errno/pvfs2 specific error codes *****************/
+#define PVFS_ECANCEL    (1|(PVFS_NON_ERRNO_ERROR_BIT|PVFS_ERROR_BIT))
+#define PVFS_EDEVINIT   (2|(PVFS_NON_ERRNO_ERROR_BIT|PVFS_ERROR_BIT))
+#define PVFS_EDETAIL    (3|(PVFS_NON_ERRNO_ERROR_BIT|PVFS_ERROR_BIT))
+#define PVFS_EHOSTNTFD  (4|(PVFS_NON_ERRNO_ERROR_BIT|PVFS_ERROR_BIT))
+#define PVFS_EADDRNTFD  (5|(PVFS_NON_ERRNO_ERROR_BIT|PVFS_ERROR_BIT))
+#define PVFS_ENORECVR   (6|(PVFS_NON_ERRNO_ERROR_BIT|PVFS_ERROR_BIT))
+#define PVFS_ETRYAGAIN  (7|(PVFS_NON_ERRNO_ERROR_BIT|PVFS_ERROR_BIT))
+#define PVFS_ENOTPVFS   (8|(PVFS_NON_ERRNO_ERROR_BIT|PVFS_ERROR_BIT))
+#define PVFS_ESECURITY  (9|(PVFS_NON_ERRNO_ERROR_BIT|PVFS_ERROR_BIT))
+
+/*
+ * NOTE: PLEASE DO NOT ARBITRARILY ADD NEW ERRNO ERROR CODES!
+ *
+ * IF YOU CHOOSE TO ADD A NEW ERROR CODE (DESPITE OUR PLEA), YOU ALSO
+ * NEED TO INCREMENT PVFS_ERRNO MAX (BELOW) AND ADD A MAPPING TO A
+ * UNIX ERRNO VALUE IN THE MACROS BELOW (USED IN
+ * src/common/misc/errno-mapping.c and the kernel module)
+ */
+#define PVFS_ERRNO_MAX          61
+
+#define PVFS_ERROR_BMI    (1 << 7)	/* BMI-specific error */
+#define PVFS_ERROR_TROVE  (2 << 7)	/* Trove-specific error */
+#define PVFS_ERROR_FLOW   (3 << 7)
+#define PVFS_ERROR_SM     (4 << 7)	/* state machine specific error */
+#define PVFS_ERROR_SCHED  (5 << 7)
+#define PVFS_ERROR_CLIENT (6 << 7)
+#define PVFS_ERROR_DEV    (7 << 7)	/* device file interaction */
+
+#define PVFS_ERROR_CLASS_BITS	\
+	(PVFS_ERROR_BMI    |	\
+	 PVFS_ERROR_TROVE  |	\
+	 PVFS_ERROR_FLOW   |	\
+	 PVFS_ERROR_SM     |	\
+	 PVFS_ERROR_SCHED  |	\
+	 PVFS_ERROR_CLIENT |	\
+	 PVFS_ERROR_DEV)
+
+#define DECLARE_ERRNO_MAPPING()                       \
+__s32 PINT_errno_mapping[PVFS_ERRNO_MAX + 1] = { \
+	0,     /* leave this one empty */                 \
+	EPERM, /* 1 */                                    \
+	ENOENT,                                           \
+	EINTR,                                            \
+	EIO,                                              \
+	ENXIO,                                            \
+	EBADF,                                            \
+	EAGAIN,                                           \
+	ENOMEM,                                           \
+	EFAULT,                                           \
+	EBUSY, /* 10 */                                   \
+	EEXIST,                                           \
+	ENODEV,                                           \
+	ENOTDIR,                                          \
+	EISDIR,                                           \
+	EINVAL,                                           \
+	EMFILE,                                           \
+	EFBIG,                                            \
+	ENOSPC,                                           \
+	EROFS,                                            \
+	EMLINK, /* 20 */                                  \
+	EPIPE,                                            \
+	EDEADLK,                                          \
+	ENAMETOOLONG,                                     \
+	ENOLCK,                                           \
+	ENOSYS,                                           \
+	ENOTEMPTY,                                        \
+	ELOOP,                                            \
+	EWOULDBLOCK,                                      \
+	ENOMSG,                                           \
+	EUNATCH, /* 30 */                                 \
+	EBADR,                                            \
+	EDEADLOCK,                                        \
+	ENODATA,                                          \
+	ETIME,                                            \
+	ENONET,                                           \
+	EREMOTE,                                          \
+	ECOMM,                                            \
+	EPROTO,                                           \
+	EBADMSG,                                          \
+	EOVERFLOW, /* 40 */                               \
+	ERESTART,                                         \
+	EMSGSIZE,                                         \
+	EPROTOTYPE,                                       \
+	ENOPROTOOPT,                                      \
+	EPROTONOSUPPORT,                                  \
+	EOPNOTSUPP,                                       \
+	EADDRINUSE,                                       \
+	EADDRNOTAVAIL,                                    \
+	ENETDOWN,                                         \
+	ENETUNREACH, /* 50 */                             \
+	ENETRESET,                                        \
+	ENOBUFS,                                          \
+	ETIMEDOUT,                                        \
+	ECONNREFUSED,                                     \
+	EHOSTDOWN,                                        \
+	EHOSTUNREACH,                                     \
+	EALREADY,                                         \
+	EACCES,                                           \
+	ECONNRESET,   /* 59 */                            \
+	ERANGE,                                           \
+	0         /* PVFS_ERRNO_MAX */                    \
+};                                                    \
+const char *PINT_non_errno_strerror_mapping[] = {     \
+	"Success", /* 0 */                                \
+	"Operation cancelled (possibly due to timeout)",  \
+	"Device initialization failed",                   \
+	"Detailed per-server errors are available",       \
+	"Unknown host",                                   \
+	"No address associated with name",                \
+	"Unknown server error",                           \
+	"Host name lookup failure",                       \
+	"Path contains non-PVFS elements",                \
+	"Security error",                                 \
+};                                                    \
+__s32 PINT_non_errno_mapping[] = {               \
+	0,     /* leave this one empty */                 \
+	PVFS_ECANCEL,   /* 1 */                           \
+	PVFS_EDEVINIT,  /* 2 */                           \
+	PVFS_EDETAIL,   /* 3 */                           \
+	PVFS_EHOSTNTFD, /* 4 */                           \
+	PVFS_EADDRNTFD, /* 5 */                           \
+	PVFS_ENORECVR,  /* 6 */                           \
+	PVFS_ETRYAGAIN, /* 7 */                           \
+	PVFS_ENOTPVFS,  /* 8 */                           \
+	PVFS_ESECURITY, /* 9 */                           \
+}
+
+/*
+ *   NOTE: PVFS_get_errno_mapping will convert a PVFS_ERROR_CODE to an
+ *   errno value.  If the error code is a pvfs2 specific error code
+ *   (i.e. a PVFS_NON_ERRNO_ERROR_CODE), PVFS_get_errno_mapping will
+ *   return an index into the PINT_non_errno_strerror_mapping array which
+ *   can be used for getting the pvfs2 specific strerror message given
+ *   the error code.  if the value is not a recognized error code, the
+ *   passed in value will be returned unchanged.
+ */
+#define DECLARE_ERRNO_MAPPING_AND_FN()					\
+extern __s32 PINT_errno_mapping[];					\
+extern __s32 PINT_non_errno_mapping[];				\
+extern const char *PINT_non_errno_strerror_mapping[];			\
+__s32 PVFS_get_errno_mapping(__s32 error)			\
+{									\
+	__s32 ret = error, mask = 0;				\
+	__s32 positive = ((error > -1) ? 1 : 0);			\
+	if (IS_PVFS_NON_ERRNO_ERROR((positive ? error : -error))) {	\
+		mask = (PVFS_NON_ERRNO_ERROR_BIT |			\
+			PVFS_ERROR_BIT |				\
+			PVFS_ERROR_CLASS_BITS);				\
+		ret = PVFS_NON_ERRNO_ERROR_CODE(((positive ?		\
+						     error :		\
+						     abs(error))) &	\
+						 ~mask);		\
+	}								\
+	else if (IS_PVFS_ERROR((positive ? error : -error))) {		\
+		mask = (PVFS_ERROR_BIT |				\
+			PVFS_ERROR_CLASS_BITS);				\
+		ret = PINT_errno_mapping[PVFS_ERROR_CODE(((positive ?	\
+								error :	\
+								abs(error))) & \
+							  ~mask)];	\
+	}								\
+	return ret;							\
+}									\
+__s32 PVFS_errno_to_error(int err)					\
+{									\
+	__s32 e = 0;						\
+									\
+	for (; e < PVFS_ERRNO_MAX; ++e)					\
+		if (PINT_errno_mapping[e] == err)			\
+			return e | PVFS_ERROR_BIT;			\
+									\
+	return err;							\
+}									\
+DECLARE_ERRNO_MAPPING()
+
+/* permission bits */
+#define PVFS_O_EXECUTE (1 << 0)
+#define PVFS_O_WRITE   (1 << 1)
+#define PVFS_O_READ    (1 << 2)
+#define PVFS_G_EXECUTE (1 << 3)
+#define PVFS_G_WRITE   (1 << 4)
+#define PVFS_G_READ    (1 << 5)
+#define PVFS_U_EXECUTE (1 << 6)
+#define PVFS_U_WRITE   (1 << 7)
+#define PVFS_U_READ    (1 << 8)
+/* no PVFS_U_VTX (sticky bit) */
+#define PVFS_G_SGID    (1 << 10)
+#define PVFS_U_SUID    (1 << 11)
+
+/* definition taken from stdint.h */
+#define INT32_MAX (2147483647)
+#define PVFS_ITERATE_START    (INT32_MAX - 1)
+#define PVFS_ITERATE_END      (INT32_MAX - 2)
+#define PVFS_READDIR_START PVFS_ITERATE_START
+#define PVFS_READDIR_END   PVFS_ITERATE_END
+#define PVFS_IMMUTABLE_FL FS_IMMUTABLE_FL
+#define PVFS_APPEND_FL    FS_APPEND_FL
+#define PVFS_NOATIME_FL   FS_NOATIME_FL
+#define PVFS_MIRROR_FL    0x01000000ULL
+#define PVFS_O_EXECUTE (1 << 0)
+#define PVFS_FS_ID_NULL       ((__s32)0)
+#define PVFS_ATTR_SYS_UID                   (1 << 0)
+#define PVFS_ATTR_SYS_GID                   (1 << 1)
+#define PVFS_ATTR_SYS_PERM                  (1 << 2)
+#define PVFS_ATTR_SYS_ATIME                 (1 << 3)
+#define PVFS_ATTR_SYS_CTIME                 (1 << 4)
+#define PVFS_ATTR_SYS_MTIME                 (1 << 5)
+#define PVFS_ATTR_SYS_TYPE                  (1 << 6)
+#define PVFS_ATTR_SYS_ATIME_SET             (1 << 7)
+#define PVFS_ATTR_SYS_MTIME_SET             (1 << 8)
+#define PVFS_ATTR_SYS_SIZE                  (1 << 20)
+#define PVFS_ATTR_SYS_LNK_TARGET            (1 << 24)
+#define PVFS_ATTR_SYS_DFILE_COUNT           (1 << 25)
+#define PVFS_ATTR_SYS_DIRENT_COUNT          (1 << 26)
+#define PVFS_ATTR_SYS_BLKSIZE               (1 << 28)
+#define PVFS_ATTR_SYS_MIRROR_COPIES_COUNT   (1 << 29)
+#define PVFS_ATTR_SYS_COMMON_ALL	\
+	(PVFS_ATTR_SYS_UID	|	\
+	 PVFS_ATTR_SYS_GID	|	\
+	 PVFS_ATTR_SYS_PERM	|	\
+	 PVFS_ATTR_SYS_ATIME	|	\
+	 PVFS_ATTR_SYS_CTIME	|	\
+	 PVFS_ATTR_SYS_MTIME	|	\
+	 PVFS_ATTR_SYS_TYPE)
+
+#define PVFS_ATTR_SYS_ALL_SETABLE		\
+(PVFS_ATTR_SYS_COMMON_ALL-PVFS_ATTR_SYS_TYPE)
+
+#define PVFS_ATTR_SYS_ALL_NOHINT			\
+	(PVFS_ATTR_SYS_COMMON_ALL		|	\
+	 PVFS_ATTR_SYS_SIZE			|	\
+	 PVFS_ATTR_SYS_LNK_TARGET		|	\
+	 PVFS_ATTR_SYS_DFILE_COUNT		|	\
+	 PVFS_ATTR_SYS_MIRROR_COPIES_COUNT	|	\
+	 PVFS_ATTR_SYS_DIRENT_COUNT		|	\
+	 PVFS_ATTR_SYS_BLKSIZE)
+#define PVFS_XATTR_REPLACE 0x2
+#define PVFS_XATTR_CREATE  0x1
+#define PVFS_MAX_SERVER_ADDR_LEN 256
+#define PVFS_NAME_MAX            256
+/*
+ * max extended attribute name len as imposed by the VFS and exploited for the
+ * upcall request types.
+ * NOTE: Please retain them as multiples of 8 even if you wish to change them
+ * This is *NECESSARY* for supporting 32 bit user-space binaries on a 64-bit
+ * kernel. Due to implementation within DBPF, this really needs to be
+ * PVFS_NAME_MAX, which it was the same value as, but no reason to let it
+ * break if that changes in the future.
+ */
+#define PVFS_MAX_XATTR_NAMELEN   PVFS_NAME_MAX	/* Not the same as
+						 * XATTR_NAME_MAX defined
+						 * by <linux/xattr.h>
+						 */
+#define PVFS_MAX_XATTR_VALUELEN  8192	/* Not the same as XATTR_SIZE_MAX
+					 * defined by <linux/xattr.h>
+					 */
+#define PVFS_MAX_XATTR_LISTLEN   16	/* Not the same as XATTR_LIST_MAX
+					 * defined by <linux/xattr.h>
+					 */
+/*
+ * PVFS I/O operation types, used in both system and server interfaces.
+ */
+enum PVFS_io_type {
+	PVFS_IO_READ = 1,
+	PVFS_IO_WRITE = 2
+};
+
+/*
+ * If this enum is modified the server parameters related to the precreate pool
+ * batch and low threshold sizes may need to be modified  to reflect this
+ * change.
+ */
+enum pvfs2_ds_type {
+	PVFS_TYPE_NONE = 0,
+	PVFS_TYPE_METAFILE = (1 << 0),
+	PVFS_TYPE_DATAFILE = (1 << 1),
+	PVFS_TYPE_DIRECTORY = (1 << 2),
+	PVFS_TYPE_SYMLINK = (1 << 3),
+	PVFS_TYPE_DIRDATA = (1 << 4),
+	PVFS_TYPE_INTERNAL = (1 << 5)	/* for the server's private use */
+};
+
+/*
+ * PVFS_certificate simply stores a buffer with the buffer size.
+ * The buffer can be converted to an OpenSSL X509 struct for use.
+ */
+struct PVFS_certificate {
+	__u32 buf_size;
+	unsigned char *buf;
+};
+
+/*
+ * A credential identifies a user and is signed by the client/user
+ * private key.
+ */
+struct PVFS_credential {
+	__u32 userid;	/* user id */
+	__u32 num_groups;	/* length of group_array */
+	__u32 *group_array;	/* groups for which the user is a member */
+	char *issuer;		/* alias of the issuing server */
+	__u64 timeout;	/* seconds after epoch to time out */
+	__u32 sig_size;	/* length of the signature in bytes */
+	unsigned char *signature;	/* digital signature */
+	struct PVFS_certificate certificate;	/* user certificate buffer */
+};
+#define extra_size_PVFS_credential (PVFS_REQ_LIMIT_GROUPS	*	\
+				    sizeof(__u32)		+	\
+				    PVFS_REQ_LIMIT_ISSUER	+	\
+				    PVFS_REQ_LIMIT_SIGNATURE	+	\
+				    extra_size_PVFS_certificate)
+
+/* This structure is used by the VFS-client interaction alone */
+struct PVFS_keyval_pair {
+	char key[PVFS_MAX_XATTR_NAMELEN];
+	__s32 key_sz;	/* __s32 for portable, fixed-size structures */
+	__s32 val_sz;
+	char val[PVFS_MAX_XATTR_VALUELEN];
+};
+
+/* pvfs2-sysint.h ***********************************************************/
+/* Describes attributes for a file, directory, or symlink. */
+struct PVFS_sys_attr_s {
+	__u32 owner;
+	__u32 group;
+	__u32 perms;
+	__u64 atime;
+	__u64 mtime;
+	__u64 ctime;
+	__s64 size;
+
+	/* NOTE: caller must free if valid */
+	char *link_target;
+
+	/* Changed to __s32 so that size of structure does not change */
+	__s32 dfile_count;
+
+	/* Changed to __s32 so that size of structure does not change */
+	__s32 distr_dir_servers_initial;
+
+	/* Changed to __s32 so that size of structure does not change */
+	__s32 distr_dir_servers_max;
+
+	/* Changed to __s32 so that size of structure does not change */
+	__s32 distr_dir_split_size;
+
+	__u32 mirror_copies_count;
+
+	/* NOTE: caller must free if valid */
+	char *dist_name;
+
+	/* NOTE: caller must free if valid */
+	char *dist_params;
+
+	__s64 dirent_count;
+	enum pvfs2_ds_type objtype;
+	__u64 flags;
+	__u32 mask;
+	__s64 blksize;
+};
+
+#define PVFS2_LOOKUP_LINK_NO_FOLLOW 0
+#define PVFS2_LOOKUP_LINK_FOLLOW    1
+
+/* pint-dev.h ***************************************************************/
+
+/* parameter structure used in PVFS_DEV_DEBUG ioctl command */
+struct dev_mask_info_s {
+	enum {
+		KERNEL_MASK,
+		CLIENT_MASK,
+	} mask_type;
+	__u64 mask_value;
+};
+
+struct dev_mask2_info_s {
+	__u64 mask1_value;
+	__u64 mask2_value;
+};
+
+/* pvfs2-util.h *************************************************************/
+#define PVFS_util_min(x1, x2) (((x1) > (x2)) ? (x2) : (x1))
+__s32 PVFS_util_translate_mode(int mode);
+
+/* pvfs2-debug.h ************************************************************/
+#include "pvfs2-debug.h"
+
+/* pvfs2-internal.h *********************************************************/
+#define llu(x) (unsigned long long)(x)
+#define lld(x) (long long)(x)
+
+/* pint-dev-shared.h ********************************************************/
+#define PVFS_DEV_MAGIC 'k'
+
+#define PVFS2_READDIR_DEFAULT_DESC_COUNT  5
+
+#define DEV_GET_MAGIC           0x1
+#define DEV_GET_MAX_UPSIZE      0x2
+#define DEV_GET_MAX_DOWNSIZE    0x3
+#define DEV_MAP                 0x4
+#define DEV_REMOUNT_ALL         0x5
+#define DEV_DEBUG               0x6
+#define DEV_UPSTREAM            0x7
+#define DEV_CLIENT_MASK         0x8
+#define DEV_CLIENT_STRING       0x9
+#define DEV_MAX_NR              0xa
+
+/* supported ioctls, codes are with respect to user-space */
+enum {
+	PVFS_DEV_GET_MAGIC = _IOW(PVFS_DEV_MAGIC, DEV_GET_MAGIC, __s32),
+	PVFS_DEV_GET_MAX_UPSIZE =
+	    _IOW(PVFS_DEV_MAGIC, DEV_GET_MAX_UPSIZE, __s32),
+	PVFS_DEV_GET_MAX_DOWNSIZE =
+	    _IOW(PVFS_DEV_MAGIC, DEV_GET_MAX_DOWNSIZE, __s32),
+	PVFS_DEV_MAP = _IO(PVFS_DEV_MAGIC, DEV_MAP),
+	PVFS_DEV_REMOUNT_ALL = _IO(PVFS_DEV_MAGIC, DEV_REMOUNT_ALL),
+	PVFS_DEV_DEBUG = _IOR(PVFS_DEV_MAGIC, DEV_DEBUG, __s32),
+	PVFS_DEV_UPSTREAM = _IOW(PVFS_DEV_MAGIC, DEV_UPSTREAM, int),
+	PVFS_DEV_CLIENT_MASK = _IOW(PVFS_DEV_MAGIC,
+				    DEV_CLIENT_MASK,
+				    struct dev_mask2_info_s),
+	PVFS_DEV_CLIENT_STRING = _IOW(PVFS_DEV_MAGIC,
+				      DEV_CLIENT_STRING,
+				      char *),
+	PVFS_DEV_MAXNR = DEV_MAX_NR,
+};
+
+/*
+ * version number for use in communicating between kernel space and user
+ * space
+ */
+/*
+#define PVFS_KERNEL_PROTO_VERSION			\
+		((PVFS2_VERSION_MAJOR * 10000)	+	\
+		 (PVFS2_VERSION_MINOR * 100)	+	\
+		 PVFS2_VERSION_SUB)
+*/
+#define PVFS_KERNEL_PROTO_VERSION 0
+
+/*
+ * describes memory regions to map in the PVFS_DEV_MAP ioctl.
+ * NOTE: See devpvfs2-req.c for 32 bit compat structure.
+ * Since this structure has a variable-sized layout that is different
+ * on 32 and 64 bit platforms, we need to normalize to a 64 bit layout
+ * on such systems before servicing ioctl calls from user-space binaries
+ * that may be 32 bit!
+ */
+struct PVFS_dev_map_desc {
+	void *ptr;
+	__s32 total_size;
+	__s32 size;
+	__s32 count;
+};
+
+/* gossip.h *****************************************************************/
+
+#ifdef GOSSIP_DISABLE_DEBUG
+#define gossip_debug(mask, format, f...) do {} while (0)
+#else
+extern __u64 gossip_debug_mask;
+extern struct client_debug_mask client_debug_mask;
+
+/* try to avoid function call overhead by checking masks in macro */
+#define gossip_debug(mask, format, f...)			\
+do {								\
+	if (gossip_debug_mask & mask)				\
+		printk(format, ##f);				\
+} while (0)
+#endif /* GOSSIP_DISABLE_DEBUG */
+
+/* do file and line number printouts w/ the GNU preprocessor */
+#define gossip_ldebug(mask, format, f...)				\
+		gossip_debug(mask, "%s: " format, __func__, ##f)
+
+#define gossip_err printk
+#define gossip_lerr(format, f...)					\
+		gossip_err("%s line %d: " format,			\
+			   __FILE__,					\
+			   __LINE__,					\
+			   ##f)
diff --git a/fs/orangefs/pvfs2-bufmap.h b/fs/orangefs/pvfs2-bufmap.h
new file mode 100644
index 0000000..e269dea
--- /dev/null
+++ b/fs/orangefs/pvfs2-bufmap.h
@@ -0,0 +1,76 @@
+/*
+ * (C) 2001 Clemson University and The University of Chicago
+ *
+ * See COPYING in top-level directory.
+ */
+
+#ifndef __PVFS2_BUFMAP_H
+#define __PVFS2_BUFMAP_H
+
+/* used to describe mapped buffers */
+struct pvfs_bufmap_desc {
+	void *uaddr;			/* user space address pointer */
+	struct page **page_array;	/* array of mapped pages */
+	int array_count;		/* size of above arrays */
+	struct list_head list_link;
+};
+
+struct pvfs2_bufmap;
+
+struct pvfs2_bufmap *pvfs2_bufmap_ref(void);
+void pvfs2_bufmap_unref(struct pvfs2_bufmap *bufmap);
+
+/*
+ * pvfs_bufmap_size_query is now an inline function because buffer
+ * sizes are not hardcoded
+ */
+int pvfs_bufmap_size_query(void);
+
+int pvfs_bufmap_shift_query(void);
+
+int pvfs_bufmap_initialize(struct PVFS_dev_map_desc *user_desc);
+
+int get_bufmap_init(void);
+
+void pvfs_bufmap_finalize(void);
+
+int pvfs_bufmap_get(struct pvfs2_bufmap **mapp, int *buffer_index);
+
+void pvfs_bufmap_put(struct pvfs2_bufmap *bufmap, int buffer_index);
+
+int readdir_index_get(struct pvfs2_bufmap **mapp, int *buffer_index);
+
+void readdir_index_put(struct pvfs2_bufmap *bufmap, int buffer_index);
+
+int pvfs_bufmap_copy_iovec_from_user(struct pvfs2_bufmap *bufmap,
+				     int buffer_index,
+				     const struct iovec *iov,
+				     unsigned long nr_segs,
+				     size_t size);
+
+int pvfs_bufmap_copy_iovec_from_kernel(struct pvfs2_bufmap *bufmap,
+				       int buffer_index,
+				       const struct iovec *iov,
+				       unsigned long nr_segs,
+				       size_t size);
+
+int pvfs_bufmap_copy_to_user_iovec(struct pvfs2_bufmap *bufmap,
+				   int buffer_index,
+				   const struct iovec *iov,
+				   unsigned long nr_segs,
+				   size_t size);
+
+int pvfs_bufmap_copy_to_kernel_iovec(struct pvfs2_bufmap *bufmap,
+				     int buffer_index,
+				     const struct iovec *iov,
+				     unsigned long nr_segs,
+				     size_t size);
+
+size_t pvfs_bufmap_copy_to_user_task_iovec(struct task_struct *tsk,
+					   struct iovec *iovec,
+					   unsigned long nr_segs,
+					   struct pvfs2_bufmap *bufmap,
+					   int buffer_index,
+					   size_t bytes_to_be_copied);
+
+#endif /* __PVFS2_BUFMAP_H */
diff --git a/fs/orangefs/pvfs2-debug.h b/fs/orangefs/pvfs2-debug.h
new file mode 100644
index 0000000..4c27ad7
--- /dev/null
+++ b/fs/orangefs/pvfs2-debug.h
@@ -0,0 +1,290 @@
+/*
+ * (C) 2001 Clemson University and The University of Chicago
+ *
+ * See COPYING in top-level directory.
+ */
+
+/* This file just defines debugging masks to be used with the gossip
+ * logging utility.  All debugging masks for PVFS2 are kept here to make
+ * sure we don't have collisions.
+ */
+
+#ifndef __PVFS2_DEBUG_H
+#define __PVFS2_DEBUG_H
+
+#ifdef __KERNEL__
+#include <linux/types.h>
+#else
+#include <stdint.h>
+#endif
+
+#define GOSSIP_NO_DEBUG                (__u64)0
+#define GOSSIP_BMI_DEBUG_TCP           ((__u64)1 << 0)
+#define GOSSIP_BMI_DEBUG_CONTROL       ((__u64)1 << 1)
+#define GOSSIP_BMI_DEBUG_OFFSETS       ((__u64)1 << 2)
+#define GOSSIP_BMI_DEBUG_GM            ((__u64)1 << 3)
+#define GOSSIP_JOB_DEBUG               ((__u64)1 << 4)
+#define GOSSIP_SERVER_DEBUG            ((__u64)1 << 5)
+#define GOSSIP_STO_DEBUG_CTRL          ((__u64)1 << 6)
+#define GOSSIP_STO_DEBUG_DEFAULT       ((__u64)1 << 7)
+#define GOSSIP_FLOW_DEBUG              ((__u64)1 << 8)
+#define GOSSIP_BMI_DEBUG_GM_MEM        ((__u64)1 << 9)
+#define GOSSIP_REQUEST_DEBUG           ((__u64)1 << 10)
+#define GOSSIP_FLOW_PROTO_DEBUG        ((__u64)1 << 11)
+#define GOSSIP_NCACHE_DEBUG            ((__u64)1 << 12)
+#define GOSSIP_CLIENT_DEBUG            ((__u64)1 << 13)
+#define GOSSIP_REQ_SCHED_DEBUG         ((__u64)1 << 14)
+#define GOSSIP_ACACHE_DEBUG            ((__u64)1 << 15)
+#define GOSSIP_TROVE_DEBUG             ((__u64)1 << 16)
+#define GOSSIP_TROVE_OP_DEBUG          ((__u64)1 << 17)
+#define GOSSIP_DIST_DEBUG              ((__u64)1 << 18)
+#define GOSSIP_BMI_DEBUG_IB            ((__u64)1 << 19)
+#define GOSSIP_DBPF_ATTRCACHE_DEBUG    ((__u64)1 << 20)
+#define GOSSIP_MMAP_RCACHE_DEBUG       ((__u64)1 << 21)
+#define GOSSIP_LOOKUP_DEBUG            ((__u64)1 << 22)
+#define GOSSIP_REMOVE_DEBUG            ((__u64)1 << 23)
+#define GOSSIP_GETATTR_DEBUG           ((__u64)1 << 24)
+#define GOSSIP_READDIR_DEBUG           ((__u64)1 << 25)
+#define GOSSIP_IO_DEBUG                ((__u64)1 << 26)
+#define GOSSIP_DBPF_OPEN_CACHE_DEBUG   ((__u64)1 << 27)
+#define GOSSIP_PERMISSIONS_DEBUG       ((__u64)1 << 28)
+#define GOSSIP_CANCEL_DEBUG            ((__u64)1 << 29)
+#define GOSSIP_MSGPAIR_DEBUG           ((__u64)1 << 30)
+#define GOSSIP_CLIENTCORE_DEBUG        ((__u64)1 << 31)
+#define GOSSIP_CLIENTCORE_TIMING_DEBUG ((__u64)1 << 32)
+#define GOSSIP_SETATTR_DEBUG           ((__u64)1 << 33)
+#define GOSSIP_MKDIR_DEBUG             ((__u64)1 << 34)
+#define GOSSIP_VARSTRIP_DEBUG          ((__u64)1 << 35)
+#define GOSSIP_GETEATTR_DEBUG          ((__u64)1 << 36)
+#define GOSSIP_SETEATTR_DEBUG          ((__u64)1 << 37)
+#define GOSSIP_ENDECODE_DEBUG          ((__u64)1 << 38)
+#define GOSSIP_DELEATTR_DEBUG          ((__u64)1 << 39)
+#define GOSSIP_ACCESS_DEBUG            ((__u64)1 << 40)
+#define GOSSIP_ACCESS_DETAIL_DEBUG     ((__u64)1 << 41)
+#define GOSSIP_LISTEATTR_DEBUG         ((__u64)1 << 42)
+#define GOSSIP_PERFCOUNTER_DEBUG       ((__u64)1 << 43)
+#define GOSSIP_STATE_MACHINE_DEBUG     ((__u64)1 << 44)
+#define GOSSIP_DBPF_KEYVAL_DEBUG       ((__u64)1 << 45)
+#define GOSSIP_LISTATTR_DEBUG          ((__u64)1 << 46)
+#define GOSSIP_DBPF_COALESCE_DEBUG     ((__u64)1 << 47)
+#define GOSSIP_ACCESS_HOSTNAMES        ((__u64)1 << 48)
+#define GOSSIP_FSCK_DEBUG              ((__u64)1 << 49)
+#define GOSSIP_BMI_DEBUG_MX            ((__u64)1 << 50)
+#define GOSSIP_BSTREAM_DEBUG           ((__u64)1 << 51)
+#define GOSSIP_BMI_DEBUG_PORTALS       ((__u64)1 << 52)
+#define GOSSIP_USER_DEV_DEBUG          ((__u64)1 << 53)
+#define GOSSIP_DIRECTIO_DEBUG          ((__u64)1 << 54)
+#define GOSSIP_MGMT_DEBUG              ((__u64)1 << 55)
+#define GOSSIP_MIRROR_DEBUG            ((__u64)1 << 56)
+#define GOSSIP_WIN_CLIENT_DEBUG        ((__u64)1 << 57)
+#define GOSSIP_SECURITY_DEBUG          ((__u64)1 << 58)
+#define GOSSIP_USRINT_DEBUG            ((__u64)1 << 59)
+#define GOSSIP_RCACHE_DEBUG            ((__u64)1 << 60)
+#define GOSSIP_SECCACHE_DEBUG          ((__u64)1 << 61)
+
+#define GOSSIP_BMI_DEBUG_ALL ((__u64) (GOSSIP_BMI_DEBUG_TCP +	\
+					 GOSSIP_BMI_DEBUG_CONTROL +	\
+					 GOSSIP_BMI_DEBUG_GM +		\
+					 GOSSIP_BMI_DEBUG_OFFSETS +	\
+					 GOSSIP_BMI_DEBUG_IB +		\
+					 GOSSIP_BMI_DEBUG_MX +		\
+					 GOSSIP_BMI_DEBUG_PORTALS))
+
+const char *PVFS_debug_get_next_debug_keyword(int position);
+
+#define GOSSIP_SUPER_DEBUG		((__u64)1 << 0)
+#define GOSSIP_INODE_DEBUG		((__u64)1 << 1)
+#define GOSSIP_FILE_DEBUG		((__u64)1 << 2)
+#define GOSSIP_DIR_DEBUG		((__u64)1 << 3)
+#define GOSSIP_UTILS_DEBUG		((__u64)1 << 4)
+#define GOSSIP_WAIT_DEBUG		((__u64)1 << 5)
+#define GOSSIP_ACL_DEBUG		((__u64)1 << 6)
+#define GOSSIP_DCACHE_DEBUG		((__u64)1 << 7)
+#define GOSSIP_DEV_DEBUG		((__u64)1 << 8)
+#define GOSSIP_NAME_DEBUG		((__u64)1 << 9)
+#define GOSSIP_BUFMAP_DEBUG		((__u64)1 << 10)
+#define GOSSIP_CACHE_DEBUG		((__u64)1 << 11)
+#define GOSSIP_DEBUGFS_DEBUG		((__u64)1 << 12)
+#define GOSSIP_XATTR_DEBUG		((__u64)1 << 13)
+#define GOSSIP_INIT_DEBUG		((__u64)1 << 14)
+#define GOSSIP_SYSFS_DEBUG		((__u64)1 << 15)
+
+#define GOSSIP_MAX_NR                 16
+#define GOSSIP_MAX_DEBUG              (((__u64)1 << GOSSIP_MAX_NR) - 1)
+
+/*function prototypes*/
+__u64 PVFS_kmod_eventlog_to_mask(const char *event_logging);
+__u64 PVFS_debug_eventlog_to_mask(const char *event_logging);
+char *PVFS_debug_mask_to_eventlog(__u64 mask);
+char *PVFS_kmod_mask_to_eventlog(__u64 mask);
+
+/* a private internal type */
+struct __keyword_mask_s {
+	const char *keyword;
+	__u64 mask_val;
+};
+
+#define __DEBUG_ALL ((__u64) -1)
+
+/* map all config keywords to pvfs2 debug masks here */
+static struct __keyword_mask_s s_keyword_mask_map[] = {
+	/* Log trove debugging info.  Same as 'trove'. */
+	{"storage", GOSSIP_TROVE_DEBUG},
+	/* Log trove debugging info.  Same as 'storage'. */
+	{"trove", GOSSIP_TROVE_DEBUG},
+	/* Log trove operations. */
+	{"trove_op", GOSSIP_TROVE_OP_DEBUG},
+	/* Log network debug info. */
+	{"network", GOSSIP_BMI_DEBUG_ALL},
+	/* Log server info, including new operations. */
+	{"server", GOSSIP_SERVER_DEBUG},
+	/* Log client sysint info.  This is only useful for the client. */
+	{"client", GOSSIP_CLIENT_DEBUG},
+	/* Debug the varstrip distribution */
+	{"varstrip", GOSSIP_VARSTRIP_DEBUG},
+	/* Log job info */
+	{"job", GOSSIP_JOB_DEBUG},
+	/* Debug PINT_process_request calls.  EXTREMELY verbose! */
+	{"request", GOSSIP_REQUEST_DEBUG},
+	/* Log request scheduler events */
+	{"reqsched", GOSSIP_REQ_SCHED_DEBUG},
+	/* Log the flow protocol events, including flowproto_multiqueue */
+	{"flowproto", GOSSIP_FLOW_PROTO_DEBUG},
+	/* Log flow calls */
+	{"flow", GOSSIP_FLOW_DEBUG},
+	/* Debug the client name cache.  Only useful on the client. */
+	{"ncache", GOSSIP_NCACHE_DEBUG},
+	/* Debug read-ahead cache events.  Only useful on the client. */
+	{"mmaprcache", GOSSIP_MMAP_RCACHE_DEBUG},
+	/* Debug the attribute cache.  Only useful on the client. */
+	{"acache", GOSSIP_ACACHE_DEBUG},
+	/* Log/Debug distribution calls */
+	{"distribution", GOSSIP_DIST_DEBUG},
+	/* Debug the server-side dbpf attribute cache */
+	{"dbpfattrcache", GOSSIP_DBPF_ATTRCACHE_DEBUG},
+	/* Debug the client lookup state machine. */
+	{"lookup", GOSSIP_LOOKUP_DEBUG},
+	/* Debug the client remove state machine. */
+	{"remove", GOSSIP_REMOVE_DEBUG},
+	/* Debug the server getattr state machine. */
+	{"getattr", GOSSIP_GETATTR_DEBUG},
+	/* Debug the server setattr state machine. */
+	{"setattr", GOSSIP_SETATTR_DEBUG},
+	/* vectored getattr server state machine */
+	{"listattr", GOSSIP_LISTATTR_DEBUG},
+	/* Debug the client and server get ext attributes SM. */
+	{"geteattr", GOSSIP_GETEATTR_DEBUG},
+	/* Debug the client and server set ext attributes SM. */
+	{"seteattr", GOSSIP_SETEATTR_DEBUG},
+	/* Debug the readdir operation (client and server) */
+	{"readdir", GOSSIP_READDIR_DEBUG},
+	/* Debug the mkdir operation (server only) */
+	{"mkdir", GOSSIP_MKDIR_DEBUG},
+	/* Debug the io operation (reads and writes)
+	 * for both the client and server */
+	{"io", GOSSIP_IO_DEBUG},
+	/* Debug the server's open file descriptor cache */
+	{"open_cache", GOSSIP_DBPF_OPEN_CACHE_DEBUG},
+	/* Debug permissions checking on the server */
+	{"permissions", GOSSIP_PERMISSIONS_DEBUG},
+	/* Debug the cancel operation */
+	{"cancel", GOSSIP_CANCEL_DEBUG},
+	/* Debug the msgpair state machine */
+	{"msgpair", GOSSIP_MSGPAIR_DEBUG},
+	/* Debug the client core app */
+	{"clientcore", GOSSIP_CLIENTCORE_DEBUG},
+	/* Debug the client timing state machines (job timeout, etc.) */
+	{"clientcore_timing", GOSSIP_CLIENTCORE_TIMING_DEBUG},
+	/* network encoding */
+	{"endecode", GOSSIP_ENDECODE_DEBUG},
+	/* Show server file (metadata) accesses (both modify and read-only). */
+	{"access", GOSSIP_ACCESS_DEBUG},
+	/* Show more detailed server file accesses */
+	{"access_detail", GOSSIP_ACCESS_DETAIL_DEBUG},
+	/* Debug the listeattr operation */
+	{"listeattr", GOSSIP_LISTEATTR_DEBUG},
+	/* Debug the state machine management code */
+	{"sm", GOSSIP_STATE_MACHINE_DEBUG},
+	/* Debug the metadata dbpf keyval functions */
+	{"keyval", GOSSIP_DBPF_KEYVAL_DEBUG},
+	/* Debug the metadata sync coalescing code */
+	{"coalesce", GOSSIP_DBPF_COALESCE_DEBUG},
+	/* Display the hostnames instead of IP addrs in debug output */
+	{"access_hostnames", GOSSIP_ACCESS_HOSTNAMES},
+	/* Show the client device events */
+	{"user_dev", GOSSIP_USER_DEV_DEBUG},
+	/* Debug the fsck tool */
+	{"fsck", GOSSIP_FSCK_DEBUG},
+	/* Debug the bstream code */
+	{"bstream", GOSSIP_BSTREAM_DEBUG},
+	/* Debug trove in direct io mode */
+	{"directio", GOSSIP_DIRECTIO_DEBUG},
+	/* Debug direct io thread management */
+	{"mgmt", GOSSIP_MGMT_DEBUG},
+	/* Debug mirroring process */
+	{"mirror", GOSSIP_MIRROR_DEBUG},
+	/* Windows client */
+	{"win_client", GOSSIP_WIN_CLIENT_DEBUG},
+	/* Debug robust security code */
+	{"security", GOSSIP_SECURITY_DEBUG},
+	/* Capability Cache */
+	{"seccache", GOSSIP_SECCACHE_DEBUG},
+	/* Client User Interface */
+	{"usrint", GOSSIP_USRINT_DEBUG},
+	/* rcache */
+	{"rcache", GOSSIP_RCACHE_DEBUG},
+	/* Everything except the periodic events.  Useful for debugging */
+	{"verbose",
+	 (__DEBUG_ALL &
+	  ~(GOSSIP_PERFCOUNTER_DEBUG | GOSSIP_STATE_MACHINE_DEBUG |
+	    GOSSIP_ENDECODE_DEBUG | GOSSIP_USER_DEV_DEBUG))
+	 },
+	/* No debug output */
+	{"none", GOSSIP_NO_DEBUG},
+	/* Everything */
+	{"all", __DEBUG_ALL}
+};
+
+#undef __DEBUG_ALL
+
+/*
+ * Map all kmod keywords to kmod debug masks here. Keep this
+ * structure "packed":
+ *
+ *   "all" is always last...
+ *
+ *   keyword     mask_val     index
+ *     foo          1           0
+ *     bar          2           1
+ *     baz          4           2
+ *     qux          8           3
+ *      .           .           .
+ */
+static struct __keyword_mask_s s_kmod_keyword_mask_map[] = {
+	{"super", GOSSIP_SUPER_DEBUG},
+	{"inode", GOSSIP_INODE_DEBUG},
+	{"file", GOSSIP_FILE_DEBUG},
+	{"dir", GOSSIP_DIR_DEBUG},
+	{"utils", GOSSIP_UTILS_DEBUG},
+	{"wait", GOSSIP_WAIT_DEBUG},
+	{"acl", GOSSIP_ACL_DEBUG},
+	{"dcache", GOSSIP_DCACHE_DEBUG},
+	{"dev", GOSSIP_DEV_DEBUG},
+	{"name", GOSSIP_NAME_DEBUG},
+	{"bufmap", GOSSIP_BUFMAP_DEBUG},
+	{"cache", GOSSIP_CACHE_DEBUG},
+	{"debugfs", GOSSIP_DEBUGFS_DEBUG},
+	{"xattr", GOSSIP_XATTR_DEBUG},
+	{"init", GOSSIP_INIT_DEBUG},
+	{"sysfs", GOSSIP_SYSFS_DEBUG},
+	{"none", GOSSIP_NO_DEBUG},
+	{"all", GOSSIP_MAX_DEBUG}
+};
+
+static const int num_kmod_keyword_mask_map = (int)
+	(sizeof(s_kmod_keyword_mask_map) / sizeof(struct __keyword_mask_s));
+
+static const int num_keyword_mask_map = (int)
+	(sizeof(s_keyword_mask_map) / sizeof(struct __keyword_mask_s));
+
+#endif /* __PVFS2_DEBUG_H */
diff --git a/fs/orangefs/pvfs2-debugfs.h b/fs/orangefs/pvfs2-debugfs.h
new file mode 100644
index 0000000..a66b7d0
--- /dev/null
+++ b/fs/orangefs/pvfs2-debugfs.h
@@ -0,0 +1,3 @@
+int pvfs2_debugfs_init(void);
+int pvfs2_kernel_debug_init(void);
+void pvfs2_debugfs_cleanup(void);
diff --git a/fs/orangefs/pvfs2-dev-proto.h b/fs/orangefs/pvfs2-dev-proto.h
new file mode 100644
index 0000000..9c82e6e
--- /dev/null
+++ b/fs/orangefs/pvfs2-dev-proto.h
@@ -0,0 +1,102 @@
+/*
+ * (C) 2001 Clemson University and The University of Chicago
+ *
+ * See COPYING in top-level directory.
+ */
+
+#ifndef _PVFS2_DEV_PROTO_H
+#define _PVFS2_DEV_PROTO_H
+
+/*
+ * types and constants shared between user space and kernel space for
+ * device interaction using a common protocol
+ */
+
+/*
+ * valid pvfs2 kernel operation types
+ */
+#define PVFS2_VFS_OP_INVALID           0xFF000000
+#define PVFS2_VFS_OP_FILE_IO           0xFF000001
+#define PVFS2_VFS_OP_LOOKUP            0xFF000002
+#define PVFS2_VFS_OP_CREATE            0xFF000003
+#define PVFS2_VFS_OP_GETATTR           0xFF000004
+#define PVFS2_VFS_OP_REMOVE            0xFF000005
+#define PVFS2_VFS_OP_MKDIR             0xFF000006
+#define PVFS2_VFS_OP_READDIR           0xFF000007
+#define PVFS2_VFS_OP_SETATTR           0xFF000008
+#define PVFS2_VFS_OP_SYMLINK           0xFF000009
+#define PVFS2_VFS_OP_RENAME            0xFF00000A
+#define PVFS2_VFS_OP_STATFS            0xFF00000B
+#define PVFS2_VFS_OP_TRUNCATE          0xFF00000C
+#define PVFS2_VFS_OP_MMAP_RA_FLUSH     0xFF00000D
+#define PVFS2_VFS_OP_FS_MOUNT          0xFF00000E
+#define PVFS2_VFS_OP_FS_UMOUNT         0xFF00000F
+#define PVFS2_VFS_OP_GETXATTR          0xFF000010
+#define PVFS2_VFS_OP_SETXATTR          0xFF000011
+#define PVFS2_VFS_OP_LISTXATTR         0xFF000012
+#define PVFS2_VFS_OP_REMOVEXATTR       0xFF000013
+#define PVFS2_VFS_OP_PARAM             0xFF000014
+#define PVFS2_VFS_OP_PERF_COUNT        0xFF000015
+#define PVFS2_VFS_OP_CANCEL            0xFF00EE00
+#define PVFS2_VFS_OP_FSYNC             0xFF00EE01
+#define PVFS2_VFS_OP_FSKEY             0xFF00EE02
+#define PVFS2_VFS_OP_READDIRPLUS       0xFF00EE03
+#define PVFS2_VFS_OP_FILE_IOX          0xFF00EE04
+
+/*
+ * Misc constants. Please retain them as multiples of 8!
+ * Otherwise 32-64 bit interactions will be messed up :)
+ */
+#define PVFS2_NAME_LEN			0x00000100
+#define PVFS2_MAX_DEBUG_STRING_LEN	0x00000400
+#define PVFS2_MAX_DEBUG_ARRAY_LEN	0x00000800
+
+/*
+ * MAX_DIRENT_COUNT cannot be larger than PVFS_REQ_LIMIT_LISTATTR.
+ * The value of PVFS_REQ_LIMIT_LISTATTR has been changed from 113 to 60
+ * to accommodate an attribute object with mirrored handles.
+ * MAX_DIRENT_COUNT is replaced by MAX_DIRENT_COUNT_READDIR and
+ * MAX_DIRENT_COUNT_READDIRPLUS, since readdir doesn't trigger a listattr
+ * but readdirplus might.
+*/
+#define MAX_DIRENT_COUNT_READDIR       0x00000060
+#define MAX_DIRENT_COUNT_READDIRPLUS   0x0000003C
+
+#include "upcall.h"
+#include "downcall.h"
+
+/*
+ * These macros differ from proto macros in that they don't do any
+ * byte-swappings and are used to ensure that kernel-clientcore interactions
+ * don't cause any unaligned accesses etc on 64 bit machines
+ */
+#ifndef roundup4
+#define roundup4(x) (((x)+3) & ~3)
+#endif
+
+#ifndef roundup8
+#define roundup8(x) (((x)+7) & ~7)
+#endif
+
+/* strings; decoding just points into existing character data */
+#define enc_string(pptr, pbuf) do { \
+	__u32 len = strlen(*pbuf); \
+	*(__u32 *) *(pptr) = (len); \
+	memcpy(*(pptr)+4, *pbuf, len+1); \
+	*(pptr) += roundup8(4 + len + 1); \
+} while (0)
+
+#define dec_string(pptr, pbuf, plen) do { \
+	__u32 len = (*(__u32 *) *(pptr)); \
+	*pbuf = *(pptr) + 4; \
+	*(pptr) += roundup8(4 + len + 1); \
+	if (plen) \
+		*plen = len;\
+} while (0)
+
+struct read_write_x {
+	__s64 off;
+	__s64 len;
+};
+
+#endif
diff --git a/fs/orangefs/pvfs2-kernel.h b/fs/orangefs/pvfs2-kernel.h
new file mode 100644
index 0000000..6c787c4
--- /dev/null
+++ b/fs/orangefs/pvfs2-kernel.h
@@ -0,0 +1,864 @@
+/*
+ * (C) 2001 Clemson University and The University of Chicago
+ *
+ * See COPYING in top-level directory.
+ */
+
+/*
+ *  The PVFS2 Linux kernel support allows PVFS2 volumes to be mounted and
+ *  accessed through the Linux VFS (i.e. using standard I/O system calls).
+ *  This support is only needed on clients that wish to mount the file system.
+ *
+ */
+
+/*
+ *  Declarations and macros for the PVFS2 Linux kernel support.
+ */
+
+#ifndef __PVFS2KERNEL_H
+#define __PVFS2KERNEL_H
+
+#include <linux/kernel.h>
+#include <linux/moduleparam.h>
+#include <linux/statfs.h>
+#include <linux/backing-dev.h>
+#include <linux/device.h>
+#include <linux/mpage.h>
+#include <linux/namei.h>
+#include <linux/errno.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+#include <linux/fs.h>
+#include <linux/vmalloc.h>
+
+#include <linux/aio.h>
+#include <linux/posix_acl.h>
+#include <linux/posix_acl_xattr.h>
+#include <linux/compat.h>
+#include <linux/mount.h>
+#include <linux/uaccess.h>
+#include <linux/atomic.h>
+#include <linux/uio.h>
+#include <linux/sched.h>
+#include <linux/mm.h>
+#include <linux/wait.h>
+#include <linux/dcache.h>
+#include <linux/pagemap.h>
+#include <linux/poll.h>
+#include <linux/rwsem.h>
+#include <linux/xattr.h>
+#include <linux/exportfs.h>
+
+#include <asm/unaligned.h>
+
+#include "pvfs2-dev-proto.h"
+
+#ifdef PVFS2_KERNEL_DEBUG
+#define PVFS2_DEFAULT_OP_TIMEOUT_SECS       10
+#else
+#define PVFS2_DEFAULT_OP_TIMEOUT_SECS       20
+#endif
+
+#define PVFS2_BUFMAP_WAIT_TIMEOUT_SECS      30
+
+#define PVFS2_DEFAULT_SLOT_TIMEOUT_SECS     900	/* 15 minutes */
+
+#define PVFS2_REQDEVICE_NAME          "pvfs2-req"
+
+#define PVFS2_DEVREQ_MAGIC             0x20030529
+#define PVFS2_LINK_MAX                 0x000000FF
+#define PVFS2_PURGE_RETRY_COUNT        0x00000005
+#define PVFS2_SEEK_END                 0x00000002
+#define PVFS2_MAX_NUM_OPTIONS          0x00000004
+#define PVFS2_MAX_MOUNT_OPT_LEN        0x00000080
+#define PVFS2_MAX_FSKEY_LEN            64
+
+#define MAX_DEV_REQ_UPSIZE (2*sizeof(__s32) +   \
+sizeof(__u64) + sizeof(struct pvfs2_upcall_s))
+#define MAX_DEV_REQ_DOWNSIZE (2*sizeof(__s32) + \
+sizeof(__u64) + sizeof(struct pvfs2_downcall_s))
+
+#define BITS_PER_LONG_DIV_8 (BITS_PER_LONG >> 3)
+
+/* borrowed from irda.h */
+#ifndef MSECS_TO_JIFFIES
+#define MSECS_TO_JIFFIES(ms) (((ms)*HZ+999)/1000)
+#endif
+
+#define MAX_ALIGNED_DEV_REQ_UPSIZE				\
+		(MAX_DEV_REQ_UPSIZE +				\
+			((((MAX_DEV_REQ_UPSIZE /		\
+				(BITS_PER_LONG_DIV_8)) *	\
+				(BITS_PER_LONG_DIV_8)) +	\
+			    (BITS_PER_LONG_DIV_8)) -		\
+			MAX_DEV_REQ_UPSIZE))
+
+#define MAX_ALIGNED_DEV_REQ_DOWNSIZE				\
+		(MAX_DEV_REQ_DOWNSIZE +				\
+			((((MAX_DEV_REQ_DOWNSIZE /		\
+				(BITS_PER_LONG_DIV_8)) *	\
+				(BITS_PER_LONG_DIV_8)) +	\
+			    (BITS_PER_LONG_DIV_8)) -		\
+			MAX_DEV_REQ_DOWNSIZE))
+
+/*
+ * valid pvfs2 kernel operation states
+ *
+ * unknown  - op was just initialized
+ * waiting  - op is on request_list (upward bound)
+ * inprogr  - op is in progress (waiting for downcall)
+ * serviced - op has matching downcall; ok
+ * purged   - op has to start a timer since client-core
+ *            exited uncleanly before servicing op
+ */
+enum pvfs2_vfs_op_states {
+	OP_VFS_STATE_UNKNOWN = 0,
+	OP_VFS_STATE_WAITING = 1,
+	OP_VFS_STATE_INPROGR = 2,
+	OP_VFS_STATE_SERVICED = 4,
+	OP_VFS_STATE_PURGED = 8,
+};
+
+#define set_op_state_waiting(op)     ((op)->op_state = OP_VFS_STATE_WAITING)
+#define set_op_state_inprogress(op)  ((op)->op_state = OP_VFS_STATE_INPROGR)
+#define set_op_state_serviced(op)    ((op)->op_state = OP_VFS_STATE_SERVICED)
+#define set_op_state_purged(op)      ((op)->op_state |= OP_VFS_STATE_PURGED)
+
+#define op_state_waiting(op)     ((op)->op_state & OP_VFS_STATE_WAITING)
+#define op_state_in_progress(op) ((op)->op_state & OP_VFS_STATE_INPROGR)
+#define op_state_serviced(op)    ((op)->op_state & OP_VFS_STATE_SERVICED)
+#define op_state_purged(op)      ((op)->op_state & OP_VFS_STATE_PURGED)
+
+#define get_op(op)					\
+	do {						\
+		atomic_inc(&(op)->aio_ref_count);	\
+		gossip_debug(GOSSIP_DEV_DEBUG,	\
+			"(get) Alloced OP (%p:%llu)\n",	\
+			op,				\
+			llu((op)->tag));		\
+	} while (0)
+
+#define put_op(op)							\
+	do {								\
+		if (atomic_sub_and_test(1, &(op)->aio_ref_count) == 1) {  \
+			gossip_debug(GOSSIP_DEV_DEBUG,		\
+				"(put) Releasing OP (%p:%llu)\n",	\
+				op,					\
+				llu((op)->tag));			\
+			op_release(op);					\
+			}						\
+	} while (0)
+
+#define op_wait(op) (atomic_read(&(op)->aio_ref_count) <= 2 ? 0 : 1)
+
+/*
+ * Defines for controlling whether I/O upcalls are for async or sync operations
+ */
+enum PVFS_async_io_type {
+	PVFS_VFS_SYNC_IO = 0,
+	PVFS_VFS_ASYNC_IO = 1,
+};
+
+/*
+ * An array of client_debug_mask will be built to hold debug keyword/mask
+ * values fetched from userspace.
+ */
+struct client_debug_mask {
+	char *keyword;
+	__u64 mask1;
+	__u64 mask2;
+};
+
+/*
+ * pvfs2 kernel memory related flags
+ */
+
+#if ((defined PVFS2_KERNEL_DEBUG) && (defined CONFIG_DEBUG_SLAB))
+#define PVFS2_CACHE_CREATE_FLAGS SLAB_RED_ZONE
+#else
+#define PVFS2_CACHE_CREATE_FLAGS 0
+#endif /* ((defined PVFS2_KERNEL_DEBUG) && (defined CONFIG_DEBUG_SLAB)) */
+
+#define PVFS2_CACHE_ALLOC_FLAGS (GFP_KERNEL)
+#define PVFS2_GFP_FLAGS (GFP_KERNEL)
+#define PVFS2_BUFMAP_GFP_FLAGS (GFP_KERNEL)
+
+#define pvfs2_kmap(page) kmap(page)
+#define pvfs2_kunmap(page) kunmap(page)
+
+/* pvfs2 xattr and acl related defines */
+#define PVFS2_XATTR_INDEX_POSIX_ACL_ACCESS  1
+#define PVFS2_XATTR_INDEX_POSIX_ACL_DEFAULT 2
+#define PVFS2_XATTR_INDEX_TRUSTED           3
+#define PVFS2_XATTR_INDEX_DEFAULT           4
+
+#if 0
+#ifndef POSIX_ACL_XATTR_ACCESS
+#define POSIX_ACL_XATTR_ACCESS	"system.posix_acl_access"
+#endif
+#ifndef POSIX_ACL_XATTR_DEFAULT
+#define POSIX_ACL_XATTR_DEFAULT	"system.posix_acl_default"
+#endif
+#endif
+
+#define PVFS2_XATTR_NAME_ACL_ACCESS  POSIX_ACL_XATTR_ACCESS
+#define PVFS2_XATTR_NAME_ACL_DEFAULT POSIX_ACL_XATTR_DEFAULT
+#define PVFS2_XATTR_NAME_TRUSTED_PREFIX "trusted."
+#define PVFS2_XATTR_NAME_DEFAULT_PREFIX ""
+
+/* these functions are defined in pvfs2-utils.c */
+int orangefs_prepare_cdm_array(char *debug_array_string);
+int orangefs_prepare_debugfs_help_string(int);
+
+/* defined in pvfs2-debugfs.c */
+int pvfs2_client_debug_init(void);
+
+void debug_string_to_mask(char *, void *, int);
+void do_c_mask(int, char *, struct client_debug_mask **);
+void do_k_mask(int, char *, __u64 **);
+
+void debug_mask_to_string(void *, int);
+void do_k_string(void *, int);
+void do_c_string(void *, int);
+int check_amalgam_keyword(void *, int);
+int keyword_is_amalgam(char *);
+
+/* these variables are defined in pvfs2-mod.c */
+extern char kernel_debug_string[PVFS2_MAX_DEBUG_STRING_LEN];
+extern char client_debug_string[PVFS2_MAX_DEBUG_STRING_LEN];
+extern char client_debug_array_string[PVFS2_MAX_DEBUG_STRING_LEN];
+/* HELLO
+extern struct client_debug_mask current_client_mask;
+*/
+extern unsigned int kernel_mask_set_mod_init;
+
+extern int pvfs2_init_acl(struct inode *inode, struct inode *dir);
+extern const struct xattr_handler *pvfs2_xattr_handlers[];
+
+extern struct posix_acl *pvfs2_get_acl(struct inode *inode, int type);
+extern int pvfs2_set_acl(struct inode *inode, struct posix_acl *acl, int type);
+
+int pvfs2_xattr_set_default(struct dentry *dentry,
+			    const char *name,
+			    const void *buffer,
+			    size_t size,
+			    int flags,
+			    int handler_flags);
+
+int pvfs2_xattr_get_default(struct dentry *dentry,
+			    const char *name,
+			    void *buffer,
+			    size_t size,
+			    int handler_flags);
+
+/*
+ * Redefine the xtvec structure so that we can move helper functions out of
+ * the define
+ */
+struct xtvec {
+	__kernel_off_t xtv_off;		/* must be off_t */
+	__kernel_size_t xtv_len;	/* must be size_t */
+};
+
+/*
+ * pvfs2 data structures
+ */
+struct pvfs2_kernel_op_s {
+	enum pvfs2_vfs_op_states op_state;
+	__u64 tag;
+
+	/*
+	 * Set uses_shared_memory to 1 if this operation uses shared memory.
+	 * If true, then a retry on the op must also get a new shared memory
+	 * buffer and re-populate it.
+	 */
+	int uses_shared_memory;
+
+	struct pvfs2_upcall_s upcall;
+	struct pvfs2_downcall_s downcall;
+
+	wait_queue_head_t waitq;
+	spinlock_t lock;
+
+	int io_completed;
+	wait_queue_head_t io_completion_waitq;
+
+	/*
+	 * upcalls requiring variable length trailers require that this struct
+	 * be in the request list even after client-core does a read() on the
+	 * device to dequeue the upcall.
+	 * If the op_linger field goes to 0 we dequeue this op off the list;
+	 * otherwise we let it stay. What gets passed to the read() is:
+	 * a) if the op_linger field is 1, the pvfs2_kernel_op_s itself;
+	 * b) if it is 0, we pass ->upcall.trailer_buf.
+	 * We expect to have only a single upcall trailer buffer, so callers
+	 * with trailers are expected to set this field to 2 and all others
+	 * to set it to 1.
+	 */
+	__s32 op_linger, op_linger_tmp;
+	/* VFS aio fields */
+
+	/* used by the async I/O code to stash the pvfs2_kiocb_s structure */
+	void *priv;
+
+	/* used again for the async I/O code for deallocation */
+	atomic_t aio_ref_count;
+
+	int attempts;
+
+	struct list_head list;
+};
+
+/* per inode private pvfs2 info */
+struct pvfs2_inode_s {
+	struct pvfs2_object_kref refn;
+	char link_target[PVFS_NAME_MAX];
+	__s64 blksize;
+	/*
+	 * Reading/writing extended attributes needs to acquire the appropriate
+	 * reader/writer semaphore on the pvfs2_inode_s structure.
+	 */
+	struct rw_semaphore xattr_sem;
+
+	struct inode vfs_inode;
+	sector_t last_failed_block_index_read;
+
+	/*
+	 * State of the in-memory attributes associated with this object that
+	 * have not yet been flushed to disk.
+	 */
+	unsigned long pinode_flags;
+
+	/* All allocated pvfs2_inode_s objects are chained to a list */
+	struct list_head list;
+};
+
+#define P_ATIME_FLAG 0
+#define P_MTIME_FLAG 1
+#define P_CTIME_FLAG 2
+#define P_MODE_FLAG  3
+
+#define ClearAtimeFlag(pinode) clear_bit(P_ATIME_FLAG, &(pinode)->pinode_flags)
+#define SetAtimeFlag(pinode)   set_bit(P_ATIME_FLAG, &(pinode)->pinode_flags)
+#define AtimeFlag(pinode)      test_bit(P_ATIME_FLAG, &(pinode)->pinode_flags)
+
+#define ClearMtimeFlag(pinode) clear_bit(P_MTIME_FLAG, &(pinode)->pinode_flags)
+#define SetMtimeFlag(pinode)   set_bit(P_MTIME_FLAG, &(pinode)->pinode_flags)
+#define MtimeFlag(pinode)      test_bit(P_MTIME_FLAG, &(pinode)->pinode_flags)
+
+#define ClearCtimeFlag(pinode) clear_bit(P_CTIME_FLAG, &(pinode)->pinode_flags)
+#define SetCtimeFlag(pinode)   set_bit(P_CTIME_FLAG, &(pinode)->pinode_flags)
+#define CtimeFlag(pinode)      test_bit(P_CTIME_FLAG, &(pinode)->pinode_flags)
+
+#define ClearModeFlag(pinode) clear_bit(P_MODE_FLAG, &(pinode)->pinode_flags)
+#define SetModeFlag(pinode)   set_bit(P_MODE_FLAG, &(pinode)->pinode_flags)
+#define ModeFlag(pinode)      test_bit(P_MODE_FLAG, &(pinode)->pinode_flags)
+
+/* per superblock private pvfs2 info */
+struct pvfs2_sb_info_s {
+	struct pvfs2_khandle root_khandle;
+	__s32 fs_id;
+	int id;
+	int flags;
+#define PVFS2_OPT_INTR		0x01
+#define PVFS2_OPT_LOCAL_LOCK	0x02
+	char devname[PVFS_MAX_SERVER_ADDR_LEN];
+	struct super_block *sb;
+	int mount_pending;
+	struct list_head list;
+};
+
+/*
+ * A temporary structure used only at superblock mount time; it groups the
+ * mount-time data provided with a private superblock structure that is
+ * allocated before the 'kernel' superblock is allocated.
+ */
+struct pvfs2_mount_sb_info_s {
+	void *data;
+	struct pvfs2_khandle root_khandle;
+	__s32 fs_id;
+	int id;
+};
+
+/*
+ * structure that holds the state of any async I/O operation issued
+ * through the VFS. Needed especially to handle cancellation requests
+ * or even completion notification so that the VFS client-side daemon
+ * can free up its vfs_request slots.
+ */
+struct pvfs2_kiocb_s {
+	/* the pointer to the task that initiated the AIO */
+	struct task_struct *tsk;
+
+	/* pointer to the kiocb that kicked this operation */
+	struct kiocb *kiocb;
+
+	/* buffer index that was used for the I/O */
+	struct pvfs2_bufmap *bufmap;
+	int buffer_index;
+
+	/* pvfs2 kernel operation type */
+	struct pvfs2_kernel_op_s *op;
+
+	/* The user space buffers from/to which I/O is being staged */
+	struct iovec *iov;
+
+	/* number of elements in the iovector */
+	unsigned long nr_segs;
+
+	/* set to indicate the type of the operation */
+	int rw;
+
+	/* file offset */
+	loff_t offset;
+
+	/* and the count in bytes */
+	size_t bytes_to_be_copied;
+
+	ssize_t bytes_copied;
+	int needs_cleanup;
+};
+
+struct pvfs2_stats {
+	unsigned long cache_hits;
+	unsigned long cache_misses;
+	unsigned long reads;
+	unsigned long writes;
+};
+
+extern struct pvfs2_stats g_pvfs2_stats;
+
+/*
+ * NOTE: See Documentation/filesystems/porting for information
+ * on implementing FOO_I and properly accessing fs private data.
+ */
+static inline struct pvfs2_inode_s *PVFS2_I(struct inode *inode)
+{
+	return container_of(inode, struct pvfs2_inode_s, vfs_inode);
+}
+
+static inline struct pvfs2_sb_info_s *PVFS2_SB(struct super_block *sb)
+{
+	return (struct pvfs2_sb_info_s *) sb->s_fs_info;
+}
+
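+/*
+ * The 16-byte khandle is folded into a 64-bit inode number by XORing
+ * pairs of its bytes (see the byte-by-byte assignments below), so two
+ * distinct khandles can, in principle, map to the same ino_t value.
+ */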
+/* ino_t is derived from "unsigned long": 64 bits on 64-bit builds. */
+static inline ino_t pvfs2_khandle_to_ino(struct pvfs2_khandle *khandle)
+{
+	union {
+		unsigned char u[8];
+		__u64 ino;
+	} ihandle;
+
+	ihandle.u[0] = khandle->u[0] ^ khandle->u[4];
+	ihandle.u[1] = khandle->u[1] ^ khandle->u[5];
+	ihandle.u[2] = khandle->u[2] ^ khandle->u[6];
+	ihandle.u[3] = khandle->u[3] ^ khandle->u[7];
+	ihandle.u[4] = khandle->u[12] ^ khandle->u[8];
+	ihandle.u[5] = khandle->u[13] ^ khandle->u[9];
+	ihandle.u[6] = khandle->u[14] ^ khandle->u[10];
+	ihandle.u[7] = khandle->u[15] ^ khandle->u[11];
+
+	return ihandle.ino;
+}
+
+static inline struct pvfs2_khandle *get_khandle_from_ino(struct inode *inode)
+{
+	return &(PVFS2_I(inode)->refn.khandle);
+}
+
+static inline __s32 get_fsid_from_ino(struct inode *inode)
+{
+	return PVFS2_I(inode)->refn.fs_id;
+}
+
+static inline ino_t get_ino_from_khandle(struct inode *inode)
+{
+	struct pvfs2_khandle *khandle;
+	ino_t ino;
+
+	khandle = get_khandle_from_ino(inode);
+	ino = pvfs2_khandle_to_ino(khandle);
+	return ino;
+}
+
+static inline ino_t get_parent_ino_from_dentry(struct dentry *dentry)
+{
+	return get_ino_from_khandle(dentry->d_parent->d_inode);
+}
+
+static inline int is_root_handle(struct inode *inode)
+{
+	gossip_debug(GOSSIP_DCACHE_DEBUG,
+		     "%s: root handle: %pU, this handle: %pU:\n",
+		     __func__,
+		     &PVFS2_SB(inode->i_sb)->root_khandle,
+		     get_khandle_from_ino(inode));
+
+	if (PVFS_khandle_cmp(&(PVFS2_SB(inode->i_sb)->root_khandle),
+			     get_khandle_from_ino(inode)))
+		return 0;
+	else
+		return 1;
+}
+
+static inline int match_handle(struct pvfs2_khandle resp_handle,
+			       struct inode *inode)
+{
+	gossip_debug(GOSSIP_DCACHE_DEBUG,
+		     "%s: one handle: %pU, another handle:%pU:\n",
+		     __func__,
+		     &resp_handle,
+		     get_khandle_from_ino(inode));
+
+	if (PVFS_khandle_cmp(&resp_handle, get_khandle_from_ino(inode)))
+		return 0;
+	else
+		return 1;
+}
+
+/*
+ * defined in pvfs2-cache.c
+ */
+int op_cache_initialize(void);
+int op_cache_finalize(void);
+struct pvfs2_kernel_op_s *op_alloc(__s32 type);
+struct pvfs2_kernel_op_s *op_alloc_trailer(__s32 type);
+char *get_opname_string(struct pvfs2_kernel_op_s *new_op);
+void op_release(struct pvfs2_kernel_op_s *op);
+
+int dev_req_cache_initialize(void);
+int dev_req_cache_finalize(void);
+void *dev_req_alloc(void);
+void dev_req_release(void *);
+
+int pvfs2_inode_cache_initialize(void);
+int pvfs2_inode_cache_finalize(void);
+
+int kiocb_cache_initialize(void);
+int kiocb_cache_finalize(void);
+struct pvfs2_kiocb_s *kiocb_alloc(void);
+void kiocb_release(struct pvfs2_kiocb_s *ptr);
+
+/*
+ * defined in pvfs2-mod.c
+ */
+void purge_inprogress_ops(void);
+
+/*
+ * defined in waitqueue.c
+ */
+int wait_for_matching_downcall(struct pvfs2_kernel_op_s *op);
+int wait_for_cancellation_downcall(struct pvfs2_kernel_op_s *op);
+void pvfs2_clean_up_interrupted_operation(struct pvfs2_kernel_op_s *op);
+void purge_waiting_ops(void);
+
+/*
+ * defined in super.c
+ */
+struct dentry *pvfs2_mount(struct file_system_type *fst,
+			   int flags,
+			   const char *devname,
+			   void *data);
+
+void pvfs2_kill_sb(struct super_block *sb);
+int pvfs2_remount(struct super_block *sb);
+
+int fsid_key_table_initialize(void);
+void fsid_key_table_finalize(void);
+
+/*
+ * defined in inode.c
+ */
+__u32 convert_to_pvfs2_mask(unsigned long lite_mask);
+struct inode *pvfs2_new_inode(struct super_block *sb,
+			      struct inode *dir,
+			      int mode,
+			      dev_t dev,
+			      struct pvfs2_object_kref *ref);
+
+int pvfs2_setattr(struct dentry *dentry, struct iattr *iattr);
+
+int pvfs2_getattr(struct vfsmount *mnt,
+		  struct dentry *dentry,
+		  struct kstat *kstat);
+
+/*
+ * defined in xattr.c
+ */
+int pvfs2_setxattr(struct dentry *dentry,
+		   const char *name,
+		   const void *value,
+		   size_t size,
+		   int flags);
+
+ssize_t pvfs2_getxattr(struct dentry *dentry,
+		       const char *name,
+		       void *buffer,
+		       size_t size);
+
+ssize_t pvfs2_listxattr(struct dentry *dentry, char *buffer, size_t size);
+
+/*
+ * defined in namei.c
+ */
+struct inode *pvfs2_iget(struct super_block *sb,
+			 struct pvfs2_object_kref *ref);
+
+ssize_t pvfs2_inode_read(struct inode *inode,
+			 char *buf,
+			 size_t count,
+			 loff_t *offset,
+			 loff_t readahead_size);
+
+/*
+ * defined in devpvfs2-req.c
+ */
+int pvfs2_dev_init(void);
+void pvfs2_dev_cleanup(void);
+int is_daemon_in_service(void);
+int fs_mount_pending(__s32 fsid);
+
+/*
+ * defined in pvfs2-utils.c
+ */
+__s32 fsid_of_op(struct pvfs2_kernel_op_s *op);
+
+int pvfs2_flush_inode(struct inode *inode);
+
+ssize_t pvfs2_inode_getxattr(struct inode *inode,
+			     const char *prefix,
+			     const char *name,
+			     void *buffer,
+			     size_t size);
+
+int pvfs2_inode_setxattr(struct inode *inode,
+			 const char *prefix,
+			 const char *name,
+			 const void *value,
+			 size_t size,
+			 int flags);
+
+int pvfs2_inode_getattr(struct inode *inode, __u32 mask);
+
+int pvfs2_inode_setattr(struct inode *inode, struct iattr *iattr);
+
+void pvfs2_op_initialize(struct pvfs2_kernel_op_s *op);
+
+void pvfs2_make_bad_inode(struct inode *inode);
+
+void mask_blocked_signals(sigset_t *orig_sigset);
+
+void unmask_blocked_signals(sigset_t *orig_sigset);
+
+int pvfs2_unmount_sb(struct super_block *sb);
+
+int pvfs2_cancel_op_in_progress(__u64 tag);
+
+__u64 pvfs2_convert_time_field(void *time_ptr);
+
+int pvfs2_normalize_to_errno(__s32 error_code);
+
+extern struct mutex devreq_mutex;
+extern struct mutex request_mutex;
+extern int debug;
+extern int op_timeout_secs;
+extern int slot_timeout_secs;
+extern struct list_head pvfs2_superblocks;
+extern spinlock_t pvfs2_superblocks_lock;
+extern struct list_head pvfs2_request_list;
+extern spinlock_t pvfs2_request_list_lock;
+extern wait_queue_head_t pvfs2_request_list_waitq;
+extern struct list_head *htable_ops_in_progress;
+extern spinlock_t htable_ops_in_progress_lock;
+extern int hash_table_size;
+
+extern const struct address_space_operations pvfs2_address_operations;
+extern struct backing_dev_info pvfs2_backing_dev_info;
+extern struct inode_operations pvfs2_file_inode_operations;
+extern const struct file_operations pvfs2_file_operations;
+extern struct inode_operations pvfs2_symlink_inode_operations;
+extern struct inode_operations pvfs2_dir_inode_operations;
+extern const struct file_operations pvfs2_dir_operations;
+extern const struct dentry_operations pvfs2_dentry_operations;
+extern const struct file_operations pvfs2_devreq_file_operations;
+
+extern wait_queue_head_t pvfs2_bufmap_init_waitq;
+
+/*
+ * misc convenience macros
+ */
+#define add_op_to_request_list(op)				\
+do {								\
+	spin_lock(&pvfs2_request_list_lock);			\
+	spin_lock(&op->lock);					\
+	set_op_state_waiting(op);				\
+	list_add_tail(&op->list, &pvfs2_request_list);		\
+	spin_unlock(&pvfs2_request_list_lock);			\
+	spin_unlock(&op->lock);					\
+	wake_up_interruptible(&pvfs2_request_list_waitq);	\
+} while (0)
+
+#define add_priority_op_to_request_list(op)				\
+	do {								\
+		spin_lock(&pvfs2_request_list_lock);			\
+		spin_lock(&op->lock);					\
+		set_op_state_waiting(op);				\
+									\
+		list_add(&op->list, &pvfs2_request_list);		\
+		spin_unlock(&pvfs2_request_list_lock);			\
+		spin_unlock(&op->lock);					\
+		wake_up_interruptible(&pvfs2_request_list_waitq);	\
+} while (0)
+
+#define remove_op_from_request_list(op)					\
+	do {								\
+		struct list_head *tmp = NULL;				\
+		struct list_head *tmp_safe = NULL;			\
+		struct pvfs2_kernel_op_s *tmp_op = NULL;		\
+									\
+		spin_lock(&pvfs2_request_list_lock);			\
+		list_for_each_safe(tmp, tmp_safe, &pvfs2_request_list) { \
+			tmp_op = list_entry(tmp,			\
+					    struct pvfs2_kernel_op_s,	\
+					    list);			\
+			if (tmp_op && (tmp_op == op)) {			\
+				list_del(&tmp_op->list);		\
+				break;					\
+			}						\
+		}							\
+		spin_unlock(&pvfs2_request_list_lock);			\
+	} while (0)
+
+#define PVFS2_OP_INTERRUPTIBLE 1   /* service_operation() is interruptible */
+#define PVFS2_OP_PRIORITY      2   /* service_operation() is high priority */
+#define PVFS2_OP_CANCELLATION  4   /* this is a cancellation */
+#define PVFS2_OP_NO_SEMAPHORE  8   /* don't acquire semaphore */
+#define PVFS2_OP_ASYNC         16  /* Queue it, but don't wait */
+
+int service_operation(struct pvfs2_kernel_op_s *op,
+		      const char *op_name,
+		      int flags);
+
+/*
+ * handles two possible error cases, depending on context.
+ *
+ * by design, our vfs i/o errors need to be handled in one of two ways,
+ * depending on where the error occurred.
+ *
+ * if the error happens in the waitqueue code because we either timed
+ * out or a signal was raised while waiting, we need to cancel the
+ * userspace i/o operation and free the op manually.  this is done to
+ * avoid having the device start writing application data to our shared
+ * bufmap pages without us expecting it.
+ *
+ * FIXME: POSSIBLE OPTIMIZATION:
+ * However, if we timed out or if we got a signal AND our upcall was never
+ * picked off the queue (i.e. we were in OP_VFS_STATE_WAITING), then we don't
+ * need to send a cancellation upcall. One way to handle this is to set
+ * error_exit to 2 in such cases and to 1 whenever a cancellation has to
+ * be sent, and have handle_error take care of both situations.
+ *
+ * if a pvfs2 sysint level error occurred and i/o has been completed,
+ * there is no need to cancel the operation, as the user has finished
+ * using the bufmap page and so there is no danger in this case.  in
+ * this case, we wake up the device normally so that it may free the
+ * op, as normal.
+ *
+ * note the only reason this is a macro is because both read and write
+ * cases need the exact same handling code.
+ */
+#define handle_io_error()					\
+do {								\
+	if (!op_state_serviced(new_op)) {			\
+		pvfs2_cancel_op_in_progress(new_op->tag);	\
+		op_release(new_op);				\
+	} else {						\
+		wake_up_daemon_for_return(new_op);		\
+	}							\
+	new_op = NULL;						\
+	pvfs_bufmap_put(bufmap, buffer_index);				\
+	buffer_index = -1;					\
+} while (0)
+
+#define get_interruptible_flag(inode) \
+	((PVFS2_SB(inode->i_sb)->flags & PVFS2_OPT_INTR) ? \
+		PVFS2_OP_INTERRUPTIBLE : 0)
+
+#define add_pvfs2_sb(sb)						\
+do {									\
+	gossip_debug(GOSSIP_SUPER_DEBUG,				\
+		     "Adding SB %p to pvfs2 superblocks\n",		\
+		     PVFS2_SB(sb));					\
+	spin_lock(&pvfs2_superblocks_lock);				\
+	list_add_tail(&PVFS2_SB(sb)->list, &pvfs2_superblocks);		\
+	spin_unlock(&pvfs2_superblocks_lock); \
+} while (0)
+
+#define remove_pvfs2_sb(sb)						\
+do {									\
+	struct list_head *tmp = NULL;					\
+	struct list_head *tmp_safe = NULL;				\
+	struct pvfs2_sb_info_s *pvfs2_sb = NULL;			\
+									\
+	spin_lock(&pvfs2_superblocks_lock);				\
+	list_for_each_safe(tmp, tmp_safe, &pvfs2_superblocks) {		\
+		pvfs2_sb = list_entry(tmp,				\
+				      struct pvfs2_sb_info_s,		\
+				      list);				\
+		if (pvfs2_sb && (pvfs2_sb->sb == sb)) {			\
+			gossip_debug(GOSSIP_SUPER_DEBUG,		\
+			    "Removing SB %p from pvfs2 superblocks\n",	\
+			pvfs2_sb);					\
+			list_del(&pvfs2_sb->list);			\
+			break;						\
+		}							\
+	}								\
+	spin_unlock(&pvfs2_superblocks_lock);				\
+} while (0)
+
+#define pvfs2_lock_inode(inode) spin_lock(&inode->i_lock)
+#define pvfs2_unlock_inode(inode) spin_unlock(&inode->i_lock)
+#define pvfs2_current_signal_lock current->sighand->siglock
+#define pvfs2_current_sigaction current->sighand->action
+
+#define fill_default_sys_attrs(sys_attr, type, mode)			\
+do {									\
+	sys_attr.owner = from_kuid(current_user_ns(), current_fsuid()); \
+	sys_attr.group = from_kgid(current_user_ns(), current_fsgid()); \
+	sys_attr.size = 0;						\
+	sys_attr.perms = PVFS_util_translate_mode(mode);		\
+	sys_attr.objtype = type;					\
+	sys_attr.mask = PVFS_ATTR_SYS_ALL_SETABLE;			\
+} while (0)
+
+#define pvfs2_inode_lock(__i)  mutex_lock(&(__i)->i_mutex)
+
+#define pvfs2_inode_unlock(__i) mutex_unlock(&(__i)->i_mutex)
+
+static inline void pvfs2_i_size_write(struct inode *inode, loff_t i_size)
+{
+#if BITS_PER_LONG == 32 && defined(CONFIG_SMP)
+	pvfs2_inode_lock(inode);
+#endif
+	i_size_write(inode, i_size);
+#if BITS_PER_LONG == 32 && defined(CONFIG_SMP)
+	pvfs2_inode_unlock(inode);
+#endif
+}
+
+static inline unsigned int diff(struct timeval *end, struct timeval *begin)
+{
+	if (end->tv_usec < begin->tv_usec) {
+		end->tv_usec += 1000000;
+		end->tv_sec--;
+	}
+	end->tv_sec -= begin->tv_sec;
+	end->tv_usec -= begin->tv_usec;
+	return (end->tv_sec * 1000000) + end->tv_usec;
+}
+
+#endif /* __PVFS2KERNEL_H */
diff --git a/fs/orangefs/pvfs2-sysfs.h b/fs/orangefs/pvfs2-sysfs.h
new file mode 100644
index 0000000..f0b7638
--- /dev/null
+++ b/fs/orangefs/pvfs2-sysfs.h
@@ -0,0 +1,2 @@
+extern int orangefs_sysfs_init(void);
+extern void orangefs_sysfs_exit(void);
diff --git a/fs/orangefs/upcall.h b/fs/orangefs/upcall.h
new file mode 100644
index 0000000..1e07f62
--- /dev/null
+++ b/fs/orangefs/upcall.h
@@ -0,0 +1,255 @@
+/*
+ * (C) 2001 Clemson University and The University of Chicago
+ *
+ * See COPYING in top-level directory.
+ */
+
+#ifndef __UPCALL_H
+#define __UPCALL_H
+
+/*
+ * Sanitized this header file to fix
+ * 32-64 bit interaction issues between
+ * client-core and device
+ */
+struct pvfs2_io_request_s {
+	__s32 async_vfs_io;
+	__s32 buf_index;
+	__s32 count;
+	__s32 __pad1;
+	__s64 offset;
+	struct pvfs2_object_kref refn;
+	enum PVFS_io_type io_type;
+	__s32 readahead_size;
+};
+
+struct pvfs2_iox_request_s {
+	__s32 buf_index;
+	__s32 count;
+	struct pvfs2_object_kref refn;
+	enum PVFS_io_type io_type;
+	__s32 __pad1;
+};
+
+struct pvfs2_lookup_request_s {
+	__s32 sym_follow;
+	__s32 __pad1;
+	struct pvfs2_object_kref parent_refn;
+	char d_name[PVFS2_NAME_LEN];
+};
+
+struct pvfs2_create_request_s {
+	struct pvfs2_object_kref parent_refn;
+	struct PVFS_sys_attr_s attributes;
+	char d_name[PVFS2_NAME_LEN];
+};
+
+struct pvfs2_symlink_request_s {
+	struct pvfs2_object_kref parent_refn;
+	struct PVFS_sys_attr_s attributes;
+	char entry_name[PVFS2_NAME_LEN];
+	char target[PVFS2_NAME_LEN];
+};
+
+struct pvfs2_getattr_request_s {
+	struct pvfs2_object_kref refn;
+	__u32 mask;
+	__u32 __pad1;
+};
+
+struct pvfs2_setattr_request_s {
+	struct pvfs2_object_kref refn;
+	struct PVFS_sys_attr_s attributes;
+};
+
+struct pvfs2_remove_request_s {
+	struct pvfs2_object_kref parent_refn;
+	char d_name[PVFS2_NAME_LEN];
+};
+
+struct pvfs2_mkdir_request_s {
+	struct pvfs2_object_kref parent_refn;
+	struct PVFS_sys_attr_s attributes;
+	char d_name[PVFS2_NAME_LEN];
+};
+
+struct pvfs2_readdir_request_s {
+	struct pvfs2_object_kref refn;
+	__u64 token;
+	__s32 max_dirent_count;
+	__s32 buf_index;
+};
+
+struct pvfs2_readdirplus_request_s {
+	struct pvfs2_object_kref refn;
+	__u64 token;
+	__s32 max_dirent_count;
+	__u32 mask;
+	__s32 buf_index;
+	__s32 __pad1;
+};
+
+struct pvfs2_rename_request_s {
+	struct pvfs2_object_kref old_parent_refn;
+	struct pvfs2_object_kref new_parent_refn;
+	char d_old_name[PVFS2_NAME_LEN];
+	char d_new_name[PVFS2_NAME_LEN];
+};
+
+struct pvfs2_statfs_request_s {
+	__s32 fs_id;
+	__s32 __pad1;
+};
+
+struct pvfs2_truncate_request_s {
+	struct pvfs2_object_kref refn;
+	__s64 size;
+};
+
+struct pvfs2_mmap_ra_cache_flush_request_s {
+	struct pvfs2_object_kref refn;
+};
+
+struct pvfs2_fs_mount_request_s {
+	char pvfs2_config_server[PVFS_MAX_SERVER_ADDR_LEN];
+};
+
+struct pvfs2_fs_umount_request_s {
+	__s32 id;
+	__s32 fs_id;
+	char pvfs2_config_server[PVFS_MAX_SERVER_ADDR_LEN];
+};
+
+struct pvfs2_getxattr_request_s {
+	struct pvfs2_object_kref refn;
+	__s32 key_sz;
+	__s32 __pad1;
+	char key[PVFS_MAX_XATTR_NAMELEN];
+};
+
+struct pvfs2_setxattr_request_s {
+	struct pvfs2_object_kref refn;
+	struct PVFS_keyval_pair keyval;
+	__s32 flags;
+	__s32 __pad1;
+};
+
+struct pvfs2_listxattr_request_s {
+	struct pvfs2_object_kref refn;
+	__s32 requested_count;
+	__s32 __pad1;
+	__u64 token;
+};
+
+struct pvfs2_removexattr_request_s {
+	struct pvfs2_object_kref refn;
+	__s32 key_sz;
+	__s32 __pad1;
+	char key[PVFS_MAX_XATTR_NAMELEN];
+};
+
+struct pvfs2_op_cancel_s {
+	__u64 op_tag;
+};
+
+struct pvfs2_fsync_request_s {
+	struct pvfs2_object_kref refn;
+};
+
+enum pvfs2_param_request_type {
+	PVFS2_PARAM_REQUEST_SET = 1,
+	PVFS2_PARAM_REQUEST_GET = 2
+};
+
+enum pvfs2_param_request_op {
+	PVFS2_PARAM_REQUEST_OP_ACACHE_TIMEOUT_MSECS = 1,
+	PVFS2_PARAM_REQUEST_OP_ACACHE_HARD_LIMIT = 2,
+	PVFS2_PARAM_REQUEST_OP_ACACHE_SOFT_LIMIT = 3,
+	PVFS2_PARAM_REQUEST_OP_ACACHE_RECLAIM_PERCENTAGE = 4,
+	PVFS2_PARAM_REQUEST_OP_PERF_TIME_INTERVAL_SECS = 5,
+	PVFS2_PARAM_REQUEST_OP_PERF_HISTORY_SIZE = 6,
+	PVFS2_PARAM_REQUEST_OP_PERF_RESET = 7,
+	PVFS2_PARAM_REQUEST_OP_NCACHE_TIMEOUT_MSECS = 8,
+	PVFS2_PARAM_REQUEST_OP_NCACHE_HARD_LIMIT = 9,
+	PVFS2_PARAM_REQUEST_OP_NCACHE_SOFT_LIMIT = 10,
+	PVFS2_PARAM_REQUEST_OP_NCACHE_RECLAIM_PERCENTAGE = 11,
+	PVFS2_PARAM_REQUEST_OP_STATIC_ACACHE_TIMEOUT_MSECS = 12,
+	PVFS2_PARAM_REQUEST_OP_STATIC_ACACHE_HARD_LIMIT = 13,
+	PVFS2_PARAM_REQUEST_OP_STATIC_ACACHE_SOFT_LIMIT = 14,
+	PVFS2_PARAM_REQUEST_OP_STATIC_ACACHE_RECLAIM_PERCENTAGE = 15,
+	PVFS2_PARAM_REQUEST_OP_CLIENT_DEBUG = 16,
+	PVFS2_PARAM_REQUEST_OP_CCACHE_TIMEOUT_SECS = 17,
+	PVFS2_PARAM_REQUEST_OP_CCACHE_HARD_LIMIT = 18,
+	PVFS2_PARAM_REQUEST_OP_CCACHE_SOFT_LIMIT = 19,
+	PVFS2_PARAM_REQUEST_OP_CCACHE_RECLAIM_PERCENTAGE = 20,
+	PVFS2_PARAM_REQUEST_OP_CAPCACHE_TIMEOUT_SECS = 21,
+	PVFS2_PARAM_REQUEST_OP_CAPCACHE_HARD_LIMIT = 22,
+	PVFS2_PARAM_REQUEST_OP_CAPCACHE_SOFT_LIMIT = 23,
+	PVFS2_PARAM_REQUEST_OP_CAPCACHE_RECLAIM_PERCENTAGE = 24,
+	PVFS2_PARAM_REQUEST_OP_TWO_MASK_VALUES = 25,
+};
+
+struct pvfs2_param_request_s {
+	enum pvfs2_param_request_type type;
+	enum pvfs2_param_request_op op;
+	__s64 value;
+	char s_value[PVFS2_MAX_DEBUG_STRING_LEN];
+};
+
+enum pvfs2_perf_count_request_type {
+	PVFS2_PERF_COUNT_REQUEST_ACACHE = 1,
+	PVFS2_PERF_COUNT_REQUEST_NCACHE = 2,
+	PVFS2_PERF_COUNT_REQUEST_CAPCACHE = 3,
+};
+
+struct pvfs2_perf_count_request_s {
+	enum pvfs2_perf_count_request_type type;
+	__s32 __pad1;
+};
+
+struct pvfs2_fs_key_request_s {
+	__s32 fsid;
+	__s32 __pad1;
+};
+
+struct pvfs2_upcall_s {
+	__s32 type;
+	__u32 uid;
+	__u32 gid;
+	int pid;
+	int tgid;
+	/* currently trailer is used only by readx/writex (iox) */
+	__s64 trailer_size;
+	char *trailer_buf;
+
+	union {
+		struct pvfs2_io_request_s io;
+		struct pvfs2_iox_request_s iox;
+		struct pvfs2_lookup_request_s lookup;
+		struct pvfs2_create_request_s create;
+		struct pvfs2_symlink_request_s sym;
+		struct pvfs2_getattr_request_s getattr;
+		struct pvfs2_setattr_request_s setattr;
+		struct pvfs2_remove_request_s remove;
+		struct pvfs2_mkdir_request_s mkdir;
+		struct pvfs2_readdir_request_s readdir;
+		struct pvfs2_readdirplus_request_s readdirplus;
+		struct pvfs2_rename_request_s rename;
+		struct pvfs2_statfs_request_s statfs;
+		struct pvfs2_truncate_request_s truncate;
+		struct pvfs2_mmap_ra_cache_flush_request_s ra_cache_flush;
+		struct pvfs2_fs_mount_request_s fs_mount;
+		struct pvfs2_fs_umount_request_s fs_umount;
+		struct pvfs2_getxattr_request_s getxattr;
+		struct pvfs2_setxattr_request_s setxattr;
+		struct pvfs2_listxattr_request_s listxattr;
+		struct pvfs2_removexattr_request_s removexattr;
+		struct pvfs2_op_cancel_s cancel;
+		struct pvfs2_fsync_request_s fsync;
+		struct pvfs2_param_request_s param;
+		struct pvfs2_perf_count_request_s perf_count;
+		struct pvfs2_fs_key_request_s fs_key;
+	} req;
+};
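+
+/*
+ * The whole of struct pvfs2_upcall_s is copied as-is across the character
+ * device to client-core (see the read path in devpvfs2-req.c), which is
+ * why the request structures above stick to fixed-width __s32/__u32/__s64/
+ * __u64 fields and explicit __pad members instead of native C types.
+ */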
+
+#endif /* __UPCALL_H */
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH V3 2/7] Orangefs: kernel client part 2
  2015-07-17 17:16 [PATCH V3 0/7] Orangefs: kernel client introduction Mike Marshall
  2015-07-17 17:16 ` [PATCH V3 1/7] Orangefs: kernel client part 1 Mike Marshall
@ 2015-07-17 17:16 ` Mike Marshall
  2015-07-17 17:16 ` [PATCH V3 3/7] Orangefs: kernel client part 3 Mike Marshall
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Mike Marshall @ 2015-07-17 17:16 UTC (permalink / raw)
  To: viro; +Cc: Mike Marshall, linux-fsdevel

From: Mike Marshall <hubcap@omnibond.com>

Signed-off-by: Mike Marshall <hubcap@omnibond.com>
---
 fs/orangefs/acl.c          |  175 ++++++++
 fs/orangefs/dcache.c       |  142 ++++++
 fs/orangefs/devpvfs2-req.c |  997 +++++++++++++++++++++++++++++++++++++++++++
 fs/orangefs/dir.c          |  394 +++++++++++++++++
 fs/orangefs/file.c         | 1019 ++++++++++++++++++++++++++++++++++++++++++++
 fs/orangefs/inode.c        |  469 ++++++++++++++++++++
 6 files changed, 3196 insertions(+)
 create mode 100644 fs/orangefs/acl.c
 create mode 100644 fs/orangefs/dcache.c
 create mode 100644 fs/orangefs/devpvfs2-req.c
 create mode 100644 fs/orangefs/dir.c
 create mode 100644 fs/orangefs/file.c
 create mode 100644 fs/orangefs/inode.c

diff --git a/fs/orangefs/acl.c b/fs/orangefs/acl.c
new file mode 100644
index 0000000..e462b81
--- /dev/null
+++ b/fs/orangefs/acl.c
@@ -0,0 +1,175 @@
+/*
+ * (C) 2001 Clemson University and The University of Chicago
+ *
+ * See COPYING in top-level directory.
+ */
+
+#include "protocol.h"
+#include "pvfs2-kernel.h"
+#include "pvfs2-bufmap.h"
+#include <linux/posix_acl_xattr.h>
+#include <linux/fs_struct.h>
+
+struct posix_acl *pvfs2_get_acl(struct inode *inode, int type)
+{
+	struct posix_acl *acl;
+	int ret;
+	char *key = NULL, *value = NULL;
+
+	switch (type) {
+	case ACL_TYPE_ACCESS:
+		key = PVFS2_XATTR_NAME_ACL_ACCESS;
+		break;
+	case ACL_TYPE_DEFAULT:
+		key = PVFS2_XATTR_NAME_ACL_DEFAULT;
+		break;
+	default:
+		gossip_err("pvfs2_get_acl: bogus value of type %d\n", type);
+		return ERR_PTR(-EINVAL);
+	}
+	/*
+	 * Rather than incurring a network call just to determine the exact
+	 * length of the attribute, I just allocate a max length to save on
+	 * the network call. Conceivably, we could pass NULL to
+	 * pvfs2_inode_getxattr() to probe the length of the value, but
+	 * I don't do that for now.
+	 */
+	value = kmalloc(PVFS_MAX_XATTR_VALUELEN, GFP_KERNEL);
+	if (value == NULL)
+		return ERR_PTR(-ENOMEM);
+
+	gossip_debug(GOSSIP_ACL_DEBUG,
+		     "inode %pU, key %s, type %d\n",
+		     get_khandle_from_ino(inode),
+		     key,
+		     type);
+	ret = pvfs2_inode_getxattr(inode,
+				   "",
+				   key,
+				   value,
+				   PVFS_MAX_XATTR_VALUELEN);
+	/* if the key exists, convert it to an in-memory rep */
+	if (ret > 0) {
+		acl = posix_acl_from_xattr(&init_user_ns, value, ret);
+	} else if (ret == -ENODATA || ret == -ENOSYS) {
+		acl = NULL;
+	} else {
+		gossip_err("inode %pU retrieving acl's failed with error %d\n",
+			   get_khandle_from_ino(inode),
+			   ret);
+		acl = ERR_PTR(ret);
+	}
+	/* kfree(NULL) is safe, so don't worry if value ever got used */
+	kfree(value);
+	return acl;
+}
+
+int pvfs2_set_acl(struct inode *inode, struct posix_acl *acl, int type)
+{
+	struct pvfs2_inode_s *pvfs2_inode = PVFS2_I(inode);
+	int error = 0;
+	void *value = NULL;
+	size_t size = 0;
+	const char *name = NULL;
+
+	switch (type) {
+	case ACL_TYPE_ACCESS:
+		name = PVFS2_XATTR_NAME_ACL_ACCESS;
+		if (acl) {
+			umode_t mode = inode->i_mode;
+			/*
+			 * can we represent this with the traditional file
+			 * mode permission bits?
+			 */
+			error = posix_acl_equiv_mode(acl, &mode);
+			if (error < 0) {
+				gossip_err("%s: posix_acl_equiv_mode err: %d\n",
+					   __func__,
+					   error);
+				return error;
+			}
+
+			if (inode->i_mode != mode)
+				SetModeFlag(pvfs2_inode);
+			inode->i_mode = mode;
+			mark_inode_dirty_sync(inode);
+			if (error == 0)
+				acl = NULL;
+		}
+		break;
+	case ACL_TYPE_DEFAULT:
+		name = PVFS2_XATTR_NAME_ACL_DEFAULT;
+		break;
+	default:
+		gossip_err("%s: invalid type %d!\n", __func__, type);
+		return -EINVAL;
+	}
+
+	gossip_debug(GOSSIP_ACL_DEBUG,
+		     "%s: inode %pU, key %s type %d\n",
+		     __func__, get_khandle_from_ino(inode),
+		     name,
+		     type);
+
+	if (acl) {
+		size = posix_acl_xattr_size(acl->a_count);
+		value = kmalloc(size, GFP_KERNEL);
+		if (!value)
+			return -ENOMEM;
+
+		error = posix_acl_to_xattr(&init_user_ns, acl, value, size);
+		if (error < 0)
+			goto out;
+	}
+
+	gossip_debug(GOSSIP_ACL_DEBUG,
+		     "%s: name %s, value %p, size %zd, acl %p\n",
+		     __func__, name, value, size, acl);
+	/*
+	 * Go ahead and set the extended attribute now. NOTE: Suppose acl
+	 * was NULL, then value will be NULL and size will be 0 and that
+	 * will translate to a removexattr. However, we don't want removexattr
+	 * to complain if the attribute does not exist.
+	 */
+	error = pvfs2_inode_setxattr(inode, "", name, value, size, 0);
+
+out:
+	kfree(value);
+	if (!error)
+		set_cached_acl(inode, type, acl);
+	return error;
+}
+
+int pvfs2_init_acl(struct inode *inode, struct inode *dir)
+{
+	struct pvfs2_inode_s *pvfs2_inode = PVFS2_I(inode);
+	struct posix_acl *default_acl, *acl;
+	umode_t mode = inode->i_mode;
+	int error = 0;
+
+	ClearModeFlag(pvfs2_inode);
+
+	error = posix_acl_create(dir, &mode, &default_acl, &acl);
+	if (error)
+		return error;
+
+	if (default_acl) {
+		error = pvfs2_set_acl(inode, default_acl, ACL_TYPE_DEFAULT);
+		posix_acl_release(default_acl);
+	}
+
+	if (acl) {
+		if (!error)
+			error = pvfs2_set_acl(inode, acl, ACL_TYPE_ACCESS);
+		posix_acl_release(acl);
+	}
+
+	/* If mode of the inode was changed, then do a forcible ->setattr */
+	if (mode != inode->i_mode) {
+		SetModeFlag(pvfs2_inode);
+		inode->i_mode = mode;
+		pvfs2_flush_inode(inode);
+	}
+
+	return error;
+}
diff --git a/fs/orangefs/dcache.c b/fs/orangefs/dcache.c
new file mode 100644
index 0000000..9466b17
--- /dev/null
+++ b/fs/orangefs/dcache.c
@@ -0,0 +1,142 @@
+/*
+ * (C) 2001 Clemson University and The University of Chicago
+ *
+ * See COPYING in top-level directory.
+ */
+
+/*
+ *  Implementation of dentry (directory cache) functions.
+ */
+
+#include "protocol.h"
+#include "pvfs2-kernel.h"
+
+/* Returns 1 if dentry can still be trusted, else 0. */
+static int pvfs2_revalidate_lookup(struct dentry *dentry)
+{
+	struct dentry *parent_dentry = dget_parent(dentry);
+	struct inode *parent_inode = parent_dentry->d_inode;
+	struct pvfs2_inode_s *parent = PVFS2_I(parent_inode);
+	struct inode *inode = dentry->d_inode;
+	struct pvfs2_kernel_op_s *new_op;
+	int ret = 0;
+	int err = 0;
+
+	gossip_debug(GOSSIP_DCACHE_DEBUG, "%s: attempting lookup.\n", __func__);
+
+	new_op = op_alloc(PVFS2_VFS_OP_LOOKUP);
+	if (!new_op)
+		goto out_put_parent;
+
+	new_op->upcall.req.lookup.sym_follow = PVFS2_LOOKUP_LINK_NO_FOLLOW;
+	new_op->upcall.req.lookup.parent_refn = parent->refn;
+	strncpy(new_op->upcall.req.lookup.d_name,
+		dentry->d_name.name,
+		PVFS2_NAME_LEN);
+
+	gossip_debug(GOSSIP_DCACHE_DEBUG,
+		     "%s:%s:%d interrupt flag [%d]\n",
+		     __FILE__,
+		     __func__,
+		     __LINE__,
+		     get_interruptible_flag(parent_inode));
+
+	err = service_operation(new_op, "pvfs2_lookup",
+			get_interruptible_flag(parent_inode));
+	if (err)
+		goto out_drop;
+
+	if (new_op->downcall.status != 0 ||
+	    !match_handle(new_op->downcall.resp.lookup.refn.khandle, inode)) {
+		gossip_debug(GOSSIP_DCACHE_DEBUG,
+			"%s:%s:%d "
+			"lookup failure |%s| or no match |%s|.\n",
+			__FILE__,
+			__func__,
+			__LINE__,
+			new_op->downcall.status ? "true" : "false",
+			match_handle(new_op->downcall.resp.lookup.refn.khandle,
+					inode) ? "false" : "true");
+		gossip_debug(GOSSIP_DCACHE_DEBUG,
+			     "%s:%s:%d revalidate failed\n",
+			     __FILE__, __func__, __LINE__);
+		goto out_drop;
+	}
+
+	ret = 1;
+out_release_op:
+	op_release(new_op);
+out_put_parent:
+	dput(parent_dentry);
+	return ret;
+out_drop:
+	d_drop(dentry);
+	goto out_release_op;
+}
+
+/*
+ * Verify that dentry is valid.
+ *
+ * Should return 1 if dentry can still be trusted, else 0
+ */
+static int pvfs2_d_revalidate(struct dentry *dentry, unsigned int flags)
+{
+	struct inode *inode;
+	int ret = 0;
+
+	if (flags & LOOKUP_RCU)
+		return -ECHILD;
+
+	gossip_debug(GOSSIP_DCACHE_DEBUG, "%s: called on dentry %p.\n",
+		     __func__, dentry);
+
+	/* find inode from dentry */
+	if (!dentry->d_inode) {
+		gossip_debug(GOSSIP_DCACHE_DEBUG, "%s: negative dentry.\n",
+			     __func__);
+		goto invalid_exit;
+	}
+
+	gossip_debug(GOSSIP_DCACHE_DEBUG, "%s: inode valid.\n", __func__);
+	inode = dentry->d_inode;
+
+	/*
+	 * first perform a lookup to make sure that the object not only
+	 * exists, but is still in the expected place in the name space
+	 */
+	if (!is_root_handle(inode)) {
+		if (!pvfs2_revalidate_lookup(dentry))
+			goto invalid_exit;
+	} else {
+		gossip_debug(GOSSIP_DCACHE_DEBUG,
+			     "%s: root handle, lookup skipped.\n",
+			     __func__);
+	}
+
+	/* now perform getattr */
+	gossip_debug(GOSSIP_DCACHE_DEBUG,
+		     "%s: doing getattr: inode: %p, handle: %pU\n",
+		     __func__,
+		     inode,
+		     get_khandle_from_ino(inode));
+	ret = pvfs2_inode_getattr(inode, PVFS_ATTR_SYS_ALL_NOHINT);
+	gossip_debug(GOSSIP_DCACHE_DEBUG,
+		     "%s: getattr %s (ret = %d), returning %s for dentry i_count=%d\n",
+		     __func__,
+		     (ret == 0 ? "succeeded" : "failed"),
+		     ret,
+		     (ret == 0 ? "valid" : "INVALID"),
+		     atomic_read(&inode->i_count));
+	if (ret != 0)
+		goto invalid_exit;
+
+	/* dentry is valid! */
+	return 1;
+
+invalid_exit:
+	return 0;
+}
+
+const struct dentry_operations pvfs2_dentry_operations = {
+	.d_revalidate = pvfs2_d_revalidate,
+};
diff --git a/fs/orangefs/devpvfs2-req.c b/fs/orangefs/devpvfs2-req.c
new file mode 100644
index 0000000..3e45022
--- /dev/null
+++ b/fs/orangefs/devpvfs2-req.c
@@ -0,0 +1,997 @@
+/*
+ * (C) 2001 Clemson University and The University of Chicago
+ *
+ * Changes by Acxiom Corporation to add protocol version to kernel
+ * communication, Copyright Acxiom Corporation, 2005.
+ *
+ * See COPYING in top-level directory.
+ */
+
+#include "protocol.h"
+#include "pvfs2-kernel.h"
+#include "pvfs2-dev-proto.h"
+#include "pvfs2-bufmap.h"
+
+#include <linux/debugfs.h>
+#include <linux/slab.h>
+
+/* this file implements the /dev/pvfs2-req device node */
+
+static int open_access_count;
+
+#define DUMP_DEVICE_ERROR()                                                   \
+do {                                                                          \
+	gossip_err("*****************************************************\n");\
+	gossip_err("PVFS2 Device Error:  You cannot open the device file ");  \
+	gossip_err("\n/dev/%s more than once.  Please make sure that\nthere " \
+		   "are no ", PVFS2_REQDEVICE_NAME);                          \
+	gossip_err("instances of a program using this device\ncurrently "     \
+		   "running. (You must verify this!)\n");                     \
+	gossip_err("For example, you can use the lsof program as follows:\n");\
+	gossip_err("'lsof | grep %s' (run this as root)\n",                   \
+		   PVFS2_REQDEVICE_NAME);                                     \
+	gossip_err("  open_access_count = %d\n", open_access_count);          \
+	gossip_err("*****************************************************\n");\
+} while (0)
+
+static int hash_func(__u64 tag, int table_size)
+{
+	return tag % ((unsigned int)table_size);
+}
+
+static void pvfs2_devreq_add_op(struct pvfs2_kernel_op_s *op)
+{
+	int index = hash_func(op->tag, hash_table_size);
+
+	spin_lock(&htable_ops_in_progress_lock);
+	list_add_tail(&op->list, &htable_ops_in_progress[index]);
+	spin_unlock(&htable_ops_in_progress_lock);
+}
+
+static struct pvfs2_kernel_op_s *pvfs2_devreq_remove_op(__u64 tag)
+{
+	struct pvfs2_kernel_op_s *op, *next;
+	int index;
+
+	index = hash_func(tag, hash_table_size);
+
+	spin_lock(&htable_ops_in_progress_lock);
+	list_for_each_entry_safe(op,
+				 next,
+				 &htable_ops_in_progress[index],
+				 list) {
+		if (op->tag == tag) {
+			list_del(&op->list);
+			spin_unlock(&htable_ops_in_progress_lock);
+			return op;
+		}
+	}
+
+	spin_unlock(&htable_ops_in_progress_lock);
+	return NULL;
+}
+
+static int pvfs2_devreq_open(struct inode *inode, struct file *file)
+{
+	int ret = -EINVAL;
+
+	if (!(file->f_flags & O_NONBLOCK)) {
+		gossip_err("pvfs2: device cannot be opened in blocking mode\n");
+		goto out;
+	}
+	ret = -EACCES;
+	gossip_debug(GOSSIP_DEV_DEBUG, "pvfs2-client-core: opening device\n");
+	mutex_lock(&devreq_mutex);
+
+	if (open_access_count == 0) {
+		ret = generic_file_open(inode, file);
+		if (ret == 0)
+			open_access_count++;
+	} else {
+		DUMP_DEVICE_ERROR();
+	}
+	mutex_unlock(&devreq_mutex);
+
+out:
+
+	gossip_debug(GOSSIP_DEV_DEBUG,
+		     "pvfs2-client-core: open device complete (ret = %d)\n",
+		     ret);
+	return ret;
+}
+
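+/*
+ * A sketch of the byte layout that a read() on the device produces for an
+ * upcall with no trailer, going by the copy_to_user() sequence below (the
+ * field widths come from the code, not from a separate wire-format spec):
+ *
+ *   [ proto_ver __s32 ][ magic __s32 ][ tag __u64 ][ struct pvfs2_upcall_s ]
+ *
+ * For ops with a trailer (op_linger == 2), a second read() returns the
+ * raw contents of upcall.trailer_buf.
+ */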
+static ssize_t pvfs2_devreq_read(struct file *file,
+				 char __user *buf,
+				 size_t count, loff_t *offset)
+{
+	int ret = 0;
+	ssize_t len = 0;
+	struct pvfs2_kernel_op_s *cur_op = NULL;
+	static __s32 magic = PVFS2_DEVREQ_MAGIC;
+	__s32 proto_ver = PVFS_KERNEL_PROTO_VERSION;
+
+	if (!(file->f_flags & O_NONBLOCK)) {
+		/* We do not support blocking reads/opens any more */
+		gossip_err("pvfs2: blocking reads are not supported! (pvfs2-client-core bug)\n");
+		return -EINVAL;
+	} else {
+		struct pvfs2_kernel_op_s *op = NULL, *temp = NULL;
+		/* get next op (if any) from top of list */
+		spin_lock(&pvfs2_request_list_lock);
+		list_for_each_entry_safe(op, temp, &pvfs2_request_list, list) {
+			__s32 fsid = fsid_of_op(op);
+			/*
+			 * Check if this op's fsid is known and needs
+			 * remounting
+			 */
+			if (fsid != PVFS_FS_ID_NULL &&
+			    fs_mount_pending(fsid) == 1) {
+				gossip_debug(GOSSIP_DEV_DEBUG,
+					     "Skipping op tag %llu %s\n",
+					     llu(op->tag),
+					     get_opname_string(op));
+				continue;
+			} else {
+				/*
+				 * op does not belong to any particular fsid,
+				 * or its fs is already mounted; let it through
+				 */
+				cur_op = op;
+				spin_lock(&cur_op->lock);
+				list_del(&cur_op->list);
+				cur_op->op_linger_tmp--;
+				/*
+				 * if there is a trailer, re-add it to
+				 * the request list.
+				 */
+				if (cur_op->op_linger == 2 &&
+				    cur_op->op_linger_tmp == 1) {
+					if (cur_op->upcall.trailer_size <= 0 ||
+					    cur_op->upcall.trailer_buf == NULL)
+						gossip_err("BUG:trailer_size is %ld and trailer buf is %p\n", (long)cur_op->upcall.trailer_size, cur_op->upcall.trailer_buf);
+					/* re-add it to the head of the list */
+					list_add(&cur_op->list,
+						 &pvfs2_request_list);
+				}
+				spin_unlock(&cur_op->lock);
+				break;
+			}
+		}
+		spin_unlock(&pvfs2_request_list_lock);
+	}
+
+	if (cur_op) {
+		spin_lock(&cur_op->lock);
+
+		gossip_debug(GOSSIP_DEV_DEBUG,
+			     "client-core: reading op tag %llu %s\n",
+			     llu(cur_op->tag), get_opname_string(cur_op));
+		if (op_state_in_progress(cur_op) || op_state_serviced(cur_op)) {
+			if (cur_op->op_linger == 1)
+				gossip_err("WARNING: Current op already queued...skipping\n");
+		} else if (cur_op->op_linger == 1 ||
+			   (cur_op->op_linger == 2 &&
+			    cur_op->op_linger_tmp == 0)) {
+			/*
+			 * atomically move the operation to the
+			 * htable_ops_in_progress
+			 */
+			set_op_state_inprogress(cur_op);
+			pvfs2_devreq_add_op(cur_op);
+		}
+
+		spin_unlock(&cur_op->lock);
+
+		/* 2 cases
+		 * a) OPs with no trailers
+		 * b) OPs with trailers, Stage 1
+		 * Either way push the upcall out
+		 */
+		if (cur_op->op_linger == 1 ||
+		   (cur_op->op_linger == 2 && cur_op->op_linger_tmp == 1)) {
+			len = MAX_ALIGNED_DEV_REQ_UPSIZE;
+			if ((size_t) len <= count) {
+			    ret = copy_to_user(buf,
+					       &proto_ver,
+					       sizeof(__s32));
+			    if (ret == 0) {
+				ret = copy_to_user(buf + sizeof(__s32),
+						   &magic,
+						   sizeof(__s32));
+				if (ret == 0) {
+				    ret = copy_to_user(buf+2 * sizeof(__s32),
+						       &cur_op->tag,
+						       sizeof(__u64));
+				    if (ret == 0) {
+					ret = copy_to_user(
+						buf +
+						  2 *
+						  sizeof(__s32) +
+						  sizeof(__u64),
+						&cur_op->upcall,
+						sizeof(struct pvfs2_upcall_s));
+				    }
+				}
+			    }
+
+			    if (ret) {
+				gossip_err("Failed to copy data to user space\n");
+				len = -EFAULT;
+			    }
+			} else {
+				gossip_err("Read buffer is too small\n");
+				len = -EIO;
+			}
+		}
+		/* Stage 2: Push the trailer out */
+		else if (cur_op->op_linger == 2 && cur_op->op_linger_tmp == 0) {
+			len = cur_op->upcall.trailer_size;
+			if ((size_t) len <= count) {
+				ret = copy_to_user(buf,
+						   cur_op->upcall.trailer_buf,
+						   len);
+				if (ret) {
+					gossip_err("Failed to copy trailer to user space\n");
+					len = -EFAULT;
+				}
+			} else {
+				gossip_err("Read buffer for trailer is too small (%ld as opposed to %ld)\n",
+					(long)count,
+					(long)len);
+				len = -EIO;
+			}
+		} else {
+			gossip_err("cur_op: %p (op_linger %d), (op_linger_tmp %d), erroneous request list?\n",
+				cur_op,
+				cur_op->op_linger,
+				cur_op->op_linger_tmp);
+			len = 0;
+		}
+	} else if (file->f_flags & O_NONBLOCK) {
+		/*
+		 * if in non-blocking mode, return EAGAIN since no requests are
+		 * ready yet
+		 */
+		len = -EAGAIN;
+	}
+	return len;
+}
+
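+/*
+ * What the writev() handler below expects from client-core, inferred from
+ * the parsing code rather than from a separate spec: the first four iovecs
+ * are concatenated and must contain, in order, proto_ver (__s32),
+ * magic (__s32), tag (__u64) and the pvfs2_downcall_s; an optional fifth
+ * iovec carries the trailer when downcall.trailer_size > 0.
+ */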
+/* Function for writev() callers into the device */
+static ssize_t pvfs2_devreq_writev(struct file *file,
+				   const struct iovec *iov,
+				   size_t count,
+				   loff_t *offset)
+{
+	struct pvfs2_kernel_op_s *op = NULL;
+	void *buffer = NULL;
+	void *ptr = NULL;
+	unsigned long i = 0;
+	static int max_downsize = MAX_ALIGNED_DEV_REQ_DOWNSIZE;
+	int ret = 0, num_remaining = max_downsize;
+	int notrailer_count = 4; /* num elements in iovec without trailer */
+	int payload_size = 0;
+	__s32 magic = 0;
+	__s32 proto_ver = 0;
+	__u64 tag = 0;
+	ssize_t total_returned_size = 0;
+
+	/* Either there is a trailer or there isn't */
+	if (count != notrailer_count && count != (notrailer_count + 1)) {
+		gossip_err("Error: Number of iov vectors is (%ld) and notrailer count is %d\n",
+			count,
+			notrailer_count);
+		return -EPROTO;
+	}
+	buffer = dev_req_alloc();
+	if (!buffer)
+		return -ENOMEM;
+	ptr = buffer;
+
+	for (i = 0; i < notrailer_count; i++) {
+		if (iov[i].iov_len > num_remaining) {
+			gossip_err
+			    ("writev error: Freeing buffer and returning\n");
+			dev_req_release(buffer);
+			return -EMSGSIZE;
+		}
+		ret = copy_from_user(ptr, iov[i].iov_base, iov[i].iov_len);
+		if (ret) {
+			gossip_err("Failed to copy data from user space\n");
+			dev_req_release(buffer);
+			return -EIO;
+		}
+		num_remaining -= iov[i].iov_len;
+		ptr += iov[i].iov_len;
+		payload_size += iov[i].iov_len;
+	}
+	total_returned_size = payload_size;
+
+	/* These elements are currently 8-byte aligned (8 bytes for version +
+	 * magic, 8 bytes for tag).  If you add another element, either
+	 * make it 8 bytes big, or use get_unaligned when assigning.
+	 */
+	ptr = buffer;
+	proto_ver = *((__s32 *) ptr);
+	ptr += sizeof(__s32);
+
+	magic = *((__s32 *) ptr);
+	ptr += sizeof(__s32);
+
+	tag = *((__u64 *) ptr);
+	ptr += sizeof(__u64);
+
+	if (magic != PVFS2_DEVREQ_MAGIC) {
+		gossip_err("Error: Device magic number does not match.\n");
+		dev_req_release(buffer);
+		return -EPROTO;
+	}
+
+	/*
+	 * proto_ver = 20902 for 2.9.2
+	 */
+
+	op = pvfs2_devreq_remove_op(tag);
+	if (op) {
+		/* Increase ref count! */
+		get_op(op);
+		/* cut off magic and tag from payload size */
+		payload_size -= (2 * sizeof(__s32) + sizeof(__u64));
+		if (payload_size <= sizeof(struct pvfs2_downcall_s))
+			/* copy the passed in downcall into the op */
+			memcpy(&op->downcall,
+			       ptr,
+			       sizeof(struct pvfs2_downcall_s));
+		else
+			gossip_debug(GOSSIP_DEV_DEBUG,
+				     "writev: Ignoring %d bytes\n",
+				     payload_size);
+
+		/* Do not allocate needlessly if client-core forgets
+		 * to reset trailer size on op errors.
+		 */
+		if (op->downcall.status == 0 && op->downcall.trailer_size > 0) {
+			gossip_debug(GOSSIP_DEV_DEBUG,
+				     "writev: trailer size %ld\n",
+				     (unsigned long)op->downcall.trailer_size);
+			if (count != (notrailer_count + 1)) {
+				gossip_err("Error: trailer size (%ld) is non-zero, no trailer elements though? (%ld)\n", (unsigned long)op->downcall.trailer_size, count);
+				dev_req_release(buffer);
+				put_op(op);
+				return -EPROTO;
+			}
+			if (iov[notrailer_count].iov_len >
+			    op->downcall.trailer_size) {
+				gossip_err("writev error: trailer size (%ld) is smaller than iov_len (%ld)\n", (unsigned long)op->downcall.trailer_size, (unsigned long)iov[notrailer_count].iov_len);
+				dev_req_release(buffer);
+				put_op(op);
+				return -EMSGSIZE;
+			}
+			/* Allocate a buffer large enough to hold the
+			 * trailer bytes.
+			 */
+			op->downcall.trailer_buf =
+			    vmalloc(op->downcall.trailer_size);
+			if (op->downcall.trailer_buf != NULL) {
+				gossip_debug(GOSSIP_DEV_DEBUG, "vmalloc: %p\n",
+					     op->downcall.trailer_buf);
+				ret = copy_from_user(op->downcall.trailer_buf,
+						     iov[notrailer_count].
+						     iov_base,
+						     iov[notrailer_count].
+						     iov_len);
+				if (ret) {
+					gossip_err("Failed to copy trailer data from user space\n");
+					dev_req_release(buffer);
+					gossip_debug(GOSSIP_DEV_DEBUG,
+						     "vfree: %p\n",
+						     op->downcall.trailer_buf);
+					vfree(op->downcall.trailer_buf);
+					op->downcall.trailer_buf = NULL;
+					put_op(op);
+					return -EIO;
+				}
+			} else {
+				/* Change downcall status */
+				op->downcall.status = -ENOMEM;
+				gossip_err("writev: could not vmalloc for trailer!\n");
+			}
+		}
+
+		/* if this operation is an I/O operation and if it was
+		 * initiated on behalf of a *synchronous* VFS I/O operation,
+		 * only then we need to wait
+		 * for all data to be copied before we can return to avoid
+		 * buffer corruption and races that can pull the buffers
+		 * out from under us.
+		 *
+		 * Essentially we're synchronizing with other parts of the
+		 * vfs implicitly by not allowing the user space
+		 * application reading/writing this device to return until
+		 * the buffers are done being used.
+		 */
+		if ((op->upcall.type == PVFS2_VFS_OP_FILE_IO &&
+		     op->upcall.req.io.async_vfs_io == PVFS_VFS_SYNC_IO) ||
+		     op->upcall.type == PVFS2_VFS_OP_FILE_IOX) {
+			int timed_out = 0;
+			DECLARE_WAITQUEUE(wait_entry, current);
+
+			/* tell the vfs op waiting on a waitqueue
+			 * that this op is done
+			 */
+			spin_lock(&op->lock);
+			set_op_state_serviced(op);
+			spin_unlock(&op->lock);
+
+			add_wait_queue_exclusive(&op->io_completion_waitq,
+						 &wait_entry);
+			wake_up_interruptible(&op->waitq);
+
+			while (1) {
+				set_current_state(TASK_INTERRUPTIBLE);
+
+				spin_lock(&op->lock);
+				if (op->io_completed) {
+					spin_unlock(&op->lock);
+					break;
+				}
+				spin_unlock(&op->lock);
+
+				if (!signal_pending(current)) {
+					int timeout =
+					    MSECS_TO_JIFFIES(1000 *
+							     op_timeout_secs);
+					if (!schedule_timeout(timeout)) {
+						gossip_debug(GOSSIP_DEV_DEBUG, "*** I/O wait time is up\n");
+						timed_out = 1;
+						break;
+					}
+					continue;
+				}
+
+				gossip_debug(GOSSIP_DEV_DEBUG, "*** signal on I/O wait -- aborting\n");
+				break;
+			}
+
+			set_current_state(TASK_RUNNING);
+			remove_wait_queue(&op->io_completion_waitq,
+					  &wait_entry);
+
+			/* NOTE: for I/O operations we handle releasing the op
+			 * object except in the case of timeout.  The reason we
+			 * can't free the op in timeout cases is that the op
+			 * service logic in the vfs retries operations using
+			 * the same op ptr, thus it can't be freed.
+			 */
+			if (!timed_out)
+				op_release(op);
+		} else {
+
+			/*
+			 * tell the vfs op waiting on a waitqueue that
+			 * this op is done
+			 */
+			spin_lock(&op->lock);
+			set_op_state_serviced(op);
+			spin_unlock(&op->lock);
+			/*
+			   for every other operation (i.e. non-I/O), we need to
+			   wake up the callers for downcall completion
+			   notification
+			 */
+			wake_up_interruptible(&op->waitq);
+		}
+	} else {
+		/* ignore downcalls that we're not interested in */
+		gossip_debug(GOSSIP_DEV_DEBUG,
+			     "WARNING: No one's waiting for tag %llu\n",
+			     llu(tag));
+	}
+	dev_req_release(buffer);
+
+	return total_returned_size;
+}
+
+static ssize_t pvfs2_devreq_write_iter(struct kiocb *iocb,
+				      struct iov_iter *iter)
+{
+	return pvfs2_devreq_writev(iocb->ki_filp,
+				   iter->iov,
+				   iter->nr_segs,
+				   &iocb->ki_pos);
+}
+
+/*
+ * Mark all mounted filesystems as needing a remount; returns 1 if no
+ * filesystems were mounted, 0 otherwise.
+ */
+static int mark_all_pending_mounts(void)
+{
+	int unmounted = 1;
+	struct pvfs2_sb_info_s *pvfs2_sb = NULL;
+
+	spin_lock(&pvfs2_superblocks_lock);
+	list_for_each_entry(pvfs2_sb, &pvfs2_superblocks, list) {
+		/* All of these file systems require a remount */
+		pvfs2_sb->mount_pending = 1;
+		unmounted = 0;
+	}
+	spin_unlock(&pvfs2_superblocks_lock);
+	return unmounted;
+}
+
+/*
+ * Determine if a given file system needs to be remounted or not
+ *  Returns -1 on error
+ *           0 if already mounted
+ *           1 if needs remount
+ */
+int fs_mount_pending(__s32 fsid)
+{
+	int mount_pending = -1;
+	struct pvfs2_sb_info_s *pvfs2_sb = NULL;
+
+	spin_lock(&pvfs2_superblocks_lock);
+	list_for_each_entry(pvfs2_sb, &pvfs2_superblocks, list) {
+		if (pvfs2_sb->fs_id == fsid) {
+			mount_pending = pvfs2_sb->mount_pending;
+			break;
+		}
+	}
+	spin_unlock(&pvfs2_superblocks_lock);
+	return mount_pending;
+}
+
+/*
+ * NOTE: gets called when the last reference to this device is dropped.
+ * Using the open_access_count variable, we enforce a reference count
+ * on this file so that it can be opened by only one process at a time.
+ * the devreq_mutex is used to make sure all i/o has completed
+ * before we call pvfs_bufmap_finalize, and similar such tricky
+ * situations
+ */
+static int pvfs2_devreq_release(struct inode *inode, struct file *file)
+{
+	int unmounted = 0;
+
+	gossip_debug(GOSSIP_DEV_DEBUG,
+		     "%s:pvfs2-client-core: exiting, closing device\n",
+		     __func__);
+
+	mutex_lock(&devreq_mutex);
+	pvfs_bufmap_finalize();
+
+	open_access_count--;
+
+	unmounted = mark_all_pending_mounts();
+	gossip_debug(GOSSIP_DEV_DEBUG, "PVFS2 Device Close: Filesystem(s) %s\n",
+		     (unmounted ? "UNMOUNTED" : "MOUNTED"));
+	mutex_unlock(&devreq_mutex);
+
+	/*
+	 * Walk through the list of ops in the request list, mark them
+	 * as purged and wake them up.
+	 */
+	purge_waiting_ops();
+	/*
+	 * Walk through the hash table of in progress operations; mark
+	 * them as purged and wake them up
+	 */
+	purge_inprogress_ops();
+	gossip_debug(GOSSIP_DEV_DEBUG,
+		     "pvfs2-client-core: device close complete\n");
+	return 0;
+}
+
+int is_daemon_in_service(void)
+{
+	int in_service;
+
+	/*
+	 * Check whether client-core is alive, based on the access count
+	 * we maintain on the device.
+	 */
+	mutex_lock(&devreq_mutex);
+	in_service = open_access_count == 1 ? 0 : -EIO;
+	mutex_unlock(&devreq_mutex);
+	return in_service;
+}
+
+static inline long check_ioctl_command(unsigned int command)
+{
+	/* Check for valid ioctl codes */
+	if (_IOC_TYPE(command) != PVFS_DEV_MAGIC) {
+		gossip_err("device ioctl magic numbers don't match! Did you rebuild pvfs2-client-core/libpvfs2? [cmd %x, magic %x != %x]\n",
+			command,
+			_IOC_TYPE(command),
+			PVFS_DEV_MAGIC);
+		return -EINVAL;
+	}
+	/* and valid ioctl commands */
+	if (_IOC_NR(command) >= PVFS_DEV_MAXNR || _IOC_NR(command) <= 0) {
+		gossip_err("Invalid ioctl command number [%d >= %d]\n",
+			   _IOC_NR(command), PVFS_DEV_MAXNR);
+		return -ENOIOCTLCMD;
+	}
+	return 0;
+}
+
+static long dispatch_ioctl_command(unsigned int command, unsigned long arg)
+{
+	static __s32 magic = PVFS2_DEVREQ_MAGIC;
+	static __s32 max_up_size = MAX_ALIGNED_DEV_REQ_UPSIZE;
+	static __s32 max_down_size = MAX_ALIGNED_DEV_REQ_DOWNSIZE;
+	struct PVFS_dev_map_desc user_desc;
+	int ret = 0;
+	struct dev_mask_info_s mask_info = { 0 };
+	struct dev_mask2_info_s mask2_info = { 0, 0 };
+	int upstream_kmod = 1;
+	struct list_head *tmp = NULL;
+	struct pvfs2_sb_info_s *pvfs2_sb = NULL;
+
+	/* mtmoore: add locking here */
+
+	switch (command) {
+	case PVFS_DEV_GET_MAGIC:
+		return ((put_user(magic, (__s32 __user *) arg) == -EFAULT) ?
+			-EIO :
+			0);
+	case PVFS_DEV_GET_MAX_UPSIZE:
+		return ((put_user(max_up_size,
+				  (__s32 __user *) arg) == -EFAULT) ?
+					-EIO :
+					0);
+	case PVFS_DEV_GET_MAX_DOWNSIZE:
+		return ((put_user(max_down_size,
+				  (__s32 __user *) arg) == -EFAULT) ?
+					-EIO :
+					0);
+	case PVFS_DEV_MAP:
+		ret = copy_from_user(&user_desc,
+				     (struct PVFS_dev_map_desc __user *)
+				     arg,
+				     sizeof(struct PVFS_dev_map_desc));
+		return ret ? -EIO : pvfs_bufmap_initialize(&user_desc);
+	case PVFS_DEV_REMOUNT_ALL:
+		gossip_debug(GOSSIP_DEV_DEBUG,
+			     "pvfs2_devreq_ioctl: got PVFS_DEV_REMOUNT_ALL\n");
+
+		/*
+		 * Remount all mounted pvfs2 volumes to regain the lost
+		 * dynamic mount tables (if any) -- NOTE: this is done
+		 * without keeping the superblock list locked because of
+		 * the upcall/downcall waiting.  Also, the request mutex is
+		 * used to ensure that no operations will be serviced until
+		 * all of the remounts are serviced (so that ops issued
+		 * between remounts do not fail).
+		 */
+		ret = mutex_lock_interruptible(&request_mutex);
+		if (ret < 0)
+			return ret;
+		gossip_debug(GOSSIP_DEV_DEBUG,
+			     "pvfs2_devreq_ioctl: priority remount in progress\n");
+		list_for_each(tmp, &pvfs2_superblocks) {
+			pvfs2_sb =
+				list_entry(tmp, struct pvfs2_sb_info_s, list);
+			if (pvfs2_sb && (pvfs2_sb->sb)) {
+				gossip_debug(GOSSIP_DEV_DEBUG,
+					     "Remounting SB %p\n",
+					     pvfs2_sb);
+
+				ret = pvfs2_remount(pvfs2_sb->sb);
+				if (ret) {
+					gossip_debug(GOSSIP_DEV_DEBUG,
+						     "SB %p remount failed\n",
+						     pvfs2_sb);
+					break;
+				}
+			}
+		}
+		gossip_debug(GOSSIP_DEV_DEBUG,
+			     "pvfs2_devreq_ioctl: priority remount complete\n");
+		mutex_unlock(&request_mutex);
+		return ret;
+
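+	/*
+	 * Report the upstream_kmod flag so the client-core daemon can
+	 * tell it is talking to the upstream version of the module.
+	 */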
+	case PVFS_DEV_UPSTREAM:
+		ret = copy_to_user((void __user *)arg,
+				    &upstream_kmod,
+				    sizeof(upstream_kmod));
+
+		if (ret != 0)
+			return -EIO;
+		else
+			return ret;
+
+	case PVFS_DEV_CLIENT_MASK:
+		ret = copy_from_user(&mask2_info,
+				     (void __user *)arg,
+				     sizeof(struct dev_mask2_info_s));
+
+		if (ret != 0)
+			return -EIO;
+
+		client_debug_mask.mask1 = mask2_info.mask1_value;
+		client_debug_mask.mask2 = mask2_info.mask2_value;
+
+		pr_info("%s: client debug mask has been received "
+			":%llx: :%llx:\n",
+			__func__,
+			(unsigned long long)client_debug_mask.mask1,
+			(unsigned long long)client_debug_mask.mask2);
+
+		return ret;
+
+	case PVFS_DEV_CLIENT_STRING:
+		ret = copy_from_user(&client_debug_array_string,
+				     (void __user *)arg,
+				     PVFS2_MAX_DEBUG_STRING_LEN);
+		if (ret != 0) {
+			pr_info("%s: PVFS_DEV_CLIENT_STRING: copy_from_user failed\n",
+				__func__);
+			return -EIO;
+		}
+
+		pr_info("%s: client debug array string has been received.\n",
+			__func__);
+
+		if (!help_string_initialized) {
+
+			/* Free the "we don't know yet" default string... */
+			kfree(debug_help_string);
+
+			/* build a proper debug help string */
+			if (orangefs_prepare_debugfs_help_string(0)) {
+				gossip_err("%s: prepare_debugfs_help_string failed\n",
+					   __func__);
+				return -EIO;
+			}
+
+			/* Replace the boilerplate boot-time debug-help file. */
+			debugfs_remove(help_file_dentry);
+
+			help_file_dentry =
+				debugfs_create_file(
+					ORANGEFS_KMOD_DEBUG_HELP_FILE,
+					0444,
+					debug_dir,
+					debug_help_string,
+					&debug_help_fops);
+
+			if (!help_file_dentry) {
+				gossip_err("%s: debugfs_create_file failed for :%s:!\n",
+					   __func__,
+					   ORANGEFS_KMOD_DEBUG_HELP_FILE);
+				return -EIO;
+			}
+		}
+
+		debug_mask_to_string(&client_debug_mask, 1);
+
+		debugfs_remove(client_debug_dentry);
+
+		pvfs2_client_debug_init();
+
+		help_string_initialized++;
+
+		return ret;
+
+	case PVFS_DEV_DEBUG:
+		ret = copy_from_user(&mask_info,
+				     (void __user *)arg,
+				     sizeof(mask_info));
+
+		if (ret != 0)
+			return -EIO;
+
+		if (mask_info.mask_type == KERNEL_MASK) {
+			if ((mask_info.mask_value == 0)
+			    && (kernel_mask_set_mod_init)) {
+				/*
+				 * the kernel debug mask was set when the
+				 * kernel module was loaded; don't override
+				 * it if the client-core was started without
+				 * a value for PVFS2_KMODMASK.
+				 */
+				return 0;
+			}
+			debug_mask_to_string(&mask_info.mask_value,
+					     mask_info.mask_type);
+			gossip_debug_mask = mask_info.mask_value;
+			pr_info("PVFS: kernel debug mask has been modified to "
+				":%s: :%llx:\n",
+				kernel_debug_string,
+				(unsigned long long)gossip_debug_mask);
+		} else if (mask_info.mask_type == CLIENT_MASK) {
+			debug_mask_to_string(&mask_info.mask_value,
+					     mask_info.mask_type);
+			pr_info("PVFS: client debug mask has been modified to "
+				":%s: :%llx:\n",
+				client_debug_string,
+				llu(mask_info.mask_value));
+		} else {
+			gossip_lerr("Invalid mask type....\n");
+			return -EINVAL;
+		}
+
+		return ret;
+
+	default:
+		return -ENOIOCTLCMD;
+	}
+	return -ENOIOCTLCMD;
+}
+
+static long pvfs2_devreq_ioctl(struct file *file,
+			       unsigned int command, unsigned long arg)
+{
+	long ret;
+
+	/* Check for properly constructed commands */
+	ret = check_ioctl_command(command);
+	if (ret < 0)
+		return (int)ret;
+
+	return (int)dispatch_ioctl_command(command, arg);
+}
+
+#ifdef CONFIG_COMPAT		/* CONFIG_COMPAT is in .config */
+
+/*  Compat structure for the PVFS_DEV_MAP ioctl */
+struct PVFS_dev_map_desc32 {
+	compat_uptr_t ptr;
+	__s32 total_size;
+	__s32 size;
+	__s32 count;
+};
+
+static unsigned long translate_dev_map26(unsigned long args, long *error)
+{
+	struct PVFS_dev_map_desc32 __user *p32 = (void __user *)args;
+	/*
+	 * Depending on the architecture, allocate some space on the
+	 * user-call-stack based on our expected layout.
+	 */
+	struct PVFS_dev_map_desc __user *p =
+	    compat_alloc_user_space(sizeof(*p));
+	u32 addr;
+
+	*error = 0;
+	/* get the ptr from the 32 bit user-space */
+	if (get_user(addr, &p32->ptr))
+		goto err;
+	/* try to put that into a 64-bit layout */
+	if (put_user(compat_ptr(addr), &p->ptr))
+		goto err;
+	/* copy the remaining fields */
+	if (copy_in_user(&p->total_size, &p32->total_size, sizeof(__s32)))
+		goto err;
+	if (copy_in_user(&p->size, &p32->size, sizeof(__s32)))
+		goto err;
+	if (copy_in_user(&p->count, &p32->count, sizeof(__s32)))
+		goto err;
+	return (unsigned long)p;
+err:
+	*error = -EFAULT;
+	return 0;
+}
+
+/*
+ * ioctl handler for 32-bit user-space apps when the kernel module
+ * is compiled as a 64-bit one
+ */
+static long pvfs2_devreq_compat_ioctl(struct file *filp, unsigned int cmd,
+				      unsigned long args)
+{
+	long ret;
+	unsigned long arg = args;
+
+	/* Check for properly constructed commands */
+	ret = check_ioctl_command(cmd);
+	if (ret < 0)
+		return ret;
+	if (cmd == PVFS_DEV_MAP) {
+		/*
+		 * convert the arguments to what we expect internally
+		 * in kernel space
+		 */
+		arg = translate_dev_map26(args, &ret);
+		if (ret < 0) {
+			gossip_err("Could not translate dev map\n");
+			return ret;
+		}
+	}
+	/* no other ioctl requires translation */
+	return dispatch_ioctl_command(cmd, arg);
+}
+
+static int pvfs2_ioctl32_init(void)
+{
+	return 0;
+}
+
+static void pvfs2_ioctl32_cleanup(void)
+{
+	return;
+}
+
+#endif /* CONFIG_COMPAT is in .config */
+
+/* the assigned character device major number */
+static int pvfs2_dev_major;
+
+/*
+ * Initialize pvfs2 device specific state:
+ * Must be called at module load time only
+ */
+int pvfs2_dev_init(void)
+{
+	int ret;
+
+	/* register the ioctl32 sub-system */
+	ret = pvfs2_ioctl32_init();
+	if (ret < 0)
+		return ret;
+
+	/* register pvfs2-req device  */
+	pvfs2_dev_major = register_chrdev(0,
+					  PVFS2_REQDEVICE_NAME,
+					  &pvfs2_devreq_file_operations);
+	if (pvfs2_dev_major < 0) {
+		gossip_debug(GOSSIP_DEV_DEBUG,
+			     "Failed to register /dev/%s (error %d)\n",
+			     PVFS2_REQDEVICE_NAME, pvfs2_dev_major);
+		pvfs2_ioctl32_cleanup();
+		return pvfs2_dev_major;
+	}
+
+	gossip_debug(GOSSIP_DEV_DEBUG,
+		     "*** /dev/%s character device registered ***\n",
+		     PVFS2_REQDEVICE_NAME);
+	gossip_debug(GOSSIP_DEV_DEBUG, "'mknod /dev/%s c %d 0'.\n",
+		     PVFS2_REQDEVICE_NAME, pvfs2_dev_major);
+	return 0;
+}
+
+void pvfs2_dev_cleanup(void)
+{
+	unregister_chrdev(pvfs2_dev_major, PVFS2_REQDEVICE_NAME);
+	gossip_debug(GOSSIP_DEV_DEBUG,
+		     "*** /dev/%s character device unregistered ***\n",
+		     PVFS2_REQDEVICE_NAME);
+	/* unregister the ioctl32 sub-system */
+	pvfs2_ioctl32_cleanup();
+}
+
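+/*
+ * Let the client-core daemon sleep in poll()/select() until an upcall
+ * has been queued on pvfs2_request_list.
+ */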
+static unsigned int pvfs2_devreq_poll(struct file *file,
+				      struct poll_table_struct *poll_table)
+{
+	int poll_revent_mask = 0;
+
+	if (open_access_count == 1) {
+		poll_wait(file, &pvfs2_request_list_waitq, poll_table);
+
+		spin_lock(&pvfs2_request_list_lock);
+		if (!list_empty(&pvfs2_request_list))
+			poll_revent_mask |= POLL_IN;
+		spin_unlock(&pvfs2_request_list_lock);
+	}
+	return poll_revent_mask;
+}
+
+const struct file_operations pvfs2_devreq_file_operations = {
+	.owner = THIS_MODULE,
+	.read = pvfs2_devreq_read,
+	.write_iter = pvfs2_devreq_write_iter,
+	.open = pvfs2_devreq_open,
+	.release = pvfs2_devreq_release,
+	.unlocked_ioctl = pvfs2_devreq_ioctl,
+
+#ifdef CONFIG_COMPAT		/* CONFIG_COMPAT is in .config */
+	.compat_ioctl = pvfs2_devreq_compat_ioctl,
+#endif
+	.poll = pvfs2_devreq_poll
+};
diff --git a/fs/orangefs/dir.c b/fs/orangefs/dir.c
new file mode 100644
index 0000000..9b5f4bb
--- /dev/null
+++ b/fs/orangefs/dir.c
@@ -0,0 +1,394 @@
+/*
+ * (C) 2001 Clemson University and The University of Chicago
+ *
+ * See COPYING in top-level directory.
+ */
+
+#include "protocol.h"
+#include "pvfs2-kernel.h"
+#include "pvfs2-bufmap.h"
+
+struct readdir_handle_s {
+	int buffer_index;
+	struct pvfs2_readdir_response_s readdir_response;
+	void *dents_buf;
+};
+
+/*
+ * decode routine needed by kmod to make sense of the shared page for readdirs.
+ */
+static long decode_dirents(char *ptr, struct pvfs2_readdir_response_s *readdir)
+{
+	int i;
+	struct pvfs2_readdir_response_s *rd =
+		(struct pvfs2_readdir_response_s *) ptr;
+	char *buf = ptr;
+	char **pptr = &buf;
+
+	readdir->token = rd->token;
+	readdir->pvfs_dirent_outcount = rd->pvfs_dirent_outcount;
+	readdir->dirent_array = kmalloc(readdir->pvfs_dirent_outcount *
+					sizeof(*readdir->dirent_array),
+					GFP_KERNEL);
+	if (readdir->dirent_array == NULL)
+		return -ENOMEM;
+	*pptr += offsetof(struct pvfs2_readdir_response_s, dirent_array);
+	for (i = 0; i < readdir->pvfs_dirent_outcount; i++) {
+		dec_string(pptr, &readdir->dirent_array[i].d_name,
+			   &readdir->dirent_array[i].d_length);
+		readdir->dirent_array[i].khandle =
+			*(struct pvfs2_khandle *) *pptr;
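+		/* advance past the 16-byte khandle we just copied */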
+		*pptr += 16;
+	}
+	return (unsigned long)*pptr - (unsigned long)ptr;
+}
+
+static long readdir_handle_ctor(struct readdir_handle_s *rhandle, void *buf,
+				int buffer_index)
+{
+	long ret;
+
+	if (buf == NULL) {
+		gossip_err
+		    ("Invalid NULL buffer specified in readdir_handle_ctor\n");
+		return -ENOMEM;
+	}
+	if (buffer_index < 0) {
+		gossip_err
+		    ("Invalid buffer index specified in readdir_handle_ctor\n");
+		return -EINVAL;
+	}
+	rhandle->buffer_index = buffer_index;
+	rhandle->dents_buf = buf;
+	ret = decode_dirents(buf, &rhandle->readdir_response);
+	if (ret < 0) {
+		gossip_err("Could not decode readdir from buffer %ld\n", ret);
+		rhandle->buffer_index = -1;
+		gossip_debug(GOSSIP_DIR_DEBUG, "vfree %p\n", buf);
+		vfree(buf);
+		rhandle->dents_buf = NULL;
+	}
+	return ret;
+}
+
+static void readdir_handle_dtor(struct pvfs2_bufmap *bufmap,
+		struct readdir_handle_s *rhandle)
+{
+	if (rhandle == NULL)
+		return;
+
+	/* kfree(NULL) is safe */
+	kfree(rhandle->readdir_response.dirent_array);
+	rhandle->readdir_response.dirent_array = NULL;
+
+	if (rhandle->buffer_index >= 0) {
+		readdir_index_put(bufmap, rhandle->buffer_index);
+		rhandle->buffer_index = -1;
+	}
+	if (rhandle->dents_buf) {
+		gossip_debug(GOSSIP_DIR_DEBUG, "vfree %p\n",
+			     rhandle->dents_buf);
+		vfree(rhandle->dents_buf);
+		rhandle->dents_buf = NULL;
+	}
+}
+
+/*
+ * Read directory entries from an instance of an open directory.
+ *
+ * \note This routine was converted for the readdir to iterate change
+ *       in "struct file_operations". "converted" mostly amounts to
+ *       changing occurrences of "readdir" and "filldir" in the
+ *       comments to "iterate" and "dir_emit". Also filldir calls
+ *       were changed to dir_emit calls.
+ *
+ * \param dir_emit callback function called for each entry read.
+ *
+ * \retval <0 on error
+ * \retval 0  when directory has been completely traversed
+ * \retval >0 if we don't call dir_emit for all entries
+ *
+ * \note If the dir_emit call-back returns non-zero, then iterate should
+ *       assume that it has had enough, and should return as well.
+ */
+static int pvfs2_readdir(struct file *file, struct dir_context *ctx)
+{
+	struct pvfs2_bufmap *bufmap = NULL;
+	int ret = 0;
+	int buffer_index;
+	__u64 *ptoken = file->private_data;
+	__u64 pos = 0;
+	ino_t ino = 0;
+	struct dentry *dentry = file->f_path.dentry;
+	struct pvfs2_kernel_op_s *new_op = NULL;
+	struct pvfs2_inode_s *pvfs2_inode = PVFS2_I(dentry->d_inode);
+	int buffer_full = 0;
+	struct readdir_handle_s rhandle;
+	int i = 0;
+	int len = 0;
+	ino_t current_ino = 0;
+	char *current_entry = NULL;
+	long bytes_decoded;
+
+	gossip_ldebug(GOSSIP_DIR_DEBUG,
+		      "%s: ctx->pos:%lld, token = %llu\n",
+		      __func__,
+		      lld(ctx->pos),
+		      llu(*ptoken));
+
+	pos = (__u64) ctx->pos;
+
+	/* are we done? */
+	if (pos == PVFS_READDIR_END) {
+		gossip_debug(GOSSIP_DIR_DEBUG,
+			     "Skipping to termination path\n");
+		return 0;
+	}
+
+	gossip_debug(GOSSIP_DIR_DEBUG,
+		     "pvfs2_readdir called on %s (pos=%llu)\n",
+		     dentry->d_name.name, llu(pos));
+
+	rhandle.buffer_index = -1;
+	rhandle.dents_buf = NULL;
+	memset(&rhandle.readdir_response, 0, sizeof(rhandle.readdir_response));
+
+	new_op = op_alloc(PVFS2_VFS_OP_READDIR);
+	if (!new_op)
+		return -ENOMEM;
+
+	new_op->uses_shared_memory = 1;
+	new_op->upcall.req.readdir.refn = pvfs2_inode->refn;
+	new_op->upcall.req.readdir.max_dirent_count = MAX_DIRENT_COUNT_READDIR;
+
+	gossip_debug(GOSSIP_DIR_DEBUG,
+		     "%s: upcall.req.readdir.refn.khandle: %pU\n",
+		     __func__,
+		     &new_op->upcall.req.readdir.refn.khandle);
+
+	/*
+	 * NOTE: the position we send to the readdir upcall is out of
+	 * sync with ctx->pos since:
+	 * 1. pvfs2 doesn't include the "." and ".." entries that are
+	 *    added below.
+	 * 2. the distributed directory logic means the token is no longer
+	 *    related to f_pos and pos.  Instead an independent variable is
+	 *    used inside the function and stored in the private_data of
+	 *    the file structure.
+	 */
+	new_op->upcall.req.readdir.token = *ptoken;
+
+get_new_buffer_index:
+	ret = readdir_index_get(&bufmap, &buffer_index);
+	if (ret < 0) {
+		gossip_lerr("pvfs2_readdir: readdir_index_get() failure (%d)\n",
+			    ret);
+		goto out_free_op;
+	}
+	new_op->upcall.req.readdir.buf_index = buffer_index;
+
+	ret = service_operation(new_op,
+				"pvfs2_readdir",
+				get_interruptible_flag(dentry->d_inode));
+
+	gossip_debug(GOSSIP_DIR_DEBUG,
+		     "Readdir downcall status is %d.  ret:%d\n",
+		     new_op->downcall.status,
+		     ret);
+
+	if (ret == -EAGAIN && op_state_purged(new_op)) {
+		/*
+		 * readdir shared memory area has been wiped due to
+		 * pvfs2-client-core restarting, so we must get a new
+		 * index into the shared memory.
+		 */
+		gossip_debug(GOSSIP_DIR_DEBUG,
+			"%s: Getting new buffer_index for retry of readdir..\n",
+			 __func__);
+		readdir_index_put(bufmap, buffer_index);
+		goto get_new_buffer_index;
+	}
+
+	if (ret == -EIO && op_state_purged(new_op)) {
+		gossip_err("%s: Client is down. Aborting readdir call.\n",
+			__func__);
+		readdir_index_put(bufmap, buffer_index);
+		goto out_free_op;
+	}
+
+	if (ret < 0 || new_op->downcall.status != 0) {
+		gossip_debug(GOSSIP_DIR_DEBUG,
+			     "Readdir request failed.  Status:%d\n",
+			     new_op->downcall.status);
+		readdir_index_put(bufmap, buffer_index);
+		if (ret >= 0)
+			ret = new_op->downcall.status;
+		goto out_free_op;
+	}
+
+	bytes_decoded =
+		readdir_handle_ctor(&rhandle,
+				    new_op->downcall.trailer_buf,
+				    buffer_index);
+	if (bytes_decoded < 0) {
+		gossip_err("pvfs2_readdir: Could not decode trailer buffer into a readdir response %ld\n",
+			bytes_decoded);
+		ret = bytes_decoded;
+		readdir_index_put(bufmap, buffer_index);
+		goto out_free_op;
+	}
+
+	if (bytes_decoded != new_op->downcall.trailer_size) {
+		gossip_err("pvfs2_readdir: # bytes decoded (%ld) != trailer size (%ld)\n",
+			bytes_decoded,
+			(long)new_op->downcall.trailer_size);
+		ret = -EINVAL;
+		goto out_destroy_handle;
+	}
+
+	if (pos == 0) {
+		ino = get_ino_from_khandle(dentry->d_inode);
+		gossip_debug(GOSSIP_DIR_DEBUG,
+			     "%s: calling dir_emit of \".\" with pos = %llu\n",
+			     __func__,
+			     llu(pos));
+		ret = dir_emit(ctx, ".", 1, ino, DT_DIR);
+		if (ret < 0)
+			goto out_destroy_handle;
+		ctx->pos++;
+		gossip_ldebug(GOSSIP_DIR_DEBUG,
+			      "%s: ctx->pos:%lld\n",
+			      __func__,
+			      lld(ctx->pos));
+		pos++;
+	}
+
+	if (pos == 1) {
+		ino = get_parent_ino_from_dentry(dentry);
+		gossip_debug(GOSSIP_DIR_DEBUG,
+			     "%s: calling dir_emit of \"..\" with pos = %llu\n",
+			     __func__,
+			     llu(pos));
+		ret = dir_emit(ctx, "..", 2, ino, DT_DIR);
+		if (ret < 0)
+			goto out_destroy_handle;
+		ctx->pos++;
+		gossip_ldebug(GOSSIP_DIR_DEBUG,
+			      "%s: ctx->pos:%lld\n",
+			      __func__,
+			      lld(ctx->pos));
+		pos++;
+	}
+
+	for (i = 0; i < rhandle.readdir_response.pvfs_dirent_outcount; i++) {
+		len = rhandle.readdir_response.dirent_array[i].d_length;
+		current_entry = rhandle.readdir_response.dirent_array[i].d_name;
+		current_ino = pvfs2_khandle_to_ino(
+			&(rhandle.readdir_response.dirent_array[i].khandle));
+
+		gossip_debug(GOSSIP_DIR_DEBUG,
+			     "calling dir_emit for %s with len %d, pos %ld\n",
+			     current_entry,
+			     len,
+			     (unsigned long)pos);
+		ret =
+		    dir_emit(ctx, current_entry, len, current_ino, DT_UNKNOWN);
+		if (ret < 0) {
+			gossip_debug(GOSSIP_DIR_DEBUG,
+				     "dir_emit() failed. ret:%d\n",
+				     ret);
+			if (i < 2) {
+				gossip_err("dir_emit failed on one of the first two true PVFS directory entries.\n");
+				gossip_err("Duplicate entries may appear.\n");
+			}
+			buffer_full = 1;
+			break;
+		}
+		ctx->pos++;
+		gossip_ldebug(GOSSIP_DIR_DEBUG,
+			      "%s: ctx->pos:%lld\n",
+			      __func__,
+			      lld(ctx->pos));
+
+		pos++;
+	}
+
+	/* this means that all of the dir_emit calls succeeded */
+	if (i == rhandle.readdir_response.pvfs_dirent_outcount) {
+		/* update token */
+		*ptoken = rhandle.readdir_response.token;
+	} else {
+		/* this means a dir_emit call failed */
+		if (rhandle.readdir_response.token == PVFS_READDIR_END) {
+			/*
+			 * If PVFS hit end of directory, then there
+			 * is no way to do math on the token that it
+			 * returned. Instead we go by ctx->pos but
+			 * back up to account for the artificial .
+			 * and .. entries.
+			 */
+			ctx->pos -= 3;
+		} else {
+			/*
+			 * A dir_emit call failed; set ctx->pos back to its
+			 * previous value, since no intermediate value is allowed.
+			 */
+			pos -= (i - 1);
+			ctx->pos -= (i - 1);
+		}
+		gossip_debug(GOSSIP_DIR_DEBUG,
+			"at least one dir_emit call failed. Setting ctx->pos to: %lld\n",
+			lld(ctx->pos));
+	}
+
+	/*
+	 * Did we hit the end of the directory?
+	 */
+	if (rhandle.readdir_response.token == PVFS_READDIR_END &&
+	    !buffer_full) {
+		gossip_debug(GOSSIP_DIR_DEBUG, "End of dir detected; setting ctx->pos to PVFS_READDIR_END.\n");
+		ctx->pos = PVFS_READDIR_END;
+	}
+
+	gossip_debug(GOSSIP_DIR_DEBUG,
+		     "pos = %llu, token = %llu"
+		     ", ctx->pos should have been %lld\n",
+		     llu(pos),
+		     llu(*ptoken),
+		     lld(ctx->pos));
+
+out_destroy_handle:
+	readdir_handle_dtor(bufmap, &rhandle);
+out_free_op:
+	op_release(new_op);
+	gossip_debug(GOSSIP_DIR_DEBUG, "pvfs2_readdir returning %d\n", ret);
+	return ret;
+}
+
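+/*
+ * Stash a per-open readdir token in private_data; it tracks our
+ * position in the directory independently of f_pos (see the NOTE in
+ * pvfs2_readdir above).
+ */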
+static int pvfs2_dir_open(struct inode *inode, struct file *file)
+{
+	__u64 *ptoken;
+
+	file->private_data = kmalloc(sizeof(__u64), GFP_KERNEL);
+	if (!file->private_data)
+		return -ENOMEM;
+
+	ptoken = file->private_data;
+	*ptoken = PVFS_READDIR_START;
+	return 0;
+}
+
+static int pvfs2_dir_release(struct inode *inode, struct file *file)
+{
+	pvfs2_flush_inode(inode);
+	kfree(file->private_data);
+	return 0;
+}
+
+/** PVFS2 implementation of VFS directory operations */
+const struct file_operations pvfs2_dir_operations = {
+	.read = generic_read_dir,
+	.iterate = pvfs2_readdir,
+	.open = pvfs2_dir_open,
+	.release = pvfs2_dir_release,
+};
diff --git a/fs/orangefs/file.c b/fs/orangefs/file.c
new file mode 100644
index 0000000..8e26f9f
--- /dev/null
+++ b/fs/orangefs/file.c
@@ -0,0 +1,1019 @@
+/*
+ * (C) 2001 Clemson University and The University of Chicago
+ *
+ * See COPYING in top-level directory.
+ */
+
+/*
+ *  Linux VFS file operations.
+ */
+
+#include "protocol.h"
+#include "pvfs2-kernel.h"
+#include "pvfs2-bufmap.h"
+#include <linux/fs.h>
+#include <linux/pagemap.h>
+
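+/* Mark an op's I/O as complete and wake up whoever is waiting on it. */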
+#define wake_up_daemon_for_return(op)			\
+do {							\
+	spin_lock(&op->lock);                           \
+	op->io_completed = 1;                           \
+	spin_unlock(&op->lock);                         \
+	wake_up_interruptible(&op->io_completion_waitq);\
+} while (0)
+
+/*
+ * Copy to client-core's address space from the buffers specified
+ * by the iovec, up to total_size bytes.
+ * NOTE: the iovec can either contain addresses (which may be
+ *       kernel-space or user-space addresses) or it can contain
+ *       pointers to struct pages.
+ */
+static int precopy_buffers(struct pvfs2_bufmap *bufmap,
+			   int buffer_index,
+			   const struct iovec *vec,
+			   unsigned long nr_segs,
+			   size_t total_size,
+			   int from_user)
+{
+	int ret = 0;
+
+	/*
+	 * copy data from application/kernel by pulling it out
+	 * of the iovec.
+	 */
+	/* Are we copying from User Virtual Addresses? */
+	if (from_user)
+		ret = pvfs_bufmap_copy_iovec_from_user(
+			bufmap,
+			buffer_index,
+			vec,
+			nr_segs,
+			total_size);
+	/* Are we copying from Kernel Virtual Addresses? */
+	else
+		ret = pvfs_bufmap_copy_iovec_from_kernel(
+			bufmap,
+			buffer_index,
+			vec,
+			nr_segs,
+			total_size);
+	if (ret < 0)
+		gossip_err("%s: Failed to copy-in buffers. Please make sure that the pvfs2-client is running. %ld\n",
+			__func__,
+			(long)ret);
+	return ret;
+}
+
+/*
+ * Copy from client-core's address space to the buffers specified
+ * by the iovec, up to total_size bytes.
+ * NOTE: the iovec can either contain addresses (which may be
+ *       kernel-space or user-space addresses) or it can contain
+ *       pointers to struct pages.
+ */
+static int postcopy_buffers(struct pvfs2_bufmap *bufmap,
+			    int buffer_index,
+			    const struct iovec *vec,
+			    int nr_segs,
+			    size_t total_size,
+			    int to_user)
+{
+	int ret = 0;
+
+	/*
+	 * copy data to application/kernel by pushing it out to
+	 * the iovec. NOTE: target buffers can be addresses or
+	 * struct page pointers.
+	 */
+	if (total_size) {
+		/* Are we copying to User Virtual Addresses? */
+		if (to_user)
+			ret = pvfs_bufmap_copy_to_user_iovec(
+				bufmap,
+				buffer_index,
+				vec,
+				nr_segs,
+				total_size);
+		/* Are we copying to Kern Virtual Addresses? */
+		else
+			ret = pvfs_bufmap_copy_to_kernel_iovec(
+				bufmap,
+				buffer_index,
+				vec,
+				nr_segs,
+				total_size);
+		if (ret < 0)
+			gossip_err("%s: Failed to copy-out buffers.  Please make sure that the pvfs2-client is running (%ld)\n",
+				__func__,
+				(long)ret);
+	}
+	return ret;
+}
+
+/*
+ * Post and wait for the I/O upcall to finish
+ */
+static ssize_t wait_for_direct_io(enum PVFS_io_type type, struct inode *inode,
+		loff_t *offset, struct iovec *vec, unsigned long nr_segs,
+		size_t total_size, loff_t readahead_size, int to_user)
+{
+	struct pvfs2_inode_s *pvfs2_inode = PVFS2_I(inode);
+	struct pvfs2_khandle *handle = &pvfs2_inode->refn.khandle;
+	struct pvfs2_bufmap *bufmap = NULL;
+	struct pvfs2_kernel_op_s *new_op = NULL;
+	int buffer_index = -1;
+	ssize_t ret;
+
+	new_op = op_alloc(PVFS2_VFS_OP_FILE_IO);
+	if (!new_op) {
+		ret = -ENOMEM;
+		goto out;
+	}
+	/* synchronous I/O */
+	new_op->upcall.req.io.async_vfs_io = PVFS_VFS_SYNC_IO;
+	new_op->upcall.req.io.readahead_size = readahead_size;
+	new_op->upcall.req.io.io_type = type;
+	new_op->upcall.req.io.refn = pvfs2_inode->refn;
+
+populate_shared_memory:
+	/* get a shared buffer index */
+	ret = pvfs_bufmap_get(&bufmap, &buffer_index);
+	if (ret < 0) {
+		gossip_debug(GOSSIP_FILE_DEBUG,
+			     "%s: pvfs_bufmap_get failure (%ld)\n",
+			     __func__, (long)ret);
+		goto out;
+	}
+	gossip_debug(GOSSIP_FILE_DEBUG,
+		     "%s(%pU): GET op %p -> buffer_index %d\n",
+		     __func__,
+		     handle,
+		     new_op,
+		     buffer_index);
+
+	new_op->uses_shared_memory = 1;
+	new_op->upcall.req.io.buf_index = buffer_index;
+	new_op->upcall.req.io.count = total_size;
+	new_op->upcall.req.io.offset = *offset;
+
+	gossip_debug(GOSSIP_FILE_DEBUG,
+		     "%s(%pU): copy_to_user %d nr_segs %lu, offset: %llu total_size: %zd\n",
+		     __func__,
+		     handle,
+		     to_user,
+		     nr_segs,
+		     llu(*offset),
+		     total_size);
+	/*
+	 * Stage 1: copy the buffers into client-core's address space
+	 * precopy_buffers only pertains to writes.
+	 */
+	if (type == PVFS_IO_WRITE) {
+		ret = precopy_buffers(bufmap,
+				      buffer_index,
+				      vec,
+				      nr_segs,
+				      total_size,
+				      to_user);
+		if (ret < 0)
+			goto out;
+	}
+
+	gossip_debug(GOSSIP_FILE_DEBUG,
+		     "%s(%pU): Calling post_io_request with tag (%llu)\n",
+		     __func__,
+		     handle,
+		     llu(new_op->tag));
+
+	/* Stage 2: Service the I/O operation */
+	ret = service_operation(new_op,
+				type == PVFS_IO_WRITE ?
+					"file_write" :
+					"file_read",
+				get_interruptible_flag(inode));
+
+	/*
+	 * If service_operation() returns -EAGAIN #and# the operation was
+	 * purged from pvfs2_request_list or htable_ops_in_progress, then
+	 * we know that the client was restarted, causing the shared memory
+	 * area to be wiped clean.  To restart a write operation in this
+	 * case, we must re-copy the data from the user's iovec to a NEW
+	 * shared memory location. To restart a read operation, we must get
+	 * a new shared memory location.
+	 */
+	if (ret == -EAGAIN && op_state_purged(new_op)) {
+		pvfs_bufmap_put(bufmap, buffer_index);
+		gossip_debug(GOSSIP_FILE_DEBUG,
+			     "%s:going to repopulate_shared_memory.\n",
+			     __func__);
+		goto populate_shared_memory;
+	}
+
+	if (ret < 0) {
+		handle_io_error(); /* defined in pvfs2-kernel.h */
+		/*
+		   don't write an error to syslog on signaled operation
+		   termination unless we've got debugging turned on, as
+		   this can happen regularly (e.g. ctrl-c)
+		 */
+		if (ret == -EINTR)
+			gossip_debug(GOSSIP_FILE_DEBUG,
+				     "%s: returning error %ld\n", __func__,
+				     (long)ret);
+		else
+			gossip_err("%s: error in %s handle %pU, returning %zd\n",
+				__func__,
+				type == PVFS_IO_READ ?
+					"read from" : "write to",
+				handle, ret);
+		goto out;
+	}
+
+	/*
+	 * Stage 3: Post copy buffers from client-core's address space
+	 * postcopy_buffers only pertains to reads.
+	 */
+	if (type == PVFS_IO_READ) {
+		ret = postcopy_buffers(bufmap,
+				       buffer_index,
+				       vec,
+				       nr_segs,
+				       new_op->downcall.resp.io.amt_complete,
+				       to_user);
+		if (ret < 0) {
+			/*
+			 * put error codes in downcall so that handle_io_error()
+			 * preserves it properly
+			 */
+			new_op->downcall.status = ret;
+			handle_io_error();
+			goto out;
+		}
+	}
+	gossip_debug(GOSSIP_FILE_DEBUG,
+	    "%s(%pU): Amount written as returned by the sys-io call:%d\n",
+	    __func__,
+	    handle,
+	    (int)new_op->downcall.resp.io.amt_complete);
+
+	ret = new_op->downcall.resp.io.amt_complete;
+
+	/*
+	   tell the device file owner waiting on I/O that this operation
+	   has completed and it can return now.  In this exact case, on
+	   wakeup the daemon will free the op, so we *cannot* touch it
+	   after this.
+	 */
+	wake_up_daemon_for_return(new_op);
+	new_op = NULL;
+
+out:
+	if (buffer_index >= 0) {
+		pvfs_bufmap_put(bufmap, buffer_index);
+		gossip_debug(GOSSIP_FILE_DEBUG,
+			     "%s(%pU): PUT buffer_index %d\n",
+			     __func__, handle, buffer_index);
+		buffer_index = -1;
+	}
+	if (new_op) {
+		op_release(new_op);
+		new_op = NULL;
+	}
+	return ret;
+}
+
+/*
+ * We need this to support readv and writev requests that are larger than
+ * pvfs_bufmap_size_query() (the default is PVFS2_BUFMAP_DEFAULT_DESC_SIZE MB).
+ * That means we create new iovec descriptors for the memory regions that
+ * extend beyond the limit.  The return value of this routine is negative
+ * in case of errors and 0 in case of success.
+ *
+ * Further, the new_nr_segs pointer is updated to hold the new number
+ * of iovecs, the new_vec pointer is updated to point to the new split
+ * iovec, and seg_array holds, for each block-sized chunk, the number
+ * of the new iovecs that make up that chunk.
+ * The max_new_nr_segs value is computed by the caller and passed in.
+ * (It will be (count of all iov_len / block_size) + 1).
+ */
+static int split_iovecs(unsigned long max_new_nr_segs,		/* IN */
+			unsigned long nr_segs,			/* IN */
+			const struct iovec *original_iovec,	/* IN */
+			unsigned long *new_nr_segs,		/* OUT */
+			struct iovec **new_vec,			/* OUT */
+			unsigned long *seg_count,		/* OUT */
+			unsigned long **seg_array)		/* OUT */
+{
+	unsigned long seg;
+	unsigned long count = 0;
+	unsigned long begin_seg;
+	unsigned long tmpnew_nr_segs = 0;
+	struct iovec *new_iovec = NULL;
+	struct iovec *orig_iovec;
+	unsigned long *sizes = NULL;
+	unsigned long sizes_count = 0;
+
+	if (nr_segs <= 0 ||
+	    original_iovec == NULL ||
+	    new_nr_segs == NULL ||
+	    new_vec == NULL ||
+	    seg_count == NULL ||
+	    seg_array == NULL ||
+	    max_new_nr_segs <= 0) {
+		gossip_err("Invalid parameters to split_iovecs\n");
+		return -EINVAL;
+	}
+	*new_nr_segs = 0;
+	*new_vec = NULL;
+	*seg_count = 0;
+	*seg_array = NULL;
+	/* copy the passed in iovec descriptor to a temp structure */
+	orig_iovec = kmalloc_array(nr_segs,
+				   sizeof(*orig_iovec),
+				   PVFS2_BUFMAP_GFP_FLAGS);
+	if (orig_iovec == NULL) {
+		gossip_err(
+		    "split_iovecs: Could not allocate memory for %lu bytes!\n",
+		    (unsigned long)(nr_segs * sizeof(*orig_iovec)));
+		return -ENOMEM;
+	}
+	new_iovec = kcalloc(max_new_nr_segs,
+			    sizeof(*new_iovec),
+			    PVFS2_BUFMAP_GFP_FLAGS);
+	if (new_iovec == NULL) {
+		kfree(orig_iovec);
+		gossip_err(
+		    "split_iovecs: Could not allocate memory for %lu bytes!\n",
+		    (unsigned long)(max_new_nr_segs * sizeof(*new_iovec)));
+		return -ENOMEM;
+	}
+	sizes = kcalloc(max_new_nr_segs,
+			sizeof(*sizes),
+			PVFS2_BUFMAP_GFP_FLAGS);
+	if (sizes == NULL) {
+		kfree(new_iovec);
+		kfree(orig_iovec);
+		gossip_err(
+		    "split_iovecs: Could not allocate memory for %lu bytes!\n",
+		    (unsigned long)(max_new_nr_segs * sizeof(*sizes)));
+		return -ENOMEM;
+	}
+	/* copy the passed in iovec to a temp structure */
+	memcpy(orig_iovec, original_iovec, nr_segs * sizeof(*orig_iovec));
+	begin_seg = 0;
+repeat:
+	for (seg = begin_seg; seg < nr_segs; seg++) {
+		if (tmpnew_nr_segs >= max_new_nr_segs ||
+		    sizes_count >= max_new_nr_segs) {
+			kfree(sizes);
+			kfree(orig_iovec);
+			kfree(new_iovec);
+			gossip_err
+			    ("split_iovecs: exceeded the index limit (%lu)\n",
+			    tmpnew_nr_segs);
+			return -EINVAL;
+		}
+		if (count + orig_iovec[seg].iov_len <
+		    pvfs_bufmap_size_query()) {
+			count += orig_iovec[seg].iov_len;
+			memcpy(&new_iovec[tmpnew_nr_segs],
+			       &orig_iovec[seg],
+			       sizeof(*new_iovec));
+			tmpnew_nr_segs++;
+			sizes[sizes_count]++;
+		} else {
+			new_iovec[tmpnew_nr_segs].iov_base =
+			    orig_iovec[seg].iov_base;
+			new_iovec[tmpnew_nr_segs].iov_len =
+			    (pvfs_bufmap_size_query() - count);
+			tmpnew_nr_segs++;
+			sizes[sizes_count]++;
+			sizes_count++;
+			begin_seg = seg;
+			orig_iovec[seg].iov_base +=
+			    (pvfs_bufmap_size_query() - count);
+			orig_iovec[seg].iov_len -=
+			    (pvfs_bufmap_size_query() - count);
+			count = 0;
+			break;
+		}
+	}
+	if (seg != nr_segs)
+		goto repeat;
+	else
+		sizes_count++;
+
+	*new_nr_segs = tmpnew_nr_segs;
+	/* new_iovec is freed by the caller */
+	*new_vec = new_iovec;
+	*seg_count = sizes_count;
+	/* seg_array is also freed by the caller */
+	*seg_array = sizes;
+	kfree(orig_iovec);
+	return 0;
+}
+
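+/*
+ * Walk the iovec and return an upper bound on the number of segments
+ * that will be needed once the request is split at
+ * pvfs_bufmap_size_query() boundaries; the total byte count is
+ * returned through total_count.
+ */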
+static long bound_max_iovecs(const struct iovec *curr, unsigned long nr_segs,
+			     ssize_t *total_count)
+{
+	unsigned long i;
+	long max_nr_iovecs;
+	ssize_t total;
+	ssize_t count;
+
+	total = 0;
+	count = 0;
+	max_nr_iovecs = 0;
+	for (i = 0; i < nr_segs; i++) {
+		const struct iovec *iv = &curr[i];
+
+		count += iv->iov_len;
+		if (unlikely((ssize_t) (count | iv->iov_len) < 0))
+			return -EINVAL;
+		if (total + iv->iov_len < pvfs_bufmap_size_query()) {
+			total += iv->iov_len;
+			max_nr_iovecs++;
+		} else {
+			total =
+			    (total + iv->iov_len - pvfs_bufmap_size_query());
+			max_nr_iovecs += (total / pvfs_bufmap_size_query() + 2);
+		}
+	}
+	*total_count = count;
+	return max_nr_iovecs;
+}
+
+/*
+ * Common entry point for read/write/readv/writev
+ * This function will dispatch it to either the direct I/O
+ * or buffered I/O path depending on the mount options and/or
+ * augmented/extended metadata attached to the file.
+ * Note: File extended attributes override any mount options.
+ */
+static ssize_t do_readv_writev(enum PVFS_io_type type, struct file *file,
+		loff_t *offset, const struct iovec *iov, unsigned long nr_segs)
+{
+	struct inode *inode = file->f_mapping->host;
+	struct pvfs2_inode_s *pvfs2_inode = PVFS2_I(inode);
+	struct pvfs2_khandle *handle = &pvfs2_inode->refn.khandle;
+	ssize_t ret;
+	ssize_t total_count;
+	unsigned int to_free;
+	size_t count;
+	unsigned long seg;
+	unsigned long new_nr_segs = 0;
+	unsigned long max_new_nr_segs = 0;
+	unsigned long seg_count = 0;
+	unsigned long *seg_array = NULL;
+	struct iovec *iovecptr = NULL;
+	struct iovec *ptr = NULL;
+
+	total_count = 0;
+	ret = -EINVAL;
+	count = 0;
+	to_free = 0;
+
+	/* Compute total and max number of segments after split */
+	max_new_nr_segs = bound_max_iovecs(iov, nr_segs, &count);
+	if (max_new_nr_segs < 0) {
+		gossip_lerr("%s: could not bound iovec %lu\n",
+			    __func__,
+			    max_new_nr_segs);
+		goto out;
+	}
+
+	gossip_debug(GOSSIP_FILE_DEBUG,
+		"%s-BEGIN(%pU): count(%d) after bound_max_iovecs.\n",
+		__func__,
+		handle,
+		(int)count);
+
+	if (type == PVFS_IO_WRITE) {
+		gossip_debug(GOSSIP_FILE_DEBUG,
+			     "%s(%pU): proceeding with offset : %llu, "
+			     "size %d\n",
+			     __func__,
+			     handle,
+			     llu(*offset),
+			     (int)count);
+	}
+
+	if (count == 0) {
+		ret = 0;
+		goto out;
+	}
+
+	/*
+	 * if the total size of data transfer requested is greater than
+	 * the kernel-set blocksize of PVFS2, then we split the iovecs
+	 * such that no iovec description straddles a block size limit
+	 */
+
+	gossip_debug(GOSSIP_FILE_DEBUG,
+		     "%s: pvfs_bufmap_size:%d\n",
+		     __func__,
+		     pvfs_bufmap_size_query());
+
+	if (count > pvfs_bufmap_size_query()) {
+		/*
+		 * Split up the given iovec description such that
+		 * no iovec descriptor straddles over the block-size limitation.
+		 * This makes it easier for us to stage the I/O.
+		 * In addition, this function will also compute an array
+		 * with seg_count entries that will store the number of
+		 * segments that straddle the block-size boundaries.
+		 */
+		ret = split_iovecs(max_new_nr_segs,	/* IN */
+				   nr_segs,		/* IN */
+				   iov,			/* IN */
+				   &new_nr_segs,	/* OUT */
+				   &iovecptr,		/* OUT */
+				   &seg_count,		/* OUT */
+				   &seg_array);		/* OUT */
+		if (ret < 0) {
+			gossip_err("%s: Failed to split iovecs to satisfy larger than blocksize readv/writev request %zd\n",
+				__func__,
+				ret);
+			goto out;
+		}
+		gossip_debug(GOSSIP_FILE_DEBUG,
+			     "%s: Splitting iovecs from %lu to %lu"
+			     " [max_new %lu]\n",
+			     __func__,
+			     nr_segs,
+			     new_nr_segs,
+			     max_new_nr_segs);
+		/* We must free seg_array and iovecptr */
+		to_free = 1;
+	} else {
+		new_nr_segs = nr_segs;
+		/* use the given iovec description */
+		iovecptr = (struct iovec *)iov;
+		/* There is only 1 element in the seg_array */
+		seg_count = 1;
+		/* and its value is the number of segments passed in */
+		seg_array = &nr_segs;
+		/* We don't have to free anything */
+		to_free = 0;
+	}
+	ptr = iovecptr;
+
+	gossip_debug(GOSSIP_FILE_DEBUG,
+		     "%s(%pU) %zd@%llu\n",
+		     __func__,
+		     handle,
+		     count,
+		     llu(*offset));
+	gossip_debug(GOSSIP_FILE_DEBUG,
+		     "%s(%pU): new_nr_segs: %lu, seg_count: %lu\n",
+		     __func__,
+		     handle,
+		     new_nr_segs, seg_count);
+
+/* PVFS2_KERNEL_DEBUG is a CFLAGS define. */
+#ifdef PVFS2_KERNEL_DEBUG
+	for (seg = 0; seg < new_nr_segs; seg++)
+		gossip_debug(GOSSIP_FILE_DEBUG,
+			     "%s: %d) %p to %p [%d bytes]\n",
+			     __func__,
+			     (int)seg + 1,
+			     iovecptr[seg].iov_base,
+			     iovecptr[seg].iov_base + iovecptr[seg].iov_len,
+			     (int)iovecptr[seg].iov_len);
+	for (seg = 0; seg < seg_count; seg++)
+		gossip_debug(GOSSIP_FILE_DEBUG,
+			     "%s: %zd) %lu\n",
+			     __func__,
+			     seg + 1,
+			     seg_array[seg]);
+#endif
+	seg = 0;
+	while (total_count < count) {
+		size_t each_count;
+		size_t amt_complete;
+
+		/* how much to transfer in this loop iteration */
+		each_count =
+		   (((count - total_count) > pvfs_bufmap_size_query()) ?
+			pvfs_bufmap_size_query() :
+			(count - total_count));
+
+		gossip_debug(GOSSIP_FILE_DEBUG,
+			     "%s(%pU): size of each_count(%d)\n",
+			     __func__,
+			     handle,
+			     (int)each_count);
+		gossip_debug(GOSSIP_FILE_DEBUG,
+			     "%s(%pU): BEFORE wait_for_io: offset is %d\n",
+			     __func__,
+			     handle,
+			     (int)*offset);
+
+		ret = wait_for_direct_io(type, inode, offset, ptr,
+				seg_array[seg], each_count, 0, 1);
+		gossip_debug(GOSSIP_FILE_DEBUG,
+			     "%s(%pU): return from wait_for_io:%d\n",
+			     __func__,
+			     handle,
+			     (int)ret);
+
+		if (ret < 0)
+			goto out;
+
+		/* advance the iovec pointer */
+		ptr += seg_array[seg];
+		seg++;
+		*offset += ret;
+		total_count += ret;
+		amt_complete = ret;
+
+		gossip_debug(GOSSIP_FILE_DEBUG,
+			     "%s(%pU): AFTER wait_for_io: offset is %d\n",
+			     __func__,
+			     handle,
+			     (int)*offset);
+
+		/*
+		 * if we got a short I/O operation,
+		 * fall out and return what we got so far
+		 */
+		if (amt_complete < each_count)
+			break;
+	} /*end while */
+
+	if (total_count > 0)
+		ret = total_count;
+out:
+	if (to_free) {
+		kfree(iovecptr);
+		kfree(seg_array);
+	}
+	if (ret > 0) {
+		if (type == PVFS_IO_READ) {
+			file_accessed(file);
+		} else {
+			SetMtimeFlag(pvfs2_inode);
+			inode->i_mtime = CURRENT_TIME;
+			mark_inode_dirty_sync(inode);
+		}
+	}
+
+	gossip_debug(GOSSIP_FILE_DEBUG,
+		     "%s(%pU): Value(%d) returned.\n",
+		     __func__,
+		     handle,
+		     (int)ret);
+
+	return ret;
+}
+
+/*
+ * Read data from a specified offset in a file (referenced by inode).
+ * Data may be placed either in a user or kernel buffer.
+ */
+ssize_t pvfs2_inode_read(struct inode *inode,
+			 char __user *buf,
+			 size_t count,
+			 loff_t *offset,
+			 loff_t readahead_size)
+{
+	struct pvfs2_inode_s *pvfs2_inode = PVFS2_I(inode);
+	size_t bufmap_size;
+	struct iovec vec;
+	ssize_t ret = -EINVAL;
+
+	g_pvfs2_stats.reads++;
+
+	vec.iov_base = buf;
+	vec.iov_len = count;
+
+	bufmap_size = pvfs_bufmap_size_query();
+	if (count > bufmap_size) {
+		gossip_debug(GOSSIP_FILE_DEBUG,
+			     "%s: count is too large (%zd/%zd)!\n",
+			     __func__, count, bufmap_size);
+		return -EINVAL;
+	}
+
+	gossip_debug(GOSSIP_FILE_DEBUG,
+		     "%s(%pU) %zd@%llu\n",
+		     __func__,
+		     &pvfs2_inode->refn.khandle,
+		     count,
+		     llu(*offset));
+
+	ret = wait_for_direct_io(PVFS_IO_READ, inode, offset, &vec, 1,
+			count, readahead_size, 0);
+	if (ret > 0)
+		*offset += ret;
+
+	gossip_debug(GOSSIP_FILE_DEBUG,
+		     "%s(%pU): Value(%zd) returned.\n",
+		     __func__,
+		     &pvfs2_inode->refn.khandle,
+		     ret);
+
+	return ret;
+}
+
+static ssize_t pvfs2_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
+{
+	struct file *file = iocb->ki_filp;
+	loff_t pos = *(&iocb->ki_pos);
+	ssize_t rc = 0;
+	unsigned long nr_segs = iter->nr_segs;
+
+	BUG_ON(iocb->private);
+
+	gossip_debug(GOSSIP_FILE_DEBUG, "pvfs2_file_read_iter\n");
+
+	g_pvfs2_stats.reads++;
+
+	rc = do_readv_writev(PVFS_IO_READ,
+			     file,
+			     &pos,
+			     iter->iov,
+			     nr_segs);
+	iocb->ki_pos = pos;
+
+	return rc;
+}
+
+static ssize_t pvfs2_file_write_iter(struct kiocb *iocb, struct iov_iter *iter)
+{
+	struct file *file = iocb->ki_filp;
+	loff_t pos = *(&iocb->ki_pos);
+	unsigned long nr_segs = iter->nr_segs;
+	ssize_t rc;
+
+	BUG_ON(iocb->private);
+
+	gossip_debug(GOSSIP_FILE_DEBUG, "pvfs2_file_write_iter\n");
+
+	mutex_lock(&file->f_mapping->host->i_mutex);
+
+	/* Make sure generic_write_checks sees an up to date inode size. */
+	if (file->f_flags & O_APPEND) {
+		rc = pvfs2_inode_getattr(file->f_mapping->host,
+					 PVFS_ATTR_SYS_SIZE);
+		if (rc) {
+			gossip_err("%s: pvfs2_inode_getattr failed, rc:%zd:.\n",
+				   __func__, rc);
+			goto out;
+		}
+	}
+
+	if (file->f_pos > i_size_read(file->f_mapping->host))
+		pvfs2_i_size_write(file->f_mapping->host, file->f_pos);
+
+	rc = generic_write_checks(iocb, iter);
+
+	if (rc <= 0) {
+		gossip_err("%s: generic_write_checks failed, rc:%zd:.\n",
+			   __func__, rc);
+		goto out;
+	}
+
+	rc = do_readv_writev(PVFS_IO_WRITE,
+			     file,
+			     &pos,
+			     iter->iov,
+			     nr_segs);
+	if (rc < 0) {
+		gossip_err("%s: do_readv_writev failed, rc:%zd:.\n",
+			   __func__, rc);
+		goto out;
+	}
+
+	iocb->ki_pos = pos;
+	g_pvfs2_stats.writes++;
+
+out:
+
+	mutex_unlock(&file->f_mapping->host->i_mutex);
+	return rc;
+}
+
+/*
+ * Perform a miscellaneous operation on a file.
+ */
+long pvfs2_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+{
+	int ret = -ENOTTY;
+	__u64 val = 0;
+	unsigned long uval;
+
+	gossip_debug(GOSSIP_FILE_DEBUG,
+		     "pvfs2_ioctl: called with cmd %d\n",
+		     cmd);
+
+	/*
+	 * we understand some general ioctls on files, such as the immutable
+	 * and append flags
+	 */
+	if (cmd == FS_IOC_GETFLAGS) {
+		val = 0;
+		ret = pvfs2_xattr_get_default(file->f_path.dentry,
+					      "user.pvfs2.meta_hint",
+					      &val,
+					      sizeof(val),
+					      0);
+		if (ret < 0 && ret != -ENODATA)
+			return ret;
+		else if (ret == -ENODATA)
+			val = 0;
+		uval = val;
+		gossip_debug(GOSSIP_FILE_DEBUG,
+			     "pvfs2_ioctl: FS_IOC_GETFLAGS: %llu\n",
+			     (unsigned long long)uval);
+		return put_user(uval, (int __user *)arg);
+	} else if (cmd == FS_IOC_SETFLAGS) {
+		ret = 0;
+		if (get_user(uval, (int __user *)arg))
+			return -EFAULT;
+		/*
+		 * PVFS_MIRROR_FL is set internally when the mirroring mode
+		 * is turned on for a file. The user is not allowed to turn
+		 * on this bit, but the bit is present if the user first gets
+		 * the flags and then updates the flags with some new
+		 * settings. So, we ignore it in the following edit. bligon.
+		 */
+		if ((uval & ~PVFS_MIRROR_FL) &
+		    (~(FS_IMMUTABLE_FL | FS_APPEND_FL | FS_NOATIME_FL))) {
+			gossip_err("pvfs2_ioctl: the FS_IOC_SETFLAGS only supports setting one of FS_IMMUTABLE_FL|FS_APPEND_FL|FS_NOATIME_FL\n");
+			return -EINVAL;
+		}
+		val = uval;
+		gossip_debug(GOSSIP_FILE_DEBUG,
+			     "pvfs2_ioctl: FS_IOC_SETFLAGS: %llu\n",
+			     (unsigned long long)val);
+		ret = pvfs2_xattr_set_default(file->f_path.dentry,
+					      "user.pvfs2.meta_hint",
+					      &val,
+					      sizeof(val),
+					      0,
+					      0);
+	}
+
+	return ret;
+}
+
+/*
+ * Memory map a region of a file.
+ */
+static int pvfs2_file_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	gossip_debug(GOSSIP_FILE_DEBUG,
+		     "pvfs2_file_mmap: called on %s\n",
+		     (file ?
+			(char *)file->f_path.dentry->d_name.name :
+			(char *)"Unknown"));
+
+	/* set the sequential readahead hint */
+	vma->vm_flags |= VM_SEQ_READ;
+	vma->vm_flags &= ~VM_RAND_READ;
+	return generic_file_mmap(file, vma);
+}
+
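+/* number of pages the page cache currently holds for this mapping */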
+#define mapping_nrpages(idata) ((idata)->nrpages)
+
+/*
+ * Called to notify the module that there are no more references to
+ * this file (i.e. no processes have it open).
+ *
+ * \note Not called when each file is closed.
+ */
+int pvfs2_file_release(struct inode *inode, struct file *file)
+{
+	gossip_debug(GOSSIP_FILE_DEBUG,
+		     "pvfs2_file_release: called on %s\n",
+		     file->f_path.dentry->d_name.name);
+
+	pvfs2_flush_inode(inode);
+
+	/*
+	   remove all associated inode pages from the page cache and mmap
+	   readahead cache (if any); this forces an expensive refresh of
+	   data for the next caller of mmap (or 'get_block' accesses)
+	 */
+	if (file->f_path.dentry->d_inode &&
+	    file->f_path.dentry->d_inode->i_mapping &&
+	    mapping_nrpages(&file->f_path.dentry->d_inode->i_data))
+		truncate_inode_pages(file->f_path.dentry->d_inode->i_mapping,
+				     0);
+	return 0;
+}
+
+/*
+ * Push all data for a specific file onto permanent storage.
+ */
+int pvfs2_fsync(struct file *file, loff_t start, loff_t end, int datasync)
+{
+	int ret = -EINVAL;
+	struct pvfs2_inode_s *pvfs2_inode =
+		PVFS2_I(file->f_path.dentry->d_inode);
+	struct pvfs2_kernel_op_s *new_op = NULL;
+
+	/* required call */
+	filemap_write_and_wait_range(file->f_mapping, start, end);
+
+	new_op = op_alloc(PVFS2_VFS_OP_FSYNC);
+	if (!new_op)
+		return -ENOMEM;
+	new_op->upcall.req.fsync.refn = pvfs2_inode->refn;
+
+	ret = service_operation(new_op,
+			"pvfs2_fsync",
+			get_interruptible_flag(file->f_path.dentry->d_inode));
+
+	gossip_debug(GOSSIP_FILE_DEBUG,
+		     "pvfs2_fsync got return value of %d\n",
+		     ret);
+
+	op_release(new_op);
+
+	pvfs2_flush_inode(file->f_path.dentry->d_inode);
+	return ret;
+}
+
+/*
+ * Change the file pointer position for an instance of an open file.
+ *
+ * \note If .llseek is overridden, we must acquire lock as described in
+ *       Documentation/filesystems/Locking.
+ *
+ * A future upgrade could support SEEK_DATA and SEEK_HOLE but would
+ * require extensive changes to the FS.
+ */
+loff_t pvfs2_file_llseek(struct file *file, loff_t offset, int origin)
+{
+	int ret = -EINVAL;
+	struct inode *inode = file->f_path.dentry->d_inode;
+
+	if (!inode) {
+		gossip_err("pvfs2_file_llseek: invalid inode (NULL)\n");
+		return ret;
+	}
+
+	if (origin == PVFS2_SEEK_END) {
+		/*
+		 * revalidate the inode's file size.
+		 * NOTE: We are only interested in file size here,
+		 * so we set mask accordingly.
+		 */
+		ret = pvfs2_inode_getattr(inode, PVFS_ATTR_SYS_SIZE);
+		if (ret) {
+			gossip_debug(GOSSIP_FILE_DEBUG,
+				     "%s:%s:%d calling make bad inode\n",
+				     __FILE__,
+				     __func__,
+				     __LINE__);
+			pvfs2_make_bad_inode(inode);
+			return ret;
+		}
+	}
+
+	gossip_debug(GOSSIP_FILE_DEBUG,
+		     "pvfs2_file_llseek: offset is %ld | origin is %d | "
+		     "inode size is %lu\n",
+		     (long)offset,
+		     origin,
+		     (unsigned long)file->f_path.dentry->d_inode->i_size);
+
+	return generic_file_llseek(file, offset, origin);
+}
+
+/*
+ * Support local locks (locks that only this kernel knows about)
+ * if Orangefs was mounted -o local_lock.
+ */
+int pvfs2_lock(struct file *filp, int cmd, struct file_lock *fl)
+{
+	int rc = -ENOLCK;
+
+	if (PVFS2_SB(filp->f_inode->i_sb)->flags & PVFS2_OPT_LOCAL_LOCK) {
+		if (cmd == F_GETLK) {
+			rc = 0;
+			posix_test_lock(filp, fl);
+		} else {
+			rc = posix_lock_file(filp, fl, NULL);
+		}
+	}
+
+	return rc;
+}
+
+/** PVFS2 implementation of VFS file operations */
+const struct file_operations pvfs2_file_operations = {
+	.llseek		= pvfs2_file_llseek,
+	.read_iter	= pvfs2_file_read_iter,
+	.write_iter	= pvfs2_file_write_iter,
+	.lock		= pvfs2_lock,
+	.unlocked_ioctl	= pvfs2_ioctl,
+	.mmap		= pvfs2_file_mmap,
+	.open		= generic_file_open,
+	.release	= pvfs2_file_release,
+	.fsync		= pvfs2_fsync,
+};
diff --git a/fs/orangefs/inode.c b/fs/orangefs/inode.c
new file mode 100644
index 0000000..feda00f
--- /dev/null
+++ b/fs/orangefs/inode.c
@@ -0,0 +1,469 @@
+/*
+ * (C) 2001 Clemson University and The University of Chicago
+ *
+ * See COPYING in top-level directory.
+ */
+
+/*
+ *  Linux VFS inode operations.
+ */
+
+#include "protocol.h"
+#include "pvfs2-kernel.h"
+#include "pvfs2-bufmap.h"
+
+static int read_one_page(struct page *page)
+{
+	void *page_data;
+	int ret;
+	int max_block;
+	ssize_t bytes_read = 0;
+	struct inode *inode = page->mapping->host;
+	const __u32 blocksize = PAGE_CACHE_SIZE;	/* inode->i_blksize */
+	const __u32 blockbits = PAGE_CACHE_SHIFT;	/* inode->i_blkbits */
+
+	gossip_debug(GOSSIP_INODE_DEBUG,
+		    "pvfs2_readpage called with page %p\n",
+		     page);
+	page_data = pvfs2_kmap(page);
+
+	max_block = ((inode->i_size / blocksize) + 1);
+
+	if (page->index < max_block) {
+		loff_t blockptr_offset = (((loff_t) page->index) << blockbits);
+
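+		/*
+		 * i_size is passed along as the readahead size hint for
+		 * client-core.
+		 */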
+		bytes_read = pvfs2_inode_read(inode,
+					      page_data,
+					      blocksize,
+					      &blockptr_offset,
+					      inode->i_size);
+	}
+	/* only zero remaining unread portions of the page data */
+	if (bytes_read > 0)
+		memset(page_data + bytes_read, 0, blocksize - bytes_read);
+	else
+		memset(page_data, 0, blocksize);
+	/* takes care of potential aliasing */
+	flush_dcache_page(page);
+	if (bytes_read < 0) {
+		ret = bytes_read;
+		SetPageError(page);
+	} else {
+		SetPageUptodate(page);
+		if (PageError(page))
+			ClearPageError(page);
+		ret = 0;
+	}
+	pvfs2_kunmap(page);
+	/* unlock the page after the ->readpage() routine completes */
+	unlock_page(page);
+	return ret;
+}
+
+static int pvfs2_readpage(struct file *file, struct page *page)
+{
+	return read_one_page(page);
+}
+
+static int pvfs2_readpages(struct file *file,
+			   struct address_space *mapping,
+			   struct list_head *pages,
+			   unsigned nr_pages)
+{
+	int page_idx;
+	int ret;
+
+	gossip_debug(GOSSIP_INODE_DEBUG, "pvfs2_readpages called\n");
+
+	for (page_idx = 0; page_idx < nr_pages; page_idx++) {
+		struct page *page;
+
+		page = list_entry(pages->prev, struct page, lru);
+		list_del(&page->lru);
+		if (!add_to_page_cache(page,
+				       mapping,
+				       page->index,
+				       GFP_KERNEL)) {
+			ret = read_one_page(page);
+			gossip_debug(GOSSIP_INODE_DEBUG,
+				"page added to cache, read_one_page returned: %d\n",
+				ret);
+		} else {
+			page_cache_release(page);
+		}
+	}
+	BUG_ON(!list_empty(pages));
+	return 0;
+}
+
+static void pvfs2_invalidatepage(struct page *page,
+				 unsigned int offset,
+				 unsigned int length)
+{
+	gossip_debug(GOSSIP_INODE_DEBUG,
+		     "pvfs2_invalidatepage called on page %p "
+		     "(offset is %u)\n",
+		     page,
+		     offset);
+
+	ClearPageUptodate(page);
+	ClearPageMappedToDisk(page);
+}
+
+static int pvfs2_releasepage(struct page *page, gfp_t foo)
+{
+	gossip_debug(GOSSIP_INODE_DEBUG,
+		     "pvfs2_releasepage called on page %p\n",
+		     page);
+	return 0;
+}
+
+/*
+ * Having a direct_IO entry point in the address_space_operations
+ * struct causes the kernel to allow us to use O_DIRECT on
+ * open.  Nothing will ever call this thing, but in the future we
+ * will need to be able to use O_DIRECT on open in order to support
+ * AIO.  This is modeled after NFS, which does the same.
+ */
+/*
+static ssize_t pvfs2_direct_IO(int rw,
+			struct kiocb *iocb,
+			struct iov_iter *iter,
+			loff_t offset)
+{
+	gossip_debug(GOSSIP_INODE_DEBUG,
+		     "pvfs2_direct_IO: %s\n",
+		     iocb->ki_filp->f_path.dentry->d_name.name);
+
+	return -EINVAL;
+}
+*/
+
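+/*
+ * No readahead pages and no dirty/writeback accounting: data moves
+ * synchronously through client-core rather than through the kernel's
+ * writeback machinery.
+ */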
+struct backing_dev_info pvfs2_backing_dev_info = {
+	.name = "pvfs2",
+	.ra_pages = 0,
+	.capabilities = BDI_CAP_NO_ACCT_DIRTY | BDI_CAP_NO_WRITEBACK,
+};
+
+/** PVFS2 implementation of address space operations */
+const struct address_space_operations pvfs2_address_operations = {
+	.readpage = pvfs2_readpage,
+	.readpages = pvfs2_readpages,
+	.invalidatepage = pvfs2_invalidatepage,
+	.releasepage = pvfs2_releasepage,
+/*	.direct_IO = pvfs2_direct_IO */
+};
+
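+/*
+ * Shrink (or grow) the local pagecache view of the file, then ask the
+ * servers to change the file size via a TRUNCATE upcall.
+ */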
+static int pvfs2_setattr_size(struct inode *inode, struct iattr *iattr)
+{
+	struct pvfs2_inode_s *pvfs2_inode = PVFS2_I(inode);
+	struct pvfs2_kernel_op_s *new_op;
+	loff_t orig_size = i_size_read(inode);
+	int ret = -EINVAL;
+
+	gossip_debug(GOSSIP_INODE_DEBUG,
+		     "%s: %pU: Handle is %pU | fs_id %d | size is %llu\n",
+		     __func__,
+		     get_khandle_from_ino(inode),
+		     &pvfs2_inode->refn.khandle,
+		     pvfs2_inode->refn.fs_id,
+		     iattr->ia_size);
+
+	truncate_setsize(inode, iattr->ia_size);
+
+	new_op = op_alloc(PVFS2_VFS_OP_TRUNCATE);
+	if (!new_op)
+		return -ENOMEM;
+
+	new_op->upcall.req.truncate.refn = pvfs2_inode->refn;
+	new_op->upcall.req.truncate.size = (__s64) iattr->ia_size;
+
+	ret = service_operation(new_op, __func__,
+				get_interruptible_flag(inode));
+
+	/*
+	 * the truncate has no downcall members to retrieve, but
+	 * the status value tells us if it went through ok or not
+	 */
+	gossip_debug(GOSSIP_INODE_DEBUG,
+		     "pvfs2: pvfs2_truncate got return value of %d\n",
+		     ret);
+
+	op_release(new_op);
+
+	if (ret != 0)
+		return ret;
+
+	/*
+	 * Only change the c/mtime if we are changing the size or we are
+	 * explicitly asked to change it.  This handles the semantic difference
+	 * between truncate() and ftruncate() as implemented in the VFS.
+	 *
+	 * The regular truncate() case without ATTR_CTIME and ATTR_MTIME is a
+	 * special case where we need to update the times despite not having
+	 * these flags set.  For all other operations the VFS set these flags
+	 * explicitly if it wants a timestamp update.
+	 */
+	if (orig_size != i_size_read(inode) &&
+	    !(iattr->ia_valid & (ATTR_CTIME | ATTR_MTIME))) {
+		iattr->ia_ctime = iattr->ia_mtime =
+			current_fs_time(inode->i_sb);
+		iattr->ia_valid |= ATTR_CTIME | ATTR_MTIME;
+	}
+
+	return ret;
+}
+
+/*
+ * Change attributes of an object referenced by dentry.
+ */
+int pvfs2_setattr(struct dentry *dentry, struct iattr *iattr)
+{
+	int ret = -EINVAL;
+	struct inode *inode = dentry->d_inode;
+
+	gossip_debug(GOSSIP_INODE_DEBUG,
+		     "pvfs2_setattr: called on %s\n",
+		     dentry->d_name.name);
+
+	ret = inode_change_ok(inode, iattr);
+	if (ret)
+		goto out;
+
+	if ((iattr->ia_valid & ATTR_SIZE) &&
+	    iattr->ia_size != i_size_read(inode)) {
+		ret = pvfs2_setattr_size(inode, iattr);
+		if (ret)
+			goto out;
+	}
+
+	setattr_copy(inode, iattr);
+	mark_inode_dirty(inode);
+
+	ret = pvfs2_inode_setattr(inode, iattr);
+	gossip_debug(GOSSIP_INODE_DEBUG,
+		     "pvfs2_setattr: inode_setattr returned %d\n",
+		     ret);
+
+	if (!ret && (iattr->ia_valid & ATTR_MODE))
+		/* change mode on a file that has ACLs */
+		ret = posix_acl_chmod(inode, inode->i_mode);
+
+out:
+	gossip_debug(GOSSIP_INODE_DEBUG, "pvfs2_setattr: returning %d\n", ret);
+	return ret;
+}
+
+/*
+ * Obtain attributes of an object given a dentry
+ */
+int pvfs2_getattr(struct vfsmount *mnt,
+		  struct dentry *dentry,
+		  struct kstat *kstat)
+{
+	int ret = -ENOENT;
+	struct inode *inode = dentry->d_inode;
+	struct pvfs2_inode_s *pvfs2_inode = NULL;
+
+	gossip_debug(GOSSIP_INODE_DEBUG,
+		     "pvfs2_getattr: called on %s\n",
+		     dentry->d_name.name);
+
+	/*
+	 * A getattr expects all fields/attributes of the inode to be
+	 * refreshed, so we don't have much choice but to refresh all
+	 * of the attributes.
+	 */
+	ret = pvfs2_inode_getattr(inode, PVFS_ATTR_SYS_ALL_NOHINT);
+	if (ret == 0) {
+		generic_fillattr(inode, kstat);
+		/* override block size reported to stat */
+		pvfs2_inode = PVFS2_I(inode);
+		kstat->blksize = pvfs2_inode->blksize;
+	} else {
+		/* assume an I/O error and flag inode as bad */
+		gossip_debug(GOSSIP_INODE_DEBUG,
+			     "%s:%s:%d calling make bad inode\n",
+			     __FILE__,
+			     __func__,
+			     __LINE__);
+		pvfs2_make_bad_inode(inode);
+	}
+	return ret;
+}
+
+/* PVFS2 implementation of VFS inode operations for files */
+struct inode_operations pvfs2_file_inode_operations = {
+	.get_acl = pvfs2_get_acl,
+	.set_acl = pvfs2_set_acl,
+	.setattr = pvfs2_setattr,
+	.getattr = pvfs2_getattr,
+	.setxattr = generic_setxattr,
+	.getxattr = generic_getxattr,
+	.listxattr = pvfs2_listxattr,
+	.removexattr = generic_removexattr,
+};
+
+static int pvfs2_init_iops(struct inode *inode)
+{
+	inode->i_mapping->a_ops = &pvfs2_address_operations;
+
+	switch (inode->i_mode & S_IFMT) {
+	case S_IFREG:
+		inode->i_op = &pvfs2_file_inode_operations;
+		inode->i_fop = &pvfs2_file_operations;
+		inode->i_blkbits = PAGE_CACHE_SHIFT;
+		break;
+	case S_IFLNK:
+		inode->i_op = &pvfs2_symlink_inode_operations;
+		break;
+	case S_IFDIR:
+		inode->i_op = &pvfs2_dir_inode_operations;
+		inode->i_fop = &pvfs2_dir_operations;
+		break;
+	default:
+		gossip_debug(GOSSIP_INODE_DEBUG,
+			     "%s: unsupported mode\n",
+			     __func__);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+/*
+ * Given a PVFS2 object identifier (fsid, handle), convert it into a ino_t type
+ * that will be used as a hash-index from where the handle will
+ * be searched for in the VFS hash table of inodes.
+ */
+static inline ino_t pvfs2_handle_hash(struct pvfs2_object_kref *ref)
+{
+	if (!ref)
+		return 0;
+	return pvfs2_khandle_to_ino(&(ref->khandle));
+}
+
+/*
+ * Called to set up an inode from iget5_locked.
+ */
+static int pvfs2_set_inode(struct inode *inode, void *data)
+{
+	struct pvfs2_object_kref *ref = (struct pvfs2_object_kref *) data;
+	struct pvfs2_inode_s *pvfs2_inode = NULL;
+
+	/* Make sure that we have sane parameters */
+	if (!data || !inode)
+		return 0;
+	pvfs2_inode = PVFS2_I(inode);
+	if (!pvfs2_inode)
+		return 0;
+	pvfs2_inode->refn.fs_id = ref->fs_id;
+	pvfs2_inode->refn.khandle = ref->khandle;
+	return 0;
+}
+
+/*
+ * Called to determine if handles match.
+ */
+static int pvfs2_test_inode(struct inode *inode, void *data)
+{
+	struct pvfs2_object_kref *ref = (struct pvfs2_object_kref *) data;
+	struct pvfs2_inode_s *pvfs2_inode = NULL;
+
+	pvfs2_inode = PVFS2_I(inode);
+	return (!PVFS_khandle_cmp(&(pvfs2_inode->refn.khandle), &(ref->khandle))
+		&& pvfs2_inode->refn.fs_id == ref->fs_id);
+}
+
+/*
+ * Front-end to lookup the inode-cache maintained by the VFS using the PVFS2
+ * file handle.
+ *
+ * @sb: the file system super block instance.
+ * @ref: The PVFS2 object for which we are trying to locate an inode structure.
+ */
+struct inode *pvfs2_iget(struct super_block *sb, struct pvfs2_object_kref *ref)
+{
+	struct inode *inode = NULL;
+	unsigned long hash;
+	int error;
+
+	hash = pvfs2_handle_hash(ref);
+	inode = iget5_locked(sb, hash, pvfs2_test_inode, pvfs2_set_inode, ref);
+	if (!inode || !(inode->i_state & I_NEW))
+		return inode;
+
+	error = pvfs2_inode_getattr(inode, PVFS_ATTR_SYS_ALL_NOHINT);
+	if (error) {
+		iget_failed(inode);
+		return ERR_PTR(error);
+	}
+
+	inode->i_ino = hash;	/* needed for stat etc */
+	pvfs2_init_iops(inode);
+	unlock_new_inode(inode);
+
+	gossip_debug(GOSSIP_INODE_DEBUG,
+		     "iget handle %pU, fsid %d hash %ld i_ino %lu\n",
+		     &ref->khandle,
+		     ref->fs_id,
+		     hash,
+		     inode->i_ino);
+
+	return inode;
+}
+
+/*
+ * Allocate an inode for a newly created file and insert it into the inode hash.
+ */
+struct inode *pvfs2_new_inode(struct super_block *sb, struct inode *dir,
+		int mode, dev_t dev, struct pvfs2_object_kref *ref)
+{
+	unsigned long hash = pvfs2_handle_hash(ref);
+	struct inode *inode;
+	int error;
+
+	gossip_debug(GOSSIP_INODE_DEBUG,
+		     "pvfs2_get_custom_inode_common: called\n"
+		     "(sb is %p | MAJOR(dev)=%u | MINOR(dev)=%u mode=%o)\n",
+		     sb,
+		     MAJOR(dev),
+		     MINOR(dev),
+		     mode);
+
+	inode = new_inode(sb);
+	if (!inode)
+		return NULL;
+
+	pvfs2_set_inode(inode, ref);
+	inode->i_ino = hash;	/* needed for stat etc */
+
+	error = pvfs2_inode_getattr(inode, PVFS_ATTR_SYS_ALL_NOHINT);
+	if (error)
+		goto out_iput;
+
+	pvfs2_init_iops(inode);
+
+	inode->i_mode = mode;
+	inode->i_uid = current_fsuid();
+	inode->i_gid = current_fsgid();
+	inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME;
+	inode->i_size = PAGE_CACHE_SIZE;
+	inode->i_rdev = dev;
+
+	error = insert_inode_locked4(inode, hash, pvfs2_test_inode, ref);
+	if (error < 0)
+		goto out_iput;
+
+	gossip_debug(GOSSIP_INODE_DEBUG,
+		     "Initializing ACL's for inode %pU\n",
+		     get_khandle_from_ino(inode));
+	pvfs2_init_acl(inode, dir);
+	return inode;
+
+out_iput:
+	iput(inode);
+	return ERR_PTR(error);
+}
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH V3 3/7] Orangefs: kernel client part 3
  2015-07-17 17:16 [PATCH V3 0/7] Orangefs: kernel client introduction Mike Marshall
  2015-07-17 17:16 ` [PATCH V3 1/7] Orangefs: kernel client part 1 Mike Marshall
  2015-07-17 17:16 ` [PATCH V3 2/7] Orangefs: kernel client part 2 Mike Marshall
@ 2015-07-17 17:16 ` Mike Marshall
  2015-07-17 17:16 ` [PATCH V3 4/7] Orangefs: kernel client part 4 Mike Marshall
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Mike Marshall @ 2015-07-17 17:16 UTC (permalink / raw)
  To: viro; +Cc: Mike Marshall, linux-fsdevel

From: Mike Marshall <hubcap@omnibond.com>

Signed-off-by: Mike Marshall <hubcap@omnibond.com>
---
 fs/orangefs/namei.c         | 473 +++++++++++++++++++++
 fs/orangefs/pvfs2-bufmap.c  | 970 ++++++++++++++++++++++++++++++++++++++++++++
 fs/orangefs/pvfs2-cache.c   | 260 ++++++++++++
 fs/orangefs/pvfs2-debugfs.c | 458 +++++++++++++++++++++
 fs/orangefs/pvfs2-mod.c     | 316 +++++++++++++++
 5 files changed, 2477 insertions(+)
 create mode 100644 fs/orangefs/namei.c
 create mode 100644 fs/orangefs/pvfs2-bufmap.c
 create mode 100644 fs/orangefs/pvfs2-cache.c
 create mode 100644 fs/orangefs/pvfs2-debugfs.c
 create mode 100644 fs/orangefs/pvfs2-mod.c

diff --git a/fs/orangefs/namei.c b/fs/orangefs/namei.c
new file mode 100644
index 0000000..747fe6a
--- /dev/null
+++ b/fs/orangefs/namei.c
@@ -0,0 +1,473 @@
+/*
+ * (C) 2001 Clemson University and The University of Chicago
+ *
+ * See COPYING in top-level directory.
+ */
+
+/*
+ *  Linux VFS namei operations.
+ */
+
+#include "protocol.h"
+#include "pvfs2-kernel.h"
+
+/*
+ * Get a newly allocated inode to go with a negative dentry.
+ */
+static int pvfs2_create(struct inode *dir,
+			struct dentry *dentry,
+			umode_t mode,
+			bool exclusive)
+{
+	struct pvfs2_inode_s *parent = PVFS2_I(dir);
+	struct pvfs2_kernel_op_s *new_op;
+	struct inode *inode;
+	int ret;
+
+	gossip_debug(GOSSIP_NAME_DEBUG, "%s: called\n", __func__);
+
+	new_op = op_alloc(PVFS2_VFS_OP_CREATE);
+	if (!new_op)
+		return -ENOMEM;
+
+	new_op->upcall.req.create.parent_refn = parent->refn;
+
+	fill_default_sys_attrs(new_op->upcall.req.create.attributes,
+			       PVFS_TYPE_METAFILE, mode);
+
+	strncpy(new_op->upcall.req.create.d_name,
+		dentry->d_name.name, PVFS2_NAME_LEN);
+
+	ret = service_operation(new_op, __func__, get_interruptible_flag(dir));
+
+	gossip_debug(GOSSIP_NAME_DEBUG,
+		     "Create Got PVFS2 handle %pU on fsid %d (ret=%d)\n",
+		     &new_op->downcall.resp.create.refn.khandle,
+		     new_op->downcall.resp.create.refn.fs_id, ret);
+
+	if (ret < 0) {
+		gossip_debug(GOSSIP_NAME_DEBUG,
+			     "%s: failed with error code %d\n",
+			     __func__, ret);
+		goto out;
+	}
+
+	inode = pvfs2_new_inode(dir->i_sb, dir, S_IFREG | mode, 0,
+				&new_op->downcall.resp.create.refn);
+	if (IS_ERR(inode)) {
+		gossip_err("*** Failed to allocate pvfs2 file inode\n");
+		ret = PTR_ERR(inode);
+		goto out;
+	}
+
+	gossip_debug(GOSSIP_NAME_DEBUG,
+		     "Assigned file inode new number of %pU\n",
+		     get_khandle_from_ino(inode));
+
+	d_instantiate(dentry, inode);
+	unlock_new_inode(inode);
+
+	gossip_debug(GOSSIP_NAME_DEBUG,
+		     "Inode (Regular File) %pU -> %s\n",
+		     get_khandle_from_ino(inode),
+		     dentry->d_name.name);
+
+	SetMtimeFlag(parent);
+	dir->i_mtime = dir->i_ctime = current_fs_time(dir->i_sb);
+	mark_inode_dirty_sync(dir);
+	ret = 0;
+out:
+	op_release(new_op);
+	gossip_debug(GOSSIP_NAME_DEBUG, "%s: returning %d\n", __func__, ret);
+	return ret;
+}
+
+/*
+ * Attempt to resolve an object name (dentry->d_name), parent handle, and
+ * fsid into a handle for the object.
+ */
+static struct dentry *pvfs2_lookup(struct inode *dir, struct dentry *dentry,
+				   unsigned int flags)
+{
+	struct pvfs2_inode_s *parent = PVFS2_I(dir);
+	struct pvfs2_kernel_op_s *new_op;
+	struct inode *inode;
+	struct dentry *res;
+	int ret = -EINVAL;
+
+	/*
+	 * in theory we could skip a lookup here (if the intent is to
+	 * create) in order to avoid a potentially failed lookup, but
+	 * skipping it would miss a valid lookup and could try to create a file
+	 * that already exists (e.g. the vfs already handles checking for
+	 * -EEXIST on O_EXCL opens, which is broken if we skip this lookup
+	 * in the create path)
+	 */
+	gossip_debug(GOSSIP_NAME_DEBUG, "%s called on %s\n",
+		     __func__, dentry->d_name.name);
+
+	if (dentry->d_name.len > (PVFS2_NAME_LEN - 1))
+		return ERR_PTR(-ENAMETOOLONG);
+
+	new_op = op_alloc(PVFS2_VFS_OP_LOOKUP);
+	if (!new_op)
+		return ERR_PTR(-ENOMEM);
+
+	new_op->upcall.req.lookup.sym_follow = flags & LOOKUP_FOLLOW;
+
+	gossip_debug(GOSSIP_NAME_DEBUG, "%s:%s:%d using parent %pU\n",
+		     __FILE__,
+		     __func__,
+		     __LINE__,
+		     &parent->refn.khandle);
+	new_op->upcall.req.lookup.parent_refn = parent->refn;
+
+	strncpy(new_op->upcall.req.lookup.d_name, dentry->d_name.name,
+		PVFS2_NAME_LEN);
+
+	gossip_debug(GOSSIP_NAME_DEBUG,
+		     "%s: doing lookup on %s under %pU,%d (follow=%s)\n",
+		     __func__,
+		     new_op->upcall.req.lookup.d_name,
+		     &new_op->upcall.req.lookup.parent_refn.khandle,
+		     new_op->upcall.req.lookup.parent_refn.fs_id,
+		     ((new_op->upcall.req.lookup.sym_follow ==
+		       PVFS2_LOOKUP_LINK_FOLLOW) ? "yes" : "no"));
+
+	ret = service_operation(new_op, __func__, get_interruptible_flag(dir));
+
+	gossip_debug(GOSSIP_NAME_DEBUG,
+		     "Lookup Got %pU, fsid %d (ret=%d)\n",
+		     &new_op->downcall.resp.lookup.refn.khandle,
+		     new_op->downcall.resp.lookup.refn.fs_id,
+		     ret);
+
+	if (ret < 0) {
+		if (ret == -ENOENT) {
+			/*
+			 * if no inode was found, add a negative dentry to
+			 * dcache anyway; if we don't, we don't hold expected
+			 * lookup semantics and we most noticeably break
+			 * during directory renames.
+			 *
+			 * however, if the operation failed or exited, do not
+			 * add the dentry (e.g. in the case that a touch is
+			 * issued on a file that already exists that was
+			 * interrupted during this lookup -- no need to add
+			 * another negative dentry for an existing file)
+			 */
+
+			gossip_debug(GOSSIP_NAME_DEBUG,
+				     "pvfs2_lookup: Adding *negative* dentry "
+				     "%p for %s\n",
+				     dentry,
+				     dentry->d_name.name);
+
+			d_add(dentry, NULL);
+			res = NULL;
+			goto out;
+		}
+
+		/* must be a non-recoverable error */
+		res = ERR_PTR(ret);
+		goto out;
+	}
+
+	inode = pvfs2_iget(dir->i_sb, &new_op->downcall.resp.lookup.refn);
+	if (IS_ERR(inode)) {
+		gossip_debug(GOSSIP_NAME_DEBUG,
+			"error %ld from iget\n", PTR_ERR(inode));
+		res = ERR_CAST(inode);
+		goto out;
+	}
+
+	gossip_debug(GOSSIP_NAME_DEBUG,
+		     "%s:%s:%d "
+		     "Found good inode [%lu] with count [%d]\n",
+		     __FILE__,
+		     __func__,
+		     __LINE__,
+		     inode->i_ino,
+		     (int)atomic_read(&inode->i_count));
+
+	/* update dentry/inode pair into dcache */
+	res = d_splice_alias(inode, dentry);
+
+	gossip_debug(GOSSIP_NAME_DEBUG,
+		     "Lookup success (inode ct = %d)\n",
+		     (int)atomic_read(&inode->i_count));
+out:
+	op_release(new_op);
+	return res;
+}
+
+/* return 0 on success; non-zero otherwise */
+static int pvfs2_unlink(struct inode *dir, struct dentry *dentry)
+{
+	struct inode *inode = dentry->d_inode;
+	struct pvfs2_inode_s *parent = PVFS2_I(dir);
+	struct pvfs2_kernel_op_s *new_op;
+	int ret;
+
+	gossip_debug(GOSSIP_NAME_DEBUG,
+		     "%s: called on %s\n"
+		     "  (inode %pU): Parent is %pU | fs_id %d\n",
+		     __func__,
+		     dentry->d_name.name,
+		     get_khandle_from_ino(inode),
+		     &parent->refn.khandle,
+		     parent->refn.fs_id);
+
+	new_op = op_alloc(PVFS2_VFS_OP_REMOVE);
+	if (!new_op)
+		return -ENOMEM;
+
+	new_op->upcall.req.remove.parent_refn = parent->refn;
+	strncpy(new_op->upcall.req.remove.d_name, dentry->d_name.name,
+		PVFS2_NAME_LEN);
+
+	ret = service_operation(new_op, "pvfs2_unlink",
+				get_interruptible_flag(inode));
+
+	/* when request is serviced properly, free req op struct */
+	op_release(new_op);
+
+	if (!ret) {
+		drop_nlink(inode);
+
+		SetMtimeFlag(parent);
+		dir->i_mtime = dir->i_ctime = current_fs_time(dir->i_sb);
+		mark_inode_dirty_sync(dir);
+	}
+	return ret;
+}
+
+/*
+ * pvfs2_link() is only implemented here to make sure that we return a
+ * reasonable error code (the kernel will return a misleading EPERM
+ * otherwise).  PVFS2 does not support hard links.
+ */
+static int pvfs2_link(struct dentry *old_dentry,
+		      struct inode *dir,
+		      struct dentry *dentry)
+{
+	return -EOPNOTSUPP;
+}
+
+/*
+ * pvfs2_mknod() is only implemented here to make sure that we return a
+ * reasonable error code (the kernel will return a misleading EPERM
+ * otherwise).  PVFS2 does not support special files such as fifos or devices.
+ */
+static int pvfs2_mknod(struct inode *dir,
+		       struct dentry *dentry,
+		       umode_t mode,
+		       dev_t rdev)
+{
+	return -EOPNOTSUPP;
+}
+
+static int pvfs2_symlink(struct inode *dir,
+			 struct dentry *dentry,
+			 const char *symname)
+{
+	struct pvfs2_inode_s *parent = PVFS2_I(dir);
+	struct pvfs2_kernel_op_s *new_op;
+	struct inode *inode;
+	int mode = 0755;
+	int ret;
+
+	gossip_debug(GOSSIP_NAME_DEBUG, "%s: called\n", __func__);
+
+	if (!symname)
+		return -EINVAL;
+
+	new_op = op_alloc(PVFS2_VFS_OP_SYMLINK);
+	if (!new_op)
+		return -ENOMEM;
+
+	new_op->upcall.req.sym.parent_refn = parent->refn;
+
+	fill_default_sys_attrs(new_op->upcall.req.sym.attributes,
+			       PVFS_TYPE_SYMLINK,
+			       mode);
+
+	strncpy(new_op->upcall.req.sym.entry_name,
+		dentry->d_name.name,
+		PVFS2_NAME_LEN);
+	strncpy(new_op->upcall.req.sym.target, symname, PVFS2_NAME_LEN);
+
+	ret = service_operation(new_op, __func__, get_interruptible_flag(dir));
+
+	gossip_debug(GOSSIP_NAME_DEBUG,
+		     "Symlink Got PVFS2 handle %pU on fsid %d (ret=%d)\n",
+		     &new_op->downcall.resp.sym.refn.khandle,
+		     new_op->downcall.resp.sym.refn.fs_id, ret);
+
+	if (ret < 0) {
+		gossip_debug(GOSSIP_NAME_DEBUG,
+			    "%s: failed with error code %d\n",
+			    __func__, ret);
+		goto out;
+	}
+
+	inode = pvfs2_new_inode(dir->i_sb, dir, S_IFLNK | mode, 0,
+				&new_op->downcall.resp.sym.refn);
+	if (IS_ERR(inode)) {
+		gossip_err("*** Failed to allocate pvfs2 symlink inode\n");
+		ret = PTR_ERR(inode);
+		goto out;
+	}
+
+	gossip_debug(GOSSIP_NAME_DEBUG,
+		     "Assigned symlink inode new number of %pU\n",
+		     get_khandle_from_ino(inode));
+
+	d_instantiate(dentry, inode);
+	unlock_new_inode(inode);
+
+	gossip_debug(GOSSIP_NAME_DEBUG,
+		     "Inode (Symlink) %pU -> %s\n",
+		     get_khandle_from_ino(inode),
+		     dentry->d_name.name);
+
+	SetMtimeFlag(parent);
+	dir->i_mtime = dir->i_ctime = current_fs_time(dir->i_sb);
+	mark_inode_dirty_sync(dir);
+	ret = 0;
+out:
+	op_release(new_op);
+	return ret;
+}
+
+static int pvfs2_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
+{
+	struct pvfs2_inode_s *parent = PVFS2_I(dir);
+	struct pvfs2_kernel_op_s *new_op;
+	struct inode *inode;
+	int ret;
+
+	new_op = op_alloc(PVFS2_VFS_OP_MKDIR);
+	if (!new_op)
+		return -ENOMEM;
+
+	new_op->upcall.req.mkdir.parent_refn = parent->refn;
+
+	fill_default_sys_attrs(new_op->upcall.req.mkdir.attributes,
+			       PVFS_TYPE_DIRECTORY, mode);
+
+	strncpy(new_op->upcall.req.mkdir.d_name,
+		dentry->d_name.name, PVFS2_NAME_LEN);
+
+	ret = service_operation(new_op, __func__, get_interruptible_flag(dir));
+
+	gossip_debug(GOSSIP_NAME_DEBUG,
+		     "Mkdir Got PVFS2 handle %pU on fsid %d\n",
+		     &new_op->downcall.resp.mkdir.refn.khandle,
+		     new_op->downcall.resp.mkdir.refn.fs_id);
+
+	if (ret < 0) {
+		gossip_debug(GOSSIP_NAME_DEBUG,
+			     "%s: failed with error code %d\n",
+			     __func__, ret);
+		goto out;
+	}
+
+	inode = pvfs2_new_inode(dir->i_sb, dir, S_IFDIR | mode, 0,
+				&new_op->downcall.resp.mkdir.refn);
+	if (IS_ERR(inode)) {
+		gossip_err("*** Failed to allocate pvfs2 dir inode\n");
+		ret = PTR_ERR(inode);
+		goto out;
+	}
+
+	gossip_debug(GOSSIP_NAME_DEBUG,
+		     "Assigned dir inode new number of %pU\n",
+		     get_khandle_from_ino(inode));
+
+	d_instantiate(dentry, inode);
+	unlock_new_inode(inode);
+
+	gossip_debug(GOSSIP_NAME_DEBUG,
+		     "Inode (Directory) %pU -> %s\n",
+		     get_khandle_from_ino(inode),
+		     dentry->d_name.name);
+
+	/*
+	 * NOTE: we have no good way to keep nlink consistent for directories
+	 * across clients; keep constant at 1.
+	 */
+	SetMtimeFlag(parent);
+	dir->i_mtime = dir->i_ctime = current_fs_time(dir->i_sb);
+	mark_inode_dirty_sync(dir);
+out:
+	op_release(new_op);
+	return ret;
+}
+
+static int pvfs2_rename(struct inode *old_dir,
+			struct dentry *old_dentry,
+			struct inode *new_dir,
+			struct dentry *new_dentry)
+{
+	struct pvfs2_kernel_op_s *new_op;
+	int ret;
+
+	gossip_debug(GOSSIP_NAME_DEBUG,
+		     "pvfs2_rename: called (%s/%s => %s/%s) ct=%d\n",
+		     old_dentry->d_parent->d_name.name,
+		     old_dentry->d_name.name,
+		     new_dentry->d_parent->d_name.name,
+		     new_dentry->d_name.name,
+		     d_count(new_dentry));
+
+	new_op = op_alloc(PVFS2_VFS_OP_RENAME);
+	if (!new_op)
+		return -ENOMEM;
+
+	new_op->upcall.req.rename.old_parent_refn = PVFS2_I(old_dir)->refn;
+	new_op->upcall.req.rename.new_parent_refn = PVFS2_I(new_dir)->refn;
+
+	strncpy(new_op->upcall.req.rename.d_old_name,
+		old_dentry->d_name.name,
+		PVFS2_NAME_LEN);
+	strncpy(new_op->upcall.req.rename.d_new_name,
+		new_dentry->d_name.name,
+		PVFS2_NAME_LEN);
+
+	ret = service_operation(new_op,
+				"pvfs2_rename",
+				get_interruptible_flag(old_dentry->d_inode));
+
+	gossip_debug(GOSSIP_NAME_DEBUG,
+		     "pvfs2_rename: got downcall status %d\n",
+		     ret);
+
+	if (new_dentry->d_inode)
+		new_dentry->d_inode->i_ctime = CURRENT_TIME;
+
+	op_release(new_op);
+	return ret;
+}
+
+/* PVFS2 implementation of VFS inode operations for directories */
+struct inode_operations pvfs2_dir_inode_operations = {
+	.lookup = pvfs2_lookup,
+	.get_acl = pvfs2_get_acl,
+	.set_acl = pvfs2_set_acl,
+	.create = pvfs2_create,
+	.link = pvfs2_link,
+	.unlink = pvfs2_unlink,
+	.symlink = pvfs2_symlink,
+	.mkdir = pvfs2_mkdir,
+	.rmdir = pvfs2_unlink,
+	.mknod = pvfs2_mknod,
+	.rename = pvfs2_rename,
+	.setattr = pvfs2_setattr,
+	.getattr = pvfs2_getattr,
+	.setxattr = generic_setxattr,
+	.getxattr = generic_getxattr,
+	.removexattr = generic_removexattr,
+	.listxattr = pvfs2_listxattr,
+};
diff --git a/fs/orangefs/pvfs2-bufmap.c b/fs/orangefs/pvfs2-bufmap.c
new file mode 100644
index 0000000..aa14c37
--- /dev/null
+++ b/fs/orangefs/pvfs2-bufmap.c
@@ -0,0 +1,970 @@
+/*
+ * (C) 2001 Clemson University and The University of Chicago
+ *
+ * See COPYING in top-level directory.
+ */
+#include "protocol.h"
+#include "pvfs2-kernel.h"
+#include "pvfs2-bufmap.h"
+
+DECLARE_WAIT_QUEUE_HEAD(pvfs2_bufmap_init_waitq);
+
+struct pvfs2_bufmap {
+	atomic_t refcnt;
+
+	int desc_size;
+	int desc_shift;
+	int desc_count;
+	int total_size;
+	int page_count;
+
+	struct page **page_array;
+	struct pvfs_bufmap_desc *desc_array;
+
+	/* array to track usage of buffer descriptors */
+	int *buffer_index_array;
+	spinlock_t buffer_index_lock;
+
+	/* array to track usage of buffer descriptors for readdir */
+	int readdir_index_array[PVFS2_READDIR_DEFAULT_DESC_COUNT];
+	spinlock_t readdir_index_lock;
+} *__pvfs2_bufmap;
+
+static DEFINE_SPINLOCK(pvfs2_bufmap_lock);
+
+static void
+pvfs2_bufmap_unmap(struct pvfs2_bufmap *bufmap)
+{
+	int i;
+
+	for (i = 0; i < bufmap->page_count; i++)
+		page_cache_release(bufmap->page_array[i]);
+}
+
+static void
+pvfs2_bufmap_free(struct pvfs2_bufmap *bufmap)
+{
+	kfree(bufmap->page_array);
+	kfree(bufmap->desc_array);
+	kfree(bufmap->buffer_index_array);
+	kfree(bufmap);
+}
+
+struct pvfs2_bufmap *pvfs2_bufmap_ref(void)
+{
+	struct pvfs2_bufmap *bufmap = NULL;
+
+	spin_lock(&pvfs2_bufmap_lock);
+	if (__pvfs2_bufmap) {
+		bufmap = __pvfs2_bufmap;
+		atomic_inc(&bufmap->refcnt);
+	}
+	spin_unlock(&pvfs2_bufmap_lock);
+	return bufmap;
+}
+
+void pvfs2_bufmap_unref(struct pvfs2_bufmap *bufmap)
+{
+	if (atomic_dec_and_lock(&bufmap->refcnt, &pvfs2_bufmap_lock)) {
+		__pvfs2_bufmap = NULL;
+		spin_unlock(&pvfs2_bufmap_lock);
+
+		pvfs2_bufmap_unmap(bufmap);
+		pvfs2_bufmap_free(bufmap);
+	}
+}
+
+inline int pvfs_bufmap_size_query(void)
+{
+	struct pvfs2_bufmap *bufmap = pvfs2_bufmap_ref();
+	int size = bufmap ? bufmap->desc_size : 0;
+
+	if (bufmap)
+		pvfs2_bufmap_unref(bufmap);
+	return size;
+}
+
+inline int pvfs_bufmap_shift_query(void)
+{
+	struct pvfs2_bufmap *bufmap = pvfs2_bufmap_ref();
+	int shift = bufmap ? bufmap->desc_shift : 0;
+
+	if (bufmap)
+		pvfs2_bufmap_unref(bufmap);
+	return shift;
+}
+
+static DECLARE_WAIT_QUEUE_HEAD(bufmap_waitq);
+static DECLARE_WAIT_QUEUE_HEAD(readdir_waitq);
+
+/*
+ * get_bufmap_init
+ *
+ * If bufmap_init is 1, then the shared memory system, including the
+ * buffer_index_array, is available.  Otherwise, it is not.
+ *
+ * returns the value of bufmap_init
+ */
+int get_bufmap_init(void)
+{
+	return __pvfs2_bufmap ? 1 : 0;
+}
+
+
+static struct pvfs2_bufmap *
+pvfs2_bufmap_alloc(struct PVFS_dev_map_desc *user_desc)
+{
+	struct pvfs2_bufmap *bufmap;
+
+	bufmap = kzalloc(sizeof(*bufmap), GFP_KERNEL);
+	if (!bufmap)
+		goto out;
+
+	atomic_set(&bufmap->refcnt, 1);
+	bufmap->total_size = user_desc->total_size;
+	bufmap->desc_count = user_desc->count;
+	bufmap->desc_size = user_desc->size;
+	bufmap->desc_shift = ilog2(bufmap->desc_size);
+
+	spin_lock_init(&bufmap->buffer_index_lock);
+	bufmap->buffer_index_array =
+		kcalloc(bufmap->desc_count, sizeof(int), GFP_KERNEL);
+	if (!bufmap->buffer_index_array) {
+		gossip_err("pvfs2: could not allocate %d buffer indices\n",
+				bufmap->desc_count);
+		goto out_free_bufmap;
+	}
+	spin_lock_init(&bufmap->readdir_index_lock);
+
+	bufmap->desc_array =
+		kcalloc(bufmap->desc_count, sizeof(struct pvfs_bufmap_desc),
+			GFP_KERNEL);
+	if (!bufmap->desc_array) {
+		gossip_err("pvfs2: could not allocate %d descriptors\n",
+				bufmap->desc_count);
+		goto out_free_index_array;
+	}
+
+	bufmap->page_count = bufmap->total_size / PAGE_SIZE;
+
+	/* allocate storage to track our page mappings */
+	bufmap->page_array =
+		kcalloc(bufmap->page_count, sizeof(struct page *), GFP_KERNEL);
+	if (!bufmap->page_array)
+		goto out_free_desc_array;
+
+	return bufmap;
+
+out_free_desc_array:
+	kfree(bufmap->desc_array);
+out_free_index_array:
+	kfree(bufmap->buffer_index_array);
+out_free_bufmap:
+	kfree(bufmap);
+out:
+	return NULL;
+}
+
+static int
+pvfs2_bufmap_map(struct pvfs2_bufmap *bufmap,
+		struct PVFS_dev_map_desc *user_desc)
+{
+	int pages_per_desc = bufmap->desc_size / PAGE_SIZE;
+	int offset = 0, ret, i;
+
+	/* map the pages */
+	down_write(&current->mm->mmap_sem);
+	ret = get_user_pages(current,
+			     current->mm,
+			     (unsigned long)user_desc->ptr,
+			     bufmap->page_count,
+			     1,
+			     0,
+			     bufmap->page_array,
+			     NULL);
+	up_write(&current->mm->mmap_sem);
+
+	if (ret < 0)
+		return ret;
+
+	if (ret != bufmap->page_count) {
+		gossip_err("pvfs2 error: asked for %d pages, only got %d.\n",
+				bufmap->page_count, ret);
+
+		for (i = 0; i < ret; i++) {
+			SetPageError(bufmap->page_array[i]);
+			page_cache_release(bufmap->page_array[i]);
+		}
+		return -ENOMEM;
+	}
+
+	/*
+	 * ideally we want to get kernel space pointers for each page, but
+	 * we can't kmap that many pages at once if highmem is being used.
+	 * so instead, we just kmap/kunmap the page address each time the
+	 * kaddr is needed.
+	 */
+	for (i = 0; i < bufmap->page_count; i++)
+		flush_dcache_page(bufmap->page_array[i]);
+
+	/* build a list of available descriptors */
+	for (offset = 0, i = 0; i < bufmap->desc_count; i++) {
+		bufmap->desc_array[i].page_array = &bufmap->page_array[offset];
+		bufmap->desc_array[i].array_count = pages_per_desc;
+		bufmap->desc_array[i].uaddr =
+		    (user_desc->ptr + (i * pages_per_desc * PAGE_SIZE));
+		offset += pages_per_desc;
+	}
+
+	return 0;
+}
+
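+/*
+ * Layout example (illustrative numbers, not a requirement of the
+ * interface): if the client-core maps 4 MB descriptors backed by
+ * 4 KB pages, pages_per_desc is 1024, so descriptor i covers
+ * page_array[i * 1024] through page_array[i * 1024 + 1023] and
+ * starts at user address user_desc->ptr + i * 4 MB.
+ */
+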
+/*
+ * pvfs_bufmap_initialize()
+ *
+ * initializes the mapped buffer interface
+ *
+ * returns 0 on success, -errno on failure
+ */
+int pvfs_bufmap_initialize(struct PVFS_dev_map_desc *user_desc)
+{
+	struct pvfs2_bufmap *bufmap;
+	int ret = -EINVAL;
+
+	gossip_debug(GOSSIP_BUFMAP_DEBUG,
+		     "pvfs_bufmap_initialize: called (ptr ("
+		     "%p) sz (%d) cnt(%d).\n",
+		     user_desc->ptr,
+		     user_desc->size,
+		     user_desc->count);
+
+	/*
+	 * sanity check alignment and size of buffer that caller wants to
+	 * work with
+	 */
+	if (PAGE_ALIGN((unsigned long)user_desc->ptr) !=
+	    (unsigned long)user_desc->ptr) {
+		gossip_err("pvfs2 error: memory alignment (front). %p\n",
+			   user_desc->ptr);
+		goto out;
+	}
+
+	if (PAGE_ALIGN(((unsigned long)user_desc->ptr + user_desc->total_size))
+	    != (unsigned long)(user_desc->ptr + user_desc->total_size)) {
+		gossip_err("pvfs2 error: memory alignment (back).(%p + %d)\n",
+			   user_desc->ptr,
+			   user_desc->total_size);
+		goto out;
+	}
+
+	if (user_desc->total_size != (user_desc->size * user_desc->count)) {
+		gossip_err("pvfs2 error: user provided an oddly sized buffer: (%d, %d, %d)\n",
+			   user_desc->total_size,
+			   user_desc->size,
+			   user_desc->count);
+		goto out;
+	}
+
+	if ((user_desc->size % PAGE_SIZE) != 0) {
+		gossip_err("pvfs2 error: bufmap size not page size divisible (%d).\n",
+			   user_desc->size);
+		goto out;
+	}
+
+	ret = -ENOMEM;
+	bufmap = pvfs2_bufmap_alloc(user_desc);
+	if (!bufmap)
+		goto out;
+
+	ret = pvfs2_bufmap_map(bufmap, user_desc);
+	if (ret)
+		goto out_free_bufmap;
+
+
+	spin_lock(&pvfs2_bufmap_lock);
+	if (__pvfs2_bufmap) {
+		spin_unlock(&pvfs2_bufmap_lock);
+		gossip_err("pvfs2: error: bufmap already initialized.\n");
+		ret = -EALREADY;
+		goto out_unmap_bufmap;
+	}
+	__pvfs2_bufmap = bufmap;
+	spin_unlock(&pvfs2_bufmap_lock);
+
+	/*
+	 * If there are operations in pvfs2_bufmap_init_waitq, wake them up.
+	 * This scenario occurs when the client-core is restarted and I/O
+	 * requests in the in-progress or waiting tables are restarted.  I/O
+	 * requests cannot be restarted until the shared memory system is
+	 * completely re-initialized, so we put the I/O requests in this
+	 * waitq until initialization has completed.  NOTE:  the I/O requests
+	 * are also on a timer, so they don't wait forever just in case the
+	 * client-core doesn't come back up.
+	 */
+	wake_up_interruptible(&pvfs2_bufmap_init_waitq);
+
+	gossip_debug(GOSSIP_BUFMAP_DEBUG,
+		     "pvfs_bufmap_initialize: exiting normally\n");
+	return 0;
+
+out_unmap_bufmap:
+	pvfs2_bufmap_unmap(bufmap);
+out_free_bufmap:
+	pvfs2_bufmap_free(bufmap);
+out:
+	return ret;
+}
+
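+/*
+ * Illustrative sketch of a waiter on the queue woken above; the real
+ * wait logic lives in the waitqueue code elsewhere in this series and
+ * the timeout variable name here is only a placeholder:
+ *
+ *	ret = wait_event_interruptible_timeout(pvfs2_bufmap_init_waitq,
+ *					       get_bufmap_init() == 1,
+ *					       timeout_in_jiffies);
+ */
+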
+/*
+ * pvfs_bufmap_finalize()
+ *
+ * shuts down the mapped buffer interface and releases any resources
+ * associated with it
+ *
+ * no return value
+ */
+void pvfs_bufmap_finalize(void)
+{
+	gossip_debug(GOSSIP_BUFMAP_DEBUG, "pvfs2_bufmap_finalize: called\n");
+	BUG_ON(!__pvfs2_bufmap);
+	pvfs2_bufmap_unref(__pvfs2_bufmap);
+	gossip_debug(GOSSIP_BUFMAP_DEBUG,
+		     "pvfs2_bufmap_finalize: exiting normally\n");
+}
+
+struct slot_args {
+	int slot_count;
+	int *slot_array;
+	spinlock_t *slot_lock;
+	wait_queue_head_t *slot_wq;
+};
+
+static int wait_for_a_slot(struct slot_args *slargs, int *buffer_index)
+{
+	int ret = -1;
+	int i = 0;
+	DECLARE_WAITQUEUE(my_wait, current);
+
+
+	add_wait_queue_exclusive(slargs->slot_wq, &my_wait);
+
+	while (1) {
+		set_current_state(TASK_INTERRUPTIBLE);
+
+		/*
+		 * check for available desc, slot_lock is the appropriate
+		 * index_lock
+		 */
+		spin_lock(slargs->slot_lock);
+		for (i = 0; i < slargs->slot_count; i++)
+			if (slargs->slot_array[i] == 0) {
+				slargs->slot_array[i] = 1;
+				*buffer_index = i;
+				ret = 0;
+				break;
+			}
+		spin_unlock(slargs->slot_lock);
+
+		/* if we acquired a buffer, then break out of while */
+		if (ret == 0)
+			break;
+
+		if (!signal_pending(current)) {
+			int timeout =
+			    MSECS_TO_JIFFIES(1000 * slot_timeout_secs);
+			gossip_debug(GOSSIP_BUFMAP_DEBUG,
+				     "[BUFMAP]: waiting %d "
+				     "seconds for a slot\n",
+				     slot_timeout_secs);
+			if (!schedule_timeout(timeout)) {
+				gossip_debug(GOSSIP_BUFMAP_DEBUG,
+					     "*** wait_for_a_slot timed out\n");
+				ret = -ETIMEDOUT;
+				break;
+			}
+			gossip_debug(GOSSIP_BUFMAP_DEBUG,
+			  "[BUFMAP]: woken up by a slot becoming available.\n");
+			continue;
+		}
+
+		gossip_debug(GOSSIP_BUFMAP_DEBUG, "pvfs2: %s interrupted.\n",
+			     __func__);
+		ret = -EINTR;
+		break;
+	}
+
+	set_current_state(TASK_RUNNING);
+	remove_wait_queue(slargs->slot_wq, &my_wait);
+	return ret;
+}
+
+static void put_back_slot(struct slot_args *slargs, int buffer_index)
+{
+	/* slot_lock is the appropriate index_lock */
+	spin_lock(slargs->slot_lock);
+	if (buffer_index < 0 || buffer_index >= slargs->slot_count) {
+		spin_unlock(slargs->slot_lock);
+		return;
+	}
+
+	/* put the desc back on the queue */
+	slargs->slot_array[buffer_index] = 0;
+	spin_unlock(slargs->slot_lock);
+
+	/* wake up anyone who may be sleeping on the queue */
+	wake_up_interruptible(slargs->slot_wq);
+}
+
+/*
+ * pvfs_bufmap_get()
+ *
+ * gets a free mapped buffer descriptor, will sleep until one becomes
+ * available if necessary
+ *
+ * returns 0 on success, -errno on failure
+ */
+int pvfs_bufmap_get(struct pvfs2_bufmap **mapp, int *buffer_index)
+{
+	struct pvfs2_bufmap *bufmap = pvfs2_bufmap_ref();
+	struct slot_args slargs;
+	int ret;
+
+	if (!bufmap) {
+		gossip_err("pvfs2: please confirm that pvfs2-client daemon is running.\n");
+		return -EIO;
+	}
+
+	slargs.slot_count = bufmap->desc_count;
+	slargs.slot_array = bufmap->buffer_index_array;
+	slargs.slot_lock = &bufmap->buffer_index_lock;
+	slargs.slot_wq = &bufmap_waitq;
+	ret = wait_for_a_slot(&slargs, buffer_index);
+	if (ret)
+		pvfs2_bufmap_unref(bufmap);
+	*mapp = bufmap;
+	return ret;
+}
+
+/*
+ * pvfs_bufmap_put()
+ *
+ * returns a mapped buffer descriptor to the collection
+ *
+ * no return value
+ */
+void pvfs_bufmap_put(struct pvfs2_bufmap *bufmap, int buffer_index)
+{
+	struct slot_args slargs;
+
+	slargs.slot_count = bufmap->desc_count;
+	slargs.slot_array = bufmap->buffer_index_array;
+	slargs.slot_lock = &bufmap->buffer_index_lock;
+	slargs.slot_wq = &bufmap_waitq;
+	put_back_slot(&slargs, buffer_index);
+	pvfs2_bufmap_unref(bufmap);
+}
+
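+/*
+ * Sketch of the expected caller pairing (the actual I/O paths live in
+ * other parts of this series); a caller might do:
+ *
+ *	struct pvfs2_bufmap *bufmap;
+ *	int buffer_index;
+ *	int ret;
+ *
+ *	ret = pvfs_bufmap_get(&bufmap, &buffer_index);
+ *	if (ret)
+ *		return ret;
+ *	... stage data with pvfs_bufmap_copy_iovec_from_user() ...
+ *	pvfs_bufmap_put(bufmap, buffer_index);
+ */
+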
+/*
+ * readdir_index_get()
+ *
+ * gets a free descriptor, will sleep until one becomes
+ * available if necessary.
+ * Although the readdir buffers are not mapped into kernel space
+ * we could do that at a later point of time. Regardless, these
+ * indices are used by the client-core.
+ *
+ * returns 0 on success, -errno on failure
+ */
+int readdir_index_get(struct pvfs2_bufmap **mapp, int *buffer_index)
+{
+	struct pvfs2_bufmap *bufmap = pvfs2_bufmap_ref();
+	struct slot_args slargs;
+	int ret;
+
+	if (!bufmap) {
+		gossip_err("pvfs2: please confirm that pvfs2-client daemon is running.\n");
+		return -EIO;
+	}
+
+	slargs.slot_count = PVFS2_READDIR_DEFAULT_DESC_COUNT;
+	slargs.slot_array = bufmap->readdir_index_array;
+	slargs.slot_lock = &bufmap->readdir_index_lock;
+	slargs.slot_wq = &readdir_waitq;
+	ret = wait_for_a_slot(&slargs, buffer_index);
+	if (ret)
+		pvfs2_bufmap_unref(bufmap);
+	*mapp = bufmap;
+	return ret;
+}
+
+void readdir_index_put(struct pvfs2_bufmap *bufmap, int buffer_index)
+{
+	struct slot_args slargs;
+
+	slargs.slot_count = PVFS2_READDIR_DEFAULT_DESC_COUNT;
+	slargs.slot_array = bufmap->readdir_index_array;
+	slargs.slot_lock = &bufmap->readdir_index_lock;
+	slargs.slot_wq = &readdir_waitq;
+	put_back_slot(&slargs, buffer_index);
+	pvfs2_bufmap_unref(bufmap);
+}
+
+/*
+ * pvfs_bufmap_copy_iovec_from_user()
+ *
+ * copies data from several user space addresses in an iovec
+ * to a mapped buffer
+ *
+ * Note that the mapped buffer is a series of pages and therefore
+ * the copies have to be split by PAGE_SIZE bytes at a time.
+ * Note that this routine checks that summation of iov_len
+ * across all the elements of iov is equal to size.
+ *
+ * returns 0 on success, -errno on failure
+ */
+int pvfs_bufmap_copy_iovec_from_user(struct pvfs2_bufmap *bufmap,
+				     int buffer_index,
+				     const struct iovec *iov,
+				     unsigned long nr_segs,
+				     size_t size)
+{
+	size_t ret = 0;
+	size_t amt_copied = 0;
+	size_t cur_copy_size = 0;
+	unsigned int to_page_offset = 0;
+	unsigned int to_page_index = 0;
+	void *to_kaddr = NULL;
+	void __user *from_addr = NULL;
+	struct iovec *copied_iovec = NULL;
+	struct pvfs_bufmap_desc *to;
+	unsigned int seg;
+	char *tmp_printer = NULL;
+	int tmp_int = 0;
+
+	gossip_debug(GOSSIP_BUFMAP_DEBUG,
+		     "pvfs_bufmap_copy_iovec_from_user: index %d, "
+		     "size %zd\n",
+		     buffer_index,
+		     size);
+
+	to = &bufmap->desc_array[buffer_index];
+
+	/*
+	 * copy the passed in iovec so that we can change some of its fields
+	 */
+	copied_iovec = kmalloc_array(nr_segs,
+				     sizeof(*copied_iovec),
+				     PVFS2_BUFMAP_GFP_FLAGS);
+	if (copied_iovec == NULL)
+		return -ENOMEM;
+
+	memcpy(copied_iovec, iov, nr_segs * sizeof(*copied_iovec));
+	/*
+	 * Go through each segment in the iovec and make sure that
+	 * the summation of iov_len matches the given size.
+	 */
+	for (seg = 0, amt_copied = 0; seg < nr_segs; seg++)
+		amt_copied += copied_iovec[seg].iov_len;
+	if (amt_copied != size) {
+		gossip_err(
+		    "pvfs2_bufmap_copy_iovec_from_user: computed total ("
+		    "%zd) is not equal to (%zd)\n",
+		    amt_copied,
+		    size);
+		kfree(copied_iovec);
+		return -EINVAL;
+	}
+
+	to_page_index = 0;
+	to_page_offset = 0;
+	amt_copied = 0;
+	seg = 0;
+	/*
+	 * Go through each segment in the iovec and copy its
+	 * buffer into the mapped buffer one page at a time
+	 */
+	while (amt_copied < size) {
+		struct iovec *iv = &copied_iovec[seg];
+		int inc_to_page_index;
+
+		if (iv->iov_len < (PAGE_SIZE - to_page_offset)) {
+			cur_copy_size =
+			    PVFS_util_min(iv->iov_len, size - amt_copied);
+			seg++;
+			from_addr = iv->iov_base;
+			inc_to_page_index = 0;
+		} else if (iv->iov_len == (PAGE_SIZE - to_page_offset)) {
+			cur_copy_size =
+			    PVFS_util_min(iv->iov_len, size - amt_copied);
+			seg++;
+			from_addr = iv->iov_base;
+			inc_to_page_index = 1;
+		} else {
+			cur_copy_size =
+			    PVFS_util_min(PAGE_SIZE - to_page_offset,
+					  size - amt_copied);
+			from_addr = iv->iov_base;
+			iv->iov_base += cur_copy_size;
+			iv->iov_len -= cur_copy_size;
+			inc_to_page_index = 1;
+		}
+		to_kaddr = pvfs2_kmap(to->page_array[to_page_index]);
+		ret =
+		    copy_from_user(to_kaddr + to_page_offset,
+				   from_addr,
+				   cur_copy_size);
+		if (!PageReserved(to->page_array[to_page_index]))
+			SetPageDirty(to->page_array[to_page_index]);
+
+		if (!tmp_printer) {
+			tmp_printer = (char *)(to_kaddr + to_page_offset);
+			tmp_int += tmp_printer[0];
+			gossip_debug(GOSSIP_BUFMAP_DEBUG,
+				     "First character (integer value) in pvfs_bufmap_copy_from_user: %d\n",
+				     tmp_int);
+		}
+
+		pvfs2_kunmap(to->page_array[to_page_index]);
+		if (ret) {
+			gossip_err("Failed to copy data from user space\n");
+			kfree(copied_iovec);
+			return -EFAULT;
+		}
+
+		amt_copied += cur_copy_size;
+		if (inc_to_page_index) {
+			to_page_offset = 0;
+			to_page_index++;
+		} else {
+			to_page_offset += cur_copy_size;
+		}
+	}
+	kfree(copied_iovec);
+	return 0;
+}
+
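+/*
+ * Worked example (assuming a 4096 byte PAGE_SIZE): copying an iovec of
+ * two segments of 6000 and 2192 bytes (8192 total) lands as 4096 bytes
+ * of segment 0 in page 0, the remaining 1904 bytes of segment 0 at the
+ * start of page 1, and all 2192 bytes of segment 1 in page 1 at offset
+ * 1904.
+ */
+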
+/*
+ * pvfs_bufmap_copy_iovec_from_kernel()
+ *
+ * copies data from several kernel space addresses in an iovec
+ * to a mapped buffer
+ *
+ * Note that the mapped buffer is a series of pages and therefore
+ * the copies have to be split by PAGE_SIZE bytes at a time.
+ * Note that this routine checks that summation of iov_len
+ * across all the elements of iov is equal to size.
+ *
+ * returns 0 on success, -errno on failure
+ */
+int pvfs_bufmap_copy_iovec_from_kernel(struct pvfs2_bufmap *bufmap,
+		int buffer_index, const struct iovec *iov,
+		unsigned long nr_segs, size_t size)
+{
+	size_t amt_copied = 0;
+	size_t cur_copy_size = 0;
+	int to_page_index = 0;
+	void *to_kaddr = NULL;
+	void *from_kaddr = NULL;
+	struct iovec *copied_iovec = NULL;
+	struct pvfs_bufmap_desc *to;
+	unsigned int seg;
+	unsigned to_page_offset = 0;
+
+	gossip_debug(GOSSIP_BUFMAP_DEBUG,
+		     "pvfs_bufmap_copy_iovec_from_kernel: index %d, "
+		     "size %zd\n",
+		     buffer_index,
+		     size);
+
+	to = &bufmap->desc_array[buffer_index];
+	/*
+	 * copy the passed in iovec so that we can change some of its fields
+	 */
+	copied_iovec = kmalloc_array(nr_segs,
+				     sizeof(*copied_iovec),
+				     PVFS2_BUFMAP_GFP_FLAGS);
+	if (copied_iovec == NULL)
+		return -ENOMEM;
+
+	memcpy(copied_iovec, iov, nr_segs * sizeof(*copied_iovec));
+	/*
+	 * Go through each segment in the iovec and make sure that
+	 * the summation of iov_len matches the given size.
+	 */
+	for (seg = 0, amt_copied = 0; seg < nr_segs; seg++)
+		amt_copied += copied_iovec[seg].iov_len;
+	if (amt_copied != size) {
+		gossip_err("pvfs2_bufmap_copy_iovec_from_kernel: computed total(%zd) is not equal to (%zd)\n",
+			   amt_copied,
+			   size);
+		kfree(copied_iovec);
+		return -EINVAL;
+	}
+
+	to_page_index = 0;
+	amt_copied = 0;
+	seg = 0;
+	to_page_offset = 0;
+	/*
+	 * Go through each segment in the iovec and copy its
+	 * buffer into the mapped buffer one page at a time
+	 */
+	while (amt_copied < size) {
+		struct iovec *iv = &copied_iovec[seg];
+		int inc_to_page_index;
+
+		if (iv->iov_len < (PAGE_SIZE - to_page_offset)) {
+			cur_copy_size =
+			    PVFS_util_min(iv->iov_len, size - amt_copied);
+			seg++;
+			from_kaddr = iv->iov_base;
+			inc_to_page_index = 0;
+		} else if (iv->iov_len == (PAGE_SIZE - to_page_offset)) {
+			cur_copy_size =
+			    PVFS_util_min(iv->iov_len, size - amt_copied);
+			seg++;
+			from_kaddr = iv->iov_base;
+			inc_to_page_index = 1;
+		} else {
+			cur_copy_size =
+			    PVFS_util_min(PAGE_SIZE - to_page_offset,
+					  size - amt_copied);
+			from_kaddr = iv->iov_base;
+			iv->iov_base += cur_copy_size;
+			iv->iov_len -= cur_copy_size;
+			inc_to_page_index = 1;
+		}
+		to_kaddr = pvfs2_kmap(to->page_array[to_page_index]);
+		memcpy(to_kaddr + to_page_offset, from_kaddr, cur_copy_size);
+		if (!PageReserved(to->page_array[to_page_index]))
+			SetPageDirty(to->page_array[to_page_index]);
+		pvfs2_kunmap(to->page_array[to_page_index]);
+		amt_copied += cur_copy_size;
+		if (inc_to_page_index) {
+			to_page_offset = 0;
+			to_page_index++;
+		} else {
+			to_page_offset += cur_copy_size;
+		}
+	}
+	kfree(copied_iovec);
+	return 0;
+}
+
+/*
+ * pvfs_bufmap_copy_to_user_iovec()
+ *
+ * copies data to several user space addresses in an iovec
+ * from a mapped buffer
+ *
+ * returns 0 on success, -errno on failure
+ */
+int pvfs_bufmap_copy_to_user_iovec(struct pvfs2_bufmap *bufmap,
+		int buffer_index, const struct iovec *iov,
+		unsigned long nr_segs, size_t size)
+{
+	size_t ret = 0;
+	size_t amt_copied = 0;
+	size_t cur_copy_size = 0;
+	int from_page_index = 0;
+	void *from_kaddr = NULL;
+	void __user *to_addr = NULL;
+	struct iovec *copied_iovec = NULL;
+	struct pvfs_bufmap_desc *from;
+	unsigned int seg;
+	unsigned from_page_offset = 0;
+	char *tmp_printer = NULL;
+	int tmp_int = 0;
+
+	gossip_debug(GOSSIP_BUFMAP_DEBUG,
+		     "pvfs_bufmap_copy_to_user_iovec: index %d, size %zd\n",
+		     buffer_index,
+		     size);
+
+	from = &bufmap->desc_array[buffer_index];
+	/*
+	 * copy the passed in iovec so that we can change some of its fields
+	 */
+	copied_iovec = kmalloc_array(nr_segs,
+				     sizeof(*copied_iovec),
+				     PVFS2_BUFMAP_GFP_FLAGS);
+	if (copied_iovec == NULL)
+		return -ENOMEM;
+
+	memcpy(copied_iovec, iov, nr_segs * sizeof(*copied_iovec));
+	/*
+	 * Go through each segment in the iovec and make sure that
+	 * the summation of iov_len is at least the given size.
+	 */
+	for (seg = 0, amt_copied = 0; seg < nr_segs; seg++)
+		amt_copied += copied_iovec[seg].iov_len;
+	if (amt_copied < size) {
+		gossip_err("pvfs2_bufmap_copy_to_user_iovec: computed total (%zd) is less than (%zd)\n",
+			   amt_copied,
+			   size);
+		kfree(copied_iovec);
+		return -EINVAL;
+	}
+
+	from_page_index = 0;
+	amt_copied = 0;
+	seg = 0;
+	from_page_offset = 0;
+	/*
+	 * Go through each segment in the iovec and copy from the mapped buffer,
+	 * but make sure that we do so one page at a time.
+	 */
+	while (amt_copied < size) {
+		struct iovec *iv = &copied_iovec[seg];
+		int inc_from_page_index;
+
+		if (iv->iov_len < (PAGE_SIZE - from_page_offset)) {
+			cur_copy_size =
+			    PVFS_util_min(iv->iov_len, size - amt_copied);
+			seg++;
+			to_addr = iv->iov_base;
+			inc_from_page_index = 0;
+		} else if (iv->iov_len == (PAGE_SIZE - from_page_offset)) {
+			cur_copy_size =
+			    PVFS_util_min(iv->iov_len, size - amt_copied);
+			seg++;
+			to_addr = iv->iov_base;
+			inc_from_page_index = 1;
+		} else {
+			cur_copy_size =
+			    PVFS_util_min(PAGE_SIZE - from_page_offset,
+					  size - amt_copied);
+			to_addr = iv->iov_base;
+			iv->iov_base += cur_copy_size;
+			iv->iov_len -= cur_copy_size;
+			inc_from_page_index = 1;
+		}
+		from_kaddr = pvfs2_kmap(from->page_array[from_page_index]);
+		if (!tmp_printer) {
+			tmp_printer = (char *)(from_kaddr + from_page_offset);
+			tmp_int += tmp_printer[0];
+			gossip_debug(GOSSIP_BUFMAP_DEBUG,
+				     "First character (integer value) in pvfs_bufmap_copy_to_user_iovec: %d\n",
+				     tmp_int);
+		}
+		ret =
+		    copy_to_user(to_addr,
+				 from_kaddr + from_page_offset,
+				 cur_copy_size);
+		pvfs2_kunmap(from->page_array[from_page_index]);
+		if (ret) {
+			gossip_err("Failed to copy data to user space\n");
+			kfree(copied_iovec);
+			return -EFAULT;
+		}
+
+		amt_copied += cur_copy_size;
+		if (inc_from_page_index) {
+			from_page_offset = 0;
+			from_page_index++;
+		} else {
+			from_page_offset += cur_copy_size;
+		}
+	}
+	kfree(copied_iovec);
+	return 0;
+}
+
+/*
+ * pvfs_bufmap_copy_to_kernel_iovec()
+ *
+ * copies data to several kernel space addresses in an iovec
+ * from a mapped buffer
+ *
+ * returns 0 on success, -errno on failure
+ */
+int pvfs_bufmap_copy_to_kernel_iovec(struct pvfs2_bufmap *bufmap,
+		int buffer_index, const struct iovec *iov,
+		unsigned long nr_segs, size_t size)
+{
+	size_t amt_copied = 0;
+	size_t cur_copy_size = 0;
+	int from_page_index = 0;
+	void *from_kaddr = NULL;
+	void *to_kaddr = NULL;
+	struct iovec *copied_iovec = NULL;
+	struct pvfs_bufmap_desc *from;
+	unsigned int seg;
+	unsigned int from_page_offset = 0;
+
+	gossip_debug(GOSSIP_BUFMAP_DEBUG,
+		     "pvfs_bufmap_copy_to_kernel_iovec: index %d, size %zd\n",
+		      buffer_index,
+		      size);
+
+	from = &bufmap->desc_array[buffer_index];
+	/*
+	 * copy the passed in iovec so that we can change some of its fields
+	 */
+	copied_iovec = kmalloc_array(nr_segs,
+				     sizeof(*copied_iovec),
+				     PVFS2_BUFMAP_GFP_FLAGS);
+	if (copied_iovec == NULL)
+		return -ENOMEM;
+
+	memcpy(copied_iovec, iov, nr_segs * sizeof(*copied_iovec));
+	/*
+	 * Go through each segment in the iovec and make sure that
+	 * the summation of iov_len is at least the given size.
+	 */
+	for (seg = 0, amt_copied = 0; seg < nr_segs; seg++)
+		amt_copied += copied_iovec[seg].iov_len;
+
+	if (amt_copied < size) {
+		gossip_err("pvfs2_bufmap_copy_to_kernel_iovec: computed total (%zd) is less than (%zd)\n",
+		     amt_copied,
+		     size);
+		kfree(copied_iovec);
+		return -EINVAL;
+	}
+
+	from_page_index = 0;
+	amt_copied = 0;
+	seg = 0;
+	from_page_offset = 0;
+	/*
+	 * Go through each segment in the iovec and copy from the mapped buffer,
+	 * but make sure that we do so one page at a time.
+	 */
+	while (amt_copied < size) {
+		struct iovec *iv = &copied_iovec[seg];
+		int inc_from_page_index;
+
+		if (iv->iov_len < (PAGE_SIZE - from_page_offset)) {
+			cur_copy_size =
+			    PVFS_util_min(iv->iov_len, size - amt_copied);
+			seg++;
+			to_kaddr = iv->iov_base;
+			inc_from_page_index = 0;
+		} else if (iv->iov_len == (PAGE_SIZE - from_page_offset)) {
+			cur_copy_size =
+			    PVFS_util_min(iv->iov_len, size - amt_copied);
+			seg++;
+			to_kaddr = iv->iov_base;
+			inc_from_page_index = 1;
+		} else {
+			cur_copy_size =
+			    PVFS_util_min(PAGE_SIZE - from_page_offset,
+					  size - amt_copied);
+			to_kaddr = iv->iov_base;
+			iv->iov_base += cur_copy_size;
+			iv->iov_len -= cur_copy_size;
+			inc_from_page_index = 1;
+		}
+		from_kaddr = pvfs2_kmap(from->page_array[from_page_index]);
+		memcpy(to_kaddr, from_kaddr + from_page_offset, cur_copy_size);
+		pvfs2_kunmap(from->page_array[from_page_index]);
+		amt_copied += cur_copy_size;
+		if (inc_from_page_index) {
+			from_page_offset = 0;
+			from_page_index++;
+		} else {
+			from_page_offset += cur_copy_size;
+		}
+	}
+	kfree(copied_iovec);
+	return 0;
+}
diff --git a/fs/orangefs/pvfs2-cache.c b/fs/orangefs/pvfs2-cache.c
new file mode 100644
index 0000000..1525188
--- /dev/null
+++ b/fs/orangefs/pvfs2-cache.c
@@ -0,0 +1,260 @@
+/*
+ * (C) 2001 Clemson University and The University of Chicago
+ *
+ * See COPYING in top-level directory.
+ */
+
+#include "protocol.h"
+#include "pvfs2-kernel.h"
+
+/* tags assigned to kernel upcall operations */
+static __u64 next_tag_value;
+static DEFINE_SPINLOCK(next_tag_value_lock);
+
+/* the pvfs2 memory caches */
+
+/* a cache for pvfs2 upcall/downcall operations */
+static struct kmem_cache *op_cache;
+
+/* a cache for device (/dev/pvfs2-req) communication */
+static struct kmem_cache *dev_req_cache;
+
+/* a cache for pvfs2_kiocb objects (i.e. pvfs2 iocb structures) */
+static struct kmem_cache *pvfs2_kiocb_cache;
+
+int op_cache_initialize(void)
+{
+	op_cache = kmem_cache_create("pvfs2_op_cache",
+				     sizeof(struct pvfs2_kernel_op_s),
+				     0,
+				     PVFS2_CACHE_CREATE_FLAGS,
+				     NULL);
+
+	if (!op_cache) {
+		gossip_err("Cannot create pvfs2_op_cache\n");
+		return -ENOMEM;
+	}
+
+	/* initialize our atomic tag counter */
+	spin_lock(&next_tag_value_lock);
+	next_tag_value = 100;
+	spin_unlock(&next_tag_value_lock);
+	return 0;
+}
+
+int op_cache_finalize(void)
+{
+	kmem_cache_destroy(op_cache);
+	return 0;
+}
+
+char *get_opname_string(struct pvfs2_kernel_op_s *new_op)
+{
+	if (new_op) {
+		__s32 type = new_op->upcall.type;
+
+		if (type == PVFS2_VFS_OP_FILE_IO)
+			return "OP_FILE_IO";
+		else if (type == PVFS2_VFS_OP_LOOKUP)
+			return "OP_LOOKUP";
+		else if (type == PVFS2_VFS_OP_CREATE)
+			return "OP_CREATE";
+		else if (type == PVFS2_VFS_OP_GETATTR)
+			return "OP_GETATTR";
+		else if (type == PVFS2_VFS_OP_REMOVE)
+			return "OP_REMOVE";
+		else if (type == PVFS2_VFS_OP_MKDIR)
+			return "OP_MKDIR";
+		else if (type == PVFS2_VFS_OP_READDIR)
+			return "OP_READDIR";
+		else if (type == PVFS2_VFS_OP_READDIRPLUS)
+			return "OP_READDIRPLUS";
+		else if (type == PVFS2_VFS_OP_SETATTR)
+			return "OP_SETATTR";
+		else if (type == PVFS2_VFS_OP_SYMLINK)
+			return "OP_SYMLINK";
+		else if (type == PVFS2_VFS_OP_RENAME)
+			return "OP_RENAME";
+		else if (type == PVFS2_VFS_OP_STATFS)
+			return "OP_STATFS";
+		else if (type == PVFS2_VFS_OP_TRUNCATE)
+			return "OP_TRUNCATE";
+		else if (type == PVFS2_VFS_OP_MMAP_RA_FLUSH)
+			return "OP_MMAP_RA_FLUSH";
+		else if (type == PVFS2_VFS_OP_FS_MOUNT)
+			return "OP_FS_MOUNT";
+		else if (type == PVFS2_VFS_OP_FS_UMOUNT)
+			return "OP_FS_UMOUNT";
+		else if (type == PVFS2_VFS_OP_GETXATTR)
+			return "OP_GETXATTR";
+		else if (type == PVFS2_VFS_OP_SETXATTR)
+			return "OP_SETXATTR";
+		else if (type == PVFS2_VFS_OP_LISTXATTR)
+			return "OP_LISTXATTR";
+		else if (type == PVFS2_VFS_OP_REMOVEXATTR)
+			return "OP_REMOVEXATTR";
+		else if (type == PVFS2_VFS_OP_PARAM)
+			return "OP_PARAM";
+		else if (type == PVFS2_VFS_OP_PERF_COUNT)
+			return "OP_PERF_COUNT";
+		else if (type == PVFS2_VFS_OP_CANCEL)
+			return "OP_CANCEL";
+		else if (type == PVFS2_VFS_OP_FSYNC)
+			return "OP_FSYNC";
+		else if (type == PVFS2_VFS_OP_FSKEY)
+			return "OP_FSKEY";
+		else if (type == PVFS2_VFS_OP_FILE_IOX)
+			return "OP_FILE_IOX";
+	}
+	return "OP_UNKNOWN?";
+}
+
+static struct pvfs2_kernel_op_s *op_alloc_common(__s32 op_linger, __s32 type)
+{
+	struct pvfs2_kernel_op_s *new_op = NULL;
+
+	new_op = kmem_cache_alloc(op_cache, PVFS2_CACHE_ALLOC_FLAGS);
+	if (new_op) {
+		memset(new_op, 0, sizeof(struct pvfs2_kernel_op_s));
+
+		INIT_LIST_HEAD(&new_op->list);
+		spin_lock_init(&new_op->lock);
+		init_waitqueue_head(&new_op->waitq);
+
+		init_waitqueue_head(&new_op->io_completion_waitq);
+		atomic_set(&new_op->aio_ref_count, 0);
+
+		pvfs2_op_initialize(new_op);
+
+		/* initialize the op specific tag and upcall credentials */
+		spin_lock(&next_tag_value_lock);
+		new_op->tag = next_tag_value++;
+		if (next_tag_value == 0)
+			next_tag_value = 100;
+		spin_unlock(&next_tag_value_lock);
+		new_op->upcall.type = type;
+		new_op->attempts = 0;
+		gossip_debug(GOSSIP_CACHE_DEBUG,
+			     "Alloced OP (%p: %llu %s)\n",
+			     new_op,
+			     llu(new_op->tag),
+			     get_opname_string(new_op));
+
+		new_op->upcall.uid = from_kuid(current_user_ns(),
+					       current_fsuid());
+
+		new_op->upcall.gid = from_kgid(current_user_ns(),
+					       current_fsgid());
+
+		new_op->op_linger = new_op->op_linger_tmp = op_linger;
+	} else {
+		gossip_err("op_alloc: kmem_cache_alloc failed!\n");
+	}
+	return new_op;
+}
+
+struct pvfs2_kernel_op_s *op_alloc(__s32 type)
+{
+	return op_alloc_common(1, type);
+}
+
+struct pvfs2_kernel_op_s *op_alloc_trailer(__s32 type)
+{
+	return op_alloc_common(2, type);
+}
+
+void op_release(struct pvfs2_kernel_op_s *pvfs2_op)
+{
+	if (pvfs2_op) {
+		gossip_debug(GOSSIP_CACHE_DEBUG,
+			     "Releasing OP (%p: %llu)\n",
+			     pvfs2_op,
+			     llu(pvfs2_op->tag));
+		pvfs2_op_initialize(pvfs2_op);
+		kmem_cache_free(op_cache, pvfs2_op);
+	} else {
+		gossip_err("NULL pointer in op_release\n");
+	}
+}
+
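+/*
+ * Sketch of the op lifecycle, mirroring the truncate path earlier in
+ * this series: allocate an op, fill in its upcall, hand it to
+ * service_operation() to be matched with a downcall from the
+ * client-core, then release it:
+ *
+ *	new_op = op_alloc(PVFS2_VFS_OP_TRUNCATE);
+ *	if (!new_op)
+ *		return -ENOMEM;
+ *	new_op->upcall.req.truncate.refn = pvfs2_inode->refn;
+ *	new_op->upcall.req.truncate.size = (__s64) iattr->ia_size;
+ *	ret = service_operation(new_op, __func__,
+ *				get_interruptible_flag(inode));
+ *	op_release(new_op);
+ */
+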
+int dev_req_cache_initialize(void)
+{
+	dev_req_cache = kmem_cache_create("pvfs2_devreqcache",
+					  MAX_ALIGNED_DEV_REQ_DOWNSIZE,
+					  0,
+					  PVFS2_CACHE_CREATE_FLAGS,
+					  NULL);
+
+	if (!dev_req_cache) {
+		gossip_err("Cannot create pvfs2_dev_req_cache\n");
+		return -ENOMEM;
+	}
+	return 0;
+}
+
+int dev_req_cache_finalize(void)
+{
+	kmem_cache_destroy(dev_req_cache);
+	return 0;
+}
+
+void *dev_req_alloc(void)
+{
+	void *buffer;
+
+	buffer = kmem_cache_alloc(dev_req_cache, PVFS2_CACHE_ALLOC_FLAGS);
+	if (buffer == NULL)
+		gossip_err("Failed to allocate from dev_req_cache\n");
+	else
+		memset(buffer, 0, MAX_ALIGNED_DEV_REQ_DOWNSIZE);
+	return buffer;
+}
+
+void dev_req_release(void *buffer)
+{
+	if (buffer)
+		kmem_cache_free(dev_req_cache, buffer);
+	else
+		gossip_err("NULL pointer passed to dev_req_release\n");
+}
+
+int kiocb_cache_initialize(void)
+{
+	pvfs2_kiocb_cache = kmem_cache_create("pvfs2_kiocbcache",
+					      sizeof(struct pvfs2_kiocb_s),
+					      0,
+					      PVFS2_CACHE_CREATE_FLAGS,
+					      NULL);
+
+	if (!pvfs2_kiocb_cache) {
+		gossip_err("Cannot create pvfs2_kiocb_cache!\n");
+		return -ENOMEM;
+	}
+	return 0;
+}
+
+int kiocb_cache_finalize(void)
+{
+	kmem_cache_destroy(pvfs2_kiocb_cache);
+	return 0;
+}
+
+struct pvfs2_kiocb_s *kiocb_alloc(void)
+{
+	struct pvfs2_kiocb_s *x = NULL;
+
+	x = kmem_cache_alloc(pvfs2_kiocb_cache, PVFS2_CACHE_ALLOC_FLAGS);
+	if (x == NULL)
+		gossip_err("kiocb_alloc: kmem_cache_alloc failed!\n");
+	else
+		memset(x, 0, sizeof(struct pvfs2_kiocb_s));
+	return x;
+}
+
+void kiocb_release(struct pvfs2_kiocb_s *x)
+{
+	if (x)
+		kmem_cache_free(pvfs2_kiocb_cache, x);
+	else
+		gossip_err("kiocb_release: kmem_cache_free NULL pointer!\n");
+}
diff --git a/fs/orangefs/pvfs2-debugfs.c b/fs/orangefs/pvfs2-debugfs.c
new file mode 100644
index 0000000..8d118da
--- /dev/null
+++ b/fs/orangefs/pvfs2-debugfs.c
@@ -0,0 +1,458 @@
+/*
+ * What:		/sys/kernel/debug/orangefs/debug-help
+ * Date:		June 2015
+ * Contact:		Mike Marshall <hubcap@omnibond.com>
+ * Description:
+ * 			List of client and kernel debug keywords.
+ *
+ *
+ * What:		/sys/kernel/debug/orangefs/client-debug
+ * Date:		June 2015
+ * Contact:		Mike Marshall <hubcap@omnibond.com>
+ * Description:
+ * 			Debug setting for "the client", the userspace
+ * 			helper for the kernel module.
+ *
+ *
+ * What:		/sys/kernel/debug/orangefs/kernel-debug
+ * Date:		June 2015
+ * Contact:		Mike Marshall <hubcap@omnibond.com>
+ * Description:
+ * 			Debug setting for the orangefs kernel module.
+ *
+ * 			Any of the keywords, or comma-separated lists
+ * 			of keywords, from debug-help can be catted to
+ * 			client-debug or kernel-debug.
+ *
+ * 			"none", "all" and "verbose" are special keywords
+ * 			for client-debug. Setting client-debug to "all"
+ * 			is kind of like trying to drink water from a
+ * 			fire hose, "verbose" triggers most of the same
+ * 			output except for the constant flow of output
+ * 			from the main wait loop.
+ *
+ * 			"none" and "all" are similar settings for kernel-debug
+ * 			no need for a "verbose".
+ */
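+
+/*
+ * Example usage from a shell, assuming debugfs is mounted at
+ * /sys/kernel/debug (the keyword names below are placeholders; the
+ * real list comes from debug-help):
+ *
+ *	cat /sys/kernel/debug/orangefs/debug-help
+ *	echo "inode,file" > /sys/kernel/debug/orangefs/kernel-debug
+ *	cat /sys/kernel/debug/orangefs/kernel-debug
+ */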
+#include <linux/debugfs.h>
+#include <linux/slab.h>
+
+#include <linux/uaccess.h>
+
+#include "pvfs2-debugfs.h"
+#include "protocol.h"
+#include "pvfs2-kernel.h"
+
+static int orangefs_debug_disabled = 1;
+
+static int orangefs_debug_help_open(struct inode *, struct file *);
+
+const struct file_operations debug_help_fops = {
+	.open           = orangefs_debug_help_open,
+	.read           = seq_read,
+	.release        = seq_release,
+	.llseek         = seq_lseek,
+};
+
+static void *help_start(struct seq_file *, loff_t *);
+static void *help_next(struct seq_file *, void *, loff_t *);
+static void help_stop(struct seq_file *, void *);
+static int help_show(struct seq_file *, void *);
+
+static const struct seq_operations help_debug_ops = {
+	.start	= help_start,
+	.next	= help_next,
+	.stop	= help_stop,
+	.show	= help_show,
+};
+
+/*
+ * Used to protect data in ORANGEFS_KMOD_DEBUG_FILE and
+ * ORANGEFS_KMOD_DEBUG_FILE.
+ */
+DEFINE_MUTEX(orangefs_debug_lock);
+
+int orangefs_debug_open(struct inode *, struct file *);
+
+static ssize_t orangefs_debug_read(struct file *,
+				 char __user *,
+				 size_t,
+				 loff_t *);
+
+static ssize_t orangefs_debug_write(struct file *,
+				  const char __user *,
+				  size_t,
+				  loff_t *);
+
+static const struct file_operations kernel_debug_fops = {
+	.open           = orangefs_debug_open,
+	.read           = orangefs_debug_read,
+	.write		= orangefs_debug_write,
+	.llseek         = generic_file_llseek,
+};
+
+/*
+ * initialize kmod debug operations, create orangefs debugfs dir and
+ * ORANGEFS_KMOD_DEBUG_HELP_FILE.
+ */
+int pvfs2_debugfs_init(void)
+{
+
+	int rc = -ENOMEM;
+
+	debug_dir = debugfs_create_dir("orangefs", NULL);
+	if (!debug_dir)
+		goto out;
+
+	help_file_dentry = debugfs_create_file(ORANGEFS_KMOD_DEBUG_HELP_FILE,
+				  0444,
+				  debug_dir,
+				  debug_help_string,
+				  &debug_help_fops);
+	if (!help_file_dentry)
+		goto out;
+
+	orangefs_debug_disabled = 0;
+	rc = 0;
+
+out:
+	if (rc)
+		pvfs2_debugfs_cleanup();
+
+	return rc;
+}
+
+void pvfs2_debugfs_cleanup(void)
+{
+	debugfs_remove_recursive(debug_dir);
+}
+
+/* open ORANGEFS_KMOD_DEBUG_HELP_FILE */
+static int orangefs_debug_help_open(struct inode *inode, struct file *file)
+{
+	int rc = -ENODEV;
+	int ret;
+
+	gossip_debug(GOSSIP_DEBUGFS_DEBUG,
+		     "orangefs_debug_help_open: start\n");
+
+	if (orangefs_debug_disabled)
+		goto out;
+
+	ret = seq_open(file, &help_debug_ops);
+	if (ret)
+		goto out;
+
+	((struct seq_file *)(file->private_data))->private = inode->i_private;
+
+	rc = 0;
+
+out:
+	gossip_debug(GOSSIP_DEBUGFS_DEBUG,
+		     "orangefs_debug_help_open: rc:%d:\n",
+		     rc);
+	return rc;
+}
+
+/*
+ * I think start always gets called again after stop. Start
+ * needs to return NULL when it is done. The whole "payload"
+ * in this case is a single (long) string, so by the second
+ * time we get to start (pos = 1), we're done.
+ */
+static void *help_start(struct seq_file *m, loff_t *pos)
+{
+	void *payload = NULL;
+
+	gossip_debug(GOSSIP_DEBUGFS_DEBUG, "help_start: start\n");
+
+	if (*pos == 0)
+		payload = m->private;
+
+	return payload;
+}
+
+static void *help_next(struct seq_file *m, void *v, loff_t *pos)
+{
+	gossip_debug(GOSSIP_DEBUGFS_DEBUG, "help_next: start\n");
+
+	return NULL;
+}
+
+static void help_stop(struct seq_file *m, void *p)
+{
+	gossip_debug(GOSSIP_DEBUGFS_DEBUG, "help_stop: start\n");
+}
+
+static int help_show(struct seq_file *m, void *v)
+{
+	gossip_debug(GOSSIP_DEBUGFS_DEBUG, "help_show: start\n");
+
+	seq_puts(m, v);
+
+	return 0;
+}
+
+/*
+ * initialize the kernel-debug file.
+ */
+int pvfs2_kernel_debug_init(void)
+{
+
+	int rc = -ENOMEM;
+	struct dentry *ret;
+	char *k_buffer = NULL;
+
+	gossip_debug(GOSSIP_DEBUGFS_DEBUG, "%s: start\n", __func__);
+
+	k_buffer = kzalloc(PVFS2_MAX_DEBUG_STRING_LEN, GFP_KERNEL);
+	if (!k_buffer)
+		goto out;
+
+	if (strlen(kernel_debug_string) + 1 < PVFS2_MAX_DEBUG_STRING_LEN) {
+		strcpy(k_buffer, kernel_debug_string);
+		strcat(k_buffer, "\n");
+	} else {
+		strcpy(k_buffer, "none\n");
+		pr_info("%s: overflow 1!\n", __func__);
+	}
+
+	ret = debugfs_create_file(ORANGEFS_KMOD_DEBUG_FILE,
+				  0444,
+				  debug_dir,
+				  k_buffer,
+				  &kernel_debug_fops);
+	if (!ret) {
+		pr_info("%s: failed to create %s.\n",
+			__func__,
+			ORANGEFS_KMOD_DEBUG_FILE);
+		goto out;
+	}
+
+	rc = 0;
+
+out:
+	if (rc)
+		pvfs2_debugfs_cleanup();
+
+	gossip_debug(GOSSIP_DEBUGFS_DEBUG, "%s: rc:%d:\n", __func__, rc);
+	return rc;
+}
+
+/*
+ * initialize the client-debug file.
+ */
+int pvfs2_client_debug_init(void)
+{
+
+	int rc = -ENOMEM;
+	char *c_buffer = NULL;
+
+	gossip_debug(GOSSIP_DEBUGFS_DEBUG, "%s: start\n", __func__);
+
+	c_buffer = kzalloc(PVFS2_MAX_DEBUG_STRING_LEN, GFP_KERNEL);
+	if (!c_buffer)
+		goto out;
+
+	if (strlen(client_debug_string) + 1 < PVFS2_MAX_DEBUG_STRING_LEN) {
+		strcpy(c_buffer, client_debug_string);
+		strcat(c_buffer, "\n");
+	} else {
+		strcpy(c_buffer, "none\n");
+		pr_info("%s: overflow! 2\n", __func__);
+	}
+
+	client_debug_dentry = debugfs_create_file(ORANGEFS_CLIENT_DEBUG_FILE,
+						  0444,
+						  debug_dir,
+						  c_buffer,
+						  &kernel_debug_fops);
+	if (!client_debug_dentry) {
+		pr_info("%s: failed to create %s.\n",
+			__func__,
+			ORANGEFS_CLIENT_DEBUG_FILE);
+		goto out;
+	}
+
+	rc = 0;
+
+out:
+	if (rc)
+		pvfs2_debugfs_cleanup();
+
+	gossip_debug(GOSSIP_DEBUGFS_DEBUG, "%s: rc:%d:\n", __func__, rc);
+	return rc;
+}
+
+/* open ORANGEFS_KMOD_DEBUG_FILE or ORANGEFS_CLIENT_DEBUG_FILE.*/
+int orangefs_debug_open(struct inode *inode, struct file *file)
+{
+	int rc = -ENODEV;
+
+	gossip_debug(GOSSIP_DEBUGFS_DEBUG,
+		     "%s: orangefs_debug_disabled: %d\n",
+		     __func__,
+		     orangefs_debug_disabled);
+
+	if (orangefs_debug_disabled)
+		goto out;
+
+	rc = 0;
+	mutex_lock(&orangefs_debug_lock);
+	file->private_data = inode->i_private;
+	mutex_unlock(&orangefs_debug_lock);
+
+out:
+	gossip_debug(GOSSIP_DEBUGFS_DEBUG,
+		     "orangefs_debug_open: rc: %d\n",
+		     rc);
+	return rc;
+}
+
+static ssize_t orangefs_debug_read(struct file *file,
+				 char __user *ubuf,
+				 size_t count,
+				 loff_t *ppos)
+{
+	char *buf;
+	int sprintf_ret;
+	ssize_t read_ret = -ENOMEM;
+
+	gossip_debug(GOSSIP_DEBUGFS_DEBUG, "orangefs_debug_read: start\n");
+
+	buf = kmalloc(PVFS2_MAX_DEBUG_STRING_LEN, GFP_KERNEL);
+	if (!buf)
+		goto out;
+
+	mutex_lock(&orangefs_debug_lock);
+	sprintf_ret = sprintf(buf, "%s", (char *)file->private_data);
+	mutex_unlock(&orangefs_debug_lock);
+
+	read_ret = simple_read_from_buffer(ubuf, count, ppos, buf, sprintf_ret);
+
+	kfree(buf);
+
+out:
+	gossip_debug(GOSSIP_DEBUGFS_DEBUG,
+		     "orangefs_debug_read: ret: %zd\n",
+		     read_ret);
+
+	return read_ret;
+}
+
+static ssize_t orangefs_debug_write(struct file *file,
+				  const char __user *ubuf,
+				  size_t count,
+				  loff_t *ppos)
+{
+	char *buf;
+	int rc = -EFAULT;
+	size_t silly = 0;
+	char *debug_string;
+	struct pvfs2_kernel_op_s *new_op = NULL;
+	struct client_debug_mask c_mask = { NULL, 0, 0 };
+
+	gossip_debug(GOSSIP_DEBUGFS_DEBUG,
+		"orangefs_debug_write: %s\n",
+		file->f_path.dentry->d_name.name);
+
+	/*
+	 * Thwart users who try to jam a ridiculous number
+	 * of bytes into the debug file...
+	 */
+	if (count > PVFS2_MAX_DEBUG_STRING_LEN + 1) {
+		silly = count;
+		count = PVFS2_MAX_DEBUG_STRING_LEN + 1;
+	}
+
+	buf = kmalloc(PVFS2_MAX_DEBUG_STRING_LEN, GFP_KERNEL);
+	if (!buf)
+		goto out;
+	memset(buf, 0, PVFS2_MAX_DEBUG_STRING_LEN);
+
+	if (copy_from_user(buf, ubuf, count - 1)) {
+		gossip_debug(GOSSIP_DEBUGFS_DEBUG,
+			     "%s: copy_from_user failed!\n",
+			     __func__);
+		goto out;
+	}
+
+	/*
+	 * Map the keyword string from userspace into a valid debug mask.
+	 * The mapping process involves mapping the human-inputted string
+	 * into a valid mask, and then rebuilding the string from the
+	 * verified valid mask.
+	 *
+	 * A service operation is required to set a new client-side
+	 * debug mask.
+	 */
+	if (!strcmp(file->f_path.dentry->d_name.name,
+		    ORANGEFS_KMOD_DEBUG_FILE)) {
+		debug_string_to_mask(buf, &gossip_debug_mask, 0);
+		debug_mask_to_string(&gossip_debug_mask, 0);
+		debug_string = kernel_debug_string;
+		gossip_debug(GOSSIP_DEBUGFS_DEBUG,
+			     "New kernel debug string is %s\n",
+			     kernel_debug_string);
+	} else {
+		/* Can't reset client debug mask if client is not running. */
+		if (is_daemon_in_service()) {
+			pr_info("%s: Client not running :%d:\n",
+				__func__,
+				is_daemon_in_service());
+			goto out;
+		}
+
+		debug_string_to_mask(buf, &c_mask, 1);
+		debug_mask_to_string(&c_mask, 1);
+		debug_string = client_debug_string;
+
+		new_op = op_alloc(PVFS2_VFS_OP_PARAM);
+		if (!new_op) {
+			pr_info("%s: op_alloc failed!\n", __func__);
+			goto out;
+		}
+
+		new_op->upcall.req.param.op =
+			PVFS2_PARAM_REQUEST_OP_TWO_MASK_VALUES;
+		new_op->upcall.req.param.type = PVFS2_PARAM_REQUEST_SET;
+		memset(new_op->upcall.req.param.s_value,
+		       0,
+		       PVFS2_MAX_DEBUG_STRING_LEN);
+		sprintf(new_op->upcall.req.param.s_value,
+			"%llx %llx\n",
+			c_mask.mask1,
+			c_mask.mask2);
+
+		/* service_operation returns 0 on success... */
+		rc = service_operation(new_op,
+				       "pvfs2_param",
+					PVFS2_OP_INTERRUPTIBLE);
+
+		if (rc)
+			gossip_debug(GOSSIP_DEBUGFS_DEBUG,
+				     "%s: service_operation failed! rc:%d:\n",
+				     __func__,
+				     rc);
+
+		op_release(new_op);
+	}
+
+	mutex_lock(&orangefs_debug_lock);
+	memset(file->f_inode->i_private, 0, PVFS2_MAX_DEBUG_STRING_LEN);
+	sprintf((char *)file->f_inode->i_private, "%s\n", debug_string);
+	mutex_unlock(&orangefs_debug_lock);
+
+	*ppos += count;
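+
+	/*
+	 * If count was clamped above, report the caller's original
+	 * (silly) count as consumed so the writer doesn't retry with
+	 * the leftover bytes.
+	 */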
+	if (silly)
+		rc = silly;
+	else
+		rc = count;
+
+out:
+	gossip_debug(GOSSIP_DEBUGFS_DEBUG,
+		     "orangefs_debug_write: rc: %d\n",
+		     rc);
+	kfree(buf);
+	return rc;
+}
diff --git a/fs/orangefs/pvfs2-mod.c b/fs/orangefs/pvfs2-mod.c
new file mode 100644
index 0000000..9cbc992
--- /dev/null
+++ b/fs/orangefs/pvfs2-mod.c
@@ -0,0 +1,316 @@
+/*
+ * (C) 2001 Clemson University and The University of Chicago
+ *
+ * Changes by Acxiom Corporation to add proc file handler for pvfs2 client
+ * parameters, Copyright Acxiom Corporation, 2005.
+ *
+ * See COPYING in top-level directory.
+ */
+
+#include "protocol.h"
+#include "pvfs2-kernel.h"
+#include "pvfs2-debugfs.h"
+#include "pvfs2-sysfs.h"
+
+/* PVFS2_VERSION is a ./configure define */
+#ifndef PVFS2_VERSION
+#define PVFS2_VERSION "Unknown"
+#endif
+
+/*
+ * global variables declared here
+ */
+
+/* array of client debug keyword/mask values */
+struct client_debug_mask *cdm_array;
+int cdm_element_count;
+
+char kernel_debug_string[PVFS2_MAX_DEBUG_STRING_LEN] = "none";
+char client_debug_string[PVFS2_MAX_DEBUG_STRING_LEN];
+char client_debug_array_string[PVFS2_MAX_DEBUG_STRING_LEN];
+
+char *debug_help_string;
+int help_string_initialized;
+struct dentry *help_file_dentry;
+struct dentry *client_debug_dentry;
+struct dentry *debug_dir;
+int client_verbose_index;
+int client_all_index;
+struct pvfs2_stats g_pvfs2_stats;
+
+/* the size of the hash tables for ops in progress */
+int hash_table_size = 509;
+
+static ulong module_parm_debug_mask;
+__u64 gossip_debug_mask;
+struct client_debug_mask client_debug_mask = { NULL, 0, 0 };
+unsigned int kernel_mask_set_mod_init; /* implicitly false */
+int op_timeout_secs = PVFS2_DEFAULT_OP_TIMEOUT_SECS;
+int slot_timeout_secs = PVFS2_DEFAULT_SLOT_TIMEOUT_SECS;
+__u32 DEBUG_LINE = 50;
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("PVFS2 Development Team");
+MODULE_DESCRIPTION("The Linux Kernel VFS interface to PVFS2");
+MODULE_PARM_DESC(module_parm_debug_mask, "debugging level (see pvfs2-debug.h for values)");
+MODULE_PARM_DESC(op_timeout_secs, "Operation timeout in seconds");
+MODULE_PARM_DESC(slot_timeout_secs, "Slot timeout in seconds");
+MODULE_PARM_DESC(hash_table_size,
+		 "size of hash table for operations in progress");
+
+static struct file_system_type pvfs2_fs_type = {
+	.name = "pvfs2",
+	.mount = pvfs2_mount,
+	.kill_sb = pvfs2_kill_sb,
+	.owner = THIS_MODULE,
+};
+
+module_param(hash_table_size, int, 0);
+module_param(module_parm_debug_mask, ulong, 0755);
+module_param(op_timeout_secs, int, 0);
+module_param(slot_timeout_secs, int, 0);
+
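+/*
+ * A load-time sketch (the parameter values are illustrative only,
+ * and "pvfs2" assumes the default module name for this code):
+ *
+ *	modprobe pvfs2 module_parm_debug_mask=0x1 slot_timeout_secs=300
+ */
+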
+/* synchronizes the request device file */
+struct mutex devreq_mutex;
+
+/*
+ * Blocks non-priority requests from being queued for servicing.  This
+ * could be used for protecting the request list data structure, but
+ * for now it's only being used to stall the op addition to the request
+ * list.
+ */
+struct mutex request_mutex;
+
+/* hash table for storing operations waiting for matching downcall */
+struct list_head *htable_ops_in_progress;
+DEFINE_SPINLOCK(htable_ops_in_progress_lock);
+
+/* list for queueing upcall operations */
+LIST_HEAD(pvfs2_request_list);
+
+/* used to protect the above pvfs2_request_list */
+DEFINE_SPINLOCK(pvfs2_request_list_lock);
+
+/* used for incoming request notification */
+DECLARE_WAIT_QUEUE_HEAD(pvfs2_request_list_waitq);
+
+static int __init pvfs2_init(void)
+{
+	int ret = -1;
+	__u32 i = 0;
+
+	/* convert input debug mask to a 64-bit unsigned integer */
+	gossip_debug_mask = (unsigned long long) module_parm_debug_mask;
+
+	/*
+	 * set the kernel's gossip debug string; invalid mask values will
+	 * be ignored.
+	 */
+	debug_mask_to_string(&gossip_debug_mask, 0);
+
+	/* remove any invalid values from the mask */
+	debug_string_to_mask(kernel_debug_string, &gossip_debug_mask, 0);
+
+	/*
+	 * if the mask has a non-zero value, then indicate that the mask
+	 * was set when the kernel module was loaded.  The pvfs2 dev ioctl
+	 * command will look at this boolean to determine if the kernel's
+	 * debug mask should be overwritten when the client-core is started.
+	 */
+	if (gossip_debug_mask != 0)
+		kernel_mask_set_mod_init = true;
+
+	/* print information message to the system log */
+	pr_info("pvfs2: pvfs2_init called with debug mask: :%s: :%llx:\n",
+	       kernel_debug_string,
+	       (unsigned long long)gossip_debug_mask);
+
+	ret = bdi_init(&pvfs2_backing_dev_info);
+
+	if (ret)
+		return ret;
+
+	if (op_timeout_secs < 0)
+		op_timeout_secs = 0;
+
+	if (slot_timeout_secs < 0)
+		slot_timeout_secs = 0;
+
+	/* initialize global book keeping data structures */
+	ret = op_cache_initialize();
+	if (ret < 0)
+		goto err;
+
+	ret = dev_req_cache_initialize();
+	if (ret < 0)
+		goto cleanup_op;
+
+	ret = pvfs2_inode_cache_initialize();
+	if (ret < 0)
+		goto cleanup_req;
+
+	ret = kiocb_cache_initialize();
+	if (ret  < 0)
+		goto cleanup_inode;
+
+	/* Initialize the pvfsdev subsystem. */
+	ret = pvfs2_dev_init();
+	if (ret < 0) {
+		gossip_err("pvfs2: could not initialize device subsystem %d!\n",
+			   ret);
+		goto cleanup_kiocb;
+	}
+
+	mutex_init(&devreq_mutex);
+	mutex_init(&request_mutex);
+
+	htable_ops_in_progress =
+	    kcalloc(hash_table_size, sizeof(struct list_head), GFP_KERNEL);
+	if (!htable_ops_in_progress) {
+		gossip_err("Failed to initialize op hashtable");
+		ret = -ENOMEM;
+		goto cleanup_device;
+	}
+
+	/* initialize a doubly linked list at each hash table index */
+	for (i = 0; i < hash_table_size; i++)
+		INIT_LIST_HEAD(&htable_ops_in_progress[i]);
+
+	ret = fsid_key_table_initialize();
+	if (ret < 0)
+		goto cleanup_progress_table;
+
+	/*
+	 * Build the contents of /sys/kernel/debug/orangefs/debug-help
+	 * from the keywords in the kernel keyword/mask array.
+	 *
+	 * The keywords in the client keyword/mask array are
+	 * unknown at boot time.
+	 *
+	 * orangefs_prepare_debugfs_help_string will be used again
+	 * later to rebuild the debug-help file after the client starts
+	 * and passes along the needed info. The argument signifies
+	 * which time orangefs_prepare_debugfs_help_string is being
+	 * called.
+	 */
+	ret = orangefs_prepare_debugfs_help_string(1);
+	if (ret)
+		goto out;
+
+	pvfs2_debugfs_init();
+	pvfs2_kernel_debug_init();
+	orangefs_sysfs_init();
+
+	ret = register_filesystem(&pvfs2_fs_type);
+	if (ret == 0) {
+		pr_info("pvfs2: module version %s loaded\n", PVFS2_VERSION);
+		return 0;
+	}
+
+	pvfs2_debugfs_cleanup();
+	orangefs_sysfs_exit();
+	fsid_key_table_finalize();
+
+cleanup_progress_table:
+	kfree(htable_ops_in_progress);
+
+cleanup_device:
+	pvfs2_dev_cleanup();
+
+cleanup_kiocb:
+	kiocb_cache_finalize();
+
+cleanup_inode:
+	pvfs2_inode_cache_finalize();
+
+cleanup_req:
+	dev_req_cache_finalize();
+
+cleanup_op:
+	op_cache_finalize();
+
+err:
+	bdi_destroy(&pvfs2_backing_dev_info);
+
+out:
+	return ret;
+}
+
+static void __exit pvfs2_exit(void)
+{
+	int i = 0;
+	struct pvfs2_kernel_op_s *cur_op = NULL;
+
+	gossip_debug(GOSSIP_INIT_DEBUG, "pvfs2: pvfs2_exit called\n");
+
+	unregister_filesystem(&pvfs2_fs_type);
+	pvfs2_debugfs_cleanup();
+	orangefs_sysfs_exit();
+	fsid_key_table_finalize();
+	pvfs2_dev_cleanup();
+	/* clear out all pending upcall op requests */
+	spin_lock(&pvfs2_request_list_lock);
+	while (!list_empty(&pvfs2_request_list)) {
+		cur_op = list_entry(pvfs2_request_list.next,
+				    struct pvfs2_kernel_op_s,
+				    list);
+		list_del(&cur_op->list);
+		gossip_debug(GOSSIP_INIT_DEBUG,
+			     "Freeing unhandled upcall request type %d\n",
+			     cur_op->upcall.type);
+		op_release(cur_op);
+	}
+	spin_unlock(&pvfs2_request_list_lock);
+
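+	/* release any ops still waiting for a matching downcall */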
+	for (i = 0; i < hash_table_size; i++)
+		while (!list_empty(&htable_ops_in_progress[i])) {
+			cur_op = list_entry(htable_ops_in_progress[i].next,
+					    struct pvfs2_kernel_op_s,
+					    list);
+			op_release(cur_op);
+		}
+
+	kiocb_cache_finalize();
+	pvfs2_inode_cache_finalize();
+	dev_req_cache_finalize();
+	op_cache_finalize();
+
+	kfree(htable_ops_in_progress);
+
+	bdi_destroy(&pvfs2_backing_dev_info);
+
+	pr_info("pvfs2: module version %s unloaded\n", PVFS2_VERSION);
+}
+
+/*
+ * What we do in this function is to walk the list of operations
+ * that are in progress in the hash table and mark them as purged as well.
+ */
+void purge_inprogress_ops(void)
+{
+	int i;
+
+	for (i = 0; i < hash_table_size; i++) {
+		struct pvfs2_kernel_op_s *op;
+		struct pvfs2_kernel_op_s *next;
+
+		list_for_each_entry_safe(op,
+					 next,
+					 &htable_ops_in_progress[i],
+					 list) {
+			spin_lock(&op->lock);
+			gossip_debug(GOSSIP_INIT_DEBUG,
+				"pvfs2-client-core: purging in-progress op tag "
+				"%llu %s\n",
+				llu(op->tag),
+				get_opname_string(op));
+			set_op_state_purged(op);
+			spin_unlock(&op->lock);
+			wake_up_interruptible(&op->waitq);
+		}
+	}
+}
+
+module_init(pvfs2_init);
+module_exit(pvfs2_exit);
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH V3 4/7] Orangefs: kernel client part 4
  2015-07-17 17:16 [PATCH V3 0/7] Orangefs: kernel client introduction Mike Marshall
                   ` (2 preceding siblings ...)
  2015-07-17 17:16 ` [PATCH V3 3/7] Orangefs: kernel client part 3 Mike Marshall
@ 2015-07-17 17:16 ` Mike Marshall
  2015-07-17 17:16 ` [PATCH V3 5/7] Orangefs: kernel client part 5 Mike Marshall
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Mike Marshall @ 2015-07-17 17:16 UTC (permalink / raw)
  To: viro; +Cc: Mike Marshall, linux-fsdevel

From: Mike Marshall <hubcap@omnibond.com>

Signed-off-by: Mike Marshall <hubcap@omnibond.com>
---
 fs/orangefs/pvfs2-sysfs.c | 1787 +++++++++++++++++++++++++++++++++++++++++++++
 fs/orangefs/pvfs2-utils.c | 1128 ++++++++++++++++++++++++++++
 2 files changed, 2915 insertions(+)
 create mode 100644 fs/orangefs/pvfs2-sysfs.c
 create mode 100644 fs/orangefs/pvfs2-utils.c

diff --git a/fs/orangefs/pvfs2-sysfs.c b/fs/orangefs/pvfs2-sysfs.c
new file mode 100644
index 0000000..6d0e18b
--- /dev/null
+++ b/fs/orangefs/pvfs2-sysfs.c
@@ -0,0 +1,1787 @@
+/*
+ * Documentation/ABI/stable/orangefs-sysfs:
+ *
+ * What:		/sys/fs/orangefs/perf_counter_reset
+ * Date:		June 2015
+ * Contact:		Mike Marshall <hubcap@omnibond.com>
+ * Description:
+ * 			echo a 0 or a 1 into perf_counter_reset to
+ * 			reset all the counters in
+ * 			/sys/fs/orangefs/perf_counters
+ * 			except ones with PINT_PERF_PRESERVE set.
+ *
+ *
+ * What:		/sys/fs/orangefs/perf_counters/...
+ * Date:		Jun 2015
+ * Contact:		Mike Marshall <hubcap@omnibond.com>
+ * Description:
+ * 			Counters and settings for various caches.
+ * 			Read only.
+ *
+ *
+ * What:		/sys/fs/orangefs/perf_time_interval_secs
+ * Date:		Jun 2015
+ * Contact:		Mike Marshall <hubcap@omnibond.com>
+ * Description:
+ *			Length of perf counter intervals in
+ *			seconds.
+ *
+ *
+ * What:		/sys/fs/orangefs/perf_history_size
+ * Date:		Jun 2015
+ * Contact:		Mike Marshall <hubcap@omnibond.com>
+ * Description:
+ * 			The perf_counters cache statistics have N, or
+ * 			perf_history_size, samples. The default is
+ * 			one.
+ *
+ *			Every perf_time_interval_secs the (first)
+ *			samples are reset.
+ *
+ *			If N is greater than one, the "current" set
+ *			of samples is reset, and the samples from the
+ *			other N-1 intervals remain available.
+ *
+ *
+ * What:		/sys/fs/orangefs/op_timeout_secs
+ * Date:		Jun 2015
+ * Contact:		Mike Marshall <hubcap@omnibond.com>
+ * Description:
+ *			Service operation timeout in seconds.
+ *
+ *
+ * What:		/sys/fs/orangefs/slot_timeout_secs
+ * Date:		Jun 2015
+ * Contact:		Mike Marshall <hubcap@omnibond.com>
+ * Description:
+ *			"Slot" timeout in seconds. A "slot"
+ *			is an indexed buffer in the shared
+ *			memory segment used for communication
+ *			between the kernel module and userspace.
+ *			Slots are requested and waited for,
+ *			the wait times out after slot_timeout_secs.
+ *
+ *
+ * What:		/sys/fs/orangefs/acache/...
+ * Date:		Jun 2015
+ * Contact:		Mike Marshall <hubcap@omnibond.com>
+ * Description:
+ * 			Attribute cache configurable settings.
+ *
+ *
+ * What:		/sys/fs/orangefs/ncache/...
+ * Date:		Jun 2015
+ * Contact:		Mike Marshall <hubcap@omnibond.com>
+ * Description:
+ * 			Name cache configurable settings.
+ *
+ *
+ * What:		/sys/fs/orangefs/capcache/...
+ * Date:		Jun 2015
+ * Contact:		Mike Marshall <hubcap@omnibond.com>
+ * Description:
+ * 			Capability cache configurable settings.
+ *
+ *
+ * What:		/sys/fs/orangefs/ccache/...
+ * Date:		Jun 2015
+ * Contact:		Mike Marshall <hubcap@omnibond.com>
+ * Description:
+ * 			Credential cache configurable settings.
+ *
+ */
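+
+/*
+ * A usage sketch (the values are illustrative, and attributes that are
+ * backed by service operations only respond while the client-core is
+ * running):
+ *
+ *	cat /sys/fs/orangefs/op_timeout_secs
+ *	echo 20 > /sys/fs/orangefs/op_timeout_secs
+ *	cat /sys/fs/orangefs/acache/timeout_msecs
+ *	echo 1 > /sys/fs/orangefs/perf_counter_reset
+ */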
+
+#include <linux/fs.h>
+#include <linux/kobject.h>
+#include <linux/string.h>
+#include <linux/sysfs.h>
+#include <linux/module.h>
+#include <linux/init.h>
+
+#include "protocol.h"
+#include "pvfs2-kernel.h"
+#include "pvfs2-sysfs.h"
+
+#define ORANGEFS_KOBJ_ID "orangefs"
+#define ACACHE_KOBJ_ID "acache"
+#define CAPCACHE_KOBJ_ID "capcache"
+#define CCACHE_KOBJ_ID "ccache"
+#define NCACHE_KOBJ_ID "ncache"
+#define PC_KOBJ_ID "pc"
+#define STATS_KOBJ_ID "stats"
+
+struct orangefs_obj {
+	struct kobject kobj;
+	int op_timeout_secs;
+	int perf_counter_reset;
+	int perf_history_size;
+	int perf_time_interval_secs;
+	int slot_timeout_secs;
+};
+
+struct acache_orangefs_obj {
+	struct kobject kobj;
+	int hard_limit;
+	int reclaim_percentage;
+	int soft_limit;
+	int timeout_msecs;
+};
+
+struct capcache_orangefs_obj {
+	struct kobject kobj;
+	int hard_limit;
+	int reclaim_percentage;
+	int soft_limit;
+	int timeout_secs;
+};
+
+struct ccache_orangefs_obj {
+	struct kobject kobj;
+	int hard_limit;
+	int reclaim_percentage;
+	int soft_limit;
+	int timeout_secs;
+};
+
+struct ncache_orangefs_obj {
+	struct kobject kobj;
+	int hard_limit;
+	int reclaim_percentage;
+	int soft_limit;
+	int timeout_msecs;
+};
+
+struct pc_orangefs_obj {
+	struct kobject kobj;
+	char *acache;
+	char *capcache;
+	char *ncache;
+};
+
+struct stats_orangefs_obj {
+	struct kobject kobj;
+	int reads;
+	int writes;
+};
+
+struct orangefs_attribute {
+	struct attribute attr;
+	ssize_t (*show)(struct orangefs_obj *orangefs_obj,
+			struct orangefs_attribute *attr,
+			char *buf);
+	ssize_t (*store)(struct orangefs_obj *orangefs_obj,
+			 struct orangefs_attribute *attr,
+			 const char *buf,
+			 size_t count);
+};
+
+struct acache_orangefs_attribute {
+	struct attribute attr;
+	ssize_t (*show)(struct acache_orangefs_obj *acache_orangefs_obj,
+			struct acache_orangefs_attribute *attr,
+			char *buf);
+	ssize_t (*store)(struct acache_orangefs_obj *acache_orangefs_obj,
+			 struct acache_orangefs_attribute *attr,
+			 const char *buf,
+			 size_t count);
+};
+
+struct capcache_orangefs_attribute {
+	struct attribute attr;
+	ssize_t (*show)(struct capcache_orangefs_obj *capcache_orangefs_obj,
+			struct capcache_orangefs_attribute *attr,
+			char *buf);
+	ssize_t (*store)(struct capcache_orangefs_obj *capcache_orangefs_obj,
+			 struct capcache_orangefs_attribute *attr,
+			 const char *buf,
+			 size_t count);
+};
+
+struct ccache_orangefs_attribute {
+	struct attribute attr;
+	ssize_t (*show)(struct ccache_orangefs_obj *ccache_orangefs_obj,
+			struct ccache_orangefs_attribute *attr,
+			char *buf);
+	ssize_t (*store)(struct ccache_orangefs_obj *ccache_orangefs_obj,
+			 struct ccache_orangefs_attribute *attr,
+			 const char *buf,
+			 size_t count);
+};
+
+struct ncache_orangefs_attribute {
+	struct attribute attr;
+	ssize_t (*show)(struct ncache_orangefs_obj *ncache_orangefs_obj,
+			struct ncache_orangefs_attribute *attr,
+			char *buf);
+	ssize_t (*store)(struct ncache_orangefs_obj *ncache_orangefs_obj,
+			 struct ncache_orangefs_attribute *attr,
+			 const char *buf,
+			 size_t count);
+};
+
+struct pc_orangefs_attribute {
+	struct attribute attr;
+	ssize_t (*show)(struct pc_orangefs_obj *pc_orangefs_obj,
+			struct pc_orangefs_attribute *attr,
+			char *buf);
+	ssize_t (*store)(struct pc_orangefs_obj *pc_orangefs_obj,
+			 struct pc_orangefs_attribute *attr,
+			 const char *buf,
+			 size_t count);
+};
+
+struct stats_orangefs_attribute {
+	struct attribute attr;
+	ssize_t (*show)(struct stats_orangefs_obj *stats_orangefs_obj,
+			struct stats_orangefs_attribute *attr,
+			char *buf);
+	ssize_t (*store)(struct stats_orangefs_obj *stats_orangefs_obj,
+			 struct stats_orangefs_attribute *attr,
+			 const char *buf,
+			 size_t count);
+};
+
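+/*
+ * The *_attr_show() and *_attr_store() helpers below recover the
+ * wrapping object and attribute with container_of() and forward to
+ * the attribute's own show/store method, returning -EIO when that
+ * method isn't implemented.
+ */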
+static ssize_t orangefs_attr_show(struct kobject *kobj,
+				  struct attribute *attr,
+				  char *buf)
+{
+	struct orangefs_attribute *attribute;
+	struct orangefs_obj *orangefs_obj;
+	int rc;
+
+	attribute = container_of(attr, struct orangefs_attribute, attr);
+	orangefs_obj = container_of(kobj, struct orangefs_obj, kobj);
+
+	if (!attribute->show) {
+		rc = -EIO;
+		goto out;
+	}
+
+	rc = attribute->show(orangefs_obj, attribute, buf);
+
+out:
+	return rc;
+}
+
+static ssize_t orangefs_attr_store(struct kobject *kobj,
+				   struct attribute *attr,
+				   const char *buf,
+				   size_t len)
+{
+	struct orangefs_attribute *attribute;
+	struct orangefs_obj *orangefs_obj;
+	int rc;
+
+	gossip_debug(GOSSIP_SYSFS_DEBUG,
+		     "orangefs_attr_store: start\n");
+
+	attribute = container_of(attr, struct orangefs_attribute, attr);
+	orangefs_obj = container_of(kobj, struct orangefs_obj, kobj);
+
+	if (!attribute->store) {
+		rc = -EIO;
+		goto out;
+	}
+
+	rc = attribute->store(orangefs_obj, attribute, buf, len);
+
+out:
+	return rc;
+}
+
+static const struct sysfs_ops orangefs_sysfs_ops = {
+	.show = orangefs_attr_show,
+	.store = orangefs_attr_store,
+};
+
+static ssize_t acache_orangefs_attr_show(struct kobject *kobj,
+					 struct attribute *attr,
+					 char *buf)
+{
+	struct acache_orangefs_attribute *attribute;
+	struct acache_orangefs_obj *acache_orangefs_obj;
+	int rc;
+
+	attribute = container_of(attr, struct acache_orangefs_attribute, attr);
+	acache_orangefs_obj =
+		container_of(kobj, struct acache_orangefs_obj, kobj);
+
+	if (!attribute->show) {
+		rc = -EIO;
+		goto out;
+	}
+
+	rc = attribute->show(acache_orangefs_obj, attribute, buf);
+
+out:
+	return rc;
+}
+
+static ssize_t acache_orangefs_attr_store(struct kobject *kobj,
+					  struct attribute *attr,
+					  const char *buf,
+					  size_t len)
+{
+	struct acache_orangefs_attribute *attribute;
+	struct acache_orangefs_obj *acache_orangefs_obj;
+	int rc;
+
+	gossip_debug(GOSSIP_SYSFS_DEBUG,
+		     "acache_orangefs_attr_store: start\n");
+
+	attribute = container_of(attr, struct acache_orangefs_attribute, attr);
+	acache_orangefs_obj =
+		container_of(kobj, struct acache_orangefs_obj, kobj);
+
+	if (!attribute->store) {
+		rc = -EIO;
+		goto out;
+	}
+
+	rc = attribute->store(acache_orangefs_obj, attribute, buf, len);
+
+out:
+	return rc;
+}
+
+static const struct sysfs_ops acache_orangefs_sysfs_ops = {
+	.show = acache_orangefs_attr_show,
+	.store = acache_orangefs_attr_store,
+};
+
+static ssize_t capcache_orangefs_attr_show(struct kobject *kobj,
+					   struct attribute *attr,
+					   char *buf)
+{
+	struct capcache_orangefs_attribute *attribute;
+	struct capcache_orangefs_obj *capcache_orangefs_obj;
+	int rc;
+
+	attribute =
+		container_of(attr, struct capcache_orangefs_attribute, attr);
+	capcache_orangefs_obj =
+		container_of(kobj, struct capcache_orangefs_obj, kobj);
+
+	if (!attribute->show) {
+		rc = -EIO;
+		goto out;
+	}
+
+	rc = attribute->show(capcache_orangefs_obj, attribute, buf);
+
+out:
+	return rc;
+}
+
+static ssize_t capcache_orangefs_attr_store(struct kobject *kobj,
+					    struct attribute *attr,
+					    const char *buf,
+					    size_t len)
+{
+	struct capcache_orangefs_attribute *attribute;
+	struct capcache_orangefs_obj *capcache_orangefs_obj;
+	int rc;
+
+	gossip_debug(GOSSIP_SYSFS_DEBUG,
+		     "capcache_orangefs_attr_store: start\n");
+
+	attribute =
+		container_of(attr, struct capcache_orangefs_attribute, attr);
+	capcache_orangefs_obj =
+		container_of(kobj, struct capcache_orangefs_obj, kobj);
+
+	if (!attribute->store) {
+		rc = -EIO;
+		goto out;
+	}
+
+	rc = attribute->store(capcache_orangefs_obj, attribute, buf, len);
+
+out:
+	return rc;
+}
+
+static const struct sysfs_ops capcache_orangefs_sysfs_ops = {
+	.show = capcache_orangefs_attr_show,
+	.store = capcache_orangefs_attr_store,
+};
+
+static ssize_t ccache_orangefs_attr_show(struct kobject *kobj,
+					 struct attribute *attr,
+					 char *buf)
+{
+	struct ccache_orangefs_attribute *attribute;
+	struct ccache_orangefs_obj *ccache_orangefs_obj;
+	int rc;
+
+	attribute =
+		container_of(attr, struct ccache_orangefs_attribute, attr);
+	ccache_orangefs_obj =
+		container_of(kobj, struct ccache_orangefs_obj, kobj);
+
+	if (!attribute->show) {
+		rc = -EIO;
+		goto out;
+	}
+
+	rc = attribute->show(ccache_orangefs_obj, attribute, buf);
+
+out:
+	return rc;
+}
+
+static ssize_t ccache_orangefs_attr_store(struct kobject *kobj,
+					  struct attribute *attr,
+					  const char *buf,
+					  size_t len)
+{
+	struct ccache_orangefs_attribute *attribute;
+	struct ccache_orangefs_obj *ccache_orangefs_obj;
+	int rc;
+
+	gossip_debug(GOSSIP_SYSFS_DEBUG,
+		     "ccache_orangefs_attr_store: start\n");
+
+	attribute =
+		container_of(attr, struct ccache_orangefs_attribute, attr);
+	ccache_orangefs_obj =
+		container_of(kobj, struct ccache_orangefs_obj, kobj);
+
+	if (!attribute->store) {
+		rc = -EIO;
+		goto out;
+	}
+
+	rc = attribute->store(ccache_orangefs_obj, attribute, buf, len);
+
+out:
+	return rc;
+}
+
+static const struct sysfs_ops ccache_orangefs_sysfs_ops = {
+	.show = ccache_orangefs_attr_show,
+	.store = ccache_orangefs_attr_store,
+};
+
+static ssize_t ncache_orangefs_attr_show(struct kobject *kobj,
+					 struct attribute *attr,
+					 char *buf)
+{
+	struct ncache_orangefs_attribute *attribute;
+	struct ncache_orangefs_obj *ncache_orangefs_obj;
+	int rc;
+
+	attribute = container_of(attr, struct ncache_orangefs_attribute, attr);
+	ncache_orangefs_obj =
+		container_of(kobj, struct ncache_orangefs_obj, kobj);
+
+	if (!attribute->show) {
+		rc = -EIO;
+		goto out;
+	}
+
+	rc = attribute->show(ncache_orangefs_obj, attribute, buf);
+
+out:
+	return rc;
+}
+
+static ssize_t ncache_orangefs_attr_store(struct kobject *kobj,
+					  struct attribute *attr,
+					  const char *buf,
+					  size_t len)
+{
+	struct ncache_orangefs_attribute *attribute;
+	struct ncache_orangefs_obj *ncache_orangefs_obj;
+	int rc;
+
+	gossip_debug(GOSSIP_SYSFS_DEBUG,
+		     "ncache_orangefs_attr_store: start\n");
+
+	attribute = container_of(attr, struct ncache_orangefs_attribute, attr);
+	ncache_orangefs_obj =
+		container_of(kobj, struct ncache_orangefs_obj, kobj);
+
+	if (!attribute->store) {
+		rc = -EIO;
+		goto out;
+	}
+
+	rc = attribute->store(ncache_orangefs_obj, attribute, buf, len);
+
+out:
+	return rc;
+}
+
+static const struct sysfs_ops ncache_orangefs_sysfs_ops = {
+	.show = ncache_orangefs_attr_show,
+	.store = ncache_orangefs_attr_store,
+};
+
+static ssize_t pc_orangefs_attr_show(struct kobject *kobj,
+				     struct attribute *attr,
+				     char *buf)
+{
+	struct pc_orangefs_attribute *attribute;
+	struct pc_orangefs_obj *pc_orangefs_obj;
+	int rc;
+
+	attribute = container_of(attr, struct pc_orangefs_attribute, attr);
+	pc_orangefs_obj =
+		container_of(kobj, struct pc_orangefs_obj, kobj);
+
+	if (!attribute->show) {
+		rc = -EIO;
+		goto out;
+	}
+
+	rc = attribute->show(pc_orangefs_obj, attribute, buf);
+
+out:
+	return rc;
+}
+
+static const struct sysfs_ops pc_orangefs_sysfs_ops = {
+	.show = pc_orangefs_attr_show,
+};
+
+static ssize_t stats_orangefs_attr_show(struct kobject *kobj,
+					struct attribute *attr,
+					char *buf)
+{
+	struct stats_orangefs_attribute *attribute;
+	struct stats_orangefs_obj *stats_orangefs_obj;
+	int rc;
+
+	attribute = container_of(attr, struct stats_orangefs_attribute, attr);
+	stats_orangefs_obj =
+		container_of(kobj, struct stats_orangefs_obj, kobj);
+
+	if (!attribute->show) {
+		rc = -EIO;
+		goto out;
+	}
+
+	rc = attribute->show(stats_orangefs_obj, attribute, buf);
+
+out:
+	return rc;
+}
+
+static const struct sysfs_ops stats_orangefs_sysfs_ops = {
+	.show = stats_orangefs_attr_show,
+};
+
+static void orangefs_release(struct kobject *kobj)
+{
+	struct orangefs_obj *orangefs_obj;
+
+	orangefs_obj = container_of(kobj, struct orangefs_obj, kobj);
+	kfree(orangefs_obj);
+}
+
+static void acache_orangefs_release(struct kobject *kobj)
+{
+	struct acache_orangefs_obj *acache_orangefs_obj;
+
+	acache_orangefs_obj =
+		container_of(kobj, struct acache_orangefs_obj, kobj);
+	kfree(acache_orangefs_obj);
+}
+
+static void capcache_orangefs_release(struct kobject *kobj)
+{
+	struct capcache_orangefs_obj *capcache_orangefs_obj;
+
+	capcache_orangefs_obj =
+		container_of(kobj, struct capcache_orangefs_obj, kobj);
+	kfree(capcache_orangefs_obj);
+}
+
+static void ccache_orangefs_release(struct kobject *kobj)
+{
+	struct ccache_orangefs_obj *ccache_orangefs_obj;
+
+	ccache_orangefs_obj =
+		container_of(kobj, struct ccache_orangefs_obj, kobj);
+	kfree(ccache_orangefs_obj);
+}
+
+static void ncache_orangefs_release(struct kobject *kobj)
+{
+	struct ncache_orangefs_obj *ncache_orangefs_obj;
+
+	ncache_orangefs_obj =
+		container_of(kobj, struct ncache_orangefs_obj, kobj);
+	kfree(ncache_orangefs_obj);
+}
+
+static void pc_orangefs_release(struct kobject *kobj)
+{
+	struct pc_orangefs_obj *pc_orangefs_obj;
+
+	pc_orangefs_obj =
+		container_of(kobj, struct pc_orangefs_obj, kobj);
+	kfree(pc_orangefs_obj);
+}
+
+static void stats_orangefs_release(struct kobject *kobj)
+{
+	struct stats_orangefs_obj *stats_orangefs_obj;
+
+	stats_orangefs_obj =
+		container_of(kobj, struct stats_orangefs_obj, kobj);
+	kfree(stats_orangefs_obj);
+}
+
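+/*
+ * Show integer attributes that live in the kernel module itself
+ * (op_timeout_secs, slot_timeout_secs and the read/write stats);
+ * no service operation is needed for these.
+ */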
+static ssize_t sysfs_int_show(char *kobj_id, char *buf, void *attr)
+{
+	int rc = -EIO;
+	struct orangefs_attribute *orangefs_attr;
+	struct stats_orangefs_attribute *stats_orangefs_attr;
+
+	gossip_debug(GOSSIP_SYSFS_DEBUG, "sysfs_int_show: id:%s:\n", kobj_id);
+
+	if (!strcmp(kobj_id, ORANGEFS_KOBJ_ID)) {
+		orangefs_attr = (struct orangefs_attribute *)attr;
+
+		if (!strcmp(orangefs_attr->attr.name, "op_timeout_secs")) {
+			rc = scnprintf(buf,
+				       PAGE_SIZE,
+				       "%d\n",
+				       op_timeout_secs);
+			goto out;
+		} else if (!strcmp(orangefs_attr->attr.name,
+				   "slot_timeout_secs")) {
+			rc = scnprintf(buf,
+				       PAGE_SIZE,
+				       "%d\n",
+				       slot_timeout_secs);
+			goto out;
+		} else {
+			goto out;
+		}
+
+	} else if (!strcmp(kobj_id, STATS_KOBJ_ID)) {
+		stats_orangefs_attr = (struct stats_orangefs_attribute *)attr;
+
+		if (!strcmp(stats_orangefs_attr->attr.name, "reads")) {
+			rc = scnprintf(buf,
+				       PAGE_SIZE,
+				       "%lu\n",
+				       g_pvfs2_stats.reads);
+			goto out;
+		} else if (!strcmp(stats_orangefs_attr->attr.name, "writes")) {
+			rc = scnprintf(buf,
+				       PAGE_SIZE,
+				       "%lu\n",
+				       g_pvfs2_stats.writes);
+			goto out;
+		} else {
+			goto out;
+		}
+	}
+
+out:
+
+	return rc;
+}
+
+static ssize_t int_orangefs_show(struct orangefs_obj *orangefs_obj,
+				 struct orangefs_attribute *attr,
+				 char *buf)
+{
+	int rc;
+
+	gossip_debug(GOSSIP_SYSFS_DEBUG,
+		     "int_orangefs_show:start attr->attr.name:%s:\n",
+		     attr->attr.name);
+
+	rc = sysfs_int_show(ORANGEFS_KOBJ_ID, buf, (void *) attr);
+
+	return rc;
+}
+
+static ssize_t int_stats_show(struct stats_orangefs_obj *stats_orangefs_obj,
+			struct stats_orangefs_attribute *attr,
+			char *buf)
+{
+	int rc;
+
+	gossip_debug(GOSSIP_SYSFS_DEBUG,
+		     "int_stats_show:start attr->attr.name:%s:\n",
+		     attr->attr.name);
+
+	rc = sysfs_int_show(STATS_KOBJ_ID, buf, (void *) attr);
+
+	return rc;
+}
+
+static ssize_t int_store(struct orangefs_obj *orangefs_obj,
+			 struct orangefs_attribute *attr,
+			 const char *buf,
+			 size_t count)
+{
+	int rc = 0;
+
+	gossip_debug(GOSSIP_SYSFS_DEBUG,
+		     "int_store: start attr->attr.name:%s: buf:%s:\n",
+		     attr->attr.name, buf);
+
+	if (!strcmp(attr->attr.name, "op_timeout_secs")) {
+		rc = kstrtoint(buf, 0, &op_timeout_secs);
+		goto out;
+	} else if (!strcmp(attr->attr.name, "slot_timeout_secs")) {
+		rc = kstrtoint(buf, 0, &slot_timeout_secs);
+		goto out;
+	} else {
+		goto out;
+	}
+
+out:
+	if (rc)
+		rc = -EINVAL;
+	else
+		rc = count;
+
+	return rc;
+}
+
+/*
+ * obtain attribute values from userspace with a service operation.
+ */
+int sysfs_service_op_show(char *kobj_id, char *buf, void *attr)
+{
+	struct pvfs2_kernel_op_s *new_op = NULL;
+	int rc = 0;
+	char *ser_op_type = NULL;
+	struct orangefs_attribute *orangefs_attr;
+	struct acache_orangefs_attribute *acache_attr;
+	struct capcache_orangefs_attribute *capcache_attr;
+	struct ccache_orangefs_attribute *ccache_attr;
+	struct ncache_orangefs_attribute *ncache_attr;
+	struct pc_orangefs_attribute *pc_attr;
+	__u32 op_alloc_type;
+
+	gossip_debug(GOSSIP_SYSFS_DEBUG,
+		     "sysfs_service_op_show: id:%s:\n",
+		     kobj_id);
+
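+	/*
+	 * Everything except the perf counters ("pc") is a "param" get
+	 * request; the perf counters have their own request type.
+	 */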
+	if (strcmp(kobj_id, PC_KOBJ_ID))
+		op_alloc_type = PVFS2_VFS_OP_PARAM;
+	else
+		op_alloc_type = PVFS2_VFS_OP_PERF_COUNT;
+
+	new_op = op_alloc(op_alloc_type);
+	if (!new_op) {
+		rc = -ENOMEM;
+		goto out;
+	}
+
+	/* Can't do a service_operation if the client is not running... */
+	rc = is_daemon_in_service();
+	if (rc) {
+		pr_info("%s: Client not running :%d:\n",
+			__func__,
+			is_daemon_in_service());
+		goto out;
+	}
+
+	if (strcmp(kobj_id, PC_KOBJ_ID))
+		new_op->upcall.req.param.type = PVFS2_PARAM_REQUEST_GET;
+
+	if (!strcmp(kobj_id, ORANGEFS_KOBJ_ID)) {
+		orangefs_attr = (struct orangefs_attribute *)attr;
+
+		if (!strcmp(orangefs_attr->attr.name, "perf_history_size"))
+			new_op->upcall.req.param.op =
+				PVFS2_PARAM_REQUEST_OP_PERF_HISTORY_SIZE;
+		else if (!strcmp(orangefs_attr->attr.name,
+				 "perf_time_interval_secs"))
+			new_op->upcall.req.param.op =
+				PVFS2_PARAM_REQUEST_OP_PERF_TIME_INTERVAL_SECS;
+		else if (!strcmp(orangefs_attr->attr.name,
+				 "perf_counter_reset"))
+			new_op->upcall.req.param.op =
+				PVFS2_PARAM_REQUEST_OP_PERF_RESET;
+
+	} else if (!strcmp(kobj_id, ACACHE_KOBJ_ID)) {
+		acache_attr = (struct acache_orangefs_attribute *)attr;
+
+		if (!strcmp(acache_attr->attr.name, "timeout_msecs"))
+			new_op->upcall.req.param.op =
+				PVFS2_PARAM_REQUEST_OP_ACACHE_TIMEOUT_MSECS;
+
+		if (!strcmp(acache_attr->attr.name, "hard_limit"))
+			new_op->upcall.req.param.op =
+				PVFS2_PARAM_REQUEST_OP_ACACHE_HARD_LIMIT;
+
+		if (!strcmp(acache_attr->attr.name, "soft_limit"))
+			new_op->upcall.req.param.op =
+				PVFS2_PARAM_REQUEST_OP_ACACHE_SOFT_LIMIT;
+
+		if (!strcmp(acache_attr->attr.name, "reclaim_percentage"))
+			new_op->upcall.req.param.op =
+			  PVFS2_PARAM_REQUEST_OP_ACACHE_RECLAIM_PERCENTAGE;
+
+	} else if (!strcmp(kobj_id, CAPCACHE_KOBJ_ID)) {
+		capcache_attr = (struct capcache_orangefs_attribute *)attr;
+
+		if (!strcmp(capcache_attr->attr.name, "timeout_secs"))
+			new_op->upcall.req.param.op =
+				PVFS2_PARAM_REQUEST_OP_CAPCACHE_TIMEOUT_SECS;
+
+		if (!strcmp(capcache_attr->attr.name, "hard_limit"))
+			new_op->upcall.req.param.op =
+				PVFS2_PARAM_REQUEST_OP_CAPCACHE_HARD_LIMIT;
+
+		if (!strcmp(capcache_attr->attr.name, "soft_limit"))
+			new_op->upcall.req.param.op =
+				PVFS2_PARAM_REQUEST_OP_CAPCACHE_SOFT_LIMIT;
+
+		if (!strcmp(capcache_attr->attr.name, "reclaim_percentage"))
+			new_op->upcall.req.param.op =
+			  PVFS2_PARAM_REQUEST_OP_CAPCACHE_RECLAIM_PERCENTAGE;
+
+	} else if (!strcmp(kobj_id, CCACHE_KOBJ_ID)) {
+		ccache_attr = (struct ccache_orangefs_attribute *)attr;
+
+		if (!strcmp(ccache_attr->attr.name, "timeout_secs"))
+			new_op->upcall.req.param.op =
+				PVFS2_PARAM_REQUEST_OP_CCACHE_TIMEOUT_SECS;
+
+		if (!strcmp(ccache_attr->attr.name, "hard_limit"))
+			new_op->upcall.req.param.op =
+				PVFS2_PARAM_REQUEST_OP_CCACHE_HARD_LIMIT;
+
+		if (!strcmp(ccache_attr->attr.name, "soft_limit"))
+			new_op->upcall.req.param.op =
+				PVFS2_PARAM_REQUEST_OP_CCACHE_SOFT_LIMIT;
+
+		if (!strcmp(ccache_attr->attr.name, "reclaim_percentage"))
+			new_op->upcall.req.param.op =
+			  PVFS2_PARAM_REQUEST_OP_CCACHE_RECLAIM_PERCENTAGE;
+
+	} else if (!strcmp(kobj_id, NCACHE_KOBJ_ID)) {
+		ncache_attr = (struct ncache_orangefs_attribute *)attr;
+
+		if (!strcmp(ncache_attr->attr.name, "timeout_msecs"))
+			new_op->upcall.req.param.op =
+				PVFS2_PARAM_REQUEST_OP_NCACHE_TIMEOUT_MSECS;
+
+		if (!strcmp(ncache_attr->attr.name, "hard_limit"))
+			new_op->upcall.req.param.op =
+				PVFS2_PARAM_REQUEST_OP_NCACHE_HARD_LIMIT;
+
+		if (!strcmp(ncache_attr->attr.name, "soft_limit"))
+			new_op->upcall.req.param.op =
+				PVFS2_PARAM_REQUEST_OP_NCACHE_SOFT_LIMIT;
+
+		if (!strcmp(ncache_attr->attr.name, "reclaim_percentage"))
+			new_op->upcall.req.param.op =
+			  PVFS2_PARAM_REQUEST_OP_NCACHE_RECLAIM_PERCENTAGE;
+
+	} else if (!strcmp(kobj_id, PC_KOBJ_ID)) {
+		pc_attr = (struct pc_orangefs_attribute *)attr;
+
+		if (!strcmp(pc_attr->attr.name, ACACHE_KOBJ_ID))
+			new_op->upcall.req.perf_count.type =
+				PVFS2_PERF_COUNT_REQUEST_ACACHE;
+
+		if (!strcmp(pc_attr->attr.name, CAPCACHE_KOBJ_ID))
+			new_op->upcall.req.perf_count.type =
+				PVFS2_PERF_COUNT_REQUEST_CAPCACHE;
+
+		if (!strcmp(pc_attr->attr.name, NCACHE_KOBJ_ID))
+			new_op->upcall.req.perf_count.type =
+				PVFS2_PERF_COUNT_REQUEST_NCACHE;
+
+	} else {
+		gossip_err("sysfs_service_op_show: unknown kobj_id:%s:\n",
+			   kobj_id);
+		rc = -EINVAL;
+		goto out;
+	}
+
+	if (strcmp(kobj_id, PC_KOBJ_ID))
+		ser_op_type = "pvfs2_param";
+	else
+		ser_op_type = "pvfs2_perf_count";
+
+	/*
+	 * The service_operation will return an errno return code on
+	 * error, and zero on success.
+	 */
+	rc = service_operation(new_op, ser_op_type, PVFS2_OP_INTERRUPTIBLE);
+
+out:
+	if (!rc) {
+		if (strcmp(kobj_id, PC_KOBJ_ID)) {
+			rc = scnprintf(buf,
+				       PAGE_SIZE,
+				       "%d\n",
+				       (int)new_op->downcall.resp.param.value);
+		} else {
+			rc = scnprintf(
+				buf,
+				PAGE_SIZE,
+				"%s",
+				new_op->downcall.resp.perf_count.buffer);
+		}
+	}
+
+	/*
+	 * if we got ENOMEM, then op_alloc probably failed...
+	 */
+	if (rc != -ENOMEM)
+		op_release(new_op);
+
+	return rc;
+
+}
+
+static ssize_t service_orangefs_show(struct orangefs_obj *orangefs_obj,
+				     struct orangefs_attribute *attr,
+				     char *buf)
+{
+	int rc = 0;
+
+	rc = sysfs_service_op_show(ORANGEFS_KOBJ_ID, buf, (void *)attr);
+
+	return rc;
+}
+
+static ssize_t
+	service_acache_show(struct acache_orangefs_obj *acache_orangefs_obj,
+			    struct acache_orangefs_attribute *attr,
+			    char *buf)
+{
+	int rc = 0;
+
+	rc = sysfs_service_op_show(ACACHE_KOBJ_ID, buf, (void *)attr);
+
+	return rc;
+}
+
+static ssize_t service_capcache_show(struct capcache_orangefs_obj
+					*capcache_orangefs_obj,
+				     struct capcache_orangefs_attribute *attr,
+				     char *buf)
+{
+	int rc = 0;
+
+	rc = sysfs_service_op_show(CAPCACHE_KOBJ_ID, buf, (void *)attr);
+
+	return rc;
+}
+
+static ssize_t service_ccache_show(struct ccache_orangefs_obj
+					*ccache_orangefs_obj,
+				   struct ccache_orangefs_attribute *attr,
+				   char *buf)
+{
+	int rc = 0;
+
+	rc = sysfs_service_op_show(CCACHE_KOBJ_ID, buf, (void *)attr);
+
+	return rc;
+}
+
+static ssize_t
+	service_ncache_show(struct ncache_orangefs_obj *ncache_orangefs_obj,
+			    struct ncache_orangefs_attribute *attr,
+			    char *buf)
+{
+	int rc = 0;
+
+	rc = sysfs_service_op_show(NCACHE_KOBJ_ID, buf, (void *)attr);
+
+	return rc;
+}
+
+static ssize_t
+	service_pc_show(struct pc_orangefs_obj *pc_orangefs_obj,
+			    struct pc_orangefs_attribute *attr,
+			    char *buf)
+{
+	int rc = 0;
+
+	rc = sysfs_service_op_show(PC_KOBJ_ID, buf, (void *)attr);
+
+	return rc;
+}
+
+/*
+ * pass attribute values on to userspace (the client-core) with a
+ * service operation.
+ *
+ * We have to do a memory allocation, a kstrtoint and a service
+ * operation. And we have to evaluate what the user entered, to make
+ * sure the value is within the range supported by the attribute. So,
+ * there's a lot of return code checking and mapping going on here.
+ *
+ * We want to return 1 if we think everything went OK, and
+ * -EINVAL if not.
+ */
+int sysfs_service_op_store(char *kobj_id, const char *buf, void *attr)
+{
+	struct pvfs2_kernel_op_s *new_op = NULL;
+	int val = 0;
+	int rc = 0;
+	struct orangefs_attribute *orangefs_attr;
+	struct acache_orangefs_attribute *acache_attr;
+	struct capcache_orangefs_attribute *capcache_attr;
+	struct ccache_orangefs_attribute *ccache_attr;
+	struct ncache_orangefs_attribute *ncache_attr;
+
+	gossip_debug(GOSSIP_SYSFS_DEBUG,
+		     "sysfs_service_op_store: id:%s:\n",
+		     kobj_id);
+
+	new_op = op_alloc(PVFS2_VFS_OP_PARAM);
+	if (!new_op) {
+		rc = -ENOMEM;
+		goto out;
+	}
+
+	/* Can't do a service_operation if the client is not running... */
+	rc = is_daemon_in_service();
+	if (rc) {
+		pr_info("%s: Client not running :%d:\n",
+			__func__,
+			is_daemon_in_service());
+		goto out;
+	}
+
+	/*
+	 * The value we want to send back to userspace is in buf.
+	 */
+	rc = kstrtoint(buf, 0, &val);
+	if (rc)
+		goto out;
+
+	if (!strcmp(kobj_id, ORANGEFS_KOBJ_ID)) {
+		orangefs_attr = (struct orangefs_attribute *)attr;
+
+		if (!strcmp(orangefs_attr->attr.name, "perf_history_size")) {
+			if (val > 0) {
+				new_op->upcall.req.param.op =
+				  PVFS2_PARAM_REQUEST_OP_PERF_HISTORY_SIZE;
+			} else {
+				rc = 0;
+				goto out;
+			}
+		} else if (!strcmp(orangefs_attr->attr.name,
+				   "perf_time_interval_secs")) {
+			if (val > 0) {
+				new_op->upcall.req.param.op =
+				PVFS2_PARAM_REQUEST_OP_PERF_TIME_INTERVAL_SECS;
+			} else {
+				rc = 0;
+				goto out;
+			}
+		} else if (!strcmp(orangefs_attr->attr.name,
+				   "perf_counter_reset")) {
+			if ((val == 0) || (val == 1)) {
+				new_op->upcall.req.param.op =
+					PVFS2_PARAM_REQUEST_OP_PERF_RESET;
+			} else {
+				rc = 0;
+				goto out;
+			}
+		}
+
+	} else if (!strcmp(kobj_id, ACACHE_KOBJ_ID)) {
+		acache_attr = (struct acache_orangefs_attribute *)attr;
+
+		if (!strcmp(acache_attr->attr.name, "hard_limit")) {
+			if (val > -1) {
+				new_op->upcall.req.param.op =
+				  PVFS2_PARAM_REQUEST_OP_ACACHE_HARD_LIMIT;
+			} else {
+				rc = 0;
+				goto out;
+			}
+		} else if (!strcmp(acache_attr->attr.name, "soft_limit")) {
+			if (val > -1) {
+				new_op->upcall.req.param.op =
+				  PVFS2_PARAM_REQUEST_OP_ACACHE_SOFT_LIMIT;
+			} else {
+				rc = 0;
+				goto out;
+			}
+		} else if (!strcmp(acache_attr->attr.name,
+				   "reclaim_percentage")) {
+			if ((val > -1) && (val < 101)) {
+				new_op->upcall.req.param.op =
+				  PVFS2_PARAM_REQUEST_OP_ACACHE_RECLAIM_PERCENTAGE;
+			} else {
+				rc = 0;
+				goto out;
+			}
+		} else if (!strcmp(acache_attr->attr.name, "timeout_msecs")) {
+			if (val > -1) {
+				new_op->upcall.req.param.op =
+				  PVFS2_PARAM_REQUEST_OP_ACACHE_TIMEOUT_MSECS;
+			} else {
+				rc = 0;
+				goto out;
+			}
+		}
+
+	} else if (!strcmp(kobj_id, CAPCACHE_KOBJ_ID)) {
+		capcache_attr = (struct capcache_orangefs_attribute *)attr;
+
+		if (!strcmp(capcache_attr->attr.name, "hard_limit")) {
+			if (val > -1) {
+				new_op->upcall.req.param.op =
+				  PVFS2_PARAM_REQUEST_OP_CAPCACHE_HARD_LIMIT;
+			} else {
+				rc = 0;
+				goto out;
+			}
+		} else if (!strcmp(capcache_attr->attr.name, "soft_limit")) {
+			if (val > -1) {
+				new_op->upcall.req.param.op =
+				  PVFS2_PARAM_REQUEST_OP_CAPCACHE_SOFT_LIMIT;
+			} else {
+				rc = 0;
+				goto out;
+			}
+		} else if (!strcmp(capcache_attr->attr.name,
+				   "reclaim_percentage")) {
+			if ((val > -1) && (val < 101)) {
+				new_op->upcall.req.param.op =
+				  PVFS2_PARAM_REQUEST_OP_CAPCACHE_RECLAIM_PERCENTAGE;
+			} else {
+				rc = 0;
+				goto out;
+			}
+		} else if (!strcmp(capcache_attr->attr.name, "timeout_secs")) {
+			if (val > -1) {
+				new_op->upcall.req.param.op =
+				  PVFS2_PARAM_REQUEST_OP_CAPCACHE_TIMEOUT_SECS;
+			} else {
+				rc = 0;
+				goto out;
+			}
+		}
+
+	} else if (!strcmp(kobj_id, CCACHE_KOBJ_ID)) {
+		ccache_attr = (struct ccache_orangefs_attribute *)attr;
+
+		if (!strcmp(ccache_attr->attr.name, "hard_limit")) {
+			if (val > -1) {
+				new_op->upcall.req.param.op =
+				  PVFS2_PARAM_REQUEST_OP_CCACHE_HARD_LIMIT;
+			} else {
+				rc = 0;
+				goto out;
+			}
+		} else if (!strcmp(ccache_attr->attr.name, "soft_limit")) {
+			if (val > -1) {
+				new_op->upcall.req.param.op =
+				  PVFS2_PARAM_REQUEST_OP_CCACHE_SOFT_LIMIT;
+			} else {
+				rc = 0;
+				goto out;
+			}
+		} else if (!strcmp(ccache_attr->attr.name,
+				   "reclaim_percentage")) {
+			if ((val > -1) && (val < 101)) {
+				new_op->upcall.req.param.op =
+				  PVFS2_PARAM_REQUEST_OP_CCACHE_RECLAIM_PERCENTAGE;
+			} else {
+				rc = 0;
+				goto out;
+			}
+		} else if (!strcmp(ccache_attr->attr.name, "timeout_secs")) {
+			if (val > -1) {
+				new_op->upcall.req.param.op =
+				  PVFS2_PARAM_REQUEST_OP_CCACHE_TIMEOUT_SECS;
+			} else {
+				rc = 0;
+				goto out;
+			}
+		}
+
+	} else if (!strcmp(kobj_id, NCACHE_KOBJ_ID)) {
+		ncache_attr = (struct ncache_orangefs_attribute *)attr;
+
+		if (!strcmp(ncache_attr->attr.name, "hard_limit")) {
+			if (val > -1) {
+				new_op->upcall.req.param.op =
+				  PVFS2_PARAM_REQUEST_OP_NCACHE_HARD_LIMIT;
+			} else {
+				rc = 0;
+				goto out;
+			}
+		} else if (!strcmp(ncache_attr->attr.name, "soft_limit")) {
+			if (val > -1) {
+				new_op->upcall.req.param.op =
+				  PVFS2_PARAM_REQUEST_OP_NCACHE_SOFT_LIMIT;
+			} else {
+				rc = 0;
+				goto out;
+			}
+		} else if (!strcmp(ncache_attr->attr.name,
+				   "reclaim_percentage")) {
+			if ((val > -1) && (val < 101)) {
+				new_op->upcall.req.param.op =
+					PVFS2_PARAM_REQUEST_OP_NCACHE_RECLAIM_PERCENTAGE;
+			} else {
+				rc = 0;
+				goto out;
+			}
+		} else if (!strcmp(ncache_attr->attr.name, "timeout_msecs")) {
+			if (val > -1) {
+				new_op->upcall.req.param.op =
+				  PVFS2_PARAM_REQUEST_OP_NCACHE_TIMEOUT_MSECS;
+			} else {
+				rc = 0;
+				goto out;
+			}
+		}
+
+	} else {
+		gossip_err("sysfs_service_op_store: unknown kobj_id:%s:\n",
+			   kobj_id);
+		rc = -EINVAL;
+		goto out;
+	}
+
+	new_op->upcall.req.param.type = PVFS2_PARAM_REQUEST_SET;
+
+	new_op->upcall.req.param.value = val;
+
+	/*
+	 * The service_operation will return an errno return code on
+	 * error, and zero on success.
+	 */
+	rc = service_operation(new_op, "pvfs2_param", PVFS2_OP_INTERRUPTIBLE);
+
+	if (rc < 0) {
+		gossip_err("sysfs_service_op_store: service op returned:%d:\n",
+			rc);
+		rc = 0;
+	} else {
+		rc = 1;
+	}
+
+out:
+	/*
+	 * if we got ENOMEM, then op_alloc probably failed...
+	 */
+	if (rc == -ENOMEM)
+		rc = 0;
+	else
+		op_release(new_op);
+
+	if (rc == 0)
+		rc = -EINVAL;
+
+	return rc;
+}
+
+static ssize_t
+	service_orangefs_store(struct orangefs_obj *orangefs_obj,
+			       struct orangefs_attribute *attr,
+			       const char *buf,
+			       size_t count)
+{
+	int rc = 0;
+
+	rc = sysfs_service_op_store(ORANGEFS_KOBJ_ID, buf, (void *) attr);
+
+	/* rc should have an errno value if the service_op went bad. */
+	if (rc == 1)
+		rc = count;
+
+	return rc;
+}
+
+static ssize_t
+	service_acache_store(struct acache_orangefs_obj *acache_orangefs_obj,
+			     struct acache_orangefs_attribute *attr,
+			     const char *buf,
+			     size_t count)
+{
+	int rc = 0;
+
+	rc = sysfs_service_op_store(ACACHE_KOBJ_ID, buf, (void *) attr);
+
+	/* rc should have an errno value if the service_op went bad. */
+	if (rc == 1)
+		rc = count;
+
+	return rc;
+}
+
+static ssize_t
+	service_capcache_store(struct capcache_orangefs_obj
+				*capcache_orangefs_obj,
+			       struct capcache_orangefs_attribute *attr,
+			       const char *buf,
+			       size_t count)
+{
+	int rc = 0;
+
+	rc = sysfs_service_op_store(CAPCACHE_KOBJ_ID, buf, (void *) attr);
+
+	/* rc should have an errno value if the service_op went bad. */
+	if (rc == 1)
+		rc = count;
+
+	return rc;
+}
+
+static ssize_t service_ccache_store(struct ccache_orangefs_obj
+					*ccache_orangefs_obj,
+				    struct ccache_orangefs_attribute *attr,
+				    const char *buf,
+				    size_t count)
+{
+	int rc = 0;
+
+	rc = sysfs_service_op_store(CCACHE_KOBJ_ID, buf, (void *) attr);
+
+	/* rc should have an errno value if the service_op went bad. */
+	if (rc == 1)
+		rc = count;
+
+	return rc;
+}
+
+static ssize_t
+	service_ncache_store(struct ncache_orangefs_obj *ncache_orangefs_obj,
+			     struct ncache_orangefs_attribute *attr,
+			     const char *buf,
+			     size_t count)
+{
+	int rc = 0;
+
+	rc = sysfs_service_op_store(NCACHE_KOBJ_ID, buf, (void *) attr);
+
+	/* rc should have an errno value if the service_op went bad. */
+	if (rc == 1)
+		rc = count;
+
+	return rc;
+}
+
+static struct orangefs_attribute op_timeout_secs_attribute =
+	__ATTR(op_timeout_secs, 0664, int_orangefs_show, int_store);
+
+static struct orangefs_attribute slot_timeout_secs_attribute =
+	__ATTR(slot_timeout_secs, 0664, int_orangefs_show, int_store);
+
+static struct orangefs_attribute perf_counter_reset_attribute =
+	__ATTR(perf_counter_reset,
+	       0664,
+	       service_orangefs_show,
+	       service_orangefs_store);
+
+static struct orangefs_attribute perf_history_size_attribute =
+	__ATTR(perf_history_size,
+	       0664,
+	       service_orangefs_show,
+	       service_orangefs_store);
+
+static struct orangefs_attribute perf_time_interval_secs_attribute =
+	__ATTR(perf_time_interval_secs,
+	       0664,
+	       service_orangefs_show,
+	       service_orangefs_store);
+
+static struct attribute *orangefs_default_attrs[] = {
+	&op_timeout_secs_attribute.attr,
+	&slot_timeout_secs_attribute.attr,
+	&perf_counter_reset_attribute.attr,
+	&perf_history_size_attribute.attr,
+	&perf_time_interval_secs_attribute.attr,
+	NULL,
+};
+
+static struct kobj_type orangefs_ktype = {
+	.sysfs_ops = &orangefs_sysfs_ops,
+	.release = orangefs_release,
+	.default_attrs = orangefs_default_attrs,
+};
+
+static struct acache_orangefs_attribute acache_hard_limit_attribute =
+	__ATTR(hard_limit,
+	       0664,
+	       service_acache_show,
+	       service_acache_store);
+
+static struct acache_orangefs_attribute acache_reclaim_percent_attribute =
+	__ATTR(reclaim_percentage,
+	       0664,
+	       service_acache_show,
+	       service_acache_store);
+
+static struct acache_orangefs_attribute acache_soft_limit_attribute =
+	__ATTR(soft_limit,
+	       0664,
+	       service_acache_show,
+	       service_acache_store);
+
+static struct acache_orangefs_attribute acache_timeout_msecs_attribute =
+	__ATTR(timeout_msecs,
+	       0664,
+	       service_acache_show,
+	       service_acache_store);
+
+static struct attribute *acache_orangefs_default_attrs[] = {
+	&acache_hard_limit_attribute.attr,
+	&acache_reclaim_percent_attribute.attr,
+	&acache_soft_limit_attribute.attr,
+	&acache_timeout_msecs_attribute.attr,
+	NULL,
+};
+
+static struct kobj_type acache_orangefs_ktype = {
+	.sysfs_ops = &acache_orangefs_sysfs_ops,
+	.release = acache_orangefs_release,
+	.default_attrs = acache_orangefs_default_attrs,
+};
+
+static struct capcache_orangefs_attribute capcache_hard_limit_attribute =
+	__ATTR(hard_limit,
+	       0664,
+	       service_capcache_show,
+	       service_capcache_store);
+
+static struct capcache_orangefs_attribute capcache_reclaim_percent_attribute =
+	__ATTR(reclaim_percentage,
+	       0664,
+	       service_capcache_show,
+	       service_capcache_store);
+
+static struct capcache_orangefs_attribute capcache_soft_limit_attribute =
+	__ATTR(soft_limit,
+	       0664,
+	       service_capcache_show,
+	       service_capcache_store);
+
+static struct capcache_orangefs_attribute capcache_timeout_secs_attribute =
+	__ATTR(timeout_secs,
+	       0664,
+	       service_capcache_show,
+	       service_capcache_store);
+
+static struct attribute *capcache_orangefs_default_attrs[] = {
+	&capcache_hard_limit_attribute.attr,
+	&capcache_reclaim_percent_attribute.attr,
+	&capcache_soft_limit_attribute.attr,
+	&capcache_timeout_secs_attribute.attr,
+	NULL,
+};
+
+static struct kobj_type capcache_orangefs_ktype = {
+	.sysfs_ops = &capcache_orangefs_sysfs_ops,
+	.release = capcache_orangefs_release,
+	.default_attrs = capcache_orangefs_default_attrs,
+};
+
+static struct ccache_orangefs_attribute ccache_hard_limit_attribute =
+	__ATTR(hard_limit,
+	       0664,
+	       service_ccache_show,
+	       service_ccache_store);
+
+static struct ccache_orangefs_attribute ccache_reclaim_percent_attribute =
+	__ATTR(reclaim_percentage,
+	       0664,
+	       service_ccache_show,
+	       service_ccache_store);
+
+static struct ccache_orangefs_attribute ccache_soft_limit_attribute =
+	__ATTR(soft_limit,
+	       0664,
+	       service_ccache_show,
+	       service_ccache_store);
+
+static struct ccache_orangefs_attribute ccache_timeout_secs_attribute =
+	__ATTR(timeout_secs,
+	       0664,
+	       service_ccache_show,
+	       service_ccache_store);
+
+static struct attribute *ccache_orangefs_default_attrs[] = {
+	&ccache_hard_limit_attribute.attr,
+	&ccache_reclaim_percent_attribute.attr,
+	&ccache_soft_limit_attribute.attr,
+	&ccache_timeout_secs_attribute.attr,
+	NULL,
+};
+
+static struct kobj_type ccache_orangefs_ktype = {
+	.sysfs_ops = &ccache_orangefs_sysfs_ops,
+	.release = ccache_orangefs_release,
+	.default_attrs = ccache_orangefs_default_attrs,
+};
+
+static struct ncache_orangefs_attribute ncache_hard_limit_attribute =
+	__ATTR(hard_limit,
+	       0664,
+	       service_ncache_show,
+	       service_ncache_store);
+
+static struct ncache_orangefs_attribute ncache_reclaim_percent_attribute =
+	__ATTR(reclaim_percentage,
+	       0664,
+	       service_ncache_show,
+	       service_ncache_store);
+
+static struct ncache_orangefs_attribute ncache_soft_limit_attribute =
+	__ATTR(soft_limit,
+	       0664,
+	       service_ncache_show,
+	       service_ncache_store);
+
+static struct ncache_orangefs_attribute ncache_timeout_msecs_attribute =
+	__ATTR(timeout_msecs,
+	       0664,
+	       service_ncache_show,
+	       service_ncache_store);
+
+static struct attribute *ncache_orangefs_default_attrs[] = {
+	&ncache_hard_limit_attribute.attr,
+	&ncache_reclaim_percent_attribute.attr,
+	&ncache_soft_limit_attribute.attr,
+	&ncache_timeout_msecs_attribute.attr,
+	NULL,
+};
+
+static struct kobj_type ncache_orangefs_ktype = {
+	.sysfs_ops = &ncache_orangefs_sysfs_ops,
+	.release = ncache_orangefs_release,
+	.default_attrs = ncache_orangefs_default_attrs,
+};
+
+static struct pc_orangefs_attribute pc_acache_attribute =
+	__ATTR(acache,
+	       0664,
+	       service_pc_show,
+	       NULL);
+
+static struct pc_orangefs_attribute pc_capcache_attribute =
+	__ATTR(capcache,
+	       0664,
+	       service_pc_show,
+	       NULL);
+
+static struct pc_orangefs_attribute pc_ncache_attribute =
+	__ATTR(ncache,
+	       0664,
+	       service_pc_show,
+	       NULL);
+
+static struct attribute *pc_orangefs_default_attrs[] = {
+	&pc_acache_attribute.attr,
+	&pc_capcache_attribute.attr,
+	&pc_ncache_attribute.attr,
+	NULL,
+};
+
+static struct kobj_type pc_orangefs_ktype = {
+	.sysfs_ops = &pc_orangefs_sysfs_ops,
+	.release = pc_orangefs_release,
+	.default_attrs = pc_orangefs_default_attrs,
+};
+
+static struct stats_orangefs_attribute stats_reads_attribute =
+	__ATTR(reads,
+	       0664,
+	       int_stats_show,
+	       NULL);
+
+static struct stats_orangefs_attribute stats_writes_attribute =
+	__ATTR(writes,
+	       0664,
+	       int_stats_show,
+	       NULL);
+
+static struct attribute *stats_orangefs_default_attrs[] = {
+	&stats_reads_attribute.attr,
+	&stats_writes_attribute.attr,
+	NULL,
+};
+
+static struct kobj_type stats_orangefs_ktype = {
+	.sysfs_ops = &stats_orangefs_sysfs_ops,
+	.release = stats_orangefs_release,
+	.default_attrs = stats_orangefs_default_attrs,
+};
+
+static struct orangefs_obj *orangefs_obj;
+static struct acache_orangefs_obj *acache_orangefs_obj;
+static struct capcache_orangefs_obj *capcache_orangefs_obj;
+static struct ccache_orangefs_obj *ccache_orangefs_obj;
+static struct ncache_orangefs_obj *ncache_orangefs_obj;
+static struct pc_orangefs_obj *pc_orangefs_obj;
+static struct stats_orangefs_obj *stats_orangefs_obj;
+
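+/*
+ * Build the sysfs tree under /sys/fs/orangefs: the top-level kobject
+ * plus the acache, capcache, ccache, ncache, perf_counters and stats
+ * children.
+ */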
+int orangefs_sysfs_init(void)
+{
+	int rc;
+
+	gossip_debug(GOSSIP_SYSFS_DEBUG, "orangefs_sysfs_init: start\n");
+
+	/* create /sys/fs/orangefs. */
+	orangefs_obj = kzalloc(sizeof(*orangefs_obj), GFP_KERNEL);
+	if (!orangefs_obj) {
+		rc = -EINVAL;
+		goto out;
+	}
+
+	rc = kobject_init_and_add(&orangefs_obj->kobj,
+				  &orangefs_ktype,
+				  fs_kobj,
+				  ORANGEFS_KOBJ_ID);
+
+	if (rc) {
+		kobject_put(&orangefs_obj->kobj);
+		rc = -EINVAL;
+		goto out;
+	}
+
+	kobject_uevent(&orangefs_obj->kobj, KOBJ_ADD);
+
+	/* create /sys/fs/orangefs/acache. */
+	acache_orangefs_obj = kzalloc(sizeof(*acache_orangefs_obj), GFP_KERNEL);
+	if (!acache_orangefs_obj) {
+		rc = -EINVAL;
+		goto out;
+	}
+
+	rc = kobject_init_and_add(&acache_orangefs_obj->kobj,
+				  &acache_orangefs_ktype,
+				  &orangefs_obj->kobj,
+				  ACACHE_KOBJ_ID);
+
+	if (rc) {
+		kobject_put(&acache_orangefs_obj->kobj);
+		rc = -EINVAL;
+		goto out;
+	}
+
+	kobject_uevent(&acache_orangefs_obj->kobj, KOBJ_ADD);
+
+	/* create /sys/fs/orangefs/capcache. */
+	capcache_orangefs_obj =
+		kzalloc(sizeof(*capcache_orangefs_obj), GFP_KERNEL);
+	if (!capcache_orangefs_obj) {
+		rc = -EINVAL;
+		goto out;
+	}
+
+	rc = kobject_init_and_add(&capcache_orangefs_obj->kobj,
+				  &capcache_orangefs_ktype,
+				  &orangefs_obj->kobj,
+				  CAPCACHE_KOBJ_ID);
+	if (rc) {
+		kobject_put(&capcache_orangefs_obj->kobj);
+		rc = -EINVAL;
+		goto out;
+	}
+
+	kobject_uevent(&capcache_orangefs_obj->kobj, KOBJ_ADD);
+
+	/* create /sys/fs/orangefs/ccache. */
+	ccache_orangefs_obj =
+		kzalloc(sizeof(*ccache_orangefs_obj), GFP_KERNEL);
+	if (!ccache_orangefs_obj) {
+		rc = -EINVAL;
+		goto out;
+	}
+
+	rc = kobject_init_and_add(&ccache_orangefs_obj->kobj,
+				  &ccache_orangefs_ktype,
+				  &orangefs_obj->kobj,
+				  CCACHE_KOBJ_ID);
+	if (rc) {
+		kobject_put(&ccache_orangefs_obj->kobj);
+		rc = -EINVAL;
+		goto out;
+	}
+
+	kobject_uevent(&ccache_orangefs_obj->kobj, KOBJ_ADD);
+
+	/* create /sys/fs/orangefs/ncache. */
+	ncache_orangefs_obj = kzalloc(sizeof(*ncache_orangefs_obj), GFP_KERNEL);
+	if (!ncache_orangefs_obj) {
+		rc = -EINVAL;
+		goto out;
+	}
+
+	rc = kobject_init_and_add(&ncache_orangefs_obj->kobj,
+				  &ncache_orangefs_ktype,
+				  &orangefs_obj->kobj,
+				  NCACHE_KOBJ_ID);
+
+	if (rc) {
+		kobject_put(&ncache_orangefs_obj->kobj);
+		rc = -EINVAL;
+		goto out;
+	}
+
+	kobject_uevent(&ncache_orangefs_obj->kobj, KOBJ_ADD);
+
+	/* create /sys/fs/orangefs/perf_counters. */
+	pc_orangefs_obj = kzalloc(sizeof(*pc_orangefs_obj), GFP_KERNEL);
+	if (!pc_orangefs_obj) {
+		rc = -EINVAL;
+		goto out;
+	}
+
+	rc = kobject_init_and_add(&pc_orangefs_obj->kobj,
+				  &pc_orangefs_ktype,
+				  &orangefs_obj->kobj,
+				  "perf_counters");
+
+	if (rc) {
+		kobject_put(&pc_orangefs_obj->kobj);
+		rc = -EINVAL;
+		goto out;
+	}
+
+	kobject_uevent(&pc_orangefs_obj->kobj, KOBJ_ADD);
+
+	/* create /sys/fs/orangefs/stats. */
+	stats_orangefs_obj = kzalloc(sizeof(*stats_orangefs_obj), GFP_KERNEL);
+	if (!stats_orangefs_obj) {
+		rc = -EINVAL;
+		goto out;
+	}
+
+	rc = kobject_init_and_add(&stats_orangefs_obj->kobj,
+				  &stats_orangefs_ktype,
+				  &orangefs_obj->kobj,
+				  STATS_KOBJ_ID);
+
+	if (rc) {
+		kobject_put(&stats_orangefs_obj->kobj);
+		rc = -EINVAL;
+		goto out;
+	}
+
+	kobject_uevent(&stats_orangefs_obj->kobj, KOBJ_ADD);
+out:
+	return rc;
+}
+
+void orangefs_sysfs_exit(void)
+{
+	gossip_debug(GOSSIP_SYSFS_DEBUG, "orangefs_sysfs_exit: start\n");
+
+	kobject_put(&acache_orangefs_obj->kobj);
+	kobject_put(&capcache_orangefs_obj->kobj);
+	kobject_put(&ccache_orangefs_obj->kobj);
+	kobject_put(&ncache_orangefs_obj->kobj);
+	kobject_put(&pc_orangefs_obj->kobj);
+	kobject_put(&stats_orangefs_obj->kobj);
+
+	kobject_put(&orangefs_obj->kobj);
+}
diff --git a/fs/orangefs/pvfs2-utils.c b/fs/orangefs/pvfs2-utils.c
new file mode 100644
index 0000000..107f425
--- /dev/null
+++ b/fs/orangefs/pvfs2-utils.c
@@ -0,0 +1,1128 @@
+/*
+ * (C) 2001 Clemson University and The University of Chicago
+ *
+ * See COPYING in top-level directory.
+ */
+#include "protocol.h"
+#include "pvfs2-kernel.h"
+#include "pvfs2-dev-proto.h"
+#include "pvfs2-bufmap.h"
+
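+/*
+ * Pull the fs_id out of whichever upcall request this op carries;
+ * returns PVFS_FS_ID_NULL for a NULL op or an unhandled op type.
+ */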
+__s32 fsid_of_op(struct pvfs2_kernel_op_s *op)
+{
+	__s32 fsid = PVFS_FS_ID_NULL;
+
+	if (op) {
+		switch (op->upcall.type) {
+		case PVFS2_VFS_OP_FILE_IO:
+			fsid = op->upcall.req.io.refn.fs_id;
+			break;
+		case PVFS2_VFS_OP_LOOKUP:
+			fsid = op->upcall.req.lookup.parent_refn.fs_id;
+			break;
+		case PVFS2_VFS_OP_CREATE:
+			fsid = op->upcall.req.create.parent_refn.fs_id;
+			break;
+		case PVFS2_VFS_OP_GETATTR:
+			fsid = op->upcall.req.getattr.refn.fs_id;
+			break;
+		case PVFS2_VFS_OP_REMOVE:
+			fsid = op->upcall.req.remove.parent_refn.fs_id;
+			break;
+		case PVFS2_VFS_OP_MKDIR:
+			fsid = op->upcall.req.mkdir.parent_refn.fs_id;
+			break;
+		case PVFS2_VFS_OP_READDIR:
+			fsid = op->upcall.req.readdir.refn.fs_id;
+			break;
+		case PVFS2_VFS_OP_SETATTR:
+			fsid = op->upcall.req.setattr.refn.fs_id;
+			break;
+		case PVFS2_VFS_OP_SYMLINK:
+			fsid = op->upcall.req.sym.parent_refn.fs_id;
+			break;
+		case PVFS2_VFS_OP_RENAME:
+			fsid = op->upcall.req.rename.old_parent_refn.fs_id;
+			break;
+		case PVFS2_VFS_OP_STATFS:
+			fsid = op->upcall.req.statfs.fs_id;
+			break;
+		case PVFS2_VFS_OP_TRUNCATE:
+			fsid = op->upcall.req.truncate.refn.fs_id;
+			break;
+		case PVFS2_VFS_OP_MMAP_RA_FLUSH:
+			fsid = op->upcall.req.ra_cache_flush.refn.fs_id;
+			break;
+		case PVFS2_VFS_OP_FS_UMOUNT:
+			fsid = op->upcall.req.fs_umount.fs_id;
+			break;
+		case PVFS2_VFS_OP_GETXATTR:
+			fsid = op->upcall.req.getxattr.refn.fs_id;
+			break;
+		case PVFS2_VFS_OP_SETXATTR:
+			fsid = op->upcall.req.setxattr.refn.fs_id;
+			break;
+		case PVFS2_VFS_OP_LISTXATTR:
+			fsid = op->upcall.req.listxattr.refn.fs_id;
+			break;
+		case PVFS2_VFS_OP_REMOVEXATTR:
+			fsid = op->upcall.req.removexattr.refn.fs_id;
+			break;
+		case PVFS2_VFS_OP_FSYNC:
+			fsid = op->upcall.req.fsync.refn.fs_id;
+			break;
+		default:
+			break;
+		}
+	}
+	return fsid;
+}
+
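+/*
+ * Mirror the PVFS immutable/append/noatime attribute flags onto the
+ * in-kernel inode flags.
+ */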
+static void pvfs2_set_inode_flags(struct inode *inode,
+				  struct PVFS_sys_attr_s *attrs)
+{
+	if (attrs->flags & PVFS_IMMUTABLE_FL)
+		inode->i_flags |= S_IMMUTABLE;
+	else
+		inode->i_flags &= ~S_IMMUTABLE;
+
+	if (attrs->flags & PVFS_APPEND_FL)
+		inode->i_flags |= S_APPEND;
+	else
+		inode->i_flags &= ~S_APPEND;
+
+	if (attrs->flags & PVFS_NOATIME_FL)
+		inode->i_flags |= S_NOATIME;
+	else
+		inode->i_flags &= ~S_NOATIME;
+
+}
+
+/* NOTE: symname is ignored unless the inode is a sym link */
+static int copy_attributes_to_inode(struct inode *inode,
+				    struct PVFS_sys_attr_s *attrs,
+				    char *symname)
+{
+	int ret = -1;
+	int perm_mode = 0;
+	struct pvfs2_inode_s *pvfs2_inode = PVFS2_I(inode);
+	loff_t inode_size = 0;
+	loff_t rounded_up_size = 0;
+
+
+	/*
+	   arbitrarily set the inode block size; FIXME: we need to
+	   resolve the difference between the reported inode blocksize
+	   and the PAGE_CACHE_SIZE, since our block count will always
+	   be wrong.
+
+	   For now, we're setting the block count to be the proper
+	   number assuming the block size is 512 bytes, and the size is
+	   rounded up to the nearest 4K.  This is apparently required
+	   to get proper size reports from the 'du' shell utility.
+
+	   changing the inode->i_blkbits to something other than
+	   PAGE_CACHE_SHIFT breaks mmap/execution as we depend on that.
+	 */
+	gossip_debug(GOSSIP_UTILS_DEBUG,
+		     "attrs->mask = %x (objtype = %s)\n",
+		     attrs->mask,
+		     attrs->objtype == PVFS_TYPE_METAFILE ? "file" :
+		     attrs->objtype == PVFS_TYPE_DIRECTORY ? "directory" :
+		     attrs->objtype == PVFS_TYPE_SYMLINK ? "symlink" :
+			"invalid/unknown");
+
+	switch (attrs->objtype) {
+	case PVFS_TYPE_METAFILE:
+		pvfs2_set_inode_flags(inode, attrs);
+		if (attrs->mask & PVFS_ATTR_SYS_SIZE) {
+			inode_size = (loff_t) attrs->size;
+			rounded_up_size =
+			    (inode_size + (4096 - (inode_size % 4096)));
+
+			pvfs2_lock_inode(inode);
+			inode->i_bytes = inode_size;
+			inode->i_blocks =
+			    (unsigned long)(rounded_up_size / 512);
+			pvfs2_unlock_inode(inode);
+
+			/*
+			 * NOTE: make sure all the places we're called
+			 * from have the inode->i_sem lock. We're fine
+			 * in 99% of the cases since we're mostly
+			 * called from a lookup.
+			 */
+			inode->i_size = inode_size;
+		}
+		break;
+	case PVFS_TYPE_SYMLINK:
+		if (symname != NULL) {
+			inode->i_size = (loff_t) strlen(symname);
+			break;
+		}
+		/*FALLTHRU*/
+	default:
+		pvfs2_lock_inode(inode);
+		inode->i_bytes = PAGE_CACHE_SIZE;
+		inode->i_blocks = (unsigned long)(PAGE_CACHE_SIZE / 512);
+		pvfs2_unlock_inode(inode);
+
+		inode->i_size = PAGE_CACHE_SIZE;
+		break;
+	}
+
+	inode->i_uid = make_kuid(&init_user_ns, attrs->owner);
+	inode->i_gid = make_kgid(&init_user_ns, attrs->group);
+	inode->i_atime.tv_sec = (time_t) attrs->atime;
+	inode->i_mtime.tv_sec = (time_t) attrs->mtime;
+	inode->i_ctime.tv_sec = (time_t) attrs->ctime;
+	inode->i_atime.tv_nsec = 0;
+	inode->i_mtime.tv_nsec = 0;
+	inode->i_ctime.tv_nsec = 0;
+
+	if (attrs->perms & PVFS_O_EXECUTE)
+		perm_mode |= S_IXOTH;
+	if (attrs->perms & PVFS_O_WRITE)
+		perm_mode |= S_IWOTH;
+	if (attrs->perms & PVFS_O_READ)
+		perm_mode |= S_IROTH;
+
+	if (attrs->perms & PVFS_G_EXECUTE)
+		perm_mode |= S_IXGRP;
+	if (attrs->perms & PVFS_G_WRITE)
+		perm_mode |= S_IWGRP;
+	if (attrs->perms & PVFS_G_READ)
+		perm_mode |= S_IRGRP;
+
+	if (attrs->perms & PVFS_U_EXECUTE)
+		perm_mode |= S_IXUSR;
+	if (attrs->perms & PVFS_U_WRITE)
+		perm_mode |= S_IWUSR;
+	if (attrs->perms & PVFS_U_READ)
+		perm_mode |= S_IRUSR;
+
+	if (attrs->perms & PVFS_G_SGID)
+		perm_mode |= S_ISGID;
+	if (attrs->perms & PVFS_U_SUID)
+		perm_mode |= S_ISUID;
+
+	inode->i_mode = perm_mode;
+
+	if (is_root_handle(inode)) {
+		/* special case: mark the root inode as sticky */
+		inode->i_mode |= S_ISVTX;
+		gossip_debug(GOSSIP_UTILS_DEBUG,
+			     "Marking inode %pU as sticky\n",
+			     get_khandle_from_ino(inode));
+	}
+
+	switch (attrs->objtype) {
+	case PVFS_TYPE_METAFILE:
+		inode->i_mode |= S_IFREG;
+		ret = 0;
+		break;
+	case PVFS_TYPE_DIRECTORY:
+		inode->i_mode |= S_IFDIR;
+		/* NOTE: we have no good way to keep nlink consistent
+		 * for directories across clients; keep constant at 1.
+		 * Why 1?  If we go with 2, then find(1) gets confused
+		 * and won't work properly without the -noleaf option
+		 */
+		set_nlink(inode, 1);
+		ret = 0;
+		break;
+	case PVFS_TYPE_SYMLINK:
+		inode->i_mode |= S_IFLNK;
+
+		/* copy link target to inode private data */
+		if (pvfs2_inode && symname) {
+			strncpy(pvfs2_inode->link_target,
+				symname,
+				PVFS_NAME_MAX);
+			gossip_debug(GOSSIP_UTILS_DEBUG,
+				     "Copied attr link target %s\n",
+				     pvfs2_inode->link_target);
+		}
+		gossip_debug(GOSSIP_UTILS_DEBUG,
+			     "symlink mode %o\n",
+			     inode->i_mode);
+		ret = 0;
+		break;
+	default:
+		gossip_err("pvfs2: copy_attributes_to_inode: got invalid attribute type %x\n",
+			attrs->objtype);
+	}
+
+	gossip_debug(GOSSIP_UTILS_DEBUG,
+		     "pvfs2: copy_attributes_to_inode: setting i_mode to %o, i_size to %lu\n",
+		     inode->i_mode,
+		     (unsigned long)i_size_read(inode));
+
+	return ret;
+}
+
+/*
+ * NOTE: in kernel land, we never use the sys_attr->link_target for
+ * anything, so don't bother copying it into the sys_attr object here.
+ */
+static inline int copy_attributes_from_inode(struct inode *inode,
+					     struct PVFS_sys_attr_s *attrs,
+					     struct iattr *iattr)
+{
+	umode_t tmp_mode;
+
+	if (!iattr || !inode || !attrs) {
+		gossip_err("NULL iattr (%p), inode (%p), attrs (%p) "
+			   "in copy_attributes_from_inode!\n",
+			   iattr,
+			   inode,
+			   attrs);
+		return -EINVAL;
+	}
+	/*
+	 * We need to be careful to only copy the attributes out of the
+	 * iattr object that we know are valid.
+	 */
+	attrs->mask = 0;
+	if (iattr->ia_valid & ATTR_UID) {
+		attrs->owner = from_kuid(current_user_ns(), iattr->ia_uid);
+		attrs->mask |= PVFS_ATTR_SYS_UID;
+		gossip_debug(GOSSIP_UTILS_DEBUG, "(UID) %d\n", attrs->owner);
+	}
+	if (iattr->ia_valid & ATTR_GID) {
+		attrs->group = from_kgid(current_user_ns(), iattr->ia_gid);
+		attrs->mask |= PVFS_ATTR_SYS_GID;
+		gossip_debug(GOSSIP_UTILS_DEBUG, "(GID) %d\n", attrs->group);
+	}
+
+	if (iattr->ia_valid & ATTR_ATIME) {
+		attrs->mask |= PVFS_ATTR_SYS_ATIME;
+		if (iattr->ia_valid & ATTR_ATIME_SET) {
+			attrs->atime =
+			    pvfs2_convert_time_field((void *)&iattr->ia_atime);
+			attrs->mask |= PVFS_ATTR_SYS_ATIME_SET;
+		}
+	}
+	if (iattr->ia_valid & ATTR_MTIME) {
+		attrs->mask |= PVFS_ATTR_SYS_MTIME;
+		if (iattr->ia_valid & ATTR_MTIME_SET) {
+			attrs->mtime =
+			    pvfs2_convert_time_field((void *)&iattr->ia_mtime);
+			attrs->mask |= PVFS_ATTR_SYS_MTIME_SET;
+		}
+	}
+	if (iattr->ia_valid & ATTR_CTIME)
+		attrs->mask |= PVFS_ATTR_SYS_CTIME;
+
+	/*
+	 * PVFS2 cannot set size with a setattr operation.  Probably not likely
+	 * to be requested through the VFS, but just in case, don't worry about
+	 * ATTR_SIZE
+	 */
+
+	if (iattr->ia_valid & ATTR_MODE) {
+		tmp_mode = iattr->ia_mode;
+		if (tmp_mode & (S_ISVTX)) {
+			if (is_root_handle(inode)) {
+				/*
+				 * allow sticky bit to be set on root (since
+				 * it shows up that way by default anyhow),
+				 * but don't show it to the server
+				 */
+				tmp_mode -= S_ISVTX;
+			} else {
+				gossip_debug(GOSSIP_UTILS_DEBUG,
+					     "User attempted to set sticky bit on non-root directory; returning EINVAL.\n");
+				return -EINVAL;
+			}
+		}
+
+		if (tmp_mode & (S_ISUID)) {
+			gossip_debug(GOSSIP_UTILS_DEBUG,
+				     "Attempting to set setuid bit (not supported); returning EINVAL.\n");
+			return -EINVAL;
+		}
+
+		attrs->perms = PVFS_util_translate_mode(tmp_mode);
+		attrs->mask |= PVFS_ATTR_SYS_PERM;
+	}
+
+	return 0;
+}
+
+/*
+ * issues a pvfs2 getattr request and fills in the appropriate inode
+ * attributes if successful.  returns 0 on success; -errno otherwise
+ */
+int pvfs2_inode_getattr(struct inode *inode, __u32 getattr_mask)
+{
+	struct pvfs2_inode_s *pvfs2_inode = PVFS2_I(inode);
+	struct pvfs2_kernel_op_s *new_op;
+	int ret = -EINVAL;
+
+	gossip_debug(GOSSIP_UTILS_DEBUG,
+		     "%s: called on inode %pU\n",
+		     __func__,
+		     get_khandle_from_ino(inode));
+
+	new_op = op_alloc(PVFS2_VFS_OP_GETATTR);
+	if (!new_op)
+		return -ENOMEM;
+	new_op->upcall.req.getattr.refn = pvfs2_inode->refn;
+	new_op->upcall.req.getattr.mask = getattr_mask;
+
+	ret = service_operation(new_op, __func__,
+				get_interruptible_flag(inode));
+	if (ret != 0)
+		goto out;
+
+	if (copy_attributes_to_inode(inode,
+			&new_op->downcall.resp.getattr.attributes,
+			new_op->downcall.resp.getattr.link_target)) {
+		gossip_err("%s: failed to copy attributes\n", __func__);
+		ret = -ENOENT;
+		goto out;
+	}
+
+	/*
+	 * Store blksize in pvfs2 specific part of inode structure; we are
+	 * only going to use this to report to stat to make sure it doesn't
+	 * perturb any inode related code paths.
+	 */
+	if (new_op->downcall.resp.getattr.attributes.objtype ==
+			PVFS_TYPE_METAFILE) {
+		pvfs2_inode->blksize =
+			new_op->downcall.resp.getattr.attributes.blksize;
+	} else {
+		/* mimic behavior of generic_fillattr() for other types. */
+		pvfs2_inode->blksize = (1 << inode->i_blkbits);
+
+	}
+
+out:
+	gossip_debug(GOSSIP_UTILS_DEBUG,
+		     "Getattr on handle %pU, "
+		     "fsid %d\n  (inode ct = %d) returned %d\n",
+		     &pvfs2_inode->refn.khandle,
+		     pvfs2_inode->refn.fs_id,
+		     (int)atomic_read(&inode->i_count),
+		     ret);
+
+	op_release(new_op);
+	return ret;
+}
+
+/*
+ * issues a pvfs2 setattr request to make sure the new attribute values
+ * take effect if successful.  returns 0 on success; -errno otherwise
+ */
+int pvfs2_inode_setattr(struct inode *inode, struct iattr *iattr)
+{
+	struct pvfs2_inode_s *pvfs2_inode = PVFS2_I(inode);
+	struct pvfs2_kernel_op_s *new_op;
+	int ret;
+
+	new_op = op_alloc(PVFS2_VFS_OP_SETATTR);
+	if (!new_op)
+		return -ENOMEM;
+
+	new_op->upcall.req.setattr.refn = pvfs2_inode->refn;
+	ret = copy_attributes_from_inode(inode,
+		       &new_op->upcall.req.setattr.attributes,
+		       iattr);
+	if (ret < 0) {
+		op_release(new_op);
+		return ret;
+	}
+
+	ret = service_operation(new_op, __func__,
+				get_interruptible_flag(inode));
+
+	gossip_debug(GOSSIP_UTILS_DEBUG,
+		     "pvfs2_inode_setattr: returning %d\n",
+		     ret);
+
+	/* when request is serviced properly, free req op struct */
+	op_release(new_op);
+
+	/*
+	 * successful setattr should clear the atime, mtime and
+	 * ctime flags.
+	 */
+	if (ret == 0) {
+		ClearAtimeFlag(pvfs2_inode);
+		ClearMtimeFlag(pvfs2_inode);
+		ClearCtimeFlag(pvfs2_inode);
+		ClearModeFlag(pvfs2_inode);
+	}
+
+	return ret;
+}
+
+int pvfs2_flush_inode(struct inode *inode)
+{
+	/*
+	 * If it is a dirty inode, this function gets called.
+	 * Gather all the information that needs to be setattr'ed
+	 * Right now, this will only be used for mode, atime, mtime
+	 * and/or ctime.
+	 */
+	struct iattr wbattr;
+	int ret;
+	int mtime_flag;
+	int ctime_flag;
+	int atime_flag;
+	int mode_flag;
+	struct pvfs2_inode_s *pvfs2_inode = PVFS2_I(inode);
+
+	memset(&wbattr, 0, sizeof(wbattr));
+
+	/*
+	 * check inode flags up front, and clear them if they are set.  This
+	 * will prevent multiple processes from all trying to flush the same
+	 * inode if they call close() simultaneously
+	 */
+	mtime_flag = MtimeFlag(pvfs2_inode);
+	ClearMtimeFlag(pvfs2_inode);
+	ctime_flag = CtimeFlag(pvfs2_inode);
+	ClearCtimeFlag(pvfs2_inode);
+	atime_flag = AtimeFlag(pvfs2_inode);
+	ClearAtimeFlag(pvfs2_inode);
+	mode_flag = ModeFlag(pvfs2_inode);
+	ClearModeFlag(pvfs2_inode);
+
+	/*  -- Lazy atime,mtime and ctime update --
+	 * Note: all times are dictated by server in the new scheme
+	 * and not by the clients
+	 *
+	 * Also mode updates are being handled now..
+	 */
+
+	if (mtime_flag)
+		wbattr.ia_valid |= ATTR_MTIME;
+	if (ctime_flag)
+		wbattr.ia_valid |= ATTR_CTIME;
+	if (atime_flag)
+		wbattr.ia_valid |= ATTR_ATIME;
+
+	if (mode_flag) {
+		wbattr.ia_mode = inode->i_mode;
+		wbattr.ia_valid |= ATTR_MODE;
+	}
+
+	gossip_debug(GOSSIP_UTILS_DEBUG,
+		     "*********** pvfs2_flush_inode: %pU "
+		     "(ia_valid %d)\n",
+		     get_khandle_from_ino(inode),
+		     wbattr.ia_valid);
+	if (wbattr.ia_valid == 0) {
+		gossip_debug(GOSSIP_UTILS_DEBUG,
+			     "pvfs2_flush_inode skipping setattr()\n");
+		return 0;
+	}
+
+	gossip_debug(GOSSIP_UTILS_DEBUG,
+		     "pvfs2_flush_inode (%pU) writing mode %o\n",
+		     get_khandle_from_ino(inode),
+		     inode->i_mode);
+
+	ret = pvfs2_inode_setattr(inode, &wbattr);
+
+	return ret;
+}
+
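+/*
+ * Send a FS_UMOUNT upcall so the userspace client drops the dynamic
+ * mount information it holds for this superblock.
+ */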
+int pvfs2_unmount_sb(struct super_block *sb)
+{
+	int ret = -EINVAL;
+	struct pvfs2_kernel_op_s *new_op = NULL;
+
+	gossip_debug(GOSSIP_UTILS_DEBUG,
+		     "pvfs2_unmount_sb called on sb %p\n",
+		     sb);
+
+	new_op = op_alloc(PVFS2_VFS_OP_FS_UMOUNT);
+	if (!new_op)
+		return -ENOMEM;
+	new_op->upcall.req.fs_umount.id = PVFS2_SB(sb)->id;
+	new_op->upcall.req.fs_umount.fs_id = PVFS2_SB(sb)->fs_id;
+	strncpy(new_op->upcall.req.fs_umount.pvfs2_config_server,
+		PVFS2_SB(sb)->devname,
+		PVFS_MAX_SERVER_ADDR_LEN);
+
+	gossip_debug(GOSSIP_UTILS_DEBUG,
+		     "Attempting PVFS2 Unmount via host %s\n",
+		     new_op->upcall.req.fs_umount.pvfs2_config_server);
+
+	ret = service_operation(new_op, "pvfs2_fs_umount", 0);
+
+	gossip_debug(GOSSIP_UTILS_DEBUG,
+		     "pvfs2_unmount: got return value of %d\n", ret);
+	if (ret)
+		sb = ERR_PTR(ret);
+	else
+		PVFS2_SB(sb)->mount_pending = 1;
+
+	op_release(new_op);
+	return ret;
+}
+
+/*
+ * NOTE: on successful cancellation, be sure to return -EINTR, as
+ * that's the return value the caller expects
+ */
+int pvfs2_cancel_op_in_progress(__u64 tag)
+{
+	int ret = -EINVAL;
+	struct pvfs2_kernel_op_s *new_op = NULL;
+
+	gossip_debug(GOSSIP_UTILS_DEBUG,
+		     "pvfs2_cancel_op_in_progress called on tag %llu\n",
+		     llu(tag));
+
+	new_op = op_alloc(PVFS2_VFS_OP_CANCEL);
+	if (!new_op)
+		return -ENOMEM;
+	new_op->upcall.req.cancel.op_tag = tag;
+
+	gossip_debug(GOSSIP_UTILS_DEBUG,
+		     "Attempting PVFS2 operation cancellation of tag %llu\n",
+		     llu(new_op->upcall.req.cancel.op_tag));
+
+	ret = service_operation(new_op, "pvfs2_cancel", PVFS2_OP_CANCELLATION);
+
+	gossip_debug(GOSSIP_UTILS_DEBUG,
+		     "pvfs2_cancel_op_in_progress: got return value of %d\n",
+		     ret);
+
+	op_release(new_op);
+	return ret;
+}
+
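+/* Reset an op to a known-clean state, under op->lock, so it can be reused. */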
+void pvfs2_op_initialize(struct pvfs2_kernel_op_s *op)
+{
+	if (op) {
+		spin_lock(&op->lock);
+		op->io_completed = 0;
+
+		op->upcall.type = PVFS2_VFS_OP_INVALID;
+		op->downcall.type = PVFS2_VFS_OP_INVALID;
+		op->downcall.status = -1;
+
+		op->op_state = OP_VFS_STATE_UNKNOWN;
+		op->tag = 0;
+		spin_unlock(&op->lock);
+	}
+}
+
+void pvfs2_make_bad_inode(struct inode *inode)
+{
+	if (is_root_handle(inode)) {
+		/*
+		 * if this occurs, the pvfs2-client-core was killed but we
+		 * can't afford to lose the inode operations and such
+		 * associated with the root handle in any case.
+		 */
+		gossip_debug(GOSSIP_UTILS_DEBUG,
+			     "*** NOT making bad root inode %pU\n",
+			     get_khandle_from_ino(inode));
+	} else {
+		gossip_debug(GOSSIP_UTILS_DEBUG,
+			     "*** making bad inode %pU\n",
+			     get_khandle_from_ino(inode));
+		make_bad_inode(inode);
+	}
+}
+
+/* this code is based on linux/net/sunrpc/clnt.c:rpc_clnt_sigmask */
+void mask_blocked_signals(sigset_t *orig_sigset)
+{
+	unsigned long sigallow = sigmask(SIGKILL);
+	unsigned long irqflags = 0;
+	struct k_sigaction *action = pvfs2_current_sigaction;
+
+	sigallow |= ((action[SIGINT - 1].sa.sa_handler == SIG_DFL) ?
+		     sigmask(SIGINT) :
+		     0);
+	sigallow |= ((action[SIGQUIT - 1].sa.sa_handler == SIG_DFL) ?
+		     sigmask(SIGQUIT) :
+		     0);
+
+	spin_lock_irqsave(&pvfs2_current_signal_lock, irqflags);
+	*orig_sigset = current->blocked;
+	siginitsetinv(&current->blocked, sigallow & ~orig_sigset->sig[0]);
+	recalc_sigpending();
+	spin_unlock_irqrestore(&pvfs2_current_signal_lock, irqflags);
+}
+
+/* this code is based on linux/net/sunrpc/clnt.c:rpc_clnt_sigunmask */
+void unmask_blocked_signals(sigset_t *orig_sigset)
+{
+	unsigned long irqflags = 0;
+
+	spin_lock_irqsave(&pvfs2_current_signal_lock, irqflags);
+	current->blocked = *orig_sigset;
+	recalc_sigpending();
+	spin_unlock_irqrestore(&pvfs2_current_signal_lock, irqflags);
+}
+
+__u64 pvfs2_convert_time_field(void *time_ptr)
+{
+	__u64 pvfs2_time;
+	struct timespec *tspec = (struct timespec *)time_ptr;
+
+	pvfs2_time = (__u64) ((time_t) tspec->tv_sec);
+	return pvfs2_time;
+}
+
+/* macro defined in include/pvfs2-types.h */
+DECLARE_ERRNO_MAPPING_AND_FN();
+
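+/*
+ * Convert a PVFS error code into a negative errno: cancellations map
+ * to -ETIMEDOUT and PVFS-specific codes with no errno equivalent fall
+ * back to -EINVAL.
+ */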
+int pvfs2_normalize_to_errno(__s32 error_code)
+{
+	if (error_code > 0) {
+		gossip_err("pvfs2: error status received.\n");
+		gossip_err("pvfs2: assuming error code is inverted.\n");
+		error_code = -error_code;
+	}
+
+	/* convert any error codes that are in pvfs2 format */
+	if (IS_PVFS_NON_ERRNO_ERROR(-error_code)) {
+		if (PVFS_NON_ERRNO_ERROR_CODE(-error_code) == PVFS_ECANCEL) {
+			/*
+			 * cancellation error codes generally correspond to
+			 * a timeout from the client's perspective
+			 */
+			error_code = -ETIMEDOUT;
+		} else {
+			/* assume a default error code */
+			gossip_err("pvfs2: warning: got error code without errno equivalent: %d.\n",
+				   error_code);
+			error_code = -EINVAL;
+		}
+	} else if (IS_PVFS_ERROR(-error_code)) {
+		error_code = -PVFS_ERROR_TO_ERRNO(-error_code);
+	}
+	return error_code;
+}
+
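+/* Translate POSIX mode bits into the equivalent PVFS permission bits. */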
+#define NUM_MODES 11
+__s32 PVFS_util_translate_mode(int mode)
+{
+	int ret = 0;
+	int i = 0;
+	static int modes[NUM_MODES] = {
+		S_IXOTH, S_IWOTH, S_IROTH,
+		S_IXGRP, S_IWGRP, S_IRGRP,
+		S_IXUSR, S_IWUSR, S_IRUSR,
+		S_ISGID, S_ISUID
+	};
+	static int pvfs2_modes[NUM_MODES] = {
+		PVFS_O_EXECUTE, PVFS_O_WRITE, PVFS_O_READ,
+		PVFS_G_EXECUTE, PVFS_G_WRITE, PVFS_G_READ,
+		PVFS_U_EXECUTE, PVFS_U_WRITE, PVFS_U_READ,
+		PVFS_G_SGID, PVFS_U_SUID
+	};
+
+	for (i = 0; i < NUM_MODES; i++)
+		if (mode & modes[i])
+			ret |= pvfs2_modes[i];
+
+	return ret;
+}
+#undef NUM_MODES
+
+/*
+ * After obtaining a string representation of the client's debug
+ * keywords and their associated masks, this function is called to build an
+ * array of these values.
+ */
+int orangefs_prepare_cdm_array(char *debug_array_string)
+{
+	int i;
+	int rc = -EINVAL;
+	char *cds_head = NULL;
+	char *cds_delimiter = NULL;
+	int keyword_len = 0;
+
+	gossip_debug(GOSSIP_UTILS_DEBUG, "%s: start\n", __func__);
+
+	/*
+	 * figure out how many elements the cdm_array needs.
+	 */
+	for (i = 0; i < strlen(debug_array_string); i++)
+		if (debug_array_string[i] == '\n')
+			cdm_element_count++;
+
+	if (!cdm_element_count) {
+		pr_info("No elements in client debug array string!\n");
+		goto out;
+	}
+
+	cdm_array =
+		kzalloc(cdm_element_count * sizeof(struct client_debug_mask),
+			GFP_KERNEL);
+	if (!cdm_array) {
+		pr_info("malloc failed for cdm_array!\n");
+		rc = -ENOMEM;
+		goto out;
+	}
+
+	cds_head = debug_array_string;
+
+	for (i = 0; i < cdm_element_count; i++) {
+		cds_delimiter = strchr(cds_head, '\n');
+		*cds_delimiter = '\0';
+
+		keyword_len = strcspn(cds_head, " ");
+
+		cdm_array[i].keyword = kzalloc(keyword_len + 1, GFP_KERNEL);
+		if (!cdm_array[i].keyword) {
+			rc = -ENOMEM;
+			goto out;
+		}
+
+		sscanf(cds_head,
+		       "%s %llx %llx",
+		       cdm_array[i].keyword,
+		       (unsigned long long *)&(cdm_array[i].mask1),
+		       (unsigned long long *)&(cdm_array[i].mask2));
+
+		if (!strcmp(cdm_array[i].keyword, PVFS2_VERBOSE))
+			client_verbose_index = i;
+
+		if (!strcmp(cdm_array[i].keyword, PVFS2_ALL))
+			client_all_index = i;
+
+		cds_head = cds_delimiter + 1;
+	}
+
+	rc = cdm_element_count;
+
+	gossip_debug(GOSSIP_UTILS_DEBUG, "%s: rc:%d:\n", __func__, rc);
+
+out:
+
+	return rc;
+
+}
+
+/*
+ * /sys/kernel/debug/orangefs/debug-help can be catted to
+ * see all the available kernel and client debug keywords.
+ *
+ * When the kernel boots, we have no idea what keywords the
+ * client supports, nor their associated masks.
+ *
+ * We pass through this function once at boot and stamp a
+ * boilerplate "we don't know" message for the client in the
+ * debug-help file. We pass through here again when the client
+ * starts and then we can fill out the debug-help file fully.
+ *
+ * The client might be restarted any number of times between
+ * reboots, we only build the debug-help file the first time.
+ */
+int orangefs_prepare_debugfs_help_string(int at_boot)
+{
+	int rc = -EINVAL;
+	int i;
+	int byte_count = 0;
+	char *client_title = "Client Debug Keywords:\n";
+	char *kernel_title = "Kernel Debug Keywords:\n";
+
+	gossip_debug(GOSSIP_UTILS_DEBUG, "%s: start\n", __func__);
+
+	if (at_boot) {
+		byte_count += strlen(HELP_STRING_UNINITIALIZED);
+		client_title = HELP_STRING_UNINITIALIZED;
+	} else {
+		/*
+		 * fill the client keyword/mask array and remember
+		 * how many elements there were.
+		 */
+		cdm_element_count =
+			orangefs_prepare_cdm_array(client_debug_array_string);
+		if (cdm_element_count <= 0)
+			goto out;
+
+		/* Count the bytes destined for debug_help_string. */
+		byte_count += strlen(client_title);
+
+		for (i = 0; i < cdm_element_count; i++) {
+			byte_count += strlen(cdm_array[i].keyword) + 2;
+			if (byte_count >= DEBUG_HELP_STRING_SIZE) {
+				pr_info("%s: overflow 1!\n", __func__);
+				goto out;
+			}
+		}
+
+		gossip_debug(GOSSIP_UTILS_DEBUG,
+			     "%s: cdm_element_count:%d:\n",
+			     __func__,
+			     cdm_element_count);
+	}
+
+	byte_count += strlen(kernel_title);
+	for (i = 0; i < num_kmod_keyword_mask_map; i++) {
+		byte_count +=
+			strlen(s_kmod_keyword_mask_map[i].keyword) + 2;
+		if (byte_count >= DEBUG_HELP_STRING_SIZE) {
+			pr_info("%s: overflow 2!\n", __func__);
+			goto out;
+		}
+	}
+
+	/* build debug_help_string. */
+	debug_help_string = kzalloc(DEBUG_HELP_STRING_SIZE, GFP_KERNEL);
+	if (!debug_help_string) {
+		rc = -ENOMEM;
+		goto out;
+	}
+
+	strcat(debug_help_string, client_title);
+
+	if (!at_boot) {
+		for (i = 0; i < cdm_element_count; i++) {
+			strcat(debug_help_string, "\t");
+			strcat(debug_help_string, cdm_array[i].keyword);
+			strcat(debug_help_string, "\n");
+		}
+	}
+
+	strcat(debug_help_string, "\n");
+	strcat(debug_help_string, kernel_title);
+
+	for (i = 0; i < num_kmod_keyword_mask_map; i++) {
+		strcat(debug_help_string, "\t");
+		strcat(debug_help_string, s_kmod_keyword_mask_map[i].keyword);
+		strcat(debug_help_string, "\n");
+	}
+
+	rc = 0;
+
+out:
+
+	return rc;
+
+}
+
+/*
+ * kernel = type 0
+ * client = type 1
+ */
+void debug_mask_to_string(void *mask, int type)
+{
+	int i;
+	int len = 0;
+	char *debug_string;
+	int element_count = 0;
+
+	gossip_debug(GOSSIP_UTILS_DEBUG, "%s: start\n", __func__);
+
+	if (type) {
+		debug_string = client_debug_string;
+		element_count = cdm_element_count;
+	} else {
+		debug_string = kernel_debug_string;
+		element_count = num_kmod_keyword_mask_map;
+	}
+
+	memset(debug_string, 0, PVFS2_MAX_DEBUG_STRING_LEN);
+
+	/*
+	 * Some keywords, like "all" or "verbose", are amalgams of
+	 * numerous other keywords. Make a special check for those
+	 * before grinding through the whole mask only to find out
+	 * later...
+	 */
+	if (check_amalgam_keyword(mask, type))
+		goto out;
+
+	/* Build the debug string. */
+	for (i = 0; i < element_count; i++)
+		if (type)
+			do_c_string(mask, i);
+		else
+			do_k_string(mask, i);
+
+	len = strlen(debug_string);
+
+	if ((len) && (type))
+		client_debug_string[len - 1] = '\0';
+	else if (len)
+		kernel_debug_string[len - 1] = '\0';
+	else if (type)
+		strcpy(client_debug_string, "none");
+	else
+		strcpy(kernel_debug_string, "none");
+
+out:
+	gossip_debug(GOSSIP_UTILS_DEBUG,
+		     "%s: string:%s:\n",
+		     __func__,
+		     debug_string);
+
+	return;
+
+}
+
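+/*
+ * Append the indexed kernel keyword to kernel_debug_string when its
+ * bit is set in the mask; amalgam keywords are skipped, and on
+ * overflow the string falls back to PVFS2_ALL.
+ */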
+void do_k_string(void *k_mask, int index)
+{
+	__u64 *mask = (__u64 *) k_mask;
+
+	if (keyword_is_amalgam((char *) s_kmod_keyword_mask_map[index].keyword))
+			goto out;
+
+	if (*mask & s_kmod_keyword_mask_map[index].mask_val) {
+		if (strlen(kernel_debug_string) +
+		    strlen(s_kmod_keyword_mask_map[index].keyword) <
+		    PVFS2_MAX_DEBUG_STRING_LEN - 1) {
+			strcat(kernel_debug_string,
+			       s_kmod_keyword_mask_map[index].keyword);
+			strcat(kernel_debug_string, ",");
+		} else {
+			gossip_err("%s: overflow!\n", __func__);
+			strcpy(kernel_debug_string, PVFS2_ALL);
+			goto out;
+		}
+	}
+
+out:
+
+	return;
+}
+
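+/*
+ * Append the indexed client keyword to client_debug_string when its
+ * mask bits are set; amalgam keywords are skipped, and on overflow
+ * the string falls back to PVFS2_ALL.
+ */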
+void do_c_string(void *c_mask, int index)
+{
+	struct client_debug_mask *mask = (struct client_debug_mask *) c_mask;
+
+	if (keyword_is_amalgam(cdm_array[index].keyword))
+		goto out;
+
+	if ((mask->mask1 & cdm_array[index].mask1) ||
+	    (mask->mask2 & cdm_array[index].mask2)) {
+		if (strlen(client_debug_string) +
+		    strlen(cdm_array[index].keyword) + 1 <
+		    PVFS2_MAX_DEBUG_STRING_LEN - 2) {
+			strcat(client_debug_string,
+			       cdm_array[index].keyword);
+			strcat(client_debug_string, ",");
+		} else {
+			gossip_err("%s: overflow!\n", __func__);
+			strcpy(client_debug_string, PVFS2_ALL);
+			goto out;
+		}
+	}
+out:
+	return;
+}
+
+int keyword_is_amalgam(char *keyword)
+{
+	int rc = 0;
+
+	if ((!strcmp(keyword, PVFS2_ALL)) || (!strcmp(keyword, PVFS2_VERBOSE)))
+		rc = 1;
+
+	return rc;
+}
+
+/*
+ * kernel = type 0
+ * client = type 1
+ *
+ * return 1 if we found an amalgam.
+ */
+int check_amalgam_keyword(void *mask, int type)
+{
+	__u64 *k_mask;
+	struct client_debug_mask *c_mask;
+	int k_all_index = num_kmod_keyword_mask_map - 1;
+	int rc = 0;
+
+	if (type) {
+		c_mask = (struct client_debug_mask *) mask;
+
+		if ((c_mask->mask1 == cdm_array[client_all_index].mask1) &&
+		    (c_mask->mask2 == cdm_array[client_all_index].mask2)) {
+			strcpy(client_debug_string, PVFS2_ALL);
+			rc = 1;
+			goto out;
+		}
+
+		if ((c_mask->mask1 == cdm_array[client_verbose_index].mask1) &&
+		    (c_mask->mask2 == cdm_array[client_verbose_index].mask2)) {
+			strcpy(client_debug_string, PVFS2_VERBOSE);
+			rc = 1;
+			goto out;
+		}
+
+	} else {
+		k_mask = (__u64 *) mask;
+
+		if (*k_mask >= s_kmod_keyword_mask_map[k_all_index].mask_val) {
+			strcpy(kernel_debug_string, PVFS2_ALL);
+			rc = 1;
+			goto out;
+		}
+	}
+
+out:
+
+	return rc;
+}
+
+/*
+ * kernel = type 0
+ * client = type 1
+ */
+void debug_string_to_mask(char *debug_string, void *mask, int type)
+{
+	char *unchecked_keyword;
+	int i;
+	char *strsep_fodder = kstrdup(debug_string, GFP_KERNEL);
+	int element_count = 0;
+	struct client_debug_mask *c_mask;
+	__u64 *k_mask;
+
+	gossip_debug(GOSSIP_UTILS_DEBUG, "%s: start\n", __func__);
+
+	if (type) {
+		c_mask = (struct client_debug_mask *)mask;
+		element_count = cdm_element_count;
+	} else {
+		k_mask = (__u64 *)mask;
+		*k_mask = 0;
+		element_count = num_kmod_keyword_mask_map;
+	}
+
+	while ((unchecked_keyword = strsep(&strsep_fodder, ",")))
+		if (strlen(unchecked_keyword)) {
+			for (i = 0; i < element_count; i++)
+				if (type)
+					do_c_mask(i,
+						  unchecked_keyword,
+						  &c_mask);
+				else
+					do_k_mask(i,
+						  unchecked_keyword,
+						  &k_mask);
+		}
+
+	kfree(strsep_fodder);
+}
+
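+/* OR cdm_array[i]'s mask bits into the caller's mask on a keyword match. */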
+void do_c_mask(int i,
+	       char *unchecked_keyword,
+	       struct client_debug_mask **sane_mask)
+{
+
+	if (!strcmp(cdm_array[i].keyword, unchecked_keyword)) {
+		(**sane_mask).mask1 = (**sane_mask).mask1 | cdm_array[i].mask1;
+		(**sane_mask).mask2 = (**sane_mask).mask2 | cdm_array[i].mask2;
+	}
+}
+
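+/* OR the indexed kernel mask bits in on a keyword match. */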
+void do_k_mask(int i, char *unchecked_keyword, __u64 **sane_mask)
+{
+
+	if (!strcmp(s_kmod_keyword_mask_map[i].keyword, unchecked_keyword))
+		**sane_mask = (**sane_mask) |
+				s_kmod_keyword_mask_map[i].mask_val;
+}
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH V3 5/7] Orangefs: kernel client part 5
  2015-07-17 17:16 [PATCH V3 0/7] Orangefs: kernel client introduction Mike Marshall
                   ` (3 preceding siblings ...)
  2015-07-17 17:16 ` [PATCH V3 4/7] Orangefs: kernel client part 4 Mike Marshall
@ 2015-07-17 17:16 ` Mike Marshall
  2015-07-17 17:16 ` [PATCH V3 6/7] Orangefs: kernel client part 6 Mike Marshall
  2015-07-17 17:16 ` [PATCH V3 7/7] Orangefs: kernel client part 7 Mike Marshall
  6 siblings, 0 replies; 8+ messages in thread
From: Mike Marshall @ 2015-07-17 17:16 UTC (permalink / raw)
  To: viro; +Cc: Mike Marshall, linux-fsdevel

From: Mike Marshall <hubcap@omnibond.com>

Signed-off-by: Mike Marshall <hubcap@omnibond.com>
---
 fs/orangefs/super.c     | 558 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/orangefs/symlink.c   |  31 +++
 fs/orangefs/waitqueue.c | 522 ++++++++++++++++++++++++++++++++++++++++++++
 fs/orangefs/xattr.c     | 532 +++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 1643 insertions(+)
 create mode 100644 fs/orangefs/super.c
 create mode 100644 fs/orangefs/symlink.c
 create mode 100644 fs/orangefs/waitqueue.c
 create mode 100644 fs/orangefs/xattr.c

diff --git a/fs/orangefs/super.c b/fs/orangefs/super.c
new file mode 100644
index 0000000..a854390
--- /dev/null
+++ b/fs/orangefs/super.c
@@ -0,0 +1,558 @@
+/*
+ * (C) 2001 Clemson University and The University of Chicago
+ *
+ * See COPYING in top-level directory.
+ */
+
+#include "protocol.h"
+#include "pvfs2-kernel.h"
+#include "pvfs2-bufmap.h"
+
+#include <linux/parser.h>
+
+/* a cache for pvfs2-inode objects (i.e. pvfs2 inode private data) */
+static struct kmem_cache *pvfs2_inode_cache;
+
+/* list for storing pvfs2 specific superblocks in use */
+LIST_HEAD(pvfs2_superblocks);
+
+DEFINE_SPINLOCK(pvfs2_superblocks_lock);
+
+enum {
+	Opt_intr,
+	Opt_acl,
+	Opt_local_lock,
+
+	Opt_err
+};
+
+static const match_table_t tokens = {
+	{ Opt_acl,		"acl" },
+	{ Opt_intr,		"intr" },
+	{ Opt_local_lock,	"local_lock" },
+	{ Opt_err,	NULL }
+};
+
+
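+/*
+ * Walk the comma-separated option string and set MS_POSIXACL and the
+ * pvfs2-specific intr/local_lock flags; an unrecognized option fails
+ * the mount, with "silent" only suppressing the error message.
+ */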
+static int parse_mount_options(struct super_block *sb, char *options,
+		int silent)
+{
+	struct pvfs2_sb_info_s *pvfs2_sb = PVFS2_SB(sb);
+	substring_t args[MAX_OPT_ARGS];
+	char *p;
+
+	/*
+	 * Force any potential flags that might be set from the mount
+	 * to zero, ie, initialize to unset.
+	 */
+	sb->s_flags &= ~MS_POSIXACL;
+	pvfs2_sb->flags &= ~PVFS2_OPT_INTR;
+	pvfs2_sb->flags &= ~PVFS2_OPT_LOCAL_LOCK;
+
+	while ((p = strsep(&options, ",")) != NULL) {
+		int token;
+
+		if (!*p)
+			continue;
+
+		token = match_token(p, tokens, args);
+		switch (token) {
+		case Opt_acl:
+			sb->s_flags |= MS_POSIXACL;
+			break;
+		case Opt_intr:
+			pvfs2_sb->flags |= PVFS2_OPT_INTR;
+			break;
+		case Opt_local_lock:
+			pvfs2_sb->flags |= PVFS2_OPT_LOCAL_LOCK;
+			break;
+		default:
+			goto fail;
+		}
+	}
+
+	return 0;
+fail:
+	if (!silent)
+		gossip_err("Error: mount option [%s] is not supported.\n", p);
+	return -EINVAL;
+}
+
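+/*
+ * Slab constructor: initialize the embedded VFS inode and the xattr
+ * rwsem once per cached object.
+ */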
+static void pvfs2_inode_cache_ctor(void *req)
+{
+	struct pvfs2_inode_s *pvfs2_inode = req;
+
+	inode_init_once(&pvfs2_inode->vfs_inode);
+	init_rwsem(&pvfs2_inode->xattr_sem);
+
+	pvfs2_inode->vfs_inode.i_version = 1;
+}
+
+static struct inode *pvfs2_alloc_inode(struct super_block *sb)
+{
+	struct pvfs2_inode_s *pvfs2_inode;
+
+	pvfs2_inode = kmem_cache_alloc(pvfs2_inode_cache,
+				       PVFS2_CACHE_ALLOC_FLAGS);
+	if (pvfs2_inode == NULL) {
+		gossip_err("Failed to allocate pvfs2_inode\n");
+		return NULL;
+	}
+
+	/*
+	 * We want to clear everything except for rw_semaphore and the
+	 * vfs_inode.
+	 */
+	memset(&pvfs2_inode->refn.khandle, 0, 16);
+	pvfs2_inode->refn.fs_id = PVFS_FS_ID_NULL;
+	pvfs2_inode->last_failed_block_index_read = 0;
+	memset(pvfs2_inode->link_target, 0, sizeof(pvfs2_inode->link_target));
+	pvfs2_inode->pinode_flags = 0;
+
+	gossip_debug(GOSSIP_SUPER_DEBUG,
+		     "pvfs2_alloc_inode: allocated %p\n",
+		     &pvfs2_inode->vfs_inode);
+	return &pvfs2_inode->vfs_inode;
+}
+
+static void pvfs2_destroy_inode(struct inode *inode)
+{
+	struct pvfs2_inode_s *pvfs2_inode = PVFS2_I(inode);
+
+	gossip_debug(GOSSIP_SUPER_DEBUG,
+			"%s: deallocated %p destroying inode %pU\n",
+			__func__, pvfs2_inode, get_khandle_from_ino(inode));
+
+	kmem_cache_free(pvfs2_inode_cache, pvfs2_inode);
+}
+
+/*
+ * NOTE: information filled in here is typically reflected in the
+ * output of the system command 'df'
+*/
+static int pvfs2_statfs(struct dentry *dentry, struct kstatfs *buf)
+{
+	int ret = -ENOMEM;
+	struct pvfs2_kernel_op_s *new_op = NULL;
+	int flags = 0;
+	struct super_block *sb = NULL;
+
+	sb = dentry->d_sb;
+
+	gossip_debug(GOSSIP_SUPER_DEBUG,
+		     "pvfs2_statfs: called on sb %p (fs_id is %d)\n",
+		     sb,
+		     (int)(PVFS2_SB(sb)->fs_id));
+
+	new_op = op_alloc(PVFS2_VFS_OP_STATFS);
+	if (!new_op)
+		return ret;
+	new_op->upcall.req.statfs.fs_id = PVFS2_SB(sb)->fs_id;
+
+	if (PVFS2_SB(sb)->flags & PVFS2_OPT_INTR)
+		flags = PVFS2_OP_INTERRUPTIBLE;
+
+	ret = service_operation(new_op, "pvfs2_statfs", flags);
+
+	if (new_op->downcall.status < 0)
+		goto out_op_release;
+
+	gossip_debug(GOSSIP_SUPER_DEBUG,
+		     "pvfs2_statfs: got %ld blocks available | "
+		     "%ld blocks total | %ld block size\n",
+		     (long)new_op->downcall.resp.statfs.blocks_avail,
+		     (long)new_op->downcall.resp.statfs.blocks_total,
+		     (long)new_op->downcall.resp.statfs.block_size);
+
+	buf->f_type = sb->s_magic;
+	memcpy(&buf->f_fsid, &PVFS2_SB(sb)->fs_id, sizeof(buf->f_fsid));
+	buf->f_bsize = new_op->downcall.resp.statfs.block_size;
+	buf->f_namelen = PVFS2_NAME_LEN;
+
+	buf->f_blocks = (sector_t) new_op->downcall.resp.statfs.blocks_total;
+	buf->f_bfree = (sector_t) new_op->downcall.resp.statfs.blocks_avail;
+	buf->f_bavail = (sector_t) new_op->downcall.resp.statfs.blocks_avail;
+	buf->f_files = (sector_t) new_op->downcall.resp.statfs.files_total;
+	buf->f_ffree = (sector_t) new_op->downcall.resp.statfs.files_avail;
+	buf->f_frsize = sb->s_blocksize;
+
+out_op_release:
+	op_release(new_op);
+	gossip_debug(GOSSIP_SUPER_DEBUG, "pvfs2_statfs: returning %d\n", ret);
+	return ret;
+}
+
+/*
+ * Remount as initiated by VFS layer.  We just need to reparse the mount
+ * options, no need to signal pvfs2-client-core about it.
+ */
+static int pvfs2_remount_fs(struct super_block *sb, int *flags, char *data)
+{
+	gossip_debug(GOSSIP_SUPER_DEBUG, "pvfs2_remount_fs: called\n");
+	return parse_mount_options(sb, data, 1);
+}
+
+/*
+ * Remount as initiated by pvfs2-client-core on restart.  This is used to
+ * repopulate mount information left from previous pvfs2-client-core.
+ *
+ * the idea here is that given a valid superblock, we're
+ * re-initializing the user space client with the initial mount
+ * information specified when the super block was first initialized.
+ * this is very different than the first initialization/creation of a
+ * superblock.  we use the special service_priority_operation to make
+ * sure that the mount gets ahead of any other pending operation that
+ * is waiting for servicing.  this means that the pvfs2-client won't
+ * fail to start several times for all other pending operations before
+ * the client regains all of the mount information from us.
+ * NOTE: this function assumes that the request_mutex is already acquired!
+ */
+int pvfs2_remount(struct super_block *sb)
+{
+	struct pvfs2_kernel_op_s *new_op;
+	int ret = -EINVAL;
+
+	gossip_debug(GOSSIP_SUPER_DEBUG, "pvfs2_remount: called\n");
+
+	new_op = op_alloc(PVFS2_VFS_OP_FS_MOUNT);
+	if (!new_op)
+		return -ENOMEM;
+	strncpy(new_op->upcall.req.fs_mount.pvfs2_config_server,
+		PVFS2_SB(sb)->devname,
+		PVFS_MAX_SERVER_ADDR_LEN);
+
+	gossip_debug(GOSSIP_SUPER_DEBUG,
+		     "Attempting PVFS2 Remount via host %s\n",
+		     new_op->upcall.req.fs_mount.pvfs2_config_server);
+
+	/*
+	 * we assume that the calling function has already acquired the
+	 * request_mutex to prevent other operations from bypassing
+	 * this one
+	 */
+	ret = service_operation(new_op, "pvfs2_remount",
+		PVFS2_OP_PRIORITY | PVFS2_OP_NO_SEMAPHORE);
+	gossip_debug(GOSSIP_SUPER_DEBUG,
+		     "pvfs2_remount: mount got return value of %d\n",
+		     ret);
+	if (ret == 0) {
+		/*
+		 * store the id assigned to this sb -- it's just a
+		 * short-lived mapping that the system interface uses
+		 * to map this superblock to a particular mount entry
+		 */
+		PVFS2_SB(sb)->id = new_op->downcall.resp.fs_mount.id;
+		PVFS2_SB(sb)->mount_pending = 0;
+	}
+
+	op_release(new_op);
+	return ret;
+}
+
+int fsid_key_table_initialize(void)
+{
+	return 0;
+}
+
+void fsid_key_table_finalize(void)
+{
+}
+
+/* Called whenever the VFS dirties the inode in response to atime updates */
+static void pvfs2_dirty_inode(struct inode *inode, int flags)
+{
+	struct pvfs2_inode_s *pvfs2_inode = PVFS2_I(inode);
+
+	gossip_debug(GOSSIP_SUPER_DEBUG,
+		     "pvfs2_dirty_inode: %pU\n",
+		     get_khandle_from_ino(inode));
+	SetAtimeFlag(pvfs2_inode);
+}
+
+struct super_operations pvfs2_s_ops = {
+	.alloc_inode = pvfs2_alloc_inode,
+	.destroy_inode = pvfs2_destroy_inode,
+	.dirty_inode = pvfs2_dirty_inode,
+	.drop_inode = generic_delete_inode,
+	.statfs = pvfs2_statfs,
+	.remount_fs = pvfs2_remount_fs,
+	.show_options = generic_show_options,
+};
+
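+/*
+ * Decode a file handle produced by pvfs2_encode_fh(): 16 bytes of
+ * khandle followed by the fs_id in word 4.
+ */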
+struct dentry *pvfs2_fh_to_dentry(struct super_block *sb,
+				  struct fid *fid,
+				  int fh_len,
+				  int fh_type)
+{
+	struct pvfs2_object_kref refn;
+
+	if (fh_len < 5 || fh_type > 2)
+		return NULL;
+
+	PVFS_khandle_from(&(refn.khandle), fid->raw, 16);
+	refn.fs_id = (u32) fid->raw[4];
+	gossip_debug(GOSSIP_SUPER_DEBUG,
+		     "fh_to_dentry: handle %pU, fs_id %d\n",
+		     &refn.khandle,
+		     refn.fs_id);
+
+	return d_obtain_alias(pvfs2_iget(sb, &refn));
+}
+
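+/*
+ * Encode the inode's 16-byte khandle and fs_id into the handle (5
+ * 32-bit words); when a parent inode is supplied, its khandle and
+ * fs_id follow (10 words, type 2).
+ */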
+int pvfs2_encode_fh(struct inode *inode,
+		    __u32 *fh,
+		    int *max_len,
+		    struct inode *parent)
+{
+	int len = parent ? 10 : 5;
+	int type = 1;
+	struct pvfs2_object_kref refn;
+
+	if (*max_len < len) {
+		gossip_lerr("fh buffer is too small for encoding\n");
+		*max_len = len;
+		type = 255;
+		goto out;
+	}
+
+	refn = PVFS2_I(inode)->refn;
+	PVFS_khandle_to(&refn.khandle, fh, 16);
+	fh[4] = refn.fs_id;
+
+	gossip_debug(GOSSIP_SUPER_DEBUG,
+		     "Encoding fh: handle %pU, fsid %u\n",
+		     &refn.khandle,
+		     refn.fs_id);
+
+
+	if (parent) {
+		refn = PVFS2_I(parent)->refn;
+		PVFS_khandle_to(&refn.khandle, (char *) fh + 20, 16);
+		fh[9] = refn.fs_id;
+
+		type = 2;
+		gossip_debug(GOSSIP_SUPER_DEBUG,
+			     "Encoding parent: handle %pU, fsid %u\n",
+			     &refn.khandle,
+			     refn.fs_id);
+	}
+	*max_len = len;
+
+out:
+	return type;
+}
+
+static struct export_operations pvfs2_export_ops = {
+	.encode_fh = pvfs2_encode_fh,
+	.fh_to_dentry = pvfs2_fh_to_dentry,
+};
+
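+/*
+ * Finish superblock setup from the mount info collected in
+ * pvfs2_mount(): allocate the private sb info, parse options, install
+ * the xattr handlers and super/dentry ops, and obtain the root inode
+ * and dentry.
+ */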
+int pvfs2_fill_sb(struct super_block *sb, void *data, int silent)
+{
+	int ret = -EINVAL;
+	struct inode *root = NULL;
+	struct dentry *root_dentry = NULL;
+	struct pvfs2_mount_sb_info_s *mount_sb_info =
+		(struct pvfs2_mount_sb_info_s *) data;
+	struct pvfs2_object_kref root_object;
+
+	/* alloc and init our private pvfs2 sb info */
+	sb->s_fs_info =
+		kmalloc(sizeof(struct pvfs2_sb_info_s), PVFS2_GFP_FLAGS);
+	if (!PVFS2_SB(sb))
+		return -ENOMEM;
+	memset(sb->s_fs_info, 0, sizeof(struct pvfs2_sb_info_s));
+	PVFS2_SB(sb)->sb = sb;
+
+	PVFS2_SB(sb)->root_khandle = mount_sb_info->root_khandle;
+	PVFS2_SB(sb)->fs_id = mount_sb_info->fs_id;
+	PVFS2_SB(sb)->id = mount_sb_info->id;
+
+	if (mount_sb_info->data) {
+		ret = parse_mount_options(sb, mount_sb_info->data,
+					  silent);
+		if (ret)
+			return ret;
+	}
+
+	/* Hang the xattr handlers off the superblock */
+	sb->s_xattr = pvfs2_xattr_handlers;
+	sb->s_magic = PVFS2_SUPER_MAGIC;
+	sb->s_op = &pvfs2_s_ops;
+	sb->s_d_op = &pvfs2_dentry_operations;
+
+	sb->s_blocksize = pvfs_bufmap_size_query();
+	sb->s_blocksize_bits = pvfs_bufmap_shift_query();
+	sb->s_maxbytes = MAX_LFS_FILESIZE;
+
+	root_object.khandle = PVFS2_SB(sb)->root_khandle;
+	root_object.fs_id = PVFS2_SB(sb)->fs_id;
+	gossip_debug(GOSSIP_SUPER_DEBUG,
+		     "get inode %pU, fsid %d\n",
+		     &root_object.khandle,
+		     root_object.fs_id);
+
+	root = pvfs2_iget(sb, &root_object);
+	if (IS_ERR(root))
+		return PTR_ERR(root);
+
+	gossip_debug(GOSSIP_SUPER_DEBUG,
+		     "Allocated root inode [%p] with mode %x\n",
+		     root,
+		     root->i_mode);
+
+	/* allocates and places root dentry in dcache */
+	root_dentry = d_make_root(root);
+	if (!root_dentry) {
+		iput(root);
+		return -ENOMEM;
+	}
+
+	sb->s_export_op = &pvfs2_export_ops;
+	sb->s_root = root_dentry;
+	return 0;
+}
+
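+/*
+ * Ask the userspace client to mount devname, then build the
+ * superblock via mount_nodev()/pvfs2_fill_sb() using the root handle
+ * and fs_id returned in the downcall.
+ */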
+struct dentry *pvfs2_mount(struct file_system_type *fst,
+			   int flags,
+			   const char *devname,
+			   void *data)
+{
+	int ret = -EINVAL;
+	struct super_block *sb = ERR_PTR(-EINVAL);
+	struct pvfs2_kernel_op_s *new_op;
+	struct pvfs2_mount_sb_info_s mount_sb_info;
+	struct dentry *mnt_sb_d = ERR_PTR(-EINVAL);
+
+	gossip_debug(GOSSIP_SUPER_DEBUG,
+		     "pvfs2_mount: called with devname %s\n",
+		     devname);
+
+	if (!devname) {
+		gossip_err("ERROR: device name not specified.\n");
+		return ERR_PTR(-EINVAL);
+	}
+
+	new_op = op_alloc(PVFS2_VFS_OP_FS_MOUNT);
+	if (!new_op)
+		return ERR_PTR(-ENOMEM);
+
+	strncpy(new_op->upcall.req.fs_mount.pvfs2_config_server,
+		devname,
+		PVFS_MAX_SERVER_ADDR_LEN);
+
+	gossip_debug(GOSSIP_SUPER_DEBUG,
+		     "Attempting PVFS2 Mount via host %s\n",
+		     new_op->upcall.req.fs_mount.pvfs2_config_server);
+
+	ret = service_operation(new_op, "pvfs2_mount", 0);
+	gossip_debug(GOSSIP_SUPER_DEBUG,
+		     "pvfs2_mount: mount got return value of %d\n", ret);
+	if (ret)
+		goto free_op;
+
+	if (new_op->downcall.resp.fs_mount.fs_id == PVFS_FS_ID_NULL) {
+		gossip_err("ERROR: Retrieved null fs_id\n");
+		ret = -EINVAL;
+		goto free_op;
+	}
+
+	/* fill in temporary structure passed to fill_sb method */
+	mount_sb_info.data = data;
+	mount_sb_info.root_khandle =
+		new_op->downcall.resp.fs_mount.root_khandle;
+	mount_sb_info.fs_id = new_op->downcall.resp.fs_mount.fs_id;
+	mount_sb_info.id = new_op->downcall.resp.fs_mount.id;
+
+	/*
+	 * the mount_sb_info structure looks odd, but it's used because
+	 * the private sb info isn't allocated until we call
+	 * pvfs2_fill_sb, yet we have the info we need to fill it with
+	 * here.  so we store it temporarily and pass all of the info
+	 * to fill_sb where it's properly copied out
+	 */
+	mnt_sb_d = mount_nodev(fst,
+			       flags,
+			       (void *)&mount_sb_info,
+			       pvfs2_fill_sb);
+	if (IS_ERR(mnt_sb_d)) {
+		sb = ERR_CAST(mnt_sb_d);
+		goto free_op;
+	}
+
+	sb = mnt_sb_d->d_sb;
+
+	/*
+	 * on successful mount, store the devname and data
+	 * used
+	 */
+	strncpy(PVFS2_SB(sb)->devname,
+		devname,
+		PVFS_MAX_SERVER_ADDR_LEN);
+
+	/* mount_pending must be cleared */
+	PVFS2_SB(sb)->mount_pending = 0;
+
+	/*
+	 * finally, add this sb to our list of known pvfs2
+	 * sb's
+	 */
+	add_pvfs2_sb(sb);
+	op_release(new_op);
+	return mnt_sb_d;
+
+free_op:
+	gossip_err("pvfs2_mount: mount request failed with %d\n", ret);
+	if (ret == -EINVAL) {
+		gossip_err("Ensure that all pvfs2-servers have the same FS configuration files\n");
+		gossip_err("Look at pvfs2-client-core log file (typically /tmp/pvfs2-client.log) for more details\n");
+	}
+
+	op_release(new_op);
+
+	gossip_debug(GOSSIP_SUPER_DEBUG,
+		     "pvfs2_mount: returning dentry %p\n",
+		     mnt_sb_d);
+	return mnt_sb_d;
+}
+
+void pvfs2_kill_sb(struct super_block *sb)
+{
+	gossip_debug(GOSSIP_SUPER_DEBUG, "pvfs2_kill_sb: called\n");
+
+	/*
+	 * issue the unmount to userspace to tell it to remove the
+	 * dynamic mount info it has for this superblock
+	 */
+	pvfs2_unmount_sb(sb);
+
+	/* remove the sb from our list of pvfs2 specific sb's */
+	remove_pvfs2_sb(sb);
+
+	/* provided sb cleanup */
+	kill_anon_super(sb);
+
+	/* free the pvfs2 superblock private data */
+	kfree(PVFS2_SB(sb));
+}
+
+int pvfs2_inode_cache_initialize(void)
+{
+	pvfs2_inode_cache = kmem_cache_create("pvfs2_inode_cache",
+					      sizeof(struct pvfs2_inode_s),
+					      0,
+					      PVFS2_CACHE_CREATE_FLAGS,
+					      pvfs2_inode_cache_ctor);
+
+	if (!pvfs2_inode_cache) {
+		gossip_err("Cannot create pvfs2_inode_cache\n");
+		return -ENOMEM;
+	}
+	return 0;
+}
+
+int pvfs2_inode_cache_finalize(void)
+{
+	kmem_cache_destroy(pvfs2_inode_cache);
+	return 0;
+}
diff --git a/fs/orangefs/symlink.c b/fs/orangefs/symlink.c
new file mode 100644
index 0000000..2adfcef
--- /dev/null
+++ b/fs/orangefs/symlink.c
@@ -0,0 +1,31 @@
+/*
+ * (C) 2001 Clemson University and The University of Chicago
+ *
+ * See COPYING in top-level directory.
+ */
+
+#include "protocol.h"
+#include "pvfs2-kernel.h"
+#include "pvfs2-bufmap.h"
+
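+/*
+ * The link target was copied into the pvfs2 inode private data when
+ * the attributes were fetched; hand it back to the VFS with no
+ * further upcall.
+ */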
+static const char *pvfs2_follow_link(struct dentry *dentry, void **cookie)
+{
+	char *target =  PVFS2_I(dentry->d_inode)->link_target;
+
+	gossip_debug(GOSSIP_INODE_DEBUG,
+		     "%s: called on %s (target is %p)\n",
+		     __func__, (char *)dentry->d_name.name, target);
+
+	*cookie = target;
+
+	return target;
+}
+
+struct inode_operations pvfs2_symlink_inode_operations = {
+	.readlink = generic_readlink,
+	.follow_link = pvfs2_follow_link,
+	.setattr = pvfs2_setattr,
+	.getattr = pvfs2_getattr,
+	.listxattr = pvfs2_listxattr,
+	.setxattr = generic_setxattr,
+};
diff --git a/fs/orangefs/waitqueue.c b/fs/orangefs/waitqueue.c
new file mode 100644
index 0000000..9b32286
--- /dev/null
+++ b/fs/orangefs/waitqueue.c
@@ -0,0 +1,522 @@
+/*
+ * (C) 2001 Clemson University and The University of Chicago
+ * (C) 2011 Omnibond Systems
+ *
+ * Changes by Acxiom Corporation to implement generic service_operation()
+ * function, Copyright Acxiom Corporation, 2005.
+ *
+ * See COPYING in top-level directory.
+ */
+
+/*
+ *  In-kernel waitqueue operations.
+ */
+
+#include "protocol.h"
+#include "pvfs2-kernel.h"
+#include "pvfs2-bufmap.h"
+
+/*
+ * What we do in this function is to walk the list of operations that are
+ * present in the request queue and mark them as purged.
+ * NOTE: This is called from the device close after client-core has
+ * guaranteed that no new operations could appear on the list since the
+ * client-core is anyway going to exit.
+ */
+void purge_waiting_ops(void)
+{
+	struct pvfs2_kernel_op_s *op;
+
+	spin_lock(&pvfs2_request_list_lock);
+	list_for_each_entry(op, &pvfs2_request_list, list) {
+		gossip_debug(GOSSIP_WAIT_DEBUG,
+			     "pvfs2-client-core: purging op tag %llu %s\n",
+			     llu(op->tag),
+			     get_opname_string(op));
+		spin_lock(&op->lock);
+		set_op_state_purged(op);
+		spin_unlock(&op->lock);
+		wake_up_interruptible(&op->waitq);
+	}
+	spin_unlock(&pvfs2_request_list_lock);
+}
+
+/*
+ * submits a PVFS2 operation and waits for it to complete
+ *
+ * Note op->downcall.status will contain the status of the operation (in
+ * errno format), whether provided by pvfs2-client or a result of failure to
+ * service the operation.  If the caller wishes to distinguish, then
+ * op->state can be checked to see if it was serviced or not.
+ *
+ * Returns contents of op->downcall.status for convenience
+ */
+int service_operation(struct pvfs2_kernel_op_s *op,
+		      const char *op_name,
+		      int flags)
+{
+	/* flags to modify behavior */
+	sigset_t orig_sigset;
+	int ret = 0;
+
+	/* irqflags and wait_entry are only used IF the client-core aborts */
+	unsigned long irqflags;
+
+	DECLARE_WAITQUEUE(wait_entry, current);
+
+	op->upcall.tgid = current->tgid;
+	op->upcall.pid = current->pid;
+
+retry_servicing:
+	op->downcall.status = 0;
+	gossip_debug(GOSSIP_WAIT_DEBUG,
+		     "pvfs2: service_operation: %s %p\n",
+		     op_name,
+		     op);
+	gossip_debug(GOSSIP_WAIT_DEBUG,
+		     "pvfs2: operation posted by process: %s, pid: %i\n",
+		     current->comm,
+		     current->pid);
+
+	/* mask out signals if this operation is not to be interrupted */
+	if (!(flags & PVFS2_OP_INTERRUPTIBLE))
+		mask_blocked_signals(&orig_sigset);
+
+	if (!(flags & PVFS2_OP_NO_SEMAPHORE)) {
+		ret = mutex_lock_interruptible(&request_mutex);
+		/*
+		 * check to see if we were interrupted while waiting for
+		 * semaphore
+		 */
+		if (ret < 0) {
+			if (!(flags & PVFS2_OP_INTERRUPTIBLE))
+				unmask_blocked_signals(&orig_sigset);
+			op->downcall.status = ret;
+			gossip_debug(GOSSIP_WAIT_DEBUG,
+				     "pvfs2: service_operation interrupted.\n");
+			return ret;
+		}
+	}
+
+	gossip_debug(GOSSIP_WAIT_DEBUG,
+		     "%s:About to call is_daemon_in_service().\n",
+		     __func__);
+
+	if (is_daemon_in_service() < 0) {
+		/*
+		 * By incrementing the per-operation attempt counter, we
+		 * directly go into the timeout logic while waiting for
+		 * the matching downcall to be read
+		 */
+		gossip_debug(GOSSIP_WAIT_DEBUG,
+			     "%s:client core is NOT in service(%d).\n",
+			     __func__,
+			     is_daemon_in_service());
+		op->attempts++;
+	}
+
+	/* queue up the operation */
+	if (flags & PVFS2_OP_PRIORITY) {
+		add_priority_op_to_request_list(op);
+	} else {
+		gossip_debug(GOSSIP_WAIT_DEBUG,
+			     "%s:About to call add_op_to_request_list().\n",
+			     __func__);
+		add_op_to_request_list(op);
+	}
+
+	if (!(flags & PVFS2_OP_NO_SEMAPHORE))
+		mutex_unlock(&request_mutex);
+
+	/*
+	 * If we are asked to service an asynchronous operation from
+	 * VFS perspective, we are done.
+	 */
+	if (flags & PVFS2_OP_ASYNC)
+		return 0;
+
+	if (flags & PVFS2_OP_CANCELLATION) {
+		gossip_debug(GOSSIP_WAIT_DEBUG,
+			     "%s:"
+			     "About to call wait_for_cancellation_downcall.\n",
+			     __func__);
+		ret = wait_for_cancellation_downcall(op);
+	} else {
+		ret = wait_for_matching_downcall(op);
+	}
+
+	if (ret < 0) {
+		/* failed to get matching downcall */
+		if (ret == -ETIMEDOUT) {
+			gossip_err("pvfs2: %s -- wait timed out; aborting attempt.\n",
+				   op_name);
+		}
+		op->downcall.status = ret;
+	} else {
+		/* got matching downcall; make sure status is in errno format */
+		op->downcall.status =
+		    pvfs2_normalize_to_errno(op->downcall.status);
+		ret = op->downcall.status;
+	}
+
+	if (!(flags & PVFS2_OP_INTERRUPTIBLE))
+		unmask_blocked_signals(&orig_sigset);
+
+	BUG_ON(ret != op->downcall.status);
+	/* retry if operation has not been serviced and if requested */
+	if (!op_state_serviced(op) && op->downcall.status == -EAGAIN) {
+		gossip_debug(GOSSIP_WAIT_DEBUG,
+			     "pvfs2: tag %llu (%s)"
+			     " -- operation to be retried (%d attempt)\n",
+			     llu(op->tag),
+			     op_name,
+			     op->attempts + 1);
+
+		if (!op->uses_shared_memory)
+			/*
+			 * this operation doesn't use the shared memory
+			 * system
+			 */
+			goto retry_servicing;
+
+		/* op uses shared memory */
+		if (get_bufmap_init() == 0) {
+			/*
+			 * This operation uses the shared memory system AND
+			 * the system is not yet ready. This situation occurs
+			 * when the client-core is restarted AND there were
+			 * operations waiting to be processed or already
+			 * in progress.
+			 */
+			gossip_debug(GOSSIP_WAIT_DEBUG,
+				     "uses_shared_memory is true.\n");
+			gossip_debug(GOSSIP_WAIT_DEBUG,
+				     "Client core in-service status(%d).\n",
+				     is_daemon_in_service());
+			gossip_debug(GOSSIP_WAIT_DEBUG, "bufmap_init:%d.\n",
+				     get_bufmap_init());
+			gossip_debug(GOSSIP_WAIT_DEBUG,
+				     "operation's status is 0x%0x.\n",
+				     op->op_state);
+
+			/*
+			 * let process sleep for a few seconds so shared
+			 * memory system can be initialized.
+			 */
+			spin_lock_irqsave(&op->lock, irqflags);
+			add_wait_queue(&pvfs2_bufmap_init_waitq, &wait_entry);
+			spin_unlock_irqrestore(&op->lock, irqflags);
+
+			set_current_state(TASK_INTERRUPTIBLE);
+
+			/*
+			 * Wait for pvfs_bufmap_initialize() to wake me up
+			 * within the allotted time.
+			 */
+			ret = schedule_timeout(MSECS_TO_JIFFIES
+				(1000 * PVFS2_BUFMAP_WAIT_TIMEOUT_SECS));
+
+			gossip_debug(GOSSIP_WAIT_DEBUG,
+				     "Value returned from schedule_timeout:"
+				     "%d.\n",
+				     ret);
+			gossip_debug(GOSSIP_WAIT_DEBUG,
+				     "Is shared memory available? (%d).\n",
+				     get_bufmap_init());
+
+			spin_lock_irqsave(&op->lock, irqflags);
+			remove_wait_queue(&pvfs2_bufmap_init_waitq,
+					  &wait_entry);
+			spin_unlock_irqrestore(&op->lock, irqflags);
+
+			if (get_bufmap_init() == 0) {
+				gossip_err("%s:The shared memory system has not started in %d seconds after the client core restarted.  Aborting user's request(%s).\n",
+					   __func__,
+					   PVFS2_BUFMAP_WAIT_TIMEOUT_SECS,
+					   get_opname_string(op));
+				return -EIO;
+			}
+
+			/*
+			 * Return to the calling function and re-populate a
+			 * shared memory buffer.
+			 */
+			return -EAGAIN;
+		}
+	}
+
+	gossip_debug(GOSSIP_WAIT_DEBUG,
+		     "pvfs2: service_operation %s returning: %d for %p.\n",
+		     op_name,
+		     ret,
+		     op);
+	return ret;
+}
+
+void pvfs2_clean_up_interrupted_operation(struct pvfs2_kernel_op_s *op)
+{
+	/*
+	 * handle interrupted cases depending on what state we were in when
+	 * the interruption is detected.  there is a coarse grained lock
+	 * across the operation.
+	 *
+	 * NOTE: be sure not to reverse lock ordering by locking an op lock
+	 * while holding the request_list lock.  Here, we first lock the op
+	 * and then lock the appropriate list.
+	 */
+	if (!op) {
+		gossip_debug(GOSSIP_WAIT_DEBUG,
+			    "%s: op is null, ignoring\n",
+			     __func__);
+		return;
+	}
+
+	/*
+	 * one more sanity check, make sure it's in one of the possible states
+	 * or don't try to cancel it
+	 */
+	if (!(op_state_waiting(op) ||
+	      op_state_in_progress(op) ||
+	      op_state_serviced(op) ||
+	      op_state_purged(op))) {
+		gossip_debug(GOSSIP_WAIT_DEBUG,
+			     "%s: op %p not in a valid state (%0x), "
+			     "ignoring\n",
+			     __func__,
+			     op,
+			     op->op_state);
+		return;
+	}
+
+	spin_lock(&op->lock);
+
+	if (op_state_waiting(op)) {
+		/*
+		 * upcall hasn't been read; remove op from upcall request
+		 * list.
+		 */
+		spin_unlock(&op->lock);
+		remove_op_from_request_list(op);
+		gossip_debug(GOSSIP_WAIT_DEBUG,
+			     "Interrupted: Removed op %p from request_list\n",
+			     op);
+	} else if (op_state_in_progress(op)) {
+		/* op must be removed from the in progress htable */
+		spin_unlock(&op->lock);
+		spin_lock(&htable_ops_in_progress_lock);
+		list_del(&op->list);
+		spin_unlock(&htable_ops_in_progress_lock);
+		gossip_debug(GOSSIP_WAIT_DEBUG,
+			     "Interrupted: Removed op %p"
+			     " from htable_ops_in_progress\n",
+			     op);
+	} else if (!op_state_serviced(op)) {
+		spin_unlock(&op->lock);
+		gossip_err("interrupted operation is in a weird state 0x%x\n",
+			   op->op_state);
+	}
+}
+
+/*
+ * sleeps on waitqueue waiting for matching downcall.
+ * if client-core finishes servicing, then we are good to go.
+ * else if client-core exits, we get woken up here, and retry with a timeout
+ *
+ * Postcondition: when this call returns to the caller, the specified
+ * op will no longer be on any list or htable.
+ *
+ * Returns 0 on success and -errno on failure.
+ * Errors are:
+ * EAGAIN in case we want the caller to requeue and try again.
+ * EINTR/EIO/ETIMEDOUT indicating we are done trying to service this
+ * operation since client-core seems to be exiting too often
+ * or if we were interrupted.
+ */
+int wait_for_matching_downcall(struct pvfs2_kernel_op_s *op)
+{
+	int ret = -EINVAL;
+	DECLARE_WAITQUEUE(wait_entry, current);
+
+	spin_lock(&op->lock);
+	add_wait_queue(&op->waitq, &wait_entry);
+	spin_unlock(&op->lock);
+
+	while (1) {
+		set_current_state(TASK_INTERRUPTIBLE);
+
+		spin_lock(&op->lock);
+		if (op_state_serviced(op)) {
+			spin_unlock(&op->lock);
+			ret = 0;
+			break;
+		}
+		spin_unlock(&op->lock);
+
+		if (!signal_pending(current)) {
+			/*
+			 * if this was our first attempt and client-core
+			 * has not purged our operation, we are happy to
+			 * simply wait
+			 */
+			spin_lock(&op->lock);
+			if (op->attempts == 0 && !op_state_purged(op)) {
+				spin_unlock(&op->lock);
+				schedule();
+			} else {
+				spin_unlock(&op->lock);
+				/*
+				 * subsequent attempts, we retry exactly once
+				 * with timeouts
+				 */
+				if (!schedule_timeout(MSECS_TO_JIFFIES
+				      (1000 * op_timeout_secs))) {
+					gossip_debug(GOSSIP_WAIT_DEBUG,
+						     "*** %s:"
+						     " operation timed out (tag"
+						     " %llu, %p, att %d)\n",
+						     __func__,
+						     llu(op->tag),
+						     op,
+						     op->attempts);
+					ret = -ETIMEDOUT;
+					pvfs2_clean_up_interrupted_operation
+					    (op);
+					break;
+				}
+			}
+			spin_lock(&op->lock);
+			op->attempts++;
+			/*
+			 * if the operation was purged in the meantime, it
+			 * is better to requeue it afresh but ensure that
+			 * we have not been purged repeatedly. This could
+			 * happen if client-core crashes when an op
+			 * is being serviced, so we requeue the op, client
+			 * core crashes again so we requeue the op, client
+			 * core starts, and so on...
+			 */
+			if (op_state_purged(op)) {
+				ret = (op->attempts < PVFS2_PURGE_RETRY_COUNT) ?
+					 -EAGAIN :
+					 -EIO;
+				spin_unlock(&op->lock);
+				gossip_debug(GOSSIP_WAIT_DEBUG,
+					     "*** %s:"
+					     " operation purged (tag "
+					     "%llu, %p, att %d)\n",
+					     __func__,
+					     llu(op->tag),
+					     op,
+					     op->attempts);
+				pvfs2_clean_up_interrupted_operation(op);
+				break;
+			}
+			spin_unlock(&op->lock);
+			continue;
+		}
+
+		gossip_debug(GOSSIP_WAIT_DEBUG,
+			     "*** %s:"
+			     " operation interrupted by a signal (tag "
+			     "%llu, op %p)\n",
+			     __func__,
+			     llu(op->tag),
+			     op);
+		pvfs2_clean_up_interrupted_operation(op);
+		ret = -EINTR;
+		break;
+	}
+
+	set_current_state(TASK_RUNNING);
+
+	spin_lock(&op->lock);
+	remove_wait_queue(&op->waitq, &wait_entry);
+	spin_unlock(&op->lock);
+
+	return ret;
+}
+
+/*
+ * similar to wait_for_matching_downcall(), but used in the special case
+ * of I/O cancellations.
+ *
+ * Note we need a special wait function because if this is called we already
+ *      know that a signal is pending in current and need to service the
+ *      cancellation upcall anyway.  The only way to exit this is to either
+ *      timeout or have the cancellation be serviced properly.
+ */
+int wait_for_cancellation_downcall(struct pvfs2_kernel_op_s *op)
+{
+	int ret = -EINVAL;
+	DECLARE_WAITQUEUE(wait_entry, current);
+
+	spin_lock(&op->lock);
+	add_wait_queue(&op->waitq, &wait_entry);
+	spin_unlock(&op->lock);
+
+	while (1) {
+		set_current_state(TASK_INTERRUPTIBLE);
+
+		spin_lock(&op->lock);
+		if (op_state_serviced(op)) {
+			gossip_debug(GOSSIP_WAIT_DEBUG,
+				     "%s:op-state is SERVICED.\n",
+				     __func__);
+			spin_unlock(&op->lock);
+			ret = 0;
+			break;
+		}
+		spin_unlock(&op->lock);
+
+		if (signal_pending(current)) {
+			gossip_debug(GOSSIP_WAIT_DEBUG,
+				     "%s:operation interrupted by a signal (tag"
+				     " %llu, op %p)\n",
+				     __func__,
+				     llu(op->tag),
+				     op);
+			pvfs2_clean_up_interrupted_operation(op);
+			ret = -EINTR;
+			break;
+		}
+
+		gossip_debug(GOSSIP_WAIT_DEBUG,
+			     "%s:About to call schedule_timeout.\n",
+			     __func__);
+		ret =
+		    schedule_timeout(MSECS_TO_JIFFIES(1000 * op_timeout_secs));
+
+		gossip_debug(GOSSIP_WAIT_DEBUG,
+			     "%s:Value returned from schedule_timeout(%d).\n",
+			     __func__,
+			     ret);
+		if (!ret) {
+			gossip_debug(GOSSIP_WAIT_DEBUG,
+				     "%s:*** operation timed out: %p\n",
+				     __func__,
+				     op);
+			pvfs2_clean_up_interrupted_operation(op);
+			ret = -ETIMEDOUT;
+			break;
+		}
+
+		gossip_debug(GOSSIP_WAIT_DEBUG,
+			     "%s:Breaking out of loop, regardless of value returned by schedule_timeout.\n",
+			     __func__);
+		ret = -ETIMEDOUT;
+		break;
+	}
+
+	set_current_state(TASK_RUNNING);
+
+	spin_lock(&op->lock);
+	remove_wait_queue(&op->waitq, &wait_entry);
+	spin_unlock(&op->lock);
+
+	gossip_debug(GOSSIP_WAIT_DEBUG,
+		     "%s:returning ret(%d)\n",
+		     __func__,
+		     ret);
+
+	return ret;
+}
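
For illustration, here is a minimal caller-side sketch of the
service_operation() contract described above service_operation().  It only
uses helpers that appear elsewhere in this series (op_alloc(), op_release(),
get_interruptible_flag(), op_state_serviced(), gossip_err()); the opcode and
the elided upcall fields are placeholders, not a definitive API:

  /*
   * Sketch: submit one upcall and wait for the matching downcall.
   * PVFS2_VFS_OP_GETXATTR is only a stand-in opcode; a real caller
   * fills in new_op->upcall.req.* for its own operation first.
   */
  static int example_submit_op(struct inode *inode)
  {
          struct pvfs2_kernel_op_s *new_op;
          int ret;

          new_op = op_alloc(PVFS2_VFS_OP_GETXATTR);
          if (!new_op)
                  return -ENOMEM;

          /* ... fill new_op->upcall.req.getxattr here ... */

          ret = service_operation(new_op, "example_submit_op",
                                  get_interruptible_flag(inode));

          /*
           * ret mirrors new_op->downcall.status (errno format).  To tell
           * a failure reported by client-core apart from a failure to
           * reach client-core at all, check op_state_serviced().
           */
          if (ret != 0 && !op_state_serviced(new_op))
                  gossip_err("example: op was never serviced (%d)\n", ret);

          op_release(new_op);
          return ret;
  }
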
diff --git a/fs/orangefs/xattr.c b/fs/orangefs/xattr.c
new file mode 100644
index 0000000..2766090
--- /dev/null
+++ b/fs/orangefs/xattr.c
@@ -0,0 +1,532 @@
+/*
+ * (C) 2001 Clemson University and The University of Chicago
+ *
+ * See COPYING in top-level directory.
+ */
+
+/*
+ *  Linux VFS extended attribute operations.
+ */
+
+#include "protocol.h"
+#include "pvfs2-kernel.h"
+#include "pvfs2-bufmap.h"
+#include <linux/posix_acl_xattr.h>
+#include <linux/xattr.h>
+
+
+#define SYSTEM_PVFS2_KEY "system.pvfs2."
+#define SYSTEM_PVFS2_KEY_LEN 13
+
+/*
+ * this function returns
+ *   0 if the key corresponding to name is not meant to be printed as part
+ *     of a listxattr.
+ *   1 if the key corresponding to name is meant to be returned as part of
+ *     a listxattr.
+ * The keys that start with SYSTEM_PVFS2_KEY are the ones to avoid printing.
+ */
+static int is_reserved_key(const char *key, size_t size)
+{
+
+	if (size < SYSTEM_PVFS2_KEY_LEN)
+		return 1;
+
+	return strncmp(key, SYSTEM_PVFS2_KEY, SYSTEM_PVFS2_KEY_LEN) ?  1 : 0;
+}
+
+static inline int convert_to_internal_xattr_flags(int setxattr_flags)
+{
+	int internal_flag = 0;
+
+	if (setxattr_flags & XATTR_REPLACE) {
+		/* Attribute must exist! */
+		internal_flag = PVFS_XATTR_REPLACE;
+	} else if (setxattr_flags & XATTR_CREATE) {
+		/* Attribute must not exist */
+		internal_flag = PVFS_XATTR_CREATE;
+	}
+	return internal_flag;
+}
+
+
+/*
+ * Tries to get a specified key's attributes of a given
+ * file into a user-specified buffer. Note that the getxattr
+ * interface allows for the users to probe the size of an
+ * extended attribute by passing in a value of 0 to size.
+ * Thus our return value is always the size of the attribute
+ * unless the key does not exist for the file and/or if
+ * there were errors in fetching the attribute value.
+ */
+ssize_t pvfs2_inode_getxattr(struct inode *inode, const char *prefix,
+		const char *name, void *buffer, size_t size)
+{
+	struct pvfs2_inode_s *pvfs2_inode = PVFS2_I(inode);
+	struct pvfs2_kernel_op_s *new_op = NULL;
+	ssize_t ret = -ENOMEM;
+	ssize_t length = 0;
+	int fsuid;
+	int fsgid;
+
+	gossip_debug(GOSSIP_XATTR_DEBUG,
+		     "%s: prefix %s name %s, buffer_size %zd\n",
+		     __func__, prefix, name, size);
+
+	if (name == NULL || (size > 0 && buffer == NULL)) {
+		gossip_err("pvfs2_inode_getxattr: bogus NULL pointers\n");
+		return -EINVAL;
+	}
+	if (size < 0 ||
+	    (strlen(name) + strlen(prefix)) >= PVFS_MAX_XATTR_NAMELEN) {
+		gossip_err("Invalid size (%d) or key length (%d)\n",
+			   (int)size,
+			   (int)(strlen(name) + strlen(prefix)));
+		return -EINVAL;
+	}
+
+	fsuid = from_kuid(current_user_ns(), current_fsuid());
+	fsgid = from_kgid(current_user_ns(), current_fsgid());
+
+	gossip_debug(GOSSIP_XATTR_DEBUG,
+		     "getxattr on inode %pU, name %s "
+		     "(uid %o, gid %o)\n",
+		     get_khandle_from_ino(inode),
+		     name,
+		     fsuid,
+		     fsgid);
+
+	down_read(&pvfs2_inode->xattr_sem);
+
+	new_op = op_alloc(PVFS2_VFS_OP_GETXATTR);
+	if (!new_op)
+		goto out_unlock;
+
+	new_op->upcall.req.getxattr.refn = pvfs2_inode->refn;
+	ret = snprintf((char *)new_op->upcall.req.getxattr.key,
+		       PVFS_MAX_XATTR_NAMELEN, "%s%s", prefix, name);
+
+	/*
+	 * NOTE: Although keys are meant to be NULL terminated textual
+	 * strings, I am going to explicitly pass the length just in case
+	 * we change this later on...
+	 */
+	new_op->upcall.req.getxattr.key_sz = ret + 1;
+
+	ret = service_operation(new_op, "pvfs2_inode_getxattr",
+				get_interruptible_flag(inode));
+	if (ret != 0) {
+		if (ret == -ENOENT) {
+			ret = -ENODATA;
+			gossip_debug(GOSSIP_XATTR_DEBUG,
+				     "pvfs2_inode_getxattr: inode %pU key %s"
+				     " does not exist!\n",
+				     get_khandle_from_ino(inode),
+				     (char *)new_op->upcall.req.getxattr.key);
+		}
+		goto out_release_op;
+	}
+
+	/*
+	 * Length returned includes null terminator.
+	 */
+	length = new_op->downcall.resp.getxattr.val_sz;
+
+	/*
+	 * Just return the length of the queried attribute.
+	 */
+	if (size == 0) {
+		ret = length;
+		goto out_release_op;
+	}
+
+	/*
+	 * Check to see if key length is > provided buffer size.
+	 */
+	if (length > size) {
+		ret = -ERANGE;
+		goto out_release_op;
+	}
+
+	memset(buffer, 0, size);
+	memcpy(buffer, new_op->downcall.resp.getxattr.val, length);
+	gossip_debug(GOSSIP_XATTR_DEBUG,
+	     "pvfs2_inode_getxattr: inode %pU "
+	     "key %s key_sz %d, val_len %d\n",
+	     get_khandle_from_ino(inode),
+	     (char *)new_op->
+		upcall.req.getxattr.key,
+		     (int)new_op->
+		upcall.req.getxattr.key_sz,
+	     (int)length);
+
+	ret = length;
+
+out_release_op:
+	op_release(new_op);
+out_unlock:
+	up_read(&pvfs2_inode->xattr_sem);
+	return ret;
+}
+
+static int pvfs2_inode_removexattr(struct inode *inode,
+			    const char *prefix,
+			    const char *name,
+			    int flags)
+{
+	struct pvfs2_inode_s *pvfs2_inode = PVFS2_I(inode);
+	struct pvfs2_kernel_op_s *new_op = NULL;
+	int ret = -ENOMEM;
+
+	down_write(&pvfs2_inode->xattr_sem);
+	new_op = op_alloc(PVFS2_VFS_OP_REMOVEXATTR);
+	if (!new_op)
+		goto out_unlock;
+
+	new_op->upcall.req.removexattr.refn = pvfs2_inode->refn;
+	/*
+	 * NOTE: Although keys are meant to be NULL terminated
+	 * textual strings, I am going to explicitly pass the
+	 * length just in case we change this later on...
+	 */
+	ret = snprintf((char *)new_op->upcall.req.removexattr.key,
+		       PVFS_MAX_XATTR_NAMELEN,
+		       "%s%s",
+		       (prefix ? prefix : ""),
+		       name);
+	new_op->upcall.req.removexattr.key_sz = ret + 1;
+
+	gossip_debug(GOSSIP_XATTR_DEBUG,
+		     "pvfs2_inode_removexattr: key %s, key_sz %d\n",
+		     (char *)new_op->upcall.req.removexattr.key,
+		     (int)new_op->upcall.req.removexattr.key_sz);
+
+	ret = service_operation(new_op,
+				"pvfs2_inode_removexattr",
+				get_interruptible_flag(inode));
+	if (ret == -ENOENT) {
+		/*
+		 * Request to replace a non-existent attribute is an error.
+		 */
+		if (flags & XATTR_REPLACE)
+			ret = -ENODATA;
+		else
+			ret = 0;
+	}
+
+	gossip_debug(GOSSIP_XATTR_DEBUG,
+		     "pvfs2_inode_removexattr: returning %d\n", ret);
+
+	op_release(new_op);
+out_unlock:
+	up_write(&pvfs2_inode->xattr_sem);
+	return ret;
+}
+
+/*
+ * Tries to set an attribute for a given key on a file.
+ *
+ * Returns a negative number on error and 0 on success.  Key is text, but value
+ * can be binary!
+ */
+int pvfs2_inode_setxattr(struct inode *inode, const char *prefix,
+		const char *name, const void *value, size_t size, int flags)
+{
+	struct pvfs2_inode_s *pvfs2_inode = PVFS2_I(inode);
+	struct pvfs2_kernel_op_s *new_op;
+	int internal_flag = 0;
+	int ret = -ENOMEM;
+
+	gossip_debug(GOSSIP_XATTR_DEBUG,
+		     "%s: prefix %s, name %s, buffer_size %zd\n",
+		     __func__, prefix, name, size);
+
+	if (size < 0 ||
+	    size >= PVFS_MAX_XATTR_VALUELEN ||
+	    flags < 0) {
+		gossip_err("pvfs2_inode_setxattr: bogus values of size(%d), flags(%d)\n",
+			   (int)size,
+			   flags);
+		return -EINVAL;
+	}
+
+	if (name == NULL ||
+	    (size > 0 && value == NULL)) {
+		gossip_err("pvfs2_inode_setxattr: bogus NULL pointers!\n");
+		return -EINVAL;
+	}
+
+	internal_flag = convert_to_internal_xattr_flags(flags);
+
+	if (prefix) {
+		if (strlen(name) + strlen(prefix) >= PVFS_MAX_XATTR_NAMELEN) {
+			gossip_err
+			    ("pvfs2_inode_setxattr: bogus key size (%d)\n",
+			     (int)(strlen(name) + strlen(prefix)));
+			return -EINVAL;
+		}
+	} else {
+		if (strlen(name) >= PVFS_MAX_XATTR_NAMELEN) {
+			gossip_err
+			    ("pvfs2_inode_setxattr: bogus key size (%d)\n",
+			     (int)(strlen(name)));
+			return -EINVAL;
+		}
+	}
+
+	/* This is equivalent to a removexattr */
+	if (size == 0 && value == NULL) {
+		gossip_debug(GOSSIP_XATTR_DEBUG,
+			     "removing xattr (%s%s)\n",
+			     prefix,
+			     name);
+		return pvfs2_inode_removexattr(inode, prefix, name, flags);
+	}
+
+	gossip_debug(GOSSIP_XATTR_DEBUG,
+		     "setxattr on inode %pU, name %s\n",
+		     get_khandle_from_ino(inode),
+		     name);
+
+	down_write(&pvfs2_inode->xattr_sem);
+	new_op = op_alloc(PVFS2_VFS_OP_SETXATTR);
+	if (!new_op)
+		goto out_unlock;
+
+
+	new_op->upcall.req.setxattr.refn = pvfs2_inode->refn;
+	new_op->upcall.req.setxattr.flags = internal_flag;
+	/*
+	 * NOTE: Although keys are meant to be NULL terminated textual
+	 * strings, I am going to explicitly pass the length just in
+	 * case we change this later on...
+	 */
+	ret = snprintf((char *)new_op->upcall.req.setxattr.keyval.key,
+		       PVFS_MAX_XATTR_NAMELEN,
+		       "%s%s",
+		       prefix, name);
+	new_op->upcall.req.setxattr.keyval.key_sz = ret + 1;
+	memcpy(new_op->upcall.req.setxattr.keyval.val, value, size);
+	new_op->upcall.req.setxattr.keyval.val_sz = size;
+
+	gossip_debug(GOSSIP_XATTR_DEBUG,
+		     "pvfs2_inode_setxattr: key %s, key_sz %d "
+		     " value size %zd\n",
+		     (char *)new_op->upcall.req.setxattr.keyval.key,
+		     (int)new_op->upcall.req.setxattr.keyval.key_sz,
+		     size);
+
+	ret = service_operation(new_op,
+				"pvfs2_inode_setxattr",
+				get_interruptible_flag(inode));
+
+	gossip_debug(GOSSIP_XATTR_DEBUG,
+		     "pvfs2_inode_setxattr: returning %d\n",
+		     ret);
+
+	/* when request is serviced properly, free req op struct */
+	op_release(new_op);
+out_unlock:
+	up_write(&pvfs2_inode->xattr_sem);
+	return ret;
+}
+
+/*
+ * Tries to get a specified object's keys into a user-specified buffer of a
+ * given size.  Note that like the previous instances of xattr routines, this
+ * also allows you to pass in a NULL pointer and 0 size to probe the size for
+ * subsequent memory allocations. Thus our return value is always the size of
+ * all the keys unless there were errors in fetching the keys!
+ */
+ssize_t pvfs2_listxattr(struct dentry *dentry, char *buffer, size_t size)
+{
+	struct inode *inode = dentry->d_inode;
+	struct pvfs2_inode_s *pvfs2_inode = PVFS2_I(inode);
+	struct pvfs2_kernel_op_s *new_op;
+	__u64 token = PVFS_ITERATE_START;
+	ssize_t ret = -ENOMEM;
+	ssize_t total = 0;
+	ssize_t length = 0;
+	int count_keys = 0;
+	int key_size;
+	int i = 0;
+
+	if (size > 0 && buffer == NULL) {
+		gossip_err("%s: bogus NULL pointers\n", __func__);
+		return -EINVAL;
+	}
+	if (size < 0) {
+		gossip_err("Invalid size (%d)\n", (int)size);
+		return -EINVAL;
+	}
+
+	down_read(&pvfs2_inode->xattr_sem);
+	new_op = op_alloc(PVFS2_VFS_OP_LISTXATTR);
+	if (!new_op)
+		goto out_unlock;
+
+	if (buffer && size > 0)
+		memset(buffer, 0, size);
+
+try_again:
+	key_size = 0;
+	new_op->upcall.req.listxattr.refn = pvfs2_inode->refn;
+	new_op->upcall.req.listxattr.token = token;
+	new_op->upcall.req.listxattr.requested_count =
+	    (size == 0) ? 0 : PVFS_MAX_XATTR_LISTLEN;
+	ret = service_operation(new_op, __func__,
+				get_interruptible_flag(inode));
+	if (ret != 0)
+		goto done;
+
+	if (size == 0) {
+		/*
+		 * This is a bit of a big upper limit, but I did not want to
+		 * spend too much time getting this correct, since users end
+		 * up allocating memory rather than us...
+		 */
+		total = new_op->downcall.resp.listxattr.returned_count *
+			PVFS_MAX_XATTR_NAMELEN;
+		goto done;
+	}
+
+	length = new_op->downcall.resp.listxattr.keylen;
+	if (length == 0)
+		goto done;
+
+	/*
+	 * Check to see how much can be fit in the buffer. Fit only whole keys.
+	 */
+	for (i = 0; i < new_op->downcall.resp.listxattr.returned_count; i++) {
+		if (total + new_op->downcall.resp.listxattr.lengths[i] > size)
+			goto done;
+
+		/*
+		 * Since many dumb programs try to setxattr() on our reserved
+		 * xattrs, this is a feeble attempt at defeating those by not
+		 * listing them in the output of listxattr... sigh
+		 */
+		if (is_reserved_key(new_op->downcall.resp.listxattr.key +
+				    key_size,
+				    new_op->downcall.resp.
+					listxattr.lengths[i])) {
+			gossip_debug(GOSSIP_XATTR_DEBUG, "Copying key %d -> %s\n",
+					i, new_op->downcall.resp.listxattr.key +
+						key_size);
+			memcpy(buffer + total,
+				new_op->downcall.resp.listxattr.key + key_size,
+				new_op->downcall.resp.listxattr.lengths[i]);
+			total += new_op->downcall.resp.listxattr.lengths[i];
+			count_keys++;
+		} else {
+			gossip_debug(GOSSIP_XATTR_DEBUG, "[RESERVED] key %d -> %s\n",
+					i, new_op->downcall.resp.listxattr.key +
+						key_size);
+		}
+		key_size += new_op->downcall.resp.listxattr.lengths[i];
+	}
+
+	/*
+	 * Since the buffer was large enough, we might have to continue
+	 * fetching more keys!
+	 */
+	token = new_op->downcall.resp.listxattr.token;
+	if (token != PVFS_ITERATE_END)
+		goto try_again;
+
+done:
+	gossip_debug(GOSSIP_XATTR_DEBUG, "%s: returning %d"
+		     " [size of buffer %ld] (filled in %d keys)\n",
+		     __func__,
+		     ret ? (int)ret : (int)total,
+		     (long)size,
+		     count_keys);
+	op_release(new_op);
+	if (ret == 0)
+		ret = total;
+out_unlock:
+	up_read(&pvfs2_inode->xattr_sem);
+	return ret;
+}
+
+int pvfs2_xattr_set_default(struct dentry *dentry,
+			    const char *name,
+			    const void *buffer,
+			    size_t size,
+			    int flags,
+			    int handler_flags)
+{
+	return pvfs2_inode_setxattr(dentry->d_inode,
+				    PVFS2_XATTR_NAME_DEFAULT_PREFIX,
+				    name,
+				    buffer,
+				    size,
+				    flags);
+}
+
+int pvfs2_xattr_get_default(struct dentry *dentry,
+			    const char *name,
+			    void *buffer,
+			    size_t size,
+			    int handler_flags)
+{
+	return pvfs2_inode_getxattr(dentry->d_inode,
+				    PVFS2_XATTR_NAME_DEFAULT_PREFIX,
+				    name,
+				    buffer,
+				    size);
+
+}
+
+static int pvfs2_xattr_set_trusted(struct dentry *dentry,
+			    const char *name,
+			    const void *buffer,
+			    size_t size,
+			    int flags,
+			    int handler_flags)
+{
+	return pvfs2_inode_setxattr(dentry->d_inode,
+				    PVFS2_XATTR_NAME_TRUSTED_PREFIX,
+				    name,
+				    buffer,
+				    size,
+				    flags);
+}
+
+static int pvfs2_xattr_get_trusted(struct dentry *dentry,
+			    const char *name,
+			    void *buffer,
+			    size_t size,
+			    int handler_flags)
+{
+	return pvfs2_inode_getxattr(dentry->d_inode,
+				    PVFS2_XATTR_NAME_TRUSTED_PREFIX,
+				    name,
+				    buffer,
+				    size);
+}
+
+static const struct xattr_handler pvfs2_xattr_trusted_handler = {
+	.prefix = PVFS2_XATTR_NAME_TRUSTED_PREFIX,
+	.get = pvfs2_xattr_get_trusted,
+	.set = pvfs2_xattr_set_trusted,
+};
+
+static const struct xattr_handler pvfs2_xattr_default_handler = {
+	/*
+	 * NOTE: this is set to be the empty string
+	 * so that all un-prefixed xattr keys get caught
+	 * here!
+	 */
+	.prefix = PVFS2_XATTR_NAME_DEFAULT_PREFIX,
+	.get = pvfs2_xattr_get_default,
+	.set = pvfs2_xattr_set_default,
+};
+
+const struct xattr_handler *pvfs2_xattr_handlers[] = {
+	&posix_acl_access_xattr_handler,
+	&posix_acl_default_xattr_handler,
+	&pvfs2_xattr_trusted_handler,
+	&pvfs2_xattr_default_handler,
+	NULL
+};
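
To make the probe-size convention described in the getxattr/listxattr
comments above concrete from the userspace side, here is a rough sketch;
the path and attribute name are invented for illustration and are not part
of the patch:

  #include <stdio.h>
  #include <stdlib.h>
  #include <sys/xattr.h>

  int main(void)
  {
          const char *path = "/mymountpoint/somefile";  /* hypothetical */
          const char *name = "user.comment";            /* hypothetical */
          ssize_t len, got;
          char *val;

          len = getxattr(path, name, NULL, 0);   /* size == 0 probes length */
          if (len < 0) {
                  perror("getxattr (probe)");
                  return 1;
          }
          val = malloc(len);
          if (!val)
                  return 1;
          got = getxattr(path, name, val, len);  /* second call fetches value */
          if (got < 0) {
                  perror("getxattr (fetch)");
                  free(val);
                  return 1;
          }
          printf("%s: %s is %zd byte(s)\n", path, name, got);
          free(val);
          return 0;
  }
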
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH V3 6/7] Orangefs: kernel client part 6
  2015-07-17 17:16 [PATCH V3 0/7] Orangefs: kernel client introduction Mike Marshall
                   ` (4 preceding siblings ...)
  2015-07-17 17:16 ` [PATCH V3 5/7] Orangefs: kernel client part 5 Mike Marshall
@ 2015-07-17 17:16 ` Mike Marshall
  2015-07-17 17:16 ` [PATCH V3 7/7] Orangefs: kernel client part 7 Mike Marshall
  6 siblings, 0 replies; 8+ messages in thread
From: Mike Marshall @ 2015-07-17 17:16 UTC (permalink / raw)
  To: viro; +Cc: Mike Marshall, linux-fsdevel

From: Mike Marshall <hubcap@omnibond.com>

Signed-off-by: Mike Marshall <hubcap@omnibond.com>
---
 Documentation/ABI/stable/sysfs-fs-orangefs |  87 ++++++++++++++++++
 Documentation/filesystems/orangefs.txt     | 137 +++++++++++++++++++++++++++++
 2 files changed, 224 insertions(+)
 create mode 100644 Documentation/ABI/stable/sysfs-fs-orangefs
 create mode 100644 Documentation/filesystems/orangefs.txt

diff --git a/Documentation/ABI/stable/sysfs-fs-orangefs b/Documentation/ABI/stable/sysfs-fs-orangefs
new file mode 100644
index 0000000..affdb11
--- /dev/null
+++ b/Documentation/ABI/stable/sysfs-fs-orangefs
@@ -0,0 +1,87 @@
+What:			/sys/fs/orangefs/perf_counters/*
+Date:			Jun 2015
+Contact:		Mike Marshall <hubcap@omnibond.com>
+Description:
+			Counters and settings for various caches.
+			Read only.
+
+
+What:			/sys/fs/orangefs/perf_counter_reset
+Date:			Jun 2015
+Contact:		Mike Marshall <hubcap@omnibond.com>
+Description:
+			Echo a 0 or a 1 into perf_counter_reset to
+			reset all the counters in
+			/sys/fs/orangefs/perf_counters
+			except ones with PINT_PERF_PRESERVE set.
+
+
+What:			/sys/fs/orangefs/perf_time_interval_secs
+Date:			Jun 2015
+Contact:		Mike Marshall <hubcap@omnibond.com>
+Description:
+			Length of perf counter intervals in
+			seconds.
+
+
+What:			/sys/fs/orangefs/perf_history_size
+Date:			Jun 2015
+Contact:		Mike Marshall <hubcap@omnibond.com>
+Description:
+			The perf_counters cache statistics have N, or
+			perf_history_size, samples. The default is
+			one.
+
+			Every perf_time_interval_secs the (first)
+			samples are reset.
+
+			If N is greater than one, the "current" set
+			of samples is reset, and the samples from the
+			other N-1 intervals remain available.
+
+
+What:			/sys/fs/orangefs/op_timeout_secs
+Date:			Jun 2015
+Contact:		Mike Marshall <hubcap@omnibond.com>
+Description:
+			Service operation timeout in seconds.
+
+
+What:			/sys/fs/orangefs/slot_timeout_secs
+Date:			Jun 2015
+Contact:		Mike Marshall <hubcap@omnibond.com>
+Description:
+			"Slot" timeout in seconds. A "slot"
+			is an indexed buffer in the shared
+			memory segment used for communication
+			between the kernel module and userspace.
+			Slots are requested and waited for;
+			the wait times out after slot_timeout_secs.
+
+
+What:			/sys/fs/orangefs/acache/*
+Date:			Jun 2015
+Contact:		Mike Marshall <hubcap@omnibond.com>
+Description:
+			Attribute cache configurable settings.
+
+
+What:			/sys/fs/orangefs/ncache/*
+Date:			Jun 2015
+Contact:		Mike Marshall <hubcap@omnibond.com>
+Description:
+			Name cache configurable settings.
+
+
+What:			/sys/fs/orangefs/capcache/*
+Date:			Jun 2015
+Contact:		Mike Marshall <hubcap@omnibond.com>
+Description:
+			Capability cache configurable settings.
+
+
+What:			/sys/fs/orangefs/ccache/*
+Date:			Jun 2015
+Contact:		Mike Marshall <hubcap@omnibond.com>
+Description:
+			Credential cache configurable settings.
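
For a concrete sense of the attributes described above, here is a rough,
hypothetical shell session (the values are arbitrary, and it assumes the
timeout attributes are writable like the perf_counter_reset knob):

  cat /sys/fs/orangefs/op_timeout_secs
  echo 20 > /sys/fs/orangefs/op_timeout_secs
  echo 1 > /sys/fs/orangefs/perf_counter_reset
  cat /sys/fs/orangefs/perf_counters/*
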
diff --git a/Documentation/filesystems/orangefs.txt b/Documentation/filesystems/orangefs.txt
new file mode 100644
index 0000000..ec9c841
--- /dev/null
+++ b/Documentation/filesystems/orangefs.txt
@@ -0,0 +1,137 @@
+ORANGEFS
+========
+
+OrangeFS is an LGPL userspace scale-out parallel storage system. It is ideal
+for large storage problems faced by HPC, Big Data, Streaming Video,
+Genomics and Bioinformatics.
+
+Orangefs, originally called PVFS, was first developed in 1993 by
+Walt Ligon and Eric Blumer as a parallel file system for Parallel
+Virtual Machine (PVM) as part of a NASA grant to study the I/O patterns
+of parallel programs.
+
+Orangefs features include:
+
+  * Distributes file data among multiple file servers
+  * Supports simultaneous access by multiple clients
+  * Stores file data and metadata on servers using local file system
+    and access methods
+  * Userspace implementation is easy to install and maintain
+  * Direct MPI support
+  * Stateless
+
+
+MAILING LIST
+============
+
+http://beowulf-underground.org/mailman/listinfo/pvfs2-users
+
+
+DOCUMENTATION
+=============
+
+http://www.orangefs.org/documentation/
+
+
+USERSPACE FILESYSTEM SOURCE
+===========================
+
+http://www.orangefs.org/download
+
+Orangefs versions prior to 2.9.3 are not compatible with the
+upstream version of the kernel client.
+
+
+BUILDING THE USERSPACE FILESYSTEM ON A SINGLE SERVER
+====================================================
+
+When Orangefs is upstream, "--with-kernel" shouldn't be needed, but
+until then the path to where the kernel with the Orangefs kernel client
+patch was built is needed to ensure that pvfs2-client-core (the bridge
+between kernel space and user space) will build properly. You can omit
+--prefix if you don't care that things are sprinkled around in
+/usr/local.
+
+./configure --prefix=/opt/ofs --with-kernel=/path/to/orangefs/kernel
+
+make
+
+make install
+
+Create an orangefs config file:
+/opt/ofs/bin/pvfs2-genconfig /etc/pvfs2.conf
+
+  for "Enter hostnames", use the hostname; don't let it default to
+  localhost.
+
+Create a pvfs2tab file in /etc:
+cat /etc/pvfs2tab
+tcp://myhostname:3334/orangefs /mymountpoint pvfs2 defaults,noauto 0 0
+
+Create the mount point you specified in the tab file if needed:
+mkdir /mymountpoint
+
+Bootstrap the server:
+/opt/ofs/sbin/pvfs2-server /etc/pvfs2.conf -f
+
+Start the server:
+/opt/ofs/sbin/pvfs2-server /etc/pvfs2.conf
+
+Now the server is running. At this point you might like to
+prove things are working with:
+
+/opt/ofs/bin/pvfs2-ls /mymountpoint
+
+You might not want to enforce SELinux; it doesn't seem to matter as of
+Linux 3.11...
+
+If stuff seems to be working, turn on the client core:
+/opt/ofs/sbin/pvfs2-client -p /opt/ofs/sbin/pvfs2-client-core
+
+Mount your filesystem:
+mount -t pvfs2 tcp://myhostname:3334/orangefs /mymountpoint
+
+
+OPTIONS
+=======
+
+The following mount options are accepted:
+
+  acl
+    Allow the use of Access Control Lists on files and directories.
+
+  intr
+    Some operations between the kernel client and the user space
+    filesystem can be interruptible, such as changes in debug levels
+    and the setting of tunable parameters.
+
+  local_lock
+    Enable POSIX locking from the perspective of "this" kernel. The
+    default file_operations lock action is to return ENOSYS. POSIX
+    locking kicks in if the filesystem is mounted with -o local_lock.
+    Distributed locking is being worked on for the future.
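+
+As a hypothetical example that combines all three options (reusing the
+hostname, port and mount point from the walk-through above):
+
+  mount -t pvfs2 -o acl,intr,local_lock tcp://myhostname:3334/orangefs /mymountpoint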
+
+
+DEBUGGING
+=========
+
+If you want the debug (GOSSIP) statements in a particular
+source file (inode.c for example) to go to syslog:
+
+  echo inode > /sys/kernel/debug/orangefs/kernel-debug
+
+No debugging (the default):
+
+  echo none > /sys/kernel/debug/orangefs/kernel-debug
+
+Debugging from several source files:
+
+  echo inode,dir > /sys/kernel/debug/orangefs/kernel-debug
+
+All debugging:
+
+  echo all > /sys/kernel/debug/orangefs/kernel-debug
+
+Get a list of all debugging keywords:
+
+  cat /sys/kernel/debug/orangefs/debug-help
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH V3 7/7] Orangefs: kernel client part 7
  2015-07-17 17:16 [PATCH V3 0/7] Orangefs: kernel client introduction Mike Marshall
                   ` (5 preceding siblings ...)
  2015-07-17 17:16 ` [PATCH V3 6/7] Orangefs: kernel client part 6 Mike Marshall
@ 2015-07-17 17:16 ` Mike Marshall
  6 siblings, 0 replies; 8+ messages in thread
From: Mike Marshall @ 2015-07-17 17:16 UTC (permalink / raw)
  To: viro; +Cc: Mike Marshall, linux-fsdevel

From: Mike Marshall <hubcap@omnibond.com>

Signed-off-by: Mike Marshall <hubcap@omnibond.com>
---
 fs/Kconfig           |  1 +
 fs/Makefile          |  1 +
 fs/orangefs/Kconfig  |  6 ++++++
 fs/orangefs/Makefile | 10 ++++++++++
 4 files changed, 18 insertions(+)
 create mode 100644 fs/orangefs/Kconfig
 create mode 100644 fs/orangefs/Makefile

diff --git a/fs/Kconfig b/fs/Kconfig
index 011f433..abdaf03 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -193,6 +193,7 @@ menuconfig MISC_FILESYSTEMS
 
 if MISC_FILESYSTEMS
 
+source "fs/orangefs/Kconfig"
 source "fs/adfs/Kconfig"
 source "fs/affs/Kconfig"
 source "fs/ecryptfs/Kconfig"
diff --git a/fs/Makefile b/fs/Makefile
index cb20e4b..288cbc3 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -105,6 +105,7 @@ obj-$(CONFIG_AUTOFS4_FS)	+= autofs4/
 obj-$(CONFIG_ADFS_FS)		+= adfs/
 obj-$(CONFIG_FUSE_FS)		+= fuse/
 obj-$(CONFIG_OVERLAY_FS)	+= overlayfs/
+obj-$(CONFIG_ORANGEFS_FS)       += orangefs/
 obj-$(CONFIG_UDF_FS)		+= udf/
 obj-$(CONFIG_SUN_OPENPROMFS)	+= openpromfs/
 obj-$(CONFIG_OMFS_FS)		+= omfs/
diff --git a/fs/orangefs/Kconfig b/fs/orangefs/Kconfig
new file mode 100644
index 0000000..1554c02
--- /dev/null
+++ b/fs/orangefs/Kconfig
@@ -0,0 +1,6 @@
+config ORANGEFS_FS
+	tristate "ORANGEFS (Powered by PVFS) support"
+	select FS_POSIX_ACL
+	help
+	   Orange is a parallel file system designed for use on high end
+	   computing (HEC) systems.
diff --git a/fs/orangefs/Makefile b/fs/orangefs/Makefile
new file mode 100644
index 0000000..828b36a
--- /dev/null
+++ b/fs/orangefs/Makefile
@@ -0,0 +1,10 @@
+#
+# Makefile for the ORANGEFS filesystem.
+#
+
+obj-$(CONFIG_ORANGEFS_FS) += orangefs.o
+
+orangefs-objs := acl.o file.o pvfs2-cache.o pvfs2-utils.o xattr.o dcache.o \
+		 inode.o pvfs2-sysfs.o pvfs2-mod.o super.o devpvfs2-req.o \
+		 namei.o symlink.o dir.o pvfs2-bufmap.o \
+		 pvfs2-debugfs.o waitqueue.o
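
Assuming the Kconfig and Makefile above go in as-is, enabling the client
would look roughly like this (the menu location and module name are
inferred from the files above, not verified against a built tree):

  # in the kernel .config, under File systems -> Miscellaneous filesystems
  CONFIG_ORANGEFS_FS=m

  # after the kernel and its modules are rebuilt and installed
  modprobe orangefs
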
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread
