* [PATCH v2 00/46] Lustre IO stack simplifications and cleanups
@ 2016-03-30 23:48 green
From: green @ 2016-03-30 23:48 UTC
  To: Greg Kroah-Hartman, devel, Andreas Dilger
  Cc: Linux Kernel Mailing List, Lustre Development List, Oleg Drokin

From: Oleg Drokin <green@linuxhacker.ru>

v2 fixes a compile error that was accidentally introduced in one of
the patches, and adds a few more fixes and cleanups as three
additional patches at the end of the series.

This large body of patches mostly deals with Lustre IO stack
simplifications and related cleanups.
In particular, the whole cl_page and cl_lock machinery is reduced to
almost nothing.
The simplifications remove about 5K lines of code; the other 5K
removed lines mostly come from shuffling code around and renaming
things to get rid of now-defunct layers.

Bobi Jam (2):
  staging/lustre: update comments after cl_lock simplification
  staging/lustre: lov_io_init() should return error code

Jinshan Xiong (14):
  staging/lustre: Reintroduce global env list
  staging/lustre/osc: Adjustment on osc LRU for performance
  staging/lustre/osc: to drop LRU pages with cl_lru_work
  staging/lustre/clio: collapse layer of cl_page
  staging/lustre/obdclass: Add a preallocated percpu cl_env
  staging/lustre/clio: add pages into writeback cache in batches
  staging/lustre/osc: add weight function for DLM lock
  staging/lustre/clio: remove stackable cl_page completely
  staging/lustre/clio: optimize read ahead code
  staging/lustre/clio: generalize cl_sync_io
  staging/lustre/clio: cl_lock simplification
  staging/lustre/llite: clip page correctly for vvp_io_commit_sync
  staging/lustre/llite: deadlock for page write
  staging/lustre/ldlm: ELC picks locks in a safer policy

John Hammond (5):
  staging/lustre/llite: Rename struct ccc_grouplock to ll_grouplock
  staging/lustre/llite: Rename struct vvp_thread_info to ll_thread_info
  staging/lustre/llite: rename struct ccc_thread_info to vvp_thread_info
  staging/lustre/llite: Remove ccc_global_{init,fini}()
  staging/lustre/llite: Move several declarations to llite_internal.h

John L. Hammond (15):
  staging/lustre: merge lclient/*.c into llite/
  staging/lustre/llite: remove lli_lvb
  staging/lustre/lmv: remove lmv_init_{lock,unlock}()
  staging/lustre/obd: remove struct client_obd_lock
  staging/lustre/llite: remove some cl wrappers
  staging/lustre/llite: merge lclient.h into llite/vvp_internal.h
  staging/lustre/llite: rename ccc_device to vvp_device
  staging/lustre/llite: rename ccc_object to vvp_object
  staging/lustre/llite: rename ccc_page to vvp_page
  staging/lustre/llite: rename ccc_lock to vvp_lock
  staging/lustre:llite: remove struct ll_ra_read
  staging/lustre/llite: merge ccc_io and vvp_io
  staging/lustre/llite: use vui prefix for struct vvp_io members
  staging/lustre/llite: move vvp_io functions to vvp_io.c
  staging/lustre/llite: rename ccc_req to vvp_req

Li Dongyang (1):
  staging/lustre/llite: make sure we do cl_page_clip on the last page

Niu Yawei (1):
  staging/lustre/ldlm: revert changes to ldlm_cancel_aged_policy()

Oleg Drokin (6):
  staging/lustre/obdclass: limit lu_site hash table size
  staging/lustre: Get rid of CFS_PAGE_MASK
  staging/lustre: Remove struct ll_iattr
  staging/lustre/llite: Move ll_dirent_type_get and make it static
  staging/lustre/llite: Remove unused vui_local_lock field
  staging/lustre: Fix spacing style before open parenthesis

Vitaly Fertman (2):
  staging/lustre/ldlm: restore the ELC for enqueue
  staging/lustre/ldlm: Solve a race for LRU lock cancel

 .../lustre/include/linux/libcfs/linux/linux-mem.h  |    1 -
 .../lustre/lnet/libcfs/linux/linux-crypto.c        |    2 +-
 drivers/staging/lustre/lnet/selftest/brw_test.c    |    2 +-
 drivers/staging/lustre/lustre/fld/fld_request.c    |   14 +-
 drivers/staging/lustre/lustre/include/cl_object.h  |  962 ++-------
 drivers/staging/lustre/lustre/include/lclient.h    |  408 ----
 drivers/staging/lustre/lustre/include/linux/obd.h  |  125 --
 drivers/staging/lustre/lustre/include/lu_object.h  |   64 +-
 .../lustre/lustre/include/lustre/lustre_idl.h      |    4 +-
 .../lustre/lustre/include/lustre/lustre_user.h     |   36 +-
 drivers/staging/lustre/lustre/include/lustre_cfg.h |    2 +-
 drivers/staging/lustre/lustre/include/lustre_dlm.h |   14 +-
 .../staging/lustre/lustre/include/lustre_import.h  |    2 +-
 drivers/staging/lustre/lustre/include/lustre_lib.h |   36 +-
 drivers/staging/lustre/lustre/include/obd.h        |   14 +-
 drivers/staging/lustre/lustre/lclient/glimpse.c    |  270 ---
 drivers/staging/lustre/lustre/lclient/lcommon_cl.c | 1203 -----------
 .../staging/lustre/lustre/lclient/lcommon_misc.c   |  200 --
 drivers/staging/lustre/lustre/ldlm/ldlm_internal.h |   10 +-
 drivers/staging/lustre/lustre/ldlm/ldlm_lib.c      |    5 +-
 drivers/staging/lustre/lustre/ldlm/ldlm_lock.c     |   19 +-
 drivers/staging/lustre/lustre/ldlm/ldlm_request.c  |   85 +-
 drivers/staging/lustre/lustre/ldlm/ldlm_resource.c |    1 +
 drivers/staging/lustre/lustre/llite/Makefile       |    5 +-
 drivers/staging/lustre/lustre/llite/dir.c          |   24 +-
 drivers/staging/lustre/lustre/llite/file.c         |  133 +-
 drivers/staging/lustre/lustre/llite/glimpse.c      |  255 +++
 drivers/staging/lustre/lustre/llite/lcommon_cl.c   |  327 +++
 drivers/staging/lustre/lustre/llite/lcommon_misc.c |  201 ++
 drivers/staging/lustre/lustre/llite/llite_close.c  |   28 +-
 .../staging/lustre/lustre/llite/llite_internal.h   |  244 +--
 drivers/staging/lustre/lustre/llite/llite_lib.c    |   21 +-
 drivers/staging/lustre/lustre/llite/llite_mmap.c   |   42 +-
 drivers/staging/lustre/lustre/llite/lproc_llite.c  |   10 +-
 drivers/staging/lustre/lustre/llite/rw.c           |  363 ++--
 drivers/staging/lustre/lustre/llite/rw26.c         |  304 ++-
 drivers/staging/lustre/lustre/llite/super25.c      |   14 +-
 drivers/staging/lustre/lustre/llite/vvp_dev.c      |  254 ++-
 drivers/staging/lustre/lustre/llite/vvp_internal.h |  332 ++-
 drivers/staging/lustre/lustre/llite/vvp_io.c       |  925 +++++----
 drivers/staging/lustre/lustre/llite/vvp_lock.c     |   53 +-
 drivers/staging/lustre/lustre/llite/vvp_object.c   |  141 +-
 drivers/staging/lustre/lustre/llite/vvp_page.c     |  209 +-
 drivers/staging/lustre/lustre/llite/vvp_req.c      |  121 ++
 drivers/staging/lustre/lustre/lmv/lmv_internal.h   |    3 -
 drivers/staging/lustre/lustre/lmv/lmv_obd.c        |   26 +-
 .../staging/lustre/lustre/lov/lov_cl_internal.h    |  105 +-
 drivers/staging/lustre/lustre/lov/lov_dev.c        |    5 +-
 drivers/staging/lustre/lustre/lov/lov_internal.h   |    2 +
 drivers/staging/lustre/lustre/lov/lov_io.c         |  233 +--
 drivers/staging/lustre/lustre/lov/lov_lock.c       |  996 +--------
 drivers/staging/lustre/lustre/lov/lov_obd.c        |    1 -
 drivers/staging/lustre/lustre/lov/lov_object.c     |   48 +-
 drivers/staging/lustre/lustre/lov/lov_offset.c     |   13 +
 drivers/staging/lustre/lustre/lov/lov_page.c       |  183 +-
 drivers/staging/lustre/lustre/lov/lovsub_lock.c    |  383 ----
 drivers/staging/lustre/lustre/lov/lovsub_page.c    |    4 +-
 drivers/staging/lustre/lustre/mdc/lproc_mdc.c      |    8 +-
 drivers/staging/lustre/lustre/mdc/mdc_lib.c        |   21 +-
 drivers/staging/lustre/lustre/mdc/mdc_request.c    |    4 +-
 drivers/staging/lustre/lustre/obdclass/cl_io.c     |  417 ++--
 drivers/staging/lustre/lustre/obdclass/cl_lock.c   | 2152 +-------------------
 drivers/staging/lustre/lustre/obdclass/cl_object.c |  294 ++-
 drivers/staging/lustre/lustre/obdclass/cl_page.c   |  654 +-----
 drivers/staging/lustre/lustre/obdclass/class_obd.c |    2 +-
 drivers/staging/lustre/lustre/obdclass/debug.c     |    4 +-
 drivers/staging/lustre/lustre/obdclass/lu_object.c |    4 +-
 drivers/staging/lustre/lustre/obdclass/obdo.c      |    3 +-
 .../staging/lustre/lustre/obdecho/echo_client.c    |  115 +-
 drivers/staging/lustre/lustre/osc/lproc_osc.c      |   50 +-
 drivers/staging/lustre/lustre/osc/osc_cache.c      |  378 +++-
 .../staging/lustre/lustre/osc/osc_cl_internal.h    |  159 +-
 drivers/staging/lustre/lustre/osc/osc_internal.h   |   23 +-
 drivers/staging/lustre/lustre/osc/osc_io.c         |  279 +--
 drivers/staging/lustre/lustre/osc/osc_lock.c       | 1693 ++++++---------
 drivers/staging/lustre/lustre/osc/osc_object.c     |   35 +-
 drivers/staging/lustre/lustre/osc/osc_page.c       |  513 +++--
 drivers/staging/lustre/lustre/osc/osc_request.c    |  378 ++--
 drivers/staging/lustre/lustre/ptlrpc/ptlrpcd.c     |   16 +-
 drivers/staging/lustre/lustre/ptlrpc/sec_bulk.c    |    2 +-
 drivers/staging/lustre/lustre/ptlrpc/sec_plain.c   |    2 +-
 81 files changed, 5827 insertions(+), 10866 deletions(-)
 delete mode 100644 drivers/staging/lustre/lustre/include/lclient.h
 delete mode 100644 drivers/staging/lustre/lustre/include/linux/obd.h
 delete mode 100644 drivers/staging/lustre/lustre/lclient/glimpse.c
 delete mode 100644 drivers/staging/lustre/lustre/lclient/lcommon_cl.c
 delete mode 100644 drivers/staging/lustre/lustre/lclient/lcommon_misc.c
 create mode 100644 drivers/staging/lustre/lustre/llite/glimpse.c
 create mode 100644 drivers/staging/lustre/lustre/llite/lcommon_cl.c
 create mode 100644 drivers/staging/lustre/lustre/llite/lcommon_misc.c
 create mode 100644 drivers/staging/lustre/lustre/llite/vvp_req.c

-- 
2.1.0


* [PATCH v2 01/46] staging/lustre/obdclass: limit lu_site hash table size
@ 2016-03-30 23:48 green
From: green @ 2016-03-30 23:48 UTC
  To: Greg Kroah-Hartman, devel, Andreas Dilger
  Cc: Linux Kernel Mailing List, Lustre Development List, Oleg Drokin,
	Li Dongyang

From: Oleg Drokin <green@linuxhacker.ru>

Allocating a big hash table using the formula for osd does not
really work for clients: a new hash table is created for each mount
on a single client, which consumes far more memory than expected.

This patch limits the hash table to 8 MB, which holds 524288 entries.
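
For reference, the arithmetic behind the new cap, as a sketch for
illustration only (the ~16 bytes per bucket is an assumption used to
make the numbers line up, not a value taken from this patch):

	/* Hypothetical sizing math, not part of the patch itself. */
	unsigned long entries = 1UL << 19;    /* LU_SITE_BITS_MAX = 19 -> 524288 buckets */
	unsigned long bytes   = entries * 16; /* ~16 B/bucket -> 8388608 B = 8 MB */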

Signed-off-by: Li Dongyang <dongyang.li@anu.edu.au>
Reviewed-on: http://review.whamcloud.com/18048
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-7689
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
---
 drivers/staging/lustre/lustre/obdclass/lu_object.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/lustre/lustre/obdclass/lu_object.c b/drivers/staging/lustre/lustre/obdclass/lu_object.c
index 65a4746..69fdcee 100644
--- a/drivers/staging/lustre/lustre/obdclass/lu_object.c
+++ b/drivers/staging/lustre/lustre/obdclass/lu_object.c
@@ -935,7 +935,7 @@ static void lu_dev_add_linkage(struct lu_site *s, struct lu_device *d)
  * Initialize site \a s, with \a d as the top level device.
  */
 #define LU_SITE_BITS_MIN    12
-#define LU_SITE_BITS_MAX    24
+#define LU_SITE_BITS_MAX    19
 /**
  * total 256 buckets, we don't want too many buckets because:
  * - consume too much memory
-- 
2.1.0


* [PATCH v2 02/46] staging/lustre: Get rid of CFS_PAGE_MASK
@ 2016-03-30 23:48 green
From: green @ 2016-03-30 23:48 UTC
  To: Greg Kroah-Hartman, devel, Andreas Dilger
  Cc: Linux Kernel Mailing List, Lustre Development List, Oleg Drokin

From: Oleg Drokin <green@linuxhacker.ru>

CFS_PAGE_MASK is the same as PAGE_MASK, so get rid of it.

We are replacing it with PAGE_MASK instead of PAGE_CACHE_MASK
because the PAGE_CACHE_* macros are apparently going away.
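
For illustration, the two masking idioms this conversion preserves,
sketched assuming 4 KB pages (where PAGE_MASK expands to ~0xfffUL,
the same value CFS_PAGE_MASK had):

	unsigned long addr        = 0x12345;
	unsigned long page_base   = addr & PAGE_MASK;  /* 0x12000: page-aligned base */
	unsigned long in_page_off = addr & ~PAGE_MASK; /* 0x00345: offset within page */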

Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
---
 .../lustre/include/linux/libcfs/linux/linux-mem.h  |  1 -
 .../lustre/lnet/libcfs/linux/linux-crypto.c        |  2 +-
 drivers/staging/lustre/lnet/selftest/brw_test.c    |  2 +-
 drivers/staging/lustre/lustre/llite/llite_mmap.c   |  4 ++--
 drivers/staging/lustre/lustre/llite/rw26.c         |  8 ++++----
 drivers/staging/lustre/lustre/llite/vvp_io.c       |  6 +++---
 drivers/staging/lustre/lustre/lmv/lmv_obd.c        |  2 +-
 drivers/staging/lustre/lustre/obdclass/class_obd.c |  2 +-
 .../staging/lustre/lustre/obdecho/echo_client.c    |  6 +++---
 drivers/staging/lustre/lustre/osc/osc_cache.c      |  4 ++--
 drivers/staging/lustre/lustre/osc/osc_request.c    | 24 +++++++++++-----------
 drivers/staging/lustre/lustre/ptlrpc/sec_bulk.c    |  2 +-
 drivers/staging/lustre/lustre/ptlrpc/sec_plain.c   |  2 +-
 13 files changed, 32 insertions(+), 33 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/libcfs/linux/linux-mem.h b/drivers/staging/lustre/include/linux/libcfs/linux/linux-mem.h
index 0f2fd79..448379b 100644
--- a/drivers/staging/lustre/include/linux/libcfs/linux/linux-mem.h
+++ b/drivers/staging/lustre/include/linux/libcfs/linux/linux-mem.h
@@ -57,7 +57,6 @@
 #include "../libcfs_cpu.h"
 #endif
 
-#define CFS_PAGE_MASK		   (~((__u64)PAGE_CACHE_SIZE-1))
 #define page_index(p)       ((p)->index)
 
 #define memory_pressure_get() (current->flags & PF_MEMALLOC)
diff --git a/drivers/staging/lustre/lnet/libcfs/linux/linux-crypto.c b/drivers/staging/lustre/lnet/libcfs/linux/linux-crypto.c
index 0715101..84f9b7b 100644
--- a/drivers/staging/lustre/lnet/libcfs/linux/linux-crypto.c
+++ b/drivers/staging/lustre/lnet/libcfs/linux/linux-crypto.c
@@ -227,7 +227,7 @@ int cfs_crypto_hash_update_page(struct cfs_crypto_hash_desc *hdesc,
 	struct scatterlist sl;
 
 	sg_init_table(&sl, 1);
-	sg_set_page(&sl, page, len, offset & ~CFS_PAGE_MASK);
+	sg_set_page(&sl, page, len, offset & ~PAGE_MASK);
 
 	ahash_request_set_crypt(req, &sl, NULL, sl.length);
 	return crypto_ahash_update(req);
diff --git a/drivers/staging/lustre/lnet/selftest/brw_test.c b/drivers/staging/lustre/lnet/selftest/brw_test.c
index b33c356..1988cee 100644
--- a/drivers/staging/lustre/lnet/selftest/brw_test.c
+++ b/drivers/staging/lustre/lnet/selftest/brw_test.c
@@ -457,7 +457,7 @@ brw_server_handle(struct srpc_server_rpc *rpc)
 
 	if (!(reqstmsg->msg_ses_feats & LST_FEAT_BULK_LEN)) {
 		/* compat with old version */
-		if (reqst->brw_len & ~CFS_PAGE_MASK) {
+		if (reqst->brw_len & ~PAGE_MASK) {
 			reply->brw_status = EINVAL;
 			return 0;
 		}
diff --git a/drivers/staging/lustre/lustre/llite/llite_mmap.c b/drivers/staging/lustre/lustre/llite/llite_mmap.c
index 69445a9..baccf93 100644
--- a/drivers/staging/lustre/lustre/llite/llite_mmap.c
+++ b/drivers/staging/lustre/lustre/llite/llite_mmap.c
@@ -57,10 +57,10 @@ void policy_from_vma(ldlm_policy_data_t *policy,
 		     struct vm_area_struct *vma, unsigned long addr,
 		     size_t count)
 {
-	policy->l_extent.start = ((addr - vma->vm_start) & CFS_PAGE_MASK) +
+	policy->l_extent.start = ((addr - vma->vm_start) & PAGE_MASK) +
 				 (vma->vm_pgoff << PAGE_CACHE_SHIFT);
 	policy->l_extent.end = (policy->l_extent.start + count - 1) |
-			       ~CFS_PAGE_MASK;
+			       ~PAGE_MASK;
 }
 
 struct vm_area_struct *our_vma(struct mm_struct *mm, unsigned long addr,
diff --git a/drivers/staging/lustre/lustre/llite/rw26.c b/drivers/staging/lustre/lustre/llite/rw26.c
index 7a5db67..3d7e64e 100644
--- a/drivers/staging/lustre/lustre/llite/rw26.c
+++ b/drivers/staging/lustre/lustre/llite/rw26.c
@@ -376,7 +376,7 @@ static ssize_t ll_direct_IO_26(struct kiocb *iocb, struct iov_iter *iter,
 		return -EBADF;
 
 	/* FIXME: io smaller than PAGE_SIZE is broken on ia64 ??? */
-	if ((file_offset & ~CFS_PAGE_MASK) || (count & ~CFS_PAGE_MASK))
+	if ((file_offset & ~PAGE_MASK) || (count & ~PAGE_MASK))
 		return -EINVAL;
 
 	CDEBUG(D_VFSTRACE,
@@ -386,7 +386,7 @@ static ssize_t ll_direct_IO_26(struct kiocb *iocb, struct iov_iter *iter,
 	       MAX_DIO_SIZE >> PAGE_CACHE_SHIFT);
 
 	/* Check that all user buffers are aligned as well */
-	if (iov_iter_alignment(iter) & ~CFS_PAGE_MASK)
+	if (iov_iter_alignment(iter) & ~PAGE_MASK)
 		return -EINVAL;
 
 	env = cl_env_get(&refcheck);
@@ -435,8 +435,8 @@ static ssize_t ll_direct_IO_26(struct kiocb *iocb, struct iov_iter *iter,
 			    size > (PAGE_CACHE_SIZE / sizeof(*pages)) *
 				   PAGE_CACHE_SIZE) {
 				size = ((((size / 2) - 1) |
-					 ~CFS_PAGE_MASK) + 1) &
-					CFS_PAGE_MASK;
+					 ~PAGE_MASK) + 1) &
+					PAGE_MASK;
 				CDEBUG(D_VFSTRACE, "DIO size now %lu\n",
 				       size);
 				continue;
diff --git a/drivers/staging/lustre/lustre/llite/vvp_io.c b/drivers/staging/lustre/lustre/llite/vvp_io.c
index fb0c26e..984699a 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_io.c
+++ b/drivers/staging/lustre/lustre/llite/vvp_io.c
@@ -234,8 +234,8 @@ static int vvp_mmap_locks(const struct lu_env *env,
 		if (count == 0)
 			continue;
 
-		count += addr & (~CFS_PAGE_MASK);
-		addr &= CFS_PAGE_MASK;
+		count += addr & (~PAGE_MASK);
+		addr &= PAGE_MASK;
 
 		down_read(&mm->mmap_sem);
 		while ((vma = our_vma(mm, addr, count)) != NULL) {
@@ -1043,7 +1043,7 @@ static int vvp_io_commit_write(const struct lu_env *env,
 				to = PAGE_CACHE_SIZE;
 				need_clip = false;
 			} else if (last_index == pg->cp_index) {
-				int size_to = i_size_read(inode) & ~CFS_PAGE_MASK;
+				int size_to = i_size_read(inode) & ~PAGE_MASK;
 
 				if (to < size_to)
 					to = size_to;
diff --git a/drivers/staging/lustre/lustre/lmv/lmv_obd.c b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
index 0f776cf..e3a7216 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_obd.c
+++ b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
@@ -2071,7 +2071,7 @@ static void lmv_adjust_dirpages(struct page **pages, int ncfspgs, int nlupgs)
 			dp = (struct lu_dirpage *)((char *)dp + LU_PAGE_SIZE);
 
 			/* Check if we've reached the end of the CFS_PAGE. */
-			if (!((unsigned long)dp & ~CFS_PAGE_MASK))
+			if (!((unsigned long)dp & ~PAGE_MASK))
 				break;
 
 			/* Save the hash and flags of this lu_dirpage. */
diff --git a/drivers/staging/lustre/lustre/obdclass/class_obd.c b/drivers/staging/lustre/lustre/obdclass/class_obd.c
index 1a938e1..d9844ba 100644
--- a/drivers/staging/lustre/lustre/obdclass/class_obd.c
+++ b/drivers/staging/lustre/lustre/obdclass/class_obd.c
@@ -461,7 +461,7 @@ static int obd_init_checks(void)
 		CWARN("LPD64 wrong length! strlen(%s)=%d != 2\n", buf, len);
 		ret = -EINVAL;
 	}
-	if ((u64val & ~CFS_PAGE_MASK) >= PAGE_CACHE_SIZE) {
+	if ((u64val & ~PAGE_MASK) >= PAGE_CACHE_SIZE) {
 		CWARN("mask failed: u64val %llu >= %llu\n", u64val,
 		      (__u64)PAGE_CACHE_SIZE);
 		ret = -EINVAL;
diff --git a/drivers/staging/lustre/lustre/obdecho/echo_client.c b/drivers/staging/lustre/lustre/obdecho/echo_client.c
index 64ffe24..33ce0c8 100644
--- a/drivers/staging/lustre/lustre/obdecho/echo_client.c
+++ b/drivers/staging/lustre/lustre/obdecho/echo_client.c
@@ -1119,7 +1119,7 @@ static int cl_echo_object_brw(struct echo_object *eco, int rw, u64 offset,
 	int rc;
 	int i;
 
-	LASSERT((offset & ~CFS_PAGE_MASK) == 0);
+	LASSERT((offset & ~PAGE_MASK) == 0);
 	LASSERT(ed->ed_next);
 	env = cl_env_get(&refcheck);
 	if (IS_ERR(env))
@@ -1387,7 +1387,7 @@ static int echo_client_kbrw(struct echo_device *ed, int rw, struct obdo *oa,
 	LASSERT(rw == OBD_BRW_WRITE || rw == OBD_BRW_READ);
 
 	if (count <= 0 ||
-	    (count & (~CFS_PAGE_MASK)) != 0)
+	    (count & (~PAGE_MASK)) != 0)
 		return -EINVAL;
 
 	/* XXX think again with misaligned I/O */
@@ -1470,7 +1470,7 @@ static int echo_client_prep_commit(const struct lu_env *env,
 	u64 npages, tot_pages;
 	int i, ret = 0, brw_flags = 0;
 
-	if (count <= 0 || (count & (~CFS_PAGE_MASK)) != 0)
+	if (count <= 0 || (count & (~PAGE_MASK)) != 0)
 		return -EINVAL;
 
 	npages = batch >> PAGE_CACHE_SHIFT;
diff --git a/drivers/staging/lustre/lustre/osc/osc_cache.c b/drivers/staging/lustre/lustre/osc/osc_cache.c
index 6336311..3cfd2b0 100644
--- a/drivers/staging/lustre/lustre/osc/osc_cache.c
+++ b/drivers/staging/lustre/lustre/osc/osc_cache.c
@@ -877,7 +877,7 @@ int osc_extent_finish(const struct lu_env *env, struct osc_extent *ext,
 		 * span a whole chunk on the OST side, or our accounting goes
 		 * wrong.  Should match the code in filter_grant_check.
 		 */
-		int offset = oap->oap_page_off & ~CFS_PAGE_MASK;
+		int offset = oap->oap_page_off & ~PAGE_MASK;
 		int count = oap->oap_count + (offset & (blocksize - 1));
 		int end = (offset + oap->oap_count) & (blocksize - 1);
 
@@ -2238,7 +2238,7 @@ int osc_prep_async_page(struct osc_object *osc, struct osc_page *ops,
 
 	oap->oap_page = page;
 	oap->oap_obj_off = offset;
-	LASSERT(!(offset & ~CFS_PAGE_MASK));
+	LASSERT(!(offset & ~PAGE_MASK));
 
 	if (!client_is_remote(exp) && capable(CFS_CAP_SYS_RESOURCE))
 		oap->oap_brw_flags = OBD_BRW_NOQUOTA;
diff --git a/drivers/staging/lustre/lustre/osc/osc_request.c b/drivers/staging/lustre/lustre/osc/osc_request.c
index ec0287f..850d5dd 100644
--- a/drivers/staging/lustre/lustre/osc/osc_request.c
+++ b/drivers/staging/lustre/lustre/osc/osc_request.c
@@ -1082,7 +1082,7 @@ static void handle_short_read(int nob_read, u32 page_count,
 		if (pga[i]->count > nob_read) {
 			/* EOF inside this page */
 			ptr = kmap(pga[i]->pg) +
-				(pga[i]->off & ~CFS_PAGE_MASK);
+				(pga[i]->off & ~PAGE_MASK);
 			memset(ptr + nob_read, 0, pga[i]->count - nob_read);
 			kunmap(pga[i]->pg);
 			page_count--;
@@ -1097,7 +1097,7 @@ static void handle_short_read(int nob_read, u32 page_count,
 
 	/* zero remaining pages */
 	while (page_count-- > 0) {
-		ptr = kmap(pga[i]->pg) + (pga[i]->off & ~CFS_PAGE_MASK);
+		ptr = kmap(pga[i]->pg) + (pga[i]->off & ~PAGE_MASK);
 		memset(ptr, 0, pga[i]->count);
 		kunmap(pga[i]->pg);
 		i++;
@@ -1188,20 +1188,20 @@ static u32 osc_checksum_bulk(int nob, u32 pg_count,
 		if (i == 0 && opc == OST_READ &&
 		    OBD_FAIL_CHECK(OBD_FAIL_OSC_CHECKSUM_RECEIVE)) {
 			unsigned char *ptr = kmap(pga[i]->pg);
-			int off = pga[i]->off & ~CFS_PAGE_MASK;
+			int off = pga[i]->off & ~PAGE_MASK;
 
 			memcpy(ptr + off, "bad1", min(4, nob));
 			kunmap(pga[i]->pg);
 		}
 		cfs_crypto_hash_update_page(hdesc, pga[i]->pg,
-					    pga[i]->off & ~CFS_PAGE_MASK,
+					    pga[i]->off & ~PAGE_MASK,
 				  count);
 		CDEBUG(D_PAGE,
 		       "page %p map %p index %lu flags %lx count %u priv %0lx: off %d\n",
 		       pga[i]->pg, pga[i]->pg->mapping, pga[i]->pg->index,
 		       (long)pga[i]->pg->flags, page_count(pga[i]->pg),
 		       page_private(pga[i]->pg),
-		       (int)(pga[i]->off & ~CFS_PAGE_MASK));
+		       (int)(pga[i]->off & ~PAGE_MASK));
 
 		nob -= pga[i]->count;
 		pg_count--;
@@ -1309,7 +1309,7 @@ static int osc_brw_prep_request(int cmd, struct client_obd *cli,
 	pg_prev = pga[0];
 	for (requested_nob = i = 0; i < page_count; i++, niobuf++) {
 		struct brw_page *pg = pga[i];
-		int poff = pg->off & ~CFS_PAGE_MASK;
+		int poff = pg->off & ~PAGE_MASK;
 
 		LASSERT(pg->count > 0);
 		/* make sure there is no gap in the middle of page array */
@@ -2227,8 +2227,8 @@ int osc_enqueue_base(struct obd_export *exp, struct ldlm_res_id *res_id,
 	/* Filesystem lock extents are extended to page boundaries so that
 	 * dealing with the page cache is a little smoother.
 	 */
-	policy->l_extent.start -= policy->l_extent.start & ~CFS_PAGE_MASK;
-	policy->l_extent.end |= ~CFS_PAGE_MASK;
+	policy->l_extent.start -= policy->l_extent.start & ~PAGE_MASK;
+	policy->l_extent.end |= ~PAGE_MASK;
 
 	/*
 	 * kms is not valid when either object is completely fresh (so that no
@@ -2378,8 +2378,8 @@ int osc_match_base(struct obd_export *exp, struct ldlm_res_id *res_id,
 	/* Filesystem lock extents are extended to page boundaries so that
 	 * dealing with the page cache is a little smoother
 	 */
-	policy->l_extent.start -= policy->l_extent.start & ~CFS_PAGE_MASK;
-	policy->l_extent.end |= ~CFS_PAGE_MASK;
+	policy->l_extent.start -= policy->l_extent.start & ~PAGE_MASK;
+	policy->l_extent.end |= ~PAGE_MASK;
 
 	/* Next, search for already existing extent locks that will cover us */
 	/* If we're trying to read, we also search for an existing PW lock.  The
@@ -2784,7 +2784,7 @@ out:
 			goto skip_locking;
 
 		policy.l_extent.start = fm_key->fiemap.fm_start &
-						CFS_PAGE_MASK;
+						PAGE_MASK;
 
 		if (OBD_OBJECT_EOF - fm_key->fiemap.fm_length <=
 		    fm_key->fiemap.fm_start + PAGE_CACHE_SIZE - 1)
@@ -2792,7 +2792,7 @@ out:
 		else
 			policy.l_extent.end = (fm_key->fiemap.fm_start +
 				fm_key->fiemap.fm_length +
-				PAGE_CACHE_SIZE - 1) & CFS_PAGE_MASK;
+				PAGE_CACHE_SIZE - 1) & PAGE_MASK;
 
 		ostid_build_res_name(&fm_key->oa.o_oi, &res_id);
 		mode = ldlm_lock_match(exp->exp_obd->obd_namespace,
diff --git a/drivers/staging/lustre/lustre/ptlrpc/sec_bulk.c b/drivers/staging/lustre/lustre/ptlrpc/sec_bulk.c
index 54a0c1f..33d1411 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/sec_bulk.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/sec_bulk.c
@@ -527,7 +527,7 @@ int sptlrpc_get_bulk_checksum(struct ptlrpc_bulk_desc *desc, __u8 alg,
 
 	for (i = 0; i < desc->bd_iov_count; i++) {
 		cfs_crypto_hash_update_page(hdesc, desc->bd_iov[i].kiov_page,
-				  desc->bd_iov[i].kiov_offset & ~CFS_PAGE_MASK,
+				  desc->bd_iov[i].kiov_offset & ~PAGE_MASK,
 				  desc->bd_iov[i].kiov_len);
 	}
 
diff --git a/drivers/staging/lustre/lustre/ptlrpc/sec_plain.c b/drivers/staging/lustre/lustre/ptlrpc/sec_plain.c
index 6276bf5..37c9f4c 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/sec_plain.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/sec_plain.c
@@ -162,7 +162,7 @@ static void corrupt_bulk_data(struct ptlrpc_bulk_desc *desc)
 			continue;
 
 		ptr = kmap(desc->bd_iov[i].kiov_page);
-		off = desc->bd_iov[i].kiov_offset & ~CFS_PAGE_MASK;
+		off = desc->bd_iov[i].kiov_offset & ~PAGE_MASK;
 		ptr[off] ^= 0x1;
 		kunmap(desc->bd_iov[i].kiov_page);
 		return;
-- 
2.1.0


* [PATCH v2 03/46] staging/lustre: merge lclient/*.c into llite/
@ 2016-03-30 23:48 green
From: green @ 2016-03-30 23:48 UTC
  To: Greg Kroah-Hartman, devel, Andreas Dilger
  Cc: Linux Kernel Mailing List, Lustre Development List,
	John L. Hammond, Oleg Drokin

From: "John L. Hammond" <john.hammond@intel.com>

A separate lclient directory was needed when this code was shared
between different client implementations; it makes no sense to keep
it separate in the Linux kernel.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Based-on: http://review.whamcloud.com/10171
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
---
 drivers/staging/lustre/lustre/lclient/glimpse.c    |  270 -----
 drivers/staging/lustre/lustre/lclient/lcommon_cl.c | 1203 -------------------
 .../staging/lustre/lustre/lclient/lcommon_misc.c   |  200 ----
 drivers/staging/lustre/lustre/llite/Makefile       |    2 +-
 drivers/staging/lustre/lustre/llite/glimpse.c      |  273 +++++
 drivers/staging/lustre/lustre/llite/lcommon_cl.c   | 1209 ++++++++++++++++++++
 drivers/staging/lustre/lustre/llite/lcommon_misc.c |  200 ++++
 7 files changed, 1683 insertions(+), 1674 deletions(-)
 delete mode 100644 drivers/staging/lustre/lustre/lclient/glimpse.c
 delete mode 100644 drivers/staging/lustre/lustre/lclient/lcommon_cl.c
 delete mode 100644 drivers/staging/lustre/lustre/lclient/lcommon_misc.c
 create mode 100644 drivers/staging/lustre/lustre/llite/glimpse.c
 create mode 100644 drivers/staging/lustre/lustre/llite/lcommon_cl.c
 create mode 100644 drivers/staging/lustre/lustre/llite/lcommon_misc.c

diff --git a/drivers/staging/lustre/lustre/lclient/glimpse.c b/drivers/staging/lustre/lustre/lclient/glimpse.c
deleted file mode 100644
index c4e8a08..0000000
--- a/drivers/staging/lustre/lustre/lclient/glimpse.c
+++ /dev/null
@@ -1,270 +0,0 @@
-/*
- * GPL HEADER START
- *
- * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 only,
- * as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
- * General Public License version 2 for more details (a copy is included
- * in the LICENSE file that accompanied this code).
- *
- * You should have received a copy of the GNU General Public License
- * version 2 along with this program; If not, see
- * http://www.sun.com/software/products/lustre/docs/GPLv2.pdf
- *
- * Please contact Sun Microsystems, Inc., 4150 Network Circle, Santa Clara,
- * CA 95054 USA or visit www.sun.com if you need additional information or
- * have any questions.
- *
- * GPL HEADER END
- */
-/*
- * Copyright (c) 2008, 2010, Oracle and/or its affiliates. All rights reserved.
- * Use is subject to license terms.
- *
- * Copyright (c) 2011, 2012, Intel Corporation.
- */
-/*
- * This file is part of Lustre, http://www.lustre.org/
- * Lustre is a trademark of Sun Microsystems, Inc.
- *
- * glimpse code shared between vvp and liblustre (and other Lustre clients in
- * the future).
- *
- *   Author: Nikita Danilov <nikita.danilov@sun.com>
- *   Author: Oleg Drokin <oleg.drokin@sun.com>
- */
-
-#include "../../include/linux/libcfs/libcfs.h"
-#include "../include/obd_class.h"
-#include "../include/obd_support.h"
-#include "../include/obd.h"
-
-#include "../include/lustre_dlm.h"
-#include "../include/lustre_lite.h"
-#include "../include/lustre_mdc.h"
-#include <linux/pagemap.h>
-#include <linux/file.h>
-
-#include "../include/cl_object.h"
-#include "../include/lclient.h"
-#include "../llite/llite_internal.h"
-
-static const struct cl_lock_descr whole_file = {
-	.cld_start = 0,
-	.cld_end   = CL_PAGE_EOF,
-	.cld_mode  = CLM_READ
-};
-
-/*
- * Check whether file has possible unwriten pages.
- *
- * \retval 1    file is mmap-ed or has dirty pages
- *	 0    otherwise
- */
-blkcnt_t dirty_cnt(struct inode *inode)
-{
-	blkcnt_t cnt = 0;
-	struct ccc_object *vob = cl_inode2ccc(inode);
-	void	      *results[1];
-
-	if (inode->i_mapping)
-		cnt += radix_tree_gang_lookup_tag(&inode->i_mapping->page_tree,
-						  results, 0, 1,
-						  PAGECACHE_TAG_DIRTY);
-	if (cnt == 0 && atomic_read(&vob->cob_mmap_cnt) > 0)
-		cnt = 1;
-
-	return (cnt > 0) ? 1 : 0;
-}
-
-int cl_glimpse_lock(const struct lu_env *env, struct cl_io *io,
-		    struct inode *inode, struct cl_object *clob, int agl)
-{
-	struct cl_lock_descr *descr = &ccc_env_info(env)->cti_descr;
-	struct cl_inode_info *lli   = cl_i2info(inode);
-	const struct lu_fid  *fid   = lu_object_fid(&clob->co_lu);
-	struct ccc_io	*cio   = ccc_env_io(env);
-	struct cl_lock       *lock;
-	int result;
-
-	result = 0;
-	if (!(lli->lli_flags & LLIF_MDS_SIZE_LOCK)) {
-		CDEBUG(D_DLMTRACE, "Glimpsing inode "DFID"\n", PFID(fid));
-		if (lli->lli_has_smd) {
-			/* NOTE: this looks like DLM lock request, but it may
-			 *       not be one. Due to CEF_ASYNC flag (translated
-			 *       to LDLM_FL_HAS_INTENT by osc), this is
-			 *       glimpse request, that won't revoke any
-			 *       conflicting DLM locks held. Instead,
-			 *       ll_glimpse_callback() will be called on each
-			 *       client holding a DLM lock against this file,
-			 *       and resulting size will be returned for each
-			 *       stripe. DLM lock on [0, EOF] is acquired only
-			 *       if there were no conflicting locks. If there
-			 *       were conflicting locks, enqueuing or waiting
-			 *       fails with -ENAVAIL, but valid inode
-			 *       attributes are returned anyway.
-			 */
-			*descr = whole_file;
-			descr->cld_obj   = clob;
-			descr->cld_mode  = CLM_PHANTOM;
-			descr->cld_enq_flags = CEF_ASYNC | CEF_MUST;
-			if (agl)
-				descr->cld_enq_flags |= CEF_AGL;
-			cio->cui_glimpse = 1;
-			/*
-			 * CEF_ASYNC is used because glimpse sub-locks cannot
-			 * deadlock (because they never conflict with other
-			 * locks) and, hence, can be enqueued out-of-order.
-			 *
-			 * CEF_MUST protects glimpse lock from conversion into
-			 * a lockless mode.
-			 */
-			lock = cl_lock_request(env, io, descr, "glimpse",
-					       current);
-			cio->cui_glimpse = 0;
-
-			if (!lock)
-				return 0;
-
-			if (IS_ERR(lock))
-				return PTR_ERR(lock);
-
-			LASSERT(agl == 0);
-			result = cl_wait(env, lock);
-			if (result == 0) {
-				cl_merge_lvb(env, inode);
-				if (cl_isize_read(inode) > 0 &&
-				    inode->i_blocks == 0) {
-					/*
-					 * LU-417: Add dirty pages block count
-					 * lest i_blocks reports 0, some "cp" or
-					 * "tar" may think it's a completely
-					 * sparse file and skip it.
-					 */
-					inode->i_blocks = dirty_cnt(inode);
-				}
-				cl_unuse(env, lock);
-			}
-			cl_lock_release(env, lock, "glimpse", current);
-		} else {
-			CDEBUG(D_DLMTRACE, "No objects for inode\n");
-			cl_merge_lvb(env, inode);
-		}
-	}
-
-	return result;
-}
-
-static int cl_io_get(struct inode *inode, struct lu_env **envout,
-		     struct cl_io **ioout, int *refcheck)
-{
-	struct lu_env	  *env;
-	struct cl_io	   *io;
-	struct cl_inode_info   *lli = cl_i2info(inode);
-	struct cl_object       *clob = lli->lli_clob;
-	int result;
-
-	if (S_ISREG(cl_inode_mode(inode))) {
-		env = cl_env_get(refcheck);
-		if (!IS_ERR(env)) {
-			io = ccc_env_thread_io(env);
-			io->ci_obj = clob;
-			*envout = env;
-			*ioout  = io;
-			result = 1;
-		} else
-			result = PTR_ERR(env);
-	} else
-		result = 0;
-	return result;
-}
-
-int cl_glimpse_size0(struct inode *inode, int agl)
-{
-	/*
-	 * We don't need ast_flags argument to cl_glimpse_size(), because
-	 * osc_lock_enqueue() takes care of the possible deadlock that said
-	 * argument was introduced to avoid.
-	 */
-	/*
-	 * XXX but note that ll_file_seek() passes LDLM_FL_BLOCK_NOWAIT to
-	 * cl_glimpse_size(), which doesn't make sense: glimpse locks are not
-	 * blocking anyway.
-	 */
-	struct lu_env	  *env = NULL;
-	struct cl_io	   *io  = NULL;
-	int		     result;
-	int		     refcheck;
-
-	result = cl_io_get(inode, &env, &io, &refcheck);
-	if (result > 0) {
-again:
-		io->ci_verify_layout = 1;
-		result = cl_io_init(env, io, CIT_MISC, io->ci_obj);
-		if (result > 0)
-			/*
-			 * nothing to do for this io. This currently happens
-			 * when stripe sub-object's are not yet created.
-			 */
-			result = io->ci_result;
-		else if (result == 0)
-			result = cl_glimpse_lock(env, io, inode, io->ci_obj,
-						 agl);
-
-		OBD_FAIL_TIMEOUT(OBD_FAIL_GLIMPSE_DELAY, 2);
-		cl_io_fini(env, io);
-		if (unlikely(io->ci_need_restart))
-			goto again;
-		cl_env_put(env, &refcheck);
-	}
-	return result;
-}
-
-int cl_local_size(struct inode *inode)
-{
-	struct lu_env	   *env = NULL;
-	struct cl_io	    *io  = NULL;
-	struct ccc_thread_info  *cti;
-	struct cl_object	*clob;
-	struct cl_lock_descr    *descr;
-	struct cl_lock	  *lock;
-	int		      result;
-	int		      refcheck;
-
-	if (!cl_i2info(inode)->lli_has_smd)
-		return 0;
-
-	result = cl_io_get(inode, &env, &io, &refcheck);
-	if (result <= 0)
-		return result;
-
-	clob = io->ci_obj;
-	result = cl_io_init(env, io, CIT_MISC, clob);
-	if (result > 0)
-		result = io->ci_result;
-	else if (result == 0) {
-		cti = ccc_env_info(env);
-		descr = &cti->cti_descr;
-
-		*descr = whole_file;
-		descr->cld_obj = clob;
-		lock = cl_lock_peek(env, io, descr, "localsize", current);
-		if (lock) {
-			cl_merge_lvb(env, inode);
-			cl_unuse(env, lock);
-			cl_lock_release(env, lock, "localsize", current);
-			result = 0;
-		} else
-			result = -ENODATA;
-	}
-	cl_io_fini(env, io);
-	cl_env_put(env, &refcheck);
-	return result;
-}
diff --git a/drivers/staging/lustre/lustre/lclient/lcommon_cl.c b/drivers/staging/lustre/lustre/lclient/lcommon_cl.c
deleted file mode 100644
index aced41a..0000000
--- a/drivers/staging/lustre/lustre/lclient/lcommon_cl.c
+++ /dev/null
@@ -1,1203 +0,0 @@
-/*
- * GPL HEADER START
- *
- * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 only,
- * as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
- * General Public License version 2 for more details (a copy is included
- * in the LICENSE file that accompanied this code).
- *
- * You should have received a copy of the GNU General Public License
- * version 2 along with this program; If not, see
- * http://www.sun.com/software/products/lustre/docs/GPLv2.pdf
- *
- * Please contact Sun Microsystems, Inc., 4150 Network Circle, Santa Clara,
- * CA 95054 USA or visit www.sun.com if you need additional information or
- * have any questions.
- *
- * GPL HEADER END
- */
-/*
- * Copyright (c) 2008, 2010, Oracle and/or its affiliates. All rights reserved.
- * Use is subject to license terms.
- *
- * Copyright (c) 2011, 2015, Intel Corporation.
- */
-/*
- * This file is part of Lustre, http://www.lustre.org/
- * Lustre is a trademark of Sun Microsystems, Inc.
- *
- * cl code shared between vvp and liblustre (and other Lustre clients in the
- * future).
- *
- *   Author: Nikita Danilov <nikita.danilov@sun.com>
- */
-
-#define DEBUG_SUBSYSTEM S_LLITE
-
-#include "../../include/linux/libcfs/libcfs.h"
-# include <linux/fs.h>
-# include <linux/sched.h>
-# include <linux/mm.h>
-# include <linux/quotaops.h>
-# include <linux/highmem.h>
-# include <linux/pagemap.h>
-# include <linux/rbtree.h>
-
-#include "../include/obd.h"
-#include "../include/obd_support.h"
-#include "../include/lustre_fid.h"
-#include "../include/lustre_lite.h"
-#include "../include/lustre_dlm.h"
-#include "../include/lustre_ver.h"
-#include "../include/lustre_mdc.h"
-#include "../include/cl_object.h"
-
-#include "../include/lclient.h"
-
-#include "../llite/llite_internal.h"
-
-static const struct cl_req_operations ccc_req_ops;
-
-/*
- * ccc_ prefix stands for "Common Client Code".
- */
-
-static struct kmem_cache *ccc_lock_kmem;
-static struct kmem_cache *ccc_object_kmem;
-static struct kmem_cache *ccc_thread_kmem;
-static struct kmem_cache *ccc_session_kmem;
-static struct kmem_cache *ccc_req_kmem;
-
-static struct lu_kmem_descr ccc_caches[] = {
-	{
-		.ckd_cache = &ccc_lock_kmem,
-		.ckd_name  = "ccc_lock_kmem",
-		.ckd_size  = sizeof(struct ccc_lock)
-	},
-	{
-		.ckd_cache = &ccc_object_kmem,
-		.ckd_name  = "ccc_object_kmem",
-		.ckd_size  = sizeof(struct ccc_object)
-	},
-	{
-		.ckd_cache = &ccc_thread_kmem,
-		.ckd_name  = "ccc_thread_kmem",
-		.ckd_size  = sizeof(struct ccc_thread_info),
-	},
-	{
-		.ckd_cache = &ccc_session_kmem,
-		.ckd_name  = "ccc_session_kmem",
-		.ckd_size  = sizeof(struct ccc_session)
-	},
-	{
-		.ckd_cache = &ccc_req_kmem,
-		.ckd_name  = "ccc_req_kmem",
-		.ckd_size  = sizeof(struct ccc_req)
-	},
-	{
-		.ckd_cache = NULL
-	}
-};
-
-/*****************************************************************************
- *
- * Vvp device and device type functions.
- *
- */
-
-void *ccc_key_init(const struct lu_context *ctx, struct lu_context_key *key)
-{
-	struct ccc_thread_info *info;
-
-	info = kmem_cache_zalloc(ccc_thread_kmem, GFP_NOFS);
-	if (!info)
-		info = ERR_PTR(-ENOMEM);
-	return info;
-}
-
-void ccc_key_fini(const struct lu_context *ctx,
-			 struct lu_context_key *key, void *data)
-{
-	struct ccc_thread_info *info = data;
-
-	kmem_cache_free(ccc_thread_kmem, info);
-}
-
-void *ccc_session_key_init(const struct lu_context *ctx,
-				  struct lu_context_key *key)
-{
-	struct ccc_session *session;
-
-	session = kmem_cache_zalloc(ccc_session_kmem, GFP_NOFS);
-	if (!session)
-		session = ERR_PTR(-ENOMEM);
-	return session;
-}
-
-void ccc_session_key_fini(const struct lu_context *ctx,
-				 struct lu_context_key *key, void *data)
-{
-	struct ccc_session *session = data;
-
-	kmem_cache_free(ccc_session_kmem, session);
-}
-
-struct lu_context_key ccc_key = {
-	.lct_tags = LCT_CL_THREAD,
-	.lct_init = ccc_key_init,
-	.lct_fini = ccc_key_fini
-};
-
-struct lu_context_key ccc_session_key = {
-	.lct_tags = LCT_SESSION,
-	.lct_init = ccc_session_key_init,
-	.lct_fini = ccc_session_key_fini
-};
-
-/* type constructor/destructor: ccc_type_{init,fini,start,stop}(). */
-/* LU_TYPE_INIT_FINI(ccc, &ccc_key, &ccc_session_key); */
-
-int ccc_device_init(const struct lu_env *env, struct lu_device *d,
-			   const char *name, struct lu_device *next)
-{
-	struct ccc_device  *vdv;
-	int rc;
-
-	vdv = lu2ccc_dev(d);
-	vdv->cdv_next = lu2cl_dev(next);
-
-	LASSERT(d->ld_site && next->ld_type);
-	next->ld_site = d->ld_site;
-	rc = next->ld_type->ldt_ops->ldto_device_init(
-			env, next, next->ld_type->ldt_name, NULL);
-	if (rc == 0) {
-		lu_device_get(next);
-		lu_ref_add(&next->ld_reference, "lu-stack", &lu_site_init);
-	}
-	return rc;
-}
-
-struct lu_device *ccc_device_fini(const struct lu_env *env,
-					 struct lu_device *d)
-{
-	return cl2lu_dev(lu2ccc_dev(d)->cdv_next);
-}
-
-struct lu_device *ccc_device_alloc(const struct lu_env *env,
-				   struct lu_device_type *t,
-				   struct lustre_cfg *cfg,
-				   const struct lu_device_operations *luops,
-				   const struct cl_device_operations *clops)
-{
-	struct ccc_device *vdv;
-	struct lu_device  *lud;
-	struct cl_site    *site;
-	int rc;
-
-	vdv = kzalloc(sizeof(*vdv), GFP_NOFS);
-	if (!vdv)
-		return ERR_PTR(-ENOMEM);
-
-	lud = &vdv->cdv_cl.cd_lu_dev;
-	cl_device_init(&vdv->cdv_cl, t);
-	ccc2lu_dev(vdv)->ld_ops = luops;
-	vdv->cdv_cl.cd_ops = clops;
-
-	site = kzalloc(sizeof(*site), GFP_NOFS);
-	if (site) {
-		rc = cl_site_init(site, &vdv->cdv_cl);
-		if (rc == 0)
-			rc = lu_site_init_finish(&site->cs_lu);
-		else {
-			LASSERT(!lud->ld_site);
-			CERROR("Cannot init lu_site, rc %d.\n", rc);
-			kfree(site);
-		}
-	} else
-		rc = -ENOMEM;
-	if (rc != 0) {
-		ccc_device_free(env, lud);
-		lud = ERR_PTR(rc);
-	}
-	return lud;
-}
-
-struct lu_device *ccc_device_free(const struct lu_env *env,
-					 struct lu_device *d)
-{
-	struct ccc_device *vdv  = lu2ccc_dev(d);
-	struct cl_site    *site = lu2cl_site(d->ld_site);
-	struct lu_device  *next = cl2lu_dev(vdv->cdv_next);
-
-	if (d->ld_site) {
-		cl_site_fini(site);
-		kfree(site);
-	}
-	cl_device_fini(lu2cl_dev(d));
-	kfree(vdv);
-	return next;
-}
-
-int ccc_req_init(const struct lu_env *env, struct cl_device *dev,
-			struct cl_req *req)
-{
-	struct ccc_req *vrq;
-	int result;
-
-	vrq = kmem_cache_zalloc(ccc_req_kmem, GFP_NOFS);
-	if (vrq) {
-		cl_req_slice_add(req, &vrq->crq_cl, dev, &ccc_req_ops);
-		result = 0;
-	} else
-		result = -ENOMEM;
-	return result;
-}
-
-/**
- * An `emergency' environment used by ccc_inode_fini() when cl_env_get()
- * fails. Access to this environment is serialized by ccc_inode_fini_guard
- * mutex.
- */
-static struct lu_env *ccc_inode_fini_env;
-
-/**
- * A mutex serializing calls to slp_inode_fini() under extreme memory
- * pressure, when environments cannot be allocated.
- */
-static DEFINE_MUTEX(ccc_inode_fini_guard);
-static int dummy_refcheck;
-
-int ccc_global_init(struct lu_device_type *device_type)
-{
-	int result;
-
-	result = lu_kmem_init(ccc_caches);
-	if (result)
-		return result;
-
-	result = lu_device_type_init(device_type);
-	if (result)
-		goto out_kmem;
-
-	ccc_inode_fini_env = cl_env_alloc(&dummy_refcheck,
-					  LCT_REMEMBER|LCT_NOREF);
-	if (IS_ERR(ccc_inode_fini_env)) {
-		result = PTR_ERR(ccc_inode_fini_env);
-		goto out_device;
-	}
-
-	ccc_inode_fini_env->le_ctx.lc_cookie = 0x4;
-	return 0;
-out_device:
-	lu_device_type_fini(device_type);
-out_kmem:
-	lu_kmem_fini(ccc_caches);
-	return result;
-}
-
-void ccc_global_fini(struct lu_device_type *device_type)
-{
-	if (ccc_inode_fini_env) {
-		cl_env_put(ccc_inode_fini_env, &dummy_refcheck);
-		ccc_inode_fini_env = NULL;
-	}
-	lu_device_type_fini(device_type);
-	lu_kmem_fini(ccc_caches);
-}
-
-/*****************************************************************************
- *
- * Object operations.
- *
- */
-
-struct lu_object *ccc_object_alloc(const struct lu_env *env,
-				   const struct lu_object_header *unused,
-				   struct lu_device *dev,
-				   const struct cl_object_operations *clops,
-				   const struct lu_object_operations *luops)
-{
-	struct ccc_object *vob;
-	struct lu_object  *obj;
-
-	vob = kmem_cache_zalloc(ccc_object_kmem, GFP_NOFS);
-	if (vob) {
-		struct cl_object_header *hdr;
-
-		obj = ccc2lu(vob);
-		hdr = &vob->cob_header;
-		cl_object_header_init(hdr);
-		lu_object_init(obj, &hdr->coh_lu, dev);
-		lu_object_add_top(&hdr->coh_lu, obj);
-
-		vob->cob_cl.co_ops = clops;
-		obj->lo_ops = luops;
-	} else
-		obj = NULL;
-	return obj;
-}
-
-int ccc_object_init0(const struct lu_env *env,
-			    struct ccc_object *vob,
-			    const struct cl_object_conf *conf)
-{
-	vob->cob_inode = conf->coc_inode;
-	vob->cob_transient_pages = 0;
-	cl_object_page_init(&vob->cob_cl, sizeof(struct ccc_page));
-	return 0;
-}
-
-int ccc_object_init(const struct lu_env *env, struct lu_object *obj,
-			   const struct lu_object_conf *conf)
-{
-	struct ccc_device *dev = lu2ccc_dev(obj->lo_dev);
-	struct ccc_object *vob = lu2ccc(obj);
-	struct lu_object  *below;
-	struct lu_device  *under;
-	int result;
-
-	under = &dev->cdv_next->cd_lu_dev;
-	below = under->ld_ops->ldo_object_alloc(env, obj->lo_header, under);
-	if (below) {
-		const struct cl_object_conf *cconf;
-
-		cconf = lu2cl_conf(conf);
-		INIT_LIST_HEAD(&vob->cob_pending_list);
-		lu_object_add(obj, below);
-		result = ccc_object_init0(env, vob, cconf);
-	} else
-		result = -ENOMEM;
-	return result;
-}
-
-void ccc_object_free(const struct lu_env *env, struct lu_object *obj)
-{
-	struct ccc_object *vob = lu2ccc(obj);
-
-	lu_object_fini(obj);
-	lu_object_header_fini(obj->lo_header);
-	kmem_cache_free(ccc_object_kmem, vob);
-}
-
-int ccc_lock_init(const struct lu_env *env,
-		  struct cl_object *obj, struct cl_lock *lock,
-		  const struct cl_io *unused,
-		  const struct cl_lock_operations *lkops)
-{
-	struct ccc_lock *clk;
-	int result;
-
-	CLOBINVRNT(env, obj, ccc_object_invariant(obj));
-
-	clk = kmem_cache_zalloc(ccc_lock_kmem, GFP_NOFS);
-	if (clk) {
-		cl_lock_slice_add(lock, &clk->clk_cl, obj, lkops);
-		result = 0;
-	} else
-		result = -ENOMEM;
-	return result;
-}
-
-int ccc_object_glimpse(const struct lu_env *env,
-		       const struct cl_object *obj, struct ost_lvb *lvb)
-{
-	struct inode *inode = ccc_object_inode(obj);
-
-	lvb->lvb_mtime = cl_inode_mtime(inode);
-	lvb->lvb_atime = cl_inode_atime(inode);
-	lvb->lvb_ctime = cl_inode_ctime(inode);
-	/*
-	 * LU-417: Add dirty pages block count lest i_blocks reports 0, some
-	 * "cp" or "tar" on remote node may think it's a completely sparse file
-	 * and skip it.
-	 */
-	if (lvb->lvb_size > 0 && lvb->lvb_blocks == 0)
-		lvb->lvb_blocks = dirty_cnt(inode);
-	return 0;
-}
-
-static void ccc_object_size_lock(struct cl_object *obj)
-{
-	struct inode *inode = ccc_object_inode(obj);
-
-	ll_inode_size_lock(inode);
-	cl_object_attr_lock(obj);
-}
-
-static void ccc_object_size_unlock(struct cl_object *obj)
-{
-	struct inode *inode = ccc_object_inode(obj);
-
-	cl_object_attr_unlock(obj);
-	ll_inode_size_unlock(inode);
-}
-
-/*****************************************************************************
- *
- * Page operations.
- *
- */
-
-struct page *ccc_page_vmpage(const struct lu_env *env,
-			    const struct cl_page_slice *slice)
-{
-	return cl2vm_page(slice);
-}
-
-int ccc_page_is_under_lock(const struct lu_env *env,
-			   const struct cl_page_slice *slice,
-			   struct cl_io *io)
-{
-	struct ccc_io	*cio  = ccc_env_io(env);
-	struct cl_lock_descr *desc = &ccc_env_info(env)->cti_descr;
-	struct cl_page       *page = slice->cpl_page;
-
-	int result;
-
-	if (io->ci_type == CIT_READ || io->ci_type == CIT_WRITE ||
-	    io->ci_type == CIT_FAULT) {
-		if (cio->cui_fd->fd_flags & LL_FILE_GROUP_LOCKED)
-			result = -EBUSY;
-		else {
-			desc->cld_start = page->cp_index;
-			desc->cld_end   = page->cp_index;
-			desc->cld_obj   = page->cp_obj;
-			desc->cld_mode  = CLM_READ;
-			result = cl_queue_match(&io->ci_lockset.cls_done,
-						desc) ? -EBUSY : 0;
-		}
-	} else
-		result = 0;
-	return result;
-}
-
-int ccc_fail(const struct lu_env *env, const struct cl_page_slice *slice)
-{
-	/*
-	 * Cached read?
-	 */
-	LBUG();
-	return 0;
-}
-
-int ccc_transient_page_prep(const struct lu_env *env,
-				   const struct cl_page_slice *slice,
-				   struct cl_io *unused)
-{
-	/* transient page should always be sent. */
-	return 0;
-}
-
-/*****************************************************************************
- *
- * Lock operations.
- *
- */
-
-void ccc_lock_delete(const struct lu_env *env,
-		     const struct cl_lock_slice *slice)
-{
-	CLOBINVRNT(env, slice->cls_obj, ccc_object_invariant(slice->cls_obj));
-}
-
-void ccc_lock_fini(const struct lu_env *env, struct cl_lock_slice *slice)
-{
-	struct ccc_lock *clk = cl2ccc_lock(slice);
-
-	kmem_cache_free(ccc_lock_kmem, clk);
-}
-
-int ccc_lock_enqueue(const struct lu_env *env,
-		     const struct cl_lock_slice *slice,
-		     struct cl_io *unused, __u32 enqflags)
-{
-	CLOBINVRNT(env, slice->cls_obj, ccc_object_invariant(slice->cls_obj));
-	return 0;
-}
-
-int ccc_lock_use(const struct lu_env *env, const struct cl_lock_slice *slice)
-{
-	CLOBINVRNT(env, slice->cls_obj, ccc_object_invariant(slice->cls_obj));
-	return 0;
-}
-
-int ccc_lock_unuse(const struct lu_env *env, const struct cl_lock_slice *slice)
-{
-	CLOBINVRNT(env, slice->cls_obj, ccc_object_invariant(slice->cls_obj));
-	return 0;
-}
-
-int ccc_lock_wait(const struct lu_env *env, const struct cl_lock_slice *slice)
-{
-	CLOBINVRNT(env, slice->cls_obj, ccc_object_invariant(slice->cls_obj));
-	return 0;
-}
-
-/**
- * Implementation of cl_lock_operations::clo_fits_into() methods for ccc
- * layer. This function is executed every time io finds an existing lock in
- * the lock cache while creating new lock. This function has to decide whether
- * cached lock "fits" into io.
- *
- * \param slice lock to be checked
- * \param io    IO that wants a lock.
- *
- * \see lov_lock_fits_into().
- */
-int ccc_lock_fits_into(const struct lu_env *env,
-		       const struct cl_lock_slice *slice,
-		       const struct cl_lock_descr *need,
-		       const struct cl_io *io)
-{
-	const struct cl_lock       *lock  = slice->cls_lock;
-	const struct cl_lock_descr *descr = &lock->cll_descr;
-	const struct ccc_io	*cio   = ccc_env_io(env);
-	int			 result;
-
-	/*
-	 * Work around DLM peculiarity: it assumes that glimpse
-	 * (LDLM_FL_HAS_INTENT) lock is always LCK_PR, and returns reads lock
-	 * when asked for LCK_PW lock with LDLM_FL_HAS_INTENT flag set. Make
-	 * sure that glimpse doesn't get CLM_WRITE top-lock, so that it
-	 * doesn't enqueue CLM_WRITE sub-locks.
-	 */
-	if (cio->cui_glimpse)
-		result = descr->cld_mode != CLM_WRITE;
-
-	/*
-	 * Also, don't match incomplete write locks for read, otherwise read
-	 * would enqueue missing sub-locks in the write mode.
-	 */
-	else if (need->cld_mode != descr->cld_mode)
-		result = lock->cll_state >= CLS_ENQUEUED;
-	else
-		result = 1;
-	return result;
-}
-
-/**
- * Implements cl_lock_operations::clo_state() method for ccc layer, invoked
- * whenever lock state changes. Transfers object attributes, that might be
- * updated as a result of lock acquiring into inode.
- */
-void ccc_lock_state(const struct lu_env *env,
-		    const struct cl_lock_slice *slice,
-		    enum cl_lock_state state)
-{
-	struct cl_lock *lock = slice->cls_lock;
-
-	/*
-	 * Refresh inode attributes when the lock is moving into CLS_HELD
-	 * state, and only when this is a result of real enqueue, rather than
-	 * of finding lock in the cache.
-	 */
-	if (state == CLS_HELD && lock->cll_state < CLS_HELD) {
-		struct cl_object *obj;
-		struct inode     *inode;
-
-		obj   = slice->cls_obj;
-		inode = ccc_object_inode(obj);
-
-		/* vmtruncate() sets the i_size
-		 * under both a DLM lock and the
-		 * ll_inode_size_lock().  If we don't get the
-		 * ll_inode_size_lock() here we can match the DLM lock and
-		 * reset i_size.  generic_file_write can then trust the
-		 * stale i_size when doing appending writes and effectively
-		 * cancel the result of the truncate.  Getting the
-		 * ll_inode_size_lock() after the enqueue maintains the DLM
-		 * -> ll_inode_size_lock() acquiring order.
-		 */
-		if (lock->cll_descr.cld_start == 0 &&
-		    lock->cll_descr.cld_end == CL_PAGE_EOF)
-			cl_merge_lvb(env, inode);
-	}
-}
-
-/*****************************************************************************
- *
- * io operations.
- *
- */
-
-int ccc_io_one_lock_index(const struct lu_env *env, struct cl_io *io,
-			  __u32 enqflags, enum cl_lock_mode mode,
-			  pgoff_t start, pgoff_t end)
-{
-	struct ccc_io	  *cio   = ccc_env_io(env);
-	struct cl_lock_descr   *descr = &cio->cui_link.cill_descr;
-	struct cl_object       *obj   = io->ci_obj;
-
-	CLOBINVRNT(env, obj, ccc_object_invariant(obj));
-
-	CDEBUG(D_VFSTRACE, "lock: %d [%lu, %lu]\n", mode, start, end);
-
-	memset(&cio->cui_link, 0, sizeof(cio->cui_link));
-
-	if (cio->cui_fd && (cio->cui_fd->fd_flags & LL_FILE_GROUP_LOCKED)) {
-		descr->cld_mode = CLM_GROUP;
-		descr->cld_gid  = cio->cui_fd->fd_grouplock.cg_gid;
-	} else {
-		descr->cld_mode  = mode;
-	}
-	descr->cld_obj   = obj;
-	descr->cld_start = start;
-	descr->cld_end   = end;
-	descr->cld_enq_flags = enqflags;
-
-	cl_io_lock_add(env, io, &cio->cui_link);
-	return 0;
-}
-
-void ccc_io_update_iov(const struct lu_env *env,
-		       struct ccc_io *cio, struct cl_io *io)
-{
-	size_t size = io->u.ci_rw.crw_count;
-
-	if (!cl_is_normalio(env, io) || !cio->cui_iter)
-		return;
-
-	iov_iter_truncate(cio->cui_iter, size);
-}
-
-int ccc_io_one_lock(const struct lu_env *env, struct cl_io *io,
-		    __u32 enqflags, enum cl_lock_mode mode,
-		    loff_t start, loff_t end)
-{
-	struct cl_object *obj = io->ci_obj;
-
-	return ccc_io_one_lock_index(env, io, enqflags, mode,
-				     cl_index(obj, start), cl_index(obj, end));
-}
-
-void ccc_io_end(const struct lu_env *env, const struct cl_io_slice *ios)
-{
-	CLOBINVRNT(env, ios->cis_io->ci_obj,
-		   ccc_object_invariant(ios->cis_io->ci_obj));
-}
-
-void ccc_io_advance(const struct lu_env *env,
-		    const struct cl_io_slice *ios,
-		    size_t nob)
-{
-	struct ccc_io    *cio = cl2ccc_io(env, ios);
-	struct cl_io     *io  = ios->cis_io;
-	struct cl_object *obj = ios->cis_io->ci_obj;
-
-	CLOBINVRNT(env, obj, ccc_object_invariant(obj));
-
-	if (!cl_is_normalio(env, io))
-		return;
-
-	iov_iter_reexpand(cio->cui_iter, cio->cui_tot_count  -= nob);
-}
-
-/**
- * Helper function that if necessary adjusts file size (inode->i_size), when
- * position at the offset \a pos is accessed. File size can be arbitrary stale
- * on a Lustre client, but client at least knows KMS. If accessed area is
- * inside [0, KMS], set file size to KMS, otherwise glimpse file size.
- *
- * Locking: cl_isize_lock is used to serialize changes to inode size and to
- * protect consistency between inode size and cl_object
- * attributes. cl_object_size_lock() protects consistency between cl_attr's of
- * top-object and sub-objects.
- */
-int ccc_prep_size(const struct lu_env *env, struct cl_object *obj,
-		  struct cl_io *io, loff_t start, size_t count, int *exceed)
-{
-	struct cl_attr *attr  = ccc_env_thread_attr(env);
-	struct inode   *inode = ccc_object_inode(obj);
-	loff_t	  pos   = start + count - 1;
-	loff_t kms;
-	int result;
-
-	/*
-	 * Consistency guarantees: following possibilities exist for the
-	 * relation between region being accessed and real file size at this
-	 * moment:
-	 *
-	 *  (A): the region is completely inside of the file;
-	 *
-	 *  (B-x): x bytes of region are inside of the file, the rest is
-	 *  outside;
-	 *
-	 *  (C): the region is completely outside of the file.
-	 *
-	 * This classification is stable under DLM lock already acquired by
-	 * the caller, because to change the class, other client has to take
-	 * DLM lock conflicting with our lock. Also, any updates to ->i_size
-	 * by other threads on this client are serialized by
-	 * ll_inode_size_lock(). This guarantees that short reads are handled
-	 * correctly in the face of concurrent writes and truncates.
-	 */
-	ccc_object_size_lock(obj);
-	result = cl_object_attr_get(env, obj, attr);
-	if (result == 0) {
-		kms = attr->cat_kms;
-		if (pos > kms) {
-			/*
-			 * A glimpse is necessary to determine whether we
-			 * return a short read (B) or some zeroes at the end
-			 * of the buffer (C)
-			 */
-			ccc_object_size_unlock(obj);
-			result = cl_glimpse_lock(env, io, inode, obj, 0);
-			if (result == 0 && exceed) {
-				/* If objective page index exceed end-of-file
-				 * page index, return directly. Do not expect
-				 * kernel will check such case correctly.
-				 * linux-2.6.18-128.1.1 miss to do that.
-				 * --bug 17336
-				 */
-				loff_t size = cl_isize_read(inode);
-				loff_t cur_index = start >> PAGE_CACHE_SHIFT;
-				loff_t size_index = (size - 1) >>
-						    PAGE_CACHE_SHIFT;
-
-				if ((size == 0 && cur_index != 0) ||
-				    size_index < cur_index)
-					*exceed = 1;
-			}
-			return result;
-		}
-		/*
-		 * region is within kms and, hence, within real file
-		 * size (A). We need to increase i_size to cover the
-		 * read region so that generic_file_read() will do its
-		 * job, but that doesn't mean the kms size is
-		 * _correct_, it is only the _minimum_ size. If
-		 * someone does a stat they will get the correct size
-		 * which will always be >= the kms value here.
-		 * b=11081
-		 */
-		if (cl_isize_read(inode) < kms) {
-			cl_isize_write_nolock(inode, kms);
-			CDEBUG(D_VFSTRACE,
-					DFID" updating i_size %llu\n",
-					PFID(lu_object_fid(&obj->co_lu)),
-					(__u64)cl_isize_read(inode));
-
-		}
-	}
-	ccc_object_size_unlock(obj);
-	return result;
-}
-
-/*****************************************************************************
- *
- * Transfer operations.
- *
- */
-
-void ccc_req_completion(const struct lu_env *env,
-			const struct cl_req_slice *slice, int ioret)
-{
-	struct ccc_req *vrq;
-
-	if (ioret > 0)
-		cl_stats_tally(slice->crs_dev, slice->crs_req->crq_type, ioret);
-
-	vrq = cl2ccc_req(slice);
-	kmem_cache_free(ccc_req_kmem, vrq);
-}
-
-/**
- * Implementation of struct cl_req_operations::cro_attr_set() for ccc
- * layer. ccc is responsible for
- *
- *    - o_[mac]time
- *
- *    - o_mode
- *
- *    - o_parent_seq
- *
- *    - o_[ug]id
- *
- *    - o_parent_oid
- *
- *    - o_parent_ver
- *
- *    - o_ioepoch,
- *
- */
-void ccc_req_attr_set(const struct lu_env *env,
-		      const struct cl_req_slice *slice,
-		      const struct cl_object *obj,
-		      struct cl_req_attr *attr, u64 flags)
-{
-	struct inode *inode;
-	struct obdo  *oa;
-	u32	      valid_flags;
-
-	oa = attr->cra_oa;
-	inode = ccc_object_inode(obj);
-	valid_flags = OBD_MD_FLTYPE;
-
-	if (slice->crs_req->crq_type == CRT_WRITE) {
-		if (flags & OBD_MD_FLEPOCH) {
-			oa->o_valid |= OBD_MD_FLEPOCH;
-			oa->o_ioepoch = cl_i2info(inode)->lli_ioepoch;
-			valid_flags |= OBD_MD_FLMTIME | OBD_MD_FLCTIME |
-				       OBD_MD_FLUID | OBD_MD_FLGID;
-		}
-	}
-	obdo_from_inode(oa, inode, valid_flags & flags);
-	obdo_set_parent_fid(oa, &cl_i2info(inode)->lli_fid);
-	memcpy(attr->cra_jobid, cl_i2info(inode)->lli_jobid,
-	       JOBSTATS_JOBID_SIZE);
-}
-
-static const struct cl_req_operations ccc_req_ops = {
-	.cro_attr_set   = ccc_req_attr_set,
-	.cro_completion = ccc_req_completion
-};
-
-int cl_setattr_ost(struct inode *inode, const struct iattr *attr)
-{
-	struct lu_env *env;
-	struct cl_io  *io;
-	int	    result;
-	int	    refcheck;
-
-	env = cl_env_get(&refcheck);
-	if (IS_ERR(env))
-		return PTR_ERR(env);
-
-	io = ccc_env_thread_io(env);
-	io->ci_obj = cl_i2info(inode)->lli_clob;
-
-	io->u.ci_setattr.sa_attr.lvb_atime = LTIME_S(attr->ia_atime);
-	io->u.ci_setattr.sa_attr.lvb_mtime = LTIME_S(attr->ia_mtime);
-	io->u.ci_setattr.sa_attr.lvb_ctime = LTIME_S(attr->ia_ctime);
-	io->u.ci_setattr.sa_attr.lvb_size = attr->ia_size;
-	io->u.ci_setattr.sa_valid = attr->ia_valid;
-
-again:
-	if (cl_io_init(env, io, CIT_SETATTR, io->ci_obj) == 0) {
-		struct ccc_io *cio = ccc_env_io(env);
-
-		if (attr->ia_valid & ATTR_FILE)
-			/* populate the file descriptor for ftruncate to honor
-			 * group lock - see LU-787
-			 */
-			cio->cui_fd = cl_iattr2fd(inode, attr);
-
-		result = cl_io_loop(env, io);
-	} else {
-		result = io->ci_result;
-	}
-	cl_io_fini(env, io);
-	if (unlikely(io->ci_need_restart))
-		goto again;
-	/* HSM import case: file is released, cannot be restored
-	 * no need to fail except if restore registration failed
-	 * with -ENODATA
-	 */
-	if (result == -ENODATA && io->ci_restore_needed &&
-	    io->ci_result != -ENODATA)
-		result = 0;
-	cl_env_put(env, &refcheck);
-	return result;
-}
-
-/*****************************************************************************
- *
- * Type conversions.
- *
- */
-
-struct lu_device *ccc2lu_dev(struct ccc_device *vdv)
-{
-	return &vdv->cdv_cl.cd_lu_dev;
-}
-
-struct ccc_device *lu2ccc_dev(const struct lu_device *d)
-{
-	return container_of0(d, struct ccc_device, cdv_cl.cd_lu_dev);
-}
-
-struct ccc_device *cl2ccc_dev(const struct cl_device *d)
-{
-	return container_of0(d, struct ccc_device, cdv_cl);
-}
-
-struct lu_object *ccc2lu(struct ccc_object *vob)
-{
-	return &vob->cob_cl.co_lu;
-}
-
-struct ccc_object *lu2ccc(const struct lu_object *obj)
-{
-	return container_of0(obj, struct ccc_object, cob_cl.co_lu);
-}
-
-struct ccc_object *cl2ccc(const struct cl_object *obj)
-{
-	return container_of0(obj, struct ccc_object, cob_cl);
-}
-
-struct ccc_lock *cl2ccc_lock(const struct cl_lock_slice *slice)
-{
-	return container_of(slice, struct ccc_lock, clk_cl);
-}
-
-struct ccc_io *cl2ccc_io(const struct lu_env *env,
-			 const struct cl_io_slice *slice)
-{
-	struct ccc_io *cio;
-
-	cio = container_of(slice, struct ccc_io, cui_cl);
-	LASSERT(cio == ccc_env_io(env));
-	return cio;
-}
-
-struct ccc_req *cl2ccc_req(const struct cl_req_slice *slice)
-{
-	return container_of0(slice, struct ccc_req, crq_cl);
-}
-
-struct page *cl2vm_page(const struct cl_page_slice *slice)
-{
-	return cl2ccc_page(slice)->cpg_page;
-}
-
-/*****************************************************************************
- *
- * Accessors.
- *
- */
-int ccc_object_invariant(const struct cl_object *obj)
-{
-	struct inode	 *inode = ccc_object_inode(obj);
-	struct cl_inode_info *lli   = cl_i2info(inode);
-
-	return (S_ISREG(cl_inode_mode(inode)) ||
-		/* i_mode of unlinked inode is zeroed. */
-		cl_inode_mode(inode) == 0) && lli->lli_clob == obj;
-}
-
-struct inode *ccc_object_inode(const struct cl_object *obj)
-{
-	return cl2ccc(obj)->cob_inode;
-}
-
-/**
- * Initialize or update CLIO structures for regular files when new
- * meta-data arrives from the server.
- *
- * \param inode regular file inode
- * \param md    new file metadata from MDS
- * - allocates cl_object if necessary,
- * - updated layout, if object was already here.
- */
-int cl_file_inode_init(struct inode *inode, struct lustre_md *md)
-{
-	struct lu_env	*env;
-	struct cl_inode_info *lli;
-	struct cl_object     *clob;
-	struct lu_site       *site;
-	struct lu_fid	*fid;
-	struct cl_object_conf conf = {
-		.coc_inode = inode,
-		.u = {
-			.coc_md    = md
-		}
-	};
-	int result = 0;
-	int refcheck;
-
-	LASSERT(md->body->valid & OBD_MD_FLID);
-	LASSERT(S_ISREG(cl_inode_mode(inode)));
-
-	env = cl_env_get(&refcheck);
-	if (IS_ERR(env))
-		return PTR_ERR(env);
-
-	site = cl_i2sbi(inode)->ll_site;
-	lli  = cl_i2info(inode);
-	fid  = &lli->lli_fid;
-	LASSERT(fid_is_sane(fid));
-
-	if (!lli->lli_clob) {
-		/* clob is slave of inode, empty lli_clob means for new inode,
-		 * there is no clob in cache with the given fid, so it is
-		 * unnecessary to perform lookup-alloc-lookup-insert, just
-		 * alloc and insert directly.
-		 */
-		LASSERT(inode->i_state & I_NEW);
-		conf.coc_lu.loc_flags = LOC_F_NEW;
-		clob = cl_object_find(env, lu2cl_dev(site->ls_top_dev),
-				      fid, &conf);
-		if (!IS_ERR(clob)) {
-			/*
-			 * No locking is necessary, as new inode is
-			 * locked by I_NEW bit.
-			 */
-			lli->lli_clob = clob;
-			lli->lli_has_smd = lsm_has_objects(md->lsm);
-			lu_object_ref_add(&clob->co_lu, "inode", inode);
-		} else
-			result = PTR_ERR(clob);
-	} else {
-		result = cl_conf_set(env, lli->lli_clob, &conf);
-	}
-
-	cl_env_put(env, &refcheck);
-
-	if (result != 0)
-		CERROR("Failure to initialize cl object "DFID": %d\n",
-		       PFID(fid), result);
-	return result;
-}
-
-/**
- * Wait for others drop their references of the object at first, then we drop
- * the last one, which will lead to the object be destroyed immediately.
- * Must be called after cl_object_kill() against this object.
- *
- * The reason we want to do this is: destroying top object will wait for sub
- * objects being destroyed first, so we can't let bottom layer (e.g. from ASTs)
- * to initiate top object destroying which may deadlock. See bz22520.
- */
-static void cl_object_put_last(struct lu_env *env, struct cl_object *obj)
-{
-	struct lu_object_header *header = obj->co_lu.lo_header;
-	wait_queue_t	   waiter;
-
-	if (unlikely(atomic_read(&header->loh_ref) != 1)) {
-		struct lu_site *site = obj->co_lu.lo_dev->ld_site;
-		struct lu_site_bkt_data *bkt;
-
-		bkt = lu_site_bkt_from_fid(site, &header->loh_fid);
-
-		init_waitqueue_entry(&waiter, current);
-		add_wait_queue(&bkt->lsb_marche_funebre, &waiter);
-
-		while (1) {
-			set_current_state(TASK_UNINTERRUPTIBLE);
-			if (atomic_read(&header->loh_ref) == 1)
-				break;
-			schedule();
-		}
-
-		set_current_state(TASK_RUNNING);
-		remove_wait_queue(&bkt->lsb_marche_funebre, &waiter);
-	}
-
-	cl_object_put(env, obj);
-}
-
-void cl_inode_fini(struct inode *inode)
-{
-	struct lu_env	   *env;
-	struct cl_inode_info    *lli  = cl_i2info(inode);
-	struct cl_object	*clob = lli->lli_clob;
-	int refcheck;
-	int emergency;
-
-	if (clob) {
-		void		    *cookie;
-
-		cookie = cl_env_reenter();
-		env = cl_env_get(&refcheck);
-		emergency = IS_ERR(env);
-		if (emergency) {
-			mutex_lock(&ccc_inode_fini_guard);
-			LASSERT(ccc_inode_fini_env);
-			cl_env_implant(ccc_inode_fini_env, &refcheck);
-			env = ccc_inode_fini_env;
-		}
-		/*
-		 * cl_object cache is a slave to inode cache (which, in turn
-		 * is a slave to dentry cache), don't keep cl_object in memory
-		 * when its master is evicted.
-		 */
-		cl_object_kill(env, clob);
-		lu_object_ref_del(&clob->co_lu, "inode", inode);
-		cl_object_put_last(env, clob);
-		lli->lli_clob = NULL;
-		if (emergency) {
-			cl_env_unplant(ccc_inode_fini_env, &refcheck);
-			mutex_unlock(&ccc_inode_fini_guard);
-		} else
-			cl_env_put(env, &refcheck);
-		cl_env_reexit(cookie);
-	}
-}
-
-/**
- * return IF_* type for given lu_dirent entry.
- * IF_* flag shld be converted to particular OS file type in
- * platform llite module.
- */
-__u16 ll_dirent_type_get(struct lu_dirent *ent)
-{
-	__u16 type = 0;
-	struct luda_type *lt;
-	int len = 0;
-
-	if (le32_to_cpu(ent->lde_attrs) & LUDA_TYPE) {
-		const unsigned align = sizeof(struct luda_type) - 1;
-
-		len = le16_to_cpu(ent->lde_namelen);
-		len = (len + align) & ~align;
-		lt = (void *)ent->lde_name + len;
-		type = IFTODT(le16_to_cpu(lt->lt_type));
-	}
-	return type;
-}
-
-/**
- * build inode number from passed @fid
- */
-__u64 cl_fid_build_ino(const struct lu_fid *fid, int api32)
-{
-	if (BITS_PER_LONG == 32 || api32)
-		return fid_flatten32(fid);
-	else
-		return fid_flatten(fid);
-}
-
-/**
- * build inode generation from passed @fid.  If our FID overflows the 32-bit
- * inode number then return a non-zero generation to distinguish them.
- */
-__u32 cl_fid_build_gen(const struct lu_fid *fid)
-{
-	__u32 gen;
-
-	if (fid_is_igif(fid)) {
-		gen = lu_igif_gen(fid);
-		return gen;
-	}
-
-	gen = fid_flatten(fid) >> 32;
-	return gen;
-}
-
-/* lsm is unreliable after hsm implementation as layout can be changed at
- * any time. This is only to support old, non-clio-ized interfaces. It will
- * cause deadlock if clio operations are called with this extra layout refcount
- * because in case the layout changed during the IO, ll_layout_refresh() will
- * have to wait for the refcount to become zero to destroy the older layout.
- *
- * Notice that the lsm returned by this function may not be valid unless called
- * inside layout lock - MDS_INODELOCK_LAYOUT.
- */
-struct lov_stripe_md *ccc_inode_lsm_get(struct inode *inode)
-{
-	return lov_lsm_get(cl_i2info(inode)->lli_clob);
-}
-
-inline void ccc_inode_lsm_put(struct inode *inode, struct lov_stripe_md *lsm)
-{
-	lov_lsm_put(cl_i2info(inode)->lli_clob, lsm);
-}
diff --git a/drivers/staging/lustre/lustre/lclient/lcommon_misc.c b/drivers/staging/lustre/lustre/lclient/lcommon_misc.c
deleted file mode 100644
index d80bcedd..0000000
--- a/drivers/staging/lustre/lustre/lclient/lcommon_misc.c
+++ /dev/null
@@ -1,200 +0,0 @@
-/*
- * GPL HEADER START
- *
- * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 only,
- * as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
- * General Public License version 2 for more details (a copy is included
- * in the LICENSE file that accompanied this code).
- *
- * You should have received a copy of the GNU General Public License
- * version 2 along with this program; If not, see
- * http://www.sun.com/software/products/lustre/docs/GPLv2.pdf
- *
- * Please contact Sun Microsystems, Inc., 4150 Network Circle, Santa Clara,
- * CA 95054 USA or visit www.sun.com if you need additional information or
- * have any questions.
- *
- * GPL HEADER END
- */
-/*
- * Copyright (c) 2008, 2010, Oracle and/or its affiliates. All rights reserved.
- * Use is subject to license terms.
- *
- * Copyright (c) 2011, 2012, Intel Corporation.
- */
-/*
- * This file is part of Lustre, http://www.lustre.org/
- * Lustre is a trademark of Sun Microsystems, Inc.
- *
- * cl code shared between vvp and liblustre (and other Lustre clients in the
- * future).
- *
- */
-#include "../include/obd_class.h"
-#include "../include/obd_support.h"
-#include "../include/obd.h"
-#include "../include/cl_object.h"
-#include "../include/lclient.h"
-
-#include "../include/lustre_lite.h"
-
-/* Initialize the default and maximum LOV EA and cookie sizes.  This allows
- * us to make MDS RPCs with large enough reply buffers to hold the
- * maximum-sized (= maximum striped) EA and cookie without having to
- * calculate this (via a call into the LOV + OSCs) each time we make an RPC.
- */
-int cl_init_ea_size(struct obd_export *md_exp, struct obd_export *dt_exp)
-{
-	struct lov_stripe_md lsm = { .lsm_magic = LOV_MAGIC_V3 };
-	__u32 valsize = sizeof(struct lov_desc);
-	int rc, easize, def_easize, cookiesize;
-	struct lov_desc desc;
-	__u16 stripes, def_stripes;
-
-	rc = obd_get_info(NULL, dt_exp, sizeof(KEY_LOVDESC), KEY_LOVDESC,
-			  &valsize, &desc, NULL);
-	if (rc)
-		return rc;
-
-	stripes = min_t(__u32, desc.ld_tgt_count, LOV_MAX_STRIPE_COUNT);
-	lsm.lsm_stripe_count = stripes;
-	easize = obd_size_diskmd(dt_exp, &lsm);
-
-	def_stripes = min_t(__u32, desc.ld_default_stripe_count,
-			    LOV_MAX_STRIPE_COUNT);
-	lsm.lsm_stripe_count = def_stripes;
-	def_easize = obd_size_diskmd(dt_exp, &lsm);
-
-	cookiesize = stripes * sizeof(struct llog_cookie);
-
-	/* default cookiesize is 0 because from 2.4 server doesn't send
-	 * llog cookies to client.
-	 */
-	CDEBUG(D_HA,
-	       "updating def/max_easize: %d/%d def/max_cookiesize: 0/%d\n",
-	       def_easize, easize, cookiesize);
-
-	rc = md_init_ea_size(md_exp, easize, def_easize, cookiesize, 0);
-	return rc;
-}
-
-/**
- * This function is used as an upcall-callback hooked by liblustre and llite
- * clients into obd_notify() listeners chain to handle notifications about
- * change of import connect_flags. See llu_fsswop_mount() and
- * lustre_common_fill_super().
- */
-int cl_ocd_update(struct obd_device *host,
-		  struct obd_device *watched,
-		  enum obd_notify_event ev, void *owner, void *data)
-{
-	struct lustre_client_ocd *lco;
-	struct client_obd	*cli;
-	__u64 flags;
-	int   result;
-
-	if (!strcmp(watched->obd_type->typ_name, LUSTRE_OSC_NAME)) {
-		cli = &watched->u.cli;
-		lco = owner;
-		flags = cli->cl_import->imp_connect_data.ocd_connect_flags;
-		CDEBUG(D_SUPER, "Changing connect_flags: %#llx -> %#llx\n",
-		       lco->lco_flags, flags);
-		mutex_lock(&lco->lco_lock);
-		lco->lco_flags &= flags;
-		/* for each osc event update ea size */
-		if (lco->lco_dt_exp)
-			cl_init_ea_size(lco->lco_md_exp, lco->lco_dt_exp);
-
-		mutex_unlock(&lco->lco_lock);
-		result = 0;
-	} else {
-		CERROR("unexpected notification from %s %s!\n",
-		       watched->obd_type->typ_name,
-		       watched->obd_name);
-		result = -EINVAL;
-	}
-	return result;
-}
-
-#define GROUPLOCK_SCOPE "grouplock"
-
-int cl_get_grouplock(struct cl_object *obj, unsigned long gid, int nonblock,
-		     struct ccc_grouplock *cg)
-{
-	struct lu_env	  *env;
-	struct cl_io	   *io;
-	struct cl_lock	 *lock;
-	struct cl_lock_descr   *descr;
-	__u32		   enqflags;
-	int		     refcheck;
-	int		     rc;
-
-	env = cl_env_get(&refcheck);
-	if (IS_ERR(env))
-		return PTR_ERR(env);
-
-	io = ccc_env_thread_io(env);
-	io->ci_obj = obj;
-	io->ci_ignore_layout = 1;
-
-	rc = cl_io_init(env, io, CIT_MISC, io->ci_obj);
-	if (rc) {
-		/* Does not make sense to take GL for released layout */
-		if (rc > 0)
-			rc = -ENOTSUPP;
-		cl_env_put(env, &refcheck);
-		return rc;
-	}
-
-	descr = &ccc_env_info(env)->cti_descr;
-	descr->cld_obj = obj;
-	descr->cld_start = 0;
-	descr->cld_end = CL_PAGE_EOF;
-	descr->cld_gid = gid;
-	descr->cld_mode = CLM_GROUP;
-
-	enqflags = CEF_MUST | (nonblock ? CEF_NONBLOCK : 0);
-	descr->cld_enq_flags = enqflags;
-
-	lock = cl_lock_request(env, io, descr, GROUPLOCK_SCOPE, current);
-	if (IS_ERR(lock)) {
-		cl_io_fini(env, io);
-		cl_env_put(env, &refcheck);
-		return PTR_ERR(lock);
-	}
-
-	cg->cg_env  = cl_env_get(&refcheck);
-	cg->cg_io   = io;
-	cg->cg_lock = lock;
-	cg->cg_gid  = gid;
-	LASSERT(cg->cg_env == env);
-
-	cl_env_unplant(env, &refcheck);
-	return 0;
-}
-
-void cl_put_grouplock(struct ccc_grouplock *cg)
-{
-	struct lu_env  *env  = cg->cg_env;
-	struct cl_io   *io   = cg->cg_io;
-	struct cl_lock *lock = cg->cg_lock;
-	int	     refcheck;
-
-	LASSERT(cg->cg_env);
-	LASSERT(cg->cg_gid);
-
-	cl_env_implant(env, &refcheck);
-	cl_env_put(env, &refcheck);
-
-	cl_unuse(env, lock);
-	cl_lock_release(env, lock, GROUPLOCK_SCOPE, current);
-	cl_io_fini(env, io);
-	cl_env_put(env, NULL);
-}
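
A minimal usage sketch for the grouplock pair above (illustrative only;
the real callers are the llite group-lock ioctl paths). Given a resolved
struct cl_object *obj, a group id gid, and a nonblock flag:

	struct ccc_grouplock cg;
	int rc;

	rc = cl_get_grouplock(obj, gid, nonblock, &cg);
	if (rc)
		return rc;
	/* ... group-locked IO against obj ... */
	cl_put_grouplock(&cg);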
diff --git a/drivers/staging/lustre/lustre/llite/Makefile b/drivers/staging/lustre/lustre/llite/Makefile
index 9ac29e7..24085e2 100644
--- a/drivers/staging/lustre/lustre/llite/Makefile
+++ b/drivers/staging/lustre/lustre/llite/Makefile
@@ -4,7 +4,7 @@ lustre-y := dcache.o dir.o file.o llite_close.o llite_lib.o llite_nfs.o \
 	    rw.o namei.o symlink.o llite_mmap.o \
 	    xattr.o xattr_cache.o remote_perm.o llite_rmtacl.o \
 	    rw26.o super25.o statahead.o \
-	    ../lclient/glimpse.o ../lclient/lcommon_cl.o ../lclient/lcommon_misc.o \
+	    glimpse.o lcommon_cl.o lcommon_misc.o \
 	    vvp_dev.o vvp_page.o vvp_lock.o vvp_io.o vvp_object.o lproc_llite.o
 
 llite_lloop-y := lloop.o
diff --git a/drivers/staging/lustre/lustre/llite/glimpse.c b/drivers/staging/lustre/lustre/llite/glimpse.c
new file mode 100644
index 0000000..f235f35
--- /dev/null
+++ b/drivers/staging/lustre/lustre/llite/glimpse.c
@@ -0,0 +1,273 @@
+/*
+ * GPL HEADER START
+ *
+ * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 only,
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License version 2 for more details (a copy is included
+ * in the LICENSE file that accompanied this code).
+ *
+ * You should have received a copy of the GNU General Public License
+ * version 2 along with this program; If not, see
+ * http://www.sun.com/software/products/lustre/docs/GPLv2.pdf
+ *
+ * Please contact Sun Microsystems, Inc., 4150 Network Circle, Santa Clara,
+ * CA 95054 USA or visit www.sun.com if you need additional information or
+ * have any questions.
+ *
+ * GPL HEADER END
+ */
+/*
+ * Copyright (c) 2008, 2010, Oracle and/or its affiliates. All rights reserved.
+ * Use is subject to license terms.
+ *
+ * Copyright (c) 2011, 2012, Intel Corporation.
+ */
+/*
+ * This file is part of Lustre, http://www.lustre.org/
+ * Lustre is a trademark of Sun Microsystems, Inc.
+ *
+ * glimpse code shared between vvp and liblustre (and other Lustre clients in
+ * the future).
+ *
+ *   Author: Nikita Danilov <nikita.danilov@sun.com>
+ *   Author: Oleg Drokin <oleg.drokin@sun.com>
+ */
+
+#include "../../include/linux/libcfs/libcfs.h"
+#include "../include/obd_class.h"
+#include "../include/obd_support.h"
+#include "../include/obd.h"
+
+#include "../include/lustre_dlm.h"
+#include "../include/lustre_lite.h"
+#include "../include/lustre_mdc.h"
+#include <linux/pagemap.h>
+#include <linux/file.h>
+
+#include "../include/cl_object.h"
+#include "../include/lclient.h"
+#include "../llite/llite_internal.h"
+
+static const struct cl_lock_descr whole_file = {
+	.cld_start = 0,
+	.cld_end   = CL_PAGE_EOF,
+	.cld_mode  = CLM_READ
+};
+
+/*
+ * Check whether the file has possibly unwritten pages.
+ *
+ * \retval 1    file is mmap-ed or has dirty pages
+ * \retval 0    otherwise
+ */
+blkcnt_t dirty_cnt(struct inode *inode)
+{
+	blkcnt_t cnt = 0;
+	struct ccc_object *vob = cl_inode2ccc(inode);
+	void	      *results[1];
+
+	if (inode->i_mapping)
+		cnt += radix_tree_gang_lookup_tag(&inode->i_mapping->page_tree,
+						  results, 0, 1,
+						  PAGECACHE_TAG_DIRTY);
+	if (cnt == 0 && atomic_read(&vob->cob_mmap_cnt) > 0)
+		cnt = 1;
+
+	return (cnt > 0) ? 1 : 0;
+}
+
+int cl_glimpse_lock(const struct lu_env *env, struct cl_io *io,
+		    struct inode *inode, struct cl_object *clob, int agl)
+{
+	struct cl_lock_descr *descr = &ccc_env_info(env)->cti_descr;
+	struct cl_inode_info *lli   = cl_i2info(inode);
+	const struct lu_fid  *fid   = lu_object_fid(&clob->co_lu);
+	struct ccc_io	*cio   = ccc_env_io(env);
+	struct cl_lock       *lock;
+	int result;
+
+	result = 0;
+	if (!(lli->lli_flags & LLIF_MDS_SIZE_LOCK)) {
+		CDEBUG(D_DLMTRACE, "Glimpsing inode " DFID "\n", PFID(fid));
+		if (lli->lli_has_smd) {
+			/* NOTE: this looks like a DLM lock request, but it
+			 *       may not be one. Due to the CEF_ASYNC flag
+			 *       (translated to LDLM_FL_HAS_INTENT by osc),
+			 *       this is a glimpse request that won't revoke
+			 *       any conflicting DLM locks held. Instead,
+			 *       ll_glimpse_callback() will be called on each
+			 *       client holding a DLM lock against this file,
+			 *       and the resulting size will be returned for
+			 *       each stripe. A DLM lock on [0, EOF] is
+			 *       acquired only if there were no conflicting
+			 *       locks; if there were, enqueuing or waiting
+			 *       fails with -ENAVAIL, but valid inode
+			 *       attributes are returned anyway.
+			 */
+			*descr = whole_file;
+			descr->cld_obj   = clob;
+			descr->cld_mode  = CLM_PHANTOM;
+			descr->cld_enq_flags = CEF_ASYNC | CEF_MUST;
+			if (agl)
+				descr->cld_enq_flags |= CEF_AGL;
+			cio->cui_glimpse = 1;
+			/*
+			 * CEF_ASYNC is used because glimpse sub-locks cannot
+			 * deadlock (because they never conflict with other
+			 * locks) and, hence, can be enqueued out-of-order.
+			 *
+			 * CEF_MUST protects glimpse lock from conversion into
+			 * a lockless mode.
+			 */
+			lock = cl_lock_request(env, io, descr, "glimpse",
+					       current);
+			cio->cui_glimpse = 0;
+
+			if (!lock)
+				return 0;
+
+			if (IS_ERR(lock))
+				return PTR_ERR(lock);
+
+			LASSERT(agl == 0);
+			result = cl_wait(env, lock);
+			if (result == 0) {
+				cl_merge_lvb(env, inode);
+				if (cl_isize_read(inode) > 0 &&
+				    inode->i_blocks == 0) {
+					/*
+					 * LU-417: Add dirty pages block count
+					 * lest i_blocks reports 0, some "cp" or
+					 * "tar" may think it's a completely
+					 * sparse file and skip it.
+					 */
+					inode->i_blocks = dirty_cnt(inode);
+				}
+				cl_unuse(env, lock);
+			}
+			cl_lock_release(env, lock, "glimpse", current);
+		} else {
+			CDEBUG(D_DLMTRACE, "No objects for inode\n");
+			cl_merge_lvb(env, inode);
+		}
+	}
+
+	return result;
+}
+
+static int cl_io_get(struct inode *inode, struct lu_env **envout,
+		     struct cl_io **ioout, int *refcheck)
+{
+	struct lu_env	  *env;
+	struct cl_io	   *io;
+	struct cl_inode_info   *lli = cl_i2info(inode);
+	struct cl_object       *clob = lli->lli_clob;
+	int result;
+
+	if (S_ISREG(cl_inode_mode(inode))) {
+		env = cl_env_get(refcheck);
+		if (!IS_ERR(env)) {
+			io = ccc_env_thread_io(env);
+			io->ci_obj = clob;
+			*envout = env;
+			*ioout  = io;
+			result = 1;
+		} else {
+			result = PTR_ERR(env);
+		}
+	} else {
+		result = 0;
+	}
+	return result;
+}
+
+int cl_glimpse_size0(struct inode *inode, int agl)
+{
+	/*
+	 * We don't need the ast_flags argument to cl_glimpse_size() because
+	 * osc_lock_enqueue() takes care of the possible deadlock that said
+	 * argument was introduced to avoid.
+	 */
+	/*
+	 * XXX but note that ll_file_seek() passes LDLM_FL_BLOCK_NOWAIT to
+	 * cl_glimpse_size(), which doesn't make sense: glimpse locks are not
+	 * blocking anyway.
+	 */
+	struct lu_env	  *env = NULL;
+	struct cl_io	   *io  = NULL;
+	int		     result;
+	int		     refcheck;
+
+	result = cl_io_get(inode, &env, &io, &refcheck);
+	if (result > 0) {
+again:
+		io->ci_verify_layout = 1;
+		result = cl_io_init(env, io, CIT_MISC, io->ci_obj);
+		if (result > 0)
+			/*
+			 * nothing to do for this io. This currently happens
+			 * when stripe sub-objects are not yet created.
+			 */
+			result = io->ci_result;
+		else if (result == 0)
+			result = cl_glimpse_lock(env, io, inode, io->ci_obj,
+						 agl);
+
+		OBD_FAIL_TIMEOUT(OBD_FAIL_GLIMPSE_DELAY, 2);
+		cl_io_fini(env, io);
+		if (unlikely(io->ci_need_restart))
+			goto again;
+		cl_env_put(env, &refcheck);
+	}
+	return result;
+}
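+
+/*
+ * A minimal usage sketch, assuming the usual lclient.h wrapper that maps
+ * cl_glimpse_size(inode) to cl_glimpse_size0(inode, 0):
+ *
+ *	rc = cl_glimpse_size(inode);
+ *
+ * This refreshes i_size/i_blocks from the OSTs before replying to stat(2).
+ * With agl != 0 the enqueue is asynchronous (CEF_AGL) and nothing is
+ * waited upon.
+ */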
+
+int cl_local_size(struct inode *inode)
+{
+	struct lu_env	   *env = NULL;
+	struct cl_io	    *io  = NULL;
+	struct ccc_thread_info  *cti;
+	struct cl_object	*clob;
+	struct cl_lock_descr    *descr;
+	struct cl_lock	  *lock;
+	int		      result;
+	int		      refcheck;
+
+	if (!cl_i2info(inode)->lli_has_smd)
+		return 0;
+
+	result = cl_io_get(inode, &env, &io, &refcheck);
+	if (result <= 0)
+		return result;
+
+	clob = io->ci_obj;
+	result = cl_io_init(env, io, CIT_MISC, clob);
+	if (result > 0) {
+		result = io->ci_result;
+	} else if (result == 0) {
+		cti = ccc_env_info(env);
+		descr = &cti->cti_descr;
+
+		*descr = whole_file;
+		descr->cld_obj = clob;
+		lock = cl_lock_peek(env, io, descr, "localsize", current);
+		if (lock) {
+			cl_merge_lvb(env, inode);
+			cl_unuse(env, lock);
+			cl_lock_release(env, lock, "localsize", current);
+			result = 0;
+		} else {
+			result = -ENODATA;
+		}
+	}
+	cl_io_fini(env, io);
+	cl_env_put(env, &refcheck);
+	return result;
+}
diff --git a/drivers/staging/lustre/lustre/llite/lcommon_cl.c b/drivers/staging/lustre/lustre/llite/lcommon_cl.c
new file mode 100644
index 0000000..065b0f2
--- /dev/null
+++ b/drivers/staging/lustre/lustre/llite/lcommon_cl.c
@@ -0,0 +1,1209 @@
+/*
+ * GPL HEADER START
+ *
+ * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 only,
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License version 2 for more details (a copy is included
+ * in the LICENSE file that accompanied this code).
+ *
+ * You should have received a copy of the GNU General Public License
+ * version 2 along with this program; If not, see
+ * http://www.sun.com/software/products/lustre/docs/GPLv2.pdf
+ *
+ * Please contact Sun Microsystems, Inc., 4150 Network Circle, Santa Clara,
+ * CA 95054 USA or visit www.sun.com if you need additional information or
+ * have any questions.
+ *
+ * GPL HEADER END
+ */
+/*
+ * Copyright (c) 2008, 2010, Oracle and/or its affiliates. All rights reserved.
+ * Use is subject to license terms.
+ *
+ * Copyright (c) 2011, 2015, Intel Corporation.
+ */
+/*
+ * This file is part of Lustre, http://www.lustre.org/
+ * Lustre is a trademark of Sun Microsystems, Inc.
+ *
+ * cl code shared between vvp and liblustre (and other Lustre clients in the
+ * future).
+ *
+ *   Author: Nikita Danilov <nikita.danilov@sun.com>
+ */
+
+#define DEBUG_SUBSYSTEM S_LLITE
+
+#include "../../include/linux/libcfs/libcfs.h"
+# include <linux/fs.h>
+# include <linux/sched.h>
+# include <linux/mm.h>
+# include <linux/quotaops.h>
+# include <linux/highmem.h>
+# include <linux/pagemap.h>
+# include <linux/rbtree.h>
+
+#include "../include/obd.h"
+#include "../include/obd_support.h"
+#include "../include/lustre_fid.h"
+#include "../include/lustre_lite.h"
+#include "../include/lustre_dlm.h"
+#include "../include/lustre_ver.h"
+#include "../include/lustre_mdc.h"
+#include "../include/cl_object.h"
+
+#include "../include/lclient.h"
+
+#include "../llite/llite_internal.h"
+
+static const struct cl_req_operations ccc_req_ops;
+
+/*
+ * ccc_ prefix stands for "Common Client Code".
+ */
+
+static struct kmem_cache *ccc_lock_kmem;
+static struct kmem_cache *ccc_object_kmem;
+static struct kmem_cache *ccc_thread_kmem;
+static struct kmem_cache *ccc_session_kmem;
+static struct kmem_cache *ccc_req_kmem;
+
+static struct lu_kmem_descr ccc_caches[] = {
+	{
+		.ckd_cache = &ccc_lock_kmem,
+		.ckd_name  = "ccc_lock_kmem",
+		.ckd_size  = sizeof(struct ccc_lock)
+	},
+	{
+		.ckd_cache = &ccc_object_kmem,
+		.ckd_name  = "ccc_object_kmem",
+		.ckd_size  = sizeof(struct ccc_object)
+	},
+	{
+		.ckd_cache = &ccc_thread_kmem,
+		.ckd_name  = "ccc_thread_kmem",
+		.ckd_size  = sizeof(struct ccc_thread_info),
+	},
+	{
+		.ckd_cache = &ccc_session_kmem,
+		.ckd_name  = "ccc_session_kmem",
+		.ckd_size  = sizeof(struct ccc_session)
+	},
+	{
+		.ckd_cache = &ccc_req_kmem,
+		.ckd_name  = "ccc_req_kmem",
+		.ckd_size  = sizeof(struct ccc_req)
+	},
+	{
+		.ckd_cache = NULL
+	}
+};
+
+/*****************************************************************************
+ *
+ * Vvp device and device type functions.
+ *
+ */
+
+void *ccc_key_init(const struct lu_context *ctx, struct lu_context_key *key)
+{
+	struct ccc_thread_info *info;
+
+	info = kmem_cache_zalloc(ccc_thread_kmem, GFP_NOFS);
+	if (!info)
+		info = ERR_PTR(-ENOMEM);
+	return info;
+}
+
+void ccc_key_fini(const struct lu_context *ctx,
+		  struct lu_context_key *key, void *data)
+{
+	struct ccc_thread_info *info = data;
+
+	kmem_cache_free(ccc_thread_kmem, info);
+}
+
+void *ccc_session_key_init(const struct lu_context *ctx,
+			   struct lu_context_key *key)
+{
+	struct ccc_session *session;
+
+	session = kmem_cache_zalloc(ccc_session_kmem, GFP_NOFS);
+	if (!session)
+		session = ERR_PTR(-ENOMEM);
+	return session;
+}
+
+void ccc_session_key_fini(const struct lu_context *ctx,
+			  struct lu_context_key *key, void *data)
+{
+	struct ccc_session *session = data;
+
+	kmem_cache_free(ccc_session_kmem, session);
+}
+
+struct lu_context_key ccc_key = {
+	.lct_tags = LCT_CL_THREAD,
+	.lct_init = ccc_key_init,
+	.lct_fini = ccc_key_fini
+};
+
+struct lu_context_key ccc_session_key = {
+	.lct_tags = LCT_SESSION,
+	.lct_init = ccc_session_key_init,
+	.lct_fini = ccc_session_key_fini
+};
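+
+/*
+ * These two keys back the per-thread accessors used throughout this file.
+ * A sketch of the lookup they perform, assuming the usual lclient.h
+ * definitions of ccc_env_info()/ccc_env_session():
+ *
+ *	struct ccc_thread_info *info =
+ *		lu_context_key_get(&env->le_ctx, &ccc_key);
+ *	struct ccc_session *ses =
+ *		lu_context_key_get(env->le_ses, &ccc_session_key);
+ */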
+
+/* type constructor/destructor: ccc_type_{init,fini,start,stop}(). */
+/* LU_TYPE_INIT_FINI(ccc, &ccc_key, &ccc_session_key); */
+
+int ccc_device_init(const struct lu_env *env, struct lu_device *d,
+		    const char *name, struct lu_device *next)
+{
+	struct ccc_device  *vdv;
+	int rc;
+
+	vdv = lu2ccc_dev(d);
+	vdv->cdv_next = lu2cl_dev(next);
+
+	LASSERT(d->ld_site && next->ld_type);
+	next->ld_site = d->ld_site;
+	rc = next->ld_type->ldt_ops->ldto_device_init(
+			env, next, next->ld_type->ldt_name, NULL);
+	if (rc == 0) {
+		lu_device_get(next);
+		lu_ref_add(&next->ld_reference, "lu-stack", &lu_site_init);
+	}
+	return rc;
+}
+
+struct lu_device *ccc_device_fini(const struct lu_env *env,
+				  struct lu_device *d)
+{
+	return cl2lu_dev(lu2ccc_dev(d)->cdv_next);
+}
+
+struct lu_device *ccc_device_alloc(const struct lu_env *env,
+				   struct lu_device_type *t,
+				   struct lustre_cfg *cfg,
+				   const struct lu_device_operations *luops,
+				   const struct cl_device_operations *clops)
+{
+	struct ccc_device *vdv;
+	struct lu_device  *lud;
+	struct cl_site    *site;
+	int rc;
+
+	vdv = kzalloc(sizeof(*vdv), GFP_NOFS);
+	if (!vdv)
+		return ERR_PTR(-ENOMEM);
+
+	lud = &vdv->cdv_cl.cd_lu_dev;
+	cl_device_init(&vdv->cdv_cl, t);
+	ccc2lu_dev(vdv)->ld_ops = luops;
+	vdv->cdv_cl.cd_ops = clops;
+
+	site = kzalloc(sizeof(*site), GFP_NOFS);
+	if (site) {
+		rc = cl_site_init(site, &vdv->cdv_cl);
+		if (rc == 0) {
+			rc = lu_site_init_finish(&site->cs_lu);
+		} else {
+			LASSERT(!lud->ld_site);
+			CERROR("Cannot init lu_site, rc %d.\n", rc);
+			kfree(site);
+		}
+	} else {
+		rc = -ENOMEM;
+	}
+	if (rc != 0) {
+		ccc_device_free(env, lud);
+		lud = ERR_PTR(rc);
+	}
+	return lud;
+}
+
+struct lu_device *ccc_device_free(const struct lu_env *env,
+				  struct lu_device *d)
+{
+	struct ccc_device *vdv  = lu2ccc_dev(d);
+	struct cl_site    *site = lu2cl_site(d->ld_site);
+	struct lu_device  *next = cl2lu_dev(vdv->cdv_next);
+
+	if (d->ld_site) {
+		cl_site_fini(site);
+		kfree(site);
+	}
+	cl_device_fini(lu2cl_dev(d));
+	kfree(vdv);
+	return next;
+}
+
+int ccc_req_init(const struct lu_env *env, struct cl_device *dev,
+		 struct cl_req *req)
+{
+	struct ccc_req *vrq;
+	int result;
+
+	vrq = kmem_cache_zalloc(ccc_req_kmem, GFP_NOFS);
+	if (vrq) {
+		cl_req_slice_add(req, &vrq->crq_cl, dev, &ccc_req_ops);
+		result = 0;
+	} else {
+		result = -ENOMEM;
+	}
+	return result;
+}
+
+/**
+ * An `emergency' environment used by ccc_inode_fini() when cl_env_get()
+ * fails. Access to this environment is serialized by ccc_inode_fini_guard
+ * mutex.
+ */
+static struct lu_env *ccc_inode_fini_env;
+
+/**
+ * A mutex serializing calls to cl_inode_fini() under extreme memory
+ * pressure, when environments cannot be allocated.
+ */
+static DEFINE_MUTEX(ccc_inode_fini_guard);
+static int dummy_refcheck;
+
+int ccc_global_init(struct lu_device_type *device_type)
+{
+	int result;
+
+	result = lu_kmem_init(ccc_caches);
+	if (result)
+		return result;
+
+	result = lu_device_type_init(device_type);
+	if (result)
+		goto out_kmem;
+
+	ccc_inode_fini_env = cl_env_alloc(&dummy_refcheck,
+					  LCT_REMEMBER | LCT_NOREF);
+	if (IS_ERR(ccc_inode_fini_env)) {
+		result = PTR_ERR(ccc_inode_fini_env);
+		goto out_device;
+	}
+
+	ccc_inode_fini_env->le_ctx.lc_cookie = 0x4;
+	return 0;
+out_device:
+	lu_device_type_fini(device_type);
+out_kmem:
+	lu_kmem_fini(ccc_caches);
+	return result;
+}
+
+void ccc_global_fini(struct lu_device_type *device_type)
+{
+	if (ccc_inode_fini_env) {
+		cl_env_put(ccc_inode_fini_env, &dummy_refcheck);
+		ccc_inode_fini_env = NULL;
+	}
+	lu_device_type_fini(device_type);
+	lu_kmem_fini(ccc_caches);
+}
+
+/*****************************************************************************
+ *
+ * Object operations.
+ *
+ */
+
+struct lu_object *ccc_object_alloc(const struct lu_env *env,
+				   const struct lu_object_header *unused,
+				   struct lu_device *dev,
+				   const struct cl_object_operations *clops,
+				   const struct lu_object_operations *luops)
+{
+	struct ccc_object *vob;
+	struct lu_object  *obj;
+
+	vob = kmem_cache_zalloc(ccc_object_kmem, GFP_NOFS);
+	if (vob) {
+		struct cl_object_header *hdr;
+
+		obj = ccc2lu(vob);
+		hdr = &vob->cob_header;
+		cl_object_header_init(hdr);
+		lu_object_init(obj, &hdr->coh_lu, dev);
+		lu_object_add_top(&hdr->coh_lu, obj);
+
+		vob->cob_cl.co_ops = clops;
+		obj->lo_ops = luops;
+	} else {
+		obj = NULL;
+	}
+	return obj;
+}
+
+int ccc_object_init0(const struct lu_env *env,
+		     struct ccc_object *vob,
+		     const struct cl_object_conf *conf)
+{
+	vob->cob_inode = conf->coc_inode;
+	vob->cob_transient_pages = 0;
+	cl_object_page_init(&vob->cob_cl, sizeof(struct ccc_page));
+	return 0;
+}
+
+int ccc_object_init(const struct lu_env *env, struct lu_object *obj,
+		    const struct lu_object_conf *conf)
+{
+	struct ccc_device *dev = lu2ccc_dev(obj->lo_dev);
+	struct ccc_object *vob = lu2ccc(obj);
+	struct lu_object  *below;
+	struct lu_device  *under;
+	int result;
+
+	under = &dev->cdv_next->cd_lu_dev;
+	below = under->ld_ops->ldo_object_alloc(env, obj->lo_header, under);
+	if (below) {
+		const struct cl_object_conf *cconf;
+
+		cconf = lu2cl_conf(conf);
+		INIT_LIST_HEAD(&vob->cob_pending_list);
+		lu_object_add(obj, below);
+		result = ccc_object_init0(env, vob, cconf);
+	} else {
+		result = -ENOMEM;
+	}
+	return result;
+}
+
+void ccc_object_free(const struct lu_env *env, struct lu_object *obj)
+{
+	struct ccc_object *vob = lu2ccc(obj);
+
+	lu_object_fini(obj);
+	lu_object_header_fini(obj->lo_header);
+	kmem_cache_free(ccc_object_kmem, vob);
+}
+
+int ccc_lock_init(const struct lu_env *env,
+		  struct cl_object *obj, struct cl_lock *lock,
+		  const struct cl_io *unused,
+		  const struct cl_lock_operations *lkops)
+{
+	struct ccc_lock *clk;
+	int result;
+
+	CLOBINVRNT(env, obj, ccc_object_invariant(obj));
+
+	clk = kmem_cache_zalloc(ccc_lock_kmem, GFP_NOFS);
+	if (clk) {
+		cl_lock_slice_add(lock, &clk->clk_cl, obj, lkops);
+		result = 0;
+	} else {
+		result = -ENOMEM;
+	}
+	return result;
+}
+
+int ccc_object_glimpse(const struct lu_env *env,
+		       const struct cl_object *obj, struct ost_lvb *lvb)
+{
+	struct inode *inode = ccc_object_inode(obj);
+
+	lvb->lvb_mtime = cl_inode_mtime(inode);
+	lvb->lvb_atime = cl_inode_atime(inode);
+	lvb->lvb_ctime = cl_inode_ctime(inode);
+	/*
+	 * LU-417: Add dirty pages block count lest i_blocks reports 0, some
+	 * "cp" or "tar" on remote node may think it's a completely sparse file
+	 * and skip it.
+	 */
+	if (lvb->lvb_size > 0 && lvb->lvb_blocks == 0)
+		lvb->lvb_blocks = dirty_cnt(inode);
+	return 0;
+}
+
+static void ccc_object_size_lock(struct cl_object *obj)
+{
+	struct inode *inode = ccc_object_inode(obj);
+
+	ll_inode_size_lock(inode);
+	cl_object_attr_lock(obj);
+}
+
+static void ccc_object_size_unlock(struct cl_object *obj)
+{
+	struct inode *inode = ccc_object_inode(obj);
+
+	cl_object_attr_unlock(obj);
+	ll_inode_size_unlock(inode);
+}
+
+/*****************************************************************************
+ *
+ * Page operations.
+ *
+ */
+
+struct page *ccc_page_vmpage(const struct lu_env *env,
+			     const struct cl_page_slice *slice)
+{
+	return cl2vm_page(slice);
+}
+
+int ccc_page_is_under_lock(const struct lu_env *env,
+			   const struct cl_page_slice *slice,
+			   struct cl_io *io)
+{
+	struct ccc_io	*cio  = ccc_env_io(env);
+	struct cl_lock_descr *desc = &ccc_env_info(env)->cti_descr;
+	struct cl_page       *page = slice->cpl_page;
+
+	int result;
+
+	if (io->ci_type == CIT_READ || io->ci_type == CIT_WRITE ||
+	    io->ci_type == CIT_FAULT) {
+		if (cio->cui_fd->fd_flags & LL_FILE_GROUP_LOCKED) {
+			result = -EBUSY;
+		} else {
+			desc->cld_start = page->cp_index;
+			desc->cld_end   = page->cp_index;
+			desc->cld_obj   = page->cp_obj;
+			desc->cld_mode  = CLM_READ;
+			result = cl_queue_match(&io->ci_lockset.cls_done,
+						desc) ? -EBUSY : 0;
+		}
+	} else {
+		result = 0;
+	}
+	return result;
+}
+
+int ccc_fail(const struct lu_env *env, const struct cl_page_slice *slice)
+{
+	/*
+	 * Cached read?
+	 */
+	LBUG();
+	return 0;
+}
+
+int ccc_transient_page_prep(const struct lu_env *env,
+			    const struct cl_page_slice *slice,
+			    struct cl_io *unused)
+{
+	/* transient page should always be sent. */
+	return 0;
+}
+
+/*****************************************************************************
+ *
+ * Lock operations.
+ *
+ */
+
+void ccc_lock_delete(const struct lu_env *env,
+		     const struct cl_lock_slice *slice)
+{
+	CLOBINVRNT(env, slice->cls_obj, ccc_object_invariant(slice->cls_obj));
+}
+
+void ccc_lock_fini(const struct lu_env *env, struct cl_lock_slice *slice)
+{
+	struct ccc_lock *clk = cl2ccc_lock(slice);
+
+	kmem_cache_free(ccc_lock_kmem, clk);
+}
+
+int ccc_lock_enqueue(const struct lu_env *env,
+		     const struct cl_lock_slice *slice,
+		     struct cl_io *unused, __u32 enqflags)
+{
+	CLOBINVRNT(env, slice->cls_obj, ccc_object_invariant(slice->cls_obj));
+	return 0;
+}
+
+int ccc_lock_use(const struct lu_env *env, const struct cl_lock_slice *slice)
+{
+	CLOBINVRNT(env, slice->cls_obj, ccc_object_invariant(slice->cls_obj));
+	return 0;
+}
+
+int ccc_lock_unuse(const struct lu_env *env, const struct cl_lock_slice *slice)
+{
+	CLOBINVRNT(env, slice->cls_obj, ccc_object_invariant(slice->cls_obj));
+	return 0;
+}
+
+int ccc_lock_wait(const struct lu_env *env, const struct cl_lock_slice *slice)
+{
+	CLOBINVRNT(env, slice->cls_obj, ccc_object_invariant(slice->cls_obj));
+	return 0;
+}
+
+/**
+ * Implementation of cl_lock_operations::clo_fits_into() methods for ccc
+ * layer. This function is executed every time an io finds an existing lock
+ * in the lock cache while creating a new lock. It has to decide whether the
+ * cached lock "fits" the io.
+ *
+ * \param slice lock to be checked
+ * \param io    IO that wants a lock.
+ *
+ * \see lov_lock_fits_into().
+ */
+int ccc_lock_fits_into(const struct lu_env *env,
+		       const struct cl_lock_slice *slice,
+		       const struct cl_lock_descr *need,
+		       const struct cl_io *io)
+{
+	const struct cl_lock       *lock  = slice->cls_lock;
+	const struct cl_lock_descr *descr = &lock->cll_descr;
+	const struct ccc_io	*cio   = ccc_env_io(env);
+	int			 result;
+
+	/*
+	 * Work around DLM peculiarity: it assumes that glimpse
+	 * (LDLM_FL_HAS_INTENT) lock is always LCK_PR, and returns a read lock
+	 * when asked for LCK_PW lock with LDLM_FL_HAS_INTENT flag set. Make
+	 * sure that glimpse doesn't get CLM_WRITE top-lock, so that it
+	 * doesn't enqueue CLM_WRITE sub-locks.
+	 */
+	if (cio->cui_glimpse)
+		result = descr->cld_mode != CLM_WRITE;
+
+	/*
+	 * Also, don't match incomplete write locks for read, otherwise read
+	 * would enqueue missing sub-locks in the write mode.
+	 */
+	else if (need->cld_mode != descr->cld_mode)
+		result = lock->cll_state >= CLS_ENQUEUED;
+	else
+		result = 1;
+	return result;
+}
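+
+/*
+ * Illustrative outcomes of the checks above (a sketch, not exhaustive):
+ *
+ *	glimpse io, cached CLM_WRITE lock		-> 0 (doesn't fit)
+ *	glimpse io, cached non-write lock		-> 1 (fits)
+ *	mode mismatch, lock not yet CLS_ENQUEUED	-> 0 (doesn't fit)
+ *	matching modes					-> 1 (fits)
+ */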
+
+/**
+ * Implements the cl_lock_operations::clo_state() method for the ccc layer,
+ * invoked whenever the lock state changes. Transfers object attributes that
+ * might have been updated as a result of lock acquisition into the inode.
+ */
+void ccc_lock_state(const struct lu_env *env,
+		    const struct cl_lock_slice *slice,
+		    enum cl_lock_state state)
+{
+	struct cl_lock *lock = slice->cls_lock;
+
+	/*
+	 * Refresh inode attributes when the lock is moving into CLS_HELD
+	 * state, and only when this is a result of real enqueue, rather than
+	 * of finding lock in the cache.
+	 */
+	if (state == CLS_HELD && lock->cll_state < CLS_HELD) {
+		struct cl_object *obj;
+		struct inode     *inode;
+
+		obj   = slice->cls_obj;
+		inode = ccc_object_inode(obj);
+
+		/* vmtruncate() sets the i_size
+		 * under both a DLM lock and the
+		 * ll_inode_size_lock().  If we don't get the
+		 * ll_inode_size_lock() here we can match the DLM lock and
+		 * reset i_size.  generic_file_write can then trust the
+		 * stale i_size when doing appending writes and effectively
+		 * cancel the result of the truncate.  Getting the
+		 * ll_inode_size_lock() after the enqueue maintains the DLM
+		 * -> ll_inode_size_lock() acquiring order.
+		 */
+		if (lock->cll_descr.cld_start == 0 &&
+		    lock->cll_descr.cld_end == CL_PAGE_EOF)
+			cl_merge_lvb(env, inode);
+	}
+}
+
+/*****************************************************************************
+ *
+ * io operations.
+ *
+ */
+
+int ccc_io_one_lock_index(const struct lu_env *env, struct cl_io *io,
+			  __u32 enqflags, enum cl_lock_mode mode,
+			  pgoff_t start, pgoff_t end)
+{
+	struct ccc_io	  *cio   = ccc_env_io(env);
+	struct cl_lock_descr   *descr = &cio->cui_link.cill_descr;
+	struct cl_object       *obj   = io->ci_obj;
+
+	CLOBINVRNT(env, obj, ccc_object_invariant(obj));
+
+	CDEBUG(D_VFSTRACE, "lock: %d [%lu, %lu]\n", mode, start, end);
+
+	memset(&cio->cui_link, 0, sizeof(cio->cui_link));
+
+	if (cio->cui_fd && (cio->cui_fd->fd_flags & LL_FILE_GROUP_LOCKED)) {
+		descr->cld_mode = CLM_GROUP;
+		descr->cld_gid  = cio->cui_fd->fd_grouplock.cg_gid;
+	} else {
+		descr->cld_mode  = mode;
+	}
+	descr->cld_obj   = obj;
+	descr->cld_start = start;
+	descr->cld_end   = end;
+	descr->cld_enq_flags = enqflags;
+
+	cl_io_lock_add(env, io, &cio->cui_link);
+	return 0;
+}
+
+void ccc_io_update_iov(const struct lu_env *env,
+		       struct ccc_io *cio, struct cl_io *io)
+{
+	size_t size = io->u.ci_rw.crw_count;
+
+	if (!cl_is_normalio(env, io) || !cio->cui_iter)
+		return;
+
+	iov_iter_truncate(cio->cui_iter, size);
+}
+
+int ccc_io_one_lock(const struct lu_env *env, struct cl_io *io,
+		    __u32 enqflags, enum cl_lock_mode mode,
+		    loff_t start, loff_t end)
+{
+	struct cl_object *obj = io->ci_obj;
+
+	return ccc_io_one_lock_index(env, io, enqflags, mode,
+				     cl_index(obj, start), cl_index(obj, end));
+}
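+
+/*
+ * The byte-range variant above maps offsets to page indices via cl_index();
+ * for a standard page size this amounts to (a sketch):
+ *
+ *	pgoff_t start_index = start >> PAGE_CACHE_SHIFT;
+ *	pgoff_t end_index   = end >> PAGE_CACHE_SHIFT;
+ *
+ * so the lock covers every page touching [start, end].
+ */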
+
+void ccc_io_end(const struct lu_env *env, const struct cl_io_slice *ios)
+{
+	CLOBINVRNT(env, ios->cis_io->ci_obj,
+		   ccc_object_invariant(ios->cis_io->ci_obj));
+}
+
+void ccc_io_advance(const struct lu_env *env,
+		    const struct cl_io_slice *ios,
+		    size_t nob)
+{
+	struct ccc_io    *cio = cl2ccc_io(env, ios);
+	struct cl_io     *io  = ios->cis_io;
+	struct cl_object *obj = ios->cis_io->ci_obj;
+
+	CLOBINVRNT(env, obj, ccc_object_invariant(obj));
+
+	if (!cl_is_normalio(env, io))
+		return;
+
+	iov_iter_reexpand(cio->cui_iter, cio->cui_tot_count -= nob);
+}
+
+/**
+ * Helper function that, if necessary, adjusts the file size (inode->i_size)
+ * when the position at offset \a pos is accessed. The file size can be
+ * arbitrarily stale on a Lustre client, but the client at least knows the
+ * KMS (known minimum size). If the accessed area is inside [0, KMS], set
+ * the file size to KMS; otherwise, glimpse the file size.
+ *
+ * Locking: cl_isize_lock is used to serialize changes to inode size and to
+ * protect consistency between inode size and cl_object
+ * attributes. cl_object_size_lock() protects consistency between cl_attr's of
+ * top-object and sub-objects.
+ */
+int ccc_prep_size(const struct lu_env *env, struct cl_object *obj,
+		  struct cl_io *io, loff_t start, size_t count, int *exceed)
+{
+	struct cl_attr *attr  = ccc_env_thread_attr(env);
+	struct inode   *inode = ccc_object_inode(obj);
+	loff_t	  pos   = start + count - 1;
+	loff_t kms;
+	int result;
+
+	/*
+	 * Consistency guarantees: following possibilities exist for the
+	 * relation between region being accessed and real file size at this
+	 * moment:
+	 *
+	 *  (A): the region is completely inside of the file;
+	 *
+	 *  (B-x): x bytes of region are inside of the file, the rest is
+	 *  outside;
+	 *
+	 *  (C): the region is completely outside of the file.
+	 *
+	 * This classification is stable under DLM lock already acquired by
+	 * the caller, because to change the class, other client has to take
+	 * DLM lock conflicting with our lock. Also, any updates to ->i_size
+	 * by other threads on this client are serialized by
+	 * ll_inode_size_lock(). This guarantees that short reads are handled
+	 * correctly in the face of concurrent writes and truncates.
+	 */
+	ccc_object_size_lock(obj);
+	result = cl_object_attr_get(env, obj, attr);
+	if (result == 0) {
+		kms = attr->cat_kms;
+		if (pos > kms) {
+			/*
+			 * A glimpse is necessary to determine whether we
+			 * return a short read (B) or some zeroes at the end
+			 * of the buffer (C)
+			 */
+			ccc_object_size_unlock(obj);
+			result = cl_glimpse_lock(env, io, inode, obj, 0);
+			if (result == 0 && exceed) {
+				/* If the requested page index exceeds the
+				 * end-of-file page index, return directly.
+				 * Do not expect the kernel to check such a
+				 * case correctly; linux-2.6.18-128.1.1 fails
+				 * to do that. --bug 17336
+				 */
+				loff_t size = cl_isize_read(inode);
+				loff_t cur_index = start >> PAGE_CACHE_SHIFT;
+				loff_t size_index = (size - 1) >>
+						    PAGE_CACHE_SHIFT;
+
+				if ((size == 0 && cur_index != 0) ||
+				    size_index < cur_index)
+					*exceed = 1;
+			}
+			return result;
+		}
+		/*
+		 * region is within kms and, hence, within real file
+		 * size (A). We need to increase i_size to cover the
+		 * read region so that generic_file_read() will do its
+		 * job, but that doesn't mean the kms size is
+		 * _correct_, it is only the _minimum_ size. If
+		 * someone does a stat they will get the correct size
+		 * which will always be >= the kms value here.
+		 * b=11081
+		 */
+		if (cl_isize_read(inode) < kms) {
+			cl_isize_write_nolock(inode, kms);
+			CDEBUG(D_VFSTRACE, DFID " updating i_size %llu\n",
+			       PFID(lu_object_fid(&obj->co_lu)),
+			       (__u64)cl_isize_read(inode));
+		}
+	}
+	ccc_object_size_unlock(obj);
+	return result;
+}
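+
+/*
+ * A worked example of the KMS logic above (numbers illustrative only):
+ * with kms = 4096, a read of [0, 1023] is case (A), so i_size is raised to
+ * kms and no RPC is needed; a read of [8192, 9215] has pos = 9215 > kms,
+ * so a glimpse is required to tell a short read (B) from zeroes past the
+ * end of file (C).
+ */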
+
+/*****************************************************************************
+ *
+ * Transfer operations.
+ *
+ */
+
+void ccc_req_completion(const struct lu_env *env,
+			const struct cl_req_slice *slice, int ioret)
+{
+	struct ccc_req *vrq;
+
+	if (ioret > 0)
+		cl_stats_tally(slice->crs_dev, slice->crs_req->crq_type, ioret);
+
+	vrq = cl2ccc_req(slice);
+	kmem_cache_free(ccc_req_kmem, vrq);
+}
+
+/**
+ * Implementation of struct cl_req_operations::cro_attr_set() for ccc
+ * layer. ccc is responsible for
+ *
+ *    - o_[mac]time
+ *
+ *    - o_mode
+ *
+ *    - o_parent_seq
+ *
+ *    - o_[ug]id
+ *
+ *    - o_parent_oid
+ *
+ *    - o_parent_ver
+ *
+ *    - o_ioepoch
+ */
+void ccc_req_attr_set(const struct lu_env *env,
+		      const struct cl_req_slice *slice,
+		      const struct cl_object *obj,
+		      struct cl_req_attr *attr, u64 flags)
+{
+	struct inode *inode;
+	struct obdo  *oa;
+	u32	      valid_flags;
+
+	oa = attr->cra_oa;
+	inode = ccc_object_inode(obj);
+	valid_flags = OBD_MD_FLTYPE;
+
+	if (slice->crs_req->crq_type == CRT_WRITE) {
+		if (flags & OBD_MD_FLEPOCH) {
+			oa->o_valid |= OBD_MD_FLEPOCH;
+			oa->o_ioepoch = cl_i2info(inode)->lli_ioepoch;
+			valid_flags |= OBD_MD_FLMTIME | OBD_MD_FLCTIME |
+				       OBD_MD_FLUID | OBD_MD_FLGID;
+		}
+	}
+	obdo_from_inode(oa, inode, valid_flags & flags);
+	obdo_set_parent_fid(oa, &cl_i2info(inode)->lli_fid);
+	memcpy(attr->cra_jobid, cl_i2info(inode)->lli_jobid,
+	       JOBSTATS_JOBID_SIZE);
+}
+
+static const struct cl_req_operations ccc_req_ops = {
+	.cro_attr_set   = ccc_req_attr_set,
+	.cro_completion = ccc_req_completion
+};
+
+int cl_setattr_ost(struct inode *inode, const struct iattr *attr)
+{
+	struct lu_env *env;
+	struct cl_io  *io;
+	int	    result;
+	int	    refcheck;
+
+	env = cl_env_get(&refcheck);
+	if (IS_ERR(env))
+		return PTR_ERR(env);
+
+	io = ccc_env_thread_io(env);
+	io->ci_obj = cl_i2info(inode)->lli_clob;
+
+	io->u.ci_setattr.sa_attr.lvb_atime = LTIME_S(attr->ia_atime);
+	io->u.ci_setattr.sa_attr.lvb_mtime = LTIME_S(attr->ia_mtime);
+	io->u.ci_setattr.sa_attr.lvb_ctime = LTIME_S(attr->ia_ctime);
+	io->u.ci_setattr.sa_attr.lvb_size = attr->ia_size;
+	io->u.ci_setattr.sa_valid = attr->ia_valid;
+
+again:
+	if (cl_io_init(env, io, CIT_SETATTR, io->ci_obj) == 0) {
+		struct ccc_io *cio = ccc_env_io(env);
+
+		if (attr->ia_valid & ATTR_FILE)
+			/* populate the file descriptor for ftruncate to honor
+			 * group lock - see LU-787
+			 */
+			cio->cui_fd = cl_iattr2fd(inode, attr);
+
+		result = cl_io_loop(env, io);
+	} else {
+		result = io->ci_result;
+	}
+	cl_io_fini(env, io);
+	if (unlikely(io->ci_need_restart))
+		goto again;
+	/* HSM import case: the file is released and cannot be restored;
+	 * no need to fail unless restore registration failed
+	 * with -ENODATA.
+	 */
+	if (result == -ENODATA && io->ci_restore_needed &&
+	    io->ci_result != -ENODATA)
+		result = 0;
+	cl_env_put(env, &refcheck);
+	return result;
+}
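+
+/*
+ * A hedged usage sketch for cl_setattr_ost() (illustrative only; the real
+ * caller is the llite setattr path):
+ *
+ *	struct iattr ia = {
+ *		.ia_valid = ATTR_SIZE,
+ *		.ia_size  = new_size,
+ *	};
+ *	rc = cl_setattr_ost(inode, &ia);
+ *
+ * which truncates the OST objects backing the file to new_size.
+ */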
+
+/*****************************************************************************
+ *
+ * Type conversions.
+ *
+ */
+
+struct lu_device *ccc2lu_dev(struct ccc_device *vdv)
+{
+	return &vdv->cdv_cl.cd_lu_dev;
+}
+
+struct ccc_device *lu2ccc_dev(const struct lu_device *d)
+{
+	return container_of0(d, struct ccc_device, cdv_cl.cd_lu_dev);
+}
+
+struct ccc_device *cl2ccc_dev(const struct cl_device *d)
+{
+	return container_of0(d, struct ccc_device, cdv_cl);
+}
+
+struct lu_object *ccc2lu(struct ccc_object *vob)
+{
+	return &vob->cob_cl.co_lu;
+}
+
+struct ccc_object *lu2ccc(const struct lu_object *obj)
+{
+	return container_of0(obj, struct ccc_object, cob_cl.co_lu);
+}
+
+struct ccc_object *cl2ccc(const struct cl_object *obj)
+{
+	return container_of0(obj, struct ccc_object, cob_cl);
+}
+
+struct ccc_lock *cl2ccc_lock(const struct cl_lock_slice *slice)
+{
+	return container_of(slice, struct ccc_lock, clk_cl);
+}
+
+struct ccc_io *cl2ccc_io(const struct lu_env *env,
+			 const struct cl_io_slice *slice)
+{
+	struct ccc_io *cio;
+
+	cio = container_of(slice, struct ccc_io, cui_cl);
+	LASSERT(cio == ccc_env_io(env));
+	return cio;
+}
+
+struct ccc_req *cl2ccc_req(const struct cl_req_slice *slice)
+{
+	return container_of0(slice, struct ccc_req, crq_cl);
+}
+
+struct page *cl2vm_page(const struct cl_page_slice *slice)
+{
+	return cl2ccc_page(slice)->cpg_page;
+}
+
+/*****************************************************************************
+ *
+ * Accessors.
+ *
+ */
+int ccc_object_invariant(const struct cl_object *obj)
+{
+	struct inode	 *inode = ccc_object_inode(obj);
+	struct cl_inode_info *lli   = cl_i2info(inode);
+
+	return (S_ISREG(cl_inode_mode(inode)) ||
+		/* i_mode of unlinked inode is zeroed. */
+		cl_inode_mode(inode) == 0) && lli->lli_clob == obj;
+}
+
+struct inode *ccc_object_inode(const struct cl_object *obj)
+{
+	return cl2ccc(obj)->cob_inode;
+}
+
+/**
+ * Initialize or update CLIO structures for regular files when new
+ * meta-data arrives from the server.
+ *
+ * \param inode regular file inode
+ * \param md    new file metadata from MDS
+ * - allocates a cl_object if necessary,
+ * - updates the layout if the object was already there.
+ */
+int cl_file_inode_init(struct inode *inode, struct lustre_md *md)
+{
+	struct lu_env	*env;
+	struct cl_inode_info *lli;
+	struct cl_object     *clob;
+	struct lu_site       *site;
+	struct lu_fid	*fid;
+	struct cl_object_conf conf = {
+		.coc_inode = inode,
+		.u = {
+			.coc_md    = md
+		}
+	};
+	int result = 0;
+	int refcheck;
+
+	LASSERT(md->body->valid & OBD_MD_FLID);
+	LASSERT(S_ISREG(cl_inode_mode(inode)));
+
+	env = cl_env_get(&refcheck);
+	if (IS_ERR(env))
+		return PTR_ERR(env);
+
+	site = cl_i2sbi(inode)->ll_site;
+	lli  = cl_i2info(inode);
+	fid  = &lli->lli_fid;
+	LASSERT(fid_is_sane(fid));
+
+	if (!lli->lli_clob) {
+		/* clob is a slave of the inode: an empty lli_clob means that
+		 * for this new inode there is no clob in the cache with the
+		 * given fid, so it is unnecessary to perform
+		 * lookup-alloc-lookup-insert; just alloc and insert directly.
+		 */
+		LASSERT(inode->i_state & I_NEW);
+		conf.coc_lu.loc_flags = LOC_F_NEW;
+		clob = cl_object_find(env, lu2cl_dev(site->ls_top_dev),
+				      fid, &conf);
+		if (!IS_ERR(clob)) {
+			/*
+			 * No locking is necessary, as new inode is
+			 * locked by I_NEW bit.
+			 */
+			lli->lli_clob = clob;
+			lli->lli_has_smd = lsm_has_objects(md->lsm);
+			lu_object_ref_add(&clob->co_lu, "inode", inode);
+		} else {
+			result = PTR_ERR(clob);
+		}
+	} else {
+		result = cl_conf_set(env, lli->lli_clob, &conf);
+	}
+
+	cl_env_put(env, &refcheck);
+
+	if (result != 0)
+		CERROR("Failure to initialize cl object " DFID ": %d\n",
+		       PFID(fid), result);
+	return result;
+}
+
+/**
+ * Wait for others drop their references of the object at first, then we drop
+ * the last one, which will lead to the object be destroyed immediately.
+ * Must be called after cl_object_kill() against this object.
+ *
+ * The reason we want to do this is: destroying top object will wait for sub
+ * objects being destroyed first, so we can't let bottom layer (e.g. from ASTs)
+ * to initiate top object destroying which may deadlock. See bz22520.
+ */
+static void cl_object_put_last(struct lu_env *env, struct cl_object *obj)
+{
+	struct lu_object_header *header = obj->co_lu.lo_header;
+	wait_queue_t	   waiter;
+
+	if (unlikely(atomic_read(&header->loh_ref) != 1)) {
+		struct lu_site *site = obj->co_lu.lo_dev->ld_site;
+		struct lu_site_bkt_data *bkt;
+
+		bkt = lu_site_bkt_from_fid(site, &header->loh_fid);
+
+		init_waitqueue_entry(&waiter, current);
+		add_wait_queue(&bkt->lsb_marche_funebre, &waiter);
+
+		while (1) {
+			set_current_state(TASK_UNINTERRUPTIBLE);
+			if (atomic_read(&header->loh_ref) == 1)
+				break;
+			schedule();
+		}
+
+		set_current_state(TASK_RUNNING);
+		remove_wait_queue(&bkt->lsb_marche_funebre, &waiter);
+	}
+
+	cl_object_put(env, obj);
+}
+
+void cl_inode_fini(struct inode *inode)
+{
+	struct lu_env	   *env;
+	struct cl_inode_info    *lli  = cl_i2info(inode);
+	struct cl_object	*clob = lli->lli_clob;
+	int refcheck;
+	int emergency;
+
+	if (clob) {
+		void		    *cookie;
+
+		cookie = cl_env_reenter();
+		env = cl_env_get(&refcheck);
+		emergency = IS_ERR(env);
+		if (emergency) {
+			mutex_lock(&ccc_inode_fini_guard);
+			LASSERT(ccc_inode_fini_env);
+			cl_env_implant(ccc_inode_fini_env, &refcheck);
+			env = ccc_inode_fini_env;
+		}
+		/*
+		 * cl_object cache is a slave to inode cache (which, in turn
+		 * is a slave to dentry cache), don't keep cl_object in memory
+		 * when its master is evicted.
+		 */
+		cl_object_kill(env, clob);
+		lu_object_ref_del(&clob->co_lu, "inode", inode);
+		cl_object_put_last(env, clob);
+		lli->lli_clob = NULL;
+		if (emergency) {
+			cl_env_unplant(ccc_inode_fini_env, &refcheck);
+			mutex_unlock(&ccc_inode_fini_guard);
+		} else {
+			cl_env_put(env, &refcheck);
+		}
+		cl_env_reexit(cookie);
+	}
+}
+
+/**
+ * Return the IF_* type for the given lu_dirent entry.
+ * The IF_* flag should be converted to the particular OS file type by the
+ * platform llite module.
+ */
+__u16 ll_dirent_type_get(struct lu_dirent *ent)
+{
+	__u16 type = 0;
+	struct luda_type *lt;
+	int len = 0;
+
+	if (le32_to_cpu(ent->lde_attrs) & LUDA_TYPE) {
+		const unsigned int align = sizeof(struct luda_type) - 1;
+
+		len = le16_to_cpu(ent->lde_namelen);
+		len = (len + align) & ~align;
+		lt = (void *)ent->lde_name + len;
+		type = IFTODT(le16_to_cpu(lt->lt_type));
+	}
+	return type;
+}
+
+/**
+ * build inode number from passed @fid
+ */
+__u64 cl_fid_build_ino(const struct lu_fid *fid, int api32)
+{
+	if (BITS_PER_LONG == 32 || api32)
+		return fid_flatten32(fid);
+	else
+		return fid_flatten(fid);
+}
+
+/**
+ * build inode generation from passed @fid.  If our FID overflows the 32-bit
+ * inode number then return a non-zero generation to distinguish them.
+ */
+__u32 cl_fid_build_gen(const struct lu_fid *fid)
+{
+	__u32 gen;
+
+	if (fid_is_igif(fid)) {
+		gen = lu_igif_gen(fid);
+		return gen;
+	}
+
+	gen = fid_flatten(fid) >> 32;
+	return gen;
+}
+
+/* The lsm is unreliable after the HSM implementation, as the layout can
+ * change at any time. This is only to support old, non-clio-ized interfaces.
+ * Calling clio operations while holding this extra layout refcount will cause
+ * a deadlock: if the layout changes during the IO, ll_layout_refresh() has to
+ * wait for the refcount to drop to zero before destroying the older layout.
+ *
+ * Note that the lsm returned by this function may not be valid unless called
+ * under the layout lock - MDS_INODELOCK_LAYOUT.
+ */
+struct lov_stripe_md *ccc_inode_lsm_get(struct inode *inode)
+{
+	return lov_lsm_get(cl_i2info(inode)->lli_clob);
+}
+
+inline void ccc_inode_lsm_put(struct inode *inode, struct lov_stripe_md *lsm)
+{
+	lov_lsm_put(cl_i2info(inode)->lli_clob, lsm);
+}
diff --git a/drivers/staging/lustre/lustre/llite/lcommon_misc.c b/drivers/staging/lustre/lustre/llite/lcommon_misc.c
new file mode 100644
index 0000000..d80bcedd
--- /dev/null
+++ b/drivers/staging/lustre/lustre/llite/lcommon_misc.c
@@ -0,0 +1,200 @@
+/*
+ * GPL HEADER START
+ *
+ * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 only,
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License version 2 for more details (a copy is included
+ * in the LICENSE file that accompanied this code).
+ *
+ * You should have received a copy of the GNU General Public License
+ * version 2 along with this program; If not, see
+ * http://www.sun.com/software/products/lustre/docs/GPLv2.pdf
+ *
+ * Please contact Sun Microsystems, Inc., 4150 Network Circle, Santa Clara,
+ * CA 95054 USA or visit www.sun.com if you need additional information or
+ * have any questions.
+ *
+ * GPL HEADER END
+ */
+/*
+ * Copyright (c) 2008, 2010, Oracle and/or its affiliates. All rights reserved.
+ * Use is subject to license terms.
+ *
+ * Copyright (c) 2011, 2012, Intel Corporation.
+ */
+/*
+ * This file is part of Lustre, http://www.lustre.org/
+ * Lustre is a trademark of Sun Microsystems, Inc.
+ *
+ * cl code shared between vvp and liblustre (and other Lustre clients in the
+ * future).
+ *
+ */
+#include "../include/obd_class.h"
+#include "../include/obd_support.h"
+#include "../include/obd.h"
+#include "../include/cl_object.h"
+#include "../include/lclient.h"
+
+#include "../include/lustre_lite.h"
+
+/* Initialize the default and maximum LOV EA and cookie sizes.  This allows
+ * us to make MDS RPCs with large enough reply buffers to hold the
+ * maximum-sized (= maximum striped) EA and cookie without having to
+ * calculate this (via a call into the LOV + OSCs) each time we make an RPC.
+ */
+int cl_init_ea_size(struct obd_export *md_exp, struct obd_export *dt_exp)
+{
+	struct lov_stripe_md lsm = { .lsm_magic = LOV_MAGIC_V3 };
+	__u32 valsize = sizeof(struct lov_desc);
+	int rc, easize, def_easize, cookiesize;
+	struct lov_desc desc;
+	__u16 stripes, def_stripes;
+
+	rc = obd_get_info(NULL, dt_exp, sizeof(KEY_LOVDESC), KEY_LOVDESC,
+			  &valsize, &desc, NULL);
+	if (rc)
+		return rc;
+
+	stripes = min_t(__u32, desc.ld_tgt_count, LOV_MAX_STRIPE_COUNT);
+	lsm.lsm_stripe_count = stripes;
+	easize = obd_size_diskmd(dt_exp, &lsm);
+
+	def_stripes = min_t(__u32, desc.ld_default_stripe_count,
+			    LOV_MAX_STRIPE_COUNT);
+	lsm.lsm_stripe_count = def_stripes;
+	def_easize = obd_size_diskmd(dt_exp, &lsm);
+
+	cookiesize = stripes * sizeof(struct llog_cookie);
+
+	/* the default cookiesize is 0 because, since 2.4, the server doesn't
+	 * send llog cookies to the client.
+	 */
+	CDEBUG(D_HA,
+	       "updating def/max_easize: %d/%d def/max_cookiesize: 0/%d\n",
+	       def_easize, easize, cookiesize);
+
+	rc = md_init_ea_size(md_exp, easize, def_easize, cookiesize, 0);
+	return rc;
+}
+
+/**
+ * This function is used as an upcall-callback hooked by liblustre and llite
+ * clients into the obd_notify() listener chain to handle notifications about
+ * changes of import connect_flags. See llu_fsswop_mount() and
+ * lustre_common_fill_super().
+ */
+int cl_ocd_update(struct obd_device *host,
+		  struct obd_device *watched,
+		  enum obd_notify_event ev, void *owner, void *data)
+{
+	struct lustre_client_ocd *lco;
+	struct client_obd	*cli;
+	__u64 flags;
+	int   result;
+
+	if (!strcmp(watched->obd_type->typ_name, LUSTRE_OSC_NAME)) {
+		cli = &watched->u.cli;
+		lco = owner;
+		flags = cli->cl_import->imp_connect_data.ocd_connect_flags;
+		CDEBUG(D_SUPER, "Changing connect_flags: %#llx -> %#llx\n",
+		       lco->lco_flags, flags);
+		mutex_lock(&lco->lco_lock);
+		lco->lco_flags &= flags;
+		/* for each osc event update ea size */
+		if (lco->lco_dt_exp)
+			cl_init_ea_size(lco->lco_md_exp, lco->lco_dt_exp);
+
+		mutex_unlock(&lco->lco_lock);
+		result = 0;
+	} else {
+		CERROR("unexpected notification from %s %s!\n",
+		       watched->obd_type->typ_name,
+		       watched->obd_name);
+		result = -EINVAL;
+	}
+	return result;
+}
+
+#define GROUPLOCK_SCOPE "grouplock"
+
+int cl_get_grouplock(struct cl_object *obj, unsigned long gid, int nonblock,
+		     struct ccc_grouplock *cg)
+{
+	struct lu_env	  *env;
+	struct cl_io	   *io;
+	struct cl_lock	 *lock;
+	struct cl_lock_descr   *descr;
+	__u32		   enqflags;
+	int		     refcheck;
+	int		     rc;
+
+	env = cl_env_get(&refcheck);
+	if (IS_ERR(env))
+		return PTR_ERR(env);
+
+	io = ccc_env_thread_io(env);
+	io->ci_obj = obj;
+	io->ci_ignore_layout = 1;
+
+	rc = cl_io_init(env, io, CIT_MISC, io->ci_obj);
+	if (rc) {
+		/* Does not make sense to take GL for released layout */
+		if (rc > 0)
+			rc = -ENOTSUPP;
+		cl_env_put(env, &refcheck);
+		return rc;
+	}
+
+	descr = &ccc_env_info(env)->cti_descr;
+	descr->cld_obj = obj;
+	descr->cld_start = 0;
+	descr->cld_end = CL_PAGE_EOF;
+	descr->cld_gid = gid;
+	descr->cld_mode = CLM_GROUP;
+
+	enqflags = CEF_MUST | (nonblock ? CEF_NONBLOCK : 0);
+	descr->cld_enq_flags = enqflags;
+
+	lock = cl_lock_request(env, io, descr, GROUPLOCK_SCOPE, current);
+	if (IS_ERR(lock)) {
+		cl_io_fini(env, io);
+		cl_env_put(env, &refcheck);
+		return PTR_ERR(lock);
+	}
+
+	cg->cg_env  = cl_env_get(&refcheck);
+	cg->cg_io   = io;
+	cg->cg_lock = lock;
+	cg->cg_gid  = gid;
+	LASSERT(cg->cg_env == env);
+
+	cl_env_unplant(env, &refcheck);
+	return 0;
+}
+
+void cl_put_grouplock(struct ccc_grouplock *cg)
+{
+	struct lu_env  *env  = cg->cg_env;
+	struct cl_io   *io   = cg->cg_io;
+	struct cl_lock *lock = cg->cg_lock;
+	int	     refcheck;
+
+	LASSERT(cg->cg_env);
+	LASSERT(cg->cg_gid);
+
+	cl_env_implant(env, &refcheck);
+	cl_env_put(env, &refcheck);
+
+	cl_unuse(env, lock);
+	cl_lock_release(env, lock, GROUPLOCK_SCOPE, current);
+	cl_io_fini(env, io);
+	cl_env_put(env, NULL);
+}
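
For readers skimming the new file: the two helpers above are meant to
bracket group-exclusive IO. A minimal sketch of a hypothetical caller
(the function name and context are invented for illustration; in llite
the real entry point is the group-lock ioctl path):

	/* Hypothetical caller, not part of this patch. */
	static int example_group_io(struct cl_object *obj, unsigned long gid)
	{
		struct ccc_grouplock cg;
		int rc;

		/* nonblock == 0: wait until the CLM_GROUP lock is granted */
		rc = cl_get_grouplock(obj, gid, 0, &cg);
		if (rc != 0)
			return rc;

		/* ... perform IO covered by the group lock ... */

		cl_put_grouplock(&cg);
		return 0;
	}
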
-- 
2.1.0


* [PATCH v2 04/46] staging/lustre: Reintroduce global env list
  2016-03-30 23:48 [PATCH v2 00/46] Lustre IO stack simplifications and cleanups green
                   ` (2 preceding siblings ...)
  2016-03-30 23:48 ` [PATCH v2 03/46] staging/lustre: merge lclient/*.c into llite/ green
@ 2016-03-30 23:48 ` green
  2016-03-30 23:48 ` [PATCH v2 05/46] staging/lustre/osc: Adjustment on osc LRU for performance green
                   ` (41 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: green @ 2016-03-30 23:48 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger
  Cc: Linux Kernel Mailing List, Lustre Development List,
	Jinshan Xiong, Oleg Drokin

From: Jinshan Xiong <jinshan.xiong@intel.com>

This reverts a patch that was merged before the lustre client
was introduced into the staging tree, so it's not in the history.

Performance dropped a lot when the memory reclaim process kicked
in and ll_releasepage() was called to destroy lustre pages. It
turned out that allocating the cl_env and its keys on the fly
carried too much overhead, so we have to revert this patch.

The problem that the reverted patch originally addressed will be
solved in a follow-on patch instead.
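
A sketch of the caller-side pattern this cache speeds up (illustrative
only; the function is hypothetical, but cl_env_get()/cl_env_put() are
the real entry points changed below). With the global list, the put
pushes the environment onto cl_envs and the next get pops it via
cl_env_obtain() instead of paying for a full cl_env_new():

	static int example_hot_path(void)
	{
		struct lu_env *env;
		int refcheck;

		env = cl_env_get(&refcheck);	/* reuses a cached cl_env */
		if (IS_ERR(env))
			return PTR_ERR(env);

		/* ... short-lived CLIO work, e.g. releasing a page ... */

		cl_env_put(env, &refcheck);	/* returns the env to cl_envs */
		return 0;
	}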

Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-on: http://review.whamcloud.com/7888
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3321
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
---
 drivers/staging/lustre/lustre/include/cl_object.h  |  1 +
 drivers/staging/lustre/lustre/llite/lcommon_misc.c |  3 +-
 drivers/staging/lustre/lustre/llite/llite_lib.c    |  2 +
 drivers/staging/lustre/lustre/obdclass/cl_object.c | 90 ++++++++++++++++++++--
 drivers/staging/lustre/lustre/obdclass/lu_object.c |  2 +
 5 files changed, 92 insertions(+), 6 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/cl_object.h b/drivers/staging/lustre/lustre/include/cl_object.h
index fb971de..e611f79 100644
--- a/drivers/staging/lustre/lustre/include/cl_object.h
+++ b/drivers/staging/lustre/lustre/include/cl_object.h
@@ -3241,6 +3241,7 @@ void *cl_env_reenter(void);
 void cl_env_reexit(void *cookie);
 void cl_env_implant(struct lu_env *env, int *refcheck);
 void cl_env_unplant(struct lu_env *env, int *refcheck);
+unsigned int cl_env_cache_purge(unsigned int nr);
 
 /** @} cl_env */
 
diff --git a/drivers/staging/lustre/lustre/llite/lcommon_misc.c b/drivers/staging/lustre/lustre/llite/lcommon_misc.c
index d80bcedd..f68c368 100644
--- a/drivers/staging/lustre/lustre/llite/lcommon_misc.c
+++ b/drivers/staging/lustre/lustre/llite/lcommon_misc.c
@@ -146,10 +146,11 @@ int cl_get_grouplock(struct cl_object *obj, unsigned long gid, int nonblock,
 
 	rc = cl_io_init(env, io, CIT_MISC, io->ci_obj);
 	if (rc) {
+		cl_io_fini(env, io);
+		cl_env_put(env, &refcheck);
 		/* Does not make sense to take GL for released layout */
 		if (rc > 0)
 			rc = -ENOTSUPP;
-		cl_env_put(env, &refcheck);
 		return rc;
 	}
 
diff --git a/drivers/staging/lustre/lustre/llite/llite_lib.c b/drivers/staging/lustre/lustre/llite/llite_lib.c
index 6d6bb33..673d31e 100644
--- a/drivers/staging/lustre/lustre/llite/llite_lib.c
+++ b/drivers/staging/lustre/lustre/llite/llite_lib.c
@@ -999,6 +999,8 @@ void ll_put_super(struct super_block *sb)
 
 	lustre_common_put_super(sb);
 
+	cl_env_cache_purge(~0);
+
 	module_put(THIS_MODULE);
 } /* client_put_super */
 
diff --git a/drivers/staging/lustre/lustre/obdclass/cl_object.c b/drivers/staging/lustre/lustre/obdclass/cl_object.c
index 43e299d..0772706 100644
--- a/drivers/staging/lustre/lustre/obdclass/cl_object.c
+++ b/drivers/staging/lustre/lustre/obdclass/cl_object.c
@@ -492,6 +492,13 @@ EXPORT_SYMBOL(cl_site_stats_print);
  * bz20044, bz22683.
  */
 
+static LIST_HEAD(cl_envs);
+static unsigned int cl_envs_cached_nr;
+static unsigned int cl_envs_cached_max = 128; /* XXX: prototype: arbitrary limit
+					       * for now.
+					       */
+static DEFINE_SPINLOCK(cl_envs_guard);
+
 struct cl_env {
 	void	     *ce_magic;
 	struct lu_env     ce_lu;
@@ -697,6 +704,39 @@ static void cl_env_fini(struct cl_env *cle)
 	kmem_cache_free(cl_env_kmem, cle);
 }
 
+static struct lu_env *cl_env_obtain(void *debug)
+{
+	struct cl_env *cle;
+	struct lu_env *env;
+
+	spin_lock(&cl_envs_guard);
+	LASSERT(equi(cl_envs_cached_nr == 0, list_empty(&cl_envs)));
+	if (cl_envs_cached_nr > 0) {
+		int rc;
+
+		cle = container_of(cl_envs.next, struct cl_env, ce_linkage);
+		list_del_init(&cle->ce_linkage);
+		cl_envs_cached_nr--;
+		spin_unlock(&cl_envs_guard);
+
+		env = &cle->ce_lu;
+		rc = lu_env_refill(env);
+		if (rc == 0) {
+			cl_env_init0(cle, debug);
+			lu_context_enter(&env->le_ctx);
+			lu_context_enter(&cle->ce_ses);
+		} else {
+			cl_env_fini(cle);
+			env = ERR_PTR(rc);
+		}
+	} else {
+		spin_unlock(&cl_envs_guard);
+		env = cl_env_new(lu_context_tags_default,
+				 lu_session_tags_default, debug);
+	}
+	return env;
+}
+
 static inline struct cl_env *cl_env_container(struct lu_env *env)
 {
 	return container_of(env, struct cl_env, ce_lu);
@@ -727,6 +767,8 @@ static struct lu_env *cl_env_peek(int *refcheck)
  * Returns lu_env: if there already is an environment associated with the
  * current thread, it is returned, otherwise, new environment is allocated.
  *
+ * Allocations are amortized through the global cache of environments.
+ *
  * \param refcheck pointer to a counter used to detect environment leaks. In
  * the usual case cl_env_get() and cl_env_put() are called in the same lexical
  * scope and pointer to the same integer is passed as \a refcheck. This is
@@ -740,10 +782,7 @@ struct lu_env *cl_env_get(int *refcheck)
 
 	env = cl_env_peek(refcheck);
 	if (!env) {
-		env = cl_env_new(lu_context_tags_default,
-				 lu_session_tags_default,
-				 __builtin_return_address(0));
-
+		env = cl_env_obtain(__builtin_return_address(0));
 		if (!IS_ERR(env)) {
 			struct cl_env *cle;
 
@@ -787,6 +826,32 @@ static void cl_env_exit(struct cl_env *cle)
 }
 
 /**
+ * Finalizes and frees a given number of cached environments. This is done to
+ * (1) free some memory (not currently hooked into VM), or (2) release
+ * references to modules.
+ */
+unsigned int cl_env_cache_purge(unsigned int nr)
+{
+	struct cl_env *cle;
+
+	spin_lock(&cl_envs_guard);
+	for (; !list_empty(&cl_envs) && nr > 0; --nr) {
+		cle = container_of(cl_envs.next, struct cl_env, ce_linkage);
+		list_del_init(&cle->ce_linkage);
+		LASSERT(cl_envs_cached_nr > 0);
+		cl_envs_cached_nr--;
+		spin_unlock(&cl_envs_guard);
+
+		cl_env_fini(cle);
+		spin_lock(&cl_envs_guard);
+	}
+	LASSERT(equi(cl_envs_cached_nr == 0, list_empty(&cl_envs)));
+	spin_unlock(&cl_envs_guard);
+	return nr;
+}
+EXPORT_SYMBOL(cl_env_cache_purge);
+
+/**
  * Release an environment.
  *
  * Decrement \a env reference counter. When counter drops to 0, nothing in
@@ -808,7 +873,22 @@ void cl_env_put(struct lu_env *env, int *refcheck)
 		cl_env_detach(cle);
 		cle->ce_debug = NULL;
 		cl_env_exit(cle);
-		cl_env_fini(cle);
+		/*
+		 * Don't bother to take a lock here.
+		 *
+		 * Return environment to the cache only when it was allocated
+		 * with the standard tags.
+		 */
+		if (cl_envs_cached_nr < cl_envs_cached_max &&
+		    (env->le_ctx.lc_tags & ~LCT_HAS_EXIT) == LCT_CL_THREAD &&
+		    (env->le_ses->lc_tags & ~LCT_HAS_EXIT) == LCT_SESSION) {
+			spin_lock(&cl_envs_guard);
+			list_add(&cle->ce_linkage, &cl_envs);
+			cl_envs_cached_nr++;
+			spin_unlock(&cl_envs_guard);
+		} else {
+			cl_env_fini(cle);
+		}
 	}
 }
 EXPORT_SYMBOL(cl_env_put);
diff --git a/drivers/staging/lustre/lustre/obdclass/lu_object.c b/drivers/staging/lustre/lustre/obdclass/lu_object.c
index 69fdcee..770d5bd 100644
--- a/drivers/staging/lustre/lustre/obdclass/lu_object.c
+++ b/drivers/staging/lustre/lustre/obdclass/lu_object.c
@@ -55,6 +55,7 @@
 #include "../include/lustre_disk.h"
 #include "../include/lustre_fid.h"
 #include "../include/lu_object.h"
+#include "../include/cl_object.h"
 #include "../include/lu_ref.h"
 #include <linux/list.h>
 
@@ -1468,6 +1469,7 @@ void lu_context_key_quiesce(struct lu_context_key *key)
 		/*
 		 * XXX layering violation.
 		 */
+		cl_env_cache_purge(~0);
 		key->lct_tags |= LCT_QUIESCENT;
 		/*
 		 * XXX memory barrier has to go here.
-- 
2.1.0


* [PATCH v2 05/46] staging/lustre/osc: Adjustment on osc LRU for performance
  2016-03-30 23:48 [PATCH v2 00/46] Lustre IO stack simplifications and cleanups green
                   ` (3 preceding siblings ...)
  2016-03-30 23:48 ` [PATCH v2 04/46] staging/lustre: Reintroduce global env list green
@ 2016-03-30 23:48 ` green
  2016-03-30 23:48 ` [PATCH v2 06/46] staging/lustre/osc: to drop LRU pages with cl_lru_work green
                   ` (40 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: green @ 2016-03-30 23:48 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger
  Cc: Linux Kernel Mailing List, Lustre Development List,
	Jinshan Xiong, Oleg Drokin

From: Jinshan Xiong <jinshan.xiong@intel.com>

Add and discard pages from LRU in batch.
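
The core of the batching is easiest to see in schematic form (condensed
from osc_lru_add_batch() in the diff below; a fragment, not standalone
code): pages finishing IO are first collected on a private list, then
spliced onto the client LRU under a single lock acquisition instead of
taking cl_lru_list_lock once per page:

	struct osc_async_page *oap;
	LIST_HEAD(lru);
	int npages = 0;

	list_for_each_entry(oap, plist, oap_pending_item) {
		struct osc_page *opg = oap2osc_page(oap);

		if (opg->ops_in_lru) {
			/* collect locklessly on a private list */
			list_add(&opg->ops_lru, &lru);
			++npages;
		}
	}

	if (npages > 0) {
		/* one lock round-trip for the whole batch */
		client_obd_list_lock(&cli->cl_lru_list_lock);
		list_splice_tail(&lru, &cli->cl_lru_list);
		atomic_add(npages, &cli->cl_lru_in_list);
		client_obd_list_unlock(&cli->cl_lru_list_lock);
	}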

Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-on: http://review.whamcloud.com/7890
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3321
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
---
 drivers/staging/lustre/lustre/llite/llite_lib.c    |   5 +-
 drivers/staging/lustre/lustre/osc/lproc_osc.c      |   2 +-
 drivers/staging/lustre/lustre/osc/osc_cache.c      |   2 +
 .../staging/lustre/lustre/osc/osc_cl_internal.h    |  26 +-
 drivers/staging/lustre/lustre/osc/osc_internal.h   |   3 +-
 drivers/staging/lustre/lustre/osc/osc_io.c         |  51 ++++
 drivers/staging/lustre/lustre/osc/osc_page.c       | 301 +++++++++++----------
 drivers/staging/lustre/lustre/osc/osc_request.c    |   2 +-
 8 files changed, 235 insertions(+), 157 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/llite_lib.c b/drivers/staging/lustre/lustre/llite/llite_lib.c
index 673d31e..a4401f2 100644
--- a/drivers/staging/lustre/lustre/llite/llite_lib.c
+++ b/drivers/staging/lustre/lustre/llite/llite_lib.c
@@ -85,10 +85,7 @@ static struct ll_sb_info *ll_init_sbi(struct super_block *sb)
 
 	si_meminfo(&si);
 	pages = si.totalram - si.totalhigh;
-	if (pages >> (20 - PAGE_CACHE_SHIFT) < 512)
-		lru_page_max = pages / 2;
-	else
-		lru_page_max = (pages / 4) * 3;
+	lru_page_max = pages / 2;
 
 	/* initialize lru data */
 	atomic_set(&sbi->ll_cache.ccc_users, 0);
diff --git a/drivers/staging/lustre/lustre/osc/lproc_osc.c b/drivers/staging/lustre/lustre/osc/lproc_osc.c
index 57c43c5..3eff12c 100644
--- a/drivers/staging/lustre/lustre/osc/lproc_osc.c
+++ b/drivers/staging/lustre/lustre/osc/lproc_osc.c
@@ -223,7 +223,7 @@ static ssize_t osc_cached_mb_seq_write(struct file *file,
 
 	rc = atomic_read(&cli->cl_lru_in_list) - pages_number;
 	if (rc > 0)
-		(void)osc_lru_shrink(cli, rc);
+		(void)osc_lru_shrink(cli, rc, true);
 
 	return count;
 }
diff --git a/drivers/staging/lustre/lustre/osc/osc_cache.c b/drivers/staging/lustre/lustre/osc/osc_cache.c
index 3cfd2b0..6196c3b 100644
--- a/drivers/staging/lustre/lustre/osc/osc_cache.c
+++ b/drivers/staging/lustre/lustre/osc/osc_cache.c
@@ -856,6 +856,8 @@ int osc_extent_finish(const struct lu_env *env, struct osc_extent *ext,
 
 	ext->oe_rc = rc ?: ext->oe_nr_pages;
 	EASSERT(ergo(rc == 0, ext->oe_state == OES_RPC), ext);
+
+	osc_lru_add_batch(cli, &ext->oe_pages);
 	list_for_each_entry_safe(oap, tmp, &ext->oe_pages, oap_pending_item) {
 		list_del_init(&oap->oap_rpc_item);
 		list_del_init(&oap->oap_pending_item);
diff --git a/drivers/staging/lustre/lustre/osc/osc_cl_internal.h b/drivers/staging/lustre/lustre/osc/osc_cl_internal.h
index d55d04d..f516848 100644
--- a/drivers/staging/lustre/lustre/osc/osc_cl_internal.h
+++ b/drivers/staging/lustre/lustre/osc/osc_cl_internal.h
@@ -77,6 +77,8 @@ struct osc_io {
 	 */
 	struct osc_extent *oi_trunc;
 
+	int oi_lru_reserved;
+
 	struct obd_info    oi_info;
 	struct obdo	oi_oa;
 	struct osc_async_cbargs {
@@ -100,7 +102,7 @@ struct osc_session {
 	struct osc_io       os_io;
 };
 
-#define OTI_PVEC_SIZE 64
+#define OTI_PVEC_SIZE 256
 struct osc_thread_info {
 	struct ldlm_res_id      oti_resname;
 	ldlm_policy_data_t      oti_policy;
@@ -369,18 +371,15 @@ struct osc_page {
 	 * Set if the page must be transferred with OBD_BRW_SRVLOCK.
 	 */
 			      ops_srvlock:1;
-	union {
-		/**
-		 * lru page list. ops_inflight and ops_lru are exclusive so
-		 * that they can share the same data.
-		 */
-		struct list_head	      ops_lru;
-		/**
-		 * Linkage into a per-osc_object list of pages in flight. For
-		 * debugging.
-		 */
-		struct list_head	    ops_inflight;
-	};
+	/**
+	 * lru page list. See osc_lru_{del|use}() in osc_page.c for usage.
+	 */
+	struct list_head	      ops_lru;
+	/**
+	 * Linkage into a per-osc_object list of pages in flight. For
+	 * debugging.
+	 */
+	struct list_head	    ops_inflight;
 	/**
 	 * Thread that submitted this page for transfer. For debugging.
 	 */
@@ -432,6 +431,7 @@ void osc_index2policy  (ldlm_policy_data_t *policy, const struct cl_object *obj,
 int  osc_lvb_print     (const struct lu_env *env, void *cookie,
 			lu_printer_t p, const struct ost_lvb *lvb);
 
+void osc_lru_add_batch(struct client_obd *cli, struct list_head *list);
 void osc_page_submit(const struct lu_env *env, struct osc_page *opg,
 		     enum cl_req_type crt, int brw_flags);
 int osc_cancel_async_page(const struct lu_env *env, struct osc_page *ops);
diff --git a/drivers/staging/lustre/lustre/osc/osc_internal.h b/drivers/staging/lustre/lustre/osc/osc_internal.h
index ea695c2..ec12962 100644
--- a/drivers/staging/lustre/lustre/osc/osc_internal.h
+++ b/drivers/staging/lustre/lustre/osc/osc_internal.h
@@ -130,7 +130,8 @@ int osc_sync_base(struct obd_export *exp, struct obd_info *oinfo,
 int osc_process_config_base(struct obd_device *obd, struct lustre_cfg *cfg);
 int osc_build_rpc(const struct lu_env *env, struct client_obd *cli,
 		  struct list_head *ext_list, int cmd);
-int osc_lru_shrink(struct client_obd *cli, int target);
+int osc_lru_shrink(struct client_obd *cli, int target, bool force);
+int osc_lru_reclaim(struct client_obd *cli);
 
 extern spinlock_t osc_ast_guard;
 
diff --git a/drivers/staging/lustre/lustre/osc/osc_io.c b/drivers/staging/lustre/lustre/osc/osc_io.c
index 6bd0a45..a0fa533 100644
--- a/drivers/staging/lustre/lustre/osc/osc_io.c
+++ b/drivers/staging/lustre/lustre/osc/osc_io.c
@@ -308,6 +308,55 @@ static int osc_io_commit_write(const struct lu_env *env,
 	return 0;
 }
 
+static int osc_io_rw_iter_init(const struct lu_env *env,
+			       const struct cl_io_slice *ios)
+{
+	struct cl_io *io = ios->cis_io;
+	struct osc_io *oio = osc_env_io(env);
+	struct osc_object *osc = cl2osc(ios->cis_obj);
+	struct client_obd *cli = osc_cli(osc);
+	unsigned long c;
+	unsigned int npages;
+	unsigned int max_pages;
+
+	if (cl_io_is_append(io))
+		return 0;
+
+	npages = io->u.ci_rw.crw_count >> PAGE_CACHE_SHIFT;
+	if (io->u.ci_rw.crw_pos & ~PAGE_MASK)
+		++npages;
+
+	max_pages = cli->cl_max_pages_per_rpc * cli->cl_max_rpcs_in_flight;
+	if (npages > max_pages)
+		npages = max_pages;
+
+	c = atomic_read(cli->cl_lru_left);
+	if (c < npages && osc_lru_reclaim(cli) > 0)
+		c = atomic_read(cli->cl_lru_left);
+	while (c >= npages) {
+		if (c == atomic_cmpxchg(cli->cl_lru_left, c, c - npages)) {
+			oio->oi_lru_reserved = npages;
+			break;
+		}
+		c = atomic_read(cli->cl_lru_left);
+	}
+
+	return 0;
+}
+
+static void osc_io_rw_iter_fini(const struct lu_env *env,
+				const struct cl_io_slice *ios)
+{
+	struct osc_io *oio = osc_env_io(env);
+	struct osc_object *osc = cl2osc(ios->cis_obj);
+	struct client_obd *cli = osc_cli(osc);
+
+	if (oio->oi_lru_reserved > 0) {
+		atomic_add(oio->oi_lru_reserved, cli->cl_lru_left);
+		oio->oi_lru_reserved = 0;
+	}
+}
+
 static int osc_io_fault_start(const struct lu_env *env,
 			      const struct cl_io_slice *ios)
 {
@@ -650,6 +699,8 @@ static const struct cl_io_operations osc_io_ops = {
 			.cio_fini   = osc_io_fini
 		},
 		[CIT_WRITE] = {
+			.cio_iter_init = osc_io_rw_iter_init,
+			.cio_iter_fini = osc_io_rw_iter_fini,
 			.cio_start  = osc_io_write_start,
 			.cio_end    = osc_io_end,
 			.cio_fini   = osc_io_fini
diff --git a/drivers/staging/lustre/lustre/osc/osc_page.c b/drivers/staging/lustre/lustre/osc/osc_page.c
index d720b1a..a60b783 100644
--- a/drivers/staging/lustre/lustre/osc/osc_page.c
+++ b/drivers/staging/lustre/lustre/osc/osc_page.c
@@ -42,8 +42,8 @@
 
 #include "osc_cl_internal.h"
 
-static void osc_lru_del(struct client_obd *cli, struct osc_page *opg, bool del);
-static void osc_lru_add(struct client_obd *cli, struct osc_page *opg);
+static void osc_lru_del(struct client_obd *cli, struct osc_page *opg);
+static void osc_lru_use(struct client_obd *cli, struct osc_page *opg);
 static int osc_lru_reserve(const struct lu_env *env, struct osc_object *obj,
 			   struct osc_page *opg);
 
@@ -104,10 +104,7 @@ static void osc_page_transfer_add(const struct lu_env *env,
 {
 	struct osc_object *obj = cl2osc(opg->ops_cl.cpl_obj);
 
-	/* ops_lru and ops_inflight share the same field, so take it from LRU
-	 * first and then use it as inflight.
-	 */
-	osc_lru_del(osc_cli(obj), opg, false);
+	osc_lru_use(osc_cli(obj), opg);
 
 	spin_lock(&obj->oo_seatbelt);
 	list_add(&opg->ops_inflight, &obj->oo_inflight[crt]);
@@ -222,21 +219,15 @@ static void osc_page_completion_read(const struct lu_env *env,
 				     int ioret)
 {
 	struct osc_page *opg = cl2osc_page(slice);
-	struct osc_object *obj = cl2osc(opg->ops_cl.cpl_obj);
 
 	if (likely(opg->ops_lock))
 		osc_page_putref_lock(env, opg);
-	osc_lru_add(osc_cli(obj), opg);
 }
 
 static void osc_page_completion_write(const struct lu_env *env,
 				      const struct cl_page_slice *slice,
 				      int ioret)
 {
-	struct osc_page *opg = cl2osc_page(slice);
-	struct osc_object *obj = cl2osc(slice->cpl_obj);
-
-	osc_lru_add(osc_cli(obj), opg);
 }
 
 static int osc_page_fail(const struct lu_env *env,
@@ -334,7 +325,7 @@ static void osc_page_delete(const struct lu_env *env,
 	}
 	spin_unlock(&obj->oo_seatbelt);
 
-	osc_lru_del(osc_cli(obj), opg, true);
+	osc_lru_del(osc_cli(obj), opg);
 }
 
 static void osc_page_clip(const struct lu_env *env,
@@ -483,13 +474,12 @@ void osc_page_submit(const struct lu_env *env, struct osc_page *opg,
  */
 
 static DECLARE_WAIT_QUEUE_HEAD(osc_lru_waitq);
-static atomic_t osc_lru_waiters = ATOMIC_INIT(0);
 /* LRU pages are freed in batch mode. OSC should at least free this
  * number of pages to avoid running out of LRU budget, and..
  */
 static const int lru_shrink_min = 2 << (20 - PAGE_CACHE_SHIFT);  /* 2M */
 /* free this number at most otherwise it will take too long time to finish. */
-static const int lru_shrink_max = 32 << (20 - PAGE_CACHE_SHIFT); /* 32M */
+static const int lru_shrink_max = 8 << (20 - PAGE_CACHE_SHIFT); /* 8M */
 
 /* Check if we can free LRU slots from this OSC. If there exists LRU waiters,
  * we should free slots aggressively. In this way, slots are freed in a steady
@@ -500,62 +490,127 @@ static const int lru_shrink_max = 32 << (20 - PAGE_CACHE_SHIFT); /* 32M */
 static int osc_cache_too_much(struct client_obd *cli)
 {
 	struct cl_client_cache *cache = cli->cl_cache;
-	int pages = atomic_read(&cli->cl_lru_in_list) >> 1;
+	int pages = atomic_read(&cli->cl_lru_in_list);
+	unsigned long budget;
 
-	if (atomic_read(&osc_lru_waiters) > 0 &&
-	    atomic_read(cli->cl_lru_left) < lru_shrink_max)
-		/* drop lru pages aggressively */
-		return min(pages, lru_shrink_max);
+	budget = cache->ccc_lru_max / atomic_read(&cache->ccc_users);
 
 	/* if it's going to run out LRU slots, we should free some, but not
 	 * too much to maintain fairness among OSCs.
 	 */
 	if (atomic_read(cli->cl_lru_left) < cache->ccc_lru_max >> 4) {
-		unsigned long tmp;
+		if (pages >= budget)
+			return lru_shrink_max;
+		else if (pages >= budget / 2)
+			return lru_shrink_min;
+	} else if (pages >= budget * 2)
+		return lru_shrink_min;
+	return 0;
+}
 
-		tmp = cache->ccc_lru_max / atomic_read(&cache->ccc_users);
-		if (pages > tmp)
-			return min(pages, lru_shrink_max);
+void osc_lru_add_batch(struct client_obd *cli, struct list_head *plist)
+{
+	LIST_HEAD(lru);
+	struct osc_async_page *oap;
+	int npages = 0;
+
+	list_for_each_entry(oap, plist, oap_pending_item) {
+		struct osc_page *opg = oap2osc_page(oap);
+
+		if (!opg->ops_in_lru)
+			continue;
 
-		return pages > lru_shrink_min ? lru_shrink_min : 0;
+		++npages;
+		LASSERT(list_empty(&opg->ops_lru));
+		list_add(&opg->ops_lru, &lru);
 	}
 
-	return 0;
+	if (npages > 0) {
+		client_obd_list_lock(&cli->cl_lru_list_lock);
+		list_splice_tail(&lru, &cli->cl_lru_list);
+		atomic_sub(npages, &cli->cl_lru_busy);
+		atomic_add(npages, &cli->cl_lru_in_list);
+		client_obd_list_unlock(&cli->cl_lru_list_lock);
+
+		/* XXX: May set force to be true for better performance */
+		osc_lru_shrink(cli, osc_cache_too_much(cli), false);
+	}
 }
 
-/* Return how many pages are not discarded in @pvec. */
-static int discard_pagevec(const struct lu_env *env, struct cl_io *io,
-			   struct cl_page **pvec, int max_index)
+static void __osc_lru_del(struct client_obd *cli, struct osc_page *opg)
+{
+	LASSERT(atomic_read(&cli->cl_lru_in_list) > 0);
+	list_del_init(&opg->ops_lru);
+	atomic_dec(&cli->cl_lru_in_list);
+}
+
+/**
+ * Page is being destroyed. The page may not be in the LRU list if the
+ * transfer never finished (an error occurred).
+ */
+static void osc_lru_del(struct client_obd *cli, struct osc_page *opg)
+{
+	if (opg->ops_in_lru) {
+		client_obd_list_lock(&cli->cl_lru_list_lock);
+		if (!list_empty(&opg->ops_lru)) {
+			__osc_lru_del(cli, opg);
+		} else {
+			LASSERT(atomic_read(&cli->cl_lru_busy) > 0);
+			atomic_dec(&cli->cl_lru_busy);
+		}
+		client_obd_list_unlock(&cli->cl_lru_list_lock);
+
+		atomic_inc(cli->cl_lru_left);
+		/* this is a great place to release more LRU pages if
+		 * this osc occupies too many LRU pages and kernel is
+		 * stealing one of them.
+		 */
+		if (!memory_pressure_get())
+			osc_lru_shrink(cli, osc_cache_too_much(cli), false);
+		wake_up(&osc_lru_waitq);
+	} else {
+		LASSERT(list_empty(&opg->ops_lru));
+	}
+}
+
+/**
+ * Delete the page from the LRU list because it is being redirtied.
+ */
+static void osc_lru_use(struct client_obd *cli, struct osc_page *opg)
+{
+	/* If page is being transferred for the first time,
+	 * ops_lru should be empty
+	 */
+	if (opg->ops_in_lru && !list_empty(&opg->ops_lru)) {
+		client_obd_list_lock(&cli->cl_lru_list_lock);
+		__osc_lru_del(cli, opg);
+		client_obd_list_unlock(&cli->cl_lru_list_lock);
+		atomic_inc(&cli->cl_lru_busy);
+	}
+}
+
+static void discard_pagevec(const struct lu_env *env, struct cl_io *io,
+			    struct cl_page **pvec, int max_index)
 {
-	int count;
 	int i;
 
-	for (count = 0, i = 0; i < max_index; i++) {
+	for (i = 0; i < max_index; i++) {
 		struct cl_page *page = pvec[i];
 
-		if (cl_page_own_try(env, io, page) == 0) {
-			/* free LRU page only if nobody is using it.
-			 * This check is necessary to avoid freeing the pages
-			 * having already been removed from LRU and pinned
-			 * for IO.
-			 */
-			if (!cl_page_in_use(page)) {
-				cl_page_unmap(env, io, page);
-				cl_page_discard(env, io, page);
-				++count;
-			}
-			cl_page_disown(env, io, page);
-		}
+		LASSERT(cl_page_is_owned(page, io));
+		cl_page_unmap(env, io, page);
+		cl_page_discard(env, io, page);
+		cl_page_disown(env, io, page);
 		cl_page_put(env, page);
+
 		pvec[i] = NULL;
 	}
-	return max_index - count;
 }
 
 /**
  * Drop @target of pages from LRU at most.
  */
-int osc_lru_shrink(struct client_obd *cli, int target)
+int osc_lru_shrink(struct client_obd *cli, int target, bool force)
 {
 	struct cl_env_nest nest;
 	struct lu_env *env;
@@ -573,18 +628,32 @@ int osc_lru_shrink(struct client_obd *cli, int target)
 	if (atomic_read(&cli->cl_lru_in_list) == 0 || target <= 0)
 		return 0;
 
+	if (!force) {
+		if (atomic_read(&cli->cl_lru_shrinkers) > 0)
+			return -EBUSY;
+
+		if (atomic_inc_return(&cli->cl_lru_shrinkers) > 1) {
+			atomic_dec(&cli->cl_lru_shrinkers);
+			return -EBUSY;
+		}
+	} else {
+		atomic_inc(&cli->cl_lru_shrinkers);
+	}
+
 	env = cl_env_nested_get(&nest);
-	if (IS_ERR(env))
-		return PTR_ERR(env);
+	if (IS_ERR(env)) {
+		rc = PTR_ERR(env);
+		goto out;
+	}
 
 	pvec = osc_env_info(env)->oti_pvec;
 	io = &osc_env_info(env)->oti_io;
 
 	client_obd_list_lock(&cli->cl_lru_list_lock);
-	atomic_inc(&cli->cl_lru_shrinkers);
 	maxscan = min(target << 1, atomic_read(&cli->cl_lru_in_list));
 	list_for_each_entry_safe(opg, temp, &cli->cl_lru_list, ops_lru) {
 		struct cl_page *page;
+		bool will_free = false;
 
 		if (--maxscan < 0)
 			break;
@@ -603,7 +672,7 @@ int osc_lru_shrink(struct client_obd *cli, int target)
 			client_obd_list_unlock(&cli->cl_lru_list_lock);
 
 			if (clobj) {
-				count -= discard_pagevec(env, io, pvec, index);
+				discard_pagevec(env, io, pvec, index);
 				index = 0;
 
 				cl_io_fini(env, io);
@@ -625,98 +694,56 @@ int osc_lru_shrink(struct client_obd *cli, int target)
 			continue;
 		}
 
-		/* move this page to the end of list as it will be discarded
-		 * soon. The page will be finally removed from LRU list in
-		 * osc_page_delete().
-		 */
-		list_move_tail(&opg->ops_lru, &cli->cl_lru_list);
+		if (cl_page_own_try(env, io, page) == 0) {
+			if (!cl_page_in_use_noref(page)) {
+				/* remove it from lru list earlier to avoid
+				 * lock contention
+				 */
+				__osc_lru_del(cli, opg);
+				opg->ops_in_lru = 0; /* will be discarded */
+
+				cl_page_get(page);
+				will_free = true;
+			} else {
+				cl_page_disown(env, io, page);
+			}
+		}
 
-		/* it's okay to grab a refcount here w/o holding lock because
-		 * it has to grab cl_lru_list_lock to delete the page.
-		 */
-		cl_page_get(page);
-		pvec[index++] = page;
-		if (++count >= target)
-			break;
+		if (!will_free) {
+			list_move_tail(&opg->ops_lru, &cli->cl_lru_list);
+			continue;
+		}
 
+		/* Don't discard and free the page with cl_lru_list held */
+		pvec[index++] = page;
 		if (unlikely(index == OTI_PVEC_SIZE)) {
 			client_obd_list_unlock(&cli->cl_lru_list_lock);
-			count -= discard_pagevec(env, io, pvec, index);
+			discard_pagevec(env, io, pvec, index);
 			index = 0;
 
 			client_obd_list_lock(&cli->cl_lru_list_lock);
 		}
+
+		if (++count >= target)
+			break;
 	}
 	client_obd_list_unlock(&cli->cl_lru_list_lock);
 
 	if (clobj) {
-		count -= discard_pagevec(env, io, pvec, index);
+		discard_pagevec(env, io, pvec, index);
 
 		cl_io_fini(env, io);
 		cl_object_put(env, clobj);
 	}
 	cl_env_nested_put(&nest, env);
 
+out:
 	atomic_dec(&cli->cl_lru_shrinkers);
-	return count > 0 ? count : rc;
-}
-
-static void osc_lru_add(struct client_obd *cli, struct osc_page *opg)
-{
-	bool wakeup = false;
-
-	if (!opg->ops_in_lru)
-		return;
-
-	atomic_dec(&cli->cl_lru_busy);
-	client_obd_list_lock(&cli->cl_lru_list_lock);
-	if (list_empty(&opg->ops_lru)) {
-		list_move_tail(&opg->ops_lru, &cli->cl_lru_list);
-		atomic_inc_return(&cli->cl_lru_in_list);
-		wakeup = atomic_read(&osc_lru_waiters) > 0;
-	}
-	client_obd_list_unlock(&cli->cl_lru_list_lock);
-
-	if (wakeup) {
-		osc_lru_shrink(cli, osc_cache_too_much(cli));
+	if (count > 0) {
+		atomic_add(count, cli->cl_lru_left);
 		wake_up_all(&osc_lru_waitq);
 	}
-}
-
-/* delete page from LRUlist. The page can be deleted from LRUlist for two
- * reasons: redirtied or deleted from page cache.
- */
-static void osc_lru_del(struct client_obd *cli, struct osc_page *opg, bool del)
-{
-	if (opg->ops_in_lru) {
-		client_obd_list_lock(&cli->cl_lru_list_lock);
-		if (!list_empty(&opg->ops_lru)) {
-			LASSERT(atomic_read(&cli->cl_lru_in_list) > 0);
-			list_del_init(&opg->ops_lru);
-			atomic_dec(&cli->cl_lru_in_list);
-			if (!del)
-				atomic_inc(&cli->cl_lru_busy);
-		} else if (del) {
-			LASSERT(atomic_read(&cli->cl_lru_busy) > 0);
-			atomic_dec(&cli->cl_lru_busy);
-		}
-		client_obd_list_unlock(&cli->cl_lru_list_lock);
-		if (del) {
-			atomic_inc(cli->cl_lru_left);
-			/* this is a great place to release more LRU pages if
-			 * this osc occupies too many LRU pages and kernel is
-			 * stealing one of them.
-			 * cl_lru_shrinkers is to avoid recursive call in case
-			 * we're already in the context of osc_lru_shrink().
-			 */
-			if (atomic_read(&cli->cl_lru_shrinkers) == 0 &&
-			    !memory_pressure_get())
-				osc_lru_shrink(cli, osc_cache_too_much(cli));
-			wake_up(&osc_lru_waitq);
-		}
-	} else {
-		LASSERT(list_empty(&opg->ops_lru));
-	}
+	return count > 0 ? count : rc;
 }
 
 static inline int max_to_shrink(struct client_obd *cli)
@@ -724,16 +751,19 @@ static inline int max_to_shrink(struct client_obd *cli)
 	return min(atomic_read(&cli->cl_lru_in_list) >> 1, lru_shrink_max);
 }
 
-static int osc_lru_reclaim(struct client_obd *cli)
+int osc_lru_reclaim(struct client_obd *cli)
 {
 	struct cl_client_cache *cache = cli->cl_cache;
 	int max_scans;
-	int rc;
+	int rc = 0;
 
 	LASSERT(cache);
 
-	rc = osc_lru_shrink(cli, lru_shrink_min);
+	rc = osc_lru_shrink(cli, lru_shrink_min, false);
 	if (rc != 0) {
+		if (rc == -EBUSY)
+			rc = 0;
+
 		CDEBUG(D_CACHE, "%s: Free %d pages from own LRU: %p.\n",
 		       cli->cl_import->imp_obd->obd_name, rc, cli);
 		return rc;
@@ -764,10 +794,10 @@ static int osc_lru_reclaim(struct client_obd *cli)
 		       atomic_read(&cli->cl_lru_busy));
 
 		list_move_tail(&cli->cl_lru_osc, &cache->ccc_lru);
-		if (atomic_read(&cli->cl_lru_in_list) > 0) {
+		if (osc_cache_too_much(cli) > 0) {
 			spin_unlock(&cache->ccc_lru_lock);
 
-			rc = osc_lru_shrink(cli, max_to_shrink(cli));
+			rc = osc_lru_shrink(cli, osc_cache_too_much(cli), true);
 			spin_lock(&cache->ccc_lru_lock);
 			if (rc != 0)
 				break;
@@ -784,15 +814,20 @@ static int osc_lru_reserve(const struct lu_env *env, struct osc_object *obj,
 			   struct osc_page *opg)
 {
 	struct l_wait_info lwi = LWI_INTR(LWI_ON_SIGNAL_NOOP, NULL);
+	struct osc_io *oio = osc_env_io(env);
 	struct client_obd *cli = osc_cli(obj);
 	int rc = 0;
 
 	if (!cli->cl_cache) /* shall not be in LRU */
 		return 0;
 
+	if (oio->oi_lru_reserved > 0) {
+		--oio->oi_lru_reserved;
+		goto out;
+	}
+
 	LASSERT(atomic_read(cli->cl_lru_left) >= 0);
 	while (!atomic_add_unless(cli->cl_lru_left, -1, 0)) {
-		int gen;
 
 		/* run out of LRU spaces, try to drop some by itself */
 		rc = osc_lru_reclaim(cli);
@@ -803,23 +838,15 @@ static int osc_lru_reserve(const struct lu_env *env, struct osc_object *obj,
 
 		cond_resched();
 
-		/* slowest case, all of caching pages are busy, notifying
-		 * other OSCs that we're lack of LRU slots.
-		 */
-		atomic_inc(&osc_lru_waiters);
-
-		gen = atomic_read(&cli->cl_lru_in_list);
 		rc = l_wait_event(osc_lru_waitq,
-				  atomic_read(cli->cl_lru_left) > 0 ||
-				  (atomic_read(&cli->cl_lru_in_list) > 0 &&
-				   gen != atomic_read(&cli->cl_lru_in_list)),
+				  atomic_read(cli->cl_lru_left) > 0,
 				  &lwi);
 
-		atomic_dec(&osc_lru_waiters);
 		if (rc < 0)
 			break;
 	}
 
+out:
 	if (rc >= 0) {
 		atomic_inc(&cli->cl_lru_busy);
 		opg->ops_in_lru = 1;
diff --git a/drivers/staging/lustre/lustre/osc/osc_request.c b/drivers/staging/lustre/lustre/osc/osc_request.c
index 850d5dd..6dadda4 100644
--- a/drivers/staging/lustre/lustre/osc/osc_request.c
+++ b/drivers/staging/lustre/lustre/osc/osc_request.c
@@ -2910,7 +2910,7 @@ static int osc_set_info_async(const struct lu_env *env, struct obd_export *exp,
 		int nr = atomic_read(&cli->cl_lru_in_list) >> 1;
 		int target = *(int *)val;
 
-		nr = osc_lru_shrink(cli, min(nr, target));
+		nr = osc_lru_shrink(cli, min(nr, target), true);
 		*(int *)val -= nr;
 		return 0;
 	}
-- 
2.1.0


* [PATCH v2 06/46] staging/lustre/osc: to drop LRU pages with cl_lru_work
  2016-03-30 23:48 [PATCH v2 00/46] Lustre IO stack simplifications and cleanups green
                   ` (4 preceding siblings ...)
  2016-03-30 23:48 ` [PATCH v2 05/46] staging/lustre/osc: Adjustment on osc LRU for performance green
@ 2016-03-30 23:48 ` green
  2016-03-30 23:48 ` [PATCH v2 07/46] staging/lustre/clio: collapse layer of cl_page green
                   ` (39 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: green @ 2016-03-30 23:48 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger
  Cc: Linux Kernel Mailing List, Lustre Development List,
	Jinshan Xiong, Oleg Drokin

From: Jinshan Xiong <jinshan.xiong@intel.com>

This way we can drop LRU pages asynchronously.
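
Concretely, the patch hangs the shrinker off the existing ptlrpcd work
mechanism; its lifecycle, condensed from the hunks below (illustrative
fragment, not standalone code):

	/* osc_setup(): allocate the work item once per client */
	handler = ptlrpcd_alloc_work(cli->cl_import, lru_queue_work, cli);
	if (IS_ERR(handler))
		return PTR_ERR(handler);
	cli->cl_lru_work = handler;

	/* hot paths (e.g. osc_lru_del()): queueing is cheap and never
	 * blocks; lru_queue_work() later runs osc_lru_shrink() from
	 * ptlrpcd context */
	(void)ptlrpcd_queue_work(cli->cl_lru_work);

	/* osc_precleanup(): tear the work item down */
	ptlrpcd_destroy_work(cli->cl_lru_work);
	cli->cl_lru_work = NULL;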

Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-on: http://review.whamcloud.com/7891
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3321
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
---
 drivers/staging/lustre/lustre/include/obd.h        |  1 +
 drivers/staging/lustre/lustre/llite/lproc_llite.c  |  9 ++++-
 drivers/staging/lustre/lustre/osc/lproc_osc.c      | 12 +++++-
 .../staging/lustre/lustre/osc/osc_cl_internal.h    |  1 +
 drivers/staging/lustre/lustre/osc/osc_internal.h   |  3 +-
 drivers/staging/lustre/lustre/osc/osc_page.c       | 45 ++++++++++++++--------
 drivers/staging/lustre/lustre/osc/osc_request.c    | 23 ++++++++++-
 drivers/staging/lustre/lustre/ptlrpc/ptlrpcd.c     | 16 ++++++--
 8 files changed, 85 insertions(+), 25 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/obd.h b/drivers/staging/lustre/lustre/include/obd.h
index 4a0f2e8..26182ca 100644
--- a/drivers/staging/lustre/lustre/include/obd.h
+++ b/drivers/staging/lustre/lustre/include/obd.h
@@ -364,6 +364,7 @@ struct client_obd {
 
 	/* ptlrpc work for writeback in ptlrpcd context */
 	void		    *cl_writeback_work;
+	void			*cl_lru_work;
 	/* hash tables for osc_quota_info */
 	struct cfs_hash	      *cl_quota_hash[MAXQUOTAS];
 };
diff --git a/drivers/staging/lustre/lustre/llite/lproc_llite.c b/drivers/staging/lustre/lustre/llite/lproc_llite.c
index 45941a6..9e8e61a 100644
--- a/drivers/staging/lustre/lustre/llite/lproc_llite.c
+++ b/drivers/staging/lustre/lustre/llite/lproc_llite.c
@@ -393,6 +393,8 @@ static ssize_t ll_max_cached_mb_seq_write(struct file *file,
 	struct super_block *sb = ((struct seq_file *)file->private_data)->private;
 	struct ll_sb_info *sbi = ll_s2sbi(sb);
 	struct cl_client_cache *cache = &sbi->ll_cache;
+	struct lu_env *env;
+	int refcheck;
 	int mult, rc, pages_number;
 	int diff = 0;
 	int nrpages = 0;
@@ -430,6 +432,10 @@ static ssize_t ll_max_cached_mb_seq_write(struct file *file,
 		goto out;
 	}
 
+	env = cl_env_get(&refcheck);
+	if (IS_ERR(env))
+		return 0;
+
 	diff = -diff;
 	while (diff > 0) {
 		int tmp;
@@ -461,13 +467,14 @@ static ssize_t ll_max_cached_mb_seq_write(struct file *file,
 
 		/* difficult - have to ask OSCs to drop LRU slots. */
 		tmp = diff << 1;
-		rc = obd_set_info_async(NULL, sbi->ll_dt_exp,
+		rc = obd_set_info_async(env, sbi->ll_dt_exp,
 					sizeof(KEY_CACHE_LRU_SHRINK),
 					KEY_CACHE_LRU_SHRINK,
 					sizeof(tmp), &tmp, NULL);
 		if (rc < 0)
 			break;
 	}
+	cl_env_put(env, &refcheck);
 
 out:
 	if (rc >= 0) {
diff --git a/drivers/staging/lustre/lustre/osc/lproc_osc.c b/drivers/staging/lustre/lustre/osc/lproc_osc.c
index 3eff12c..e6e2029 100644
--- a/drivers/staging/lustre/lustre/osc/lproc_osc.c
+++ b/drivers/staging/lustre/lustre/osc/lproc_osc.c
@@ -222,8 +222,16 @@ static ssize_t osc_cached_mb_seq_write(struct file *file,
 		return -ERANGE;
 
 	rc = atomic_read(&cli->cl_lru_in_list) - pages_number;
-	if (rc > 0)
-		(void)osc_lru_shrink(cli, rc, true);
+	if (rc > 0) {
+		struct lu_env *env;
+		int refcheck;
+
+		env = cl_env_get(&refcheck);
+		if (!IS_ERR(env)) {
+			(void)osc_lru_shrink(env, cli, rc, true);
+			cl_env_put(env, &refcheck);
+		}
+	}
 
 	return count;
 }
diff --git a/drivers/staging/lustre/lustre/osc/osc_cl_internal.h b/drivers/staging/lustre/lustre/osc/osc_cl_internal.h
index f516848..b6325f5 100644
--- a/drivers/staging/lustre/lustre/osc/osc_cl_internal.h
+++ b/drivers/staging/lustre/lustre/osc/osc_cl_internal.h
@@ -457,6 +457,7 @@ int osc_cache_wait_range(const struct lu_env *env, struct osc_object *obj,
 			 pgoff_t start, pgoff_t end);
 void osc_io_unplug(const struct lu_env *env, struct client_obd *cli,
 		   struct osc_object *osc);
+int lru_queue_work(const struct lu_env *env, void *data);
 
 void osc_object_set_contended  (struct osc_object *obj);
 void osc_object_clear_contended(struct osc_object *obj);
diff --git a/drivers/staging/lustre/lustre/osc/osc_internal.h b/drivers/staging/lustre/lustre/osc/osc_internal.h
index ec12962..b3b15d4 100644
--- a/drivers/staging/lustre/lustre/osc/osc_internal.h
+++ b/drivers/staging/lustre/lustre/osc/osc_internal.h
@@ -130,7 +130,8 @@ int osc_sync_base(struct obd_export *exp, struct obd_info *oinfo,
 int osc_process_config_base(struct obd_device *obd, struct lustre_cfg *cfg);
 int osc_build_rpc(const struct lu_env *env, struct client_obd *cli,
 		  struct list_head *ext_list, int cmd);
-int osc_lru_shrink(struct client_obd *cli, int target, bool force);
+int osc_lru_shrink(const struct lu_env *env, struct client_obd *cli,
+		   int target, bool force);
 int osc_lru_reclaim(struct client_obd *cli);
 
 extern spinlock_t osc_ast_guard;
diff --git a/drivers/staging/lustre/lustre/osc/osc_page.c b/drivers/staging/lustre/lustre/osc/osc_page.c
index a60b783..f0a9870 100644
--- a/drivers/staging/lustre/lustre/osc/osc_page.c
+++ b/drivers/staging/lustre/lustre/osc/osc_page.c
@@ -508,6 +508,18 @@ static int osc_cache_too_much(struct client_obd *cli)
 	return 0;
 }
 
+int lru_queue_work(const struct lu_env *env, void *data)
+{
+	struct client_obd *cli = data;
+
+	CDEBUG(D_CACHE, "Run LRU work for client obd %p.\n", cli);
+
+	if (osc_cache_too_much(cli))
+		osc_lru_shrink(env, cli, lru_shrink_max, true);
+
+	return 0;
+}
+
 void osc_lru_add_batch(struct client_obd *cli, struct list_head *plist)
 {
 	LIST_HEAD(lru);
@@ -533,7 +545,8 @@ void osc_lru_add_batch(struct client_obd *cli, struct list_head *plist)
 		client_obd_list_unlock(&cli->cl_lru_list_lock);
 
 		/* XXX: May set force to be true for better performance */
-		osc_lru_shrink(cli, osc_cache_too_much(cli), false);
+		if (osc_cache_too_much(cli))
+			(void)ptlrpcd_queue_work(cli->cl_lru_work);
 	}
 }
 
@@ -566,7 +579,7 @@ static void osc_lru_del(struct client_obd *cli, struct osc_page *opg)
 		 * stealing one of them.
 		 */
 		if (!memory_pressure_get())
-			osc_lru_shrink(cli, osc_cache_too_much(cli), false);
+			(void)ptlrpcd_queue_work(cli->cl_lru_work);
 		wake_up(&osc_lru_waitq);
 	} else {
 		LASSERT(list_empty(&opg->ops_lru));
@@ -610,10 +623,9 @@ static void discard_pagevec(const struct lu_env *env, struct cl_io *io,
 /**
  * Drop @target of pages from LRU at most.
  */
-int osc_lru_shrink(struct client_obd *cli, int target, bool force)
+int osc_lru_shrink(const struct lu_env *env, struct client_obd *cli,
+		   int target, bool force)
 {
-	struct cl_env_nest nest;
-	struct lu_env *env;
 	struct cl_io *io;
 	struct cl_object *clobj = NULL;
 	struct cl_page **pvec;
@@ -640,12 +652,6 @@ int osc_lru_shrink(struct client_obd *cli, int target, bool force)
 		atomic_inc(&cli->cl_lru_shrinkers);
 	}
 
-	env = cl_env_nested_get(&nest);
-	if (IS_ERR(env)) {
-		rc = PTR_ERR(env);
-		goto out;
-	}
-
 	pvec = osc_env_info(env)->oti_pvec;
 	io = &osc_env_info(env)->oti_io;
 
@@ -735,9 +741,7 @@ int osc_lru_shrink(struct client_obd *cli, int target, bool force)
 		cl_io_fini(env, io);
 		cl_object_put(env, clobj);
 	}
-	cl_env_nested_put(&nest, env);
 
-out:
 	atomic_dec(&cli->cl_lru_shrinkers);
 	if (count > 0) {
 		atomic_add(count, cli->cl_lru_left);
@@ -753,20 +757,26 @@ static inline int max_to_shrink(struct client_obd *cli)
 
 int osc_lru_reclaim(struct client_obd *cli)
 {
+	struct cl_env_nest nest;
+	struct lu_env *env;
 	struct cl_client_cache *cache = cli->cl_cache;
 	int max_scans;
 	int rc = 0;
 
 	LASSERT(cache);
 
-	rc = osc_lru_shrink(cli, lru_shrink_min, false);
+	env = cl_env_nested_get(&nest);
+	if (IS_ERR(env))
+		return 0;
+
+	rc = osc_lru_shrink(env, cli, osc_cache_too_much(cli), false);
 	if (rc != 0) {
 		if (rc == -EBUSY)
 			rc = 0;
 
 		CDEBUG(D_CACHE, "%s: Free %d pages from own LRU: %p.\n",
 		       cli->cl_import->imp_obd->obd_name, rc, cli);
-		return rc;
+		goto out;
 	}
 
 	CDEBUG(D_CACHE, "%s: cli %p no free slots, pages: %d, busy: %d.\n",
@@ -797,7 +807,8 @@ int osc_lru_reclaim(struct client_obd *cli)
 		if (osc_cache_too_much(cli) > 0) {
 			spin_unlock(&cache->ccc_lru_lock);
 
-			rc = osc_lru_shrink(cli, osc_cache_too_much(cli), true);
+			rc = osc_lru_shrink(env, cli, osc_cache_too_much(cli),
+					    true);
 			spin_lock(&cache->ccc_lru_lock);
 			if (rc != 0)
 				break;
@@ -805,6 +816,8 @@ int osc_lru_reclaim(struct client_obd *cli)
 	}
 	spin_unlock(&cache->ccc_lru_lock);
 
+out:
+	cl_env_nested_put(&nest, env);
 	CDEBUG(D_CACHE, "%s: cli %p freed %d pages.\n",
 	       cli->cl_import->imp_obd->obd_name, cli, rc);
 	return rc;
diff --git a/drivers/staging/lustre/lustre/osc/osc_request.c b/drivers/staging/lustre/lustre/osc/osc_request.c
index 6dadda4..c055511b3 100644
--- a/drivers/staging/lustre/lustre/osc/osc_request.c
+++ b/drivers/staging/lustre/lustre/osc/osc_request.c
@@ -2910,7 +2910,7 @@ static int osc_set_info_async(const struct lu_env *env, struct obd_export *exp,
 		int nr = atomic_read(&cli->cl_lru_in_list) >> 1;
 		int target = *(int *)val;
 
-		nr = osc_lru_shrink(cli, min(nr, target), true);
+		nr = osc_lru_shrink(env, cli, min(nr, target), true);
 		*(int *)val -= nr;
 		return 0;
 	}
@@ -3167,6 +3167,14 @@ int osc_setup(struct obd_device *obd, struct lustre_cfg *lcfg)
 	}
 	cli->cl_writeback_work = handler;
 
+	handler = ptlrpcd_alloc_work(cli->cl_import, lru_queue_work, cli);
+	if (IS_ERR(handler)) {
+		rc = PTR_ERR(handler);
+		goto out_ptlrpcd_work;
+	}
+
+	cli->cl_lru_work = handler;
+
 	rc = osc_quota_setup(obd);
 	if (rc)
 		goto out_ptlrpcd_work;
@@ -3199,7 +3207,14 @@ int osc_setup(struct obd_device *obd, struct lustre_cfg *lcfg)
 	return rc;
 
 out_ptlrpcd_work:
-	ptlrpcd_destroy_work(handler);
+	if (cli->cl_writeback_work) {
+		ptlrpcd_destroy_work(cli->cl_writeback_work);
+		cli->cl_writeback_work = NULL;
+	}
+	if (cli->cl_lru_work) {
+		ptlrpcd_destroy_work(cli->cl_lru_work);
+		cli->cl_lru_work = NULL;
+	}
 out_client_setup:
 	client_obd_cleanup(obd);
 out_ptlrpcd:
@@ -3238,6 +3253,10 @@ static int osc_precleanup(struct obd_device *obd, enum obd_cleanup_stage stage)
 			ptlrpcd_destroy_work(cli->cl_writeback_work);
 			cli->cl_writeback_work = NULL;
 		}
+		if (cli->cl_lru_work) {
+			ptlrpcd_destroy_work(cli->cl_lru_work);
+			cli->cl_lru_work = NULL;
+		}
 		obd_cleanup_client_import(obd);
 		ptlrpc_lprocfs_unregister_obd(obd);
 		lprocfs_obd_cleanup(obd);
diff --git a/drivers/staging/lustre/lustre/ptlrpc/ptlrpcd.c b/drivers/staging/lustre/lustre/ptlrpc/ptlrpcd.c
index db003f5..dbc3376 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/ptlrpcd.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/ptlrpcd.c
@@ -387,7 +387,8 @@ static int ptlrpcd(void *arg)
 {
 	struct ptlrpcd_ctl *pc = arg;
 	struct ptlrpc_request_set *set;
-	struct lu_env env = { .le_ses = NULL };
+	struct lu_context ses = { 0 };
+	struct lu_env env = { .le_ses = &ses };
 	int rc = 0;
 	int exit = 0;
 
@@ -416,6 +417,13 @@ static int ptlrpcd(void *arg)
 	 */
 	rc = lu_context_init(&env.le_ctx,
 			     LCT_CL_THREAD|LCT_REMEMBER|LCT_NOREF);
+	if (rc == 0) {
+		rc = lu_context_init(env.le_ses,
+				     LCT_SESSION | LCT_REMEMBER | LCT_NOREF);
+		if (rc != 0)
+			lu_context_fini(&env.le_ctx);
+	}
+
 	if (rc != 0)
 		goto failed;
 
@@ -436,9 +444,10 @@ static int ptlrpcd(void *arg)
 				  ptlrpc_expired_set, set);
 
 		lu_context_enter(&env.le_ctx);
-		l_wait_event(set->set_waitq,
-			     ptlrpcd_check(&env, pc), &lwi);
+		lu_context_enter(env.le_ses);
+		l_wait_event(set->set_waitq, ptlrpcd_check(&env, pc), &lwi);
 		lu_context_exit(&env.le_ctx);
+		lu_context_exit(env.le_ses);
 
 		/*
 		 * Abort inflight rpcs for forced stop case.
@@ -461,6 +470,7 @@ static int ptlrpcd(void *arg)
 	if (!list_empty(&set->set_requests))
 		ptlrpc_set_wait(set);
 	lu_context_fini(&env.le_ctx);
+	lu_context_fini(env.le_ses);
 
 	complete(&pc->pc_finishing);
 
-- 
2.1.0


* [PATCH v2 07/46] staging/lustre/clio: collapse layer of cl_page
  2016-03-30 23:48 [PATCH v2 00/46] Lustre IO stack simplifications and cleanups green
                   ` (5 preceding siblings ...)
  2016-03-30 23:48 ` [PATCH v2 06/46] staging/lustre/osc: to drop LRU pages with cl_lru_work green
@ 2016-03-30 23:48 ` green
  2016-03-30 23:48 ` [PATCH v2 08/46] staging/lustre/obdclass: Add a preallocated percpu cl_env green
                   ` (38 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: green @ 2016-03-30 23:48 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger
  Cc: Linux Kernel Mailing List, Lustre Development List,
	Jinshan Xiong, Oleg Drokin

From: Jinshan Xiong <jinshan.xiong@intel.com>

Move the radix tree to the osc layer for a performance improvement.
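
After this change each osc_object owns the radix tree of its own pages,
instead of cl_object_header keeping a tree of stacked cl_pages (note the
coh_tree/coh_page_guard removal in the cl_object.h hunk below). A page
lookup at the osc layer then looks roughly like this (the osc-side hunks
are not shown in this excerpt, so treat the field names as best-effort
illustration):

	static struct osc_page *osc_page_lookup_sketch(struct osc_object *osc,
						       pgoff_t index)
	{
		struct osc_page *opg;

		spin_lock(&osc->oo_tree_lock);
		opg = radix_tree_lookup(&osc->oo_tree, index);
		spin_unlock(&osc->oo_tree_lock);
		return opg;
	}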

Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-on: http://review.whamcloud.com/7892
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3321
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
---
 drivers/staging/lustre/lustre/include/cl_object.h  |  36 +--
 drivers/staging/lustre/lustre/llite/rw.c           |   2 +-
 drivers/staging/lustre/lustre/llite/rw26.c         |   4 -
 drivers/staging/lustre/lustre/llite/vvp_dev.c      |  47 +--
 drivers/staging/lustre/lustre/llite/vvp_io.c       |   1 -
 drivers/staging/lustre/lustre/llite/vvp_object.c   |  13 +
 drivers/staging/lustre/lustre/llite/vvp_page.c     |  36 +--
 drivers/staging/lustre/lustre/lov/lov_object.c     |   9 +-
 drivers/staging/lustre/lustre/lov/lov_page.c       |  29 +-
 drivers/staging/lustre/lustre/obdclass/cl_io.c     |   1 +
 drivers/staging/lustre/lustre/obdclass/cl_lock.c   | 131 +-------
 drivers/staging/lustre/lustre/obdclass/cl_object.c |  47 +--
 drivers/staging/lustre/lustre/obdclass/cl_page.c   | 354 ++-------------------
 drivers/staging/lustre/lustre/osc/osc_cache.c      | 207 +++++++++++-
 .../staging/lustre/lustre/osc/osc_cl_internal.h    |  27 +-
 drivers/staging/lustre/lustre/osc/osc_io.c         |  14 +-
 drivers/staging/lustre/lustre/osc/osc_lock.c       |  10 +-
 drivers/staging/lustre/lustre/osc/osc_object.c     |   2 +
 drivers/staging/lustre/lustre/osc/osc_page.c       |  28 +-
 19 files changed, 394 insertions(+), 604 deletions(-)
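
In short, the per-object page radix tree moves from cl_object_header
(coh_tree/coh_page_guard) into struct osc_object (oo_tree, oo_tree_lock,
oo_npages). A minimal sketch of the resulting tracking, distilled from
the osc_page_init() and osc_page_delete() hunks below (the helper names
osc_track_page/osc_untrack_page are illustrative, not from the patch):

	/* Insert a page into the osc object's radix tree under its lock;
	 * mirrors the osc_page_init() hunk below. */
	static int osc_track_page(struct osc_object *osc, pgoff_t index,
				  struct osc_page *opg)
	{
		int rc;

		spin_lock(&osc->oo_tree_lock);
		rc = radix_tree_insert(&osc->oo_tree, index, opg);
		if (rc == 0)
			++osc->oo_npages;
		spin_unlock(&osc->oo_tree_lock);
		return rc;
	}

	/* Remove the page again on deletion; mirrors osc_page_delete(). */
	static void osc_untrack_page(struct osc_object *osc, pgoff_t index)
	{
		void *value;

		spin_lock(&osc->oo_tree_lock);
		value = radix_tree_delete(&osc->oo_tree, index);
		if (value)
			--osc->oo_npages;
		spin_unlock(&osc->oo_tree_lock);
	}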

diff --git a/drivers/staging/lustre/lustre/include/cl_object.h b/drivers/staging/lustre/lustre/include/cl_object.h
index e611f79..5daf688 100644
--- a/drivers/staging/lustre/lustre/include/cl_object.h
+++ b/drivers/staging/lustre/lustre/include/cl_object.h
@@ -388,6 +388,12 @@ struct cl_object_operations {
 	 */
 	int (*coo_glimpse)(const struct lu_env *env,
 			   const struct cl_object *obj, struct ost_lvb *lvb);
+	/**
+	 * Object prune method. Called when the layout is going to change on
+	 * this object, therefore each layer has to clean up their cache,
+	 * this object, so each layer has to clean up its cache,
+	 */
+	int (*coo_prune)(const struct lu_env *env, struct cl_object *obj);
 };
 
 /**
@@ -403,15 +409,9 @@ struct cl_object_header {
 	 * mostly useless otherwise.
 	 */
 	/** @{ */
-	/** Lock protecting page tree. */
-	spinlock_t		 coh_page_guard;
 	/** Lock protecting lock list. */
 	spinlock_t		 coh_lock_guard;
 	/** @} locks */
-	/** Radix tree of cl_page's, cached for this object. */
-	struct radix_tree_root   coh_tree;
-	/** # of pages in radix tree. */
-	unsigned long	    coh_pages;
 	/** List of cl_lock's granted for this object. */
 	struct list_head	       coh_locks;
 
@@ -897,14 +897,6 @@ struct cl_page_operations {
 	void  (*cpo_export)(const struct lu_env *env,
 			    const struct cl_page_slice *slice, int uptodate);
 	/**
-	 * Unmaps page from the user space (if it is mapped).
-	 *
-	 * \see cl_page_unmap()
-	 * \see vvp_page_unmap()
-	 */
-	int (*cpo_unmap)(const struct lu_env *env,
-			 const struct cl_page_slice *slice, struct cl_io *io);
-	/**
 	 * Checks whether underlying VM page is locked (in the suitable
 	 * sense). Used for assertions.
 	 *
@@ -2794,19 +2786,13 @@ enum {
 };
 
 /* callback of cl_page_gang_lookup() */
-typedef int   (*cl_page_gang_cb_t)  (const struct lu_env *, struct cl_io *,
-				     struct cl_page *, void *);
-int cl_page_gang_lookup(const struct lu_env *env, struct cl_object *obj,
-			struct cl_io *io, pgoff_t start, pgoff_t end,
-			cl_page_gang_cb_t cb, void *cbdata);
-struct cl_page *cl_page_lookup(struct cl_object_header *hdr, pgoff_t index);
 struct cl_page *cl_page_find(const struct lu_env *env, struct cl_object *obj,
 			     pgoff_t idx, struct page *vmpage,
 			     enum cl_page_type type);
-struct cl_page *cl_page_find_sub(const struct lu_env *env,
-				 struct cl_object *obj,
-				 pgoff_t idx, struct page *vmpage,
-				     struct cl_page *parent);
+struct cl_page *cl_page_alloc(const struct lu_env *env,
+			      struct cl_object *o, pgoff_t ind,
+			      struct page *vmpage,
+			      enum cl_page_type type);
 void cl_page_get(struct cl_page *page);
 void cl_page_put(const struct lu_env *env, struct cl_page *page);
 void cl_page_print(const struct lu_env *env, void *cookie, lu_printer_t printer,
@@ -2872,8 +2858,6 @@ int cl_page_flush(const struct lu_env *env, struct cl_io *io,
 void cl_page_discard(const struct lu_env *env, struct cl_io *io,
 		     struct cl_page *pg);
 void cl_page_delete(const struct lu_env *env, struct cl_page *pg);
-int cl_page_unmap(const struct lu_env *env, struct cl_io *io,
-		  struct cl_page *pg);
 int cl_page_is_vmlocked(const struct lu_env *env, const struct cl_page *pg);
 void cl_page_export(const struct lu_env *env, struct cl_page *pg, int uptodate);
 int cl_page_is_under_lock(const struct lu_env *env, struct cl_io *io,
diff --git a/drivers/staging/lustre/lustre/llite/rw.c b/drivers/staging/lustre/lustre/llite/rw.c
index 34614ac..01b8365 100644
--- a/drivers/staging/lustre/lustre/llite/rw.c
+++ b/drivers/staging/lustre/lustre/llite/rw.c
@@ -442,7 +442,7 @@ static int cl_read_ahead_page(const struct lu_env *env, struct cl_io *io,
 			cl_page_list_add(queue, page);
 			rc = 1;
 		} else {
-			cl_page_delete(env, page);
+			cl_page_discard(env, io, page);
 			rc = -ENOLCK;
 		}
 	} else {
diff --git a/drivers/staging/lustre/lustre/llite/rw26.c b/drivers/staging/lustre/lustre/llite/rw26.c
index 3d7e64e..b5335de 100644
--- a/drivers/staging/lustre/lustre/llite/rw26.c
+++ b/drivers/staging/lustre/lustre/llite/rw26.c
@@ -95,11 +95,7 @@ static void ll_invalidatepage(struct page *vmpage, unsigned int offset,
 			if (obj) {
 				page = cl_vmpage_page(vmpage, obj);
 				if (page) {
-					lu_ref_add(&page->cp_reference,
-						   "delete", vmpage);
 					cl_page_delete(env, page);
-					lu_ref_del(&page->cp_reference,
-						   "delete", vmpage);
 					cl_page_put(env, page);
 				}
 			} else
diff --git a/drivers/staging/lustre/lustre/llite/vvp_dev.c b/drivers/staging/lustre/lustre/llite/vvp_dev.c
index 282b70b..29d24c9 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_dev.c
+++ b/drivers/staging/lustre/lustre/llite/vvp_dev.c
@@ -36,6 +36,7 @@
  * cl_device and cl_device_type implementation for VVP layer.
  *
  *   Author: Nikita Danilov <nikita.danilov@sun.com>
+ *   Author: Jinshan Xiong <jinshan.xiong@intel.com>
  */
 
 #define DEBUG_SUBSYSTEM S_LLITE
@@ -356,23 +357,18 @@ static loff_t vvp_pgcache_find(const struct lu_env *env,
 			return ~0ULL;
 		clob = vvp_pgcache_obj(env, dev, &id);
 		if (clob) {
-			struct cl_object_header *hdr;
-			int		      nr;
-			struct cl_page	  *pg;
+			struct inode *inode = ccc_object_inode(clob);
+			struct page *vmpage;
+			int nr;
 
-			/* got an object. Find next page. */
-			hdr = cl_object_header(clob);
-
-			spin_lock(&hdr->coh_page_guard);
-			nr = radix_tree_gang_lookup(&hdr->coh_tree,
-						    (void **)&pg,
-						    id.vpi_index, 1);
+			nr = find_get_pages_contig(inode->i_mapping,
+						   id.vpi_index, 1, &vmpage);
 			if (nr > 0) {
-				id.vpi_index = pg->cp_index;
+				id.vpi_index = vmpage->index;
 				/* Can't support over 16T file */
-				nr = !(pg->cp_index > 0xffffffff);
+				nr = !(vmpage->index > 0xffffffff);
+				page_cache_release(vmpage);
 			}
-			spin_unlock(&hdr->coh_page_guard);
 
 			lu_object_ref_del(&clob->co_lu, "dump", current);
 			cl_object_put(env, clob);
@@ -431,8 +427,6 @@ static int vvp_pgcache_show(struct seq_file *f, void *v)
 	struct ll_sb_info       *sbi;
 	struct cl_object	*clob;
 	struct lu_env	   *env;
-	struct cl_page	  *page;
-	struct cl_object_header *hdr;
 	struct vvp_pgcache_id    id;
 	int		      refcheck;
 	int		      result;
@@ -444,14 +438,23 @@ static int vvp_pgcache_show(struct seq_file *f, void *v)
 		sbi = f->private;
 		clob = vvp_pgcache_obj(env, &sbi->ll_cl->cd_lu_dev, &id);
 		if (clob) {
-			hdr = cl_object_header(clob);
-
-			spin_lock(&hdr->coh_page_guard);
-			page = cl_page_lookup(hdr, id.vpi_index);
-			spin_unlock(&hdr->coh_page_guard);
+			struct inode *inode = ccc_object_inode(clob);
+			struct cl_page *page = NULL;
+			struct page *vmpage;
+
+			result = find_get_pages_contig(inode->i_mapping,
+						       id.vpi_index, 1,
+						       &vmpage);
+			if (result > 0) {
+				lock_page(vmpage);
+				page = cl_vmpage_page(vmpage, clob);
+				unlock_page(vmpage);
+
+				page_cache_release(vmpage);
+			}
 
-			seq_printf(f, "%8x@"DFID": ",
-				   id.vpi_index, PFID(&hdr->coh_lu.loh_fid));
+			seq_printf(f, "%8x@" DFID ": ", id.vpi_index,
+				   PFID(lu_object_fid(&clob->co_lu)));
 			if (page) {
 				vvp_pgcache_page_show(env, f, page);
 				cl_page_put(env, page);
diff --git a/drivers/staging/lustre/lustre/llite/vvp_io.c b/drivers/staging/lustre/lustre/llite/vvp_io.c
index 984699a..ffe301b 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_io.c
+++ b/drivers/staging/lustre/lustre/llite/vvp_io.c
@@ -763,7 +763,6 @@ static int vvp_io_fault_start(const struct lu_env *env,
 
 			vmpage = NULL;
 			if (result < 0) {
-				cl_page_unmap(env, io, page);
 				cl_page_discard(env, io, page);
 				cl_page_disown(env, io, page);
 
diff --git a/drivers/staging/lustre/lustre/llite/vvp_object.c b/drivers/staging/lustre/lustre/llite/vvp_object.c
index 03c887d..b9a1d01 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_object.c
+++ b/drivers/staging/lustre/lustre/llite/vvp_object.c
@@ -165,6 +165,18 @@ static int vvp_conf_set(const struct lu_env *env, struct cl_object *obj,
 	return 0;
 }
 
+static int vvp_prune(const struct lu_env *env, struct cl_object *obj)
+{
+	struct inode *inode = ccc_object_inode(obj);
+	int rc;
+
+	rc = cl_sync_file_range(inode, 0, OBD_OBJECT_EOF, CL_FSYNC_ALL, 1);
+	if (rc == 0)
+		truncate_inode_pages(inode->i_mapping, 0);
+
+	return rc;
+}
+
 static const struct cl_object_operations vvp_ops = {
 	.coo_page_init = vvp_page_init,
 	.coo_lock_init = vvp_lock_init,
@@ -172,6 +184,7 @@ static const struct cl_object_operations vvp_ops = {
 	.coo_attr_get  = vvp_attr_get,
 	.coo_attr_set  = vvp_attr_set,
 	.coo_conf_set  = vvp_conf_set,
+	.coo_prune     = vvp_prune,
 	.coo_glimpse   = ccc_object_glimpse
 };
 
diff --git a/drivers/staging/lustre/lustre/llite/vvp_page.c b/drivers/staging/lustre/lustre/llite/vvp_page.c
index 850bae7..11e609e 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_page.c
+++ b/drivers/staging/lustre/lustre/llite/vvp_page.c
@@ -138,6 +138,7 @@ static void vvp_page_discard(const struct lu_env *env,
 	struct page	   *vmpage  = cl2vm_page(slice);
 	struct address_space *mapping;
 	struct ccc_page      *cpg     = cl2ccc_page(slice);
+	__u64 offset;
 
 	LASSERT(vmpage);
 	LASSERT(PageLocked(vmpage));
@@ -147,6 +148,9 @@ static void vvp_page_discard(const struct lu_env *env,
 	if (cpg->cpg_defer_uptodate && !cpg->cpg_ra_used)
 		ll_ra_stats_inc(mapping, RA_STAT_DISCARDED);
 
+	offset = vmpage->index << PAGE_SHIFT;
+	ll_teardown_mmaps(vmpage->mapping, offset, offset + PAGE_SIZE);
+
 	/*
 	 * truncate_complete_page() calls
 	 * a_ops->invalidatepage()->cl_page_delete()->vvp_page_delete().
@@ -154,37 +158,26 @@ static void vvp_page_discard(const struct lu_env *env,
 	truncate_complete_page(mapping, vmpage);
 }
 
-static int vvp_page_unmap(const struct lu_env *env,
-			  const struct cl_page_slice *slice,
-			  struct cl_io *unused)
-{
-	struct page *vmpage = cl2vm_page(slice);
-	__u64       offset;
-
-	LASSERT(vmpage);
-	LASSERT(PageLocked(vmpage));
-
-	offset = vmpage->index << PAGE_CACHE_SHIFT;
-
-	/*
-	 * XXX is it safe to call this with the page lock held?
-	 */
-	ll_teardown_mmaps(vmpage->mapping, offset, offset + PAGE_CACHE_SIZE);
-	return 0;
-}
-
 static void vvp_page_delete(const struct lu_env *env,
 			    const struct cl_page_slice *slice)
 {
 	struct page       *vmpage = cl2vm_page(slice);
 	struct inode     *inode  = vmpage->mapping->host;
 	struct cl_object *obj    = slice->cpl_obj;
+	struct cl_page   *page   = slice->cpl_page;
+	int refc;
 
 	LASSERT(PageLocked(vmpage));
-	LASSERT((struct cl_page *)vmpage->private == slice->cpl_page);
+	LASSERT((struct cl_page *)vmpage->private == page);
 	LASSERT(inode == ccc_object_inode(obj));
 
 	vvp_write_complete(cl2ccc(obj), cl2ccc_page(slice));
+
+	/* Drop the reference count held in vvp_page_init */
+	refc = atomic_dec_return(&page->cp_ref);
+	LASSERTF(refc >= 1, "page = %p, refc = %d\n", page, refc);
+
+	ClearPageUptodate(vmpage);
 	ClearPagePrivate(vmpage);
 	vmpage->private = 0;
 	/*
@@ -404,7 +397,6 @@ static const struct cl_page_operations vvp_page_ops = {
 	.cpo_vmpage	= ccc_page_vmpage,
 	.cpo_discard       = vvp_page_discard,
 	.cpo_delete	= vvp_page_delete,
-	.cpo_unmap	 = vvp_page_unmap,
 	.cpo_export	= vvp_page_export,
 	.cpo_is_vmlocked   = vvp_page_is_vmlocked,
 	.cpo_fini	  = vvp_page_fini,
@@ -541,6 +533,8 @@ int vvp_page_init(const struct lu_env *env, struct cl_object *obj,
 
 	INIT_LIST_HEAD(&cpg->cpg_pending_linkage);
 	if (page->cp_type == CPT_CACHEABLE) {
+		/* in cache, decref in vvp_page_delete */
+		atomic_inc(&page->cp_ref);
 		SetPagePrivate(vmpage);
 		vmpage->private = (unsigned long)page;
 		cl_page_slice_add(page, &cpg->cpg_cl, obj, &vvp_page_ops);
diff --git a/drivers/staging/lustre/lustre/lov/lov_object.c b/drivers/staging/lustre/lustre/lov/lov_object.c
index 1f8ed95..5d8a2b6 100644
--- a/drivers/staging/lustre/lustre/lov/lov_object.c
+++ b/drivers/staging/lustre/lustre/lov/lov_object.c
@@ -287,7 +287,7 @@ static int lov_delete_empty(const struct lu_env *env, struct lov_object *lov,
 
 	lov_layout_wait(env, lov);
 
-	cl_object_prune(env, &lov->lo_cl);
+	cl_locks_prune(env, &lov->lo_cl, 0);
 	return 0;
 }
 
@@ -364,7 +364,7 @@ static int lov_delete_raid0(const struct lu_env *env, struct lov_object *lov,
 			}
 		}
 	}
-	cl_object_prune(env, &lov->lo_cl);
+	cl_locks_prune(env, &lov->lo_cl, 0);
 	return 0;
 }
 
@@ -666,7 +666,6 @@ static int lov_layout_change(const struct lu_env *unused,
 	const struct lov_layout_operations *old_ops;
 	const struct lov_layout_operations *new_ops;
 
-	struct cl_object_header *hdr = cl_object_header(&lov->lo_cl);
 	void *cookie;
 	struct lu_env *env;
 	int refcheck;
@@ -691,13 +690,13 @@ static int lov_layout_change(const struct lu_env *unused,
 	old_ops = &lov_dispatch[lov->lo_type];
 	new_ops = &lov_dispatch[llt];
 
+	cl_object_prune(env, &lov->lo_cl);
+
 	result = old_ops->llo_delete(env, lov, &lov->u);
 	if (result == 0) {
 		old_ops->llo_fini(env, lov, &lov->u);
 
 		LASSERT(atomic_read(&lov->lo_active_ios) == 0);
-		LASSERT(!hdr->coh_tree.rnode);
-		LASSERT(hdr->coh_pages == 0);
 
 		lov->lo_type = LLT_EMPTY;
 		result = new_ops->llo_init(env,
diff --git a/drivers/staging/lustre/lustre/lov/lov_page.c b/drivers/staging/lustre/lustre/lov/lov_page.c
index fdcaf80..9728da2 100644
--- a/drivers/staging/lustre/lustre/lov/lov_page.c
+++ b/drivers/staging/lustre/lustre/lov/lov_page.c
@@ -36,6 +36,7 @@
  * Implementation of cl_page for LOV layer.
  *
  *   Author: Nikita Danilov <nikita.danilov@sun.com>
+ *   Author: Jinshan Xiong <jinshan.xiong@intel.com>
  */
 
 #define DEBUG_SUBSYSTEM S_LOV
@@ -179,31 +180,21 @@ int lov_page_init_raid0(const struct lu_env *env, struct cl_object *obj,
 	cl_page_slice_add(page, &lpg->lps_cl, obj, &lov_page_ops);
 
 	sub = lov_sub_get(env, lio, stripe);
-	if (IS_ERR(sub)) {
-		rc = PTR_ERR(sub);
-		goto out;
-	}
+	if (IS_ERR(sub))
+		return PTR_ERR(sub);
 
 	subobj = lovsub2cl(r0->lo_sub[stripe]);
-	subpage = cl_page_find_sub(sub->sub_env, subobj,
-				   cl_index(subobj, suboff), vmpage, page);
-	lov_sub_put(sub);
-	if (IS_ERR(subpage)) {
-		rc = PTR_ERR(subpage);
-		goto out;
-	}
-
-	if (likely(subpage->cp_parent == page)) {
-		lu_ref_add(&subpage->cp_reference, "lov", page);
+	subpage = cl_page_alloc(sub->sub_env, subobj, cl_index(subobj, suboff),
+				vmpage, page->cp_type);
+	if (!IS_ERR(subpage)) {
+		subpage->cp_parent = page;
+		page->cp_child = subpage;
 		lpg->lps_invalid = 0;
-		rc = 0;
 	} else {
-		CL_PAGE_DEBUG(D_ERROR, env, page, "parent page\n");
-		CL_PAGE_DEBUG(D_ERROR, env, subpage, "child page\n");
-		LASSERT(0);
+		rc = PTR_ERR(subpage);
 	}
+	lov_sub_put(sub);
 
-out:
 	return rc;
 }
 
diff --git a/drivers/staging/lustre/lustre/obdclass/cl_io.c b/drivers/staging/lustre/lustre/obdclass/cl_io.c
index f5128b4..cf94284 100644
--- a/drivers/staging/lustre/lustre/obdclass/cl_io.c
+++ b/drivers/staging/lustre/lustre/obdclass/cl_io.c
@@ -36,6 +36,7 @@
  * Client IO.
  *
  *   Author: Nikita Danilov <nikita.danilov@sun.com>
+ *   Author: Jinshan Xiong <jinshan.xiong@intel.com>
  */
 
 #define DEBUG_SUBSYSTEM S_CLASS
diff --git a/drivers/staging/lustre/lustre/obdclass/cl_lock.c b/drivers/staging/lustre/lustre/obdclass/cl_lock.c
index f952c1c..32ecc5a 100644
--- a/drivers/staging/lustre/lustre/obdclass/cl_lock.c
+++ b/drivers/staging/lustre/lustre/obdclass/cl_lock.c
@@ -36,6 +36,7 @@
  * Client Extent Lock.
  *
  *   Author: Nikita Danilov <nikita.danilov@sun.com>
+ *   Author: Jinshan Xiong <jinshan.xiong@intel.com>
  */
 
 #define DEBUG_SUBSYSTEM S_CLASS
@@ -1816,128 +1817,6 @@ struct cl_lock *cl_lock_at_pgoff(const struct lu_env *env,
 EXPORT_SYMBOL(cl_lock_at_pgoff);
 
 /**
- * Calculate the page offset at the layer of @lock.
- * At the time of this writing, @page is top page and @lock is sub lock.
- */
-static pgoff_t pgoff_at_lock(struct cl_page *page, struct cl_lock *lock)
-{
-	struct lu_device_type *dtype;
-	const struct cl_page_slice *slice;
-
-	dtype = lock->cll_descr.cld_obj->co_lu.lo_dev->ld_type;
-	slice = cl_page_at(page, dtype);
-	return slice->cpl_page->cp_index;
-}
-
-/**
- * Check if page @page is covered by an extra lock or discard it.
- */
-static int check_and_discard_cb(const struct lu_env *env, struct cl_io *io,
-				struct cl_page *page, void *cbdata)
-{
-	struct cl_thread_info *info = cl_env_info(env);
-	struct cl_lock *lock = cbdata;
-	pgoff_t index = pgoff_at_lock(page, lock);
-
-	if (index >= info->clt_fn_index) {
-		struct cl_lock *tmp;
-
-		/* refresh non-overlapped index */
-		tmp = cl_lock_at_pgoff(env, lock->cll_descr.cld_obj, index,
-				       lock, 1, 0);
-		if (tmp) {
-			/* Cache the first-non-overlapped index so as to skip
-			 * all pages within [index, clt_fn_index). This
-			 * is safe because if tmp lock is canceled, it will
-			 * discard these pages.
-			 */
-			info->clt_fn_index = tmp->cll_descr.cld_end + 1;
-			if (tmp->cll_descr.cld_end == CL_PAGE_EOF)
-				info->clt_fn_index = CL_PAGE_EOF;
-			cl_lock_put(env, tmp);
-		} else if (cl_page_own(env, io, page) == 0) {
-			/* discard the page */
-			cl_page_unmap(env, io, page);
-			cl_page_discard(env, io, page);
-			cl_page_disown(env, io, page);
-		} else {
-			LASSERT(page->cp_state == CPS_FREEING);
-		}
-	}
-
-	info->clt_next_index = index + 1;
-	return CLP_GANG_OKAY;
-}
-
-static int discard_cb(const struct lu_env *env, struct cl_io *io,
-		      struct cl_page *page, void *cbdata)
-{
-	struct cl_thread_info *info = cl_env_info(env);
-	struct cl_lock *lock   = cbdata;
-
-	LASSERT(lock->cll_descr.cld_mode >= CLM_WRITE);
-	KLASSERT(ergo(page->cp_type == CPT_CACHEABLE,
-		      !PageWriteback(cl_page_vmpage(env, page))));
-	KLASSERT(ergo(page->cp_type == CPT_CACHEABLE,
-		      !PageDirty(cl_page_vmpage(env, page))));
-
-	info->clt_next_index = pgoff_at_lock(page, lock) + 1;
-	if (cl_page_own(env, io, page) == 0) {
-		/* discard the page */
-		cl_page_unmap(env, io, page);
-		cl_page_discard(env, io, page);
-		cl_page_disown(env, io, page);
-	} else {
-		LASSERT(page->cp_state == CPS_FREEING);
-	}
-
-	return CLP_GANG_OKAY;
-}
-
-/**
- * Discard pages protected by the given lock. This function traverses radix
- * tree to find all covering pages and discard them. If a page is being covered
- * by other locks, it should remain in cache.
- *
- * If error happens on any step, the process continues anyway (the reasoning
- * behind this being that lock cancellation cannot be delayed indefinitely).
- */
-int cl_lock_discard_pages(const struct lu_env *env, struct cl_lock *lock)
-{
-	struct cl_thread_info *info  = cl_env_info(env);
-	struct cl_io	  *io    = &info->clt_io;
-	struct cl_lock_descr  *descr = &lock->cll_descr;
-	cl_page_gang_cb_t      cb;
-	int res;
-	int result;
-
-	LINVRNT(cl_lock_invariant(env, lock));
-
-	io->ci_obj = cl_object_top(descr->cld_obj);
-	io->ci_ignore_layout = 1;
-	result = cl_io_init(env, io, CIT_MISC, io->ci_obj);
-	if (result != 0)
-		goto out;
-
-	cb = descr->cld_mode == CLM_READ ? check_and_discard_cb : discard_cb;
-	info->clt_fn_index = info->clt_next_index = descr->cld_start;
-	do {
-		res = cl_page_gang_lookup(env, descr->cld_obj, io,
-					  info->clt_next_index, descr->cld_end,
-					  cb, (void *)lock);
-		if (info->clt_next_index > descr->cld_end)
-			break;
-
-		if (res == CLP_GANG_RESCHED)
-			cond_resched();
-	} while (res != CLP_GANG_OKAY);
-out:
-	cl_io_fini(env, io);
-	return result;
-}
-EXPORT_SYMBOL(cl_lock_discard_pages);
-
-/**
  * Eliminate all locks for a given object.
  *
  * Caller has to guarantee that no lock is in active use.
@@ -1951,12 +1830,6 @@ void cl_locks_prune(const struct lu_env *env, struct cl_object *obj, int cancel)
 	struct cl_lock	  *lock;
 
 	head = cl_object_header(obj);
-	/*
-	 * If locks are destroyed without cancellation, all pages must be
-	 * already destroyed (as otherwise they will be left unprotected).
-	 */
-	LASSERT(ergo(!cancel,
-		     !head->coh_tree.rnode && head->coh_pages == 0));
 
 	spin_lock(&head->coh_lock_guard);
 	while (!list_empty(&head->coh_locks)) {
@@ -2095,8 +1968,8 @@ void cl_lock_hold_add(const struct lu_env *env, struct cl_lock *lock,
 	LINVRNT(cl_lock_invariant(env, lock));
 	LASSERT(lock->cll_state != CLS_FREEING);
 
-	cl_lock_hold_mod(env, lock, 1);
 	cl_lock_get(lock);
+	cl_lock_hold_mod(env, lock, 1);
 	lu_ref_add(&lock->cll_holders, scope, source);
 	lu_ref_add(&lock->cll_reference, scope, source);
 }
diff --git a/drivers/staging/lustre/lustre/obdclass/cl_object.c b/drivers/staging/lustre/lustre/obdclass/cl_object.c
index 0772706..65b6402 100644
--- a/drivers/staging/lustre/lustre/obdclass/cl_object.c
+++ b/drivers/staging/lustre/lustre/obdclass/cl_object.c
@@ -36,6 +36,7 @@
  * Client Lustre Object.
  *
  *   Author: Nikita Danilov <nikita.danilov@sun.com>
+ *   Author: Jinshan Xiong <jinshan.xiong@intel.com>
  */
 
 /*
@@ -43,7 +44,6 @@
  *
  *  i_mutex
  *      PG_locked
- *	  ->coh_page_guard
  *	  ->coh_lock_guard
  *	  ->coh_attr_guard
  *	  ->ls_guard
@@ -63,8 +63,6 @@
 
 static struct kmem_cache *cl_env_kmem;
 
-/** Lock class of cl_object_header::coh_page_guard */
-static struct lock_class_key cl_page_guard_class;
 /** Lock class of cl_object_header::coh_lock_guard */
 static struct lock_class_key cl_lock_guard_class;
 /** Lock class of cl_object_header::coh_attr_guard */
@@ -81,15 +79,10 @@ int cl_object_header_init(struct cl_object_header *h)
 
 	result = lu_object_header_init(&h->coh_lu);
 	if (result == 0) {
-		spin_lock_init(&h->coh_page_guard);
 		spin_lock_init(&h->coh_lock_guard);
 		spin_lock_init(&h->coh_attr_guard);
-		lockdep_set_class(&h->coh_page_guard, &cl_page_guard_class);
 		lockdep_set_class(&h->coh_lock_guard, &cl_lock_guard_class);
 		lockdep_set_class(&h->coh_attr_guard, &cl_attr_guard_class);
-		h->coh_pages = 0;
-		/* XXX hard coded GFP_* mask. */
-		INIT_RADIX_TREE(&h->coh_tree, GFP_ATOMIC);
 		INIT_LIST_HEAD(&h->coh_locks);
 		h->coh_page_bufsize = ALIGN(sizeof(struct cl_page), 8);
 	}
@@ -315,6 +308,32 @@ int cl_conf_set(const struct lu_env *env, struct cl_object *obj,
 EXPORT_SYMBOL(cl_conf_set);
 
 /**
+ * Prunes caches of pages and locks for this object.
+ */
+void cl_object_prune(const struct lu_env *env, struct cl_object *obj)
+{
+	struct lu_object_header *top;
+	struct cl_object *o;
+	int result;
+
+	top = obj->co_lu.lo_header;
+	result = 0;
+	list_for_each_entry(o, &top->loh_layers, co_lu.lo_linkage) {
+		if (o->co_ops->coo_prune) {
+			result = o->co_ops->coo_prune(env, o);
+			if (result != 0)
+				break;
+		}
+	}
+
+	/* TODO: pruning locks will be moved into layers after cl_lock
+	 * simplification is done
+	 */
+	cl_locks_prune(env, obj, 1);
+}
+EXPORT_SYMBOL(cl_object_prune);
+
+/**
  * Helper function removing all object locks, and marking object for
  * deletion. All object pages must have been deleted at this point.
  *
@@ -326,8 +345,6 @@ void cl_object_kill(const struct lu_env *env, struct cl_object *obj)
 	struct cl_object_header *hdr;
 
 	hdr = cl_object_header(obj);
-	LASSERT(!hdr->coh_tree.rnode);
-	LASSERT(hdr->coh_pages == 0);
 
 	set_bit(LU_OBJECT_HEARD_BANSHEE, &hdr->coh_lu.loh_flags);
 	/*
@@ -341,16 +358,6 @@ void cl_object_kill(const struct lu_env *env, struct cl_object *obj)
 }
 EXPORT_SYMBOL(cl_object_kill);
 
-/**
- * Prunes caches of pages and locks for this object.
- */
-void cl_object_prune(const struct lu_env *env, struct cl_object *obj)
-{
-	cl_pages_prune(env, obj);
-	cl_locks_prune(env, obj, 1);
-}
-EXPORT_SYMBOL(cl_object_prune);
-
 void cache_stats_init(struct cache_stats *cs, const char *name)
 {
 	int i;
diff --git a/drivers/staging/lustre/lustre/obdclass/cl_page.c b/drivers/staging/lustre/lustre/obdclass/cl_page.c
index 231a2f2..8169836 100644
--- a/drivers/staging/lustre/lustre/obdclass/cl_page.c
+++ b/drivers/staging/lustre/lustre/obdclass/cl_page.c
@@ -36,6 +36,7 @@
  * Client Lustre Page.
  *
  *   Author: Nikita Danilov <nikita.danilov@sun.com>
+ *   Author: Jinshan Xiong <jinshan.xiong@intel.com>
  */
 
 #define DEBUG_SUBSYSTEM S_CLASS
@@ -48,8 +49,7 @@
 #include "../include/cl_object.h"
 #include "cl_internal.h"
 
-static void cl_page_delete0(const struct lu_env *env, struct cl_page *pg,
-			    int radix);
+static void cl_page_delete0(const struct lu_env *env, struct cl_page *pg);
 
 # define PASSERT(env, page, expr)					   \
 	do {								   \
@@ -79,8 +79,7 @@ static struct cl_page *cl_page_top_trusted(struct cl_page *page)
  *
  * This function can be used to obtain initial reference to previously
  * unreferenced cached object. It can be called only if concurrent page
- * reclamation is somehow prevented, e.g., by locking page radix-tree
- * (cl_object_header::hdr->coh_page_guard), or by keeping a lock on a VM page,
+ * reclamation is somehow prevented, e.g., by keeping a lock on a VM page,
  * associated with \a page.
  *
  * Use with care! Not exported.
@@ -114,132 +113,6 @@ cl_page_at_trusted(const struct cl_page *page,
 	return NULL;
 }
 
-/**
- * Returns a page with given index in the given object, or NULL if no page is
- * found. Acquires a reference on \a page.
- *
- * Locking: called under cl_object_header::coh_page_guard spin-lock.
- */
-struct cl_page *cl_page_lookup(struct cl_object_header *hdr, pgoff_t index)
-{
-	struct cl_page *page;
-
-	assert_spin_locked(&hdr->coh_page_guard);
-
-	page = radix_tree_lookup(&hdr->coh_tree, index);
-	if (page)
-		cl_page_get_trust(page);
-	return page;
-}
-EXPORT_SYMBOL(cl_page_lookup);
-
-/**
- * Returns a list of pages by a given [start, end] of \a obj.
- *
- * \param resched If not NULL, then we give up before hogging CPU for too
- * long and set *resched = 1, in that case caller should implement a retry
- * logic.
- *
- * Gang tree lookup (radix_tree_gang_lookup()) optimization is absolutely
- * crucial in the face of [offset, EOF] locks.
- *
- * Return at least one page in @queue unless there is no covered page.
- */
-int cl_page_gang_lookup(const struct lu_env *env, struct cl_object *obj,
-			struct cl_io *io, pgoff_t start, pgoff_t end,
-			cl_page_gang_cb_t cb, void *cbdata)
-{
-	struct cl_object_header *hdr;
-	struct cl_page	  *page;
-	struct cl_page	 **pvec;
-	const struct cl_page_slice  *slice;
-	const struct lu_device_type *dtype;
-	pgoff_t		  idx;
-	unsigned int	     nr;
-	unsigned int	     i;
-	unsigned int	     j;
-	int		      res = CLP_GANG_OKAY;
-	int		      tree_lock = 1;
-
-	idx = start;
-	hdr = cl_object_header(obj);
-	pvec = cl_env_info(env)->clt_pvec;
-	dtype = cl_object_top(obj)->co_lu.lo_dev->ld_type;
-	spin_lock(&hdr->coh_page_guard);
-	while ((nr = radix_tree_gang_lookup(&hdr->coh_tree, (void **)pvec,
-					    idx, CLT_PVEC_SIZE)) > 0) {
-		int end_of_region = 0;
-
-		idx = pvec[nr - 1]->cp_index + 1;
-		for (i = 0, j = 0; i < nr; ++i) {
-			page = pvec[i];
-			pvec[i] = NULL;
-
-			LASSERT(page->cp_type == CPT_CACHEABLE);
-			if (page->cp_index > end) {
-				end_of_region = 1;
-				break;
-			}
-			if (page->cp_state == CPS_FREEING)
-				continue;
-
-			slice = cl_page_at_trusted(page, dtype);
-			/*
-			 * Pages for lsm-less file has no underneath sub-page
-			 * for osc, in case of ...
-			 */
-			PASSERT(env, page, slice);
-
-			page = slice->cpl_page;
-			/*
-			 * Can safely call cl_page_get_trust() under
-			 * radix-tree spin-lock.
-			 *
-			 * XXX not true, because @page is from object another
-			 * than @hdr and protected by different tree lock.
-			 */
-			cl_page_get_trust(page);
-			lu_ref_add_atomic(&page->cp_reference,
-					  "gang_lookup", current);
-			pvec[j++] = page;
-		}
-
-		/*
-		 * Here a delicate locking dance is performed. Current thread
-		 * holds a reference to a page, but has to own it before it
-		 * can be placed into queue. Owning implies waiting, so
-		 * radix-tree lock is to be released. After a wait one has to
-		 * check that pages weren't truncated (cl_page_own() returns
-		 * error in the latter case).
-		 */
-		spin_unlock(&hdr->coh_page_guard);
-		tree_lock = 0;
-
-		for (i = 0; i < j; ++i) {
-			page = pvec[i];
-			if (res == CLP_GANG_OKAY)
-				res = (*cb)(env, io, page, cbdata);
-			lu_ref_del(&page->cp_reference,
-				   "gang_lookup", current);
-			cl_page_put(env, page);
-		}
-		if (nr < CLT_PVEC_SIZE || end_of_region)
-			break;
-
-		if (res == CLP_GANG_OKAY && need_resched())
-			res = CLP_GANG_RESCHED;
-		if (res != CLP_GANG_OKAY)
-			break;
-
-		spin_lock(&hdr->coh_page_guard);
-		tree_lock = 1;
-	}
-	if (tree_lock)
-		spin_unlock(&hdr->coh_page_guard);
-	return res;
-}
-EXPORT_SYMBOL(cl_page_gang_lookup);
-
 static void cl_page_free(const struct lu_env *env, struct cl_page *page)
 {
 	struct cl_object *obj  = page->cp_obj;
@@ -276,10 +149,10 @@ static inline void cl_page_state_set_trust(struct cl_page *page,
 	*(enum cl_page_state *)&page->cp_state = state;
 }
 
-static struct cl_page *cl_page_alloc(const struct lu_env *env,
-				     struct cl_object *o, pgoff_t ind,
-				     struct page *vmpage,
-				     enum cl_page_type type)
+struct cl_page *cl_page_alloc(const struct lu_env *env,
+			      struct cl_object *o, pgoff_t ind,
+			      struct page *vmpage,
+			      enum cl_page_type type)
 {
 	struct cl_page	  *page;
 	struct lu_object_header *head;
@@ -289,8 +162,6 @@ static struct cl_page *cl_page_alloc(const struct lu_env *env,
 		int result = 0;
 
 		atomic_set(&page->cp_ref, 1);
-		if (type == CPT_CACHEABLE) /* for radix tree */
-			atomic_inc(&page->cp_ref);
 		page->cp_obj = o;
 		cl_object_get(o);
 		lu_object_ref_add_at(&o->co_lu, &page->cp_obj_ref, "cl_page",
@@ -309,7 +180,7 @@ static struct cl_page *cl_page_alloc(const struct lu_env *env,
 				result = o->co_ops->coo_page_init(env, o,
 								  page, vmpage);
 				if (result != 0) {
-					cl_page_delete0(env, page, 0);
+					cl_page_delete0(env, page);
 					cl_page_free(env, page);
 					page = ERR_PTR(result);
 					break;
@@ -321,6 +192,7 @@ static struct cl_page *cl_page_alloc(const struct lu_env *env,
 	}
 	return page;
 }
+EXPORT_SYMBOL(cl_page_alloc);
 
 /**
  * Returns a cl_page with index \a idx at the object \a o, and associated with
@@ -333,16 +205,13 @@ static struct cl_page *cl_page_alloc(const struct lu_env *env,
  *
  * \see cl_object_find(), cl_lock_find()
  */
-static struct cl_page *cl_page_find0(const struct lu_env *env,
-				     struct cl_object *o,
-				     pgoff_t idx, struct page *vmpage,
-				     enum cl_page_type type,
-				     struct cl_page *parent)
+struct cl_page *cl_page_find(const struct lu_env *env,
+			     struct cl_object *o,
+			     pgoff_t idx, struct page *vmpage,
+			     enum cl_page_type type)
 {
 	struct cl_page	  *page = NULL;
-	struct cl_page	  *ghost = NULL;
 	struct cl_object_header *hdr;
-	int err;
 
 	LASSERT(type == CPT_CACHEABLE || type == CPT_TRANSIENT);
 	might_sleep();
@@ -368,90 +237,19 @@ static struct cl_page *cl_page_find0(const struct lu_env *env,
 		 *       reference on it.
 		 */
 		page = cl_vmpage_page(vmpage, o);
-		PINVRNT(env, page,
-			ergo(page,
-			     cl_page_vmpage(env, page) == vmpage &&
-			     (void *)radix_tree_lookup(&hdr->coh_tree,
-						       idx) == page));
-	}
 
-	if (page)
-		return page;
+		if (page)
+			return page;
+	}
 
 	/* allocate and initialize cl_page */
 	page = cl_page_alloc(env, o, idx, vmpage, type);
-	if (IS_ERR(page))
-		return page;
-
-	if (type == CPT_TRANSIENT) {
-		if (parent) {
-			LASSERT(!page->cp_parent);
-			page->cp_parent = parent;
-			parent->cp_child = page;
-		}
-		return page;
-	}
-
-	/*
-	 * XXX optimization: use radix_tree_preload() here, and change tree
-	 * gfp mask to GFP_KERNEL in cl_object_header_init().
-	 */
-	spin_lock(&hdr->coh_page_guard);
-	err = radix_tree_insert(&hdr->coh_tree, idx, page);
-	if (err != 0) {
-		ghost = page;
-		/*
-		 * Noted by Jay: a lock on \a vmpage protects cl_page_find()
-		 * from this race, but
-		 *
-		 *     0. it's better to have cl_page interface "locally
-		 *     consistent" so that its correctness can be reasoned
-		 *     about without appealing to the (obscure world of) VM
-		 *     locking.
-		 *
-		 *     1. handling this race allows ->coh_tree to remain
-		 *     consistent even when VM locking is somehow busted,
-		 *     which is very useful during diagnosing and debugging.
-		 */
-		page = ERR_PTR(err);
-		CL_PAGE_DEBUG(D_ERROR, env, ghost,
-			      "fail to insert into radix tree: %d\n", err);
-	} else {
-		if (parent) {
-			LASSERT(!page->cp_parent);
-			page->cp_parent = parent;
-			parent->cp_child = page;
-		}
-		hdr->coh_pages++;
-	}
-	spin_unlock(&hdr->coh_page_guard);
-
-	if (unlikely(ghost)) {
-		cl_page_delete0(env, ghost, 0);
-		cl_page_free(env, ghost);
-	}
 	return page;
 }
-
-struct cl_page *cl_page_find(const struct lu_env *env, struct cl_object *o,
-			     pgoff_t idx, struct page *vmpage,
-			     enum cl_page_type type)
-{
-	return cl_page_find0(env, o, idx, vmpage, type, NULL);
-}
 EXPORT_SYMBOL(cl_page_find);
 
-struct cl_page *cl_page_find_sub(const struct lu_env *env, struct cl_object *o,
-				 pgoff_t idx, struct page *vmpage,
-				 struct cl_page *parent)
-{
-	return cl_page_find0(env, o, idx, vmpage, parent->cp_type, parent);
-}
-EXPORT_SYMBOL(cl_page_find_sub);
-
 static inline int cl_page_invariant(const struct cl_page *pg)
 {
-	struct cl_object_header *header;
 	struct cl_page	  *parent;
 	struct cl_page	  *child;
 	struct cl_io	    *owner;
@@ -461,7 +259,6 @@ static inline int cl_page_invariant(const struct cl_page *pg)
 	 */
 	LINVRNT(cl_page_is_vmlocked(NULL, pg));
 
-	header = cl_object_header(pg->cp_obj);
 	parent = pg->cp_parent;
 	child  = pg->cp_child;
 	owner  = pg->cp_owner;
@@ -473,15 +270,7 @@ static inline int cl_page_invariant(const struct cl_page *pg)
 		ergo(parent, pg->cp_obj != parent->cp_obj) &&
 		ergo(owner && parent,
 		     parent->cp_owner == pg->cp_owner->ci_parent) &&
-		ergo(owner && child, child->cp_owner->ci_parent == owner) &&
-		/*
-		 * Either page is early in initialization (has neither child
-		 * nor parent yet), or it is in the object radix tree.
-		 */
-		ergo(pg->cp_state < CPS_FREEING && pg->cp_type == CPT_CACHEABLE,
-		     (void *)radix_tree_lookup(&header->coh_tree,
-					       pg->cp_index) == pg ||
-		     (!child && !parent));
+		ergo(owner && child, child->cp_owner->ci_parent == owner);
 }
 
 static void cl_page_state_set0(const struct lu_env *env,
@@ -1001,11 +790,8 @@ EXPORT_SYMBOL(cl_page_discard);
  * pages, e.g,. in a error handling cl_page_find()->cl_page_delete0()
  * path. Doesn't check page invariant.
  */
-static void cl_page_delete0(const struct lu_env *env, struct cl_page *pg,
-			    int radix)
+static void cl_page_delete0(const struct lu_env *env, struct cl_page *pg)
 {
-	struct cl_page *tmp = pg;
-
 	PASSERT(env, pg, pg == cl_page_top(pg));
 	PASSERT(env, pg, pg->cp_state != CPS_FREEING);
 
@@ -1014,41 +800,11 @@ static void cl_page_delete0(const struct lu_env *env, struct cl_page *pg,
 	 */
 	cl_page_owner_clear(pg);
 
-	/*
-	 * unexport the page firstly before freeing it so that
-	 * the page content is considered to be invalid.
-	 * We have to do this because a CPS_FREEING cl_page may
-	 * be NOT under the protection of a cl_lock.
-	 * Afterwards, if this page is found by other threads, then this
-	 * page will be forced to reread.
-	 */
-	cl_page_export(env, pg, 0);
 	cl_page_state_set0(env, pg, CPS_FREEING);
 
-	CL_PAGE_INVOID(env, pg, CL_PAGE_OP(cpo_delete),
-		       (const struct lu_env *, const struct cl_page_slice *));
-
-	if (tmp->cp_type == CPT_CACHEABLE) {
-		if (!radix)
-			/* !radix means that @pg is not yet in the radix tree,
-			 * skip removing it.
-			 */
-			tmp = pg->cp_child;
-		for (; tmp; tmp = tmp->cp_child) {
-			void		    *value;
-			struct cl_object_header *hdr;
-
-			hdr = cl_object_header(tmp->cp_obj);
-			spin_lock(&hdr->coh_page_guard);
-			value = radix_tree_delete(&hdr->coh_tree,
-						  tmp->cp_index);
-			PASSERT(env, tmp, value == tmp);
-			PASSERT(env, tmp, hdr->coh_pages > 0);
-			hdr->coh_pages--;
-			spin_unlock(&hdr->coh_page_guard);
-			cl_page_put(env, tmp);
-		}
-	}
+	CL_PAGE_INVOID_REVERSE(env, pg, CL_PAGE_OP(cpo_delete),
+			       (const struct lu_env *,
+				const struct cl_page_slice *));
 }
 
 /**
@@ -1079,30 +835,11 @@ static void cl_page_delete0(const struct lu_env *env, struct cl_page *pg,
 void cl_page_delete(const struct lu_env *env, struct cl_page *pg)
 {
 	PINVRNT(env, pg, cl_page_invariant(pg));
-	cl_page_delete0(env, pg, 1);
+	cl_page_delete0(env, pg);
 }
 EXPORT_SYMBOL(cl_page_delete);
 
 /**
- * Unmaps page from user virtual memory.
- *
- * Calls cl_page_operations::cpo_unmap() through all layers top-to-bottom. The
- * layer responsible for VM interaction has to unmap page from user space
- * virtual memory.
- *
- * \see cl_page_operations::cpo_unmap()
- */
-int cl_page_unmap(const struct lu_env *env,
-		  struct cl_io *io, struct cl_page *pg)
-{
-	PINVRNT(env, pg, cl_page_is_owned(pg, io));
-	PINVRNT(env, pg, cl_page_invariant(pg));
-
-	return cl_page_invoke(env, io, pg, CL_PAGE_OP(cpo_unmap));
-}
-EXPORT_SYMBOL(cl_page_unmap);
-
-/**
  * Marks page up-to-date.
  *
  * Call cl_page_operations::cpo_export() through all layers top-to-bottom. The
@@ -1359,53 +1096,6 @@ int cl_page_is_under_lock(const struct lu_env *env, struct cl_io *io,
 }
 EXPORT_SYMBOL(cl_page_is_under_lock);
 
-static int page_prune_cb(const struct lu_env *env, struct cl_io *io,
-			 struct cl_page *page, void *cbdata)
-{
-	cl_page_own(env, io, page);
-	cl_page_unmap(env, io, page);
-	cl_page_discard(env, io, page);
-	cl_page_disown(env, io, page);
-	return CLP_GANG_OKAY;
-}
-
-/**
- * Purges all cached pages belonging to the object \a obj.
- */
-int cl_pages_prune(const struct lu_env *env, struct cl_object *clobj)
-{
-	struct cl_thread_info   *info;
-	struct cl_object	*obj = cl_object_top(clobj);
-	struct cl_io	    *io;
-	int		      result;
-
-	info  = cl_env_info(env);
-	io    = &info->clt_io;
-
-	/*
-	 * initialize the io. This is ugly since we never do IO in this
-	 * function, we just make cl_page_list functions happy. -jay
-	 */
-	io->ci_obj = obj;
-	io->ci_ignore_layout = 1;
-	result = cl_io_init(env, io, CIT_MISC, obj);
-	if (result != 0) {
-		cl_io_fini(env, io);
-		return io->ci_result;
-	}
-
-	do {
-		result = cl_page_gang_lookup(env, obj, io, 0, CL_PAGE_EOF,
-					     page_prune_cb, NULL);
-		if (result == CLP_GANG_RESCHED)
-			cond_resched();
-	} while (result != CLP_GANG_OKAY);
-
-	cl_io_fini(env, io);
-	return result;
-}
-EXPORT_SYMBOL(cl_pages_prune);
-
 /**
  * Tells transfer engine that only part of a page is to be transmitted.
  *
diff --git a/drivers/staging/lustre/lustre/osc/osc_cache.c b/drivers/staging/lustre/lustre/osc/osc_cache.c
index 6196c3b..c9d4e3c 100644
--- a/drivers/staging/lustre/lustre/osc/osc_cache.c
+++ b/drivers/staging/lustre/lustre/osc/osc_cache.c
@@ -1015,7 +1015,6 @@ static int osc_extent_truncate(struct osc_extent *ext, pgoff_t trunc_index,
 		lu_ref_add(&page->cp_reference, "truncate", current);
 
 		if (cl_page_own(env, io, page) == 0) {
-			cl_page_unmap(env, io, page);
 			cl_page_discard(env, io, page);
 			cl_page_disown(env, io, page);
 		} else {
@@ -2136,8 +2135,7 @@ static void osc_check_rpcs(const struct lu_env *env, struct client_obd *cli)
 
 		cl_object_get(obj);
 		client_obd_list_unlock(&cli->cl_loi_list_lock);
-		lu_object_ref_add_at(&obj->co_lu, &link, "check",
-				     current);
+		lu_object_ref_add_at(&obj->co_lu, &link, "check", current);
 
 		/* attempt some read/write balancing by alternating between
 		 * reads and writes in an object.  The makes_rpc checks here
@@ -2180,8 +2178,7 @@ static void osc_check_rpcs(const struct lu_env *env, struct client_obd *cli)
 		osc_object_unlock(osc);
 
 		osc_list_maint(cli, osc);
-		lu_object_ref_del_at(&obj->co_lu, &link, "check",
-				     current);
+		lu_object_ref_del_at(&obj->co_lu, &link, "check", current);
 		cl_object_put(env, obj);
 
 		client_obd_list_lock(&cli->cl_loi_list_lock);
@@ -2994,4 +2991,204 @@ int osc_cache_writeback_range(const struct lu_env *env, struct osc_object *obj,
 	return result;
 }
 
+/**
+ * Invoke \a cb on each cached page of \a osc in the range [start, end].
+ *
+ * If the walk finds that a reschedule is needed, it returns
+ * CLP_GANG_RESCHED and the caller is expected to reschedule and retry
+ * from the next index (see osc_lock_discard_pages() below).
+ *
+ * Gang tree lookup (radix_tree_gang_lookup()) optimization is absolutely
+ * crucial in the face of [offset, EOF] locks.
+ *
+ * Pages are processed in batches of OTI_PVEC_SIZE radix-tree slots.
+ */
+int osc_page_gang_lookup(const struct lu_env *env, struct cl_io *io,
+			 struct osc_object *osc, pgoff_t start, pgoff_t end,
+			 osc_page_gang_cbt cb, void *cbdata)
+{
+	struct osc_page *ops;
+	void            **pvec;
+	pgoff_t         idx;
+	unsigned int    nr;
+	unsigned int    i;
+	unsigned int    j;
+	int             res = CLP_GANG_OKAY;
+	bool            tree_lock = true;
+
+	idx = start;
+	pvec = osc_env_info(env)->oti_pvec;
+	spin_lock(&osc->oo_tree_lock);
+	while ((nr = radix_tree_gang_lookup(&osc->oo_tree, pvec,
+					    idx, OTI_PVEC_SIZE)) > 0) {
+		struct cl_page *page;
+		bool end_of_region = false;
+
+		for (i = 0, j = 0; i < nr; ++i) {
+			ops = pvec[i];
+			pvec[i] = NULL;
+
+			idx = osc_index(ops);
+			if (idx > end) {
+				end_of_region = true;
+				break;
+			}
+
+			page = cl_page_top(ops->ops_cl.cpl_page);
+			LASSERT(page->cp_type == CPT_CACHEABLE);
+			if (page->cp_state == CPS_FREEING)
+				continue;
+
+			cl_page_get(page);
+			lu_ref_add_atomic(&page->cp_reference,
+					  "gang_lookup", current);
+			pvec[j++] = ops;
+		}
+		++idx;
+
+		/*
+		 * Here a delicate locking dance is performed. Current thread
+		 * holds a reference to a page, but has to own it before it
+		 * can be placed into queue. Owning implies waiting, so
+		 * radix-tree lock is to be released. After a wait one has to
+		 * check that pages weren't truncated (cl_page_own() returns
+		 * error in the latter case).
+		 */
+		spin_unlock(&osc->oo_tree_lock);
+		tree_lock = false;
+
+		for (i = 0; i < j; ++i) {
+			ops = pvec[i];
+			if (res == CLP_GANG_OKAY)
+				res = (*cb)(env, io, ops, cbdata);
+
+			page = cl_page_top(ops->ops_cl.cpl_page);
+			lu_ref_del(&page->cp_reference, "gang_lookup", current);
+			cl_page_put(env, page);
+		}
+		if (nr < OTI_PVEC_SIZE || end_of_region)
+			break;
+
+		if (res == CLP_GANG_OKAY && need_resched())
+			res = CLP_GANG_RESCHED;
+		if (res != CLP_GANG_OKAY)
+			break;
+
+		spin_lock(&osc->oo_tree_lock);
+		tree_lock = true;
+	}
+	if (tree_lock)
+		spin_unlock(&osc->oo_tree_lock);
+	return res;
+}
+
+/**
+ * Check if page @page is covered by an extra lock or discard it.
+ */
+static int check_and_discard_cb(const struct lu_env *env, struct cl_io *io,
+				struct osc_page *ops, void *cbdata)
+{
+	struct osc_thread_info *info = osc_env_info(env);
+	struct cl_lock *lock = cbdata;
+	pgoff_t index;
+
+	index = osc_index(ops);
+	if (index >= info->oti_fn_index) {
+		struct cl_lock *tmp;
+		struct cl_page *page = cl_page_top(ops->ops_cl.cpl_page);
+
+		/* refresh non-overlapped index */
+		tmp = cl_lock_at_pgoff(env, lock->cll_descr.cld_obj, index,
+				       lock, 1, 0);
+		if (tmp) {
+			/* Cache the first-non-overlapped index so as to skip
+			 * all pages within [index, oti_fn_index). This
+			 * is safe because if tmp lock is canceled, it will
+			 * discard these pages.
+			 */
+			info->oti_fn_index = tmp->cll_descr.cld_end + 1;
+			if (tmp->cll_descr.cld_end == CL_PAGE_EOF)
+				info->oti_fn_index = CL_PAGE_EOF;
+			cl_lock_put(env, tmp);
+		} else if (cl_page_own(env, io, page) == 0) {
+			/* discard the page */
+			cl_page_discard(env, io, page);
+			cl_page_disown(env, io, page);
+		} else {
+			LASSERT(page->cp_state == CPS_FREEING);
+		}
+	}
+
+	info->oti_next_index = index + 1;
+	return CLP_GANG_OKAY;
+}
+
+static int discard_cb(const struct lu_env *env, struct cl_io *io,
+		      struct osc_page *ops, void *cbdata)
+{
+	struct osc_thread_info *info = osc_env_info(env);
+	struct cl_lock *lock = cbdata;
+	struct cl_page *page = cl_page_top(ops->ops_cl.cpl_page);
+
+	LASSERT(lock->cll_descr.cld_mode >= CLM_WRITE);
+	KLASSERT(ergo(page->cp_type == CPT_CACHEABLE,
+		      !PageWriteback(cl_page_vmpage(env, page))));
+	KLASSERT(ergo(page->cp_type == CPT_CACHEABLE,
+		      !PageDirty(cl_page_vmpage(env, page))));
+
+	/* page is top page. */
+	info->oti_next_index = osc_index(ops) + 1;
+	if (cl_page_own(env, io, page) == 0) {
+		/* discard the page */
+		cl_page_discard(env, io, page);
+		cl_page_disown(env, io, page);
+	} else {
+		LASSERT(page->cp_state == CPS_FREEING);
+	}
+
+	return CLP_GANG_OKAY;
+}
+
+/**
+ * Discard pages protected by the given lock. This function traverses radix
+ * tree to find all covering pages and discard them. If a page is being covered
+ * by other locks, it should remain in cache.
+ *
+ * If an error happens on any step, the process continues anyway (the reasoning
+ * behind this being that lock cancellation cannot be delayed indefinitely).
+ */
+int osc_lock_discard_pages(const struct lu_env *env, struct osc_lock *ols)
+{
+	struct osc_thread_info *info = osc_env_info(env);
+	struct cl_io *io = &info->oti_io;
+	struct cl_object *osc = ols->ols_cl.cls_obj;
+	struct cl_lock *lock = ols->ols_cl.cls_lock;
+	struct cl_lock_descr *descr = &lock->cll_descr;
+	osc_page_gang_cbt cb;
+	int res;
+	int result;
+
+	io->ci_obj = cl_object_top(osc);
+	io->ci_ignore_layout = 1;
+	result = cl_io_init(env, io, CIT_MISC, io->ci_obj);
+	if (result != 0)
+		goto out;
+
+	cb = descr->cld_mode == CLM_READ ? check_and_discard_cb : discard_cb;
+	info->oti_fn_index = info->oti_next_index = descr->cld_start;
+	do {
+		res = osc_page_gang_lookup(env, io, cl2osc(osc),
+					   info->oti_next_index, descr->cld_end,
+					   cb, (void *)lock);
+		if (info->oti_next_index > descr->cld_end)
+			break;
+
+		if (res == CLP_GANG_RESCHED)
+			cond_resched();
+	} while (res != CLP_GANG_OKAY);
+out:
+	cl_io_fini(env, io);
+	return result;
+}
+
 /** @} osc */
diff --git a/drivers/staging/lustre/lustre/osc/osc_cl_internal.h b/drivers/staging/lustre/lustre/osc/osc_cl_internal.h
index b6325f5..e70f06c 100644
--- a/drivers/staging/lustre/lustre/osc/osc_cl_internal.h
+++ b/drivers/staging/lustre/lustre/osc/osc_cl_internal.h
@@ -111,7 +111,12 @@ struct osc_thread_info {
 	struct lustre_handle    oti_handle;
 	struct cl_page_list     oti_plist;
 	struct cl_io		oti_io;
-	struct cl_page	       *oti_pvec[OTI_PVEC_SIZE];
+	void			*oti_pvec[OTI_PVEC_SIZE];
+	/**
+	 * Fields used by cl_lock_discard_pages().
+	 */
+	pgoff_t			oti_next_index;
+	pgoff_t			oti_fn_index; /* first non-overlapped index */
 };
 
 struct osc_object {
@@ -161,6 +166,13 @@ struct osc_object {
 	 * oo_{read|write}_pages soon.
 	 */
 	spinlock_t	    oo_lock;
+
+	/**
+	 * Radix tree for caching pages
+	 */
+	struct radix_tree_root	oo_tree;
+	spinlock_t		oo_tree_lock;
+	unsigned long		oo_npages;
 };
 
 static inline void osc_object_lock(struct osc_object *obj)
@@ -569,6 +581,11 @@ static inline struct osc_page *oap2osc_page(struct osc_async_page *oap)
 	return (struct osc_page *)container_of(oap, struct osc_page, ops_oap);
 }
 
+static inline pgoff_t osc_index(struct osc_page *opg)
+{
+	return opg->ops_cl.cpl_page->cp_index;
+}
+
 static inline struct osc_lock *cl2osc_lock(const struct cl_lock_slice *slice)
 {
 	LINVRNT(osc_is_object(&slice->cls_obj->co_lu));
@@ -691,6 +708,14 @@ int osc_extent_finish(const struct lu_env *env, struct osc_extent *ext,
 		      int sent, int rc);
 void osc_extent_release(const struct lu_env *env, struct osc_extent *ext);
 
+int osc_lock_discard_pages(const struct lu_env *env, struct osc_lock *lock);
+
+typedef int (*osc_page_gang_cbt)(const struct lu_env *, struct cl_io *,
+				 struct osc_page *, void *);
+int osc_page_gang_lookup(const struct lu_env *env, struct cl_io *io,
+			 struct osc_object *osc, pgoff_t start, pgoff_t end,
+			 osc_page_gang_cbt cb, void *cbdata);
+
 /** @} osc */
 
 #endif /* OSC_CL_INTERNAL_H */
diff --git a/drivers/staging/lustre/lustre/osc/osc_io.c b/drivers/staging/lustre/lustre/osc/osc_io.c
index a0fa533..1536d31 100644
--- a/drivers/staging/lustre/lustre/osc/osc_io.c
+++ b/drivers/staging/lustre/lustre/osc/osc_io.c
@@ -391,18 +391,13 @@ static int osc_async_upcall(void *a, int rc)
  * Checks that there are no pages being written in the extent being truncated.
  */
 static int trunc_check_cb(const struct lu_env *env, struct cl_io *io,
-			  struct cl_page *page, void *cbdata)
+			  struct osc_page *ops, void *cbdata)
 {
-	const struct cl_page_slice *slice;
-	struct osc_page *ops;
+	struct cl_page *page = ops->ops_cl.cpl_page;
 	struct osc_async_page *oap;
 	__u64 start = *(__u64 *)cbdata;
 
-	slice = cl_page_at(page, &osc_device_type);
-	LASSERT(slice);
-	ops = cl2osc_page(slice);
 	oap = &ops->ops_oap;
-
 	if (oap->oap_cmd & OBD_BRW_WRITE &&
 	    !list_empty(&oap->oap_pending_item))
 		CL_PAGE_DEBUG(D_ERROR, env, page, "exists %llu/%s.\n",
@@ -434,8 +429,9 @@ static void osc_trunc_check(const struct lu_env *env, struct cl_io *io,
 	/*
 	 * Complain if there are pages in the truncated region.
 	 */
-	cl_page_gang_lookup(env, clob, io, start + partial, CL_PAGE_EOF,
-			    trunc_check_cb, (void *)&size);
+	osc_page_gang_lookup(env, io, cl2osc(clob),
+			     start + partial, CL_PAGE_EOF,
+			     trunc_check_cb, (void *)&size);
 }
 
 static int osc_io_setattr_start(const struct lu_env *env,
diff --git a/drivers/staging/lustre/lustre/osc/osc_lock.c b/drivers/staging/lustre/lustre/osc/osc_lock.c
index 013df97..3a8a6d1 100644
--- a/drivers/staging/lustre/lustre/osc/osc_lock.c
+++ b/drivers/staging/lustre/lustre/osc/osc_lock.c
@@ -36,6 +36,7 @@
  * Implementation of cl_lock for OSC layer.
  *
  *   Author: Nikita Danilov <nikita.danilov@sun.com>
+ *   Author: Jinshan Xiong <jinshan.xiong@intel.com>
  */
 
 #define DEBUG_SUBSYSTEM S_OSC
@@ -897,11 +898,8 @@ static int osc_ldlm_glimpse_ast(struct ldlm_lock *dlmlock, void *data)
 static unsigned long osc_lock_weigh(const struct lu_env *env,
 				    const struct cl_lock_slice *slice)
 {
-	/*
-	 * don't need to grab coh_page_guard since we don't care the exact #
-	 * of pages..
-	 */
-	return cl_object_header(slice->cls_obj)->coh_pages;
+	/* TODO: check how many pages are covered by this lock */
+	return cl2osc(slice->cls_obj)->oo_npages;
 }
 
 static void osc_lock_build_einfo(const struct lu_env *env,
@@ -1276,7 +1274,7 @@ static int osc_lock_flush(struct osc_lock *ols, int discard)
 				result = 0;
 		}
 
-		rc = cl_lock_discard_pages(env, lock);
+		rc = osc_lock_discard_pages(env, ols);
 		if (result == 0 && rc < 0)
 			result = rc;
 
diff --git a/drivers/staging/lustre/lustre/osc/osc_object.c b/drivers/staging/lustre/lustre/osc/osc_object.c
index 9d474fc..2d2d39a 100644
--- a/drivers/staging/lustre/lustre/osc/osc_object.c
+++ b/drivers/staging/lustre/lustre/osc/osc_object.c
@@ -36,6 +36,7 @@
  * Implementation of cl_object for OSC layer.
  *
  *   Author: Nikita Danilov <nikita.danilov@sun.com>
+ *   Author: Jinshan Xiong <jinshan.xiong@intel.com>
  */
 
 #define DEBUG_SUBSYSTEM S_OSC
@@ -94,6 +95,7 @@ static int osc_object_init(const struct lu_env *env, struct lu_object *obj,
 	atomic_set(&osc->oo_nr_reads, 0);
 	atomic_set(&osc->oo_nr_writes, 0);
 	spin_lock_init(&osc->oo_lock);
+	spin_lock_init(&osc->oo_tree_lock);
 
 	cl_object_page_init(lu2cl(obj), sizeof(struct osc_page));
 
diff --git a/drivers/staging/lustre/lustre/osc/osc_page.c b/drivers/staging/lustre/lustre/osc/osc_page.c
index f0a9870..91ff607 100644
--- a/drivers/staging/lustre/lustre/osc/osc_page.c
+++ b/drivers/staging/lustre/lustre/osc/osc_page.c
@@ -36,6 +36,7 @@
  * Implementation of cl_page for OSC layer.
  *
  *   Author: Nikita Danilov <nikita.danilov@sun.com>
+ *   Author: Jinshan Xiong <jinshan.xiong@intel.com>
  */
 
 #define DEBUG_SUBSYSTEM S_OSC
@@ -326,6 +327,18 @@ static void osc_page_delete(const struct lu_env *env,
 	spin_unlock(&obj->oo_seatbelt);
 
 	osc_lru_del(osc_cli(obj), opg);
+
+	if (slice->cpl_page->cp_type == CPT_CACHEABLE) {
+		void *value;
+
+		spin_lock(&obj->oo_tree_lock);
+		value = radix_tree_delete(&obj->oo_tree, osc_index(opg));
+		if (value)
+			--obj->oo_npages;
+		spin_unlock(&obj->oo_tree_lock);
+
+		LASSERT(ergo(value, value == opg));
+	}
 }
 
 static void osc_page_clip(const struct lu_env *env,
@@ -422,8 +435,18 @@ int osc_page_init(const struct lu_env *env, struct cl_object *obj,
 	INIT_LIST_HEAD(&opg->ops_lru);
 
 	/* reserve an LRU space for this page */
-	if (page->cp_type == CPT_CACHEABLE && result == 0)
+	if (page->cp_type == CPT_CACHEABLE && result == 0) {
 		result = osc_lru_reserve(env, osc, opg);
+		if (result == 0) {
+			spin_lock(&osc->oo_tree_lock);
+			result = radix_tree_insert(&osc->oo_tree,
+						   page->cp_index, opg);
+			if (result == 0)
+				++osc->oo_npages;
+			spin_unlock(&osc->oo_tree_lock);
+			LASSERT(result == 0);
+		}
+	}
 
 	return result;
 }
@@ -611,7 +634,6 @@ static void discard_pagevec(const struct lu_env *env, struct cl_io *io,
 		struct cl_page *page = pvec[i];
 
 		LASSERT(cl_page_is_owned(page, io));
-		cl_page_unmap(env, io, page);
 		cl_page_discard(env, io, page);
 		cl_page_disown(env, io, page);
 		cl_page_put(env, page);
@@ -652,7 +674,7 @@ int osc_lru_shrink(const struct lu_env *env, struct client_obd *cli,
 		atomic_inc(&cli->cl_lru_shrinkers);
 	}
 
-	pvec = osc_env_info(env)->oti_pvec;
+	pvec = (struct cl_page **)osc_env_info(env)->oti_pvec;
 	io = &osc_env_info(env)->oti_io;
 
 	client_obd_list_lock(&cli->cl_lru_list_lock);
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v2 08/46] staging/lustre/obdclass: Add a preallocated percpu cl_env
  2016-03-30 23:48 [PATCH v2 00/46] Lustre IO stack simplifications and cleanups green
                   ` (6 preceding siblings ...)
  2016-03-30 23:48 ` [PATCH v2 07/46] staging/lustre/clio: collapse layer of cl_page green
@ 2016-03-30 23:48 ` green
  2016-03-30 23:48 ` [PATCH v2 09/46] staging/lustre/clio: add pages into writeback cache in batches green
                   ` (37 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: green @ 2016-03-30 23:48 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger
  Cc: Linux Kernel Mailing List, Lustre Development List,
	Jinshan Xiong, Prakash Surya, Oleg Drokin

From: Jinshan Xiong <jinshan.xiong@intel.com>

This change adds support for a single preallocated cl_env per CPU,
which can be used in contexts where rescheduling is not possible.
Currently this interface is only used by the ll_releasepage() function.

Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Signed-off-by: Prakash Surya <surya1@llnl.gov>
Reviewed-on: http://review.whamcloud.com/8174
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3321
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
---
 drivers/staging/lustre/lustre/include/cl_object.h  |  12 +++
 drivers/staging/lustre/lustre/llite/rw26.c         |  54 +++++++----
 drivers/staging/lustre/lustre/obdclass/cl_lock.c   |   1 -
 drivers/staging/lustre/lustre/obdclass/cl_object.c | 107 +++++++++++++++++++++
 drivers/staging/lustre/lustre/obdclass/cl_page.c   |   1 -
 5 files changed, 152 insertions(+), 23 deletions(-)
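
The calling convention for the new percpu env interface, as used by
ll_releasepage() below: bracket cl_env_percpu_get()/cl_env_percpu_put()
with cl_env_reenter()/cl_env_reexit() so that any nested env held by the
current thread is parked first. A minimal sketch (the surrounding work
is illustrative):

	void *cookie;
	struct lu_env *env;

	cookie = cl_env_reenter();	/* park any nested env */
	env = cl_env_percpu_get();	/* preallocated, cannot fail */
	LASSERT(!IS_ERR(env));

	/* ... work that must not reschedule, e.g. cl_page_delete() ... */

	cl_env_percpu_put(env);
	cl_env_reexit(cookie);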

diff --git a/drivers/staging/lustre/lustre/include/cl_object.h b/drivers/staging/lustre/lustre/include/cl_object.h
index 5daf688..e8455dc 100644
--- a/drivers/staging/lustre/lustre/include/cl_object.h
+++ b/drivers/staging/lustre/lustre/include/cl_object.h
@@ -2773,6 +2773,16 @@ static inline void *cl_object_page_slice(struct cl_object *clob,
 	return (void *)((char *)page + clob->co_slice_off);
 }
 
+/**
+ * Return refcount of cl_object.
+ */
+static inline int cl_object_refc(struct cl_object *clob)
+{
+	struct lu_object_header *header = clob->co_lu.lo_header;
+
+	return atomic_read(&header->loh_ref);
+}
+
 /** @} cl_object */
 
 /** \defgroup cl_page cl_page
@@ -3226,6 +3236,8 @@ void cl_env_reexit(void *cookie);
 void cl_env_implant(struct lu_env *env, int *refcheck);
 void cl_env_unplant(struct lu_env *env, int *refcheck);
 unsigned int cl_env_cache_purge(unsigned int nr);
+struct lu_env *cl_env_percpu_get(void);
+void cl_env_percpu_put(struct lu_env *env);
 
 /** @} cl_env */
 
diff --git a/drivers/staging/lustre/lustre/llite/rw26.c b/drivers/staging/lustre/lustre/llite/rw26.c
index b5335de..cc49c21 100644
--- a/drivers/staging/lustre/lustre/llite/rw26.c
+++ b/drivers/staging/lustre/lustre/llite/rw26.c
@@ -107,12 +107,12 @@ static void ll_invalidatepage(struct page *vmpage, unsigned int offset,
 
 static int ll_releasepage(struct page *vmpage, gfp_t gfp_mask)
 {
-	struct cl_env_nest nest;
 	struct lu_env     *env;
+	void			*cookie;
 	struct cl_object  *obj;
 	struct cl_page    *page;
 	struct address_space *mapping;
-	int result;
+	int result = 0;
 
 	LASSERT(PageLocked(vmpage));
 	if (PageWriteback(vmpage) || PageDirty(vmpage))
@@ -126,30 +126,42 @@ static int ll_releasepage(struct page *vmpage, gfp_t gfp_mask)
 	if (!obj)
 		return 1;
 
-	/* 1 for page allocator, 1 for cl_page and 1 for page cache */
+	/* 1 for caller, 1 for cl_page and 1 for page cache */
 	if (page_count(vmpage) > 3)
 		return 0;
 
-	/* TODO: determine what gfp should be used by @gfp_mask. */
-	env = cl_env_nested_get(&nest);
-	if (IS_ERR(env))
-		/* If we can't allocate an env we won't call cl_page_put()
-		 * later on which further means it's impossible to drop
-		 * page refcount by cl_page, so ask kernel to not free
-		 * this page.
-		 */
-		return 0;
-
 	page = cl_vmpage_page(vmpage, obj);
-	result = !page;
-	if (page) {
-		if (!cl_page_in_use(page)) {
-			result = 1;
-			cl_page_delete(env, page);
-		}
-		cl_page_put(env, page);
+	if (!page)
+		return 1;
+
+	cookie = cl_env_reenter();
+	env = cl_env_percpu_get();
+	LASSERT(!IS_ERR(env));
+
+	if (!cl_page_in_use(page)) {
+		result = 1;
+		cl_page_delete(env, page);
 	}
-	cl_env_nested_put(&nest, env);
+
+	/* To use the percpu env array, the call path must not reschedule;
+	 * otherwise the percpu array will be corrupted if ll_releasepage()
+	 * is called again on the same CPU.
+	 *
+	 * If this page holds the last refc of the cl_object, the following
+	 * call path may cause reschedule:
+	 *   cl_page_put -> cl_page_free -> cl_object_put ->
+	 *     lu_object_put -> lu_object_free -> lov_delete_raid0 ->
+	 *     cl_locks_prune.
+	 *
+	 * However, the kernel can't get rid of this inode until all pages
+	 * have been cleaned up. Since we hold the page lock here, it is
+	 * safe to assume we won't enter the object delete path.
+	 */
+	LASSERT(cl_object_refc(obj) > 1);
+	cl_page_put(env, page);
+
+	cl_env_percpu_put(env);
+	cl_env_reexit(cookie);
 	return result;
 }
 
diff --git a/drivers/staging/lustre/lustre/obdclass/cl_lock.c b/drivers/staging/lustre/lustre/obdclass/cl_lock.c
index 32ecc5a..fe8059a 100644
--- a/drivers/staging/lustre/lustre/obdclass/cl_lock.c
+++ b/drivers/staging/lustre/lustre/obdclass/cl_lock.c
@@ -255,7 +255,6 @@ static void cl_lock_free(const struct lu_env *env, struct cl_lock *lock)
 	LINVRNT(!cl_lock_is_mutexed(lock));
 
 	cl_lock_trace(D_DLMTRACE, env, "free lock", lock);
-	might_sleep();
 	while (!list_empty(&lock->cll_layers)) {
 		struct cl_lock_slice *slice;
 
diff --git a/drivers/staging/lustre/lustre/obdclass/cl_object.c b/drivers/staging/lustre/lustre/obdclass/cl_object.c
index 65b6402..fa9b083 100644
--- a/drivers/staging/lustre/lustre/obdclass/cl_object.c
+++ b/drivers/staging/lustre/lustre/obdclass/cl_object.c
@@ -390,6 +390,8 @@ static int cache_stats_print(const struct cache_stats *cs,
 	return 0;
 }
 
+static void cl_env_percpu_refill(void);
+
 /**
  * Initialize client site.
  *
@@ -409,6 +411,7 @@ int cl_site_init(struct cl_site *s, struct cl_device *d)
 			atomic_set(&s->cs_pages_state[0], 0);
 		for (i = 0; i < ARRAY_SIZE(s->cs_locks_state); ++i)
 			atomic_set(&s->cs_locks_state[i], 0);
+		cl_env_percpu_refill();
 	}
 	return result;
 }
@@ -1001,6 +1004,104 @@ void cl_lvb2attr(struct cl_attr *attr, const struct ost_lvb *lvb)
 }
 EXPORT_SYMBOL(cl_lvb2attr);
 
+static struct cl_env cl_env_percpu[NR_CPUS];
+
+static int cl_env_percpu_init(void)
+{
+	struct cl_env *cle;
+	int tags = LCT_REMEMBER | LCT_NOREF;
+	int i, j;
+	int rc = 0;
+
+	for_each_possible_cpu(i) {
+		struct lu_env *env;
+
+		cle = &cl_env_percpu[i];
+		env = &cle->ce_lu;
+
+		INIT_LIST_HEAD(&cle->ce_linkage);
+		cle->ce_magic = &cl_env_init0;
+		rc = lu_env_init(env, LCT_CL_THREAD | tags);
+		if (rc == 0) {
+			rc = lu_context_init(&cle->ce_ses, LCT_SESSION | tags);
+			if (rc == 0) {
+				lu_context_enter(&cle->ce_ses);
+				env->le_ses = &cle->ce_ses;
+			} else {
+				lu_env_fini(env);
+			}
+		}
+		if (rc != 0)
+			break;
+	}
+	if (rc != 0) {
+		/* Indices 0 to i (excluding i) were correctly initialized,
+		 * thus we must uninitialize up to i, the rest are undefined.
+		 */
+		for (j = 0; j < i; j++) {
+			cle = &cl_env_percpu[j];
+			lu_context_exit(&cle->ce_ses);
+			lu_context_fini(&cle->ce_ses);
+			lu_env_fini(&cle->ce_lu);
+		}
+	}
+
+	return rc;
+}
+
+static void cl_env_percpu_fini(void)
+{
+	int i;
+
+	for_each_possible_cpu(i) {
+		struct cl_env *cle = &cl_env_percpu[i];
+
+		lu_context_exit(&cle->ce_ses);
+		lu_context_fini(&cle->ce_ses);
+		lu_env_fini(&cle->ce_lu);
+	}
+}
+
+static void cl_env_percpu_refill(void)
+{
+	int i;
+
+	for_each_possible_cpu(i)
+		lu_env_refill(&cl_env_percpu[i].ce_lu);
+}
+
+void cl_env_percpu_put(struct lu_env *env)
+{
+	struct cl_env *cle;
+	int cpu;
+
+	cpu = smp_processor_id();
+	cle = cl_env_container(env);
+	LASSERT(cle == &cl_env_percpu[cpu]);
+
+	cle->ce_ref--;
+	LASSERT(cle->ce_ref == 0);
+
+	CL_ENV_DEC(busy);
+	cl_env_detach(cle);
+	cle->ce_debug = NULL;
+
+	put_cpu();
+}
+EXPORT_SYMBOL(cl_env_percpu_put);
+
+struct lu_env *cl_env_percpu_get(void)
+{
+	struct cl_env *cle;
+
+	cle = &cl_env_percpu[get_cpu()];
+	cl_env_init0(cle, __builtin_return_address(0));
+
+	cl_env_attach(cle);
+	return &cle->ce_lu;
+}
+EXPORT_SYMBOL(cl_env_percpu_get);
+
 /*****************************************************************************
  *
  * Temporary prototype thing: mirror obd-devices into cl devices.
@@ -1154,6 +1255,11 @@ int cl_global_init(void)
 	if (result)
 		goto out_lock;
 
+	result = cl_env_percpu_init();
+	if (result)
+		/* no cl_env_percpu_fini on error */
+		goto out_lock;
+
 	return 0;
 out_lock:
 	cl_lock_fini();
@@ -1171,6 +1277,7 @@ out_store:
  */
 void cl_global_fini(void)
 {
+	cl_env_percpu_fini();
 	cl_lock_fini();
 	cl_page_fini();
 	lu_context_key_degister(&cl_key);
diff --git a/drivers/staging/lustre/lustre/obdclass/cl_page.c b/drivers/staging/lustre/lustre/obdclass/cl_page.c
index 8169836..bab8a74 100644
--- a/drivers/staging/lustre/lustre/obdclass/cl_page.c
+++ b/drivers/staging/lustre/lustre/obdclass/cl_page.c
@@ -123,7 +123,6 @@ static void cl_page_free(const struct lu_env *env, struct cl_page *page)
 	PASSERT(env, page, !page->cp_parent);
 	PASSERT(env, page, page->cp_state == CPS_FREEING);
 
-	might_sleep();
 	while (!list_empty(&page->cp_layers)) {
 		struct cl_page_slice *slice;
 
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v2 09/46] staging/lustre/clio: add pages into writeback cache in batches
  2016-03-30 23:48 [PATCH v2 00/46] Lustre IO stack simplifications and cleanups green
                   ` (7 preceding siblings ...)
  2016-03-30 23:48 ` [PATCH v2 08/46] staging/lustre/obdclass: Add a preallocated percpu cl_env green
@ 2016-03-30 23:48 ` green
  2016-03-30 23:48 ` [PATCH v2 10/46] staging/lustre/osc: add weight function for DLM lock green
                   ` (36 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: green @ 2016-03-30 23:48 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger
  Cc: Linux Kernel Mailing List, Lustre Development List,
	Jinshan Xiong, Oleg Drokin

From: Jinshan Xiong <jinshan.xiong@intel.com>

In ll_write_end(), instead of adding the page into the writeback
cache directly, hold it in a page list. After enough pages have
been collected, issue them all at once with cio_commit_async().
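
For illustration only, the batching idea in plain C (struct page_batch
and its helpers are made-up names, not the actual clio interfaces):
collect pages in a list and submit them in one request once a full RPC
worth has accumulated, as ll_write_end() below does with cui_queue and
PTLRPC_MAX_BRW_PAGES:

#include <stddef.h>

#define MAX_BRW_PAGES 256	/* stand-in for PTLRPC_MAX_BRW_PAGES */

struct page_batch {
	void	*pages[MAX_BRW_PAGES];
	size_t	nr;
};

/* Hypothetical: issue pages[0..nr-1] as one async request. */
static int batch_commit(struct page_batch *b)
{
	/* ... submit the whole batch in a single RPC ... */
	b->nr = 0;
	return 0;
}

static int batch_add(struct page_batch *b, void *page)
{
	b->pages[b->nr++] = page;
	if (b->nr == MAX_BRW_PAGES)	/* one full RPC collected */
		return batch_commit(b);
	return 0;			/* keep collecting */
}

The real code must also commit early when the queue would become
non-contiguous, or when the file is open with O_SYNC, since the
queued pages describe a single contiguous byte range.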

Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-on: http://review.whamcloud.com/7893
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3321
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
---
 drivers/staging/lustre/lustre/include/cl_object.h  |  77 ++--
 drivers/staging/lustre/lustre/include/lclient.h    |   6 +
 drivers/staging/lustre/lustre/llite/file.c         |  11 +-
 .../staging/lustre/lustre/llite/llite_internal.h   |   8 +-
 drivers/staging/lustre/lustre/llite/rw.c           | 186 ++------
 drivers/staging/lustre/lustre/llite/rw26.c         | 210 +++++++--
 drivers/staging/lustre/lustre/llite/vvp_internal.h |  10 +-
 drivers/staging/lustre/lustre/llite/vvp_io.c       | 490 +++++++++++----------
 .../staging/lustre/lustre/lov/lov_cl_internal.h    |   2 +
 drivers/staging/lustre/lustre/lov/lov_io.c         | 219 ++++-----
 drivers/staging/lustre/lustre/lov/lov_page.c       |  28 --
 drivers/staging/lustre/lustre/obdclass/cl_io.c     | 108 ++---
 drivers/staging/lustre/lustre/obdclass/cl_page.c   |  38 --
 .../staging/lustre/lustre/obdecho/echo_client.c    |  25 +-
 drivers/staging/lustre/lustre/osc/osc_cache.c      |  14 +-
 .../staging/lustre/lustre/osc/osc_cl_internal.h    |   2 +
 drivers/staging/lustre/lustre/osc/osc_internal.h   |   6 +
 drivers/staging/lustre/lustre/osc/osc_io.c         | 160 ++++---
 drivers/staging/lustre/lustre/osc/osc_page.c       |  32 +-
 drivers/staging/lustre/lustre/osc/osc_request.c    |  56 ++-
 20 files changed, 792 insertions(+), 896 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/cl_object.h b/drivers/staging/lustre/lustre/include/cl_object.h
index e8455dc..c3865ec 100644
--- a/drivers/staging/lustre/lustre/include/cl_object.h
+++ b/drivers/staging/lustre/lustre/include/cl_object.h
@@ -1019,26 +1019,6 @@ struct cl_page_operations {
 		 */
 		int  (*cpo_make_ready)(const struct lu_env *env,
 				       const struct cl_page_slice *slice);
-		/**
-		 * Announce that this page is to be written out
-		 * opportunistically, that is, page is dirty, it is not
-		 * necessary to start write-out transfer right now, but
-		 * eventually page has to be written out.
-		 *
-		 * Main caller of this is the write path (see
-		 * vvp_io_commit_write()), using this method to build a
-		 * "transfer cache" from which large transfers are then
-		 * constructed by the req-formation engine.
-		 *
-		 * \todo XXX it would make sense to add page-age tracking
-		 * semantics here, and to oblige the req-formation engine to
-		 * send the page out not later than it is too old.
-		 *
-		 * \see cl_page_cache_add()
-		 */
-		int  (*cpo_cache_add)(const struct lu_env *env,
-				      const struct cl_page_slice *slice,
-				      struct cl_io *io);
 	} io[CRT_NR];
 	/**
 	 * Tell transfer engine that only [to, from] part of a page should be
@@ -2023,6 +2003,8 @@ struct cl_io_slice {
 	struct list_head		     cis_linkage;
 };
 
+typedef void (*cl_commit_cbt)(const struct lu_env *, struct cl_io *,
+			      struct cl_page *);
 /**
  * Per-layer io operations.
  * \see vvp_io_ops, lov_io_ops, lovsub_io_ops, osc_io_ops
@@ -2106,7 +2088,7 @@ struct cl_io_operations {
 		void (*cio_fini)(const struct lu_env *env,
 				 const struct cl_io_slice *slice);
 	} op[CIT_OP_NR];
-	struct {
+
 		/**
 		 * Submit pages from \a queue->c2_qin for IO, and move
 		 * successfully submitted pages into \a queue->c2_qout. Return
@@ -2119,7 +2101,15 @@ struct cl_io_operations {
 				   const struct cl_io_slice *slice,
 				   enum cl_req_type crt,
 				   struct cl_2queue *queue);
-	} req_op[CRT_NR];
+	/**
+	 * Queue pages for asynchronous write.
+	 * The difference between cio_submit and cio_commit_async is
+	 * that cio_submit is for urgent requests.
+	 */
+	int  (*cio_commit_async)(const struct lu_env *env,
+				 const struct cl_io_slice *slice,
+				 struct cl_page_list *queue, int from, int to,
+				 cl_commit_cbt cb);
 	/**
 	 * Read missing page.
 	 *
@@ -2132,31 +2122,6 @@ struct cl_io_operations {
 			     const struct cl_io_slice *slice,
 			     const struct cl_page_slice *page);
 	/**
-	 * Prepare write of a \a page. Called bottom-to-top by a top-level
-	 * cl_io_operations::op[CIT_WRITE]::cio_start() to prepare page for
-	 * get data from user-level buffer.
-	 *
-	 * \pre io->ci_type == CIT_WRITE
-	 *
-	 * \see vvp_io_prepare_write(), lov_io_prepare_write(),
-	 * osc_io_prepare_write().
-	 */
-	int (*cio_prepare_write)(const struct lu_env *env,
-				 const struct cl_io_slice *slice,
-				 const struct cl_page_slice *page,
-				 unsigned from, unsigned to);
-	/**
-	 *
-	 * \pre io->ci_type == CIT_WRITE
-	 *
-	 * \see vvp_io_commit_write(), lov_io_commit_write(),
-	 * osc_io_commit_write().
-	 */
-	int (*cio_commit_write)(const struct lu_env *env,
-				const struct cl_io_slice *slice,
-				const struct cl_page_slice *page,
-				unsigned from, unsigned to);
-	/**
 	 * Optional debugging helper. Print given io slice.
 	 */
 	int (*cio_print)(const struct lu_env *env, void *cookie,
@@ -3044,15 +3009,14 @@ int cl_io_lock_alloc_add(const struct lu_env *env, struct cl_io *io,
 			 struct cl_lock_descr *descr);
 int cl_io_read_page(const struct lu_env *env, struct cl_io *io,
 		    struct cl_page *page);
-int cl_io_prepare_write(const struct lu_env *env, struct cl_io *io,
-			struct cl_page *page, unsigned from, unsigned to);
-int cl_io_commit_write(const struct lu_env *env, struct cl_io *io,
-		       struct cl_page *page, unsigned from, unsigned to);
 int cl_io_submit_rw(const struct lu_env *env, struct cl_io *io,
 		    enum cl_req_type iot, struct cl_2queue *queue);
 int cl_io_submit_sync(const struct lu_env *env, struct cl_io *io,
 		      enum cl_req_type iot, struct cl_2queue *queue,
 		      long timeout);
+int cl_io_commit_async(const struct lu_env *env, struct cl_io *io,
+		       struct cl_page_list *queue, int from, int to,
+		       cl_commit_cbt cb);
 int cl_io_is_going(const struct lu_env *env);
 
 /**
@@ -3108,6 +3072,12 @@ static inline struct cl_page *cl_page_list_last(struct cl_page_list *plist)
 	return list_entry(plist->pl_pages.prev, struct cl_page, cp_batch);
 }
 
+static inline struct cl_page *cl_page_list_first(struct cl_page_list *plist)
+{
+	LASSERT(plist->pl_nr > 0);
+	return list_entry(plist->pl_pages.next, struct cl_page, cp_batch);
+}
+
 /**
  * Iterate over pages in a page list.
  */
@@ -3124,9 +3094,14 @@ void cl_page_list_init(struct cl_page_list *plist);
 void cl_page_list_add(struct cl_page_list *plist, struct cl_page *page);
 void cl_page_list_move(struct cl_page_list *dst, struct cl_page_list *src,
 		       struct cl_page *page);
+void cl_page_list_move_head(struct cl_page_list *dst, struct cl_page_list *src,
+			    struct cl_page *page);
 void cl_page_list_splice(struct cl_page_list *list, struct cl_page_list *head);
+void cl_page_list_del(const struct lu_env *env, struct cl_page_list *plist,
+		      struct cl_page *page);
 void cl_page_list_disown(const struct lu_env *env,
 			 struct cl_io *io, struct cl_page_list *plist);
+void cl_page_list_fini(const struct lu_env *env, struct cl_page_list *plist);
 
 void cl_2queue_init(struct cl_2queue *queue);
 void cl_2queue_disown(const struct lu_env *env,
diff --git a/drivers/staging/lustre/lustre/include/lclient.h b/drivers/staging/lustre/lustre/include/lclient.h
index 5d839a9..6c3a30a 100644
--- a/drivers/staging/lustre/lustre/include/lclient.h
+++ b/drivers/staging/lustre/lustre/include/lclient.h
@@ -91,6 +91,12 @@ struct ccc_io {
 		struct {
 			enum ccc_setattr_lock_type cui_local_lock;
 		} setattr;
+		struct {
+			struct cl_page_list cui_queue;
+			unsigned long cui_written;
+			int cui_from;
+			int cui_to;
+		} write;
 	} u;
 	/**
 	 * True iff io is processing glimpse right now.
diff --git a/drivers/staging/lustre/lustre/llite/file.c b/drivers/staging/lustre/lustre/llite/file.c
index cf619af..127fff6 100644
--- a/drivers/staging/lustre/lustre/llite/file.c
+++ b/drivers/staging/lustre/lustre/llite/file.c
@@ -1120,6 +1120,9 @@ ll_file_io_generic(const struct lu_env *env, struct vvp_io_args *args,
 	struct cl_io	 *io;
 	ssize_t	       result;
 
+	CDEBUG(D_VFSTRACE, "file: %s, type: %d, ppos: %llu, count: %zd\n",
+	       file->f_path.dentry->d_name.name, iot, *ppos, count);
+
 restart:
 	io = ccc_env_thread_io(env);
 	ll_io_init(io, file, iot == CIT_WRITE);
@@ -1144,9 +1147,8 @@ restart:
 					goto out;
 				}
 				write_mutex_locked = 1;
-			} else if (iot == CIT_READ) {
-				down_read(&lli->lli_trunc_sem);
 			}
+			down_read(&lli->lli_trunc_sem);
 			break;
 		case IO_SPLICE:
 			vio->u.splice.cui_pipe = args->u.splice.via_pipe;
@@ -1157,10 +1159,10 @@ restart:
 			LBUG();
 		}
 		result = cl_io_loop(env, io);
+		if (args->via_io_subtype == IO_NORMAL)
+			up_read(&lli->lli_trunc_sem);
 		if (write_mutex_locked)
 			mutex_unlock(&lli->lli_write_mutex);
-		else if (args->via_io_subtype == IO_NORMAL && iot == CIT_READ)
-			up_read(&lli->lli_trunc_sem);
 	} else {
 		/* cl_io_rw_init() handled IO */
 		result = io->ci_result;
@@ -1197,6 +1199,7 @@ out:
 			fd->fd_write_failed = true;
 		}
 	}
+	CDEBUG(D_VFSTRACE, "iot: %d, result: %zd\n", iot, result);
 
 	return result;
 }
diff --git a/drivers/staging/lustre/lustre/llite/llite_internal.h b/drivers/staging/lustre/lustre/llite/llite_internal.h
index 3e1572c..08fe0ea 100644
--- a/drivers/staging/lustre/lustre/llite/llite_internal.h
+++ b/drivers/staging/lustre/lustre/llite/llite_internal.h
@@ -697,8 +697,6 @@ int ll_md_blocking_ast(struct ldlm_lock *, struct ldlm_lock_desc *,
 struct dentry *ll_splice_alias(struct inode *inode, struct dentry *de);
 
 /* llite/rw.c */
-int ll_prepare_write(struct file *, struct page *, unsigned from, unsigned to);
-int ll_commit_write(struct file *, struct page *, unsigned from, unsigned to);
 int ll_writepage(struct page *page, struct writeback_control *wbc);
 int ll_writepages(struct address_space *, struct writeback_control *wbc);
 int ll_readpage(struct file *file, struct page *page);
@@ -706,6 +704,9 @@ void ll_readahead_init(struct inode *inode, struct ll_readahead_state *ras);
 int ll_readahead(const struct lu_env *env, struct cl_io *io,
 		 struct ll_readahead_state *ras, struct address_space *mapping,
 		 struct cl_page_list *queue, int flags);
+int vvp_io_write_commit(const struct lu_env *env, struct cl_io *io);
+struct ll_cl_context *ll_cl_init(struct file *file, struct page *vmpage);
+void ll_cl_fini(struct ll_cl_context *lcc);
 
 extern const struct address_space_operations ll_aops;
 
@@ -1476,4 +1477,7 @@ int ll_layout_restore(struct inode *inode);
 int ll_xattr_init(void);
 void ll_xattr_fini(void);
 
+int ll_page_sync_io(const struct lu_env *env, struct cl_io *io,
+		    struct cl_page *page, enum cl_req_type crt);
+
 #endif /* LLITE_INTERNAL_H */
diff --git a/drivers/staging/lustre/lustre/llite/rw.c b/drivers/staging/lustre/lustre/llite/rw.c
index 01b8365..dcccdec 100644
--- a/drivers/staging/lustre/lustre/llite/rw.c
+++ b/drivers/staging/lustre/lustre/llite/rw.c
@@ -63,7 +63,7 @@
  * Finalizes cl-data before exiting typical address_space operation. Dual to
  * ll_cl_init().
  */
-static void ll_cl_fini(struct ll_cl_context *lcc)
+void ll_cl_fini(struct ll_cl_context *lcc)
 {
 	struct lu_env  *env  = lcc->lcc_env;
 	struct cl_io   *io   = lcc->lcc_io;
@@ -84,8 +84,7 @@ static void ll_cl_fini(struct ll_cl_context *lcc)
  * Initializes common cl-data at the typical address_space operation entry
  * point.
  */
-static struct ll_cl_context *ll_cl_init(struct file *file,
-					struct page *vmpage, int create)
+struct ll_cl_context *ll_cl_init(struct file *file, struct page *vmpage)
 {
 	struct ll_cl_context *lcc;
 	struct lu_env    *env;
@@ -96,7 +95,7 @@ static struct ll_cl_context *ll_cl_init(struct file *file,
 	int refcheck;
 	int result = 0;
 
-	clob = ll_i2info(vmpage->mapping->host)->lli_clob;
+	clob = ll_i2info(file_inode(file))->lli_clob;
 	LASSERT(clob);
 
 	env = cl_env_get(&refcheck);
@@ -111,62 +110,18 @@ static struct ll_cl_context *ll_cl_init(struct file *file,
 
 	cio = ccc_env_io(env);
 	io = cio->cui_cl.cis_io;
-	if (!io && create) {
-		struct inode *inode = vmpage->mapping->host;
-		loff_t pos;
+	lcc->lcc_io = io;
+	if (!io) {
+		struct inode *inode = file_inode(file);
 
-		if (inode_trylock(inode)) {
-			inode_unlock((inode));
+		CERROR("%s: " DFID " no active IO, please file a ticket.\n",
+		       ll_get_fsname(inode->i_sb, NULL, 0),
+		       PFID(ll_inode2fid(inode)));
+		dump_stack();
 
-			/* this is too bad. Someone is trying to write the
-			 * page w/o holding inode mutex. This means we can
-			 * add dirty pages into cache during truncate
-			 */
-			CERROR("Proc %s is dirtying page w/o inode lock, this will break truncate\n",
-			       current->comm);
-			dump_stack();
-			LBUG();
-			return ERR_PTR(-EIO);
-		}
-
-		/*
-		 * Loop-back driver calls ->prepare_write().
-		 * methods directly, bypassing file system ->write() operation,
-		 * so cl_io has to be created here.
-		 */
-		io = ccc_env_thread_io(env);
-		ll_io_init(io, file, 1);
-
-		/* No lock at all for this kind of IO - we can't do it because
-		 * we have held page lock, it would cause deadlock.
-		 * XXX: This causes poor performance to loop device - One page
-		 *      per RPC.
-		 *      In order to get better performance, users should use
-		 *      lloop driver instead.
-		 */
-		io->ci_lockreq = CILR_NEVER;
-
-		pos = vmpage->index << PAGE_CACHE_SHIFT;
-
-		/* Create a temp IO to serve write. */
-		result = cl_io_rw_init(env, io, CIT_WRITE, pos, PAGE_CACHE_SIZE);
-		if (result == 0) {
-			cio->cui_fd = LUSTRE_FPRIVATE(file);
-			cio->cui_iter = NULL;
-			result = cl_io_iter_init(env, io);
-			if (result == 0) {
-				result = cl_io_lock(env, io);
-				if (result == 0)
-					result = cl_io_start(env, io);
-			}
-		} else
-			result = io->ci_result;
-	}
-
-	lcc->lcc_io = io;
-	if (!io)
 		result = -EIO;
-	if (result == 0) {
+	}
+	if (result == 0 && vmpage) {
 		struct cl_page   *page;
 
 		LASSERT(io->ci_state == CIS_IO_GOING);
@@ -185,99 +140,9 @@ static struct ll_cl_context *ll_cl_init(struct file *file,
 		lcc = ERR_PTR(result);
 	}
 
-	CDEBUG(D_VFSTRACE, "%lu@"DFID" -> %d %p %p\n",
-	       vmpage->index, PFID(lu_object_fid(&clob->co_lu)), result,
-	       env, io);
 	return lcc;
 }
 
-static struct ll_cl_context *ll_cl_get(void)
-{
-	struct ll_cl_context *lcc;
-	struct lu_env *env;
-	int refcheck;
-
-	env = cl_env_get(&refcheck);
-	LASSERT(!IS_ERR(env));
-	lcc = &vvp_env_info(env)->vti_io_ctx;
-	LASSERT(env == lcc->lcc_env);
-	LASSERT(current == lcc->lcc_cookie);
-	cl_env_put(env, &refcheck);
-
-	/* env has got in ll_cl_init, so it is still usable. */
-	return lcc;
-}
-
-/**
- * ->prepare_write() address space operation called by generic_file_write()
- * for every page during write.
- */
-int ll_prepare_write(struct file *file, struct page *vmpage, unsigned from,
-		     unsigned to)
-{
-	struct ll_cl_context *lcc;
-	int result;
-
-	lcc = ll_cl_init(file, vmpage, 1);
-	if (!IS_ERR(lcc)) {
-		struct lu_env  *env = lcc->lcc_env;
-		struct cl_io   *io  = lcc->lcc_io;
-		struct cl_page *page = lcc->lcc_page;
-
-		cl_page_assume(env, io, page);
-
-		result = cl_io_prepare_write(env, io, page, from, to);
-		if (result == 0) {
-			/*
-			 * Add a reference, so that page is not evicted from
-			 * the cache until ->commit_write() is called.
-			 */
-			cl_page_get(page);
-			lu_ref_add(&page->cp_reference, "prepare_write",
-				   current);
-		} else {
-			cl_page_unassume(env, io, page);
-			ll_cl_fini(lcc);
-		}
-		/* returning 0 in prepare assumes commit must be called
-		 * afterwards
-		 */
-	} else {
-		result = PTR_ERR(lcc);
-	}
-	return result;
-}
-
-int ll_commit_write(struct file *file, struct page *vmpage, unsigned from,
-		    unsigned to)
-{
-	struct ll_cl_context *lcc;
-	struct lu_env    *env;
-	struct cl_io     *io;
-	struct cl_page   *page;
-	int result = 0;
-
-	lcc  = ll_cl_get();
-	env  = lcc->lcc_env;
-	page = lcc->lcc_page;
-	io   = lcc->lcc_io;
-
-	LASSERT(cl_page_is_owned(page, io));
-	LASSERT(from <= to);
-	if (from != to) /* handle short write case. */
-		result = cl_io_commit_write(env, io, page, from, to);
-	if (cl_page_is_owned(page, io))
-		cl_page_unassume(env, io, page);
-
-	/*
-	 * Release reference acquired by ll_prepare_write().
-	 */
-	lu_ref_del(&page->cp_reference, "prepare_write", current);
-	cl_page_put(env, page);
-	ll_cl_fini(lcc);
-	return result;
-}
-
 static void ll_ra_stats_inc_sbi(struct ll_sb_info *sbi, enum ra_stat which);
 
 /**
@@ -1251,7 +1116,7 @@ int ll_readpage(struct file *file, struct page *vmpage)
 	struct ll_cl_context *lcc;
 	int result;
 
-	lcc = ll_cl_init(file, vmpage, 0);
+	lcc = ll_cl_init(file, vmpage);
 	if (!IS_ERR(lcc)) {
 		struct lu_env  *env  = lcc->lcc_env;
 		struct cl_io   *io   = lcc->lcc_io;
@@ -1273,3 +1138,28 @@ int ll_readpage(struct file *file, struct page *vmpage)
 	}
 	return result;
 }
+
+int ll_page_sync_io(const struct lu_env *env, struct cl_io *io,
+		    struct cl_page *page, enum cl_req_type crt)
+{
+	struct cl_2queue  *queue;
+	int result;
+
+	LASSERT(io->ci_type == CIT_READ || io->ci_type == CIT_WRITE);
+
+	queue = &io->ci_queue;
+	cl_2queue_init_page(queue, page);
+
+	result = cl_io_submit_sync(env, io, crt, queue, 0);
+	LASSERT(cl_page_is_owned(page, io));
+
+	if (crt == CRT_READ)
+		/*
+		 * in CRT_WRITE case page is left locked even in case of
+		 * error.
+		 */
+		cl_page_list_disown(env, io, &queue->c2_qin);
+	cl_2queue_fini(env, queue);
+
+	return result;
+}
diff --git a/drivers/staging/lustre/lustre/llite/rw26.c b/drivers/staging/lustre/lustre/llite/rw26.c
index cc49c21..e8d29e1 100644
--- a/drivers/staging/lustre/lustre/llite/rw26.c
+++ b/drivers/staging/lustre/lustre/llite/rw26.c
@@ -462,57 +462,211 @@ out:
 		inode_unlock(inode);
 
 	if (tot_bytes > 0) {
-		if (iov_iter_rw(iter) == WRITE) {
-			struct lov_stripe_md *lsm;
-
-			lsm = ccc_inode_lsm_get(inode);
-			LASSERT(lsm);
-			lov_stripe_lock(lsm);
-			obd_adjust_kms(ll_i2dtexp(inode), lsm, file_offset, 0);
-			lov_stripe_unlock(lsm);
-			ccc_inode_lsm_put(inode, lsm);
-		}
+		struct ccc_io *cio = ccc_env_io(env);
+
+		/* no commit async for direct IO */
+		cio->u.write.cui_written += tot_bytes;
 	}
 
 	cl_env_put(env, &refcheck);
 	return tot_bytes ? : result;
 }
 
+/**
+ * Prepare partially written-to page for a write.
+ */
+static int ll_prepare_partial_page(const struct lu_env *env, struct cl_io *io,
+				   struct cl_page *pg)
+{
+	struct cl_object *obj  = io->ci_obj;
+	struct cl_attr *attr   = ccc_env_thread_attr(env);
+	loff_t          offset = cl_offset(obj, pg->cp_index);
+	int             result;
+
+	cl_object_attr_lock(obj);
+	result = cl_object_attr_get(env, obj, attr);
+	cl_object_attr_unlock(obj);
+	if (result == 0) {
+		struct ccc_page *cp;
+
+		cp = cl2ccc_page(cl_page_at(pg, &vvp_device_type));
+
+		/*
+		 * If we are writing to a new page, no need to read old data.
+		 * The extent locking will have updated the KMS, and for our
+		 * purposes here we can treat it like i_size.
+		 */
+		if (attr->cat_kms <= offset) {
+			char *kaddr = kmap_atomic(cp->cpg_page);
+
+			memset(kaddr, 0, cl_page_size(obj));
+			kunmap_atomic(kaddr);
+		} else if (cp->cpg_defer_uptodate) {
+			cp->cpg_ra_used = 1;
+		} else {
+			result = ll_page_sync_io(env, io, pg, CRT_READ);
+		}
+	}
+	return result;
+}
+
 static int ll_write_begin(struct file *file, struct address_space *mapping,
 			  loff_t pos, unsigned len, unsigned flags,
 			  struct page **pagep, void **fsdata)
 {
+	struct ll_cl_context *lcc;
+	struct lu_env  *env;
+	struct cl_io   *io;
+	struct cl_page *page;
+	struct cl_object *clob = ll_i2info(mapping->host)->lli_clob;
 	pgoff_t index = pos >> PAGE_CACHE_SHIFT;
-	struct page *page;
-	int rc;
-	unsigned from = pos & (PAGE_CACHE_SIZE - 1);
+	struct page *vmpage = NULL;
+	unsigned int from = pos & (PAGE_CACHE_SIZE - 1);
+	unsigned int to = from + len;
+	int result = 0;
 
-	page = grab_cache_page_write_begin(mapping, index, flags);
-	if (!page)
-		return -ENOMEM;
+	CDEBUG(D_VFSTRACE, "Writing page %lu, from %u, len %u\n", index, from, len);
+
+	lcc = ll_cl_init(file, NULL);
+	if (IS_ERR(lcc)) {
+		result = PTR_ERR(lcc);
+		goto out;
+	}
+
+	env = lcc->lcc_env;
+	io  = lcc->lcc_io;
+
+	/* To avoid deadlock, try to lock page first. */
+	vmpage = grab_cache_page_nowait(mapping, index);
+	if (unlikely(!vmpage || PageDirty(vmpage))) {
+		struct ccc_io *cio = ccc_env_io(env);
+		struct cl_page_list *plist = &cio->u.write.cui_queue;
+
+	/* If the page is already in the dirty cache, we have to commit
+	 * the queued pages right now; otherwise we may deadlock by
+	 * holding the page lock of a dirty page while requesting more
+	 * grant. It's okay for the dirty page to be the first one in
+	 * the commit page list, though.
+	 */
+		if (vmpage && PageDirty(vmpage) && plist->pl_nr > 0) {
+			unlock_page(vmpage);
+			page_cache_release(vmpage);
+			vmpage = NULL;
+		}
 
-	*pagep = page;
+		/* commit pages and then wait for page lock */
+		result = vvp_io_write_commit(env, io);
+		if (result < 0)
+			goto out;
 
-	rc = ll_prepare_write(file, page, from, from + len);
-	if (rc) {
-		unlock_page(page);
-		page_cache_release(page);
+		if (!vmpage) {
+			vmpage = grab_cache_page_write_begin(mapping, index,
+							     flags);
+			if (!vmpage) {
+				result = -ENOMEM;
+				goto out;
+			}
+		}
 	}
-	return rc;
+
+	page = cl_page_find(env, clob, vmpage->index, vmpage, CPT_CACHEABLE);
+	if (IS_ERR(page)) {
+		result = PTR_ERR(page);
+		goto out;
+	}
+
+	lcc->lcc_page = page;
+	lu_ref_add(&page->cp_reference, "cl_io", io);
+
+	cl_page_assume(env, io, page);
+	if (!PageUptodate(vmpage)) {
+		/*
+		 * We're completely overwriting an existing page,
+		 * so _don't_ set it up to date until commit_write
+		 */
+		if (from == 0 && to == PAGE_SIZE) {
+			CL_PAGE_HEADER(D_PAGE, env, page, "full page write\n");
+			POISON_PAGE(vmpage, 0x11);
+		} else {
+			/* TODO: can be optimized at OSC layer to check if it
+			 * is a lockless IO. In that case, it's not necessary
+			 * to read the data.
+			 */
+			result = ll_prepare_partial_page(env, io, page);
+			if (result == 0)
+				SetPageUptodate(vmpage);
+		}
+	}
+	if (result < 0)
+		cl_page_unassume(env, io, page);
+out:
+	if (result < 0) {
+		if (vmpage) {
+			unlock_page(vmpage);
+			page_cache_release(vmpage);
+		}
+		if (!IS_ERR(lcc))
+			ll_cl_fini(lcc);
+	} else {
+		*pagep = vmpage;
+		*fsdata = lcc;
+	}
+	return result;
 }
 
 static int ll_write_end(struct file *file, struct address_space *mapping,
 			loff_t pos, unsigned len, unsigned copied,
-			struct page *page, void *fsdata)
+			struct page *vmpage, void *fsdata)
 {
+	struct ll_cl_context *lcc = fsdata;
+	struct lu_env *env;
+	struct cl_io *io;
+	struct ccc_io *cio;
+	struct cl_page *page;
 	unsigned from = pos & (PAGE_CACHE_SIZE - 1);
-	int rc;
+	bool unplug = false;
+	int result = 0;
+
+	page_cache_release(vmpage);
+
+	env  = lcc->lcc_env;
+	page = lcc->lcc_page;
+	io   = lcc->lcc_io;
+	cio  = ccc_env_io(env);
+
+	LASSERT(cl_page_is_owned(page, io));
+	if (copied > 0) {
+		struct cl_page_list *plist = &cio->u.write.cui_queue;
+
+		lcc->lcc_page = NULL; /* page will be queued */
+
+		/* Add it into write queue */
+		cl_page_list_add(plist, page);
+		if (plist->pl_nr == 1) /* first page */
+			cio->u.write.cui_from = from;
+		else
+			LASSERT(from == 0);
+		cio->u.write.cui_to = from + copied;
+
+		/* We may have a full RPC worth of pages, commit it soon */
+		if (plist->pl_nr >= PTLRPC_MAX_BRW_PAGES)
+			unplug = true;
+
+		CL_PAGE_DEBUG(D_VFSTRACE, env, page,
+			      "queued page: %d.\n", plist->pl_nr);
+	} else {
+		cl_page_disown(env, io, page);
+
+		/* page list is no longer contiguous, commit it now */
+		unplug = true;
+	}
 
-	rc = ll_commit_write(file, page, from, from + copied);
-	unlock_page(page);
-	page_cache_release(page);
+	if (unplug ||
+	    file->f_flags & O_SYNC || IS_SYNC(file_inode(file)))
+		result = vvp_io_write_commit(env, io);
 
-	return rc ?: copied;
+	ll_cl_fini(lcc);
+	return result >= 0 ? copied : result;
 }
 
 #ifdef CONFIG_MIGRATION
diff --git a/drivers/staging/lustre/lustre/llite/vvp_internal.h b/drivers/staging/lustre/lustre/llite/vvp_internal.h
index bb39337..9abde11 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_internal.h
+++ b/drivers/staging/lustre/lustre/llite/vvp_internal.h
@@ -44,17 +44,15 @@
 #include "../include/cl_object.h"
 #include "llite_internal.h"
 
-int vvp_io_init(const struct lu_env *env,
-		struct cl_object *obj, struct cl_io *io);
-int vvp_lock_init(const struct lu_env *env,
-		  struct cl_object *obj, struct cl_lock *lock,
-				   const struct cl_io *io);
+int vvp_io_init(const struct lu_env *env, struct cl_object *obj,
+		struct cl_io *io);
+int vvp_lock_init(const struct lu_env *env, struct cl_object *obj,
+		  struct cl_lock *lock, const struct cl_io *io);
 int vvp_page_init(const struct lu_env *env, struct cl_object *obj,
 		  struct cl_page *page, struct page *vmpage);
 struct lu_object *vvp_object_alloc(const struct lu_env *env,
 				   const struct lu_object_header *hdr,
 				   struct lu_device *dev);
-
 struct ccc_object *cl_inode2ccc(struct inode *inode);
 
 extern const struct file_operations vvp_dump_pgcache_file_ops;
diff --git a/drivers/staging/lustre/lustre/llite/vvp_io.c b/drivers/staging/lustre/lustre/llite/vvp_io.c
index ffe301b..f4a1384 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_io.c
+++ b/drivers/staging/lustre/lustre/llite/vvp_io.c
@@ -101,6 +101,27 @@ static bool can_populate_pages(const struct lu_env *env, struct cl_io *io,
  *
  */
 
+static int vvp_io_write_iter_init(const struct lu_env *env,
+				  const struct cl_io_slice *ios)
+{
+	struct ccc_io *cio = cl2ccc_io(env, ios);
+
+	cl_page_list_init(&cio->u.write.cui_queue);
+	cio->u.write.cui_written = 0;
+	cio->u.write.cui_from = 0;
+	cio->u.write.cui_to = PAGE_SIZE;
+
+	return 0;
+}
+
+static void vvp_io_write_iter_fini(const struct lu_env *env,
+				   const struct cl_io_slice *ios)
+{
+	struct ccc_io *cio = cl2ccc_io(env, ios);
+
+	LASSERT(cio->u.write.cui_queue.pl_nr == 0);
+}
+
 static int vvp_io_fault_iter_init(const struct lu_env *env,
 				  const struct cl_io_slice *ios)
 {
@@ -563,6 +584,183 @@ static void vvp_io_read_fini(const struct lu_env *env, const struct cl_io_slice
 	vvp_io_fini(env, ios);
 }
 
+static int vvp_io_commit_sync(const struct lu_env *env, struct cl_io *io,
+			      struct cl_page_list *plist, int from, int to)
+{
+	struct cl_2queue *queue = &io->ci_queue;
+	struct cl_page *page;
+	unsigned int bytes = 0;
+	int rc = 0;
+
+	if (plist->pl_nr == 0)
+		return 0;
+
+	if (from != 0) {
+		page = cl_page_list_first(plist);
+		cl_page_clip(env, page, from,
+			     plist->pl_nr == 1 ? to : PAGE_SIZE);
+	}
+	if (to != PAGE_SIZE && plist->pl_nr > 1) {
+		page = cl_page_list_last(plist);
+		cl_page_clip(env, page, 0, to);
+	}
+
+	cl_2queue_init(queue);
+	cl_page_list_splice(plist, &queue->c2_qin);
+	rc = cl_io_submit_sync(env, io, CRT_WRITE, queue, 0);
+
+	/* plist is not sorted any more */
+	cl_page_list_splice(&queue->c2_qin, plist);
+	cl_page_list_splice(&queue->c2_qout, plist);
+	cl_2queue_fini(env, queue);
+
+	if (rc == 0) {
+		/* calculate bytes */
+		bytes = plist->pl_nr << PAGE_SHIFT;
+		bytes -= from + PAGE_SIZE - to;
+
+		while (plist->pl_nr > 0) {
+			page = cl_page_list_first(plist);
+			cl_page_list_del(env, plist, page);
+
+			cl_page_clip(env, page, 0, PAGE_SIZE);
+
+			SetPageUptodate(cl_page_vmpage(env, page));
+			cl_page_disown(env, io, page);
+
+			/* held in ll_cl_init() */
+			lu_ref_del(&page->cp_reference, "cl_io", io);
+			cl_page_put(env, page);
+		}
+	}
+
+	return bytes > 0 ? bytes : rc;
+}
+
+static void write_commit_callback(const struct lu_env *env, struct cl_io *io,
+				  struct cl_page *page)
+{
+	const struct cl_page_slice *slice;
+	struct ccc_page *cp;
+	struct page *vmpage;
+
+	slice = cl_page_at(page, &vvp_device_type);
+	cp = cl2ccc_page(slice);
+	vmpage = cp->cpg_page;
+
+	SetPageUptodate(vmpage);
+	set_page_dirty(vmpage);
+	vvp_write_pending(cl2ccc(slice->cpl_obj), cp);
+
+	cl_page_disown(env, io, page);
+
+	/* held in ll_cl_init() */
+	lu_ref_del(&page->cp_reference, "cl_io", io);
+	cl_page_put(env, page);
+}
+
+/* make sure the page list is contiguous */
+static bool page_list_sanity_check(struct cl_page_list *plist)
+{
+	struct cl_page *page;
+	pgoff_t index = CL_PAGE_EOF;
+
+	cl_page_list_for_each(page, plist) {
+		if (index == CL_PAGE_EOF) {
+			index = page->cp_index;
+			continue;
+		}
+
+		++index;
+		if (index == page->cp_index)
+			continue;
+
+		return false;
+	}
+	return true;
+}
+
+/* Return how many bytes have been queued or written */
+int vvp_io_write_commit(const struct lu_env *env, struct cl_io *io)
+{
+	struct cl_object *obj = io->ci_obj;
+	struct inode *inode = ccc_object_inode(obj);
+	struct ccc_io *cio = ccc_env_io(env);
+	struct cl_page_list *queue = &cio->u.write.cui_queue;
+	struct cl_page *page;
+	int rc = 0;
+	int bytes = 0;
+	unsigned int npages = cio->u.write.cui_queue.pl_nr;
+
+	if (npages == 0)
+		return 0;
+
+	CDEBUG(D_VFSTRACE, "commit async pages: %d, from %d, to %d\n",
+	       npages, cio->u.write.cui_from, cio->u.write.cui_to);
+
+	LASSERT(page_list_sanity_check(queue));
+
+	/* submit IO with async write */
+	rc = cl_io_commit_async(env, io, queue,
+				cio->u.write.cui_from, cio->u.write.cui_to,
+				write_commit_callback);
+	npages -= queue->pl_nr; /* already committed pages */
+	if (npages > 0) {
+		/* calculate how many bytes were written */
+		bytes = npages << PAGE_SHIFT;
+
+		/* first page */
+		bytes -= cio->u.write.cui_from;
+		if (queue->pl_nr == 0) /* last page */
+			bytes -= PAGE_SIZE - cio->u.write.cui_to;
+		LASSERTF(bytes > 0, "bytes = %d, pages = %d\n", bytes, npages);
+
+		cio->u.write.cui_written += bytes;
+
+		CDEBUG(D_VFSTRACE, "Committed %d pages %d bytes, tot: %ld\n",
+		       npages, bytes, cio->u.write.cui_written);
+
+		/* the first page must have been written. */
+		cio->u.write.cui_from = 0;
+	}
+	LASSERT(page_list_sanity_check(queue));
+	LASSERT(ergo(rc == 0, queue->pl_nr == 0));
+
+	/* out of quota, try sync write */
+	if (rc == -EDQUOT && !cl_io_is_mkwrite(io)) {
+		rc = vvp_io_commit_sync(env, io, queue,
+					cio->u.write.cui_from,
+					cio->u.write.cui_to);
+		if (rc > 0) {
+			cio->u.write.cui_written += rc;
+			rc = 0;
+		}
+	}
+
+	/* update inode size */
+	ll_merge_lvb(env, inode);
+
+	/* The pages remaining in the queue failed to commit; discard
+	 * them unless they were dirtied before.
+	 */
+	while (queue->pl_nr > 0) {
+		page = cl_page_list_first(queue);
+		cl_page_list_del(env, queue, page);
+
+		if (!PageDirty(cl_page_vmpage(env, page)))
+			cl_page_discard(env, io, page);
+
+		cl_page_disown(env, io, page);
+
+		/* held in ll_cl_init() */
+		lu_ref_del(&page->cp_reference, "cl_io", io);
+		cl_page_put(env, page);
+	}
+	cl_page_list_fini(env, queue);
+
+	return rc;
+}
+
 static int vvp_io_write_start(const struct lu_env *env,
 			      const struct cl_io_slice *ios)
 {
@@ -596,9 +794,24 @@ static int vvp_io_write_start(const struct lu_env *env,
 		result = generic_file_write_iter(cio->cui_iocb, cio->cui_iter);
 
 	if (result > 0) {
+		result = vvp_io_write_commit(env, io);
+		if (cio->u.write.cui_written > 0) {
+			result = cio->u.write.cui_written;
+			io->ci_nob += result;
+
+			CDEBUG(D_VFSTRACE, "write: nob %zd, result: %zd\n",
+			       io->ci_nob, result);
+		}
+	}
+	if (result > 0) {
+		struct ll_inode_info *lli = ll_i2info(inode);
+
+		spin_lock(&lli->lli_lock);
+		lli->lli_flags |= LLIF_DATA_MODIFIED;
+		spin_unlock(&lli->lli_lock);
+
 		if (result < cnt)
 			io->ci_continue = 0;
-		io->ci_nob += result;
 		ll_rw_stats_tally(ll_i2sbi(inode), current->pid,
 				  cio->cui_fd, pos, result, WRITE);
 		result = 0;
@@ -645,6 +858,21 @@ static int vvp_io_kernel_fault(struct vvp_fault_io *cfio)
 	return -EINVAL;
 }
 
+static void mkwrite_commit_callback(const struct lu_env *env, struct cl_io *io,
+				    struct cl_page *page)
+{
+	const struct cl_page_slice *slice;
+	struct ccc_page *cp;
+	struct page *vmpage;
+
+	slice = cl_page_at(page, &vvp_device_type);
+	cp = cl2ccc_page(slice);
+	vmpage = cp->cpg_page;
+
+	set_page_dirty(vmpage);
+	vvp_write_pending(cl2ccc(slice->cpl_obj), cp);
+}
+
 static int vvp_io_fault_start(const struct lu_env *env,
 			      const struct cl_io_slice *ios)
 {
@@ -659,7 +887,7 @@ static int vvp_io_fault_start(const struct lu_env *env,
 	struct page	  *vmpage  = NULL;
 	struct cl_page      *page;
 	loff_t	       size;
-	pgoff_t	      last; /* last page in a file data region */
+	pgoff_t		     last_index;
 
 	if (fio->ft_executable &&
 	    inode->i_mtime.tv_sec != vio->u.fault.ft_mtime)
@@ -705,15 +933,15 @@ static int vvp_io_fault_start(const struct lu_env *env,
 		goto out;
 	}
 
+	last_index = cl_index(obj, size - 1);
+
 	if (fio->ft_mkwrite) {
-		pgoff_t last_index;
 		/*
 		 * Capture the size while holding the lli_trunc_sem from above
 		 * we want to make sure that we complete the mkwrite action
 		 * while holding this lock. We need to make sure that we are
 		 * not past the end of the file.
 		 */
-		last_index = cl_index(obj, size - 1);
 		if (last_index < fio->ft_index) {
 			CDEBUG(D_PAGE,
 			       "llite: mkwrite and truncate race happened: %p: 0x%lx 0x%lx\n",
@@ -745,21 +973,28 @@ static int vvp_io_fault_start(const struct lu_env *env,
 	 */
 	if (fio->ft_mkwrite) {
 		wait_on_page_writeback(vmpage);
-		if (set_page_dirty(vmpage)) {
-			struct ccc_page *cp;
+		if (!PageDirty(vmpage)) {
+			struct cl_page_list *plist = &io->ci_queue.c2_qin;
+			int to = PAGE_SIZE;
 
 			/* vvp_page_assume() calls wait_on_page_writeback(). */
 			cl_page_assume(env, io, page);
 
-			cp = cl2ccc_page(cl_page_at(page, &vvp_device_type));
-			vvp_write_pending(cl2ccc(obj), cp);
+			cl_page_list_init(plist);
+			cl_page_list_add(plist, page);
+
+			/* size fixup */
+			if (last_index == page->cp_index)
+				to = size & ~PAGE_MASK;
 
 			/* Do not set Dirty bit here so that in case IO is
 			 * started before the page is really made dirty, we
 			 * still have chance to detect it.
 			 */
-			result = cl_page_cache_add(env, io, page, CRT_WRITE);
+			result = cl_io_commit_async(env, io, plist, 0, to,
+						    mkwrite_commit_callback);
 			LASSERT(cl_page_is_owned(page, io));
+			cl_page_list_fini(env, plist);
 
 			vmpage = NULL;
 			if (result < 0) {
@@ -777,15 +1012,14 @@ static int vvp_io_fault_start(const struct lu_env *env,
 		}
 	}
 
-	last = cl_index(obj, size - 1);
 	/*
 	 * The ft_index is only used in the case of
 	 * a mkwrite action. We need to check
 	 * our assertions are correct, since
 	 * we should have caught this above
 	 */
-	LASSERT(!fio->ft_mkwrite || fio->ft_index <= last);
-	if (fio->ft_index == last)
+	LASSERT(!fio->ft_mkwrite || fio->ft_index <= last_index);
+	if (fio->ft_index == last_index)
 		/*
 		 * Last page is mapped partially.
 		 */
@@ -865,234 +1099,6 @@ static int vvp_io_read_page(const struct lu_env *env,
 	return 0;
 }
 
-static int vvp_page_sync_io(const struct lu_env *env, struct cl_io *io,
-			    struct cl_page *page, struct ccc_page *cp,
-			    enum cl_req_type crt)
-{
-	struct cl_2queue  *queue;
-	int result;
-
-	LASSERT(io->ci_type == CIT_READ || io->ci_type == CIT_WRITE);
-
-	queue = &io->ci_queue;
-	cl_2queue_init_page(queue, page);
-
-	result = cl_io_submit_sync(env, io, crt, queue, 0);
-	LASSERT(cl_page_is_owned(page, io));
-
-	if (crt == CRT_READ)
-		/*
-		 * in CRT_WRITE case page is left locked even in case of
-		 * error.
-		 */
-		cl_page_list_disown(env, io, &queue->c2_qin);
-	cl_2queue_fini(env, queue);
-
-	return result;
-}
-
-/**
- * Prepare partially written-to page for a write.
- */
-static int vvp_io_prepare_partial(const struct lu_env *env, struct cl_io *io,
-				  struct cl_object *obj, struct cl_page *pg,
-				  struct ccc_page *cp,
-				  unsigned from, unsigned to)
-{
-	struct cl_attr *attr   = ccc_env_thread_attr(env);
-	loff_t	  offset = cl_offset(obj, pg->cp_index);
-	int	     result;
-
-	cl_object_attr_lock(obj);
-	result = cl_object_attr_get(env, obj, attr);
-	cl_object_attr_unlock(obj);
-	if (result == 0) {
-		/*
-		 * If are writing to a new page, no need to read old data.
-		 * The extent locking will have updated the KMS, and for our
-		 * purposes here we can treat it like i_size.
-		 */
-		if (attr->cat_kms <= offset) {
-			char *kaddr = kmap_atomic(cp->cpg_page);
-
-			memset(kaddr, 0, cl_page_size(obj));
-			kunmap_atomic(kaddr);
-		} else if (cp->cpg_defer_uptodate)
-			cp->cpg_ra_used = 1;
-		else
-			result = vvp_page_sync_io(env, io, pg, cp, CRT_READ);
-		/*
-		 * In older implementations, obdo_refresh_inode is called here
-		 * to update the inode because the write might modify the
-		 * object info at OST. However, this has been proven useless,
-		 * since LVB functions will be called when user space program
-		 * tries to retrieve inode attribute.  Also, see bug 15909 for
-		 * details. -jay
-		 */
-		if (result == 0)
-			cl_page_export(env, pg, 1);
-	}
-	return result;
-}
-
-static int vvp_io_prepare_write(const struct lu_env *env,
-				const struct cl_io_slice *ios,
-				const struct cl_page_slice *slice,
-				unsigned from, unsigned to)
-{
-	struct cl_object *obj    = slice->cpl_obj;
-	struct ccc_page  *cp     = cl2ccc_page(slice);
-	struct cl_page   *pg     = slice->cpl_page;
-	struct page       *vmpage = cp->cpg_page;
-
-	int result;
-
-	LINVRNT(cl_page_is_vmlocked(env, pg));
-	LASSERT(vmpage->mapping->host == ccc_object_inode(obj));
-
-	result = 0;
-
-	CL_PAGE_HEADER(D_PAGE, env, pg, "preparing: [%d, %d]\n", from, to);
-	if (!PageUptodate(vmpage)) {
-		/*
-		 * We're completely overwriting an existing page, so _don't_
-		 * set it up to date until commit_write
-		 */
-		if (from == 0 && to == PAGE_CACHE_SIZE) {
-			CL_PAGE_HEADER(D_PAGE, env, pg, "full page write\n");
-			POISON_PAGE(page, 0x11);
-		} else
-			result = vvp_io_prepare_partial(env, ios->cis_io, obj,
-							pg, cp, from, to);
-	} else
-		CL_PAGE_HEADER(D_PAGE, env, pg, "uptodate\n");
-	return result;
-}
-
-static int vvp_io_commit_write(const struct lu_env *env,
-			       const struct cl_io_slice *ios,
-			       const struct cl_page_slice *slice,
-			       unsigned from, unsigned to)
-{
-	struct cl_object  *obj    = slice->cpl_obj;
-	struct cl_io      *io     = ios->cis_io;
-	struct ccc_page   *cp     = cl2ccc_page(slice);
-	struct cl_page    *pg     = slice->cpl_page;
-	struct inode      *inode  = ccc_object_inode(obj);
-	struct ll_sb_info *sbi    = ll_i2sbi(inode);
-	struct ll_inode_info *lli = ll_i2info(inode);
-	struct page	*vmpage = cp->cpg_page;
-
-	int    result;
-	int    tallyop;
-	loff_t size;
-
-	LINVRNT(cl_page_is_vmlocked(env, pg));
-	LASSERT(vmpage->mapping->host == inode);
-
-	LU_OBJECT_HEADER(D_INODE, env, &obj->co_lu, "committing page write\n");
-	CL_PAGE_HEADER(D_PAGE, env, pg, "committing: [%d, %d]\n", from, to);
-
-	/*
-	 * queue a write for some time in the future the first time we
-	 * dirty the page.
-	 *
-	 * This is different from what other file systems do: they usually
-	 * just mark page (and some of its buffers) dirty and rely on
-	 * balance_dirty_pages() to start a write-back. Lustre wants write-back
-	 * to be started earlier for the following reasons:
-	 *
-	 *     (1) with a large number of clients we need to limit the amount
-	 *     of cached data on the clients a lot;
-	 *
-	 *     (2) large compute jobs generally want compute-only then io-only
-	 *     and the IO should complete as quickly as possible;
-	 *
-	 *     (3) IO is batched up to the RPC size and is async until the
-	 *     client max cache is hit
-	 *     (/sys/fs/lustre/osc/OSC.../max_dirty_mb)
-	 *
-	 */
-	if (!PageDirty(vmpage)) {
-		tallyop = LPROC_LL_DIRTY_MISSES;
-		result = cl_page_cache_add(env, io, pg, CRT_WRITE);
-		if (result == 0) {
-			/* page was added into cache successfully. */
-			set_page_dirty(vmpage);
-			vvp_write_pending(cl2ccc(obj), cp);
-		} else if (result == -EDQUOT) {
-			pgoff_t last_index = i_size_read(inode) >> PAGE_CACHE_SHIFT;
-			bool need_clip = true;
-
-			/*
-			 * Client ran out of disk space grant. Possible
-			 * strategies are:
-			 *
-			 *     (a) do a sync write, renewing grant;
-			 *
-			 *     (b) stop writing on this stripe, switch to the
-			 *     next one.
-			 *
-			 * (b) is a part of "parallel io" design that is the
-			 * ultimate goal. (a) is what "old" client did, and
-			 * what the new code continues to do for the time
-			 * being.
-			 */
-			if (last_index > pg->cp_index) {
-				to = PAGE_CACHE_SIZE;
-				need_clip = false;
-			} else if (last_index == pg->cp_index) {
-				int size_to = i_size_read(inode) & ~PAGE_MASK;
-
-				if (to < size_to)
-					to = size_to;
-			}
-			if (need_clip)
-				cl_page_clip(env, pg, 0, to);
-			result = vvp_page_sync_io(env, io, pg, cp, CRT_WRITE);
-			if (result)
-				CERROR("Write page %lu of inode %p failed %d\n",
-				       pg->cp_index, inode, result);
-		}
-	} else {
-		tallyop = LPROC_LL_DIRTY_HITS;
-		result = 0;
-	}
-	ll_stats_ops_tally(sbi, tallyop, 1);
-
-	/* Inode should be marked DIRTY even if no new page was marked DIRTY
-	 * because page could have been not flushed between 2 modifications.
-	 * It is important the file is marked DIRTY as soon as the I/O is done
-	 * Indeed, when cache is flushed, file could be already closed and it
-	 * is too late to warn the MDT.
-	 * It is acceptable that file is marked DIRTY even if I/O is dropped
-	 * for some reasons before being flushed to OST.
-	 */
-	if (result == 0) {
-		spin_lock(&lli->lli_lock);
-		lli->lli_flags |= LLIF_DATA_MODIFIED;
-		spin_unlock(&lli->lli_lock);
-	}
-
-	size = cl_offset(obj, pg->cp_index) + to;
-
-	ll_inode_size_lock(inode);
-	if (result == 0) {
-		if (size > i_size_read(inode)) {
-			cl_isize_write_nolock(inode, size);
-			CDEBUG(D_VFSTRACE, DFID" updating i_size %lu\n",
-			       PFID(lu_object_fid(&obj->co_lu)),
-			       (unsigned long)size);
-		}
-		cl_page_export(env, pg, 1);
-	} else {
-		if (size > i_size_read(inode))
-			cl_page_discard(env, io, pg);
-	}
-	ll_inode_size_unlock(inode);
-	return result;
-}
-
 static const struct cl_io_operations vvp_io_ops = {
 	.op = {
 		[CIT_READ] = {
@@ -1103,6 +1109,8 @@ static const struct cl_io_operations vvp_io_ops = {
 		},
 		[CIT_WRITE] = {
 			.cio_fini      = vvp_io_fini,
+			.cio_iter_init = vvp_io_write_iter_init,
+			.cio_iter_fini = vvp_io_write_iter_fini,
 			.cio_lock      = vvp_io_write_lock,
 			.cio_start     = vvp_io_write_start,
 			.cio_advance   = ccc_io_advance
@@ -1130,8 +1138,6 @@ static const struct cl_io_operations vvp_io_ops = {
 		}
 	},
 	.cio_read_page     = vvp_io_read_page,
-	.cio_prepare_write = vvp_io_prepare_write,
-	.cio_commit_write  = vvp_io_commit_write
 };
 
 int vvp_io_init(const struct lu_env *env, struct cl_object *obj,
diff --git a/drivers/staging/lustre/lustre/lov/lov_cl_internal.h b/drivers/staging/lustre/lustre/lov/lov_cl_internal.h
index 7dd3162..3d568fc 100644
--- a/drivers/staging/lustre/lustre/lov/lov_cl_internal.h
+++ b/drivers/staging/lustre/lustre/lov/lov_cl_internal.h
@@ -444,8 +444,10 @@ struct lov_thread_info {
 	struct cl_lock_descr    lti_ldescr;
 	struct ost_lvb	  lti_lvb;
 	struct cl_2queue	lti_cl2q;
+	struct cl_page_list     lti_plist;
 	struct cl_lock_closure  lti_closure;
 	wait_queue_t	  lti_waiter;
+	struct cl_attr          lti_attr;
 };
 
 /**
diff --git a/drivers/staging/lustre/lustre/lov/lov_io.c b/drivers/staging/lustre/lustre/lov/lov_io.c
index 4296aac..c606490 100644
--- a/drivers/staging/lustre/lustre/lov/lov_io.c
+++ b/drivers/staging/lustre/lustre/lov/lov_io.c
@@ -543,13 +543,6 @@ static void lov_io_unlock(const struct lu_env *env,
 	LASSERT(rc == 0);
 }
 
-static struct cl_page_list *lov_io_submit_qin(struct lov_device *ld,
-					      struct cl_page_list *qin,
-					      int idx, int alloc)
-{
-	return alloc ? &qin[idx] : &ld->ld_emrg[idx]->emrg_page_list;
-}
-
 /**
  * lov implementation of cl_operations::cio_submit() method. It takes a list
  * of pages in \a queue, splits it into per-stripe sub-lists, invokes
@@ -569,25 +562,17 @@ static int lov_io_submit(const struct lu_env *env,
 			 const struct cl_io_slice *ios,
 			 enum cl_req_type crt, struct cl_2queue *queue)
 {
-	struct lov_io	  *lio = cl2lov_io(env, ios);
-	struct lov_object      *obj = lio->lis_object;
-	struct lov_device       *ld = lu2lov_dev(lov2cl(obj)->co_lu.lo_dev);
-	struct cl_page_list    *qin = &queue->c2_qin;
-	struct cl_2queue      *cl2q = &lov_env_info(env)->lti_cl2q;
-	struct cl_page_list *stripes_qin = NULL;
+	struct cl_page_list *qin = &queue->c2_qin;
+	struct lov_io *lio = cl2lov_io(env, ios);
+	struct lov_io_sub *sub;
+	struct cl_page_list *plist = &lov_env_info(env)->lti_plist;
 	struct cl_page *page;
-	struct cl_page *tmp;
 	int stripe;
 
-#define QIN(stripe) lov_io_submit_qin(ld, stripes_qin, stripe, alloc)
-
 	int rc = 0;
-	int alloc =
-		!(current->flags & PF_MEMALLOC);
 
 	if (lio->lis_active_subios == 1) {
 		int idx = lio->lis_single_subio_index;
-		struct lov_io_sub *sub;
 
 		LASSERT(idx < lio->lis_nr_subios);
 		sub = lov_sub_get(env, lio, idx);
@@ -600,119 +585,120 @@ static int lov_io_submit(const struct lu_env *env,
 	}
 
 	LASSERT(lio->lis_subs);
-	if (alloc) {
-		stripes_qin =
-			libcfs_kvzalloc(sizeof(*stripes_qin) *
-					lio->lis_nr_subios,
-					GFP_NOFS);
-		if (!stripes_qin)
-			return -ENOMEM;
-
-		for (stripe = 0; stripe < lio->lis_nr_subios; stripe++)
-			cl_page_list_init(&stripes_qin[stripe]);
-	} else {
-		/*
-		 * If we get here, it means pageout & swap doesn't help.
-		 * In order to not make things worse, even don't try to
-		 * allocate the memory with __GFP_NOWARN. -jay
-		 */
-		mutex_lock(&ld->ld_mutex);
-		lio->lis_mem_frozen = 1;
-	}
 
-	cl_2queue_init(cl2q);
-	cl_page_list_for_each_safe(page, tmp, qin) {
-		stripe = lov_page_stripe(page);
-		cl_page_list_move(QIN(stripe), qin, page);
-	}
+	cl_page_list_init(plist);
+	while (qin->pl_nr > 0) {
+		struct cl_2queue *cl2q = &lov_env_info(env)->lti_cl2q;
 
-	for (stripe = 0; stripe < lio->lis_nr_subios; stripe++) {
-		struct lov_io_sub   *sub;
-		struct cl_page_list *sub_qin = QIN(stripe);
+		cl_2queue_init(cl2q);
 
-		if (list_empty(&sub_qin->pl_pages))
-			continue;
+		page = cl_page_list_first(qin);
+		cl_page_list_move(&cl2q->c2_qin, qin, page);
+
+		stripe = lov_page_stripe(page);
+		while (qin->pl_nr > 0) {
+			page = cl_page_list_first(qin);
+			if (stripe != lov_page_stripe(page))
+				break;
+
+			cl_page_list_move(&cl2q->c2_qin, qin, page);
+		}
 
-		cl_page_list_splice(sub_qin, &cl2q->c2_qin);
 		sub = lov_sub_get(env, lio, stripe);
 		if (!IS_ERR(sub)) {
 			rc = cl_io_submit_rw(sub->sub_env, sub->sub_io,
 					     crt, cl2q);
 			lov_sub_put(sub);
-		} else
+		} else {
 			rc = PTR_ERR(sub);
-		cl_page_list_splice(&cl2q->c2_qin,  &queue->c2_qin);
+		}
+
+		cl_page_list_splice(&cl2q->c2_qin, plist);
 		cl_page_list_splice(&cl2q->c2_qout, &queue->c2_qout);
+		cl_2queue_fini(env, cl2q);
+
 		if (rc != 0)
 			break;
 	}
 
-	for (stripe = 0; stripe < lio->lis_nr_subios; stripe++) {
-		struct cl_page_list *sub_qin = QIN(stripe);
+	cl_page_list_splice(plist, qin);
+	cl_page_list_fini(env, plist);
 
-		if (list_empty(&sub_qin->pl_pages))
-			continue;
+	return rc;
+}
+
+static int lov_io_commit_async(const struct lu_env *env,
+			       const struct cl_io_slice *ios,
+			       struct cl_page_list *queue, int from, int to,
+			       cl_commit_cbt cb)
+{
+	struct cl_page_list *plist = &lov_env_info(env)->lti_plist;
+	struct lov_io *lio = cl2lov_io(env, ios);
+	struct lov_io_sub *sub;
+	struct cl_page *page;
+	int rc = 0;
 
-		cl_page_list_splice(sub_qin, qin);
+	if (lio->lis_active_subios == 1) {
+		int idx = lio->lis_single_subio_index;
+
+		LASSERT(idx < lio->lis_nr_subios);
+		sub = lov_sub_get(env, lio, idx);
+		LASSERT(!IS_ERR(sub));
+		LASSERT(sub->sub_io == &lio->lis_single_subio);
+		rc = cl_io_commit_async(sub->sub_env, sub->sub_io, queue,
+					from, to, cb);
+		lov_sub_put(sub);
+		return rc;
 	}
 
-	if (alloc) {
-		kvfree(stripes_qin);
-	} else {
-		int i;
+	LASSERT(lio->lis_subs);
 
-		for (i = 0; i < lio->lis_nr_subios; i++) {
-			struct cl_io *cio = lio->lis_subs[i].sub_io;
+	cl_page_list_init(plist);
+	while (queue->pl_nr > 0) {
+		int stripe_to = to;
+		int stripe;
 
-			if (cio && cio == &ld->ld_emrg[i]->emrg_subio)
-				lov_io_sub_fini(env, lio, &lio->lis_subs[i]);
+		LASSERT(plist->pl_nr == 0);
+		page = cl_page_list_first(queue);
+		cl_page_list_move(plist, queue, page);
+
+		stripe = lov_page_stripe(page);
+		while (queue->pl_nr > 0) {
+			page = cl_page_list_first(queue);
+			if (stripe != lov_page_stripe(page))
+				break;
+
+			cl_page_list_move(plist, queue, page);
 		}
-		lio->lis_mem_frozen = 0;
-		mutex_unlock(&ld->ld_mutex);
-	}
 
-	return rc;
-#undef QIN
-}
+		if (queue->pl_nr > 0) /* still has more pages */
+			stripe_to = PAGE_SIZE;
 
-static int lov_io_prepare_write(const struct lu_env *env,
-				const struct cl_io_slice *ios,
-				const struct cl_page_slice *slice,
-				unsigned from, unsigned to)
-{
-	struct lov_io     *lio      = cl2lov_io(env, ios);
-	struct cl_page    *sub_page = lov_sub_page(slice);
-	struct lov_io_sub *sub;
-	int result;
+		sub = lov_sub_get(env, lio, stripe);
+		if (!IS_ERR(sub)) {
+			rc = cl_io_commit_async(sub->sub_env, sub->sub_io,
+						plist, from, stripe_to, cb);
+			lov_sub_put(sub);
+		} else {
+			rc = PTR_ERR(sub);
+			break;
+		}
 
-	sub = lov_page_subio(env, lio, slice);
-	if (!IS_ERR(sub)) {
-		result = cl_io_prepare_write(sub->sub_env, sub->sub_io,
-					     sub_page, from, to);
-		lov_sub_put(sub);
-	} else
-		result = PTR_ERR(sub);
-	return result;
-}
+		if (plist->pl_nr > 0) /* short write */
+			break;
 
-static int lov_io_commit_write(const struct lu_env *env,
-			       const struct cl_io_slice *ios,
-			       const struct cl_page_slice *slice,
-			       unsigned from, unsigned to)
-{
-	struct lov_io     *lio      = cl2lov_io(env, ios);
-	struct cl_page    *sub_page = lov_sub_page(slice);
-	struct lov_io_sub *sub;
-	int result;
+		from = 0;
+	}
 
-	sub = lov_page_subio(env, lio, slice);
-	if (!IS_ERR(sub)) {
-		result = cl_io_commit_write(sub->sub_env, sub->sub_io,
-					    sub_page, from, to);
-		lov_sub_put(sub);
-	} else
-		result = PTR_ERR(sub);
-	return result;
+	/* for the error case, add the pages back into the qin list */
+	LASSERT(ergo(rc == 0, plist->pl_nr == 0));
+	while (plist->pl_nr > 0) {
+		/* error occurred, add the uncommitted pages back into the queue */
+		page = cl_page_list_last(plist);
+		cl_page_list_move_head(queue, plist, page);
+	}
+
+	return rc;
 }
 
 static int lov_io_fault_start(const struct lu_env *env,
@@ -803,16 +789,8 @@ static const struct cl_io_operations lov_io_ops = {
 			.cio_fini   = lov_io_fini
 		}
 	},
-	.req_op = {
-		 [CRT_READ] = {
-			 .cio_submit    = lov_io_submit
-		 },
-		 [CRT_WRITE] = {
-			 .cio_submit    = lov_io_submit
-		 }
-	 },
-	.cio_prepare_write = lov_io_prepare_write,
-	.cio_commit_write  = lov_io_commit_write
+	.cio_submit                    = lov_io_submit,
+	.cio_commit_async              = lov_io_commit_async,
 };
 
 /*****************************************************************************
@@ -880,15 +858,8 @@ static const struct cl_io_operations lov_empty_io_ops = {
 			.cio_fini   = lov_empty_io_fini
 		}
 	},
-	.req_op = {
-		 [CRT_READ] = {
-			 .cio_submit    = LOV_EMPTY_IMPOSSIBLE
-		 },
-		 [CRT_WRITE] = {
-			 .cio_submit    = LOV_EMPTY_IMPOSSIBLE
-		 }
-	 },
-	.cio_commit_write = LOV_EMPTY_IMPOSSIBLE
+	.cio_submit                    = LOV_EMPTY_IMPOSSIBLE,
+	.cio_commit_async              = LOV_EMPTY_IMPOSSIBLE
 };
 
 int lov_io_init_raid0(const struct lu_env *env, struct cl_object *obj,
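
The batching loop above is the heart of both lov_io_submit() and the new
lov_io_commit_async(): instead of preallocating one queue per stripe as
the old code did, it repeatedly peels the leading run of contiguous
same-stripe pages off the incoming queue and hands each run to the
matching sub-IO. A minimal user-space sketch of the peeling pattern;
struct page and every name here are illustrative stand-ins, not the
kernel types:

#include <stdio.h>
#include <stdlib.h>

struct page { int stripe; struct page *next; };

/* Detach the leading run of same-stripe pages from *queue and return it. */
static struct page *peel_stripe_run(struct page **queue, int *stripe_out)
{
	struct page *run = *queue, *tail = run;

	if (!run)
		return NULL;
	*stripe_out = run->stripe;
	while (tail->next && tail->next->stripe == run->stripe)
		tail = tail->next;
	*queue = tail->next;	/* pages left for later iterations */
	tail->next = NULL;	/* terminate the detached run */
	return run;
}

int main(void)
{
	int stripes[] = { 0, 0, 1, 1, 1, 0 };	/* page -> stripe mapping */
	struct page *queue = NULL, **nextp = &queue;
	unsigned int i;

	for (i = 0; i < sizeof(stripes) / sizeof(stripes[0]); i++) {
		struct page *p = calloc(1, sizeof(*p));

		p->stripe = stripes[i];
		*nextp = p;
		nextp = &p->next;
	}
	while (queue) {
		int stripe, n = 0;
		struct page *p, *run = peel_stripe_run(&queue, &stripe);

		for (p = run; p; p = p->next)
			n++;
		printf("submit %d page(s) to stripe %d\n", n, stripe);
		while (run) {	/* a real caller would splice back on error */
			p = run;
			run = run->next;
			free(p);
		}
	}
	return 0;
}

A stripe can come up more than once (stripe 0 above is submitted twice)
when its pages are interleaved with another stripe's, which is why the
kernel loop re-derives the stripe from the first remaining page on every
iteration.
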
diff --git a/drivers/staging/lustre/lustre/lov/lov_page.c b/drivers/staging/lustre/lustre/lov/lov_page.c
index 9728da2..5d9b355 100644
--- a/drivers/staging/lustre/lustre/lov/lov_page.c
+++ b/drivers/staging/lustre/lustre/lov/lov_page.c
@@ -105,29 +105,6 @@ static void lov_page_assume(const struct lu_env *env,
 	lov_page_own(env, slice, io, 0);
 }
 
-static int lov_page_cache_add(const struct lu_env *env,
-			      const struct cl_page_slice *slice,
-			      struct cl_io *io)
-{
-	struct lov_io     *lio = lov_env_io(env);
-	struct lov_io_sub *sub;
-	int rc = 0;
-
-	LINVRNT(lov_page_invariant(slice));
-	LINVRNT(!cl2lov_page(slice)->lps_invalid);
-
-	sub = lov_page_subio(env, lio, slice);
-	if (!IS_ERR(sub)) {
-		rc = cl_page_cache_add(sub->sub_env, sub->sub_io,
-				       slice->cpl_page->cp_child, CRT_WRITE);
-		lov_sub_put(sub);
-	} else {
-		rc = PTR_ERR(sub);
-		CL_PAGE_DEBUG(D_ERROR, env, slice->cpl_page, "rc = %d\n", rc);
-	}
-	return rc;
-}
-
 static int lov_page_print(const struct lu_env *env,
 			  const struct cl_page_slice *slice,
 			  void *cookie, lu_printer_t printer)
@@ -141,11 +118,6 @@ static const struct cl_page_operations lov_page_ops = {
 	.cpo_fini   = lov_page_fini,
 	.cpo_own    = lov_page_own,
 	.cpo_assume = lov_page_assume,
-	.io = {
-		[CRT_WRITE] = {
-			.cpo_cache_add = lov_page_cache_add
-		}
-	},
 	.cpo_print  = lov_page_print
 };
 
diff --git a/drivers/staging/lustre/lustre/obdclass/cl_io.c b/drivers/staging/lustre/lustre/obdclass/cl_io.c
index cf94284..9b3c5c1 100644
--- a/drivers/staging/lustre/lustre/obdclass/cl_io.c
+++ b/drivers/staging/lustre/lustre/obdclass/cl_io.c
@@ -782,77 +782,29 @@ int cl_io_read_page(const struct lu_env *env, struct cl_io *io,
 EXPORT_SYMBOL(cl_io_read_page);
 
 /**
- * Called by write io to prepare page to receive data from user buffer.
+ * Commit a list of contiguous pages into the writeback cache.
  *
- * \see cl_io_operations::cio_prepare_write()
+ * \returns 0 if all pages are committed, or an error code if an error occurred.
+ * \see cl_io_operations::cio_commit_async()
  */
-int cl_io_prepare_write(const struct lu_env *env, struct cl_io *io,
-			struct cl_page *page, unsigned from, unsigned to)
+int cl_io_commit_async(const struct lu_env *env, struct cl_io *io,
+		       struct cl_page_list *queue, int from, int to,
+		       cl_commit_cbt cb)
 {
 	const struct cl_io_slice *scan;
 	int result = 0;
 
-	LINVRNT(io->ci_type == CIT_WRITE);
-	LINVRNT(cl_page_is_owned(page, io));
-	LINVRNT(io->ci_state == CIS_IO_GOING || io->ci_state == CIS_LOCKED);
-	LINVRNT(cl_io_invariant(io));
-	LASSERT(cl_page_in_io(page, io));
-
-	cl_io_for_each_reverse(scan, io) {
-		if (scan->cis_iop->cio_prepare_write) {
-			const struct cl_page_slice *slice;
-
-			slice = cl_io_slice_page(scan, page);
-			result = scan->cis_iop->cio_prepare_write(env, scan,
-								  slice,
-								  from, to);
-			if (result != 0)
-				break;
-		}
-	}
-	return result;
-}
-EXPORT_SYMBOL(cl_io_prepare_write);
-
-/**
- * Called by write io after user data were copied into a page.
- *
- * \see cl_io_operations::cio_commit_write()
- */
-int cl_io_commit_write(const struct lu_env *env, struct cl_io *io,
-		       struct cl_page *page, unsigned from, unsigned to)
-{
-	const struct cl_io_slice *scan;
-	int result = 0;
-
-	LINVRNT(io->ci_type == CIT_WRITE);
-	LINVRNT(io->ci_state == CIS_IO_GOING || io->ci_state == CIS_LOCKED);
-	LINVRNT(cl_io_invariant(io));
-	/*
-	 * XXX Uh... not nice. Top level cl_io_commit_write() call (vvp->lov)
-	 * already called cl_page_cache_add(), moving page into CPS_CACHED
-	 * state. Better (and more general) way of dealing with such situation
-	 * is needed.
-	 */
-	LASSERT(cl_page_is_owned(page, io) || page->cp_parent);
-	LASSERT(cl_page_in_io(page, io));
-
 	cl_io_for_each(scan, io) {
-		if (scan->cis_iop->cio_commit_write) {
-			const struct cl_page_slice *slice;
-
-			slice = cl_io_slice_page(scan, page);
-			result = scan->cis_iop->cio_commit_write(env, scan,
-								 slice,
-								 from, to);
-			if (result != 0)
-				break;
-		}
+		if (!scan->cis_iop->cio_commit_async)
+			continue;
+		result = scan->cis_iop->cio_commit_async(env, scan, queue,
+							 from, to, cb);
+		if (result != 0)
+			break;
 	}
-	LINVRNT(result <= 0);
 	return result;
 }
-EXPORT_SYMBOL(cl_io_commit_write);
+EXPORT_SYMBOL(cl_io_commit_async);
 
 /**
  * Submits a list of pages for immediate io.
@@ -870,13 +822,10 @@ int cl_io_submit_rw(const struct lu_env *env, struct cl_io *io,
 	const struct cl_io_slice *scan;
 	int result = 0;
 
-	LINVRNT(crt < ARRAY_SIZE(scan->cis_iop->req_op));
-
 	cl_io_for_each(scan, io) {
-		if (!scan->cis_iop->req_op[crt].cio_submit)
+		if (!scan->cis_iop->cio_submit)
 			continue;
-		result = scan->cis_iop->req_op[crt].cio_submit(env, scan, crt,
-							       queue);
+		result = scan->cis_iop->cio_submit(env, scan, crt, queue);
 		if (result != 0)
 			break;
 	}
@@ -1073,8 +1022,8 @@ EXPORT_SYMBOL(cl_page_list_add);
 /**
  * Removes a page from a page list.
  */
-static void cl_page_list_del(const struct lu_env *env,
-			     struct cl_page_list *plist, struct cl_page *page)
+void cl_page_list_del(const struct lu_env *env, struct cl_page_list *plist,
+		      struct cl_page *page)
 {
 	LASSERT(plist->pl_nr > 0);
 	LINVRNT(plist->pl_owner == current);
@@ -1087,6 +1036,7 @@ static void cl_page_list_del(const struct lu_env *env,
 	lu_ref_del_at(&page->cp_reference, &page->cp_queue_ref, "queue", plist);
 	cl_page_put(env, page);
 }
+EXPORT_SYMBOL(cl_page_list_del);
 
 /**
  * Moves a page from one page list to another.
@@ -1107,6 +1057,24 @@ void cl_page_list_move(struct cl_page_list *dst, struct cl_page_list *src,
 EXPORT_SYMBOL(cl_page_list_move);
 
 /**
+ * Moves a page from one page list to the head of another list.
+ */
+void cl_page_list_move_head(struct cl_page_list *dst, struct cl_page_list *src,
+			    struct cl_page *page)
+{
+	LASSERT(src->pl_nr > 0);
+	LINVRNT(dst->pl_owner == current);
+	LINVRNT(src->pl_owner == current);
+
+	list_move(&page->cp_batch, &dst->pl_pages);
+	--src->pl_nr;
+	++dst->pl_nr;
+	lu_ref_set_at(&page->cp_reference, &page->cp_queue_ref, "queue",
+		      src, dst);
+}
+EXPORT_SYMBOL(cl_page_list_move_head);
+
+/**
  * splice the cl_page_list, just as list head does
  */
 void cl_page_list_splice(struct cl_page_list *list, struct cl_page_list *head)
@@ -1163,8 +1131,7 @@ EXPORT_SYMBOL(cl_page_list_disown);
 /**
  * Releases pages from queue.
  */
-static void cl_page_list_fini(const struct lu_env *env,
-			      struct cl_page_list *plist)
+void cl_page_list_fini(const struct lu_env *env, struct cl_page_list *plist)
 {
 	struct cl_page *page;
 	struct cl_page *temp;
@@ -1175,6 +1142,7 @@ static void cl_page_list_fini(const struct lu_env *env,
 		cl_page_list_del(env, plist, page);
 	LASSERT(plist->pl_nr == 0);
 }
+EXPORT_SYMBOL(cl_page_list_fini);
 
 /**
  * Assumes all pages in a queue.
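
cl_page_list_del(), cl_page_list_fini() and the new cl_page_list_move_head()
are made public above precisely for callers like lov_io_commit_async(): on
error it drains its private list tail-first into the head of the original
queue, which restores the uncommitted pages in their original order. A toy
model of that unwinding step, with plain arrays standing in for cl_page_list
(not the kernel API):

#include <stdio.h>

int main(void)
{
	int plist[4] = { 10, 11, 12, 13 };	/* uncommitted pages, in order */
	int queue[4];
	int qn = 0, pn = 4;
	int i;

	while (pn > 0) {
		int page = plist[--pn];		/* cl_page_list_last() */

		for (i = qn; i > 0; i--)	/* make room at the head */
			queue[i] = queue[i - 1];
		queue[0] = page;		/* cl_page_list_move_head() */
		qn++;
	}
	for (i = 0; i < qn; i++)
		printf("%d ", queue[i]);	/* prints: 10 11 12 13 */
	printf("\n");
	return 0;
}

Taking from the tail of one list while pushing onto the head of the other
is order-preserving, so the caller sees its queue exactly as it was before
the failed commit.
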
diff --git a/drivers/staging/lustre/lustre/obdclass/cl_page.c b/drivers/staging/lustre/lustre/obdclass/cl_page.c
index bab8a74..0844a97 100644
--- a/drivers/staging/lustre/lustre/obdclass/cl_page.c
+++ b/drivers/staging/lustre/lustre/obdclass/cl_page.c
@@ -1012,44 +1012,6 @@ int cl_page_make_ready(const struct lu_env *env, struct cl_page *pg,
 EXPORT_SYMBOL(cl_page_make_ready);
 
 /**
- * Notify layers that high level io decided to place this page into a cache
- * for future transfer.
- *
- * The layer implementing transfer engine (osc) has to register this page in
- * its queues.
- *
- * \pre  cl_page_is_owned(pg, io)
- * \post cl_page_is_owned(pg, io)
- *
- * \see cl_page_operations::cpo_cache_add()
- */
-int cl_page_cache_add(const struct lu_env *env, struct cl_io *io,
-		      struct cl_page *pg, enum cl_req_type crt)
-{
-	const struct cl_page_slice *scan;
-	int result = 0;
-
-	PINVRNT(env, pg, crt < CRT_NR);
-	PINVRNT(env, pg, cl_page_is_owned(pg, io));
-	PINVRNT(env, pg, cl_page_invariant(pg));
-
-	if (crt >= CRT_NR)
-		return -EINVAL;
-
-	list_for_each_entry(scan, &pg->cp_layers, cpl_linkage) {
-		if (!scan->cpl_ops->io[crt].cpo_cache_add)
-			continue;
-
-		result = scan->cpl_ops->io[crt].cpo_cache_add(env, scan, io);
-		if (result != 0)
-			break;
-	}
-	CL_PAGE_HEADER(D_TRACE, env, pg, "%d %d\n", crt, result);
-	return result;
-}
-EXPORT_SYMBOL(cl_page_cache_add);
-
-/**
  * Called if a page is being written back at the kernel's initiative.
  *
  * \pre  cl_page_is_owned(pg, io)
diff --git a/drivers/staging/lustre/lustre/obdecho/echo_client.c b/drivers/staging/lustre/lustre/obdecho/echo_client.c
index 33ce0c8..6c205f9 100644
--- a/drivers/staging/lustre/lustre/obdecho/echo_client.c
+++ b/drivers/staging/lustre/lustre/obdecho/echo_client.c
@@ -1085,22 +1085,17 @@ static int cl_echo_cancel0(struct lu_env *env, struct echo_device *ed,
 	return 0;
 }
 
-static int cl_echo_async_brw(const struct lu_env *env, struct cl_io *io,
-			     enum cl_req_type unused, struct cl_2queue *queue)
+static void echo_commit_callback(const struct lu_env *env, struct cl_io *io,
+				 struct cl_page *page)
 {
-	struct cl_page *clp;
-	struct cl_page *temp;
-	int result = 0;
+	struct echo_thread_info *info;
+	struct cl_2queue *queue;
 
-	cl_page_list_for_each_safe(clp, temp, &queue->c2_qin) {
-		int rc;
+	info = echo_env_info(env);
+	LASSERT(io == &info->eti_io);
 
-		rc = cl_page_cache_add(env, io, clp, CRT_WRITE);
-		if (rc == 0)
-			continue;
-		result = result ?: rc;
-	}
-	return result;
+	queue = &info->eti_queue;
+	cl_page_list_add(&queue->c2_qout, page);
 }
 
 static int cl_echo_object_brw(struct echo_object *eco, int rw, u64 offset,
@@ -1179,7 +1174,9 @@ static int cl_echo_object_brw(struct echo_object *eco, int rw, u64 offset,
 
 		async = async && (typ == CRT_WRITE);
 		if (async)
-			rc = cl_echo_async_brw(env, io, typ, queue);
+			rc = cl_io_commit_async(env, io, &queue->c2_qin,
+						0, PAGE_SIZE,
+						echo_commit_callback);
 		else
 			rc = cl_io_submit_sync(env, io, typ, queue, 0);
 		CDEBUG(D_INFO, "echo_client %s write returns %d\n",
diff --git a/drivers/staging/lustre/lustre/osc/osc_cache.c b/drivers/staging/lustre/lustre/osc/osc_cache.c
index c9d4e3c..3be4b1f 100644
--- a/drivers/staging/lustre/lustre/osc/osc_cache.c
+++ b/drivers/staging/lustre/lustre/osc/osc_cache.c
@@ -879,10 +879,9 @@ int osc_extent_finish(const struct lu_env *env, struct osc_extent *ext,
 		 * span a whole chunk on the OST side, or our accounting goes
 		 * wrong.  Should match the code in filter_grant_check.
 		 */
-		int offset = oap->oap_page_off & ~PAGE_MASK;
-		int count = oap->oap_count + (offset & (blocksize - 1));
-		int end = (offset + oap->oap_count) & (blocksize - 1);
-
+		int offset = last_off & ~PAGE_MASK;
+		int count = last_count + (offset & (blocksize - 1));
+		int end = (offset + last_count) & (blocksize - 1);
 		if (end)
 			count += blocksize - end;
 
@@ -3131,14 +3130,13 @@ static int discard_cb(const struct lu_env *env, struct cl_io *io,
 	struct cl_page *page = cl_page_top(ops->ops_cl.cpl_page);
 
 	LASSERT(lock->cll_descr.cld_mode >= CLM_WRITE);
-	KLASSERT(ergo(page->cp_type == CPT_CACHEABLE,
-		      !PageWriteback(cl_page_vmpage(env, page))));
-	KLASSERT(ergo(page->cp_type == CPT_CACHEABLE,
-		      !PageDirty(cl_page_vmpage(env, page))));
 
 	/* page is top page. */
 	info->oti_next_index = osc_index(ops) + 1;
 	if (cl_page_own(env, io, page) == 0) {
+		KLASSERT(ergo(page->cp_type == CPT_CACHEABLE,
+			      !PageDirty(cl_page_vmpage(env, page))));
+
 		/* discard the page */
 		cl_page_discard(env, io, page);
 		cl_page_disown(env, io, page);
diff --git a/drivers/staging/lustre/lustre/osc/osc_cl_internal.h b/drivers/staging/lustre/lustre/osc/osc_cl_internal.h
index e70f06c..0e06bed 100644
--- a/drivers/staging/lustre/lustre/osc/osc_cl_internal.h
+++ b/drivers/staging/lustre/lustre/osc/osc_cl_internal.h
@@ -453,6 +453,8 @@ int osc_prep_async_page(struct osc_object *osc, struct osc_page *ops,
 			struct page *page, loff_t offset);
 int osc_queue_async_io(const struct lu_env *env, struct cl_io *io,
 		       struct osc_page *ops);
+int osc_page_cache_add(const struct lu_env *env,
+		       const struct cl_page_slice *slice, struct cl_io *io);
 int osc_teardown_async_page(const struct lu_env *env, struct osc_object *obj,
 			    struct osc_page *ops);
 int osc_flush_async_page(const struct lu_env *env, struct cl_io *io,
diff --git a/drivers/staging/lustre/lustre/osc/osc_internal.h b/drivers/staging/lustre/lustre/osc/osc_internal.h
index b3b15d4..906225c 100644
--- a/drivers/staging/lustre/lustre/osc/osc_internal.h
+++ b/drivers/staging/lustre/lustre/osc/osc_internal.h
@@ -83,6 +83,12 @@ struct osc_async_page {
 #define oap_count       oap_brw_page.count
 #define oap_brw_flags   oap_brw_page.flag
 
+static inline struct osc_async_page *brw_page2oap(struct brw_page *pga)
+{
+	return (struct osc_async_page *)container_of(pga, struct osc_async_page,
+						     oap_brw_page);
+}
+
 struct osc_cache_waiter {
 	struct list_head	      ocw_entry;
 	wait_queue_head_t	     ocw_waitq;
diff --git a/drivers/staging/lustre/lustre/osc/osc_io.c b/drivers/staging/lustre/lustre/osc/osc_io.c
index 1536d31..e9e18a1 100644
--- a/drivers/staging/lustre/lustre/osc/osc_io.c
+++ b/drivers/staging/lustre/lustre/osc/osc_io.c
@@ -185,6 +185,13 @@ static int osc_io_submit(const struct lu_env *env,
 	return qout->pl_nr > 0 ? 0 : result;
 }
 
+/**
+ * This is called when a page is accessed within a file in a way that creates
+ * a new page, if one was missing (i.e., if there was a hole at that place in
+ * the file, or the accessed page is beyond the current file size).
+ *
+ * Expand stripe KMS if necessary.
+ */
 static void osc_page_touch_at(const struct lu_env *env,
 			      struct cl_object *obj, pgoff_t idx, unsigned to)
 {
@@ -208,7 +215,8 @@ static void osc_page_touch_at(const struct lu_env *env,
 	       kms > loi->loi_kms ? "" : "not ", loi->loi_kms, kms,
 	       loi->loi_lvb.lvb_size);
 
-	valid = 0;
+	attr->cat_mtime = attr->cat_ctime = LTIME_S(CURRENT_TIME);
+	valid = CAT_MTIME | CAT_CTIME;
 	if (kms > loi->loi_kms) {
 		attr->cat_kms = kms;
 		valid |= CAT_KMS;
@@ -221,91 +229,83 @@ static void osc_page_touch_at(const struct lu_env *env,
 	cl_object_attr_unlock(obj);
 }
 
-/**
- * This is called when a page is accessed within file in a way that creates
- * new page, if one were missing (i.e., if there were a hole at that place in
- * the file, or accessed page is beyond the current file size). Examples:
- * ->commit_write() and ->nopage() methods.
- *
- * Expand stripe KMS if necessary.
- */
-static void osc_page_touch(const struct lu_env *env,
-			   struct osc_page *opage, unsigned to)
-{
-	struct cl_page *page = opage->ops_cl.cpl_page;
-	struct cl_object *obj = opage->ops_cl.cpl_obj;
-
-	osc_page_touch_at(env, obj, page->cp_index, to);
-}
-
-/**
- * Implements cl_io_operations::cio_prepare_write() method for osc layer.
- *
- * \retval -EIO transfer initiated against this osc will most likely fail
- * \retval 0    transfer initiated against this osc will most likely succeed.
- *
- * The reason for this check is to immediately return an error to the caller
- * in the case of a deactivated import. Note, that import can be deactivated
- * later, while pages, dirtied by this IO, are still in the cache, but this is
- * irrelevant, because that would still return an error to the application (if
- * it does fsync), but many applications don't do fsync because of performance
- * issues, and we wanted to return an -EIO at write time to notify the
- * application.
- */
-static int osc_io_prepare_write(const struct lu_env *env,
-				const struct cl_io_slice *ios,
-				const struct cl_page_slice *slice,
-				unsigned from, unsigned to)
+static int osc_io_commit_async(const struct lu_env *env,
+			       const struct cl_io_slice *ios,
+			       struct cl_page_list *qin, int from, int to,
+			       cl_commit_cbt cb)
 {
-	struct osc_device *dev = lu2osc_dev(slice->cpl_obj->co_lu.lo_dev);
-	struct obd_import *imp = class_exp2cliimp(dev->od_exp);
+	struct cl_io *io = ios->cis_io;
 	struct osc_io *oio = cl2osc_io(env, ios);
+	struct osc_object *osc = cl2osc(ios->cis_obj);
+	struct cl_page *page;
+	struct cl_page *last_page;
+	struct osc_page *opg;
 	int result = 0;
 
+	LASSERT(qin->pl_nr > 0);
+
+	/* Handle partial page cases */
+	last_page = cl_page_list_last(qin);
+	if (oio->oi_lockless) {
+		page = cl_page_list_first(qin);
+		if (page == last_page) {
+			cl_page_clip(env, page, from, to);
+		} else {
+			if (from != 0)
+				cl_page_clip(env, page, from, PAGE_SIZE);
+			if (to != PAGE_SIZE)
+				cl_page_clip(env, last_page, 0, to);
+		}
+	}
+
 	/*
-	 * This implements OBD_BRW_CHECK logic from old client.
+	 * NOTE: @page here is a top-level page. This is done to avoid
+	 * creating a separate sub-page list.
 	 */
+	while (qin->pl_nr > 0) {
+		struct osc_async_page *oap;
 
-	if (!imp || imp->imp_invalid)
-		result = -EIO;
-	if (result == 0 && oio->oi_lockless)
-		/* this page contains `invalid' data, but who cares?
-		 * nobody can access the invalid data.
-		 * in osc_io_commit_write(), we're going to write exact
-		 * [from, to) bytes of this page to OST. -jay
-		 */
-		cl_page_export(env, slice->cpl_page, 1);
+		page = cl_page_list_first(qin);
+		opg = osc_cl_page_osc(page);
+		oap = &opg->ops_oap;
 
-	return result;
-}
+		if (!list_empty(&oap->oap_rpc_item)) {
+			CDEBUG(D_CACHE, "Busy oap %p page %p for submit.\n",
+			       oap, opg);
+			result = -EBUSY;
+			break;
+		}
 
-static int osc_io_commit_write(const struct lu_env *env,
-			       const struct cl_io_slice *ios,
-			       const struct cl_page_slice *slice,
-			       unsigned from, unsigned to)
-{
-	struct osc_io *oio = cl2osc_io(env, ios);
-	struct osc_page *opg = cl2osc_page(slice);
-	struct osc_object *obj = cl2osc(opg->ops_cl.cpl_obj);
-	struct osc_async_page *oap = &opg->ops_oap;
+		/* The page may be already in dirty cache. */
+		if (list_empty(&oap->oap_pending_item)) {
+			result = osc_page_cache_add(env, &opg->ops_cl, io);
+			if (result != 0)
+				break;
+		}
 
-	LASSERT(to > 0);
-	/*
-	 * XXX instead of calling osc_page_touch() here and in
-	 * osc_io_fault_start() it might be more logical to introduce
-	 * cl_page_touch() method, that generic cl_io_commit_write() and page
-	 * fault code calls.
-	 */
-	osc_page_touch(env, cl2osc_page(slice), to);
-	if (!client_is_remote(osc_export(obj)) &&
-	    capable(CFS_CAP_SYS_RESOURCE))
-		oap->oap_brw_flags |= OBD_BRW_NOQUOTA;
+		osc_page_touch_at(env, osc2cl(osc),
+				  opg->ops_cl.cpl_page->cp_index,
+				  page == last_page ? to : PAGE_SIZE);
 
-	if (oio->oi_lockless)
-		/* see osc_io_prepare_write() for lockless io handling. */
-		cl_page_clip(env, slice->cpl_page, from, to);
+		cl_page_list_del(env, qin, page);
 
-	return 0;
+		(*cb)(env, io, page);
+		/* Can't access page any more. Page can be in transfer and
+		 * complete at any time.
+		 */
+	}
+
+	/* for a sync write, the kernel will wait for this page to be
+	 * flushed before osc_io_end() is called, so release it earlier.
+	 * for mkwrite(), it's known there are no further pages.
+	 */
+	if (cl_io_is_sync_write(io) && oio->oi_active) {
+		osc_extent_release(env, oio->oi_active);
+		oio->oi_active = NULL;
+	}
+
+	CDEBUG(D_INFO, "%d %d\n", qin->pl_nr, result);
+	return result;
 }
 
 static int osc_io_rw_iter_init(const struct lu_env *env,
@@ -719,16 +719,8 @@ static const struct cl_io_operations osc_io_ops = {
 			.cio_fini   = osc_io_fini
 		}
 	},
-	.req_op = {
-		 [CRT_READ] = {
-			 .cio_submit    = osc_io_submit
-		 },
-		 [CRT_WRITE] = {
-			 .cio_submit    = osc_io_submit
-		 }
-	 },
-	.cio_prepare_write = osc_io_prepare_write,
-	.cio_commit_write  = osc_io_commit_write
+	.cio_submit                 = osc_io_submit,
+	.cio_commit_async           = osc_io_commit_async
 };
 
 /*****************************************************************************
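
The from/to convention of the new commit path is easy to misread: from is
a byte offset that applies only to the first page of the queue, to applies
only to the last page, and every interior page is covered in full. That is
why lov_io_commit_async() resets from to 0 after the first stripe batch and
passes PAGE_SIZE as stripe_to while pages remain, and why the lockless
branch of osc_io_commit_async() above clips only the first and last pages.
A worked example of the per-page byte ranges (illustrative values only):

#include <stdio.h>

#define PAGE_SIZE 4096

int main(void)
{
	int npages = 3, from = 100, to = 300;
	int i;

	for (i = 0; i < npages; i++) {
		int start = (i == 0) ? from : 0;
		int end = (i == npages - 1) ? to : PAGE_SIZE;

		printf("page %d: bytes [%d, %d)\n", i, start, end);
	}
	return 0;
}

This prints [100, 4096) for the first page, [0, 4096) for the middle page
and [0, 300) for the last one.
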
diff --git a/drivers/staging/lustre/lustre/osc/osc_page.c b/drivers/staging/lustre/lustre/osc/osc_page.c
index 91ff607..eff3b4b 100644
--- a/drivers/staging/lustre/lustre/osc/osc_page.c
+++ b/drivers/staging/lustre/lustre/osc/osc_page.c
@@ -89,8 +89,8 @@ static void osc_page_transfer_put(const struct lu_env *env,
 	struct cl_page *page = cl_page_top(opg->ops_cl.cpl_page);
 
 	if (opg->ops_transfer_pinned) {
-		lu_ref_del(&page->cp_reference, "transfer", page);
 		opg->ops_transfer_pinned = 0;
+		lu_ref_del(&page->cp_reference, "transfer", page);
 		cl_page_put(env, page);
 	}
 }
@@ -113,11 +113,9 @@ static void osc_page_transfer_add(const struct lu_env *env,
 	spin_unlock(&obj->oo_seatbelt);
 }
 
-static int osc_page_cache_add(const struct lu_env *env,
-			      const struct cl_page_slice *slice,
-			      struct cl_io *io)
+int osc_page_cache_add(const struct lu_env *env,
+		       const struct cl_page_slice *slice, struct cl_io *io)
 {
-	struct osc_io *oio = osc_env_io(env);
 	struct osc_page *opg = cl2osc_page(slice);
 	int result;
 
@@ -130,17 +128,6 @@ static int osc_page_cache_add(const struct lu_env *env,
 	else
 		osc_page_transfer_add(env, opg, CRT_WRITE);
 
-	/* for sync write, kernel will wait for this page to be flushed before
-	 * osc_io_end() is called, so release it earlier.
-	 * for mkwrite(), it's known there is no further pages.
-	 */
-	if (cl_io_is_sync_write(io) || cl_io_is_mkwrite(io)) {
-		if (oio->oi_active) {
-			osc_extent_release(env, oio->oi_active);
-			oio->oi_active = NULL;
-		}
-	}
-
 	return result;
 }
 
@@ -231,17 +218,6 @@ static void osc_page_completion_write(const struct lu_env *env,
 {
 }
 
-static int osc_page_fail(const struct lu_env *env,
-			 const struct cl_page_slice *slice,
-			 struct cl_io *unused)
-{
-	/*
-	 * Cached read?
-	 */
-	LBUG();
-	return 0;
-}
-
 static const char *osc_list(struct list_head *head)
 {
 	return list_empty(head) ? "-" : "+";
@@ -393,11 +369,9 @@ static const struct cl_page_operations osc_page_ops = {
 	.cpo_disown	= osc_page_disown,
 	.io = {
 		[CRT_READ] = {
-			.cpo_cache_add  = osc_page_fail,
 			.cpo_completion = osc_page_completion_read
 		},
 		[CRT_WRITE] = {
-			.cpo_cache_add  = osc_page_cache_add,
 			.cpo_completion = osc_page_completion_write
 		}
 	},
diff --git a/drivers/staging/lustre/lustre/osc/osc_request.c b/drivers/staging/lustre/lustre/osc/osc_request.c
index c055511b3..1ff497f 100644
--- a/drivers/staging/lustre/lustre/osc/osc_request.c
+++ b/drivers/staging/lustre/lustre/osc/osc_request.c
@@ -1734,7 +1734,6 @@ static int brw_interpret(const struct lu_env *env,
 	struct osc_brw_async_args *aa = data;
 	struct osc_extent *ext;
 	struct osc_extent *tmp;
-	struct cl_object *obj = NULL;
 	struct client_obd *cli = aa->aa_cli;
 
 	rc = osc_brw_fini_request(req, rc);
@@ -1763,24 +1762,17 @@ static int brw_interpret(const struct lu_env *env,
 			rc = -EIO;
 	}
 
-	list_for_each_entry_safe(ext, tmp, &aa->aa_exts, oe_link) {
-		if (!obj && rc == 0) {
-			obj = osc2cl(ext->oe_obj);
-			cl_object_get(obj);
-		}
-
-		list_del_init(&ext->oe_link);
-		osc_extent_finish(env, ext, 1, rc);
-	}
-	LASSERT(list_empty(&aa->aa_exts));
-	LASSERT(list_empty(&aa->aa_oaps));
-
-	if (obj) {
+	if (rc == 0) {
 		struct obdo *oa = aa->aa_oa;
 		struct cl_attr *attr  = &osc_env_info(env)->oti_attr;
 		unsigned long valid = 0;
+		struct cl_object *obj;
+		struct osc_async_page *last;
+
+		last = brw_page2oap(aa->aa_ppga[aa->aa_page_count - 1]);
+		obj = osc2cl(last->oap_obj);
 
-		LASSERT(rc == 0);
+		cl_object_attr_lock(obj);
 		if (oa->o_valid & OBD_MD_FLBLOCKS) {
 			attr->cat_blocks = oa->o_blocks;
 			valid |= CAT_BLOCKS;
@@ -1797,15 +1789,39 @@ static int brw_interpret(const struct lu_env *env,
 			attr->cat_ctime = oa->o_ctime;
 			valid |= CAT_CTIME;
 		}
-		if (valid != 0) {
-			cl_object_attr_lock(obj);
-			cl_object_attr_set(env, obj, attr, valid);
-			cl_object_attr_unlock(obj);
+
+		if (lustre_msg_get_opc(req->rq_reqmsg) == OST_WRITE) {
+			struct lov_oinfo *loi = cl2osc(obj)->oo_oinfo;
+			loff_t last_off = last->oap_count + last->oap_obj_off;
+
+			/* Change the file size if this is an out-of-quota or
+			 * direct-IO write and it extends the file size
+			 */
+			if (loi->loi_lvb.lvb_size < last_off) {
+				attr->cat_size = last_off;
+				valid |= CAT_SIZE;
+			}
+			/* Extend KMS if it's not a lockless write */
+			if (loi->loi_kms < last_off &&
+			    oap2osc_page(last)->ops_srvlock == 0) {
+				attr->cat_kms = last_off;
+				valid |= CAT_KMS;
+			}
 		}
-		cl_object_put(env, obj);
+
+		if (valid != 0)
+			cl_object_attr_set(env, obj, attr, valid);
+		cl_object_attr_unlock(obj);
 	}
 	kmem_cache_free(obdo_cachep, aa->aa_oa);
 
+	list_for_each_entry_safe(ext, tmp, &aa->aa_exts, oe_link) {
+		list_del_init(&ext->oe_link);
+		osc_extent_finish(env, ext, 1, rc);
+	}
+	LASSERT(list_empty(&aa->aa_exts));
+	LASSERT(list_empty(&aa->aa_oaps));
+
 	cl_req_completion(env, aa->aa_clerq, rc < 0 ? rc :
 			  req->rq_bulk->bd_nob_transferred);
 	osc_release_ppga(aa->aa_ppga, aa->aa_page_count);
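
The reordered brw_interpret() above derives the post-write attributes from
the last brw page of the RPC: the write end offset is last->oap_obj_off +
last->oap_count, and cat_size/cat_kms are extended only when the write
actually reaches past them (KMS additionally only when ops_srvlock is
clear). A toy model of that check, with made-up offsets:

#include <stdio.h>

int main(void)
{
	long long size = 8192, kms = 8192;	/* attributes before the write */
	long long oap_obj_off = 12288, oap_count = 1000;
	long long last_off = oap_obj_off + oap_count;	/* write end: 13288 */
	int srvlock = 0;			/* not a lockless write */

	if (size < last_off)
		size = last_off;	/* out-of-quota/direct IO grew the file */
	if (kms < last_off && srvlock == 0)
		kms = last_off;		/* extend the known minimum size too */

	printf("size=%lld kms=%lld\n", size, kms);	/* size=13288 kms=13288 */
	return 0;
}
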
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v2 10/46] staging/lustre/osc: add weight function for DLM lock
  2016-03-30 23:48 [PATCH v2 00/46] Lustre IO stack simplifications and cleanups green
                   ` (8 preceding siblings ...)
  2016-03-30 23:48 ` [PATCH v2 09/46] staging/lustre/clio: add pages into writeback cache in batches green
@ 2016-03-30 23:48 ` green
  2016-03-30 23:48 ` [PATCH v2 11/46] staging/lustre/clio: remove stackable cl_page completely green
                   ` (35 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: green @ 2016-03-30 23:48 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger
  Cc: Linux Kernel Mailing List, Lustre Development List,
	Jinshan Xiong, Oleg Drokin

From: Jinshan Xiong <jinshan.xiong@intel.com>

Use weigh_ast to decide if a lock covers any pages.
In recovery, weigh_ast will be used to decide whether a DLM read lock
covers any locked pages; if it does not, it will be canceled instead
of being recovered.

The problem with the original implementation is that it attached
each osc_page to an osc_lock, and also changed the lock state to add
every page for readahead.
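
A compact model of the recovery policy this enables: an unused PR/CR
extent lock is worth recovering only while its extent still covers
locked pages, which is exactly what the new weigh function counts (see
osc_cancel_for_recovery() in the diff below). The types and fields are
simplified stand-ins for the ldlm structures:

#include <stdio.h>

enum lock_mode { LCK_PR, LCK_CR, LCK_PW };

struct dlm_lock {
	enum lock_mode granted_mode;
	unsigned long covered_pages;	/* what osc_ldlm_weigh_ast() counts */
};

/* Return 1 if the lock may be canceled instead of being recovered. */
static int cancel_for_recovery(const struct dlm_lock *lock)
{
	return (lock->granted_mode == LCK_PR ||
		lock->granted_mode == LCK_CR) &&
	       lock->covered_pages == 0;
}

int main(void)
{
	struct dlm_lock idle = { LCK_PR, 0 };	/* no pages: safe to cancel */
	struct dlm_lock busy = { LCK_PR, 3 };	/* pages in use: recover it */

	printf("idle: %s\n", cancel_for_recovery(&idle) ? "cancel" : "recover");
	printf("busy: %s\n", cancel_for_recovery(&busy) ? "cancel" : "recover");
	return 0;
}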

Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-on: http://review.whamcloud.com/7894
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3321
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
---
 drivers/staging/lustre/lustre/ldlm/ldlm_request.c  |   9 +-
 .../staging/lustre/lustre/osc/osc_cl_internal.h    |  20 ----
 drivers/staging/lustre/lustre/osc/osc_internal.h   |   3 +-
 drivers/staging/lustre/lustre/osc/osc_lock.c       | 113 +++++++++++++++------
 drivers/staging/lustre/lustre/osc/osc_page.c       |  78 +-------------
 drivers/staging/lustre/lustre/osc/osc_request.c    |   5 +-
 6 files changed, 89 insertions(+), 139 deletions(-)

diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_request.c b/drivers/staging/lustre/lustre/ldlm/ldlm_request.c
index c7904a9..d5968e0 100644
--- a/drivers/staging/lustre/lustre/ldlm/ldlm_request.c
+++ b/drivers/staging/lustre/lustre/ldlm/ldlm_request.c
@@ -1139,10 +1139,10 @@ static ldlm_policy_res_t ldlm_cancel_no_wait_policy(struct ldlm_namespace *ns,
 	ldlm_policy_res_t result = LDLM_POLICY_CANCEL_LOCK;
 	ldlm_cancel_for_recovery cb = ns->ns_cancel_for_recovery;
 
-	lock_res_and_lock(lock);
-
 	/* don't check added & count since we want to process all locks
-	 * from unused list
+	 * from unused list.
+	 * It's fine not to take the lock to access lock->l_resource since
+	 * the lock has already been granted, so it won't change.
 	 */
 	switch (lock->l_resource->lr_type) {
 	case LDLM_EXTENT:
@@ -1151,11 +1151,12 @@ static ldlm_policy_res_t ldlm_cancel_no_wait_policy(struct ldlm_namespace *ns,
 			break;
 	default:
 		result = LDLM_POLICY_SKIP_LOCK;
+		lock_res_and_lock(lock);
 		lock->l_flags |= LDLM_FL_SKIPPED;
+		unlock_res_and_lock(lock);
 		break;
 	}
 
-	unlock_res_and_lock(lock);
 	return result;
 }
 
diff --git a/drivers/staging/lustre/lustre/osc/osc_cl_internal.h b/drivers/staging/lustre/lustre/osc/osc_cl_internal.h
index 0e06bed..cf87043 100644
--- a/drivers/staging/lustre/lustre/osc/osc_cl_internal.h
+++ b/drivers/staging/lustre/lustre/osc/osc_cl_internal.h
@@ -275,16 +275,6 @@ struct osc_lock {
 	enum osc_lock_state      ols_state;
 
 	/**
-	 * How many pages are using this lock for io, currently only used by
-	 * read-ahead. If non-zero, the underlying dlm lock won't be cancelled
-	 * during recovery to avoid deadlock. see bz16774.
-	 *
-	 * \see osc_page::ops_lock
-	 * \see osc_page_addref_lock(), osc_page_putref_lock()
-	 */
-	atomic_t	     ols_pageref;
-
-	/**
 	 * true, if ldlm_lock_addref() was called against
 	 * osc_lock::ols_lock. This is used for sanity checking.
 	 *
@@ -400,16 +390,6 @@ struct osc_page {
 	 * Submit time - the time when the page is starting RPC. For debugging.
 	 */
 	unsigned long	    ops_submit_time;
-
-	/**
-	 * A lock of which we hold a reference covers this page. Only used by
-	 * read-ahead: for a readahead page, we hold it's covering lock to
-	 * prevent it from being canceled during recovery.
-	 *
-	 * \see osc_lock::ols_pageref
-	 * \see osc_page_addref_lock(), osc_page_putref_lock().
-	 */
-	struct cl_lock       *ops_lock;
 };
 
 extern struct kmem_cache *osc_lock_kmem;
diff --git a/drivers/staging/lustre/lustre/osc/osc_internal.h b/drivers/staging/lustre/lustre/osc/osc_internal.h
index 906225c..b7fb01a 100644
--- a/drivers/staging/lustre/lustre/osc/osc_internal.h
+++ b/drivers/staging/lustre/lustre/osc/osc_internal.h
@@ -141,6 +141,7 @@ int osc_lru_shrink(const struct lu_env *env, struct client_obd *cli,
 int osc_lru_reclaim(struct client_obd *cli);
 
 extern spinlock_t osc_ast_guard;
+unsigned long osc_ldlm_weigh_ast(struct ldlm_lock *dlmlock);
 
 int osc_setup(struct obd_device *obd, struct lustre_cfg *lcfg);
 
@@ -181,8 +182,6 @@ static inline struct osc_device *obd2osc_dev(const struct obd_device *d)
 	return container_of0(d->obd_lu_dev, struct osc_device, od_cl.cd_lu_dev);
 }
 
-int osc_dlm_lock_pageref(struct ldlm_lock *dlm);
-
 extern struct kmem_cache *osc_quota_kmem;
 struct osc_quota_info {
 	/** linkage for quota hash table */
diff --git a/drivers/staging/lustre/lustre/osc/osc_lock.c b/drivers/staging/lustre/lustre/osc/osc_lock.c
index 3a8a6d1..978b6ea 100644
--- a/drivers/staging/lustre/lustre/osc/osc_lock.c
+++ b/drivers/staging/lustre/lustre/osc/osc_lock.c
@@ -51,8 +51,6 @@
  *  @{
  */
 
-#define _PAGEREF_MAGIC  (-10000000)
-
 /*****************************************************************************
  *
  * Type conversions.
@@ -248,8 +246,6 @@ static void osc_lock_fini(const struct lu_env *env,
 	 */
 	osc_lock_unhold(ols);
 	LASSERT(!ols->ols_lock);
-	LASSERT(atomic_read(&ols->ols_pageref) == 0 ||
-		atomic_read(&ols->ols_pageref) == _PAGEREF_MAGIC);
 
 	kmem_cache_free(osc_lock_kmem, ols);
 }
@@ -895,11 +891,88 @@ static int osc_ldlm_glimpse_ast(struct ldlm_lock *dlmlock, void *data)
 	return result;
 }
 
-static unsigned long osc_lock_weigh(const struct lu_env *env,
-				    const struct cl_lock_slice *slice)
+static int weigh_cb(const struct lu_env *env, struct cl_io *io,
+		    struct osc_page *ops, void *cbdata)
 {
-	/* TODO: check how many pages are covered by this lock */
-	return cl2osc(slice->cls_obj)->oo_npages;
+	struct cl_page *page = ops->ops_cl.cpl_page;
+
+	if (cl_page_is_vmlocked(env, page)) {
+		(*(unsigned long *)cbdata)++;
+		return CLP_GANG_ABORT;
+	}
+
+	return CLP_GANG_OKAY;
+}
+
+static unsigned long osc_lock_weight(const struct lu_env *env,
+				     const struct osc_lock *ols)
+{
+	struct cl_io *io = &osc_env_info(env)->oti_io;
+	struct cl_lock_descr *descr = &ols->ols_cl.cls_lock->cll_descr;
+	struct cl_object *obj = ols->ols_cl.cls_obj;
+	unsigned long npages = 0;
+	int result;
+
+	io->ci_obj = cl_object_top(obj);
+	io->ci_ignore_layout = 1;
+	result = cl_io_init(env, io, CIT_MISC, io->ci_obj);
+	if (result != 0)
+		return result;
+
+	do {
+		result = osc_page_gang_lookup(env, io, cl2osc(obj),
+					      descr->cld_start, descr->cld_end,
+					      weigh_cb, (void *)&npages);
+		if (result == CLP_GANG_ABORT)
+			break;
+		if (result == CLP_GANG_RESCHED)
+			cond_resched();
+	} while (result != CLP_GANG_OKAY);
+	cl_io_fini(env, io);
+
+	return npages;
+}
+
+/**
+ * Get the weight of dlm lock for early cancellation.
+ */
+unsigned long osc_ldlm_weigh_ast(struct ldlm_lock *dlmlock)
+{
+	struct cl_env_nest       nest;
+	struct lu_env           *env;
+	struct osc_lock         *lock;
+	unsigned long            weight;
+
+	might_sleep();
+	/*
+	 * osc_ldlm_weigh_ast has a complex context since it might be called
+	 * due to lock cancellation or from user input. We have to make
+	 * a new environment for it. It is probably safe in this implementation
+	 * to use the upper context, because cl_lock_put() doesn't modify
+	 * environment variables. But just in case ..
+	 */
+	env = cl_env_nested_get(&nest);
+	if (IS_ERR(env))
+		/* Mostly because of lack of memory, do not eliminate this lock */
+		return 1;
+
+	LASSERT(dlmlock->l_resource->lr_type == LDLM_EXTENT);
+	lock = osc_ast_data_get(dlmlock);
+	if (!lock) {
+		/* cl_lock was destroyed because of memory pressure.
+		 * It is much reasonable to assign this type of lock
+		 * a lower cost.
+		 */
+		weight = 0;
+		goto out;
+	}
+
+	weight = osc_lock_weight(env, lock);
+	osc_ast_data_put(env, lock);
+
+out:
+	cl_env_nested_put(&nest, env);
+	return weight;
 }
 
 static void osc_lock_build_einfo(const struct lu_env *env,
@@ -1468,7 +1541,6 @@ static const struct cl_lock_operations osc_lock_ops = {
 	.clo_delete  = osc_lock_delete,
 	.clo_state   = osc_lock_state,
 	.clo_cancel  = osc_lock_cancel,
-	.clo_weigh   = osc_lock_weigh,
 	.clo_print   = osc_lock_print,
 	.clo_fits_into = osc_lock_fits_into,
 };
@@ -1570,7 +1642,6 @@ int osc_lock_init(const struct lu_env *env,
 		__u32 enqflags = lock->cll_descr.cld_enq_flags;
 
 		osc_lock_build_einfo(env, lock, clk, &clk->ols_einfo);
-		atomic_set(&clk->ols_pageref, 0);
 		clk->ols_state = OLS_NEW;
 
 		clk->ols_flags = osc_enq2ldlm_flags(enqflags);
@@ -1597,26 +1668,4 @@ int osc_lock_init(const struct lu_env *env,
 	return result;
 }
 
-int osc_dlm_lock_pageref(struct ldlm_lock *dlm)
-{
-	struct osc_lock *olock;
-	int rc = 0;
-
-	spin_lock(&osc_ast_guard);
-	olock = dlm->l_ast_data;
-	/*
-	 * there's a very rare race with osc_page_addref_lock(), but that
-	 * doesn't matter because in the worst case we don't cancel a lock
-	 * which we actually can, that's no harm.
-	 */
-	if (olock &&
-	    atomic_add_return(_PAGEREF_MAGIC,
-			      &olock->ols_pageref) != _PAGEREF_MAGIC) {
-		atomic_sub(_PAGEREF_MAGIC, &olock->ols_pageref);
-		rc = 1;
-	}
-	spin_unlock(&osc_ast_guard);
-	return rc;
-}
-
 /** @} osc */
diff --git a/drivers/staging/lustre/lustre/osc/osc_page.c b/drivers/staging/lustre/lustre/osc/osc_page.c
index eff3b4b..8dc62fa 100644
--- a/drivers/staging/lustre/lustre/osc/osc_page.c
+++ b/drivers/staging/lustre/lustre/osc/osc_page.c
@@ -67,10 +67,6 @@ static int osc_page_protected(const struct lu_env *env,
 static void osc_page_fini(const struct lu_env *env,
 			  struct cl_page_slice *slice)
 {
-	struct osc_page *opg = cl2osc_page(slice);
-
-	CDEBUG(D_TRACE, "%p\n", opg);
-	LASSERT(!opg->ops_lock);
 }
 
 static void osc_page_transfer_get(struct osc_page *opg, const char *label)
@@ -139,42 +135,6 @@ void osc_index2policy(ldlm_policy_data_t *policy, const struct cl_object *obj,
 	policy->l_extent.end = cl_offset(obj, end + 1) - 1;
 }
 
-static int osc_page_addref_lock(const struct lu_env *env,
-				struct osc_page *opg,
-				struct cl_lock *lock)
-{
-	struct osc_lock *olock;
-	int rc;
-
-	LASSERT(!opg->ops_lock);
-
-	olock = osc_lock_at(lock);
-	if (atomic_inc_return(&olock->ols_pageref) <= 0) {
-		atomic_dec(&olock->ols_pageref);
-		rc = -ENODATA;
-	} else {
-		cl_lock_get(lock);
-		opg->ops_lock = lock;
-		rc = 0;
-	}
-	return rc;
-}
-
-static void osc_page_putref_lock(const struct lu_env *env,
-				 struct osc_page *opg)
-{
-	struct cl_lock *lock = opg->ops_lock;
-	struct osc_lock *olock;
-
-	LASSERT(lock);
-	olock = osc_lock_at(lock);
-
-	atomic_dec(&olock->ols_pageref);
-	opg->ops_lock = NULL;
-
-	cl_lock_put(env, lock);
-}
-
 static int osc_page_is_under_lock(const struct lu_env *env,
 				  const struct cl_page_slice *slice,
 				  struct cl_io *unused)
@@ -185,39 +145,12 @@ static int osc_page_is_under_lock(const struct lu_env *env,
 	lock = cl_lock_at_page(env, slice->cpl_obj, slice->cpl_page,
 			       NULL, 1, 0);
 	if (lock) {
-		if (osc_page_addref_lock(env, cl2osc_page(slice), lock) == 0)
-			result = -EBUSY;
 		cl_lock_put(env, lock);
+		result = -EBUSY;
 	}
 	return result;
 }
 
-static void osc_page_disown(const struct lu_env *env,
-			    const struct cl_page_slice *slice,
-			    struct cl_io *io)
-{
-	struct osc_page *opg = cl2osc_page(slice);
-
-	if (unlikely(opg->ops_lock))
-		osc_page_putref_lock(env, opg);
-}
-
-static void osc_page_completion_read(const struct lu_env *env,
-				     const struct cl_page_slice *slice,
-				     int ioret)
-{
-	struct osc_page *opg = cl2osc_page(slice);
-
-	if (likely(opg->ops_lock))
-		osc_page_putref_lock(env, opg);
-}
-
-static void osc_page_completion_write(const struct lu_env *env,
-				      const struct cl_page_slice *slice,
-				      int ioret)
-{
-}
-
 static const char *osc_list(struct list_head *head)
 {
 	return list_empty(head) ? "-" : "+";
@@ -366,15 +299,6 @@ static const struct cl_page_operations osc_page_ops = {
 	.cpo_print	 = osc_page_print,
 	.cpo_delete	= osc_page_delete,
 	.cpo_is_under_lock = osc_page_is_under_lock,
-	.cpo_disown	= osc_page_disown,
-	.io = {
-		[CRT_READ] = {
-			.cpo_completion = osc_page_completion_read
-		},
-		[CRT_WRITE] = {
-			.cpo_completion = osc_page_completion_write
-		}
-	},
 	.cpo_clip	   = osc_page_clip,
 	.cpo_cancel	 = osc_page_cancel,
 	.cpo_flush	  = osc_page_flush
diff --git a/drivers/staging/lustre/lustre/osc/osc_request.c b/drivers/staging/lustre/lustre/osc/osc_request.c
index 1ff497f..1e378e6 100644
--- a/drivers/staging/lustre/lustre/osc/osc_request.c
+++ b/drivers/staging/lustre/lustre/osc/osc_request.c
@@ -3131,8 +3131,6 @@ static int osc_import_event(struct obd_device *obd,
  */
 static int osc_cancel_for_recovery(struct ldlm_lock *lock)
 {
-	check_res_locked(lock->l_resource);
-
 	/*
 	 * Cancel all unused extent lock in granted mode LCK_PR or LCK_CR.
 	 *
@@ -3141,8 +3139,7 @@ static int osc_cancel_for_recovery(struct ldlm_lock *lock)
 	 */
 	if (lock->l_resource->lr_type == LDLM_EXTENT &&
 	    (lock->l_granted_mode == LCK_PR ||
-	     lock->l_granted_mode == LCK_CR) &&
-	    (osc_dlm_lock_pageref(lock) == 0))
+	     lock->l_granted_mode == LCK_CR) && osc_ldlm_weigh_ast(lock) == 0)
 		return 1;
 
 	return 0;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v2 11/46] staging/lustre/clio: remove stackable cl_page completely
  2016-03-30 23:48 [PATCH v2 00/46] Lustre IO stack simplifications and cleanups green
                   ` (9 preceding siblings ...)
  2016-03-30 23:48 ` [PATCH v2 10/46] staging/lustre/osc: add weight function for DLM lock green
@ 2016-03-30 23:48 ` green
  2016-03-30 23:48 ` [PATCH v2 12/46] staging/lustre/clio: optimize read ahead code green
                   ` (34 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: green @ 2016-03-30 23:48 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger
  Cc: Linux Kernel Mailing List, Lustre Development List,
	Jinshan Xiong, Oleg Drokin

From: Jinshan Xiong <jinshan.xiong@intel.com>

From now on, cl_page becomes a one-to-one mapping of the vmpage.
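
A sketch of the structural effect: the cp_parent/cp_child stack is gone,
a cl_page points straight at its vmpage, and the logical index moves from
cl_page::cp_index into each cl_page_slice. The types below are heavily
simplified stand-ins for the kernel definitions, only to show the new
shape:

#include <assert.h>
#include <stdio.h>

struct page { unsigned long pfn; };	/* stand-in for the VM page */

struct cl_page {
	struct page *cp_vmpage;		/* now one-to-one with the vmpage */
};

struct cl_page_slice {
	struct cl_page *cpl_page;
	unsigned long cpl_index;	/* the index now lives per slice */
};

static struct page *cl_page_vmpage(struct cl_page *pg)
{
	assert(pg->cp_vmpage);		/* LASSERT() in the kernel version */
	return pg->cp_vmpage;
}

int main(void)
{
	struct page vmpage = { 42 };
	struct cl_page pg = { &vmpage };
	struct cl_page_slice slice = { &pg, 7 };

	printf("pfn=%lu index=%lu\n",
	       cl_page_vmpage(slice.cpl_page)->pfn, slice.cpl_index);
	return 0;
}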

Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-on: http://review.whamcloud.com/7895
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3321
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
---
 drivers/staging/lustre/lustre/include/cl_object.h  |  43 ++---
 drivers/staging/lustre/lustre/include/lclient.h    |   7 +-
 drivers/staging/lustre/lustre/llite/lcommon_cl.c   |  12 +-
 .../staging/lustre/lustre/llite/llite_internal.h   |   4 +
 drivers/staging/lustre/lustre/llite/rw.c           |   7 +-
 drivers/staging/lustre/lustre/llite/rw26.c         |  35 +---
 drivers/staging/lustre/lustre/llite/vvp_internal.h |   2 +-
 drivers/staging/lustre/lustre/llite/vvp_io.c       |  45 +++--
 drivers/staging/lustre/lustre/llite/vvp_page.c     |  23 +--
 .../staging/lustre/lustre/lov/lov_cl_internal.h    |  14 +-
 drivers/staging/lustre/lustre/lov/lov_io.c         |   8 +-
 drivers/staging/lustre/lustre/lov/lov_object.c     |  32 +++-
 drivers/staging/lustre/lustre/lov/lov_page.c       | 104 +++-------
 drivers/staging/lustre/lustre/lov/lovsub_page.c    |   2 +-
 drivers/staging/lustre/lustre/obdclass/cl_io.c     |  63 +-----
 drivers/staging/lustre/lustre/obdclass/cl_object.c |   4 +-
 drivers/staging/lustre/lustre/obdclass/cl_page.c   | 213 +++++----------------
 .../staging/lustre/lustre/obdecho/echo_client.c    |  22 +--
 drivers/staging/lustre/lustre/osc/osc_cache.c      |  59 +++---
 .../staging/lustre/lustre/osc/osc_cl_internal.h    |  12 +-
 drivers/staging/lustre/lustre/osc/osc_io.c         |  41 ++--
 drivers/staging/lustre/lustre/osc/osc_page.c       |  33 ++--
 22 files changed, 257 insertions(+), 528 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/cl_object.h b/drivers/staging/lustre/lustre/include/cl_object.h
index c3865ec..5b65854 100644
--- a/drivers/staging/lustre/lustre/include/cl_object.h
+++ b/drivers/staging/lustre/lustre/include/cl_object.h
@@ -322,7 +322,7 @@ struct cl_object_operations {
 	 *	 to be used instead of newly created.
 	 */
 	int  (*coo_page_init)(const struct lu_env *env, struct cl_object *obj,
-			      struct cl_page *page, struct page *vmpage);
+				struct cl_page *page, pgoff_t index);
 	/**
 	 * Initialize lock slice for this layer. Called top-to-bottom through
 	 * every object layer when a new cl_lock is instantiated. Layer
@@ -460,10 +460,6 @@ struct cl_object_header {
 					co_lu.lo_linkage)
 /** @} cl_object */
 
-#ifndef pgoff_t
-#define pgoff_t unsigned long
-#endif
-
 #define CL_PAGE_EOF ((pgoff_t)~0ull)
 
 /** \addtogroup cl_page cl_page
@@ -727,16 +723,10 @@ struct cl_page {
 	atomic_t	     cp_ref;
 	/** An object this page is a part of. Immutable after creation. */
 	struct cl_object	*cp_obj;
-	/** Logical page index within the object. Immutable after creation. */
-	pgoff_t		  cp_index;
 	/** List of slices. Immutable after creation. */
 	struct list_head	       cp_layers;
-	/** Parent page, NULL for top-level page. Immutable after creation. */
-	struct cl_page	  *cp_parent;
-	/** Lower-layer page. NULL for bottommost page. Immutable after
-	 * creation.
-	 */
-	struct cl_page	  *cp_child;
+	/** vmpage */
+	struct page		*cp_vmpage;
 	/**
 	 * Page state. This field is const to avoid accidental update, it is
 	 * modified only internally within cl_page.c. Protected by a VM lock.
@@ -791,6 +781,7 @@ struct cl_page {
  */
 struct cl_page_slice {
 	struct cl_page		  *cpl_page;
+	pgoff_t				 cpl_index;
 	/**
 	 * Object slice corresponding to this page slice. Immutable after
 	 * creation.
@@ -846,11 +837,6 @@ struct cl_page_operations {
 	 */
 
 	/**
-	 * \return the underlying VM page. Optional.
-	 */
-	struct page *(*cpo_vmpage)(const struct lu_env *env,
-				   const struct cl_page_slice *slice);
-	/**
 	 * Called when \a io acquires this page into the exclusive
 	 * ownership. When this method returns, it is guaranteed that the is
 	 * not owned by other io, and no transfer is going on against
@@ -1102,6 +1088,12 @@ static inline int __page_in_use(const struct cl_page *page, int refc)
 #define cl_page_in_use(pg)       __page_in_use(pg, 1)
 #define cl_page_in_use_noref(pg) __page_in_use(pg, 0)
 
+static inline struct page *cl_page_vmpage(struct cl_page *page)
+{
+	LASSERT(page->cp_vmpage);
+	return page->cp_vmpage;
+}
+
 /** @} cl_page */
 
 /** \addtogroup cl_lock cl_lock
@@ -2729,7 +2721,7 @@ static inline int cl_object_same(struct cl_object *o0, struct cl_object *o1)
 static inline void cl_object_page_init(struct cl_object *clob, int size)
 {
 	clob->co_slice_off = cl_object_header(clob)->coh_page_bufsize;
-	cl_object_header(clob)->coh_page_bufsize += ALIGN(size, 8);
+	cl_object_header(clob)->coh_page_bufsize += cfs_size_round(size);
 }
 
 static inline void *cl_object_page_slice(struct cl_object *clob,
@@ -2774,9 +2766,7 @@ void cl_page_print(const struct lu_env *env, void *cookie, lu_printer_t printer,
 		   const struct cl_page *pg);
 void cl_page_header_print(const struct lu_env *env, void *cookie,
 			  lu_printer_t printer, const struct cl_page *pg);
-struct page *cl_page_vmpage(const struct lu_env *env, struct cl_page *page);
 struct cl_page *cl_vmpage_page(struct page *vmpage, struct cl_object *obj);
-struct cl_page *cl_page_top(struct cl_page *page);
 
 const struct cl_page_slice *cl_page_at(const struct cl_page *page,
 				       const struct lu_device_type *dtype);
@@ -2868,17 +2858,6 @@ struct cl_lock *cl_lock_at_pgoff(const struct lu_env *env,
 				 struct cl_object *obj, pgoff_t index,
 				 struct cl_lock *except, int pending,
 				 int canceld);
-static inline struct cl_lock *cl_lock_at_page(const struct lu_env *env,
-					      struct cl_object *obj,
-					      struct cl_page *page,
-					      struct cl_lock *except,
-					      int pending, int canceld)
-{
-	LASSERT(cl_object_header(obj) == cl_object_header(page->cp_obj));
-	return cl_lock_at_pgoff(env, obj, page->cp_index, except,
-				pending, canceld);
-}
-
 const struct cl_lock_slice *cl_lock_at(const struct cl_lock *lock,
 				       const struct lu_device_type *dtype);
 
diff --git a/drivers/staging/lustre/lustre/include/lclient.h b/drivers/staging/lustre/lustre/include/lclient.h
index 6c3a30a..c91fb01 100644
--- a/drivers/staging/lustre/lustre/include/lclient.h
+++ b/drivers/staging/lustre/lustre/include/lclient.h
@@ -238,6 +238,11 @@ static inline struct ccc_page *cl2ccc_page(const struct cl_page_slice *slice)
 	return container_of(slice, struct ccc_page, cpg_cl);
 }
 
+static inline pgoff_t ccc_index(struct ccc_page *ccc)
+{
+	return ccc->cpg_cl.cpl_index;
+}
+
 struct ccc_device {
 	struct cl_device    cdv_cl;
 	struct super_block *cdv_sb;
@@ -294,8 +299,6 @@ int ccc_lock_init(const struct lu_env *env, struct cl_object *obj,
 		  const struct cl_lock_operations *lkops);
 int ccc_object_glimpse(const struct lu_env *env,
 		       const struct cl_object *obj, struct ost_lvb *lvb);
-struct page *ccc_page_vmpage(const struct lu_env *env,
-			    const struct cl_page_slice *slice);
 int ccc_page_is_under_lock(const struct lu_env *env,
 			   const struct cl_page_slice *slice, struct cl_io *io);
 int ccc_fail(const struct lu_env *env, const struct cl_page_slice *slice);
diff --git a/drivers/staging/lustre/lustre/llite/lcommon_cl.c b/drivers/staging/lustre/lustre/llite/lcommon_cl.c
index 065b0f2..55fa0da 100644
--- a/drivers/staging/lustre/lustre/llite/lcommon_cl.c
+++ b/drivers/staging/lustre/lustre/llite/lcommon_cl.c
@@ -336,6 +336,8 @@ struct lu_object *ccc_object_alloc(const struct lu_env *env,
 		obj = ccc2lu(vob);
 		hdr = &vob->cob_header;
 		cl_object_header_init(hdr);
+		hdr->coh_page_bufsize = cfs_size_round(sizeof(struct cl_page));
+
 		lu_object_init(obj, &hdr->coh_lu, dev);
 		lu_object_add_top(&hdr->coh_lu, obj);
 
@@ -450,12 +452,6 @@ static void ccc_object_size_unlock(struct cl_object *obj)
  *
  */
 
-struct page *ccc_page_vmpage(const struct lu_env *env,
-			     const struct cl_page_slice *slice)
-{
-	return cl2vm_page(slice);
-}
-
 int ccc_page_is_under_lock(const struct lu_env *env,
 			   const struct cl_page_slice *slice,
 			   struct cl_io *io)
@@ -471,8 +467,8 @@ int ccc_page_is_under_lock(const struct lu_env *env,
 		if (cio->cui_fd->fd_flags & LL_FILE_GROUP_LOCKED) {
 			result = -EBUSY;
 		} else {
-			desc->cld_start = page->cp_index;
-			desc->cld_end   = page->cp_index;
+			desc->cld_start = ccc_index(cl2ccc_page(slice));
+			desc->cld_end   = ccc_index(cl2ccc_page(slice));
 			desc->cld_obj   = page->cp_obj;
 			desc->cld_mode  = CLM_READ;
 			result = cl_queue_match(&io->ci_lockset.cls_done,
diff --git a/drivers/staging/lustre/lustre/llite/llite_internal.h b/drivers/staging/lustre/lustre/llite/llite_internal.h
index 08fe0ea..bc83147 100644
--- a/drivers/staging/lustre/lustre/llite/llite_internal.h
+++ b/drivers/staging/lustre/lustre/llite/llite_internal.h
@@ -982,6 +982,10 @@ static inline void ll_invalidate_page(struct page *vmpage)
 	if (!mapping)
 		return;
 
+	/*
+	 * truncate_complete_page() calls
+	 * a_ops->invalidatepage()->cl_page_delete()->vvp_page_delete().
+	 */
 	ll_teardown_mmaps(mapping, offset, offset + PAGE_CACHE_SIZE);
 	truncate_complete_page(mapping, vmpage);
 }
diff --git a/drivers/staging/lustre/lustre/llite/rw.c b/drivers/staging/lustre/lustre/llite/rw.c
index dcccdec..b1375f1 100644
--- a/drivers/staging/lustre/lustre/llite/rw.c
+++ b/drivers/staging/lustre/lustre/llite/rw.c
@@ -290,15 +290,16 @@ void ll_ra_read_ex(struct file *f, struct ll_ra_read *rar)
 
 static int cl_read_ahead_page(const struct lu_env *env, struct cl_io *io,
 			      struct cl_page_list *queue, struct cl_page *page,
-			      struct page *vmpage)
+			      struct cl_object *clob)
 {
+	struct page *vmpage = page->cp_vmpage;
 	struct ccc_page *cp;
 	int	      rc;
 
 	rc = 0;
 	cl_page_assume(env, io, page);
 	lu_ref_add(&page->cp_reference, "ra", current);
-	cp = cl2ccc_page(cl_page_at(page, &vvp_device_type));
+	cp = cl2ccc_page(cl_object_page_slice(clob, page));
 	if (!cp->cpg_defer_uptodate && !PageUptodate(vmpage)) {
 		rc = cl_page_is_under_lock(env, io, page);
 		if (rc == -EBUSY) {
@@ -348,7 +349,7 @@ static int ll_read_ahead_page(const struct lu_env *env, struct cl_io *io,
 					    vmpage, CPT_CACHEABLE);
 			if (!IS_ERR(page)) {
 				rc = cl_read_ahead_page(env, io, queue,
-							page, vmpage);
+							page, clob);
 				if (rc == -ENOLCK) {
 					which = RA_STAT_FAILED_MATCH;
 					msg   = "lock match failed";
diff --git a/drivers/staging/lustre/lustre/llite/rw26.c b/drivers/staging/lustre/lustre/llite/rw26.c
index e8d29e1..e2fea8c 100644
--- a/drivers/staging/lustre/lustre/llite/rw26.c
+++ b/drivers/staging/lustre/lustre/llite/rw26.c
@@ -165,28 +165,6 @@ static int ll_releasepage(struct page *vmpage, gfp_t gfp_mask)
 	return result;
 }
 
-static int ll_set_page_dirty(struct page *vmpage)
-{
-#if 0
-	struct cl_page    *page = vvp_vmpage_page_transient(vmpage);
-	struct vvp_object *obj  = cl_inode2vvp(vmpage->mapping->host);
-	struct vvp_page   *cpg;
-
-	/*
-	 * XXX should page method be called here?
-	 */
-	LASSERT(&obj->co_cl == page->cp_obj);
-	cpg = cl2vvp_page(cl_page_at(page, &vvp_device_type));
-	/*
-	 * XXX cannot do much here, because page is possibly not locked:
-	 * sys_munmap()->...
-	 *     ->unmap_page_range()->zap_pte_range()->set_page_dirty().
-	 */
-	vvp_write_pending(obj, cpg);
-#endif
-	return __set_page_dirty_nobuffers(vmpage);
-}
-
 #define MAX_DIRECTIO_SIZE (2*1024*1024*1024UL)
 
 static inline int ll_get_user_pages(int rw, unsigned long user_addr,
@@ -274,7 +252,7 @@ ssize_t ll_direct_rw_pages(const struct lu_env *env, struct cl_io *io,
 		 * write directly
 		 */
 		if (clp->cp_type == CPT_CACHEABLE) {
-			struct page *vmpage = cl_page_vmpage(env, clp);
+			struct page *vmpage = cl_page_vmpage(clp);
 			struct page *src_page;
 			struct page *dst_page;
 			void       *src;
@@ -478,19 +456,16 @@ out:
 static int ll_prepare_partial_page(const struct lu_env *env, struct cl_io *io,
 				   struct cl_page *pg)
 {
-	struct cl_object *obj  = io->ci_obj;
 	struct cl_attr *attr   = ccc_env_thread_attr(env);
-	loff_t          offset = cl_offset(obj, pg->cp_index);
+	struct cl_object *obj  = io->ci_obj;
+	struct ccc_page *cp    = cl_object_page_slice(obj, pg);
+	loff_t          offset = cl_offset(obj, ccc_index(cp));
 	int             result;
 
 	cl_object_attr_lock(obj);
 	result = cl_object_attr_get(env, obj, attr);
 	cl_object_attr_unlock(obj);
 	if (result == 0) {
-		struct ccc_page *cp;
-
-		cp = cl2ccc_page(cl_page_at(pg, &vvp_device_type));
-
 		/*
 		 * If are writing to a new page, no need to read old data.
 		 * The extent locking will have updated the KMS, and for our
@@ -685,7 +660,7 @@ const struct address_space_operations ll_aops = {
 	.direct_IO      = ll_direct_IO_26,
 	.writepage      = ll_writepage,
 	.writepages     = ll_writepages,
-	.set_page_dirty = ll_set_page_dirty,
+	.set_page_dirty = __set_page_dirty_nobuffers,
 	.write_begin    = ll_write_begin,
 	.write_end      = ll_write_end,
 	.invalidatepage = ll_invalidatepage,
diff --git a/drivers/staging/lustre/lustre/llite/vvp_internal.h b/drivers/staging/lustre/lustre/llite/vvp_internal.h
index 9abde11..aa06f40 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_internal.h
+++ b/drivers/staging/lustre/lustre/llite/vvp_internal.h
@@ -49,7 +49,7 @@ int vvp_io_init(const struct lu_env *env, struct cl_object *obj,
 int vvp_lock_init(const struct lu_env *env, struct cl_object *obj,
 		  struct cl_lock *lock, const struct cl_io *io);
 int vvp_page_init(const struct lu_env *env, struct cl_object *obj,
-		  struct cl_page *page, struct page *vmpage);
+		  struct cl_page *page, pgoff_t index);
 struct lu_object *vvp_object_alloc(const struct lu_env *env,
 				   const struct lu_object_header *hdr,
 				   struct lu_device *dev);
diff --git a/drivers/staging/lustre/lustre/llite/vvp_io.c b/drivers/staging/lustre/lustre/llite/vvp_io.c
index f4a1384..ac9d615 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_io.c
+++ b/drivers/staging/lustre/lustre/llite/vvp_io.c
@@ -625,7 +625,7 @@ static int vvp_io_commit_sync(const struct lu_env *env, struct cl_io *io,
 
 			cl_page_clip(env, page, 0, PAGE_SIZE);
 
-			SetPageUptodate(cl_page_vmpage(env, page));
+			SetPageUptodate(cl_page_vmpage(page));
 			cl_page_disown(env, io, page);
 
 			/* held in ll_cl_init() */
@@ -640,17 +640,15 @@ static int vvp_io_commit_sync(const struct lu_env *env, struct cl_io *io,
 static void write_commit_callback(const struct lu_env *env, struct cl_io *io,
 				  struct cl_page *page)
 {
-	const struct cl_page_slice *slice;
 	struct ccc_page *cp;
-	struct page *vmpage;
-
-	slice = cl_page_at(page, &vvp_device_type);
-	cp = cl2ccc_page(slice);
-	vmpage = cp->cpg_page;
+	struct page *vmpage = page->cp_vmpage;
+	struct cl_object *clob = cl_io_top(io)->ci_obj;
 
 	SetPageUptodate(vmpage);
 	set_page_dirty(vmpage);
-	vvp_write_pending(cl2ccc(slice->cpl_obj), cp);
+
+	cp = cl2ccc_page(cl_object_page_slice(clob, page));
+	vvp_write_pending(cl2ccc(clob), cp);
 
 	cl_page_disown(env, io, page);
 
@@ -660,19 +658,22 @@ static void write_commit_callback(const struct lu_env *env, struct cl_io *io,
 }
 
 /* make sure the page list is contiguous */
-static bool page_list_sanity_check(struct cl_page_list *plist)
+static bool page_list_sanity_check(struct cl_object *obj,
+				   struct cl_page_list *plist)
 {
 	struct cl_page *page;
 	pgoff_t index = CL_PAGE_EOF;
 
 	cl_page_list_for_each(page, plist) {
+		struct ccc_page *cp = cl_object_page_slice(obj, page);
+
 		if (index == CL_PAGE_EOF) {
-			index = page->cp_index;
+			index = ccc_index(cp);
 			continue;
 		}
 
 		++index;
-		if (index == page->cp_index)
+		if (index == ccc_index(cp))
 			continue;
 
 		return false;
@@ -698,7 +699,7 @@ int vvp_io_write_commit(const struct lu_env *env, struct cl_io *io)
 	CDEBUG(D_VFSTRACE, "commit async pages: %d, from %d, to %d\n",
 	       npages, cio->u.write.cui_from, cio->u.write.cui_to);
 
-	LASSERT(page_list_sanity_check(queue));
+	LASSERT(page_list_sanity_check(obj, queue));
 
 	/* submit IO with async write */
 	rc = cl_io_commit_async(env, io, queue,
@@ -723,7 +724,7 @@ int vvp_io_write_commit(const struct lu_env *env, struct cl_io *io)
 		/* the first page must have been written. */
 		cio->u.write.cui_from = 0;
 	}
-	LASSERT(page_list_sanity_check(queue));
+	LASSERT(page_list_sanity_check(obj, queue));
 	LASSERT(ergo(rc == 0, queue->pl_nr == 0));
 
 	/* out of quota, try sync write */
@@ -747,7 +748,7 @@ int vvp_io_write_commit(const struct lu_env *env, struct cl_io *io)
 		page = cl_page_list_first(queue);
 		cl_page_list_del(env, queue, page);
 
-		if (!PageDirty(cl_page_vmpage(env, page)))
+		if (!PageDirty(cl_page_vmpage(page)))
 			cl_page_discard(env, io, page);
 
 		cl_page_disown(env, io, page);
@@ -861,16 +862,13 @@ static int vvp_io_kernel_fault(struct vvp_fault_io *cfio)
 static void mkwrite_commit_callback(const struct lu_env *env, struct cl_io *io,
 				    struct cl_page *page)
 {
-	const struct cl_page_slice *slice;
 	struct ccc_page *cp;
-	struct page *vmpage;
+	struct cl_object *clob = cl_io_top(io)->ci_obj;
 
-	slice = cl_page_at(page, &vvp_device_type);
-	cp = cl2ccc_page(slice);
-	vmpage = cp->cpg_page;
+	set_page_dirty(page->cp_vmpage);
 
-	set_page_dirty(vmpage);
-	vvp_write_pending(cl2ccc(slice->cpl_obj), cp);
+	cp = cl2ccc_page(cl_object_page_slice(clob, page));
+	vvp_write_pending(cl2ccc(clob), cp);
 }
 
 static int vvp_io_fault_start(const struct lu_env *env,
@@ -975,6 +973,7 @@ static int vvp_io_fault_start(const struct lu_env *env,
 		wait_on_page_writeback(vmpage);
 		if (!PageDirty(vmpage)) {
 			struct cl_page_list *plist = &io->ci_queue.c2_qin;
+			struct ccc_page *cp = cl_object_page_slice(obj, page);
 			int to = PAGE_SIZE;
 
 			/* vvp_page_assume() calls wait_on_page_writeback(). */
@@ -984,7 +983,7 @@ static int vvp_io_fault_start(const struct lu_env *env,
 			cl_page_list_add(plist, page);
 
 			/* size fixup */
-			if (last_index == page->cp_index)
+			if (last_index == ccc_index(cp))
 				to = size & ~PAGE_MASK;
 
 			/* Do not set Dirty bit here so that in case IO is
@@ -1069,7 +1068,7 @@ static int vvp_io_read_page(const struct lu_env *env,
 
 	if (sbi->ll_ra_info.ra_max_pages_per_file &&
 	    sbi->ll_ra_info.ra_max_pages)
-		ras_update(sbi, inode, ras, page->cp_index,
+		ras_update(sbi, inode, ras, ccc_index(cp),
 			   cp->cpg_defer_uptodate);
 
 	/* Sanity check whether the page is protected by a lock. */
diff --git a/drivers/staging/lustre/lustre/llite/vvp_page.c b/drivers/staging/lustre/lustre/llite/vvp_page.c
index 11e609e..d9f13c3 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_page.c
+++ b/drivers/staging/lustre/lustre/llite/vvp_page.c
@@ -136,26 +136,15 @@ static void vvp_page_discard(const struct lu_env *env,
 			     struct cl_io *unused)
 {
 	struct page	   *vmpage  = cl2vm_page(slice);
-	struct address_space *mapping;
 	struct ccc_page      *cpg     = cl2ccc_page(slice);
-	__u64 offset;
 
 	LASSERT(vmpage);
 	LASSERT(PageLocked(vmpage));
 
-	mapping = vmpage->mapping;
-
 	if (cpg->cpg_defer_uptodate && !cpg->cpg_ra_used)
-		ll_ra_stats_inc(mapping, RA_STAT_DISCARDED);
-
-	offset = vmpage->index << PAGE_SHIFT;
-	ll_teardown_mmaps(vmpage->mapping, offset, offset + PAGE_SIZE);
+		ll_ra_stats_inc(vmpage->mapping, RA_STAT_DISCARDED);
 
-	/*
-	 * truncate_complete_page() calls
-	 * a_ops->invalidatepage()->cl_page_delete()->vvp_page_delete().
-	 */
-	truncate_complete_page(mapping, vmpage);
+	ll_invalidate_page(vmpage);
 }
 
 static void vvp_page_delete(const struct lu_env *env,
@@ -269,7 +258,7 @@ static void vvp_page_completion_read(const struct lu_env *env,
 {
 	struct ccc_page *cp     = cl2ccc_page(slice);
 	struct page      *vmpage = cp->cpg_page;
-	struct cl_page  *page   = cl_page_top(slice->cpl_page);
+	struct cl_page  *page   = slice->cpl_page;
 	struct inode    *inode  = ccc_object_inode(page->cp_obj);
 
 	LASSERT(PageLocked(vmpage));
@@ -394,7 +383,6 @@ static const struct cl_page_operations vvp_page_ops = {
 	.cpo_assume	= vvp_page_assume,
 	.cpo_unassume      = vvp_page_unassume,
 	.cpo_disown	= vvp_page_disown,
-	.cpo_vmpage	= ccc_page_vmpage,
 	.cpo_discard       = vvp_page_discard,
 	.cpo_delete	= vvp_page_delete,
 	.cpo_export	= vvp_page_export,
@@ -504,7 +492,6 @@ static const struct cl_page_operations vvp_transient_page_ops = {
 	.cpo_unassume      = vvp_transient_page_unassume,
 	.cpo_disown	= vvp_transient_page_disown,
 	.cpo_discard       = vvp_transient_page_discard,
-	.cpo_vmpage	= ccc_page_vmpage,
 	.cpo_fini	  = vvp_transient_page_fini,
 	.cpo_is_vmlocked   = vvp_transient_page_is_vmlocked,
 	.cpo_print	 = vvp_page_print,
@@ -522,12 +509,14 @@ static const struct cl_page_operations vvp_transient_page_ops = {
 };
 
 int vvp_page_init(const struct lu_env *env, struct cl_object *obj,
-		  struct cl_page *page, struct page *vmpage)
+		struct cl_page *page, pgoff_t index)
 {
 	struct ccc_page *cpg = cl_object_page_slice(obj, page);
+	struct page     *vmpage = page->cp_vmpage;
 
 	CLOBINVRNT(env, obj, ccc_object_invariant(obj));
 
+	cpg->cpg_cl.cpl_index = index;
 	cpg->cpg_page = vmpage;
 	page_cache_get(vmpage);
 
diff --git a/drivers/staging/lustre/lustre/lov/lov_cl_internal.h b/drivers/staging/lustre/lustre/lov/lov_cl_internal.h
index 3d568fc..b8e2315 100644
--- a/drivers/staging/lustre/lustre/lov/lov_cl_internal.h
+++ b/drivers/staging/lustre/lustre/lov/lov_cl_internal.h
@@ -613,14 +613,13 @@ int lov_sublock_modify(const struct lu_env *env, struct lov_lock *lov,
 		       const struct cl_lock_descr *d, int idx);
 
 int lov_page_init(const struct lu_env *env, struct cl_object *ob,
-		  struct cl_page *page, struct page *vmpage);
+		  struct cl_page *page, pgoff_t index);
 int lovsub_page_init(const struct lu_env *env, struct cl_object *ob,
-		     struct cl_page *page, struct page *vmpage);
-
+		     struct cl_page *page, pgoff_t index);
 int lov_page_init_empty(const struct lu_env *env, struct cl_object *obj,
-			struct cl_page *page, struct page *vmpage);
+			struct cl_page *page, pgoff_t index);
 int lov_page_init_raid0(const struct lu_env *env, struct cl_object *obj,
-			struct cl_page *page, struct page *vmpage);
+			struct cl_page *page, pgoff_t index);
 struct lu_object *lov_object_alloc(const struct lu_env *env,
 				   const struct lu_object_header *hdr,
 				   struct lu_device *dev);
@@ -791,11 +790,6 @@ static inline struct lovsub_req *cl2lovsub_req(const struct cl_req_slice *slice)
 	return container_of0(slice, struct lovsub_req, lsrq_cl);
 }
 
-static inline struct cl_page *lov_sub_page(const struct cl_page_slice *slice)
-{
-	return slice->cpl_page->cp_child;
-}
-
 static inline struct lov_io *cl2lov_io(const struct lu_env *env,
 				       const struct cl_io_slice *ios)
 {
diff --git a/drivers/staging/lustre/lustre/lov/lov_io.c b/drivers/staging/lustre/lustre/lov/lov_io.c
index c606490..e5b2cfc 100644
--- a/drivers/staging/lustre/lustre/lov/lov_io.c
+++ b/drivers/staging/lustre/lustre/lov/lov_io.c
@@ -248,10 +248,12 @@ void lov_sub_put(struct lov_io_sub *sub)
 static int lov_page_stripe(const struct cl_page *page)
 {
 	struct lovsub_object *subobj;
+	const struct cl_page_slice *slice;
 
-	subobj = lu2lovsub(
-		lu_object_locate(page->cp_child->cp_obj->co_lu.lo_header,
-				 &lovsub_device_type));
+	slice = cl_page_at(page, &lovsub_device_type);
+	LASSERT(slice->cpl_obj);
+
+	subobj = cl2lovsub(slice->cpl_obj);
 	return subobj->lso_index;
 }
 
diff --git a/drivers/staging/lustre/lustre/lov/lov_object.c b/drivers/staging/lustre/lustre/lov/lov_object.c
index 5d8a2b6..0159b6f 100644
--- a/drivers/staging/lustre/lustre/lov/lov_object.c
+++ b/drivers/staging/lustre/lustre/lov/lov_object.c
@@ -67,7 +67,7 @@ struct lov_layout_operations {
 	int  (*llo_print)(const struct lu_env *env, void *cookie,
 			  lu_printer_t p, const struct lu_object *o);
 	int  (*llo_page_init)(const struct lu_env *env, struct cl_object *obj,
-			      struct cl_page *page, struct page *vmpage);
+			      struct cl_page *page, pgoff_t index);
 	int  (*llo_lock_init)(const struct lu_env *env,
 			      struct cl_object *obj, struct cl_lock *lock,
 			      const struct cl_io *io);
@@ -193,6 +193,18 @@ static int lov_init_sub(const struct lu_env *env, struct lov_object *lov,
 	return result;
 }
 
+static int lov_page_slice_fixup(struct lov_object *lov,
+				struct cl_object *stripe)
+{
+	struct cl_object_header *hdr = cl_object_header(&lov->lo_cl);
+	struct cl_object *o;
+
+	cl_object_for_each(o, stripe)
+		o->co_slice_off += hdr->coh_page_bufsize;
+
+	return cl_object_header(stripe)->coh_page_bufsize;
+}
+
 static int lov_init_raid0(const struct lu_env *env,
 			  struct lov_device *dev, struct lov_object *lov,
 			  const struct cl_object_conf *conf,
@@ -222,6 +234,8 @@ static int lov_init_raid0(const struct lu_env *env,
 	r0->lo_sub = libcfs_kvzalloc(r0->lo_nr * sizeof(r0->lo_sub[0]),
 				     GFP_NOFS);
 	if (r0->lo_sub) {
+		int psz = 0;
+
 		result = 0;
 		subconf->coc_inode = conf->coc_inode;
 		spin_lock_init(&r0->lo_sub_lock);
@@ -254,11 +268,21 @@ static int lov_init_raid0(const struct lu_env *env,
 				if (result == -EAGAIN) { /* try again */
 					--i;
 					result = 0;
+					continue;
 				}
 			} else {
 				result = PTR_ERR(stripe);
 			}
+
+			if (result == 0) {
+				int sz = lov_page_slice_fixup(lov, stripe);
+
+				LASSERT(ergo(psz > 0, psz == sz));
+				psz = sz;
+			}
 		}
+		if (result == 0)
+			cl_object_header(&lov->lo_cl)->coh_page_bufsize += psz;
 	} else
 		result = -ENOMEM;
 out:
@@ -824,10 +848,10 @@ static int lov_object_print(const struct lu_env *env, void *cookie,
 }
 
 int lov_page_init(const struct lu_env *env, struct cl_object *obj,
-		  struct cl_page *page, struct page *vmpage)
+		  struct cl_page *page, pgoff_t index)
 {
-	return LOV_2DISPATCH_NOLOCK(cl2lov(obj),
-				    llo_page_init, env, obj, page, vmpage);
+	return LOV_2DISPATCH_NOLOCK(cl2lov(obj), llo_page_init, env, obj, page,
+				    index);
 }
 
 /**
diff --git a/drivers/staging/lustre/lustre/lov/lov_page.c b/drivers/staging/lustre/lustre/lov/lov_page.c
index 5d9b355..0c508bd 100644
--- a/drivers/staging/lustre/lustre/lov/lov_page.c
+++ b/drivers/staging/lustre/lustre/lov/lov_page.c
@@ -52,59 +52,6 @@
  * Lov page operations.
  *
  */
-
-static int lov_page_invariant(const struct cl_page_slice *slice)
-{
-	const struct cl_page  *page = slice->cpl_page;
-	const struct cl_page  *sub  = lov_sub_page(slice);
-
-	return ergo(sub,
-		    page->cp_child == sub &&
-		    sub->cp_parent == page &&
-		    page->cp_state == sub->cp_state);
-}
-
-static void lov_page_fini(const struct lu_env *env,
-			  struct cl_page_slice *slice)
-{
-	struct cl_page  *sub = lov_sub_page(slice);
-
-	LINVRNT(lov_page_invariant(slice));
-
-	if (sub) {
-		LASSERT(sub->cp_state == CPS_FREEING);
-		lu_ref_del(&sub->cp_reference, "lov", sub->cp_parent);
-		sub->cp_parent = NULL;
-		slice->cpl_page->cp_child = NULL;
-		cl_page_put(env, sub);
-	}
-}
-
-static int lov_page_own(const struct lu_env *env,
-			const struct cl_page_slice *slice, struct cl_io *io,
-			int nonblock)
-{
-	struct lov_io     *lio = lov_env_io(env);
-	struct lov_io_sub *sub;
-
-	LINVRNT(lov_page_invariant(slice));
-	LINVRNT(!cl2lov_page(slice)->lps_invalid);
-
-	sub = lov_page_subio(env, lio, slice);
-	if (!IS_ERR(sub)) {
-		lov_sub_page(slice)->cp_owner = sub->sub_io;
-		lov_sub_put(sub);
-	} else
-		LBUG(); /* Arrgh */
-	return 0;
-}
-
-static void lov_page_assume(const struct lu_env *env,
-			    const struct cl_page_slice *slice, struct cl_io *io)
-{
-	lov_page_own(env, slice, io, 0);
-}
-
 static int lov_page_print(const struct lu_env *env,
 			  const struct cl_page_slice *slice,
 			  void *cookie, lu_printer_t printer)
@@ -115,26 +62,17 @@ static int lov_page_print(const struct lu_env *env,
 }
 
 static const struct cl_page_operations lov_page_ops = {
-	.cpo_fini   = lov_page_fini,
-	.cpo_own    = lov_page_own,
-	.cpo_assume = lov_page_assume,
 	.cpo_print  = lov_page_print
 };
 
-static void lov_empty_page_fini(const struct lu_env *env,
-				struct cl_page_slice *slice)
-{
-	LASSERT(!slice->cpl_page->cp_child);
-}
-
 int lov_page_init_raid0(const struct lu_env *env, struct cl_object *obj,
-			struct cl_page *page, struct page *vmpage)
+			struct cl_page *page, pgoff_t index)
 {
 	struct lov_object *loo = cl2lov(obj);
 	struct lov_layout_raid0 *r0 = lov_r0(loo);
 	struct lov_io     *lio = lov_env_io(env);
-	struct cl_page    *subpage;
 	struct cl_object  *subobj;
+	struct cl_object  *o;
 	struct lov_io_sub *sub;
 	struct lov_page   *lpg = cl_object_page_slice(obj, page);
 	loff_t	     offset;
@@ -142,13 +80,12 @@ int lov_page_init_raid0(const struct lu_env *env, struct cl_object *obj,
 	int		stripe;
 	int		rc;
 
-	offset = cl_offset(obj, page->cp_index);
+	offset = cl_offset(obj, index);
 	stripe = lov_stripe_number(loo->lo_lsm, offset);
 	LASSERT(stripe < r0->lo_nr);
 	rc = lov_stripe_offset(loo->lo_lsm, offset, stripe, &suboff);
 	LASSERT(rc == 0);
 
-	lpg->lps_invalid = 1;
 	cl_page_slice_add(page, &lpg->lps_cl, obj, &lov_page_ops);
 
 	sub = lov_sub_get(env, lio, stripe);
@@ -156,35 +93,44 @@ int lov_page_init_raid0(const struct lu_env *env, struct cl_object *obj,
 		return PTR_ERR(sub);
 
 	subobj = lovsub2cl(r0->lo_sub[stripe]);
-	subpage = cl_page_alloc(sub->sub_env, subobj, cl_index(subobj, suboff),
-				vmpage, page->cp_type);
-	if (!IS_ERR(subpage)) {
-		subpage->cp_parent = page;
-		page->cp_child = subpage;
-		lpg->lps_invalid = 0;
-	} else {
-		rc = PTR_ERR(subpage);
+	list_for_each_entry(o, &subobj->co_lu.lo_header->loh_layers,
+			    co_lu.lo_linkage) {
+		if (o->co_ops->coo_page_init) {
+			rc = o->co_ops->coo_page_init(sub->sub_env, o, page,
+						      cl_index(subobj, suboff));
+			if (rc != 0)
+				break;
+		}
 	}
 	lov_sub_put(sub);
 
 	return rc;
 }
 
+static int lov_page_empty_print(const struct lu_env *env,
+				const struct cl_page_slice *slice,
+				void *cookie, lu_printer_t printer)
+{
+	struct lov_page *lp = cl2lov_page(slice);
+
+	return (*printer)(env, cookie, LUSTRE_LOV_NAME "-page@%p, empty.\n",
+			  lp);
+}
+
 static const struct cl_page_operations lov_empty_page_ops = {
-	.cpo_fini   = lov_empty_page_fini,
-	.cpo_print  = lov_page_print
+	.cpo_print = lov_page_empty_print
 };
 
 int lov_page_init_empty(const struct lu_env *env, struct cl_object *obj,
-			struct cl_page *page, struct page *vmpage)
+			struct cl_page *page, pgoff_t index)
 {
 	struct lov_page *lpg = cl_object_page_slice(obj, page);
 	void *addr;
 
 	cl_page_slice_add(page, &lpg->lps_cl, obj, &lov_empty_page_ops);
-	addr = kmap(vmpage);
+	addr = kmap(page->cp_vmpage);
 	memset(addr, 0, cl_page_size(obj));
-	kunmap(vmpage);
+	kunmap(page->cp_vmpage);
 	cl_page_export(env, page, 1);
 	return 0;
 }
diff --git a/drivers/staging/lustre/lustre/lov/lovsub_page.c b/drivers/staging/lustre/lustre/lov/lovsub_page.c
index 2d94553..fb4c0cc 100644
--- a/drivers/staging/lustre/lustre/lov/lovsub_page.c
+++ b/drivers/staging/lustre/lustre/lov/lovsub_page.c
@@ -60,7 +60,7 @@ static const struct cl_page_operations lovsub_page_ops = {
 };
 
 int lovsub_page_init(const struct lu_env *env, struct cl_object *obj,
-		     struct cl_page *page, struct page *unused)
+		     struct cl_page *page, pgoff_t ind)
 {
 	struct lovsub_page *lsb = cl_object_page_slice(obj, page);
 
diff --git a/drivers/staging/lustre/lustre/obdclass/cl_io.c b/drivers/staging/lustre/lustre/obdclass/cl_io.c
index 9b3c5c1..86591ce 100644
--- a/drivers/staging/lustre/lustre/obdclass/cl_io.c
+++ b/drivers/staging/lustre/lustre/obdclass/cl_io.c
@@ -693,42 +693,6 @@ cl_io_slice_page(const struct cl_io_slice *ios, struct cl_page *page)
 }
 
 /**
- * True iff \a page is within \a io range.
- */
-static int cl_page_in_io(const struct cl_page *page, const struct cl_io *io)
-{
-	int     result = 1;
-	loff_t  start;
-	loff_t  end;
-	pgoff_t idx;
-
-	idx = page->cp_index;
-	switch (io->ci_type) {
-	case CIT_READ:
-	case CIT_WRITE:
-		/*
-		 * check that [start, end) and [pos, pos + count) extents
-		 * overlap.
-		 */
-		if (!cl_io_is_append(io)) {
-			const struct cl_io_rw_common *crw = &(io->u.ci_rw);
-
-			start = cl_offset(page->cp_obj, idx);
-			end   = cl_offset(page->cp_obj, idx + 1);
-			result = crw->crw_pos < end &&
-				 start < crw->crw_pos + crw->crw_count;
-		}
-		break;
-	case CIT_FAULT:
-		result = io->u.ci_fault.ft_index == idx;
-		break;
-	default:
-		LBUG();
-	}
-	return result;
-}
-
-/**
  * Called by read io, when page has to be read from the server.
  *
  * \see cl_io_operations::cio_read_page()
@@ -743,7 +707,6 @@ int cl_io_read_page(const struct lu_env *env, struct cl_io *io,
 	LINVRNT(io->ci_type == CIT_READ || io->ci_type == CIT_FAULT);
 	LINVRNT(cl_page_is_owned(page, io));
 	LINVRNT(io->ci_state == CIS_IO_GOING || io->ci_state == CIS_LOCKED);
-	LINVRNT(cl_page_in_io(page, io));
 	LINVRNT(cl_io_invariant(io));
 
 	queue = &io->ci_queue;
@@ -893,7 +856,6 @@ static int cl_io_cancel(const struct lu_env *env, struct cl_io *io,
 	cl_page_list_for_each(page, queue) {
 		int rc;
 
-		LINVRNT(cl_page_in_io(page, io));
 		rc = cl_page_cancel(env, page);
 		result = result ?: rc;
 	}
@@ -1229,7 +1191,7 @@ EXPORT_SYMBOL(cl_2queue_init_page);
 /**
  * Returns top-level io.
  *
- * \see cl_object_top(), cl_page_top().
+ * \see cl_object_top()
  */
 struct cl_io *cl_io_top(struct cl_io *io)
 {
@@ -1292,19 +1254,14 @@ static int cl_req_init(const struct lu_env *env, struct cl_req *req,
 	int result;
 
 	result = 0;
-	page = cl_page_top(page);
-	do {
-		list_for_each_entry(slice, &page->cp_layers, cpl_linkage) {
-			dev = lu2cl_dev(slice->cpl_obj->co_lu.lo_dev);
-			if (dev->cd_ops->cdo_req_init) {
-				result = dev->cd_ops->cdo_req_init(env,
-								   dev, req);
-				if (result != 0)
-					break;
-			}
+	list_for_each_entry(slice, &page->cp_layers, cpl_linkage) {
+		dev = lu2cl_dev(slice->cpl_obj->co_lu.lo_dev);
+		if (dev->cd_ops->cdo_req_init) {
+			result = dev->cd_ops->cdo_req_init(env, dev, req);
+			if (result != 0)
+				break;
 		}
-		page = page->cp_child;
-	} while (page && result == 0);
+	}
 	return result;
 }
 
@@ -1375,8 +1332,6 @@ void cl_req_page_add(const struct lu_env *env,
 	struct cl_req_obj *rqo;
 	int i;
 
-	page = cl_page_top(page);
-
 	LASSERT(list_empty(&page->cp_flight));
 	LASSERT(!page->cp_req);
 
@@ -1407,8 +1362,6 @@ void cl_req_page_done(const struct lu_env *env, struct cl_page *page)
 {
 	struct cl_req *req = page->cp_req;
 
-	page = cl_page_top(page);
-
 	LASSERT(!list_empty(&page->cp_flight));
 	LASSERT(req->crq_nrpages > 0);
 
diff --git a/drivers/staging/lustre/lustre/obdclass/cl_object.c b/drivers/staging/lustre/lustre/obdclass/cl_object.c
index fa9b083..72e6333 100644
--- a/drivers/staging/lustre/lustre/obdclass/cl_object.c
+++ b/drivers/staging/lustre/lustre/obdclass/cl_object.c
@@ -84,7 +84,7 @@ int cl_object_header_init(struct cl_object_header *h)
 		lockdep_set_class(&h->coh_lock_guard, &cl_lock_guard_class);
 		lockdep_set_class(&h->coh_attr_guard, &cl_attr_guard_class);
 		INIT_LIST_HEAD(&h->coh_locks);
-		h->coh_page_bufsize = ALIGN(sizeof(struct cl_page), 8);
+		h->coh_page_bufsize = 0;
 	}
 	return result;
 }
@@ -138,7 +138,7 @@ EXPORT_SYMBOL(cl_object_get);
 /**
  * Returns the top-object for a given \a o.
  *
- * \see cl_page_top(), cl_io_top()
+ * \see cl_io_top()
  */
 struct cl_object *cl_object_top(struct cl_object *o)
 {
diff --git a/drivers/staging/lustre/lustre/obdclass/cl_page.c b/drivers/staging/lustre/lustre/obdclass/cl_page.c
index 0844a97..cb15673 100644
--- a/drivers/staging/lustre/lustre/obdclass/cl_page.c
+++ b/drivers/staging/lustre/lustre/obdclass/cl_page.c
@@ -63,18 +63,6 @@ static void cl_page_delete0(const struct lu_env *env, struct cl_page *pg);
 	((void)sizeof(env), (void)sizeof(page), (void)sizeof !!(exp))
 
 /**
- * Internal version of cl_page_top, it should be called if the page is
- * known to be not freed, says with page referenced, or radix tree lock held,
- * or page owned.
- */
-static struct cl_page *cl_page_top_trusted(struct cl_page *page)
-{
-	while (page->cp_parent)
-		page = page->cp_parent;
-	return page;
-}
-
-/**
  * Internal version of cl_page_get().
  *
  * This function can be used to obtain initial reference to previously
@@ -102,14 +90,10 @@ cl_page_at_trusted(const struct cl_page *page,
 {
 	const struct cl_page_slice *slice;
 
-	page = cl_page_top_trusted((struct cl_page *)page);
-	do {
-		list_for_each_entry(slice, &page->cp_layers, cpl_linkage) {
-			if (slice->cpl_obj->co_lu.lo_dev->ld_type == dtype)
-				return slice;
-		}
-		page = page->cp_child;
-	} while (page);
+	list_for_each_entry(slice, &page->cp_layers, cpl_linkage) {
+		if (slice->cpl_obj->co_lu.lo_dev->ld_type == dtype)
+			return slice;
+	}
 	return NULL;
 }
 
@@ -120,7 +104,6 @@ static void cl_page_free(const struct lu_env *env, struct cl_page *page)
 	PASSERT(env, page, list_empty(&page->cp_batch));
 	PASSERT(env, page, !page->cp_owner);
 	PASSERT(env, page, !page->cp_req);
-	PASSERT(env, page, !page->cp_parent);
 	PASSERT(env, page, page->cp_state == CPS_FREEING);
 
 	while (!list_empty(&page->cp_layers)) {
@@ -129,7 +112,8 @@ static void cl_page_free(const struct lu_env *env, struct cl_page *page)
 		slice = list_entry(page->cp_layers.next,
 				   struct cl_page_slice, cpl_linkage);
 		list_del_init(page->cp_layers.next);
-		slice->cpl_ops->cpo_fini(env, slice);
+		if (unlikely(slice->cpl_ops->cpo_fini))
+			slice->cpl_ops->cpo_fini(env, slice);
 	}
 	lu_object_ref_del_at(&obj->co_lu, &page->cp_obj_ref, "cl_page", page);
 	cl_object_put(env, obj);
@@ -165,7 +149,7 @@ struct cl_page *cl_page_alloc(const struct lu_env *env,
 		cl_object_get(o);
 		lu_object_ref_add_at(&o->co_lu, &page->cp_obj_ref, "cl_page",
 				     page);
-		page->cp_index = ind;
+		page->cp_vmpage = vmpage;
 		cl_page_state_set_trust(page, CPS_CACHED);
 		page->cp_type = type;
 		INIT_LIST_HEAD(&page->cp_layers);
@@ -176,8 +160,8 @@ struct cl_page *cl_page_alloc(const struct lu_env *env,
 		head = o->co_lu.lo_header;
 		list_for_each_entry(o, &head->loh_layers, co_lu.lo_linkage) {
 			if (o->co_ops->coo_page_init) {
-				result = o->co_ops->coo_page_init(env, o,
-								  page, vmpage);
+				result = o->co_ops->coo_page_init(env, o, page,
+								  ind);
 				if (result != 0) {
 					cl_page_delete0(env, page);
 					cl_page_free(env, page);
@@ -249,27 +233,12 @@ EXPORT_SYMBOL(cl_page_find);
 
 static inline int cl_page_invariant(const struct cl_page *pg)
 {
-	struct cl_page	  *parent;
-	struct cl_page	  *child;
-	struct cl_io	    *owner;
-
 	/*
 	 * Page invariant is protected by a VM lock.
 	 */
 	LINVRNT(cl_page_is_vmlocked(NULL, pg));
 
-	parent = pg->cp_parent;
-	child  = pg->cp_child;
-	owner  = pg->cp_owner;
-
-	return cl_page_in_use(pg) &&
-		ergo(parent, parent->cp_child == pg) &&
-		ergo(child, child->cp_parent == pg) &&
-		ergo(child, pg->cp_obj != child->cp_obj) &&
-		ergo(parent, pg->cp_obj != parent->cp_obj) &&
-		ergo(owner && parent,
-		     parent->cp_owner == pg->cp_owner->ci_parent) &&
-		ergo(owner && child, child->cp_owner->ci_parent == owner);
+	return cl_page_in_use_noref(pg);
 }
 
 static void cl_page_state_set0(const struct lu_env *env,
@@ -322,13 +291,9 @@ static void cl_page_state_set0(const struct lu_env *env,
 	old = page->cp_state;
 	PASSERT(env, page, allowed_transitions[old][state]);
 	CL_PAGE_HEADER(D_TRACE, env, page, "%d -> %d\n", old, state);
-	for (; page; page = page->cp_child) {
-		PASSERT(env, page, page->cp_state == old);
-		PASSERT(env, page,
-			equi(state == CPS_OWNED, page->cp_owner));
-
-		cl_page_state_set_trust(page, state);
-	}
+	PASSERT(env, page, page->cp_state == old);
+	PASSERT(env, page, equi(state == CPS_OWNED, page->cp_owner));
+	cl_page_state_set_trust(page, state);
 }
 
 static void cl_page_state_set(const struct lu_env *env,
@@ -362,8 +327,6 @@ EXPORT_SYMBOL(cl_page_get);
  */
 void cl_page_put(const struct lu_env *env, struct cl_page *page)
 {
-	PASSERT(env, page, atomic_read(&page->cp_ref) > !!page->cp_parent);
-
 	CL_PAGE_HEADER(D_TRACE, env, page, "%d\n",
 		       atomic_read(&page->cp_ref));
 
@@ -383,34 +346,10 @@ void cl_page_put(const struct lu_env *env, struct cl_page *page)
 EXPORT_SYMBOL(cl_page_put);
 
 /**
- * Returns a VM page associated with a given cl_page.
- */
-struct page *cl_page_vmpage(const struct lu_env *env, struct cl_page *page)
-{
-	const struct cl_page_slice *slice;
-
-	/*
-	 * Find uppermost layer with ->cpo_vmpage() method, and return its
-	 * result.
-	 */
-	page = cl_page_top(page);
-	do {
-		list_for_each_entry(slice, &page->cp_layers, cpl_linkage) {
-			if (slice->cpl_ops->cpo_vmpage)
-				return slice->cpl_ops->cpo_vmpage(env, slice);
-		}
-		page = page->cp_child;
-	} while (page);
-	LBUG(); /* ->cpo_vmpage() has to be defined somewhere in the stack */
-}
-EXPORT_SYMBOL(cl_page_vmpage);
-
-/**
  * Returns a cl_page associated with a VM page, and given cl_object.
  */
 struct cl_page *cl_vmpage_page(struct page *vmpage, struct cl_object *obj)
 {
-	struct cl_page *top;
 	struct cl_page *page;
 
 	KLASSERT(PageLocked(vmpage));
@@ -421,36 +360,15 @@ struct cl_page *cl_vmpage_page(struct page *vmpage, struct cl_object *obj)
 	 *       bottom-to-top pass.
 	 */
 
-	/*
-	 * This loop assumes that ->private points to the top-most page. This
-	 * can be rectified easily.
-	 */
-	top = (struct cl_page *)vmpage->private;
-	if (!top)
-		return NULL;
-
-	for (page = top; page; page = page->cp_child) {
-		if (cl_object_same(page->cp_obj, obj)) {
-			cl_page_get_trust(page);
-			break;
-		}
+	page = (struct cl_page *)vmpage->private;
+	if (page) {
+		cl_page_get_trust(page);
+		LASSERT(page->cp_type == CPT_CACHEABLE);
 	}
-	LASSERT(ergo(page, page->cp_type == CPT_CACHEABLE));
 	return page;
 }
 EXPORT_SYMBOL(cl_vmpage_page);
 
-/**
- * Returns the top-page for a given page.
- *
- * \see cl_object_top(), cl_io_top()
- */
-struct cl_page *cl_page_top(struct cl_page *page)
-{
-	return cl_page_top_trusted(page);
-}
-EXPORT_SYMBOL(cl_page_top);
-
 const struct cl_page_slice *cl_page_at(const struct cl_page *page,
 				       const struct lu_device_type *dtype)
 {
@@ -470,21 +388,14 @@ EXPORT_SYMBOL(cl_page_at);
 	int		       (*__method)_proto;		    \
 									\
 	__result = 0;						   \
-	__page = cl_page_top(__page);				   \
-	do {							    \
-		list_for_each_entry(__scan, &__page->cp_layers,     \
-					cpl_linkage) {		  \
-			__method = *(void **)((char *)__scan->cpl_ops + \
-					      __op);		    \
-			if (__method) {					\
-				__result = (*__method)(__env, __scan,   \
-						       ## __VA_ARGS__); \
-				if (__result != 0)		      \
-					break;			  \
-			}					       \
-		}						       \
-		__page = __page->cp_child;			      \
-	} while (__page && __result == 0);			      \
+	list_for_each_entry(__scan, &__page->cp_layers, cpl_linkage) {  \
+		__method = *(void **)((char *)__scan->cpl_ops +  __op); \
+		if (__method) {						\
+			__result = (*__method)(__env, __scan, ## __VA_ARGS__); \
+			if (__result != 0)				\
+				break;					\
+		}							\
+	}								\
 	if (__result > 0)					       \
 		__result = 0;					   \
 	__result;						       \
@@ -498,18 +409,11 @@ do {								    \
 	ptrdiff_t		   __op   = (_op);		     \
 	void		      (*__method)_proto;		    \
 									\
-	__page = cl_page_top(__page);				   \
-	do {							    \
-		list_for_each_entry(__scan, &__page->cp_layers,     \
-					cpl_linkage) {		  \
-			__method = *(void **)((char *)__scan->cpl_ops + \
-					      __op);		    \
-			if (__method)				   \
-				(*__method)(__env, __scan,	      \
-					    ## __VA_ARGS__);	    \
-		}						       \
-		__page = __page->cp_child;			      \
-	} while (__page);					       \
+	list_for_each_entry(__scan, &__page->cp_layers, cpl_linkage) {	\
+		__method = *(void **)((char *)__scan->cpl_ops + __op);	\
+		if (__method)						\
+			(*__method)(__env, __scan, ## __VA_ARGS__);	\
+	}								\
 } while (0)
 
 #define CL_PAGE_INVOID_REVERSE(_env, _page, _op, _proto, ...)	       \
@@ -520,20 +424,11 @@ do {									\
 	ptrdiff_t		   __op   = (_op);			 \
 	void		      (*__method)_proto;			\
 									    \
-	/* get to the bottom page. */				       \
-	while (__page->cp_child)					    \
-		__page = __page->cp_child;				  \
-	do {								\
-		list_for_each_entry_reverse(__scan, &__page->cp_layers, \
-						cpl_linkage) {	      \
-			__method = *(void **)((char *)__scan->cpl_ops +     \
-					      __op);			\
-			if (__method)				       \
-				(*__method)(__env, __scan,		  \
-					    ## __VA_ARGS__);		\
-		}							   \
-		__page = __page->cp_parent;				 \
-	} while (__page);						   \
+	list_for_each_entry_reverse(__scan, &__page->cp_layers, cpl_linkage) { \
+		__method = *(void **)((char *)__scan->cpl_ops + __op);	\
+		if (__method)						\
+			(*__method)(__env, __scan, ## __VA_ARGS__);	\
+	}								\
 } while (0)
 
 static int cl_page_invoke(const struct lu_env *env,
@@ -559,20 +454,17 @@ static void cl_page_invoid(const struct lu_env *env,
 
 static void cl_page_owner_clear(struct cl_page *page)
 {
-	for (page = cl_page_top(page); page; page = page->cp_child) {
-		if (page->cp_owner) {
-			LASSERT(page->cp_owner->ci_owned_nr > 0);
-			page->cp_owner->ci_owned_nr--;
-			page->cp_owner = NULL;
-			page->cp_task = NULL;
-		}
+	if (page->cp_owner) {
+		LASSERT(page->cp_owner->ci_owned_nr > 0);
+		page->cp_owner->ci_owned_nr--;
+		page->cp_owner = NULL;
+		page->cp_task = NULL;
 	}
 }
 
 static void cl_page_owner_set(struct cl_page *page)
 {
-	for (page = cl_page_top(page); page; page = page->cp_child)
-		page->cp_owner->ci_owned_nr++;
+	page->cp_owner->ci_owned_nr++;
 }
 
 void cl_page_disown0(const struct lu_env *env,
@@ -603,8 +495,9 @@ void cl_page_disown0(const struct lu_env *env,
  */
 int cl_page_is_owned(const struct cl_page *pg, const struct cl_io *io)
 {
+	struct cl_io *top = cl_io_top((struct cl_io *)io);
 	LINVRNT(cl_object_same(pg->cp_obj, io->ci_obj));
-	return pg->cp_state == CPS_OWNED && pg->cp_owner == io;
+	return pg->cp_state == CPS_OWNED && pg->cp_owner == top;
 }
 EXPORT_SYMBOL(cl_page_is_owned);
 
@@ -635,7 +528,6 @@ static int cl_page_own0(const struct lu_env *env, struct cl_io *io,
 
 	PINVRNT(env, pg, !cl_page_is_owned(pg, io));
 
-	pg = cl_page_top(pg);
 	io = cl_io_top(io);
 
 	if (pg->cp_state == CPS_FREEING) {
@@ -649,7 +541,7 @@ static int cl_page_own0(const struct lu_env *env, struct cl_io *io,
 		if (result == 0) {
 			PASSERT(env, pg, !pg->cp_owner);
 			PASSERT(env, pg, !pg->cp_req);
-			pg->cp_owner = io;
+			pg->cp_owner = cl_io_top(io);
 			pg->cp_task  = current;
 			cl_page_owner_set(pg);
 			if (pg->cp_state != CPS_FREEING) {
@@ -702,12 +594,11 @@ void cl_page_assume(const struct lu_env *env,
 {
 	PINVRNT(env, pg, cl_object_same(pg->cp_obj, io->ci_obj));
 
-	pg = cl_page_top(pg);
 	io = cl_io_top(io);
 
 	cl_page_invoid(env, io, pg, CL_PAGE_OP(cpo_assume));
 	PASSERT(env, pg, !pg->cp_owner);
-	pg->cp_owner = io;
+	pg->cp_owner = cl_io_top(io);
 	pg->cp_task = current;
 	cl_page_owner_set(pg);
 	cl_page_state_set(env, pg, CPS_OWNED);
@@ -731,7 +622,6 @@ void cl_page_unassume(const struct lu_env *env,
 	PINVRNT(env, pg, cl_page_is_owned(pg, io));
 	PINVRNT(env, pg, cl_page_invariant(pg));
 
-	pg = cl_page_top(pg);
 	io = cl_io_top(io);
 	cl_page_owner_clear(pg);
 	cl_page_state_set(env, pg, CPS_CACHED);
@@ -758,7 +648,6 @@ void cl_page_disown(const struct lu_env *env,
 {
 	PINVRNT(env, pg, cl_page_is_owned(pg, io));
 
-	pg = cl_page_top(pg);
 	io = cl_io_top(io);
 	cl_page_disown0(env, io, pg);
 }
@@ -791,7 +680,6 @@ EXPORT_SYMBOL(cl_page_discard);
  */
 static void cl_page_delete0(const struct lu_env *env, struct cl_page *pg)
 {
-	PASSERT(env, pg, pg == cl_page_top(pg));
 	PASSERT(env, pg, pg->cp_state != CPS_FREEING);
 
 	/*
@@ -825,7 +713,6 @@ static void cl_page_delete0(const struct lu_env *env, struct cl_page *pg)
  * Once page reaches cl_page_state::CPS_FREEING, all remaining references will
  * drain after some time, at which point page will be recycled.
  *
- * \pre  pg == cl_page_top(pg)
  * \pre  VM page is locked
  * \post pg->cp_state == CPS_FREEING
  *
@@ -865,7 +752,6 @@ int cl_page_is_vmlocked(const struct lu_env *env, const struct cl_page *pg)
 	int result;
 	const struct cl_page_slice *slice;
 
-	pg = cl_page_top_trusted((struct cl_page *)pg);
 	slice = container_of(pg->cp_layers.next,
 			     const struct cl_page_slice, cpl_linkage);
 	PASSERT(env, pg, slice->cpl_ops->cpo_is_vmlocked);
@@ -1082,9 +968,8 @@ void cl_page_header_print(const struct lu_env *env, void *cookie,
 			  lu_printer_t printer, const struct cl_page *pg)
 {
 	(*printer)(env, cookie,
-		   "page@%p[%d %p:%lu ^%p_%p %d %d %d %p %p %#x]\n",
+		   "page@%p[%d %p %d %d %d %p %p %#x]\n",
 		   pg, atomic_read(&pg->cp_ref), pg->cp_obj,
-		   pg->cp_index, pg->cp_parent, pg->cp_child,
 		   pg->cp_state, pg->cp_error, pg->cp_type,
 		   pg->cp_owner, pg->cp_req, pg->cp_flags);
 }
@@ -1096,11 +981,7 @@ EXPORT_SYMBOL(cl_page_header_print);
 void cl_page_print(const struct lu_env *env, void *cookie,
 		   lu_printer_t printer, const struct cl_page *pg)
 {
-	struct cl_page *scan;
-
-	for (scan = cl_page_top((struct cl_page *)pg); scan;
-	     scan = scan->cp_child)
-		cl_page_header_print(env, cookie, printer, scan);
+	cl_page_header_print(env, cookie, printer, pg);
 	CL_PAGE_INVOKE(env, (struct cl_page *)pg, CL_PAGE_OP(cpo_print),
 		       (const struct lu_env *env,
 			const struct cl_page_slice *slice,
diff --git a/drivers/staging/lustre/lustre/obdecho/echo_client.c b/drivers/staging/lustre/lustre/obdecho/echo_client.c
index 6c205f9..db56081 100644
--- a/drivers/staging/lustre/lustre/obdecho/echo_client.c
+++ b/drivers/staging/lustre/lustre/obdecho/echo_client.c
@@ -81,7 +81,6 @@ struct echo_object_conf {
 struct echo_page {
 	struct cl_page_slice   ep_cl;
 	struct mutex		ep_lock;
-	struct page	    *ep_vmpage;
 };
 
 struct echo_lock {
@@ -219,12 +218,6 @@ static struct lu_kmem_descr echo_caches[] = {
  *
  * @{
  */
-static struct page *echo_page_vmpage(const struct lu_env *env,
-				     const struct cl_page_slice *slice)
-{
-	return cl2echo_page(slice)->ep_vmpage;
-}
-
 static int echo_page_own(const struct lu_env *env,
 			 const struct cl_page_slice *slice,
 			 struct cl_io *io, int nonblock)
@@ -273,12 +266,10 @@ static void echo_page_completion(const struct lu_env *env,
 static void echo_page_fini(const struct lu_env *env,
 			   struct cl_page_slice *slice)
 {
-	struct echo_page *ep    = cl2echo_page(slice);
 	struct echo_object *eco = cl2echo_obj(slice->cpl_obj);
-	struct page *vmpage      = ep->ep_vmpage;
 
 	atomic_dec(&eco->eo_npages);
-	page_cache_release(vmpage);
+	page_cache_release(slice->cpl_page->cp_vmpage);
 }
 
 static int echo_page_prep(const struct lu_env *env,
@@ -295,7 +286,8 @@ static int echo_page_print(const struct lu_env *env,
 	struct echo_page *ep = cl2echo_page(slice);
 
 	(*printer)(env, cookie, LUSTRE_ECHO_CLIENT_NAME"-page@%p %d vm@%p\n",
-		   ep, mutex_is_locked(&ep->ep_lock), ep->ep_vmpage);
+		   ep, mutex_is_locked(&ep->ep_lock),
+		   slice->cpl_page->cp_vmpage);
 	return 0;
 }
 
@@ -303,7 +295,6 @@ static const struct cl_page_operations echo_page_ops = {
 	.cpo_own	   = echo_page_own,
 	.cpo_disown	= echo_page_disown,
 	.cpo_discard       = echo_page_discard,
-	.cpo_vmpage	= echo_page_vmpage,
 	.cpo_fini	  = echo_page_fini,
 	.cpo_print	 = echo_page_print,
 	.cpo_is_vmlocked   = echo_page_is_vmlocked,
@@ -367,13 +358,12 @@ static struct cl_lock_operations echo_lock_ops = {
  * @{
  */
 static int echo_page_init(const struct lu_env *env, struct cl_object *obj,
-			  struct cl_page *page, struct page *vmpage)
+			  struct cl_page *page, pgoff_t index)
 {
 	struct echo_page *ep = cl_object_page_slice(obj, page);
 	struct echo_object *eco = cl2echo_obj(obj);
 
-	ep->ep_vmpage = vmpage;
-	page_cache_get(vmpage);
+	page_cache_get(page->cp_vmpage);
 	mutex_init(&ep->ep_lock);
 	cl_page_slice_add(page, &ep->ep_cl, obj, &echo_page_ops);
 	atomic_inc(&eco->eo_npages);
@@ -568,6 +558,8 @@ static struct lu_object *echo_object_alloc(const struct lu_env *env,
 
 		obj = &echo_obj2cl(eco)->co_lu;
 		cl_object_header_init(hdr);
+		hdr->coh_page_bufsize = cfs_size_round(sizeof(struct cl_page));
+
 		lu_object_init(obj, &hdr->coh_lu, dev);
 		lu_object_add_top(&hdr->coh_lu, obj);
 
diff --git a/drivers/staging/lustre/lustre/osc/osc_cache.c b/drivers/staging/lustre/lustre/osc/osc_cache.c
index 3be4b1f..74607933 100644
--- a/drivers/staging/lustre/lustre/osc/osc_cache.c
+++ b/drivers/staging/lustre/lustre/osc/osc_cache.c
@@ -276,7 +276,7 @@ static int osc_extent_sanity_check0(struct osc_extent *ext,
 
 	page_count = 0;
 	list_for_each_entry(oap, &ext->oe_pages, oap_pending_item) {
-		pgoff_t index = oap2cl_page(oap)->cp_index;
+		pgoff_t index = osc_index(oap2osc(oap));
 		++page_count;
 		if (index > ext->oe_end || index < ext->oe_start) {
 			rc = 110;
@@ -991,19 +991,19 @@ static int osc_extent_truncate(struct osc_extent *ext, pgoff_t trunc_index,
 
 	/* discard all pages with index greater then trunc_index */
 	list_for_each_entry_safe(oap, tmp, &ext->oe_pages, oap_pending_item) {
-		struct cl_page *sub = oap2cl_page(oap);
-		struct cl_page *page = cl_page_top(sub);
+		pgoff_t index = osc_index(oap2osc(oap));
+		struct cl_page *page = oap2cl_page(oap);
 
 		LASSERT(list_empty(&oap->oap_rpc_item));
 
 		/* only discard the pages with their index greater than
 		 * trunc_index, and ...
 		 */
-		if (sub->cp_index < trunc_index ||
-		    (sub->cp_index == trunc_index && partial)) {
+		if (index < trunc_index ||
+		    (index == trunc_index && partial)) {
 			/* accounting how many pages remaining in the chunk
 			 * so that we can calculate grants correctly. */
-			if (sub->cp_index >> ppc_bits == trunc_chunk)
+			if (index >> ppc_bits == trunc_chunk)
 				++pages_in_chunk;
 			continue;
 		}
@@ -1256,7 +1256,7 @@ static int osc_make_ready(const struct lu_env *env, struct osc_async_page *oap,
 			  int cmd)
 {
 	struct osc_page *opg = oap2osc_page(oap);
-	struct cl_page *page = cl_page_top(oap2cl_page(oap));
+	struct cl_page  *page = oap2cl_page(oap);
 	int result;
 
 	LASSERT(cmd == OBD_BRW_WRITE); /* no cached reads */
@@ -1271,7 +1271,7 @@ static int osc_refresh_count(const struct lu_env *env,
 			     struct osc_async_page *oap, int cmd)
 {
 	struct osc_page *opg = oap2osc_page(oap);
-	struct cl_page *page = oap2cl_page(oap);
+	pgoff_t index = osc_index(oap2osc(oap));
 	struct cl_object *obj;
 	struct cl_attr *attr = &osc_env_info(env)->oti_attr;
 
@@ -1288,10 +1288,10 @@ static int osc_refresh_count(const struct lu_env *env,
 	if (result < 0)
 		return result;
 	kms = attr->cat_kms;
-	if (cl_offset(obj, page->cp_index) >= kms)
+	if (cl_offset(obj, index) >= kms)
 		/* catch race with truncate */
 		return 0;
-	else if (cl_offset(obj, page->cp_index + 1) > kms)
+	else if (cl_offset(obj, index + 1) > kms)
 		/* catch sub-page write at end of file */
 		return kms % PAGE_CACHE_SIZE;
 	else
@@ -1302,7 +1302,7 @@ static int osc_completion(const struct lu_env *env, struct osc_async_page *oap,
 			  int cmd, int rc)
 {
 	struct osc_page *opg = oap2osc_page(oap);
-	struct cl_page *page = cl_page_top(oap2cl_page(oap));
+	struct cl_page    *page = oap2cl_page(oap);
 	struct osc_object *obj = cl2osc(opg->ops_cl.cpl_obj);
 	enum cl_req_type crt;
 	int srvlock;
@@ -2313,7 +2313,7 @@ int osc_queue_async_io(const struct lu_env *env, struct cl_io *io,
 	OSC_IO_DEBUG(osc, "oap %p page %p added for cmd %d\n",
 		     oap, oap->oap_page, oap->oap_cmd & OBD_BRW_RWMASK);
 
-	index = oap2cl_page(oap)->cp_index;
+	index = osc_index(oap2osc(oap));
 
 	/* Add this page into extent by the following steps:
 	 * 1. if there exists an active extent for this IO, mostly this page
@@ -2425,21 +2425,21 @@ int osc_teardown_async_page(const struct lu_env *env,
 	LASSERT(oap->oap_magic == OAP_MAGIC);
 
 	CDEBUG(D_INFO, "teardown oap %p page %p at index %lu.\n",
-	       oap, ops, oap2cl_page(oap)->cp_index);
+	       oap, ops, osc_index(oap2osc(oap)));
 
 	osc_object_lock(obj);
 	if (!list_empty(&oap->oap_rpc_item)) {
 		CDEBUG(D_CACHE, "oap %p is not in cache.\n", oap);
 		rc = -EBUSY;
 	} else if (!list_empty(&oap->oap_pending_item)) {
-		ext = osc_extent_lookup(obj, oap2cl_page(oap)->cp_index);
+		ext = osc_extent_lookup(obj, osc_index(oap2osc(oap)));
 		/* only truncated pages are allowed to be taken out.
 		 * See osc_extent_truncate() and osc_cache_truncate_start()
 		 * for details.
 		 */
 		if (ext && ext->oe_state != OES_TRUNC) {
 			OSC_EXTENT_DUMP(D_ERROR, ext, "trunc at %lu.\n",
-					oap2cl_page(oap)->cp_index);
+					osc_index(oap2osc(oap)));
 			rc = -EBUSY;
 		}
 	}
@@ -2462,7 +2462,7 @@ int osc_flush_async_page(const struct lu_env *env, struct cl_io *io,
 	struct osc_extent *ext = NULL;
 	struct osc_object *obj = cl2osc(ops->ops_cl.cpl_obj);
 	struct cl_page *cp = ops->ops_cl.cpl_page;
-	pgoff_t	index = cp->cp_index;
+	pgoff_t            index = osc_index(ops);
 	struct osc_async_page *oap = &ops->ops_oap;
 	bool unplug = false;
 	int rc = 0;
@@ -2477,8 +2477,7 @@ int osc_flush_async_page(const struct lu_env *env, struct cl_io *io,
 	switch (ext->oe_state) {
 	case OES_RPC:
 	case OES_LOCK_DONE:
-		CL_PAGE_DEBUG(D_ERROR, env, cl_page_top(cp),
-			      "flush an in-rpc page?\n");
+		CL_PAGE_DEBUG(D_ERROR, env, cp, "flush an in-rpc page?\n");
 		LASSERT(0);
 		break;
 	case OES_LOCKING:
@@ -2504,7 +2503,7 @@ int osc_flush_async_page(const struct lu_env *env, struct cl_io *io,
 		break;
 	}
 
-	rc = cl_page_prep(env, io, cl_page_top(cp), CRT_WRITE);
+	rc = cl_page_prep(env, io, cp, CRT_WRITE);
 	if (rc)
 		goto out;
 
@@ -2548,7 +2547,7 @@ int osc_cancel_async_page(const struct lu_env *env, struct osc_page *ops)
 	struct osc_extent *ext;
 	struct osc_extent *found = NULL;
 	struct list_head *plist;
-	pgoff_t index = oap2cl_page(oap)->cp_index;
+	pgoff_t index = osc_index(ops);
 	int rc = -EBUSY;
 	int cmd;
 
@@ -2611,12 +2610,12 @@ int osc_queue_sync_pages(const struct lu_env *env, struct osc_object *obj,
 	pgoff_t end = 0;
 
 	list_for_each_entry(oap, list, oap_pending_item) {
-		struct cl_page *cp = oap2cl_page(oap);
+		pgoff_t index = osc_index(oap2osc(oap));
 
-		if (cp->cp_index > end)
-			end = cp->cp_index;
-		if (cp->cp_index < start)
-			start = cp->cp_index;
+		if (index > end)
+			end = index;
+		if (index < start)
+			start = index;
 		++page_count;
 		mppr <<= (page_count > mppr);
 	}
@@ -3033,7 +3032,7 @@ int osc_page_gang_lookup(const struct lu_env *env, struct cl_io *io,
 				break;
 			}
 
-			page = cl_page_top(ops->ops_cl.cpl_page);
+			page = ops->ops_cl.cpl_page;
 			LASSERT(page->cp_type == CPT_CACHEABLE);
 			if (page->cp_state == CPS_FREEING)
 				continue;
@@ -3061,7 +3060,7 @@ int osc_page_gang_lookup(const struct lu_env *env, struct cl_io *io,
 			if (res == CLP_GANG_OKAY)
 				res = (*cb)(env, io, ops, cbdata);
 
-			page = cl_page_top(ops->ops_cl.cpl_page);
+			page = ops->ops_cl.cpl_page;
 			lu_ref_del(&page->cp_reference, "gang_lookup", current);
 			cl_page_put(env, page);
 		}
@@ -3094,7 +3093,7 @@ static int check_and_discard_cb(const struct lu_env *env, struct cl_io *io,
 	index = osc_index(ops);
 	if (index >= info->oti_fn_index) {
 		struct cl_lock *tmp;
-		struct cl_page *page = cl_page_top(ops->ops_cl.cpl_page);
+		struct cl_page *page = ops->ops_cl.cpl_page;
 
 		/* refresh non-overlapped index */
 		tmp = cl_lock_at_pgoff(env, lock->cll_descr.cld_obj, index,
@@ -3127,7 +3126,7 @@ static int discard_cb(const struct lu_env *env, struct cl_io *io,
 {
 	struct osc_thread_info *info = osc_env_info(env);
 	struct cl_lock *lock = cbdata;
-	struct cl_page *page = cl_page_top(ops->ops_cl.cpl_page);
+	struct cl_page *page = ops->ops_cl.cpl_page;
 
 	LASSERT(lock->cll_descr.cld_mode >= CLM_WRITE);
 
@@ -3135,7 +3134,7 @@ static int discard_cb(const struct lu_env *env, struct cl_io *io,
 	info->oti_next_index = osc_index(ops) + 1;
 	if (cl_page_own(env, io, page) == 0) {
 		KLASSERT(ergo(page->cp_type == CPT_CACHEABLE,
-			      !PageDirty(cl_page_vmpage(env, page))));
+			      !PageDirty(cl_page_vmpage(page))));
 
 		/* discard the page */
 		cl_page_discard(env, io, page);
diff --git a/drivers/staging/lustre/lustre/osc/osc_cl_internal.h b/drivers/staging/lustre/lustre/osc/osc_cl_internal.h
index cf87043..89552d7 100644
--- a/drivers/staging/lustre/lustre/osc/osc_cl_internal.h
+++ b/drivers/staging/lustre/lustre/osc/osc_cl_internal.h
@@ -416,7 +416,7 @@ struct lu_object *osc_object_alloc(const struct lu_env *env,
 				   const struct lu_object_header *hdr,
 				   struct lu_device *dev);
 int osc_page_init(const struct lu_env *env, struct cl_object *obj,
-		  struct cl_page *page, struct page *vmpage);
+		  struct cl_page *page, pgoff_t ind);
 
 void osc_index2policy  (ldlm_policy_data_t *policy, const struct cl_object *obj,
 			pgoff_t start, pgoff_t end);
@@ -553,6 +553,11 @@ static inline struct osc_page *oap2osc(struct osc_async_page *oap)
 	return container_of0(oap, struct osc_page, ops_oap);
 }
 
+static inline pgoff_t osc_index(struct osc_page *opg)
+{
+	return opg->ops_cl.cpl_index;
+}
+
 static inline struct cl_page *oap2cl_page(struct osc_async_page *oap)
 {
 	return oap2osc(oap)->ops_cl.cpl_page;
@@ -563,11 +568,6 @@ static inline struct osc_page *oap2osc_page(struct osc_async_page *oap)
 	return (struct osc_page *)container_of(oap, struct osc_page, ops_oap);
 }
 
-static inline pgoff_t osc_index(struct osc_page *opg)
-{
-	return opg->ops_cl.cpl_page->cp_index;
-}
-
 static inline struct osc_lock *cl2osc_lock(const struct cl_lock_slice *slice)
 {
 	LINVRNT(osc_is_object(&slice->cls_obj->co_lu));
diff --git a/drivers/staging/lustre/lustre/osc/osc_io.c b/drivers/staging/lustre/lustre/osc/osc_io.c
index e9e18a1..1ae8a22 100644
--- a/drivers/staging/lustre/lustre/osc/osc_io.c
+++ b/drivers/staging/lustre/lustre/osc/osc_io.c
@@ -68,11 +68,15 @@ static struct osc_io *cl2osc_io(const struct lu_env *env,
 	return oio;
 }
 
-static struct osc_page *osc_cl_page_osc(struct cl_page *page)
+static struct osc_page *osc_cl_page_osc(struct cl_page *page,
+					struct osc_object *osc)
 {
 	const struct cl_page_slice *slice;
 
-	slice = cl_page_at(page, &osc_device_type);
+	if (osc)
+		slice = cl_object_page_slice(&osc->oo_cl, page);
+	else
+		slice = cl_page_at(page, &osc_device_type);
 	LASSERT(slice);
 
 	return cl2osc_page(slice);
@@ -137,7 +141,7 @@ static int osc_io_submit(const struct lu_env *env,
 		io = page->cp_owner;
 		LASSERT(io);
 
-		opg = osc_cl_page_osc(page);
+		opg = osc_cl_page_osc(page, osc);
 		oap = &opg->ops_oap;
 		LASSERT(osc == oap->oap_obj);
 
@@ -258,15 +262,11 @@ static int osc_io_commit_async(const struct lu_env *env,
 		}
 	}
 
-	/*
-	 * NOTE: here @page is a top-level page. This is done to avoid
-	 * creation of sub-page-list.
-	 */
 	while (qin->pl_nr > 0) {
 		struct osc_async_page *oap;
 
 		page = cl_page_list_first(qin);
-		opg = osc_cl_page_osc(page);
+		opg = osc_cl_page_osc(page, osc);
 		oap = &opg->ops_oap;
 
 		if (!list_empty(&oap->oap_rpc_item)) {
@@ -283,8 +283,7 @@ static int osc_io_commit_async(const struct lu_env *env,
 				break;
 		}
 
-		osc_page_touch_at(env, osc2cl(osc),
-				  opg->ops_cl.cpl_page->cp_index,
+		osc_page_touch_at(env, osc2cl(osc), osc_index(opg),
 				  page == last_page ? to : PAGE_SIZE);
 
 		cl_page_list_del(env, qin, page);
@@ -403,14 +402,9 @@ static int trunc_check_cb(const struct lu_env *env, struct cl_io *io,
 		CL_PAGE_DEBUG(D_ERROR, env, page, "exists %llu/%s.\n",
 			      start, current->comm);
 
-	{
-		struct page *vmpage = cl_page_vmpage(env, page);
-
-		if (PageLocked(vmpage))
-			CDEBUG(D_CACHE, "page %p index %lu locked for %d.\n",
-			       ops, page->cp_index,
-			       (oap->oap_cmd & OBD_BRW_RWMASK));
-	}
+	if (PageLocked(page->cp_vmpage))
+		CDEBUG(D_CACHE, "page %p index %lu locked for %d.\n",
+		       ops, osc_index(ops), oap->oap_cmd & OBD_BRW_RWMASK);
 
 	return CLP_GANG_OKAY;
 }
@@ -788,18 +782,21 @@ static void osc_req_attr_set(const struct lu_env *env,
 		oa->o_valid |= OBD_MD_FLID;
 	}
 	if (flags & OBD_MD_FLHANDLE) {
+		struct cl_object *subobj;
+
 		clerq = slice->crs_req;
 		LASSERT(!list_empty(&clerq->crq_pages));
 		apage = container_of(clerq->crq_pages.next,
 				     struct cl_page, cp_flight);
-		opg = osc_cl_page_osc(apage);
-		apage = opg->ops_cl.cpl_page; /* now apage is a sub-page */
-		lock = cl_lock_at_page(env, apage->cp_obj, apage, NULL, 1, 1);
+		opg = osc_cl_page_osc(apage, NULL);
+		subobj = opg->ops_cl.cpl_obj;
+		lock = cl_lock_at_pgoff(env, subobj, osc_index(opg),
+					NULL, 1, 1);
 		if (!lock) {
 			struct cl_object_header *head;
 			struct cl_lock *scan;
 
-			head = cl_object_header(apage->cp_obj);
+			head = cl_object_header(subobj);
 			list_for_each_entry(scan, &head->coh_locks, cll_linkage)
 				CL_LOCK_DEBUG(D_ERROR, env, scan,
 					      "no cover page!\n");
diff --git a/drivers/staging/lustre/lustre/osc/osc_page.c b/drivers/staging/lustre/lustre/osc/osc_page.c
index 8dc62fa..3e0a8c3 100644
--- a/drivers/staging/lustre/lustre/osc/osc_page.c
+++ b/drivers/staging/lustre/lustre/osc/osc_page.c
@@ -64,14 +64,9 @@ static int osc_page_protected(const struct lu_env *env,
  * Page operations.
  *
  */
-static void osc_page_fini(const struct lu_env *env,
-			  struct cl_page_slice *slice)
-{
-}
-
 static void osc_page_transfer_get(struct osc_page *opg, const char *label)
 {
-	struct cl_page *page = cl_page_top(opg->ops_cl.cpl_page);
+	struct cl_page *page = opg->ops_cl.cpl_page;
 
 	LASSERT(!opg->ops_transfer_pinned);
 	cl_page_get(page);
@@ -82,7 +77,7 @@ static void osc_page_transfer_get(struct osc_page *opg, const char *label)
 static void osc_page_transfer_put(const struct lu_env *env,
 				  struct osc_page *opg)
 {
-	struct cl_page *page = cl_page_top(opg->ops_cl.cpl_page);
+	struct cl_page *page = opg->ops_cl.cpl_page;
 
 	if (opg->ops_transfer_pinned) {
 		opg->ops_transfer_pinned = 0;
@@ -139,11 +134,12 @@ static int osc_page_is_under_lock(const struct lu_env *env,
 				  const struct cl_page_slice *slice,
 				  struct cl_io *unused)
 {
+	struct osc_page *opg = cl2osc_page(slice);
 	struct cl_lock *lock;
 	int result = -ENODATA;
 
-	lock = cl_lock_at_page(env, slice->cpl_obj, slice->cpl_page,
-			       NULL, 1, 0);
+	lock = cl_lock_at_pgoff(env, slice->cpl_obj, osc_index(opg),
+				NULL, 1, 0);
 	if (lock) {
 		cl_lock_put(env, lock);
 		result = -EBUSY;
@@ -173,8 +169,8 @@ static int osc_page_print(const struct lu_env *env,
 	struct osc_object *obj = cl2osc(slice->cpl_obj);
 	struct client_obd *cli = &osc_export(obj)->exp_obd->u.cli;
 
-	return (*printer)(env, cookie, LUSTRE_OSC_NAME "-page@%p: 1< %#x %d %u %s %s > 2< %llu %u %u %#x %#x | %p %p %p > 3< %s %p %d %lu %d > 4< %d %d %d %lu %s | %s %s %s %s > 5< %s %s %s %s | %d %s | %d %s %s>\n",
-			  opg,
+	return (*printer)(env, cookie, LUSTRE_OSC_NAME "-page@%p %lu: 1< %#x %d %u %s %s > 2< %llu %u %u %#x %#x | %p %p %p > 3< %s %p %d %lu %d > 4< %d %d %d %lu %s | %s %s %s %s > 5< %s %s %s %s | %d %s | %d %s %s>\n",
+			  opg, osc_index(opg),
 			  /* 1 */
 			  oap->oap_magic, oap->oap_cmd,
 			  oap->oap_interrupted,
@@ -222,7 +218,7 @@ static void osc_page_delete(const struct lu_env *env,
 	osc_page_transfer_put(env, opg);
 	rc = osc_teardown_async_page(env, obj, opg);
 	if (rc) {
-		CL_PAGE_DEBUG(D_ERROR, env, cl_page_top(slice->cpl_page),
+		CL_PAGE_DEBUG(D_ERROR, env, slice->cpl_page,
 			      "Trying to teardown failed: %d\n", rc);
 		LASSERT(0);
 	}
@@ -295,7 +291,6 @@ static int osc_page_flush(const struct lu_env *env,
 }
 
 static const struct cl_page_operations osc_page_ops = {
-	.cpo_fini	  = osc_page_fini,
 	.cpo_print	 = osc_page_print,
 	.cpo_delete	= osc_page_delete,
 	.cpo_is_under_lock = osc_page_is_under_lock,
@@ -305,7 +300,7 @@ static const struct cl_page_operations osc_page_ops = {
 };
 
 int osc_page_init(const struct lu_env *env, struct cl_object *obj,
-		  struct cl_page *page, struct page *vmpage)
+		  struct cl_page *page, pgoff_t index)
 {
 	struct osc_object *osc = cl2osc(obj);
 	struct osc_page *opg = cl_object_page_slice(obj, page);
@@ -313,9 +308,10 @@ int osc_page_init(const struct lu_env *env, struct cl_object *obj,
 
 	opg->ops_from = 0;
 	opg->ops_to = PAGE_CACHE_SIZE;
+	opg->ops_cl.cpl_index = index;
 
-	result = osc_prep_async_page(osc, opg, vmpage,
-				     cl_offset(obj, page->cp_index));
+	result = osc_prep_async_page(osc, opg, page->cp_vmpage,
+				     cl_offset(obj, index));
 	if (result == 0) {
 		struct osc_io *oio = osc_env_io(env);
 
@@ -337,8 +333,7 @@ int osc_page_init(const struct lu_env *env, struct cl_object *obj,
 		result = osc_lru_reserve(env, osc, opg);
 		if (result == 0) {
 			spin_lock(&osc->oo_tree_lock);
-			result = radix_tree_insert(&osc->oo_tree,
-						   page->cp_index, opg);
+			result = radix_tree_insert(&osc->oo_tree, index, opg);
 			if (result == 0)
 				++osc->oo_npages;
 			spin_unlock(&osc->oo_tree_lock);
@@ -584,7 +579,7 @@ int osc_lru_shrink(const struct lu_env *env, struct client_obd *cli,
 		if (--maxscan < 0)
 			break;
 
-		page = cl_page_top(opg->ops_cl.cpl_page);
+		page = opg->ops_cl.cpl_page;
 		if (cl_page_in_use_noref(page)) {
 			list_move_tail(&opg->ops_lru, &cli->cl_lru_list);
 			continue;
-- 
2.1.0

* [PATCH v2 12/46] staging/lustre/clio: optimize read ahead code
  2016-03-30 23:48 [PATCH v2 00/46] Lustre IO stack simplifications and cleanups green
                   ` (10 preceding siblings ...)
  2016-03-30 23:48 ` [PATCH v2 11/46] staging/lustre/clio: remove stackable cl_page completely green
@ 2016-03-30 23:48 ` green
  2016-03-30 23:48 ` [PATCH v2 13/46] staging/lustre/llite: remove lli_lvb green
                   ` (33 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: green @ 2016-03-30 23:48 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger
  Cc: Linux Kernel Mailing List, Lustre Development List,
	Jinshan Xiong, Oleg Drokin

From: Jinshan Xiong <jinshan.xiong@intel.com>

The readahead code used to check that each page in the readahead
window was covered by a lock underneath; now cpo_page_is_under_lock()
provides @max_index to help decide the maximum readahead window.
@max_index can be modified: by OSC to extend the maximum lock region,
by LOV to align it to a stripe boundary, and by LLITE to make sure
the readahead region at least covers the read region.

With this in place, the readahead code usually only needs to call
cpo_page_is_under_lock() once per stripe rather than once per page.

Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-on: http://review.whamcloud.com/8523
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3321
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
---
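To illustrate the new calling convention, a minimal sketch follows
(not part of this patch). ra_scan() and ra_next_page() are made-up
names, and the convention that a 0 return means "covered, with
*max_index raised to the end of the covering region" is an assumption
based on the interface change to cl_page_is_under_lock() below:

/*
 * Hypothetical readahead scan: query lock coverage once per covered
 * region (e.g. once per stripe) instead of once per page, by skipping
 * the query while the index stays at or below the last reported
 * max_index.
 */
static pgoff_t ra_scan(const struct lu_env *env, struct cl_io *io,
		       pgoff_t start, pgoff_t end)
{
	pgoff_t max_index = 0;
	bool covered = false;
	pgoff_t index;

	for (index = start; index <= end; index++) {
		struct cl_page *page;

		/* still inside a region a previous query proved covered */
		if (covered && index <= max_index)
			continue;

		/* assumed helper: look up/create the cl_page at @index */
		page = ra_next_page(env, io, index);
		if (!page)
			break;

		/* assumed: returns 0 and updates max_index when a lock
		 * covers this index */
		if (cl_page_is_under_lock(env, io, page, &max_index) != 0)
			break;
		covered = true;
		/* page reference handling omitted for brevity */
	}

	return index; /* where the scan stopped */
}

With one DLM lock per stripe, the loop above degenerates to a single
coverage query per stripe, which is the behaviour described above.
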
 drivers/staging/lustre/lustre/include/cl_object.h  |   6 +-
 drivers/staging/lustre/lustre/include/lclient.h    |   2 -
 drivers/staging/lustre/lustre/llite/lcommon_cl.c   |  28 ------
 .../staging/lustre/lustre/llite/llite_internal.h   |   7 +-
 drivers/staging/lustre/lustre/llite/lproc_llite.c  |   1 +
 drivers/staging/lustre/lustre/llite/rw.c           | 103 ++++++++++++++-------
 drivers/staging/lustre/lustre/llite/vvp_io.c       |  23 +----
 drivers/staging/lustre/lustre/llite/vvp_page.c     |  26 ++++--
 .../staging/lustre/lustre/lov/lov_cl_internal.h    |   1 +
 drivers/staging/lustre/lustre/lov/lov_internal.h   |   2 +
 drivers/staging/lustre/lustre/lov/lov_io.c         |   2 +-
 drivers/staging/lustre/lustre/lov/lov_offset.c     |  13 +++
 drivers/staging/lustre/lustre/lov/lov_page.c       |  60 ++++++++++--
 drivers/staging/lustre/lustre/lov/lovsub_page.c    |   4 +-
 drivers/staging/lustre/lustre/obdclass/cl_io.c     |   2 +-
 drivers/staging/lustre/lustre/obdclass/cl_page.c   |  39 ++++++--
 .../staging/lustre/lustre/obdecho/echo_client.c    |   2 +-
 drivers/staging/lustre/lustre/osc/osc_page.c       |  10 +-
 18 files changed, 208 insertions(+), 123 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/cl_object.h b/drivers/staging/lustre/lustre/include/cl_object.h
index 5b65854..69b40f5 100644
--- a/drivers/staging/lustre/lustre/include/cl_object.h
+++ b/drivers/staging/lustre/lustre/include/cl_object.h
@@ -935,7 +935,7 @@ struct cl_page_operations {
 	 */
 	int (*cpo_is_under_lock)(const struct lu_env *env,
 				 const struct cl_page_slice *slice,
-				 struct cl_io *io);
+				 struct cl_io *io, pgoff_t *max);
 
 	/**
 	 * Optional debugging helper. Prints given page slice.
@@ -2674,7 +2674,7 @@ static inline void cl_device_fini(struct cl_device *d)
 }
 
 void cl_page_slice_add(struct cl_page *page, struct cl_page_slice *slice,
-		       struct cl_object *obj,
+		       struct cl_object *obj, pgoff_t index,
 		       const struct cl_page_operations *ops);
 void cl_lock_slice_add(struct cl_lock *lock, struct cl_lock_slice *slice,
 		       struct cl_object *obj,
@@ -2826,7 +2826,7 @@ void cl_page_delete(const struct lu_env *env, struct cl_page *pg);
 int cl_page_is_vmlocked(const struct lu_env *env, const struct cl_page *pg);
 void cl_page_export(const struct lu_env *env, struct cl_page *pg, int uptodate);
 int cl_page_is_under_lock(const struct lu_env *env, struct cl_io *io,
-			  struct cl_page *page);
+			  struct cl_page *page, pgoff_t *max_index);
 loff_t cl_offset(const struct cl_object *obj, pgoff_t idx);
 pgoff_t cl_index(const struct cl_object *obj, loff_t offset);
 int cl_page_size(const struct cl_object *obj);
diff --git a/drivers/staging/lustre/lustre/include/lclient.h b/drivers/staging/lustre/lustre/include/lclient.h
index c91fb01..a8c8788 100644
--- a/drivers/staging/lustre/lustre/include/lclient.h
+++ b/drivers/staging/lustre/lustre/include/lclient.h
@@ -299,8 +299,6 @@ int ccc_lock_init(const struct lu_env *env, struct cl_object *obj,
 		  const struct cl_lock_operations *lkops);
 int ccc_object_glimpse(const struct lu_env *env,
 		       const struct cl_object *obj, struct ost_lvb *lvb);
-int ccc_page_is_under_lock(const struct lu_env *env,
-			   const struct cl_page_slice *slice, struct cl_io *io);
 int ccc_fail(const struct lu_env *env, const struct cl_page_slice *slice);
 int ccc_transient_page_prep(const struct lu_env *env,
 			    const struct cl_page_slice *slice,
diff --git a/drivers/staging/lustre/lustre/llite/lcommon_cl.c b/drivers/staging/lustre/lustre/llite/lcommon_cl.c
index 55fa0da..e34d832 100644
--- a/drivers/staging/lustre/lustre/llite/lcommon_cl.c
+++ b/drivers/staging/lustre/lustre/llite/lcommon_cl.c
@@ -452,34 +452,6 @@ static void ccc_object_size_unlock(struct cl_object *obj)
  *
  */
 
-int ccc_page_is_under_lock(const struct lu_env *env,
-			   const struct cl_page_slice *slice,
-			   struct cl_io *io)
-{
-	struct ccc_io	*cio  = ccc_env_io(env);
-	struct cl_lock_descr *desc = &ccc_env_info(env)->cti_descr;
-	struct cl_page       *page = slice->cpl_page;
-
-	int result;
-
-	if (io->ci_type == CIT_READ || io->ci_type == CIT_WRITE ||
-	    io->ci_type == CIT_FAULT) {
-		if (cio->cui_fd->fd_flags & LL_FILE_GROUP_LOCKED) {
-			result = -EBUSY;
-		} else {
-			desc->cld_start = ccc_index(cl2ccc_page(slice));
-			desc->cld_end   = ccc_index(cl2ccc_page(slice));
-			desc->cld_obj   = page->cp_obj;
-			desc->cld_mode  = CLM_READ;
-			result = cl_queue_match(&io->ci_lockset.cls_done,
-						desc) ? -EBUSY : 0;
-		}
-	} else {
-		result = 0;
-	}
-	return result;
-}
-
 int ccc_fail(const struct lu_env *env, const struct cl_page_slice *slice)
 {
 	/*
diff --git a/drivers/staging/lustre/lustre/llite/llite_internal.h b/drivers/staging/lustre/lustre/llite/llite_internal.h
index bc83147..cd69173 100644
--- a/drivers/staging/lustre/lustre/llite/llite_internal.h
+++ b/drivers/staging/lustre/lustre/llite/llite_internal.h
@@ -328,6 +328,7 @@ enum ra_stat {
 	RA_STAT_EOF,
 	RA_STAT_MAX_IN_FLIGHT,
 	RA_STAT_WRONG_GRAB_PAGE,
+	RA_STAT_FAILED_REACH_END,
 	_NR_RA_STAT,
 };
 
@@ -702,8 +703,8 @@ int ll_writepages(struct address_space *, struct writeback_control *wbc);
 int ll_readpage(struct file *file, struct page *page);
 void ll_readahead_init(struct inode *inode, struct ll_readahead_state *ras);
 int ll_readahead(const struct lu_env *env, struct cl_io *io,
-		 struct ll_readahead_state *ras, struct address_space *mapping,
-		 struct cl_page_list *queue, int flags);
+		 struct cl_page_list *queue, struct ll_readahead_state *ras,
+		 bool hit);
 int vvp_io_write_commit(const struct lu_env *env, struct cl_io *io);
 struct ll_cl_context *ll_cl_init(struct file *file, struct page *vmpage);
 void ll_cl_fini(struct ll_cl_context *lcc);
@@ -1074,7 +1075,7 @@ void ras_update(struct ll_sb_info *sbi, struct inode *inode,
 		struct ll_readahead_state *ras, unsigned long index,
 		unsigned hit);
 void ll_ra_count_put(struct ll_sb_info *sbi, unsigned long len);
-void ll_ra_stats_inc(struct address_space *mapping, enum ra_stat which);
+void ll_ra_stats_inc(struct inode *inode, enum ra_stat which);
 
 /* llite/llite_rmtacl.c */
 #ifdef CONFIG_FS_POSIX_ACL
diff --git a/drivers/staging/lustre/lustre/llite/lproc_llite.c b/drivers/staging/lustre/lustre/llite/lproc_llite.c
index 9e8e61a..091144f 100644
--- a/drivers/staging/lustre/lustre/llite/lproc_llite.c
+++ b/drivers/staging/lustre/lustre/llite/lproc_llite.c
@@ -960,6 +960,7 @@ static const char *ra_stat_string[] = {
 	[RA_STAT_EOF] = "read-ahead to EOF",
 	[RA_STAT_MAX_IN_FLIGHT] = "hit max r-a issue",
 	[RA_STAT_WRONG_GRAB_PAGE] = "wrong page from grab_cache_page",
+	[RA_STAT_FAILED_REACH_END] = "failed to reach end"
 };
 
 int ldebugfs_register_mountpoint(struct dentry *parent,
diff --git a/drivers/staging/lustre/lustre/llite/rw.c b/drivers/staging/lustre/lustre/llite/rw.c
index b1375f1..ad15058 100644
--- a/drivers/staging/lustre/lustre/llite/rw.c
+++ b/drivers/staging/lustre/lustre/llite/rw.c
@@ -166,7 +166,7 @@ static void ll_ra_stats_inc_sbi(struct ll_sb_info *sbi, enum ra_stat which);
  */
 static unsigned long ll_ra_count_get(struct ll_sb_info *sbi,
 				     struct ra_io_arg *ria,
-				     unsigned long pages)
+				     unsigned long pages, unsigned long min)
 {
 	struct ll_ra_info *ra = &sbi->ll_ra_info;
 	long ret;
@@ -206,6 +206,11 @@ static unsigned long ll_ra_count_get(struct ll_sb_info *sbi,
 	}
 
 out:
+	if (ret < min) {
+		/* override ra limit for maximum performance */
+		atomic_add(min - ret, &ra->ra_cur_pages);
+		ret = min;
+	}
 	return ret;
 }
 
@@ -222,9 +227,9 @@ static void ll_ra_stats_inc_sbi(struct ll_sb_info *sbi, enum ra_stat which)
 	lprocfs_counter_incr(sbi->ll_ra_stats, which);
 }
 
-void ll_ra_stats_inc(struct address_space *mapping, enum ra_stat which)
+void ll_ra_stats_inc(struct inode *inode, enum ra_stat which)
 {
-	struct ll_sb_info *sbi = ll_i2sbi(mapping->host);
+	struct ll_sb_info *sbi = ll_i2sbi(inode);
 
 	ll_ra_stats_inc_sbi(sbi, which);
 }
@@ -290,7 +295,7 @@ void ll_ra_read_ex(struct file *f, struct ll_ra_read *rar)
 
 static int cl_read_ahead_page(const struct lu_env *env, struct cl_io *io,
 			      struct cl_page_list *queue, struct cl_page *page,
-			      struct cl_object *clob)
+			      struct cl_object *clob, pgoff_t *max_index)
 {
 	struct page *vmpage = page->cp_vmpage;
 	struct ccc_page *cp;
@@ -301,8 +306,11 @@ static int cl_read_ahead_page(const struct lu_env *env, struct cl_io *io,
 	lu_ref_add(&page->cp_reference, "ra", current);
 	cp = cl2ccc_page(cl_object_page_slice(clob, page));
 	if (!cp->cpg_defer_uptodate && !PageUptodate(vmpage)) {
-		rc = cl_page_is_under_lock(env, io, page);
-		if (rc == -EBUSY) {
+		CDEBUG(D_READA, "page index %lu, max_index: %lu\n",
+		       ccc_index(cp), *max_index);
+		if (*max_index == 0 || ccc_index(cp) > *max_index)
+			rc = cl_page_is_under_lock(env, io, page, max_index);
+		if (rc == 0) {
 			cp->cpg_defer_uptodate = 1;
 			cp->cpg_ra_used = 0;
 			cl_page_list_add(queue, page);
@@ -332,24 +340,25 @@ static int cl_read_ahead_page(const struct lu_env *env, struct cl_io *io,
  */
 static int ll_read_ahead_page(const struct lu_env *env, struct cl_io *io,
 			      struct cl_page_list *queue,
-			      pgoff_t index, struct address_space *mapping)
+			      pgoff_t index, pgoff_t *max_index)
 {
+	struct cl_object *clob  = io->ci_obj;
+	struct inode     *inode = ccc_object_inode(clob);
 	struct page      *vmpage;
-	struct cl_object *clob  = ll_i2info(mapping->host)->lli_clob;
 	struct cl_page   *page;
 	enum ra_stat      which = _NR_RA_STAT; /* keep gcc happy */
 	int	       rc    = 0;
 	const char       *msg   = NULL;
 
-	vmpage = grab_cache_page_nowait(mapping, index);
+	vmpage = grab_cache_page_nowait(inode->i_mapping, index);
 	if (vmpage) {
 		/* Check if vmpage was truncated or reclaimed */
-		if (vmpage->mapping == mapping) {
+		if (vmpage->mapping == inode->i_mapping) {
 			page = cl_page_find(env, clob, vmpage->index,
 					    vmpage, CPT_CACHEABLE);
 			if (!IS_ERR(page)) {
 				rc = cl_read_ahead_page(env, io, queue,
-							page, clob);
+							page, clob, max_index);
 				if (rc == -ENOLCK) {
 					which = RA_STAT_FAILED_MATCH;
 					msg   = "lock match failed";
@@ -370,7 +379,7 @@ static int ll_read_ahead_page(const struct lu_env *env, struct cl_io *io,
 		msg   = "g_c_p_n failed";
 	}
 	if (msg) {
-		ll_ra_stats_inc(mapping, which);
+		ll_ra_stats_inc(inode, which);
 		CDEBUG(D_READA, "%s\n", msg);
 	}
 	return rc;
@@ -482,11 +491,12 @@ static int ll_read_ahead_pages(const struct lu_env *env,
 			       struct cl_io *io, struct cl_page_list *queue,
 			       struct ra_io_arg *ria,
 			       unsigned long *reserved_pages,
-			       struct address_space *mapping,
 			       unsigned long *ra_end)
 {
-	int rc, count = 0, stride_ria;
-	unsigned long page_idx;
+	int rc, count = 0;
+	bool stride_ria;
+	pgoff_t page_idx;
+	pgoff_t max_index = 0;
 
 	LASSERT(ria);
 	RIA_DEBUG(ria);
@@ -497,7 +507,7 @@ static int ll_read_ahead_pages(const struct lu_env *env,
 		if (ras_inside_ra_window(page_idx, ria)) {
 			/* If the page is inside the read-ahead window*/
 			rc = ll_read_ahead_page(env, io, queue,
-						page_idx, mapping);
+						page_idx, &max_index);
 			if (rc == 1) {
 				(*reserved_pages)--;
 				count++;
@@ -532,25 +542,23 @@ static int ll_read_ahead_pages(const struct lu_env *env,
 }
 
 int ll_readahead(const struct lu_env *env, struct cl_io *io,
-		 struct ll_readahead_state *ras, struct address_space *mapping,
-		 struct cl_page_list *queue, int flags)
+		 struct cl_page_list *queue, struct ll_readahead_state *ras,
+		 bool hit)
 {
 	struct vvp_io *vio = vvp_env_io(env);
 	struct vvp_thread_info *vti = vvp_env_info(env);
 	struct cl_attr *attr = ccc_env_thread_attr(env);
 	unsigned long start = 0, end = 0, reserved;
-	unsigned long ra_end, len;
+	unsigned long ra_end, len, mlen = 0;
 	struct inode *inode;
 	struct ll_ra_read *bead;
 	struct ra_io_arg *ria = &vti->vti_ria;
-	struct ll_inode_info *lli;
 	struct cl_object *clob;
 	int ret = 0;
 	__u64 kms;
 
-	inode = mapping->host;
-	lli = ll_i2info(inode);
-	clob = lli->lli_clob;
+	clob = io->ci_obj;
+	inode = ccc_object_inode(clob);
 
 	memset(ria, 0, sizeof(*ria));
 
@@ -562,7 +570,7 @@ int ll_readahead(const struct lu_env *env, struct cl_io *io,
 		return ret;
 	kms = attr->cat_kms;
 	if (kms == 0) {
-		ll_ra_stats_inc(mapping, RA_STAT_ZERO_LEN);
+		ll_ra_stats_inc(inode, RA_STAT_ZERO_LEN);
 		return 0;
 	}
 
@@ -621,29 +629,48 @@ int ll_readahead(const struct lu_env *env, struct cl_io *io,
 	spin_unlock(&ras->ras_lock);
 
 	if (end == 0) {
-		ll_ra_stats_inc(mapping, RA_STAT_ZERO_WINDOW);
+		ll_ra_stats_inc(inode, RA_STAT_ZERO_WINDOW);
 		return 0;
 	}
 	len = ria_page_count(ria);
-	if (len == 0)
+	if (len == 0) {
+		ll_ra_stats_inc(inode, RA_STAT_ZERO_WINDOW);
 		return 0;
+	}
+
+	CDEBUG(D_READA, DFID ": ria: %lu/%lu, bead: %lu/%lu, hit: %d\n",
+	       PFID(lu_object_fid(&clob->co_lu)),
+	       ria->ria_start, ria->ria_end,
+	       !bead ? 0 : bead->lrr_start,
+	       !bead ? 0 : bead->lrr_count,
+	       hit);
+
+	/* extend the readahead window at least to cover the current read */
+	if (!hit && bead &&
+	    bead->lrr_start + bead->lrr_count > ria->ria_start) {
+		/* to the end of current read window. */
+		mlen = bead->lrr_start + bead->lrr_count - ria->ria_start;
+		/* trim to RPC boundary */
+		start = ria->ria_start & (PTLRPC_MAX_BRW_PAGES - 1);
+		mlen = min(mlen, PTLRPC_MAX_BRW_PAGES - start);
+	}
 
-	reserved = ll_ra_count_get(ll_i2sbi(inode), ria, len);
+	reserved = ll_ra_count_get(ll_i2sbi(inode), ria, len, mlen);
 	if (reserved < len)
-		ll_ra_stats_inc(mapping, RA_STAT_MAX_IN_FLIGHT);
+		ll_ra_stats_inc(inode, RA_STAT_MAX_IN_FLIGHT);
 
-	CDEBUG(D_READA, "reserved page %lu ra_cur %d ra_max %lu\n", reserved,
+	CDEBUG(D_READA, "reserved pages %lu/%lu/%lu, ra_cur %d, ra_max %lu\n",
+	       reserved, len, mlen,
 	       atomic_read(&ll_i2sbi(inode)->ll_ra_info.ra_cur_pages),
 	       ll_i2sbi(inode)->ll_ra_info.ra_max_pages);
 
-	ret = ll_read_ahead_pages(env, io, queue,
-				  ria, &reserved, mapping, &ra_end);
+	ret = ll_read_ahead_pages(env, io, queue, ria, &reserved, &ra_end);
 
 	if (reserved != 0)
 		ll_ra_count_put(ll_i2sbi(inode), reserved);
 
 	if (ra_end == end + 1 && ra_end == (kms >> PAGE_CACHE_SHIFT))
-		ll_ra_stats_inc(mapping, RA_STAT_EOF);
+		ll_ra_stats_inc(inode, RA_STAT_EOF);
 
 	/* if we didn't get to the end of the region we reserved from
 	 * the ras we need to go back and update the ras so that the
@@ -655,6 +682,7 @@ int ll_readahead(const struct lu_env *env, struct cl_io *io,
 	       ra_end, end, ria->ria_end);
 
 	if (ra_end != end + 1) {
+		ll_ra_stats_inc(inode, RA_STAT_FAILED_REACH_END);
 		spin_lock(&ras->ras_lock);
 		if (ra_end < ras->ras_next_readahead &&
 		    index_in_window(ra_end, ras->ras_window_start, 0,
@@ -925,15 +953,18 @@ void ras_update(struct ll_sb_info *sbi, struct inode *inode,
 	ras->ras_last_readpage = index;
 	ras_set_start(inode, ras, index);
 
-	if (stride_io_mode(ras))
+	if (stride_io_mode(ras)) {
 		/* Since stride readahead is sensitive to the offset
 		 * of read-ahead, so we use original offset here,
 		 * instead of ras_window_start, which is RPC aligned
 		 */
 		ras->ras_next_readahead = max(index, ras->ras_next_readahead);
-	else
-		ras->ras_next_readahead = max(ras->ras_window_start,
-					      ras->ras_next_readahead);
+	} else {
+		if (ras->ras_next_readahead < ras->ras_window_start)
+			ras->ras_next_readahead = ras->ras_window_start;
+		if (!hit)
+			ras->ras_next_readahead = index + 1;
+	}
 	RAS_CDEBUG(ras);
 
 	/* Trigger RA in the mmap case where ras_consecutive_requests
diff --git a/drivers/staging/lustre/lustre/llite/vvp_io.c b/drivers/staging/lustre/lustre/llite/vvp_io.c
index ac9d615..18127d3 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_io.c
+++ b/drivers/staging/lustre/lustre/llite/vvp_io.c
@@ -1052,35 +1052,19 @@ static int vvp_io_read_page(const struct lu_env *env,
 			    const struct cl_page_slice *slice)
 {
 	struct cl_io	      *io     = ios->cis_io;
-	struct cl_object	  *obj    = slice->cpl_obj;
 	struct ccc_page	   *cp     = cl2ccc_page(slice);
 	struct cl_page	    *page   = slice->cpl_page;
-	struct inode	      *inode  = ccc_object_inode(obj);
+	struct inode	      *inode  = ccc_object_inode(slice->cpl_obj);
 	struct ll_sb_info	 *sbi    = ll_i2sbi(inode);
 	struct ll_file_data       *fd     = cl2ccc_io(env, ios)->cui_fd;
 	struct ll_readahead_state *ras    = &fd->fd_ras;
-	struct page		*vmpage = cp->cpg_page;
 	struct cl_2queue	  *queue  = &io->ci_queue;
-	int rc;
-
-	CLOBINVRNT(env, obj, ccc_object_invariant(obj));
-	LASSERT(slice->cpl_obj == obj);
 
 	if (sbi->ll_ra_info.ra_max_pages_per_file &&
 	    sbi->ll_ra_info.ra_max_pages)
 		ras_update(sbi, inode, ras, ccc_index(cp),
 			   cp->cpg_defer_uptodate);
 
-	/* Sanity check whether the page is protected by a lock. */
-	rc = cl_page_is_under_lock(env, io, page);
-	if (rc != -EBUSY) {
-		CL_PAGE_HEADER(D_WARNING, env, page, "%s: %d\n",
-			       rc == -ENODATA ? "without a lock" :
-			       "match failed", rc);
-		if (rc != -ENODATA)
-			return rc;
-	}
-
 	if (cp->cpg_defer_uptodate) {
 		cp->cpg_ra_used = 1;
 		cl_page_export(env, page, 1);
@@ -1089,11 +1073,12 @@ static int vvp_io_read_page(const struct lu_env *env,
 	 * Add page into the queue even when it is marked uptodate above.
 	 * this will unlock it automatically as part of cl_page_list_disown().
 	 */
+
 	cl_page_list_add(&queue->c2_qin, page);
 	if (sbi->ll_ra_info.ra_max_pages_per_file &&
 	    sbi->ll_ra_info.ra_max_pages)
-		ll_readahead(env, io, ras,
-			     vmpage->mapping, &queue->c2_qin, fd->fd_flags);
+		ll_readahead(env, io, &queue->c2_qin, ras,
+			     cp->cpg_defer_uptodate);
 
 	return 0;
 }
diff --git a/drivers/staging/lustre/lustre/llite/vvp_page.c b/drivers/staging/lustre/lustre/llite/vvp_page.c
index d9f13c3..3c6b723 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_page.c
+++ b/drivers/staging/lustre/lustre/llite/vvp_page.c
@@ -142,7 +142,7 @@ static void vvp_page_discard(const struct lu_env *env,
 	LASSERT(PageLocked(vmpage));
 
 	if (cpg->cpg_defer_uptodate && !cpg->cpg_ra_used)
-		ll_ra_stats_inc(vmpage->mapping, RA_STAT_DISCARDED);
+		ll_ra_stats_inc(vmpage->mapping->host, RA_STAT_DISCARDED);
 
 	ll_invalidate_page(vmpage);
 }
@@ -357,6 +357,20 @@ static int vvp_page_make_ready(const struct lu_env *env,
 	return result;
 }
 
+static int vvp_page_is_under_lock(const struct lu_env *env,
+				  const struct cl_page_slice *slice,
+				  struct cl_io *io, pgoff_t *max_index)
+{
+	if (io->ci_type == CIT_READ || io->ci_type == CIT_WRITE ||
+	    io->ci_type == CIT_FAULT) {
+		struct ccc_io *cio = ccc_env_io(env);
+
+		if (unlikely(cio->cui_fd->fd_flags & LL_FILE_GROUP_LOCKED))
+			*max_index = CL_PAGE_EOF;
+	}
+	return 0;
+}
+
 static int vvp_page_print(const struct lu_env *env,
 			  const struct cl_page_slice *slice,
 			  void *cookie, lu_printer_t printer)
@@ -389,7 +403,7 @@ static const struct cl_page_operations vvp_page_ops = {
 	.cpo_is_vmlocked   = vvp_page_is_vmlocked,
 	.cpo_fini	  = vvp_page_fini,
 	.cpo_print	 = vvp_page_print,
-	.cpo_is_under_lock = ccc_page_is_under_lock,
+	.cpo_is_under_lock = vvp_page_is_under_lock,
 	.io = {
 		[CRT_READ] = {
 			.cpo_prep	= vvp_page_prep_read,
@@ -495,7 +509,7 @@ static const struct cl_page_operations vvp_transient_page_ops = {
 	.cpo_fini	  = vvp_transient_page_fini,
 	.cpo_is_vmlocked   = vvp_transient_page_is_vmlocked,
 	.cpo_print	 = vvp_page_print,
-	.cpo_is_under_lock = ccc_page_is_under_lock,
+	.cpo_is_under_lock	= vvp_page_is_under_lock,
 	.io = {
 		[CRT_READ] = {
 			.cpo_prep	= ccc_transient_page_prep,
@@ -516,7 +530,6 @@ int vvp_page_init(const struct lu_env *env, struct cl_object *obj,
 
 	CLOBINVRNT(env, obj, ccc_object_invariant(obj));
 
-	cpg->cpg_cl.cpl_index = index;
 	cpg->cpg_page = vmpage;
 	page_cache_get(vmpage);
 
@@ -526,12 +539,13 @@ int vvp_page_init(const struct lu_env *env, struct cl_object *obj,
 		atomic_inc(&page->cp_ref);
 		SetPagePrivate(vmpage);
 		vmpage->private = (unsigned long)page;
-		cl_page_slice_add(page, &cpg->cpg_cl, obj, &vvp_page_ops);
+		cl_page_slice_add(page, &cpg->cpg_cl, obj, index,
+				  &vvp_page_ops);
 	} else {
 		struct ccc_object *clobj = cl2ccc(obj);
 
 		LASSERT(!inode_trylock(clobj->cob_inode));
-		cl_page_slice_add(page, &cpg->cpg_cl, obj,
+		cl_page_slice_add(page, &cpg->cpg_cl, obj, index,
 				  &vvp_transient_page_ops);
 		clobj->cob_transient_pages++;
 	}
diff --git a/drivers/staging/lustre/lustre/lov/lov_cl_internal.h b/drivers/staging/lustre/lustre/lov/lov_cl_internal.h
index b8e2315..9b3d13b 100644
--- a/drivers/staging/lustre/lustre/lov/lov_cl_internal.h
+++ b/drivers/staging/lustre/lustre/lov/lov_cl_internal.h
@@ -632,6 +632,7 @@ struct lov_lock_link *lov_lock_link_find(const struct lu_env *env,
 					 struct lovsub_lock *sub);
 struct lov_io_sub *lov_page_subio(const struct lu_env *env, struct lov_io *lio,
 				  const struct cl_page_slice *slice);
+int lov_page_stripe(const struct cl_page *page);
 
 #define lov_foreach_target(lov, var)		    \
 	for (var = 0; var < lov_targets_nr(lov); ++var)
diff --git a/drivers/staging/lustre/lustre/lov/lov_internal.h b/drivers/staging/lustre/lustre/lov/lov_internal.h
index 590f932..9985855 100644
--- a/drivers/staging/lustre/lustre/lov/lov_internal.h
+++ b/drivers/staging/lustre/lustre/lov/lov_internal.h
@@ -146,6 +146,8 @@ int lov_stripe_intersects(struct lov_stripe_md *lsm, int stripeno,
 			  u64 start, u64 end,
 			  u64 *obd_start, u64 *obd_end);
 int lov_stripe_number(struct lov_stripe_md *lsm, u64 lov_off);
+pgoff_t lov_stripe_pgoff(struct lov_stripe_md *lsm, pgoff_t stripe_index,
+			 int stripe);
 
 /* lov_qos.c */
 #define LOV_USES_ASSIGNED_STRIPE	0
diff --git a/drivers/staging/lustre/lustre/lov/lov_io.c b/drivers/staging/lustre/lustre/lov/lov_io.c
index e5b2cfc..ba79955 100644
--- a/drivers/staging/lustre/lustre/lov/lov_io.c
+++ b/drivers/staging/lustre/lustre/lov/lov_io.c
@@ -245,7 +245,7 @@ void lov_sub_put(struct lov_io_sub *sub)
  *
  */
 
-static int lov_page_stripe(const struct cl_page *page)
+int lov_page_stripe(const struct cl_page *page)
 {
 	struct lovsub_object *subobj;
 	const struct cl_page_slice *slice;
diff --git a/drivers/staging/lustre/lustre/lov/lov_offset.c b/drivers/staging/lustre/lustre/lov/lov_offset.c
index ae83eb0..cb7b516 100644
--- a/drivers/staging/lustre/lustre/lov/lov_offset.c
+++ b/drivers/staging/lustre/lustre/lov/lov_offset.c
@@ -66,6 +66,19 @@ u64 lov_stripe_size(struct lov_stripe_md *lsm, u64 ost_size, int stripeno)
 	return lov_size;
 }
 
+/**
+ * Compute the file-level page index from a stripe-level page offset
+ */
+pgoff_t lov_stripe_pgoff(struct lov_stripe_md *lsm, pgoff_t stripe_index,
+			 int stripe)
+{
+	loff_t offset;
+
+	offset = lov_stripe_size(lsm, stripe_index << PAGE_CACHE_SHIFT,
+				 stripe);
+	return offset >> PAGE_CACHE_SHIFT;
+}
+
 /* we have an offset in file backed by an lov and want to find out where
  * that offset lands in our given stripe of the file.  for the easy
  * case where the offset is within the stripe, we just have to scale the
diff --git a/drivers/staging/lustre/lustre/lov/lov_page.c b/drivers/staging/lustre/lustre/lov/lov_page.c
index 0c508bd..9634c13 100644
--- a/drivers/staging/lustre/lustre/lov/lov_page.c
+++ b/drivers/staging/lustre/lustre/lov/lov_page.c
@@ -52,17 +52,57 @@
  * Lov page operations.
  *
  */
-static int lov_page_print(const struct lu_env *env,
-			  const struct cl_page_slice *slice,
-			  void *cookie, lu_printer_t printer)
+
+/**
+ * Adjust the stripe index according to the raid0 layout. @max_index is the
+ * maximum page index covered by an underlying DLM lock.
+ * This function converts max_index from stripe level to file level, and makes
+ * sure it does not extend beyond one stripe.
+ */
+static int lov_raid0_page_is_under_lock(const struct lu_env *env,
+					const struct cl_page_slice *slice,
+					struct cl_io *unused,
+					pgoff_t *max_index)
+{
+	struct lov_object *loo = cl2lov(slice->cpl_obj);
+	struct lov_layout_raid0 *r0 = lov_r0(loo);
+	pgoff_t index = *max_index;
+	unsigned int pps; /* pages per stripe */
+
+	CDEBUG(D_READA, "*max_index = %lu, nr = %d\n", index, r0->lo_nr);
+	if (index == 0) /* the page is not covered by any lock */
+		return 0;
+
+	if (r0->lo_nr == 1) /* single stripe file */
+		return 0;
+
+	/* max_index is stripe level, convert it into file level */
+	if (index != CL_PAGE_EOF) {
+		int stripeno = lov_page_stripe(slice->cpl_page);
+		*max_index = lov_stripe_pgoff(loo->lo_lsm, index, stripeno);
+	}
+
+	/* calculate the end of current stripe */
+	pps = loo->lo_lsm->lsm_stripe_size >> PAGE_CACHE_SHIFT;
+	index = ((slice->cpl_index + pps) & ~(pps - 1)) - 1;
+
+	/* never exceed the end of the stripe */
+	*max_index = min_t(pgoff_t, *max_index, index);
+	return 0;
+}
+
+static int lov_raid0_page_print(const struct lu_env *env,
+				const struct cl_page_slice *slice,
+				void *cookie, lu_printer_t printer)
 {
 	struct lov_page *lp = cl2lov_page(slice);
 
-	return (*printer)(env, cookie, LUSTRE_LOV_NAME"-page@%p\n", lp);
+	return (*printer)(env, cookie, LUSTRE_LOV_NAME "-page@%p, raid0\n", lp);
 }
 
-static const struct cl_page_operations lov_page_ops = {
-	.cpo_print  = lov_page_print
+static const struct cl_page_operations lov_raid0_page_ops = {
+	.cpo_is_under_lock = lov_raid0_page_is_under_lock,
+	.cpo_print  = lov_raid0_page_print
 };
 
 int lov_page_init_raid0(const struct lu_env *env, struct cl_object *obj,
@@ -86,7 +126,7 @@ int lov_page_init_raid0(const struct lu_env *env, struct cl_object *obj,
 	rc = lov_stripe_offset(loo->lo_lsm, offset, stripe, &suboff);
 	LASSERT(rc == 0);
 
-	cl_page_slice_add(page, &lpg->lps_cl, obj, &lov_page_ops);
+	cl_page_slice_add(page, &lpg->lps_cl, obj, index, &lov_raid0_page_ops);
 
 	sub = lov_sub_get(env, lio, stripe);
 	if (IS_ERR(sub))
@@ -107,7 +147,7 @@ int lov_page_init_raid0(const struct lu_env *env, struct cl_object *obj,
 	return rc;
 }
 
-static int lov_page_empty_print(const struct lu_env *env,
+static int lov_empty_page_print(const struct lu_env *env,
 				const struct cl_page_slice *slice,
 				void *cookie, lu_printer_t printer)
 {
@@ -118,7 +158,7 @@ static int lov_page_empty_print(const struct lu_env *env,
 }
 
 static const struct cl_page_operations lov_empty_page_ops = {
-	.cpo_print = lov_page_empty_print
+	.cpo_print = lov_empty_page_print
 };
 
 int lov_page_init_empty(const struct lu_env *env, struct cl_object *obj,
@@ -127,7 +167,7 @@ int lov_page_init_empty(const struct lu_env *env, struct cl_object *obj,
 	struct lov_page *lpg = cl_object_page_slice(obj, page);
 	void *addr;
 
-	cl_page_slice_add(page, &lpg->lps_cl, obj, &lov_empty_page_ops);
+	cl_page_slice_add(page, &lpg->lps_cl, obj, index, &lov_empty_page_ops);
 	addr = kmap(page->cp_vmpage);
 	memset(addr, 0, cl_page_size(obj));
 	kunmap(page->cp_vmpage);
diff --git a/drivers/staging/lustre/lustre/lov/lovsub_page.c b/drivers/staging/lustre/lustre/lov/lovsub_page.c
index fb4c0cc..9badedc 100644
--- a/drivers/staging/lustre/lustre/lov/lovsub_page.c
+++ b/drivers/staging/lustre/lustre/lov/lovsub_page.c
@@ -60,11 +60,11 @@ static const struct cl_page_operations lovsub_page_ops = {
 };
 
 int lovsub_page_init(const struct lu_env *env, struct cl_object *obj,
-		     struct cl_page *page, pgoff_t ind)
+		     struct cl_page *page, pgoff_t index)
 {
 	struct lovsub_page *lsb = cl_object_page_slice(obj, page);
 
-	cl_page_slice_add(page, &lsb->lsb_cl, obj, &lovsub_page_ops);
+	cl_page_slice_add(page, &lsb->lsb_cl, obj, index, &lovsub_page_ops);
 	return 0;
 }
 
diff --git a/drivers/staging/lustre/lustre/obdclass/cl_io.c b/drivers/staging/lustre/lustre/obdclass/cl_io.c
index 86591ce..65d6cee 100644
--- a/drivers/staging/lustre/lustre/obdclass/cl_io.c
+++ b/drivers/staging/lustre/lustre/obdclass/cl_io.c
@@ -733,7 +733,7 @@ int cl_io_read_page(const struct lu_env *env, struct cl_io *io,
 				break;
 		}
 	}
-	if (result == 0)
+	if (result == 0 && queue->c2_qin.pl_nr > 0)
 		result = cl_io_submit_rw(env, io, CRT_READ, queue);
 	/*
 	 * Unlock unsent pages in case of error.
diff --git a/drivers/staging/lustre/lustre/obdclass/cl_page.c b/drivers/staging/lustre/lustre/obdclass/cl_page.c
index cb15673..506a9f9 100644
--- a/drivers/staging/lustre/lustre/obdclass/cl_page.c
+++ b/drivers/staging/lustre/lustre/obdclass/cl_page.c
@@ -401,6 +401,30 @@ EXPORT_SYMBOL(cl_page_at);
 	__result;						       \
 })
 
+#define CL_PAGE_INVOKE_REVERSE(_env, _page, _op, _proto, ...)		\
+({									\
+	const struct lu_env        *__env  = (_env);			\
+	struct cl_page             *__page = (_page);			\
+	const struct cl_page_slice *__scan;				\
+	int                         __result;				\
+	ptrdiff_t                   __op   = (_op);			\
+	int                       (*__method)_proto;			\
+									\
+	__result = 0;							\
+	list_for_each_entry_reverse(__scan, &__page->cp_layers,		\
+					cpl_linkage) {			\
+		__method = *(void **)((char *)__scan->cpl_ops +  __op);	\
+		if (__method) {						\
+			__result = (*__method)(__env, __scan, ## __VA_ARGS__); \
+			if (__result != 0)				\
+				break;					\
+		}							\
+	}								\
+	if (__result > 0)						\
+		__result = 0;						\
+	__result;							\
+})
+
 #define CL_PAGE_INVOID(_env, _page, _op, _proto, ...)		   \
 do {								    \
 	const struct lu_env	*__env  = (_env);		    \
@@ -928,17 +952,17 @@ EXPORT_SYMBOL(cl_page_flush);
  * \see cl_page_operations::cpo_is_under_lock()
  */
 int cl_page_is_under_lock(const struct lu_env *env, struct cl_io *io,
-			  struct cl_page *page)
+			  struct cl_page *page, pgoff_t *max_index)
 {
 	int rc;
 
 	PINVRNT(env, page, cl_page_invariant(page));
 
-	rc = CL_PAGE_INVOKE(env, page, CL_PAGE_OP(cpo_is_under_lock),
-			    (const struct lu_env *,
-			     const struct cl_page_slice *, struct cl_io *),
-			    io);
-	PASSERT(env, page, rc != 0);
+	rc = CL_PAGE_INVOKE_REVERSE(env, page, CL_PAGE_OP(cpo_is_under_lock),
+				    (const struct lu_env *,
+				     const struct cl_page_slice *,
+				      struct cl_io *, pgoff_t *),
+				    io, max_index);
 	return rc;
 }
 EXPORT_SYMBOL(cl_page_is_under_lock);
@@ -1041,11 +1065,12 @@ EXPORT_SYMBOL(cl_page_size);
  * \see cl_lock_slice_add(), cl_req_slice_add(), cl_io_slice_add()
  */
 void cl_page_slice_add(struct cl_page *page, struct cl_page_slice *slice,
-		       struct cl_object *obj,
+		       struct cl_object *obj, pgoff_t index,
 		       const struct cl_page_operations *ops)
 {
 	list_add_tail(&slice->cpl_linkage, &page->cp_layers);
 	slice->cpl_obj  = obj;
+	slice->cpl_index = index;
 	slice->cpl_ops  = ops;
 	slice->cpl_page = page;
 }
diff --git a/drivers/staging/lustre/lustre/obdecho/echo_client.c b/drivers/staging/lustre/lustre/obdecho/echo_client.c
index db56081..0d84d04 100644
--- a/drivers/staging/lustre/lustre/obdecho/echo_client.c
+++ b/drivers/staging/lustre/lustre/obdecho/echo_client.c
@@ -365,7 +365,7 @@ static int echo_page_init(const struct lu_env *env, struct cl_object *obj,
 
 	page_cache_get(page->cp_vmpage);
 	mutex_init(&ep->ep_lock);
-	cl_page_slice_add(page, &ep->ep_cl, obj, &echo_page_ops);
+	cl_page_slice_add(page, &ep->ep_cl, obj, index, &echo_page_ops);
 	atomic_inc(&eco->eo_npages);
 	return 0;
 }
diff --git a/drivers/staging/lustre/lustre/osc/osc_page.c b/drivers/staging/lustre/lustre/osc/osc_page.c
index 3e0a8c3..e02dd33 100644
--- a/drivers/staging/lustre/lustre/osc/osc_page.c
+++ b/drivers/staging/lustre/lustre/osc/osc_page.c
@@ -132,17 +132,19 @@ void osc_index2policy(ldlm_policy_data_t *policy, const struct cl_object *obj,
 
 static int osc_page_is_under_lock(const struct lu_env *env,
 				  const struct cl_page_slice *slice,
-				  struct cl_io *unused)
+				  struct cl_io *unused, pgoff_t *max_index)
 {
 	struct osc_page *opg = cl2osc_page(slice);
 	struct cl_lock *lock;
 	int result = -ENODATA;
 
+	*max_index = 0;
 	lock = cl_lock_at_pgoff(env, slice->cpl_obj, osc_index(opg),
 				NULL, 1, 0);
 	if (lock) {
+		*max_index = lock->cll_descr.cld_end;
 		cl_lock_put(env, lock);
-		result = -EBUSY;
+		result = 0;
 	}
 	return result;
 }
@@ -308,7 +310,6 @@ int osc_page_init(const struct lu_env *env, struct cl_object *obj,
 
 	opg->ops_from = 0;
 	opg->ops_to = PAGE_CACHE_SIZE;
-	opg->ops_cl.cpl_index = index;
 
 	result = osc_prep_async_page(osc, opg, page->cp_vmpage,
 				     cl_offset(obj, index));
@@ -316,7 +317,8 @@ int osc_page_init(const struct lu_env *env, struct cl_object *obj,
 		struct osc_io *oio = osc_env_io(env);
 
 		opg->ops_srvlock = osc_io_srvlock(oio);
-		cl_page_slice_add(page, &opg->ops_cl, obj, &osc_page_ops);
+		cl_page_slice_add(page, &opg->ops_cl, obj, index,
+				  &osc_page_ops);
 	}
 	/*
 	 * Cannot assert osc_page_protected() here as read-ahead
-- 
2.1.0


* [PATCH v2 13/46] staging/lustre/llite: remove lli_lvb
  2016-03-30 23:48 [PATCH v2 00/46] Lustre IO stack simplifications and cleanups green
                   ` (11 preceding siblings ...)
  2016-03-30 23:48 ` [PATCH v2 12/46] staging/lustre/clio: optimize read ahead code green
@ 2016-03-30 23:48 ` green
  2016-03-30 23:48 ` [PATCH v2 14/46] staging/lustre/lmv: remove lmv_init_{lock,unlock}() green
                   ` (32 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: green @ 2016-03-30 23:48 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger
  Cc: Linux Kernel Mailing List, Lustre Development List,
	John L. Hammond, Oleg Drokin

From: "John L. Hammond" <john.hammond@intel.com>

In struct ll_inode_info remove the struct ost_lvb lli_lvb member and
replace it with obd_time lli_{a,m,c}time. Rename ll_merge_lvb() to
ll_merge_attr(). Remove cl_merge_lvb() and replace calls to it with
calls to ll_merge_attr().
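
For illustration, the merge rule this converges on can be sketched as
taking, for each timestamp, the maximum of the MDS-cached value and
the value reported by the OSTs (merge_time() is a hypothetical helper,
not part of the patch):

	static inline s64 merge_time(s64 mds_time, s64 ost_time)
	{
		/* the later of the two observations wins */
		return mds_time > ost_time ? mds_time : ost_time;
	}

	/* e.g.: atime = merge_time(lli->lli_atime, attr->cat_atime); */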

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Reviewed-on: http://review.whamcloud.com/12849
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2675
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
---
 drivers/staging/lustre/lustre/llite/file.c         | 65 ++++++++++++----------
 drivers/staging/lustre/lustre/llite/glimpse.c      |  6 +-
 drivers/staging/lustre/lustre/llite/lcommon_cl.c   |  2 +-
 .../staging/lustre/lustre/llite/llite_internal.h   | 11 ++--
 drivers/staging/lustre/lustre/llite/llite_lib.c    |  6 +-
 drivers/staging/lustre/lustre/llite/vvp_io.c       |  2 +-
 6 files changed, 48 insertions(+), 44 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/file.c b/drivers/staging/lustre/lustre/llite/file.c
index 127fff6..8fc9da0 100644
--- a/drivers/staging/lustre/lustre/llite/file.c
+++ b/drivers/staging/lustre/lustre/llite/file.c
@@ -994,50 +994,57 @@ int ll_inode_getattr(struct inode *inode, struct obdo *obdo,
 	return rc;
 }
 
-int ll_merge_lvb(const struct lu_env *env, struct inode *inode)
+int ll_merge_attr(const struct lu_env *env, struct inode *inode)
 {
 	struct ll_inode_info *lli = ll_i2info(inode);
 	struct cl_object *obj = lli->lli_clob;
 	struct cl_attr *attr = ccc_env_thread_attr(env);
-	struct ost_lvb lvb;
+	s64 atime;
+	s64 mtime;
+	s64 ctime;
 	int rc = 0;
 
 	ll_inode_size_lock(inode);
+
 	/* merge timestamps the most recently obtained from mds with
 	 * timestamps obtained from osts
 	 */
-	LTIME_S(inode->i_atime) = lli->lli_lvb.lvb_atime;
-	LTIME_S(inode->i_mtime) = lli->lli_lvb.lvb_mtime;
-	LTIME_S(inode->i_ctime) = lli->lli_lvb.lvb_ctime;
+	LTIME_S(inode->i_atime) = lli->lli_atime;
+	LTIME_S(inode->i_mtime) = lli->lli_mtime;
+	LTIME_S(inode->i_ctime) = lli->lli_ctime;
 
-	lvb.lvb_size = i_size_read(inode);
-	lvb.lvb_blocks = inode->i_blocks;
-	lvb.lvb_mtime = LTIME_S(inode->i_mtime);
-	lvb.lvb_atime = LTIME_S(inode->i_atime);
-	lvb.lvb_ctime = LTIME_S(inode->i_ctime);
+	mtime = LTIME_S(inode->i_mtime);
+	atime = LTIME_S(inode->i_atime);
+	ctime = LTIME_S(inode->i_ctime);
 
 	cl_object_attr_lock(obj);
 	rc = cl_object_attr_get(env, obj, attr);
 	cl_object_attr_unlock(obj);
 
-	if (rc == 0) {
-		if (lvb.lvb_atime < attr->cat_atime)
-			lvb.lvb_atime = attr->cat_atime;
-		if (lvb.lvb_ctime < attr->cat_ctime)
-			lvb.lvb_ctime = attr->cat_ctime;
-		if (lvb.lvb_mtime < attr->cat_mtime)
-			lvb.lvb_mtime = attr->cat_mtime;
+	if (rc != 0)
+		goto out_size_unlock;
 
-		CDEBUG(D_VFSTRACE, DFID " updating i_size %llu\n",
-		       PFID(&lli->lli_fid), attr->cat_size);
-		cl_isize_write_nolock(inode, attr->cat_size);
+	if (atime < attr->cat_atime)
+		atime = attr->cat_atime;
 
-		inode->i_blocks = attr->cat_blocks;
+	if (ctime < attr->cat_ctime)
+		ctime = attr->cat_ctime;
 
-		LTIME_S(inode->i_mtime) = lvb.lvb_mtime;
-		LTIME_S(inode->i_atime) = lvb.lvb_atime;
-		LTIME_S(inode->i_ctime) = lvb.lvb_ctime;
-	}
+	if (mtime < attr->cat_mtime)
+		mtime = attr->cat_mtime;
+
+	CDEBUG(D_VFSTRACE, DFID " updating i_size %llu\n",
+	       PFID(&lli->lli_fid), attr->cat_size);
+
+	cl_isize_write_nolock(inode, attr->cat_size);
+
+	inode->i_blocks = attr->cat_blocks;
+
+	LTIME_S(inode->i_mtime) = mtime;
+	LTIME_S(inode->i_atime) = atime;
+	LTIME_S(inode->i_ctime) = ctime;
+
+out_size_unlock:
 	ll_inode_size_unlock(inode);
 
 	return rc;
@@ -1936,7 +1943,7 @@ int ll_hsm_release(struct inode *inode)
 		goto out;
 	}
 
-	ll_merge_lvb(env, inode);
+	ll_merge_attr(env, inode);
 	cl_env_nested_put(&nest, env);
 
 	/* Release the file.
@@ -3001,9 +3008,9 @@ static int ll_inode_revalidate(struct dentry *dentry, __u64 ibits)
 
 	/* if object isn't regular file, don't validate size */
 	if (!S_ISREG(inode->i_mode)) {
-		LTIME_S(inode->i_atime) = ll_i2info(inode)->lli_lvb.lvb_atime;
-		LTIME_S(inode->i_mtime) = ll_i2info(inode)->lli_lvb.lvb_mtime;
-		LTIME_S(inode->i_ctime) = ll_i2info(inode)->lli_lvb.lvb_ctime;
+		LTIME_S(inode->i_atime) = ll_i2info(inode)->lli_atime;
+		LTIME_S(inode->i_mtime) = ll_i2info(inode)->lli_mtime;
+		LTIME_S(inode->i_ctime) = ll_i2info(inode)->lli_ctime;
 	} else {
 		/* In case of restore, the MDT has the right size and has
 		 * already send it back without granting the layout lock,
diff --git a/drivers/staging/lustre/lustre/llite/glimpse.c b/drivers/staging/lustre/lustre/llite/glimpse.c
index f235f35..88bc7c9 100644
--- a/drivers/staging/lustre/lustre/llite/glimpse.c
+++ b/drivers/staging/lustre/lustre/llite/glimpse.c
@@ -139,7 +139,7 @@ int cl_glimpse_lock(const struct lu_env *env, struct cl_io *io,
 			LASSERT(agl == 0);
 			result = cl_wait(env, lock);
 			if (result == 0) {
-				cl_merge_lvb(env, inode);
+				ll_merge_attr(env, inode);
 				if (cl_isize_read(inode) > 0 &&
 				    inode->i_blocks == 0) {
 					/*
@@ -155,7 +155,7 @@ int cl_glimpse_lock(const struct lu_env *env, struct cl_io *io,
 			cl_lock_release(env, lock, "glimpse", current);
 		} else {
 			CDEBUG(D_DLMTRACE, "No objects for inode\n");
-			cl_merge_lvb(env, inode);
+			ll_merge_attr(env, inode);
 		}
 	}
 
@@ -259,7 +259,7 @@ int cl_local_size(struct inode *inode)
 		descr->cld_obj = clob;
 		lock = cl_lock_peek(env, io, descr, "localsize", current);
 		if (lock) {
-			cl_merge_lvb(env, inode);
+			ll_merge_attr(env, inode);
 			cl_unuse(env, lock);
 			cl_lock_release(env, lock, "localsize", current);
 			result = 0;
diff --git a/drivers/staging/lustre/lustre/llite/lcommon_cl.c b/drivers/staging/lustre/lustre/llite/lcommon_cl.c
index e34d832..4871d0f 100644
--- a/drivers/staging/lustre/lustre/llite/lcommon_cl.c
+++ b/drivers/staging/lustre/lustre/llite/lcommon_cl.c
@@ -591,7 +591,7 @@ void ccc_lock_state(const struct lu_env *env,
 		 */
 		if (lock->cll_descr.cld_start == 0 &&
 		    lock->cll_descr.cld_end == CL_PAGE_EOF)
-			cl_merge_lvb(env, inode);
+			ll_merge_attr(env, inode);
 	}
 }
 
diff --git a/drivers/staging/lustre/lustre/llite/llite_internal.h b/drivers/staging/lustre/lustre/llite/llite_internal.h
index cd69173..c1d747a 100644
--- a/drivers/staging/lustre/lustre/llite/llite_internal.h
+++ b/drivers/staging/lustre/lustre/llite/llite_internal.h
@@ -161,7 +161,9 @@ struct ll_inode_info {
 	struct inode			lli_vfs_inode;
 
 	/* the most recent timestamps obtained from mds */
-	struct ost_lvb			lli_lvb;
+	s64				lli_atime;
+	s64				lli_mtime;
+	s64				lli_ctime;
 	spinlock_t			lli_agl_lock;
 
 	/* Try to make the d::member and f::member are aligned. Before using
@@ -752,7 +754,7 @@ int ll_dir_setstripe(struct inode *inode, struct lov_user_md *lump,
 int ll_dir_getstripe(struct inode *inode, struct lov_mds_md **lmmp,
 		     int *lmm_size, struct ptlrpc_request **request);
 int ll_fsync(struct file *file, loff_t start, loff_t end, int data);
-int ll_merge_lvb(const struct lu_env *env, struct inode *inode);
+int ll_merge_attr(const struct lu_env *env, struct inode *inode);
 int ll_fid2path(struct inode *inode, void __user *arg);
 int ll_data_version(struct inode *inode, __u64 *data_version, int extent_lock);
 int ll_hsm_release(struct inode *inode);
@@ -1319,11 +1321,6 @@ static inline void cl_isize_write(struct inode *inode, loff_t kms)
 
 #define cl_isize_read(inode)	     i_size_read(inode)
 
-static inline int cl_merge_lvb(const struct lu_env *env, struct inode *inode)
-{
-	return ll_merge_lvb(env, inode);
-}
-
 #define cl_inode_atime(inode) LTIME_S((inode)->i_atime)
 #define cl_inode_ctime(inode) LTIME_S((inode)->i_ctime)
 #define cl_inode_mtime(inode) LTIME_S((inode)->i_mtime)
diff --git a/drivers/staging/lustre/lustre/llite/llite_lib.c b/drivers/staging/lustre/lustre/llite/llite_lib.c
index a4401f2..0b4e0db 100644
--- a/drivers/staging/lustre/lustre/llite/llite_lib.c
+++ b/drivers/staging/lustre/lustre/llite/llite_lib.c
@@ -1551,7 +1551,7 @@ void ll_update_inode(struct inode *inode, struct lustre_md *md)
 	if (body->valid & OBD_MD_FLATIME) {
 		if (body->atime > LTIME_S(inode->i_atime))
 			LTIME_S(inode->i_atime) = body->atime;
-		lli->lli_lvb.lvb_atime = body->atime;
+		lli->lli_atime = body->atime;
 	}
 	if (body->valid & OBD_MD_FLMTIME) {
 		if (body->mtime > LTIME_S(inode->i_mtime)) {
@@ -1560,12 +1560,12 @@ void ll_update_inode(struct inode *inode, struct lustre_md *md)
 			       body->mtime);
 			LTIME_S(inode->i_mtime) = body->mtime;
 		}
-		lli->lli_lvb.lvb_mtime = body->mtime;
+		lli->lli_mtime = body->mtime;
 	}
 	if (body->valid & OBD_MD_FLCTIME) {
 		if (body->ctime > LTIME_S(inode->i_ctime))
 			LTIME_S(inode->i_ctime) = body->ctime;
-		lli->lli_lvb.lvb_ctime = body->ctime;
+		lli->lli_ctime = body->ctime;
 	}
 	if (body->valid & OBD_MD_FLMODE)
 		inode->i_mode = (inode->i_mode & S_IFMT)|(body->mode & ~S_IFMT);
diff --git a/drivers/staging/lustre/lustre/llite/vvp_io.c b/drivers/staging/lustre/lustre/llite/vvp_io.c
index 18127d3..c7db318 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_io.c
+++ b/drivers/staging/lustre/lustre/llite/vvp_io.c
@@ -739,7 +739,7 @@ int vvp_io_write_commit(const struct lu_env *env, struct cl_io *io)
 	}
 
 	/* update inode size */
-	ll_merge_lvb(env, inode);
+	ll_merge_attr(env, inode);
 
 	/* Now the pages in queue were failed to commit, discard them
 	 * unless they were dirtied before.
-- 
2.1.0


* [PATCH v2 14/46] staging/lustre/lmv: remove lmv_init_{lock,unlock}()
  2016-03-30 23:48 [PATCH v2 00/46] Lustre IO stack simplifications and cleanups green
                   ` (12 preceding siblings ...)
  2016-03-30 23:48 ` [PATCH v2 13/46] staging/lustre/llite: remove lli_lvb green
@ 2016-03-30 23:48 ` green
  2016-03-30 23:48 ` [PATCH v2 15/46] staging/lustre/obd: remove struct client_obd_lock green
                   ` (31 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: green @ 2016-03-30 23:48 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger
  Cc: Linux Kernel Mailing List, Lustre Development List,
	John L. Hammond, Oleg Drokin

From: "John L. Hammond" <john.hammond@intel.com>

In struct lmv_obd rename the init_mutex member to
lmv_init_mutex. Remove the compat macros lmv_init_{lock,unlock}() and
use mutex_{lock,unlock}(&lmv->lmv_init_mutex) instead.
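
Condensed from the hunks below, the double-checked connect path in
lmv_check_connect() now reads directly in terms of the mutex (a sketch
only, with error handling elided):

	if (lmv->connected)
		return 0;

	mutex_lock(&lmv->lmv_init_mutex);
	if (lmv->connected) {
		mutex_unlock(&lmv->lmv_init_mutex);
		return 0;
	}
	/* ... connect all targets ... */
	lmv->connected = 1;
	mutex_unlock(&lmv->lmv_init_mutex);
	return 0;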

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Reviewed-on: http://review.whamcloud.com/12115
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2675
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
---
 drivers/staging/lustre/lustre/include/obd.h      |  2 +-
 drivers/staging/lustre/lustre/lmv/lmv_internal.h |  3 ---
 drivers/staging/lustre/lustre/lmv/lmv_obd.c      | 24 ++++++++++++------------
 3 files changed, 13 insertions(+), 16 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/obd.h b/drivers/staging/lustre/lustre/include/obd.h
index 26182ca..15c514c 100644
--- a/drivers/staging/lustre/lustre/include/obd.h
+++ b/drivers/staging/lustre/lustre/include/obd.h
@@ -512,7 +512,7 @@ struct lmv_obd {
 	struct obd_uuid		cluuid;
 	struct obd_export	*exp;
 
-	struct mutex		init_mutex;
+	struct mutex		lmv_init_mutex;
 	int			connected;
 	int			max_easize;
 	int			max_def_easize;
diff --git a/drivers/staging/lustre/lustre/lmv/lmv_internal.h b/drivers/staging/lustre/lustre/lmv/lmv_internal.h
index 8a00871..7007e4c 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_internal.h
+++ b/drivers/staging/lustre/lustre/lmv/lmv_internal.h
@@ -42,9 +42,6 @@
 
 #define LMV_MAX_TGT_COUNT 128
 
-#define lmv_init_lock(lmv)   mutex_lock(&lmv->init_mutex)
-#define lmv_init_unlock(lmv) mutex_unlock(&lmv->init_mutex)
-
 #define LL_IT2STR(it)					\
 	((it) ? ldlm_it2str((it)->it_op) : "0")
 
diff --git a/drivers/staging/lustre/lustre/lmv/lmv_obd.c b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
index e3a7216..8bd2dc5 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_obd.c
+++ b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
@@ -425,7 +425,7 @@ static int lmv_add_target(struct obd_device *obd, struct obd_uuid *uuidp,
 
 	CDEBUG(D_CONFIG, "Target uuid: %s. index %d\n", uuidp->uuid, index);
 
-	lmv_init_lock(lmv);
+	mutex_lock(&lmv->lmv_init_mutex);
 
 	if (lmv->desc.ld_tgt_count == 0) {
 		struct obd_device *mdc_obd;
@@ -433,7 +433,7 @@ static int lmv_add_target(struct obd_device *obd, struct obd_uuid *uuidp,
 		mdc_obd = class_find_client_obd(uuidp, LUSTRE_MDC_NAME,
 						&obd->obd_uuid);
 		if (!mdc_obd) {
-			lmv_init_unlock(lmv);
+			mutex_unlock(&lmv->lmv_init_mutex);
 			CERROR("%s: Target %s not attached: rc = %d\n",
 			       obd->obd_name, uuidp->uuid, -EINVAL);
 			return -EINVAL;
@@ -445,7 +445,7 @@ static int lmv_add_target(struct obd_device *obd, struct obd_uuid *uuidp,
 		CERROR("%s: UUID %s already assigned at LOV target index %d: rc = %d\n",
 		       obd->obd_name,
 		       obd_uuid2str(&tgt->ltd_uuid), index, -EEXIST);
-		lmv_init_unlock(lmv);
+		mutex_unlock(&lmv->lmv_init_mutex);
 		return -EEXIST;
 	}
 
@@ -459,7 +459,7 @@ static int lmv_add_target(struct obd_device *obd, struct obd_uuid *uuidp,
 			newsize <<= 1;
 		newtgts = kcalloc(newsize, sizeof(*newtgts), GFP_NOFS);
 		if (!newtgts) {
-			lmv_init_unlock(lmv);
+			mutex_unlock(&lmv->lmv_init_mutex);
 			return -ENOMEM;
 		}
 
@@ -481,7 +481,7 @@ static int lmv_add_target(struct obd_device *obd, struct obd_uuid *uuidp,
 
 	tgt = kzalloc(sizeof(*tgt), GFP_NOFS);
 	if (!tgt) {
-		lmv_init_unlock(lmv);
+		mutex_unlock(&lmv->lmv_init_mutex);
 		return -ENOMEM;
 	}
 
@@ -507,7 +507,7 @@ static int lmv_add_target(struct obd_device *obd, struct obd_uuid *uuidp,
 		}
 	}
 
-	lmv_init_unlock(lmv);
+	mutex_unlock(&lmv->lmv_init_mutex);
 	return rc;
 }
 
@@ -522,14 +522,14 @@ int lmv_check_connect(struct obd_device *obd)
 	if (lmv->connected)
 		return 0;
 
-	lmv_init_lock(lmv);
+	mutex_lock(&lmv->lmv_init_mutex);
 	if (lmv->connected) {
-		lmv_init_unlock(lmv);
+		mutex_unlock(&lmv->lmv_init_mutex);
 		return 0;
 	}
 
 	if (lmv->desc.ld_tgt_count == 0) {
-		lmv_init_unlock(lmv);
+		mutex_unlock(&lmv->lmv_init_mutex);
 		CERROR("%s: no targets configured.\n", obd->obd_name);
 		return -EINVAL;
 	}
@@ -551,7 +551,7 @@ int lmv_check_connect(struct obd_device *obd)
 	lmv->connected = 1;
 	easize = lmv_get_easize(lmv);
 	lmv_init_ea_size(obd->obd_self_export, easize, 0, 0, 0);
-	lmv_init_unlock(lmv);
+	mutex_unlock(&lmv->lmv_init_mutex);
 	return 0;
 
  out_disc:
@@ -572,7 +572,7 @@ int lmv_check_connect(struct obd_device *obd)
 		}
 	}
 	class_disconnect(lmv->exp);
-	lmv_init_unlock(lmv);
+	mutex_unlock(&lmv->lmv_init_mutex);
 	return rc;
 }
 
@@ -1269,7 +1269,7 @@ static int lmv_setup(struct obd_device *obd, struct lustre_cfg *lcfg)
 	lmv->lmv_placement = PLACEMENT_CHAR_POLICY;
 
 	spin_lock_init(&lmv->lmv_lock);
-	mutex_init(&lmv->init_mutex);
+	mutex_init(&lmv->lmv_init_mutex);
 
 	lprocfs_lmv_init_vars(&lvars);
 
-- 
2.1.0


* [PATCH v2 15/46] staging/lustre/obd: remove struct client_obd_lock
  2016-03-30 23:48 [PATCH v2 00/46] Lustre IO stack simplifications and cleanups green
                   ` (13 preceding siblings ...)
  2016-03-30 23:48 ` [PATCH v2 14/46] staging/lustre/lmv: remove lmv_init_{lock,unlock}() green
@ 2016-03-30 23:48 ` green
  2016-03-30 23:48 ` [PATCH v2 16/46] staging/lustre/llite: remove some cl wrappers green
                   ` (30 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: green @ 2016-03-30 23:48 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger
  Cc: Linux Kernel Mailing List, Lustre Development List,
	John L. Hammond, Oleg Drokin

From: "John L. Hammond" <john.hammond@intel.com>

Remove the definition of struct client_obd_lock and the functions
client_obd_list_{init,lock,unlock,done}(). Use spinlock_t for the
cl_{loi,lru}_list_lock members of struct client_obd and call
spin_{lock,unlock}() directly.
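
As a condensed sketch of a typical converted call site (compare
fld_enter_request() in the first hunk), the in-flight RPC throttle now
takes the plain spinlock directly; the wait and wakeup details are
elided here:

	spin_lock(&cli->cl_loi_list_lock);
	if (cli->cl_r_in_flight >= cli->cl_max_rpcs_in_flight) {
		/* queue as a waiter and sleep outside the lock */
		list_add_tail(&mcw.mcw_entry, &cli->cl_cache_waiters);
		spin_unlock(&cli->cl_loi_list_lock);
	} else {
		/* a slot is free: account for the RPC and proceed */
		cli->cl_r_in_flight++;
		spin_unlock(&cli->cl_loi_list_lock);
	}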

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
---
 drivers/staging/lustre/lustre/fld/fld_request.c    | 14 ++---
 drivers/staging/lustre/lustre/include/linux/obd.h  | 67 ----------------------
 drivers/staging/lustre/lustre/include/obd.h        |  9 +--
 drivers/staging/lustre/lustre/ldlm/ldlm_lib.c      |  4 +-
 drivers/staging/lustre/lustre/mdc/lproc_mdc.c      |  8 +--
 drivers/staging/lustre/lustre/mdc/mdc_lib.c        | 18 +++---
 drivers/staging/lustre/lustre/osc/lproc_osc.c      | 38 ++++++------
 drivers/staging/lustre/lustre/osc/osc_cache.c      | 50 ++++++++--------
 .../staging/lustre/lustre/osc/osc_cl_internal.h    |  2 +-
 drivers/staging/lustre/lustre/osc/osc_page.c       | 24 ++++----
 drivers/staging/lustre/lustre/osc/osc_request.c    | 46 +++++++--------
 11 files changed, 105 insertions(+), 175 deletions(-)

diff --git a/drivers/staging/lustre/lustre/fld/fld_request.c b/drivers/staging/lustre/lustre/fld/fld_request.c
index a3d122d..2dfdb51 100644
--- a/drivers/staging/lustre/lustre/fld/fld_request.c
+++ b/drivers/staging/lustre/lustre/fld/fld_request.c
@@ -64,9 +64,9 @@ static int fld_req_avail(struct client_obd *cli, struct mdc_cache_waiter *mcw)
 {
 	int rc;
 
-	client_obd_list_lock(&cli->cl_loi_list_lock);
+	spin_lock(&cli->cl_loi_list_lock);
 	rc = list_empty(&mcw->mcw_entry);
-	client_obd_list_unlock(&cli->cl_loi_list_lock);
+	spin_unlock(&cli->cl_loi_list_lock);
 	return rc;
 };
 
@@ -75,15 +75,15 @@ static void fld_enter_request(struct client_obd *cli)
 	struct mdc_cache_waiter mcw;
 	struct l_wait_info lwi = { 0 };
 
-	client_obd_list_lock(&cli->cl_loi_list_lock);
+	spin_lock(&cli->cl_loi_list_lock);
 	if (cli->cl_r_in_flight >= cli->cl_max_rpcs_in_flight) {
 		list_add_tail(&mcw.mcw_entry, &cli->cl_cache_waiters);
 		init_waitqueue_head(&mcw.mcw_waitq);
-		client_obd_list_unlock(&cli->cl_loi_list_lock);
+		spin_unlock(&cli->cl_loi_list_lock);
 		l_wait_event(mcw.mcw_waitq, fld_req_avail(cli, &mcw), &lwi);
 	} else {
 		cli->cl_r_in_flight++;
-		client_obd_list_unlock(&cli->cl_loi_list_lock);
+		spin_unlock(&cli->cl_loi_list_lock);
 	}
 }
 
@@ -92,7 +92,7 @@ static void fld_exit_request(struct client_obd *cli)
 	struct list_head *l, *tmp;
 	struct mdc_cache_waiter *mcw;
 
-	client_obd_list_lock(&cli->cl_loi_list_lock);
+	spin_lock(&cli->cl_loi_list_lock);
 	cli->cl_r_in_flight--;
 	list_for_each_safe(l, tmp, &cli->cl_cache_waiters) {
 
@@ -106,7 +106,7 @@ static void fld_exit_request(struct client_obd *cli)
 		cli->cl_r_in_flight++;
 		wake_up(&mcw->mcw_waitq);
 	}
-	client_obd_list_unlock(&cli->cl_loi_list_lock);
+	spin_unlock(&cli->cl_loi_list_lock);
 }
 
 static int fld_rrb_hash(struct lu_client_fld *fld, u64 seq)
diff --git a/drivers/staging/lustre/lustre/include/linux/obd.h b/drivers/staging/lustre/lustre/include/linux/obd.h
index 3907bf4..e1063e8 100644
--- a/drivers/staging/lustre/lustre/include/linux/obd.h
+++ b/drivers/staging/lustre/lustre/include/linux/obd.h
@@ -55,71 +55,4 @@ struct ll_iattr {
 	unsigned int	ia_attr_flags;
 };
 
-#define CLIENT_OBD_LIST_LOCK_DEBUG 1
-
-struct client_obd_lock {
-	spinlock_t		lock;
-
-	unsigned long       time;
-	struct task_struct *task;
-	const char	 *func;
-	int		 line;
-};
-
-static inline void __client_obd_list_lock(struct client_obd_lock *lock,
-					  const char *func, int line)
-{
-	unsigned long cur = jiffies;
-
-	while (1) {
-		if (spin_trylock(&lock->lock)) {
-			LASSERT(!lock->task);
-			lock->task = current;
-			lock->func = func;
-			lock->line = line;
-			lock->time = jiffies;
-			break;
-		}
-
-		if (time_before(cur + 5 * HZ, jiffies) &&
-		    time_before(lock->time + 5 * HZ, jiffies)) {
-			struct task_struct *task = lock->task;
-
-			if (!task)
-				continue;
-
-			LCONSOLE_WARN("%s:%d: lock %p was acquired by <%s:%d:%s:%d> for %lu seconds.\n",
-				      current->comm, current->pid,
-				      lock, task->comm, task->pid,
-				      lock->func, lock->line,
-				      (jiffies - lock->time) / HZ);
-			LCONSOLE_WARN("====== for current process =====\n");
-			dump_stack();
-			LCONSOLE_WARN("====== end =======\n");
-			set_current_state(TASK_UNINTERRUPTIBLE);
-			schedule_timeout(1000 * HZ);
-		}
-		cpu_relax();
-	}
-}
-
-#define client_obd_list_lock(lock) \
-	__client_obd_list_lock(lock, __func__, __LINE__)
-
-static inline void client_obd_list_unlock(struct client_obd_lock *lock)
-{
-	LASSERT(lock->task);
-	lock->task = NULL;
-	lock->time = jiffies;
-	spin_unlock(&lock->lock);
-}
-
-static inline void client_obd_list_lock_init(struct client_obd_lock *lock)
-{
-	spin_lock_init(&lock->lock);
-}
-
-static inline void client_obd_list_lock_done(struct client_obd_lock *lock)
-{}
-
 #endif /* __LINUX_OBD_H */
diff --git a/drivers/staging/lustre/lustre/include/obd.h b/drivers/staging/lustre/lustre/include/obd.h
index 15c514c..84cc001 100644
--- a/drivers/staging/lustre/lustre/include/obd.h
+++ b/drivers/staging/lustre/lustre/include/obd.h
@@ -37,6 +37,7 @@
 #ifndef __OBD_H
 #define __OBD_H
 
+#include <linux/spinlock.h>
 #include "linux/obd.h"
 
 #define IOC_OSC_TYPE	 'h'
@@ -293,14 +294,10 @@ struct client_obd {
 	 * blocking everywhere, but we don't want to slow down fast-path of
 	 * our main platform.)
 	 *
-	 * Exact type of ->cl_loi_list_lock is defined in arch/obd.h together
-	 * with client_obd_list_{un,}lock() and
-	 * client_obd_list_lock_{init,done}() functions.
-	 *
 	 * NB by Jinshan: though field names are still _loi_, but actually
 	 * osc_object{}s are in the list.
 	 */
-	struct client_obd_lock	       cl_loi_list_lock;
+	spinlock_t		       cl_loi_list_lock;
 	struct list_head	       cl_loi_ready_list;
 	struct list_head	       cl_loi_hp_ready_list;
 	struct list_head	       cl_loi_write_list;
@@ -327,7 +324,7 @@ struct client_obd {
 	atomic_t		 cl_lru_shrinkers;
 	atomic_t		 cl_lru_in_list;
 	struct list_head	 cl_lru_list; /* lru page list */
-	struct client_obd_lock   cl_lru_list_lock; /* page list protector */
+	spinlock_t		 cl_lru_list_lock; /* page list protector */
 
 	/* number of in flight destroy rpcs is limited to max_rpcs_in_flight */
 	atomic_t	     cl_destroy_in_flight;
diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_lib.c b/drivers/staging/lustre/lustre/ldlm/ldlm_lib.c
index b586d5a..b497ce4 100644
--- a/drivers/staging/lustre/lustre/ldlm/ldlm_lib.c
+++ b/drivers/staging/lustre/lustre/ldlm/ldlm_lib.c
@@ -314,7 +314,7 @@ int client_obd_setup(struct obd_device *obddev, struct lustre_cfg *lcfg)
 	INIT_LIST_HEAD(&cli->cl_loi_hp_ready_list);
 	INIT_LIST_HEAD(&cli->cl_loi_write_list);
 	INIT_LIST_HEAD(&cli->cl_loi_read_list);
-	client_obd_list_lock_init(&cli->cl_loi_list_lock);
+	spin_lock_init(&cli->cl_loi_list_lock);
 	atomic_set(&cli->cl_pending_w_pages, 0);
 	atomic_set(&cli->cl_pending_r_pages, 0);
 	cli->cl_r_in_flight = 0;
@@ -333,7 +333,7 @@ int client_obd_setup(struct obd_device *obddev, struct lustre_cfg *lcfg)
 	atomic_set(&cli->cl_lru_busy, 0);
 	atomic_set(&cli->cl_lru_in_list, 0);
 	INIT_LIST_HEAD(&cli->cl_lru_list);
-	client_obd_list_lock_init(&cli->cl_lru_list_lock);
+	spin_lock_init(&cli->cl_lru_list_lock);
 
 	init_waitqueue_head(&cli->cl_destroy_waitq);
 	atomic_set(&cli->cl_destroy_in_flight, 0);
diff --git a/drivers/staging/lustre/lustre/mdc/lproc_mdc.c b/drivers/staging/lustre/lustre/mdc/lproc_mdc.c
index 38f267a..5c7a15d 100644
--- a/drivers/staging/lustre/lustre/mdc/lproc_mdc.c
+++ b/drivers/staging/lustre/lustre/mdc/lproc_mdc.c
@@ -49,9 +49,9 @@ static ssize_t max_rpcs_in_flight_show(struct kobject *kobj,
 					      obd_kobj);
 	struct client_obd *cli = &dev->u.cli;
 
-	client_obd_list_lock(&cli->cl_loi_list_lock);
+	spin_lock(&cli->cl_loi_list_lock);
 	len = sprintf(buf, "%u\n", cli->cl_max_rpcs_in_flight);
-	client_obd_list_unlock(&cli->cl_loi_list_lock);
+	spin_unlock(&cli->cl_loi_list_lock);
 
 	return len;
 }
@@ -74,9 +74,9 @@ static ssize_t max_rpcs_in_flight_store(struct kobject *kobj,
 	if (val < 1 || val > MDC_MAX_RIF_MAX)
 		return -ERANGE;
 
-	client_obd_list_lock(&cli->cl_loi_list_lock);
+	spin_lock(&cli->cl_loi_list_lock);
 	cli->cl_max_rpcs_in_flight = val;
-	client_obd_list_unlock(&cli->cl_loi_list_lock);
+	spin_unlock(&cli->cl_loi_list_lock);
 
 	return count;
 }
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_lib.c b/drivers/staging/lustre/lustre/mdc/mdc_lib.c
index b3bfdcb..bd29eb7 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_lib.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_lib.c
@@ -481,9 +481,9 @@ static int mdc_req_avail(struct client_obd *cli, struct mdc_cache_waiter *mcw)
 {
 	int rc;
 
-	client_obd_list_lock(&cli->cl_loi_list_lock);
+	spin_lock(&cli->cl_loi_list_lock);
 	rc = list_empty(&mcw->mcw_entry);
-	client_obd_list_unlock(&cli->cl_loi_list_lock);
+	spin_unlock(&cli->cl_loi_list_lock);
 	return rc;
 };
 
@@ -497,23 +497,23 @@ int mdc_enter_request(struct client_obd *cli)
 	struct mdc_cache_waiter mcw;
 	struct l_wait_info lwi = LWI_INTR(LWI_ON_SIGNAL_NOOP, NULL);
 
-	client_obd_list_lock(&cli->cl_loi_list_lock);
+	spin_lock(&cli->cl_loi_list_lock);
 	if (cli->cl_r_in_flight >= cli->cl_max_rpcs_in_flight) {
 		list_add_tail(&mcw.mcw_entry, &cli->cl_cache_waiters);
 		init_waitqueue_head(&mcw.mcw_waitq);
-		client_obd_list_unlock(&cli->cl_loi_list_lock);
+		spin_unlock(&cli->cl_loi_list_lock);
 		rc = l_wait_event(mcw.mcw_waitq, mdc_req_avail(cli, &mcw),
 				  &lwi);
 		if (rc) {
-			client_obd_list_lock(&cli->cl_loi_list_lock);
+			spin_lock(&cli->cl_loi_list_lock);
 			if (list_empty(&mcw.mcw_entry))
 				cli->cl_r_in_flight--;
 			list_del_init(&mcw.mcw_entry);
-			client_obd_list_unlock(&cli->cl_loi_list_lock);
+			spin_unlock(&cli->cl_loi_list_lock);
 		}
 	} else {
 		cli->cl_r_in_flight++;
-		client_obd_list_unlock(&cli->cl_loi_list_lock);
+		spin_unlock(&cli->cl_loi_list_lock);
 	}
 	return rc;
 }
@@ -523,7 +523,7 @@ void mdc_exit_request(struct client_obd *cli)
 	struct list_head *l, *tmp;
 	struct mdc_cache_waiter *mcw;
 
-	client_obd_list_lock(&cli->cl_loi_list_lock);
+	spin_lock(&cli->cl_loi_list_lock);
 	cli->cl_r_in_flight--;
 	list_for_each_safe(l, tmp, &cli->cl_cache_waiters) {
 		if (cli->cl_r_in_flight >= cli->cl_max_rpcs_in_flight) {
@@ -538,5 +538,5 @@ void mdc_exit_request(struct client_obd *cli)
 	}
 	/* Empty waiting list? Decrease reqs in-flight number */
 
-	client_obd_list_unlock(&cli->cl_loi_list_lock);
+	spin_unlock(&cli->cl_loi_list_lock);
 }
diff --git a/drivers/staging/lustre/lustre/osc/lproc_osc.c b/drivers/staging/lustre/lustre/osc/lproc_osc.c
index e6e2029..911e505 100644
--- a/drivers/staging/lustre/lustre/osc/lproc_osc.c
+++ b/drivers/staging/lustre/lustre/osc/lproc_osc.c
@@ -121,9 +121,9 @@ static ssize_t max_rpcs_in_flight_store(struct kobject *kobj,
 		atomic_add(added, &osc_pool_req_count);
 	}
 
-	client_obd_list_lock(&cli->cl_loi_list_lock);
+	spin_lock(&cli->cl_loi_list_lock);
 	cli->cl_max_rpcs_in_flight = val;
-	client_obd_list_unlock(&cli->cl_loi_list_lock);
+	spin_unlock(&cli->cl_loi_list_lock);
 
 	return count;
 }
@@ -139,9 +139,9 @@ static ssize_t max_dirty_mb_show(struct kobject *kobj,
 	long val;
 	int mult;
 
-	client_obd_list_lock(&cli->cl_loi_list_lock);
+	spin_lock(&cli->cl_loi_list_lock);
 	val = cli->cl_dirty_max;
-	client_obd_list_unlock(&cli->cl_loi_list_lock);
+	spin_unlock(&cli->cl_loi_list_lock);
 
 	mult = 1 << 20;
 	return lprocfs_read_frac_helper(buf, PAGE_SIZE, val, mult);
@@ -169,10 +169,10 @@ static ssize_t max_dirty_mb_store(struct kobject *kobj,
 	    pages_number > totalram_pages / 4) /* 1/4 of RAM */
 		return -ERANGE;
 
-	client_obd_list_lock(&cli->cl_loi_list_lock);
+	spin_lock(&cli->cl_loi_list_lock);
 	cli->cl_dirty_max = (u32)(pages_number << PAGE_CACHE_SHIFT);
 	osc_wake_cache_waiters(cli);
-	client_obd_list_unlock(&cli->cl_loi_list_lock);
+	spin_unlock(&cli->cl_loi_list_lock);
 
 	return count;
 }
@@ -247,9 +247,9 @@ static ssize_t cur_dirty_bytes_show(struct kobject *kobj,
 	struct client_obd *cli = &dev->u.cli;
 	int len;
 
-	client_obd_list_lock(&cli->cl_loi_list_lock);
+	spin_lock(&cli->cl_loi_list_lock);
 	len = sprintf(buf, "%lu\n", cli->cl_dirty);
-	client_obd_list_unlock(&cli->cl_loi_list_lock);
+	spin_unlock(&cli->cl_loi_list_lock);
 
 	return len;
 }
@@ -264,9 +264,9 @@ static ssize_t cur_grant_bytes_show(struct kobject *kobj,
 	struct client_obd *cli = &dev->u.cli;
 	int len;
 
-	client_obd_list_lock(&cli->cl_loi_list_lock);
+	spin_lock(&cli->cl_loi_list_lock);
 	len = sprintf(buf, "%lu\n", cli->cl_avail_grant);
-	client_obd_list_unlock(&cli->cl_loi_list_lock);
+	spin_unlock(&cli->cl_loi_list_lock);
 
 	return len;
 }
@@ -287,12 +287,12 @@ static ssize_t cur_grant_bytes_store(struct kobject *kobj,
 		return rc;
 
 	/* this is only for shrinking grant */
-	client_obd_list_lock(&cli->cl_loi_list_lock);
+	spin_lock(&cli->cl_loi_list_lock);
 	if (val >= cli->cl_avail_grant) {
-		client_obd_list_unlock(&cli->cl_loi_list_lock);
+		spin_unlock(&cli->cl_loi_list_lock);
 		return -EINVAL;
 	}
-	client_obd_list_unlock(&cli->cl_loi_list_lock);
+	spin_unlock(&cli->cl_loi_list_lock);
 
 	if (cli->cl_import->imp_state == LUSTRE_IMP_FULL)
 		rc = osc_shrink_grant_to_target(cli, val);
@@ -311,9 +311,9 @@ static ssize_t cur_lost_grant_bytes_show(struct kobject *kobj,
 	struct client_obd *cli = &dev->u.cli;
 	int len;
 
-	client_obd_list_lock(&cli->cl_loi_list_lock);
+	spin_lock(&cli->cl_loi_list_lock);
 	len = sprintf(buf, "%lu\n", cli->cl_lost_grant);
-	client_obd_list_unlock(&cli->cl_loi_list_lock);
+	spin_unlock(&cli->cl_loi_list_lock);
 
 	return len;
 }
@@ -585,9 +585,9 @@ static ssize_t max_pages_per_rpc_store(struct kobject *kobj,
 	if (val == 0 || val > ocd->ocd_brw_size >> PAGE_CACHE_SHIFT) {
 		return -ERANGE;
 	}
-	client_obd_list_lock(&cli->cl_loi_list_lock);
+	spin_lock(&cli->cl_loi_list_lock);
 	cli->cl_max_pages_per_rpc = val;
-	client_obd_list_unlock(&cli->cl_loi_list_lock);
+	spin_unlock(&cli->cl_loi_list_lock);
 
 	return count;
 }
@@ -631,7 +631,7 @@ static int osc_rpc_stats_seq_show(struct seq_file *seq, void *v)
 
 	ktime_get_real_ts64(&now);
 
-	client_obd_list_lock(&cli->cl_loi_list_lock);
+	spin_lock(&cli->cl_loi_list_lock);
 
 	seq_printf(seq, "snapshot_time:	 %llu.%9lu (secs.usecs)\n",
 		   (s64)now.tv_sec, (unsigned long)now.tv_nsec);
@@ -715,7 +715,7 @@ static int osc_rpc_stats_seq_show(struct seq_file *seq, void *v)
 			break;
 	}
 
-	client_obd_list_unlock(&cli->cl_loi_list_lock);
+	spin_unlock(&cli->cl_loi_list_lock);
 
 	return 0;
 }
diff --git a/drivers/staging/lustre/lustre/osc/osc_cache.c b/drivers/staging/lustre/lustre/osc/osc_cache.c
index 74607933..c510659 100644
--- a/drivers/staging/lustre/lustre/osc/osc_cache.c
+++ b/drivers/staging/lustre/lustre/osc/osc_cache.c
@@ -1373,7 +1373,7 @@ static int osc_completion(const struct lu_env *env, struct osc_async_page *oap,
 static void osc_consume_write_grant(struct client_obd *cli,
 				    struct brw_page *pga)
 {
-	assert_spin_locked(&cli->cl_loi_list_lock.lock);
+	assert_spin_locked(&cli->cl_loi_list_lock);
 	LASSERT(!(pga->flag & OBD_BRW_FROM_GRANT));
 	atomic_inc(&obd_dirty_pages);
 	cli->cl_dirty += PAGE_CACHE_SIZE;
@@ -1389,7 +1389,7 @@ static void osc_consume_write_grant(struct client_obd *cli,
 static void osc_release_write_grant(struct client_obd *cli,
 				    struct brw_page *pga)
 {
-	assert_spin_locked(&cli->cl_loi_list_lock.lock);
+	assert_spin_locked(&cli->cl_loi_list_lock);
 	if (!(pga->flag & OBD_BRW_FROM_GRANT)) {
 		return;
 	}
@@ -1408,7 +1408,7 @@ static void osc_release_write_grant(struct client_obd *cli,
  * To avoid sleeping with object lock held, it's good for us allocate enough
  * grants before entering into critical section.
  *
- * client_obd_list_lock held by caller
+ * spin_lock held by caller
  */
 static int osc_reserve_grant(struct client_obd *cli, unsigned int bytes)
 {
@@ -1442,11 +1442,11 @@ static void __osc_unreserve_grant(struct client_obd *cli,
 static void osc_unreserve_grant(struct client_obd *cli,
 				unsigned int reserved, unsigned int unused)
 {
-	client_obd_list_lock(&cli->cl_loi_list_lock);
+	spin_lock(&cli->cl_loi_list_lock);
 	__osc_unreserve_grant(cli, reserved, unused);
 	if (unused > 0)
 		osc_wake_cache_waiters(cli);
-	client_obd_list_unlock(&cli->cl_loi_list_lock);
+	spin_unlock(&cli->cl_loi_list_lock);
 }
 
 /**
@@ -1467,7 +1467,7 @@ static void osc_free_grant(struct client_obd *cli, unsigned int nr_pages,
 {
 	int grant = (1 << cli->cl_chunkbits) + cli->cl_extent_tax;
 
-	client_obd_list_lock(&cli->cl_loi_list_lock);
+	spin_lock(&cli->cl_loi_list_lock);
 	atomic_sub(nr_pages, &obd_dirty_pages);
 	cli->cl_dirty -= nr_pages << PAGE_CACHE_SHIFT;
 	cli->cl_lost_grant += lost_grant;
@@ -1479,7 +1479,7 @@ static void osc_free_grant(struct client_obd *cli, unsigned int nr_pages,
 		cli->cl_avail_grant += grant;
 	}
 	osc_wake_cache_waiters(cli);
-	client_obd_list_unlock(&cli->cl_loi_list_lock);
+	spin_unlock(&cli->cl_loi_list_lock);
 	CDEBUG(D_CACHE, "lost %u grant: %lu avail: %lu dirty: %lu\n",
 	       lost_grant, cli->cl_lost_grant,
 	       cli->cl_avail_grant, cli->cl_dirty);
@@ -1491,9 +1491,9 @@ static void osc_free_grant(struct client_obd *cli, unsigned int nr_pages,
  */
 static void osc_exit_cache(struct client_obd *cli, struct osc_async_page *oap)
 {
-	client_obd_list_lock(&cli->cl_loi_list_lock);
+	spin_lock(&cli->cl_loi_list_lock);
 	osc_release_write_grant(cli, &oap->oap_brw_page);
-	client_obd_list_unlock(&cli->cl_loi_list_lock);
+	spin_unlock(&cli->cl_loi_list_lock);
 }
 
 /**
@@ -1532,9 +1532,9 @@ static int ocw_granted(struct client_obd *cli, struct osc_cache_waiter *ocw)
 {
 	int rc;
 
-	client_obd_list_lock(&cli->cl_loi_list_lock);
+	spin_lock(&cli->cl_loi_list_lock);
 	rc = list_empty(&ocw->ocw_entry);
-	client_obd_list_unlock(&cli->cl_loi_list_lock);
+	spin_unlock(&cli->cl_loi_list_lock);
 	return rc;
 }
 
@@ -1556,7 +1556,7 @@ static int osc_enter_cache(const struct lu_env *env, struct client_obd *cli,
 
 	OSC_DUMP_GRANT(cli, "need:%d.\n", bytes);
 
-	client_obd_list_lock(&cli->cl_loi_list_lock);
+	spin_lock(&cli->cl_loi_list_lock);
 
 	/* force the caller to try sync io.  this can jump the list
 	 * of queued writes and create a discontiguous rpc stream
@@ -1587,7 +1587,7 @@ static int osc_enter_cache(const struct lu_env *env, struct client_obd *cli,
 	while (cli->cl_dirty > 0 || cli->cl_w_in_flight > 0) {
 		list_add_tail(&ocw.ocw_entry, &cli->cl_cache_waiters);
 		ocw.ocw_rc = 0;
-		client_obd_list_unlock(&cli->cl_loi_list_lock);
+		spin_unlock(&cli->cl_loi_list_lock);
 
 		osc_io_unplug_async(env, cli, NULL);
 
@@ -1596,7 +1596,7 @@ static int osc_enter_cache(const struct lu_env *env, struct client_obd *cli,
 
 		rc = l_wait_event(ocw.ocw_waitq, ocw_granted(cli, &ocw), &lwi);
 
-		client_obd_list_lock(&cli->cl_loi_list_lock);
+		spin_lock(&cli->cl_loi_list_lock);
 
 		/* l_wait_event is interrupted by signal */
 		if (rc < 0) {
@@ -1615,7 +1615,7 @@ static int osc_enter_cache(const struct lu_env *env, struct client_obd *cli,
 		}
 	}
 out:
-	client_obd_list_unlock(&cli->cl_loi_list_lock);
+	spin_unlock(&cli->cl_loi_list_lock);
 	OSC_DUMP_GRANT(cli, "returned %d.\n", rc);
 	return rc;
 }
@@ -1776,9 +1776,9 @@ static int osc_list_maint(struct client_obd *cli, struct osc_object *osc)
 {
 	int is_ready;
 
-	client_obd_list_lock(&cli->cl_loi_list_lock);
+	spin_lock(&cli->cl_loi_list_lock);
 	is_ready = __osc_list_maint(cli, osc);
-	client_obd_list_unlock(&cli->cl_loi_list_lock);
+	spin_unlock(&cli->cl_loi_list_lock);
 
 	return is_ready;
 }
@@ -1829,10 +1829,10 @@ static void osc_ap_completion(const struct lu_env *env, struct client_obd *cli,
 	oap->oap_interrupted = 0;
 
 	if (oap->oap_cmd & OBD_BRW_WRITE && xid > 0) {
-		client_obd_list_lock(&cli->cl_loi_list_lock);
+		spin_lock(&cli->cl_loi_list_lock);
 		osc_process_ar(&cli->cl_ar, xid, rc);
 		osc_process_ar(&loi->loi_ar, xid, rc);
-		client_obd_list_unlock(&cli->cl_loi_list_lock);
+		spin_unlock(&cli->cl_loi_list_lock);
 	}
 
 	rc = osc_completion(env, oap, oap->oap_cmd, rc);
@@ -2133,7 +2133,7 @@ static void osc_check_rpcs(const struct lu_env *env, struct client_obd *cli)
 		}
 
 		cl_object_get(obj);
-		client_obd_list_unlock(&cli->cl_loi_list_lock);
+		spin_unlock(&cli->cl_loi_list_lock);
 		lu_object_ref_add_at(&obj->co_lu, &link, "check", current);
 
 		/* attempt some read/write balancing by alternating between
@@ -2180,7 +2180,7 @@ static void osc_check_rpcs(const struct lu_env *env, struct client_obd *cli)
 		lu_object_ref_del_at(&obj->co_lu, &link, "check", current);
 		cl_object_put(env, obj);
 
-		client_obd_list_lock(&cli->cl_loi_list_lock);
+		spin_lock(&cli->cl_loi_list_lock);
 	}
 }
 
@@ -2197,9 +2197,9 @@ static int osc_io_unplug0(const struct lu_env *env, struct client_obd *cli,
 		 * potential stack overrun problem. LU-2859
 		 */
 		atomic_inc(&cli->cl_lru_shrinkers);
-		client_obd_list_lock(&cli->cl_loi_list_lock);
+		spin_lock(&cli->cl_loi_list_lock);
 		osc_check_rpcs(env, cli);
-		client_obd_list_unlock(&cli->cl_loi_list_lock);
+		spin_unlock(&cli->cl_loi_list_lock);
 		atomic_dec(&cli->cl_lru_shrinkers);
 	} else {
 		CDEBUG(D_CACHE, "Queue writeback work for client %p.\n", cli);
@@ -2332,9 +2332,9 @@ int osc_queue_async_io(const struct lu_env *env, struct cl_io *io,
 			grants = 0;
 
 		/* it doesn't need any grant to dirty this page */
-		client_obd_list_lock(&cli->cl_loi_list_lock);
+		spin_lock(&cli->cl_loi_list_lock);
 		rc = osc_enter_cache_try(cli, oap, grants, 0);
-		client_obd_list_unlock(&cli->cl_loi_list_lock);
+		spin_unlock(&cli->cl_loi_list_lock);
 		if (rc == 0) { /* try failed */
 			grants = 0;
 			need_release = 1;
diff --git a/drivers/staging/lustre/lustre/osc/osc_cl_internal.h b/drivers/staging/lustre/lustre/osc/osc_cl_internal.h
index 89552d7..bb34b53a 100644
--- a/drivers/staging/lustre/lustre/osc/osc_cl_internal.h
+++ b/drivers/staging/lustre/lustre/osc/osc_cl_internal.h
@@ -608,7 +608,7 @@ enum osc_extent_state {
  *
  * LOCKING ORDER
  * =============
- * page lock -> client_obd_list_lock -> object lock(osc_object::oo_lock)
+ * page lock -> cl_loi_list_lock -> object lock(osc_object::oo_lock)
  */
 struct osc_extent {
 	/** red-black tree node */
diff --git a/drivers/staging/lustre/lustre/osc/osc_page.c b/drivers/staging/lustre/lustre/osc/osc_page.c
index e02dd33..5b31351 100644
--- a/drivers/staging/lustre/lustre/osc/osc_page.c
+++ b/drivers/staging/lustre/lustre/osc/osc_page.c
@@ -456,11 +456,11 @@ void osc_lru_add_batch(struct client_obd *cli, struct list_head *plist)
 	}
 
 	if (npages > 0) {
-		client_obd_list_lock(&cli->cl_lru_list_lock);
+		spin_lock(&cli->cl_lru_list_lock);
 		list_splice_tail(&lru, &cli->cl_lru_list);
 		atomic_sub(npages, &cli->cl_lru_busy);
 		atomic_add(npages, &cli->cl_lru_in_list);
-		client_obd_list_unlock(&cli->cl_lru_list_lock);
+		spin_unlock(&cli->cl_lru_list_lock);
 
 		/* XXX: May set force to be true for better performance */
 		if (osc_cache_too_much(cli))
@@ -482,14 +482,14 @@ static void __osc_lru_del(struct client_obd *cli, struct osc_page *opg)
 static void osc_lru_del(struct client_obd *cli, struct osc_page *opg)
 {
 	if (opg->ops_in_lru) {
-		client_obd_list_lock(&cli->cl_lru_list_lock);
+		spin_lock(&cli->cl_lru_list_lock);
 		if (!list_empty(&opg->ops_lru)) {
 			__osc_lru_del(cli, opg);
 		} else {
 			LASSERT(atomic_read(&cli->cl_lru_busy) > 0);
 			atomic_dec(&cli->cl_lru_busy);
 		}
-		client_obd_list_unlock(&cli->cl_lru_list_lock);
+		spin_unlock(&cli->cl_lru_list_lock);
 
 		atomic_inc(cli->cl_lru_left);
 		/* this is a great place to release more LRU pages if
@@ -513,9 +513,9 @@ static void osc_lru_use(struct client_obd *cli, struct osc_page *opg)
 	 * ops_lru should be empty
 	 */
 	if (opg->ops_in_lru && !list_empty(&opg->ops_lru)) {
-		client_obd_list_lock(&cli->cl_lru_list_lock);
+		spin_lock(&cli->cl_lru_list_lock);
 		__osc_lru_del(cli, opg);
-		client_obd_list_unlock(&cli->cl_lru_list_lock);
+		spin_unlock(&cli->cl_lru_list_lock);
 		atomic_inc(&cli->cl_lru_busy);
 	}
 }
@@ -572,7 +572,7 @@ int osc_lru_shrink(const struct lu_env *env, struct client_obd *cli,
 	pvec = (struct cl_page **)osc_env_info(env)->oti_pvec;
 	io = &osc_env_info(env)->oti_io;
 
-	client_obd_list_lock(&cli->cl_lru_list_lock);
+	spin_lock(&cli->cl_lru_list_lock);
 	maxscan = min(target << 1, atomic_read(&cli->cl_lru_in_list));
 	list_for_each_entry_safe(opg, temp, &cli->cl_lru_list, ops_lru) {
 		struct cl_page *page;
@@ -592,7 +592,7 @@ int osc_lru_shrink(const struct lu_env *env, struct client_obd *cli,
 			struct cl_object *tmp = page->cp_obj;
 
 			cl_object_get(tmp);
-			client_obd_list_unlock(&cli->cl_lru_list_lock);
+			spin_unlock(&cli->cl_lru_list_lock);
 
 			if (clobj) {
 				discard_pagevec(env, io, pvec, index);
@@ -608,7 +608,7 @@ int osc_lru_shrink(const struct lu_env *env, struct client_obd *cli,
 			io->ci_ignore_layout = 1;
 			rc = cl_io_init(env, io, CIT_MISC, clobj);
 
-			client_obd_list_lock(&cli->cl_lru_list_lock);
+			spin_lock(&cli->cl_lru_list_lock);
 
 			if (rc != 0)
 				break;
@@ -640,17 +640,17 @@ int osc_lru_shrink(const struct lu_env *env, struct client_obd *cli,
 		/* Don't discard and free the page with cl_lru_list held */
 		pvec[index++] = page;
 		if (unlikely(index == OTI_PVEC_SIZE)) {
-			client_obd_list_unlock(&cli->cl_lru_list_lock);
+			spin_unlock(&cli->cl_lru_list_lock);
 			discard_pagevec(env, io, pvec, index);
 			index = 0;
 
-			client_obd_list_lock(&cli->cl_lru_list_lock);
+			spin_lock(&cli->cl_lru_list_lock);
 		}
 
 		if (++count >= target)
 			break;
 	}
-	client_obd_list_unlock(&cli->cl_lru_list_lock);
+	spin_unlock(&cli->cl_lru_list_lock);
 
 	if (clobj) {
 		discard_pagevec(env, io, pvec, index);
diff --git a/drivers/staging/lustre/lustre/osc/osc_request.c b/drivers/staging/lustre/lustre/osc/osc_request.c
index 1e378e6..372bd26 100644
--- a/drivers/staging/lustre/lustre/osc/osc_request.c
+++ b/drivers/staging/lustre/lustre/osc/osc_request.c
@@ -801,7 +801,7 @@ static void osc_announce_cached(struct client_obd *cli, struct obdo *oa,
 	LASSERT(!(oa->o_valid & bits));
 
 	oa->o_valid |= bits;
-	client_obd_list_lock(&cli->cl_loi_list_lock);
+	spin_lock(&cli->cl_loi_list_lock);
 	oa->o_dirty = cli->cl_dirty;
 	if (unlikely(cli->cl_dirty - cli->cl_dirty_transit >
 		     cli->cl_dirty_max)) {
@@ -833,7 +833,7 @@ static void osc_announce_cached(struct client_obd *cli, struct obdo *oa,
 	oa->o_grant = cli->cl_avail_grant + cli->cl_reserved_grant;
 	oa->o_dropped = cli->cl_lost_grant;
 	cli->cl_lost_grant = 0;
-	client_obd_list_unlock(&cli->cl_loi_list_lock);
+	spin_unlock(&cli->cl_loi_list_lock);
 	CDEBUG(D_CACHE, "dirty: %llu undirty: %u dropped %u grant: %llu\n",
 	       oa->o_dirty, oa->o_undirty, oa->o_dropped, oa->o_grant);
 
@@ -849,9 +849,9 @@ void osc_update_next_shrink(struct client_obd *cli)
 
 static void __osc_update_grant(struct client_obd *cli, u64 grant)
 {
-	client_obd_list_lock(&cli->cl_loi_list_lock);
+	spin_lock(&cli->cl_loi_list_lock);
 	cli->cl_avail_grant += grant;
-	client_obd_list_unlock(&cli->cl_loi_list_lock);
+	spin_unlock(&cli->cl_loi_list_lock);
 }
 
 static void osc_update_grant(struct client_obd *cli, struct ost_body *body)
@@ -889,10 +889,10 @@ out:
 
 static void osc_shrink_grant_local(struct client_obd *cli, struct obdo *oa)
 {
-	client_obd_list_lock(&cli->cl_loi_list_lock);
+	spin_lock(&cli->cl_loi_list_lock);
 	oa->o_grant = cli->cl_avail_grant / 4;
 	cli->cl_avail_grant -= oa->o_grant;
-	client_obd_list_unlock(&cli->cl_loi_list_lock);
+	spin_unlock(&cli->cl_loi_list_lock);
 	if (!(oa->o_valid & OBD_MD_FLFLAGS)) {
 		oa->o_valid |= OBD_MD_FLFLAGS;
 		oa->o_flags = 0;
@@ -911,10 +911,10 @@ static int osc_shrink_grant(struct client_obd *cli)
 	__u64 target_bytes = (cli->cl_max_rpcs_in_flight + 1) *
 			     (cli->cl_max_pages_per_rpc << PAGE_CACHE_SHIFT);
 
-	client_obd_list_lock(&cli->cl_loi_list_lock);
+	spin_lock(&cli->cl_loi_list_lock);
 	if (cli->cl_avail_grant <= target_bytes)
 		target_bytes = cli->cl_max_pages_per_rpc << PAGE_CACHE_SHIFT;
-	client_obd_list_unlock(&cli->cl_loi_list_lock);
+	spin_unlock(&cli->cl_loi_list_lock);
 
 	return osc_shrink_grant_to_target(cli, target_bytes);
 }
@@ -924,7 +924,7 @@ int osc_shrink_grant_to_target(struct client_obd *cli, __u64 target_bytes)
 	int rc = 0;
 	struct ost_body	*body;
 
-	client_obd_list_lock(&cli->cl_loi_list_lock);
+	spin_lock(&cli->cl_loi_list_lock);
 	/* Don't shrink if we are already above or below the desired limit
 	 * We don't want to shrink below a single RPC, as that will negatively
 	 * impact block allocation and long-term performance.
@@ -933,10 +933,10 @@ int osc_shrink_grant_to_target(struct client_obd *cli, __u64 target_bytes)
 		target_bytes = cli->cl_max_pages_per_rpc << PAGE_CACHE_SHIFT;
 
 	if (target_bytes >= cli->cl_avail_grant) {
-		client_obd_list_unlock(&cli->cl_loi_list_lock);
+		spin_unlock(&cli->cl_loi_list_lock);
 		return 0;
 	}
-	client_obd_list_unlock(&cli->cl_loi_list_lock);
+	spin_unlock(&cli->cl_loi_list_lock);
 
 	body = kzalloc(sizeof(*body), GFP_NOFS);
 	if (!body)
@@ -944,10 +944,10 @@ int osc_shrink_grant_to_target(struct client_obd *cli, __u64 target_bytes)
 
 	osc_announce_cached(cli, &body->oa, 0);
 
-	client_obd_list_lock(&cli->cl_loi_list_lock);
+	spin_lock(&cli->cl_loi_list_lock);
 	body->oa.o_grant = cli->cl_avail_grant - target_bytes;
 	cli->cl_avail_grant = target_bytes;
-	client_obd_list_unlock(&cli->cl_loi_list_lock);
+	spin_unlock(&cli->cl_loi_list_lock);
 	if (!(body->oa.o_valid & OBD_MD_FLFLAGS)) {
 		body->oa.o_valid |= OBD_MD_FLFLAGS;
 		body->oa.o_flags = 0;
@@ -1035,7 +1035,7 @@ static void osc_init_grant(struct client_obd *cli, struct obd_connect_data *ocd)
 	 * race is tolerable here: if we're evicted, but imp_state already
 	 * left EVICTED state, then cl_dirty must be 0 already.
 	 */
-	client_obd_list_lock(&cli->cl_loi_list_lock);
+	spin_lock(&cli->cl_loi_list_lock);
 	if (cli->cl_import->imp_state == LUSTRE_IMP_EVICTED)
 		cli->cl_avail_grant = ocd->ocd_grant;
 	else
@@ -1053,7 +1053,7 @@ static void osc_init_grant(struct client_obd *cli, struct obd_connect_data *ocd)
 
 	/* determine the appropriate chunk size used by osc_extent. */
 	cli->cl_chunkbits = max_t(int, PAGE_CACHE_SHIFT, ocd->ocd_blocksize);
-	client_obd_list_unlock(&cli->cl_loi_list_lock);
+	spin_unlock(&cli->cl_loi_list_lock);
 
 	CDEBUG(D_CACHE, "%s, setting cl_avail_grant: %ld cl_lost_grant: %ld chunk bits: %d\n",
 	       cli->cl_import->imp_obd->obd_name,
@@ -1827,7 +1827,7 @@ static int brw_interpret(const struct lu_env *env,
 	osc_release_ppga(aa->aa_ppga, aa->aa_page_count);
 	ptlrpc_lprocfs_brw(req, req->rq_bulk->bd_nob_transferred);
 
-	client_obd_list_lock(&cli->cl_loi_list_lock);
+	spin_lock(&cli->cl_loi_list_lock);
 	/* We need to decrement before osc_ap_completion->osc_wake_cache_waiters
 	 * is called so we know whether to go to sync BRWs or wait for more
 	 * RPCs to complete
@@ -1837,7 +1837,7 @@ static int brw_interpret(const struct lu_env *env,
 	else
 		cli->cl_r_in_flight--;
 	osc_wake_cache_waiters(cli);
-	client_obd_list_unlock(&cli->cl_loi_list_lock);
+	spin_unlock(&cli->cl_loi_list_lock);
 
 	osc_io_unplug(env, cli, NULL);
 	return rc;
@@ -2005,7 +2005,7 @@ int osc_build_rpc(const struct lu_env *env, struct client_obd *cli,
 	if (tmp)
 		tmp->oap_request = ptlrpc_request_addref(req);
 
-	client_obd_list_lock(&cli->cl_loi_list_lock);
+	spin_lock(&cli->cl_loi_list_lock);
 	starting_offset >>= PAGE_CACHE_SHIFT;
 	if (cmd == OBD_BRW_READ) {
 		cli->cl_r_in_flight++;
@@ -2020,7 +2020,7 @@ int osc_build_rpc(const struct lu_env *env, struct client_obd *cli,
 		lprocfs_oh_tally_log2(&cli->cl_write_offset_hist,
 				      starting_offset + 1);
 	}
-	client_obd_list_unlock(&cli->cl_loi_list_lock);
+	spin_unlock(&cli->cl_loi_list_lock);
 
 	DEBUG_REQ(D_INODE, req, "%d pages, aa %p. now %dr/%dw in flight",
 		  page_count, aa, cli->cl_r_in_flight,
@@ -3005,12 +3005,12 @@ static int osc_reconnect(const struct lu_env *env,
 	if (data && (data->ocd_connect_flags & OBD_CONNECT_GRANT)) {
 		long lost_grant;
 
-		client_obd_list_lock(&cli->cl_loi_list_lock);
+		spin_lock(&cli->cl_loi_list_lock);
 		data->ocd_grant = (cli->cl_avail_grant + cli->cl_dirty) ?:
 				2 * cli_brw_size(obd);
 		lost_grant = cli->cl_lost_grant;
 		cli->cl_lost_grant = 0;
-		client_obd_list_unlock(&cli->cl_loi_list_lock);
+		spin_unlock(&cli->cl_loi_list_lock);
 
 		CDEBUG(D_RPCTRACE, "ocd_connect_flags: %#llx ocd_version: %d ocd_grant: %d, lost: %ld.\n",
 		       data->ocd_connect_flags,
@@ -3060,10 +3060,10 @@ static int osc_import_event(struct obd_device *obd,
 	switch (event) {
 	case IMP_EVENT_DISCON: {
 		cli = &obd->u.cli;
-		client_obd_list_lock(&cli->cl_loi_list_lock);
+		spin_lock(&cli->cl_loi_list_lock);
 		cli->cl_avail_grant = 0;
 		cli->cl_lost_grant = 0;
-		client_obd_list_unlock(&cli->cl_loi_list_lock);
+		spin_unlock(&cli->cl_loi_list_lock);
 		break;
 	}
 	case IMP_EVENT_INACTIVE: {
-- 
2.1.0

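The conversion in the hunks above is mechanical; a side-by-side sketch of
the pattern (names taken from the diff, not a literal hunk):

	/* before: obd wrapper type and helpers */
	struct client_obd_lock	cl_loi_list_lock;

	client_obd_list_lock_init(&cli->cl_loi_list_lock);
	client_obd_list_lock(&cli->cl_loi_list_lock);
	client_obd_list_unlock(&cli->cl_loi_list_lock);

	/* after: a plain kernel spinlock */
	spinlock_t		cl_loi_list_lock;

	spin_lock_init(&cli->cl_loi_list_lock);
	spin_lock(&cli->cl_loi_list_lock);
	spin_unlock(&cli->cl_loi_list_lock);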

* [PATCH v2 16/46] staging/lustre/llite: remove some cl wrappers
  2016-03-30 23:48 [PATCH v2 00/46] Lustre IO stack simplifications and cleanups green
                   ` (14 preceding siblings ...)
  2016-03-30 23:48 ` [PATCH v2 15/46] staging/lustre/obd: remove struct client_obd_lock green
@ 2016-03-30 23:48 ` green
  2016-03-30 23:48 ` [PATCH v2 17/46] staging/lustre: Remove struct ll_iattr green
                   ` (29 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: green @ 2016-03-30 23:48 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger
  Cc: Linux Kernel Mailing List, Lustre Development List,
	John L. Hammond, Oleg Drokin

From: "John L. Hammond" <john.hammond@intel.com>

In llite remove the wrapper functions and macros:
  cl_i2info()
  cl_i2sbi()
  cl_iattr2fd()
  cl_inode_info
  cl_inode_mode()
  cl_inode_{a,m,c}time()
  cl_isize_{read,write,write_nolock}()

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Reviewed-on: http://review.whamcloud.com/12850
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2675
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
---
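A call-site sketch of the conversion (the snippet is illustrative, not a
hunk from this patch; each wrapper was a one-line alias for the
corresponding llite helper or VFS accessor):

	struct ll_inode_info *lli;
	loff_t size;

	/* before */
	lli = cl_i2info(inode);
	if (S_ISREG(cl_inode_mode(inode)))
		size = cl_isize_read(inode);

	/* after */
	lli = ll_i2info(inode);
	if (S_ISREG(inode->i_mode))
		size = i_size_read(inode);
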
 drivers/staging/lustre/lustre/llite/dir.c          |  2 +-
 drivers/staging/lustre/lustre/llite/file.c         |  8 ++--
 drivers/staging/lustre/lustre/llite/glimpse.c      | 10 ++---
 drivers/staging/lustre/lustre/llite/lcommon_cl.c   | 45 +++++++++++-----------
 .../staging/lustre/lustre/llite/llite_internal.h   | 32 ---------------
 drivers/staging/lustre/lustre/llite/llite_lib.c    |  2 +-
 drivers/staging/lustre/lustre/llite/vvp_object.c   |  4 +-
 7 files changed, 35 insertions(+), 68 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/dir.c b/drivers/staging/lustre/lustre/llite/dir.c
index 4e0a3e5..2ca4b0e 100644
--- a/drivers/staging/lustre/lustre/llite/dir.c
+++ b/drivers/staging/lustre/lustre/llite/dir.c
@@ -191,7 +191,7 @@ static int ll_dir_filler(void *_hash, struct page *page0)
 		body = req_capsule_server_get(&request->rq_pill, &RMF_MDT_BODY);
 		/* Checked by mdc_readpage() */
 		if (body->valid & OBD_MD_FLSIZE)
-			cl_isize_write(inode, body->size);
+			i_size_write(inode, body->size);
 
 		nrdpgs = (request->rq_bulk->bd_nob_transferred+PAGE_CACHE_SIZE-1)
 			 >> PAGE_CACHE_SHIFT;
diff --git a/drivers/staging/lustre/lustre/llite/file.c b/drivers/staging/lustre/lustre/llite/file.c
index 8fc9da0..c3c258f 100644
--- a/drivers/staging/lustre/lustre/llite/file.c
+++ b/drivers/staging/lustre/lustre/llite/file.c
@@ -1036,7 +1036,7 @@ int ll_merge_attr(const struct lu_env *env, struct inode *inode)
 	CDEBUG(D_VFSTRACE, DFID " updating i_size %llu\n",
 	       PFID(&lli->lli_fid), attr->cat_size);
 
-	cl_isize_write_nolock(inode, attr->cat_size);
+	i_size_write(inode, attr->cat_size);
 
 	inode->i_blocks = attr->cat_blocks;
 
@@ -1592,7 +1592,7 @@ ll_get_grouplock(struct inode *inode, struct file *file, unsigned long arg)
 	LASSERT(!fd->fd_grouplock.cg_lock);
 	spin_unlock(&lli->lli_lock);
 
-	rc = cl_get_grouplock(cl_i2info(inode)->lli_clob,
+	rc = cl_get_grouplock(ll_i2info(inode)->lli_clob,
 			      arg, (file->f_flags & O_NONBLOCK), &grouplock);
 	if (rc)
 		return rc;
@@ -2614,7 +2614,7 @@ int cl_sync_file_range(struct inode *inode, loff_t start, loff_t end,
 		return PTR_ERR(env);
 
 	io = ccc_env_thread_io(env);
-	io->ci_obj = cl_i2info(inode)->lli_clob;
+	io->ci_obj = ll_i2info(inode)->lli_clob;
 	io->ci_ignore_layout = ignore_layout;
 
 	/* initialize parameters for sync */
@@ -3629,7 +3629,7 @@ int ll_layout_restore(struct inode *inode)
 	       sizeof(hur->hur_user_item[0].hui_fid));
 	hur->hur_user_item[0].hui_extent.length = -1;
 	hur->hur_request.hr_itemcount = 1;
-	rc = obd_iocontrol(LL_IOC_HSM_REQUEST, cl_i2sbi(inode)->ll_md_exp,
+	rc = obd_iocontrol(LL_IOC_HSM_REQUEST, ll_i2sbi(inode)->ll_md_exp,
 			   len, hur, NULL);
 	kfree(hur);
 	return rc;
diff --git a/drivers/staging/lustre/lustre/llite/glimpse.c b/drivers/staging/lustre/lustre/llite/glimpse.c
index 88bc7c9..9b0e2ec 100644
--- a/drivers/staging/lustre/lustre/llite/glimpse.c
+++ b/drivers/staging/lustre/lustre/llite/glimpse.c
@@ -87,7 +87,7 @@ int cl_glimpse_lock(const struct lu_env *env, struct cl_io *io,
 		    struct inode *inode, struct cl_object *clob, int agl)
 {
 	struct cl_lock_descr *descr = &ccc_env_info(env)->cti_descr;
-	struct cl_inode_info *lli   = cl_i2info(inode);
+	struct ll_inode_info *lli   = ll_i2info(inode);
 	const struct lu_fid  *fid   = lu_object_fid(&clob->co_lu);
 	struct ccc_io	*cio   = ccc_env_io(env);
 	struct cl_lock       *lock;
@@ -140,7 +140,7 @@ int cl_glimpse_lock(const struct lu_env *env, struct cl_io *io,
 			result = cl_wait(env, lock);
 			if (result == 0) {
 				ll_merge_attr(env, inode);
-				if (cl_isize_read(inode) > 0 &&
+				if (i_size_read(inode) > 0 &&
 				    inode->i_blocks == 0) {
 					/*
 					 * LU-417: Add dirty pages block count
@@ -167,11 +167,11 @@ static int cl_io_get(struct inode *inode, struct lu_env **envout,
 {
 	struct lu_env	  *env;
 	struct cl_io	   *io;
-	struct cl_inode_info   *lli = cl_i2info(inode);
+	struct ll_inode_info	*lli = ll_i2info(inode);
 	struct cl_object       *clob = lli->lli_clob;
 	int result;
 
-	if (S_ISREG(cl_inode_mode(inode))) {
+	if (S_ISREG(inode->i_mode)) {
 		env = cl_env_get(refcheck);
 		if (!IS_ERR(env)) {
 			io = ccc_env_thread_io(env);
@@ -240,7 +240,7 @@ int cl_local_size(struct inode *inode)
 	int		      result;
 	int		      refcheck;
 
-	if (!cl_i2info(inode)->lli_has_smd)
+	if (!ll_i2info(inode)->lli_has_smd)
 		return 0;
 
 	result = cl_io_get(inode, &env, &io, &refcheck);
diff --git a/drivers/staging/lustre/lustre/llite/lcommon_cl.c b/drivers/staging/lustre/lustre/llite/lcommon_cl.c
index 4871d0f..fde96d3 100644
--- a/drivers/staging/lustre/lustre/llite/lcommon_cl.c
+++ b/drivers/staging/lustre/lustre/llite/lcommon_cl.c
@@ -417,9 +417,9 @@ int ccc_object_glimpse(const struct lu_env *env,
 {
 	struct inode *inode = ccc_object_inode(obj);
 
-	lvb->lvb_mtime = cl_inode_mtime(inode);
-	lvb->lvb_atime = cl_inode_atime(inode);
-	lvb->lvb_ctime = cl_inode_ctime(inode);
+	lvb->lvb_mtime = LTIME_S(inode->i_mtime);
+	lvb->lvb_atime = LTIME_S(inode->i_atime);
+	lvb->lvb_ctime = LTIME_S(inode->i_ctime);
 	/*
 	 * LU-417: Add dirty pages block count lest i_blocks reports 0, some
 	 * "cp" or "tar" on remote node may think it's a completely sparse file
@@ -731,7 +731,7 @@ int ccc_prep_size(const struct lu_env *env, struct cl_object *obj,
 				 * linux-2.6.18-128.1.1 miss to do that.
 				 * --bug 17336
 				 */
-				loff_t size = cl_isize_read(inode);
+				loff_t size = i_size_read(inode);
 				loff_t cur_index = start >> PAGE_CACHE_SHIFT;
 				loff_t size_index = (size - 1) >>
 						    PAGE_CACHE_SHIFT;
@@ -752,11 +752,11 @@ int ccc_prep_size(const struct lu_env *env, struct cl_object *obj,
 		 * which will always be >= the kms value here.
 		 * b=11081
 		 */
-		if (cl_isize_read(inode) < kms) {
-			cl_isize_write_nolock(inode, kms);
+		if (i_size_read(inode) < kms) {
+			i_size_write(inode, kms);
 			CDEBUG(D_VFSTRACE, DFID " updating i_size %llu\n",
 			       PFID(lu_object_fid(&obj->co_lu)),
-			       (__u64)cl_isize_read(inode));
+			       (__u64)i_size_read(inode));
 		}
 	}
 	ccc_object_size_unlock(obj);
@@ -816,14 +816,14 @@ void ccc_req_attr_set(const struct lu_env *env,
 	if (slice->crs_req->crq_type == CRT_WRITE) {
 		if (flags & OBD_MD_FLEPOCH) {
 			oa->o_valid |= OBD_MD_FLEPOCH;
-			oa->o_ioepoch = cl_i2info(inode)->lli_ioepoch;
+			oa->o_ioepoch = ll_i2info(inode)->lli_ioepoch;
 			valid_flags |= OBD_MD_FLMTIME | OBD_MD_FLCTIME |
 				       OBD_MD_FLUID | OBD_MD_FLGID;
 		}
 	}
 	obdo_from_inode(oa, inode, valid_flags & flags);
-	obdo_set_parent_fid(oa, &cl_i2info(inode)->lli_fid);
-	memcpy(attr->cra_jobid, cl_i2info(inode)->lli_jobid,
+	obdo_set_parent_fid(oa, &ll_i2info(inode)->lli_fid);
+	memcpy(attr->cra_jobid, ll_i2info(inode)->lli_jobid,
 	       JOBSTATS_JOBID_SIZE);
 }
 
@@ -844,7 +844,7 @@ int cl_setattr_ost(struct inode *inode, const struct iattr *attr)
 		return PTR_ERR(env);
 
 	io = ccc_env_thread_io(env);
-	io->ci_obj = cl_i2info(inode)->lli_clob;
+	io->ci_obj = ll_i2info(inode)->lli_clob;
 
 	io->u.ci_setattr.sa_attr.lvb_atime = LTIME_S(attr->ia_atime);
 	io->u.ci_setattr.sa_attr.lvb_mtime = LTIME_S(attr->ia_mtime);
@@ -860,7 +860,7 @@ again:
 			/* populate the file descriptor for ftruncate to honor
 			 * group lock - see LU-787
 			 */
-			cio->cui_fd = cl_iattr2fd(inode, attr);
+			cio->cui_fd = LUSTRE_FPRIVATE(attr->ia_file);
 
 		result = cl_io_loop(env, io);
 	} else {
@@ -949,11 +949,10 @@ struct page *cl2vm_page(const struct cl_page_slice *slice)
 int ccc_object_invariant(const struct cl_object *obj)
 {
 	struct inode	 *inode = ccc_object_inode(obj);
-	struct cl_inode_info *lli   = cl_i2info(inode);
+	struct ll_inode_info	*lli	= ll_i2info(inode);
 
-	return (S_ISREG(cl_inode_mode(inode)) ||
-		/* i_mode of unlinked inode is zeroed. */
-		cl_inode_mode(inode) == 0) && lli->lli_clob == obj;
+	return (S_ISREG(inode->i_mode) || inode->i_mode == 0) &&
+	       lli->lli_clob == obj;
 }
 
 struct inode *ccc_object_inode(const struct cl_object *obj)
@@ -973,7 +972,7 @@ struct inode *ccc_object_inode(const struct cl_object *obj)
 int cl_file_inode_init(struct inode *inode, struct lustre_md *md)
 {
 	struct lu_env	*env;
-	struct cl_inode_info *lli;
+	struct ll_inode_info *lli;
 	struct cl_object     *clob;
 	struct lu_site       *site;
 	struct lu_fid	*fid;
@@ -987,14 +986,14 @@ int cl_file_inode_init(struct inode *inode, struct lustre_md *md)
 	int refcheck;
 
 	LASSERT(md->body->valid & OBD_MD_FLID);
-	LASSERT(S_ISREG(cl_inode_mode(inode)));
+	LASSERT(S_ISREG(inode->i_mode));
 
 	env = cl_env_get(&refcheck);
 	if (IS_ERR(env))
 		return PTR_ERR(env);
 
-	site = cl_i2sbi(inode)->ll_site;
-	lli  = cl_i2info(inode);
+	site = ll_i2sbi(inode)->ll_site;
+	lli  = ll_i2info(inode);
 	fid  = &lli->lli_fid;
 	LASSERT(fid_is_sane(fid));
 
@@ -1071,7 +1070,7 @@ static void cl_object_put_last(struct lu_env *env, struct cl_object *obj)
 void cl_inode_fini(struct inode *inode)
 {
 	struct lu_env	   *env;
-	struct cl_inode_info    *lli  = cl_i2info(inode);
+	struct ll_inode_info    *lli  = ll_i2info(inode);
 	struct cl_object	*clob = lli->lli_clob;
 	int refcheck;
 	int emergency;
@@ -1168,10 +1167,10 @@ __u32 cl_fid_build_gen(const struct lu_fid *fid)
  */
 struct lov_stripe_md *ccc_inode_lsm_get(struct inode *inode)
 {
-	return lov_lsm_get(cl_i2info(inode)->lli_clob);
+	return lov_lsm_get(ll_i2info(inode)->lli_clob);
 }
 
 inline void ccc_inode_lsm_put(struct inode *inode, struct lov_stripe_md *lsm)
 {
-	lov_lsm_put(cl_i2info(inode)->lli_clob, lsm);
+	lov_lsm_put(ll_i2info(inode)->lli_clob, lsm);
 }
diff --git a/drivers/staging/lustre/lustre/llite/llite_internal.h b/drivers/staging/lustre/lustre/llite/llite_internal.h
index c1d747a..ffed507 100644
--- a/drivers/staging/lustre/lustre/llite/llite_internal.h
+++ b/drivers/staging/lustre/lustre/llite/llite_internal.h
@@ -1293,38 +1293,6 @@ typedef enum llioc_iter (*llioc_callback_t)(struct inode *inode,
 void *ll_iocontrol_register(llioc_callback_t cb, int count, unsigned int *cmd);
 void ll_iocontrol_unregister(void *magic);
 
-/* lclient compat stuff */
-#define cl_inode_info ll_inode_info
-#define cl_i2info(info) ll_i2info(info)
-#define cl_inode_mode(inode) ((inode)->i_mode)
-#define cl_i2sbi ll_i2sbi
-
-static inline struct ll_file_data *cl_iattr2fd(struct inode *inode,
-					       const struct iattr *attr)
-{
-	LASSERT(attr->ia_valid & ATTR_FILE);
-	return LUSTRE_FPRIVATE(attr->ia_file);
-}
-
-static inline void cl_isize_write_nolock(struct inode *inode, loff_t kms)
-{
-	LASSERT(mutex_is_locked(&ll_i2info(inode)->lli_size_mutex));
-	i_size_write(inode, kms);
-}
-
-static inline void cl_isize_write(struct inode *inode, loff_t kms)
-{
-	ll_inode_size_lock(inode);
-	i_size_write(inode, kms);
-	ll_inode_size_unlock(inode);
-}
-
-#define cl_isize_read(inode)	     i_size_read(inode)
-
-#define cl_inode_atime(inode) LTIME_S((inode)->i_atime)
-#define cl_inode_ctime(inode) LTIME_S((inode)->i_ctime)
-#define cl_inode_mtime(inode) LTIME_S((inode)->i_mtime)
-
 int cl_sync_file_range(struct inode *inode, loff_t start, loff_t end,
 		       enum cl_fsync_mode mode, int ignore_layout);
 
diff --git a/drivers/staging/lustre/lustre/llite/llite_lib.c b/drivers/staging/lustre/lustre/llite/llite_lib.c
index 0b4e0db..041d221 100644
--- a/drivers/staging/lustre/lustre/llite/llite_lib.c
+++ b/drivers/staging/lustre/lustre/llite/llite_lib.c
@@ -1698,7 +1698,7 @@ void ll_read_inode2(struct inode *inode, void *opaque)
 
 void ll_delete_inode(struct inode *inode)
 {
-	struct cl_inode_info *lli = cl_i2info(inode);
+	struct ll_inode_info *lli = ll_i2info(inode);
 
 	if (S_ISREG(inode->i_mode) && lli->lli_clob)
 		/* discard all dirty pages before truncating them, required by
diff --git a/drivers/staging/lustre/lustre/llite/vvp_object.c b/drivers/staging/lustre/lustre/llite/vvp_object.c
index b9a1d01..d03eb2b 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_object.c
+++ b/drivers/staging/lustre/lustre/llite/vvp_object.c
@@ -112,7 +112,7 @@ static int vvp_attr_set(const struct lu_env *env, struct cl_object *obj,
 	if (valid & CAT_CTIME)
 		inode->i_ctime.tv_sec = attr->cat_ctime;
 	if (0 && valid & CAT_SIZE)
-		cl_isize_write_nolock(inode, attr->cat_size);
+		i_size_write(inode, attr->cat_size);
 	/* not currently necessary */
 	if (0 && valid & (CAT_UID|CAT_GID|CAT_SIZE))
 		mark_inode_dirty(inode);
@@ -196,7 +196,7 @@ static const struct lu_object_operations vvp_lu_obj_ops = {
 
 struct ccc_object *cl_inode2ccc(struct inode *inode)
 {
-	struct cl_inode_info *lli = cl_i2info(inode);
+	struct ll_inode_info *lli = ll_i2info(inode);
 	struct cl_object     *obj = lli->lli_clob;
 	struct lu_object     *lu;
 
-- 
2.1.0


* [PATCH v2 17/46] staging/lustre: Remove struct ll_iattr
  2016-03-30 23:48 [PATCH v2 00/46] Lustre IO stack simplifications and cleanups green
                   ` (15 preceding siblings ...)
  2016-03-30 23:48 ` [PATCH v2 16/46] staging/lustre/llite: remove some cl wrappers green
@ 2016-03-30 23:48 ` green
  2016-03-30 23:48 ` [PATCH v2 18/46] staging/lustre/clio: generalize cl_sync_io green
                   ` (28 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: green @ 2016-03-30 23:48 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger
  Cc: Linux Kernel Mailing List, Lustre Development List, Oleg Drokin

From: Oleg Drokin <green@linuxhacker.ru>

struct ll_iattr was compat code from the time struct iattr still
carried an ia_attr_flags member. Convert all the cryptic callers
that did ((struct ll_iattr *)&op_data->op_attr)->ia_attr_flags
into direct accesses to op_data->op_attr_flags.

This also makes lustre/include/linux/obd.h unnecessary, so remove
it.

Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
---
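For reference, a sketch of the pattern being removed (the struct is
quoted from the deleted header; the call site is illustrative):

	struct ll_iattr {
		struct iattr	iattr;
		unsigned int	ia_attr_flags;
	};

	/* before: overlay ll_iattr on md_op_data::op_attr to reach
	 * the flags word stored behind it
	 */
	((struct ll_iattr *)&op_data->op_attr)->ia_attr_flags = flags;

	/* after: access the real member directly */
	op_data->op_attr_flags = flags;
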
 drivers/staging/lustre/lustre/include/linux/obd.h | 58 -----------------------
 drivers/staging/lustre/lustre/include/obd.h       |  2 +-
 drivers/staging/lustre/lustre/llite/file.c        |  4 +-
 drivers/staging/lustre/lustre/llite/llite_lib.c   |  2 +-
 drivers/staging/lustre/lustre/mdc/mdc_lib.c       |  3 +-
 drivers/staging/lustre/lustre/obdclass/obdo.c     |  3 +-
 6 files changed, 6 insertions(+), 66 deletions(-)
 delete mode 100644 drivers/staging/lustre/lustre/include/linux/obd.h

diff --git a/drivers/staging/lustre/lustre/include/linux/obd.h b/drivers/staging/lustre/lustre/include/linux/obd.h
deleted file mode 100644
index e1063e8..0000000
--- a/drivers/staging/lustre/lustre/include/linux/obd.h
+++ /dev/null
@@ -1,58 +0,0 @@
-/*
- * GPL HEADER START
- *
- * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 only,
- * as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
- * General Public License version 2 for more details (a copy is included
- * in the LICENSE file that accompanied this code).
- *
- * You should have received a copy of the GNU General Public License
- * version 2 along with this program; If not, see
- * http://www.sun.com/software/products/lustre/docs/GPLv2.pdf
- *
- * Please contact Sun Microsystems, Inc., 4150 Network Circle, Santa Clara,
- * CA 95054 USA or visit www.sun.com if you need additional information or
- * have any questions.
- *
- * GPL HEADER END
- */
-/*
- * Copyright (c) 1999, 2010, Oracle and/or its affiliates. All rights reserved.
- * Use is subject to license terms.
- *
- * Copyright (c) 2011, 2012, Intel Corporation.
- */
-/*
- * This file is part of Lustre, http://www.lustre.org/
- * Lustre is a trademark of Sun Microsystems, Inc.
- */
-
-#ifndef __LINUX_OBD_H
-#define __LINUX_OBD_H
-
-#ifndef __OBD_H
-#error Do not #include this file directly. #include <obd.h> instead
-#endif
-
-#include "../obd_support.h"
-
-#include <linux/fs.h>
-#include <linux/list.h>
-#include <linux/sched.h>  /* for struct task_struct, for current.h */
-#include <linux/mount.h>
-
-#include "../lustre_intent.h"
-
-struct ll_iattr {
-	struct iattr	iattr;
-	unsigned int	ia_attr_flags;
-};
-
-#endif /* __LINUX_OBD_H */
diff --git a/drivers/staging/lustre/lustre/include/obd.h b/drivers/staging/lustre/lustre/include/obd.h
index 84cc001..ded7b10 100644
--- a/drivers/staging/lustre/lustre/include/obd.h
+++ b/drivers/staging/lustre/lustre/include/obd.h
@@ -38,7 +38,6 @@
 #define __OBD_H
 
 #include <linux/spinlock.h>
-#include "linux/obd.h"
 
 #define IOC_OSC_TYPE	 'h'
 #define IOC_OSC_MIN_NR       20
@@ -55,6 +54,7 @@
 #include "lustre_export.h"
 #include "lustre_fid.h"
 #include "lustre_fld.h"
+#include "lustre_intent.h"
 
 #define MAX_OBD_DEVICES 8192
 
diff --git a/drivers/staging/lustre/lustre/llite/file.c b/drivers/staging/lustre/lustre/llite/file.c
index c3c258f..bf2a5ee 100644
--- a/drivers/staging/lustre/lustre/llite/file.c
+++ b/drivers/staging/lustre/lustre/llite/file.c
@@ -45,6 +45,7 @@
 #include "../include/lustre_lite.h"
 #include <linux/pagemap.h>
 #include <linux/file.h>
+#include <linux/mount.h>
 #include "llite_internal.h"
 #include "../include/lustre/ll_fiemap.h"
 
@@ -87,8 +88,7 @@ void ll_pack_inode2opdata(struct inode *inode, struct md_op_data *op_data,
 	op_data->op_attr.ia_ctime = inode->i_ctime;
 	op_data->op_attr.ia_size = i_size_read(inode);
 	op_data->op_attr_blocks = inode->i_blocks;
-	((struct ll_iattr *)&op_data->op_attr)->ia_attr_flags =
-					ll_inode_to_ext_flags(inode->i_flags);
+	op_data->op_attr_flags = ll_inode_to_ext_flags(inode->i_flags);
 	op_data->op_ioepoch = ll_i2info(inode)->lli_ioepoch;
 	if (fh)
 		op_data->op_handle = *fh;
diff --git a/drivers/staging/lustre/lustre/llite/llite_lib.c b/drivers/staging/lustre/lustre/llite/llite_lib.c
index 041d221..0f01cfc 100644
--- a/drivers/staging/lustre/lustre/llite/llite_lib.c
+++ b/drivers/staging/lustre/lustre/llite/llite_lib.c
@@ -1771,7 +1771,7 @@ int ll_iocontrol(struct inode *inode, struct file *file,
 		if (IS_ERR(op_data))
 			return PTR_ERR(op_data);
 
-		((struct ll_iattr *)&op_data->op_attr)->ia_attr_flags = flags;
+		op_data->op_attr_flags = flags;
 		op_data->op_attr.ia_valid |= ATTR_ATTR_FLAG;
 		rc = md_setattr(sbi->ll_md_exp, op_data,
 				NULL, 0, NULL, 0, &req, NULL);
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_lib.c b/drivers/staging/lustre/lustre/mdc/mdc_lib.c
index bd29eb7..be0acf7 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_lib.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_lib.c
@@ -279,8 +279,7 @@ static void mdc_setattr_pack_rec(struct mdt_rec_setattr *rec,
 	rec->sa_atime  = LTIME_S(op_data->op_attr.ia_atime);
 	rec->sa_mtime  = LTIME_S(op_data->op_attr.ia_mtime);
 	rec->sa_ctime  = LTIME_S(op_data->op_attr.ia_ctime);
-	rec->sa_attr_flags =
-			((struct ll_iattr *)&op_data->op_attr)->ia_attr_flags;
+	rec->sa_attr_flags = op_data->op_attr_flags;
 	if ((op_data->op_attr.ia_valid & ATTR_GID) &&
 	    in_group_p(op_data->op_attr.ia_gid))
 		rec->sa_suppgid =
diff --git a/drivers/staging/lustre/lustre/obdclass/obdo.c b/drivers/staging/lustre/lustre/obdclass/obdo.c
index e6436cb..748e33f 100644
--- a/drivers/staging/lustre/lustre/obdclass/obdo.c
+++ b/drivers/staging/lustre/lustre/obdclass/obdo.c
@@ -185,8 +185,7 @@ void md_from_obdo(struct md_op_data *op_data, struct obdo *oa, u32 valid)
 		op_data->op_attr.ia_valid |= ATTR_BLOCKS;
 	}
 	if (valid & OBD_MD_FLFLAGS) {
-		((struct ll_iattr *)&op_data->op_attr)->ia_attr_flags =
-			oa->o_flags;
+		op_data->op_attr_flags = oa->o_flags;
 		op_data->op_attr.ia_valid |= ATTR_ATTR_FLAG;
 	}
 }
-- 
2.1.0


* [PATCH v2 18/46] staging/lustre/clio: generalize cl_sync_io
  2016-03-30 23:48 [PATCH v2 00/46] Lustre IO stack simplifications and cleanups green
                   ` (16 preceding siblings ...)
  2016-03-30 23:48 ` [PATCH v2 17/46] staging/lustre: Remove struct ll_iattr green
@ 2016-03-30 23:48 ` green
  2016-03-30 23:48 ` [PATCH v2 19/46] staging/lustre/clio: cl_lock simplification green
                   ` (27 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: green @ 2016-03-30 23:48 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger
  Cc: Linux Kernel Mailing List, Lustre Development List,
	Jinshan Xiong, Oleg Drokin

From: Jinshan Xiong <jinshan.xiong@intel.com>

Turn the cl_sync_io interfaces into a generic synchronization
mechanism, rather than one that can only wait for page transfers:
the caller now supplies the number of outstanding work units and
an end-io callback at init time.

Also remove cl_io_cancel(), which is no longer used.

Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-on: http://review.whamcloud.com/8656
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4198
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
---
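A minimal usage sketch of the generalized interface (nr_units and
my_end_io() are made up for illustration; the signatures are the ones
introduced below):

	static void my_end_io(const struct lu_env *env,
			      struct cl_sync_io *anchor)
	{
		/* runs once csi_sync_nr drops to zero; a plain
		 * waiter can just pass cl_sync_io_end instead
		 */
		cl_sync_io_end(env, anchor);
	}

	struct cl_sync_io anchor;
	int rc;

	cl_sync_io_init(&anchor, nr_units, &my_end_io);
	/* submit nr_units units of work; every completion path
	 * must call cl_sync_io_note(env, &anchor, ioret)
	 */
	rc = cl_sync_io_wait(env, &anchor, timeout);
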
 drivers/staging/lustre/lustre/include/cl_object.h | 13 ++--
 drivers/staging/lustre/lustre/obdclass/cl_io.c    | 74 +++++++++++------------
 drivers/staging/lustre/lustre/obdclass/cl_page.c  |  2 +-
 3 files changed, 44 insertions(+), 45 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/cl_object.h b/drivers/staging/lustre/lustre/include/cl_object.h
index 69b40f5..91261b1 100644
--- a/drivers/staging/lustre/lustre/include/cl_object.h
+++ b/drivers/staging/lustre/lustre/include/cl_object.h
@@ -3125,13 +3125,18 @@ struct cl_sync_io {
 	atomic_t		csi_barrier;
 	/** completion to be signaled when transfer is complete. */
 	wait_queue_head_t		csi_waitq;
+	/** callback to invoke when this IO is finished */
+	void			(*csi_end_io)(const struct lu_env *,
+					      struct cl_sync_io *);
 };
 
-void cl_sync_io_init(struct cl_sync_io *anchor, int nrpages);
-int  cl_sync_io_wait(const struct lu_env *env, struct cl_io *io,
-		     struct cl_page_list *queue, struct cl_sync_io *anchor,
+void cl_sync_io_init(struct cl_sync_io *anchor, int nr,
+		     void (*end)(const struct lu_env *, struct cl_sync_io *));
+int  cl_sync_io_wait(const struct lu_env *env, struct cl_sync_io *anchor,
 		     long timeout);
-void cl_sync_io_note(struct cl_sync_io *anchor, int ioret);
+void cl_sync_io_note(const struct lu_env *env, struct cl_sync_io *anchor,
+		     int ioret);
+void cl_sync_io_end(const struct lu_env *env, struct cl_sync_io *anchor);
 
 /** @} cl_sync_io */
 
diff --git a/drivers/staging/lustre/lustre/obdclass/cl_io.c b/drivers/staging/lustre/lustre/obdclass/cl_io.c
index 65d6cee..6a8dd9f 100644
--- a/drivers/staging/lustre/lustre/obdclass/cl_io.c
+++ b/drivers/staging/lustre/lustre/obdclass/cl_io.c
@@ -800,6 +800,9 @@ int cl_io_submit_rw(const struct lu_env *env, struct cl_io *io,
 }
 EXPORT_SYMBOL(cl_io_submit_rw);
 
+static void cl_page_list_assume(const struct lu_env *env,
+				struct cl_io *io, struct cl_page_list *plist);
+
 /**
  * Submit a sync_io and wait for the IO to be finished, or error happens.
  * If \a timeout is zero, it means to wait for the IO unconditionally.
@@ -817,7 +820,7 @@ int cl_io_submit_sync(const struct lu_env *env, struct cl_io *io,
 		pg->cp_sync_io = anchor;
 	}
 
-	cl_sync_io_init(anchor, queue->c2_qin.pl_nr);
+	cl_sync_io_init(anchor, queue->c2_qin.pl_nr, &cl_sync_io_end);
 	rc = cl_io_submit_rw(env, io, iot, queue);
 	if (rc == 0) {
 		/*
@@ -828,12 +831,12 @@ int cl_io_submit_sync(const struct lu_env *env, struct cl_io *io,
 		 */
 		cl_page_list_for_each(pg, &queue->c2_qin) {
 			pg->cp_sync_io = NULL;
-			cl_sync_io_note(anchor, 1);
+			cl_sync_io_note(env, anchor, 1);
 		}
 
 		/* wait for the IO to be finished. */
-		rc = cl_sync_io_wait(env, io, &queue->c2_qout,
-				     anchor, timeout);
+		rc = cl_sync_io_wait(env, anchor, timeout);
+		cl_page_list_assume(env, io, &queue->c2_qout);
 	} else {
 		LASSERT(list_empty(&queue->c2_qout.pl_pages));
 		cl_page_list_for_each(pg, &queue->c2_qin)
@@ -844,25 +847,6 @@ int cl_io_submit_sync(const struct lu_env *env, struct cl_io *io,
 EXPORT_SYMBOL(cl_io_submit_sync);
 
 /**
- * Cancel an IO which has been submitted by cl_io_submit_rw.
- */
-static int cl_io_cancel(const struct lu_env *env, struct cl_io *io,
-			struct cl_page_list *queue)
-{
-	struct cl_page *page;
-	int result = 0;
-
-	CERROR("Canceling ongoing page transmission\n");
-	cl_page_list_for_each(page, queue) {
-		int rc;
-
-		rc = cl_page_cancel(env, page);
-		result = result ?: rc;
-	}
-	return result;
-}
-
-/**
  * Main io loop.
  *
  * Pumps io through iterations calling
@@ -1433,25 +1417,38 @@ void cl_req_attr_set(const struct lu_env *env, struct cl_req *req,
 }
 EXPORT_SYMBOL(cl_req_attr_set);
 
+/* cl_sync_io_callback assumes the caller must call cl_sync_io_wait() to
+ * wait for the IO to finish.
+ */
+void cl_sync_io_end(const struct lu_env *env, struct cl_sync_io *anchor)
+{
+	wake_up_all(&anchor->csi_waitq);
+
+	/* it's safe to nuke or reuse anchor now */
+	atomic_set(&anchor->csi_barrier, 0);
+}
+EXPORT_SYMBOL(cl_sync_io_end);
 
 /**
- * Initialize synchronous io wait anchor, for transfer of \a nrpages pages.
+ * Initialize synchronous io wait anchor
  */
-void cl_sync_io_init(struct cl_sync_io *anchor, int nrpages)
+void cl_sync_io_init(struct cl_sync_io *anchor, int nr,
+		     void (*end)(const struct lu_env *, struct cl_sync_io *))
 {
 	init_waitqueue_head(&anchor->csi_waitq);
-	atomic_set(&anchor->csi_sync_nr, nrpages);
-	atomic_set(&anchor->csi_barrier, nrpages > 0);
+	atomic_set(&anchor->csi_sync_nr, nr);
+	atomic_set(&anchor->csi_barrier, nr > 0);
 	anchor->csi_sync_rc = 0;
+	anchor->csi_end_io = end;
+	LASSERT(end);
 }
 EXPORT_SYMBOL(cl_sync_io_init);
 
 /**
- * Wait until all transfer completes. Transfer completion routine has to call
- * cl_sync_io_note() for every page.
+ * Wait until all IO completes. Transfer completion routine has to call
+ * cl_sync_io_note() for every entity.
  */
-int cl_sync_io_wait(const struct lu_env *env, struct cl_io *io,
-		    struct cl_page_list *queue, struct cl_sync_io *anchor,
+int cl_sync_io_wait(const struct lu_env *env, struct cl_sync_io *anchor,
 		    long timeout)
 {
 	struct l_wait_info lwi = LWI_TIMEOUT_INTR(cfs_time_seconds(timeout),
@@ -1464,11 +1461,9 @@ int cl_sync_io_wait(const struct lu_env *env, struct cl_io *io,
 			  atomic_read(&anchor->csi_sync_nr) == 0,
 			  &lwi);
 	if (rc < 0) {
-		CERROR("SYNC IO failed with error: %d, try to cancel %d remaining pages\n",
+		CERROR("IO failed: %d, still wait for %d remaining entries\n",
 		       rc, atomic_read(&anchor->csi_sync_nr));
 
-		(void)cl_io_cancel(env, io, queue);
-
 		lwi = (struct l_wait_info) { 0 };
 		(void)l_wait_event(anchor->csi_waitq,
 				   atomic_read(&anchor->csi_sync_nr) == 0,
@@ -1477,14 +1472,12 @@ int cl_sync_io_wait(const struct lu_env *env, struct cl_io *io,
 		rc = anchor->csi_sync_rc;
 	}
 	LASSERT(atomic_read(&anchor->csi_sync_nr) == 0);
-	cl_page_list_assume(env, io, queue);
 
 	/* wait until cl_sync_io_note() has done wakeup */
 	while (unlikely(atomic_read(&anchor->csi_barrier) != 0)) {
 		cpu_relax();
 	}
 
-	POISON(anchor, 0x5a, sizeof(*anchor));
 	return rc;
 }
 EXPORT_SYMBOL(cl_sync_io_wait);
@@ -1492,7 +1485,8 @@ EXPORT_SYMBOL(cl_sync_io_wait);
 /**
  * Indicate that transfer of a single page completed.
  */
-void cl_sync_io_note(struct cl_sync_io *anchor, int ioret)
+void cl_sync_io_note(const struct lu_env *env, struct cl_sync_io *anchor,
+		     int ioret)
 {
 	if (anchor->csi_sync_rc == 0 && ioret < 0)
 		anchor->csi_sync_rc = ioret;
@@ -1503,9 +1497,9 @@ void cl_sync_io_note(struct cl_sync_io *anchor, int ioret)
 	 */
 	LASSERT(atomic_read(&anchor->csi_sync_nr) > 0);
 	if (atomic_dec_and_test(&anchor->csi_sync_nr)) {
-		wake_up_all(&anchor->csi_waitq);
-		/* it's safe to nuke or reuse anchor now */
-		atomic_set(&anchor->csi_barrier, 0);
+		LASSERT(anchor->csi_end_io);
+		anchor->csi_end_io(env, anchor);
+		/* Can't access anchor any more */
 	}
 }
 EXPORT_SYMBOL(cl_sync_io_note);
diff --git a/drivers/staging/lustre/lustre/obdclass/cl_page.c b/drivers/staging/lustre/lustre/obdclass/cl_page.c
index 506a9f9..ad7f0ae 100644
--- a/drivers/staging/lustre/lustre/obdclass/cl_page.c
+++ b/drivers/staging/lustre/lustre/obdclass/cl_page.c
@@ -887,7 +887,7 @@ void cl_page_completion(const struct lu_env *env,
 	cl_page_put(env, pg);
 
 	if (anchor)
-		cl_sync_io_note(anchor, ioret);
+		cl_sync_io_note(env, anchor, ioret);
 }
 EXPORT_SYMBOL(cl_page_completion);
 
-- 
2.1.0


* [PATCH v2 19/46] staging/lustre/clio: cl_lock simplification
  2016-03-30 23:48 [PATCH v2 00/46] Lustre IO stack simplifications and cleanups green
                   ` (17 preceding siblings ...)
  2016-03-30 23:48 ` [PATCH v2 18/46] staging/lustre/clio: generalize cl_sync_io green
@ 2016-03-30 23:48 ` green
  2016-03-30 23:48 ` [PATCH v2 20/46] staging/lustre: update comments after " green
                   ` (26 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: green @ 2016-03-30 23:48 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger
  Cc: Linux Kernel Mailing List, Lustre Development List,
	Jinshan Xiong, Bobi Jam, Oleg Drokin

From: Jinshan Xiong <jinshan.xiong@intel.com>

In this patch, the cl_lock cache is eliminated. cl_lock is turned
into a cacheless data container describing the locks an I/O needs
in order to complete: a cl_lock is created before the I/O starts
and destroyed when the I/O is complete.

cl_lock depends on an LDLM lock to fulfill its lock semantics; the
LDLM lock is attached to the cl_lock at the OSC layer and remains
cacheable.

Two major methods are supported for cl_lock: clo_enqueue and
clo_cancel. A cl_lock is enqueued by cl_lock_request(), which
calls the clo_enqueue() method of each layer to enqueue the lock.
At the LOV layer, if a cl_lock consists of multiple sub-locks,
each sub-lock is enqueued correspondingly. At the OSC layer, the
enqueue request first tries to reuse a cached LDLM lock; otherwise
a new LDLM lock has to be requested from the OST side.

cl_lock_cancel() must be called to release a cl_lock after use.
The clo_cancel() method is called for each layer to release the
resources held by the lock. At the OSC layer, the reference on the
LDLM lock taken at clo_enqueue time is dropped.

An LDLM lock can be canceled only when no cl_lock is using it.

Signed-off-by: Bobi Jam <bobijam.xu@intel.com>
Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-on: http://review.whamcloud.com/10858
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3259
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
---
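A lifecycle sketch under the new model (error handling elided; the
descriptor setup and the embedding of the lock in per-IO state are
illustrative):

	struct cl_lock lock;	/* cacheless, lives for one IO */
	int rc;

	lock.cll_descr = *descr;	/* object, extent and mode */
	rc = cl_lock_request(env, io, &lock);	/* clo_enqueue per layer */
	if (rc == 0) {
		/* ... the IO runs under the lock ... */
		cl_lock_cancel(env, &lock);	/* clo_cancel per layer */
	}
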
 drivers/staging/lustre/lustre/include/cl_object.h  |  590 +-----
 drivers/staging/lustre/lustre/include/lclient.h    |   26 +-
 .../lustre/lustre/include/lustre/lustre_idl.h      |    2 +
 drivers/staging/lustre/lustre/include/lustre_dlm.h |    1 +
 drivers/staging/lustre/lustre/ldlm/ldlm_lib.c      |    1 +
 drivers/staging/lustre/lustre/ldlm/ldlm_lock.c     |    3 +-
 drivers/staging/lustre/lustre/ldlm/ldlm_request.c  |   16 +-
 drivers/staging/lustre/lustre/ldlm/ldlm_resource.c |    1 +
 drivers/staging/lustre/lustre/llite/glimpse.c      |   49 +-
 drivers/staging/lustre/lustre/llite/lcommon_cl.c   |  107 +-
 drivers/staging/lustre/lustre/llite/lcommon_misc.c |   14 +-
 drivers/staging/lustre/lustre/llite/rw26.c         |    3 +-
 drivers/staging/lustre/lustre/llite/vvp_io.c       |   19 +-
 drivers/staging/lustre/lustre/llite/vvp_lock.c     |   25 +-
 drivers/staging/lustre/lustre/llite/vvp_object.c   |   12 +-
 .../staging/lustre/lustre/lov/lov_cl_internal.h    |   75 +-
 drivers/staging/lustre/lustre/lov/lov_dev.c        |    5 +-
 drivers/staging/lustre/lustre/lov/lov_lock.c       |  996 +---------
 drivers/staging/lustre/lustre/lov/lov_object.c     |   13 +-
 drivers/staging/lustre/lustre/lov/lovsub_lock.c    |  383 ----
 drivers/staging/lustre/lustre/obdclass/cl_io.c     |  168 +-
 drivers/staging/lustre/lustre/obdclass/cl_lock.c   | 2026 ++------------------
 drivers/staging/lustre/lustre/obdclass/cl_object.c |   64 +-
 drivers/staging/lustre/lustre/obdclass/cl_page.c   |    9 -
 .../staging/lustre/lustre/obdecho/echo_client.c    |   60 +-
 drivers/staging/lustre/lustre/osc/osc_cache.c      |  110 +-
 .../staging/lustre/lustre/osc/osc_cl_internal.h    |   62 +-
 drivers/staging/lustre/lustre/osc/osc_internal.h   |   10 +-
 drivers/staging/lustre/lustre/osc/osc_io.c         |   41 +-
 drivers/staging/lustre/lustre/osc/osc_lock.c       | 1618 ++++++----------
 drivers/staging/lustre/lustre/osc/osc_object.c     |   33 +-
 drivers/staging/lustre/lustre/osc/osc_page.c       |   14 +-
 drivers/staging/lustre/lustre/osc/osc_request.c    |  207 +-
 33 files changed, 1203 insertions(+), 5560 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/cl_object.h b/drivers/staging/lustre/lustre/include/cl_object.h
index 91261b1..8f9512e 100644
--- a/drivers/staging/lustre/lustre/include/cl_object.h
+++ b/drivers/staging/lustre/lustre/include/cl_object.h
@@ -82,7 +82,6 @@
  *  - i_mutex
  *      - PG_locked
  *	  - cl_object_header::coh_page_guard
- *	  - cl_object_header::coh_lock_guard
  *	  - lu_site::ls_guard
  *
  * See the top comment in cl_object.c for the description of overall locking and
@@ -404,16 +403,6 @@ struct cl_object_header {
 	 * here.
 	 */
 	struct lu_object_header  coh_lu;
-	/** \name locks
-	 * \todo XXX move locks below to the separate cache-lines, they are
-	 * mostly useless otherwise.
-	 */
-	/** @{ */
-	/** Lock protecting lock list. */
-	spinlock_t		 coh_lock_guard;
-	/** @} locks */
-	/** List of cl_lock's granted for this object. */
-	struct list_head	       coh_locks;
 
 	/**
 	 * Parent object. It is assumed that an object has a well-defined
@@ -795,16 +784,9 @@ struct cl_page_slice {
 /**
  * Lock mode. For the client extent locks.
  *
- * \warning: cl_lock_mode_match() assumes particular ordering here.
  * \ingroup cl_lock
  */
 enum cl_lock_mode {
-	/**
-	 * Mode of a lock that protects no data, and exists only as a
-	 * placeholder. This is used for `glimpse' requests. A phantom lock
-	 * might get promoted to real lock at some point.
-	 */
-	CLM_PHANTOM,
 	CLM_READ,
 	CLM_WRITE,
 	CLM_GROUP
@@ -1114,12 +1096,6 @@ static inline struct page *cl_page_vmpage(struct cl_page *page)
  * (struct cl_lock) and a list of layers (struct cl_lock_slice), linked to
  * cl_lock::cll_layers list through cl_lock_slice::cls_linkage.
  *
- * All locks for a given object are linked into cl_object_header::coh_locks
- * list (protected by cl_object_header::coh_lock_guard spin-lock) through
- * cl_lock::cll_linkage. Currently this list is not sorted in any way. We can
- * sort it in starting lock offset, or use altogether different data structure
- * like a tree.
- *
  * Typical cl_lock consists of the two layers:
  *
  *     - vvp_lock (vvp specific data), and
@@ -1320,289 +1296,21 @@ struct cl_lock_descr {
 	__u32	     cld_enq_flags;
 };
 
-#define DDESCR "%s(%d):[%lu, %lu]"
+#define DDESCR "%s(%d):[%lu, %lu]:%x"
 #define PDESCR(descr)						   \
 	cl_lock_mode_name((descr)->cld_mode), (descr)->cld_mode,	\
-	(descr)->cld_start, (descr)->cld_end
+	(descr)->cld_start, (descr)->cld_end, (descr)->cld_enq_flags
 
 const char *cl_lock_mode_name(const enum cl_lock_mode mode);
 
 /**
- * Lock state-machine states.
- *
- * \htmlonly
- * <pre>
- *
- * Possible state transitions:
- *
- *	      +------------------>NEW
- *	      |		    |
- *	      |		    | cl_enqueue_try()
- *	      |		    |
- *	      |    cl_unuse_try()  V
- *	      |  +--------------QUEUING (*)
- *	      |  |		 |
- *	      |  |		 | cl_enqueue_try()
- *	      |  |		 |
- *	      |  | cl_unuse_try()  V
- *    sub-lock  |  +-------------ENQUEUED (*)
- *    canceled  |  |		 |
- *	      |  |		 | cl_wait_try()
- *	      |  |		 |
- *	      |  |		(R)
- *	      |  |		 |
- *	      |  |		 V
- *	      |  |		HELD<---------+
- *	      |  |		 |	    |
- *	      |  |		 |	    | cl_use_try()
- *	      |  |  cl_unuse_try() |	    |
- *	      |  |		 |	    |
- *	      |  |		 V	 ---+
- *	      |  +------------>INTRANSIT (D) <--+
- *	      |		    |	    |
- *	      |     cl_unuse_try() |	    | cached lock found
- *	      |		    |	    | cl_use_try()
- *	      |		    |	    |
- *	      |		    V	    |
- *	      +------------------CACHED---------+
- *				   |
- *				  (C)
- *				   |
- *				   V
- *				FREEING
- *
- * Legend:
- *
- *	 In states marked with (*) transition to the same state (i.e., a loop
- *	 in the diagram) is possible.
- *
- *	 (R) is the point where Receive call-back is invoked: it allows layers
- *	 to handle arrival of lock reply.
- *
- *	 (C) is the point where Cancellation call-back is invoked.
- *
- *	 (D) is the transit state which means the lock is changing.
- *
- *	 Transition to FREEING state is possible from any other state in the
- *	 diagram in case of unrecoverable error.
- * </pre>
- * \endhtmlonly
- *
- * These states are for individual cl_lock object. Top-lock and its sub-locks
- * can be in the different states. Another way to say this is that we have
- * nested state-machines.
- *
- * Separate QUEUING and ENQUEUED states are needed to support non-blocking
- * operation for locks with multiple sub-locks. Imagine lock on a file F, that
- * intersects 3 stripes S0, S1, and S2. To enqueue F client has to send
- * enqueue to S0, wait for its completion, then send enqueue for S1, wait for
- * its completion and at last enqueue lock for S2, and wait for its
- * completion. In that case, top-lock is in QUEUING state while S0, S1 are
- * handled, and is in ENQUEUED state after enqueue to S2 has been sent (note
- * that in this case, sub-locks move from state to state, and top-lock remains
- * in the same state).
- */
-enum cl_lock_state {
-	/**
-	 * Lock that wasn't yet enqueued
-	 */
-	CLS_NEW,
-	/**
-	 * Enqueue is in progress, blocking for some intermediate interaction
-	 * with the other side.
-	 */
-	CLS_QUEUING,
-	/**
-	 * Lock is fully enqueued, waiting for server to reply when it is
-	 * granted.
-	 */
-	CLS_ENQUEUED,
-	/**
-	 * Lock granted, actively used by some IO.
-	 */
-	CLS_HELD,
-	/**
-	 * This state is used to mark the lock is being used, or unused.
-	 * We need this state because the lock may have several sublocks,
-	 * so it's impossible to have an atomic way to bring all sublocks
-	 * into CLS_HELD state at use case, or all sublocks to CLS_CACHED
-	 * at unuse case.
-	 * If a thread is referring to a lock, and it sees the lock is in this
-	 * state, it must wait for the lock.
-	 * See state diagram for details.
-	 */
-	CLS_INTRANSIT,
-	/**
-	 * Lock granted, not used.
-	 */
-	CLS_CACHED,
-	/**
-	 * Lock is being destroyed.
-	 */
-	CLS_FREEING,
-	CLS_NR
-};
-
-enum cl_lock_flags {
-	/**
-	 * lock has been cancelled. This flag is never cleared once set (by
-	 * cl_lock_cancel0()).
-	 */
-	CLF_CANCELLED	= 1 << 0,
-	/** cancellation is pending for this lock. */
-	CLF_CANCELPEND	= 1 << 1,
-	/** destruction is pending for this lock. */
-	CLF_DOOMED	= 1 << 2,
-	/** from enqueue RPC reply upcall. */
-	CLF_FROM_UPCALL	= 1 << 3,
-};
-
-/**
- * Lock closure.
- *
- * Lock closure is a collection of locks (both top-locks and sub-locks) that
- * might be updated in a result of an operation on a certain lock (which lock
- * this is a closure of).
- *
- * Closures are needed to guarantee dead-lock freedom in the presence of
- *
- *     - nested state-machines (top-lock state-machine composed of sub-lock
- *       state-machines), and
- *
- *     - shared sub-locks.
- *
- * Specifically, many operations, such as lock enqueue, wait, unlock,
- * etc. start from a top-lock, and then operate on a sub-locks of this
- * top-lock, holding a top-lock mutex. When sub-lock state changes as a result
- * of such operation, this change has to be propagated to all top-locks that
- * share this sub-lock. Obviously, no natural lock ordering (e.g.,
- * top-to-bottom or bottom-to-top) captures this scenario, so try-locking has
- * to be used. Lock closure systematizes this try-and-repeat logic.
- */
-struct cl_lock_closure {
-	/**
-	 * Lock that is mutexed when closure construction is started. When
-	 * closure in is `wait' mode (cl_lock_closure::clc_wait), mutex on
-	 * origin is released before waiting.
-	 */
-	struct cl_lock   *clc_origin;
-	/**
-	 * List of enclosed locks, so far. Locks are linked here through
-	 * cl_lock::cll_inclosure.
-	 */
-	struct list_head	clc_list;
-	/**
-	 * True iff closure is in a `wait' mode. This determines what
-	 * cl_lock_enclosure() does when a lock L to be added to the closure
-	 * is currently mutexed by some other thread.
-	 *
-	 * If cl_lock_closure::clc_wait is not set, then closure construction
-	 * fails with CLO_REPEAT immediately.
-	 *
-	 * In wait mode, cl_lock_enclosure() waits until next attempt to build
-	 * a closure might succeed. To this end it releases an origin mutex
-	 * (cl_lock_closure::clc_origin), that has to be the only lock mutex
-	 * owned by the current thread, and then waits on L mutex (by grabbing
-	 * it and immediately releasing), before returning CLO_REPEAT to the
-	 * caller.
-	 */
-	int	       clc_wait;
-	/** Number of locks in the closure. */
-	int	       clc_nr;
-};
-
-/**
  * Layered client lock.
  */
 struct cl_lock {
-	/** Reference counter. */
-	atomic_t	  cll_ref;
 	/** List of slices. Immutable after creation. */
 	struct list_head	    cll_layers;
-	/**
-	 * Linkage into cl_lock::cll_descr::cld_obj::coh_locks list. Protected
-	 * by cl_lock::cll_descr::cld_obj::coh_lock_guard.
-	 */
-	struct list_head	    cll_linkage;
-	/**
-	 * Parameters of this lock. Protected by
-	 * cl_lock::cll_descr::cld_obj::coh_lock_guard nested within
-	 * cl_lock::cll_guard. Modified only on lock creation and in
-	 * cl_lock_modify().
-	 */
+	/** lock attribute, extent, cl_object, etc. */
 	struct cl_lock_descr  cll_descr;
-	/** Protected by cl_lock::cll_guard. */
-	enum cl_lock_state    cll_state;
-	/** signals state changes. */
-	wait_queue_head_t	   cll_wq;
-	/**
-	 * Recursive lock, most fields in cl_lock{} are protected by this.
-	 *
-	 * Locking rules: this mutex is never held across network
-	 * communication, except when lock is being canceled.
-	 *
-	 * Lock ordering: a mutex of a sub-lock is taken first, then a mutex
-	 * on a top-lock. Other direction is implemented through a
-	 * try-lock-repeat loop. Mutices of unrelated locks can be taken only
-	 * by try-locking.
-	 *
-	 * \see osc_lock_enqueue_wait(), lov_lock_cancel(), lov_sublock_wait().
-	 */
-	struct mutex		cll_guard;
-	struct task_struct	*cll_guarder;
-	int		   cll_depth;
-
-	/**
-	 * the owner for INTRANSIT state
-	 */
-	struct task_struct	*cll_intransit_owner;
-	int		   cll_error;
-	/**
-	 * Number of holds on a lock. A hold prevents a lock from being
-	 * canceled and destroyed. Protected by cl_lock::cll_guard.
-	 *
-	 * \see cl_lock_hold(), cl_lock_unhold(), cl_lock_release()
-	 */
-	int		   cll_holds;
-	 /**
-	  * Number of lock users. Valid in cl_lock_state::CLS_HELD state
-	  * only. Lock user pins lock in CLS_HELD state. Protected by
-	  * cl_lock::cll_guard.
-	  *
-	  * \see cl_wait(), cl_unuse().
-	  */
-	int		   cll_users;
-	/**
-	 * Flag bit-mask. Values from enum cl_lock_flags. Updates are
-	 * protected by cl_lock::cll_guard.
-	 */
-	unsigned long	 cll_flags;
-	/**
-	 * A linkage into a list of locks in a closure.
-	 *
-	 * \see cl_lock_closure
-	 */
-	struct list_head	    cll_inclosure;
-	/**
-	 * Confict lock at queuing time.
-	 */
-	struct cl_lock       *cll_conflict;
-	/**
-	 * A list of references to this lock, for debugging.
-	 */
-	struct lu_ref	 cll_reference;
-	/**
-	 * A list of holds on this lock, for debugging.
-	 */
-	struct lu_ref	 cll_holders;
-	/**
-	 * A reference for cl_lock::cll_descr::cld_obj. For debugging.
-	 */
-	struct lu_ref_link    cll_obj_ref;
-#ifdef CONFIG_LOCKDEP
-	/* "dep_map" name is assumed by lockdep.h macros. */
-	struct lockdep_map    dep_map;
-#endif
 };
 
 /**
@@ -1622,170 +1330,32 @@ struct cl_lock_slice {
 };
 
 /**
- * Possible (non-error) return values of ->clo_{enqueue,wait,unlock}().
- *
- * NOTE: lov_subresult() depends on ordering here.
- */
-enum cl_lock_transition {
-	/** operation cannot be completed immediately. Wait for state change. */
-	CLO_WAIT	= 1,
-	/** operation had to release lock mutex, restart. */
-	CLO_REPEAT      = 2,
-	/** lower layer re-enqueued. */
-	CLO_REENQUEUED  = 3,
-};
-
-/**
  *
  * \see vvp_lock_ops, lov_lock_ops, lovsub_lock_ops, osc_lock_ops
  */
 struct cl_lock_operations {
-	/**
-	 * \name statemachine
-	 *
-	 * State machine transitions. These 3 methods are called to transfer
-	 * lock from one state to another, as described in the commentary
-	 * above enum #cl_lock_state.
-	 *
-	 * \retval 0	  this layer has nothing more to do to before
-	 *		       transition to the target state happens;
-	 *
-	 * \retval CLO_REPEAT method had to release and re-acquire cl_lock
-	 *		    mutex, repeat invocation of transition method
-	 *		    across all layers;
-	 *
-	 * \retval CLO_WAIT   this layer cannot move to the target state
-	 *		    immediately, as it has to wait for certain event
-	 *		    (e.g., the communication with the server). It
-	 *		    is guaranteed, that when the state transfer
-	 *		    becomes possible, cl_lock::cll_wq wait-queue
-	 *		    is signaled. Caller can wait for this event by
-	 *		    calling cl_lock_state_wait();
-	 *
-	 * \retval -ve	failure, abort state transition, move the lock
-	 *		    into cl_lock_state::CLS_FREEING state, and set
-	 *		    cl_lock::cll_error.
-	 *
-	 * Once all layers voted to agree to transition (by returning 0), lock
-	 * is moved into corresponding target state. All state transition
-	 * methods are optional.
-	 */
 	/** @{ */
 	/**
 	 * Attempts to enqueue the lock. Called top-to-bottom.
 	 *
+	 * \retval 0	this layer has enqueued the lock successfully
+	 * \retval >0	this layer has enqueued the lock, but needs to wait on
+	 *		@anchor for resources
+	 * \retval -ve	failure
+	 *
 	 * \see ccc_lock_enqueue(), lov_lock_enqueue(), lovsub_lock_enqueue(),
 	 * \see osc_lock_enqueue()
 	 */
 	int  (*clo_enqueue)(const struct lu_env *env,
 			    const struct cl_lock_slice *slice,
-			    struct cl_io *io, __u32 enqflags);
-	/**
-	 * Attempts to wait for enqueue result. Called top-to-bottom.
-	 *
-	 * \see ccc_lock_wait(), lov_lock_wait(), osc_lock_wait()
-	 */
-	int  (*clo_wait)(const struct lu_env *env,
-			 const struct cl_lock_slice *slice);
-	/**
-	 * Attempts to unlock the lock. Called bottom-to-top. In addition to
-	 * usual return values of lock state-machine methods, this can return
-	 * -ESTALE to indicate that lock cannot be returned to the cache, and
-	 * has to be re-initialized.
-	 * unuse is a one-shot operation, so it must NOT return CLO_WAIT.
-	 *
-	 * \see ccc_lock_unuse(), lov_lock_unuse(), osc_lock_unuse()
-	 */
-	int  (*clo_unuse)(const struct lu_env *env,
-			  const struct cl_lock_slice *slice);
-	/**
-	 * Notifies layer that cached lock is started being used.
-	 *
-	 * \pre lock->cll_state == CLS_CACHED
-	 *
-	 * \see lov_lock_use(), osc_lock_use()
-	 */
-	int  (*clo_use)(const struct lu_env *env,
-			const struct cl_lock_slice *slice);
-	/** @} statemachine */
-	/**
-	 * A method invoked when lock state is changed (as a result of state
-	 * transition). This is used, for example, to track when the state of
-	 * a sub-lock changes, to propagate this change to the corresponding
-	 * top-lock. Optional
-	 *
-	 * \see lovsub_lock_state()
-	 */
-	void (*clo_state)(const struct lu_env *env,
-			  const struct cl_lock_slice *slice,
-			  enum cl_lock_state st);
-	/**
-	 * Returns true, iff given lock is suitable for the given io, idea
-	 * being, that there are certain "unsafe" locks, e.g., ones acquired
-	 * for O_APPEND writes, that we don't want to re-use for a normal
-	 * write, to avoid the danger of cascading evictions. Optional. Runs
-	 * under cl_object_header::coh_lock_guard.
-	 *
-	 * XXX this should take more information about lock needed by
-	 * io. Probably lock description or something similar.
-	 *
-	 * \see lov_fits_into()
-	 */
-	int (*clo_fits_into)(const struct lu_env *env,
-			     const struct cl_lock_slice *slice,
-			     const struct cl_lock_descr *need,
-			     const struct cl_io *io);
-	/**
-	 * \name ast
-	 * Asynchronous System Traps. All of then are optional, all are
-	 * executed bottom-to-top.
-	 */
-	/** @{ */
-
+			    struct cl_io *io, struct cl_sync_io *anchor);
 	/**
-	 * Cancellation callback. Cancel a lock voluntarily, or under
-	 * the request of server.
+	 * Cancel a lock and release its DLM lock reference, but do not
+	 * cancel the DLM lock itself
 	 */
 	void (*clo_cancel)(const struct lu_env *env,
 			   const struct cl_lock_slice *slice);
-	/**
-	 * Lock weighting ast. Executed to estimate how precious this lock
-	 * is. The sum of results across all layers is used to determine
-	 * whether lock worth keeping in cache given present memory usage.
-	 *
-	 * \see osc_lock_weigh(), vvp_lock_weigh(), lovsub_lock_weigh().
-	 */
-	unsigned long (*clo_weigh)(const struct lu_env *env,
-				   const struct cl_lock_slice *slice);
-	/** @} ast */
-
-	/**
-	 * \see lovsub_lock_closure()
-	 */
-	int (*clo_closure)(const struct lu_env *env,
-			   const struct cl_lock_slice *slice,
-			   struct cl_lock_closure *closure);
-	/**
-	 * Executed bottom-to-top when lock description changes (e.g., as a
-	 * result of server granting more generous lock than was requested).
-	 *
-	 * \see lovsub_lock_modify()
-	 */
-	int (*clo_modify)(const struct lu_env *env,
-			  const struct cl_lock_slice *slice,
-			  const struct cl_lock_descr *updated);
-	/**
-	 * Notifies layers (bottom-to-top) that lock is going to be
-	 * destroyed. Responsibility of layers is to prevent new references on
-	 * this lock from being acquired once this method returns.
-	 *
-	 * This can be called multiple times due to the races.
-	 *
-	 * \see cl_lock_delete()
-	 * \see osc_lock_delete(), lovsub_lock_delete()
-	 */
-	void (*clo_delete)(const struct lu_env *env,
-			   const struct cl_lock_slice *slice);
+	/** @} */
 	/**
 	 * Destructor. Frees resources and the slice.
 	 *
@@ -2165,9 +1735,13 @@ enum cl_enq_flags {
 	 */
 	CEF_AGL	  = 0x00000020,
 	/**
+	 * enqueue a lock to test DLM lock existence.
+	 */
+	CEF_PEEK	= 0x00000040,
+	/**
 	 * mask of enq_flags.
 	 */
-	CEF_MASK	 = 0x0000003f,
+	CEF_MASK         = 0x0000007f,
 };
 
 /**
@@ -2177,12 +1751,12 @@ enum cl_enq_flags {
 struct cl_io_lock_link {
 	/** linkage into one of cl_lockset lists. */
 	struct list_head	   cill_linkage;
-	struct cl_lock_descr cill_descr;
-	struct cl_lock      *cill_lock;
+	struct cl_lock          cill_lock;
 	/** optional destructor */
 	void	       (*cill_fini)(const struct lu_env *env,
 				    struct cl_io_lock_link *link);
 };
+#define cill_descr	cill_lock.cll_descr
 
 /**
  * Lock-set represents a collection of locks, that io needs at a
@@ -2216,8 +1790,6 @@ struct cl_io_lock_link {
 struct cl_lockset {
 	/** locks to be acquired. */
 	struct list_head  cls_todo;
-	/** locks currently being processed. */
-	struct list_head  cls_curr;
 	/** locks acquired. */
 	struct list_head  cls_done;
 };
@@ -2581,9 +2153,7 @@ struct cl_site {
 	 * and top-locks (and top-pages) are accounted here.
 	 */
 	struct cache_stats    cs_pages;
-	struct cache_stats    cs_locks;
 	atomic_t	  cs_pages_state[CPS_NR];
-	atomic_t	  cs_locks_state[CLS_NR];
 };
 
 int  cl_site_init(struct cl_site *s, struct cl_device *top);
@@ -2707,7 +2277,7 @@ int  cl_object_glimpse(const struct lu_env *env, struct cl_object *obj,
 		       struct ost_lvb *lvb);
 int  cl_conf_set(const struct lu_env *env, struct cl_object *obj,
 		 const struct cl_object_conf *conf);
-void cl_object_prune(const struct lu_env *env, struct cl_object *obj);
+int cl_object_prune(const struct lu_env *env, struct cl_object *obj);
 void cl_object_kill(const struct lu_env *env, struct cl_object *obj);
 
 /**
@@ -2845,121 +2415,17 @@ void cl_lock_descr_print(const struct lu_env *env, void *cookie,
  * @{
  */
 
-struct cl_lock *cl_lock_hold(const struct lu_env *env, const struct cl_io *io,
-			     const struct cl_lock_descr *need,
-			     const char *scope, const void *source);
-struct cl_lock *cl_lock_peek(const struct lu_env *env, const struct cl_io *io,
-			     const struct cl_lock_descr *need,
-			     const char *scope, const void *source);
-struct cl_lock *cl_lock_request(const struct lu_env *env, struct cl_io *io,
-				const struct cl_lock_descr *need,
-				const char *scope, const void *source);
-struct cl_lock *cl_lock_at_pgoff(const struct lu_env *env,
-				 struct cl_object *obj, pgoff_t index,
-				 struct cl_lock *except, int pending,
-				 int canceld);
+int cl_lock_request(const struct lu_env *env, struct cl_io *io,
+		    struct cl_lock *lock);
+int cl_lock_init(const struct lu_env *env, struct cl_lock *lock,
+		 const struct cl_io *io);
+void cl_lock_fini(const struct lu_env *env, struct cl_lock *lock);
 const struct cl_lock_slice *cl_lock_at(const struct cl_lock *lock,
 				       const struct lu_device_type *dtype);
-
-void cl_lock_get(struct cl_lock *lock);
-void cl_lock_get_trust(struct cl_lock *lock);
-void cl_lock_put(const struct lu_env *env, struct cl_lock *lock);
-void cl_lock_hold_add(const struct lu_env *env, struct cl_lock *lock,
-		      const char *scope, const void *source);
-void cl_lock_hold_release(const struct lu_env *env, struct cl_lock *lock,
-			  const char *scope, const void *source);
-void cl_lock_unhold(const struct lu_env *env, struct cl_lock *lock,
-		    const char *scope, const void *source);
-void cl_lock_release(const struct lu_env *env, struct cl_lock *lock,
-		     const char *scope, const void *source);
-void cl_lock_user_add(const struct lu_env *env, struct cl_lock *lock);
-void cl_lock_user_del(const struct lu_env *env, struct cl_lock *lock);
-
-int cl_lock_is_intransit(struct cl_lock *lock);
-
-int cl_lock_enqueue_wait(const struct lu_env *env, struct cl_lock *lock,
-			 int keep_mutex);
-
-/** \name statemachine statemachine
- * Interface to lock state machine consists of 3 parts:
- *
- *     - "try" functions that attempt to effect a state transition. If state
- *     transition is not possible right now (e.g., if it has to wait for some
- *     asynchronous event to occur), these functions return
- *     cl_lock_transition::CLO_WAIT.
- *
- *     - "non-try" functions that implement synchronous blocking interface on
- *     top of non-blocking "try" functions. These functions repeatedly call
- *     corresponding "try" versions, and if state transition is not possible
- *     immediately, wait for lock state change.
- *
- *     - methods from cl_lock_operations, called by "try" functions. Lock can
- *     be advanced to the target state only when all layers voted that they
- *     are ready for this transition. "Try" functions call methods under lock
- *     mutex. If a layer had to release a mutex, it re-acquires it and returns
- *     cl_lock_transition::CLO_REPEAT, causing "try" function to call all
- *     layers again.
- *
- * TRY	      NON-TRY      METHOD			    FINAL STATE
- *
- * cl_enqueue_try() cl_enqueue() cl_lock_operations::clo_enqueue() CLS_ENQUEUED
- *
- * cl_wait_try()    cl_wait()    cl_lock_operations::clo_wait()    CLS_HELD
- *
- * cl_unuse_try()   cl_unuse()   cl_lock_operations::clo_unuse()   CLS_CACHED
- *
- * cl_use_try()     NONE	 cl_lock_operations::clo_use()     CLS_HELD
- *
- * @{
- */
-
-int cl_wait(const struct lu_env *env, struct cl_lock *lock);
-void cl_unuse(const struct lu_env *env, struct cl_lock *lock);
-int cl_enqueue_try(const struct lu_env *env, struct cl_lock *lock,
-		   struct cl_io *io, __u32 flags);
-int cl_unuse_try(const struct lu_env *env, struct cl_lock *lock);
-int cl_wait_try(const struct lu_env *env, struct cl_lock *lock);
-int cl_use_try(const struct lu_env *env, struct cl_lock *lock, int atomic);
-
-/** @} statemachine */
-
-void cl_lock_signal(const struct lu_env *env, struct cl_lock *lock);
-int cl_lock_state_wait(const struct lu_env *env, struct cl_lock *lock);
-void cl_lock_state_set(const struct lu_env *env, struct cl_lock *lock,
-		       enum cl_lock_state state);
-int cl_queue_match(const struct list_head *queue,
-		   const struct cl_lock_descr *need);
-
-void cl_lock_mutex_get(const struct lu_env *env, struct cl_lock *lock);
-void cl_lock_mutex_put(const struct lu_env *env, struct cl_lock *lock);
-int cl_lock_is_mutexed(struct cl_lock *lock);
-int cl_lock_nr_mutexed(const struct lu_env *env);
-int cl_lock_discard_pages(const struct lu_env *env, struct cl_lock *lock);
-int cl_lock_ext_match(const struct cl_lock_descr *has,
-		      const struct cl_lock_descr *need);
-int cl_lock_descr_match(const struct cl_lock_descr *has,
-			const struct cl_lock_descr *need);
-int cl_lock_mode_match(enum cl_lock_mode has, enum cl_lock_mode need);
-int cl_lock_modify(const struct lu_env *env, struct cl_lock *lock,
-		   const struct cl_lock_descr *desc);
-
-void cl_lock_closure_init(const struct lu_env *env,
-			  struct cl_lock_closure *closure,
-			  struct cl_lock *origin, int wait);
-void cl_lock_closure_fini(struct cl_lock_closure *closure);
-int cl_lock_closure_build(const struct lu_env *env, struct cl_lock *lock,
-			  struct cl_lock_closure *closure);
-void cl_lock_disclosure(const struct lu_env *env,
-			struct cl_lock_closure *closure);
-int cl_lock_enclosure(const struct lu_env *env, struct cl_lock *lock,
-		      struct cl_lock_closure *closure);
-
+void cl_lock_release(const struct lu_env *env, struct cl_lock *lock);
+int cl_lock_enqueue(const struct lu_env *env, struct cl_io *io,
+		    struct cl_lock *lock, struct cl_sync_io *anchor);
 void cl_lock_cancel(const struct lu_env *env, struct cl_lock *lock);
-void cl_lock_delete(const struct lu_env *env, struct cl_lock *lock);
-void cl_lock_error(const struct lu_env *env, struct cl_lock *lock, int error);
-void cl_locks_prune(const struct lu_env *env, struct cl_object *obj, int wait);
-
-unsigned long cl_lock_weigh(const struct lu_env *env, struct cl_lock *lock);
 
 /** @} cl_lock */
 
diff --git a/drivers/staging/lustre/lustre/include/lclient.h b/drivers/staging/lustre/lustre/include/lclient.h
index a8c8788..82af8ae 100644
--- a/drivers/staging/lustre/lustre/include/lclient.h
+++ b/drivers/staging/lustre/lustre/include/lclient.h
@@ -99,10 +99,6 @@ struct ccc_io {
 		} write;
 	} u;
 	/**
-	 * True iff io is processing glimpse right now.
-	 */
-	int		  cui_glimpse;
-	/**
 	 * Layout version when this IO is initialized
 	 */
 	__u32		cui_layout_gen;
@@ -123,6 +119,7 @@ extern struct lu_context_key ccc_key;
 extern struct lu_context_key ccc_session_key;
 
 struct ccc_thread_info {
+	struct cl_lock		cti_lock;
 	struct cl_lock_descr cti_descr;
 	struct cl_io	 cti_io;
 	struct cl_attr       cti_attr;
@@ -137,6 +134,14 @@ static inline struct ccc_thread_info *ccc_env_info(const struct lu_env *env)
 	return info;
 }
 
+static inline struct cl_lock *ccc_env_lock(const struct lu_env *env)
+{
+	struct cl_lock *lock = &ccc_env_info(env)->cti_lock;
+
+	memset(lock, 0, sizeof(*lock));
+	return lock;
+}
+
 static inline struct cl_attr *ccc_env_thread_attr(const struct lu_env *env)
 {
 	struct cl_attr *attr = &ccc_env_info(env)->cti_attr;
@@ -308,18 +313,7 @@ void ccc_lock_delete(const struct lu_env *env,
 void ccc_lock_fini(const struct lu_env *env, struct cl_lock_slice *slice);
 int ccc_lock_enqueue(const struct lu_env *env,
 		     const struct cl_lock_slice *slice,
-		     struct cl_io *io, __u32 enqflags);
-int ccc_lock_use(const struct lu_env *env, const struct cl_lock_slice *slice);
-int ccc_lock_unuse(const struct lu_env *env, const struct cl_lock_slice *slice);
-int ccc_lock_wait(const struct lu_env *env, const struct cl_lock_slice *slice);
-int ccc_lock_fits_into(const struct lu_env *env,
-		       const struct cl_lock_slice *slice,
-		       const struct cl_lock_descr *need,
-		       const struct cl_io *io);
-void ccc_lock_state(const struct lu_env *env,
-		    const struct cl_lock_slice *slice,
-		    enum cl_lock_state state);
-
+		     struct cl_io *io, struct cl_sync_io *anchor);
 int ccc_io_one_lock_index(const struct lu_env *env, struct cl_io *io,
 			  __u32 enqflags, enum cl_lock_mode mode,
 			  pgoff_t start, pgoff_t end);
diff --git a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
index da8bc6e..4318511 100644
--- a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
+++ b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
@@ -2582,6 +2582,8 @@ struct ldlm_extent {
 	__u64 gid;
 };
 
+#define LDLM_GID_ANY ((__u64)-1)
+
 static inline int ldlm_extent_overlap(struct ldlm_extent *ex1,
 				      struct ldlm_extent *ex2)
 {
diff --git a/drivers/staging/lustre/lustre/include/lustre_dlm.h b/drivers/staging/lustre/lustre/include/lustre_dlm.h
index 8b0364f..b1abdc2 100644
--- a/drivers/staging/lustre/lustre/include/lustre_dlm.h
+++ b/drivers/staging/lustre/lustre/include/lustre_dlm.h
@@ -71,6 +71,7 @@ struct obd_device;
  */
 enum ldlm_error {
 	ELDLM_OK = 0,
+	ELDLM_LOCK_MATCHED = 1,
 
 	ELDLM_LOCK_CHANGED = 300,
 	ELDLM_LOCK_ABORTED = 301,
diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_lib.c b/drivers/staging/lustre/lustre/ldlm/ldlm_lib.c
index b497ce4..7fedbec 100644
--- a/drivers/staging/lustre/lustre/ldlm/ldlm_lib.c
+++ b/drivers/staging/lustre/lustre/ldlm/ldlm_lib.c
@@ -748,6 +748,7 @@ int ldlm_error2errno(enum ldlm_error error)
 
 	switch (error) {
 	case ELDLM_OK:
+	case ELDLM_LOCK_MATCHED:
 		result = 0;
 		break;
 	case ELDLM_LOCK_CHANGED:
diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_lock.c b/drivers/staging/lustre/lustre/ldlm/ldlm_lock.c
index ecd65a7..27a051b 100644
--- a/drivers/staging/lustre/lustre/ldlm/ldlm_lock.c
+++ b/drivers/staging/lustre/lustre/ldlm/ldlm_lock.c
@@ -657,7 +657,7 @@ void ldlm_lock_addref(struct lustre_handle *lockh, __u32 mode)
 	struct ldlm_lock *lock;
 
 	lock = ldlm_handle2lock(lockh);
-	LASSERT(lock);
+	LASSERTF(lock, "Non-existing lock: %llx\n", lockh->cookie);
 	ldlm_lock_addref_internal(lock, mode);
 	LDLM_LOCK_PUT(lock);
 }
@@ -1092,6 +1092,7 @@ static struct ldlm_lock *search_queue(struct list_head *queue,
 
 		if (unlikely(match == LCK_GROUP) &&
 		    lock->l_resource->lr_type == LDLM_EXTENT &&
+		    policy->l_extent.gid != LDLM_GID_ANY &&
 		    lock->l_policy_data.l_extent.gid != policy->l_extent.gid)
 			continue;
 
diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_request.c b/drivers/staging/lustre/lustre/ldlm/ldlm_request.c
index d5968e0..42925ac 100644
--- a/drivers/staging/lustre/lustre/ldlm/ldlm_request.c
+++ b/drivers/staging/lustre/lustre/ldlm/ldlm_request.c
@@ -347,7 +347,6 @@ int ldlm_cli_enqueue_fini(struct obd_export *exp, struct ptlrpc_request *req,
 	struct ldlm_lock *lock;
 	struct ldlm_reply *reply;
 	int cleanup_phase = 1;
-	int size = 0;
 
 	lock = ldlm_handle2lock(lockh);
 	/* ldlm_cli_enqueue is holding a reference on this lock. */
@@ -375,8 +374,8 @@ int ldlm_cli_enqueue_fini(struct obd_export *exp, struct ptlrpc_request *req,
 		goto cleanup;
 	}
 
-	if (lvb_len != 0) {
-		LASSERT(lvb);
+	if (lvb_len > 0) {
+		int size = 0;
 
 		size = req_capsule_get_size(&req->rq_pill, &RMF_DLM_LVB,
 					    RCL_SERVER);
@@ -390,12 +389,13 @@ int ldlm_cli_enqueue_fini(struct obd_export *exp, struct ptlrpc_request *req,
 			rc = -EINVAL;
 			goto cleanup;
 		}
+		lvb_len = size;
 	}
 
 	if (rc == ELDLM_LOCK_ABORTED) {
-		if (lvb_len != 0)
+		if (lvb_len > 0 && lvb)
 			rc = ldlm_fill_lvb(lock, &req->rq_pill, RCL_SERVER,
-					   lvb, size);
+					   lvb, lvb_len);
 		if (rc == 0)
 			rc = ELDLM_LOCK_ABORTED;
 		goto cleanup;
@@ -489,7 +489,7 @@ int ldlm_cli_enqueue_fini(struct obd_export *exp, struct ptlrpc_request *req,
 	/* If the lock has already been granted by a completion AST, don't
 	 * clobber the LVB with an older one.
 	 */
-	if (lvb_len != 0) {
+	if (lvb_len > 0) {
 		/* We must lock or a racing completion might update lvb without
 		 * letting us know and we'll clobber the correct value.
 		 * Cannot unlock after the check either, as that still leaves
@@ -498,7 +498,7 @@ int ldlm_cli_enqueue_fini(struct obd_export *exp, struct ptlrpc_request *req,
 		lock_res_and_lock(lock);
 		if (lock->l_req_mode != lock->l_granted_mode)
 			rc = ldlm_fill_lvb(lock, &req->rq_pill, RCL_SERVER,
-					   lock->l_lvb_data, size);
+					   lock->l_lvb_data, lvb_len);
 		unlock_res_and_lock(lock);
 		if (rc < 0) {
 			cleanup_phase = 1;
@@ -518,7 +518,7 @@ int ldlm_cli_enqueue_fini(struct obd_export *exp, struct ptlrpc_request *req,
 		}
 	}
 
-	if (lvb_len && lvb) {
+	if (lvb_len > 0 && lvb) {
 		/* Copy the LVB here, and not earlier, because the completion
 		 * AST (if any) can override what we got in the reply
 		 */
diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_resource.c b/drivers/staging/lustre/lustre/ldlm/ldlm_resource.c
index 9dede87..242a664 100644
--- a/drivers/staging/lustre/lustre/ldlm/ldlm_resource.c
+++ b/drivers/staging/lustre/lustre/ldlm/ldlm_resource.c
@@ -1400,3 +1400,4 @@ void ldlm_resource_dump(int level, struct ldlm_resource *res)
 			LDLM_DEBUG_LIMIT(level, lock, "###");
 	}
 }
+EXPORT_SYMBOL(ldlm_resource_dump);
diff --git a/drivers/staging/lustre/lustre/llite/glimpse.c b/drivers/staging/lustre/lustre/llite/glimpse.c
index 9b0e2ec..0759dfc 100644
--- a/drivers/staging/lustre/lustre/llite/glimpse.c
+++ b/drivers/staging/lustre/lustre/llite/glimpse.c
@@ -86,17 +86,17 @@ blkcnt_t dirty_cnt(struct inode *inode)
 int cl_glimpse_lock(const struct lu_env *env, struct cl_io *io,
 		    struct inode *inode, struct cl_object *clob, int agl)
 {
-	struct cl_lock_descr *descr = &ccc_env_info(env)->cti_descr;
 	struct ll_inode_info *lli   = ll_i2info(inode);
 	const struct lu_fid  *fid   = lu_object_fid(&clob->co_lu);
-	struct ccc_io	*cio   = ccc_env_io(env);
-	struct cl_lock       *lock;
 	int result;
 
 	result = 0;
 	if (!(lli->lli_flags & LLIF_MDS_SIZE_LOCK)) {
 		CDEBUG(D_DLMTRACE, "Glimpsing inode " DFID "\n", PFID(fid));
 		if (lli->lli_has_smd) {
+			struct cl_lock *lock = ccc_env_lock(env);
+			struct cl_lock_descr *descr = &lock->cll_descr;
+
 			/* NOTE: this looks like DLM lock request, but it may
 			 *       not be one. Due to CEF_ASYNC flag (translated
 			 *       to LDLM_FL_HAS_INTENT by osc), this is
@@ -113,11 +113,10 @@ int cl_glimpse_lock(const struct lu_env *env, struct cl_io *io,
 			 */
 			*descr = whole_file;
 			descr->cld_obj   = clob;
-			descr->cld_mode  = CLM_PHANTOM;
+			descr->cld_mode  = CLM_READ;
 			descr->cld_enq_flags = CEF_ASYNC | CEF_MUST;
 			if (agl)
 				descr->cld_enq_flags |= CEF_AGL;
-			cio->cui_glimpse = 1;
 			/*
 			 * CEF_ASYNC is used because glimpse sub-locks cannot
 			 * deadlock (because they never conflict with other
@@ -126,19 +125,11 @@ int cl_glimpse_lock(const struct lu_env *env, struct cl_io *io,
 			 * CEF_MUST protects glimpse lock from conversion into
 			 * a lockless mode.
 			 */
-			lock = cl_lock_request(env, io, descr, "glimpse",
-					       current);
-			cio->cui_glimpse = 0;
-
-			if (!lock)
-				return 0;
+			result = cl_lock_request(env, io, lock);
+			if (result < 0)
+				return result;
 
-			if (IS_ERR(lock))
-				return PTR_ERR(lock);
-
-			LASSERT(agl == 0);
-			result = cl_wait(env, lock);
-			if (result == 0) {
+			if (!agl) {
 				ll_merge_attr(env, inode);
 				if (i_size_read(inode) > 0 &&
 				    inode->i_blocks == 0) {
@@ -150,9 +141,8 @@ int cl_glimpse_lock(const struct lu_env *env, struct cl_io *io,
 					 */
 					inode->i_blocks = dirty_cnt(inode);
 				}
-				cl_unuse(env, lock);
 			}
-			cl_lock_release(env, lock, "glimpse", current);
+			cl_lock_release(env, lock);
 		} else {
 			CDEBUG(D_DLMTRACE, "No objects for inode\n");
 			ll_merge_attr(env, inode);
@@ -233,10 +223,7 @@ int cl_local_size(struct inode *inode)
 {
 	struct lu_env	   *env = NULL;
 	struct cl_io	    *io  = NULL;
-	struct ccc_thread_info  *cti;
 	struct cl_object	*clob;
-	struct cl_lock_descr    *descr;
-	struct cl_lock	  *lock;
 	int		      result;
 	int		      refcheck;
 
@@ -252,19 +239,15 @@ int cl_local_size(struct inode *inode)
 	if (result > 0) {
 		result = io->ci_result;
 	} else if (result == 0) {
-		cti = ccc_env_info(env);
-		descr = &cti->cti_descr;
+		struct cl_lock *lock = ccc_env_lock(env);
 
-		*descr = whole_file;
-		descr->cld_obj = clob;
-		lock = cl_lock_peek(env, io, descr, "localsize", current);
-		if (lock) {
+		lock->cll_descr = whole_file;
+		lock->cll_descr.cld_enq_flags = CEF_PEEK;
+		lock->cll_descr.cld_obj = clob;
+		result = cl_lock_request(env, io, lock);
+		if (result == 0) {
 			ll_merge_attr(env, inode);
-			cl_unuse(env, lock);
-			cl_lock_release(env, lock, "localsize", current);
-			result = 0;
-		} else {
-			result = -ENODATA;
+			cl_lock_release(env, lock);
 		}
 	}
 	cl_io_fini(env, io);
diff --git a/drivers/staging/lustre/lustre/llite/lcommon_cl.c b/drivers/staging/lustre/lustre/llite/lcommon_cl.c
index fde96d3..b339d1b 100644
--- a/drivers/staging/lustre/lustre/llite/lcommon_cl.c
+++ b/drivers/staging/lustre/lustre/llite/lcommon_cl.c
@@ -475,12 +475,6 @@ int ccc_transient_page_prep(const struct lu_env *env,
  *
  */
 
-void ccc_lock_delete(const struct lu_env *env,
-		     const struct cl_lock_slice *slice)
-{
-	CLOBINVRNT(env, slice->cls_obj, ccc_object_invariant(slice->cls_obj));
-}
-
 void ccc_lock_fini(const struct lu_env *env, struct cl_lock_slice *slice)
 {
 	struct ccc_lock *clk = cl2ccc_lock(slice);
@@ -490,111 +484,12 @@ void ccc_lock_fini(const struct lu_env *env, struct cl_lock_slice *slice)
 
 int ccc_lock_enqueue(const struct lu_env *env,
 		     const struct cl_lock_slice *slice,
-		     struct cl_io *unused, __u32 enqflags)
-{
-	CLOBINVRNT(env, slice->cls_obj, ccc_object_invariant(slice->cls_obj));
-	return 0;
-}
-
-int ccc_lock_use(const struct lu_env *env, const struct cl_lock_slice *slice)
-{
-	CLOBINVRNT(env, slice->cls_obj, ccc_object_invariant(slice->cls_obj));
-	return 0;
-}
-
-int ccc_lock_unuse(const struct lu_env *env, const struct cl_lock_slice *slice)
+		     struct cl_io *unused, struct cl_sync_io *anchor)
 {
 	CLOBINVRNT(env, slice->cls_obj, ccc_object_invariant(slice->cls_obj));
 	return 0;
 }
 
-int ccc_lock_wait(const struct lu_env *env, const struct cl_lock_slice *slice)
-{
-	CLOBINVRNT(env, slice->cls_obj, ccc_object_invariant(slice->cls_obj));
-	return 0;
-}
-
-/**
- * Implementation of cl_lock_operations::clo_fits_into() methods for ccc
- * layer. This function is executed every time io finds an existing lock in
- * the lock cache while creating new lock. This function has to decide whether
- * cached lock "fits" into io.
- *
- * \param slice lock to be checked
- * \param io    IO that wants a lock.
- *
- * \see lov_lock_fits_into().
- */
-int ccc_lock_fits_into(const struct lu_env *env,
-		       const struct cl_lock_slice *slice,
-		       const struct cl_lock_descr *need,
-		       const struct cl_io *io)
-{
-	const struct cl_lock       *lock  = slice->cls_lock;
-	const struct cl_lock_descr *descr = &lock->cll_descr;
-	const struct ccc_io	*cio   = ccc_env_io(env);
-	int			 result;
-
-	/*
-	 * Work around DLM peculiarity: it assumes that glimpse
-	 * (LDLM_FL_HAS_INTENT) lock is always LCK_PR, and returns reads lock
-	 * when asked for LCK_PW lock with LDLM_FL_HAS_INTENT flag set. Make
-	 * sure that glimpse doesn't get CLM_WRITE top-lock, so that it
-	 * doesn't enqueue CLM_WRITE sub-locks.
-	 */
-	if (cio->cui_glimpse)
-		result = descr->cld_mode != CLM_WRITE;
-
-	/*
-	 * Also, don't match incomplete write locks for read, otherwise read
-	 * would enqueue missing sub-locks in the write mode.
-	 */
-	else if (need->cld_mode != descr->cld_mode)
-		result = lock->cll_state >= CLS_ENQUEUED;
-	else
-		result = 1;
-	return result;
-}
-
-/**
- * Implements cl_lock_operations::clo_state() method for ccc layer, invoked
- * whenever lock state changes. Transfers object attributes, that might be
- * updated as a result of lock acquiring into inode.
- */
-void ccc_lock_state(const struct lu_env *env,
-		    const struct cl_lock_slice *slice,
-		    enum cl_lock_state state)
-{
-	struct cl_lock *lock = slice->cls_lock;
-
-	/*
-	 * Refresh inode attributes when the lock is moving into CLS_HELD
-	 * state, and only when this is a result of real enqueue, rather than
-	 * of finding lock in the cache.
-	 */
-	if (state == CLS_HELD && lock->cll_state < CLS_HELD) {
-		struct cl_object *obj;
-		struct inode     *inode;
-
-		obj   = slice->cls_obj;
-		inode = ccc_object_inode(obj);
-
-		/* vmtruncate() sets the i_size
-		 * under both a DLM lock and the
-		 * ll_inode_size_lock().  If we don't get the
-		 * ll_inode_size_lock() here we can match the DLM lock and
-		 * reset i_size.  generic_file_write can then trust the
-		 * stale i_size when doing appending writes and effectively
-		 * cancel the result of the truncate.  Getting the
-		 * ll_inode_size_lock() after the enqueue maintains the DLM
-		 * -> ll_inode_size_lock() acquiring order.
-		 */
-		if (lock->cll_descr.cld_start == 0 &&
-		    lock->cll_descr.cld_end == CL_PAGE_EOF)
-			ll_merge_attr(env, inode);
-	}
-}
-
 /*****************************************************************************
  *
  * io operations.
diff --git a/drivers/staging/lustre/lustre/llite/lcommon_misc.c b/drivers/staging/lustre/lustre/llite/lcommon_misc.c
index f68c368..4c12e21 100644
--- a/drivers/staging/lustre/lustre/llite/lcommon_misc.c
+++ b/drivers/staging/lustre/lustre/llite/lcommon_misc.c
@@ -145,7 +145,7 @@ int cl_get_grouplock(struct cl_object *obj, unsigned long gid, int nonblock,
 	io->ci_ignore_layout = 1;
 
 	rc = cl_io_init(env, io, CIT_MISC, io->ci_obj);
-	if (rc) {
+	if (rc != 0) {
 		cl_io_fini(env, io);
 		cl_env_put(env, &refcheck);
 		/* Does not make sense to take GL for released layout */
@@ -154,7 +154,8 @@ int cl_get_grouplock(struct cl_object *obj, unsigned long gid, int nonblock,
 		return rc;
 	}
 
-	descr = &ccc_env_info(env)->cti_descr;
+	lock = ccc_env_lock(env);
+	descr = &lock->cll_descr;
 	descr->cld_obj = obj;
 	descr->cld_start = 0;
 	descr->cld_end = CL_PAGE_EOF;
@@ -164,11 +165,11 @@ int cl_get_grouplock(struct cl_object *obj, unsigned long gid, int nonblock,
 	enqflags = CEF_MUST | (nonblock ? CEF_NONBLOCK : 0);
 	descr->cld_enq_flags = enqflags;
 
-	lock = cl_lock_request(env, io, descr, GROUPLOCK_SCOPE, current);
-	if (IS_ERR(lock)) {
+	rc = cl_lock_request(env, io, lock);
+	if (rc < 0) {
 		cl_io_fini(env, io);
 		cl_env_put(env, &refcheck);
-		return PTR_ERR(lock);
+		return rc;
 	}
 
 	cg->cg_env  = cl_env_get(&refcheck);
@@ -194,8 +195,7 @@ void cl_put_grouplock(struct ccc_grouplock *cg)
 	cl_env_implant(env, &refcheck);
 	cl_env_put(env, &refcheck);
 
-	cl_unuse(env, lock);
-	cl_lock_release(env, lock, GROUPLOCK_SCOPE, current);
+	cl_lock_release(env, lock);
 	cl_io_fini(env, io);
 	cl_env_put(env, NULL);
 }
diff --git a/drivers/staging/lustre/lustre/llite/rw26.c b/drivers/staging/lustre/lustre/llite/rw26.c
index e2fea8c..50d8289 100644
--- a/drivers/staging/lustre/lustre/llite/rw26.c
+++ b/drivers/staging/lustre/lustre/llite/rw26.c
@@ -150,8 +150,7 @@ static int ll_releasepage(struct page *vmpage, gfp_t gfp_mask)
 	 * If this page holds the last refc of cl_object, the following
 	 * call path may cause reschedule:
 	 *   cl_page_put -> cl_page_free -> cl_object_put ->
-	 *     lu_object_put -> lu_object_free -> lov_delete_raid0 ->
-	 *     cl_locks_prune.
+	 *     lu_object_put -> lu_object_free -> lov_delete_raid0.
 	 *
 	 * However, the kernel can't get rid of this inode until all pages have
 	 * been cleaned up. Now that we hold page lock here, it's pretty safe
diff --git a/drivers/staging/lustre/lustre/llite/vvp_io.c b/drivers/staging/lustre/lustre/llite/vvp_io.c
index c7db318..fb6f932 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_io.c
+++ b/drivers/staging/lustre/lustre/llite/vvp_io.c
@@ -233,7 +233,7 @@ static int vvp_mmap_locks(const struct lu_env *env,
 	ldlm_policy_data_t      policy;
 	unsigned long	   addr;
 	ssize_t		 count;
-	int		     result;
+	int		 result = 0;
 	struct iov_iter i;
 	struct iovec iov;
 
@@ -265,10 +265,10 @@ static int vvp_mmap_locks(const struct lu_env *env,
 
 			if (ll_file_nolock(vma->vm_file)) {
 				/*
-				 * For no lock case, a lockless lock will be
-				 * generated.
+				 * The "nolock" case is not allowed for mmap
 				 */
-				flags = CEF_NEVER;
+				result = -EINVAL;
+				break;
 			}
 
 			/*
@@ -290,10 +290,8 @@ static int vvp_mmap_locks(const struct lu_env *env,
 			       descr->cld_mode, descr->cld_start,
 			       descr->cld_end);
 
-			if (result < 0) {
-				up_read(&mm->mmap_sem);
-				return result;
-			}
+			if (result < 0)
+				break;
 
 			if (vma->vm_end - addr >= count)
 				break;
@@ -302,8 +300,10 @@ static int vvp_mmap_locks(const struct lu_env *env,
 			addr = vma->vm_end;
 		}
 		up_read(&mm->mmap_sem);
+		if (result < 0)
+			break;
 	}
-	return 0;
+	return result;
 }
 
 static int vvp_io_rw_lock(const struct lu_env *env, struct cl_io *io,
@@ -781,6 +781,7 @@ static int vvp_io_write_start(const struct lu_env *env,
 		 * PARALLEL IO This has to be changed for parallel IO doing
 		 * out-of-order writes.
 		 */
+		ll_merge_attr(env, inode);
 		pos = io->u.ci_wr.wr.crw_pos = i_size_read(inode);
 		cio->cui_iocb->ki_pos = pos;
 	} else {
diff --git a/drivers/staging/lustre/lustre/llite/vvp_lock.c b/drivers/staging/lustre/lustre/llite/vvp_lock.c
index ff09480..8c505a6 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_lock.c
+++ b/drivers/staging/lustre/lustre/llite/vvp_lock.c
@@ -51,32 +51,9 @@
  *
  */
 
-/**
- * Estimates lock value for the purpose of managing the lock cache during
- * memory shortages.
- *
- * Locks for memory mapped files are almost infinitely precious, others are
- * junk. "Mapped locks" are heavy, but not infinitely heavy, so that they are
- * ordered within themselves by weights assigned from other layers.
- */
-static unsigned long vvp_lock_weigh(const struct lu_env *env,
-				    const struct cl_lock_slice *slice)
-{
-	struct ccc_object *cob = cl2ccc(slice->cls_obj);
-
-	return atomic_read(&cob->cob_mmap_cnt) > 0 ? ~0UL >> 2 : 0;
-}
-
 static const struct cl_lock_operations vvp_lock_ops = {
-	.clo_delete    = ccc_lock_delete,
 	.clo_fini      = ccc_lock_fini,
-	.clo_enqueue   = ccc_lock_enqueue,
-	.clo_wait      = ccc_lock_wait,
-	.clo_use       = ccc_lock_use,
-	.clo_unuse     = ccc_lock_unuse,
-	.clo_fits_into = ccc_lock_fits_into,
-	.clo_state     = ccc_lock_state,
-	.clo_weigh     = vvp_lock_weigh
+	.clo_enqueue   = ccc_lock_enqueue
 };
 
 int vvp_lock_init(const struct lu_env *env, struct cl_object *obj,
diff --git a/drivers/staging/lustre/lustre/llite/vvp_object.c b/drivers/staging/lustre/lustre/llite/vvp_object.c
index d03eb2b..34210db 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_object.c
+++ b/drivers/staging/lustre/lustre/llite/vvp_object.c
@@ -170,11 +170,15 @@ static int vvp_prune(const struct lu_env *env, struct cl_object *obj)
 	struct inode *inode = ccc_object_inode(obj);
 	int rc;
 
-	rc = cl_sync_file_range(inode, 0, OBD_OBJECT_EOF, CL_FSYNC_ALL, 1);
-	if (rc == 0)
-		truncate_inode_pages(inode->i_mapping, 0);
+	rc = cl_sync_file_range(inode, 0, OBD_OBJECT_EOF, CL_FSYNC_LOCAL, 1);
+	if (rc < 0) {
+		CDEBUG(D_VFSTRACE, DFID ": writeback failed: %d\n",
+		       PFID(lu_object_fid(&obj->co_lu)), rc);
+		return rc;
+	}
 
-	return rc;
+	truncate_inode_pages(inode->i_mapping, 0);
+	return 0;
 }
 
 static const struct cl_object_operations vvp_ops = {
diff --git a/drivers/staging/lustre/lustre/lov/lov_cl_internal.h b/drivers/staging/lustre/lustre/lov/lov_cl_internal.h
index 9b3d13b..dfe41a8 100644
--- a/drivers/staging/lustre/lustre/lov/lov_cl_internal.h
+++ b/drivers/staging/lustre/lustre/lov/lov_cl_internal.h
@@ -281,24 +281,17 @@ struct lov_object {
 };
 
 /**
- * Flags that top-lock can set on each of its sub-locks.
- */
-enum lov_sub_flags {
-	/** Top-lock acquired a hold (cl_lock_hold()) on a sub-lock. */
-	LSF_HELD = 1 << 0
-};
-
-/**
  * State lov_lock keeps for each sub-lock.
  */
 struct lov_lock_sub {
 	/** sub-lock itself */
-	struct lovsub_lock  *sub_lock;
-	/** An array of per-sub-lock flags, taken from enum lov_sub_flags */
-	unsigned	     sub_flags;
+	struct cl_lock		sub_lock;
+	/** Set if the sublock has ever been enqueued, meaning it may
+	 * hold resources of the underlying layers
+	 */
+	unsigned int		sub_is_enqueued:1,
+				sub_initialized:1;
 	int		  sub_stripe;
-	struct cl_lock_descr sub_descr;
-	struct cl_lock_descr sub_got;
 };
 
 /**
@@ -308,59 +301,8 @@ struct lov_lock {
 	struct cl_lock_slice   lls_cl;
 	/** Number of sub-locks in this lock */
 	int		    lls_nr;
-	/**
-	 * Number of existing sub-locks.
-	 */
-	unsigned	       lls_nr_filled;
-	/**
-	 * Set when sub-lock was canceled, while top-lock was being
-	 * used, or unused.
-	 */
-	unsigned int	       lls_cancel_race:1;
-	/**
-	 * An array of sub-locks
-	 *
-	 * There are two issues with managing sub-locks:
-	 *
-	 *     - sub-locks are concurrently canceled, and
-	 *
-	 *     - sub-locks are shared with other top-locks.
-	 *
-	 * To manage cancellation, top-lock acquires a hold on a sublock
-	 * (lov_sublock_adopt()) when the latter is inserted into
-	 * lov_lock::lls_sub[]. This hold is released (lov_sublock_release())
-	 * when top-lock is going into CLS_CACHED state or destroyed. Hold
-	 * prevents sub-lock from cancellation.
-	 *
-	 * Sub-lock sharing means, among other things, that top-lock that is
-	 * in the process of creation (i.e., not yet inserted into lock list)
-	 * is already accessible to other threads once at least one of its
-	 * sub-locks is created, see lov_lock_sub_init().
-	 *
-	 * Sub-lock can be in one of the following states:
-	 *
-	 *     - doesn't exist, lov_lock::lls_sub[]::sub_lock == NULL. Such
-	 *       sub-lock was either never created (top-lock is in CLS_NEW
-	 *       state), or it was created, then canceled, then destroyed
-	 *       (lov_lock_unlink() cleared sub-lock pointer in the top-lock).
-	 *
-	 *     - sub-lock exists and is on
-	 *       hold. (lov_lock::lls_sub[]::sub_flags & LSF_HELD). This is a
-	 *       normal state of a sub-lock in CLS_HELD and CLS_CACHED states
-	 *       of a top-lock.
-	 *
-	 *     - sub-lock exists, but is not held by the top-lock. This
-	 *       happens after top-lock released a hold on sub-locks before
-	 *       going into cache (lov_lock_unuse()).
-	 *
-	 * \todo To support wide-striping, array has to be replaced with a set
-	 * of queues to avoid scanning.
-	 */
-	struct lov_lock_sub   *lls_sub;
-	/**
-	 * Original description with which lock was enqueued.
-	 */
-	struct cl_lock_descr   lls_orig;
+	/** sublock array */
+	struct lov_lock_sub     lls_sub[0];
 };
 
 struct lov_page {
@@ -445,7 +387,6 @@ struct lov_thread_info {
 	struct ost_lvb	  lti_lvb;
 	struct cl_2queue	lti_cl2q;
 	struct cl_page_list     lti_plist;
-	struct cl_lock_closure  lti_closure;
 	wait_queue_t	  lti_waiter;
 	struct cl_attr          lti_attr;
 };
diff --git a/drivers/staging/lustre/lustre/lov/lov_dev.c b/drivers/staging/lustre/lustre/lov/lov_dev.c
index 532ef87..dccc634 100644
--- a/drivers/staging/lustre/lustre/lov/lov_dev.c
+++ b/drivers/staging/lustre/lustre/lov/lov_dev.c
@@ -143,9 +143,7 @@ static void *lov_key_init(const struct lu_context *ctx,
 	struct lov_thread_info *info;
 
 	info = kmem_cache_zalloc(lov_thread_kmem, GFP_NOFS);
-	if (info)
-		INIT_LIST_HEAD(&info->lti_closure.clc_list);
-	else
+	if (!info)
 		info = ERR_PTR(-ENOMEM);
 	return info;
 }
@@ -155,7 +153,6 @@ static void lov_key_fini(const struct lu_context *ctx,
 {
 	struct lov_thread_info *info = data;
 
-	LINVRNT(list_empty(&info->lti_closure.clc_list));
 	kmem_cache_free(lov_thread_kmem, info);
 }
 
diff --git a/drivers/staging/lustre/lustre/lov/lov_lock.c b/drivers/staging/lustre/lustre/lov/lov_lock.c
index ae854bc..1b203d1 100644
--- a/drivers/staging/lustre/lustre/lov/lov_lock.c
+++ b/drivers/staging/lustre/lustre/lov/lov_lock.c
@@ -46,11 +46,6 @@
  *  @{
  */
 
-static struct cl_lock_closure *lov_closure_get(const struct lu_env *env,
-					       struct cl_lock *parent);
-
-static int lov_lock_unuse(const struct lu_env *env,
-			  const struct cl_lock_slice *slice);
 /*****************************************************************************
  *
  * Lov lock operations.
@@ -58,7 +53,7 @@ static int lov_lock_unuse(const struct lu_env *env,
  */
 
 static struct lov_sublock_env *lov_sublock_env_get(const struct lu_env *env,
-						   struct cl_lock *parent,
+						   const struct cl_lock *parent,
 						   struct lov_lock_sub *lls)
 {
 	struct lov_sublock_env *subenv;
@@ -100,185 +95,26 @@ static void lov_sublock_env_put(struct lov_sublock_env *subenv)
 		lov_sub_put(subenv->lse_sub);
 }
 
-static void lov_sublock_adopt(const struct lu_env *env, struct lov_lock *lck,
-			      struct cl_lock *sublock, int idx,
-			      struct lov_lock_link *link)
+static int lov_sublock_init(const struct lu_env *env,
+			    const struct cl_lock *parent,
+			    struct lov_lock_sub *lls)
 {
-	struct lovsub_lock *lsl;
-	struct cl_lock     *parent = lck->lls_cl.cls_lock;
-	int		 rc;
-
-	LASSERT(cl_lock_is_mutexed(parent));
-	LASSERT(cl_lock_is_mutexed(sublock));
-
-	lsl = cl2sub_lock(sublock);
-	/*
-	 * check that sub-lock doesn't have lock link to this top-lock.
-	 */
-	LASSERT(!lov_lock_link_find(env, lck, lsl));
-	LASSERT(idx < lck->lls_nr);
-
-	lck->lls_sub[idx].sub_lock = lsl;
-	lck->lls_nr_filled++;
-	LASSERT(lck->lls_nr_filled <= lck->lls_nr);
-	list_add_tail(&link->lll_list, &lsl->lss_parents);
-	link->lll_idx = idx;
-	link->lll_super = lck;
-	cl_lock_get(parent);
-	lu_ref_add(&parent->cll_reference, "lov-child", sublock);
-	lck->lls_sub[idx].sub_flags |= LSF_HELD;
-	cl_lock_user_add(env, sublock);
-
-	rc = lov_sublock_modify(env, lck, lsl, &sublock->cll_descr, idx);
-	LASSERT(rc == 0); /* there is no way this can fail, currently */
-}
-
-static struct cl_lock *lov_sublock_alloc(const struct lu_env *env,
-					 const struct cl_io *io,
-					 struct lov_lock *lck,
-					 int idx, struct lov_lock_link **out)
-{
-	struct cl_lock       *sublock;
-	struct cl_lock       *parent;
-	struct lov_lock_link *link;
-
-	LASSERT(idx < lck->lls_nr);
-
-	link = kmem_cache_zalloc(lov_lock_link_kmem, GFP_NOFS);
-	if (link) {
-		struct lov_sublock_env *subenv;
-		struct lov_lock_sub  *lls;
-		struct cl_lock_descr *descr;
-
-		parent = lck->lls_cl.cls_lock;
-		lls    = &lck->lls_sub[idx];
-		descr  = &lls->sub_got;
-
-		subenv = lov_sublock_env_get(env, parent, lls);
-		if (!IS_ERR(subenv)) {
-			/* CAVEAT: Don't try to add a field in lov_lock_sub
-			 * to remember the subio. This is because lock is able
-			 * to be cached, but this is not true for IO. This
-			 * further means a sublock might be referenced in
-			 * different io context. -jay
-			 */
-
-			sublock = cl_lock_hold(subenv->lse_env, subenv->lse_io,
-					       descr, "lov-parent", parent);
-			lov_sublock_env_put(subenv);
-		} else {
-			/* error occurs. */
-			sublock = (void *)subenv;
-		}
-
-		if (!IS_ERR(sublock))
-			*out = link;
-		else
-			kmem_cache_free(lov_lock_link_kmem, link);
-	} else
-		sublock = ERR_PTR(-ENOMEM);
-	return sublock;
-}
-
-static void lov_sublock_unlock(const struct lu_env *env,
-			       struct lovsub_lock *lsl,
-			       struct cl_lock_closure *closure,
-			       struct lov_sublock_env *subenv)
-{
-	lov_sublock_env_put(subenv);
-	lsl->lss_active = NULL;
-	cl_lock_disclosure(env, closure);
-}
-
-static int lov_sublock_lock(const struct lu_env *env,
-			    struct lov_lock *lck,
-			    struct lov_lock_sub *lls,
-			    struct cl_lock_closure *closure,
-			    struct lov_sublock_env **lsep)
-{
-	struct lovsub_lock *sublock;
-	struct cl_lock     *child;
-	int		 result = 0;
-
-	LASSERT(list_empty(&closure->clc_list));
-
-	sublock = lls->sub_lock;
-	child = sublock->lss_cl.cls_lock;
-	result = cl_lock_closure_build(env, child, closure);
-	if (result == 0) {
-		struct cl_lock *parent = closure->clc_origin;
-
-		LASSERT(cl_lock_is_mutexed(child));
-		sublock->lss_active = parent;
-
-		if (unlikely((child->cll_state == CLS_FREEING) ||
-			     (child->cll_flags & CLF_CANCELLED))) {
-			struct lov_lock_link *link;
-			/*
-			 * we could race with lock deletion which temporarily
-			 * put the lock in freeing state, bug 19080.
-			 */
-			LASSERT(!(lls->sub_flags & LSF_HELD));
-
-			link = lov_lock_link_find(env, lck, sublock);
-			LASSERT(link);
-			lov_lock_unlink(env, link, sublock);
-			lov_sublock_unlock(env, sublock, closure, NULL);
-			lck->lls_cancel_race = 1;
-			result = CLO_REPEAT;
-		} else if (lsep) {
-			struct lov_sublock_env *subenv;
+	struct lov_sublock_env *subenv;
+	int result;
 
-			subenv = lov_sublock_env_get(env, parent, lls);
-			if (IS_ERR(subenv)) {
-				lov_sublock_unlock(env, sublock,
-						   closure, NULL);
-				result = PTR_ERR(subenv);
-			} else {
-				*lsep = subenv;
-			}
-		}
+	subenv = lov_sublock_env_get(env, parent, lls);
+	if (!IS_ERR(subenv)) {
+		result = cl_lock_init(subenv->lse_env, &lls->sub_lock,
+				      subenv->lse_io);
+		lov_sublock_env_put(subenv);
+	} else {
+		/* an error occurred */
+		result = PTR_ERR(subenv);
 	}
 	return result;
 }
 
 /**
- * Updates the result of a top-lock operation from a result of sub-lock
- * sub-operations. Top-operations like lov_lock_{enqueue,use,unuse}() iterate
- * over sub-locks and lov_subresult() is used to calculate return value of a
- * top-operation. To this end, possible return values of sub-operations are
- * ordered as
- *
- *     - 0		  success
- *     - CLO_WAIT	   wait for event
- *     - CLO_REPEAT	 repeat top-operation
- *     - -ne		fundamental error
- *
- * Top-level return code can only go down through this list. CLO_REPEAT
- * overwrites CLO_WAIT, because lock mutex was released and sleeping condition
- * has to be rechecked by the upper layer.
- */
-static int lov_subresult(int result, int rc)
-{
-	int result_rank;
-	int rc_rank;
-
-	LASSERTF(result <= 0 || result == CLO_REPEAT || result == CLO_WAIT,
-		 "result = %d\n", result);
-	LASSERTF(rc <= 0 || rc == CLO_REPEAT || rc == CLO_WAIT,
-		 "rc = %d\n", rc);
-	CLASSERT(CLO_WAIT < CLO_REPEAT);
-
-	/* calculate ranks in the ordering above */
-	result_rank = result < 0 ? 1 + CLO_REPEAT : result;
-	rc_rank = rc < 0 ? 1 + CLO_REPEAT : rc;
-
-	if (result_rank < rc_rank)
-		result = rc;
-	return result;
-}
-
-/**
  * Creates sub-locks for a given lov_lock for the first time.
  *
  * Goes through all sub-objects of top-object, and creates sub-locks on every
@@ -286,8 +122,9 @@ static int lov_subresult(int result, int rc)
  * fact that top-lock (that is being created) can be accessed concurrently
  * through already created sub-locks (possibly shared with other top-locks).
  */
-static int lov_lock_sub_init(const struct lu_env *env,
-			     struct lov_lock *lck, const struct cl_io *io)
+static struct lov_lock *lov_lock_sub_init(const struct lu_env *env,
+					  const struct cl_object *obj,
+					  struct cl_lock *lock)
 {
 	int result = 0;
 	int i;
@@ -297,241 +134,86 @@ static int lov_lock_sub_init(const struct lu_env *env,
 	u64 file_start;
 	u64 file_end;
 
-	struct lov_object       *loo    = cl2lov(lck->lls_cl.cls_obj);
+	struct lov_object       *loo    = cl2lov(obj);
 	struct lov_layout_raid0 *r0     = lov_r0(loo);
-	struct cl_lock	  *parent = lck->lls_cl.cls_lock;
+	struct lov_lock		*lovlck;
 
-	lck->lls_orig = parent->cll_descr;
-	file_start = cl_offset(lov2cl(loo), parent->cll_descr.cld_start);
-	file_end   = cl_offset(lov2cl(loo), parent->cll_descr.cld_end + 1) - 1;
+	file_start = cl_offset(lov2cl(loo), lock->cll_descr.cld_start);
+	file_end   = cl_offset(lov2cl(loo), lock->cll_descr.cld_end + 1) - 1;
 
 	for (i = 0, nr = 0; i < r0->lo_nr; i++) {
 		/*
 		 * XXX for wide striping smarter algorithm is desirable,
 		 * breaking out of the loop, early.
 		 */
-		if (likely(r0->lo_sub[i]) &&
+		if (likely(r0->lo_sub[i]) && /* sparse layout */
 		    lov_stripe_intersects(loo->lo_lsm, i,
 					  file_start, file_end, &start, &end))
 			nr++;
 	}
 	LASSERT(nr > 0);
-	lck->lls_sub = libcfs_kvzalloc(nr * sizeof(lck->lls_sub[0]), GFP_NOFS);
-	if (!lck->lls_sub)
-		return -ENOMEM;
+	lovlck = libcfs_kvzalloc(offsetof(struct lov_lock, lls_sub[nr]),
+				 GFP_NOFS);
+	if (!lovlck)
+		return ERR_PTR(-ENOMEM);
 
-	lck->lls_nr = nr;
-	/*
-	 * First, fill in sub-lock descriptions in
-	 * lck->lls_sub[].sub_descr. They are used by lov_sublock_alloc()
-	 * (called below in this function, and by lov_lock_enqueue()) to
-	 * create sub-locks. At this moment, no other thread can access
-	 * top-lock.
-	 */
+	lovlck->lls_nr = nr;
 	for (i = 0, nr = 0; i < r0->lo_nr; ++i) {
 		if (likely(r0->lo_sub[i]) &&
 		    lov_stripe_intersects(loo->lo_lsm, i,
 					  file_start, file_end, &start, &end)) {
+			struct lov_lock_sub *lls = &lovlck->lls_sub[nr];
 			struct cl_lock_descr *descr;
 
-			descr = &lck->lls_sub[nr].sub_descr;
+			descr = &lls->sub_lock.cll_descr;
 
 			LASSERT(!descr->cld_obj);
 			descr->cld_obj   = lovsub2cl(r0->lo_sub[i]);
 			descr->cld_start = cl_index(descr->cld_obj, start);
 			descr->cld_end   = cl_index(descr->cld_obj, end);
-			descr->cld_mode  = parent->cll_descr.cld_mode;
-			descr->cld_gid   = parent->cll_descr.cld_gid;
-			descr->cld_enq_flags   = parent->cll_descr.cld_enq_flags;
-			/* XXX has no effect */
-			lck->lls_sub[nr].sub_got = *descr;
-			lck->lls_sub[nr].sub_stripe = i;
+			descr->cld_mode  = lock->cll_descr.cld_mode;
+			descr->cld_gid   = lock->cll_descr.cld_gid;
+			descr->cld_enq_flags = lock->cll_descr.cld_enq_flags;
+			lls->sub_stripe = i;
+
+			/* initialize sub-lock */
+			result = lov_sublock_init(env, lock, lls);
+			if (result < 0)
+				break;
+
+			lls->sub_initialized = 1;
 			nr++;
 		}
 	}
-	LASSERT(nr == lck->lls_nr);
-
-	/*
-	 * Some sub-locks can be missing at this point. This is not a problem,
-	 * because enqueue will create them anyway. Main duty of this function
-	 * is to fill in sub-lock descriptions in a race free manner.
-	 */
-	return result;
-}
+	LASSERT(ergo(result == 0, nr == lovlck->lls_nr));
 
-static int lov_sublock_release(const struct lu_env *env, struct lov_lock *lck,
-			       int i, int deluser, int rc)
-{
-	struct cl_lock *parent = lck->lls_cl.cls_lock;
-
-	LASSERT(cl_lock_is_mutexed(parent));
-
-	if (lck->lls_sub[i].sub_flags & LSF_HELD) {
-		struct cl_lock    *sublock;
-		int dying;
-
-		sublock = lck->lls_sub[i].sub_lock->lss_cl.cls_lock;
-		LASSERT(cl_lock_is_mutexed(sublock));
+	if (result != 0) {
+		for (i = 0; i < nr; ++i) {
+			if (!lovlck->lls_sub[i].sub_initialized)
+				break;
 
-		lck->lls_sub[i].sub_flags &= ~LSF_HELD;
-		if (deluser)
-			cl_lock_user_del(env, sublock);
-		/*
-		 * If the last hold is released, and cancellation is pending
-		 * for a sub-lock, release parent mutex, to avoid keeping it
-		 * while sub-lock is being paged out.
-		 */
-		dying = (sublock->cll_descr.cld_mode == CLM_PHANTOM ||
-			 sublock->cll_descr.cld_mode == CLM_GROUP ||
-			 (sublock->cll_flags & (CLF_CANCELPEND|CLF_DOOMED))) &&
-			sublock->cll_holds == 1;
-		if (dying)
-			cl_lock_mutex_put(env, parent);
-		cl_lock_unhold(env, sublock, "lov-parent", parent);
-		if (dying) {
-			cl_lock_mutex_get(env, parent);
-			rc = lov_subresult(rc, CLO_REPEAT);
+			cl_lock_fini(env, &lovlck->lls_sub[i].sub_lock);
 		}
-		/*
-		 * From now on lck->lls_sub[i].sub_lock is a "weak" pointer,
-		 * not backed by a reference on a
-		 * sub-lock. lovsub_lock_delete() will clear
-		 * lck->lls_sub[i].sub_lock under semaphores, just before
-		 * sub-lock is destroyed.
-		 */
+		kvfree(lovlck);
+		lovlck = ERR_PTR(result);
 	}
-	return rc;
-}
-
-static void lov_sublock_hold(const struct lu_env *env, struct lov_lock *lck,
-			     int i)
-{
-	struct cl_lock *parent = lck->lls_cl.cls_lock;
-
-	LASSERT(cl_lock_is_mutexed(parent));
-
-	if (!(lck->lls_sub[i].sub_flags & LSF_HELD)) {
-		struct cl_lock *sublock;
-
-		sublock = lck->lls_sub[i].sub_lock->lss_cl.cls_lock;
-		LASSERT(cl_lock_is_mutexed(sublock));
-		LASSERT(sublock->cll_state != CLS_FREEING);
 
-		lck->lls_sub[i].sub_flags |= LSF_HELD;
-
-		cl_lock_get_trust(sublock);
-		cl_lock_hold_add(env, sublock, "lov-parent", parent);
-		cl_lock_user_add(env, sublock);
-		cl_lock_put(env, sublock);
-	}
+	return lovlck;
 }
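
The allocation above sizes the whole lov_lock, including its per-stripe
lls_sub[] array, in a single libcfs_kvzalloc() call using the
offsetof(struct, member[nr]) idiom, and the error path and
lov_lock_fini() release it with a single kvfree(). A minimal,
self-contained sketch of the same idiom (illustrative names, plain
kzalloc() instead of the libcfs wrapper):

	#include <linux/err.h>
	#include <linux/slab.h>
	#include <linux/stddef.h>

	struct demo_sub {
		int id;
	};

	struct demo_lock {
		unsigned int nr;
		struct demo_sub subs[];	/* flexible array member */
	};

	static struct demo_lock *demo_lock_alloc(unsigned int nr)
	{
		struct demo_lock *dl;

		/* one allocation covers the header and all nr elements */
		dl = kzalloc(offsetof(struct demo_lock, subs[nr]), GFP_NOFS);
		if (!dl)
			return ERR_PTR(-ENOMEM);
		dl->nr = nr;
		return dl;
	}

This is also why the error path above unwinds only entries whose
sub_initialized flag is set: a failure part-way through leaves the later
slots zeroed, and cl_lock_fini() must not run on those.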
 
 static void lov_lock_fini(const struct lu_env *env,
 			  struct cl_lock_slice *slice)
 {
-	struct lov_lock *lck;
+	struct lov_lock *lovlck;
 	int i;
 
-	lck = cl2lov_lock(slice);
-	LASSERT(lck->lls_nr_filled == 0);
-	if (lck->lls_sub) {
-		for (i = 0; i < lck->lls_nr; ++i)
-			/*
-			 * No sub-locks exists at this point, as sub-lock has
-			 * a reference on its parent.
-			 */
-			LASSERT(!lck->lls_sub[i].sub_lock);
-		kvfree(lck->lls_sub);
+	lovlck = cl2lov_lock(slice);
+	for (i = 0; i < lovlck->lls_nr; ++i) {
+		LASSERT(!lovlck->lls_sub[i].sub_is_enqueued);
+		if (lovlck->lls_sub[i].sub_initialized)
+			cl_lock_fini(env, &lovlck->lls_sub[i].sub_lock);
 	}
-	kmem_cache_free(lov_lock_kmem, lck);
-}
-
-static int lov_lock_enqueue_wait(const struct lu_env *env,
-				 struct lov_lock *lck,
-				 struct cl_lock *sublock)
-{
-	struct cl_lock *lock = lck->lls_cl.cls_lock;
-	int	     result;
-
-	LASSERT(cl_lock_is_mutexed(lock));
-
-	cl_lock_mutex_put(env, lock);
-	result = cl_lock_enqueue_wait(env, sublock, 0);
-	cl_lock_mutex_get(env, lock);
-	return result ?: CLO_REPEAT;
-}
-
-/**
- * Tries to advance a state machine of a given sub-lock toward enqueuing of
- * the top-lock.
- *
- * \retval 0 if state-transition can proceed
- * \retval -ve otherwise.
- */
-static int lov_lock_enqueue_one(const struct lu_env *env, struct lov_lock *lck,
-				struct cl_lock *sublock,
-				struct cl_io *io, __u32 enqflags, int last)
-{
-	int result;
-
-	/* first, try to enqueue a sub-lock ... */
-	result = cl_enqueue_try(env, sublock, io, enqflags);
-	if ((sublock->cll_state == CLS_ENQUEUED) && !(enqflags & CEF_AGL)) {
-		/* if it is enqueued, try to `wait' on it---maybe it's already
-		 * granted
-		 */
-		result = cl_wait_try(env, sublock);
-		if (result == CLO_REENQUEUED)
-			result = CLO_WAIT;
-	}
-	/*
-	 * If CEF_ASYNC flag is set, then all sub-locks can be enqueued in
-	 * parallel, otherwise---enqueue has to wait until sub-lock is granted
-	 * before proceeding to the next one.
-	 */
-	if ((result == CLO_WAIT) && (sublock->cll_state <= CLS_HELD) &&
-	    (enqflags & CEF_ASYNC) && (!last || (enqflags & CEF_AGL)))
-		result = 0;
-	return result;
-}
-
-/**
- * Helper function for lov_lock_enqueue() that creates missing sub-lock.
- */
-static int lov_sublock_fill(const struct lu_env *env, struct cl_lock *parent,
-			    struct cl_io *io, struct lov_lock *lck, int idx)
-{
-	struct lov_lock_link *link = NULL;
-	struct cl_lock       *sublock;
-	int		   result;
-
-	LASSERT(parent->cll_depth == 1);
-	cl_lock_mutex_put(env, parent);
-	sublock = lov_sublock_alloc(env, io, lck, idx, &link);
-	if (!IS_ERR(sublock))
-		cl_lock_mutex_get(env, sublock);
-	cl_lock_mutex_get(env, parent);
-
-	if (!IS_ERR(sublock)) {
-		cl_lock_get_trust(sublock);
-		if (parent->cll_state == CLS_QUEUING &&
-		    !lck->lls_sub[idx].sub_lock) {
-			lov_sublock_adopt(env, lck, sublock, idx, link);
-		} else {
-			kmem_cache_free(lov_lock_link_kmem, link);
-			/* other thread allocated sub-lock, or enqueue is no
-			 * longer going on
-			 */
-			cl_lock_mutex_put(env, parent);
-			cl_lock_unhold(env, sublock, "lov-parent", parent);
-			cl_lock_mutex_get(env, parent);
-		}
-		cl_lock_mutex_put(env, sublock);
-		cl_lock_put(env, sublock);
-		result = CLO_REPEAT;
-	} else
-		result = PTR_ERR(sublock);
-	return result;
+	kvfree(lovlck);
 }
 
 /**
@@ -543,529 +225,59 @@ static int lov_sublock_fill(const struct lu_env *env, struct cl_lock *parent,
  */
 static int lov_lock_enqueue(const struct lu_env *env,
 			    const struct cl_lock_slice *slice,
-			    struct cl_io *io, __u32 enqflags)
+			    struct cl_io *io, struct cl_sync_io *anchor)
 {
-	struct cl_lock	 *lock    = slice->cls_lock;
-	struct lov_lock	*lck     = cl2lov_lock(slice);
-	struct cl_lock_closure *closure = lov_closure_get(env, lock);
+	struct cl_lock *lock = slice->cls_lock;
+	struct lov_lock *lovlck = cl2lov_lock(slice);
 	int i;
-	int result;
-	enum cl_lock_state minstate;
+	int rc = 0;
 
-	for (result = 0, minstate = CLS_FREEING, i = 0; i < lck->lls_nr; ++i) {
-		int rc;
-		struct lovsub_lock     *sub;
-		struct lov_lock_sub    *lls;
-		struct cl_lock	 *sublock;
+	for (i = 0; i < lovlck->lls_nr; ++i) {
+		struct lov_lock_sub  *lls = &lovlck->lls_sub[i];
 		struct lov_sublock_env *subenv;
 
-		if (lock->cll_state != CLS_QUEUING) {
-			/*
-			 * Lock might have left QUEUING state if previous
-			 * iteration released its mutex. Stop enqueing in this
-			 * case and let the upper layer to decide what to do.
-			 */
-			LASSERT(i > 0 && result != 0);
-			break;
-		}
-
-		lls = &lck->lls_sub[i];
-		sub = lls->sub_lock;
-		/*
-		 * Sub-lock might have been canceled, while top-lock was
-		 * cached.
-		 */
-		if (!sub) {
-			result = lov_sublock_fill(env, lock, io, lck, i);
-			/* lov_sublock_fill() released @lock mutex,
-			 * restart.
-			 */
+		subenv = lov_sublock_env_get(env, lock, lls);
+		if (IS_ERR(subenv)) {
+			rc = PTR_ERR(subenv);
 			break;
 		}
-		sublock = sub->lss_cl.cls_lock;
-		rc = lov_sublock_lock(env, lck, lls, closure, &subenv);
-		if (rc == 0) {
-			lov_sublock_hold(env, lck, i);
-			rc = lov_lock_enqueue_one(subenv->lse_env, lck, sublock,
-						  subenv->lse_io, enqflags,
-						  i == lck->lls_nr - 1);
-			minstate = min(minstate, sublock->cll_state);
-			if (rc == CLO_WAIT) {
-				switch (sublock->cll_state) {
-				case CLS_QUEUING:
-					/* take recursive mutex, the lock is
-					 * released in lov_lock_enqueue_wait.
-					 */
-					cl_lock_mutex_get(env, sublock);
-					lov_sublock_unlock(env, sub, closure,
-							   subenv);
-					rc = lov_lock_enqueue_wait(env, lck,
-								   sublock);
-					break;
-				case CLS_CACHED:
-					cl_lock_get(sublock);
-					/* take recursive mutex of sublock */
-					cl_lock_mutex_get(env, sublock);
-					/* need to release all locks in closure
-					 * otherwise it may deadlock. LU-2683.
-					 */
-					lov_sublock_unlock(env, sub, closure,
-							   subenv);
-					/* sublock and parent are held. */
-					rc = lov_sublock_release(env, lck, i,
-								 1, rc);
-					cl_lock_mutex_put(env, sublock);
-					cl_lock_put(env, sublock);
-					break;
-				default:
-					lov_sublock_unlock(env, sub, closure,
-							   subenv);
-					break;
-				}
-			} else {
-				LASSERT(!sublock->cll_conflict);
-				lov_sublock_unlock(env, sub, closure, subenv);
-			}
-		}
-		result = lov_subresult(result, rc);
-		if (result != 0)
+		rc = cl_lock_enqueue(subenv->lse_env, subenv->lse_io,
+				     &lls->sub_lock, anchor);
+		lov_sublock_env_put(subenv);
+		if (rc != 0)
 			break;
-	}
-	cl_lock_closure_fini(closure);
-	return result ?: minstate >= CLS_ENQUEUED ? 0 : CLO_WAIT;
-}
-
-static int lov_lock_unuse(const struct lu_env *env,
-			  const struct cl_lock_slice *slice)
-{
-	struct lov_lock	*lck     = cl2lov_lock(slice);
-	struct cl_lock_closure *closure = lov_closure_get(env, slice->cls_lock);
-	int i;
-	int result;
-
-	for (result = 0, i = 0; i < lck->lls_nr; ++i) {
-		int rc;
-		struct lovsub_lock     *sub;
-		struct cl_lock	 *sublock;
-		struct lov_lock_sub    *lls;
-		struct lov_sublock_env *subenv;
 
-		/* top-lock state cannot change concurrently, because single
-		 * thread (one that released the last hold) carries unlocking
-		 * to the completion.
-		 */
-		LASSERT(slice->cls_lock->cll_state == CLS_INTRANSIT);
-		lls = &lck->lls_sub[i];
-		sub = lls->sub_lock;
-		if (!sub)
-			continue;
-
-		sublock = sub->lss_cl.cls_lock;
-		rc = lov_sublock_lock(env, lck, lls, closure, &subenv);
-		if (rc == 0) {
-			if (lls->sub_flags & LSF_HELD) {
-				LASSERT(sublock->cll_state == CLS_HELD ||
-					sublock->cll_state == CLS_ENQUEUED);
-				rc = cl_unuse_try(subenv->lse_env, sublock);
-				rc = lov_sublock_release(env, lck, i, 0, rc);
-			}
-			lov_sublock_unlock(env, sub, closure, subenv);
-		}
-		result = lov_subresult(result, rc);
+		lls->sub_is_enqueued = 1;
 	}
-
-	if (result == 0 && lck->lls_cancel_race) {
-		lck->lls_cancel_race = 0;
-		result = -ESTALE;
-	}
-	cl_lock_closure_fini(closure);
-	return result;
+	return rc;
 }
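
With the cl_lock state machine gone, enqueue is fire-and-collect: every
sub-lock is enqueued against the cl_sync_io anchor supplied by the
caller, and completion is reported through the anchor rather than
through per-lock state transitions. A hedged sketch of the intended
calling pattern, with signatures as introduced by the "generalize
cl_sync_io" patch earlier in this series (details may differ from the
final code):

	struct cl_sync_io anchor;
	int rc, rc2;

	/* one reference held across the enqueue itself */
	cl_sync_io_init(&anchor, 1, cl_sync_io_end);

	rc = cl_lock_enqueue(env, io, lock, &anchor);

	/* drop the initial reference, then sleep until every layer
	 * that pinned the anchor has called cl_sync_io_note()
	 */
	cl_sync_io_note(env, &anchor, 0);
	rc2 = cl_sync_io_wait(env, &anchor, 0);
	if (rc == 0)
		rc = rc2;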
 
 static void lov_lock_cancel(const struct lu_env *env,
 			    const struct cl_lock_slice *slice)
 {
-	struct lov_lock	*lck     = cl2lov_lock(slice);
-	struct cl_lock_closure *closure = lov_closure_get(env, slice->cls_lock);
+	struct cl_lock *lock = slice->cls_lock;
+	struct lov_lock *lovlck = cl2lov_lock(slice);
 	int i;
-	int result;
 
-	for (result = 0, i = 0; i < lck->lls_nr; ++i) {
-		int rc;
-		struct lovsub_lock     *sub;
-		struct cl_lock	 *sublock;
-		struct lov_lock_sub    *lls;
+	for (i = 0; i < lovlck->lls_nr; ++i) {
+		struct lov_lock_sub *lls = &lovlck->lls_sub[i];
+		struct cl_lock *sublock = &lls->sub_lock;
 		struct lov_sublock_env *subenv;
 
-		/* top-lock state cannot change concurrently, because single
-		 * thread (one that released the last hold) carries unlocking
-		 * to the completion.
-		 */
-		lls = &lck->lls_sub[i];
-		sub = lls->sub_lock;
-		if (!sub)
-			continue;
-
-		sublock = sub->lss_cl.cls_lock;
-		rc = lov_sublock_lock(env, lck, lls, closure, &subenv);
-		if (rc == 0) {
-			if (!(lls->sub_flags & LSF_HELD)) {
-				lov_sublock_unlock(env, sub, closure, subenv);
-				continue;
-			}
-
-			switch (sublock->cll_state) {
-			case CLS_HELD:
-				rc = cl_unuse_try(subenv->lse_env, sublock);
-				lov_sublock_release(env, lck, i, 0, 0);
-				break;
-			default:
-				lov_sublock_release(env, lck, i, 1, 0);
-				break;
-			}
-			lov_sublock_unlock(env, sub, closure, subenv);
-		}
-
-		if (rc == CLO_REPEAT) {
-			--i;
-			continue;
-		}
-
-		result = lov_subresult(result, rc);
-	}
-
-	if (result)
-		CL_LOCK_DEBUG(D_ERROR, env, slice->cls_lock,
-			      "lov_lock_cancel fails with %d.\n", result);
-
-	cl_lock_closure_fini(closure);
-}
-
-static int lov_lock_wait(const struct lu_env *env,
-			 const struct cl_lock_slice *slice)
-{
-	struct lov_lock	*lck     = cl2lov_lock(slice);
-	struct cl_lock_closure *closure = lov_closure_get(env, slice->cls_lock);
-	enum cl_lock_state      minstate;
-	int		     reenqueued;
-	int		     result;
-	int		     i;
-
-again:
-	for (result = 0, minstate = CLS_FREEING, i = 0, reenqueued = 0;
-	     i < lck->lls_nr; ++i) {
-		int rc;
-		struct lovsub_lock     *sub;
-		struct cl_lock	 *sublock;
-		struct lov_lock_sub    *lls;
-		struct lov_sublock_env *subenv;
-
-		lls = &lck->lls_sub[i];
-		sub = lls->sub_lock;
-		sublock = sub->lss_cl.cls_lock;
-		rc = lov_sublock_lock(env, lck, lls, closure, &subenv);
-		if (rc == 0) {
-			LASSERT(sublock->cll_state >= CLS_ENQUEUED);
-			if (sublock->cll_state < CLS_HELD)
-				rc = cl_wait_try(env, sublock);
-
-			minstate = min(minstate, sublock->cll_state);
-			lov_sublock_unlock(env, sub, closure, subenv);
-		}
-		if (rc == CLO_REENQUEUED) {
-			reenqueued++;
-			rc = 0;
-		}
-		result = lov_subresult(result, rc);
-		if (result != 0)
-			break;
-	}
-	/* Each sublock only can be reenqueued once, so will not loop
-	 * forever.
-	 */
-	if (result == 0 && reenqueued != 0)
-		goto again;
-	cl_lock_closure_fini(closure);
-	return result ?: minstate >= CLS_HELD ? 0 : CLO_WAIT;
-}
-
-static int lov_lock_use(const struct lu_env *env,
-			const struct cl_lock_slice *slice)
-{
-	struct lov_lock	*lck     = cl2lov_lock(slice);
-	struct cl_lock_closure *closure = lov_closure_get(env, slice->cls_lock);
-	int		     result;
-	int		     i;
-
-	LASSERT(slice->cls_lock->cll_state == CLS_INTRANSIT);
-
-	for (result = 0, i = 0; i < lck->lls_nr; ++i) {
-		int rc;
-		struct lovsub_lock     *sub;
-		struct cl_lock	 *sublock;
-		struct lov_lock_sub    *lls;
-		struct lov_sublock_env *subenv;
-
-		LASSERT(slice->cls_lock->cll_state == CLS_INTRANSIT);
-
-		lls = &lck->lls_sub[i];
-		sub = lls->sub_lock;
-		if (!sub) {
-			/*
-			 * Sub-lock might have been canceled, while top-lock was
-			 * cached.
-			 */
-			result = -ESTALE;
-			break;
-		}
-
-		sublock = sub->lss_cl.cls_lock;
-		rc = lov_sublock_lock(env, lck, lls, closure, &subenv);
-		if (rc == 0) {
-			LASSERT(sublock->cll_state != CLS_FREEING);
-			lov_sublock_hold(env, lck, i);
-			if (sublock->cll_state == CLS_CACHED) {
-				rc = cl_use_try(subenv->lse_env, sublock, 0);
-				if (rc != 0)
-					rc = lov_sublock_release(env, lck,
-								 i, 1, rc);
-			} else if (sublock->cll_state == CLS_NEW) {
-				/* Sub-lock might have been canceled, while
-				 * top-lock was cached.
-				 */
-				result = -ESTALE;
-				lov_sublock_release(env, lck, i, 1, result);
-			}
-			lov_sublock_unlock(env, sub, closure, subenv);
-		}
-		result = lov_subresult(result, rc);
-		if (result != 0)
-			break;
-	}
-
-	if (lck->lls_cancel_race) {
-		/*
-		 * If there is unlocking happened at the same time, then
-		 * sublock_lock state should be FREEING, and lov_sublock_lock
-		 * should return CLO_REPEAT. In this case, it should return
-		 * ESTALE, and up layer should reset the lock state to be NEW.
-		 */
-		lck->lls_cancel_race = 0;
-		LASSERT(result != 0);
-		result = -ESTALE;
-	}
-	cl_lock_closure_fini(closure);
-	return result;
-}
-
-/**
- * Check if the extent region \a descr is covered by \a child against the
- * specific \a stripe.
- */
-static int lov_lock_stripe_is_matching(const struct lu_env *env,
-				       struct lov_object *lov, int stripe,
-				       const struct cl_lock_descr *child,
-				       const struct cl_lock_descr *descr)
-{
-	struct lov_stripe_md *lsm = lov->lo_lsm;
-	u64 start;
-	u64 end;
-	int result;
-
-	if (lov_r0(lov)->lo_nr == 1)
-		return cl_lock_ext_match(child, descr);
-
-	/*
-	 * For a multi-stripes object:
-	 * - make sure the descr only covers child's stripe, and
-	 * - check if extent is matching.
-	 */
-	start = cl_offset(&lov->lo_cl, descr->cld_start);
-	end   = cl_offset(&lov->lo_cl, descr->cld_end + 1) - 1;
-	result = 0;
-	/* glimpse should work on the object with LOV EA hole. */
-	if (end - start <= lsm->lsm_stripe_size) {
-		int idx;
-
-		idx = lov_stripe_number(lsm, start);
-		if (idx == stripe ||
-		    unlikely(!lov_r0(lov)->lo_sub[idx])) {
-			idx = lov_stripe_number(lsm, end);
-			if (idx == stripe ||
-			    unlikely(!lov_r0(lov)->lo_sub[idx]))
-				result = 1;
-		}
-	}
-
-	if (result != 0) {
-		struct cl_lock_descr *subd = &lov_env_info(env)->lti_ldescr;
-		u64 sub_start;
-		u64 sub_end;
-
-		subd->cld_obj  = NULL;   /* don't need sub object at all */
-		subd->cld_mode = descr->cld_mode;
-		subd->cld_gid  = descr->cld_gid;
-		result = lov_stripe_intersects(lsm, stripe, start, end,
-					       &sub_start, &sub_end);
-		LASSERT(result);
-		subd->cld_start = cl_index(child->cld_obj, sub_start);
-		subd->cld_end   = cl_index(child->cld_obj, sub_end);
-		result = cl_lock_ext_match(child, subd);
-	}
-	return result;
-}
-
-/**
- * An implementation of cl_lock_operations::clo_fits_into() method.
- *
- * Checks whether a lock (given by \a slice) is suitable for \a
- * io. Multi-stripe locks can be used only for "quick" io, like truncate, or
- * O_APPEND write.
- *
- * \see ccc_lock_fits_into().
- */
-static int lov_lock_fits_into(const struct lu_env *env,
-			      const struct cl_lock_slice *slice,
-			      const struct cl_lock_descr *need,
-			      const struct cl_io *io)
-{
-	struct lov_lock   *lov = cl2lov_lock(slice);
-	struct lov_object *obj = cl2lov(slice->cls_obj);
-	int result;
-
-	LASSERT(cl_object_same(need->cld_obj, slice->cls_obj));
-	LASSERT(lov->lls_nr > 0);
-
-	/* for top lock, it's necessary to match enq flags otherwise it will
-	 * run into problem if a sublock is missing and reenqueue.
-	 */
-	if (need->cld_enq_flags != lov->lls_orig.cld_enq_flags)
-		return 0;
-
-	if (need->cld_mode == CLM_GROUP)
-		/*
-		 * always allow to match group lock.
-		 */
-		result = cl_lock_ext_match(&lov->lls_orig, need);
-	else if (lov->lls_nr == 1) {
-		struct cl_lock_descr *got = &lov->lls_sub[0].sub_got;
-
-		result = lov_lock_stripe_is_matching(env,
-						     cl2lov(slice->cls_obj),
-						     lov->lls_sub[0].sub_stripe,
-						     got, need);
-	} else if (io->ci_type != CIT_SETATTR && io->ci_type != CIT_MISC &&
-		   !cl_io_is_append(io) && need->cld_mode != CLM_PHANTOM)
-		/*
-		 * Multi-stripe locks are only suitable for `quick' IO and for
-		 * glimpse.
-		 */
-		result = 0;
-	else
-		/*
-		 * Most general case: multi-stripe existing lock, and
-		 * (potentially) multi-stripe @need lock. Check that @need is
-		 * covered by @lov's sub-locks.
-		 *
-		 * For now, ignore lock expansions made by the server, and
-		 * match against original lock extent.
-		 */
-		result = cl_lock_ext_match(&lov->lls_orig, need);
-	CDEBUG(D_DLMTRACE, DDESCR"/"DDESCR" %d %d/%d: %d\n",
-	       PDESCR(&lov->lls_orig), PDESCR(&lov->lls_sub[0].sub_got),
-	       lov->lls_sub[0].sub_stripe, lov->lls_nr, lov_r0(obj)->lo_nr,
-	       result);
-	return result;
-}
-
-void lov_lock_unlink(const struct lu_env *env,
-		     struct lov_lock_link *link, struct lovsub_lock *sub)
-{
-	struct lov_lock *lck    = link->lll_super;
-	struct cl_lock  *parent = lck->lls_cl.cls_lock;
-
-	LASSERT(cl_lock_is_mutexed(parent));
-	LASSERT(cl_lock_is_mutexed(sub->lss_cl.cls_lock));
-
-	list_del_init(&link->lll_list);
-	LASSERT(lck->lls_sub[link->lll_idx].sub_lock == sub);
-	/* yank this sub-lock from parent's array */
-	lck->lls_sub[link->lll_idx].sub_lock = NULL;
-	LASSERT(lck->lls_nr_filled > 0);
-	lck->lls_nr_filled--;
-	lu_ref_del(&parent->cll_reference, "lov-child", sub->lss_cl.cls_lock);
-	cl_lock_put(env, parent);
-	kmem_cache_free(lov_lock_link_kmem, link);
-}
-
-struct lov_lock_link *lov_lock_link_find(const struct lu_env *env,
-					 struct lov_lock *lck,
-					 struct lovsub_lock *sub)
-{
-	struct lov_lock_link *scan;
-
-	LASSERT(cl_lock_is_mutexed(sub->lss_cl.cls_lock));
-
-	list_for_each_entry(scan, &sub->lss_parents, lll_list) {
-		if (scan->lll_super == lck)
-			return scan;
-	}
-	return NULL;
-}
-
-/**
- * An implementation of cl_lock_operations::clo_delete() method. This is
- * invoked for "top-to-bottom" delete, when lock destruction starts from the
- * top-lock, e.g., as a result of inode destruction.
- *
- * Unlinks top-lock from all its sub-locks. Sub-locks are not deleted there:
- * this is done separately elsewhere:
- *
- *     - for inode destruction, lov_object_delete() calls cl_object_kill() for
- *       each sub-object, purging its locks;
- *
- *     - in other cases (e.g., a fatal error with a top-lock) sub-locks are
- *       left in the cache.
- */
-static void lov_lock_delete(const struct lu_env *env,
-			    const struct cl_lock_slice *slice)
-{
-	struct lov_lock	*lck     = cl2lov_lock(slice);
-	struct cl_lock_closure *closure = lov_closure_get(env, slice->cls_lock);
-	struct lov_lock_link   *link;
-	int		     rc;
-	int		     i;
-
-	LASSERT(slice->cls_lock->cll_state == CLS_FREEING);
-
-	for (i = 0; i < lck->lls_nr; ++i) {
-		struct lov_lock_sub *lls = &lck->lls_sub[i];
-		struct lovsub_lock  *lsl = lls->sub_lock;
-
-		if (!lsl) /* already removed */
+		if (!lls->sub_is_enqueued)
 			continue;
 
-		rc = lov_sublock_lock(env, lck, lls, closure, NULL);
-		if (rc == CLO_REPEAT) {
-			--i;
-			continue;
+		lls->sub_is_enqueued = 0;
+		subenv = lov_sublock_env_get(env, lock, lls);
+		if (!IS_ERR(subenv)) {
+			cl_lock_cancel(subenv->lse_env, sublock);
+			lov_sublock_env_put(subenv);
+		} else {
+			CL_LOCK_DEBUG(D_ERROR, env, slice->cls_lock,
+				      "lov_lock_cancel fails with %ld.\n",
+				      PTR_ERR(subenv));
 		}
-
-		LASSERT(rc == 0);
-		LASSERT(lsl->lss_cl.cls_lock->cll_state < CLS_FREEING);
-
-		if (lls->sub_flags & LSF_HELD)
-			lov_sublock_release(env, lck, i, 1, 0);
-
-		link = lov_lock_link_find(env, lck, lsl);
-		LASSERT(link);
-		lov_lock_unlink(env, link, lsl);
-		LASSERT(!lck->lls_sub[i].sub_lock);
-
-		lov_sublock_unlock(env, lsl, closure, NULL);
 	}
-
-	cl_lock_closure_fini(closure);
 }
 
 static int lov_lock_print(const struct lu_env *env, void *cookie,
@@ -1079,12 +291,8 @@ static int lov_lock_print(const struct lu_env *env, void *cookie,
 		struct lov_lock_sub *sub;
 
 		sub = &lck->lls_sub[i];
-		(*p)(env, cookie, "    %d %x: ", i, sub->sub_flags);
-		if (sub->sub_lock)
-			cl_lock_print(env, cookie, p,
-				      sub->sub_lock->lss_cl.cls_lock);
-		else
-			(*p)(env, cookie, "---\n");
+		(*p)(env, cookie, "    %d %x: ", i, sub->sub_is_enqueued);
+		cl_lock_print(env, cookie, p, &sub->sub_lock);
 	}
 	return 0;
 }
@@ -1092,12 +300,7 @@ static int lov_lock_print(const struct lu_env *env, void *cookie,
 static const struct cl_lock_operations lov_lock_ops = {
 	.clo_fini      = lov_lock_fini,
 	.clo_enqueue   = lov_lock_enqueue,
-	.clo_wait      = lov_lock_wait,
-	.clo_use       = lov_lock_use,
-	.clo_unuse     = lov_lock_unuse,
 	.clo_cancel    = lov_lock_cancel,
-	.clo_fits_into = lov_lock_fits_into,
-	.clo_delete    = lov_lock_delete,
 	.clo_print     = lov_lock_print
 };
 
@@ -1105,14 +308,13 @@ int lov_lock_init_raid0(const struct lu_env *env, struct cl_object *obj,
 			struct cl_lock *lock, const struct cl_io *io)
 {
 	struct lov_lock *lck;
-	int result;
+	int result = 0;
 
-	lck = kmem_cache_zalloc(lov_lock_kmem, GFP_NOFS);
-	if (lck) {
+	lck = lov_lock_sub_init(env, obj, lock);
+	if (!IS_ERR(lck))
 		cl_lock_slice_add(lock, &lck->lls_cl, obj, &lov_lock_ops);
-		result = lov_lock_sub_init(env, lck, io);
-	} else
-		result = -ENOMEM;
+	else
+		result = PTR_ERR(lck);
 	return result;
 }
 
@@ -1147,21 +349,9 @@ int lov_lock_init_empty(const struct lu_env *env, struct cl_object *obj,
 	lck = kmem_cache_zalloc(lov_lock_kmem, GFP_NOFS);
 	if (lck) {
 		cl_lock_slice_add(lock, &lck->lls_cl, obj, &lov_empty_lock_ops);
-		lck->lls_orig = lock->cll_descr;
 		result = 0;
 	}
 	return result;
 }
 
-static struct cl_lock_closure *lov_closure_get(const struct lu_env *env,
-					       struct cl_lock *parent)
-{
-	struct cl_lock_closure *closure;
-
-	closure = &lov_env_info(env)->lti_closure;
-	LASSERT(list_empty(&closure->clc_list));
-	cl_lock_closure_init(env, closure, parent, 1);
-	return closure;
-}
-
 /** @} lov */
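
Note the error-handling convention in lov_lock_init_raid0() above:
lov_lock_sub_init() now returns either a valid lov_lock or an errno
encoded with ERR_PTR(), and the caller decodes it with
IS_ERR()/PTR_ERR(). For completeness, a tiny self-contained example of
that round-trip (hypothetical "foo" names, not Lustre API):

	#include <linux/err.h>
	#include <linux/slab.h>

	struct foo {
		int val;
	};

	static struct foo *foo_create(int arg)
	{
		struct foo *f;

		if (arg < 0)
			return ERR_PTR(-EINVAL);  /* encode errno in pointer */

		f = kzalloc(sizeof(*f), GFP_KERNEL);
		if (!f)
			return ERR_PTR(-ENOMEM);
		f->val = arg;
		return f;
	}

	static int foo_init(int arg)
	{
		struct foo *f = foo_create(arg);

		if (IS_ERR(f))
			return PTR_ERR(f);  /* decode: -EINVAL, -ENOMEM, ... */
		kfree(f);
		return 0;
	}
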
diff --git a/drivers/staging/lustre/lustre/lov/lov_object.c b/drivers/staging/lustre/lustre/lov/lov_object.c
index 0159b6f..6a353d1 100644
--- a/drivers/staging/lustre/lustre/lov/lov_object.c
+++ b/drivers/staging/lustre/lustre/lov/lov_object.c
@@ -310,8 +310,6 @@ static int lov_delete_empty(const struct lu_env *env, struct lov_object *lov,
 	LASSERT(lov->lo_type == LLT_EMPTY || lov->lo_type == LLT_RELEASED);
 
 	lov_layout_wait(env, lov);
-
-	cl_locks_prune(env, &lov->lo_cl, 0);
 	return 0;
 }
 
@@ -379,7 +377,7 @@ static int lov_delete_raid0(const struct lu_env *env, struct lov_object *lov,
 			struct lovsub_object *los = r0->lo_sub[i];
 
 			if (los) {
-				cl_locks_prune(env, &los->lso_cl, 1);
+				cl_object_prune(env, &los->lso_cl);
 				/*
 				 * If top-level object is to be evicted from
 				 * the cache, so are its sub-objects.
@@ -388,7 +386,6 @@ static int lov_delete_raid0(const struct lu_env *env, struct lov_object *lov,
 			}
 		}
 	}
-	cl_locks_prune(env, &lov->lo_cl, 0);
 	return 0;
 }
 
@@ -714,7 +711,9 @@ static int lov_layout_change(const struct lu_env *unused,
 	old_ops = &lov_dispatch[lov->lo_type];
 	new_ops = &lov_dispatch[llt];
 
-	cl_object_prune(env, &lov->lo_cl);
+	result = cl_object_prune(env, &lov->lo_cl);
+	if (result != 0)
+		goto out;
 
 	result = old_ops->llo_delete(env, lov, &lov->u);
 	if (result == 0) {
@@ -736,6 +735,7 @@ static int lov_layout_change(const struct lu_env *unused,
 		}
 	}
 
+out:
 	cl_env_put(env, &refcheck);
 	cl_env_reexit(cookie);
 	return result;
@@ -816,7 +816,8 @@ static int lov_conf_set(const struct lu_env *env, struct cl_object *obj,
 		goto out;
 	}
 
-	lov->lo_layout_invalid = lov_layout_change(env, lov, conf);
+	result = lov_layout_change(env, lov, conf);
+	lov->lo_layout_invalid = result != 0;
 
 out:
 	lov_conf_unlock(lov);
diff --git a/drivers/staging/lustre/lustre/lov/lovsub_lock.c b/drivers/staging/lustre/lustre/lov/lovsub_lock.c
index 3bb0c90..670d203 100644
--- a/drivers/staging/lustre/lustre/lov/lovsub_lock.c
+++ b/drivers/staging/lustre/lustre/lov/lovsub_lock.c
@@ -62,391 +62,8 @@ static void lovsub_lock_fini(const struct lu_env *env,
 	kmem_cache_free(lovsub_lock_kmem, lsl);
 }
 
-static void lovsub_parent_lock(const struct lu_env *env, struct lov_lock *lov)
-{
-	struct cl_lock *parent;
-
-	parent = lov->lls_cl.cls_lock;
-	cl_lock_get(parent);
-	lu_ref_add(&parent->cll_reference, "lovsub-parent", current);
-	cl_lock_mutex_get(env, parent);
-}
-
-static void lovsub_parent_unlock(const struct lu_env *env, struct lov_lock *lov)
-{
-	struct cl_lock *parent;
-
-	parent = lov->lls_cl.cls_lock;
-	cl_lock_mutex_put(env, lov->lls_cl.cls_lock);
-	lu_ref_del(&parent->cll_reference, "lovsub-parent", current);
-	cl_lock_put(env, parent);
-}
-
-/**
- * Implements cl_lock_operations::clo_state() method for lovsub layer, which
- * method is called whenever sub-lock state changes. Propagates state change
- * to the top-locks.
- */
-static void lovsub_lock_state(const struct lu_env *env,
-			      const struct cl_lock_slice *slice,
-			      enum cl_lock_state state)
-{
-	struct lovsub_lock   *sub = cl2lovsub_lock(slice);
-	struct lov_lock_link *scan;
-
-	LASSERT(cl_lock_is_mutexed(slice->cls_lock));
-
-	list_for_each_entry(scan, &sub->lss_parents, lll_list) {
-		struct lov_lock *lov    = scan->lll_super;
-		struct cl_lock  *parent = lov->lls_cl.cls_lock;
-
-		if (sub->lss_active != parent) {
-			lovsub_parent_lock(env, lov);
-			cl_lock_signal(env, parent);
-			lovsub_parent_unlock(env, lov);
-		}
-	}
-}
-
-/**
- * Implementation of cl_lock_operation::clo_weigh() estimating lock weight by
- * asking parent lock.
- */
-static unsigned long lovsub_lock_weigh(const struct lu_env *env,
-				       const struct cl_lock_slice *slice)
-{
-	struct lovsub_lock *lock = cl2lovsub_lock(slice);
-	struct lov_lock    *lov;
-	unsigned long       dumbbell;
-
-	LASSERT(cl_lock_is_mutexed(slice->cls_lock));
-
-	if (!list_empty(&lock->lss_parents)) {
-		/*
-		 * It is not clear whether all parents have to be asked and
-		 * their estimations summed, or it is enough to ask one. For
-		 * the current usages, one is always enough.
-		 */
-		lov = container_of(lock->lss_parents.next,
-				   struct lov_lock_link, lll_list)->lll_super;
-
-		lovsub_parent_lock(env, lov);
-		dumbbell = cl_lock_weigh(env, lov->lls_cl.cls_lock);
-		lovsub_parent_unlock(env, lov);
-	} else
-		dumbbell = 0;
-
-	return dumbbell;
-}
-
-/**
- * Maps start/end offsets within a stripe, to offsets within a file.
- */
-static void lovsub_lock_descr_map(const struct cl_lock_descr *in,
-				  struct lov_object *lov,
-				  int stripe, struct cl_lock_descr *out)
-{
-	pgoff_t size; /* stripe size in pages */
-	pgoff_t skip; /* how many pages in every stripe are occupied by
-		       * "other" stripes
-		       */
-	pgoff_t start;
-	pgoff_t end;
-
-	start = in->cld_start;
-	end   = in->cld_end;
-
-	if (lov->lo_lsm->lsm_stripe_count > 1) {
-		size = cl_index(lov2cl(lov), lov->lo_lsm->lsm_stripe_size);
-		skip = (lov->lo_lsm->lsm_stripe_count - 1) * size;
-
-		/* XXX overflow check here? */
-		start += start/size * skip + stripe * size;
-
-		if (end != CL_PAGE_EOF) {
-			end += end/size * skip + stripe * size;
-			/*
-			 * And check for overflow...
-			 */
-			if (end < in->cld_end)
-				end = CL_PAGE_EOF;
-		}
-	}
-	out->cld_start = start;
-	out->cld_end   = end;
-}
-
-/**
- * Adjusts parent lock extent when a sub-lock is attached to a parent. This is
- * called in two ways:
- *
- *     - as part of receive call-back, when server returns granted extent to
- *       the client, and
- *
- *     - when top-lock finds existing sub-lock in the cache.
- *
- * Note, that lock mode is not propagated to the parent: i.e., if CLM_READ
- * top-lock matches CLM_WRITE sub-lock, top-lock is still CLM_READ.
- */
-int lov_sublock_modify(const struct lu_env *env, struct lov_lock *lov,
-		       struct lovsub_lock *sublock,
-		       const struct cl_lock_descr *d, int idx)
-{
-	struct cl_lock       *parent;
-	struct lovsub_object *subobj;
-	struct cl_lock_descr *pd;
-	struct cl_lock_descr *parent_descr;
-	int		   result;
-
-	parent       = lov->lls_cl.cls_lock;
-	parent_descr = &parent->cll_descr;
-	LASSERT(cl_lock_mode_match(d->cld_mode, parent_descr->cld_mode));
-
-	subobj = cl2lovsub(sublock->lss_cl.cls_obj);
-	pd     = &lov_env_info(env)->lti_ldescr;
-
-	pd->cld_obj  = parent_descr->cld_obj;
-	pd->cld_mode = parent_descr->cld_mode;
-	pd->cld_gid  = parent_descr->cld_gid;
-	lovsub_lock_descr_map(d, subobj->lso_super, subobj->lso_index, pd);
-	lov->lls_sub[idx].sub_got = *d;
-	/*
-	 * Notify top-lock about modification, if lock description changes
-	 * materially.
-	 */
-	if (!cl_lock_ext_match(parent_descr, pd))
-		result = cl_lock_modify(env, parent, pd);
-	else
-		result = 0;
-	return result;
-}
-
-static int lovsub_lock_modify(const struct lu_env *env,
-			      const struct cl_lock_slice *s,
-			      const struct cl_lock_descr *d)
-{
-	struct lovsub_lock   *lock   = cl2lovsub_lock(s);
-	struct lov_lock_link *scan;
-	struct lov_lock      *lov;
-	int result		   = 0;
-
-	LASSERT(cl_lock_mode_match(d->cld_mode,
-				   s->cls_lock->cll_descr.cld_mode));
-	list_for_each_entry(scan, &lock->lss_parents, lll_list) {
-		int rc;
-
-		lov = scan->lll_super;
-		lovsub_parent_lock(env, lov);
-		rc = lov_sublock_modify(env, lov, lock, d, scan->lll_idx);
-		lovsub_parent_unlock(env, lov);
-		result = result ?: rc;
-	}
-	return result;
-}
-
-static int lovsub_lock_closure(const struct lu_env *env,
-			       const struct cl_lock_slice *slice,
-			       struct cl_lock_closure *closure)
-{
-	struct lovsub_lock   *sub;
-	struct cl_lock       *parent;
-	struct lov_lock_link *scan;
-	int		   result;
-
-	LASSERT(cl_lock_is_mutexed(slice->cls_lock));
-
-	sub    = cl2lovsub_lock(slice);
-	result = 0;
-
-	list_for_each_entry(scan, &sub->lss_parents, lll_list) {
-		parent = scan->lll_super->lls_cl.cls_lock;
-		result = cl_lock_closure_build(env, parent, closure);
-		if (result != 0)
-			break;
-	}
-	return result;
-}
-
-/**
- * A helper function for lovsub_lock_delete() that deals with a given parent
- * top-lock.
- */
-static int lovsub_lock_delete_one(const struct lu_env *env,
-				  struct cl_lock *child, struct lov_lock *lov)
-{
-	struct cl_lock *parent;
-	int	     result;
-
-	parent = lov->lls_cl.cls_lock;
-	if (parent->cll_error)
-		return 0;
-
-	result = 0;
-	switch (parent->cll_state) {
-	case CLS_ENQUEUED:
-		/* See LU-1355 for the case that a glimpse lock is
-		 * interrupted by signal
-		 */
-		LASSERT(parent->cll_flags & CLF_CANCELLED);
-		break;
-	case CLS_QUEUING:
-	case CLS_FREEING:
-		cl_lock_signal(env, parent);
-		break;
-	case CLS_INTRANSIT:
-		/*
-		 * Here lies a problem: a sub-lock is canceled while top-lock
-		 * is being unlocked. Top-lock cannot be moved into CLS_NEW
-		 * state, because unlocking has to succeed eventually by
-		 * placing lock into CLS_CACHED (or failing it), see
-		 * cl_unuse_try(). Nor can top-lock be left in CLS_CACHED
-		 * state, because lov maintains an invariant that all
-		 * sub-locks exist in CLS_CACHED (this allows cached top-lock
-		 * to be reused immediately). Nor can we wait for top-lock
-		 * state to change, because this can be synchronous to the
-		 * current thread.
-		 *
-		 * We know for sure that lov_lock_unuse() will be called at
-		 * least one more time to finish un-using, so leave a mark on
-		 * the top-lock, that will be seen by the next call to
-		 * lov_lock_unuse().
-		 */
-		if (cl_lock_is_intransit(parent))
-			lov->lls_cancel_race = 1;
-		break;
-	case CLS_CACHED:
-		/*
-		 * if a sub-lock is canceled move its top-lock into CLS_NEW
-		 * state to preserve an invariant that a top-lock in
-		 * CLS_CACHED is immediately ready for re-use (i.e., has all
-		 * sub-locks), and so that next attempt to re-use the top-lock
-		 * enqueues missing sub-lock.
-		 */
-		cl_lock_state_set(env, parent, CLS_NEW);
-		/* fall through */
-	case CLS_NEW:
-		/*
-		 * if last sub-lock is canceled, destroy the top-lock (which
-		 * is now `empty') proactively.
-		 */
-		if (lov->lls_nr_filled == 0) {
-			/* ... but unfortunately, this cannot be done easily,
-			 * as cancellation of a top-lock might acquire mutices
-			 * of its other sub-locks, violating lock ordering,
-			 * see cl_lock_{cancel,delete}() preconditions.
-			 *
-			 * To work around this, the mutex of this sub-lock is
-			 * released, top-lock is destroyed, and sub-lock mutex
-			 * acquired again. The list of parents has to be
-			 * re-scanned from the beginning after this.
-			 *
-			 * Only do this if no mutices other than on @child and
-			 * @parent are held by the current thread.
-			 *
-			 * TODO: The lock modal here is too complex, because
-			 * the lock may be canceled and deleted by voluntarily:
-			 *    cl_lock_request
-			 *      -> osc_lock_enqueue_wait
-			 *	-> osc_lock_cancel_wait
-			 *	  -> cl_lock_delete
-			 *	    -> lovsub_lock_delete
-			 *	      -> cl_lock_cancel/delete
-			 *		-> ...
-			 *
-			 * The better choice is to spawn a kernel thread for
-			 * this purpose. -jay
-			 */
-			if (cl_lock_nr_mutexed(env) == 2) {
-				cl_lock_mutex_put(env, child);
-				cl_lock_cancel(env, parent);
-				cl_lock_delete(env, parent);
-				result = 1;
-			}
-		}
-		break;
-	case CLS_HELD:
-		CL_LOCK_DEBUG(D_ERROR, env, parent, "Delete CLS_HELD lock\n");
-	default:
-		CERROR("Impossible state: %d\n", parent->cll_state);
-		LBUG();
-		break;
-	}
-
-	return result;
-}
-
-/**
- * An implementation of cl_lock_operations::clo_delete() method. This is
- * invoked in "bottom-to-top" delete, when lock destruction starts from the
- * sub-lock (e.g, as a result of ldlm lock LRU policy).
- */
-static void lovsub_lock_delete(const struct lu_env *env,
-			       const struct cl_lock_slice *slice)
-{
-	struct cl_lock     *child = slice->cls_lock;
-	struct lovsub_lock *sub   = cl2lovsub_lock(slice);
-	int restart;
-
-	LASSERT(cl_lock_is_mutexed(child));
-
-	/*
-	 * Destruction of a sub-lock might take multiple iterations, because
-	 * when the last sub-lock of a given top-lock is deleted, top-lock is
-	 * canceled proactively, and this requires to release sub-lock
-	 * mutex. Once sub-lock mutex has been released, list of its parents
-	 * has to be re-scanned from the beginning.
-	 */
-	do {
-		struct lov_lock      *lov;
-		struct lov_lock_link *scan;
-		struct lov_lock_link *temp;
-		struct lov_lock_sub  *subdata;
-
-		restart = 0;
-		list_for_each_entry_safe(scan, temp,
-					 &sub->lss_parents, lll_list) {
-			lov     = scan->lll_super;
-			subdata = &lov->lls_sub[scan->lll_idx];
-			lovsub_parent_lock(env, lov);
-			subdata->sub_got = subdata->sub_descr;
-			lov_lock_unlink(env, scan, sub);
-			restart = lovsub_lock_delete_one(env, child, lov);
-			lovsub_parent_unlock(env, lov);
-
-			if (restart) {
-				cl_lock_mutex_get(env, child);
-				break;
-			}
-	       }
-	} while (restart);
-}
-
-static int lovsub_lock_print(const struct lu_env *env, void *cookie,
-			     lu_printer_t p, const struct cl_lock_slice *slice)
-{
-	struct lovsub_lock   *sub = cl2lovsub_lock(slice);
-	struct lov_lock      *lov;
-	struct lov_lock_link *scan;
-
-	list_for_each_entry(scan, &sub->lss_parents, lll_list) {
-		lov = scan->lll_super;
-		(*p)(env, cookie, "[%d %p ", scan->lll_idx, lov);
-		if (lov)
-			cl_lock_descr_print(env, cookie, p,
-					    &lov->lls_cl.cls_lock->cll_descr);
-		(*p)(env, cookie, "] ");
-	}
-	return 0;
-}
-
 static const struct cl_lock_operations lovsub_lock_ops = {
 	.clo_fini    = lovsub_lock_fini,
-	.clo_state   = lovsub_lock_state,
-	.clo_delete  = lovsub_lock_delete,
-	.clo_modify  = lovsub_lock_modify,
-	.clo_closure = lovsub_lock_closure,
-	.clo_weigh   = lovsub_lock_weigh,
-	.clo_print   = lovsub_lock_print
 };
 
 int lovsub_lock_init(const struct lu_env *env, struct cl_object *obj,
diff --git a/drivers/staging/lustre/lustre/obdclass/cl_io.c b/drivers/staging/lustre/lustre/obdclass/cl_io.c
index 6a8dd9f..7655dc4 100644
--- a/drivers/staging/lustre/lustre/obdclass/cl_io.c
+++ b/drivers/staging/lustre/lustre/obdclass/cl_io.c
@@ -160,7 +160,6 @@ static int cl_io_init0(const struct lu_env *env, struct cl_io *io,
 
 	io->ci_type = iot;
 	INIT_LIST_HEAD(&io->ci_lockset.cls_todo);
-	INIT_LIST_HEAD(&io->ci_lockset.cls_curr);
 	INIT_LIST_HEAD(&io->ci_lockset.cls_done);
 	INIT_LIST_HEAD(&io->ci_layers);
 
@@ -242,37 +241,7 @@ static int cl_lock_descr_sort(const struct cl_lock_descr *d0,
 			      const struct cl_lock_descr *d1)
 {
 	return lu_fid_cmp(lu_object_fid(&d0->cld_obj->co_lu),
-			  lu_object_fid(&d1->cld_obj->co_lu)) ?:
-		__diff_normalize(d0->cld_start, d1->cld_start);
-}
-
-static int cl_lock_descr_cmp(const struct cl_lock_descr *d0,
-			     const struct cl_lock_descr *d1)
-{
-	int ret;
-
-	ret = lu_fid_cmp(lu_object_fid(&d0->cld_obj->co_lu),
-			 lu_object_fid(&d1->cld_obj->co_lu));
-	if (ret)
-		return ret;
-	if (d0->cld_end < d1->cld_start)
-		return -1;
-	if (d0->cld_start > d0->cld_end)
-		return 1;
-	return 0;
-}
-
-static void cl_lock_descr_merge(struct cl_lock_descr *d0,
-				const struct cl_lock_descr *d1)
-{
-	d0->cld_start = min(d0->cld_start, d1->cld_start);
-	d0->cld_end = max(d0->cld_end, d1->cld_end);
-
-	if (d1->cld_mode == CLM_WRITE && d0->cld_mode != CLM_WRITE)
-		d0->cld_mode = CLM_WRITE;
-
-	if (d1->cld_mode == CLM_GROUP && d0->cld_mode != CLM_GROUP)
-		d0->cld_mode = CLM_GROUP;
+			  lu_object_fid(&d1->cld_obj->co_lu));
 }
 
 /*
@@ -321,33 +290,35 @@ static void cl_io_locks_sort(struct cl_io *io)
 	} while (!done);
 }
 
-/**
- * Check whether \a queue contains locks matching \a need.
- *
- * \retval +ve there is a matching lock in the \a queue
- * \retval   0 there are no matching locks in the \a queue
- */
-int cl_queue_match(const struct list_head *queue,
-		   const struct cl_lock_descr *need)
+static void cl_lock_descr_merge(struct cl_lock_descr *d0,
+				const struct cl_lock_descr *d1)
 {
-	struct cl_io_lock_link *scan;
+	d0->cld_start = min(d0->cld_start, d1->cld_start);
+	d0->cld_end = max(d0->cld_end, d1->cld_end);
 
-	list_for_each_entry(scan, queue, cill_linkage) {
-		if (cl_lock_descr_match(&scan->cill_descr, need))
-			return 1;
-	}
-	return 0;
+	if (d1->cld_mode == CLM_WRITE && d0->cld_mode != CLM_WRITE)
+		d0->cld_mode = CLM_WRITE;
+
+	if (d1->cld_mode == CLM_GROUP && d0->cld_mode != CLM_GROUP)
+		d0->cld_mode = CLM_GROUP;
 }
-EXPORT_SYMBOL(cl_queue_match);
 
-static int cl_queue_merge(const struct list_head *queue,
-			  const struct cl_lock_descr *need)
+static int cl_lockset_merge(const struct cl_lockset *set,
+			    const struct cl_lock_descr *need)
 {
 	struct cl_io_lock_link *scan;
 
-	list_for_each_entry(scan, queue, cill_linkage) {
-		if (cl_lock_descr_cmp(&scan->cill_descr, need))
+	list_for_each_entry(scan, &set->cls_todo, cill_linkage) {
+		if (!cl_object_same(scan->cill_descr.cld_obj, need->cld_obj))
 			continue;
+
+		/* Merge locks for the same object, because the ldlm lock
+		 * server may expand the lock extent; otherwise there is a
+		 * deadlock if two conflicting locks are queued for the same
+		 * object and the server expands one lock to overlap the
+		 * other. The side effect is that this can generate a
+		 * multi-stripe lock that may cause cascading problems.
+		 */
 		cl_lock_descr_merge(&scan->cill_descr, need);
 		CDEBUG(D_VFSTRACE, "lock: %d: [%lu, %lu]\n",
 		       scan->cill_descr.cld_mode, scan->cill_descr.cld_start,
@@ -357,87 +328,20 @@ static int cl_queue_merge(const struct list_head *queue,
 	return 0;
 }
 
-static int cl_lockset_match(const struct cl_lockset *set,
-			    const struct cl_lock_descr *need)
-{
-	return cl_queue_match(&set->cls_curr, need) ||
-	       cl_queue_match(&set->cls_done, need);
-}
-
-static int cl_lockset_merge(const struct cl_lockset *set,
-			    const struct cl_lock_descr *need)
-{
-	return cl_queue_merge(&set->cls_todo, need) ||
-	       cl_lockset_match(set, need);
-}
-
-static int cl_lockset_lock_one(const struct lu_env *env,
-			       struct cl_io *io, struct cl_lockset *set,
-			       struct cl_io_lock_link *link)
-{
-	struct cl_lock *lock;
-	int	     result;
-
-	lock = cl_lock_request(env, io, &link->cill_descr, "io", io);
-
-	if (!IS_ERR(lock)) {
-		link->cill_lock = lock;
-		list_move(&link->cill_linkage, &set->cls_curr);
-		if (!(link->cill_descr.cld_enq_flags & CEF_ASYNC)) {
-			result = cl_wait(env, lock);
-			if (result == 0)
-				list_move(&link->cill_linkage, &set->cls_done);
-		} else
-			result = 0;
-	} else
-		result = PTR_ERR(lock);
-	return result;
-}
-
-static void cl_lock_link_fini(const struct lu_env *env, struct cl_io *io,
-			      struct cl_io_lock_link *link)
-{
-	struct cl_lock *lock = link->cill_lock;
-
-	list_del_init(&link->cill_linkage);
-	if (lock) {
-		cl_lock_release(env, lock, "io", io);
-		link->cill_lock = NULL;
-	}
-	if (link->cill_fini)
-		link->cill_fini(env, link);
-}
-
 static int cl_lockset_lock(const struct lu_env *env, struct cl_io *io,
 			   struct cl_lockset *set)
 {
 	struct cl_io_lock_link *link;
 	struct cl_io_lock_link *temp;
-	struct cl_lock	 *lock;
 	int result;
 
 	result = 0;
 	list_for_each_entry_safe(link, temp, &set->cls_todo, cill_linkage) {
-		if (!cl_lockset_match(set, &link->cill_descr)) {
-			/* XXX some locking to guarantee that locks aren't
-			 * expanded in between.
-			 */
-			result = cl_lockset_lock_one(env, io, set, link);
-			if (result != 0)
-				break;
-		} else
-			cl_lock_link_fini(env, io, link);
-	}
-	if (result == 0) {
-		list_for_each_entry_safe(link, temp,
-					 &set->cls_curr, cill_linkage) {
-			lock = link->cill_lock;
-			result = cl_wait(env, lock);
-			if (result == 0)
-				list_move(&link->cill_linkage, &set->cls_done);
-			else
-				break;
-		}
+		result = cl_lock_request(env, io, &link->cill_lock);
+		if (result < 0)
+			break;
+
+		list_move(&link->cill_linkage, &set->cls_done);
 	}
 	return result;
 }
@@ -493,16 +397,19 @@ void cl_io_unlock(const struct lu_env *env, struct cl_io *io)
 
 	set = &io->ci_lockset;
 
-	list_for_each_entry_safe(link, temp, &set->cls_todo, cill_linkage)
-		cl_lock_link_fini(env, io, link);
-
-	list_for_each_entry_safe(link, temp, &set->cls_curr, cill_linkage)
-		cl_lock_link_fini(env, io, link);
+	list_for_each_entry_safe(link, temp, &set->cls_todo, cill_linkage) {
+		list_del_init(&link->cill_linkage);
+		if (link->cill_fini)
+			link->cill_fini(env, link);
+	}
 
 	list_for_each_entry_safe(link, temp, &set->cls_done, cill_linkage) {
-		cl_unuse(env, link->cill_lock);
-		cl_lock_link_fini(env, io, link);
+		list_del_init(&link->cill_linkage);
+		cl_lock_release(env, &link->cill_lock);
+		if (link->cill_fini)
+			link->cill_fini(env, link);
 	}
+
 	cl_io_for_each_reverse(scan, io) {
 		if (scan->cis_iop->op[io->ci_type].cio_unlock)
 			scan->cis_iop->op[io->ci_type].cio_unlock(env, scan);
@@ -1435,6 +1342,7 @@ EXPORT_SYMBOL(cl_sync_io_end);
 void cl_sync_io_init(struct cl_sync_io *anchor, int nr,
 		     void (*end)(const struct lu_env *, struct cl_sync_io *))
 {
+	memset(anchor, 0, sizeof(*anchor));
 	init_waitqueue_head(&anchor->csi_waitq);
 	atomic_set(&anchor->csi_sync_nr, nr);
 	atomic_set(&anchor->csi_barrier, nr > 0);
diff --git a/drivers/staging/lustre/lustre/obdclass/cl_lock.c b/drivers/staging/lustre/lustre/obdclass/cl_lock.c
index fe8059a..26a576b 100644
--- a/drivers/staging/lustre/lustre/obdclass/cl_lock.c
+++ b/drivers/staging/lustre/lustre/obdclass/cl_lock.c
@@ -48,1987 +48,187 @@
 #include "../include/cl_object.h"
 #include "cl_internal.h"
 
-/** Lock class of cl_lock::cll_guard */
-static struct lock_class_key cl_lock_guard_class;
-static struct kmem_cache *cl_lock_kmem;
-
-static struct lu_kmem_descr cl_lock_caches[] = {
-	{
-		.ckd_cache = &cl_lock_kmem,
-		.ckd_name  = "cl_lock_kmem",
-		.ckd_size  = sizeof (struct cl_lock)
-	},
-	{
-		.ckd_cache = NULL
-	}
-};
-
-#define CS_LOCK_INC(o, item)
-#define CS_LOCK_DEC(o, item)
-#define CS_LOCKSTATE_INC(o, state)
-#define CS_LOCKSTATE_DEC(o, state)
-
-/**
- * Basic lock invariant that is maintained at all times. Caller either has a
- * reference to \a lock, or somehow assures that \a lock cannot be freed.
- *
- * \see cl_lock_invariant()
- */
-static int cl_lock_invariant_trusted(const struct lu_env *env,
-				     const struct cl_lock *lock)
-{
-	return  ergo(lock->cll_state == CLS_FREEING, lock->cll_holds == 0) &&
-		atomic_read(&lock->cll_ref) >= lock->cll_holds &&
-		lock->cll_holds >= lock->cll_users &&
-		lock->cll_holds >= 0 &&
-		lock->cll_users >= 0 &&
-		lock->cll_depth >= 0;
-}
-
-/**
- * Stronger lock invariant, checking that caller has a reference on a lock.
- *
- * \see cl_lock_invariant_trusted()
- */
-static int cl_lock_invariant(const struct lu_env *env,
-			     const struct cl_lock *lock)
-{
-	int result;
-
-	result = atomic_read(&lock->cll_ref) > 0 &&
-		cl_lock_invariant_trusted(env, lock);
-	if (!result && env)
-		CL_LOCK_DEBUG(D_ERROR, env, lock, "invariant broken\n");
-	return result;
-}
-
-/**
- * Returns lock "nesting": 0 for a top-lock and 1 for a sub-lock.
- */
-static enum clt_nesting_level cl_lock_nesting(const struct cl_lock *lock)
-{
-	return cl_object_header(lock->cll_descr.cld_obj)->coh_nesting;
-}
-
-/**
- * Returns a set of counters for this lock, depending on a lock nesting.
- */
-static struct cl_thread_counters *cl_lock_counters(const struct lu_env *env,
-						   const struct cl_lock *lock)
-{
-	struct cl_thread_info *info;
-	enum clt_nesting_level nesting;
-
-	info = cl_env_info(env);
-	nesting = cl_lock_nesting(lock);
-	LASSERT(nesting < ARRAY_SIZE(info->clt_counters));
-	return &info->clt_counters[nesting];
-}
-
-static void cl_lock_trace0(int level, const struct lu_env *env,
-			   const char *prefix, const struct cl_lock *lock,
-			   const char *func, const int line)
-{
-	struct cl_object_header *h = cl_object_header(lock->cll_descr.cld_obj);
-
-	CDEBUG(level, "%s: %p@(%d %p %d %d %d %d %d %lx)(%p/%d/%d) at %s():%d\n",
-	       prefix, lock, atomic_read(&lock->cll_ref),
-	       lock->cll_guarder, lock->cll_depth,
-	       lock->cll_state, lock->cll_error, lock->cll_holds,
-	       lock->cll_users, lock->cll_flags,
-	       env, h->coh_nesting, cl_lock_nr_mutexed(env),
-	       func, line);
-}
-
-#define cl_lock_trace(level, env, prefix, lock)			 \
-	cl_lock_trace0(level, env, prefix, lock, __func__, __LINE__)
-
-#define RETIP ((unsigned long)__builtin_return_address(0))
-
-#ifdef CONFIG_LOCKDEP
-static struct lock_class_key cl_lock_key;
-
-static void cl_lock_lockdep_init(struct cl_lock *lock)
-{
-	lockdep_set_class_and_name(lock, &cl_lock_key, "EXT");
-}
-
-static void cl_lock_lockdep_acquire(const struct lu_env *env,
-				    struct cl_lock *lock, __u32 enqflags)
-{
-	cl_lock_counters(env, lock)->ctc_nr_locks_acquired++;
-	lock_map_acquire(&lock->dep_map);
-}
-
-static void cl_lock_lockdep_release(const struct lu_env *env,
-				    struct cl_lock *lock)
-{
-	cl_lock_counters(env, lock)->ctc_nr_locks_acquired--;
-	lock_release(&lock->dep_map, 0, RETIP);
-}
-
-#else /* !CONFIG_LOCKDEP */
-
-static void cl_lock_lockdep_init(struct cl_lock *lock)
-{}
-static void cl_lock_lockdep_acquire(const struct lu_env *env,
-				    struct cl_lock *lock, __u32 enqflags)
-{}
-static void cl_lock_lockdep_release(const struct lu_env *env,
-				    struct cl_lock *lock)
-{}
-
-#endif /* !CONFIG_LOCKDEP */
-
-/**
- * Adds lock slice to the compound lock.
- *
- * This is called by cl_object_operations::coo_lock_init() methods to add a
- * per-layer state to the lock. New state is added at the end of
- * cl_lock::cll_layers list, that is, it is at the bottom of the stack.
- *
- * \see cl_req_slice_add(), cl_page_slice_add(), cl_io_slice_add()
- */
-void cl_lock_slice_add(struct cl_lock *lock, struct cl_lock_slice *slice,
-		       struct cl_object *obj,
-		       const struct cl_lock_operations *ops)
-{
-	slice->cls_lock = lock;
-	list_add_tail(&slice->cls_linkage, &lock->cll_layers);
-	slice->cls_obj = obj;
-	slice->cls_ops = ops;
-}
-EXPORT_SYMBOL(cl_lock_slice_add);
-
-/**
- * Returns true iff a lock with the mode \a has provides at least the same
- * guarantees as a lock with the mode \a need.
- */
-int cl_lock_mode_match(enum cl_lock_mode has, enum cl_lock_mode need)
-{
-	LINVRNT(need == CLM_READ || need == CLM_WRITE ||
-		need == CLM_PHANTOM || need == CLM_GROUP);
-	LINVRNT(has == CLM_READ || has == CLM_WRITE ||
-		has == CLM_PHANTOM || has == CLM_GROUP);
-	CLASSERT(CLM_PHANTOM < CLM_READ);
-	CLASSERT(CLM_READ < CLM_WRITE);
-	CLASSERT(CLM_WRITE < CLM_GROUP);
-
-	if (has != CLM_GROUP)
-		return need <= has;
-	else
-		return need == has;
-}
-EXPORT_SYMBOL(cl_lock_mode_match);
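The CLASSERT()s above pin down the total order CLM_PHANTOM < CLM_READ <
CLM_WRITE < CLM_GROUP: any mode at least as strong as the requested one
matches, except group locks, which match only themselves. A few illustrative
assertions (a sketch, not code from the tree):

	/* WRITE is stronger than READ, so it satisfies a READ request */
	LASSERT(cl_lock_mode_match(CLM_WRITE, CLM_READ));
	/* the converse does not hold */
	LASSERT(!cl_lock_mode_match(CLM_READ, CLM_WRITE));
	/* group locks are exact-match only */
	LASSERT(!cl_lock_mode_match(CLM_GROUP, CLM_WRITE));
	LASSERT(cl_lock_mode_match(CLM_GROUP, CLM_GROUP));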
-
-/**
- * Returns true iff extent portions of lock descriptions match.
- */
-int cl_lock_ext_match(const struct cl_lock_descr *has,
-		      const struct cl_lock_descr *need)
-{
-	return
-		has->cld_start <= need->cld_start &&
-		has->cld_end >= need->cld_end &&
-		cl_lock_mode_match(has->cld_mode, need->cld_mode) &&
-		(has->cld_mode != CLM_GROUP || has->cld_gid == need->cld_gid);
-}
-EXPORT_SYMBOL(cl_lock_ext_match);
-
-/**
- * Returns true iff a lock with the description \a has provides at least the
- * same guarantees as a lock with the description \a need.
- */
-int cl_lock_descr_match(const struct cl_lock_descr *has,
-			const struct cl_lock_descr *need)
-{
-	return
-		cl_object_same(has->cld_obj, need->cld_obj) &&
-		cl_lock_ext_match(has, need);
-}
-EXPORT_SYMBOL(cl_lock_descr_match);
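That is, \a has must live on the same object, cover at least the extent of
\a need, and hold at least as strong a mode. A hedged example, where `obj`
stands for any cl_object shared by both descriptions:

	struct cl_lock_descr has = {
		.cld_obj   = obj,
		.cld_start = 0,
		.cld_end   = 1023,
		.cld_mode  = CLM_WRITE,
	};
	struct cl_lock_descr need = {
		.cld_obj   = obj,
		.cld_start = 256,
		.cld_end   = 511,
		.cld_mode  = CLM_READ,
	};

	/* [0, 1023] covers [256, 511], and WRITE satisfies READ */
	LASSERT(cl_lock_descr_match(&has, &need));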
-
-static void cl_lock_free(const struct lu_env *env, struct cl_lock *lock)
-{
-	struct cl_object *obj = lock->cll_descr.cld_obj;
-
-	LINVRNT(!cl_lock_is_mutexed(lock));
-
-	cl_lock_trace(D_DLMTRACE, env, "free lock", lock);
-	while (!list_empty(&lock->cll_layers)) {
-		struct cl_lock_slice *slice;
-
-		slice = list_entry(lock->cll_layers.next,
-				   struct cl_lock_slice, cls_linkage);
-		list_del_init(lock->cll_layers.next);
-		slice->cls_ops->clo_fini(env, slice);
-	}
-	CS_LOCK_DEC(obj, total);
-	CS_LOCKSTATE_DEC(obj, lock->cll_state);
-	lu_object_ref_del_at(&obj->co_lu, &lock->cll_obj_ref, "cl_lock", lock);
-	cl_object_put(env, obj);
-	lu_ref_fini(&lock->cll_reference);
-	lu_ref_fini(&lock->cll_holders);
-	mutex_destroy(&lock->cll_guard);
-	kmem_cache_free(cl_lock_kmem, lock);
-}
-
-/**
- * Releases a reference on a lock.
- *
- * When last reference is released, lock is returned to the cache, unless it
- * is in cl_lock_state::CLS_FREEING state, in which case it is destroyed
- * immediately.
- *
- * \see cl_object_put(), cl_page_put()
- */
-void cl_lock_put(const struct lu_env *env, struct cl_lock *lock)
-{
-	struct cl_object	*obj;
-
-	LINVRNT(cl_lock_invariant(env, lock));
-	obj = lock->cll_descr.cld_obj;
-	LINVRNT(obj);
-
-	CDEBUG(D_TRACE, "releasing reference: %d %p %lu\n",
-	       atomic_read(&lock->cll_ref), lock, RETIP);
-
-	if (atomic_dec_and_test(&lock->cll_ref)) {
-		if (lock->cll_state == CLS_FREEING) {
-			LASSERT(list_empty(&lock->cll_linkage));
-			cl_lock_free(env, lock);
-		}
-		CS_LOCK_DEC(obj, busy);
-	}
-}
-EXPORT_SYMBOL(cl_lock_put);
-
-/**
- * Acquires an additional reference to a lock.
- *
- * This can be called only by caller already possessing a reference to \a
- * lock.
- *
- * \see cl_object_get(), cl_page_get()
- */
-void cl_lock_get(struct cl_lock *lock)
-{
-	LINVRNT(cl_lock_invariant(NULL, lock));
-	CDEBUG(D_TRACE, "acquiring reference: %d %p %lu\n",
-	       atomic_read(&lock->cll_ref), lock, RETIP);
-	atomic_inc(&lock->cll_ref);
-}
-EXPORT_SYMBOL(cl_lock_get);
-
-/**
- * Acquires a reference to a lock.
- *
- * This is much like cl_lock_get(), except that this function can be used to
- * acquire initial reference to the cached lock. Caller has to deal with all
- * possible races. Use with care!
- *
- * \see cl_page_get_trust()
- */
-void cl_lock_get_trust(struct cl_lock *lock)
-{
-	CDEBUG(D_TRACE, "acquiring trusted reference: %d %p %lu\n",
-	       atomic_read(&lock->cll_ref), lock, RETIP);
-	if (atomic_inc_return(&lock->cll_ref) == 1)
-		CS_LOCK_INC(lock->cll_descr.cld_obj, busy);
-}
-EXPORT_SYMBOL(cl_lock_get_trust);
-
-/**
- * Helper function destroying the lock that wasn't completely initialized.
- *
- * Other threads can acquire references to the top-lock through its
- * sub-locks. Hence, it cannot be cl_lock_free()-ed immediately.
- */
-static void cl_lock_finish(const struct lu_env *env, struct cl_lock *lock)
-{
-	cl_lock_mutex_get(env, lock);
-	cl_lock_cancel(env, lock);
-	cl_lock_delete(env, lock);
-	cl_lock_mutex_put(env, lock);
-	cl_lock_put(env, lock);
-}
-
-static struct cl_lock *cl_lock_alloc(const struct lu_env *env,
-				     struct cl_object *obj,
-				     const struct cl_io *io,
-				     const struct cl_lock_descr *descr)
-{
-	struct cl_lock	  *lock;
-	struct lu_object_header *head;
-
-	lock = kmem_cache_zalloc(cl_lock_kmem, GFP_NOFS);
-	if (lock) {
-		atomic_set(&lock->cll_ref, 1);
-		lock->cll_descr = *descr;
-		lock->cll_state = CLS_NEW;
-		cl_object_get(obj);
-		lu_object_ref_add_at(&obj->co_lu, &lock->cll_obj_ref, "cl_lock",
-				     lock);
-		INIT_LIST_HEAD(&lock->cll_layers);
-		INIT_LIST_HEAD(&lock->cll_linkage);
-		INIT_LIST_HEAD(&lock->cll_inclosure);
-		lu_ref_init(&lock->cll_reference);
-		lu_ref_init(&lock->cll_holders);
-		mutex_init(&lock->cll_guard);
-		lockdep_set_class(&lock->cll_guard, &cl_lock_guard_class);
-		init_waitqueue_head(&lock->cll_wq);
-		head = obj->co_lu.lo_header;
-		CS_LOCKSTATE_INC(obj, CLS_NEW);
-		CS_LOCK_INC(obj, total);
-		CS_LOCK_INC(obj, create);
-		cl_lock_lockdep_init(lock);
-		list_for_each_entry(obj, &head->loh_layers, co_lu.lo_linkage) {
-			int err;
-
-			err = obj->co_ops->coo_lock_init(env, obj, lock, io);
-			if (err != 0) {
-				cl_lock_finish(env, lock);
-				lock = ERR_PTR(err);
-				break;
-			}
-		}
-	} else
-		lock = ERR_PTR(-ENOMEM);
-	return lock;
-}
-
-/**
- * Transfer the lock into INTRANSIT state and return the original state.
- *
- * \pre  state: CLS_CACHED, CLS_HELD or CLS_ENQUEUED
- * \post state: CLS_INTRANSIT
- * \see CLS_INTRANSIT
- */
-static enum cl_lock_state cl_lock_intransit(const struct lu_env *env,
-					    struct cl_lock *lock)
-{
-	enum cl_lock_state state = lock->cll_state;
-
-	LASSERT(cl_lock_is_mutexed(lock));
-	LASSERT(state != CLS_INTRANSIT);
-	LASSERTF(state >= CLS_ENQUEUED && state <= CLS_CACHED,
-		 "Malformed lock state %d.\n", state);
-
-	cl_lock_state_set(env, lock, CLS_INTRANSIT);
-	lock->cll_intransit_owner = current;
-	cl_lock_hold_add(env, lock, "intransit", current);
-	return state;
-}
-
-/**
- * Exits the INTRANSIT state and restores the lock to its original state.
- */
-static void cl_lock_extransit(const struct lu_env *env, struct cl_lock *lock,
-			      enum cl_lock_state state)
-{
-	LASSERT(cl_lock_is_mutexed(lock));
-	LASSERT(lock->cll_state == CLS_INTRANSIT);
-	LASSERT(state != CLS_INTRANSIT);
-	LASSERT(lock->cll_intransit_owner == current);
-
-	lock->cll_intransit_owner = NULL;
-	cl_lock_state_set(env, lock, state);
-	cl_lock_unhold(env, lock, "intransit", current);
-}
-
-/**
- * Checks whether the lock is in the INTRANSIT state and is not owned by
- * the current thread.
- */
-int cl_lock_is_intransit(struct cl_lock *lock)
-{
-	LASSERT(cl_lock_is_mutexed(lock));
-	return lock->cll_state == CLS_INTRANSIT &&
-	       lock->cll_intransit_owner != current;
-}
-EXPORT_SYMBOL(cl_lock_is_intransit);
-/**
- * Returns true iff lock is "suitable" for given io. E.g., locks acquired by
- * truncate and O_APPEND cannot be reused for read/non-append-write, as they
- * cover multiple stripes and can trigger cascading timeouts.
- */
-static int cl_lock_fits_into(const struct lu_env *env,
-			     const struct cl_lock *lock,
-			     const struct cl_lock_descr *need,
-			     const struct cl_io *io)
-{
-	const struct cl_lock_slice *slice;
-
-	LINVRNT(cl_lock_invariant_trusted(env, lock));
-	list_for_each_entry(slice, &lock->cll_layers, cls_linkage) {
-		if (slice->cls_ops->clo_fits_into &&
-		    !slice->cls_ops->clo_fits_into(env, slice, need, io))
-			return 0;
-	}
-	return 1;
-}
-
-static struct cl_lock *cl_lock_lookup(const struct lu_env *env,
-				      struct cl_object *obj,
-				      const struct cl_io *io,
-				      const struct cl_lock_descr *need)
-{
-	struct cl_lock	  *lock;
-	struct cl_object_header *head;
-
-	head = cl_object_header(obj);
-	assert_spin_locked(&head->coh_lock_guard);
-	CS_LOCK_INC(obj, lookup);
-	list_for_each_entry(lock, &head->coh_locks, cll_linkage) {
-		int matched;
-
-		matched = cl_lock_ext_match(&lock->cll_descr, need) &&
-			  lock->cll_state < CLS_FREEING &&
-			  lock->cll_error == 0 &&
-			  !(lock->cll_flags & CLF_CANCELLED) &&
-			  cl_lock_fits_into(env, lock, need, io);
-		CDEBUG(D_DLMTRACE, "has: "DDESCR"(%d) need: "DDESCR": %d\n",
-		       PDESCR(&lock->cll_descr), lock->cll_state, PDESCR(need),
-		       matched);
-		if (matched) {
-			cl_lock_get_trust(lock);
-			CS_LOCK_INC(obj, hit);
-			return lock;
-		}
-	}
-	return NULL;
-}
-
-/**
- * Returns a lock matching description \a need.
- *
- * This is the main entry point into the cl_lock caching interface. First, a
- * cache (implemented as a per-object linked list) is consulted. If lock is
- * found there, it is returned immediately. Otherwise new lock is allocated
- * and returned. In any case, additional reference to lock is acquired.
- *
- * \see cl_object_find(), cl_page_find()
- */
-static struct cl_lock *cl_lock_find(const struct lu_env *env,
-				    const struct cl_io *io,
-				    const struct cl_lock_descr *need)
-{
-	struct cl_object_header *head;
-	struct cl_object	*obj;
-	struct cl_lock	  *lock;
-
-	obj  = need->cld_obj;
-	head = cl_object_header(obj);
-
-	spin_lock(&head->coh_lock_guard);
-	lock = cl_lock_lookup(env, obj, io, need);
-	spin_unlock(&head->coh_lock_guard);
-
-	if (!lock) {
-		lock = cl_lock_alloc(env, obj, io, need);
-		if (!IS_ERR(lock)) {
-			struct cl_lock *ghost;
-
-			spin_lock(&head->coh_lock_guard);
-			ghost = cl_lock_lookup(env, obj, io, need);
-			if (!ghost) {
-				cl_lock_get_trust(lock);
-				list_add_tail(&lock->cll_linkage,
-					      &head->coh_locks);
-				spin_unlock(&head->coh_lock_guard);
-				CS_LOCK_INC(obj, busy);
-			} else {
-				spin_unlock(&head->coh_lock_guard);
-				/*
-				 * Other threads can acquire references to the
-				 * top-lock through its sub-locks. Hence, it
-				 * cannot be cl_lock_free()-ed immediately.
-				 */
-				cl_lock_finish(env, lock);
-				lock = ghost;
-			}
-		}
-	}
-	return lock;
-}
-
-/**
- * Returns existing lock matching given description. This is similar to
- * cl_lock_find() except that no new lock is created, and returned lock is
- * guaranteed to be in enum cl_lock_state::CLS_HELD state.
- */
-struct cl_lock *cl_lock_peek(const struct lu_env *env, const struct cl_io *io,
-			     const struct cl_lock_descr *need,
-			     const char *scope, const void *source)
-{
-	struct cl_object_header *head;
-	struct cl_object	*obj;
-	struct cl_lock	  *lock;
-
-	obj  = need->cld_obj;
-	head = cl_object_header(obj);
-
-	do {
-		spin_lock(&head->coh_lock_guard);
-		lock = cl_lock_lookup(env, obj, io, need);
-		spin_unlock(&head->coh_lock_guard);
-		if (!lock)
-			return NULL;
-
-		cl_lock_mutex_get(env, lock);
-		if (lock->cll_state == CLS_INTRANSIT)
-			/* Don't care about the return value. */
-			cl_lock_state_wait(env, lock);
-		if (lock->cll_state == CLS_FREEING) {
-			cl_lock_mutex_put(env, lock);
-			cl_lock_put(env, lock);
-			lock = NULL;
-		}
-	} while (!lock);
-
-	cl_lock_hold_add(env, lock, scope, source);
-	cl_lock_user_add(env, lock);
-	if (lock->cll_state == CLS_CACHED)
-		cl_use_try(env, lock, 1);
-	if (lock->cll_state == CLS_HELD) {
-		cl_lock_mutex_put(env, lock);
-		cl_lock_lockdep_acquire(env, lock, 0);
-		cl_lock_put(env, lock);
-	} else {
-		cl_unuse_try(env, lock);
-		cl_lock_unhold(env, lock, scope, source);
-		cl_lock_mutex_put(env, lock);
-		cl_lock_put(env, lock);
-		lock = NULL;
-	}
-
-	return lock;
-}
-EXPORT_SYMBOL(cl_lock_peek);
-
-/**
- * Returns a slice within a lock, corresponding to the given layer in the
- * device stack.
- *
- * \see cl_page_at()
- */
-const struct cl_lock_slice *cl_lock_at(const struct cl_lock *lock,
-				       const struct lu_device_type *dtype)
-{
-	const struct cl_lock_slice *slice;
-
-	LINVRNT(cl_lock_invariant_trusted(NULL, lock));
-
-	list_for_each_entry(slice, &lock->cll_layers, cls_linkage) {
-		if (slice->cls_obj->co_lu.lo_dev->ld_type == dtype)
-			return slice;
-	}
-	return NULL;
-}
-EXPORT_SYMBOL(cl_lock_at);
-
-static void cl_lock_mutex_tail(const struct lu_env *env, struct cl_lock *lock)
-{
-	struct cl_thread_counters *counters;
-
-	counters = cl_lock_counters(env, lock);
-	lock->cll_depth++;
-	counters->ctc_nr_locks_locked++;
-	lu_ref_add(&counters->ctc_locks_locked, "cll_guard", lock);
-	cl_lock_trace(D_TRACE, env, "got mutex", lock);
-}
-
-/**
- * Locks cl_lock object.
- *
- * This is used to manipulate cl_lock fields, and to serialize state
- * transitions in the lock state machine.
- *
- * \post cl_lock_is_mutexed(lock)
- *
- * \see cl_lock_mutex_put()
- */
-void cl_lock_mutex_get(const struct lu_env *env, struct cl_lock *lock)
-{
-	LINVRNT(cl_lock_invariant(env, lock));
-
-	if (lock->cll_guarder == current) {
-		LINVRNT(cl_lock_is_mutexed(lock));
-		LINVRNT(lock->cll_depth > 0);
-	} else {
-		struct cl_object_header *hdr;
-		struct cl_thread_info   *info;
-		int i;
-
-		LINVRNT(lock->cll_guarder != current);
-		hdr = cl_object_header(lock->cll_descr.cld_obj);
-		/*
-		 * Check that mutices are taken in the bottom-to-top order.
-		 */
-		info = cl_env_info(env);
-		for (i = 0; i < hdr->coh_nesting; ++i)
-			LASSERT(info->clt_counters[i].ctc_nr_locks_locked == 0);
-		mutex_lock_nested(&lock->cll_guard, hdr->coh_nesting);
-		lock->cll_guarder = current;
-		LINVRNT(lock->cll_depth == 0);
-	}
-	cl_lock_mutex_tail(env, lock);
-}
-EXPORT_SYMBOL(cl_lock_mutex_get);
-
-/**
- * Try-locks cl_lock object.
- *
- * \retval 0 \a lock was successfully locked
- *
- * \retval -EBUSY \a lock cannot be locked right now
- *
- * \post ergo(result == 0, cl_lock_is_mutexed(lock))
- *
- * \see cl_lock_mutex_get()
- */
-static int cl_lock_mutex_try(const struct lu_env *env, struct cl_lock *lock)
-{
-	int result;
-
-	LINVRNT(cl_lock_invariant_trusted(env, lock));
-
-	result = 0;
-	if (lock->cll_guarder == current) {
-		LINVRNT(lock->cll_depth > 0);
-		cl_lock_mutex_tail(env, lock);
-	} else if (mutex_trylock(&lock->cll_guard)) {
-		LINVRNT(lock->cll_depth == 0);
-		lock->cll_guarder = current;
-		cl_lock_mutex_tail(env, lock);
-	} else
-		result = -EBUSY;
-	return result;
-}
-
-/**
- * Unlocks cl_lock object.
- *
- * \pre cl_lock_is_mutexed(lock)
- *
- * \see cl_lock_mutex_get()
- */
-void cl_lock_mutex_put(const struct lu_env *env, struct cl_lock *lock)
-{
-	struct cl_thread_counters *counters;
-
-	LINVRNT(cl_lock_invariant(env, lock));
-	LINVRNT(cl_lock_is_mutexed(lock));
-	LINVRNT(lock->cll_guarder == current);
-	LINVRNT(lock->cll_depth > 0);
-
-	counters = cl_lock_counters(env, lock);
-	LINVRNT(counters->ctc_nr_locks_locked > 0);
-
-	cl_lock_trace(D_TRACE, env, "put mutex", lock);
-	lu_ref_del(&counters->ctc_locks_locked, "cll_guard", lock);
-	counters->ctc_nr_locks_locked--;
-	if (--lock->cll_depth == 0) {
-		lock->cll_guarder = NULL;
-		mutex_unlock(&lock->cll_guard);
-	}
-}
-EXPORT_SYMBOL(cl_lock_mutex_put);
-
-/**
- * Returns true iff lock's mutex is owned by the current thread.
- */
-int cl_lock_is_mutexed(struct cl_lock *lock)
-{
-	return lock->cll_guarder == current;
-}
-EXPORT_SYMBOL(cl_lock_is_mutexed);
-
-/**
- * Returns number of cl_lock mutices held by the current thread (environment).
- */
-int cl_lock_nr_mutexed(const struct lu_env *env)
-{
-	struct cl_thread_info *info;
-	int i;
-	int locked;
-
-	/*
-	 * NOTE: if summation across all nesting levels (currently 2) proves
-	 *       too expensive, a summary counter can be added to
-	 *       struct cl_thread_info.
-	 */
-	info = cl_env_info(env);
-	for (i = 0, locked = 0; i < ARRAY_SIZE(info->clt_counters); ++i)
-		locked += info->clt_counters[i].ctc_nr_locks_locked;
-	return locked;
-}
-EXPORT_SYMBOL(cl_lock_nr_mutexed);
-
-static void cl_lock_cancel0(const struct lu_env *env, struct cl_lock *lock)
-{
-	LINVRNT(cl_lock_is_mutexed(lock));
-	LINVRNT(cl_lock_invariant(env, lock));
-	if (!(lock->cll_flags & CLF_CANCELLED)) {
-		const struct cl_lock_slice *slice;
-
-		lock->cll_flags |= CLF_CANCELLED;
-		list_for_each_entry_reverse(slice, &lock->cll_layers,
-					    cls_linkage) {
-			if (slice->cls_ops->clo_cancel)
-				slice->cls_ops->clo_cancel(env, slice);
-		}
-	}
-}
-
-static void cl_lock_delete0(const struct lu_env *env, struct cl_lock *lock)
-{
-	struct cl_object_header    *head;
-	const struct cl_lock_slice *slice;
-
-	LINVRNT(cl_lock_is_mutexed(lock));
-	LINVRNT(cl_lock_invariant(env, lock));
-
-	if (lock->cll_state < CLS_FREEING) {
-		bool in_cache;
-
-		LASSERT(lock->cll_state != CLS_INTRANSIT);
-		cl_lock_state_set(env, lock, CLS_FREEING);
-
-		head = cl_object_header(lock->cll_descr.cld_obj);
-
-		spin_lock(&head->coh_lock_guard);
-		in_cache = !list_empty(&lock->cll_linkage);
-		if (in_cache)
-			list_del_init(&lock->cll_linkage);
-		spin_unlock(&head->coh_lock_guard);
-
-		if (in_cache) /* coh_locks cache holds a refcount. */
-			cl_lock_put(env, lock);
-
-		/*
-		 * From now on, no new references to this lock can be acquired
-		 * by cl_lock_lookup().
-		 */
-		list_for_each_entry_reverse(slice, &lock->cll_layers,
-					    cls_linkage) {
-			if (slice->cls_ops->clo_delete)
-				slice->cls_ops->clo_delete(env, slice);
-		}
-		/*
-		 * From now on, no new references to this lock can be acquired
-		 * by layer-specific means (like a pointer from struct
-		 * ldlm_lock in osc, or a pointer from top-lock to sub-lock in
-		 * lov).
-		 *
-		 * Lock will be finally freed in cl_lock_put() when last of
-		 * existing references goes away.
-		 */
-	}
-}
-
-/**
- * Mod(ifie)s cl_lock::cll_holds counter for a given lock. Also, for a
- * top-lock (nesting == 0) accounts for this modification in the per-thread
- * debugging counters. Sub-lock holds can be released by a thread different
- * from one that acquired it.
- */
-static void cl_lock_hold_mod(const struct lu_env *env, struct cl_lock *lock,
-			     int delta)
-{
-	struct cl_thread_counters *counters;
-	enum clt_nesting_level     nesting;
-
-	lock->cll_holds += delta;
-	nesting = cl_lock_nesting(lock);
-	if (nesting == CNL_TOP) {
-		counters = &cl_env_info(env)->clt_counters[CNL_TOP];
-		counters->ctc_nr_held += delta;
-		LASSERT(counters->ctc_nr_held >= 0);
-	}
-}
-
-/**
- * Mod(ifie)s cl_lock::cll_users counter for a given lock. See
- * cl_lock_hold_mod() for the explanation of the debugging code.
- */
-static void cl_lock_used_mod(const struct lu_env *env, struct cl_lock *lock,
-			     int delta)
-{
-	struct cl_thread_counters *counters;
-	enum clt_nesting_level     nesting;
-
-	lock->cll_users += delta;
-	nesting = cl_lock_nesting(lock);
-	if (nesting == CNL_TOP) {
-		counters = &cl_env_info(env)->clt_counters[CNL_TOP];
-		counters->ctc_nr_used += delta;
-		LASSERT(counters->ctc_nr_used >= 0);
-	}
-}
-
-void cl_lock_hold_release(const struct lu_env *env, struct cl_lock *lock,
-			  const char *scope, const void *source)
-{
-	LINVRNT(cl_lock_is_mutexed(lock));
-	LINVRNT(cl_lock_invariant(env, lock));
-	LASSERT(lock->cll_holds > 0);
-
-	cl_lock_trace(D_DLMTRACE, env, "hold release lock", lock);
-	lu_ref_del(&lock->cll_holders, scope, source);
-	cl_lock_hold_mod(env, lock, -1);
-	if (lock->cll_holds == 0) {
-		CL_LOCK_ASSERT(lock->cll_state != CLS_HELD, env, lock);
-		if (lock->cll_descr.cld_mode == CLM_PHANTOM ||
-		    lock->cll_descr.cld_mode == CLM_GROUP ||
-		    lock->cll_state != CLS_CACHED)
-			/*
-			 * If the lock is still a phantom or group lock when
-			 * the user is done with it, destroy the lock.
-			 */
-			lock->cll_flags |= CLF_CANCELPEND|CLF_DOOMED;
-		if (lock->cll_flags & CLF_CANCELPEND) {
-			lock->cll_flags &= ~CLF_CANCELPEND;
-			cl_lock_cancel0(env, lock);
-		}
-		if (lock->cll_flags & CLF_DOOMED) {
-			/* no longer doomed: it's dead... Jim. */
-			lock->cll_flags &= ~CLF_DOOMED;
-			cl_lock_delete0(env, lock);
-		}
-	}
-}
-EXPORT_SYMBOL(cl_lock_hold_release);
-
-/**
- * Waits until lock state is changed.
- *
- * This function is called with cl_lock mutex locked, atomically releases
- * mutex and goes to sleep, waiting for a lock state change (signaled by
- * cl_lock_signal()), and re-acquires the mutex before return.
- *
- * This function is used to wait until lock state machine makes some progress
- * and to emulate synchronous operations on top of asynchronous lock
- * interface.
- *
- * \retval -EINTR wait was interrupted
- *
- * \retval 0 wait wasn't interrupted
- *
- * \pre cl_lock_is_mutexed(lock)
- *
- * \see cl_lock_signal()
- */
-int cl_lock_state_wait(const struct lu_env *env, struct cl_lock *lock)
-{
-	wait_queue_t waiter;
-	sigset_t blocked;
-	int result;
-
-	LINVRNT(cl_lock_is_mutexed(lock));
-	LINVRNT(cl_lock_invariant(env, lock));
-	LASSERT(lock->cll_depth == 1);
-	LASSERT(lock->cll_state != CLS_FREEING); /* too late to wait */
-
-	cl_lock_trace(D_DLMTRACE, env, "state wait lock", lock);
-	result = lock->cll_error;
-	if (result == 0) {
-		/* To avoid being interrupted by 'non-fatal' signals
-		 * (SIGCHLD, for instance), we block them temporarily.
-		 * LU-305
-		 */
-		blocked = cfs_block_sigsinv(LUSTRE_FATAL_SIGS);
-
-		init_waitqueue_entry(&waiter, current);
-		add_wait_queue(&lock->cll_wq, &waiter);
-		set_current_state(TASK_INTERRUPTIBLE);
-		cl_lock_mutex_put(env, lock);
-
-		LASSERT(cl_lock_nr_mutexed(env) == 0);
-
-		/* Returning ERESTARTSYS instead of EINTR so syscalls
-		 * can be restarted if signals are pending here
-		 */
-		result = -ERESTARTSYS;
-		if (likely(!OBD_FAIL_CHECK(OBD_FAIL_LOCK_STATE_WAIT_INTR))) {
-			schedule();
-			if (!signal_pending(current))
-				result = 0;
-		}
-
-		cl_lock_mutex_get(env, lock);
-		set_current_state(TASK_RUNNING);
-		remove_wait_queue(&lock->cll_wq, &waiter);
-
-		/* Restore old blocked signals */
-		cfs_restore_sigs(blocked);
-	}
-	return result;
-}
-EXPORT_SYMBOL(cl_lock_state_wait);
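Callers drive the state machine with a try/wait loop: call a *_try() step
and, whenever it returns CLO_WAIT, sleep here until cl_lock_signal() wakes
the thread, then retry. A sketch of the canonical pattern (cl_wait() below
is built exactly this way):

	cl_lock_mutex_get(env, lock);
	do {
		result = cl_wait_try(env, lock);
		if (result == CLO_WAIT) {
			/* drops and re-takes the mutex around schedule() */
			result = cl_lock_state_wait(env, lock);
			if (result == 0)
				continue;	/* woken up, retry */
		}
		break;	/* done, or interrupted by a fatal signal */
	} while (1);
	cl_lock_mutex_put(env, lock);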
-
-static void cl_lock_state_signal(const struct lu_env *env, struct cl_lock *lock,
-				 enum cl_lock_state state)
-{
-	const struct cl_lock_slice *slice;
-
-	LINVRNT(cl_lock_is_mutexed(lock));
-	LINVRNT(cl_lock_invariant(env, lock));
-
-	list_for_each_entry(slice, &lock->cll_layers, cls_linkage)
-		if (slice->cls_ops->clo_state)
-			slice->cls_ops->clo_state(env, slice, state);
-	wake_up_all(&lock->cll_wq);
-}
-
-/**
- * Notifies waiters that lock state changed.
- *
- * Wakes up all waiters sleeping in cl_lock_state_wait(), also notifies all
- * layers about state change by calling cl_lock_operations::clo_state()
- * top-to-bottom.
- */
-void cl_lock_signal(const struct lu_env *env, struct cl_lock *lock)
-{
-	cl_lock_trace(D_DLMTRACE, env, "state signal lock", lock);
-	cl_lock_state_signal(env, lock, lock->cll_state);
-}
-EXPORT_SYMBOL(cl_lock_signal);
-
-/**
- * Changes lock state.
- *
- * This function is invoked to notify layers that lock state changed, possible
- * as a result of an asynchronous event such as call-back reception.
- *
- * \post lock->cll_state == state
- *
- * \see cl_lock_operations::clo_state()
- */
-void cl_lock_state_set(const struct lu_env *env, struct cl_lock *lock,
-		       enum cl_lock_state state)
-{
-	LASSERT(lock->cll_state <= state ||
-		(lock->cll_state == CLS_CACHED &&
-		 (state == CLS_HELD || /* lock found in cache */
-		  state == CLS_NEW  ||   /* sub-lock canceled */
-		  state == CLS_INTRANSIT)) ||
-		/* lock is in transit state */
-		lock->cll_state == CLS_INTRANSIT);
-
-	if (lock->cll_state != state) {
-		CS_LOCKSTATE_DEC(lock->cll_descr.cld_obj, lock->cll_state);
-		CS_LOCKSTATE_INC(lock->cll_descr.cld_obj, state);
-
-		cl_lock_state_signal(env, lock, state);
-		lock->cll_state = state;
-	}
-}
-EXPORT_SYMBOL(cl_lock_state_set);
-
-static int cl_unuse_try_internal(const struct lu_env *env, struct cl_lock *lock)
-{
-	const struct cl_lock_slice *slice;
-	int result;
-
-	do {
-		result = 0;
-
-		LINVRNT(cl_lock_is_mutexed(lock));
-		LINVRNT(cl_lock_invariant(env, lock));
-		LASSERT(lock->cll_state == CLS_INTRANSIT);
-
-		result = -ENOSYS;
-		list_for_each_entry_reverse(slice, &lock->cll_layers,
-					    cls_linkage) {
-			if (slice->cls_ops->clo_unuse) {
-				result = slice->cls_ops->clo_unuse(env, slice);
-				if (result != 0)
-					break;
-			}
-		}
-		LASSERT(result != -ENOSYS);
-	} while (result == CLO_REPEAT);
-
-	return result;
-}
-
-/**
- * Yanks lock from the cache (cl_lock_state::CLS_CACHED state) by calling
- * cl_lock_operations::clo_use() top-to-bottom to notify layers.
- * If @atomic is set, the lock must be unused on failure so that it is
- * restored to its previous state, keeping the use process atomic.
- */
-int cl_use_try(const struct lu_env *env, struct cl_lock *lock, int atomic)
-{
-	const struct cl_lock_slice *slice;
-	int result;
-	enum cl_lock_state state;
-
-	cl_lock_trace(D_DLMTRACE, env, "use lock", lock);
-
-	LASSERT(lock->cll_state == CLS_CACHED);
-	if (lock->cll_error)
-		return lock->cll_error;
-
-	result = -ENOSYS;
-	state = cl_lock_intransit(env, lock);
-	list_for_each_entry(slice, &lock->cll_layers, cls_linkage) {
-		if (slice->cls_ops->clo_use) {
-			result = slice->cls_ops->clo_use(env, slice);
-			if (result != 0)
-				break;
-		}
-	}
-	LASSERT(result != -ENOSYS);
-
-	LASSERTF(lock->cll_state == CLS_INTRANSIT, "Wrong state %d.\n",
-		 lock->cll_state);
-
-	if (result == 0) {
-		state = CLS_HELD;
-	} else {
-		if (result == -ESTALE) {
-			/*
-			 * -ESTALE means a sublock is being cancelled at this
-			 * time; set the lock state to CLS_NEW and ask the
-			 * caller to repeat.
-			 */
-			state = CLS_NEW;
-			result = CLO_REPEAT;
-		}
-
-		/* @atomic means back-off-on-failure. */
-		if (atomic) {
-			int rc;
-
-			rc = cl_unuse_try_internal(env, lock);
-			/* Vet the results. */
-			if (rc < 0 && result > 0)
-				result = rc;
-		}
-
-	}
-	cl_lock_extransit(env, lock, state);
-	return result;
-}
-EXPORT_SYMBOL(cl_use_try);
-
-/**
- * Helper for cl_enqueue_try() that calls ->clo_enqueue() across all layers
- * top-to-bottom.
- */
-static int cl_enqueue_kick(const struct lu_env *env,
-			   struct cl_lock *lock,
-			   struct cl_io *io, __u32 flags)
-{
-	int result;
-	const struct cl_lock_slice *slice;
-
-	result = -ENOSYS;
-	list_for_each_entry(slice, &lock->cll_layers, cls_linkage) {
-		if (slice->cls_ops->clo_enqueue) {
-			result = slice->cls_ops->clo_enqueue(env,
-							     slice, io, flags);
-			if (result != 0)
-				break;
-		}
-	}
-	LASSERT(result != -ENOSYS);
-	return result;
-}
-
-/**
- * Tries to enqueue a lock.
- *
- * This function is called repeatedly by cl_enqueue() until either lock is
- * enqueued, or error occurs. This function does not block waiting for
- * networking communication to complete.
- *
- * \post ergo(result == 0, lock->cll_state == CLS_ENQUEUED ||
- *			 lock->cll_state == CLS_HELD)
- *
- * \see cl_enqueue() cl_lock_operations::clo_enqueue()
- * \see cl_lock_state::CLS_ENQUEUED
- */
-int cl_enqueue_try(const struct lu_env *env, struct cl_lock *lock,
-		   struct cl_io *io, __u32 flags)
-{
-	int result;
-
-	cl_lock_trace(D_DLMTRACE, env, "enqueue lock", lock);
-	do {
-		LINVRNT(cl_lock_is_mutexed(lock));
-
-		result = lock->cll_error;
-		if (result != 0)
-			break;
-
-		switch (lock->cll_state) {
-		case CLS_NEW:
-			cl_lock_state_set(env, lock, CLS_QUEUING);
-			/* fall-through */
-		case CLS_QUEUING:
-			/* kick layers. */
-			result = cl_enqueue_kick(env, lock, io, flags);
-			/* In the AGL case, cl_lock::cll_state may already
-			 * have become CLS_HELD.
-			 */
-			if (result == 0 && lock->cll_state == CLS_QUEUING)
-				cl_lock_state_set(env, lock, CLS_ENQUEUED);
-			break;
-		case CLS_INTRANSIT:
-			LASSERT(cl_lock_is_intransit(lock));
-			result = CLO_WAIT;
-			break;
-		case CLS_CACHED:
-			/* yank lock from the cache. */
-			result = cl_use_try(env, lock, 0);
-			break;
-		case CLS_ENQUEUED:
-		case CLS_HELD:
-			result = 0;
-			break;
-		default:
-		case CLS_FREEING:
-			/*
-			 * impossible, only held locks with increased
-			 * ->cll_holds can be enqueued, and they cannot be
-			 * freed.
-			 */
-			LBUG();
-		}
-	} while (result == CLO_REPEAT);
-	return result;
-}
-EXPORT_SYMBOL(cl_enqueue_try);
-
-/**
- * Cancel the conflicting lock found during previous enqueue.
- *
- * \retval 0 conflicting lock has been canceled.
- * \retval -ve error code.
- */
-int cl_lock_enqueue_wait(const struct lu_env *env,
-			 struct cl_lock *lock,
-			 int keep_mutex)
-{
-	struct cl_lock  *conflict;
-	int	      rc = 0;
-
-	LASSERT(cl_lock_is_mutexed(lock));
-	LASSERT(lock->cll_state == CLS_QUEUING);
-	LASSERT(lock->cll_conflict);
-
-	conflict = lock->cll_conflict;
-	lock->cll_conflict = NULL;
-
-	cl_lock_mutex_put(env, lock);
-	LASSERT(cl_lock_nr_mutexed(env) == 0);
-
-	cl_lock_mutex_get(env, conflict);
-	cl_lock_trace(D_DLMTRACE, env, "enqueue wait", conflict);
-	cl_lock_cancel(env, conflict);
-	cl_lock_delete(env, conflict);
-
-	while (conflict->cll_state != CLS_FREEING) {
-		rc = cl_lock_state_wait(env, conflict);
-		if (rc != 0)
-			break;
-	}
-	cl_lock_mutex_put(env, conflict);
-	lu_ref_del(&conflict->cll_reference, "cancel-wait", lock);
-	cl_lock_put(env, conflict);
-
-	if (keep_mutex)
-		cl_lock_mutex_get(env, lock);
-
-	LASSERT(rc <= 0);
-	return rc;
-}
-EXPORT_SYMBOL(cl_lock_enqueue_wait);
-
-static int cl_enqueue_locked(const struct lu_env *env, struct cl_lock *lock,
-			     struct cl_io *io, __u32 enqflags)
+static void cl_lock_trace0(int level, const struct lu_env *env,
+			   const char *prefix, const struct cl_lock *lock,
+			   const char *func, const int line)
 {
-	int result;
-
-	LINVRNT(cl_lock_is_mutexed(lock));
-	LINVRNT(cl_lock_invariant(env, lock));
-	LASSERT(lock->cll_holds > 0);
+	struct cl_object_header *h = cl_object_header(lock->cll_descr.cld_obj);
 
-	cl_lock_user_add(env, lock);
-	do {
-		result = cl_enqueue_try(env, lock, io, enqflags);
-		if (result == CLO_WAIT) {
-			if (lock->cll_conflict)
-				result = cl_lock_enqueue_wait(env, lock, 1);
-			else
-				result = cl_lock_state_wait(env, lock);
-			if (result == 0)
-				continue;
-		}
-		break;
-	} while (1);
-	if (result != 0)
-		cl_unuse_try(env, lock);
-	LASSERT(ergo(result == 0 && !(enqflags & CEF_AGL),
-		     lock->cll_state == CLS_ENQUEUED ||
-		     lock->cll_state == CLS_HELD));
-	return result;
+	CDEBUG(level, "%s: %p (%p/%d) at %s():%d\n",
+	       prefix, lock, env, h->coh_nesting, func, line);
 }
+#define cl_lock_trace(level, env, prefix, lock)				\
+	cl_lock_trace0(level, env, prefix, lock, __func__, __LINE__)
 
 /**
- * Tries to unlock a lock.
- *
- * This function is called to release the underlying resource:
- * 1. for a top lock, the resource is the sublocks it holds;
- * 2. for a sublock, the resource is the reference to the dlmlock.
+ * Adds lock slice to the compound lock.
  *
- * cl_unuse_try is a one-shot operation, so it must NOT return CLO_WAIT.
+ * This is called by cl_object_operations::coo_lock_init() methods to add a
+ * per-layer state to the lock. New state is added at the end of
+ * cl_lock::cll_layers list, that is, it is at the bottom of the stack.
  *
- * \see cl_unuse() cl_lock_operations::clo_unuse()
- * \see cl_lock_state::CLS_CACHED
+ * \see cl_req_slice_add(), cl_page_slice_add(), cl_io_slice_add()
  */
-int cl_unuse_try(const struct lu_env *env, struct cl_lock *lock)
+void cl_lock_slice_add(struct cl_lock *lock, struct cl_lock_slice *slice,
+		       struct cl_object *obj,
+		       const struct cl_lock_operations *ops)
 {
-	int			 result;
-	enum cl_lock_state	  state = CLS_NEW;
-
-	cl_lock_trace(D_DLMTRACE, env, "unuse lock", lock);
-
-	if (lock->cll_users > 1) {
-		cl_lock_user_del(env, lock);
-		return 0;
-	}
-
-	/* Only a lock in CLS_HELD or CLS_ENQUEUED state can hold underlying
-	 * resources.
-	 */
-	if (!(lock->cll_state == CLS_HELD || lock->cll_state == CLS_ENQUEUED)) {
-		cl_lock_user_del(env, lock);
-		return 0;
-	}
-
-	/*
-	 * New lock users (->cll_users) do not prevent unlocking from
-	 * proceeding. From this point the lock eventually reaches
-	 * CLS_CACHED, is reinitialized to CLS_NEW, or fails into
-	 * CLS_FREEING.
-	 */
-	state = cl_lock_intransit(env, lock);
-
-	result = cl_unuse_try_internal(env, lock);
-	LASSERT(lock->cll_state == CLS_INTRANSIT);
-	LASSERT(result != CLO_WAIT);
-	cl_lock_user_del(env, lock);
-	if (result == 0 || result == -ESTALE) {
-		/*
-		 * Return lock back to the cache. This is the only
-		 * place where lock is moved into CLS_CACHED state.
-		 *
-		 * If one of ->clo_unuse() methods returned -ESTALE, lock
-		 * cannot be placed into cache and has to be
-		 * re-initialized. This happens e.g., when a sub-lock was
-		 * canceled while unlocking was in progress.
-		 */
-		if (state == CLS_HELD && result == 0)
-			state = CLS_CACHED;
-		else
-			state = CLS_NEW;
-		cl_lock_extransit(env, lock, state);
-
-		/*
-		 * Hide the -ESTALE error.
-		 * If the lock is a glimpse lock with multiple stripes, one
-		 * of its sublocks may return -ENAVAIL while the others are
-		 * matched write locks. In this case we can't set this lock
-		 * to error, because otherwise some of its sublocks may not
-		 * be cancelled, and some dirty pages would never be written
-		 * to the OSTs. -jay
-		 */
-		result = 0;
-	} else {
-		CERROR("result = %d, this is unlikely!\n", result);
-		state = CLS_NEW;
-		cl_lock_extransit(env, lock, state);
-	}
-	return result ?: lock->cll_error;
+	slice->cls_lock = lock;
+	list_add_tail(&slice->cls_linkage, &lock->cll_layers);
+	slice->cls_obj = obj;
+	slice->cls_ops = ops;
 }
-EXPORT_SYMBOL(cl_unuse_try);
+EXPORT_SYMBOL(cl_lock_slice_add);
 
-static void cl_unuse_locked(const struct lu_env *env, struct cl_lock *lock)
+void cl_lock_fini(const struct lu_env *env, struct cl_lock *lock)
 {
-	int result;
+	cl_lock_trace(D_DLMTRACE, env, "destroy lock", lock);
 
-	result = cl_unuse_try(env, lock);
-	if (result)
-		CL_LOCK_DEBUG(D_ERROR, env, lock, "unuse return %d\n", result);
-}
+	while (!list_empty(&lock->cll_layers)) {
+		struct cl_lock_slice *slice;
 
-/**
- * Unlocks a lock.
- */
-void cl_unuse(const struct lu_env *env, struct cl_lock *lock)
-{
-	cl_lock_mutex_get(env, lock);
-	cl_unuse_locked(env, lock);
-	cl_lock_mutex_put(env, lock);
-	cl_lock_lockdep_release(env, lock);
+		slice = list_entry(lock->cll_layers.next,
+				   struct cl_lock_slice, cls_linkage);
+		list_del_init(lock->cll_layers.next);
+		slice->cls_ops->clo_fini(env, slice);
+	}
+	POISON(lock, 0x5a, sizeof(*lock));
 }
-EXPORT_SYMBOL(cl_unuse);
+EXPORT_SYMBOL(cl_lock_fini);
 
-/**
- * Tries to wait for a lock.
- *
- * This function is called repeatedly by cl_wait() until either lock is
- * granted, or error occurs. This function does not block waiting for network
- * communication to complete.
- *
- * \see cl_wait() cl_lock_operations::clo_wait()
- * \see cl_lock_state::CLS_HELD
- */
-int cl_wait_try(const struct lu_env *env, struct cl_lock *lock)
+int cl_lock_init(const struct lu_env *env, struct cl_lock *lock,
+		 const struct cl_io *io)
 {
-	const struct cl_lock_slice *slice;
-	int			 result;
-
-	cl_lock_trace(D_DLMTRACE, env, "wait lock try", lock);
-	do {
-		LINVRNT(cl_lock_is_mutexed(lock));
-		LINVRNT(cl_lock_invariant(env, lock));
-		LASSERTF(lock->cll_state == CLS_QUEUING ||
-			 lock->cll_state == CLS_ENQUEUED ||
-			 lock->cll_state == CLS_HELD ||
-			 lock->cll_state == CLS_INTRANSIT,
-			 "lock state: %d\n", lock->cll_state);
-		LASSERT(lock->cll_users > 0);
-		LASSERT(lock->cll_holds > 0);
+	struct cl_object *obj = lock->cll_descr.cld_obj;
+	struct cl_object *scan;
+	int result = 0;
 
-		result = lock->cll_error;
-		if (result != 0)
-			break;
+	/* Make sure cl_lock::cll_descr is initialized. */
+	LASSERT(obj);
 
-		if (cl_lock_is_intransit(lock)) {
-			result = CLO_WAIT;
+	INIT_LIST_HEAD(&lock->cll_layers);
+	list_for_each_entry(scan, &obj->co_lu.lo_header->loh_layers,
+			    co_lu.lo_linkage) {
+		result = scan->co_ops->coo_lock_init(env, scan, lock, io);
+		if (result != 0) {
+			cl_lock_fini(env, lock);
 			break;
 		}
+	}
 
-		if (lock->cll_state == CLS_HELD)
-			/* nothing to do */
-			break;
-
-		result = -ENOSYS;
-		list_for_each_entry(slice, &lock->cll_layers, cls_linkage) {
-			if (slice->cls_ops->clo_wait) {
-				result = slice->cls_ops->clo_wait(env, slice);
-				if (result != 0)
-					break;
-			}
-		}
-		LASSERT(result != -ENOSYS);
-		if (result == 0) {
-			LASSERT(lock->cll_state != CLS_INTRANSIT);
-			cl_lock_state_set(env, lock, CLS_HELD);
-		}
-	} while (result == CLO_REPEAT);
 	return result;
 }
-EXPORT_SYMBOL(cl_wait_try);
+EXPORT_SYMBOL(cl_lock_init);
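With the simplification a cl_lock is no longer cached or reference-counted:
the caller embeds it, fills in cll_descr, initializes the per-layer slices
with cl_lock_init() and tears everything down with cl_lock_fini(). A hedged
sketch of the pairing; `obj`, `start` and `end` are placeholders, and real
callers embed the lock in a larger structure instead of declaring one ad hoc:

	struct cl_lock lock;
	int rc;

	memset(&lock, 0, sizeof(lock));
	lock.cll_descr.cld_obj   = obj;
	lock.cll_descr.cld_start = start;
	lock.cll_descr.cld_end   = end;
	lock.cll_descr.cld_mode  = CLM_READ;

	rc = cl_lock_init(env, &lock, io);
	if (rc == 0) {
		/* ... enqueue and use the lock ... */
		cl_lock_fini(env, &lock);
	}
	/* on failure cl_lock_init() has already called cl_lock_fini() */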
 
 /**
- * Waits until enqueued lock is granted.
- *
- * \pre current thread or io owns a hold on the lock
- * \pre ergo(result == 0, lock->cll_state == CLS_ENQUEUED ||
- *			lock->cll_state == CLS_HELD)
+ * Returns a slice within a lock, corresponding to the given layer in the
+ * device stack.
  *
- * \post ergo(result == 0, lock->cll_state == CLS_HELD)
- */
-int cl_wait(const struct lu_env *env, struct cl_lock *lock)
-{
-	int result;
-
-	cl_lock_mutex_get(env, lock);
-
-	LINVRNT(cl_lock_invariant(env, lock));
-	LASSERTF(lock->cll_state == CLS_ENQUEUED || lock->cll_state == CLS_HELD,
-		 "Wrong state %d\n", lock->cll_state);
-	LASSERT(lock->cll_holds > 0);
-
-	do {
-		result = cl_wait_try(env, lock);
-		if (result == CLO_WAIT) {
-			result = cl_lock_state_wait(env, lock);
-			if (result == 0)
-				continue;
-		}
-		break;
-	} while (1);
-	if (result < 0) {
-		cl_unuse_try(env, lock);
-		cl_lock_lockdep_release(env, lock);
-	}
-	cl_lock_trace(D_DLMTRACE, env, "wait lock", lock);
-	cl_lock_mutex_put(env, lock);
-	LASSERT(ergo(result == 0, lock->cll_state == CLS_HELD));
-	return result;
-}
-EXPORT_SYMBOL(cl_wait);
-
-/**
- * Executes cl_lock_operations::clo_weigh(), and sums results to estimate lock
- * value.
+ * \see cl_page_at()
  */
-unsigned long cl_lock_weigh(const struct lu_env *env, struct cl_lock *lock)
+const struct cl_lock_slice *cl_lock_at(const struct cl_lock *lock,
+				       const struct lu_device_type *dtype)
 {
 	const struct cl_lock_slice *slice;
-	unsigned long pound;
-	unsigned long ounce;
 
-	LINVRNT(cl_lock_is_mutexed(lock));
-	LINVRNT(cl_lock_invariant(env, lock));
-
-	pound = 0;
-	list_for_each_entry_reverse(slice, &lock->cll_layers, cls_linkage) {
-		if (slice->cls_ops->clo_weigh) {
-			ounce = slice->cls_ops->clo_weigh(env, slice);
-			pound += ounce;
-			if (pound < ounce) /* over-weight^Wflow */
-				pound = ~0UL;
-		}
+	list_for_each_entry(slice, &lock->cll_layers, cls_linkage) {
+		if (slice->cls_obj->co_lu.lo_dev->ld_type == dtype)
+			return slice;
 	}
-	return pound;
+	return NULL;
 }
-EXPORT_SYMBOL(cl_lock_weigh);
+EXPORT_SYMBOL(cl_lock_at);
 
-/**
- * Notifies layers that lock description changed.
- *
- * The server can grant client a lock different from one that was requested
- * (e.g., larger in extent). This method is called when actually granted lock
- * description becomes known to let layers to accommodate for changed lock
- * description.
- *
- * \see cl_lock_operations::clo_modify()
- */
-int cl_lock_modify(const struct lu_env *env, struct cl_lock *lock,
-		   const struct cl_lock_descr *desc)
+void cl_lock_cancel(const struct lu_env *env, struct cl_lock *lock)
 {
 	const struct cl_lock_slice *slice;
-	struct cl_object	   *obj = lock->cll_descr.cld_obj;
-	struct cl_object_header    *hdr = cl_object_header(obj);
-	int result;
-
-	cl_lock_trace(D_DLMTRACE, env, "modify lock", lock);
-	/* don't allow object to change */
-	LASSERT(obj == desc->cld_obj);
-	LINVRNT(cl_lock_is_mutexed(lock));
-	LINVRNT(cl_lock_invariant(env, lock));
 
+	cl_lock_trace(D_DLMTRACE, env, "cancel lock", lock);
 	list_for_each_entry_reverse(slice, &lock->cll_layers, cls_linkage) {
-		if (slice->cls_ops->clo_modify) {
-			result = slice->cls_ops->clo_modify(env, slice, desc);
-			if (result != 0)
-				return result;
-		}
-	}
-	CL_LOCK_DEBUG(D_DLMTRACE, env, lock, " -> "DDESCR"@"DFID"\n",
-		      PDESCR(desc), PFID(lu_object_fid(&desc->cld_obj->co_lu)));
-	/*
-	 * Just replace description in place. Nothing more is needed for
-	 * now. If locks were indexed according to their extent and/or mode,
-	 * that index would have to be updated here.
-	 */
-	spin_lock(&hdr->coh_lock_guard);
-	lock->cll_descr = *desc;
-	spin_unlock(&hdr->coh_lock_guard);
-	return 0;
-}
-EXPORT_SYMBOL(cl_lock_modify);
-
-/**
- * Initializes lock closure with a given origin.
- *
- * \see cl_lock_closure
- */
-void cl_lock_closure_init(const struct lu_env *env,
-			  struct cl_lock_closure *closure,
-			  struct cl_lock *origin, int wait)
-{
-	LINVRNT(cl_lock_is_mutexed(origin));
-	LINVRNT(cl_lock_invariant(env, origin));
-
-	INIT_LIST_HEAD(&closure->clc_list);
-	closure->clc_origin = origin;
-	closure->clc_wait   = wait;
-	closure->clc_nr     = 0;
-}
-EXPORT_SYMBOL(cl_lock_closure_init);
-
-/**
- * Builds a closure of \a lock.
- *
- * Building of a closure consists of adding initial lock (\a lock) into it,
- * and calling cl_lock_operations::clo_closure() methods of \a lock. These
- * methods might call cl_lock_closure_build() recursively again, adding more
- * locks to the closure, etc.
- *
- * \see cl_lock_closure
- */
-int cl_lock_closure_build(const struct lu_env *env, struct cl_lock *lock,
-			  struct cl_lock_closure *closure)
-{
-	const struct cl_lock_slice *slice;
-	int result;
-
-	LINVRNT(cl_lock_is_mutexed(closure->clc_origin));
-	LINVRNT(cl_lock_invariant(env, closure->clc_origin));
-
-	result = cl_lock_enclosure(env, lock, closure);
-	if (result == 0) {
-		list_for_each_entry(slice, &lock->cll_layers, cls_linkage) {
-			if (slice->cls_ops->clo_closure) {
-				result = slice->cls_ops->clo_closure(env, slice,
-								     closure);
-				if (result != 0)
-					break;
-			}
-		}
-	}
-	if (result != 0)
-		cl_lock_disclosure(env, closure);
-	return result;
-}
-EXPORT_SYMBOL(cl_lock_closure_build);
-
-/**
- * Adds new lock to a closure.
- *
- * Try-locks \a lock and, if that succeeds, adds it to the closure (never
- * more than once). If the try-lock fails, returns CLO_REPEAT, after
- * optionally waiting until the next try-lock is likely to succeed.
- */
-int cl_lock_enclosure(const struct lu_env *env, struct cl_lock *lock,
-		      struct cl_lock_closure *closure)
-{
-	int result = 0;
-
-	cl_lock_trace(D_DLMTRACE, env, "enclosure lock", lock);
-	if (!cl_lock_mutex_try(env, lock)) {
-		/*
-		 * If lock->cll_inclosure is not empty, lock is already in
-		 * this closure.
-		 */
-		if (list_empty(&lock->cll_inclosure)) {
-			cl_lock_get_trust(lock);
-			lu_ref_add(&lock->cll_reference, "closure", closure);
-			list_add(&lock->cll_inclosure, &closure->clc_list);
-			closure->clc_nr++;
-		} else
-			cl_lock_mutex_put(env, lock);
-		result = 0;
-	} else {
-		cl_lock_disclosure(env, closure);
-		if (closure->clc_wait) {
-			cl_lock_get_trust(lock);
-			lu_ref_add(&lock->cll_reference, "closure-w", closure);
-			cl_lock_mutex_put(env, closure->clc_origin);
-
-			LASSERT(cl_lock_nr_mutexed(env) == 0);
-			cl_lock_mutex_get(env, lock);
-			cl_lock_mutex_put(env, lock);
-
-			cl_lock_mutex_get(env, closure->clc_origin);
-			lu_ref_del(&lock->cll_reference, "closure-w", closure);
-			cl_lock_put(env, lock);
-		}
-		result = CLO_REPEAT;
-	}
-	return result;
-}
-EXPORT_SYMBOL(cl_lock_enclosure);
-
-/** Releases mutices of enclosed locks. */
-void cl_lock_disclosure(const struct lu_env *env,
-			struct cl_lock_closure *closure)
-{
-	struct cl_lock *scan;
-	struct cl_lock *temp;
-
-	cl_lock_trace(D_DLMTRACE, env, "disclosure lock", closure->clc_origin);
-	list_for_each_entry_safe(scan, temp, &closure->clc_list,
-				 cll_inclosure) {
-		list_del_init(&scan->cll_inclosure);
-		cl_lock_mutex_put(env, scan);
-		lu_ref_del(&scan->cll_reference, "closure", closure);
-		cl_lock_put(env, scan);
-		closure->clc_nr--;
-	}
-	LASSERT(closure->clc_nr == 0);
-}
-EXPORT_SYMBOL(cl_lock_disclosure);
-
-/** Finalizes a closure. */
-void cl_lock_closure_fini(struct cl_lock_closure *closure)
-{
-	LASSERT(closure->clc_nr == 0);
-	LASSERT(list_empty(&closure->clc_list));
-}
-EXPORT_SYMBOL(cl_lock_closure_fini);
-
-/**
- * Destroys this lock. Notifies layers (bottom-to-top) that the lock is
- * being destroyed, then destroys the lock. If there are holds on the lock,
- * destruction is postponed until all holds are released. This is called
- * when a decision is
- * made to destroy the lock in the future. E.g., when a blocking AST is
- * received on it, or fatal communication error happens.
- *
- * Caller must have a reference on this lock to prevent a situation, when
- * deleted lock lingers in memory for indefinite time, because nobody calls
- * cl_lock_put() to finish it.
- *
- * \pre atomic_read(&lock->cll_ref) > 0
- * \pre ergo(cl_lock_nesting(lock) == CNL_TOP,
- *	   cl_lock_nr_mutexed(env) == 1)
- *      [i.e., if a top-lock is deleted, mutices of no other locks can be
- *      held, as deletion of sub-locks might require releasing a top-lock
- *      mutex]
- *
- * \see cl_lock_operations::clo_delete()
- * \see cl_lock::cll_holds
- */
-void cl_lock_delete(const struct lu_env *env, struct cl_lock *lock)
-{
-	LINVRNT(cl_lock_is_mutexed(lock));
-	LINVRNT(cl_lock_invariant(env, lock));
-	LASSERT(ergo(cl_lock_nesting(lock) == CNL_TOP,
-		     cl_lock_nr_mutexed(env) == 1));
-
-	cl_lock_trace(D_DLMTRACE, env, "delete lock", lock);
-	if (lock->cll_holds == 0)
-		cl_lock_delete0(env, lock);
-	else
-		lock->cll_flags |= CLF_DOOMED;
-}
-EXPORT_SYMBOL(cl_lock_delete);
-
-/**
- * Mark lock as irrecoverably failed, and mark it for destruction. This
- * happens when, e.g., server fails to grant a lock to us, or networking
- * time-out happens.
- *
- * \pre atomic_read(&lock->cll_ref) > 0
- *
- * \see clo_lock_delete()
- * \see cl_lock::cll_holds
- */
-void cl_lock_error(const struct lu_env *env, struct cl_lock *lock, int error)
-{
-	LINVRNT(cl_lock_is_mutexed(lock));
-	LINVRNT(cl_lock_invariant(env, lock));
-
-	if (lock->cll_error == 0 && error != 0) {
-		cl_lock_trace(D_DLMTRACE, env, "set lock error", lock);
-		lock->cll_error = error;
-		cl_lock_signal(env, lock);
-		cl_lock_cancel(env, lock);
-		cl_lock_delete(env, lock);
+		if (slice->cls_ops->clo_cancel)
+			slice->cls_ops->clo_cancel(env, slice);
 	}
 }
-EXPORT_SYMBOL(cl_lock_error);
-
-/**
- * Cancels this lock. Notifies layers (bottom-to-top) that the lock is
- * being cancelled, then destroys the lock. If there are holds on the lock,
- * cancellation is postponed until all holds are released.
- *
- * Cancellation notification is delivered to layers at most once.
- *
- * \see cl_lock_operations::clo_cancel()
- * \see cl_lock::cll_holds
- */
-void cl_lock_cancel(const struct lu_env *env, struct cl_lock *lock)
-{
-	LINVRNT(cl_lock_is_mutexed(lock));
-	LINVRNT(cl_lock_invariant(env, lock));
-
-	cl_lock_trace(D_DLMTRACE, env, "cancel lock", lock);
-	if (lock->cll_holds == 0)
-		cl_lock_cancel0(env, lock);
-	else
-		lock->cll_flags |= CLF_CANCELPEND;
-}
 EXPORT_SYMBOL(cl_lock_cancel);
 
 /**
- * Finds an existing lock covering given index and optionally different from a
- * given \a except lock.
+ * Enqueue a lock.
+ * \param anchor if the enqueue has to wait for resources before the lock
+ *		  is granted, @anchor is used to wait for them.
+ * \retval 0  the lock was enqueued successfully
+ * \retval <0 error code
  */
-struct cl_lock *cl_lock_at_pgoff(const struct lu_env *env,
-				 struct cl_object *obj, pgoff_t index,
-				 struct cl_lock *except,
-				 int pending, int canceld)
+int cl_lock_enqueue(const struct lu_env *env, struct cl_io *io,
+		    struct cl_lock *lock, struct cl_sync_io *anchor)
 {
-	struct cl_object_header *head;
-	struct cl_lock	  *scan;
-	struct cl_lock	  *lock;
-	struct cl_lock_descr    *need;
-
-	head = cl_object_header(obj);
-	need = &cl_env_info(env)->clt_descr;
-	lock = NULL;
+	const struct cl_lock_slice *slice;
+	int rc = -ENOSYS;
 
-	need->cld_mode = CLM_READ; /* CLM_READ matches both READ & WRITE, but
-				    * not PHANTOM
-				    */
-	need->cld_start = need->cld_end = index;
-	need->cld_enq_flags = 0;
+	list_for_each_entry(slice, &lock->cll_layers, cls_linkage) {
+		if (!slice->cls_ops->clo_enqueue)
+			continue;
 
-	spin_lock(&head->coh_lock_guard);
-	/* It is fine to match any group lock since there could be only one
-	 * with a unique gid and it conflicts with all other lock modes too
-	 */
-	list_for_each_entry(scan, &head->coh_locks, cll_linkage) {
-		if (scan != except &&
-		    (scan->cll_descr.cld_mode == CLM_GROUP ||
-		    cl_lock_ext_match(&scan->cll_descr, need)) &&
-		    scan->cll_state >= CLS_HELD &&
-		    scan->cll_state < CLS_FREEING &&
-		    /*
-		     * This check is racy as the lock can be canceled right
-		     * after it is done, but this is fine, because page exists
-		     * already.
-		     */
-		    (canceld || !(scan->cll_flags & CLF_CANCELLED)) &&
-		    (pending || !(scan->cll_flags & CLF_CANCELPEND))) {
-			/* Don't increase cs_hit here since this
-			 * is just a helper function.
-			 */
-			cl_lock_get_trust(scan);
-			lock = scan;
+		rc = slice->cls_ops->clo_enqueue(env, slice, io, anchor);
+		if (rc != 0)
 			break;
-		}
 	}
-	spin_unlock(&head->coh_lock_guard);
-	return lock;
+	return rc;
 }
-EXPORT_SYMBOL(cl_lock_at_pgoff);
+EXPORT_SYMBOL(cl_lock_enqueue);
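A layer's clo_enqueue() either completes synchronously or, when given an
anchor, registers one more event for cl_sync_io_wait() to wait on and
signals completion from its callback with cl_sync_io_note(). A purely
hypothetical layer sketch; start_remote_enqueue() and the callback wiring
are assumptions, not real osc/lov code:

static int example_lock_enqueue(const struct lu_env *env,
				const struct cl_lock_slice *slice,
				struct cl_io *io, struct cl_sync_io *anchor)
{
	int rc;

	/* kick off the (possibly asynchronous) server-side enqueue */
	rc = start_remote_enqueue(slice);	/* hypothetical helper */
	if (rc == 0 && anchor)
		/* one more event to wait for; the completion callback is
		 * expected to drop it with cl_sync_io_note() */
		atomic_inc(&anchor->csi_sync_nr);
	return rc;
}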
 
 /**
- * Eliminate all locks for a given object.
- *
- * Caller has to guarantee that no lock is in active use.
- *
- * \param cancel when this is set, cl_locks_prune() cancels locks before
- *	       destroying.
+ * Main high-level entry point of cl_lock interface that finds existing or
+ * enqueues new lock matching given description.
  */
-void cl_locks_prune(const struct lu_env *env, struct cl_object *obj, int cancel)
+int cl_lock_request(const struct lu_env *env, struct cl_io *io,
+		    struct cl_lock *lock)
 {
-	struct cl_object_header *head;
-	struct cl_lock	  *lock;
-
-	head = cl_object_header(obj);
-
-	spin_lock(&head->coh_lock_guard);
-	while (!list_empty(&head->coh_locks)) {
-		lock = container_of(head->coh_locks.next,
-				    struct cl_lock, cll_linkage);
-		cl_lock_get_trust(lock);
-		spin_unlock(&head->coh_lock_guard);
-		lu_ref_add(&lock->cll_reference, "prune", current);
-
-again:
-		cl_lock_mutex_get(env, lock);
-		if (lock->cll_state < CLS_FREEING) {
-			LASSERT(lock->cll_users <= 1);
-			if (unlikely(lock->cll_users == 1)) {
-				struct l_wait_info lwi = { 0 };
-
-				cl_lock_mutex_put(env, lock);
-				l_wait_event(lock->cll_wq,
-					     lock->cll_users == 0,
-					     &lwi);
-				goto again;
-			}
-
-			if (cancel)
-				cl_lock_cancel(env, lock);
-			cl_lock_delete(env, lock);
-		}
-		cl_lock_mutex_put(env, lock);
-		lu_ref_del(&lock->cll_reference, "prune", current);
-		cl_lock_put(env, lock);
-		spin_lock(&head->coh_lock_guard);
-	}
-	spin_unlock(&head->coh_lock_guard);
-}
-EXPORT_SYMBOL(cl_locks_prune);
+	struct cl_sync_io *anchor = NULL;
+	__u32 enq_flags = lock->cll_descr.cld_enq_flags;
+	int rc;
 
-static struct cl_lock *cl_lock_hold_mutex(const struct lu_env *env,
-					  const struct cl_io *io,
-					  const struct cl_lock_descr *need,
-					  const char *scope, const void *source)
-{
-	struct cl_lock *lock;
+	rc = cl_lock_init(env, lock, io);
+	if (rc < 0)
+		return rc;
 
-	while (1) {
-		lock = cl_lock_find(env, io, need);
-		if (IS_ERR(lock))
-			break;
-		cl_lock_mutex_get(env, lock);
-		if (lock->cll_state < CLS_FREEING &&
-		    !(lock->cll_flags & CLF_CANCELLED)) {
-			cl_lock_hold_mod(env, lock, 1);
-			lu_ref_add(&lock->cll_holders, scope, source);
-			lu_ref_add(&lock->cll_reference, scope, source);
-			break;
-		}
-		cl_lock_mutex_put(env, lock);
-		cl_lock_put(env, lock);
+	if ((enq_flags & CEF_ASYNC) && !(enq_flags & CEF_AGL)) {
+		anchor = &cl_env_info(env)->clt_anchor;
+		cl_sync_io_init(anchor, 1, cl_sync_io_end);
 	}
-	return lock;
-}
 
-/**
- * Returns a lock matching \a need description with a reference and a hold on
- * it.
- *
- * This is much like cl_lock_find(), except that cl_lock_hold() additionally
- * guarantees that lock is not in the CLS_FREEING state on return.
- */
-struct cl_lock *cl_lock_hold(const struct lu_env *env, const struct cl_io *io,
-			     const struct cl_lock_descr *need,
-			     const char *scope, const void *source)
-{
-	struct cl_lock *lock;
+	rc = cl_lock_enqueue(env, io, lock, anchor);
 
-	lock = cl_lock_hold_mutex(env, io, need, scope, source);
-	if (!IS_ERR(lock))
-		cl_lock_mutex_put(env, lock);
-	return lock;
-}
-EXPORT_SYMBOL(cl_lock_hold);
+	if (anchor) {
+		int rc2;
 
-/**
- * Main high-level entry point of cl_lock interface that finds existing or
- * enqueues new lock matching given description.
- */
-struct cl_lock *cl_lock_request(const struct lu_env *env, struct cl_io *io,
-				const struct cl_lock_descr *need,
-				const char *scope, const void *source)
-{
-	struct cl_lock       *lock;
-	int		   rc;
-	__u32		 enqflags = need->cld_enq_flags;
+		/* drop the reference count held at initialization time */
+		cl_sync_io_note(env, anchor, 0);
+		rc2 = cl_sync_io_wait(env, anchor, 0);
+		if (rc2 < 0 && rc == 0)
+			rc = rc2;
+	}
 
-	do {
-		lock = cl_lock_hold_mutex(env, io, need, scope, source);
-		if (IS_ERR(lock))
-			break;
+	if (rc < 0)
+		cl_lock_release(env, lock);
 
-		rc = cl_enqueue_locked(env, lock, io, enqflags);
-		if (rc == 0) {
-			if (cl_lock_fits_into(env, lock, need, io)) {
-				if (!(enqflags & CEF_AGL)) {
-					cl_lock_mutex_put(env, lock);
-					cl_lock_lockdep_acquire(env, lock,
-								enqflags);
-					break;
-				}
-				rc = 1;
-			}
-			cl_unuse_locked(env, lock);
-		}
-		cl_lock_trace(D_DLMTRACE, env,
-			      rc <= 0 ? "enqueue failed" : "agl succeed", lock);
-		cl_lock_hold_release(env, lock, scope, source);
-		cl_lock_mutex_put(env, lock);
-		lu_ref_del(&lock->cll_reference, scope, source);
-		cl_lock_put(env, lock);
-		if (rc > 0) {
-			LASSERT(enqflags & CEF_AGL);
-			lock = NULL;
-		} else if (rc != 0) {
-			lock = ERR_PTR(rc);
-		}
-	} while (rc == 0);
-	return lock;
+	return rc;
 }
 EXPORT_SYMBOL(cl_lock_request);
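cl_lock_request() thus bundles the whole sequence: slice initialization,
synchronous or anchor-based asynchronous enqueue, and cleanup on failure, so
a caller only has to pair it with cl_lock_release(). A minimal caller
sketch; the embedding `link->cill_lock` and the descriptor values are
illustrative:

	struct cl_lock *lock = &link->cill_lock; /* hypothetical embedding */
	int rc;

	lock->cll_descr.cld_obj       = obj;
	lock->cll_descr.cld_start     = start;
	lock->cll_descr.cld_end       = end;
	lock->cll_descr.cld_mode      = CLM_WRITE;
	lock->cll_descr.cld_enq_flags = 0;

	rc = cl_lock_request(env, io, lock);
	if (rc < 0)
		return rc;	/* the lock has already been released */
	/* ... the extent is now protected ... */
	cl_lock_release(env, lock);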
 
 /**
- * Adds a hold to a known lock.
- */
-void cl_lock_hold_add(const struct lu_env *env, struct cl_lock *lock,
-		      const char *scope, const void *source)
-{
-	LINVRNT(cl_lock_is_mutexed(lock));
-	LINVRNT(cl_lock_invariant(env, lock));
-	LASSERT(lock->cll_state != CLS_FREEING);
-
-	cl_lock_get(lock);
-	cl_lock_hold_mod(env, lock, 1);
-	lu_ref_add(&lock->cll_holders, scope, source);
-	lu_ref_add(&lock->cll_reference, scope, source);
-}
-EXPORT_SYMBOL(cl_lock_hold_add);
-
-/**
- * Releases a hold and a reference on a lock, on which caller acquired a
- * mutex.
- */
-void cl_lock_unhold(const struct lu_env *env, struct cl_lock *lock,
-		    const char *scope, const void *source)
-{
-	LINVRNT(cl_lock_invariant(env, lock));
-	cl_lock_hold_release(env, lock, scope, source);
-	lu_ref_del(&lock->cll_reference, scope, source);
-	cl_lock_put(env, lock);
-}
-EXPORT_SYMBOL(cl_lock_unhold);
-
-/**
  * Releases a hold and a reference on a lock, obtained by cl_lock_hold().
  */
-void cl_lock_release(const struct lu_env *env, struct cl_lock *lock,
-		     const char *scope, const void *source)
+void cl_lock_release(const struct lu_env *env, struct cl_lock *lock)
 {
-	LINVRNT(cl_lock_invariant(env, lock));
 	cl_lock_trace(D_DLMTRACE, env, "release lock", lock);
-	cl_lock_mutex_get(env, lock);
-	cl_lock_hold_release(env, lock, scope, source);
-	cl_lock_mutex_put(env, lock);
-	lu_ref_del(&lock->cll_reference, scope, source);
-	cl_lock_put(env, lock);
+	cl_lock_cancel(env, lock);
+	cl_lock_fini(env, lock);
 }
 EXPORT_SYMBOL(cl_lock_release);
 
-void cl_lock_user_add(const struct lu_env *env, struct cl_lock *lock)
-{
-	LINVRNT(cl_lock_is_mutexed(lock));
-	LINVRNT(cl_lock_invariant(env, lock));
-
-	cl_lock_used_mod(env, lock, 1);
-}
-EXPORT_SYMBOL(cl_lock_user_add);
-
-void cl_lock_user_del(const struct lu_env *env, struct cl_lock *lock)
-{
-	LINVRNT(cl_lock_is_mutexed(lock));
-	LINVRNT(cl_lock_invariant(env, lock));
-	LASSERT(lock->cll_users > 0);
-
-	cl_lock_used_mod(env, lock, -1);
-	if (lock->cll_users == 0)
-		wake_up_all(&lock->cll_wq);
-}
-EXPORT_SYMBOL(cl_lock_user_del);
-
 const char *cl_lock_mode_name(const enum cl_lock_mode mode)
 {
 	static const char *names[] = {
-		[CLM_PHANTOM] = "P",
 		[CLM_READ]    = "R",
 		[CLM_WRITE]   = "W",
 		[CLM_GROUP]   = "G"
@@ -2061,10 +261,8 @@ void cl_lock_print(const struct lu_env *env, void *cookie,
 		   lu_printer_t printer, const struct cl_lock *lock)
 {
 	const struct cl_lock_slice *slice;
-	(*printer)(env, cookie, "lock@%p[%d %d %d %d %d %08lx] ",
-		   lock, atomic_read(&lock->cll_ref),
-		   lock->cll_state, lock->cll_error, lock->cll_holds,
-		   lock->cll_users, lock->cll_flags);
+
+	(*printer)(env, cookie, "lock@%p", lock);
 	cl_lock_descr_print(env, cookie, printer, &lock->cll_descr);
 	(*printer)(env, cookie, " {\n");
 
@@ -2079,13 +277,3 @@ void cl_lock_print(const struct lu_env *env, void *cookie,
 	(*printer)(env, cookie, "} lock@%p\n", lock);
 }
 EXPORT_SYMBOL(cl_lock_print);
-
-int cl_lock_init(void)
-{
-	return lu_kmem_init(cl_lock_caches);
-}
-
-void cl_lock_fini(void)
-{
-	lu_kmem_fini(cl_lock_caches);
-}
diff --git a/drivers/staging/lustre/lustre/obdclass/cl_object.c b/drivers/staging/lustre/lustre/obdclass/cl_object.c
index 72e6333..395b92c 100644
--- a/drivers/staging/lustre/lustre/obdclass/cl_object.c
+++ b/drivers/staging/lustre/lustre/obdclass/cl_object.c
@@ -44,7 +44,6 @@
  *
  *  i_mutex
  *      PG_locked
- *	  ->coh_lock_guard
  *	  ->coh_attr_guard
  *	  ->ls_guard
  */
@@ -63,8 +62,6 @@
 
 static struct kmem_cache *cl_env_kmem;
 
-/** Lock class of cl_object_header::coh_lock_guard */
-static struct lock_class_key cl_lock_guard_class;
 /** Lock class of cl_object_header::coh_attr_guard */
 static struct lock_class_key cl_attr_guard_class;
 
@@ -79,11 +76,8 @@ int cl_object_header_init(struct cl_object_header *h)
 
 	result = lu_object_header_init(&h->coh_lu);
 	if (result == 0) {
-		spin_lock_init(&h->coh_lock_guard);
 		spin_lock_init(&h->coh_attr_guard);
-		lockdep_set_class(&h->coh_lock_guard, &cl_lock_guard_class);
 		lockdep_set_class(&h->coh_attr_guard, &cl_attr_guard_class);
-		INIT_LIST_HEAD(&h->coh_locks);
 		h->coh_page_bufsize = 0;
 	}
 	return result;
@@ -310,7 +304,7 @@ EXPORT_SYMBOL(cl_conf_set);
 /**
  * Prunes caches of pages and locks for this object.
  */
-void cl_object_prune(const struct lu_env *env, struct cl_object *obj)
+int cl_object_prune(const struct lu_env *env, struct cl_object *obj)
 {
 	struct lu_object_header *top;
 	struct cl_object *o;
@@ -326,10 +320,7 @@ void cl_object_prune(const struct lu_env *env, struct cl_object *obj)
 		}
 	}
 
-	/* TODO: pruning locks will be moved into layers after cl_lock
-	 * simplification is done
-	 */
-	cl_locks_prune(env, obj, 1);
+	return result;
 }
 EXPORT_SYMBOL(cl_object_prune);
 
@@ -342,19 +333,9 @@ EXPORT_SYMBOL(cl_object_prune);
  */
 void cl_object_kill(const struct lu_env *env, struct cl_object *obj)
 {
-	struct cl_object_header *hdr;
-
-	hdr = cl_object_header(obj);
+	struct cl_object_header *hdr = cl_object_header(obj);
 
 	set_bit(LU_OBJECT_HEARD_BANSHEE, &hdr->coh_lu.loh_flags);
-	/*
-	 * Destroy all locks. Object destruction (including cl_inode_fini())
-	 * cannot cancel the locks, because in the case of a local client,
-	 * where client and server share the same thread running
-	 * prune_icache(), this can dead-lock with ldlm_cancel_handler()
-	 * waiting on __wait_on_freeing_inode().
-	 */
-	cl_locks_prune(env, obj, 0);
 }
 EXPORT_SYMBOL(cl_object_kill);
 
@@ -406,11 +387,8 @@ int cl_site_init(struct cl_site *s, struct cl_device *d)
 	result = lu_site_init(&s->cs_lu, &d->cd_lu_dev);
 	if (result == 0) {
 		cache_stats_init(&s->cs_pages, "pages");
-		cache_stats_init(&s->cs_locks, "locks");
 		for (i = 0; i < ARRAY_SIZE(s->cs_pages_state); ++i)
			atomic_set(&s->cs_pages_state[i], 0);
-		for (i = 0; i < ARRAY_SIZE(s->cs_locks_state); ++i)
-			atomic_set(&s->cs_locks_state[i], 0);
 		cl_env_percpu_refill();
 	}
 	return result;
@@ -445,15 +423,6 @@ int cl_site_stats_print(const struct cl_site *site, struct seq_file *m)
 		[CPS_PAGEIN]  = "r",
 		[CPS_FREEING] = "f"
 	};
-	static const char *lstate[] = {
-		[CLS_NEW]       = "n",
-		[CLS_QUEUING]   = "q",
-		[CLS_ENQUEUED]  = "e",
-		[CLS_HELD]      = "h",
-		[CLS_INTRANSIT] = "t",
-		[CLS_CACHED]    = "c",
-		[CLS_FREEING]   = "f"
-	};
 /*
        lookup    hit  total   busy create
 pages: ...... ...... ...... ...... ...... [...... ...... ...... ......]
@@ -467,12 +436,6 @@ locks: ...... ...... ...... ...... ...... [...... ...... ...... ...... ......]
 		seq_printf(m, "%s: %u ", pstate[i],
 			   atomic_read(&site->cs_pages_state[i]));
 	seq_printf(m, "]\n");
-	cache_stats_print(&site->cs_locks, m, 0);
-	seq_printf(m, " [");
-	for (i = 0; i < ARRAY_SIZE(site->cs_locks_state); ++i)
-		seq_printf(m, "%s: %u ", lstate[i],
-			   atomic_read(&site->cs_locks_state[i]));
-	seq_printf(m, "]\n");
 	cache_stats_print(&cl_env_stats, m, 0);
 	seq_printf(m, "\n");
 	return 0;
@@ -1147,12 +1110,6 @@ void cl_stack_fini(const struct lu_env *env, struct cl_device *cl)
 }
 EXPORT_SYMBOL(cl_stack_fini);
 
-int  cl_lock_init(void);
-void cl_lock_fini(void);
-
-int  cl_page_init(void);
-void cl_page_fini(void);
-
 static struct lu_context_key cl_key;
 
 struct cl_thread_info *cl_env_info(const struct lu_env *env)
@@ -1247,22 +1204,13 @@ int cl_global_init(void)
 	if (result)
 		goto out_kmem;
 
-	result = cl_lock_init();
-	if (result)
-		goto out_context;
-
-	result = cl_page_init();
-	if (result)
-		goto out_lock;
-
 	result = cl_env_percpu_init();
 	if (result)
 		/* no cl_env_percpu_fini on error */
-		goto out_lock;
+		goto out_context;
 
 	return 0;
-out_lock:
-	cl_lock_fini();
+
 out_context:
 	lu_context_key_degister(&cl_key);
 out_kmem:
@@ -1278,8 +1226,6 @@ out_store:
 void cl_global_fini(void)
 {
 	cl_env_percpu_fini();
-	cl_lock_fini();
-	cl_page_fini();
 	lu_context_key_degister(&cl_key);
 	lu_kmem_fini(cl_object_caches);
 	cl_env_store_fini();
diff --git a/drivers/staging/lustre/lustre/obdclass/cl_page.c b/drivers/staging/lustre/lustre/obdclass/cl_page.c
index ad7f0ae..8df39ce 100644
--- a/drivers/staging/lustre/lustre/obdclass/cl_page.c
+++ b/drivers/staging/lustre/lustre/obdclass/cl_page.c
@@ -1075,12 +1075,3 @@ void cl_page_slice_add(struct cl_page *page, struct cl_page_slice *slice,
 	slice->cpl_page = page;
 }
 EXPORT_SYMBOL(cl_page_slice_add);
-
-int  cl_page_init(void)
-{
-	return 0;
-}
-
-void cl_page_fini(void)
-{
-}
diff --git a/drivers/staging/lustre/lustre/obdecho/echo_client.c b/drivers/staging/lustre/lustre/obdecho/echo_client.c
index 0d84d04..a752bb4 100644
--- a/drivers/staging/lustre/lustre/obdecho/echo_client.c
+++ b/drivers/staging/lustre/lustre/obdecho/echo_client.c
@@ -171,7 +171,7 @@ struct echo_thread_info {
 
 	struct cl_2queue	eti_queue;
 	struct cl_io	    eti_io;
-	struct cl_lock_descr    eti_descr;
+	struct cl_lock          eti_lock;
 	struct lu_fid	   eti_fid;
 	struct lu_fid		eti_fid2;
 };
@@ -327,26 +327,8 @@ static void echo_lock_fini(const struct lu_env *env,
 	kmem_cache_free(echo_lock_kmem, ecl);
 }
 
-static void echo_lock_delete(const struct lu_env *env,
-			     const struct cl_lock_slice *slice)
-{
-	struct echo_lock *ecl      = cl2echo_lock(slice);
-
-	LASSERT(list_empty(&ecl->el_chain));
-}
-
-static int echo_lock_fits_into(const struct lu_env *env,
-			       const struct cl_lock_slice *slice,
-			       const struct cl_lock_descr *need,
-			       const struct cl_io *unused)
-{
-	return 1;
-}
-
 static struct cl_lock_operations echo_lock_ops = {
 	.clo_fini      = echo_lock_fini,
-	.clo_delete    = echo_lock_delete,
-	.clo_fits_into = echo_lock_fits_into
 };
 
 /** @} echo_lock */
@@ -811,16 +793,7 @@ static void echo_lock_release(const struct lu_env *env,
 {
 	struct cl_lock *clk = echo_lock2cl(ecl);
 
-	cl_lock_get(clk);
-	cl_unuse(env, clk);
-	cl_lock_release(env, clk, "ec enqueue", ecl->el_object);
-	if (!still_used) {
-		cl_lock_mutex_get(env, clk);
-		cl_lock_cancel(env, clk);
-		cl_lock_delete(env, clk);
-		cl_lock_mutex_put(env, clk);
-	}
-	cl_lock_put(env, clk);
+	cl_lock_release(env, clk);
 }
 
 static struct lu_device *echo_device_free(const struct lu_env *env,
@@ -1014,9 +987,11 @@ static int cl_echo_enqueue0(struct lu_env *env, struct echo_object *eco,
 
 	info = echo_env_info(env);
 	io = &info->eti_io;
-	descr = &info->eti_descr;
+	lck = &info->eti_lock;
 	obj = echo_obj2cl(eco);
 
+	memset(lck, 0, sizeof(*lck));
+	descr = &lck->cll_descr;
 	descr->cld_obj   = obj;
 	descr->cld_start = cl_index(obj, start);
 	descr->cld_end   = cl_index(obj, end);
@@ -1024,25 +999,20 @@ static int cl_echo_enqueue0(struct lu_env *env, struct echo_object *eco,
 	descr->cld_enq_flags = enqflags;
 	io->ci_obj = obj;
 
-	lck = cl_lock_request(env, io, descr, "ec enqueue", eco);
-	if (lck) {
+	rc = cl_lock_request(env, io, lck);
+	if (rc == 0) {
 		struct echo_client_obd *ec = eco->eo_dev->ed_ec;
 		struct echo_lock *el;
 
-		rc = cl_wait(env, lck);
-		if (rc == 0) {
-			el = cl2echo_lock(cl_lock_at(lck, &echo_device_type));
-			spin_lock(&ec->ec_lock);
-			if (list_empty(&el->el_chain)) {
-				list_add(&el->el_chain, &ec->ec_locks);
-				el->el_cookie = ++ec->ec_unique;
-			}
-			atomic_inc(&el->el_refcount);
-			*cookie = el->el_cookie;
-			spin_unlock(&ec->ec_lock);
-		} else {
-			cl_lock_release(env, lck, "ec enqueue", current);
+		el = cl2echo_lock(cl_lock_at(lck, &echo_device_type));
+		spin_lock(&ec->ec_lock);
+		if (list_empty(&el->el_chain)) {
+			list_add(&el->el_chain, &ec->ec_locks);
+			el->el_cookie = ++ec->ec_unique;
 		}
+		atomic_inc(&el->el_refcount);
+		*cookie = el->el_cookie;
+		spin_unlock(&ec->ec_lock);
 	}
 	return rc;
 }
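
cl_lock_request() is now synchronous: a zero return means the lock is
granted, and a negative return means the caller holds nothing, so the old
cl_wait()/cl_lock_release() error path disappears. The resulting pattern,
sketched with the names from the hunk above:

	rc = cl_lock_request(env, io, lck);
	if (rc < 0)
		return rc;		/* nothing to release on failure */

	/* ... lock is granted and usable here ... */

	cl_lock_release(env, lck);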
diff --git a/drivers/staging/lustre/lustre/osc/osc_cache.c b/drivers/staging/lustre/lustre/osc/osc_cache.c
index c510659..d01f2a2 100644
--- a/drivers/staging/lustre/lustre/osc/osc_cache.c
+++ b/drivers/staging/lustre/lustre/osc/osc_cache.c
@@ -76,6 +76,8 @@ static inline char *ext_flags(struct osc_extent *ext, char *flags)
 	*buf++ = ext->oe_rw ? 'r' : 'w';
 	if (ext->oe_intree)
 		*buf++ = 'i';
+	if (ext->oe_sync)
+		*buf++ = 'S';
 	if (ext->oe_srvlock)
 		*buf++ = 's';
 	if (ext->oe_hp)
@@ -121,9 +123,13 @@ static const char *oes_strings[] = {
 		__ext->oe_grants, __ext->oe_nr_pages,			      \
 		list_empty_marker(&__ext->oe_pages),			      \
 		waitqueue_active(&__ext->oe_waitq) ? '+' : '-',		      \
-		__ext->oe_osclock, __ext->oe_mppr, __ext->oe_owner,	      \
+		__ext->oe_dlmlock, __ext->oe_mppr, __ext->oe_owner,	      \
 		/* ----- part 4 ----- */				      \
 		## __VA_ARGS__);					      \
+	if (lvl == D_ERROR && __ext->oe_dlmlock)			      \
+		LDLM_ERROR(__ext->oe_dlmlock, "extent: %p\n", __ext);	      \
+	else								      \
+		LDLM_DEBUG(__ext->oe_dlmlock, "extent: %p\n", __ext);	      \
 } while (0)
 
 #undef EASSERTF
@@ -240,20 +246,25 @@ static int osc_extent_sanity_check0(struct osc_extent *ext,
 		goto out;
 	}
 
-	if (!ext->oe_osclock && ext->oe_grants > 0) {
+	if (ext->oe_sync && ext->oe_grants > 0) {
 		rc = 90;
 		goto out;
 	}
 
-	if (ext->oe_osclock) {
-		struct cl_lock_descr *descr;
+	if (ext->oe_dlmlock) {
+		struct ldlm_extent *extent;
 
-		descr = &ext->oe_osclock->cll_descr;
-		if (!(descr->cld_start <= ext->oe_start &&
-		      descr->cld_end >= ext->oe_max_end)) {
+		extent = &ext->oe_dlmlock->l_policy_data.l_extent;
+		if (!(extent->start <= cl_offset(osc2cl(obj), ext->oe_start) &&
+		      extent->end >= cl_offset(osc2cl(obj), ext->oe_max_end))) {
 			rc = 100;
 			goto out;
 		}
+
+		if (!(ext->oe_dlmlock->l_granted_mode & (LCK_PW | LCK_GROUP))) {
+			rc = 102;
+			goto out;
+		}
 	}
 
 	if (ext->oe_nr_pages > ext->oe_mppr) {
@@ -359,7 +370,7 @@ static struct osc_extent *osc_extent_alloc(struct osc_object *obj)
 	ext->oe_state = OES_INV;
 	INIT_LIST_HEAD(&ext->oe_pages);
 	init_waitqueue_head(&ext->oe_waitq);
-	ext->oe_osclock = NULL;
+	ext->oe_dlmlock = NULL;
 
 	return ext;
 }
@@ -385,9 +396,11 @@ static void osc_extent_put(const struct lu_env *env, struct osc_extent *ext)
 		LASSERT(ext->oe_state == OES_INV);
 		LASSERT(!ext->oe_intree);
 
-		if (ext->oe_osclock) {
-			cl_lock_put(env, ext->oe_osclock);
-			ext->oe_osclock = NULL;
+		if (ext->oe_dlmlock) {
+			lu_ref_del(&ext->oe_dlmlock->l_reference,
+				   "osc_extent", ext);
+			LDLM_LOCK_PUT(ext->oe_dlmlock);
+			ext->oe_dlmlock = NULL;
 		}
 		osc_extent_free(ext);
 	}
@@ -543,7 +556,7 @@ static int osc_extent_merge(const struct lu_env *env, struct osc_extent *cur,
 	if (cur->oe_max_end != victim->oe_max_end)
 		return -ERANGE;
 
-	LASSERT(cur->oe_osclock == victim->oe_osclock);
+	LASSERT(cur->oe_dlmlock == victim->oe_dlmlock);
 	ppc_bits = osc_cli(obj)->cl_chunkbits - PAGE_CACHE_SHIFT;
 	chunk_start = cur->oe_start >> ppc_bits;
 	chunk_end = cur->oe_end >> ppc_bits;
@@ -624,10 +637,10 @@ static inline int overlapped(struct osc_extent *ex1, struct osc_extent *ex2)
 static struct osc_extent *osc_extent_find(const struct lu_env *env,
 					  struct osc_object *obj, pgoff_t index,
 					  int *grants)
-
 {
 	struct client_obd *cli = osc_cli(obj);
-	struct cl_lock *lock;
+	struct osc_lock   *olck;
+	struct cl_lock_descr *descr;
 	struct osc_extent *cur;
 	struct osc_extent *ext;
 	struct osc_extent *conflict = NULL;
@@ -644,8 +657,12 @@ static struct osc_extent *osc_extent_find(const struct lu_env *env,
 	if (!cur)
 		return ERR_PTR(-ENOMEM);
 
-	lock = cl_lock_at_pgoff(env, osc2cl(obj), index, NULL, 1, 0);
-	LASSERT(lock->cll_descr.cld_mode >= CLM_WRITE);
+	olck = osc_env_io(env)->oi_write_osclock;
+	LASSERTF(olck, "page %lu is not covered by lock\n", index);
+	LASSERT(olck->ols_state == OLS_GRANTED);
+
+	descr = &olck->ols_cl.cls_lock->cll_descr;
+	LASSERT(descr->cld_mode >= CLM_WRITE);
 
 	LASSERT(cli->cl_chunkbits >= PAGE_CACHE_SHIFT);
 	ppc_bits = cli->cl_chunkbits - PAGE_CACHE_SHIFT;
@@ -657,19 +674,23 @@ static struct osc_extent *osc_extent_find(const struct lu_env *env,
 	max_pages = cli->cl_max_pages_per_rpc;
 	LASSERT((max_pages & ~chunk_mask) == 0);
 	max_end = index - (index % max_pages) + max_pages - 1;
-	max_end = min_t(pgoff_t, max_end, lock->cll_descr.cld_end);
+	max_end = min_t(pgoff_t, max_end, descr->cld_end);
 
 	/* initialize new extent by parameters so far */
 	cur->oe_max_end = max_end;
 	cur->oe_start = index & chunk_mask;
 	cur->oe_end = ((index + ~chunk_mask + 1) & chunk_mask) - 1;
-	if (cur->oe_start < lock->cll_descr.cld_start)
-		cur->oe_start = lock->cll_descr.cld_start;
+	if (cur->oe_start < descr->cld_start)
+		cur->oe_start = descr->cld_start;
 	if (cur->oe_end > max_end)
 		cur->oe_end = max_end;
-	cur->oe_osclock = lock;
 	cur->oe_grants = 0;
 	cur->oe_mppr = max_pages;
+	if (olck->ols_dlmlock) {
+		LASSERT(olck->ols_hold);
+		cur->oe_dlmlock = LDLM_LOCK_GET(olck->ols_dlmlock);
+		lu_ref_add(&olck->ols_dlmlock->l_reference, "osc_extent", cur);
+	}
 
 	/* grants has been allocated by caller */
 	LASSERTF(*grants >= chunksize + cli->cl_extent_tax,
@@ -691,7 +712,7 @@ restart:
 			break;
 
 		/* if covering by different locks, no chance to match */
-		if (lock != ext->oe_osclock) {
+		if (olck->ols_dlmlock != ext->oe_dlmlock) {
 			EASSERTF(!overlapped(ext, cur), ext,
 				 EXTSTR"\n", EXTPARA(cur));
 
@@ -795,7 +816,7 @@ restart:
 	if (found) {
 		LASSERT(!conflict);
 		if (!IS_ERR(found)) {
-			LASSERT(found->oe_osclock == cur->oe_osclock);
+			LASSERT(found->oe_dlmlock == cur->oe_dlmlock);
 			OSC_EXTENT_DUMP(D_CACHE, found,
 					"found caching ext for %lu.\n", index);
 		}
@@ -810,7 +831,7 @@ restart:
 		found = osc_extent_hold(cur);
 		osc_extent_insert(obj, cur);
 		OSC_EXTENT_DUMP(D_CACHE, cur, "add into tree %lu/%lu.\n",
-				index, lock->cll_descr.cld_end);
+				index, descr->cld_end);
 	}
 	osc_object_unlock(obj);
 
@@ -2630,6 +2651,7 @@ int osc_queue_sync_pages(const struct lu_env *env, struct osc_object *obj,
 	}
 
 	ext->oe_rw = !!(cmd & OBD_BRW_READ);
+	ext->oe_sync = 1;
 	ext->oe_urgent = 1;
 	ext->oe_start = start;
 	ext->oe_end = ext->oe_max_end = end;
@@ -3087,27 +3109,27 @@ static int check_and_discard_cb(const struct lu_env *env, struct cl_io *io,
 				struct osc_page *ops, void *cbdata)
 {
 	struct osc_thread_info *info = osc_env_info(env);
-	struct cl_lock *lock = cbdata;
+	struct osc_object *osc = cbdata;
 	pgoff_t index;
 
 	index = osc_index(ops);
 	if (index >= info->oti_fn_index) {
-		struct cl_lock *tmp;
+		struct ldlm_lock *tmp;
 		struct cl_page *page = ops->ops_cl.cpl_page;
 
 		/* refresh non-overlapped index */
-		tmp = cl_lock_at_pgoff(env, lock->cll_descr.cld_obj, index,
-				       lock, 1, 0);
+		tmp = osc_dlmlock_at_pgoff(env, osc, index, 0, 0);
 		if (tmp) {
+			__u64 end = tmp->l_policy_data.l_extent.end;
 			/* Cache the first-non-overlapped index so as to skip
-			 * all pages within [index, oti_fn_index). This
-			 * is safe because if tmp lock is canceled, it will
-			 * discard these pages.
+			 * all pages within [index, oti_fn_index). This is safe
+			 * because if tmp lock is canceled, it will discard
+			 * these pages.
 			 */
-			info->oti_fn_index = tmp->cll_descr.cld_end + 1;
-			if (tmp->cll_descr.cld_end == CL_PAGE_EOF)
+			info->oti_fn_index = cl_index(osc2cl(osc), end + 1);
+			if (end == OBD_OBJECT_EOF)
 				info->oti_fn_index = CL_PAGE_EOF;
-			cl_lock_put(env, tmp);
+			LDLM_LOCK_PUT(tmp);
 		} else if (cl_page_own(env, io, page) == 0) {
 			/* discard the page */
 			cl_page_discard(env, io, page);
@@ -3125,11 +3147,8 @@ static int discard_cb(const struct lu_env *env, struct cl_io *io,
 		      struct osc_page *ops, void *cbdata)
 {
 	struct osc_thread_info *info = osc_env_info(env);
-	struct cl_lock *lock = cbdata;
 	struct cl_page *page = ops->ops_cl.cpl_page;
 
-	LASSERT(lock->cll_descr.cld_mode >= CLM_WRITE);
-
 	/* page is top page. */
 	info->oti_next_index = osc_index(ops) + 1;
 	if (cl_page_own(env, io, page) == 0) {
@@ -3154,30 +3173,27 @@ static int discard_cb(const struct lu_env *env, struct cl_io *io,
  * If error happens on any step, the process continues anyway (the reasoning
  * behind this being that lock cancellation cannot be delayed indefinitely).
  */
-int osc_lock_discard_pages(const struct lu_env *env, struct osc_lock *ols)
+int osc_lock_discard_pages(const struct lu_env *env, struct osc_object *osc,
+			   pgoff_t start, pgoff_t end, enum cl_lock_mode mode)
 {
 	struct osc_thread_info *info = osc_env_info(env);
 	struct cl_io *io = &info->oti_io;
-	struct cl_object *osc = ols->ols_cl.cls_obj;
-	struct cl_lock *lock = ols->ols_cl.cls_lock;
-	struct cl_lock_descr *descr = &lock->cll_descr;
 	osc_page_gang_cbt cb;
 	int res;
 	int result;
 
-	io->ci_obj = cl_object_top(osc);
+	io->ci_obj = cl_object_top(osc2cl(osc));
 	io->ci_ignore_layout = 1;
 	result = cl_io_init(env, io, CIT_MISC, io->ci_obj);
 	if (result != 0)
 		goto out;
 
-	cb = descr->cld_mode == CLM_READ ? check_and_discard_cb : discard_cb;
-	info->oti_fn_index = info->oti_next_index = descr->cld_start;
+	cb = mode == CLM_READ ? check_and_discard_cb : discard_cb;
+	info->oti_fn_index = info->oti_next_index = start;
 	do {
-		res = osc_page_gang_lookup(env, io, cl2osc(osc),
-					   info->oti_next_index, descr->cld_end,
-					   cb, (void *)lock);
-		if (info->oti_next_index > descr->cld_end)
+		res = osc_page_gang_lookup(env, io, osc,
+					   info->oti_next_index, end, cb, osc);
+		if (info->oti_next_index > end)
 			break;
 
 		if (res == CLP_GANG_RESCHED)
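
Worth calling out: an extent now pins its ldlm_lock directly instead of a
cl_lock, so the reference taken in osc_extent_find() has to be dropped
symmetrically when the last extent reference goes away. The pairing,
condensed from the two hunks above (a sketch, not new code):

	/* attach, in osc_extent_find() */
	cur->oe_dlmlock = LDLM_LOCK_GET(olck->ols_dlmlock);
	lu_ref_add(&cur->oe_dlmlock->l_reference, "osc_extent", cur);

	/* detach, in osc_extent_put() on the final put */
	lu_ref_del(&ext->oe_dlmlock->l_reference, "osc_extent", ext);
	LDLM_LOCK_PUT(ext->oe_dlmlock);
	ext->oe_dlmlock = NULL;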
diff --git a/drivers/staging/lustre/lustre/osc/osc_cl_internal.h b/drivers/staging/lustre/lustre/osc/osc_cl_internal.h
index bb34b53a..c7a69e4 100644
--- a/drivers/staging/lustre/lustre/osc/osc_cl_internal.h
+++ b/drivers/staging/lustre/lustre/osc/osc_cl_internal.h
@@ -68,6 +68,9 @@ struct osc_io {
 	struct cl_io_slice oi_cl;
 	/** true if this io is lockless. */
 	int		oi_lockless;
+	/** how many LRU pages are reserved for this IO */
+	int oi_lru_reserved;
+
 	/** active extents, we know how many bytes is going to be written,
 	 * so having an active extent will prevent it from being fragmented
 	 */
@@ -77,8 +80,8 @@ struct osc_io {
 	 */
 	struct osc_extent *oi_trunc;
 
-	int oi_lru_reserved;
-
+	/** write osc_lock for this IO, used by osc_extent_find(). */
+	struct osc_lock   *oi_write_osclock;
 	struct obd_info    oi_info;
 	struct obdo	oi_oa;
 	struct osc_async_cbargs {
@@ -117,6 +120,7 @@ struct osc_thread_info {
 	 */
 	pgoff_t			oti_next_index;
 	pgoff_t			oti_fn_index; /* first non-overlapped index */
+	struct cl_sync_io	oti_anchor;
 };
 
 struct osc_object {
@@ -173,6 +177,10 @@ struct osc_object {
 	struct radix_tree_root	oo_tree;
 	spinlock_t		oo_tree_lock;
 	unsigned long		oo_npages;
+
+	/* Protect osc_lock this osc_object has */
+	spinlock_t		oo_ol_spin;
+	struct list_head	oo_ol_list;
 };
 
 static inline void osc_object_lock(struct osc_object *obj)
@@ -212,8 +220,6 @@ enum osc_lock_state {
 	OLS_ENQUEUED,
 	OLS_UPCALL_RECEIVED,
 	OLS_GRANTED,
-	OLS_RELEASED,
-	OLS_BLOCKED,
 	OLS_CANCELLED
 };
 
@@ -222,10 +228,8 @@ enum osc_lock_state {
  *
  * Interaction with DLM.
  *
- * CLIO enqueues all DLM locks through ptlrpcd (that is, in "async" mode).
- *
  * Once receive upcall is invoked, osc_lock remembers a handle of DLM lock in
- * osc_lock::ols_handle and a pointer to that lock in osc_lock::ols_lock.
+ * osc_lock::ols_handle and a pointer to that lock in osc_lock::ols_dlmlock.
  *
  * This pointer is protected through a reference, acquired by
  * osc_lock_upcall0(). Also, an additional reference is acquired by
@@ -263,16 +267,27 @@ enum osc_lock_state {
  */
 struct osc_lock {
 	struct cl_lock_slice     ols_cl;
+	/** Internal lock to protect states, etc. */
+	spinlock_t		ols_lock;
+	/** Owner sleeps on this channel for state change */
+	struct cl_sync_io	*ols_owner;
+	/** waiting list for this lock to be cancelled */
+	struct list_head	ols_waiting_list;
+	/** wait entry of ols_waiting_list */
+	struct list_head	ols_wait_entry;
+	/** list entry for osc_object::oo_ol_list */
+	struct list_head	ols_nextlock_oscobj;
+
 	/** underlying DLM lock */
-	struct ldlm_lock	*ols_lock;
-	/** lock value block */
-	struct ost_lvb	   ols_lvb;
+	struct ldlm_lock	*ols_dlmlock;
 	/** DLM flags with which osc_lock::ols_lock was enqueued */
 	__u64		    ols_flags;
 	/** osc_lock::ols_lock handle */
 	struct lustre_handle     ols_handle;
 	struct ldlm_enqueue_info ols_einfo;
 	enum osc_lock_state      ols_state;
+	/** lock value block */
+	struct ost_lvb		ols_lvb;
 
 	/**
 	 * true, if ldlm_lock_addref() was called against
@@ -303,16 +318,6 @@ struct osc_lock {
 	 */
 				 ols_locklessable:1,
 	/**
-	 * set by osc_lock_use() to wait until blocking AST enters into
-	 * osc_ldlm_blocking_ast0(), so that cl_lock mutex can be used for
-	 * further synchronization.
-	 */
-				 ols_ast_wait:1,
-	/**
-	 * If the data of this lock has been flushed to server side.
-	 */
-				 ols_flush:1,
-	/**
 	 * if set, the osc_lock is a glimpse lock. For glimpse locks, we treat
	 * the ENAVAIL error as tolerable; this makes the upper logic happy
	 * to wait for glimpse locks to all OSTs to complete.
@@ -325,15 +330,6 @@ struct osc_lock {
 	 * For async glimpse lock.
 	 */
 				 ols_agl:1;
-	/**
-	 * IO that owns this lock. This field is used for a dead-lock
-	 * avoidance by osc_lock_enqueue_wait().
-	 *
-	 * XXX: unfortunately, the owner of a osc_lock is not unique,
-	 * the lock may have multiple users, if the lock is granted and
-	 * then matched.
-	 */
-	struct osc_io	   *ols_owner;
 };
 
 /**
@@ -627,6 +623,8 @@ struct osc_extent {
 	unsigned int       oe_intree:1,
 	/** 0 is write, 1 is read */
 			   oe_rw:1,
+	/** sync extent, queued by osc_queue_sync_pages() */
+				oe_sync:1,
 			   oe_srvlock:1,
 			   oe_memalloc:1,
 	/** an ACTIVE extent is going to be truncated, so when this extent
@@ -675,7 +673,7 @@ struct osc_extent {
 	 */
 	wait_queue_head_t	oe_waitq;
 	/** lock covering this extent */
-	struct cl_lock    *oe_osclock;
+	struct ldlm_lock	*oe_dlmlock;
 	/** terminator of this extent. Must be true if this extent is in IO. */
 	struct task_struct	*oe_owner;
 	/** return value of writeback. If somebody is waiting for this extent,
@@ -690,14 +688,14 @@ int osc_extent_finish(const struct lu_env *env, struct osc_extent *ext,
 		      int sent, int rc);
 void osc_extent_release(const struct lu_env *env, struct osc_extent *ext);
 
-int osc_lock_discard_pages(const struct lu_env *env, struct osc_lock *lock);
+int osc_lock_discard_pages(const struct lu_env *env, struct osc_object *osc,
+			   pgoff_t start, pgoff_t end, enum cl_lock_mode mode);
 
 typedef int (*osc_page_gang_cbt)(const struct lu_env *, struct cl_io *,
 				 struct osc_page *, void *);
 int osc_page_gang_lookup(const struct lu_env *env, struct cl_io *io,
 			 struct osc_object *osc, pgoff_t start, pgoff_t end,
 			 osc_page_gang_cbt cb, void *cbdata);
-
 /** @} osc */
 
 #endif /* OSC_CL_INTERNAL_H */
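
ols_owner changing from an osc_io back-pointer to a cl_sync_io anchor is
what lets the enqueue path sleep without the old cl_lock mutex: the
enqueuing thread parks on the anchor and osc_lock_upcall() wakes it with
cl_sync_io_note(). A rough sketch of the waiter side, assuming the
generalized cl_sync_io interface introduced earlier in this series (exact
signatures may differ):

	/* given an osc_lock *oscl about to be enqueued */
	struct cl_sync_io *anchor = &osc_env_info(env)->oti_anchor;
	int rc;

	cl_sync_io_init(anchor, 1, cl_sync_io_end);
	oscl->ols_owner = anchor;

	/* ... osc_enqueue_base() fires the RPC; on completion the upcall
	 * ends with cl_sync_io_note(env, oscl->ols_owner, rc) ... */

	rc = cl_sync_io_wait(env, anchor, 0);
	oscl->ols_owner = NULL;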
diff --git a/drivers/staging/lustre/lustre/osc/osc_internal.h b/drivers/staging/lustre/lustre/osc/osc_internal.h
index b7fb01a..cf9f8b7 100644
--- a/drivers/staging/lustre/lustre/osc/osc_internal.h
+++ b/drivers/staging/lustre/lustre/osc/osc_internal.h
@@ -108,12 +108,14 @@ void osc_update_next_shrink(struct client_obd *cli);
 
 extern struct ptlrpc_request_set *PTLRPCD_SET;
 
+typedef int (*osc_enqueue_upcall_f)(void *cookie, struct lustre_handle *lockh,
+				    int rc);
+
 int osc_enqueue_base(struct obd_export *exp, struct ldlm_res_id *res_id,
 		     __u64 *flags, ldlm_policy_data_t *policy,
 		     struct ost_lvb *lvb, int kms_valid,
-		     obd_enqueue_update_f upcall,
+		     osc_enqueue_upcall_f upcall,
 		     void *cookie, struct ldlm_enqueue_info *einfo,
-		     struct lustre_handle *lockh,
 		     struct ptlrpc_request_set *rqset, int async, int agl);
 int osc_cancel_base(struct lustre_handle *lockh, __u32 mode);
 
@@ -140,7 +142,6 @@ int osc_lru_shrink(const struct lu_env *env, struct client_obd *cli,
 		   int target, bool force);
 int osc_lru_reclaim(struct client_obd *cli);
 
-extern spinlock_t osc_ast_guard;
 unsigned long osc_ldlm_weigh_ast(struct ldlm_lock *dlmlock);
 
 int osc_setup(struct obd_device *obd, struct lustre_cfg *lcfg);
@@ -199,5 +200,8 @@ int osc_quotactl(struct obd_device *unused, struct obd_export *exp,
 int osc_quotacheck(struct obd_device *unused, struct obd_export *exp,
 		   struct obd_quotactl *oqctl);
 int osc_quota_poll_check(struct obd_export *exp, struct if_quotacheck *qchk);
+struct ldlm_lock *osc_dlmlock_at_pgoff(const struct lu_env *env,
+				       struct osc_object *obj, pgoff_t index,
+				       int pending, int canceling);
 
 #endif /* OSC_INTERNAL_H */
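
The new osc_enqueue_upcall_f typedef replaces obd_enqueue_update_f so that
the granted lock is handed back by handle instead of through a lockh
argument to osc_enqueue_base(). An illustrative no-op implementation
matching the typedef (hypothetical, for shape only; osc_lock_upcall() below
is the real consumer):

static int example_enqueue_upcall(void *cookie, struct lustre_handle *lockh,
				  int errcode)
{
	/* cookie is the value passed to osc_enqueue_base(); lockh names
	 * the granted DLM lock when errcode is ELDLM_OK (or
	 * ELDLM_LOCK_MATCHED for a matched lock).
	 */
	return ldlm_error2errno(errcode);
}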
diff --git a/drivers/staging/lustre/lustre/osc/osc_io.c b/drivers/staging/lustre/lustre/osc/osc_io.c
index 1ae8a22..cf7743d 100644
--- a/drivers/staging/lustre/lustre/osc/osc_io.c
+++ b/drivers/staging/lustre/lustre/osc/osc_io.c
@@ -354,6 +354,7 @@ static void osc_io_rw_iter_fini(const struct lu_env *env,
 		atomic_add(oio->oi_lru_reserved, cli->cl_lru_left);
 		oio->oi_lru_reserved = 0;
 	}
+	oio->oi_write_osclock = NULL;
 }
 
 static int osc_io_fault_start(const struct lu_env *env,
@@ -751,8 +752,7 @@ static void osc_req_attr_set(const struct lu_env *env,
 	struct lov_oinfo *oinfo;
 	struct cl_req *clerq;
 	struct cl_page *apage; /* _some_ page in @clerq */
-	struct cl_lock *lock;  /* _some_ lock protecting @apage */
-	struct osc_lock *olck;
+	struct ldlm_lock *lock;  /* _some_ lock protecting @apage */
 	struct osc_page *opg;
 	struct obdo *oa;
 	struct ost_lvb *lvb;
@@ -782,38 +782,37 @@ static void osc_req_attr_set(const struct lu_env *env,
 		oa->o_valid |= OBD_MD_FLID;
 	}
 	if (flags & OBD_MD_FLHANDLE) {
-		struct cl_object *subobj;
 
 		clerq = slice->crs_req;
 		LASSERT(!list_empty(&clerq->crq_pages));
 		apage = container_of(clerq->crq_pages.next,
 				     struct cl_page, cp_flight);
 		opg = osc_cl_page_osc(apage, NULL);
-		subobj = opg->ops_cl.cpl_obj;
-		lock = cl_lock_at_pgoff(env, subobj, osc_index(opg),
-					NULL, 1, 1);
-		if (!lock) {
-			struct cl_object_header *head;
-			struct cl_lock *scan;
-
-			head = cl_object_header(subobj);
-			list_for_each_entry(scan, &head->coh_locks, cll_linkage)
-				CL_LOCK_DEBUG(D_ERROR, env, scan,
-					      "no cover page!\n");
-			CL_PAGE_DEBUG(D_ERROR, env, apage,
-				      "dump uncover page!\n");
+		lock = osc_dlmlock_at_pgoff(env, cl2osc(obj), osc_index(opg),
+					    1, 1);
+		if (!lock && !opg->ops_srvlock) {
+			struct ldlm_resource *res;
+			struct ldlm_res_id *resname;
+
+			CL_PAGE_DEBUG(D_ERROR, env, apage, "uncovered page!\n");
+
+			resname = &osc_env_info(env)->oti_resname;
+			ostid_build_res_name(&oinfo->loi_oi, resname);
+			res = ldlm_resource_get(
+				osc_export(cl2osc(obj))->exp_obd->obd_namespace,
+				NULL, resname, LDLM_EXTENT, 0);
+			ldlm_resource_dump(D_ERROR, res);
+
 			dump_stack();
 			LBUG();
 		}
 
-		olck = osc_lock_at(lock);
-		LASSERT(ergo(opg->ops_srvlock, !olck->ols_lock));
 		/* check for lockless io. */
-		if (olck->ols_lock) {
-			oa->o_handle = olck->ols_lock->l_remote_handle;
+		if (lock) {
+			oa->o_handle = lock->l_remote_handle;
 			oa->o_valid |= OBD_MD_FLHANDLE;
+			LDLM_LOCK_PUT(lock);
 		}
-		cl_lock_put(env, lock);
 	}
 }
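
Note the new contract in osc_req_attr_set(): osc_dlmlock_at_pgoff() returns
a referenced ldlm_lock (or NULL for a lockless page), so every successful
lookup must be paired with an LDLM_LOCK_PUT(). Condensed from the hunk
above:

	struct ldlm_lock *lock;

	lock = osc_dlmlock_at_pgoff(env, osc, index, 1, 1);
	if (lock) {
		oa->o_handle = lock->l_remote_handle;
		oa->o_valid |= OBD_MD_FLHANDLE;
		LDLM_LOCK_PUT(lock);	/* drop the lookup reference */
	}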
 
diff --git a/drivers/staging/lustre/lustre/osc/osc_lock.c b/drivers/staging/lustre/lustre/osc/osc_lock.c
index 978b6ea..68c5013 100644
--- a/drivers/staging/lustre/lustre/osc/osc_lock.c
+++ b/drivers/staging/lustre/lustre/osc/osc_lock.c
@@ -61,7 +61,6 @@ static const struct cl_lock_operations osc_lock_ops;
 static const struct cl_lock_operations osc_lock_lockless_ops;
 static void osc_lock_to_lockless(const struct lu_env *env,
 				 struct osc_lock *ols, int force);
-static int osc_lock_has_pages(struct osc_lock *olck);
 
 int osc_lock_is_lockless(const struct osc_lock *olck)
 {
@@ -89,11 +88,11 @@ static struct ldlm_lock *osc_handle_ptr(struct lustre_handle *handle)
 static int osc_lock_invariant(struct osc_lock *ols)
 {
 	struct ldlm_lock *lock = osc_handle_ptr(&ols->ols_handle);
-	struct ldlm_lock *olock = ols->ols_lock;
+	struct ldlm_lock *olock = ols->ols_dlmlock;
 	int handle_used = lustre_handle_is_used(&ols->ols_handle);
 
 	if (ergo(osc_lock_is_lockless(ols),
-		 ols->ols_locklessable && !ols->ols_lock))
+		 ols->ols_locklessable && !ols->ols_dlmlock))
 		return 1;
 
 	/*
@@ -110,7 +109,7 @@ static int osc_lock_invariant(struct osc_lock *ols)
 		  ergo(!lock, !olock)))
 		return 0;
 	/*
-	 * Check that ->ols_handle and ->ols_lock are consistent, but
+	 * Check that ->ols_handle and ->ols_dlmlock are consistent, but
 	 * take into account that they are set at the different time.
 	 */
 	if (!ergo(ols->ols_state == OLS_CANCELLED,
@@ -137,115 +136,13 @@ static int osc_lock_invariant(struct osc_lock *ols)
  *
  */
 
-/**
- * Breaks a link between osc_lock and dlm_lock.
- */
-static void osc_lock_detach(const struct lu_env *env, struct osc_lock *olck)
-{
-	struct ldlm_lock *dlmlock;
-
-	spin_lock(&osc_ast_guard);
-	dlmlock = olck->ols_lock;
-	if (!dlmlock) {
-		spin_unlock(&osc_ast_guard);
-		return;
-	}
-
-	olck->ols_lock = NULL;
-	/* wb(); --- for all who checks (ols->ols_lock != NULL) before
-	 * call to osc_lock_detach()
-	 */
-	dlmlock->l_ast_data = NULL;
-	olck->ols_handle.cookie = 0ULL;
-	spin_unlock(&osc_ast_guard);
-
-	lock_res_and_lock(dlmlock);
-	if (dlmlock->l_granted_mode == dlmlock->l_req_mode) {
-		struct cl_object *obj = olck->ols_cl.cls_obj;
-		struct cl_attr *attr = &osc_env_info(env)->oti_attr;
-		__u64 old_kms;
-
-		cl_object_attr_lock(obj);
-		/* Must get the value under the lock to avoid possible races. */
-		old_kms = cl2osc(obj)->oo_oinfo->loi_kms;
-		/* Update the kms. Need to loop all granted locks.
-		 * Not a problem for the client
-		 */
-		attr->cat_kms = ldlm_extent_shift_kms(dlmlock, old_kms);
-
-		cl_object_attr_set(env, obj, attr, CAT_KMS);
-		cl_object_attr_unlock(obj);
-	}
-	unlock_res_and_lock(dlmlock);
-
-	/* release a reference taken in osc_lock_upcall0(). */
-	LASSERT(olck->ols_has_ref);
-	lu_ref_del(&dlmlock->l_reference, "osc_lock", olck);
-	LDLM_LOCK_RELEASE(dlmlock);
-	olck->ols_has_ref = 0;
-}
-
-static int osc_lock_unhold(struct osc_lock *ols)
-{
-	int result = 0;
-
-	if (ols->ols_hold) {
-		ols->ols_hold = 0;
-		result = osc_cancel_base(&ols->ols_handle,
-					 ols->ols_einfo.ei_mode);
-	}
-	return result;
-}
-
-static int osc_lock_unuse(const struct lu_env *env,
-			  const struct cl_lock_slice *slice)
-{
-	struct osc_lock *ols = cl2osc_lock(slice);
-
-	LINVRNT(osc_lock_invariant(ols));
-
-	switch (ols->ols_state) {
-	case OLS_NEW:
-		LASSERT(!ols->ols_hold);
-		LASSERT(ols->ols_agl);
-		return 0;
-	case OLS_UPCALL_RECEIVED:
-		osc_lock_unhold(ols);
-	case OLS_ENQUEUED:
-		LASSERT(!ols->ols_hold);
-		osc_lock_detach(env, ols);
-		ols->ols_state = OLS_NEW;
-		return 0;
-	case OLS_GRANTED:
-		LASSERT(!ols->ols_glimpse);
-		LASSERT(ols->ols_hold);
-		/*
-		 * Move lock into OLS_RELEASED state before calling
-		 * osc_cancel_base() so that possible synchronous cancellation
-		 * sees that lock is released.
-		 */
-		ols->ols_state = OLS_RELEASED;
-		return osc_lock_unhold(ols);
-	default:
-		CERROR("Impossible state: %d\n", ols->ols_state);
-		LBUG();
-	}
-}
-
 static void osc_lock_fini(const struct lu_env *env,
 			  struct cl_lock_slice *slice)
 {
 	struct osc_lock *ols = cl2osc_lock(slice);
 
 	LINVRNT(osc_lock_invariant(ols));
-	/*
-	 * ->ols_hold can still be true at this point if, for example, a
-	 * thread that requested a lock was killed (and released a reference
-	 * to the lock), before reply from a server was received. In this case
-	 * lock is destroyed immediately after upcall.
-	 */
-	osc_lock_unhold(ols);
-	LASSERT(!ols->ols_lock);
+	LASSERT(!ols->ols_dlmlock);
 
 	kmem_cache_free(osc_lock_kmem, ols);
 }
@@ -272,55 +169,12 @@ static __u64 osc_enq2ldlm_flags(__u32 enqflags)
 		result |= LDLM_FL_HAS_INTENT;
 	if (enqflags & CEF_DISCARD_DATA)
 		result |= LDLM_FL_AST_DISCARD_DATA;
+	if (enqflags & CEF_PEEK)
+		result |= LDLM_FL_TEST_LOCK;
 	return result;
 }
 
 /**
- * Global spin-lock protecting consistency of ldlm_lock::l_ast_data
- * pointers. Initialized in osc_init().
- */
-spinlock_t osc_ast_guard;
-
-static struct osc_lock *osc_ast_data_get(struct ldlm_lock *dlm_lock)
-{
-	struct osc_lock *olck;
-
-	lock_res_and_lock(dlm_lock);
-	spin_lock(&osc_ast_guard);
-	olck = dlm_lock->l_ast_data;
-	if (olck) {
-		struct cl_lock *lock = olck->ols_cl.cls_lock;
-		/*
-		 * If osc_lock holds a reference on ldlm lock, return it even
-		 * when cl_lock is in CLS_FREEING state. This way
-		 *
-		 *	 osc_ast_data_get(dlmlock) == NULL
-		 *
-		 * guarantees that all osc references on dlmlock were
-		 * released. osc_dlm_blocking_ast0() relies on that.
-		 */
-		if (lock->cll_state < CLS_FREEING || olck->ols_has_ref) {
-			cl_lock_get_trust(lock);
-			lu_ref_add_atomic(&lock->cll_reference,
-					  "ast", current);
-		} else
-			olck = NULL;
-	}
-	spin_unlock(&osc_ast_guard);
-	unlock_res_and_lock(dlm_lock);
-	return olck;
-}
-
-static void osc_ast_data_put(const struct lu_env *env, struct osc_lock *olck)
-{
-	struct cl_lock *lock;
-
-	lock = olck->ols_cl.cls_lock;
-	lu_ref_del(&lock->cll_reference, "ast", current);
-	cl_lock_put(env, lock);
-}
-
-/**
  * Updates object attributes from a lock value block (lvb) received together
  * with the DLM lock reply from the server. Copy of osc_update_enqueue()
  * logic.
@@ -330,35 +184,30 @@ static void osc_ast_data_put(const struct lu_env *env, struct osc_lock *olck)
  *
  * Called under lock and resource spin-locks.
  */
-static void osc_lock_lvb_update(const struct lu_env *env, struct osc_lock *olck,
-				int rc)
+static void osc_lock_lvb_update(const struct lu_env *env,
+				struct osc_object *osc,
+				struct ldlm_lock *dlmlock,
+				struct ost_lvb *lvb)
 {
-	struct ost_lvb *lvb;
-	struct cl_object *obj;
-	struct lov_oinfo *oinfo;
-	struct cl_attr *attr;
+	struct cl_object *obj = osc2cl(osc);
+	struct lov_oinfo *oinfo = osc->oo_oinfo;
+	struct cl_attr *attr = &osc_env_info(env)->oti_attr;
 	unsigned valid;
 
-	if (!(olck->ols_flags & LDLM_FL_LVB_READY))
-		return;
-
-	lvb = &olck->ols_lvb;
-	obj = olck->ols_cl.cls_obj;
-	oinfo = cl2osc(obj)->oo_oinfo;
-	attr = &osc_env_info(env)->oti_attr;
 	valid = CAT_BLOCKS | CAT_ATIME | CAT_CTIME | CAT_MTIME | CAT_SIZE;
+	if (!lvb)
+		lvb = dlmlock->l_lvb_data;
+
 	cl_lvb2attr(attr, lvb);
 
 	cl_object_attr_lock(obj);
-	if (rc == 0) {
-		struct ldlm_lock *dlmlock;
+	if (dlmlock) {
 		__u64 size;
 
-		dlmlock = olck->ols_lock;
-
-		/* re-grab LVB from a dlm lock under DLM spin-locks. */
-		*lvb = *(struct ost_lvb *)dlmlock->l_lvb_data;
+		check_res_locked(dlmlock->l_resource);
+		LASSERT(lvb == dlmlock->l_lvb_data);
 		size = lvb->lvb_size;
+
 		/* Extend KMS up to the end of this lock and no further
 		 * A lock on [x,y] means a KMS of up to y + 1 bytes!
 		 */
@@ -375,102 +224,67 @@ static void osc_lock_lvb_update(const struct lu_env *env, struct osc_lock *olck,
 				   dlmlock->l_policy_data.l_extent.end);
 		}
 		ldlm_lock_allow_match_locked(dlmlock);
-	} else if (rc == -ENAVAIL && olck->ols_glimpse) {
-		CDEBUG(D_INODE, "glimpsed, setting rss=%llu; leaving kms=%llu\n",
-		       lvb->lvb_size, oinfo->loi_kms);
-	} else
-		valid = 0;
-
-	if (valid != 0)
-		cl_object_attr_set(env, obj, attr, valid);
+	}
 
+	cl_object_attr_set(env, obj, attr, valid);
 	cl_object_attr_unlock(obj);
 }
 
-/**
- * Called when a lock is granted, from an upcall (when server returned a
- * granted lock), or from completion AST, when server returned a blocked lock.
- *
- * Called under lock and resource spin-locks, that are released temporarily
- * here.
- */
-static void osc_lock_granted(const struct lu_env *env, struct osc_lock *olck,
-			     struct ldlm_lock *dlmlock, int rc)
+static void osc_lock_granted(const struct lu_env *env, struct osc_lock *oscl,
+			     struct lustre_handle *lockh, bool lvb_update)
 {
-	struct ldlm_extent *ext;
-	struct cl_lock *lock;
-	struct cl_lock_descr *descr;
+	struct ldlm_lock *dlmlock;
 
-	LASSERT(dlmlock->l_granted_mode == dlmlock->l_req_mode);
+	dlmlock = ldlm_handle2lock_long(lockh, 0);
+	LASSERT(dlmlock);
+
+	/* lock reference taken by ldlm_handle2lock_long() is
+	 * owned by osc_lock and released in osc_lock_detach()
+	 */
+	lu_ref_add(&dlmlock->l_reference, "osc_lock", oscl);
+	oscl->ols_has_ref = 1;
 
-	if (olck->ols_state < OLS_GRANTED) {
-		lock = olck->ols_cl.cls_lock;
-		ext = &dlmlock->l_policy_data.l_extent;
-		descr = &osc_env_info(env)->oti_descr;
-		descr->cld_obj = lock->cll_descr.cld_obj;
+	LASSERT(!oscl->ols_dlmlock);
+	oscl->ols_dlmlock = dlmlock;
 
-		/* XXX check that ->l_granted_mode is valid. */
-		descr->cld_mode = osc_ldlm2cl_lock(dlmlock->l_granted_mode);
-		descr->cld_start = cl_index(descr->cld_obj, ext->start);
-		descr->cld_end = cl_index(descr->cld_obj, ext->end);
-		descr->cld_gid = ext->gid;
-		/*
-		 * tell upper layers the extent of the lock that was actually
-		 * granted
-		 */
-		olck->ols_state = OLS_GRANTED;
-		osc_lock_lvb_update(env, olck, rc);
-
-		/* release DLM spin-locks to allow cl_lock_{modify,signal}()
-		 * to take a semaphore on a parent lock. This is safe, because
-		 * spin-locks are needed to protect consistency of
-		 * dlmlock->l_*_mode and LVB, and we have finished processing
-		 * them.
+	/* This may be a matched lock for a glimpse request, do not hold
+	 * lock reference in that case.
+	 */
+	if (!oscl->ols_glimpse) {
+		/* hold a refc for non glimpse lock which will
+		 * be released in osc_lock_cancel()
 		 */
-		unlock_res_and_lock(dlmlock);
-		cl_lock_modify(env, lock, descr);
-		cl_lock_signal(env, lock);
-		LINVRNT(osc_lock_invariant(olck));
-		lock_res_and_lock(dlmlock);
+		lustre_handle_copy(&oscl->ols_handle, lockh);
+		ldlm_lock_addref(lockh, oscl->ols_einfo.ei_mode);
+		oscl->ols_hold = 1;
 	}
-}
-
-static void osc_lock_upcall0(const struct lu_env *env, struct osc_lock *olck)
-
-{
-	struct ldlm_lock *dlmlock;
-
-	dlmlock = ldlm_handle2lock_long(&olck->ols_handle, 0);
-	LASSERT(dlmlock);
 
+	/* Lock must have been granted. */
 	lock_res_and_lock(dlmlock);
-	spin_lock(&osc_ast_guard);
-	LASSERT(dlmlock->l_ast_data == olck);
-	LASSERT(!olck->ols_lock);
-	olck->ols_lock = dlmlock;
-	spin_unlock(&osc_ast_guard);
+	if (dlmlock->l_granted_mode == dlmlock->l_req_mode) {
+		struct ldlm_extent *ext = &dlmlock->l_policy_data.l_extent;
+		struct cl_lock_descr *descr = &oscl->ols_cl.cls_lock->cll_descr;
 
-	/*
-	 * Lock might be not yet granted. In this case, completion ast
-	 * (osc_ldlm_completion_ast()) comes later and finishes lock
-	 * granting.
-	 */
-	if (dlmlock->l_granted_mode == dlmlock->l_req_mode)
-		osc_lock_granted(env, olck, dlmlock, 0);
-	unlock_res_and_lock(dlmlock);
+		/* Extend the lock extent, otherwise there will be a problem
+		 * when we decide whether to grant a lockless lock.
+		 */
+		descr->cld_mode = osc_ldlm2cl_lock(dlmlock->l_granted_mode);
+		descr->cld_start = cl_index(descr->cld_obj, ext->start);
+		descr->cld_end = cl_index(descr->cld_obj, ext->end);
+		descr->cld_gid = ext->gid;
 
-	/*
-	 * osc_enqueue_interpret() decrefs asynchronous locks, counter
-	 * this.
-	 */
-	ldlm_lock_addref(&olck->ols_handle, olck->ols_einfo.ei_mode);
-	olck->ols_hold = 1;
+		/* no lvb update for matched lock */
+		if (lvb_update) {
+			LASSERT(oscl->ols_flags & LDLM_FL_LVB_READY);
+			osc_lock_lvb_update(env, cl2osc(oscl->ols_cl.cls_obj),
+					    dlmlock, NULL);
+		}
+		LINVRNT(osc_lock_invariant(oscl));
+	}
+	unlock_res_and_lock(dlmlock);
 
-	/* lock reference taken by ldlm_handle2lock_long() is owned by
-	 * osc_lock and released in osc_lock_detach()
-	 */
-	lu_ref_add(&dlmlock->l_reference, "osc_lock", olck);
-	olck->ols_has_ref = 1;
+	LASSERT(oscl->ols_state != OLS_GRANTED);
+	oscl->ols_state = OLS_GRANTED;
 }
 
 /**
@@ -478,143 +292,124 @@ static void osc_lock_upcall0(const struct lu_env *env, struct osc_lock *olck)
  * received from a server, or after osc_enqueue_base() matched a local DLM
  * lock.
  */
-static int osc_lock_upcall(void *cookie, int errcode)
+static int osc_lock_upcall(void *cookie, struct lustre_handle *lockh,
+			   int errcode)
 {
-	struct osc_lock	*olck = cookie;
-	struct cl_lock_slice *slice = &olck->ols_cl;
-	struct cl_lock *lock = slice->cls_lock;
+	struct osc_lock *oscl = cookie;
+	struct cl_lock_slice *slice = &oscl->ols_cl;
 	struct lu_env *env;
 	struct cl_env_nest nest;
+	int rc;
 
 	env = cl_env_nested_get(&nest);
-	if (!IS_ERR(env)) {
-		int rc;
-
-		cl_lock_mutex_get(env, lock);
+	/* should never happen, similar to osc_ldlm_blocking_ast(). */
+	LASSERT(!IS_ERR(env));
+
+	rc = ldlm_error2errno(errcode);
+	if (oscl->ols_state == OLS_ENQUEUED) {
+		oscl->ols_state = OLS_UPCALL_RECEIVED;
+	} else if (oscl->ols_state == OLS_CANCELLED) {
+		rc = -EIO;
+	} else {
+		CERROR("Impossible state: %d\n", oscl->ols_state);
+		LBUG();
+	}
 
-		LASSERT(lock->cll_state >= CLS_QUEUING);
-		if (olck->ols_state == OLS_ENQUEUED) {
-			olck->ols_state = OLS_UPCALL_RECEIVED;
-			rc = ldlm_error2errno(errcode);
-		} else if (olck->ols_state == OLS_CANCELLED) {
-			rc = -EIO;
-		} else {
-			CERROR("Impossible state: %d\n", olck->ols_state);
-			LBUG();
-		}
-		if (rc) {
-			struct ldlm_lock *dlmlock;
-
-			dlmlock = ldlm_handle2lock(&olck->ols_handle);
-			if (dlmlock) {
-				lock_res_and_lock(dlmlock);
-				spin_lock(&osc_ast_guard);
-				LASSERT(!olck->ols_lock);
-				dlmlock->l_ast_data = NULL;
-				olck->ols_handle.cookie = 0ULL;
-				spin_unlock(&osc_ast_guard);
-				ldlm_lock_fail_match_locked(dlmlock);
-				unlock_res_and_lock(dlmlock);
-				LDLM_LOCK_PUT(dlmlock);
-			}
-		} else {
-			if (olck->ols_glimpse)
-				olck->ols_glimpse = 0;
-			osc_lock_upcall0(env, olck);
-		}
+	if (rc == 0)
+		osc_lock_granted(env, oscl, lockh, errcode == ELDLM_OK);
 
-		/* Error handling, some errors are tolerable. */
-		if (olck->ols_locklessable && rc == -EUSERS) {
-			/* This is a tolerable error, turn this lock into
-			 * lockless lock.
-			 */
-			osc_object_set_contended(cl2osc(slice->cls_obj));
-			LASSERT(slice->cls_ops == &osc_lock_ops);
+	/* Error handling, some errors are tolerable. */
+	if (oscl->ols_locklessable && rc == -EUSERS) {
+		/* This is a tolerable error, turn this lock into
+		 * lockless lock.
+		 */
+		osc_object_set_contended(cl2osc(slice->cls_obj));
+		LASSERT(slice->cls_ops == &osc_lock_ops);
+
+		/* Change this lock to ldlmlock-less lock. */
+		osc_lock_to_lockless(env, oscl, 1);
+		oscl->ols_state = OLS_GRANTED;
+		rc = 0;
+	} else if (oscl->ols_glimpse && rc == -ENAVAIL) {
+		LASSERT(oscl->ols_flags & LDLM_FL_LVB_READY);
+		osc_lock_lvb_update(env, cl2osc(slice->cls_obj),
+				    NULL, &oscl->ols_lvb);
+		/* Hide the error. */
+		rc = 0;
+	}
 
-			/* Change this lock to ldlmlock-less lock. */
-			osc_lock_to_lockless(env, olck, 1);
-			olck->ols_state = OLS_GRANTED;
-			rc = 0;
-		} else if (olck->ols_glimpse && rc == -ENAVAIL) {
-			osc_lock_lvb_update(env, olck, rc);
-			cl_lock_delete(env, lock);
-			/* Hide the error. */
-			rc = 0;
-		}
+	if (oscl->ols_owner)
+		cl_sync_io_note(env, oscl->ols_owner, rc);
+	cl_env_nested_put(&nest, env);
 
-		if (rc == 0) {
-			/* For AGL case, the RPC sponsor may exits the cl_lock
-			*  processing without wait() called before related OSC
-			*  lock upcall(). So update the lock status according
-			*  to the enqueue result inside AGL upcall().
-			*/
-			if (olck->ols_agl) {
-				lock->cll_flags |= CLF_FROM_UPCALL;
-				cl_wait_try(env, lock);
-				lock->cll_flags &= ~CLF_FROM_UPCALL;
-				if (!olck->ols_glimpse)
-					olck->ols_agl = 0;
-			}
-			cl_lock_signal(env, lock);
-			/* del user for lock upcall cookie */
-			cl_unuse_try(env, lock);
-		} else {
-			/* del user for lock upcall cookie */
-			cl_lock_user_del(env, lock);
-			cl_lock_error(env, lock, rc);
-		}
+	return rc;
+}
 
-		/* release cookie reference, acquired by osc_lock_enqueue() */
-		cl_lock_hold_release(env, lock, "upcall", lock);
-		cl_lock_mutex_put(env, lock);
+static int osc_lock_upcall_agl(void *cookie, struct lustre_handle *lockh,
+			       int errcode)
+{
+	struct osc_object *osc = cookie;
+	struct ldlm_lock *dlmlock;
+	struct lu_env *env;
+	struct cl_env_nest nest;
 
-		lu_ref_del(&lock->cll_reference, "upcall", lock);
-		/* This maybe the last reference, so must be called after
-		 * cl_lock_mutex_put().
-		 */
-		cl_lock_put(env, lock);
+	env = cl_env_nested_get(&nest);
+	LASSERT(!IS_ERR(env));
 
-		cl_env_nested_put(&nest, env);
-	} else {
-		/* should never happen, similar to osc_ldlm_blocking_ast(). */
-		LBUG();
+	if (errcode == ELDLM_LOCK_MATCHED) {
+		errcode = ELDLM_OK;
+		goto out;
 	}
-	return errcode;
+
+	if (errcode != ELDLM_OK)
+		goto out;
+
+	dlmlock = ldlm_handle2lock(lockh);
+	LASSERT(dlmlock);
+
+	lock_res_and_lock(dlmlock);
+	LASSERT(dlmlock->l_granted_mode == dlmlock->l_req_mode);
+
+	/* there is no osc_lock associated with AGL lock */
+	osc_lock_lvb_update(env, osc, dlmlock, NULL);
+
+	unlock_res_and_lock(dlmlock);
+	LDLM_LOCK_PUT(dlmlock);
+
+out:
+	cl_object_put(env, osc2cl(osc));
+	cl_env_nested_put(&nest, env);
+	return ldlm_error2errno(errcode);
 }
 
-/**
- * Core of osc_dlm_blocking_ast() logic.
- */
-static void osc_lock_blocking(const struct lu_env *env,
-			      struct ldlm_lock *dlmlock,
-			      struct osc_lock *olck, int blocking)
+static int osc_lock_flush(struct osc_object *obj, pgoff_t start, pgoff_t end,
+			  enum cl_lock_mode mode, int discard)
 {
-	struct cl_lock *lock = olck->ols_cl.cls_lock;
+	struct lu_env *env;
+	struct cl_env_nest nest;
+	int rc = 0;
+	int rc2 = 0;
 
-	LASSERT(olck->ols_lock == dlmlock);
-	CLASSERT(OLS_BLOCKED < OLS_CANCELLED);
-	LASSERT(!osc_lock_is_lockless(olck));
+	env = cl_env_nested_get(&nest);
+	if (IS_ERR(env))
+		return PTR_ERR(env);
+
+	if (mode == CLM_WRITE) {
+		rc = osc_cache_writeback_range(env, obj, start, end, 1,
+					       discard);
+		CDEBUG(D_CACHE, "object %p: [%lu -> %lu] %d pages were %s.\n",
+		       obj, start, end, rc,
+		       discard ? "discarded" : "written back");
+		if (rc > 0)
+			rc = 0;
+	}
 
-	/*
-	 * Lock might be still addref-ed here, if e.g., blocking ast
-	 * is sent for a failed lock.
-	 */
-	osc_lock_unhold(olck);
+	rc2 = osc_lock_discard_pages(env, obj, start, end, mode);
+	if (rc == 0 && rc2 < 0)
+		rc = rc2;
 
-	if (blocking && olck->ols_state < OLS_BLOCKED)
-		/*
-		 * Move osc_lock into OLS_BLOCKED before canceling the lock,
-		 * because it recursively re-enters osc_lock_blocking(), with
-		 * the state set to OLS_CANCELLED.
-		 */
-		olck->ols_state = OLS_BLOCKED;
-	/*
-	 * cancel and destroy lock at least once no matter how blocking ast is
-	 * entered (see comment above osc_ldlm_blocking_ast() for use
-	 * cases). cl_lock_cancel() and cl_lock_delete() are idempotent.
-	 */
-	cl_lock_cancel(env, lock);
-	cl_lock_delete(env, lock);
+	cl_env_nested_put(&nest, env);
+	return rc;
 }
 
 /**
@@ -625,65 +420,63 @@ static int osc_dlm_blocking_ast0(const struct lu_env *env,
 				 struct ldlm_lock *dlmlock,
 				 void *data, int flag)
 {
-	struct osc_lock *olck;
-	struct cl_lock *lock;
-	int result;
-	int cancel;
-
-	LASSERT(flag == LDLM_CB_BLOCKING || flag == LDLM_CB_CANCELING);
-
-	cancel = 0;
-	olck = osc_ast_data_get(dlmlock);
-	if (olck) {
-		lock = olck->ols_cl.cls_lock;
-		cl_lock_mutex_get(env, lock);
-		LINVRNT(osc_lock_invariant(olck));
-		if (olck->ols_ast_wait) {
-			/* wake up osc_lock_use() */
-			cl_lock_signal(env, lock);
-			olck->ols_ast_wait = 0;
-		}
-		/*
-		 * Lock might have been canceled while this thread was
-		 * sleeping for lock mutex, but olck is pinned in memory.
-		 */
-		if (olck == dlmlock->l_ast_data) {
-			/*
-			 * NOTE: DLM sends blocking AST's for failed locks
-			 *       (that are still in pre-OLS_GRANTED state)
-			 *       too, and they have to be canceled otherwise
-			 *       DLM lock is never destroyed and stuck in
-			 *       the memory.
-			 *
-			 *       Alternatively, ldlm_cli_cancel() can be
-			 *       called here directly for osc_locks with
-			 *       ols_state < OLS_GRANTED to maintain an
-			 *       invariant that ->clo_cancel() is only called
-			 *       for locks that were granted.
-			 */
-			LASSERT(data == olck);
-			osc_lock_blocking(env, dlmlock,
-					  olck, flag == LDLM_CB_BLOCKING);
-		} else
-			cancel = 1;
-		cl_lock_mutex_put(env, lock);
-		osc_ast_data_put(env, olck);
-	} else
-		/*
-		 * DLM lock exists, but there is no cl_lock attached to it.
-		 * This is a `normal' race. cl_object and its cl_lock's can be
-		 * removed by memory pressure, together with all pages.
+	struct cl_object *obj = NULL;
+	int result = 0;
+	int discard;
+	enum cl_lock_mode mode = CLM_READ;
+
+	LASSERT(flag == LDLM_CB_CANCELING);
+
+	lock_res_and_lock(dlmlock);
+	if (dlmlock->l_granted_mode != dlmlock->l_req_mode) {
+		dlmlock->l_ast_data = NULL;
+		unlock_res_and_lock(dlmlock);
+		return 0;
+	}
+
+	discard = ldlm_is_discard_data(dlmlock);
+	if (dlmlock->l_granted_mode & (LCK_PW | LCK_GROUP))
+		mode = CLM_WRITE;
+
+	if (dlmlock->l_ast_data) {
+		obj = osc2cl(dlmlock->l_ast_data);
+		dlmlock->l_ast_data = NULL;
+
+		cl_object_get(obj);
+	}
+
+	unlock_res_and_lock(dlmlock);
+
+	/* if l_ast_data is NULL, the dlmlock was enqueued by AGL or
+	 * the object has been destroyed.
+	 */
+	if (obj) {
+		struct ldlm_extent *extent = &dlmlock->l_policy_data.l_extent;
+		struct cl_attr *attr = &osc_env_info(env)->oti_attr;
+		__u64 old_kms;
+
+		/* Destroy pages covered by the extent of the DLM lock */
+		result = osc_lock_flush(cl2osc(obj),
+					cl_index(obj, extent->start),
+					cl_index(obj, extent->end),
+					mode, discard);
+
+		/* losing a lock, update kms */
+		lock_res_and_lock(dlmlock);
+		cl_object_attr_lock(obj);
+		/* Must get the value under the lock to avoid race. */
+		old_kms = cl2osc(obj)->oo_oinfo->loi_kms;
+		/* Update the kms. Need to loop all granted locks.
+		 * Not a problem for the client
 		 */
-		cancel = (flag == LDLM_CB_BLOCKING);
+		attr->cat_kms = ldlm_extent_shift_kms(dlmlock, old_kms);
 
-	if (cancel) {
-		struct lustre_handle *lockh;
+		cl_object_attr_set(env, obj, attr, CAT_KMS);
+		cl_object_attr_unlock(obj);
+		unlock_res_and_lock(dlmlock);
 
-		lockh = &osc_env_info(env)->oti_handle;
-		ldlm_lock2handle(dlmlock, lockh);
-		result = ldlm_cli_cancel(lockh, LCF_ASYNC);
-	} else
-		result = 0;
+		cl_object_put(env, obj);
+	}
 	return result;
 }
 
@@ -733,107 +526,52 @@ static int osc_ldlm_blocking_ast(struct ldlm_lock *dlmlock,
 				 struct ldlm_lock_desc *new, void *data,
 				 int flag)
 {
-	struct lu_env *env;
-	struct cl_env_nest nest;
-	int result;
+	int result = 0;
 
-	/*
-	 * This can be called in the context of outer IO, e.g.,
-	 *
-	 *     cl_enqueue()->...
-	 *       ->osc_enqueue_base()->...
-	 *	 ->ldlm_prep_elc_req()->...
-	 *	   ->ldlm_cancel_callback()->...
-	 *	     ->osc_ldlm_blocking_ast()
-	 *
-	 * new environment has to be created to not corrupt outer context.
-	 */
-	env = cl_env_nested_get(&nest);
-	if (!IS_ERR(env)) {
-		result = osc_dlm_blocking_ast0(env, dlmlock, data, flag);
-		cl_env_nested_put(&nest, env);
-	} else {
-		result = PTR_ERR(env);
-		/*
-		 * XXX This should never happen, as cl_lock is
-		 * stuck. Pre-allocated environment a la vvp_inode_fini_env
-		 * should be used.
-		 */
-		LBUG();
-	}
-	if (result != 0) {
+	switch (flag) {
+	case LDLM_CB_BLOCKING: {
+		struct lustre_handle lockh;
+
+		ldlm_lock2handle(dlmlock, &lockh);
+		result = ldlm_cli_cancel(&lockh, LCF_ASYNC);
 		if (result == -ENODATA)
 			result = 0;
-		else
-			CERROR("BAST failed: %d\n", result);
+		break;
 	}
-	return result;
-}
-
-static int osc_ldlm_completion_ast(struct ldlm_lock *dlmlock,
-				   __u64 flags, void *data)
-{
-	struct cl_env_nest nest;
-	struct lu_env *env;
-	struct osc_lock *olck;
-	struct cl_lock *lock;
-	int result;
-	int dlmrc;
+	case LDLM_CB_CANCELING: {
+		struct lu_env *env;
+		struct cl_env_nest nest;
 
-	/* first, do dlm part of the work */
-	dlmrc = ldlm_completion_ast_async(dlmlock, flags, data);
-	/* then, notify cl_lock */
-	env = cl_env_nested_get(&nest);
-	if (!IS_ERR(env)) {
-		olck = osc_ast_data_get(dlmlock);
-		if (olck) {
-			lock = olck->ols_cl.cls_lock;
-			cl_lock_mutex_get(env, lock);
-			/*
-			 * ldlm_handle_cp_callback() copied LVB from request
-			 * to lock->l_lvb_data, store it in osc_lock.
-			 */
-			LASSERT(dlmlock->l_lvb_data);
-			lock_res_and_lock(dlmlock);
-			olck->ols_lvb = *(struct ost_lvb *)dlmlock->l_lvb_data;
-			if (!olck->ols_lock) {
-				/*
-				 * upcall (osc_lock_upcall()) hasn't yet been
-				 * called. Do nothing now, upcall will bind
-				 * olck to dlmlock and signal the waiters.
-				 *
-				 * This maintains an invariant that osc_lock
-				 * and ldlm_lock are always bound when
-				 * osc_lock is in OLS_GRANTED state.
-				 */
-			} else if (dlmlock->l_granted_mode ==
-				   dlmlock->l_req_mode) {
-				osc_lock_granted(env, olck, dlmlock, dlmrc);
-			}
-			unlock_res_and_lock(dlmlock);
+		/*
+		 * This can be called in the context of outer IO, e.g.,
+		 *
+		 *     osc_enqueue_base()->...
+		 *	 ->ldlm_prep_elc_req()->...
+		 *	   ->ldlm_cancel_callback()->...
+		 *	     ->osc_ldlm_blocking_ast()
+		 *
+		 * new environment has to be created to not corrupt outer
+		 * context.
+		 */
+		env = cl_env_nested_get(&nest);
+		if (IS_ERR(env)) {
+			result = PTR_ERR(env);
+			break;
+		}
 
-			if (dlmrc != 0) {
-				CL_LOCK_DEBUG(D_ERROR, env, lock,
-					      "dlmlock returned %d\n", dlmrc);
-				cl_lock_error(env, lock, dlmrc);
-			}
-			cl_lock_mutex_put(env, lock);
-			osc_ast_data_put(env, olck);
-			result = 0;
-		} else
-			result = -ELDLM_NO_LOCK_DATA;
+		result = osc_dlm_blocking_ast0(env, dlmlock, data, flag);
 		cl_env_nested_put(&nest, env);
-	} else
-		result = PTR_ERR(env);
-	return dlmrc ?: result;
+		break;
+	}
+	default:
+		LBUG();
+	}
+	return result;
 }
 
 static int osc_ldlm_glimpse_ast(struct ldlm_lock *dlmlock, void *data)
 {
 	struct ptlrpc_request *req = data;
-	struct osc_lock	*olck;
-	struct cl_lock *lock;
-	struct cl_object *obj;
 	struct cl_env_nest nest;
 	struct lu_env *env;
 	struct ost_lvb *lvb;
@@ -844,14 +582,16 @@ static int osc_ldlm_glimpse_ast(struct ldlm_lock *dlmlock, void *data)
 
 	env = cl_env_nested_get(&nest);
 	if (!IS_ERR(env)) {
-		/* osc_ast_data_get() has to go after environment is
-		 * allocated, because osc_ast_data() acquires a
-		 * reference to a lock, and it can only be released in
-		 * environment.
-		 */
-		olck = osc_ast_data_get(dlmlock);
-		if (olck) {
-			lock = olck->ols_cl.cls_lock;
+		struct cl_object *obj = NULL;
+
+		lock_res_and_lock(dlmlock);
+		if (dlmlock->l_ast_data) {
+			obj = osc2cl(dlmlock->l_ast_data);
+			cl_object_get(obj);
+		}
+		unlock_res_and_lock(dlmlock);
+
+		if (obj) {
 			/* Do not grab the mutex of cl_lock for glimpse.
 			 * See LU-1274 for details.
 			 * BTW, it's okay for cl_lock to be cancelled during
@@ -866,7 +606,6 @@ static int osc_ldlm_glimpse_ast(struct ldlm_lock *dlmlock, void *data)
 			result = req_capsule_server_pack(cap);
 			if (result == 0) {
 				lvb = req_capsule_server_get(cap, &RMF_DLM_LVB);
-				obj = lock->cll_descr.cld_obj;
 				result = cl_object_glimpse(env, obj, lvb);
 			}
 			if (!exp_connect_lvb_type(req->rq_export))
@@ -874,7 +613,7 @@ static int osc_ldlm_glimpse_ast(struct ldlm_lock *dlmlock, void *data)
 						   &RMF_DLM_LVB,
 						   sizeof(struct ost_lvb_v1),
 						   RCL_SERVER);
-			osc_ast_data_put(env, olck);
+			cl_object_put(env, obj);
 		} else {
 			/*
 			 * These errors are normal races, so we don't want to
@@ -905,23 +644,24 @@ static int weigh_cb(const struct lu_env *env, struct cl_io *io,
 }
 
 static unsigned long osc_lock_weight(const struct lu_env *env,
-				     const struct osc_lock *ols)
+				     struct osc_object *oscobj,
+				     struct ldlm_extent *extent)
 {
 	struct cl_io *io = &osc_env_info(env)->oti_io;
-	struct cl_lock_descr *descr = &ols->ols_cl.cls_lock->cll_descr;
-	struct cl_object *obj = ols->ols_cl.cls_obj;
+	struct cl_object *obj = cl_object_top(&oscobj->oo_cl);
 	unsigned long npages = 0;
 	int result;
 
-	io->ci_obj = cl_object_top(obj);
+	io->ci_obj = obj;
 	io->ci_ignore_layout = 1;
 	result = cl_io_init(env, io, CIT_MISC, io->ci_obj);
 	if (result != 0)
 		return result;
 
 	do {
-		result = osc_page_gang_lookup(env, io, cl2osc(obj),
-					      descr->cld_start, descr->cld_end,
+		result = osc_page_gang_lookup(env, io, oscobj,
+					      cl_index(obj, extent->start),
+					      cl_index(obj, extent->end),
 					      weigh_cb, (void *)&npages);
 		if (result == CLP_GANG_ABORT)
 			break;
@@ -940,8 +680,10 @@ unsigned long osc_ldlm_weigh_ast(struct ldlm_lock *dlmlock)
 {
 	struct cl_env_nest       nest;
 	struct lu_env           *env;
-	struct osc_lock         *lock;
+	struct osc_object	*obj;
+	struct osc_lock		*oscl;
 	unsigned long            weight;
+	bool			 found = false;
 
 	might_sleep();
 	/*
@@ -957,18 +699,28 @@ unsigned long osc_ldlm_weigh_ast(struct ldlm_lock *dlmlock)
 		return 1;
 
 	LASSERT(dlmlock->l_resource->lr_type == LDLM_EXTENT);
-	lock = osc_ast_data_get(dlmlock);
-	if (!lock) {
-		/* cl_lock was destroyed because of memory pressure.
-		 * It is much reasonable to assign this type of lock
-		 * a lower cost.
+	obj = dlmlock->l_ast_data;
+	if (!obj) {
+		weight = 1;
+		goto out;
+	}
+
+	spin_lock(&obj->oo_ol_spin);
+	list_for_each_entry(oscl, &obj->oo_ol_list, ols_nextlock_oscobj) {
+		if (oscl->ols_dlmlock && oscl->ols_dlmlock != dlmlock)
+			continue;
+		found = true;
+	}
+	spin_unlock(&obj->oo_ol_spin);
+	if (found) {
+		/*
+		 * If the lock is being used by an IO, definitely do not cancel it.
 		 */
-		weight = 0;
+		weight = 1;
 		goto out;
 	}
 
-	weight = osc_lock_weight(env, lock);
-	osc_ast_data_put(env, lock);
+	weight = osc_lock_weight(env, obj, &dlmlock->l_policy_data.l_extent);
 
 out:
 	cl_env_nested_put(&nest, env);
@@ -976,27 +728,16 @@ out:
 }
 
 static void osc_lock_build_einfo(const struct lu_env *env,
-				 const struct cl_lock *clock,
-				 struct osc_lock *lock,
+				 const struct cl_lock *lock,
+				 struct osc_object *osc,
 				 struct ldlm_enqueue_info *einfo)
 {
-	enum cl_lock_mode mode;
-
-	mode = clock->cll_descr.cld_mode;
-	if (mode == CLM_PHANTOM)
-		/*
-		 * For now, enqueue all glimpse locks in read mode. In the
-		 * future, client might choose to enqueue LCK_PW lock for
-		 * glimpse on a file opened for write.
-		 */
-		mode = CLM_READ;
-
 	einfo->ei_type = LDLM_EXTENT;
-	einfo->ei_mode = osc_cl_lock2ldlm(mode);
+	einfo->ei_mode   = osc_cl_lock2ldlm(lock->cll_descr.cld_mode);
 	einfo->ei_cb_bl = osc_ldlm_blocking_ast;
-	einfo->ei_cb_cp = osc_ldlm_completion_ast;
+	einfo->ei_cb_cp  = ldlm_completion_ast;
 	einfo->ei_cb_gl = osc_ldlm_glimpse_ast;
-	einfo->ei_cbdata = lock; /* value to be put into ->l_ast_data */
+	einfo->ei_cbdata = osc; /* value to be put into ->l_ast_data */
 }
 
 /**
@@ -1052,113 +793,100 @@ static void osc_lock_to_lockless(const struct lu_env *env,
 	LASSERT(ergo(ols->ols_glimpse, !osc_lock_is_lockless(ols)));
 }
 
-static int osc_lock_compatible(const struct osc_lock *qing,
-			       const struct osc_lock *qed)
+static bool osc_lock_compatible(const struct osc_lock *qing,
+				const struct osc_lock *qed)
 {
-	enum cl_lock_mode qing_mode;
-	enum cl_lock_mode qed_mode;
+	struct cl_lock_descr *qed_descr = &qed->ols_cl.cls_lock->cll_descr;
+	struct cl_lock_descr *qing_descr = &qing->ols_cl.cls_lock->cll_descr;
 
-	qing_mode = qing->ols_cl.cls_lock->cll_descr.cld_mode;
-	if (qed->ols_glimpse &&
-	    (qed->ols_state >= OLS_UPCALL_RECEIVED || qing_mode == CLM_READ))
-		return 1;
+	if (qed->ols_glimpse)
+		return true;
+
+	if (qing_descr->cld_mode == CLM_READ && qed_descr->cld_mode == CLM_READ)
+		return true;
+
+	if (qed->ols_state < OLS_GRANTED)
+		return true;
+
+	if (qed_descr->cld_mode  >= qing_descr->cld_mode &&
+	    qed_descr->cld_start <= qing_descr->cld_start &&
+	    qed_descr->cld_end   >= qing_descr->cld_end)
+		return true;
 
-	qed_mode = qed->ols_cl.cls_lock->cll_descr.cld_mode;
-	return ((qing_mode == CLM_READ) && (qed_mode == CLM_READ));
+	return false;
 }
 
-/**
- * Cancel all conflicting locks and wait for them to be destroyed.
- *
- * This function is used for two purposes:
- *
- *     - early cancel all conflicting locks before starting IO, and
- *
- *     - guarantee that pages added to the page cache by lockless IO are never
- *       covered by locks other than lockless IO lock, and, hence, are not
- *       visible to other threads.
- */
-static int osc_lock_enqueue_wait(const struct lu_env *env,
-				 const struct osc_lock *olck)
+static void osc_lock_wake_waiters(const struct lu_env *env,
+				  struct osc_object *osc,
+				  struct osc_lock *oscl)
 {
-	struct cl_lock *lock = olck->ols_cl.cls_lock;
-	struct cl_lock_descr *descr = &lock->cll_descr;
-	struct cl_object_header *hdr = cl_object_header(descr->cld_obj);
-	struct cl_lock *scan;
-	struct cl_lock *conflict = NULL;
-	int lockless = osc_lock_is_lockless(olck);
-	int rc = 0;
+	spin_lock(&osc->oo_ol_spin);
+	list_del_init(&oscl->ols_nextlock_oscobj);
+	spin_unlock(&osc->oo_ol_spin);
 
-	LASSERT(cl_lock_is_mutexed(lock));
+	spin_lock(&oscl->ols_lock);
+	while (!list_empty(&oscl->ols_waiting_list)) {
+		struct osc_lock *scan;
 
-	/* make it enqueue anyway for glimpse lock, because we actually
-	 * don't need to cancel any conflicting locks.
-	 */
-	if (olck->ols_glimpse)
-		return 0;
+		scan = list_entry(oscl->ols_waiting_list.next, struct osc_lock,
+				  ols_wait_entry);
+		list_del_init(&scan->ols_wait_entry);
+
+		cl_sync_io_note(env, scan->ols_owner, 0);
+	}
+	spin_unlock(&oscl->ols_lock);
+}
 
-	spin_lock(&hdr->coh_lock_guard);
-	list_for_each_entry(scan, &hdr->coh_locks, cll_linkage) {
-		struct cl_lock_descr *cld = &scan->cll_descr;
-		const struct osc_lock *scan_ols;
+static void osc_lock_enqueue_wait(const struct lu_env *env,
+				  struct osc_object *obj,
+				  struct osc_lock *oscl)
+{
+	struct osc_lock *tmp_oscl;
+	struct cl_lock_descr *need = &oscl->ols_cl.cls_lock->cll_descr;
+	struct cl_sync_io *waiter = &osc_env_info(env)->oti_anchor;
+
+	spin_lock(&obj->oo_ol_spin);
+	list_add_tail(&oscl->ols_nextlock_oscobj, &obj->oo_ol_list);
 
-		if (scan == lock)
+restart:
+	list_for_each_entry(tmp_oscl, &obj->oo_ol_list,
+			    ols_nextlock_oscobj) {
+		struct cl_lock_descr *descr;
+
+		if (tmp_oscl == oscl)
 			break;
 
-		if (scan->cll_state < CLS_QUEUING ||
-		    scan->cll_state == CLS_FREEING ||
-		    cld->cld_start > descr->cld_end ||
-		    cld->cld_end < descr->cld_start)
+		descr = &tmp_oscl->ols_cl.cls_lock->cll_descr;
+		if (descr->cld_start > need->cld_end ||
+		    descr->cld_end   < need->cld_start)
 			continue;
 
-		/* overlapped and living locks. */
+		/* We're not supposed to give up group lock */
+		if (descr->cld_mode == CLM_GROUP)
+			break;
 
-		/* We're not supposed to give up group lock. */
-		if (scan->cll_descr.cld_mode == CLM_GROUP) {
-			LASSERT(descr->cld_mode != CLM_GROUP ||
-				descr->cld_gid != scan->cll_descr.cld_gid);
+		if (!osc_lock_is_lockless(oscl) &&
+		    osc_lock_compatible(oscl, tmp_oscl))
 			continue;
-		}
 
-		scan_ols = osc_lock_at(scan);
+		/* wait for conflicting lock to be canceled */
+		cl_sync_io_init(waiter, 1, cl_sync_io_end);
+		oscl->ols_owner = waiter;
 
-		/* We need to cancel the compatible locks if we're enqueuing
-		 * a lockless lock, for example:
-		 * imagine that client has PR lock on [0, 1000], and thread T0
-		 * is doing lockless IO in [500, 1500] region. Concurrent
-		 * thread T1 can see lockless data in [500, 1000], which is
-		 * wrong, because these data are possibly stale.
-		 */
-		if (!lockless && osc_lock_compatible(olck, scan_ols))
-			continue;
+		spin_lock(&tmp_oscl->ols_lock);
+		/* add oscl into tmp's ols_waiting list */
+		list_add_tail(&oscl->ols_wait_entry,
+			      &tmp_oscl->ols_waiting_list);
+		spin_unlock(&tmp_oscl->ols_lock);
 
-		cl_lock_get_trust(scan);
-		conflict = scan;
-		break;
-	}
-	spin_unlock(&hdr->coh_lock_guard);
+		spin_unlock(&obj->oo_ol_spin);
+		(void)cl_sync_io_wait(env, waiter, 0);
 
-	if (conflict) {
-		if (lock->cll_descr.cld_mode == CLM_GROUP) {
-			/* we want a group lock but a previous lock request
-			 * conflicts, we do not wait but return 0 so the
-			 * request is send to the server
-			 */
-			CDEBUG(D_DLMTRACE, "group lock %p is conflicted with %p, no wait, send to server\n",
-			       lock, conflict);
-			cl_lock_put(env, conflict);
-			rc = 0;
-		} else {
-			CDEBUG(D_DLMTRACE, "lock %p is conflicted with %p, will wait\n",
-			       lock, conflict);
-			LASSERT(!lock->cll_conflict);
-			lu_ref_add(&conflict->cll_reference, "cancel-wait",
-				   lock);
-			lock->cll_conflict = conflict;
-			rc = CLO_WAIT;
-		}
+		spin_lock(&obj->oo_ol_spin);
+		oscl->ols_owner = NULL;
+		goto restart;
 	}
-	return rc;
+	spin_unlock(&obj->oo_ol_spin);
 }
 
 /**
@@ -1177,188 +905,122 @@ static int osc_lock_enqueue_wait(const struct lu_env *env,
  */
 static int osc_lock_enqueue(const struct lu_env *env,
 			    const struct cl_lock_slice *slice,
-			    struct cl_io *unused, __u32 enqflags)
+			    struct cl_io *unused, struct cl_sync_io *anchor)
 {
-	struct osc_lock *ols = cl2osc_lock(slice);
-	struct cl_lock *lock = ols->ols_cl.cls_lock;
+	struct osc_thread_info *info = osc_env_info(env);
+	struct osc_io *oio = osc_env_io(env);
+	struct osc_object *osc = cl2osc(slice->cls_obj);
+	struct osc_lock *oscl = cl2osc_lock(slice);
+	struct cl_lock *lock = slice->cls_lock;
+	struct ldlm_res_id *resname = &info->oti_resname;
+	ldlm_policy_data_t *policy = &info->oti_policy;
+	osc_enqueue_upcall_f upcall = osc_lock_upcall;
+	void *cookie = oscl;
+	bool async = false;
 	int result;
 
-	LASSERT(cl_lock_is_mutexed(lock));
-	LASSERTF(ols->ols_state == OLS_NEW,
-		 "Impossible state: %d\n", ols->ols_state);
-
-	LASSERTF(ergo(ols->ols_glimpse, lock->cll_descr.cld_mode <= CLM_READ),
-		 "lock = %p, ols = %p\n", lock, ols);
+	LASSERTF(ergo(oscl->ols_glimpse, lock->cll_descr.cld_mode <= CLM_READ),
+		 "lock = %p, ols = %p\n", lock, oscl);
 
-	result = osc_lock_enqueue_wait(env, ols);
-	if (result == 0) {
-		if (!osc_lock_is_lockless(ols)) {
-			struct osc_object *obj = cl2osc(slice->cls_obj);
-			struct osc_thread_info *info = osc_env_info(env);
-			struct ldlm_res_id *resname = &info->oti_resname;
-			ldlm_policy_data_t *policy = &info->oti_policy;
-			struct ldlm_enqueue_info *einfo = &ols->ols_einfo;
+	if (oscl->ols_state == OLS_GRANTED)
+		return 0;
 
-			/* lock will be passed as upcall cookie,
-			 * hold ref to prevent to be released.
-			 */
-			cl_lock_hold_add(env, lock, "upcall", lock);
-			/* a user for lock also */
-			cl_lock_user_add(env, lock);
-			ols->ols_state = OLS_ENQUEUED;
+	if (oscl->ols_flags & LDLM_FL_TEST_LOCK)
+		goto enqueue_base;
 
-			/*
-			 * XXX: this is possible blocking point as
-			 * ldlm_lock_match(LDLM_FL_LVB_READY) waits for
-			 * LDLM_CP_CALLBACK.
-			 */
-			ostid_build_res_name(&obj->oo_oinfo->loi_oi, resname);
-			osc_lock_build_policy(env, lock, policy);
-			result = osc_enqueue_base(osc_export(obj), resname,
-						  &ols->ols_flags, policy,
-						  &ols->ols_lvb,
-						  obj->oo_oinfo->loi_kms_valid,
-						  osc_lock_upcall,
-						  ols, einfo, &ols->ols_handle,
-						  PTLRPCD_SET, 1, ols->ols_agl);
-			if (result != 0) {
-				cl_lock_user_del(env, lock);
-				cl_lock_unhold(env, lock, "upcall", lock);
-				if (unlikely(result == -ECANCELED)) {
-					ols->ols_state = OLS_NEW;
-					result = 0;
-				}
-			}
-		} else {
-			ols->ols_state = OLS_GRANTED;
-			ols->ols_owner = osc_env_io(env);
-		}
+	if (oscl->ols_glimpse) {
+		LASSERT(equi(oscl->ols_agl, !anchor));
+		async = true;
+		goto enqueue_base;
 	}
-	LASSERT(ergo(ols->ols_glimpse, !osc_lock_is_lockless(ols)));
-	return result;
-}
 
-static int osc_lock_wait(const struct lu_env *env,
-			 const struct cl_lock_slice *slice)
-{
-	struct osc_lock *olck = cl2osc_lock(slice);
-	struct cl_lock *lock = olck->ols_cl.cls_lock;
-
-	LINVRNT(osc_lock_invariant(olck));
-
-	if (olck->ols_glimpse && olck->ols_state >= OLS_UPCALL_RECEIVED) {
-		if (olck->ols_flags & LDLM_FL_LVB_READY) {
-			return 0;
-		} else if (olck->ols_agl) {
-			if (lock->cll_flags & CLF_FROM_UPCALL)
-				/* It is from enqueue RPC reply upcall for
-				 * updating state. Do not re-enqueue.
-				 */
-				return -ENAVAIL;
-			olck->ols_state = OLS_NEW;
-		} else {
-			LASSERT(lock->cll_error);
-			return lock->cll_error;
-		}
+	osc_lock_enqueue_wait(env, osc, oscl);
+
+	/* we can grant a lockless lock right after all conflicting locks
+	 * are canceled.
+	 */
+	if (osc_lock_is_lockless(oscl)) {
+		oscl->ols_state = OLS_GRANTED;
+		oio->oi_lockless = 1;
+		return 0;
 	}
 
-	if (olck->ols_state == OLS_NEW) {
-		int rc;
-
-		LASSERT(olck->ols_agl);
-		olck->ols_agl = 0;
-		olck->ols_flags &= ~LDLM_FL_BLOCK_NOWAIT;
-		rc = osc_lock_enqueue(env, slice, NULL, CEF_ASYNC | CEF_MUST);
-		if (rc != 0)
-			return rc;
-		else
-			return CLO_REENQUEUED;
+enqueue_base:
+	oscl->ols_state = OLS_ENQUEUED;
+	if (anchor) {
+		atomic_inc(&anchor->csi_sync_nr);
+		oscl->ols_owner = anchor;
 	}
 
-	LASSERT(equi(olck->ols_state >= OLS_UPCALL_RECEIVED &&
-		     lock->cll_error == 0, olck->ols_lock));
+	/*
+	 * The DLM lock's AST data must be the osc_object; for a glimpse
+	 * or AGL lock, "async" of osc_enqueue_base() must be true, and
+	 * the DLM enqueue callback is set to osc_lock_upcall() with the
+	 * osc_lock as its cookie.
+	 */
+	ostid_build_res_name(&osc->oo_oinfo->loi_oi, resname);
+	osc_lock_build_einfo(env, lock, osc, &oscl->ols_einfo);
+	osc_lock_build_policy(env, lock, policy);
+	if (oscl->ols_agl) {
+		oscl->ols_einfo.ei_cbdata = NULL;
+		/* hold a reference for callback */
+		cl_object_get(osc2cl(osc));
+		upcall = osc_lock_upcall_agl;
+		cookie = osc;
+	}
+	result = osc_enqueue_base(osc_export(osc), resname, &oscl->ols_flags,
+				  policy, &oscl->ols_lvb,
+				  osc->oo_oinfo->loi_kms_valid,
+				  upcall, cookie,
+				  &oscl->ols_einfo, PTLRPCD_SET, async,
+				  oscl->ols_agl);
+	if (result != 0) {
+		oscl->ols_state = OLS_CANCELLED;
+		osc_lock_wake_waiters(env, osc, oscl);
 
-	return lock->cll_error ?: olck->ols_state >= OLS_GRANTED ? 0 : CLO_WAIT;
+		/* hide error for AGL lock. */
+		if (oscl->ols_agl) {
+			cl_object_put(env, osc2cl(osc));
+			result = 0;
+		}
+		if (anchor)
+			cl_sync_io_note(env, anchor, result);
+	} else {
+		if (osc_lock_is_lockless(oscl)) {
+			oio->oi_lockless = 1;
+		} else if (!async) {
+			LASSERT(oscl->ols_state == OLS_GRANTED);
+			LASSERT(oscl->ols_hold);
+			LASSERT(oscl->ols_dlmlock);
+		}
+	}
+	return result;
 }
 
 /**
- * An implementation of cl_lock_operations::clo_use() method that pins cached
- * lock.
+ * Breaks a link between osc_lock and dlm_lock.
  */
-static int osc_lock_use(const struct lu_env *env,
-			const struct cl_lock_slice *slice)
+static void osc_lock_detach(const struct lu_env *env, struct osc_lock *olck)
 {
-	struct osc_lock *olck = cl2osc_lock(slice);
-	int rc;
-
-	LASSERT(!olck->ols_hold);
+	struct ldlm_lock *dlmlock;
 
-	/*
-	 * Atomically check for LDLM_FL_CBPENDING and addref a lock if this
-	 * flag is not set. This protects us from a concurrent blocking ast.
-	 */
-	rc = ldlm_lock_addref_try(&olck->ols_handle, olck->ols_einfo.ei_mode);
-	if (rc == 0) {
-		olck->ols_hold = 1;
-		olck->ols_state = OLS_GRANTED;
-	} else {
-		struct cl_lock *lock;
+	dlmlock = olck->ols_dlmlock;
+	if (!dlmlock)
+		return;
 
-		/*
-		 * Lock is being cancelled somewhere within
-		 * ldlm_handle_bl_callback(): LDLM_FL_CBPENDING is already
-		 * set, but osc_ldlm_blocking_ast() hasn't yet acquired
-		 * cl_lock mutex.
-		 */
-		lock = slice->cls_lock;
-		LASSERT(lock->cll_state == CLS_INTRANSIT);
-		LASSERT(lock->cll_users > 0);
-		/* set a flag for osc_dlm_blocking_ast0() to signal the
-		 * lock.
-		 */
-		olck->ols_ast_wait = 1;
-		rc = CLO_WAIT;
+	if (olck->ols_hold) {
+		olck->ols_hold = 0;
+		osc_cancel_base(&olck->ols_handle, olck->ols_einfo.ei_mode);
+		olck->ols_handle.cookie = 0ULL;
 	}
-	return rc;
-}
 
-static int osc_lock_flush(struct osc_lock *ols, int discard)
-{
-	struct cl_lock *lock = ols->ols_cl.cls_lock;
-	struct cl_env_nest nest;
-	struct lu_env *env;
-	int result = 0;
+	olck->ols_dlmlock = NULL;
 
-	env = cl_env_nested_get(&nest);
-	if (!IS_ERR(env)) {
-		struct osc_object *obj = cl2osc(ols->ols_cl.cls_obj);
-		struct cl_lock_descr *descr = &lock->cll_descr;
-		int rc = 0;
-
-		if (descr->cld_mode >= CLM_WRITE) {
-			result = osc_cache_writeback_range(env, obj,
-							   descr->cld_start,
-							   descr->cld_end,
-							   1, discard);
-			LDLM_DEBUG(ols->ols_lock,
-				   "lock %p: %d pages were %s.\n", lock, result,
-				   discard ? "discarded" : "written");
-			if (result > 0)
-				result = 0;
-		}
-
-		rc = osc_lock_discard_pages(env, ols);
-		if (result == 0 && rc < 0)
-			result = rc;
-
-		cl_env_nested_put(&nest, env);
-	} else
-		result = PTR_ERR(env);
-	if (result == 0) {
-		ols->ols_flush = 1;
-		LINVRNT(!osc_lock_has_pages(ols));
-	}
-	return result;
+	/* release a reference taken in osc_lock_upcall(). */
+	LASSERT(olck->ols_has_ref);
+	lu_ref_del(&dlmlock->l_reference, "osc_lock", olck);
+	LDLM_LOCK_RELEASE(dlmlock);
+	olck->ols_has_ref = 0;
 }
 
 /**
@@ -1378,96 +1040,16 @@ static int osc_lock_flush(struct osc_lock *ols, int discard)
 static void osc_lock_cancel(const struct lu_env *env,
 			    const struct cl_lock_slice *slice)
 {
-	struct cl_lock *lock = slice->cls_lock;
-	struct osc_lock *olck = cl2osc_lock(slice);
-	struct ldlm_lock *dlmlock = olck->ols_lock;
-	int result = 0;
-	int discard;
-
-	LASSERT(cl_lock_is_mutexed(lock));
-	LINVRNT(osc_lock_invariant(olck));
-
-	if (dlmlock) {
-		int do_cancel;
-
-		discard = !!(dlmlock->l_flags & LDLM_FL_DISCARD_DATA);
-		if (olck->ols_state >= OLS_GRANTED)
-			result = osc_lock_flush(olck, discard);
-		osc_lock_unhold(olck);
-
-		lock_res_and_lock(dlmlock);
-		/* Now that we're the only user of dlm read/write reference,
-		 * mostly the ->l_readers + ->l_writers should be zero.
-		 * However, there is a corner case.
-		 * See bug 18829 for details.
-		 */
-		do_cancel = (dlmlock->l_readers == 0 &&
-			     dlmlock->l_writers == 0);
-		dlmlock->l_flags |= LDLM_FL_CBPENDING;
-		unlock_res_and_lock(dlmlock);
-		if (do_cancel)
-			result = ldlm_cli_cancel(&olck->ols_handle, LCF_ASYNC);
-		if (result < 0)
-			CL_LOCK_DEBUG(D_ERROR, env, lock,
-				      "lock %p cancel failure with error(%d)\n",
-				      lock, result);
-	}
-	olck->ols_state = OLS_CANCELLED;
-	olck->ols_flags &= ~LDLM_FL_LVB_READY;
-	osc_lock_detach(env, olck);
-}
-
-static int osc_lock_has_pages(struct osc_lock *olck)
-{
-	return 0;
-}
-
-static void osc_lock_delete(const struct lu_env *env,
-			    const struct cl_lock_slice *slice)
-{
-	struct osc_lock *olck;
-
-	olck = cl2osc_lock(slice);
-	if (olck->ols_glimpse) {
-		LASSERT(!olck->ols_hold);
-		LASSERT(!olck->ols_lock);
-		return;
-	}
-
-	LINVRNT(osc_lock_invariant(olck));
-	LINVRNT(!osc_lock_has_pages(olck));
-
-	osc_lock_unhold(olck);
-	osc_lock_detach(env, olck);
-}
+	struct osc_object *obj  = cl2osc(slice->cls_obj);
+	struct osc_lock *oscl = cl2osc_lock(slice);
 
-/**
- * Implements cl_lock_operations::clo_state() method for osc layer.
- *
- * Maintains osc_lock::ols_owner field.
- *
- * This assumes that lock always enters CLS_HELD (from some other state) in
- * the same IO context as one that requested the lock. This should not be a
- * problem, because context is by definition shared by all activity pertaining
- * to the same high-level IO.
- */
-static void osc_lock_state(const struct lu_env *env,
-			   const struct cl_lock_slice *slice,
-			   enum cl_lock_state state)
-{
-	struct osc_lock *lock = cl2osc_lock(slice);
+	LINVRNT(osc_lock_invariant(oscl));
 
-	/*
-	 * XXX multiple io contexts can use the lock at the same time.
-	 */
-	LINVRNT(osc_lock_invariant(lock));
-	if (state == CLS_HELD && slice->cls_lock->cll_state != CLS_HELD) {
-		struct osc_io *oio = osc_env_io(env);
+	osc_lock_detach(env, oscl);
+	oscl->ols_state = OLS_CANCELLED;
+	oscl->ols_flags &= ~LDLM_FL_LVB_READY;
 
-		LASSERT(!lock->ols_owner);
-		lock->ols_owner = oio;
-	} else if (state != CLS_HELD)
-		lock->ols_owner = NULL;
+	osc_lock_wake_waiters(env, obj, oscl);
 }
 
 static int osc_lock_print(const struct lu_env *env, void *cookie,
@@ -1475,197 +1057,161 @@ static int osc_lock_print(const struct lu_env *env, void *cookie,
 {
 	struct osc_lock *lock = cl2osc_lock(slice);
 
-	/*
-	 * XXX print ldlm lock and einfo properly.
-	 */
 	(*p)(env, cookie, "%p %#16llx %#llx %d %p ",
-	     lock->ols_lock, lock->ols_flags, lock->ols_handle.cookie,
+	     lock->ols_dlmlock, lock->ols_flags, lock->ols_handle.cookie,
 	     lock->ols_state, lock->ols_owner);
 	osc_lvb_print(env, cookie, p, &lock->ols_lvb);
 	return 0;
 }
 
-static int osc_lock_fits_into(const struct lu_env *env,
-			      const struct cl_lock_slice *slice,
-			      const struct cl_lock_descr *need,
-			      const struct cl_io *io)
-{
-	struct osc_lock *ols = cl2osc_lock(slice);
-
-	if (need->cld_enq_flags & CEF_NEVER)
-		return 0;
-
-	if (ols->ols_state >= OLS_CANCELLED)
-		return 0;
-
-	if (need->cld_mode == CLM_PHANTOM) {
-		if (ols->ols_agl)
-			return !(ols->ols_state > OLS_RELEASED);
-
-		/*
-		 * Note: the QUEUED lock can't be matched here, otherwise
-		 * it might cause the deadlocks.
-		 * In read_process,
-		 * P1: enqueued read lock, create sublock1
-		 * P2: enqueued write lock, create sublock2(conflicted
-		 *     with sublock1).
-		 * P1: Grant read lock.
-		 * P1: enqueued glimpse lock(with holding sublock1_read),
-		 *     matched with sublock2, waiting sublock2 to be granted.
-		 *     But sublock2 can not be granted, because P1
-		 *     will not release sublock1. Bang!
-		 */
-		if (ols->ols_state < OLS_GRANTED ||
-		    ols->ols_state > OLS_RELEASED)
-			return 0;
-	} else if (need->cld_enq_flags & CEF_MUST) {
-		/*
-		 * If the lock hasn't ever enqueued, it can't be matched
-		 * because enqueue process brings in many information
-		 * which can be used to determine things such as lockless,
-		 * CEF_MUST, etc.
-		 */
-		if (ols->ols_state < OLS_UPCALL_RECEIVED &&
-		    ols->ols_locklessable)
-			return 0;
-	}
-	return 1;
-}
-
 static const struct cl_lock_operations osc_lock_ops = {
 	.clo_fini    = osc_lock_fini,
 	.clo_enqueue = osc_lock_enqueue,
-	.clo_wait    = osc_lock_wait,
-	.clo_unuse   = osc_lock_unuse,
-	.clo_use     = osc_lock_use,
-	.clo_delete  = osc_lock_delete,
-	.clo_state   = osc_lock_state,
 	.clo_cancel  = osc_lock_cancel,
 	.clo_print   = osc_lock_print,
-	.clo_fits_into = osc_lock_fits_into,
 };
 
-static int osc_lock_lockless_unuse(const struct lu_env *env,
-				   const struct cl_lock_slice *slice)
-{
-	struct osc_lock *ols = cl2osc_lock(slice);
-	struct cl_lock *lock = slice->cls_lock;
-
-	LASSERT(ols->ols_state == OLS_GRANTED);
-	LINVRNT(osc_lock_invariant(ols));
-
-	cl_lock_cancel(env, lock);
-	cl_lock_delete(env, lock);
-	return 0;
-}
-
 static void osc_lock_lockless_cancel(const struct lu_env *env,
 				     const struct cl_lock_slice *slice)
 {
 	struct osc_lock *ols = cl2osc_lock(slice);
+	struct osc_object *osc = cl2osc(slice->cls_obj);
+	struct cl_lock_descr *descr = &slice->cls_lock->cll_descr;
 	int result;
 
-	result = osc_lock_flush(ols, 0);
+	LASSERT(!ols->ols_dlmlock);
+	result = osc_lock_flush(osc, descr->cld_start, descr->cld_end,
+				descr->cld_mode, 0);
 	if (result)
 		CERROR("Pages for lockless lock %p were not purged(%d)\n",
 		       ols, result);
-	ols->ols_state = OLS_CANCELLED;
-}
 
-static int osc_lock_lockless_wait(const struct lu_env *env,
-				  const struct cl_lock_slice *slice)
-{
-	struct osc_lock *olck = cl2osc_lock(slice);
-	struct cl_lock *lock = olck->ols_cl.cls_lock;
-
-	LINVRNT(osc_lock_invariant(olck));
-	LASSERT(olck->ols_state >= OLS_UPCALL_RECEIVED);
-
-	return lock->cll_error;
+	osc_lock_wake_waiters(env, osc, ols);
 }
 
-static void osc_lock_lockless_state(const struct lu_env *env,
-				    const struct cl_lock_slice *slice,
-				    enum cl_lock_state state)
-{
-	struct osc_lock *lock = cl2osc_lock(slice);
+static const struct cl_lock_operations osc_lock_lockless_ops = {
+	.clo_fini	= osc_lock_fini,
+	.clo_enqueue	= osc_lock_enqueue,
+	.clo_cancel	= osc_lock_lockless_cancel,
+	.clo_print	= osc_lock_print
+};
 
-	LINVRNT(osc_lock_invariant(lock));
-	if (state == CLS_HELD) {
-		struct osc_io *oio = osc_env_io(env);
+static void osc_lock_set_writer(const struct lu_env *env,
+				const struct cl_io *io,
+				struct cl_object *obj, struct osc_lock *oscl)
+{
+	struct cl_lock_descr *descr = &oscl->ols_cl.cls_lock->cll_descr;
+	pgoff_t io_start;
+	pgoff_t io_end;
 
-		LASSERT(ergo(lock->ols_owner, lock->ols_owner == oio));
-		lock->ols_owner = oio;
+	if (!cl_object_same(io->ci_obj, obj))
+		return;
 
-		/* set the io to be lockless if this lock is for io's
-		 * host object
-		 */
-		if (cl_object_same(oio->oi_cl.cis_obj, slice->cls_obj))
-			oio->oi_lockless = 1;
+	if (likely(io->ci_type == CIT_WRITE)) {
+		io_start = cl_index(obj, io->u.ci_rw.crw_pos);
+		io_end = cl_index(obj, io->u.ci_rw.crw_pos +
+				  io->u.ci_rw.crw_count - 1);
+		if (cl_io_is_append(io)) {
+			io_start = 0;
+			io_end = CL_PAGE_EOF;
+		}
+	} else {
+		LASSERT(cl_io_is_mkwrite(io));
+		io_start = io_end = io->u.ci_fault.ft_index;
 	}
-}
 
-static int osc_lock_lockless_fits_into(const struct lu_env *env,
-				       const struct cl_lock_slice *slice,
-				       const struct cl_lock_descr *need,
-				       const struct cl_io *io)
-{
-	struct osc_lock *lock = cl2osc_lock(slice);
-
-	if (!(need->cld_enq_flags & CEF_NEVER))
-		return 0;
+	if (descr->cld_mode >= CLM_WRITE &&
+	    descr->cld_start <= io_start && descr->cld_end >= io_end) {
+		struct osc_io *oio = osc_env_io(env);
 
-	/* lockless lock should only be used by its owning io. b22147 */
-	return (lock->ols_owner == osc_env_io(env));
+		/* There must be only one lock to match the write region */
+		LASSERT(!oio->oi_write_osclock);
+		oio->oi_write_osclock = oscl;
+	}
 }
 
-static const struct cl_lock_operations osc_lock_lockless_ops = {
-	.clo_fini      = osc_lock_fini,
-	.clo_enqueue   = osc_lock_enqueue,
-	.clo_wait      = osc_lock_lockless_wait,
-	.clo_unuse     = osc_lock_lockless_unuse,
-	.clo_state     = osc_lock_lockless_state,
-	.clo_fits_into = osc_lock_lockless_fits_into,
-	.clo_cancel    = osc_lock_lockless_cancel,
-	.clo_print     = osc_lock_print
-};
-
 int osc_lock_init(const struct lu_env *env,
 		  struct cl_object *obj, struct cl_lock *lock,
-		  const struct cl_io *unused)
+		  const struct cl_io *io)
 {
-	struct osc_lock *clk;
-	int result;
+	struct osc_lock *oscl;
+	__u32 enqflags = lock->cll_descr.cld_enq_flags;
+
+	oscl = kmem_cache_zalloc(osc_lock_kmem, GFP_NOFS);
+	if (!oscl)
+		return -ENOMEM;
+
+	oscl->ols_state = OLS_NEW;
+	spin_lock_init(&oscl->ols_lock);
+	INIT_LIST_HEAD(&oscl->ols_waiting_list);
+	INIT_LIST_HEAD(&oscl->ols_wait_entry);
+	INIT_LIST_HEAD(&oscl->ols_nextlock_oscobj);
+
+	oscl->ols_flags = osc_enq2ldlm_flags(enqflags);
+	oscl->ols_agl = !!(enqflags & CEF_AGL);
+	if (oscl->ols_agl)
+		oscl->ols_flags |= LDLM_FL_BLOCK_NOWAIT;
+	if (oscl->ols_flags & LDLM_FL_HAS_INTENT) {
+		oscl->ols_flags |= LDLM_FL_BLOCK_GRANTED;
+		oscl->ols_glimpse = 1;
+	}
 
-	clk = kmem_cache_zalloc(osc_lock_kmem, GFP_NOFS);
-	if (clk) {
-		__u32 enqflags = lock->cll_descr.cld_enq_flags;
+	cl_lock_slice_add(lock, &oscl->ols_cl, obj, &osc_lock_ops);
 
-		osc_lock_build_einfo(env, lock, clk, &clk->ols_einfo);
-		clk->ols_state = OLS_NEW;
+	if (!(enqflags & CEF_MUST))
+		/* try to convert this lock to a lockless lock */
+		osc_lock_to_lockless(env, oscl, (enqflags & CEF_NEVER));
+	if (oscl->ols_locklessable && !(enqflags & CEF_DISCARD_DATA))
+		oscl->ols_flags |= LDLM_FL_DENY_ON_CONTENTION;
 
-		clk->ols_flags = osc_enq2ldlm_flags(enqflags);
-		clk->ols_agl = !!(enqflags & CEF_AGL);
-		if (clk->ols_agl)
-			clk->ols_flags |= LDLM_FL_BLOCK_NOWAIT;
-		if (clk->ols_flags & LDLM_FL_HAS_INTENT)
-			clk->ols_glimpse = 1;
+	if (io->ci_type == CIT_WRITE || cl_io_is_mkwrite(io))
+		osc_lock_set_writer(env, io, obj, oscl);
 
-		cl_lock_slice_add(lock, &clk->ols_cl, obj, &osc_lock_ops);
 
-		if (!(enqflags & CEF_MUST))
-			/* try to convert this lock to a lockless lock */
-			osc_lock_to_lockless(env, clk, (enqflags & CEF_NEVER));
-		if (clk->ols_locklessable && !(enqflags & CEF_DISCARD_DATA))
-			clk->ols_flags |= LDLM_FL_DENY_ON_CONTENTION;
+	LDLM_DEBUG_NOLOCK("lock %p, osc lock %p, flags %llx\n",
+			  lock, oscl, oscl->ols_flags);
 
-		LDLM_DEBUG_NOLOCK("lock %p, osc lock %p, flags %llx",
-				  lock, clk, clk->ols_flags);
+	return 0;
+}
 
-		result = 0;
-	} else
-		result = -ENOMEM;
-	return result;
+/**
+ * Finds an existing DLM lock covering the given index, optionally matching
+ * locks with a pending blocking callback or locks being canceled.
+ */
+struct ldlm_lock *osc_dlmlock_at_pgoff(const struct lu_env *env,
+				       struct osc_object *obj, pgoff_t index,
+				       int pending, int canceling)
+{
+	struct osc_thread_info *info = osc_env_info(env);
+	struct ldlm_res_id *resname = &info->oti_resname;
+	ldlm_policy_data_t *policy  = &info->oti_policy;
+	struct lustre_handle lockh;
+	struct ldlm_lock *lock = NULL;
+	enum ldlm_mode mode;
+	__u64 flags;
+
+	ostid_build_res_name(&obj->oo_oinfo->loi_oi, resname);
+	osc_index2policy(policy, osc2cl(obj), index, index);
+	policy->l_extent.gid = LDLM_GID_ANY;
+
+	flags = LDLM_FL_BLOCK_GRANTED | LDLM_FL_TEST_LOCK;
+	if (pending)
+		flags |= LDLM_FL_CBPENDING;
+	/*
+	 * It is fine to match any group lock since there could be only one
+	 * with a unique gid and it conflicts with all other lock modes too
+	 */
+again:
+	mode = ldlm_lock_match(osc_export(obj)->exp_obd->obd_namespace,
+			       flags, resname, LDLM_EXTENT, policy,
+			       LCK_PR | LCK_PW | LCK_GROUP, &lockh, canceling);
+	if (mode != 0) {
+		lock = ldlm_handle2lock(&lockh);
+		/* RACE: the lock is cancelled so let's try again */
+		if (unlikely(!lock))
+			goto again;
+	}
+	return lock;
 }
 
 /** @} osc */
diff --git a/drivers/staging/lustre/lustre/osc/osc_object.c b/drivers/staging/lustre/lustre/osc/osc_object.c
index 2d2d39a..a06bdf1 100644
--- a/drivers/staging/lustre/lustre/osc/osc_object.c
+++ b/drivers/staging/lustre/lustre/osc/osc_object.c
@@ -96,6 +96,8 @@ static int osc_object_init(const struct lu_env *env, struct lu_object *obj,
 	atomic_set(&osc->oo_nr_writes, 0);
 	spin_lock_init(&osc->oo_lock);
 	spin_lock_init(&osc->oo_tree_lock);
+	spin_lock_init(&osc->oo_ol_spin);
+	INIT_LIST_HEAD(&osc->oo_ol_list);
 
 	cl_object_page_init(lu2cl(obj), sizeof(struct osc_page));
 
@@ -122,6 +124,7 @@ static void osc_object_free(const struct lu_env *env, struct lu_object *obj)
 	LASSERT(list_empty(&osc->oo_reading_exts));
 	LASSERT(atomic_read(&osc->oo_nr_reads) == 0);
 	LASSERT(atomic_read(&osc->oo_nr_writes) == 0);
+	LASSERT(list_empty(&osc->oo_ol_list));
 
 	lu_object_fini(obj);
 	kmem_cache_free(osc_object_kmem, osc);
@@ -194,6 +197,32 @@ static int osc_object_glimpse(const struct lu_env *env,
 	return 0;
 }
 
+static int osc_object_ast_clear(struct ldlm_lock *lock, void *data)
+{
+	LASSERT(lock->l_granted_mode == lock->l_req_mode);
+	if (lock->l_ast_data == data)
+		lock->l_ast_data = NULL;
+	return LDLM_ITER_CONTINUE;
+}
+
+static int osc_object_prune(const struct lu_env *env, struct cl_object *obj)
+{
+	struct osc_object       *osc = cl2osc(obj);
+	struct ldlm_res_id      *resname = &osc_env_info(env)->oti_resname;
+
+	LASSERTF(osc->oo_npages == 0,
+		 DFID "still have %lu pages, obj: %p, osc: %p\n",
+		 PFID(lu_object_fid(&obj->co_lu)), osc->oo_npages, obj, osc);
+
+	/* DLM locks don't hold a reference on the osc_object, so we have
+	 * to clear it before the object is destroyed.
+	 */
+	ostid_build_res_name(&osc->oo_oinfo->loi_oi, resname);
+	ldlm_resource_iterate(osc_export(osc)->exp_obd->obd_namespace, resname,
+			      osc_object_ast_clear, osc);
+	return 0;
+}
+
 void osc_object_set_contended(struct osc_object *obj)
 {
 	obj->oo_contention_time = cfs_time_current();
@@ -238,12 +267,12 @@ static const struct cl_object_operations osc_ops = {
 	.coo_io_init   = osc_io_init,
 	.coo_attr_get  = osc_attr_get,
 	.coo_attr_set  = osc_attr_set,
-	.coo_glimpse   = osc_object_glimpse
+	.coo_glimpse   = osc_object_glimpse,
+	.coo_prune     = osc_object_prune
 };
 
 static const struct lu_object_operations osc_lu_obj_ops = {
 	.loo_object_init      = osc_object_init,
-	.loo_object_delete    = NULL,
 	.loo_object_release   = NULL,
 	.loo_object_free      = osc_object_free,
 	.loo_object_print     = osc_object_print,
diff --git a/drivers/staging/lustre/lustre/osc/osc_page.c b/drivers/staging/lustre/lustre/osc/osc_page.c
index 5b31351..82979f4 100644
--- a/drivers/staging/lustre/lustre/osc/osc_page.c
+++ b/drivers/staging/lustre/lustre/osc/osc_page.c
@@ -135,15 +135,15 @@ static int osc_page_is_under_lock(const struct lu_env *env,
 				  struct cl_io *unused, pgoff_t *max_index)
 {
 	struct osc_page *opg = cl2osc_page(slice);
-	struct cl_lock *lock;
+	struct ldlm_lock *dlmlock;
 	int result = -ENODATA;
 
-	*max_index = 0;
-	lock = cl_lock_at_pgoff(env, slice->cpl_obj, osc_index(opg),
-				NULL, 1, 0);
-	if (lock) {
-		*max_index = lock->cll_descr.cld_end;
-		cl_lock_put(env, lock);
+	dlmlock = osc_dlmlock_at_pgoff(env, cl2osc(slice->cpl_obj),
+				       osc_index(opg), 1, 0);
+	if (dlmlock) {
+		*max_index = cl_index(slice->cpl_obj,
+				      dlmlock->l_policy_data.l_extent.end);
+		LDLM_LOCK_PUT(dlmlock);
 		result = 0;
 	}
 	return result;
diff --git a/drivers/staging/lustre/lustre/osc/osc_request.c b/drivers/staging/lustre/lustre/osc/osc_request.c
index 372bd26..368b997 100644
--- a/drivers/staging/lustre/lustre/osc/osc_request.c
+++ b/drivers/staging/lustre/lustre/osc/osc_request.c
@@ -92,12 +92,13 @@ struct osc_fsync_args {
 
 struct osc_enqueue_args {
 	struct obd_export	*oa_exp;
+	enum ldlm_type		oa_type;
+	enum ldlm_mode		oa_mode;
 	__u64		    *oa_flags;
-	obd_enqueue_update_f      oa_upcall;
+	osc_enqueue_upcall_f	oa_upcall;
 	void		     *oa_cookie;
 	struct ost_lvb	   *oa_lvb;
-	struct lustre_handle     *oa_lockh;
-	struct ldlm_enqueue_info *oa_ei;
+	struct lustre_handle	oa_lockh;
 	unsigned int	      oa_agl:1;
 };
 
@@ -2068,14 +2069,12 @@ static int osc_set_lock_data_with_check(struct ldlm_lock *lock,
 	LASSERT(lock->l_glimpse_ast == einfo->ei_cb_gl);
 
 	lock_res_and_lock(lock);
-	spin_lock(&osc_ast_guard);
 
 	if (!lock->l_ast_data)
 		lock->l_ast_data = data;
 	if (lock->l_ast_data == data)
 		set = 1;
 
-	spin_unlock(&osc_ast_guard);
 	unlock_res_and_lock(lock);
 
 	return set;
@@ -2117,36 +2116,38 @@ static int osc_find_cbdata(struct obd_export *exp, struct lov_stripe_md *lsm,
 	return rc;
 }
 
-static int osc_enqueue_fini(struct ptlrpc_request *req, struct ost_lvb *lvb,
-			    obd_enqueue_update_f upcall, void *cookie,
-			    __u64 *flags, int agl, int rc)
+static int osc_enqueue_fini(struct ptlrpc_request *req,
+			    osc_enqueue_upcall_f upcall, void *cookie,
+			    struct lustre_handle *lockh, enum ldlm_mode mode,
+			    __u64 *flags, int agl, int errcode)
 {
-	int intent = *flags & LDLM_FL_HAS_INTENT;
-
-	if (intent) {
-		/* The request was created before ldlm_cli_enqueue call. */
-		if (rc == ELDLM_LOCK_ABORTED) {
-			struct ldlm_reply *rep;
+	bool intent = *flags & LDLM_FL_HAS_INTENT;
+	int rc;
 
-			rep = req_capsule_server_get(&req->rq_pill,
-						     &RMF_DLM_REP);
+	/* The request was created before ldlm_cli_enqueue call. */
+	if (intent && errcode == ELDLM_LOCK_ABORTED) {
+		struct ldlm_reply *rep;
 
-			rep->lock_policy_res1 =
-				ptlrpc_status_ntoh(rep->lock_policy_res1);
-			if (rep->lock_policy_res1)
-				rc = rep->lock_policy_res1;
-		}
-	}
+		rep = req_capsule_server_get(&req->rq_pill, &RMF_DLM_REP);
 
-	if ((intent != 0 && rc == ELDLM_LOCK_ABORTED && agl == 0) ||
-	    (rc == 0)) {
+		rep->lock_policy_res1 =
+			ptlrpc_status_ntoh(rep->lock_policy_res1);
+		if (rep->lock_policy_res1)
+			errcode = rep->lock_policy_res1;
+		if (!agl)
+			*flags |= LDLM_FL_LVB_READY;
+	} else if (errcode == ELDLM_OK) {
 		*flags |= LDLM_FL_LVB_READY;
-		CDEBUG(D_INODE, "got kms %llu blocks %llu mtime %llu\n",
-		       lvb->lvb_size, lvb->lvb_blocks, lvb->lvb_mtime);
 	}
 
 	/* Call the update callback. */
-	rc = (*upcall)(cookie, rc);
+	rc = (*upcall)(cookie, lockh, errcode);
+	/* release the reference taken in ldlm_cli_enqueue() */
+	if (errcode == ELDLM_LOCK_MATCHED)
+		errcode = ELDLM_OK;
+	if (errcode == ELDLM_OK && lustre_handle_is_used(lockh))
+		ldlm_lock_decref(lockh, mode);
+
 	return rc;
 }
 
@@ -2155,62 +2156,47 @@ static int osc_enqueue_interpret(const struct lu_env *env,
 				 struct osc_enqueue_args *aa, int rc)
 {
 	struct ldlm_lock *lock;
-	struct lustre_handle handle;
-	__u32 mode;
-	struct ost_lvb *lvb;
-	__u32 lvb_len;
-	__u64 *flags = aa->oa_flags;
-
-	/* Make a local copy of a lock handle and a mode, because aa->oa_*
-	 * might be freed anytime after lock upcall has been called.
-	 */
-	lustre_handle_copy(&handle, aa->oa_lockh);
-	mode = aa->oa_ei->ei_mode;
+	struct lustre_handle *lockh = &aa->oa_lockh;
+	enum ldlm_mode mode = aa->oa_mode;
+	struct ost_lvb *lvb = aa->oa_lvb;
+	__u32 lvb_len = sizeof(*lvb);
+	__u64 flags = 0;
+
 
 	/* ldlm_cli_enqueue is holding a reference on the lock, so it must
 	 * be valid.
 	 */
-	lock = ldlm_handle2lock(&handle);
+	lock = ldlm_handle2lock(lockh);
+	LASSERTF(lock, "lockh %llx, req %p, aa %p - client evicted?\n",
+		 lockh->cookie, req, aa);
 
 	/* Take an additional reference so that a blocking AST that
 	 * ldlm_cli_enqueue_fini() might post for a failed lock, is guaranteed
 	 * to arrive after an upcall has been executed by
 	 * osc_enqueue_fini().
 	 */
-	ldlm_lock_addref(&handle, mode);
+	ldlm_lock_addref(lockh, mode);
 
 	/* Let CP AST to grant the lock first. */
 	OBD_FAIL_TIMEOUT(OBD_FAIL_OSC_CP_ENQ_RACE, 1);
 
-	if (aa->oa_agl && rc == ELDLM_LOCK_ABORTED) {
-		lvb = NULL;
-		lvb_len = 0;
-	} else {
-		lvb = aa->oa_lvb;
-		lvb_len = sizeof(*aa->oa_lvb);
+	if (aa->oa_agl) {
+		LASSERT(!aa->oa_lvb);
+		LASSERT(!aa->oa_flags);
+		aa->oa_flags = &flags;
 	}
 
 	/* Complete obtaining the lock procedure. */
-	rc = ldlm_cli_enqueue_fini(aa->oa_exp, req, aa->oa_ei->ei_type, 1,
-				   mode, flags, lvb, lvb_len, &handle, rc);
+	rc = ldlm_cli_enqueue_fini(aa->oa_exp, req, aa->oa_type, 1,
+				   aa->oa_mode, aa->oa_flags, lvb, lvb_len,
+				   lockh, rc);
 	/* Complete osc stuff. */
-	rc = osc_enqueue_fini(req, aa->oa_lvb, aa->oa_upcall, aa->oa_cookie,
-			      flags, aa->oa_agl, rc);
+	rc = osc_enqueue_fini(req, aa->oa_upcall, aa->oa_cookie, lockh, mode,
+			      aa->oa_flags, aa->oa_agl, rc);
 
 	OBD_FAIL_TIMEOUT(OBD_FAIL_OSC_CP_CANCEL_RACE, 10);
 
-	/* Release the lock for async request. */
-	if (lustre_handle_is_used(&handle) && rc == ELDLM_OK)
-		/*
-		 * Releases a reference taken by ldlm_cli_enqueue(), if it is
-		 * not already released by
-		 * ldlm_cli_enqueue_fini()->failed_lock_cleanup()
-		 */
-		ldlm_lock_decref(&handle, mode);
-
-	LASSERTF(lock, "lockh %p, req %p, aa %p - client evicted?\n",
-		 aa->oa_lockh, req, aa);
-	ldlm_lock_decref(&handle, mode);
+	ldlm_lock_decref(lockh, mode);
 	LDLM_LOCK_PUT(lock);
 	return rc;
 }
@@ -2222,21 +2208,21 @@ struct ptlrpc_request_set *PTLRPCD_SET = (void *)1;
  * other synchronous requests, however keeping some locks and trying to obtain
  * others may take a considerable amount of time in a case of ost failure; and
  * when other sync requests do not get released lock from a client, the client
- * is excluded from the cluster -- such scenarious make the life difficult, so
+ * is evicted from the cluster -- such scenarios make life difficult, so
  * release locks just after they are obtained.
  */
 int osc_enqueue_base(struct obd_export *exp, struct ldlm_res_id *res_id,
 		     __u64 *flags, ldlm_policy_data_t *policy,
 		     struct ost_lvb *lvb, int kms_valid,
-		     obd_enqueue_update_f upcall, void *cookie,
+		     osc_enqueue_upcall_f upcall, void *cookie,
 		     struct ldlm_enqueue_info *einfo,
-		     struct lustre_handle *lockh,
 		     struct ptlrpc_request_set *rqset, int async, int agl)
 {
 	struct obd_device *obd = exp->exp_obd;
+	struct lustre_handle lockh = { 0 };
 	struct ptlrpc_request *req = NULL;
 	int intent = *flags & LDLM_FL_HAS_INTENT;
-	__u64 match_lvb = (agl != 0 ? 0 : LDLM_FL_LVB_READY);
+	__u64 match_lvb = agl ? 0 : LDLM_FL_LVB_READY;
 	enum ldlm_mode mode;
 	int rc;
 
@@ -2272,55 +2258,39 @@ int osc_enqueue_base(struct obd_export *exp, struct ldlm_res_id *res_id,
 	if (einfo->ei_mode == LCK_PR)
 		mode |= LCK_PW;
 	mode = ldlm_lock_match(obd->obd_namespace, *flags | match_lvb, res_id,
-			       einfo->ei_type, policy, mode, lockh, 0);
+			       einfo->ei_type, policy, mode, &lockh, 0);
 	if (mode) {
-		struct ldlm_lock *matched = ldlm_handle2lock(lockh);
+		struct ldlm_lock *matched;
 
-		if ((agl != 0) && !(matched->l_flags & LDLM_FL_LVB_READY)) {
-			/* For AGL, if enqueue RPC is sent but the lock is not
-			 * granted, then skip to process this strpe.
-			 * Return -ECANCELED to tell the caller.
+		if (*flags & LDLM_FL_TEST_LOCK)
+			return ELDLM_OK;
+
+		matched = ldlm_handle2lock(&lockh);
+		if (agl) {
+			/* AGL enqueues DLM locks speculatively. Therefore if
+			 * a DLM lock already exists, it will just inform the
+			 * caller to cancel the AGL process for this stripe.
 			 */
-			ldlm_lock_decref(lockh, mode);
+			ldlm_lock_decref(&lockh, mode);
 			LDLM_LOCK_PUT(matched);
 			return -ECANCELED;
-		}
-
-		if (osc_set_lock_data_with_check(matched, einfo)) {
+		} else if (osc_set_lock_data_with_check(matched, einfo)) {
 			*flags |= LDLM_FL_LVB_READY;
-			/* addref the lock only if not async requests and PW
-			 * lock is matched whereas we asked for PR.
-			 */
-			if (!rqset && einfo->ei_mode != mode)
-				ldlm_lock_addref(lockh, LCK_PR);
-			if (intent) {
-				/* I would like to be able to ASSERT here that
-				 * rss <= kms, but I can't, for reasons which
-				 * are explained in lov_enqueue()
-				 */
-			}
-
-			/* We already have a lock, and it's referenced.
-			 *
-			 * At this point, the cl_lock::cll_state is CLS_QUEUING,
-			 * AGL upcall may change it to CLS_HELD directly.
-			 */
-			(*upcall)(cookie, ELDLM_OK);
+			/* We already have a lock, and it's referenced. */
+			(*upcall)(cookie, &lockh, ELDLM_LOCK_MATCHED);
 
-			if (einfo->ei_mode != mode)
-				ldlm_lock_decref(lockh, LCK_PW);
-			else if (rqset)
-				/* For async requests, decref the lock. */
-				ldlm_lock_decref(lockh, einfo->ei_mode);
+			ldlm_lock_decref(&lockh, mode);
 			LDLM_LOCK_PUT(matched);
 			return ELDLM_OK;
+		} else {
+			ldlm_lock_decref(&lockh, mode);
+			LDLM_LOCK_PUT(matched);
 		}
-
-		ldlm_lock_decref(lockh, mode);
-		LDLM_LOCK_PUT(matched);
 	}
 
- no_match:
+no_match:
+	if (*flags & LDLM_FL_TEST_LOCK)
+		return -ENOLCK;
 	if (intent) {
 		LIST_HEAD(cancels);
 
@@ -2344,21 +2314,31 @@ int osc_enqueue_base(struct obd_export *exp, struct ldlm_res_id *res_id,
 	*flags &= ~LDLM_FL_BLOCK_GRANTED;
 
 	rc = ldlm_cli_enqueue(exp, &req, einfo, res_id, policy, flags, lvb,
-			      sizeof(*lvb), LVB_T_OST, lockh, async);
-	if (rqset) {
+			      sizeof(*lvb), LVB_T_OST, &lockh, async);
+	if (async) {
 		if (!rc) {
 			struct osc_enqueue_args *aa;
 
-			CLASSERT (sizeof(*aa) <= sizeof(req->rq_async_args));
+			CLASSERT(sizeof(*aa) <= sizeof(req->rq_async_args));
 			aa = ptlrpc_req_async_args(req);
-			aa->oa_ei = einfo;
 			aa->oa_exp = exp;
-			aa->oa_flags  = flags;
+			aa->oa_mode = einfo->ei_mode;
+			aa->oa_type = einfo->ei_type;
+			lustre_handle_copy(&aa->oa_lockh, &lockh);
 			aa->oa_upcall = upcall;
 			aa->oa_cookie = cookie;
-			aa->oa_lvb    = lvb;
-			aa->oa_lockh  = lockh;
 			aa->oa_agl    = !!agl;
+			if (!agl) {
+				aa->oa_flags = flags;
+				aa->oa_lvb = lvb;
+			} else {
+				/* AGL is essentially to enqueue a DLM lock
+				 * in advance, so we don't care about the
+				 * result of the AGL enqueue.
+				 */
+				aa->oa_lvb = NULL;
+				aa->oa_flags = NULL;
+			}
 
 			req->rq_interpret_reply =
 				(ptlrpc_interpterer_t)osc_enqueue_interpret;
@@ -2372,7 +2352,8 @@ int osc_enqueue_base(struct obd_export *exp, struct ldlm_res_id *res_id,
 		return rc;
 	}
 
-	rc = osc_enqueue_fini(req, lvb, upcall, cookie, flags, agl, rc);
+	rc = osc_enqueue_fini(req, upcall, cookie, &lockh, einfo->ei_mode,
+			      flags, agl, rc);
 	if (intent)
 		ptlrpc_req_finished(req);
 
@@ -3359,7 +3340,6 @@ static struct obd_ops osc_obd_ops = {
 };
 
 extern struct lu_kmem_descr osc_caches[];
-extern spinlock_t osc_ast_guard;
 extern struct lock_class_key osc_ast_guard_class;
 
 static int __init osc_init(void)
@@ -3386,9 +3366,6 @@ static int __init osc_init(void)
 	if (rc)
 		goto out_kmem;
 
-	spin_lock_init(&osc_ast_guard);
-	lockdep_set_class(&osc_ast_guard, &osc_ast_guard_class);
-
 	/* This is obviously too much memory, only prevent overflow here */
 	if (osc_reqpool_mem_max >= 1 << 12 || osc_reqpool_mem_max == 0) {
 		rc = -EINVAL;
-- 
2.1.0


* [PATCH v2 20/46] staging/lustre: update comments after cl_lock simplification
  2016-03-30 23:48 [PATCH v2 00/46] Lustre IO stack simplifications and cleanups green
                   ` (18 preceding siblings ...)
  2016-03-30 23:48 ` [PATCH v2 19/46] staging/lustre/clio: cl_lock simplification green
@ 2016-03-30 23:48 ` green
  2016-03-30 23:48 ` [PATCH v2 21/46] staging/lustre/llite: clip page correctly for vvp_io_commit_sync green
                   ` (25 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: green @ 2016-03-30 23:48 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger
  Cc: Linux Kernel Mailing List, Lustre Development List, Bobi Jam,
	Oleg Drokin

From: Bobi Jam <bobijam.xu@intel.com>

Update comments to reflect the current cl_lock design.
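
The enqueue/cancel pairing the updated comment describes can be sketched
as follows (illustration only, not part of the patch; the helper is
hypothetical and error handling is elided, but cl_lock_request() and
cl_lock_cancel() are the interfaces named in the new comment text):

	/* Hypothetical caller, for illustration: each layer's
	 * clo_enqueue() and clo_cancel() methods run underneath
	 * these two calls.
	 */
	static int demo_lock_round_trip(const struct lu_env *env,
					struct cl_io *io,
					struct cl_lock *lock)
	{
		/* cl_lock is created before the I/O starts */
		int rc = cl_lock_request(env, io, lock);

		if (rc < 0)
			return rc;

		/* ... the I/O runs under the lock ... */

		/* the lock is canceled once the I/O is complete */
		cl_lock_cancel(env, lock);
		return 0;
	}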

Signed-off-by: Bobi Jam <bobijam.xu@intel.com>
Reviewed-on: http://review.whamcloud.com/13137
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6046
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
---
 drivers/staging/lustre/lustre/include/cl_object.h  | 130 +++------------------
 .../staging/lustre/lustre/lov/lov_cl_internal.h    |  13 ---
 2 files changed, 19 insertions(+), 124 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/cl_object.h b/drivers/staging/lustre/lustre/include/cl_object.h
index 8f9512e..e613007 100644
--- a/drivers/staging/lustre/lustre/include/cl_object.h
+++ b/drivers/staging/lustre/lustre/include/cl_object.h
@@ -1117,111 +1117,29 @@ static inline struct page *cl_page_vmpage(struct cl_page *page)
  *
  * LIFE CYCLE
  *
- * cl_lock is reference counted. When reference counter drops to 0, lock is
- * placed in the cache, except when lock is in CLS_FREEING state. CLS_FREEING
- * lock is destroyed when last reference is released. Referencing between
- * top-lock and its sub-locks is described in the lov documentation module.
- *
- * STATE MACHINE
- *
- * Also, cl_lock is a state machine. This requires some clarification. One of
- * the goals of client IO re-write was to make IO path non-blocking, or at
- * least to make it easier to make it non-blocking in the future. Here
- * `non-blocking' means that when a system call (read, write, truncate)
- * reaches a situation where it has to wait for a communication with the
- * server, it should --instead of waiting-- remember its current state and
- * switch to some other work.  E.g,. instead of waiting for a lock enqueue,
- * client should proceed doing IO on the next stripe, etc. Obviously this is
- * rather radical redesign, and it is not planned to be fully implemented at
- * this time, instead we are putting some infrastructure in place, that would
- * make it easier to do asynchronous non-blocking IO easier in the
- * future. Specifically, where old locking code goes to sleep (waiting for
- * enqueue, for example), new code returns cl_lock_transition::CLO_WAIT. When
- * enqueue reply comes, its completion handler signals that lock state-machine
- * is ready to transit to the next state. There is some generic code in
- * cl_lock.c that sleeps, waiting for these signals. As a result, for users of
- * this cl_lock.c code, it looks like locking is done in normal blocking
- * fashion, and it the same time it is possible to switch to the non-blocking
- * locking (simply by returning cl_lock_transition::CLO_WAIT from cl_lock.c
- * functions).
- *
- * For a description of state machine states and transitions see enum
- * cl_lock_state.
- *
- * There are two ways to restrict a set of states which lock might move to:
- *
- *     - placing a "hold" on a lock guarantees that lock will not be moved
- *       into cl_lock_state::CLS_FREEING state until hold is released. Hold
- *       can be only acquired on a lock that is not in
- *       cl_lock_state::CLS_FREEING. All holds on a lock are counted in
- *       cl_lock::cll_holds. Hold protects lock from cancellation and
- *       destruction. Requests to cancel and destroy a lock on hold will be
- *       recorded, but only honored when last hold on a lock is released;
- *
- *     - placing a "user" on a lock guarantees that lock will not leave
- *       cl_lock_state::CLS_NEW, cl_lock_state::CLS_QUEUING,
- *       cl_lock_state::CLS_ENQUEUED and cl_lock_state::CLS_HELD set of
- *       states, once it enters this set. That is, if a user is added onto a
- *       lock in a state not from this set, it doesn't immediately enforce
- *       lock to move to this set, but once lock enters this set it will
- *       remain there until all users are removed. Lock users are counted in
- *       cl_lock::cll_users.
- *
- *       User is used to assure that lock is not canceled or destroyed while
- *       it is being enqueued, or actively used by some IO.
- *
- *       Currently, a user always comes with a hold (cl_lock_invariant()
- *       checks that a number of holds is not less than a number of users).
- *
- * CONCURRENCY
- *
- * This is how lock state-machine operates. struct cl_lock contains a mutex
- * cl_lock::cll_guard that protects struct fields.
- *
- *     - mutex is taken, and cl_lock::cll_state is examined.
- *
- *     - for every state there are possible target states where lock can move
- *       into. They are tried in order. Attempts to move into next state are
- *       done by _try() functions in cl_lock.c:cl_{enqueue,unlock,wait}_try().
- *
- *     - if the transition can be performed immediately, state is changed,
- *       and mutex is released.
- *
- *     - if the transition requires blocking, _try() function returns
- *       cl_lock_transition::CLO_WAIT. Caller unlocks mutex and goes to
- *       sleep, waiting for possibility of lock state change. It is woken
- *       up when some event occurs, that makes lock state change possible
- *       (e.g., the reception of the reply from the server), and repeats
- *       the loop.
- *
- * Top-lock and sub-lock has separate mutexes and the latter has to be taken
- * first to avoid dead-lock.
- *
- * To see an example of interaction of all these issues, take a look at the
- * lov_cl.c:lov_lock_enqueue() function. It is called as a part of
- * cl_enqueue_try(), and tries to advance top-lock to ENQUEUED state, by
- * advancing state-machines of its sub-locks (lov_lock_enqueue_one()). Note
- * also, that it uses trylock to grab sub-lock mutex to avoid dead-lock. It
- * also has to handle CEF_ASYNC enqueue, when sub-locks enqueues have to be
- * done in parallel, rather than one after another (this is used for glimpse
- * locks, that cannot dead-lock).
+ * cl_lock is a cacheless data container that holds the lock requirements
+ * of an IO. A cl_lock is created before the I/O starts and is destroyed
+ * when the I/O completes.
+ *
+ * cl_lock depends on an LDLM lock to fulfill lock semantics; the LDLM lock
+ * is attached to the cl_lock at the OSC layer, and remains cacheable.
  *
  * INTERFACE AND USAGE
  *
- * struct cl_lock_operations provide a number of call-backs that are invoked
- * when events of interest occurs. Layers can intercept and handle glimpse,
- * blocking, cancel ASTs and a reception of the reply from the server.
+ * Two major methods are supported for cl_lock: clo_enqueue and clo_cancel.  A
+ * cl_lock is enqueued by cl_lock_request(), which will call clo_enqueue()
+ * methods for each layer to enqueue the lock. At the LOV layer, if a cl_lock
+ * consists of multiple sub cl_locks, each sub-lock will be enqueued
+ * correspondingly. At the OSC layer, the lock enqueue request will try to
+ * reuse a cached LDLM lock; otherwise a new LDLM lock will have to be
+ * requested from the OST side.
  *
- * One important difference with the old client locking model is that new
- * client has a representation for the top-lock, whereas in the old code only
- * sub-locks existed as real data structures and file-level locks are
- * represented by "request sets" that are created and destroyed on each and
- * every lock creation.
+ * cl_lock_cancel() must be called to release a cl_lock after use. The
+ * clo_cancel() method is called for each layer to release the resources
+ * held by this lock. At the OSC layer, the reference on the LDLM lock,
+ * taken at clo_enqueue time, is released.
  *
- * Top-locks are cached, and can be found in the cache by the system calls. It
- * is possible that top-lock is in cache, but some of its sub-locks were
- * canceled and destroyed. In that case top-lock has to be enqueued again
- * before it can be used.
+ * An LDLM lock can only be canceled if there is no cl_lock using it.
  *
  * Overall process of the locking during IO operation is as following:
  *
@@ -1234,7 +1152,7 @@ static inline struct page *cl_page_vmpage(struct cl_page *page)
  *
  *     - when all locks are acquired, IO is performed;
  *
- *     - locks are released into cache.
+ *     - locks are released after IO is complete.
  *
  * Striping introduces major additional complexity into locking. The
  * fundamental problem is that it is generally unsafe to actively use (hold)
@@ -1256,16 +1174,6 @@ static inline struct page *cl_page_vmpage(struct cl_page *page)
  * buf is a part of memory mapped Lustre file, a lock or locks protecting buf
  * has to be held together with the usual lock on [offset, offset + count].
  *
- * As multi-stripe locks have to be allowed, it makes sense to cache them, so
- * that, for example, a sequence of O_APPEND writes can proceed quickly
- * without going down to the individual stripes to do lock matching. On the
- * other hand, multi-stripe locks shouldn't be used by normal read/write
- * calls. To achieve this, every layer can implement ->clo_fits_into() method,
- * that is called by lock matching code (cl_lock_lookup()), and that can be
- * used to selectively disable matching of certain locks for certain IOs. For
- * example, lov layer implements lov_lock_fits_into() that allow multi-stripe
- * locks to be matched only for truncates and O_APPEND writes.
- *
  * Interaction with DLM
  *
  * In the expected setup, cl_lock is ultimately backed up by a collection of
diff --git a/drivers/staging/lustre/lustre/lov/lov_cl_internal.h b/drivers/staging/lustre/lustre/lov/lov_cl_internal.h
index dfe41a8..ac9744e 100644
--- a/drivers/staging/lustre/lustre/lov/lov_cl_internal.h
+++ b/drivers/staging/lustre/lustre/lov/lov_cl_internal.h
@@ -73,19 +73,6 @@
  *     - top-page keeps a reference to its sub-page, and destroys it when it
  *       is destroyed.
  *
- *     - sub-lock keep a reference to its top-locks. Top-lock keeps a
- *       reference (and a hold, see cl_lock_hold()) on its sub-locks when it
- *       actively using them (that is, in cl_lock_state::CLS_QUEUING,
- *       cl_lock_state::CLS_ENQUEUED, cl_lock_state::CLS_HELD states). When
- *       moving into cl_lock_state::CLS_CACHED state, top-lock releases a
- *       hold. From this moment top-lock has only a 'weak' reference to its
- *       sub-locks. This reference is protected by top-lock
- *       cl_lock::cll_guard, and will be automatically cleared by the sub-lock
- *       when the latter is destroyed. When a sub-lock is canceled, a
- *       reference to it is removed from the top-lock array, and top-lock is
- *       moved into CLS_NEW state. It is guaranteed that all sub-locks exist
- *       while their top-lock is in CLS_HELD or CLS_CACHED states.
- *
  *     - IO's are not reference counted.
  *
  * To implement a connection between top and sub entities, lov layer is split
-- 
2.1.0


* [PATCH v2 21/46] staging/lustre/llite: clip page correctly for vvp_io_commit_sync
  2016-03-30 23:48 [PATCH v2 00/46] Lustre IO stack simplifications and cleanups green
                   ` (19 preceding siblings ...)
  2016-03-30 23:48 ` [PATCH v2 20/46] staging/lustre: update comments after " green
@ 2016-03-30 23:48 ` green
  2016-03-30 23:48 ` [PATCH v2 22/46] staging/lustre/llite: deadlock for page write green
                   ` (24 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: green @ 2016-03-30 23:48 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger
  Cc: Linux Kernel Mailing List, Lustre Development List,
	Jinshan Xiong, Oleg Drokin

From: Jinshan Xiong <jinshan.xiong@intel.com>

The original code was wrong: it clipped pages incorrectly for
partial pages starting at offset zero.
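
For illustration, a minimal user-space sketch (not the kernel code;
the PAGE_SIZE value and offsets are assumed) contrasting the old and
new clip decisions for a one-page write covering bytes [0, 1024):

  #include <stdio.h>

  #define PAGE_SIZE 4096

  /* Old logic: a single page with from == 0 and to < PAGE_SIZE is
   * never clipped, so bytes past 'to' are wrongly treated as valid. */
  static void old_clip(int nr, int from, int to)
  {
          if (from != 0)
                  printf("clip first [%d, %d]\n",
                         from, nr == 1 ? to : PAGE_SIZE);
          if (to != PAGE_SIZE && nr > 1)
                  printf("clip last [0, %d]\n", to);
  }

  /* New logic: any partial page is clipped, including the
   * from == 0, single-page case. */
  static void new_clip(int nr, int from, int to)
  {
          if (from > 0 || to != PAGE_SIZE) {
                  if (nr == 1)
                          printf("clip only page [%d, %d]\n", from, to);
                  else if (from > 0)
                          printf("clip first [%d, %d]\n", from, PAGE_SIZE);
                  else
                          printf("clip last [0, %d]\n", to);
          }
  }

  int main(void)
  {
          old_clip(1, 0, 1024);   /* prints nothing: the bug */
          new_clip(1, 0, 1024);   /* clip only page [0, 1024] */
          return 0;
  }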

Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-on: http://review.whamcloud.com/8531
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4201
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: wangdi <di.wang@intel.com>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
---
 drivers/staging/lustre/lustre/llite/vvp_io.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/vvp_io.c b/drivers/staging/lustre/lustre/llite/vvp_io.c
index fb6f932..e44ef21 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_io.c
+++ b/drivers/staging/lustre/lustre/llite/vvp_io.c
@@ -595,15 +595,17 @@ static int vvp_io_commit_sync(const struct lu_env *env, struct cl_io *io,
 	if (plist->pl_nr == 0)
 		return 0;
 
-	if (from != 0) {
+	if (from > 0 || to != PAGE_SIZE) {
 		page = cl_page_list_first(plist);
-		cl_page_clip(env, page, from,
-			     plist->pl_nr == 1 ? to : PAGE_SIZE);
-	}
-	if (to != PAGE_SIZE && plist->pl_nr > 1) {
+		if (plist->pl_nr == 1) {
+			cl_page_clip(env, page, from, to);
+		} else if (from > 0) {
+			cl_page_clip(env, page, from, PAGE_SIZE);
+		} else {
 		page = cl_page_list_last(plist);
 		cl_page_clip(env, page, 0, to);
 	}
+	}
 
 	cl_2queue_init(queue);
 	cl_page_list_splice(plist, &queue->c2_qin);
-- 
2.1.0


* [PATCH v2 22/46] staging/lustre/llite: deadlock for page write
  2016-03-30 23:48 [PATCH v2 00/46] Lustre IO stack simplifications and cleanups green
                   ` (20 preceding siblings ...)
  2016-03-30 23:48 ` [PATCH v2 21/46] staging/lustre/llite: clip page correctly for vvp_io_commit_sync green
@ 2016-03-30 23:48 ` green
  2016-03-30 23:48 ` [PATCH v2 23/46] staging/lustre/llite: make sure we do cl_page_clip on the last page green
                   ` (23 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: green @ 2016-03-30 23:48 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger
  Cc: Linux Kernel Mailing List, Lustre Development List,
	Jinshan Xiong, Oleg Drokin

From: Jinshan Xiong <jinshan.xiong@intel.com>

The writing thread has already locked page #1 and is waiting for
the Writeback bit of page #2 to clear;

the ptlrpc thread is composing a write RPC, so it has set Writeback
on page #2 and is trying to lock page #1 to make it ready.

Deadlocked.
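
The fix makes ll_write_begin() treat a page under Writeback the same
way as a busy page: give up the non-blocking page grab, commit the
pages already queued, and retry. A hedged sketch of that
try-lock-and-back-off pattern (the helper name and the need_commit
flag are illustrative, not from the patch; the real code also keeps
a dirty page when the commit queue is still empty):

  #include <linux/mm.h>
  #include <linux/pagemap.h>

  /* Sketch only: never block on page B while holding the lock on
   * page A; on any conflict, release everything so the caller can
   * commit its queued pages and retry. */
  static struct page *try_get_write_page(struct address_space *mapping,
                                         pgoff_t index, bool *need_commit)
  {
          struct page *vmpage = grab_cache_page_nowait(mapping, index);

          *need_commit = false;
          if (!vmpage || PageDirty(vmpage) || PageWriteback(vmpage)) {
                  if (vmpage) {
                          unlock_page(vmpage);
                          page_cache_release(vmpage);
                  }
                  *need_commit = true;    /* flush the queue, then retry */
                  return NULL;
          }
          return vmpage;  /* locked, neither dirty nor in writeback */
  }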

Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-on: http://review.whamcloud.com/9036
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4540
Reviewed-by: wangdi <di.wang@intel.com>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
---
 drivers/staging/lustre/lustre/llite/rw26.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/rw26.c b/drivers/staging/lustre/lustre/llite/rw26.c
index 50d8289..f87238b 100644
--- a/drivers/staging/lustre/lustre/llite/rw26.c
+++ b/drivers/staging/lustre/lustre/llite/rw26.c
@@ -512,7 +512,7 @@ static int ll_write_begin(struct file *file, struct address_space *mapping,
 
 	/* To avoid deadlock, try to lock page first. */
 	vmpage = grab_cache_page_nowait(mapping, index);
-	if (unlikely(!vmpage || PageDirty(vmpage))) {
+	if (unlikely(!vmpage || PageDirty(vmpage) || PageWriteback(vmpage))) {
 		struct ccc_io *cio = ccc_env_io(env);
 		struct cl_page_list *plist = &cio->u.write.cui_queue;
 
@@ -522,7 +522,7 @@ static int ll_write_begin(struct file *file, struct address_space *mapping,
 		 * more grants. It's okay for the dirty page to be the first
 		 * one in commit page list, though.
 		 */
-		if (vmpage && PageDirty(vmpage) && plist->pl_nr > 0) {
+		if (vmpage && plist->pl_nr > 0) {
 			unlock_page(vmpage);
 			page_cache_release(vmpage);
 			vmpage = NULL;
-- 
2.1.0


* [PATCH v2 23/46] staging/lustre/llite: make sure we do cl_page_clip on the last page
  2016-03-30 23:48 [PATCH v2 00/46] Lustre IO stack simplifications and cleanups green
                   ` (21 preceding siblings ...)
  2016-03-30 23:48 ` [PATCH v2 22/46] staging/lustre/llite: deadlock for page write green
@ 2016-03-30 23:48 ` green
  2016-03-30 23:48 ` [PATCH v2 24/46] staging/lustre/llite: merge lclient.h into llite/vvp_internal.h green
                   ` (22 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: green @ 2016-03-30 23:48 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger
  Cc: Linux Kernel Mailing List, Lustre Development List, Li Dongyang,
	Oleg Drokin

From: Li Dongyang <dongyang.li@anu.edu.au>

When we are doing partial IO on both the first and the last page,
the current logic only calls cl_page_clip() on the first page, which
ends up producing an incorrect i_size.
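
As a worked example, a small user-space sketch (assumed values; not
the kernel code) of the corrected multi-page branch, where both ends
are clipped when the IO is partial on both:

  #include <stdio.h>

  #define PAGE_SIZE 4096

  static void clip_both_ends(int nr, int from, int to)
  {
          if (from == 0 && to == PAGE_SIZE)
                  return;                 /* nothing is partial */
          if (nr == 1) {
                  printf("clip only page [%d, %d]\n", from, to);
                  return;
          }
          if (from > 0)
                  printf("clip first page [%d, %d]\n", from, PAGE_SIZE);
          if (to != PAGE_SIZE)            /* no longer an else-branch */
                  printf("clip last page [0, %d]\n", to);
  }

  int main(void)
  {
          /* A 3-page IO partial at both ends: previously only the
           * first page was clipped; now the last page is clipped too,
           * so i_size is computed from the right byte count. */
          clip_both_ends(3, 512, 1024);
          return 0;
  }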

Signed-off-by: Li Dongyang <dongyang.li@anu.edu.au>
Reviewed-on: http://review.whamcloud.com/11630
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5552
Reviewed-by: Ian Costello <costello.ian@gmail.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Li Xi <pkuelelixi@gmail.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
---
 drivers/staging/lustre/lustre/llite/vvp_io.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/vvp_io.c b/drivers/staging/lustre/lustre/llite/vvp_io.c
index e44ef21..1d2fc3b 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_io.c
+++ b/drivers/staging/lustre/lustre/llite/vvp_io.c
@@ -599,12 +599,14 @@ static int vvp_io_commit_sync(const struct lu_env *env, struct cl_io *io,
 		page = cl_page_list_first(plist);
 		if (plist->pl_nr == 1) {
 			cl_page_clip(env, page, from, to);
-		} else if (from > 0) {
-			cl_page_clip(env, page, from, PAGE_SIZE);
 		} else {
-		page = cl_page_list_last(plist);
-		cl_page_clip(env, page, 0, to);
-	}
+			if (from > 0)
+				cl_page_clip(env, page, from, PAGE_SIZE);
+			if (to != PAGE_SIZE) {
+				page = cl_page_list_last(plist);
+				cl_page_clip(env, page, 0, to);
+			}
+		}
 	}
 
 	cl_2queue_init(queue);
-- 
2.1.0


* [PATCH v2 24/46] staging/lustre/llite: merge lclient.h into llite/vvp_internal.h
  2016-03-30 23:48 [PATCH v2 00/46] Lustre IO stack simplifications and cleanups green
                   ` (22 preceding siblings ...)
  2016-03-30 23:48 ` [PATCH v2 23/46] staging/lustre/llite: make sure we do cl_page_clip on the last page green
@ 2016-03-30 23:48 ` green
  2016-03-30 23:48 ` [PATCH v2 25/46] staging/lustre/llite: rename ccc_device to vvp_device green
                   ` (21 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: green @ 2016-03-30 23:48 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger
  Cc: Linux Kernel Mailing List, Lustre Development List,
	John L. Hammond, Oleg Drokin

From: "John L. Hammond" <john.hammond@intel.com>

Move the definition of struct cl_client_cache to
lustre/include/cl_object.h and move the rest of
lustre/include/lclient.h into lustre/llite/vvp_internal.h.
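
For context on the moved declarations, a hedged sketch of how the
per-environment scratch helpers in this header are typically used
(example_attr_get() is illustrative and not part of this patch):
ccc_env_thread_attr() and friends return zeroed per-thread scratch
structures keyed off the lu_env, so callers avoid stack or heap
allocation:

  /* Assumes the declarations moved into vvp_internal.h and the
   * cl_object attribute API from cl_object.h. */
  static int example_attr_get(const struct lu_env *env,
                              struct cl_object *obj, loff_t *size)
  {
          struct cl_attr *attr = ccc_env_thread_attr(env); /* zeroed */
          int rc;

          cl_object_attr_lock(obj);
          rc = cl_object_attr_get(env, obj, attr);
          cl_object_attr_unlock(obj);
          if (rc == 0)
                  *size = attr->cat_size;

          return rc;
  }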

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Reviewed-on: http://review.whamcloud.com/12592
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5971
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
---
 drivers/staging/lustre/lustre/include/cl_object.h  |  36 ++
 drivers/staging/lustre/lustre/include/lclient.h    | 409 ---------------------
 drivers/staging/lustre/lustre/llite/glimpse.c      |   1 -
 drivers/staging/lustre/lustre/llite/lcommon_cl.c   |   2 -
 drivers/staging/lustre/lustre/llite/lcommon_misc.c |   2 +-
 .../staging/lustre/lustre/llite/llite_internal.h   |   2 +-
 drivers/staging/lustre/lustre/llite/vvp_internal.h | 368 +++++++++++++++++-
 drivers/staging/lustre/lustre/llite/vvp_io.c       |   1 +
 drivers/staging/lustre/lustre/llite/vvp_object.c   |   1 +
 drivers/staging/lustre/lustre/llite/vvp_page.c     |   1 +
 drivers/staging/lustre/lustre/lov/lov_obd.c        |   1 -
 .../staging/lustre/lustre/osc/osc_cl_internal.h    |   1 -
 12 files changed, 408 insertions(+), 417 deletions(-)
 delete mode 100644 drivers/staging/lustre/lustre/include/lclient.h

diff --git a/drivers/staging/lustre/lustre/include/cl_object.h b/drivers/staging/lustre/lustre/include/cl_object.h
index e613007..f2bb8f8 100644
--- a/drivers/staging/lustre/lustre/include/cl_object.h
+++ b/drivers/staging/lustre/lustre/include/cl_object.h
@@ -97,9 +97,12 @@
  * super-class definitions.
  */
 #include "lu_object.h"
+#include <linux/atomic.h>
 #include "linux/lustre_compat25.h"
 #include <linux/mutex.h>
 #include <linux/radix-tree.h>
+#include <linux/spinlock.h>
+#include <linux/wait.h>
 
 struct inode;
 
@@ -2317,6 +2320,39 @@ void cl_lock_descr_print(const struct lu_env *env, void *cookie,
 			 const struct cl_lock_descr *descr);
 /* @} helper */
 
+/**
+ * Data structure managing a client's cached pages: an LRU of clean
+ * pages and a count of "unstable" pages are maintained.
+ * "Unstable" pages are pages pinned by the ptlrpc layer for
+ * recovery purposes.
+ */
+struct cl_client_cache {
+	/**
+	 * # of users (OSCs)
+	 */
+	atomic_t		ccc_users;
+	/**
+	 * # of threads doing shrinking
+	 */
+	unsigned int		ccc_lru_shrinkers;
+	/**
+	 * # of LRU entries available
+	 */
+	atomic_t		ccc_lru_left;
+	/**
+	 * List of entities (OSCs) for this LRU cache
+	 */
+	struct list_head	ccc_lru;
+	/**
+	 * Max # of LRU entries
+	 */
+	unsigned long		ccc_lru_max;
+	/**
+	 * Lock to protect ccc_lru list
+	 */
+	spinlock_t		ccc_lru_lock;
+};
+
 /** @} cl_page */
 
 /** \defgroup cl_lock cl_lock
diff --git a/drivers/staging/lustre/lustre/include/lclient.h b/drivers/staging/lustre/lustre/include/lclient.h
deleted file mode 100644
index 82af8ae..0000000
--- a/drivers/staging/lustre/lustre/include/lclient.h
+++ /dev/null
@@ -1,409 +0,0 @@
-/*
- * GPL HEADER START
- *
- * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 only,
- * as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
- * General Public License version 2 for more details (a copy is included
- * in the LICENSE file that accompanied this code).
- *
- * You should have received a copy of the GNU General Public License
- * version 2 along with this program; If not, see
- * http://www.sun.com/software/products/lustre/docs/GPLv2.pdf
- *
- * Please contact Sun Microsystems, Inc., 4150 Network Circle, Santa Clara,
- * CA 95054 USA or visit www.sun.com if you need additional information or
- * have any questions.
- *
- * GPL HEADER END
- */
-/*
- * Copyright (c) 2008, 2010, Oracle and/or its affiliates. All rights reserved.
- * Use is subject to license terms.
- *
- * Copyright (c) 2011, 2012, Intel Corporation.
- */
-/*
- * This file is part of Lustre, http://www.lustre.org/
- * Lustre is a trademark of Sun Microsystems, Inc.
- *
- * Definitions shared between vvp and liblustre, and other clients in the
- * future.
- *
- *   Author: Oleg Drokin <oleg.drokin@sun.com>
- *   Author: Nikita Danilov <nikita.danilov@sun.com>
- */
-
-#ifndef LCLIENT_H
-#define LCLIENT_H
-
-blkcnt_t dirty_cnt(struct inode *inode);
-
-int cl_glimpse_size0(struct inode *inode, int agl);
-int cl_glimpse_lock(const struct lu_env *env, struct cl_io *io,
-		    struct inode *inode, struct cl_object *clob, int agl);
-
-static inline int cl_glimpse_size(struct inode *inode)
-{
-	return cl_glimpse_size0(inode, 0);
-}
-
-static inline int cl_agl(struct inode *inode)
-{
-	return cl_glimpse_size0(inode, 1);
-}
-
-/**
- * Locking policy for setattr.
- */
-enum ccc_setattr_lock_type {
-	/** Locking is done by server */
-	SETATTR_NOLOCK,
-	/** Extent lock is enqueued */
-	SETATTR_EXTENT_LOCK,
-	/** Existing local extent lock is used */
-	SETATTR_MATCH_LOCK
-};
-
-/**
- * IO state private to vvp or slp layers.
- */
-struct ccc_io {
-	/** super class */
-	struct cl_io_slice     cui_cl;
-	struct cl_io_lock_link cui_link;
-	/**
-	 * I/O vector information to or from which read/write is going.
-	 */
-	struct iov_iter *cui_iter;
-	/**
-	 * Total size for the left IO.
-	 */
-	size_t cui_tot_count;
-
-	union {
-		struct {
-			enum ccc_setattr_lock_type cui_local_lock;
-		} setattr;
-		struct {
-			struct cl_page_list cui_queue;
-			unsigned long cui_written;
-			int cui_from;
-			int cui_to;
-		} write;
-	} u;
-	/**
-	 * Layout version when this IO is initialized
-	 */
-	__u32		cui_layout_gen;
-	/**
-	 * File descriptor against which IO is done.
-	 */
-	struct ll_file_data *cui_fd;
-	struct kiocb *cui_iocb;
-};
-
-/**
- * True, if \a io is a normal io, False for splice_{read,write}.
- * must be implemented in arch specific code.
- */
-int cl_is_normalio(const struct lu_env *env, const struct cl_io *io);
-
-extern struct lu_context_key ccc_key;
-extern struct lu_context_key ccc_session_key;
-
-struct ccc_thread_info {
-	struct cl_lock		cti_lock;
-	struct cl_lock_descr cti_descr;
-	struct cl_io	 cti_io;
-	struct cl_attr       cti_attr;
-};
-
-static inline struct ccc_thread_info *ccc_env_info(const struct lu_env *env)
-{
-	struct ccc_thread_info      *info;
-
-	info = lu_context_key_get(&env->le_ctx, &ccc_key);
-	LASSERT(info);
-	return info;
-}
-
-static inline struct cl_lock *ccc_env_lock(const struct lu_env *env)
-{
-	struct cl_lock *lock = &ccc_env_info(env)->cti_lock;
-
-	memset(lock, 0, sizeof(*lock));
-	return lock;
-}
-
-static inline struct cl_attr *ccc_env_thread_attr(const struct lu_env *env)
-{
-	struct cl_attr *attr = &ccc_env_info(env)->cti_attr;
-
-	memset(attr, 0, sizeof(*attr));
-	return attr;
-}
-
-static inline struct cl_io *ccc_env_thread_io(const struct lu_env *env)
-{
-	struct cl_io *io = &ccc_env_info(env)->cti_io;
-
-	memset(io, 0, sizeof(*io));
-	return io;
-}
-
-struct ccc_session {
-	struct ccc_io cs_ios;
-};
-
-static inline struct ccc_session *ccc_env_session(const struct lu_env *env)
-{
-	struct ccc_session *ses;
-
-	ses = lu_context_key_get(env->le_ses, &ccc_session_key);
-	LASSERT(ses);
-	return ses;
-}
-
-static inline struct ccc_io *ccc_env_io(const struct lu_env *env)
-{
-	return &ccc_env_session(env)->cs_ios;
-}
-
-/**
- * ccc-private object state.
- */
-struct ccc_object {
-	struct cl_object_header cob_header;
-	struct cl_object	cob_cl;
-	struct inode	   *cob_inode;
-
-	/**
-	 * A list of dirty pages pending IO in the cache. Used by
-	 * SOM. Protected by ll_inode_info::lli_lock.
-	 *
-	 * \see ccc_page::cpg_pending_linkage
-	 */
-	struct list_head	     cob_pending_list;
-
-	/**
-	 * Access this counter is protected by inode->i_sem. Now that
-	 * the lifetime of transient pages must be covered by inode sem,
-	 * we don't need to hold any lock..
-	 */
-	int		     cob_transient_pages;
-	/**
-	 * Number of outstanding mmaps on this file.
-	 *
-	 * \see ll_vm_open(), ll_vm_close().
-	 */
-	atomic_t	    cob_mmap_cnt;
-
-	/**
-	 * various flags
-	 * cob_discard_page_warned
-	 *     if pages belonging to this object are discarded when a client
-	 * is evicted, some debug info will be printed, this flag will be set
-	 * during processing the first discarded page, then avoid flooding
-	 * debug message for lots of discarded pages.
-	 *
-	 * \see ll_dirty_page_discard_warn.
-	 */
-	unsigned int		cob_discard_page_warned:1;
-};
-
-/**
- * ccc-private page state.
- */
-struct ccc_page {
-	struct cl_page_slice cpg_cl;
-	int		  cpg_defer_uptodate;
-	int		  cpg_ra_used;
-	int		  cpg_write_queued;
-	/**
-	 * Non-empty iff this page is already counted in
-	 * ccc_object::cob_pending_list. Protected by
-	 * ccc_object::cob_pending_guard. This list is only used as a flag,
-	 * that is, never iterated through, only checked for list_empty(), but
-	 * having a list is useful for debugging.
-	 */
-	struct list_head	   cpg_pending_linkage;
-	/** VM page */
-	struct page	  *cpg_page;
-};
-
-static inline struct ccc_page *cl2ccc_page(const struct cl_page_slice *slice)
-{
-	return container_of(slice, struct ccc_page, cpg_cl);
-}
-
-static inline pgoff_t ccc_index(struct ccc_page *ccc)
-{
-	return ccc->cpg_cl.cpl_index;
-}
-
-struct ccc_device {
-	struct cl_device    cdv_cl;
-	struct super_block *cdv_sb;
-	struct cl_device   *cdv_next;
-};
-
-struct ccc_lock {
-	struct cl_lock_slice clk_cl;
-};
-
-struct ccc_req {
-	struct cl_req_slice  crq_cl;
-};
-
-void *ccc_key_init	(const struct lu_context *ctx,
-			   struct lu_context_key *key);
-void  ccc_key_fini	(const struct lu_context *ctx,
-			   struct lu_context_key *key, void *data);
-void *ccc_session_key_init(const struct lu_context *ctx,
-			   struct lu_context_key *key);
-void  ccc_session_key_fini(const struct lu_context *ctx,
-			   struct lu_context_key *key, void *data);
-
-int	      ccc_device_init  (const struct lu_env *env,
-				   struct lu_device *d,
-				   const char *name, struct lu_device *next);
-struct lu_device *ccc_device_fini (const struct lu_env *env,
-				   struct lu_device *d);
-struct lu_device *ccc_device_alloc(const struct lu_env *env,
-				   struct lu_device_type *t,
-				   struct lustre_cfg *cfg,
-				   const struct lu_device_operations *luops,
-				   const struct cl_device_operations *clops);
-struct lu_device *ccc_device_free (const struct lu_env *env,
-				   struct lu_device *d);
-struct lu_object *ccc_object_alloc(const struct lu_env *env,
-				   const struct lu_object_header *hdr,
-				   struct lu_device *dev,
-				   const struct cl_object_operations *clops,
-				   const struct lu_object_operations *luops);
-
-int ccc_req_init(const struct lu_env *env, struct cl_device *dev,
-		 struct cl_req *req);
-void ccc_umount(const struct lu_env *env, struct cl_device *dev);
-int ccc_global_init(struct lu_device_type *device_type);
-void ccc_global_fini(struct lu_device_type *device_type);
-int ccc_object_init0(const struct lu_env *env, struct ccc_object *vob,
-		     const struct cl_object_conf *conf);
-int ccc_object_init(const struct lu_env *env, struct lu_object *obj,
-		    const struct lu_object_conf *conf);
-void ccc_object_free(const struct lu_env *env, struct lu_object *obj);
-int ccc_lock_init(const struct lu_env *env, struct cl_object *obj,
-		  struct cl_lock *lock, const struct cl_io *io,
-		  const struct cl_lock_operations *lkops);
-int ccc_object_glimpse(const struct lu_env *env,
-		       const struct cl_object *obj, struct ost_lvb *lvb);
-int ccc_fail(const struct lu_env *env, const struct cl_page_slice *slice);
-int ccc_transient_page_prep(const struct lu_env *env,
-			    const struct cl_page_slice *slice,
-			    struct cl_io *io);
-void ccc_lock_delete(const struct lu_env *env,
-		     const struct cl_lock_slice *slice);
-void ccc_lock_fini(const struct lu_env *env, struct cl_lock_slice *slice);
-int ccc_lock_enqueue(const struct lu_env *env,
-		     const struct cl_lock_slice *slice,
-		     struct cl_io *io, struct cl_sync_io *anchor);
-int ccc_io_one_lock_index(const struct lu_env *env, struct cl_io *io,
-			  __u32 enqflags, enum cl_lock_mode mode,
-			  pgoff_t start, pgoff_t end);
-int ccc_io_one_lock(const struct lu_env *env, struct cl_io *io,
-		    __u32 enqflags, enum cl_lock_mode mode,
-		    loff_t start, loff_t end);
-void ccc_io_end(const struct lu_env *env, const struct cl_io_slice *ios);
-void ccc_io_advance(const struct lu_env *env, const struct cl_io_slice *ios,
-		    size_t nob);
-void ccc_io_update_iov(const struct lu_env *env, struct ccc_io *cio,
-		       struct cl_io *io);
-int ccc_prep_size(const struct lu_env *env, struct cl_object *obj,
-		  struct cl_io *io, loff_t start, size_t count, int *exceed);
-void ccc_req_completion(const struct lu_env *env,
-			const struct cl_req_slice *slice, int ioret);
-void ccc_req_attr_set(const struct lu_env *env,
-		      const struct cl_req_slice *slice,
-		      const struct cl_object *obj,
-		      struct cl_req_attr *oa, u64 flags);
-
-struct lu_device   *ccc2lu_dev      (struct ccc_device *vdv);
-struct lu_object   *ccc2lu	  (struct ccc_object *vob);
-struct ccc_device  *lu2ccc_dev      (const struct lu_device *d);
-struct ccc_device  *cl2ccc_dev      (const struct cl_device *d);
-struct ccc_object  *lu2ccc	  (const struct lu_object *obj);
-struct ccc_object  *cl2ccc	  (const struct cl_object *obj);
-struct ccc_lock    *cl2ccc_lock     (const struct cl_lock_slice *slice);
-struct ccc_io      *cl2ccc_io       (const struct lu_env *env,
-				     const struct cl_io_slice *slice);
-struct ccc_req     *cl2ccc_req      (const struct cl_req_slice *slice);
-struct page	 *cl2vm_page      (const struct cl_page_slice *slice);
-struct inode       *ccc_object_inode(const struct cl_object *obj);
-struct ccc_object  *cl_inode2ccc    (struct inode *inode);
-
-int cl_setattr_ost(struct inode *inode, const struct iattr *attr);
-
-int ccc_object_invariant(const struct cl_object *obj);
-int cl_file_inode_init(struct inode *inode, struct lustre_md *md);
-void cl_inode_fini(struct inode *inode);
-int cl_local_size(struct inode *inode);
-
-__u16 ll_dirent_type_get(struct lu_dirent *ent);
-__u64 cl_fid_build_ino(const struct lu_fid *fid, int api32);
-__u32 cl_fid_build_gen(const struct lu_fid *fid);
-
-# define CLOBINVRNT(env, clob, expr)					\
-	((void)sizeof(env), (void)sizeof(clob), (void)sizeof(!!(expr)))
-
-int cl_init_ea_size(struct obd_export *md_exp, struct obd_export *dt_exp);
-int cl_ocd_update(struct obd_device *host,
-		  struct obd_device *watched,
-		  enum obd_notify_event ev, void *owner, void *data);
-
-struct ccc_grouplock {
-	struct lu_env   *cg_env;
-	struct cl_io    *cg_io;
-	struct cl_lock  *cg_lock;
-	unsigned long    cg_gid;
-};
-
-int  cl_get_grouplock(struct cl_object *obj, unsigned long gid, int nonblock,
-		      struct ccc_grouplock *cg);
-void cl_put_grouplock(struct ccc_grouplock *cg);
-
-/**
- * New interfaces to get and put lov_stripe_md from lov layer. This violates
- * layering because lov_stripe_md is supposed to be a private data in lov.
- *
- * NB: If you find you have to use these interfaces for your new code, please
- * think about it again. These interfaces may be removed in the future for
- * better layering.
- */
-struct lov_stripe_md *lov_lsm_get(struct cl_object *clobj);
-void lov_lsm_put(struct cl_object *clobj, struct lov_stripe_md *lsm);
-int lov_read_and_clear_async_rc(struct cl_object *clob);
-
-struct lov_stripe_md *ccc_inode_lsm_get(struct inode *inode);
-void ccc_inode_lsm_put(struct inode *inode, struct lov_stripe_md *lsm);
-
-/**
- * Data structure managing a client's cached clean pages. An LRU of
- * pages is maintained, along with other statistics.
- */
-struct cl_client_cache {
-	atomic_t	ccc_users;    /* # of users (OSCs) of this data */
-	struct list_head	ccc_lru;      /* LRU list of cached clean pages */
-	spinlock_t	ccc_lru_lock; /* lock for list */
-	atomic_t	ccc_lru_left; /* # of LRU entries available */
-	unsigned long	ccc_lru_max;  /* Max # of LRU entries possible */
-	unsigned int	ccc_lru_shrinkers; /* # of threads reclaiming */
-};
-
-#endif /*LCLIENT_H */
diff --git a/drivers/staging/lustre/lustre/llite/glimpse.c b/drivers/staging/lustre/lustre/llite/glimpse.c
index 0759dfc..a634633 100644
--- a/drivers/staging/lustre/lustre/llite/glimpse.c
+++ b/drivers/staging/lustre/lustre/llite/glimpse.c
@@ -52,7 +52,6 @@
 #include <linux/file.h>
 
 #include "../include/cl_object.h"
-#include "../include/lclient.h"
 #include "../llite/llite_internal.h"
 
 static const struct cl_lock_descr whole_file = {
diff --git a/drivers/staging/lustre/lustre/llite/lcommon_cl.c b/drivers/staging/lustre/lustre/llite/lcommon_cl.c
index b339d1b..79cdf83 100644
--- a/drivers/staging/lustre/lustre/llite/lcommon_cl.c
+++ b/drivers/staging/lustre/lustre/llite/lcommon_cl.c
@@ -59,8 +59,6 @@
 #include "../include/lustre_mdc.h"
 #include "../include/cl_object.h"
 
-#include "../include/lclient.h"
-
 #include "../llite/llite_internal.h"
 
 static const struct cl_req_operations ccc_req_ops;
diff --git a/drivers/staging/lustre/lustre/llite/lcommon_misc.c b/drivers/staging/lustre/lustre/llite/lcommon_misc.c
index 4c12e21..68e3db1 100644
--- a/drivers/staging/lustre/lustre/llite/lcommon_misc.c
+++ b/drivers/staging/lustre/lustre/llite/lcommon_misc.c
@@ -41,8 +41,8 @@
 #include "../include/obd_support.h"
 #include "../include/obd.h"
 #include "../include/cl_object.h"
-#include "../include/lclient.h"
 
+#include "vvp_internal.h"
 #include "../include/lustre_lite.h"
 
 /* Initialize the default and maximum LOV EA and cookie sizes.  This allows
diff --git a/drivers/staging/lustre/lustre/llite/llite_internal.h b/drivers/staging/lustre/lustre/llite/llite_internal.h
index ffed507..5686b94 100644
--- a/drivers/staging/lustre/lustre/llite/llite_internal.h
+++ b/drivers/staging/lustre/lustre/llite/llite_internal.h
@@ -43,11 +43,11 @@
 
 /* for struct cl_lock_descr and struct cl_io */
 #include "../include/cl_object.h"
-#include "../include/lclient.h"
 #include "../include/lustre_mdc.h"
 #include "../include/lustre_intent.h"
 #include <linux/compat.h>
 #include <linux/posix_acl_xattr.h>
+#include "vvp_internal.h"
 
 #ifndef FMODE_EXEC
 #define FMODE_EXEC 0
diff --git a/drivers/staging/lustre/lustre/llite/vvp_internal.h b/drivers/staging/lustre/lustre/llite/vvp_internal.h
index aa06f40..d4fe611 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_internal.h
+++ b/drivers/staging/lustre/lustre/llite/vvp_internal.h
@@ -41,8 +41,374 @@
 #ifndef VVP_INTERNAL_H
 #define VVP_INTERNAL_H
 
+#include "../include/lustre/lustre_idl.h"
 #include "../include/cl_object.h"
-#include "llite_internal.h"
+
+enum obd_notify_event;
+struct inode;
+struct lov_stripe_md;
+struct lustre_md;
+struct obd_capa;
+struct obd_device;
+struct obd_export;
+struct page;
+
+blkcnt_t dirty_cnt(struct inode *inode);
+
+int cl_glimpse_size0(struct inode *inode, int agl);
+int cl_glimpse_lock(const struct lu_env *env, struct cl_io *io,
+		    struct inode *inode, struct cl_object *clob, int agl);
+
+static inline int cl_glimpse_size(struct inode *inode)
+{
+	return cl_glimpse_size0(inode, 0);
+}
+
+static inline int cl_agl(struct inode *inode)
+{
+	return cl_glimpse_size0(inode, 1);
+}
+
+/**
+ * Locking policy for setattr.
+ */
+enum ccc_setattr_lock_type {
+	/** Locking is done by server */
+	SETATTR_NOLOCK,
+	/** Extent lock is enqueued */
+	SETATTR_EXTENT_LOCK,
+	/** Existing local extent lock is used */
+	SETATTR_MATCH_LOCK
+};
+
+/**
+ * IO state private to vvp or slp layers.
+ */
+struct ccc_io {
+	/** super class */
+	struct cl_io_slice     cui_cl;
+	struct cl_io_lock_link cui_link;
+	/**
+	 * I/O vector information to or from which read/write is going.
+	 */
+	struct iov_iter *cui_iter;
+	/**
+	 * Total size for the left IO.
+	 */
+	size_t cui_tot_count;
+
+	union {
+		struct {
+			enum ccc_setattr_lock_type cui_local_lock;
+		} setattr;
+		struct {
+			struct cl_page_list cui_queue;
+			unsigned long cui_written;
+			int cui_from;
+			int cui_to;
+		} write;
+	} u;
+	/**
+	 * Layout version when this IO is initialized
+	 */
+	__u32		cui_layout_gen;
+	/**
+	 * File descriptor against which IO is done.
+	 */
+	struct ll_file_data *cui_fd;
+	struct kiocb *cui_iocb;
+};
+
+/**
+ * True, if \a io is a normal io, False for splice_{read,write}.
+ * must be implemented in arch specific code.
+ */
+int cl_is_normalio(const struct lu_env *env, const struct cl_io *io);
+
+extern struct lu_context_key ccc_key;
+extern struct lu_context_key ccc_session_key;
+
+struct ccc_thread_info {
+	struct cl_lock		cti_lock;
+	struct cl_lock_descr	cti_descr;
+	struct cl_io		cti_io;
+	struct cl_attr		cti_attr;
+};
+
+static inline struct ccc_thread_info *ccc_env_info(const struct lu_env *env)
+{
+	struct ccc_thread_info      *info;
+
+	info = lu_context_key_get(&env->le_ctx, &ccc_key);
+	LASSERT(info);
+
+	return info;
+}
+
+static inline struct cl_lock *ccc_env_lock(const struct lu_env *env)
+{
+	struct cl_lock *lock = &ccc_env_info(env)->cti_lock;
+
+	memset(lock, 0, sizeof(*lock));
+	return lock;
+}
+
+static inline struct cl_attr *ccc_env_thread_attr(const struct lu_env *env)
+{
+	struct cl_attr *attr = &ccc_env_info(env)->cti_attr;
+
+	memset(attr, 0, sizeof(*attr));
+
+	return attr;
+}
+
+static inline struct cl_io *ccc_env_thread_io(const struct lu_env *env)
+{
+	struct cl_io *io = &ccc_env_info(env)->cti_io;
+
+	memset(io, 0, sizeof(*io));
+
+	return io;
+}
+
+struct ccc_session {
+	struct ccc_io cs_ios;
+};
+
+static inline struct ccc_session *ccc_env_session(const struct lu_env *env)
+{
+	struct ccc_session *ses;
+
+	ses = lu_context_key_get(env->le_ses, &ccc_session_key);
+	LASSERT(ses);
+
+	return ses;
+}
+
+static inline struct ccc_io *ccc_env_io(const struct lu_env *env)
+{
+	return &ccc_env_session(env)->cs_ios;
+}
+
+/**
+ * ccc-private object state.
+ */
+struct ccc_object {
+	struct cl_object_header cob_header;
+	struct cl_object        cob_cl;
+	struct inode           *cob_inode;
+
+	/**
+	 * A list of dirty pages pending IO in the cache. Used by
+	 * SOM. Protected by ll_inode_info::lli_lock.
+	 *
+	 * \see ccc_page::cpg_pending_linkage
+	 */
+	struct list_head	cob_pending_list;
+
+	/**
+	 * Access to this counter is protected by inode->i_sem. Now that
+	 * the lifetime of transient pages must be covered by inode sem,
+	 * we don't need to hold any lock.
+	 */
+	int			cob_transient_pages;
+	/**
+	 * Number of outstanding mmaps on this file.
+	 *
+	 * \see ll_vm_open(), ll_vm_close().
+	 */
+	atomic_t                cob_mmap_cnt;
+
+	/**
+	 * Various flags.
+	 * cob_discard_page_warned:
+	 *     if pages belonging to this object are discarded when a client
+	 * is evicted, some debug info is printed. The flag is set while
+	 * processing the first discarded page, to avoid flooding the debug
+	 * log with a message for every subsequently discarded page.
+	 *
+	 * \see ll_dirty_page_discard_warn.
+	 */
+	unsigned int		cob_discard_page_warned:1;
+};
+
+/**
+ * ccc-private page state.
+ */
+struct ccc_page {
+	struct cl_page_slice cpg_cl;
+	int		  cpg_defer_uptodate;
+	int		  cpg_ra_used;
+	int		  cpg_write_queued;
+	/**
+	 * Non-empty iff this page is already counted in
+	 * ccc_object::cob_pending_list. Protected by
+	 * ccc_object::cob_pending_guard. This list is only used as a flag,
+	 * that is, never iterated through, only checked for list_empty(), but
+	 * having a list is useful for debugging.
+	 */
+	struct list_head	   cpg_pending_linkage;
+	/** VM page */
+	struct page	  *cpg_page;
+};
+
+static inline struct ccc_page *cl2ccc_page(const struct cl_page_slice *slice)
+{
+	return container_of(slice, struct ccc_page, cpg_cl);
+}
+
+static inline pgoff_t ccc_index(struct ccc_page *ccc)
+{
+	return ccc->cpg_cl.cpl_index;
+}
+
+struct cl_page *ccc_vmpage_page_transient(struct page *vmpage);
+
+struct ccc_device {
+	struct cl_device    cdv_cl;
+	struct super_block *cdv_sb;
+	struct cl_device   *cdv_next;
+};
+
+struct ccc_lock {
+	struct cl_lock_slice clk_cl;
+};
+
+struct ccc_req {
+	struct cl_req_slice  crq_cl;
+};
+
+void *ccc_key_init(const struct lu_context *ctx,
+		   struct lu_context_key *key);
+void ccc_key_fini(const struct lu_context *ctx,
+		  struct lu_context_key *key, void *data);
+void *ccc_session_key_init(const struct lu_context *ctx,
+			   struct lu_context_key *key);
+void  ccc_session_key_fini(const struct lu_context *ctx,
+			   struct lu_context_key *key, void *data);
+
+int ccc_device_init(const struct lu_env *env,
+		    struct lu_device *d,
+		    const char *name, struct lu_device *next);
+struct lu_device *ccc_device_fini(const struct lu_env *env,
+				  struct lu_device *d);
+struct lu_device *ccc_device_alloc(const struct lu_env *env,
+				   struct lu_device_type *t,
+				   struct lustre_cfg *cfg,
+				   const struct lu_device_operations *luops,
+				   const struct cl_device_operations *clops);
+struct lu_device *ccc_device_free(const struct lu_env *env,
+				  struct lu_device *d);
+struct lu_object *ccc_object_alloc(const struct lu_env *env,
+				   const struct lu_object_header *hdr,
+				   struct lu_device *dev,
+				   const struct cl_object_operations *clops,
+				   const struct lu_object_operations *luops);
+
+int ccc_req_init(const struct lu_env *env, struct cl_device *dev,
+		 struct cl_req *req);
+void ccc_umount(const struct lu_env *env, struct cl_device *dev);
+int ccc_global_init(struct lu_device_type *device_type);
+void ccc_global_fini(struct lu_device_type *device_type);
+int ccc_object_init0(const struct lu_env *env, struct ccc_object *vob,
+		     const struct cl_object_conf *conf);
+int ccc_object_init(const struct lu_env *env, struct lu_object *obj,
+		    const struct lu_object_conf *conf);
+void ccc_object_free(const struct lu_env *env, struct lu_object *obj);
+int ccc_lock_init(const struct lu_env *env, struct cl_object *obj,
+		  struct cl_lock *lock, const struct cl_io *io,
+		  const struct cl_lock_operations *lkops);
+int ccc_object_glimpse(const struct lu_env *env,
+		       const struct cl_object *obj, struct ost_lvb *lvb);
+int ccc_fail(const struct lu_env *env, const struct cl_page_slice *slice);
+int ccc_transient_page_prep(const struct lu_env *env,
+			    const struct cl_page_slice *slice,
+			    struct cl_io *io);
+void ccc_lock_delete(const struct lu_env *env,
+		     const struct cl_lock_slice *slice);
+void ccc_lock_fini(const struct lu_env *env, struct cl_lock_slice *slice);
+int ccc_lock_enqueue(const struct lu_env *env,
+		     const struct cl_lock_slice *slice,
+		     struct cl_io *io, struct cl_sync_io *anchor);
+
+int ccc_io_one_lock_index(const struct lu_env *env, struct cl_io *io,
+			  __u32 enqflags, enum cl_lock_mode mode,
+			  pgoff_t start, pgoff_t end);
+int ccc_io_one_lock(const struct lu_env *env, struct cl_io *io,
+		    __u32 enqflags, enum cl_lock_mode mode,
+		    loff_t start, loff_t end);
+void ccc_io_end(const struct lu_env *env, const struct cl_io_slice *ios);
+void ccc_io_advance(const struct lu_env *env, const struct cl_io_slice *ios,
+		    size_t nob);
+void ccc_io_update_iov(const struct lu_env *env, struct ccc_io *cio,
+		       struct cl_io *io);
+int ccc_prep_size(const struct lu_env *env, struct cl_object *obj,
+		  struct cl_io *io, loff_t start, size_t count, int *exceed);
+void ccc_req_completion(const struct lu_env *env,
+			const struct cl_req_slice *slice, int ioret);
+void ccc_req_attr_set(const struct lu_env *env,
+		      const struct cl_req_slice *slice,
+		      const struct cl_object *obj,
+		      struct cl_req_attr *oa, u64 flags);
+
+struct lu_device *ccc2lu_dev(struct ccc_device *vdv);
+struct lu_object *ccc2lu(struct ccc_object *vob);
+struct ccc_device *lu2ccc_dev(const struct lu_device *d);
+struct ccc_device *cl2ccc_dev(const struct cl_device *d);
+struct ccc_object *lu2ccc(const struct lu_object *obj);
+struct ccc_object *cl2ccc(const struct cl_object *obj);
+struct ccc_lock *cl2ccc_lock(const struct cl_lock_slice *slice);
+struct ccc_io *cl2ccc_io(const struct lu_env *env,
+			 const struct cl_io_slice *slice);
+struct ccc_req *cl2ccc_req(const struct cl_req_slice *slice);
+struct page *cl2vm_page(const struct cl_page_slice *slice);
+struct inode *ccc_object_inode(const struct cl_object *obj);
+struct ccc_object *cl_inode2ccc(struct inode *inode);
+
+int cl_setattr_ost(struct inode *inode, const struct iattr *attr);
+
+int ccc_object_invariant(const struct cl_object *obj);
+int cl_file_inode_init(struct inode *inode, struct lustre_md *md);
+void cl_inode_fini(struct inode *inode);
+int cl_local_size(struct inode *inode);
+
+__u16 ll_dirent_type_get(struct lu_dirent *ent);
+__u64 cl_fid_build_ino(const struct lu_fid *fid, int api32);
+__u32 cl_fid_build_gen(const struct lu_fid *fid);
+
+# define CLOBINVRNT(env, clob, expr)					\
+	((void)sizeof(env), (void)sizeof(clob), (void)sizeof(!!(expr)))
+
+int cl_init_ea_size(struct obd_export *md_exp, struct obd_export *dt_exp);
+int cl_ocd_update(struct obd_device *host,
+		  struct obd_device *watched,
+		  enum obd_notify_event ev, void *owner, void *data);
+
+struct ccc_grouplock {
+	struct lu_env	*cg_env;
+	struct cl_io	*cg_io;
+	struct cl_lock	*cg_lock;
+	unsigned long	 cg_gid;
+};
+
+int cl_get_grouplock(struct cl_object *obj, unsigned long gid, int nonblock,
+		     struct ccc_grouplock *cg);
+void cl_put_grouplock(struct ccc_grouplock *cg);
+
+/**
+ * New interfaces to get and put lov_stripe_md from lov layer. This violates
+ * layering because lov_stripe_md is supposed to be private data in lov.
+ *
+ * NB: If you find you have to use these interfaces for your new code, please
+ * think about it again. These interfaces may be removed in the future for
+ * better layering.
+ */
+struct lov_stripe_md *lov_lsm_get(struct cl_object *clobj);
+void lov_lsm_put(struct cl_object *clobj, struct lov_stripe_md *lsm);
+int lov_read_and_clear_async_rc(struct cl_object *clob);
+
+struct lov_stripe_md *ccc_inode_lsm_get(struct inode *inode);
+void ccc_inode_lsm_put(struct inode *inode, struct lov_stripe_md *lsm);
 
 int vvp_io_init(const struct lu_env *env, struct cl_object *obj,
 		struct cl_io *io);
diff --git a/drivers/staging/lustre/lustre/llite/vvp_io.c b/drivers/staging/lustre/lustre/llite/vvp_io.c
index 1d2fc3b..fcf0cfe 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_io.c
+++ b/drivers/staging/lustre/lustre/llite/vvp_io.c
@@ -44,6 +44,7 @@
 #include "../include/obd.h"
 #include "../include/lustre_lite.h"
 
+#include "llite_internal.h"
 #include "vvp_internal.h"
 
 static struct vvp_io *cl2vvp_io(const struct lu_env *env,
diff --git a/drivers/staging/lustre/lustre/llite/vvp_object.c b/drivers/staging/lustre/lustre/llite/vvp_object.c
index 34210db..45fac69 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_object.c
+++ b/drivers/staging/lustre/lustre/llite/vvp_object.c
@@ -45,6 +45,7 @@
 #include "../include/obd.h"
 #include "../include/lustre_lite.h"
 
+#include "llite_internal.h"
 #include "vvp_internal.h"
 
 /*****************************************************************************
diff --git a/drivers/staging/lustre/lustre/llite/vvp_page.c b/drivers/staging/lustre/lustre/llite/vvp_page.c
index 3c6b723..66a4f9b 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_page.c
+++ b/drivers/staging/lustre/lustre/llite/vvp_page.c
@@ -44,6 +44,7 @@
 #include "../include/obd.h"
 #include "../include/lustre_lite.h"
 
+#include "llite_internal.h"
 #include "vvp_internal.h"
 
 /*****************************************************************************
diff --git a/drivers/staging/lustre/lustre/lov/lov_obd.c b/drivers/staging/lustre/lustre/lov/lov_obd.c
index 5daa7fa..1a9e3e8 100644
--- a/drivers/staging/lustre/lustre/lov/lov_obd.c
+++ b/drivers/staging/lustre/lustre/lov/lov_obd.c
@@ -54,7 +54,6 @@
 #include "../include/lprocfs_status.h"
 #include "../include/lustre_param.h"
 #include "../include/cl_object.h"
-#include "../include/lclient.h"		/* for cl_client_lru */
 #include "../include/lustre/ll_fiemap.h"
 #include "../include/lustre_fid.h"
 
diff --git a/drivers/staging/lustre/lustre/osc/osc_cl_internal.h b/drivers/staging/lustre/lustre/osc/osc_cl_internal.h
index c7a69e4..d6d7661 100644
--- a/drivers/staging/lustre/lustre/osc/osc_cl_internal.h
+++ b/drivers/staging/lustre/lustre/osc/osc_cl_internal.h
@@ -51,7 +51,6 @@
 #include "../include/obd.h"
 /* osc_build_res_name() */
 #include "../include/cl_object.h"
-#include "../include/lclient.h"
 #include "osc_internal.h"
 
 /** \defgroup osc osc
-- 
2.1.0


* [PATCH v2 25/46] staging/lustre/llite: rename ccc_device to vvp_device
  2016-03-30 23:48 [PATCH v2 00/46] Lustre IO stack simplifications and cleanups green
                   ` (23 preceding siblings ...)
  2016-03-30 23:48 ` [PATCH v2 24/46] staging/lustre/llite: merge lclient.h into llite/vvp_internal.h green
@ 2016-03-30 23:48 ` green
  2016-03-30 23:48 ` [PATCH v2 26/46] staging/lustre/llite: rename ccc_object to vvp_object green
                   ` (20 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: green @ 2016-03-30 23:48 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger
  Cc: Linux Kernel Mailing List, Lustre Development List,
	John L. Hammond, Oleg Drokin

From: "John L. Hammond" <john.hammond@intel.com>

Rename struct ccc_device to struct vvp_device and merge the CCC device
methods into the VVP device methods.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Reviewed-on: http://review.whamcloud.com/13075
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5971
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
---
 drivers/staging/lustre/lustre/include/cl_object.h  |   2 +-
 drivers/staging/lustre/lustre/llite/lcommon_cl.c   | 104 +--------------------
 .../staging/lustre/lustre/llite/llite_internal.h   |   2 +-
 drivers/staging/lustre/lustre/llite/vvp_dev.c      |  84 ++++++++++++++++-
 drivers/staging/lustre/lustre/llite/vvp_internal.h |  38 ++++----
 5 files changed, 102 insertions(+), 128 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/cl_object.h b/drivers/staging/lustre/lustre/include/cl_object.h
index f2bb8f8..f2a242b 100644
--- a/drivers/staging/lustre/lustre/include/cl_object.h
+++ b/drivers/staging/lustre/lustre/include/cl_object.h
@@ -149,7 +149,7 @@ struct cl_device_operations {
 /**
  * Device in the client stack.
  *
- * \see ccc_device, lov_device, lovsub_device, osc_device
+ * \see vvp_device, lov_device, lovsub_device, osc_device
  */
 struct cl_device {
 	/** Super-class. */
diff --git a/drivers/staging/lustre/lustre/llite/lcommon_cl.c b/drivers/staging/lustre/lustre/llite/lcommon_cl.c
index 79cdf83..b0d4a3d 100644
--- a/drivers/staging/lustre/lustre/llite/lcommon_cl.c
+++ b/drivers/staging/lustre/lustre/llite/lcommon_cl.c
@@ -159,91 +159,6 @@ struct lu_context_key ccc_session_key = {
 	.lct_fini = ccc_session_key_fini
 };
 
-/* type constructor/destructor: ccc_type_{init,fini,start,stop}(). */
-/* LU_TYPE_INIT_FINI(ccc, &ccc_key, &ccc_session_key); */
-
-int ccc_device_init(const struct lu_env *env, struct lu_device *d,
-		    const char *name, struct lu_device *next)
-{
-	struct ccc_device  *vdv;
-	int rc;
-
-	vdv = lu2ccc_dev(d);
-	vdv->cdv_next = lu2cl_dev(next);
-
-	LASSERT(d->ld_site && next->ld_type);
-	next->ld_site = d->ld_site;
-	rc = next->ld_type->ldt_ops->ldto_device_init(
-			env, next, next->ld_type->ldt_name, NULL);
-	if (rc == 0) {
-		lu_device_get(next);
-		lu_ref_add(&next->ld_reference, "lu-stack", &lu_site_init);
-	}
-	return rc;
-}
-
-struct lu_device *ccc_device_fini(const struct lu_env *env,
-				  struct lu_device *d)
-{
-	return cl2lu_dev(lu2ccc_dev(d)->cdv_next);
-}
-
-struct lu_device *ccc_device_alloc(const struct lu_env *env,
-				   struct lu_device_type *t,
-				   struct lustre_cfg *cfg,
-				   const struct lu_device_operations *luops,
-				   const struct cl_device_operations *clops)
-{
-	struct ccc_device *vdv;
-	struct lu_device  *lud;
-	struct cl_site    *site;
-	int rc;
-
-	vdv = kzalloc(sizeof(*vdv), GFP_NOFS);
-	if (!vdv)
-		return ERR_PTR(-ENOMEM);
-
-	lud = &vdv->cdv_cl.cd_lu_dev;
-	cl_device_init(&vdv->cdv_cl, t);
-	ccc2lu_dev(vdv)->ld_ops = luops;
-	vdv->cdv_cl.cd_ops = clops;
-
-	site = kzalloc(sizeof(*site), GFP_NOFS);
-	if (site) {
-		rc = cl_site_init(site, &vdv->cdv_cl);
-		if (rc == 0) {
-			rc = lu_site_init_finish(&site->cs_lu);
-		} else {
-			LASSERT(!lud->ld_site);
-			CERROR("Cannot init lu_site, rc %d.\n", rc);
-			kfree(site);
-		}
-	} else {
-		rc = -ENOMEM;
-	}
-	if (rc != 0) {
-		ccc_device_free(env, lud);
-		lud = ERR_PTR(rc);
-	}
-	return lud;
-}
-
-struct lu_device *ccc_device_free(const struct lu_env *env,
-				  struct lu_device *d)
-{
-	struct ccc_device *vdv  = lu2ccc_dev(d);
-	struct cl_site    *site = lu2cl_site(d->ld_site);
-	struct lu_device  *next = cl2lu_dev(vdv->cdv_next);
-
-	if (d->ld_site) {
-		cl_site_fini(site);
-		kfree(site);
-	}
-	cl_device_fini(lu2cl_dev(d));
-	kfree(vdv);
-	return next;
-}
-
 int ccc_req_init(const struct lu_env *env, struct cl_device *dev,
 		 struct cl_req *req)
 {
@@ -360,13 +275,13 @@ int ccc_object_init0(const struct lu_env *env,
 int ccc_object_init(const struct lu_env *env, struct lu_object *obj,
 		    const struct lu_object_conf *conf)
 {
-	struct ccc_device *dev = lu2ccc_dev(obj->lo_dev);
+	struct vvp_device *dev = lu2vvp_dev(obj->lo_dev);
 	struct ccc_object *vob = lu2ccc(obj);
 	struct lu_object  *below;
 	struct lu_device  *under;
 	int result;
 
-	under = &dev->cdv_next->cd_lu_dev;
+	under = &dev->vdv_next->cd_lu_dev;
 	below = under->ld_ops->ldo_object_alloc(env, obj->lo_header, under);
 	if (below) {
 		const struct cl_object_conf *cconf;
@@ -779,21 +694,6 @@ again:
  *
  */
 
-struct lu_device *ccc2lu_dev(struct ccc_device *vdv)
-{
-	return &vdv->cdv_cl.cd_lu_dev;
-}
-
-struct ccc_device *lu2ccc_dev(const struct lu_device *d)
-{
-	return container_of0(d, struct ccc_device, cdv_cl.cd_lu_dev);
-}
-
-struct ccc_device *cl2ccc_dev(const struct cl_device *d)
-{
-	return container_of0(d, struct ccc_device, cdv_cl);
-}
-
 struct lu_object *ccc2lu(struct ccc_object *vob)
 {
 	return &vob->cob_cl.co_lu;
diff --git a/drivers/staging/lustre/lustre/llite/llite_internal.h b/drivers/staging/lustre/lustre/llite/llite_internal.h
index 5686b94..1e9e41b 100644
--- a/drivers/staging/lustre/lustre/llite/llite_internal.h
+++ b/drivers/staging/lustre/lustre/llite/llite_internal.h
@@ -1321,7 +1321,7 @@ static inline void cl_stats_tally(struct cl_device *dev, enum cl_req_type crt,
 	int opc = (crt == CRT_READ) ? LPROC_LL_OSC_READ :
 				      LPROC_LL_OSC_WRITE;
 
-	ll_stats_ops_tally(ll_s2sbi(cl2ccc_dev(dev)->cdv_sb), opc, rc);
+	ll_stats_ops_tally(ll_s2sbi(cl2vvp_dev(dev)->vdv_sb), opc, rc);
 }
 
 ssize_t ll_direct_rw_pages(const struct lu_env *env, struct cl_io *io,
diff --git a/drivers/staging/lustre/lustre/llite/vvp_dev.c b/drivers/staging/lustre/lustre/llite/vvp_dev.c
index 29d24c9..e934ec8 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_dev.c
+++ b/drivers/staging/lustre/lustre/llite/vvp_dev.c
@@ -136,11 +136,85 @@ static const struct cl_device_operations vvp_cl_ops = {
 	.cdo_req_init = ccc_req_init
 };
 
+static struct lu_device *vvp_device_free(const struct lu_env *env,
+					 struct lu_device *d)
+{
+	struct vvp_device *vdv  = lu2vvp_dev(d);
+	struct cl_site    *site = lu2cl_site(d->ld_site);
+	struct lu_device  *next = cl2lu_dev(vdv->vdv_next);
+
+	if (d->ld_site) {
+		cl_site_fini(site);
+		kfree(site);
+	}
+	cl_device_fini(lu2cl_dev(d));
+	kfree(vdv);
+	return next;
+}
+
 static struct lu_device *vvp_device_alloc(const struct lu_env *env,
 					  struct lu_device_type *t,
 					  struct lustre_cfg *cfg)
 {
-	return ccc_device_alloc(env, t, cfg, &vvp_lu_ops, &vvp_cl_ops);
+	struct vvp_device *vdv;
+	struct lu_device  *lud;
+	struct cl_site    *site;
+	int rc;
+
+	vdv = kzalloc(sizeof(*vdv), GFP_NOFS);
+	if (!vdv)
+		return ERR_PTR(-ENOMEM);
+
+	lud = &vdv->vdv_cl.cd_lu_dev;
+	cl_device_init(&vdv->vdv_cl, t);
+	vvp2lu_dev(vdv)->ld_ops = &vvp_lu_ops;
+	vdv->vdv_cl.cd_ops = &vvp_cl_ops;
+
+	site = kzalloc(sizeof(*site), GFP_NOFS);
+	if (site) {
+		rc = cl_site_init(site, &vdv->vdv_cl);
+		if (rc == 0) {
+			rc = lu_site_init_finish(&site->cs_lu);
+		} else {
+			LASSERT(!lud->ld_site);
+			CERROR("Cannot init lu_site, rc %d.\n", rc);
+			kfree(site);
+		}
+	} else {
+		rc = -ENOMEM;
+	}
+	if (rc != 0) {
+		vvp_device_free(env, lud);
+		lud = ERR_PTR(rc);
+	}
+	return lud;
+}
+
+static int vvp_device_init(const struct lu_env *env, struct lu_device *d,
+			   const char *name, struct lu_device *next)
+{
+	struct vvp_device  *vdv;
+	int rc;
+
+	vdv = lu2vvp_dev(d);
+	vdv->vdv_next = lu2cl_dev(next);
+
+	LASSERT(d->ld_site && next->ld_type);
+	next->ld_site = d->ld_site;
+	rc = next->ld_type->ldt_ops->ldto_device_init(env, next,
+						      next->ld_type->ldt_name,
+						      NULL);
+	if (rc == 0) {
+		lu_device_get(next);
+		lu_ref_add(&next->ld_reference, "lu-stack", &lu_site_init);
+	}
+	return rc;
+}
+
+static struct lu_device *vvp_device_fini(const struct lu_env *env,
+					 struct lu_device *d)
+{
+	return cl2lu_dev(lu2vvp_dev(d)->vdv_next);
 }
 
 static const struct lu_device_type_operations vvp_device_type_ops = {
@@ -151,9 +225,9 @@ static const struct lu_device_type_operations vvp_device_type_ops = {
 	.ldto_stop  = vvp_type_stop,
 
 	.ldto_device_alloc = vvp_device_alloc,
-	.ldto_device_free  = ccc_device_free,
-	.ldto_device_init  = ccc_device_init,
-	.ldto_device_fini  = ccc_device_fini
+	.ldto_device_free	= vvp_device_free,
+	.ldto_device_init	= vvp_device_init,
+	.ldto_device_fini	= vvp_device_fini,
 };
 
 struct lu_device_type vvp_device_type = {
@@ -206,7 +280,7 @@ int cl_sb_init(struct super_block *sb)
 		cl = cl_type_setup(env, NULL, &vvp_device_type,
 				   sbi->ll_dt_exp->exp_obd->obd_lu_dev);
 		if (!IS_ERR(cl)) {
-			cl2ccc_dev(cl)->cdv_sb = sb;
+			cl2vvp_dev(cl)->vdv_sb = sb;
 			sbi->ll_cl = cl;
 			sbi->ll_site = cl2lu_dev(cl)->ld_site;
 		}
diff --git a/drivers/staging/lustre/lustre/llite/vvp_internal.h b/drivers/staging/lustre/lustre/llite/vvp_internal.h
index d4fe611..34509f9 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_internal.h
+++ b/drivers/staging/lustre/lustre/llite/vvp_internal.h
@@ -264,10 +264,10 @@ static inline pgoff_t ccc_index(struct ccc_page *ccc)
 
 struct cl_page *ccc_vmpage_page_transient(struct page *vmpage);
 
-struct ccc_device {
-	struct cl_device    cdv_cl;
-	struct super_block *cdv_sb;
-	struct cl_device   *cdv_next;
+struct vvp_device {
+	struct cl_device    vdv_cl;
+	struct super_block *vdv_sb;
+	struct cl_device   *vdv_next;
 };
 
 struct ccc_lock {
@@ -287,18 +287,6 @@ void *ccc_session_key_init(const struct lu_context *ctx,
 void  ccc_session_key_fini(const struct lu_context *ctx,
 			   struct lu_context_key *key, void *data);
 
-int ccc_device_init(const struct lu_env *env,
-		    struct lu_device *d,
-		    const char *name, struct lu_device *next);
-struct lu_device *ccc_device_fini(const struct lu_env *env,
-				  struct lu_device *d);
-struct lu_device *ccc_device_alloc(const struct lu_env *env,
-				   struct lu_device_type *t,
-				   struct lustre_cfg *cfg,
-				   const struct lu_device_operations *luops,
-				   const struct cl_device_operations *clops);
-struct lu_device *ccc_device_free(const struct lu_env *env,
-				  struct lu_device *d);
 struct lu_object *ccc_object_alloc(const struct lu_env *env,
 				   const struct lu_object_header *hdr,
 				   struct lu_device *dev,
@@ -351,10 +339,22 @@ void ccc_req_attr_set(const struct lu_env *env,
 		      const struct cl_object *obj,
 		      struct cl_req_attr *oa, u64 flags);
 
-struct lu_device *ccc2lu_dev(struct ccc_device *vdv);
+static inline struct lu_device *vvp2lu_dev(struct vvp_device *vdv)
+{
+	return &vdv->vdv_cl.cd_lu_dev;
+}
+
+static inline struct vvp_device *lu2vvp_dev(const struct lu_device *d)
+{
+	return container_of0(d, struct vvp_device, vdv_cl.cd_lu_dev);
+}
+
+static inline struct vvp_device *cl2vvp_dev(const struct cl_device *d)
+{
+	return container_of0(d, struct vvp_device, vdv_cl);
+}
+
 struct lu_object *ccc2lu(struct ccc_object *vob);
-struct ccc_device *lu2ccc_dev(const struct lu_device *d);
-struct ccc_device *cl2ccc_dev(const struct cl_device *d);
 struct ccc_object *lu2ccc(const struct lu_object *obj);
 struct ccc_object *cl2ccc(const struct cl_object *obj);
 struct ccc_lock *cl2ccc_lock(const struct cl_lock_slice *slice);
-- 
2.1.0


* [PATCH v2 26/46] staging/lustre/llite: rename ccc_object to vvp_object
  2016-03-30 23:48 [PATCH v2 00/46] Lustre IO stack simplifications and cleanups green
                   ` (24 preceding siblings ...)
  2016-03-30 23:48 ` [PATCH v2 25/46] staging/lustre/llite: rename ccc_device to vvp_device green
@ 2016-03-30 23:48 ` green
  2016-03-30 23:48 ` [PATCH v2 27/46] staging/lustre/llite: rename ccc_page to vvp_page green
                   ` (19 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: green @ 2016-03-30 23:48 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger
  Cc: Linux Kernel Mailing List, Lustre Development List,
	John L. Hammond, Oleg Drokin

From: "John L. Hammond" <john.hammond@intel.com>

Rename struct ccc_object to struct vvp_object and merge the CCC object
methods into the VVP object methods.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Reviewed-on: http://review.whamcloud.com/13077
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5971
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
---
 drivers/staging/lustre/lustre/include/cl_object.h  |   4 +-
 drivers/staging/lustre/lustre/llite/glimpse.c      |   4 +-
 drivers/staging/lustre/lustre/llite/lcommon_cl.c   | 166 ++-------------------
 drivers/staging/lustre/lustre/llite/llite_close.c  |  22 +--
 .../staging/lustre/lustre/llite/llite_internal.h   |   6 +-
 drivers/staging/lustre/lustre/llite/llite_lib.c    |   4 +-
 drivers/staging/lustre/lustre/llite/llite_mmap.c   |  16 +-
 drivers/staging/lustre/lustre/llite/rw.c           |   4 +-
 drivers/staging/lustre/lustre/llite/rw26.c         |   6 +-
 drivers/staging/lustre/lustre/llite/vvp_dev.c      |  10 +-
 drivers/staging/lustre/lustre/llite/vvp_internal.h |  61 ++++----
 drivers/staging/lustre/lustre/llite/vvp_io.c       |  36 ++---
 drivers/staging/lustre/lustre/llite/vvp_object.c   | 121 +++++++++++++--
 drivers/staging/lustre/lustre/llite/vvp_page.c     |  40 ++---
 .../staging/lustre/lustre/osc/osc_cl_internal.h    |   2 +-
 15 files changed, 231 insertions(+), 271 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/cl_object.h b/drivers/staging/lustre/lustre/include/cl_object.h
index f2a242b..d3271ff 100644
--- a/drivers/staging/lustre/lustre/include/cl_object.h
+++ b/drivers/staging/lustre/lustre/include/cl_object.h
@@ -245,7 +245,7 @@ enum cl_attr_valid {
  *    be discarded from the memory, all its sub-objects are torn-down and
  *    destroyed too.
  *
- * \see ccc_object, lov_object, lovsub_object, osc_object
+ * \see vvp_object, lov_object, lovsub_object, osc_object
  */
 struct cl_object {
 	/** super class */
@@ -385,7 +385,7 @@ struct cl_object_operations {
 	 * object. Layers are supposed to fill parts of \a lvb that will be
 	 * shipped to the glimpse originator as a glimpse result.
 	 *
-	 * \see ccc_object_glimpse(), lovsub_object_glimpse(),
+	 * \see vvp_object_glimpse(), lovsub_object_glimpse(),
 	 * \see osc_object_glimpse()
 	 */
 	int (*coo_glimpse)(const struct lu_env *env,
diff --git a/drivers/staging/lustre/lustre/llite/glimpse.c b/drivers/staging/lustre/lustre/llite/glimpse.c
index a634633..d76fa16 100644
--- a/drivers/staging/lustre/lustre/llite/glimpse.c
+++ b/drivers/staging/lustre/lustre/llite/glimpse.c
@@ -69,14 +69,14 @@ static const struct cl_lock_descr whole_file = {
 blkcnt_t dirty_cnt(struct inode *inode)
 {
 	blkcnt_t cnt = 0;
-	struct ccc_object *vob = cl_inode2ccc(inode);
+	struct vvp_object *vob = cl_inode2vvp(inode);
 	void	      *results[1];
 
 	if (inode->i_mapping)
 		cnt += radix_tree_gang_lookup_tag(&inode->i_mapping->page_tree,
 						  results, 0, 1,
 						  PAGECACHE_TAG_DIRTY);
-	if (cnt == 0 && atomic_read(&vob->cob_mmap_cnt) > 0)
+	if (cnt == 0 && atomic_read(&vob->vob_mmap_cnt) > 0)
 		cnt = 1;
 
 	return (cnt > 0) ? 1 : 0;
diff --git a/drivers/staging/lustre/lustre/llite/lcommon_cl.c b/drivers/staging/lustre/lustre/llite/lcommon_cl.c
index b0d4a3d..9db0510 100644
--- a/drivers/staging/lustre/lustre/llite/lcommon_cl.c
+++ b/drivers/staging/lustre/lustre/llite/lcommon_cl.c
@@ -68,7 +68,6 @@ static const struct cl_req_operations ccc_req_ops;
  */
 
 static struct kmem_cache *ccc_lock_kmem;
-static struct kmem_cache *ccc_object_kmem;
 static struct kmem_cache *ccc_thread_kmem;
 static struct kmem_cache *ccc_session_kmem;
 static struct kmem_cache *ccc_req_kmem;
@@ -80,11 +79,6 @@ static struct lu_kmem_descr ccc_caches[] = {
 		.ckd_size  = sizeof(struct ccc_lock)
 	},
 	{
-		.ckd_cache = &ccc_object_kmem,
-		.ckd_name  = "ccc_object_kmem",
-		.ckd_size  = sizeof(struct ccc_object)
-	},
-	{
 		.ckd_cache = &ccc_thread_kmem,
 		.ckd_name  = "ccc_thread_kmem",
 		.ckd_size  = sizeof(struct ccc_thread_info),
@@ -227,84 +221,6 @@ void ccc_global_fini(struct lu_device_type *device_type)
 	lu_kmem_fini(ccc_caches);
 }
 
-/*****************************************************************************
- *
- * Object operations.
- *
- */
-
-struct lu_object *ccc_object_alloc(const struct lu_env *env,
-				   const struct lu_object_header *unused,
-				   struct lu_device *dev,
-				   const struct cl_object_operations *clops,
-				   const struct lu_object_operations *luops)
-{
-	struct ccc_object *vob;
-	struct lu_object  *obj;
-
-	vob = kmem_cache_zalloc(ccc_object_kmem, GFP_NOFS);
-	if (vob) {
-		struct cl_object_header *hdr;
-
-		obj = ccc2lu(vob);
-		hdr = &vob->cob_header;
-		cl_object_header_init(hdr);
-		hdr->coh_page_bufsize = cfs_size_round(sizeof(struct cl_page));
-
-		lu_object_init(obj, &hdr->coh_lu, dev);
-		lu_object_add_top(&hdr->coh_lu, obj);
-
-		vob->cob_cl.co_ops = clops;
-		obj->lo_ops = luops;
-	} else {
-		obj = NULL;
-	}
-	return obj;
-}
-
-int ccc_object_init0(const struct lu_env *env,
-		     struct ccc_object *vob,
-		     const struct cl_object_conf *conf)
-{
-	vob->cob_inode = conf->coc_inode;
-	vob->cob_transient_pages = 0;
-	cl_object_page_init(&vob->cob_cl, sizeof(struct ccc_page));
-	return 0;
-}
-
-int ccc_object_init(const struct lu_env *env, struct lu_object *obj,
-		    const struct lu_object_conf *conf)
-{
-	struct vvp_device *dev = lu2vvp_dev(obj->lo_dev);
-	struct ccc_object *vob = lu2ccc(obj);
-	struct lu_object  *below;
-	struct lu_device  *under;
-	int result;
-
-	under = &dev->vdv_next->cd_lu_dev;
-	below = under->ld_ops->ldo_object_alloc(env, obj->lo_header, under);
-	if (below) {
-		const struct cl_object_conf *cconf;
-
-		cconf = lu2cl_conf(conf);
-		INIT_LIST_HEAD(&vob->cob_pending_list);
-		lu_object_add(obj, below);
-		result = ccc_object_init0(env, vob, cconf);
-	} else {
-		result = -ENOMEM;
-	}
-	return result;
-}
-
-void ccc_object_free(const struct lu_env *env, struct lu_object *obj)
-{
-	struct ccc_object *vob = lu2ccc(obj);
-
-	lu_object_fini(obj);
-	lu_object_header_fini(obj->lo_header);
-	kmem_cache_free(ccc_object_kmem, vob);
-}
-
 int ccc_lock_init(const struct lu_env *env,
 		  struct cl_object *obj, struct cl_lock *lock,
 		  const struct cl_io *unused,
@@ -313,7 +229,7 @@ int ccc_lock_init(const struct lu_env *env,
 	struct ccc_lock *clk;
 	int result;
 
-	CLOBINVRNT(env, obj, ccc_object_invariant(obj));
+	CLOBINVRNT(env, obj, vvp_object_invariant(obj));
 
 	clk = kmem_cache_zalloc(ccc_lock_kmem, GFP_NOFS);
 	if (clk) {
@@ -325,35 +241,17 @@ int ccc_lock_init(const struct lu_env *env,
 	return result;
 }
 
-int ccc_object_glimpse(const struct lu_env *env,
-		       const struct cl_object *obj, struct ost_lvb *lvb)
-{
-	struct inode *inode = ccc_object_inode(obj);
-
-	lvb->lvb_mtime = LTIME_S(inode->i_mtime);
-	lvb->lvb_atime = LTIME_S(inode->i_atime);
-	lvb->lvb_ctime = LTIME_S(inode->i_ctime);
-	/*
-	 * LU-417: Add dirty pages block count lest i_blocks reports 0, some
-	 * "cp" or "tar" on remote node may think it's a completely sparse file
-	 * and skip it.
-	 */
-	if (lvb->lvb_size > 0 && lvb->lvb_blocks == 0)
-		lvb->lvb_blocks = dirty_cnt(inode);
-	return 0;
-}
-
-static void ccc_object_size_lock(struct cl_object *obj)
+static void vvp_object_size_lock(struct cl_object *obj)
 {
-	struct inode *inode = ccc_object_inode(obj);
+	struct inode *inode = vvp_object_inode(obj);
 
 	ll_inode_size_lock(inode);
 	cl_object_attr_lock(obj);
 }
 
-static void ccc_object_size_unlock(struct cl_object *obj)
+static void vvp_object_size_unlock(struct cl_object *obj)
 {
-	struct inode *inode = ccc_object_inode(obj);
+	struct inode *inode = vvp_object_inode(obj);
 
 	cl_object_attr_unlock(obj);
 	ll_inode_size_unlock(inode);
@@ -399,7 +297,7 @@ int ccc_lock_enqueue(const struct lu_env *env,
 		     const struct cl_lock_slice *slice,
 		     struct cl_io *unused, struct cl_sync_io *anchor)
 {
-	CLOBINVRNT(env, slice->cls_obj, ccc_object_invariant(slice->cls_obj));
+	CLOBINVRNT(env, slice->cls_obj, vvp_object_invariant(slice->cls_obj));
 	return 0;
 }
 
@@ -417,7 +315,7 @@ int ccc_io_one_lock_index(const struct lu_env *env, struct cl_io *io,
 	struct cl_lock_descr   *descr = &cio->cui_link.cill_descr;
 	struct cl_object       *obj   = io->ci_obj;
 
-	CLOBINVRNT(env, obj, ccc_object_invariant(obj));
+	CLOBINVRNT(env, obj, vvp_object_invariant(obj));
 
 	CDEBUG(D_VFSTRACE, "lock: %d [%lu, %lu]\n", mode, start, end);
 
@@ -462,7 +360,7 @@ int ccc_io_one_lock(const struct lu_env *env, struct cl_io *io,
 void ccc_io_end(const struct lu_env *env, const struct cl_io_slice *ios)
 {
 	CLOBINVRNT(env, ios->cis_io->ci_obj,
-		   ccc_object_invariant(ios->cis_io->ci_obj));
+		   vvp_object_invariant(ios->cis_io->ci_obj));
 }
 
 void ccc_io_advance(const struct lu_env *env,
@@ -473,7 +371,7 @@ void ccc_io_advance(const struct lu_env *env,
 	struct cl_io     *io  = ios->cis_io;
 	struct cl_object *obj = ios->cis_io->ci_obj;
 
-	CLOBINVRNT(env, obj, ccc_object_invariant(obj));
+	CLOBINVRNT(env, obj, vvp_object_invariant(obj));
 
 	if (!cl_is_normalio(env, io))
 		return;
@@ -496,7 +394,7 @@ int ccc_prep_size(const struct lu_env *env, struct cl_object *obj,
 		  struct cl_io *io, loff_t start, size_t count, int *exceed)
 {
 	struct cl_attr *attr  = ccc_env_thread_attr(env);
-	struct inode   *inode = ccc_object_inode(obj);
+	struct inode   *inode = vvp_object_inode(obj);
 	loff_t	  pos   = start + count - 1;
 	loff_t kms;
 	int result;
@@ -520,7 +418,7 @@ int ccc_prep_size(const struct lu_env *env, struct cl_object *obj,
 	 * ll_inode_size_lock(). This guarantees that short reads are handled
 	 * correctly in the face of concurrent writes and truncates.
 	 */
-	ccc_object_size_lock(obj);
+	vvp_object_size_lock(obj);
 	result = cl_object_attr_get(env, obj, attr);
 	if (result == 0) {
 		kms = attr->cat_kms;
@@ -530,7 +428,7 @@ int ccc_prep_size(const struct lu_env *env, struct cl_object *obj,
 			 * return a short read (B) or some zeroes at the end
 			 * of the buffer (C)
 			 */
-			ccc_object_size_unlock(obj);
+			vvp_object_size_unlock(obj);
 			result = cl_glimpse_lock(env, io, inode, obj, 0);
 			if (result == 0 && exceed) {
 				/* If objective page index exceed end-of-file
@@ -567,7 +465,9 @@ int ccc_prep_size(const struct lu_env *env, struct cl_object *obj,
 			       (__u64)i_size_read(inode));
 		}
 	}
-	ccc_object_size_unlock(obj);
+
+	vvp_object_size_unlock(obj);
+
 	return result;
 }
 
@@ -618,7 +518,7 @@ void ccc_req_attr_set(const struct lu_env *env,
 	u32	      valid_flags;
 
 	oa = attr->cra_oa;
-	inode = ccc_object_inode(obj);
+	inode = vvp_object_inode(obj);
 	valid_flags = OBD_MD_FLTYPE;
 
 	if (slice->crs_req->crq_type == CRT_WRITE) {
@@ -694,21 +594,6 @@ again:
  *
  */
 
-struct lu_object *ccc2lu(struct ccc_object *vob)
-{
-	return &vob->cob_cl.co_lu;
-}
-
-struct ccc_object *lu2ccc(const struct lu_object *obj)
-{
-	return container_of0(obj, struct ccc_object, cob_cl.co_lu);
-}
-
-struct ccc_object *cl2ccc(const struct cl_object *obj)
-{
-	return container_of0(obj, struct ccc_object, cob_cl);
-}
-
 struct ccc_lock *cl2ccc_lock(const struct cl_lock_slice *slice)
 {
 	return container_of(slice, struct ccc_lock, clk_cl);
@@ -734,25 +619,6 @@ struct page *cl2vm_page(const struct cl_page_slice *slice)
 	return cl2ccc_page(slice)->cpg_page;
 }
 
-/*****************************************************************************
- *
- * Accessors.
- *
- */
-int ccc_object_invariant(const struct cl_object *obj)
-{
-	struct inode	 *inode = ccc_object_inode(obj);
-	struct ll_inode_info	*lli	= ll_i2info(inode);
-
-	return (S_ISREG(inode->i_mode) || inode->i_mode == 0) &&
-	       lli->lli_clob == obj;
-}
-
-struct inode *ccc_object_inode(const struct cl_object *obj)
-{
-	return cl2ccc(obj)->cob_inode;
-}
-
 /**
  * Initialize or update CLIO structures for regular files when new
  * meta-data arrives from the server.
diff --git a/drivers/staging/lustre/lustre/llite/llite_close.c b/drivers/staging/lustre/lustre/llite/llite_close.c
index a55ac4d..6e99d34 100644
--- a/drivers/staging/lustre/lustre/llite/llite_close.c
+++ b/drivers/staging/lustre/lustre/llite/llite_close.c
@@ -46,21 +46,21 @@
 #include "llite_internal.h"
 
 /** records that a write is in flight */
-void vvp_write_pending(struct ccc_object *club, struct ccc_page *page)
+void vvp_write_pending(struct vvp_object *club, struct ccc_page *page)
 {
-	struct ll_inode_info *lli = ll_i2info(club->cob_inode);
+	struct ll_inode_info *lli = ll_i2info(club->vob_inode);
 
 	spin_lock(&lli->lli_lock);
 	lli->lli_flags |= LLIF_SOM_DIRTY;
 	if (page && list_empty(&page->cpg_pending_linkage))
-		list_add(&page->cpg_pending_linkage, &club->cob_pending_list);
+		list_add(&page->cpg_pending_linkage, &club->vob_pending_list);
 	spin_unlock(&lli->lli_lock);
 }
 
 /** records that a write has completed */
-void vvp_write_complete(struct ccc_object *club, struct ccc_page *page)
+void vvp_write_complete(struct vvp_object *club, struct ccc_page *page)
 {
-	struct ll_inode_info *lli = ll_i2info(club->cob_inode);
+	struct ll_inode_info *lli = ll_i2info(club->vob_inode);
 	int rc = 0;
 
 	spin_lock(&lli->lli_lock);
@@ -70,7 +70,7 @@ void vvp_write_complete(struct ccc_object *club, struct ccc_page *page)
 	}
 	spin_unlock(&lli->lli_lock);
 	if (rc)
-		ll_queue_done_writing(club->cob_inode, 0);
+		ll_queue_done_writing(club->vob_inode, 0);
 }
 
 /** Queues DONE_WRITING if
@@ -80,13 +80,13 @@ void vvp_write_complete(struct ccc_object *club, struct ccc_page *page)
 void ll_queue_done_writing(struct inode *inode, unsigned long flags)
 {
 	struct ll_inode_info *lli = ll_i2info(inode);
-	struct ccc_object *club = cl2ccc(ll_i2info(inode)->lli_clob);
+	struct vvp_object *club = cl2vvp(ll_i2info(inode)->lli_clob);
 
 	spin_lock(&lli->lli_lock);
 	lli->lli_flags |= flags;
 
 	if ((lli->lli_flags & LLIF_DONE_WRITING) &&
-	    list_empty(&club->cob_pending_list)) {
+	    list_empty(&club->vob_pending_list)) {
 		struct ll_close_queue *lcq = ll_i2sbi(inode)->ll_lcq;
 
 		if (lli->lli_flags & LLIF_MDS_SIZE_LOCK)
@@ -140,10 +140,10 @@ void ll_ioepoch_close(struct inode *inode, struct md_op_data *op_data,
 		      struct obd_client_handle **och, unsigned long flags)
 {
 	struct ll_inode_info *lli = ll_i2info(inode);
-	struct ccc_object *club = cl2ccc(ll_i2info(inode)->lli_clob);
+	struct vvp_object *club = cl2vvp(ll_i2info(inode)->lli_clob);
 
 	spin_lock(&lli->lli_lock);
-	if (!(list_empty(&club->cob_pending_list))) {
+	if (!(list_empty(&club->vob_pending_list))) {
 		if (!(lli->lli_flags & LLIF_EPOCH_PENDING)) {
 			LASSERT(*och);
 			LASSERT(!lli->lli_pending_och);
@@ -198,7 +198,7 @@ void ll_ioepoch_close(struct inode *inode, struct md_op_data *op_data,
 		}
 	}
 
-	LASSERT(list_empty(&club->cob_pending_list));
+	LASSERT(list_empty(&club->vob_pending_list));
 	lli->lli_flags &= ~LLIF_SOM_DIRTY;
 	spin_unlock(&lli->lli_lock);
 	ll_done_writing_attr(inode, op_data);
diff --git a/drivers/staging/lustre/lustre/llite/llite_internal.h b/drivers/staging/lustre/lustre/llite/llite_internal.h
index 1e9e41b..1c39d15 100644
--- a/drivers/staging/lustre/lustre/llite/llite_internal.h
+++ b/drivers/staging/lustre/lustre/llite/llite_internal.h
@@ -828,10 +828,8 @@ struct ll_close_queue {
 	atomic_t		lcq_stop;
 };
 
-struct ccc_object *cl_inode2ccc(struct inode *inode);
-
-void vvp_write_pending (struct ccc_object *club, struct ccc_page *page);
-void vvp_write_complete(struct ccc_object *club, struct ccc_page *page);
+void vvp_write_pending(struct vvp_object *club, struct ccc_page *page);
+void vvp_write_complete(struct vvp_object *club, struct ccc_page *page);
 
 /* specific architecture can implement only part of this list */
 enum vvp_io_subtype {
diff --git a/drivers/staging/lustre/lustre/llite/llite_lib.c b/drivers/staging/lustre/lustre/llite/llite_lib.c
index 0f01cfc..95c55c3 100644
--- a/drivers/staging/lustre/lustre/llite/llite_lib.c
+++ b/drivers/staging/lustre/lustre/llite/llite_lib.c
@@ -2270,7 +2270,7 @@ void ll_dirty_page_discard_warn(struct page *page, int ioret)
 {
 	char *buf, *path = NULL;
 	struct dentry *dentry = NULL;
-	struct ccc_object *obj = cl_inode2ccc(page->mapping->host);
+	struct vvp_object *obj = cl_inode2vvp(page->mapping->host);
 
 	/* this can be called inside spin lock so use GFP_ATOMIC. */
 	buf = (char *)__get_free_page(GFP_ATOMIC);
@@ -2284,7 +2284,7 @@ void ll_dirty_page_discard_warn(struct page *page, int ioret)
 	       "%s: dirty page discard: %s/fid: " DFID "/%s may get corrupted (rc %d)\n",
 	       ll_get_fsname(page->mapping->host->i_sb, NULL, 0),
 	       s2lsi(page->mapping->host->i_sb)->lsi_lmd->lmd_dev,
-	       PFID(&obj->cob_header.coh_lu.loh_fid),
+	       PFID(&obj->vob_header.coh_lu.loh_fid),
 	       (path && !IS_ERR(path)) ? path : "", ioret);
 
 	if (dentry)
diff --git a/drivers/staging/lustre/lustre/llite/llite_mmap.c b/drivers/staging/lustre/lustre/llite/llite_mmap.c
index baccf93..1263da8 100644
--- a/drivers/staging/lustre/lustre/llite/llite_mmap.c
+++ b/drivers/staging/lustre/lustre/llite/llite_mmap.c
@@ -200,7 +200,7 @@ static int ll_page_mkwrite0(struct vm_area_struct *vma, struct page *vmpage,
 	 * Otherwise, we could add dirty pages into osc cache
 	 * while truncate is on-going.
 	 */
-	inode = ccc_object_inode(io->ci_obj);
+	inode = vvp_object_inode(io->ci_obj);
 	lli = ll_i2info(inode);
 	down_read(&lli->lli_trunc_sem);
 
@@ -422,16 +422,16 @@ static int ll_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 
 /**
  *  To avoid cancel the locks covering mmapped region for lock cache pressure,
- *  we track the mapped vma count in ccc_object::cob_mmap_cnt.
+ *  we track the mapped vma count in vvp_object::vob_mmap_cnt.
  */
 static void ll_vm_open(struct vm_area_struct *vma)
 {
 	struct inode *inode    = file_inode(vma->vm_file);
-	struct ccc_object *vob = cl_inode2ccc(inode);
+	struct vvp_object *vob = cl_inode2vvp(inode);
 
 	LASSERT(vma->vm_file);
-	LASSERT(atomic_read(&vob->cob_mmap_cnt) >= 0);
-	atomic_inc(&vob->cob_mmap_cnt);
+	LASSERT(atomic_read(&vob->vob_mmap_cnt) >= 0);
+	atomic_inc(&vob->vob_mmap_cnt);
 }
 
 /**
@@ -440,11 +440,11 @@ static void ll_vm_open(struct vm_area_struct *vma)
 static void ll_vm_close(struct vm_area_struct *vma)
 {
 	struct inode      *inode = file_inode(vma->vm_file);
-	struct ccc_object *vob   = cl_inode2ccc(inode);
+	struct vvp_object *vob   = cl_inode2vvp(inode);
 
 	LASSERT(vma->vm_file);
-	atomic_dec(&vob->cob_mmap_cnt);
-	LASSERT(atomic_read(&vob->cob_mmap_cnt) >= 0);
+	atomic_dec(&vob->vob_mmap_cnt);
+	LASSERT(atomic_read(&vob->vob_mmap_cnt) >= 0);
 }
 
 /* XXX put nice comment here.  talk about __free_pte -> dirty pages and
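
A side note on the counter renamed above: vob_mmap_cnt is what keeps
lock-cache pressure from cancelling DLM locks that still cover a mapped
region. A minimal sketch of the kind of check a consumer can make --
the helper itself is hypothetical, only cl_inode2vvp() and
vob_mmap_cnt come from this patch:

    static bool vvp_object_is_mmapped(struct inode *inode)
    {
    	/* non-zero while any VMA opened via ll_vm_open() is live */
    	struct vvp_object *vob = cl_inode2vvp(inode);

    	return atomic_read(&vob->vob_mmap_cnt) > 0;
    }
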
diff --git a/drivers/staging/lustre/lustre/llite/rw.c b/drivers/staging/lustre/lustre/llite/rw.c
index ad15058..5ae0993 100644
--- a/drivers/staging/lustre/lustre/llite/rw.c
+++ b/drivers/staging/lustre/lustre/llite/rw.c
@@ -343,7 +343,7 @@ static int ll_read_ahead_page(const struct lu_env *env, struct cl_io *io,
 			      pgoff_t index, pgoff_t *max_index)
 {
 	struct cl_object *clob  = io->ci_obj;
-	struct inode     *inode = ccc_object_inode(clob);
+	struct inode     *inode = vvp_object_inode(clob);
 	struct page      *vmpage;
 	struct cl_page   *page;
 	enum ra_stat      which = _NR_RA_STAT; /* keep gcc happy */
@@ -558,7 +558,7 @@ int ll_readahead(const struct lu_env *env, struct cl_io *io,
 	__u64 kms;
 
 	clob = io->ci_obj;
-	inode = ccc_object_inode(clob);
+	inode = vvp_object_inode(clob);
 
 	memset(ria, 0, sizeof(*ria));
 
diff --git a/drivers/staging/lustre/lustre/llite/rw26.c b/drivers/staging/lustre/lustre/llite/rw26.c
index f87238b..ec114bc 100644
--- a/drivers/staging/lustre/lustre/llite/rw26.c
+++ b/drivers/staging/lustre/lustre/llite/rw26.c
@@ -350,7 +350,7 @@ static ssize_t ll_direct_IO_26(struct kiocb *iocb, struct iov_iter *iter,
 	struct cl_io *io;
 	struct file *file = iocb->ki_filp;
 	struct inode *inode = file->f_mapping->host;
-	struct ccc_object *obj = cl_inode2ccc(inode);
+	struct vvp_object *obj = cl_inode2vvp(inode);
 	ssize_t count = iov_iter_count(iter);
 	ssize_t tot_bytes = 0, result = 0;
 	struct ll_inode_info *lli = ll_i2info(inode);
@@ -386,7 +386,7 @@ static ssize_t ll_direct_IO_26(struct kiocb *iocb, struct iov_iter *iter,
 	if (iov_iter_rw(iter) == READ)
 		inode_lock(inode);
 
-	LASSERT(obj->cob_transient_pages == 0);
+	LASSERT(obj->vob_transient_pages == 0);
 	while (iov_iter_count(iter)) {
 		struct page **pages;
 		size_t offs;
@@ -434,7 +434,7 @@ static ssize_t ll_direct_IO_26(struct kiocb *iocb, struct iov_iter *iter,
 		file_offset += result;
 	}
 out:
-	LASSERT(obj->cob_transient_pages == 0);
+	LASSERT(obj->vob_transient_pages == 0);
 	if (iov_iter_rw(iter) == READ)
 		inode_unlock(inode);
 
diff --git a/drivers/staging/lustre/lustre/llite/vvp_dev.c b/drivers/staging/lustre/lustre/llite/vvp_dev.c
index e934ec8..45c549c 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_dev.c
+++ b/drivers/staging/lustre/lustre/llite/vvp_dev.c
@@ -57,10 +57,16 @@
  * "llite_" (var. "ll_") prefix.
  */
 
+struct kmem_cache *vvp_object_kmem;
 static struct kmem_cache *vvp_thread_kmem;
 static struct kmem_cache *vvp_session_kmem;
 static struct lu_kmem_descr vvp_caches[] = {
 	{
+		.ckd_cache = &vvp_object_kmem,
+		.ckd_name  = "vvp_object_kmem",
+		.ckd_size  = sizeof(struct vvp_object),
+	},
+	{
 		.ckd_cache = &vvp_thread_kmem,
 		.ckd_name  = "vvp_thread_kmem",
 		.ckd_size  = sizeof(struct vvp_thread_info),
@@ -431,7 +437,7 @@ static loff_t vvp_pgcache_find(const struct lu_env *env,
 			return ~0ULL;
 		clob = vvp_pgcache_obj(env, dev, &id);
 		if (clob) {
-			struct inode *inode = ccc_object_inode(clob);
+			struct inode *inode = vvp_object_inode(clob);
 			struct page *vmpage;
 			int nr;
 
@@ -512,7 +518,7 @@ static int vvp_pgcache_show(struct seq_file *f, void *v)
 		sbi = f->private;
 		clob = vvp_pgcache_obj(env, &sbi->ll_cl->cd_lu_dev, &id);
 		if (clob) {
-			struct inode *inode = ccc_object_inode(clob);
+			struct inode *inode = vvp_object_inode(clob);
 			struct cl_page *page = NULL;
 			struct page *vmpage;
 
diff --git a/drivers/staging/lustre/lustre/llite/vvp_internal.h b/drivers/staging/lustre/lustre/llite/vvp_internal.h
index 34509f9..76e7b4c 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_internal.h
+++ b/drivers/staging/lustre/lustre/llite/vvp_internal.h
@@ -128,6 +128,8 @@ int cl_is_normalio(const struct lu_env *env, const struct cl_io *io);
 extern struct lu_context_key ccc_key;
 extern struct lu_context_key ccc_session_key;
 
+extern struct kmem_cache *vvp_object_kmem;
+
 struct ccc_thread_info {
 	struct cl_lock		cti_lock;
 	struct cl_lock_descr	cti_descr;
@@ -193,10 +195,10 @@ static inline struct ccc_io *ccc_env_io(const struct lu_env *env)
 /**
  * ccc-private object state.
  */
-struct ccc_object {
-	struct cl_object_header cob_header;
-	struct cl_object        cob_cl;
-	struct inode           *cob_inode;
+struct vvp_object {
+	struct cl_object_header vob_header;
+	struct cl_object        vob_cl;
+	struct inode           *vob_inode;
 
 	/**
 	 * A list of dirty pages pending IO in the cache. Used by
@@ -204,24 +206,24 @@ struct ccc_object {
 	 *
 	 * \see ccc_page::cpg_pending_linkage
 	 */
-	struct list_head	cob_pending_list;
+	struct list_head	vob_pending_list;
 
 	/**
 	 * Access this counter is protected by inode->i_sem. Now that
 	 * the lifetime of transient pages must be covered by inode sem,
 	 * we don't need to hold any lock..
 	 */
-	int			cob_transient_pages;
+	int			vob_transient_pages;
 	/**
 	 * Number of outstanding mmaps on this file.
 	 *
 	 * \see ll_vm_open(), ll_vm_close().
 	 */
-	atomic_t                cob_mmap_cnt;
+	atomic_t                vob_mmap_cnt;
 
 	/**
 	 * various flags
-	 * cob_discard_page_warned
+	 * vob_discard_page_warned
 	 *     if pages belonging to this object are discarded when a client
 	 * is evicted, some debug info will be printed, this flag will be set
 	 * during processing the first discarded page, then avoid flooding
@@ -229,7 +231,7 @@ struct ccc_object {
 	 *
 	 * \see ll_dirty_page_discard_warn.
 	 */
-	unsigned int		cob_discard_page_warned:1;
+	unsigned int		vob_discard_page_warned:1;
 };
 
 /**
@@ -242,8 +244,7 @@ struct ccc_page {
 	int		  cpg_write_queued;
 	/**
 	 * Non-empty iff this page is already counted in
-	 * ccc_object::cob_pending_list. Protected by
-	 * ccc_object::cob_pending_guard. This list is only used as a flag,
+	 * vvp_object::vob_pending_list. This list is only used as a flag,
 	 * that is, never iterated through, only checked for list_empty(), but
 	 * having a list is useful for debugging.
 	 */
@@ -287,27 +288,14 @@ void *ccc_session_key_init(const struct lu_context *ctx,
 void  ccc_session_key_fini(const struct lu_context *ctx,
 			   struct lu_context_key *key, void *data);
 
-struct lu_object *ccc_object_alloc(const struct lu_env *env,
-				   const struct lu_object_header *hdr,
-				   struct lu_device *dev,
-				   const struct cl_object_operations *clops,
-				   const struct lu_object_operations *luops);
-
 int ccc_req_init(const struct lu_env *env, struct cl_device *dev,
 		 struct cl_req *req);
 void ccc_umount(const struct lu_env *env, struct cl_device *dev);
 int ccc_global_init(struct lu_device_type *device_type);
 void ccc_global_fini(struct lu_device_type *device_type);
-int ccc_object_init0(const struct lu_env *env, struct ccc_object *vob,
-		     const struct cl_object_conf *conf);
-int ccc_object_init(const struct lu_env *env, struct lu_object *obj,
-		    const struct lu_object_conf *conf);
-void ccc_object_free(const struct lu_env *env, struct lu_object *obj);
 int ccc_lock_init(const struct lu_env *env, struct cl_object *obj,
 		  struct cl_lock *lock, const struct cl_io *io,
 		  const struct cl_lock_operations *lkops);
-int ccc_object_glimpse(const struct lu_env *env,
-		       const struct cl_object *obj, struct ost_lvb *lvb);
 int ccc_fail(const struct lu_env *env, const struct cl_page_slice *slice);
 int ccc_transient_page_prep(const struct lu_env *env,
 			    const struct cl_page_slice *slice,
@@ -354,20 +342,32 @@ static inline struct vvp_device *cl2vvp_dev(const struct cl_device *d)
 	return container_of0(d, struct vvp_device, vdv_cl);
 }
 
-struct lu_object *ccc2lu(struct ccc_object *vob);
-struct ccc_object *lu2ccc(const struct lu_object *obj);
-struct ccc_object *cl2ccc(const struct cl_object *obj);
+static inline struct vvp_object *cl2vvp(const struct cl_object *obj)
+{
+	return container_of0(obj, struct vvp_object, vob_cl);
+}
+
+static inline struct vvp_object *lu2vvp(const struct lu_object *obj)
+{
+	return container_of0(obj, struct vvp_object, vob_cl.co_lu);
+}
+
+static inline struct inode *vvp_object_inode(const struct cl_object *obj)
+{
+	return cl2vvp(obj)->vob_inode;
+}
+
+int vvp_object_invariant(const struct cl_object *obj);
+struct vvp_object *cl_inode2vvp(struct inode *inode);
+
 struct ccc_lock *cl2ccc_lock(const struct cl_lock_slice *slice);
 struct ccc_io *cl2ccc_io(const struct lu_env *env,
 			 const struct cl_io_slice *slice);
 struct ccc_req *cl2ccc_req(const struct cl_req_slice *slice);
 struct page *cl2vm_page(const struct cl_page_slice *slice);
-struct inode *ccc_object_inode(const struct cl_object *obj);
-struct ccc_object *cl_inode2ccc(struct inode *inode);
 
 int cl_setattr_ost(struct inode *inode, const struct iattr *attr);
 
-int ccc_object_invariant(const struct cl_object *obj);
 int cl_file_inode_init(struct inode *inode, struct lustre_md *md);
 void cl_inode_fini(struct inode *inode);
 int cl_local_size(struct inode *inode);
@@ -419,7 +419,6 @@ int vvp_page_init(const struct lu_env *env, struct cl_object *obj,
 struct lu_object *vvp_object_alloc(const struct lu_env *env,
 				   const struct lu_object_header *hdr,
 				   struct lu_device *dev);
-struct ccc_object *cl_inode2ccc(struct inode *inode);
 
 extern const struct file_operations vvp_dump_pgcache_file_ops;
 
diff --git a/drivers/staging/lustre/lustre/llite/vvp_io.c b/drivers/staging/lustre/lustre/llite/vvp_io.c
index fcf0cfe..1773cb2 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_io.c
+++ b/drivers/staging/lustre/lustre/llite/vvp_io.c
@@ -127,7 +127,7 @@ static int vvp_io_fault_iter_init(const struct lu_env *env,
 				  const struct cl_io_slice *ios)
 {
 	struct vvp_io *vio   = cl2vvp_io(env, ios);
-	struct inode  *inode = ccc_object_inode(ios->cis_obj);
+	struct inode  *inode = vvp_object_inode(ios->cis_obj);
 
 	LASSERT(inode ==
 		file_inode(cl2ccc_io(env, ios)->cui_fd->fd_file));
@@ -141,7 +141,7 @@ static void vvp_io_fini(const struct lu_env *env, const struct cl_io_slice *ios)
 	struct cl_object *obj = io->ci_obj;
 	struct ccc_io    *cio = cl2ccc_io(env, ios);
 
-	CLOBINVRNT(env, obj, ccc_object_invariant(obj));
+	CLOBINVRNT(env, obj, vvp_object_invariant(obj));
 
 	CDEBUG(D_VFSTRACE, DFID
 	       " ignore/verify layout %d/%d, layout version %d restore needed %d\n",
@@ -155,7 +155,7 @@ static void vvp_io_fini(const struct lu_env *env, const struct cl_io_slice *ios)
 		/* file was detected release, we need to restore it
 		 * before finishing the io
 		 */
-		rc = ll_layout_restore(ccc_object_inode(obj));
+		rc = ll_layout_restore(vvp_object_inode(obj));
 		/* if restore registration failed, no restart,
 		 * we will return -ENODATA
 		 */
@@ -181,7 +181,7 @@ static void vvp_io_fini(const struct lu_env *env, const struct cl_io_slice *ios)
 		__u32 gen = 0;
 
 		/* check layout version */
-		ll_layout_refresh(ccc_object_inode(obj), &gen);
+		ll_layout_refresh(vvp_object_inode(obj), &gen);
 		io->ci_need_restart = cio->cui_layout_gen != gen;
 		if (io->ci_need_restart) {
 			CDEBUG(D_VFSTRACE,
@@ -190,7 +190,7 @@ static void vvp_io_fini(const struct lu_env *env, const struct cl_io_slice *ios)
 			       cio->cui_layout_gen, gen);
 			/* today successful restore is the only possible case */
 			/* restore was done, clear restoring state */
-			ll_i2info(ccc_object_inode(obj))->lli_flags &=
+			ll_i2info(vvp_object_inode(obj))->lli_flags &=
 				~LLIF_FILE_RESTORING;
 		}
 	}
@@ -202,7 +202,7 @@ static void vvp_io_fault_fini(const struct lu_env *env,
 	struct cl_io   *io   = ios->cis_io;
 	struct cl_page *page = io->u.ci_fault.ft_page;
 
-	CLOBINVRNT(env, io->ci_obj, ccc_object_invariant(io->ci_obj));
+	CLOBINVRNT(env, io->ci_obj, vvp_object_invariant(io->ci_obj));
 
 	if (page) {
 		lu_ref_del(&page->cp_reference, "fault", io);
@@ -459,7 +459,7 @@ static int vvp_io_setattr_start(const struct lu_env *env,
 				const struct cl_io_slice *ios)
 {
 	struct cl_io	*io    = ios->cis_io;
-	struct inode	*inode = ccc_object_inode(io->ci_obj);
+	struct inode	*inode = vvp_object_inode(io->ci_obj);
 	int result = 0;
 
 	inode_lock(inode);
@@ -475,7 +475,7 @@ static void vvp_io_setattr_end(const struct lu_env *env,
 			       const struct cl_io_slice *ios)
 {
 	struct cl_io *io    = ios->cis_io;
-	struct inode *inode = ccc_object_inode(io->ci_obj);
+	struct inode *inode = vvp_object_inode(io->ci_obj);
 
 	if (cl_io_is_trunc(io))
 		/* Truncate in memory pages - they must be clean pages
@@ -499,7 +499,7 @@ static int vvp_io_read_start(const struct lu_env *env,
 	struct ccc_io     *cio   = cl2ccc_io(env, ios);
 	struct cl_io      *io    = ios->cis_io;
 	struct cl_object  *obj   = io->ci_obj;
-	struct inode      *inode = ccc_object_inode(obj);
+	struct inode      *inode = vvp_object_inode(obj);
 	struct ll_ra_read *bead  = &vio->cui_bead;
 	struct file       *file  = cio->cui_fd->fd_file;
 
@@ -509,7 +509,7 @@ static int vvp_io_read_start(const struct lu_env *env,
 	long    tot = cio->cui_tot_count;
 	int     exceed = 0;
 
-	CLOBINVRNT(env, obj, ccc_object_invariant(obj));
+	CLOBINVRNT(env, obj, vvp_object_invariant(obj));
 
 	CDEBUG(D_VFSTRACE, "read: -> [%lli, %lli)\n", pos, pos + cnt);
 
@@ -653,7 +653,7 @@ static void write_commit_callback(const struct lu_env *env, struct cl_io *io,
 	set_page_dirty(vmpage);
 
 	cp = cl2ccc_page(cl_object_page_slice(clob, page));
-	vvp_write_pending(cl2ccc(clob), cp);
+	vvp_write_pending(cl2vvp(clob), cp);
 
 	cl_page_disown(env, io, page);
 
@@ -690,7 +690,7 @@ static bool page_list_sanity_check(struct cl_object *obj,
 int vvp_io_write_commit(const struct lu_env *env, struct cl_io *io)
 {
 	struct cl_object *obj = io->ci_obj;
-	struct inode *inode = ccc_object_inode(obj);
+	struct inode *inode = vvp_object_inode(obj);
 	struct ccc_io *cio = ccc_env_io(env);
 	struct cl_page_list *queue = &cio->u.write.cui_queue;
 	struct cl_page *page;
@@ -773,7 +773,7 @@ static int vvp_io_write_start(const struct lu_env *env,
 	struct ccc_io      *cio   = cl2ccc_io(env, ios);
 	struct cl_io       *io    = ios->cis_io;
 	struct cl_object   *obj   = io->ci_obj;
-	struct inode       *inode = ccc_object_inode(obj);
+	struct inode       *inode = vvp_object_inode(obj);
 	ssize_t result = 0;
 	loff_t pos = io->u.ci_wr.wr.crw_pos;
 	size_t cnt = io->u.ci_wr.wr.crw_count;
@@ -874,7 +874,7 @@ static void mkwrite_commit_callback(const struct lu_env *env, struct cl_io *io,
 	set_page_dirty(page->cp_vmpage);
 
 	cp = cl2ccc_page(cl_object_page_slice(clob, page));
-	vvp_write_pending(cl2ccc(clob), cp);
+	vvp_write_pending(cl2vvp(clob), cp);
 }
 
 static int vvp_io_fault_start(const struct lu_env *env,
@@ -883,7 +883,7 @@ static int vvp_io_fault_start(const struct lu_env *env,
 	struct vvp_io       *vio     = cl2vvp_io(env, ios);
 	struct cl_io	*io      = ios->cis_io;
 	struct cl_object    *obj     = io->ci_obj;
-	struct inode	*inode   = ccc_object_inode(obj);
+	struct inode        *inode   = vvp_object_inode(obj);
 	struct cl_fault_io  *fio     = &io->u.ci_fault;
 	struct vvp_fault_io *cfio    = &vio->u.fault;
 	loff_t	       offset;
@@ -1060,7 +1060,7 @@ static int vvp_io_read_page(const struct lu_env *env,
 	struct cl_io	      *io     = ios->cis_io;
 	struct ccc_page	   *cp     = cl2ccc_page(slice);
 	struct cl_page	    *page   = slice->cpl_page;
-	struct inode	      *inode  = ccc_object_inode(slice->cpl_obj);
+	struct inode              *inode  = vvp_object_inode(slice->cpl_obj);
 	struct ll_sb_info	 *sbi    = ll_i2sbi(inode);
 	struct ll_file_data       *fd     = cl2ccc_io(env, ios)->cui_fd;
 	struct ll_readahead_state *ras    = &fd->fd_ras;
@@ -1135,10 +1135,10 @@ int vvp_io_init(const struct lu_env *env, struct cl_object *obj,
 {
 	struct vvp_io      *vio   = vvp_env_io(env);
 	struct ccc_io      *cio   = ccc_env_io(env);
-	struct inode       *inode = ccc_object_inode(obj);
+	struct inode       *inode = vvp_object_inode(obj);
 	int		 result;
 
-	CLOBINVRNT(env, obj, ccc_object_invariant(obj));
+	CLOBINVRNT(env, obj, vvp_object_invariant(obj));
 
 	CDEBUG(D_VFSTRACE, DFID
 	       " ignore/verify layout %d/%d, layout version %d restore needed %d\n",
diff --git a/drivers/staging/lustre/lustre/llite/vvp_object.c b/drivers/staging/lustre/lustre/llite/vvp_object.c
index 45fac69..9f5e6a6 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_object.c
+++ b/drivers/staging/lustre/lustre/llite/vvp_object.c
@@ -54,16 +54,25 @@
  *
  */
 
+int vvp_object_invariant(const struct cl_object *obj)
+{
+	struct inode *inode  = vvp_object_inode(obj);
+	struct ll_inode_info *lli = ll_i2info(inode);
+
+	return (S_ISREG(inode->i_mode) || inode->i_mode == 0) &&
+		lli->lli_clob == obj;
+}
+
 static int vvp_object_print(const struct lu_env *env, void *cookie,
 			    lu_printer_t p, const struct lu_object *o)
 {
-	struct ccc_object    *obj   = lu2ccc(o);
-	struct inode	 *inode = obj->cob_inode;
+	struct vvp_object    *obj   = lu2vvp(o);
+	struct inode	 *inode = obj->vob_inode;
 	struct ll_inode_info *lli;
 
 	(*p)(env, cookie, "(%s %d %d) inode: %p ",
-	     list_empty(&obj->cob_pending_list) ? "-" : "+",
-	     obj->cob_transient_pages, atomic_read(&obj->cob_mmap_cnt),
+	     list_empty(&obj->vob_pending_list) ? "-" : "+",
+	     obj->vob_transient_pages, atomic_read(&obj->vob_mmap_cnt),
 	     inode);
 	if (inode) {
 		lli = ll_i2info(inode);
@@ -78,7 +87,7 @@ static int vvp_object_print(const struct lu_env *env, void *cookie,
 static int vvp_attr_get(const struct lu_env *env, struct cl_object *obj,
 			struct cl_attr *attr)
 {
-	struct inode *inode = ccc_object_inode(obj);
+	struct inode *inode = vvp_object_inode(obj);
 
 	/*
 	 * lov overwrites most of these fields in
@@ -100,7 +109,7 @@ static int vvp_attr_get(const struct lu_env *env, struct cl_object *obj,
 static int vvp_attr_set(const struct lu_env *env, struct cl_object *obj,
 			const struct cl_attr *attr, unsigned valid)
 {
-	struct inode *inode = ccc_object_inode(obj);
+	struct inode *inode = vvp_object_inode(obj);
 
 	if (valid & CAT_UID)
 		inode->i_uid = make_kuid(&init_user_ns, attr->cat_uid);
@@ -168,7 +177,7 @@ static int vvp_conf_set(const struct lu_env *env, struct cl_object *obj,
 
 static int vvp_prune(const struct lu_env *env, struct cl_object *obj)
 {
-	struct inode *inode = ccc_object_inode(obj);
+	struct inode *inode = vvp_object_inode(obj);
 	int rc;
 
 	rc = cl_sync_file_range(inode, 0, OBD_OBJECT_EOF, CL_FSYNC_LOCAL, 1);
@@ -182,6 +191,24 @@ static int vvp_prune(const struct lu_env *env, struct cl_object *obj)
 	return 0;
 }
 
+static int vvp_object_glimpse(const struct lu_env *env,
+			      const struct cl_object *obj, struct ost_lvb *lvb)
+{
+	struct inode *inode = vvp_object_inode(obj);
+
+	lvb->lvb_mtime = LTIME_S(inode->i_mtime);
+	lvb->lvb_atime = LTIME_S(inode->i_atime);
+	lvb->lvb_ctime = LTIME_S(inode->i_ctime);
+	/*
+	 * LU-417: Add dirty pages block count lest i_blocks reports 0, some
+	 * "cp" or "tar" on remote node may think it's a completely sparse file
+	 * and skip it.
+	 */
+	if (lvb->lvb_size > 0 && lvb->lvb_blocks == 0)
+		lvb->lvb_blocks = dirty_cnt(inode);
+	return 0;
+}
+
 static const struct cl_object_operations vvp_ops = {
 	.coo_page_init = vvp_page_init,
 	.coo_lock_init = vvp_lock_init,
@@ -190,16 +217,60 @@ static const struct cl_object_operations vvp_ops = {
 	.coo_attr_set  = vvp_attr_set,
 	.coo_conf_set  = vvp_conf_set,
 	.coo_prune     = vvp_prune,
-	.coo_glimpse   = ccc_object_glimpse
+	.coo_glimpse   = vvp_object_glimpse
 };
 
+static int vvp_object_init0(const struct lu_env *env,
+			    struct vvp_object *vob,
+			    const struct cl_object_conf *conf)
+{
+	vob->vob_inode = conf->coc_inode;
+	vob->vob_transient_pages = 0;
+	cl_object_page_init(&vob->vob_cl, sizeof(struct ccc_page));
+	return 0;
+}
+
+static int vvp_object_init(const struct lu_env *env, struct lu_object *obj,
+			   const struct lu_object_conf *conf)
+{
+	struct vvp_device *dev = lu2vvp_dev(obj->lo_dev);
+	struct vvp_object *vob = lu2vvp(obj);
+	struct lu_object  *below;
+	struct lu_device  *under;
+	int result;
+
+	under = &dev->vdv_next->cd_lu_dev;
+	below = under->ld_ops->ldo_object_alloc(env, obj->lo_header, under);
+	if (below) {
+		const struct cl_object_conf *cconf;
+
+		cconf = lu2cl_conf(conf);
+		INIT_LIST_HEAD(&vob->vob_pending_list);
+		lu_object_add(obj, below);
+		result = vvp_object_init0(env, vob, cconf);
+	} else {
+		result = -ENOMEM;
+	}
+
+	return result;
+}
+
+static void vvp_object_free(const struct lu_env *env, struct lu_object *obj)
+{
+	struct vvp_object *vob = lu2vvp(obj);
+
+	lu_object_fini(obj);
+	lu_object_header_fini(obj->lo_header);
+	kmem_cache_free(vvp_object_kmem, vob);
+}
+
 static const struct lu_object_operations vvp_lu_obj_ops = {
-	.loo_object_init  = ccc_object_init,
-	.loo_object_free  = ccc_object_free,
-	.loo_object_print = vvp_object_print
+	.loo_object_init	= vvp_object_init,
+	.loo_object_free	= vvp_object_free,
+	.loo_object_print	= vvp_object_print,
 };
 
-struct ccc_object *cl_inode2ccc(struct inode *inode)
+struct vvp_object *cl_inode2vvp(struct inode *inode)
 {
 	struct ll_inode_info *lli = ll_i2info(inode);
 	struct cl_object     *obj = lli->lli_clob;
@@ -207,12 +278,32 @@ struct ccc_object *cl_inode2ccc(struct inode *inode)
 
 	lu = lu_object_locate(obj->co_lu.lo_header, &vvp_device_type);
 	LASSERT(lu);
-	return lu2ccc(lu);
+	return lu2vvp(lu);
 }
 
 struct lu_object *vvp_object_alloc(const struct lu_env *env,
-				   const struct lu_object_header *hdr,
+				   const struct lu_object_header *unused,
 				   struct lu_device *dev)
 {
-	return ccc_object_alloc(env, hdr, dev, &vvp_ops, &vvp_lu_obj_ops);
+	struct vvp_object *vob;
+	struct lu_object  *obj;
+
+	vob = kmem_cache_zalloc(vvp_object_kmem, GFP_NOFS);
+	if (vob) {
+		struct cl_object_header *hdr;
+
+		obj = &vob->vob_cl.co_lu;
+		hdr = &vob->vob_header;
+		cl_object_header_init(hdr);
+		hdr->coh_page_bufsize = cfs_size_round(sizeof(struct cl_page));
+
+		lu_object_init(obj, &hdr->coh_lu, dev);
+		lu_object_add_top(&hdr->coh_lu, obj);
+
+		vob->vob_cl.co_ops = &vvp_ops;
+		obj->lo_ops = &vvp_lu_obj_ops;
+	} else {
+		obj = NULL;
+	}
+	return obj;
 }
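
For orientation, the allocation path that now ends in
vvp_object_alloc() is roughly the following -- a sketch of the lu
framework flow under the usual layered-object assumptions, not code
from this patch:

    cl_object_find()
      -> lu_object_find()
           -> ld_ops->ldo_object_alloc()   /* == vvp_object_alloc() */
           -> lo_ops->loo_object_init()    /* == vvp_object_init()  */
                -> vvp_object_init0()      /* records conf->coc_inode */

and the accessors round-trip, so for a regular file with lli_clob set:

    struct vvp_object *vob = cl_inode2vvp(inode);

    LASSERT(vvp_object_inode(&vob->vob_cl) == inode);
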
diff --git a/drivers/staging/lustre/lustre/llite/vvp_page.c b/drivers/staging/lustre/lustre/llite/vvp_page.c
index 66a4f9b..419a535 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_page.c
+++ b/drivers/staging/lustre/lustre/llite/vvp_page.c
@@ -159,9 +159,9 @@ static void vvp_page_delete(const struct lu_env *env,
 
 	LASSERT(PageLocked(vmpage));
 	LASSERT((struct cl_page *)vmpage->private == page);
-	LASSERT(inode == ccc_object_inode(obj));
+	LASSERT(inode == vvp_object_inode(obj));
 
-	vvp_write_complete(cl2ccc(obj), cl2ccc_page(slice));
+	vvp_write_complete(cl2vvp(obj), cl2ccc_page(slice));
 
 	/* Drop the reference count held in vvp_page_init */
 	refc = atomic_dec_return(&page->cp_ref);
@@ -220,7 +220,7 @@ static int vvp_page_prep_write(const struct lu_env *env,
 	if (!pg->cp_sync_io)
 		set_page_writeback(vmpage);
 
-	vvp_write_pending(cl2ccc(slice->cpl_obj), cl2ccc_page(slice));
+	vvp_write_pending(cl2vvp(slice->cpl_obj), cl2ccc_page(slice));
 
 	return 0;
 }
@@ -233,11 +233,11 @@ static int vvp_page_prep_write(const struct lu_env *env,
  */
 static void vvp_vmpage_error(struct inode *inode, struct page *vmpage, int ioret)
 {
-	struct ccc_object *obj = cl_inode2ccc(inode);
+	struct vvp_object *obj = cl_inode2vvp(inode);
 
 	if (ioret == 0) {
 		ClearPageError(vmpage);
-		obj->cob_discard_page_warned = 0;
+		obj->vob_discard_page_warned = 0;
 	} else {
 		SetPageError(vmpage);
 		if (ioret == -ENOSPC)
@@ -246,8 +246,8 @@ static void vvp_vmpage_error(struct inode *inode, struct page *vmpage, int ioret
 			set_bit(AS_EIO, &inode->i_mapping->flags);
 
 		if ((ioret == -ESHUTDOWN || ioret == -EINTR) &&
-		    obj->cob_discard_page_warned == 0) {
-			obj->cob_discard_page_warned = 1;
+		     obj->vob_discard_page_warned == 0) {
+			obj->vob_discard_page_warned = 1;
 			ll_dirty_page_discard_warn(vmpage, ioret);
 		}
 	}
@@ -260,7 +260,7 @@ static void vvp_page_completion_read(const struct lu_env *env,
 	struct ccc_page *cp     = cl2ccc_page(slice);
 	struct page      *vmpage = cp->cpg_page;
 	struct cl_page  *page   = slice->cpl_page;
-	struct inode    *inode  = ccc_object_inode(page->cp_obj);
+	struct inode    *inode  = vvp_object_inode(page->cp_obj);
 
 	LASSERT(PageLocked(vmpage));
 	CL_PAGE_HEADER(D_PAGE, env, page, "completing READ with %d\n", ioret);
@@ -299,7 +299,7 @@ static void vvp_page_completion_write(const struct lu_env *env,
 	 */
 
 	cp->cpg_write_queued = 0;
-	vvp_write_complete(cl2ccc(slice->cpl_obj), cp);
+	vvp_write_complete(cl2vvp(slice->cpl_obj), cp);
 
 	if (pg->cp_sync_io) {
 		LASSERT(PageLocked(vmpage));
@@ -310,7 +310,7 @@ static void vvp_page_completion_write(const struct lu_env *env,
 		 * Only mark the page error only when it's an async write
 		 * because applications won't wait for IO to finish.
 		 */
-		vvp_vmpage_error(ccc_object_inode(pg->cp_obj), vmpage, ioret);
+		vvp_vmpage_error(vvp_object_inode(pg->cp_obj), vmpage, ioret);
 
 		end_page_writeback(vmpage);
 	}
@@ -342,7 +342,7 @@ static int vvp_page_make_ready(const struct lu_env *env,
 		LASSERT(pg->cp_state == CPS_CACHED);
 		/* This actually clears the dirty bit in the radix tree. */
 		set_page_writeback(vmpage);
-		vvp_write_pending(cl2ccc(slice->cpl_obj), cl2ccc_page(slice));
+		vvp_write_pending(cl2vvp(slice->cpl_obj), cl2ccc_page(slice));
 		CL_PAGE_HEADER(D_PAGE, env, pg, "readied\n");
 	} else if (pg->cp_state == CPS_PAGEOUT) {
 		/* is it possible for osc_flush_async_page() to already
@@ -421,7 +421,7 @@ static const struct cl_page_operations vvp_page_ops = {
 
 static void vvp_transient_page_verify(const struct cl_page *page)
 {
-	struct inode *inode = ccc_object_inode(page->cp_obj);
+	struct inode *inode = vvp_object_inode(page->cp_obj);
 
 	LASSERT(!inode_trylock(inode));
 }
@@ -472,7 +472,7 @@ static void vvp_transient_page_discard(const struct lu_env *env,
 static int vvp_transient_page_is_vmlocked(const struct lu_env *env,
 					  const struct cl_page_slice *slice)
 {
-	struct inode    *inode = ccc_object_inode(slice->cpl_obj);
+	struct inode    *inode = vvp_object_inode(slice->cpl_obj);
 	int	locked;
 
 	locked = !inode_trylock(inode);
@@ -494,11 +494,11 @@ static void vvp_transient_page_fini(const struct lu_env *env,
 {
 	struct ccc_page *cp = cl2ccc_page(slice);
 	struct cl_page *clp = slice->cpl_page;
-	struct ccc_object *clobj = cl2ccc(clp->cp_obj);
+	struct vvp_object *clobj = cl2vvp(clp->cp_obj);
 
 	vvp_page_fini_common(cp);
-	LASSERT(!inode_trylock(clobj->cob_inode));
-	clobj->cob_transient_pages--;
+	LASSERT(!inode_trylock(clobj->vob_inode));
+	clobj->vob_transient_pages--;
 }
 
 static const struct cl_page_operations vvp_transient_page_ops = {
@@ -529,7 +529,7 @@ int vvp_page_init(const struct lu_env *env, struct cl_object *obj,
 	struct ccc_page *cpg = cl_object_page_slice(obj, page);
 	struct page     *vmpage = page->cp_vmpage;
 
-	CLOBINVRNT(env, obj, ccc_object_invariant(obj));
+	CLOBINVRNT(env, obj, vvp_object_invariant(obj));
 
 	cpg->cpg_page = vmpage;
 	page_cache_get(vmpage);
@@ -543,12 +543,12 @@ int vvp_page_init(const struct lu_env *env, struct cl_object *obj,
 		cl_page_slice_add(page, &cpg->cpg_cl, obj, index,
 				  &vvp_page_ops);
 	} else {
-		struct ccc_object *clobj = cl2ccc(obj);
+		struct vvp_object *clobj = cl2vvp(obj);
 
-		LASSERT(!inode_trylock(clobj->cob_inode));
+		LASSERT(!inode_trylock(clobj->vob_inode));
 		cl_page_slice_add(page, &cpg->cpg_cl, obj, index,
 				  &vvp_transient_page_ops);
-		clobj->cob_transient_pages++;
+		clobj->vob_transient_pages++;
 	}
 	return 0;
 }
diff --git a/drivers/staging/lustre/lustre/osc/osc_cl_internal.h b/drivers/staging/lustre/lustre/osc/osc_cl_internal.h
index d6d7661..aba6469 100644
--- a/drivers/staging/lustre/lustre/osc/osc_cl_internal.h
+++ b/drivers/staging/lustre/lustre/osc/osc_cl_internal.h
@@ -135,7 +135,7 @@ struct osc_object {
 	 */
 	struct list_head	 oo_inflight[CRT_NR];
 	/**
-	 * Lock, protecting ccc_object::cob_inflight, because a seat-belt is
+	 * Lock, protecting osc_page::ops_inflight, because a seat-belt is
 	 * locked during take-off and landing.
 	 */
 	spinlock_t	   oo_seatbelt;
-- 
2.1.0


* [PATCH v2 27/46] staging/lustre/llite: rename ccc_page to vvp_page
  2016-03-30 23:48 [PATCH v2 00/46] Lustre IO stack simplifications and cleanups green
                   ` (25 preceding siblings ...)
  2016-03-30 23:48 ` [PATCH v2 26/46] staging/lustre/llite: rename ccc_object to vvp_object green
@ 2016-03-30 23:48 ` green
  2016-03-30 23:48 ` [PATCH v2 28/46] staging/lustre/llite: rename ccc_lock to vvp_lock green
                   ` (18 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: green @ 2016-03-30 23:48 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger
  Cc: Linux Kernel Mailing List, Lustre Development List,
	John L. Hammond, Oleg Drokin

From: "John L. Hammond" <john.hammond@intel.com>

Rename struct ccc_page to struct vvp_page and remove obsolete CCC page
methods.
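
In short, the mapping applied throughout (all names below are taken
from the hunks that follow):

    struct ccc_page -> struct vvp_page   /* slice: cpg_cl -> vpg_cl  */
    cpg_page        -> vpg_page          /* and cpg_* -> vpg_* flags */
    cl2ccc_page()   -> cl2vvp_page()
    ccc_index()     -> vvp_index()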

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Reviewed-on: http://review.whamcloud.com/13086
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5971
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
---
 drivers/staging/lustre/lustre/include/cl_object.h  |   2 +-
 drivers/staging/lustre/lustre/llite/lcommon_cl.c   |  28 ------
 drivers/staging/lustre/lustre/llite/llite_close.c  |  12 +--
 .../staging/lustre/lustre/llite/llite_internal.h   |   4 +-
 drivers/staging/lustre/lustre/llite/rw.c           |  14 +--
 drivers/staging/lustre/lustre/llite/rw26.c         |  10 +-
 drivers/staging/lustre/lustre/llite/vvp_dev.c      |  12 +--
 drivers/staging/lustre/lustre/llite/vvp_internal.h |  38 ++++----
 drivers/staging/lustre/lustre/llite/vvp_io.c       |  34 +++----
 drivers/staging/lustre/lustre/llite/vvp_object.c   |   2 +-
 drivers/staging/lustre/lustre/llite/vvp_page.c     | 107 +++++++++++++--------
 11 files changed, 131 insertions(+), 132 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/cl_object.h b/drivers/staging/lustre/lustre/include/cl_object.h
index d3271ff..3488349 100644
--- a/drivers/staging/lustre/lustre/include/cl_object.h
+++ b/drivers/staging/lustre/lustre/include/cl_object.h
@@ -769,7 +769,7 @@ struct cl_page {
 /**
  * Per-layer part of cl_page.
  *
- * \see ccc_page, lov_page, osc_page
+ * \see vvp_page, lov_page, osc_page
  */
 struct cl_page_slice {
 	struct cl_page		  *cpl_page;
diff --git a/drivers/staging/lustre/lustre/llite/lcommon_cl.c b/drivers/staging/lustre/lustre/llite/lcommon_cl.c
index 9db0510..0762032 100644
--- a/drivers/staging/lustre/lustre/llite/lcommon_cl.c
+++ b/drivers/staging/lustre/lustre/llite/lcommon_cl.c
@@ -259,29 +259,6 @@ static void vvp_object_size_unlock(struct cl_object *obj)
 
 /*****************************************************************************
  *
- * Page operations.
- *
- */
-
-int ccc_fail(const struct lu_env *env, const struct cl_page_slice *slice)
-{
-	/*
-	 * Cached read?
-	 */
-	LBUG();
-	return 0;
-}
-
-int ccc_transient_page_prep(const struct lu_env *env,
-			    const struct cl_page_slice *slice,
-			    struct cl_io *unused)
-{
-	/* transient page should always be sent. */
-	return 0;
-}
-
-/*****************************************************************************
- *
  * Lock operations.
  *
  */
@@ -614,11 +591,6 @@ struct ccc_req *cl2ccc_req(const struct cl_req_slice *slice)
 	return container_of0(slice, struct ccc_req, crq_cl);
 }
 
-struct page *cl2vm_page(const struct cl_page_slice *slice)
-{
-	return cl2ccc_page(slice)->cpg_page;
-}
-
 /**
  * Initialize or update CLIO structures for regular files when new
  * meta-data arrives from the server.
diff --git a/drivers/staging/lustre/lustre/llite/llite_close.c b/drivers/staging/lustre/lustre/llite/llite_close.c
index 6e99d34..8d23980 100644
--- a/drivers/staging/lustre/lustre/llite/llite_close.c
+++ b/drivers/staging/lustre/lustre/llite/llite_close.c
@@ -46,26 +46,26 @@
 #include "llite_internal.h"
 
 /** records that a write is in flight */
-void vvp_write_pending(struct vvp_object *club, struct ccc_page *page)
+void vvp_write_pending(struct vvp_object *club, struct vvp_page *page)
 {
 	struct ll_inode_info *lli = ll_i2info(club->vob_inode);
 
 	spin_lock(&lli->lli_lock);
 	lli->lli_flags |= LLIF_SOM_DIRTY;
-	if (page && list_empty(&page->cpg_pending_linkage))
-		list_add(&page->cpg_pending_linkage, &club->vob_pending_list);
+	if (page && list_empty(&page->vpg_pending_linkage))
+		list_add(&page->vpg_pending_linkage, &club->vob_pending_list);
 	spin_unlock(&lli->lli_lock);
 }
 
 /** records that a write has completed */
-void vvp_write_complete(struct vvp_object *club, struct ccc_page *page)
+void vvp_write_complete(struct vvp_object *club, struct vvp_page *page)
 {
 	struct ll_inode_info *lli = ll_i2info(club->vob_inode);
 	int rc = 0;
 
 	spin_lock(&lli->lli_lock);
-	if (page && !list_empty(&page->cpg_pending_linkage)) {
-		list_del_init(&page->cpg_pending_linkage);
+	if (page && !list_empty(&page->vpg_pending_linkage)) {
+		list_del_init(&page->vpg_pending_linkage);
 		rc = 1;
 	}
 	spin_unlock(&lli->lli_lock);
diff --git a/drivers/staging/lustre/lustre/llite/llite_internal.h b/drivers/staging/lustre/lustre/llite/llite_internal.h
index 1c39d15..78b3edd 100644
--- a/drivers/staging/lustre/lustre/llite/llite_internal.h
+++ b/drivers/staging/lustre/lustre/llite/llite_internal.h
@@ -828,8 +828,8 @@ struct ll_close_queue {
 	atomic_t		lcq_stop;
 };
 
-void vvp_write_pending(struct vvp_object *club, struct ccc_page *page);
-void vvp_write_complete(struct vvp_object *club, struct ccc_page *page);
+void vvp_write_pending(struct vvp_object *club, struct vvp_page *page);
+void vvp_write_complete(struct vvp_object *club, struct vvp_page *page);
 
 /* specific architecture can implement only part of this list */
 enum vvp_io_subtype {
diff --git a/drivers/staging/lustre/lustre/llite/rw.c b/drivers/staging/lustre/lustre/llite/rw.c
index 5ae0993..2c4d4c4 100644
--- a/drivers/staging/lustre/lustre/llite/rw.c
+++ b/drivers/staging/lustre/lustre/llite/rw.c
@@ -298,21 +298,21 @@ static int cl_read_ahead_page(const struct lu_env *env, struct cl_io *io,
 			      struct cl_object *clob, pgoff_t *max_index)
 {
 	struct page *vmpage = page->cp_vmpage;
-	struct ccc_page *cp;
+	struct vvp_page *vpg;
 	int	      rc;
 
 	rc = 0;
 	cl_page_assume(env, io, page);
 	lu_ref_add(&page->cp_reference, "ra", current);
-	cp = cl2ccc_page(cl_object_page_slice(clob, page));
-	if (!cp->cpg_defer_uptodate && !PageUptodate(vmpage)) {
+	vpg = cl2vvp_page(cl_object_page_slice(clob, page));
+	if (!vpg->vpg_defer_uptodate && !PageUptodate(vmpage)) {
 		CDEBUG(D_READA, "page index %lu, max_index: %lu\n",
-		       ccc_index(cp), *max_index);
-		if (*max_index == 0 || ccc_index(cp) > *max_index)
+		       vvp_index(vpg), *max_index);
+		if (*max_index == 0 || vvp_index(vpg) > *max_index)
 			rc = cl_page_is_under_lock(env, io, page, max_index);
 		if (rc == 0) {
-			cp->cpg_defer_uptodate = 1;
-			cp->cpg_ra_used = 0;
+			vpg->vpg_defer_uptodate = 1;
+			vpg->vpg_ra_used = 0;
 			cl_page_list_add(queue, page);
 			rc = 1;
 		} else {
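
The vpg_defer_uptodate/vpg_ra_used pair keeps its pre-rename semantics;
distilled from this file and the vvp_io.c hunks below:

    /* read-ahead queues the page but defers marking it up to date */
    vpg->vpg_defer_uptodate = 1;
    vpg->vpg_ra_used = 0;

    /* a later real read consumes the flag and exports the page */
    if (vpg->vpg_defer_uptodate) {
    	vpg->vpg_ra_used = 1;
    	cl_page_export(env, page, 1);
    }
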
diff --git a/drivers/staging/lustre/lustre/llite/rw26.c b/drivers/staging/lustre/lustre/llite/rw26.c
index ec114bc..bb85629 100644
--- a/drivers/staging/lustre/lustre/llite/rw26.c
+++ b/drivers/staging/lustre/lustre/llite/rw26.c
@@ -457,8 +457,8 @@ static int ll_prepare_partial_page(const struct lu_env *env, struct cl_io *io,
 {
 	struct cl_attr *attr   = ccc_env_thread_attr(env);
 	struct cl_object *obj  = io->ci_obj;
-	struct ccc_page *cp    = cl_object_page_slice(obj, pg);
-	loff_t          offset = cl_offset(obj, ccc_index(cp));
+	struct vvp_page *vpg   = cl_object_page_slice(obj, pg);
+	loff_t          offset = cl_offset(obj, vvp_index(vpg));
 	int             result;
 
 	cl_object_attr_lock(obj);
@@ -471,12 +471,12 @@ static int ll_prepare_partial_page(const struct lu_env *env, struct cl_io *io,
 		 * purposes here we can treat it like i_size.
 		 */
 		if (attr->cat_kms <= offset) {
-			char *kaddr = kmap_atomic(cp->cpg_page);
+			char *kaddr = kmap_atomic(vpg->vpg_page);
 
 			memset(kaddr, 0, cl_page_size(obj));
 			kunmap_atomic(kaddr);
-		} else if (cp->cpg_defer_uptodate) {
-			cp->cpg_ra_used = 1;
+		} else if (vpg->vpg_defer_uptodate) {
+			vpg->vpg_ra_used = 1;
 		} else {
 			result = ll_page_sync_io(env, io, pg, CRT_READ);
 		}
diff --git a/drivers/staging/lustre/lustre/llite/vvp_dev.c b/drivers/staging/lustre/lustre/llite/vvp_dev.c
index 45c549c..3891b0d 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_dev.c
+++ b/drivers/staging/lustre/lustre/llite/vvp_dev.c
@@ -474,18 +474,18 @@ static loff_t vvp_pgcache_find(const struct lu_env *env,
 static void vvp_pgcache_page_show(const struct lu_env *env,
 				  struct seq_file *seq, struct cl_page *page)
 {
-	struct ccc_page *cpg;
+	struct vvp_page *vpg;
 	struct page      *vmpage;
 	int	      has_flags;
 
-	cpg = cl2ccc_page(cl_page_at(page, &vvp_device_type));
-	vmpage = cpg->cpg_page;
+	vpg = cl2vvp_page(cl_page_at(page, &vvp_device_type));
+	vmpage = vpg->vpg_page;
 	seq_printf(seq, " %5i | %p %p %s %s %s %s | %p %lu/%u(%p) %lu %u [",
 		   0 /* gen */,
-		   cpg, page,
+		   vpg, page,
 		   "none",
-		   cpg->cpg_write_queued ? "wq" : "- ",
-		   cpg->cpg_defer_uptodate ? "du" : "- ",
+		   vpg->vpg_write_queued ? "wq" : "- ",
+		   vpg->vpg_defer_uptodate ? "du" : "- ",
 		   PageWriteback(vmpage) ? "wb" : "-",
 		   vmpage, vmpage->mapping->host->i_ino,
 		   vmpage->mapping->host->i_generation,
diff --git a/drivers/staging/lustre/lustre/llite/vvp_internal.h b/drivers/staging/lustre/lustre/llite/vvp_internal.h
index 76e7b4c..4991ce7 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_internal.h
+++ b/drivers/staging/lustre/lustre/llite/vvp_internal.h
@@ -204,7 +204,7 @@ struct vvp_object {
 	 * A list of dirty pages pending IO in the cache. Used by
 	 * SOM. Protected by ll_inode_info::lli_lock.
 	 *
-	 * \see ccc_page::cpg_pending_linkage
+	 * \see vvp_page::vpg_pending_linkage
 	 */
 	struct list_head	vob_pending_list;
 
@@ -235,36 +235,34 @@ struct vvp_object {
 };
 
 /**
- * ccc-private page state.
+ * VVP-private page state.
  */
-struct ccc_page {
-	struct cl_page_slice cpg_cl;
-	int		  cpg_defer_uptodate;
-	int		  cpg_ra_used;
-	int		  cpg_write_queued;
+struct vvp_page {
+	struct cl_page_slice vpg_cl;
+	int		  vpg_defer_uptodate;
+	int		  vpg_ra_used;
+	int		  vpg_write_queued;
 	/**
 	 * Non-empty iff this page is already counted in
 	 * vvp_object::vob_pending_list. This list is only used as a flag,
 	 * that is, never iterated through, only checked for list_empty(), but
 	 * having a list is useful for debugging.
 	 */
-	struct list_head	   cpg_pending_linkage;
+	struct list_head	   vpg_pending_linkage;
 	/** VM page */
-	struct page	  *cpg_page;
+	struct page	  *vpg_page;
 };
 
-static inline struct ccc_page *cl2ccc_page(const struct cl_page_slice *slice)
+static inline struct vvp_page *cl2vvp_page(const struct cl_page_slice *slice)
 {
-	return container_of(slice, struct ccc_page, cpg_cl);
+	return container_of(slice, struct vvp_page, vpg_cl);
 }
 
-static inline pgoff_t ccc_index(struct ccc_page *ccc)
+static inline pgoff_t vvp_index(struct vvp_page *vvp)
 {
-	return ccc->cpg_cl.cpl_index;
+	return vvp->vpg_cl.cpl_index;
 }
 
-struct cl_page *ccc_vmpage_page_transient(struct page *vmpage);
-
 struct vvp_device {
 	struct cl_device    vdv_cl;
 	struct super_block *vdv_sb;
@@ -296,10 +294,6 @@ void ccc_global_fini(struct lu_device_type *device_type);
 int ccc_lock_init(const struct lu_env *env, struct cl_object *obj,
 		  struct cl_lock *lock, const struct cl_io *io,
 		  const struct cl_lock_operations *lkops);
-int ccc_fail(const struct lu_env *env, const struct cl_page_slice *slice);
-int ccc_transient_page_prep(const struct lu_env *env,
-			    const struct cl_page_slice *slice,
-			    struct cl_io *io);
 void ccc_lock_delete(const struct lu_env *env,
 		     const struct cl_lock_slice *slice);
 void ccc_lock_fini(const struct lu_env *env, struct cl_lock_slice *slice);
@@ -360,11 +354,15 @@ static inline struct inode *vvp_object_inode(const struct cl_object *obj)
 int vvp_object_invariant(const struct cl_object *obj);
 struct vvp_object *cl_inode2vvp(struct inode *inode);
 
+static inline struct page *cl2vm_page(const struct cl_page_slice *slice)
+{
+	return cl2vvp_page(slice)->vpg_page;
+}
+
 struct ccc_lock *cl2ccc_lock(const struct cl_lock_slice *slice);
 struct ccc_io *cl2ccc_io(const struct lu_env *env,
 			 const struct cl_io_slice *slice);
 struct ccc_req *cl2ccc_req(const struct cl_req_slice *slice);
-struct page *cl2vm_page(const struct cl_page_slice *slice);
 
 int cl_setattr_ost(struct inode *inode, const struct iattr *attr);
 
diff --git a/drivers/staging/lustre/lustre/llite/vvp_io.c b/drivers/staging/lustre/lustre/llite/vvp_io.c
index 1773cb2..eb6ce1c 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_io.c
+++ b/drivers/staging/lustre/lustre/llite/vvp_io.c
@@ -645,15 +645,15 @@ static int vvp_io_commit_sync(const struct lu_env *env, struct cl_io *io,
 static void write_commit_callback(const struct lu_env *env, struct cl_io *io,
 				  struct cl_page *page)
 {
-	struct ccc_page *cp;
+	struct vvp_page *vpg;
 	struct page *vmpage = page->cp_vmpage;
 	struct cl_object *clob = cl_io_top(io)->ci_obj;
 
 	SetPageUptodate(vmpage);
 	set_page_dirty(vmpage);
 
-	cp = cl2ccc_page(cl_object_page_slice(clob, page));
-	vvp_write_pending(cl2vvp(clob), cp);
+	vpg = cl2vvp_page(cl_object_page_slice(clob, page));
+	vvp_write_pending(cl2vvp(clob), vpg);
 
 	cl_page_disown(env, io, page);
 
@@ -670,15 +670,15 @@ static bool page_list_sanity_check(struct cl_object *obj,
 	pgoff_t index = CL_PAGE_EOF;
 
 	cl_page_list_for_each(page, plist) {
-		struct ccc_page *cp = cl_object_page_slice(obj, page);
+		struct vvp_page *vpg = cl_object_page_slice(obj, page);
 
 		if (index == CL_PAGE_EOF) {
-			index = ccc_index(cp);
+			index = vvp_index(vpg);
 			continue;
 		}
 
 		++index;
-		if (index == ccc_index(cp))
+		if (index == vvp_index(vpg))
 			continue;
 
 		return false;
@@ -868,13 +868,13 @@ static int vvp_io_kernel_fault(struct vvp_fault_io *cfio)
 static void mkwrite_commit_callback(const struct lu_env *env, struct cl_io *io,
 				    struct cl_page *page)
 {
-	struct ccc_page *cp;
+	struct vvp_page *vpg;
 	struct cl_object *clob = cl_io_top(io)->ci_obj;
 
 	set_page_dirty(page->cp_vmpage);
 
-	cp = cl2ccc_page(cl_object_page_slice(clob, page));
-	vvp_write_pending(cl2vvp(clob), cp);
+	vpg = cl2vvp_page(cl_object_page_slice(clob, page));
+	vvp_write_pending(cl2vvp(clob), vpg);
 }
 
 static int vvp_io_fault_start(const struct lu_env *env,
@@ -979,7 +979,7 @@ static int vvp_io_fault_start(const struct lu_env *env,
 		wait_on_page_writeback(vmpage);
 		if (!PageDirty(vmpage)) {
 			struct cl_page_list *plist = &io->ci_queue.c2_qin;
-			struct ccc_page *cp = cl_object_page_slice(obj, page);
+			struct vvp_page *vpg = cl_object_page_slice(obj, page);
 			int to = PAGE_SIZE;
 
 			/* vvp_page_assume() calls wait_on_page_writeback(). */
@@ -989,7 +989,7 @@ static int vvp_io_fault_start(const struct lu_env *env,
 			cl_page_list_add(plist, page);
 
 			/* size fixup */
-			if (last_index == ccc_index(cp))
+			if (last_index == vvp_index(vpg))
 				to = size & ~PAGE_MASK;
 
 			/* Do not set Dirty bit here so that in case IO is
@@ -1058,7 +1058,7 @@ static int vvp_io_read_page(const struct lu_env *env,
 			    const struct cl_page_slice *slice)
 {
 	struct cl_io	      *io     = ios->cis_io;
-	struct ccc_page	   *cp     = cl2ccc_page(slice);
+	struct vvp_page           *vpg    = cl2vvp_page(slice);
 	struct cl_page	    *page   = slice->cpl_page;
 	struct inode              *inode  = vvp_object_inode(slice->cpl_obj);
 	struct ll_sb_info	 *sbi    = ll_i2sbi(inode);
@@ -1068,11 +1068,11 @@ static int vvp_io_read_page(const struct lu_env *env,
 
 	if (sbi->ll_ra_info.ra_max_pages_per_file &&
 	    sbi->ll_ra_info.ra_max_pages)
-		ras_update(sbi, inode, ras, ccc_index(cp),
-			   cp->cpg_defer_uptodate);
+		ras_update(sbi, inode, ras, vvp_index(vpg),
+			   vpg->vpg_defer_uptodate);
 
-	if (cp->cpg_defer_uptodate) {
-		cp->cpg_ra_used = 1;
+	if (vpg->vpg_defer_uptodate) {
+		vpg->vpg_ra_used = 1;
 		cl_page_export(env, page, 1);
 	}
 	/*
@@ -1084,7 +1084,7 @@ static int vvp_io_read_page(const struct lu_env *env,
 	if (sbi->ll_ra_info.ra_max_pages_per_file &&
 	    sbi->ll_ra_info.ra_max_pages)
 		ll_readahead(env, io, &queue->c2_qin, ras,
-			     cp->cpg_defer_uptodate);
+			     vpg->vpg_defer_uptodate);
 
 	return 0;
 }
diff --git a/drivers/staging/lustre/lustre/llite/vvp_object.c b/drivers/staging/lustre/lustre/llite/vvp_object.c
index 9f5e6a6..18c9df7 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_object.c
+++ b/drivers/staging/lustre/lustre/llite/vvp_object.c
@@ -226,7 +226,7 @@ static int vvp_object_init0(const struct lu_env *env,
 {
 	vob->vob_inode = conf->coc_inode;
 	vob->vob_transient_pages = 0;
-	cl_object_page_init(&vob->vob_cl, sizeof(struct ccc_page));
+	cl_object_page_init(&vob->vob_cl, sizeof(struct vvp_page));
 	return 0;
 }
 
diff --git a/drivers/staging/lustre/lustre/llite/vvp_page.c b/drivers/staging/lustre/lustre/llite/vvp_page.c
index 419a535..5ebbe27 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_page.c
+++ b/drivers/staging/lustre/lustre/llite/vvp_page.c
@@ -41,7 +41,13 @@
 
 #define DEBUG_SUBSYSTEM S_LLITE
 
-#include "../include/obd.h"
+#include <linux/atomic.h>
+#include <linux/bitops.h>
+#include <linux/mm.h>
+#include <linux/mutex.h>
+#include <linux/page-flags.h>
+#include <linux/pagemap.h>
+
 #include "../include/lustre_lite.h"
 
 #include "llite_internal.h"
@@ -53,9 +59,9 @@
  *
  */
 
-static void vvp_page_fini_common(struct ccc_page *cp)
+static void vvp_page_fini_common(struct vvp_page *vpg)
 {
-	struct page *vmpage = cp->cpg_page;
+	struct page *vmpage = vpg->vpg_page;
 
 	LASSERT(vmpage);
 	page_cache_release(vmpage);
@@ -64,23 +70,23 @@ static void vvp_page_fini_common(struct ccc_page *cp)
 static void vvp_page_fini(const struct lu_env *env,
 			  struct cl_page_slice *slice)
 {
-	struct ccc_page *cp = cl2ccc_page(slice);
-	struct page *vmpage  = cp->cpg_page;
+	struct vvp_page *vpg     = cl2vvp_page(slice);
+	struct page     *vmpage  = vpg->vpg_page;
 
 	/*
 	 * vmpage->private was already cleared when page was moved into
 	 * VPG_FREEING state.
 	 */
 	LASSERT((struct cl_page *)vmpage->private != slice->cpl_page);
-	vvp_page_fini_common(cp);
+	vvp_page_fini_common(vpg);
 }
 
 static int vvp_page_own(const struct lu_env *env,
 			const struct cl_page_slice *slice, struct cl_io *io,
 			int nonblock)
 {
-	struct ccc_page *vpg    = cl2ccc_page(slice);
-	struct page      *vmpage = vpg->cpg_page;
+	struct vvp_page *vpg    = cl2vvp_page(slice);
+	struct page     *vmpage = vpg->vpg_page;
 
 	LASSERT(vmpage);
 	if (nonblock) {
@@ -97,6 +103,7 @@ static int vvp_page_own(const struct lu_env *env,
 
 	lock_page(vmpage);
 	wait_on_page_writeback(vmpage);
+
 	return 0;
 }
 
@@ -137,12 +144,12 @@ static void vvp_page_discard(const struct lu_env *env,
 			     struct cl_io *unused)
 {
 	struct page	   *vmpage  = cl2vm_page(slice);
-	struct ccc_page      *cpg     = cl2ccc_page(slice);
+	struct vvp_page      *vpg     = cl2vvp_page(slice);
 
 	LASSERT(vmpage);
 	LASSERT(PageLocked(vmpage));
 
-	if (cpg->cpg_defer_uptodate && !cpg->cpg_ra_used)
+	if (vpg->vpg_defer_uptodate && !vpg->vpg_ra_used)
 		ll_ra_stats_inc(vmpage->mapping->host, RA_STAT_DISCARDED);
 
 	ll_invalidate_page(vmpage);
@@ -161,7 +168,7 @@ static void vvp_page_delete(const struct lu_env *env,
 	LASSERT((struct cl_page *)vmpage->private == page);
 	LASSERT(inode == vvp_object_inode(obj));
 
-	vvp_write_complete(cl2vvp(obj), cl2ccc_page(slice));
+	vvp_write_complete(cl2vvp(obj), cl2vvp_page(slice));
 
 	/* Drop the reference count held in vvp_page_init */
 	refc = atomic_dec_return(&page->cp_ref);
@@ -220,7 +227,7 @@ static int vvp_page_prep_write(const struct lu_env *env,
 	if (!pg->cp_sync_io)
 		set_page_writeback(vmpage);
 
-	vvp_write_pending(cl2vvp(slice->cpl_obj), cl2ccc_page(slice));
+	vvp_write_pending(cl2vvp(slice->cpl_obj), cl2vvp_page(slice));
 
 	return 0;
 }
@@ -257,22 +264,23 @@ static void vvp_page_completion_read(const struct lu_env *env,
 				     const struct cl_page_slice *slice,
 				     int ioret)
 {
-	struct ccc_page *cp     = cl2ccc_page(slice);
-	struct page      *vmpage = cp->cpg_page;
+	struct vvp_page *vpg    = cl2vvp_page(slice);
+	struct page     *vmpage = vpg->vpg_page;
 	struct cl_page  *page   = slice->cpl_page;
 	struct inode    *inode  = vvp_object_inode(page->cp_obj);
 
 	LASSERT(PageLocked(vmpage));
 	CL_PAGE_HEADER(D_PAGE, env, page, "completing READ with %d\n", ioret);
 
-	if (cp->cpg_defer_uptodate)
+	if (vpg->vpg_defer_uptodate)
 		ll_ra_count_put(ll_i2sbi(inode), 1);
 
 	if (ioret == 0)  {
-		if (!cp->cpg_defer_uptodate)
+		if (!vpg->vpg_defer_uptodate)
 			cl_page_export(env, page, 1);
-	} else
-		cp->cpg_defer_uptodate = 0;
+	} else {
+		vpg->vpg_defer_uptodate = 0;
+	}
 
 	if (!page->cp_sync_io)
 		unlock_page(vmpage);
@@ -282,9 +290,9 @@ static void vvp_page_completion_write(const struct lu_env *env,
 				      const struct cl_page_slice *slice,
 				      int ioret)
 {
-	struct ccc_page *cp     = cl2ccc_page(slice);
+	struct vvp_page *vpg     = cl2vvp_page(slice);
 	struct cl_page  *pg     = slice->cpl_page;
-	struct page      *vmpage = cp->cpg_page;
+	struct page     *vmpage = vpg->vpg_page;
 
 	CL_PAGE_HEADER(D_PAGE, env, pg, "completing WRITE with %d\n", ioret);
 
@@ -298,8 +306,8 @@ static void vvp_page_completion_write(const struct lu_env *env,
 	 * and then re-add the page into pending transfer queue.  -jay
 	 */
 
-	cp->cpg_write_queued = 0;
-	vvp_write_complete(cl2vvp(slice->cpl_obj), cp);
+	vpg->vpg_write_queued = 0;
+	vvp_write_complete(cl2vvp(slice->cpl_obj), vpg);
 
 	if (pg->cp_sync_io) {
 		LASSERT(PageLocked(vmpage));
@@ -342,7 +350,7 @@ static int vvp_page_make_ready(const struct lu_env *env,
 		LASSERT(pg->cp_state == CPS_CACHED);
 		/* This actually clears the dirty bit in the radix tree. */
 		set_page_writeback(vmpage);
-		vvp_write_pending(cl2vvp(slice->cpl_obj), cl2ccc_page(slice));
+		vvp_write_pending(cl2vvp(slice->cpl_obj), cl2vvp_page(slice));
 		CL_PAGE_HEADER(D_PAGE, env, pg, "readied\n");
 	} else if (pg->cp_state == CPS_PAGEOUT) {
 		/* is it possible for osc_flush_async_page() to already
@@ -376,12 +384,12 @@ static int vvp_page_print(const struct lu_env *env,
 			  const struct cl_page_slice *slice,
 			  void *cookie, lu_printer_t printer)
 {
-	struct ccc_page *vp = cl2ccc_page(slice);
-	struct page      *vmpage = vp->cpg_page;
+	struct vvp_page *vpg = cl2vvp_page(slice);
+	struct page     *vmpage = vpg->vpg_page;
 
 	(*printer)(env, cookie, LUSTRE_VVP_NAME "-page@%p(%d:%d:%d) vm@%p ",
-		   vp, vp->cpg_defer_uptodate, vp->cpg_ra_used,
-		   vp->cpg_write_queued, vmpage);
+		   vpg, vpg->vpg_defer_uptodate, vpg->vpg_ra_used,
+		   vpg->vpg_write_queued, vmpage);
 	if (vmpage) {
 		(*printer)(env, cookie, "%lx %d:%d %lx %lu %slru",
 			   (long)vmpage->flags, page_count(vmpage),
@@ -389,7 +397,20 @@ static int vvp_page_print(const struct lu_env *env,
 			   page_index(vmpage),
 			   list_empty(&vmpage->lru) ? "not-" : "");
 	}
+
 	(*printer)(env, cookie, "\n");
+
+	return 0;
+}
+
+static int vvp_page_fail(const struct lu_env *env,
+			 const struct cl_page_slice *slice)
+{
+	/*
+	 * Cached read?
+	 */
+	LBUG();
+
 	return 0;
 }
 
@@ -409,16 +430,24 @@ static const struct cl_page_operations vvp_page_ops = {
 		[CRT_READ] = {
 			.cpo_prep	= vvp_page_prep_read,
 			.cpo_completion  = vvp_page_completion_read,
-			.cpo_make_ready  = ccc_fail,
+			.cpo_make_ready = vvp_page_fail,
 		},
 		[CRT_WRITE] = {
 			.cpo_prep	= vvp_page_prep_write,
 			.cpo_completion  = vvp_page_completion_write,
 			.cpo_make_ready  = vvp_page_make_ready,
-		}
-	}
+		},
+	},
 };
 
+static int vvp_transient_page_prep(const struct lu_env *env,
+				   const struct cl_page_slice *slice,
+				   struct cl_io *unused)
+{
+	/* transient page should always be sent. */
+	return 0;
+}
+
 static void vvp_transient_page_verify(const struct cl_page *page)
 {
 	struct inode *inode = vvp_object_inode(page->cp_obj);
@@ -492,11 +521,11 @@ vvp_transient_page_completion(const struct lu_env *env,
 static void vvp_transient_page_fini(const struct lu_env *env,
 				    struct cl_page_slice *slice)
 {
-	struct ccc_page *cp = cl2ccc_page(slice);
+	struct vvp_page *vpg = cl2vvp_page(slice);
 	struct cl_page *clp = slice->cpl_page;
 	struct vvp_object *clobj = cl2vvp(clp->cp_obj);
 
-	vvp_page_fini_common(cp);
+	vvp_page_fini_common(vpg);
 	LASSERT(!inode_trylock(clobj->vob_inode));
 	clobj->vob_transient_pages--;
 }
@@ -513,11 +542,11 @@ static const struct cl_page_operations vvp_transient_page_ops = {
 	.cpo_is_under_lock	= vvp_page_is_under_lock,
 	.io = {
 		[CRT_READ] = {
-			.cpo_prep	= ccc_transient_page_prep,
+			.cpo_prep	= vvp_transient_page_prep,
 			.cpo_completion  = vvp_transient_page_completion,
 		},
 		[CRT_WRITE] = {
-			.cpo_prep	= ccc_transient_page_prep,
+			.cpo_prep	= vvp_transient_page_prep,
 			.cpo_completion  = vvp_transient_page_completion,
 		}
 	}
@@ -526,27 +555,27 @@ static const struct cl_page_operations vvp_transient_page_ops = {
 int vvp_page_init(const struct lu_env *env, struct cl_object *obj,
 		struct cl_page *page, pgoff_t index)
 {
-	struct ccc_page *cpg = cl_object_page_slice(obj, page);
+	struct vvp_page *vpg = cl_object_page_slice(obj, page);
 	struct page     *vmpage = page->cp_vmpage;
 
 	CLOBINVRNT(env, obj, vvp_object_invariant(obj));
 
-	cpg->cpg_page = vmpage;
+	vpg->vpg_page = vmpage;
 	page_cache_get(vmpage);
 
-	INIT_LIST_HEAD(&cpg->cpg_pending_linkage);
+	INIT_LIST_HEAD(&vpg->vpg_pending_linkage);
 	if (page->cp_type == CPT_CACHEABLE) {
 		/* in cache, decref in vvp_page_delete */
 		atomic_inc(&page->cp_ref);
 		SetPagePrivate(vmpage);
 		vmpage->private = (unsigned long)page;
-		cl_page_slice_add(page, &cpg->cpg_cl, obj, index,
+		cl_page_slice_add(page, &vpg->vpg_cl, obj, index,
 				  &vvp_page_ops);
 	} else {
 		struct vvp_object *clobj = cl2vvp(obj);
 
 		LASSERT(!inode_trylock(clobj->vob_inode));
-		cl_page_slice_add(page, &cpg->cpg_cl, obj, index,
+		cl_page_slice_add(page, &vpg->vpg_cl, obj, index,
 				  &vvp_transient_page_ops);
 		clobj->vob_transient_pages++;
 	}
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v2 28/46] staging/lustre/llite: rename ccc_lock to vvp_lock
  2016-03-30 23:48 [PATCH v2 00/46] Lustre IO stack simplifications and cleanups green
                   ` (26 preceding siblings ...)
  2016-03-30 23:48 ` [PATCH v2 27/46] staging/lustre/llite: rename ccc_page to vvp_page green
@ 2016-03-30 23:48 ` green
  2016-03-30 23:48 ` [PATCH v2 29/46] staging/lustre:llite: remove struct ll_ra_read green
                   ` (17 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: green @ 2016-03-30 23:48 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger
  Cc: Linux Kernel Mailing List, Lustre Development List,
	John L. Hammond, Oleg Drokin

From: "John L. Hammond" <john.hammond@intel.com>

Rename struct ccc_lock to struct vvp_lock and merge the CCC lock
methods into the VVP lock methods.
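
In short, the slice allocation that ccc_lock_init() used to perform on
behalf of the VVP layer now happens directly in vvp_lock_init(). A
condensed sketch of the result (taken from the vvp_lock.c hunk below,
with the CLOBINVRNT invariant check omitted):

	static const struct cl_lock_operations vvp_lock_ops = {
		.clo_fini	= vvp_lock_fini,
		.clo_enqueue	= vvp_lock_enqueue,
	};

	int vvp_lock_init(const struct lu_env *env, struct cl_object *obj,
			  struct cl_lock *lock, const struct cl_io *unused)
	{
		/* the slice is allocated here now, not in ccc_lock_init() */
		struct vvp_lock *vlk = kmem_cache_zalloc(vvp_lock_kmem,
							 GFP_NOFS);

		if (!vlk)
			return -ENOMEM;
		cl_lock_slice_add(lock, &vlk->vlk_cl, obj, &vvp_lock_ops);
		return 0;
	}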

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Reviewed-on: http://review.whamcloud.com/13088
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5971
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
---
 drivers/staging/lustre/lustre/include/cl_object.h  |  6 +--
 drivers/staging/lustre/lustre/llite/lcommon_cl.c   | 52 ----------------------
 drivers/staging/lustre/lustre/llite/vvp_dev.c      |  6 +++
 drivers/staging/lustre/lustre/llite/vvp_internal.h | 20 ++++-----
 drivers/staging/lustre/lustre/llite/vvp_lock.c     | 38 +++++++++++++---
 5 files changed, 50 insertions(+), 72 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/cl_object.h b/drivers/staging/lustre/lustre/include/cl_object.h
index 3488349..d509f94 100644
--- a/drivers/staging/lustre/lustre/include/cl_object.h
+++ b/drivers/staging/lustre/lustre/include/cl_object.h
@@ -1227,7 +1227,7 @@ struct cl_lock {
 /**
  * Per-layer part of cl_lock
  *
- * \see ccc_lock, lov_lock, lovsub_lock, osc_lock
+ * \see vvp_lock, lov_lock, lovsub_lock, osc_lock
  */
 struct cl_lock_slice {
 	struct cl_lock		  *cls_lock;
@@ -1254,7 +1254,7 @@ struct cl_lock_operations {
 	 *		@anchor for resources
 	 * \retval -ve	failure
 	 *
-	 * \see ccc_lock_enqueue(), lov_lock_enqueue(), lovsub_lock_enqueue(),
+	 * \see vvp_lock_enqueue(), lov_lock_enqueue(), lovsub_lock_enqueue(),
 	 * \see osc_lock_enqueue()
 	 */
 	int  (*clo_enqueue)(const struct lu_env *env,
@@ -1270,7 +1270,7 @@ struct cl_lock_operations {
 	/**
 	 * Destructor. Frees resources and the slice.
 	 *
-	 * \see ccc_lock_fini(), lov_lock_fini(), lovsub_lock_fini(),
+	 * \see vvp_lock_fini(), lov_lock_fini(), lovsub_lock_fini(),
 	 * \see osc_lock_fini()
 	 */
 	void (*clo_fini)(const struct lu_env *env, struct cl_lock_slice *slice);
diff --git a/drivers/staging/lustre/lustre/llite/lcommon_cl.c b/drivers/staging/lustre/lustre/llite/lcommon_cl.c
index 0762032..8884317 100644
--- a/drivers/staging/lustre/lustre/llite/lcommon_cl.c
+++ b/drivers/staging/lustre/lustre/llite/lcommon_cl.c
@@ -67,18 +67,12 @@ static const struct cl_req_operations ccc_req_ops;
  * ccc_ prefix stands for "Common Client Code".
  */
 
-static struct kmem_cache *ccc_lock_kmem;
 static struct kmem_cache *ccc_thread_kmem;
 static struct kmem_cache *ccc_session_kmem;
 static struct kmem_cache *ccc_req_kmem;
 
 static struct lu_kmem_descr ccc_caches[] = {
 	{
-		.ckd_cache = &ccc_lock_kmem,
-		.ckd_name  = "ccc_lock_kmem",
-		.ckd_size  = sizeof(struct ccc_lock)
-	},
-	{
 		.ckd_cache = &ccc_thread_kmem,
 		.ckd_name  = "ccc_thread_kmem",
 		.ckd_size  = sizeof(struct ccc_thread_info),
@@ -221,26 +215,6 @@ void ccc_global_fini(struct lu_device_type *device_type)
 	lu_kmem_fini(ccc_caches);
 }
 
-int ccc_lock_init(const struct lu_env *env,
-		  struct cl_object *obj, struct cl_lock *lock,
-		  const struct cl_io *unused,
-		  const struct cl_lock_operations *lkops)
-{
-	struct ccc_lock *clk;
-	int result;
-
-	CLOBINVRNT(env, obj, vvp_object_invariant(obj));
-
-	clk = kmem_cache_zalloc(ccc_lock_kmem, GFP_NOFS);
-	if (clk) {
-		cl_lock_slice_add(lock, &clk->clk_cl, obj, lkops);
-		result = 0;
-	} else {
-		result = -ENOMEM;
-	}
-	return result;
-}
-
 static void vvp_object_size_lock(struct cl_object *obj)
 {
 	struct inode *inode = vvp_object_inode(obj);
@@ -259,27 +233,6 @@ static void vvp_object_size_unlock(struct cl_object *obj)
 
 /*****************************************************************************
  *
- * Lock operations.
- *
- */
-
-void ccc_lock_fini(const struct lu_env *env, struct cl_lock_slice *slice)
-{
-	struct ccc_lock *clk = cl2ccc_lock(slice);
-
-	kmem_cache_free(ccc_lock_kmem, clk);
-}
-
-int ccc_lock_enqueue(const struct lu_env *env,
-		     const struct cl_lock_slice *slice,
-		     struct cl_io *unused, struct cl_sync_io *anchor)
-{
-	CLOBINVRNT(env, slice->cls_obj, vvp_object_invariant(slice->cls_obj));
-	return 0;
-}
-
-/*****************************************************************************
- *
  * io operations.
  *
  */
@@ -571,11 +524,6 @@ again:
  *
  */
 
-struct ccc_lock *cl2ccc_lock(const struct cl_lock_slice *slice)
-{
-	return container_of(slice, struct ccc_lock, clk_cl);
-}
-
 struct ccc_io *cl2ccc_io(const struct lu_env *env,
 			 const struct cl_io_slice *slice)
 {
diff --git a/drivers/staging/lustre/lustre/llite/vvp_dev.c b/drivers/staging/lustre/lustre/llite/vvp_dev.c
index 3891b0d..4b77db3 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_dev.c
+++ b/drivers/staging/lustre/lustre/llite/vvp_dev.c
@@ -57,11 +57,17 @@
  * "llite_" (var. "ll_") prefix.
  */
 
+struct kmem_cache *vvp_lock_kmem;
 struct kmem_cache *vvp_object_kmem;
 static struct kmem_cache *vvp_thread_kmem;
 static struct kmem_cache *vvp_session_kmem;
 static struct lu_kmem_descr vvp_caches[] = {
 	{
+		.ckd_cache = &vvp_lock_kmem,
+		.ckd_name  = "vvp_lock_kmem",
+		.ckd_size  = sizeof(struct vvp_lock),
+	},
+	{
 		.ckd_cache = &vvp_object_kmem,
 		.ckd_name  = "vvp_object_kmem",
 		.ckd_size  = sizeof(struct vvp_object),
diff --git a/drivers/staging/lustre/lustre/llite/vvp_internal.h b/drivers/staging/lustre/lustre/llite/vvp_internal.h
index 4991ce7..403d2a7 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_internal.h
+++ b/drivers/staging/lustre/lustre/llite/vvp_internal.h
@@ -128,6 +128,7 @@ int cl_is_normalio(const struct lu_env *env, const struct cl_io *io);
 extern struct lu_context_key ccc_key;
 extern struct lu_context_key ccc_session_key;
 
+extern struct kmem_cache *vvp_lock_kmem;
 extern struct kmem_cache *vvp_object_kmem;
 
 struct ccc_thread_info {
@@ -269,8 +270,8 @@ struct vvp_device {
 	struct cl_device   *vdv_next;
 };
 
-struct ccc_lock {
-	struct cl_lock_slice clk_cl;
+struct vvp_lock {
+	struct cl_lock_slice vlk_cl;
 };
 
 struct ccc_req {
@@ -291,15 +292,6 @@ int ccc_req_init(const struct lu_env *env, struct cl_device *dev,
 void ccc_umount(const struct lu_env *env, struct cl_device *dev);
 int ccc_global_init(struct lu_device_type *device_type);
 void ccc_global_fini(struct lu_device_type *device_type);
-int ccc_lock_init(const struct lu_env *env, struct cl_object *obj,
-		  struct cl_lock *lock, const struct cl_io *io,
-		  const struct cl_lock_operations *lkops);
-void ccc_lock_delete(const struct lu_env *env,
-		     const struct cl_lock_slice *slice);
-void ccc_lock_fini(const struct lu_env *env, struct cl_lock_slice *slice);
-int ccc_lock_enqueue(const struct lu_env *env,
-		     const struct cl_lock_slice *slice,
-		     struct cl_io *io, struct cl_sync_io *anchor);
 
 int ccc_io_one_lock_index(const struct lu_env *env, struct cl_io *io,
 			  __u32 enqflags, enum cl_lock_mode mode,
@@ -359,7 +351,11 @@ static inline struct page *cl2vm_page(const struct cl_page_slice *slice)
 	return cl2vvp_page(slice)->vpg_page;
 }
 
-struct ccc_lock *cl2ccc_lock(const struct cl_lock_slice *slice);
+static inline struct vvp_lock *cl2vvp_lock(const struct cl_lock_slice *slice)
+{
+	return container_of(slice, struct vvp_lock, vlk_cl);
+}
+
 struct ccc_io *cl2ccc_io(const struct lu_env *env,
 			 const struct cl_io_slice *slice);
 struct ccc_req *cl2ccc_req(const struct cl_req_slice *slice);
diff --git a/drivers/staging/lustre/lustre/llite/vvp_lock.c b/drivers/staging/lustre/lustre/llite/vvp_lock.c
index 8c505a6..f5bd6c2 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_lock.c
+++ b/drivers/staging/lustre/lustre/llite/vvp_lock.c
@@ -40,7 +40,7 @@
 
 #define DEBUG_SUBSYSTEM S_LLITE
 
-#include "../include/obd.h"
+#include "../include/obd_support.h"
 #include "../include/lustre_lite.h"
 
 #include "vvp_internal.h"
@@ -51,13 +51,41 @@
  *
  */
 
+static void vvp_lock_fini(const struct lu_env *env, struct cl_lock_slice *slice)
+{
+	struct vvp_lock *vlk = cl2vvp_lock(slice);
+
+	kmem_cache_free(vvp_lock_kmem, vlk);
+}
+
+static int vvp_lock_enqueue(const struct lu_env *env,
+			    const struct cl_lock_slice *slice,
+			    struct cl_io *unused, struct cl_sync_io *anchor)
+{
+	CLOBINVRNT(env, slice->cls_obj, vvp_object_invariant(slice->cls_obj));
+
+	return 0;
+}
+
 static const struct cl_lock_operations vvp_lock_ops = {
-	.clo_fini      = ccc_lock_fini,
-	.clo_enqueue   = ccc_lock_enqueue
+	.clo_fini	= vvp_lock_fini,
+	.clo_enqueue	= vvp_lock_enqueue,
 };
 
 int vvp_lock_init(const struct lu_env *env, struct cl_object *obj,
-		  struct cl_lock *lock, const struct cl_io *io)
+		  struct cl_lock *lock, const struct cl_io *unused)
 {
-	return ccc_lock_init(env, obj, lock, io, &vvp_lock_ops);
+	struct vvp_lock *vlk;
+	int result;
+
+	CLOBINVRNT(env, obj, vvp_object_invariant(obj));
+
+	vlk = kmem_cache_zalloc(vvp_lock_kmem, GFP_NOFS);
+	if (vlk) {
+		cl_lock_slice_add(lock, &vlk->vlk_cl, obj, &vvp_lock_ops);
+		result = 0;
+	} else {
+		result = -ENOMEM;
+	}
+	return result;
 }
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v2 29/46] staging/lustre:llite: remove struct ll_ra_read
  2016-03-30 23:48 [PATCH v2 00/46] Lustre IO stack simplifications and cleanups green
                   ` (27 preceding siblings ...)
  2016-03-30 23:48 ` [PATCH v2 28/46] staging/lustre/llite: rename ccc_lock to vvp_lock green
@ 2016-03-30 23:48 ` green
  2016-03-30 23:48 ` [PATCH v2 30/46] staging/lustre/llite: merge ccc_io and vvp_io green
                   ` (16 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: green @ 2016-03-30 23:48 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger
  Cc: Linux Kernel Mailing List, Lustre Development List,
	John L. Hammond, Oleg Drokin

From: "John L. Hammond" <john.hammond@intel.com>

Ever since the removal of the unused function ll_ra_read_get(), the
struct ll_ra_read members lrr_reader and lrr_linkage and the struct
ll_readahead_state member ras_read_beads have been unnecessary, so
remove them. In struct vvp_io, replace the struct ll_ra_read member
cui_bead with cui_ra_start and cui_ra_count.
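
After the change, the per-syscall read-ahead window lives in plain
fields of struct vvp_io rather than in a linked bead; condensed from
the vvp_io_read_start() hunk below:

	/* initialize read-ahead window once per syscall */
	if (!vio->cui_ra_valid) {
		vio->cui_ra_valid = true;
		vio->cui_ra_start = cl_index(obj, pos);
		vio->cui_ra_count = cl_index(obj, tot + PAGE_CACHE_SIZE - 1);
		ll_ras_enter(file);
	}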

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Reviewed-on: http://review.whamcloud.com/13347
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5971
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
---
 .../staging/lustre/lustre/llite/llite_internal.h   | 30 +++---------
 drivers/staging/lustre/lustre/llite/rw.c           | 53 ++++++----------------
 drivers/staging/lustre/lustre/llite/vvp_io.c       | 31 ++++---------
 3 files changed, 28 insertions(+), 86 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/llite_internal.h b/drivers/staging/lustre/lustre/llite/llite_internal.h
index 78b3edd..9dd4325 100644
--- a/drivers/staging/lustre/lustre/llite/llite_internal.h
+++ b/drivers/staging/lustre/lustre/llite/llite_internal.h
@@ -528,13 +528,6 @@ struct ll_sb_info {
 	struct completion	 ll_kobj_unregister;
 };
 
-struct ll_ra_read {
-	pgoff_t	     lrr_start;
-	pgoff_t	     lrr_count;
-	struct task_struct *lrr_reader;
-	struct list_head	  lrr_linkage;
-};
-
 /*
  * per file-descriptor read-ahead data.
  */
@@ -593,12 +586,6 @@ struct ll_readahead_state {
 	 */
 	unsigned long   ras_request_index;
 	/*
-	 * list of struct ll_ra_read's one per read(2) call current in
-	 * progress against this file descriptor. Used by read-ahead code,
-	 * protected by ->ras_lock.
-	 */
-	struct list_head      ras_read_beads;
-	/*
 	 * The following 3 items are used for detecting the stride I/O
 	 * mode.
 	 * In stride I/O mode,
@@ -666,8 +653,7 @@ static inline int ll_need_32bit_api(struct ll_sb_info *sbi)
 #endif
 }
 
-void ll_ra_read_in(struct file *f, struct ll_ra_read *rar);
-void ll_ra_read_ex(struct file *f, struct ll_ra_read *rar);
+void ll_ras_enter(struct file *f);
 
 /* llite/lproc_llite.c */
 int ldebugfs_register_mountpoint(struct dentry *parent,
@@ -876,14 +862,12 @@ struct vvp_io {
 			} fault;
 		} fault;
 	} u;
-	/**
-	 * Read-ahead state used by read and page-fault IO contexts.
-	 */
-	struct ll_ra_read    cui_bead;
-	/**
-	 * Set when cui_bead has been initialized.
-	 */
-	int		  cui_ra_window_set;
+
+	/* Readahead state. */
+	pgoff_t	cui_ra_start;
+	pgoff_t cui_ra_count;
+	/* Set when cui_ra_{start,count} have been initialized. */
+	bool	cui_ra_valid;
 };
 
 /**
diff --git a/drivers/staging/lustre/lustre/llite/rw.c b/drivers/staging/lustre/lustre/llite/rw.c
index 2c4d4c4..f06d8be 100644
--- a/drivers/staging/lustre/lustre/llite/rw.c
+++ b/drivers/staging/lustre/lustre/llite/rw.c
@@ -258,38 +258,15 @@ static int index_in_window(unsigned long index, unsigned long point,
 	return start <= index && index <= end;
 }
 
-static struct ll_readahead_state *ll_ras_get(struct file *f)
+void ll_ras_enter(struct file *f)
 {
-	struct ll_file_data       *fd;
-
-	fd = LUSTRE_FPRIVATE(f);
-	return &fd->fd_ras;
-}
-
-void ll_ra_read_in(struct file *f, struct ll_ra_read *rar)
-{
-	struct ll_readahead_state *ras;
-
-	ras = ll_ras_get(f);
+	struct ll_file_data *fd = LUSTRE_FPRIVATE(f);
+	struct ll_readahead_state *ras = &fd->fd_ras;
 
 	spin_lock(&ras->ras_lock);
 	ras->ras_requests++;
 	ras->ras_request_index = 0;
 	ras->ras_consecutive_requests++;
-	rar->lrr_reader = current;
-
-	list_add(&rar->lrr_linkage, &ras->ras_read_beads);
-	spin_unlock(&ras->ras_lock);
-}
-
-void ll_ra_read_ex(struct file *f, struct ll_ra_read *rar)
-{
-	struct ll_readahead_state *ras;
-
-	ras = ll_ras_get(f);
-
-	spin_lock(&ras->ras_lock);
-	list_del_init(&rar->lrr_linkage);
 	spin_unlock(&ras->ras_lock);
 }
 
@@ -551,7 +528,6 @@ int ll_readahead(const struct lu_env *env, struct cl_io *io,
 	unsigned long start = 0, end = 0, reserved;
 	unsigned long ra_end, len, mlen = 0;
 	struct inode *inode;
-	struct ll_ra_read *bead;
 	struct ra_io_arg *ria = &vti->vti_ria;
 	struct cl_object *clob;
 	int ret = 0;
@@ -575,17 +551,15 @@ int ll_readahead(const struct lu_env *env, struct cl_io *io,
 	}
 
 	spin_lock(&ras->ras_lock);
-	if (vio->cui_ra_window_set)
-		bead = &vio->cui_bead;
-	else
-		bead = NULL;
 
 	/* Enlarge the RA window to encompass the full read */
-	if (bead && ras->ras_window_start + ras->ras_window_len <
-	    bead->lrr_start + bead->lrr_count) {
-		ras->ras_window_len = bead->lrr_start + bead->lrr_count -
+	if (vio->cui_ra_valid &&
+	    ras->ras_window_start + ras->ras_window_len <
+	    vio->cui_ra_start + vio->cui_ra_count) {
+		ras->ras_window_len = vio->cui_ra_start + vio->cui_ra_count -
 				      ras->ras_window_start;
 	}
+
 	/* Reserve a part of the read-ahead window that we'll be issuing */
 	if (ras->ras_window_len) {
 		start = ras->ras_next_readahead;
@@ -641,15 +615,15 @@ int ll_readahead(const struct lu_env *env, struct cl_io *io,
 	CDEBUG(D_READA, DFID ": ria: %lu/%lu, bead: %lu/%lu, hit: %d\n",
 	       PFID(lu_object_fid(&clob->co_lu)),
 	       ria->ria_start, ria->ria_end,
-	       !bead ? 0 : bead->lrr_start,
-	       !bead ? 0 : bead->lrr_count,
+	       vio->cui_ra_valid ? vio->cui_ra_start : 0,
+	       vio->cui_ra_valid ? vio->cui_ra_count : 0,
 	       hit);
 
 	/* at least to extend the readahead window to cover current read */
-	if (!hit && bead &&
-	    bead->lrr_start + bead->lrr_count > ria->ria_start) {
+	if (!hit && vio->cui_ra_valid &&
+	    vio->cui_ra_start + vio->cui_ra_count > ria->ria_start) {
 		/* to the end of current read window. */
-		mlen = bead->lrr_start + bead->lrr_count - ria->ria_start;
+		mlen = vio->cui_ra_start + vio->cui_ra_count - ria->ria_start;
 		/* trim to RPC boundary */
 		start = ria->ria_start & (PTLRPC_MAX_BRW_PAGES - 1);
 		mlen = min(mlen, PTLRPC_MAX_BRW_PAGES - start);
@@ -730,7 +704,6 @@ void ll_readahead_init(struct inode *inode, struct ll_readahead_state *ras)
 	spin_lock_init(&ras->ras_lock);
 	ras_reset(inode, ras, 0);
 	ras->ras_requests = 0;
-	INIT_LIST_HEAD(&ras->ras_read_beads);
 }
 
 /*
diff --git a/drivers/staging/lustre/lustre/llite/vvp_io.c b/drivers/staging/lustre/lustre/llite/vvp_io.c
index eb6ce1c..86e73d5 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_io.c
+++ b/drivers/staging/lustre/lustre/llite/vvp_io.c
@@ -500,7 +500,6 @@ static int vvp_io_read_start(const struct lu_env *env,
 	struct cl_io      *io    = ios->cis_io;
 	struct cl_object  *obj   = io->ci_obj;
 	struct inode      *inode = vvp_object_inode(obj);
-	struct ll_ra_read *bead  = &vio->cui_bead;
 	struct file       *file  = cio->cui_fd->fd_file;
 
 	int     result;
@@ -530,14 +529,11 @@ static int vvp_io_read_start(const struct lu_env *env,
 	cio->cui_fd->fd_file->f_ra.ra_pages = 0;
 
 	/* initialize read-ahead window once per syscall */
-	if (!vio->cui_ra_window_set) {
-		vio->cui_ra_window_set = 1;
-		bead->lrr_start = cl_index(obj, pos);
-		/*
-		 * XXX: explicit PAGE_CACHE_SIZE
-		 */
-		bead->lrr_count = cl_index(obj, tot + PAGE_CACHE_SIZE - 1);
-		ll_ra_read_in(file, bead);
+	if (!vio->cui_ra_valid) {
+		vio->cui_ra_valid = true;
+		vio->cui_ra_start = cl_index(obj, pos);
+		vio->cui_ra_count = cl_index(obj, tot + PAGE_CACHE_SIZE - 1);
+		ll_ras_enter(file);
 	}
 
 	/* BUG: 5972 */
@@ -574,17 +570,6 @@ out:
 	return result;
 }
 
-static void vvp_io_read_fini(const struct lu_env *env, const struct cl_io_slice *ios)
-{
-	struct vvp_io *vio = cl2vvp_io(env, ios);
-	struct ccc_io *cio = cl2ccc_io(env, ios);
-
-	if (vio->cui_ra_window_set)
-		ll_ra_read_ex(cio->cui_fd->fd_file, &vio->cui_bead);
-
-	vvp_io_fini(env, ios);
-}
-
 static int vvp_io_commit_sync(const struct lu_env *env, struct cl_io *io,
 			      struct cl_page_list *plist, int from, int to)
 {
@@ -1092,10 +1077,10 @@ static int vvp_io_read_page(const struct lu_env *env,
 static const struct cl_io_operations vvp_io_ops = {
 	.op = {
 		[CIT_READ] = {
-			.cio_fini      = vvp_io_read_fini,
+			.cio_fini	= vvp_io_fini,
 			.cio_lock      = vvp_io_read_lock,
 			.cio_start     = vvp_io_read_start,
-			.cio_advance   = ccc_io_advance
+			.cio_advance	= ccc_io_advance,
 		},
 		[CIT_WRITE] = {
 			.cio_fini      = vvp_io_fini,
@@ -1148,7 +1133,7 @@ int vvp_io_init(const struct lu_env *env, struct cl_object *obj,
 
 	CL_IO_SLICE_CLEAN(cio, cui_cl);
 	cl_io_slice_add(io, &cio->cui_cl, obj, &vvp_io_ops);
-	vio->cui_ra_window_set = 0;
+	vio->cui_ra_valid = false;
 	result = 0;
 	if (io->ci_type == CIT_READ || io->ci_type == CIT_WRITE) {
 		size_t count;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v2 30/46] staging/lustre/llite: merge ccc_io and vvp_io
  2016-03-30 23:48 [PATCH v2 00/46] Lustre IO stack simplifications and cleanups green
                   ` (28 preceding siblings ...)
  2016-03-30 23:48 ` [PATCH v2 29/46] staging/lustre:llite: remove struct ll_ra_read green
@ 2016-03-30 23:48 ` green
  2016-03-30 23:48 ` [PATCH v2 31/46] staging/lustre/llite: use vui prefix for struct vvp_io members green
                   ` (15 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: green @ 2016-03-30 23:48 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger
  Cc: Linux Kernel Mailing List, Lustre Development List,
	John L. Hammond, Oleg Drokin

From: "John L. Hammond" <john.hammond@intel.com>

Move the contents of struct vvp_io into struct ccc_io, delete the
former, and rename the latter to struct vvp_io. Rename various
ccc_io-related functions to use vvp rather than ccc.
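
The net effect for callers is that the single remaining per-thread IO
slice is reached through vvp_env_io(); a condensed sketch from the
hunks below:

	static inline struct vvp_io *vvp_env_io(const struct lu_env *env)
	{
		return &vvp_env_session(env)->cs_ios;
	}

	/* callers that used to fetch both slices now need only one */
	struct vvp_io *cio = vvp_env_io(env);

	cio->cui_fd = LUSTRE_FPRIVATE(file);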

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Reviewed-on: http://review.whamcloud.com/13351
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5971
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
---
 drivers/staging/lustre/lustre/include/cl_object.h  |   2 +-
 drivers/staging/lustre/lustre/llite/file.c         |  13 ++-
 drivers/staging/lustre/lustre/llite/lcommon_cl.c   |  60 +++---------
 .../staging/lustre/lustre/llite/llite_internal.h   |  72 ---------------
 drivers/staging/lustre/lustre/llite/llite_mmap.c   |  12 +--
 drivers/staging/lustre/lustre/llite/rw.c           |   4 +-
 drivers/staging/lustre/lustre/llite/rw26.c         |  10 +-
 drivers/staging/lustre/lustre/llite/vvp_dev.c      |   2 +-
 drivers/staging/lustre/lustre/llite/vvp_internal.h |  81 +++++++++++++----
 drivers/staging/lustre/lustre/llite/vvp_io.c       | 101 ++++++++++-----------
 drivers/staging/lustre/lustre/llite/vvp_page.c     |   2 +-
 11 files changed, 144 insertions(+), 215 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/cl_object.h b/drivers/staging/lustre/lustre/include/cl_object.h
index d509f94..104dc9f 100644
--- a/drivers/staging/lustre/lustre/include/cl_object.h
+++ b/drivers/staging/lustre/lustre/include/cl_object.h
@@ -1461,7 +1461,7 @@ enum cl_io_state {
  * This is usually embedded into layer session data, rather than allocated
  * dynamically.
  *
- * \see vvp_io, lov_io, osc_io, ccc_io
+ * \see vvp_io, lov_io, osc_io
  */
 struct cl_io_slice {
 	struct cl_io		  *cis_io;
diff --git a/drivers/staging/lustre/lustre/llite/file.c b/drivers/staging/lustre/lustre/llite/file.c
index bf2a5ee..27e7e65 100644
--- a/drivers/staging/lustre/lustre/llite/file.c
+++ b/drivers/staging/lustre/lustre/llite/file.c
@@ -1135,14 +1135,13 @@ restart:
 	ll_io_init(io, file, iot == CIT_WRITE);
 
 	if (cl_io_rw_init(env, io, iot, *ppos, count) == 0) {
-		struct vvp_io *vio = vvp_env_io(env);
-		struct ccc_io *cio = ccc_env_io(env);
+		struct vvp_io *cio = vvp_env_io(env);
 		int write_mutex_locked = 0;
 
 		cio->cui_fd  = LUSTRE_FPRIVATE(file);
-		vio->cui_io_subtype = args->via_io_subtype;
+		cio->cui_io_subtype = args->via_io_subtype;
 
-		switch (vio->cui_io_subtype) {
+		switch (cio->cui_io_subtype) {
 		case IO_NORMAL:
 			cio->cui_iter = args->u.normal.via_iter;
 			cio->cui_iocb = args->u.normal.via_iocb;
@@ -1158,11 +1157,11 @@ restart:
 			down_read(&lli->lli_trunc_sem);
 			break;
 		case IO_SPLICE:
-			vio->u.splice.cui_pipe = args->u.splice.via_pipe;
-			vio->u.splice.cui_flags = args->u.splice.via_flags;
+			cio->u.splice.cui_pipe = args->u.splice.via_pipe;
+			cio->u.splice.cui_flags = args->u.splice.via_flags;
 			break;
 		default:
-			CERROR("Unknown IO type - %u\n", vio->cui_io_subtype);
+			CERROR("Unknown IO type - %u\n", cio->cui_io_subtype);
 			LBUG();
 		}
 		result = cl_io_loop(env, io);
diff --git a/drivers/staging/lustre/lustre/llite/lcommon_cl.c b/drivers/staging/lustre/lustre/llite/lcommon_cl.c
index 8884317..9a9d706 100644
--- a/drivers/staging/lustre/lustre/llite/lcommon_cl.c
+++ b/drivers/staging/lustre/lustre/llite/lcommon_cl.c
@@ -68,7 +68,6 @@ static const struct cl_req_operations ccc_req_ops;
  */
 
 static struct kmem_cache *ccc_thread_kmem;
-static struct kmem_cache *ccc_session_kmem;
 static struct kmem_cache *ccc_req_kmem;
 
 static struct lu_kmem_descr ccc_caches[] = {
@@ -78,11 +77,6 @@ static struct lu_kmem_descr ccc_caches[] = {
 		.ckd_size  = sizeof(struct ccc_thread_info),
 	},
 	{
-		.ckd_cache = &ccc_session_kmem,
-		.ckd_name  = "ccc_session_kmem",
-		.ckd_size  = sizeof(struct ccc_session)
-	},
-	{
 		.ckd_cache = &ccc_req_kmem,
 		.ckd_name  = "ccc_req_kmem",
 		.ckd_size  = sizeof(struct ccc_req)
@@ -116,37 +110,12 @@ void ccc_key_fini(const struct lu_context *ctx,
 	kmem_cache_free(ccc_thread_kmem, info);
 }
 
-void *ccc_session_key_init(const struct lu_context *ctx,
-			   struct lu_context_key *key)
-{
-	struct ccc_session *session;
-
-	session = kmem_cache_zalloc(ccc_session_kmem, GFP_NOFS);
-	if (!session)
-		session = ERR_PTR(-ENOMEM);
-	return session;
-}
-
-void ccc_session_key_fini(const struct lu_context *ctx,
-			  struct lu_context_key *key, void *data)
-{
-	struct ccc_session *session = data;
-
-	kmem_cache_free(ccc_session_kmem, session);
-}
-
 struct lu_context_key ccc_key = {
 	.lct_tags = LCT_CL_THREAD,
 	.lct_init = ccc_key_init,
 	.lct_fini = ccc_key_fini
 };
 
-struct lu_context_key ccc_session_key = {
-	.lct_tags = LCT_SESSION,
-	.lct_init = ccc_session_key_init,
-	.lct_fini = ccc_session_key_fini
-};
-
 int ccc_req_init(const struct lu_env *env, struct cl_device *dev,
 		 struct cl_req *req)
 {
@@ -237,11 +206,11 @@ static void vvp_object_size_unlock(struct cl_object *obj)
  *
  */
 
-int ccc_io_one_lock_index(const struct lu_env *env, struct cl_io *io,
+int vvp_io_one_lock_index(const struct lu_env *env, struct cl_io *io,
 			  __u32 enqflags, enum cl_lock_mode mode,
 			  pgoff_t start, pgoff_t end)
 {
-	struct ccc_io	  *cio   = ccc_env_io(env);
+	struct vvp_io *cio = vvp_env_io(env);
 	struct cl_lock_descr   *descr = &cio->cui_link.cill_descr;
 	struct cl_object       *obj   = io->ci_obj;
 
@@ -266,8 +235,8 @@ int ccc_io_one_lock_index(const struct lu_env *env, struct cl_io *io,
 	return 0;
 }
 
-void ccc_io_update_iov(const struct lu_env *env,
-		       struct ccc_io *cio, struct cl_io *io)
+void vvp_io_update_iov(const struct lu_env *env,
+		       struct vvp_io *cio, struct cl_io *io)
 {
 	size_t size = io->u.ci_rw.crw_count;
 
@@ -277,27 +246,27 @@ void ccc_io_update_iov(const struct lu_env *env,
 	iov_iter_truncate(cio->cui_iter, size);
 }
 
-int ccc_io_one_lock(const struct lu_env *env, struct cl_io *io,
+int vvp_io_one_lock(const struct lu_env *env, struct cl_io *io,
 		    __u32 enqflags, enum cl_lock_mode mode,
 		    loff_t start, loff_t end)
 {
 	struct cl_object *obj = io->ci_obj;
 
-	return ccc_io_one_lock_index(env, io, enqflags, mode,
+	return vvp_io_one_lock_index(env, io, enqflags, mode,
 				     cl_index(obj, start), cl_index(obj, end));
 }
 
-void ccc_io_end(const struct lu_env *env, const struct cl_io_slice *ios)
+void vvp_io_end(const struct lu_env *env, const struct cl_io_slice *ios)
 {
 	CLOBINVRNT(env, ios->cis_io->ci_obj,
 		   vvp_object_invariant(ios->cis_io->ci_obj));
 }
 
-void ccc_io_advance(const struct lu_env *env,
+void vvp_io_advance(const struct lu_env *env,
 		    const struct cl_io_slice *ios,
 		    size_t nob)
 {
-	struct ccc_io    *cio = cl2ccc_io(env, ios);
+	struct vvp_io    *cio = cl2vvp_io(env, ios);
 	struct cl_io     *io  = ios->cis_io;
 	struct cl_object *obj = ios->cis_io->ci_obj;
 
@@ -492,7 +461,7 @@ int cl_setattr_ost(struct inode *inode, const struct iattr *attr)
 
 again:
 	if (cl_io_init(env, io, CIT_SETATTR, io->ci_obj) == 0) {
-		struct ccc_io *cio = ccc_env_io(env);
+		struct vvp_io *cio = vvp_env_io(env);
 
 		if (attr->ia_valid & ATTR_FILE)
 			/* populate the file descriptor for ftruncate to honor
@@ -524,13 +493,14 @@ again:
  *
  */
 
-struct ccc_io *cl2ccc_io(const struct lu_env *env,
+struct vvp_io *cl2vvp_io(const struct lu_env *env,
 			 const struct cl_io_slice *slice)
 {
-	struct ccc_io *cio;
+	struct vvp_io *cio;
+
+	cio = container_of(slice, struct vvp_io, cui_cl);
+	LASSERT(cio == vvp_env_io(env));
 
-	cio = container_of(slice, struct ccc_io, cui_cl);
-	LASSERT(cio == ccc_env_io(env));
 	return cio;
 }
 
diff --git a/drivers/staging/lustre/lustre/llite/llite_internal.h b/drivers/staging/lustre/lustre/llite/llite_internal.h
index 9dd4325..9856bb6 100644
--- a/drivers/staging/lustre/lustre/llite/llite_internal.h
+++ b/drivers/staging/lustre/lustre/llite/llite_internal.h
@@ -817,59 +817,6 @@ struct ll_close_queue {
 void vvp_write_pending(struct vvp_object *club, struct vvp_page *page);
 void vvp_write_complete(struct vvp_object *club, struct vvp_page *page);
 
-/* specific architecture can implement only part of this list */
-enum vvp_io_subtype {
-	/** normal IO */
-	IO_NORMAL,
-	/** io started from splice_{read|write} */
-	IO_SPLICE
-};
-
-/* IO subtypes */
-struct vvp_io {
-	/** io subtype */
-	enum vvp_io_subtype    cui_io_subtype;
-
-	union {
-		struct {
-			struct pipe_inode_info *cui_pipe;
-			unsigned int	    cui_flags;
-		} splice;
-		struct vvp_fault_io {
-			/**
-			 * Inode modification time that is checked across DLM
-			 * lock request.
-			 */
-			time64_t	    ft_mtime;
-			struct vm_area_struct *ft_vma;
-			/**
-			 *  locked page returned from vvp_io
-			 */
-			struct page	    *ft_vmpage;
-			struct vm_fault_api {
-				/**
-				 * kernel fault info
-				 */
-				struct vm_fault *ft_vmf;
-				/**
-				 * fault API used bitflags for return code.
-				 */
-				unsigned int    ft_flags;
-				/**
-				 * check that flags are from filemap_fault
-				 */
-				bool		ft_flags_valid;
-			} fault;
-		} fault;
-	} u;
-
-	/* Readahead state. */
-	pgoff_t	cui_ra_start;
-	pgoff_t cui_ra_count;
-	/* Set when cui_ra_{start,count} have been initialized. */
-	bool	cui_ra_valid;
-};
-
 /**
  * IO arguments for various VFS I/O interfaces.
  */
@@ -923,25 +870,6 @@ static inline struct vvp_io_args *vvp_env_args(const struct lu_env *env,
 	return ret;
 }
 
-struct vvp_session {
-	struct vvp_io	 vs_ios;
-};
-
-static inline struct vvp_session *vvp_env_session(const struct lu_env *env)
-{
-	extern struct lu_context_key vvp_session_key;
-	struct vvp_session *ses;
-
-	ses = lu_context_key_get(env->le_ses, &vvp_session_key);
-	LASSERT(ses);
-	return ses;
-}
-
-static inline struct vvp_io *vvp_env_io(const struct lu_env *env)
-{
-	return &vvp_env_session(env)->vs_ios;
-}
-
 int vvp_global_init(void);
 void vvp_global_fini(void);
 
diff --git a/drivers/staging/lustre/lustre/llite/llite_mmap.c b/drivers/staging/lustre/lustre/llite/llite_mmap.c
index 1263da8..7c214f8 100644
--- a/drivers/staging/lustre/lustre/llite/llite_mmap.c
+++ b/drivers/staging/lustre/lustre/llite/llite_mmap.c
@@ -146,7 +146,7 @@ ll_fault_io_init(struct vm_area_struct *vma, struct lu_env **env_ret,
 
 	rc = cl_io_init(env, io, CIT_FAULT, io->ci_obj);
 	if (rc == 0) {
-		struct ccc_io *cio = ccc_env_io(env);
+		struct vvp_io *cio = vvp_env_io(env);
 		struct ll_file_data *fd = LUSTRE_FPRIVATE(file);
 
 		LASSERT(cio->cui_cl.cis_io == io);
@@ -307,17 +307,17 @@ static int ll_fault0(struct vm_area_struct *vma, struct vm_fault *vmf)
 		vio = vvp_env_io(env);
 		vio->u.fault.ft_vma       = vma;
 		vio->u.fault.ft_vmpage    = NULL;
-		vio->u.fault.fault.ft_vmf = vmf;
-		vio->u.fault.fault.ft_flags = 0;
-		vio->u.fault.fault.ft_flags_valid = false;
+		vio->u.fault.ft_vmf = vmf;
+		vio->u.fault.ft_flags = 0;
+		vio->u.fault.ft_flags_valid = false;
 
 		result = cl_io_loop(env, io);
 
 		/* ft_flags are only valid if we reached
 		 * the call to filemap_fault
 		 */
-		if (vio->u.fault.fault.ft_flags_valid)
-			fault_ret = vio->u.fault.fault.ft_flags;
+		if (vio->u.fault.ft_flags_valid)
+			fault_ret = vio->u.fault.ft_flags;
 
 		vmpage = vio->u.fault.ft_vmpage;
 		if (result != 0 && vmpage) {
diff --git a/drivers/staging/lustre/lustre/llite/rw.c b/drivers/staging/lustre/lustre/llite/rw.c
index f06d8be..da44107 100644
--- a/drivers/staging/lustre/lustre/llite/rw.c
+++ b/drivers/staging/lustre/lustre/llite/rw.c
@@ -90,7 +90,7 @@ struct ll_cl_context *ll_cl_init(struct file *file, struct page *vmpage)
 	struct lu_env    *env;
 	struct cl_io     *io;
 	struct cl_object *clob;
-	struct ccc_io    *cio;
+	struct vvp_io    *cio;
 
 	int refcheck;
 	int result = 0;
@@ -108,7 +108,7 @@ struct ll_cl_context *ll_cl_init(struct file *file, struct page *vmpage)
 	lcc->lcc_refcheck = refcheck;
 	lcc->lcc_cookie = current;
 
-	cio = ccc_env_io(env);
+	cio = vvp_env_io(env);
 	io = cio->cui_cl.cis_io;
 	lcc->lcc_io = io;
 	if (!io) {
diff --git a/drivers/staging/lustre/lustre/llite/rw26.c b/drivers/staging/lustre/lustre/llite/rw26.c
index bb85629..2f69634 100644
--- a/drivers/staging/lustre/lustre/llite/rw26.c
+++ b/drivers/staging/lustre/lustre/llite/rw26.c
@@ -376,7 +376,7 @@ static ssize_t ll_direct_IO_26(struct kiocb *iocb, struct iov_iter *iter,
 
 	env = cl_env_get(&refcheck);
 	LASSERT(!IS_ERR(env));
-	io = ccc_env_io(env)->cui_cl.cis_io;
+	io = vvp_env_io(env)->cui_cl.cis_io;
 	LASSERT(io);
 
 	/* 0. Need locking between buffered and direct access. and race with
@@ -439,7 +439,7 @@ out:
 		inode_unlock(inode);
 
 	if (tot_bytes > 0) {
-		struct ccc_io *cio = ccc_env_io(env);
+		struct vvp_io *cio = vvp_env_io(env);
 
 		/* no commit async for direct IO */
 		cio->u.write.cui_written += tot_bytes;
@@ -513,7 +513,7 @@ static int ll_write_begin(struct file *file, struct address_space *mapping,
 	/* To avoid deadlock, try to lock page first. */
 	vmpage = grab_cache_page_nowait(mapping, index);
 	if (unlikely(!vmpage || PageDirty(vmpage) || PageWriteback(vmpage))) {
-		struct ccc_io *cio = ccc_env_io(env);
+		struct vvp_io *cio = vvp_env_io(env);
 		struct cl_page_list *plist = &cio->u.write.cui_queue;
 
 		/* if the page is already in dirty cache, we have to commit
@@ -595,7 +595,7 @@ static int ll_write_end(struct file *file, struct address_space *mapping,
 	struct ll_cl_context *lcc = fsdata;
 	struct lu_env *env;
 	struct cl_io *io;
-	struct ccc_io *cio;
+	struct vvp_io *cio;
 	struct cl_page *page;
 	unsigned from = pos & (PAGE_CACHE_SIZE - 1);
 	bool unplug = false;
@@ -606,7 +606,7 @@ static int ll_write_end(struct file *file, struct address_space *mapping,
 	env  = lcc->lcc_env;
 	page = lcc->lcc_page;
 	io   = lcc->lcc_io;
-	cio  = ccc_env_io(env);
+	cio  = vvp_env_io(env);
 
 	LASSERT(cl_page_is_owned(page, io));
 	if (copied > 0) {
diff --git a/drivers/staging/lustre/lustre/llite/vvp_dev.c b/drivers/staging/lustre/lustre/llite/vvp_dev.c
index 4b77db3..030a246 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_dev.c
+++ b/drivers/staging/lustre/lustre/llite/vvp_dev.c
@@ -138,7 +138,7 @@ struct lu_context_key vvp_session_key = {
 };
 
 /* type constructor/destructor: vvp_type_{init,fini,start,stop}(). */
-LU_TYPE_INIT_FINI(vvp, &ccc_key, &ccc_session_key, &vvp_key, &vvp_session_key);
+LU_TYPE_INIT_FINI(vvp, &ccc_key, &vvp_key, &vvp_session_key);
 
 static const struct lu_device_operations vvp_lu_ops = {
 	.ldo_object_alloc      = vvp_object_alloc
diff --git a/drivers/staging/lustre/lustre/llite/vvp_internal.h b/drivers/staging/lustre/lustre/llite/vvp_internal.h
index 403d2a7..443485c 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_internal.h
+++ b/drivers/staging/lustre/lustre/llite/vvp_internal.h
@@ -81,10 +81,18 @@ enum ccc_setattr_lock_type {
 	SETATTR_MATCH_LOCK
 };
 
+/* specific architecture can implement only part of this list */
+enum vvp_io_subtype {
+	/** normal IO */
+	IO_NORMAL,
+	/** io started from splice_{read|write} */
+	IO_SPLICE
+};
+
 /**
- * IO state private to vvp or slp layers.
+ * IO state private to the VVP layer.
  */
-struct ccc_io {
+struct vvp_io {
 	/** super class */
 	struct cl_io_slice     cui_cl;
 	struct cl_io_lock_link cui_link;
@@ -98,16 +106,47 @@ struct ccc_io {
 	size_t cui_tot_count;
 
 	union {
+		struct vvp_fault_io {
+			/**
+			 * Inode modification time that is checked across DLM
+			 * lock request.
+			 */
+			time64_t	    ft_mtime;
+			struct vm_area_struct *ft_vma;
+			/**
+			 *  locked page returned from vvp_io
+			 */
+			struct page	    *ft_vmpage;
+			/**
+			 * kernel fault info
+			 */
+			struct vm_fault *ft_vmf;
+			/**
+			 * fault API used bitflags for return code.
+			 */
+			unsigned int    ft_flags;
+			/**
+			 * check that flags are from filemap_fault
+			 */
+			bool		ft_flags_valid;
+		} fault;
 		struct {
 			enum ccc_setattr_lock_type cui_local_lock;
 		} setattr;
 		struct {
+			struct pipe_inode_info	*cui_pipe;
+			unsigned int		 cui_flags;
+		} splice;
+		struct {
 			struct cl_page_list cui_queue;
 			unsigned long cui_written;
 			int cui_from;
 			int cui_to;
 		} write;
 	} u;
+
+	enum vvp_io_subtype	cui_io_subtype;
+
 	/**
 	 * Layout version when this IO is initialized
 	 */
@@ -117,6 +156,12 @@ struct ccc_io {
 	 */
 	struct ll_file_data *cui_fd;
 	struct kiocb *cui_iocb;
+
+	/* Readahead state. */
+	pgoff_t	cui_ra_start;
+	pgoff_t	cui_ra_count;
+	/* Set when cui_ra_{start,count} have been initialized. */
+	bool		cui_ra_valid;
 };
 
 /**
@@ -126,7 +171,7 @@ struct ccc_io {
 int cl_is_normalio(const struct lu_env *env, const struct cl_io *io);
 
 extern struct lu_context_key ccc_key;
-extern struct lu_context_key ccc_session_key;
+extern struct lu_context_key vvp_session_key;
 
 extern struct kmem_cache *vvp_lock_kmem;
 extern struct kmem_cache *vvp_object_kmem;
@@ -174,23 +219,23 @@ static inline struct cl_io *ccc_env_thread_io(const struct lu_env *env)
 	return io;
 }
 
-struct ccc_session {
-	struct ccc_io cs_ios;
+struct vvp_session {
+	struct vvp_io cs_ios;
 };
 
-static inline struct ccc_session *ccc_env_session(const struct lu_env *env)
+static inline struct vvp_session *vvp_env_session(const struct lu_env *env)
 {
-	struct ccc_session *ses;
+	struct vvp_session *ses;
 
-	ses = lu_context_key_get(env->le_ses, &ccc_session_key);
+	ses = lu_context_key_get(env->le_ses, &vvp_session_key);
 	LASSERT(ses);
 
 	return ses;
 }
 
-static inline struct ccc_io *ccc_env_io(const struct lu_env *env)
+static inline struct vvp_io *vvp_env_io(const struct lu_env *env)
 {
-	return &ccc_env_session(env)->cs_ios;
+	return &vvp_env_session(env)->cs_ios;
 }
 
 /**
@@ -282,10 +327,6 @@ void *ccc_key_init(const struct lu_context *ctx,
 		   struct lu_context_key *key);
 void ccc_key_fini(const struct lu_context *ctx,
 		  struct lu_context_key *key, void *data);
-void *ccc_session_key_init(const struct lu_context *ctx,
-			   struct lu_context_key *key);
-void  ccc_session_key_fini(const struct lu_context *ctx,
-			   struct lu_context_key *key, void *data);
 
 int ccc_req_init(const struct lu_env *env, struct cl_device *dev,
 		 struct cl_req *req);
@@ -293,16 +334,16 @@ void ccc_umount(const struct lu_env *env, struct cl_device *dev);
 int ccc_global_init(struct lu_device_type *device_type);
 void ccc_global_fini(struct lu_device_type *device_type);
 
-int ccc_io_one_lock_index(const struct lu_env *env, struct cl_io *io,
+int vvp_io_one_lock_index(const struct lu_env *env, struct cl_io *io,
 			  __u32 enqflags, enum cl_lock_mode mode,
 			  pgoff_t start, pgoff_t end);
-int ccc_io_one_lock(const struct lu_env *env, struct cl_io *io,
+int vvp_io_one_lock(const struct lu_env *env, struct cl_io *io,
 		    __u32 enqflags, enum cl_lock_mode mode,
 		    loff_t start, loff_t end);
-void ccc_io_end(const struct lu_env *env, const struct cl_io_slice *ios);
-void ccc_io_advance(const struct lu_env *env, const struct cl_io_slice *ios,
+void vvp_io_end(const struct lu_env *env, const struct cl_io_slice *ios);
+void vvp_io_advance(const struct lu_env *env, const struct cl_io_slice *ios,
 		    size_t nob);
-void ccc_io_update_iov(const struct lu_env *env, struct ccc_io *cio,
+void vvp_io_update_iov(const struct lu_env *env, struct vvp_io *cio,
 		       struct cl_io *io);
 int ccc_prep_size(const struct lu_env *env, struct cl_object *obj,
 		  struct cl_io *io, loff_t start, size_t count, int *exceed);
@@ -356,7 +397,7 @@ static inline struct vvp_lock *cl2vvp_lock(const struct cl_lock_slice *slice)
 	return container_of(slice, struct vvp_lock, vlk_cl);
 }
 
-struct ccc_io *cl2ccc_io(const struct lu_env *env,
+struct vvp_io *cl2vvp_io(const struct lu_env *env,
 			 const struct cl_io_slice *slice);
 struct ccc_req *cl2ccc_req(const struct cl_req_slice *slice);
 
diff --git a/drivers/staging/lustre/lustre/llite/vvp_io.c b/drivers/staging/lustre/lustre/llite/vvp_io.c
index 86e73d5..1059c63 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_io.c
+++ b/drivers/staging/lustre/lustre/llite/vvp_io.c
@@ -47,9 +47,6 @@
 #include "llite_internal.h"
 #include "vvp_internal.h"
 
-static struct vvp_io *cl2vvp_io(const struct lu_env *env,
-				const struct cl_io_slice *slice);
-
 /**
  * True, if \a io is a normal io, False for splice_{read,write}
  */
@@ -72,7 +69,7 @@ static bool can_populate_pages(const struct lu_env *env, struct cl_io *io,
 			       struct inode *inode)
 {
 	struct ll_inode_info	*lli = ll_i2info(inode);
-	struct ccc_io		*cio = ccc_env_io(env);
+	struct vvp_io		*cio = vvp_env_io(env);
 	bool rc = true;
 
 	switch (io->ci_type) {
@@ -105,7 +102,7 @@ static bool can_populate_pages(const struct lu_env *env, struct cl_io *io,
 static int vvp_io_write_iter_init(const struct lu_env *env,
 				  const struct cl_io_slice *ios)
 {
-	struct ccc_io *cio = cl2ccc_io(env, ios);
+	struct vvp_io *cio = cl2vvp_io(env, ios);
 
 	cl_page_list_init(&cio->u.write.cui_queue);
 	cio->u.write.cui_written = 0;
@@ -118,7 +115,7 @@ static int vvp_io_write_iter_init(const struct lu_env *env,
 static void vvp_io_write_iter_fini(const struct lu_env *env,
 				   const struct cl_io_slice *ios)
 {
-	struct ccc_io *cio = cl2ccc_io(env, ios);
+	struct vvp_io *cio = cl2vvp_io(env, ios);
 
 	LASSERT(cio->u.write.cui_queue.pl_nr == 0);
 }
@@ -129,8 +126,7 @@ static int vvp_io_fault_iter_init(const struct lu_env *env,
 	struct vvp_io *vio   = cl2vvp_io(env, ios);
 	struct inode  *inode = vvp_object_inode(ios->cis_obj);
 
-	LASSERT(inode ==
-		file_inode(cl2ccc_io(env, ios)->cui_fd->fd_file));
+	LASSERT(inode == file_inode(vio->cui_fd->fd_file));
 	vio->u.fault.ft_mtime = inode->i_mtime.tv_sec;
 	return 0;
 }
@@ -139,7 +135,7 @@ static void vvp_io_fini(const struct lu_env *env, const struct cl_io_slice *ios)
 {
 	struct cl_io     *io  = ios->cis_io;
 	struct cl_object *obj = io->ci_obj;
-	struct ccc_io    *cio = cl2ccc_io(env, ios);
+	struct vvp_io    *cio = cl2vvp_io(env, ios);
 
 	CLOBINVRNT(env, obj, vvp_object_invariant(obj));
 
@@ -225,7 +221,7 @@ static enum cl_lock_mode vvp_mode_from_vma(struct vm_area_struct *vma)
 }
 
 static int vvp_mmap_locks(const struct lu_env *env,
-			  struct ccc_io *vio, struct cl_io *io)
+			  struct vvp_io *vio, struct cl_io *io)
 {
 	struct ccc_thread_info *cti = ccc_env_info(env);
 	struct mm_struct       *mm = current->mm;
@@ -310,19 +306,19 @@ static int vvp_mmap_locks(const struct lu_env *env,
 static int vvp_io_rw_lock(const struct lu_env *env, struct cl_io *io,
 			  enum cl_lock_mode mode, loff_t start, loff_t end)
 {
-	struct ccc_io *cio = ccc_env_io(env);
+	struct vvp_io *cio = vvp_env_io(env);
 	int result;
 	int ast_flags = 0;
 
 	LASSERT(io->ci_type == CIT_READ || io->ci_type == CIT_WRITE);
 
-	ccc_io_update_iov(env, cio, io);
+	vvp_io_update_iov(env, cio, io);
 
 	if (io->u.ci_rw.crw_nonblock)
 		ast_flags |= CEF_NONBLOCK;
 	result = vvp_mmap_locks(env, cio, io);
 	if (result == 0)
-		result = ccc_io_one_lock(env, io, ast_flags, mode, start, end);
+		result = vvp_io_one_lock(env, io, ast_flags, mode, start, end);
 	return result;
 }
 
@@ -347,9 +343,11 @@ static int vvp_io_fault_lock(const struct lu_env *env,
 	/*
 	 * XXX LDLM_FL_CBPENDING
 	 */
-	return ccc_io_one_lock_index
-		(env, io, 0, vvp_mode_from_vma(vio->u.fault.ft_vma),
-		 io->u.ci_fault.ft_index, io->u.ci_fault.ft_index);
+	return vvp_io_one_lock_index(env,
+				     io, 0,
+				     vvp_mode_from_vma(vio->u.fault.ft_vma),
+				     io->u.ci_fault.ft_index,
+				     io->u.ci_fault.ft_index);
 }
 
 static int vvp_io_write_lock(const struct lu_env *env,
@@ -383,7 +381,7 @@ static int vvp_io_setattr_iter_init(const struct lu_env *env,
 static int vvp_io_setattr_lock(const struct lu_env *env,
 			       const struct cl_io_slice *ios)
 {
-	struct ccc_io *cio = ccc_env_io(env);
+	struct vvp_io *cio = vvp_env_io(env);
 	struct cl_io  *io  = ios->cis_io;
 	__u64 new_size;
 	__u32 enqflags = 0;
@@ -401,7 +399,8 @@ static int vvp_io_setattr_lock(const struct lu_env *env,
 		new_size = 0;
 	}
 	cio->u.setattr.cui_local_lock = SETATTR_EXTENT_LOCK;
-	return ccc_io_one_lock(env, io, enqflags, CLM_WRITE,
+
+	return vvp_io_one_lock(env, io, enqflags, CLM_WRITE,
 			       new_size, OBD_OBJECT_EOF);
 }
 
@@ -496,16 +495,15 @@ static int vvp_io_read_start(const struct lu_env *env,
 			     const struct cl_io_slice *ios)
 {
 	struct vvp_io     *vio   = cl2vvp_io(env, ios);
-	struct ccc_io     *cio   = cl2ccc_io(env, ios);
 	struct cl_io      *io    = ios->cis_io;
 	struct cl_object  *obj   = io->ci_obj;
 	struct inode      *inode = vvp_object_inode(obj);
-	struct file       *file  = cio->cui_fd->fd_file;
+	struct file       *file  = vio->cui_fd->fd_file;
 
 	int     result;
 	loff_t  pos = io->u.ci_rd.rd.crw_pos;
 	long    cnt = io->u.ci_rd.rd.crw_count;
-	long    tot = cio->cui_tot_count;
+	long    tot = vio->cui_tot_count;
 	int     exceed = 0;
 
 	CLOBINVRNT(env, obj, vvp_object_invariant(obj));
@@ -526,7 +524,7 @@ static int vvp_io_read_start(const struct lu_env *env,
 			 inode->i_ino, cnt, pos, i_size_read(inode));
 
 	/* turn off the kernel's read-ahead */
-	cio->cui_fd->fd_file->f_ra.ra_pages = 0;
+	vio->cui_fd->fd_file->f_ra.ra_pages = 0;
 
 	/* initialize read-ahead window once per syscall */
 	if (!vio->cui_ra_valid) {
@@ -540,8 +538,8 @@ static int vvp_io_read_start(const struct lu_env *env,
 	file_accessed(file);
 	switch (vio->cui_io_subtype) {
 	case IO_NORMAL:
-		LASSERT(cio->cui_iocb->ki_pos == pos);
-		result = generic_file_read_iter(cio->cui_iocb, cio->cui_iter);
+		LASSERT(vio->cui_iocb->ki_pos == pos);
+		result = generic_file_read_iter(vio->cui_iocb, vio->cui_iter);
 		break;
 	case IO_SPLICE:
 		result = generic_file_splice_read(file, &pos,
@@ -564,7 +562,7 @@ out:
 			io->ci_continue = 0;
 		io->ci_nob += result;
 		ll_rw_stats_tally(ll_i2sbi(inode), current->pid,
-				  cio->cui_fd, pos, result, READ);
+				  vio->cui_fd, pos, result, READ);
 		result = 0;
 	}
 	return result;
@@ -676,7 +674,7 @@ int vvp_io_write_commit(const struct lu_env *env, struct cl_io *io)
 {
 	struct cl_object *obj = io->ci_obj;
 	struct inode *inode = vvp_object_inode(obj);
-	struct ccc_io *cio = ccc_env_io(env);
+	struct vvp_io *cio = vvp_env_io(env);
 	struct cl_page_list *queue = &cio->u.write.cui_queue;
 	struct cl_page *page;
 	int rc = 0;
@@ -755,7 +753,7 @@ int vvp_io_write_commit(const struct lu_env *env, struct cl_io *io)
 static int vvp_io_write_start(const struct lu_env *env,
 			      const struct cl_io_slice *ios)
 {
-	struct ccc_io      *cio   = cl2ccc_io(env, ios);
+	struct vvp_io      *cio   = cl2vvp_io(env, ios);
 	struct cl_io       *io    = ios->cis_io;
 	struct cl_object   *obj   = io->ci_obj;
 	struct inode       *inode = vvp_object_inode(obj);
@@ -813,10 +811,10 @@ static int vvp_io_write_start(const struct lu_env *env,
 
 static int vvp_io_kernel_fault(struct vvp_fault_io *cfio)
 {
-	struct vm_fault *vmf = cfio->fault.ft_vmf;
+	struct vm_fault *vmf = cfio->ft_vmf;
 
-	cfio->fault.ft_flags = filemap_fault(cfio->ft_vma, vmf);
-	cfio->fault.ft_flags_valid = 1;
+	cfio->ft_flags = filemap_fault(cfio->ft_vma, vmf);
+	cfio->ft_flags_valid = 1;
 
 	if (vmf->page) {
 		CDEBUG(D_PAGE,
@@ -824,29 +822,29 @@ static int vvp_io_kernel_fault(struct vvp_fault_io *cfio)
 		       vmf->page, vmf->page->mapping, vmf->page->index,
 		       (long)vmf->page->flags, page_count(vmf->page),
 		       page_private(vmf->page), vmf->virtual_address);
-		if (unlikely(!(cfio->fault.ft_flags & VM_FAULT_LOCKED))) {
+		if (unlikely(!(cfio->ft_flags & VM_FAULT_LOCKED))) {
 			lock_page(vmf->page);
-			cfio->fault.ft_flags |= VM_FAULT_LOCKED;
+			cfio->ft_flags |= VM_FAULT_LOCKED;
 		}
 
 		cfio->ft_vmpage = vmf->page;
 		return 0;
 	}
 
-	if (cfio->fault.ft_flags & (VM_FAULT_SIGBUS | VM_FAULT_SIGSEGV)) {
+	if (cfio->ft_flags & (VM_FAULT_SIGBUS | VM_FAULT_SIGSEGV)) {
 		CDEBUG(D_PAGE, "got addr %p - SIGBUS\n", vmf->virtual_address);
 		return -EFAULT;
 	}
 
-	if (cfio->fault.ft_flags & VM_FAULT_OOM) {
+	if (cfio->ft_flags & VM_FAULT_OOM) {
 		CDEBUG(D_PAGE, "got addr %p - OOM\n", vmf->virtual_address);
 		return -ENOMEM;
 	}
 
-	if (cfio->fault.ft_flags & VM_FAULT_RETRY)
+	if (cfio->ft_flags & VM_FAULT_RETRY)
 		return -EAGAIN;
 
-	CERROR("Unknown error in page fault %d!\n", cfio->fault.ft_flags);
+	CERROR("Unknown error in page fault %d!\n", cfio->ft_flags);
 	return -EINVAL;
 }
 
@@ -1024,7 +1022,9 @@ out:
 	/* return unlocked vmpage to avoid deadlocking */
 	if (vmpage)
 		unlock_page(vmpage);
-	cfio->fault.ft_flags &= ~VM_FAULT_LOCKED;
+
+	cfio->ft_flags &= ~VM_FAULT_LOCKED;
+
 	return result;
 }
 
@@ -1047,7 +1047,7 @@ static int vvp_io_read_page(const struct lu_env *env,
 	struct cl_page	    *page   = slice->cpl_page;
 	struct inode              *inode  = vvp_object_inode(slice->cpl_obj);
 	struct ll_sb_info	 *sbi    = ll_i2sbi(inode);
-	struct ll_file_data       *fd     = cl2ccc_io(env, ios)->cui_fd;
+	struct ll_file_data       *fd     = cl2vvp_io(env, ios)->cui_fd;
 	struct ll_readahead_state *ras    = &fd->fd_ras;
 	struct cl_2queue	  *queue  = &io->ci_queue;
 
@@ -1080,7 +1080,7 @@ static const struct cl_io_operations vvp_io_ops = {
 			.cio_fini	= vvp_io_fini,
 			.cio_lock      = vvp_io_read_lock,
 			.cio_start     = vvp_io_read_start,
-			.cio_advance	= ccc_io_advance,
+			.cio_advance	= vvp_io_advance,
 		},
 		[CIT_WRITE] = {
 			.cio_fini      = vvp_io_fini,
@@ -1088,7 +1088,7 @@ static const struct cl_io_operations vvp_io_ops = {
 			.cio_iter_fini = vvp_io_write_iter_fini,
 			.cio_lock      = vvp_io_write_lock,
 			.cio_start     = vvp_io_write_start,
-			.cio_advance   = ccc_io_advance
+			.cio_advance   = vvp_io_advance,
 		},
 		[CIT_SETATTR] = {
 			.cio_fini       = vvp_io_setattr_fini,
@@ -1102,7 +1102,7 @@ static const struct cl_io_operations vvp_io_ops = {
 			.cio_iter_init = vvp_io_fault_iter_init,
 			.cio_lock      = vvp_io_fault_lock,
 			.cio_start     = vvp_io_fault_start,
-			.cio_end       = ccc_io_end
+			.cio_end       = vvp_io_end,
 		},
 		[CIT_FSYNC] = {
 			.cio_start  = vvp_io_fsync_start,
@@ -1119,7 +1119,6 @@ int vvp_io_init(const struct lu_env *env, struct cl_object *obj,
 		struct cl_io *io)
 {
 	struct vvp_io      *vio   = vvp_env_io(env);
-	struct ccc_io      *cio   = ccc_env_io(env);
 	struct inode       *inode = vvp_object_inode(obj);
 	int		 result;
 
@@ -1129,10 +1128,10 @@ int vvp_io_init(const struct lu_env *env, struct cl_object *obj,
 	       " ignore/verify layout %d/%d, layout version %d restore needed %d\n",
 	       PFID(lu_object_fid(&obj->co_lu)),
 	       io->ci_ignore_layout, io->ci_verify_layout,
-	       cio->cui_layout_gen, io->ci_restore_needed);
+	       vio->cui_layout_gen, io->ci_restore_needed);
 
-	CL_IO_SLICE_CLEAN(cio, cui_cl);
-	cl_io_slice_add(io, &cio->cui_cl, obj, &vvp_io_ops);
+	CL_IO_SLICE_CLEAN(vio, cui_cl);
+	cl_io_slice_add(io, &vio->cui_cl, obj, &vvp_io_ops);
 	vio->cui_ra_valid = false;
 	result = 0;
 	if (io->ci_type == CIT_READ || io->ci_type == CIT_WRITE) {
@@ -1146,7 +1145,7 @@ int vvp_io_init(const struct lu_env *env, struct cl_object *obj,
 		if (count == 0)
 			result = 1;
 		else
-			cio->cui_tot_count = count;
+			vio->cui_tot_count = count;
 
 		/* for read/write, we store the jobid in the inode, and
 		 * it'll be fetched by osc when building RPC.
@@ -1172,7 +1171,7 @@ int vvp_io_init(const struct lu_env *env, struct cl_object *obj,
 	 * because it might not grant layout lock in IT_OPEN.
 	 */
 	if (result == 0 && !io->ci_ignore_layout) {
-		result = ll_layout_refresh(inode, &cio->cui_layout_gen);
+		result = ll_layout_refresh(inode, &vio->cui_layout_gen);
 		if (result == -ENOENT)
 			/* If the inode on MDS has been removed, but the objects
 			 * on OSTs haven't been destroyed (async unlink), layout
@@ -1188,11 +1187,3 @@ int vvp_io_init(const struct lu_env *env, struct cl_object *obj,
 
 	return result;
 }
-
-static struct vvp_io *cl2vvp_io(const struct lu_env *env,
-				const struct cl_io_slice *slice)
-{
-	/* Calling just for assertion */
-	cl2ccc_io(env, slice);
-	return vvp_env_io(env);
-}
diff --git a/drivers/staging/lustre/lustre/llite/vvp_page.c b/drivers/staging/lustre/lustre/llite/vvp_page.c
index 5ebbe27..4f7dfe2 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_page.c
+++ b/drivers/staging/lustre/lustre/llite/vvp_page.c
@@ -372,7 +372,7 @@ static int vvp_page_is_under_lock(const struct lu_env *env,
 {
 	if (io->ci_type == CIT_READ || io->ci_type == CIT_WRITE ||
 	    io->ci_type == CIT_FAULT) {
-		struct ccc_io *cio = ccc_env_io(env);
+		struct vvp_io *cio = vvp_env_io(env);
 
 		if (unlikely(cio->cui_fd->fd_flags & LL_FILE_GROUP_LOCKED))
 			*max_index = CL_PAGE_EOF;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v2 31/46] staging/lustre/llite: use vui prefix for struct vvp_io members
  2016-03-30 23:48 [PATCH v2 00/46] Lustre IO stack simplifications and cleanups green
                   ` (29 preceding siblings ...)
  2016-03-30 23:48 ` [PATCH v2 30/46] staging/lustre/llite: merge ccc_io and vvp_io green
@ 2016-03-30 23:48 ` green
  2016-03-30 23:48 ` [PATCH v2 32/46] staging/lustre/llite: move vvp_io functions to vvp_io.c green
                   ` (14 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: green @ 2016-03-30 23:48 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger
  Cc: Linux Kernel Mailing List, Lustre Development List,
	John L. Hammond, Oleg Drokin

From: "John L. Hammond" <john.hammond@intel.com>

Rename the members of struct vvp_io so that they start with vui_ rather
than cui_.  Rename several instances of struct vvp_io * from cio to vio.
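
As a minimal illustration of the convention (abridged sketch; the field
names are taken from the vvp_internal.h hunk below, not the complete
struct):

	/* before this patch */
	struct vvp_io {
		struct cl_io_slice	cui_cl;
		struct iov_iter		*cui_iter;
		size_t			cui_tot_count;
	};

	/* after this patch */
	struct vvp_io {
		struct cl_io_slice	vui_cl;
		struct iov_iter		*vui_iter;
		size_t			vui_tot_count;
	};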

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Reviewed-on: http://review.whamcloud.com/13363
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5971
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
---
 drivers/staging/lustre/lustre/llite/file.c         |  20 ++--
 drivers/staging/lustre/lustre/llite/lcommon_cl.c   |  34 +++---
 drivers/staging/lustre/lustre/llite/llite_mmap.c   |   6 +-
 drivers/staging/lustre/lustre/llite/rw.c           |  24 ++--
 drivers/staging/lustre/lustre/llite/rw26.c         |  20 ++--
 drivers/staging/lustre/lustre/llite/vvp_internal.h |  38 +++---
 drivers/staging/lustre/lustre/llite/vvp_io.c       | 131 +++++++++++----------
 drivers/staging/lustre/lustre/llite/vvp_page.c     |   4 +-
 8 files changed, 139 insertions(+), 138 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/file.c b/drivers/staging/lustre/lustre/llite/file.c
index 27e7e65..63aa080 100644
--- a/drivers/staging/lustre/lustre/llite/file.c
+++ b/drivers/staging/lustre/lustre/llite/file.c
@@ -1135,18 +1135,18 @@ restart:
 	ll_io_init(io, file, iot == CIT_WRITE);
 
 	if (cl_io_rw_init(env, io, iot, *ppos, count) == 0) {
-		struct vvp_io *cio = vvp_env_io(env);
+		struct vvp_io *vio = vvp_env_io(env);
 		int write_mutex_locked = 0;
 
-		cio->cui_fd  = LUSTRE_FPRIVATE(file);
-		cio->cui_io_subtype = args->via_io_subtype;
+		vio->vui_fd  = LUSTRE_FPRIVATE(file);
+		vio->vui_io_subtype = args->via_io_subtype;
 
-		switch (cio->cui_io_subtype) {
+		switch (vio->vui_io_subtype) {
 		case IO_NORMAL:
-			cio->cui_iter = args->u.normal.via_iter;
-			cio->cui_iocb = args->u.normal.via_iocb;
+			vio->vui_iter = args->u.normal.via_iter;
+			vio->vui_iocb = args->u.normal.via_iocb;
 			if ((iot == CIT_WRITE) &&
-			    !(cio->cui_fd->fd_flags & LL_FILE_GROUP_LOCKED)) {
+			    !(vio->vui_fd->fd_flags & LL_FILE_GROUP_LOCKED)) {
 				if (mutex_lock_interruptible(&lli->
 							       lli_write_mutex)) {
 					result = -ERESTARTSYS;
@@ -1157,11 +1157,11 @@ restart:
 			down_read(&lli->lli_trunc_sem);
 			break;
 		case IO_SPLICE:
-			cio->u.splice.cui_pipe = args->u.splice.via_pipe;
-			cio->u.splice.cui_flags = args->u.splice.via_flags;
+			vio->u.splice.vui_pipe = args->u.splice.via_pipe;
+			vio->u.splice.vui_flags = args->u.splice.via_flags;
 			break;
 		default:
-			CERROR("Unknown IO type - %u\n", cio->cui_io_subtype);
+			CERROR("Unknown IO type - %u\n", vio->vui_io_subtype);
 			LBUG();
 		}
 		result = cl_io_loop(env, io);
diff --git a/drivers/staging/lustre/lustre/llite/lcommon_cl.c b/drivers/staging/lustre/lustre/llite/lcommon_cl.c
index 9a9d706..630c371 100644
--- a/drivers/staging/lustre/lustre/llite/lcommon_cl.c
+++ b/drivers/staging/lustre/lustre/llite/lcommon_cl.c
@@ -210,19 +210,19 @@ int vvp_io_one_lock_index(const struct lu_env *env, struct cl_io *io,
 			  __u32 enqflags, enum cl_lock_mode mode,
 			  pgoff_t start, pgoff_t end)
 {
-	struct vvp_io *cio = vvp_env_io(env);
-	struct cl_lock_descr   *descr = &cio->cui_link.cill_descr;
+	struct vvp_io          *vio   = vvp_env_io(env);
+	struct cl_lock_descr   *descr = &vio->vui_link.cill_descr;
 	struct cl_object       *obj   = io->ci_obj;
 
 	CLOBINVRNT(env, obj, vvp_object_invariant(obj));
 
 	CDEBUG(D_VFSTRACE, "lock: %d [%lu, %lu]\n", mode, start, end);
 
-	memset(&cio->cui_link, 0, sizeof(cio->cui_link));
+	memset(&vio->vui_link, 0, sizeof(vio->vui_link));
 
-	if (cio->cui_fd && (cio->cui_fd->fd_flags & LL_FILE_GROUP_LOCKED)) {
+	if (vio->vui_fd && (vio->vui_fd->fd_flags & LL_FILE_GROUP_LOCKED)) {
 		descr->cld_mode = CLM_GROUP;
-		descr->cld_gid  = cio->cui_fd->fd_grouplock.cg_gid;
+		descr->cld_gid  = vio->vui_fd->fd_grouplock.cg_gid;
 	} else {
 		descr->cld_mode  = mode;
 	}
@@ -231,19 +231,19 @@ int vvp_io_one_lock_index(const struct lu_env *env, struct cl_io *io,
 	descr->cld_end   = end;
 	descr->cld_enq_flags = enqflags;
 
-	cl_io_lock_add(env, io, &cio->cui_link);
+	cl_io_lock_add(env, io, &vio->vui_link);
 	return 0;
 }
 
 void vvp_io_update_iov(const struct lu_env *env,
-		       struct vvp_io *cio, struct cl_io *io)
+		       struct vvp_io *vio, struct cl_io *io)
 {
 	size_t size = io->u.ci_rw.crw_count;
 
-	if (!cl_is_normalio(env, io) || !cio->cui_iter)
+	if (!cl_is_normalio(env, io) || !vio->vui_iter)
 		return;
 
-	iov_iter_truncate(cio->cui_iter, size);
+	iov_iter_truncate(vio->vui_iter, size);
 }
 
 int vvp_io_one_lock(const struct lu_env *env, struct cl_io *io,
@@ -266,7 +266,7 @@ void vvp_io_advance(const struct lu_env *env,
 		    const struct cl_io_slice *ios,
 		    size_t nob)
 {
-	struct vvp_io    *cio = cl2vvp_io(env, ios);
+	struct vvp_io    *vio = cl2vvp_io(env, ios);
 	struct cl_io     *io  = ios->cis_io;
 	struct cl_object *obj = ios->cis_io->ci_obj;
 
@@ -275,7 +275,7 @@ void vvp_io_advance(const struct lu_env *env,
 	if (!cl_is_normalio(env, io))
 		return;
 
-	iov_iter_reexpand(cio->cui_iter, cio->cui_tot_count  -= nob);
+	iov_iter_reexpand(vio->vui_iter, vio->vui_tot_count  -= nob);
 }
 
 /**
@@ -461,13 +461,13 @@ int cl_setattr_ost(struct inode *inode, const struct iattr *attr)
 
 again:
 	if (cl_io_init(env, io, CIT_SETATTR, io->ci_obj) == 0) {
-		struct vvp_io *cio = vvp_env_io(env);
+		struct vvp_io *vio = vvp_env_io(env);
 
 		if (attr->ia_valid & ATTR_FILE)
 			/* populate the file descriptor for ftruncate to honor
 			 * group lock - see LU-787
 			 */
-			cio->cui_fd = LUSTRE_FPRIVATE(attr->ia_file);
+			vio->vui_fd = LUSTRE_FPRIVATE(attr->ia_file);
 
 		result = cl_io_loop(env, io);
 	} else {
@@ -496,12 +496,12 @@ again:
 struct vvp_io *cl2vvp_io(const struct lu_env *env,
 			 const struct cl_io_slice *slice)
 {
-	struct vvp_io *cio;
+	struct vvp_io *vio;
 
-	cio = container_of(slice, struct vvp_io, cui_cl);
-	LASSERT(cio == vvp_env_io(env));
+	vio = container_of(slice, struct vvp_io, vui_cl);
+	LASSERT(vio == vvp_env_io(env));
 
-	return cio;
+	return vio;
 }
 
 struct ccc_req *cl2ccc_req(const struct cl_req_slice *slice)
diff --git a/drivers/staging/lustre/lustre/llite/llite_mmap.c b/drivers/staging/lustre/lustre/llite/llite_mmap.c
index 7c214f8..a7693c5 100644
--- a/drivers/staging/lustre/lustre/llite/llite_mmap.c
+++ b/drivers/staging/lustre/lustre/llite/llite_mmap.c
@@ -146,14 +146,14 @@ ll_fault_io_init(struct vm_area_struct *vma, struct lu_env **env_ret,
 
 	rc = cl_io_init(env, io, CIT_FAULT, io->ci_obj);
 	if (rc == 0) {
-		struct vvp_io *cio = vvp_env_io(env);
+		struct vvp_io *vio = vvp_env_io(env);
 		struct ll_file_data *fd = LUSTRE_FPRIVATE(file);
 
-		LASSERT(cio->cui_cl.cis_io == io);
+		LASSERT(vio->vui_cl.cis_io == io);
 
 		/* mmap lock must be MANDATORY it has to cache pages. */
 		io->ci_lockreq = CILR_MANDATORY;
-		cio->cui_fd = fd;
+		vio->vui_fd = fd;
 	} else {
 		LASSERT(rc < 0);
 		cl_io_fini(env, io);
diff --git a/drivers/staging/lustre/lustre/llite/rw.c b/drivers/staging/lustre/lustre/llite/rw.c
index da44107..634f0bb 100644
--- a/drivers/staging/lustre/lustre/llite/rw.c
+++ b/drivers/staging/lustre/lustre/llite/rw.c
@@ -90,7 +90,7 @@ struct ll_cl_context *ll_cl_init(struct file *file, struct page *vmpage)
 	struct lu_env    *env;
 	struct cl_io     *io;
 	struct cl_object *clob;
-	struct vvp_io    *cio;
+	struct vvp_io    *vio;
 
 	int refcheck;
 	int result = 0;
@@ -108,8 +108,8 @@ struct ll_cl_context *ll_cl_init(struct file *file, struct page *vmpage)
 	lcc->lcc_refcheck = refcheck;
 	lcc->lcc_cookie = current;
 
-	cio = vvp_env_io(env);
-	io = cio->cui_cl.cis_io;
+	vio = vvp_env_io(env);
+	io = vio->vui_cl.cis_io;
 	lcc->lcc_io = io;
 	if (!io) {
 		struct inode *inode = file_inode(file);
@@ -125,7 +125,7 @@ struct ll_cl_context *ll_cl_init(struct file *file, struct page *vmpage)
 		struct cl_page   *page;
 
 		LASSERT(io->ci_state == CIS_IO_GOING);
-		LASSERT(cio->cui_fd == LUSTRE_FPRIVATE(file));
+		LASSERT(vio->vui_fd == LUSTRE_FPRIVATE(file));
 		page = cl_page_find(env, clob, vmpage->index, vmpage,
 				    CPT_CACHEABLE);
 		if (!IS_ERR(page)) {
@@ -553,10 +553,10 @@ int ll_readahead(const struct lu_env *env, struct cl_io *io,
 	spin_lock(&ras->ras_lock);
 
 	/* Enlarge the RA window to encompass the full read */
-	if (vio->cui_ra_valid &&
+	if (vio->vui_ra_valid &&
 	    ras->ras_window_start + ras->ras_window_len <
-	    vio->cui_ra_start + vio->cui_ra_count) {
-		ras->ras_window_len = vio->cui_ra_start + vio->cui_ra_count -
+	    vio->vui_ra_start + vio->vui_ra_count) {
+		ras->ras_window_len = vio->vui_ra_start + vio->vui_ra_count -
 				      ras->ras_window_start;
 	}
 
@@ -615,15 +615,15 @@ int ll_readahead(const struct lu_env *env, struct cl_io *io,
 	CDEBUG(D_READA, DFID ": ria: %lu/%lu, bead: %lu/%lu, hit: %d\n",
 	       PFID(lu_object_fid(&clob->co_lu)),
 	       ria->ria_start, ria->ria_end,
-	       vio->cui_ra_valid ? vio->cui_ra_start : 0,
-	       vio->cui_ra_valid ? vio->cui_ra_count : 0,
+	       vio->vui_ra_valid ? vio->vui_ra_start : 0,
+	       vio->vui_ra_valid ? vio->vui_ra_count : 0,
 	       hit);
 
 	/* at least to extend the readahead window to cover current read */
-	if (!hit && vio->cui_ra_valid &&
-	    vio->cui_ra_start + vio->cui_ra_count > ria->ria_start) {
+	if (!hit && vio->vui_ra_valid &&
+	    vio->vui_ra_start + vio->vui_ra_count > ria->ria_start) {
 		/* to the end of current read window. */
-		mlen = vio->cui_ra_start + vio->cui_ra_count - ria->ria_start;
+		mlen = vio->vui_ra_start + vio->vui_ra_count - ria->ria_start;
 		/* trim to RPC boundary */
 		start = ria->ria_start & (PTLRPC_MAX_BRW_PAGES - 1);
 		mlen = min(mlen, PTLRPC_MAX_BRW_PAGES - start);
diff --git a/drivers/staging/lustre/lustre/llite/rw26.c b/drivers/staging/lustre/lustre/llite/rw26.c
index 2f69634..106473e 100644
--- a/drivers/staging/lustre/lustre/llite/rw26.c
+++ b/drivers/staging/lustre/lustre/llite/rw26.c
@@ -376,7 +376,7 @@ static ssize_t ll_direct_IO_26(struct kiocb *iocb, struct iov_iter *iter,
 
 	env = cl_env_get(&refcheck);
 	LASSERT(!IS_ERR(env));
-	io = vvp_env_io(env)->cui_cl.cis_io;
+	io = vvp_env_io(env)->vui_cl.cis_io;
 	LASSERT(io);
 
 	/* 0. Need locking between buffered and direct access. and race with
@@ -439,10 +439,10 @@ out:
 		inode_unlock(inode);
 
 	if (tot_bytes > 0) {
-		struct vvp_io *cio = vvp_env_io(env);
+		struct vvp_io *vio = vvp_env_io(env);
 
 		/* no commit async for direct IO */
-		cio->u.write.cui_written += tot_bytes;
+		vio->u.write.vui_written += tot_bytes;
 	}
 
 	cl_env_put(env, &refcheck);
@@ -513,8 +513,8 @@ static int ll_write_begin(struct file *file, struct address_space *mapping,
 	/* To avoid deadlock, try to lock page first. */
 	vmpage = grab_cache_page_nowait(mapping, index);
 	if (unlikely(!vmpage || PageDirty(vmpage) || PageWriteback(vmpage))) {
-		struct vvp_io *cio = vvp_env_io(env);
-		struct cl_page_list *plist = &cio->u.write.cui_queue;
+		struct vvp_io *vio = vvp_env_io(env);
+		struct cl_page_list *plist = &vio->u.write.vui_queue;
 
 		/* if the page is already in dirty cache, we have to commit
 		 * the pages right now; otherwise, it may cause deadlock
@@ -595,7 +595,7 @@ static int ll_write_end(struct file *file, struct address_space *mapping,
 	struct ll_cl_context *lcc = fsdata;
 	struct lu_env *env;
 	struct cl_io *io;
-	struct vvp_io *cio;
+	struct vvp_io *vio;
 	struct cl_page *page;
 	unsigned from = pos & (PAGE_CACHE_SIZE - 1);
 	bool unplug = false;
@@ -606,21 +606,21 @@ static int ll_write_end(struct file *file, struct address_space *mapping,
 	env  = lcc->lcc_env;
 	page = lcc->lcc_page;
 	io   = lcc->lcc_io;
-	cio  = vvp_env_io(env);
+	vio  = vvp_env_io(env);
 
 	LASSERT(cl_page_is_owned(page, io));
 	if (copied > 0) {
-		struct cl_page_list *plist = &cio->u.write.cui_queue;
+		struct cl_page_list *plist = &vio->u.write.vui_queue;
 
 		lcc->lcc_page = NULL; /* page will be queued */
 
 		/* Add it into write queue */
 		cl_page_list_add(plist, page);
 		if (plist->pl_nr == 1) /* first page */
-			cio->u.write.cui_from = from;
+			vio->u.write.vui_from = from;
 		else
 			LASSERT(from == 0);
-		cio->u.write.cui_to = from + copied;
+		vio->u.write.vui_to = from + copied;
 
 		/* We may have one full RPC, commit it soon */
 		if (plist->pl_nr >= PTLRPC_MAX_BRW_PAGES)
diff --git a/drivers/staging/lustre/lustre/llite/vvp_internal.h b/drivers/staging/lustre/lustre/llite/vvp_internal.h
index 443485c..e04f23e 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_internal.h
+++ b/drivers/staging/lustre/lustre/llite/vvp_internal.h
@@ -94,16 +94,16 @@ enum vvp_io_subtype {
  */
 struct vvp_io {
 	/** super class */
-	struct cl_io_slice     cui_cl;
-	struct cl_io_lock_link cui_link;
+	struct cl_io_slice     vui_cl;
+	struct cl_io_lock_link vui_link;
 	/**
 	 * I/O vector information to or from which read/write is going.
 	 */
-	struct iov_iter *cui_iter;
+	struct iov_iter *vui_iter;
 	/**
 	 * Total size for the left IO.
 	 */
-	size_t cui_tot_count;
+	size_t vui_tot_count;
 
 	union {
 		struct vvp_fault_io {
@@ -131,37 +131,37 @@ struct vvp_io {
 			bool		ft_flags_valid;
 		} fault;
 		struct {
-			enum ccc_setattr_lock_type cui_local_lock;
+			enum ccc_setattr_lock_type vui_local_lock;
 		} setattr;
 		struct {
-			struct pipe_inode_info	*cui_pipe;
-			unsigned int		 cui_flags;
+			struct pipe_inode_info	*vui_pipe;
+			unsigned int		 vui_flags;
 		} splice;
 		struct {
-			struct cl_page_list cui_queue;
-			unsigned long cui_written;
-			int cui_from;
-			int cui_to;
+			struct cl_page_list vui_queue;
+			unsigned long vui_written;
+			int vui_from;
+			int vui_to;
 		} write;
 	} u;
 
-	enum vvp_io_subtype	cui_io_subtype;
+	enum vvp_io_subtype	vui_io_subtype;
 
 	/**
 	 * Layout version when this IO is initialized
 	 */
-	__u32		cui_layout_gen;
+	__u32			vui_layout_gen;
 	/**
 	 * File descriptor against which IO is done.
 	 */
-	struct ll_file_data *cui_fd;
-	struct kiocb *cui_iocb;
+	struct ll_file_data	*vui_fd;
+	struct kiocb		*vui_iocb;
 
 	/* Readahead state. */
-	pgoff_t	cui_ra_start;
-	pgoff_t	cui_ra_count;
-	/* Set when cui_ra_{start,count} have been initialized. */
-	bool		cui_ra_valid;
+	pgoff_t	vui_ra_start;
+	pgoff_t	vui_ra_count;
+	/* Set when vui_ra_{start,count} have been initialized. */
+	bool		vui_ra_valid;
 };
 
 /**
diff --git a/drivers/staging/lustre/lustre/llite/vvp_io.c b/drivers/staging/lustre/lustre/llite/vvp_io.c
index 1059c63..53cf2be 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_io.c
+++ b/drivers/staging/lustre/lustre/llite/vvp_io.c
@@ -56,7 +56,7 @@ int cl_is_normalio(const struct lu_env *env, const struct cl_io *io)
 
 	LASSERT(io->ci_type == CIT_READ || io->ci_type == CIT_WRITE);
 
-	return vio->cui_io_subtype == IO_NORMAL;
+	return vio->vui_io_subtype == IO_NORMAL;
 }
 
 /**
@@ -69,7 +69,7 @@ static bool can_populate_pages(const struct lu_env *env, struct cl_io *io,
 			       struct inode *inode)
 {
 	struct ll_inode_info	*lli = ll_i2info(inode);
-	struct vvp_io		*cio = vvp_env_io(env);
+	struct vvp_io		*vio = vvp_env_io(env);
 	bool rc = true;
 
 	switch (io->ci_type) {
@@ -78,7 +78,7 @@ static bool can_populate_pages(const struct lu_env *env, struct cl_io *io,
 		/* don't need lock here to check lli_layout_gen as we have held
 		 * extent lock and GROUP lock has to hold to swap layout
 		 */
-		if (ll_layout_version_get(lli) != cio->cui_layout_gen) {
+		if (ll_layout_version_get(lli) != vio->vui_layout_gen) {
 			io->ci_need_restart = 1;
 			/* this will return application a short read/write */
 			io->ci_continue = 0;
@@ -102,12 +102,12 @@ static bool can_populate_pages(const struct lu_env *env, struct cl_io *io,
 static int vvp_io_write_iter_init(const struct lu_env *env,
 				  const struct cl_io_slice *ios)
 {
-	struct vvp_io *cio = cl2vvp_io(env, ios);
+	struct vvp_io *vio = cl2vvp_io(env, ios);
 
-	cl_page_list_init(&cio->u.write.cui_queue);
-	cio->u.write.cui_written = 0;
-	cio->u.write.cui_from = 0;
-	cio->u.write.cui_to = PAGE_SIZE;
+	cl_page_list_init(&vio->u.write.vui_queue);
+	vio->u.write.vui_written = 0;
+	vio->u.write.vui_from = 0;
+	vio->u.write.vui_to = PAGE_SIZE;
 
 	return 0;
 }
@@ -115,9 +115,9 @@ static int vvp_io_write_iter_init(const struct lu_env *env,
 static void vvp_io_write_iter_fini(const struct lu_env *env,
 				   const struct cl_io_slice *ios)
 {
-	struct vvp_io *cio = cl2vvp_io(env, ios);
+	struct vvp_io *vio = cl2vvp_io(env, ios);
 
-	LASSERT(cio->u.write.cui_queue.pl_nr == 0);
+	LASSERT(vio->u.write.vui_queue.pl_nr == 0);
 }
 
 static int vvp_io_fault_iter_init(const struct lu_env *env,
@@ -126,7 +126,7 @@ static int vvp_io_fault_iter_init(const struct lu_env *env,
 	struct vvp_io *vio   = cl2vvp_io(env, ios);
 	struct inode  *inode = vvp_object_inode(ios->cis_obj);
 
-	LASSERT(inode == file_inode(vio->cui_fd->fd_file));
+	LASSERT(inode == file_inode(vio->vui_fd->fd_file));
 	vio->u.fault.ft_mtime = inode->i_mtime.tv_sec;
 	return 0;
 }
@@ -135,7 +135,7 @@ static void vvp_io_fini(const struct lu_env *env, const struct cl_io_slice *ios)
 {
 	struct cl_io     *io  = ios->cis_io;
 	struct cl_object *obj = io->ci_obj;
-	struct vvp_io    *cio = cl2vvp_io(env, ios);
+	struct vvp_io    *vio = cl2vvp_io(env, ios);
 
 	CLOBINVRNT(env, obj, vvp_object_invariant(obj));
 
@@ -143,7 +143,7 @@ static void vvp_io_fini(const struct lu_env *env, const struct cl_io_slice *ios)
 	       " ignore/verify layout %d/%d, layout version %d restore needed %d\n",
 	       PFID(lu_object_fid(&obj->co_lu)),
 	       io->ci_ignore_layout, io->ci_verify_layout,
-	       cio->cui_layout_gen, io->ci_restore_needed);
+	       vio->vui_layout_gen, io->ci_restore_needed);
 
 	if (io->ci_restore_needed == 1) {
 		int	rc;
@@ -178,12 +178,12 @@ static void vvp_io_fini(const struct lu_env *env, const struct cl_io_slice *ios)
 
 		/* check layout version */
 		ll_layout_refresh(vvp_object_inode(obj), &gen);
-		io->ci_need_restart = cio->cui_layout_gen != gen;
+		io->ci_need_restart = vio->vui_layout_gen != gen;
 		if (io->ci_need_restart) {
 			CDEBUG(D_VFSTRACE,
 			       DFID" layout changed from %d to %d.\n",
 			       PFID(lu_object_fid(&obj->co_lu)),
-			       cio->cui_layout_gen, gen);
+			       vio->vui_layout_gen, gen);
 			/* today successful restore is the only possible case */
 			/* restore was done, clear restoring state */
 			ll_i2info(vvp_object_inode(obj))->lli_flags &=
@@ -239,14 +239,14 @@ static int vvp_mmap_locks(const struct lu_env *env,
 	if (!cl_is_normalio(env, io))
 		return 0;
 
-	if (!vio->cui_iter) /* nfs or loop back device write */
+	if (!vio->vui_iter) /* nfs or loop back device write */
 		return 0;
 
 	/* No MM (e.g. NFS)? No vmas too. */
 	if (!mm)
 		return 0;
 
-	iov_for_each(iov, i, *(vio->cui_iter)) {
+	iov_for_each(iov, i, *vio->vui_iter) {
 		addr = (unsigned long)iov.iov_base;
 		count = iov.iov_len;
 		if (count == 0)
@@ -306,17 +306,17 @@ static int vvp_mmap_locks(const struct lu_env *env,
 static int vvp_io_rw_lock(const struct lu_env *env, struct cl_io *io,
 			  enum cl_lock_mode mode, loff_t start, loff_t end)
 {
-	struct vvp_io *cio = vvp_env_io(env);
+	struct vvp_io *vio = vvp_env_io(env);
 	int result;
 	int ast_flags = 0;
 
 	LASSERT(io->ci_type == CIT_READ || io->ci_type == CIT_WRITE);
 
-	vvp_io_update_iov(env, cio, io);
+	vvp_io_update_iov(env, vio, io);
 
 	if (io->u.ci_rw.crw_nonblock)
 		ast_flags |= CEF_NONBLOCK;
-	result = vvp_mmap_locks(env, cio, io);
+	result = vvp_mmap_locks(env, vio, io);
 	if (result == 0)
 		result = vvp_io_one_lock(env, io, ast_flags, mode, start, end);
 	return result;
@@ -374,14 +374,14 @@ static int vvp_io_setattr_iter_init(const struct lu_env *env,
 }
 
 /**
- * Implementation of cl_io_operations::cio_lock() method for CIT_SETATTR io.
+ * Implementation of cl_io_operations::vio_lock() method for CIT_SETATTR io.
  *
  * Handles "lockless io" mode when extent locking is done by server.
  */
 static int vvp_io_setattr_lock(const struct lu_env *env,
 			       const struct cl_io_slice *ios)
 {
-	struct vvp_io *cio = vvp_env_io(env);
+	struct vvp_io *vio = vvp_env_io(env);
 	struct cl_io  *io  = ios->cis_io;
 	__u64 new_size;
 	__u32 enqflags = 0;
@@ -398,7 +398,8 @@ static int vvp_io_setattr_lock(const struct lu_env *env,
 			return 0;
 		new_size = 0;
 	}
-	cio->u.setattr.cui_local_lock = SETATTR_EXTENT_LOCK;
+
+	vio->u.setattr.vui_local_lock = SETATTR_EXTENT_LOCK;
 
 	return vvp_io_one_lock(env, io, enqflags, CLM_WRITE,
 			       new_size, OBD_OBJECT_EOF);
@@ -498,12 +499,12 @@ static int vvp_io_read_start(const struct lu_env *env,
 	struct cl_io      *io    = ios->cis_io;
 	struct cl_object  *obj   = io->ci_obj;
 	struct inode      *inode = vvp_object_inode(obj);
-	struct file       *file  = vio->cui_fd->fd_file;
+	struct file       *file  = vio->vui_fd->fd_file;
 
 	int     result;
 	loff_t  pos = io->u.ci_rd.rd.crw_pos;
 	long    cnt = io->u.ci_rd.rd.crw_count;
-	long    tot = vio->cui_tot_count;
+	long    tot = vio->vui_tot_count;
 	int     exceed = 0;
 
 	CLOBINVRNT(env, obj, vvp_object_invariant(obj));
@@ -524,27 +525,27 @@ static int vvp_io_read_start(const struct lu_env *env,
 			 inode->i_ino, cnt, pos, i_size_read(inode));
 
 	/* turn off the kernel's read-ahead */
-	vio->cui_fd->fd_file->f_ra.ra_pages = 0;
+	vio->vui_fd->fd_file->f_ra.ra_pages = 0;
 
 	/* initialize read-ahead window once per syscall */
-	if (!vio->cui_ra_valid) {
-		vio->cui_ra_valid = true;
-		vio->cui_ra_start = cl_index(obj, pos);
-		vio->cui_ra_count = cl_index(obj, tot + PAGE_CACHE_SIZE - 1);
+	if (!vio->vui_ra_valid) {
+		vio->vui_ra_valid = true;
+		vio->vui_ra_start = cl_index(obj, pos);
+		vio->vui_ra_count = cl_index(obj, tot + PAGE_CACHE_SIZE - 1);
 		ll_ras_enter(file);
 	}
 
 	/* BUG: 5972 */
 	file_accessed(file);
-	switch (vio->cui_io_subtype) {
+	switch (vio->vui_io_subtype) {
 	case IO_NORMAL:
-		LASSERT(vio->cui_iocb->ki_pos == pos);
-		result = generic_file_read_iter(vio->cui_iocb, vio->cui_iter);
+		LASSERT(vio->vui_iocb->ki_pos == pos);
+		result = generic_file_read_iter(vio->vui_iocb, vio->vui_iter);
 		break;
 	case IO_SPLICE:
 		result = generic_file_splice_read(file, &pos,
-						  vio->u.splice.cui_pipe, cnt,
-						  vio->u.splice.cui_flags);
+						  vio->u.splice.vui_pipe, cnt,
+						  vio->u.splice.vui_flags);
 		/* LU-1109: do splice read stripe by stripe otherwise if it
 		 * may make nfsd stuck if this read occupied all internal pipe
 		 * buffers.
@@ -552,7 +553,7 @@ static int vvp_io_read_start(const struct lu_env *env,
 		io->ci_continue = 0;
 		break;
 	default:
-		CERROR("Wrong IO type %u\n", vio->cui_io_subtype);
+		CERROR("Wrong IO type %u\n", vio->vui_io_subtype);
 		LBUG();
 	}
 
@@ -562,7 +563,7 @@ out:
 			io->ci_continue = 0;
 		io->ci_nob += result;
 		ll_rw_stats_tally(ll_i2sbi(inode), current->pid,
-				  vio->cui_fd, pos, result, READ);
+				  vio->vui_fd, pos, result, READ);
 		result = 0;
 	}
 	return result;
@@ -674,24 +675,24 @@ int vvp_io_write_commit(const struct lu_env *env, struct cl_io *io)
 {
 	struct cl_object *obj = io->ci_obj;
 	struct inode *inode = vvp_object_inode(obj);
-	struct vvp_io *cio = vvp_env_io(env);
-	struct cl_page_list *queue = &cio->u.write.cui_queue;
+	struct vvp_io *vio = vvp_env_io(env);
+	struct cl_page_list *queue = &vio->u.write.vui_queue;
 	struct cl_page *page;
 	int rc = 0;
 	int bytes = 0;
-	unsigned int npages = cio->u.write.cui_queue.pl_nr;
+	unsigned int npages = vio->u.write.vui_queue.pl_nr;
 
 	if (npages == 0)
 		return 0;
 
 	CDEBUG(D_VFSTRACE, "commit async pages: %d, from %d, to %d\n",
-	       npages, cio->u.write.cui_from, cio->u.write.cui_to);
+	       npages, vio->u.write.vui_from, vio->u.write.vui_to);
 
 	LASSERT(page_list_sanity_check(obj, queue));
 
 	/* submit IO with async write */
 	rc = cl_io_commit_async(env, io, queue,
-				cio->u.write.cui_from, cio->u.write.cui_to,
+				vio->u.write.vui_from, vio->u.write.vui_to,
 				write_commit_callback);
 	npages -= queue->pl_nr; /* already committed pages */
 	if (npages > 0) {
@@ -699,18 +700,18 @@ int vvp_io_write_commit(const struct lu_env *env, struct cl_io *io)
 		bytes = npages << PAGE_SHIFT;
 
 		/* first page */
-		bytes -= cio->u.write.cui_from;
+		bytes -= vio->u.write.vui_from;
 		if (queue->pl_nr == 0) /* last page */
-			bytes -= PAGE_SIZE - cio->u.write.cui_to;
+			bytes -= PAGE_SIZE - vio->u.write.vui_to;
 		LASSERTF(bytes > 0, "bytes = %d, pages = %d\n", bytes, npages);
 
-		cio->u.write.cui_written += bytes;
+		vio->u.write.vui_written += bytes;
 
 		CDEBUG(D_VFSTRACE, "Committed %d pages %d bytes, tot: %ld\n",
-		       npages, bytes, cio->u.write.cui_written);
+		       npages, bytes, vio->u.write.vui_written);
 
 		/* the first page must have been written. */
-		cio->u.write.cui_from = 0;
+		vio->u.write.vui_from = 0;
 	}
 	LASSERT(page_list_sanity_check(obj, queue));
 	LASSERT(ergo(rc == 0, queue->pl_nr == 0));
@@ -718,10 +719,10 @@ int vvp_io_write_commit(const struct lu_env *env, struct cl_io *io)
 	/* out of quota, try sync write */
 	if (rc == -EDQUOT && !cl_io_is_mkwrite(io)) {
 		rc = vvp_io_commit_sync(env, io, queue,
-					cio->u.write.cui_from,
-					cio->u.write.cui_to);
+					vio->u.write.vui_from,
+					vio->u.write.vui_to);
 		if (rc > 0) {
-			cio->u.write.cui_written += rc;
+			vio->u.write.vui_written += rc;
 			rc = 0;
 		}
 	}
@@ -753,7 +754,7 @@ int vvp_io_write_commit(const struct lu_env *env, struct cl_io *io)
 static int vvp_io_write_start(const struct lu_env *env,
 			      const struct cl_io_slice *ios)
 {
-	struct vvp_io      *cio   = cl2vvp_io(env, ios);
+	struct vvp_io      *vio   = cl2vvp_io(env, ios);
 	struct cl_io       *io    = ios->cis_io;
 	struct cl_object   *obj   = io->ci_obj;
 	struct inode       *inode = vvp_object_inode(obj);
@@ -771,22 +772,22 @@ static int vvp_io_write_start(const struct lu_env *env,
 		 */
 		ll_merge_attr(env, inode);
 		pos = io->u.ci_wr.wr.crw_pos = i_size_read(inode);
-		cio->cui_iocb->ki_pos = pos;
+		vio->vui_iocb->ki_pos = pos;
 	} else {
-		LASSERT(cio->cui_iocb->ki_pos == pos);
+		LASSERT(vio->vui_iocb->ki_pos == pos);
 	}
 
 	CDEBUG(D_VFSTRACE, "write: [%lli, %lli)\n", pos, pos + (long long)cnt);
 
-	if (!cio->cui_iter) /* from a temp io in ll_cl_init(). */
+	if (!vio->vui_iter) /* from a temp io in ll_cl_init(). */
 		result = 0;
 	else
-		result = generic_file_write_iter(cio->cui_iocb, cio->cui_iter);
+		result = generic_file_write_iter(vio->vui_iocb, vio->vui_iter);
 
 	if (result > 0) {
 		result = vvp_io_write_commit(env, io);
-		if (cio->u.write.cui_written > 0) {
-			result = cio->u.write.cui_written;
+		if (vio->u.write.vui_written > 0) {
+			result = vio->u.write.vui_written;
 			io->ci_nob += result;
 
 			CDEBUG(D_VFSTRACE, "write: nob %zd, result: %zd\n",
@@ -803,7 +804,7 @@ static int vvp_io_write_start(const struct lu_env *env,
 		if (result < cnt)
 			io->ci_continue = 0;
 		ll_rw_stats_tally(ll_i2sbi(inode), current->pid,
-				  cio->cui_fd, pos, result, WRITE);
+				  vio->vui_fd, pos, result, WRITE);
 		result = 0;
 	}
 	return result;
@@ -1047,7 +1048,7 @@ static int vvp_io_read_page(const struct lu_env *env,
 	struct cl_page	    *page   = slice->cpl_page;
 	struct inode              *inode  = vvp_object_inode(slice->cpl_obj);
 	struct ll_sb_info	 *sbi    = ll_i2sbi(inode);
-	struct ll_file_data       *fd     = cl2vvp_io(env, ios)->cui_fd;
+	struct ll_file_data       *fd     = cl2vvp_io(env, ios)->vui_fd;
 	struct ll_readahead_state *ras    = &fd->fd_ras;
 	struct cl_2queue	  *queue  = &io->ci_queue;
 
@@ -1128,11 +1129,11 @@ int vvp_io_init(const struct lu_env *env, struct cl_object *obj,
 	       " ignore/verify layout %d/%d, layout version %d restore needed %d\n",
 	       PFID(lu_object_fid(&obj->co_lu)),
 	       io->ci_ignore_layout, io->ci_verify_layout,
-	       vio->cui_layout_gen, io->ci_restore_needed);
+	       vio->vui_layout_gen, io->ci_restore_needed);
 
-	CL_IO_SLICE_CLEAN(vio, cui_cl);
-	cl_io_slice_add(io, &vio->cui_cl, obj, &vvp_io_ops);
-	vio->cui_ra_valid = false;
+	CL_IO_SLICE_CLEAN(vio, vui_cl);
+	cl_io_slice_add(io, &vio->vui_cl, obj, &vvp_io_ops);
+	vio->vui_ra_valid = false;
 	result = 0;
 	if (io->ci_type == CIT_READ || io->ci_type == CIT_WRITE) {
 		size_t count;
@@ -1145,7 +1146,7 @@ int vvp_io_init(const struct lu_env *env, struct cl_object *obj,
 		if (count == 0)
 			result = 1;
 		else
-			vio->cui_tot_count = count;
+			vio->vui_tot_count = count;
 
 		/* for read/write, we store the jobid in the inode, and
 		 * it'll be fetched by osc when building RPC.
@@ -1171,7 +1172,7 @@ int vvp_io_init(const struct lu_env *env, struct cl_object *obj,
 	 * because it might not grant layout lock in IT_OPEN.
 	 */
 	if (result == 0 && !io->ci_ignore_layout) {
-		result = ll_layout_refresh(inode, &vio->cui_layout_gen);
+		result = ll_layout_refresh(inode, &vio->vui_layout_gen);
 		if (result == -ENOENT)
 			/* If the inode on MDS has been removed, but the objects
 			 * on OSTs haven't been destroyed (async unlink), layout
diff --git a/drivers/staging/lustre/lustre/llite/vvp_page.c b/drivers/staging/lustre/lustre/llite/vvp_page.c
index 4f7dfe2..69316c1 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_page.c
+++ b/drivers/staging/lustre/lustre/llite/vvp_page.c
@@ -372,9 +372,9 @@ static int vvp_page_is_under_lock(const struct lu_env *env,
 {
 	if (io->ci_type == CIT_READ || io->ci_type == CIT_WRITE ||
 	    io->ci_type == CIT_FAULT) {
-		struct vvp_io *cio = vvp_env_io(env);
+		struct vvp_io *vio = vvp_env_io(env);
 
-		if (unlikely(cio->cui_fd->fd_flags & LL_FILE_GROUP_LOCKED))
+		if (unlikely(vio->vui_fd->fd_flags & LL_FILE_GROUP_LOCKED))
 			*max_index = CL_PAGE_EOF;
 	}
 	return 0;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v2 32/46] staging/lustre/llite: move vvp_io functions to vvp_io.c
  2016-03-30 23:48 [PATCH v2 00/46] Lustre IO stack simplifications and cleanups green
                   ` (30 preceding siblings ...)
  2016-03-30 23:48 ` [PATCH v2 31/46] staging/lustre/llite: use vui prefix for struct vvp_io members green
@ 2016-03-30 23:48 ` green
  2016-03-30 23:48 ` [PATCH v2 33/46] staging/lustre/llite: rename ccc_req to vvp_req green
                   ` (13 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: green @ 2016-03-30 23:48 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger
  Cc: Linux Kernel Mailing List, Lustre Development List,
	John L. Hammond, Oleg Drokin

From: "John L. Hammond" <john.hammond@intel.com>

Move all vvp_io related functions from lustre/llite/lcommon_cl.c to
lustre/llite/vvp_io.c, the sole file where they are used.
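
Sketched in abridged form (prototype copied from the diff below), the
visible effect is that helpers previously declared in vvp_internal.h
become static to vvp_io.c:

	/* before: declared in vvp_internal.h, defined in lcommon_cl.c */
	int vvp_io_one_lock(const struct lu_env *env, struct cl_io *io,
			    __u32 enqflags, enum cl_lock_mode mode,
			    loff_t start, loff_t end);

	/* after: private to vvp_io.c */
	static int vvp_io_one_lock(const struct lu_env *env, struct cl_io *io,
				   __u32 enqflags, enum cl_lock_mode mode,
				   loff_t start, loff_t end);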

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Reviewed-on: http://review.whamcloud.com/13376
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5971
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
---
 drivers/staging/lustre/lustre/llite/lcommon_cl.c   | 197 --------------------
 .../staging/lustre/lustre/llite/llite_internal.h   |   1 -
 drivers/staging/lustre/lustre/llite/vvp_internal.h |  22 +--
 drivers/staging/lustre/lustre/llite/vvp_io.c       | 198 ++++++++++++++++++++-
 4 files changed, 196 insertions(+), 222 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/lcommon_cl.c b/drivers/staging/lustre/lustre/llite/lcommon_cl.c
index 630c371..1b11103 100644
--- a/drivers/staging/lustre/lustre/llite/lcommon_cl.c
+++ b/drivers/staging/lustre/lustre/llite/lcommon_cl.c
@@ -184,192 +184,6 @@ void ccc_global_fini(struct lu_device_type *device_type)
 	lu_kmem_fini(ccc_caches);
 }
 
-static void vvp_object_size_lock(struct cl_object *obj)
-{
-	struct inode *inode = vvp_object_inode(obj);
-
-	ll_inode_size_lock(inode);
-	cl_object_attr_lock(obj);
-}
-
-static void vvp_object_size_unlock(struct cl_object *obj)
-{
-	struct inode *inode = vvp_object_inode(obj);
-
-	cl_object_attr_unlock(obj);
-	ll_inode_size_unlock(inode);
-}
-
-/*****************************************************************************
- *
- * io operations.
- *
- */
-
-int vvp_io_one_lock_index(const struct lu_env *env, struct cl_io *io,
-			  __u32 enqflags, enum cl_lock_mode mode,
-			  pgoff_t start, pgoff_t end)
-{
-	struct vvp_io          *vio   = vvp_env_io(env);
-	struct cl_lock_descr   *descr = &vio->vui_link.cill_descr;
-	struct cl_object       *obj   = io->ci_obj;
-
-	CLOBINVRNT(env, obj, vvp_object_invariant(obj));
-
-	CDEBUG(D_VFSTRACE, "lock: %d [%lu, %lu]\n", mode, start, end);
-
-	memset(&vio->vui_link, 0, sizeof(vio->vui_link));
-
-	if (vio->vui_fd && (vio->vui_fd->fd_flags & LL_FILE_GROUP_LOCKED)) {
-		descr->cld_mode = CLM_GROUP;
-		descr->cld_gid  = vio->vui_fd->fd_grouplock.cg_gid;
-	} else {
-		descr->cld_mode  = mode;
-	}
-	descr->cld_obj   = obj;
-	descr->cld_start = start;
-	descr->cld_end   = end;
-	descr->cld_enq_flags = enqflags;
-
-	cl_io_lock_add(env, io, &vio->vui_link);
-	return 0;
-}
-
-void vvp_io_update_iov(const struct lu_env *env,
-		       struct vvp_io *vio, struct cl_io *io)
-{
-	size_t size = io->u.ci_rw.crw_count;
-
-	if (!cl_is_normalio(env, io) || !vio->vui_iter)
-		return;
-
-	iov_iter_truncate(vio->vui_iter, size);
-}
-
-int vvp_io_one_lock(const struct lu_env *env, struct cl_io *io,
-		    __u32 enqflags, enum cl_lock_mode mode,
-		    loff_t start, loff_t end)
-{
-	struct cl_object *obj = io->ci_obj;
-
-	return vvp_io_one_lock_index(env, io, enqflags, mode,
-				     cl_index(obj, start), cl_index(obj, end));
-}
-
-void vvp_io_end(const struct lu_env *env, const struct cl_io_slice *ios)
-{
-	CLOBINVRNT(env, ios->cis_io->ci_obj,
-		   vvp_object_invariant(ios->cis_io->ci_obj));
-}
-
-void vvp_io_advance(const struct lu_env *env,
-		    const struct cl_io_slice *ios,
-		    size_t nob)
-{
-	struct vvp_io    *vio = cl2vvp_io(env, ios);
-	struct cl_io     *io  = ios->cis_io;
-	struct cl_object *obj = ios->cis_io->ci_obj;
-
-	CLOBINVRNT(env, obj, vvp_object_invariant(obj));
-
-	if (!cl_is_normalio(env, io))
-		return;
-
-	iov_iter_reexpand(vio->vui_iter, vio->vui_tot_count  -= nob);
-}
-
-/**
- * Helper function that if necessary adjusts file size (inode->i_size), when
- * position at the offset \a pos is accessed. File size can be arbitrary stale
- * on a Lustre client, but client at least knows KMS. If accessed area is
- * inside [0, KMS], set file size to KMS, otherwise glimpse file size.
- *
- * Locking: cl_isize_lock is used to serialize changes to inode size and to
- * protect consistency between inode size and cl_object
- * attributes. cl_object_size_lock() protects consistency between cl_attr's of
- * top-object and sub-objects.
- */
-int ccc_prep_size(const struct lu_env *env, struct cl_object *obj,
-		  struct cl_io *io, loff_t start, size_t count, int *exceed)
-{
-	struct cl_attr *attr  = ccc_env_thread_attr(env);
-	struct inode   *inode = vvp_object_inode(obj);
-	loff_t	  pos   = start + count - 1;
-	loff_t kms;
-	int result;
-
-	/*
-	 * Consistency guarantees: following possibilities exist for the
-	 * relation between region being accessed and real file size at this
-	 * moment:
-	 *
-	 *  (A): the region is completely inside of the file;
-	 *
-	 *  (B-x): x bytes of region are inside of the file, the rest is
-	 *  outside;
-	 *
-	 *  (C): the region is completely outside of the file.
-	 *
-	 * This classification is stable under DLM lock already acquired by
-	 * the caller, because to change the class, other client has to take
-	 * DLM lock conflicting with our lock. Also, any updates to ->i_size
-	 * by other threads on this client are serialized by
-	 * ll_inode_size_lock(). This guarantees that short reads are handled
-	 * correctly in the face of concurrent writes and truncates.
-	 */
-	vvp_object_size_lock(obj);
-	result = cl_object_attr_get(env, obj, attr);
-	if (result == 0) {
-		kms = attr->cat_kms;
-		if (pos > kms) {
-			/*
-			 * A glimpse is necessary to determine whether we
-			 * return a short read (B) or some zeroes at the end
-			 * of the buffer (C)
-			 */
-			vvp_object_size_unlock(obj);
-			result = cl_glimpse_lock(env, io, inode, obj, 0);
-			if (result == 0 && exceed) {
-				/* If objective page index exceed end-of-file
-				 * page index, return directly. Do not expect
-				 * kernel will check such case correctly.
-				 * linux-2.6.18-128.1.1 miss to do that.
-				 * --bug 17336
-				 */
-				loff_t size = i_size_read(inode);
-				loff_t cur_index = start >> PAGE_CACHE_SHIFT;
-				loff_t size_index = (size - 1) >>
-						    PAGE_CACHE_SHIFT;
-
-				if ((size == 0 && cur_index != 0) ||
-				    size_index < cur_index)
-					*exceed = 1;
-			}
-			return result;
-		}
-		/*
-		 * region is within kms and, hence, within real file
-		 * size (A). We need to increase i_size to cover the
-		 * read region so that generic_file_read() will do its
-		 * job, but that doesn't mean the kms size is
-		 * _correct_, it is only the _minimum_ size. If
-		 * someone does a stat they will get the correct size
-		 * which will always be >= the kms value here.
-		 * b=11081
-		 */
-		if (i_size_read(inode) < kms) {
-			i_size_write(inode, kms);
-			CDEBUG(D_VFSTRACE, DFID " updating i_size %llu\n",
-			       PFID(lu_object_fid(&obj->co_lu)),
-			       (__u64)i_size_read(inode));
-		}
-	}
-
-	vvp_object_size_unlock(obj);
-
-	return result;
-}
-
 /*****************************************************************************
  *
  * Transfer operations.
@@ -493,17 +307,6 @@ again:
  *
  */
 
-struct vvp_io *cl2vvp_io(const struct lu_env *env,
-			 const struct cl_io_slice *slice)
-{
-	struct vvp_io *vio;
-
-	vio = container_of(slice, struct vvp_io, vui_cl);
-	LASSERT(vio == vvp_env_io(env));
-
-	return vio;
-}
-
 struct ccc_req *cl2ccc_req(const struct cl_req_slice *slice)
 {
 	return container_of0(slice, struct ccc_req, crq_cl);
diff --git a/drivers/staging/lustre/lustre/llite/llite_internal.h b/drivers/staging/lustre/lustre/llite/llite_internal.h
index 9856bb6..86e93c0 100644
--- a/drivers/staging/lustre/lustre/llite/llite_internal.h
+++ b/drivers/staging/lustre/lustre/llite/llite_internal.h
@@ -693,7 +693,6 @@ void ll_readahead_init(struct inode *inode, struct ll_readahead_state *ras);
 int ll_readahead(const struct lu_env *env, struct cl_io *io,
 		 struct cl_page_list *queue, struct ll_readahead_state *ras,
 		 bool hit);
-int vvp_io_write_commit(const struct lu_env *env, struct cl_io *io);
 struct ll_cl_context *ll_cl_init(struct file *file, struct page *vmpage);
 void ll_cl_fini(struct ll_cl_context *lcc);
 
diff --git a/drivers/staging/lustre/lustre/llite/vvp_internal.h b/drivers/staging/lustre/lustre/llite/vvp_internal.h
index e04f23e..7af8c44 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_internal.h
+++ b/drivers/staging/lustre/lustre/llite/vvp_internal.h
@@ -164,12 +164,6 @@ struct vvp_io {
 	bool		vui_ra_valid;
 };
 
-/**
- * True, if \a io is a normal io, False for other splice_{read,write}.
- * must be implemented in arch specific code.
- */
-int cl_is_normalio(const struct lu_env *env, const struct cl_io *io);
-
 extern struct lu_context_key ccc_key;
 extern struct lu_context_key vvp_session_key;
 
@@ -334,19 +328,6 @@ void ccc_umount(const struct lu_env *env, struct cl_device *dev);
 int ccc_global_init(struct lu_device_type *device_type);
 void ccc_global_fini(struct lu_device_type *device_type);
 
-int vvp_io_one_lock_index(const struct lu_env *env, struct cl_io *io,
-			  __u32 enqflags, enum cl_lock_mode mode,
-			  pgoff_t start, pgoff_t end);
-int vvp_io_one_lock(const struct lu_env *env, struct cl_io *io,
-		    __u32 enqflags, enum cl_lock_mode mode,
-		    loff_t start, loff_t end);
-void vvp_io_end(const struct lu_env *env, const struct cl_io_slice *ios);
-void vvp_io_advance(const struct lu_env *env, const struct cl_io_slice *ios,
-		    size_t nob);
-void vvp_io_update_iov(const struct lu_env *env, struct vvp_io *cio,
-		       struct cl_io *io);
-int ccc_prep_size(const struct lu_env *env, struct cl_object *obj,
-		  struct cl_io *io, loff_t start, size_t count, int *exceed);
 void ccc_req_completion(const struct lu_env *env,
 			const struct cl_req_slice *slice, int ioret);
 void ccc_req_attr_set(const struct lu_env *env,
@@ -397,8 +378,6 @@ static inline struct vvp_lock *cl2vvp_lock(const struct cl_lock_slice *slice)
 	return container_of(slice, struct vvp_lock, vlk_cl);
 }
 
-struct vvp_io *cl2vvp_io(const struct lu_env *env,
-			 const struct cl_io_slice *slice);
 struct ccc_req *cl2ccc_req(const struct cl_req_slice *slice);
 
 int cl_setattr_ost(struct inode *inode, const struct iattr *attr);
@@ -447,6 +426,7 @@ void ccc_inode_lsm_put(struct inode *inode, struct lov_stripe_md *lsm);
 
 int vvp_io_init(const struct lu_env *env, struct cl_object *obj,
 		struct cl_io *io);
+int vvp_io_write_commit(const struct lu_env *env, struct cl_io *io);
 int vvp_lock_init(const struct lu_env *env, struct cl_object *obj,
 		  struct cl_lock *lock, const struct cl_io *io);
 int vvp_page_init(const struct lu_env *env, struct cl_object *obj,
diff --git a/drivers/staging/lustre/lustre/llite/vvp_io.c b/drivers/staging/lustre/lustre/llite/vvp_io.c
index 53cf2be..48b0693 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_io.c
+++ b/drivers/staging/lustre/lustre/llite/vvp_io.c
@@ -47,10 +47,21 @@
 #include "llite_internal.h"
 #include "vvp_internal.h"
 
+struct vvp_io *cl2vvp_io(const struct lu_env *env,
+			 const struct cl_io_slice *slice)
+{
+	struct vvp_io *vio;
+
+	vio = container_of(slice, struct vvp_io, vui_cl);
+	LASSERT(vio == vvp_env_io(env));
+
+	return vio;
+}
+
 /**
  * True, if \a io is a normal io, False for splice_{read,write}
  */
-int cl_is_normalio(const struct lu_env *env, const struct cl_io *io)
+static int cl_is_normalio(const struct lu_env *env, const struct cl_io *io)
 {
 	struct vvp_io *vio = vvp_env_io(env);
 
@@ -93,12 +104,160 @@ static bool can_populate_pages(const struct lu_env *env, struct cl_io *io,
 	return rc;
 }
 
+static void vvp_object_size_lock(struct cl_object *obj)
+{
+	struct inode *inode = vvp_object_inode(obj);
+
+	ll_inode_size_lock(inode);
+	cl_object_attr_lock(obj);
+}
+
+static void vvp_object_size_unlock(struct cl_object *obj)
+{
+	struct inode *inode = vvp_object_inode(obj);
+
+	cl_object_attr_unlock(obj);
+	ll_inode_size_unlock(inode);
+}
+
+/**
+ * Helper function that if necessary adjusts file size (inode->i_size), when
+ * position at the offset \a pos is accessed. File size can be arbitrary stale
+ * on a Lustre client, but client at least knows KMS. If accessed area is
+ * inside [0, KMS], set file size to KMS, otherwise glimpse file size.
+ *
+ * Locking: cl_isize_lock is used to serialize changes to inode size and to
+ * protect consistency between inode size and cl_object
+ * attributes. cl_object_size_lock() protects consistency between cl_attr's of
+ * top-object and sub-objects.
+ */
+static int vvp_prep_size(const struct lu_env *env, struct cl_object *obj,
+			 struct cl_io *io, loff_t start, size_t count,
+			 int *exceed)
+{
+	struct cl_attr *attr  = ccc_env_thread_attr(env);
+	struct inode   *inode = vvp_object_inode(obj);
+	loff_t	  pos   = start + count - 1;
+	loff_t kms;
+	int result;
+
+	/*
+	 * Consistency guarantees: following possibilities exist for the
+	 * relation between region being accessed and real file size at this
+	 * moment:
+	 *
+	 *  (A): the region is completely inside of the file;
+	 *
+	 *  (B-x): x bytes of region are inside of the file, the rest is
+	 *  outside;
+	 *
+	 *  (C): the region is completely outside of the file.
+	 *
+	 * This classification is stable under DLM lock already acquired by
+	 * the caller, because to change the class, other client has to take
+	 * DLM lock conflicting with our lock. Also, any updates to ->i_size
+	 * by other threads on this client are serialized by
+	 * ll_inode_size_lock(). This guarantees that short reads are handled
+	 * correctly in the face of concurrent writes and truncates.
+	 */
+	vvp_object_size_lock(obj);
+	result = cl_object_attr_get(env, obj, attr);
+	if (result == 0) {
+		kms = attr->cat_kms;
+		if (pos > kms) {
+			/*
+			 * A glimpse is necessary to determine whether we
+			 * return a short read (B) or some zeroes at the end
+			 * of the buffer (C)
+			 */
+			vvp_object_size_unlock(obj);
+			result = cl_glimpse_lock(env, io, inode, obj, 0);
+			if (result == 0 && exceed) {
+				/* If objective page index exceed end-of-file
+				 * page index, return directly. Do not expect
+				 * kernel will check such case correctly.
+				 * linux-2.6.18-128.1.1 miss to do that.
+				 * --bug 17336
+				 */
+				loff_t size = i_size_read(inode);
+				loff_t cur_index = start >> PAGE_CACHE_SHIFT;
+				loff_t size_index = (size - 1) >>
+						    PAGE_CACHE_SHIFT;
+
+				if ((size == 0 && cur_index != 0) ||
+				    size_index < cur_index)
+					*exceed = 1;
+			}
+			return result;
+		}
+		/*
+		 * region is within kms and, hence, within real file
+		 * size (A). We need to increase i_size to cover the
+		 * read region so that generic_file_read() will do its
+		 * job, but that doesn't mean the kms size is
+		 * _correct_, it is only the _minimum_ size. If
+		 * someone does a stat they will get the correct size
+		 * which will always be >= the kms value here.
+		 * b=11081
+		 */
+		if (i_size_read(inode) < kms) {
+			i_size_write(inode, kms);
+			CDEBUG(D_VFSTRACE, DFID " updating i_size %llu\n",
+			       PFID(lu_object_fid(&obj->co_lu)),
+			       (__u64)i_size_read(inode));
+		}
+	}
+
+	vvp_object_size_unlock(obj);
+
+	return result;
+}
+
 /*****************************************************************************
  *
  * io operations.
  *
  */
 
+static int vvp_io_one_lock_index(const struct lu_env *env, struct cl_io *io,
+				 __u32 enqflags, enum cl_lock_mode mode,
+				 pgoff_t start, pgoff_t end)
+{
+	struct vvp_io          *vio   = vvp_env_io(env);
+	struct cl_lock_descr   *descr = &vio->vui_link.cill_descr;
+	struct cl_object       *obj   = io->ci_obj;
+
+	CLOBINVRNT(env, obj, vvp_object_invariant(obj));
+
+	CDEBUG(D_VFSTRACE, "lock: %d [%lu, %lu]\n", mode, start, end);
+
+	memset(&vio->vui_link, 0, sizeof(vio->vui_link));
+
+	if (vio->vui_fd && (vio->vui_fd->fd_flags & LL_FILE_GROUP_LOCKED)) {
+		descr->cld_mode = CLM_GROUP;
+		descr->cld_gid  = vio->vui_fd->fd_grouplock.cg_gid;
+	} else {
+		descr->cld_mode  = mode;
+	}
+	descr->cld_obj   = obj;
+	descr->cld_start = start;
+	descr->cld_end   = end;
+	descr->cld_enq_flags = enqflags;
+
+	cl_io_lock_add(env, io, &vio->vui_link);
+	return 0;
+}
+
+static int vvp_io_one_lock(const struct lu_env *env, struct cl_io *io,
+			   __u32 enqflags, enum cl_lock_mode mode,
+			   loff_t start, loff_t end)
+{
+	struct cl_object *obj = io->ci_obj;
+
+	return vvp_io_one_lock_index(env, io, enqflags, mode,
+				     cl_index(obj, start), cl_index(obj, end));
+}
+
 static int vvp_io_write_iter_init(const struct lu_env *env,
 				  const struct cl_io_slice *ios)
 {
@@ -303,6 +462,33 @@ static int vvp_mmap_locks(const struct lu_env *env,
 	return result;
 }
 
+static void vvp_io_advance(const struct lu_env *env,
+			   const struct cl_io_slice *ios,
+			   size_t nob)
+{
+	struct vvp_io    *vio = cl2vvp_io(env, ios);
+	struct cl_io     *io  = ios->cis_io;
+	struct cl_object *obj = ios->cis_io->ci_obj;
+
+	CLOBINVRNT(env, obj, vvp_object_invariant(obj));
+
+	if (!cl_is_normalio(env, io))
+		return;
+
+	iov_iter_reexpand(vio->vui_iter, vio->vui_tot_count  -= nob);
+}
+
+static void vvp_io_update_iov(const struct lu_env *env,
+			      struct vvp_io *vio, struct cl_io *io)
+{
+	size_t size = io->u.ci_rw.crw_count;
+
+	if (!cl_is_normalio(env, io) || !vio->vui_iter)
+		return;
+
+	iov_iter_truncate(vio->vui_iter, size);
+}
+
 static int vvp_io_rw_lock(const struct lu_env *env, struct cl_io *io,
 			  enum cl_lock_mode mode, loff_t start, loff_t end)
 {
@@ -514,7 +700,7 @@ static int vvp_io_read_start(const struct lu_env *env,
 	if (!can_populate_pages(env, io, inode))
 		return 0;
 
-	result = ccc_prep_size(env, obj, io, pos, tot, &exceed);
+	result = vvp_prep_size(env, obj, io, pos, tot, &exceed);
 	if (result != 0)
 		return result;
 	else if (exceed != 0)
@@ -886,7 +1072,7 @@ static int vvp_io_fault_start(const struct lu_env *env,
 	/* offset of the last byte on the page */
 	offset = cl_offset(obj, fio->ft_index + 1) - 1;
 	LASSERT(cl_index(obj, offset) == fio->ft_index);
-	result = ccc_prep_size(env, obj, io, 0, offset + 1, NULL);
+	result = vvp_prep_size(env, obj, io, 0, offset + 1, NULL);
 	if (result != 0)
 		return result;
 
@@ -1075,6 +1261,12 @@ static int vvp_io_read_page(const struct lu_env *env,
 	return 0;
 }
 
+void vvp_io_end(const struct lu_env *env, const struct cl_io_slice *ios)
+{
+	CLOBINVRNT(env, ios->cis_io->ci_obj,
+		   vvp_object_invariant(ios->cis_io->ci_obj));
+}
+
 static const struct cl_io_operations vvp_io_ops = {
 	.op = {
 		[CIT_READ] = {
-- 
2.1.0

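A side note on the vvp_io_update_iov()/vvp_io_advance() pair added above:
they lean on the kernel's generic iov_iter helpers, first clipping the
iterator to the byte count this IO is allowed to consume, then
re-expanding it to whatever remains once 'nob' bytes have been accounted
for. A minimal sketch of that clip/consume/re-expand rhythm follows;
demo_partial_io() and its byte counts are invented for illustration, and
only the two iov_iter calls are real kernel API.

#include <linux/uio.h>

/*
 * Sketch of the pattern behind vvp_io_update_iov()/vvp_io_advance():
 * clip the iterator for one segment, then restore the remainder.
 */
static size_t demo_partial_io(struct iov_iter *iter, size_t total,
                              size_t chunk)
{
        /* Lower layers now see at most 'chunk' bytes. */
        iov_iter_truncate(iter, chunk);

        /* ... submit 'chunk' bytes of IO against 'iter' here ... */

        /* Account for the consumed bytes and re-expand the iterator
         * to cover what is left, as vvp_io_advance() does.
         */
        total -= chunk;
        iov_iter_reexpand(iter, total);

        return total;
}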

* [PATCH v2 33/46] staging/lustre/llite: rename ccc_req to vvp_req
  2016-03-30 23:48 [PATCH v2 00/46] Lustre IO stack simplifications and cleanups green
                   ` (31 preceding siblings ...)
  2016-03-30 23:48 ` [PATCH v2 32/46] staging/lustre/llite: move vvp_io functions to vvp_io.c green
@ 2016-03-30 23:48 ` green
  2016-03-30 23:48 ` [PATCH v2 34/46] staging/lustre/llite: Rename struct ccc_grouplock to ll_grouplock green
                   ` (12 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: green @ 2016-03-30 23:48 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger
  Cc: Linux Kernel Mailing List, Lustre Development List,
	John L. Hammond, Oleg Drokin

From: "John L. Hammond" <john.hammond@intel.com>

Rename struct ccc_req to struct vvp_req and move related functions
from lustre/llite/lcommon_cl.c to the new file lustre/llite/vvp_req.c.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Reviewed-on: http://review.whamcloud.com/13377
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5971
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
---
 drivers/staging/lustre/lustre/include/cl_object.h  |   2 +-
 drivers/staging/lustre/lustre/llite/Makefile       |   3 +-
 drivers/staging/lustre/lustre/llite/lcommon_cl.c   | 104 ------------------
 drivers/staging/lustre/lustre/llite/vvp_dev.c      |   8 +-
 drivers/staging/lustre/lustre/llite/vvp_internal.h |  18 +--
 drivers/staging/lustre/lustre/llite/vvp_req.c      | 121 +++++++++++++++++++++
 6 files changed, 136 insertions(+), 120 deletions(-)
 create mode 100644 drivers/staging/lustre/lustre/llite/vvp_req.c

diff --git a/drivers/staging/lustre/lustre/include/cl_object.h b/drivers/staging/lustre/lustre/include/cl_object.h
index 104dc9f..918be65 100644
--- a/drivers/staging/lustre/lustre/include/cl_object.h
+++ b/drivers/staging/lustre/lustre/include/cl_object.h
@@ -140,7 +140,7 @@ struct cl_device_operations {
 	 * cl_req_slice_add().
 	 *
 	 * \see osc_req_init(), lov_req_init(), lovsub_req_init()
-	 * \see ccc_req_init()
+	 * \see vvp_req_init()
 	 */
 	int (*cdo_req_init)(const struct lu_env *env, struct cl_device *dev,
 			    struct cl_req *req);
diff --git a/drivers/staging/lustre/lustre/llite/Makefile b/drivers/staging/lustre/lustre/llite/Makefile
index 24085e2..2ce10ff 100644
--- a/drivers/staging/lustre/lustre/llite/Makefile
+++ b/drivers/staging/lustre/lustre/llite/Makefile
@@ -5,6 +5,7 @@ lustre-y := dcache.o dir.o file.o llite_close.o llite_lib.o llite_nfs.o \
 	    xattr.o xattr_cache.o remote_perm.o llite_rmtacl.o \
 	    rw26.o super25.o statahead.o \
 	    glimpse.o lcommon_cl.o lcommon_misc.o \
-	    vvp_dev.o vvp_page.o vvp_lock.o vvp_io.o vvp_object.o lproc_llite.o
+	    vvp_dev.o vvp_page.o vvp_lock.o vvp_io.o vvp_object.o vvp_req.o \
+	    lproc_llite.o
 
 llite_lloop-y := lloop.o
diff --git a/drivers/staging/lustre/lustre/llite/lcommon_cl.c b/drivers/staging/lustre/lustre/llite/lcommon_cl.c
index 1b11103..d28546a 100644
--- a/drivers/staging/lustre/lustre/llite/lcommon_cl.c
+++ b/drivers/staging/lustre/lustre/llite/lcommon_cl.c
@@ -61,14 +61,11 @@
 
 #include "../llite/llite_internal.h"
 
-static const struct cl_req_operations ccc_req_ops;
-
 /*
  * ccc_ prefix stands for "Common Client Code".
  */
 
 static struct kmem_cache *ccc_thread_kmem;
-static struct kmem_cache *ccc_req_kmem;
 
 static struct lu_kmem_descr ccc_caches[] = {
 	{
@@ -77,11 +74,6 @@ static struct lu_kmem_descr ccc_caches[] = {
 		.ckd_size  = sizeof(struct ccc_thread_info),
 	},
 	{
-		.ckd_cache = &ccc_req_kmem,
-		.ckd_name  = "ccc_req_kmem",
-		.ckd_size  = sizeof(struct ccc_req)
-	},
-	{
 		.ckd_cache = NULL
 	}
 };
@@ -116,22 +108,6 @@ struct lu_context_key ccc_key = {
 	.lct_fini = ccc_key_fini
 };
 
-int ccc_req_init(const struct lu_env *env, struct cl_device *dev,
-		 struct cl_req *req)
-{
-	struct ccc_req *vrq;
-	int result;
-
-	vrq = kmem_cache_zalloc(ccc_req_kmem, GFP_NOFS);
-	if (vrq) {
-		cl_req_slice_add(req, &vrq->crq_cl, dev, &ccc_req_ops);
-		result = 0;
-	} else {
-		result = -ENOMEM;
-	}
-	return result;
-}
-
 /**
  * An `emergency' environment used by ccc_inode_fini() when cl_env_get()
  * fails. Access to this environment is serialized by ccc_inode_fini_guard
@@ -184,75 +160,6 @@ void ccc_global_fini(struct lu_device_type *device_type)
 	lu_kmem_fini(ccc_caches);
 }
 
-/*****************************************************************************
- *
- * Transfer operations.
- *
- */
-
-void ccc_req_completion(const struct lu_env *env,
-			const struct cl_req_slice *slice, int ioret)
-{
-	struct ccc_req *vrq;
-
-	if (ioret > 0)
-		cl_stats_tally(slice->crs_dev, slice->crs_req->crq_type, ioret);
-
-	vrq = cl2ccc_req(slice);
-	kmem_cache_free(ccc_req_kmem, vrq);
-}
-
-/**
- * Implementation of struct cl_req_operations::cro_attr_set() for ccc
- * layer. ccc is responsible for
- *
- *    - o_[mac]time
- *
- *    - o_mode
- *
- *    - o_parent_seq
- *
- *    - o_[ug]id
- *
- *    - o_parent_oid
- *
- *    - o_parent_ver
- *
- *    - o_ioepoch,
- *
- */
-void ccc_req_attr_set(const struct lu_env *env,
-		      const struct cl_req_slice *slice,
-		      const struct cl_object *obj,
-		      struct cl_req_attr *attr, u64 flags)
-{
-	struct inode *inode;
-	struct obdo  *oa;
-	u32	      valid_flags;
-
-	oa = attr->cra_oa;
-	inode = vvp_object_inode(obj);
-	valid_flags = OBD_MD_FLTYPE;
-
-	if (slice->crs_req->crq_type == CRT_WRITE) {
-		if (flags & OBD_MD_FLEPOCH) {
-			oa->o_valid |= OBD_MD_FLEPOCH;
-			oa->o_ioepoch = ll_i2info(inode)->lli_ioepoch;
-			valid_flags |= OBD_MD_FLMTIME | OBD_MD_FLCTIME |
-				       OBD_MD_FLUID | OBD_MD_FLGID;
-		}
-	}
-	obdo_from_inode(oa, inode, valid_flags & flags);
-	obdo_set_parent_fid(oa, &ll_i2info(inode)->lli_fid);
-	memcpy(attr->cra_jobid, ll_i2info(inode)->lli_jobid,
-	       JOBSTATS_JOBID_SIZE);
-}
-
-static const struct cl_req_operations ccc_req_ops = {
-	.cro_attr_set   = ccc_req_attr_set,
-	.cro_completion = ccc_req_completion
-};
-
 int cl_setattr_ost(struct inode *inode, const struct iattr *attr)
 {
 	struct lu_env *env;
@@ -301,17 +208,6 @@ again:
 	return result;
 }
 
-/*****************************************************************************
- *
- * Type conversions.
- *
- */
-
-struct ccc_req *cl2ccc_req(const struct cl_req_slice *slice)
-{
-	return container_of0(slice, struct ccc_req, crq_cl);
-}
-
 /**
  * Initialize or update CLIO structures for regular files when new
  * meta-data arrives from the server.
diff --git a/drivers/staging/lustre/lustre/llite/vvp_dev.c b/drivers/staging/lustre/lustre/llite/vvp_dev.c
index 030a246..5d3beb6 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_dev.c
+++ b/drivers/staging/lustre/lustre/llite/vvp_dev.c
@@ -59,6 +59,7 @@
 
 struct kmem_cache *vvp_lock_kmem;
 struct kmem_cache *vvp_object_kmem;
+struct kmem_cache *vvp_req_kmem;
 static struct kmem_cache *vvp_thread_kmem;
 static struct kmem_cache *vvp_session_kmem;
 static struct lu_kmem_descr vvp_caches[] = {
@@ -73,6 +74,11 @@ static struct lu_kmem_descr vvp_caches[] = {
 		.ckd_size  = sizeof(struct vvp_object),
 	},
 	{
+		.ckd_cache = &vvp_req_kmem,
+		.ckd_name  = "vvp_req_kmem",
+		.ckd_size  = sizeof(struct vvp_req),
+	},
+	{
 		.ckd_cache = &vvp_thread_kmem,
 		.ckd_name  = "vvp_thread_kmem",
 		.ckd_size  = sizeof(struct vvp_thread_info),
@@ -145,7 +151,7 @@ static const struct lu_device_operations vvp_lu_ops = {
 };
 
 static const struct cl_device_operations vvp_cl_ops = {
-	.cdo_req_init = ccc_req_init
+	.cdo_req_init = vvp_req_init
 };
 
 static struct lu_device *vvp_device_free(const struct lu_env *env,
diff --git a/drivers/staging/lustre/lustre/llite/vvp_internal.h b/drivers/staging/lustre/lustre/llite/vvp_internal.h
index 7af8c44..99e7ef3 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_internal.h
+++ b/drivers/staging/lustre/lustre/llite/vvp_internal.h
@@ -169,6 +169,7 @@ extern struct lu_context_key vvp_session_key;
 
 extern struct kmem_cache *vvp_lock_kmem;
 extern struct kmem_cache *vvp_object_kmem;
+extern struct kmem_cache *vvp_req_kmem;
 
 struct ccc_thread_info {
 	struct cl_lock		cti_lock;
@@ -313,8 +314,8 @@ struct vvp_lock {
 	struct cl_lock_slice vlk_cl;
 };
 
-struct ccc_req {
-	struct cl_req_slice  crq_cl;
+struct vvp_req {
+	struct cl_req_slice  vrq_cl;
 };
 
 void *ccc_key_init(const struct lu_context *ctx,
@@ -322,19 +323,10 @@ void *ccc_key_init(const struct lu_context *ctx,
 void ccc_key_fini(const struct lu_context *ctx,
 		  struct lu_context_key *key, void *data);
 
-int ccc_req_init(const struct lu_env *env, struct cl_device *dev,
-		 struct cl_req *req);
 void ccc_umount(const struct lu_env *env, struct cl_device *dev);
 int ccc_global_init(struct lu_device_type *device_type);
 void ccc_global_fini(struct lu_device_type *device_type);
 
-void ccc_req_completion(const struct lu_env *env,
-			const struct cl_req_slice *slice, int ioret);
-void ccc_req_attr_set(const struct lu_env *env,
-		      const struct cl_req_slice *slice,
-		      const struct cl_object *obj,
-		      struct cl_req_attr *oa, u64 flags);
-
 static inline struct lu_device *vvp2lu_dev(struct vvp_device *vdv)
 {
 	return &vdv->vdv_cl.cd_lu_dev;
@@ -378,8 +370,6 @@ static inline struct vvp_lock *cl2vvp_lock(const struct cl_lock_slice *slice)
 	return container_of(slice, struct vvp_lock, vlk_cl);
 }
 
-struct ccc_req *cl2ccc_req(const struct cl_req_slice *slice);
-
 int cl_setattr_ost(struct inode *inode, const struct iattr *attr);
 
 int cl_file_inode_init(struct inode *inode, struct lustre_md *md);
@@ -431,6 +421,8 @@ int vvp_lock_init(const struct lu_env *env, struct cl_object *obj,
 		  struct cl_lock *lock, const struct cl_io *io);
 int vvp_page_init(const struct lu_env *env, struct cl_object *obj,
 		  struct cl_page *page, pgoff_t index);
+int vvp_req_init(const struct lu_env *env, struct cl_device *dev,
+		 struct cl_req *req);
 struct lu_object *vvp_object_alloc(const struct lu_env *env,
 				   const struct lu_object_header *hdr,
 				   struct lu_device *dev);
diff --git a/drivers/staging/lustre/lustre/llite/vvp_req.c b/drivers/staging/lustre/lustre/llite/vvp_req.c
new file mode 100644
index 0000000..fb88629
--- /dev/null
+++ b/drivers/staging/lustre/lustre/llite/vvp_req.c
@@ -0,0 +1,121 @@
+/*
+ * GPL HEADER START
+ *
+ * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 only,
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License version 2 for more details (a copy is included
+ * in the LICENSE file that accompanied this code).
+ *
+ * You should have received a copy of the GNU General Public License
+ * version 2 along with this program; If not, see
+ * http://www.gnu.org/licenses/gpl-2.0.html
+ *
+ * GPL HEADER END
+ */
+/*
+ * Copyright (c) 2008, 2010, Oracle and/or its affiliates. All rights reserved.
+ * Use is subject to license terms.
+ *
+ * Copyright (c) 2011, 2014, Intel Corporation.
+ */
+
+#define DEBUG_SUBSYSTEM S_LLITE
+
+#include "../include/lustre/lustre_idl.h"
+#include "../include/cl_object.h"
+#include "../include/obd.h"
+#include "../include/obd_support.h"
+#include "../include/lustre_lite.h"
+#include "llite_internal.h"
+#include "vvp_internal.h"
+
+static inline struct vvp_req *cl2vvp_req(const struct cl_req_slice *slice)
+{
+	return container_of0(slice, struct vvp_req, vrq_cl);
+}
+
+/**
+ * Implementation of struct cl_req_operations::cro_attr_set() for VVP
+ * layer. VVP is responsible for
+ *
+ *    - o_[mac]time
+ *
+ *    - o_mode
+ *
+ *    - o_parent_seq
+ *
+ *    - o_[ug]id
+ *
+ *    - o_parent_oid
+ *
+ *    - o_parent_ver
+ *
+ *    - o_ioepoch,
+ *
+ */
+void vvp_req_attr_set(const struct lu_env *env,
+		      const struct cl_req_slice *slice,
+		      const struct cl_object *obj,
+		      struct cl_req_attr *attr, u64 flags)
+{
+	struct inode *inode;
+	struct obdo  *oa;
+	u32	      valid_flags;
+
+	oa = attr->cra_oa;
+	inode = vvp_object_inode(obj);
+	valid_flags = OBD_MD_FLTYPE;
+
+	if (slice->crs_req->crq_type == CRT_WRITE) {
+		if (flags & OBD_MD_FLEPOCH) {
+			oa->o_valid |= OBD_MD_FLEPOCH;
+			oa->o_ioepoch = ll_i2info(inode)->lli_ioepoch;
+			valid_flags |= OBD_MD_FLMTIME | OBD_MD_FLCTIME |
+				       OBD_MD_FLUID | OBD_MD_FLGID;
+		}
+	}
+	obdo_from_inode(oa, inode, valid_flags & flags);
+	obdo_set_parent_fid(oa, &ll_i2info(inode)->lli_fid);
+	memcpy(attr->cra_jobid, ll_i2info(inode)->lli_jobid,
+	       JOBSTATS_JOBID_SIZE);
+}
+
+void vvp_req_completion(const struct lu_env *env,
+			const struct cl_req_slice *slice, int ioret)
+{
+	struct vvp_req *vrq;
+
+	if (ioret > 0)
+		cl_stats_tally(slice->crs_dev, slice->crs_req->crq_type, ioret);
+
+	vrq = cl2vvp_req(slice);
+	kmem_cache_free(vvp_req_kmem, vrq);
+}
+
+static const struct cl_req_operations vvp_req_ops = {
+	.cro_attr_set   = vvp_req_attr_set,
+	.cro_completion = vvp_req_completion
+};
+
+int vvp_req_init(const struct lu_env *env, struct cl_device *dev,
+		 struct cl_req *req)
+{
+	struct vvp_req *vrq;
+	int result;
+
+	vrq = kmem_cache_zalloc(vvp_req_kmem, GFP_NOFS);
+	if (vrq) {
+		cl_req_slice_add(req, &vrq->vrq_cl, dev, &vvp_req_ops);
+		result = 0;
+	} else {
+		result = -ENOMEM;
+	}
+	return result;
+}
-- 
2.1.0

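The new vvp_req.c is a textbook instance of the CLIO slice pattern: each
layer embeds a cl_req_slice in its private per-request state, attaches the
slice with cl_req_slice_add(), and recovers the container in its
callbacks. Reduced to its bare shape (a sketch: demo_req, demo_req_kmem
and the demo_* functions are invented names; the cl_* calls are the ones
used above):

#include <linux/slab.h>
#include "cl_object.h"          /* cl_req_slice, cl_req_slice_add(), ... */

static struct kmem_cache *demo_req_kmem;

struct demo_req {
        struct cl_req_slice drq_cl;     /* this layer's slice */
};

static struct demo_req *cl2demo_req(const struct cl_req_slice *slice)
{
        /* Recover the layer object from the generic slice. */
        return container_of(slice, struct demo_req, drq_cl);
}

static void demo_req_completion(const struct lu_env *env,
                                const struct cl_req_slice *slice, int ioret)
{
        /* Each layer frees its own slice when the request completes. */
        kmem_cache_free(demo_req_kmem, cl2demo_req(slice));
}

static const struct cl_req_operations demo_req_ops = {
        .cro_completion = demo_req_completion,
};

static int demo_req_init(const struct lu_env *env, struct cl_device *dev,
                         struct cl_req *req)
{
        struct demo_req *drq = kmem_cache_zalloc(demo_req_kmem, GFP_NOFS);

        if (!drq)
                return -ENOMEM;
        cl_req_slice_add(req, &drq->drq_cl, dev, &demo_req_ops);
        return 0;
}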

* [PATCH v2 34/46] staging/lustre/llite: Rename struct ccc_grouplock to ll_grouplock
  2016-03-30 23:48 [PATCH v2 00/46] Lustre IO stack simplifications and cleanups green
                   ` (32 preceding siblings ...)
  2016-03-30 23:48 ` [PATCH v2 33/46] staging/lustre/llite: rename ccc_req to vvp_req green
@ 2016-03-30 23:48 ` green
  2016-03-30 23:48 ` [PATCH v2 35/46] staging/lustre/llite: Rename struct vvp_thread_info to ll_thread_info green
                   ` (11 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: green @ 2016-03-30 23:48 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger
  Cc: Linux Kernel Mailing List, Lustre Development List, John Hammond,
	Jinshan Xiong, Oleg Drokin

From: John Hammond <john.hammond@intel.com>

And move the definition from vvp_internal.h to llite_internal.h.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-on: http://review.whamcloud.com/13714
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5971
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
---
 drivers/staging/lustre/lustre/llite/file.c         | 16 ++++++-------
 drivers/staging/lustre/lustre/llite/lcommon_misc.c | 26 +++++++++++-----------
 .../staging/lustre/lustre/llite/llite_internal.h   | 14 +++++++++++-
 drivers/staging/lustre/lustre/llite/vvp_internal.h | 11 ---------
 drivers/staging/lustre/lustre/llite/vvp_io.c       |  2 +-
 5 files changed, 35 insertions(+), 34 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/file.c b/drivers/staging/lustre/lustre/llite/file.c
index 63aa080..210d1e2 100644
--- a/drivers/staging/lustre/lustre/llite/file.c
+++ b/drivers/staging/lustre/lustre/llite/file.c
@@ -278,7 +278,7 @@ static int ll_md_close(struct obd_export *md_exp, struct inode *inode,
 
 	/* clear group lock, if present */
 	if (unlikely(fd->fd_flags & LL_FILE_GROUP_LOCKED))
-		ll_put_grouplock(inode, file, fd->fd_grouplock.cg_gid);
+		ll_put_grouplock(inode, file, fd->fd_grouplock.lg_gid);
 
 	if (fd->fd_lease_och) {
 		bool lease_broken;
@@ -1570,7 +1570,7 @@ ll_get_grouplock(struct inode *inode, struct file *file, unsigned long arg)
 {
 	struct ll_inode_info   *lli = ll_i2info(inode);
 	struct ll_file_data    *fd = LUSTRE_FPRIVATE(file);
-	struct ccc_grouplock    grouplock;
+	struct ll_grouplock    grouplock;
 	int		     rc;
 
 	if (arg == 0) {
@@ -1584,11 +1584,11 @@ ll_get_grouplock(struct inode *inode, struct file *file, unsigned long arg)
 	spin_lock(&lli->lli_lock);
 	if (fd->fd_flags & LL_FILE_GROUP_LOCKED) {
 		CWARN("group lock already existed with gid %lu\n",
-		      fd->fd_grouplock.cg_gid);
+		      fd->fd_grouplock.lg_gid);
 		spin_unlock(&lli->lli_lock);
 		return -EINVAL;
 	}
-	LASSERT(!fd->fd_grouplock.cg_lock);
+	LASSERT(!fd->fd_grouplock.lg_lock);
 	spin_unlock(&lli->lli_lock);
 
 	rc = cl_get_grouplock(ll_i2info(inode)->lli_clob,
@@ -1617,7 +1617,7 @@ static int ll_put_grouplock(struct inode *inode, struct file *file,
 {
 	struct ll_inode_info   *lli = ll_i2info(inode);
 	struct ll_file_data    *fd = LUSTRE_FPRIVATE(file);
-	struct ccc_grouplock    grouplock;
+	struct ll_grouplock    grouplock;
 
 	spin_lock(&lli->lli_lock);
 	if (!(fd->fd_flags & LL_FILE_GROUP_LOCKED)) {
@@ -1625,11 +1625,11 @@ static int ll_put_grouplock(struct inode *inode, struct file *file,
 		CWARN("no group lock held\n");
 		return -EINVAL;
 	}
-	LASSERT(fd->fd_grouplock.cg_lock);
+	LASSERT(fd->fd_grouplock.lg_lock);
 
-	if (fd->fd_grouplock.cg_gid != arg) {
+	if (fd->fd_grouplock.lg_gid != arg) {
 		CWARN("group lock %lu doesn't match current id %lu\n",
-		      arg, fd->fd_grouplock.cg_gid);
+		      arg, fd->fd_grouplock.lg_gid);
 		spin_unlock(&lli->lli_lock);
 		return -EINVAL;
 	}
diff --git a/drivers/staging/lustre/lustre/llite/lcommon_misc.c b/drivers/staging/lustre/lustre/llite/lcommon_misc.c
index 68e3db1..5e3e43f 100644
--- a/drivers/staging/lustre/lustre/llite/lcommon_misc.c
+++ b/drivers/staging/lustre/lustre/llite/lcommon_misc.c
@@ -42,8 +42,8 @@
 #include "../include/obd.h"
 #include "../include/cl_object.h"
 
-#include "vvp_internal.h"
 #include "../include/lustre_lite.h"
+#include "llite_internal.h"
 
 /* Initialize the default and maximum LOV EA and cookie sizes.  This allows
  * us to make MDS RPCs with large enough reply buffers to hold the
@@ -126,7 +126,7 @@ int cl_ocd_update(struct obd_device *host,
 #define GROUPLOCK_SCOPE "grouplock"
 
 int cl_get_grouplock(struct cl_object *obj, unsigned long gid, int nonblock,
-		     struct ccc_grouplock *cg)
+		     struct ll_grouplock *cg)
 {
 	struct lu_env	  *env;
 	struct cl_io	   *io;
@@ -172,25 +172,25 @@ int cl_get_grouplock(struct cl_object *obj, unsigned long gid, int nonblock,
 		return rc;
 	}
 
-	cg->cg_env  = cl_env_get(&refcheck);
-	cg->cg_io   = io;
-	cg->cg_lock = lock;
-	cg->cg_gid  = gid;
-	LASSERT(cg->cg_env == env);
+	cg->lg_env  = cl_env_get(&refcheck);
+	cg->lg_io   = io;
+	cg->lg_lock = lock;
+	cg->lg_gid  = gid;
+	LASSERT(cg->lg_env == env);
 
 	cl_env_unplant(env, &refcheck);
 	return 0;
 }
 
-void cl_put_grouplock(struct ccc_grouplock *cg)
+void cl_put_grouplock(struct ll_grouplock *cg)
 {
-	struct lu_env  *env  = cg->cg_env;
-	struct cl_io   *io   = cg->cg_io;
-	struct cl_lock *lock = cg->cg_lock;
+	struct lu_env  *env  = cg->lg_env;
+	struct cl_io   *io   = cg->lg_io;
+	struct cl_lock *lock = cg->lg_lock;
 	int	     refcheck;
 
-	LASSERT(cg->cg_env);
-	LASSERT(cg->cg_gid);
+	LASSERT(cg->lg_env);
+	LASSERT(cg->lg_gid);
 
 	cl_env_implant(env, &refcheck);
 	cl_env_put(env, &refcheck);
diff --git a/drivers/staging/lustre/lustre/llite/llite_internal.h b/drivers/staging/lustre/lustre/llite/llite_internal.h
index 86e93c0..8b943df 100644
--- a/drivers/staging/lustre/lustre/llite/llite_internal.h
+++ b/drivers/staging/lustre/lustre/llite/llite_internal.h
@@ -99,6 +99,13 @@ struct ll_remote_perm {
 					     */
 };
 
+struct ll_grouplock {
+	struct lu_env	*lg_env;
+	struct cl_io	*lg_io;
+	struct cl_lock	*lg_lock;
+	unsigned long	 lg_gid;
+};
+
 enum lli_flags {
 	/* MDS has an authority for the Size-on-MDS attributes. */
 	LLIF_MDS_SIZE_LOCK      = (1 << 0),
@@ -612,7 +619,7 @@ extern struct kmem_cache *ll_file_data_slab;
 struct lustre_handle;
 struct ll_file_data {
 	struct ll_readahead_state fd_ras;
-	struct ccc_grouplock fd_grouplock;
+	struct ll_grouplock fd_grouplock;
 	__u64 lfd_pos;
 	__u32 fd_flags;
 	fmode_t fd_omode;
@@ -655,6 +662,11 @@ static inline int ll_need_32bit_api(struct ll_sb_info *sbi)
 
 void ll_ras_enter(struct file *f);
 
+/* llite/lcommon_misc.c */
+int cl_get_grouplock(struct cl_object *obj, unsigned long gid, int nonblock,
+		     struct ll_grouplock *cg);
+void cl_put_grouplock(struct ll_grouplock *cg);
+
 /* llite/lproc_llite.c */
 int ldebugfs_register_mountpoint(struct dentry *parent,
 				 struct super_block *sb, char *osc, char *mdc);
diff --git a/drivers/staging/lustre/lustre/llite/vvp_internal.h b/drivers/staging/lustre/lustre/llite/vvp_internal.h
index 99e7ef3..4740ff5 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_internal.h
+++ b/drivers/staging/lustre/lustre/llite/vvp_internal.h
@@ -388,17 +388,6 @@ int cl_ocd_update(struct obd_device *host,
 		  struct obd_device *watched,
 		  enum obd_notify_event ev, void *owner, void *data);
 
-struct ccc_grouplock {
-	struct lu_env	*cg_env;
-	struct cl_io	*cg_io;
-	struct cl_lock	*cg_lock;
-	unsigned long	 cg_gid;
-};
-
-int cl_get_grouplock(struct cl_object *obj, unsigned long gid, int nonblock,
-		     struct ccc_grouplock *cg);
-void cl_put_grouplock(struct ccc_grouplock *cg);
-
 /**
  * New interfaces to get and put lov_stripe_md from lov layer. This violates
  * layering because lov_stripe_md is supposed to be a private data in lov.
diff --git a/drivers/staging/lustre/lustre/llite/vvp_io.c b/drivers/staging/lustre/lustre/llite/vvp_io.c
index 48b0693..18bc71b 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_io.c
+++ b/drivers/staging/lustre/lustre/llite/vvp_io.c
@@ -235,7 +235,7 @@ static int vvp_io_one_lock_index(const struct lu_env *env, struct cl_io *io,
 
 	if (vio->vui_fd && (vio->vui_fd->fd_flags & LL_FILE_GROUP_LOCKED)) {
 		descr->cld_mode = CLM_GROUP;
-		descr->cld_gid  = vio->vui_fd->fd_grouplock.cg_gid;
+		descr->cld_gid  = vio->vui_fd->fd_grouplock.lg_gid;
 	} else {
 		descr->cld_mode  = mode;
 	}
-- 
2.1.0

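For orientation, the lg_gid being renamed throughout this patch is the
group id that userspace supplies when it takes a Lustre group lock on a
file; the ll_put_grouplock() hunk above rejects an unlock whose gid does
not match. A hedged userspace sketch, assuming the standard
LL_IOC_GROUP_{LOCK,UNLOCK} ioctls from lustre_user.h; the path handling
and the gid value are invented:

#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <lustre/lustre_user.h>         /* LL_IOC_GROUP_LOCK et al. */

int demo_grouplock(const char *path)
{
        unsigned long gid = 1234;       /* arbitrary group id */
        int rc, fd = open(path, O_RDWR);

        if (fd < 0)
                return -1;

        /* Sets LL_FILE_GROUP_LOCKED and fd_grouplock.lg_gid kernel-side. */
        rc = ioctl(fd, LL_IOC_GROUP_LOCK, gid);
        if (rc == 0) {
                /* ... IO performed under the group lock ... */

                /* The same gid must be passed back; see the lg_gid
                 * check in ll_put_grouplock() above.
                 */
                rc = ioctl(fd, LL_IOC_GROUP_UNLOCK, gid);
        }
        close(fd);
        return rc;
}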

* [PATCH v2 35/46] staging/lustre/llite: Rename struct vvp_thread_info to ll_thread_info
  2016-03-30 23:48 [PATCH v2 00/46] Lustre IO stack simplifications and cleanups green
                   ` (33 preceding siblings ...)
  2016-03-30 23:48 ` [PATCH v2 34/46] staging/lustre/llite: Rename struct ccc_grouplock to ll_grouplock green
@ 2016-03-30 23:48 ` green
  2016-03-30 23:48 ` [PATCH v2 36/46] staging/lustre/llite: rename struct ccc_thread_info to vvp_thread_info green
                   ` (10 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: green @ 2016-03-30 23:48 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger
  Cc: Linux Kernel Mailing List, Lustre Development List, John Hammond,
	Jinshan Xiong, Oleg Drokin

From: John Hammond <john.hammond@intel.com>

struct vvp_thread_info is used in the non-VVP parts of llite, so rename
it to struct ll_thread_info. Rename the supporting functions accordingly.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-on: http://review.whamcloud.com/13714
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5971
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
---
 drivers/staging/lustre/lustre/llite/file.c         |  6 ++--
 .../staging/lustre/lustre/llite/llite_internal.h   | 30 ++++++++---------
 drivers/staging/lustre/lustre/llite/rw.c           |  6 ++--
 drivers/staging/lustre/lustre/llite/vvp_dev.c      | 38 +++++++++++-----------
 4 files changed, 40 insertions(+), 40 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/file.c b/drivers/staging/lustre/lustre/llite/file.c
index 210d1e2..f7aabac 100644
--- a/drivers/staging/lustre/lustre/llite/file.c
+++ b/drivers/staging/lustre/lustre/llite/file.c
@@ -1221,7 +1221,7 @@ static ssize_t ll_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 	if (IS_ERR(env))
 		return PTR_ERR(env);
 
-	args = vvp_env_args(env, IO_NORMAL);
+	args = ll_env_args(env, IO_NORMAL);
 	args->u.normal.via_iter = to;
 	args->u.normal.via_iocb = iocb;
 
@@ -1245,7 +1245,7 @@ static ssize_t ll_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	if (IS_ERR(env))
 		return PTR_ERR(env);
 
-	args = vvp_env_args(env, IO_NORMAL);
+	args = ll_env_args(env, IO_NORMAL);
 	args->u.normal.via_iter = from;
 	args->u.normal.via_iocb = iocb;
 
@@ -1271,7 +1271,7 @@ static ssize_t ll_file_splice_read(struct file *in_file, loff_t *ppos,
 	if (IS_ERR(env))
 		return PTR_ERR(env);
 
-	args = vvp_env_args(env, IO_SPLICE);
+	args = ll_env_args(env, IO_SPLICE);
 	args->u.splice.via_pipe = pipe;
 	args->u.splice.via_flags = flags;
 
diff --git a/drivers/staging/lustre/lustre/llite/llite_internal.h b/drivers/staging/lustre/lustre/llite/llite_internal.h
index 8b943df..a6ee2fe 100644
--- a/drivers/staging/lustre/lustre/llite/llite_internal.h
+++ b/drivers/staging/lustre/lustre/llite/llite_internal.h
@@ -855,30 +855,30 @@ struct ll_cl_context {
 	int	     lcc_refcheck;
 };
 
-struct vvp_thread_info {
-	struct vvp_io_args   vti_args;
-	struct ra_io_arg     vti_ria;
-	struct ll_cl_context vti_io_ctx;
+struct ll_thread_info {
+	struct vvp_io_args   lti_args;
+	struct ra_io_arg     lti_ria;
+	struct ll_cl_context lti_io_ctx;
 };
 
-static inline struct vvp_thread_info *vvp_env_info(const struct lu_env *env)
+extern struct lu_context_key ll_thread_key;
+static inline struct ll_thread_info *ll_env_info(const struct lu_env *env)
 {
-	extern struct lu_context_key vvp_key;
-	struct vvp_thread_info      *info;
+	struct ll_thread_info *lti;
 
-	info = lu_context_key_get(&env->le_ctx, &vvp_key);
-	LASSERT(info);
-	return info;
+	lti = lu_context_key_get(&env->le_ctx, &ll_thread_key);
+	LASSERT(lti);
+	return lti;
 }
 
-static inline struct vvp_io_args *vvp_env_args(const struct lu_env *env,
-					       enum vvp_io_subtype type)
+static inline struct vvp_io_args *ll_env_args(const struct lu_env *env,
+					      enum vvp_io_subtype type)
 {
-	struct vvp_io_args *ret = &vvp_env_info(env)->vti_args;
+	struct vvp_io_args *via = &ll_env_info(env)->lti_args;
 
-	ret->via_io_subtype = type;
+	via->via_io_subtype = type;
 
-	return ret;
+	return via;
 }
 
 int vvp_global_init(void);
diff --git a/drivers/staging/lustre/lustre/llite/rw.c b/drivers/staging/lustre/lustre/llite/rw.c
index 634f0bb..f9bc3e4 100644
--- a/drivers/staging/lustre/lustre/llite/rw.c
+++ b/drivers/staging/lustre/lustre/llite/rw.c
@@ -102,7 +102,7 @@ struct ll_cl_context *ll_cl_init(struct file *file, struct page *vmpage)
 	if (IS_ERR(env))
 		return ERR_CAST(env);
 
-	lcc = &vvp_env_info(env)->vti_io_ctx;
+	lcc = &ll_env_info(env)->lti_io_ctx;
 	memset(lcc, 0, sizeof(*lcc));
 	lcc->lcc_env = env;
 	lcc->lcc_refcheck = refcheck;
@@ -523,12 +523,12 @@ int ll_readahead(const struct lu_env *env, struct cl_io *io,
 		 bool hit)
 {
 	struct vvp_io *vio = vvp_env_io(env);
-	struct vvp_thread_info *vti = vvp_env_info(env);
+	struct ll_thread_info *lti = ll_env_info(env);
 	struct cl_attr *attr = ccc_env_thread_attr(env);
 	unsigned long start = 0, end = 0, reserved;
 	unsigned long ra_end, len, mlen = 0;
 	struct inode *inode;
-	struct ra_io_arg *ria = &vti->vti_ria;
+	struct ra_io_arg *ria = &lti->lti_ria;
 	struct cl_object *clob;
 	int ret = 0;
 	__u64 kms;
diff --git a/drivers/staging/lustre/lustre/llite/vvp_dev.c b/drivers/staging/lustre/lustre/llite/vvp_dev.c
index 5d3beb6..2278433 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_dev.c
+++ b/drivers/staging/lustre/lustre/llite/vvp_dev.c
@@ -57,13 +57,18 @@
  * "llite_" (var. "ll_") prefix.
  */
 
+static struct kmem_cache *ll_thread_kmem;
 struct kmem_cache *vvp_lock_kmem;
 struct kmem_cache *vvp_object_kmem;
 struct kmem_cache *vvp_req_kmem;
-static struct kmem_cache *vvp_thread_kmem;
 static struct kmem_cache *vvp_session_kmem;
 static struct lu_kmem_descr vvp_caches[] = {
 	{
+		.ckd_cache = &ll_thread_kmem,
+		.ckd_name  = "ll_thread_kmem",
+		.ckd_size  = sizeof(struct ll_thread_info),
+	},
+	{
 		.ckd_cache = &vvp_lock_kmem,
 		.ckd_name  = "vvp_lock_kmem",
 		.ckd_size  = sizeof(struct vvp_lock),
@@ -79,11 +84,6 @@ static struct lu_kmem_descr vvp_caches[] = {
 		.ckd_size  = sizeof(struct vvp_req),
 	},
 	{
-		.ckd_cache = &vvp_thread_kmem,
-		.ckd_name  = "vvp_thread_kmem",
-		.ckd_size  = sizeof(struct vvp_thread_info),
-	},
-	{
 		.ckd_cache = &vvp_session_kmem,
 		.ckd_name  = "vvp_session_kmem",
 		.ckd_size  = sizeof(struct vvp_session)
@@ -93,25 +93,31 @@ static struct lu_kmem_descr vvp_caches[] = {
 	}
 };
 
-static void *vvp_key_init(const struct lu_context *ctx,
-			  struct lu_context_key *key)
+static void *ll_thread_key_init(const struct lu_context *ctx,
+				struct lu_context_key *key)
 {
 	struct vvp_thread_info *info;
 
-	info = kmem_cache_zalloc(vvp_thread_kmem, GFP_NOFS);
+	info = kmem_cache_zalloc(ll_thread_kmem, GFP_NOFS);
 	if (!info)
 		info = ERR_PTR(-ENOMEM);
 	return info;
 }
 
-static void vvp_key_fini(const struct lu_context *ctx,
-			 struct lu_context_key *key, void *data)
+static void ll_thread_key_fini(const struct lu_context *ctx,
+			       struct lu_context_key *key, void *data)
 {
 	struct vvp_thread_info *info = data;
 
-	kmem_cache_free(vvp_thread_kmem, info);
+	kmem_cache_free(ll_thread_kmem, info);
 }
 
+struct lu_context_key ll_thread_key = {
+	.lct_tags = LCT_CL_THREAD,
+	.lct_init = ll_thread_key_init,
+	.lct_fini = ll_thread_key_fini
+};
+
 static void *vvp_session_key_init(const struct lu_context *ctx,
 				  struct lu_context_key *key)
 {
@@ -131,12 +137,6 @@ static void vvp_session_key_fini(const struct lu_context *ctx,
 	kmem_cache_free(vvp_session_kmem, session);
 }
 
-struct lu_context_key vvp_key = {
-	.lct_tags = LCT_CL_THREAD,
-	.lct_init = vvp_key_init,
-	.lct_fini = vvp_key_fini
-};
-
 struct lu_context_key vvp_session_key = {
 	.lct_tags = LCT_SESSION,
 	.lct_init = vvp_session_key_init,
@@ -144,7 +144,7 @@ struct lu_context_key vvp_session_key = {
 };
 
 /* type constructor/destructor: vvp_type_{init,fini,start,stop}(). */
-LU_TYPE_INIT_FINI(vvp, &ccc_key, &vvp_key, &vvp_session_key);
+LU_TYPE_INIT_FINI(vvp, &ccc_key, &ll_thread_key, &vvp_session_key);
 
 static const struct lu_device_operations vvp_lu_ops = {
 	.ldo_object_alloc      = vvp_object_alloc
-- 
2.1.0

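The ll_thread_key introduced here follows the usual lu_context recipe for
per-thread scratch memory: ->lct_init allocates one ll_thread_info per
context from a slab, ->lct_fini frees it, and lu_context_key_get() fetches
it in O(1) with no locking. The payoff shows in the ll_readahead() hunk
above, which borrows the per-thread ra_io_arg rather than allocating one.
A small usage sketch in the same spirit (demo_ra_setup() is an invented
name; the accessor is the ll_env_info() added by this patch):

#include "llite_internal.h"     /* ll_env_info(), struct ra_io_arg */

static void demo_ra_setup(const struct lu_env *env, pgoff_t start,
                          pgoff_t end)
{
        /* No allocation on the hot path; reuse this thread's buffer. */
        struct ra_io_arg *ria = &ll_env_info(env)->lti_ria;

        memset(ria, 0, sizeof(*ria));
        /* ... fill in the readahead window [start, end] and issue it ... */
}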

* [PATCH v2 36/46] staging/lustre/llite: rename struct ccc_thread_info to vvp_thread_info
  2016-03-30 23:48 [PATCH v2 00/46] Lustre IO stack simplifications and cleanups green
                   ` (34 preceding siblings ...)
  2016-03-30 23:48 ` [PATCH v2 35/46] staging/lustre/llite: Rename struct vvp_thread_info to ll_thread_info green
@ 2016-03-30 23:48 ` green
  2016-03-30 23:48 ` [PATCH v2 37/46] staging/lustre/llite: Remove ccc_global_{init,fini}() green
                   ` (9 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: green @ 2016-03-30 23:48 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger
  Cc: Linux Kernel Mailing List, Lustre Development List, John Hammond,
	Jinshan Xiong, Oleg Drokin

From: John Hammond <john.hammond@intel.com>

struct ccc_thread_info is used in the VVP parts of llite, so rename it
to struct vvp_thread_info. Rename the supporting functions accordingly,
and move the init code from lcommon_cl.c to vvp_dev.c.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-on: http://review.whamcloud.com/13714
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5971
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
---
 drivers/staging/lustre/lustre/llite/file.c         |  6 +--
 drivers/staging/lustre/lustre/llite/glimpse.c      |  6 +--
 drivers/staging/lustre/lustre/llite/lcommon_cl.c   | 48 +---------------------
 drivers/staging/lustre/lustre/llite/lcommon_misc.c |  4 +-
 drivers/staging/lustre/lustre/llite/llite_mmap.c   |  2 +-
 drivers/staging/lustre/lustre/llite/rw.c           |  4 +-
 drivers/staging/lustre/lustre/llite/rw26.c         |  2 +-
 drivers/staging/lustre/lustre/llite/vvp_dev.c      | 34 ++++++++++++++-
 drivers/staging/lustre/lustre/llite/vvp_internal.h | 34 +++++++--------
 drivers/staging/lustre/lustre/llite/vvp_io.c       |  8 ++--
 10 files changed, 68 insertions(+), 80 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/file.c b/drivers/staging/lustre/lustre/llite/file.c
index f7aabac..69b56a8 100644
--- a/drivers/staging/lustre/lustre/llite/file.c
+++ b/drivers/staging/lustre/lustre/llite/file.c
@@ -998,7 +998,7 @@ int ll_merge_attr(const struct lu_env *env, struct inode *inode)
 {
 	struct ll_inode_info *lli = ll_i2info(inode);
 	struct cl_object *obj = lli->lli_clob;
-	struct cl_attr *attr = ccc_env_thread_attr(env);
+	struct cl_attr *attr = vvp_env_thread_attr(env);
 	s64 atime;
 	s64 mtime;
 	s64 ctime;
@@ -1131,7 +1131,7 @@ ll_file_io_generic(const struct lu_env *env, struct vvp_io_args *args,
 	       file->f_path.dentry->d_name.name, iot, *ppos, count);
 
 restart:
-	io = ccc_env_thread_io(env);
+	io = vvp_env_thread_io(env);
 	ll_io_init(io, file, iot == CIT_WRITE);
 
 	if (cl_io_rw_init(env, io, iot, *ppos, count) == 0) {
@@ -2612,7 +2612,7 @@ int cl_sync_file_range(struct inode *inode, loff_t start, loff_t end,
 	if (IS_ERR(env))
 		return PTR_ERR(env);
 
-	io = ccc_env_thread_io(env);
+	io = vvp_env_thread_io(env);
 	io->ci_obj = ll_i2info(inode)->lli_clob;
 	io->ci_ignore_layout = ignore_layout;
 
diff --git a/drivers/staging/lustre/lustre/llite/glimpse.c b/drivers/staging/lustre/lustre/llite/glimpse.c
index d76fa16..d8ea754 100644
--- a/drivers/staging/lustre/lustre/llite/glimpse.c
+++ b/drivers/staging/lustre/lustre/llite/glimpse.c
@@ -93,7 +93,7 @@ int cl_glimpse_lock(const struct lu_env *env, struct cl_io *io,
 	if (!(lli->lli_flags & LLIF_MDS_SIZE_LOCK)) {
 		CDEBUG(D_DLMTRACE, "Glimpsing inode " DFID "\n", PFID(fid));
 		if (lli->lli_has_smd) {
-			struct cl_lock *lock = ccc_env_lock(env);
+			struct cl_lock *lock = vvp_env_lock(env);
 			struct cl_lock_descr *descr = &lock->cll_descr;
 
 			/* NOTE: this looks like DLM lock request, but it may
@@ -163,7 +163,7 @@ static int cl_io_get(struct inode *inode, struct lu_env **envout,
 	if (S_ISREG(inode->i_mode)) {
 		env = cl_env_get(refcheck);
 		if (!IS_ERR(env)) {
-			io = ccc_env_thread_io(env);
+			io = vvp_env_thread_io(env);
 			io->ci_obj = clob;
 			*envout = env;
 			*ioout  = io;
@@ -238,7 +238,7 @@ int cl_local_size(struct inode *inode)
 	if (result > 0) {
 		result = io->ci_result;
 	} else if (result == 0) {
-		struct cl_lock *lock = ccc_env_lock(env);
+		struct cl_lock *lock = vvp_env_lock(env);
 
 		lock->cll_descr = whole_file;
 		lock->cll_descr.cld_enq_flags = CEF_PEEK;
diff --git a/drivers/staging/lustre/lustre/llite/lcommon_cl.c b/drivers/staging/lustre/lustre/llite/lcommon_cl.c
index d28546a..5b523e33 100644
--- a/drivers/staging/lustre/lustre/llite/lcommon_cl.c
+++ b/drivers/staging/lustre/lustre/llite/lcommon_cl.c
@@ -65,49 +65,12 @@
  * ccc_ prefix stands for "Common Client Code".
  */
 
-static struct kmem_cache *ccc_thread_kmem;
-
-static struct lu_kmem_descr ccc_caches[] = {
-	{
-		.ckd_cache = &ccc_thread_kmem,
-		.ckd_name  = "ccc_thread_kmem",
-		.ckd_size  = sizeof(struct ccc_thread_info),
-	},
-	{
-		.ckd_cache = NULL
-	}
-};
-
 /*****************************************************************************
  *
  * Vvp device and device type functions.
  *
  */
 
-void *ccc_key_init(const struct lu_context *ctx, struct lu_context_key *key)
-{
-	struct ccc_thread_info *info;
-
-	info = kmem_cache_zalloc(ccc_thread_kmem, GFP_NOFS);
-	if (!info)
-		info = ERR_PTR(-ENOMEM);
-	return info;
-}
-
-void ccc_key_fini(const struct lu_context *ctx,
-		  struct lu_context_key *key, void *data)
-{
-	struct ccc_thread_info *info = data;
-
-	kmem_cache_free(ccc_thread_kmem, info);
-}
-
-struct lu_context_key ccc_key = {
-	.lct_tags = LCT_CL_THREAD,
-	.lct_init = ccc_key_init,
-	.lct_fini = ccc_key_fini
-};
-
 /**
  * An `emergency' environment used by ccc_inode_fini() when cl_env_get()
  * fails. Access to this environment is serialized by ccc_inode_fini_guard
@@ -126,13 +89,9 @@ int ccc_global_init(struct lu_device_type *device_type)
 {
 	int result;
 
-	result = lu_kmem_init(ccc_caches);
-	if (result)
-		return result;
-
 	result = lu_device_type_init(device_type);
 	if (result)
-		goto out_kmem;
+		return result;
 
 	ccc_inode_fini_env = cl_env_alloc(&dummy_refcheck,
 					  LCT_REMEMBER | LCT_NOREF);
@@ -145,8 +104,6 @@ int ccc_global_init(struct lu_device_type *device_type)
 	return 0;
 out_device:
 	lu_device_type_fini(device_type);
-out_kmem:
-	lu_kmem_fini(ccc_caches);
 	return result;
 }
 
@@ -157,7 +114,6 @@ void ccc_global_fini(struct lu_device_type *device_type)
 		ccc_inode_fini_env = NULL;
 	}
 	lu_device_type_fini(device_type);
-	lu_kmem_fini(ccc_caches);
 }
 
 int cl_setattr_ost(struct inode *inode, const struct iattr *attr)
@@ -171,7 +127,7 @@ int cl_setattr_ost(struct inode *inode, const struct iattr *attr)
 	if (IS_ERR(env))
 		return PTR_ERR(env);
 
-	io = ccc_env_thread_io(env);
+	io = vvp_env_thread_io(env);
 	io->ci_obj = ll_i2info(inode)->lli_clob;
 
 	io->u.ci_setattr.sa_attr.lvb_atime = LTIME_S(attr->ia_atime);
diff --git a/drivers/staging/lustre/lustre/llite/lcommon_misc.c b/drivers/staging/lustre/lustre/llite/lcommon_misc.c
index 5e3e43f..12f3e71 100644
--- a/drivers/staging/lustre/lustre/llite/lcommon_misc.c
+++ b/drivers/staging/lustre/lustre/llite/lcommon_misc.c
@@ -140,7 +140,7 @@ int cl_get_grouplock(struct cl_object *obj, unsigned long gid, int nonblock,
 	if (IS_ERR(env))
 		return PTR_ERR(env);
 
-	io = ccc_env_thread_io(env);
+	io = vvp_env_thread_io(env);
 	io->ci_obj = obj;
 	io->ci_ignore_layout = 1;
 
@@ -154,7 +154,7 @@ int cl_get_grouplock(struct cl_object *obj, unsigned long gid, int nonblock,
 		return rc;
 	}
 
-	lock = ccc_env_lock(env);
+	lock = vvp_env_lock(env);
 	descr = &lock->cll_descr;
 	descr->cld_obj = obj;
 	descr->cld_start = 0;
diff --git a/drivers/staging/lustre/lustre/llite/llite_mmap.c b/drivers/staging/lustre/lustre/llite/llite_mmap.c
index a7693c5..83d7006 100644
--- a/drivers/staging/lustre/lustre/llite/llite_mmap.c
+++ b/drivers/staging/lustre/lustre/llite/llite_mmap.c
@@ -123,7 +123,7 @@ ll_fault_io_init(struct vm_area_struct *vma, struct lu_env **env_ret,
 
 	*env_ret = env;
 
-	io = ccc_env_thread_io(env);
+	io = vvp_env_thread_io(env);
 	io->ci_obj = ll_i2info(inode)->lli_clob;
 	LASSERT(io->ci_obj);
 
diff --git a/drivers/staging/lustre/lustre/llite/rw.c b/drivers/staging/lustre/lustre/llite/rw.c
index f9bc3e4..7d5dd38 100644
--- a/drivers/staging/lustre/lustre/llite/rw.c
+++ b/drivers/staging/lustre/lustre/llite/rw.c
@@ -524,7 +524,7 @@ int ll_readahead(const struct lu_env *env, struct cl_io *io,
 {
 	struct vvp_io *vio = vvp_env_io(env);
 	struct ll_thread_info *lti = ll_env_info(env);
-	struct cl_attr *attr = ccc_env_thread_attr(env);
+	struct cl_attr *attr = vvp_env_thread_attr(env);
 	unsigned long start = 0, end = 0, reserved;
 	unsigned long ra_end, len, mlen = 0;
 	struct inode *inode;
@@ -999,7 +999,7 @@ int ll_writepage(struct page *vmpage, struct writeback_control *wbc)
 	clob  = ll_i2info(inode)->lli_clob;
 	LASSERT(clob);
 
-	io = ccc_env_thread_io(env);
+	io = vvp_env_thread_io(env);
 	io->ci_obj = clob;
 	io->ci_ignore_layout = 1;
 	result = cl_io_init(env, io, CIT_MISC, clob);
diff --git a/drivers/staging/lustre/lustre/llite/rw26.c b/drivers/staging/lustre/lustre/llite/rw26.c
index 106473e..65baeeb 100644
--- a/drivers/staging/lustre/lustre/llite/rw26.c
+++ b/drivers/staging/lustre/lustre/llite/rw26.c
@@ -455,7 +455,7 @@ out:
 static int ll_prepare_partial_page(const struct lu_env *env, struct cl_io *io,
 				   struct cl_page *pg)
 {
-	struct cl_attr *attr   = ccc_env_thread_attr(env);
+	struct cl_attr *attr   = vvp_env_thread_attr(env);
 	struct cl_object *obj  = io->ci_obj;
 	struct vvp_page *vpg   = cl_object_page_slice(obj, pg);
 	loff_t          offset = cl_offset(obj, vvp_index(vpg));
diff --git a/drivers/staging/lustre/lustre/llite/vvp_dev.c b/drivers/staging/lustre/lustre/llite/vvp_dev.c
index 2278433..b33cd35 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_dev.c
+++ b/drivers/staging/lustre/lustre/llite/vvp_dev.c
@@ -62,6 +62,8 @@ struct kmem_cache *vvp_lock_kmem;
 struct kmem_cache *vvp_object_kmem;
 struct kmem_cache *vvp_req_kmem;
 static struct kmem_cache *vvp_session_kmem;
+static struct kmem_cache *vvp_thread_kmem;
+
 static struct lu_kmem_descr vvp_caches[] = {
 	{
 		.ckd_cache = &ll_thread_kmem,
@@ -89,6 +91,11 @@ static struct lu_kmem_descr vvp_caches[] = {
 		.ckd_size  = sizeof(struct vvp_session)
 	},
 	{
+		.ckd_cache = &vvp_thread_kmem,
+		.ckd_name  = "vvp_thread_kmem",
+		.ckd_size  = sizeof(struct vvp_thread_info),
+	},
+	{
 		.ckd_cache = NULL
 	}
 };
@@ -143,8 +150,33 @@ struct lu_context_key vvp_session_key = {
 	.lct_fini = vvp_session_key_fini
 };
 
+void *vvp_thread_key_init(const struct lu_context *ctx,
+			  struct lu_context_key *key)
+{
+	struct vvp_thread_info *vti;
+
+	vti = kmem_cache_zalloc(vvp_thread_kmem, GFP_NOFS);
+	if (!vti)
+		vti = ERR_PTR(-ENOMEM);
+	return vti;
+}
+
+void vvp_thread_key_fini(const struct lu_context *ctx,
+			 struct lu_context_key *key, void *data)
+{
+	struct vvp_thread_info *vti = data;
+
+	kmem_cache_free(vvp_thread_kmem, vti);
+}
+
+struct lu_context_key vvp_thread_key = {
+	.lct_tags = LCT_CL_THREAD,
+	.lct_init = vvp_thread_key_init,
+	.lct_fini = vvp_thread_key_fini
+};
+
 /* type constructor/destructor: vvp_type_{init,fini,start,stop}(). */
-LU_TYPE_INIT_FINI(vvp, &ccc_key, &ll_thread_key, &vvp_session_key);
+LU_TYPE_INIT_FINI(vvp, &vvp_thread_key, &ll_thread_key, &vvp_session_key);
 
 static const struct lu_device_operations vvp_lu_ops = {
 	.ldo_object_alloc      = vvp_object_alloc
diff --git a/drivers/staging/lustre/lustre/llite/vvp_internal.h b/drivers/staging/lustre/lustre/llite/vvp_internal.h
index 4740ff5..0e15202 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_internal.h
+++ b/drivers/staging/lustre/lustre/llite/vvp_internal.h
@@ -164,50 +164,50 @@ struct vvp_io {
 	bool		vui_ra_valid;
 };
 
-extern struct lu_context_key ccc_key;
 extern struct lu_context_key vvp_session_key;
+extern struct lu_context_key vvp_thread_key;
 
 extern struct kmem_cache *vvp_lock_kmem;
 extern struct kmem_cache *vvp_object_kmem;
 extern struct kmem_cache *vvp_req_kmem;
 
-struct ccc_thread_info {
-	struct cl_lock		cti_lock;
-	struct cl_lock_descr	cti_descr;
-	struct cl_io		cti_io;
-	struct cl_attr		cti_attr;
+struct vvp_thread_info {
+	struct cl_lock		vti_lock;
+	struct cl_lock_descr	vti_descr;
+	struct cl_io		vti_io;
+	struct cl_attr		vti_attr;
 };
 
-static inline struct ccc_thread_info *ccc_env_info(const struct lu_env *env)
+static inline struct vvp_thread_info *vvp_env_info(const struct lu_env *env)
 {
-	struct ccc_thread_info      *info;
+	struct vvp_thread_info      *vti;
 
-	info = lu_context_key_get(&env->le_ctx, &ccc_key);
-	LASSERT(info);
+	vti = lu_context_key_get(&env->le_ctx, &vvp_thread_key);
+	LASSERT(vti);
 
-	return info;
+	return vti;
 }
 
-static inline struct cl_lock *ccc_env_lock(const struct lu_env *env)
+static inline struct cl_lock *vvp_env_lock(const struct lu_env *env)
 {
-	struct cl_lock *lock = &ccc_env_info(env)->cti_lock;
+	struct cl_lock *lock = &vvp_env_info(env)->vti_lock;
 
 	memset(lock, 0, sizeof(*lock));
 	return lock;
 }
 
-static inline struct cl_attr *ccc_env_thread_attr(const struct lu_env *env)
+static inline struct cl_attr *vvp_env_thread_attr(const struct lu_env *env)
 {
-	struct cl_attr *attr = &ccc_env_info(env)->cti_attr;
+	struct cl_attr *attr = &vvp_env_info(env)->vti_attr;
 
 	memset(attr, 0, sizeof(*attr));
 
 	return attr;
 }
 
-static inline struct cl_io *ccc_env_thread_io(const struct lu_env *env)
+static inline struct cl_io *vvp_env_thread_io(const struct lu_env *env)
 {
-	struct cl_io *io = &ccc_env_info(env)->cti_io;
+	struct cl_io *io = &vvp_env_info(env)->vti_io;
 
 	memset(io, 0, sizeof(*io));
 
diff --git a/drivers/staging/lustre/lustre/llite/vvp_io.c b/drivers/staging/lustre/lustre/llite/vvp_io.c
index 18bc71b..366ac1c 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_io.c
+++ b/drivers/staging/lustre/lustre/llite/vvp_io.c
@@ -135,7 +135,7 @@ static int vvp_prep_size(const struct lu_env *env, struct cl_object *obj,
 			 struct cl_io *io, loff_t start, size_t count,
 			 int *exceed)
 {
-	struct cl_attr *attr  = ccc_env_thread_attr(env);
+	struct cl_attr *attr  = vvp_env_thread_attr(env);
 	struct inode   *inode = vvp_object_inode(obj);
 	loff_t	  pos   = start + count - 1;
 	loff_t kms;
@@ -382,10 +382,10 @@ static enum cl_lock_mode vvp_mode_from_vma(struct vm_area_struct *vma)
 static int vvp_mmap_locks(const struct lu_env *env,
 			  struct vvp_io *vio, struct cl_io *io)
 {
-	struct ccc_thread_info *cti = ccc_env_info(env);
+	struct vvp_thread_info *cti = vvp_env_info(env);
 	struct mm_struct       *mm = current->mm;
 	struct vm_area_struct  *vma;
-	struct cl_lock_descr   *descr = &cti->cti_descr;
+	struct cl_lock_descr   *descr = &cti->vti_descr;
 	ldlm_policy_data_t      policy;
 	unsigned long	   addr;
 	ssize_t		 count;
@@ -621,7 +621,7 @@ static int vvp_io_setattr_time(const struct lu_env *env,
 {
 	struct cl_io       *io    = ios->cis_io;
 	struct cl_object   *obj   = io->ci_obj;
-	struct cl_attr     *attr  = ccc_env_thread_attr(env);
+	struct cl_attr     *attr  = vvp_env_thread_attr(env);
 	int result;
 	unsigned valid = CAT_CTIME;
 
-- 
2.1.0

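One property of the renamed accessors worth noting: vvp_env_lock(),
vvp_env_thread_attr() and vvp_env_thread_io() each memset their per-thread
object before returning it, so callers always start from a zeroed
cl_lock/cl_attr/cl_io and no state leaks between unrelated operations on
the same thread. The converted callers above all share one shape; a sketch
of it follows (demo_misc_io() is invented, error handling is trimmed):

#include "llite_internal.h"     /* ll_i2info() */
#include "vvp_internal.h"       /* vvp_env_thread_io() */

static int demo_misc_io(struct inode *inode)
{
        struct lu_env *env;
        struct cl_io *io;
        int refcheck;
        int rc;

        env = cl_env_get(&refcheck);
        if (IS_ERR(env))
                return PTR_ERR(env);

        io = vvp_env_thread_io(env);    /* zeroed scratch cl_io */
        io->ci_obj = ll_i2info(inode)->lli_clob;
        io->ci_ignore_layout = 1;

        rc = cl_io_init(env, io, CIT_MISC, io->ci_obj);
        if (rc == 0) {
                /* ... the actual work against the initialized io ... */
        }
        cl_io_fini(env, io);

        cl_env_put(env, &refcheck);
        return rc;
}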

* [PATCH v2 37/46] staging/lustre/llite: Remove ccc_global_{init,fini}()
  2016-03-30 23:48 [PATCH v2 00/46] Lustre IO stack simplifications and cleanups green
                   ` (35 preceding siblings ...)
  2016-03-30 23:48 ` [PATCH v2 36/46] staging/lustre/llite: rename struct ccc_thread_info to vvp_thread_info green
@ 2016-03-30 23:48 ` green
  2016-03-30 23:48 ` [PATCH v2 38/46] staging/lustre/llite: Move ll_dirent_type_get and make it static green
                   ` (8 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: green @ 2016-03-30 23:48 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger
  Cc: Linux Kernel Mailing List, Lustre Development List, John Hammond,
	Jinshan Xiong, Oleg Drokin

From: John Hammond <john.hammond@intel.com>

Merge their contents into vvp_global_{init,fini}() and
lustre_{init,exit}().
Rename ccc_inode_fini_* to cl_inode_fini_*.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-on: http://review.whamcloud.com/13714
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5971
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
---
 drivers/staging/lustre/lustre/llite/lcommon_cl.c   | 53 +++++-----------------
 .../staging/lustre/lustre/llite/llite_internal.h   |  7 +--
 drivers/staging/lustre/lustre/llite/super25.c      | 14 +++++-
 drivers/staging/lustre/lustre/llite/vvp_dev.c      | 25 ++++++----
 drivers/staging/lustre/lustre/llite/vvp_internal.h |  4 +-
 5 files changed, 46 insertions(+), 57 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/lcommon_cl.c b/drivers/staging/lustre/lustre/llite/lcommon_cl.c
index 5b523e33..164737a 100644
--- a/drivers/staging/lustre/lustre/llite/lcommon_cl.c
+++ b/drivers/staging/lustre/lustre/llite/lcommon_cl.c
@@ -72,49 +72,18 @@
  */
 
 /**
- * An `emergency' environment used by ccc_inode_fini() when cl_env_get()
- * fails. Access to this environment is serialized by ccc_inode_fini_guard
+ * An `emergency' environment used by cl_inode_fini() when cl_env_get()
+ * fails. Access to this environment is serialized by cl_inode_fini_guard
  * mutex.
  */
-static struct lu_env *ccc_inode_fini_env;
+struct lu_env *cl_inode_fini_env;
+int cl_inode_fini_refcheck;
 
 /**
  * A mutex serializing calls to slp_inode_fini() under extreme memory
  * pressure, when environments cannot be allocated.
  */
-static DEFINE_MUTEX(ccc_inode_fini_guard);
-static int dummy_refcheck;
-
-int ccc_global_init(struct lu_device_type *device_type)
-{
-	int result;
-
-	result = lu_device_type_init(device_type);
-	if (result)
-		return result;
-
-	ccc_inode_fini_env = cl_env_alloc(&dummy_refcheck,
-					  LCT_REMEMBER | LCT_NOREF);
-	if (IS_ERR(ccc_inode_fini_env)) {
-		result = PTR_ERR(ccc_inode_fini_env);
-		goto out_device;
-	}
-
-	ccc_inode_fini_env->le_ctx.lc_cookie = 0x4;
-	return 0;
-out_device:
-	lu_device_type_fini(device_type);
-	return result;
-}
-
-void ccc_global_fini(struct lu_device_type *device_type)
-{
-	if (ccc_inode_fini_env) {
-		cl_env_put(ccc_inode_fini_env, &dummy_refcheck);
-		ccc_inode_fini_env = NULL;
-	}
-	lu_device_type_fini(device_type);
-}
+static DEFINE_MUTEX(cl_inode_fini_guard);
 
 int cl_setattr_ost(struct inode *inode, const struct iattr *attr)
 {
@@ -286,10 +255,10 @@ void cl_inode_fini(struct inode *inode)
 		env = cl_env_get(&refcheck);
 		emergency = IS_ERR(env);
 		if (emergency) {
-			mutex_lock(&ccc_inode_fini_guard);
-			LASSERT(ccc_inode_fini_env);
-			cl_env_implant(ccc_inode_fini_env, &refcheck);
-			env = ccc_inode_fini_env;
+			mutex_lock(&cl_inode_fini_guard);
+			LASSERT(cl_inode_fini_env);
+			cl_env_implant(cl_inode_fini_env, &refcheck);
+			env = cl_inode_fini_env;
 		}
 		/*
 		 * cl_object cache is a slave to inode cache (which, in turn
@@ -301,8 +270,8 @@ void cl_inode_fini(struct inode *inode)
 		cl_object_put_last(env, clob);
 		lli->lli_clob = NULL;
 		if (emergency) {
-			cl_env_unplant(ccc_inode_fini_env, &refcheck);
-			mutex_unlock(&ccc_inode_fini_guard);
+			cl_env_unplant(cl_inode_fini_env, &refcheck);
+			mutex_unlock(&cl_inode_fini_guard);
 		} else {
 			cl_env_put(env, &refcheck);
 		}
diff --git a/drivers/staging/lustre/lustre/llite/llite_internal.h b/drivers/staging/lustre/lustre/llite/llite_internal.h
index a6ee2fe..993cee8 100644
--- a/drivers/staging/lustre/lustre/llite/llite_internal.h
+++ b/drivers/staging/lustre/lustre/llite/llite_internal.h
@@ -984,9 +984,6 @@ void free_rmtperm_hash(struct hlist_head *hash);
 int ll_update_remote_perm(struct inode *inode, struct mdt_remote_perm *perm);
 int lustre_check_remote_perm(struct inode *inode, int mask);
 
-/* llite/llite_cl.c */
-extern struct lu_device_type vvp_device_type;
-
 /**
  * Common IO arguments for various VFS I/O interfaces.
  */
@@ -1371,4 +1368,8 @@ void ll_xattr_fini(void);
 int ll_page_sync_io(const struct lu_env *env, struct cl_io *io,
 		    struct cl_page *page, enum cl_req_type crt);
 
+/* lcommon_cl.c */
+extern struct lu_env *cl_inode_fini_env;
+extern int cl_inode_fini_refcheck;
+
 #endif /* LLITE_INTERNAL_H */
diff --git a/drivers/staging/lustre/lustre/llite/super25.c b/drivers/staging/lustre/lustre/llite/super25.c
index 61856d3..415750b 100644
--- a/drivers/staging/lustre/lustre/llite/super25.c
+++ b/drivers/staging/lustre/lustre/llite/super25.c
@@ -164,9 +164,18 @@ static int __init lustre_init(void)
 	if (rc != 0)
 		goto out_sysfs;
 
+	cl_inode_fini_env = cl_env_alloc(&cl_inode_fini_refcheck,
+					 LCT_REMEMBER | LCT_NOREF);
+	if (IS_ERR(cl_inode_fini_env)) {
+		rc = PTR_ERR(cl_inode_fini_env);
+		goto out_vvp;
+	}
+
+	cl_inode_fini_env->le_ctx.lc_cookie = 0x4;
+
 	rc = ll_xattr_init();
 	if (rc != 0)
-		goto out_vvp;
+		goto out_inode_fini_env;
 
 	lustre_register_client_fill_super(ll_fill_super);
 	lustre_register_kill_super_cb(ll_kill_super);
@@ -174,6 +183,8 @@ static int __init lustre_init(void)
 
 	return 0;
 
+out_inode_fini_env:
+	cl_env_put(cl_inode_fini_env, &cl_inode_fini_refcheck);
 out_vvp:
 	vvp_global_fini();
 out_sysfs:
@@ -198,6 +209,7 @@ static void __exit lustre_exit(void)
 	kset_unregister(llite_kset);
 
 	ll_xattr_fini();
+	cl_env_put(cl_inode_fini_env, &cl_inode_fini_refcheck);
 	vvp_global_fini();
 
 	kmem_cache_destroy(ll_inode_cachep);
diff --git a/drivers/staging/lustre/lustre/llite/vvp_dev.c b/drivers/staging/lustre/lustre/llite/vvp_dev.c
index b33cd35..e35c1a1 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_dev.c
+++ b/drivers/staging/lustre/lustre/llite/vvp_dev.c
@@ -293,20 +293,27 @@ struct lu_device_type vvp_device_type = {
  */
 int vvp_global_init(void)
 {
-	int result;
+	int rc;
 
-	result = lu_kmem_init(vvp_caches);
-	if (result == 0) {
-		result = ccc_global_init(&vvp_device_type);
-		if (result != 0)
-			lu_kmem_fini(vvp_caches);
-	}
-	return result;
+	rc = lu_kmem_init(vvp_caches);
+	if (rc != 0)
+		return rc;
+
+	rc = lu_device_type_init(&vvp_device_type);
+	if (rc != 0)
+		goto out_kmem;
+
+	return 0;
+
+out_kmem:
+	lu_kmem_fini(vvp_caches);
+
+	return rc;
 }
 
 void vvp_global_fini(void)
 {
-	ccc_global_fini(&vvp_device_type);
+	lu_device_type_fini(&vvp_device_type);
 	lu_kmem_fini(vvp_caches);
 }
 
diff --git a/drivers/staging/lustre/lustre/llite/vvp_internal.h b/drivers/staging/lustre/lustre/llite/vvp_internal.h
index 0e15202..fe29fb5 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_internal.h
+++ b/drivers/staging/lustre/lustre/llite/vvp_internal.h
@@ -164,6 +164,8 @@ struct vvp_io {
 	bool		vui_ra_valid;
 };
 
+extern struct lu_device_type vvp_device_type;
+
 extern struct lu_context_key vvp_session_key;
 extern struct lu_context_key vvp_thread_key;
 
@@ -324,8 +326,6 @@ void ccc_key_fini(const struct lu_context *ctx,
 		  struct lu_context_key *key, void *data);
 
 void ccc_umount(const struct lu_env *env, struct cl_device *dev);
-int ccc_global_init(struct lu_device_type *device_type);
-void ccc_global_fini(struct lu_device_type *device_type);
 
 static inline struct lu_device *vvp2lu_dev(struct vvp_device *vdv)
 {
-- 
2.1.0

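The reworked lustre_init() and vvp_global_init() error paths both follow
the standard kernel goto-unwind discipline: acquire resources in order,
and on failure jump to a label that tears down exactly what has succeeded
so far, in reverse. Reduced to a sketch (every demo_* name is invented;
the comments map the steps to the calls in this patch):

#include <linux/init.h>

static int __init demo_init(void)
{
        int rc;

        rc = demo_setup_caches();       /* cf. lu_kmem_init() */
        if (rc != 0)
                return rc;

        rc = demo_setup_device_type();  /* cf. lu_device_type_init() */
        if (rc != 0)
                goto out_caches;

        rc = demo_setup_env();          /* cf. cl_env_alloc() */
        if (rc != 0)
                goto out_device_type;

        return 0;

out_device_type:
        demo_teardown_device_type();
out_caches:
        demo_teardown_caches();
        return rc;
}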

* [PATCH v2 38/46] staging/lustre/llite: Move ll_dirent_type_get and make it static
  2016-03-30 23:48 [PATCH v2 00/46] Lustre IO stack simplifications and cleanups green
                   ` (36 preceding siblings ...)
  2016-03-30 23:48 ` [PATCH v2 37/46] staging/lustre/llite: Remove ccc_global_{init,fini}() green
@ 2016-03-30 23:48 ` green
  2016-03-30 23:49 ` [PATCH v2 39/46] staging/lustre/llite: Move several declarations to llite_internal.h green
                   ` (7 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: green @ 2016-03-30 23:48 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger
  Cc: Linux Kernel Mailing List, Lustre Development List, Oleg Drokin

From: Oleg Drokin <green@linuxhacker.ru>

ll_dirent_type_get is only used in one place in llite/dir.c,
so move it there.

Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
---
 drivers/staging/lustre/lustre/llite/dir.c          | 22 ++++++++++++++++++++++
 drivers/staging/lustre/lustre/llite/lcommon_cl.c   | 22 ----------------------
 drivers/staging/lustre/lustre/llite/vvp_internal.h |  1 -
 3 files changed, 22 insertions(+), 23 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/dir.c b/drivers/staging/lustre/lustre/llite/dir.c
index 2ca4b0e..b085fb4 100644
--- a/drivers/staging/lustre/lustre/llite/dir.c
+++ b/drivers/staging/lustre/lustre/llite/dir.c
@@ -469,6 +469,28 @@ fail:
 	goto out_unlock;
 }
 
+/**
+ * Return the IF_* type for a given lu_dirent entry.
+ * The IF_* flag should be converted to the particular OS file type in
+ * the platform llite module.
+ */
+static __u16 ll_dirent_type_get(struct lu_dirent *ent)
+{
+	__u16 type = 0;
+	struct luda_type *lt;
+	int len = 0;
+
+	if (le32_to_cpu(ent->lde_attrs) & LUDA_TYPE) {
+		const unsigned int align = sizeof(struct luda_type) - 1;
+
+		len = le16_to_cpu(ent->lde_namelen);
+		len = (len + align) & ~align;
+		lt = (void *)ent->lde_name + len;
+		type = IFTODT(le16_to_cpu(lt->lt_type));
+	}
+	return type;
+}
+
 int ll_dir_read(struct inode *inode, struct dir_context *ctx)
 {
 	struct ll_inode_info *info       = ll_i2info(inode);
diff --git a/drivers/staging/lustre/lustre/llite/lcommon_cl.c b/drivers/staging/lustre/lustre/llite/lcommon_cl.c
index 164737a..6c00715 100644
--- a/drivers/staging/lustre/lustre/llite/lcommon_cl.c
+++ b/drivers/staging/lustre/lustre/llite/lcommon_cl.c
@@ -280,28 +280,6 @@ void cl_inode_fini(struct inode *inode)
 }
 
 /**
- * return IF_* type for given lu_dirent entry.
- * IF_* flag shld be converted to particular OS file type in
- * platform llite module.
- */
-__u16 ll_dirent_type_get(struct lu_dirent *ent)
-{
-	__u16 type = 0;
-	struct luda_type *lt;
-	int len = 0;
-
-	if (le32_to_cpu(ent->lde_attrs) & LUDA_TYPE) {
-		const unsigned int align = sizeof(struct luda_type) - 1;
-
-		len = le16_to_cpu(ent->lde_namelen);
-		len = (len + align) & ~align;
-		lt = (void *)ent->lde_name + len;
-		type = IFTODT(le16_to_cpu(lt->lt_type));
-	}
-	return type;
-}
-
-/**
  * build inode number from passed @fid
  */
 __u64 cl_fid_build_ino(const struct lu_fid *fid, int api32)
diff --git a/drivers/staging/lustre/lustre/llite/vvp_internal.h b/drivers/staging/lustre/lustre/llite/vvp_internal.h
index fe29fb5..06e2726 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_internal.h
+++ b/drivers/staging/lustre/lustre/llite/vvp_internal.h
@@ -376,7 +376,6 @@ int cl_file_inode_init(struct inode *inode, struct lustre_md *md);
 void cl_inode_fini(struct inode *inode);
 int cl_local_size(struct inode *inode);
 
-__u16 ll_dirent_type_get(struct lu_dirent *ent);
 __u64 cl_fid_build_ino(const struct lu_fid *fid, int api32);
 __u32 cl_fid_build_gen(const struct lu_fid *fid);
 
-- 
2.1.0

* [PATCH v2 39/46] staging/lustre/llite: Move several declarations to llite_internal.h
  2016-03-30 23:48 [PATCH v2 00/46] Lustre IO stack simplifications and cleanups green
                   ` (37 preceding siblings ...)
  2016-03-30 23:48 ` [PATCH v2 38/46] staging/lustre/llite: Move ll_dirent_type_get and make it static green
@ 2016-03-30 23:49 ` green
  2016-03-30 23:49 ` [PATCH v2 40/46] staging/lustre/llite: Remove unused vui_local_lock field green
                   ` (6 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: green @ 2016-03-30 23:49 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger
  Cc: Linux Kernel Mailing List, Lustre Development List, John Hammond,
	Jinshan Xiong, Oleg Drokin

From: John Hammond <john.hammond@intel.com>

Move several declarations between llite_internal.h and vvp_internal.h
with the goal of reserving the latter header for functions that
pertain to vvp_{device,object,page,...}.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-on: http://review.whamcloud.com/13714
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5971
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
---
 .../staging/lustre/lustre/llite/llite_internal.h   | 32 +++++++++++++++++--
 drivers/staging/lustre/lustre/llite/vvp_internal.h | 36 ++--------------------
 2 files changed, 32 insertions(+), 36 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/llite_internal.h b/drivers/staging/lustre/lustre/llite/llite_internal.h
index 993cee8..ba24f09 100644
--- a/drivers/staging/lustre/lustre/llite/llite_internal.h
+++ b/drivers/staging/lustre/lustre/llite/llite_internal.h
@@ -663,6 +663,10 @@ static inline int ll_need_32bit_api(struct ll_sb_info *sbi)
 void ll_ras_enter(struct file *f);
 
 /* llite/lcommon_misc.c */
+int cl_init_ea_size(struct obd_export *md_exp, struct obd_export *dt_exp);
+int cl_ocd_update(struct obd_device *host,
+		  struct obd_device *watched,
+		  enum obd_notify_event ev, void *owner, void *data);
 int cl_get_grouplock(struct cl_object *obj, unsigned long gid, int nonblock,
 		     struct ll_grouplock *cg);
 void cl_put_grouplock(struct ll_grouplock *cg);
@@ -881,9 +885,6 @@ static inline struct vvp_io_args *ll_env_args(const struct lu_env *env,
 	return via;
 }
 
-int vvp_global_init(void);
-void vvp_global_fini(void);
-
 void ll_queue_done_writing(struct inode *inode, unsigned long flags);
 void ll_close_thread_shutdown(struct ll_close_queue *lcq);
 int ll_close_thread_start(struct ll_close_queue **lcq_ret);
@@ -1089,6 +1090,22 @@ int do_statahead_enter(struct inode *dir, struct dentry **dentry,
 		       int only_unplug);
 void ll_stop_statahead(struct inode *dir, void *key);
 
+blkcnt_t dirty_cnt(struct inode *inode);
+
+int cl_glimpse_size0(struct inode *inode, int agl);
+int cl_glimpse_lock(const struct lu_env *env, struct cl_io *io,
+		    struct inode *inode, struct cl_object *clob, int agl);
+
+static inline int cl_glimpse_size(struct inode *inode)
+{
+	return cl_glimpse_size0(inode, 0);
+}
+
+static inline int cl_agl(struct inode *inode)
+{
+	return cl_glimpse_size0(inode, 1);
+}
+
 static inline int ll_glimpse_size(struct inode *inode)
 {
 	struct ll_inode_info *lli = ll_i2info(inode);
@@ -1369,7 +1386,16 @@ int ll_page_sync_io(const struct lu_env *env, struct cl_io *io,
 		    struct cl_page *page, enum cl_req_type crt);
 
 /* lcommon_cl.c */
+int cl_setattr_ost(struct inode *inode, const struct iattr *attr);
+
 extern struct lu_env *cl_inode_fini_env;
 extern int cl_inode_fini_refcheck;
 
+int cl_file_inode_init(struct inode *inode, struct lustre_md *md);
+void cl_inode_fini(struct inode *inode);
+int cl_local_size(struct inode *inode);
+
+__u64 cl_fid_build_ino(const struct lu_fid *fid, int api32);
+__u32 cl_fid_build_gen(const struct lu_fid *fid);
+
 #endif /* LLITE_INTERNAL_H */
diff --git a/drivers/staging/lustre/lustre/llite/vvp_internal.h b/drivers/staging/lustre/lustre/llite/vvp_internal.h
index 06e2726..ce4aeca 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_internal.h
+++ b/drivers/staging/lustre/lustre/llite/vvp_internal.h
@@ -53,25 +53,6 @@ struct obd_device;
 struct obd_export;
 struct page;
 
-blkcnt_t dirty_cnt(struct inode *inode);
-
-int cl_glimpse_size0(struct inode *inode, int agl);
-int cl_glimpse_lock(const struct lu_env *env, struct cl_io *io,
-		    struct inode *inode, struct cl_object *clob, int agl);
-
-static inline int cl_glimpse_size(struct inode *inode)
-{
-	return cl_glimpse_size0(inode, 0);
-}
-
-static inline int cl_agl(struct inode *inode)
-{
-	return cl_glimpse_size0(inode, 1);
-}
-
-/**
- * Locking policy for setattr.
- */
 enum ccc_setattr_lock_type {
 	/** Locking is done by server */
 	SETATTR_NOLOCK,
@@ -370,23 +351,9 @@ static inline struct vvp_lock *cl2vvp_lock(const struct cl_lock_slice *slice)
 	return container_of(slice, struct vvp_lock, vlk_cl);
 }
 
-int cl_setattr_ost(struct inode *inode, const struct iattr *attr);
-
-int cl_file_inode_init(struct inode *inode, struct lustre_md *md);
-void cl_inode_fini(struct inode *inode);
-int cl_local_size(struct inode *inode);
-
-__u64 cl_fid_build_ino(const struct lu_fid *fid, int api32);
-__u32 cl_fid_build_gen(const struct lu_fid *fid);
-
 # define CLOBINVRNT(env, clob, expr)					\
 	((void)sizeof(env), (void)sizeof(clob), (void)sizeof(!!(expr)))
 
-int cl_init_ea_size(struct obd_export *md_exp, struct obd_export *dt_exp);
-int cl_ocd_update(struct obd_device *host,
-		  struct obd_device *watched,
-		  enum obd_notify_event ev, void *owner, void *data);
-
 /**
  * New interfaces to get and put lov_stripe_md from lov layer. This violates
  * layering because lov_stripe_md is supposed to be a private data in lov.
@@ -415,6 +382,9 @@ struct lu_object *vvp_object_alloc(const struct lu_env *env,
 				   const struct lu_object_header *hdr,
 				   struct lu_device *dev);
 
+int vvp_global_init(void);
+void vvp_global_fini(void);
+
 extern const struct file_operations vvp_dump_pgcache_file_ops;
 
 #endif /* VVP_INTERNAL_H */
-- 
2.1.0

* [PATCH v2 40/46] staging/lustre/llite: Remove unused vui_local_lock field
  2016-03-30 23:48 [PATCH v2 00/46] Lustre IO stack simplifications and cleanups green
                   ` (38 preceding siblings ...)
  2016-03-30 23:49 ` [PATCH v2 39/46] staging/lustre/llite: Move several declarations to llite_internal.h green
@ 2016-03-30 23:49 ` green
  2016-03-30 23:49 ` [PATCH v2 41/46] staging/lustre/ldlm: ELC picks locks in a safer policy green
                   ` (5 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: green @ 2016-03-30 23:49 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger
  Cc: Linux Kernel Mailing List, Lustre Development List, Oleg Drokin

From: Oleg Drokin <green@linuxhacker.ru>

vvp_io_setattr_lock() is the only place that sets it, but the field is
never checked anywhere, so it can go away.
Also get rid of enum ccc_setattr_lock_type, which becomes unused.

Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
---
 drivers/staging/lustre/lustre/llite/vvp_internal.h | 12 ------------
 drivers/staging/lustre/lustre/llite/vvp_io.c       |  3 ---
 2 files changed, 15 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/vvp_internal.h b/drivers/staging/lustre/lustre/llite/vvp_internal.h
index ce4aeca..27b9b0a 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_internal.h
+++ b/drivers/staging/lustre/lustre/llite/vvp_internal.h
@@ -53,15 +53,6 @@ struct obd_device;
 struct obd_export;
 struct page;
 
-enum ccc_setattr_lock_type {
-	/** Locking is done by server */
-	SETATTR_NOLOCK,
-	/** Extent lock is enqueued */
-	SETATTR_EXTENT_LOCK,
-	/** Existing local extent lock is used */
-	SETATTR_MATCH_LOCK
-};
-
 /* specific architecture can implement only part of this list */
 enum vvp_io_subtype {
 	/** normal IO */
@@ -112,9 +103,6 @@ struct vvp_io {
 			bool		ft_flags_valid;
 		} fault;
 		struct {
-			enum ccc_setattr_lock_type vui_local_lock;
-		} setattr;
-		struct {
 			struct pipe_inode_info	*vui_pipe;
 			unsigned int		 vui_flags;
 		} splice;
diff --git a/drivers/staging/lustre/lustre/llite/vvp_io.c b/drivers/staging/lustre/lustre/llite/vvp_io.c
index 366ac1c..aed7b8e 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_io.c
+++ b/drivers/staging/lustre/lustre/llite/vvp_io.c
@@ -567,7 +567,6 @@ static int vvp_io_setattr_iter_init(const struct lu_env *env,
 static int vvp_io_setattr_lock(const struct lu_env *env,
 			       const struct cl_io_slice *ios)
 {
-	struct vvp_io *vio = vvp_env_io(env);
 	struct cl_io  *io  = ios->cis_io;
 	__u64 new_size;
 	__u32 enqflags = 0;
@@ -585,8 +584,6 @@ static int vvp_io_setattr_lock(const struct lu_env *env,
 		new_size = 0;
 	}
 
-	vio->u.setattr.vui_local_lock = SETATTR_EXTENT_LOCK;
-
 	return vvp_io_one_lock(env, io, enqflags, CLM_WRITE,
 			       new_size, OBD_OBJECT_EOF);
 }
-- 
2.1.0

* [PATCH v2 41/46] staging/lustre/ldlm: ELC picks locks in a safer policy
  2016-03-30 23:48 [PATCH v2 00/46] Lustre IO stack simplifications and cleanups green
                   ` (39 preceding siblings ...)
  2016-03-30 23:49 ` [PATCH v2 40/46] staging/lustre/llite: Remove unused vui_local_lock field green
@ 2016-03-30 23:49 ` green
  2016-03-30 23:49 ` [PATCH v2 42/46] staging/lustre/ldlm: revert changes to ldlm_cancel_aged_policy() green
                   ` (4 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: green @ 2016-03-30 23:49 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger
  Cc: Linux Kernel Mailing List, Lustre Development List,
	Jinshan Xiong, Oleg Drokin

From: Jinshan Xiong <jinshan.xiong@intel.com>

Change the ELC policy to pick locks that have no dirty pages, no pages
in writeback state, and no locked pages.
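
Condensed, the new page check amounts to the following (an illustration
using bare page flags; the actual code below goes through the cl_page
wrappers such as cl_page_is_vmlocked()):

#include <linux/page-flags.h>

/* a lock becomes unsafe to cancel as soon as one of its pages is pinned */
static bool page_pins_lock(struct page *vmpage)
{
	return PageLocked(vmpage) || PageDirty(vmpage) ||
	       PageWriteback(vmpage);
}

The weigh callback aborts its page scan on the first such page, so the
lock stays in the LRU instead of being picked by ELC.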

Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-on: http://review.whamcloud.com/9175
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4300
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
---
 drivers/staging/lustre/lustre/include/lustre_dlm.h | 13 ++++++----
 drivers/staging/lustre/lustre/ldlm/ldlm_request.c  | 28 +++++++++++++++-------
 drivers/staging/lustre/lustre/mdc/mdc_request.c    |  4 ++--
 drivers/staging/lustre/lustre/osc/osc_lock.c       |  4 +++-
 drivers/staging/lustre/lustre/osc/osc_request.c    | 19 ++++++---------
 5 files changed, 39 insertions(+), 29 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/lustre_dlm.h b/drivers/staging/lustre/lustre/include/lustre_dlm.h
index b1abdc2..9cade14 100644
--- a/drivers/staging/lustre/lustre/include/lustre_dlm.h
+++ b/drivers/staging/lustre/lustre/include/lustre_dlm.h
@@ -270,7 +270,7 @@ struct ldlm_pool {
 	struct completion	 pl_kobj_unregister;
 };
 
-typedef int (*ldlm_cancel_for_recovery)(struct ldlm_lock *lock);
+typedef int (*ldlm_cancel_cbt)(struct ldlm_lock *lock);
 
 /**
  * LVB operations.
@@ -447,8 +447,11 @@ struct ldlm_namespace {
 	/** Limit of parallel AST RPC count. */
 	unsigned		ns_max_parallel_ast;
 
-	/** Callback to cancel locks before replaying it during recovery. */
-	ldlm_cancel_for_recovery ns_cancel_for_recovery;
+	/**
+	 * Callback to check if a lock is good to be canceled by ELC or
+	 * during recovery.
+	 */
+	ldlm_cancel_cbt		ns_cancel;
 
 	/** LDLM lock stats */
 	struct lprocfs_stats	*ns_stats;
@@ -480,9 +483,9 @@ static inline int ns_connect_lru_resize(struct ldlm_namespace *ns)
 }
 
 static inline void ns_register_cancel(struct ldlm_namespace *ns,
-				      ldlm_cancel_for_recovery arg)
+				      ldlm_cancel_cbt arg)
 {
-	ns->ns_cancel_for_recovery = arg;
+	ns->ns_cancel = arg;
 }
 
 struct ldlm_lock;
diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_request.c b/drivers/staging/lustre/lustre/ldlm/ldlm_request.c
index 42925ac..2f12194 100644
--- a/drivers/staging/lustre/lustre/ldlm/ldlm_request.c
+++ b/drivers/staging/lustre/lustre/ldlm/ldlm_request.c
@@ -1137,7 +1137,6 @@ static ldlm_policy_res_t ldlm_cancel_no_wait_policy(struct ldlm_namespace *ns,
 						    int count)
 {
 	ldlm_policy_res_t result = LDLM_POLICY_CANCEL_LOCK;
-	ldlm_cancel_for_recovery cb = ns->ns_cancel_for_recovery;
 
 	/* don't check added & count since we want to process all locks
 	 * from unused list.
@@ -1147,7 +1146,7 @@ static ldlm_policy_res_t ldlm_cancel_no_wait_policy(struct ldlm_namespace *ns,
 	switch (lock->l_resource->lr_type) {
 	case LDLM_EXTENT:
 	case LDLM_IBITS:
-		if (cb && cb(lock))
+			if (ns->ns_cancel && ns->ns_cancel(lock) != 0)
 			break;
 	default:
 		result = LDLM_POLICY_SKIP_LOCK;
@@ -1197,8 +1196,13 @@ static ldlm_policy_res_t ldlm_cancel_lrur_policy(struct ldlm_namespace *ns,
 	/* Stop when SLV is not yet come from server or lv is smaller than
 	 * it is.
 	 */
-	return (slv == 0 || lv < slv) ?
-		LDLM_POLICY_KEEP_LOCK : LDLM_POLICY_CANCEL_LOCK;
+	if (slv == 0 || lv < slv)
+		return LDLM_POLICY_KEEP_LOCK;
+
+	if (ns->ns_cancel && ns->ns_cancel(lock) == 0)
+		return LDLM_POLICY_KEEP_LOCK;
+
+	return LDLM_POLICY_CANCEL_LOCK;
 }
 
 /**
@@ -1236,11 +1240,17 @@ static ldlm_policy_res_t ldlm_cancel_aged_policy(struct ldlm_namespace *ns,
 						 int unused, int added,
 						 int count)
 {
-	/* Stop LRU processing if young lock is found and we reach past count */
-	return ((added >= count) &&
-		time_before(cfs_time_current(),
-			    cfs_time_add(lock->l_last_used, ns->ns_max_age))) ?
-		LDLM_POLICY_KEEP_LOCK : LDLM_POLICY_CANCEL_LOCK;
+	if (added >= count)
+		return LDLM_POLICY_KEEP_LOCK;
+
+	if (time_before(cfs_time_current(),
+			cfs_time_add(lock->l_last_used, ns->ns_max_age)))
+		return LDLM_POLICY_KEEP_LOCK;
+
+	if (ns->ns_cancel && ns->ns_cancel(lock) == 0)
+		return LDLM_POLICY_KEEP_LOCK;
+
+	return LDLM_POLICY_CANCEL_LOCK;
 }
 
 /**
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_request.c b/drivers/staging/lustre/lustre/mdc/mdc_request.c
index 55dd8ef..98b27f1 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_request.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_request.c
@@ -2249,7 +2249,7 @@ static struct obd_uuid *mdc_get_uuid(struct obd_export *exp)
  * recovery, non zero value will be return if the lock can be canceled,
  * or zero returned for not
  */
-static int mdc_cancel_for_recovery(struct ldlm_lock *lock)
+static int mdc_cancel_weight(struct ldlm_lock *lock)
 {
 	if (lock->l_resource->lr_type != LDLM_IBITS)
 		return 0;
@@ -2331,7 +2331,7 @@ static int mdc_setup(struct obd_device *obd, struct lustre_cfg *cfg)
 	sptlrpc_lprocfs_cliobd_attach(obd);
 	ptlrpc_lprocfs_register_obd(obd);
 
-	ns_register_cancel(obd->obd_namespace, mdc_cancel_for_recovery);
+	ns_register_cancel(obd->obd_namespace, mdc_cancel_weight);
 
 	obd->obd_namespace->ns_lvbo = &inode_lvbo;
 
diff --git a/drivers/staging/lustre/lustre/osc/osc_lock.c b/drivers/staging/lustre/lustre/osc/osc_lock.c
index 68c5013..49dfe9f 100644
--- a/drivers/staging/lustre/lustre/osc/osc_lock.c
+++ b/drivers/staging/lustre/lustre/osc/osc_lock.c
@@ -635,7 +635,9 @@ static int weigh_cb(const struct lu_env *env, struct cl_io *io,
 {
 	struct cl_page *page = ops->ops_cl.cpl_page;
 
-	if (cl_page_is_vmlocked(env, page)) {
+	if (cl_page_is_vmlocked(env, page) ||
+	    PageDirty(page->cp_vmpage) || PageWriteback(page->cp_vmpage)
+	   ) {
 		(*(unsigned long *)cbdata)++;
 		return CLP_GANG_ABORT;
 	}
diff --git a/drivers/staging/lustre/lustre/osc/osc_request.c b/drivers/staging/lustre/lustre/osc/osc_request.c
index 368b997..a6dc517 100644
--- a/drivers/staging/lustre/lustre/osc/osc_request.c
+++ b/drivers/staging/lustre/lustre/osc/osc_request.c
@@ -2292,15 +2292,13 @@ no_match:
 	if (*flags & LDLM_FL_TEST_LOCK)
 		return -ENOLCK;
 	if (intent) {
-		LIST_HEAD(cancels);
-
 		req = ptlrpc_request_alloc(class_exp2cliimp(exp),
 					   &RQF_LDLM_ENQUEUE_LVB);
 		if (!req)
 			return -ENOMEM;
 
-		rc = ldlm_prep_enqueue_req(exp, req, &cancels, 0);
-		if (rc) {
+		rc = ptlrpc_request_pack(req, LUSTRE_DLM_VERSION, LDLM_ENQUEUE);
+		if (rc < 0) {
 			ptlrpc_request_free(req);
 			return rc;
 		}
@@ -3110,17 +3108,14 @@ static int osc_import_event(struct obd_device *obd,
  * \retval zero the lock can't be canceled
  * \retval other ok to cancel
  */
-static int osc_cancel_for_recovery(struct ldlm_lock *lock)
+static int osc_cancel_weight(struct ldlm_lock *lock)
 {
 	/*
-	 * Cancel all unused extent lock in granted mode LCK_PR or LCK_CR.
-	 *
-	 * XXX as a future improvement, we can also cancel unused write lock
-	 * if it doesn't have dirty data and active mmaps.
+	 * Cancel all unused and granted extent lock.
 	 */
 	if (lock->l_resource->lr_type == LDLM_EXTENT &&
-	    (lock->l_granted_mode == LCK_PR ||
-	     lock->l_granted_mode == LCK_CR) && osc_ldlm_weigh_ast(lock) == 0)
+	    lock->l_granted_mode == lock->l_req_mode &&
+	    osc_ldlm_weigh_ast(lock) == 0)
 		return 1;
 
 	return 0;
@@ -3197,7 +3192,7 @@ int osc_setup(struct obd_device *obd, struct lustre_cfg *lcfg)
 	}
 
 	INIT_LIST_HEAD(&cli->cl_grant_shrink_list);
-	ns_register_cancel(obd->obd_namespace, osc_cancel_for_recovery);
+	ns_register_cancel(obd->obd_namespace, osc_cancel_weight);
 	return rc;
 
 out_ptlrpcd_work:
-- 
2.1.0

* [PATCH v2 42/46] staging/lustre/ldlm: revert changes to ldlm_cancel_aged_policy()
  2016-03-30 23:48 [PATCH v2 00/46] Lustre IO stack simplifications and cleanups green
                   ` (40 preceding siblings ...)
  2016-03-30 23:49 ` [PATCH v2 41/46] staging/lustre/ldlm: ELC picks locks in a safer policy green
@ 2016-03-30 23:49 ` green
  2016-03-30 23:49 ` [PATCH v2 43/46] staging/lustre/ldlm: restore the ELC for enqueue green
                   ` (3 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: green @ 2016-03-30 23:49 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger
  Cc: Linux Kernel Mailing List, Lustre Development List, Niu Yawei,
	Oleg Drokin

From: Niu Yawei <yawei.niu@intel.com>

The changes to ldlm_cancel_aged_policy() introduced by LU-4300 were
incorrect: with the two conditions split, the policy could stop the LRU
scan before the requested number of locks had been canceled, and it no
longer canceled aged locks past that number. This patch reverts that
part of the changes, restoring the combined age-and-count check.
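
In boolean form (an editor's condensation of the two versions, writing
reached_count for "added >= count", young for "the lock is younger than
ns_max_age", and busy for "ns->ns_cancel reports the lock is still in
use"):

/* LU-4300 version:  keep = reached_count || young || busy  */
/* restored version: keep = (reached_count && young) || busy */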

Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-on: http://review.whamcloud.com/12448
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5727
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
---
 drivers/staging/lustre/lustre/ldlm/ldlm_request.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_request.c b/drivers/staging/lustre/lustre/ldlm/ldlm_request.c
index 2f12194..48e9828 100644
--- a/drivers/staging/lustre/lustre/ldlm/ldlm_request.c
+++ b/drivers/staging/lustre/lustre/ldlm/ldlm_request.c
@@ -1240,10 +1240,8 @@ static ldlm_policy_res_t ldlm_cancel_aged_policy(struct ldlm_namespace *ns,
 						 int unused, int added,
 						 int count)
 {
-	if (added >= count)
-		return LDLM_POLICY_KEEP_LOCK;
-
-	if (time_before(cfs_time_current(),
+	if ((added >= count) &&
+	    time_before(cfs_time_current(),
 			cfs_time_add(lock->l_last_used, ns->ns_max_age)))
 		return LDLM_POLICY_KEEP_LOCK;
 
-- 
2.1.0

* [PATCH v2 43/46] staging/lustre/ldlm: restore the ELC for enqueue
  2016-03-30 23:48 [PATCH v2 00/46] Lustre IO stack simplifications and cleanups green
                   ` (41 preceding siblings ...)
  2016-03-30 23:49 ` [PATCH v2 42/46] staging/lustre/ldlm: revert changes to ldlm_cancel_aged_policy() green
@ 2016-03-30 23:49 ` green
  2016-03-30 23:49 ` [PATCH v2 44/46] staging/lustre: Fix spacing style before open parenthesis green
                   ` (2 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: green @ 2016-03-30 23:49 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger
  Cc: Linux Kernel Mailing List, Lustre Development List,
	Vitaly Fertman, Oleg Drokin

From: Vitaly Fertman <vitaly.fertman@seagate.com>

After LU-4300 enqueue no longer performs ELC; however, if enqueues are
aggressive (e.g. "ls -la" of a large directory) we may quickly exceed
the lru-resize limit, because the LRUR shrinker and recalc run rather
infrequently.

ELC is therefore restored in enqueue.
ELC should also check the lock weight, in addition to the LRUR checks.
ELC may keep "skipped" locks, i.e. locks once checked for weight and
left in the LRU; LRUR will take care of them later.
LRUR itself is left untouched, with no weight logic; otherwise LU-5727
reappears and OPEN locks never get canceled.
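
For orientation, the policy callbacks below plug into the LRU scan in
ldlm_prepare_lru_list() roughly as follows (an editor's sketch; locking,
refcounting and the actual cancel path are omitted):

static int lru_scan_sketch(struct ldlm_namespace *ns,
			   struct list_head *cancels, int count, int flags)
{
	ldlm_cancel_lru_policy_t pf = ldlm_cancel_lru_policy(ns, flags);
	struct ldlm_lock *lock, *next;
	int added = 0;

	list_for_each_entry_safe(lock, next, &ns->ns_unused_list, l_lru) {
		switch (pf(ns, lock, ns->ns_nr_unused, added, count)) {
		case LDLM_POLICY_KEEP_LOCK:
			return added;	/* stop scanning the LRU */
		case LDLM_POLICY_SKIP_LOCK:
			lock->l_flags |= LDLM_FL_SKIPPED;
			continue;	/* leave it for a later pass */
		case LDLM_POLICY_CANCEL_LOCK:
			list_move(&lock->l_lru, cancels);
			added++;
			break;
		}
	}
	return added;
}

The new LDLM_CANCEL_LRUR_NO_WAIT flag then simply selects a policy that
chains the LRUR decision with the no-wait check, as the
ldlm_cancel_lrur_no_wait_policy() added below shows.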

Xyratex-bug-id: MRP-2550
Signed-off-by: Vitaly Fertman <vitaly.fertman@seagate.com>
Reviewed-on: http://review.whamcloud.com/14342
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6390
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
---
 drivers/staging/lustre/lustre/ldlm/ldlm_internal.h |  7 +++---
 drivers/staging/lustre/lustre/ldlm/ldlm_request.c  | 25 ++++++++++++++++++----
 drivers/staging/lustre/lustre/osc/osc_request.c    |  4 ++--
 3 files changed, 27 insertions(+), 9 deletions(-)

diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_internal.h b/drivers/staging/lustre/lustre/ldlm/ldlm_internal.h
index e21373e..e31d84a 100644
--- a/drivers/staging/lustre/lustre/ldlm/ldlm_internal.h
+++ b/drivers/staging/lustre/lustre/ldlm/ldlm_internal.h
@@ -95,9 +95,10 @@ enum {
 	LDLM_CANCEL_PASSED = 1 << 1, /* Cancel passed number of locks. */
 	LDLM_CANCEL_SHRINK = 1 << 2, /* Cancel locks from shrinker. */
 	LDLM_CANCEL_LRUR   = 1 << 3, /* Cancel locks from lru resize. */
-	LDLM_CANCEL_NO_WAIT = 1 << 4 /* Cancel locks w/o blocking (neither
-				      * sending nor waiting for any rpcs)
-				      */
+	LDLM_CANCEL_NO_WAIT = 1 << 4, /* Cancel locks w/o blocking (neither
+				       * sending nor waiting for any rpcs)
+				       */
+	LDLM_CANCEL_LRUR_NO_WAIT = 1 << 5, /* LRUR + NO_WAIT */
 };
 
 int ldlm_cancel_lru(struct ldlm_namespace *ns, int nr,
diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_request.c b/drivers/staging/lustre/lustre/ldlm/ldlm_request.c
index 48e9828..9aa4c2d 100644
--- a/drivers/staging/lustre/lustre/ldlm/ldlm_request.c
+++ b/drivers/staging/lustre/lustre/ldlm/ldlm_request.c
@@ -601,7 +601,7 @@ int ldlm_prep_elc_req(struct obd_export *exp, struct ptlrpc_request *req,
 		avail = ldlm_capsule_handles_avail(pill, RCL_CLIENT, canceloff);
 
 		flags = ns_connect_lru_resize(ns) ?
-			LDLM_CANCEL_LRUR : LDLM_CANCEL_AGED;
+			LDLM_CANCEL_LRUR_NO_WAIT : LDLM_CANCEL_AGED;
 		to_free = !ns_connect_lru_resize(ns) &&
 			  opc == LDLM_ENQUEUE ? 1 : 0;
 
@@ -1146,7 +1146,7 @@ static ldlm_policy_res_t ldlm_cancel_no_wait_policy(struct ldlm_namespace *ns,
 	switch (lock->l_resource->lr_type) {
 	case LDLM_EXTENT:
 	case LDLM_IBITS:
-			if (ns->ns_cancel && ns->ns_cancel(lock) != 0)
+		if (ns->ns_cancel && ns->ns_cancel(lock) != 0)
 			break;
 	default:
 		result = LDLM_POLICY_SKIP_LOCK;
@@ -1251,6 +1251,21 @@ static ldlm_policy_res_t ldlm_cancel_aged_policy(struct ldlm_namespace *ns,
 	return LDLM_POLICY_CANCEL_LOCK;
 }
 
+static ldlm_policy_res_t
+ldlm_cancel_lrur_no_wait_policy(struct ldlm_namespace *ns,
+				struct ldlm_lock *lock,
+				int unused, int added,
+				int count)
+{
+	ldlm_policy_res_t result;
+
+	result = ldlm_cancel_lrur_policy(ns, lock, unused, added, count);
+	if (result == LDLM_POLICY_KEEP_LOCK)
+		return result;
+
+	return ldlm_cancel_no_wait_policy(ns, lock, unused, added, count);
+}
+
 /**
  * Callback function for default policy. Makes decision whether to keep \a lock
  * in LRU for current LRU size \a unused, added in current scan \a added and
@@ -1290,6 +1305,8 @@ ldlm_cancel_lru_policy(struct ldlm_namespace *ns, int flags)
 			return ldlm_cancel_lrur_policy;
 		else if (flags & LDLM_CANCEL_PASSED)
 			return ldlm_cancel_passed_policy;
+		else if (flags & LDLM_CANCEL_LRUR_NO_WAIT)
+			return ldlm_cancel_lrur_no_wait_policy;
 	} else {
 		if (flags & LDLM_CANCEL_AGED)
 			return ldlm_cancel_aged_policy;
@@ -1338,6 +1355,7 @@ static int ldlm_prepare_lru_list(struct ldlm_namespace *ns,
 	ldlm_cancel_lru_policy_t pf;
 	struct ldlm_lock *lock, *next;
 	int added = 0, unused, remained;
+	int no_wait = flags & (LDLM_CANCEL_NO_WAIT | LDLM_CANCEL_LRUR_NO_WAIT);
 
 	spin_lock(&ns->ns_lock);
 	unused = ns->ns_nr_unused;
@@ -1365,8 +1383,7 @@ static int ldlm_prepare_lru_list(struct ldlm_namespace *ns,
 			/* No locks which got blocking requests. */
 			LASSERT(!(lock->l_flags & LDLM_FL_BL_AST));
 
-			if (flags & LDLM_CANCEL_NO_WAIT &&
-			    lock->l_flags & LDLM_FL_SKIPPED)
+			if (no_wait && lock->l_flags & LDLM_FL_SKIPPED)
 				/* already processed */
 				continue;
 
diff --git a/drivers/staging/lustre/lustre/osc/osc_request.c b/drivers/staging/lustre/lustre/osc/osc_request.c
index a6dc517..5b9f72c 100644
--- a/drivers/staging/lustre/lustre/osc/osc_request.c
+++ b/drivers/staging/lustre/lustre/osc/osc_request.c
@@ -2297,8 +2297,8 @@ no_match:
 		if (!req)
 			return -ENOMEM;
 
-		rc = ptlrpc_request_pack(req, LUSTRE_DLM_VERSION, LDLM_ENQUEUE);
-		if (rc < 0) {
+		rc = ldlm_prep_enqueue_req(exp, req, NULL, 0);
+		if (rc) {
 			ptlrpc_request_free(req);
 			return rc;
 		}
-- 
2.1.0

* [PATCH v2 44/46] staging/lustre: Fix spacing style before open parenthesis
  2016-03-30 23:48 [PATCH v2 00/46] Lustre IO stack simplifications and cleanups green
                   ` (42 preceding siblings ...)
  2016-03-30 23:49 ` [PATCH v2 43/46] staging/lustre/ldlm: restore the ELC for enqueue green
@ 2016-03-30 23:49 ` green
  2016-03-30 23:49 ` [PATCH v2 45/46] staging/lustre/ldlm: Solve a race for LRU lock cancel green
  2016-03-30 23:49 ` [PATCH v2 46/46] staging/lustre: lov_io_init() should return error code green
  45 siblings, 0 replies; 47+ messages in thread
From: green @ 2016-03-30 23:49 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger
  Cc: Linux Kernel Mailing List, Lustre Development List, Oleg Drokin

From: Oleg Drokin <green@linuxhacker.ru>

This fixes the remaining occurrences of checkpatch warnings
of the form:
WARNING: space prohibited between function name and open parenthesis '('

Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
---
 drivers/staging/lustre/lustre/include/lu_object.h  | 64 +++++++++++-----------
 .../lustre/lustre/include/lustre/lustre_idl.h      |  2 +-
 .../lustre/lustre/include/lustre/lustre_user.h     | 36 ++++++------
 drivers/staging/lustre/lustre/include/lustre_cfg.h |  2 +-
 .../staging/lustre/lustre/include/lustre_import.h  |  2 +-
 drivers/staging/lustre/lustre/include/lustre_lib.h | 36 ++++++------
 drivers/staging/lustre/lustre/obdclass/debug.c     |  4 +-
 .../staging/lustre/lustre/osc/osc_cl_internal.h    | 22 ++++----
 drivers/staging/lustre/lustre/osc/osc_request.c    |  2 +-
 9 files changed, 85 insertions(+), 85 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/lu_object.h b/drivers/staging/lustre/lustre/include/lu_object.h
index b5088b1..fcb9db6 100644
--- a/drivers/staging/lustre/lustre/include/lu_object.h
+++ b/drivers/staging/lustre/lustre/include/lu_object.h
@@ -656,21 +656,21 @@ static inline struct seq_server_site *lu_site2seq(const struct lu_site *s)
  * @{
  */
 
-int  lu_site_init	 (struct lu_site *s, struct lu_device *d);
-void lu_site_fini	 (struct lu_site *s);
-int  lu_site_init_finish  (struct lu_site *s);
-void lu_stack_fini	(const struct lu_env *env, struct lu_device *top);
-void lu_device_get	(struct lu_device *d);
-void lu_device_put	(struct lu_device *d);
-int  lu_device_init       (struct lu_device *d, struct lu_device_type *t);
-void lu_device_fini       (struct lu_device *d);
-int  lu_object_header_init(struct lu_object_header *h);
+int lu_site_init(struct lu_site *s, struct lu_device *d);
+void lu_site_fini(struct lu_site *s);
+int lu_site_init_finish(struct lu_site *s);
+void lu_stack_fini(const struct lu_env *env, struct lu_device *top);
+void lu_device_get(struct lu_device *d);
+void lu_device_put(struct lu_device *d);
+int lu_device_init(struct lu_device *d, struct lu_device_type *t);
+void lu_device_fini(struct lu_device *d);
+int lu_object_header_init(struct lu_object_header *h);
 void lu_object_header_fini(struct lu_object_header *h);
-int  lu_object_init       (struct lu_object *o,
-			   struct lu_object_header *h, struct lu_device *d);
-void lu_object_fini       (struct lu_object *o);
-void lu_object_add_top    (struct lu_object_header *h, struct lu_object *o);
-void lu_object_add	(struct lu_object *before, struct lu_object *o);
+int lu_object_init(struct lu_object *o,
+		   struct lu_object_header *h, struct lu_device *d);
+void lu_object_fini(struct lu_object *o);
+void lu_object_add_top(struct lu_object_header *h, struct lu_object *o);
+void lu_object_add(struct lu_object *before, struct lu_object *o);
 
 /**
  * Helpers to initialize and finalize device types.
@@ -1118,7 +1118,7 @@ struct lu_context_key {
 	{							 \
 		type *value;				      \
 								  \
-		CLASSERT(PAGE_CACHE_SIZE >= sizeof (*value));       \
+		CLASSERT(PAGE_CACHE_SIZE >= sizeof(*value));       \
 								  \
 		value = kzalloc(sizeof(*value), GFP_NOFS);	\
 		if (!value)				\
@@ -1154,12 +1154,12 @@ do {						    \
 	(key)->lct_owner = THIS_MODULE;		 \
 } while (0)
 
-int   lu_context_key_register(struct lu_context_key *key);
-void  lu_context_key_degister(struct lu_context_key *key);
-void *lu_context_key_get     (const struct lu_context *ctx,
-			       const struct lu_context_key *key);
-void  lu_context_key_quiesce (struct lu_context_key *key);
-void  lu_context_key_revive  (struct lu_context_key *key);
+int lu_context_key_register(struct lu_context_key *key);
+void lu_context_key_degister(struct lu_context_key *key);
+void *lu_context_key_get(const struct lu_context *ctx,
+			 const struct lu_context_key *key);
+void lu_context_key_quiesce(struct lu_context_key *key);
+void lu_context_key_revive(struct lu_context_key *key);
 
 /*
  * LU_KEY_INIT_GENERIC() has to be a macro to correctly determine an
@@ -1216,21 +1216,21 @@ void  lu_context_key_revive  (struct lu_context_key *key);
 	LU_TYPE_START(mod, __VA_ARGS__);	\
 	LU_TYPE_STOP(mod, __VA_ARGS__)
 
-int   lu_context_init  (struct lu_context *ctx, __u32 tags);
-void  lu_context_fini  (struct lu_context *ctx);
-void  lu_context_enter (struct lu_context *ctx);
-void  lu_context_exit  (struct lu_context *ctx);
-int   lu_context_refill(struct lu_context *ctx);
+int lu_context_init(struct lu_context *ctx, __u32 tags);
+void lu_context_fini(struct lu_context *ctx);
+void lu_context_enter(struct lu_context *ctx);
+void lu_context_exit(struct lu_context *ctx);
+int lu_context_refill(struct lu_context *ctx);
 
 /*
  * Helper functions to operate on multiple keys. These are used by the default
  * device type operations, defined by LU_TYPE_INIT_FINI().
  */
 
-int  lu_context_key_register_many(struct lu_context_key *k, ...);
+int lu_context_key_register_many(struct lu_context_key *k, ...);
 void lu_context_key_degister_many(struct lu_context_key *k, ...);
-void lu_context_key_revive_many  (struct lu_context_key *k, ...);
-void lu_context_key_quiesce_many (struct lu_context_key *k, ...);
+void lu_context_key_revive_many(struct lu_context_key *k, ...);
+void lu_context_key_quiesce_many(struct lu_context_key *k, ...);
 
 /**
  * Environment.
@@ -1246,9 +1246,9 @@ struct lu_env {
 	struct lu_context *le_ses;
 };
 
-int  lu_env_init  (struct lu_env *env, __u32 tags);
-void lu_env_fini  (struct lu_env *env);
-int  lu_env_refill(struct lu_env *env);
+int lu_env_init(struct lu_env *env, __u32 tags);
+void lu_env_fini(struct lu_env *env);
+int lu_env_refill(struct lu_env *env);
 
 /** @} lu_context */
 
diff --git a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
index 4318511..12e6718 100644
--- a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
+++ b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
@@ -3306,7 +3306,7 @@ struct getinfo_fid2path {
 	char	    gf_path[0];
 } __packed;
 
-void lustre_swab_fid2path (struct getinfo_fid2path *gf);
+void lustre_swab_fid2path(struct getinfo_fid2path *gf);
 
 enum {
 	LAYOUT_INTENT_ACCESS    = 0,
diff --git a/drivers/staging/lustre/lustre/include/lustre/lustre_user.h b/drivers/staging/lustre/lustre/include/lustre/lustre_user.h
index 276906e..19f2271 100644
--- a/drivers/staging/lustre/lustre/include/lustre/lustre_user.h
+++ b/drivers/staging/lustre/lustre/include/lustre/lustre_user.h
@@ -193,37 +193,37 @@ struct ost_id {
  * *INFO    - set/get lov_user_mds_data
  */
 /* see <lustre_lib.h> for ioctl numberss 101-150 */
-#define LL_IOC_GETFLAGS		 _IOR ('f', 151, long)
-#define LL_IOC_SETFLAGS		 _IOW ('f', 152, long)
-#define LL_IOC_CLRFLAGS		 _IOW ('f', 153, long)
+#define LL_IOC_GETFLAGS		 _IOR('f', 151, long)
+#define LL_IOC_SETFLAGS		 _IOW('f', 152, long)
+#define LL_IOC_CLRFLAGS		 _IOW('f', 153, long)
 /* LL_IOC_LOV_SETSTRIPE: See also OBD_IOC_LOV_SETSTRIPE */
-#define LL_IOC_LOV_SETSTRIPE	    _IOW ('f', 154, long)
+#define LL_IOC_LOV_SETSTRIPE	    _IOW('f', 154, long)
 /* LL_IOC_LOV_GETSTRIPE: See also OBD_IOC_LOV_GETSTRIPE */
-#define LL_IOC_LOV_GETSTRIPE	    _IOW ('f', 155, long)
+#define LL_IOC_LOV_GETSTRIPE	    _IOW('f', 155, long)
 /* LL_IOC_LOV_SETEA: See also OBD_IOC_LOV_SETEA */
-#define LL_IOC_LOV_SETEA		_IOW ('f', 156, long)
-#define LL_IOC_RECREATE_OBJ	     _IOW ('f', 157, long)
-#define LL_IOC_RECREATE_FID	     _IOW ('f', 157, struct lu_fid)
-#define LL_IOC_GROUP_LOCK	       _IOW ('f', 158, long)
-#define LL_IOC_GROUP_UNLOCK	     _IOW ('f', 159, long)
+#define LL_IOC_LOV_SETEA		_IOW('f', 156, long)
+#define LL_IOC_RECREATE_OBJ	     _IOW('f', 157, long)
+#define LL_IOC_RECREATE_FID	     _IOW('f', 157, struct lu_fid)
+#define LL_IOC_GROUP_LOCK	       _IOW('f', 158, long)
+#define LL_IOC_GROUP_UNLOCK	     _IOW('f', 159, long)
 /* LL_IOC_QUOTACHECK: See also OBD_IOC_QUOTACHECK */
-#define LL_IOC_QUOTACHECK	       _IOW ('f', 160, int)
+#define LL_IOC_QUOTACHECK	       _IOW('f', 160, int)
 /* LL_IOC_POLL_QUOTACHECK: See also OBD_IOC_POLL_QUOTACHECK */
-#define LL_IOC_POLL_QUOTACHECK	  _IOR ('f', 161, struct if_quotacheck *)
+#define LL_IOC_POLL_QUOTACHECK	  _IOR('f', 161, struct if_quotacheck *)
 /* LL_IOC_QUOTACTL: See also OBD_IOC_QUOTACTL */
 #define LL_IOC_QUOTACTL		 _IOWR('f', 162, struct if_quotactl)
 #define IOC_OBD_STATFS		  _IOWR('f', 164, struct obd_statfs *)
 #define IOC_LOV_GETINFO		 _IOWR('f', 165, struct lov_user_mds_data *)
-#define LL_IOC_FLUSHCTX		 _IOW ('f', 166, long)
-#define LL_IOC_RMTACL		   _IOW ('f', 167, long)
-#define LL_IOC_GETOBDCOUNT	      _IOR ('f', 168, long)
+#define LL_IOC_FLUSHCTX		 _IOW('f', 166, long)
+#define LL_IOC_RMTACL		   _IOW('f', 167, long)
+#define LL_IOC_GETOBDCOUNT	      _IOR('f', 168, long)
 #define LL_IOC_LLOOP_ATTACH	     _IOWR('f', 169, long)
 #define LL_IOC_LLOOP_DETACH	     _IOWR('f', 170, long)
 #define LL_IOC_LLOOP_INFO	       _IOWR('f', 171, struct lu_fid)
 #define LL_IOC_LLOOP_DETACH_BYDEV       _IOWR('f', 172, long)
-#define LL_IOC_PATH2FID		 _IOR ('f', 173, long)
+#define LL_IOC_PATH2FID		 _IOR('f', 173, long)
 #define LL_IOC_GET_CONNECT_FLAGS	_IOWR('f', 174, __u64 *)
-#define LL_IOC_GET_MDTIDX	       _IOR ('f', 175, int)
+#define LL_IOC_GET_MDTIDX	       _IOR('f', 175, int)
 
 /* see <lustre_lib.h> for ioctl numbers 177-210 */
 
@@ -1100,7 +1100,7 @@ struct hsm_action_list {
 } __packed;
 
 #ifndef HAVE_CFS_SIZE_ROUND
-static inline int cfs_size_round (int val)
+static inline int cfs_size_round(int val)
 {
 	return (val + 7) & (~0x7);
 }
diff --git a/drivers/staging/lustre/lustre/include/lustre_cfg.h b/drivers/staging/lustre/lustre/include/lustre_cfg.h
index bb16ae9..e229e91 100644
--- a/drivers/staging/lustre/lustre/include/lustre_cfg.h
+++ b/drivers/staging/lustre/lustre/include/lustre_cfg.h
@@ -161,7 +161,7 @@ static inline void *lustre_cfg_buf(struct lustre_cfg *lcfg, int index)
 	int offset;
 	int bufcount;
 
-	LASSERT (index >= 0);
+	LASSERT(index >= 0);
 
 	bufcount = lcfg->lcfg_bufcount;
 	if (index >= bufcount)
diff --git a/drivers/staging/lustre/lustre/include/lustre_import.h b/drivers/staging/lustre/lustre/include/lustre_import.h
index dac2d84..8325c82 100644
--- a/drivers/staging/lustre/lustre/include/lustre_import.h
+++ b/drivers/staging/lustre/lustre/include/lustre_import.h
@@ -109,7 +109,7 @@ static inline char *ptlrpc_import_state_name(enum lustre_imp_state state)
 		"RECOVER", "FULL", "EVICTED",
 	};
 
-	LASSERT (state <= LUSTRE_IMP_EVICTED);
+	LASSERT(state <= LUSTRE_IMP_EVICTED);
 	return import_state_names[state];
 }
 
diff --git a/drivers/staging/lustre/lustre/include/lustre_lib.h b/drivers/staging/lustre/lustre/include/lustre_lib.h
index c5713d7..2e66b27 100644
--- a/drivers/staging/lustre/lustre/include/lustre_lib.h
+++ b/drivers/staging/lustre/lustre/include/lustre_lib.h
@@ -280,16 +280,16 @@ static inline void obd_ioctl_freedata(char *buf, int len)
 #define OBD_IOC_DATA_TYPE long
 
 #define OBD_IOC_CREATE		 _IOWR('f', 101, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_DESTROY		_IOW ('f', 104, OBD_IOC_DATA_TYPE)
+#define OBD_IOC_DESTROY		_IOW('f', 104, OBD_IOC_DATA_TYPE)
 #define OBD_IOC_PREALLOCATE	    _IOWR('f', 105, OBD_IOC_DATA_TYPE)
 
-#define OBD_IOC_SETATTR		_IOW ('f', 107, OBD_IOC_DATA_TYPE)
+#define OBD_IOC_SETATTR		_IOW('f', 107, OBD_IOC_DATA_TYPE)
 #define OBD_IOC_GETATTR		_IOWR ('f', 108, OBD_IOC_DATA_TYPE)
 #define OBD_IOC_READ		   _IOWR('f', 109, OBD_IOC_DATA_TYPE)
 #define OBD_IOC_WRITE		  _IOWR('f', 110, OBD_IOC_DATA_TYPE)
 
 #define OBD_IOC_STATFS		 _IOWR('f', 113, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_SYNC		   _IOW ('f', 114, OBD_IOC_DATA_TYPE)
+#define OBD_IOC_SYNC		   _IOW('f', 114, OBD_IOC_DATA_TYPE)
 #define OBD_IOC_READ2		  _IOWR('f', 115, OBD_IOC_DATA_TYPE)
 #define OBD_IOC_FORMAT		 _IOWR('f', 116, OBD_IOC_DATA_TYPE)
 #define OBD_IOC_PARTITION	      _IOWR('f', 117, OBD_IOC_DATA_TYPE)
@@ -308,13 +308,13 @@ static inline void obd_ioctl_freedata(char *buf, int len)
 #define OBD_IOC_GETDTNAME	       OBD_IOC_GETNAME
 
 #define OBD_IOC_LOV_GET_CONFIG	 _IOWR('f', 132, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_CLIENT_RECOVER	 _IOW ('f', 133, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_PING_TARGET	    _IOW ('f', 136, OBD_IOC_DATA_TYPE)
+#define OBD_IOC_CLIENT_RECOVER	 _IOW('f', 133, OBD_IOC_DATA_TYPE)
+#define OBD_IOC_PING_TARGET	    _IOW('f', 136, OBD_IOC_DATA_TYPE)
 
 #define OBD_IOC_DEC_FS_USE_COUNT       _IO  ('f', 139)
-#define OBD_IOC_NO_TRANSNO	     _IOW ('f', 140, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_SET_READONLY	   _IOW ('f', 141, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_ABORT_RECOVERY	 _IOR ('f', 142, OBD_IOC_DATA_TYPE)
+#define OBD_IOC_NO_TRANSNO	     _IOW('f', 140, OBD_IOC_DATA_TYPE)
+#define OBD_IOC_SET_READONLY	   _IOW('f', 141, OBD_IOC_DATA_TYPE)
+#define OBD_IOC_ABORT_RECOVERY	 _IOR('f', 142, OBD_IOC_DATA_TYPE)
 
 #define OBD_IOC_ROOT_SQUASH	    _IOWR('f', 143, OBD_IOC_DATA_TYPE)
 
@@ -324,27 +324,27 @@ static inline void obd_ioctl_freedata(char *buf, int len)
 
 #define OBD_IOC_CLOSE_UUID	     _IOWR ('f', 147, OBD_IOC_DATA_TYPE)
 
-#define OBD_IOC_CHANGELOG_SEND	 _IOW ('f', 148, OBD_IOC_DATA_TYPE)
+#define OBD_IOC_CHANGELOG_SEND	 _IOW('f', 148, OBD_IOC_DATA_TYPE)
 #define OBD_IOC_GETDEVICE	      _IOWR ('f', 149, OBD_IOC_DATA_TYPE)
 #define OBD_IOC_FID2PATH	       _IOWR ('f', 150, OBD_IOC_DATA_TYPE)
 /* see also <lustre/lustre_user.h> for ioctls 151-153 */
 /* OBD_IOC_LOV_SETSTRIPE: See also LL_IOC_LOV_SETSTRIPE */
-#define OBD_IOC_LOV_SETSTRIPE	  _IOW ('f', 154, OBD_IOC_DATA_TYPE)
+#define OBD_IOC_LOV_SETSTRIPE	  _IOW('f', 154, OBD_IOC_DATA_TYPE)
 /* OBD_IOC_LOV_GETSTRIPE: See also LL_IOC_LOV_GETSTRIPE */
-#define OBD_IOC_LOV_GETSTRIPE	  _IOW ('f', 155, OBD_IOC_DATA_TYPE)
+#define OBD_IOC_LOV_GETSTRIPE	  _IOW('f', 155, OBD_IOC_DATA_TYPE)
 /* OBD_IOC_LOV_SETEA: See also LL_IOC_LOV_SETEA */
-#define OBD_IOC_LOV_SETEA	      _IOW ('f', 156, OBD_IOC_DATA_TYPE)
+#define OBD_IOC_LOV_SETEA	      _IOW('f', 156, OBD_IOC_DATA_TYPE)
 /* see <lustre/lustre_user.h> for ioctls 157-159 */
 /* OBD_IOC_QUOTACHECK: See also LL_IOC_QUOTACHECK */
-#define OBD_IOC_QUOTACHECK	     _IOW ('f', 160, int)
+#define OBD_IOC_QUOTACHECK	     _IOW('f', 160, int)
 /* OBD_IOC_POLL_QUOTACHECK: See also LL_IOC_POLL_QUOTACHECK */
-#define OBD_IOC_POLL_QUOTACHECK	_IOR ('f', 161, struct if_quotacheck *)
+#define OBD_IOC_POLL_QUOTACHECK	_IOR('f', 161, struct if_quotacheck *)
 /* OBD_IOC_QUOTACTL: See also LL_IOC_QUOTACTL */
 #define OBD_IOC_QUOTACTL	       _IOWR('f', 162, struct if_quotactl)
 /* see  also <lustre/lustre_user.h> for ioctls 163-176 */
-#define OBD_IOC_CHANGELOG_REG	  _IOW ('f', 177, struct obd_ioctl_data)
-#define OBD_IOC_CHANGELOG_DEREG	_IOW ('f', 178, struct obd_ioctl_data)
-#define OBD_IOC_CHANGELOG_CLEAR	_IOW ('f', 179, struct obd_ioctl_data)
+#define OBD_IOC_CHANGELOG_REG	  _IOW('f', 177, struct obd_ioctl_data)
+#define OBD_IOC_CHANGELOG_DEREG	_IOW('f', 178, struct obd_ioctl_data)
+#define OBD_IOC_CHANGELOG_CLEAR	_IOW('f', 179, struct obd_ioctl_data)
 #define OBD_IOC_RECORD		 _IOWR('f', 180, OBD_IOC_DATA_TYPE)
 #define OBD_IOC_ENDRECORD	      _IOWR('f', 181, OBD_IOC_DATA_TYPE)
 #define OBD_IOC_PARSE		  _IOWR('f', 182, OBD_IOC_DATA_TYPE)
@@ -352,7 +352,7 @@ static inline void obd_ioctl_freedata(char *buf, int len)
 #define OBD_IOC_PROCESS_CFG	    _IOWR('f', 184, OBD_IOC_DATA_TYPE)
 #define OBD_IOC_DUMP_LOG	       _IOWR('f', 185, OBD_IOC_DATA_TYPE)
 #define OBD_IOC_CLEAR_LOG	      _IOWR('f', 186, OBD_IOC_DATA_TYPE)
-#define OBD_IOC_PARAM		  _IOW ('f', 187, OBD_IOC_DATA_TYPE)
+#define OBD_IOC_PARAM		  _IOW('f', 187, OBD_IOC_DATA_TYPE)
 #define OBD_IOC_POOL		   _IOWR('f', 188, OBD_IOC_DATA_TYPE)
 #define OBD_IOC_REPLACE_NIDS	   _IOWR('f', 189, OBD_IOC_DATA_TYPE)
 
diff --git a/drivers/staging/lustre/lustre/obdclass/debug.c b/drivers/staging/lustre/lustre/obdclass/debug.c
index 43a7f7a..e4edfb2 100644
--- a/drivers/staging/lustre/lustre/obdclass/debug.c
+++ b/drivers/staging/lustre/lustre/obdclass/debug.c
@@ -68,8 +68,8 @@ int block_debug_check(char *who, void *addr, int end, __u64 off, __u64 id)
 
 	LASSERT(addr);
 
-	ne_off = le64_to_cpu (off);
-	id = le64_to_cpu (id);
+	ne_off = le64_to_cpu(off);
+	id = le64_to_cpu(id);
 	if (memcmp(addr, (char *)&ne_off, LPDS)) {
 		CDEBUG(D_ERROR, "%s: id %#llx offset %llu off: %#llx != %#llx\n",
 		       who, id, off, *(__u64 *)addr, ne_off);
diff --git a/drivers/staging/lustre/lustre/osc/osc_cl_internal.h b/drivers/staging/lustre/lustre/osc/osc_cl_internal.h
index aba6469..ae19d39 100644
--- a/drivers/staging/lustre/lustre/osc/osc_cl_internal.h
+++ b/drivers/staging/lustre/lustre/osc/osc_cl_internal.h
@@ -403,20 +403,20 @@ extern struct lu_context_key osc_session_key;
 int osc_lock_init(const struct lu_env *env,
 		  struct cl_object *obj, struct cl_lock *lock,
 		  const struct cl_io *io);
-int osc_io_init  (const struct lu_env *env,
-		  struct cl_object *obj, struct cl_io *io);
-int osc_req_init (const struct lu_env *env, struct cl_device *dev,
-		  struct cl_req *req);
+int osc_io_init(const struct lu_env *env,
+		struct cl_object *obj, struct cl_io *io);
+int osc_req_init(const struct lu_env *env, struct cl_device *dev,
+		 struct cl_req *req);
 struct lu_object *osc_object_alloc(const struct lu_env *env,
 				   const struct lu_object_header *hdr,
 				   struct lu_device *dev);
 int osc_page_init(const struct lu_env *env, struct cl_object *obj,
 		  struct cl_page *page, pgoff_t ind);
 
-void osc_index2policy  (ldlm_policy_data_t *policy, const struct cl_object *obj,
-			pgoff_t start, pgoff_t end);
-int  osc_lvb_print     (const struct lu_env *env, void *cookie,
-			lu_printer_t p, const struct ost_lvb *lvb);
+void osc_index2policy(ldlm_policy_data_t *policy, const struct cl_object *obj,
+		      pgoff_t start, pgoff_t end);
+int osc_lvb_print(const struct lu_env *env, void *cookie,
+		  lu_printer_t p, const struct ost_lvb *lvb);
 
 void osc_lru_add_batch(struct client_obd *cli, struct list_head *list);
 void osc_page_submit(const struct lu_env *env, struct osc_page *opg,
@@ -448,11 +448,11 @@ void osc_io_unplug(const struct lu_env *env, struct client_obd *cli,
 		   struct osc_object *osc);
 int lru_queue_work(const struct lu_env *env, void *data);
 
-void osc_object_set_contended  (struct osc_object *obj);
+void osc_object_set_contended(struct osc_object *obj);
 void osc_object_clear_contended(struct osc_object *obj);
-int  osc_object_is_contended   (struct osc_object *obj);
+int  osc_object_is_contended(struct osc_object *obj);
 
-int  osc_lock_is_lockless      (const struct osc_lock *olck);
+int  osc_lock_is_lockless(const struct osc_lock *olck);
 
 /*****************************************************************************
  *
diff --git a/drivers/staging/lustre/lustre/osc/osc_request.c b/drivers/staging/lustre/lustre/osc/osc_request.c
index 5b9f72c..4d3eed6 100644
--- a/drivers/staging/lustre/lustre/osc/osc_request.c
+++ b/drivers/staging/lustre/lustre/osc/osc_request.c
@@ -2485,7 +2485,7 @@ static int osc_statfs_async(struct obd_export *exp,
 	}
 
 	req->rq_interpret_reply = (ptlrpc_interpterer_t)osc_statfs_interpret;
-	CLASSERT (sizeof(*aa) <= sizeof(req->rq_async_args));
+	CLASSERT(sizeof(*aa) <= sizeof(req->rq_async_args));
 	aa = ptlrpc_req_async_args(req);
 	aa->aa_oi = oinfo;
 
-- 
2.1.0

* [PATCH v2 45/46] staging/lustre/ldlm: Solve a race for LRU lock cancel
  2016-03-30 23:48 [PATCH v2 00/46] Lustre IO stack simplifications and cleanups green
                   ` (43 preceding siblings ...)
  2016-03-30 23:49 ` [PATCH v2 44/46] staging/lustre: Fix spacing style before open parenthesis green
@ 2016-03-30 23:49 ` green
  2016-03-30 23:49 ` [PATCH v2 46/46] staging/lustre: lov_io_init() should return error code green
  45 siblings, 0 replies; 47+ messages in thread
From: green @ 2016-03-30 23:49 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger
  Cc: Linux Kernel Mailing List, Lustre Development List,
	Vitaly Fertman, Jinshan Xiong, Vitaly Fertman, Oleg Drokin

From: Vitaly Fertman <vitaly.fertman@seagate.com>

This patch solves a race condition in which a lock may be used again
after the LRU cancellation policy check. In that case the lock may have
locked or dirty pages that render the policy check useless.
The problem is solved by re-checking l_last_used at cancellation time,
which confirms that the lock has not been used in the meantime.
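
Schematically, the race being closed (an editor's illustration; "scanner"
is ldlm_prepare_lru_list(), "user" is any thread taking the lock):

/*
 *   scanner                           user
 *   -------                           ----
 *   policy check: lock looks unused,
 *   no busy pages -> CANCEL_LOCK
 *                                     lock leaves the LRU and is used;
 *                                     pages locked/dirtied under it
 *                                     lock returns to the LRU,
 *                                     l_last_used is updated
 *   cancel proceeds on a stale decision
 */

With ldlm_lock_remove_from_lru_check(), the scanner records l_last_used
at policy-check time and the cancel only proceeds if that value is
unchanged, i.e. the lock demonstrably was not reused in between.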

Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Signed-off-by: Vitaly Fertman <vitaly_fertman@xyratex.com>
Reviewed-on: http://review.whamcloud.com/12603
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5781
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
---
 drivers/staging/lustre/lustre/ldlm/ldlm_internal.h |  3 ++-
 drivers/staging/lustre/lustre/ldlm/ldlm_lock.c     | 16 +++++++++++++---
 drivers/staging/lustre/lustre/ldlm/ldlm_request.c  | 11 +++++++++--
 3 files changed, 24 insertions(+), 6 deletions(-)

diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_internal.h b/drivers/staging/lustre/lustre/ldlm/ldlm_internal.h
index e31d84a..351f8b4 100644
--- a/drivers/staging/lustre/lustre/ldlm/ldlm_internal.h
+++ b/drivers/staging/lustre/lustre/ldlm/ldlm_internal.h
@@ -146,7 +146,8 @@ void ldlm_lock_decref_internal(struct ldlm_lock *, __u32 mode);
 void ldlm_lock_decref_internal_nolock(struct ldlm_lock *, __u32 mode);
 int ldlm_run_ast_work(struct ldlm_namespace *ns, struct list_head *rpc_list,
 		      enum ldlm_desc_ast_t ast_type);
-int ldlm_lock_remove_from_lru(struct ldlm_lock *lock);
+int ldlm_lock_remove_from_lru_check(struct ldlm_lock *lock, time_t last_use);
+#define ldlm_lock_remove_from_lru(lock) ldlm_lock_remove_from_lru_check(lock, 0)
 int ldlm_lock_remove_from_lru_nolock(struct ldlm_lock *lock);
 void ldlm_lock_destroy_nolock(struct ldlm_lock *lock);
 
diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_lock.c b/drivers/staging/lustre/lustre/ldlm/ldlm_lock.c
index 27a051b..3f9b8526 100644
--- a/drivers/staging/lustre/lustre/ldlm/ldlm_lock.c
+++ b/drivers/staging/lustre/lustre/ldlm/ldlm_lock.c
@@ -229,15 +229,25 @@ int ldlm_lock_remove_from_lru_nolock(struct ldlm_lock *lock)
 
 /**
  * Removes LDLM lock \a lock from LRU. Obtains the LRU lock first.
+ *
+ * If \a last_use is non-zero, it will remove the lock from LRU only if
+ * it matches lock's l_last_used.
+ *
+ * \retval 0 if \a last_use is set, the lock is not in LRU list or \a last_use
+ *           doesn't match lock's l_last_used;
+ *           otherwise, the lock hasn't been in the LRU list.
+ * \retval 1 the lock was in LRU list and removed.
  */
-int ldlm_lock_remove_from_lru(struct ldlm_lock *lock)
+int ldlm_lock_remove_from_lru_check(struct ldlm_lock *lock, time_t last_use)
 {
 	struct ldlm_namespace *ns = ldlm_lock_to_ns(lock);
-	int rc;
+	int rc = 0;
 
 	spin_lock(&ns->ns_lock);
-	rc = ldlm_lock_remove_from_lru_nolock(lock);
+	if (last_use == 0 || last_use == lock->l_last_used)
+		rc = ldlm_lock_remove_from_lru_nolock(lock);
 	spin_unlock(&ns->ns_lock);
+
 	return rc;
 }
 
diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_request.c b/drivers/staging/lustre/lustre/ldlm/ldlm_request.c
index 9aa4c2d..5b0e396 100644
--- a/drivers/staging/lustre/lustre/ldlm/ldlm_request.c
+++ b/drivers/staging/lustre/lustre/ldlm/ldlm_request.c
@@ -1369,6 +1369,7 @@ static int ldlm_prepare_lru_list(struct ldlm_namespace *ns,
 
 	while (!list_empty(&ns->ns_unused_list)) {
 		ldlm_policy_res_t result;
+		time_t last_use = 0;
 
 		/* all unused locks */
 		if (remained-- <= 0)
@@ -1387,6 +1388,10 @@ static int ldlm_prepare_lru_list(struct ldlm_namespace *ns,
 				/* already processed */
 				continue;
 
+			last_use = lock->l_last_used;
+			if (last_use == cfs_time_current())
+				continue;
+
 			/* Somebody is already doing CANCEL. No need for this
 			 * lock in LRU, do not traverse it again.
 			 */
@@ -1434,11 +1439,13 @@ static int ldlm_prepare_lru_list(struct ldlm_namespace *ns,
 		lock_res_and_lock(lock);
 		/* Check flags again under the lock. */
 		if ((lock->l_flags & LDLM_FL_CANCELING) ||
-		    (ldlm_lock_remove_from_lru(lock) == 0)) {
+		    (ldlm_lock_remove_from_lru_check(lock, last_use) == 0)) {
 			/* Another thread is removing lock from LRU, or
 			 * somebody is already doing CANCEL, or there
 			 * is a blocking request which will send cancel
-			 * by itself, or the lock is no longer unused.
+			 * by itself, or the lock is no longer unused or
+			 * the lock has been used since the pf() call and
+			 * pages could be put under it.
 			 */
 			unlock_res_and_lock(lock);
 			lu_ref_del(&lock->l_reference,
-- 
2.1.0

* [PATCH v2 46/46] staging/lustre: lov_io_init() should return error code
  2016-03-30 23:48 [PATCH v2 00/46] Lustre IO stack simplifications and cleanups green
                   ` (44 preceding siblings ...)
  2016-03-30 23:49 ` [PATCH v2 45/46] staging/lustre/ldlm: Solve a race for LRU lock cancel green
@ 2016-03-30 23:49 ` green
  45 siblings, 0 replies; 47+ messages in thread
From: green @ 2016-03-30 23:49 UTC (permalink / raw)
  To: Greg Kroah-Hartman, devel, Andreas Dilger
  Cc: Linux Kernel Mailing List, Lustre Development List, Bobi Jam,
	Oleg Drokin

From: Bobi Jam <bobijam.xu@intel.com>

lov_io_init_empty()/lov_io_init_released() should return an error code
instead of true in the error case.

Fault IO needs to handle a restart in the case of accessing an
HSM-released file.
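
The point of the lov_io.c hunks in isolation (a hypothetical
illustration, not code from the patch): "result != 0" is a boolean
conversion, so every failure collapses to 1 and the errno value is lost:

static int old_style(int result)
{
	return result != 0;	/* -ENOMEM comes back as 1 */
}

static int new_style(int result)
{
	return result;		/* -ENOMEM propagates to the caller */
}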

Signed-off-by: Bobi Jam <bobijam.xu@intel.com>
Reviewed-on: http://review.whamcloud.com/17240
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-7446
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
---
 drivers/staging/lustre/lustre/llite/llite_mmap.c | 4 ++++
 drivers/staging/lustre/lustre/lov/lov_io.c       | 4 ++--
 drivers/staging/lustre/lustre/obdclass/cl_io.c   | 1 +
 3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/llite_mmap.c b/drivers/staging/lustre/lustre/llite/llite_mmap.c
index 83d7006..5b4382c 100644
--- a/drivers/staging/lustre/lustre/llite/llite_mmap.c
+++ b/drivers/staging/lustre/lustre/llite/llite_mmap.c
@@ -123,6 +123,7 @@ ll_fault_io_init(struct vm_area_struct *vma, struct lu_env **env_ret,
 
 	*env_ret = env;
 
+restart:
 	io = vvp_env_thread_io(env);
 	io->ci_obj = ll_i2info(inode)->lli_clob;
 	LASSERT(io->ci_obj);
@@ -157,6 +158,9 @@ ll_fault_io_init(struct vm_area_struct *vma, struct lu_env **env_ret,
 	} else {
 		LASSERT(rc < 0);
 		cl_io_fini(env, io);
+		if (io->ci_need_restart)
+			goto restart;
+
 		cl_env_nested_put(nest, env);
 		io = ERR_PTR(rc);
 	}
diff --git a/drivers/staging/lustre/lustre/lov/lov_io.c b/drivers/staging/lustre/lustre/lov/lov_io.c
index ba79955..4151237 100644
--- a/drivers/staging/lustre/lustre/lov/lov_io.c
+++ b/drivers/staging/lustre/lustre/lov/lov_io.c
@@ -916,7 +916,7 @@ int lov_io_init_empty(const struct lu_env *env, struct cl_object *obj,
 	}
 
 	io->ci_result = result < 0 ? result : 0;
-	return result != 0;
+	return result;
 }
 
 int lov_io_init_released(const struct lu_env *env, struct cl_object *obj,
@@ -959,7 +959,7 @@ int lov_io_init_released(const struct lu_env *env, struct cl_object *obj,
 	}
 
 	io->ci_result = result < 0 ? result : 0;
-	return result != 0;
+	return result;
 }
 
 /** @} lov */
diff --git a/drivers/staging/lustre/lustre/obdclass/cl_io.c b/drivers/staging/lustre/lustre/obdclass/cl_io.c
index 7655dc4..f4b3178 100644
--- a/drivers/staging/lustre/lustre/obdclass/cl_io.c
+++ b/drivers/staging/lustre/lustre/obdclass/cl_io.c
@@ -133,6 +133,7 @@ void cl_io_fini(const struct lu_env *env, struct cl_io *io)
 	case CIT_WRITE:
 		break;
 	case CIT_FAULT:
+		break;
 	case CIT_FSYNC:
 		LASSERT(!io->ci_need_restart);
 		break;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 47+ messages in thread

end of thread, other threads:[~2016-03-31  0:01 UTC | newest]

Thread overview: 47+ messages
2016-03-30 23:48 [PATCH v2 00/46] Lustre IO stack simplifications and cleanups green
2016-03-30 23:48 ` [PATCH v2 01/46] staging/lustre/obdclass: limit lu_site hash table size green
2016-03-30 23:48 ` [PATCH v2 02/46] staging/lustre: Get rid of CFS_PAGE_MASK green
2016-03-30 23:48 ` [PATCH v2 03/46] staging/lustre: merge lclient/*.c into llite/ green
2016-03-30 23:48 ` [PATCH v2 04/46] staging/lustre: Reintroduce global env list green
2016-03-30 23:48 ` [PATCH v2 05/46] staging/lustre/osc: Adjustment on osc LRU for performance green
2016-03-30 23:48 ` [PATCH v2 06/46] staging/lustre/osc: to drop LRU pages with cl_lru_work green
2016-03-30 23:48 ` [PATCH v2 07/46] staging/lustre/clio: collapse layer of cl_page green
2016-03-30 23:48 ` [PATCH v2 08/46] staging/lustre/obdclass: Add a preallocated percpu cl_env green
2016-03-30 23:48 ` [PATCH v2 09/46] staging/lustre/clio: add pages into writeback cache in batches green
2016-03-30 23:48 ` [PATCH v2 10/46] staging/lustre/osc: add weight function for DLM lock green
2016-03-30 23:48 ` [PATCH v2 11/46] staging/lustre/clio: remove stackable cl_page completely green
2016-03-30 23:48 ` [PATCH v2 12/46] staging/lustre/clio: optimize read ahead code green
2016-03-30 23:48 ` [PATCH v2 13/46] staging/lustre/llite: remove lli_lvb green
2016-03-30 23:48 ` [PATCH v2 14/46] staging/lustre/lmv: remove lmv_init_{lock,unlock}() green
2016-03-30 23:48 ` [PATCH v2 15/46] staging/lustre/obd: remove struct client_obd_lock green
2016-03-30 23:48 ` [PATCH v2 16/46] staging/lustre/llite: remove some cl wrappers green
2016-03-30 23:48 ` [PATCH v2 17/46] staging/lustre: Remove struct ll_iattr green
2016-03-30 23:48 ` [PATCH v2 18/46] staging/lustre/clio: generalize cl_sync_io green
2016-03-30 23:48 ` [PATCH v2 19/46] staging/lustre/clio: cl_lock simplification green
2016-03-30 23:48 ` [PATCH v2 20/46] staging/lustre: update comments after " green
2016-03-30 23:48 ` [PATCH v2 21/46] staging/lustre/llite: clip page correctly for vvp_io_commit_sync green
2016-03-30 23:48 ` [PATCH v2 22/46] staging/lustre/llite: deadlock for page write green
2016-03-30 23:48 ` [PATCH v2 23/46] staging/lustre/llite: make sure we do cl_page_clip on the last page green
2016-03-30 23:48 ` [PATCH v2 24/46] staging/lustre/llite: merge lclient.h into llite/vvp_internal.h green
2016-03-30 23:48 ` [PATCH v2 25/46] staging/lustre/llite: rename ccc_device to vvp_device green
2016-03-30 23:48 ` [PATCH v2 26/46] staging/lustre/llite: rename ccc_object to vvp_object green
2016-03-30 23:48 ` [PATCH v2 27/46] staging/lustre/llite: rename ccc_page to vvp_page green
2016-03-30 23:48 ` [PATCH v2 28/46] staging/lustre/llite: rename ccc_lock to vvp_lock green
2016-03-30 23:48 ` [PATCH v2 29/46] staging/lustre:llite: remove struct ll_ra_read green
2016-03-30 23:48 ` [PATCH v2 30/46] staging/lustre/llite: merge ccc_io and vvp_io green
2016-03-30 23:48 ` [PATCH v2 31/46] staging/lustre/llite: use vui prefix for struct vvp_io members green
2016-03-30 23:48 ` [PATCH v2 32/46] staging/lustre/llite: move vvp_io functions to vvp_io.c green
2016-03-30 23:48 ` [PATCH v2 33/46] staging/lustre/llite: rename ccc_req to vvp_req green
2016-03-30 23:48 ` [PATCH v2 34/46] staging/lustre/llite: Rename struct ccc_grouplock to ll_grouplock green
2016-03-30 23:48 ` [PATCH v2 35/46] staging/lustre/llite: Rename struct vvp_thread_info to ll_thread_info green
2016-03-30 23:48 ` [PATCH v2 36/46] staging/lustre/llite: rename struct ccc_thread_info to vvp_thread_info green
2016-03-30 23:48 ` [PATCH v2 37/46] staging/lustre/llite: Remove ccc_global_{init,fini}() green
2016-03-30 23:48 ` [PATCH v2 38/46] staging/lustre/llite: Move ll_dirent_type_get and make it static green
2016-03-30 23:49 ` [PATCH v2 39/46] staging/lustre/llite: Move several declarations to llite_internal.h green
2016-03-30 23:49 ` [PATCH v2 40/46] staging/lustre/llite: Remove unused vui_local_lock field green
2016-03-30 23:49 ` [PATCH v2 41/46] staging/lustre/ldlm: ELC picks locks in a safer policy green
2016-03-30 23:49 ` [PATCH v2 42/46] staging/lustre/ldlm: revert changes to ldlm_cancel_aged_policy() green
2016-03-30 23:49 ` [PATCH v2 43/46] staging/lustre/ldlm: restore the ELC for enqueue green
2016-03-30 23:49 ` [PATCH v2 44/46] staging/lustre: Fix spacing style before open parenthesis green
2016-03-30 23:49 ` [PATCH v2 45/46] staging/lustre/ldlm: Solve a race for LRU lock cancel green
2016-03-30 23:49 ` [PATCH v2 46/46] staging/lustre: lov_io_init() should return error code green
