* [lustre-devel] [PATCH v2 00/33] lustre: add PFL support
@ 2019-01-06 22:13 James Simmons
  2019-01-06 22:13 ` [lustre-devel] [PATCH v2 01/33] lustre: clio: fix incorrect invariant in cl_io_iter_fini() James Simmons
                   ` (32 more replies)
  0 siblings, 33 replies; 47+ messages in thread
From: James Simmons @ 2019-01-06 22:13 UTC (permalink / raw)
  To: lustre-devel

Progressive file layouts (PFL) are characterized by a stripe count
that increases in a step-wise manner as the file offset increases.
This is achieved by using multiple extent-based composite layouts,
as described in the Layout Enhancement High Level Design:

http://wiki.lustre.org/Layout_Enhancement_High_Level_Design
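A minimal userspace sketch of the idea follows. It assumes half-open component extents [e_start, e_end) and an open-ended last component. The struct names, SKETCH_EOF and the 1/4/8 stripe layout are hypothetical illustrations, not the client's actual data structures. The sketch shows how a file offset selects the component, and therefore the stripe count, that applies to it:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "end of file" sentinel for an open-ended last component. */
#define SKETCH_EOF UINT64_MAX

/* Shaped like struct lu_extent; assumed half-open: [e_start, e_end). */
struct sketch_extent {
	uint64_t e_start;
	uint64_t e_end;
};

/* One component of a hypothetical progressive layout. */
struct sketch_comp {
	struct sketch_extent extent;
	uint16_t stripe_count;
};

/* Return the index of the component covering @offset, or -1 if none. */
static int comp_for_offset(const struct sketch_comp *comps, size_t count,
			   uint64_t offset)
{
	size_t i;

	assert(comps != NULL || count == 0);
	for (i = 0; i < count; i++) {
		if (offset >= comps[i].extent.e_start &&
		    offset < comps[i].extent.e_end)
			return (int)i;
	}
	return -1;
}

/* Example: 1 stripe for the first 1 MiB, 4 stripes up to 256 MiB,
 * then 8 stripes up to EOF.
 */
static const struct sketch_comp example_layout[] = {
	{ { 0, 1ULL << 20 }, 1 },
	{ { 1ULL << 20, 256ULL << 20 }, 4 },
	{ { 256ULL << 20, SKETCH_EOF }, 8 },
};
```

With this example layout, an offset of 300 MiB falls in the third component and would be striped over 8 OSTs, while the first megabyte stays on a single OST.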

This was the key feature still missing from the Linux Lustre client,
and this patch set adds it. With it, the Linux Lustre client is at
roughly the same level as the 2.10.6 LTS release, so we can also bump
the Lustre version.

Andreas Dilger (1):
  lustre: lov: use stripe_count instead of stripe_nr

Bobi Jam (20):
  lustre: lov: move code for PFL work
  lustre: lov: merge lov_mds_md_v3 and lov_mds_md_v1 handling
  lustre: lov: fold lmm_verify() handling into lmm_unpackmd()
  lustre: lov: create struct lov_stripe_md_entry
  lustre: lov: add composite layout unpacking
  lustre: lov: embedded raid0 in struct lov_layout_composite
  lustre: lov: migrate lov raid0 to future PFL component handling
  lustre: lov: reduce code indentation
  lustre: lov: change lo_entries to array.
  lustre: lov: move around PFL code and cleanups
  lustre: lov: remove lsm_stripe_by_[index|offset]_plain
  lustre: lov: add looping lsm_entry_count times
  lustre: lov: create lov_comp_* wrappers
  lustre: clio: client side implementation for PFL
  lustre: pfl: dynamic layout modification with write/truncate
  lustre: pfl: calculate PFL file LOVEA correctly
  lustre: lov: keep minimum LOVEA size
  lustre: pfl: fix hang with grouplocks
  lustre: pfl: fix ost pool op->size handling
  lustre: llite: restore ll_file_getstripe in ll_lov_setstripe

Fan Yong (1):
  lustre: pfl: enhance PFID EA for PFL

James Simmons (2):
  lustre: clio: fix incorrect invariant in cl_io_iter_fini()
  lustre: update version to 2.9.99

Jinshan Xiong (3):
  lustre: pfl: Read should not trigger layout write intent
  lustre: lov: readahead shouldn't exceed component boundary
  lustre: lov: do not split IO for single striped file

Mike Pershin (1):
  lustre: lov: call cl_object_attr_get under cl_attr lock

Niu Yawei (4):
  lustre: pfl: Basic data structures for composite layout
  lustre: clio: getstripe support comp layout
  lustre: uapi: support negative flags
  lustre: llite: return v1/v3 layout for legacy app

wang di (1):
  lustre: ldlm: Transfer layout only if layout lock is granted

 .../lustre/include/uapi/linux/lustre/lustre_idl.h  |  36 +-
 .../lustre/include/uapi/linux/lustre/lustre_user.h |  84 ++-
 .../lustre/include/uapi/linux/lustre/lustre_ver.h  |   4 +-
 drivers/staging/lustre/lustre/include/cl_object.h  |  12 +-
 drivers/staging/lustre/lustre/include/lustre_sec.h |   4 +-
 .../staging/lustre/lustre/include/lustre_swab.h    |   1 +
 drivers/staging/lustre/lustre/include/obd.h        |   4 -
 drivers/staging/lustre/lustre/ldlm/ldlm_lockd.c    |  18 -
 drivers/staging/lustre/lustre/llite/dir.c          |  38 +-
 drivers/staging/lustre/lustre/llite/file.c         | 200 +++--
 .../staging/lustre/lustre/llite/llite_internal.h   |   3 +
 drivers/staging/lustre/lustre/llite/vvp_io.c       |  44 +-
 drivers/staging/lustre/lustre/llite/xattr.c        |  70 +-
 .../staging/lustre/lustre/lov/lov_cl_internal.h    | 191 ++---
 drivers/staging/lustre/lustre/lov/lov_ea.c         | 570 ++++++++++----
 drivers/staging/lustre/lustre/lov/lov_internal.h   | 177 +++--
 drivers/staging/lustre/lustre/lov/lov_io.c         | 651 +++++++++-------
 drivers/staging/lustre/lustre/lov/lov_lock.c       |  94 ++-
 drivers/staging/lustre/lustre/lov/lov_merge.c      |  12 +-
 drivers/staging/lustre/lustre/lov/lov_object.c     | 833 ++++++++++++---------
 drivers/staging/lustre/lustre/lov/lov_offset.c     |  65 +-
 drivers/staging/lustre/lustre/lov/lov_pack.c       | 368 +++++----
 drivers/staging/lustre/lustre/lov/lov_page.c       |  42 +-
 drivers/staging/lustre/lustre/lov/lov_pool.c       |  20 +-
 drivers/staging/lustre/lustre/lov/lov_request.c    |   4 +-
 drivers/staging/lustre/lustre/lov/lovsub_object.c  |  23 +-
 drivers/staging/lustre/lustre/mdc/mdc_locks.c      |  89 ++-
 drivers/staging/lustre/lustre/obdclass/cl_io.c     |   2 +-
 drivers/staging/lustre/lustre/obdclass/cl_object.c |   5 +-
 drivers/staging/lustre/lustre/obdclass/genops.c    |  16 +-
 drivers/staging/lustre/lustre/osc/osc_io.c         |   4 +-
 drivers/staging/lustre/lustre/osc/osc_lock.c       |   7 +-
 drivers/staging/lustre/lustre/ptlrpc/layout.c      |   6 +-
 .../staging/lustre/lustre/ptlrpc/pack_generic.c    |  84 ++-
 .../staging/lustre/lustre/ptlrpc/ptlrpc_internal.h |   7 +-
 drivers/staging/lustre/lustre/ptlrpc/sec.c         |   5 +-
 drivers/staging/lustre/lustre/ptlrpc/wiretest.c    | 125 +++-
 37 files changed, 2507 insertions(+), 1411 deletions(-)

-- 
1.8.3.1

* [lustre-devel] [PATCH v2 01/33] lustre: clio: fix incorrect invariant in cl_io_iter_fini()
  2019-01-06 22:13 [lustre-devel] [PATCH v2 00/33] lustre: add PFL support James Simmons
@ 2019-01-06 22:13 ` James Simmons
  2019-01-06 22:13 ` [lustre-devel] [PATCH v2 02/33] lustre: pfl: Basic data structures for composite layout James Simmons
                   ` (31 subsequent siblings)
  32 siblings, 0 replies; 47+ messages in thread
From: James Simmons @ 2019-01-06 22:13 UTC (permalink / raw)
  To: lustre-devel

It was discovered during PFL testing that, if invariants are enabled,
cl_io_iter_fini() crashes with the following backtrace:

kernel: cl_io_iter_fini+0x10c/0x110 [obdclass]
kernel: cl_io_loop+0x46/0x220 [obdclass]
kernel: cl_setattr_ost+0x1ed/0x2a0 [lustre]
kernel: ll_setattr_raw+0x7b0/0x9a0 [lustre]
kernel: notify_change+0x1dc/0x430
kernel: do_truncate+0x72/0xc0
kernel: do_sys_ftruncate+0xf5/0x160

This is due to the assumption that ci_state will always be
CIS_UNLOCKED, but the behavior of cl_io_loop() shows that this is
not the case. What we actually want to ensure is that the state is
not in the range CIS_LOCKED to CIS_IO_FINISHED (i.e. that no IO
locks are held) when cl_io_iter_fini() is called.
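A small sketch of the old and new checks is below. It assumes the enum mirrors the ordering of enum cl_io_state in cl_object.h; the sketch_ names are hypothetical. A state such as CIS_IT_STARTED, which cl_io_loop() can presumably leave behind when cl_io_lock() fails, is rejected by the old check but accepted by the new one, while the lock-holding states stay forbidden:

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to mirror the ordering of enum cl_io_state in cl_object.h. */
enum sketch_io_state {
	CIS_ZERO,
	CIS_INIT,
	CIS_IT_STARTED,
	CIS_LOCKED,
	CIS_IO_GOING,
	CIS_IO_FINISHED,
	CIS_UNLOCKED,
	CIS_IT_ENDED,
	CIS_FINI,
};

/* The range check only works if the lock-holding states are contiguous. */
static_assert(CIS_LOCKED < CIS_IO_GOING && CIS_IO_GOING < CIS_IO_FINISHED,
	      "lock-holding states must form a contiguous range");

/* Old invariant: only CIS_UNLOCKED was accepted. */
static bool iter_fini_ok_old(enum sketch_io_state s)
{
	return s == CIS_UNLOCKED;
}

/* New invariant: reject only the states in which locks are held. */
static bool iter_fini_ok_new(enum sketch_io_state s)
{
	return s < CIS_LOCKED || s > CIS_IO_FINISHED;
}
```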

Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-11828
Reviewed-on: https://review.whamcloud.com/33915
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/obdclass/cl_io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/lustre/lustre/obdclass/cl_io.c b/drivers/staging/lustre/lustre/obdclass/cl_io.c
index 879383ae..0da731c 100644
--- a/drivers/staging/lustre/lustre/obdclass/cl_io.c
+++ b/drivers/staging/lustre/lustre/obdclass/cl_io.c
@@ -436,7 +436,7 @@ void cl_io_iter_fini(const struct lu_env *env, struct cl_io *io)
 	const struct cl_io_slice *scan;
 
 	LINVRNT(cl_io_is_loopable(io));
-	LINVRNT(io->ci_state == CIS_UNLOCKED);
+	LINVRNT(io->ci_state < CIS_LOCKED || io->ci_state > CIS_IO_FINISHED);
 	LINVRNT(cl_io_invariant(io));
 
 	list_for_each_entry_reverse(scan, &io->ci_layers, cis_linkage) {
-- 
1.8.3.1

* [lustre-devel] [PATCH v2 02/33] lustre: pfl: Basic data structures for composite layout
  2019-01-06 22:13 [lustre-devel] [PATCH v2 00/33] lustre: add PFL support James Simmons
  2019-01-06 22:13 ` [lustre-devel] [PATCH v2 01/33] lustre: clio: fix incorrect invariant in cl_io_iter_fini() James Simmons
@ 2019-01-06 22:13 ` James Simmons
  2019-01-06 22:13 ` [lustre-devel] [PATCH v2 03/33] lustre: lov: move code for PFL work James Simmons
                   ` (30 subsequent siblings)
  32 siblings, 0 replies; 47+ messages in thread
From: James Simmons @ 2019-01-06 22:13 UTC (permalink / raw)
  To: lustre-devel

From: Niu Yawei <yawei.niu@intel.com>

Added basic structures and magic numbers for composite layout.

Details about PFL can be reviewed at
http://wiki.lustre.org/PFL_Prototype_High_Level_Design
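For reference, here is a userspace sketch of the new composite layout header and entry structures. It assumes GCC or Clang (for __attribute__((packed)) and __builtin_bswap16()), and the sketch_ names and comp_entry_count_cpu() are hypothetical. The static assertions restate some of the sizes and offsets that wiretest.c checks, and the helper shows why lustre_swab_lov_comp_md_v1() reads the entry count in CPU byte order before swapping the header in place:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace copies using <stdint.h> instead of __u32/__u64. */
struct sketch_lu_extent {
	uint64_t e_start;
	uint64_t e_end;
};

struct sketch_comp_md_entry_v1 {
	uint32_t lcme_id;
	uint32_t lcme_flags;
	struct sketch_lu_extent lcme_extent;
	uint32_t lcme_offset;	/* blob offset from the start of the header */
	uint32_t lcme_size;
	uint64_t lcme_padding[2];
} __attribute__((packed));

struct sketch_comp_md_v1 {
	uint32_t lcm_magic;
	uint32_t lcm_size;
	uint32_t lcm_layout_gen;
	uint16_t lcm_flags;
	uint16_t lcm_entry_count;
	uint64_t lcm_padding1;
	uint64_t lcm_padding2;
	struct sketch_comp_md_entry_v1 lcm_entries[];	/* flexible array */
} __attribute__((packed));

/* Same values that wiretest.c asserts at runtime. */
static_assert(sizeof(struct sketch_comp_md_entry_v1) == 48, "entry size");
static_assert(offsetof(struct sketch_comp_md_entry_v1, lcme_offset) == 24,
	      "lcme_offset");
static_assert(sizeof(struct sketch_comp_md_v1) == 32, "header size");
static_assert(offsetof(struct sketch_comp_md_v1, lcm_entry_count) == 14,
	      "lcm_entry_count");

#define SKETCH_USER_MAGIC_COMP_V1 0x0BD60BD0u

/* Return lcm_entry_count in CPU byte order: if the magic does not match
 * as-is, the buffer came from the other endianness and must be swapped.
 */
static uint16_t comp_entry_count_cpu(const struct sketch_comp_md_v1 *lcm)
{
	uint16_t count = lcm->lcm_entry_count;

	if (lcm->lcm_magic != SKETCH_USER_MAGIC_COMP_V1)
		count = __builtin_bswap16(count);
	return count;
}
```

Once the header has been swapped in place, lcm_entry_count no longer tells the loop how many entries are left to swap, which is why the swab routine captures ent_count (and each entry's offset and size) first.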

Signed-off-by: Niu Yawei <yawei.niu@intel.com>
WC-bug-id: https://jira.whamcloud.com/browse/LU-8998
Reviewed-on: https://review.whamcloud.com/24822
Reviewed-by: Jinshan Xiong <jinshan.xiong@gmail.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 .../lustre/include/uapi/linux/lustre/lustre_idl.h  |  1 +
 .../lustre/include/uapi/linux/lustre/lustre_user.h | 50 +++++++++++++++
 .../staging/lustre/lustre/include/lustre_swab.h    |  1 +
 .../staging/lustre/lustre/ptlrpc/pack_generic.c    | 71 ++++++++++++++++++++++
 drivers/staging/lustre/lustre/ptlrpc/wiretest.c    | 67 ++++++++++++++++++++
 5 files changed, 190 insertions(+)

diff --git a/drivers/staging/lustre/include/uapi/linux/lustre/lustre_idl.h b/drivers/staging/lustre/include/uapi/linux/lustre/lustre_idl.h
index a42ce9d..333b791 100644
--- a/drivers/staging/lustre/include/uapi/linux/lustre/lustre_idl.h
+++ b/drivers/staging/lustre/include/uapi/linux/lustre/lustre_idl.h
@@ -960,6 +960,7 @@ enum obdo_flags {
 /* reserved for specifying OSTs */
 #define LOV_MAGIC_SPECIFIC	(0x0BD50000 | LOV_MAGIC_MAGIC)
 #define LOV_MAGIC		LOV_MAGIC_V1
+#define LOV_MAGIC_COMP_V1	(0x0BD60000 | LOV_MAGIC_MAGIC)
 
 /*
  * magic for fully defined striping
diff --git a/drivers/staging/lustre/include/uapi/linux/lustre/lustre_user.h b/drivers/staging/lustre/include/uapi/linux/lustre/lustre_user.h
index 4412dc8..bb87a6f 100644
--- a/drivers/staging/lustre/include/uapi/linux/lustre/lustre_user.h
+++ b/drivers/staging/lustre/include/uapi/linux/lustre/lustre_user.h
@@ -327,6 +327,7 @@ enum ll_lease_type {
 #define LOV_USER_MAGIC_V3	0x0BD30BD0
 /* 0x0BD40BD0 is occupied by LOV_MAGIC_MIGRATE */
 #define LOV_USER_MAGIC_SPECIFIC	0x0BD50BD0	/* for specific OSTs */
+#define LOV_USER_MAGIC_COMP_V1	0x0BD60BD0
 
 #define LMV_USER_MAGIC    0x0CD30CD0    /*default lmv magic*/
 
@@ -405,6 +406,55 @@ struct lov_user_md_v3 {	   /* LOV EA user data (host-endian) */
 	struct lov_user_ost_data_v1 lmm_objects[0]; /* per-stripe data */
 } __packed;
 
+struct lu_extent {
+	__u64	e_start;
+	__u64	e_end;
+};
+
+enum lov_comp_md_entry_flags {
+	LCME_FL_PRIMARY		= 0x00000001,   /* Not used */
+	LCME_FL_STALE		= 0x00000002,   /* Not used */
+	LCME_FL_OFFLINE		= 0x00000004,   /* Not used */
+	LCME_FL_PREFERRED	= 0x00000008,	/* Not used */
+	LCME_FL_INIT		= 0x00000010,	/* instantiated */
+};
+
+#define LCME_KNOWN_FLAGS	LCME_FL_INIT
+
+/* lcme_id can be specified as certain flags, and the first
+ * bit of lcme_id is used to indicate that the ID is representing
+ * certain LCME_FL_* but not a real ID. Which implies we can have
+ * at most 31 flags (see LCME_FL_XXX).
+ */
+enum lcme_id {
+	LCME_ID_INVAL	= 0x0,
+	LCME_ID_MAX	= 0x7FFFFFFF,
+	LCME_ID_ALL	= 0xFFFFFFFF,
+	LCME_ID_NONE	= 0x80000000
+};
+
+struct lov_comp_md_entry_v1 {
+	__u32			lcme_id;	/* unique id of component */
+	__u32			lcme_flags;	/* LCME_FL_XXX */
+	struct lu_extent	lcme_extent;	/* file extent for component */
+	__u32			lcme_offset;	/* offset of component blob,
+						 * start from lov_comp_md_v1
+						 */
+	__u32			lcme_size;	/* size of component blob */
+	__u64			lcme_padding[2];
+} __packed;
+
+struct lov_comp_md_v1 {
+	__u32	lcm_magic;	/* LOV_USER_MAGIC_COMP_V1 */
+	__u32	lcm_size;	/* overall size including this struct */
+	__u32	lcm_layout_gen;
+	__u16	lcm_flags;
+	__u16	lcm_entry_count;
+	__u64	lcm_padding1;
+	__u64	lcm_padding2;
+	struct lov_comp_md_entry_v1 lcm_entries[0];
+} __packed;
+
 static inline __u32 lov_user_md_size(__u16 stripes, __u32 lmm_magic)
 {
 	if (lmm_magic == LOV_USER_MAGIC_V1)
diff --git a/drivers/staging/lustre/lustre/include/lustre_swab.h b/drivers/staging/lustre/lustre/include/lustre_swab.h
index e09a3dc..6939ac1 100644
--- a/drivers/staging/lustre/lustre/include/lustre_swab.h
+++ b/drivers/staging/lustre/lustre/include/lustre_swab.h
@@ -83,6 +83,7 @@
 void lustre_swab_fiemap(struct fiemap *fiemap);
 void lustre_swab_lov_user_md_v1(struct lov_user_md_v1 *lum);
 void lustre_swab_lov_user_md_v3(struct lov_user_md_v3 *lum);
+void lustre_swab_lov_comp_md_v1(struct lov_comp_md_v1 *lum);
 void lustre_swab_lov_user_md_objects(struct lov_user_ost_data *lod,
 				     int stripe_count);
 void lustre_swab_lov_mds_md(struct lov_mds_md *lmm);
diff --git a/drivers/staging/lustre/lustre/ptlrpc/pack_generic.c b/drivers/staging/lustre/lustre/ptlrpc/pack_generic.c
index 951bb92..9c5be30 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/pack_generic.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/pack_generic.c
@@ -1990,6 +1990,77 @@ void lustre_swab_lov_user_md_v3(struct lov_user_md_v3 *lum)
 }
 EXPORT_SYMBOL(lustre_swab_lov_user_md_v3);
 
+void lustre_swab_lov_comp_md_v1(struct lov_comp_md_v1 *lum)
+{
+	struct lov_comp_md_entry_v1 *ent;
+	bool cpu_endian;
+	u16 ent_count;
+	int i;
+
+	cpu_endian = lum->lcm_magic == LOV_USER_MAGIC_COMP_V1;
+	ent_count = lum->lcm_entry_count;
+	if (!cpu_endian)
+		__swab16s(&ent_count);
+
+	CDEBUG(D_IOCTL, "swabbing lov_user_comp_md v1\n");
+	__swab32s(&lum->lcm_magic);
+	__swab32s(&lum->lcm_size);
+	__swab32s(&lum->lcm_layout_gen);
+	__swab16s(&lum->lcm_flags);
+	__swab16s(&lum->lcm_entry_count);
+	BUILD_BUG_ON(offsetof(typeof(*lum), lcm_padding1) == 0);
+	BUILD_BUG_ON(offsetof(typeof(*lum), lcm_padding2) == 0);
+
+	for (i = 0; i < ent_count; i++) {
+		struct lov_user_md_v1 *v1;
+		u16 stripe_count;
+		u32 off, size;
+
+		ent = &lum->lcm_entries[i];
+		off = ent->lcme_offset;
+		size = ent->lcme_size;
+
+		if (!cpu_endian) {
+			__swab32s(&off);
+			__swab32s(&size);
+		}
+		__swab32s(&ent->lcme_id);
+		__swab32s(&ent->lcme_flags);
+		__swab64s(&ent->lcme_extent.e_start);
+		__swab64s(&ent->lcme_extent.e_end);
+		__swab32s(&ent->lcme_offset);
+		__swab32s(&ent->lcme_size);
+		BUILD_BUG_ON(offsetof(typeof(*ent), lcme_padding) == 0);
+
+		v1 = (struct lov_user_md_v1 *)((char *)lum + off);
+		stripe_count = v1->lmm_stripe_count;
+		if (!cpu_endian)
+			__swab16s(&stripe_count);
+
+		if (v1->lmm_magic == __swab32(LOV_USER_MAGIC_V1) ||
+		    v1->lmm_magic == LOV_USER_MAGIC_V1) {
+			lustre_swab_lov_user_md_v1(v1);
+			if (size > sizeof(*v1))
+				lustre_swab_lov_user_md_objects(v1->lmm_objects,
+								stripe_count);
+		} else if (v1->lmm_magic == __swab32(LOV_USER_MAGIC_V3) ||
+			   v1->lmm_magic == LOV_USER_MAGIC_V3 ||
+			   v1->lmm_magic == __swab32(LOV_USER_MAGIC_SPECIFIC) ||
+			   v1->lmm_magic == LOV_USER_MAGIC_SPECIFIC) {
+			struct lov_user_md_v3 *v3;
+
+			v3 = (struct lov_user_md_v3 *)v1;
+			lustre_swab_lov_user_md_v3(v3);
+			if (size > sizeof(*v3))
+				lustre_swab_lov_user_md_objects(v3->lmm_objects,
+								stripe_count);
+		} else {
+			CERROR("Invalid magic %#x\n", v1->lmm_magic);
+		}
+	}
+}
+EXPORT_SYMBOL(lustre_swab_lov_comp_md_v1);
+
 void lustre_swab_lov_mds_md(struct lov_mds_md *lmm)
 {
 	CDEBUG(D_IOCTL, "swabbing lov_mds_md\n");
diff --git a/drivers/staging/lustre/lustre/ptlrpc/wiretest.c b/drivers/staging/lustre/lustre/ptlrpc/wiretest.c
index 3aaaebb..90e6b8c 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/wiretest.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/wiretest.c
@@ -1450,6 +1450,73 @@ void lustre_assert_wire_constants(void)
 	LASSERTF(LOV_PATTERN_CMOBD == 0x00000200UL, "found 0x%.8xUL\n",
 		 (unsigned int)LOV_PATTERN_CMOBD);
 
+	/* Checks for struct lov_comp_md_entry_v1 */
+	LASSERTF((int)sizeof(struct lov_comp_md_entry_v1) == 48, "found %lld\n",
+		 (long long)(int)sizeof(struct lov_comp_md_entry_v1));
+	LASSERTF((int)offsetof(struct lov_comp_md_entry_v1, lcme_id) == 0, "found %lld\n",
+		 (long long)(int)offsetof(struct lov_comp_md_entry_v1, lcme_id));
+	LASSERTF((int)sizeof(((struct lov_comp_md_entry_v1 *)0)->lcme_id) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct lov_comp_md_entry_v1 *)0)->lcme_id));
+	LASSERTF((int)offsetof(struct lov_comp_md_entry_v1, lcme_flags) == 4, "found %lld\n",
+		 (long long)(int)offsetof(struct lov_comp_md_entry_v1, lcme_flags));
+	LASSERTF((int)sizeof(((struct lov_comp_md_entry_v1 *)0)->lcme_flags) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct lov_comp_md_entry_v1 *)0)->lcme_flags));
+	LASSERTF((int)offsetof(struct lov_comp_md_entry_v1, lcme_extent) == 8, "found %lld\n",
+		 (long long)(int)offsetof(struct lov_comp_md_entry_v1, lcme_extent));
+	LASSERTF((int)sizeof(((struct lov_comp_md_entry_v1 *)0)->lcme_extent) == 16, "found %lld\n",
+		 (long long)(int)sizeof(((struct lov_comp_md_entry_v1 *)0)->lcme_extent));
+	LASSERTF((int)offsetof(struct lov_comp_md_entry_v1, lcme_offset) == 24, "found %lld\n",
+		 (long long)(int)offsetof(struct lov_comp_md_entry_v1, lcme_offset));
+	LASSERTF((int)sizeof(((struct lov_comp_md_entry_v1 *)0)->lcme_offset) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct lov_comp_md_entry_v1 *)0)->lcme_offset));
+	LASSERTF((int)offsetof(struct lov_comp_md_entry_v1, lcme_size) == 28, "found %lld\n",
+		 (long long)(int)offsetof(struct lov_comp_md_entry_v1, lcme_size));
+	LASSERTF((int)sizeof(((struct lov_comp_md_entry_v1 *)0)->lcme_size) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct lov_comp_md_entry_v1 *)0)->lcme_size));
+	LASSERTF((int)offsetof(struct lov_comp_md_entry_v1, lcme_padding) == 32, "found %lld\n",
+		 (long long)(int)offsetof(struct lov_comp_md_entry_v1, lcme_padding));
+	LASSERTF((int)sizeof(((struct lov_comp_md_entry_v1 *)0)->lcme_padding) == 16, "found %lld\n",
+		 (long long)(int)sizeof(((struct lov_comp_md_entry_v1 *)0)->lcme_padding));
+	LASSERTF(LCME_FL_INIT == 0x00000010UL, "found 0x%.8xUL\n",
+	         (unsigned int)LCME_FL_INIT);
+
+	/* Checks for struct lov_comp_md_v1 */
+	LASSERTF((int)sizeof(struct lov_comp_md_v1) == 32, "found %lld\n",
+		 (long long)(int)sizeof(struct lov_comp_md_v1));
+	LASSERTF((int)offsetof(struct lov_comp_md_v1, lcm_magic) == 0, "found %lld\n",
+		 (long long)(int)offsetof(struct lov_comp_md_v1, lcm_magic));
+	LASSERTF((int)sizeof(((struct lov_comp_md_v1 *)0)->lcm_magic) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct lov_comp_md_v1 *)0)->lcm_magic));
+	LASSERTF((int)offsetof(struct lov_comp_md_v1, lcm_size) == 4, "found %lld\n",
+		 (long long)(int)offsetof(struct lov_comp_md_v1, lcm_size));
+	LASSERTF((int)sizeof(((struct lov_comp_md_v1 *)0)->lcm_size) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct lov_comp_md_v1 *)0)->lcm_size));
+	LASSERTF((int)offsetof(struct lov_comp_md_v1, lcm_layout_gen) == 8, "found %lld\n",
+		 (long long)(int)offsetof(struct lov_comp_md_v1, lcm_layout_gen));
+	LASSERTF((int)sizeof(((struct lov_comp_md_v1 *)0)->lcm_layout_gen) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct lov_comp_md_v1 *)0)->lcm_layout_gen));
+	LASSERTF((int)offsetof(struct lov_comp_md_v1, lcm_flags) == 12, "found %lld\n",
+		 (long long)(int)offsetof(struct lov_comp_md_v1, lcm_flags));
+	LASSERTF((int)sizeof(((struct lov_comp_md_v1 *)0)->lcm_flags) == 2, "found %lld\n",
+		 (long long)(int)sizeof(((struct lov_comp_md_v1 *)0)->lcm_flags));
+	LASSERTF((int)offsetof(struct lov_comp_md_v1, lcm_entry_count) == 14, "found %lld\n",
+		 (long long)(int)offsetof(struct lov_comp_md_v1, lcm_entry_count));
+	LASSERTF((int)sizeof(((struct lov_comp_md_v1 *)0)->lcm_entry_count) == 2, "found %lld\n",
+		 (long long)(int)sizeof(((struct lov_comp_md_v1 *)0)->lcm_entry_count));
+	LASSERTF((int)offsetof(struct lov_comp_md_v1, lcm_padding1) == 16, "found %lld\n",
+		 (long long)(int)offsetof(struct lov_comp_md_v1, lcm_padding1));
+	LASSERTF((int)sizeof(((struct lov_comp_md_v1 *)0)->lcm_padding1) == 8, "found %lld\n",
+		 (long long)(int)sizeof(((struct lov_comp_md_v1 *)0)->lcm_padding1));
+	LASSERTF((int)offsetof(struct lov_comp_md_v1, lcm_padding2) == 24, "found %lld\n",
+		 (long long)(int)offsetof(struct lov_comp_md_v1, lcm_padding2));
+	LASSERTF((int)sizeof(((struct lov_comp_md_v1 *)0)->lcm_padding2) == 8, "found %lld\n",
+		 (long long)(int)sizeof(((struct lov_comp_md_v1 *)0)->lcm_padding2));
+	LASSERTF((int)offsetof(struct lov_comp_md_v1, lcm_entries[0]) == 32, "found %lld\n",
+		 (long long)(int)offsetof(struct lov_comp_md_v1, lcm_entries[0]));
+	LASSERTF((int)sizeof(((struct lov_comp_md_v1 *)0)->lcm_entries[0]) == 48, "found %lld\n",
+		 (long long)(int)sizeof(((struct lov_comp_md_v1 *)0)->lcm_entries[0]));
+	BUILD_BUG_ON(LOV_MAGIC_COMP_V1 != (0x0BD60000 | 0x0BD0));
+
 	/* Checks for struct lmv_mds_md_v1 */
 	LASSERTF((int)sizeof(struct lmv_mds_md_v1) == 56, "found %lld\n",
 		 (long long)(int)sizeof(struct lmv_mds_md_v1));
-- 
1.8.3.1

* [lustre-devel] [PATCH v2 03/33] lustre: lov: move code for PFL work
  2019-01-06 22:13 [lustre-devel] [PATCH v2 00/33] lustre: add PFL support James Simmons
  2019-01-06 22:13 ` [lustre-devel] [PATCH v2 01/33] lustre: clio: fix incorrect invariant in cl_io_iter_fini() James Simmons
  2019-01-06 22:13 ` [lustre-devel] [PATCH v2 02/33] lustre: pfl: Basic data structures for composite layout James Simmons
@ 2019-01-06 22:13 ` James Simmons
  2019-01-06 22:13 ` [lustre-devel] [PATCH v2 04/33] lustre: lov: merge lov_mds_md_v3 and lov_mds_md_v1 handling James Simmons
                   ` (29 subsequent siblings)
  32 siblings, 0 replies; 47+ messages in thread
From: James Simmons @ 2019-01-06 22:13 UTC (permalink / raw)
  To: lustre-devel

From: Bobi Jam <bobijam@hotmail.com>

Move lov_tgt_maxbytes() and lsm_free_plain() toward the top of
lov_ea.c for upcoming PFL work. Also move the inline function
lsm_op_find() out of lov_internal.h into lov_ea.c, since it is
considered bad style to declare external structures and define an
inline function that uses them in the same header. Instead, only add
the lsm_op_find() prototype to lov_internal.h so that all the struct
lsm_operations instances can be made static in lov_ea.c.
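The resulting split can be sketched as follows. The sketch_ names and the name-only ops structure are hypothetical stand-ins for struct lsm_operations; only the magic values follow lustre_idl.h. The type and the lookup prototype would live in the header, while the ops tables stay static const in the .c file and are reached only through the lookup function:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Magic values as in lustre_idl.h, where LOV_MAGIC_MAGIC is 0x0BD0. */
#define SKETCH_LOV_MAGIC_V1 0x0BD10BD0
#define SKETCH_LOV_MAGIC_V3 0x0BD30BD0
static_assert(SKETCH_LOV_MAGIC_V1 == (0x0BD10000 | 0x0BD0), "v1 magic");

/* --- header part: the type and a prototype only --- */
struct sketch_lsm_ops {
	const char *name;	/* stand-in for the real method pointers */
};

const struct sketch_lsm_ops *sketch_op_find(int magic);

/* --- .c part: the tables are private to this file --- */
static const struct sketch_lsm_ops sketch_v1_ops = { .name = "v1" };
static const struct sketch_lsm_ops sketch_v3_ops = { .name = "v3" };

const struct sketch_lsm_ops *sketch_op_find(int magic)
{
	switch (magic) {
	case SKETCH_LOV_MAGIC_V1:
		return &sketch_v1_ops;
	case SKETCH_LOV_MAGIC_V3:
		return &sketch_v3_ops;
	default:
		/* stands in for CERROR() */
		fprintf(stderr, "unrecognized lsm_magic %08x\n",
			(unsigned int)magic);
		return NULL;
	}
}
```

Callers outside the .c file can no longer reference the tables directly, so adding a new layout type (such as the composite layout) only requires a new case in the lookup.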

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Signed-off-by: Bobi Jam <bobijam@hotmail.com>
Signed-off-by: Niu Yawei <yawei.niu@intel.com>
WC-bug-id: https://jira.whamcloud.com/browse/LU-8998
Reviewed-on: https://review.whamcloud.com/24849
Reviewed-by: Jinshan Xiong <jinshan.xiong@gmail.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/lov/lov_ea.c       | 87 ++++++++++++++----------
 drivers/staging/lustre/lustre/lov/lov_internal.h | 16 +----
 2 files changed, 51 insertions(+), 52 deletions(-)

diff --git a/drivers/staging/lustre/lustre/lov/lov_ea.c b/drivers/staging/lustre/lustre/lov/lov_ea.c
index c80320a..3dfb204 100644
--- a/drivers/staging/lustre/lustre/lov/lov_ea.c
+++ b/drivers/staging/lustre/lustre/lov/lov_ea.c
@@ -44,6 +44,33 @@
 
 #include "lov_internal.h"
 
+/*
+ * Find minimum stripe maxbytes value. For inactive or
+ * reconnecting targets use LUSTRE_EXT3_STRIPE_MAXBYTES.
+ */
+static loff_t lov_tgt_maxbytes(struct lov_tgt_desc *tgt)
+{
+	loff_t maxbytes = LUSTRE_EXT3_STRIPE_MAXBYTES;
+	struct obd_import *imp;
+
+	if (!tgt->ltd_active)
+		return maxbytes;
+
+	imp = tgt->ltd_obd->u.cli.cl_import;
+	if (!imp)
+		return maxbytes;
+
+	spin_lock(&imp->imp_lock);
+	if (imp->imp_state == LUSTRE_IMP_FULL &&
+	    (imp->imp_connect_data.ocd_connect_flags & OBD_CONNECT_MAXBYTES) &&
+	     imp->imp_connect_data.ocd_maxbytes > 0)
+		maxbytes = imp->imp_connect_data.ocd_maxbytes;
+
+	spin_unlock(&imp->imp_lock);
+
+	return maxbytes;
+}
+
 static int lsm_lmm_verify_common(struct lov_mds_md *lmm, int lmm_bytes,
 				 __u16 stripe_count)
 {
@@ -76,6 +103,16 @@ static int lsm_lmm_verify_common(struct lov_mds_md *lmm, int lmm_bytes,
 	return 0;
 }
 
+void lsm_free_plain(struct lov_stripe_md *lsm)
+{
+	__u16 stripe_count = lsm->lsm_stripe_count;
+	int i;
+
+	for (i = 0; i < stripe_count; i++)
+		kmem_cache_free(lov_oinfo_slab, lsm->lsm_oinfo[i]);
+	kvfree(lsm);
+}
+
 struct lov_stripe_md *lsm_alloc_plain(u16 stripe_count)
 {
 	size_t oinfo_ptrs_size, lsm_size;
@@ -108,43 +145,6 @@ struct lov_stripe_md *lsm_alloc_plain(u16 stripe_count)
 	return NULL;
 }
 
-void lsm_free_plain(struct lov_stripe_md *lsm)
-{
-	__u16 stripe_count = lsm->lsm_stripe_count;
-	int i;
-
-	for (i = 0; i < stripe_count; i++)
-		kmem_cache_free(lov_oinfo_slab, lsm->lsm_oinfo[i]);
-	kvfree(lsm);
-}
-
-/*
- * Find minimum stripe maxbytes value.  For inactive or
- * reconnecting targets use LUSTRE_EXT3_STRIPE_MAXBYTES.
- */
-static loff_t lov_tgt_maxbytes(struct lov_tgt_desc *tgt)
-{
-	loff_t maxbytes = LUSTRE_EXT3_STRIPE_MAXBYTES;
-	struct obd_import *imp;
-
-	if (!tgt->ltd_active)
-		return maxbytes;
-
-	imp = tgt->ltd_obd->u.cli.cl_import;
-	if (!imp)
-		return maxbytes;
-
-	spin_lock(&imp->imp_lock);
-	if (imp->imp_state == LUSTRE_IMP_FULL &&
-	    (imp->imp_connect_data.ocd_connect_flags & OBD_CONNECT_MAXBYTES) &&
-	     imp->imp_connect_data.ocd_maxbytes > 0)
-		maxbytes = imp->imp_connect_data.ocd_maxbytes;
-
-	spin_unlock(&imp->imp_lock);
-
-	return maxbytes;
-}
-
 static int lsm_unpackmd_common(struct lov_obd *lov,
 			       struct lov_stripe_md *lsm,
 			       struct lov_mds_md *lmm,
@@ -320,6 +320,19 @@ static int lsm_unpackmd_v3(struct lov_obd *lov, struct lov_stripe_md *lsm,
 	.lsm_unpackmd	   = lsm_unpackmd_v3,
 };
 
+const struct lsm_operations *lsm_op_find(int magic)
+{
+	switch (magic) {
+	case LOV_MAGIC_V1:
+		return &lsm_v1_ops;
+	case LOV_MAGIC_V3:
+		return &lsm_v3_ops;
+	default:
+		CERROR("unrecognized lsm_magic %08x\n", magic);
+		return NULL;
+	}
+}
+
 void dump_lsm(unsigned int level, const struct lov_stripe_md *lsm)
 {
 	CDEBUG(level, "lsm %p, objid " DOSTID ", maxbytes %#llx, magic 0x%08X, stripe_size %u, stripe_count %u, refc: %d, layout_gen %u, pool [" LOV_POOLNAMEF "]\n",
diff --git a/drivers/staging/lustre/lustre/lov/lov_internal.h b/drivers/staging/lustre/lustre/lov/lov_internal.h
index 44a997e..51f416e 100644
--- a/drivers/staging/lustre/lustre/lov/lov_internal.h
+++ b/drivers/staging/lustre/lustre/lov/lov_internal.h
@@ -92,21 +92,7 @@ struct lsm_operations {
 			    struct lov_mds_md *lmm);
 };
 
-extern const struct lsm_operations lsm_v1_ops;
-extern const struct lsm_operations lsm_v3_ops;
-
-static inline const struct lsm_operations *lsm_op_find(int magic)
-{
-	switch (magic) {
-	case LOV_MAGIC_V1:
-		return &lsm_v1_ops;
-	case LOV_MAGIC_V3:
-		return &lsm_v3_ops;
-	default:
-		CERROR("unrecognized lsm_magic %08x\n", magic);
-		return NULL;
-	}
-}
+const struct lsm_operations *lsm_op_find(int magic);
 
 /* lov_do_div64(a, b) returns a % b, and a = a / b.
  * The 32-bit code is LOV-specific due to knowing about stripe limits in
-- 
1.8.3.1

* [lustre-devel] [PATCH v2 04/33] lustre: lov: merge lov_mds_md_v3 and lov_mds_md_v1 handling
  2019-01-06 22:13 [lustre-devel] [PATCH v2 00/33] lustre: add PFL support James Simmons
                   ` (2 preceding siblings ...)
  2019-01-06 22:13 ` [lustre-devel] [PATCH v2 03/33] lustre: lov: move code for PFL work James Simmons
@ 2019-01-06 22:13 ` James Simmons
  2019-01-06 22:14 ` [lustre-devel] [PATCH v2 05/33] lustre: lov: fold lmm_verify() handling into lmm_unpackmd() James Simmons
                   ` (28 subsequent siblings)
  32 siblings, 0 replies; 47+ messages in thread
From: James Simmons @ 2019-01-06 22:13 UTC (permalink / raw)
  To: lustre-devel

From: Bobi Jam <bobijam@hotmail.com>

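Several of the struct lsm_operations functions for v1 and v3 are
nearly identical, so merge them: lsm_lmm_verify_common() and
lsm_unpackmd_common() become shared *_v1v3() helpers (the v3 pool
name is now copied inside the shared unpack routine), and the
per-format lsm_free method is replaced by a single lsm_free().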
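A hedged sketch of the merged unpack path, with hypothetical sketch_ names and a hypothetical pool name buffer size. Passing NULL for the pool name gives the v1 behavior, and a truncated v3 pool name is reported as -E2BIG. snprintf() is used here in place of the kernel's strlcpy(), since both return the length of the string they tried to create:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

#define SKETCH_POOL_NAME_LEN 16	/* hypothetical buffer size */

struct sketch_lsm {
	char pool_name[SKETCH_POOL_NAME_LEN];
	unsigned int stripe_count;
};

/* Shared routine: v1 passes pool_name == NULL, v3 passes its pool name. */
static int sketch_unpackmd_v1v3(struct sketch_lsm *lsm, const char *pool_name,
				unsigned int stripe_count)
{
	assert(lsm != NULL);
	lsm->stripe_count = stripe_count;

	if (pool_name) {
		/* like strlcpy(), snprintf() returns the untruncated length */
		int len = snprintf(lsm->pool_name, sizeof(lsm->pool_name),
				   "%s", pool_name);

		if (len < 0 || (size_t)len >= sizeof(lsm->pool_name))
			return -E2BIG;
	}
	return 0;
}

/* The per-format entry points become thin wrappers. */
static int sketch_unpackmd_v1(struct sketch_lsm *lsm, unsigned int count)
{
	return sketch_unpackmd_v1v3(lsm, NULL, count);
}

static int sketch_unpackmd_v3(struct sketch_lsm *lsm, const char *pool,
			      unsigned int count)
{
	return sketch_unpackmd_v1v3(lsm, pool, count);
}
```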

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Signed-off-by: Bobi Jam <bobijam@hotmail.com>
Signed-off-by: Niu Yawei <yawei.niu@intel.com>
WC-bug-id: https://jira.whamcloud.com/browse/LU-8998
Reviewed-on: https://review.whamcloud.com/24849
WC-bug-id: https://jira.whamcloud.com/browse/LU-9315
Reviewed-on: https://review.whamcloud.com/26503
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@gmail.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/lov/lov_ea.c       | 58 ++++++++++++------------
 drivers/staging/lustre/lustre/lov/lov_internal.h |  3 +-
 drivers/staging/lustre/lustre/lov/lov_pack.c     | 30 ++++--------
 3 files changed, 38 insertions(+), 53 deletions(-)

diff --git a/drivers/staging/lustre/lustre/lov/lov_ea.c b/drivers/staging/lustre/lustre/lov/lov_ea.c
index 3dfb204..2b3552a 100644
--- a/drivers/staging/lustre/lustre/lov/lov_ea.c
+++ b/drivers/staging/lustre/lustre/lov/lov_ea.c
@@ -71,8 +71,8 @@ static loff_t lov_tgt_maxbytes(struct lov_tgt_desc *tgt)
 	return maxbytes;
 }
 
-static int lsm_lmm_verify_common(struct lov_mds_md *lmm, int lmm_bytes,
-				 __u16 stripe_count)
+static int lsm_lmm_verify_v1v3(struct lov_mds_md *lmm, size_t lmm_size,
+			       u16 stripe_count)
 {
 	if (stripe_count > LOV_V1_INSANE_STRIPE_COUNT) {
 		CERROR("bad stripe count %d\n", stripe_count);
@@ -103,7 +103,7 @@ static int lsm_lmm_verify_common(struct lov_mds_md *lmm, int lmm_bytes,
 	return 0;
 }
 
-void lsm_free_plain(struct lov_stripe_md *lsm)
+void lsm_free(struct lov_stripe_md *lsm)
 {
 	__u16 stripe_count = lsm->lsm_stripe_count;
 	int i;
@@ -145,10 +145,11 @@ struct lov_stripe_md *lsm_alloc_plain(u16 stripe_count)
 	return NULL;
 }
 
-static int lsm_unpackmd_common(struct lov_obd *lov,
-			       struct lov_stripe_md *lsm,
-			       struct lov_mds_md *lmm,
-			       struct lov_ost_data_v1 *objects)
+static int lsm_unpackmd_v1v3(struct lov_obd *lov,
+			     struct lov_stripe_md *lsm,
+			     struct lov_mds_md *lmm,
+			     const char *pool_name,
+			     struct lov_ost_data_v1 *objects)
 {
 	loff_t min_stripe_maxbytes = 0;
 	unsigned int stripe_count;
@@ -168,6 +169,15 @@ static int lsm_unpackmd_common(struct lov_obd *lov,
 
 	stripe_count = lsm_is_released(lsm) ? 0 : lsm->lsm_stripe_count;
 
+	if (pool_name) {
+		size_t pool_name_len;
+
+		pool_name_len = strlcpy(lsm->lsm_pool_name, pool_name,
+					sizeof(lsm->lsm_pool_name));
+		if (pool_name_len >= sizeof(lsm->lsm_pool_name))
+			return -E2BIG;
+	}
+
 	for (i = 0; i < stripe_count; i++) {
 		loi = lsm->lsm_oinfo[i];
 		ostid_le_to_cpu(&objects[i].l_ost_oi, &loi->loi_oi);
@@ -248,17 +258,16 @@ static int lsm_lmm_verify_v1(struct lov_mds_md_v1 *lmm, int lmm_bytes,
 		return -EINVAL;
 	}
 
-	return lsm_lmm_verify_common(lmm, lmm_bytes, *stripe_count);
+	return lsm_lmm_verify_v1v3(lmm, lmm_bytes, *stripe_count);
 }
 
 static int lsm_unpackmd_v1(struct lov_obd *lov, struct lov_stripe_md *lsm,
 			   struct lov_mds_md_v1 *lmm)
 {
-	return lsm_unpackmd_common(lov, lsm, lmm, lmm->lmm_objects);
+	return lsm_unpackmd_v1v3(lov, lsm, lmm, NULL, lmm->lmm_objects);
 }
 
-const struct lsm_operations lsm_v1_ops = {
-	.lsm_free	    = lsm_free_plain,
+static const struct lsm_operations lsm_v1_ops = {
 	.lsm_stripe_by_index    = lsm_stripe_by_index_plain,
 	.lsm_stripe_by_offset   = lsm_stripe_by_offset_plain,
 	.lsm_lmm_verify	 = lsm_lmm_verify_v1,
@@ -289,7 +298,7 @@ static int lsm_lmm_verify_v3(struct lov_mds_md *lmmv1, int lmm_bytes,
 		return -EINVAL;
 	}
 
-	return lsm_lmm_verify_common((struct lov_mds_md_v1 *)lmm, lmm_bytes,
+	return lsm_lmm_verify_v1v3((struct lov_mds_md_v1 *)lmm, lmm_bytes,
 				     *stripe_count);
 }
 
@@ -297,27 +306,16 @@ static int lsm_unpackmd_v3(struct lov_obd *lov, struct lov_stripe_md *lsm,
 			   struct lov_mds_md *lmm)
 {
 	struct lov_mds_md_v3 *lmm_v3 = (struct lov_mds_md_v3 *)lmm;
-	size_t cplen = 0;
-	int rc;
-
-	rc = lsm_unpackmd_common(lov, lsm, lmm, lmm_v3->lmm_objects);
-	if (rc)
-		return rc;
 
-	cplen = strlcpy(lsm->lsm_pool_name, lmm_v3->lmm_pool_name,
-			sizeof(lsm->lsm_pool_name));
-	if (cplen >= sizeof(lsm->lsm_pool_name))
-		return -E2BIG;
-
-	return 0;
+	return lsm_unpackmd_v1v3(lov, lsm, lmm, lmm_v3->lmm_pool_name,
+				 lmm_v3->lmm_objects);
 }
 
-const struct lsm_operations lsm_v3_ops = {
-	.lsm_free	    = lsm_free_plain,
-	.lsm_stripe_by_index    = lsm_stripe_by_index_plain,
-	.lsm_stripe_by_offset   = lsm_stripe_by_offset_plain,
-	.lsm_lmm_verify	 = lsm_lmm_verify_v3,
-	.lsm_unpackmd	   = lsm_unpackmd_v3,
+const static struct lsm_operations lsm_v3_ops = {
+	.lsm_stripe_by_index	= lsm_stripe_by_index_plain,
+	.lsm_stripe_by_offset	= lsm_stripe_by_offset_plain,
+	.lsm_lmm_verify		= lsm_lmm_verify_v3,
+	.lsm_unpackmd		= lsm_unpackmd_v3,
 };
 
 const struct lsm_operations *lsm_op_find(int magic)
diff --git a/drivers/staging/lustre/lustre/lov/lov_internal.h b/drivers/staging/lustre/lustre/lov/lov_internal.h
index 51f416e..2c416b4 100644
--- a/drivers/staging/lustre/lustre/lov/lov_internal.h
+++ b/drivers/staging/lustre/lustre/lov/lov_internal.h
@@ -81,7 +81,6 @@ static inline bool lsm_has_objects(struct lov_stripe_md *lsm)
 }
 
 struct lsm_operations {
-	void (*lsm_free)(struct lov_stripe_md *);
 	void (*lsm_stripe_by_index)(struct lov_stripe_md *, int *, loff_t *,
 				    loff_t *);
 	void (*lsm_stripe_by_offset)(struct lov_stripe_md *, int *, loff_t *,
@@ -93,6 +92,7 @@ struct lsm_operations {
 };
 
 const struct lsm_operations *lsm_op_find(int magic);
+void lsm_free(struct lov_stripe_md *lsm);
 
 /* lov_do_div64(a, b) returns a % b, and a = a / b.
  * The 32-bit code is LOV-specific due to knowing about stripe limits in
@@ -224,7 +224,6 @@ struct lov_stripe_md *lov_unpackmd(struct lov_obd *lov, struct lov_mds_md *lmm,
 
 /* lov_ea.c */
 struct lov_stripe_md *lsm_alloc_plain(u16 stripe_count);
-void lsm_free_plain(struct lov_stripe_md *lsm);
 void dump_lsm(unsigned int level, const struct lov_stripe_md *lsm);
 
 /* lproc_lov.c */
diff --git a/drivers/staging/lustre/lustre/lov/lov_pack.c b/drivers/staging/lustre/lustre/lov/lov_pack.c
index 98b114b..02936bf 100644
--- a/drivers/staging/lustre/lustre/lov/lov_pack.c
+++ b/drivers/staging/lustre/lustre/lov/lov_pack.c
@@ -181,22 +181,6 @@ __u16 lov_get_stripecnt(struct lov_obd *lov, __u32 magic, __u16 stripe_count)
 	return stripe_count;
 }
 
-static int lov_verify_lmm(void *lmm, int lmm_bytes, __u16 *stripe_count)
-{
-	int rc;
-
-	if (!lsm_op_find(le32_to_cpu(*(__u32 *)lmm))) {
-		CERROR("bad disk LOV MAGIC: 0x%08X; dumping LMM (size=%d):\n",
-		       le32_to_cpu(*(__u32 *)lmm), lmm_bytes);
-		CERROR("%*phN\n", lmm_bytes, lmm);
-		return -EINVAL;
-	}
-	rc = lsm_op_find(le32_to_cpu(*(__u32 *)lmm))->lsm_lmm_verify(lmm,
-								     lmm_bytes,
-								  stripe_count);
-	return rc;
-}
-
 static struct lov_stripe_md *lov_lsm_alloc(u16 stripe_count, u32 pattern,
 					   u32 magic)
 {
@@ -237,7 +221,7 @@ int lov_free_memmd(struct lov_stripe_md **lsmp)
 	LASSERT(atomic_read(&lsm->lsm_refc) > 0);
 	refc = atomic_dec_return(&lsm->lsm_refc);
 	if (refc == 0)
-		lsm_op_find(lsm->lsm_magic)->lsm_free(lsm);
+		lsm_free(lsm);
 
 	return refc;
 }
@@ -248,25 +232,29 @@ int lov_free_memmd(struct lov_stripe_md **lsmp)
 struct lov_stripe_md *lov_unpackmd(struct lov_obd *lov, struct lov_mds_md *lmm,
 				   size_t lmm_size)
 {
+	const struct lsm_operations *op;
 	struct lov_stripe_md *lsm;
 	u16 stripe_count;
 	u32 pattern;
 	u32 magic;
 	int rc;
 
-	rc = lov_verify_lmm(lmm, lmm_size, &stripe_count);
+	magic = le32_to_cpu(lmm->lmm_magic);
+	op = lsm_op_find(magic);
+	if (!op)
+		return ERR_PTR(-EINVAL);
+
+	rc = op->lsm_lmm_verify(lmm, lmm_size, &stripe_count);
 	if (rc)
 		return ERR_PTR(rc);
 
-	magic = le32_to_cpu(lmm->lmm_magic);
 	pattern = le32_to_cpu(lmm->lmm_pattern);
 
 	lsm = lov_lsm_alloc(stripe_count, pattern, magic);
 	if (IS_ERR(lsm))
 		return lsm;
 
-	LASSERT(lsm_op_find(magic));
-	rc = lsm_op_find(magic)->lsm_unpackmd(lov, lsm, lmm);
+	rc = op->lsm_unpackmd(lov, lsm, lmm);
 	if (rc) {
 		lov_free_memmd(&lsm);
 		return ERR_PTR(rc);
-- 
1.8.3.1


* [lustre-devel] [PATCH v2 05/33] lustre: lov: fold lmm_verify() handling into lmm_unpackmd()
From: James Simmons @ 2019-01-06 22:14 UTC
  To: lustre-devel

From: Bobi Jam <bobijam@hotmail.com>

The function lov_unpackmd() calls the format-specific version of
lmm_verify() and uses the returned stripe count to allocate the
correct amount of memory for the lsm. We can fold the lmm_verify()
handling into the format-specific unpackmd() function. This also
lets us integrate the lsm allocation into the unpackmd() function,
which greatly simplifies lov_unpackmd().
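
The resulting shape can be sketched in user-space C. This is a minimal
sketch under stated assumptions, not the Lustre code: the demo_* names,
the magic values, and the simplified little-endian byte layout are all
hypothetical stand-ins for lov_mds_md_v1/v3, and returning NULL with a
negative errno in *err stands in for the kernel's ERR_PTR(). The generic
entry point only reads the magic and dispatches; each per-format
unpackmd() verifies the buffer size, allocates exactly what the verified
stripe count needs, and fills in the result.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical wire format (little-endian), loosely modelled on lov_mds_md:
 * [0..3] magic, [4..5] stripe_count, [6..7] pad, [V3 only: 16-byte pool
 * name], then stripe_count 4-byte OST indices. */
#define DEMO_MAGIC_V1	0x0BD10BD0u
#define DEMO_MAGIC_V3	0x0BD30BD0u
#define DEMO_HDR_SIZE	8u
#define DEMO_POOL_SIZE	16u

struct demo_lsm {
	uint32_t magic;
	uint16_t stripe_count;
	char pool_name[DEMO_POOL_SIZE + 1];
	uint32_t ost_idx[];	/* stripe_count entries */
};

struct demo_ops {	/* verify + allocate + fill; NULL and *err on failure */
	struct demo_lsm *(*unpackmd)(const unsigned char *, size_t, int *);
};

static uint32_t get_le32(const unsigned char *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static struct demo_lsm *unpack_common(const unsigned char *buf, size_t size,
				      size_t pool_bytes, int *err)
{
	const unsigned char *objs;
	struct demo_lsm *lsm;
	uint16_t count;
	unsigned int i;

	*err = -EINVAL;
	/* Verify: fixed header first, then header plus all objects. */
	if (size < DEMO_HDR_SIZE + pool_bytes)
		return NULL;
	count = (uint16_t)(buf[4] | buf[5] << 8);
	if (size < DEMO_HDR_SIZE + pool_bytes + (size_t)count * 4)
		return NULL;
	*err = -ENOMEM;
	lsm = calloc(1, sizeof(*lsm) + count * sizeof(lsm->ost_idx[0]));
	if (!lsm)
		return NULL;
	lsm->magic = get_le32(buf);
	lsm->stripe_count = count;
	if (pool_bytes)	/* calloc() already NUL-terminated pool_name */
		memcpy(lsm->pool_name, buf + DEMO_HDR_SIZE, DEMO_POOL_SIZE);
	objs = buf + DEMO_HDR_SIZE + pool_bytes;
	for (i = 0; i < count; i++)
		lsm->ost_idx[i] = get_le32(objs + 4 * i);
	*err = 0;
	return lsm;
}

static struct demo_lsm *unpack_v1(const unsigned char *b, size_t s, int *e)
{
	return unpack_common(b, s, 0, e);
}

static struct demo_lsm *unpack_v3(const unsigned char *b, size_t s, int *e)
{
	return unpack_common(b, s, DEMO_POOL_SIZE, e);
}

static const struct demo_ops v1_ops = { .unpackmd = unpack_v1 };
static const struct demo_ops v3_ops = { .unpackmd = unpack_v3 };

static const struct demo_ops *demo_op_find(uint32_t magic)
{
	if (magic == DEMO_MAGIC_V1)
		return &v1_ops;
	return magic == DEMO_MAGIC_V3 ? &v3_ops : NULL;
}

/* The generic entry point now only reads the magic and dispatches. */
struct demo_lsm *demo_unpackmd(const void *buf, size_t size, int *err)
{
	const struct demo_ops *op;

	assert(buf && err);
	op = size >= 4 ? demo_op_find(get_le32(buf)) : NULL;
	if (!op) {
		*err = -EINVAL;
		return NULL;
	}
	return op->unpackmd(buf, size, err);
}
```

Unlike the patched lov_unpackmd(), the sketch also checks that the
buffer holds at least the 4-byte magic before reading it. It writes
`static const`; the patch's `const static` ordering is equivalent,
although GCC's -Wold-style-declaration (enabled by -Wextra) warns when
the storage class is not first.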

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Signed-off-by: Bobi Jam <bobijam@hotmail.com>
Signed-off-by: Niu Yawei <yawei.niu@intel.com>
WC-bug-id: https://jira.whamcloud.com/browse/LU-8998
Reviewed-on: https://review.whamcloud.com/24849
Reviewed-by: Jinshan Xiong <jinshan.xiong@gmail.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/lov/lov_ea.c       | 113 ++++++++++++++++++-----
 drivers/staging/lustre/lustre/lov/lov_internal.h |  11 +--
 drivers/staging/lustre/lustre/lov/lov_pack.c     |  59 +-----------
 3 files changed, 99 insertions(+), 84 deletions(-)

diff --git a/drivers/staging/lustre/lustre/lov/lov_ea.c b/drivers/staging/lustre/lustre/lov/lov_ea.c
index 2b3552a..5a01fbb 100644
--- a/drivers/staging/lustre/lustre/lov/lov_ea.c
+++ b/drivers/staging/lustre/lustre/lov/lov_ea.c
@@ -145,6 +145,37 @@ struct lov_stripe_md *lsm_alloc_plain(u16 stripe_count)
 	return NULL;
 }
 
+static struct lov_stripe_md *lov_lsm_alloc(u16 stripe_count, u32 pattern,
+					   u32 magic)
+{
+	struct lov_stripe_md *lsm;
+	unsigned int i;
+
+	CDEBUG(D_INFO, "alloc lsm, stripe_count %u\n", stripe_count);
+
+	lsm = lsm_alloc_plain(stripe_count);
+	if (!lsm) {
+		CERROR("cannot allocate LSM stripe_count %u\n", stripe_count);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	atomic_set(&lsm->lsm_refc, 1);
+	spin_lock_init(&lsm->lsm_lock);
+	lsm->lsm_magic = magic;
+	lsm->lsm_stripe_count = stripe_count;
+	lsm->lsm_maxbytes = LUSTRE_EXT3_STRIPE_MAXBYTES * stripe_count;
+	lsm->lsm_pattern = pattern;
+	lsm->lsm_pool_name[0] = '\0';
+	lsm->lsm_layout_gen = 0;
+	if (stripe_count > 0)
+		lsm->lsm_oinfo[0]->loi_ost_idx = ~0;
+
+	for (i = 0; i < stripe_count; i++)
+		loi_init(lsm->lsm_oinfo[i]);
+
+	return lsm;
+}
+
 static int lsm_unpackmd_v1v3(struct lov_obd *lov,
 			     struct lov_stripe_md *lsm,
 			     struct lov_mds_md *lmm,
@@ -238,12 +269,12 @@ static int lsm_unpackmd_v1v3(struct lov_obd *lov,
 		*swidth = (u64)lsm->lsm_stripe_size * lsm->lsm_stripe_count;
 }
 
-static int lsm_lmm_verify_v1(struct lov_mds_md_v1 *lmm, int lmm_bytes,
+static int lsm_lmm_verify_v1(struct lov_mds_md_v1 *lmm, size_t lmm_bytes,
 			     __u16 *stripe_count)
 {
 	if (lmm_bytes < sizeof(*lmm)) {
-		CERROR("lov_mds_md_v1 too small: %d, need at least %d\n",
-		       lmm_bytes, (int)sizeof(*lmm));
+		CERROR("lov_mds_md_v1 too small: %zu, need at least %zu\n",
+		       lmm_bytes, sizeof(*lmm));
 		return -EINVAL;
 	}
 
@@ -252,7 +283,7 @@ static int lsm_lmm_verify_v1(struct lov_mds_md_v1 *lmm, int lmm_bytes,
 		*stripe_count = 0;
 
 	if (lmm_bytes < lov_mds_md_size(*stripe_count, LOV_MAGIC_V1)) {
-		CERROR("LOV EA V1 too small: %d, need %d\n",
+		CERROR("LOV EA V1 too small: %zu, need %d\n",
 		       lmm_bytes, lov_mds_md_size(*stripe_count, LOV_MAGIC_V1));
 		lov_dump_lmm_common(D_WARNING, lmm);
 		return -EINVAL;
@@ -261,29 +292,47 @@ static int lsm_lmm_verify_v1(struct lov_mds_md_v1 *lmm, int lmm_bytes,
 	return lsm_lmm_verify_v1v3(lmm, lmm_bytes, *stripe_count);
 }
 
-static int lsm_unpackmd_v1(struct lov_obd *lov, struct lov_stripe_md *lsm,
-			   struct lov_mds_md_v1 *lmm)
+static struct lov_stripe_md *
+lsm_unpackmd_v1(struct lov_obd *lov, void *buf, size_t buf_size)
 {
-	return lsm_unpackmd_v1v3(lov, lsm, lmm, NULL, lmm->lmm_objects);
+	struct lov_mds_md_v1 *lmm = buf;
+	u32 magic = le32_to_cpu(lmm->lmm_magic);
+	struct lov_stripe_md *lsm;
+	u16 stripe_count;
+	u32 pattern;
+	int rc;
+
+	rc = lsm_lmm_verify_v1(lmm, buf_size, &stripe_count);
+	if (rc)
+		return ERR_PTR(rc);
+
+	pattern = le32_to_cpu(lmm->lmm_pattern);
+
+	lsm = lov_lsm_alloc(stripe_count, pattern, magic);
+	if (IS_ERR(lsm))
+		return lsm;
+
+	rc = lsm_unpackmd_v1v3(lov, lsm, lmm, NULL, lmm->lmm_objects);
+	if (rc) {
+		lov_free_memmd(&lsm);
+		lsm = ERR_PTR(rc);
+	}
+
+	return lsm;
 }
 
 const static struct lsm_operations lsm_v1_ops = {
 	.lsm_stripe_by_index    = lsm_stripe_by_index_plain,
 	.lsm_stripe_by_offset   = lsm_stripe_by_offset_plain,
-	.lsm_lmm_verify	 = lsm_lmm_verify_v1,
 	.lsm_unpackmd	   = lsm_unpackmd_v1,
 };
 
-static int lsm_lmm_verify_v3(struct lov_mds_md *lmmv1, int lmm_bytes,
+static int lsm_lmm_verify_v3(struct lov_mds_md_v3 *lmm, size_t lmm_bytes,
 			     __u16 *stripe_count)
 {
-	struct lov_mds_md_v3 *lmm;
-
-	lmm = (struct lov_mds_md_v3 *)lmmv1;
-
 	if (lmm_bytes < sizeof(*lmm)) {
-		CERROR("lov_mds_md_v3 too small: %d, need at least %d\n",
-		       lmm_bytes, (int)sizeof(*lmm));
+		CERROR("lov_mds_md_v3 too small: %zu, need at least %zu\n",
+		       lmm_bytes, sizeof(*lmm));
 		return -EINVAL;
 	}
 
@@ -292,7 +341,7 @@ static int lsm_lmm_verify_v3(struct lov_mds_md *lmmv1, int lmm_bytes,
 		*stripe_count = 0;
 
 	if (lmm_bytes < lov_mds_md_size(*stripe_count, LOV_MAGIC_V3)) {
-		CERROR("LOV EA V3 too small: %d, need %d\n",
+		CERROR("LOV EA V3 too small: %zu, need %d\n",
 		       lmm_bytes, lov_mds_md_size(*stripe_count, LOV_MAGIC_V3));
 		lov_dump_lmm_common(D_WARNING, lmm);
 		return -EINVAL;
@@ -302,19 +351,39 @@ static int lsm_lmm_verify_v3(struct lov_mds_md *lmmv1, int lmm_bytes,
 				     *stripe_count);
 }
 
-static int lsm_unpackmd_v3(struct lov_obd *lov, struct lov_stripe_md *lsm,
-			   struct lov_mds_md *lmm)
+static struct lov_stripe_md *
+lsm_unpackmd_v3(struct lov_obd *lov, void *buf, size_t buf_size)
 {
-	struct lov_mds_md_v3 *lmm_v3 = (struct lov_mds_md_v3 *)lmm;
+	struct lov_mds_md_v3 *lmm = buf;
+	u32 magic = le32_to_cpu(lmm->lmm_magic);
+	struct lov_stripe_md *lsm;
+	u16 stripe_count;
+	u32 pattern;
+	int rc;
+
+	rc = lsm_lmm_verify_v3(lmm, buf_size, &stripe_count);
+	if (rc)
+		return ERR_PTR(rc);
+
+	pattern = le32_to_cpu(lmm->lmm_pattern);
 
-	return lsm_unpackmd_v1v3(lov, lsm, lmm, lmm_v3->lmm_pool_name,
-				 lmm_v3->lmm_objects);
+	lsm = lov_lsm_alloc(stripe_count, pattern, magic);
+	if (IS_ERR(lsm))
+		return lsm;
+
+	rc = lsm_unpackmd_v1v3(lov, lsm, (struct lov_mds_md_v1 *)lmm,
+			       lmm->lmm_pool_name, lmm->lmm_objects);
+	if (rc) {
+		lov_free_memmd(&lsm);
+		lsm = ERR_PTR(rc);
+	}
+
+	return lsm;
 }
 
 const static struct lsm_operations lsm_v3_ops = {
 	.lsm_stripe_by_index	= lsm_stripe_by_index_plain,
 	.lsm_stripe_by_offset	= lsm_stripe_by_offset_plain,
-	.lsm_lmm_verify		= lsm_lmm_verify_v3,
 	.lsm_unpackmd		= lsm_unpackmd_v3,
 };
 
diff --git a/drivers/staging/lustre/lustre/lov/lov_internal.h b/drivers/staging/lustre/lustre/lov/lov_internal.h
index 2c416b4..ae122f6 100644
--- a/drivers/staging/lustre/lustre/lov/lov_internal.h
+++ b/drivers/staging/lustre/lustre/lov/lov_internal.h
@@ -85,10 +85,8 @@ struct lsm_operations {
 				    loff_t *);
 	void (*lsm_stripe_by_offset)(struct lov_stripe_md *, int *, loff_t *,
 				     loff_t *);
-	int (*lsm_lmm_verify)(struct lov_mds_md *lmm, int lmm_bytes,
-			      u16 *stripe_count);
-	int (*lsm_unpackmd)(struct lov_obd *lov, struct lov_stripe_md *lsm,
-			    struct lov_mds_md *lmm);
+	struct lov_stripe_md *(*lsm_unpackmd)(struct lov_obd *obd, void *buf,
+					      size_t buf_len);
 };
 
 const struct lsm_operations *lsm_op_find(int magic);
@@ -214,8 +212,8 @@ int lov_del_target(struct obd_device *obd, __u32 index,
 /* lov_pack.c */
 ssize_t lov_lsm_pack(const struct lov_stripe_md *lsm, void *buf,
 		     size_t buf_size);
-struct lov_stripe_md *lov_unpackmd(struct lov_obd *lov, struct lov_mds_md *lmm,
-				   size_t lmm_size);
+struct lov_stripe_md *lov_unpackmd(struct lov_obd *lov, void *buf,
+				   size_t buf_size);
 int lov_free_memmd(struct lov_stripe_md **lsmp);
 
 void lov_dump_lmm_v1(int level, struct lov_mds_md_v1 *lmm);
@@ -223,7 +221,6 @@ struct lov_stripe_md *lov_unpackmd(struct lov_obd *lov, struct lov_mds_md *lmm,
 void lov_dump_lmm_common(int level, void *lmmp);
 
 /* lov_ea.c */
-struct lov_stripe_md *lsm_alloc_plain(u16 stripe_count);
 void dump_lsm(unsigned int level, const struct lov_stripe_md *lsm);
 
 /* lproc_lov.c */
diff --git a/drivers/staging/lustre/lustre/lov/lov_pack.c b/drivers/staging/lustre/lustre/lov/lov_pack.c
index 02936bf..90f9f2d 100644
--- a/drivers/staging/lustre/lustre/lov/lov_pack.c
+++ b/drivers/staging/lustre/lustre/lov/lov_pack.c
@@ -181,37 +181,6 @@ __u16 lov_get_stripecnt(struct lov_obd *lov, __u32 magic, __u16 stripe_count)
 	return stripe_count;
 }
 
-static struct lov_stripe_md *lov_lsm_alloc(u16 stripe_count, u32 pattern,
-					   u32 magic)
-{
-	struct lov_stripe_md *lsm;
-	unsigned int i;
-
-	CDEBUG(D_INFO, "alloc lsm, stripe_count %u\n", stripe_count);
-
-	lsm = lsm_alloc_plain(stripe_count);
-	if (!lsm) {
-		CERROR("cannot allocate LSM stripe_count %u\n", stripe_count);
-		return ERR_PTR(-ENOMEM);
-	}
-
-	atomic_set(&lsm->lsm_refc, 1);
-	spin_lock_init(&lsm->lsm_lock);
-	lsm->lsm_magic = magic;
-	lsm->lsm_stripe_count = stripe_count;
-	lsm->lsm_maxbytes = LUSTRE_EXT3_STRIPE_MAXBYTES * stripe_count;
-	lsm->lsm_pattern = pattern;
-	lsm->lsm_pool_name[0] = '\0';
-	lsm->lsm_layout_gen = 0;
-	if (stripe_count > 0)
-		lsm->lsm_oinfo[0]->loi_ost_idx = ~0;
-
-	for (i = 0; i < stripe_count; i++)
-		loi_init(lsm->lsm_oinfo[i]);
-
-	return lsm;
-}
-
 int lov_free_memmd(struct lov_stripe_md **lsmp)
 {
 	struct lov_stripe_md *lsm = *lsmp;
@@ -229,38 +198,18 @@ int lov_free_memmd(struct lov_stripe_md **lsmp)
 /* Unpack LOV object metadata from disk storage.  It is packed in LE byte
  * order and is opaque to the networking layer.
  */
-struct lov_stripe_md *lov_unpackmd(struct lov_obd *lov, struct lov_mds_md *lmm,
-				   size_t lmm_size)
+struct lov_stripe_md *lov_unpackmd(struct lov_obd *lov, void *buf,
+				   size_t buf_size)
 {
 	const struct lsm_operations *op;
-	struct lov_stripe_md *lsm;
-	u16 stripe_count;
-	u32 pattern;
 	u32 magic;
-	int rc;
 
-	magic = le32_to_cpu(lmm->lmm_magic);
+	magic = le32_to_cpu(*(u32 *)buf);
 	op = lsm_op_find(magic);
 	if (!op)
 		return ERR_PTR(-EINVAL);
 
-	rc = op->lsm_lmm_verify(lmm, lmm_size, &stripe_count);
-	if (rc)
-		return ERR_PTR(rc);
-
-	pattern = le32_to_cpu(lmm->lmm_pattern);
-
-	lsm = lov_lsm_alloc(stripe_count, pattern, magic);
-	if (IS_ERR(lsm))
-		return lsm;
-
-	rc = op->lsm_unpackmd(lov, lsm, lmm);
-	if (rc) {
-		lov_free_memmd(&lsm);
-		return ERR_PTR(rc);
-	}
-
-	return lsm;
+	return op->lsm_unpackmd(lov, buf, buf_size);
 }
 
 /* Retrieve object striping information.
-- 
1.8.3.1


* [lustre-devel] [PATCH v2 06/33] lustre: lov: create struct lov_stripe_md_entry
From: James Simmons @ 2019-01-06 22:14 UTC
  To: lustre-devel

From: Bobi Jam <bobijam@hotmail.com>

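Create a new struct lov_stripe_md_entry that holds the per-component
striping information (pattern, stripe size, stripe count, layout
generation, pool name and OST objects) so that it can be shared by
the older plain striping methods and the new PFL handling. Rearrange
the code to use this new data structure.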
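
The ownership model this introduces can be sketched in user-space C.
This is a minimal sketch under simplifying assumptions: the demo_* types
are hypothetical stand-ins for struct lov_stripe_md and struct
lov_stripe_md_entry that keep only a few fields, DEMO_EOF stands in for
LUSTRE_EOF, and plain calloc()/free() replace kvzalloc(), kzalloc() and
the lov_oinfo slab. A layout owns an array of entry pointers, each entry
owns its own array of per-stripe object pointers, and a plain V1/V3
layout is simply one entry whose extent covers [0, EOF).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_EOF UINT64_MAX	/* stand-in for LUSTRE_EOF */

struct demo_oinfo {
	uint32_t ost_idx;
};

/* One component: a file extent plus the OST objects that stripe it. */
struct demo_entry {
	uint64_t ext_start;
	uint64_t ext_end;
	uint32_t stripe_size;
	uint16_t stripe_count;
	struct demo_oinfo *oinfo[];	/* stripe_count pointers */
};

/* Layout-wide fields stay here; striping moves into the entries. */
struct demo_layout {
	uint32_t entry_count;
	int is_released;
	struct demo_entry *entries[];	/* entry_count pointers */
};

static void demo_entry_free(struct demo_entry *e)
{
	unsigned int i;

	for (i = 0; i < e->stripe_count; i++)
		free(e->oinfo[i]);	/* free(NULL) is a no-op */
	free(e);
}

void demo_layout_free(struct demo_layout *l)
{
	unsigned int i;

	for (i = 0; i < l->entry_count; i++)
		demo_entry_free(l->entries[i]);
	free(l);
}

/* A plain (V1/V3-style) layout is a single entry covering [0, EOF). */
struct demo_layout *demo_layout_plain(uint16_t stripe_count,
				      uint32_t stripe_size)
{
	struct demo_layout *l;
	struct demo_entry *e;
	unsigned int i;

	e = calloc(1, sizeof(*e) + stripe_count * sizeof(e->oinfo[0]));
	if (!e)
		return NULL;
	e->ext_start = 0;
	e->ext_end = DEMO_EOF;
	e->stripe_size = stripe_size;
	e->stripe_count = stripe_count;
	for (i = 0; i < stripe_count; i++) {
		e->oinfo[i] = calloc(1, sizeof(*e->oinfo[i]));
		if (!e->oinfo[i]) {
			/* unfilled slots are still NULL from calloc() */
			demo_entry_free(e);
			return NULL;
		}
		e->oinfo[i]->ost_idx = i;	/* hypothetical: OST i */
	}

	l = calloc(1, sizeof(*l) + sizeof(l->entries[0]));
	if (!l) {
		demo_entry_free(e);
		return NULL;
	}
	l->entry_count = 1;
	l->entries[0] = e;
	return l;
}

/* Stripe width derived from the first (and, here, only) entry. */
uint64_t demo_stripe_width(const struct demo_layout *l)
{
	assert(l->entry_count > 0);
	return (uint64_t)l->entries[0]->stripe_size *
	       l->entries[0]->stripe_count;
}
```

Freeing walks the same tree bottom-up, which mirrors what lsme_free()
and lsm_free() do in the patch. The kernel sizes the variable-length
tails with offsetof(typeof(*lsme), lsme_oinfo[stripe_count]); the sketch
uses the equivalent sizeof-plus-multiplication form to avoid relying on
a non-constant offsetof() argument.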

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Signed-off-by: Bobi Jam <bobijam@hotmail.com>
Signed-off-by: Niu Yawei <yawei.niu@intel.com>
WC-bug-id: https://jira.whamcloud.com/browse/LU-8998
Reviewed-on: https://review.whamcloud.com/24849
Reviewed-by: Jinshan Xiong <jinshan.xiong@gmail.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/include/obd.h      |   4 -
 drivers/staging/lustre/lustre/lov/lov_ea.c       | 338 ++++++++++-------------
 drivers/staging/lustre/lustre/lov/lov_internal.h |  35 ++-
 drivers/staging/lustre/lustre/lov/lov_io.c       |  17 +-
 drivers/staging/lustre/lustre/lov/lov_merge.c    |   4 +-
 drivers/staging/lustre/lustre/lov/lov_object.c   |  80 +++---
 drivers/staging/lustre/lustre/lov/lov_offset.c   |   8 +-
 drivers/staging/lustre/lustre/lov/lov_pack.c     |  32 +--
 8 files changed, 245 insertions(+), 273 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/obd.h b/drivers/staging/lustre/lustre/include/obd.h
index d6a968c..15d9573 100644
--- a/drivers/staging/lustre/lustre/include/obd.h
+++ b/drivers/staging/lustre/lustre/include/obd.h
@@ -75,10 +75,6 @@ static inline void loi_kms_set(struct lov_oinfo *oinfo, __u64 kms)
 	oinfo->loi_kms_valid = 1;
 }
 
-static inline void loi_init(struct lov_oinfo *loi)
-{
-}
-
 struct lov_stripe_md;
 struct obd_info;
 
diff --git a/drivers/staging/lustre/lustre/lov/lov_ea.c b/drivers/staging/lustre/lustre/lov/lov_ea.c
index 5a01fbb..f794df9 100644
--- a/drivers/staging/lustre/lustre/lov/lov_ea.c
+++ b/drivers/staging/lustre/lustre/lov/lov_ea.c
@@ -103,114 +103,106 @@ static int lsm_lmm_verify_v1v3(struct lov_mds_md *lmm, size_t lmm_size,
 	return 0;
 }
 
-void lsm_free(struct lov_stripe_md *lsm)
+static void lsme_free(struct lov_stripe_md_entry *lsme)
 {
-	__u16 stripe_count = lsm->lsm_stripe_count;
-	int i;
+	unsigned int stripe_count = lsme->lsme_stripe_count;
+	unsigned int i;
 
 	for (i = 0; i < stripe_count; i++)
-		kmem_cache_free(lov_oinfo_slab, lsm->lsm_oinfo[i]);
-	kvfree(lsm);
-}
-
-struct lov_stripe_md *lsm_alloc_plain(u16 stripe_count)
-{
-	size_t oinfo_ptrs_size, lsm_size;
-	struct lov_stripe_md *lsm;
-	struct lov_oinfo     *loi;
-	int i;
-
-	LASSERT(stripe_count <= LOV_MAX_STRIPE_COUNT);
-
-	oinfo_ptrs_size = sizeof(struct lov_oinfo *) * stripe_count;
-	lsm_size = sizeof(*lsm) + oinfo_ptrs_size;
+		kmem_cache_free(lov_oinfo_slab, lsme->lsme_oinfo[i]);
 
-	lsm = kvzalloc(lsm_size, GFP_NOFS);
-	if (!lsm)
-		return NULL;
-
-	for (i = 0; i < stripe_count; i++) {
-		loi = kmem_cache_zalloc(lov_oinfo_slab, GFP_NOFS);
-		if (!loi)
-			goto err;
-		lsm->lsm_oinfo[i] = loi;
-	}
-	lsm->lsm_stripe_count = stripe_count;
-	return lsm;
-
-err:
-	while (--i >= 0)
-		kmem_cache_free(lov_oinfo_slab, lsm->lsm_oinfo[i]);
-	kvfree(lsm);
-	return NULL;
+	kvfree(lsme);
 }
 
-static struct lov_stripe_md *lov_lsm_alloc(u16 stripe_count, u32 pattern,
-					   u32 magic)
+void lsm_free(struct lov_stripe_md *lsm)
 {
-	struct lov_stripe_md *lsm;
+	unsigned int entry_count = lsm->lsm_entry_count;
 	unsigned int i;
 
-	CDEBUG(D_INFO, "alloc lsm, stripe_count %u\n", stripe_count);
-
-	lsm = lsm_alloc_plain(stripe_count);
-	if (!lsm) {
-		CERROR("cannot allocate LSM stripe_count %u\n", stripe_count);
-		return ERR_PTR(-ENOMEM);
-	}
-
-	atomic_set(&lsm->lsm_refc, 1);
-	spin_lock_init(&lsm->lsm_lock);
-	lsm->lsm_magic = magic;
-	lsm->lsm_stripe_count = stripe_count;
-	lsm->lsm_maxbytes = LUSTRE_EXT3_STRIPE_MAXBYTES * stripe_count;
-	lsm->lsm_pattern = pattern;
-	lsm->lsm_pool_name[0] = '\0';
-	lsm->lsm_layout_gen = 0;
-	if (stripe_count > 0)
-		lsm->lsm_oinfo[0]->loi_ost_idx = ~0;
+	for (i = 0; i < entry_count; i++)
+		lsme_free(lsm->lsm_entries[i]);
 
-	for (i = 0; i < stripe_count; i++)
-		loi_init(lsm->lsm_oinfo[i]);
-
-	return lsm;
+	kfree(lsm);
 }
 
-static int lsm_unpackmd_v1v3(struct lov_obd *lov,
-			     struct lov_stripe_md *lsm,
-			     struct lov_mds_md *lmm,
-			     const char *pool_name,
-			     struct lov_ost_data_v1 *objects)
+/**
+ * Unpack a struct lov_mds_md into a struct lov_stripe_md_entry.
+ *
+ * The caller should set id and extent.
+ */
+static struct lov_stripe_md_entry *
+lsme_unpack(struct lov_obd *lov, struct lov_mds_md *lmm, size_t buf_size,
+	    const char *pool_name, struct lov_ost_data_v1 *objects,
+	    loff_t *maxbytes)
 {
+	struct lov_stripe_md_entry *lsme;
 	loff_t min_stripe_maxbytes = 0;
 	unsigned int stripe_count;
-	struct lov_oinfo *loi;
 	loff_t lov_bytes;
+	size_t lsme_size;
 	unsigned int i;
+	u32 pattern;
+	u32 magic;
+	int rc;
 
-	/*
-	 * This supposes lov_mds_md_v1/v3 first fields are
-	 * are the same
-	 */
-	lmm_oi_le_to_cpu(&lsm->lsm_oi, &lmm->lmm_oi);
-	lsm->lsm_stripe_size = le32_to_cpu(lmm->lmm_stripe_size);
-	lsm->lsm_pattern = le32_to_cpu(lmm->lmm_pattern);
-	lsm->lsm_layout_gen = le16_to_cpu(lmm->lmm_layout_gen);
-	lsm->lsm_pool_name[0] = '\0';
+	magic = le32_to_cpu(lmm->lmm_magic);
+	if (magic != LOV_MAGIC_V1 && magic != LOV_MAGIC_V3)
+		return ERR_PTR(-EINVAL);
+
+	pattern = le32_to_cpu(lmm->lmm_pattern);
+	if (pattern & LOV_PATTERN_F_RELEASED)
+		stripe_count = 0;
+	else
+		stripe_count = le16_to_cpu(lmm->lmm_stripe_count);
+
+	if (buf_size < (magic == LOV_MAGIC_V1 ? sizeof(struct lov_mds_md_v1) :
+						sizeof(struct lov_mds_md_v3))) {
+		CERROR("LOV EA %s too small: %zu, need %u\n",
+		       magic == LOV_MAGIC_V1 ? "V1" : "V3", buf_size,
+		       lov_mds_md_size(stripe_count, magic == LOV_MAGIC_V1 ?
+				       LOV_MAGIC_V1 : LOV_MAGIC_V3));
+		lov_dump_lmm_common(D_WARNING, lmm);
+		return ERR_PTR(-EINVAL);
+	}
 
-	stripe_count = lsm_is_released(lsm) ? 0 : lsm->lsm_stripe_count;
+	rc = lsm_lmm_verify_v1v3(lmm, buf_size, stripe_count);
+	if (rc < 0)
+		return ERR_PTR(rc);
+
+	lsme_size = offsetof(typeof(*lsme), lsme_oinfo[stripe_count]);
+	lsme = kvzalloc(lsme_size, GFP_KERNEL);
+	if (!lsme)
+		return ERR_PTR(-ENOMEM);
+
+	lsme->lsme_magic = magic;
+	lsme->lsme_pattern = pattern;
+	lsme->lsme_stripe_size = le32_to_cpu(lmm->lmm_stripe_size);
+	lsme->lsme_stripe_count = stripe_count;
+	lsme->lsme_layout_gen = le16_to_cpu(lmm->lmm_layout_gen);
 
 	if (pool_name) {
 		size_t pool_name_len;
 
-		pool_name_len = strlcpy(lsm->lsm_pool_name, pool_name,
-					sizeof(lsm->lsm_pool_name));
-		if (pool_name_len >= sizeof(lsm->lsm_pool_name))
-			return -E2BIG;
+		pool_name_len = strlcpy(lsme->lsme_pool_name, pool_name,
+					sizeof(lsme->lsme_pool_name));
+		if (pool_name_len >= sizeof(lsme->lsme_pool_name)) {
+			rc = -E2BIG;
+			goto out_lsme;
+		}
 	}
 
 	for (i = 0; i < stripe_count; i++) {
-		loi = lsm->lsm_oinfo[i];
+		struct lov_tgt_desc *ltd;
+		struct lov_oinfo *loi;
+
+		loi = kmem_cache_zalloc(lov_oinfo_slab, GFP_KERNEL);
+		if (!loi) {
+			rc = -ENOMEM;
+			goto out_lsme;
+		}
+
+		lsme->lsme_oinfo[i] = loi;
+
 		ostid_le_to_cpu(&objects[i].l_ost_oi, &loi->loi_oi);
 		loi->loi_ost_idx = le32_to_cpu(objects[i].l_ost_idx);
 		loi->loi_ost_gen = le32_to_cpu(objects[i].l_ost_gen);
@@ -223,10 +215,12 @@ static int lsm_unpackmd_v1v3(struct lov_obd *lov,
 			       (char *)lov->desc.ld_uuid.uuid,
 			       loi->loi_ost_idx, lov->desc.ld_tgt_count);
 			lov_dump_lmm_v1(D_WARNING, lmm);
-			return -EINVAL;
+			rc = -EINVAL;
+			goto out_lsme;
 		}
 
-		if (!lov->lov_tgts[loi->loi_ost_idx]) {
+		ltd = lov->lov_tgts[loi->loi_ost_idx];
+		if (!ltd) {
 			CERROR("%s: OST index %d missing\n",
 			       (char *)lov->desc.ld_uuid.uuid,
 			       loi->loi_ost_idx);
@@ -234,7 +228,7 @@ static int lsm_unpackmd_v1v3(struct lov_obd *lov,
 			continue;
 		}
 
-		lov_bytes = lov_tgt_maxbytes(lov->lov_tgts[loi->loi_ost_idx]);
+		lov_bytes = lov_tgt_maxbytes(ltd);
 		if (min_stripe_maxbytes == 0 || lov_bytes < min_stripe_maxbytes)
 			min_stripe_maxbytes = lov_bytes;
 	}
@@ -242,15 +236,68 @@ static int lsm_unpackmd_v1v3(struct lov_obd *lov,
 	if (min_stripe_maxbytes == 0)
 		min_stripe_maxbytes = LUSTRE_EXT3_STRIPE_MAXBYTES;
 
-	stripe_count = lsm->lsm_stripe_count ?: lov->desc.ld_tgt_count;
 	lov_bytes = min_stripe_maxbytes * stripe_count;
 
-	if (lov_bytes < min_stripe_maxbytes) /* handle overflow */
-		lsm->lsm_maxbytes = MAX_LFS_FILESIZE;
-	else
-		lsm->lsm_maxbytes = lov_bytes;
+	if (maxbytes) {
+		if (lov_bytes < min_stripe_maxbytes) /* handle overflow */
+			*maxbytes = MAX_LFS_FILESIZE;
+		else
+			*maxbytes = lov_bytes;
+	}
 
-	return 0;
+	return lsme;
+
+out_lsme:
+	for (i = 0; i < stripe_count; i++) {
+		struct lov_oinfo *loi = lsme->lsme_oinfo[i];
+
+		if (loi)
+			kmem_cache_free(lov_oinfo_slab, lsme->lsme_oinfo[i]);
+	}
+	kvfree(lsme);
+
+	return ERR_PTR(rc);
+}
+
+static inline struct lov_stripe_md *
+lsm_unpackmd_v1v3(struct lov_obd *lov,
+		  struct lov_mds_md *lmm, size_t buf_size,
+		  const char *pool_name,
+		  struct lov_ost_data_v1 *objects)
+{
+	struct lov_stripe_md_entry *lsme;
+	struct lov_stripe_md *lsm;
+	size_t lsm_size;
+	loff_t maxbytes;
+	u32 pattern;
+
+	pattern = le32_to_cpu(lmm->lmm_pattern);
+
+	lsme = lsme_unpack(lov, lmm, buf_size, pool_name, objects, &maxbytes);
+	if (IS_ERR(lsme))
+		return ERR_CAST(lsme);
+
+	lsme->lsme_extent.e_start = 0;
+	lsme->lsme_extent.e_end = LUSTRE_EOF;
+
+	lsm_size = offsetof(typeof(*lsm), lsm_entries[1]);
+	lsm = kzalloc(lsm_size, GFP_KERNEL);
+	if (!lsm) {
+		lsme_free(lsme);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	atomic_set(&lsm->lsm_refc, 1);
+	spin_lock_init(&lsm->lsm_lock);
+	lsm->lsm_maxbytes = maxbytes;
+	lmm_oi_le_to_cpu(&lsm->lsm_oi, &lmm->lmm_oi);
+	lsm->lsm_magic = le32_to_cpu(lmm->lmm_magic);
+	lsm->lsm_layout_gen = le16_to_cpu(lmm->lmm_layout_gen);
+	lsm->lsm_entry_count = 1;
+	lsm->lsm_is_released = pattern & LOV_PATTERN_F_RELEASED;
+	lsm->lsm_entries[0] = lsme;
+
+	return lsm;
 }
 
 static void
@@ -258,7 +305,8 @@ static int lsm_unpackmd_v1v3(struct lov_obd *lov,
 			  loff_t *lov_off, loff_t *swidth)
 {
 	if (swidth)
-		*swidth = (u64)lsm->lsm_stripe_size * lsm->lsm_stripe_count;
+		*swidth = (loff_t)lsm->lsm_entries[0]->lsme_stripe_size *
+			  lsm->lsm_entries[0]->lsme_stripe_count;
 }
 
 static void
@@ -266,59 +314,16 @@ static int lsm_unpackmd_v1v3(struct lov_obd *lov,
 			   loff_t *lov_off, loff_t *swidth)
 {
 	if (swidth)
-		*swidth = (u64)lsm->lsm_stripe_size * lsm->lsm_stripe_count;
-}
-
-static int lsm_lmm_verify_v1(struct lov_mds_md_v1 *lmm, size_t lmm_bytes,
-			     __u16 *stripe_count)
-{
-	if (lmm_bytes < sizeof(*lmm)) {
-		CERROR("lov_mds_md_v1 too small: %zu, need at least %zu\n",
-		       lmm_bytes, sizeof(*lmm));
-		return -EINVAL;
-	}
-
-	*stripe_count = le16_to_cpu(lmm->lmm_stripe_count);
-	if (le32_to_cpu(lmm->lmm_pattern) & LOV_PATTERN_F_RELEASED)
-		*stripe_count = 0;
-
-	if (lmm_bytes < lov_mds_md_size(*stripe_count, LOV_MAGIC_V1)) {
-		CERROR("LOV EA V1 too small: %zu, need %d\n",
-		       lmm_bytes, lov_mds_md_size(*stripe_count, LOV_MAGIC_V1));
-		lov_dump_lmm_common(D_WARNING, lmm);
-		return -EINVAL;
-	}
-
-	return lsm_lmm_verify_v1v3(lmm, lmm_bytes, *stripe_count);
+		*swidth = (loff_t)lsm->lsm_entries[0]->lsme_stripe_size *
+			  lsm->lsm_entries[0]->lsme_stripe_count;
 }
 
 static struct lov_stripe_md *
 lsm_unpackmd_v1(struct lov_obd *lov, void *buf, size_t buf_size)
 {
 	struct lov_mds_md_v1 *lmm = buf;
-	u32 magic = le32_to_cpu(lmm->lmm_magic);
-	struct lov_stripe_md *lsm;
-	u16 stripe_count;
-	u32 pattern;
-	int rc;
 
-	rc = lsm_lmm_verify_v1(lmm, buf_size, &stripe_count);
-	if (rc)
-		return ERR_PTR(rc);
-
-	pattern = le32_to_cpu(lmm->lmm_pattern);
-
-	lsm = lov_lsm_alloc(stripe_count, pattern, magic);
-	if (IS_ERR(lsm))
-		return lsm;
-
-	rc = lsm_unpackmd_v1v3(lov, lsm, lmm, NULL, lmm->lmm_objects);
-	if (rc) {
-		lov_free_memmd(&lsm);
-		lsm = ERR_PTR(rc);
-	}
-
-	return lsm;
+	return lsm_unpackmd_v1v3(lov, buf, buf_size, NULL, lmm->lmm_objects);
 }
 
 const static struct lsm_operations lsm_v1_ops = {
@@ -327,58 +332,13 @@ static int lsm_lmm_verify_v1(struct lov_mds_md_v1 *lmm, size_t lmm_bytes,
 	.lsm_unpackmd	   = lsm_unpackmd_v1,
 };
 
-static int lsm_lmm_verify_v3(struct lov_mds_md_v3 *lmm, size_t lmm_bytes,
-			     __u16 *stripe_count)
-{
-	if (lmm_bytes < sizeof(*lmm)) {
-		CERROR("lov_mds_md_v3 too small: %zu, need at least %zu\n",
-		       lmm_bytes, sizeof(*lmm));
-		return -EINVAL;
-	}
-
-	*stripe_count = le16_to_cpu(lmm->lmm_stripe_count);
-	if (le32_to_cpu(lmm->lmm_pattern) & LOV_PATTERN_F_RELEASED)
-		*stripe_count = 0;
-
-	if (lmm_bytes < lov_mds_md_size(*stripe_count, LOV_MAGIC_V3)) {
-		CERROR("LOV EA V3 too small: %zu, need %d\n",
-		       lmm_bytes, lov_mds_md_size(*stripe_count, LOV_MAGIC_V3));
-		lov_dump_lmm_common(D_WARNING, lmm);
-		return -EINVAL;
-	}
-
-	return lsm_lmm_verify_v1v3((struct lov_mds_md_v1 *)lmm, lmm_bytes,
-				     *stripe_count);
-}
-
 static struct lov_stripe_md *
 lsm_unpackmd_v3(struct lov_obd *lov, void *buf, size_t buf_size)
 {
 	struct lov_mds_md_v3 *lmm = buf;
-	u32 magic = le32_to_cpu(lmm->lmm_magic);
-	struct lov_stripe_md *lsm;
-	u16 stripe_count;
-	u32 pattern;
-	int rc;
-
-	rc = lsm_lmm_verify_v3(lmm, buf_size, &stripe_count);
-	if (rc)
-		return ERR_PTR(rc);
-
-	pattern = le32_to_cpu(lmm->lmm_pattern);
-
-	lsm = lov_lsm_alloc(stripe_count, pattern, magic);
-	if (IS_ERR(lsm))
-		return lsm;
 
-	rc = lsm_unpackmd_v1v3(lov, lsm, (struct lov_mds_md_v1 *)lmm,
-			       lmm->lmm_pool_name, lmm->lmm_objects);
-	if (rc) {
-		lov_free_memmd(&lsm);
-		lsm = ERR_PTR(rc);
-	}
-
-	return lsm;
+	return lsm_unpackmd_v1v3(lov, buf, buf_size, lmm->lmm_pool_name,
+				 lmm->lmm_objects);
 }
 
 const static struct lsm_operations lsm_v3_ops = {
@@ -403,9 +363,9 @@ const struct lsm_operations *lsm_op_find(int magic)
 void dump_lsm(unsigned int level, const struct lov_stripe_md *lsm)
 {
 	CDEBUG(level, "lsm %p, objid " DOSTID ", maxbytes %#llx, magic 0x%08X, stripe_size %u, stripe_count %u, refc: %d, layout_gen %u, pool [" LOV_POOLNAMEF "]\n",
-	       lsm,
-	       POSTID(&lsm->lsm_oi), lsm->lsm_maxbytes, lsm->lsm_magic,
-	       lsm->lsm_stripe_size, lsm->lsm_stripe_count,
+	       lsm, POSTID(&lsm->lsm_oi), lsm->lsm_maxbytes, lsm->lsm_magic,
+	       lsm->lsm_entries[0]->lsme_stripe_size,
+	       lsm->lsm_entries[0]->lsme_stripe_count,
 	       atomic_read(&lsm->lsm_refc), lsm->lsm_layout_gen,
-	       lsm->lsm_pool_name);
+	       lsm->lsm_entries[0]->lsme_pool_name);
 }
diff --git a/drivers/staging/lustre/lustre/lov/lov_internal.h b/drivers/staging/lustre/lustre/lov/lov_internal.h
index ae122f6..f2747c9 100644
--- a/drivers/staging/lustre/lustre/lov/lov_internal.h
+++ b/drivers/staging/lustre/lustre/lov/lov_internal.h
@@ -44,6 +44,18 @@
  */
 #define LUSTRE_EXT3_STRIPE_MAXBYTES 0x1fffffff000ULL
 
+struct lov_stripe_md_entry {
+	struct lu_extent	lsme_extent;
+	u32			lsme_id;
+	u32			lsme_magic;
+	u32			lsme_pattern;
+	u32			lsme_stripe_size;
+	u16			lsme_stripe_count;
+	u16			lsme_layout_gen;
+	char			lsme_pool_name[LOV_MAXPOOLNAME + 1];
+	struct lov_oinfo       *lsme_oinfo[];
+};
+
 struct lov_stripe_md {
 	atomic_t	lsm_refc;
 	spinlock_t	lsm_lock;
@@ -56,28 +68,15 @@ struct lov_stripe_md {
 	loff_t		lsm_maxbytes;
 	struct ost_id	lsm_oi;
 	u32		lsm_magic;
-	u32		lsm_stripe_size;
-	u32		lsm_pattern; /* RAID0, RAID1, released, ... */
-	u16		lsm_stripe_count;
-	u16		lsm_layout_gen;
-	char		lsm_pool_name[LOV_MAXPOOLNAME + 1];
-	struct lov_oinfo	*lsm_oinfo[0];
+	u32		lsm_layout_gen;
+	u32		lsm_entry_count;
+	bool		lsm_is_released;
+	struct lov_stripe_md_entry *lsm_entries[];
 };
 
-static inline bool lsm_is_released(struct lov_stripe_md *lsm)
-{
-	return !!(lsm->lsm_pattern & LOV_PATTERN_F_RELEASED);
-}
-
 static inline bool lsm_has_objects(struct lov_stripe_md *lsm)
 {
-	if (!lsm)
-		return false;
-
-	if (lsm_is_released(lsm))
-		return false;
-
-	return true;
+	return lsm && !lsm->lsm_is_released;
 }
 
 struct lsm_operations {
diff --git a/drivers/staging/lustre/lustre/lov/lov_io.c b/drivers/staging/lustre/lustre/lov/lov_io.c
index 6537ba3..2d62566 100644
--- a/drivers/staging/lustre/lustre/lov/lov_io.c
+++ b/drivers/staging/lustre/lustre/lov/lov_io.c
@@ -251,10 +251,9 @@ static int lov_io_subio_init(const struct lu_env *env, struct lov_io *lio,
 	 * Need to be optimized, we can't afford to allocate a piece of memory
 	 * when writing a page. -jay
 	 */
-	lio->lis_subs =
-		kvzalloc(lsm->lsm_stripe_count *
+	lio->lis_subs = kcalloc(lsm->lsm_entries[0]->lsme_stripe_count,
 				sizeof(lio->lis_subs[0]),
-				GFP_NOFS);
+				GFP_KERNEL);
 	if (lio->lis_subs) {
 		lio->lis_nr_subios = lio->lis_stripe_count;
 		lio->lis_single_subio_index = -1;
@@ -272,7 +271,7 @@ static int lov_io_slice_init(struct lov_io *lio, struct lov_object *obj,
 	io->ci_result = 0;
 	lio->lis_object = obj;
 
-	lio->lis_stripe_count = obj->lo_lsm->lsm_stripe_count;
+	lio->lis_stripe_count = obj->lo_lsm->lsm_entries[0]->lsme_stripe_count;
 
 	switch (io->ci_type) {
 	case CIT_READ:
@@ -287,7 +286,7 @@ static int lov_io_slice_init(struct lov_io *lio, struct lov_object *obj,
 			 * If there is LOV EA hole, then we may cannot locate
 			 * the current file-tail exactly.
 			 */
-			if (unlikely(obj->lo_lsm->lsm_pattern &
+			if (unlikely(obj->lo_lsm->lsm_entries[0]->lsme_pattern &
 				     LOV_PATTERN_F_HOLE))
 				return -EIO;
 
@@ -419,9 +418,9 @@ static int lov_io_rw_iter_init(const struct lu_env *env,
 	struct lov_io	*lio = cl2lov_io(env, ios);
 	struct cl_io	 *io  = ios->cis_io;
 	struct lov_stripe_md *lsm = lio->lis_object->lo_lsm;
-	__u64 start = io->u.ci_rw.crw_pos;
+	unsigned long ssize = lsm->lsm_entries[0]->lsme_stripe_size;
+	u64 start = io->u.ci_rw.crw_pos;
 	loff_t next;
-	unsigned long ssize = lsm->lsm_stripe_size;
 
 	LASSERT(io->ci_type == CIT_READ || io->ci_type == CIT_WRITE);
 
@@ -596,11 +595,11 @@ static int lov_io_read_ahead(const struct lu_env *env,
 	if (ra_end != CL_PAGE_EOF)
 		ra_end = lov_stripe_pgoff(loo->lo_lsm, ra_end, stripe);
 
-	pps = loo->lo_lsm->lsm_stripe_size >> PAGE_SHIFT;
+	pps = loo->lo_lsm->lsm_entries[0]->lsme_stripe_size >> PAGE_SHIFT;
 
 	CDEBUG(D_READA, DFID " max_index = %lu, pps = %u, stripe_size = %u, stripe no = %u, start index = %lu\n",
 	       PFID(lu_object_fid(lov2lu(loo))), ra_end, pps,
-	       loo->lo_lsm->lsm_stripe_size, stripe, start);
+	       loo->lo_lsm->lsm_entries[0]->lsme_stripe_size, stripe, start);
 
 	/* never exceed the end of the stripe */
 	ra->cra_end = min_t(pgoff_t, ra_end, start + pps - start % pps - 1);
diff --git a/drivers/staging/lustre/lustre/lov/lov_merge.c b/drivers/staging/lustre/lustre/lov/lov_merge.c
index 006717c..10b8448 100644
--- a/drivers/staging/lustre/lustre/lov/lov_merge.c
+++ b/drivers/staging/lustre/lustre/lov/lov_merge.c
@@ -59,8 +59,8 @@ int lov_merge_lvb_kms(struct lov_stripe_md *lsm,
 	CDEBUG(D_INODE, "MDT ID " DOSTID " initial value: s=%llu m=%llu a=%llu c=%llu b=%llu\n",
 	       POSTID(&lsm->lsm_oi), lvb->lvb_size, lvb->lvb_mtime,
 	       lvb->lvb_atime, lvb->lvb_ctime, lvb->lvb_blocks);
-	for (i = 0; i < lsm->lsm_stripe_count; i++) {
-		struct lov_oinfo *loi = lsm->lsm_oinfo[i];
+	for (i = 0; i < lsm->lsm_entries[0]->lsme_stripe_count; i++) {
+		struct lov_oinfo *loi = lsm->lsm_entries[0]->lsme_oinfo[i];
 		u64 lov_size, tmpsize;
 
 		if (OST_LVB_IS_ERR(loi->loi_lvb.lvb_blocks)) {
diff --git a/drivers/staging/lustre/lustre/lov/lov_object.c b/drivers/staging/lustre/lustre/lov/lov_object.c
index adc90f3..ad2901a 100644
--- a/drivers/staging/lustre/lustre/lov/lov_object.c
+++ b/drivers/staging/lustre/lustre/lov/lov_object.c
@@ -153,7 +153,7 @@ static int lov_init_sub(const struct lu_env *env, struct lov_object *lov,
 	hdr    = cl_object_header(lov2cl(lov));
 	subhdr = cl_object_header(stripe);
 
-	oinfo = lov->lo_lsm->lsm_oinfo[idx];
+	oinfo = lov->lo_lsm->lsm_entries[0]->lsme_oinfo[idx];
 	CDEBUG(D_INODE, DFID "@%p[%d] -> " DFID "@%p: ostid: " DOSTID " idx: %d gen: %d\n",
 	       PFID(&subhdr->coh_lu.loh_fid), subhdr, idx,
 	       PFID(&hdr->coh_lu.loh_fid), hdr, POSTID(&oinfo->loi_oi),
@@ -239,7 +239,7 @@ static int lov_init_raid0(const struct lu_env *env, struct lov_device *dev,
 	LASSERT(!lov->lo_lsm);
 	lov->lo_lsm = lsm_addref(lsm);
 	lov->lo_layout_invalid = true;
-	r0->lo_nr  = lsm->lsm_stripe_count;
+	r0->lo_nr  = lsm->lsm_entries[0]->lsme_stripe_count;
 	LASSERT(r0->lo_nr <= lov_targets_nr(dev));
 
 	r0->lo_sub = kvzalloc(r0->lo_nr * sizeof(r0->lo_sub[0]),
@@ -255,9 +255,10 @@ static int lov_init_raid0(const struct lu_env *env, struct lov_device *dev,
 		 */
 		for (i = 0; i < r0->lo_nr && result == 0; ++i) {
 			struct cl_device *subdev;
-			struct lov_oinfo *oinfo = lsm->lsm_oinfo[i];
-			int ost_idx = oinfo->loi_ost_idx;
+			struct lov_oinfo *oinfo;
+			int ost_idx;
 
+			oinfo = lsm->lsm_entries[0]->lsme_oinfo[i];
 			if (lov_oinfo_is_dummy(oinfo))
 				continue;
 
@@ -266,6 +267,7 @@ static int lov_init_raid0(const struct lu_env *env, struct lov_device *dev,
 			if (result != 0)
 				goto out;
 
+			ost_idx = oinfo->loi_ost_idx;
 			if (!dev->ld_target[ost_idx]) {
 				CERROR("%s: OST %04x is not initialized\n",
 				lov2obd(dev->ld_lov)->obd_name, ost_idx);
@@ -314,7 +316,7 @@ static int lov_init_released(const struct lu_env *env, struct lov_device *dev,
 			     union  lov_layout_state *state)
 {
 	LASSERT(lsm);
-	LASSERT(lsm_is_released(lsm));
+	LASSERT(lsm->lsm_is_released);
 	LASSERT(!lov->lo_lsm);
 
 	lov->lo_lsm = lsm_addref(lsm);
@@ -327,7 +329,7 @@ static struct cl_object *lov_find_subobj(const struct lu_env *env,
 					 int stripe_idx)
 {
 	struct lov_device *dev = lu2lov_dev(lov2lu(lov)->lo_dev);
-	struct lov_oinfo *oinfo = lsm->lsm_oinfo[stripe_idx];
+	struct lov_oinfo *oinfo = lsm->lsm_entries[0]->lsme_oinfo[stripe_idx];
 	struct lov_thread_info *lti = lov_env_info(env);
 	struct lu_fid *ofid = &lti->lti_fid;
 	struct cl_device *subdev;
@@ -485,7 +487,7 @@ static int lov_print_raid0(const struct lu_env *env, void *cookie,
 	(*p)(env, cookie, "stripes: %d, %s, lsm{%p 0x%08X %d %u %u}:\n",
 	     r0->lo_nr, lov->lo_layout_invalid ? "invalid" : "valid", lsm,
 	     lsm->lsm_magic, atomic_read(&lsm->lsm_refc),
-	     lsm->lsm_stripe_count, lsm->lsm_layout_gen);
+	     lsm->lsm_entries[0]->lsme_stripe_count, lsm->lsm_layout_gen);
 	for (i = 0; i < r0->lo_nr; ++i) {
 		struct lu_object *sub;
 
@@ -509,7 +511,7 @@ static int lov_print_released(const struct lu_env *env, void *cookie,
 	     "released: %s, lsm{%p 0x%08X %d %u %u}:\n",
 	     lov->lo_layout_invalid ? "invalid" : "valid", lsm,
 	     lsm->lsm_magic, atomic_read(&lsm->lsm_refc),
-	     lsm->lsm_stripe_count, lsm->lsm_layout_gen);
+	     lsm->lsm_entries[0]->lsme_stripe_count, lsm->lsm_layout_gen);
 	return 0;
 }
 
@@ -650,8 +652,13 @@ static enum lov_layout_type lov_type(struct lov_stripe_md *lsm)
 {
 	if (!lsm)
 		return LLT_EMPTY;
-	if (lsm_is_released(lsm))
+
+	if (lsm->lsm_magic == LOV_MAGIC_COMP_V1)
+		return LLT_EMPTY;
+
+	if (lsm->lsm_is_released)
 		return LLT_RELEASED;
+
 	return LLT_RAID0;
 }
 
@@ -882,7 +889,8 @@ static int lov_conf_set(const struct lu_env *env, struct cl_object *obj,
 	if ((!lsm && !lov->lo_lsm) ||
 	    ((lsm && lov->lo_lsm) &&
 	     (lov->lo_lsm->lsm_layout_gen == lsm->lsm_layout_gen) &&
-	     (lov->lo_lsm->lsm_pattern == lsm->lsm_pattern))) {
+	     (lov->lo_lsm->lsm_entries[0]->lsme_pattern ==
+	      lsm->lsm_entries[0]->lsme_pattern))) {
 		/* same version of layout */
 		lov->lo_layout_invalid = false;
 		result = 0;
@@ -1010,19 +1018,24 @@ static int fiemap_calc_last_stripe(struct lov_stripe_md *lsm,
 	u64 obd_end;
 	int i, j;
 
-	if (fm_end - fm_start > lsm->lsm_stripe_size * lsm->lsm_stripe_count) {
-		last_stripe = (start_stripe < 1 ? lsm->lsm_stripe_count - 1 :
+	if (fm_end - fm_start > lsm->lsm_entries[0]->lsme_stripe_size *
+				lsm->lsm_entries[0]->lsme_stripe_count) {
+		last_stripe = (start_stripe < 1 ?
+			       lsm->lsm_entries[0]->lsme_stripe_count - 1 :
 			       start_stripe - 1);
-		*stripe_count = lsm->lsm_stripe_count;
+		*stripe_count = lsm->lsm_entries[0]->lsme_stripe_count;
 	} else {
-		for (j = 0, i = start_stripe; j < lsm->lsm_stripe_count;
-		     i = (i + 1) % lsm->lsm_stripe_count, j++) {
-			if (!(lov_stripe_intersects(lsm, i, fm_start, fm_end,
-						    &obd_start, &obd_end)))
+		for (j = 0, i = start_stripe;
+		     j < lsm->lsm_entries[0]->lsme_stripe_count;
+		     i = (i + 1) % lsm->lsm_entries[0]->lsme_stripe_count,
+		     j++) {
+			if (lov_stripe_intersects(lsm, i, fm_start, fm_end,
+						  &obd_start, &obd_end) == 0)
 				break;
 		}
 		*stripe_count = j;
-		last_stripe = (start_stripe + j - 1) % lsm->lsm_stripe_count;
+		last_stripe = (start_stripe + j - 1) %
+			      lsm->lsm_entries[0]->lsme_stripe_count;
 	}
 
 	return last_stripe;
@@ -1090,8 +1103,8 @@ static u64 fiemap_calc_fm_end_offset(struct fiemap *fiemap,
 		return 0;
 
 	/* Find out stripe_no from ost_index saved in the fe_device */
-	for (i = 0; i < lsm->lsm_stripe_count; i++) {
-		struct lov_oinfo *oinfo = lsm->lsm_oinfo[i];
+	for (i = 0; i < lsm->lsm_entries[0]->lsme_stripe_count; i++) {
+		struct lov_oinfo *oinfo = lsm->lsm_entries[0]->lsme_oinfo[i];
 
 		if (lov_oinfo_is_dummy(oinfo))
 			continue;
@@ -1110,7 +1123,7 @@ static u64 fiemap_calc_fm_end_offset(struct fiemap *fiemap,
 	 * offset to start of next device
 	 */
 	if (lov_stripe_intersects(lsm, stripe_no, fm_start, fm_end,
-				  &lun_start, &lun_end) &&
+				  &lun_start, &lun_end) != 0 &&
 	    local_end < lun_end) {
 		fm_end_offset = local_end;
 		*start_stripe = stripe_no;
@@ -1119,7 +1132,8 @@ static u64 fiemap_calc_fm_end_offset(struct fiemap *fiemap,
 		 * calculate offset in next stripe.
 		 */
 		fm_end_offset = 0;
-		*start_stripe = (stripe_no + 1) % lsm->lsm_stripe_count;
+		*start_stripe = (stripe_no + 1) %
+				lsm->lsm_entries[0]->lsme_stripe_count;
 	}
 
 	return fm_end_offset;
@@ -1168,7 +1182,7 @@ static int fiemap_for_stripe(const struct lu_env *env, struct cl_object *obj,
 				   &lun_start, &obd_object_end)) == 0)
 		return 0;
 
-	if (lov_oinfo_is_dummy(lsm->lsm_oinfo[stripeno]))
+	if (lov_oinfo_is_dummy(lsm->lsm_entries[0]->lsme_oinfo[stripeno]))
 		return -EIO;
 
 	/* If this is a continuation FIEMAP call and we are on
@@ -1218,7 +1232,7 @@ static int fiemap_for_stripe(const struct lu_env *env, struct cl_object *obj,
 		fs->fs_fm->fm_mapped_extents = 0;
 		fs->fs_fm->fm_flags = fiemap->fm_flags;
 
-		ost_index = lsm->lsm_oinfo[stripeno]->loi_ost_idx;
+		ost_index = lsm->lsm_entries[0]->lsme_oinfo[stripeno]->loi_ost_idx;
 
 		if (ost_index < 0 || ost_index >= lov->desc.ld_tgt_count) {
 			rc = -EINVAL;
@@ -1347,13 +1361,13 @@ static int lov_object_fiemap(const struct lu_env *env, struct cl_object *obj,
 	 * If the stripe_count > 1 and the application does not understand
 	 * DEVICE_ORDER flag, it cannot interpret the extents correctly.
 	 */
-	if (lsm->lsm_stripe_count > 1 &&
+	if (lsm->lsm_entries[0]->lsme_stripe_count > 1 &&
 	    !(fiemap->fm_flags & FIEMAP_FLAG_DEVICE_ORDER)) {
 		rc = -ENOTSUPP;
 		goto out;
 	}
 
-	if (lsm_is_released(lsm)) {
+	if (lsm->lsm_is_released) {
 		if (fiemap->fm_start < fmkey->lfik_oa.o_size) {
 			/**
 			 * released file, return a minimal FIEMAP if
@@ -1431,7 +1445,8 @@ static int lov_object_fiemap(const struct lu_env *env, struct cl_object *obj,
 	/* Check each stripe */
 	for (cur_stripe = fs.fs_start_stripe; stripe_count > 0;
 	     --stripe_count,
-	     cur_stripe = (cur_stripe + 1) % lsm->lsm_stripe_count) {
+	     cur_stripe = (cur_stripe + 1) %
+			  lsm->lsm_entries[0]->lsme_stripe_count) {
 		rc = fiemap_for_stripe(env, obj, lsm, fiemap, buflen, fmkey,
 				       cur_stripe, &fs);
 		if (rc < 0)
@@ -1443,7 +1458,7 @@ static int lov_object_fiemap(const struct lu_env *env, struct cl_object *obj,
 	 * Indicate that we are returning device offsets unless file just has
 	 * single stripe
 	 */
-	if (lsm->lsm_stripe_count > 1)
+	if (lsm->lsm_entries[0]->lsme_stripe_count > 1)
 		fiemap->fm_flags |= FIEMAP_FLAG_DEVICE_ORDER;
 
 	if (!fiemap->fm_extent_count)
@@ -1495,7 +1510,8 @@ static int lov_object_layout_get(const struct lu_env *env,
 		return 0;
 	}
 
-	cl->cl_size = lov_mds_md_size(lsm->lsm_stripe_count, lsm->lsm_magic);
+	cl->cl_size = lov_mds_md_size(lsm->lsm_entries[0]->lsme_stripe_count,
+				      lsm->lsm_magic);
 	cl->cl_layout_gen = lsm->lsm_layout_gen;
 
 	rc = lov_lsm_pack(lsm, buf->lb_buf, buf->lb_len);
@@ -1599,9 +1615,11 @@ int lov_read_and_clear_async_rc(struct cl_object *clob)
 			int i;
 
 			lsm = lov->lo_lsm;
-			for (i = 0; i < lsm->lsm_stripe_count; i++) {
-				struct lov_oinfo *loi = lsm->lsm_oinfo[i];
+			for (i = 0; i < lsm->lsm_entries[0]->lsme_stripe_count;
+			     i++) {
+				struct lov_oinfo *loi;
 
+				loi = lsm->lsm_entries[0]->lsme_oinfo[i];
 				if (lov_oinfo_is_dummy(loi))
 					continue;
 
diff --git a/drivers/staging/lustre/lustre/lov/lov_offset.c b/drivers/staging/lustre/lustre/lov/lov_offset.c
index a5f00f6..19a44d3 100644
--- a/drivers/staging/lustre/lustre/lov/lov_offset.c
+++ b/drivers/staging/lustre/lustre/lov/lov_offset.c
@@ -40,7 +40,7 @@
 /* compute object size given "stripeno" and the ost size */
 u64 lov_stripe_size(struct lov_stripe_md *lsm, u64 ost_size, int stripeno)
 {
-	unsigned long ssize = lsm->lsm_stripe_size;
+	unsigned long ssize = lsm->lsm_entries[0]->lsme_stripe_size;
 	unsigned long stripe_size;
 	u64 swidth;
 	u64 lov_size;
@@ -125,7 +125,7 @@ pgoff_t lov_stripe_pgoff(struct lov_stripe_md *lsm, pgoff_t stripe_index,
 int lov_stripe_offset(struct lov_stripe_md *lsm, u64 lov_off,
 		      int stripeno, u64 *obdoff)
 {
-	unsigned long ssize  = lsm->lsm_stripe_size;
+	unsigned long ssize  = lsm->lsm_entries[0]->lsme_stripe_size;
 	u64 stripe_off, this_stripe, swidth;
 	int magic = lsm->lsm_magic;
 	int ret = 0;
@@ -180,7 +180,7 @@ int lov_stripe_offset(struct lov_stripe_md *lsm, u64 lov_off,
 u64 lov_size_to_stripe(struct lov_stripe_md *lsm, u64 file_size,
 		       int stripeno)
 {
-	unsigned long ssize  = lsm->lsm_stripe_size;
+	unsigned long ssize  = lsm->lsm_entries[0]->lsme_stripe_size;
 	u64 stripe_off, this_stripe, swidth;
 	int magic = lsm->lsm_magic;
 
@@ -254,7 +254,7 @@ int lov_stripe_intersects(struct lov_stripe_md *lsm, int stripeno,
 /* compute which stripe number "lov_off" will be written into */
 int lov_stripe_number(struct lov_stripe_md *lsm, u64 lov_off)
 {
-	unsigned long ssize  = lsm->lsm_stripe_size;
+	unsigned long ssize  = lsm->lsm_entries[0]->lsme_stripe_size;
 	u64 stripe_off, swidth;
 	int magic = lsm->lsm_magic;
 
diff --git a/drivers/staging/lustre/lustre/lov/lov_pack.c b/drivers/staging/lustre/lustre/lov/lov_pack.c
index 90f9f2d..3700937 100644
--- a/drivers/staging/lustre/lustre/lov/lov_pack.c
+++ b/drivers/staging/lustre/lustre/lov/lov_pack.c
@@ -116,7 +116,8 @@ ssize_t lov_lsm_pack(const struct lov_stripe_md *lsm, void *buf,
 	size_t lmm_size;
 	unsigned int i;
 
-	lmm_size = lov_mds_md_size(lsm->lsm_stripe_count, lsm->lsm_magic);
+	lmm_size = lov_mds_md_size(lsm->lsm_entries[0]->lsme_stripe_count,
+				   lsm->lsm_magic);
 	if (!buf_size)
 		return lmm_size;
 
@@ -129,23 +130,24 @@ ssize_t lov_lsm_pack(const struct lov_stripe_md *lsm, void *buf,
 	 */
 	lmmv1->lmm_magic = cpu_to_le32(lsm->lsm_magic);
 	lmm_oi_cpu_to_le(&lmmv1->lmm_oi, &lsm->lsm_oi);
-	lmmv1->lmm_stripe_size = cpu_to_le32(lsm->lsm_stripe_size);
-	lmmv1->lmm_stripe_count = cpu_to_le16(lsm->lsm_stripe_count);
-	lmmv1->lmm_pattern = cpu_to_le32(lsm->lsm_pattern);
+	lmmv1->lmm_stripe_size = cpu_to_le32(lsm->lsm_entries[0]->lsme_stripe_size);
+	lmmv1->lmm_stripe_count = cpu_to_le16(lsm->lsm_entries[0]->lsme_stripe_count);
+	lmmv1->lmm_pattern = cpu_to_le32(lsm->lsm_entries[0]->lsme_pattern);
 	lmmv1->lmm_layout_gen = cpu_to_le16(lsm->lsm_layout_gen);
 
 	if (lsm->lsm_magic == LOV_MAGIC_V3) {
-		BUILD_BUG_ON(sizeof(lsm->lsm_pool_name) !=
-			 sizeof(lmmv3->lmm_pool_name));
-		strlcpy(lmmv3->lmm_pool_name, lsm->lsm_pool_name,
+		BUILD_BUG_ON(sizeof(lsm->lsm_entries[0]->lsme_pool_name) !=
+			     sizeof(lmmv3->lmm_pool_name));
+		strlcpy(lmmv3->lmm_pool_name,
+			lsm->lsm_entries[0]->lsme_pool_name,
 			sizeof(lmmv3->lmm_pool_name));
 		lmm_objects = lmmv3->lmm_objects;
 	} else {
 		lmm_objects = lmmv1->lmm_objects;
 	}
 
-	for (i = 0; i < lsm->lsm_stripe_count; i++) {
-		struct lov_oinfo *loi = lsm->lsm_oinfo[i];
+	for (i = 0; i < lsm->lsm_entries[0]->lsme_stripe_count; i++) {
+		struct lov_oinfo *loi = lsm->lsm_entries[0]->lsme_oinfo[i];
 
 		ostid_cpu_to_le(&loi->loi_oi, &lmm_objects[i].l_ost_oi);
 		lmm_objects[i].l_ost_gen = cpu_to_le32(loi->loi_ost_gen);
@@ -240,8 +242,8 @@ int lov_getstripe(struct lov_object *obj, struct lov_stripe_md *lsm,
 		goto out;
 	}
 
-	if (!lsm_is_released(lsm))
-		stripe_count = lsm->lsm_stripe_count;
+	if (!lsm->lsm_is_released)
+		stripe_count = lsm->lsm_entries[0]->lsme_stripe_count;
 	else
 		stripe_count = 0;
 
@@ -260,18 +262,16 @@ int lov_getstripe(struct lov_object *obj, struct lov_stripe_md *lsm,
 		goto out;
 	}
 
-	if (lum.lmm_stripe_count &&
-	    (lum.lmm_stripe_count < lsm->lsm_stripe_count)) {
+	if (lum.lmm_stripe_count && lum.lmm_stripe_count < stripe_count) {
 		/* Return right size of stripe to user */
 		lum.lmm_stripe_count = stripe_count;
 		rc = copy_to_user(lump, &lum, lum_size);
 		rc = -EOVERFLOW;
 		goto out;
 	}
-	lmmk_size = lov_mds_md_size(stripe_count, lsm->lsm_magic);
-
 
-	lmmk = kvzalloc(lmmk_size, GFP_NOFS);
+	lmmk_size = lov_mds_md_size(stripe_count, lsm->lsm_magic);
+	lmmk = kvzalloc(lmmk_size, GFP_KERNEL);
 	if (!lmmk) {
 		rc = -ENOMEM;
 		goto out;
-- 
1.8.3.1


* [lustre-devel] [PATCH v2 07/33] lustre: lov: add composite layout unpacking
  2019-01-06 22:13 [lustre-devel] [PATCH v2 00/33] lustre: add PFL support James Simmons
                   ` (5 preceding siblings ...)
  2019-01-06 22:14 ` [lustre-devel] [PATCH v2 06/33] lustre: lov: create struct lov_stripe_md_entry James Simmons
@ 2019-01-06 22:14 ` James Simmons
  2019-01-06 22:14 ` [lustre-devel] [PATCH v2 08/33] lustre: lov: embedded raid0 in struct lov_layout_composite James Simmons
                   ` (25 subsequent siblings)
  32 siblings, 0 replies; 47+ messages in thread
From: James Simmons @ 2019-01-06 22:14 UTC (permalink / raw)
  To: lustre-devel

From: Bobi Jam <bobijam@hotmail.com>

Update struct lov_stripe_md to accommodate composite layouts. Add
methods to unpack composite layouts.
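
The bounds check that lsm_verify_comp_md_v1() applies to each component
blob can be sketched in userspace C. This is a minimal sketch, not the
kernel code. struct comp_entry, struct comp_header and
verify_comp_layout() are hypothetical host-endian stand-ins for the
little-endian struct lov_comp_md_v1 / struct lov_comp_md_entry_v1, and
they omit most fields.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-endian stand-in for struct lov_comp_md_entry_v1. */
struct comp_entry {
	uint32_t id;
	uint64_t ext_start;	/* component extent start */
	uint64_t ext_end;	/* component extent end */
	uint32_t offset;	/* blob offset from the start of the layout */
	uint32_t size;		/* blob size in bytes */
};

/* Hypothetical host-endian stand-in for struct lov_comp_md_v1. */
struct comp_header {
	uint32_t size;		/* total layout size claimed by the header */
	uint16_t entry_count;
	const struct comp_entry *entries;
};

/*
 * Return 0 if the buffer holds the whole layout and every entry's blob
 * lies inside it, -EINVAL otherwise.
 */
static int verify_comp_layout(const struct comp_header *h, size_t buf_size)
{
	unsigned int i;

	/* the buffer must be at least as large as the layout claims */
	if (buf_size < h->size)
		return -EINVAL;

	for (i = 0; i < h->entry_count; i++) {
		const struct comp_entry *e = &h->entries[i];

		/* widen before adding so offset + size cannot wrap */
		if ((uint64_t)e->offset + e->size > h->size)
			return -EINVAL;
	}

	return 0;
}
```

Widening the offset to 64 bits before the addition keeps offset + size
from wrapping in this sketch.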

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Signed-off-by: Bobi Jam <bobijam@hotmail.com>
Signed-off-by: Niu Yawei <yawei.niu@intel.com>
WC-bug-id: https://jira.whamcloud.com/browse/LU-8998
Reviewed-on: https://review.whamcloud.com/24849
WC-bug-id: https://jira.whamcloud.com/browse/LU-9315
Reviewed-on: https://review.whamcloud.com/26503
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@gmail.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/lov/lov_ea.c   | 175 ++++++++++++++++++++++++++-
 drivers/staging/lustre/lustre/lov/lov_pack.c |   3 +
 2 files changed, 175 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/lustre/lustre/lov/lov_ea.c b/drivers/staging/lustre/lustre/lov/lov_ea.c
index f794df9..7d86318 100644
--- a/drivers/staging/lustre/lustre/lov/lov_ea.c
+++ b/drivers/staging/lustre/lustre/lov/lov_ea.c
@@ -38,12 +38,21 @@
 #define DEBUG_SUBSYSTEM S_LOV
 
 #include <asm/div64.h>
+#include <linux/sort.h>
 
 #include <obd_class.h>
 #include <uapi/linux/lustre/lustre_idl.h>
+#include <uapi/linux/lustre/lustre_user.h>
 
 #include "lov_internal.h"
 
+static inline void lu_extent_le_to_cpu(struct lu_extent *dst,
+				       const struct lu_extent *src)
+{
+	dst->e_start = le64_to_cpu(src->e_start);
+	dst->e_end = le64_to_cpu(src->e_end);
+}
+
 /*
  * Find minimum stripe maxbytes value. For inactive or
  * reconnecting targets use LUSTRE_EXT3_STRIPE_MAXBYTES.
@@ -347,17 +356,177 @@ void lsm_free(struct lov_stripe_md *lsm)
 	.lsm_unpackmd		= lsm_unpackmd_v3,
 };
 
+static int lsm_verify_comp_md_v1(struct lov_comp_md_v1 *lcm,
+				 size_t lcm_buf_size)
+{
+	unsigned int entry_count;
+	size_t lcm_size;
+	unsigned int i;
+
+	lcm_size = le32_to_cpu(lcm->lcm_size);
+	if (lcm_buf_size < lcm_size) {
+		CERROR("bad LCM buffer size %zu, expected %zu\n",
+		       lcm_buf_size, lcm_size);
+		return -EINVAL;
+	}
+
+	entry_count = le16_to_cpu(lcm->lcm_entry_count);
+	for (i = 0; i < entry_count; i++) {
+		struct lov_comp_md_entry_v1 *lcme = &lcm->lcm_entries[i];
+		size_t blob_offset;
+		size_t blob_size;
+
+		blob_offset = le32_to_cpu(lcme->lcme_offset);
+		blob_size = le32_to_cpu(lcme->lcme_size);
+
+		if (lcm_size < blob_offset || lcm_size < blob_size ||
+		    lcm_size < blob_offset + blob_size) {
+			CERROR("LCM entry %u has invalid blob: LCM size = %zu, offset = %zu, size = %zu\n",
+			       le32_to_cpu(lcme->lcme_id), lcm_size,
+			       blob_offset, blob_size);
+			return -EINVAL;
+		}
+	}
+
+	return 0;
+}
+
+static struct lov_stripe_md_entry *
+lsme_unpack_comp(struct lov_obd *lov, struct lov_mds_md *lmm,
+		 size_t lmm_buf_size, loff_t *maxbytes)
+{
+	unsigned int stripe_count;
+	unsigned int magic;
+
+	stripe_count = le16_to_cpu(lmm->lmm_stripe_count);
+	if (stripe_count == 0)
+		return ERR_PTR(-EINVAL);
+
+	magic = le32_to_cpu(lmm->lmm_magic);
+	if (magic != LOV_MAGIC_V1 && magic != LOV_MAGIC_V3)
+		return ERR_PTR(-EINVAL);
+
+	if (lmm_buf_size < lov_mds_md_size(stripe_count, magic))
+		return ERR_PTR(-EINVAL);
+
+	if (magic == LOV_MAGIC_V1) {
+		return lsme_unpack(lov, lmm, lmm_buf_size, NULL,
+				   lmm->lmm_objects, maxbytes);
+	} else {
+		struct lov_mds_md_v3 *lmm3 = (struct lov_mds_md_v3 *)lmm;
+
+		return lsme_unpack(lov, lmm, lmm_buf_size, lmm3->lmm_pool_name,
+				   lmm3->lmm_objects, maxbytes);
+	}
+}
+
+static struct lov_stripe_md *
+lsm_unpackmd_comp_md_v1(struct lov_obd *lov, void *buf, size_t buf_size)
+{
+	struct lov_comp_md_v1 *lcm = buf;
+	struct lov_stripe_md *lsm;
+	unsigned int entry_count = 0;
+	loff_t maxbytes;
+	size_t lsm_size;
+	unsigned int i;
+	int rc;
+
+	rc = lsm_verify_comp_md_v1(buf, buf_size);
+	if (rc < 0)
+		return ERR_PTR(rc);
+
+	entry_count = le16_to_cpu(lcm->lcm_entry_count);
+
+	lsm_size = offsetof(typeof(*lsm), lsm_entries[entry_count]);
+	lsm = kzalloc(lsm_size, GFP_KERNEL);
+	if (!lsm)
+		return ERR_PTR(-ENOMEM);
+
+	atomic_set(&lsm->lsm_refc, 1);
+	spin_lock_init(&lsm->lsm_lock);
+	lsm->lsm_magic = le32_to_cpu(lcm->lcm_magic);
+	lsm->lsm_layout_gen = le32_to_cpu(lcm->lcm_layout_gen);
+	lsm->lsm_entry_count = entry_count;
+	lsm->lsm_is_released = true;
+	lsm->lsm_maxbytes = LLONG_MIN;
+
+	for (i = 0; i < entry_count; i++) {
+		struct lov_comp_md_entry_v1 *lcme = &lcm->lcm_entries[i];
+		struct lov_stripe_md_entry *lsme;
+		size_t blob_offset;
+		size_t blob_size;
+		void *blob;
+
+		blob_offset = le32_to_cpu(lcme->lcme_offset);
+		blob_size = le32_to_cpu(lcme->lcme_size);
+		blob = (char *)lcm + blob_offset;
+
+		lsme = lsme_unpack_comp(lov, blob, blob_size,
+					(i == entry_count - 1) ? &maxbytes :
+					NULL);
+		if (IS_ERR(lsme)) {
+			rc = PTR_ERR(lsme);
+			goto out_lsm;
+		}
+
+		if (!(lsme->lsme_pattern & LOV_PATTERN_F_RELEASED))
+			lsm->lsm_is_released = false;
+
+		lsm->lsm_entries[i] = lsme;
+		lsme->lsme_id = le32_to_cpu(lcme->lcme_id);
+		lu_extent_le_to_cpu(&lsme->lsme_extent, &lcme->lcme_extent);
+
+		if (i == entry_count - 1) {
+			lsm->lsm_maxbytes = (loff_t)lsme->lsme_extent.e_start +
+					    maxbytes;
+			/* the last component hasn't been defined, or
+			 * lsm_maxbytes overflowed.
+			 */
+			if (lsme->lsme_extent.e_end != LUSTRE_EOF ||
+			    lsm->lsm_maxbytes <
+			    (loff_t)lsme->lsme_extent.e_start)
+				lsm->lsm_maxbytes = MAX_LFS_FILESIZE;
+		}
+	}
+
+	return lsm;
+
+out_lsm:
+	for (i = 0; i < entry_count; i++)
+		if (lsm->lsm_entries[i])
+			lsme_free(lsm->lsm_entries[i]);
+
+	kfree(lsm);
+
+	return ERR_PTR(rc);
+}
+
+static const struct lsm_operations lsm_comp_md_v1_ops = {
+	.lsm_stripe_by_index	= lsm_stripe_by_index_plain,
+	.lsm_stripe_by_offset	= lsm_stripe_by_offset_plain,
+	.lsm_unpackmd		= lsm_unpackmd_comp_md_v1,
+};
+
 const struct lsm_operations *lsm_op_find(int magic)
 {
+	const struct lsm_operations *lsm = NULL;
+
 	switch (magic) {
 	case LOV_MAGIC_V1:
-		return &lsm_v1_ops;
+		lsm = &lsm_v1_ops;
+		break;
 	case LOV_MAGIC_V3:
-		return &lsm_v3_ops;
+		lsm = &lsm_v3_ops;
+		break;
+	case LOV_MAGIC_COMP_V1:
+		lsm = &lsm_comp_md_v1_ops;
+		break;
 	default:
 		CERROR("unrecognized lsm_magic %08x\n", magic);
-		return NULL;
+		break;
 	}
+
+	return lsm;
 }
 
 void dump_lsm(unsigned int level, const struct lov_stripe_md *lsm)
diff --git a/drivers/staging/lustre/lustre/lov/lov_pack.c b/drivers/staging/lustre/lustre/lov/lov_pack.c
index 3700937..8b7a572 100644
--- a/drivers/staging/lustre/lustre/lov/lov_pack.c
+++ b/drivers/staging/lustre/lustre/lov/lov_pack.c
@@ -206,6 +206,9 @@ struct lov_stripe_md *lov_unpackmd(struct lov_obd *lov, void *buf,
 	const struct lsm_operations *op;
 	u32 magic;
 
+	if (buf_size < sizeof(magic))
+		return ERR_PTR(-EINVAL);
+
 	magic = le32_to_cpu(*(u32 *)buf);
 	op = lsm_op_find(magic);
 	if (!op)
-- 
1.8.3.1


* [lustre-devel] [PATCH v2 08/33] lustre: lov: embedded raid0 in struct lov_layout_composite
  2019-01-06 22:13 [lustre-devel] [PATCH v2 00/33] lustre: add PFL support James Simmons
                   ` (6 preceding siblings ...)
  2019-01-06 22:14 ` [lustre-devel] [PATCH v2 07/33] lustre: lov: add composite layout unpacking James Simmons
@ 2019-01-06 22:14 ` James Simmons
  2019-01-06 22:14 ` [lustre-devel] [PATCH v2 09/33] lustre: lov: migrate lov raid0 to future PFL component handling James Simmons
                   ` (24 subsequent siblings)
  32 siblings, 0 replies; 47+ messages in thread
From: James Simmons @ 2019-01-06 22:14 UTC (permalink / raw)
  To: lustre-devel

From: Bobi Jam <bobijam@hotmail.com>

Create a bare-bones struct lov_layout_composite so that the client
layer can support composite layouts.

A plain layout will be stored in the LOV layer as a composite layout
containing a single component.
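
To illustrate a plain layout living inside the composite state, here is
a minimal sketch in which a plain RAID-0 layout becomes a composite with
a single entry covering [0, EOF), and a lov_r0()-style accessor returns
that entry's RAID-0 state. All names (struct sketch_composite,
wrap_plain(), sketch_r0(), SKETCH_EOF) are hypothetical. The fixed array
of four entries is a simplification; in this patch lo_entries is a
single embedded struct.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_EOF UINT64_MAX	/* hypothetical stand-in for LUSTRE_EOF */

/* Per-component RAID-0 parameters (hypothetical). */
struct sketch_raid0 {
	uint32_t stripe_size;
	uint16_t stripe_count;
};

/* One component: the extent it covers plus its RAID-0 state. */
struct sketch_entry {
	uint64_t start;		/* extent [start, end) */
	uint64_t end;
	struct sketch_raid0 raid0;
};

/* Composite layout with a small fixed number of entries. */
struct sketch_composite {
	unsigned int entry_count;
	struct sketch_entry entries[4];
};

/* Wrap a plain layout as a one-component composite spanning the file. */
static struct sketch_composite wrap_plain(uint32_t stripe_size,
					  uint16_t stripe_count)
{
	struct sketch_composite c = { .entry_count = 1 };

	c.entries[0].start = 0;
	c.entries[0].end = SKETCH_EOF;
	c.entries[0].raid0.stripe_size = stripe_size;
	c.entries[0].raid0.stripe_count = stripe_count;
	return c;
}

/* Like lov_r0(): the RAID-0 state lives in the first (only) entry. */
static struct sketch_raid0 *sketch_r0(struct sketch_composite *c)
{
	return &c->entries[0].raid0;
}
```

Code that used to reach lov->u.raid0 directly now goes through the single
composite entry, which is why the accessors in this patch change from
u.raid0 to u.composite.lo_entries.lle_raid0.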

Signed-off-by: Jinshan Xiong <jinshan.xiong@gmail.com>
Signed-off-by: Bobi Jam <bobijam@hotmail.com>
Signed-off-by: Niu Yawei <yawei.niu@intel.com>
WC-bug-id: https://jira.whamcloud.com/browse/LU-8998
Reviewed-on: https://review.whamcloud.com/24850
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 .../staging/lustre/lustre/lov/lov_cl_internal.h    | 78 ++++++++++++----------
 drivers/staging/lustre/lustre/lov/lov_object.c     |  8 +--
 drivers/staging/lustre/lustre/lov/lovsub_object.c  |  8 +--
 3 files changed, 50 insertions(+), 44 deletions(-)

diff --git a/drivers/staging/lustre/lustre/lov/lov_cl_internal.h b/drivers/staging/lustre/lustre/lov/lov_cl_internal.h
index e4f7621..c6ace49 100644
--- a/drivers/staging/lustre/lustre/lov/lov_cl_internal.h
+++ b/drivers/staging/lustre/lustre/lov/lov_cl_internal.h
@@ -129,6 +129,42 @@ static inline char *llt2str(enum lov_layout_type llt)
 	return "";
 }
 
+struct lov_layout_raid0 {
+	unsigned int		lo_nr;
+	/**
+	 * When this is true, lov_object::lo_attr contains
+	 * valid up to date attributes for a top-level
+	 * object. This field is reset to 0 when attributes of
+	 * any sub-object change.
+	 */
+	int			lo_attr_valid;
+	/**
+	 * Array of sub-objects. Allocated when top-object is
+	 * created (lov_init_raid0()).
+	 *
+	 * Top-object is a strict master of its sub-objects:
+	 * it is created before them, and outlives its
+	 * children (the latter is necessary so that basic
+	 * functions like cl_object_top() always
+	 * work). Top-object keeps a reference on every
+	 * sub-object.
+	 *
+	 * When top-object is destroyed (lov_delete_raid0())
+	 * it releases its reference to a sub-object and waits
+	 * until the latter is finally destroyed.
+	 */
+	struct lovsub_object  **lo_sub;
+	/**
+	 * protect lo_sub
+	 */
+	spinlock_t		lo_sub_lock;
+	/**
+	 * Cached object attribute, built from sub-object
+	 * attributes.
+	 */
+	struct cl_attr		lo_attr;
+};
+
 /**
  * lov-specific file state.
  *
@@ -178,45 +214,15 @@ struct lov_object {
 	struct lov_stripe_md  *lo_lsm;
 
 	union lov_layout_state {
-		struct lov_layout_raid0 {
-			unsigned int	       lo_nr;
-			/**
-			 * When this is true, lov_object::lo_attr contains
-			 * valid up to date attributes for a top-level
-			 * object. This field is reset to 0 when attributes of
-			 * any sub-object change.
-			 */
-			int		       lo_attr_valid;
-			/**
-			 * Array of sub-objects. Allocated when top-object is
-			 * created (lov_init_raid0()).
-			 *
-			 * Top-object is a strict master of its sub-objects:
-			 * it is created before them, and outlives its
-			 * children (this later is necessary so that basic
-			 * functions like cl_object_top() always
-			 * work). Top-object keeps a reference on every
-			 * sub-object.
-			 *
-			 * When top-object is destroyed (lov_delete_raid0())
-			 * it releases its reference to a sub-object and waits
-			 * until the latter is finally destroyed.
-			 */
-			struct lovsub_object **lo_sub;
-			/**
-			 * protect lo_sub
-			 */
-			spinlock_t		lo_sub_lock;
-			/**
-			 * Cached object attribute, built from sub-object
-			 * attributes.
-			 */
-			struct cl_attr	 lo_attr;
-		} raid0;
 		struct lov_layout_state_empty {
 		} empty;
 		struct lov_layout_state_released {
 		} released;
+		struct lov_layout_composite {
+			struct lov_layout_entry {
+				struct lov_layout_raid0 lle_raid0;
+			} lo_entries;
+		} composite;
 	} u;
 	/**
 	 * Thread that acquired lov_object::lo_type_guard in an exclusive
@@ -627,7 +633,7 @@ static inline struct lov_layout_raid0 *lov_r0(struct lov_object *lov)
 	LASSERT(lov->lo_type == LLT_RAID0);
 	LASSERT(lov->lo_lsm->lsm_magic == LOV_MAGIC ||
 		lov->lo_lsm->lsm_magic == LOV_MAGIC_V3);
-	return &lov->u.raid0;
+	return &lov->u.composite.lo_entries.lle_raid0;
 }
 
 /* lov_pack.c */
diff --git a/drivers/staging/lustre/lustre/lov/lov_object.c b/drivers/staging/lustre/lustre/lov/lov_object.c
index ad2901a..15ed378 100644
--- a/drivers/staging/lustre/lustre/lov/lov_object.c
+++ b/drivers/staging/lustre/lustre/lov/lov_object.c
@@ -228,7 +228,7 @@ static int lov_init_raid0(const struct lu_env *env, struct lov_device *dev,
 	struct lov_thread_info  *lti     = lov_env_info(env);
 	struct cl_object_conf   *subconf = &lti->lti_stripe_conf;
 	struct lu_fid	   *ofid    = &lti->lti_fid;
-	struct lov_layout_raid0 *r0      = &state->raid0;
+	struct lov_layout_raid0 *r0 = &state->composite.lo_entries.lle_raid0;
 
 	if (lsm->lsm_magic != LOV_MAGIC_V1 && lsm->lsm_magic != LOV_MAGIC_V3) {
 		dump_lsm(D_ERROR, lsm);
@@ -375,7 +375,7 @@ static void lov_subobject_kill(const struct lu_env *env, struct lov_object *lov,
 	wait_queue_head_t *wq;
 	wait_queue_entry_t	  *waiter;
 
-	r0  = &lov->u.raid0;
+	r0  = &lov->u.composite.lo_entries.lle_raid0;
 	LASSERT(r0->lo_sub[idx] == los);
 
 	sub  = lovsub2cl(los);
@@ -418,7 +418,7 @@ static void lov_subobject_kill(const struct lu_env *env, struct lov_object *lov,
 static int lov_delete_raid0(const struct lu_env *env, struct lov_object *lov,
 			    union lov_layout_state *state)
 {
-	struct lov_layout_raid0 *r0 = &state->raid0;
+	struct lov_layout_raid0 *r0 = &state->composite.lo_entries.lle_raid0;
 	struct lov_stripe_md    *lsm = lov->lo_lsm;
 	int i;
 
@@ -451,7 +451,7 @@ static void lov_fini_empty(const struct lu_env *env, struct lov_object *lov,
 static void lov_fini_raid0(const struct lu_env *env, struct lov_object *lov,
 			   union lov_layout_state *state)
 {
-	struct lov_layout_raid0 *r0 = &state->raid0;
+	struct lov_layout_raid0 *r0 = &state->composite.lo_entries.lle_raid0;
 
 	if (r0->lo_sub) {
 		kvfree(r0->lo_sub);
diff --git a/drivers/staging/lustre/lustre/lov/lovsub_object.c b/drivers/staging/lustre/lustre/lov/lovsub_object.c
index 13d4520..7360c16 100644
--- a/drivers/staging/lustre/lustre/lov/lovsub_object.c
+++ b/drivers/staging/lustre/lustre/lov/lovsub_object.c
@@ -80,10 +80,10 @@ static void lovsub_object_free(const struct lu_env *env, struct lu_object *obj)
 	 */
 	if (lov) {
 		LASSERT(lov->lo_type == LLT_RAID0);
-		LASSERT(lov->u.raid0.lo_sub[los->lso_index] == los);
-		spin_lock(&lov->u.raid0.lo_sub_lock);
-		lov->u.raid0.lo_sub[los->lso_index] = NULL;
-		spin_unlock(&lov->u.raid0.lo_sub_lock);
+		LASSERT(lov->u.composite.lo_entries.lle_raid0.lo_sub[los->lso_index] == los);
+		spin_lock(&lov->u.composite.lo_entries.lle_raid0.lo_sub_lock);
+		lov->u.composite.lo_entries.lle_raid0.lo_sub[los->lso_index] = NULL;
+		spin_unlock(&lov->u.composite.lo_entries.lle_raid0.lo_sub_lock);
 	}
 
 	lu_object_fini(obj);
-- 
1.8.3.1

* [lustre-devel] [PATCH v2 09/33] lustre: lov: migrate lov raid0 to future PFL component handling
  2019-01-06 22:13 [lustre-devel] [PATCH v2 00/33] lustre: add PFL support James Simmons
                   ` (7 preceding siblings ...)
  2019-01-06 22:14 ` [lustre-devel] [PATCH v2 08/33] lustre: lov: embedded raid0 in struct lov_layout_composite James Simmons
@ 2019-01-06 22:14 ` James Simmons
  2019-01-06 22:14 ` [lustre-devel] [PATCH v2 10/33] lustre: lov: reduce code indentation James Simmons
                   ` (23 subsequent siblings)
  32 siblings, 0 replies; 47+ messages in thread
From: James Simmons @ 2019-01-06 22:14 UTC (permalink / raw)
  To: lustre-devel

From: Bobi Jam <bobijam@hotmail.com>

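PFL changes striping from static to dynamic: the stripe count of a
file can now differ from one layout component to the next. Rename
the fields that represent a stripe index (sub_stripe, lps_stripe)
so that they represent a component index instead (sub_index,
sub_subio_index, lps_index). Since raid0 stripe handling will be
replaced by PFL component handling, make raid0 a sub-layer of the
new composite layout type (LLT_COMP), which replaces LLT_RAID0.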
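
The resulting structure can be sketched in simplified userspace C.
Everything below is hypothetical: layout_type, layout_state,
layout_ops_table, init_raid0() and init_composite() are illustrative
stand-ins for enum lov_layout_type, union lov_layout_state, the
llo_* operations table, lov_init_raid0() and lov_init_composite(),
not the real lov API. The sketch only shows the shape of the change:
the raid0 helper receives the embedded raid0 state directly, and the
composite wrapper just locates the entry and delegates to it.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for enum lov_layout_type and the layout state. */
enum layout_type { LT_EMPTY, LT_RELEASED, LT_COMP, LT_NR };

struct layout_raid0 {
	int nr_stripes;			/* plays the role of lo_nr */
};

struct layout_entry {
	struct layout_raid0 raid0;	/* plays the role of lle_raid0 */
};

struct layout_composite {
	struct layout_entry entries;	/* one entry for now, like lo_entries */
};

union layout_state {
	struct layout_composite composite;
};

/* raid0 helper: only sees the raid0 state it is handed. */
static int init_raid0(struct layout_raid0 *r0, int stripe_count)
{
	if (stripe_count <= 0)
		return -EINVAL;
	r0->nr_stripes = stripe_count;
	return 0;
}

/* Composite wrapper: find the entry, then delegate to raid0. */
static int init_composite(union layout_state *state, int stripe_count)
{
	struct layout_entry *le = &state->composite.entries;

	return init_raid0(&le->raid0, stripe_count);
}

/* Empty and released layouts have no stripes to set up. */
static int init_empty(union layout_state *state, int stripe_count)
{
	(void)state;
	(void)stripe_count;
	return 0;
}

struct layout_ops {
	int (*init)(union layout_state *state, int stripe_count);
};

/* One operations entry per layout type, indexed by the enum. */
static const struct layout_ops layout_ops_table[LT_NR] = {
	[LT_EMPTY]    = { .init = init_empty },
	[LT_RELEASED] = { .init = init_empty },
	[LT_COMP]     = { .init = init_composite },
};

static int layout_init(enum layout_type t, union layout_state *state,
		       int stripe_count)
{
	assert(t < LT_NR);
	return layout_ops_table[t].init(state, stripe_count);
}
```

With a single entry the wrapper only adds one level of indirection.
Once lo_entries becomes an array (as the later "change lo_entries to
array" patch in this series suggests), the composite wrapper would
loop over the entries while the raid0 helper could stay unchanged.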

Signed-off-by: Jinshan Xiong <jinshan.xiong@gmail.com>
Signed-off-by: Bobi Jam <bobijam@hotmail.com>
Signed-off-by: Niu Yawei <yawei.niu@intel.com>
WC-bug-id: https://jira.whamcloud.com/browse/LU-8998
Reviewed-on: https://review.whamcloud.com/24850
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 .../staging/lustre/lustre/lov/lov_cl_internal.h    |  29 +--
 drivers/staging/lustre/lustre/lov/lov_io.c         |  41 +++--
 drivers/staging/lustre/lustre/lov/lov_lock.c       |   8 +-
 drivers/staging/lustre/lustre/lov/lov_object.c     | 196 +++++++++++++--------
 drivers/staging/lustre/lustre/lov/lov_page.c       |  19 +-
 drivers/staging/lustre/lustre/lov/lovsub_object.c  |  13 +-
 6 files changed, 178 insertions(+), 128 deletions(-)

diff --git a/drivers/staging/lustre/lustre/lov/lov_cl_internal.h b/drivers/staging/lustre/lustre/lov/lov_cl_internal.h
index c6ace49..c44c937 100644
--- a/drivers/staging/lustre/lustre/lov/lov_cl_internal.h
+++ b/drivers/staging/lustre/lustre/lov/lov_cl_internal.h
@@ -108,8 +108,8 @@ struct lov_device {
  */
 enum lov_layout_type {
 	LLT_EMPTY,	/** empty file without body (mknod + truncate) */
-	LLT_RAID0,	/** striped file */
 	LLT_RELEASED,	/** file with no objects (data in HSM) */
+	LLT_COMP,	/** support composite layout */
 	LLT_NR
 };
 
@@ -118,10 +118,10 @@ static inline char *llt2str(enum lov_layout_type llt)
 	switch (llt) {
 	case LLT_EMPTY:
 		return "EMPTY";
-	case LLT_RAID0:
-		return "RAID0";
 	case LLT_RELEASED:
 		return "RELEASED";
+	case LLT_COMP:
+		return "COMPOSITE";
 	case LLT_NR:
 		LBUG();
 	}
@@ -242,7 +242,7 @@ struct lov_lock_sub {
 	 */
 	unsigned int		sub_is_enqueued:1,
 				sub_initialized:1;
-	int		  sub_stripe;
+	int			sub_index;
 };
 
 /**
@@ -258,7 +258,8 @@ struct lov_lock {
 
 struct lov_page {
 	struct cl_page_slice	lps_cl;
-	unsigned int		lps_stripe; /* stripe index */
+	/** layout_entry + stripe index, composed using lov_comp_index() */
+	unsigned int		lps_index;
 };
 
 /*
@@ -309,7 +310,6 @@ struct lov_thread_info {
  * State that lov_io maintains for every sub-io.
  */
 struct lov_io_sub {
-	u16		 sub_stripe;
 	/**
 	 * environment's refcheck.
 	 *
@@ -331,6 +331,7 @@ struct lov_io_sub {
 	 * sub-io's active for the current IO iteration.
 	 */
 	struct list_head	 sub_linkage;
+	u16			sub_subio_index;
 	/**
 	 * sub-io for a stripe. Ideally sub-io's can be stopped and resumed
 	 * independently, with lov acting as a scheduler to maximize overall
@@ -425,12 +426,12 @@ int lov_io_init(const struct lu_env *env, struct cl_object *obj,
 int lovsub_lock_init(const struct lu_env *env, struct cl_object *obj,
 		     struct cl_lock *lock, const struct cl_io *io);
 
-int lov_lock_init_raid0(const struct lu_env *env, struct cl_object *obj,
-			struct cl_lock *lock, const struct cl_io *io);
+int lov_lock_init_composite(const struct lu_env *env, struct cl_object *obj,
+			    struct cl_lock *lock, const struct cl_io *io);
 int lov_lock_init_empty(const struct lu_env *env, struct cl_object *obj,
 			struct cl_lock *lock, const struct cl_io *io);
-int lov_io_init_raid0(const struct lu_env *env, struct cl_object *obj,
-		      struct cl_io *io);
+int lov_io_init_composite(const struct lu_env *env, struct cl_object *obj,
+			  struct cl_io *io);
 int lov_io_init_empty(const struct lu_env *env, struct cl_object *obj,
 		      struct cl_io *io);
 int lov_io_init_released(const struct lu_env *env, struct cl_object *obj,
@@ -445,8 +446,8 @@ int lovsub_page_init(const struct lu_env *env, struct cl_object *ob,
 		     struct cl_page *page, pgoff_t index);
 int lov_page_init_empty(const struct lu_env *env, struct cl_object *obj,
 			struct cl_page *page, pgoff_t index);
-int lov_page_init_raid0(const struct lu_env *env, struct cl_object *obj,
-			struct cl_page *page, pgoff_t index);
+int lov_page_init_composite(const struct lu_env *env, struct cl_object *obj,
+			    struct cl_page *page, pgoff_t index);
 struct lu_object *lov_object_alloc(const struct lu_env *env,
 				   const struct lu_object_header *hdr,
 				   struct lu_device *dev);
@@ -455,7 +456,6 @@ struct lu_object *lovsub_object_alloc(const struct lu_env *env,
 				      struct lu_device *dev);
 
 struct lov_stripe_md *lov_lsm_addref(struct lov_object *lov);
-int lov_page_stripe(const struct cl_page *page);
 
 #define lov_foreach_target(lov, var)		    \
 	for (var = 0; var < lov_targets_nr(lov); ++var)
@@ -630,9 +630,10 @@ static inline struct lov_thread_info *lov_env_info(const struct lu_env *env)
 
 static inline struct lov_layout_raid0 *lov_r0(struct lov_object *lov)
 {
-	LASSERT(lov->lo_type == LLT_RAID0);
+	LASSERT(lov->lo_type == LLT_COMP);
 	LASSERT(lov->lo_lsm->lsm_magic == LOV_MAGIC ||
 		lov->lo_lsm->lsm_magic == LOV_MAGIC_V3);
+
 	return &lov->u.composite.lo_entries.lle_raid0;
 }
 
diff --git a/drivers/staging/lustre/lustre/lov/lov_io.c b/drivers/staging/lustre/lustre/lov/lov_io.c
index 2d62566..023b588 100644
--- a/drivers/staging/lustre/lustre/lov/lov_io.c
+++ b/drivers/staging/lustre/lustre/lov/lov_io.c
@@ -53,7 +53,7 @@ static void lov_io_sub_fini(const struct lu_env *env, struct lov_io *lio,
 			sub->sub_io_initialized = 0;
 			lio->lis_active_subios--;
 		}
-		if (sub->sub_stripe == lio->lis_single_subio_index)
+		if (sub->sub_subio_index == lio->lis_single_subio_index)
 			lio->lis_single_subio_index = -1;
 		else if (!sub->sub_borrowed)
 			kfree(sub->sub_io);
@@ -143,12 +143,12 @@ static int lov_io_sub_init(const struct lu_env *env, struct lov_io *lio,
 	struct cl_io      *sub_io;
 	struct cl_object  *sub_obj;
 	struct cl_io      *io  = lio->lis_cl.cis_io;
-	int stripe = sub->sub_stripe;
+	int stripe = sub->sub_subio_index;
 	int rc;
 
 	LASSERT(!sub->sub_io);
 	LASSERT(!sub->sub_env);
-	LASSERT(sub->sub_stripe < lio->lis_stripe_count);
+	LASSERT(sub->sub_subio_index < lio->lis_stripe_count);
 
 	if (unlikely(!lov_r0(lov)->lo_sub[stripe]))
 		return -EIO;
@@ -203,15 +203,15 @@ static int lov_io_sub_init(const struct lu_env *env, struct lov_io *lio,
 }
 
 struct lov_io_sub *lov_sub_get(const struct lu_env *env,
-			       struct lov_io *lio, int stripe)
+			       struct lov_io *lio, int index)
 {
 	int rc;
-	struct lov_io_sub *sub = &lio->lis_subs[stripe];
+	struct lov_io_sub *sub = &lio->lis_subs[index];
 
-	LASSERT(stripe < lio->lis_stripe_count);
+	LASSERT(index < lio->lis_stripe_count);
 
 	if (!sub->sub_io_initialized) {
-		sub->sub_stripe = stripe;
+		sub->sub_subio_index = index;
 		rc = lov_io_sub_init(env, lio, sub);
 	} else {
 		rc = 0;
@@ -228,14 +228,14 @@ struct lov_io_sub *lov_sub_get(const struct lu_env *env,
  *
  */
 
-int lov_page_stripe(const struct cl_page *page)
+static int lov_page_index(const struct cl_page *page)
 {
 	const struct cl_page_slice *slice;
 
 	slice = cl_page_at(page, &lov_device_type);
 	LASSERT(slice->cpl_obj);
 
-	return cl2lov_page(slice)->lps_stripe;
+	return cl2lov_page(slice)->lps_index;
 }
 
 static int lov_io_subio_init(const struct lu_env *env, struct lov_io *lio,
@@ -630,8 +630,7 @@ static int lov_io_submit(const struct lu_env *env,
 	struct lov_io_sub *sub;
 	struct cl_page_list *plist = &lov_env_info(env)->lti_plist;
 	struct cl_page *page;
-	int stripe;
-
+	int index;
 	int rc = 0;
 
 	if (lio->lis_active_subios == 1) {
@@ -657,16 +656,16 @@ static int lov_io_submit(const struct lu_env *env,
 		page = cl_page_list_first(qin);
 		cl_page_list_move(&cl2q->c2_qin, qin, page);
 
-		stripe = lov_page_stripe(page);
+		index = lov_page_index(page);
 		while (qin->pl_nr > 0) {
 			page = cl_page_list_first(qin);
-			if (stripe != lov_page_stripe(page))
+			if (index != lov_page_index(page))
 				break;
 
 			cl_page_list_move(&cl2q->c2_qin, qin, page);
 		}
 
-		sub = lov_sub_get(env, lio, stripe);
+		sub = lov_sub_get(env, lio, index);
 		if (!IS_ERR(sub)) {
 			rc = cl_io_submit_rw(sub->sub_env, sub->sub_io,
 					     crt, cl2q);
@@ -716,16 +715,16 @@ static int lov_io_commit_async(const struct lu_env *env,
 	cl_page_list_init(plist);
 	while (queue->pl_nr > 0) {
 		int stripe_to = to;
-		int stripe;
+		int index;
 
 		LASSERT(plist->pl_nr == 0);
 		page = cl_page_list_first(queue);
 		cl_page_list_move(plist, queue, page);
 
-		stripe = lov_page_stripe(page);
+		index = lov_page_index(page);
 		while (queue->pl_nr > 0) {
 			page = cl_page_list_first(queue);
-			if (stripe != lov_page_stripe(page))
+			if (index != lov_page_index(page))
 				break;
 
 			cl_page_list_move(plist, queue, page);
@@ -734,7 +733,7 @@ static int lov_io_commit_async(const struct lu_env *env,
 		if (queue->pl_nr > 0) /* still has more pages */
 			stripe_to = PAGE_SIZE;
 
-		sub = lov_sub_get(env, lio, stripe);
+		sub = lov_sub_get(env, lio, index);
 		if (!IS_ERR(sub)) {
 			rc = cl_io_commit_async(sub->sub_env, sub->sub_io,
 						plist, from, stripe_to, cb);
@@ -769,7 +768,7 @@ static int lov_io_fault_start(const struct lu_env *env,
 
 	fio = &ios->cis_io->u.ci_fault;
 	lio = cl2lov_io(env, ios);
-	sub = lov_sub_get(env, lio, lov_page_stripe(fio->ft_page));
+	sub = lov_sub_get(env, lio, lov_page_index(fio->ft_page));
 	if (IS_ERR(sub))
 		return PTR_ERR(sub);
 	sub->sub_io->u.ci_fault.ft_nob = fio->ft_nob;
@@ -941,8 +940,8 @@ static void lov_empty_impossible(const struct lu_env *env,
 	.cio_commit_async              = LOV_EMPTY_IMPOSSIBLE
 };
 
-int lov_io_init_raid0(const struct lu_env *env, struct cl_object *obj,
-		      struct cl_io *io)
+int lov_io_init_composite(const struct lu_env *env, struct cl_object *obj,
+			  struct cl_io *io)
 {
 	struct lov_io       *lio = lov_env_io(env);
 	struct lov_object   *lov = cl2lov(obj);
diff --git a/drivers/staging/lustre/lustre/lov/lov_lock.c b/drivers/staging/lustre/lustre/lov/lov_lock.c
index b029210..4340063 100644
--- a/drivers/staging/lustre/lustre/lov/lov_lock.c
+++ b/drivers/staging/lustre/lustre/lov/lov_lock.c
@@ -73,7 +73,7 @@ static struct lov_sublock_env *lov_sublock_env_get(const struct lu_env *env,
 		subenv->lse_env = env;
 		subenv->lse_io  = io;
 	} else {
-		sub = lov_sub_get(env, lio, lls->sub_stripe);
+		sub = lov_sub_get(env, lio, lls->sub_index);
 		if (!IS_ERR(sub)) {
 			subenv->lse_env = sub->sub_env;
 			subenv->lse_io  = sub->sub_io;
@@ -167,7 +167,7 @@ static struct lov_lock *lov_lock_sub_init(const struct lu_env *env,
 			descr->cld_mode  = lock->cll_descr.cld_mode;
 			descr->cld_gid   = lock->cll_descr.cld_gid;
 			descr->cld_enq_flags = lock->cll_descr.cld_enq_flags;
-			lls->sub_stripe = i;
+			lls->sub_index = i;
 
 			/* initialize sub lock */
 			result = lov_sublock_init(env, lock, lls);
@@ -295,8 +295,8 @@ static int lov_lock_print(const struct lu_env *env, void *cookie,
 	.clo_print     = lov_lock_print
 };
 
-int lov_lock_init_raid0(const struct lu_env *env, struct cl_object *obj,
-			struct cl_lock *lock, const struct cl_io *io)
+int lov_lock_init_composite(const struct lu_env *env, struct cl_object *obj,
+			    struct cl_lock *lock, const struct cl_io *io)
 {
 	struct lov_lock *lck;
 	int result = 0;
diff --git a/drivers/staging/lustre/lustre/lov/lov_object.c b/drivers/staging/lustre/lustre/lov/lov_object.c
index 15ed378..f5c6da1 100644
--- a/drivers/staging/lustre/lustre/lov/lov_object.c
+++ b/drivers/staging/lustre/lustre/lov/lov_object.c
@@ -110,9 +110,9 @@ static int lov_init_empty(const struct lu_env *env, struct lov_device *dev,
 	return 0;
 }
 
-static void lov_install_raid0(const struct lu_env *env,
-			      struct lov_object *lov,
-			      union lov_layout_state *state)
+static void lov_install_composite(const struct lu_env *env,
+				  struct lov_object *lov,
+				  union lov_layout_state *state)
 {
 }
 
@@ -129,7 +129,7 @@ static struct cl_object *lov_sub_find(const struct lu_env *env,
 }
 
 static int lov_init_sub(const struct lu_env *env, struct lov_object *lov,
-			struct cl_object *stripe, struct lov_layout_raid0 *r0,
+			struct cl_object *subobj, struct lov_layout_raid0 *r0,
 			int idx)
 {
 	struct cl_object_header *hdr;
@@ -145,13 +145,13 @@ static int lov_init_sub(const struct lu_env *env, struct lov_object *lov,
 		 * lov_oinfo of lsm_stripe_data which will be freed due to
 		 * this failure.
 		 */
-		cl_object_kill(env, stripe);
-		cl_object_put(env, stripe);
+		cl_object_kill(env, subobj);
+		cl_object_put(env, subobj);
 		return -EIO;
 	}
 
 	hdr    = cl_object_header(lov2cl(lov));
-	subhdr = cl_object_header(stripe);
+	subhdr = cl_object_header(subobj);
 
 	oinfo = lov->lo_lsm->lsm_entries[0]->lsme_oinfo[idx];
 	CDEBUG(D_INODE, DFID "@%p[%d] -> " DFID "@%p: ostid: " DOSTID " idx: %d gen: %d\n",
@@ -166,8 +166,8 @@ static int lov_init_sub(const struct lu_env *env, struct lov_object *lov,
 		subhdr->coh_parent = hdr;
 		spin_unlock(&subhdr->coh_attr_guard);
 		subhdr->coh_nesting = hdr->coh_nesting + 1;
-		lu_object_ref_add(&stripe->co_lu, "lov-parent", lov);
-		r0->lo_sub[idx] = cl2lovsub(stripe);
+		lu_object_ref_add(&subobj->co_lu, "lov-parent", lov);
+		r0->lo_sub[idx] = cl2lovsub(subobj);
 		r0->lo_sub[idx]->lso_super = lov;
 		r0->lo_sub[idx]->lso_index = idx;
 		result = 0;
@@ -184,18 +184,18 @@ static int lov_init_sub(const struct lu_env *env, struct lov_object *lov,
 			/* the object's layout has already changed but isn't
 			 * refreshed
 			 */
-			lu_object_unhash(env, &stripe->co_lu);
+			lu_object_unhash(env, &subobj->co_lu);
 			result = -EAGAIN;
 		} else {
 			mask = D_ERROR;
 			result = -EIO;
 		}
 
-		LU_OBJECT_DEBUG(mask, env, &stripe->co_lu,
+		LU_OBJECT_DEBUG(mask, env, &subobj->co_lu,
 				"stripe %d is already owned.", idx);
 		LU_OBJECT_DEBUG(mask, env, old_obj, "owned.");
 		LU_OBJECT_HEADER(mask, env, lov2lu(lov), "try to own.\n");
-		cl_object_put(env, stripe);
+		cl_object_put(env, subobj);
 	}
 	return result;
 }
@@ -219,7 +219,7 @@ static int lov_page_slice_fixup(struct lov_object *lov,
 static int lov_init_raid0(const struct lu_env *env, struct lov_device *dev,
 			  struct lov_object *lov, struct lov_stripe_md *lsm,
 			  const struct cl_object_conf *conf,
-			  union  lov_layout_state *state)
+			  struct lov_layout_raid0 *r0)
 {
 	int result;
 	int i;
@@ -228,7 +228,6 @@ static int lov_init_raid0(const struct lu_env *env, struct lov_device *dev,
 	struct lov_thread_info  *lti     = lov_env_info(env);
 	struct cl_object_conf   *subconf = &lti->lti_stripe_conf;
 	struct lu_fid	   *ofid    = &lti->lti_fid;
-	struct lov_layout_raid0 *r0 = &state->composite.lo_entries.lle_raid0;
 
 	if (lsm->lsm_magic != LOV_MAGIC_V1 && lsm->lsm_magic != LOV_MAGIC_V3) {
 		dump_lsm(D_ERROR, lsm);
@@ -310,6 +309,17 @@ static int lov_init_raid0(const struct lu_env *env, struct lov_device *dev,
 	return result;
 }
 
+static int lov_init_composite(const struct lu_env *env, struct lov_device *dev,
+			      struct lov_object *lov, struct lov_stripe_md *lsm,
+			      const struct cl_object_conf *conf,
+			      union lov_layout_state *state)
+{
+	struct lov_layout_composite *comp = &state->composite;
+	struct lov_layout_entry *le = &comp->lo_entries;
+
+	return lov_init_raid0(env, dev, lov, lsm, conf, &le->lle_raid0);
+}
+
 static int lov_init_released(const struct lu_env *env, struct lov_device *dev,
 			     struct lov_object *lov, struct lov_stripe_md *lsm,
 			     const struct cl_object_conf *conf,
@@ -337,7 +347,7 @@ static struct cl_object *lov_find_subobj(const struct lu_env *env,
 	int ost_idx;
 	int rc;
 
-	if (lov->lo_type != LLT_RAID0) {
+	if (lov->lo_type != LLT_COMP) {
 		result = NULL;
 		goto out;
 	}
@@ -367,15 +377,14 @@ static int lov_delete_empty(const struct lu_env *env, struct lov_object *lov,
 }
 
 static void lov_subobject_kill(const struct lu_env *env, struct lov_object *lov,
+			       struct lov_layout_raid0 *r0,
 			       struct lovsub_object *los, int idx)
 {
 	struct cl_object	*sub;
-	struct lov_layout_raid0 *r0;
 	struct lu_site	  *site;
 	wait_queue_head_t *wq;
 	wait_queue_entry_t	  *waiter;
 
-	r0  = &lov->u.composite.lo_entries.lle_raid0;
 	LASSERT(r0->lo_sub[idx] == los);
 
 	sub  = lovsub2cl(los);
@@ -415,17 +424,12 @@ static void lov_subobject_kill(const struct lu_env *env, struct lov_object *lov,
 	LASSERT(!r0->lo_sub[idx]);
 }
 
-static int lov_delete_raid0(const struct lu_env *env, struct lov_object *lov,
-			    union lov_layout_state *state)
+static void lov_delete_raid0(const struct lu_env *env, struct lov_object *lov,
+			     struct lov_layout_raid0 *r0)
 {
-	struct lov_layout_raid0 *r0 = &state->composite.lo_entries.lle_raid0;
-	struct lov_stripe_md    *lsm = lov->lo_lsm;
-	int i;
-
-	dump_lsm(D_INODE, lsm);
-
-	lov_layout_wait(env, lov);
 	if (r0->lo_sub) {
+		int i;
+
 		for (i = 0; i < r0->lo_nr; ++i) {
 			struct lovsub_object *los = r0->lo_sub[i];
 
@@ -435,10 +439,24 @@ static int lov_delete_raid0(const struct lu_env *env, struct lov_object *lov,
 				 * If top-level object is to be evicted from
 				 * the cache, so are its sub-objects.
 				 */
-				lov_subobject_kill(env, lov, los, i);
+				lov_subobject_kill(env, lov, r0, los, i);
 			}
 		}
 	}
+}
+
+static int lov_delete_composite(const struct lu_env *env,
+				struct lov_object *lov,
+				union lov_layout_state *state)
+{
+	struct lov_layout_composite *comp = &state->composite;
+	struct lov_layout_entry *entry = &comp->lo_entries;
+
+	dump_lsm(D_INODE, lov->lo_lsm);
+
+	lov_layout_wait(env, lov);
+	lov_delete_raid0(env, lov, &entry->lle_raid0);
+
 	return 0;
 }
 
@@ -448,15 +466,23 @@ static void lov_fini_empty(const struct lu_env *env, struct lov_object *lov,
 	LASSERT(lov->lo_type == LLT_EMPTY || lov->lo_type == LLT_RELEASED);
 }
 
-static void lov_fini_raid0(const struct lu_env *env, struct lov_object *lov,
-			   union lov_layout_state *state)
+static void lov_fini_raid0(const struct lu_env *env,
+			   struct lov_layout_raid0 *r0)
 {
-	struct lov_layout_raid0 *r0 = &state->composite.lo_entries.lle_raid0;
-
 	if (r0->lo_sub) {
 		kvfree(r0->lo_sub);
 		r0->lo_sub = NULL;
 	}
+}
+
+static void lov_fini_composite(const struct lu_env *env,
+			       struct lov_object *lov,
+			       union lov_layout_state *state)
+{
+	struct lov_layout_composite *comp = &state->composite;
+	struct lov_layout_entry *entry = &comp->lo_entries;
+
+	lov_fini_raid0(env, &entry->lle_raid0);
 
 	dump_lsm(D_INODE, lov->lo_lsm);
 	lov_free_memmd(&lov->lo_lsm);
@@ -477,17 +503,10 @@ static int lov_print_empty(const struct lu_env *env, void *cookie,
 }
 
 static int lov_print_raid0(const struct lu_env *env, void *cookie,
-			   lu_printer_t p, const struct lu_object *o)
+			   lu_printer_t p, struct lov_layout_raid0 *r0)
 {
-	struct lov_object	*lov = lu2lov(o);
-	struct lov_layout_raid0	*r0  = lov_r0(lov);
-	struct lov_stripe_md	*lsm = lov->lo_lsm;
-	int			 i;
+	int i;
 
-	(*p)(env, cookie, "stripes: %d, %s, lsm{%p 0x%08X %d %u %u}:\n",
-	     r0->lo_nr, lov->lo_layout_invalid ? "invalid" : "valid", lsm,
-	     lsm->lsm_magic, atomic_read(&lsm->lsm_refc),
-	     lsm->lsm_entries[0]->lsme_stripe_count, lsm->lsm_layout_gen);
 	for (i = 0; i < r0->lo_nr; ++i) {
 		struct lu_object *sub;
 
@@ -501,6 +520,23 @@ static int lov_print_raid0(const struct lu_env *env, void *cookie,
 	return 0;
 }
 
+static int lov_print_composite(const struct lu_env *env, void *cookie,
+			       lu_printer_t p, const struct lu_object *o)
+{
+	struct lov_object *lov = lu2lov(o);
+	struct lov_layout_raid0	*r0 = lov_r0(lov);
+	struct lov_stripe_md *lsm = lov->lo_lsm;
+
+	(*p)(env, cookie, "stripes: %d, %s, lsm{%p 0x%08X %d %u %u}:\n",
+	     r0->lo_nr, lov->lo_layout_invalid ? "invalid" : "valid", lsm,
+	     lsm->lsm_magic, atomic_read(&lsm->lsm_refc),
+	     lsm->lsm_entries[0]->lsme_stripe_count, lsm->lsm_layout_gen);
+
+	lov_print_raid0(env, cookie, p, r0);
+
+	return 0;
+}
+
 static int lov_print_released(const struct lu_env *env, void *cookie,
 			      lu_printer_t p, const struct lu_object *o)
 {
@@ -525,17 +561,13 @@ static int lov_print_released(const struct lu_env *env, void *cookie,
 static int lov_attr_get_empty(const struct lu_env *env, struct cl_object *obj,
 			      struct cl_attr *attr)
 {
-	attr->cat_blocks = 0;
 	return 0;
 }
 
-static int lov_attr_get_raid0(const struct lu_env *env, struct cl_object *obj,
-			      struct cl_attr *attr)
+static int lov_attr_get_raid0(const struct lu_env *env, struct lov_object *lov,
+			      struct cl_attr *attr, struct lov_layout_raid0 *r0)
 {
-	struct lov_object	*lov = cl2lov(obj);
-	struct lov_layout_raid0 *r0 = lov_r0(lov);
-	struct cl_attr		*lov_attr = &r0->lo_attr;
-	int			 result = 0;
+	int result = 0;
 
 	/* this is called w/o holding type guard mutex, so it must be inside
 	 * an on going IO otherwise lsm may be replaced.
@@ -577,22 +609,38 @@ static int lov_attr_get_raid0(const struct lu_env *env, struct cl_object *obj,
 		result = lov_merge_lvb_kms(lsm, lvb, &kms);
 		lov_stripe_unlock(lsm);
 		if (result == 0) {
-			cl_lvb2attr(lov_attr, lvb);
-			lov_attr->cat_kms = kms;
+			cl_lvb2attr(attr, lvb);
+			attr->cat_kms = kms;
 			r0->lo_attr_valid = 1;
 		}
 	}
-	if (result == 0) { /* merge results */
-		attr->cat_blocks = lov_attr->cat_blocks;
-		attr->cat_size = lov_attr->cat_size;
-		attr->cat_kms = lov_attr->cat_kms;
-		if (attr->cat_atime < lov_attr->cat_atime)
-			attr->cat_atime = lov_attr->cat_atime;
-		if (attr->cat_ctime < lov_attr->cat_ctime)
-			attr->cat_ctime = lov_attr->cat_ctime;
-		if (attr->cat_mtime < lov_attr->cat_mtime)
-			attr->cat_mtime = lov_attr->cat_mtime;
-	}
+
+	return result;
+}
+
+static int lov_attr_get_composite(const struct lu_env *env,
+				  struct cl_object *obj,
+				  struct cl_attr *attr)
+{
+	struct lov_object *lov = cl2lov(obj);
+	struct lov_layout_raid0 *r0 = lov_r0(lov);
+	struct cl_attr *lov_attr = &r0->lo_attr;
+	int result;
+
+	result = lov_attr_get_raid0(env, lov, attr, r0);
+	if (result)
+		return result;
+
+	attr->cat_blocks = lov_attr->cat_blocks;
+	attr->cat_size = lov_attr->cat_size;
+	attr->cat_kms = lov_attr->cat_kms;
+	if (attr->cat_atime < lov_attr->cat_atime)
+		attr->cat_atime = lov_attr->cat_atime;
+	if (attr->cat_ctime < lov_attr->cat_ctime)
+		attr->cat_ctime = lov_attr->cat_ctime;
+	if (attr->cat_mtime < lov_attr->cat_mtime)
+		attr->cat_mtime = lov_attr->cat_mtime;
+
 	return result;
 }
 
@@ -608,17 +656,6 @@ static int lov_attr_get_raid0(const struct lu_env *env, struct cl_object *obj,
 		.llo_io_init   = lov_io_init_empty,
 		.llo_getattr   = lov_attr_get_empty
 	},
-	[LLT_RAID0] = {
-		.llo_init      = lov_init_raid0,
-		.llo_delete    = lov_delete_raid0,
-		.llo_fini      = lov_fini_raid0,
-		.llo_install   = lov_install_raid0,
-		.llo_print     = lov_print_raid0,
-		.llo_page_init = lov_page_init_raid0,
-		.llo_lock_init = lov_lock_init_raid0,
-		.llo_io_init   = lov_io_init_raid0,
-		.llo_getattr   = lov_attr_get_raid0
-	},
 	[LLT_RELEASED] = {
 		.llo_init      = lov_init_released,
 		.llo_delete    = lov_delete_empty,
@@ -629,7 +666,18 @@ static int lov_attr_get_raid0(const struct lu_env *env, struct cl_object *obj,
 		.llo_lock_init = lov_lock_init_empty,
 		.llo_io_init   = lov_io_init_released,
 		.llo_getattr   = lov_attr_get_empty
-	}
+	},
+	[LLT_COMP] = {
+		.llo_init	= lov_init_composite,
+		.llo_delete	= lov_delete_composite,
+		.llo_fini	= lov_fini_composite,
+		.llo_install	= lov_install_composite,
+		.llo_print	= lov_print_composite,
+		.llo_page_init	= lov_page_init_composite,
+		.llo_lock_init	= lov_lock_init_composite,
+		.llo_io_init	= lov_io_init_composite,
+		.llo_getattr	= lov_attr_get_composite,
+	},
 };
 
 /**
@@ -659,7 +707,7 @@ static enum lov_layout_type lov_type(struct lov_stripe_md *lsm)
 	if (lsm->lsm_is_released)
 		return LLT_RELEASED;
 
-	return LLT_RAID0;
+	return LLT_COMP;
 }
 
 static inline void lov_conf_freeze(struct lov_object *lov)
@@ -1610,7 +1658,7 @@ int lov_read_and_clear_async_rc(struct cl_object *clob)
 
 		lov_conf_freeze(lov);
 		switch (lov->lo_type) {
-		case LLT_RAID0: {
+		case LLT_COMP: {
 			struct lov_stripe_md *lsm;
 			int i;
 
diff --git a/drivers/staging/lustre/lustre/lov/lov_page.c b/drivers/staging/lustre/lustre/lov/lov_page.c
index f1c99a2..d94d003 100644
--- a/drivers/staging/lustre/lustre/lov/lov_page.c
+++ b/drivers/staging/lustre/lustre/lov/lov_page.c
@@ -50,22 +50,21 @@
  * Lov page operations.
  *
  */
-
-static int lov_raid0_page_print(const struct lu_env *env,
-				const struct cl_page_slice *slice,
-				void *cookie, lu_printer_t printer)
+static int lov_comp_page_print(const struct lu_env *env,
+			       const struct cl_page_slice *slice,
+			       void *cookie, lu_printer_t printer)
 {
 	struct lov_page *lp = cl2lov_page(slice);
 
 	return (*printer)(env, cookie, LUSTRE_LOV_NAME "-page@%p, raid0\n", lp);
 }
 
-static const struct cl_page_operations lov_raid0_page_ops = {
-	.cpo_print  = lov_raid0_page_print
+static const struct cl_page_operations lov_comp_page_ops = {
+	.cpo_print  = lov_comp_page_print
 };
 
-int lov_page_init_raid0(const struct lu_env *env, struct cl_object *obj,
-			struct cl_page *page, pgoff_t index)
+int lov_page_init_composite(const struct lu_env *env, struct cl_object *obj,
+			    struct cl_page *page, pgoff_t index)
 {
 	struct lov_object *loo = cl2lov(obj);
 	struct lov_layout_raid0 *r0 = lov_r0(loo);
@@ -85,8 +84,8 @@ int lov_page_init_raid0(const struct lu_env *env, struct cl_object *obj,
 	rc = lov_stripe_offset(loo->lo_lsm, offset, stripe, &suboff);
 	LASSERT(rc == 0);
 
-	lpg->lps_stripe = stripe;
-	cl_page_slice_add(page, &lpg->lps_cl, obj, index, &lov_raid0_page_ops);
+	lpg->lps_index = stripe;
+	cl_page_slice_add(page, &lpg->lps_cl, obj, index, &lov_comp_page_ops);
 
 	sub = lov_sub_get(env, lio, stripe);
 	if (IS_ERR(sub))
diff --git a/drivers/staging/lustre/lustre/lov/lovsub_object.c b/drivers/staging/lustre/lustre/lov/lovsub_object.c
index 7360c16..d3e9537 100644
--- a/drivers/staging/lustre/lustre/lov/lovsub_object.c
+++ b/drivers/staging/lustre/lustre/lov/lovsub_object.c
@@ -79,11 +79,14 @@ static void lovsub_object_free(const struct lu_env *env, struct lu_object *obj)
 	 * object handling in lu_object_find.
 	 */
 	if (lov) {
-		LASSERT(lov->lo_type == LLT_RAID0);
-		LASSERT(lov->u.composite.lo_entries.lle_raid0.lo_sub[los->lso_index] == los);
-		spin_lock(&lov->u.composite.lo_entries.lle_raid0.lo_sub_lock);
-		lov->u.composite.lo_entries.lle_raid0.lo_sub[los->lso_index] = NULL;
-		spin_unlock(&lov->u.composite.lo_entries.lle_raid0.lo_sub_lock);
+		int stripe = los->lso_index;
+		struct lov_layout_raid0 *r0 = lov_r0(lov);
+
+		LASSERT(lov->lo_type == LLT_COMP);
+		LASSERT(r0->lo_sub[stripe] == los);
+		spin_lock(&r0->lo_sub_lock);
+		r0->lo_sub[stripe] = NULL;
+		spin_unlock(&r0->lo_sub_lock);
 	}
 
 	lu_object_fini(obj);
-- 
1.8.3.1

* [lustre-devel] [PATCH v2 10/33] lustre: lov: reduce code indentation
  2019-01-06 22:13 [lustre-devel] [PATCH v2 00/33] lustre: add PFL support James Simmons
                   ` (8 preceding siblings ...)
  2019-01-06 22:14 ` [lustre-devel] [PATCH v2 09/33] lustre: lov: migrate lov raid0 to future PFL component handling James Simmons
@ 2019-01-06 22:14 ` James Simmons
  2019-01-06 22:14 ` [lustre-devel] [PATCH v2 11/33] lustre: lov: change lo_entries to array James Simmons
                   ` (22 subsequent siblings)
  32 siblings, 0 replies; 47+ messages in thread
From: James Simmons @ 2019-01-06 22:14 UTC (permalink / raw)
  To: lustre-devel

From: Bobi Jam <bobijam@hotmail.com>

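In lov_init_raid0(), check whether allocating r0->lo_sub failed and
return -ENOMEM right away, which removes one level of indentation
from the rest of the function. The same can be done in
lov_attr_get_raid0() by testing r0->lo_attr_valid. Likewise,
lov_io_init_composite() now returns immediately when
lov_io_slice_init() fails.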
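
A minimal userspace sketch of the early-return style, assuming
calloc() as a stand-in for kvzalloc() and a hypothetical struct raid0
in place of struct lov_layout_raid0 (the ost array and raid0_init()
are illustrative only). The function returns -ENOMEM as soon as the
allocation fails, so the per-stripe loop sits at a single indentation
level; it also exits early with -EIO on an unknown target index,
similar in spirit to the "OST is not initialized" check.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct lov_layout_raid0. */
struct raid0 {
	int nr;		/* number of stripes, like lo_nr */
	int *ost;	/* per-stripe target index (illustrative only) */
};

static int raid0_init(struct raid0 *r0, const int *ost_idx,
		      int stripe_count, int nr_targets)
{
	int i;

	assert(stripe_count > 0);	/* plays the role of an LASSERT() */
	r0->nr = stripe_count;
	/* calloc() stands in for kvzalloc() */
	r0->ost = calloc(stripe_count, sizeof(r0->ost[0]));
	if (!r0->ost)
		return -ENOMEM;		/* early return: no "if (r0->ost) {" block */

	for (i = 0; i < r0->nr; ++i) {
		if (ost_idx[i] < 0 || ost_idx[i] >= nr_targets) {
			/* unknown target: undo the allocation and fail */
			free(r0->ost);
			r0->ost = NULL;
			return -EIO;
		}
		r0->ost[i] = ost_idx[i];
	}
	return 0;
}
```

Compared with wrapping the loop in "if (allocation succeeded) { ... }",
each failure is handled where it is detected and the success path
reads top to bottom.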

Signed-off-by: Jinshan Xiong <jinshan.xiong@gmail.com>
Signed-off-by: Bobi Jam <bobijam@hotmail.com>
Signed-off-by: Niu Yawei <yawei.niu@intel.com>
WC-bug-id: https://jira.whamcloud.com/browse/LU-8998
Reviewed-on: https://review.whamcloud.com/24850
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/lov/lov_io.c     |  11 +-
 drivers/staging/lustre/lustre/lov/lov_object.c | 186 ++++++++++++-------------
 2 files changed, 96 insertions(+), 101 deletions(-)

diff --git a/drivers/staging/lustre/lustre/lov/lov_io.c b/drivers/staging/lustre/lustre/lov/lov_io.c
index 023b588..6dd5639 100644
--- a/drivers/staging/lustre/lustre/lov/lov_io.c
+++ b/drivers/staging/lustre/lustre/lov/lov_io.c
@@ -948,12 +948,13 @@ int lov_io_init_composite(const struct lu_env *env, struct cl_object *obj,
 
 	INIT_LIST_HEAD(&lio->lis_active);
 	io->ci_result = lov_io_slice_init(lio, lov, io);
+	if (io->ci_result)
+		return io->ci_result;
+
+	io->ci_result = lov_io_subio_init(env, lio, io);
 	if (io->ci_result == 0) {
-		io->ci_result = lov_io_subio_init(env, lio, io);
-		if (io->ci_result == 0) {
-			cl_io_slice_add(io, &lio->lis_cl, obj, &lov_io_ops);
-			atomic_inc(&lov->lo_active_ios);
-		}
+		cl_io_slice_add(io, &lio->lis_cl, obj, &lov_io_ops);
+		atomic_inc(&lov->lo_active_ios);
 	}
 	return io->ci_result;
 }
diff --git a/drivers/staging/lustre/lustre/lov/lov_object.c b/drivers/staging/lustre/lustre/lov/lov_object.c
index f5c6da1..1ebaa23 100644
--- a/drivers/staging/lustre/lustre/lov/lov_object.c
+++ b/drivers/staging/lustre/lustre/lov/lov_object.c
@@ -228,6 +228,7 @@ static int lov_init_raid0(const struct lu_env *env, struct lov_device *dev,
 	struct lov_thread_info  *lti     = lov_env_info(env);
 	struct cl_object_conf   *subconf = &lti->lti_stripe_conf;
 	struct lu_fid	   *ofid    = &lti->lti_fid;
+	int psz;
 
 	if (lsm->lsm_magic != LOV_MAGIC_V1 && lsm->lsm_magic != LOV_MAGIC_V3) {
 		dump_lsm(D_ERROR, lsm);
@@ -238,73 +239,76 @@ static int lov_init_raid0(const struct lu_env *env, struct lov_device *dev,
 	LASSERT(!lov->lo_lsm);
 	lov->lo_lsm = lsm_addref(lsm);
 	lov->lo_layout_invalid = true;
+
+	spin_lock_init(&r0->lo_sub_lock);
 	r0->lo_nr  = lsm->lsm_entries[0]->lsme_stripe_count;
 	LASSERT(r0->lo_nr <= lov_targets_nr(dev));
 
 	r0->lo_sub = kvzalloc(r0->lo_nr * sizeof(r0->lo_sub[0]),
 				     GFP_NOFS);
-	if (r0->lo_sub) {
-		int psz = 0;
+	if (!r0->lo_sub)
+		return -ENOMEM;
 
-		result = 0;
-		subconf->coc_inode = conf->coc_inode;
-		spin_lock_init(&r0->lo_sub_lock);
-		/*
-		 * Create stripe cl_objects.
+	psz = 0;
+	result = 0;
+	subconf->coc_inode = conf->coc_inode;
+	/*
+	 * Create stripe cl_objects.
+	 */
+	for (i = 0; i < r0->lo_nr; ++i) {
+		struct cl_device *subdev;
+		struct lov_oinfo *oinfo;
+		int ost_idx;
+
+		oinfo = lsm->lsm_entries[0]->lsme_oinfo[i];
+		if (lov_oinfo_is_dummy(oinfo))
+			continue;
+
+		result = ostid_to_fid(ofid, &oinfo->loi_oi,
+				      oinfo->loi_ost_idx);
+		if (result != 0)
+			goto out;
+
+		ost_idx = oinfo->loi_ost_idx;
+		if (!dev->ld_target[ost_idx]) {
+			CERROR("%s: OST %04x is not initialized\n",
+			       lov2obd(dev->ld_lov)->obd_name, ost_idx);
+			result = -EIO;
+			goto out;
+		}
+
+		subdev = lovsub2cl_dev(dev->ld_target[ost_idx]);
+		subconf->u.coc_oinfo = oinfo;
+		LASSERTF(subdev, "not init ost %d\n", ost_idx);
+		/* In the function below, .hs_keycmp resolves to
+		 * lu_obj_hop_keycmp()
 		 */
-		for (i = 0; i < r0->lo_nr && result == 0; ++i) {
-			struct cl_device *subdev;
-			struct lov_oinfo *oinfo;
-			int ost_idx;
-
-			oinfo = lsm->lsm_entries[0]->lsme_oinfo[i];
-			if (lov_oinfo_is_dummy(oinfo))
-				continue;
-
-			result = ostid_to_fid(ofid, &oinfo->loi_oi,
-					      oinfo->loi_ost_idx);
-			if (result != 0)
-				goto out;
-
-			ost_idx = oinfo->loi_ost_idx;
-			if (!dev->ld_target[ost_idx]) {
-				CERROR("%s: OST %04x is not initialized\n",
-				lov2obd(dev->ld_lov)->obd_name, ost_idx);
-				result = -EIO;
-				goto out;
-			}
+		/* coverity[overrun-buffer-val] */
+		stripe = lov_sub_find(env, subdev, ofid, subconf);
+		if (IS_ERR(stripe)) {
+			result = PTR_ERR(stripe);
+			goto out;
+		}
 
-			subdev = lovsub2cl_dev(dev->ld_target[ost_idx]);
-			subconf->u.coc_oinfo = oinfo;
-			LASSERTF(subdev, "not init ost %d\n", ost_idx);
-			/* In the function below, .hs_keycmp resolves to
-			 * lu_obj_hop_keycmp()
-			 */
-			/* coverity[overrun-buffer-val] */
-			stripe = lov_sub_find(env, subdev, ofid, subconf);
-			if (!IS_ERR(stripe)) {
-				result = lov_init_sub(env, lov, stripe, r0, i);
-				if (result == -EAGAIN) { /* try again */
-					--i;
-					result = 0;
-					continue;
-				}
-			} else {
-				result = PTR_ERR(stripe);
-			}
+		result = lov_init_sub(env, lov, stripe, r0, i);
+		if (result == -EAGAIN) { /* try again */
+			--i;
+			result = 0;
+			continue;
+		}
 
-			if (result == 0) {
-				int sz = lov_page_slice_fixup(lov, stripe);
+		if (result == 0) {
+			int sz = lov_page_slice_fixup(lov, stripe);
 
-				LASSERT(ergo(psz > 0, psz == sz));
-				psz = sz;
-			}
+			LASSERT(ergo(psz > 0, psz == sz));
+			psz = sz;
 		}
-		if (result == 0)
-			cl_object_header(&lov->lo_cl)->coh_page_bufsize += psz;
-	} else {
-		result = -ENOMEM;
 	}
+	if (result == 0)
+		cl_object_header(&lov->lo_cl)->coh_page_bufsize += psz;
+	else
+		result = -ENOMEM;
+
 out:
 	return result;
 }
@@ -567,53 +571,43 @@ static int lov_attr_get_empty(const struct lu_env *env, struct cl_object *obj,
 static int lov_attr_get_raid0(const struct lu_env *env, struct lov_object *lov,
 			      struct cl_attr *attr, struct lov_layout_raid0 *r0)
 {
+	struct lov_stripe_md *lsm = lov->lo_lsm;
+	struct ost_lvb *lvb = &lov_env_info(env)->lti_lvb;
 	int result = 0;
+	u64 kms = 0;
 
-	/* this is called w/o holding type guard mutex, so it must be inside
-	 * an on going IO otherwise lsm may be replaced.
-	 * LU-2117: it turns out there exists one exception. For mmaped files,
-	 * the lock of those files may be requested in the other file's IO
-	 * context, and this function is called in ccc_lock_state(), it will
-	 * hit this assertion.
-	 * Anyway, it's still okay to call attr_get w/o type guard as layout
-	 * can't go if locks exist.
-	 */
-	/* LASSERT(atomic_read(&lsm->lsm_refc) > 1); */
+	if (r0->lo_attr_valid)
+		return 0;
 
-	if (!r0->lo_attr_valid) {
-		struct lov_stripe_md    *lsm = lov->lo_lsm;
-		struct ost_lvb	  *lvb = &lov_env_info(env)->lti_lvb;
-		__u64		    kms = 0;
+	memset(lvb, 0, sizeof(*lvb));
+	/* XXX: timestamps can be negative by sanity:test_39m,
+	 * how can it be?
+	 */
+	lvb->lvb_atime = LLONG_MIN;
+	lvb->lvb_ctime = LLONG_MIN;
+	lvb->lvb_mtime = LLONG_MIN;
 
-		memset(lvb, 0, sizeof(*lvb));
-		/* XXX: timestamps can be negative by sanity:test_39m,
-		 * how can it be?
-		 */
-		lvb->lvb_atime = LLONG_MIN;
-		lvb->lvb_ctime = LLONG_MIN;
-		lvb->lvb_mtime = LLONG_MIN;
+	/*
+	 * XXX that should be replaced with a loop over sub-objects,
+	 * doing cl_object_attr_get() on them. But for now, let's
+	 * reuse old lov code.
+	 */
 
-		/*
-		 * XXX that should be replaced with a loop over sub-objects,
-		 * doing cl_object_attr_get() on them. But for now, let's
-		 * reuse old lov code.
-		 */
+	/*
+	 * XXX take lsm spin-lock to keep lov_merge_lvb_kms()
+	 * happy. It's not needed, because new code uses
+	 * ->coh_attr_guard spin-lock to protect consistency of
+	 * sub-object attributes.
+	 */
+	lov_stripe_lock(lsm);
+	result = lov_merge_lvb_kms(lsm, lvb, &kms);
+	lov_stripe_unlock(lsm);
+	if (result)
+		return result;
 
-		/*
-		 * XXX take lsm spin-lock to keep lov_merge_lvb_kms()
-		 * happy. It's not needed, because new code uses
-		 * ->coh_attr_guard spin-lock to protect consistency of
-		 * sub-object attributes.
-		 */
-		lov_stripe_lock(lsm);
-		result = lov_merge_lvb_kms(lsm, lvb, &kms);
-		lov_stripe_unlock(lsm);
-		if (result == 0) {
-			cl_lvb2attr(attr, lvb);
-			attr->cat_kms = kms;
-			r0->lo_attr_valid = 1;
-		}
-	}
+	cl_lvb2attr(attr, lvb);
+	attr->cat_kms = kms;
+	r0->lo_attr_valid = 1;
 
 	return result;
 }
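lov_attr_get_raid0() seeds lvb_atime, lvb_ctime and lvb_mtime with LLONG_MIN before merging the per-stripe attributes. The sketch below assumes the merge keeps the newest value of each timestamp, which matches the lov_merge.c comment about not regressing a more up-to-date time. It uses a hypothetical single-field struct toy_lvb to show why the seed must be LLONG_MIN rather than 0: a negative timestamp, like the one the sanity:test_39m comment mentions, still replaces the seed.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for struct ost_lvb. */
struct toy_lvb {
	long long mtime;
};

/* Keep the newest mtime across stripes. Starting from LLONG_MIN means
 * any real value, even a negative one, replaces the seed. */
static long long merge_mtime(const struct toy_lvb *stripes, size_t n)
{
	long long merged = LLONG_MIN;
	size_t i;

	for (i = 0; i < n; i++)
		if (stripes[i].mtime > merged)
			merged = stripes[i].mtime;
	return merged;
}
```

If the seed were 0 instead, two stripes with mtimes of -100 and -50 would merge to 0, a time that no stripe reported.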
-- 
1.8.3.1


* [lustre-devel] [PATCH v2 11/33] lustre: lov: change lo_entries to array.
  2019-01-06 22:13 [lustre-devel] [PATCH v2 00/33] lustre: add PFL support James Simmons
                   ` (9 preceding siblings ...)
  2019-01-06 22:14 ` [lustre-devel] [PATCH v2 10/33] lustre: lov: reduce code indentation James Simmons
@ 2019-01-06 22:14 ` James Simmons
  2019-01-06 22:14 ` [lustre-devel] [PATCH v2 12/33] lustre: lov: move around PFL code and cleanups James Simmons
                   ` (21 subsequent siblings)
  32 siblings, 0 replies; 47+ messages in thread
From: James Simmons @ 2019-01-06 22:14 UTC
  To: lustre-devel

From: Bobi Jam <bobijam@hotmail.com>

Old-style striping is equivalent to a layout with a single component.
To support PFL, lo_entries must become an array.

Signed-off-by: Jinshan Xiong <jinshan.xiong@gmail.com>
Signed-off-by: Bobi Jam <bobijam@hotmail.com>
Signed-off-by: Niu Yawei <yawei.niu@intel.com>
WC-bug-id: https://jira.whamcloud.com/browse/LU-8998
Reviewed-on: https://review.whamcloud.com/24850
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 .../staging/lustre/lustre/lov/lov_cl_internal.h    |  14 +-
 drivers/staging/lustre/lustre/lov/lov_internal.h   |  20 +--
 drivers/staging/lustre/lustre/lov/lov_io.c         |  27 ++--
 drivers/staging/lustre/lustre/lov/lov_lock.c       |  13 +-
 drivers/staging/lustre/lustre/lov/lov_merge.c      |   6 +-
 drivers/staging/lustre/lustre/lov/lov_object.c     | 149 ++++++++++++---------
 drivers/staging/lustre/lustre/lov/lov_offset.c     |  30 +++--
 drivers/staging/lustre/lustre/lov/lov_page.c       |  11 +-
 drivers/staging/lustre/lustre/lov/lovsub_object.c  |   5 +-
 9 files changed, 158 insertions(+), 117 deletions(-)

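Below is a simplified sketch of the data-structure change, using hypothetical toy_* types that keep only the relevant fields. The composite layout now owns a heap-allocated array and a count of valid entries. The accessor (lov_r0(lov, i) in the patch) checks the index before returning an entry. In the sketch, calloc()/free() and assert() stand in for kcalloc()/kvfree() and LASSERTF().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical mirrors of struct lov_layout_raid0 and
 * struct lov_layout_composite after this patch. */
struct toy_raid0 {
	unsigned int nr_stripes;
};

struct toy_composite {
	unsigned int entry_count;	/* valid entries in entries[] */
	struct toy_raid0 *entries;	/* was a single embedded entry */
};

/* Allocate a zeroed entry array, as lov_init_composite() does. */
static int toy_composite_init(struct toy_composite *comp, unsigned int count)
{
	comp->entries = calloc(count, sizeof(*comp->entries));
	if (!comp->entries)
		return -ENOMEM;
	comp->entry_count = count;
	return 0;
}

/* Counterpart of lov_r0(lov, i): check the index, then return entry i. */
static struct toy_raid0 *toy_r0(struct toy_composite *comp, unsigned int i)
{
	assert(i < comp->entry_count);
	return &comp->entries[i];
}

/* Counterpart of lov_fini_composite(): free and clear the array. */
static void toy_composite_fini(struct toy_composite *comp)
{
	free(comp->entries);
	comp->entries = NULL;
	comp->entry_count = 0;
}
```

With entry_count fixed at 1, as in this patch, every caller still passes index 0. The array and the count only become meaningful once later patches set up more than one component.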
diff --git a/drivers/staging/lustre/lustre/lov/lov_cl_internal.h b/drivers/staging/lustre/lustre/lov/lov_cl_internal.h
index c44c937..99bd1c1 100644
--- a/drivers/staging/lustre/lustre/lov/lov_cl_internal.h
+++ b/drivers/staging/lustre/lustre/lov/lov_cl_internal.h
@@ -219,9 +219,13 @@ struct lov_object {
 		struct lov_layout_state_released {
 		} released;
 		struct lov_layout_composite {
+			/**
+			 * Current valid entry count of lo_entries.
+			 */
+			unsigned int lo_entry_count;
 			struct lov_layout_entry {
 				struct lov_layout_raid0 lle_raid0;
-			} lo_entries;
+			} *lo_entries;
 		} composite;
 	} u;
 	/**
@@ -628,13 +632,13 @@ static inline struct lov_thread_info *lov_env_info(const struct lu_env *env)
 	return info;
 }
 
-static inline struct lov_layout_raid0 *lov_r0(struct lov_object *lov)
+static inline struct lov_layout_raid0 *lov_r0(struct lov_object *lov, int i)
 {
 	LASSERT(lov->lo_type == LLT_COMP);
-	LASSERT(lov->lo_lsm->lsm_magic == LOV_MAGIC ||
-		lov->lo_lsm->lsm_magic == LOV_MAGIC_V3);
+	LASSERTF(i < lov->u.composite.lo_entry_count,
+		 "entry %d entry_count %d", i, lov->u.composite.lo_entry_count);
 
-	return &lov->u.composite.lo_entries.lle_raid0;
+	return &lov->u.composite.lo_entries[i].lle_raid0;
 }
 
 /* lov_pack.c */
diff --git a/drivers/staging/lustre/lustre/lov/lov_internal.h b/drivers/staging/lustre/lustre/lov/lov_internal.h
index f2747c9..4c9e324 100644
--- a/drivers/staging/lustre/lustre/lov/lov_internal.h
+++ b/drivers/staging/lustre/lustre/lov/lov_internal.h
@@ -169,20 +169,22 @@ struct lov_request_set {
 	(char *)((lv)->lov_tgts[index]->ltd_uuid.uuid)
 
 /* lov_merge.c */
-int lov_merge_lvb_kms(struct lov_stripe_md *lsm,
+int lov_merge_lvb_kms(struct lov_stripe_md *lsm, int index,
 		      struct ost_lvb *lvb, __u64 *kms_place);
 
 /* lov_offset.c */
-u64 lov_stripe_size(struct lov_stripe_md *lsm, u64 ost_size, int stripeno);
-int lov_stripe_offset(struct lov_stripe_md *lsm, u64 lov_off,
-		      int stripeno, u64 *u64);
-u64 lov_size_to_stripe(struct lov_stripe_md *lsm, u64 file_size, int stripeno);
-int lov_stripe_intersects(struct lov_stripe_md *lsm, int stripeno,
+u64 lov_stripe_size(struct lov_stripe_md *lsm, int index, u64 ost_size,
+		    int stripeno);
+int lov_stripe_offset(struct lov_stripe_md *lsm, int index, u64 lov_off,
+		      int stripeno, u64 *obd_off);
+u64 lov_size_to_stripe(struct lov_stripe_md *lsm, int index, u64 file_size,
+		       int stripeno);
+int lov_stripe_intersects(struct lov_stripe_md *lsm, int index, int stripeno,
 			  u64 start, u64 end,
 			  u64 *obd_start, u64 *obd_end);
-int lov_stripe_number(struct lov_stripe_md *lsm, u64 lov_off);
-pgoff_t lov_stripe_pgoff(struct lov_stripe_md *lsm, pgoff_t stripe_index,
-			 int stripe);
+int lov_stripe_number(struct lov_stripe_md *lsm, int index, u64 lov_off);
+pgoff_t lov_stripe_pgoff(struct lov_stripe_md *lsm, int index,
+			 pgoff_t stripe_index, int stripe);
 
 /* lov_request.c */
 int lov_prep_statfs_set(struct obd_device *obd, struct obd_info *oinfo,
diff --git a/drivers/staging/lustre/lustre/lov/lov_io.c b/drivers/staging/lustre/lustre/lov/lov_io.c
index 6dd5639..26d0043 100644
--- a/drivers/staging/lustre/lustre/lov/lov_io.c
+++ b/drivers/staging/lustre/lustre/lov/lov_io.c
@@ -85,7 +85,7 @@ static void lov_io_sub_inherit(struct cl_io *io, struct lov_io *lio,
 		if (cl_io_is_trunc(io)) {
 			loff_t new_size = parent->u.ci_setattr.sa_attr.lvb_size;
 
-			new_size = lov_size_to_stripe(lsm, new_size, stripe);
+			new_size = lov_size_to_stripe(lsm, 0, new_size, stripe);
 			io->u.ci_setattr.sa_attr.lvb_size = new_size;
 		}
 		break;
@@ -101,7 +101,7 @@ static void lov_io_sub_inherit(struct cl_io *io, struct lov_io *lio,
 		loff_t off = cl_offset(obj, parent->u.ci_fault.ft_index);
 
 		io->u.ci_fault = parent->u.ci_fault;
-		off = lov_size_to_stripe(lsm, off, stripe);
+		off = lov_size_to_stripe(lsm, 0, off, stripe);
 		io->u.ci_fault.ft_index = cl_index(obj, off);
 		break;
 	}
@@ -144,13 +144,14 @@ static int lov_io_sub_init(const struct lu_env *env, struct lov_io *lio,
 	struct cl_object  *sub_obj;
 	struct cl_io      *io  = lio->lis_cl.cis_io;
 	int stripe = sub->sub_subio_index;
+	int index = 0;
 	int rc;
 
 	LASSERT(!sub->sub_io);
 	LASSERT(!sub->sub_env);
 	LASSERT(sub->sub_subio_index < lio->lis_stripe_count);
 
-	if (unlikely(!lov_r0(lov)->lo_sub[stripe]))
+	if (unlikely(!lov_r0(lov, index)->lo_sub[stripe]))
 		return -EIO;
 
 	sub->sub_io_initialized = 0;
@@ -179,7 +180,7 @@ static int lov_io_sub_init(const struct lu_env *env, struct lov_io *lio,
 		}
 	}
 
-	sub_obj = lovsub2cl(lov_r0(lov)->lo_sub[stripe]);
+	sub_obj = lovsub2cl(lov_r0(lov, index)->lo_sub[stripe]);
 	sub_io = sub->sub_io;
 
 	sub_io->ci_obj = sub_obj;
@@ -375,14 +376,15 @@ static int lov_io_iter_init(const struct lu_env *env,
 	u64 end;
 	int stripe;
 	int rc = 0;
+	int index = 0;
 
 	endpos = lov_offset_mod(lio->lis_endpos, -1);
 	for (stripe = 0; stripe < lio->lis_stripe_count; stripe++) {
-		if (!lov_stripe_intersects(lsm, stripe, lio->lis_pos,
+		if (!lov_stripe_intersects(lsm, index, stripe, lio->lis_pos,
 					   endpos, &start, &end))
 			continue;
 
-		if (unlikely(!lov_r0(lio->lis_object)->lo_sub[stripe])) {
+		if (unlikely(!lov_r0(lio->lis_object, index)->lo_sub[stripe])) {
 			if (ios->cis_io->ci_type == CIT_READ ||
 			    ios->cis_io->ci_type == CIT_WRITE ||
 			    ios->cis_io->ci_type == CIT_FAULT)
@@ -555,15 +557,18 @@ static int lov_io_read_ahead(const struct lu_env *env,
 	struct lov_io *lio = cl2lov_io(env, ios);
 	struct lov_object *loo = lio->lis_object;
 	struct cl_object *obj = lov2cl(loo);
-	struct lov_layout_raid0 *r0 = lov_r0(loo);
+	struct lov_layout_raid0 *r0;
 	unsigned int pps; /* pages per stripe */
 	struct lov_io_sub *sub;
 	pgoff_t ra_end;
-	loff_t suboff;
+	u64 suboff;
 	int stripe;
+	int index = 0;
 	int rc;
 
-	stripe = lov_stripe_number(loo->lo_lsm, cl_offset(obj, start));
+	stripe = lov_stripe_number(loo->lo_lsm, index, cl_offset(obj, start));
+
+	r0 = lov_r0(loo, index);
 	if (unlikely(!r0->lo_sub[stripe]))
 		return -EIO;
 
@@ -571,7 +576,7 @@ static int lov_io_read_ahead(const struct lu_env *env,
 	if (IS_ERR(sub))
 		return PTR_ERR(sub);
 
-	lov_stripe_offset(loo->lo_lsm, cl_offset(obj, start), stripe, &suboff);
+	lov_stripe_offset(loo->lo_lsm, index, cl_offset(obj, start), stripe, &suboff);
 	rc = cl_io_read_ahead(sub->sub_env, sub->sub_io,
 			      cl_index(lovsub2cl(r0->lo_sub[stripe]), suboff),
 			      ra);
@@ -593,7 +598,7 @@ static int lov_io_read_ahead(const struct lu_env *env,
 	/* cra_end is stripe level, convert it into file level */
 	ra_end = ra->cra_end;
 	if (ra_end != CL_PAGE_EOF)
-		ra_end = lov_stripe_pgoff(loo->lo_lsm, ra_end, stripe);
+		ra_end = lov_stripe_pgoff(loo->lo_lsm, index, ra_end, stripe);
 
 	pps = loo->lo_lsm->lsm_entries[0]->lsme_stripe_size >> PAGE_SHIFT;
 
diff --git a/drivers/staging/lustre/lustre/lov/lov_lock.c b/drivers/staging/lustre/lustre/lov/lov_lock.c
index 4340063..36c9eb7 100644
--- a/drivers/staging/lustre/lustre/lov/lov_lock.c
+++ b/drivers/staging/lustre/lustre/lov/lov_lock.c
@@ -114,7 +114,11 @@ static struct lov_lock *lov_lock_sub_init(const struct lu_env *env,
 					  const struct cl_object *obj,
 					  struct cl_lock *lock)
 {
+	struct lov_object *loo = cl2lov(obj);
+	struct lov_layout_raid0 *r0;
+	struct lov_lock	*lovlck;
 	int result = 0;
+	int index = 0;
 	int i;
 	int nr;
 	u64 start;
@@ -122,10 +126,6 @@ static struct lov_lock *lov_lock_sub_init(const struct lu_env *env,
 	u64 file_start;
 	u64 file_end;
 
-	struct lov_object       *loo    = cl2lov(obj);
-	struct lov_layout_raid0 *r0     = lov_r0(loo);
-	struct lov_lock		*lovlck;
-
 	CDEBUG(D_INODE, "%p: lock/io FID " DFID "/" DFID ", lock/io clobj %p/%p\n",
 	       loo, PFID(lu_object_fid(lov2lu(loo))),
 	       PFID(lu_object_fid(&obj->co_lu)),
@@ -134,13 +134,14 @@ static struct lov_lock *lov_lock_sub_init(const struct lu_env *env,
 	file_start = cl_offset(lov2cl(loo), lock->cll_descr.cld_start);
 	file_end   = cl_offset(lov2cl(loo), lock->cll_descr.cld_end + 1) - 1;
 
+	r0 = lov_r0(loo, index);
 	for (i = 0, nr = 0; i < r0->lo_nr; i++) {
 		/*
 		 * XXX for wide striping smarter algorithm is desirable,
 		 * breaking out of the loop, early.
 		 */
 		if (likely(r0->lo_sub[i]) && /* spare layout */
-		    lov_stripe_intersects(loo->lo_lsm, i,
+		    lov_stripe_intersects(loo->lo_lsm, index, i,
 					  file_start, file_end, &start, &end))
 			nr++;
 	}
@@ -153,7 +154,7 @@ static struct lov_lock *lov_lock_sub_init(const struct lu_env *env,
 	lovlck->lls_nr = nr;
 	for (i = 0, nr = 0; i < r0->lo_nr; ++i) {
 		if (likely(r0->lo_sub[i]) &&
-		    lov_stripe_intersects(loo->lo_lsm, i,
+		    lov_stripe_intersects(loo->lo_lsm, index, i,
 					  file_start, file_end, &start, &end)) {
 			struct lov_lock_sub *lls = &lovlck->lls_sub[nr];
 			struct cl_lock_descr *descr;
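lov_lock_sub_init() makes two passes over the stripes of the selected entry. The first pass counts the stripes that exist and intersect the lock extent, which sizes the lls_sub[] allocation. The second pass fills the slots in. The sketch below shows the same shape with hypothetical helpers. toy_stripe_intersects() is a brute-force stand-in for lov_stripe_intersects(), not its real algorithm. A missing sub-object (a sparse layout) is modelled as present[i] == false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical: does RAID0 stripe s own any byte of [start, end]?
 * Walks stripe units from start. Once count units have been checked,
 * every stripe has been seen. */
static bool toy_stripe_intersects(uint64_t ssize, unsigned int count,
				  unsigned int s, uint64_t start, uint64_t end)
{
	uint64_t unit = start / ssize;
	uint64_t last = end / ssize;
	unsigned int seen;

	for (seen = 0; seen < count && unit <= last; seen++, unit++)
		if (unit % count == s)
			return true;
	return false;
}

/* Hypothetical two-pass collector: count, allocate exactly, then fill. */
static unsigned int *toy_collect(uint64_t ssize, unsigned int count,
				 const bool *present, uint64_t start,
				 uint64_t end, unsigned int *nr_out)
{
	unsigned int i, nr = 0;
	unsigned int *subs;

	for (i = 0; i < count; i++)
		if (present[i] &&
		    toy_stripe_intersects(ssize, count, i, start, end))
			nr++;

	subs = calloc(nr ? nr : 1, sizeof(*subs));
	if (!subs)
		return NULL;

	for (i = 0, nr = 0; i < count; i++)
		if (present[i] &&
		    toy_stripe_intersects(ssize, count, i, start, end))
			subs[nr++] = i;

	*nr_out = nr;
	return subs;
}
```

Counting first avoids growing the array inside the loop. The cost is that the intersection test runs twice per stripe, which the XXX comment in the hunk already flags as a concern for wide striping.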
diff --git a/drivers/staging/lustre/lustre/lov/lov_merge.c b/drivers/staging/lustre/lustre/lov/lov_merge.c
index 10b8448..020795f 100644
--- a/drivers/staging/lustre/lustre/lov/lov_merge.c
+++ b/drivers/staging/lustre/lustre/lov/lov_merge.c
@@ -41,7 +41,7 @@
  * initializes the current atime, mtime, ctime to avoid regressing a more
  * uptodate time on the local client.
  */
-int lov_merge_lvb_kms(struct lov_stripe_md *lsm,
+int lov_merge_lvb_kms(struct lov_stripe_md *lsm, int index,
 		      struct ost_lvb *lvb, __u64 *kms_place)
 {
 	__u64 size = 0;
@@ -69,14 +69,14 @@ int lov_merge_lvb_kms(struct lov_stripe_md *lsm,
 		}
 
 		tmpsize = loi->loi_kms;
-		lov_size = lov_stripe_size(lsm, tmpsize, i);
+		lov_size = lov_stripe_size(lsm, index, tmpsize, i);
 		if (lov_size > kms)
 			kms = lov_size;
 
 		if (loi->loi_lvb.lvb_size > tmpsize)
 			tmpsize = loi->loi_lvb.lvb_size;
 
-		lov_size = lov_stripe_size(lsm, tmpsize, i);
+		lov_size = lov_stripe_size(lsm, index, tmpsize, i);
 		if (lov_size > size)
 			size = lov_size;
 		/* merge blocks, mtime, atime */
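lov_merge_lvb_kms() converts each stripe's object size into a file size with lov_stripe_size() and keeps the maximum. The new index argument selects which layout entry supplies the stripe size and count. The sketch below uses a hypothetical struct toy_entry and follows the usual RAID0 mapping, in which a stripe's object holds every stripe_count-th stripe unit of the file. It ignores any component start offset. This patch always passes index 0, so that case does not arise here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-component geometry, standing in for
 * lsm->lsm_entries[index]->lsme_stripe_size / lsme_stripe_count. */
struct toy_entry {
	uint64_t stripe_size;
	unsigned int stripe_count;
};

/* File size implied by an object of ost_size bytes on stripe stripeno. */
static uint64_t toy_stripe_size(const struct toy_entry *e, int index,
				uint64_t ost_size, int stripeno)
{
	uint64_t ssize = e[index].stripe_size;
	uint64_t swidth = ssize * e[index].stripe_count;
	uint64_t full, rem;

	if (ost_size == 0)
		return 0;
	full = ost_size / ssize;	/* complete stripe units on this object */
	rem = ost_size % ssize;		/* bytes in a trailing partial unit */
	if (rem)
		return full * swidth + (uint64_t)stripeno * ssize + rem;
	return (full - 1) * swidth + ((uint64_t)stripeno + 1) * ssize;
}

/* Take the largest implied file size over the stripes of entry index. */
static uint64_t toy_merge_size(const struct toy_entry *e, int index,
			       const uint64_t *ost_sizes)
{
	uint64_t size = 0;
	unsigned int i;

	for (i = 0; i < e[index].stripe_count; i++) {
		uint64_t s = toy_stripe_size(e, index, ost_sizes[i], (int)i);

		if (s > size)
			size = s;
	}
	return size;
}
```

For example, with 100-byte stripe units over 3 stripes, a 150-byte object on stripe 0 covers file bytes 0-99 and 300-349, so it implies a 350-byte file.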
diff --git a/drivers/staging/lustre/lustre/lov/lov_object.c b/drivers/staging/lustre/lustre/lov/lov_object.c
index 1ebaa23..de5e2a2 100644
--- a/drivers/staging/lustre/lustre/lov/lov_object.c
+++ b/drivers/staging/lustre/lustre/lov/lov_object.c
@@ -221,24 +221,13 @@ static int lov_init_raid0(const struct lu_env *env, struct lov_device *dev,
 			  const struct cl_object_conf *conf,
 			  struct lov_layout_raid0 *r0)
 {
+	struct cl_object *stripe;
+	struct lov_thread_info *lti = lov_env_info(env);
+	struct cl_object_conf *subconf = &lti->lti_stripe_conf;
+	struct lu_fid *ofid = &lti->lti_fid;
 	int result;
-	int i;
-
-	struct cl_object	*stripe;
-	struct lov_thread_info  *lti     = lov_env_info(env);
-	struct cl_object_conf   *subconf = &lti->lti_stripe_conf;
-	struct lu_fid	   *ofid    = &lti->lti_fid;
 	int psz;
-
-	if (lsm->lsm_magic != LOV_MAGIC_V1 && lsm->lsm_magic != LOV_MAGIC_V3) {
-		dump_lsm(D_ERROR, lsm);
-		LASSERTF(0, "magic mismatch, expected %d/%d, actual %d.\n",
-			 LOV_MAGIC_V1, LOV_MAGIC_V3, lsm->lsm_magic);
-	}
-
-	LASSERT(!lov->lo_lsm);
-	lov->lo_lsm = lsm_addref(lsm);
-	lov->lo_layout_invalid = true;
+	int i;
 
 	spin_lock_init(&r0->lo_sub_lock);
 	r0->lo_nr  = lsm->lsm_entries[0]->lsme_stripe_count;
@@ -305,10 +294,7 @@ static int lov_init_raid0(const struct lu_env *env, struct lov_device *dev,
 		}
 	}
 	if (result == 0)
-		cl_object_header(&lov->lo_cl)->coh_page_bufsize += psz;
-	else
-		result = -ENOMEM;
-
+		result = psz;
 out:
 	return result;
 }
@@ -319,9 +305,37 @@ static int lov_init_composite(const struct lu_env *env, struct lov_device *dev,
 			      union lov_layout_state *state)
 {
 	struct lov_layout_composite *comp = &state->composite;
-	struct lov_layout_entry *le = &comp->lo_entries;
+	unsigned int entry_count = 1;
+	unsigned int psz = 0;
+	int result = 0;
+	int i;
 
-	return lov_init_raid0(env, dev, lov, lsm, conf, &le->lle_raid0);
+	LASSERT(!lov->lo_lsm);
+	lov->lo_lsm = lsm_addref(lsm);
+	lov->lo_layout_invalid = true;
+
+	comp->lo_entry_count = entry_count;
+
+	comp->lo_entries = kcalloc(entry_count, sizeof(*comp->lo_entries),
+				   GFP_KERNEL);
+	if (!comp->lo_entries)
+		return -ENOMEM;
+
+	for (i = 0; i < entry_count; i++) {
+		struct lov_layout_entry *le = &comp->lo_entries[i];
+
+		result = lov_init_raid0(env, dev, lov, lsm, conf,
+					&le->lle_raid0);
+		if (result < 0)
+			break;
+
+		LASSERT(ergo(psz > 0, psz == result));
+		psz = result;
+	}
+	if (psz > 0)
+		cl_object_header(&lov->lo_cl)->coh_page_bufsize += psz;
+
+	return result > 0 ? 0 : result;
 }
 
 static int lov_init_released(const struct lu_env *env, struct lov_device *dev,
@@ -454,7 +468,7 @@ static int lov_delete_composite(const struct lu_env *env,
 				union lov_layout_state *state)
 {
 	struct lov_layout_composite *comp = &state->composite;
-	struct lov_layout_entry *entry = &comp->lo_entries;
+	struct lov_layout_entry *entry = &comp->lo_entries[0];
 
 	dump_lsm(D_INODE, lov->lo_lsm);
 
@@ -484,9 +498,15 @@ static void lov_fini_composite(const struct lu_env *env,
 			       union lov_layout_state *state)
 {
 	struct lov_layout_composite *comp = &state->composite;
-	struct lov_layout_entry *entry = &comp->lo_entries;
 
-	lov_fini_raid0(env, &entry->lle_raid0);
+	if (comp->lo_entries) {
+		struct lov_layout_entry *entry = &comp->lo_entries[0];
+
+		lov_fini_raid0(env, &entry->lle_raid0);
+
+		kvfree(comp->lo_entries);
+		comp->lo_entries = NULL;
+	}
 
 	dump_lsm(D_INODE, lov->lo_lsm);
 	lov_free_memmd(&lov->lo_lsm);
@@ -528,7 +548,7 @@ static int lov_print_composite(const struct lu_env *env, void *cookie,
 			       lu_printer_t p, const struct lu_object *o)
 {
 	struct lov_object *lov = lu2lov(o);
-	struct lov_layout_raid0	*r0 = lov_r0(lov);
+	struct lov_layout_raid0	*r0 = lov_r0(lov, 0);
 	struct lov_stripe_md *lsm = lov->lo_lsm;
 
 	(*p)(env, cookie, "stripes: %d, %s, lsm{%p 0x%08X %d %u %u}:\n",
@@ -600,7 +620,7 @@ static int lov_attr_get_raid0(const struct lu_env *env, struct lov_object *lov,
 	 * sub-object attributes.
 	 */
 	lov_stripe_lock(lsm);
-	result = lov_merge_lvb_kms(lsm, lvb, &kms);
+	result = lov_merge_lvb_kms(lsm, 0, lvb, &kms);
 	lov_stripe_unlock(lsm);
 	if (result)
 		return result;
@@ -617,7 +637,7 @@ static int lov_attr_get_composite(const struct lu_env *env,
 				  struct cl_attr *attr)
 {
 	struct lov_object *lov = cl2lov(obj);
-	struct lov_layout_raid0 *r0 = lov_r0(lov);
+	struct lov_layout_raid0 *r0 = lov_r0(lov, 0);
 	struct cl_attr *lov_attr = &r0->lo_attr;
 	int result;
 
@@ -1051,33 +1071,31 @@ int lov_lock_init(const struct lu_env *env, struct cl_object *obj,
  *
  * \retval last_stripe		return the last stripe of the mapping
  */
-static int fiemap_calc_last_stripe(struct lov_stripe_md *lsm,
+static int fiemap_calc_last_stripe(struct lov_stripe_md *lsm, int index,
 				   u64 fm_start, u64 fm_end,
 				   int start_stripe, int *stripe_count)
 {
+	struct lov_stripe_md_entry *lsme = lsm->lsm_entries[index];
 	int last_stripe;
 	u64 obd_start;
 	u64 obd_end;
 	int i, j;
 
-	if (fm_end - fm_start > lsm->lsm_entries[0]->lsme_stripe_size *
-				lsm->lsm_entries[0]->lsme_stripe_count) {
-		last_stripe = (start_stripe < 1 ?
-			       lsm->lsm_entries[0]->lsme_stripe_count - 1 :
-			       start_stripe - 1);
-		*stripe_count = lsm->lsm_entries[0]->lsme_stripe_count;
+	if (fm_end - fm_start >
+	    lsme->lsme_stripe_size * lsme->lsme_stripe_count) {
+		last_stripe = (start_stripe < 1 ? lsme->lsme_stripe_count - 1 :
+						  start_stripe - 1);
+		*stripe_count = lsme->lsme_stripe_count;
 	} else {
-		for (j = 0, i = start_stripe;
-		     j < lsm->lsm_entries[0]->lsme_stripe_count;
-		     i = (i + 1) % lsm->lsm_entries[0]->lsme_stripe_count,
+		for (j = 0, i = start_stripe; j < lsme->lsme_stripe_count;
+		     i = (i + 1) % lsme->lsme_stripe_count,
 		     j++) {
-			if (lov_stripe_intersects(lsm, i, fm_start, fm_end,
+			if (lov_stripe_intersects(lsm, index, i, fm_start, fm_end,
 						  &obd_start, &obd_end) == 0)
 				break;
 		}
 		*stripe_count = j;
-		last_stripe = (start_stripe + j - 1) %
-			      lsm->lsm_entries[0]->lsme_stripe_count;
+		last_stripe = (start_stripe + j - 1) % lsme->lsme_stripe_count;
 	}
 
 	return last_stripe;
@@ -1132,9 +1150,10 @@ static void fiemap_prepare_and_copy_exts(struct fiemap *fiemap,
  */
 static u64 fiemap_calc_fm_end_offset(struct fiemap *fiemap,
 				     struct lov_stripe_md *lsm,
-				     u64 fm_start, u64 fm_end,
+				     int index, u64 fm_start, u64 fm_end,
 				     int *start_stripe)
 {
+	struct lov_stripe_md_entry *lsme = lsm->lsm_entries[index];
 	u64 local_end = fiemap->fm_extents[0].fe_logical;
 	u64 lun_start, lun_end;
 	u64 fm_end_offset;
@@ -1145,8 +1164,8 @@ static u64 fiemap_calc_fm_end_offset(struct fiemap *fiemap,
 		return 0;
 
 	/* Find out stripe_no from ost_index saved in the fe_device */
-	for (i = 0; i < lsm->lsm_entries[0]->lsme_stripe_count; i++) {
-		struct lov_oinfo *oinfo = lsm->lsm_entries[0]->lsme_oinfo[i];
+	for (i = 0; i < lsme->lsme_stripe_count; i++) {
+		struct lov_oinfo *oinfo = lsme->lsme_oinfo[i];
 
 		if (lov_oinfo_is_dummy(oinfo))
 			continue;
@@ -1164,7 +1183,7 @@ static u64 fiemap_calc_fm_end_offset(struct fiemap *fiemap,
 	 * If we have finished mapping on previous device, shift logical
 	 * offset to start of next device
 	 */
-	if (lov_stripe_intersects(lsm, stripe_no, fm_start, fm_end,
+	if (lov_stripe_intersects(lsm, index, stripe_no, fm_start, fm_end,
 				  &lun_start, &lun_end) != 0 &&
 	    local_end < lun_end) {
 		fm_end_offset = local_end;
@@ -1174,8 +1193,7 @@ static u64 fiemap_calc_fm_end_offset(struct fiemap *fiemap,
 		 * calculate offset in next stripe.
 		 */
 		fm_end_offset = 0;
-		*start_stripe = (stripe_no + 1) %
-				lsm->lsm_entries[0]->lsme_stripe_count;
+		*start_stripe = (stripe_no + 1) % lsme->lsme_stripe_count;
 	}
 
 	return fm_end_offset;
@@ -1197,11 +1215,11 @@ struct fiemap_state {
 };
 
 static int fiemap_for_stripe(const struct lu_env *env, struct cl_object *obj,
-			     struct lov_stripe_md *lsm,
-			     struct fiemap *fiemap, size_t *buflen,
-			     struct ll_fiemap_info_key *fmkey, int stripeno,
-			     struct fiemap_state *fs)
+			     struct lov_stripe_md *lsm, struct fiemap *fiemap,
+			     size_t *buflen, struct ll_fiemap_info_key *fmkey,
+			     int index, int stripeno, struct fiemap_state *fs)
 {
+	struct lov_stripe_md_entry *lsme = lsm->lsm_entries[index];
 	struct cl_object *subobj;
 	struct lov_obd *lov = lu2lov_dev(obj->co_lu.lo_dev)->ld_lov;
 	struct fiemap_extent *fm_ext = &fs->fs_fm->fm_extents[0];
@@ -1220,11 +1238,12 @@ static int fiemap_for_stripe(const struct lu_env *env, struct cl_object *obj,
 
 	fs->fs_device_done = false;
 	/* Find out range of mapping on this stripe */
-	if ((lov_stripe_intersects(lsm, stripeno, fs->fs_start, fs->fs_end,
+	if ((lov_stripe_intersects(lsm, index, stripeno,
+				   fs->fs_start, fs->fs_end,
 				   &lun_start, &obd_object_end)) == 0)
 		return 0;
 
-	if (lov_oinfo_is_dummy(lsm->lsm_entries[0]->lsme_oinfo[stripeno]))
+	if (lov_oinfo_is_dummy(lsme->lsme_oinfo[stripeno]))
 		return -EIO;
 
 	/* If this is a continuation FIEMAP call and we are on
@@ -1239,7 +1258,8 @@ static int fiemap_for_stripe(const struct lu_env *env, struct cl_object *obj,
 		/* Handle fs->fs_start + fs->fs_length overflow */
 		if (fs->fs_start + fs->fs_length < fs->fs_start)
 			fs->fs_length = ~0ULL - fs->fs_start;
-		lun_end = lov_size_to_stripe(lsm, fs->fs_start + fs->fs_length,
+		lun_end = lov_size_to_stripe(lsm, index,
+					     fs->fs_start + fs->fs_length,
 					     stripeno);
 	}
 
@@ -1274,7 +1294,7 @@ static int fiemap_for_stripe(const struct lu_env *env, struct cl_object *obj,
 		fs->fs_fm->fm_mapped_extents = 0;
 		fs->fs_fm->fm_flags = fiemap->fm_flags;
 
-		ost_index = lsm->lsm_entries[0]->lsme_oinfo[stripeno]->loi_ost_idx;
+		ost_index = lsme->lsme_oinfo[stripeno]->loi_ost_idx;
 
 		if (ost_index < 0 || ost_index >= lov->desc.ld_tgt_count) {
 			rc = -EINVAL;
@@ -1345,8 +1365,9 @@ static int fiemap_for_stripe(const struct lu_env *env, struct cl_object *obj,
 		 */
 		if (fm_ext[ext_count - 1].fe_flags & FIEMAP_EXTENT_LAST)
 			fm_ext[ext_count - 1].fe_flags &= ~FIEMAP_EXTENT_LAST;
-		if (lov_stripe_size(lsm, fm_ext[ext_count - 1].fe_logical +
-					 fm_ext[ext_count - 1].fe_length,
+		if (lov_stripe_size(lsm, index,
+				    fm_ext[ext_count - 1].fe_logical +
+				    fm_ext[ext_count - 1].fe_length,
 				    stripeno) >= fmkey->lfik_oa.o_size) {
 			ost_eof = true;
 			fs->fs_device_done = true;
@@ -1391,6 +1412,7 @@ static int lov_object_fiemap(const struct lu_env *env, struct cl_object *obj,
 	struct fiemap *fm_local = NULL;
 	struct lov_stripe_md *lsm;
 	int rc = 0;
+	int entry = 0;
 	int cur_stripe;
 	int stripe_count;
 	struct fiemap_state fs = { NULL };
@@ -1450,7 +1472,7 @@ static int lov_object_fiemap(const struct lu_env *env, struct cl_object *obj,
 		goto out;
 	}
 	/* Calculate start stripe, last stripe and length of mapping */
-	fs.fs_start_stripe = lov_stripe_number(lsm, fs.fs_start);
+	fs.fs_start_stripe = lov_stripe_number(lsm, 0, fs.fs_start);
 	fs.fs_end = (fs.fs_length == ~0ULL) ? fmkey->lfik_oa.o_size :
 					      fs.fs_start + fs.fs_length - 1;
 	/* If fs_length != ~0ULL but fs_start+fs_length-1 exceeds file size */
@@ -1459,11 +1481,12 @@ static int lov_object_fiemap(const struct lu_env *env, struct cl_object *obj,
 		fs.fs_length = fs.fs_end - fs.fs_start;
 	}
 
-	fs.fs_last_stripe = fiemap_calc_last_stripe(lsm, fs.fs_start, fs.fs_end,
+	fs.fs_last_stripe = fiemap_calc_last_stripe(lsm, entry,
+						    fs.fs_start, fs.fs_end,
 						    fs.fs_start_stripe,
 						    &stripe_count);
-	fs.fs_end_offset = fiemap_calc_fm_end_offset(fiemap, lsm, fs.fs_start,
-						     fs.fs_end,
+	fs.fs_end_offset = fiemap_calc_fm_end_offset(fiemap, lsm, entry,
+						     fs.fs_start, fs.fs_end,
 						     &fs.fs_start_stripe);
 	if (fs.fs_end_offset == -EINVAL) {
 		rc = -EINVAL;
@@ -1489,8 +1512,8 @@ static int lov_object_fiemap(const struct lu_env *env, struct cl_object *obj,
 	     --stripe_count,
 	     cur_stripe = (cur_stripe + 1) %
 			  lsm->lsm_entries[0]->lsme_stripe_count) {
-		rc = fiemap_for_stripe(env, obj, lsm, fiemap, buflen, fmkey,
-				       cur_stripe, &fs);
+		rc = fiemap_for_stripe(env, obj, lsm, fiemap, buflen,
+				       fmkey, 0, cur_stripe, &fs);
 		if (rc < 0)
 			goto out;
 		if (fs.fs_finish)
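After this patch, lov_init_raid0() no longer updates coh_page_bufsize itself. It returns the page-slice size on success (or a negative errno), and lov_init_composite() folds that back into the usual 0-on-success convention. The sketch below models this convention. The hypothetical entry_results[] array plays the part of the per-entry lov_init_raid0() return values, and assert() stands in for the LASSERT(ergo(...)) consistency check.

```c
#include <assert.h>
#include <errno.h>

/* entry_results[i] >= 0: page-slice size reported for entry i;
 * entry_results[i] < 0: errno-style failure (hypothetical inputs). */
static int toy_init_composite(const int *entry_results, unsigned int count,
			      unsigned int *page_bufsize)
{
	unsigned int psz = 0;
	int result = 0;
	unsigned int i;

	for (i = 0; i < count; i++) {
		result = entry_results[i];
		if (result < 0)
			break;
		/* every entry must report the same slice size */
		assert(psz == 0 || psz == (unsigned int)result);
		psz = (unsigned int)result;
	}
	if (psz > 0)
		*page_bufsize += psz;

	/* a positive size means success for the caller */
	return result > 0 ? 0 : result;
}
```

As in the patch, an error on a later entry still leaves the slice size of earlier successful entries added to the buffer size. With a single entry this cannot happen.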
diff --git a/drivers/staging/lustre/lustre/lov/lov_offset.c b/drivers/staging/lustre/lustre/lov/lov_offset.c
index 19a44d3..d817aa5 100644
--- a/drivers/staging/lustre/lustre/lov/lov_offset.c
+++ b/drivers/staging/lustre/lustre/lov/lov_offset.c
@@ -38,9 +38,10 @@
 #include "lov_internal.h"
 
 /* compute object size given "stripeno" and the ost size */
-u64 lov_stripe_size(struct lov_stripe_md *lsm, u64 ost_size, int stripeno)
+u64 lov_stripe_size(struct lov_stripe_md *lsm, int index, u64 ost_size,
+		    int stripeno)
 {
-	unsigned long ssize = lsm->lsm_entries[0]->lsme_stripe_size;
+	unsigned long ssize = lsm->lsm_entries[index]->lsme_stripe_size;
 	unsigned long stripe_size;
 	u64 swidth;
 	u64 lov_size;
@@ -64,12 +65,13 @@ u64 lov_stripe_size(struct lov_stripe_md *lsm, u64 ost_size, int stripeno)
 /**
  * Compute file level page index by stripe level page offset
  */
-pgoff_t lov_stripe_pgoff(struct lov_stripe_md *lsm, pgoff_t stripe_index,
-			 int stripe)
+pgoff_t lov_stripe_pgoff(struct lov_stripe_md *lsm, int index,
+			 pgoff_t stripe_index, int stripe)
 {
 	loff_t offset;
 
-	offset = lov_stripe_size(lsm, (stripe_index << PAGE_SHIFT) + 1, stripe);
+	offset = lov_stripe_size(lsm, index, (stripe_index << PAGE_SHIFT) + 1,
+				 stripe);
 	return offset >> PAGE_SHIFT;
 }
 
@@ -122,10 +124,10 @@ pgoff_t lov_stripe_pgoff(struct lov_stripe_md *lsm, pgoff_t stripe_index,
  * falls in the stripe and no shifting was done; > 0 when the offset
  * was outside the stripe and was pulled back to its final byte.
  */
-int lov_stripe_offset(struct lov_stripe_md *lsm, u64 lov_off,
+int lov_stripe_offset(struct lov_stripe_md *lsm, int index, u64 lov_off,
 		      int stripeno, u64 *obdoff)
 {
-	unsigned long ssize  = lsm->lsm_entries[0]->lsme_stripe_size;
+	unsigned long ssize  = lsm->lsm_entries[index]->lsme_stripe_size;
 	u64 stripe_off, this_stripe, swidth;
 	int magic = lsm->lsm_magic;
 	int ret = 0;
@@ -177,10 +179,10 @@ int lov_stripe_offset(struct lov_stripe_md *lsm, u64 lov_off,
  * |    0    |     1     |     2     |    0    |     1     |     2     |
  * ---------------------------------------------------------------------
  */
-u64 lov_size_to_stripe(struct lov_stripe_md *lsm, u64 file_size,
+u64 lov_size_to_stripe(struct lov_stripe_md *lsm, int index, u64 file_size,
 		       int stripeno)
 {
-	unsigned long ssize  = lsm->lsm_entries[0]->lsme_stripe_size;
+	unsigned long ssize  = lsm->lsm_entries[index]->lsme_stripe_size;
 	u64 stripe_off, this_stripe, swidth;
 	int magic = lsm->lsm_magic;
 
@@ -218,13 +220,13 @@ u64 lov_size_to_stripe(struct lov_stripe_md *lsm, u64 file_size,
  * that is contained within the lov extent.  this returns true if the given
  * stripe does intersect with the lov extent.
  */
-int lov_stripe_intersects(struct lov_stripe_md *lsm, int stripeno,
+int lov_stripe_intersects(struct lov_stripe_md *lsm, int index, int stripeno,
 			  u64 start, u64 end, u64 *obd_start, u64 *obd_end)
 {
 	int start_side, end_side;
 
-	start_side = lov_stripe_offset(lsm, start, stripeno, obd_start);
-	end_side = lov_stripe_offset(lsm, end, stripeno, obd_end);
+	start_side = lov_stripe_offset(lsm, index, start, stripeno, obd_start);
+	end_side = lov_stripe_offset(lsm, index, end, stripeno, obd_end);
 
 	CDEBUG(D_INODE, "[%llu->%llu] -> [(%d) %llu->%llu (%d)]\n",
 	       start, end, start_side, *obd_start, *obd_end, end_side);
@@ -252,9 +254,9 @@ int lov_stripe_intersects(struct lov_stripe_md *lsm, int stripeno,
 }
 
 /* compute which stripe number "lov_off" will be written into */
-int lov_stripe_number(struct lov_stripe_md *lsm, u64 lov_off)
+int lov_stripe_number(struct lov_stripe_md *lsm, int index, u64 lov_off)
 {
-	unsigned long ssize  = lsm->lsm_entries[0]->lsme_stripe_size;
+	unsigned long ssize  = lsm->lsm_entries[index]->lsme_stripe_size;
 	u64 stripe_off, swidth;
 	int magic = lsm->lsm_magic;
 
diff --git a/drivers/staging/lustre/lustre/lov/lov_page.c b/drivers/staging/lustre/lustre/lov/lov_page.c
index d94d003..ad34fc3 100644
--- a/drivers/staging/lustre/lustre/lov/lov_page.c
+++ b/drivers/staging/lustre/lustre/lov/lov_page.c
@@ -67,21 +67,24 @@ int lov_page_init_composite(const struct lu_env *env, struct cl_object *obj,
 			    struct cl_page *page, pgoff_t index)
 {
 	struct lov_object *loo = cl2lov(obj);
-	struct lov_layout_raid0 *r0 = lov_r0(loo);
 	struct lov_io     *lio = lov_env_io(env);
+	struct lov_layout_raid0 *r0;
 	struct cl_object  *subobj;
 	struct cl_object  *o;
 	struct lov_io_sub *sub;
 	struct lov_page   *lpg = cl_object_page_slice(obj, page);
-	loff_t	     offset;
+	u64 offset;
 	u64	    suboff;
 	int		stripe;
+	int entry = 0;
 	int		rc;
 
 	offset = cl_offset(obj, index);
-	stripe = lov_stripe_number(loo->lo_lsm, offset);
+
+	r0 = lov_r0(loo, entry);
+	stripe = lov_stripe_number(loo->lo_lsm, entry, offset);
 	LASSERT(stripe < r0->lo_nr);
-	rc = lov_stripe_offset(loo->lo_lsm, offset, stripe, &suboff);
+	rc = lov_stripe_offset(loo->lo_lsm, entry, offset, stripe, &suboff);
 	LASSERT(rc == 0);
 
 	lpg->lps_index = stripe;
diff --git a/drivers/staging/lustre/lustre/lov/lovsub_object.c b/drivers/staging/lustre/lustre/lov/lovsub_object.c
index d3e9537..cd7806b 100644
--- a/drivers/staging/lustre/lustre/lov/lovsub_object.c
+++ b/drivers/staging/lustre/lustre/lov/lovsub_object.c
@@ -79,8 +79,9 @@ static void lovsub_object_free(const struct lu_env *env, struct lu_object *obj)
 	 * object handling in lu_object_find.
 	 */
 	if (lov) {
+		int index = 0;
 		int stripe = los->lso_index;
-		struct lov_layout_raid0 *r0 = lov_r0(lov);
+		struct lov_layout_raid0 *r0 = lov_r0(lov, index);
 
 		LASSERT(lov->lo_type == LLT_COMP);
 		LASSERT(r0->lo_sub[stripe] == los);
@@ -107,7 +108,7 @@ static int lovsub_attr_update(const struct lu_env *env, struct cl_object *obj,
 {
 	struct lov_object *lov = cl2lovsub(obj)->lso_super;
 
-	lov_r0(lov)->lo_attr_valid = 0;
+	lov_r0(lov, 0)->lo_attr_valid = 0;
 	return 0;
 }
 
-- 
1.8.3.1

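The patch above threads a layout-component index through lov_stripe_offset() and its siblings, so the stripe size and stripe width now come from lsm_entries[index] instead of always from lsm_entries[0]. Below is a minimal standalone sketch of that offset mapping. It makes a few simplifying assumptions: struct demo_entry and demo_stripe_offset() are hypothetical names rather than Lustre types, and plain / and % stand in for lov_do_div64(). The return convention follows the lov_stripe_offset() comment: 0 when the offset lies inside the stripe, and non-zero when it had to be shifted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct lov_stripe_md_entry: only the two
 * fields the offset math needs. */
struct demo_entry {
	uint32_t stripe_size;	/* bytes per stripe chunk */
	uint16_t stripe_count;	/* number of objects in this component */
};

/*
 * Map file offset @lov_off onto object @stripeno of component @index.
 * Returns 0 if the offset falls inside that stripe's chunk, -1 if it lies
 * before the chunk (moved to the chunk start) and 1 if it lies after it
 * (clamped to the chunk end).  @obdoff receives the object offset.
 */
static int demo_stripe_offset(const struct demo_entry *entries, int nr_entries,
			      int index, uint64_t lov_off, int stripeno,
			      uint64_t *obdoff)
{
	const struct demo_entry *e;
	uint64_t ssize, swidth, nr_widths, stripe_off, this_stripe;
	int ret = 0;

	assert(index >= 0 && index < nr_entries);
	e = &entries[index];
	ssize = e->stripe_size;
	swidth = ssize * e->stripe_count;	/* 64-bit multiply */

	nr_widths = lov_off / swidth;	/* full stripe widths before lov_off */
	stripe_off = lov_off % swidth;	/* position inside the current width */
	this_stripe = (uint64_t)stripeno * ssize;

	if (stripe_off < this_stripe) {
		stripe_off = 0;
		ret = -1;
	} else {
		stripe_off -= this_stripe;
		if (stripe_off >= ssize) {
			stripe_off = ssize;
			ret = 1;
		}
	}

	*obdoff = nr_widths * ssize + stripe_off;
	return ret;
}
```

With components {4 bytes x 3 stripes} and {8 bytes x 2 stripes}, file offset 5 lands at byte 1 of stripe 1 under component 0, but at byte 5 of stripe 0 under component 1. That difference is why every caller now has to pass the index explicitly.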
* [lustre-devel] [PATCH v2 12/33] lustre: lov: move around PFL code and cleanups
  2019-01-06 22:13 [lustre-devel] [PATCH v2 00/33] lustre: add PFL support James Simmons
                   ` (10 preceding siblings ...)
  2019-01-06 22:14 ` [lustre-devel] [PATCH v2 11/33] lustre: lov: change lo_entries to array James Simmons
@ 2019-01-06 22:14 ` James Simmons
  2019-01-06 22:14 ` [lustre-devel] [PATCH v2 13/33] lustre: lov: remove lsm_stripe_by_[index|offset]_plain James Simmons
                   ` (20 subsequent siblings)
  32 siblings, 0 replies; 47+ messages in thread
From: James Simmons @ 2019-01-06 22:14 UTC (permalink / raw)
  To: lustre-devel

From: Bobi Jam <bobijam@hotmail.com>

No code changes except for sub_subio_index, whose type changed
from u16 to unsigned int. Move some code around and apply some
style cleanups. This separates the real code changes from the
style updates.

Signed-off-by: Jinshan Xiong <jinshan.xiong@gmail.com>
Signed-off-by: Bobi Jam <bobijam@hotmail.com>
Signed-off-by: Niu Yawei <yawei.niu@intel.com>
WC-bug-id: https://jira.whamcloud.com/browse/LU-8998
Reviewed-on: https://review.whamcloud.com/24850
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/llite/file.c         |   2 +-
 .../staging/lustre/lustre/lov/lov_cl_internal.h    |  45 ++---
 drivers/staging/lustre/lustre/lov/lov_ea.c         |   5 +-
 drivers/staging/lustre/lustre/lov/lov_io.c         | 181 ++++++++++-----------
 drivers/staging/lustre/lustre/lov/lov_object.c     |  25 +--
 5 files changed, 130 insertions(+), 128 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/file.c b/drivers/staging/lustre/lustre/llite/file.c
index c018c5f..fae0111 100644
--- a/drivers/staging/lustre/lustre/llite/file.c
+++ b/drivers/staging/lustre/lustre/llite/file.c
@@ -3726,6 +3726,7 @@ static int ll_layout_lock_set(struct lustre_handle *lockh, enum ldlm_mode mode,
 	lock_res_and_lock(lock);
 	lvb_ready = ldlm_is_lvb_ready(lock);
 	unlock_res_and_lock(lock);
+
 	/* checking lvb_ready is racy but this is okay. The worst case is
 	 * that multi processes may configure the file on the same time.
 	 */
@@ -3755,7 +3756,6 @@ static int ll_layout_lock_set(struct lustre_handle *lockh, enum ldlm_mode mode,
 
 	/* refresh layout failed, need to wait */
 	wait_layout = rc == -EBUSY;
-
 out:
 	LDLM_LOCK_PUT(lock);
 	ldlm_lock_decref(lockh, mode);
diff --git a/drivers/staging/lustre/lustre/lov/lov_cl_internal.h b/drivers/staging/lustre/lustre/lov/lov_cl_internal.h
index 99bd1c1..ce32823 100644
--- a/drivers/staging/lustre/lustre/lov/lov_cl_internal.h
+++ b/drivers/staging/lustre/lustre/lov/lov_cl_internal.h
@@ -315,12 +315,6 @@ struct lov_thread_info {
  */
 struct lov_io_sub {
 	/**
-	 * environment's refcheck.
-	 *
-	 * \see cl_env_get()
-	 */
-	u16			 sub_refcheck;
-	/**
 	 * true, iff cl_io_init() was successfully executed against
 	 * lov_io_sub::sub_io.
 	 */
@@ -334,18 +328,24 @@ struct lov_io_sub {
 	 * Linkage into a list (hanging off lov_io::lis_active) of all
 	 * sub-io's active for the current IO iteration.
 	 */
-	struct list_head	 sub_linkage;
-	u16			sub_subio_index;
+	struct list_head	sub_linkage;
+	unsigned int		sub_subio_index;
 	/**
 	 * sub-io for a stripe. Ideally sub-io's can be stopped and resumed
 	 * independently, with lov acting as a scheduler to maximize overall
 	 * throughput.
 	 */
-	struct cl_io	*sub_io;
+	struct cl_io		*sub_io;
 	/**
 	 * environment, in which sub-io executes.
 	 */
-	struct lu_env *sub_env;
+	struct lu_env		*sub_env;
+	/**
+	 * environment's refcheck.
+	 *
+	 * \see cl_env_get()
+	 */
+	u16			sub_refcheck;
 };
 
 /**
@@ -367,37 +367,38 @@ struct lov_io {
 	 *
 	 * This is used only for CIT_READ and CIT_WRITE io's.
 	 */
-	loff_t	     lis_io_endpos;
+	loff_t			lis_io_endpos;
 
 	/**
 	 * starting position within a file, for the current io loop iteration
 	 * (stripe), used by ci_io_loop().
 	 */
-	u64	    lis_pos;
+	u64			lis_pos;
 	/**
 	 * end position with in a file, for the current stripe io. This is
 	 * exclusive (i.e., next offset after last byte affected by io).
 	 */
-	u64	    lis_endpos;
-
-	int		lis_stripe_count;
-	int		lis_active_subios;
+	u64			lis_endpos;
+	int			lis_stripe_count;
+	int			lis_active_subios;
 
 	/**
 	 * the index of ls_single_subio in ls_subios array
 	 */
-	int		lis_single_subio_index;
-	struct cl_io       lis_single_subio;
+	int			lis_single_subio_index;
+	struct cl_io		lis_single_subio;
+
+	/**
+	 * List of active sub-io's. Active sub-io's are under the range
+	 * of [lis_pos, lis_endpos).
+	 */
+	struct list_head	lis_active;
 
 	/**
 	 * size of ls_subios array, actually the highest stripe #
 	 */
 	int		lis_nr_subios;
 	struct lov_io_sub *lis_subs;
-	/**
-	 * List of active sub-io's.
-	 */
-	struct list_head	 lis_active;
 };
 
 struct lov_session {
diff --git a/drivers/staging/lustre/lustre/lov/lov_ea.c b/drivers/staging/lustre/lustre/lov/lov_ea.c
index 7d86318..f2a5a60 100644
--- a/drivers/staging/lustre/lustre/lov/lov_ea.c
+++ b/drivers/staging/lustre/lustre/lov/lov_ea.c
@@ -338,7 +338,7 @@ void lsm_free(struct lov_stripe_md *lsm)
 const static struct lsm_operations lsm_v1_ops = {
 	.lsm_stripe_by_index    = lsm_stripe_by_index_plain,
 	.lsm_stripe_by_offset   = lsm_stripe_by_offset_plain,
-	.lsm_unpackmd	   = lsm_unpackmd_v1,
+	.lsm_unpackmd		= lsm_unpackmd_v1,
 };
 
 static struct lov_stripe_md *
@@ -531,7 +531,8 @@ const struct lsm_operations *lsm_op_find(int magic)
 
 void dump_lsm(unsigned int level, const struct lov_stripe_md *lsm)
 {
-	CDEBUG(level, "lsm %p, objid " DOSTID ", maxbytes %#llx, magic 0x%08X, stripe_size %u, stripe_count %u, refc: %d, layout_gen %u, pool [" LOV_POOLNAMEF "]\n",
+	CDEBUG(level,
+	       "lsm %p, objid " DOSTID ", maxbytes %#llx, magic 0x%08X, stripe_size %u, stripe_count %u, refc: %d, layout_gen %u, pool [" LOV_POOLNAMEF "]\n",
 	       lsm, POSTID(&lsm->lsm_oi), lsm->lsm_maxbytes, lsm->lsm_magic,
 	       lsm->lsm_entries[0]->lsme_stripe_size,
 	       lsm->lsm_entries[0]->lsme_stripe_count,
diff --git a/drivers/staging/lustre/lustre/lov/lov_io.c b/drivers/staging/lustre/lustre/lov/lov_io.c
index 26d0043..ab97326 100644
--- a/drivers/staging/lustre/lustre/lov/lov_io.c
+++ b/drivers/staging/lustre/lustre/lov/lov_io.c
@@ -43,7 +43,6 @@
 /** \addtogroup lov
  *  @{
  */
-
 static void lov_io_sub_fini(const struct lu_env *env, struct lov_io *lio,
 			    struct lov_io_sub *sub)
 {
@@ -66,76 +65,6 @@ static void lov_io_sub_fini(const struct lu_env *env, struct lov_io *lio,
 	}
 }
 
-static void lov_io_sub_inherit(struct cl_io *io, struct lov_io *lio,
-			       int stripe, loff_t start, loff_t end)
-{
-	struct lov_stripe_md *lsm    = lio->lis_object->lo_lsm;
-	struct cl_io	 *parent = lio->lis_cl.cis_io;
-
-	switch (io->ci_type) {
-	case CIT_SETATTR: {
-		io->u.ci_setattr.sa_attr = parent->u.ci_setattr.sa_attr;
-		io->u.ci_setattr.sa_attr_flags =
-					parent->u.ci_setattr.sa_attr_flags;
-		io->u.ci_setattr.sa_avalid = parent->u.ci_setattr.sa_avalid;
-		io->u.ci_setattr.sa_xvalid = parent->u.ci_setattr.sa_xvalid;
-		io->u.ci_setattr.sa_stripe_index = stripe;
-		io->u.ci_setattr.sa_parent_fid =
-					parent->u.ci_setattr.sa_parent_fid;
-		if (cl_io_is_trunc(io)) {
-			loff_t new_size = parent->u.ci_setattr.sa_attr.lvb_size;
-
-			new_size = lov_size_to_stripe(lsm, 0, new_size, stripe);
-			io->u.ci_setattr.sa_attr.lvb_size = new_size;
-		}
-		break;
-	}
-	case CIT_DATA_VERSION: {
-		io->u.ci_data_version.dv_data_version = 0;
-		io->u.ci_data_version.dv_flags =
-			parent->u.ci_data_version.dv_flags;
-		break;
-	}
-	case CIT_FAULT: {
-		struct cl_object *obj = parent->ci_obj;
-		loff_t off = cl_offset(obj, parent->u.ci_fault.ft_index);
-
-		io->u.ci_fault = parent->u.ci_fault;
-		off = lov_size_to_stripe(lsm, 0, off, stripe);
-		io->u.ci_fault.ft_index = cl_index(obj, off);
-		break;
-	}
-	case CIT_FSYNC: {
-		io->u.ci_fsync.fi_start = start;
-		io->u.ci_fsync.fi_end = end;
-		io->u.ci_fsync.fi_fid = parent->u.ci_fsync.fi_fid;
-		io->u.ci_fsync.fi_mode = parent->u.ci_fsync.fi_mode;
-		break;
-	}
-	case CIT_READ:
-	case CIT_WRITE: {
-		io->u.ci_wr.wr_sync = cl_io_is_sync_write(parent);
-		if (cl_io_is_append(parent)) {
-			io->u.ci_wr.wr_append = 1;
-		} else {
-			io->u.ci_rw.crw_pos = start;
-			io->u.ci_rw.crw_count = end - start;
-		}
-		break;
-	}
-	case CIT_LADVISE: {
-		io->u.ci_ladvise.li_start = start;
-		io->u.ci_ladvise.li_end = end;
-		io->u.ci_ladvise.li_fid = parent->u.ci_ladvise.li_fid;
-		io->u.ci_ladvise.li_advice = parent->u.ci_ladvise.li_advice;
-		io->u.ci_ladvise.li_flags = parent->u.ci_ladvise.li_flags;
-		break;
-	}
-	default:
-		break;
-	}
-}
-
 static int lov_io_sub_init(const struct lu_env *env, struct lov_io *lio,
 			   struct lov_io_sub *sub)
 {
@@ -228,7 +157,6 @@ struct lov_io_sub *lov_sub_get(const struct lu_env *env,
  * Lov io operations.
  *
  */
-
 static int lov_page_index(const struct cl_page *page)
 {
 	const struct cl_page_slice *slice;
@@ -358,6 +286,76 @@ static void lov_io_fini(const struct lu_env *env, const struct cl_io_slice *ios)
 		wake_up_all(&lov->lo_waitq);
 }
 
+static void lov_io_sub_inherit(struct cl_io *io, struct lov_io *lio,
+			       int stripe, loff_t start, loff_t end)
+{
+	struct lov_stripe_md *lsm = lio->lis_object->lo_lsm;
+	struct cl_io *parent = lio->lis_cl.cis_io;
+
+	switch (io->ci_type) {
+	case CIT_SETATTR: {
+		io->u.ci_setattr.sa_attr = parent->u.ci_setattr.sa_attr;
+		io->u.ci_setattr.sa_attr_flags =
+			parent->u.ci_setattr.sa_attr_flags;
+		io->u.ci_setattr.sa_avalid = parent->u.ci_setattr.sa_avalid;
+		io->u.ci_setattr.sa_xvalid = parent->u.ci_setattr.sa_xvalid;
+		io->u.ci_setattr.sa_stripe_index = stripe;
+		io->u.ci_setattr.sa_parent_fid =
+			parent->u.ci_setattr.sa_parent_fid;
+		if (cl_io_is_trunc(io)) {
+			loff_t new_size = parent->u.ci_setattr.sa_attr.lvb_size;
+
+			new_size = lov_size_to_stripe(lsm, 0, new_size, stripe);
+			io->u.ci_setattr.sa_attr.lvb_size = new_size;
+		}
+		break;
+	}
+	case CIT_DATA_VERSION: {
+		io->u.ci_data_version.dv_data_version = 0;
+		io->u.ci_data_version.dv_flags =
+			parent->u.ci_data_version.dv_flags;
+		break;
+	}
+	case CIT_FAULT: {
+		struct cl_object *obj = parent->ci_obj;
+		loff_t off = cl_offset(obj, parent->u.ci_fault.ft_index);
+
+		io->u.ci_fault = parent->u.ci_fault;
+		off = lov_size_to_stripe(lsm, 0, off, stripe);
+		io->u.ci_fault.ft_index = cl_index(obj, off);
+		break;
+	}
+	case CIT_FSYNC: {
+		io->u.ci_fsync.fi_start = start;
+		io->u.ci_fsync.fi_end = end;
+		io->u.ci_fsync.fi_fid = parent->u.ci_fsync.fi_fid;
+		io->u.ci_fsync.fi_mode = parent->u.ci_fsync.fi_mode;
+		break;
+	}
+	case CIT_READ:
+	case CIT_WRITE: {
+		io->u.ci_wr.wr_sync = cl_io_is_sync_write(parent);
+		if (cl_io_is_append(parent)) {
+			io->u.ci_wr.wr_append = 1;
+		} else {
+			io->u.ci_rw.crw_pos = start;
+			io->u.ci_rw.crw_count = end - start;
+		}
+		break;
+	}
+	case CIT_LADVISE: {
+		io->u.ci_ladvise.li_start = start;
+		io->u.ci_ladvise.li_end = end;
+		io->u.ci_ladvise.li_fid = parent->u.ci_ladvise.li_fid;
+		io->u.ci_ladvise.li_advice = parent->u.ci_ladvise.li_advice;
+		io->u.ci_ladvise.li_flags = parent->u.ci_ladvise.li_flags;
+		break;
+	}
+	default:
+		break;
+	}
+}
+
 static u64 lov_offset_mod(u64 val, int delta)
 {
 	if (val != OBD_OBJECT_EOF)
@@ -491,24 +489,6 @@ static int lov_io_end_wrapper(const struct lu_env *env, struct cl_io *io)
 	return 0;
 }
 
-static void
-lov_io_data_version_end(const struct lu_env *env, const struct cl_io_slice *ios)
-{
-	struct lov_io *lio = cl2lov_io(env, ios);
-	struct cl_io *parent = lio->lis_cl.cis_io;
-	struct lov_io_sub *sub;
-
-	list_for_each_entry(sub, &lio->lis_active, sub_linkage) {
-		lov_io_end_wrapper(sub->sub_env, sub->sub_io);
-
-		parent->u.ci_data_version.dv_data_version +=
-			sub->sub_io->u.ci_data_version.dv_data_version;
-
-		if (!parent->ci_result)
-			parent->ci_result = sub->sub_io->ci_result;
-	}
-}
-
 static int lov_io_iter_fini_wrapper(const struct lu_env *env, struct cl_io *io)
 {
 	cl_io_iter_fini(env, io);
@@ -529,6 +509,24 @@ static void lov_io_end(const struct lu_env *env, const struct cl_io_slice *ios)
 	LASSERT(rc == 0);
 }
 
+static void
+lov_io_data_version_end(const struct lu_env *env, const struct cl_io_slice *ios)
+{
+	struct lov_io *lio = cl2lov_io(env, ios);
+	struct cl_io *parent = lio->lis_cl.cis_io;
+	struct lov_io_sub *sub;
+
+	list_for_each_entry(sub, &lio->lis_active, sub_linkage) {
+		lov_io_end_wrapper(sub->sub_env, sub->sub_io);
+
+		parent->u.ci_data_version.dv_data_version +=
+			sub->sub_io->u.ci_data_version.dv_data_version;
+
+		if (!parent->ci_result)
+			parent->ci_result = sub->sub_io->ci_result;
+	}
+}
+
 static void lov_io_iter_fini(const struct lu_env *env,
 			     const struct cl_io_slice *ios)
 {
@@ -602,7 +600,8 @@ static int lov_io_read_ahead(const struct lu_env *env,
 
 	pps = loo->lo_lsm->lsm_entries[0]->lsme_stripe_size >> PAGE_SHIFT;
 
-	CDEBUG(D_READA, DFID " max_index = %lu, pps = %u, stripe_size = %u, stripe no = %u, start index = %lu\n",
+	CDEBUG(D_READA,
+	       DFID " max_index = %lu, pps = %u, stripe_size = %u, stripe no = %u, start index = %lu\n",
 	       PFID(lu_object_fid(lov2lu(loo))), ra_end, pps,
 	       loo->lo_lsm->lsm_entries[0]->lsme_stripe_size, stripe, start);
 
diff --git a/drivers/staging/lustre/lustre/lov/lov_object.c b/drivers/staging/lustre/lustre/lov/lov_object.c
index de5e2a2..3677fac 100644
--- a/drivers/staging/lustre/lustre/lov/lov_object.c
+++ b/drivers/staging/lustre/lustre/lov/lov_object.c
@@ -600,6 +600,7 @@ static int lov_attr_get_raid0(const struct lu_env *env, struct lov_object *lov,
 		return 0;
 
 	memset(lvb, 0, sizeof(*lvb));
+
 	/* XXX: timestamps can be negative by sanity:test_39m,
 	 * how can it be?
 	 */
@@ -1200,18 +1201,18 @@ static u64 fiemap_calc_fm_end_offset(struct fiemap *fiemap,
 }
 
 struct fiemap_state {
-	struct fiemap	*fs_fm;
-	u64		fs_start;
-	u64		fs_length;
-	u64		fs_end;
-	u64		fs_end_offset;
-	int		fs_cur_extent;
-	int		fs_cnt_need;
-	int		fs_start_stripe;
-	int		fs_last_stripe;
-	bool		fs_device_done;
-	bool		fs_finish;
-	bool		fs_enough;
+	struct fiemap		*fs_fm;
+	u64			fs_start;
+	u64			fs_length;
+	u64			fs_end;
+	u64			fs_end_offset;
+	int			fs_cur_extent;
+	int			fs_cnt_need;
+	int			fs_start_stripe;
+	int			fs_last_stripe;
+	bool			fs_device_done;
+	bool			fs_finish;
+	bool			fs_enough;
 };
 
 static int fiemap_for_stripe(const struct lu_env *env, struct cl_object *obj,
-- 
1.8.3.1

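For truncate, lov_io_sub_inherit() above converts the parent's new file size into a per-object size with lov_size_to_stripe(lsm, 0, new_size, stripe). Below is a standalone sketch of that arithmetic for a single RAID0 component. It rests on some assumptions: plain u64 division replaces lov_do_div64(), and the hypothetical DEMO_OBJECT_EOF stands in for OBD_OBJECT_EOF. It is an illustration of the calculation, not the Lustre function itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sentinel standing in for OBD_OBJECT_EOF. */
#define DEMO_OBJECT_EOF UINT64_MAX

/*
 * Size object @stripeno must have to hold its share of a @file_size-byte
 * file, for a component with @ssize-byte stripes over @stripe_count objects.
 */
static uint64_t demo_size_to_stripe(uint64_t ssize, unsigned int stripe_count,
				    uint64_t file_size, int stripeno)
{
	uint64_t swidth, nr_widths, stripe_off, this_stripe;

	assert(stripe_count > 0 && stripeno >= 0 &&
	       (unsigned int)stripeno < stripe_count);

	if (file_size == DEMO_OBJECT_EOF)
		return DEMO_OBJECT_EOF;

	swidth = ssize * stripe_count;
	nr_widths = file_size / swidth;	/* complete rounds over all objects */
	stripe_off = file_size % swidth;	/* bytes in the final, partial round */
	this_stripe = (uint64_t)stripeno * ssize;

	if (stripe_off < this_stripe) {
		/* file ends before this object's chunk in the last round:
		 * the object ends with its chunk from the previous round */
		if (nr_widths > 0) {
			nr_widths--;
			stripe_off = ssize;
		} else {
			stripe_off = 0;
		}
	} else {
		stripe_off -= this_stripe;
		if (stripe_off >= ssize)
			stripe_off = ssize;	/* chunk completely filled */
	}

	return nr_widths * ssize + stripe_off;
}
```

With 4-byte stripes over 3 objects, a 14-byte file gives object sizes 6, 4 and 4, which add back up to 14. A 10-byte file gives 4, 4 and 2.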
* [lustre-devel] [PATCH v2 13/33] lustre: lov: remove lsm_stripe_by_[index|offset]_plain
  2019-01-06 22:13 [lustre-devel] [PATCH v2 00/33] lustre: add PFL support James Simmons
                   ` (11 preceding siblings ...)
  2019-01-06 22:14 ` [lustre-devel] [PATCH v2 12/33] lustre: lov: move around PFL code and cleanups James Simmons
@ 2019-01-06 22:14 ` James Simmons
  2019-01-06 22:14 ` [lustre-devel] [PATCH v2 14/33] lustre: lov: add looping lsm_entry_count times James Simmons
                   ` (19 subsequent siblings)
  32 siblings, 0 replies; 47+ messages in thread
From: James Simmons @ 2019-01-06 22:14 UTC (permalink / raw)
  To: lustre-devel

From: Bobi Jam <bobijam@hotmail.com>

Since lsm_stripe_by_index() and lsm_stripe_by_offset() are
identical for every lsm_operations table, replace them with a
new universal function, stripe_width().
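After this change, every former lsm_stripe_by_index()/lsm_stripe_by_offset() call reduces to stripe_size * stripe_count of the selected component. Below is a standalone sketch of that helper and of the stripe-number lookup built on top of it. It uses hypothetical demo_* types in place of struct lov_stripe_md. The sketch differs from the helper in the patch in two deliberate ways. First, it widens to 64 bits before multiplying, as the removed _plain helpers did with their (loff_t) cast. Second, it checks the index before indexing the entry array.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reduced layout: one entry per component. */
struct demo_entry {
	uint32_t stripe_size;
	uint16_t stripe_count;
};

struct demo_lsm {
	unsigned int entry_count;
	const struct demo_entry *entries;
};

/* Bytes covered by one full round of stripes in component @index. */
static uint64_t demo_stripe_width(const struct demo_lsm *lsm,
				  unsigned int index)
{
	const struct demo_entry *e;

	assert(index < lsm->entry_count);	/* check before indexing */
	e = &lsm->entries[index];

	/* cast first: u32 * u16 would otherwise be a 32-bit multiply */
	return (uint64_t)e->stripe_size * e->stripe_count;
}

/* Which stripe of component @index holds file offset @lov_off. */
static int demo_stripe_number(const struct demo_lsm *lsm, unsigned int index,
			      uint64_t lov_off)
{
	uint64_t swidth = demo_stripe_width(lsm, index);
	uint64_t stripe_off = lov_off % swidth;	/* offset within one round */

	return (int)(stripe_off / lsm->entries[index].stripe_size);
}
```

For example, 64 MiB stripes over 2000 objects give a width of 134217728000 bytes. That value only fits because the multiply is done in 64 bits.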

Signed-off-by: Jinshan Xiong <jinshan.xiong@gmail.com>
Signed-off-by: Bobi Jam <bobijam@hotmail.com>
Signed-off-by: Niu Yawei <yawei.niu@intel.com>
WC-bug-id: https://jira.whamcloud.com/browse/LU-8998
Reviewed-on: https://review.whamcloud.com/24850
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/lov/lov_ea.c       | 24 ------------------------
 drivers/staging/lustre/lustre/lov/lov_internal.h |  4 ----
 drivers/staging/lustre/lustre/lov/lov_offset.c   | 23 +++++++++++++----------
 3 files changed, 13 insertions(+), 38 deletions(-)

diff --git a/drivers/staging/lustre/lustre/lov/lov_ea.c b/drivers/staging/lustre/lustre/lov/lov_ea.c
index f2a5a60..1824469 100644
--- a/drivers/staging/lustre/lustre/lov/lov_ea.c
+++ b/drivers/staging/lustre/lustre/lov/lov_ea.c
@@ -309,24 +309,6 @@ void lsm_free(struct lov_stripe_md *lsm)
 	return lsm;
 }
 
-static void
-lsm_stripe_by_index_plain(struct lov_stripe_md *lsm, int *stripeno,
-			  loff_t *lov_off, loff_t *swidth)
-{
-	if (swidth)
-		*swidth = (loff_t)lsm->lsm_entries[0]->lsme_stripe_size *
-			  lsm->lsm_entries[0]->lsme_stripe_count;
-}
-
-static void
-lsm_stripe_by_offset_plain(struct lov_stripe_md *lsm, int *stripeno,
-			   loff_t *lov_off, loff_t *swidth)
-{
-	if (swidth)
-		*swidth = (loff_t)lsm->lsm_entries[0]->lsme_stripe_size *
-			  lsm->lsm_entries[0]->lsme_stripe_count;
-}
-
 static struct lov_stripe_md *
 lsm_unpackmd_v1(struct lov_obd *lov, void *buf, size_t buf_size)
 {
@@ -336,8 +318,6 @@ void lsm_free(struct lov_stripe_md *lsm)
 }
 
 const static struct lsm_operations lsm_v1_ops = {
-	.lsm_stripe_by_index    = lsm_stripe_by_index_plain,
-	.lsm_stripe_by_offset   = lsm_stripe_by_offset_plain,
 	.lsm_unpackmd		= lsm_unpackmd_v1,
 };
 
@@ -351,8 +331,6 @@ void lsm_free(struct lov_stripe_md *lsm)
 }
 
 const static struct lsm_operations lsm_v3_ops = {
-	.lsm_stripe_by_index	= lsm_stripe_by_index_plain,
-	.lsm_stripe_by_offset	= lsm_stripe_by_offset_plain,
 	.lsm_unpackmd		= lsm_unpackmd_v3,
 };
 
@@ -502,8 +480,6 @@ static int lsm_verify_comp_md_v1(struct lov_comp_md_v1 *lcm,
 }
 
 const static struct lsm_operations lsm_comp_md_v1_ops = {
-	.lsm_stripe_by_index	= lsm_stripe_by_index_plain,
-	.lsm_stripe_by_offset	= lsm_stripe_by_offset_plain,
 	.lsm_unpackmd		= lsm_unpackmd_comp_md_v1,
 };
 
diff --git a/drivers/staging/lustre/lustre/lov/lov_internal.h b/drivers/staging/lustre/lustre/lov/lov_internal.h
index 4c9e324..ebe5890 100644
--- a/drivers/staging/lustre/lustre/lov/lov_internal.h
+++ b/drivers/staging/lustre/lustre/lov/lov_internal.h
@@ -80,10 +80,6 @@ static inline bool lsm_has_objects(struct lov_stripe_md *lsm)
 }
 
 struct lsm_operations {
-	void (*lsm_stripe_by_index)(struct lov_stripe_md *, int *, loff_t *,
-				    loff_t *);
-	void (*lsm_stripe_by_offset)(struct lov_stripe_md *, int *, loff_t *,
-				     loff_t *);
 	struct lov_stripe_md *(*lsm_unpackmd)(struct lov_obd *obd, void *buf,
 					      size_t buf_len);
 };
diff --git a/drivers/staging/lustre/lustre/lov/lov_offset.c b/drivers/staging/lustre/lustre/lov/lov_offset.c
index d817aa5..513f1fd 100644
--- a/drivers/staging/lustre/lustre/lov/lov_offset.c
+++ b/drivers/staging/lustre/lustre/lov/lov_offset.c
@@ -37,6 +37,15 @@
 
 #include "lov_internal.h"
 
+static u64 stripe_width(struct lov_stripe_md *lsm, unsigned int index)
+{
+	struct lov_stripe_md_entry *entry = lsm->lsm_entries[index];
+
+	LASSERT(index < lsm->lsm_entry_count);
+
+	return (u64)entry->lsme_stripe_size * entry->lsme_stripe_count;
+}
+
 /* compute object size given "stripeno" and the ost size */
 u64 lov_stripe_size(struct lov_stripe_md *lsm, int index, u64 ost_size,
 		    int stripeno)
@@ -45,12 +54,11 @@ u64 lov_stripe_size(struct lov_stripe_md *lsm, int index, u64 ost_size,
 	unsigned long stripe_size;
 	u64 swidth;
 	u64 lov_size;
-	int magic = lsm->lsm_magic;
 
 	if (ost_size == 0)
 		return 0;
 
-	lsm_op_find(magic)->lsm_stripe_by_index(lsm, &stripeno, NULL, &swidth);
+	swidth = stripe_width(lsm, index);
 
 	/* lov_do_div64(a, b) returns a % b, and a = a / b */
 	stripe_size = lov_do_div64(ost_size, ssize);
@@ -129,7 +137,6 @@ int lov_stripe_offset(struct lov_stripe_md *lsm, int index, u64 lov_off,
 {
 	unsigned long ssize  = lsm->lsm_entries[index]->lsme_stripe_size;
 	u64 stripe_off, this_stripe, swidth;
-	int magic = lsm->lsm_magic;
 	int ret = 0;
 
 	if (lov_off == OBD_OBJECT_EOF) {
@@ -137,8 +144,7 @@ int lov_stripe_offset(struct lov_stripe_md *lsm, int index, u64 lov_off,
 		return 0;
 	}
 
-	lsm_op_find(magic)->lsm_stripe_by_index(lsm, &stripeno, &lov_off,
-						&swidth);
+	swidth = stripe_width(lsm, index);
 
 	/* lov_do_div64(a, b) returns a % b, and a = a / b */
 	stripe_off = lov_do_div64(lov_off, swidth);
@@ -184,13 +190,11 @@ u64 lov_size_to_stripe(struct lov_stripe_md *lsm, int index, u64 file_size,
 {
 	unsigned long ssize  = lsm->lsm_entries[index]->lsme_stripe_size;
 	u64 stripe_off, this_stripe, swidth;
-	int magic = lsm->lsm_magic;
 
 	if (file_size == OBD_OBJECT_EOF)
 		return OBD_OBJECT_EOF;
 
-	lsm_op_find(magic)->lsm_stripe_by_index(lsm, &stripeno, &file_size,
-						&swidth);
+	swidth = stripe_width(lsm, index);
 
 	/* lov_do_div64(a, b) returns a % b, and a = a / b */
 	stripe_off = lov_do_div64(file_size, swidth);
@@ -258,9 +262,8 @@ int lov_stripe_number(struct lov_stripe_md *lsm, int index, u64 lov_off)
 {
 	unsigned long ssize  = lsm->lsm_entries[index]->lsme_stripe_size;
 	u64 stripe_off, swidth;
-	int magic = lsm->lsm_magic;
 
-	lsm_op_find(magic)->lsm_stripe_by_offset(lsm, NULL, &lov_off, &swidth);
+	swidth = stripe_width(lsm, index);
 
 	stripe_off = lov_do_div64(lov_off, swidth);
 
-- 
1.8.3.1

* [lustre-devel] [PATCH v2 14/33] lustre: lov: add looping lsm_entry_count times
  2019-01-06 22:13 [lustre-devel] [PATCH v2 00/33] lustre: add PFL support James Simmons
                   ` (12 preceding siblings ...)
  2019-01-06 22:14 ` [lustre-devel] [PATCH v2 13/33] lustre: lov: remove lsm_stripe_by_[index|offset]_plain James Simmons
@ 2019-01-06 22:14 ` James Simmons
  2019-01-06 22:14 ` [lustre-devel] [PATCH v2 15/33] lustre: lov: create lov_comp_* wrappers James Simmons
                   ` (18 subsequent siblings)
  32 siblings, 0 replies; 47+ messages in thread
From: James Simmons @ 2019-01-06 22:14 UTC (permalink / raw)
  To: lustre-devel

From: Bobi Jam <bobijam@hotmail.com>

Create lov_foreach_layout_entry() and lov_lse() to handle the
case where lsm_entry_count is greater than one. Modify various
code blocks to loop lsm_entry_count times.
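lov_foreach_layout_entry() walks lo_entries[] by pointer, and lov_io_iter_init() now nests the old per-stripe loop inside it. Below is a reduced, compilable sketch of that two-level walk. It assumes hypothetical demo_* structures that keep only the fields the loop touches. In the real code, the macro reaches the array through lov->u.composite.

```c
#include <assert.h>

/* Hypothetical reduced versions of lov_layout_raid0 / lov_layout_entry /
 * lov_object. */
struct demo_raid0 {
	int lo_nr;			/* stripes in this component */
};

struct demo_layout_entry {
	struct demo_raid0 lle_raid0;
};

struct demo_lov {
	unsigned int lo_entry_count;
	struct demo_layout_entry *lo_entries;
};

/* Same shape as lov_foreach_layout_entry(): walk the entry array by pointer. */
#define demo_foreach_layout_entry(lov, entry)				\
	for ((entry) = &(lov)->lo_entries[0];				\
	     (entry) < &(lov)->lo_entries[(lov)->lo_entry_count];	\
	     (entry)++)

/* Visit every (component, stripe) pair and count them, as
 * lov_io_iter_init() would when every stripe intersects the IO. */
static int demo_count_subios(const struct demo_lov *lov)
{
	const struct demo_layout_entry *le;
	int total = 0;

	assert(lov->lo_entries && lov->lo_entry_count > 0);

	demo_foreach_layout_entry(lov, le) {
		const struct demo_raid0 *r0 = &le->lle_raid0;
		int stripe;

		for (stripe = 0; stripe < r0->lo_nr; stripe++)
			total++;
	}
	return total;
}
```

A two-component layout with 1 and 4 stripes yields 5 (component, stripe) pairs. This is the iteration space the reworked lov_io_iter_init() walks before it applies its lov_stripe_intersects() filter.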

Signed-off-by: Jinshan Xiong <jinshan.xiong@gmail.com>
Signed-off-by: Bobi Jam <bobijam@hotmail.com>
Signed-off-by: Niu Yawei <yawei.niu@intel.com>
WC-bug-id: https://jira.whamcloud.com/browse/LU-8998
Reviewed-on: https://review.whamcloud.com/24850
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 .../staging/lustre/lustre/lov/lov_cl_internal.h    |  13 +++
 drivers/staging/lustre/lustre/lov/lov_ea.c         |  20 +++-
 drivers/staging/lustre/lustre/lov/lov_io.c         |  88 +++++++++-------
 drivers/staging/lustre/lustre/lov/lov_merge.c      |   6 +-
 drivers/staging/lustre/lustre/lov/lov_object.c     | 116 +++++++++++++--------
 5 files changed, 156 insertions(+), 87 deletions(-)

diff --git a/drivers/staging/lustre/lustre/lov/lov_cl_internal.h b/drivers/staging/lustre/lustre/lov/lov_cl_internal.h
index ce32823..952da3a 100644
--- a/drivers/staging/lustre/lustre/lov/lov_cl_internal.h
+++ b/drivers/staging/lustre/lustre/lov/lov_cl_internal.h
@@ -235,6 +235,11 @@ struct lov_object {
 	struct task_struct	*lo_owner;
 };
 
+#define lov_foreach_layout_entry(lov, entry)				\
+	for (entry = &lov->u.composite.lo_entries[0];			\
+	     entry < &lov->u.composite.lo_entries[lov->u.composite.lo_entry_count];\
+	     entry++)
+
 /**
  * State lov_lock keeps for each sub-lock.
  */
@@ -642,6 +647,14 @@ static inline struct lov_layout_raid0 *lov_r0(struct lov_object *lov, int i)
 	return &lov->u.composite.lo_entries[i].lle_raid0;
 }
 
+static inline struct lov_stripe_md_entry *lov_lse(struct lov_object *lov, int i)
+{
+	LASSERT(lov->lo_lsm);
+	LASSERT(i < lov->lo_lsm->lsm_entry_count);
+
+	return lov->lo_lsm->lsm_entries[i];
+}
+
 /* lov_pack.c */
 int lov_getstripe(struct lov_object *obj, struct lov_stripe_md *lsm,
 		  struct lov_user_md __user *lump);
diff --git a/drivers/staging/lustre/lustre/lov/lov_ea.c b/drivers/staging/lustre/lustre/lov/lov_ea.c
index 1824469..ff6b251 100644
--- a/drivers/staging/lustre/lustre/lov/lov_ea.c
+++ b/drivers/staging/lustre/lustre/lov/lov_ea.c
@@ -507,11 +507,21 @@ const struct lsm_operations *lsm_op_find(int magic)
 
 void dump_lsm(unsigned int level, const struct lov_stripe_md *lsm)
 {
+	int i;
+
 	CDEBUG(level,
-	       "lsm %p, objid " DOSTID ", maxbytes %#llx, magic 0x%08X, stripe_size %u, stripe_count %u, refc: %d, layout_gen %u, pool [" LOV_POOLNAMEF "]\n",
+	       "lsm %p, objid " DOSTID ", maxbytes %#llx, magic 0x%08X, refc: %d, entry: %u, layout_gen %u\n",
 	       lsm, POSTID(&lsm->lsm_oi), lsm->lsm_maxbytes, lsm->lsm_magic,
-	       lsm->lsm_entries[0]->lsme_stripe_size,
-	       lsm->lsm_entries[0]->lsme_stripe_count,
-	       atomic_read(&lsm->lsm_refc), lsm->lsm_layout_gen,
-	       lsm->lsm_entries[0]->lsme_pool_name);
+	       atomic_read(&lsm->lsm_refc), lsm->lsm_entry_count,
+	       lsm->lsm_layout_gen);
+
+	for (i = 0; i < lsm->lsm_entry_count; i++) {
+		struct lov_stripe_md_entry *lse = lsm->lsm_entries[i];
+
+		CDEBUG(level,
+		       ": id: %u, magic 0x%08X, stripe count %u, size %u, layout_gen %u, pool: [" LOV_POOLNAMEF "]\n",
+		       lse->lsme_id, lse->lsme_magic,
+		       lse->lsme_stripe_count, lse->lsme_stripe_size,
+		       lse->lsme_layout_gen, lse->lsme_pool_name);
+	}
 }
diff --git a/drivers/staging/lustre/lustre/lov/lov_io.c b/drivers/staging/lustre/lustre/lov/lov_io.c
index ab97326..7fdbed9 100644
--- a/drivers/staging/lustre/lustre/lov/lov_io.c
+++ b/drivers/staging/lustre/lustre/lov/lov_io.c
@@ -368,46 +368,59 @@ static int lov_io_iter_init(const struct lu_env *env,
 {
 	struct lov_io	*lio = cl2lov_io(env, ios);
 	struct lov_stripe_md *lsm = lio->lis_object->lo_lsm;
+	struct lov_layout_entry *le;
 	struct lov_io_sub    *sub;
 	u64 endpos;
-	u64 start;
-	u64 end;
-	int stripe;
 	int rc = 0;
-	int index = 0;
+	int index;
 
 	endpos = lov_offset_mod(lio->lis_endpos, -1);
-	for (stripe = 0; stripe < lio->lis_stripe_count; stripe++) {
-		if (!lov_stripe_intersects(lsm, index, stripe, lio->lis_pos,
-					   endpos, &start, &end))
-			continue;
-
-		if (unlikely(!lov_r0(lio->lis_object, index)->lo_sub[stripe])) {
-			if (ios->cis_io->ci_type == CIT_READ ||
-			    ios->cis_io->ci_type == CIT_WRITE ||
-			    ios->cis_io->ci_type == CIT_FAULT)
-				return -EIO;
 
-			continue;
-		}
+	index = 0;
+	lov_foreach_layout_entry(lio->lis_object, le) {
+		struct lov_layout_raid0 *r0 = &le->lle_raid0;
+		int stripe;
+		u64 start;
+		u64 end;
+
+		index++;
+
+		for (stripe = 0; stripe < r0->lo_nr; stripe++) {
+			if (!lov_stripe_intersects(lsm, index - 1, stripe,
+						   lio->lis_pos,
+						   endpos, &start, &end))
+				continue;
+
+			if (unlikely(!r0->lo_sub[stripe])) {
+				if (ios->cis_io->ci_type == CIT_READ ||
+				    ios->cis_io->ci_type == CIT_WRITE ||
+				    ios->cis_io->ci_type == CIT_FAULT)
+					return -EIO;
+
+				continue;
+			}
+
+			end = lov_offset_mod(end, 1);
+			sub = lov_sub_get(env, lio, stripe);
+			if (IS_ERR(sub)) {
+				rc = PTR_ERR(sub);
+				break;
+			}
 
-		end = lov_offset_mod(end, 1);
-		sub = lov_sub_get(env, lio, stripe);
-		if (IS_ERR(sub)) {
-			rc = PTR_ERR(sub);
-			break;
-		}
+			lov_io_sub_inherit(sub->sub_io, lio, stripe, start, end);
+			rc = cl_io_iter_init(sub->sub_env, sub->sub_io);
+			if (rc) {
+				cl_io_iter_fini(sub->sub_env, sub->sub_io);
+				break;
+			}
 
-		lov_io_sub_inherit(sub->sub_io, lio, stripe, start, end);
-		rc = cl_io_iter_init(sub->sub_env, sub->sub_io);
-		if (rc) {
-			cl_io_iter_fini(sub->sub_env, sub->sub_io);
-			break;
-		}
-		CDEBUG(D_VFSTRACE, "shrink: %d [%llu, %llu)\n",
-		       stripe, start, end);
+			CDEBUG(D_VFSTRACE, "shrink: %d [%llu, %llu)\n",
+			       stripe, start, end);
 
-		list_add_tail(&sub->sub_linkage, &lio->lis_active);
+			list_add_tail(&sub->sub_linkage, &lio->lis_active);
+		}
+		if (rc)
+			break;
 	}
 	return rc;
 }
@@ -417,13 +430,18 @@ static int lov_io_rw_iter_init(const struct lu_env *env,
 {
 	struct lov_io	*lio = cl2lov_io(env, ios);
 	struct cl_io	 *io  = ios->cis_io;
-	struct lov_stripe_md *lsm = lio->lis_object->lo_lsm;
-	unsigned long ssize = lsm->lsm_entries[0]->lsme_stripe_size;
 	u64 start = io->u.ci_rw.crw_pos;
+	struct lov_stripe_md_entry *lse;
+	unsigned long ssize;
 	loff_t next;
+	int index = 0;
 
 	LASSERT(io->ci_type == CIT_READ || io->ci_type == CIT_WRITE);
 
+	lse = lov_lse(lio->lis_object, index);
+
+	ssize = lse->lsme_stripe_size;
+
 	/* fast path for common case. */
 	if (lio->lis_nr_subios != 1 && !cl_io_is_append(io)) {
 		lov_do_div64(start, ssize);
@@ -598,12 +616,12 @@ static int lov_io_read_ahead(const struct lu_env *env,
 	if (ra_end != CL_PAGE_EOF)
 		ra_end = lov_stripe_pgoff(loo->lo_lsm, index, ra_end, stripe);
 
-	pps = loo->lo_lsm->lsm_entries[0]->lsme_stripe_size >> PAGE_SHIFT;
+	pps = lov_lse(loo, index)->lsme_stripe_size >> PAGE_SHIFT;
 
 	CDEBUG(D_READA,
 	       DFID " max_index = %lu, pps = %u, stripe_size = %u, stripe no = %u, start index = %lu\n",
 	       PFID(lu_object_fid(lov2lu(loo))), ra_end, pps,
-	       loo->lo_lsm->lsm_entries[0]->lsme_stripe_size, stripe, start);
+	       lov_lse(loo, index)->lsme_stripe_size, stripe, start);
 
 	/* never exceed the end of the stripe */
 	ra->cra_end = min_t(pgoff_t, ra_end, start + pps - start % pps - 1);
diff --git a/drivers/staging/lustre/lustre/lov/lov_merge.c b/drivers/staging/lustre/lustre/lov/lov_merge.c
index 020795f..79edc26 100644
--- a/drivers/staging/lustre/lustre/lov/lov_merge.c
+++ b/drivers/staging/lustre/lustre/lov/lov_merge.c
@@ -44,6 +44,7 @@
 int lov_merge_lvb_kms(struct lov_stripe_md *lsm, int index,
 		      struct ost_lvb *lvb, __u64 *kms_place)
 {
+	struct lov_stripe_md_entry *lse = lsm->lsm_entries[index];
 	__u64 size = 0;
 	__u64 kms = 0;
 	__u64 blocks = 0;
@@ -59,8 +60,9 @@ int lov_merge_lvb_kms(struct lov_stripe_md *lsm, int index,
 	CDEBUG(D_INODE, "MDT ID " DOSTID " initial value: s=%llu m=%llu a=%llu c=%llu b=%llu\n",
 	       POSTID(&lsm->lsm_oi), lvb->lvb_size, lvb->lvb_mtime,
 	       lvb->lvb_atime, lvb->lvb_ctime, lvb->lvb_blocks);
-	for (i = 0; i < lsm->lsm_entries[0]->lsme_stripe_count; i++) {
-		struct lov_oinfo *loi = lsm->lsm_entries[0]->lsme_oinfo[i];
+
+	for (i = 0; i < lse->lsme_stripe_count; i++) {
+		struct lov_oinfo *loi = lse->lsme_oinfo[i];
 		u64 lov_size, tmpsize;
 
 		if (OST_LVB_IS_ERR(loi->loi_lvb.lvb_blocks)) {
diff --git a/drivers/staging/lustre/lustre/lov/lov_object.c b/drivers/staging/lustre/lustre/lov/lov_object.c
index 3677fac..74e95b1 100644
--- a/drivers/staging/lustre/lustre/lov/lov_object.c
+++ b/drivers/staging/lustre/lustre/lov/lov_object.c
@@ -217,10 +217,11 @@ static int lov_page_slice_fixup(struct lov_object *lov,
 }
 
 static int lov_init_raid0(const struct lu_env *env, struct lov_device *dev,
-			  struct lov_object *lov, struct lov_stripe_md *lsm,
+			  struct lov_object *lov, int index,
 			  const struct cl_object_conf *conf,
 			  struct lov_layout_raid0 *r0)
 {
+	struct lov_stripe_md_entry *lse = lov_lse(lov, index);
 	struct cl_object *stripe;
 	struct lov_thread_info *lti = lov_env_info(env);
 	struct cl_object_conf *subconf = &lti->lti_stripe_conf;
@@ -230,7 +231,7 @@ static int lov_init_raid0(const struct lu_env *env, struct lov_device *dev,
 	int i;
 
 	spin_lock_init(&r0->lo_sub_lock);
-	r0->lo_nr  = lsm->lsm_entries[0]->lsme_stripe_count;
+	r0->lo_nr = lse->lsme_stripe_count;
 	LASSERT(r0->lo_nr <= lov_targets_nr(dev));
 
 	r0->lo_sub = kvzalloc(r0->lo_nr * sizeof(r0->lo_sub[0]),
@@ -245,11 +246,10 @@ static int lov_init_raid0(const struct lu_env *env, struct lov_device *dev,
 	 * Create stripe cl_objects.
 	 */
 	for (i = 0; i < r0->lo_nr; ++i) {
+		struct lov_oinfo *oinfo = lse->lsme_oinfo[i];
 		struct cl_device *subdev;
-		struct lov_oinfo *oinfo;
 		int ost_idx;
 
-		oinfo = lsm->lsm_entries[0]->lsme_oinfo[i];
 		if (lov_oinfo_is_dummy(oinfo))
 			continue;
 
@@ -324,7 +324,7 @@ static int lov_init_composite(const struct lu_env *env, struct lov_device *dev,
 	for (i = 0; i < entry_count; i++) {
 		struct lov_layout_entry *le = &comp->lo_entries[i];
 
-		result = lov_init_raid0(env, dev, lov, lsm, conf,
+		result = lov_init_raid0(env, dev, lov, i, conf,
 					&le->lle_raid0);
 		if (result < 0)
 			break;
@@ -467,13 +467,13 @@ static int lov_delete_composite(const struct lu_env *env,
 				struct lov_object *lov,
 				union lov_layout_state *state)
 {
-	struct lov_layout_composite *comp = &state->composite;
-	struct lov_layout_entry *entry = &comp->lo_entries[0];
+	struct lov_layout_entry *entry;
 
 	dump_lsm(D_INODE, lov->lo_lsm);
 
 	lov_layout_wait(env, lov);
-	lov_delete_raid0(env, lov, &entry->lle_raid0);
+	lov_foreach_layout_entry(lov, entry)
+		lov_delete_raid0(env, lov, &entry->lle_raid0);
 
 	return 0;
 }
@@ -500,9 +500,10 @@ static void lov_fini_composite(const struct lu_env *env,
 	struct lov_layout_composite *comp = &state->composite;
 
 	if (comp->lo_entries) {
-		struct lov_layout_entry *entry = &comp->lo_entries[0];
+		struct lov_layout_entry *entry;
 
-		lov_fini_raid0(env, &entry->lle_raid0);
+		lov_foreach_layout_entry(lov, entry)
+			lov_fini_raid0(env, &entry->lle_raid0);
 
 		kvfree(comp->lo_entries);
 		comp->lo_entries = NULL;
@@ -548,15 +549,24 @@ static int lov_print_composite(const struct lu_env *env, void *cookie,
 			       lu_printer_t p, const struct lu_object *o)
 {
 	struct lov_object *lov = lu2lov(o);
-	struct lov_layout_raid0	*r0 = lov_r0(lov, 0);
 	struct lov_stripe_md *lsm = lov->lo_lsm;
+	int i;
 
-	(*p)(env, cookie, "stripes: %d, %s, lsm{%p 0x%08X %d %u %u}:\n",
-	     r0->lo_nr, lov->lo_layout_invalid ? "invalid" : "valid", lsm,
+	(*p)(env, cookie, "entries: %d, %s, lsm{%p 0x%08X %d %u}:\n",
+	     lsm->lsm_entry_count,
+	     lov->lo_layout_invalid ? "invalid" : "valid", lsm,
 	     lsm->lsm_magic, atomic_read(&lsm->lsm_refc),
-	     lsm->lsm_entries[0]->lsme_stripe_count, lsm->lsm_layout_gen);
+	     lsm->lsm_layout_gen);
+
+	for (i = 0; i < lsm->lsm_entry_count; i++) {
+		struct lov_stripe_md_entry *lse = lsm->lsm_entries[i];
 
-	lov_print_raid0(env, cookie, p, r0);
+		(*p)(env, cookie, ": { 0x%08X, %u, %u, %u, %u }\n",
+		     lse->lsme_magic,
+		     lse->lsme_id, lse->lsme_layout_gen,
+		     lse->lsme_stripe_count, lse->lsme_stripe_size);
+		lov_print_raid0(env, cookie, p, lov_r0(lov, i));
+	}
 
 	return 0;
 }
@@ -589,10 +599,11 @@ static int lov_attr_get_empty(const struct lu_env *env, struct cl_object *obj,
 }
 
 static int lov_attr_get_raid0(const struct lu_env *env, struct lov_object *lov,
-			      struct cl_attr *attr, struct lov_layout_raid0 *r0)
+			      unsigned int index, struct lov_layout_raid0 *r0)
 {
 	struct lov_stripe_md *lsm = lov->lo_lsm;
 	struct ost_lvb *lvb = &lov_env_info(env)->lti_lvb;
+	struct cl_attr *attr = &r0->lo_attr;
 	int result = 0;
 	u64 kms = 0;
 
@@ -621,7 +632,7 @@ static int lov_attr_get_raid0(const struct lu_env *env, struct lov_object *lov,
 	 * sub-object attributes.
 	 */
 	lov_stripe_lock(lsm);
-	result = lov_merge_lvb_kms(lsm, 0, lvb, &kms);
+	result = lov_merge_lvb_kms(lsm, index, lvb, &kms);
 	lov_stripe_unlock(lsm);
 	if (result)
 		return result;
@@ -638,24 +649,33 @@ static int lov_attr_get_composite(const struct lu_env *env,
 				  struct cl_attr *attr)
 {
 	struct lov_object *lov = cl2lov(obj);
-	struct lov_layout_raid0 *r0 = lov_r0(lov, 0);
-	struct cl_attr *lov_attr = &r0->lo_attr;
-	int result;
+	struct lov_layout_entry *entry;
+	int result = 0;
+	int index = 0;
 
-	result = lov_attr_get_raid0(env, lov, attr, r0);
-	if (result)
-		return result;
+	attr->cat_blocks = 0;
+	attr->cat_size = 0;
+	lov_foreach_layout_entry(lov, entry) {
+		struct lov_layout_raid0 *r0 = &entry->lle_raid0;
+		struct cl_attr *lov_attr = &r0->lo_attr;
 
-	attr->cat_blocks = lov_attr->cat_blocks;
-	attr->cat_size = lov_attr->cat_size;
-	attr->cat_kms = lov_attr->cat_kms;
-	if (attr->cat_atime < lov_attr->cat_atime)
-		attr->cat_atime = lov_attr->cat_atime;
-	if (attr->cat_ctime < lov_attr->cat_ctime)
-		attr->cat_ctime = lov_attr->cat_ctime;
-	if (attr->cat_mtime < lov_attr->cat_mtime)
-		attr->cat_mtime = lov_attr->cat_mtime;
+		result = lov_attr_get_raid0(env, lov, index, r0);
+		if (result)
+			break;
 
+		/* merge results */
+		attr->cat_blocks += lov_attr->cat_blocks;
+		if (attr->cat_size < lov_attr->cat_size)
+			attr->cat_size = lov_attr->cat_size;
+		if (attr->cat_kms < lov_attr->cat_kms)
+			attr->cat_kms = lov_attr->cat_kms;
+		if (attr->cat_atime < lov_attr->cat_atime)
+			attr->cat_atime = lov_attr->cat_atime;
+		if (attr->cat_ctime < lov_attr->cat_ctime)
+			attr->cat_ctime = lov_attr->cat_ctime;
+		if (attr->cat_mtime < lov_attr->cat_mtime)
+			attr->cat_mtime = lov_attr->cat_mtime;
+	}
 	return result;
 }
 
@@ -1089,8 +1109,7 @@ static int fiemap_calc_last_stripe(struct lov_stripe_md *lsm, int index,
 		*stripe_count = lsme->lsme_stripe_count;
 	} else {
 		for (j = 0, i = start_stripe; j < lsme->lsme_stripe_count;
-		     i = (i + 1) % lsme->lsme_stripe_count,
-		     j++) {
+		     i = (i + 1) % lsme->lsme_stripe_count, j++) {
 			if (lov_stripe_intersects(lsm, index, i, fm_start, fm_end,
 						  &obd_start, &obd_end) == 0)
 				break;
@@ -1681,18 +1700,25 @@ int lov_read_and_clear_async_rc(struct cl_object *clob)
 			int i;
 
 			lsm = lov->lo_lsm;
-			for (i = 0; i < lsm->lsm_entries[0]->lsme_stripe_count;
-			     i++) {
-				struct lov_oinfo *loi;
-
-				loi = lsm->lsm_entries[0]->lsme_oinfo[i];
-				if (lov_oinfo_is_dummy(loi))
-					continue;
-
-				if (loi->loi_ar.ar_rc && !rc)
-					rc = loi->loi_ar.ar_rc;
-				loi->loi_ar.ar_rc = 0;
+			LASSERT(lsm);
+			for (i = 0; i < lsm->lsm_entry_count; i++) {
+				struct lov_stripe_md_entry *lse;
+				int j;
+
+				lse = lsm->lsm_entries[i];
+				for (j = 0; j < lse->lsme_stripe_count; j++) {
+					struct lov_oinfo *loi;
+
+					loi = lse->lsme_oinfo[j];
+					if (lov_oinfo_is_dummy(loi))
+						continue;
+
+					if (loi->loi_ar.ar_rc && !rc)
+						rc = loi->loi_ar.ar_rc;
+					loi->loi_ar.ar_rc = 0;
+				}
 			}
+			break;
 		}
 		case LLT_RELEASED:
 		case LLT_EMPTY:
-- 
1.8.3.1


* [lustre-devel] [PATCH v2 15/33] lustre: lov: create lov_comp_* wrappers
From: James Simmons @ 2019-01-06 22:14 UTC
  To: lustre-devel

From: Bobi Jam <bobijam@hotmail.com>

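Add lov_comp_*() wrappers for the sub-object index used by PFL
components: lov_comp_index() builds an index from a component entry
and a stripe number, while lov_comp_entry() and lov_comp_stripe()
extract the entry and the stripe from such an index. For now the
wrappers keep the existing single-component behavior.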
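
In this patch the wrappers are stubs: lov_comp_index() returns the
stripe unchanged and lov_comp_entry() always returns 0. The follow-up
patch 16/33 packs the component entry into the upper 16 bits and the
stripe into the lower 16 bits. A minimal user-space sketch of that
packed encoding, using hypothetical names comp_index(), comp_entry()
and comp_stripe() in place of the kernel helpers and assert() in place
of LASSERT(), could look like this:

```c
#include <assert.h>
#include <limits.h>

/*
 * Illustrative copies of the packed sub-object index (not the kernel
 * helpers): component entry in bits 16..30, stripe in bits 0..15.
 */
static unsigned int comp_index(int entry, int stripe)
{
	/* Same bounds the kernel enforces with LASSERT(). */
	assert(entry >= 0 && entry <= SHRT_MAX);
	assert(stripe >= 0 && stripe < USHRT_MAX);

	return (unsigned int)entry << 16 | (unsigned int)stripe;
}

/* Lower 16 bits: stripe number within the component. */
static int comp_stripe(int index)
{
	return index & 0xffff;
}

/* Upper bits: which layout component (entry) the stripe belongs to. */
static int comp_entry(int index)
{
	return index >> 16;
}
```

With entry 0 the packed value equals the stripe number, which is why
the stub versions are safe while every layout has a single component.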

Signed-off-by: Jinshan Xiong <jinshan.xiong@gmail.com>
Signed-off-by: Bobi Jam <bobijam@hotmail.com>
Signed-off-by: Niu Yawei <yawei.niu@intel.com>
WC-bug-id: https://jira.whamcloud.com/browse/LU-8998
Reviewed-on: https://review.whamcloud.com/24850
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/lov/lov_internal.h  | 15 ++++++++++++++
 drivers/staging/lustre/lustre/lov/lov_io.c        | 20 ++++++++++--------
 drivers/staging/lustre/lustre/lov/lov_lock.c      |  3 ++-
 drivers/staging/lustre/lustre/lov/lov_object.c    | 25 +++++++++++++++--------
 drivers/staging/lustre/lustre/lov/lov_page.c      |  4 ++--
 drivers/staging/lustre/lustre/lov/lovsub_object.c |  9 ++++----
 6 files changed, 52 insertions(+), 24 deletions(-)

diff --git a/drivers/staging/lustre/lustre/lov/lov_internal.h b/drivers/staging/lustre/lustre/lov/lov_internal.h
index ebe5890..ef47c67 100644
--- a/drivers/staging/lustre/lustre/lov/lov_internal.h
+++ b/drivers/staging/lustre/lustre/lov/lov_internal.h
@@ -79,6 +79,21 @@ static inline bool lsm_has_objects(struct lov_stripe_md *lsm)
 	return lsm && !lsm->lsm_is_released;
 }
 
+static inline unsigned int lov_comp_index(int entry, int stripe)
+{
+	return stripe;
+}
+
+static inline int lov_comp_stripe(int index)
+{
+	return index & 0xffff;
+}
+
+static inline int lov_comp_entry(int index)
+{
+	return 0;
+}
+
 struct lsm_operations {
 	struct lov_stripe_md *(*lsm_unpackmd)(struct lov_obd *obd, void *buf,
 					      size_t buf_len);
diff --git a/drivers/staging/lustre/lustre/lov/lov_io.c b/drivers/staging/lustre/lustre/lov/lov_io.c
index 7fdbed9..635e5a6 100644
--- a/drivers/staging/lustre/lustre/lov/lov_io.c
+++ b/drivers/staging/lustre/lustre/lov/lov_io.c
@@ -72,8 +72,8 @@ static int lov_io_sub_init(const struct lu_env *env, struct lov_io *lio,
 	struct cl_io      *sub_io;
 	struct cl_object  *sub_obj;
 	struct cl_io      *io  = lio->lis_cl.cis_io;
-	int stripe = sub->sub_subio_index;
-	int index = 0;
+	int index = lov_comp_entry(sub->sub_subio_index);
+	int stripe = lov_comp_stripe(sub->sub_subio_index);
 	int rc;
 
 	LASSERT(!sub->sub_io);
@@ -286,11 +286,13 @@ static void lov_io_fini(const struct lu_env *env, const struct cl_io_slice *ios)
 		wake_up_all(&lov->lo_waitq);
 }
 
-static void lov_io_sub_inherit(struct cl_io *io, struct lov_io *lio,
+static void lov_io_sub_inherit(struct lov_io_sub *sub, struct lov_io *lio,
 			       int stripe, loff_t start, loff_t end)
 {
+	struct cl_io *io = sub->sub_io;
 	struct lov_stripe_md *lsm = lio->lis_object->lo_lsm;
 	struct cl_io *parent = lio->lis_cl.cis_io;
+	int index = lov_comp_entry(sub->sub_subio_index);
 
 	switch (io->ci_type) {
 	case CIT_SETATTR: {
@@ -305,7 +307,8 @@ static void lov_io_sub_inherit(struct cl_io *io, struct lov_io *lio,
 		if (cl_io_is_trunc(io)) {
 			loff_t new_size = parent->u.ci_setattr.sa_attr.lvb_size;
 
-			new_size = lov_size_to_stripe(lsm, 0, new_size, stripe);
+			new_size = lov_size_to_stripe(lsm, index, new_size,
+						      stripe);
 			io->u.ci_setattr.sa_attr.lvb_size = new_size;
 		}
 		break;
@@ -321,7 +324,7 @@ static void lov_io_sub_inherit(struct cl_io *io, struct lov_io *lio,
 		loff_t off = cl_offset(obj, parent->u.ci_fault.ft_index);
 
 		io->u.ci_fault = parent->u.ci_fault;
-		off = lov_size_to_stripe(lsm, 0, off, stripe);
+		off = lov_size_to_stripe(lsm, index, off, stripe);
 		io->u.ci_fault.ft_index = cl_index(obj, off);
 		break;
 	}
@@ -401,13 +404,14 @@ static int lov_io_iter_init(const struct lu_env *env,
 			}
 
 			end = lov_offset_mod(end, 1);
-			sub = lov_sub_get(env, lio, stripe);
+			sub = lov_sub_get(env, lio,
+					  lov_comp_index(index - 1, stripe));
 			if (IS_ERR(sub)) {
 				rc = PTR_ERR(sub);
 				break;
 			}
 
-			lov_io_sub_inherit(sub->sub_io, lio, stripe, start, end);
+			lov_io_sub_inherit(sub, lio, stripe, start, end);
 			rc = cl_io_iter_init(sub->sub_env, sub->sub_io);
 			if (rc) {
 				cl_io_iter_fini(sub->sub_env, sub->sub_io);
@@ -588,7 +592,7 @@ static int lov_io_read_ahead(const struct lu_env *env,
 	if (unlikely(!r0->lo_sub[stripe]))
 		return -EIO;
 
-	sub = lov_sub_get(env, lio, stripe);
+	sub = lov_sub_get(env, lio, lov_comp_index(index, stripe));
 	if (IS_ERR(sub))
 		return PTR_ERR(sub);
 
diff --git a/drivers/staging/lustre/lustre/lov/lov_lock.c b/drivers/staging/lustre/lustre/lov/lov_lock.c
index 36c9eb7..cc08e96 100644
--- a/drivers/staging/lustre/lustre/lov/lov_lock.c
+++ b/drivers/staging/lustre/lustre/lov/lov_lock.c
@@ -168,7 +168,8 @@ static struct lov_lock *lov_lock_sub_init(const struct lu_env *env,
 			descr->cld_mode  = lock->cll_descr.cld_mode;
 			descr->cld_gid   = lock->cll_descr.cld_gid;
 			descr->cld_enq_flags = lock->cll_descr.cld_enq_flags;
-			lls->sub_index = i;
+
+			lls->sub_index = lov_comp_index(index, i);
 
 			/* initialize sub lock */
 			result = lov_sublock_init(env, lock, lls);
diff --git a/drivers/staging/lustre/lustre/lov/lov_object.c b/drivers/staging/lustre/lustre/lov/lov_object.c
index 74e95b1..3b34713 100644
--- a/drivers/staging/lustre/lustre/lov/lov_object.c
+++ b/drivers/staging/lustre/lustre/lov/lov_object.c
@@ -132,6 +132,8 @@ static int lov_init_sub(const struct lu_env *env, struct lov_object *lov,
 			struct cl_object *subobj, struct lov_layout_raid0 *r0,
 			int idx)
 {
+	int stripe = lov_comp_stripe(idx);
+	int entry = lov_comp_entry(idx);
 	struct cl_object_header *hdr;
 	struct cl_object_header *subhdr;
 	struct cl_object_header *parent;
@@ -154,8 +156,9 @@ static int lov_init_sub(const struct lu_env *env, struct lov_object *lov,
 	subhdr = cl_object_header(subobj);
 
 	oinfo = lov->lo_lsm->lsm_entries[0]->lsme_oinfo[idx];
-	CDEBUG(D_INODE, DFID "@%p[%d] -> " DFID "@%p: ostid: " DOSTID " idx: %d gen: %d\n",
-	       PFID(&subhdr->coh_lu.loh_fid), subhdr, idx,
+	CDEBUG(D_INODE,
+	       DFID "@%p[%d:%d] -> " DFID "@%p: ostid: " DOSTID " ost idx: %d gen: %d\n",
+	       PFID(&subhdr->coh_lu.loh_fid), subhdr, entry, stripe,
 	       PFID(&hdr->coh_lu.loh_fid), hdr, POSTID(&oinfo->loi_oi),
 	       oinfo->loi_ost_idx, oinfo->loi_ost_gen);
 
@@ -167,9 +170,9 @@ static int lov_init_sub(const struct lu_env *env, struct lov_object *lov,
 		spin_unlock(&subhdr->coh_attr_guard);
 		subhdr->coh_nesting = hdr->coh_nesting + 1;
 		lu_object_ref_add(&subobj->co_lu, "lov-parent", lov);
-		r0->lo_sub[idx] = cl2lovsub(subobj);
-		r0->lo_sub[idx]->lso_super = lov;
-		r0->lo_sub[idx]->lso_index = idx;
+		r0->lo_sub[stripe] = cl2lovsub(subobj);
+		r0->lo_sub[stripe]->lso_super = lov;
+		r0->lo_sub[stripe]->lso_index = idx;
 		result = 0;
 	} else {
 		struct lu_object  *old_obj;
@@ -279,7 +282,8 @@ static int lov_init_raid0(const struct lu_env *env, struct lov_device *dev,
 			goto out;
 		}
 
-		result = lov_init_sub(env, lov, stripe, r0, i);
+		result = lov_init_sub(env, lov, stripe, r0,
+				      lov_comp_index(index, i));
 		if (result == -EAGAIN) { /* try again */
 			--i;
 			result = 0;
@@ -354,14 +358,15 @@ static int lov_init_released(const struct lu_env *env, struct lov_device *dev,
 static struct cl_object *lov_find_subobj(const struct lu_env *env,
 					 struct lov_object *lov,
 					 struct lov_stripe_md *lsm,
-					 int stripe_idx)
+					 int index)
 {
 	struct lov_device *dev = lu2lov_dev(lov2lu(lov)->lo_dev);
-	struct lov_oinfo *oinfo = lsm->lsm_entries[0]->lsme_oinfo[stripe_idx];
 	struct lov_thread_info *lti = lov_env_info(env);
 	struct lu_fid *ofid = &lti->lti_fid;
+	int stripe = lov_comp_stripe(index);
 	struct cl_device *subdev;
 	struct cl_object *result;
+	struct lov_oinfo *oinfo;
 	int ost_idx;
 	int rc;
 
@@ -370,6 +375,7 @@ static struct cl_object *lov_find_subobj(const struct lu_env *env,
 		goto out;
 	}
 
+	oinfo = lsm->lsm_entries[0]->lsme_oinfo[stripe];
 	ost_idx = oinfo->loi_ost_idx;
 	rc = ostid_to_fid(ofid, &oinfo->loi_oi, ost_idx);
 	if (rc) {
@@ -1291,7 +1297,8 @@ static int fiemap_for_stripe(const struct lu_env *env, struct cl_object *obj,
 	len_mapped_single_call = 0;
 
 	/* find lobsub object */
-	subobj = lov_find_subobj(env, cl2lov(obj), lsm, stripeno);
+	subobj = lov_find_subobj(env, cl2lov(obj), lsm,
+				 lov_comp_index(index, stripeno));
 	if (IS_ERR(subobj))
 		return PTR_ERR(subobj);
 	/* If the output buffer is very large and the objects have many
diff --git a/drivers/staging/lustre/lustre/lov/lov_page.c b/drivers/staging/lustre/lustre/lov/lov_page.c
index ad34fc3..e227279 100644
--- a/drivers/staging/lustre/lustre/lov/lov_page.c
+++ b/drivers/staging/lustre/lustre/lov/lov_page.c
@@ -87,10 +87,10 @@ int lov_page_init_composite(const struct lu_env *env, struct cl_object *obj,
 	rc = lov_stripe_offset(loo->lo_lsm, entry, offset, stripe, &suboff);
 	LASSERT(rc == 0);
 
-	lpg->lps_index = stripe;
+	lpg->lps_index = lov_comp_index(entry, stripe);
 	cl_page_slice_add(page, &lpg->lps_cl, obj, index, &lov_comp_page_ops);
 
-	sub = lov_sub_get(env, lio, stripe);
+	sub = lov_sub_get(env, lio, lpg->lps_index);
 	if (IS_ERR(sub))
 		return PTR_ERR(sub);
 
diff --git a/drivers/staging/lustre/lustre/lov/lovsub_object.c b/drivers/staging/lustre/lustre/lov/lovsub_object.c
index cd7806b..ca7c8a0 100644
--- a/drivers/staging/lustre/lustre/lov/lovsub_object.c
+++ b/drivers/staging/lustre/lustre/lov/lovsub_object.c
@@ -79,8 +79,8 @@ static void lovsub_object_free(const struct lu_env *env, struct lu_object *obj)
 	 * object handling in lu_object_find.
 	 */
 	if (lov) {
-		int index = 0;
-		int stripe = los->lso_index;
+		int index = lov_comp_entry(los->lso_index);
+		int stripe = lov_comp_stripe(los->lso_index);
 		struct lov_layout_raid0 *r0 = lov_r0(lov, index);
 
 		LASSERT(lov->lo_type == LLT_COMP);
@@ -107,8 +107,9 @@ static int lovsub_attr_update(const struct lu_env *env, struct cl_object *obj,
 			      const struct cl_attr *attr, unsigned int valid)
 {
 	struct lov_object *lov = cl2lovsub(obj)->lso_super;
+	struct lovsub_object *los = cl2lovsub(obj);
 
-	lov_r0(lov, 0)->lo_attr_valid = 0;
+	lov_r0(lov, lov_comp_entry(los->lso_index))->lo_attr_valid = 0;
 	return 0;
 }
 
@@ -137,7 +138,7 @@ static void lovsub_req_attr_set(const struct lu_env *env, struct cl_object *obj,
 	 * There is no OBD_MD_* flag for obdo::o_stripe_idx, so set it
 	 * unconditionally. It never changes anyway.
 	 */
-	attr->cra_oa->o_stripe_idx = subobj->lso_index;
+	attr->cra_oa->o_stripe_idx = lov_comp_stripe(subobj->lso_index);
 }
 
 static const struct cl_object_operations lovsub_ops = {
-- 
1.8.3.1


* [lustre-devel] [PATCH v2 16/33] lustre: clio: client side implementation for PFL
From: James Simmons @ 2019-01-06 22:14 UTC
  To: lustre-devel

From: Bobi Jam <bobijam@hotmail.com>

Make the client layer support composite layouts.
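
With composite layouts, each layout entry carries a byte extent (the
new lle_extent field), and extents are compared as half-open ranges by
the new lu_extent_is_overlapped() helper. As an illustration only, the
user-space sketch below shows how such an overlap test can pick the
components touched by an I/O range. struct extent, extent_overlaps()
and components_for_io() are hypothetical stand-ins, not Lustre APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for struct lu_extent: half-open byte range [start, end). */
struct extent {
	uint64_t start;
	uint64_t end;
};

/* Same test as lu_extent_is_overlapped(): two half-open ranges
 * overlap iff each one starts before the other one ends. */
static bool extent_overlaps(const struct extent *a, const struct extent *b)
{
	return a->start < b->end && b->start < a->end;
}

/*
 * Hypothetical helper: store in 'hits' the indices of the components
 * whose extents intersect the I/O range 'io'; return how many matched.
 */
static int components_for_io(const struct extent *comps, int count,
			     const struct extent *io, int *hits)
{
	int n = 0;
	int i;

	for (i = 0; i < count; i++)
		if (extent_overlaps(&comps[i], io))
			hits[n++] = i;
	return n;
}
```

Because the ranges are half-open, an I/O that starts exactly at a
component boundary only touches the component that begins there.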

A plain layout is stored in the LOV layer as a composite layout
containing a single component.
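
A plain layout can be pictured as one component whose extent runs from
offset 0 to EOF. The sketch below, modelled on the lov_lsm_entry()
lookup added by this patch, maps a file offset to a component index.
It is a user-space illustration: EOF_OFFSET is assumed to stand in for
OBD_OBJECT_EOF (all bits set), and the other names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed stand-in for OBD_OBJECT_EOF. */
#define EOF_OFFSET UINT64_MAX

struct extent {
	uint64_t start;
	uint64_t end;
};

/*
 * Offset-to-component lookup in the style of lov_lsm_entry(): return
 * the index of the component whose half-open extent contains 'offset'.
 * The EOF offset matches a component that extends to EOF. Returns -1
 * when no component covers the offset.
 */
static int lookup_component(const struct extent *comps, int count,
			    uint64_t offset)
{
	int i;

	for (i = 0; i < count; i++) {
		if ((offset >= comps[i].start && offset < comps[i].end) ||
		    (offset == EOF_OFFSET && comps[i].end == EOF_OFFSET))
			return i;
	}
	return -1;
}

/* A plain (non-PFL) layout viewed as a one-component composite. */
static const struct extent plain_layout[1] = { { 0, EOF_OFFSET } };
```

Every offset, including EOF, resolves to component 0 of plain_layout,
so plain files go through the same composite code path as PFL files.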

Signed-off-by: Jinshan Xiong <jinshan.xiong@gmail.com>
Signed-off-by: Bobi Jam <bobijam@hotmail.com>
Signed-off-by: Niu Yawei <yawei.niu@intel.com>
WC-bug-id: https://jira.whamcloud.com/browse/LU-8998
Reviewed-on: https://review.whamcloud.com/24850
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 .../lustre/include/uapi/linux/lustre/lustre_user.h |   9 +
 .../staging/lustre/lustre/lov/lov_cl_internal.h    |  25 +-
 drivers/staging/lustre/lustre/lov/lov_ea.c         |  21 +-
 drivers/staging/lustre/lustre/lov/lov_internal.h   |  10 +-
 drivers/staging/lustre/lustre/lov/lov_io.c         | 301 +++++++++++----------
 drivers/staging/lustre/lustre/lov/lov_lock.c       |  83 +++---
 drivers/staging/lustre/lustre/lov/lov_object.c     | 283 ++++++++++---------
 drivers/staging/lustre/lustre/lov/lov_offset.c     |  12 +-
 drivers/staging/lustre/lustre/lov/lov_pack.c       |   2 +-
 drivers/staging/lustre/lustre/lov/lov_page.c       |   8 +-
 drivers/staging/lustre/lustre/osc/osc_lock.c       |   7 +-
 11 files changed, 438 insertions(+), 323 deletions(-)

diff --git a/drivers/staging/lustre/include/uapi/linux/lustre/lustre_user.h b/drivers/staging/lustre/include/uapi/linux/lustre/lustre_user.h
index bb87a6f..8ef05f5 100644
--- a/drivers/staging/lustre/include/uapi/linux/lustre/lustre_user.h
+++ b/drivers/staging/lustre/include/uapi/linux/lustre/lustre_user.h
@@ -411,6 +411,15 @@ struct lu_extent {
 	__u64	e_end;
 };
 
+#define DEXT "[ %#llx , %#llx )"
+#define PEXT(ext) (ext)->e_start, (ext)->e_end
+
+static inline bool lu_extent_is_overlapped(struct lu_extent *e1,
+					    struct lu_extent *e2)
+{
+	return e1->e_start < e2->e_end && e2->e_start < e1->e_end;
+}
+
 enum lov_comp_md_entry_flags {
 	LCME_FL_PRIMARY		= 0x00000001,   /* Not used */
 	LCME_FL_STALE		= 0x00000002,   /* Not used */
diff --git a/drivers/staging/lustre/lustre/lov/lov_cl_internal.h b/drivers/staging/lustre/lustre/lov/lov_cl_internal.h
index 952da3a..96e6636 100644
--- a/drivers/staging/lustre/lustre/lov/lov_cl_internal.h
+++ b/drivers/staging/lustre/lustre/lov/lov_cl_internal.h
@@ -224,6 +224,7 @@ struct lov_object {
 			 */
 			unsigned int lo_entry_count;
 			struct lov_layout_entry {
+				struct lu_extent lle_extent;
 				struct lov_layout_raid0 lle_raid0;
 			} *lo_entries;
 		} composite;
@@ -320,15 +321,9 @@ struct lov_thread_info {
  */
 struct lov_io_sub {
 	/**
-	 * true, iff cl_io_init() was successfully executed against
-	 * lov_io_sub::sub_io.
+	 * Linkage into a list (hanging off lov_io::lis_subios)
 	 */
-	u16			 sub_io_initialized:1,
-	/**
-	 * True, iff lov_io_sub::sub_io and lov_io_sub::sub_env weren't
-	 * allocated, but borrowed from a per-device emergency pool.
-	 */
-				 sub_borrowed:1;
+	struct list_head	sub_list;
 	/**
 	 * Linkage into a list (hanging off lov_io::lis_active) of all
 	 * sub-io's active for the current IO iteration.
@@ -340,7 +335,7 @@ struct lov_io_sub {
 	 * independently, with lov acting as a scheduler to maximize overall
 	 * throughput.
 	 */
-	struct cl_io		*sub_io;
+	struct cl_io		sub_io;
 	/**
 	 * environment, in which sub-io executes.
 	 */
@@ -351,6 +346,7 @@ struct lov_io_sub {
 	 * \see cl_env_get()
 	 */
 	u16			sub_refcheck;
+	u16			sub_reenter;
 };
 
 /**
@@ -384,14 +380,13 @@ struct lov_io {
 	 * exclusive (i.e., next offset after last byte affected by io).
 	 */
 	u64			lis_endpos;
-	int			lis_stripe_count;
-	int			lis_active_subios;
+	int			lis_nr_subios;
 
 	/**
 	 * the index of ls_single_subio in ls_subios array
 	 */
 	int			lis_single_subio_index;
-	struct cl_io		lis_single_subio;
+	struct lov_io_sub	lis_single_subio;
 
 	/**
 	 * List of active sub-io's. Active sub-io's are under the range
@@ -400,10 +395,9 @@ struct lov_io {
 	struct list_head	lis_active;
 
 	/**
-	 * size of ls_subios array, actually the highest stripe #
+	 * All sub-io's created in this lov_io.
 	 */
-	int		lis_nr_subios;
-	struct lov_io_sub *lis_subs;
+	struct list_head	lis_subios;
 };
 
 struct lov_session {
@@ -466,6 +460,7 @@ struct lu_object *lovsub_object_alloc(const struct lu_env *env,
 				      struct lu_device *dev);
 
 struct lov_stripe_md *lov_lsm_addref(struct lov_object *lov);
+int lov_lsm_entry(const struct lov_stripe_md *lsm, u64 offset);
 
 #define lov_foreach_target(lov, var)		    \
 	for (var = 0; var < lov_targets_nr(lov); ++var)
diff --git a/drivers/staging/lustre/lustre/lov/lov_ea.c b/drivers/staging/lustre/lustre/lov/lov_ea.c
index ff6b251..6e5b59e 100644
--- a/drivers/staging/lustre/lustre/lov/lov_ea.c
+++ b/drivers/staging/lustre/lustre/lov/lov_ea.c
@@ -519,9 +519,26 @@ void dump_lsm(unsigned int level, const struct lov_stripe_md *lsm)
 		struct lov_stripe_md_entry *lse = lsm->lsm_entries[i];
 
 		CDEBUG(level,
-		       ": id: %u, magic 0x%08X, stripe count %u, size %u, layout_gen %u, pool: [" LOV_POOLNAMEF "]\n",
-		       lse->lsme_id, lse->lsme_magic,
+		       DEXT ": id: %u, magic 0x%08X, stripe count %u, size %u, layout_gen %u, pool: [" LOV_POOLNAMEF "]\n",
+		       PEXT(&lse->lsme_extent), lse->lsme_id, lse->lsme_magic,
 		       lse->lsme_stripe_count, lse->lsme_stripe_size,
 		       lse->lsme_layout_gen, lse->lsme_pool_name);
 	}
 }
+
+int lov_lsm_entry(const struct lov_stripe_md *lsm, u64 offset)
+{
+	int i;
+
+	for (i = 0; i < lsm->lsm_entry_count; i++) {
+		struct lov_stripe_md_entry *lse = lsm->lsm_entries[i];
+
+		if ((offset >= lse->lsme_extent.e_start &&
+		     offset < lse->lsme_extent.e_end) ||
+		    (offset == OBD_OBJECT_EOF &&
+		     lse->lsme_extent.e_end == OBD_OBJECT_EOF))
+			return i;
+	}
+
+	return -1;
+}
diff --git a/drivers/staging/lustre/lustre/lov/lov_internal.h b/drivers/staging/lustre/lustre/lov/lov_internal.h
index ef47c67..29325ff 100644
--- a/drivers/staging/lustre/lustre/lov/lov_internal.h
+++ b/drivers/staging/lustre/lustre/lov/lov_internal.h
@@ -81,7 +81,10 @@ static inline bool lsm_has_objects(struct lov_stripe_md *lsm)
 
 static inline unsigned int lov_comp_index(int entry, int stripe)
 {
-	return stripe;
+	LASSERT(entry >= 0 && entry <= SHRT_MAX);
+	LASSERT(stripe >= 0 && stripe < USHRT_MAX);
+
+	return entry << 16 | stripe;
 }
 
 static inline int lov_comp_stripe(int index)
@@ -91,7 +94,7 @@ static inline int lov_comp_stripe(int index)
 
 static inline int lov_comp_entry(int index)
 {
-	return 0;
+	return index >> 16;
 }
 
 struct lsm_operations {
@@ -191,8 +194,7 @@ int lov_stripe_offset(struct lov_stripe_md *lsm, int index, u64 lov_off,
 u64 lov_size_to_stripe(struct lov_stripe_md *lsm, int index, u64 file_size,
 		       int stripeno);
 int lov_stripe_intersects(struct lov_stripe_md *lsm, int index, int stripeno,
-			  u64 start, u64 end,
-			  u64 *obd_start, u64 *obd_end);
+			  struct lu_extent *ext, u64 *obd_start, u64 *obd_end);
 int lov_stripe_number(struct lov_stripe_md *lsm, int index, u64 lov_off);
 pgoff_t lov_stripe_pgoff(struct lov_stripe_md *lsm, int index,
 			 pgoff_t stripe_index, int stripe);
diff --git a/drivers/staging/lustre/lustre/lov/lov_io.c b/drivers/staging/lustre/lustre/lov/lov_io.c
index 635e5a6..d9b2a81 100644
--- a/drivers/staging/lustre/lustre/lov/lov_io.c
+++ b/drivers/staging/lustre/lustre/lov/lov_io.c
@@ -43,24 +43,46 @@
 /** \addtogroup lov
  *  @{
  */
+
+static inline struct lov_io_sub *lov_sub_alloc(struct lov_io *lio, int index)
+{
+	struct lov_io_sub *sub;
+
+	if (lio->lis_nr_subios == 0) {
+		LASSERT(lio->lis_single_subio_index == -1);
+		sub = &lio->lis_single_subio;
+		lio->lis_single_subio_index = index;
+		memset(sub, 0, sizeof(*sub));
+	} else {
+		sub = kzalloc(sizeof(*sub), GFP_KERNEL);
+	}
+
+	if (sub) {
+		INIT_LIST_HEAD(&sub->sub_list);
+		INIT_LIST_HEAD(&sub->sub_linkage);
+		sub->sub_subio_index = index;
+	}
+
+	return sub;
+}
+
+static inline void lov_sub_free(struct lov_io *lio, struct lov_io_sub *sub)
+{
+	if (sub->sub_subio_index == lio->lis_single_subio_index) {
+		LASSERT(sub == &lio->lis_single_subio);
+		lio->lis_single_subio_index = -1;
+	} else {
+		kfree(sub);
+	}
+}
+
 static void lov_io_sub_fini(const struct lu_env *env, struct lov_io *lio,
 			    struct lov_io_sub *sub)
 {
-	if (sub->sub_io) {
-		if (sub->sub_io_initialized) {
-			cl_io_fini(sub->sub_env, sub->sub_io);
-			sub->sub_io_initialized = 0;
-			lio->lis_active_subios--;
-		}
-		if (sub->sub_subio_index == lio->lis_single_subio_index)
-			lio->lis_single_subio_index = -1;
-		else if (!sub->sub_borrowed)
-			kfree(sub->sub_io);
-		sub->sub_io = NULL;
-	}
-	if (!IS_ERR_OR_NULL(sub->sub_env)) {
-		if (!sub->sub_borrowed)
-			cl_env_put(sub->sub_env, &sub->sub_refcheck);
+	cl_io_fini(sub->sub_env, &sub->sub_io);
+
+	if (sub->sub_env && !IS_ERR(sub->sub_env)) {
+		cl_env_put(sub->sub_env, &sub->sub_refcheck);
 		sub->sub_env = NULL;
 	}
 }
@@ -74,46 +96,24 @@ static int lov_io_sub_init(const struct lu_env *env, struct lov_io *lio,
 	struct cl_io      *io  = lio->lis_cl.cis_io;
 	int index = lov_comp_entry(sub->sub_subio_index);
 	int stripe = lov_comp_stripe(sub->sub_subio_index);
-	int rc;
+	int rc = 0;
 
-	LASSERT(!sub->sub_io);
 	LASSERT(!sub->sub_env);
-	LASSERT(sub->sub_subio_index < lio->lis_stripe_count);
 
 	if (unlikely(!lov_r0(lov, index)->lo_sub[stripe]))
 		return -EIO;
 
-	sub->sub_io_initialized = 0;
-	sub->sub_borrowed = 0;
-
 	/* obtain new environment */
 	sub->sub_env = cl_env_get(&sub->sub_refcheck);
-	if (IS_ERR(sub->sub_env)) {
+	if (IS_ERR(sub->sub_env))
 		rc = PTR_ERR(sub->sub_env);
-		goto fini_lov_io;
-	}
-
-	/*
-	 * First sub-io. Use ->lis_single_subio to
-	 * avoid dynamic allocation.
-	 */
-	if (lio->lis_active_subios == 0) {
-		sub->sub_io = &lio->lis_single_subio;
-		lio->lis_single_subio_index = stripe;
-	} else {
-		sub->sub_io = kzalloc(sizeof(*sub->sub_io),
-				      GFP_NOFS);
-		if (!sub->sub_io) {
-			rc = -ENOMEM;
-			goto fini_lov_io;
-		}
-	}
 
 	sub_obj = lovsub2cl(lov_r0(lov, index)->lo_sub[stripe]);
-	sub_io = sub->sub_io;
+	sub_io = &sub->sub_io;
 
 	sub_io->ci_obj = sub_obj;
 	sub_io->ci_result = 0;
+
 	sub_io->ci_parent = io;
 	sub_io->ci_lockreq = io->ci_lockreq;
 	sub_io->ci_type = io->ci_type;
@@ -121,31 +121,42 @@ static int lov_io_sub_init(const struct lu_env *env, struct lov_io *lio,
 	sub_io->ci_noatime = io->ci_noatime;
 
 	rc = cl_io_sub_init(sub->sub_env, sub_io, io->ci_type, sub_obj);
-	if (rc >= 0) {
-		lio->lis_active_subios++;
-		sub->sub_io_initialized = 1;
-		rc = 0;
-	}
-fini_lov_io:
-	if (rc)
+	if (rc < 0)
 		lov_io_sub_fini(env, lio, sub);
+
 	return rc;
 }
 
 struct lov_io_sub *lov_sub_get(const struct lu_env *env,
 			       struct lov_io *lio, int index)
 {
-	int rc;
-	struct lov_io_sub *sub = &lio->lis_subs[index];
+	struct lov_io_sub *sub;
+	int rc = 0;
 
-	LASSERT(index < lio->lis_stripe_count);
+	list_for_each_entry(sub, &lio->lis_subios, sub_list) {
+		if (sub->sub_subio_index == index) {
+			rc = 1;
+			break;
+		}
+	}
+
+	if (rc == 0) {
+		sub = lov_sub_alloc(lio, index);
+		if (!sub) {
+			rc = -ENOMEM;
+			goto out;
+		}
 
-	if (!sub->sub_io_initialized) {
-		sub->sub_subio_index = index;
 		rc = lov_io_sub_init(env, lio, sub);
-	} else {
-		rc = 0;
+		if (rc < 0) {
+			lov_sub_free(lio, sub);
+			goto out;
+		}
+
+		list_add_tail(&sub->sub_list, &lio->lis_subios);
+		lio->lis_nr_subios++;
 	}
+out:
 	if (rc < 0)
 		sub = ERR_PTR(rc);
 
@@ -162,6 +173,7 @@ static int lov_page_index(const struct cl_page *page)
 	const struct cl_page_slice *slice;
 
 	slice = cl_page_at(page, &lov_device_type);
+	LASSERT(slice);
 	LASSERT(slice->cpl_obj);
 
 	return cl2lov_page(slice)->lps_index;
@@ -170,28 +182,13 @@ static int lov_page_index(const struct cl_page *page)
 static int lov_io_subio_init(const struct lu_env *env, struct lov_io *lio,
 			     struct cl_io *io)
 {
-	struct lov_stripe_md *lsm;
-	int result;
-
 	LASSERT(lio->lis_object);
-	lsm = lio->lis_object->lo_lsm;
 
-	/*
-	 * Need to be optimized, we can't afford to allocate a piece of memory
-	 * when writing a page. -jay
-	 */
-	lio->lis_subs = kcalloc(lsm->lsm_entries[0]->lsme_stripe_count,
-				sizeof(lio->lis_subs[0]),
-				GFP_KERNEL);
-	if (lio->lis_subs) {
-		lio->lis_nr_subios = lio->lis_stripe_count;
-		lio->lis_single_subio_index = -1;
-		lio->lis_active_subios = 0;
-		result = 0;
-	} else {
-		result = -ENOMEM;
-	}
-	return result;
+	INIT_LIST_HEAD(&lio->lis_subios);
+	lio->lis_single_subio_index = -1;
+	lio->lis_nr_subios = 0;
+
+	return 0;
 }
 
 static int lov_io_slice_init(struct lov_io *lio, struct lov_object *obj,
@@ -200,7 +197,7 @@ static int lov_io_slice_init(struct lov_io *lio, struct lov_object *obj,
 	io->ci_result = 0;
 	lio->lis_object = obj;
 
-	lio->lis_stripe_count = obj->lo_lsm->lsm_entries[0]->lsme_stripe_count;
+	LASSERT(obj->lo_lsm);
 
 	switch (io->ci_type) {
 	case CIT_READ:
@@ -272,14 +269,21 @@ static void lov_io_fini(const struct lu_env *env, const struct cl_io_slice *ios)
 {
 	struct lov_io *lio = cl2lov_io(env, ios);
 	struct lov_object *lov = cl2lov(ios->cis_obj);
-	int i;
 
-	if (lio->lis_subs) {
-		for (i = 0; i < lio->lis_nr_subios; i++)
-			lov_io_sub_fini(env, lio, &lio->lis_subs[i]);
-		kvfree(lio->lis_subs);
-		lio->lis_nr_subios = 0;
+	LASSERT(list_empty(&lio->lis_active));
+
+	while (!list_empty(&lio->lis_subios)) {
+		struct lov_io_sub *sub = list_entry(lio->lis_subios.next,
+						    struct lov_io_sub,
+						    sub_list);
+
+		list_del_init(&sub->sub_list);
+		lio->lis_nr_subios--;
+
+		lov_io_sub_fini(env, lio, sub);
+		lov_sub_free(lio, sub);
 	}
+	LASSERT(lio->lis_nr_subios == 0);
 
 	LASSERT(atomic_read(&lov->lo_active_ios) > 0);
 	if (atomic_dec_and_test(&lov->lo_active_ios))
@@ -287,12 +291,13 @@ static void lov_io_fini(const struct lu_env *env, const struct cl_io_slice *ios)
 }
 
 static void lov_io_sub_inherit(struct lov_io_sub *sub, struct lov_io *lio,
-			       int stripe, loff_t start, loff_t end)
+			       loff_t start, loff_t end)
 {
-	struct cl_io *io = sub->sub_io;
+	struct cl_io *io = &sub->sub_io;
 	struct lov_stripe_md *lsm = lio->lis_object->lo_lsm;
 	struct cl_io *parent = lio->lis_cl.cis_io;
 	int index = lov_comp_entry(sub->sub_subio_index);
+	int stripe = lov_comp_stripe(sub->sub_subio_index);
 
 	switch (io->ci_type) {
 	case CIT_SETATTR: {
@@ -321,7 +326,7 @@ static void lov_io_sub_inherit(struct lov_io_sub *sub, struct lov_io *lio,
 	}
 	case CIT_FAULT: {
 		struct cl_object *obj = parent->ci_obj;
-		loff_t off = cl_offset(obj, parent->u.ci_fault.ft_index);
+		u64 off = cl_offset(obj, parent->u.ci_fault.ft_index);
 
 		io->u.ci_fault = parent->u.ci_fault;
 		off = lov_size_to_stripe(lsm, index, off, stripe);
@@ -373,11 +378,12 @@ static int lov_io_iter_init(const struct lu_env *env,
 	struct lov_stripe_md *lsm = lio->lis_object->lo_lsm;
 	struct lov_layout_entry *le;
 	struct lov_io_sub    *sub;
-	u64 endpos;
+	struct lu_extent ext;
 	int rc = 0;
 	int index;
 
-	endpos = lov_offset_mod(lio->lis_endpos, -1);
+	ext.e_start = lio->lis_pos;
+	ext.e_end = lio->lis_endpos;
 
 	index = 0;
 	lov_foreach_layout_entry(lio->lis_object, le) {
@@ -387,11 +393,12 @@ static int lov_io_iter_init(const struct lu_env *env,
 		u64 end;
 
 		index++;
+		if (!lu_extent_is_overlapped(&ext, &le->lle_extent))
+			continue;
 
 		for (stripe = 0; stripe < r0->lo_nr; stripe++) {
 			if (!lov_stripe_intersects(lsm, index - 1, stripe,
-						   lio->lis_pos,
-						   endpos, &start, &end))
+						   &ext, &start, &end))
 				continue;
 
 			if (unlikely(!r0->lo_sub[stripe])) {
@@ -411,10 +418,10 @@ static int lov_io_iter_init(const struct lu_env *env,
 				break;
 			}
 
-			lov_io_sub_inherit(sub, lio, stripe, start, end);
-			rc = cl_io_iter_init(sub->sub_env, sub->sub_io);
+			lov_io_sub_inherit(sub, lio, start, end);
+			rc = cl_io_iter_init(sub->sub_env, &sub->sub_io);
 			if (rc) {
-				cl_io_iter_fini(sub->sub_env, sub->sub_io);
+				cl_io_iter_fini(sub->sub_env, &sub->sub_io);
 				break;
 			}
 
@@ -437,31 +444,50 @@ static int lov_io_rw_iter_init(const struct lu_env *env,
 	u64 start = io->u.ci_rw.crw_pos;
 	struct lov_stripe_md_entry *lse;
 	unsigned long ssize;
-	loff_t next;
-	int index = 0;
+	int index;
+	u64 next;
 
 	LASSERT(io->ci_type == CIT_READ || io->ci_type == CIT_WRITE);
 
+	if (cl_io_is_append(io))
+		return lov_io_iter_init(env, ios);
+
+	index = lov_lsm_entry(lio->lis_object->lo_lsm, io->u.ci_rw.crw_pos);
+	if (index < 0) { /* non-existing layout component */
+		if (io->ci_type == CIT_READ) {
+			/* TODO: it needs to detect the next component and
+			 * then set the next pos
+			 */
+			io->ci_continue = 0;
+
+			return lov_io_iter_init(env, ios);
+		}
+
+		return -ENODATA;
+	}
+
 	lse = lov_lse(lio->lis_object, index);
 
 	ssize = lse->lsme_stripe_size;
+	lov_do_div64(start, ssize);
+	next = (start + 1) * ssize;
+	if (next <= start * ssize)
+		next = ~0ull;
+
+	LASSERT(io->u.ci_rw.crw_pos >= lse->lsme_extent.e_start);
+	next = min_t(u64, next, lse->lsme_extent.e_end);
+	next = min_t(u64, next, lio->lis_io_endpos);
+
+	io->ci_continue = next < lio->lis_io_endpos;
+	io->u.ci_rw.crw_count = next - io->u.ci_rw.crw_pos;
+	lio->lis_pos = io->u.ci_rw.crw_pos;
+	lio->lis_endpos = io->u.ci_rw.crw_pos + io->u.ci_rw.crw_count;
+
+	CDEBUG(D_VFSTRACE,
+	       "stripe: %llu chunk: [%llu, %llu) %llu\n",
+	       (u64)start, lio->lis_pos, lio->lis_endpos,
+	       (u64)lio->lis_io_endpos);
 
-	/* fast path for common case. */
-	if (lio->lis_nr_subios != 1 && !cl_io_is_append(io)) {
-		lov_do_div64(start, ssize);
-		next = (start + 1) * ssize;
-		if (next <= start * ssize)
-			next = ~0ull;
-
-		io->ci_continue = next < lio->lis_io_endpos;
-		io->u.ci_rw.crw_count = min_t(loff_t, lio->lis_io_endpos,
-					      next) - io->u.ci_rw.crw_pos;
-		lio->lis_pos    = io->u.ci_rw.crw_pos;
-		lio->lis_endpos = io->u.ci_rw.crw_pos + io->u.ci_rw.crw_count;
-		CDEBUG(D_VFSTRACE, "stripe: %llu chunk: [%llu, %llu) %llu\n",
-		       (__u64)start, lio->lis_pos, lio->lis_endpos,
-		       (__u64)lio->lis_io_endpos);
-	}
 	/*
 	 * XXX The following call should be optimized: we know, that
 	 * [lio->lis_pos, lio->lis_endpos) intersects with exactly one stripe.
@@ -477,12 +503,12 @@ static int lov_io_call(const struct lu_env *env, struct lov_io *lio,
 	int rc = 0;
 
 	list_for_each_entry(sub, &lio->lis_active, sub_linkage) {
-		rc = iofunc(sub->sub_env, sub->sub_io);
+		rc = iofunc(sub->sub_env, &sub->sub_io);
 		if (rc)
 			break;
 
 		if (parent->ci_result == 0)
-			parent->ci_result = sub->sub_io->ci_result;
+			parent->ci_result = sub->sub_io.ci_result;
 	}
 	return rc;
 }
@@ -539,13 +565,13 @@ static void lov_io_end(const struct lu_env *env, const struct cl_io_slice *ios)
 	struct lov_io_sub *sub;
 
 	list_for_each_entry(sub, &lio->lis_active, sub_linkage) {
-		lov_io_end_wrapper(sub->sub_env, sub->sub_io);
+		lov_io_end_wrapper(sub->sub_env, &sub->sub_io);
 
 		parent->u.ci_data_version.dv_data_version +=
-			sub->sub_io->u.ci_data_version.dv_data_version;
+			sub->sub_io.u.ci_data_version.dv_data_version;
 
 		if (!parent->ci_result)
-			parent->ci_result = sub->sub_io->ci_result;
+			parent->ci_result = sub->sub_io.ci_result;
 	}
 }
 
@@ -581,12 +607,18 @@ static int lov_io_read_ahead(const struct lu_env *env,
 	unsigned int pps; /* pages per stripe */
 	struct lov_io_sub *sub;
 	pgoff_t ra_end;
+	u64 offset;
 	u64 suboff;
 	int stripe;
-	int index = 0;
+	int index;
 	int rc;
 
-	stripe = lov_stripe_number(loo->lo_lsm, index, cl_offset(obj, start));
+	offset = cl_offset(obj, start);
+	index = lov_lsm_entry(loo->lo_lsm, offset);
+	if (index < 0)
+		return -ENODATA;
+
+	stripe = lov_stripe_number(loo->lo_lsm, index, offset);
 
 	r0 = lov_r0(loo, index);
 	if (unlikely(!r0->lo_sub[stripe]))
@@ -596,8 +628,8 @@ static int lov_io_read_ahead(const struct lu_env *env,
 	if (IS_ERR(sub))
 		return PTR_ERR(sub);
 
-	lov_stripe_offset(loo->lo_lsm, index, cl_offset(obj, start), stripe, &suboff);
-	rc = cl_io_read_ahead(sub->sub_env, sub->sub_io,
+	lov_stripe_offset(loo->lo_lsm, index, offset, stripe, &suboff);
+	rc = cl_io_read_ahead(sub->sub_env, &sub->sub_io,
 			      cl_index(lovsub2cl(r0->lo_sub[stripe]), suboff),
 			      ra);
 
@@ -623,8 +655,8 @@ static int lov_io_read_ahead(const struct lu_env *env,
 	pps = lov_lse(loo, index)->lsme_stripe_size >> PAGE_SHIFT;
 
 	CDEBUG(D_READA,
-	       DFID " max_index = %lu, pps = %u, stripe_size = %u, stripe no = %u, start index = %lu\n",
-	       PFID(lu_object_fid(lov2lu(loo))), ra_end, pps,
+	       DFID " max_index = %lu, pps = %u, index = %u, stripe_size = %u, stripe no = %u, start index = %lu\n",
+	       PFID(lu_object_fid(lov2lu(loo))), ra_end, pps, index,
 	       lov_lse(loo, index)->lsme_stripe_size, stripe, start);
 
 	/* never exceed the end of the stripe */
@@ -659,20 +691,17 @@ static int lov_io_submit(const struct lu_env *env,
 	int index;
 	int rc = 0;
 
-	if (lio->lis_active_subios == 1) {
+	if (lio->lis_nr_subios == 1) {
 		int idx = lio->lis_single_subio_index;
 
-		LASSERT(idx < lio->lis_nr_subios);
 		sub = lov_sub_get(env, lio, idx);
 		LASSERT(!IS_ERR(sub));
-		LASSERT(sub->sub_io == &lio->lis_single_subio);
-		rc = cl_io_submit_rw(sub->sub_env, sub->sub_io,
+		LASSERT(sub == &lio->lis_single_subio);
+		rc = cl_io_submit_rw(sub->sub_env, &sub->sub_io,
 				     crt, queue);
 		return rc;
 	}
 
-	LASSERT(lio->lis_subs);
-
 	cl_page_list_init(plist);
 	while (qin->pl_nr > 0) {
 		struct cl_2queue *cl2q = &lov_env_info(env)->lti_cl2q;
@@ -693,7 +722,7 @@ static int lov_io_submit(const struct lu_env *env,
 
 		sub = lov_sub_get(env, lio, index);
 		if (!IS_ERR(sub)) {
-			rc = cl_io_submit_rw(sub->sub_env, sub->sub_io,
+			rc = cl_io_submit_rw(sub->sub_env, &sub->sub_io,
 					     crt, cl2q);
 		} else {
 			rc = PTR_ERR(sub);
@@ -724,20 +753,17 @@ static int lov_io_commit_async(const struct lu_env *env,
 	struct cl_page *page;
 	int rc = 0;
 
-	if (lio->lis_active_subios == 1) {
+	if (lio->lis_nr_subios == 1) {
 		int idx = lio->lis_single_subio_index;
 
-		LASSERT(idx < lio->lis_nr_subios);
 		sub = lov_sub_get(env, lio, idx);
 		LASSERT(!IS_ERR(sub));
-		LASSERT(sub->sub_io == &lio->lis_single_subio);
-		rc = cl_io_commit_async(sub->sub_env, sub->sub_io, queue,
+		LASSERT(sub == &lio->lis_single_subio);
+		rc = cl_io_commit_async(sub->sub_env, &sub->sub_io, queue,
 					from, to, cb);
 		return rc;
 	}
 
-	LASSERT(lio->lis_subs);
-
 	cl_page_list_init(plist);
 	while (queue->pl_nr > 0) {
 		int stripe_to = to;
@@ -761,7 +787,7 @@ static int lov_io_commit_async(const struct lu_env *env,
 
 		sub = lov_sub_get(env, lio, index);
 		if (!IS_ERR(sub)) {
-			rc = cl_io_commit_async(sub->sub_env, sub->sub_io,
+			rc = cl_io_commit_async(sub->sub_env, &sub->sub_io,
 						plist, from, stripe_to, cb);
 		} else {
 			rc = PTR_ERR(sub);
@@ -797,7 +823,8 @@ static int lov_io_fault_start(const struct lu_env *env,
 	sub = lov_sub_get(env, lio, lov_page_index(fio->ft_page));
 	if (IS_ERR(sub))
 		return PTR_ERR(sub);
-	sub->sub_io->u.ci_fault.ft_nob = fio->ft_nob;
+	sub->sub_io.u.ci_fault.ft_nob = fio->ft_nob;
+
 	return lov_io_start(env, ios);
 }
 
@@ -810,7 +837,7 @@ static void lov_io_fsync_end(const struct lu_env *env,
 
 	*written = 0;
 	list_for_each_entry(sub, &lio->lis_active, sub_linkage) {
-		struct cl_io *subio = sub->sub_io;
+		struct cl_io *subio = &sub->sub_io;
 
 		lov_io_end_wrapper(sub->sub_env, subio);
 
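The chunk clamping added to lov_io_rw_iter_init() above can be illustrated outside the kernel. This is a minimal user-space sketch, not Lustre code. rw_chunk and rw_chunk_compute() are hypothetical names, plain 64-bit division stands in for lov_do_div64(), and assert() stands in for LASSERT(). Each chunk ends at whichever comes first: the next stripe boundary, the end of the current layout component, or the end of the whole I/O.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type; not a Lustre structure. */
struct rw_chunk {
	uint64_t pos;	/* chunk start (inclusive) */
	uint64_t end;	/* chunk end (exclusive) */
	int cont;	/* nonzero if I/O remains after this chunk */
};

/* Sketch of the clamping logic; names are illustrative only. */
static struct rw_chunk rw_chunk_compute(uint64_t pos, uint64_t ssize,
					uint64_t comp_end, uint64_t io_end)
{
	struct rw_chunk c;
	uint64_t stripe, next;

	/* precondition, in the spirit of the LASSERT() in the patch */
	assert(ssize != 0 && pos < comp_end);

	stripe = pos / ssize;			/* lov_do_div64() stand-in */
	next = (stripe + 1) * ssize;		/* next stripe boundary */
	if (next <= stripe * ssize)		/* wrapped past 2^64 - 1 */
		next = ~0ULL;
	if (next > comp_end)			/* stay inside this component */
		next = comp_end;
	if (next > io_end)			/* and inside the whole I/O */
		next = io_end;

	c.pos = pos;
	c.end = next;
	c.cont = next < io_end;
	return c;
}
```

Consider 1 MiB stripes, a component covering [0, 4 MiB), and an I/O spanning [1.5 MiB, 6 MiB). The first chunk is [1.5 MiB, 2 MiB) and cont is set, so the caller iterates again. Clamping to the component end keeps a single chunk from straddling two PFL components that may have different stripe sizes.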
diff --git a/drivers/staging/lustre/lustre/lov/lov_lock.c b/drivers/staging/lustre/lustre/lov/lov_lock.c
index cc08e96..ba31be4 100644
--- a/drivers/staging/lustre/lustre/lov/lov_lock.c
+++ b/drivers/staging/lustre/lustre/lov/lov_lock.c
@@ -76,7 +76,7 @@ static struct lov_sublock_env *lov_sublock_env_get(const struct lu_env *env,
 		sub = lov_sub_get(env, lio, lls->sub_index);
 		if (!IS_ERR(sub)) {
 			subenv->lse_env = sub->sub_env;
-			subenv->lse_io  = sub->sub_io;
+			subenv->lse_io = &sub->sub_io;
 		} else {
 			subenv = (void *)sub;
 		}
@@ -114,52 +114,65 @@ static struct lov_lock *lov_lock_sub_init(const struct lu_env *env,
 					  const struct cl_object *obj,
 					  struct cl_lock *lock)
 {
-	struct lov_object *loo = cl2lov(obj);
-	struct lov_layout_raid0 *r0;
-	struct lov_lock	*lovlck;
+	struct lov_object *lov = cl2lov(obj);
+	struct lov_lock *lovlck;
+	struct lu_extent ext;
 	int result = 0;
-	int index = 0;
+	int index;
 	int i;
 	int nr;
 	u64 start;
 	u64 end;
-	u64 file_start;
-	u64 file_end;
-
-	CDEBUG(D_INODE, "%p: lock/io FID " DFID "/" DFID ", lock/io clobj %p/%p\n",
-	       loo, PFID(lu_object_fid(lov2lu(loo))),
-	       PFID(lu_object_fid(&obj->co_lu)),
-	       lov2cl(loo), obj);
-
-	file_start = cl_offset(lov2cl(loo), lock->cll_descr.cld_start);
-	file_end   = cl_offset(lov2cl(loo), lock->cll_descr.cld_end + 1) - 1;
-
-	r0 = lov_r0(loo, index);
-	for (i = 0, nr = 0; i < r0->lo_nr; i++) {
-		/*
-		 * XXX for wide striping smarter algorithm is desirable,
-		 * breaking out of the loop, early.
-		 */
-		if (likely(r0->lo_sub[i]) && /* spare layout */
-		    lov_stripe_intersects(loo->lo_lsm, index, i,
-					  file_start, file_end, &start, &end))
-			nr++;
+
+	ext.e_start = cl_offset(obj, lock->cll_descr.cld_start);
+	if (lock->cll_descr.cld_end == CL_PAGE_EOF)
+		ext.e_end = OBD_OBJECT_EOF;
+	else
+		ext.e_end = cl_offset(obj, lock->cll_descr.cld_end + 1);
+
+	nr = 0;
+	for (index = lov_lsm_entry(lov->lo_lsm, ext.e_start);
+	     index != -1 && index < lov->lo_lsm->lsm_entry_count; index++) {
+		struct lov_layout_raid0 *r0 = lov_r0(lov, index);
+
+		/* assume lsm entries are sorted. */
+		if (!lu_extent_is_overlapped(&ext,
+					     &lov_lse(lov, index)->lsme_extent))
+			break;
+
+		for (i = 0; i < r0->lo_nr; i++) {
+			if (likely(r0->lo_sub[i]) && /* spare layout */
+			    lov_stripe_intersects(lov->lo_lsm, index, i,
+						  &ext, &start, &end))
+				nr++;
+		}
 	}
-	LASSERT(nr > 0);
+	if (nr == 0)
+		return ERR_PTR(-EINVAL);
+
 	lovlck = kvzalloc(offsetof(struct lov_lock, lls_sub[nr]),
 				 GFP_NOFS);
 	if (!lovlck)
 		return ERR_PTR(-ENOMEM);
 
 	lovlck->lls_nr = nr;
-	for (i = 0, nr = 0; i < r0->lo_nr; ++i) {
-		if (likely(r0->lo_sub[i]) &&
-		    lov_stripe_intersects(loo->lo_lsm, index, i,
-					  file_start, file_end, &start, &end)) {
+	nr = 0;
+	for (index = lov_lsm_entry(lov->lo_lsm, ext.e_start);
+	     index < lov->lo_lsm->lsm_entry_count; index++) {
+		struct lov_layout_raid0 *r0 = lov_r0(lov, index);
+
+		/* assume lsm entries are sorted. */
+		if (!lu_extent_is_overlapped(&ext,
+					     &lov_lse(lov, index)->lsme_extent))
+			break;
+		for (i = 0; i < r0->lo_nr; ++i) {
 			struct lov_lock_sub *lls = &lovlck->lls_sub[nr];
-			struct cl_lock_descr *descr;
+			struct cl_lock_descr *descr = &lls->sub_lock.cll_descr;
 
-			descr = &lls->sub_lock.cll_descr;
+			if (unlikely(!r0->lo_sub[i]) ||
+			    !lov_stripe_intersects(lov->lo_lsm, index, i,
+						   &ext, &start, &end))
+				continue;
 
 			LASSERT(!descr->cld_obj);
 			descr->cld_obj   = lovsub2cl(r0->lo_sub[i]);
@@ -267,8 +280,8 @@ static void lov_lock_cancel(const struct lu_env *env,
 			cl_lock_cancel(subenv->lse_env, sublock);
 		} else {
 			CL_LOCK_DEBUG(D_ERROR, env, slice->cls_lock,
-				      "%s fails with %ld.\n",
-				      __func__, PTR_ERR(subenv));
+				      "%s fails with %ld.\n",
+				      __func__, PTR_ERR(subenv));
 		}
 	}
 }
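The reworked lov_lock_sub_init() does two things. It converts the lock's page-index range into a half-open byte extent. It then walks the layout components, starting from the one containing the start offset and stopping at the first component that no longer overlaps. The following is a simplified user-space sketch of that walk, under these assumptions:

- pages are 4 KiB;
- lu_extent_is_overlapped() has half-open overlap semantics;
- components are sorted by offset, as the patch's own comment assumes.

All names are hypothetical. The sketch also counts components rather than the per-stripe sub-locks that the real code counts.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12		/* assumed 4 KiB pages */
#define SKETCH_PAGE_EOF   (~0UL)	/* stand-in for CL_PAGE_EOF */
#define SKETCH_OBJECT_EOF (~0ULL)	/* stand-in for OBD_OBJECT_EOF */

struct extent { uint64_t start, end; };	/* half-open [start, end) */

/* Assumed half-open overlap test, like lu_extent_is_overlapped(). */
static int extent_overlaps(const struct extent *a, const struct extent *b)
{
	return a->start < b->end && b->start < a->end;
}

/* Convert an inclusive page-index range into a byte extent. */
static struct extent lock_to_extent(unsigned long first, unsigned long last)
{
	struct extent e;

	e.start = (uint64_t)first << SKETCH_PAGE_SHIFT;
	if (last == SKETCH_PAGE_EOF)
		e.end = SKETCH_OBJECT_EOF;
	else
		e.end = ((uint64_t)last + 1) << SKETCH_PAGE_SHIFT;
	return e;
}

/* Count sorted components touched by ext, stopping at the first miss. */
static int count_components(const struct extent *comps, int n,
			    const struct extent *ext)
{
	int i, nr = 0;

	/* find the component holding ext->start (lov_lsm_entry() stand-in) */
	for (i = 0; i < n; i++)
		if (comps[i].start <= ext->start && ext->start < comps[i].end)
			break;
	for (; i < n; i++) {
		if (!extent_overlaps(ext, &comps[i]))
			break;
		nr++;
	}
	return nr;
}
```

Take a lock covering pages 128 to 1023 over the components [0, 1 MiB), [1 MiB, 4 MiB) and [4 MiB, EOF). It touches the first two components. When the start offset falls in no component, the count is zero. The patch now reports that case as ERR_PTR(-EINVAL) instead of tripping LASSERT(nr > 0).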
diff --git a/drivers/staging/lustre/lustre/lov/lov_object.c b/drivers/staging/lustre/lustre/lov/lov_object.c
index 3b34713..a7d3068 100644
--- a/drivers/staging/lustre/lustre/lov/lov_object.c
+++ b/drivers/staging/lustre/lustre/lov/lov_object.c
@@ -130,14 +130,13 @@ static struct cl_object *lov_sub_find(const struct lu_env *env,
 
 static int lov_init_sub(const struct lu_env *env, struct lov_object *lov,
 			struct cl_object *subobj, struct lov_layout_raid0 *r0,
-			int idx)
+			struct lov_oinfo *oinfo, int idx)
 {
 	int stripe = lov_comp_stripe(idx);
 	int entry = lov_comp_entry(idx);
 	struct cl_object_header *hdr;
 	struct cl_object_header *subhdr;
 	struct cl_object_header *parent;
-	struct lov_oinfo	*oinfo;
 	int result;
 
 	if (OBD_FAIL_CHECK(OBD_FAIL_LOV_INIT)) {
@@ -155,11 +154,10 @@ static int lov_init_sub(const struct lu_env *env, struct lov_object *lov,
 	hdr    = cl_object_header(lov2cl(lov));
 	subhdr = cl_object_header(subobj);
 
-	oinfo = lov->lo_lsm->lsm_entries[0]->lsme_oinfo[idx];
 	CDEBUG(D_INODE,
 	       DFID "@%p[%d:%d] -> " DFID "@%p: ostid: " DOSTID " ost idx: %d gen: %d\n",
-	       PFID(&subhdr->coh_lu.loh_fid), subhdr, entry, stripe,
-	       PFID(&hdr->coh_lu.loh_fid), hdr, POSTID(&oinfo->loi_oi),
+	       PFID(lu_object_fid(&subobj->co_lu)), subhdr, entry, stripe,
+	       PFID(lu_object_fid(lov2lu(lov))), hdr, POSTID(&oinfo->loi_oi),
 	       oinfo->loi_ost_idx, oinfo->loi_ost_gen);
 
 	/* reuse ->coh_attr_guard to protect coh_parent change */
@@ -221,14 +219,13 @@ static int lov_page_slice_fixup(struct lov_object *lov,
 
 static int lov_init_raid0(const struct lu_env *env, struct lov_device *dev,
 			  struct lov_object *lov, int index,
-			  const struct cl_object_conf *conf,
 			  struct lov_layout_raid0 *r0)
 {
 	struct lov_stripe_md_entry *lse = lov_lse(lov, index);
-	struct cl_object *stripe;
 	struct lov_thread_info *lti = lov_env_info(env);
 	struct cl_object_conf *subconf = &lti->lti_stripe_conf;
 	struct lu_fid *ofid = &lti->lti_fid;
+	struct cl_object *stripe;
 	int result;
 	int psz;
 	int i;
@@ -238,20 +235,21 @@ static int lov_init_raid0(const struct lu_env *env, struct lov_device *dev,
 	LASSERT(r0->lo_nr <= lov_targets_nr(dev));
 
 	r0->lo_sub = kvzalloc(r0->lo_nr * sizeof(r0->lo_sub[0]),
-				     GFP_NOFS);
+			      GFP_KERNEL);
 	if (!r0->lo_sub)
 		return -ENOMEM;
 
 	psz = 0;
 	result = 0;
-	subconf->coc_inode = conf->coc_inode;
+	memset(subconf, 0, sizeof(*subconf));
+
 	/*
 	 * Create stripe cl_objects.
 	 */
 	for (i = 0; i < r0->lo_nr; ++i) {
 		struct lov_oinfo *oinfo = lse->lsme_oinfo[i];
+		int ost_idx = oinfo->loi_ost_idx;
 		struct cl_device *subdev;
-		int ost_idx;
 
 		if (lov_oinfo_is_dummy(oinfo))
 			continue;
@@ -261,7 +259,6 @@ static int lov_init_raid0(const struct lu_env *env, struct lov_device *dev,
 		if (result != 0)
 			goto out;
 
-		ost_idx = oinfo->loi_ost_idx;
 		if (!dev->ld_target[ost_idx]) {
 			CERROR("%s: OST %04x is not initialized\n",
 			       lov2obd(dev->ld_lov)->obd_name, ost_idx);
@@ -282,7 +279,7 @@ static int lov_init_raid0(const struct lu_env *env, struct lov_device *dev,
 			goto out;
 		}
 
-		result = lov_init_sub(env, lov, stripe, r0,
+		result = lov_init_sub(env, lov, stripe, r0, oinfo,
 				      lov_comp_index(index, i));
 		if (result == -EAGAIN) { /* try again */
 			--i;
@@ -309,15 +306,17 @@ static int lov_init_composite(const struct lu_env *env, struct lov_device *dev,
 			      union lov_layout_state *state)
 {
 	struct lov_layout_composite *comp = &state->composite;
-	unsigned int entry_count = 1;
+	unsigned int entry_count;
 	unsigned int psz = 0;
 	int result = 0;
 	int i;
 
+	LASSERT(lsm->lsm_entry_count > 0);
 	LASSERT(!lov->lo_lsm);
 	lov->lo_lsm = lsm_addref(lsm);
 	lov->lo_layout_invalid = true;
 
+	entry_count = lsm->lsm_entry_count;
 	comp->lo_entry_count = entry_count;
 
 	comp->lo_entries = kcalloc(entry_count, sizeof(*comp->lo_entries),
@@ -328,8 +327,8 @@ static int lov_init_composite(const struct lu_env *env, struct lov_device *dev,
 	for (i = 0; i < entry_count; i++) {
 		struct lov_layout_entry *le = &comp->lo_entries[i];
 
-		result = lov_init_raid0(env, dev, lov, i, conf,
-					&le->lle_raid0);
+		le->lle_extent = lsm->lsm_entries[i]->lsme_extent;
+		result = lov_init_raid0(env, dev, lov, i, &le->lle_raid0);
 		if (result < 0)
 			break;
 
@@ -364,31 +363,30 @@ static struct cl_object *lov_find_subobj(const struct lu_env *env,
 	struct lov_thread_info *lti = lov_env_info(env);
 	struct lu_fid *ofid = &lti->lti_fid;
 	int stripe = lov_comp_stripe(index);
+	int entry = lov_comp_entry(index);
+	struct cl_object *result = NULL;
 	struct cl_device *subdev;
-	struct cl_object *result;
 	struct lov_oinfo *oinfo;
 	int ost_idx;
 	int rc;
 
-	if (lov->lo_type != LLT_COMP) {
-		result = NULL;
+	if (lov->lo_type != LLT_COMP)
+		goto out;
+
+	if (entry >= lsm->lsm_entry_count ||
+	    stripe >= lsm->lsm_entries[entry]->lsme_stripe_count)
 		goto out;
-	}
 
-	oinfo = lsm->lsm_entries[0]->lsme_oinfo[stripe];
+	oinfo = lsm->lsm_entries[entry]->lsme_oinfo[stripe];
 	ost_idx = oinfo->loi_ost_idx;
 	rc = ostid_to_fid(ofid, &oinfo->loi_oi, ost_idx);
-	if (rc) {
-		result = NULL;
+	if (rc)
 		goto out;
-	}
 
 	subdev = lovsub2cl_dev(dev->ld_target[ost_idx]);
 	result = lov_sub_find(env, subdev, ofid, NULL);
 out:
-	if (!result)
-		result = ERR_PTR(-EINVAL);
-	return result;
+	return result ? result : ERR_PTR(-EINVAL);
 }
 
 static int lov_delete_empty(const struct lu_env *env, struct lov_object *lov,
@@ -567,8 +565,8 @@ static int lov_print_composite(const struct lu_env *env, void *cookie,
 	for (i = 0; i < lsm->lsm_entry_count; i++) {
 		struct lov_stripe_md_entry *lse = lsm->lsm_entries[i];
 
-		(*p)(env, cookie, ": { 0x%08X, %u, %u, %u, %u }\n",
-		     lse->lsme_magic,
+		(*p)(env, cookie, DEXT ": { 0x%08X, %u, %u, %u, %u }\n",
+		     PEXT(&lse->lsme_extent), lse->lsme_magic,
 		     lse->lsme_id, lse->lsme_layout_gen,
 		     lse->lsme_stripe_count, lse->lsme_stripe_size);
 		lov_print_raid0(env, cookie, p, lov_r0(lov, i));
@@ -584,10 +582,10 @@ static int lov_print_released(const struct lu_env *env, void *cookie,
 	struct lov_stripe_md	*lsm = lov->lo_lsm;
 
 	(*p)(env, cookie,
-	     "released: %s, lsm{%p 0x%08X %d %u %u}:\n",
+	     "released: %s, lsm{%p 0x%08X %d %u}:\n",
 	     lov->lo_layout_invalid ? "invalid" : "valid", lsm,
 	     lsm->lsm_magic, atomic_read(&lsm->lsm_refc),
-	     lsm->lsm_entries[0]->lsme_stripe_count, lsm->lsm_layout_gen);
+	     lsm->lsm_layout_gen);
 	return 0;
 }
 
@@ -601,6 +599,7 @@ static int lov_print_released(const struct lu_env *env, void *cookie,
 static int lov_attr_get_empty(const struct lu_env *env, struct cl_object *obj,
 			      struct cl_attr *attr)
 {
+	attr->cat_blocks = 0;
 	return 0;
 }
 
@@ -659,16 +658,18 @@ static int lov_attr_get_composite(const struct lu_env *env,
 	int result = 0;
 	int index = 0;
 
-	attr->cat_blocks = 0;
 	attr->cat_size = 0;
+	attr->cat_blocks = 0;
 	lov_foreach_layout_entry(lov, entry) {
 		struct lov_layout_raid0 *r0 = &entry->lle_raid0;
 		struct cl_attr *lov_attr = &r0->lo_attr;
 
 		result = lov_attr_get_raid0(env, lov, index, r0);
-		if (result)
+		if (result != 0)
 			break;
 
+		index++;
+
 		/* merge results */
 		attr->cat_blocks += lov_attr->cat_blocks;
 		if (attr->cat_size < lov_attr->cat_size)
@@ -742,13 +743,15 @@ static enum lov_layout_type lov_type(struct lov_stripe_md *lsm)
 	if (!lsm)
 		return LLT_EMPTY;
 
-	if (lsm->lsm_magic == LOV_MAGIC_COMP_V1)
-		return LLT_EMPTY;
-
 	if (lsm->lsm_is_released)
 		return LLT_RELEASED;
 
-	return LLT_COMP;
+	if (lsm->lsm_magic == LOV_MAGIC_V1 ||
+	    lsm->lsm_magic == LOV_MAGIC_V3 ||
+	    lsm->lsm_magic == LOV_MAGIC_COMP_V1)
+		return LLT_COMP;
+
+	return LLT_EMPTY;
 }
 
 static inline void lov_conf_freeze(struct lov_object *lov)
@@ -926,6 +929,8 @@ int lov_object_init(const struct lu_env *env, struct lu_object *obj,
 				   cconf->u.coc_layout.lb_len);
 		if (IS_ERR(lsm))
 			return PTR_ERR(lsm);
+
+		dump_lsm(D_INODE, lsm);
 	}
 
 	/* no locking is necessary, as object is being created */
@@ -1090,8 +1095,8 @@ int lov_lock_init(const struct lu_env *env, struct cl_object *obj,
  * over which the mapping is spread
  *
  * \param lsm [in]		striping information for the file
- * \param fm_start [in]		logical start of mapping
- * \param fm_end [in]		logical end of mapping
+ * \param index [in]		stripe component index
+ * \param ext [in]		logical extent of mapping
  * \param start_stripe [in]	starting stripe of the mapping
  * \param stripe_count [out]	the number of stripes across which to map is
  *				returned
@@ -1099,7 +1104,7 @@ int lov_lock_init(const struct lu_env *env, struct cl_object *obj,
  * \retval last_stripe		return the last stripe of the mapping
  */
 static int fiemap_calc_last_stripe(struct lov_stripe_md *lsm, int index,
-				   u64 fm_start, u64 fm_end,
+				   struct lu_extent *ext,
 				   int start_stripe, int *stripe_count)
 {
 	struct lov_stripe_md_entry *lsme = lsm->lsm_entries[index];
@@ -1108,7 +1113,7 @@ static int fiemap_calc_last_stripe(struct lov_stripe_md *lsm, int index,
 	u64 obd_end;
 	int i, j;
 
-	if (fm_end - fm_start >
+	if (ext->e_end - ext->e_start >
 	    lsme->lsme_stripe_size * lsme->lsme_stripe_count) {
 		last_stripe = (start_stripe < 1 ? lsme->lsme_stripe_count - 1 :
 						  start_stripe - 1);
@@ -1116,7 +1121,7 @@ static int fiemap_calc_last_stripe(struct lov_stripe_md *lsm, int index,
 	} else {
 		for (j = 0, i = start_stripe; j < lsme->lsme_stripe_count;
 		     i = (i + 1) % lsme->lsme_stripe_count, j++) {
-			if (lov_stripe_intersects(lsm, index, i, fm_start, fm_end,
+			if (lov_stripe_intersects(lsm, index, i, ext,
 						  &obd_start, &obd_end) == 0)
 				break;
 		}
@@ -1170,13 +1175,13 @@ static void fiemap_prepare_and_copy_exts(struct fiemap *fiemap,
  *
  * \param fiemap [in]		fiemap request header
  * \param lsm [in]		striping information for the file
- * \param fm_start [in]		logical start of mapping
- * \param fm_end [in]		logical end of mapping
+ * \param index [in]		stripe component index
+ * \param ext [in]		logical extent of mapping
  * \param start_stripe [out]	starting stripe will be returned in this
  */
 static u64 fiemap_calc_fm_end_offset(struct fiemap *fiemap,
 				     struct lov_stripe_md *lsm,
-				     int index, u64 fm_start, u64 fm_end,
+				     int index, struct lu_extent *ext,
 				     int *start_stripe)
 {
 	struct lov_stripe_md_entry *lsme = lsm->lsm_entries[index];
@@ -1209,7 +1214,7 @@ static u64 fiemap_calc_fm_end_offset(struct fiemap *fiemap,
 	 * If we have finished mapping on previous device, shift logical
 	 * offset to start of next device
 	 */
-	if (lov_stripe_intersects(lsm, index, stripe_no, fm_start, fm_end,
+	if (lov_stripe_intersects(lsm, index, stripe_no, ext,
 				  &lun_start, &lun_end) != 0 &&
 	    local_end < lun_end) {
 		fm_end_offset = local_end;
@@ -1227,16 +1232,15 @@ static u64 fiemap_calc_fm_end_offset(struct fiemap *fiemap,
 
 struct fiemap_state {
 	struct fiemap		*fs_fm;
-	u64			fs_start;
+	struct lu_extent	fs_ext;
 	u64			fs_length;
-	u64			fs_end;
 	u64			fs_end_offset;
 	int			fs_cur_extent;
 	int			fs_cnt_need;
 	int			fs_start_stripe;
 	int			fs_last_stripe;
 	bool			fs_device_done;
-	bool			fs_finish;
+	bool			fs_finish_stripe;
 	bool			fs_enough;
 };
 
@@ -1264,8 +1268,7 @@ static int fiemap_for_stripe(const struct lu_env *env, struct cl_object *obj,
 
 	fs->fs_device_done = false;
 	/* Find out range of mapping on this stripe */
-	if ((lov_stripe_intersects(lsm, index, stripeno,
-				   fs->fs_start, fs->fs_end,
+	if ((lov_stripe_intersects(lsm, index, stripeno, &fs->fs_ext,
 				   &lun_start, &obd_object_end)) == 0)
 		return 0;
 
@@ -1279,16 +1282,7 @@ static int fiemap_for_stripe(const struct lu_env *env, struct cl_object *obj,
 	if (fs->fs_end_offset != 0 && stripeno == fs->fs_start_stripe)
 		lun_start = fs->fs_end_offset;
 
-	lun_end = fs->fs_length;
-	if (lun_end != ~0ULL) {
-		/* Handle fs->fs_start + fs->fs_length overflow */
-		if (fs->fs_start + fs->fs_length < fs->fs_start)
-			fs->fs_length = ~0ULL - fs->fs_start;
-		lun_end = lov_size_to_stripe(lsm, index,
-					     fs->fs_start + fs->fs_length,
-					     stripeno);
-	}
-
+	lun_end = lov_size_to_stripe(lsm, index, fs->fs_ext.e_end, stripeno);
 	if (lun_start == lun_end)
 		return 0;
 
@@ -1316,6 +1310,11 @@ static int fiemap_for_stripe(const struct lu_env *env, struct cl_object *obj,
 		lun_start += len_mapped_single_call;
 		fs->fs_fm->fm_length = req_fm_len - len_mapped_single_call;
 		req_fm_len = fs->fs_fm->fm_length;
+		/**
+		 * If we have collected enough extents, request one more to
+		 * check whether we happened to map the last available
+		 * extent, so that FIEMAP_EXTENT_LAST can be set.
+		 */
 		fs->fs_fm->fm_extent_count = fs->fs_enough ?
 					     1 : fs->fs_cnt_need;
 		fs->fs_fm->fm_mapped_extents = 0;
@@ -1357,7 +1356,7 @@ static int fiemap_for_stripe(const struct lu_env *env, struct cl_object *obj,
 			 */
 			if (stripeno == fs->fs_last_stripe) {
 				fiemap->fm_mapped_extents = 0;
-				fs->fs_finish = true;
+				fs->fs_finish_stripe = true;
 				goto obj_put;
 			}
 			break;
@@ -1366,7 +1365,6 @@ static int fiemap_for_stripe(const struct lu_env *env, struct cl_object *obj,
 			 * We've collected enough extents and there are
 			 * more extents after it.
 			 */
-			fs->fs_finish = true;
 			goto obj_put;
 		}
 
@@ -1410,7 +1408,7 @@ static int fiemap_for_stripe(const struct lu_env *env, struct cl_object *obj,
 	} while (!ost_done && !ost_eof);
 
 	if (stripeno == fs->fs_last_stripe)
-		fs->fs_finish = true;
+		fs->fs_finish_stripe = true;
 obj_put:
 	cl_object_put(env, subobj);
 
@@ -1436,26 +1434,35 @@ static int lov_object_fiemap(const struct lu_env *env, struct cl_object *obj,
 			     struct fiemap *fiemap, size_t *buflen)
 {
 	unsigned int buffer_size = FIEMAP_BUFFER_SIZE;
+	struct lov_stripe_md_entry *lsme;
 	struct fiemap *fm_local = NULL;
 	struct lov_stripe_md *lsm;
-	int rc = 0;
-	int entry = 0;
-	int cur_stripe;
+	loff_t whole_start;
+	loff_t whole_end;
+	int entry;
+	int start_entry;
+	int end_entry;
+	int cur_stripe = 0;
 	int stripe_count;
+	int rc = 0;
 	struct fiemap_state fs = { NULL };
 
 	lsm = lov_lsm_addref(cl2lov(obj));
 	if (!lsm)
 		return -ENODATA;
 
-	/**
-	 * If the stripe_count > 1 and the application does not understand
-	 * DEVICE_ORDER flag, it cannot interpret the extents correctly.
-	 */
-	if (lsm->lsm_entries[0]->lsme_stripe_count > 1 &&
-	    !(fiemap->fm_flags & FIEMAP_FLAG_DEVICE_ORDER)) {
-		rc = -ENOTSUPP;
-		goto out;
+	if (!(fiemap->fm_flags & FIEMAP_FLAG_DEVICE_ORDER)) {
+		/**
+		 * If the entry count > 1 or stripe_count > 1 and the
+		 * application does not understand the DEVICE_ORDER flag,
+		 * it cannot interpret the extents correctly.
+		 */
+		if (lsm->lsm_entry_count > 1 ||
+		    (lsm->lsm_entry_count == 1 &&
+		     lsm->lsm_entries[0]->lsme_stripe_count > 1)) {
+			rc = -ENOTSUPP;
+			goto out_lsm;
+		}
 	}
 
 	if (lsm->lsm_is_released) {
@@ -1478,49 +1485,19 @@ static int lov_object_fiemap(const struct lu_env *env, struct cl_object *obj,
 				FIEMAP_EXTENT_UNKNOWN | FIEMAP_EXTENT_LAST;
 		}
 		rc = 0;
-		goto out;
+		goto out_lsm;
 	}
 
+	/* Shrink buffer_size if fm_extent_count extents need less space. */
 	if (fiemap_count_to_size(fiemap->fm_extent_count) < buffer_size)
 		buffer_size = fiemap_count_to_size(fiemap->fm_extent_count);
 
 	fm_local = kvzalloc(buffer_size, GFP_NOFS);
 	if (!fm_local) {
 		rc = -ENOMEM;
-		goto out;
-	}
-	fs.fs_fm = fm_local;
-	fs.fs_cnt_need = fiemap_size_to_count(buffer_size);
-
-	fs.fs_start = fiemap->fm_start;
-	/* fs_start is beyond the end of the file */
-	if (fs.fs_start > fmkey->lfik_oa.o_size) {
-		rc = -EINVAL;
-		goto out;
-	}
-	/* Calculate start stripe, last stripe and length of mapping */
-	fs.fs_start_stripe = lov_stripe_number(lsm, 0, fs.fs_start);
-	fs.fs_end = (fs.fs_length == ~0ULL) ? fmkey->lfik_oa.o_size :
-					      fs.fs_start + fs.fs_length - 1;
-	/* If fs_length != ~0ULL but fs_start+fs_length-1 exceeds file size */
-	if (fs.fs_end > fmkey->lfik_oa.o_size) {
-		fs.fs_end = fmkey->lfik_oa.o_size;
-		fs.fs_length = fs.fs_end - fs.fs_start;
+		goto out_lsm;
 	}
 
-	fs.fs_last_stripe = fiemap_calc_last_stripe(lsm, entry,
-						    fs.fs_start, fs.fs_end,
-						    fs.fs_start_stripe,
-						    &stripe_count);
-	fs.fs_end_offset = fiemap_calc_fm_end_offset(fiemap, lsm, entry,
-						     fs.fs_start, fs.fs_end,
-						     &fs.fs_start_stripe);
-	if (fs.fs_end_offset == -EINVAL) {
-		rc = -EINVAL;
-		goto out;
-	}
-
-
 	/**
 	 * Requested extent count exceeds the fiemap buffer size, shrink our
 	 * ambition.
@@ -1530,27 +1507,88 @@ static int lov_object_fiemap(const struct lu_env *env, struct cl_object *obj,
 	if (!fiemap->fm_extent_count)
 		fs.fs_cnt_need = 0;
 
-	fs.fs_finish = false;
 	fs.fs_enough = false;
 	fs.fs_cur_extent = 0;
+	fs.fs_fm = fm_local;
+	fs.fs_cnt_need = fiemap_size_to_count(buffer_size);
+
+	whole_start = fiemap->fm_start;
+	/* whole_start is beyond the end of the file */
+	if (whole_start > fmkey->lfik_oa.o_size) {
+		rc = -EINVAL;
+		goto out_fm_local;
+	}
+	whole_end = (fiemap->fm_length == OBD_OBJECT_EOF) ?
+		     fmkey->lfik_oa.o_size :
+		     whole_start + fiemap->fm_length - 1;
+	/**
+	 * If fiemap->fm_length != OBD_OBJECT_EOF but whole_end exceeds file
+	 * size
+	 */
+	if (whole_end > fmkey->lfik_oa.o_size)
+		whole_end = fmkey->lfik_oa.o_size;
+
+	start_entry = lov_lsm_entry(lsm, whole_start);
+	end_entry = lov_lsm_entry(lsm, whole_end);
+	if (end_entry == -1)
+		end_entry = lsm->lsm_entry_count - 1;
+
+	if (start_entry == -1 || end_entry == -1) {
+		rc = -EINVAL;
+		goto out_fm_local;
+	}
+
+	for (entry = start_entry; entry <= end_entry; entry++) {
+		lsme = lsm->lsm_entries[entry];
+
+		if (entry == start_entry)
+			fs.fs_ext.e_start = whole_start;
+		else
+			fs.fs_ext.e_start = lsme->lsme_extent.e_start;
+		if (entry == end_entry)
+			fs.fs_ext.e_end = whole_end;
+		else
+			fs.fs_ext.e_end = lsme->lsme_extent.e_end - 1;
+		fs.fs_length = fs.fs_ext.e_end - fs.fs_ext.e_start + 1;
+
+		/* Calculate start stripe, last stripe and length of mapping */
+		fs.fs_start_stripe = lov_stripe_number(lsm, entry,
+						       fs.fs_ext.e_start);
+		fs.fs_last_stripe = fiemap_calc_last_stripe(lsm, entry,
+							    &fs.fs_ext,
+							    fs.fs_start_stripe,
+							    &stripe_count);
+		fs.fs_end_offset = fiemap_calc_fm_end_offset(fiemap, lsm, entry,
+							     &fs.fs_ext,
+							     &fs.fs_start_stripe);
+		/* Check each stripe */
+		for (cur_stripe = fs.fs_start_stripe; stripe_count > 0;
+		     --stripe_count,
+		     cur_stripe = (cur_stripe + 1) % lsme->lsme_stripe_count) {
+			rc = fiemap_for_stripe(env, obj, lsm, fiemap, buflen,
+					       fmkey, entry, cur_stripe, &fs);
+			if (rc < 0)
+				goto out_fm_local;
+			if (fs.fs_enough)
+				goto finish;
+			if (fs.fs_finish_stripe)
+				break;
+		} /* for each stripe */
+	} /* for covering layout component */
 
-	/* Check each stripe */
-	for (cur_stripe = fs.fs_start_stripe; stripe_count > 0;
-	     --stripe_count,
-	     cur_stripe = (cur_stripe + 1) %
-			  lsm->lsm_entries[0]->lsme_stripe_count) {
-		rc = fiemap_for_stripe(env, obj, lsm, fiemap, buflen,
-				       fmkey, 0, cur_stripe, &fs);
-		if (rc < 0)
-			goto out;
-		if (fs.fs_finish)
-			break;
-	} /* for each stripe */
+	/*
+	 * All components have been traversed; step @entry back to the last
+	 * component, which the last-stripe check below relies on.
+	 */
+	entry--;
+finish:
 	/*
 	 * Indicate that we are returning device offsets unless file just has
 	 * single stripe
 	 */
-	if (lsm->lsm_entries[0]->lsme_stripe_count > 1)
+	if (lsm->lsm_entry_count > 1 ||
+	    (lsm->lsm_entry_count == 1 &&
+	     lsm->lsm_entries[0]->lsme_stripe_count > 1))
 		fiemap->fm_flags |= FIEMAP_FLAG_DEVICE_ORDER;
 
 	if (!fiemap->fm_extent_count)
@@ -1565,8 +1603,9 @@ static int lov_object_fiemap(const struct lu_env *env, struct cl_object *obj,
 							FIEMAP_EXTENT_LAST;
 skip_last_device_calc:
 	fiemap->fm_mapped_extents = fs.fs_cur_extent;
-out:
+out_fm_local:
 	kvfree(fm_local);
+out_lsm:
 	lov_lsm_put(lsm);
 	return rc;
 }
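The per-component loop in lov_object_fiemap() can be modelled in isolation. This is a minimal standalone sketch, not Lustre code: `struct comp_extent`, `find_comp()` (standing in for lov_lsm_entry()) and `split_request()` are hypothetical names, and `COMP_EOF` stands in for OBD_OBJECT_EOF. Component extents are assumed to be half-open `[start, end)` while the request range is inclusive, which is why interior component ends are reduced by one.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for OBD_OBJECT_EOF: an open-ended last component. */
#define COMP_EOF UINT64_MAX

/* Hypothetical component extent, half-open: [start, end). */
struct comp_extent {
	uint64_t start;
	uint64_t end;
};

/* Inclusive byte range handed to one component. */
struct sub_range {
	int entry;
	uint64_t start;
	uint64_t end;
};

/* Index of the component containing @off, or -1 if none does. */
static int find_comp(const struct comp_extent *comps, int count, uint64_t off)
{
	for (int i = 0; i < count; i++) {
		if (off >= comps[i].start &&
		    (comps[i].end == COMP_EOF || off < comps[i].end))
			return i;
	}
	return -1;
}

/*
 * Split the inclusive request [whole_start, whole_end] into per-component
 * inclusive ranges. Returns the number of ranges written, or -1 on error.
 */
static int split_request(const struct comp_extent *comps, int count,
			 uint64_t whole_start, uint64_t whole_end,
			 struct sub_range *out)
{
	int start_entry = find_comp(comps, count, whole_start);
	int end_entry = find_comp(comps, count, whole_end);
	int n = 0;

	assert(count > 0);
	/* An end past every component falls back to the last component. */
	if (end_entry == -1)
		end_entry = count - 1;
	if (start_entry == -1)
		return -1;

	for (int entry = start_entry; entry <= end_entry; entry++) {
		out[n].entry = entry;
		/* First component starts at the request, others at their own start. */
		out[n].start = (entry == start_entry) ? whole_start :
							 comps[entry].start;
		/* Last component ends at the request; others use their exclusive end - 1. */
		out[n].end = (entry == end_entry) ? whole_end :
						      comps[entry].end - 1;
		n++;
	}
	return n;
}
```

For components [0, 100), [100, 400) and [400, EOF), a request of [50, 999] splits into [50, 99], [100, 399] and [400, 999]; a request of [120, 130] stays inside component 1.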
diff --git a/drivers/staging/lustre/lustre/lov/lov_offset.c b/drivers/staging/lustre/lustre/lov/lov_offset.c
index 513f1fd..ab02c34 100644
--- a/drivers/staging/lustre/lustre/lov/lov_offset.c
+++ b/drivers/staging/lustre/lustre/lov/lov_offset.c
@@ -225,9 +225,19 @@ u64 lov_size_to_stripe(struct lov_stripe_md *lsm, int index, u64 file_size,
  * stripe does intersect with the lov extent.
  */
 int lov_stripe_intersects(struct lov_stripe_md *lsm, int index, int stripeno,
-			  u64 start, u64 end, u64 *obd_start, u64 *obd_end)
+			  struct lu_extent *ext, u64 *obd_start, u64 *obd_end)
 {
+	struct lov_stripe_md_entry *entry = lsm->lsm_entries[index];
 	int start_side, end_side;
+	u64 start, end;
+
+	if (!lu_extent_is_overlapped(ext, &entry->lsme_extent))
+		return 0;
+
+	start = max_t(u64, ext->e_start, entry->lsme_extent.e_start);
+	end = min_t(u64, ext->e_end, entry->lsme_extent.e_end);
+	if (end != OBD_OBJECT_EOF)
+		end--;
 
 	start_side = lov_stripe_offset(lsm, index, start, stripeno, obd_start);
 	end_side = lov_stripe_offset(lsm, index, end, stripeno, obd_end);
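lov_stripe_intersects() now clamps the requested extent to the component's extent before mapping it onto a stripe. A minimal sketch of just that clamping step, assuming both extents follow the half-open `[start, end)` convention; `struct ext`, `ext_overlap()` and `clamp_to_component()` are hypothetical names (the overlap test is written under that assumption and is not the real lu_extent_is_overlapped()), and `EXT_EOF` stands in for OBD_OBJECT_EOF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EXT_EOF UINT64_MAX /* hypothetical stand-in for OBD_OBJECT_EOF */

/* Hypothetical extent, half-open: [start, end). */
struct ext {
	uint64_t start;
	uint64_t end;
};

/* Half-open overlap test. */
static bool ext_overlap(const struct ext *a, const struct ext *b)
{
	return a->start < b->end && b->start < a->end;
}

/*
 * Clamp @req to @comp. On overlap, store the first byte and the last byte
 * (inclusive, or EXT_EOF when open-ended) and return true.
 */
static bool clamp_to_component(const struct ext *req, const struct ext *comp,
			       uint64_t *first, uint64_t *last)
{
	assert(req->start <= req->end && comp->start <= comp->end);
	if (!ext_overlap(req, comp))
		return false;

	*first = req->start > comp->start ? req->start : comp->start;
	*last = req->end < comp->end ? req->end : comp->end;
	/* Exclusive end -> inclusive last byte, unless it means "to EOF". */
	if (*last != EXT_EOF)
		(*last)--;
	return true;
}
```

A request [50, 200) against component [100, 400) yields bytes 100 through 199, while an open-ended request against the open-ended last component keeps EXT_EOF as its end instead of decrementing it.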
diff --git a/drivers/staging/lustre/lustre/lov/lov_pack.c b/drivers/staging/lustre/lustre/lov/lov_pack.c
index 8b7a572..ba7c488 100644
--- a/drivers/staging/lustre/lustre/lov/lov_pack.c
+++ b/drivers/staging/lustre/lustre/lov/lov_pack.c
@@ -189,8 +189,8 @@ int lov_free_memmd(struct lov_stripe_md **lsmp)
 	int refc;
 
 	*lsmp = NULL;
-	LASSERT(atomic_read(&lsm->lsm_refc) > 0);
 	refc = atomic_dec_return(&lsm->lsm_refc);
+	LASSERT(refc >= 0);
 	if (refc == 0)
 		lsm_free(lsm);
 
diff --git a/drivers/staging/lustre/lustre/lov/lov_page.c b/drivers/staging/lustre/lustre/lov/lov_page.c
index e227279..f53379a 100644
--- a/drivers/staging/lustre/lustre/lov/lov_page.c
+++ b/drivers/staging/lustre/lustre/lov/lov_page.c
@@ -76,10 +76,16 @@ int lov_page_init_composite(const struct lu_env *env, struct cl_object *obj,
 	u64 offset;
 	u64	    suboff;
 	int		stripe;
-	int entry = 0;
+	int entry;
 	int		rc;
 
 	offset = cl_offset(obj, index);
+	entry = lov_lsm_entry(loo->lo_lsm, offset);
+	if (entry < 0) {
+		/* non-existing layout component */
+		lov_page_init_empty(env, obj, page, index);
+		return 0;
+	}
 
 	r0 = lov_r0(loo, entry);
 	stripe = lov_stripe_number(loo->lo_lsm, entry, offset);
diff --git a/drivers/staging/lustre/lustre/osc/osc_lock.c b/drivers/staging/lustre/lustre/osc/osc_lock.c
index 4cc813d..824c655 100644
--- a/drivers/staging/lustre/lustre/osc/osc_lock.c
+++ b/drivers/staging/lustre/lustre/osc/osc_lock.c
@@ -1128,10 +1128,6 @@ static void osc_lock_set_writer(const struct lu_env *env,
 		io_start = cl_index(obj, io->u.ci_rw.crw_pos);
 		io_end = cl_index(obj, io->u.ci_rw.crw_pos +
 				  io->u.ci_rw.crw_count - 1);
-		if (cl_io_is_append(io)) {
-			io_start = 0;
-			io_end = CL_PAGE_EOF;
-		}
 	} else {
 		LASSERT(cl_io_is_mkwrite(io));
 		io_start = io->u.ci_fault.ft_index;
@@ -1139,7 +1135,8 @@ static void osc_lock_set_writer(const struct lu_env *env,
 	}
 
 	if (descr->cld_mode >= CLM_WRITE &&
-	    descr->cld_start <= io_start && descr->cld_end >= io_end) {
+	    (cl_io_is_append(io) ||
+	     (descr->cld_start <= io_start && descr->cld_end >= io_end))) {
 		struct osc_io *oio = osc_env_io(env);
 
 		/* There must be only one lock to match the write region */
-- 
1.8.3.1


* [lustre-devel] [PATCH v2 17/33] lustre: clio: getstripe support comp layout
From: James Simmons @ 2019-01-06 22:14 UTC
  To: lustre-devel

From: Niu Yawei <yawei.niu@intel.com>

Make {get/set}stripe support composite layouts.

Signed-off-by: Bobi Jam <bobijam@hotmail.com>
Signed-off-by: Niu Yawei <yawei.niu@intel.com>
WC-bug-id: https://jira.whamcloud.com/browse/LU-8998
Reviewed-on: https://review.whamcloud.com/24851
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@gmail.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/llite/dir.c          |  33 +++-
 drivers/staging/lustre/lustre/llite/file.c         |  25 ++-
 .../staging/lustre/lustre/llite/llite_internal.h   |   2 +
 drivers/staging/lustre/lustre/llite/xattr.c        |  70 +++++----
 drivers/staging/lustre/lustre/lov/lov_internal.h   |  24 +++
 drivers/staging/lustre/lustre/lov/lov_object.c     |   3 +-
 drivers/staging/lustre/lustre/lov/lov_pack.c       | 172 ++++++++++++---------
 7 files changed, 203 insertions(+), 126 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/dir.c b/drivers/staging/lustre/lustre/llite/dir.c
index 0a7330d..57acb7b 100644
--- a/drivers/staging/lustre/lustre/llite/dir.c
+++ b/drivers/staging/lustre/lustre/llite/dir.c
@@ -522,6 +522,15 @@ int ll_dir_setstripe(struct inode *inode, struct lov_user_md *lump,
 			lum_size = sizeof(struct lov_user_md_v3);
 			break;
 		}
+		case LOV_USER_MAGIC_COMP_V1: {
+			if (lump->lmm_magic !=
+			    cpu_to_le32(LOV_USER_MAGIC_COMP_V1))
+				lustre_swab_lov_comp_md_v1(
+					(struct lov_comp_md_v1 *)lump);
+			lum_size = le32_to_cpu(
+				((struct lov_comp_md_v1 *)lump)->lcm_size);
+			break;
+		}
 		case LMV_USER_MAGIC: {
 			if (lump->lmm_magic != cpu_to_le32(LMV_USER_MAGIC))
 				lustre_swab_lmv_user_md(
@@ -562,7 +571,9 @@ int ll_dir_setstripe(struct inode *inode, struct lov_user_md *lump,
 	 * LOV_USER_MAGIC_V3 have the same initial fields so we do not
 	 * need to make the distinction between the 2 versions
 	 */
-	if (set_default && mgc->u.cli.cl_mgc_mgsexp) {
+	if (set_default && mgc->u.cli.cl_mgc_mgsexp &&
+	    (!lump || le32_to_cpu(lump->lmm_magic) == LOV_USER_MAGIC_V1 ||
+	     le32_to_cpu(lump->lmm_magic) == LOV_USER_MAGIC_V3)) {
 		char *param = NULL;
 		char *buf;
 
@@ -577,23 +588,23 @@ int ll_dir_setstripe(struct inode *inode, struct lov_user_md *lump,
 		buf += strlen(buf);
 
 		/* Set root stripesize */
-		sprintf(buf, ".stripesize=%u",
-			lump ? le32_to_cpu(lump->lmm_stripe_size) : 0);
+		snprintf(buf, MGS_PARAM_MAXLEN, ".stripesize=%u",
+			 lump ? le32_to_cpu(lump->lmm_stripe_size) : 0);
 		rc = ll_send_mgc_param(mgc->u.cli.cl_mgc_mgsexp, param);
 		if (rc)
 			goto end;
 
 		/* Set root stripecount */
-		sprintf(buf, ".stripecount=%hd",
-			lump ? le16_to_cpu(lump->lmm_stripe_count) : 0);
+		snprintf(buf, MGS_PARAM_MAXLEN, ".stripecount=%hd",
+			 lump ? le16_to_cpu(lump->lmm_stripe_count) : 0);
 		rc = ll_send_mgc_param(mgc->u.cli.cl_mgc_mgsexp, param);
 		if (rc)
 			goto end;
 
 		/* Set root stripeoffset */
-		sprintf(buf, ".stripeoffset=%hd",
-			lump ? le16_to_cpu(lump->lmm_stripe_offset) :
-			(typeof(lump->lmm_stripe_offset))(-1));
+		snprintf(buf, MGS_PARAM_MAXLEN, ".stripeoffset=%hd",
+			 lump ? le16_to_cpu(lump->lmm_stripe_offset) :
+				(typeof(lump->lmm_stripe_offset))(-1));
 		rc = ll_send_mgc_param(mgc->u.cli.cl_mgc_mgsexp, param);
 
 end:
@@ -669,6 +680,10 @@ int ll_dir_getstripe(struct inode *inode, void **plmm, int *plmm_size,
 		if (cpu_to_le32(LOV_MAGIC) != LOV_MAGIC)
 			lustre_swab_lov_user_md_v3((struct lov_user_md_v3 *)lmm);
 		break;
+	case LOV_MAGIC_COMP_V1:
+		if (cpu_to_le32(LOV_MAGIC) != LOV_MAGIC)
+			lustre_swab_lov_comp_md_v1((struct lov_comp_md_v1 *)lmm);
+		break;
 	case LMV_MAGIC_V1:
 		if (cpu_to_le32(LMV_MAGIC) != LMV_MAGIC)
 			lustre_swab_lmv_mds_md((union lmv_mds_md *)lmm);
@@ -1217,6 +1232,8 @@ static long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 
 		int set_default = 0;
 
+		BUILD_BUG_ON(sizeof(struct lov_user_md_v3) <=
+			     sizeof(struct lov_comp_md_v1));
 		LASSERT(sizeof(lumv3) == sizeof(*lumv3p));
 		LASSERT(sizeof(lumv3.lmm_objects[0]) ==
 			sizeof(lumv3p->lmm_objects[0]));
diff --git a/drivers/staging/lustre/lustre/llite/file.c b/drivers/staging/lustre/lustre/llite/file.c
index fae0111..24a0948 100644
--- a/drivers/staging/lustre/lustre/llite/file.c
+++ b/drivers/staging/lustre/lustre/llite/file.c
@@ -1430,8 +1430,9 @@ int ll_lov_getstripe_ea_info(struct inode *inode, const char *filename,
 
 	lmm = req_capsule_server_sized_get(&req->rq_pill, &RMF_MDT_MD, lmmsize);
 
-	if ((lmm->lmm_magic != cpu_to_le32(LOV_MAGIC_V1)) &&
-	    (lmm->lmm_magic != cpu_to_le32(LOV_MAGIC_V3))) {
+	if (lmm->lmm_magic != cpu_to_le32(LOV_MAGIC_V1) &&
+	    lmm->lmm_magic != cpu_to_le32(LOV_MAGIC_V3) &&
+	    lmm->lmm_magic != cpu_to_le32(LOV_MAGIC_COMP_V1)) {
 		rc = -EPROTO;
 		goto out;
 	}
@@ -1444,9 +1445,13 @@ int ll_lov_getstripe_ea_info(struct inode *inode, const char *filename,
 	if (cpu_to_le32(LOV_MAGIC) != LOV_MAGIC) {
 		int stripe_count;
 
-		stripe_count = le16_to_cpu(lmm->lmm_stripe_count);
-		if (le32_to_cpu(lmm->lmm_pattern) & LOV_PATTERN_F_RELEASED)
-			stripe_count = 0;
+		if (lmm->lmm_magic == cpu_to_le32(LOV_MAGIC_V1) ||
+		    lmm->lmm_magic == cpu_to_le32(LOV_MAGIC_V3)) {
+			stripe_count = le16_to_cpu(lmm->lmm_stripe_count);
+			if (le32_to_cpu(lmm->lmm_pattern) &
+			    LOV_PATTERN_F_RELEASED)
+				stripe_count = 0;
+		}
 
 		/* if function called for directory - we should
 		 * avoid swab not existent lsm objects
@@ -1463,6 +1468,8 @@ int ll_lov_getstripe_ea_info(struct inode *inode, const char *filename,
 				lustre_swab_lov_user_md_objects(
 				 ((struct lov_user_md_v3 *)lmm)->lmm_objects,
 				 stripe_count);
+		} else if (lmm->lmm_magic == cpu_to_le32(LOV_MAGIC_COMP_V1)) {
+			lustre_swab_lov_comp_md_v1((struct lov_comp_md_v1 *)lmm);
 		}
 	}
 
@@ -1534,14 +1541,6 @@ static int ll_lov_setstripe(struct inode *inode, struct file *file,
 	rc = ll_lov_setstripe_ea_info(inode, file->f_path.dentry, flags, klum,
 				      lum_size);
 	cl_lov_delay_create_clear(&file->f_flags);
-	if (rc == 0) {
-		__u32 gen;
-
-		put_user(0, &lum->lmm_stripe_count);
-
-		ll_layout_refresh(inode, &gen);
-		rc = ll_file_getstripe(inode, (struct lov_user_md __user *)arg);
-	}
 
 	kfree(klum);
 	return rc;
diff --git a/drivers/staging/lustre/lustre/llite/llite_internal.h b/drivers/staging/lustre/lustre/llite/llite_internal.h
index 48424a4..e3f5450 100644
--- a/drivers/staging/lustre/lustre/llite/llite_internal.h
+++ b/drivers/staging/lustre/lustre/llite/llite_internal.h
@@ -927,6 +927,8 @@ static inline ssize_t ll_lov_user_md_size(const struct lov_user_md *lum)
 
 		return lov_user_md_size(lum->lmm_stripe_count,
 					LOV_USER_MAGIC_SPECIFIC);
+	case LOV_USER_MAGIC_COMP_V1:
+		return ((struct lov_comp_md_v1 *)lum)->lcm_size;
 	}
 	return -EINVAL;
 }
diff --git a/drivers/staging/lustre/lustre/llite/xattr.c b/drivers/staging/lustre/lustre/llite/xattr.c
index aeaa04a..0670ed3 100644
--- a/drivers/staging/lustre/lustre/llite/xattr.c
+++ b/drivers/staging/lustre/lustre/llite/xattr.c
@@ -194,40 +194,53 @@ static int get_hsm_state(struct inode *inode, u32 *hus_states)
 
 static int ll_adjust_lum(struct inode *inode, struct lov_user_md *lump)
 {
+	struct lov_comp_md_v1 *comp_v1 = (struct lov_comp_md_v1 *)lump;
+	struct lov_user_md *v1 = lump;
+	bool need_clear_release = false;
+	bool release_checked = false;
+	bool is_composite = false;
+	u16 entry_count = 1;
 	int rc = 0;
+	int i;
 
 	if (!lump)
 		return 0;
 
-	/* Attributes that are saved via getxattr will always have
-	 * the stripe_offset as 0.  Instead, the MDS should be
-	 * allowed to pick the starting OST index.   b=17846
-	 */
-	if (lump->lmm_stripe_offset == 0)
-		lump->lmm_stripe_offset = -1;
+	if (lump->lmm_magic == LOV_USER_MAGIC_COMP_V1) {
+		entry_count = comp_v1->lcm_entry_count;
+		is_composite = true;
+	}
+
+	for (i = 0; i < entry_count; i++) {
+		if (lump->lmm_magic == LOV_USER_MAGIC_COMP_V1) {
+			void *ptr = comp_v1;
 
-	/* Avoid anyone directly setting the RELEASED flag. */
-	if (lump->lmm_pattern & LOV_PATTERN_F_RELEASED) {
-		/* Only if we have a released flag check if the file
-		 * was indeed archived.
+			ptr += comp_v1->lcm_entries[i].lcme_offset;
+			v1 = (struct lov_user_md *)ptr;
+		}
+
+		/* Attributes that are saved via getxattr will always have
+		 * the stripe_offset as 0.  Instead, the MDS should be
+		 * allowed to pick the starting OST index.   b=17846
 		 */
-		u32 state = HS_NONE;
-
-		rc = get_hsm_state(inode, &state);
-		if (rc)
-			return rc;
-
-		if (!(state & HS_ARCHIVED)) {
-			CDEBUG(D_VFSTRACE,
-			       "hus_states state = %x, pattern = %x\n",
-				state, lump->lmm_pattern);
-			/*
-			 * Here the state is: real file is not
-			 * archived but user is requesting to set
-			 * the RELEASED flag so we mask off the
-			 * released flag from the request
-			 */
-			lump->lmm_pattern ^= LOV_PATTERN_F_RELEASED;
+		if (!is_composite && v1->lmm_stripe_offset == 0)
+			v1->lmm_stripe_offset = -1;
+
+		/* Avoid anyone directly setting the RELEASED flag. */
+		if (v1->lmm_pattern & LOV_PATTERN_F_RELEASED) {
+			if (!release_checked) {
+				u32 state = HS_NONE;
+
+				rc = get_hsm_state(inode, &state);
+				if (rc)
+					return rc;
+
+				if (!(state & HS_ARCHIVED))
+					need_clear_release = true;
+				release_checked = true;
+			}
+			if (need_clear_release)
+				v1->lmm_pattern ^= LOV_PATTERN_F_RELEASED;
 		}
 	}
 
@@ -495,6 +508,9 @@ static ssize_t ll_getxattr_lov(struct inode *inode, void *buf, size_t buf_size)
 		 * recognizing layout gen as stripe offset when the
 		 * file is restored. See LU-2809.
 		 */
+		if (((struct lov_mds_md *)buf)->lmm_magic == LOV_MAGIC_COMP_V1)
+			goto out_env;
+
 		((struct lov_mds_md *)buf)->lmm_layout_gen = 0;
 out_env:
 		cl_env_put(env, &refcheck);
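ll_adjust_lum() now applies the same two fix-ups to every component of a composite layout and queries the HSM state at most once for the whole layout. The sketch below models only that control flow; `struct comp_md`, `adjust_components()`, the `fake_hsm_query()` stub (standing in for get_hsm_state()) and the `PATTERN_F_RELEASED` value are assumptions for illustration. It clears the flag with `&= ~` where the patch uses `^=`; the two are equivalent here because the flag is known to be set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed stand-in for LOV_PATTERN_F_RELEASED. */
#define PATTERN_F_RELEASED 0x80000000u

/* Hypothetical per-component view of a lov_user_md. */
struct comp_md {
	uint32_t pattern;
	int16_t stripe_offset;
};

/* Hypothetical HSM query: <0 on error, 0 if not archived, 1 if archived. */
typedef int (*hsm_query_fn)(void *ctx);

/* Demo stub standing in for get_hsm_state(); counts its calls. */
struct fake_hsm {
	int archived;
	int calls;
};

static int fake_hsm_query(void *ctx)
{
	struct fake_hsm *h = ctx;

	h->calls++;
	return h->archived;
}

/*
 * Reset a zero stripe offset (plain layouts only) and strip RELEASED from
 * every flagged component unless the file is archived, querying once.
 */
static int adjust_components(struct comp_md *comps, int count, bool composite,
			     hsm_query_fn query, void *ctx)
{
	bool checked = false;
	bool clear_released = false;

	assert(count > 0);
	for (int i = 0; i < count; i++) {
		if (!composite && comps[i].stripe_offset == 0)
			comps[i].stripe_offset = -1; /* let the MDS pick the OST */

		if (!(comps[i].pattern & PATTERN_F_RELEASED))
			continue;
		if (!checked) {
			int rc = query(ctx);

			if (rc < 0)
				return rc;
			clear_released = (rc == 0);
			checked = true;
		}
		if (clear_released)
			comps[i].pattern &= ~PATTERN_F_RELEASED;
	}
	return 0;
}
```

With three components, two of them flagged RELEASED, and a file that is not archived, the stub is called once and both flags are cleared.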
diff --git a/drivers/staging/lustre/lustre/lov/lov_internal.h b/drivers/staging/lustre/lustre/lov/lov_internal.h
index 29325ff..9c0a4f7 100644
--- a/drivers/staging/lustre/lustre/lov/lov_internal.h
+++ b/drivers/staging/lustre/lustre/lov/lov_internal.h
@@ -74,6 +74,30 @@ struct lov_stripe_md {
 	struct lov_stripe_md_entry *lsm_entries[];
 };
 
+static inline size_t lov_comp_md_size(const struct lov_stripe_md *lsm)
+{
+	struct lov_stripe_md_entry *lsme;
+	size_t size;
+	int entry;
+
+	if (lsm->lsm_magic == LOV_MAGIC_V1 || lsm->lsm_magic == LOV_MAGIC_V3)
+		return lov_mds_md_size(lsm->lsm_entries[0]->lsme_stripe_count,
+				       lsm->lsm_entries[0]->lsme_magic);
+
+	LASSERT(lsm->lsm_magic == LOV_MAGIC_COMP_V1);
+
+	size = sizeof(struct lov_comp_md_v1);
+	for (entry = 0; entry < lsm->lsm_entry_count; entry++) {
+		lsme = lsm->lsm_entries[entry];
+
+		size += sizeof(*lsme);
+		size += lov_mds_md_size(lsme->lsme_stripe_count,
+					lsme->lsme_magic);
+	}
+
+	return size;
+}
+
 static inline bool lsm_has_objects(struct lov_stripe_md *lsm)
 {
 	return lsm && !lsm->lsm_is_released;
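lov_comp_md_size() and the new lov_lsm_pack() describe a composite layout as a lov_comp_md_v1 header, a table of per-component entries, and then each component's plain v1/v3 blob, with each entry's lcme_offset pointing at its blob. A minimal sketch of that offset arithmetic; every size constant below is illustrative rather than the real UAPI struct size, and `struct comp_desc`, `lmm_blob_size()` and `comp_layout()` are hypothetical. Note that the patch's lov_comp_md_size() adds sizeof(*lsme) (the in-memory struct lov_stripe_md_entry) per component, while lov_lsm_pack() advances offsets by sizeof(*lcme); the sketch uses a single descriptor size for both.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative sizes only; real values come from the Lustre UAPI structs. */
#define COMP_HDR_SIZE   32u /* hypothetical sizeof(struct lov_comp_md_v1) */
#define COMP_ENTRY_SIZE 48u /* hypothetical per-component descriptor size */
#define LMM_V1_BASE     32u /* hypothetical sizeof(struct lov_mds_md_v1) */
#define LMM_V3_BASE     48u /* hypothetical: v1 header plus a pool name */
#define OST_OBJ_SIZE    24u /* hypothetical sizeof(struct lov_ost_data_v1) */

/* Hypothetical description of one component. */
struct comp_desc {
	int is_v3;
	unsigned int stripe_count;
};

/* Size of one plain (v1/v3) layout blob, in the spirit of lov_mds_md_size(). */
static size_t lmm_blob_size(const struct comp_desc *c)
{
	return (c->is_v3 ? LMM_V3_BASE : LMM_V1_BASE) +
	       (size_t)c->stripe_count * OST_OBJ_SIZE;
}

/*
 * Header, then @count descriptors, then the blobs back to back. Stores each
 * blob's offset in @offsets and returns the total size.
 */
static size_t comp_layout(const struct comp_desc *comps, unsigned int count,
			  size_t *offsets)
{
	size_t offset = COMP_HDR_SIZE + (size_t)count * COMP_ENTRY_SIZE;

	assert(count == 0 || (comps && offsets));
	for (unsigned int i = 0; i < count; i++) {
		offsets[i] = offset;
		offset += lmm_blob_size(&comps[i]);
	}
	return offset;
}
```

With these illustrative sizes, components of 1 and 4 stripes (v1) and 8 stripes (v3) land at offsets 176, 232 and 360, for a total of 600 bytes.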
diff --git a/drivers/staging/lustre/lustre/lov/lov_object.c b/drivers/staging/lustre/lustre/lov/lov_object.c
index a7d3068..8596c2f 100644
--- a/drivers/staging/lustre/lustre/lov/lov_object.c
+++ b/drivers/staging/lustre/lustre/lov/lov_object.c
@@ -1641,8 +1641,7 @@ static int lov_object_layout_get(const struct lu_env *env,
 		return 0;
 	}
 
-	cl->cl_size = lov_mds_md_size(lsm->lsm_entries[0]->lsme_stripe_count,
-				      lsm->lsm_magic);
+	cl->cl_size = lov_comp_md_size(lsm);
 	cl->cl_layout_gen = lsm->lsm_layout_gen;
 
 	rc = lov_lsm_pack(lsm, buf->lb_buf, buf->lb_len);
diff --git a/drivers/staging/lustre/lustre/lov/lov_pack.c b/drivers/staging/lustre/lustre/lov/lov_pack.c
index ba7c488..79d8a32 100644
--- a/drivers/staging/lustre/lustre/lov/lov_pack.c
+++ b/drivers/staging/lustre/lustre/lov/lov_pack.c
@@ -107,8 +107,8 @@ void lov_dump_lmm_v3(int level, struct lov_mds_md_v3 *lmm)
  * then return the size needed. If \a buf_size is too small then
  * return -ERANGE. Otherwise return the size of the result.
  */
-ssize_t lov_lsm_pack(const struct lov_stripe_md *lsm, void *buf,
-		     size_t buf_size)
+ssize_t lov_lsm_pack_v1v3(const struct lov_stripe_md *lsm, void *buf,
+			  size_t buf_size)
 {
 	struct lov_ost_data_v1 *lmm_objects;
 	struct lov_mds_md_v1 *lmmv1 = buf;
@@ -157,6 +157,88 @@ ssize_t lov_lsm_pack(const struct lov_stripe_md *lsm, void *buf,
 	return lmm_size;
 }
 
+ssize_t lov_lsm_pack(const struct lov_stripe_md *lsm, void *buf,
+		     size_t buf_size)
+{
+	struct lov_comp_md_v1 *lcmv1 = buf;
+	struct lov_comp_md_entry_v1 *lcme;
+	struct lov_ost_data_v1 *lmm_objects;
+	unsigned int offset;
+	unsigned int entry;
+	unsigned int size;
+	unsigned int i;
+	size_t lmm_size;
+
+	if (lsm->lsm_magic == LOV_MAGIC_V1 || lsm->lsm_magic == LOV_MAGIC_V3)
+		return lov_lsm_pack_v1v3(lsm, buf, buf_size);
+
+	lmm_size = lov_comp_md_size(lsm);
+	if (buf_size == 0)
+		return lmm_size;
+
+	if (buf_size < lmm_size)
+		return -ERANGE;
+
+	lcmv1->lcm_magic = cpu_to_le32(lsm->lsm_magic);
+	lcmv1->lcm_size = cpu_to_le32(lmm_size);
+	lcmv1->lcm_layout_gen = cpu_to_le32(lsm->lsm_layout_gen);
+	lcmv1->lcm_entry_count = cpu_to_le16(lsm->lsm_entry_count);
+
+	offset = sizeof(*lcmv1) + sizeof(*lcme) * lsm->lsm_entry_count;
+
+	for (entry = 0; entry < lsm->lsm_entry_count; entry++) {
+		struct lov_stripe_md_entry *lsme;
+		struct lov_mds_md *lmm;
+
+		lsme = lsm->lsm_entries[entry];
+		lcme = &lcmv1->lcm_entries[entry];
+
+		lcme->lcme_id = cpu_to_le32(lsme->lsme_id);
+		lcme->lcme_extent.e_start =
+			cpu_to_le64(lsme->lsme_extent.e_start);
+		lcme->lcme_extent.e_end =
+			cpu_to_le64(lsme->lsme_extent.e_end);
+		lcme->lcme_offset = cpu_to_le32(offset);
+
+		lmm = (struct lov_mds_md *)((char *)lcmv1 + offset);
+		lmm->lmm_magic = cpu_to_le32(lsme->lsme_magic);
+		/* lmm->lmm_oi not set */
+		lmm->lmm_pattern = cpu_to_le32(lsme->lsme_pattern);
+		lmm->lmm_stripe_size = cpu_to_le32(lsme->lsme_stripe_size);
+		lmm->lmm_stripe_count = cpu_to_le16(lsme->lsme_stripe_count);
+		lmm->lmm_layout_gen = cpu_to_le16(lsme->lsme_layout_gen);
+
+		if (lsme->lsme_magic == LOV_MAGIC_V3) {
+			struct lov_mds_md_v3 *lmmv3;
+
+			lmmv3 = (struct lov_mds_md_v3 *)lmm;
+
+			strlcpy(lmmv3->lmm_pool_name, lsme->lsme_pool_name,
+				sizeof(lmmv3->lmm_pool_name));
+			lmm_objects = lmmv3->lmm_objects;
+		} else {
+			lmm_objects = ((struct lov_mds_md_v1 *)lmm)->lmm_objects;
+		}
+
+		for (i = 0; i < lsme->lsme_stripe_count; i++) {
+			struct lov_oinfo *loi = lsme->lsme_oinfo[i];
+
+			ostid_cpu_to_le(&loi->loi_oi, &lmm_objects[i].l_ost_oi);
+			lmm_objects[i].l_ost_gen =
+				cpu_to_le32(loi->loi_ost_gen);
+			lmm_objects[i].l_ost_idx =
+				cpu_to_le32(loi->loi_ost_idx);
+		}
+
+		size = lov_mds_md_size(lsme->lsme_stripe_count,
+				       lsme->lsme_magic);
+		lcme->lcme_size = cpu_to_le32(size);
+		offset += size;
+	} /* for each layout component */
+
+	return lmm_size;
+}
+
 /* Find the max stripecount we should use */
 __u16 lov_get_stripecnt(struct lov_obd *lov, __u32 magic, __u16 stripe_count)
 {
@@ -227,53 +309,23 @@ int lov_getstripe(struct lov_object *obj, struct lov_stripe_md *lsm,
 		  struct lov_user_md __user *lump)
 {
 	/* we use lov_user_md_v3 because it is larger than lov_user_md_v1 */
-	struct lov_user_md_v3 lum;
 	struct lov_mds_md *lmmk;
-	u32 stripe_count;
 	ssize_t lmm_size;
 	size_t lmmk_size;
-	size_t lum_size;
-	int rc;
+	int rc = 0;
 
 	if (!lsm)
 		return -ENODATA;
 
-	if (lsm->lsm_magic != LOV_MAGIC_V1 && lsm->lsm_magic != LOV_MAGIC_V3) {
+	if (lsm->lsm_magic != LOV_MAGIC_V1 && lsm->lsm_magic != LOV_MAGIC_V3 &&
+	    lsm->lsm_magic != LOV_MAGIC_COMP_V1) {
 		CERROR("bad LSM MAGIC: 0x%08X != 0x%08X nor 0x%08X\n",
 		       lsm->lsm_magic, LOV_MAGIC_V1, LOV_MAGIC_V3);
 		rc = -EIO;
 		goto out;
 	}
 
-	if (!lsm->lsm_is_released)
-		stripe_count = lsm->lsm_entries[0]->lsme_stripe_count;
-	else
-		stripe_count = 0;
-
-	/* we only need the header part from user space to get lmm_magic and
-	 * lmm_stripe_count, (the header part is common to v1 and v3)
-	 */
-	lum_size = sizeof(struct lov_user_md_v1);
-	if (copy_from_user(&lum, lump, lum_size)) {
-		rc = -EFAULT;
-		goto out;
-	}
-	if (lum.lmm_magic != LOV_USER_MAGIC_V1 &&
-	    lum.lmm_magic != LOV_USER_MAGIC_V3 &&
-	    lum.lmm_magic != LOV_USER_MAGIC_SPECIFIC) {
-		rc = -EINVAL;
-		goto out;
-	}
-
-	if (lum.lmm_stripe_count && lum.lmm_stripe_count < stripe_count) {
-		/* Return right size of stripe to user */
-		lum.lmm_stripe_count = stripe_count;
-		rc = copy_to_user(lump, &lum, lum_size);
-		rc = -EOVERFLOW;
-		goto out;
-	}
-
-	lmmk_size = lov_mds_md_size(stripe_count, lsm->lsm_magic);
+	lmmk_size = lov_comp_md_size(lsm);
 	lmmk = kvzalloc(lmmk_size, GFP_KERNEL);
 	if (!lmmk) {
 		rc = -ENOMEM;
@@ -286,54 +338,22 @@ int lov_getstripe(struct lov_object *obj, struct lov_stripe_md *lsm,
 		goto out_free;
 	}
 
-	/* FIXME: Bug 1185 - copy fields properly when structs change */
-	/* struct lov_user_md_v3 and struct lov_mds_md_v3 must be the same */
-	BUILD_BUG_ON(sizeof(lum) != sizeof(struct lov_mds_md_v3));
-	BUILD_BUG_ON(sizeof(lum.lmm_objects[0]) != sizeof(lmmk->lmm_objects[0]));
-
-	if (cpu_to_le32(LOV_MAGIC) != LOV_MAGIC &&
-	    (lmmk->lmm_magic == cpu_to_le32(LOV_MAGIC_V1) ||
-	     lmmk->lmm_magic == cpu_to_le32(LOV_MAGIC_V3))) {
-		lustre_swab_lov_mds_md(lmmk);
-		lustre_swab_lov_user_md_objects(
+	if (cpu_to_le32(LOV_MAGIC) != LOV_MAGIC) {
+		if (lmmk->lmm_magic == cpu_to_le32(LOV_MAGIC_V1) ||
+		    lmmk->lmm_magic == cpu_to_le32(LOV_MAGIC_V3)) {
+			lustre_swab_lov_mds_md(lmmk);
+			lustre_swab_lov_user_md_objects(
 				(struct lov_user_ost_data *)lmmk->lmm_objects,
 				lmmk->lmm_stripe_count);
-	}
-
-	if (lum.lmm_magic == LOV_USER_MAGIC) {
-		/* User request for v1, we need skip lmm_pool_name */
-		if (lmmk->lmm_magic == LOV_MAGIC_V3) {
-			memmove(((struct lov_mds_md_v1 *)lmmk)->lmm_objects,
-				((struct lov_mds_md_v3 *)lmmk)->lmm_objects,
-				lmmk->lmm_stripe_count *
-				sizeof(struct lov_ost_data_v1));
-			lmm_size -= LOV_MAXPOOLNAME;
+		} else if (lmmk->lmm_magic == cpu_to_le32(LOV_MAGIC_COMP_V1)) {
+			lustre_swab_lov_comp_md_v1((struct lov_comp_md_v1 *)lmmk);
 		}
-	} else {
-		/* if v3 we just have to update the lum_size */
-		lum_size = sizeof(struct lov_user_md_v3);
 	}
 
-	/* User wasn't expecting this many OST entries */
-	if (lum.lmm_stripe_count == 0) {
-		lmm_size = lum_size;
-	} else if (lum.lmm_stripe_count < lmmk->lmm_stripe_count) {
-		rc = -EOVERFLOW;
-		goto out_free;
-	}
-	/*
-	 * Have a difference between lov_mds_md & lov_user_md.
-	 * So we have to re-order the data before copy to user.
-	 */
-	lum.lmm_stripe_count = lmmk->lmm_stripe_count;
-	lum.lmm_layout_gen = lmmk->lmm_layout_gen;
-	((struct lov_user_md *)lmmk)->lmm_layout_gen = lum.lmm_layout_gen;
-	((struct lov_user_md *)lmmk)->lmm_stripe_count = lum.lmm_stripe_count;
-	if (copy_to_user(lump, lmmk, lmm_size))
+	if (copy_to_user(lump, lmmk, lmmk_size))
 		rc = -EFAULT;
 	else
 		rc = 0;
-
 out_free:
 	kvfree(lmmk);
 out:
-- 
1.8.3.1


* [lustre-devel] [PATCH v2 18/33] lustre: pfl: enhance PFID EA for PFL
From: James Simmons @ 2019-01-06 22:14 UTC
  To: lustre-devel

From: Fan Yong <fan.yong@intel.com>

This is a misc patch that contains some adjustments to
store more stripe information in the OST-object's PFID
EA (XATTR_NAME_FID). It is the client's duty to transfer
the stripe and PFL information to the OST via the write,
setattr and punch RPCs. The OST then stores this
information in the PFID EA.

Signed-off-by: Fan Yong <fan.yong@intel.com>
WC-bug-id: https://jira.whamcloud.com/browse/LU-8998
Reviewed-on: https://review.whamcloud.com/24882
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@gmail.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 .../lustre/include/uapi/linux/lustre/lustre_idl.h  | 14 ++++--
 .../lustre/include/uapi/linux/lustre/lustre_user.h | 17 +++----
 drivers/staging/lustre/lustre/include/cl_object.h  |  1 +
 drivers/staging/lustre/lustre/lov/lov_internal.h   | 16 ++++++
 drivers/staging/lustre/lustre/lov/lov_io.c         |  2 +
 drivers/staging/lustre/lustre/lov/lovsub_object.c  |  4 ++
 drivers/staging/lustre/lustre/osc/osc_io.c         |  4 +-
 .../staging/lustre/lustre/ptlrpc/pack_generic.c    | 13 ++++-
 drivers/staging/lustre/lustre/ptlrpc/wiretest.c    | 58 +++++++++++++---------
 9 files changed, 89 insertions(+), 40 deletions(-)

diff --git a/drivers/staging/lustre/include/uapi/linux/lustre/lustre_idl.h b/drivers/staging/lustre/include/uapi/linux/lustre/lustre_idl.h
index 333b791..695f1a1 100644
--- a/drivers/staging/lustre/include/uapi/linux/lustre/lustre_idl.h
+++ b/drivers/staging/lustre/include/uapi/linux/lustre/lustre_idl.h
@@ -1134,6 +1134,7 @@ static inline __u32 lov_mds_md_size(__u16 stripes, __u32 lmm_magic)
 							    */
 
 #define OBD_MD_DEFAULT_MEA	(0x0040000000000000ULL) /* default MEA */
+#define OBD_MD_FLOSTLAYOUT	(0x0080000000000000ULL)	/* contain ost_layout */
 #define OBD_MD_FLPROJID		(0x0100000000000000ULL) /* project ID */
 
 #define OBD_MD_FLALLQUOTA (OBD_MD_FLUSRQUOTA | \
@@ -2637,9 +2638,16 @@ struct obdo {
 	__u32		o_parent_ver;
 	struct lustre_handle    o_handle;  /* brw: lock handle to prolong locks
 					    */
-	struct llog_cookie      o_lcookie; /* destroy: unlink cookie from MDS,
-					    * obsolete in 2.8, reused in OSP
-					    */
+	/* Originally an llog_cookie carrying the MDS unlink cookie for
+	 * destroy, obsolete since 2.8. The client now reuses the space to
+	 * send layout and PFL information in IO and setattr RPCs. Since
+	 * llog_cookie is no longer used on the wire, it is removed from the
+	 * obdo and this field can grow later without affecting RPCs.
+	 *
+	 * sizeof(ost_layout) + sizeof(__u32) == sizeof(llog_cookie).
+	 */
+	struct ost_layout	o_layout;
+	__u32			o_padding_3;
 	__u32		o_uid_h;
 	__u32		o_gid_h;
 
diff --git a/drivers/staging/lustre/include/uapi/linux/lustre/lustre_user.h b/drivers/staging/lustre/include/uapi/linux/lustre/lustre_user.h
index 8ef05f5..28d4e0c 100644
--- a/drivers/staging/lustre/include/uapi/linux/lustre/lustre_user.h
+++ b/drivers/staging/lustre/include/uapi/linux/lustre/lustre_user.h
@@ -154,16 +154,13 @@ static inline bool fid_is_zero(const struct lu_fid *fid)
 	return !fid->f_seq && !fid->f_oid;
 }
 
-struct filter_fid {
-	struct lu_fid	ff_parent;  /* ff_parent.f_ver == file stripe number */
-};
-
-/* keep this one for compatibility */
-struct filter_fid_old {
-	struct lu_fid	ff_parent;
-	__u64		ff_objid;
-	__u64		ff_seq;
-};
+struct ost_layout {
+	__u32	ol_stripe_size;
+	__u32	ol_stripe_count;
+	__u64	ol_comp_start;
+	__u64	ol_comp_end;
+	__u32	ol_comp_id;
+} __packed;
 
 /* Userspace should treat lu_fid as opaque, and only use the following methods
  * to print or parse them.  Other functions (e.g. compare, swab) could be moved
diff --git a/drivers/staging/lustre/lustre/include/cl_object.h b/drivers/staging/lustre/lustre/include/cl_object.h
index a1e07f8..d0edeb7c 100644
--- a/drivers/staging/lustre/lustre/include/cl_object.h
+++ b/drivers/staging/lustre/lustre/include/cl_object.h
@@ -1784,6 +1784,7 @@ struct cl_io {
 			unsigned int     sa_avalid;
 			unsigned int		sa_xvalid; /* OP_XVALID */
 			int		sa_stripe_index;
+			struct ost_layout	 sa_layout;
 			const struct lu_fid	*sa_parent_fid;
 		} ci_setattr;
 		struct cl_data_version_io {
diff --git a/drivers/staging/lustre/lustre/lov/lov_internal.h b/drivers/staging/lustre/lustre/lov/lov_internal.h
index 9c0a4f7..e8102df 100644
--- a/drivers/staging/lustre/lustre/lov/lov_internal.h
+++ b/drivers/staging/lustre/lustre/lov/lov_internal.h
@@ -304,4 +304,20 @@ static inline struct obd_device *lov2obd(const struct lov_obd *lov)
 	return container_of_safe(lov, struct obd_device, u.lov);
 }
 
+static inline void lov_lsm2layout(struct lov_stripe_md *lsm,
+				  struct lov_stripe_md_entry *lsme,
+				  struct ost_layout *ol)
+{
+	ol->ol_stripe_size = lsme->lsme_stripe_size;
+	ol->ol_stripe_count = lsme->lsme_stripe_count;
+	if (lsm->lsm_magic == LOV_MAGIC_COMP_V1) {
+		ol->ol_comp_start = lsme->lsme_extent.e_start;
+		ol->ol_comp_end = lsme->lsme_extent.e_end;
+		ol->ol_comp_id = lsme->lsme_id;
+	} else {
+		ol->ol_comp_start = 0;
+		ol->ol_comp_end = 0;
+		ol->ol_comp_id = 0;
+	}
+}
 #endif
diff --git a/drivers/staging/lustre/lustre/lov/lov_io.c b/drivers/staging/lustre/lustre/lov/lov_io.c
index d9b2a81..70908b1 100644
--- a/drivers/staging/lustre/lustre/lov/lov_io.c
+++ b/drivers/staging/lustre/lustre/lov/lov_io.c
@@ -316,6 +316,8 @@ static void lov_io_sub_inherit(struct lov_io_sub *sub, struct lov_io *lio,
 						      stripe);
 			io->u.ci_setattr.sa_attr.lvb_size = new_size;
 		}
+		lov_lsm2layout(lsm, lsm->lsm_entries[index],
+			       &io->u.ci_setattr.sa_layout);
 		break;
 	}
 	case CIT_DATA_VERSION: {
diff --git a/drivers/staging/lustre/lustre/lov/lovsub_object.c b/drivers/staging/lustre/lustre/lov/lovsub_object.c
index ca7c8a0..da4b7f1 100644
--- a/drivers/staging/lustre/lustre/lov/lovsub_object.c
+++ b/drivers/staging/lustre/lustre/lov/lovsub_object.c
@@ -131,6 +131,7 @@ static void lovsub_req_attr_set(const struct lu_env *env, struct cl_object *obj,
 				struct cl_req_attr *attr)
 {
 	struct lovsub_object *subobj = cl2lovsub(obj);
+	struct lov_stripe_md *lsm = subobj->lso_super->lo_lsm;
 
 	cl_req_attr_set(env, &subobj->lso_super->lo_cl, attr);
 
@@ -139,6 +140,9 @@ static void lovsub_req_attr_set(const struct lu_env *env, struct cl_object *obj,
 	 * unconditionally. It never changes anyway.
 	 */
 	attr->cra_oa->o_stripe_idx = lov_comp_stripe(subobj->lso_index);
+	lov_lsm2layout(lsm, lsm->lsm_entries[lov_comp_entry(subobj->lso_index)],
+		       &attr->cra_oa->o_layout);
+	attr->cra_oa->o_valid |= OBD_MD_FLOSTLAYOUT;
 }
 
 static const struct cl_object_operations lovsub_ops = {
diff --git a/drivers/staging/lustre/lustre/osc/osc_io.c b/drivers/staging/lustre/lustre/osc/osc_io.c
index dabdf6d..8cd0813 100644
--- a/drivers/staging/lustre/lustre/osc/osc_io.c
+++ b/drivers/staging/lustre/lustre/osc/osc_io.c
@@ -542,7 +542,9 @@ static int osc_io_setattr_start(const struct lu_env *env,
 		oa->o_oi = loi->loi_oi;
 		obdo_set_parent_fid(oa, io->u.ci_setattr.sa_parent_fid);
 		oa->o_stripe_idx = io->u.ci_setattr.sa_stripe_index;
-		oa->o_valid |= OBD_MD_FLID | OBD_MD_FLGROUP;
+		oa->o_layout = io->u.ci_setattr.sa_layout;
+		oa->o_valid |= OBD_MD_FLID | OBD_MD_FLGROUP |
+			       OBD_MD_FLOSTLAYOUT;
 		if (ia_avalid & ATTR_CTIME) {
 			oa->o_valid |= OBD_MD_FLCTIME;
 			oa->o_ctime = attr->cat_ctime;
diff --git a/drivers/staging/lustre/lustre/ptlrpc/pack_generic.c b/drivers/staging/lustre/lustre/ptlrpc/pack_generic.c
index 9c5be30..5fadd5e 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/pack_generic.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/pack_generic.c
@@ -1587,6 +1587,15 @@ void lustre_swab_connect(struct obd_connect_data *ocd)
 	BUILD_BUG_ON(offsetof(typeof(*ocd), paddingF) == 0);
 }
 
+static void lustre_swab_ost_layout(struct ost_layout *ol)
+{
+	__swab32s(&ol->ol_stripe_size);
+	__swab32s(&ol->ol_stripe_count);
+	__swab64s(&ol->ol_comp_start);
+	__swab64s(&ol->ol_comp_end);
+	__swab32s(&ol->ol_comp_id);
+}
+
 static void lustre_swab_obdo(struct obdo *o)
 {
 	__swab64s(&o->o_valid);
@@ -1609,8 +1618,8 @@ static void lustre_swab_obdo(struct obdo *o)
 	__swab64s(&o->o_ioepoch);
 	__swab32s(&o->o_stripe_idx);
 	__swab32s(&o->o_parent_ver);
-	/* o_handle is opaque */
-	/* o_lcookie is swabbed elsewhere */
+	lustre_swab_ost_layout(&o->o_layout);
+	BUILD_BUG_ON(offsetof(typeof(*o), o_padding_3) == 0);
 	__swab32s(&o->o_uid_h);
 	__swab32s(&o->o_gid_h);
 	__swab64s(&o->o_data_version);
diff --git a/drivers/staging/lustre/lustre/ptlrpc/wiretest.c b/drivers/staging/lustre/lustre/ptlrpc/wiretest.c
index 90e6b8c..639db24 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/wiretest.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/wiretest.c
@@ -1128,6 +1128,30 @@ void lustre_assert_wire_constants(void)
 	LASSERTF(OBD_CKSUM_CRC32C == 0x00000004UL, "found 0x%.8xUL\n",
 		 (unsigned int)OBD_CKSUM_CRC32C);
 
+	/* Checks for struct ost_layout */
+	LASSERTF((int)sizeof(struct ost_layout) == 28, "found %lld\n",
+		 (long long)(int)sizeof(struct ost_layout));
+	LASSERTF((int)offsetof(struct ost_layout, ol_stripe_size) == 0, "found %lld\n",
+		 (long long)(int)offsetof(struct ost_layout, ol_stripe_size));
+	LASSERTF((int)sizeof(((struct ost_layout *)0)->ol_stripe_size) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct ost_layout *)0)->ol_stripe_size));
+	LASSERTF((int)offsetof(struct ost_layout, ol_stripe_count) == 4, "found %lld\n",
+		 (long long)(int)offsetof(struct ost_layout, ol_stripe_count));
+	LASSERTF((int)sizeof(((struct ost_layout *)0)->ol_stripe_count) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct ost_layout *)0)->ol_stripe_count));
+	LASSERTF((int)offsetof(struct ost_layout, ol_comp_start) == 8, "found %lld\n",
+		 (long long)(int)offsetof(struct ost_layout, ol_comp_start));
+	LASSERTF((int)sizeof(((struct ost_layout *)0)->ol_comp_start) == 8, "found %lld\n",
+		 (long long)(int)sizeof(((struct ost_layout *)0)->ol_comp_start));
+	LASSERTF((int)offsetof(struct ost_layout, ol_comp_end) == 16, "found %lld\n",
+		 (long long)(int)offsetof(struct ost_layout, ol_comp_end));
+	LASSERTF((int)sizeof(((struct ost_layout *)0)->ol_comp_end) == 8, "found %lld\n",
+		 (long long)(int)sizeof(((struct ost_layout *)0)->ol_comp_end));
+	LASSERTF((int)offsetof(struct ost_layout, ol_comp_id) == 24, "found %lld\n",
+		 (long long)(int)offsetof(struct ost_layout, ol_comp_id));
+	LASSERTF((int)sizeof(((struct ost_layout *)0)->ol_comp_id) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct ost_layout *)0)->ol_comp_id));
+
 	/* Checks for struct obdo */
 	LASSERTF((int)sizeof(struct obdo) == 208, "found %lld\n",
 		 (long long)(int)sizeof(struct obdo));
@@ -1215,10 +1239,14 @@ void lustre_assert_wire_constants(void)
 		 (long long)(int)offsetof(struct obdo, o_handle));
 	LASSERTF((int)sizeof(((struct obdo *)0)->o_handle) == 8, "found %lld\n",
 		 (long long)(int)sizeof(((struct obdo *)0)->o_handle));
-	LASSERTF((int)offsetof(struct obdo, o_lcookie) == 136, "found %lld\n",
-		 (long long)(int)offsetof(struct obdo, o_lcookie));
-	LASSERTF((int)sizeof(((struct obdo *)0)->o_lcookie) == 32, "found %lld\n",
-		 (long long)(int)sizeof(((struct obdo *)0)->o_lcookie));
+	LASSERTF((int)offsetof(struct obdo, o_layout) == 136, "found %lld\n",
+		 (long long)(int)offsetof(struct obdo, o_layout));
+	LASSERTF((int)sizeof(((struct obdo *)0)->o_layout) == 28, "found %lld\n",
+		 (long long)(int)sizeof(((struct obdo *)0)->o_layout));
+	LASSERTF((int)offsetof(struct obdo, o_padding_3) == 164, "found %lld\n",
+		 (long long)(int)offsetof(struct obdo, o_padding_3));
+	LASSERTF((int)sizeof(((struct obdo *)0)->o_padding_3) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct obdo *)0)->o_padding_3));
 	LASSERTF((int)offsetof(struct obdo, o_uid_h) == 168, "found %lld\n",
 		 (long long)(int)offsetof(struct obdo, o_uid_h));
 	LASSERTF((int)sizeof(((struct obdo *)0)->o_uid_h) == 4, "found %lld\n",
@@ -1331,6 +1359,8 @@ void lustre_assert_wire_constants(void)
 		 OBD_MD_FLGETATTRLOCK);
 	LASSERTF(OBD_MD_FLDATAVERSION == (0x0010000000000000ULL), "found 0x%.16llxULL\n",
 		 OBD_MD_FLDATAVERSION);
+	LASSERTF(OBD_MD_FLOSTLAYOUT == (0x0080000000000000ULL), "found 0x%.16llxULL\n",
+		 OBD_MD_FLOSTLAYOUT);
 	LASSERTF(OBD_MD_FLPROJID == (0x0100000000000000ULL), "found 0x%.16llxULL\n",
 		 OBD_MD_FLPROJID);
 
@@ -3549,26 +3579,6 @@ void lustre_assert_wire_constants(void)
 	LASSERTF((int)sizeof(((struct llog_log_hdr *)0)->llh_tail) == 8, "found %lld\n",
 		 (long long)(int)sizeof(((struct llog_log_hdr *)0)->llh_tail));
 
-	/* Checks for struct llog_cookie */
-	LASSERTF((int)sizeof(struct llog_cookie) == 32, "found %lld\n",
-		 (long long)(int)sizeof(struct llog_cookie));
-	LASSERTF((int)offsetof(struct llog_cookie, lgc_lgl) == 0, "found %lld\n",
-		 (long long)(int)offsetof(struct llog_cookie, lgc_lgl));
-	LASSERTF((int)sizeof(((struct llog_cookie *)0)->lgc_lgl) == 20, "found %lld\n",
-		 (long long)(int)sizeof(((struct llog_cookie *)0)->lgc_lgl));
-	LASSERTF((int)offsetof(struct llog_cookie, lgc_subsys) == 20, "found %lld\n",
-		 (long long)(int)offsetof(struct llog_cookie, lgc_subsys));
-	LASSERTF((int)sizeof(((struct llog_cookie *)0)->lgc_subsys) == 4, "found %lld\n",
-		 (long long)(int)sizeof(((struct llog_cookie *)0)->lgc_subsys));
-	LASSERTF((int)offsetof(struct llog_cookie, lgc_index) == 24, "found %lld\n",
-		 (long long)(int)offsetof(struct llog_cookie, lgc_index));
-	LASSERTF((int)sizeof(((struct llog_cookie *)0)->lgc_index) == 4, "found %lld\n",
-		 (long long)(int)sizeof(((struct llog_cookie *)0)->lgc_index));
-	LASSERTF((int)offsetof(struct llog_cookie, lgc_padding) == 28, "found %lld\n",
-		 (long long)(int)offsetof(struct llog_cookie, lgc_padding));
-	LASSERTF((int)sizeof(((struct llog_cookie *)0)->lgc_padding) == 4, "found %lld\n",
-		 (long long)(int)sizeof(((struct llog_cookie *)0)->lgc_padding));
-
 	/* Checks for struct llogd_body */
 	LASSERTF((int)sizeof(struct llogd_body) == 48, "found %lld\n",
 		 (long long)(int)sizeof(struct llogd_body));
-- 
1.8.3.1
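
The wiretest.c hunks above fix struct ost_layout at 28 bytes and place o_padding_3 at offset 164, so o_layout plus the padding occupy exactly the 32 bytes (offsets 136 to 167) that o_lcookie used to fill, and o_uid_h stays at offset 168. The standalone C11 sketch below is an illustration, not part of the patch: it mirrors the struct with stdint types and the GCC/Clang packed attribute, checks those numbers at compile time, and models what lov_lsm2layout() does for plain versus composite layouts. sketch_lsm2layout() and its flat argument list are hypothetical simplifications of the real helper, which reads a struct lov_stripe_md and a struct lov_stripe_md_entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirror of struct ost_layout, with stdint types instead of __u32/__u64. */
struct ost_layout {
	uint32_t ol_stripe_size;
	uint32_t ol_stripe_count;
	uint64_t ol_comp_start;
	uint64_t ol_comp_end;
	uint32_t ol_comp_id;
} __attribute__((packed));

/* The same values lustre_assert_wire_constants() checks at runtime. */
static_assert(sizeof(struct ost_layout) == 28, "ost_layout must be 28 bytes");
static_assert(offsetof(struct ost_layout, ol_comp_start) == 8, "ol_comp_start");
static_assert(offsetof(struct ost_layout, ol_comp_end) == 16, "ol_comp_end");
static_assert(offsetof(struct ost_layout, ol_comp_id) == 24, "ol_comp_id");

/* o_layout + o_padding_3 must fill the 32 bytes o_lcookie used to occupy. */
static_assert(sizeof(struct ost_layout) + sizeof(uint32_t) == 32,
	      "obdo slot size changed");

/*
 * Hypothetical stand-in for lov_lsm2layout(): stripe geometry is always
 * copied, but only a composite (PFL) layout reports a component extent
 * and id; a plain layout leaves them zero.
 */
static void sketch_lsm2layout(int composite, uint32_t stripe_size,
			      uint32_t stripe_count, uint64_t start,
			      uint64_t end, uint32_t id,
			      struct ost_layout *ol)
{
	memset(ol, 0, sizeof(*ol));
	ol->ol_stripe_size = stripe_size;
	ol->ol_stripe_count = stripe_count;
	if (composite) {
		ol->ol_comp_start = start;
		ol->ol_comp_end = end;
		ol->ol_comp_id = id;
	}
}
```

The packed attribute matters here: without it, most 64-bit ABIs would round the struct up to 32 bytes for the alignment of its 64-bit members, leaving no room for o_padding_3 inside the old 32-byte slot.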


* [lustre-devel] [PATCH v2 19/33] lustre: pfl: dynamic layout modification with write/truncate
From: James Simmons @ 2019-01-06 22:14 UTC
  To: lustre-devel

From: Bobi Jam <bobijam@hotmail.com>

* In lov_init_composite(), skip initializing sub-objects for layout
  components that do not have LCME_FL_INIT set.
* During write and truncate, issue a layout intent RPC when the IO
  would touch an uninitialized component (even at the lock stage).
* After the layout intent RPC completes, restart the IO.
* Remove the unused lov_layout_operations::llo_install() interface.
* Add an empty mdt_layout_change() interface to handle layout intent
  write RPCs.

Signed-off-by: Bobi Jam <bobijam@hotmail.com>
WC-bug-id: https://jira.whamcloud.com/browse/LU-9008
Reviewed-on: https://review.whamcloud.com/25317
WC-bug-id: https://jira.whamcloud.com/browse/LU-9307
Reviewed-on: https://review.whamcloud.com/26456
WC-bug-id: https://jira.whamcloud.com/browse/LU-9311
Reviewed-on: https://review.whamcloud.com/26474
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@gmail.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 .../lustre/include/uapi/linux/lustre/lustre_idl.h  |  18 ++--
 drivers/staging/lustre/lustre/include/cl_object.h  |   5 +
 drivers/staging/lustre/lustre/include/lustre_sec.h |   4 +-
 drivers/staging/lustre/lustre/llite/file.c         | 102 ++++++++++++++-------
 .../staging/lustre/lustre/llite/llite_internal.h   |   1 +
 drivers/staging/lustre/lustre/llite/vvp_io.c       |  36 +++++++-
 drivers/staging/lustre/lustre/lov/lov_ea.c         |  51 ++++++++---
 drivers/staging/lustre/lustre/lov/lov_internal.h   |  22 +++++
 drivers/staging/lustre/lustre/lov/lov_io.c         |  49 ++++++++--
 drivers/staging/lustre/lustre/lov/lov_lock.c       |  11 ++-
 drivers/staging/lustre/lustre/lov/lov_object.c     |  53 +++++------
 drivers/staging/lustre/lustre/lov/lov_pack.c       |  19 ++--
 drivers/staging/lustre/lustre/lov/lov_page.c       |   2 +-
 drivers/staging/lustre/lustre/mdc/mdc_locks.c      |  79 +++++++++-------
 drivers/staging/lustre/lustre/obdclass/genops.c    |  16 +++-
 drivers/staging/lustre/lustre/ptlrpc/layout.c      |   6 +-
 .../staging/lustre/lustre/ptlrpc/ptlrpc_internal.h |   7 +-
 drivers/staging/lustre/lustre/ptlrpc/sec.c         |   5 +-
 18 files changed, 337 insertions(+), 149 deletions(-)
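
The write path described in the commit message can be modeled in a few lines. The sketch below is a simplified, hypothetical illustration rather than Lustre code: struct comp, comp_find(), check_write(), fake_write_intent() and do_write() are invented names. It follows the patch's logic of looking up the component that holds the last byte being written (as lov_io_rw_iter_init() does with lov_lsm_entry()), failing with -ENODATA when that component is not instantiated, sending a write intent for [pos, pos + count), and then restarting the IO. As with the index > 0 test in the patch, the first component is assumed to be always instantiated.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical component: extent [start, end) plus an LCME_FL_INIT-like flag. */
struct comp {
	uint64_t start;
	uint64_t end;
	bool inited;
};

/* Index of the component covering off, or -1 (the role of lov_lsm_entry()). */
static int comp_find(const struct comp *c, int n, uint64_t off)
{
	int i;

	for (i = 0; i < n; i++)
		if (off >= c[i].start && off < c[i].end)
			return i;
	return -1;
}

/*
 * Simplified version of the check in lov_io_rw_iter_init(): -ENODATA asks
 * the caller to send a write intent. Assumes count > 0.
 */
static int check_write(const struct comp *c, int n, uint64_t pos,
		       uint64_t count)
{
	int idx = comp_find(c, n, pos + count - 1);

	if (idx > 0 && !c[idx].inited)
		return -ENODATA;
	return 0;
}

/* Stand-in for the MDS: instantiate every component overlapping [start, end). */
static void fake_write_intent(struct comp *c, int n, uint64_t start,
			      uint64_t end)
{
	int i;

	for (i = 0; i < n; i++)
		if (c[i].start < end && start < c[i].end)
			c[i].inited = true;
}

/*
 * On -ENODATA, send the intent and retry; the loop plays the part of
 * ci_need_restart in vvp_io_fini(). Returns the number of attempts.
 */
static int do_write(struct comp *c, int n, uint64_t pos, uint64_t count)
{
	int attempts = 0;

	for (;;) {
		attempts++;
		if (check_write(c, n, pos, count) != -ENODATA)
			return attempts;
		fake_write_intent(c, n, pos, pos + count);
	}
}
```

For a three-component layout where only the first 1 MiB is instantiated, a 4 KiB write at 2 MiB takes two attempts and instantiates only the second component. For truncate, vvp_io_fini() sends the intent for [0, new size) instead; fake_write_intent() merely stands in for whatever the MDS does when it handles LAYOUT_INTENT_WRITE.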

diff --git a/drivers/staging/lustre/include/uapi/linux/lustre/lustre_idl.h b/drivers/staging/lustre/include/uapi/linux/lustre/lustre_idl.h
index 695f1a1..5b4d9fc 100644
--- a/drivers/staging/lustre/include/uapi/linux/lustre/lustre_idl.h
+++ b/drivers/staging/lustre/include/uapi/linux/lustre/lustre_idl.h
@@ -2784,22 +2784,22 @@ struct getparent {
 } __packed;
 
 enum {
-	LAYOUT_INTENT_ACCESS    = 0,
-	LAYOUT_INTENT_READ      = 1,
-	LAYOUT_INTENT_WRITE     = 2,
-	LAYOUT_INTENT_GLIMPSE   = 3,
-	LAYOUT_INTENT_TRUNC     = 4,
-	LAYOUT_INTENT_RELEASE   = 5,
-	LAYOUT_INTENT_RESTORE   = 6
+	LAYOUT_INTENT_ACCESS    = 0,	/** generic access */
+	LAYOUT_INTENT_READ      = 1,	/** not used */
+	LAYOUT_INTENT_WRITE     = 2,	/** write file, for comp layout */
+	LAYOUT_INTENT_GLIMPSE   = 3,	/** not used */
+	LAYOUT_INTENT_TRUNC     = 4,	/** truncate file, for comp layout */
+	LAYOUT_INTENT_RELEASE   = 5,	/** reserved for HSM release */
+	LAYOUT_INTENT_RESTORE   = 6	/** reserved for HSM restore */
 };
 
 /* enqueue layout lock with intent */
 struct layout_intent {
-	__u32 li_opc; /* intent operation for enqueue, read, write etc */
+	__u32 li_opc;	/* intent operation for enqueue, read, write etc */
 	__u32 li_flags;
 	__u64 li_start;
 	__u64 li_end;
-};
+} __packed;
 
 /**
  * On the wire version of hsm_progress structure.
diff --git a/drivers/staging/lustre/lustre/include/cl_object.h b/drivers/staging/lustre/lustre/include/cl_object.h
index d0edeb7c..57ced0f 100644
--- a/drivers/staging/lustre/lustre/include/cl_object.h
+++ b/drivers/staging/lustre/lustre/include/cl_object.h
@@ -1843,6 +1843,11 @@ struct cl_io {
 	 */
 			     ci_ignore_layout:1,
 	/**
+	 * Need MDS intervention to complete a write. This usually means the
+	 * corresponding component is not initialized for the writing extent.
+	 */
+			ci_need_write_intent:1,
+	/**
 	 * Check if layout changed after the IO finishes. Mainly for HSM
 	 * requirement. If IO occurs to openning files, it doesn't need to
 	 * verify layout because HSM won't release openning files.
diff --git a/drivers/staging/lustre/lustre/include/lustre_sec.h b/drivers/staging/lustre/lustre/include/lustre_sec.h
index d35bcbc..43ff594 100644
--- a/drivers/staging/lustre/lustre/include/lustre_sec.h
+++ b/drivers/staging/lustre/lustre/include/lustre_sec.h
@@ -65,6 +65,7 @@
 struct ptlrpc_svc_ctx;
 struct ptlrpc_cli_ctx;
 struct ptlrpc_ctx_ops;
+struct req_msg_field;
 
 /**
  * \addtogroup flavor flavor
@@ -976,7 +977,8 @@ int cli_ctx_is_eternal(struct ptlrpc_cli_ctx *ctx)
 int sptlrpc_cli_alloc_repbuf(struct ptlrpc_request *req, int msgsize);
 void sptlrpc_cli_free_repbuf(struct ptlrpc_request *req);
 int sptlrpc_cli_enlarge_reqbuf(struct ptlrpc_request *req,
-			       int segment, int newsize);
+			       const struct req_msg_field *field,
+			       int newsize);
 int  sptlrpc_cli_unwrap_early_reply(struct ptlrpc_request *req,
 				    struct ptlrpc_request **req_ret);
 void sptlrpc_cli_finish_early_reply(struct ptlrpc_request *early_req);
diff --git a/drivers/staging/lustre/lustre/llite/file.c b/drivers/staging/lustre/lustre/llite/file.c
index 24a0948..a976e15 100644
--- a/drivers/staging/lustre/lustre/llite/file.c
+++ b/drivers/staging/lustre/lustre/llite/file.c
@@ -3780,38 +3780,37 @@ static int ll_layout_lock_set(struct lustre_handle *lockh, enum ldlm_mode mode,
 	return rc;
 }
 
-static int ll_layout_refresh_locked(struct inode *inode)
+/**
+ * Issue layout intent RPC to MDS.
+ * @inode	file inode
+ * @intent	layout intent
+ *
+ * RETURNS:
+ * 0		on success
+ * retval < 0	error code
+ */
+static int ll_layout_intent(struct inode *inode, struct layout_intent *intent)
 {
 	struct ll_inode_info  *lli = ll_i2info(inode);
 	struct ll_sb_info     *sbi = ll_i2sbi(inode);
 	struct md_op_data     *op_data;
 	struct lookup_intent   it;
-	struct lustre_handle   lockh;
-	enum ldlm_mode	       mode;
 	struct ptlrpc_request *req;
 	int rc;
 
-again:
-	/* mostly layout lock is caching on the local side, so try to match
-	 * it before grabbing layout lock mutex.
-	 */
-	mode = ll_take_md_lock(inode, MDS_INODELOCK_LAYOUT, &lockh, 0,
-			       LCK_CR | LCK_CW | LCK_PR | LCK_PW);
-	if (mode != 0) { /* hit cached lock */
-		rc = ll_layout_lock_set(&lockh, mode, inode);
-		if (rc == -EAGAIN)
-			goto again;
-		return rc;
-	}
-
 	op_data = ll_prep_md_op_data(NULL, inode, inode, NULL,
 				     0, 0, LUSTRE_OPC_ANY, NULL);
 	if (IS_ERR(op_data))
 		return PTR_ERR(op_data);
 
-	/* have to enqueue one */
+	op_data->op_data = intent;
+	op_data->op_data_size = sizeof(*intent);
+
 	memset(&it, 0, sizeof(it));
 	it.it_op = IT_LAYOUT;
+	if (intent->li_opc == LAYOUT_INTENT_WRITE ||
+	    intent->li_opc == LAYOUT_INTENT_TRUNC)
+		it.it_flags = FMODE_WRITE;
 
 	LDLM_DEBUG_NOLOCK("%s: requeue layout lock for file " DFID "(%p)",
 			  ll_get_fsname(inode->i_sb, NULL, 0),
@@ -3824,18 +3823,11 @@ static int ll_layout_refresh_locked(struct inode *inode)
 
 	ll_finish_md_op_data(op_data);
 
-	mode = it.it_lock_mode;
-	it.it_lock_mode = 0;
-	ll_intent_drop_lock(&it);
-
-	if (rc == 0) {
-		/* set lock data in case this is a new lock */
+	/* set lock data in case this is a new lock */
+	if (!rc)
 		ll_set_lock_data(sbi->ll_md_exp, inode, &it, NULL);
-		lockh.cookie = it.it_lock_handle;
-		rc = ll_layout_lock_set(&lockh, mode, inode);
-		if (rc == -EAGAIN)
-			goto again;
-	}
+
+	ll_intent_drop_lock(&it);
 
 	return rc;
 }
@@ -3857,6 +3849,11 @@ int ll_layout_refresh(struct inode *inode, __u32 *gen)
 {
 	struct ll_inode_info *lli = ll_i2info(inode);
 	struct ll_sb_info *sbi = ll_i2sbi(inode);
+	struct layout_intent intent = {
+		.li_opc = LAYOUT_INTENT_ACCESS,
+	};
+	struct lustre_handle lockh;
+	enum ldlm_mode mode;
 	int rc;
 
 	*gen = ll_layout_version_get(lli);
@@ -3870,18 +3867,57 @@ int ll_layout_refresh(struct inode *inode, __u32 *gen)
 	/* take layout lock mutex to enqueue layout lock exclusively. */
 	mutex_lock(&lli->lli_layout_mutex);
 
-	rc = ll_layout_refresh_locked(inode);
-	if (rc < 0)
-		goto out;
+	while (1) {
+		/* The layout lock is usually cached locally, so try to match
+		 * a cached lock before enqueuing a new one.
+		 */
+		mode = ll_take_md_lock(inode, MDS_INODELOCK_LAYOUT, &lockh, 0,
+				       LCK_CR | LCK_CW | LCK_PR | LCK_PW);
+		if (mode != 0) { /* hit cached lock */
+			rc = ll_layout_lock_set(&lockh, mode, inode);
+			if (rc == -EAGAIN)
+				continue;
+			break;
+		}
 
-	*gen = ll_layout_version_get(lli);
-out:
+		rc = ll_layout_intent(inode, &intent);
+		if (rc != 0)
+			break;
+	}
+
+	if (rc == 0)
+		*gen = ll_layout_version_get(lli);
 	mutex_unlock(&lli->lli_layout_mutex);
 
 	return rc;
 }
 
 /**
+ * Issue layout intent RPC indicating where in a file an IO is about to write.
+ *
+ * \param[in] inode    file inode.
+ * \param[in] start    start offset in bytes of the range an IO is about to
+ *                     write.
+ * \param[in] end      exclusive end offset in bytes of the write range.
+ *
+ * \retval 0   on success
+ * \retval < 0 error code
+ */
+int ll_layout_write_intent(struct inode *inode, u64 start, u64 end)
+{
+	struct layout_intent intent = {
+		.li_opc = LAYOUT_INTENT_WRITE,
+		.li_start = start,
+		.li_end = end,
+	};
+	int rc;
+
+	rc = ll_layout_intent(inode, &intent);
+
+	return rc;
+}
+
+/**
  *  This function send a restore request to the MDT
  */
 int ll_layout_restore(struct inode *inode, loff_t offset, __u64 length)
diff --git a/drivers/staging/lustre/lustre/llite/llite_internal.h b/drivers/staging/lustre/lustre/llite/llite_internal.h
index e3f5450..b2a1f54 100644
--- a/drivers/staging/lustre/lustre/llite/llite_internal.h
+++ b/drivers/staging/lustre/lustre/llite/llite_internal.h
@@ -1320,6 +1320,7 @@ static inline void d_lustre_revalidate(struct dentry *dentry)
 int ll_layout_conf(struct inode *inode, const struct cl_object_conf *conf);
 int ll_layout_refresh(struct inode *inode, __u32 *gen);
 int ll_layout_restore(struct inode *inode, loff_t start, __u64 length);
+int ll_layout_write_intent(struct inode *inode, u64 start, u64 end);
 
 int ll_xattr_init(void);
 void ll_xattr_fini(void);
diff --git a/drivers/staging/lustre/lustre/llite/vvp_io.c b/drivers/staging/lustre/lustre/llite/vvp_io.c
index b772e25..c325eba 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_io.c
+++ b/drivers/staging/lustre/lustre/llite/vvp_io.c
@@ -281,18 +281,18 @@ static void vvp_io_fini(const struct lu_env *env, const struct cl_io_slice *ios)
 	struct cl_object *obj = io->ci_obj;
 	struct vvp_io    *vio = cl2vvp_io(env, ios);
 	struct inode *inode = vvp_object_inode(obj);
+	int rc;
 
 	CLOBINVRNT(env, obj, vvp_object_invariant(obj));
 
 	CDEBUG(D_VFSTRACE, DFID
-	       " ignore/verify layout %d/%d, layout version %d restore needed %d\n",
+	       " ignore/verify layout %d/%d, layout version %d, need write layout %d, restore needed %d\n",
 	       PFID(lu_object_fid(&obj->co_lu)),
 	       io->ci_ignore_layout, io->ci_verify_layout,
-	       vio->vui_layout_gen, io->ci_restore_needed);
+	       vio->vui_layout_gen, io->ci_need_write_intent,
+	       io->ci_restore_needed);
 
 	if (io->ci_restore_needed) {
-		int	rc;
-
 		/* file was detected release, we need to restore it
 		 * before finishing the io
 		 */
@@ -318,6 +318,34 @@ static void vvp_io_fini(const struct lu_env *env, const struct cl_io_slice *ios)
 		}
 	}
 
+	/*
+	 * A dynamic layout change is needed: send a layout intent RPC
+	 * so the MDS instantiates the component(s) covering this IO.
+	 */
+	if (io->ci_need_write_intent) {
+		loff_t start = 0;
+		loff_t end = 0;
+
+		LASSERT(io->ci_type == CIT_WRITE || cl_io_is_trunc(io));
+
+		io->ci_need_write_intent = 0;
+
+		if (io->ci_type == CIT_WRITE) {
+			start = io->u.ci_rw.crw_pos;
+			end = io->u.ci_rw.crw_pos + io->u.ci_rw.crw_count;
+		} else {
+			end = io->u.ci_setattr.sa_attr.lvb_size;
+		}
+
+		CDEBUG(D_VFSTRACE, DFID" type %d [%llx, %llx)\n",
+		       PFID(lu_object_fid(&obj->co_lu)), io->ci_type,
+		       start, end);
+		rc = ll_layout_write_intent(inode, start, end);
+		io->ci_result = rc;
+		if (!rc)
+			io->ci_need_restart = 1;
+	}
+
 	if (!io->ci_ignore_layout && io->ci_verify_layout) {
 		__u32 gen = 0;
 
diff --git a/drivers/staging/lustre/lustre/lov/lov_ea.c b/drivers/staging/lustre/lustre/lov/lov_ea.c
index 6e5b59e..6db4d5e 100644
--- a/drivers/staging/lustre/lustre/lov/lov_ea.c
+++ b/drivers/staging/lustre/lustre/lov/lov_ea.c
@@ -117,6 +117,10 @@ static void lsme_free(struct lov_stripe_md_entry *lsme)
 	unsigned int stripe_count = lsme->lsme_stripe_count;
 	unsigned int i;
 
+	if (!lsme_inited(lsme) ||
+	    lsme->lsme_pattern & LOV_PATTERN_F_RELEASED)
+		stripe_count = 0;
+
 	for (i = 0; i < stripe_count; i++)
 		kmem_cache_free(lov_oinfo_slab, lsme->lsme_oinfo[i]);
 
@@ -141,7 +145,7 @@ void lsm_free(struct lov_stripe_md *lsm)
  */
 static struct lov_stripe_md_entry *
 lsme_unpack(struct lov_obd *lov, struct lov_mds_md *lmm, size_t buf_size,
-	    const char *pool_name, struct lov_ost_data_v1 *objects,
+	    const char *pool_name, bool inited, struct lov_ost_data_v1 *objects,
 	    loff_t *maxbytes)
 {
 	struct lov_stripe_md_entry *lsme;
@@ -159,7 +163,7 @@ void lsm_free(struct lov_stripe_md *lsm)
 		return ERR_PTR(-EINVAL);
 
 	pattern = le32_to_cpu(lmm->lmm_pattern);
-	if (pattern & LOV_PATTERN_F_RELEASED)
+	if (pattern & LOV_PATTERN_F_RELEASED || !inited)
 		stripe_count = 0;
 	else
 		stripe_count = le16_to_cpu(lmm->lmm_stripe_count);
@@ -185,8 +189,10 @@ void lsm_free(struct lov_stripe_md *lsm)
 
 	lsme->lsme_magic = magic;
 	lsme->lsme_pattern = pattern;
+	lsme->lsme_flags = 0;
 	lsme->lsme_stripe_size = le32_to_cpu(lmm->lmm_stripe_size);
-	lsme->lsme_stripe_count = stripe_count;
+	/* preserve the possible -1 stripe count for uninstantiated component */
+	lsme->lsme_stripe_count = le16_to_cpu(lmm->lmm_stripe_count);
 	lsme->lsme_layout_gen = le16_to_cpu(lmm->lmm_layout_gen);
 
 	if (pool_name) {
@@ -282,10 +288,12 @@ void lsm_free(struct lov_stripe_md *lsm)
 
 	pattern = le32_to_cpu(lmm->lmm_pattern);
 
-	lsme = lsme_unpack(lov, lmm, buf_size, pool_name, objects, &maxbytes);
+	lsme = lsme_unpack(lov, lmm, buf_size, pool_name, true, objects,
+			   &maxbytes);
 	if (IS_ERR(lsme))
 		return ERR_CAST(lsme);
 
+	lsme->lsme_flags = LCME_FL_INIT;
 	lsme->lsme_extent.e_start = 0;
 	lsme->lsme_extent.e_end = LUSTRE_EOF;
 
@@ -371,7 +379,7 @@ static int lsm_verify_comp_md_v1(struct lov_comp_md_v1 *lcm,
 
 static struct lov_stripe_md_entry *
 lsme_unpack_comp(struct lov_obd *lov, struct lov_mds_md *lmm,
-		 size_t lmm_buf_size, loff_t *maxbytes)
+		 size_t lmm_buf_size, bool inited, loff_t *maxbytes)
 {
 	unsigned int stripe_count;
 	unsigned int magic;
@@ -380,6 +388,10 @@ static int lsm_verify_comp_md_v1(struct lov_comp_md_v1 *lcm,
 	if (stripe_count == 0)
 		return ERR_PTR(-EINVAL);
 
+	/* un-instantiated lmm contains no ost id info, i.e. lov_ost_data_v1 */
+	if (!inited)
+		stripe_count = 0;
+
 	magic = le32_to_cpu(lmm->lmm_magic);
 	if (magic != LOV_MAGIC_V1 && magic != LOV_MAGIC_V3)
 		return ERR_PTR(-EINVAL);
@@ -389,12 +401,12 @@ static int lsm_verify_comp_md_v1(struct lov_comp_md_v1 *lcm,
 
 	if (magic == LOV_MAGIC_V1) {
 		return lsme_unpack(lov, lmm, lmm_buf_size, NULL,
-				   lmm->lmm_objects, maxbytes);
+				   inited, lmm->lmm_objects, maxbytes);
 	} else {
 		struct lov_mds_md_v3 *lmm3 = (struct lov_mds_md_v3 *)lmm;
 
 		return lsme_unpack(lov, lmm, lmm_buf_size, lmm3->lmm_pool_name,
-				   lmm3->lmm_objects, maxbytes);
+				   inited, lmm3->lmm_objects, maxbytes);
 	}
 }
 
@@ -440,6 +452,7 @@ static int lsm_verify_comp_md_v1(struct lov_comp_md_v1 *lcm,
 		blob = (char *)lcm + blob_offset;
 
 		lsme = lsme_unpack_comp(lov, blob, blob_size,
+					le32_to_cpu(lcme->lcme_flags) & LCME_FL_INIT,
 					(i == entry_count - 1) ? &maxbytes :
 					NULL);
 		if (IS_ERR(lsme)) {
@@ -452,6 +465,7 @@ static int lsm_verify_comp_md_v1(struct lov_comp_md_v1 *lcm,
 
 		lsm->lsm_entries[i] = lsme;
 		lsme->lsme_id = le32_to_cpu(lcme->lcme_id);
+		lsme->lsme_flags = le32_to_cpu(lcme->lcme_flags);
 		lu_extent_le_to_cpu(&lsme->lsme_extent, &lcme->lcme_extent);
 
 		if (i == entry_count - 1) {
@@ -507,7 +521,7 @@ const struct lsm_operations *lsm_op_find(int magic)
 
 void dump_lsm(unsigned int level, const struct lov_stripe_md *lsm)
 {
-	int i;
+	int i, j;
 
 	CDEBUG(level,
 	       "lsm %p, objid " DOSTID ", maxbytes %#llx, magic 0x%08X, refc: %d, entry: %u, layout_gen %u\n",
@@ -519,10 +533,23 @@ void dump_lsm(unsigned int level, const struct lov_stripe_md *lsm)
 		struct lov_stripe_md_entry *lse = lsm->lsm_entries[i];
 
 		CDEBUG(level,
-		       DEXT ": id: %u, magic 0x%08X, stripe count %u, size %u, layout_gen %u, pool: [" LOV_POOLNAMEF "]\n",
-		       PEXT(&lse->lsme_extent), lse->lsme_id, lse->lsme_magic,
-		       lse->lsme_stripe_count, lse->lsme_stripe_size,
-		       lse->lsme_layout_gen, lse->lsme_pool_name);
+		       DEXT ": id: %u, flags: %x, magic 0x%08X, layout_gen %u, stripe count %u, stripe size %u, pool: [" LOV_POOLNAMEF "]\n",
+		       PEXT(&lse->lsme_extent), lse->lsme_id, lse->lsme_flags,
+		       lse->lsme_magic, lse->lsme_layout_gen,
+		       lse->lsme_stripe_count, lse->lsme_stripe_size,
+		       lse->lsme_pool_name);
+		if (!lsme_inited(lse) ||
+		    lse->lsme_pattern & LOV_PATTERN_F_RELEASED)
+			continue;
+
+		for (j = 0; j < lse->lsme_stripe_count; j++) {
+			CDEBUG(level,
+			       "   oinfo:%p: ostid: " DOSTID " ost idx: %d gen: %d\n",
+			       lse->lsme_oinfo[j],
+			       POSTID(&lse->lsme_oinfo[j]->loi_oi),
+			       lse->lsme_oinfo[j]->loi_ost_idx,
+			       lse->lsme_oinfo[j]->loi_ost_gen);
+		}
 	}
 }
 
diff --git a/drivers/staging/lustre/lustre/lov/lov_internal.h b/drivers/staging/lustre/lustre/lov/lov_internal.h
index e8102df..5e3eae7 100644
--- a/drivers/staging/lustre/lustre/lov/lov_internal.h
+++ b/drivers/staging/lustre/lustre/lov/lov_internal.h
@@ -48,6 +48,7 @@ struct lov_stripe_md_entry {
 	struct lu_extent	lsme_extent;
 	u32			lsme_id;
 	u32			lsme_magic;
+	u32			lsme_flags;
 	u32			lsme_pattern;
 	u32			lsme_stripe_size;
 	u16			lsme_stripe_count;
@@ -56,6 +57,17 @@ struct lov_stripe_md_entry {
 	struct lov_oinfo       *lsme_oinfo[];
 };
 
+static inline void copy_lsm_entry(struct lov_stripe_md_entry *dst,
+				  struct lov_stripe_md_entry *src)
+{
+	unsigned int i;
+
+	for (i = 0; i < src->lsme_stripe_count; i++)
+		*dst->lsme_oinfo[i] = *src->lsme_oinfo[i];
+
+	memcpy(dst, src, offsetof(typeof(*src), lsme_oinfo));
+}
+
 struct lov_stripe_md {
 	atomic_t	lsm_refc;
 	spinlock_t	lsm_lock;
@@ -74,6 +86,16 @@ struct lov_stripe_md {
 	struct lov_stripe_md_entry *lsm_entries[];
 };
 
+static inline bool lsme_inited(const struct lov_stripe_md_entry *lsme)
+{
+	return lsme->lsme_flags & LCME_FL_INIT;
+}
+
+static inline bool lsm_entry_inited(const struct lov_stripe_md *lsm, int index)
+{
+	return lsme_inited(lsm->lsm_entries[index]);
+}
+
 static inline size_t lov_comp_md_size(const struct lov_stripe_md *lsm)
 {
 	struct lov_stripe_md_entry *lsme;
diff --git a/drivers/staging/lustre/lustre/lov/lov_io.c b/drivers/staging/lustre/lustre/lov/lov_io.c
index 70908b1..8a1bb85 100644
--- a/drivers/staging/lustre/lustre/lov/lov_io.c
+++ b/drivers/staging/lustre/lustre/lov/lov_io.c
@@ -394,6 +394,11 @@ static int lov_io_iter_init(const struct lu_env *env,
 		u64 start;
 		u64 end;
 
+		CDEBUG(D_VFSTRACE, "component[%d] flags %#x\n",
+		       index, lsm->lsm_entries[index]->lsme_flags);
+		if (!lsm_entry_inited(lsm, index))
+			break;
+
 		index++;
 		if (!lu_extent_is_overlapped(&ext, &le->lle_extent))
 			continue;
@@ -442,6 +447,7 @@ static int lov_io_rw_iter_init(const struct lu_env *env,
 			       const struct cl_io_slice *ios)
 {
 	struct lov_io	*lio = cl2lov_io(env, ios);
+	struct lov_stripe_md *lsm = lio->lis_object->lo_lsm;
 	struct cl_io	 *io  = ios->cis_io;
 	u64 start = io->u.ci_rw.crw_pos;
 	struct lov_stripe_md_entry *lse;
@@ -454,7 +460,7 @@ static int lov_io_rw_iter_init(const struct lu_env *env,
 	if (cl_io_is_append(io))
 		return lov_io_iter_init(env, ios);
 
-	index = lov_lsm_entry(lio->lis_object->lo_lsm, io->u.ci_rw.crw_pos);
+	index = lov_lsm_entry(lsm, io->u.ci_rw.crw_pos);
 	if (index < 0) { /* non-existing layout component */
 		if (io->ci_type == CIT_READ) {
 			/* TODO: it needs to detect the next component and
@@ -476,7 +482,9 @@ static int lov_io_rw_iter_init(const struct lu_env *env,
 	if (next <= start * ssize)
 		next = ~0ull;
 
-	LASSERT(io->u.ci_rw.crw_pos >= lse->lsme_extent.e_start);
+	LASSERTF(io->u.ci_rw.crw_pos >= lse->lsme_extent.e_start,
+		 "pos %lld, [%lld, %lld]\n", io->u.ci_rw.crw_pos,
+		 lse->lsme_extent.e_start, lse->lsme_extent.e_end);
 	next = min_t(u64, next, lse->lsme_extent.e_end);
 	next = min_t(u64, next, lio->lis_io_endpos);
 
@@ -486,9 +494,16 @@ static int lov_io_rw_iter_init(const struct lu_env *env,
 	lio->lis_endpos = io->u.ci_rw.crw_pos + io->u.ci_rw.crw_count;
 
 	CDEBUG(D_VFSTRACE,
-	       "stripe: %llu chunk: [%llu, %llu) %llu\n",
-	       (u64)start, lio->lis_pos, lio->lis_endpos,
-	       (u64)lio->lis_io_endpos);
+	       "stripe: %llu chunk: [%llu, %llu] %llu\n",
+	       start, lio->lis_pos, lio->lis_endpos,
+	       lio->lis_io_endpos);
+
+	index = lov_lsm_entry(lsm, lio->lis_endpos - 1);
+	if (index > 0 && !lsm_entry_inited(lsm, index)) {
+		io->ci_need_write_intent = 1;
+		io->ci_result = -ENODATA;
+		return io->ci_result;
+	}
 
 	/*
 	 * XXX The following call should be optimized: we know, that
@@ -497,6 +512,26 @@ static int lov_io_rw_iter_init(const struct lu_env *env,
 	return lov_io_iter_init(env, ios);
 }
 
+static int lov_io_setattr_iter_init(const struct lu_env *env,
+				    const struct cl_io_slice *ios)
+{
+	struct lov_io *lio = cl2lov_io(env, ios);
+	struct cl_io *io = ios->cis_io;
+	struct lov_stripe_md *lsm = lio->lis_object->lo_lsm;
+	int index;
+
+	if (cl_io_is_trunc(io) && lio->lis_pos) {
+		index = lov_lsm_entry(lsm, lio->lis_pos - 1);
+		if (index > 0 && !lsm_entry_inited(lsm, index)) {
+			io->ci_need_write_intent = 1;
+			io->ci_result = -ENODATA;
+			return io->ci_result;
+		}
+	}
+
+	return lov_io_iter_init(env, ios);
+}
+
 static int lov_io_call(const struct lu_env *env, struct lov_io *lio,
 		       int (*iofunc)(const struct lu_env *, struct cl_io *))
 {
@@ -617,7 +652,7 @@ static int lov_io_read_ahead(const struct lu_env *env,
 
 	offset = cl_offset(obj, start);
 	index = lov_lsm_entry(loo->lo_lsm, offset);
-	if (index < 0)
+	if (index < 0 || !lsm_entry_inited(loo->lo_lsm, index))
 		return -ENODATA;
 
 	stripe = lov_stripe_number(loo->lo_lsm, index, offset);
@@ -870,7 +905,7 @@ static void lov_io_fsync_end(const struct lu_env *env,
 		},
 		[CIT_SETATTR] = {
 			.cio_fini      = lov_io_fini,
-			.cio_iter_init = lov_io_iter_init,
+			.cio_iter_init = lov_io_setattr_iter_init,
 			.cio_iter_fini = lov_io_iter_fini,
 			.cio_lock      = lov_io_lock,
 			.cio_unlock    = lov_io_unlock,
diff --git a/drivers/staging/lustre/lustre/lov/lov_lock.c b/drivers/staging/lustre/lustre/lov/lov_lock.c
index ba31be4..9a46424 100644
--- a/drivers/staging/lustre/lustre/lov/lov_lock.c
+++ b/drivers/staging/lustre/lustre/lov/lov_lock.c
@@ -132,7 +132,7 @@ static struct lov_lock *lov_lock_sub_init(const struct lu_env *env,
 
 	nr = 0;
 	for (index = lov_lsm_entry(lov->lo_lsm, ext.e_start);
-	     index != -1 && index < lov->lo_lsm->lsm_entry_count; index++) {
+	     index >= 0 && index < lov->lo_lsm->lsm_entry_count; index++) {
 		struct lov_layout_raid0 *r0 = lov_r0(lov, index);
 
 		/* assume lsm entries are sorted. */
@@ -147,8 +147,11 @@ static struct lov_lock *lov_lock_sub_init(const struct lu_env *env,
 				nr++;
 		}
 	}
-	if (nr == 0)
-		return ERR_PTR(-EINVAL);
+	/**
+	 * Aggressive lock request (from cl_setattr_ost) which asks for
+	 * [eof, -1) lock, could come across uninstantiated layout extent,
+	 * hence a 0 nr is possible.
+	 */
 
 	lovlck = kvzalloc(offsetof(struct lov_lock, lls_sub[nr]),
 				 GFP_NOFS);
@@ -158,7 +161,7 @@ static struct lov_lock *lov_lock_sub_init(const struct lu_env *env,
 	lovlck->lls_nr = nr;
 	nr = 0;
 	for (index = lov_lsm_entry(lov->lo_lsm, ext.e_start);
-	     index < lov->lo_lsm->lsm_entry_count; index++) {
+	     index >= 0 && index < lov->lo_lsm->lsm_entry_count; index++) {
 		struct lov_layout_raid0 *r0 = lov_r0(lov, index);
 
 		/* assume lsm entries are sorted. */
diff --git a/drivers/staging/lustre/lustre/lov/lov_object.c b/drivers/staging/lustre/lustre/lov/lov_object.c
index 8596c2f..35c9403 100644
--- a/drivers/staging/lustre/lustre/lov/lov_object.c
+++ b/drivers/staging/lustre/lustre/lov/lov_object.c
@@ -64,8 +64,6 @@ struct lov_layout_operations {
 			  union lov_layout_state *state);
 	void (*llo_fini)(const struct lu_env *env, struct lov_object *lov,
 			 union lov_layout_state *state);
-	void (*llo_install)(const struct lu_env *env, struct lov_object *lov,
-			    union lov_layout_state *state);
 	int  (*llo_print)(const struct lu_env *env, void *cookie,
 			  lu_printer_t p, const struct lu_object *o);
 	int  (*llo_page_init)(const struct lu_env *env, struct cl_object *obj,
@@ -92,16 +90,6 @@ static void lov_lsm_put(struct lov_stripe_md *lsm)
  * Lov object layout operations.
  *
  */
-
-static void lov_install_empty(const struct lu_env *env,
-			      struct lov_object *lov,
-			      union  lov_layout_state *state)
-{
-	/*
-	 * File without objects.
-	 */
-}
-
 static int lov_init_empty(const struct lu_env *env, struct lov_device *dev,
 			  struct lov_object *lov, struct lov_stripe_md *lsm,
 			  const struct cl_object_conf *conf,
@@ -110,12 +98,6 @@ static int lov_init_empty(const struct lu_env *env, struct lov_device *dev,
 	return 0;
 }
 
-static void lov_install_composite(const struct lu_env *env,
-				  struct lov_object *lov,
-				  union lov_layout_state *state)
-{
-}
-
 static struct cl_object *lov_sub_find(const struct lu_env *env,
 				      struct cl_device *dev,
 				      const struct lu_fid *fid,
@@ -328,6 +310,14 @@ static int lov_init_composite(const struct lu_env *env, struct lov_device *dev,
 		struct lov_layout_entry *le = &comp->lo_entries[i];
 
 		le->lle_extent = lsm->lsm_entries[i]->lsme_extent;
+		/**
+		 * If the component has not been init-ed on MDS side, for
+		 * PFL layout, we'd know that the components beyond this one
+		 * will be dynamically init-ed later on file write/trunc ops.
+		 */
+		if (!lsm_entry_inited(lsm, i))
+			continue;
+
 		result = lov_init_raid0(env, dev, lov, i, &le->lle_raid0);
 		if (result < 0)
 			break;
@@ -471,13 +461,15 @@ static int lov_delete_composite(const struct lu_env *env,
 				struct lov_object *lov,
 				union lov_layout_state *state)
 {
+	struct lov_layout_composite *comp = &state->composite;
 	struct lov_layout_entry *entry;
 
 	dump_lsm(D_INODE, lov->lo_lsm);
 
 	lov_layout_wait(env, lov);
-	lov_foreach_layout_entry(lov, entry)
-		lov_delete_raid0(env, lov, &entry->lle_raid0);
+	if (comp->lo_entries)
+		lov_foreach_layout_entry(lov, entry)
+			lov_delete_raid0(env, lov, &entry->lle_raid0);
 
 	return 0;
 }
@@ -565,9 +557,9 @@ static int lov_print_composite(const struct lu_env *env, void *cookie,
 	for (i = 0; i < lsm->lsm_entry_count; i++) {
 		struct lov_stripe_md_entry *lse = lsm->lsm_entries[i];
 
-		(*p)(env, cookie, DEXT ": { 0x%08X, %u, %u, %u, %u }\n",
+		(*p)(env, cookie, DEXT ": { 0x%08X, %u, %u, %#x, %u, %u }\n",
 		     PEXT(&lse->lsme_extent), lse->lsme_magic,
-		     lse->lsme_id, lse->lsme_layout_gen,
+		     lse->lsme_id, lse->lsme_layout_gen, lse->lsme_flags,
 		     lse->lsme_stripe_count, lse->lsme_stripe_size);
 		lov_print_raid0(env, cookie, p, lov_r0(lov, i));
 	}
@@ -664,6 +656,10 @@ static int lov_attr_get_composite(const struct lu_env *env,
 		struct lov_layout_raid0 *r0 = &entry->lle_raid0;
 		struct cl_attr *lov_attr = &r0->lo_attr;
 
+		/* PFL: This component has not been init-ed. */
+		if (!lsm_entry_inited(lov->lo_lsm, index))
+			break;
+
 		result = lov_attr_get_raid0(env, lov, index, r0);
 		if (result != 0)
 			break;
@@ -691,7 +687,6 @@ static int lov_attr_get_composite(const struct lu_env *env,
 		.llo_init      = lov_init_empty,
 		.llo_delete    = lov_delete_empty,
 		.llo_fini      = lov_fini_empty,
-		.llo_install   = lov_install_empty,
 		.llo_print     = lov_print_empty,
 		.llo_page_init = lov_page_init_empty,
 		.llo_lock_init = lov_lock_init_empty,
@@ -702,7 +697,6 @@ static int lov_attr_get_composite(const struct lu_env *env,
 		.llo_init      = lov_init_released,
 		.llo_delete    = lov_delete_empty,
 		.llo_fini      = lov_fini_released,
-		.llo_install   = lov_install_empty,
 		.llo_print     = lov_print_released,
 		.llo_page_init = lov_page_init_empty,
 		.llo_lock_init = lov_lock_init_empty,
@@ -713,7 +707,6 @@ static int lov_attr_get_composite(const struct lu_env *env,
 		.llo_init	= lov_init_composite,
 		.llo_delete	= lov_delete_composite,
 		.llo_fini	= lov_fini_composite,
-		.llo_install	= lov_install_composite,
 		.llo_print	= lov_print_composite,
 		.llo_page_init	= lov_page_init_composite,
 		.llo_lock_init	= lov_lock_init_composite,
@@ -894,7 +887,6 @@ static int lov_layout_change(const struct lu_env *unused,
 		goto out;
 	}
 
-	new_ops->llo_install(env, lov, state);
 	lov->lo_type = llt;
 out:
 	cl_env_put(env, &refcheck);
@@ -937,8 +929,6 @@ int lov_object_init(const struct lu_env *env, struct lu_object *obj,
 	lov->lo_type = lov_type(lsm);
 	ops = &lov_dispatch[lov->lo_type];
 	rc = ops->llo_init(env, dev, lov, lsm, cconf, set);
-	if (!rc)
-		ops->llo_install(env, lov, set);
 
 	lov_lsm_put(lsm);
 
@@ -959,6 +949,7 @@ static int lov_conf_set(const struct lu_env *env, struct cl_object *obj,
 				   conf->u.coc_layout.lb_len);
 		if (IS_ERR(lsm))
 			return PTR_ERR(lsm);
+		dump_lsm(D_INODE, lsm);
 	}
 
 	lov_conf_lock(lov);
@@ -1541,6 +1532,9 @@ static int lov_object_fiemap(const struct lu_env *env, struct cl_object *obj,
 	for (entry = start_entry; entry <= end_entry; entry++) {
 		lsme = lsm->lsm_entries[entry];
 
+		if (!lsme_inited(lsme))
+			break;
+
 		if (entry == start_entry)
 			fs.fs_ext.e_start = whole_start;
 		else
@@ -1751,6 +1745,9 @@ int lov_read_and_clear_async_rc(struct cl_object *clob)
 				int j;
 
 				lse = lsm->lsm_entries[i];
+				if (!lsme_inited(lse))
+					break;
+
 				for (j = 0; j < lse->lsme_stripe_count; j++) {
 					struct lov_oinfo *loi;
 
diff --git a/drivers/staging/lustre/lustre/lov/lov_pack.c b/drivers/staging/lustre/lustre/lov/lov_pack.c
index 79d8a32..32e4b33 100644
--- a/drivers/staging/lustre/lustre/lov/lov_pack.c
+++ b/drivers/staging/lustre/lustre/lov/lov_pack.c
@@ -146,6 +146,9 @@ ssize_t lov_lsm_pack_v1v3(const struct lov_stripe_md *lsm, void *buf,
 		lmm_objects = lmmv1->lmm_objects;
 	}
 
+	if (lsm->lsm_is_released)
+		return lmm_size;
+
 	for (i = 0; i < lsm->lsm_entries[0]->lsme_stripe_count; i++) {
 		struct lov_oinfo *loi = lsm->lsm_entries[0]->lsme_oinfo[i];
 
@@ -189,11 +192,13 @@ ssize_t lov_lsm_pack(const struct lov_stripe_md *lsm, void *buf,
 	for (entry = 0; entry < lsm->lsm_entry_count; entry++) {
 		struct lov_stripe_md_entry *lsme;
 		struct lov_mds_md *lmm;
+		u16 stripecnt;
 
 		lsme = lsm->lsm_entries[entry];
 		lcme = &lcmv1->lcm_entries[entry];
 
 		lcme->lcme_id = cpu_to_le32(lsme->lsme_id);
+		lcme->lcme_flags = cpu_to_le32(lsme->lsme_flags);
 		lcme->lcme_extent.e_start =
 			cpu_to_le64(lsme->lsme_extent.e_start);
 		lcme->lcme_extent.e_end =
@@ -220,7 +225,13 @@ ssize_t lov_lsm_pack(const struct lov_stripe_md *lsm, void *buf,
 			lmm_objects = ((struct lov_mds_md_v1 *)lmm)->lmm_objects;
 		}
 
-		for (i = 0; i < lsme->lsme_stripe_count; i++) {
+		if (lsme_inited(lsme) &&
+		    !(lsme->lsme_pattern & LOV_PATTERN_F_RELEASED))
+			stripecnt = lsme->lsme_stripe_count;
+		else
+			stripecnt = 0;
+
+		for (i = 0; i < stripecnt; i++) {
 			struct lov_oinfo *loi = lsme->lsme_oinfo[i];
 
 			ostid_cpu_to_le(&loi->loi_oi, &lmm_objects[i].l_ost_oi);
@@ -230,8 +241,7 @@ ssize_t lov_lsm_pack(const struct lov_stripe_md *lsm, void *buf,
 				cpu_to_le32(loi->loi_ost_idx);
 		}
 
-		size = lov_mds_md_size(lsme->lsme_stripe_count,
-				       lsme->lsme_magic);
+		size = lov_mds_md_size(stripecnt, lsme->lsme_magic);
 		lcme->lcme_size = cpu_to_le32(size);
 		offset += size;
 	} /* for each layout component */
@@ -314,9 +324,6 @@ int lov_getstripe(struct lov_object *obj, struct lov_stripe_md *lsm,
 	size_t lmmk_size;
 	int rc = 0;
 
-	if (!lsm)
-		return -ENODATA;
-
 	if (lsm->lsm_magic != LOV_MAGIC_V1 && lsm->lsm_magic != LOV_MAGIC_V3 &&
 	    lsm->lsm_magic != LOV_MAGIC_COMP_V1) {
 		CERROR("bad LSM MAGIC: 0x%08X != 0x%08X nor 0x%08X\n",
diff --git a/drivers/staging/lustre/lustre/lov/lov_page.c b/drivers/staging/lustre/lustre/lov/lov_page.c
index f53379a..8b68d3c 100644
--- a/drivers/staging/lustre/lustre/lov/lov_page.c
+++ b/drivers/staging/lustre/lustre/lov/lov_page.c
@@ -81,7 +81,7 @@ int lov_page_init_composite(const struct lu_env *env, struct cl_object *obj,
 
 	offset = cl_offset(obj, index);
 	entry = lov_lsm_entry(loo->lo_lsm, offset);
-	if (entry < 0) {
+	if (entry < 0 || !lsm_entry_inited(loo->lo_lsm, entry)) {
 		/* non-existing layout component */
 		lov_page_init_empty(env, obj, page, index);
 		return 0;
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_locks.c b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
index 7d4ba9c..0abe426 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_locks.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
@@ -214,20 +214,32 @@ static inline void mdc_clear_replay_flag(struct ptlrpc_request *req, int rc)
  * but this is incredibly unlikely, and questionable whether the client
  * could do MDS recovery under OOM anyways...
  */
-static void mdc_realloc_openmsg(struct ptlrpc_request *req,
-				struct mdt_body *body)
+static int mdc_save_lovea(struct ptlrpc_request *req,
+			  const struct req_msg_field *field,
+			  void *data, u32 size)
 {
-	int     rc;
-
-	/* FIXME: remove this explicit offset. */
-	rc = sptlrpc_cli_enlarge_reqbuf(req, DLM_INTENT_REC_OFF + 4,
-					body->mbo_eadatasize);
-	if (rc) {
-		CERROR("Can't enlarge segment %d size to %d\n",
-		       DLM_INTENT_REC_OFF + 4, body->mbo_eadatasize);
-		body->mbo_valid &= ~OBD_MD_FLEASIZE;
-		body->mbo_eadatasize = 0;
+	struct req_capsule *pill = &req->rq_pill;
+	int rc = 0;
+	void *lmm;
+
+	if (req_capsule_get_size(pill, field, RCL_CLIENT) < size) {
+		rc = sptlrpc_cli_enlarge_reqbuf(req, field, size);
+		if (rc) {
+			CERROR("%s: Can't enlarge ea size to %d: rc = %d\n",
+			       req->rq_export->exp_obd->obd_name,
+			       size, rc);
+			return rc;
+		}
+	} else {
+		req_capsule_shrink(pill, field, size, RCL_CLIENT);
 	}
+
+	req_capsule_set_size(pill, field, RCL_CLIENT, size);
+	lmm = req_capsule_client_get(pill, field);
+	if (lmm)
+		memcpy(lmm, data, size);
+
+	return rc;
 }
 
 static struct ptlrpc_request *
@@ -470,7 +482,7 @@ static struct ptlrpc_request *mdc_intent_getattr_pack(struct obd_export *exp,
 
 static struct ptlrpc_request *mdc_intent_layout_pack(struct obd_export *exp,
 						     struct lookup_intent *it,
-						     struct md_op_data *unused)
+						     struct md_op_data *op_data)
 {
 	struct obd_device     *obd = class_exp2obd(exp);
 	struct ptlrpc_request *req;
@@ -496,10 +508,9 @@ static struct ptlrpc_request *mdc_intent_layout_pack(struct obd_export *exp,
 
 	/* pack the layout intent request */
 	layout = req_capsule_client_get(&req->rq_pill, &RMF_LAYOUT_INTENT);
-	/* LAYOUT_INTENT_ACCESS is generic, specific operation will be
-	 * set for replication
-	 */
-	layout->li_opc = LAYOUT_INTENT_ACCESS;
+	LASSERT(op_data->op_data);
+	LASSERT(op_data->op_data_size == sizeof(*layout));
+	memcpy(layout, op_data->op_data, sizeof(*layout));
 
 	req_capsule_set_size(&req->rq_pill, &RMF_DLM_LVB, RCL_SERVER,
 			     obd->u.cli.cl_default_mds_easize);
@@ -649,24 +660,13 @@ static int mdc_finish_enqueue(struct obd_export *exp,
 			 * (for example error one).
 			 */
 			if ((it->it_op & IT_OPEN) && req->rq_replay) {
-				void *lmm;
-
-				if (req_capsule_get_size(pill, &RMF_EADATA,
-							 RCL_CLIENT) <
-				    body->mbo_eadatasize)
-					mdc_realloc_openmsg(req, body);
-				else
-					req_capsule_shrink(pill, &RMF_EADATA,
-							   body->mbo_eadatasize,
-							   RCL_CLIENT);
-
-				req_capsule_set_size(pill, &RMF_EADATA,
-						     RCL_CLIENT,
-						     body->mbo_eadatasize);
-
-				lmm = req_capsule_client_get(pill, &RMF_EADATA);
-				if (lmm)
-					memcpy(lmm, eadata, body->mbo_eadatasize);
+				rc = mdc_save_lovea(req, &RMF_EADATA, eadata,
+						    body->mbo_eadatasize);
+				if (rc) {
+					body->mbo_valid &= ~OBD_MD_FLEASIZE;
+					body->mbo_eadatasize = 0;
+					rc = 0;
+				}
 			}
 		}
 	} else if (it->it_op & IT_LAYOUT) {
@@ -680,6 +680,15 @@ static int mdc_finish_enqueue(struct obd_export *exp,
 								lvb_len);
 			if (!lvb_data)
 				return -EPROTO;
+
+			/**
+			 * save replied layout data to the request buffer for
+			 * recovery consideration (lest MDS reinitialize
+			 * another set of OST objects).
+			 */
+			if (req->rq_transno)
+				(void)mdc_save_lovea(req, &RMF_EADATA, lvb_data,
+						     lvb_len);
 		}
 	}
 
diff --git a/drivers/staging/lustre/lustre/obdclass/genops.c b/drivers/staging/lustre/lustre/obdclass/genops.c
index 76bc73f..03df181 100644
--- a/drivers/staging/lustre/lustre/obdclass/genops.c
+++ b/drivers/staging/lustre/lustre/obdclass/genops.c
@@ -1546,6 +1546,16 @@ static inline bool obd_mod_rpc_slot_avail(struct client_obd *cli,
 	return avail;
 }
 
+static inline bool obd_skip_mod_rpc_slot(const struct lookup_intent *it)
+{
+	if (it &&
+	    (it->it_op == IT_GETATTR || it->it_op == IT_LOOKUP ||
+	     it->it_op == IT_READDIR ||
+	     (it->it_op == IT_LAYOUT && !(it->it_flags & FMODE_WRITE))))
+		return true;
+	return false;
+}
+
 /* Get a modify RPC slot from the obd client @cli according
  * to the kind of operation @opc that is going to be sent
  * and the intent @it of the operation if it applies.
@@ -1563,8 +1573,7 @@ u16 obd_get_mod_rpc_slot(struct client_obd *cli, __u32 opc,
 	/* read-only metadata RPCs don't consume a slot on MDT
 	 * for reply reconstruction
 	 */
-	if (it && (it->it_op == IT_GETATTR || it->it_op == IT_LOOKUP ||
-		   it->it_op == IT_LAYOUT || it->it_op == IT_READDIR))
+	if (obd_skip_mod_rpc_slot(it))
 		return 0;
 
 	if (opc == MDS_CLOSE)
@@ -1610,8 +1619,7 @@ void obd_put_mod_rpc_slot(struct client_obd *cli, u32 opc,
 {
 	bool close_req = false;
 
-	if (it && (it->it_op == IT_GETATTR || it->it_op == IT_LOOKUP ||
-		   it->it_op == IT_LAYOUT || it->it_op == IT_READDIR))
+	if (obd_skip_mod_rpc_slot(it))
 		return;
 
 	if (opc == MDS_CLOSE)
diff --git a/drivers/staging/lustre/lustre/ptlrpc/layout.c b/drivers/staging/lustre/lustre/ptlrpc/layout.c
index d3c0dd6..a155200 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/layout.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/layout.c
@@ -1797,9 +1797,9 @@ int req_capsule_server_pack(struct req_capsule *pill)
  * Returns the PTLRPC request or reply (\a loc) buffer offset of a \a pill
  * corresponding to the given RMF (\a field).
  */
-static u32 __req_capsule_offset(const struct req_capsule *pill,
-				const struct req_msg_field *field,
-				enum req_location loc)
+u32 __req_capsule_offset(const struct req_capsule *pill,
+			 const struct req_msg_field *field,
+			 enum req_location loc)
 {
 	u32 offset;
 
diff --git a/drivers/staging/lustre/lustre/ptlrpc/ptlrpc_internal.h b/drivers/staging/lustre/lustre/ptlrpc/ptlrpc_internal.h
index 0e4a215..177010c 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/ptlrpc_internal.h
+++ b/drivers/staging/lustre/lustre/ptlrpc/ptlrpc_internal.h
@@ -88,7 +88,7 @@ void ptlrpc_set_add_new_req(struct ptlrpcd_ctl *pc,
 void ptlrpc_initiate_recovery(struct obd_import *imp);
 
 int lustre_unpack_req_ptlrpc_body(struct ptlrpc_request *req, int offset);
-int lustre_unpack_rep_ptlrpc_body(struct ptlrpc_request *req, int offset);
+int lustre_unpack_rep_ptlrpc_body(struct ptlrpc_request *req, int offset);
 
 int ptlrpc_sysfs_register_service(struct kset *parent,
 				  struct ptlrpc_service *svc);
@@ -284,6 +284,11 @@ void sptlrpc_conf_choose_flavor(enum lustre_sec_part from,
 int  sptlrpc_init(void);
 void sptlrpc_fini(void);
 
+/* layout.c */
+u32 __req_capsule_offset(const struct req_capsule *pill,
+			 const struct req_msg_field *field,
+			 enum req_location loc);
+
 static inline bool ptlrpc_recoverable_error(int rc)
 {
 	return (rc == -ENOTCONN || rc == -ENODEV);
diff --git a/drivers/staging/lustre/lustre/ptlrpc/sec.c b/drivers/staging/lustre/lustre/ptlrpc/sec.c
index 9c59871..53f4d4f 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/sec.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/sec.c
@@ -1611,11 +1611,14 @@ void _sptlrpc_enlarge_msg_inplace(struct lustre_msg *msg,
  * so caller should refresh its local pointers if needed.
  */
 int sptlrpc_cli_enlarge_reqbuf(struct ptlrpc_request *req,
-			       int segment, int newsize)
+			       const struct req_msg_field *field,
+			       int newsize)
 {
+	struct req_capsule *pill = &req->rq_pill;
 	struct ptlrpc_cli_ctx *ctx = req->rq_cli_ctx;
 	struct ptlrpc_sec_cops *cops;
 	struct lustre_msg *msg = req->rq_reqmsg;
+	int segment = __req_capsule_offset(pill, field, RCL_CLIENT);
 
 	LASSERT(ctx);
 	LASSERT(msg);
-- 
1.8.3.1

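The obd_skip_mod_rpc_slot() helper in the genops.c hunk above decides which intents bypass a modify RPC slot. Read-only intents (getattr, lookup, readdir) never take one, and a layout intent takes one only when it carries FMODE_WRITE, as a PFL write intent does. What follows is a minimal user-space model of that rule, not the kernel code. The sketch_* names, the enum and the SK_FMODE_WRITE value are hypothetical stand-ins for the real intent opcodes and FMODE_WRITE.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel intent opcodes and FMODE_WRITE;
 * the real definitions and values differ. */
enum sketch_it_op {
	SK_IT_OPEN,
	SK_IT_GETATTR,
	SK_IT_LOOKUP,
	SK_IT_READDIR,
	SK_IT_LAYOUT,
};
#define SK_FMODE_WRITE	0x2u

struct sketch_intent {
	enum sketch_it_op it_op;
	unsigned int it_flags;
};

/* Models obd_skip_mod_rpc_slot(): true means "do not take a slot". */
static bool sketch_skip_mod_rpc_slot(const struct sketch_intent *it)
{
	if (!it)
		return false;		/* no intent: a normal modifying RPC */

	switch (it->it_op) {
	case SK_IT_GETATTR:
	case SK_IT_LOOKUP:
	case SK_IT_READDIR:
		return true;		/* read-only metadata intents */
	case SK_IT_LAYOUT:
		/* only a read layout intent skips the slot */
		return !(it->it_flags & SK_FMODE_WRITE);
	default:
		return false;
	}
}
```

Presumably a write layout intent needs a slot because the MDT may instantiate new components (and OST objects) for it, so its reply must be reconstructable if the request is resent.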

* [lustre-devel] [PATCH v2 20/33] lustre: ldlm: Transfer layout only if layout lock is granted
From: James Simmons @ 2019-01-06 22:14 UTC (permalink / raw)
  To: lustre-devel

From: wang di <di.wang@intel.com>

Make sure that only a valid layout is transferred.
The client also checks that the layout lock is granted before trusting
the layout.
Revert the LU-3299 change (commit e2335e5d) because it breaks the
assumption that l_lvb_data is immutable once assigned.
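
The file.c hunk below follows a publish-once pattern: the fetched layout is copied into a private buffer, installed under the resource lock only if l_lvb_data is still empty, and the losing copy is freed after the lock is dropped. A minimal user-space sketch, assuming a pthread mutex in place of lock_res_and_lock()/unlock_res_and_lock() and a hypothetical struct sketch_lock in place of struct ldlm_lock:

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the LVB fields of struct ldlm_lock. */
struct sketch_lock {
	pthread_mutex_t lock;	/* plays the role of lock_res_and_lock() */
	void *lvb_data;		/* immutable once assigned */
	size_t lvb_len;
};

/*
 * Copy a layout into a private buffer and publish it only if no other
 * thread has published one yet. Returns 0 on success, -1 if the
 * allocation fails. The published buffer is never replaced or freed here.
 */
static int sketch_install_layout(struct sketch_lock *lk, const void *lmm,
				 size_t lmmsize)
{
	void *lvbdata = malloc(lmmsize);

	if (!lvbdata)
		return -1;
	memcpy(lvbdata, lmm, lmmsize);

	pthread_mutex_lock(&lk->lock);
	if (!lk->lvb_data) {
		lk->lvb_data = lvbdata;
		lk->lvb_len = lmmsize;
		lvbdata = NULL;	/* ownership moved to the lock */
	}
	pthread_mutex_unlock(&lk->lock);

	free(lvbdata);		/* no-op when this caller won the race */
	return 0;
}
```

Freeing outside the critical section keeps the lock hold time short, and because the winner's buffer is never swapped out, readers that saw l_lvb_data non-NULL can keep using it.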

Fixes: e2335e5d52b2 ("staging/lustre/llite: force lvb_data update after layout change")
Signed-off-by: wang di <di.wang@intel.com>
Signed-off-by: Jinshan Xiong <jinshan.xiong@gmail.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6581
Reviewed-on: http://review.whamcloud.com/14726
Reviewed-by: jacques-Charles Lafoucriere <jacques-charles.lafoucriere@cea.fr>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/ldlm/ldlm_lockd.c | 18 ------------------
 drivers/staging/lustre/lustre/llite/file.c      | 15 +++++++++------
 drivers/staging/lustre/lustre/mdc/mdc_locks.c   | 10 ++++++++--
 3 files changed, 17 insertions(+), 26 deletions(-)

diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_lockd.c b/drivers/staging/lustre/lustre/ldlm/ldlm_lockd.c
index 986c378..e766f798 100644
--- a/drivers/staging/lustre/lustre/ldlm/ldlm_lockd.c
+++ b/drivers/staging/lustre/lustre/ldlm/ldlm_lockd.c
@@ -187,24 +187,6 @@ static void ldlm_handle_cp_callback(struct ptlrpc_request *req,
 				rc = -EINVAL;
 				goto out;
 			}
-		} else if (ldlm_has_layout(lock)) { /* for layout lock, lvb has
-						     * variable length
-						     */
-			void *lvb_data;
-
-			lvb_data = kzalloc(lvb_len, GFP_NOFS);
-			if (!lvb_data) {
-				LDLM_ERROR(lock, "No memory: %d.\n", lvb_len);
-				rc = -ENOMEM;
-				goto out;
-			}
-
-			lock_res_and_lock(lock);
-			LASSERT(!lock->l_lvb_data);
-			lock->l_lvb_type = LVB_T_LAYOUT;
-			lock->l_lvb_data = lvb_data;
-			lock->l_lvb_len = lvb_len;
-			unlock_res_and_lock(lock);
 		}
 	}
 
diff --git a/drivers/staging/lustre/lustre/llite/file.c b/drivers/staging/lustre/lustre/llite/file.c
index a976e15..6a0a468 100644
--- a/drivers/staging/lustre/lustre/llite/file.c
+++ b/drivers/staging/lustre/lustre/llite/file.c
@@ -3641,7 +3641,7 @@ static int ll_layout_fetch(struct inode *inode, struct ldlm_lock *lock)
 	       PFID(ll_inode2fid(inode)), ldlm_is_lvb_ready(lock),
 	       lock->l_lvb_data, lock->l_lvb_len);
 
-	if (lock->l_lvb_data && ldlm_is_lvb_ready(lock))
+	if (lock->l_lvb_data)
 		return 0;
 
 	/* if layout lock was granted right away, the layout is returned
@@ -3683,13 +3683,16 @@ static int ll_layout_fetch(struct inode *inode, struct ldlm_lock *lock)
 
 	memcpy(lvbdata, lmm, lmmsize);
 	lock_res_and_lock(lock);
-	if (lock->l_lvb_data)
-		kvfree(lock->l_lvb_data);
-
-	lock->l_lvb_data = lvbdata;
-	lock->l_lvb_len = lmmsize;
+	if (!lock->l_lvb_data) {
+		lock->l_lvb_type = LVB_T_LAYOUT;
+		lock->l_lvb_data = lvbdata;
+		lock->l_lvb_len = lmmsize;
+		lvbdata = NULL;
+	}
 	unlock_res_and_lock(lock);
 
+	if (lvbdata)
+		kvfree(lvbdata);
 out:
 	ptlrpc_req_finished(req);
 	return rc;
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_locks.c b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
index 0abe426..a60959d 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_locks.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
@@ -692,9 +692,15 @@ static int mdc_finish_enqueue(struct obd_export *exp,
 		}
 	}
 
-	/* fill in stripe data for layout lock */
+	/* fill in stripe data for layout lock.
+	 * LU-6581: trust layout data only if layout lock is granted. The MDT
+	 * has stopped sending layout unless the layout lock is granted. The
+	 * client still does this checking in case it's talking with an old
+	 * server. - Jinshan
+	 */
 	lock = ldlm_handle2lock(lockh);
-	if (lock && ldlm_has_layout(lock) && lvb_data) {
+	if (lock && ldlm_has_layout(lock) && lvb_data &&
+	    !(lockrep->lock_flags & LDLM_FL_BLOCKED_MASK)) {
 		void *lmm;
 
 		LDLM_DEBUG(lock, "layout lock returned by: %s, lvb_len: %d",
-- 
1.8.3.1

* [lustre-devel] [PATCH v2 21/33] lustre: pfl: calculate PFL file LOVEA correctly
From: James Simmons @ 2019-01-06 22:14 UTC (permalink / raw)
  To: lustre-devel

From: Bobi Jam <bobijam@hotmail.com>

A PFL file can contain uninstantiated components, which may still keep
the specified stripe count of -1. lov_mds_md_size() and
lov_user_md_size() should handle this case; otherwise the computed
LOVEA size can be erroneously large.
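
The effect of the fix can be shown with a small model of lov_mds_md_size(). This is a sketch only: the magic numbers and header/entry sizes below are illustrative stand-ins, not the values from lustre_idl.h.

```c
#include <stdint.h>

/* Illustrative stand-ins; the real code uses LOV_MAGIC_V1/V3 and
 * sizeof(struct lov_mds_md_v1), sizeof(struct lov_mds_md_v3) and
 * sizeof(struct lov_ost_data_v1). */
#define SK_MAGIC_V1	1u
#define SK_MAGIC_V3	3u
#define SK_HDR_V1	32u
#define SK_HDR_V3	48u
#define SK_OST_DATA	24u

static uint32_t sketch_mds_md_size(uint16_t stripes, uint32_t magic)
{
	/* An uninstantiated PFL component may still carry the stripe
	 * count -1 (0xFFFF); it has no OST objects yet, so count 0. */
	if (stripes == (uint16_t)-1)
		stripes = 0;

	if (magic == SK_MAGIC_V3)
		return SK_HDR_V3 + stripes * SK_OST_DATA;
	return SK_HDR_V1 + stripes * SK_OST_DATA;
}
```

Without the check, a stripe count of 0xFFFF would add 65535 * 24 = 1,572,840 bytes of object entries for these stand-in sizes.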

Signed-off-by: Bobi Jam <bobijam@hotmail.com>
WC-bug-id: https://jira.whamcloud.com/browse/LU-9335
Reviewed-on: https://review.whamcloud.com/26597
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/include/uapi/linux/lustre/lustre_idl.h  | 3 +++
 drivers/staging/lustre/include/uapi/linux/lustre/lustre_user.h | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/drivers/staging/lustre/include/uapi/linux/lustre/lustre_idl.h b/drivers/staging/lustre/include/uapi/linux/lustre/lustre_idl.h
index 5b4d9fc..42396dc 100644
--- a/drivers/staging/lustre/include/uapi/linux/lustre/lustre_idl.h
+++ b/drivers/staging/lustre/include/uapi/linux/lustre/lustre_idl.h
@@ -1036,6 +1036,9 @@ struct lov_mds_md_v3 {		/* LOV EA mds/wire data (little-endian) */
 
 static inline __u32 lov_mds_md_size(__u16 stripes, __u32 lmm_magic)
 {
+	if (stripes == (__u16)-1)
+		stripes = 0;
+
 	if (lmm_magic == LOV_MAGIC_V3)
 		return sizeof(struct lov_mds_md_v3) +
 				stripes * sizeof(struct lov_ost_data_v1);
diff --git a/drivers/staging/lustre/include/uapi/linux/lustre/lustre_user.h b/drivers/staging/lustre/include/uapi/linux/lustre/lustre_user.h
index 28d4e0c..0f401bb 100644
--- a/drivers/staging/lustre/include/uapi/linux/lustre/lustre_user.h
+++ b/drivers/staging/lustre/include/uapi/linux/lustre/lustre_user.h
@@ -463,6 +463,9 @@ struct lov_comp_md_v1 {
 
 static inline __u32 lov_user_md_size(__u16 stripes, __u32 lmm_magic)
 {
+	if (stripes == (__u16)-1)
+		stripes = 0;
+
 	if (lmm_magic == LOV_USER_MAGIC_V1)
 		return sizeof(struct lov_user_md_v1) +
 				stripes * sizeof(struct lov_user_ost_data_v1);
-- 
1.8.3.1

* [lustre-devel] [PATCH v2 22/33] lustre: lov: keep minimum LOVEA size
From: James Simmons @ 2019-01-06 22:14 UTC (permalink / raw)
  To: lustre-devel

From: Bobi Jam <bobijam@hotmail.com>

For a PFL file, some of its components may be uninstantiated, and
their lov_ost_data_v1 arrays are not needed, so we should keep the
LOVEA as small as possible.

An uninstantiated component's stripe offset should be set.
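
The size rule described above can be modeled as follows: every component contributes its fixed headers, but only an instantiated one contributes lov_ost_data_v1 entries. This is a minimal sketch with hypothetical types and illustrative sizes; it feeds the gated stripe count into the per-component size, which is what keeping the LOVEA minimal requires.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-ins for the sizes used by lov_comp_md_size(). */
#define SK_COMP_HDR	32u	/* models sizeof(struct lov_comp_md_v1) */
#define SK_ENTRY_HDR	48u	/* models the per-entry sizeof(*lsme) term */
#define SK_MD_HDR	32u	/* models sizeof(struct lov_mds_md_v1) */
#define SK_OST_DATA	24u	/* models sizeof(struct lov_ost_data_v1) */

/* Hypothetical stand-in for struct lov_stripe_md_entry. */
struct sketch_comp {
	bool inited;		/* models lsme_inited() */
	uint16_t stripe_count;
};

static size_t sketch_comp_md_size(const struct sketch_comp *c, int count)
{
	size_t size = SK_COMP_HDR;
	int i;

	for (i = 0; i < count; i++) {
		/* Uninstantiated components have no OST objects yet. */
		uint16_t stripes = c[i].inited ? c[i].stripe_count : 0;

		size += SK_ENTRY_HDR;
		size += SK_MD_HDR + (size_t)stripes * SK_OST_DATA;
	}
	return size;
}
```

For a three-component layout with 1 and 4 stripes instantiated and a trailing uninstantiated component left at -1, the model gives 32 + 3 * (48 + 32) + 5 * 24 = 392 bytes.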

Signed-off-by: Bobi Jam <bobijam@hotmail.com>
WC-bug-id: https://jira.whamcloud.com/browse/LU-9489
Reviewed-on: https://review.whamcloud.com/27089
WC-bug-id: https://jira.whamcloud.com/browse/LU-9941
Reviewed-on: https://review.whamcloud.com/28845
Reviewed-by: Jinshan Xiong <jinshan.xiong@gmail.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/lov/lov_internal.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/staging/lustre/lustre/lov/lov_internal.h b/drivers/staging/lustre/lustre/lov/lov_internal.h
index 5e3eae7..dd4dd24 100644
--- a/drivers/staging/lustre/lustre/lov/lov_internal.h
+++ b/drivers/staging/lustre/lustre/lov/lov_internal.h
@@ -96,6 +96,11 @@ static inline bool lsm_entry_inited(const struct lov_stripe_md *lsm, int index)
 	return lsme_inited(lsm->lsm_entries[index]);
 }
 
+static inline bool lsm_is_composite(u32 magic)
+{
+	return magic == LOV_MAGIC_COMP_V1;
+}
+
 static inline size_t lov_comp_md_size(const struct lov_stripe_md *lsm)
 {
 	struct lov_stripe_md_entry *lsme;
@@ -110,8 +115,15 @@ static inline size_t lov_comp_md_size(const struct lov_stripe_md *lsm)
 
 	size = sizeof(struct lov_comp_md_v1);
 	for (entry = 0; entry < lsm->lsm_entry_count; entry++) {
+		u16 stripe_count;
+
 		lsme = lsm->lsm_entries[entry];
 
+		if (lsme_inited(lsme))
+			stripe_count = lsme->lsme_stripe_count;
+		else
+			stripe_count = 0;
+
 		size += sizeof(*lsme);
 		size += lov_mds_md_size(lsme->lsme_stripe_count,
 					lsme->lsme_magic);
-- 
1.8.3.1

* [lustre-devel] [PATCH v2 23/33] lustre: pfl: Read should not trigger layout write intent
From: James Simmons @ 2019-01-06 22:14 UTC (permalink / raw)
  To: lustre-devel

From: Jinshan Xiong <jinshan.xiong@gmail.com>

In lov_io_rw_iter_init(), only write operations, not reads, should
trigger a layout write intent.

For an append write, all uninitialized components must be
instantiated.

Page mkwrite should also trigger a write intent.
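
A minimal user-space sketch of which extent a layout write intent
covers after this change, modelled on the vvp_io_fini() hunk in this
patch. The io_req/extent types, OBJECT_EOF (standing in for
OBD_OBJECT_EOF) and the 4096-byte PAGE_SZ (standing in for
cl_offset()) are hypothetical; in the real code the intent is only
requested once the IO hits an uninstantiated component.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for OBD_OBJECT_EOF and the page size. */
#define OBJECT_EOF UINT64_MAX
#define PAGE_SZ 4096ULL

enum io_kind { IO_READ, IO_WRITE, IO_APPEND, IO_TRUNC, IO_MKWRITE };

struct io_req {
	enum io_kind kind;
	uint64_t pos;      /* write start offset */
	uint64_t count;    /* write length in bytes */
	uint64_t new_size; /* truncate target size */
	uint64_t index;    /* faulting page index for mkwrite */
};

struct extent {
	uint64_t start;    /* inclusive */
	uint64_t end;      /* exclusive */
};

/*
 * Returns 0 for reads, which never request a write intent (reads of
 * uninstantiated components just get zero-filled pages). Otherwise
 * fills *ext with the range the intent would cover and returns 1.
 */
static int write_intent_extent(const struct io_req *io, struct extent *ext)
{
	ext->start = 0;
	ext->end = OBJECT_EOF;

	switch (io->kind) {
	case IO_READ:
		return 0;
	case IO_APPEND:
		/* append: instantiate every component, i.e. [0, EOF) */
		return 1;
	case IO_WRITE:
		ext->start = io->pos;
		ext->end = io->pos + io->count;
		return 1;
	case IO_TRUNC:
		/* truncate: everything up to the new size */
		ext->end = io->new_size;
		return 1;
	case IO_MKWRITE:
		/* write fault: only the faulting page */
		ext->start = io->index * PAGE_SZ;
		ext->end = (io->index + 1) * PAGE_SZ;
		return 1;
	}
	return 0;
}
```

Append writes ignore the write position and cover [0, EOF), which is
what guarantees that every component gets instantiated.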

Signed-off-by: Jinshan Xiong <jinshan.xiong@gmail.com>
Signed-off-by: Bobi Jam <bobijam@hotmail.com>
WC-bug-id: https://jira.whamcloud.com/browse/LU-9008
Reviewed-on: https://review.whamcloud.com/26499
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/llite/vvp_io.c | 20 ++++++++++++-----
 drivers/staging/lustre/lustre/lov/lov_io.c   | 33 +++++++++++++++++-----------
 2 files changed, 34 insertions(+), 19 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/vvp_io.c b/drivers/staging/lustre/lustre/llite/vvp_io.c
index c325eba..d9f02ae 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_io.c
+++ b/drivers/staging/lustre/lustre/llite/vvp_io.c
@@ -323,18 +323,26 @@ static void vvp_io_fini(const struct lu_env *env, const struct cl_io_slice *ios)
 	 * RPC.
 	 */
 	if (io->ci_need_write_intent) {
+		loff_t end = OBD_OBJECT_EOF;
 		loff_t start = 0;
-		loff_t end = 0;
-
-		LASSERT(io->ci_type == CIT_WRITE || cl_io_is_trunc(io));
 
 		io->ci_need_write_intent = 0;
 
+		LASSERT(io->ci_type == CIT_WRITE ||
+			cl_io_is_trunc(io) || cl_io_is_mkwrite(io));
+
 		if (io->ci_type == CIT_WRITE) {
-			start = io->u.ci_rw.crw_pos;
-			end = io->u.ci_rw.crw_pos + io->u.ci_rw.crw_count;
-		} else {
+			if (!cl_io_is_append(io)) {
+				start = io->u.ci_rw.crw_pos;
+				end = start + io->u.ci_rw.crw_count;
+			}
+		} else if (cl_io_is_trunc(io)) {
 			end = io->u.ci_setattr.sa_attr.lvb_size;
+		} else { /* mkwrite */
+			pgoff_t index = io->u.ci_fault.ft_index;
+
+			start = cl_offset(io->ci_obj, index);
+			end = cl_offset(io->ci_obj, index + 1);
 		}
 
 		CDEBUG(D_VFSTRACE, DFID" type %d [%llx, %llx)\n",
diff --git a/drivers/staging/lustre/lustre/lov/lov_io.c b/drivers/staging/lustre/lustre/lov/lov_io.c
index 8a1bb85..0d809b1 100644
--- a/drivers/staging/lustre/lustre/lov/lov_io.c
+++ b/drivers/staging/lustre/lustre/lov/lov_io.c
@@ -378,6 +378,7 @@ static int lov_io_iter_init(const struct lu_env *env,
 {
 	struct lov_io	*lio = cl2lov_io(env, ios);
 	struct lov_stripe_md *lsm = lio->lis_object->lo_lsm;
+	struct cl_io *io = ios->cis_io;
 	struct lov_layout_entry *le;
 	struct lov_io_sub    *sub;
 	struct lu_extent ext;
@@ -394,15 +395,28 @@ static int lov_io_iter_init(const struct lu_env *env,
 		u64 start;
 		u64 end;
 
-		CDEBUG(D_VFSTRACE, "component[%d] flags %#x\n",
-		       index, lsm->lsm_entries[index]->lsme_flags);
-		if (!lsm_entry_inited(lsm, index))
-			break;
-
 		index++;
 		if (!lu_extent_is_overlapped(&ext, &le->lle_extent))
 			continue;
 
+		CDEBUG(D_VFSTRACE, "component[%d] flags %#x\n",
+		       index - 1, lsm->lsm_entries[index - 1]->lsme_flags);
+		if (!lsm_entry_inited(lsm, index - 1)) {
+			/* truncate IO will trigger write intent as well, and
+			 * it's handled in lov_io_setattr_iter_init()
+			 */
+			if (io->ci_type == CIT_WRITE || cl_io_is_mkwrite(io)) {
+				io->ci_need_write_intent = 1;
+				rc = -ENODATA;
+				break;
+			}
+
+			/* Read from uninitialized components should return
+			 * zero filled pages.
+			 */
+			continue;
+		}
+
 		for (stripe = 0; stripe < r0->lo_nr; stripe++) {
 			if (!lov_stripe_intersects(lsm, index - 1, stripe,
 						   &ext, &start, &end))
@@ -498,13 +512,6 @@ static int lov_io_rw_iter_init(const struct lu_env *env,
 	       start, lio->lis_pos, lio->lis_endpos,
 	       lio->lis_io_endpos);
 
-	index = lov_lsm_entry(lsm, lio->lis_endpos - 1);
-	if (index > 0 && !lsm_entry_inited(lsm, index)) {
-		io->ci_need_write_intent = 1;
-		io->ci_result = -ENODATA;
-		return io->ci_result;
-	}
-
 	/*
 	 * XXX The following call should be optimized: we know, that
 	 * [lio->lis_pos, lio->lis_endpos) intersects with exactly one stripe.
@@ -520,7 +527,7 @@ static int lov_io_setattr_iter_init(const struct lu_env *env,
 	struct lov_stripe_md *lsm = lio->lis_object->lo_lsm;
 	int index;
 
-	if (cl_io_is_trunc(io) && lio->lis_pos) {
+	if (cl_io_is_trunc(io) && lio->lis_pos > 0) {
 		index = lov_lsm_entry(lsm, lio->lis_pos - 1);
 		if (index > 0 && !lsm_entry_inited(lsm, index)) {
 			io->ci_need_write_intent = 1;
-- 
1.8.3.1

* [lustre-devel] [PATCH v2 24/33] lustre: pfl: fix hang with grouplocks
From: James Simmons @ 2019-01-06 22:14 UTC
  To: lustre-devel

From: Bobi Jam <bobijam@hotmail.com>

This is a makeshift fix. While a group lock is held on a file, a
write to that file must not trigger a layout change: during the write
IO the file's layout could change, and updating it would block on the
group lock that the writer itself holds. As a workaround, all
components of a composite (PFL) layout are instantiated before the
group lock is acquired.
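
A simplified model of the ordering this patch introduces: when the
layout is composite, every component is instantiated first (the real
code calls ll_layout_write_intent(inode, 0, OBD_OBJECT_EOF)) and only
then is the group lock taken. struct layout, instantiate_all(),
get_grouplock() and needs_layout_change() are hypothetical stand-ins,
not Lustre APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical in-memory layout model; not Lustre structures. */
#define MAX_COMPS 8

struct comp {
	bool inited;          /* mirrors LCME_FL_INIT */
};

struct layout {
	bool is_composite;    /* mirrors cl_layout::cl_is_composite */
	size_t nr_comps;
	struct comp comps[MAX_COMPS];
};

/* Stand-in for a write intent over [0, EOF): instantiate everything. */
static int instantiate_all(struct layout *lo)
{
	for (size_t i = 0; i < lo->nr_comps; i++)
		lo->comps[i].inited = true;
	return 0;
}

/* Instantiate all components first, then "take" the group lock. */
static int get_grouplock(struct layout *lo, bool *locked)
{
	if (lo->is_composite) {
		int rc = instantiate_all(lo);

		if (rc)
			return rc;
	}
	*locked = true;       /* stand-in for cl_get_grouplock() */
	return 0;
}

/* Would a write still have to change the layout (and so block)? */
static bool needs_layout_change(const struct layout *lo)
{
	for (size_t i = 0; i < lo->nr_comps; i++)
		if (!lo->comps[i].inited)
			return true;
	return false;
}
```

After get_grouplock() returns, no component is left uninstantiated, so
a write under the group lock has no reason to request a layout change.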

Signed-off-by: Bobi Jam <bobijam@hotmail.com>
WC-bug-id: https://jira.whamcloud.com/browse/LU-9344
Reviewed-on: https://review.whamcloud.com/26646
Reviewed-by: Jinshan Xiong <jinshan.xiong@gmail.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/include/cl_object.h |  2 ++
 drivers/staging/lustre/lustre/llite/file.c        | 26 +++++++++++++++++++++++
 drivers/staging/lustre/lustre/lov/lov_object.c    |  1 +
 3 files changed, 29 insertions(+)

diff --git a/drivers/staging/lustre/lustre/include/cl_object.h b/drivers/staging/lustre/lustre/include/cl_object.h
index 57ced0f..ee71f1c 100644
--- a/drivers/staging/lustre/lustre/include/cl_object.h
+++ b/drivers/staging/lustre/lustre/include/cl_object.h
@@ -288,6 +288,8 @@ struct cl_layout {
 	size_t		cl_size;
 	/** Layout generation. */
 	u32		cl_layout_gen;
+	/** whether layout is a composite one */
+	bool		cl_is_composite;
 };
 
 /**
diff --git a/drivers/staging/lustre/lustre/llite/file.c b/drivers/staging/lustre/lustre/llite/file.c
index 6a0a468..08ba8f7 100644
--- a/drivers/staging/lustre/lustre/llite/file.c
+++ b/drivers/staging/lustre/lustre/llite/file.c
@@ -1551,6 +1551,7 @@ static int ll_lov_setstripe(struct inode *inode, struct file *file,
 {
 	struct ll_inode_info   *lli = ll_i2info(inode);
 	struct ll_file_data    *fd = LUSTRE_FPRIVATE(file);
+	struct cl_object *obj = lli->lli_clob;
 	struct ll_grouplock    grouplock;
 	int		     rc;
 
@@ -1572,6 +1573,31 @@ static int ll_lov_setstripe(struct inode *inode, struct file *file,
 	LASSERT(!fd->fd_grouplock.lg_lock);
 	spin_unlock(&lli->lli_lock);
 
+	/**
+	 * XXX: group lock needs to protect all OST objects while PFL
+	 * can add new OST objects during the IO, so we'd instantiate
+	 * all OST objects before getting its group lock.
+	 */
+	if (obj) {
+		struct cl_layout cl = {
+			.cl_is_composite = false,
+		};
+		struct lu_env *env;
+		u16 refcheck;
+
+		env = cl_env_get(&refcheck);
+		if (IS_ERR(env))
+			return PTR_ERR(env);
+
+		rc = cl_object_layout_get(env, obj, &cl);
+		if (!rc && cl.cl_is_composite)
+			rc = ll_layout_write_intent(inode, 0, OBD_OBJECT_EOF);
+
+		cl_env_put(env, &refcheck);
+		if (rc)
+			return rc;
+	}
+
 	rc = cl_get_grouplock(ll_i2info(inode)->lli_clob,
 			      arg, (file->f_flags & O_NONBLOCK), &grouplock);
 	if (rc)
diff --git a/drivers/staging/lustre/lustre/lov/lov_object.c b/drivers/staging/lustre/lustre/lov/lov_object.c
index 35c9403..968c49d 100644
--- a/drivers/staging/lustre/lustre/lov/lov_object.c
+++ b/drivers/staging/lustre/lustre/lov/lov_object.c
@@ -1637,6 +1637,7 @@ static int lov_object_layout_get(const struct lu_env *env,
 
 	cl->cl_size = lov_comp_md_size(lsm);
 	cl->cl_layout_gen = lsm->lsm_layout_gen;
+	cl->cl_is_composite = lsm_is_composite(lsm->lsm_magic);
 
 	rc = lov_lsm_pack(lsm, buf->lb_buf, buf->lb_len);
 	lov_lsm_put(lsm);
-- 
1.8.3.1

* [lustre-devel] [PATCH v2 25/33] lustre: pfl: fix ost pool op->size handling
From: James Simmons @ 2019-01-06 22:14 UTC
  To: lustre-devel

From: Bobi Jam <bobijam@hotmail.com>

This patch fixes a misunderstanding of the ost_pool op_size field: it
holds the size in bytes of the allocated buffer, not the number of
array entries.
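
A minimal user-space sketch of the pool bookkeeping after this fix,
where op_size counts bytes and op_count counts used entries.
calloc()/free() replace kcalloc()/kfree(), locking is omitted, and
pool_add() is a hypothetical helper that is not part of the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct pool {
	uint32_t *op_array;
	unsigned int op_count;  /* entries in use */
	unsigned int op_size;   /* allocated buffer size in BYTES */
};

static int pool_init(struct pool *op, unsigned int count)
{
	op->op_count = 0;
	op->op_size = count * sizeof(op->op_array[0]);
	op->op_array = calloc(count, sizeof(op->op_array[0]));
	if (!op->op_array) {
		op->op_size = 0;
		return -1;
	}
	return 0;
}

/* Mirrors lov_ost_pool_extend(): grow only when the buffer is full. */
static int pool_extend(struct pool *op, unsigned int min_count)
{
	unsigned int new_count;
	uint32_t *new;

	if (op->op_count * sizeof(op->op_array[0]) < op->op_size)
		return 0;       /* still room for one more entry */

	new_count = 2 * op->op_size / sizeof(op->op_array[0]);
	if (new_count < min_count)
		new_count = min_count;
	new = calloc(new_count, sizeof(new[0]));
	if (!new)
		return -1;

	/* op_size is already a byte count, so no extra multiplication */
	memcpy(new, op->op_array, op->op_size);
	free(op->op_array);
	op->op_array = new;
	op->op_size = new_count * sizeof(new[0]);
	return 0;
}

static int pool_add(struct pool *op, uint32_t idx)
{
	if (pool_extend(op, op->op_count + 1))
		return -1;
	op->op_array[op->op_count++] = idx;
	return 0;
}
```

Keeping op_size in bytes means every comparison and copy has to
convert counts with sizeof(op->op_array[0]), which is exactly what the
changed lines in lov_pool.c do.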

Signed-off-by: Bobi Jam <bobijam@hotmail.com>
WC-bug-id: https://jira.whamcloud.com/browse/LU-9359
Reviewed-on: https://review.whamcloud.com/26706
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/lov/lov_internal.h |  1 -
 drivers/staging/lustre/lustre/lov/lov_io.c       |  3 ++-
 drivers/staging/lustre/lustre/lov/lov_pool.c     | 20 +++++++++++---------
 3 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/drivers/staging/lustre/lustre/lov/lov_internal.h b/drivers/staging/lustre/lustre/lov/lov_internal.h
index dd4dd24..3878cad 100644
--- a/drivers/staging/lustre/lustre/lov/lov_internal.h
+++ b/drivers/staging/lustre/lustre/lov/lov_internal.h
@@ -195,7 +195,6 @@ struct lsm_operations {
 })
 #endif
 
-#define pool_tgt_size(p)	((p)->pool_obds.op_size)
 #define pool_tgt_count(p)	((p)->pool_obds.op_count)
 #define pool_tgt_array(p)	((p)->pool_obds.op_array)
 #define pool_tgt_rw_sem(p)	((p)->pool_obds.op_rw_sem)
diff --git a/drivers/staging/lustre/lustre/lov/lov_io.c b/drivers/staging/lustre/lustre/lov/lov_io.c
index 0d809b1..ec0d14f 100644
--- a/drivers/staging/lustre/lustre/lov/lov_io.c
+++ b/drivers/staging/lustre/lustre/lov/lov_io.c
@@ -100,7 +100,8 @@ static int lov_io_sub_init(const struct lu_env *env, struct lov_io *lio,
 
 	LASSERT(!sub->sub_env);
 
-	if (unlikely(!lov_r0(lov, index)->lo_sub[stripe]))
+	if (unlikely(!lov_r0(lov, index)->lo_sub ||
+		     !lov_r0(lov, index)->lo_sub[stripe]))
 		return -EIO;
 
 	/* obtain new environment */
diff --git a/drivers/staging/lustre/lustre/lov/lov_pool.c b/drivers/staging/lustre/lustre/lov/lov_pool.c
index c79c2ae..b90fb1c 100644
--- a/drivers/staging/lustre/lustre/lov/lov_pool.c
+++ b/drivers/staging/lustre/lustre/lov/lov_pool.c
@@ -238,8 +238,9 @@ int lov_ost_pool_init(struct ost_pool *op, unsigned int count)
 	op->op_array = NULL;
 	op->op_count = 0;
 	init_rwsem(&op->op_rw_sem);
-	op->op_size = count;
-	op->op_array = kcalloc(op->op_size, sizeof(op->op_array[0]), GFP_NOFS);
+	op->op_size = count * sizeof(op->op_array[0]);
+	op->op_array = kcalloc(count, sizeof(op->op_array[0]),
+			       GFP_KERNEL);
 	if (!op->op_array) {
 		op->op_size = 0;
 		return -ENOMEM;
@@ -250,24 +251,25 @@ int lov_ost_pool_init(struct ost_pool *op, unsigned int count)
 /* Caller must hold write op_rwlock */
 int lov_ost_pool_extend(struct ost_pool *op, unsigned int min_count)
 {
-	__u32 *new;
-	int new_size;
+	int new_count;
+	u32 *new;
 
 	LASSERT(min_count != 0);
 
-	if (op->op_count < op->op_size)
+	if (op->op_count * sizeof(op->op_array[0]) < op->op_size)
 		return 0;
 
-	new_size = max(min_count, 2 * op->op_size);
-	new = kcalloc(new_size, sizeof(op->op_array[0]), GFP_NOFS);
+	new_count = max_t(u32, min_count,
+			  2 * op->op_size / sizeof(op->op_array[0]));
+	new = kcalloc(new_count, sizeof(op->op_array[0]), GFP_KERNEL);
 	if (!new)
 		return -ENOMEM;
 
 	/* copy old array to new one */
-	memcpy(new, op->op_array, op->op_size * sizeof(op->op_array[0]));
+	memcpy(new, op->op_array, op->op_size);
 	kfree(op->op_array);
 	op->op_array = new;
-	op->op_size = new_size;
+	op->op_size = new_count * sizeof(op->op_array[0]);
 	return 0;
 }
 
-- 
1.8.3.1

* [lustre-devel] [PATCH v2 26/33] lustre: lov: readahead shouldn't exceed component boundary
From: James Simmons @ 2019-01-06 22:14 UTC
  To: lustre-devel

From: Jinshan Xiong <jinshan.xiong@gmail.com>

Readahead must not cross the boundary of the current component.
Otherwise, the readahead RPC can extend into the next component even
though the DLM lock covering that component has not been checked.
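
The clamping order in the new lov_io_read_ahead() can be sketched as
follows, assuming the end index has already been converted from stripe
level to file level. ra_clamp() and PAGE_EOF (standing in for
CL_PAGE_EOF) are hypothetical, and page indices are plain 64-bit
integers here.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t pgoff;
#define PAGE_EOF UINT64_MAX   /* stand-in for CL_PAGE_EOF */

/*
 * ra_end:      file-level last page covered by the lock
 * comp_end_pg: first page index past the component, or PAGE_EOF for
 *              an open-ended last component
 * start:       file page index where readahead starts
 * pps:         pages per stripe (stripe_size >> PAGE_SHIFT)
 * nr_stripes:  stripe count of the component
 */
static pgoff ra_clamp(pgoff ra_end, pgoff comp_end_pg, pgoff start,
		      pgoff pps, unsigned int nr_stripes)
{
	pgoff stripe_last;

	/* never cross into the next component */
	if (comp_end_pg != PAGE_EOF && ra_end >= comp_end_pg)
		ra_end = comp_end_pg - 1;

	/* single stripe: the component boundary is the only limit */
	if (nr_stripes == 1)
		return ra_end;

	/* never exceed the end of the stripe containing start */
	stripe_last = start + pps - start % pps - 1;
	return ra_end < stripe_last ? ra_end : stripe_last;
}
```

The component clamp now runs before the single-stripe early return,
which is why the patch moves the lo_nr == 1 check below it.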

Signed-off-by: Jinshan Xiong <jinshan.xiong@gmail.com>
WC-bug-id: https://jira.whamcloud.com/browse/LU-9340
Reviewed-on: https://review.whamcloud.com/26677
Reviewed-on: https://review.whamcloud.com/26861
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/lov/lov_io.c   | 23 ++++++++++++++++-------
 drivers/staging/lustre/lustre/lov/lov_page.c |  4 +++-
 2 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/drivers/staging/lustre/lustre/lov/lov_io.c b/drivers/staging/lustre/lustre/lov/lov_io.c
index ec0d14f..9a3352f 100644
--- a/drivers/staging/lustre/lustre/lov/lov_io.c
+++ b/drivers/staging/lustre/lustre/lov/lov_io.c
@@ -684,28 +684,37 @@ static int lov_io_read_ahead(const struct lu_env *env,
 		return rc;
 
 	/**
-	 * Adjust the stripe index by layout of raid0. ra->cra_end is
+	 * Adjust the stripe index by layout of comp. ra->cra_end is
 	 * the maximum page index covered by an underlying DLM lock.
 	 * This function converts cra_end from stripe level to file
-	 * level, and make sure it's not beyond stripe boundary.
+	 * level, and make sure it's not beyond stripe and component
+	 * boundary.
 	 */
-	if (r0->lo_nr == 1)	/* single stripe file */
-		return 0;
 
 	/* cra_end is stripe level, convert it into file level */
 	ra_end = ra->cra_end;
 	if (ra_end != CL_PAGE_EOF)
-		ra_end = lov_stripe_pgoff(loo->lo_lsm, index, ra_end, stripe);
+		ra->cra_end = lov_stripe_pgoff(loo->lo_lsm, index,
+					       ra_end, stripe);
+
+	/* boundary of current component */
+	ra_end = cl_index(obj, (loff_t)lov_lse(loo, index)->lsme_extent.e_end);
+	if (ra_end != CL_PAGE_EOF && ra->cra_end >= ra_end)
+		ra->cra_end = ra_end - 1;
+
+	if (r0->lo_nr == 1) /* single stripe file */
+		return 0;
 
 	pps = lov_lse(loo, index)->lsme_stripe_size >> PAGE_SHIFT;
 
 	CDEBUG(D_READA,
 	       DFID " max_index = %lu, pps = %u, index = %u, stripe_size = %u, stripe no = %u, start index = %lu\n",
-	       PFID(lu_object_fid(lov2lu(loo))), ra_end, pps, index,
+	       PFID(lu_object_fid(lov2lu(loo))), ra->cra_end, pps, index,
 	       lov_lse(loo, index)->lsme_stripe_size, stripe, start);
 
 	/* never exceed the end of the stripe */
-	ra->cra_end = min_t(pgoff_t, ra_end, start + pps - start % pps - 1);
+	ra->cra_end = min_t(pgoff_t,
+			    ra->cra_end, start + pps - start % pps - 1);
 	return 0;
 }
 
diff --git a/drivers/staging/lustre/lustre/lov/lov_page.c b/drivers/staging/lustre/lustre/lov/lov_page.c
index 8b68d3c..90e2981 100644
--- a/drivers/staging/lustre/lustre/lov/lov_page.c
+++ b/drivers/staging/lustre/lustre/lov/lov_page.c
@@ -56,7 +56,9 @@ static int lov_comp_page_print(const struct lu_env *env,
 {
 	struct lov_page *lp = cl2lov_page(slice);
 
-	return (*printer)(env, cookie, LUSTRE_LOV_NAME "-page@%p, raid0\n", lp);
+	return (*printer)(env, cookie,
+			  LUSTRE_LOV_NAME "-page@%p, comp index: %x\n",
+			  lp, lp->lps_index);
 }
 
 static const struct cl_page_operations lov_comp_page_ops = {
-- 
1.8.3.1

* [lustre-devel] [PATCH v2 27/33] lustre: uapi: support negative flags
From: James Simmons @ 2019-01-06 22:14 UTC
  To: lustre-devel

From: Niu Yawei <yawei.niu@intel.com>

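Component 'flags' can now be negative flags. Add LCME_FL_NEG
(0x80000000) to mark a negative flag; it is never stored on disk.
Add it to LCME_KNOWN_FLAGS and replace LCME_ID_NONE with
LCME_ID_NOT_ID, which is defined as LCME_FL_NEG.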
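
The patch does not show how negative flags are consumed. Assuming
that LCME_FL_NEG in a query means "match components that do NOT have
these flags", a matcher could look like the sketch below. Only the
flag values come from the lustre_user.h hunk; comp_flags_match() and
query_valid() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the lustre_user.h hunk in this patch. */
#define LCME_FL_INIT     0x00000010u
#define LCME_FL_NEG      0x80000000u
#define LCME_KNOWN_FLAGS (LCME_FL_NEG | LCME_FL_INIT)

/*
 * Assumed semantics: without LCME_FL_NEG a query selects components
 * that have all requested flags set; with it, components that have
 * none of them set. The NEG bit lives only in the query.
 */
static bool comp_flags_match(uint32_t comp_flags, uint32_t query)
{
	uint32_t want = query & ~LCME_FL_NEG;

	assert(!(comp_flags & LCME_FL_NEG));  /* never stored on disk */
	if (query & LCME_FL_NEG)
		return (comp_flags & want) == 0;
	return (comp_flags & want) == want;
}

/* Reject queries carrying bits outside LCME_KNOWN_FLAGS. */
static bool query_valid(uint32_t query)
{
	return (query & ~LCME_KNOWN_FLAGS) == 0;
}
```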

Signed-off-by: Niu Yawei <yawei.niu@intel.com>
WC-bug-id: https://jira.whamcloud.com/browse/LU-8998
Reviewed-on: https://review.whamcloud.com/26490
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@gmail.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/include/uapi/linux/lustre/lustre_user.h | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/lustre/include/uapi/linux/lustre/lustre_user.h b/drivers/staging/lustre/include/uapi/linux/lustre/lustre_user.h
index 0f401bb..eeea79c 100644
--- a/drivers/staging/lustre/include/uapi/linux/lustre/lustre_user.h
+++ b/drivers/staging/lustre/include/uapi/linux/lustre/lustre_user.h
@@ -423,9 +423,12 @@ enum lov_comp_md_entry_flags {
 	LCME_FL_OFFLINE		= 0x00000004,   /* Not used */
 	LCME_FL_PREFERRED	= 0x00000008,	/* Not used */
 	LCME_FL_INIT		= 0x00000010,	/* instantiated */
+	LCME_FL_NEG		= 0x80000000,	/* used to indicate a negative
+						 * flag, won't be stored on disk
+						 */
 };
 
-#define LCME_KNOWN_FLAGS	LCME_FL_INIT
+#define LCME_KNOWN_FLAGS	(LCME_FL_NEG | LCME_FL_INIT)
 
 /* lcme_id can be specified as certain flags, and the first
  * bit of lcme_id is used to indicate that the ID is representing
@@ -436,7 +439,7 @@ enum lcme_id {
 	LCME_ID_INVAL	= 0x0,
 	LCME_ID_MAX	= 0x7FFFFFFF,
 	LCME_ID_ALL	= 0xFFFFFFFF,
-	LCME_ID_NONE	= 0x80000000
+	LCME_ID_NOT_ID	= LCME_FL_NEG
 };
 
 struct lov_comp_md_entry_v1 {
-- 
1.8.3.1

* [lustre-devel] [PATCH v2 28/33] lustre: llite: return v1/v3 layout for legacy app
From: James Simmons @ 2019-01-06 22:14 UTC
  To: lustre-devel

From: Niu Yawei <yawei.niu@intel.com>

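Legacy applications such as ADIO fetch the LOVEA with the
LL_IOC_LOV_GETSTRIPE ioctl and blindly treat the file layout as
v1/v3, so in this case we return a reasonable v1/v3 layout. For a
composite layout this is the last instantiated component, or the last
component if the file is empty. A one-time console warning suggests
using llapi_layout_get_by_path() instead.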
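
The component handed back to a legacy caller follows the loop added
to lov_getstripe(). A standalone sketch, where entry_flags[] is a
hypothetical stand-in for lcm_entries[i].lcme_flags and LCME_FL_INIT
is taken from lustre_user.h:

```c
#include <assert.h>
#include <stdint.h>

#define LCME_FL_INIT 0x00000010u

/*
 * Empty file: start past the end so the loop is skipped and the last
 * component is chosen. Otherwise stop at the first uninstantiated
 * component and step back to the one before it (index 0 if even the
 * first component is uninstantiated).
 */
static unsigned int legacy_comp_index(const uint32_t *entry_flags,
				      unsigned int entry_count,
				      uint64_t file_size)
{
	unsigned int i = file_size == 0 ? entry_count : 0;

	for (; i < entry_count; i++)
		if (!(entry_flags[i] & LCME_FL_INIT))
			break;
	if (i > 0)
		i--;
	return i;
}
```

Note that a non-empty file whose first component is uninstantiated
still yields index 0.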

Signed-off-by: Niu Yawei <yawei.niu@intel.com>
WC-bug-id: https://jira.whamcloud.com/browse/LU-9490
Reviewed-on: https://review.whamcloud.com/27183
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/lov/lov_object.c |  2 +-
 drivers/staging/lustre/lustre/lov/lov_pack.c   | 72 +++++++++++++++++++++++---
 2 files changed, 67 insertions(+), 7 deletions(-)

diff --git a/drivers/staging/lustre/lustre/lov/lov_object.c b/drivers/staging/lustre/lustre/lov/lov_object.c
index 968c49d..aad4fee 100644
--- a/drivers/staging/lustre/lustre/lov/lov_object.c
+++ b/drivers/staging/lustre/lustre/lov/lov_object.c
@@ -1615,7 +1615,7 @@ static int lov_object_getstripe(const struct lu_env *env, struct cl_object *obj,
 	if (!lsm)
 		return -ENODATA;
 
-	rc = lov_getstripe(cl2lov(obj), lsm, lum);
+	rc = lov_getstripe(env, cl2lov(obj), lsm, lum);
 	lov_lsm_put(lsm);
 	return rc;
 }
diff --git a/drivers/staging/lustre/lustre/lov/lov_pack.c b/drivers/staging/lustre/lustre/lov/lov_pack.c
index 32e4b33..10be119 100644
--- a/drivers/staging/lustre/lustre/lov/lov_pack.c
+++ b/drivers/staging/lustre/lustre/lov/lov_pack.c
@@ -315,12 +315,14 @@ struct lov_stripe_md *lov_unpackmd(struct lov_obd *lov, void *buf,
  * the maximum number of OST indices which will fit in the user buffer.
  * lmm_magic must be LOV_USER_MAGIC.
  */
-int lov_getstripe(struct lov_object *obj, struct lov_stripe_md *lsm,
-		  struct lov_user_md __user *lump)
+int lov_getstripe(const struct lu_env *env, struct lov_object *obj,
+		  struct lov_stripe_md *lsm, struct lov_user_md __user *lump)
 {
 	/* we use lov_user_md_v3 because it is larger than lov_user_md_v1 */
 	struct lov_mds_md *lmmk;
-	ssize_t lmm_size;
+	struct lov_user_md_v1 lum;
+	ssize_t lmm_size, lum_size = 0;
+	static bool printed;
 	size_t lmmk_size;
 	int rc = 0;
 
@@ -332,6 +334,13 @@ int lov_getstripe(struct lov_object *obj, struct lov_stripe_md *lsm,
 		goto out;
 	}
 
+	if (!printed) {
+		LCONSOLE_WARN("%s: using old ioctl(LL_IOC_LOV_GETSTRIPE) on " DFID ", use llapi_layout_get_by_path()\n",
+			      current->comm,
+			      PFID(&obj->lo_cl.co_lu.lo_header->loh_fid));
+		printed = true;
+	}
+
 	lmmk_size = lov_comp_md_size(lsm);
 	lmmk = kvzalloc(lmmk_size, GFP_KERNEL);
 	if (!lmmk) {
@@ -357,10 +366,61 @@ int lov_getstripe(struct lov_object *obj, struct lov_stripe_md *lsm,
 		}
 	}
 
-	if (copy_to_user(lump, lmmk, lmmk_size))
+	/* A legacy application passes a limited buffer; we need to figure
+	 * out the user buffer size from the passed-in lmm_stripe_count.
+	 */
+	if (copy_from_user(&lum, lump, sizeof(struct lov_user_md_v1))) {
 		rc = -EFAULT;
-	else
-		rc = 0;
+		goto out_free;
+	}
+
+	if (lum.lmm_magic == LOV_USER_MAGIC_V1 ||
+	    lum.lmm_magic == LOV_USER_MAGIC_V3)
+		lum_size = lov_user_md_size(lum.lmm_stripe_count,
+					    lum.lmm_magic);
+
+	if (lum_size != 0) {
+		struct lov_mds_md *comp_md = lmmk;
+
+		/* Legacy app (ADIO for instance) treats the layout as V1/V3
+		 * blindly, we'd return a reasonable V1/V3 for them.
+		 */
+		if (lmmk->lmm_magic == LOV_MAGIC_COMP_V1) {
+			struct lov_comp_md_v1 *comp_v1;
+			struct cl_object *cl_obj;
+			struct cl_attr attr;
+			int i;
+
+			attr.cat_size = 0;
+			cl_obj = cl_object_top(&obj->lo_cl);
+			cl_object_attr_get(env, cl_obj, &attr);
+
+			/* return the last instantiated component if file size
+			 * is non-zero, otherwise, return the last component.
+			 */
+			comp_v1 = (struct lov_comp_md_v1 *)lmmk;
+			i = attr.cat_size == 0 ? comp_v1->lcm_entry_count : 0;
+			for (; i < comp_v1->lcm_entry_count; i++) {
+				if (!(comp_v1->lcm_entries[i].lcme_flags &
+				    LCME_FL_INIT))
+					break;
+			}
+			if (i > 0)
+				i--;
+			comp_md = (struct lov_mds_md *)((char *)comp_v1 +
+					comp_v1->lcm_entries[i].lcme_offset);
+		}
+		if (copy_to_user(lump, comp_md, lum_size)) {
+			rc = -EFAULT;
+			goto out_free;
+		}
+	} else {
+		if (copy_to_user(lump, lmmk, lmmk_size)) {
+			rc = -EFAULT;
+			goto out_free;
+		}
+	}
+
 out_free:
 	kvfree(lmmk);
 out:
-- 
1.8.3.1

* [lustre-devel] [PATCH v2 29/33] lustre: llite: restore ll_file_getstripe in ll_lov_setstripe
From: James Simmons @ 2019-01-06 22:14 UTC
  To: lustre-devel

From: Bobi Jam <bobijam@hotmail.com>

Commit fafe6b4d4a6fa63cedff3bd44e6578009578b3d7 removed the call to
ll_file_getstripe() from ll_lov_setstripe(); restore it.

Add a @size parameter to the xxx_getstripe family of interfaces,
indicating the maximum buffer size the user provides to hold the
stripe information. It is mainly for ll_lov_setstripe(), which calls
ll_file_getstripe() to fetch basic stripe information.

Add the LL_IOC_LOV_SETSTRIPE_NEW/LL_IOC_LOV_GETSTRIPE_NEW ioctls,
which define the interface correctly and can be used in later Lustre
versions.
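
The _NEW requests can reuse type 'f' and numbers 154/155 because the
direction and size bits differ, so the encoded values are distinct
case labels in the ioctl switches. The sketch hand-rolls the
asm-generic ioctl encoding used by most Linux architectures (some,
such as powerpc, differ), with a hypothetical 32-byte stand-in for
struct lov_user_md; real code should use the _IOW/_IOR/_IOWR macros.
The old GETSTRIPE is declared _IOW although the kernel copies data
out to the user, so the _IOR encoding is presumably part of what
"correctly" means here.

```c
#include <assert.h>
#include <stdint.h>

/* asm-generic layout: nr 0-7, type 8-15, size 16-29, dir 30-31 */
#define IOC_WRITE 1u
#define IOC_READ  2u
#define MK_IOC(dir, type, nr, size)                            \
	(((uint32_t)(dir) << 30) | ((uint32_t)(size) << 16) |   \
	 ((uint32_t)(type) << 8) | (uint32_t)(nr))
#define IOC_NR(cmd)   ((cmd) & 0xffu)
#define IOC_TYPE(cmd) (((cmd) >> 8) & 0xffu)
#define IOC_SIZE(cmd) (((cmd) >> 16) & 0x3fffu)
#define IOC_DIR(cmd)  ((cmd) >> 30)

/* Hypothetical 32-byte stand-in for struct lov_user_md. */
struct fake_lov_user_md {
	unsigned char bytes[32];
};

/* Same shapes as the lustre_user.h definitions in this patch. */
static const uint32_t SETSTRIPE =
	MK_IOC(IOC_WRITE, 'f', 154, sizeof(long));
static const uint32_t SETSTRIPE_NEW =
	MK_IOC(IOC_READ | IOC_WRITE, 'f', 154,
	       sizeof(struct fake_lov_user_md));
static const uint32_t GETSTRIPE =
	MK_IOC(IOC_WRITE, 'f', 155, sizeof(long));
static const uint32_t GETSTRIPE_NEW =
	MK_IOC(IOC_READ, 'f', 155, sizeof(struct fake_lov_user_md));
```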

Signed-off-by: Bobi Jam <bobijam@hotmail.com>
WC-bug-id: https://jira.whamcloud.com/browse/LU-9367
Reviewed-on: https://review.whamcloud.com/26915
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@gmail.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 .../lustre/include/uapi/linux/lustre/lustre_user.h |  2 ++
 drivers/staging/lustre/lustre/include/cl_object.h  |  4 +--
 drivers/staging/lustre/lustre/llite/dir.c          |  5 ++-
 drivers/staging/lustre/lustre/llite/file.c         | 36 +++++++++++++++-------
 .../staging/lustre/lustre/lov/lov_cl_internal.h    |  5 +--
 drivers/staging/lustre/lustre/lov/lov_object.c     |  4 +--
 drivers/staging/lustre/lustre/lov/lov_pack.c       | 33 ++++++++++++++------
 drivers/staging/lustre/lustre/obdclass/cl_object.c |  5 +--
 8 files changed, 64 insertions(+), 30 deletions(-)

diff --git a/drivers/staging/lustre/include/uapi/linux/lustre/lustre_user.h b/drivers/staging/lustre/include/uapi/linux/lustre/lustre_user.h
index eeea79c..835b60c 100644
--- a/drivers/staging/lustre/include/uapi/linux/lustre/lustre_user.h
+++ b/drivers/staging/lustre/include/uapi/linux/lustre/lustre_user.h
@@ -236,7 +236,9 @@ struct ll_futimes_3 {
 #define LL_IOC_SETFLAGS		 _IOW('f', 152, long)
 #define LL_IOC_CLRFLAGS		 _IOW('f', 153, long)
 #define LL_IOC_LOV_SETSTRIPE	    _IOW('f', 154, long)
+#define LL_IOC_LOV_SETSTRIPE_NEW	_IOWR('f', 154, struct lov_user_md)
 #define LL_IOC_LOV_GETSTRIPE	    _IOW('f', 155, long)
+#define LL_IOC_LOV_GETSTRIPE_NEW	_IOR('f', 155, struct lov_user_md)
 #define LL_IOC_LOV_SETEA		_IOW('f', 156, long)
 /*	LL_IOC_RECREATE_OBJ		157 obsolete */
 /*	LL_IOC_RECREATE_FID		158 obsolete */
diff --git a/drivers/staging/lustre/lustre/include/cl_object.h b/drivers/staging/lustre/lustre/include/cl_object.h
index ee71f1c..4f0e8e2 100644
--- a/drivers/staging/lustre/lustre/include/cl_object.h
+++ b/drivers/staging/lustre/lustre/include/cl_object.h
@@ -390,7 +390,7 @@ struct cl_object_operations {
 	 * Object getstripe method.
 	 */
 	int (*coo_getstripe)(const struct lu_env *env, struct cl_object *obj,
-			     struct lov_user_md __user *lum);
+			     struct lov_user_md __user *lum, size_t size);
 	/**
 	 * Get FIEMAP mapping from the object.
 	 */
@@ -2057,7 +2057,7 @@ int  cl_conf_set(const struct lu_env *env, struct cl_object *obj,
 int cl_object_prune(const struct lu_env *env, struct cl_object *obj);
 void cl_object_kill(const struct lu_env *env, struct cl_object *obj);
 int  cl_object_getstripe(const struct lu_env *env, struct cl_object *obj,
-			 struct lov_user_md __user *lum);
+			 struct lov_user_md __user *lum, size_t size);
 int cl_object_fiemap(const struct lu_env *env, struct cl_object *obj,
 		     struct ll_fiemap_info_key *fmkey, struct fiemap *fiemap,
 		     size_t *buflen);
diff --git a/drivers/staging/lustre/lustre/llite/dir.c b/drivers/staging/lustre/lustre/llite/dir.c
index 57acb7b..2459f5c 100644
--- a/drivers/staging/lustre/lustre/llite/dir.c
+++ b/drivers/staging/lustre/lustre/llite/dir.c
@@ -1224,6 +1224,7 @@ static long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 
 		return rc;
 	}
+	case LL_IOC_LOV_SETSTRIPE_NEW:
 	case LL_IOC_LOV_SETSTRIPE: {
 		struct lov_user_md_v3 lumv3;
 		struct lov_user_md_v1 *lumv1 = (struct lov_user_md_v1 *)&lumv3;
@@ -1363,6 +1364,7 @@ static long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 	case IOC_OBD_STATFS:
 		return ll_obd_statfs(inode, (void __user *)arg);
 	case LL_IOC_LOV_GETSTRIPE:
+	case LL_IOC_LOV_GETSTRIPE_NEW:
 	case LL_IOC_MDC_GETINFO:
 	case IOC_MDC_GETFILEINFO:
 	case IOC_MDC_GETFILESTRIPE: {
@@ -1405,7 +1407,8 @@ static long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 		}
 
 		if (cmd == IOC_MDC_GETFILESTRIPE ||
-		    cmd == LL_IOC_LOV_GETSTRIPE) {
+		    cmd == LL_IOC_LOV_GETSTRIPE ||
+		    cmd == LL_IOC_LOV_GETSTRIPE_NEW) {
 			lump = (struct lov_user_md __user *)arg;
 		} else {
 			struct lov_user_mds_data __user *lmdp;
diff --git a/drivers/staging/lustre/lustre/llite/file.c b/drivers/staging/lustre/lustre/llite/file.c
index 08ba8f7..94574b7 100644
--- a/drivers/staging/lustre/lustre/llite/file.c
+++ b/drivers/staging/lustre/lustre/llite/file.c
@@ -1481,7 +1481,7 @@ int ll_lov_getstripe_ea_info(struct inode *inode, const char *filename,
 }
 
 static int ll_lov_setea(struct inode *inode, struct file *file,
-			unsigned long arg)
+			void __user *arg)
 {
 	__u64			 flags = MDS_OPEN_HAS_OBJS | FMODE_WRITE;
 	struct lov_user_md	*lump;
@@ -1496,7 +1496,7 @@ static int ll_lov_setea(struct inode *inode, struct file *file,
 	if (!lump)
 		return -ENOMEM;
 
-	if (copy_from_user(lump, (struct lov_user_md __user *)arg, lum_size)) {
+	if (copy_from_user(lump, arg, lum_size)) {
 		kvfree(lump);
 		return -EFAULT;
 	}
@@ -1509,8 +1509,7 @@ static int ll_lov_setea(struct inode *inode, struct file *file,
 	return rc;
 }
 
-static int ll_file_getstripe(struct inode *inode,
-			     struct lov_user_md __user *lum)
+static int ll_file_getstripe(struct inode *inode, void __user *lum, size_t size)
 {
 	struct lu_env *env;
 	u16 refcheck;
@@ -1520,13 +1519,13 @@ static int ll_file_getstripe(struct inode *inode,
 	if (IS_ERR(env))
 		return PTR_ERR(env);
 
-	rc = cl_object_getstripe(env, ll_i2info(inode)->lli_clob, lum);
+	rc = cl_object_getstripe(env, ll_i2info(inode)->lli_clob, lum, size);
 	cl_env_put(env, &refcheck);
 	return rc;
 }
 
 static int ll_lov_setstripe(struct inode *inode, struct file *file,
-			    unsigned long arg)
+			    void __user *arg)
 {
 	struct lov_user_md __user *lum = (struct lov_user_md __user *)arg;
 	struct lov_user_md *klum;
@@ -1540,8 +1539,22 @@ static int ll_lov_setstripe(struct inode *inode, struct file *file,
 	lum_size = rc;
 	rc = ll_lov_setstripe_ea_info(inode, file->f_path.dentry, flags, klum,
 				      lum_size);
-	cl_lov_delay_create_clear(&file->f_flags);
+	if (!rc) {
+		u32 gen;
+
+		rc = put_user(0, &lum->lmm_stripe_count);
+		if (rc)
+			goto out;
 
+		rc = ll_layout_refresh(inode, &gen);
+		if (rc)
+			goto out;
+
+		rc = ll_file_getstripe(inode, arg, lum_size);
+	}
+
+	cl_lov_delay_create_clear(&file->f_flags);
+out:
 	kfree(klum);
 	return rc;
 }
@@ -2329,9 +2342,10 @@ int ll_ioctl_fssetxattr(struct inode *inode, unsigned int cmd,
 		}
 		return 0;
 	case LL_IOC_LOV_SETSTRIPE:
-		return ll_lov_setstripe(inode, file, arg);
+	case LL_IOC_LOV_SETSTRIPE_NEW:
+		return ll_lov_setstripe(inode, file, (void __user *) arg);
 	case LL_IOC_LOV_SETEA:
-		return ll_lov_setea(inode, file, arg);
+		return ll_lov_setea(inode, file, (void __user *) arg);
 	case LL_IOC_LOV_SWAP_LAYOUTS: {
 		struct file *file2;
 		struct lustre_swap_layouts lsl;
@@ -2384,8 +2398,8 @@ int ll_ioctl_fssetxattr(struct inode *inode, unsigned int cmd,
 		return rc;
 	}
 	case LL_IOC_LOV_GETSTRIPE:
-		return ll_file_getstripe(inode,
-					 (struct lov_user_md __user *)arg);
+	case LL_IOC_LOV_GETSTRIPE_NEW:
+		return ll_file_getstripe(inode, (void __user *)arg, 0);
 	case FSFILT_IOC_GETFLAGS:
 	case FSFILT_IOC_SETFLAGS:
 		return ll_iocontrol(inode, file, cmd, arg);
diff --git a/drivers/staging/lustre/lustre/lov/lov_cl_internal.h b/drivers/staging/lustre/lustre/lov/lov_cl_internal.h
index 96e6636..5d4c83b 100644
--- a/drivers/staging/lustre/lustre/lov/lov_cl_internal.h
+++ b/drivers/staging/lustre/lustre/lov/lov_cl_internal.h
@@ -651,8 +651,9 @@ static inline struct lov_stripe_md_entry *lov_lse(struct lov_object *lov, int i)
 }
 
 /* lov_pack.c */
-int lov_getstripe(struct lov_object *obj, struct lov_stripe_md *lsm,
-		  struct lov_user_md __user *lump);
+int lov_getstripe(const struct lu_env *env, struct lov_object *obj,
+		  struct lov_stripe_md *lsm, struct lov_user_md __user *lump,
+		  size_t size);
 
 /** @} lov */
 
diff --git a/drivers/staging/lustre/lustre/lov/lov_object.c b/drivers/staging/lustre/lustre/lov/lov_object.c
index aad4fee..72f42fc 100644
--- a/drivers/staging/lustre/lustre/lov/lov_object.c
+++ b/drivers/staging/lustre/lustre/lov/lov_object.c
@@ -1605,7 +1605,7 @@ static int lov_object_fiemap(const struct lu_env *env, struct cl_object *obj,
 }
 
 static int lov_object_getstripe(const struct lu_env *env, struct cl_object *obj,
-				struct lov_user_md __user *lum)
+				struct lov_user_md __user *lum, size_t size)
 {
 	struct lov_object *lov = cl2lov(obj);
 	struct lov_stripe_md *lsm;
@@ -1615,7 +1615,7 @@ static int lov_object_getstripe(const struct lu_env *env, struct cl_object *obj,
 	if (!lsm)
 		return -ENODATA;
 
-	rc = lov_getstripe(env, cl2lov(obj), lsm, lum);
+	rc = lov_getstripe(env, cl2lov(obj), lsm, lum, size);
 	lov_lsm_put(lsm);
 	return rc;
 }
diff --git a/drivers/staging/lustre/lustre/lov/lov_pack.c b/drivers/staging/lustre/lustre/lov/lov_pack.c
index 10be119..ef3c040 100644
--- a/drivers/staging/lustre/lustre/lov/lov_pack.c
+++ b/drivers/staging/lustre/lustre/lov/lov_pack.c
@@ -314,12 +314,16 @@ struct lov_stripe_md *lov_unpackmd(struct lov_obd *lov, void *buf,
  * @lump is a pointer to an in-core struct with lmm_ost_count indicating
  * the maximum number of OST indices which will fit in the user buffer.
  * lmm_magic must be LOV_USER_MAGIC.
+ *
+ * If @size > 0, User specified limited buffer size, usually the buffer is from
+ * ll_lov_setstripe(), and the buffer can only hold basic layout template info.
  */
 int lov_getstripe(const struct lu_env *env, struct lov_object *obj,
-		  struct lov_stripe_md *lsm, struct lov_user_md __user *lump)
+		  struct lov_stripe_md *lsm, struct lov_user_md __user *lump,
+		  size_t size)
 {
 	/* we use lov_user_md_v3 because it is larger than lov_user_md_v1 */
-	struct lov_mds_md *lmmk;
+	struct lov_mds_md *lmmk, *lmm;
 	struct lov_user_md_v1 lum;
 	ssize_t lmm_size, lum_size = 0;
 	static bool printed;
@@ -410,15 +414,24 @@ int lov_getstripe(const struct lu_env *env, struct lov_object *obj,
 			comp_md = (struct lov_mds_md *)((char *)comp_v1 +
 					comp_v1->lcm_entries[i].lcme_offset);
 		}
-		if (copy_to_user(lump, comp_md, lum_size)) {
-			rc = -EFAULT;
-			goto out_free;
-		}
+
+		lmm = comp_md;
+		lmm_size = lum_size;
 	} else {
-		if (copy_to_user(lump, lmmk, lmmk_size)) {
-			rc = -EFAULT;
-			goto out_free;
-		}
+		lmm = lmmk;
+		lmm_size = lmmk_size;
+	}
+	/**
+	 * User specified limited buffer size, usually the buffer is
+	 * from ll_lov_setstripe(), and the buffer can only hold basic
+	 * layout template info.
+	 */
+	if (size == 0 || size > lmm_size)
+		size = lmm_size;
+
+	if (copy_to_user(lump, lmm, size)) {
+		rc = -EFAULT;
+		goto out_free;
 	}
 
 out_free:
diff --git a/drivers/staging/lustre/lustre/obdclass/cl_object.c b/drivers/staging/lustre/lustre/obdclass/cl_object.c
index 09fc7e7..b2bf570 100644
--- a/drivers/staging/lustre/lustre/obdclass/cl_object.c
+++ b/drivers/staging/lustre/lustre/obdclass/cl_object.c
@@ -323,7 +323,7 @@ int cl_object_prune(const struct lu_env *env, struct cl_object *obj)
  * Get stripe information of this object.
  */
 int cl_object_getstripe(const struct lu_env *env, struct cl_object *obj,
-			struct lov_user_md __user *uarg)
+			struct lov_user_md __user *uarg, size_t size)
 {
 	struct lu_object_header *top;
 	int result = 0;
@@ -331,7 +331,8 @@ int cl_object_getstripe(const struct lu_env *env, struct cl_object *obj,
 	top = obj->co_lu.lo_header;
 	list_for_each_entry(obj, &top->loh_layers, co_lu.lo_linkage) {
 		if (obj->co_ops->coo_getstripe) {
-			result = obj->co_ops->coo_getstripe(env, obj, uarg);
+			result = obj->co_ops->coo_getstripe(env, obj, uarg,
+							    size);
 			if (result)
 				break;
 		}
-- 
1.8.3.1

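For reference, the buffer-size rule that patch 29 adds to lov_getstripe() can be sketched in userspace C. This is an illustration only, not the Lustre code. It assumes memcpy() as a stand-in for copy_to_user(), and getstripe_copy_len() and getstripe_copy() are hypothetical names. A size of 0 (the LL_IOC_LOV_GETSTRIPE path) means "copy the whole layout". A non-zero size (the ll_lov_setstripe() path, which passes lum_size) caps the copy so it never writes past the caller's buffer:

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper mirroring the clamp in lov_getstripe():
 * size == 0 means "no caller limit"; otherwise never copy more
 * bytes than the packed layout actually holds.
 */
static size_t getstripe_copy_len(size_t size, size_t lmm_size)
{
	if (size == 0 || size > lmm_size)
		size = lmm_size;
	return size;
}

/* Userspace stand-in for the copy_to_user() step; returns bytes copied. */
static size_t getstripe_copy(void *dst, const void *lmm, size_t lmm_size,
			     size_t size)
{
	size_t len = getstripe_copy_len(size, lmm_size);

	memcpy(dst, lmm, len);
	return len;
}
```

Taking the smaller of the two sizes is what allows ll_lov_setstripe() to return the refreshed layout in the user's original setstripe buffer, which may only be large enough for the basic layout template.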

* [lustre-devel] [PATCH v2 30/33] lustre: lov: do not split IO for single striped file
  2019-01-06 22:13 [lustre-devel] [PATCH v2 00/33] lustre: add PFL support James Simmons
                   ` (28 preceding siblings ...)
  2019-01-06 22:14 ` [lustre-devel] [PATCH v2 29/33] lustre: llite: restore ll_file_getstripe in ll_lov_setstripe James Simmons
@ 2019-01-06 22:14 ` James Simmons
  2019-01-06 22:14 ` [lustre-devel] [PATCH v2 31/33] lustre: lov: call cl_object_attr_get under cl_attr lock James Simmons
                   ` (2 subsequent siblings)
  32 siblings, 0 replies; 47+ messages in thread
From: James Simmons @ 2019-01-06 22:14 UTC (permalink / raw)
  To: lustre-devel

From: Jinshan Xiong <jinshan.xiong@gmail.com>

The stripe size of a single-striped file is not reliable, so it
shouldn't be used to split I/O.

Signed-off-by: Jinshan Xiong <jinshan.xiong@gmail.com>
WC-bug-id: https://jira.whamcloud.com/browse/LU-9841
Reviewed-on: https://review.whamcloud.com/28451
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/lov/lov_io.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/staging/lustre/lustre/lov/lov_io.c b/drivers/staging/lustre/lustre/lov/lov_io.c
index 9a3352f..47bb618 100644
--- a/drivers/staging/lustre/lustre/lov/lov_io.c
+++ b/drivers/staging/lustre/lustre/lov/lov_io.c
@@ -466,7 +466,6 @@ static int lov_io_rw_iter_init(const struct lu_env *env,
 	struct cl_io	 *io  = ios->cis_io;
 	u64 start = io->u.ci_rw.crw_pos;
 	struct lov_stripe_md_entry *lse;
-	unsigned long ssize;
 	int index;
 	u64 next;
 
@@ -491,11 +490,15 @@ static int lov_io_rw_iter_init(const struct lu_env *env,
 
 	lse = lov_lse(lio->lis_object, index);
 
-	ssize = lse->lsme_stripe_size;
-	lov_do_div64(start, ssize);
-	next = (start + 1) * ssize;
-	if (next <= start * ssize)
-		next = ~0ull;
+	next = MAX_LFS_FILESIZE;
+	if (lse->lsme_stripe_count > 1) {
+		unsigned long ssize = lse->lsme_stripe_size;
+
+		lov_do_div64(start, ssize);
+		next = (start + 1) * ssize;
+		if (next <= start * ssize)
+			next = MAX_LFS_FILESIZE;
+	}
 
 	LASSERTF(io->u.ci_rw.crw_pos >= lse->lsme_extent.e_start,
 		 "pos %lld, [%lld, %lld]\n", io->u.ci_rw.crw_pos,
-- 
1.8.3.1

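The boundary computation from patch 30 can be sketched as a standalone function. This is a hedged illustration, not the kernel code. io_chunk_end() is a hypothetical name, SKETCH_MAX_FILESIZE stands in for MAX_LFS_FILESIZE (assumed here to be INT64_MAX, as on 64-bit kernels), and plain division replaces lov_do_div64(). The stripe_size == 0 guard is an extra safety check that is not in the patch:

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's MAX_LFS_FILESIZE. */
#define SKETCH_MAX_FILESIZE INT64_MAX

/*
 * Return the file offset at which the current I/O chunk must end,
 * following the logic patch 30 adds to lov_io_rw_iter_init():
 * only components with more than one stripe are split at stripe
 * boundaries; a single-striped component is never split.
 */
static uint64_t io_chunk_end(uint64_t pos, uint64_t stripe_size,
			     unsigned int stripe_count)
{
	uint64_t stripe_nr, next;

	if (stripe_count <= 1 || stripe_size == 0)
		return SKETCH_MAX_FILESIZE;

	stripe_nr = pos / stripe_size;		/* stripe chunk holding pos */
	next = (stripe_nr + 1) * stripe_size;	/* start of the next chunk */
	if (next <= stripe_nr * stripe_size)	/* multiplication wrapped */
		return SKETCH_MAX_FILESIZE;
	return next;
}
```

With four 1 MiB stripes, an I/O starting at offset 0 is cut at 1048576. With a single stripe the whole request stays in one chunk, which is the behaviour the patch is after.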

* [lustre-devel] [PATCH v2 31/33] lustre: lov: call cl_object_attr_get under cl_attr lock
  2019-01-06 22:13 [lustre-devel] [PATCH v2 00/33] lustre: add PFL support James Simmons
                   ` (29 preceding siblings ...)
  2019-01-06 22:14 ` [lustre-devel] [PATCH v2 30/33] lustre: lov: do not split IO for single striped file James Simmons
@ 2019-01-06 22:14 ` James Simmons
  2019-01-06 22:14 ` [lustre-devel] [PATCH v2 32/33] lustre: lov: use stripe_count instead of stripe_nr James Simmons
  2019-01-06 22:14 ` [lustre-devel] [PATCH v2 33/33] lustre: update version to 2.9.99 James Simmons
  32 siblings, 0 replies; 47+ messages in thread
From: James Simmons @ 2019-01-06 22:14 UTC (permalink / raw)
  To: lustre-devel

From: Mike Pershin <mpershin@whamcloud.com>

cl_object_attr_get() must be called with cl_object_attr_lock
held. There is a place in lov_getstripe() where it is called
without that lock.

Signed-off-by: Mike Pershin <mpershin@whamcloud.com>
WC-bug-id: https://jira.whamcloud.com/browse/LU-10232
Reviewed-on: https://review.whamcloud.com/30052
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/lov/lov_pack.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/staging/lustre/lustre/lov/lov_pack.c b/drivers/staging/lustre/lustre/lov/lov_pack.c
index ef3c040..089e556 100644
--- a/drivers/staging/lustre/lustre/lov/lov_pack.c
+++ b/drivers/staging/lustre/lustre/lov/lov_pack.c
@@ -397,7 +397,9 @@ int lov_getstripe(const struct lu_env *env, struct lov_object *obj,
 
 			attr.cat_size = 0;
 			cl_obj = cl_object_top(&obj->lo_cl);
+			cl_object_attr_lock(cl_obj);
 			cl_object_attr_get(env, cl_obj, &attr);
+			cl_object_attr_unlock(cl_obj);
 
 			/* return the last instantiated component if file size
 			 * is non-zero, otherwise, return the last component.
-- 
1.8.3.1

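The rule in patch 31 follows the usual "caller holds the lock" convention. Below is a minimal userspace sketch using a pthread mutex. sketch_object and sketch_attr are hypothetical types standing in for cl_object and cl_attr. The sketch only shows the lock, get, unlock bracketing that the patch adds around cl_object_attr_get():

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical object whose attributes are guarded by an attribute lock. */
struct sketch_object {
	pthread_mutex_t attr_lock;
	uint64_t size;		/* protected by attr_lock */
};

struct sketch_attr {
	uint64_t size;
};

/*
 * Like cl_object_attr_get(): the caller must already hold
 * obj->attr_lock (documented here, not checked).
 */
static void sketch_attr_get(struct sketch_object *obj, struct sketch_attr *attr)
{
	attr->size = obj->size;
}

/* The pattern patch 31 applies in lov_getstripe(): lock, read, unlock. */
static uint64_t sketch_read_size(struct sketch_object *obj)
{
	struct sketch_attr attr = { .size = 0 };

	pthread_mutex_lock(&obj->attr_lock);
	sketch_attr_get(obj, &attr);
	pthread_mutex_unlock(&obj->attr_lock);
	return attr.size;
}
```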

* [lustre-devel] [PATCH v2 32/33] lustre: lov: use stripe_count instead of stripe_nr
  2019-01-06 22:13 [lustre-devel] [PATCH v2 00/33] lustre: add PFL support James Simmons
                   ` (30 preceding siblings ...)
  2019-01-06 22:14 ` [lustre-devel] [PATCH v2 31/33] lustre: lov: call cl_object_attr_get under cl_attr lock James Simmons
@ 2019-01-06 22:14 ` James Simmons
  2019-01-06 22:14 ` [lustre-devel] [PATCH v2 33/33] lustre: update version to 2.9.99 James Simmons
  32 siblings, 0 replies; 47+ messages in thread
From: James Simmons @ 2019-01-06 22:14 UTC (permalink / raw)
  To: lustre-devel

From: Andreas Dilger <adilger@whamcloud.com>

Replace the use of stripecnt in the code with
stripe_count to be consistent with the rest of the code.

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
WC-bug-id: https://jira.whamcloud.com/browse/LU-8653
Reviewed-on: https://review.whamcloud.com/26681
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Steve Guminski <stephenx.guminski@intel.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/lov/lov_internal.h |  2 +-
 drivers/staging/lustre/lustre/lov/lov_pack.c     | 12 ++++++------
 drivers/staging/lustre/lustre/lov/lov_request.c  |  4 ++--
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/staging/lustre/lustre/lov/lov_internal.h b/drivers/staging/lustre/lustre/lov/lov_internal.h
index 3878cad..2b31c99 100644
--- a/drivers/staging/lustre/lustre/lov/lov_internal.h
+++ b/drivers/staging/lustre/lustre/lov/lov_internal.h
@@ -271,7 +271,7 @@ int lov_prep_statfs_set(struct obd_device *obd, struct obd_info *oinfo,
 void lov_fix_desc_stripe_count(__u32 *val);
 void lov_fix_desc_pattern(__u32 *val);
 void lov_fix_desc_qos_maxage(__u32 *val);
-__u16 lov_get_stripecnt(struct lov_obd *lov, __u32 magic, __u16 stripe_count);
+u16 lov_get_stripe_count(struct lov_obd *lov, u32 magic, u16 stripe_count);
 int lov_connect_obd(struct obd_device *obd, __u32 index, int activate,
 		    struct obd_connect_data *data);
 int lov_setup(struct obd_device *obd, struct lustre_cfg *lcfg);
diff --git a/drivers/staging/lustre/lustre/lov/lov_pack.c b/drivers/staging/lustre/lustre/lov/lov_pack.c
index 089e556..0a6bb1e 100644
--- a/drivers/staging/lustre/lustre/lov/lov_pack.c
+++ b/drivers/staging/lustre/lustre/lov/lov_pack.c
@@ -192,7 +192,7 @@ ssize_t lov_lsm_pack(const struct lov_stripe_md *lsm, void *buf,
 	for (entry = 0; entry < lsm->lsm_entry_count; entry++) {
 		struct lov_stripe_md_entry *lsme;
 		struct lov_mds_md *lmm;
-		u16 stripecnt;
+		u16 stripe_count;
 
 		lsme = lsm->lsm_entries[entry];
 		lcme = &lcmv1->lcm_entries[entry];
@@ -227,11 +227,11 @@ ssize_t lov_lsm_pack(const struct lov_stripe_md *lsm, void *buf,
 
 		if (lsme_inited(lsme) &&
 		    !(lsme->lsme_pattern & LOV_PATTERN_F_RELEASED))
-			stripecnt = lsme->lsme_stripe_count;
+			stripe_count = lsme->lsme_stripe_count;
 		else
-			stripecnt = 0;
+			stripe_count = 0;
 
-		for (i = 0; i < stripecnt; i++) {
+		for (i = 0; i < stripe_count; i++) {
 			struct lov_oinfo *loi = lsme->lsme_oinfo[i];
 
 			ostid_cpu_to_le(&loi->loi_oi, &lmm_objects[i].l_ost_oi);
@@ -241,7 +241,7 @@ ssize_t lov_lsm_pack(const struct lov_stripe_md *lsm, void *buf,
 				cpu_to_le32(loi->loi_ost_idx);
 		}
 
-		size = lov_mds_md_size(stripecnt, lsme->lsme_magic);
+		size = lov_mds_md_size(stripe_count, lsme->lsme_magic);
 		lcme->lcme_size = cpu_to_le32(size);
 		offset += size;
 	} /* for each layout component */
@@ -250,7 +250,7 @@ ssize_t lov_lsm_pack(const struct lov_stripe_md *lsm, void *buf,
 }
 
 /* Find the max stripecount we should use */
-__u16 lov_get_stripecnt(struct lov_obd *lov, __u32 magic, __u16 stripe_count)
+u16 lov_get_stripe_count(struct lov_obd *lov, u32 magic, u16 stripe_count)
 {
 	__u32 max_stripes = LOV_MAX_STRIPE_COUNT_OLD;
 
diff --git a/drivers/staging/lustre/lustre/lov/lov_request.c b/drivers/staging/lustre/lustre/lov/lov_request.c
index 8ca13ed..d13e8d1 100644
--- a/drivers/staging/lustre/lustre/lov/lov_request.c
+++ b/drivers/staging/lustre/lustre/lov/lov_request.c
@@ -149,8 +149,8 @@ static int lov_fini_statfs(struct obd_device *obd, struct obd_statfs *osfs,
 			   int success)
 {
 	if (success) {
-		__u32 expected_stripes = lov_get_stripecnt(&obd->u.lov,
-							   LOV_MAGIC, 0);
+		u32 expected_stripes = lov_get_stripe_count(&obd->u.lov,
+							    LOV_MAGIC, 0);
 		if (osfs->os_files != LOV_U64_MAX)
 			lov_do_div64(osfs->os_files, expected_stripes);
 		if (osfs->os_ffree != LOV_U64_MAX)
-- 
1.8.3.1


* [lustre-devel] [PATCH v2 33/33] lustre: update version to 2.9.99
  2019-01-06 22:13 [lustre-devel] [PATCH v2 00/33] lustre: add PFL support James Simmons
                   ` (31 preceding siblings ...)
  2019-01-06 22:14 ` [lustre-devel] [PATCH v2 32/33] lustre: lov: use stripe_count instead of stripe_nr James Simmons
@ 2019-01-06 22:14 ` James Simmons
  2019-01-08  1:38   ` NeilBrown
  32 siblings, 1 reply; 47+ messages in thread
From: James Simmons @ 2019-01-06 22:14 UTC (permalink / raw)
  To: lustre-devel

With the majority of missing patches and features from the lustre
2.10 release merged upstream, it's time to update the upstream
client's version.

Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/include/uapi/linux/lustre/lustre_ver.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/lustre/include/uapi/linux/lustre/lustre_ver.h b/drivers/staging/lustre/include/uapi/linux/lustre/lustre_ver.h
index 1428fdd..e7a2eda 100644
--- a/drivers/staging/lustre/include/uapi/linux/lustre/lustre_ver.h
+++ b/drivers/staging/lustre/include/uapi/linux/lustre/lustre_ver.h
@@ -2,10 +2,10 @@
 #define _LUSTRE_VER_H_
 
 #define LUSTRE_MAJOR 2
-#define LUSTRE_MINOR 8
+#define LUSTRE_MINOR 9
 #define LUSTRE_PATCH 99
 #define LUSTRE_FIX 0
-#define LUSTRE_VERSION_STRING "2.8.99"
+#define LUSTRE_VERSION_STRING "2.9.99"
 
 #define OBD_OCD_VERSION(major, minor, patch, fix)			\
 	(((major) << 24) + ((minor) << 16) + ((patch) << 8) + (fix))
-- 
1.8.3.1

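OBD_OCD_VERSION() appears in the context of the patch above. It packs the four version fields into a single 32-bit value, one byte per field, so packed versions can be compared as plain integers. The small sketch below uses the macro exactly as it appears in lustre_ver.h. The ocd_* decoding helpers are hypothetical and exist only for illustration:

```c
#include <stdint.h>

/* As defined in lustre_ver.h: one byte each for major/minor/patch/fix. */
#define OBD_OCD_VERSION(major, minor, patch, fix)			\
	(((major) << 24) + ((minor) << 16) + ((patch) << 8) + (fix))

/* Hypothetical helpers that unpack the fields again. */
static unsigned int ocd_major(uint32_t v) { return (v >> 24) & 0xff; }
static unsigned int ocd_minor(uint32_t v) { return (v >> 16) & 0xff; }
static unsigned int ocd_patch(uint32_t v) { return (v >> 8) & 0xff; }
static unsigned int ocd_fix(uint32_t v)   { return v & 0xff; }
```

For example, 2.9.99 packs to 0x02096300. That value sorts above 2.8.99 (0x02086300) and below 2.10.0 (0x020a0000).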

* [lustre-devel] [PATCH v2 33/33] lustre: update version to 2.9.99
  2019-01-06 22:14 ` [lustre-devel] [PATCH v2 33/33] lustre: update version to 2.9.99 James Simmons
@ 2019-01-08  1:38   ` NeilBrown
  2019-01-08  4:26     ` James Simmons
  0 siblings, 1 reply; 47+ messages in thread
From: NeilBrown @ 2019-01-08  1:38 UTC (permalink / raw)
  To: lustre-devel

On Sun, Jan 06 2019, James Simmons wrote:

> With the majority of missing patches and features from the lustre
> 2.10 release merged upstream, it's time to update the upstream
> client's version.

:-)

Thanks to some of these patches (this batch or previous) I have fewer
failing tests now, though not many fewer.

The current summary is
     45             status: FAIL
    556             status: PASS
     47             status: SKIP

It used to be >50 FAIL.

The failing tests are listed below.
I know why the FID patches fail - we've discussed that.
Maybe it is time to start working out why some of the others are
failing.

Your two recent series are in my lustre-testing branch now - thanks.

NeilBrown


sanity: FAIL: test_27G 'testpool' is not empty
sanity: FAIL: test_56w /root/lustre-release/lustre/utils/lfs getstripe -c /mnt/lustre/d56w.sanity/file1 wrong: found 2, expected 1
sanity: FAIL: test_56x migrate failed rc = 11
sanity: FAIL: test_56xa migrate failed rc = 11
sanity: FAIL: test_56z /root/lustre-release/lustre/utils/lfs find did not continue after error
sanity: FAIL: test_56aa lfs find --size wrong under striped dir
sanity: FAIL: test_56ca create /mnt/lustre/d56ca.sanity/f56ca.sanity- failed
sanity: FAIL: test_64b oos.sh failed: 1
sanity: FAIL: test_102c setstripe failed
sanity: FAIL: test_102j file1-0-1: size  != 65536
sanity: FAIL: test_103a misc test failed
sanity: FAIL: test_104b lfs check servers test failed
sanity: FAIL: test_130a filefrag /mnt/lustre/f130a.sanity failed
sanity: FAIL: test_130b filefrag /mnt/lustre/f130b.sanity failed
sanity: FAIL: test_130c filefrag /mnt/lustre/f130c.sanity failed
sanity: FAIL: test_130e filefrag /mnt/lustre/f130e.sanity failed
sanity: FAIL: test_130f filefrag /mnt/lustre/f130f.sanity failed
sanity: FAIL: test_160a changelog 'f160a.sanity' fid  != file fid [0x240002342:0xd:0x0]
sanity: FAIL: test_161d cat failed
sanity: FAIL: test_205 No jobstats for id.205.mkdir.9480 found on mds1::*.lustre-MDT0000.job_stats
sanity: FAIL: test_208 get lease error
sanity: FAIL: test_225a mds-survey with zero-stripe failed
sanity: FAIL: test_225b mds-survey with stripe_count failed
sanity: FAIL: test_233a cannot access /mnt/lustre using its FID '[0x200000007:0x1:0x0]'
sanity: FAIL: test_233b cannot access /mnt/lustre/.lustre using its FID '[0x200000002:0x1:0x0]'
sanity: FAIL: test_255c Ladvise test10 failed, 255
sanity: FAIL: test_270a Can't create DoM layout
sanity: FAIL: test_270c bad pattern
sanity: FAIL: test_270e lfs find -L: found 1, expected 20
sanity: FAIL: test_270f Can't create file with 262144 DoM stripe
sanity: FAIL: test_271c Too few enqueues , expected > 2000
sanity: FAIL: test_271f expect 1 READ RPC,  occured
sanity: FAIL: test_300g create striped_dir failed
sanity: FAIL: test_300n create striped dir fails with gid=-1
sanity: FAIL: test_300q create d300q.sanity fails
sanity: FAIL: test_315 read is not accounted ()
sanity: FAIL: test_317 Expected Block 4096 got 10240 for f317.sanity
sanity: FAIL: test_405 One layout swap locked test failed
sanity: FAIL: test_406 mkdir d406.sanity failed
sanity: FAIL: test_409 Fail to cleanup the env!
sanity: FAIL: test_410 no inode match
sanity: FAIL: test_412 mkdir failed
sanity: FAIL: test_413 don't expect 1
sanity: FAIL: test_802 (5) Mount client with 'ro' should succeed


* [lustre-devel] [PATCH v2 33/33] lustre: update version to 2.9.99
  2019-01-08  1:38   ` NeilBrown
@ 2019-01-08  4:26     ` James Simmons
  2019-01-08 10:13       ` Andreas Dilger
  0 siblings, 1 reply; 47+ messages in thread
From: James Simmons @ 2019-01-08  4:26 UTC (permalink / raw)
  To: lustre-devel


> On Sun, Jan 06 2019, James Simmons wrote:
> 
> > With the majority of missing patches and features from the lustre
> > 2.10 release merged upstream, it's time to update the upstream
> > client's version.
> 
> :-)
> 
> Thanks to some of these patches (this batch or previous) I have fewer
> failing tests now, though not many fewer.
> 
> The current summary is
>      45             status: FAIL
>     556             status: PASS
>      47             status: SKIP
> 
> It used to be >50 FAIL.
> 
> The failing tests are listed below.
> I know why the FID patches fail - we've discussed that.
> Maybe it is time to start working out why some of the others are
> failing.

You are running a much newer test suite. Using the test suite from lustre 
2.10 I see the following failures.

sanity: FAIL: test_103a run_acl_subtest cp failed    (real failure)
sanity: FAIL: test_215 cannot read lnet.stats	     (not sysfs aware)
sanity: FAIL: test_233a cannot access /lustre/lustre using its FID '[0x200000007:0x1:0x0]'
sanity: FAIL: test_233b cannot access /lustre/lustre/.lustre using its FID '[0x200000002:0x1:0x0]'
sanity: FAIL: test_256 Changelog catalog has wrong number of slots 0  (fails for 2.10 LTS release as well)
 
> Your two recent series are in my lustre-testing branch now - thanks.
> 
> NeilBrown
> 
> 
> sanity: FAIL: test_27G 'testpool' is not empty 

See LU-11208. This test currently fails with older lustre versions.

> sanity: FAIL: test_56w /root/lustre-release/lustre/utils/lfs getstripe -c /mnt/lustre/d56w.sanity/file1 wrong: found 2, expected 1
> sanity: FAIL: test_56x migrate failed rc = 11
> sanity: FAIL: test_56xa migrate failed rc = 11
> sanity: FAIL: test_56z /root/lustre-release/lustre/utils/lfs find did not continue after error
> sanity: FAIL: test_56aa lfs find --size wrong under striped dir
> sanity: FAIL: test_56ca create /mnt/lustre/d56ca.sanity/f56ca.sanity- failed
> sanity: FAIL: test_64b oos.sh failed: 1
> sanity: FAIL: test_102c setstripe failed
> sanity: FAIL: test_102j file1-0-1: size  != 65536

I believe these are due to the DoM feature missing

> sanity: FAIL: test_103a misc test failed

103a is a real failure that has never been solved (LU-11594, and LU-10334 for Ubuntu).

> sanity: FAIL: test_104b lfs check servers test failed

sysfs bug. I have a patch for this.

> sanity: FAIL: test_130a filefrag /mnt/lustre/f130a.sanity failed
> sanity: FAIL: test_130b filefrag /mnt/lustre/f130b.sanity failed
> sanity: FAIL: test_130c filefrag /mnt/lustre/f130c.sanity failed
> sanity: FAIL: test_130e filefrag /mnt/lustre/f130e.sanity failed
> sanity: FAIL: test_130f filefrag /mnt/lustre/f130f.sanity failed

What version of e2fsprogs are you running? You need a 1.44 version and
this should go away.

> sanity: FAIL: test_160a changelog 'f160a.sanity' fid  != file fid [0x240002342:0xd:0x0]
> sanity: FAIL: test_161d cat failed

Might be missing some more changelog improvements.

> sanity: FAIL: test_205 No jobstats for id.205.mkdir.9480 found on mds1::*.lustre-MDT0000.job_stats

Strange?

> sanity: FAIL: test_208 get lease error
> sanity: FAIL: test_225a mds-survey with zero-stripe failed
> sanity: FAIL: test_225b mds-survey with stripe_count failed

I never ran that since it's not in 2.10.

> sanity: FAIL: test_233a cannot access /mnt/lustre using its FID '[0x200000007:0x1:0x0]'
> sanity: FAIL: test_233b cannot access /mnt/lustre/.lustre using its FID '[0x200000002:0x1:0x0]'

> sanity: FAIL: test_255c Ladvise test10 failed, 255
> sanity: FAIL: test_270a Can't create DoM layout
> sanity: FAIL: test_270c bad pattern
> sanity: FAIL: test_270e lfs find -L: found 1, expected 20
> sanity: FAIL: test_270f Can't create file with 262144 DoM stripe
> sanity: FAIL: test_271c Too few enqueues , expected > 2000
> sanity: FAIL: test_271f expect 1 READ RPC,  occured
> sanity: FAIL: test_300g create striped_dir failed
> sanity: FAIL: test_300n create striped dir fails with gid=-1
> sanity: FAIL: test_300q create d300q.sanity fails
> sanity: FAIL: test_315 read is not accounted ()
> sanity: FAIL: test_317 Expected Block 4096 got 10240 for f317.sanity
> sanity: FAIL: test_405 One layout swap locked test failed
> sanity: FAIL: test_406 mkdir d406.sanity failed
> sanity: FAIL: test_409 Fail to cleanup the env!
 
More DoM issues? Could be FLR as well if you are running the latest
test suite.

> sanity: FAIL: test_410 no inode match

This is a weird test running a local kernel module.

> sanity: FAIL: test_412 mkdir failed
> sanity: FAIL: test_413 don't expect 1

More DoM? I have to look at this.

> sanity: FAIL: test_802 (5) Mount client with 'ro' should succeed

This test is broken. It assumes you have a specially patched kernel.
Details are under ticket LU-684.

The nice thing with the Linux client is that we are at a point where
it wouldn't be a huge leap to integrate DoM (Data on MetaData).
The reason I suggested cleanups and moving out of staging first was
to preserve git blame a bit better with future patches. Currently
we see a lot of "0846e85ba2346 (NeilBrown 2018-06-07" with git blame.


* [lustre-devel] [PATCH v2 33/33] lustre: update version to 2.9.99
  2019-01-08  4:26     ` James Simmons
@ 2019-01-08 10:13       ` Andreas Dilger
  2019-01-08 21:47         ` James Simmons
  2019-01-10  0:40         ` NeilBrown
  0 siblings, 2 replies; 47+ messages in thread
From: Andreas Dilger @ 2019-01-08 10:13 UTC (permalink / raw)
  To: lustre-devel

On Jan 7, 2019, at 21:26, James Simmons <jsimmons@infradead.org> wrote:
> 
>> 
>> On Sun, Jan 06 2019, James Simmons wrote:
>> 
>>> With the majority of missing patches and features from the lustre
>>> 2.10 release merged upstream, it's time to update the upstream
>>> client's version.
>> 
>> :-)
>> 
>> Thanks to some of these patches (this batch or previous) I have fewer
>> failing tests now, though not many fewer.
>> 
>> The current summary is
>>     45             status: FAIL
>>    556             status: PASS
>>     47             status: SKIP
>> 
>> It used to be >50 FAIL.
>> 
>> The failing tests are listed below.
>> I know why the FID patches fail - we've discussed that.
>> Maybe it is time to start working out why some of the others are
>> failing.
> 
> You are running a much newer test suite. Using the test suite from lustre 
> 2.10 I see the following failures.
> 
> sanity: FAIL: test_103a run_acl_subtest cp failed    (real failure)
> sanity: FAIL: test_215 cannot read lnet.stats	     (not sysfs aware)
> sanity: FAIL: test_233a cannot access /lustre/lustre using its FID '[0x200000007:0x1:0x0]'
> sanity: FAIL: test_233b cannot access /lustre/lustre/.lustre using its FID '[0x200000002:0x1:0x0]'
> sanity: FAIL: test_256 Changelog catalog has wrong number of slots 0  (fails for 2.10 LTS release as well)

Yes, there are definitely some tests that do not have proper client/server
version/feature checks, since the tests are introduced with the code they
are testing.  There are a number of patches in Gerrit that add the proper
checks, and I'd like to get them landed, because we do run client/server
version interop testing, but those checks always lag a bit behind and we
never see test-script/client version issues in our own testing.

>> Your two recent series are in my lustre-testing branch now - thanks.
>> 
>> NeilBrown
>> 
>> 
>> sanity: FAIL: test_27G 'testpool' is not empty 
> 
> See LU-11208. This test currently fails with older lustre versions.
> 
>> sanity: FAIL: test_56w /root/lustre-release/lustre/utils/lfs getstripe -c /mnt/lustre/d56w.sanity/file1 wrong: found 2, expected 1
>> sanity: FAIL: test_56x migrate failed rc = 11
>> sanity: FAIL: test_56xa migrate failed rc = 11
>> sanity: FAIL: test_56z /root/lustre-release/lustre/utils/lfs find did not continue after error
>> sanity: FAIL: test_56aa lfs find --size wrong under striped dir
>> sanity: FAIL: test_56ca create /mnt/lustre/d56ca.sanity/f56ca.sanity- failed
>> sanity: FAIL: test_64b oos.sh failed: 1
>> sanity: FAIL: test_102c setstripe failed
>> sanity: FAIL: test_102j file1-0-1: size  != 65536
> 
> I believe these are due to the DoM feature missing
> 
>> sanity: FAIL: test_103a misc test failed
> 
> 103a is a real failure that has never been solved (LU-11594, and LU-10334 for Ubuntu).
> 
>> sanity: FAIL: test_104b lfs check servers test failed
> 
> sysfs bug. I have a patch for this.
> 
>> sanity: FAIL: test_130a filefrag /mnt/lustre/f130a.sanity failed
>> sanity: FAIL: test_130b filefrag /mnt/lustre/f130b.sanity failed
>> sanity: FAIL: test_130c filefrag /mnt/lustre/f130c.sanity failed
>> sanity: FAIL: test_130e filefrag /mnt/lustre/f130e.sanity failed
>> sanity: FAIL: test_130f filefrag /mnt/lustre/f130f.sanity failed
> 
> What version of e2fsprogs are you running? You need a 1.44 version and
> this should go away.

To be clear - the Lustre-patched "filefrag" at:

https://downloads.whamcloud.com/public/e2fsprogs/1.44.3.wc1/

Once Lustre gets into upstream, or once we convince another filesystem to
use the Lustre filefrag extension (multiple devices, which Btrfs and XFS
could use), we can get the support landed into upstream e2fsprogs.

>> sanity: FAIL: test_160a changelog 'f160a.sanity' fid  != file fid [0x240002342:0xd:0x0]
>> sanity: FAIL: test_161d cat failed
> 
> Might be missing some more changelog improvements.
> 
>> sanity: FAIL: test_205 No jobstats for id.205.mkdir.9480 found on mds1::*.lustre-MDT0000.job_stats
> 
> Strange?

This might be because upstream Lustre doesn't allow setting a per-process
JobID via an environment variable, only a single per-node value.  The really
unfortunate part is that "get JobID from environment" actually works on
every reasonable architecture (even the one that was originally broken has
fixed it), but it got yanked anyway.  This is one of the features of Lustre
that lots of HPC sites like to use, since it allows them to track on the
servers which users/jobs/processes on the clients are doing IO.

>> sanity: FAIL: test_208 get lease error
>> sanity: FAIL: test_225a mds-survey with zero-stripe failed
>> sanity: FAIL: test_225b mds-survey with stripe_count failed
> 
> I never ran that since it's not in 2.10.
> 
>> sanity: FAIL: test_233a cannot access /mnt/lustre using its FID '[0x200000007:0x1:0x0]'
>> sanity: FAIL: test_233b cannot access /mnt/lustre/.lustre using its FID '[0x200000002:0x1:0x0]'
> 
>> sanity: FAIL: test_255c Ladvise test10 failed, 255
>> sanity: FAIL: test_270a Can't create DoM layout
>> sanity: FAIL: test_270c bad pattern
>> sanity: FAIL: test_270e lfs find -L: found 1, expected 20
>> sanity: FAIL: test_270f Can't create file with 262144 DoM stripe
>> sanity: FAIL: test_271c Too few enqueues , expected > 2000
>> sanity: FAIL: test_271f expect 1 READ RPC,  occured
>> sanity: FAIL: test_300g create striped_dir failed
>> sanity: FAIL: test_300n create striped dir fails with gid=-1
>> sanity: FAIL: test_300q create d300q.sanity fails
>> sanity: FAIL: test_315 read is not accounted ()
>> sanity: FAIL: test_317 Expected Block 4096 got 10240 for f317.sanity
>> sanity: FAIL: test_405 One layout swap locked test failed
>> sanity: FAIL: test_406 mkdir d406.sanity failed
>> sanity: FAIL: test_409 Fail to cleanup the env!
> 
> More DoM issues? Could be FLR as well if you are running the latest
> test suite.
> 
>> sanity: FAIL: test_410 no inode match
> 
> This is a weird test running a local kernel module.
> 
>> sanity: FAIL: test_412 mkdir failed
>> sanity: FAIL: test_413 don't expect 1
> 
> More DoM? I have to look at this.
> 
>> sanity: FAIL: test_802 (5) Mount client with 'ro' should succeed
> 
> This test is broken. It assumes you have a specially patched kernel.
> Details are under ticket LU-684.
> 
> The nice thing with the Linux client is that we are at a point where
> it wouldn't be a huge leap to integrate DoM (Data on MetaData).
> The reason I suggested cleanups and moving out of staging first was
> to preserve git blame a bit better with future patches. Currently
> we see a lot of "0846e85ba2346 (NeilBrown 2018-06-07" with git blame.

Cheers, Andreas
---
Andreas Dilger
CTO Whamcloud

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [lustre-devel] [PATCH v2 33/33] lustre: update version to 2.9.99
  2019-01-08 10:13       ` Andreas Dilger
@ 2019-01-08 21:47         ` James Simmons
  2019-01-09  0:15           ` Andreas Dilger
  2019-01-10  1:46           ` NeilBrown
  2019-01-10  0:40         ` NeilBrown
  1 sibling, 2 replies; 47+ messages in thread
From: James Simmons @ 2019-01-08 21:47 UTC (permalink / raw)
  To: lustre-devel


> >> sanity: FAIL: test_104b lfs check servers test failed
> > 
> > sysfs bug. I have a patch for this.
> > 
> >> sanity: FAIL: test_130a filefrag /mnt/lustre/f130a.sanity failed
> >> sanity: FAIL: test_130b filefrag /mnt/lustre/f130b.sanity failed
> >> sanity: FAIL: test_130c filefrag /mnt/lustre/f130c.sanity failed
> >> sanity: FAIL: test_130e filefrag /mnt/lustre/f130e.sanity failed
> >> sanity: FAIL: test_130f filefrag /mnt/lustre/f130f.sanity failed
> > 
> > What version of e2fsprogs are you running? You need a 1.44 version and
> > this should go away.
> 
> To be clear - the Lustre-patched "filefrag" at:
> 
> https://downloads.whamcloud.com/public/e2fsprogs/1.44.3.wc1/
> 
> Once Lustre gets into upstream, or we convince another filesystem to use the
> Lustre filefrag extension (multiple devices, which BtrFS and XFS could
> use) we can get the support landed into the upstream e2fsprogs.

I could have sworn that Ubuntu 18 testing passed with the default e2fsprogs (1.44.4).
For Neil's benefit: this is why lustre_fiemap.h exists in the uapi headers
directory. This kind of functionality would help the community at large.
 
> >> sanity: FAIL: test_160a changelog 'f160a.sanity' fid  != file fid [0x240002342:0xd:0x0]
> >> sanity: FAIL: test_161d cat failed
> > 
> > Might be missing some more changelog improvements.
> > 
> >> sanity: FAIL: test_205 No jobstats for id.205.mkdir.9480 found on mds1::*.lustre-MDT0000.job_stats
> > 
> > Strange?
> 
> This might be because the upstream Lustre doesn't allow setting per-process
> JobID via environment variable, only as a single per-node value.  The real
> unfortunate part is that the "get JobID from environment" actually works for
> every reasonable architecture (even the one which was originally broken
> fixed it), but it got yanked anyway.  This is actually one of the features
> of Lustre that lots of HPC sites like to use, since it allows them to track
> on the servers which users/jobs/processes on the client are doing IO.

To give background for Neil, see this thread:

https://lore.kernel.org/patchwork/patch/416846

In this case I do agree with Greg. The latest jobid code does implement an
upcall, and upcalls don't play nice with containers. There is also the
namespace issue that was pointed out. I think the namespace issue might be fixed
in the latest OpenSFS code. The whole approach to stats in Lustre is
pretty awful. Take jobstats for example. Currently the approach is
to poll inside the kernel at specific intervals. Part of the polling is
scanning the running processes' environment space. On top of this the
administrator ends up creating scripts to poll the proc / debugfs entry.
Other types of Lustre stat files take a similar approach. Scripts have
to poll debugfs / procfs entries.

I have been thinking about what would be a better approach, since I'd like to
tackle this problem in the 2.13 time frame. Our admins at my work
place want to be able to collect application stats without being root.
So placing stats in debugfs, which is what we currently do in the Linux
client, is not an option :-( The stats are not a good fit for sysfs. The solution
I have been pondering is using netlink. Since netlink is socket based it
can be treated as a pipe. Now you are thinking, well, you still need to poll
on the netlink socket, but you don't have to. systemd does it for you :-)
We can create a systemd unit file which uses

ListenNetlink=generic "multicast group" ...

to launch a service to collect the stats. For job stats we use another
type of netlink class called the process connector. The link below covers this
little-known feature. It is available on every system in the support
matrix.

https://www.slideshare.net/kerneltlv/kernel-proc-connector-and-containers

In this case we parse out the job scheduler IDs and pass them to Lustre
using systemd.
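For readers unfamiliar with the process connector, a minimal C sketch of subscribing to it is shown here, assuming only the standard uapi headers <linux/netlink.h>, <linux/connector.h> and <linux/cn_proc.h>. It just prints exec() events; parsing job scheduler IDs and handing them to Lustre are not shown, the function names are hypothetical, and opening the socket normally needs CAP_NET_ADMIN.

```c
#include <linux/cn_proc.h>
#include <linux/connector.h>
#include <linux/netlink.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Fill buf with a netlink message asking the process connector to start
 * (PROC_CN_MCAST_LISTEN) or stop (PROC_CN_MCAST_IGNORE) sending events.
 * Returns the message length, or 0 if buf is too small. */
size_t build_proc_cn_request(char *buf, size_t buflen, __u32 portid,
			     enum proc_cn_mcast_op op)
{
	size_t total = NLMSG_LENGTH(sizeof(struct cn_msg) + sizeof(op));
	struct nlmsghdr *nlh = (struct nlmsghdr *)buf;
	struct cn_msg *cn;

	if (buflen < total)
		return 0;
	memset(buf, 0, total);
	nlh->nlmsg_len = total;
	nlh->nlmsg_type = NLMSG_DONE;
	nlh->nlmsg_pid = portid;

	/* Connector header addressed to the proc connector, then the op. */
	cn = NLMSG_DATA(nlh);
	cn->id.idx = CN_IDX_PROC;
	cn->id.val = CN_VAL_PROC;
	cn->len = sizeof(op);
	memcpy(cn->data, &op, sizeof(op));
	return total;
}

/* Subscribe and print exec() events until an error occurs; returns -1. */
int watch_exec_events(void)
{
	char buf[4096] __attribute__((aligned(NLMSG_ALIGNTO)));
	struct sockaddr_nl sa = {
		.nl_family = AF_NETLINK,
		.nl_groups = CN_IDX_PROC,
		.nl_pid = getpid(),
	};
	size_t n;
	int sk = socket(AF_NETLINK, SOCK_DGRAM, NETLINK_CONNECTOR);

	if (sk < 0)
		return -1;
	if (bind(sk, (struct sockaddr *)&sa, sizeof(sa)) < 0)
		goto out;
	n = build_proc_cn_request(buf, sizeof(buf), getpid(),
				  PROC_CN_MCAST_LISTEN);
	if (send(sk, buf, n, 0) < 0)
		goto out;

	for (;;) {
		ssize_t len = recv(sk, buf, sizeof(buf), 0);
		struct nlmsghdr *nlh = (struct nlmsghdr *)buf;

		if (len <= 0)
			break;
		for (; NLMSG_OK(nlh, len); nlh = NLMSG_NEXT(nlh, len)) {
			struct cn_msg *cn = NLMSG_DATA(nlh);
			struct proc_event ev;
			size_t sz = cn->len < sizeof(ev) ? cn->len : sizeof(ev);

			/* Copy out to avoid unaligned access to the payload. */
			memset(&ev, 0, sizeof(ev));
			memcpy(&ev, cn->data, sz);
			if (ev.what == PROC_EVENT_EXEC)
				printf("exec: pid %d\n",
				       ev.event_data.exec.process_pid);
		}
	}
out:
	close(sk);
	return -1;
}
```

A collector built this way is woken by the kernel for each event instead of rescanning every process environment on a timer, which is the property the netlink idea relies on.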

* [lustre-devel] [PATCH v2 33/33] lustre: update version to 2.9.99
  2019-01-08 21:47         ` James Simmons
@ 2019-01-09  0:15           ` Andreas Dilger
  2019-01-09 18:28             ` James Simmons
  2019-01-10  1:46           ` NeilBrown
  1 sibling, 1 reply; 47+ messages in thread
From: Andreas Dilger @ 2019-01-09  0:15 UTC (permalink / raw)
  To: lustre-devel

On Jan 8, 2019, at 14:47, James Simmons <jsimmons@infradead.org> wrote:
> 
>>>> sanity: FAIL: test_104b lfs check servers test failed
>>> 
>>> sysfs bug. I have a patch for this.
>>> 
>>>> sanity: FAIL: test_130a filefrag /mnt/lustre/f130a.sanity failed
>>>> sanity: FAIL: test_130b filefrag /mnt/lustre/f130b.sanity failed
>>>> sanity: FAIL: test_130c filefrag /mnt/lustre/f130c.sanity failed
>>>> sanity: FAIL: test_130e filefrag /mnt/lustre/f130e.sanity failed
>>>> sanity: FAIL: test_130f filefrag /mnt/lustre/f130f.sanity failed
>>> 
>>> What version of e2fsprogs are you running? You need a 1.44 version and
>>> this should go away.
>> 
>> To be clear - the Lustre-patched "filefrag" at:
>> 
>> https://downloads.whamcloud.com/public/e2fsprogs/1.44.3.wc1/
>> 
>> Once Lustre gets into upstream, or we convince another filesystem to use the
>> Lustre filefrag extension (multiple devices, which BtrFS and XFS could
>> use) we can get the support landed into the upstream e2fsprogs.
> 
> I could have sworn that Ubuntu 18 testing passed with the default e2fsprogs (1.44.4).
> For Neil's benefit: this is why lustre_fiemap.h exists in the uapi headers
> directory. This kind of functionality would help the community at large.

The returned data is identical for single-striped files, so the vanilla
filefrag will work on Lustre in the common case.

>>>> sanity: FAIL: test_160a changelog 'f160a.sanity' fid  != file fid [0x240002342:0xd:0x0]
>>>> sanity: FAIL: test_161d cat failed
>>> 
>>> Might be missing some more changelog improvements.
>>> 
>>>> sanity: FAIL: test_205 No jobstats for id.205.mkdir.9480 found on mds1::*.lustre-MDT0000.job_stats
>>> 
>>> Strange?
>> 
>> This might be because the upstream Lustre doesn't allow setting per-process
>> JobID via environment variable, only as a single per-node value.  The real
>> unfortunate part is that the "get JobID from environment" actually works for
>> every reasonable architecture (even the one which was originally broken
>> fixed it), but it got yanked anyway.  This is actually one of the features
>> of Lustre that lots of HPC sites like to use, since it allows them to track
>> on the servers which users/jobs/processes on the client are doing IO.
> 
> To give background for Neil, see this thread:
> 
> https://lore.kernel.org/patchwork/patch/416846
> 
> In this case I do agree with Greg. The latest jobid code does implement an
> upcall, and upcalls don't play nice with containers. There is also the
> namespace issue that was pointed out. I think the namespace issue might be fixed
> in the latest OpenSFS code.

I'm not sure what you mean?  AFAIK, there is no upcall for JobID, except
maybe in the kernel client where we weren't allowed to parse the process
environment directly.  I agree an upcall is problematic with namespaces,
in addition to being less functional (only a JobID per node instead of
per process), which is why direct access to JOBENV is better IMHO.

> The whole approach to stats in Lustre is
> pretty awful. Take jobstats for example. Currently the approach is
> to poll inside the kernel at specific intervals. Part of the polling is
> scanning the running processes' environment space. On top of this the
> administrator ends up creating scripts to poll the proc / debugfs entry.
> Other types of Lustre stat files take a similar approach. Scripts have
> to poll debugfs / procfs entries.

I think that issue is orthogonal to getting the actual JobID.  That is
the stats collection from the kernel.  We shouldn't be inventing a new
way to process that.  What does "top" do?  Read a thousand /proc files
every second because that is flexible for different use cases.  There
are far fewer Lustre stats files on a given node, and I haven't heard
that the actual stats reading interface is a performance issue.

> I have been thinking about what would be a better approach, since I'd like to
> tackle this problem in the 2.13 time frame. Our admins at my work
> place want to be able to collect application stats without being root.
> So placing stats in debugfs, which is what we currently do in the Linux
> client, is not an option :-( The stats are not a good fit for sysfs. The solution
> I have been pondering is using netlink. Since netlink is socket based it
> can be treated as a pipe. Now you are thinking, well, you still need to poll
> on the netlink socket, but you don't have to. systemd does it for you :-)
> We can create a systemd unit file which uses

For the love of all that is holy, do not make Lustre stats usage depend
on Systemd to be usable.

Cheers, Andreas
---
Andreas Dilger
Principal Lustre Architect
Whamcloud

* [lustre-devel] [PATCH v2 33/33] lustre: update version to 2.9.99
  2019-01-09  0:15           ` Andreas Dilger
@ 2019-01-09 18:28             ` James Simmons
  2019-01-09 23:16               ` Andreas Dilger
  0 siblings, 1 reply; 47+ messages in thread
From: James Simmons @ 2019-01-09 18:28 UTC (permalink / raw)
  To: lustre-devel


> >> This might be because the upstream Lustre doesn't allow setting per-process
> >> JobID via environment variable, only as a single per-node value.  The real
> >> unfortunate part is that the "get JobID from environment" actually works for
> >> every reasonable architecture (even the one which was originally broken
> >> fixed it), but it got yanked anyway.  This is actually one of the features
> >> of Lustre that lots of HPC sites like to use, since it allows them to track
> >> on the servers which users/jobs/processes on the client are doing IO.
> > 
> > To give background for Neil, see this thread:
> > 
> > https://lore.kernel.org/patchwork/patch/416846
> > 
> > In this case I do agree with Greg. The latest jobid code does implement an
> > upcall, and upcalls don't play nice with containers. There is also the
> > namespace issue that was pointed out. I think the namespace issue might be fixed
> > in the latest OpenSFS code.
> 
> I'm not sure what you mean?  AFAIK, there is no upcall for JobID, except
> maybe in the kernel client where we weren't allowed to parse the process
> environment directly.  I agree an upcall is problematic with namespaces,
> in addition to being less functional (only a JobID per node instead of
> per process), which is why direct access to JOBENV is better IMHO.

I have some evil ideas about this. Need to think about it some more since
this is a more complex problem.
 
> > The whole approach to stats in Lustre is
> > pretty awful. Take jobstats for example. Currently the approach is
> > to poll inside the kernel at specific intervals. Part of the polling is
> > scanning the running processes' environment space. On top of this the
> > administrator ends up creating scripts to poll the proc / debugfs entry.
> > Other types of Lustre stat files take a similar approach. Scripts have
> > to poll debugfs / procfs entries.
> 
> I think that issue is orthogonal to getting the actual JobID.  That is
> the stats collection from the kernel.  We shouldn't be inventing a new
> way to process that.  What does "top" do?  Read a thousand /proc files
> every second because that is flexible for different use cases.  There
> are far fewer Lustre stats files on a given node, and I haven't heard
> that the actual stats reading interface is a performance issue.

Because the policy for the Linux kernel is not to add non-process-related
information to procfs anymore. "top" reads process information
from procfs, which is okay. This means the stats Lustre generates are
required to be placed in debugfs. The problem there is that you need to be
root to access this information. I told the administrator about this
and they told me there is no way they will run an application as root just
to read stats. We really don't want to require users to mount their
debugfs partitions to allow non-root users to access it. So I looked
into alternatives. Actually, with netlink you have far more power for
handling stats than polling some proc file. Also, while in most cases the
stat files are not huge, if we do end up having a stat
seq_file with a huge amount of data then polling that file can really
spike the load on a node.

> > I have been thinking about what would be a better approach, since I'd like to
> > tackle this problem in the 2.13 time frame. Our admins at my work
> > place want to be able to collect application stats without being root.
> > So placing stats in debugfs, which is what we currently do in the Linux
> > client, is not an option :-( The stats are not a good fit for sysfs. The solution
> > I have been pondering is using netlink. Since netlink is socket based it
> > can be treated as a pipe. Now you are thinking, well, you still need to poll
> > on the netlink socket, but you don't have to. systemd does it for you :-)
> > We can create a systemd unit file which uses
> 
> For the love of all that is holy, do not make Lustre stats usage depend
> on Systemd to be usable.

I never write code that locks in one approach. Take for example the
lctl conf_param / set_param -P handling with the move to sysfs. Instead
of the old upcall method to lctl, we now have a udev rule. That rule is not
law!!! A site could create their own udev rule if they want to, say, log
changes to the Lustre tunables. Keep in mind udev rules need to be simple
since they block until completed, much like upcalls do. If you want to
run a heavy application you can create a systemd service to handle the
tunable uevent. If you are really clever you can use dbus to send
the tuning event to external nodes. There are many creative options.

The same is true with the stats netlink approach. I was up late last night
pondering a design for the netlink stats. I have to put together a list
of my ideas and run it by my admins. So no, systemd is not a hard
requirement, just an option for people into that sort of thing. Using
udev and netlink opens up a whole new stack to take advantage of.

* [lustre-devel] [PATCH v2 33/33] lustre: update version to 2.9.99
  2019-01-09 18:28             ` James Simmons
@ 2019-01-09 23:16               ` Andreas Dilger
  2019-01-10  1:36                 ` NeilBrown
  0 siblings, 1 reply; 47+ messages in thread
From: Andreas Dilger @ 2019-01-09 23:16 UTC (permalink / raw)
  To: lustre-devel

On Jan 9, 2019, at 11:28, James Simmons <jsimmons@infradead.org> wrote:
> 
> 
>>>> This might be because the upstream Lustre doesn't allow setting per-process
>>>> JobID via environment variable, only as a single per-node value.  The real
>>>> unfortunate part is that the "get JobID from environment" actually works for
>>>> every reasonable architecture (even the one which was originally broken
>>>> fixed it), but it got yanked anyway.  This is actually one of the features
>>>> of Lustre that lots of HPC sites like to use, since it allows them to track
>>>> on the servers which users/jobs/processes on the client are doing IO.
>>> 
>>> To give background for Neil, see this thread:
>>> 
>>> https://lore.kernel.org/patchwork/patch/416846
>>> 
>>> In this case I do agree with Greg. The latest jobid code does implement an
>>> upcall, and upcalls don't play nice with containers. There is also the
>>> namespace issue that was pointed out. I think the namespace issue might be fixed
>>> in the latest OpenSFS code.
>> 
>> I'm not sure what you mean?  AFAIK, there is no upcall for JobID, except
>> maybe in the kernel client where we weren't allowed to parse the process
>> environment directly.  I agree an upcall is problematic with namespaces,
>> in addition to being less functional (only a JobID per node instead of
>> per process), which is why direct access to JOBENV is better IMHO.
> 
> I have some evil ideas about this. Need to think about it some more since
> this is a more complex problem.

Since the kernel manages the environment variables via getenv() and setenv(),
I honestly don't see why accessing them directly is a huge issue?

>>> The whole approach to stats in Lustre is
>>> pretty awful. Take jobstats for example. Currently the approach is
>>> to poll inside the kernel at specific intervals. Part of the polling is
>>> scanning the running processes' environment space. On top of this the
>>> administrator ends up creating scripts to poll the proc / debugfs entry.
>>> Other types of Lustre stat files take a similar approach. Scripts have
>>> to poll debugfs / procfs entries.
>> 
>> I think that issue is orthogonal to getting the actual JobID.  That is
>> the stats collection from the kernel.  We shouldn't be inventing a new
>> way to process that.  What does "top" do?  Read a thousand /proc files
>> every second because that is flexible for different use cases.  There
>> are far fewer Lustre stats files on a given node, and I haven't heard
>> that the actual stats reading interface is a performance issue.
> 
> Because the policy for the Linux kernel is not to add non-process-related
> information to procfs anymore. "top" reads process information
> from procfs, which is okay.

The location of the stats (procfs vs. sysfs vs. debugfs) wasn't my point.
My point was that a *very* core kernel performance monitoring utility is
doing open/read/close from virtual kernel files, so we should think twice
before we go ahead and invent our own performance monitoring framework
(which may be frowned upon by upstream for arbitrary reasons because it
isn't using /proc or /sys files).

> This means the stats Lustre generates are
> required to be placed in debugfs. The problem there is that you need to be
> root to access this information.

That is a self-inflicted problem because of upstream kernel policy to
move the existing files out of /proc and being unable to use /sys either.

> I told the administrator about this
> and they told me there is no way they will run an application as root just
> to read stats. We really don't want to require users to mount their
> debugfs partitions to allow non-root users to access it. So I looked
> into alternatives. Actually, with netlink you have far more power for
> handling stats than polling some proc file. Also, while in most cases the
> stat files are not huge, if we do end up having a stat
> seq_file with a huge amount of data then polling that file can really
> spike the load on a node.

I agree it is not ideal.  One option (AFAIK) would be a udev rule that
changes the /sys/kernel/debug/lustre/* files to be readable by a
non-root group (e.g. admin or perftools or whatever) for the collector.

>>> I have been thinking about what would be a better approach, since I'd like to
>>> tackle this problem in the 2.13 time frame. Our admins at my work
>>> place want to be able to collect application stats without being root.
>>> So placing stats in debugfs, which is what we currently do in the Linux
>>> client, is not an option :-( The stats are not a good fit for sysfs. The solution
>>> I have been pondering is using netlink. Since netlink is socket based it
>>> can be treated as a pipe. Now you are thinking, well, you still need to poll
>>> on the netlink socket, but you don't have to. systemd does it for you :-)
>>> We can create a systemd unit file which uses
>> 
>> For the love of all that is holy, do not make Lustre stats usage depend
>> on Systemd to be usable.
> 
> I never write code that locks in one approach. Take for example the
> lctl conf_param / set_param -P handling with the move to sysfs. Instead
> of the old upcall method to lctl, we now have a udev rule. That rule is not
> law!!! A site could create their own udev rule if they want to, say, log
> changes to the Lustre tunables. Keep in mind udev rules need to be simple
> since they block until completed, much like upcalls do. If you want to
> run a heavy application you can create a systemd service to handle the
> tunable uevent. If you are really clever you can use dbus to send
> the tuning event to external nodes. There are many creative options.
> 
> The same is true with the stats netlink approach. I was up late last night
> pondering a design for the netlink stats. I have to put together a list
> of my ideas and run it by my admins. So no, systemd is not a hard
> requirement, just an option for people into that sort of thing. Using
> udev and netlink opens up a whole new stack to take advantage of.

Sorry to be negative, but I was just having fun with systemd over the
weekend on one of my home systems, and I really don't want to entangle
it into our stats.  If the existing procfs/sysfs/debugfs scraping will
continue to work in the future then I'm fine with that.

Cheers, Andreas
---
Andreas Dilger
Principal Lustre Architect
Whamcloud

* [lustre-devel] [PATCH v2 33/33] lustre: update version to 2.9.99
  2019-01-08 10:13       ` Andreas Dilger
  2019-01-08 21:47         ` James Simmons
@ 2019-01-10  0:40         ` NeilBrown
  2019-01-10  7:28           ` Andreas Dilger
  1 sibling, 1 reply; 47+ messages in thread
From: NeilBrown @ 2019-01-10  0:40 UTC (permalink / raw)
  To: lustre-devel

On Tue, Jan 08 2019, Andreas Dilger wrote:

> On Jan 7, 2019, at 21:26, James Simmons <jsimmons@infradead.org> wrote:
>> 
>>> 
>>> On Sun, Jan 06 2019, James Simmons wrote:
>>> 
>>>> With the majority of missing patches and features from the lustre
>>>> 2.10 release merged upstream its time to update the upstream
>>>> client's version.
>>> 
>>> :-)
>>> 
>>> Thanks to some of these patches (this batch or previous) I have fewer
>>> failing tests now, though not many fewer.
>>> 
>>> The current summary is
>>>     45             status: FAIL
>>>    556             status: PASS
>>>     47             status: SKIP
>>> 
>>> It used to be >50 FAIL.
>>> 
>>> The failing tests are listed below.
>>> I know why the FID patches fail - we've discussed that.
>>> Maybe it is time to start working out why some of the others are
>>> failing.
>> 
>> You are running a much newer test suite. Using the test suite from lustre 
>> 2.10 I see the following failures.
>> 
>> sanity: FAIL: test_103a run_acl_subtest cp failed    (real failure)
>> sanity: FAIL: test_215 cannot read lnet.stats	     (not sysfs aware)
>> sanity: FAIL: test_233a cannot access /lustre/lustre using its FID '[0x200000007:0x1:0x0]'
>> sanity: FAIL: test_233b cannot access /lustre/lustre/.lustre using its FID '[0x200000002:0x1:0x0]'
>> sanity: FAIL: test_256 Changelog catalog has wrong number of slots 0  (fails for 2.10 LTS release as well)
>
> Yes, there are definitely some tests that do not have proper client/server version/feature checks, since the tests are introduced with the code they
> are testing.  There are a number of patches in Gerrit that are adding the
> proper checks that I'd like to get landed, because we do run client/server
> version interop testing, but they always lag a bit behind and we never see
> test-script/client version issues in our testing. 
>
>>> Your two recent series are in my lustre-testing branch now - thanks.
>>> 
>>> NeilBrown
>>> 
>>> 
>>> sanity: FAIL: test_27G 'testpool' is not empty 
>> 
>> See LU-11208. The test currently fails with older Lustre versions.
>> 
>>> sanity: FAIL: test_56w /root/lustre-release/lustre/utils/lfs getstripe -c /mnt/lustre/d56w.sanity/file1 wrong: found 2, expected 1
>>> sanity: FAIL: test_56x migrate failed rc = 11
>>> sanity: FAIL: test_56xa migrate failed rc = 11
>>> sanity: FAIL: test_56z /root/lustre-release/lustre/utils/lfs find did not continue after error
>>> sanity: FAIL: test_56aa lfs find --size wrong under striped dir
>>> sanity: FAIL: test_56ca create /mnt/lustre/d56ca.sanity/f56ca.sanity- failed
>>> sanity: FAIL: test_64b oos.sh failed: 1
>>> sanity: FAIL: test_102c setstripe failed
>>> sanity: FAIL: test_102j file1-0-1: size  != 65536
>> 
>> I believe these are due to the DoM feature missing
>> 
>>> sanity: FAIL: test_103a misc test failed
>> 
>> 103a is a real failure that has never been solved. (LU-11594 and LU-10334 for Ubuntu)
>> 
>>> sanity: FAIL: test_104b lfs check servers test failed
>> 
>> sysfs bug. I have a patch for this.
>> 
>>> sanity: FAIL: test_130a filefrag /mnt/lustre/f130a.sanity failed
>>> sanity: FAIL: test_130b filefrag /mnt/lustre/f130b.sanity failed
>>> sanity: FAIL: test_130c filefrag /mnt/lustre/f130c.sanity failed
>>> sanity: FAIL: test_130e filefrag /mnt/lustre/f130e.sanity failed
>>> sanity: FAIL: test_130f filefrag /mnt/lustre/f130f.sanity failed
>> 
>> What version of e2fsprogs are you running? You need a 1.44 version and
>> this should go away.
>
> To be clear - the Lustre-patched "filefrag" at:
>
> https://downloads.whamcloud.com/public/e2fsprogs/1.44.3.wc1/
>

I looked at Commit 41aee4226789 ("filefrag: Lustre changes to filefrag FIEMAP handling")
in the git tree instead.

This appears to add three features.

- It adds an optional device to struct fiemap.
  Presumably this is always returned if available, else zero is provided
  which means "the device".
- It adds a flag FIEMAP_EXTENT_NET which indicates that the device
  number is *not*  dev_t, but is some fs-specific value
- It allows FIEMAP_FLAG_DEVICE_ORDER to be requested.  I can't quite
  work out what this does.  Presumably it changes the order that entries
  are returned (why?) and maybe returns multiple entries for a region
  that is mirrored ???

As you say, I can see how these might be useful to other filesystems.
Maybe we should try upstreaming the support sooner rather than later.
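For reference, the vanilla FIEMAP request that filefrag issues, and that the Lustre patch extends, can be sketched in C with only the upstream <linux/fiemap.h> and FS_IOC_FIEMAP from <linux/fs.h>. The Lustre additions listed above (the device field, FIEMAP_EXTENT_NET, FIEMAP_FLAG_DEVICE_ORDER) are deliberately not modeled, and the helper names are hypothetical.

```c
#include <linux/fiemap.h>
#include <linux/fs.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>

/* Allocate a request covering the whole file with room for count extents. */
struct fiemap *fiemap_alloc(unsigned int count)
{
	struct fiemap *fm = calloc(1, sizeof(*fm) +
				   count * sizeof(struct fiemap_extent));

	if (!fm)
		return NULL;
	fm->fm_start = 0;
	fm->fm_length = FIEMAP_MAX_OFFSET;	/* map to end of file */
	fm->fm_flags = FIEMAP_FLAG_SYNC;	/* flush dirty data first */
	fm->fm_extent_count = count;
	return fm;
}

/* Print up to count extents of the open file fd; returns 0 or -1. */
int print_extents(int fd, unsigned int count)
{
	struct fiemap *fm = fiemap_alloc(count);
	unsigned int i;

	if (!fm)
		return -1;
	if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) {
		free(fm);
		return -1;
	}
	for (i = 0; i < fm->fm_mapped_extents; i++) {
		const struct fiemap_extent *fe = &fm->fm_extents[i];

		printf("logical %llu physical %llu length %llu flags 0x%x\n",
		       (unsigned long long)fe->fe_logical,
		       (unsigned long long)fe->fe_physical,
		       (unsigned long long)fe->fe_length, fe->fe_flags);
	}
	free(fm);
	return 0;
}
```

Note that the upstream struct fiemap_extent has no notion of which device an extent lives on, which is the gap the Lustre extension fills for striped (multi-device) files.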

NeilBrown
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 832 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-devel-lustre.org/attachments/20190110/614ef3d7/attachment.sig>

* [lustre-devel] [PATCH v2 33/33] lustre: update version to 2.9.99
  2019-01-09 23:16               ` Andreas Dilger
@ 2019-01-10  1:36                 ` NeilBrown
  2019-01-10  9:10                   ` Andreas Dilger
  0 siblings, 1 reply; 47+ messages in thread
From: NeilBrown @ 2019-01-10  1:36 UTC (permalink / raw)
  To: lustre-devel

On Wed, Jan 09 2019, Andreas Dilger wrote:

> On Jan 9, 2019, at 11:28, James Simmons <jsimmons@infradead.org> wrote:
>> 
>> 
>>>>> This might be because the upstream Lustre doesn't allow setting per-process
>>>>> JobID via environment variable, only as a single per-node value.  The real
>>>>> unfortunate part is that the "get JobID from environment" actually works for
>>>>> every reasonable architecture (even the one which was originally broken
>>>>> fixed it), but it got yanked anyway.  This is actually one of the features
>>>>> of Lustre that lots of HPC sites like to use, since it allows them to track
>>>>> on the servers which users/jobs/processes on the client are doing IO.
>>>> 
>>>> To give background for Neil, see this thread:
>>>> 
>>>> https://lore.kernel.org/patchwork/patch/416846
>>>> 
>>>> In this case I do agree with Greg. The latest jobid code does implement an
>>>> upcall, and upcalls don't play nice with containers. There is also the
>>>> namespace issue that was pointed out. I think the namespace issue might be fixed
>>>> in the latest OpenSFS code.
>>> 
>>> I'm not sure what you mean?  AFAIK, there is no upcall for JobID, except
>>> maybe in the kernel client where we weren't allowed to parse the process
>>> environment directly.  I agree an upcall is problematic with namespaces,
>>> in addition to being less functional (only a JobID per node instead of
>>> per process), which is why direct access to JOBENV is better IMHO.
>> 
>> I have some evil ideas about this. Need to think about it some more since
>> this is a more complex problem.
>
> Since the kernel manages the environment variables via getenv() and setenv(),
> I honestly don't see why accessing them directly is a huge issue?

This is, at best, an over-simplification.  The kernel doesn't "manage" the
environment variables.
When a process calls execve() (or similar) a collection of strings called
"arguments" and another collection of strings called "environment" are
extracted from the process's VM, and used for initializing part of the
newly created VM.  That is all the kernel does with either
(except for providing /proc/*/cmdline and /proc/*/environ, which is best-effort).

getenv() and setenv() are entirely implemented in user-space.  It is quite
possible for a process to mess up its args or environment in a way that
will make /proc/*/{cmdline,environ} fail to return anything useful.

It is quite possible for the memory storing args and env to be swapped
out.  If a driver tried to access either, it might trigger page-in of
that part of the address space, which would probably work but might not
be a good idea.
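To make the best-effort point concrete, here is a minimal userspace C sketch that reads the NUL-separated KEY=VALUE block exposed by /proc/<pid>/environ and looks up one variable (SLURM_JOB_ID is used only as an illustrative scheduler variable). The function names are hypothetical; whatever it finds is simply what the process left in that memory, not a kernel-maintained table.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Return a pointer to the value of name inside a NUL-separated
 * "KEY=VALUE" buffer, or NULL if absent.  An unterminated final
 * entry (e.g. from a short read) is ignored. */
const char *environ_lookup(const char *buf, size_t len, const char *name)
{
	size_t nlen = strlen(name);
	const char *p = buf, *end = buf + len;

	while (p < end) {
		const char *nul = memchr(p, '\0', (size_t)(end - p));

		if (!nul)
			break;
		if ((size_t)(nul - p) > nlen && p[nlen] == '=' &&
		    memcmp(p, name, nlen) == 0)
			return p + nlen + 1;
		p = nul + 1;
	}
	return NULL;
}

/* Read up to buflen bytes of /proc/<pid>/environ; returns bytes read or -1. */
long read_proc_environ(pid_t pid, char *buf, size_t buflen)
{
	char path[64];
	FILE *f;
	size_t n;

	snprintf(path, sizeof(path), "/proc/%ld/environ", (long)pid);
	f = fopen(path, "r");
	if (!f)
		return -1;
	n = fread(buf, 1, buflen, f);
	fclose(f);
	return (long)n;
}
```

The parser has to tolerate a missing terminator because nothing guarantees the block is well formed, which is one more reason a kernel-side reader would also be best-effort.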

As I understand it, the goal here is to have a cluster-wide identifier
that can be attached to groups of processes on different nodes.  Then
stats relating to all of those processes can be collected together.

If I didn't think that control-groups were an abomination I would
probably suggest using them to define a group of processes, then to
attach a tag to that group. Both the net_cls and net_prio cgroups
do exactly this.  perf_event does as well, and even uses the tag exactly
for collecting performance-data together for a set of processes.
Maybe we could try to champion an fs_event control group?
However it is a long time since I've looked at control groups, and they
might have moved on a bit.

But as I do think that control-groups are an abomination, I couldn't
possibly suggest any such thing.
Unix already has a perfectly good grouping abstraction - process groups
(unfortunately there are about 3 sorts of these, but that needn't be a
big problem).
Stats can be collected based on pgid, and a mapping from
client+pgid->jobid can be communicated to whatever collects the
statistics ... somehow.
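A minimal C sketch of the process-group idea, assuming some hypothetical agent publishes (client, pgid) -> jobid records to the stats collector; the record layout, names, and transport are invented for illustration. On the launcher side the only system call needed is getpgrp().

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical record: which job a process group on a client belongs to. */
struct pgid_jobid {
	char client[64];	/* client identifier, e.g. a NID string */
	pid_t pgid;
	char jobid[32];
};

/* Describe the caller's own process group as belonging to jobid. */
void pgid_jobid_self(struct pgid_jobid *rec, const char *client,
		     const char *jobid)
{
	snprintf(rec->client, sizeof(rec->client), "%s", client);
	rec->pgid = getpgrp();
	snprintf(rec->jobid, sizeof(rec->jobid), "%s", jobid);
}

/* Collector side: map (client, pgid) from a stats record to a jobid.
 * A linear scan keeps the sketch short; a hash table would scale better. */
const char *pgid_jobid_lookup(const struct pgid_jobid *map, size_t n,
			      const char *client, pid_t pgid)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (map[i].pgid == pgid && strcmp(map[i].client, client) == 0)
			return map[i].jobid;
	return NULL;
}
```

The kernel side would then only need to tag stats with the pgid, leaving the jobid mapping entirely in user space.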

NeilBrown

* [lustre-devel] [PATCH v2 33/33] lustre: update version to 2.9.99
  2019-01-08 21:47         ` James Simmons
  2019-01-09  0:15           ` Andreas Dilger
@ 2019-01-10  1:46           ` NeilBrown
  2019-01-10  7:41             ` Andreas Dilger
  1 sibling, 1 reply; 47+ messages in thread
From: NeilBrown @ 2019-01-10  1:46 UTC (permalink / raw)
  To: lustre-devel

On Tue, Jan 08 2019, James Simmons wrote:
>
> I have been thinking about what would be a better approach, since I'd like to
> tackle this problem in the 2.13 time frame. Our admins at my work
> place want to be able to collect application stats without being root.
> So placing stats in debugfs, which is what we currently do in the Linux
> client, is not an option :-( The stats are not a good fit for sysfs.

How much statistics data are we talking about here?
  /proc/self/mountstats
shows over 2K of stats for NFS filesystems.
Is this in the ball-park or do you need an order-of-magnitude more?
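One way to answer this on a given node is to measure how many bytes a full read of the file actually returns. A minimal C sketch, assuming only POSIX open()/read(): most virtual files in /proc report a size of 0 via stat(), so the content has to be read to be sized, which is also the cost a polling collector pays on every pass. The helper name is hypothetical.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Return the number of bytes a full read of path yields, or -1 on error. */
long virtual_file_bytes(const char *path)
{
	char buf[4096];
	long total = 0;
	ssize_t n;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	while ((n = read(fd, buf, sizeof(buf))) > 0)
		total += n;
	close(fd);
	return n < 0 ? -1 : total;
}
```

For example, virtual_file_bytes("/proc/self/mountstats") gives the NFS figure for comparison, and the same call on a Lustre stats file gives the other side.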

NeilBrown

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [lustre-devel] [PATCH v2 33/33] lustre: update version to 2.9.99
  2019-01-10  0:40         ` NeilBrown
@ 2019-01-10  7:28           ` Andreas Dilger
  0 siblings, 0 replies; 47+ messages in thread
From: Andreas Dilger @ 2019-01-10  7:28 UTC (permalink / raw)
  To: lustre-devel

On Jan 9, 2019, at 17:40, NeilBrown <neilb@suse.com> wrote:
> 
> On Tue, Jan 08 2019, Andreas Dilger wrote:
>> On Jan 7, 2019, at 21:26, James Simmons <jsimmons@infradead.org> wrote:
>>> 
>>>> sanity: FAIL: test_130a filefrag /mnt/lustre/f130a.sanity failed
>>>> sanity: FAIL: test_130b filefrag /mnt/lustre/f130b.sanity failed
>>>> sanity: FAIL: test_130c filefrag /mnt/lustre/f130c.sanity failed
>>>> sanity: FAIL: test_130e filefrag /mnt/lustre/f130e.sanity failed
>>>> sanity: FAIL: test_130f filefrag /mnt/lustre/f130f.sanity failed
>>> 
>>> What version of e2fsprog are you running? You need a 1.44 version and
>>> this should go away.
>> 
>> To be clear - the Lustre-patched "filefrag" at:
>> 
>> https://downloads.whamcloud.com/public/e2fsprogs/1.44.3.wc1/
> 
> I looked at Commit 41aee4226789 ("filefrag: Lustre changes to filefrag FIEMAP handling") in the git tree instead.
> 
> This appears to add 3 features.
> 
> - It adds an optional device to struct fiemap.
>  Presumably this is always returned if available, else zero is provided
>  which means "the device".

Vanilla filefrag just returns 0 today.  For Lustre filefrag it returns
the OST index on which the blocks are located. For local filesystems
I'm expecting it to return the rdev of the block device, like 0x801 or
similar.

> - It adds a flag FIEMAP_EXTENT_NET which indicates that the device
>  number is *not* dev_t, but is some fs-specific value

Right.
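A minimal C sketch of how a tool might print the per-extent device field described above. The struct `demo_extent`, the flag `DEMO_EXTENT_NET` and its value are hypothetical stand-ins, not the definitions in Lustre's patched headers; the only real API used is glibc's major()/minor() from <sys/sysmacros.h>, which decode a dev_t such as 0x801 into 8:1.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/sysmacros.h>	/* major(), minor() (glibc) */

/* Hypothetical flag: the device field holds an fs-specific index, not a dev_t. */
#define DEMO_EXTENT_NET 0x80000000u

/* Hypothetical subset of an extent record that carries a device field. */
struct demo_extent {
	uint32_t flags;
	uint32_t device;
};

/* Format the device field: "major:minor" for a local block device,
 * "net:<index>" (e.g. an OST index) when the NET flag is set.
 * Returns the snprintf() result. */
static int demo_format_device(const struct demo_extent *ext, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	if (ext->flags & DEMO_EXTENT_NET)
		return snprintf(buf, len, "net:%u", (unsigned int)ext->device);
	return snprintf(buf, len, "%u:%u", major(ext->device), minor(ext->device));
}
```

With the flag clear, 0x801 prints as "8:1"; with it set, the same field is treated as an opaque per-filesystem index.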

> - It allows FIEMAP_FLAG_DEVICE_ORDER to be requested.  I can't quite
>  work out what this does.

The logic makes sense once you understand it.  Consider a striped Lustre
file, or perhaps on an MD RAID device.  If you returned the blocks in
file-logical order (i.e. block 0...EOF), then the largest extent that
could be returned for the same device would be stripe_size/chunk_size.
This would be very verbose (e.g. 1TB file with 1MB stripe_size would be
1M lines of output, though still better than the 256M lines from the
old FIBMAP-based filefrag).  This would make it very hard to see if the
file allocation is contiguous or fragmented, which was our original
goal for implementing FIEMAP.
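The extent counts above can be checked with a short C sketch. It assumes a fully allocated file, 4 KiB filesystem blocks for the FIBMAP case (which is what the 256M figure implies), and, for the device-order case, equal-sized objects that are each allocated contiguously and split only at ext4's 128 MiB maximum extent. The 4-stripe count is an illustrative assumption, not taken from the message.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t div_round_up(uint64_t n, uint64_t d)
{
	assert(d != 0);
	return (n + d - 1) / d;
}

/* File-logical order (stripe_count > 1): consecutive stripe units live on
 * different devices, so one extent covers at most one stripe unit. */
static uint64_t logical_order_extents(uint64_t fsize, uint64_t stripe_size)
{
	return div_round_up(fsize, stripe_size);
}

/* Old FIBMAP-based filefrag: one mapping per filesystem block. */
static uint64_t fibmap_lines(uint64_t fsize, uint64_t blksize)
{
	return div_round_up(fsize, blksize);
}

/* Device order, best case: each object (approximated as equal-sized) is
 * contiguous and only split at the backing filesystem's maximum extent. */
static uint64_t device_order_extents(uint64_t fsize, unsigned int stripe_count,
				     uint64_t max_extent)
{
	uint64_t per_object = div_round_up(fsize, stripe_count);

	return (uint64_t)stripe_count * div_round_up(per_object, max_extent);
}
```

For a 1 TiB file with 1 MiB stripes this gives 1,048,576 logical-order extents and 268,435,456 FIBMAP lines, versus 8,192 device-order extents under the best-case 4-stripe assumption.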

The DEVICE_ORDER flag means "return blocks in the underlying device
order".  This allows returning block extents of the maximum size for
the underlying filesystem (128MB for ext4), and much more clearly
shows whether the underlying file allocation is contiguous or fragmented.
It also simplifies the implementation at the Lustre side, because we
are essentially doing a series of per-OST FIEMAP calls until the OST
object is done, then moving on to the next object in the file.  The
alternative (which I've thought of implementing, just for compatibility
reasons) would be to interleave the FIEMAP output from each OST by the
logical file offset, but it would be ugly and not very useful, except
for tools that want to see if a file has holes or not.

$ filefrag -v /myth/tmp/4stripe 
Filesystem type is: bd00bd0
File size of /myth/tmp/4stripe is 104857600 (102400 blocks of 1024 bytes)
 ext:     device_logical:        physical_offset: length:  dev: flags:
   0:        0..   28671: 1837711360..1837740031:  28672: 0004: net
   1:        0..   24575: 1280876544..1280901119:  24576: 0000: net
   2:        0..   24575: 1535643648..1535668223:  24576: 0001: net
   3:        0..   24575: 4882608128..4882632703:  24576: 0003: last,net
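The listing is consistent with plain RAID-0 striping over 4 OSTs with a 4 MiB stripe size. That stripe size is our inference from the block counts and is not stated in the message. A C sketch of the usual RAID-0 offset mapping reproduces the per-object lengths (28672 and 24576 blocks of 1024 bytes), with stripe 0 corresponding to the object on device 0004. It also shows why device-order output needs only one line per object when each object is contiguous. The function names are hypothetical, not Lustre's lov helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Position of a file byte inside a RAID-0 striped layout. */
struct stripe_pos {
	unsigned int stripe;	/* which object/OST, in stripe order */
	uint64_t obj_off;	/* byte offset inside that object */
};

static struct stripe_pos raid0_map(uint64_t off, uint64_t stripe_size,
				   unsigned int stripe_count)
{
	struct stripe_pos p;
	uint64_t unit;

	assert(stripe_size > 0 && stripe_count > 0);
	unit = off / stripe_size;	/* global stripe-unit number */
	p.stripe = (unsigned int)(unit % stripe_count);
	p.obj_off = (unit / stripe_count) * stripe_size + off % stripe_size;
	return p;
}

/* Number of bytes of an fsize-byte file that land on stripe i. */
static uint64_t raid0_obj_size(uint64_t fsize, uint64_t stripe_size,
			       unsigned int stripe_count, unsigned int i)
{
	uint64_t row, rem, start, size;

	assert(stripe_size > 0 && i < stripe_count);
	row = stripe_size * stripe_count;	/* one full pass over all stripes */
	rem = fsize % row;			/* bytes in the final partial row */
	start = (uint64_t)i * stripe_size;	/* where stripe i starts in that row */
	size = (fsize / row) * stripe_size;
	if (rem > start)
		size += (rem - start < stripe_size) ? rem - start : stripe_size;
	return size;
}
```

For the 100 MiB file, stripe 0 holds 7 stripe units (28 MiB) and stripes 1 to 3 hold 6 each (24 MiB), matching the four lines of output.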

>  Presumably it changes the order that entries are returned (why?) and
>  maybe returns multiple entries for a region that is mirrored ???

The multiple entries per region are needed for mirrored files.

> As you say, I can see how these might be useful to other filesystems.
> Maybe we should try upstreaming the support sooner rather than later.

I've tried a few times, but have been rebuffed because Lustre isn't
in the mainline.  Originally, BtrFS wasn't going to have multiple device
support, but that has changed since the time FIEMAP was introduced.

I'd of course be happy if it was in mainline, or at least the fields
in struct fiemap_extent reserved to avoid future conflicts.  There
was also a proposal from SuSE for BtrFS to add support for compressed
extents, but it never quite made it over the finish line:

   David Sterba "fiemap: introduce EXTENT_DATA_COMPRESSED flag"

Cheers, Andreas
---
Andreas Dilger
Principal Lustre Architect
Whamcloud

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [lustre-devel] [PATCH v2 33/33] lustre: update version to 2.9.99
  2019-01-10  1:46           ` NeilBrown
@ 2019-01-10  7:41             ` Andreas Dilger
  0 siblings, 0 replies; 47+ messages in thread
From: Andreas Dilger @ 2019-01-10  7:41 UTC (permalink / raw)
  To: lustre-devel

On Jan 9, 2019, at 18:46, NeilBrown <neilb@suse.com> wrote:
> 
> On Tue, Jan 08 2019, James Simmons wrote:
>> 
>> I have been thinking about what would be a better approach, since I'd like to
>> tackle this problem for the 2.13 time frame. The admins at my workplace
>> want to be able to collect application stats without being root.
>> So placing stats in debugfs, which is what we currently do in the
>> Linux client, is not an option :-( The stats are not a good fit for sysfs.
> 
> How much statistics data are we talking about here?
>  /proc/self/mountstats
> shows over 2K of stats for NFS filesystems.
> Is this in the ball-park or do you need an order-of-magnitude more?

Ah, the joys of being grandfathered into the code...  One of the larger
normal /proc files is the "obdfilter.*.brw_stats" files, which I agree
is a bit of an abomination of ASCII formatted output.

Most of the regular stats files are about 1KB in size, like:

wc /proc/fs/lustre//osc/myth-OST000*/stats
14 107 1028 /proc/fs/lustre//osc/myth-OST0000-osc-ffff880429ee7c00/stats
14 107 1011 /proc/fs/lustre//osc/myth-OST0001-osc-ffff880429ee7c00/stats
13 99 989 /proc/fs/lustre//osc/myth-OST0002-osc-ffff880429ee7c00/stats
14 107 1043 /proc/fs/lustre//osc/myth-OST0003-osc-ffff880429ee7c00/stats
14 107 1075 /proc/fs/lustre//osc/myth-OST0004-osc-ffff880429ee7c00/stats

The {obdfilter,mdt}.*.job_stats files can become quite big on a large
system if there are lots of jobs running.  James would have to report
on what kind of sizes they get on their nearly-largest-in-the-world
filesystem.  Definitely into the MB range.  I don't think there would be
a need to poll that super frequently, maybe in the 60s range, as it keeps
the stats for some minutes after a job stops IO, or it would be impossible
to gather accurate stats for the whole job.

Cheers, Andreas
---
Andreas Dilger
Principal Lustre Architect
Whamcloud

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [lustre-devel] [PATCH v2 33/33] lustre: update version to 2.9.99
  2019-01-10  1:36                 ` NeilBrown
@ 2019-01-10  9:10                   ` Andreas Dilger
  0 siblings, 0 replies; 47+ messages in thread
From: Andreas Dilger @ 2019-01-10  9:10 UTC (permalink / raw)
  To: lustre-devel

On Jan 9, 2019, at 18:36, NeilBrown <neilb@suse.com> wrote:
> 
> On Wed, Jan 09 2019, Andreas Dilger wrote:
> 
>> On Jan 9, 2019, at 11:28, James Simmons <jsimmons@infradead.org> wrote:
>>> 
>>>>>> This might be because the upstream Lustre doesn't allow setting
>>>>>> per-process JobID via environment variable, only as a single
>>>>>> per-node value.  The real unfortunate part is that the "get JobID
>>>>>> from environment" actually works for every reasonable architecture
>>>>>> (even the one which was originally broken fixed it), but it got
>>>>>> yanked anyway.  This is actually one of the features of Lustre that
>>>>>> lots of HPC sites like to use, since it allows them to track on the
>>>>>> servers which users/jobs/processes on the client are doing IO.
>>>>> 
>>>>> To give background for Neil see thread:
>>>>> 
>>>>> https://lore.kernel.org/patchwork/patch/416846
>>>>> 
>>>>> In this case I do agree with Greg. The latest jobid does implement an
>>>>> upcall and upcalls don't play nicely with containers. There is also the
>>>>> namespace issue pointed out. I think the namespace issue might be fixed
>>>>> in the latest OpenSFS code.
>>>> 
>>>> I'm not sure what you mean?  AFAIK, there is no upcall for JobID, except
>>>> maybe in the kernel client where we weren't allowed to parse the process
>>>> environment directly.  I agree an upcall is problematic with namespaces,
>>>> in addition to being less functional (only a JobID per node instead of
>>>> per process), which is why direct access to JOBENV is better IMHO.
>>> 
>>> I have some evil ideas about this. Need to think about it some more since
>>> this is a more complex problem.
>> 
>> Since the kernel manages the environment variables via getenv() and setenv(), I honestly don't see why accessing them directly is a huge issue?
> 
> This is, at best, an over-simplification.  The kernel doesn't "manage" the
> environment variables.
> When a process calls execve() (or similar) a collection of strings called
> "arguments" and another collection of strings called "environment" are
> extracted from the process's vm, and used for initializing part of the
> newly created vm.  That is all the kernel does with either.
> (except for providing /proc/*/cmdline and /proc/*/environ, which is best-effort).

Sure, and we only provide a best effort at parsing it as a series of NUL-
terminated strings.  Userspace can't corrupt the kernel VMA mappings,
so at worst we don't find anything we are looking for, which can also
happen if no JobID is set in the first place.  It's not really any more
dangerous than any copy_from_user() in the filesystem/ioctl code.
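The best-effort parsing described above can be sketched in C. This is a user-space illustration operating on a buffer shaped like /proc/<pid>/environ (NUL-terminated NAME=value strings), not Lustre's cfs_get_environ(); the function name `environ_lookup` is hypothetical. A malformed or truncated buffer simply yields "not found", which is the same outcome as no JobID being set.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Find NAME=value in a buffer of NUL-terminated "NAME=value" strings.
 * Returns a pointer to the value inside buf, or NULL if absent.
 * Best effort: an unterminated trailing entry is ignored, never overrun. */
static const char *environ_lookup(const char *buf, size_t len, const char *name)
{
	size_t nlen, i = 0;

	assert(buf != NULL && name != NULL);
	nlen = strlen(name);
	while (i < len) {
		const char *entry = buf + i;
		const char *nul = memchr(entry, '\0', len - i);
		size_t elen;

		if (!nul)
			break;		/* garbage or truncated tail: give up */
		elen = (size_t)(nul - entry);
		if (elen > nlen && memcmp(entry, name, nlen) == 0 &&
		    entry[nlen] == '=')
			return entry + nlen + 1;
		i += elen + 1;		/* skip entry and its NUL */
	}
	return NULL;
}
```

Looking up "SLURM_JOB_ID" in "HOME=/root\0SLURM_JOB_ID=4242\0" returns "4242", while a prefix such as "SLURM" does not match because the character after the name must be '='.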

> getenv() and setenv() are entirely implemented in user-space.  It is quite
> possible for a process to mess-up its args or environment in a way that
> will make /proc/*/{cmdline,environ} fail to return anything useful.

If userspace has also messed it up so badly that it can't parse the
environment variables themselves, then even a userspace upcall isn't
going to work.

> It is quite possible for the memory storing args and env to be swapped
> out.  If a driver tried to accesses either, it might trigger page-in of
> that part of the address space, which would probably work but might not
> be a good idea.

I've never seen a report of problems like this.  Processes that are
swapped out are probably not going to be submitting IO either...  We
cache the JobID in the kernel so it is only fetched on the first IO
for that process ID.  There once was a bug where the JobID was fetched
during mmap IO which caused a deadlock; it has since been fixed.  We also
added the JobID cache, which has reduced the overhead significantly.
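A simplified C sketch of a per-pid JobID cache, so that the expensive lookup runs only on the first IO from a process. This is a direct-mapped table for illustration; it omits the locking and entry expiry a real kernel cache needs, and all names (`jobid_get`, `demo_fetch`, `CACHE_SLOTS`) are hypothetical rather than Lustre's.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

#define JOBID_LEN   32
#define CACHE_SLOTS 64		/* hypothetical size; must be a power of two */

struct jobid_entry {
	pid_t pid;
	int valid;
	char jobid[JOBID_LEN];
};

struct jobid_cache {
	struct jobid_entry slot[CACHE_SLOTS];
	unsigned long fetches;	/* how often the slow path ran */
};

/* Slow path, e.g. scanning the process environment. Returns 0 on success. */
typedef int (*jobid_fetch_fn)(pid_t pid, char *buf, size_t len);

static void jobid_cache_init(struct jobid_cache *c)
{
	memset(c, 0, sizeof(*c));
}

/* Return the cached JobID for pid, calling fetch only on a miss.
 * Direct-mapped: a colliding pid evicts the previous entry. */
static const char *jobid_get(struct jobid_cache *c, pid_t pid, jobid_fetch_fn fetch)
{
	struct jobid_entry *e = &c->slot[(unsigned int)pid & (CACHE_SLOTS - 1)];

	if (e->valid && e->pid == pid)
		return e->jobid;	/* hit: no environment scan */
	c->fetches++;
	if (fetch(pid, e->jobid, sizeof(e->jobid)) != 0) {
		e->valid = 0;
		return NULL;
	}
	e->pid = pid;
	e->valid = 1;
	return e->jobid;
}

/* Stand-in fetcher for illustration only. */
static int demo_fetch(pid_t pid, char *buf, size_t len)
{
	assert(len > 0);
	snprintf(buf, len, "job-%d", (int)pid);
	return 0;
}
```

Repeated IO from the same pid costs one table probe; only a miss or an eviction triggers another fetch.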

> As I understand it, the goal here is to have a cluster-wide identifier
> that can be attached to groups of processes on different nodes.  Then
> stats relating to all of those processes can be collected together.

Correct, but it isn't just _any_ system-wide identifier.  The large
parallel MPI applications already get assigned an identifier by the
batch scheduler before they are run, and a large number of tools in
these systems use JobID for tracking logs, CPU/IO accounting, etc.

The JobID is stored in an environment variable (e.g. SLURM_JOB_ID)
by the batch scheduler before the actual job is forked.  See the
comment at the start of lustre/obdclass/lprocfs_jobstats.c for
examples.  We can also set artificial jobid values for debugging
or use with systems not using MPI (e.g. procname_uid), but they do
not need access to the process environment.

For Lustre, the admin does a one-time configuration of the name of
the environment variable ("lctl conf_param jobid_var=SLURM_JOB_ID")
to tell the kernel which environment variable to use.

> ... But as I do think that control-groups are an abomination, I couldn't
> possibly suggest any such thing.
> Unix already has a perfectly good grouping abstraction - process groups
> (unfortunately there are about 3 sorts of these, but that needn't be a
> big problem).  Stats can be collected based on pgid, and a mapping from
> client+pgid->jobid can be communicated to whatever collects the
> statistics ... somehow.

So, right now we have "scan a few KB of kernel memory for a string"
periodically in the out-of-tree client (see jobid_get_from_environ()
and cfs_get_environ()), and then a hash table that caches the JobID
internally and maps the pid to the JobID when it is needed.  Most of
the code is a simplified copy of access_process_vm() for kernels after
v2.6.24-rc1-652-g02c3530da6b9 when it was un-EXPORT_SYMBOL'd, but
since kernel v4.9-rc3-36-gfcd35857d662 it is again exported so it makes
sense to add a configure check.  Most of the rest is for when the
variable or value crosses a page boundary.
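The page-boundary handling can be sketched as an incremental scanner: the environment is fed one chunk at a time (standing in for one page copied from the target process), and bytes of an unfinished entry are carried over to the next call. This is a user-space C illustration under stated assumptions. `ENTRY_MAX` and the function names are hypothetical, and entries longer than the cap are skipped rather than matched.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ENTRY_MAX 256		/* hypothetical cap on one NAME=value entry */

struct env_scan {
	const char *name;	/* variable we are looking for */
	size_t nlen;
	char entry[ENTRY_MAX];	/* current entry, may span several chunks */
	size_t len;
	int overflow;		/* current entry exceeded ENTRY_MAX: ignore it */
	int found;
	char value[ENTRY_MAX];
};

static void env_scan_init(struct env_scan *s, const char *name)
{
	assert(name != NULL);
	memset(s, 0, sizeof(*s));
	s->name = name;
	s->nlen = strlen(name);
}

/* Feed the next chunk. Returns 1 once the variable has been found. */
static int env_scan_feed(struct env_scan *s, const char *chunk, size_t n)
{
	for (size_t i = 0; i < n && !s->found; i++) {
		if (chunk[i] != '\0') {
			if (s->len < ENTRY_MAX - 1)
				s->entry[s->len++] = chunk[i];
			else
				s->overflow = 1;
			continue;
		}
		/* NUL ends the current entry: check it, then reset. */
		s->entry[s->len] = '\0';
		if (!s->overflow && s->len > s->nlen &&
		    memcmp(s->entry, s->name, s->nlen) == 0 &&
		    s->entry[s->nlen] == '=') {
			/* copy the value plus its terminating NUL */
			memcpy(s->value, s->entry + s->nlen + 1, s->len - s->nlen);
			s->found = 1;
		}
		s->len = 0;
		s->overflow = 0;
	}
	return s->found;
}
```

Feeding "HOME=/root\0SLURM_JO" and then "B_ID=4242\0" as two chunks still finds SLURM_JOB_ID=4242, even though the entry straddles the chunk boundary.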


Conversely, the kernel client has something like "upcall a userspace
process, fork a program (millions of cycles), have that program do the
same scan of the kernel environment memory, but now it is doing it in
userspace, open a file, write the environment variable to the kernel,
exit and clean up the process that was created" to do the same thing.


Using a pgid seems mostly unusable, since the stats are not collected
on the client, they are collected on the server (the JobID is sent with
every userspace-driven RPC to the server), which is the centralized
location where all clients submit their IO.  JobStats gives us a relatively
easy and direct method to see which client process(es) are doing a lot of
IO or RPCs, just looking into a /proc file if necessary (though they are
typically further centralized and monitored from the multiple servers).
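A minimal C sketch of why carrying the JobID in each RPC makes server-side aggregation simple: the server keys its counters on the JobID alone, so requests from different clients belonging to the same job land in one entry with no client-side mapping. The structures and names are hypothetical (not Lustre's lprocfs_jobstats code), and linear search stands in for the hash table a real server would use.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define JOBID_LEN 32
#define MAX_JOBS  128		/* hypothetical table size */

/* What the server sees per RPC: the JobID travels with the request. */
struct rpc_info {
	unsigned int client;	/* sender; deliberately not part of the key */
	const char *jobid;
	uint64_t bytes;
};

struct job_stat {
	char jobid[JOBID_LEN];
	uint64_t rpcs;
	uint64_t bytes;
};

struct job_stats {
	struct job_stat s[MAX_JOBS];
	size_t n;
};

/* Account one RPC under its JobID; returns NULL if the table is full. */
static struct job_stat *job_stats_account(struct job_stats *t, const struct rpc_info *r)
{
	struct job_stat *js = NULL;

	assert(t != NULL && r != NULL && r->jobid != NULL);
	for (size_t i = 0; i < t->n; i++) {
		if (strncmp(t->s[i].jobid, r->jobid, JOBID_LEN - 1) == 0) {
			js = &t->s[i];
			break;
		}
	}
	if (!js) {
		if (t->n == MAX_JOBS)
			return NULL;
		js = &t->s[t->n++];
		memset(js, 0, sizeof(*js));
		strncpy(js->jobid, r->jobid, JOBID_LEN - 1);	/* stays NUL-terminated */
	}
	js->rpcs++;
	js->bytes += r->bytes;
	return js;
}
```

Two RPCs for job "1234" from clients 1 and 2 produce a single entry with rpcs == 2. Keying on client:pgid instead would multiply the number of entries by the number of process groups per client.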

We can't send a different pgid from each client along with the RPCs and
hope to aggregate that at the server without adding huge complexity.  We
would need real-time mapping from every new pgid on each client (maybe
thousands per second per client) to the JobID, then passed to the MDS/OSS
so that they can reverse-map the pgid back into a JobID before the first
RPC arrives at the server.  Alternatively, we would have to track separate
stats for each client:pgid combination on the server (num_cores * clients
= millions of times more than today if there are multiple jobs per client)
until they are fetched into userspace for mapping and re-aggregation.


Thanks, but I'd rather stick with the relatively simple and direct method
we are using today.  It's worked without problems for 10 years of kernels.
I think one of the big obstacles that we face with many of the upstream
kernel maintainers is that they are focused on issues that are local to
one or a few nodes, but we have to deal with issues that may
involve hundreds or thousands of different nodes working as a single task
(unlike cloud stuff where there may be many nodes, but they are all doing
things independently).  It's not that we develop crazy things because we
have spare time to burn, but because they are needed to deal sanely with
such environments.

Cheers, Andreas
---
Andreas Dilger
Principal Lustre Architect
Whamcloud

^ permalink raw reply	[flat|nested] 47+ messages in thread

end of thread, other threads:[~2019-01-10  9:10 UTC | newest]

Thread overview: 47+ messages
2019-01-06 22:13 [lustre-devel] [PATCH v2 00/33] lustre: add PFL support James Simmons
2019-01-06 22:13 ` [lustre-devel] [PATCH v2 01/33] lustre: clio: fix incorrect invariant in cl_io_iter_fini() James Simmons
2019-01-06 22:13 ` [lustre-devel] [PATCH v2 02/33] lustre: pfl: Basic data structures for composite layout James Simmons
2019-01-06 22:13 ` [lustre-devel] [PATCH v2 03/33] lustre: lov: move code for PFL work James Simmons
2019-01-06 22:13 ` [lustre-devel] [PATCH v2 04/33] lustre: lov: merge lov_mds_md_v3 and lov_mds_md_v1 handling James Simmons
2019-01-06 22:14 ` [lustre-devel] [PATCH v2 05/33] lustre: lov: fold lmm_verify() handling into lmm_unpackmd() James Simmons
2019-01-06 22:14 ` [lustre-devel] [PATCH v2 06/33] lustre: lov: create struct lov_stripe_md_entry James Simmons
2019-01-06 22:14 ` [lustre-devel] [PATCH v2 07/33] lustre: lov: add composite layout unpacking James Simmons
2019-01-06 22:14 ` [lustre-devel] [PATCH v2 08/33] lustre: lov: embedded raid0 in struct lov_layout_composite James Simmons
2019-01-06 22:14 ` [lustre-devel] [PATCH v2 09/33] lustre: lov: migrate lov raid0 to future PFL component handling James Simmons
2019-01-06 22:14 ` [lustre-devel] [PATCH v2 10/33] lustre: lov: reduce code indentation James Simmons
2019-01-06 22:14 ` [lustre-devel] [PATCH v2 11/33] lustre: lov: change lo_entries to array James Simmons
2019-01-06 22:14 ` [lustre-devel] [PATCH v2 12/33] lustre: lov: move around PFL code and cleanups James Simmons
2019-01-06 22:14 ` [lustre-devel] [PATCH v2 13/33] lustre: lov: remove lsm_stripe_by_[index|offset]_plain James Simmons
2019-01-06 22:14 ` [lustre-devel] [PATCH v2 14/33] lustre: lov: add looping lsm_entry_count times James Simmons
2019-01-06 22:14 ` [lustre-devel] [PATCH v2 15/33] lustre: lov: create lov_comp_* wrappers James Simmons
2019-01-06 22:14 ` [lustre-devel] [PATCH v2 16/33] lustre: clio: client side implementation for PFL James Simmons
2019-01-06 22:14 ` [lustre-devel] [PATCH v2 17/33] lustre: clio: getstripe support comp layout James Simmons
2019-01-06 22:14 ` [lustre-devel] [PATCH v2 18/33] lustre: pfl: enhance PFID EA for PFL James Simmons
2019-01-06 22:14 ` [lustre-devel] [PATCH v2 19/33] lustre: pfl: dynamic layout modification with write/truncate James Simmons
2019-01-06 22:14 ` [lustre-devel] [PATCH v2 20/33] lustre: ldlm: Transfer layout only if layout lock is granted James Simmons
2019-01-06 22:14 ` [lustre-devel] [PATCH v2 21/33] lustre: pfl: calculate PFL file LOVEA correctly James Simmons
2019-01-06 22:14 ` [lustre-devel] [PATCH v2 22/33] lustre: lov: keep minimum LOVEA size James Simmons
2019-01-06 22:14 ` [lustre-devel] [PATCH v2 23/33] lustre: pfl: Read should not trigger layout write intent James Simmons
2019-01-06 22:14 ` [lustre-devel] [PATCH v2 24/33] lustre: pfl: fix hang with grouplocks James Simmons
2019-01-06 22:14 ` [lustre-devel] [PATCH v2 25/33] lustre: pfl: fix ost pool op->size handling James Simmons
2019-01-06 22:14 ` [lustre-devel] [PATCH v2 26/33] lustre: lov: readahead shouldn't exceed component boundary James Simmons
2019-01-06 22:14 ` [lustre-devel] [PATCH v2 27/33] lustre: uapi: support negative flags James Simmons
2019-01-06 22:14 ` [lustre-devel] [PATCH v2 28/33] lustre: llite: return v1/v3 layout for legacy app James Simmons
2019-01-06 22:14 ` [lustre-devel] [PATCH v2 29/33] lustre: llite: restore ll_file_getstripe in ll_lov_setstripe James Simmons
2019-01-06 22:14 ` [lustre-devel] [PATCH v2 30/33] lustre: lov: do not split IO for single striped file James Simmons
2019-01-06 22:14 ` [lustre-devel] [PATCH v2 31/33] lustre: lov: call cl_object_attr_get under cl_attr lock James Simmons
2019-01-06 22:14 ` [lustre-devel] [PATCH v2 32/33] lustre: lov: use stripe_count instead of stripe_nr James Simmons
2019-01-06 22:14 ` [lustre-devel] [PATCH v2 33/33] lustre: update version to 2.9.99 James Simmons
2019-01-08  1:38   ` NeilBrown
2019-01-08  4:26     ` James Simmons
2019-01-08 10:13       ` Andreas Dilger
2019-01-08 21:47         ` James Simmons
2019-01-09  0:15           ` Andreas Dilger
2019-01-09 18:28             ` James Simmons
2019-01-09 23:16               ` Andreas Dilger
2019-01-10  1:36                 ` NeilBrown
2019-01-10  9:10                   ` Andreas Dilger
2019-01-10  1:46           ` NeilBrown
2019-01-10  7:41             ` Andreas Dilger
2019-01-10  0:40         ` NeilBrown
2019-01-10  7:28           ` Andreas Dilger
