* [lustre-devel] [PATCH 00/18] lustre: missing updates from lustre 2.9
From: James Simmons @ 2018-07-02 23:24 UTC (permalink / raw)
  To: lustre-devel

This patch set fills in the known missing pieces from the Lustre
2.9 release. With these patches applied, the tree is at the start
of the pre-2.10 development cycle. The grant allocation fixes were
missing, as were the fixes that close security holes. The rest are
small changes, plus support for filesets and the ladvise feature.

Alexander Boyko (1):
  lustre: obd: add callback for llog_cat_process_or_fork

Bobi Jam (1):
  lustre: osc: max_pages_per_rpc should be chunk size aligned

Fan Yong (1):
  lustre: ptlrpc: properly set "rq_xid" for 4MB IO

Henri Doreau (1):
  lustre: llite: restore fd_och when putting lease

James Simmons (2):
  lustre: libcfs: restore original behavior in cfs_str2num_check
  lustre: update version to 2.8.99

Jian Yu (1):
  lustre: mount: fix lmd_parse() to handle new delimiters

Jinshan Xiong (1):
  lustre: llite: fast read implementation

Johann Lombardi (1):
  lustre: grant: add support for OBD_CONNECT_GRANT_PARAM

John L. Hammond (3):
  lustre: obd: rename md_getstatus() to md_get_root()
  lustre: obd: reserve connection flag OBD_CONNECT2_FILE_SECCTX
  lustre: security: send file security context for creates

Lai Siyao (1):
  lustre: fileset: add fileset mount support

Li Xi (3):
  lustre: ladvise: Add feature of giving file access advices
  lustre: ladvise: Add willread advice support for ladvise
  lustre: ladvise: Add dontneed advice support for ladvise

Niu Yawei (1):
  lustre: ldlm: reduce mem footprint of ldlm_resource

Patrick Farrell (1):
  lustre: llite: ladvise protocol changes

 .../lustre/include/uapi/linux/lustre/lustre_idl.h  |  52 ++-
 .../lustre/include/uapi/linux/lustre/lustre_user.h |  51 +++
 .../lustre/include/uapi/linux/lustre/lustre_ver.h  |   4 +-
 drivers/staging/lustre/lnet/libcfs/libcfs_string.c |  17 +-
 drivers/staging/lustre/lustre/include/cl_object.h  |  13 +
 .../staging/lustre/lustre/include/lustre_disk.h    |   2 +
 drivers/staging/lustre/lustre/include/lustre_dlm.h |  24 +-
 .../staging/lustre/lustre/include/lustre_import.h  |   1 +
 .../lustre/lustre/include/lustre_req_layout.h      |  10 +-
 .../staging/lustre/lustre/include/lustre_swab.h    |   2 +
 drivers/staging/lustre/lustre/include/obd.h        |  17 +-
 drivers/staging/lustre/lustre/include/obd_class.h  |  16 +-
 .../staging/lustre/lustre/include/obd_support.h    |   5 +-
 drivers/staging/lustre/lustre/ldlm/ldlm_internal.h |   1 +
 drivers/staging/lustre/lustre/ldlm/ldlm_lib.c      |   1 +
 drivers/staging/lustre/lustre/ldlm/ldlm_lockd.c    |  20 +-
 drivers/staging/lustre/lustre/ldlm/ldlm_resource.c |  32 +-
 drivers/staging/lustre/lustre/llite/dir.c          |  54 +++-
 drivers/staging/lustre/lustre/llite/file.c         | 360 ++++++++++++++++++---
 .../staging/lustre/lustre/llite/llite_internal.h   |  34 +-
 drivers/staging/lustre/lustre/llite/llite_lib.c    |  37 ++-
 drivers/staging/lustre/lustre/llite/llite_mmap.c   |  34 +-
 drivers/staging/lustre/lustre/llite/lproc_llite.c  |  38 +++
 drivers/staging/lustre/lustre/llite/namei.c        |  40 ++-
 drivers/staging/lustre/lustre/llite/rw.c           |  68 +++-
 drivers/staging/lustre/lustre/llite/vvp_internal.h |   1 +
 drivers/staging/lustre/lustre/llite/vvp_io.c       |   3 +
 .../staging/lustre/lustre/llite/xattr_security.c   |  38 ++-
 drivers/staging/lustre/lustre/lmv/lmv_obd.c        |  34 +-
 drivers/staging/lustre/lustre/lov/lov_io.c         |  28 ++
 drivers/staging/lustre/lustre/mdc/mdc_internal.h   |   4 +
 drivers/staging/lustre/lustre/mdc/mdc_lib.c        |  32 ++
 drivers/staging/lustre/lustre/mdc/mdc_locks.c      |   7 +
 drivers/staging/lustre/lustre/mdc/mdc_reint.c      |   7 +
 drivers/staging/lustre/lustre/mdc/mdc_request.c    |  50 ++-
 drivers/staging/lustre/lustre/obdclass/cl_io.c     |   2 +
 drivers/staging/lustre/lustre/obdclass/llog_cat.c  |  16 +-
 .../lustre/lustre/obdclass/lprocfs_status.c        |  72 ++++-
 drivers/staging/lustre/lustre/obdclass/obd_mount.c | 135 +++++++-
 drivers/staging/lustre/lustre/osc/lproc_osc.c      |  18 ++
 drivers/staging/lustre/lustre/osc/osc_cache.c      |  95 ++++--
 .../staging/lustre/lustre/osc/osc_cl_internal.h    |   1 +
 drivers/staging/lustre/lustre/osc/osc_dev.c        |   1 +
 drivers/staging/lustre/lustre/osc/osc_internal.h   |   4 +
 drivers/staging/lustre/lustre/osc/osc_io.c         |  79 +++++
 drivers/staging/lustre/lustre/osc/osc_request.c    | 184 +++++++++--
 drivers/staging/lustre/lustre/ptlrpc/client.c      |   9 +-
 drivers/staging/lustre/lustre/ptlrpc/import.c      |  10 +
 drivers/staging/lustre/lustre/ptlrpc/layout.c      |  66 +++-
 .../staging/lustre/lustre/ptlrpc/lproc_ptlrpc.c    |   3 +-
 .../staging/lustre/lustre/ptlrpc/pack_generic.c    |  27 +-
 drivers/staging/lustre/lustre/ptlrpc/wiretest.c    | 122 +++++--
 52 files changed, 1722 insertions(+), 259 deletions(-)

-- 
1.8.3.1


* [lustre-devel] [PATCH 01/18] lustre: grant: add support for OBD_CONNECT_GRANT_PARAM
From: James Simmons @ 2018-07-02 23:24 UTC (permalink / raw)
  To: lustre-devel

From: Johann Lombardi <jlombardi@whamcloud.com>

Add support for grant overhead calculation on the client side.
To do so, clients track usage on a per-extent basis. An extent is
composed of contiguous blocks.
The OST now returns to the OSC layer several parameters to consume
grant more accurately:

- the backend filesystem block size, which is the minimal grant
  allocation unit;
- the maximum extent size;
- the extent insertion cost.

Clients now pack into each bulk write how much grant space was
consumed for the RPC. Dirty data accounting adopts the same scheme.

Moreover, each backend OSD now reports its own set of parameters:
- For ldiskfs, we usually have a 4KB block size with a maximum extent
  size of 32MB (theoretical limit of 128MB) and an extent insertion
  cost of 6 x 4KB = 24KB
- For ZFS, we report a block size of 128KB, an extent size of 128
  blocks (i.e. 16MB with 128KB block size) and a block insertion cost
  of 112KB.

In addition, OSDs no longer reserve a generic metadata overhead.
Instead, grant space is inflated for clients that do not support
the new grant parameters. That said, a tiny percentage (typically
0.76%) of the free space is still reserved inside each OSD to avoid
fragmentation, which might hurt performance and skew our grant
calculation (e.g. extents are broken due to fragmentation).

This patch also fixes several other issues:

- Bulk writes resent by ptlrpc after reconnection could trigger
  spurious error messages related to broken dirty accounting.
  The issue was that oa_dirty is discarded for resent requests
  (grant flag cleared in ost_brw_write()), so we can legitimately
  have grant > fed_dirty in ofd_grant_check().
  This was fixed by resetting fed_dirty on reconnection and skipping
  the dirty accounting check in ofd_grant_check() in the case of
  ptlrpc resend.

- In obd_connect_data_seqprint(), the connection flags cannot fit
  in a 32-bit integer.

- When merging two OSC extents, one extent tax should be released,
  both from the merged extent and from the grant accounting.

Signed-off-by: Johann Lombardi <jlombardi@whamcloud.com>
Signed-off-by: Jinshan Xiong <jinshan.xiong@gmail.com>
WC-bug-id: https://jira.whamcloud.com/browse/2049
Reviewed-on: http://review.whamcloud.com/7793
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
---
 .../lustre/include/uapi/linux/lustre/lustre_idl.h  |  8 +-
 drivers/staging/lustre/lustre/include/obd.h        |  9 ++-
 .../staging/lustre/lustre/include/obd_support.h    |  1 +
 drivers/staging/lustre/lustre/llite/llite_lib.c    |  9 +++
 .../lustre/lustre/obdclass/lprocfs_status.c        | 10 ++-
 drivers/staging/lustre/lustre/osc/lproc_osc.c      | 18 +++++
 drivers/staging/lustre/lustre/osc/osc_cache.c      | 84 +++++++++++++++------
 drivers/staging/lustre/lustre/osc/osc_request.c    | 85 ++++++++++++++++------
 .../staging/lustre/lustre/ptlrpc/pack_generic.c    |  4 +-
 drivers/staging/lustre/lustre/ptlrpc/wiretest.c    | 32 ++++----
 10 files changed, 191 insertions(+), 69 deletions(-)

diff --git a/drivers/staging/lustre/include/uapi/linux/lustre/lustre_idl.h b/drivers/staging/lustre/include/uapi/linux/lustre/lustre_idl.h
index 6c7e399..3d77ed6 100644
--- a/drivers/staging/lustre/include/uapi/linux/lustre/lustre_idl.h
+++ b/drivers/staging/lustre/include/uapi/linux/lustre/lustre_idl.h
@@ -732,10 +732,10 @@ struct obd_connect_data {
 	__u32 ocd_index;	 /* LOV index to connect to */
 	__u32 ocd_brw_size;	 /* Maximum BRW size in bytes */
 	__u64 ocd_ibits_known;   /* inode bits this client understands */
-	__u8  ocd_blocksize;     /* log2 of the backend filesystem blocksize */
-	__u8  ocd_inodespace;    /* log2 of the per-inode space consumption */
-	__u16 ocd_grant_extent;  /* per-extent grant overhead, in 1K blocks */
-	__u32 ocd_unused;	 /* also fix lustre_swab_connect */
+	__u8  ocd_grant_blkbits; /* log2 of the backend filesystem blocksize */
+	__u8  ocd_grant_inobits; /* log2 of the per-inode space consumption */
+	__u16 ocd_grant_tax_kb;  /* extent grant overhead, in 1K blocks */
+	__u32 ocd_grant_max_blks;/* maximum number of blocks per extent */
 	__u64 ocd_transno;       /* first transno from client to be replayed */
 	__u32 ocd_group;	 /* MDS group on OST */
 	__u32 ocd_cksum_types;   /* supported checksum algorithms */
diff --git a/drivers/staging/lustre/lustre/include/obd.h b/drivers/staging/lustre/lustre/include/obd.h
index d38b6bc..d6fd1ea 100644
--- a/drivers/staging/lustre/lustre/include/obd.h
+++ b/drivers/staging/lustre/lustre/include/obd.h
@@ -198,6 +198,8 @@ struct client_obd {
 	unsigned long		 cl_dirty_transit;	/* dirty synchronous */
 	unsigned long		 cl_avail_grant;	/* bytes of credit for ost */
 	unsigned long		 cl_lost_grant;		/* lost credits (trunc) */
+	/* grant consumed for dirty pages */
+	unsigned long		 cl_dirty_grant;
 
 	/* since we allocate grant by blocks, we don't know how many grant will
 	 * be used to add a page into cache. As a solution, we reserve maximum
@@ -214,7 +216,12 @@ struct client_obd {
 	 * the extent size. A chunk is max(PAGE_SIZE, OST block size)
 	 */
 	int		  cl_chunkbits;
-	unsigned int	  cl_extent_tax; /* extent overhead, by bytes */
+	/* extent insertion metadata overhead to be accounted in grant,
+	 * in bytes
+	 */
+	unsigned int	 cl_grant_extent_tax;
+	/* maximum extent size, in number of pages */
+	unsigned int	 cl_max_extent_pages;
 
 	/* keep track of objects that have lois that contain pages which
 	 * have been queued for async brw.  this lock also protects the
diff --git a/drivers/staging/lustre/lustre/include/obd_support.h b/drivers/staging/lustre/lustre/include/obd_support.h
index 070a281..ca28caf 100644
--- a/drivers/staging/lustre/lustre/include/obd_support.h
+++ b/drivers/staging/lustre/lustre/include/obd_support.h
@@ -320,6 +320,7 @@
 #define OBD_FAIL_OSC_CP_ENQ_RACE	 0x410
 #define OBD_FAIL_OSC_NO_GRANT	    0x411
 #define OBD_FAIL_OSC_DELAY_SETTIME	 0x412
+#define OBD_FAIL_OSC_CONNECT_GRANT_PARAM 0x413
 #define OBD_FAIL_OSC_DELAY_IO		 0x414
 
 #define OBD_FAIL_PTLRPC		  0x500
diff --git a/drivers/staging/lustre/lustre/llite/llite_lib.c b/drivers/staging/lustre/lustre/llite/llite_lib.c
index 9f6f061..df5bc0a 100644
--- a/drivers/staging/lustre/lustre/llite/llite_lib.c
+++ b/drivers/staging/lustre/lustre/llite/llite_lib.c
@@ -178,6 +178,12 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt)
 		return -ENOMEM;
 	}
 
+	/*
+	 * pass client page size via ocd_grant_blkbits, the server should report
+	 * back its backend blocksize for grant calculation purpose
+	 */
+	data->ocd_grant_blkbits = PAGE_SHIFT;
+
 	/* indicate the features supported by this client */
 	data->ocd_connect_flags = OBD_CONNECT_IBITS    | OBD_CONNECT_NODEVOH  |
 				  OBD_CONNECT_ATTRFID  |
@@ -367,6 +373,9 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt)
 				  OBD_CONNECT_PINGLESS | OBD_CONNECT_LFSCK |
 				  OBD_CONNECT_BULK_MBITS;
 
+	if (!OBD_FAIL_CHECK(OBD_FAIL_OSC_CONNECT_GRANT_PARAM))
+		data->ocd_connect_flags |= OBD_CONNECT_GRANT_PARAM;
+
 	if (!OBD_FAIL_CHECK(OBD_FAIL_OSC_CONNECT_CKSUM)) {
 		/* OBD_CONNECT_CKSUM should always be set, even if checksums are
 		 * disabled by default, because it can still be enabled on the
diff --git a/drivers/staging/lustre/lustre/obdclass/lprocfs_status.c b/drivers/staging/lustre/lustre/obdclass/lprocfs_status.c
index a40ec42..dd88179 100644
--- a/drivers/staging/lustre/lustre/obdclass/lprocfs_status.c
+++ b/drivers/staging/lustre/lustre/obdclass/lprocfs_status.c
@@ -163,10 +163,12 @@ static void obd_connect_data_seqprint(struct seq_file *m,
 	if (flags & OBD_CONNECT_GRANT_PARAM)
 		seq_printf(m, "       grant_block_size: %d\n"
 			   "       grant_inode_size: %d\n"
-			   "       grant_extent_overhead: %d\n",
-			   ocd->ocd_blocksize,
-			   ocd->ocd_inodespace,
-			   ocd->ocd_grant_extent);
+			   "       grant_max_extent_size: %d\n"
+			   "       grant_extent_tax: %d\n",
+			   1 << ocd->ocd_grant_blkbits,
+			   1 << ocd->ocd_grant_inobits,
+			   ocd->ocd_grant_max_blks << ocd->ocd_grant_blkbits,
+			   ocd->ocd_grant_tax_kb << 10);
 	if (flags & OBD_CONNECT_TRANSNO)
 		seq_printf(m, "       first_transno: %llx\n",
 			   ocd->ocd_transno);
diff --git a/drivers/staging/lustre/lustre/osc/lproc_osc.c b/drivers/staging/lustre/lustre/osc/lproc_osc.c
index 64931b9..81adf54 100644
--- a/drivers/staging/lustre/lustre/osc/lproc_osc.c
+++ b/drivers/staging/lustre/lustre/osc/lproc_osc.c
@@ -326,6 +326,23 @@ static ssize_t cur_lost_grant_bytes_show(struct kobject *kobj,
 }
 LUSTRE_RO_ATTR(cur_lost_grant_bytes);
 
+static ssize_t cur_dirty_grant_bytes_show(struct kobject *kobj,
+					  struct attribute *attr,
+					  char *buf)
+{
+	struct obd_device *dev = container_of(kobj, struct obd_device,
+					      obd_kobj);
+	struct client_obd *cli = &dev->u.cli;
+	int len;
+
+	spin_lock(&cli->cl_loi_list_lock);
+	len = sprintf(buf, "%lu\n", cli->cl_dirty_grant);
+	spin_unlock(&cli->cl_loi_list_lock);
+
+	return len;
+}
+LUSTRE_RO_ATTR(cur_dirty_grant_bytes);
+
 static ssize_t grant_shrink_interval_show(struct kobject *kobj,
 					  struct attribute *attr,
 					  char *buf)
@@ -817,6 +834,7 @@ void lproc_osc_attach_seqstat(struct obd_device *dev)
 	&lustre_attr_cur_dirty_bytes.attr,
 	&lustre_attr_cur_grant_bytes.attr,
 	&lustre_attr_cur_lost_grant_bytes.attr,
+	&lustre_attr_cur_dirty_grant_bytes.attr,
 	&lustre_attr_destroys_in_flight.attr,
 	&lustre_attr_grant_shrink_interval.attr,
 	&lustre_attr_lockless_truncate.attr,
diff --git a/drivers/staging/lustre/lustre/osc/osc_cache.c b/drivers/staging/lustre/lustre/osc/osc_cache.c
index 99de672..8d3f501 100644
--- a/drivers/staging/lustre/lustre/osc/osc_cache.c
+++ b/drivers/staging/lustre/lustre/osc/osc_cache.c
@@ -55,13 +55,16 @@ static int osc_refresh_count(const struct lu_env *env,
 static int osc_io_unplug_async(const struct lu_env *env,
 			       struct client_obd *cli, struct osc_object *osc);
 static void osc_free_grant(struct client_obd *cli, unsigned int nr_pages,
-			   unsigned int lost_grant);
+			   unsigned int lost_grant, unsigned int dirty_grant);
 
 static void osc_extent_tree_dump0(int level, struct osc_object *obj,
 				  const char *func, int line);
 #define osc_extent_tree_dump(lvl, obj) \
 	osc_extent_tree_dump0(lvl, obj, __func__, __LINE__)
 
+static void osc_unreserve_grant(struct client_obd *cli, unsigned int reserved,
+				unsigned int unused);
+
 /** \addtogroup osc
  *  @{
  */
@@ -532,12 +535,13 @@ static void osc_extent_remove(struct osc_extent *ext)
 
 /**
  * This function is used to merge extents to get better performance. It checks
- * if @cur and @victim are contiguous at chunk level.
+ * if @cur and @victim are contiguous at block level.
  */
 static int osc_extent_merge(const struct lu_env *env, struct osc_extent *cur,
 			    struct osc_extent *victim)
 {
 	struct osc_object *obj = cur->oe_obj;
+	struct client_obd *cli = osc_cli(obj);
 	pgoff_t chunk_start;
 	pgoff_t chunk_end;
 	int ppc_bits;
@@ -561,11 +565,20 @@ static int osc_extent_merge(const struct lu_env *env, struct osc_extent *cur,
 	    chunk_end + 1 != victim->oe_start >> ppc_bits)
 		return -ERANGE;
 
+	/*
+	 * overall extent size should not exceed the max supported limit
+	 * reported by the server
+	 */
+	if (cur->oe_end - cur->oe_start + 1 +
+	    victim->oe_end - victim->oe_start + 1 > cli->cl_max_extent_pages)
+		return -ERANGE;
+
 	OSC_EXTENT_DUMP(D_CACHE, victim, "will be merged by %p.\n", cur);
 
 	cur->oe_start = min(cur->oe_start, victim->oe_start);
 	cur->oe_end = max(cur->oe_end, victim->oe_end);
-	cur->oe_grants += victim->oe_grants;
+	/* per-extent tax should be accounted only once for the whole extent */
+	cur->oe_grants += victim->oe_grants - cli->cl_grant_extent_tax;
 	cur->oe_nr_pages += victim->oe_nr_pages;
 	/* only the following bits are needed to merge */
 	cur->oe_urgent |= victim->oe_urgent;
@@ -588,6 +601,7 @@ static int osc_extent_merge(const struct lu_env *env, struct osc_extent *cur,
 void osc_extent_release(const struct lu_env *env, struct osc_extent *ext)
 {
 	struct osc_object *obj = ext->oe_obj;
+	struct client_obd *cli = osc_cli(obj);
 
 	LASSERT(atomic_read(&ext->oe_users) > 0);
 	LASSERT(sanity_check(ext) == 0);
@@ -603,13 +617,19 @@ void osc_extent_release(const struct lu_env *env, struct osc_extent *ext)
 			osc_extent_state_set(ext, OES_TRUNC);
 			ext->oe_trunc_pending = 0;
 		} else {
+			int grant = 0;
+
 			osc_extent_state_set(ext, OES_CACHE);
 			osc_update_pending(obj, OBD_BRW_WRITE,
 					   ext->oe_nr_pages);
 
 			/* try to merge the previous and next extent. */
-			osc_extent_merge(env, ext, prev_extent(ext));
-			osc_extent_merge(env, ext, next_extent(ext));
+			if (!osc_extent_merge(env, ext, prev_extent(ext)))
+				grant += cli->cl_grant_extent_tax;
+			if (!osc_extent_merge(env, ext, next_extent(ext)))
+				grant += cli->cl_grant_extent_tax;
+			if (grant > 0)
+				osc_unreserve_grant(cli, 0, grant);
 
 			if (ext->oe_urgent)
 				list_move_tail(&ext->oe_link,
@@ -617,7 +637,7 @@ void osc_extent_release(const struct lu_env *env, struct osc_extent *ext)
 		}
 		osc_object_unlock(obj);
 
-		osc_io_unplug_async(env, osc_cli(obj), obj);
+		osc_io_unplug_async(env, cli, obj);
 	}
 	osc_extent_put(env, ext);
 }
@@ -690,8 +710,8 @@ static struct osc_extent *osc_extent_find(const struct lu_env *env,
 	}
 
 	/* grants has been allocated by caller */
-	LASSERTF(*grants >= chunksize + cli->cl_extent_tax,
-		 "%u/%u/%u.\n", *grants, chunksize, cli->cl_extent_tax);
+	LASSERTF(*grants >= chunksize + cli->cl_grant_extent_tax,
+		 "%u/%u/%u.\n", *grants, chunksize, cli->cl_grant_extent_tax);
 	LASSERTF((max_end - cur->oe_start) < max_pages, EXTSTR "\n",
 		 EXTPARA(cur));
 
@@ -770,6 +790,13 @@ static struct osc_extent *osc_extent_find(const struct lu_env *env,
 			continue;
 		}
 
+		/* check whether maximum extent size will be hit */
+		if ((ext_chk_end - ext_chk_start + 1) << ppc_bits >
+		    cli->cl_max_extent_pages) {
+			ext = next_extent(ext);
+			continue;
+		}
+
 		/* it's required that an extent must be contiguous at chunk
 		 * level so that we know the whole extent is covered by grant
 		 * (the pages in the extent are NOT required to be contiguous).
@@ -801,7 +828,7 @@ static struct osc_extent *osc_extent_find(const struct lu_env *env,
 			 */
 			if (osc_extent_merge(env, ext, next_extent(ext)) == 0)
 				/* we can save extent tax from next extent */
-				*grants += cli->cl_extent_tax;
+				*grants += cli->cl_grant_extent_tax;
 
 			found = osc_extent_hold(ext);
 		}
@@ -822,7 +849,7 @@ static struct osc_extent *osc_extent_find(const struct lu_env *env,
 	} else if (!conflict) {
 		/* create a new extent */
 		EASSERT(osc_extent_is_overlapped(obj, cur) == 0, cur);
-		cur->oe_grants = chunksize + cli->cl_extent_tax;
+		cur->oe_grants = chunksize + cli->cl_grant_extent_tax;
 		LASSERT(*grants >= cur->oe_grants);
 		*grants -= cur->oe_grants;
 
@@ -908,7 +935,7 @@ int osc_extent_finish(const struct lu_env *env, struct osc_extent *ext,
 		lost_grant = PAGE_SIZE - count;
 	}
 	if (ext->oe_grants > 0)
-		osc_free_grant(cli, nr_pages, lost_grant);
+		osc_free_grant(cli, nr_pages, lost_grant, ext->oe_grants);
 
 	osc_extent_remove(ext);
 	/* put the refcount for RPC */
@@ -1084,7 +1111,7 @@ static int osc_extent_truncate(struct osc_extent *ext, pgoff_t trunc_index,
 	osc_object_unlock(obj);
 
 	if (grants > 0 || nr_pages > 0)
-		osc_free_grant(cli, nr_pages, grants);
+		osc_free_grant(cli, nr_pages, grants, grants);
 
 out:
 	cl_io_fini(env, io);
@@ -1207,9 +1234,16 @@ static int osc_extent_expand(struct osc_extent *ext, pgoff_t index,
 	}
 
 	LASSERT(end_chunk + 1 == chunk);
+
 	/* try to expand this extent to cover @index */
 	end_index = min(ext->oe_max_end, ((chunk + 1) << ppc_bits) - 1);
 
+	/* don't go over the maximum extent size reported by server */
+	if (end_index - ext->oe_start + 1 > cli->cl_max_extent_pages) {
+		rc = -ERANGE;
+		goto out;
+	}
+
 	next = next_extent(ext);
 	if (next && next->oe_start <= end_index) {
 		/* complex mode - overlapped with the next extent,
@@ -1374,13 +1408,15 @@ static int osc_completion(const struct lu_env *env, struct osc_async_page *oap,
 
 #define OSC_DUMP_GRANT(lvl, cli, fmt, args...) do {			      \
 	struct client_obd *__tmp = (cli);				      \
-	CDEBUG(lvl, "%s: grant { dirty: %lu/%lu dirty_pages: %ld/%lu "	      \
-	       "dropped: %ld avail: %ld, reserved: %ld, flight: %d }"	      \
-	       "lru {in list: %ld, left: %ld, waiters: %d }" fmt "\n",	      \
+	CDEBUG(lvl, "%s: grant { dirty: %ld/%ld dirty_pages: %ld/%lu "	\
+	       "dropped: %ld avail: %ld, dirty_grant: %ld, "		\
+	       "reserved: %ld, flight: %d } lru {in list: %ld, "	\
+	       "left: %ld, waiters: %d }" fmt "\n",			\
 	       cli_name(__tmp),						      \
 	       __tmp->cl_dirty_pages, __tmp->cl_dirty_max_pages,	      \
 	       atomic_long_read(&obd_dirty_pages), obd_max_dirty_pages,	      \
 	       __tmp->cl_lost_grant, __tmp->cl_avail_grant,		      \
+	       __tmp->cl_dirty_grant,					\
 	       __tmp->cl_reserved_grant, __tmp->cl_w_in_flight,		      \
 	       atomic_long_read(&__tmp->cl_lru_in_list),		      \
 	       atomic_long_read(&__tmp->cl_lru_busy),			      \
@@ -1451,8 +1487,10 @@ static void __osc_unreserve_grant(struct client_obd *cli,
 	if (unused > reserved) {
 		cli->cl_avail_grant += reserved;
 		cli->cl_lost_grant  += unused - reserved;
+		cli->cl_dirty_grant -= unused - reserved;
 	} else {
 		cli->cl_avail_grant += unused;
+		cli->cl_dirty_grant += reserved - unused;
 	}
 }
 
@@ -1480,14 +1518,17 @@ static void osc_unreserve_grant(struct client_obd *cli,
  *    See filter_grant_check() for details.
  */
 static void osc_free_grant(struct client_obd *cli, unsigned int nr_pages,
-			   unsigned int lost_grant)
+			   unsigned int lost_grant, unsigned int dirty_grant)
 {
-	unsigned long grant = (1 << cli->cl_chunkbits) + cli->cl_extent_tax;
+	unsigned long grant;
+
+	grant = (1 << cli->cl_chunkbits) + cli->cl_grant_extent_tax;
 
 	spin_lock(&cli->cl_loi_list_lock);
 	atomic_long_sub(nr_pages, &obd_dirty_pages);
 	cli->cl_dirty_pages -= nr_pages;
 	cli->cl_lost_grant += lost_grant;
+	cli->cl_dirty_grant -= dirty_grant;
 	if (cli->cl_avail_grant < grant && cli->cl_lost_grant >= grant) {
 		/* borrow some grant from truncate to avoid the case that
 		 * truncate uses up all avail grant
@@ -1497,9 +1538,10 @@ static void osc_free_grant(struct client_obd *cli, unsigned int nr_pages,
 	}
 	osc_wake_cache_waiters(cli);
 	spin_unlock(&cli->cl_loi_list_lock);
-	CDEBUG(D_CACHE, "lost %u grant: %lu avail: %lu dirty: %lu\n",
+	CDEBUG(D_CACHE, "lost %u grant: %lu avail: %lu dirty: %lu/%lu\n",
 	       lost_grant, cli->cl_lost_grant,
-	       cli->cl_avail_grant, cli->cl_dirty_pages << PAGE_SHIFT);
+	       cli->cl_avail_grant, cli->cl_dirty_pages << PAGE_SHIFT,
+	       cli->cl_dirty_grant);
 }
 
 /**
@@ -2437,7 +2479,7 @@ int osc_queue_async_io(const struct lu_env *env, struct cl_io *io,
 		/* one chunk plus extent overhead must be enough to write this
 		 * page
 		 */
-		grants = (1 << cli->cl_chunkbits) + cli->cl_extent_tax;
+		grants = (1 << cli->cl_chunkbits) + cli->cl_grant_extent_tax;
 		if (ext->oe_end >= index)
 			grants = 0;
 
@@ -2474,7 +2516,7 @@ int osc_queue_async_io(const struct lu_env *env, struct cl_io *io,
 	}
 
 	if (!ext) {
-		tmp = (1 << cli->cl_chunkbits) + cli->cl_extent_tax;
+		tmp = (1 << cli->cl_chunkbits) + cli->cl_grant_extent_tax;
 
 		/* try to find new extent to cover this page */
 		LASSERT(!oio->oi_active);
diff --git a/drivers/staging/lustre/lustre/osc/osc_request.c b/drivers/staging/lustre/lustre/osc/osc_request.c
index bcb9b91..ce073b6 100644
--- a/drivers/staging/lustre/lustre/osc/osc_request.c
+++ b/drivers/staging/lustre/lustre/osc/osc_request.c
@@ -576,7 +576,10 @@ static void osc_announce_cached(struct client_obd *cli, struct obdo *oa,
 
 	oa->o_valid |= bits;
 	spin_lock(&cli->cl_loi_list_lock);
-	oa->o_dirty = cli->cl_dirty_pages << PAGE_SHIFT;
+	if (OCD_HAS_FLAG(&cli->cl_import->imp_connect_data, GRANT_PARAM))
+		oa->o_dirty = cli->cl_dirty_grant;
+	else
+		oa->o_dirty = cli->cl_dirty_pages << PAGE_SHIFT;
 	if (unlikely(cli->cl_dirty_pages - cli->cl_dirty_transit >
 		     cli->cl_dirty_max_pages)) {
 		CERROR("dirty %lu - %lu > dirty_max %lu\n",
@@ -601,12 +604,24 @@ static void osc_announce_cached(struct client_obd *cli, struct obdo *oa,
 		       cli->cl_dirty_pages, cli->cl_dirty_max_pages);
 		oa->o_undirty = 0;
 	} else {
-		unsigned long max_in_flight;
-
-		max_in_flight = (cli->cl_max_pages_per_rpc << PAGE_SHIFT) *
-				(cli->cl_max_rpcs_in_flight + 1);
-		oa->o_undirty = max(cli->cl_dirty_max_pages << PAGE_SHIFT,
-				    max_in_flight);
+		unsigned long nrpages;
+
+		nrpages = cli->cl_max_pages_per_rpc;
+		nrpages *= cli->cl_max_rpcs_in_flight + 1;
+		nrpages = max(nrpages, cli->cl_dirty_max_pages);
+		oa->o_undirty = nrpages << PAGE_SHIFT;
+		if (OCD_HAS_FLAG(&cli->cl_import->imp_connect_data,
+				 GRANT_PARAM)) {
+			int nrextents;
+
+			/*
+			 * take extent tax into account when asking for more
+			 * grant space
+			 */
+			nrextents = (nrpages + cli->cl_max_extent_pages - 1)  /
+				     cli->cl_max_extent_pages;
+			oa->o_undirty += nrextents * cli->cl_grant_extent_tax;
+		}
 	}
 	oa->o_grant = cli->cl_avail_grant + cli->cl_reserved_grant;
 	oa->o_dropped = cli->cl_lost_grant;
@@ -811,20 +826,40 @@ static void osc_init_grant(struct client_obd *cli, struct obd_connect_data *ocd)
 	 * race is tolerable here: if we're evicted, but imp_state already
 	 * left EVICTED state, then cl_dirty_pages must be 0 already.
 	 */
+	cli->cl_avail_grant = ocd->ocd_grant;
 	spin_lock(&cli->cl_loi_list_lock);
-	if (cli->cl_import->imp_state == LUSTRE_IMP_EVICTED)
-		cli->cl_avail_grant = ocd->ocd_grant;
-	else
-		cli->cl_avail_grant = ocd->ocd_grant -
-				      (cli->cl_dirty_pages << PAGE_SHIFT);
-
-	/* determine the appropriate chunk size used by osc_extent. */
-	cli->cl_chunkbits = max_t(int, PAGE_SHIFT, ocd->ocd_blocksize);
+	if (cli->cl_import->imp_state != LUSTRE_IMP_EVICTED) {
+		cli->cl_avail_grant -= cli->cl_reserved_grant;
+		if (OCD_HAS_FLAG(ocd, GRANT_PARAM))
+			cli->cl_avail_grant -= cli->cl_dirty_grant;
+		else
+			cli->cl_avail_grant -= cli->cl_dirty_pages << PAGE_SHIFT;
+	}
+
+	if (OCD_HAS_FLAG(ocd, GRANT_PARAM)) {
+		u64 size;
+
+		/* overhead for each extent insertion */
+		cli->cl_grant_extent_tax = ocd->ocd_grant_tax_kb << 10;
+		/* determine the appropriate chunk size used by osc_extent. */
+		cli->cl_chunkbits = max_t(int, PAGE_SHIFT,
+					  ocd->ocd_grant_blkbits);
+		/* determine maximum extent size, in #pages */
+		size = (u64)ocd->ocd_grant_max_blks << ocd->ocd_grant_blkbits;
+		cli->cl_max_extent_pages = size >> PAGE_SHIFT;
+		if (!cli->cl_max_extent_pages)
+			cli->cl_max_extent_pages = 1;
+	} else {
+		cli->cl_grant_extent_tax = 0;
+		cli->cl_chunkbits = PAGE_SHIFT;
+		cli->cl_max_extent_pages = DT_MAX_BRW_PAGES;
+	}
 	spin_unlock(&cli->cl_loi_list_lock);
 
-	CDEBUG(D_CACHE, "%s, setting cl_avail_grant: %ld cl_lost_grant: %ld chunk bits: %d\n",
+	CDEBUG(D_CACHE,
+	       "%s, setting cl_avail_grant: %ld cl_lost_grant: %ld chunk bits: %d cl_max_extent_pages: %d\n",
 	       cli_name(cli), cli->cl_avail_grant, cli->cl_lost_grant,
-	       cli->cl_chunkbits);
+	       cli->cl_chunkbits, cli->cl_max_extent_pages);
 
 	if (ocd->ocd_connect_flags & OBD_CONNECT_GRANT_SHRINK &&
 	    list_empty(&cli->cl_grant_shrink_list))
@@ -1661,6 +1696,7 @@ int osc_build_rpc(const struct lu_env *env, struct client_obd *cli,
 	int page_count = 0;
 	bool soft_sync = false;
 	bool interrupted = false;
+	int grant = 0;
 	int i;
 	int rc;
 	struct ost_body *body;
@@ -1672,6 +1708,7 @@ int osc_build_rpc(const struct lu_env *env, struct client_obd *cli,
 	list_for_each_entry(ext, ext_list, oe_link) {
 		LASSERT(ext->oe_state == OES_RPC);
 		mem_tight |= ext->oe_memalloc;
+		grant += ext->oe_grants;
 		page_count += ext->oe_nr_pages;
 		if (!obj)
 			obj = ext->oe_obj;
@@ -1732,6 +1769,9 @@ int osc_build_rpc(const struct lu_env *env, struct client_obd *cli,
 	crattr->cra_oa = oa;
 	cl_req_attr_set(env, osc2cl(obj), crattr);
 
+	if (cmd == OBD_BRW_WRITE)
+		oa->o_grant_used = grant;
+
 	sort_brw_pages(pga, page_count);
 	rc = osc_brw_prep_request(cmd, cli, oa, page_count, pga, &req, 1, 0);
 	if (rc != 0) {
@@ -2435,12 +2475,15 @@ static int osc_reconnect(const struct lu_env *env,
 	struct client_obd *cli = &obd->u.cli;
 
 	if (data && (data->ocd_connect_flags & OBD_CONNECT_GRANT)) {
-		long lost_grant;
+		long lost_grant, grant;
 
 		spin_lock(&cli->cl_loi_list_lock);
-		data->ocd_grant = (cli->cl_avail_grant +
-				   (cli->cl_dirty_pages << PAGE_SHIFT)) ?:
-				   2 * cli_brw_size(obd);
+		grant = cli->cl_avail_grant + cli->cl_reserved_grant;
+		if (data->ocd_connect_flags & OBD_CONNECT_GRANT_PARAM)
+			grant += cli->cl_dirty_grant;
+		else
+			grant += cli->cl_dirty_pages << PAGE_SHIFT;
+		data->ocd_grant = grant ? : 2 * cli_brw_size(obd);
 		lost_grant = cli->cl_lost_grant;
 		cli->cl_lost_grant = 0;
 		spin_unlock(&cli->cl_loi_list_lock);
diff --git a/drivers/staging/lustre/lustre/ptlrpc/pack_generic.c b/drivers/staging/lustre/lustre/ptlrpc/pack_generic.c
index 6ac9bb5..0337b33 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/pack_generic.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/pack_generic.c
@@ -1551,8 +1551,8 @@ void lustre_swab_connect(struct obd_connect_data *ocd)
 	/* ocd_blocksize and ocd_inodespace don't need to be swabbed because
 	 * they are 8-byte values
 	 */
-	__swab16s(&ocd->ocd_grant_extent);
-	__swab32s(&ocd->ocd_unused);
+	__swab16s(&ocd->ocd_grant_tax_kb);
+	__swab32s(&ocd->ocd_grant_max_blks);
 	__swab64s(&ocd->ocd_transno);
 	__swab32s(&ocd->ocd_group);
 	__swab32s(&ocd->ocd_cksum_types);
diff --git a/drivers/staging/lustre/lustre/ptlrpc/wiretest.c b/drivers/staging/lustre/lustre/ptlrpc/wiretest.c
index f9394c3..2b3608c 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/wiretest.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/wiretest.c
@@ -884,22 +884,22 @@ void lustre_assert_wire_constants(void)
 		 (long long)(int)offsetof(struct obd_connect_data, ocd_ibits_known));
 	LASSERTF((int)sizeof(((struct obd_connect_data *)0)->ocd_ibits_known) == 8, "found %lld\n",
 		 (long long)(int)sizeof(((struct obd_connect_data *)0)->ocd_ibits_known));
-	LASSERTF((int)offsetof(struct obd_connect_data, ocd_blocksize) == 32, "found %lld\n",
-		 (long long)(int)offsetof(struct obd_connect_data, ocd_blocksize));
-	LASSERTF((int)sizeof(((struct obd_connect_data *)0)->ocd_blocksize) == 1, "found %lld\n",
-		 (long long)(int)sizeof(((struct obd_connect_data *)0)->ocd_blocksize));
-	LASSERTF((int)offsetof(struct obd_connect_data, ocd_inodespace) == 33, "found %lld\n",
-		 (long long)(int)offsetof(struct obd_connect_data, ocd_inodespace));
-	LASSERTF((int)sizeof(((struct obd_connect_data *)0)->ocd_inodespace) == 1, "found %lld\n",
-		 (long long)(int)sizeof(((struct obd_connect_data *)0)->ocd_inodespace));
-	LASSERTF((int)offsetof(struct obd_connect_data, ocd_grant_extent) == 34, "found %lld\n",
-		 (long long)(int)offsetof(struct obd_connect_data, ocd_grant_extent));
-	LASSERTF((int)sizeof(((struct obd_connect_data *)0)->ocd_grant_extent) == 2, "found %lld\n",
-		 (long long)(int)sizeof(((struct obd_connect_data *)0)->ocd_grant_extent));
-	LASSERTF((int)offsetof(struct obd_connect_data, ocd_unused) == 36, "found %lld\n",
-		 (long long)(int)offsetof(struct obd_connect_data, ocd_unused));
-	LASSERTF((int)sizeof(((struct obd_connect_data *)0)->ocd_unused) == 4, "found %lld\n",
-		 (long long)(int)sizeof(((struct obd_connect_data *)0)->ocd_unused));
+	LASSERTF((int)offsetof(struct obd_connect_data, ocd_grant_blkbits) == 32, "found %lld\n",
+		 (long long)(int)offsetof(struct obd_connect_data, ocd_grant_blkbits));
+	LASSERTF((int)sizeof(((struct obd_connect_data *)0)->ocd_grant_blkbits) == 1, "found %lld\n",
+		 (long long)(int)sizeof(((struct obd_connect_data *)0)->ocd_grant_blkbits));
+	LASSERTF((int)offsetof(struct obd_connect_data, ocd_grant_inobits) == 33, "found %lld\n",
+		 (long long)(int)offsetof(struct obd_connect_data, ocd_grant_inobits));
+	LASSERTF((int)sizeof(((struct obd_connect_data *)0)->ocd_grant_inobits) == 1, "found %lld\n",
+		 (long long)(int)sizeof(((struct obd_connect_data *)0)->ocd_grant_inobits));
+	LASSERTF((int)offsetof(struct obd_connect_data, ocd_grant_tax_kb) == 34, "found %lld\n",
+		 (long long)(int)offsetof(struct obd_connect_data, ocd_grant_tax_kb));
+	LASSERTF((int)sizeof(((struct obd_connect_data *)0)->ocd_grant_tax_kb) == 2, "found %lld\n",
+		 (long long)(int)sizeof(((struct obd_connect_data *)0)->ocd_grant_tax_kb));
+	LASSERTF((int)offsetof(struct obd_connect_data, ocd_grant_max_blks) == 36, "found %lld\n",
+		 (long long)(int)offsetof(struct obd_connect_data, ocd_grant_max_blks));
+	LASSERTF((int)sizeof(((struct obd_connect_data *)0)->ocd_grant_max_blks) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct obd_connect_data *)0)->ocd_grant_max_blks));
 	LASSERTF((int)offsetof(struct obd_connect_data, ocd_transno) == 40, "found %lld\n",
 		 (long long)(int)offsetof(struct obd_connect_data, ocd_transno));
 	LASSERTF((int)sizeof(((struct obd_connect_data *)0)->ocd_transno) == 8, "found %lld\n",
-- 
1.8.3.1


* [lustre-devel] [PATCH 02/18] lustre: ladvise: Add feature of giving file access advices
  2018-07-02 23:24 [lustre-devel] [PATCH 00/18] lustre: missing updates from lustre 2.9 James Simmons
  2018-07-02 23:24 ` [lustre-devel] [PATCH 01/18] lustre: grant: add support for OBD_CONNECT_GRANT_PARAM James Simmons
@ 2018-07-02 23:24 ` James Simmons
  2018-07-03  3:32   ` NeilBrown
  2018-07-02 23:24 ` [lustre-devel] [PATCH 03/18] lustre: fileset: add fileset mount support James Simmons
                   ` (16 subsequent siblings)
  18 siblings, 1 reply; 41+ messages in thread
From: James Simmons @ 2018-07-02 23:24 UTC (permalink / raw)
  To: lustre-devel

From: Li Xi <lixi@ddn.com>

The fadvise() system call provided by the Linux kernel lets
applications give advice or hints about how a file will be
accessed. However, it is only a client-side mechanism, which is
not enough for distributed file systems like Lustre: to tune
system-wide cache or read-ahead policies, the servers need to
understand the advice too.

This patch adds a new feature named ladvise, which provides new
APIs and utilities for giving advice about the access pattern of
Lustre files in order to improve performance. It is similar to
the Linux fadvise() system call, except that it forwards the
advice directly from the Lustre client to the server. The
server-side code will apply appropriate read-ahead and caching
techniques to the corresponding files.

A typical workload for ladvise is one in which many different
clients do small random reads of a file, so prefetching pages
into the OSS cache with big linear reads before the random IO is
a net benefit. Fetching all that data into each client cache
with fadvise() may not be, because much more data is sent to
the clients.

Signed-off-by: Li Xi <lixi@ddn.com>
WC-bug-id: https://jira.whamcloud.com/browse/LU-4931
Reviewed-on: http://review.whamcloud.com/10029
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@gmail.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 .../lustre/include/uapi/linux/lustre/lustre_idl.h  |   3 +-
 .../lustre/include/uapi/linux/lustre/lustre_user.h |  32 +++++++
 drivers/staging/lustre/lustre/include/cl_object.h  |  13 +++
 .../lustre/lustre/include/lustre_req_layout.h      |   4 +
 .../staging/lustre/lustre/include/lustre_swab.h    |   2 +
 drivers/staging/lustre/lustre/llite/file.c         | 106 +++++++++++++++++++++
 drivers/staging/lustre/lustre/llite/vvp_io.c       |   3 +
 drivers/staging/lustre/lustre/lov/lov_io.c         |  28 ++++++
 drivers/staging/lustre/lustre/obdclass/cl_io.c     |   2 +
 .../staging/lustre/lustre/osc/osc_cl_internal.h    |   1 +
 drivers/staging/lustre/lustre/osc/osc_dev.c        |   1 +
 drivers/staging/lustre/lustre/osc/osc_internal.h   |   4 +
 drivers/staging/lustre/lustre/osc/osc_io.c         |  79 +++++++++++++++
 drivers/staging/lustre/lustre/osc/osc_request.c    |  94 ++++++++++++++++++
 drivers/staging/lustre/lustre/ptlrpc/layout.c      |  24 +++++
 .../staging/lustre/lustre/ptlrpc/lproc_ptlrpc.c    |   1 +
 .../staging/lustre/lustre/ptlrpc/pack_generic.c    |  19 ++++
 drivers/staging/lustre/lustre/ptlrpc/wiretest.c    |  54 ++++++++++-
 18 files changed, 468 insertions(+), 2 deletions(-)
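
For readers who want to see the shape of the request that the new
LL_IOC_LADVISE handler expects, here is a minimal userspace sketch. It
mirrors the struct layouts and the header checks from this patch, but
it is only an illustration: the helper names (ladvise_buf_size,
ladvise_hdr_check, ladvise_hdr_alloc) are hypothetical and not part of
Lustre, the kernel __u32/__u64 types are replaced with <stdint.h>
types, and the zero-length array is written as a C99 flexible array
member. No ioctl is issued, since that needs a mounted Lustre client,
and the advice records are left zeroed because this patch defines only
LU_LADVISE_INVALID.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Constants copied from the patch (lustre_user.h). */
#define LADVISE_MAGIC	0x1ADF1CE0
#define LAH_COUNT_MAX	1024
#define LF_ASYNC	0x00000001

/* Userspace mirror of struct lu_ladvise: one advice record, 32 bytes. */
struct lu_ladvise {
	uint64_t lla_advice;
	uint64_t lla_start;
	uint64_t lla_end;
	uint64_t lla_padding;
};

/* Userspace mirror of struct ladvise_hdr: 32-byte header, then records. */
struct ladvise_hdr {
	uint32_t lah_magic;	/* LADVISE_MAGIC */
	uint32_t lah_count;	/* number of advice records */
	uint64_t lah_flags;	/* from enum ladvise_flag */
	uint64_t lah_padding1;	/* unused */
	uint64_t lah_padding2;	/* unused */
	struct lu_ladvise lah_advise[];
};

/*
 * Hypothetical helper: bytes needed for the header plus `count` records.
 * Same value as offsetof(struct ladvise_hdr, lah_advise[count]) in the patch.
 */
static size_t ladvise_buf_size(uint32_t count)
{
	return sizeof(struct ladvise_hdr) +
	       (size_t)count * sizeof(struct lu_ladvise);
}

/*
 * Hypothetical helper: repeats the header checks made by the
 * LL_IOC_LADVISE case in ll_file_ioctl(), with the same error codes.
 */
static int ladvise_hdr_check(const struct ladvise_hdr *hdr)
{
	if (hdr->lah_magic != LADVISE_MAGIC || hdr->lah_count < 1)
		return -EINVAL;
	if (hdr->lah_count >= LAH_COUNT_MAX)
		return -EFBIG;
	return 0;
}

/* Hypothetical helper: allocate a zeroed request with the header filled in. */
static struct ladvise_hdr *ladvise_hdr_alloc(uint32_t count, uint64_t flags)
{
	struct ladvise_hdr *hdr = calloc(1, ladvise_buf_size(count));

	if (!hdr)
		return NULL;
	hdr->lah_magic = LADVISE_MAGIC;
	hdr->lah_count = count;
	hdr->lah_flags = flags;
	return hdr;
}
```

A caller would fill lah_advise[i].lla_start, lla_end and lla_advice for
each record and then pass the buffer to ioctl(fd, LL_IOC_LADVISE, hdr)
on an open Lustre file. Checking the header in userspace first is
optional, but it shows which values the kernel side rejects.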

diff --git a/drivers/staging/lustre/include/uapi/linux/lustre/lustre_idl.h b/drivers/staging/lustre/include/uapi/linux/lustre/lustre_idl.h
index 3d77ed6..5fab107 100644
--- a/drivers/staging/lustre/include/uapi/linux/lustre/lustre_idl.h
+++ b/drivers/staging/lustre/include/uapi/linux/lustre/lustre_idl.h
@@ -812,7 +812,8 @@ enum ost_cmd {
 	OST_QUOTACHECK = 18, /* not used since 2.4 */
 	OST_QUOTACTL   = 19,
 	OST_QUOTA_ADJUST_QUNIT = 20, /* not used since 2.4 */
-	OST_LAST_OPC
+	OST_LADVISE		= 21,
+	OST_LAST_OPC /* must be < 33 to avoid MDS_GETATTR */
 };
 #define OST_FIRST_OPC  OST_REPLY
 
diff --git a/drivers/staging/lustre/include/uapi/linux/lustre/lustre_user.h b/drivers/staging/lustre/include/uapi/linux/lustre/lustre_user.h
index 69387f3..fc33a43 100644
--- a/drivers/staging/lustre/include/uapi/linux/lustre/lustre_user.h
+++ b/drivers/staging/lustre/include/uapi/linux/lustre/lustre_user.h
@@ -276,6 +276,7 @@ struct ost_id {
 #define LL_IOC_MIGRATE			_IOR('f', 247, int)
 #define LL_IOC_FID2MDTIDX		_IOWR('f', 248, struct lu_fid)
 #define LL_IOC_GETPARENT		_IOWR('f', 249, struct getparent)
+#define LL_IOC_LADVISE			_IOR('f', 250, struct lu_ladvise)
 
 /* Lease types for use as arg and return of LL_IOC_{GET,SET}_LEASE ioctl. */
 enum ll_lease_type {
@@ -1322,6 +1323,37 @@ struct hsm_copy {
 	struct hsm_action_item	hc_hai;
 };
 
+enum lu_ladvise_type {
+	LU_LADVISE_INVALID	= 0,
+};
+
+#define LU_LADVISE_NAMES { }
+
+struct lu_ladvise {
+	__u64			lla_advice;
+	__u64			lla_start;
+	__u64			lla_end;
+	__u64			lla_padding;
+};
+
+enum ladvise_flag {
+	LF_ASYNC	= 0x00000001,
+};
+
+#define LADVISE_MAGIC 0x1ADF1CE0
+#define LF_MASK LF_ASYNC
+
+struct ladvise_hdr {
+	__u32			lah_magic;	/* LADVISE_MAGIC */
+	__u32			lah_count;	/* number of advices */
+	__u64			lah_flags;	/* from enum ladvise_flag */
+	__u64			lah_padding1;	/* unused */
+	__u64			lah_padding2;	/* unused */
+	struct lu_ladvise	lah_advise[0];
+};
+
+#define LAH_COUNT_MAX		1024
+
 /** @} lustreuser */
 
 #endif /* _LUSTRE_USER_H */
diff --git a/drivers/staging/lustre/lustre/include/cl_object.h b/drivers/staging/lustre/lustre/include/cl_object.h
index 1491beb..58af22e 100644
--- a/drivers/staging/lustre/lustre/include/cl_object.h
+++ b/drivers/staging/lustre/lustre/include/cl_object.h
@@ -1394,6 +1394,11 @@ enum cl_io_type {
 	 * cl_io_loop() is never called for it.
 	 */
 	CIT_MISC,
+	/**
+	 * ladvise handling
+	 * To give advice about access of a file
+	 */
+	CIT_LADVISE,
 	CIT_OP_NR
 };
 
@@ -1804,6 +1809,14 @@ struct cl_io {
 			/* how many pages were written/discarded */
 			unsigned int       fi_nr_written;
 		} ci_fsync;
+		struct cl_ladvise_io {
+			u64			li_start;
+			u64			li_end;
+			/** file system level fid */
+			struct lu_fid	       *li_fid;
+			enum lu_ladvise_type	li_advice;
+			u64			li_flags;
+		} ci_ladvise;
 	} u;
 	struct cl_2queue     ci_queue;
 	size_t	       ci_nob;
diff --git a/drivers/staging/lustre/lustre/include/lustre_req_layout.h b/drivers/staging/lustre/lustre/include/lustre_req_layout.h
index 213d0a0..db6d8ed 100644
--- a/drivers/staging/lustre/lustre/include/lustre_req_layout.h
+++ b/drivers/staging/lustre/lustre/include/lustre_req_layout.h
@@ -190,6 +190,7 @@ void req_capsule_shrink(struct req_capsule *pill,
 extern struct req_format RQF_OST_GET_INFO_LAST_FID;
 extern struct req_format RQF_OST_SET_INFO_LAST_FID;
 extern struct req_format RQF_OST_GET_INFO_FIEMAP;
+extern struct req_format RQF_OST_LADVISE;
 
 /* LDLM req_format */
 extern struct req_format RQF_LDLM_ENQUEUE;
@@ -299,6 +300,9 @@ void req_capsule_shrink(struct req_capsule *pill,
 extern struct req_msg_field RMF_MGS_CONFIG_BODY;
 extern struct req_msg_field RMF_MGS_CONFIG_RES;
 
+extern struct req_msg_field RMF_OST_LADVISE_HDR;
+extern struct req_msg_field RMF_OST_LADVISE;
+
 /* generic uint32 */
 extern struct req_msg_field RMF_U32;
 
diff --git a/drivers/staging/lustre/lustre/include/lustre_swab.h b/drivers/staging/lustre/lustre/include/lustre_swab.h
index 9d786bb..e09a3dc 100644
--- a/drivers/staging/lustre/lustre/include/lustre_swab.h
+++ b/drivers/staging/lustre/lustre/include/lustre_swab.h
@@ -99,6 +99,8 @@ void lustre_swab_lov_user_md_objects(struct lov_user_ost_data *lod,
 void lustre_swab_swap_layouts(struct mdc_swap_layouts *msl);
 void lustre_swab_close_data(struct close_data *data);
 void lustre_swab_lmv_user_md(struct lmv_user_md *lum);
+void lustre_swab_ladvise(struct lu_ladvise *ladvise);
+void lustre_swab_ladvise_hdr(struct ladvise_hdr *ladvise_hdr);
 
 /* Functions for dumping PTLRPC fields */
 void dump_rniobuf(struct niobuf_remote *rnb);
diff --git a/drivers/staging/lustre/lustre/llite/file.c b/drivers/staging/lustre/lustre/llite/file.c
index 59b5fbc..44bec1d 100644
--- a/drivers/staging/lustre/lustre/llite/file.c
+++ b/drivers/staging/lustre/lustre/llite/file.c
@@ -1926,6 +1926,54 @@ static inline long ll_lease_type_from_fmode(fmode_t fmode)
 	       ((fmode & FMODE_WRITE) ? LL_LEASE_WRLCK : 0);
 }
 
+/*
+ * Give file access advice
+ *
+ * The ladvise interface is similar to the Linux fadvise() system call, except
+ * that it forwards the advice directly from the Lustre client to the server.
+ * The server-side code will apply appropriate read-ahead and caching
+ * techniques to the corresponding files.
+ *
+ * A typical workload for ladvise is one in which many different clients do
+ * small random reads of a file, so prefetching pages into the OSS cache
+ * with big linear reads before the random IO is a net benefit. Fetching
+ * all that data into each client cache with fadvise() may not be, because
+ * much more data is sent to the clients.
+ */
+static int ll_ladvise(struct inode *inode, struct file *file, __u64 flags,
+		      struct lu_ladvise *ladvise)
+{
+	struct cl_ladvise_io *lio;
+	struct lu_env *env;
+	struct cl_io *io;
+	u16 refcheck;
+	int rc;
+
+	env = cl_env_get(&refcheck);
+	if (IS_ERR(env))
+		return PTR_ERR(env);
+
+	io = vvp_env_thread_io(env);
+	io->ci_obj = ll_i2info(inode)->lli_clob;
+
+	/* initialize parameters for ladvise */
+	lio = &io->u.ci_ladvise;
+	lio->li_start = ladvise->lla_start;
+	lio->li_end = ladvise->lla_end;
+	lio->li_fid = ll_inode2fid(inode);
+	lio->li_advice = ladvise->lla_advice;
+	lio->li_flags = flags;
+
+	if (!cl_io_init(env, io, CIT_LADVISE, io->ci_obj))
+		rc = cl_io_loop(env, io);
+	else
+		rc = io->ci_result;
+
+	cl_io_fini(env, io);
+	cl_env_put(env, &refcheck);
+	return rc;
+}
+
 static long
 ll_file_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 {
@@ -2248,6 +2296,64 @@ static inline long ll_lease_type_from_fmode(fmode_t fmode)
 		kfree(hui);
 		return rc;
 	}
+	case LL_IOC_LADVISE: {
+		struct ladvise_hdr *ladvise_hdr;
+		int alloc_size = sizeof(*ladvise_hdr);
+		int num_advise;
+		int i;
+
+		rc = 0;
+		ladvise_hdr = kzalloc(alloc_size, GFP_NOFS);
+		if (!ladvise_hdr)
+			return -ENOMEM;
+
+		if (copy_from_user(ladvise_hdr,
+				   (const struct ladvise_hdr __user *)arg,
+				   alloc_size)) {
+			rc = -EFAULT;
+			goto out_ladvise;
+		}
+
+		if (ladvise_hdr->lah_magic != LADVISE_MAGIC ||
+		    ladvise_hdr->lah_count < 1) {
+			rc = -EINVAL;
+			goto out_ladvise;
+		}
+
+		num_advise = ladvise_hdr->lah_count;
+		if (num_advise >= LAH_COUNT_MAX) {
+			rc = -EFBIG;
+			goto out_ladvise;
+		}
+
+		kfree(ladvise_hdr);
+		alloc_size = offsetof(typeof(*ladvise_hdr),
+				      lah_advise[num_advise]);
+		ladvise_hdr = kzalloc(alloc_size, GFP_NOFS);
+		if (!ladvise_hdr)
+			return -ENOMEM;
+
+		/*
+		 * TODO: submit multiple advices to one server in a single RPC
+		 */
+		if (copy_from_user(ladvise_hdr,
+				   (const struct ladvise_hdr __user *)arg,
+				   alloc_size)) {
+			rc = -EFAULT;
+			goto out_ladvise;
+		}
+
+		for (i = 0; i < num_advise; i++) {
+			rc = ll_ladvise(inode, file, ladvise_hdr->lah_flags,
+					&ladvise_hdr->lah_advise[i]);
+			if (rc)
+				break;
+		}
+
+out_ladvise:
+		kfree(ladvise_hdr);
+		return rc;
+	}
 	default: {
 		int err;
 
diff --git a/drivers/staging/lustre/lustre/llite/vvp_io.c b/drivers/staging/lustre/lustre/llite/vvp_io.c
index df47ed9..70d2387 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_io.c
+++ b/drivers/staging/lustre/lustre/llite/vvp_io.c
@@ -1302,6 +1302,9 @@ static int vvp_io_read_ahead(const struct lu_env *env,
 		},
 		[CIT_MISC] = {
 			.cio_fini   = vvp_io_fini
+		},
+		[CIT_LADVISE] = {
+			.cio_fini	= vvp_io_fini
 		}
 	},
 	.cio_read_ahead	= vvp_io_read_ahead,
diff --git a/drivers/staging/lustre/lustre/lov/lov_io.c b/drivers/staging/lustre/lustre/lov/lov_io.c
index 5098284..6537ba3 100644
--- a/drivers/staging/lustre/lustre/lov/lov_io.c
+++ b/drivers/staging/lustre/lustre/lov/lov_io.c
@@ -123,6 +123,14 @@ static void lov_io_sub_inherit(struct cl_io *io, struct lov_io *lio,
 		}
 		break;
 	}
+	case CIT_LADVISE: {
+		io->u.ci_ladvise.li_start = start;
+		io->u.ci_ladvise.li_end = end;
+		io->u.ci_ladvise.li_fid = parent->u.ci_ladvise.li_fid;
+		io->u.ci_ladvise.li_advice = parent->u.ci_ladvise.li_advice;
+		io->u.ci_ladvise.li_flags = parent->u.ci_ladvise.li_flags;
+		break;
+	}
 	default:
 		break;
 	}
@@ -315,6 +323,12 @@ static int lov_io_slice_init(struct lov_io *lio, struct lov_object *obj,
 		break;
 	}
 
+	case CIT_LADVISE: {
+		lio->lis_pos = io->u.ci_ladvise.li_start;
+		lio->lis_endpos = io->u.ci_ladvise.li_end;
+		break;
+	}
+
 	case CIT_MISC:
 		lio->lis_pos = 0;
 		lio->lis_endpos = OBD_OBJECT_EOF;
@@ -837,6 +851,15 @@ static void lov_io_fsync_end(const struct lu_env *env,
 			.cio_start     = lov_io_start,
 			.cio_end       = lov_io_fsync_end
 		},
+		[CIT_LADVISE] = {
+			.cio_fini	= lov_io_fini,
+			.cio_iter_init	= lov_io_iter_init,
+			.cio_iter_fini	= lov_io_iter_fini,
+			.cio_lock	= lov_io_lock,
+			.cio_unlock	= lov_io_unlock,
+			.cio_start	= lov_io_start,
+			.cio_end	= lov_io_end
+		},
 		[CIT_MISC] = {
 			.cio_fini   = lov_io_fini
 		}
@@ -908,6 +931,9 @@ static void lov_empty_impossible(const struct lu_env *env,
 		[CIT_FSYNC] = {
 			.cio_fini   = lov_empty_io_fini
 		},
+		[CIT_LADVISE] = {
+			.cio_fini	= lov_empty_io_fini
+		},
 		[CIT_MISC] = {
 			.cio_fini   = lov_empty_io_fini
 		}
@@ -950,6 +976,7 @@ int lov_io_init_empty(const struct lu_env *env, struct cl_object *obj,
 		result = 0;
 		break;
 	case CIT_FSYNC:
+	case CIT_LADVISE:
 	case CIT_SETATTR:
 	case CIT_DATA_VERSION:
 		result = 1;
@@ -989,6 +1016,7 @@ int lov_io_init_released(const struct lu_env *env, struct cl_object *obj,
 		break;
 	case CIT_MISC:
 	case CIT_FSYNC:
+	case CIT_LADVISE:
 	case CIT_DATA_VERSION:
 		result = 1;
 		break;
diff --git a/drivers/staging/lustre/lustre/obdclass/cl_io.c b/drivers/staging/lustre/lustre/obdclass/cl_io.c
index fcdae60..2c77e72 100644
--- a/drivers/staging/lustre/lustre/obdclass/cl_io.c
+++ b/drivers/staging/lustre/lustre/obdclass/cl_io.c
@@ -140,6 +140,8 @@ void cl_io_fini(const struct lu_env *env, struct cl_io *io)
 		LASSERT(ergo(io->ci_ignore_layout || !io->ci_verify_layout,
 			     !io->ci_need_restart));
 		break;
+	case CIT_LADVISE:
+		break;
 	default:
 		LBUG();
 	}
diff --git a/drivers/staging/lustre/lustre/osc/osc_cl_internal.h b/drivers/staging/lustre/lustre/osc/osc_cl_internal.h
index 2d3cba1..d86d3f7 100644
--- a/drivers/staging/lustre/lustre/osc/osc_cl_internal.h
+++ b/drivers/staging/lustre/lustre/osc/osc_cl_internal.h
@@ -110,6 +110,7 @@ struct osc_thread_info {
 	pgoff_t			oti_fn_index; /* first non-overlapped index */
 	struct cl_sync_io	oti_anchor;
 	struct cl_req_attr	oti_req_attr;
+	struct lu_buf		oti_ladvise_buf;
 };
 
 struct osc_object {
diff --git a/drivers/staging/lustre/lustre/osc/osc_dev.c b/drivers/staging/lustre/lustre/osc/osc_dev.c
index 2b5f324..c767a3c 100644
--- a/drivers/staging/lustre/lustre/osc/osc_dev.c
+++ b/drivers/staging/lustre/lustre/osc/osc_dev.c
@@ -122,6 +122,7 @@ static void osc_key_fini(const struct lu_context *ctx,
 {
 	struct osc_thread_info *info = data;
 
+	kvfree(info->oti_ladvise_buf.lb_buf);
 	kmem_cache_free(osc_thread_kmem, info);
 }
 
diff --git a/drivers/staging/lustre/lustre/osc/osc_internal.h b/drivers/staging/lustre/lustre/osc/osc_internal.h
index 4ddba13..02e8318 100644
--- a/drivers/staging/lustre/lustre/osc/osc_internal.h
+++ b/drivers/staging/lustre/lustre/osc/osc_internal.h
@@ -129,6 +129,10 @@ int osc_sync_base(struct osc_object *exp, struct obdo *oa,
 		  obd_enqueue_update_f upcall, void *cookie,
 		  struct ptlrpc_request_set *rqset);
 
+int osc_ladvise_base(struct obd_export *exp, struct obdo *oa,
+		     struct ladvise_hdr *ladvise_hdr,
+		     obd_enqueue_update_f upcall, void *cookie,
+		     struct ptlrpc_request_set *rqset);
 int osc_process_config_base(struct obd_device *obd, struct lustre_cfg *cfg);
 int osc_build_rpc(const struct lu_env *env, struct client_obd *cli,
 		  struct list_head *ext_list, int cmd);
diff --git a/drivers/staging/lustre/lustre/osc/osc_io.c b/drivers/staging/lustre/lustre/osc/osc_io.c
index 955525f..628743b 100644
--- a/drivers/staging/lustre/lustre/osc/osc_io.c
+++ b/drivers/staging/lustre/lustre/osc/osc_io.c
@@ -843,6 +843,80 @@ static void osc_io_fsync_end(const struct lu_env *env,
 	slice->cis_io->ci_result = result;
 }
 
+static int osc_io_ladvise_start(const struct lu_env *env,
+				const struct cl_io_slice *slice)
+{
+	struct cl_io *io = slice->cis_io;
+	struct osc_io *oio = cl2osc_io(env, slice);
+	struct cl_object *obj = slice->cis_obj;
+	struct lov_oinfo *loi = cl2osc(obj)->oo_oinfo;
+	struct cl_ladvise_io *lio = &io->u.ci_ladvise;
+	struct obdo *oa = &oio->oi_oa;
+	struct osc_async_cbargs *cbargs = &oio->oi_cbarg;
+	struct ladvise_hdr *ladvise_hdr;
+	struct lu_ladvise *ladvise;
+	int num_advise = 1;
+	int result = 0;
+	int buf_size;
+
+	/* TODO: add multiple ladvise support in CLIO */
+	buf_size = offsetof(typeof(*ladvise_hdr), lah_advise[num_advise]);
+	if (osc_env_info(env)->oti_ladvise_buf.lb_len < buf_size) {
+		kvfree(osc_env_info(env)->oti_ladvise_buf.lb_buf);
+		osc_env_info(env)->oti_ladvise_buf.lb_buf = kvzalloc(buf_size,
+								     GFP_NOFS);
+		if (likely(osc_env_info(env)->oti_ladvise_buf.lb_buf))
+			osc_env_info(env)->oti_ladvise_buf.lb_len = buf_size;
+	}
+
+	ladvise_hdr = osc_env_info(env)->oti_ladvise_buf.lb_buf;
+	if (!ladvise_hdr)
+		return -ENOMEM;
+
+	memset(ladvise_hdr, 0, buf_size);
+	ladvise_hdr->lah_magic = LADVISE_MAGIC;
+	ladvise_hdr->lah_count = num_advise;
+	ladvise_hdr->lah_flags = lio->li_flags;
+
+	memset(oa, 0, sizeof(*oa));
+	oa->o_oi = loi->loi_oi;
+	oa->o_valid = OBD_MD_FLID;
+	obdo_set_parent_fid(oa, lio->li_fid);
+
+	ladvise = ladvise_hdr->lah_advise;
+	ladvise->lla_start = lio->li_start;
+	ladvise->lla_end = lio->li_end;
+	ladvise->lla_advice = lio->li_advice;
+
+	if (lio->li_flags & LF_ASYNC) {
+		result = osc_ladvise_base(osc_export(cl2osc(obj)), oa,
+					  ladvise_hdr, NULL, NULL, NULL);
+	} else {
+		init_completion(&cbargs->opc_sync);
+		result = osc_ladvise_base(osc_export(cl2osc(obj)), oa,
+					  ladvise_hdr, osc_async_upcall,
+					  cbargs, PTLRPCD_SET);
+		cbargs->opc_rpc_sent = !result;
+	}
+	return result;
+}
+
+static void osc_io_ladvise_end(const struct lu_env *env,
+			       const struct cl_io_slice *slice)
+{
+	struct cl_io *io = slice->cis_io;
+	struct osc_io *oio = cl2osc_io(env, slice);
+	struct osc_async_cbargs *cbargs = &oio->oi_cbarg;
+	struct cl_ladvise_io *lio = &io->u.ci_ladvise;
+	int result = 0;
+
+	if ((!(lio->li_flags & LF_ASYNC)) && cbargs->opc_rpc_sent) {
+		wait_for_completion(&cbargs->opc_sync);
+		result = cbargs->opc_rc;
+	}
+	slice->cis_io->ci_result = result;
+}
+
 static void osc_io_end(const struct lu_env *env,
 		       const struct cl_io_slice *slice)
 {
@@ -891,6 +965,11 @@ static void osc_io_end(const struct lu_env *env,
 			.cio_end    = osc_io_fsync_end,
 			.cio_fini   = osc_io_fini
 		},
+		[CIT_LADVISE] = {
+			.cio_start	= osc_io_ladvise_start,
+			.cio_end	= osc_io_ladvise_end,
+			.cio_fini	= osc_io_fini
+		},
 		[CIT_MISC] = {
 			.cio_fini   = osc_io_fini
 		}
diff --git a/drivers/staging/lustre/lustre/osc/osc_request.c b/drivers/staging/lustre/lustre/osc/osc_request.c
index ce073b6..0286f25 100644
--- a/drivers/staging/lustre/lustre/osc/osc_request.c
+++ b/drivers/staging/lustre/lustre/osc/osc_request.c
@@ -91,6 +91,12 @@ struct osc_fsync_args {
 	void		*fa_cookie;
 };
 
+struct osc_ladvise_args {
+	struct obdo		*la_oa;
+	obd_enqueue_update_f	 la_upcall;
+	void			*la_cookie;
+};
+
 struct osc_enqueue_args {
 	struct obd_export	*oa_exp;
 	enum ldlm_type		oa_type;
@@ -269,6 +275,94 @@ int osc_setattr_async(struct obd_export *exp, struct obdo *oa,
 	return 0;
 }
 
+static int osc_ladvise_interpret(const struct lu_env *env,
+				 struct ptlrpc_request *req,
+				 void *arg, int rc)
+{
+	struct osc_ladvise_args *la = arg;
+	struct ost_body *body;
+
+	if (rc)
+		goto out;
+
+	body = req_capsule_server_get(&req->rq_pill, &RMF_OST_BODY);
+	if (!body) {
+		rc = -EPROTO;
+		goto out;
+	}
+
+	*la->la_oa = body->oa;
+out:
+	rc = la->la_upcall(la->la_cookie, rc);
+	return rc;
+}
+
+/**
+ * If rqset is NULL, do not wait for response. Upcall and cookie could also
+ * be NULL in this case
+ */
+int osc_ladvise_base(struct obd_export *exp, struct obdo *oa,
+		     struct ladvise_hdr *ladvise_hdr,
+		     obd_enqueue_update_f upcall, void *cookie,
+		     struct ptlrpc_request_set *rqset)
+{
+	struct lu_ladvise *ladvise = ladvise_hdr->lah_advise;
+	int num_advise = ladvise_hdr->lah_count;
+	struct ladvise_hdr *req_ladvise_hdr;
+	struct lu_ladvise *req_ladvise;
+	struct osc_ladvise_args *la;
+	struct ptlrpc_request *req;
+	struct ost_body *body;
+	int rc;
+
+	req = ptlrpc_request_alloc(class_exp2cliimp(exp), &RQF_OST_LADVISE);
+	if (!req)
+		return -ENOMEM;
+
+	req_capsule_set_size(&req->rq_pill, &RMF_OST_LADVISE, RCL_CLIENT,
+			     num_advise * sizeof(*ladvise));
+	rc = ptlrpc_request_pack(req, LUSTRE_OST_VERSION, OST_LADVISE);
+	if (rc) {
+		ptlrpc_request_free(req);
+		return rc;
+	}
+	req->rq_request_portal = OST_IO_PORTAL;
+	ptlrpc_at_set_req_timeout(req);
+
+	body = req_capsule_client_get(&req->rq_pill, &RMF_OST_BODY);
+	LASSERT(body);
+	lustre_set_wire_obdo(&req->rq_import->imp_connect_data, &body->oa,
+			     oa);
+
+	req_ladvise_hdr = req_capsule_client_get(&req->rq_pill,
+						 &RMF_OST_LADVISE_HDR);
+	memcpy(req_ladvise_hdr, ladvise_hdr, sizeof(*ladvise_hdr));
+
+	req_ladvise = req_capsule_client_get(&req->rq_pill, &RMF_OST_LADVISE);
+	memcpy(req_ladvise, ladvise, sizeof(*ladvise) * num_advise);
+	ptlrpc_request_set_replen(req);
+
+	if (!rqset) {
+		/* Do not wait for response. */
+		ptlrpcd_add_req(req);
+		return 0;
+	}
+
+	req->rq_interpret_reply = osc_ladvise_interpret;
+	BUILD_BUG_ON(sizeof(*la) > sizeof(req->rq_async_args));
+	la = ptlrpc_req_async_args(req);
+	la->la_oa = oa;
+	la->la_upcall = upcall;
+	la->la_cookie = cookie;
+
+	if (rqset == PTLRPCD_SET)
+		ptlrpcd_add_req(req);
+	else
+		ptlrpc_set_add_req(rqset, req);
+
+	return 0;
+}
+
 static int osc_create(const struct lu_env *env, struct obd_export *exp,
 		      struct obdo *oa)
 {
diff --git a/drivers/staging/lustre/lustre/ptlrpc/layout.c b/drivers/staging/lustre/lustre/ptlrpc/layout.c
index 417d4a1..6ef8789 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/layout.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/layout.c
@@ -605,6 +605,13 @@
 	&RMF_FIEMAP_VAL
 };
 
+static const struct req_msg_field *ost_ladvise[] = {
+	&RMF_PTLRPC_BODY,
+	&RMF_OST_BODY,
+	&RMF_OST_LADVISE_HDR,
+	&RMF_OST_LADVISE,
+};
+
 static const struct req_msg_field *ost_get_fiemap_server[] = {
 	&RMF_PTLRPC_BODY,
 	&RMF_FIEMAP_VAL
@@ -716,6 +723,7 @@
 	&RQF_OST_GET_INFO_LAST_FID,
 	&RQF_OST_SET_INFO_LAST_FID,
 	&RQF_OST_GET_INFO_FIEMAP,
+	&RQF_OST_LADVISE,
 	&RQF_LDLM_ENQUEUE,
 	&RQF_LDLM_ENQUEUE_LVB,
 	&RQF_LDLM_CONVERT,
@@ -1110,6 +1118,18 @@ struct req_msg_field RMF_SWAP_LAYOUTS =
 	DEFINE_MSGF("swap_layouts", 0, sizeof(struct  mdc_swap_layouts),
 		    lustre_swab_swap_layouts, NULL);
 EXPORT_SYMBOL(RMF_SWAP_LAYOUTS);
+
+struct req_msg_field RMF_OST_LADVISE_HDR =
+	DEFINE_MSGF("ladvise_request", 0, sizeof(struct ladvise_hdr),
+		    lustre_swab_ladvise_hdr, NULL);
+EXPORT_SYMBOL(RMF_OST_LADVISE_HDR);
+
+struct req_msg_field RMF_OST_LADVISE =
+	DEFINE_MSGF("ladvise_request", RMF_F_STRUCT_ARRAY,
+		    sizeof(struct lu_ladvise),
+		    lustre_swab_ladvise, NULL);
+EXPORT_SYMBOL(RMF_OST_LADVISE);
+
 /*
  * Request formats.
  */
@@ -1552,6 +1572,10 @@ struct req_format RQF_OST_GET_INFO_FIEMAP =
 			ost_get_fiemap_server);
 EXPORT_SYMBOL(RQF_OST_GET_INFO_FIEMAP);
 
+struct req_format RQF_OST_LADVISE =
+	DEFINE_REQ_FMT0("OST_LADVISE", ost_ladvise, ost_body_only);
+EXPORT_SYMBOL(RQF_OST_LADVISE);
+
 /* Convenience macro */
 #define FMT_FIELD(fmt, i, j) ((fmt)->rf_fields[(i)].d[(j)])
 
diff --git a/drivers/staging/lustre/lustre/ptlrpc/lproc_ptlrpc.c b/drivers/staging/lustre/lustre/ptlrpc/lproc_ptlrpc.c
index a61e800..52b980c 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/lproc_ptlrpc.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/lproc_ptlrpc.c
@@ -66,6 +66,7 @@
 	{ OST_QUOTACHECK,   "ost_quotacheck" },
 	{ OST_QUOTACTL,     "ost_quotactl" },
 	{ OST_QUOTA_ADJUST_QUNIT, "ost_quota_adjust_qunit" },
+	{ OST_LADVISE,			"ost_ladvise" },
 	{ MDS_GETATTR,      "mds_getattr" },
 	{ MDS_GETATTR_NAME, "mds_getattr_lock" },
 	{ MDS_CLOSE,	"mds_close" },
diff --git a/drivers/staging/lustre/lustre/ptlrpc/pack_generic.c b/drivers/staging/lustre/lustre/ptlrpc/pack_generic.c
index 0337b33..468fa69 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/pack_generic.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/pack_generic.c
@@ -2309,3 +2309,22 @@ void lustre_swab_close_data(struct close_data *cd)
 	lustre_swab_lu_fid(&cd->cd_fid);
 	__swab64s(&cd->cd_data_version);
 }
+
+void lustre_swab_ladvise(struct lu_ladvise *ladvise)
+{
+	swab64s(&ladvise->lla_start);
+	swab64s(&ladvise->lla_end);
+	swab64s(&ladvise->lla_advice);
+	BUILD_BUG_ON(!offsetof(typeof(*ladvise), lla_padding));
+}
+EXPORT_SYMBOL(lustre_swab_ladvise);
+
+void lustre_swab_ladvise_hdr(struct ladvise_hdr *ladvise_hdr)
+{
+	swab32s(&ladvise_hdr->lah_magic);
+	swab32s(&ladvise_hdr->lah_count);
+	swab64s(&ladvise_hdr->lah_flags);
+	BUILD_BUG_ON(!offsetof(typeof(*ladvise_hdr), lah_padding1));
+	BUILD_BUG_ON(!offsetof(typeof(*ladvise_hdr), lah_padding2));
+}
+EXPORT_SYMBOL(lustre_swab_ladvise_hdr);
diff --git a/drivers/staging/lustre/lustre/ptlrpc/wiretest.c b/drivers/staging/lustre/lustre/ptlrpc/wiretest.c
index 2b3608c..5a68de5 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/wiretest.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/wiretest.c
@@ -101,7 +101,9 @@ void lustre_assert_wire_constants(void)
 		 (long long)OST_QUOTACTL);
 	LASSERTF(OST_QUOTA_ADJUST_QUNIT == 20, "found %lld\n",
 		 (long long)OST_QUOTA_ADJUST_QUNIT);
-	LASSERTF(OST_LAST_OPC == 21, "found %lld\n",
+	LASSERTF(OST_LADVISE == 21, "found %lld\n",
+		 (long long)OST_LADVISE);
+	LASSERTF(OST_LAST_OPC == 22, "found %lld\n",
 		 (long long)OST_LAST_OPC);
 	LASSERTF(OBD_OBJECT_EOF == 0xffffffffffffffffULL, "found 0x%.16llxULL\n",
 		 OBD_OBJECT_EOF);
@@ -4207,4 +4209,54 @@ void lustre_assert_wire_constants(void)
 	LASSERTF(sizeof(((struct hsm_user_import *)0)->hui_archive_id) == 4,
 		 "found %lld\n",
 	      (long long)sizeof(((struct hsm_user_import *)0)->hui_archive_id));
+
+	/* Checks for struct lu_ladvise */
+	LASSERTF((int)sizeof(struct lu_ladvise) == 32, "found %lld\n",
+		 (long long)(int)sizeof(struct lu_ladvise));
+	LASSERTF((int)offsetof(struct lu_ladvise, lla_advice) == 0, "found %lld\n",
+		 (long long)(int)offsetof(struct lu_ladvise, lla_advice));
+	LASSERTF((int)sizeof(((struct lu_ladvise *)0)->lla_advice) == 8, "found %lld\n",
+		 (long long)(int)sizeof(((struct lu_ladvise *)0)->lla_advice));
+	LASSERTF((int)offsetof(struct lu_ladvise, lla_start) == 8, "found %lld\n",
+		 (long long)(int)offsetof(struct lu_ladvise, lla_start));
+	LASSERTF((int)sizeof(((struct lu_ladvise *)0)->lla_start) == 8, "found %lld\n",
+		 (long long)(int)sizeof(((struct lu_ladvise *)0)->lla_start));
+	LASSERTF((int)offsetof(struct lu_ladvise, lla_end) == 16, "found %lld\n",
+		 (long long)(int)offsetof(struct lu_ladvise, lla_end));
+	LASSERTF((int)sizeof(((struct lu_ladvise *)0)->lla_end) == 8, "found %lld\n",
+		 (long long)(int)sizeof(((struct lu_ladvise *)0)->lla_end));
+	LASSERTF((int)offsetof(struct lu_ladvise, lla_padding) == 24, "found %lld\n",
+		 (long long)(int)offsetof(struct lu_ladvise, lla_padding));
+	LASSERTF((int)sizeof(((struct lu_ladvise *)0)->lla_padding) == 8, "found %lld\n",
+		 (long long)(int)sizeof(((struct lu_ladvise *)0)->lla_padding));
+
+	/* Checks for struct ladvise_hdr */
+	LASSERTF(LADVISE_MAGIC == 0x1ADF1CE0, "found 0x%.8x\n",
+		 LADVISE_MAGIC);
+	LASSERTF((int)sizeof(struct ladvise_hdr) == 32, "found %lld\n",
+		 (long long)(int)sizeof(struct ladvise_hdr));
+	LASSERTF((int)offsetof(struct ladvise_hdr, lah_magic) == 0, "found %lld\n",
+		 (long long)(int)offsetof(struct ladvise_hdr, lah_magic));
+	LASSERTF((int)sizeof(((struct ladvise_hdr *)0)->lah_magic) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct ladvise_hdr *)0)->lah_magic));
+	LASSERTF((int)offsetof(struct ladvise_hdr, lah_count) == 4, "found %lld\n",
+		 (long long)(int)offsetof(struct ladvise_hdr, lah_count));
+	LASSERTF((int)sizeof(((struct ladvise_hdr *)0)->lah_count) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct ladvise_hdr *)0)->lah_count));
+	LASSERTF((int)offsetof(struct ladvise_hdr, lah_flags) == 8, "found %lld\n",
+		 (long long)(int)offsetof(struct ladvise_hdr, lah_flags));
+	LASSERTF((int)sizeof(((struct ladvise_hdr *)0)->lah_flags) == 8, "found %lld\n",
+		 (long long)(int)sizeof(((struct ladvise_hdr *)0)->lah_flags));
+	LASSERTF((int)offsetof(struct ladvise_hdr, lah_padding1) == 16, "found %lld\n",
+		 (long long)(int)offsetof(struct ladvise_hdr, lah_padding1));
+	LASSERTF((int)sizeof(((struct ladvise_hdr *)0)->lah_padding1) == 8, "found %lld\n",
+		 (long long)(int)sizeof(((struct ladvise_hdr *)0)->lah_padding1));
+	LASSERTF((int)offsetof(struct ladvise_hdr, lah_padding2) == 24, "found %lld\n",
+		 (long long)(int)offsetof(struct ladvise_hdr, lah_padding2));
+	LASSERTF((int)sizeof(((struct ladvise_hdr *)0)->lah_padding2) == 8, "found %lld\n",
+		 (long long)(int)sizeof(((struct ladvise_hdr *)0)->lah_padding2));
+	LASSERTF((int)offsetof(struct ladvise_hdr, lah_advise) == 32, "found %lld\n",
+		 (long long)(int)offsetof(struct ladvise_hdr, lah_advise));
+	LASSERTF((int)sizeof(((struct ladvise_hdr *)0)->lah_advise) == 0, "found %lld\n",
+		 (long long)(int)sizeof(((struct ladvise_hdr *)0)->lah_advise));
 }
-- 
1.8.3.1

* [lustre-devel] [PATCH 03/18] lustre: fileset: add fileset mount support
  2018-07-02 23:24 [lustre-devel] [PATCH 00/18] lustre: missing updates from lustre 2.9 James Simmons
  2018-07-02 23:24 ` [lustre-devel] [PATCH 01/18] lustre: grant: add support for OBD_CONNECT_GRANT_PARAM James Simmons
  2018-07-02 23:24 ` [lustre-devel] [PATCH 02/18] lustre: ladvise: Add feature of giving file access advices James Simmons
@ 2018-07-02 23:24 ` James Simmons
  2018-07-03  3:52   ` NeilBrown
  2018-07-02 23:24 ` [lustre-devel] [PATCH 04/18] lustre: obd: rename md_getstatus() to md_get_root() James Simmons
                   ` (15 subsequent siblings)
  18 siblings, 1 reply; 41+ messages in thread
From: James Simmons @ 2018-07-02 23:24 UTC (permalink / raw)
  To: lustre-devel

From: Lai Siyao <lsiyao@whamcloud.com>

This patch enables a client to mount a subdirectory as a fileset.

usage: mount -t lustre mgsname:/fsname/subdir /mount/point

 * The MDT looks up the fileset FID and returns it to the client during mount.
 * `fid2path` support for fileset.
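
As a rough illustration of the usage line above, the part of the device
string that follows "mgsname:" has to be split into a profile name and an
optional fileset path. The helper below is a userspace sketch only:
split_mount_spec() is a hypothetical name, it uses malloc() rather than
kernel allocators, and it is not the lmd_parse() change from this patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical userspace helper, not the kernel lmd_parse(): split the
 * text after "mgsname:" into a profile name ("<fsname>-client") and an
 * optional fileset ("/subdir"). Both outputs are malloc()ed; *fileset
 * is NULL when no subdirectory was given.
 * Returns 0 on success, -1 on allocation failure.
 */
static int split_mount_spec(const char *spec, char **profile, char **fileset)
{
	const char *fs, *end, *tail;
	size_t fslen, setlen;

	assert(spec && profile && fileset);
	*profile = NULL;
	*fileset = NULL;

	/* Skip leading slashes in front of the fsname. */
	fs = spec;
	while (*fs == '/')
		fs++;

	/* The fsname runs up to the next '/' or the end of the string. */
	end = fs;
	while (*end != '/' && *end != '\0')
		end++;
	fslen = (size_t)(end - fs);

	/* sizeof("-client") already counts the terminating NUL. */
	*profile = malloc(fslen + sizeof("-client"));
	if (!*profile)
		return -1;
	memcpy(*profile, fs, fslen);
	memcpy(*profile + fslen, "-client", sizeof("-client"));

	/* Whatever follows is the fileset; drop trailing slashes. */
	tail = end + strlen(end);
	while (tail > end && tail[-1] == '/')
		tail--;
	setlen = (size_t)(tail - end);
	if (setlen == 0)
		return 0;

	*fileset = malloc(setlen + 1);
	if (!*fileset) {
		free(*profile);
		*profile = NULL;
		return -1;
	}
	memcpy(*fileset, end, setlen);
	(*fileset)[setlen] = '\0';
	return 0;
}
```

With this sketch, "/lustre/subdir/" yields the profile "lustre-client" and
the fileset "/subdir", while "/lustre" yields the same profile and no
fileset.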

Signed-off-by: Lai Siyao <lsiyao@whamcloud.com>
Signed-off-by: Kit Westneat <kwestneat@ddn.com>
Signed-off-by: Wang Shilong <wshilong@ddn.com>
WC-bug-id: https://jira.whamcloud.com/browse/LU-28
Reviewed-on: http://review.whamcloud.com/5007
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 .../lustre/include/uapi/linux/lustre/lustre_idl.h  |  7 ++-
 .../staging/lustre/lustre/include/lustre_disk.h    |  2 +
 .../lustre/lustre/include/lustre_req_layout.h      |  2 +-
 drivers/staging/lustre/lustre/include/obd.h        |  3 +-
 drivers/staging/lustre/lustre/include/obd_class.h  |  5 ++-
 .../staging/lustre/lustre/include/obd_support.h    |  4 +-
 drivers/staging/lustre/lustre/llite/file.c         | 18 ++++++++
 drivers/staging/lustre/lustre/llite/llite_lib.c    |  6 ++-
 drivers/staging/lustre/lustre/lmv/lmv_obd.c        | 30 +++++++------
 drivers/staging/lustre/lustre/mdc/mdc_request.c    | 50 ++++++++++++++++++----
 drivers/staging/lustre/lustre/obdclass/obd_mount.c | 23 +++++++++-
 drivers/staging/lustre/lustre/ptlrpc/layout.c      | 14 ++++--
 .../staging/lustre/lustre/ptlrpc/lproc_ptlrpc.c    |  2 +-
 drivers/staging/lustre/lustre/ptlrpc/wiretest.c    | 12 +++---
 14 files changed, 135 insertions(+), 43 deletions(-)

diff --git a/drivers/staging/lustre/include/uapi/linux/lustre/lustre_idl.h b/drivers/staging/lustre/include/uapi/linux/lustre/lustre_idl.h
index 5fab107..6defc6d 100644
--- a/drivers/staging/lustre/include/uapi/linux/lustre/lustre_idl.h
+++ b/drivers/staging/lustre/include/uapi/linux/lustre/lustre_idl.h
@@ -1244,7 +1244,7 @@ enum mds_cmd {
 	MDS_READPAGE		= 37,
 	MDS_CONNECT		= 38,
 	MDS_DISCONNECT		= 39,
-	MDS_GETSTATUS		= 40,
+	MDS_GET_ROOT		= 40,
 	MDS_STATFS		= 41,
 	MDS_PIN			= 42, /* obsolete, never used in a release */
 	MDS_UNPIN		= 43, /* obsolete, never used in a release */
@@ -2626,7 +2626,10 @@ struct getinfo_fid2path {
 	__u64	   gf_recno;
 	__u32	   gf_linkno;
 	__u32	   gf_pathlen;
-	char	    gf_path[0];
+	union {
+		char		gf_path[0];
+		struct lu_fid	gf_root_fid[0];
+	} gf_u;
 } __packed;
 
 /** path2parent request/reply structures */
diff --git a/drivers/staging/lustre/lustre/include/lustre_disk.h b/drivers/staging/lustre/lustre/include/lustre_disk.h
index 886e817..bd8fa71 100644
--- a/drivers/staging/lustre/lustre/include/lustre_disk.h
+++ b/drivers/staging/lustre/lustre/include/lustre_disk.h
@@ -78,6 +78,7 @@ struct lustre_mount_data {
 	int	lmd_recovery_time_hard;
 	char      *lmd_dev;	   /* device name */
 	char      *lmd_profile;    /* client only */
+	char	*lmd_fileset;	/* mount fileset */
 	char      *lmd_mgssec;	/* sptlrpc flavor to mgs */
 	char      *lmd_opts;	/* lustre mount options (as opposed to
 				 * _device_ mount options)
@@ -134,6 +135,7 @@ struct lustre_sb_info {
 #define     s2lsi_nocast(sb) ((sb)->s_fs_info)
 
 #define     get_profile_name(sb)   (s2lsi(sb)->lsi_lmd->lmd_profile)
+#define     get_mount_fileset(sb)  (s2lsi(sb)->lsi_lmd->lmd_fileset)
 
 /****************** prototypes *********************/
 
diff --git a/drivers/staging/lustre/lustre/include/lustre_req_layout.h b/drivers/staging/lustre/lustre/include/lustre_req_layout.h
index db6d8ed..9d718b7 100644
--- a/drivers/staging/lustre/lustre/include/lustre_req_layout.h
+++ b/drivers/staging/lustre/lustre/include/lustre_req_layout.h
@@ -133,7 +133,7 @@ void req_capsule_shrink(struct req_capsule *pill,
 extern struct req_format RQF_MDS_CONNECT;
 extern struct req_format RQF_MDS_DISCONNECT;
 extern struct req_format RQF_MDS_STATFS;
-extern struct req_format RQF_MDS_GETSTATUS;
+extern struct req_format RQF_MDS_GET_ROOT;
 extern struct req_format RQF_MDS_SYNC;
 extern struct req_format RQF_MDS_GETXATTR;
 extern struct req_format RQF_MDS_GETATTR;
diff --git a/drivers/staging/lustre/lustre/include/obd.h b/drivers/staging/lustre/lustre/include/obd.h
index d6fd1ea..cd2a2d0 100644
--- a/drivers/staging/lustre/lustre/include/obd.h
+++ b/drivers/staging/lustre/lustre/include/obd.h
@@ -920,7 +920,8 @@ struct obd_client_handle {
 struct cl_attr;
 
 struct md_ops {
-	int (*getstatus)(struct obd_export *, struct lu_fid *);
+	int (*getstatus)(struct obd_export *exp, const char *fileset,
+			 struct lu_fid *fid);
 	int (*null_inode)(struct obd_export *, const struct lu_fid *);
 	int (*close)(struct obd_export *, struct md_op_data *,
 		     struct md_open_data *, struct ptlrpc_request **);
diff --git a/drivers/staging/lustre/lustre/include/obd_class.h b/drivers/staging/lustre/lustre/include/obd_class.h
index fc9c772..797986b 100644
--- a/drivers/staging/lustre/lustre/include/obd_class.h
+++ b/drivers/staging/lustre/lustre/include/obd_class.h
@@ -1180,13 +1180,14 @@ static inline int obd_register_observer(struct obd_device *obd,
 }
 
 /* metadata helpers */
-static inline int md_getstatus(struct obd_export *exp, struct lu_fid *fid)
+static inline int md_get_root(struct obd_export *exp, const char *fileset,
+			      struct lu_fid *fid)
 {
 	int rc;
 
 	EXP_CHECK_MD_OP(exp, getstatus);
 	EXP_MD_COUNTER_INCREMENT(exp, getstatus);
-	rc = MDP(exp->exp_obd, getstatus)(exp, fid);
+	rc = MDP(exp->exp_obd, getstatus)(exp, fileset, fid);
 	return rc;
 }
 
diff --git a/drivers/staging/lustre/lustre/include/obd_support.h b/drivers/staging/lustre/lustre/include/obd_support.h
index ca28caf..87806e8 100644
--- a/drivers/staging/lustre/lustre/include/obd_support.h
+++ b/drivers/staging/lustre/lustre/include/obd_support.h
@@ -135,8 +135,8 @@
 #define OBD_FAIL_MDS_CONNECT_PACK	0x118
 #define OBD_FAIL_MDS_REINT_NET_REP       0x119
 #define OBD_FAIL_MDS_DISCONNECT_NET      0x11a
-#define OBD_FAIL_MDS_GETSTATUS_NET       0x11b
-#define OBD_FAIL_MDS_GETSTATUS_PACK      0x11c
+#define OBD_FAIL_MDS_GET_ROOT_NET	0x11b
+#define OBD_FAIL_MDS_GET_ROOT_PACK	0x11c
 #define OBD_FAIL_MDS_STATFS_PACK	 0x11d
 #define OBD_FAIL_MDS_STATFS_NET	  0x11e
 #define OBD_FAIL_MDS_GETATTR_NAME_NET    0x11f
diff --git a/drivers/staging/lustre/lustre/llite/file.c b/drivers/staging/lustre/lustre/llite/file.c
index 44bec1d..5f944ca 100644
--- a/drivers/staging/lustre/lustre/llite/file.c
+++ b/drivers/staging/lustre/lustre/llite/file.c
@@ -1594,6 +1594,13 @@ int ll_fid2path(struct inode *inode, void __user *arg)
 		goto gf_free;
 	}
 
+	/*
+	 * Append the root FID after gfout to let the MDT know the root FID so
+	 * that it can look up the correct path; this is mainly for fileset.
+	 * An old server without fileset mount support will ignore this.
+	 */
+	*gfout->gf_u.gf_root_fid = *ll_inode2fid(inode);
+
 	/* Call mdc_iocontrol */
 	rc = obd_iocontrol(OBD_IOC_FID2PATH, exp, outsize, gfout, NULL);
 	if (rc != 0)
@@ -2725,6 +2732,16 @@ int ll_migrate(struct inode *parent, struct file *file, int mdtidx,
 		goto out_free;
 	}
 
+	/*
+	 * lfs migrate command needs to be blocked on the client
+	 * by checking the migrate FID against the FID of the
+	 * filesystem root.
+	 */
+	if (child_inode == parent->i_sb->s_root->d_inode) {
+		rc = -EINVAL;
+		goto out_iput;
+	}
+
 	inode_lock(child_inode);
 	op_data->op_fid3 = *ll_inode2fid(child_inode);
 	if (!fid_is_sane(&op_data->op_fid3)) {
@@ -2807,6 +2824,7 @@ int ll_migrate(struct inode *parent, struct file *file, int mdtidx,
 		clear_nlink(child_inode);
 out_unlock:
 	inode_unlock(child_inode);
+out_iput:
 	iput(child_inode);
 out_free:
 	ll_finish_md_op_data(op_data);
diff --git a/drivers/staging/lustre/lustre/llite/llite_lib.c b/drivers/staging/lustre/lustre/llite/llite_lib.c
index df5bc0a..90dff0a 100644
--- a/drivers/staging/lustre/lustre/llite/llite_lib.c
+++ b/drivers/staging/lustre/lustre/llite/llite_lib.c
@@ -201,7 +201,8 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt)
 				  OBD_CONNECT_DISP_STRIPE | OBD_CONNECT_LFSCK |
 				  OBD_CONNECT_OPEN_BY_FID |
 				  OBD_CONNECT_DIR_STRIPE |
-				  OBD_CONNECT_BULK_MBITS;
+				  OBD_CONNECT_BULK_MBITS |
+				  OBD_CONNECT_SUBTREE;
 
 	if (sbi->ll_flags & LL_SBI_LRU_RESIZE)
 		data->ocd_connect_flags |= OBD_CONNECT_LRU_RESIZE;
@@ -436,7 +437,8 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt)
 	mutex_unlock(&sbi->ll_lco.lco_lock);
 
 	fid_zero(&sbi->ll_root_fid);
-	err = md_getstatus(sbi->ll_md_exp, &sbi->ll_root_fid);
+	err = md_get_root(sbi->ll_md_exp, get_mount_fileset(sb),
+			  &sbi->ll_root_fid);
 	if (err) {
 		CERROR("cannot mds_connect: rc = %d\n", err);
 		goto out_lock_cn_cb;
diff --git a/drivers/staging/lustre/lustre/lmv/lmv_obd.c b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
index d4e8ba8..44fbaa6 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_obd.c
+++ b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
@@ -613,17 +613,20 @@ static int lmv_fid2path(struct obd_export *exp, int len, void *karg,
 {
 	struct obd_device	*obddev = class_exp2obd(exp);
 	struct lmv_obd		*lmv = &obddev->u.lmv;
-	struct getinfo_fid2path *gf;
+	struct getinfo_fid2path *gf = karg;
 	struct lmv_tgt_desc     *tgt;
 	struct getinfo_fid2path *remote_gf = NULL;
+	struct lu_fid root_fid;
 	int			remote_gf_size = 0;
 	int			rc;
 
-	gf = karg;
 	tgt = lmv_find_target(lmv, &gf->gf_fid);
 	if (IS_ERR(tgt))
 		return PTR_ERR(tgt);
 
+	root_fid = *gf->gf_u.gf_root_fid;
+	LASSERT(fid_is_sane(&root_fid));
+
 repeat_fid2path:
 	rc = obd_iocontrol(OBD_IOC_FID2PATH, tgt->ltd_exp, len, gf, uarg);
 	if (rc != 0 && rc != -EREMOTE)
@@ -637,25 +640,25 @@ static int lmv_fid2path(struct obd_export *exp, int len, void *karg,
 		char *ptr;
 
 		ori_gf = karg;
-		if (strlen(ori_gf->gf_path) + 1 +
-		    strlen(gf->gf_path) + 1 > ori_gf->gf_pathlen) {
+		if (strlen(ori_gf->gf_u.gf_path) + 1 +
+		    strlen(gf->gf_u.gf_path) + 1 > ori_gf->gf_pathlen) {
 			rc = -EOVERFLOW;
 			goto out_fid2path;
 		}
 
-		ptr = ori_gf->gf_path;
+		ptr = ori_gf->gf_u.gf_path;
 
-		memmove(ptr + strlen(gf->gf_path) + 1, ptr,
-			strlen(ori_gf->gf_path));
+		memmove(ptr + strlen(gf->gf_u.gf_path) + 1, ptr,
+			strlen(ori_gf->gf_u.gf_path));
 
-		strncpy(ptr, gf->gf_path, strlen(gf->gf_path));
-		ptr += strlen(gf->gf_path);
+		strncpy(ptr, gf->gf_u.gf_path, strlen(gf->gf_u.gf_path));
+		ptr += strlen(gf->gf_u.gf_path);
 		*ptr = '/';
 	}
 
 	CDEBUG(D_INFO, "%s: get path %s " DFID " rec: %llu ln: %u\n",
 	       tgt->ltd_exp->exp_obd->obd_name,
-	       gf->gf_path, PFID(&gf->gf_fid), gf->gf_recno,
+	       gf->gf_u.gf_path, PFID(&gf->gf_fid), gf->gf_recno,
 	       gf->gf_linkno);
 
 	if (rc == 0)
@@ -689,7 +692,8 @@ static int lmv_fid2path(struct obd_export *exp, int len, void *karg,
 	remote_gf->gf_fid = gf->gf_fid;
 	remote_gf->gf_recno = -1;
 	remote_gf->gf_linkno = -1;
-	memset(remote_gf->gf_path, 0, remote_gf->gf_pathlen);
+	memset(remote_gf->gf_u.gf_path, 0, remote_gf->gf_pathlen);
+	*remote_gf->gf_u.gf_root_fid = root_fid;
 	gf = remote_gf;
 	goto repeat_fid2path;
 
@@ -1387,13 +1391,13 @@ static int lmv_statfs(const struct lu_env *env, struct obd_export *exp,
 	return rc;
 }
 
-static int lmv_getstatus(struct obd_export *exp,
+static int lmv_getstatus(struct obd_export *exp, const char *fileset,
 			 struct lu_fid *fid)
 {
 	struct obd_device    *obd = exp->exp_obd;
 	struct lmv_obd       *lmv = &obd->u.lmv;
 
-	return md_getstatus(lmv->tgts[0]->ltd_exp, fid);
+	return md_get_root(lmv->tgts[0]->ltd_exp, fileset, fid);
 }
 
 static int lmv_getxattr(struct obd_export *exp, const struct lu_fid *fid,
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_request.c b/drivers/staging/lustre/lustre/mdc/mdc_request.c
index 827ed0c..2e01f57 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_request.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_request.c
@@ -81,19 +81,50 @@ static inline int mdc_queue_wait(struct ptlrpc_request *req)
 	return rc;
 }
 
-static int mdc_getstatus(struct obd_export *exp, struct lu_fid *rootfid)
+/*
+ * Send MDS_GET_ROOT RPC to fetch root FID.
+ *
+ * If \a fileset is not NULL it should contain a subdirectory off
+ * the ROOT/ directory to be mounted on the client. Return the FID
+ * of the subdirectory to the client to mount onto its mountpoint.
+ *
+ * \param[in]  exp	MDC export
+ * \param[in]  fileset	fileset name, which could be NULL
+ * \param[out] rootfid	root FID of this mountpoint
+ * \param[out] pc	root capa will be unpacked and saved in this pointer
+ *
+ * \retval     0 on success, negative errno on failure
+ */
+static int mdc_get_root(struct obd_export *exp, const char *fileset,
+			struct lu_fid *rootfid)
 {
 	struct ptlrpc_request *req;
 	struct mdt_body       *body;
 	int		    rc;
 
-	req = ptlrpc_request_alloc_pack(class_exp2cliimp(exp),
-					&RQF_MDS_GETSTATUS,
-					LUSTRE_MDS_VERSION, MDS_GETSTATUS);
+	if (fileset && !(exp_connect_flags(exp) & OBD_CONNECT_SUBTREE))
+		return -ENOTSUPP;
+
+	req = ptlrpc_request_alloc(class_exp2cliimp(exp),
+				   &RQF_MDS_GET_ROOT);
 	if (!req)
 		return -ENOMEM;
 
+	if (fileset)
+		req_capsule_set_size(&req->rq_pill, &RMF_NAME, RCL_CLIENT,
+				     strlen(fileset) + 1);
+	rc = ptlrpc_request_pack(req, LUSTRE_MDS_VERSION, MDS_GET_ROOT);
+	if (rc) {
+		ptlrpc_request_free(req);
+		return rc;
+	}
 	mdc_pack_body(req, NULL, 0, 0, -1, 0);
+	if (fileset) {
+		char *name = req_capsule_client_get(&req->rq_pill, &RMF_NAME);
+
+		memcpy(name, fileset, strlen(fileset));
+	}
+	lustre_msg_add_flags(req->rq_reqmsg, LUSTRE_IMP_FULL);
 	req->rq_send_state = LUSTRE_IMP_FULL;
 
 	ptlrpc_request_set_replen(req);
@@ -1440,13 +1471,16 @@ static int mdc_ioc_fid2path(struct obd_export *exp, struct getinfo_fid2path *gf)
 		return -EOVERFLOW;
 
 	/* Key is KEY_FID2PATH + getinfo_fid2path description */
-	keylen = cfs_size_round(sizeof(KEY_FID2PATH)) + sizeof(*gf);
+	keylen = cfs_size_round(sizeof(KEY_FID2PATH)) + sizeof(*gf) +
+		 sizeof(struct lu_fid);
 	key = kzalloc(keylen, GFP_NOFS);
 	if (!key)
 		return -ENOMEM;
 	memcpy(key, KEY_FID2PATH, sizeof(KEY_FID2PATH));
 	memcpy(key + cfs_size_round(sizeof(KEY_FID2PATH)), gf, sizeof(*gf));
 
+	memcpy(key + cfs_size_round(sizeof(KEY_FID2PATH)) + sizeof(*gf),
+	       gf->gf_u.gf_root_fid, sizeof(struct lu_fid));
 	CDEBUG(D_IOCTL, "path get " DFID " from %llu #%d\n",
 	       PFID(&gf->gf_fid), gf->gf_recno, gf->gf_linkno);
 
@@ -1472,9 +1506,9 @@ static int mdc_ioc_fid2path(struct obd_export *exp, struct getinfo_fid2path *gf)
 
 	CDEBUG(D_IOCTL, "path got " DFID " from %llu #%d: %s\n",
 	       PFID(&gf->gf_fid), gf->gf_recno, gf->gf_linkno,
-	       gf->gf_pathlen < 512 ? gf->gf_path :
+	       gf->gf_pathlen < 512 ? gf->gf_u.gf_path :
 	       /* only log the last 512 characters of the path */
-	       gf->gf_path + gf->gf_pathlen - 512);
+	       gf->gf_u.gf_path + gf->gf_pathlen - 512);
 
 out:
 	kfree(key);
@@ -2713,7 +2747,7 @@ static int mdc_process_config(struct obd_device *obd, u32 len, void *buf)
 };
 
 static struct md_ops mdc_md_ops = {
-	.getstatus		= mdc_getstatus,
+	.getstatus		= mdc_get_root,
 	.null_inode		= mdc_null_inode,
 	.close			= mdc_close,
 	.create			= mdc_create,
diff --git a/drivers/staging/lustre/lustre/obdclass/obd_mount.c b/drivers/staging/lustre/lustre/obdclass/obd_mount.c
index 232bbfa..6e9803b 100644
--- a/drivers/staging/lustre/lustre/obdclass/obd_mount.c
+++ b/drivers/staging/lustre/lustre/obdclass/obd_mount.c
@@ -545,6 +545,7 @@ static int lustre_free_lsi(struct super_block *sb)
 	if (lsi->lsi_lmd) {
 		kfree(lsi->lsi_lmd->lmd_dev);
 		kfree(lsi->lsi_lmd->lmd_profile);
+		kfree(lsi->lsi_lmd->lmd_fileset);
 		kfree(lsi->lsi_lmd->lmd_mgssec);
 		kfree(lsi->lsi_lmd->lmd_opts);
 		if (lsi->lsi_lmd->lmd_exclude_count)
@@ -1073,10 +1074,30 @@ static int lmd_parse(char *options, struct lustre_mount_data *lmd)
 		/* Remove leading /s from fsname */
 		while (*++s1 == '/')
 			;
+		s2 = s1;
+		while (*s2 != '/' && *s2 != '\0')
+			s2++;
 		/* Freed in lustre_free_lsi */
-		lmd->lmd_profile = kasprintf(GFP_NOFS, "%s-client", s1);
+		lmd->lmd_profile = kzalloc(s2 - s1 + 8, GFP_NOFS);
 		if (!lmd->lmd_profile)
 			return -ENOMEM;
+
+		strncat(lmd->lmd_profile, s1, s2 - s1);
+		strncat(lmd->lmd_profile, "-client", 7);
+
+		s1 = s2;
+		s2 = s1 + strlen(s1) - 1;
+		/* Remove trailing /s from fileset */
+		while (*s2 == '/')
+			s2--;
+		if (s2 > s1) {
+			lmd->lmd_fileset = kzalloc(s2 - s1 + 2, GFP_NOFS);
+			if (!lmd->lmd_fileset) {
+				kfree(lmd->lmd_profile);
+				return -ENOMEM;
+			}
+			strncat(lmd->lmd_fileset, s1, s2 - s1 + 1);
+		}
 	}
 
 	/* Freed in lustre_free_lsi */
diff --git a/drivers/staging/lustre/lustre/ptlrpc/layout.c b/drivers/staging/lustre/lustre/ptlrpc/layout.c
index 6ef8789..0b3ac14 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/layout.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/layout.c
@@ -477,6 +477,12 @@
 	&RMF_EAVALS_LENS
 };
 
+static const struct req_msg_field *mds_get_root_client[] = {
+	&RMF_PTLRPC_BODY,
+	&RMF_MDT_BODY,
+	&RMF_NAME
+};
+
 static const struct req_msg_field *mds_getxattr_client[] = {
 	&RMF_PTLRPC_BODY,
 	&RMF_MDT_BODY,
@@ -674,7 +680,7 @@
 	&RQF_MDS_CONNECT,
 	&RQF_MDS_DISCONNECT,
 	&RQF_MDS_GET_INFO,
-	&RQF_MDS_GETSTATUS,
+	&RQF_MDS_GET_ROOT,
 	&RQF_MDS_STATFS,
 	&RQF_MDS_GETATTR,
 	&RQF_MDS_GETATTR_NAME,
@@ -1228,9 +1234,9 @@ struct req_format RQF_OST_QUOTACTL =
 	DEFINE_REQ_FMT0("OST_QUOTACTL", quotactl_only, quotactl_only);
 EXPORT_SYMBOL(RQF_OST_QUOTACTL);
 
-struct req_format RQF_MDS_GETSTATUS =
-	DEFINE_REQ_FMT0("MDS_GETSTATUS", mdt_body_only, mdt_body_capa);
-EXPORT_SYMBOL(RQF_MDS_GETSTATUS);
+struct req_format RQF_MDS_GET_ROOT =
+	DEFINE_REQ_FMT0("MDS_GET_ROOT", mds_get_root_client, mdt_body_capa);
+EXPORT_SYMBOL(RQF_MDS_GET_ROOT);
 
 struct req_format RQF_MDS_STATFS =
 	DEFINE_REQ_FMT0("MDS_STATFS", empty, obd_statfs_server);
diff --git a/drivers/staging/lustre/lustre/ptlrpc/lproc_ptlrpc.c b/drivers/staging/lustre/lustre/ptlrpc/lproc_ptlrpc.c
index 52b980c..35120e7 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/lproc_ptlrpc.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/lproc_ptlrpc.c
@@ -74,7 +74,7 @@
 	{ MDS_READPAGE,     "mds_readpage" },
 	{ MDS_CONNECT,      "mds_connect" },
 	{ MDS_DISCONNECT,   "mds_disconnect" },
-	{ MDS_GETSTATUS,    "mds_getstatus" },
+	{ MDS_GET_ROOT,			"mds_get_root" },
 	{ MDS_STATFS,       "mds_statfs" },
 	{ MDS_PIN,	  "mds_pin" },
 	{ MDS_UNPIN,	"mds_unpin" },
diff --git a/drivers/staging/lustre/lustre/ptlrpc/wiretest.c b/drivers/staging/lustre/lustre/ptlrpc/wiretest.c
index 5a68de5..43931dd 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/wiretest.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/wiretest.c
@@ -131,8 +131,8 @@ void lustre_assert_wire_constants(void)
 		 (long long)MDS_CONNECT);
 	LASSERTF(MDS_DISCONNECT == 39, "found %lld\n",
 		 (long long)MDS_DISCONNECT);
-	LASSERTF(MDS_GETSTATUS == 40, "found %lld\n",
-		 (long long)MDS_GETSTATUS);
+	LASSERTF(MDS_GET_ROOT == 40, "found %lld\n",
+		 (long long)MDS_GET_ROOT);
 	LASSERTF(MDS_STATFS == 41, "found %lld\n",
 		 (long long)MDS_STATFS);
 	LASSERTF(MDS_PIN == 42, "found %lld\n",
@@ -3708,10 +3708,10 @@ void lustre_assert_wire_constants(void)
 		 (long long)(int)offsetof(struct getinfo_fid2path, gf_pathlen));
 	LASSERTF((int)sizeof(((struct getinfo_fid2path *)0)->gf_pathlen) == 4, "found %lld\n",
 		 (long long)(int)sizeof(((struct getinfo_fid2path *)0)->gf_pathlen));
-	LASSERTF((int)offsetof(struct getinfo_fid2path, gf_path[0]) == 32, "found %lld\n",
-		 (long long)(int)offsetof(struct getinfo_fid2path, gf_path[0]));
-	LASSERTF((int)sizeof(((struct getinfo_fid2path *)0)->gf_path[0]) == 1, "found %lld\n",
-		 (long long)(int)sizeof(((struct getinfo_fid2path *)0)->gf_path[0]));
+	LASSERTF((int)offsetof(struct getinfo_fid2path, gf_u.gf_path[0]) == 32, "found %lld\n",
+		 (long long)(int)offsetof(struct getinfo_fid2path, gf_u.gf_path[0]));
+	LASSERTF((int)sizeof(((struct getinfo_fid2path *)0)->gf_u.gf_path[0]) == 1, "found %lld\n",
+		 (long long)(int)sizeof(((struct getinfo_fid2path *)0)->gf_u.gf_path[0]));
 
 	/* Checks for struct fiemap */
 	LASSERTF((int)sizeof(struct fiemap) == 32, "found %lld\n",
-- 
1.8.3.1

* [lustre-devel] [PATCH 04/18] lustre: obd: rename md_getstatus() to md_get_root()
  2018-07-02 23:24 [lustre-devel] [PATCH 00/18] lustre: missing updates from lustre 2.9 James Simmons
                   ` (2 preceding siblings ...)
  2018-07-02 23:24 ` [lustre-devel] [PATCH 03/18] lustre: fileset: add fileset mount support James Simmons
@ 2018-07-02 23:24 ` James Simmons
  2018-07-02 23:24 ` [lustre-devel] [PATCH 05/18] lustre: llite: fast read implementation James Simmons
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 41+ messages in thread
From: James Simmons @ 2018-07-02 23:24 UTC (permalink / raw)
  To: lustre-devel

From: "John L. Hammond" <jhammond@whamcloud.com>

Finish the partial renaming of the OBD MD method
md_getstatus() to md_get_root().

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
WC-bug-id: https://jira.whamcloud.com/browse/LU-8035
Reviewed-on: http://review.whamcloud.com/19824
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/include/obd.h       |  2 +-
 drivers/staging/lustre/lustre/include/obd_class.h | 10 +++++-----
 drivers/staging/lustre/lustre/lmv/lmv_obd.c       |  6 +++---
 drivers/staging/lustre/lustre/mdc/mdc_request.c   |  2 +-
 4 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/obd.h b/drivers/staging/lustre/lustre/include/obd.h
index cd2a2d0..d4574ef 100644
--- a/drivers/staging/lustre/lustre/include/obd.h
+++ b/drivers/staging/lustre/lustre/include/obd.h
@@ -920,7 +920,7 @@ struct obd_client_handle {
 struct cl_attr;
 
 struct md_ops {
-	int (*getstatus)(struct obd_export *exp, const char *fileset,
+	int (*get_root)(struct obd_export *exp, const char *fileset,
 			 struct lu_fid *fid);
 	int (*null_inode)(struct obd_export *, const struct lu_fid *);
 	int (*close)(struct obd_export *, struct md_op_data *,
diff --git a/drivers/staging/lustre/lustre/include/obd_class.h b/drivers/staging/lustre/lustre/include/obd_class.h
index 797986b..20d07f8 100644
--- a/drivers/staging/lustre/lustre/include/obd_class.h
+++ b/drivers/staging/lustre/lustre/include/obd_class.h
@@ -356,8 +356,8 @@ static inline int obd_check_dev_active(struct obd_device *obd)
 
 #define MD_COUNTER_OFFSET(op)					\
 	((offsetof(struct md_ops, op) -				\
-		offsetof(struct md_ops, getstatus))		\
-		/ sizeof(((struct md_ops *)(0))->getstatus))
+		offsetof(struct md_ops, get_root))		\
+		/ sizeof(((struct md_ops *)(0))->get_root))
 
 #define MD_COUNTER_INCREMENT(obdx, op)				 \
 do {								 \
@@ -1185,9 +1185,9 @@ static inline int md_get_root(struct obd_export *exp, const char *fileset,
 {
 	int rc;
 
-	EXP_CHECK_MD_OP(exp, getstatus);
-	EXP_MD_COUNTER_INCREMENT(exp, getstatus);
-	rc = MDP(exp->exp_obd, getstatus)(exp, fileset, fid);
+	EXP_CHECK_MD_OP(exp, get_root);
+	EXP_MD_COUNTER_INCREMENT(exp, get_root);
+	rc = MDP(exp->exp_obd, get_root)(exp, fileset, fid);
 	return rc;
 }
 
diff --git a/drivers/staging/lustre/lustre/lmv/lmv_obd.c b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
index 44fbaa6..71109ba 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_obd.c
+++ b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
@@ -1391,8 +1391,8 @@ static int lmv_statfs(const struct lu_env *env, struct obd_export *exp,
 	return rc;
 }
 
-static int lmv_getstatus(struct obd_export *exp, const char *fileset,
-			 struct lu_fid *fid)
+static int lmv_get_root(struct obd_export *exp, const char *fileset,
+			struct lu_fid *fid)
 {
 	struct obd_device    *obd = exp->exp_obd;
 	struct lmv_obd       *lmv = &obd->u.lmv;
@@ -3076,7 +3076,7 @@ static int lmv_merge_attr(struct obd_export *exp,
 };
 
 static struct md_ops lmv_md_ops = {
-	.getstatus		= lmv_getstatus,
+	.get_root		= lmv_get_root,
 	.null_inode		= lmv_null_inode,
 	.close			= lmv_close,
 	.create			= lmv_create,
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_request.c b/drivers/staging/lustre/lustre/mdc/mdc_request.c
index 2e01f57..7457039 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_request.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_request.c
@@ -2747,7 +2747,7 @@ static int mdc_process_config(struct obd_device *obd, u32 len, void *buf)
 };
 
 static struct md_ops mdc_md_ops = {
-	.getstatus		= mdc_get_root,
+	.get_root		= mdc_get_root,
 	.null_inode		= mdc_null_inode,
 	.close			= mdc_close,
 	.create			= mdc_create,
-- 
1.8.3.1

* [lustre-devel] [PATCH 05/18] lustre: llite: fast read implementation
  2018-07-02 23:24 [lustre-devel] [PATCH 00/18] lustre: missing updates from lustre 2.9 James Simmons
                   ` (3 preceding siblings ...)
  2018-07-02 23:24 ` [lustre-devel] [PATCH 04/18] lustre: obd: rename md_getstatus() to md_get_root() James Simmons
@ 2018-07-02 23:24 ` James Simmons
  2018-07-03  4:10   ` NeilBrown
  2018-07-02 23:24 ` [lustre-devel] [PATCH 06/18] lustre: obd: reserve connection flag OBD_CONNECT2_FILE_SECCTX James Simmons
                   ` (13 subsequent siblings)
  18 siblings, 1 reply; 41+ messages in thread
From: James Simmons @ 2018-07-02 23:24 UTC (permalink / raw)
  To: lustre-devel

From: Jinshan Xiong <jinshan.xiong@gmail.com>

For a read operation, if a page is already in cache, it must be covered
by a DLM lock. We can take advantage of this by reading the cached page
directly, without creating a cl_io or requesting a DLM lock. The
traditional read path is used if fast read fails.
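
The fallback described above can be sketched in userspace C. Everything
here is a simplified assumption for illustration: the sketch_* names and
the toy page cache are hypothetical, and the slow path merely stands in
for the cl_io and DLM lock work done by the real read path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096
#define SKETCH_NPAGES    4

/* Hypothetical stand-in for a file's page cache. */
struct sketch_file {
	bool cached[SKETCH_NPAGES];
	char data[SKETCH_NPAGES][SKETCH_PAGE_SIZE];
	unsigned int slow_reads;	/* how often the slow path ran */
};

/* Fast path: copy cached pages only, stopping at the first miss. */
static size_t sketch_fast_read(struct sketch_file *f, char *buf,
			       size_t first, size_t npages)
{
	size_t done = 0;
	size_t i;

	for (i = first; i < first + npages && i < SKETCH_NPAGES; i++) {
		if (!f->cached[i])
			break;
		memcpy(buf + done, f->data[i], SKETCH_PAGE_SIZE);
		done += SKETCH_PAGE_SIZE;
	}
	return done;
}

/* Slow path: pretend to take a lock, populate the cache, then copy. */
static size_t sketch_slow_read(struct sketch_file *f, char *buf,
			       size_t first, size_t npages)
{
	size_t done = 0;
	size_t i;

	f->slow_reads++;
	for (i = first; i < first + npages && i < SKETCH_NPAGES; i++) {
		f->cached[i] = true;
		memcpy(buf + done, f->data[i], SKETCH_PAGE_SIZE);
		done += SKETCH_PAGE_SIZE;
	}
	return done;
}

/* Try the fast path first; only the remainder goes to the slow path. */
static size_t sketch_read(struct sketch_file *f, char *buf,
			  size_t first, size_t npages)
{
	size_t fast = sketch_fast_read(f, buf, first, npages);
	size_t pages_done = fast / SKETCH_PAGE_SIZE;

	if (pages_done == npages)
		return fast;
	return fast + sketch_slow_read(f, buf + fast, first + pages_done,
				       npages - pages_done);
}
```

A read that is fully served from cached pages never increments
slow_reads, which is the case the patch is trying to make cheap.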

This patch can improve small read performance significantly.
These are the performance data I collected:

+------------+----------------+-----------------+
|            | read bs=4k     | read bs=1M      |
+------------+----------------+-----------------+
| w/o patch  | 257 MB/s       | 1.1 GB/s        |
+------------+----------------+-----------------+
| w/ patch   | 1.2 GB/s       | 1.4 GB/s        |
+------------+----------------+-----------------+

Signed-off-by: Jinshan Xiong <jinshan.xiong@gmail.com>
WC-bug-id: https://jira.whamcloud.com/browse/LU-4257
Reviewed-on: http://review.whamcloud.com/20255
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/llite/file.c         | 104 +++++++++++++++++++--
 .../staging/lustre/lustre/llite/llite_internal.h   |  23 ++++-
 drivers/staging/lustre/lustre/llite/llite_lib.c    |   1 +
 drivers/staging/lustre/lustre/llite/llite_mmap.c   |  34 ++++++-
 drivers/staging/lustre/lustre/llite/lproc_llite.c  |  38 ++++++++
 drivers/staging/lustre/lustre/llite/rw.c           |  68 ++++++++++++--
 drivers/staging/lustre/lustre/llite/vvp_internal.h |   1 +
 7 files changed, 252 insertions(+), 17 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/file.c b/drivers/staging/lustre/lustre/llite/file.c
index 5f944ca..db18d1d 100644
--- a/drivers/staging/lustre/lustre/llite/file.c
+++ b/drivers/staging/lustre/lustre/llite/file.c
@@ -953,10 +953,21 @@ int ll_merge_attr(const struct lu_env *env, struct inode *inode)
 
 	ll_inode_size_lock(inode);
 
-	/* merge timestamps the most recently obtained from mds with
-	 * timestamps obtained from osts
+	/*
+	 * Merge the timestamps most recently obtained from the MDS with
+	 * the timestamps obtained from the OSTs.
+	 *
+	 * Do not overwrite atime of inode because it may be refreshed
+	 * by file_accessed() function. If the read was served by cache
+	 * data, there is no RPC to be sent so that atime may not be
+	 * transferred to OSTs at all. MDT only updates atime at close time
+	 * if it's at least 'mdd.*.atime_diff' older.
+	 * All in all, the atime in Lustre does not strictly comply with
+	 * POSIX. Solving this would require sending an RPC to the MDT for
+	 * each read, which would hurt performance.
 	 */
-	inode->i_atime.tv_sec = lli->lli_atime;
+	if (inode->i_atime.tv_sec < lli->lli_atime)
+		inode->i_atime.tv_sec = lli->lli_atime;
 	inode->i_mtime.tv_sec = lli->lli_mtime;
 	inode->i_ctime.tv_sec = lli->lli_ctime;
 
@@ -1096,7 +1107,7 @@ static void ll_io_init(struct cl_io *io, const struct file *file, int write)
 
 			range_locked = true;
 		}
-		ll_cl_add(file, env, io);
+		ll_cl_add(file, env, io, LCC_RW);
 		rc = cl_io_loop(env, io);
 		ll_cl_remove(file, env);
 		if (range_locked) {
@@ -1155,23 +1166,104 @@ static void ll_io_init(struct cl_io *io, const struct file *file, int write)
 	return result > 0 ? result : rc;
 }
 
+/**
+ * The purpose of fast read is to overcome per I/O overhead and improve IOPS
+ * especially for small I/O.
+ *
+ * To serve a read request, CLIO has to create and initialize a cl_io and
+ * then request a DLM lock. This has turned out to have significant overhead
+ * and affects the performance of small I/O dramatically.
+ *
+ * It's not necessary to create a cl_io for each I/O. With the help of read
+ * ahead, most of the pages being read are already in memory cache and we can
+ * read those pages directly: if the pages exist, the corresponding DLM lock
+ * must exist, so the page content must be valid.
+ *
+ * In fast read implementation, the llite speculatively finds and reads pages
+ * in memory cache. There are three scenarios for fast read:
+ *   - If the page exists and is uptodate, kernel VM will provide the data and
+ *     CLIO won't be involved;
+ *   - If the page was brought into memory by read ahead, it will be exported
+ *     and read ahead parameters will be updated;
+ *   - Otherwise the page is not in memory, we can't do fast read. Therefore,
+ *     it will go back and invoke normal read, i.e., a cl_io will be created
+ *     and DLM lock will be requested.
+ *
+ * POSIX compliance: POSIX standard states that read is intended to be atomic.
+ * Lustre read implementation is in line with Linux kernel read implementation
+ * and neither of them complies with POSIX standard in this matter. Fast read
+ * doesn't make this worse on a single node but may interleave write results
+ * from multiple nodes due to short read handling in ll_file_read_iter().
+ *
+ * @env   - lu_env
+ * @iocb  - kiocb from kernel
+ * @iter  - user space buffers where the data will be copied
+ *
+ * RETURN - number of bytes read, or error code if an error occurred.
+ */
+static ssize_t
+ll_do_fast_read(const struct lu_env *env, struct kiocb *iocb,
+		struct iov_iter *iter)
+{
+	ssize_t result;
+
+	if (!ll_sbi_has_fast_read(ll_i2sbi(file_inode(iocb->ki_filp))))
+		return 0;
+
+	/*
+	 * NB: we can't do direct IO for fast read because it will need a lock
+	 * to make IO engine happy.
+	 */
+	if (iocb->ki_filp->f_flags & O_DIRECT)
+		return 0;
+
+	ll_cl_add(iocb->ki_filp, env, NULL, LCC_RW);
+	result = generic_file_read_iter(iocb, iter);
+	ll_cl_remove(iocb->ki_filp, env);
+
+	/*
+	 * If the first page is not in cache, generic_file_read_iter() will
+	 * return -ENODATA.
+	 * See corresponding code in ll_readpage().
+	 */
+	if (result == -ENODATA)
+		result = 0;
+
+	if (result > 0)
+		ll_stats_ops_tally(ll_i2sbi(file_inode(iocb->ki_filp)),
+				   LPROC_LL_READ_BYTES, result);
+
+	return result;
+}
+
 static ssize_t ll_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 {
 	struct lu_env      *env;
 	struct vvp_io_args *args;
 	ssize_t	     result;
 	u16 refcheck;
+	ssize_t rc2;
 
 	env = cl_env_get(&refcheck);
 	if (IS_ERR(env))
 		return PTR_ERR(env);
 
+	result = ll_do_fast_read(env, iocb, to);
+	if (result < 0 || iov_iter_count(to) == 0)
+		goto out;
+
 	args = ll_env_args(env);
 	args->u.normal.via_iter = to;
 	args->u.normal.via_iocb = iocb;
 
-	result = ll_file_io_generic(env, args, iocb->ki_filp, CIT_READ,
-				    &iocb->ki_pos, iov_iter_count(to));
+	rc2 = ll_file_io_generic(env, args, iocb->ki_filp, CIT_READ,
+				 &iocb->ki_pos, iov_iter_count(to));
+	if (rc2 > 0)
+		result += rc2;
+	else if (result == 0)
+		result = rc2;
+
+out:
 	cl_env_put(env, &refcheck);
 	return result;
 }
diff --git a/drivers/staging/lustre/lustre/llite/llite_internal.h b/drivers/staging/lustre/lustre/llite/llite_internal.h
index 8770d10..86914c9 100644
--- a/drivers/staging/lustre/lustre/llite/llite_internal.h
+++ b/drivers/staging/lustre/lustre/llite/llite_internal.h
@@ -395,6 +395,7 @@ enum stats_track_type {
 #define LL_SBI_ALWAYS_PING	0x200000 /* always ping even if server
 					  * suppress_pings
 					  */
+#define LL_SBI_FAST_READ	0x400000 /* fast read support */
 
 #define LL_SBI_FLAGS {	\
 	"nolck",	\
@@ -419,6 +420,7 @@ enum stats_track_type {
 	"xattr_cache",	\
 	"norootsquash",	\
 	"always_ping",	\
+	"fast_read",    \
 }
 
 /*
@@ -646,6 +648,11 @@ static inline int ll_need_32bit_api(struct ll_sb_info *sbi)
 #endif
 }
 
+static inline bool ll_sbi_has_fast_read(struct ll_sb_info *sbi)
+{
+	return !!(sbi->ll_flags & LL_SBI_FAST_READ);
+}
+
 void ll_ras_enter(struct file *f);
 
 /* llite/lcommon_misc.c */
@@ -678,6 +685,8 @@ enum {
 	LPROC_LL_OPEN,
 	LPROC_LL_RELEASE,
 	LPROC_LL_MAP,
+	LPROC_LL_FAULT,
+	LPROC_LL_MKWRITE,
 	LPROC_LL_LLSEEK,
 	LPROC_LL_FSYNC,
 	LPROC_LL_READDIR,
@@ -732,9 +741,12 @@ int ll_md_blocking_ast(struct ldlm_lock *lock, struct ldlm_lock_desc *desc,
 int ll_readpage(struct file *file, struct page *page);
 void ll_readahead_init(struct inode *inode, struct ll_readahead_state *ras);
 int vvp_io_write_commit(const struct lu_env *env, struct cl_io *io);
-struct ll_cl_context *ll_cl_find(struct file *file);
-void ll_cl_add(struct file *file, const struct lu_env *env, struct cl_io *io);
+
+enum lcc_type;
+void ll_cl_add(struct file *file, const struct lu_env *env, struct cl_io *io,
+	       enum lcc_type type);
 void ll_cl_remove(struct file *file, const struct lu_env *env);
+struct ll_cl_context *ll_cl_find(struct file *file);
 
 extern const struct address_space_operations ll_aops;
 
@@ -891,15 +903,22 @@ struct vvp_io_args {
 	} u;
 };
 
+enum lcc_type {
+	LCC_RW = 1,
+	LCC_MMAP
+};
+
 struct ll_cl_context {
 	struct list_head	 lcc_list;
 	void	   *lcc_cookie;
 	const struct lu_env	*lcc_env;
 	struct cl_io   *lcc_io;
 	struct cl_page *lcc_page;
+	enum lcc_type		 lcc_type;
 };
 
 struct ll_thread_info {
+	struct iov_iter		lti_iter;
 	struct vvp_io_args   lti_args;
 	struct ra_io_arg     lti_ria;
 	struct ll_cl_context lti_io_ctx;
diff --git a/drivers/staging/lustre/lustre/llite/llite_lib.c b/drivers/staging/lustre/lustre/llite/llite_lib.c
index 90dff0a..6e47e5b 100644
--- a/drivers/staging/lustre/lustre/llite/llite_lib.c
+++ b/drivers/staging/lustre/lustre/llite/llite_lib.c
@@ -122,6 +122,7 @@ static struct ll_sb_info *ll_init_sbi(struct super_block *sb)
 	atomic_set(&sbi->ll_sa_running, 0);
 	atomic_set(&sbi->ll_agl_total, 0);
 	sbi->ll_flags |= LL_SBI_AGL_ENABLED;
+	sbi->ll_flags |= LL_SBI_FAST_READ;
 
 	/* root squash */
 	sbi->ll_squash.rsi_uid = 0;
diff --git a/drivers/staging/lustre/lustre/llite/llite_mmap.c b/drivers/staging/lustre/lustre/llite/llite_mmap.c
index 8cb8036..023d62e 100644
--- a/drivers/staging/lustre/lustre/llite/llite_mmap.c
+++ b/drivers/staging/lustre/lustre/llite/llite_mmap.c
@@ -277,6 +277,28 @@ static vm_fault_t ll_fault0(struct vm_area_struct *vma, struct vm_fault *vmf)
 	if (IS_ERR(env))
 		return VM_FAULT_ERROR;
 
+	if (ll_sbi_has_fast_read(ll_i2sbi(file_inode(vma->vm_file)))) {
+		/* do fast fault */
+		ll_cl_add(vma->vm_file, env, NULL, LCC_MMAP);
+		fault_ret = filemap_fault(vmf);
+		ll_cl_remove(vma->vm_file, env);
+
+		/*
+		 * - If there is no error, then the page was found in cache and
+		 *   uptodate;
+		 * - If VM_FAULT_RETRY is set, the page existed but failed to
+		 *   lock. It will return to kernel and retry;
+		 * - Otherwise, it should try normal fault under DLM lock.
+		 */
+		if ((fault_ret & VM_FAULT_RETRY) ||
+		    !(fault_ret & VM_FAULT_ERROR)) {
+			result = 0;
+			goto out;
+		}
+
+		fault_ret = 0;
+	}
+
 	io = ll_fault_io_init(env, vma, vmf->pgoff, &ra_flags);
 	if (IS_ERR(io)) {
 		fault_ret = to_fault_error(PTR_ERR(io));
@@ -293,7 +315,7 @@ static vm_fault_t ll_fault0(struct vm_area_struct *vma, struct vm_fault *vmf)
 		vio->u.fault.ft_flags_valid = false;
 
 		/* May call ll_readpage() */
-		ll_cl_add(vma->vm_file, env, io);
+		ll_cl_add(vma->vm_file, env, io, LCC_MMAP);
 
 		result = cl_io_loop(env, io);
 
@@ -326,6 +348,7 @@ static vm_fault_t ll_fault0(struct vm_area_struct *vma, struct vm_fault *vmf)
 
 static vm_fault_t ll_fault(struct vm_fault *vmf)
 {
+	struct vm_area_struct *vma = vmf->vma;
 	int count = 0;
 	bool printed = false;
 	vm_fault_t result;
@@ -338,10 +361,12 @@ static vm_fault_t ll_fault(struct vm_fault *vmf)
 	siginitsetinv(&new, sigmask(SIGKILL) | sigmask(SIGTERM));
 	sigprocmask(SIG_BLOCK, &new, &old);
 
+	ll_stats_ops_tally(ll_i2sbi(file_inode(vma->vm_file)),
+			   LPROC_LL_FAULT, 1);
+
 restart:
 	result = ll_fault0(vmf->vma, vmf);
-	LASSERT(!(result & VM_FAULT_LOCKED));
-	if (result == 0) {
+	if (!(result & (VM_FAULT_RETRY | VM_FAULT_ERROR | VM_FAULT_LOCKED))) {
 		struct page *vmpage = vmf->page;
 
 		/* check if this page has been truncated */
@@ -375,6 +400,9 @@ static vm_fault_t ll_page_mkwrite(struct vm_fault *vmf)
 	int err;
 	vm_fault_t ret;
 
+	ll_stats_ops_tally(ll_i2sbi(file_inode(vma->vm_file)),
+			   LPROC_LL_MKWRITE, 1);
+
 	file_update_time(vma->vm_file);
 	do {
 		retry = false;
diff --git a/drivers/staging/lustre/lustre/llite/lproc_llite.c b/drivers/staging/lustre/lustre/llite/lproc_llite.c
index 49bf1b7..9dcbe64 100644
--- a/drivers/staging/lustre/lustre/llite/lproc_llite.c
+++ b/drivers/staging/lustre/lustre/llite/lproc_llite.c
@@ -872,6 +872,41 @@ static ssize_t xattr_cache_store(struct kobject *kobj,
 }
 LUSTRE_RW_ATTR(xattr_cache);
 
+static ssize_t fast_read_show(struct kobject *kobj,
+			      struct attribute *attr,
+			      char *buf)
+{
+	struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info,
+					      ll_kobj);
+
+	return sprintf(buf, "%u\n", !!(sbi->ll_flags & LL_SBI_FAST_READ));
+}
+
+static ssize_t fast_read_store(struct kobject *kobj,
+			       struct attribute *attr,
+			       const char *buffer,
+			       size_t count)
+{
+	struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info,
+					      ll_kobj);
+	bool val;
+	int rc;
+
+	rc = kstrtobool(buffer, &val);
+	if (rc)
+		return rc;
+
+	spin_lock(&sbi->ll_lock);
+	if (val)
+		sbi->ll_flags |= LL_SBI_FAST_READ;
+	else
+		sbi->ll_flags &= ~LL_SBI_FAST_READ;
+	spin_unlock(&sbi->ll_lock);
+
+	return count;
+}
+LUSTRE_RW_ATTR(fast_read);
+
 static int ll_unstable_stats_seq_show(struct seq_file *m, void *v)
 {
 	struct super_block     *sb    = m->private;
@@ -1032,6 +1067,7 @@ static ssize_t ll_nosquash_nids_seq_write(struct file *file,
 	&lustre_attr_max_easize.attr,
 	&lustre_attr_default_easize.attr,
 	&lustre_attr_xattr_cache.attr,
+	&lustre_attr_fast_read.attr,
 	NULL,
 };
 
@@ -1068,6 +1104,8 @@ static void llite_sb_release(struct kobject *kobj)
 	{ LPROC_LL_OPEN,	   LPROCFS_TYPE_REGS, "open" },
 	{ LPROC_LL_RELEASE,	LPROCFS_TYPE_REGS, "close" },
 	{ LPROC_LL_MAP,	    LPROCFS_TYPE_REGS, "mmap" },
+	{ LPROC_LL_FAULT,		LPROCFS_TYPE_REGS, "page_fault" },
+	{ LPROC_LL_MKWRITE,		LPROCFS_TYPE_REGS, "page_mkwrite" },
 	{ LPROC_LL_LLSEEK,	 LPROCFS_TYPE_REGS, "seek" },
 	{ LPROC_LL_FSYNC,	  LPROCFS_TYPE_REGS, "fsync" },
 	{ LPROC_LL_READDIR,	LPROCFS_TYPE_REGS, "readdir" },
diff --git a/drivers/staging/lustre/lustre/llite/rw.c b/drivers/staging/lustre/lustre/llite/rw.c
index 3e008ce..59747da 100644
--- a/drivers/staging/lustre/lustre/llite/rw.c
+++ b/drivers/staging/lustre/lustre/llite/rw.c
@@ -1067,7 +1067,8 @@ struct ll_cl_context *ll_cl_find(struct file *file)
 	return found;
 }
 
-void ll_cl_add(struct file *file, const struct lu_env *env, struct cl_io *io)
+void ll_cl_add(struct file *file, const struct lu_env *env, struct cl_io *io,
+	       enum lcc_type type)
 {
 	struct ll_file_data *fd = LUSTRE_FPRIVATE(file);
 	struct ll_cl_context *lcc = &ll_env_info(env)->lti_io_ctx;
@@ -1077,6 +1078,7 @@ void ll_cl_add(struct file *file, const struct lu_env *env, struct cl_io *io)
 	lcc->lcc_cookie = current;
 	lcc->lcc_env = env;
 	lcc->lcc_io = io;
+	lcc->lcc_type = type;
 
 	write_lock(&fd->fd_lock);
 	list_add(&lcc->lcc_list, &fd->fd_lccs);
@@ -1094,10 +1096,10 @@ void ll_cl_remove(struct file *file, const struct lu_env *env)
 }
 
 static int ll_io_read_page(const struct lu_env *env, struct cl_io *io,
-			   struct cl_page *page)
+			   struct cl_page *page, struct file *file)
 {
 	struct inode *inode = vvp_object_inode(page->cp_obj);
-	struct ll_file_data *fd = vvp_env_io(env)->vui_fd;
+	struct ll_file_data *fd = LUSTRE_FPRIVATE(file);
 	struct ll_readahead_state *ras = &fd->fd_ras;
 	struct cl_2queue *queue  = &io->ci_queue;
 	struct ll_sb_info *sbi = ll_i2sbi(inode);
@@ -1109,7 +1111,8 @@ static int ll_io_read_page(const struct lu_env *env, struct cl_io *io,
 	uptodate = vpg->vpg_defer_uptodate;
 
 	if (sbi->ll_ra_info.ra_max_pages_per_file > 0 &&
-	    sbi->ll_ra_info.ra_max_pages > 0) {
+	    sbi->ll_ra_info.ra_max_pages > 0 &&
+	    !vpg->vpg_ra_updated) {
 		struct vvp_io *vio = vvp_env_io(env);
 		enum ras_update_flags flags = 0;
 
@@ -1168,13 +1171,66 @@ int ll_readpage(struct file *file, struct page *vmpage)
 
 	env = lcc->lcc_env;
 	io = lcc->lcc_io;
-	LASSERT(io->ci_state == CIS_IO_GOING);
+	if (!io) { /* fast read */
+		struct ll_file_data *fd = LUSTRE_FPRIVATE(file);
+		struct ll_readahead_state *ras = &fd->fd_ras;
+		struct inode *inode = file_inode(file);
+		struct vvp_page *vpg;
+
+		result = -ENODATA;
+
+		/*
+		 * TODO: need to verify the layout version to make sure
+		 * the page is not invalid due to layout change.
+		 */
+		page = cl_vmpage_page(vmpage, clob);
+		if (!page) {
+			unlock_page(vmpage);
+			return result;
+		}
+
+		vpg = cl2vvp_page(cl_object_page_slice(page->cp_obj, page));
+		if (vpg->vpg_defer_uptodate) {
+			enum ras_update_flags flags = LL_RAS_HIT;
+
+			if (lcc->lcc_type == LCC_MMAP)
+				flags |= LL_RAS_MMAP;
+
+			/*
+			 * For fast read, update the read ahead state only
+			 * on a cache hit, because a cache miss will be
+			 * handled by the slow read later.
+			 */
+			ras_update(ll_i2sbi(inode), inode, ras, vvp_index(vpg),
+				   flags);
+			/* avoid duplicate ras_update() call */
+			vpg->vpg_ra_updated = 1;
+
+			/*
+			 * Check if we can issue a readahead RPC, if that is
+			 * the case, we can't do fast IO because we will need
+			 * a cl_io to issue the RPC.
+			 */
+			if (ras->ras_window_start + ras->ras_window_len <
+			    ras->ras_next_readahead + PTLRPC_MAX_BRW_PAGES) {
+				/* export the page and skip io stack */
+				vpg->vpg_ra_used = 1;
+				cl_page_export(env, page, 1);
+				result = 0;
+			}
+		}
+
+		unlock_page(vmpage);
+		cl_page_put(env, page);
+		return result;
+	}
+
 	page = cl_page_find(env, clob, vmpage->index, vmpage, CPT_CACHEABLE);
 	if (!IS_ERR(page)) {
 		LASSERT(page->cp_type == CPT_CACHEABLE);
 		if (likely(!PageUptodate(vmpage))) {
 			cl_page_assume(env, io, page);
-			result = ll_io_read_page(env, io, page);
+			result = ll_io_read_page(env, io, page, file);
 		} else {
 			/* Page from a non-object file. */
 			unlock_page(vmpage);
diff --git a/drivers/staging/lustre/lustre/llite/vvp_internal.h b/drivers/staging/lustre/lustre/llite/vvp_internal.h
index 7d3abb4..70d62bf 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_internal.h
+++ b/drivers/staging/lustre/lustre/llite/vvp_internal.h
@@ -225,6 +225,7 @@ struct vvp_object {
 struct vvp_page {
 	struct cl_page_slice vpg_cl;
 	unsigned int	vpg_defer_uptodate:1,
+			vpg_ra_updated:1,
 			vpg_ra_used:1;
 	/** VM page */
 	struct page	  *vpg_page;
-- 
1.8.3.1

* [lustre-devel] [PATCH 06/18] lustre: obd: reserve connection flag OBD_CONNECT2_FILE_SECCTX
  2018-07-02 23:24 [lustre-devel] [PATCH 00/18] lustre: missing updates from lustre 2.9 James Simmons
                   ` (4 preceding siblings ...)
  2018-07-02 23:24 ` [lustre-devel] [PATCH 05/18] lustre: llite: fast read implementation James Simmons
@ 2018-07-02 23:24 ` James Simmons
  2018-07-03  4:20   ` NeilBrown
  2018-07-02 23:24 ` [lustre-devel] [PATCH 07/18] lustre: llite: restore fd_och when putting lease James Simmons
                   ` (12 subsequent siblings)
  18 siblings, 1 reply; 41+ messages in thread
From: James Simmons @ 2018-07-02 23:24 UTC (permalink / raw)
  To: lustre-devel

From: "John L. Hammond" <jhammond@whamcloud.com>

The connection flag OBD_CONNECT2_FILE_SECCTX will be set (in
ocd_connect_flags2) if an MDT supports setting the file security
context at create time.

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
WC-bug-id: https://jira.whamcloud.com/browse/LU-5560
Reviewed-on: http://review.whamcloud.com/19970
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
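Reviewer note (not part of the commit): the two-word flag decoding and
the "granted must be a subset of asked" check added below can be modelled
in user space roughly as follows. This is only a sketch under stated
assumptions: every demo_* name, the shortened name tables and the
DEMO_CONNECT_FLAGS2 value are hypothetical stand-ins, not the kernel
symbols; the real table has 64 first-word names and the real
OBD_CONNECT_FLAGS2 is the top bit.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for OBD_CONNECT_FLAGS2 (really the top bit). */
#define DEMO_CONNECT_FLAGS2 0x8ULL
#define DEMO_NR(a) ((unsigned int)(sizeof(a) / sizeof((a)[0])))

/* Shortened, illustrative name tables: one per flag word. */
static const char *const demo_names[] = {
	"read_only", "lov_index", "version", "second_flags"
};
static const char *const demo_names2[] = { "file_secctx" };

/* Append "sep + name" (no sep for the first item); never overruns buf. */
static size_t demo_emit(char *buf, size_t size, size_t len,
			const char *sep, const char *name)
{
	int n;

	if (len >= size)
		return len;
	n = snprintf(buf + len, size - len, "%s%s", len ? sep : "", name);
	if (n < 0)
		return len;
	if ((size_t)n >= size - len)
		return size - 1;	/* output was truncated */
	return len + (size_t)n;
}

/* Decode one 64-bit word; leftover bits are reported as "<prefix>0x...". */
static size_t demo_word2str(char *buf, size_t size, size_t len, uint64_t word,
			    const char *const *names, unsigned int nr,
			    const char *prefix, const char *sep)
{
	uint64_t known = 0;
	char tmp[40];
	unsigned int i;

	for (i = 0; i < nr && i < 64; i++) {
		uint64_t mask = 1ULL << i;

		known |= mask;
		if (word & mask)
			len = demo_emit(buf, size, len, sep, names[i]);
	}
	if (word & ~known) {
		snprintf(tmp, sizeof(tmp), "%s%#llx", prefix,
			 (unsigned long long)(word & ~known));
		len = demo_emit(buf, size, len, sep, tmp);
	}
	return len;
}

/* The second word is decoded only if the first word advertises it. */
static size_t demo_flags2str(char *buf, size_t size, uint64_t flags,
			     uint64_t flags2, const char *sep)
{
	size_t len;

	assert(buf && size > 0);
	buf[0] = '\0';
	len = demo_word2str(buf, size, 0, flags, demo_names,
			    DEMO_NR(demo_names), "unknown_", sep);
	if (!(flags & DEMO_CONNECT_FLAGS2) || flags2 == 0)
		return len;
	return demo_word2str(buf, size, len, flags2, demo_names2,
			     DEMO_NR(demo_names2), "unknown2_", sep);
}

/* The server may grant fewer bits than were asked for, never more. */
static int demo_granted_is_subset(uint64_t asked, uint64_t granted)
{
	return (granted & asked) == granted;
}
```

Tracking the known bits per table, instead of reusing the loop mask
after 64 shifts, is a choice made for this sketch so that leftover bits
are always reported; the patch itself is unchanged.
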
 .../lustre/include/uapi/linux/lustre/lustre_idl.h  |  4 ++
 .../staging/lustre/lustre/include/lustre_import.h  |  1 +
 drivers/staging/lustre/lustre/include/obd_class.h  |  3 +-
 drivers/staging/lustre/lustre/ldlm/ldlm_lib.c      |  1 +
 drivers/staging/lustre/lustre/llite/llite_lib.c    |  9 +++-
 .../lustre/lustre/obdclass/lprocfs_status.c        | 62 ++++++++++++++++++----
 drivers/staging/lustre/lustre/ptlrpc/import.c      | 10 ++++
 drivers/staging/lustre/lustre/ptlrpc/wiretest.c    |  2 +
 8 files changed, 80 insertions(+), 12 deletions(-)

diff --git a/drivers/staging/lustre/include/uapi/linux/lustre/lustre_idl.h b/drivers/staging/lustre/include/uapi/linux/lustre/lustre_idl.h
index 6defc6d..4e25521 100644
--- a/drivers/staging/lustre/include/uapi/linux/lustre/lustre_idl.h
+++ b/drivers/staging/lustre/include/uapi/linux/lustre/lustre_idl.h
@@ -695,6 +695,10 @@ struct ptlrpc_body_v2 {
 #define OBD_CONNECT_BULK_MBITS	 0x2000000000000000ULL
 #define OBD_CONNECT_OBDOPACK	 0x4000000000000000ULL /* compact OUT obdo */
 #define OBD_CONNECT_FLAGS2	 0x8000000000000000ULL /* second flags word */
+/* ocd_connect_flags2 flags */
+#define OBD_CONNECT2_FILE_SECCTX	0x1ULL		/* set file security
+							 * context at create
+							 */
 
 /* XXX README XXX:
  * Please DO NOT add flag values here before first ensuring that this same
diff --git a/drivers/staging/lustre/lustre/include/lustre_import.h b/drivers/staging/lustre/lustre/include/lustre_import.h
index ac3805e..a629f6b 100644
--- a/drivers/staging/lustre/lustre/include/lustre_import.h
+++ b/drivers/staging/lustre/lustre/include/lustre_import.h
@@ -307,6 +307,7 @@ struct obd_import {
 	__u32		     imp_connect_op;
 	struct obd_connect_data   imp_connect_data;
 	__u64		     imp_connect_flags_orig;
+	u64			imp_connect_flags2_orig;
 	int		       imp_connect_error;
 
 	__u32		     imp_msg_magic;
diff --git a/drivers/staging/lustre/lustre/include/obd_class.h b/drivers/staging/lustre/lustre/include/obd_class.h
index 20d07f8..adfe2ab 100644
--- a/drivers/staging/lustre/lustre/include/obd_class.h
+++ b/drivers/staging/lustre/lustre/include/obd_class.h
@@ -80,7 +80,8 @@ struct obd_device *class_devices_in_group(struct obd_uuid *grp_uuid,
 
 int class_notify_sptlrpc_conf(const char *fsname, int namelen);
 
-int obd_connect_flags2str(char *page, int count, __u64 flags, char *sep);
+int obd_connect_flags2str(char *page, int count, u64 flags, u64 flags2,
+			  const char *sep);
 
 int obd_zombie_impexp_init(void);
 void obd_zombie_impexp_stop(void);
diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_lib.c b/drivers/staging/lustre/lustre/ldlm/ldlm_lib.c
index 0aa4f23..07baea7 100644
--- a/drivers/staging/lustre/lustre/ldlm/ldlm_lib.c
+++ b/drivers/staging/lustre/lustre/ldlm/ldlm_lib.c
@@ -518,6 +518,7 @@ int client_connect_import(const struct lu_env *env,
 		if (is_mdc)
 			data->ocd_connect_flags |= OBD_CONNECT_MULTIMODRPCS;
 		imp->imp_connect_flags_orig = data->ocd_connect_flags;
+		imp->imp_connect_flags2_orig = data->ocd_connect_flags2;
 	}
 
 	rc = ptlrpc_connect_import(imp);
diff --git a/drivers/staging/lustre/lustre/llite/llite_lib.c b/drivers/staging/lustre/lustre/llite/llite_lib.c
index 6e47e5b..640205a 100644
--- a/drivers/staging/lustre/lustre/llite/llite_lib.c
+++ b/drivers/staging/lustre/lustre/llite/llite_lib.c
@@ -203,7 +203,10 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt)
 				  OBD_CONNECT_OPEN_BY_FID |
 				  OBD_CONNECT_DIR_STRIPE |
 				  OBD_CONNECT_BULK_MBITS |
-				  OBD_CONNECT_SUBTREE;
+				  OBD_CONNECT_SUBTREE |
+				  OBD_CONNECT_FLAGS2;
+
+	data->ocd_connect_flags2 = 0;
 
 	if (sbi->ll_flags & LL_SBI_LRU_RESIZE)
 		data->ocd_connect_flags |= OBD_CONNECT_LRU_RESIZE;
@@ -292,7 +295,7 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt)
 			goto out_md_fid;
 		}
 		obd_connect_flags2str(buf, PAGE_SIZE,
-				      valid ^ CLIENT_CONNECT_MDT_REQD, ",");
+				      valid ^ CLIENT_CONNECT_MDT_REQD, 0, ",");
 		LCONSOLE_ERROR_MSG(0x170,
 				   "Server %s does not support feature(s) needed for correct operation of this client (%s). Please upgrade server or downgrade client.\n",
 				   sbi->ll_md_exp->exp_obd->obd_name, buf);
@@ -375,6 +378,8 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt)
 				  OBD_CONNECT_PINGLESS | OBD_CONNECT_LFSCK |
 				  OBD_CONNECT_BULK_MBITS;
 
+	data->ocd_connect_flags2 = 0;
+
 	if (!OBD_FAIL_CHECK(OBD_FAIL_OSC_CONNECT_GRANT_PARAM))
 		data->ocd_connect_flags |= OBD_CONNECT_GRANT_PARAM;
 
diff --git a/drivers/staging/lustre/lustre/obdclass/lprocfs_status.c b/drivers/staging/lustre/lustre/obdclass/lprocfs_status.c
index dd88179..9f76d8a 100644
--- a/drivers/staging/lustre/lustre/obdclass/lprocfs_status.c
+++ b/drivers/staging/lustre/lustre/obdclass/lprocfs_status.c
@@ -45,6 +45,7 @@
 #include <linux/ctype.h>
 
 static const char * const obd_connect_names[] = {
+	/* flags names */
 	"read_only",
 	"lov_index",
 	"connect_from_mds",
@@ -109,23 +110,42 @@
 	"bulk_mbits",
 	"compact_obdo",
 	"second_flags",
+	/* flags2 names */
+	"file_secctx",
 	NULL
 };
 
-int obd_connect_flags2str(char *page, int count, __u64 flags, char *sep)
+int obd_connect_flags2str(char *page, int count, u64 flags, u64 flags2,
+			  const char *sep)
 {
-	__u64 mask = 1;
+	__u64 mask;
 	int i, ret = 0;
 
-	for (i = 0; obd_connect_names[i]; i++, mask <<= 1) {
+	for (i = 0, mask = 1; i < 64; i++, mask <<= 1) {
 		if (flags & mask)
 			ret += snprintf(page + ret, count - ret, "%s%s",
 					ret ? sep : "", obd_connect_names[i]);
 	}
+
 	if (flags & ~(mask - 1))
 		ret += snprintf(page + ret, count - ret,
 				"%sunknown flags %#llx",
 				ret ? sep : "", flags & ~(mask - 1));
+
+	if (!(flags & OBD_CONNECT_FLAGS2) || flags2 == 0)
+		return ret;
+
+	for (i = 64, mask = 1; obd_connect_names[i]; i++, mask <<= 1) {
+		if (flags2 & mask)
+			ret += snprintf(page + ret, count - ret, "%s%s",
+					ret ? sep : "", obd_connect_names[i]);
+	}
+
+	if (flags2 & ~(mask - 1))
+		ret += snprintf(page + ret, count - ret,
+				"%sunknown2_%#llx",
+				ret ? sep : "", flags2 & ~(mask - 1));
+
 	return ret;
 }
 EXPORT_SYMBOL(obd_connect_flags2str);
@@ -659,22 +679,43 @@ static int obd_import_flags2str(struct obd_import *imp, struct seq_file *m)
 
 #undef flags2str
 
-static void obd_connect_seq_flags2str(struct seq_file *m, __u64 flags, char *sep)
+static void obd_connect_seq_flags2str(struct seq_file *m, u64 flags,
+				      u64 flags2, const char *sep)
 {
-	__u64 mask = 1;
+	__u64 mask;
 	int i;
 	bool first = true;
 
-	for (i = 0; obd_connect_names[i]; i++, mask <<= 1) {
+	for (i = 0, mask = 1; i < 64; i++, mask <<= 1) {
 		if (flags & mask) {
 			seq_printf(m, "%s%s",
 				   first ? sep : "", obd_connect_names[i]);
 			first = false;
 		}
 	}
-	if (flags & ~(mask - 1))
+
+	if (flags & ~(mask - 1)) {
 		seq_printf(m, "%sunknown flags %#llx",
 			   first ? sep : "", flags & ~(mask - 1));
+		first = false;
+	}
+
+	if (!(flags & OBD_CONNECT_FLAGS2) || flags2 == 0)
+		return;
+
+	for (i = 64, mask = 1; obd_connect_names[i]; i++, mask <<= 1) {
+		if (flags2 & mask) {
+			seq_printf(m, "%s%s",
+				   first ? "" : sep, obd_connect_names[i]);
+			first = false;
+		}
+	}
+
+	if (flags2 & ~(mask - 1)) {
+		seq_printf(m, "%sunknown2_%#llx",
+			   first ? "" : sep, flags2 & ~(mask - 1));
+		first = false;
+	}
 }
 
 int lprocfs_rd_import(struct seq_file *m, void *data)
@@ -710,6 +751,7 @@ int lprocfs_rd_import(struct seq_file *m, void *data)
 		   ptlrpc_import_state_name(imp->imp_state),
 		   imp->imp_connect_data.ocd_instance);
 	obd_connect_seq_flags2str(m, imp->imp_connect_data.ocd_connect_flags,
+				  imp->imp_connect_data.ocd_connect_flags2,
 				  ", ");
 	seq_puts(m, " ]\n");
 	obd_connect_data_seqprint(m, ocd);
@@ -932,7 +974,7 @@ int lprocfs_rd_timeouts(struct seq_file *m, void *data)
 int lprocfs_rd_connect_flags(struct seq_file *m, void *data)
 {
 	struct obd_device *obd = data;
-	__u64 flags;
+	__u64 flags, flags2;
 	int rc;
 
 	rc = lprocfs_climp_check(obd);
@@ -940,8 +982,10 @@ int lprocfs_rd_connect_flags(struct seq_file *m, void *data)
 		return rc;
 
 	flags = obd->u.cli.cl_import->imp_connect_data.ocd_connect_flags;
+	flags2 = obd->u.cli.cl_import->imp_connect_data.ocd_connect_flags2;
 	seq_printf(m, "flags=%#llx\n", flags);
-	obd_connect_seq_flags2str(m, flags, "\n");
+	seq_printf(m, "flags2=%#llx\n", flags2);
+	obd_connect_seq_flags2str(m, flags, flags2, "\n");
 	seq_puts(m, "\n");
 	up_read(&obd->u.cli.cl_sem);
 	return 0;
diff --git a/drivers/staging/lustre/lustre/ptlrpc/import.c b/drivers/staging/lustre/lustre/ptlrpc/import.c
index 54ceac5..4db0d89 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/import.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/import.c
@@ -641,6 +641,7 @@ int ptlrpc_connect_import(struct obd_import *imp)
 	 * the server is updated on-the-fly we will get the new features.
 	 */
 	imp->imp_connect_data.ocd_connect_flags = imp->imp_connect_flags_orig;
+	imp->imp_connect_data.ocd_connect_flags2 = imp->imp_connect_flags2_orig;
 	/* Reset ocd_version each time so the server knows the exact versions */
 	imp->imp_connect_data.ocd_version = LUSTRE_VERSION_CODE;
 	imp->imp_msghdr_flags &= ~MSGHDR_AT_SUPPORT;
@@ -1019,6 +1020,15 @@ static int ptlrpc_connect_interpret(const struct lu_env *env,
 		goto out;
 	}
 
+	if ((ocd->ocd_connect_flags2 & imp->imp_connect_flags2_orig) !=
+	    ocd->ocd_connect_flags2) {
+		CERROR("%s: Server didn't grant requested subset of flags2: asked=%#llx granted=%#llx\n",
+		       imp->imp_obd->obd_name, imp->imp_connect_flags2_orig,
+		       ocd->ocd_connect_flags2);
+		rc = -EPROTO;
+		goto out;
+	}
+
 	old_connect_flags = exp_connect_flags(exp);
 	exp->exp_connect_data = *ocd;
 	imp->imp_obd->obd_self_export->exp_connect_data = *ocd;
diff --git a/drivers/staging/lustre/lustre/ptlrpc/wiretest.c b/drivers/staging/lustre/lustre/ptlrpc/wiretest.c
index 43931dd..dae1b09 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/wiretest.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/wiretest.c
@@ -1116,6 +1116,8 @@ void lustre_assert_wire_constants(void)
 		 OBD_CONNECT_OBDOPACK);
 	LASSERTF(OBD_CONNECT_FLAGS2 == 0x8000000000000000ULL, "found 0x%.16llxULL\n",
 		 OBD_CONNECT_FLAGS2);
+	LASSERTF(OBD_CONNECT2_FILE_SECCTX == 0x1ULL, "found 0x%.16llxULL\n",
+		 OBD_CONNECT2_FILE_SECCTX);
 	LASSERTF(OBD_CKSUM_CRC32 == 0x00000001UL, "found 0x%.8xUL\n",
 		 (unsigned int)OBD_CKSUM_CRC32);
 	LASSERTF(OBD_CKSUM_ADLER == 0x00000002UL, "found 0x%.8xUL\n",
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [lustre-devel] [PATCH 07/18] lustre: llite: restore fd_och when putting lease
  2018-07-02 23:24 [lustre-devel] [PATCH 00/18] lustre: missing updates from lustre 2.9 James Simmons
                   ` (5 preceding siblings ...)
  2018-07-02 23:24 ` [lustre-devel] [PATCH 06/18] lustre: obd: reserve connection flag OBD_CONNECT2_FILE_SECCTX James Simmons
@ 2018-07-02 23:24 ` James Simmons
  2018-07-02 23:24 ` [lustre-devel] [PATCH 08/18] lustre: security: send file security context for creates James Simmons
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 41+ messages in thread
From: James Simmons @ 2018-07-02 23:24 UTC (permalink / raw)
  To: lustre-devel

From: Henri Doreau <henri.doreau@cea.fr>

fd_och was not restored when putting back a file lease, which made it
impossible to get a lease, put it back and then take it again on the
same FD.

Signed-off-by: Henri Doreau <henri.doreau@cea.fr>
WC-bug-id: https://jira.whamcloud.com/browse/LU-8174
Reviewed-on: http://review.whamcloud.com/20331
Reviewed-by: Jinshan Xiong <jinshan.xiong@gmail.com>
Reviewed-by: Jean-Baptiste Riaux <riaux.jb@intel.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
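Reviewer note (not part of the commit): the ownership hand-off between
the per-inode open handle and fd_och can be modelled in user space
roughly as below. This is a simplified sketch: the demo_* types and
functions are hypothetical, only one open mode is modelled, and the
lli_och_mutex locking and the close RPC are left out (the release helper
just returns the handle that the caller would have to close).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct demo_handle {
	int id;
};

/* Models lli_mds_{read,write}_och and its use count for one open mode. */
struct demo_inode {
	struct demo_handle *och;
	unsigned int usecount;
};

/* Models fd->fd_och and fd->fd_lease_och. */
struct demo_file {
	struct demo_handle *fd_och;
	struct demo_handle *fd_lease_och;
};

/* Take the inode's open handle for this file; fail if it is shared. */
static int demo_och_acquire(struct demo_inode *ino, struct demo_file *f)
{
	if (f->fd_lease_och)
		return -EBUSY;		/* a lease is already held */

	if (!f->fd_och) {
		if (ino->usecount > 1)
			return -EBUSY;	/* more than one opener */
		assert(ino->och);
		f->fd_och = ino->och;
		ino->och = NULL;
		ino->usecount = 0;
	}
	return 0;
}

/*
 * Give the handle back to the inode. If someone else opened the file in
 * the meantime (broken lease), the inode already has a handle, so only
 * the use count is bumped and the file's old handle is returned for the
 * caller to close. Otherwise NULL is returned.
 */
static struct demo_handle *demo_och_release(struct demo_inode *ino,
					    struct demo_file *f)
{
	struct demo_handle *to_close = NULL;

	if (ino->och) {
		to_close = f->fd_och;
		ino->usecount++;
	} else {
		ino->och = f->fd_och;
		ino->usecount = 1;
	}
	f->fd_och = NULL;
	return to_close;
}
```

With the release step in place, an acquire, release, acquire sequence on
the same file succeeds, which is the case this patch is meant to fix.
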
 drivers/staging/lustre/lustre/llite/file.c | 132 +++++++++++++++++++++--------
 1 file changed, 97 insertions(+), 35 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/file.c b/drivers/staging/lustre/lustre/llite/file.c
index db18d1d..d570232 100644
--- a/drivers/staging/lustre/lustre/llite/file.c
+++ b/drivers/staging/lustre/lustre/llite/file.c
@@ -705,6 +705,97 @@ static int ll_md_blocking_lease_ast(struct ldlm_lock *lock,
 }
 
 /**
+ * When setting a lease on a file, we take ownership of the lli_mds_*_och
+ * and save it as fd->fd_och so as to force client to reopen the file even
+ * if it has an open lock in cache already.
+ */
+static int ll_lease_och_acquire(struct inode *inode, struct file *file,
+				struct lustre_handle *old_handle)
+{
+	struct ll_file_data *fd = LUSTRE_FPRIVATE(file);
+	struct ll_inode_info *lli = ll_i2info(inode);
+	struct obd_client_handle **och_p;
+	u64 *och_usecount;
+	int rc = 0;
+
+	/* Get the openhandle of the file */
+	mutex_lock(&lli->lli_och_mutex);
+	if (fd->fd_lease_och) {
+		rc = -EBUSY;
+		goto out_unlock;
+	}
+
+	if (!fd->fd_och) {
+		if (file->f_mode & FMODE_WRITE) {
+			LASSERT(lli->lli_mds_write_och);
+			och_p = &lli->lli_mds_write_och;
+			och_usecount = &lli->lli_open_fd_write_count;
+		} else {
+			LASSERT(lli->lli_mds_read_och);
+			och_p = &lli->lli_mds_read_och;
+			och_usecount = &lli->lli_open_fd_read_count;
+		}
+
+		if (*och_usecount > 1) {
+			rc = -EBUSY;
+			goto out_unlock;
+		}
+
+		fd->fd_och = *och_p;
+		*och_usecount = 0;
+		*och_p = NULL;
+	}
+
+	*old_handle = fd->fd_och->och_fh;
+
+out_unlock:
+	mutex_unlock(&lli->lli_och_mutex);
+	return rc;
+}
+
+/**
+ * Release ownership on lli_mds_*_och when putting back a file lease.
+ */
+static int ll_lease_och_release(struct inode *inode, struct file *file)
+{
+	struct ll_file_data *fd = LUSTRE_FPRIVATE(file);
+	struct ll_inode_info *lli = ll_i2info(inode);
+	struct obd_client_handle *old_och = NULL;
+	struct obd_client_handle **och_p;
+	u64 *och_usecount;
+	int rc = 0;
+
+	mutex_lock(&lli->lli_och_mutex);
+	if (file->f_mode & FMODE_WRITE) {
+		och_p = &lli->lli_mds_write_och;
+		och_usecount = &lli->lli_open_fd_write_count;
+	} else {
+		och_p = &lli->lli_mds_read_och;
+		och_usecount = &lli->lli_open_fd_read_count;
+	}
+
+	/*
+	 * The file may have been opened by another process (broken lease) so
+	 * *och_p is not NULL. In this case we should simply increase usecount
+	 * and close fd_och.
+	 */
+	if (*och_p) {
+		old_och = fd->fd_och;
+		(*och_usecount)++;
+	} else {
+		*och_p = fd->fd_och;
+		*och_usecount = 1;
+	}
+	fd->fd_och = NULL;
+	mutex_unlock(&lli->lli_och_mutex);
+
+	if (old_och)
+		rc = ll_close_inode_openhandle(inode, old_och, 0, NULL);
+
+	return rc;
+}
+
+/**
  * Acquire a lease and open the file.
  */
 static struct obd_client_handle *
@@ -724,45 +815,12 @@ static int ll_md_blocking_lease_ast(struct ldlm_lock *lock,
 		return ERR_PTR(-EINVAL);
 
 	if (file) {
-		struct ll_inode_info *lli = ll_i2info(inode);
-		struct ll_file_data *fd = LUSTRE_FPRIVATE(file);
-		struct obd_client_handle **och_p;
-		__u64 *och_usecount;
-
 		if (!(fmode & file->f_mode) || (file->f_mode & FMODE_EXEC))
 			return ERR_PTR(-EPERM);
 
-		/* Get the openhandle of the file */
-		rc = -EBUSY;
-		mutex_lock(&lli->lli_och_mutex);
-		if (fd->fd_lease_och) {
-			mutex_unlock(&lli->lli_och_mutex);
-			return ERR_PTR(rc);
-		}
-
-		if (!fd->fd_och) {
-			if (file->f_mode & FMODE_WRITE) {
-				LASSERT(lli->lli_mds_write_och);
-				och_p = &lli->lli_mds_write_och;
-				och_usecount = &lli->lli_open_fd_write_count;
-			} else {
-				LASSERT(lli->lli_mds_read_och);
-				och_p = &lli->lli_mds_read_och;
-				och_usecount = &lli->lli_open_fd_read_count;
-			}
-			if (*och_usecount == 1) {
-				fd->fd_och = *och_p;
-				*och_p = NULL;
-				*och_usecount = 0;
-				rc = 0;
-			}
-		}
-		mutex_unlock(&lli->lli_och_mutex);
-		if (rc < 0) /* more than 1 opener */
+		rc = ll_lease_och_acquire(inode, file, &old_handle);
+		if (rc)
 			return ERR_PTR(rc);
-
-		LASSERT(fd->fd_och);
-		old_handle = fd->fd_och->och_fh;
 	}
 
 	och = kzalloc(sizeof(*och), GFP_NOFS);
@@ -2333,6 +2391,10 @@ static int ll_ladvise(struct inode *inode, struct file *file, __u64 flags,
 			if (rc < 0)
 				return rc;
 
+			rc = ll_lease_och_release(inode, file);
+			if (rc < 0)
+				return rc;
+
 			if (lease_broken)
 				fmode = 0;
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [lustre-devel] [PATCH 08/18] lustre: security: send file security context for creates
  2018-07-02 23:24 [lustre-devel] [PATCH 00/18] lustre: missing updates from lustre 2.9 James Simmons
                   ` (6 preceding siblings ...)
  2018-07-02 23:24 ` [lustre-devel] [PATCH 07/18] lustre: llite: restore fd_och when putting lease James Simmons
@ 2018-07-02 23:24 ` James Simmons
  2018-07-03  4:26   ` NeilBrown
  2018-07-02 23:24 ` [lustre-devel] [PATCH 09/18] lustre: llite: ladvise protocol changes James Simmons
                   ` (10 subsequent siblings)
  18 siblings, 1 reply; 41+ messages in thread
From: James Simmons @ 2018-07-02 23:24 UTC (permalink / raw)
  To: lustre-devel

From: "John L. Hammond" <jhammond@whamcloud.com>

Send file security context to MDT along with create RPCs. This closes
the insecure window between creation and setting of the security
context that existed previously. It also avoids a potential LDLM hang
which arises from ll_create_it() when we send an MDS_SETXATTR RPC while
holding the lookup+layout lock returned from open.

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
WC-bug-id: https://jira.whamcloud.com/browse/LU-5560
Reviewed-on: http://review.whamcloud.com/19971
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
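Note on buffer sizing: the two new request fields are sized before the
request is packed (name buffer = strlen(name) + 1, or 0 when no context
was generated; context buffer = op_file_secctx_size), and
mdc_file_secctx_pack() then asserts those sizes and copies the data.
The following is only a minimal userspace sketch of that sizing and
copy rule. The struct secctx_args type and the secctx_name_buf_size()
and secctx_pack() helpers are hypothetical names used for illustration;
they are not part of the patch and do not use the req_capsule API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the three md_op_data fields added here. */
struct secctx_args {
	const char *name;	/* e.g. "security.selinux", NULL if no context */
	const void *ctx;	/* opaque security context bytes */
	uint32_t ctx_size;	/* length of ctx in bytes */
};

/* Name buffer size: the string plus its NUL terminator, or 0 if absent. */
static size_t secctx_name_buf_size(const struct secctx_args *a)
{
	return a->name ? strlen(a->name) + 1 : 0;
}

/*
 * Copy the name and context into buffers that the caller has already
 * sized. Returns the number of bytes written, or 0 when there is no
 * name, in which case nothing is copied.
 */
static size_t secctx_pack(const struct secctx_args *a,
			  char *name_buf, size_t name_buf_size,
			  void *ctx_buf, size_t ctx_buf_size)
{
	if (!a->name)
		return 0;

	/* The buffers must match the sizes computed before packing. */
	assert(name_buf_size == strlen(a->name) + 1);
	memcpy(name_buf, a->name, name_buf_size);

	assert(ctx_buf_size == a->ctx_size);
	memcpy(ctx_buf, a->ctx, ctx_buf_size);

	return name_buf_size + ctx_buf_size;
}
```

With name "security.selinux" and a 3-byte context, the sketch sizes the
name buffer at 17 bytes and writes 20 bytes in total. With a NULL name
it writes nothing, which matches the early return in the patch.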
 .../lustre/lustre/include/lustre_req_layout.h      |  4 +-
 drivers/staging/lustre/lustre/include/obd.h        |  5 ++
 drivers/staging/lustre/lustre/llite/dir.c          | 54 ++++++++++++++++------
 .../staging/lustre/lustre/llite/llite_internal.h   | 11 ++++-
 drivers/staging/lustre/lustre/llite/llite_lib.c    | 14 ++++++
 drivers/staging/lustre/lustre/llite/namei.c        | 40 ++++++++++++++--
 .../staging/lustre/lustre/llite/xattr_security.c   | 38 ++++++++++++++-
 drivers/staging/lustre/lustre/mdc/mdc_internal.h   |  4 ++
 drivers/staging/lustre/lustre/mdc/mdc_lib.c        | 32 +++++++++++++
 drivers/staging/lustre/lustre/mdc/mdc_locks.c      |  7 +++
 drivers/staging/lustre/lustre/mdc/mdc_reint.c      |  7 +++
 drivers/staging/lustre/lustre/ptlrpc/layout.c      | 28 +++++++++--
 12 files changed, 218 insertions(+), 26 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/lustre_req_layout.h b/drivers/staging/lustre/lustre/include/lustre_req_layout.h
index 9d718b7..3387ab2 100644
--- a/drivers/staging/lustre/lustre/include/lustre_req_layout.h
+++ b/drivers/staging/lustre/lustre/include/lustre_req_layout.h
@@ -60,7 +60,7 @@ enum req_location {
 };
 
 /* Maximal number of fields (buffers) in a request message. */
-#define REQ_MAX_FIELD_NR  9
+#define REQ_MAX_FIELD_NR 10
 
 struct req_capsule {
 	struct ptlrpc_request   *rc_req;
@@ -236,6 +236,8 @@ void req_capsule_shrink(struct req_capsule *pill,
 extern struct req_msg_field RMF_GETINFO_VALLEN;
 extern struct req_msg_field RMF_GETINFO_KEY;
 extern struct req_msg_field RMF_CLOSE_DATA;
+extern struct req_msg_field RMF_FILE_SECCTX_NAME;
+extern struct req_msg_field RMF_FILE_SECCTX;
 
 /*
  * connection handle received in MDS_CONNECT request.
diff --git a/drivers/staging/lustre/lustre/include/obd.h b/drivers/staging/lustre/lustre/include/obd.h
index d4574ef..62f85a1 100644
--- a/drivers/staging/lustre/lustre/include/obd.h
+++ b/drivers/staging/lustre/lustre/include/obd.h
@@ -755,6 +755,11 @@ struct md_op_data {
 	__u64			op_data_version;
 	struct lustre_handle	op_lease_handle;
 
+	/* File security context, for creates. */
+	const char	       *op_file_secctx_name;
+	void		       *op_file_secctx;
+	u32			op_file_secctx_size;
+
 	/* default stripe offset */
 	__u32			op_default_stripe_offset;
 };
diff --git a/drivers/staging/lustre/lustre/llite/dir.c b/drivers/staging/lustre/lustre/llite/dir.c
index 52a8ecc..987f4b2 100644
--- a/drivers/staging/lustre/lustre/llite/dir.c
+++ b/drivers/staging/lustre/lustre/llite/dir.c
@@ -398,7 +398,7 @@ static int ll_send_mgc_param(struct obd_export *mgc, char *string)
 /**
  * Create striped directory with specified stripe(@lump)
  *
- * param[in] parent	the parent of the directory.
+ * param[in] dparent	the parent of the directory.
  * param[in] lump	the specified stripes.
  * param[in] dirname	the name of the directory.
  * param[in] mode	the specified mode of the directory.
@@ -406,14 +406,23 @@ static int ll_send_mgc_param(struct obd_export *mgc, char *string)
  * retval		=0 if striped directory is being created successfully.
  *			<0 if the creation is failed.
  */
-static int ll_dir_setdirstripe(struct inode *parent, struct lmv_user_md *lump,
+static int ll_dir_setdirstripe(struct dentry *dparent, struct lmv_user_md *lump,
 			       const char *dirname, umode_t mode)
 {
+	struct inode *parent = dparent->d_inode;
 	struct ptlrpc_request *request = NULL;
 	struct md_op_data *op_data;
 	struct ll_sb_info *sbi = ll_i2sbi(parent);
 	struct inode *inode = NULL;
-	struct dentry dentry;
+	struct dentry dentry = {
+		.d_parent = dparent,
+		.d_name = {
+			.name = dirname,
+			.len = strlen(dirname),
+			.hash = full_name_hash(dparent, dirname,
+					       strlen(dirname)),
+		},
+	};
 	int err;
 
 	if (unlikely(lump->lum_magic != LMV_USER_MAGIC))
@@ -436,9 +445,21 @@ static int ll_dir_setdirstripe(struct inode *parent, struct lmv_user_md *lump,
 	op_data = ll_prep_md_op_data(NULL, parent, NULL, dirname,
 				     strlen(dirname), mode, LUSTRE_OPC_MKDIR,
 				     lump);
-	if (IS_ERR(op_data)) {
-		err = PTR_ERR(op_data);
-		goto err_exit;
+	if (IS_ERR(op_data))
+		return PTR_ERR(op_data);
+
+	if (sbi->ll_flags & LL_SBI_FILE_SECCTX) {
+		/*
+		 * selinux_dentry_init_security() uses dentry->d_parent and name
+		 * to determine the security context for the file. So our fake
+		 * dentry should be real enough for this purpose.
+		 */
+		err = ll_dentry_init_security(&dentry, mode, &dentry.d_name,
+					      &op_data->op_file_secctx_name,
+					      &op_data->op_file_secctx,
+					      &op_data->op_file_secctx_size);
+		if (err < 0)
+			goto out_op_data;
 	}
 
 	op_data->op_cli_flags |= CLI_SET_MEA;
@@ -446,20 +467,26 @@ static int ll_dir_setdirstripe(struct inode *parent, struct lmv_user_md *lump,
 			from_kuid(&init_user_ns, current_fsuid()),
 			from_kgid(&init_user_ns, current_fsgid()),
 			current_cap(), 0, &request);
-	ll_finish_md_op_data(op_data);
+	if (err)
+		goto out_request;
 
 	err = ll_prep_inode(&inode, request, parent->i_sb, NULL);
 	if (err)
-		goto err_exit;
+		goto out_inode;
 
-	memset(&dentry, 0, sizeof(dentry));
 	dentry.d_inode = inode;
 
-	err = ll_init_security(&dentry, inode, parent);
-	iput(inode);
+	if (!(sbi->ll_flags & LL_SBI_FILE_SECCTX))
+		err = ll_inode_init_security(&dentry, inode, parent);
 
-err_exit:
+out_inode:
+	if (inode)
+		iput(inode);
+out_request:
 	ptlrpc_req_finished(request);
+out_op_data:
+	ll_finish_md_op_data(op_data);
+
 	return err;
 }
 
@@ -1033,6 +1060,7 @@ static char *ll_getname(const char __user *filename)
 
 static long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 {
+	struct dentry *dentry = file_dentry(file);
 	struct inode *inode = file_inode(file);
 	struct ll_sb_info *sbi = ll_i2sbi(inode);
 	struct obd_ioctl_data *data;
@@ -1146,7 +1174,7 @@ static long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 #else
 		mode = data->ioc_type;
 #endif
-		rc = ll_dir_setdirstripe(inode, lum, filename, mode);
+		rc = ll_dir_setdirstripe(dentry, lum, filename, mode);
 lmv_out_free:
 		kvfree(buf);
 		return rc;
diff --git a/drivers/staging/lustre/lustre/llite/llite_internal.h b/drivers/staging/lustre/lustre/llite/llite_internal.h
index 86914c9..8399501 100644
--- a/drivers/staging/lustre/lustre/llite/llite_internal.h
+++ b/drivers/staging/lustre/lustre/llite/llite_internal.h
@@ -261,8 +261,11 @@ static inline void ll_layout_version_set(struct ll_inode_info *lli, __u32 gen)
 int ll_xattr_cache_get(struct inode *inode, const char *name,
 		       char *buffer, size_t size, __u64 valid);
 
-int ll_init_security(struct dentry *dentry, struct inode *inode,
-		     struct inode *dir);
+int ll_dentry_init_security(struct dentry *dentry, int mode, struct qstr *name,
+			    const char **secctx_name, void **secctx,
+			    u32 *secctx_size);
+int ll_inode_init_security(struct dentry *dentry, struct inode *inode,
+			   struct inode *dir);
 
 /*
  * Locking to guarantee consistency of non-atomic updates to long long i_size,
@@ -396,6 +399,9 @@ enum stats_track_type {
 					  * suppress_pings
 					  */
 #define LL_SBI_FAST_READ	0x400000 /* fast read support */
+#define LL_SBI_FILE_SECCTX	0x800000 /* set file security context at
+					  * create
+					  */
 
 #define LL_SBI_FLAGS {	\
 	"nolck",	\
@@ -421,6 +427,7 @@ enum stats_track_type {
 	"norootsquash",	\
 	"always_ping",	\
 	"fast_read",    \
+	"file_secctx",	\
 }
 
 /*
diff --git a/drivers/staging/lustre/lustre/llite/llite_lib.c b/drivers/staging/lustre/lustre/llite/llite_lib.c
index 640205a..7a414e2 100644
--- a/drivers/staging/lustre/lustre/llite/llite_lib.c
+++ b/drivers/staging/lustre/lustre/llite/llite_lib.c
@@ -42,6 +42,7 @@
 #include <linux/types.h>
 #include <linux/mm.h>
 #include <linux/random.h>
+#include <linux/security.h>
 #include <linux/fs_struct.h>
 
 #include <uapi/linux/lustre/lustre_ioctl.h>
@@ -149,6 +150,12 @@ static void ll_free_sbi(struct super_block *sb)
 	kfree(sbi);
 }
 
+static inline int obd_connect_has_secctx(struct obd_connect_data *data)
+{
+	return data->ocd_connect_flags & OBD_CONNECT_FLAGS2 &&
+	       data->ocd_connect_flags2 & OBD_CONNECT2_FILE_SECCTX;
+}
+
 static int client_common_fill_super(struct super_block *sb, char *md, char *dt)
 {
 	struct inode *root = NULL;
@@ -240,6 +247,8 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt)
 	if (sbi->ll_flags & LL_SBI_ALWAYS_PING)
 		data->ocd_connect_flags &= ~OBD_CONNECT_PINGLESS;
 
+	data->ocd_connect_flags2 |= OBD_CONNECT2_FILE_SECCTX;
+
 	data->ocd_brw_size = MD_MAX_BRW_SIZE;
 
 	err = obd_connect(NULL, &sbi->ll_md_exp, obd, &sbi->ll_sb_uuid,
@@ -347,6 +356,9 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt)
 	if (data->ocd_connect_flags & OBD_CONNECT_LAYOUTLOCK)
 		sbi->ll_flags |= LL_SBI_LAYOUT_LOCK;
 
+	if (obd_connect_has_secctx(data))
+		sbi->ll_flags |= LL_SBI_FILE_SECCTX;
+
 	if (data->ocd_ibits_known & MDS_INODELOCK_XATTR) {
 		if (!(data->ocd_connect_flags & OBD_CONNECT_MAX_EASIZE)) {
 			LCONSOLE_INFO(
@@ -2370,6 +2382,8 @@ struct md_op_data *ll_prep_md_op_data(struct md_op_data *op_data,
 
 void ll_finish_md_op_data(struct md_op_data *op_data)
 {
+	security_release_secctx(op_data->op_file_secctx,
+				op_data->op_file_secctx_size);
 	kfree(op_data);
 }
 
diff --git a/drivers/staging/lustre/lustre/llite/namei.c b/drivers/staging/lustre/lustre/llite/namei.c
index abcb1c8..134cc31 100644
--- a/drivers/staging/lustre/lustre/llite/namei.c
+++ b/drivers/staging/lustre/lustre/llite/namei.c
@@ -577,13 +577,28 @@ static struct dentry *ll_lookup_it(struct inode *parent, struct dentry *dentry,
 
 	op_data = ll_prep_md_op_data(NULL, parent, NULL, dentry->d_name.name,
 				     dentry->d_name.len, 0, opc, NULL);
-	if (IS_ERR(op_data))
-		return (void *)op_data;
+	if (IS_ERR(op_data)) {
+		retval = ERR_CAST(op_data);
+		goto out;
+	}
 
 	/* enforce umask if acl disabled or MDS doesn't support umask */
 	if (!IS_POSIXACL(parent) || !exp_connect_umask(ll_i2mdexp(parent)))
 		it->it_create_mode &= ~current_umask();
 
+	if (it->it_op & IT_CREAT &&
+	    ll_i2sbi(parent)->ll_flags & LL_SBI_FILE_SECCTX) {
+		rc = ll_dentry_init_security(dentry, it->it_create_mode,
+					     &dentry->d_name,
+					     &op_data->op_file_secctx_name,
+					     &op_data->op_file_secctx,
+					     &op_data->op_file_secctx_size);
+		if (rc < 0) {
+			retval = ERR_PTR(rc);
+			goto out;
+		}
+	}
+
 	rc = md_intent_lock(ll_i2mdexp(parent), op_data, it, &req,
 			    &ll_md_blocking_ast, 0);
 	/*
@@ -838,7 +853,10 @@ static int ll_create_it(struct inode *dir, struct dentry *dentry,
 
 	d_instantiate(dentry, inode);
 
-	return ll_init_security(dentry, inode, dir);
+	if (!(ll_i2sbi(inode)->ll_flags & LL_SBI_FILE_SECCTX))
+		rc = ll_inode_init_security(dentry, inode, dir);
+
+	return rc;
 }
 
 void ll_update_times(struct ptlrpc_request *request, struct inode *inode)
@@ -882,11 +900,21 @@ static int ll_new_node(struct inode *dir, struct dentry *dentry,
 		goto err_exit;
 	}
 
+	if (sbi->ll_flags & LL_SBI_FILE_SECCTX) {
+		err = ll_dentry_init_security(dentry, mode, &dentry->d_name,
+					      &op_data->op_file_secctx_name,
+					      &op_data->op_file_secctx,
+					      &op_data->op_file_secctx_size);
+		if (err < 0)
+			goto err_exit;
+	}
+
 	err = md_create(sbi->ll_md_exp, op_data, tgt, tgt_len, mode,
 			from_kuid(&init_user_ns, current_fsuid()),
 			from_kgid(&init_user_ns, current_fsgid()),
 			current_cap(), rdev, &request);
 	ll_finish_md_op_data(op_data);
+	op_data = NULL;
 	if (err < 0 && err != -EREMOTE)
 		goto err_exit;
 
@@ -934,11 +962,15 @@ static int ll_new_node(struct inode *dir, struct dentry *dentry,
 
 	d_instantiate(dentry, inode);
 
-	err = ll_init_security(dentry, inode, dir);
+	if (!(sbi->ll_flags & LL_SBI_FILE_SECCTX))
+		err = ll_inode_init_security(dentry, inode, dir);
 err_exit:
 	if (request)
 		ptlrpc_req_finished(request);
 
+	if (!IS_ERR_OR_NULL(op_data))
+		ll_finish_md_op_data(op_data);
+
 	return err;
 }
 
diff --git a/drivers/staging/lustre/lustre/llite/xattr_security.c b/drivers/staging/lustre/lustre/llite/xattr_security.c
index 93ec075..b419d8f 100644
--- a/drivers/staging/lustre/lustre/llite/xattr_security.c
+++ b/drivers/staging/lustre/lustre/llite/xattr_security.c
@@ -36,6 +36,41 @@
 #include <linux/xattr.h>
 #include "llite_internal.h"
 
+/*
+ * Check for LL_SBI_FILE_SECCTX before calling.
+ */
+int ll_dentry_init_security(struct dentry *dentry, int mode, struct qstr *name,
+			    const char **secctx_name, void **secctx,
+			    u32 *secctx_size)
+{
+	int rc;
+
+	/*
+	 * security_dentry_init_security() is strange. Like
+	 * security_inode_init_security() it may return a context (provided a
+	 * Linux security module is enabled) but unlike
+	 * security_inode_init_security() it does not return to us the name of
+	 * the extended attribute to store the context under (for example
+	 * "security.selinux"). So we only call it when we think we know what
+	 * the name of the extended attribute will be. This is OK-ish since
+	 * SELinux is the only module that implements
+	 * security_dentry_init_security(). Note that the NFS client code just
+	 * calls it and assumes that if anything is returned then it must come
+	 * from SELinux.
+	 */
+	if (!selinux_is_enabled())
+		return 0;
+
+	rc = security_dentry_init_security(dentry, mode, name, secctx,
+					   secctx_size);
+	if (rc < 0)
+		return rc;
+
+	*secctx_name = XATTR_NAME_SELINUX;
+
+	return 0;
+}
+
 /**
  * A helper function for ll_security_inode_init_security()
  * that takes care of setting xattrs
@@ -86,7 +121,8 @@
  * \retval < 0      failure to get security context or set xattr
  */
 int
-ll_init_security(struct dentry *dentry, struct inode *inode, struct inode *dir)
+ll_inode_init_security(struct dentry *dentry, struct inode *inode,
+		       struct inode *dir)
 {
 	if (!selinux_is_enabled())
 		return 0;
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_internal.h b/drivers/staging/lustre/lustre/mdc/mdc_internal.h
index 28924e9..f19b0ce 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_internal.h
+++ b/drivers/staging/lustre/lustre/mdc/mdc_internal.h
@@ -54,6 +54,10 @@ void mdc_create_pack(struct ptlrpc_request *req, struct md_op_data *op_data,
 void mdc_open_pack(struct ptlrpc_request *req, struct md_op_data *op_data,
 		   umode_t mode, __u64 rdev, __u64 flags, const void *data,
 		   size_t datalen);
+void mdc_file_secctx_pack(struct ptlrpc_request *req,
+			  const char *secctx_name,
+			  const void *secctx, size_t secctx_size);
+
 void mdc_unlink_pack(struct ptlrpc_request *req, struct md_op_data *op_data);
 void mdc_link_pack(struct ptlrpc_request *req, struct md_op_data *op_data);
 void mdc_rename_pack(struct ptlrpc_request *req, struct md_op_data *op_data,
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_lib.c b/drivers/staging/lustre/lustre/mdc/mdc_lib.c
index 9cb4d24..fc5a51d 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_lib.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_lib.c
@@ -110,6 +110,30 @@ static void mdc_pack_name(struct ptlrpc_request *req,
 	LASSERT(cpy_len == name_len && lu_name_is_valid_2(buf, cpy_len));
 }
 
+void mdc_file_secctx_pack(struct ptlrpc_request *req, const char *secctx_name,
+			  const void *secctx, size_t secctx_size)
+{
+	size_t buf_size;
+	void *buf;
+
+	if (!secctx_name)
+		return;
+
+	buf = req_capsule_client_get(&req->rq_pill, &RMF_FILE_SECCTX_NAME);
+	buf_size = req_capsule_get_size(&req->rq_pill, &RMF_FILE_SECCTX_NAME,
+					RCL_CLIENT);
+
+	LASSERT(buf_size == strlen(secctx_name) + 1);
+	memcpy(buf, secctx_name, buf_size);
+
+	buf = req_capsule_client_get(&req->rq_pill, &RMF_FILE_SECCTX);
+	buf_size = req_capsule_get_size(&req->rq_pill, &RMF_FILE_SECCTX,
+					RCL_CLIENT);
+
+	LASSERT(buf_size == secctx_size);
+	memcpy(buf, secctx, buf_size);
+}
+
 void mdc_readdir_pack(struct ptlrpc_request *req, __u64 pgoff, size_t size,
 		      const struct lu_fid *fid)
 {
@@ -159,6 +183,10 @@ void mdc_create_pack(struct ptlrpc_request *req, struct md_op_data *op_data,
 		tmp = req_capsule_client_get(&req->rq_pill, &RMF_EADATA);
 		memcpy(tmp, data, datalen);
 	}
+
+	mdc_file_secctx_pack(req, op_data->op_file_secctx_name,
+			     op_data->op_file_secctx,
+			     op_data->op_file_secctx_size);
 }
 
 static inline __u64 mds_pack_open_flags(__u64 flags)
@@ -224,6 +252,10 @@ void mdc_open_pack(struct ptlrpc_request *req, struct md_op_data *op_data,
 
 		if (op_data->op_bias & MDS_CREATE_VOLATILE)
 			cr_flags |= MDS_OPEN_VOLATILE;
+
+		mdc_file_secctx_pack(req, op_data->op_file_secctx_name,
+				     op_data->op_file_secctx,
+				     op_data->op_file_secctx_size);
 	}
 
 	if (lmm) {
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_locks.c b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
index a8aa0fa..cfe917c 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_locks.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
@@ -288,6 +288,13 @@ static void mdc_realloc_openmsg(struct ptlrpc_request *req,
 	req_capsule_set_size(&req->rq_pill, &RMF_EADATA, RCL_CLIENT,
 			     max(lmmsize, obddev->u.cli.cl_default_mds_easize));
 
+	req_capsule_set_size(&req->rq_pill, &RMF_FILE_SECCTX_NAME,
+			     RCL_CLIENT, op_data->op_file_secctx_name ?
+			     strlen(op_data->op_file_secctx_name) + 1 : 0);
+
+	req_capsule_set_size(&req->rq_pill, &RMF_FILE_SECCTX, RCL_CLIENT,
+			     op_data->op_file_secctx_size);
+
 	rc = ldlm_prep_enqueue_req(exp, req, &cancels, count);
 	if (rc < 0) {
 		ptlrpc_request_free(req);
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_reint.c b/drivers/staging/lustre/lustre/mdc/mdc_reint.c
index da5f14c3..bdffe6d 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_reint.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_reint.c
@@ -190,6 +190,13 @@ int mdc_create(struct obd_export *exp, struct md_op_data *op_data,
 	req_capsule_set_size(&req->rq_pill, &RMF_EADATA, RCL_CLIENT,
 			     data && datalen ? datalen : 0);
 
+	req_capsule_set_size(&req->rq_pill, &RMF_FILE_SECCTX_NAME,
+			     RCL_CLIENT, op_data->op_file_secctx_name ?
+			     strlen(op_data->op_file_secctx_name) + 1 : 0);
+
+	req_capsule_set_size(&req->rq_pill, &RMF_FILE_SECCTX, RCL_CLIENT,
+			     op_data->op_file_secctx_size);
+
 	rc = mdc_prep_elc_req(exp, req, MDS_REINT, &cancels, count);
 	if (rc) {
 		ptlrpc_request_free(req);
diff --git a/drivers/staging/lustre/lustre/ptlrpc/layout.c b/drivers/staging/lustre/lustre/ptlrpc/layout.c
index 0b3ac14..d3c0dd6 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/layout.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/layout.c
@@ -196,7 +196,9 @@
 	&RMF_CAPA1,
 	&RMF_NAME,
 	&RMF_EADATA,
-	&RMF_DLM_REQ
+	&RMF_DLM_REQ,
+	&RMF_FILE_SECCTX_NAME,
+	&RMF_FILE_SECCTX
 };
 
 static const struct req_msg_field *mds_reint_create_sym_client[] = {
@@ -205,7 +207,9 @@
 	&RMF_CAPA1,
 	&RMF_NAME,
 	&RMF_SYMTGT,
-	&RMF_DLM_REQ
+	&RMF_DLM_REQ,
+	&RMF_FILE_SECCTX_NAME,
+	&RMF_FILE_SECCTX
 };
 
 static const struct req_msg_field *mds_reint_open_client[] = {
@@ -214,7 +218,9 @@
 	&RMF_CAPA1,
 	&RMF_CAPA2,
 	&RMF_NAME,
-	&RMF_EADATA
+	&RMF_EADATA,
+	&RMF_FILE_SECCTX_NAME,
+	&RMF_FILE_SECCTX
 };
 
 static const struct req_msg_field *mds_reint_open_server[] = {
@@ -435,7 +441,9 @@
 	&RMF_REC_REINT,    /* coincides with mds_reint_create_client[] */
 	&RMF_CAPA1,
 	&RMF_NAME,
-	&RMF_EADATA
+	&RMF_EADATA,
+	&RMF_FILE_SECCTX_NAME,
+	&RMF_FILE_SECCTX
 };
 
 static const struct req_msg_field *ldlm_intent_open_client[] = {
@@ -446,7 +454,9 @@
 	&RMF_CAPA1,
 	&RMF_CAPA2,
 	&RMF_NAME,
-	&RMF_EADATA
+	&RMF_EADATA,
+	&RMF_FILE_SECCTX_NAME,
+	&RMF_FILE_SECCTX
 };
 
 static const struct req_msg_field *ldlm_intent_unlink_client[] = {
@@ -931,6 +941,14 @@ struct req_msg_field RMF_STRING =
 	DEFINE_MSGF("string", RMF_F_STRING, -1, NULL, NULL);
 EXPORT_SYMBOL(RMF_STRING);
 
+struct req_msg_field RMF_FILE_SECCTX_NAME =
+	DEFINE_MSGF("file_secctx_name", RMF_F_STRING, -1, NULL, NULL);
+EXPORT_SYMBOL(RMF_FILE_SECCTX_NAME);
+
+struct req_msg_field RMF_FILE_SECCTX =
+	DEFINE_MSGF("file_secctx", 0, -1, NULL, NULL);
+EXPORT_SYMBOL(RMF_FILE_SECCTX);
+
 struct req_msg_field RMF_LLOGD_BODY =
 	DEFINE_MSGF("llogd_body", 0,
 		    sizeof(struct llogd_body), lustre_swab_llogd_body, NULL);
-- 
1.8.3.1

* [lustre-devel] [PATCH 09/18] lustre: llite: ladvise protocol changes
  2018-07-02 23:24 [lustre-devel] [PATCH 00/18] lustre: missing updates from lustre 2.9 James Simmons
                   ` (7 preceding siblings ...)
  2018-07-02 23:24 ` [lustre-devel] [PATCH 08/18] lustre: security: send file security context for creates James Simmons
@ 2018-07-02 23:24 ` James Simmons
  2018-07-02 23:24 ` [lustre-devel] [PATCH 10/18] lustre: ladvise: Add willread advice support for ladvise James Simmons
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 41+ messages in thread
From: James Simmons @ 2018-07-02 23:24 UTC (permalink / raw)
  To: lustre-devel

From: Patrick Farrell <paf@cray.com>

This patch makes some changes to the ladvise API and
protocol to support lock ahead and possible future users.

Primarily, it separates the userspace API arguments from
the structures which go out on the network, and adds a
number of 'value' fields without a predefined use.

The meaning of each value field can be different for
different advice types, allowing some extensibility.

Signed-off-by: Patrick Farrell <paf@cray.com>
WC-bug-id: https://jira.whamcloud.com/browse/LU-7225
Reviewed-on: http://review.whamcloud.com/20666
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
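Note on the wire layout: the new struct lu_ladvise keeps the 32-byte
size of the old one but splits the former 64-bit advice and padding
fields into smaller value fields, so the swab routine has to swap each
field at its own width. The following is a userspace sketch under
stated assumptions: struct lu_ladvise_sketch mirrors the field order in
this patch using <stdint.h> types in place of __u16/__u32/__u64, and
the swab16/swab32/swab64 and ladvise_swab helpers are local
illustrations, not the kernel's swab16s()/swab32s()/swab64s().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of the wire struct introduced by this patch. */
struct lu_ladvise_sketch {
	uint16_t lla_advice;	/* advice type */
	uint16_t lla_value1;	/* per-advice value */
	uint32_t lla_value2;
	uint64_t lla_start;	/* first byte of extent */
	uint64_t lla_end;	/* last byte of extent */
	uint32_t lla_value3;
	uint32_t lla_value4;
};

/* Same offsets and size that the wiretest.c hunk asserts. */
_Static_assert(sizeof(struct lu_ladvise_sketch) == 32, "size");
_Static_assert(offsetof(struct lu_ladvise_sketch, lla_value1) == 2, "v1");
_Static_assert(offsetof(struct lu_ladvise_sketch, lla_value2) == 4, "v2");
_Static_assert(offsetof(struct lu_ladvise_sketch, lla_start) == 8, "start");
_Static_assert(offsetof(struct lu_ladvise_sketch, lla_end) == 16, "end");
_Static_assert(offsetof(struct lu_ladvise_sketch, lla_value3) == 24, "v3");
_Static_assert(offsetof(struct lu_ladvise_sketch, lla_value4) == 28, "v4");

static uint16_t swab16(uint16_t v)
{
	return (uint16_t)((v << 8) | (v >> 8));
}

static uint32_t swab32(uint32_t v)
{
	return ((uint32_t)swab16(v & 0xffff) << 16) | swab16(v >> 16);
}

static uint64_t swab64(uint64_t v)
{
	return ((uint64_t)swab32(v & 0xffffffff) << 32) | swab32(v >> 32);
}

/* Swap every field in place, each at its own width. */
static void ladvise_swab(struct lu_ladvise_sketch *l)
{
	l->lla_advice = swab16(l->lla_advice);
	l->lla_value1 = swab16(l->lla_value1);
	l->lla_value2 = swab32(l->lla_value2);
	l->lla_start = swab64(l->lla_start);
	l->lla_end = swab64(l->lla_end);
	l->lla_value3 = swab32(l->lla_value3);
	l->lla_value4 = swab32(l->lla_value4);
}
```

The static assertions repeat the offsets checked in wiretest.c, so a
layout mismatch in the sketch would fail at compile time rather than
on the wire.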
 .../lustre/include/uapi/linux/lustre/lustre_idl.h  | 30 ++++++++++++++++
 .../lustre/include/uapi/linux/lustre/lustre_user.h | 37 +++++++++++++------
 drivers/staging/lustre/lustre/llite/file.c         |  8 ++---
 .../staging/lustre/lustre/ptlrpc/pack_generic.c    | 12 ++++---
 drivers/staging/lustre/lustre/ptlrpc/wiretest.c    | 42 +++++++++++++++-------
 5 files changed, 97 insertions(+), 32 deletions(-)

diff --git a/drivers/staging/lustre/include/uapi/linux/lustre/lustre_idl.h b/drivers/staging/lustre/include/uapi/linux/lustre/lustre_idl.h
index 4e25521..029ac8e 100644
--- a/drivers/staging/lustre/include/uapi/linux/lustre/lustre_idl.h
+++ b/drivers/staging/lustre/include/uapi/linux/lustre/lustre_idl.h
@@ -2694,5 +2694,35 @@ struct close_data {
 	__u64			cd_reserved[8];
 };
 
+/*
+ * This is the lu_ladvise struct which goes out on the wire.
+ * Corresponds to the userspace arg llapi_lu_ladvise.
+ * value[1-4] are unspecified fields, used differently by different advices
+ */
+struct lu_ladvise {
+	__u16 lla_advice;	/* advice type */
+	__u16 lla_value1;	/* values for different advice types */
+	__u32 lla_value2;
+	__u64 lla_start;	/* first byte of extent for advice */
+	__u64 lla_end;		/* last byte of extent for advice */
+	__u32 lla_value3;
+	__u32 lla_value4;
+};
+
+/*
+ * This is the ladvise_hdr which goes on the wire, corresponds to the userspace
+ * arg llapi_ladvise_hdr.
+ * value[1-3] are unspecified fields, used differently by different advices
+ */
+struct ladvise_hdr {
+	__u32			lah_magic;	/* LADVISE_MAGIC */
+	__u32			lah_count;	/* number of advices */
+	__u64			lah_flags;	/* from enum ladvise_flag */
+	__u32			lah_value1;	/* unused */
+	__u32			lah_value2;	/* unused */
+	__u64			lah_value3;	/* unused */
+	struct lu_ladvise	lah_advise[0];	/* advices in this header */
+};
+
 #endif
 /** @} lustreidl */
diff --git a/drivers/staging/lustre/include/uapi/linux/lustre/lustre_user.h b/drivers/staging/lustre/include/uapi/linux/lustre/lustre_user.h
index fc33a43..063a7db 100644
--- a/drivers/staging/lustre/include/uapi/linux/lustre/lustre_user.h
+++ b/drivers/staging/lustre/include/uapi/linux/lustre/lustre_user.h
@@ -276,7 +276,7 @@ struct ost_id {
 #define LL_IOC_MIGRATE			_IOR('f', 247, int)
 #define LL_IOC_FID2MDTIDX		_IOWR('f', 248, struct lu_fid)
 #define LL_IOC_GETPARENT		_IOWR('f', 249, struct getparent)
-#define LL_IOC_LADVISE			_IOR('f', 250, struct lu_ladvise)
+#define LL_IOC_LADVISE			_IOR('f', 250, struct llapi_lu_ladvise)
 
 /* Lease types for use as arg and return of LL_IOC_{GET,SET}_LEASE ioctl. */
 enum ll_lease_type {
@@ -1327,13 +1327,22 @@ enum lu_ladvise_type {
 	LU_LADVISE_INVALID	= 0,
 };
 
-#define LU_LADVISE_NAMES { }
+#define LU_LADVISE_NAMES {				\
+}
 
-struct lu_ladvise {
-	__u64			lla_advice;
-	__u64			lla_start;
-	__u64			lla_end;
-	__u64			lla_padding;
+/*
+ * This is the userspace argument for ladvise. It is currently the same as
+ * what goes on the wire (struct lu_ladvise), but is defined separately as we
+ * may need info which is only used locally.
+ */
+struct llapi_lu_ladvise {
+	__u16 lla_advice;	/* advice type */
+	__u16 lla_value1;	/* values for different advice types */
+	__u32 lla_value2;
+	__u64 lla_start;	/* first byte of extent for advice */
+	__u64 lla_end;		/* last byte of extent for advice */
+	__u32 lla_value3;
+	__u32 lla_value4;
 };
 
 enum ladvise_flag {
@@ -1343,13 +1352,19 @@ enum ladvise_flag {
 #define LADVISE_MAGIC 0x1ADF1CE0
 #define LF_MASK LF_ASYNC
 
-struct ladvise_hdr {
+/*
+ * This is the userspace argument for ladvise, corresponds to ladvise_hdr which
+ * is used on the wire. It is defined separately as we may need info which is
+ * only used locally.
+ */
+struct llapi_ladvise_hdr {
 	__u32			lah_magic;	/* LADVISE_MAGIC */
 	__u32			lah_count;	/* number of advices */
 	__u64			lah_flags;	/* from enum ladvise_flag */
-	__u64			lah_padding1;	/* unused */
-	__u64			lah_padding2;	/* unused */
-	struct lu_ladvise	lah_advise[0];
+	__u32			lah_value1;	/* unused */
+	__u32			lah_value2;	/* unused */
+	__u64			lah_value3;	/* unused */
+	struct llapi_lu_ladvise	lah_advise[0];	/* advices in this header */
 };
 
 #define LAH_COUNT_MAX		1024
diff --git a/drivers/staging/lustre/lustre/llite/file.c b/drivers/staging/lustre/lustre/llite/file.c
index d570232..5d0f8c2 100644
--- a/drivers/staging/lustre/lustre/llite/file.c
+++ b/drivers/staging/lustre/lustre/llite/file.c
@@ -2098,7 +2098,7 @@ static inline long ll_lease_type_from_fmode(fmode_t fmode)
  * much more data being sent to the client.
  */
 static int ll_ladvise(struct inode *inode, struct file *file, __u64 flags,
-		      struct lu_ladvise *ladvise)
+		      struct llapi_lu_ladvise *ladvise)
 {
 	struct cl_ladvise_io *lio;
 	struct lu_env *env;
@@ -2458,7 +2458,7 @@ static int ll_ladvise(struct inode *inode, struct file *file, __u64 flags,
 		return rc;
 	}
 	case LL_IOC_LADVISE: {
-		struct ladvise_hdr *ladvise_hdr;
+		struct llapi_ladvise_hdr *ladvise_hdr;
 		int alloc_size = sizeof(*ladvise_hdr);
 		int num_advise;
 		int i;
@@ -2469,7 +2469,7 @@ static int ll_ladvise(struct inode *inode, struct file *file, __u64 flags,
 			return -ENOMEM;
 
 		if (copy_from_user(ladvise_hdr,
-				   (const struct ladvise_hdr __user *)arg,
+				   (const struct llapi_ladvise_hdr __user *)arg,
 				   alloc_size)) {
 			rc = -EFAULT;
 			goto out_ladvise;
@@ -2498,7 +2498,7 @@ static int ll_ladvise(struct inode *inode, struct file *file, __u64 flags,
 		 * TODO: submit multiple advices to one server in a single RPC
 		 */
 		if (copy_from_user(ladvise_hdr,
-				   (const struct ladvise_hdr __user *)arg,
+				   (const struct llapi_ladvise_hdr __user *)arg,
 				   alloc_size)) {
 			rc = -EFAULT;
 			goto out_ladvise;
diff --git a/drivers/staging/lustre/lustre/ptlrpc/pack_generic.c b/drivers/staging/lustre/lustre/ptlrpc/pack_generic.c
index 468fa69..86a64a6 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/pack_generic.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/pack_generic.c
@@ -2312,10 +2312,13 @@ void lustre_swab_close_data(struct close_data *cd)
 
 void lustre_swab_ladvise(struct lu_ladvise *ladvise)
 {
+	swab16s(&ladvise->lla_advice);
+	swab16s(&ladvise->lla_value1);
+	swab32s(&ladvise->lla_value2);
 	swab64s(&ladvise->lla_start);
 	swab64s(&ladvise->lla_end);
-	swab64s(&ladvise->lla_advice);
-	BUILD_BUG_ON(!offsetof(typeof(*ladvise), lla_padding));
+	swab32s(&ladvise->lla_value3);
+	swab32s(&ladvise->lla_value4);
 }
 EXPORT_SYMBOL(lustre_swab_ladvise);
 
@@ -2324,7 +2327,8 @@ void lustre_swab_ladvise_hdr(struct ladvise_hdr *ladvise_hdr)
 	swab32s(&ladvise_hdr->lah_magic);
 	swab32s(&ladvise_hdr->lah_count);
 	swab64s(&ladvise_hdr->lah_flags);
-	BUILD_BUG_ON(!offsetof(typeof(*ladvise_hdr), lah_padding1));
-	BUILD_BUG_ON(!offsetof(typeof(*ladvise_hdr), lah_padding2));
+	swab32s(&ladvise_hdr->lah_value1);
+	swab32s(&ladvise_hdr->lah_value2);
+	swab64s(&ladvise_hdr->lah_value3);
 }
 EXPORT_SYMBOL(lustre_swab_ladvise_hdr);
diff --git a/drivers/staging/lustre/lustre/ptlrpc/wiretest.c b/drivers/staging/lustre/lustre/ptlrpc/wiretest.c
index dae1b09..a1895cb 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/wiretest.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/wiretest.c
@@ -4217,8 +4217,16 @@ void lustre_assert_wire_constants(void)
 		 (long long)(int)sizeof(struct lu_ladvise));
 	LASSERTF((int)offsetof(struct lu_ladvise, lla_advice) == 0, "found %lld\n",
 		 (long long)(int)offsetof(struct lu_ladvise, lla_advice));
-	LASSERTF((int)sizeof(((struct lu_ladvise *)0)->lla_advice) == 8, "found %lld\n",
+	LASSERTF((int)sizeof(((struct lu_ladvise *)0)->lla_advice) == 2, "found %lld\n",
 		 (long long)(int)sizeof(((struct lu_ladvise *)0)->lla_advice));
+	LASSERTF((int)offsetof(struct lu_ladvise, lla_value1) == 2, "found %lld\n",
+		 (long long)(int)offsetof(struct lu_ladvise, lla_value1));
+	LASSERTF((int)sizeof(((struct lu_ladvise *)0)->lla_value1) == 2, "found %lld\n",
+		 (long long)(int)sizeof(((struct lu_ladvise *)0)->lla_value1));
+	LASSERTF((int)offsetof(struct lu_ladvise, lla_value2) == 4, "found %lld\n",
+		 (long long)(int)offsetof(struct lu_ladvise, lla_value2));
+	LASSERTF((int)sizeof(((struct lu_ladvise *)0)->lla_value2) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct lu_ladvise *)0)->lla_value2));
 	LASSERTF((int)offsetof(struct lu_ladvise, lla_start) == 8, "found %lld\n",
 		 (long long)(int)offsetof(struct lu_ladvise, lla_start));
 	LASSERTF((int)sizeof(((struct lu_ladvise *)0)->lla_start) == 8, "found %lld\n",
@@ -4227,10 +4235,14 @@ void lustre_assert_wire_constants(void)
 		 (long long)(int)offsetof(struct lu_ladvise, lla_end));
 	LASSERTF((int)sizeof(((struct lu_ladvise *)0)->lla_end) == 8, "found %lld\n",
 		 (long long)(int)sizeof(((struct lu_ladvise *)0)->lla_end));
-	LASSERTF((int)offsetof(struct lu_ladvise, lla_padding) == 24, "found %lld\n",
-		 (long long)(int)offsetof(struct lu_ladvise, lla_padding));
-	LASSERTF((int)sizeof(((struct lu_ladvise *)0)->lla_padding) == 8, "found %lld\n",
-		 (long long)(int)sizeof(((struct lu_ladvise *)0)->lla_padding));
+	LASSERTF((int)offsetof(struct lu_ladvise, lla_value3) == 24, "found %lld\n",
+		 (long long)(int)offsetof(struct lu_ladvise, lla_value3));
+	LASSERTF((int)sizeof(((struct lu_ladvise *)0)->lla_value3) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct lu_ladvise *)0)->lla_value3));
+	LASSERTF((int)offsetof(struct lu_ladvise, lla_value4) == 28, "found %lld\n",
+		 (long long)(int)offsetof(struct lu_ladvise, lla_value4));
+	LASSERTF((int)sizeof(((struct lu_ladvise *)0)->lla_value4) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct lu_ladvise *)0)->lla_value4));
 
 	/* Checks for struct ladvise_hdr */
 	LASSERTF(LADVISE_MAGIC == 0x1ADF1CE0, "found 0x%.8x\n",
@@ -4249,14 +4261,18 @@ void lustre_assert_wire_constants(void)
 		 (long long)(int)offsetof(struct ladvise_hdr, lah_flags));
 	LASSERTF((int)sizeof(((struct ladvise_hdr *)0)->lah_flags) == 8, "found %lld\n",
 		 (long long)(int)sizeof(((struct ladvise_hdr *)0)->lah_flags));
-	LASSERTF((int)offsetof(struct ladvise_hdr, lah_padding1) == 16, "found %lld\n",
-		 (long long)(int)offsetof(struct ladvise_hdr, lah_padding1));
-	LASSERTF((int)sizeof(((struct ladvise_hdr *)0)->lah_padding1) == 8, "found %lld\n",
-		 (long long)(int)sizeof(((struct ladvise_hdr *)0)->lah_padding1));
-	LASSERTF((int)offsetof(struct ladvise_hdr, lah_padding2) == 24, "found %lld\n",
-		 (long long)(int)offsetof(struct ladvise_hdr, lah_padding2));
-	LASSERTF((int)sizeof(((struct ladvise_hdr *)0)->lah_padding2) == 8, "found %lld\n",
-		 (long long)(int)sizeof(((struct ladvise_hdr *)0)->lah_padding2));
+	LASSERTF((int)offsetof(struct ladvise_hdr, lah_value1) == 16, "found %lld\n",
+		 (long long)(int)offsetof(struct ladvise_hdr, lah_value1));
+	LASSERTF((int)sizeof(((struct ladvise_hdr *)0)->lah_value1) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct ladvise_hdr *)0)->lah_value1));
+	LASSERTF((int)offsetof(struct ladvise_hdr, lah_value2) == 20, "found %lld\n",
+		 (long long)(int)offsetof(struct ladvise_hdr, lah_value2));
+	LASSERTF((int)sizeof(((struct ladvise_hdr *)0)->lah_value2) == 4, "found %lld\n",
+		 (long long)(int)sizeof(((struct ladvise_hdr *)0)->lah_value2));
+	LASSERTF((int)offsetof(struct ladvise_hdr, lah_value3) == 24, "found %lld\n",
+		 (long long)(int)offsetof(struct ladvise_hdr, lah_value3));
+	LASSERTF((int)sizeof(((struct ladvise_hdr *)0)->lah_value3) == 8, "found %lld\n",
+		 (long long)(int)sizeof(((struct ladvise_hdr *)0)->lah_value3));
 	LASSERTF((int)offsetof(struct ladvise_hdr, lah_advise) == 32, "found %lld\n",
 		 (long long)(int)offsetof(struct ladvise_hdr, lah_advise));
 	LASSERTF((int)sizeof(((struct ladvise_hdr *)0)->lah_advise) == 0, "found %lld\n",
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [lustre-devel] [PATCH 10/18] lustre: ladvise: Add willread advice support for ladvise
  2018-07-02 23:24 [lustre-devel] [PATCH 00/18] lustre: missing updates from lustre 2.9 James Simmons
                   ` (8 preceding siblings ...)
  2018-07-02 23:24 ` [lustre-devel] [PATCH 09/18] lustre: llite: ladvise protocol changes James Simmons
@ 2018-07-02 23:24 ` James Simmons
  2018-07-02 23:24 ` [lustre-devel] [PATCH 11/18] lustre: osc: max_pages_per_rpc should be chunk size aligned James Simmons
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 41+ messages in thread
From: James Simmons @ 2018-07-02 23:24 UTC (permalink / raw)
  To: lustre-devel

From: Li Xi <lixi@ddn.com>

This patch adds WILLREAD advice to the ladvise framework. The OSS
will prefetch data into memory when this hint is provided. There is
no guarantee of how long the cached pages will be kept in memory.
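
As a rough illustration of how the LU_LADVISE_NAMES table added by
this patch could be consumed, here is a minimal userspace sketch. The
helper ladvise_name2type() is hypothetical and not part of this patch;
only the enum and the macro are taken from the diff.

```c
#include <stddef.h>
#include <string.h>

enum lu_ladvise_type {
	LU_LADVISE_INVALID	= 0,
	LU_LADVISE_WILLREAD	= 1,
};

#define LU_LADVISE_NAMES {				\
	[LU_LADVISE_WILLREAD]	= "willread",		\
}

/* Hypothetical helper: map an advice name to its enum value. */
static enum lu_ladvise_type ladvise_name2type(const char *name)
{
	static const char *const names[] = LU_LADVISE_NAMES;
	size_t i;

	for (i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
		/* slot 0 (LU_LADVISE_INVALID) has no name and stays NULL */
		if (names[i] && strcmp(names[i], name) == 0)
			return (enum lu_ladvise_type)i;
	}
	return LU_LADVISE_INVALID;
}
```

Because the table uses designated initializers, the enum value doubles
as the array index, so later advice types only need one more line in
the macro.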

Signed-off-by: Li Xi <lixi@ddn.com>
WC-bug-id: https://jira.whamcloud.com/browse/LU-4931
Reviewed-on: http://review.whamcloud.com/12458
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/include/uapi/linux/lustre/lustre_user.h | 2 ++
 drivers/staging/lustre/lustre/ptlrpc/wiretest.c                | 4 ++++
 2 files changed, 6 insertions(+)

diff --git a/drivers/staging/lustre/include/uapi/linux/lustre/lustre_user.h b/drivers/staging/lustre/include/uapi/linux/lustre/lustre_user.h
index 063a7db..02b51ca 100644
--- a/drivers/staging/lustre/include/uapi/linux/lustre/lustre_user.h
+++ b/drivers/staging/lustre/include/uapi/linux/lustre/lustre_user.h
@@ -1325,9 +1325,11 @@ struct hsm_copy {
 
 enum lu_ladvise_type {
 	LU_LADVISE_INVALID	= 0,
+	LU_LADVISE_WILLREAD	= 1,
 };
 
 #define LU_LADVISE_NAMES {				\
+	[LU_LADVISE_WILLREAD]	= "willread",		\
 }
 
 /*
diff --git a/drivers/staging/lustre/lustre/ptlrpc/wiretest.c b/drivers/staging/lustre/lustre/ptlrpc/wiretest.c
index a1895cb..aa17d01 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/wiretest.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/wiretest.c
@@ -4243,6 +4243,8 @@ void lustre_assert_wire_constants(void)
 		 (long long)(int)offsetof(struct lu_ladvise, lla_value4));
 	LASSERTF((int)sizeof(((struct lu_ladvise *)0)->lla_value4) == 4, "found %lld\n",
 		 (long long)(int)sizeof(((struct lu_ladvise *)0)->lla_value4));
+	LASSERTF(LU_LADVISE_WILLREAD == 1, "found %lld\n",
+		 (long long)LU_LADVISE_WILLREAD);
 
 	/* Checks for struct ladvise_hdr */
 	LASSERTF(LADVISE_MAGIC == 0x1ADF1CE0, "found 0x%.8x\n",
@@ -4277,4 +4279,6 @@ void lustre_assert_wire_constants(void)
 		 (long long)(int)offsetof(struct ladvise_hdr, lah_advise));
 	LASSERTF((int)sizeof(((struct ladvise_hdr *)0)->lah_advise) == 0, "found %lld\n",
 		 (long long)(int)sizeof(((struct ladvise_hdr *)0)->lah_advise));
+	LASSERTF(LF_ASYNC == 0x00000001UL, "found 0x%.8xUL\n",
+		 (unsigned int)LF_ASYNC);
 }
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [lustre-devel] [PATCH 11/18] lustre: osc: max_pages_per_rpc should be chunk size aligned
  2018-07-02 23:24 [lustre-devel] [PATCH 00/18] lustre: missing updates from lustre 2.9 James Simmons
                   ` (9 preceding siblings ...)
  2018-07-02 23:24 ` [lustre-devel] [PATCH 10/18] lustre: ladvise: Add willread advice support for ladvise James Simmons
@ 2018-07-02 23:24 ` James Simmons
  2018-07-02 23:24 ` [lustre-devel] [PATCH 12/18] lustre: ladvise: Add dontneed advice support for ladvise James Simmons
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 41+ messages in thread
From: James Simmons @ 2018-07-02 23:24 UTC (permalink / raw)
  To: lustre-devel

From: Bobi Jam <bobijam@hotmail.com>

max_pages_per_rpc should be chunk size aligned.

obd_brw_size needs to be at least one block size.

Improve the LASSERT() to an LASSERTF() that prints the related
parameters to help debug problems.
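
The round-up used in osc_init_grant() can be tried in isolation. This
is a minimal userspace sketch, assuming 4KiB pages; SKETCH_PAGE_SHIFT
and align_pages_to_chunk() are hypothetical names that stand in for
PAGE_SHIFT and the inline arithmetic in the patch.

```c
#include <assert.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4KiB pages */

/* Round max_pages up to a multiple of the pages-per-chunk count. */
static unsigned int align_pages_to_chunk(unsigned int max_pages,
					 unsigned int chunkbits)
{
	unsigned int chunk_mask;

	/* same precondition that osc_extent_find() asserts */
	assert(chunkbits >= SKETCH_PAGE_SHIFT);
	chunk_mask = ~((1U << (chunkbits - SKETCH_PAGE_SHIFT)) - 1);
	/* ~chunk_mask is (pages per chunk - 1), so this rounds up */
	return (max_pages + ~chunk_mask) & chunk_mask;
}
```

With 64KiB chunks (chunkbits 16) there are 16 pages per chunk, so a
max_pages_per_rpc of 250 becomes 256, while an already aligned value
is left unchanged.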

Signed-off-by: Bobi Jam <bobijam@hotmail.com>
WC-bug-id: https://jira.whamcloud.com/browse/LU-8460
Reviewed-on: http://review.whamcloud.com/21825
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/osc/osc_cache.c   | 11 ++++++++---
 drivers/staging/lustre/lustre/osc/osc_request.c |  5 +++++
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/lustre/lustre/osc/osc_cache.c b/drivers/staging/lustre/lustre/osc/osc_cache.c
index 8d3f501..15a4817 100644
--- a/drivers/staging/lustre/lustre/osc/osc_cache.c
+++ b/drivers/staging/lustre/lustre/osc/osc_cache.c
@@ -681,15 +681,20 @@ static struct osc_extent *osc_extent_find(const struct lu_env *env,
 	descr = &olck->ols_cl.cls_lock->cll_descr;
 	LASSERT(descr->cld_mode >= CLM_WRITE);
 
-	LASSERT(cli->cl_chunkbits >= PAGE_SHIFT);
+	LASSERTF(cli->cl_chunkbits >= PAGE_SHIFT,
+		 "chunkbits: %u\n", cli->cl_chunkbits);
 	ppc_bits = cli->cl_chunkbits - PAGE_SHIFT;
 	chunk_mask = ~((1 << ppc_bits) - 1);
 	chunksize = 1 << cli->cl_chunkbits;
 	chunk = index >> ppc_bits;
 
-	/* align end to rpc edge, rpc size may not be a power 2 integer. */
+	/* align end to RPC edge */
 	max_pages = cli->cl_max_pages_per_rpc;
-	LASSERT((max_pages & ~chunk_mask) == 0);
+	if ((max_pages & ~chunk_mask) != 0) {
+		CERROR("max_pages: %#x chunkbits: %u chunk_mask: %#lx\n",
+		       max_pages, cli->cl_chunkbits, chunk_mask);
+		return ERR_PTR(-EINVAL);
+	}
 	max_end = index - (index % max_pages) + max_pages - 1;
 	max_end = min_t(pgoff_t, max_end, descr->cld_end);
 
diff --git a/drivers/staging/lustre/lustre/osc/osc_request.c b/drivers/staging/lustre/lustre/osc/osc_request.c
index 0286f25..2d05387 100644
--- a/drivers/staging/lustre/lustre/osc/osc_request.c
+++ b/drivers/staging/lustre/lustre/osc/osc_request.c
@@ -931,6 +931,7 @@ static void osc_init_grant(struct client_obd *cli, struct obd_connect_data *ocd)
 	}
 
 	if (OCD_HAS_FLAG(ocd, GRANT_PARAM)) {
+		int chunk_mask;
 		u64 size;
 
 		/* overhead for each extent insertion */
@@ -938,6 +939,10 @@ static void osc_init_grant(struct client_obd *cli, struct obd_connect_data *ocd)
 		/* determine the appropriate chunk size used by osc_extent. */
 		cli->cl_chunkbits = max_t(int, PAGE_SHIFT,
 					  ocd->ocd_grant_blkbits);
+		/* max_pages_per_rpc must be chunk aligned */
+		chunk_mask = ~((1 << (cli->cl_chunkbits - PAGE_SHIFT)) - 1);
+		cli->cl_max_pages_per_rpc = (cli->cl_max_pages_per_rpc +
+					     ~chunk_mask) & chunk_mask;
 		/* determine maximum extent size, in #pages */
 		size = (u64)ocd->ocd_grant_max_blks << ocd->ocd_grant_blkbits;
 		cli->cl_max_extent_pages = size >> PAGE_SHIFT;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [lustre-devel] [PATCH 12/18] lustre: ladvise: Add dontneed advice support for ladvise
  2018-07-02 23:24 [lustre-devel] [PATCH 00/18] lustre: missing updates from lustre 2.9 James Simmons
                   ` (10 preceding siblings ...)
  2018-07-02 23:24 ` [lustre-devel] [PATCH 11/18] lustre: osc: max_pages_per_rpc should be chunk size aligned James Simmons
@ 2018-07-02 23:24 ` James Simmons
  2018-07-02 23:24 ` [lustre-devel] [PATCH 13/18] lustre: ptlrpc: properly set "rq_xid" for 4MB IO James Simmons
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 41+ messages in thread
From: James Simmons @ 2018-07-02 23:24 UTC (permalink / raw)
  To: lustre-devel

From: Li Xi <lixi@ddn.com>

This patch adds DONTNEED advice to the ladvise framework. The OSS
will clean up the page cache of the file when this hint is provided.

Signed-off-by: Li Xi <lixi@ddn.com>
Signed-off-by: Gu Zheng <gzheng@ddn.com>
WC-bug-id: https://jira.whamcloud.com/browse/LU-4931
Reviewed-on: http://review.whamcloud.com/20203
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/include/uapi/linux/lustre/lustre_user.h | 2 ++
 drivers/staging/lustre/lustre/ptlrpc/wiretest.c                | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/drivers/staging/lustre/include/uapi/linux/lustre/lustre_user.h b/drivers/staging/lustre/include/uapi/linux/lustre/lustre_user.h
index 02b51ca..4fa7796 100644
--- a/drivers/staging/lustre/include/uapi/linux/lustre/lustre_user.h
+++ b/drivers/staging/lustre/include/uapi/linux/lustre/lustre_user.h
@@ -1326,10 +1326,12 @@ struct hsm_copy {
 enum lu_ladvise_type {
 	LU_LADVISE_INVALID	= 0,
 	LU_LADVISE_WILLREAD	= 1,
+	LU_LADVISE_DONTNEED	= 2,
 };
 
 #define LU_LADVISE_NAMES {				\
 	[LU_LADVISE_WILLREAD]	= "willread",		\
+	[LU_LADVISE_DONTNEED]	= "dontneed",		\
 }
 
 /*
diff --git a/drivers/staging/lustre/lustre/ptlrpc/wiretest.c b/drivers/staging/lustre/lustre/ptlrpc/wiretest.c
index aa17d01..61995c3 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/wiretest.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/wiretest.c
@@ -4245,6 +4245,8 @@ void lustre_assert_wire_constants(void)
 		 (long long)(int)sizeof(((struct lu_ladvise *)0)->lla_value4));
 	LASSERTF(LU_LADVISE_WILLREAD == 1, "found %lld\n",
 		 (long long)LU_LADVISE_WILLREAD);
+	LASSERTF(LU_LADVISE_DONTNEED == 2, "found %lld\n",
+		 (long long)LU_LADVISE_DONTNEED);
 
 	/* Checks for struct ladvise_hdr */
 	LASSERTF(LADVISE_MAGIC == 0x1ADF1CE0, "found 0x%.8x\n",
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [lustre-devel] [PATCH 13/18] lustre: ptlrpc: properly set "rq_xid" for 4MB IO
  2018-07-02 23:24 [lustre-devel] [PATCH 00/18] lustre: missing updates from lustre 2.9 James Simmons
                   ` (11 preceding siblings ...)
  2018-07-02 23:24 ` [lustre-devel] [PATCH 12/18] lustre: ladvise: Add dontneed advice support for ladvise James Simmons
@ 2018-07-02 23:24 ` James Simmons
  2018-07-02 23:24 ` [lustre-devel] [PATCH 14/18] lustre: obd: add callback for llog_cat_process_or_fork James Simmons
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 41+ messages in thread
From: James Simmons @ 2018-07-02 23:24 UTC (permalink / raw)
  To: lustre-devel

From: Fan Yong <fyong@whamcloud.com>

Commit 8bcaef92664f ("staging: lustre: ptlrpc: mbits is sent
within ptlrpc_body") replaced "rq_xid" with "rq_mbits" as the
matchbits for bulk data transfers. To stay interoperable with
old servers, it introduced a new connection flag,
OBD_CONNECT_BULK_MBITS. If the server does not support this
feature, "rq_xid" is set to the same value as "rq_mbits".
Unfortunately, that commit did not handle multiple bulk
operations, for example 4MB IO. If a new client issues 4MB IO
to an old server, it may send a small "rq_xid", so the old
server will regard the request as a 1MB or 2MB IO. The data
transfer then never completes because only part of the data is
transferred, and the client times out and retries again and
again.
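
The arithmetic behind the fix can be sketched in userspace as follows.
This is only an illustration, assuming LNET_MAX_IOV is 256 pages (1MB
with 4KiB pages); struct sketch_req, sketch_set_bulk_mbits() and the
server_has_bulk_mbits argument are hypothetical stand-ins for the
ptlrpc request, ptlrpc_set_bulk_mbits() and the OCD_HAS_FLAG() check.

```c
#include <stdint.h>

#define SKETCH_LNET_MAX_IOV 256	/* assumption: pages per bulk transfer */
#define SKETCH_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

struct sketch_req {
	uint64_t rq_xid;
	uint64_t rq_mbits;
};

/* Advance the matchbits to the final bulk and mirror them for old servers. */
static void sketch_set_bulk_mbits(struct sketch_req *req, uint64_t first_mbits,
				  unsigned int iov_count,
				  int server_has_bulk_mbits)
{
	req->rq_mbits = first_mbits +
			SKETCH_DIV_ROUND_UP(iov_count, SKETCH_LNET_MAX_IOV) - 1;

	/* an old server only sees rq_xid, so it must carry the final value */
	if (!server_has_bulk_mbits)
		req->rq_xid = req->rq_mbits;
}
```

For a 1024-page (4MB) request the matchbits advance by 3, and without
the final assignment an old server would still see the unadjusted
rq_xid.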

Fixes: 8bcaef92664f ("staging: lustre: ptlrpc: mbits is sent within ptlrpc_body")
Signed-off-by: Fan Yong <fyong@whamcloud.com>
WC-bug-id: https://jira.whamcloud.com/browse/LU-6808
Reviewed-on: http://review.whamcloud.com/22373
Reviewed-by: Jinshan Xiong <jinshan.xiong@gmail.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Liang Zhen <liang.zhen@intel.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: Tyson Whitehead <twhitehead@gmail.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/ptlrpc/client.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/lustre/lustre/ptlrpc/client.c b/drivers/staging/lustre/lustre/ptlrpc/client.c
index c1b82bf..fee4e49 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/client.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/client.c
@@ -3084,8 +3084,7 @@ void ptlrpc_set_bulk_mbits(struct ptlrpc_request *req)
 		 * 'resend for the -EINPROGRESS resend'. To make it simple,
 		 * we opt to generate mbits for all resend cases.
 		 */
-		if ((bd->bd_import->imp_connect_data.ocd_connect_flags &
-		     OBD_CONNECT_BULK_MBITS)) {
+		if (OCD_HAS_FLAG(&bd->bd_import->imp_connect_data, BULK_MBITS)) {
 			req->rq_mbits = ptlrpc_next_xid();
 		} else {
 			/* old version transfers rq_xid to peer as matchbits */
@@ -3115,6 +3114,12 @@ void ptlrpc_set_bulk_mbits(struct ptlrpc_request *req)
 	 * see LU-1431
 	 */
 	req->rq_mbits += DIV_ROUND_UP(bd->bd_iov_count, LNET_MAX_IOV) - 1;
+
+	/* Set rq_xid as rq_mbits to indicate the final bulk for the old
+	 * server which does not support OBD_CONNECT_BULK_MBITS. LU-6808
+	 */
+	if (!OCD_HAS_FLAG(&bd->bd_import->imp_connect_data, BULK_MBITS))
+		req->rq_xid = req->rq_mbits;
 }
 
 /**
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [lustre-devel] [PATCH 14/18] lustre: obd: add callback for llog_cat_process_or_fork
  2018-07-02 23:24 [lustre-devel] [PATCH 00/18] lustre: missing updates from lustre 2.9 James Simmons
                   ` (12 preceding siblings ...)
  2018-07-02 23:24 ` [lustre-devel] [PATCH 13/18] lustre: ptlrpc: properly set "rq_xid" for 4MB IO James Simmons
@ 2018-07-02 23:24 ` James Simmons
  2018-07-02 23:24 ` [lustre-devel] [PATCH 15/18] lustre: mount: fix lmd_parse() to handle new delimiters James Simmons
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 41+ messages in thread
From: James Simmons @ 2018-07-02 23:24 UTC (permalink / raw)
  To: lustre-devel

From: Alexander Boyko <c17825@cray.com>

Currently llog_cat_process_or_fork() is hard coded to always
pass the function pointer llog_cat_process_cb() to
llog_process_or_fork(). Change llog_cat_process_or_fork() to
take the catalog callback as an argument, which will allow us
more options for llog_cat callback routines in the future.

Signed-off-by: Alexander Boyko <c17825@cray.com>
WC-bug-id: https://jira.whamcloud.com/browse/LU-7156
Seagate-bug-id: MRP-2383
Reviewed-on: http://review.whamcloud.com/16416
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/obdclass/llog_cat.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/drivers/staging/lustre/lustre/obdclass/llog_cat.c b/drivers/staging/lustre/lustre/obdclass/llog_cat.c
index d9c63ad..58dbd50 100644
--- a/drivers/staging/lustre/lustre/obdclass/llog_cat.c
+++ b/drivers/staging/lustre/lustre/obdclass/llog_cat.c
@@ -189,7 +189,8 @@ static int llog_cat_process_cb(const struct lu_env *env,
 
 static int llog_cat_process_or_fork(const struct lu_env *env,
 				    struct llog_handle *cat_llh,
-				    llog_cb_t cb, void *data, int startcat,
+				    llog_cb_t cat_cb, llog_cb_t cb,
+				    void *data, int startcat,
 				    int startidx, bool fork)
 {
 	struct llog_process_data d;
@@ -210,18 +211,15 @@ static int llog_cat_process_or_fork(const struct lu_env *env,
 
 		cd.lpcd_first_idx = llh->llh_cat_idx;
 		cd.lpcd_last_idx = 0;
-		rc = llog_process_or_fork(env, cat_llh, llog_cat_process_cb,
-					  &d, &cd, fork);
+		rc = llog_process_or_fork(env, cat_llh, cat_cb, &d, &cd, fork);
 		if (rc != 0)
 			return rc;
 
 		cd.lpcd_first_idx = 0;
 		cd.lpcd_last_idx = cat_llh->lgh_last_idx;
-		rc = llog_process_or_fork(env, cat_llh, llog_cat_process_cb,
-					  &d, &cd, fork);
+		rc = llog_process_or_fork(env, cat_llh, cat_cb, &d, &cd, fork);
 	} else {
-		rc = llog_process_or_fork(env, cat_llh, llog_cat_process_cb,
-					  &d, NULL, fork);
+		rc = llog_process_or_fork(env, cat_llh, cat_cb, &d, NULL, fork);
 	}
 
 	return rc;
@@ -230,7 +228,7 @@ static int llog_cat_process_or_fork(const struct lu_env *env,
 int llog_cat_process(const struct lu_env *env, struct llog_handle *cat_llh,
 		     llog_cb_t cb, void *data, int startcat, int startidx)
 {
-	return llog_cat_process_or_fork(env, cat_llh, cb, data, startcat,
-					startidx, false);
+	return llog_cat_process_or_fork(env, cat_llh, llog_cat_process_cb, cb,
+					data, startcat, startidx, false);
 }
 EXPORT_SYMBOL(llog_cat_process);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [lustre-devel] [PATCH 15/18] lustre: mount: fix lmd_parse() to handle new delimiters
  2018-07-02 23:24 [lustre-devel] [PATCH 00/18] lustre: missing updates from lustre 2.9 James Simmons
                   ` (13 preceding siblings ...)
  2018-07-02 23:24 ` [lustre-devel] [PATCH 14/18] lustre: obd: add callback for llog_cat_process_or_fork James Simmons
@ 2018-07-02 23:24 ` James Simmons
  2018-07-02 23:24 ` [lustre-devel] [PATCH 16/18] lustre: libcfs: restore original behavior in cfs_str2num_check James Simmons
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 41+ messages in thread
From: James Simmons @ 2018-07-02 23:24 UTC (permalink / raw)
  To: lustre-devel

From: Jian Yu <jian.yu@intel.com>

The lmd_parse() function parses mount options using comma as the
delimiter, without considering that commas may also appear inside
an expr_list, which is valid LNET nid range syntax:

<expr_list>  :== '[' <range_expr> [ ',' <range_expr>] ']'

This patch fixes the above issue by using cfs_parse_nidlist()
to parse nid range list instead of using class_parse_nid_quiet()
to parse only one nid.

In multi-rail and failover configurations, colon is a valid
delimiter in LNET nid list to separate different hosts.
This patch fixes lmd_parse()->lmd_parse_nidlist() to handle
both comma and colon as delimiters.
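
The bracket-skipping scan can be illustrated with a small userspace
sketch. It is a simplified stand-in, not the lmd_find_delimiter() from
this patch: sketch_find_delimiter() is a hypothetical name, and it
returns the delimiter position directly instead of using an output
parameter.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return a pointer to the first ',' or ':' in buf that lies outside a
 * [...] expression list, or NULL if there is none or the brackets are
 * unbalanced.
 */
static const char *sketch_find_delimiter(const char *buf)
{
	const char *c = buf;

	while (c && *c != '\0') {
		c += strcspn(c, "[:,]");
		switch (*c) {
		case ',':
		case ':':
			return c;
		case '[':
			/* skip the whole expression list, commas included */
			c = strchr(c, ']');
			if (c)
				c++;
			break;
		default:
			/* stray ']' or end of string */
			return NULL;
		}
	}
	return NULL;
}
```

For "192.168.0.[1,3]@tcp:/lustre" the comma inside the brackets is
skipped and the scan stops at the colon before "/lustre".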

Signed-off-by: Jian Yu <jian.yu@intel.com>
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
WC-bug-id: https://jira.whamcloud.com/browse/LU-9325
Reviewed-on: https://review.whamcloud.com/26558
WC-bug-id: https://jira.whamcloud.com/browse/LU-8311
Reviewed-on: http://review.whamcloud.com/21329
WC-bug-id: https://jira.whamcloud.com/browse/LU-5690
Reviewed-on: http://review.whamcloud.com/17036
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/obdclass/obd_mount.c | 112 +++++++++++++++++++--
 1 file changed, 104 insertions(+), 8 deletions(-)

diff --git a/drivers/staging/lustre/lustre/obdclass/obd_mount.c b/drivers/staging/lustre/lustre/obdclass/obd_mount.c
index 6e9803b..708c580 100644
--- a/drivers/staging/lustre/lustre/obdclass/obd_mount.c
+++ b/drivers/staging/lustre/lustre/obdclass/obd_mount.c
@@ -889,6 +889,104 @@ static int lmd_parse_mgs(struct lustre_mount_data *lmd, char **ptr)
 	return 0;
 }
 
+/**
+ * Find the first delimiter (comma or colon) from the specified \a buf and
+ * make \a *endh point to the string starting with the delimiter. The commas
+ * in expression list [...] will be skipped.
+ *
+ * @buf		a delimiter-separated string
+ * @endh	a pointer to a pointer that will point to the string
+ *		starting with the delimiter
+ *
+ * RETURNS	true if delimiter is found, false if delimiter is not found
+ */
+static bool lmd_find_delimiter(char *buf, char **endh)
+{
+	char *c = buf;
+	size_t pos;
+	bool found;
+
+	if (!buf)
+		return false;
+try_again:
+	if (*c == ',' || *c == ':')
+		return true;
+
+	pos = strcspn(c, "[:,]");
+	if (!pos)
+		return false;
+
+	/* Not a valid mount string */
+	if (*c == ']') {
+		CWARN("invalid mount string format\n");
+		return false;
+	}
+
+	c += pos;
+	if (*c == '[') {
+		c = strchr(c, ']');
+
+		/* invalid mount string */
+		if (!c) {
+			CWARN("invalid mount string format\n");
+			return false;
+		}
+		c++;
+		goto try_again;
+	}
+
+	found = *c != '\0';
+	if (found && endh)
+		*endh = c;
+
+	return found;
+}
+
+/**
+ * Find the first valid string delimited by comma or colon from the specified
+ * \a buf and parse it to see whether it's a valid nid list. If yes, \a *endh
+ * will point to the next string starting with the delimiter.
+ *
+ * \param[in] buf	a delimiter-separated string
+ * \param[in] endh	a pointer to a pointer that will point to the string
+ *			starting with the delimiter
+ *
+ * \retval 0		if the string is a valid nid list
+ * \retval 1		if the string is not a valid nid list
+ */
+static int lmd_parse_nidlist(char *buf, char **endh)
+{
+	struct list_head nidlist;
+	char *endp = buf;
+	int rc = 0;
+	char tmp;
+
+	if (!buf)
+		return 1;
+	while (*buf == ',' || *buf == ':')
+		buf++;
+	if (*buf == ' ' || *buf == '/' || *buf == '\0')
+		return 1;
+
+	if (!lmd_find_delimiter(buf, &endp))
+		endp = buf + strlen(buf);
+
+	tmp = *endp;
+	*endp = '\0';
+
+	INIT_LIST_HEAD(&nidlist);
+	if (cfs_parse_nidlist(buf, strlen(buf), &nidlist) <= 0)
+		rc = 1;
+	cfs_free_nidlist(&nidlist);
+
+	*endp = tmp;
+	if (rc)
+		return rc;
+	if (endh)
+		*endh = endp;
+	return 0;
+}
+
 /** Parse mount line options
  * e.g. mount -v -t lustre -o abort_recov uml1:uml2:/lustre-client /mnt/lustre
  * dev is passed as device=uml1:/lustre by mount.lustre
@@ -1006,20 +1104,18 @@ static int lmd_parse(char *options, struct lustre_mount_data *lmd)
 			clear++;
 		} else if (strncmp(s1, "param=", 6) == 0) {
 			size_t length, params_length;
-			char *tail = strchr(s1 + 6, ',');
+			char *tail = s1;
 
-			if (!tail) {
-				length = strlen(s1);
-			} else {
-				lnet_nid_t nid;
+			if (lmd_find_delimiter(s1 + 6, &tail)) {
 				char *param_str = tail + 1;
 				int supplementary = 1;
 
-				while (!class_parse_nid_quiet(param_str, &nid,
-							      &param_str)) {
+				while (!lmd_parse_nidlist(param_str,
+							  &param_str))
 					supplementary = 0;
-				}
 				length = param_str - s1 - supplementary;
+			} else {
+				length = strlen(s1);
 			}
 			length -= 6;
 			params_length = strlen(lmd->lmd_params);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [lustre-devel] [PATCH 16/18] lustre: libcfs: restore original behavior in cfs_str2num_check
  2018-07-02 23:24 [lustre-devel] [PATCH 00/18] lustre: missing updates from lustre 2.9 James Simmons
                   ` (14 preceding siblings ...)
  2018-07-02 23:24 ` [lustre-devel] [PATCH 15/18] lustre: mount: fix lmd_parse() to handle new delimiters James Simmons
@ 2018-07-02 23:24 ` James Simmons
  2018-07-03  4:45   ` NeilBrown
  2018-07-02 23:24 ` [lustre-devel] [PATCH 17/18] lustre: ldlm: reduce mem footprint of ldlm_resource James Simmons
                   ` (2 subsequent siblings)
  18 siblings, 1 reply; 41+ messages in thread
From: James Simmons @ 2018-07-02 23:24 UTC (permalink / raw)
  To: lustre-devel

When cfs_str2num_check() moved from simple_strtoul() to kstrtouint()
some of the functionality got lost. Restore handling of hexadecimal
numbers as well as '+' and '-'. Also handle any trailing spaces.
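
For comparison, the intended parsing behavior can be approximated in
userspace with strtoul() and base 0. This is only a sketch under
stated assumptions: sketch_str2num() is a hypothetical name, and it
rejects a leading '-' because the target is unsigned, which may differ
from what the kernel helper accepts.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Return 1 and store the value if str is one whole number, else 0. */
static int sketch_str2num(const char *str, unsigned int *num)
{
	unsigned long val;
	char *end;

	while (isspace((unsigned char)*str))
		str++;
	/* assumption: negative input is invalid for an unsigned result */
	if (*str == '-' || *str == '\0')
		return 0;

	errno = 0;
	val = strtoul(str, &end, 0);	/* base 0 accepts 0x.. and 0.. */
	if (end == str || errno == ERANGE || val > UINT_MAX)
		return 0;

	/* allow trailing whitespace, but nothing else */
	while (isspace((unsigned char)*end))
		end++;
	if (*end != '\0')
		return 0;

	*num = (unsigned int)val;
	return 1;
}
```

Here "0x1f " parses as 31 and "+12" as 12, while "12abc" is rejected.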

Signed-off-by: James Simmons <uja.ornl@yahoo.com>
WC-bug-id: https://jira.whamcloud.com/browse/LU-9325
Reviewed-on: https://review.whamcloud.com/32217
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lnet/libcfs/libcfs_string.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/lustre/lnet/libcfs/libcfs_string.c b/drivers/staging/lustre/lnet/libcfs/libcfs_string.c
index e1fb126..e390b0b 100644
--- a/drivers/staging/lustre/lnet/libcfs/libcfs_string.c
+++ b/drivers/staging/lustre/lnet/libcfs/libcfs_string.c
@@ -214,8 +214,10 @@ char *cfs_firststr(char *str, size_t size)
 {
 	bool all_numbers = true;
 	char *endp, cache;
+	int len;
 	int rc;
 
+	endp = strim(str);
 	/**
 	 * kstrouint can only handle strings composed
 	 * of only numbers. We need to scan the string
@@ -228,16 +230,25 @@ char *cfs_firststr(char *str, size_t size)
 	 * After we are done the character at the
 	 * position we placed '\0' must be restored.
 	 */
-	for (endp = str; endp < str + nob; endp++) {
-		if (!isdigit(*endp)) {
+	len = min((int)strlen(endp), nob);
+	for (; endp < str + len; endp++) {
+		if (!isdigit(*endp) && *endp != '-' &&
+		    *endp != '+') {
 			all_numbers = false;
 			break;
 		}
 	}
+
+	/* Eat trailing space */
+	if (!all_numbers && isspace(*endp)) {
+		all_numbers = true;
+		endp--;
+	}
+
 	cache = *endp;
 	*endp = '\0';
 
-	rc = kstrtouint(str, 10, num);
+	rc = kstrtouint(str, 0, num);
 	*endp = cache;
 	if (rc || !all_numbers)
 		return 0;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [lustre-devel] [PATCH 17/18] lustre: ldlm: reduce mem footprint of ldlm_resource
  2018-07-02 23:24 [lustre-devel] [PATCH 00/18] lustre: missing updates from lustre 2.9 James Simmons
                   ` (15 preceding siblings ...)
  2018-07-02 23:24 ` [lustre-devel] [PATCH 16/18] lustre: libcfs: restore original behavior in cfs_str2num_check James Simmons
@ 2018-07-02 23:24 ` James Simmons
  2018-07-02 23:24 ` [lustre-devel] [PATCH 18/18] lustre: update version to 2.8.99 James Simmons
  2018-07-03  4:57 ` [lustre-devel] [PATCH 00/18] lustre: missing updates from lustre 2.9 NeilBrown
  18 siblings, 0 replies; 41+ messages in thread
From: James Simmons @ 2018-07-02 23:24 UTC (permalink / raw)
  To: lustre-devel

From: Niu Yawei <yawei.niu@intel.com>

 - Allocate lr_itree only for LDLM_EXTENT resources, saving
   120 bytes;
 - Move fields around to eliminate 3 holes, saving 4 bytes;
 - Remove the unused lr_contention_time, saving 8 bytes.

   This saves 132 bytes in total.
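
The first item, allocating the interval trees only for extent
resources, follows a common pattern that can be sketched in userspace.
All names below are hypothetical stand-ins (calloc() replaces the slab
caches, and SKETCH_MODE_NUM of 8 is an assumed value for
LCK_MODE_NUM); this is not the ldlm code itself.

```c
#include <stdlib.h>

#define SKETCH_MODE_NUM 8	/* assumption: stand-in for LCK_MODE_NUM */

enum sketch_type { SKETCH_PLAIN, SKETCH_EXTENT };

struct sketch_itree {
	int size;
	int mode;
};

struct sketch_resource {
	enum sketch_type type;
	struct sketch_itree *itree;	/* NULL unless type is SKETCH_EXTENT */
};

/* Allocate a resource; the tree array is only paid for by extents. */
static struct sketch_resource *sketch_resource_new(enum sketch_type type)
{
	struct sketch_resource *res = calloc(1, sizeof(*res));
	int idx;

	if (!res)
		return NULL;
	res->type = type;

	if (type == SKETCH_EXTENT) {
		res->itree = calloc(SKETCH_MODE_NUM, sizeof(*res->itree));
		if (!res->itree) {
			free(res);
			return NULL;
		}
		/* one tree per lock mode, mode stored as a bit flag */
		for (idx = 0; idx < SKETCH_MODE_NUM; idx++)
			res->itree[idx].mode = 1 << idx;
	}
	return res;
}

static void sketch_resource_free(struct sketch_resource *res)
{
	if (!res)
		return;
	free(res->itree);	/* free(NULL) is a no-op */
	free(res);
}
```

The trade-off is one extra allocation and a pointer dereference for
extent resources in exchange for a smaller structure for every other
lock type.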

Signed-off-by: Niu Yawei <yawei.niu@intel.com>
WC-bug-id: https://jira.whamcloud.com/browse/LU-6775
Reviewed-on: http://review.whamcloud.com/15485
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/lustre/include/lustre_dlm.h | 24 ++++++++--------
 drivers/staging/lustre/lustre/ldlm/ldlm_internal.h |  1 +
 drivers/staging/lustre/lustre/ldlm/ldlm_lockd.c    | 20 +++++++++++---
 drivers/staging/lustre/lustre/ldlm/ldlm_resource.c | 32 ++++++++++++++++------
 4 files changed, 54 insertions(+), 23 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/lustre_dlm.h b/drivers/staging/lustre/lustre/include/lustre_dlm.h
index 4f196c2..2a05ab8 100644
--- a/drivers/staging/lustre/lustre/include/lustre_dlm.h
+++ b/drivers/staging/lustre/lustre/include/lustre_dlm.h
@@ -819,6 +819,9 @@ struct ldlm_resource {
 	 */
 	struct hlist_node	lr_hash;
 
+	/** Reference count for this resource */
+	atomic_t		lr_refcount;
+
 	/** Spinlock to protect locks under this resource. */
 	spinlock_t		lr_lock;
 
@@ -835,32 +838,31 @@ struct ldlm_resource {
 	struct list_head		lr_waiting;
 	/** @} */
 
-	/** Type of locks this resource can hold. Only one type per resource. */
-	enum ldlm_type		lr_type; /* LDLM_{PLAIN,EXTENT,FLOCK,IBITS} */
-
 	/** Resource name */
 	struct ldlm_res_id	lr_name;
-	/** Reference count for this resource */
-	atomic_t		lr_refcount;
 
 	/**
 	 * Interval trees (only for extent locks) for all modes of this resource
 	 */
-	struct ldlm_interval_tree lr_itree[LCK_MODE_NUM];
+	struct ldlm_interval_tree *lr_itree;
+
+	/** Type of locks this resource can hold. Only one type per resource. */
+	enum ldlm_type		lr_type; /* LDLM_{PLAIN,EXTENT,FLOCK,IBITS} */
 
 	/**
 	 * Server-side-only lock value block elements.
 	 * To serialize lvbo_init.
 	 */
-	struct mutex		lr_lvb_mutex;
 	int			lr_lvb_len;
+	struct mutex		lr_lvb_mutex;
+
+	/**
+	 * Associated inode, used only on client side.
+	 */
+	struct inode		*lr_lvb_inode;
 
-	/** When the resource was considered as contended. */
-	unsigned long		lr_contention_time;
 	/** List of references to this resource. For debugging. */
 	struct lu_ref		lr_reference;
-
-	struct inode		*lr_lvb_inode;
 };
 
 static inline bool ldlm_has_layout(struct ldlm_lock *lock)
diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_internal.h b/drivers/staging/lustre/lustre/ldlm/ldlm_internal.h
index 60a15b9..1d7c727 100644
--- a/drivers/staging/lustre/lustre/ldlm/ldlm_internal.h
+++ b/drivers/staging/lustre/lustre/ldlm/ldlm_internal.h
@@ -165,6 +165,7 @@ void ldlm_handle_bl_callback(struct ldlm_namespace *ns,
 
 /* ldlm_lockd.c & ldlm_lock.c */
 extern struct kmem_cache *ldlm_lock_slab;
+extern struct kmem_cache *ldlm_interval_tree_slab;
 
 /* ldlm_extent.c */
 void ldlm_extent_add_lock(struct ldlm_resource *res, struct ldlm_lock *lock);
diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_lockd.c b/drivers/staging/lustre/lustre/ldlm/ldlm_lockd.c
index f410ef6..5b125fdc 100644
--- a/drivers/staging/lustre/lustre/ldlm/ldlm_lockd.c
+++ b/drivers/staging/lustre/lustre/ldlm/ldlm_lockd.c
@@ -1129,15 +1129,26 @@ int ldlm_init(void)
 					   sizeof(struct ldlm_lock), 0,
 					   SLAB_HWCACHE_ALIGN |
 					   SLAB_TYPESAFE_BY_RCU, NULL);
-	if (!ldlm_lock_slab) {
-		kmem_cache_destroy(ldlm_resource_slab);
-		return -ENOMEM;
-	}
+	if (!ldlm_lock_slab)
+		goto out_resource;
+
+	ldlm_interval_tree_slab = kmem_cache_create("interval_tree",
+						    sizeof(struct ldlm_interval_tree) * LCK_MODE_NUM,
+						    0, SLAB_HWCACHE_ALIGN,
+						    NULL);
+	if (!ldlm_interval_tree_slab)
+		goto out_lock;
 
 #if LUSTRE_TRACKS_LOCK_EXP_REFS
 	class_export_dump_hook = ldlm_dump_export_locks;
 #endif
 	return 0;
+
+out_lock:
+	kmem_cache_destroy(ldlm_lock_slab);
+out_resource:
+	kmem_cache_destroy(ldlm_resource_slab);
+	return -ENOMEM;
 }
 
 void ldlm_exit(void)
@@ -1151,4 +1162,5 @@ void ldlm_exit(void)
 	 */
 	synchronize_rcu();
 	kmem_cache_destroy(ldlm_lock_slab);
+	kmem_cache_destroy(ldlm_interval_tree_slab);
 }
diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_resource.c b/drivers/staging/lustre/lustre/ldlm/ldlm_resource.c
index 3946d62..f06cbd8 100644
--- a/drivers/staging/lustre/lustre/ldlm/ldlm_resource.c
+++ b/drivers/staging/lustre/lustre/ldlm/ldlm_resource.c
@@ -44,6 +44,7 @@
 #include <linux/libcfs/libcfs_hash.h>
 
 struct kmem_cache *ldlm_resource_slab, *ldlm_lock_slab;
+struct kmem_cache *ldlm_interval_tree_slab;
 
 int ldlm_srv_namespace_nr;
 int ldlm_cli_namespace_nr;
@@ -1001,10 +1002,9 @@ struct ldlm_namespace *ldlm_namespace_first_locked(enum ldlm_side client)
 }
 
 /** Create and initialize new resource. */
-static struct ldlm_resource *ldlm_resource_new(void)
+static struct ldlm_resource *ldlm_resource_new(enum ldlm_type ldlm_type)
 {
 	struct ldlm_resource *res;
-	int idx;
 
 	res = kmem_cache_zalloc(ldlm_resource_slab, GFP_NOFS);
 	if (!res)
@@ -1013,11 +1013,22 @@ static struct ldlm_resource *ldlm_resource_new(void)
 	INIT_LIST_HEAD(&res->lr_granted);
 	INIT_LIST_HEAD(&res->lr_waiting);
 
-	/* Initialize interval trees for each lock mode. */
-	for (idx = 0; idx < LCK_MODE_NUM; idx++) {
-		res->lr_itree[idx].lit_size = 0;
-		res->lr_itree[idx].lit_mode = 1 << idx;
-		res->lr_itree[idx].lit_root = RB_ROOT_CACHED;
+	if (ldlm_type == LDLM_EXTENT) {
+		int idx;
+
+		res->lr_itree = kmem_cache_zalloc(ldlm_interval_tree_slab,
+						  GFP_NOFS);
+		if (!res->lr_itree) {
+			kmem_cache_free(ldlm_resource_slab, res);
+			return NULL;
+		}
+
+		/* Initialize interval trees for each lock mode. */
+		for (idx = 0; idx < LCK_MODE_NUM; idx++) {
+			res->lr_itree[idx].lit_size = 0;
+			res->lr_itree[idx].lit_mode = 1 << idx;
+			res->lr_itree[idx].lit_root = RB_ROOT_CACHED;
+		}
 	}
 
 	atomic_set(&res->lr_refcount, 1);
@@ -1070,7 +1081,7 @@ struct ldlm_resource *
 
 	LASSERTF(type >= LDLM_MIN_TYPE && type < LDLM_MAX_TYPE,
 		 "type: %d\n", type);
-	res = ldlm_resource_new();
+	res = ldlm_resource_new(type);
 	if (!res)
 		return ERR_PTR(-ENOMEM);
 
@@ -1089,6 +1100,9 @@ struct ldlm_resource *
 		lu_ref_fini(&res->lr_reference);
 		/* We have taken lr_lvb_mutex. Drop it. */
 		mutex_unlock(&res->lr_lvb_mutex);
+		if (res->lr_itree)
+			kmem_cache_free(ldlm_interval_tree_slab,
+					res->lr_itree);
 		kmem_cache_free(ldlm_resource_slab, res);
 lvbo_init:
 		res = hlist_entry(hnode, struct ldlm_resource, lr_hash);
@@ -1167,6 +1181,8 @@ static void __ldlm_resource_putref_final(struct cfs_hash_bd *bd,
 		ns->ns_lvbo->lvbo_free(res);
 	if (cfs_hash_bd_count_get(bd) == 0)
 		ldlm_namespace_put(ns);
+	if (res->lr_itree)
+		kmem_cache_free(ldlm_interval_tree_slab, res->lr_itree);
 	kmem_cache_free(ldlm_resource_slab, res);
 }
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [lustre-devel] [PATCH 18/18] lustre: update version to 2.8.99
  2018-07-02 23:24 [lustre-devel] [PATCH 00/18] lustre: missing updates from lustre 2.9 James Simmons
                   ` (16 preceding siblings ...)
  2018-07-02 23:24 ` [lustre-devel] [PATCH 17/18] lustre: ldlm: reduce mem footprint of ldlm_resource James Simmons
@ 2018-07-02 23:24 ` James Simmons
  2018-07-03 21:46   ` Andreas Dilger
  2018-07-03  4:57 ` [lustre-devel] [PATCH 00/18] lustre: missing updates from lustre 2.9 NeilBrown
  18 siblings, 1 reply; 41+ messages in thread
From: James Simmons @ 2018-07-02 23:24 UTC (permalink / raw)
  To: lustre-devel

With the missing patches from the lustre 2.9 release merged
upstream, it's time to update the upstream client's version.

Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/include/uapi/linux/lustre/lustre_ver.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/lustre/include/uapi/linux/lustre/lustre_ver.h b/drivers/staging/lustre/include/uapi/linux/lustre/lustre_ver.h
index 19c9135..1428fdd 100644
--- a/drivers/staging/lustre/include/uapi/linux/lustre/lustre_ver.h
+++ b/drivers/staging/lustre/include/uapi/linux/lustre/lustre_ver.h
@@ -2,10 +2,10 @@
 #define _LUSTRE_VER_H_
 
 #define LUSTRE_MAJOR 2
-#define LUSTRE_MINOR 6
+#define LUSTRE_MINOR 8
 #define LUSTRE_PATCH 99
 #define LUSTRE_FIX 0
-#define LUSTRE_VERSION_STRING "2.6.99"
+#define LUSTRE_VERSION_STRING "2.8.99"
 
 #define OBD_OCD_VERSION(major, minor, patch, fix)			\
 	(((major) << 24) + ((minor) << 16) + ((patch) << 8) + (fix))
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [lustre-devel] [PATCH 01/18] lustre: grant: add support for OBD_CONNECT_GRANT_PARAM
  2018-07-02 23:24 ` [lustre-devel] [PATCH 01/18] lustre: grant: add support for OBD_CONNECT_GRANT_PARAM James Simmons
@ 2018-07-03  3:04   ` NeilBrown
  2018-07-03 12:43     ` Patrick Farrell
  2018-07-06  4:02   ` Cory Spitz
  1 sibling, 1 reply; 41+ messages in thread
From: NeilBrown @ 2018-07-03  3:04 UTC (permalink / raw)
  To: lustre-devel

On Mon, Jul 02 2018, James Simmons wrote:

> From: Johann Lombardi <jlombardi@whamcloud.com>
>
> Add support for grant overhead calculation on the client side.
> To do so, clients track usage on a per-extent basis. An extent is
> composed of contiguous blocks.
> The OST now returns to the OSC layer several parameters to consume
> grant more accurately:

Just to be sure that I'm understanding correctly, should "consume"
above be "compute" ??

>
> - the backend filesystem block size which is the minimal grant
>   allocation unit;
> - the maximum extent size;
> - the extent insertion cost.
>   Clients now pack in bulk write how much grant space was consumed for
>   the RPC. Dirty data accounting also adopts the same scheme.
>
> Moreover, each backend OSD now reports its own set of parameters:
> - For ldiskfs, we usually have a 4KB block size with a maximum extent
>   size of 32MB (theoretical limit of 128MB) and an extent insertion
>   cost of 6 x 4KB = 24KB
> - For ZFS, we report a block size of 128KB, an extent size of 128
>   blocks (i.e. 16MB with 128KB block size) and a block insertion cost
>   of 112KB.
>
> Besides, there is now no more generic metadata overhead reservation
> done inside each OSD. Instead grant space is inflated for clients
> that do not support the new grant parameters. That said, a tiny
> percentage (typically 0.76%) of the free space is still reserved
> inside each OSD to avoid fragmentation which might hurt performance
> and impact our grant calculation (e.g. extents are broken due to
> fragmentation).
>
> This patch also fixes several other issues:
>
> - Bulk write resent by ptlrpc after reconnection could trigger
>   spurious error messages related to broken dirty accounting.
>   The issue was that oa_dirty is discarded for resent requests
>   (grant flag cleared in ost_brw_write()), so we can legitimately
>   have grant > fed_dirty in ofd_grant_check().
>   This was fixed by resetting fed_dirty on reconnection and skipping
>   the dirty accounting check in ofd_grant_check() in the case of
>   ptlrpc resend.

ofd_grant_check doesn't appear in this patch.  Presumably it was part of
the server code that was stripped from the patch.

>
> - In obd_connect_data_seqprint(), the connection flags cannot fit
>   in a 32-bit integer.

I cannot see what this might refer to either.

>
> - When merging two OSC extents, an extent tax should be released
>   in both the merged extent and in the grant accounting.

I think I might see what this relates to.

> +static ssize_t cur_dirty_grant_bytes_show(struct kobject *kobj,
> +					  struct attribute *attr,
> +					  char *buf)
> +{
> +	struct obd_device *dev = container_of(kobj, struct obd_device,
> +					      obd_kobj);
> +	struct client_obd *cli = &dev->u.cli;
> +	int len;
> +
> +	spin_lock(&cli->cl_loi_list_lock);
> +	len = sprintf(buf, "%lu\n", cli->cl_dirty_grant);
> +	spin_unlock(&cli->cl_loi_list_lock);

Why take the spinlock?? cl_dirty_grant is 'long', so reading it is
atomic.


> +		if (OCD_HAS_FLAG(&cli->cl_import->imp_connect_data,
> +				 GRANT_PARAM)) {
> +			int nrextents;
> +
> +			/*
> +			 * take extent tax into account when asking for more
> +			 * grant space
> +			 */
> +			nrextents = (nrpages + cli->cl_max_extent_pages - 1)  /
> +				     cli->cl_max_extent_pages;

I generally prefer DIV_ROUND_UP() for this.
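
For reference, a minimal userspace sketch of the rounding being discussed. The macro body mirrors the kernel's DIV_ROUND_UP(); nr_extents() is a hypothetical helper name, not something in the patch, and the 8192-page extent in the usage note assumes 4KB pages with the 32MB ldiskfs extent size quoted above.

```c
#include <assert.h>

/* Userspace copy of the expression behind the kernel's DIV_ROUND_UP(). */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Hypothetical helper: number of extents needed to cover nrpages when
 * each extent holds at most max_extent_pages pages.
 */
static unsigned long nr_extents(unsigned long nrpages,
				unsigned long max_extent_pages)
{
	assert(max_extent_pages != 0);
	return DIV_ROUND_UP(nrpages, max_extent_pages);
}
```

Under those assumptions, nr_extents(8193, 8192) yields 2: one page over a full extent costs a second extent tax, which is the case the open-coded expression in the patch is guarding.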


I don't think any of the above will stop me from applying the patch,
though I might edit the description, and may add a fix-up patch afterwards.

Thanks.
NeilBrown

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [lustre-devel] [PATCH 02/18] lustre: ladvise: Add feature of giving file access advices
  2018-07-02 23:24 ` [lustre-devel] [PATCH 02/18] lustre: ladvise: Add feature of giving file access advices James Simmons
@ 2018-07-03  3:32   ` NeilBrown
  2018-07-05 22:01     ` James Simmons
  0 siblings, 1 reply; 41+ messages in thread
From: NeilBrown @ 2018-07-03  3:32 UTC (permalink / raw)
  To: lustre-devel

On Mon, Jul 02 2018, James Simmons wrote:

> From: Li Xi <lixi@ddn.com>
>
> The fadvise() system call provided by Linux kernel enables
> applications to give advices or hints about how a file will be
> accessed. However, It is only a client side mechanism which is
> not enough for distributed file systems like Lustre, because in
> order to tune system-wide cache or read-ahead policies, servers
> need to understand the advices too.
>
> This patch adds a new feature named ladvise which provides new
> APIs and utils to give advices about the access pattern of Lustre
> files with the purpose of performance improvement. It is similar
> to Linux fadvise() system call, except it forwards the advices
> directly from Lustre client to server. The server side codes will
> apply appropriate read-ahead and caching techniques for the
> corresponding files.
>
> A typical workload for ladvise is e.g. a bunch of different
> clients are doing small random reads of a file, so prefetching
> pages into OSS cache with big linear reads before the random IO
> is a net benefit. Fetching all that data into each client cache
> with fadvise() may not be, due to much more data being sent to
> the client.
>
...
> +	case LL_IOC_LADVISE: {
> +		struct ladvise_hdr *ladvise_hdr;
> +		int alloc_size = sizeof(*ladvise_hdr);
> +		int num_advise;
> +		int i;
> +
> +		rc = 0;
> +		ladvise_hdr = kzalloc(alloc_size, GFP_NOFS);

Lustre really needs to get more sensible about when to use GFP_NOFS and
when to use GFP_KERNEL.
NOFS is only needed when the allocation cannot wait while reclaim enters
the filesystem, due to possible deadlock.  So if you are holding a lock
or some other resource that writeout might need, use NOFS.
In this case, there is no possible conflict with writeout, so GFP_KERNEL
should be used.

In this case, alloc_size is 32, so I wonder why it is allocating memory
at all rather than just using the stack.

Below the memory is freed and allocated again (why not realloc?) so an
allocation is justified there.  That should be GFP_KERNEL as well.

Later in osc_io_ladvise_start() there is a kvzalloc() with GFP_NOFS.
I suspect that should be GFP_KERNEL as well, especially as vmalloc
doesn't properly support GFP_NOFS and never has.


Again, I'll probably apply this anyway...

NeilBrown

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [lustre-devel] [PATCH 03/18] lustre: fileset: add fileset mount support
  2018-07-02 23:24 ` [lustre-devel] [PATCH 03/18] lustre: fileset: add fileset mount support James Simmons
@ 2018-07-03  3:52   ` NeilBrown
  0 siblings, 0 replies; 41+ messages in thread
From: NeilBrown @ 2018-07-03  3:52 UTC (permalink / raw)
  To: lustre-devel

On Mon, Jul 02 2018, James Simmons wrote:

> From: Lai Siyao <lsiyao@whamcloud.com>
>
> This patch enables client to mount subdirectory as fileset.
>
> usage: mount -t lustre mgsname:/fsname/subdir /mount/point
>
>  * mdt lookup fileset fid and return to client during mount.
>  * `fid2path` support for fileset.
>

> @@ -2626,7 +2626,10 @@ struct getinfo_fid2path {
>  	__u64	   gf_recno;
>  	__u32	   gf_linkno;
>  	__u32	   gf_pathlen;
> -	char	    gf_path[0];
> +	union {
> +		char		gf_path[0];
> +		struct lu_fid	gf_root_fid[0];
> +	} gf_u;

If you just left the "gf_u" off here, you wouldn't need changes like:

> -		ptr = ori_gf->gf_path;
> +		ptr = ori_gf->gf_u.gf_path;

throughout the code.

I'll probably make this change before applying the patch.
The rest can come afterwards.
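
A small userspace sketch of why dropping the union name avoids the churn. The struct and helper names are hypothetical stand-ins, and fixed-size members replace the zero-length arrays of the real structure so that the example stays within standard C11.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified, hypothetical stand-in for struct lu_fid. */
struct demo_fid {
	uint64_t f_seq;
	uint32_t f_oid;
	uint32_t f_ver;
};

/* Stand-in for struct getinfo_fid2path: the union has no name. */
struct demo_fid2path {
	uint32_t gf_pathlen;
	union {
		char		gf_path[16];
		struct demo_fid	gf_root_fid;
	};
};

/* Callers keep writing gf->gf_path, with no gf_u. prefix. */
static size_t demo_set_path(struct demo_fid2path *gf, const char *path)
{
	size_t len = strlen(path);

	assert(gf != NULL);
	if (len >= sizeof(gf->gf_path))
		len = sizeof(gf->gf_path) - 1;
	memcpy(gf->gf_path, path, len);
	gf->gf_path[len] = '\0';
	gf->gf_pathlen = (uint32_t)len;
	return len;
}
```

With an anonymous union the members are accessed as if they belonged to the enclosing struct, so existing users of gf_path compile unchanged.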

>  } __packed;
>  
>  /** path2parent request/reply structures */
> diff --git a/drivers/staging/lustre/lustre/include/lustre_disk.h b/drivers/staging/lustre/lustre/include/lustre_disk.h
> index 886e817..bd8fa71 100644
> --- a/drivers/staging/lustre/lustre/include/lustre_disk.h
> +++ b/drivers/staging/lustre/lustre/include/lustre_disk.h
> @@ -78,6 +78,7 @@ struct lustre_mount_data {
>  	int	lmd_recovery_time_hard;
>  	char      *lmd_dev;	   /* device name */
>  	char      *lmd_profile;    /* client only */
> +	char	*lmd_fileset;	/* mount fileset */

Consistent indenting is so over-rated :-)
I realize that there is *nothing* you could have done here to make
the indenting consistent without including irrelevant changes in the
patch, which is generally frowned upon.
This is why it is not unusual to have some clean-up patches first,
so that your functional changes can be applied to clean code.

>  
> +	if (fileset)
> +		req_capsule_set_size(&req->rq_pill, &RMF_NAME, RCL_CLIENT,
> +				     strlen(fileset) + 1);
> +	rc = ptlrpc_request_pack(req, LUSTRE_MDS_VERSION, MDS_GET_ROOT);
> +	if (rc) {
> +		ptlrpc_request_free(req);
> +		return rc;
> +	}
>  	mdc_pack_body(req, NULL, 0, 0, -1, 0);
> +	if (fileset) {
> +		char *name = req_capsule_client_get(&req->rq_pill, &RMF_NAME);
> +
> +		memcpy(name, fileset, strlen(fileset));

Why isn't this strcpy()?
I realize that strcpy will add a nul that memcpy doesn't add so if I
could see the length being stored, I would understand.
But above I see the size passed to req_capsule_set_size() includes
the trailing nul.

Am I confused, or is the code confusing?
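
To make the difference concrete, a userspace sketch with hypothetical helper names. Whether the request buffer returned by req_capsule_client_get() is already zeroed is not visible in this thread; the memcpy() form only yields a terminated string if it is.

```c
#include <assert.h>
#include <string.h>

/* Copies only the characters; any terminator must already be in dst. */
static void pack_name_memcpy(char *dst, const char *fileset)
{
	assert(dst != NULL && fileset != NULL);
	memcpy(dst, fileset, strlen(fileset));
}

/* Copies the characters and the trailing NUL. */
static void pack_name_strcpy(char *dst, const char *fileset)
{
	assert(dst != NULL && fileset != NULL);
	strcpy(dst, fileset);
}
```

Both forms need strlen(fileset) + 1 bytes of room, which matches the size passed to req_capsule_set_size(); only the strcpy() form is independent of the buffer's prior contents.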


> @@ -1073,10 +1074,30 @@ static int lmd_parse(char *options, struct lustre_mount_data *lmd)
>  		/* Remove leading /s from fsname */
>  		while (*++s1 == '/')
>  			;
> +		s2 = s1;
> +		while (*s2 != '/' && *s2 != '\0')
> +			s2++;

 s2 = strchrnul(s1, '/');

(I only discovered strchrnul() a few days ago).

>  		/* Freed in lustre_free_lsi */
> -		lmd->lmd_profile = kasprintf(GFP_NOFS, "%s-client", s1);
> +		lmd->lmd_profile = kzalloc(s2 - s1 + 8, GFP_NOFS);
>  		if (!lmd->lmd_profile)
>  			return -ENOMEM;
> +
> +		strncat(lmd->lmd_profile, s1, s2 - s1);
> +		strncat(lmd->lmd_profile, "-client", 7);

  kasprintf(GFP_KERNEL, "%.*s-client", (int)(s2 - s1), s1);

??
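
A userspace sketch of the two suggestions combined, under stated assumptions: demo_strchrnul() is a portable stand-in with the semantics of strchrnul(), malloc() plus snprintf() stands in for kasprintf(), and demo_profile_name() is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Portable stand-in with strchrnul() semantics: never returns NULL. */
static const char *demo_strchrnul(const char *s, int c)
{
	while (*s != '\0' && *s != (char)c)
		s++;
	return s;
}

/*
 * Hypothetical helper: build "<fsname>-client" from "fsname[/subdir...]".
 * Returns a malloc()ed string the caller must free(), or NULL on failure.
 */
static char *demo_profile_name(const char *s1)
{
	const char *s2;
	int fslen, need;
	char *profile;

	assert(s1 != NULL);
	s2 = demo_strchrnul(s1, '/');
	fslen = (int)(s2 - s1);		/* "%.*s" takes an int precision */
	need = snprintf(NULL, 0, "%.*s-client", fslen, s1);
	if (need < 0)
		return NULL;
	profile = malloc((size_t)need + 1);
	if (!profile)
		return NULL;
	snprintf(profile, (size_t)need + 1, "%.*s-client", fslen, s1);
	return profile;
}
```

The precision in "%.*s" bounds how many bytes of s1 are copied, so neither a temporary terminator nor a hand-computed buffer size is needed.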

> +
> +		s1 = s2;
> +		s2 = s1 + strlen(s1) - 1;
> +		/* Remove padding /s from fileset */
> +		while (*s2 == '/')
> +			s2--;
> +		if (s2 > s1) {
> +			lmd->lmd_fileset = kzalloc(s2 - s1 + 2, GFP_NOFS);
> +			if (!lmd->lmd_fileset) {
> +				kfree(lmd->lmd_profile);
> +				return -ENOMEM;
> +			}
> +			strncat(lmd->lmd_fileset, s1, s2 - s1 + 1);

This probably deserves to be kasprintf() too.

Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [lustre-devel] [PATCH 05/18] lustre: llite: fast read implementation
  2018-07-02 23:24 ` [lustre-devel] [PATCH 05/18] lustre: llite: fast read implementation James Simmons
@ 2018-07-03  4:10   ` NeilBrown
  2018-07-03 12:46     ` Patrick Farrell
  0 siblings, 1 reply; 41+ messages in thread
From: NeilBrown @ 2018-07-03  4:10 UTC (permalink / raw)
  To: lustre-devel

On Mon, Jul 02 2018, James Simmons wrote:

> +static ssize_t fast_read_store(struct kobject *kobj,
> +			       struct attribute *attr,
> +			       const char *buffer,
> +			       size_t count)
> +{
> +	struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info,
> +					      ll_kobj);
> +	bool val;
> +	int rc;
> +
> +	rc = kstrtobool(buffer, &val);
> +	if (rc)
> +		return rc;
> +
> +	spin_lock(&sbi->ll_lock);
> +	if (val)
> +		sbi->ll_flags |= LL_SBI_FAST_READ;
> +	else
> +		sbi->ll_flags &= ~LL_SBI_FAST_READ;
> +	spin_unlock(&sbi->ll_lock);

This is a little odd... no other code uses ll_lock to protect updates to
ll_flags.  Various other places (e.g. checksum_pages_store()) update it
unprotected.

Maybe they all need to use ll_lock for protection.


Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [lustre-devel] [PATCH 06/18] lustre: obd: reserve connection flag OBD_CONNECT2_FILE_SECCTX
  2018-07-02 23:24 ` [lustre-devel] [PATCH 06/18] lustre: obd: reserve connection flag OBD_CONNECT2_FILE_SECCTX James Simmons
@ 2018-07-03  4:20   ` NeilBrown
  0 siblings, 0 replies; 41+ messages in thread
From: NeilBrown @ 2018-07-03  4:20 UTC (permalink / raw)
  To: lustre-devel

On Mon, Jul 02 2018, James Simmons wrote:

> From: "John L. Hammond" <jhammond@whamcloud.com>
>
> The connection flag OBD_CONNECT2_FILE_SECCTX will be set (in
> ocd_connect_flags2) if an MDT supports setting the file security
> context at create time.
>

>  
> -static void obd_connect_seq_flags2str(struct seq_file *m, __u64 flags, char *sep)
> +static void obd_connect_seq_flags2str(struct seq_file *m, u64 flags,
> +				      u64 flags2, const char *sep)
>  {
> -	__u64 mask = 1;
> +	__u64 mask;
>  	int i;
>  	bool first = true;
>  
> -	for (i = 0; obd_connect_names[i]; i++, mask <<= 1) {
> +	for (i = 0, mask = 1; i < 64; i++, mask <<= 1) {

This changes where the loop stops.  Previously it stopped when
obd_connect_names[i] was NULL.  Now it goes on to 64.
obd_connect_names[] actually has 66 entries, so this isn't
an actual change.  However.... (continued below).

>  		if (flags & mask) {
>  			seq_printf(m, "%s%s",
>  				   first ? sep : "", obd_connect_names[i]);
>  			first = false;
>  		}
>  	}
> -	if (flags & ~(mask - 1))
> +
> +	if (flags & ~(mask - 1)) {
>  		seq_printf(m, "%sunknown flags %#llx",
>  			   first ? sep : "", flags & ~(mask - 1));
> +		first = false;
> +	}

Now that we know i goes up to 64, this code is pointless.
At this point, mask is 0, mask-1 is -1, ~(mask - 1) is zero.
So it cannot print anything.
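
The arithmetic can be checked in userspace with a short sketch. unknown_flags_mask() is a hypothetical helper that reproduces only the shape of the loop, and the 58 in the usage note is an arbitrary count chosen for contrast.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Mirrors the loop shape in the patch: shift a 64-bit mask left once per
 * iteration, 'bits' times, then return the "unknown flags" mask
 * ~(mask - 1).
 */
static uint64_t unknown_flags_mask(int bits)
{
	uint64_t mask = 1;
	int i;

	assert(bits >= 0 && bits <= 64);
	for (i = 0; i < bits; i++)
		mask <<= 1;
	return ~(mask - 1);
}
```

After 64 iterations the single set bit has been shifted out, so the result is 0 and "flags & 0" can never be true. Stopping at 58 instead would leave the top six bits set, which is what made the check meaningful when the loop ended at the last named flag.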

Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [lustre-devel] [PATCH 08/18] lustre: security: send file security context for creates
  2018-07-02 23:24 ` [lustre-devel] [PATCH 08/18] lustre: security: send file security context for creates James Simmons
@ 2018-07-03  4:26   ` NeilBrown
  0 siblings, 0 replies; 41+ messages in thread
From: NeilBrown @ 2018-07-03  4:26 UTC (permalink / raw)
  To: lustre-devel

On Mon, Jul 02 2018, James Simmons wrote:

> From: "John L. Hammond" <jhammond@whamcloud.com>
>
> Send file security context to MDT along with create RPCs. This closes
> the insecure window between creation and setting of the security
> context that existed previously. It also avoids a potential LDLM hang
> which arises from ll_create_it() when we send a MDS_SETXATTR RPC while
> holding the lookup+layout lock returned from open.
>


>  
> -err_exit:
> +out_inode:
> +	if (inode)
> +		iput(inode);

iput() allows NULL to be passed (as all 'release' functions should)
so this can be just
	iput(inode);


Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [lustre-devel] [PATCH 16/18] lustre: libcfs: restore original behavior in cfs_str2num_check
  2018-07-02 23:24 ` [lustre-devel] [PATCH 16/18] lustre: libcfs: restore original behavior in cfs_str2num_check James Simmons
@ 2018-07-03  4:45   ` NeilBrown
  2018-07-05 22:50     ` James Simmons
  0 siblings, 1 reply; 41+ messages in thread
From: NeilBrown @ 2018-07-03  4:45 UTC (permalink / raw)
  To: lustre-devel

On Mon, Jul 02 2018, James Simmons wrote:

> When cfs_str2num_check() moved from simple_strtoul to kstrtoul
> some of the functionality got lost. Restore handling hexadecimal
> number as well as '+' and '-'. Also handle any trailing spaces.
>
> Signed-off-by: James Simmons <uja.ornl@yahoo.com>
> WC-bug-id: https://jira.whamcloud.com/browse/LU-9325
> Reviewed-on: https://review.whamcloud.com/32217
> Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
> Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
> Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
> Reviewed-by: Oleg Drokin <green@whamcloud.com>
> Signed-off-by: James Simmons <jsimmons@infradead.org>
> ---
>  drivers/staging/lustre/lnet/libcfs/libcfs_string.c | 17 ++++++++++++++---
>  1 file changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/staging/lustre/lnet/libcfs/libcfs_string.c b/drivers/staging/lustre/lnet/libcfs/libcfs_string.c
> index e1fb126..e390b0b 100644
> --- a/drivers/staging/lustre/lnet/libcfs/libcfs_string.c
> +++ b/drivers/staging/lustre/lnet/libcfs/libcfs_string.c
> @@ -214,8 +214,10 @@ char *cfs_firststr(char *str, size_t size)
>  {
>  	bool all_numbers = true;
>  	char *endp, cache;
> +	int len;
>  	int rc;
>  
> +	endp = strim(str);

This looks bad.  The function now changes the string buffer, where it
didn't before.

>  	/**
>  	 * kstrouint can only handle strings composed
>  	 * of only numbers. We need to scan the string
> @@ -228,16 +230,25 @@ char *cfs_firststr(char *str, size_t size)
>  	 * After we are done the character at the
>  	 * position we placed '\0' must be restored.
>  	 */
> -	for (endp = str; endp < str + nob; endp++) {
> -		if (!isdigit(*endp)) {
> +	len = min((int)strlen(endp), nob);

As endp might be different from 'str', 'nob' is not meaningful in the
context of endp.

I'm guessing that mainline commit

Commit: c948390f10cc ("staging: lustre: llite: fix inconsistencies of root squash feature")

is related to this.

I cannot apply this patch as-is.  It is broken.  Possibly changing
   endp = strim(str);
to
   endp = str;

will fix it.
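
Both objections can be seen in a userspace sketch. demo_strim() is a stand-in written to behave like the kernel's strim() (terminate after the last non-space character, return a pointer past any leading spaces); it is an illustration, not the kernel source.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Userspace stand-in modeled on the behavior of the kernel's strim(). */
static char *demo_strim(char *s)
{
	size_t len;

	assert(s != NULL);
	len = strlen(s);

	/* Overwrite trailing whitespace: the caller's buffer is modified. */
	while (len > 0 && isspace((unsigned char)s[len - 1]))
		len--;
	s[len] = '\0';

	/* Skip leading whitespace: the result may point past 's'. */
	while (isspace((unsigned char)*s))
		s++;
	return s;
}
```

For an input of "  42  " the buffer is shortened in place and the returned pointer is s + 2, so a length computed from that pointer no longer describes the range starting at str, which is the mismatch in the "endp < str + len" loop bound.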

Thanks,
NeilBrown

> +	for (; endp < str + len; endp++) {
> +		if (!isdigit(*endp) && *endp != '-' &&
> +		    *endp != '+') {
>  			all_numbers = false;
>  			break;
>  		}
>  	}
> +
> +	/* Eat trailing space */
> +	if (!all_numbers && isspace(*endp)) {
> +		all_numbers = true;
> +		endp--;
> +	}
> +
>  	cache = *endp;
>  	*endp = '\0';
>  
> -	rc = kstrtouint(str, 10, num);
> +	rc = kstrtouint(str, 0, num);
>  	*endp = cache;
>  	if (rc || !all_numbers)
>  		return 0;
> -- 
> 1.8.3.1

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [lustre-devel] [PATCH 00/18] lustre: missing updates from lustre 2.9
  2018-07-02 23:24 [lustre-devel] [PATCH 00/18] lustre: missing updates from lustre 2.9 James Simmons
                   ` (17 preceding siblings ...)
  2018-07-02 23:24 ` [lustre-devel] [PATCH 18/18] lustre: update version to 2.8.99 James Simmons
@ 2018-07-03  4:57 ` NeilBrown
  18 siblings, 0 replies; 41+ messages in thread
From: NeilBrown @ 2018-07-03  4:57 UTC (permalink / raw)
  To: lustre-devel

On Mon, Jul 02 2018, James Simmons wrote:

> This patch set fills in the known missing pieces from the Lustre
> 2.9 release. With these patches this brings us to the pre-2.10
> development cycle. The grant allocation fixes were missing as
> well as security hole closes. The rest are small changes in
> addition to support for filesets and ladvise feature.
>
> Alexander Boyko (1):
>   lustre: obd: add callback for llog_cat_process_or_fork
>
> Bobi Jam (1):
>   lustre: osc: max_pages_per_rpc should be chunk size aligned
>
> Fan Yong (1):
>   lustre: ptlrpc: properly set "rq_xid" for 4MB IO
>
> Henri Doreau (1):
>   lustre: llite: restore fd_och when putting lease
>
> James Simmons (2):
>   lustre: libcfs: restore original behavior in cfs_str2num_check

Apart from this patch which I couldn't bring myself to apply, I've
applied all these patches - making changes to only one as mentioned
below.
I'll probably add some cleanups later as separately described.

>   lustre: update version to 2.8.99
>
> Jian Yu (1):
>   lustre: mount: fix lmd_parse() to handle new delimiters
>
> Jinshan Xiong (1):
>   lustre: llite: fast read implementation
>
> Johann Lombardi (1):
>   lustre: grant: add support for OBD_CONNECT_GRANT_PARAM
>
> John L. Hammond (3):
>   lustre: obd: rename md_getstatus() to md_get_root()
>   lustre: obd: reserve connection flag OBD_CONNECT2_FILE_SECCTX
>   lustre: security: send file security context for creates
>
> Lai Siyao (1):
>   lustre: fileset: add fileset mount support

When I removed the union name as promised, this went from
 14 files changed, 135 insertions(+), 43 deletions(-)
to
 14 files changed, 120 insertions(+), 28 deletions(-)

Not an enormous change, but worthwhile I think.

Thanks
NeilBrown

>
> Li Xi (3):
>   lustre: ladvise: Add feature of giving file access advices
>   lustre: ladvise: Add willread advice support for ladvise
>   lustre: ladvise: Add dontneed advice support for ladvise
>
> Niu Yawei (1):
>   lustre: ldlm: reduce mem footprint of ldlm_resource
>
> Patrick Farrell (1):
>   lustre: llite: ladvise protocol changes
>
>  .../lustre/include/uapi/linux/lustre/lustre_idl.h  |  52 ++-
>  .../lustre/include/uapi/linux/lustre/lustre_user.h |  51 +++
>  .../lustre/include/uapi/linux/lustre/lustre_ver.h  |   4 +-
>  drivers/staging/lustre/lnet/libcfs/libcfs_string.c |  17 +-
>  drivers/staging/lustre/lustre/include/cl_object.h  |  13 +
>  .../staging/lustre/lustre/include/lustre_disk.h    |   2 +
>  drivers/staging/lustre/lustre/include/lustre_dlm.h |  24 +-
>  .../staging/lustre/lustre/include/lustre_import.h  |   1 +
>  .../lustre/lustre/include/lustre_req_layout.h      |  10 +-
>  .../staging/lustre/lustre/include/lustre_swab.h    |   2 +
>  drivers/staging/lustre/lustre/include/obd.h        |  17 +-
>  drivers/staging/lustre/lustre/include/obd_class.h  |  16 +-
>  .../staging/lustre/lustre/include/obd_support.h    |   5 +-
>  drivers/staging/lustre/lustre/ldlm/ldlm_internal.h |   1 +
>  drivers/staging/lustre/lustre/ldlm/ldlm_lib.c      |   1 +
>  drivers/staging/lustre/lustre/ldlm/ldlm_lockd.c    |  20 +-
>  drivers/staging/lustre/lustre/ldlm/ldlm_resource.c |  32 +-
>  drivers/staging/lustre/lustre/llite/dir.c          |  54 +++-
>  drivers/staging/lustre/lustre/llite/file.c         | 360 ++++++++++++++++++---
>  .../staging/lustre/lustre/llite/llite_internal.h   |  34 +-
>  drivers/staging/lustre/lustre/llite/llite_lib.c    |  37 ++-
>  drivers/staging/lustre/lustre/llite/llite_mmap.c   |  34 +-
>  drivers/staging/lustre/lustre/llite/lproc_llite.c  |  38 +++
>  drivers/staging/lustre/lustre/llite/namei.c        |  40 ++-
>  drivers/staging/lustre/lustre/llite/rw.c           |  68 +++-
>  drivers/staging/lustre/lustre/llite/vvp_internal.h |   1 +
>  drivers/staging/lustre/lustre/llite/vvp_io.c       |   3 +
>  .../staging/lustre/lustre/llite/xattr_security.c   |  38 ++-
>  drivers/staging/lustre/lustre/lmv/lmv_obd.c        |  34 +-
>  drivers/staging/lustre/lustre/lov/lov_io.c         |  28 ++
>  drivers/staging/lustre/lustre/mdc/mdc_internal.h   |   4 +
>  drivers/staging/lustre/lustre/mdc/mdc_lib.c        |  32 ++
>  drivers/staging/lustre/lustre/mdc/mdc_locks.c      |   7 +
>  drivers/staging/lustre/lustre/mdc/mdc_reint.c      |   7 +
>  drivers/staging/lustre/lustre/mdc/mdc_request.c    |  50 ++-
>  drivers/staging/lustre/lustre/obdclass/cl_io.c     |   2 +
>  drivers/staging/lustre/lustre/obdclass/llog_cat.c  |  16 +-
>  .../lustre/lustre/obdclass/lprocfs_status.c        |  72 ++++-
>  drivers/staging/lustre/lustre/obdclass/obd_mount.c | 135 +++++++-
>  drivers/staging/lustre/lustre/osc/lproc_osc.c      |  18 ++
>  drivers/staging/lustre/lustre/osc/osc_cache.c      |  95 ++++--
>  .../staging/lustre/lustre/osc/osc_cl_internal.h    |   1 +
>  drivers/staging/lustre/lustre/osc/osc_dev.c        |   1 +
>  drivers/staging/lustre/lustre/osc/osc_internal.h   |   4 +
>  drivers/staging/lustre/lustre/osc/osc_io.c         |  79 +++++
>  drivers/staging/lustre/lustre/osc/osc_request.c    | 184 +++++++++--
>  drivers/staging/lustre/lustre/ptlrpc/client.c      |   9 +-
>  drivers/staging/lustre/lustre/ptlrpc/import.c      |  10 +
>  drivers/staging/lustre/lustre/ptlrpc/layout.c      |  66 +++-
>  .../staging/lustre/lustre/ptlrpc/lproc_ptlrpc.c    |   3 +-
>  .../staging/lustre/lustre/ptlrpc/pack_generic.c    |  27 +-
>  drivers/staging/lustre/lustre/ptlrpc/wiretest.c    | 122 +++++--
>  52 files changed, 1722 insertions(+), 259 deletions(-)
>
> -- 
> 1.8.3.1

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [lustre-devel] [PATCH 01/18] lustre: grant: add support for OBD_CONNECT_GRANT_PARAM
  2018-07-03  3:04   ` NeilBrown
@ 2018-07-03 12:43     ` Patrick Farrell
  2018-07-03 21:59       ` NeilBrown
  0 siblings, 1 reply; 41+ messages in thread
From: Patrick Farrell @ 2018-07-03 12:43 UTC (permalink / raw)
  To: lustre-devel

Neil,

Compute vs consume: Honestly either would work.  The client consumes grant when it writes data, this fix to the calculation allows it to consume more accurately. *shrugs*

- Patrick


________________________________
From: lustre-devel <lustre-devel-bounces@lists.lustre.org> on behalf of NeilBrown <neilb@suse.com>
Sent: Monday, July 2, 2018 10:04:48 PM
To: James Simmons; Andreas Dilger; Oleg Drokin
Cc: Johann Lombardi; Lustre Development List
Subject: Re: [lustre-devel] [PATCH 01/18] lustre: grant: add support for OBD_CONNECT_GRANT_PARAM

On Mon, Jul 02 2018, James Simmons wrote:

> From: Johann Lombardi <jlombardi@whamcloud.com>
>
> Add support for grant overhead calculation on the client side.
> To do so, clients track usage on a per-extent basis. An extent is
> composed of contiguous blocks.
> The OST now returns to the OSC layer several parameters to consume
> grant more accurately:

Just to be sure that I'm understanding correctly, should "consume"
above be "compute" ??

>
> - the backend filesystem block size which is the minimal grant
>   allocation unit;
> - the maximum extent size;
> - the extent insertion cost.
>   Clients now pack in bulk write how much grant space was consumed for
>   the RPC. Dirty data accounting also adopts the same scheme.
>
> Moreover, each backend OSD now reports its own set of parameters:
> - For ldiskfs, we usually have a 4KB block size with a maximum extent
>   size of 32MB (theoretical limit of 128MB) and an extent insertion
>   cost of 6 x 4KB = 24KB
> - For ZFS, we report a block size of 128KB, an extent size of 128
>   blocks (i.e. 16MB with 128KB block size) and a block insertion cost
>   of 112KB.
>
> Besides, there is now no more generic metadata overhead reservation
> done inside each OSD. Instead grant space is inflated for clients
> that do not support the new grant parameters. That said, a tiny
> percentage (typically 0.76%) of the free space is still reserved
> inside each OSD to avoid fragmentation which might hurt performance
> and impact our grant calculation (e.g. extents are broken due to
> fragmentation).
>
> This patch also fixes several other issues:
>
> - Bulk write resent by ptlrpc after reconnection could trigger
>   spurious error messages related to broken dirty accounting.
>   The issue was that oa_dirty is discarded for resent requests
>   (grant flag cleared in ost_brw_write()), so we can legitimately
>   have grant > fed_dirty in ofd_grant_check().
>   This was fixed by resetting fed_dirty on reconnection and skipping
>   the dirty accounting check in ofd_grant_check() in the case of
>   ptlrpc resend.

ofd_grant_check doesn't appear in this patch.  Presumably it was part of
the server code that was stripped from the patch.

>
> - In obd_connect_data_seqprint(), the connection flags cannot fit
>   in a 32-bit integer.

I cannot see what this might refer to either.

>
> - When merging two OSC extents, an extent tax should be released
>   in both the merged extent and in the grant accounting.

I think I might see what this relates to.

> +static ssize_t cur_dirty_grant_bytes_show(struct kobject *kobj,
> +                                       struct attribute *attr,
> +                                       char *buf)
> +{
> +     struct obd_device *dev = container_of(kobj, struct obd_device,
> +                                           obd_kobj);
> +     struct client_obd *cli = &dev->u.cli;
> +     int len;
> +
> +     spin_lock(&cli->cl_loi_list_lock);
> +     len = sprintf(buf, "%lu\n", cli->cl_dirty_grant);
> +     spin_unlock(&cli->cl_loi_list_lock);

Why take the spinlock?? cl_dirty_grant is 'long', so reading it is
atomic.


> +             if (OCD_HAS_FLAG(&cli->cl_import->imp_connect_data,
> +                              GRANT_PARAM)) {
> +                     int nrextents;
> +
> +                     /*
> +                      * take extent tax into account when asking for more
> +                      * grant space
> +                      */
> +                     nrextents = (nrpages + cli->cl_max_extent_pages - 1)  /
> +                                  cli->cl_max_extent_pages;

I generally prefer DIV_ROUND_UP() for this.
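
For illustration, a minimal userspace sketch of the rounding in question. The macro body matches the usual definition of DIV_ROUND_UP(); the helper name nr_extents() is hypothetical and only mirrors the nrextents computation quoted above.

```c
#include <assert.h>

/* Usual definition of DIV_ROUND_UP(): integer division, rounding up. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Hypothetical helper (not part of the patch): number of extents needed
 * to cover nrpages pages when one extent holds at most max_extent_pages.
 */
static unsigned long nr_extents(unsigned long nrpages,
				unsigned long max_extent_pages)
{
	assert(max_extent_pages > 0);	/* avoid dividing by zero */
	return DIV_ROUND_UP(nrpages, max_extent_pages);
}
```

With the ldiskfs numbers from the commit message (32MB maximum extent, and assuming 4KB pages, so 8192 pages per extent), 8192 dirty pages need one extent and 8193 need two. The result is the same as the open-coded expression; the macro only makes the intent easier to read.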


I don't think any of the above will stop me from applying the patch,
though I might edit the description, and may add a fix-up patch afterwards.

Thanks.
NeilBrown

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [lustre-devel] [PATCH 05/18] lustre: llite: fast read implementation
  2018-07-03  4:10   ` NeilBrown
@ 2018-07-03 12:46     ` Patrick Farrell
  2018-07-03 22:03       ` NeilBrown
  0 siblings, 1 reply; 41+ messages in thread
From: Patrick Farrell @ 2018-07-03 12:46 UTC (permalink / raw)
  To: lustre-devel


They do.  It's probably gone unnoticed because those updates are mostly directly triggered by users or one time only during initialization, so there's little risk of colliding.

________________________________
From: lustre-devel <lustre-devel-bounces@lists.lustre.org> on behalf of NeilBrown <neilb@suse.com>
Sent: Monday, July 2, 2018 11:10:54 PM
To: James Simmons; Andreas Dilger; Oleg Drokin
Cc: Lustre Development List
Subject: Re: [lustre-devel] [PATCH 05/18] lustre: llite: fast read implementation

On Mon, Jul 02 2018, James Simmons wrote:

> +static ssize_t fast_read_store(struct kobject *kobj,
> +                            struct attribute *attr,
> +                            const char *buffer,
> +                            size_t count)
> +{
> +     struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info,
> +                                           ll_kobj);
> +     bool val;
> +     int rc;
> +
> +     rc = kstrtobool(buffer, &val);
> +     if (rc)
> +             return rc;
> +
> +     spin_lock(&sbi->ll_lock);
> +     if (val)
> +             sbi->ll_flags |= LL_SBI_FAST_READ;
> +     else
> +             sbi->ll_flags &= ~LL_SBI_FAST_READ;
> +     spin_unlock(&sbi->ll_lock);

This is a little odd... no other code uses ll_lock to protect updates to
ll_flags.  Various other places (e.g. checksum_pages_store()) update it
unprotected.

Maybe they all need to use ll_lock for protection.
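
As a rough userspace sketch of what consistent protection could look like, every flag update can go through one helper that takes the lock. Everything here is an assumption made for illustration: a pthread mutex stands in for the ll_lock spinlock, and struct sb_info, sb_update_flag() and the SBI_* values are hypothetical names, not the llite definitions.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical flag values, not the real LL_SBI_* constants. */
#define SBI_FAST_READ	0x1u
#define SBI_CHECKSUM	0x2u

/* Hypothetical stand-in for the superblock info and its lock. */
struct sb_info {
	pthread_mutex_t	lock;	/* plays the role of ll_lock */
	unsigned int	flags;	/* plays the role of ll_flags */
};

/*
 * Set or clear one flag under the lock, so that two concurrent
 * read-modify-write updates cannot lose each other's change.
 */
static void sb_update_flag(struct sb_info *sbi, unsigned int flag, bool on)
{
	assert(flag != 0);
	pthread_mutex_lock(&sbi->lock);
	if (on)
		sbi->flags |= flag;
	else
		sbi->flags &= ~flag;
	pthread_mutex_unlock(&sbi->lock);
}
```

If every store handler called a helper like this, the locking rule would live in one place instead of being repeated (or forgotten) in each handler.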


Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [lustre-devel] [PATCH 18/18] lustre: update version to 2.8.99
  2018-07-02 23:24 ` [lustre-devel] [PATCH 18/18] lustre: update version to 2.8.99 James Simmons
@ 2018-07-03 21:46   ` Andreas Dilger
  2018-07-05 22:53     ` James Simmons
  0 siblings, 1 reply; 41+ messages in thread
From: Andreas Dilger @ 2018-07-03 21:46 UTC (permalink / raw)
  To: lustre-devel

James, thanks for pushing up these patches.

Why not use 2.9.0? Are there still patches missing, or do you want this version to distinguish it from the actual 2.9.0 release?

Cheers, Andreas

> On Jul 2, 2018, at 17:36, James Simmons <jsimmons@infradead.org> wrote:
> 
> With the missing patches from the lustre 2.9 release merged
> upstream its time to update the upstream client's version.
> 
> Signed-off-by: James Simmons <jsimmons@infradead.org>
> ---
> drivers/staging/lustre/include/uapi/linux/lustre/lustre_ver.h | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/staging/lustre/include/uapi/linux/lustre/lustre_ver.h b/drivers/staging/lustre/include/uapi/linux/lustre/lustre_ver.h
> index 19c9135..1428fdd 100644
> --- a/drivers/staging/lustre/include/uapi/linux/lustre/lustre_ver.h
> +++ b/drivers/staging/lustre/include/uapi/linux/lustre/lustre_ver.h
> @@ -2,10 +2,10 @@
> #define _LUSTRE_VER_H_
> 
> #define LUSTRE_MAJOR 2
> -#define LUSTRE_MINOR 6
> +#define LUSTRE_MINOR 8
> #define LUSTRE_PATCH 99
> #define LUSTRE_FIX 0
> -#define LUSTRE_VERSION_STRING "2.6.99"
> +#define LUSTRE_VERSION_STRING "2.8.99"
> 
> #define OBD_OCD_VERSION(major, minor, patch, fix)            \
>    (((major) << 24) + ((minor) << 16) + ((patch) << 8) + (fix))
> -- 
> 1.8.3.1
> 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [lustre-devel] [PATCH 01/18] lustre: grant: add support for OBD_CONNECT_GRANT_PARAM
  2018-07-03 12:43     ` Patrick Farrell
@ 2018-07-03 21:59       ` NeilBrown
  0 siblings, 0 replies; 41+ messages in thread
From: NeilBrown @ 2018-07-03 21:59 UTC (permalink / raw)
  To: lustre-devel

On Tue, Jul 03 2018, Patrick Farrell wrote:

> Neil,
>
> Compute vs consume: Honestly either would work.  The client consumes grant when it writes data, this fix to the calculation allows it to consume more accurately. *shrugs*

Good - thanks.
I would have said "this fix to the calculation allows it to track what it
has consumed more accurately", but it is a small point.  I just wanted to
be sure I understood, and now I'm sure I do.

Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [lustre-devel] [PATCH 05/18] lustre: llite: fast read implementation
  2018-07-03 12:46     ` Patrick Farrell
@ 2018-07-03 22:03       ` NeilBrown
  0 siblings, 0 replies; 41+ messages in thread
From: NeilBrown @ 2018-07-03 22:03 UTC (permalink / raw)
  To: lustre-devel

On Tue, Jul 03 2018, Patrick Farrell wrote:

> They do.  It's probably gone unnoticed because those updates are mostly directly triggered by users or one time only during initialization, so there's little risk of colliding.

Agreed.  Had there been no locking in this new patch I possibly wouldn't
have noticed.  But I don't like inconsistencies.

Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [lustre-devel] [PATCH 02/18] lustre: ladvise: Add feature of giving file access advices
  2018-07-03  3:32   ` NeilBrown
@ 2018-07-05 22:01     ` James Simmons
  0 siblings, 0 replies; 41+ messages in thread
From: James Simmons @ 2018-07-05 22:01 UTC (permalink / raw)
  To: lustre-devel


> > +	case LL_IOC_LADVISE: {
> > +		struct ladvise_hdr *ladvise_hdr;
> > +		int alloc_size = sizeof(*ladvise_hdr);
> > +		int num_advise;
> > +		int i;
> > +
> > +		rc = 0;
> > +		ladvise_hdr = kzalloc(alloc_size, GFP_NOFS);
> 
> Lustre really needs to get more sensible about when to use GFP_NOFS and
> when to use GFP_KERNEL.
> NOFS is only needed when the allocation cannot wait while reclaim enters
> the filesystem, due to possible deadlock.  So if you are holding a lock
> or some other resource that writeout might need, use NOFS.
> In this case, there is no possible conflict with writeout, so GFP_KERNEL
> should be used.
> 
> In this case, alloc_size is 32, so I wonder why it is allocating memory
> at all rather than just using the stack.
> 
> Below the memory is freed and allocated again (why not realloc?) so an
> allocation is justified there.  That should be GFP_KERNEL as well.
> 
> Later in osc_io_ladvise_start() there is a kvzalloc() with GFP_NOFS.
> I suspect that should be GFP_KERNEL as well, especially as vmalloc
> doesn't properly support GFP_NOFS and never has.

This is a result of porting from the OpenSFS tree. The OpenSFS branch
still does all memory allocations using its OBD_ALLOC* wrappers. Unfolding
those wrappers, you will see they are mostly kmallocs done with the
GFP_NOFS flag. This is the same reason lustre avoids using kasprintf()
and related functions as well. Those wrappers were developed to locate
memory leaks, long before KASAN and trace events were created. Looking at
__OBD_VMALLOC_VERBOSE() you will see it's using GFP_NOFS, which appears
not even to be supported.
Oh fun!!!

Since we are on this topic, it might be a good idea to discuss getting onto
0day and working to get KASAN going with the linux lustre client branch. The
directions are not very clear on how to set it up. Have you ever worked
with 0day? As for KASAN, that can be done locally, but sadly you need
at least gcc 4.9.2, which I don't have in my setup. I did notice a lustre
instance exists on 0day, but it doesn't look alive in any way.

As a side note, I did apply for lustre-devel to be included on
patchworks.linux.org. They are currently in the process of upgrading to
a newer software stack, so it will be another week or so before it is visible.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [lustre-devel] [PATCH 16/18] lustre: libcfs: restore original behavior in cfs_str2num_check
  2018-07-03  4:45   ` NeilBrown
@ 2018-07-05 22:50     ` James Simmons
  0 siblings, 0 replies; 41+ messages in thread
From: James Simmons @ 2018-07-05 22:50 UTC (permalink / raw)
  To: lustre-devel


> On Mon, Jul 02 2018, James Simmons wrote:
> 
> > When cfs_str2num_check() moved from simple_strtoul to kstrtoul
> > some of the functionality got lost. Restore handling hexidecimal
> > number as well as '+' and '-'. Also handle any trailing spaces.
> >
> > Signed-off-by: James Simmons <uja.ornl@yahoo.com>
> > WC-bug-id: https://jira.whamcloud.com/browse/LU-9325
> > Reviewed-on: https://review.whamcloud.com/32217
> > Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
> > Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
> > Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
> > Reviewed-by: Oleg Drokin <green@whamcloud.com>
> > Signed-off-by: James Simmons <jsimmons@infradead.org>
> > ---
> >  drivers/staging/lustre/lnet/libcfs/libcfs_string.c | 17 ++++++++++++++---
> >  1 file changed, 14 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/staging/lustre/lnet/libcfs/libcfs_string.c b/drivers/staging/lustre/lnet/libcfs/libcfs_string.c
> > index e1fb126..e390b0b 100644
> > --- a/drivers/staging/lustre/lnet/libcfs/libcfs_string.c
> > +++ b/drivers/staging/lustre/lnet/libcfs/libcfs_string.c
> > @@ -214,8 +214,10 @@ char *cfs_firststr(char *str, size_t size)
> >  {
> >  	bool all_numbers = true;
> >  	char *endp, cache;
> > +	int len;
> >  	int rc;
> >  
> > +	endp = strim(str);
> 
> This looks bad.  The function now changes the string buffer, where it
> didn't before.

Originally this code used simple_strtoul(), which always parsed the string
for numbers and succeeded even if non-digits existed in the string. In fact
you could get a pointer to the first non-digit. That was also used to trim
the string. This behavior is very different from kstrtoul. I guess I will
need to duplicate the string and free it at the end.
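
To make the idea concrete, here is a userspace sketch only, not the kernel code. It uses the C library's strtoul() rather than kstrtoul(), and the helper name str2num_nomodify() and the 64-byte scratch buffer are assumptions made for illustration. The caller's buffer is never written; trailing whitespace is skipped by shortening the copied length.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: parse at most nob bytes of str as a number
 * (decimal, or hex with a 0x prefix, optional '+' or '-') while
 * ignoring trailing whitespace. Returns 1 on success, 0 otherwise.
 * The input buffer is left untouched.
 */
static int str2num_nomodify(const char *str, size_t nob, unsigned long *num)
{
	char buf[64];	/* scratch copy; long inputs are rejected */
	char *endp;
	size_t len = 0;

	assert(str != NULL && num != NULL);

	/* Bound the length by nob and by the first NUL byte. */
	while (len < nob && str[len] != '\0')
		len++;
	/* Skip trailing whitespace instead of overwriting it. */
	while (len > 0 && isspace((unsigned char)str[len - 1]))
		len--;
	if (len == 0 || len >= sizeof(buf))
		return 0;

	memcpy(buf, str, len);
	buf[len] = '\0';

	errno = 0;
	*num = strtoul(buf, &endp, 0);	/* base 0 accepts a 0x prefix */
	/* Succeed only if the whole trimmed string was consumed. */
	return errno == 0 && endp != buf && *endp == '\0';
}
```

Copying into a small on-stack buffer avoids both the allocation and the in-place modification, at the price of rejecting inputs longer than the scratch buffer.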
 
> >  	/**
> >  	 * kstrouint can only handle strings composed
> >  	 * of only numbers. We need to scan the string
> > @@ -228,16 +230,25 @@ char *cfs_firststr(char *str, size_t size)
> >  	 * After we are done the character at the
> >  	 * position we placed '\0' must be restored.
> >  	 */
> > -	for (endp = str; endp < str + nob; endp++) {
> > -		if (!isdigit(*endp)) {
> > +	len = min((int)strlen(endp), nob);
> 
> As endp might be different from 'str, 'nob' is not meaningful in the
> context of endp.

I believe nob is supposed to be the string length of "str" itself. We
could err on the side of caution and make that

len = min(strlen(endp), strlen(str));
  
> I'm guessing that mainline commit
> 
> Commit: c948390f10cc ("staging: lustre: llite: fix inconsistencies of root squash feature")
> 
> is related to this.

Actually it is commit ae0246da1603be7e7374621741515c2b6f2d6332 which ended 
up breaking my setup. I later debugged it and pushed the patch:

3ad6152d766039cb8ffd8633d971fb79402e5464

Later I pushed that patch to the OpenSFS branch and Hammond pointed out 
that some supported behaviors were lost. You can see this at:

https://review.whamcloud.com/#/c/32217/
 
> I cannot apply this patch as-is.  It is broken.  Possibly changing
>    endp = strim(str);
> to
>    endp = str;
> 
> will fix it.

I do see that the trim thing was removed in the root squash handling.
We might have reintroduced a bug by mistake.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [lustre-devel] [PATCH 18/18] lustre: update version to 2.8.99
  2018-07-03 21:46   ` Andreas Dilger
@ 2018-07-05 22:53     ` James Simmons
  0 siblings, 0 replies; 41+ messages in thread
From: James Simmons @ 2018-07-05 22:53 UTC (permalink / raw)
  To: lustre-devel


> James, thanks for pushing up these patches.
> 
> Why not use 2.9.0? Are there still patches missing, or do you want this
> version to distinguish it from the actual 2.9.0 release?

Oleg asked that I use a version number never used by an official release.
Should we can that policy? Anyway, there might be stuff missing still.
Once we reach the 2.10 LTS release we can do a scan of the code to see
if this is the case.
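
As a worked example of how the version choice plays out numerically, the sketch below reuses the OBD_OCD_VERSION() macro from the quoted header. The wrapper version_code() is a hypothetical name added only so the field ranges can be checked.

```c
#include <assert.h>
#include <stdint.h>

/* Same encoding as in lustre_ver.h: one byte per version component. */
#define OBD_OCD_VERSION(major, minor, patch, fix)            \
	(((major) << 24) + ((minor) << 16) + ((patch) << 8) + (fix))

/* Hypothetical wrapper: pack a version, checking each field fits a byte. */
static uint32_t version_code(unsigned int major, unsigned int minor,
			     unsigned int patch, unsigned int fix)
{
	assert(major < 256 && minor < 256 && patch < 256 && fix < 256);
	return OBD_OCD_VERSION(major, minor, patch, fix);
}
```

With this encoding 2.8.99 packs to 0x02086300 and 2.9.0 to 0x02090000, so 2.8.99 sorts just below the official 2.9.0 release while remaining distinct from it.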
 
> Cheers, Andreas
> 
> > On Jul 2, 2018, at 17:36, James Simmons <jsimmons@infradead.org> wrote:
> > 
> > With the missing patches from the lustre 2.9 release merged
> > upstream its time to update the upstream client's version.
> > 
> > Signed-off-by: James Simmons <jsimmons@infradead.org>
> > ---
> > drivers/staging/lustre/include/uapi/linux/lustre/lustre_ver.h | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/staging/lustre/include/uapi/linux/lustre/lustre_ver.h b/drivers/staging/lustre/include/uapi/linux/lustre/lustre_ver.h
> > index 19c9135..1428fdd 100644
> > --- a/drivers/staging/lustre/include/uapi/linux/lustre/lustre_ver.h
> > +++ b/drivers/staging/lustre/include/uapi/linux/lustre/lustre_ver.h
> > @@ -2,10 +2,10 @@
> > #define _LUSTRE_VER_H_
> > 
> > #define LUSTRE_MAJOR 2
> > -#define LUSTRE_MINOR 6
> > +#define LUSTRE_MINOR 8
> > #define LUSTRE_PATCH 99
> > #define LUSTRE_FIX 0
> > -#define LUSTRE_VERSION_STRING "2.6.99"
> > +#define LUSTRE_VERSION_STRING "2.8.99"
> > 
> > #define OBD_OCD_VERSION(major, minor, patch, fix)            \
> >    (((major) << 24) + ((minor) << 16) + ((patch) << 8) + (fix))
> > -- 
> > 1.8.3.1
> > 
> 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [lustre-devel] [PATCH 01/18] lustre: grant: add support for OBD_CONNECT_GRANT_PARAM
  2018-07-02 23:24 ` [lustre-devel] [PATCH 01/18] lustre: grant: add support for OBD_CONNECT_GRANT_PARAM James Simmons
  2018-07-03  3:04   ` NeilBrown
@ 2018-07-06  4:02   ` Cory Spitz
  2018-07-06 14:28     ` James Simmons
  2018-07-06 14:29     ` James Simmons
  1 sibling, 2 replies; 41+ messages in thread
From: Cory Spitz @ 2018-07-06  4:02 UTC (permalink / raw)
  To: lustre-devel

James,

On 7/2/18, 6:25 PM, "lustre-devel on behalf of James Simmons" <lustre-devel-bounces at lists.lustre.org on behalf of jsimmons@infradead.org> wrote:

    From: Johann Lombardi <jlombardi@whamcloud.com>
    Signed-off-by: Johann Lombardi <jlombardi@whamcloud.com>
    Signed-off-by: Jinshan Xiong <jinshan.xiong@gmail.com>
    WC-bug-id: https://jira.whamcloud.com/browse/2049
    Reviewed-on: http://review.whamcloud.com/7793
    Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
    Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
 
James, can you fix this up?  I'm not sure how historical patches should be attributed, but Johann's address seems wrong.

From should be attributed to Johann Lombardi <johann.lombardi@intel.com> .  Also, the Bug URL should be https://jira.whamcloud.com/browse/LU-2049 .

Thanks,
-Cory

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [lustre-devel] [PATCH 01/18] lustre: grant: add support for OBD_CONNECT_GRANT_PARAM
  2018-07-06  4:02   ` Cory Spitz
@ 2018-07-06 14:28     ` James Simmons
  2018-07-06 14:29     ` James Simmons
  1 sibling, 0 replies; 41+ messages in thread
From: James Simmons @ 2018-07-06 14:28 UTC (permalink / raw)
  To: lustre-devel


> On 7/2/18, 6:25 PM, "lustre-devel on behalf of James Simmons" <lustre-devel-bounces at lists.lustre.org on behalf of jsimmons@infradead.org> wrote:
> 
>     From: Johann Lombardi <jlombardi@whamcloud.com>
>     Signed-off-by: Johann Lombardi <jlombardi@whamcloud.com>
>     Signed-off-by: Jinshan Xiong <jinshan.xiong@gmail.com>
>     WC-bug-id: https://jira.whamcloud.com/browse/2049
>     Reviewed-on: http://review.whamcloud.com/7793
>     Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
>     Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
>  
> James, can you fix this up?  I'm not sure how historical patches should be attributed, but Johann's address seems
> wrong.
> 
> From should be attributed to Johann Lombardi <johann.lombardi@intel.com> .  Also, the Bug URL should be
> https://jira.whamcloud.com/browse/LU-2049 .

This is a tricky one. I updated the email address to avoid
bouncing emails. Also, this allows the original author the chance to
still reply. Does that seem reasonable? For some people it is still
unclear to me what their email address is. Ah, you are right, I
dropped the s. It should be https://jira.whamcloud.com/....

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [lustre-devel] [PATCH 01/18] lustre: grant: add support for OBD_CONNECT_GRANT_PARAM
  2018-07-06  4:02   ` Cory Spitz
  2018-07-06 14:28     ` James Simmons
@ 2018-07-06 14:29     ` James Simmons
  2018-07-06 15:12       ` Andreas Dilger
  2018-07-23  0:36       ` NeilBrown
  1 sibling, 2 replies; 41+ messages in thread
From: James Simmons @ 2018-07-06 14:29 UTC (permalink / raw)
  To: lustre-devel


> On 7/2/18, 6:25 PM, "lustre-devel on behalf of James Simmons" <lustre-devel-bounces at lists.lustre.org on behalf of jsimmons@infradead.org> wrote:
> 
>     From: Johann Lombardi <jlombardi@whamcloud.com>
>     Signed-off-by: Johann Lombardi <jlombardi@whamcloud.com>
>     Signed-off-by: Jinshan Xiong <jinshan.xiong@gmail.com>
>     WC-bug-id: https://jira.whamcloud.com/browse/2049
>     Reviewed-on: http://review.whamcloud.com/7793
>     Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
>     Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
>  
> James, can you fix this up?  I'm not sure how historical patches should be attributed, but Johann's address seems
> wrong.
> 
> From should be attributed to Johann Lombardi <johann.lombardi@intel.com> .  Also, the Bug URL should be
> https://jira.whamcloud.com/browse/LU-2049 .

This is a tricky one. I updated the email address to avoid
bouncing emails. Also, this allows the original author the chance to
still reply. Does that seem reasonable? For some people it is still
unclear to me what their email address is. Ah, you are right, I
dropped the s. It should be https://jira.whamcloud.com/....

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [lustre-devel] [PATCH 01/18] lustre: grant: add support for OBD_CONNECT_GRANT_PARAM
  2018-07-06 14:29     ` James Simmons
@ 2018-07-06 15:12       ` Andreas Dilger
  2018-07-06 15:50         ` James Simmons
  2018-07-23  0:36       ` NeilBrown
  1 sibling, 1 reply; 41+ messages in thread
From: Andreas Dilger @ 2018-07-06 15:12 UTC (permalink / raw)
  To: lustre-devel

And Johann is still at Intel...

Cheers, Andreas

> On Jul 6, 2018, at 08:30, James Simmons <jsimmons@infradead.org> wrote:
> 
> 
>> On 7/2/18, 6:25 PM, "lustre-devel on behalf of James Simmons" <lustre-devel-bounces at lists.lustre.org on behalf of jsimmons@infradead.org> wrote:
>> 
>>    From: Johann Lombardi <jlombardi@whamcloud.com>
>>    Signed-off-by: Johann Lombardi <jlombardi@whamcloud.com>
>>    Signed-off-by: Jinshan Xiong <jinshan.xiong@gmail.com>
>>    WC-bug-id: https://jira.whamcloud.com/browse/2049
>>    Reviewed-on: http://review.whamcloud.com/7793
>>    Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
>>    Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
>> 
>> James, can you fix this up?  I'm not sure how historical patches should be attributed, but Johann's address seems
>> wrong.
>> 
>> From should be attributed to Johann Lombardi <johann.lombardi@intel.com> .  Also, the Bug URL should be
>> https://jira.whamcloud.com/browse/LU-2049 .
> 
> This is a tricky one. I updated the email address to avoid
> bouncing emails. Also, this allows the original author the chance to
> still reply. Does that seem reasonable? For some people it is still
> unclear to me what their email address is. Ah, you are right, I
> dropped the s. It should be https://jira.whamcloud.com/....

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [lustre-devel] [PATCH 01/18] lustre: grant: add support for OBD_CONNECT_GRANT_PARAM
  2018-07-06 15:12       ` Andreas Dilger
@ 2018-07-06 15:50         ` James Simmons
  0 siblings, 0 replies; 41+ messages in thread
From: James Simmons @ 2018-07-06 15:50 UTC (permalink / raw)
  To: lustre-devel


> And Johann is still at Intel...

It's hard to tell. Is this true for WangDi and Fan Yong as well?

> > On Jul 6, 2018, at 08:30, James Simmons <jsimmons@infradead.org> wrote:
> > 
> > 
> >> On 7/2/18, 6:25 PM, "lustre-devel on behalf of James Simmons" <lustre-devel-bounces at lists.lustre.org on behalf of jsimmons@infradead.org> wrote:
> >> 
> >>    From: Johann Lombardi <jlombardi@whamcloud.com>
> >>    Signed-off-by: Johann Lombardi <jlombardi@whamcloud.com>
> >>    Signed-off-by: Jinshan Xiong <jinshan.xiong@gmail.com>
> >>    WC-bug-id: https://jira.whamcloud.com/browse/2049
> >>    Reviewed-on: http://review.whamcloud.com/7793
> >>    Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
> >>    Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
> >> 
> >> James, can you fix this up?  I'm not sure how historical patches should be attributed, but Johann's address seems
> >> wrong.
> >> 
> >> From should be attributed to Johann Lombardi <johann.lombardi@intel.com> .  Also, the Bug URL should be
> >> https://jira.whamcloud.com/browse/LU-2049 .
> > 
> > This is a tricky one. I updated the email address to avoid
> > bouncing emails. Also, this allows the original author the chance to
> > still reply. Does that seem reasonable? For some people it is still
> > unclear to me what their email address is. Ah, you are right, I
> > dropped the s. It should be https://jira.whamcloud.com/....
> 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [lustre-devel] [PATCH 01/18] lustre: grant: add support for OBD_CONNECT_GRANT_PARAM
  2018-07-06 14:29     ` James Simmons
  2018-07-06 15:12       ` Andreas Dilger
@ 2018-07-23  0:36       ` NeilBrown
  1 sibling, 0 replies; 41+ messages in thread
From: NeilBrown @ 2018-07-23  0:36 UTC (permalink / raw)
  To: lustre-devel

On Fri, Jul 06 2018, James Simmons wrote:

>> On 7/2/18, 6:25 PM, "lustre-devel on behalf of James Simmons" <lustre-devel-bounces at lists.lustre.org on behalf of jsimmons@infradead.org> wrote:
>> 
>>     From: Johann Lombardi <jlombardi@whamcloud.com>
>>     Signed-off-by: Johann Lombardi <jlombardi@whamcloud.com>
>>     Signed-off-by: Jinshan Xiong <jinshan.xiong@gmail.com>
>>     WC-bug-id: https://jira.whamcloud.com/browse/2049
>>     Reviewed-on: http://review.whamcloud.com/7793
>>     Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
>>     Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
>>  
>> James, can you fix this up?  I'm not sure how historical patches should be attributed, but Johann's address seems
>> wrong.
>> 
>> From should be attributed to Johann Lombardi <johann.lombardi@intel.com> .  Also, the Bug URL should be
>> https://jira.whamcloud.com/browse/LU-2049 .
>
> This is a tricky one. I updated the email address to avoid
> bouncing emails.

I would recommend never trying to update email addresses.  The kernel
history is full of addresses that are no longer valid.
Discussion of patches should happen on lists, and interested parties
should subscribe to the lists.  If someone particularly wants to see
emails sent to some old address of theirs it should be easy enough to
add some filtering.  If you particularly want to contact the author of
some old patch, it is unlikely to take too much searching to find a valid
address (just ask on the email list - someone is bound to know).

Bounces are a little bit annoying, but so is spam - we just learn ways
to deal with it.

For this particular patch I've restored the addresses from the original
patch, even though I know some of them are no longer valid.  They were
valid when the review was done, and they effectively give credit to the
company which paid for the review.  I won't bother with the rest though.

Thanks,
NeilBrown


> Also, this allows the original author the chance to
> still reply. Does that seem reasonable? For some people it is still
> unclear to me what their email address is. Ah, you are right, I
> dropped the s. It should be https://jira.whamcloud.com/....

^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2018-07-23  0:36 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-02 23:24 [lustre-devel] [PATCH 00/18] lustre: missing updates from lustre 2.9 James Simmons
2018-07-02 23:24 ` [lustre-devel] [PATCH 01/18] lustre: grant: add support for OBD_CONNECT_GRANT_PARAM James Simmons
2018-07-03  3:04   ` NeilBrown
2018-07-03 12:43     ` Patrick Farrell
2018-07-03 21:59       ` NeilBrown
2018-07-06  4:02   ` Cory Spitz
2018-07-06 14:28     ` James Simmons
2018-07-06 14:29     ` James Simmons
2018-07-06 15:12       ` Andreas Dilger
2018-07-06 15:50         ` James Simmons
2018-07-23  0:36       ` NeilBrown
2018-07-02 23:24 ` [lustre-devel] [PATCH 02/18] lustre: ladvise: Add feature of giving file access advices James Simmons
2018-07-03  3:32   ` NeilBrown
2018-07-05 22:01     ` James Simmons
2018-07-02 23:24 ` [lustre-devel] [PATCH 03/18] lustre: fileset: add fileset mount support James Simmons
2018-07-03  3:52   ` NeilBrown
2018-07-02 23:24 ` [lustre-devel] [PATCH 04/18] lustre: obd: rename md_getstatus() to md_get_root() James Simmons
2018-07-02 23:24 ` [lustre-devel] [PATCH 05/18] lustre: llite: fast read implementation James Simmons
2018-07-03  4:10   ` NeilBrown
2018-07-03 12:46     ` Patrick Farrell
2018-07-03 22:03       ` NeilBrown
2018-07-02 23:24 ` [lustre-devel] [PATCH 06/18] lustre: obd: reserve connection flag OBD_CONNECT2_FILE_SECCTX James Simmons
2018-07-03  4:20   ` NeilBrown
2018-07-02 23:24 ` [lustre-devel] [PATCH 07/18] lustre: llite: restore fd_och when putting lease James Simmons
2018-07-02 23:24 ` [lustre-devel] [PATCH 08/18] lustre: security: send file security context for creates James Simmons
2018-07-03  4:26   ` NeilBrown
2018-07-02 23:24 ` [lustre-devel] [PATCH 09/18] lustre: llite: ladvise protocol changes James Simmons
2018-07-02 23:24 ` [lustre-devel] [PATCH 10/18] lustre: ladvise: Add willread advice support for ladvise James Simmons
2018-07-02 23:24 ` [lustre-devel] [PATCH 11/18] lustre: osc: max_pages_per_rpc should be chunk size aligned James Simmons
2018-07-02 23:24 ` [lustre-devel] [PATCH 12/18] lustre: ladvise: Add dontneed advice support for ladvise James Simmons
2018-07-02 23:24 ` [lustre-devel] [PATCH 13/18] lustre: ptlrpc: properly set "rq_xid" for 4MB IO James Simmons
2018-07-02 23:24 ` [lustre-devel] [PATCH 14/18] lustre: obd: add callback for llog_cat_process_or_fork James Simmons
2018-07-02 23:24 ` [lustre-devel] [PATCH 15/18] lustre: mount: fix lmd_parse() to handle new delimiters James Simmons
2018-07-02 23:24 ` [lustre-devel] [PATCH 16/18] lustre: libcfs: restore original behavior in cfs_str2num_check James Simmons
2018-07-03  4:45   ` NeilBrown
2018-07-05 22:50     ` James Simmons
2018-07-02 23:24 ` [lustre-devel] [PATCH 17/18] lustre: ldlm: reduce mem footprint of ldlm_resource James Simmons
2018-07-02 23:24 ` [lustre-devel] [PATCH 18/18] lustre: update version to 2.8.99 James Simmons
2018-07-03 21:46   ` Andreas Dilger
2018-07-05 22:53     ` James Simmons
2018-07-03  4:57 ` [lustre-devel] [PATCH 00/18] lustre: missing updates from lustre 2.9 NeilBrown
