* [lustre-devel] [PATCH 00/14] Update to OpenSFS tree as of May 3, 2021
@ 2021-05-04  0:10 James Simmons
  2021-05-04  0:10 ` [lustre-devel] [PATCH 01/14] lustre: llite: Remove last lockahead old compat James Simmons
                   ` (13 more replies)
  0 siblings, 14 replies; 15+ messages in thread
From: James Simmons @ 2021-05-04  0:10 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

Port the latest work from the OpenSFS tree to the native
Linux client as of May 3, 2021. This now includes the
fake symlink DOAS work.

Alex Zhuravlev (1):
  lustre: mdc: make rpc set for MDS_STATFS interruptible

Alexander Boyko (1):
  lustre: ptlrpc: idle import vs lock enqueue race

Andreas Dilger (1):
  lustre: mdc: include linux/idr.h for referenced code

Bruno Faccini (1):
  lustre: llite: fake symlink type of foreign file/dir

Chris Horn (5):
  lnet: Recover local NI w/exponential backoff interval
  lnet: Deprecate lnet_recovery_interval
  lnet: Router ping timeout with discovery disabled
  lnet: Ensure proper peer, peer NI, peer net hierarchy
  lnet: Skip discovery in LNetPrimaryNID if DD disabled

Mr NeilBrown (4):
  lnet: libcfs: simplify task management in tracefile.c
  lustre: move lu_tgt_pool out of obd_target.h
  lnet: libcfs: remove references to Sun Trademark.
  lustre: llite: use d_is_symlink to test if dentry is a symlink

Patrick Farrell (1):
  lustre: llite: Remove last lockahead old compat

 fs/lustre/include/lu_object.h           |  18 +-
 fs/lustre/include/lustre_net.h          |   1 +
 fs/lustre/include/obd_class.h           |   2 +
 fs/lustre/include/obd_support.h         |   1 +
 fs/lustre/include/obd_target.h          |  53 ---
 fs/lustre/llite/Makefile                |   1 +
 fs/lustre/llite/dcache.c                |  14 +-
 fs/lustre/llite/dir.c                   |  11 +
 fs/lustre/llite/file.c                  |  53 ++-
 fs/lustre/llite/foreign_symlink.h       |  49 +++
 fs/lustre/llite/llite_foreign.c         | 284 ++++++++++++
 fs/lustre/llite/llite_foreign_symlink.c | 758 ++++++++++++++++++++++++++++++++
 fs/lustre/llite/llite_internal.h        |  35 +-
 fs/lustre/llite/llite_lib.c             | 105 +++++
 fs/lustre/llite/lproc_llite.c           |  12 +
 fs/lustre/llite/namei.c                 |  33 +-
 fs/lustre/llite/pcc.c                   |   4 +-
 fs/lustre/llite/symlink.c               |   1 +
 fs/lustre/lov/lov_object.c              |   3 +-
 fs/lustre/lov/lov_pack.c                |  13 +-
 fs/lustre/mdc/mdc_changelog.c           |   1 +
 fs/lustre/mdc/mdc_request.c             |   1 +
 fs/lustre/obdclass/lu_tgt_pool.c        |   2 +-
 fs/lustre/obdclass/obd_mount.c          |   4 +-
 fs/lustre/osc/osc_lock.c                |   5 +
 fs/lustre/ptlrpc/client.c               |  11 +
 fs/lustre/ptlrpc/import.c               |  63 ++-
 fs/lustre/ptlrpc/pinger.c               |   5 +-
 include/linux/libcfs/libcfs.h           |   1 -
 include/linux/libcfs/libcfs_cpu.h       |   1 -
 include/linux/libcfs/libcfs_debug.h     |   1 -
 include/linux/libcfs/libcfs_hash.h      |   1 -
 include/linux/libcfs/libcfs_private.h   |   1 -
 include/linux/libcfs/libcfs_string.h    |   1 -
 include/linux/lnet/lib-lnet.h           |   9 +
 include/linux/lnet/lib-types.h          |   7 +
 include/uapi/linux/lustre/lustre_user.h |  49 ++-
 net/lnet/libcfs/debug.c                 |   1 -
 net/lnet/libcfs/hash.c                  |   1 -
 net/lnet/libcfs/libcfs_cpu.c            |   1 -
 net/lnet/libcfs/libcfs_lock.c           |   1 -
 net/lnet/libcfs/libcfs_mem.c            |   1 -
 net/lnet/libcfs/libcfs_string.c         |   1 -
 net/lnet/libcfs/module.c                |   1 -
 net/lnet/libcfs/tracefile.c             |  83 ++--
 net/lnet/libcfs/tracefile.h             |   1 -
 net/lnet/lnet/api-ni.c                  |  26 +-
 net/lnet/lnet/lib-move.c                |  60 +--
 net/lnet/lnet/lib-msg.c                 |  48 +-
 net/lnet/lnet/peer.c                    |  55 ++-
 net/lnet/lnet/router.c                  |   8 +-
 51 files changed, 1640 insertions(+), 262 deletions(-)
 delete mode 100644 fs/lustre/include/obd_target.h
 create mode 100644 fs/lustre/llite/foreign_symlink.h
 create mode 100644 fs/lustre/llite/llite_foreign.c
 create mode 100644 fs/lustre/llite/llite_foreign_symlink.c

-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org


* [lustre-devel] [PATCH 01/14] lustre: llite: Remove last lockahead old compat
  2021-05-04  0:10 [lustre-devel] [PATCH 00/14] Update to OpenSFS tree as of May 3, 2021 James Simmons
@ 2021-05-04  0:10 ` James Simmons
  2021-05-04  0:10 ` [lustre-devel] [PATCH 02/14] lustre: mdc: include linux/idr.h for referenced code James Simmons
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: James Simmons @ 2021-05-04  0:10 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Patrick Farrell, Lustre Development List

From: Patrick Farrell <farr0186@gmail.com>

The CEF_NONBLOCK flag in cld_enq_flags is required for the
old Cray-only server release of lockahead.  In the more
recent versions, the required nonblocking behavior (on the
server side) is associated with LDLM_FL_SPECULATIVE, not
with the combination of LDLM_FL_NO_EXPAND and
LDLM_FL_BLOCK_NOWAIT (as was done in the old
implementation).

Now we control 'speculative' or not with the async flag
from userspace.

Now that we've removed OBD_CONNECT_LOCKAHEAD_OLD support
from the client, we should be good to go there.

The existing testing explores both sync and async requests,
and should be enough to verify this is OK.

WC-bug-id: https://jira.whamcloud.com/browse/LU-6179
Lustre-commit: 12a0c7b5944d9e48 ("LU-6179 llite: Remove last lockahead old compat")
Signed-off-by: Patrick Farrell <farr0186@gmail.com>
Reviewed-on: https://review.whamcloud.com/38179
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/file.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c
index 1561af1..346e31c 100644
--- a/fs/lustre/llite/file.c
+++ b/fs/lustre/llite/file.c
@@ -3033,8 +3033,7 @@ int ll_file_lock_ahead(struct file *file, struct llapi_lu_ladvise *ladvise)
 		/* CEF_MUST is used because we do not want to convert a
 		 * lockahead request to a lockless lock
 		 */
-		descr->cld_enq_flags = CEF_MUST | CEF_LOCK_NO_EXPAND |
-				       CEF_NONBLOCK;
+		descr->cld_enq_flags = CEF_MUST | CEF_LOCK_NO_EXPAND;
 
 		if (ladvise->lla_peradvice_flags & LF_ASYNC)
 			descr->cld_enq_flags |= CEF_SPECULATIVE;
-- 
1.8.3.1


* [lustre-devel] [PATCH 02/14] lustre: mdc: include linux/idr.h for referenced code
  2021-05-04  0:10 [lustre-devel] [PATCH 00/14] Update to OpenSFS tree as of May 3, 2021 James Simmons
  2021-05-04  0:10 ` [lustre-devel] [PATCH 01/14] lustre: llite: Remove last lockahead old compat James Simmons
@ 2021-05-04  0:10 ` James Simmons
  2021-05-04  0:10 ` [lustre-devel] [PATCH 03/14] lnet: Recover local NI w/exponential backoff interval James Simmons
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: James Simmons @ 2021-05-04  0:10 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Andreas Dilger <adilger@whamcloud.com>

Include the <linux/idr.h> header in files that reference IDR
functionality.  Don't depend on its indirect inclusion elsewhere.

Fixes: dcedf3009a71 ("lustre: changelog: support large number of MDT")
WC-bug-id: https://jira.whamcloud.com/browse/LU-6142
Lustre-commit: 3589a3141a4b9f94 ("LU-6142 mdc: include linux/idr.h for referenced code")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/43346
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/mdc/mdc_changelog.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/lustre/mdc/mdc_changelog.c b/fs/lustre/mdc/mdc_changelog.c
index ef6d4f9..31c6c8a 100644
--- a/fs/lustre/mdc/mdc_changelog.c
+++ b/fs/lustre/mdc/mdc_changelog.c
@@ -36,6 +36,7 @@
 #include <linux/poll.h>
 #include <linux/device.h>
 #include <linux/cdev.h>
+#include <linux/idr.h>
 
 #include <lustre_log.h>
 #include <uapi/linux/lustre/lustre_ioctl.h>
-- 
1.8.3.1


* [lustre-devel] [PATCH 03/14] lnet: Recover local NI w/exponential backoff interval
  2021-05-04  0:10 [lustre-devel] [PATCH 00/14] Update to OpenSFS tree as of May 3, 2021 James Simmons
  2021-05-04  0:10 ` [lustre-devel] [PATCH 01/14] lustre: llite: Remove last lockahead old compat James Simmons
  2021-05-04  0:10 ` [lustre-devel] [PATCH 02/14] lustre: mdc: include linux/idr.h for referenced code James Simmons
@ 2021-05-04  0:10 ` James Simmons
  2021-05-04  0:10 ` [lustre-devel] [PATCH 04/14] lnet: Deprecate lnet_recovery_interval James Simmons
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: James Simmons @ 2021-05-04  0:10 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Chris Horn, Lustre Development List

From: Chris Horn <chris.horn@hpe.com>

Use an exponential backoff algorithm to determine the interval at
which unhealthy local NIs are pinged.

Introduce lnet_ni_add_to_recoveryq_locked() which handles checking
pre-conditions for whether the NI should be added to the recovery
queue, and takes a ref on the NI as appropriate.

HPE-bug-id: LUS-9109
WC-bug-id: https://jira.whamcloud.com/browse/LU-13569
Lustre-commit: 8fdf2bc62ac9c418 ("LU-13569 lnet: Recover local NI w/exponential backoff interval")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/39721
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/linux/lnet/lib-lnet.h  |  9 ++++++++
 include/linux/lnet/lib-types.h |  7 ++++++
 net/lnet/lnet/lib-move.c       | 41 ++++++++++++++++++------------------
 net/lnet/lnet/lib-msg.c        | 48 +++++++++++++++++++++++++++---------------
 4 files changed, 67 insertions(+), 38 deletions(-)

diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h
index fd24c10..674f9d1 100644
--- a/include/linux/lnet/lib-lnet.h
+++ b/include/linux/lnet/lib-lnet.h
@@ -518,6 +518,9 @@ extern void lnet_peer_ni_add_to_recoveryq_locked(struct lnet_peer_ni *lpni,
 extern int lnet_peer_del_pref_nid(struct lnet_peer_ni *lpni, lnet_nid_t nid);
 void lnet_peer_ni_set_selection_priority(struct lnet_peer_ni *lpni,
 					 u32 priority);
+extern void lnet_ni_add_to_recoveryq_locked(struct lnet_ni *ni,
+					    struct list_head *queue,
+					    time64_t now);
 
 void lnet_router_debugfs_init(void);
 void lnet_router_debugfs_fini(void);
@@ -929,6 +932,12 @@ int lnet_get_peer_ni_info(u32 peer_index, u64 *nid,
 		lnet_get_next_recovery_ping(lpni->lpni_ping_count, now);
 }
 
+static inline void
+lnet_ni_set_next_ping(struct lnet_ni *ni, time64_t now)
+{
+	ni->ni_next_ping = lnet_get_next_recovery_ping(ni->ni_ping_count, now);
+}
+
 /*
  * A peer NI is alive if it satisfies the following two conditions:
  *  1. peer NI health >= LNET_MAX_HEALTH_VALUE * router_sensitivity_percentage
diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h
index a6a7588..f199b15 100644
--- a/include/linux/lnet/lib-types.h
+++ b/include/linux/lnet/lib-types.h
@@ -460,6 +460,13 @@ struct lnet_ni {
 	/* Recovery state. Protected by lnet_ni_lock() */
 	u32			ni_recovery_state;
 
+	/* When to send the next recovery ping */
+	time64_t                ni_next_ping;
+	/* How many pings sent during current recovery period did not receive
+	 * a reply. NB: reset whenever _any_ message arrives on this NI
+	 */
+	unsigned int		ni_ping_count;
+
 	/* per NI LND tunables */
 	struct lnet_lnd_tunables ni_lnd_tunables;
 
diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c
index 896ab12..46c88d0 100644
--- a/net/lnet/lnet/lib-move.c
+++ b/net/lnet/lnet/lib-move.c
@@ -3103,6 +3103,7 @@ struct lnet_mt_event_info {
 	lnet_nid_t nid;
 	int healthv;
 	int rc;
+	time64_t now;
 
 	/* splice the recovery queue on a local queue. We will iterate
 	 * through the local queue and update it as needed. Once we're
@@ -3115,6 +3116,8 @@ struct lnet_mt_event_info {
 			 &local_queue);
 	lnet_net_unlock(0);
 
+	now = ktime_get_seconds();
+
 	list_for_each_entry_safe(ni, tmp, &local_queue, ni_recovery) {
 		/* if an NI is being deleted or it is now healthy, there
 		 * is no need to keep it around in the recovery queue.
@@ -3147,6 +3150,12 @@ struct lnet_mt_event_info {
 		}
 
 		lnet_ni_unlock(ni);
+
+		if (now < ni->ni_next_ping) {
+			lnet_net_unlock(0);
+			continue;
+		}
+
 		lnet_net_unlock(0);
 
 		CDEBUG(D_NET, "attempting to recover local ni: %s\n",
@@ -3212,31 +3221,21 @@ struct lnet_mt_event_info {
 				LNetMDUnlink(mdh);
 				continue;
 			}
-			/* Same note as in lnet_recover_peer_nis(). When
-			 * we're sending the ping, the NI is free to be
-			 * deleted or manipulated. By this point it
-			 * could've been added back on the recovery queue,
-			 * and a refcount taken on it.
-			 * So we can't just add it blindly again or we'll
-			 * corrupt the queue. We must check under lock if
-			 * it's not on any list and if not then add it
-			 * to the processed list, which will eventually be
-			 * spliced back on to the recovery queue.
-			 */
-			ni->ni_ping_mdh = mdh;
-			if (list_empty(&ni->ni_recovery)) {
-				list_add_tail(&ni->ni_recovery,
-					      &processed_list);
-				lnet_ni_addref_locked(ni, 0);
-			}
-			lnet_net_unlock(0);
+			ni->ni_ping_count++;
 
-			lnet_ni_lock(ni);
-			if (rc)
+			ni->ni_ping_mdh = mdh;
+			lnet_ni_add_to_recoveryq_locked(ni, &processed_list,
+							now);
+			if (rc) {
+				lnet_ni_lock(ni);
 				ni->ni_recovery_state &=
 					~LNET_NI_RECOVERY_PENDING;
+				lnet_ni_unlock(ni);
+			}
+			lnet_net_unlock(0);
+		} else {
+			lnet_ni_unlock(ni);
 		}
-		lnet_ni_unlock(ni);
 	}
 
 	/* put back the remaining NIs on the ln_mt_localNIRecovq to be
diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c
index 3f6cd1d..580ddf6 100644
--- a/net/lnet/lnet/lib-msg.c
+++ b/net/lnet/lnet/lib-msg.c
@@ -455,6 +455,32 @@
 	}
 }
 
+/* must hold net_lock/0 */
+void
+lnet_ni_add_to_recoveryq_locked(struct lnet_ni *ni,
+				struct list_head *recovery_queue, time64_t now)
+{
+	if (!list_empty(&ni->ni_recovery))
+		return;
+
+	if (atomic_read(&ni->ni_healthv) == LNET_MAX_HEALTH_VALUE)
+		return;
+
+	/* This NI is going on the recovery queue, so take a ref on it */
+	lnet_ni_addref_locked(ni, 0);
+
+	lnet_ni_set_next_ping(ni, now);
+
+	CDEBUG(D_NET,
+	       "%s added to recovery queue. ping count: %u next ping: %lld health :%d\n",
+	       libcfs_nid2str(ni->ni_nid),
+	       ni->ni_ping_count,
+	       ni->ni_next_ping,
+	       atomic_read(&ni->ni_healthv));
+
+	list_add_tail(&ni->ni_recovery, recovery_queue);
+}
+
 static void
 lnet_handle_local_failure(struct lnet_ni *local_ni)
 {
@@ -469,21 +495,8 @@
 	}
 
 	lnet_dec_healthv_locked(&local_ni->ni_healthv, lnet_health_sensitivity);
-	/* add the NI to the recovery queue if it's not already there
-	 * and it's health value is actually below the maximum. It's
-	 * possible that the sensitivity might be set to 0, and the health
-	 * value will not be reduced. In this case, there is no reason to
-	 * invoke recovery
-	 */
-	if (list_empty(&local_ni->ni_recovery) &&
-	    atomic_read(&local_ni->ni_healthv) < LNET_MAX_HEALTH_VALUE) {
-		CDEBUG(D_NET, "ni %s added to recovery queue. Health = %d\n",
-		       libcfs_nid2str(local_ni->ni_nid),
-		       atomic_read(&local_ni->ni_healthv));
-		list_add_tail(&local_ni->ni_recovery,
-			      &the_lnet.ln_mt_localNIRecovq);
-		lnet_ni_addref_locked(local_ni, 0);
-	}
+	lnet_ni_add_to_recoveryq_locked(local_ni, &the_lnet.ln_mt_localNIRecovq,
+					ktime_get_seconds());
 	lnet_net_unlock(0);
 }
 
@@ -869,6 +882,8 @@
 		 * faster recovery.
 		 */
 		lnet_inc_healthv(&ni->ni_healthv, lnet_health_sensitivity);
+		lnet_net_lock(0);
+		ni->ni_ping_count = 0;
 		/* It's possible msg_txpeer is NULL in the LOLND
 		 * case. Only increment the peer's health if we're
 		 * receiving a message from it. It's the only sure way to
@@ -882,7 +897,6 @@
 			 * I'm a router, then set that lpni's health to
 			 * maximum so we can commence communication
 			 */
-			lnet_net_lock(0);
 			if (lnet_isrouter(lpni) || the_lnet.ln_routing) {
 				lnet_set_lpni_healthv_locked(lpni,
 							     LNET_MAX_HEALTH_VALUE);
@@ -905,8 +919,8 @@
 								     &the_lnet.ln_mt_peerNIRecovq,
 								     ktime_get_seconds());
 			}
-			lnet_net_unlock(0);
 		}
+		lnet_net_unlock(0);
 
 		/* we can finalize this message */
 		return -1;
-- 
1.8.3.1
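
The backoff described in the commit message above can be sketched in
user space as follows. This is only an illustration under stated
assumptions: the body of lnet_get_next_recovery_ping() is not part of
this patch, so the doubling rule and the 900 second cap below are
guesses, and every sketch_* name is hypothetical. Only the two fields
ni_next_ping and ni_ping_count, and the reset on message arrival, come
from the patch itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cap on the ping interval, in seconds; not from the patch. */
#define SKETCH_MAX_PING_INTERVAL 900

/* Stand-in for struct lnet_ni holding only the two fields the patch adds. */
struct sketch_ni {
	int64_t next_ping;		/* mirrors ni_next_ping */
	unsigned int ping_count;	/* mirrors ni_ping_count */
};

/* Assumed rule: the interval doubles per unanswered ping, up to the cap. */
static int64_t sketch_next_recovery_ping(unsigned int ping_count, int64_t now)
{
	int64_t interval = SKETCH_MAX_PING_INTERVAL;

	/* Bound the shift so a large count cannot overflow. */
	if (ping_count < 10 && (1 << ping_count) < SKETCH_MAX_PING_INTERVAL)
		interval = 1 << ping_count;

	return now + interval;
}

/* Returns 1 if a ping is sent now, 0 if the NI is skipped this pass. */
static int sketch_try_ping(struct sketch_ni *ni, int64_t now)
{
	assert(ni);

	if (now < ni->next_ping)
		return 0;

	/* Count the ping, then schedule the next one further out. */
	ni->ping_count++;
	ni->next_ping = sketch_next_recovery_ping(ni->ping_count, now);
	return 1;
}

/* Any message arriving on the NI resets the backoff, as in the patch. */
static void sketch_message_received(struct sketch_ni *ni)
{
	ni->ping_count = 0;
}
```

With these assumptions the monitor thread can wake up as often as it
likes: an NI whose next_ping lies in the future is simply skipped, so
the effective ping rate is set per NI rather than by a global interval.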


* [lustre-devel] [PATCH 04/14] lnet: Deprecate lnet_recovery_interval
  2021-05-04  0:10 [lustre-devel] [PATCH 00/14] Update to OpenSFS tree as of May 3, 2021 James Simmons
                   ` (2 preceding siblings ...)
  2021-05-04  0:10 ` [lustre-devel] [PATCH 03/14] lnet: Recover local NI w/exponential backoff interval James Simmons
@ 2021-05-04  0:10 ` James Simmons
  2021-05-04  0:10 ` [lustre-devel] [PATCH 05/14] lnet: Router ping timeout with discovery disabled James Simmons
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: James Simmons @ 2021-05-04  0:10 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Chris Horn, Lustre Development List

From: Chris Horn <chris.horn@hpe.com>

We no longer use a static recovery interval, so remove its remaining
uses and add a warning that it has been deprecated.

HPE-bug-id: LUS-9109
WC-bug-id: https://jira.whamcloud.com/browse/LU-13569
Lustre-commit: 79ab0535622782c82 ("LU-13569 lnet: Deprecate lnet_recovery_interval")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/39722
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/lnet/api-ni.c   | 26 ++------------------------
 net/lnet/lnet/lib-move.c | 19 +++----------------
 2 files changed, 5 insertions(+), 40 deletions(-)

diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c
index cc40040..d6a8c1b 100644
--- a/net/lnet/lnet/api-ni.c
+++ b/net/lnet/lnet/api-ni.c
@@ -110,7 +110,7 @@ static int recovery_interval_set(const char *val,
 		__param_check(name, p, int)
 module_param(lnet_recovery_interval, recovery_interval, 0644);
 MODULE_PARM_DESC(lnet_recovery_interval,
-		 "Interval to recover unhealthy interfaces in seconds");
+		 "DEPRECATED Interval to recover unhealthy interfaces in seconds");
 
 unsigned int lnet_recovery_limit;
 module_param(lnet_recovery_limit, uint, 0644);
@@ -253,29 +253,7 @@ static int lnet_discover(struct lnet_process_id id, u32 force,
 static int
 recovery_interval_set(const char *val, const struct kernel_param *kp)
 {
-	int rc;
-	unsigned int *interval = (unsigned int *)kp->arg;
-	unsigned long value;
-
-	rc = kstrtoul(val, 0, &value);
-	if (rc) {
-		CERROR("Invalid module parameter value for 'lnet_recovery_interval'\n");
-		return rc;
-	}
-
-	if (value < 1) {
-		CERROR("lnet_recovery_interval must be at least 1 second\n");
-		return -EINVAL;
-	}
-
-	/* The purpose of locking the api_mutex here is to ensure that
-	 * the correct value ends up stored properly.
-	 */
-	mutex_lock(&the_lnet.ln_api_mutex);
-
-	*interval = value;
-
-	mutex_unlock(&the_lnet.ln_api_mutex);
+	CWARN("'lnet_recovery_interval' has been deprecated\n");
 
 	return 0;
 }
diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c
index 46c88d0..cb0943e 100644
--- a/net/lnet/lnet/lib-move.c
+++ b/net/lnet/lnet/lib-move.c
@@ -3480,9 +3480,7 @@ struct lnet_mt_event_info {
 static int
 lnet_monitor_thread(void *arg)
 {
-	time64_t recovery_timeout = 0;
 	time64_t rsp_timeout = 0;
-	int interval;
 	time64_t now;
 
 	wait_for_completion(&the_lnet.ln_started);
@@ -3509,11 +3507,8 @@ struct lnet_mt_event_info {
 			rsp_timeout = now + (lnet_transaction_timeout / 2);
 		}
 
-		if (now >= recovery_timeout) {
-			lnet_recover_local_nis();
-			lnet_recover_peer_nis();
-			recovery_timeout = now + lnet_recovery_interval;
-		}
+		lnet_recover_local_nis();
+		lnet_recover_peer_nis();
 
 		/* TODO do we need to check if we should sleep without
 		 * timeout?  Technically, an active system will always
@@ -3522,17 +3517,9 @@ struct lnet_mt_event_info {
 		 * if we wake up every 1 second? Although, we've seen
 		 * cases where we get a complaint that an idle thread
 		 * is waking up unnecessarily.
-		 *
-		 * Take into account the current net_count when you wake
-		 * up for alive router checking, since we need to check
-		 * possibly as many networks as we have configured.
 		 */
-		interval = min(lnet_recovery_interval,
-			       min((unsigned int)alive_router_check_interval /
-					lnet_current_net_count,
-				   lnet_transaction_timeout / 2));
 		wait_for_completion_interruptible_timeout(&the_lnet.ln_mt_wait_complete,
-							  interval * HZ);
+							  HZ);
 		/* Must re-init the completion before testing anything,
 		 * including ln_mt_state.
 		 */
-- 
1.8.3.1


* [lustre-devel] [PATCH 05/14] lnet: Router ping timeout with discovery disabled
  2021-05-04  0:10 [lustre-devel] [PATCH 00/14] Update to OpenSFS tree as of May 3, 2021 James Simmons
                   ` (3 preceding siblings ...)
  2021-05-04  0:10 ` [lustre-devel] [PATCH 04/14] lnet: Deprecate lnet_recovery_interval James Simmons
@ 2021-05-04  0:10 ` James Simmons
  2021-05-04  0:10 ` [lustre-devel] [PATCH 06/14] lnet: Ensure proper peer, peer NI, peer net hierarchy James Simmons
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: James Simmons @ 2021-05-04  0:10 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Chris Horn, Lustre Development List

From: Chris Horn <chris.horn@hpe.com>

Discovery pings are used to determine the health of gateways and
associated routes. Ping replies from gateways with dynamic discovery
(DD) disabled (or if DD is disabled locally) are handled in
a special routine, lnet_router_discovery_ping_reply(), but this
function and related code don't handle the case where a discovery
ping hits the response tracker timeout and is unlinked by the
monitor thread. In this case, an UNLINK event is generated and we
do not call lnet_router_discovery_ping_reply(). For gateways
with DD enabled (and DD enabled locally), we handle this case
in lnet_router_discovery_complete(). If discovery failed then
lp_dc_error is set and we mark all routes down for the gateway. We
can simply extend this logic to the case of gateways w/DD disabled
(or DD disabled locally).

Fixes: dc80207e3a ("lnet: fix asym routing with multi-hop")
HPE-bug-id: LUS-9612
WC-bug-id: https://jira.whamcloud.com/browse/LU-14206
Lustre-commit: 173d86c6e9a704a8 ("LU-14206 lnet: Router ping timeout with discovery disabled")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/40923
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/lnet/router.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/lnet/lnet/router.c b/net/lnet/lnet/router.c
index ae7582ca..e179997 100644
--- a/net/lnet/lnet/router.c
+++ b/net/lnet/lnet/router.c
@@ -495,11 +495,11 @@ bool lnet_is_route_alive(struct lnet_route *route)
 	lp->lp_alive = lp->lp_dc_error == 0;
 	spin_unlock(&lp->lp_lock);
 
-	/* ping replies are being handled when discovery is disabled */
-	if (lnet_is_discovery_disabled_locked(lp))
-		return;
-
 	if (!lp->lp_dc_error) {
+		/* ping replies are being handled when discovery is disabled */
+		if (lnet_is_discovery_disabled_locked(lp))
+			return;
+
 		/* mark single-hop routes. If the remote net is not configured
 		 * on the gateway we assume this is intentional and we mark the
 		 * gateway as multi-hop
-- 
1.8.3.1


* [lustre-devel] [PATCH 06/14] lnet: Ensure proper peer, peer NI, peer net hierarchy
  2021-05-04  0:10 [lustre-devel] [PATCH 00/14] Update to OpenSFS tree as of May 3, 2021 James Simmons
                   ` (4 preceding siblings ...)
  2021-05-04  0:10 ` [lustre-devel] [PATCH 05/14] lnet: Router ping timeout with discovery disabled James Simmons
@ 2021-05-04  0:10 ` James Simmons
  2021-05-04  0:10 ` [lustre-devel] [PATCH 07/14] lnet: libcfs: simplify task management in tracefile.c James Simmons
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: James Simmons @ 2021-05-04  0:10 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Chris Horn, Lustre Development List

From: Chris Horn <chris.horn@hpe.com>

The MR design dictates that the peer nets and peer NIs are ordered
such that the peer net and peer NI for a peer's primary NID appear
first, followed by other peer NIs in the primary NID's peer net,
followed by other peer nets/NIs. This ordering is broken and it can
result in tripping an assertion if the primary NID of a peer is
deleted. Modify lnet_peer_attach_peer_ni() to check whether the
NI being attached is the peer's primary, and place it, and its
associated peer net, appropriately.

Modify lnet_peer_set_primary_nid() so that it updates the
lp_primary_nid before calling lnet_peer_add_nid() so that
lnet_peer_attach_peer_ni() can detect the situation where the
primary is changing and act appropriately.

Finally, modify lnet_peer_merge_data() to enforce the hierarchy
after it has finished merging the contents of the ping buffer. This
ensures we maintain the correct hierarchy in certain edge cases where
we've needed to reconcile two peers. For example, if a peer adds a new
interface, the discovery push may arrive from that new interface,
which results in a second peer object being created that must be
reconciled with the original peer object.

HPE-bug-id: LUS-9630
WC-bug-id: https://jira.whamcloud.com/browse/LU-13806
Lustre-commit: 9eb9474c41c823c7 ("LU-13806 lnet: Ensure proper peer, peer NI, peer net hierarchy")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/40985
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/lnet/peer.c | 41 +++++++++++++++++++++++++++++++++++++----
 1 file changed, 37 insertions(+), 4 deletions(-)

diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c
index 0ec1460..db00514 100644
--- a/net/lnet/lnet/peer.c
+++ b/net/lnet/lnet/peer.c
@@ -1428,7 +1428,10 @@ struct lnet_peer_net *
 
 	/* Add peer_ni to peer_net */
 	lpni->lpni_peer_net = lpn;
-	list_add_tail(&lpni->lpni_peer_nis, &lpn->lpn_peer_nis);
+	if (lp->lp_primary_nid == lpni->lpni_nid)
+		list_add(&lpni->lpni_peer_nis, &lpn->lpn_peer_nis);
+	else
+		list_add_tail(&lpni->lpni_peer_nis, &lpn->lpn_peer_nis);
 	lnet_update_peer_net_healthv(lpni);
 	lnet_peer_net_addref_locked(lpn);
 
@@ -1436,7 +1439,10 @@ struct lnet_peer_net *
 	if (!lpn->lpn_peer) {
 		new_lpn = true;
 		lpn->lpn_peer = lp;
-		list_add_tail(&lpn->lpn_peer_nets, &lp->lp_peer_nets);
+		if (lp->lp_primary_nid == lpni->lpni_nid)
+			list_add(&lpn->lpn_peer_nets, &lp->lp_peer_nets);
+		else
+			list_add_tail(&lpn->lpn_peer_nets, &lp->lp_peer_nets);
 		lnet_peer_addref_locked(lp);
 	}
 
@@ -1678,10 +1684,14 @@ struct lnet_peer_net *
 
 	if (lp->lp_primary_nid == nid)
 		goto out;
+
+	lp->lp_primary_nid = nid;
+
 	rc = lnet_peer_add_nid(lp, nid, flags);
-	if (rc)
+	if (rc) {
+		lp->lp_primary_nid = old;
 		goto out;
-	lp->lp_primary_nid = nid;
+	}
 out:
 	CDEBUG(D_NET, "peer %s NID %s: %d\n",
 	       libcfs_nid2str(old), libcfs_nid2str(nid), rc);
@@ -2777,6 +2787,7 @@ static void lnet_discovery_event_handler(struct lnet_event *event)
 static int lnet_peer_merge_data(struct lnet_peer *lp,
 				struct lnet_ping_buffer *pbuf)
 {
+	struct lnet_peer_net *lpn;
 	struct lnet_peer_ni *lpni;
 	lnet_nid_t *curnis = NULL;
 	struct lnet_ni_status *addnis = NULL;
@@ -2902,6 +2913,28 @@ static int lnet_peer_merge_data(struct lnet_peer *lp,
 				goto out;
 		}
 	}
+
+	/* The peer net for the primary NID should be the first entry in the
+	 * peer's lp_peer_nets list, and the peer NI for the primary NID should
+	 * be the first entry in its peer net's lpn_peer_nis list.
+	 */
+	lpni = lnet_find_peer_ni_locked(pbuf->pb_info.pi_ni[1].ns_nid);
+	if (!lpni) {
+		CERROR("Internal error: Failed to lookup peer NI for primary NID: %s\n",
+		       libcfs_nid2str(pbuf->pb_info.pi_ni[1].ns_nid));
+		goto out;
+	}
+
+	lnet_peer_ni_decref_locked(lpni);
+
+	lpn = lpni->lpni_peer_net;
+	if (lpn->lpn_peer_nets.prev != &lp->lp_peer_nets)
+		list_move(&lpn->lpn_peer_nets, &lp->lp_peer_nets);
+
+	if (lpni->lpni_peer_nis.prev != &lpni->lpni_peer_net->lpn_peer_nis)
+		list_move(&lpni->lpni_peer_nis,
+			  &lpni->lpni_peer_net->lpn_peer_nis);
+
 	/*
 	 * Errors other than -ENOMEM are due to peers having been
 	 * configured with DLC. Ignore these because DLC overrides
-- 
1.8.3.1
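
The ordering rule from the commit message above (primary first, all
others appended) can be illustrated with a user-space sketch. The sk_*
list helpers are simplified re-implementations written for this
example, modeled on the kernel's list_add(), list_add_tail() and
list_move(); the sk_peer_ni type and its functions are hypothetical
stand-ins for the LNet structures.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal user-space stand-in for the kernel's struct list_head. */
struct sk_list {
	struct sk_list *next, *prev;
};

static void sk_list_init(struct sk_list *head)
{
	head->next = head;
	head->prev = head;
}

static void sk_insert(struct sk_list *n, struct sk_list *prev,
		      struct sk_list *next)
{
	next->prev = n;
	n->next = next;
	n->prev = prev;
	prev->next = n;
}

/* Insert right after the head, i.e. as the first entry. */
static void sk_list_add(struct sk_list *n, struct sk_list *head)
{
	sk_insert(n, head, head->next);
}

/* Insert right before the head, i.e. as the last entry. */
static void sk_list_add_tail(struct sk_list *n, struct sk_list *head)
{
	sk_insert(n, head->prev, head);
}

/* Unlink an entry and re-insert it as the first entry. */
static void sk_list_move(struct sk_list *n, struct sk_list *head)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	sk_list_add(n, head);
}

/* Hypothetical stand-in for struct lnet_peer_ni. */
struct sk_peer_ni {
	unsigned long nid;
	struct sk_list link;
};

/* Attach rule from the patch: primary goes first, others are appended. */
static void sk_attach(struct sk_list *nis, struct sk_peer_ni *ni,
		      unsigned long primary_nid)
{
	if (ni->nid == primary_nid)
		sk_list_add(&ni->link, nis);
	else
		sk_list_add_tail(&ni->link, nis);
}

/* Post-merge fixup: move the primary to the front if it is not there. */
static void sk_enforce_primary_first(struct sk_list *nis,
				     struct sk_peer_ni *primary)
{
	if (primary->link.prev != nis)
		sk_list_move(&primary->link, nis);
}

/* NID of the first entry; the list must not be empty. */
static unsigned long sk_first_nid(const struct sk_list *nis)
{
	const struct sk_peer_ni *ni;

	assert(nis->next != nis);
	ni = (const struct sk_peer_ni *)((const char *)nis->next -
					 offsetof(struct sk_peer_ni, link));
	return ni->nid;
}
```

The check on link.prev mirrors the test in lnet_peer_merge_data(): an
entry is first exactly when its predecessor is the list head, so the
move is skipped when the order is already correct.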


* [lustre-devel] [PATCH 07/14] lnet: libcfs: simplify task management in tracefile.c
  2021-05-04  0:10 [lustre-devel] [PATCH 00/14] Update to OpenSFS tree as of May 3, 2021 James Simmons
                   ` (5 preceding siblings ...)
  2021-05-04  0:10 ` [lustre-devel] [PATCH 06/14] lnet: Ensure proper peer, peer NI, peer net hierarchy James Simmons
@ 2021-05-04  0:10 ` James Simmons
  2021-05-04  0:10 ` [lustre-devel] [PATCH 08/14] lustre: move lu_tgt_pool out of obd_target.h James Simmons
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: James Simmons @ 2021-05-04  0:10 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

The waitqueue, mutex, and two completions are not needed.
We can use kthread_stop/kthread_should_stop to synchronize
shutdown, cmpxchg() to ensure only one task is started, and a simple
wake_up_process() to wake the process.
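
The start/stop scheme described above can be sketched in user space. This is only an illustrative analogue, not the kernel code: it assumes POSIX threads and C11 atomics in place of kthreads and cmpxchg()/xchg(), and the names worker, current_worker, worker_start and worker_stop are hypothetical. A compare-and-exchange publishes the thread pointer only if no thread is registered yet, and an exchange on the stop path guarantees that exactly one caller tears the thread down.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a kernel thread and its "should stop" flag. */
struct worker {
	pthread_t thread;
	atomic_bool should_stop;	/* plays the role of kthread_should_stop() */
};

/* Single published pointer, playing the role of tctl_task. */
static _Atomic(struct worker *) current_worker;

static void *worker_main(void *arg)
{
	struct worker *w = arg;

	/* A real daemon would sleep and process pages between checks. */
	while (!atomic_load(&w->should_stop))
		sched_yield();
	return NULL;
}

/* Ask the thread to exit, wait for it, and release its state. */
static void worker_destroy(struct worker *w)
{
	assert(w != NULL);
	atomic_store(&w->should_stop, true);
	pthread_join(w->thread, NULL);
	free(w);
}

/* Returns 0 on success or if a worker was already running, -1 on failure. */
int worker_start(void)
{
	struct worker *expected = NULL;
	struct worker *w;

	if (atomic_load(&current_worker))
		return 0;		/* fast path: already running */

	w = calloc(1, sizeof(*w));
	if (!w)
		return -1;
	atomic_init(&w->should_stop, false);
	if (pthread_create(&w->thread, NULL, worker_main, w) != 0) {
		free(w);
		return -1;
	}

	/* Publish only if the slot is still empty; a loser stops its own thread. */
	if (!atomic_compare_exchange_strong(&current_worker, &expected, w))
		worker_destroy(w);
	return 0;
}

/* Returns true if this caller took ownership of a worker and stopped it. */
bool worker_stop(void)
{
	struct worker *w = atomic_exchange(&current_worker, NULL);

	if (!w)
		return false;
	worker_destroy(w);
	return true;
}
```

Because ownership moves through a single atomic pointer, neither path needs a mutex or a completion: a second start is a no-op and a second stop finds nothing to do.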

WC-bug-id: https://jira.whamcloud.com/browse/LU-14428
Lustre-commit: 6c5e6dd777a49ab0 ("LU-14428 libcfs: simplify task management in tracefile.c")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/41492
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/libcfs/tracefile.c | 82 ++++++++++++++-------------------------------
 1 file changed, 26 insertions(+), 56 deletions(-)

diff --git a/net/lnet/libcfs/tracefile.c b/net/lnet/libcfs/tracefile.c
index 731623b..b1a2f3e 100644
--- a/net/lnet/libcfs/tracefile.c
+++ b/net/lnet/libcfs/tracefile.c
@@ -61,9 +61,8 @@ enum cfs_trace_buf_type {
 
 char cfs_tracefile[TRACEFILE_NAME_SIZE];
 long long cfs_tracefile_size = CFS_TRACEFILE_SIZE;
-static struct tracefiled_ctl trace_tctl;
-static DEFINE_MUTEX(cfs_trace_thread_mutex);
-static int thread_running;
+
+struct task_struct *tctl_task;
 
 static atomic_t cfs_tage_allocated = ATOMIC_INIT(0);
 static DECLARE_RWSEM(cfs_tracefile_sem);
@@ -78,14 +77,6 @@ struct page_collection {
 	int			pc_want_daemon_pages;
 };
 
-struct tracefiled_ctl {
-	struct completion	tctl_start;
-	struct completion	tctl_stop;
-	wait_queue_head_t	tctl_waitq;
-	pid_t			tctl_pid;
-	atomic_t		tctl_shutdown;
-};
-
 /*
  * small data-structure for each page owned by tracefiled.
  */
@@ -244,6 +235,7 @@ static void cfs_tage_to_tail(struct cfs_trace_page *tage,
 cfs_trace_get_tage_try(struct cfs_trace_cpu_data *tcd, unsigned long len)
 {
 	struct cfs_trace_page *tage;
+	struct task_struct *tsk;
 
 	if (tcd->tcd_cur_pages > 0) {
 		__LASSERT(!list_empty(&tcd->tcd_pages));
@@ -274,12 +266,10 @@ static void cfs_tage_to_tail(struct cfs_trace_page *tage,
 		list_add_tail(&tage->linkage, &tcd->tcd_pages);
 		tcd->tcd_cur_pages++;
 
-		if (tcd->tcd_cur_pages > 8 && thread_running) {
-			struct tracefiled_ctl *tctl = &trace_tctl;
-			/*
-			 * wake up tracefiled to process some pages.
+		if (tcd->tcd_cur_pages > 8 && (tsk = tctl_task)) {
+			/* wake up tracefiled to process some pages.
 			 */
-			wake_up(&tctl->tctl_waitq);
+			wake_up_process(tsk);
 		}
 		return tage;
 	}
@@ -332,7 +322,7 @@ static struct cfs_trace_page *cfs_trace_get_tage(struct cfs_trace_cpu_data *tcd,
 	tage = cfs_trace_get_tage_try(tcd, len);
 	if (tage)
 		return tage;
-	if (thread_running)
+	if (tctl_task)
 		cfs_tcd_shrink(tcd);
 	if (tcd->tcd_cur_pages > 0) {
 		tage = cfs_tage_from_list(tcd->tcd_pages.next);
@@ -1075,7 +1065,6 @@ int cfs_trace_get_debug_mb(void)
 static int tracefiled(void *arg)
 {
 	struct page_collection pc;
-	struct tracefiled_ctl *tctl = arg;
 	struct cfs_trace_page *tage;
 	struct cfs_trace_page *tmp;
 	struct file *filp;
@@ -1083,21 +1072,13 @@ static int tracefiled(void *arg)
 	int last_loop = 0;
 	int rc;
 
-	/* we're started late enough that we pick up init's fs context */
-	/* this is so broken in uml?  what on earth is going on? */
-
-	complete(&tctl->tctl_start);
-
 	pc.pc_want_daemon_pages = 0;
 
 	while (!last_loop) {
-		wait_event_timeout(tctl->tctl_waitq,
-				   ({ collect_pages(&pc);
-				     !list_empty(&pc.pc_pages); }) ||
-				   atomic_read(&tctl->tctl_shutdown),
-				   HZ);
-		if (atomic_read(&tctl->tctl_shutdown))
+		schedule_timeout_interruptible(HZ);
+		if (kthread_should_stop())
 			last_loop = 1;
+		collect_pages(&pc);
 		if (list_empty(&pc.pc_pages))
 			continue;
 
@@ -1168,50 +1149,39 @@ static int tracefiled(void *arg)
 		}
 		__LASSERT(list_empty(&pc.pc_pages));
 	}
-	complete(&tctl->tctl_stop);
+
 	return 0;
 }
 
 int cfs_trace_start_thread(void)
 {
-	struct tracefiled_ctl *tctl = &trace_tctl;
-	struct task_struct *task;
+	struct task_struct *tsk;
 	int rc = 0;
 
-	mutex_lock(&cfs_trace_thread_mutex);
-	if (thread_running)
-		goto out;
-
-	init_completion(&tctl->tctl_start);
-	init_completion(&tctl->tctl_stop);
-	init_waitqueue_head(&tctl->tctl_waitq);
-	atomic_set(&tctl->tctl_shutdown, 0);
+	if (tctl_task)
+		return 0;
 
-	task = kthread_run(tracefiled, tctl, "ktracefiled");
-	if (IS_ERR(task)) {
-		rc = PTR_ERR(task);
-		goto out;
-	}
+	tsk = kthread_create(tracefiled, NULL, "ktracefiled");
+	if (IS_ERR(tsk))
+		rc = PTR_ERR(tsk);
+	else if (cmpxchg(&tctl_task, NULL, tsk))
+		/* already running */
+		kthread_stop(tsk);
+	else
+		wake_up_process(tsk);
 
-	wait_for_completion(&tctl->tctl_start);
-	thread_running = 1;
-out:
-	mutex_unlock(&cfs_trace_thread_mutex);
 	return rc;
 }
 
 void cfs_trace_stop_thread(void)
 {
-	struct tracefiled_ctl *tctl = &trace_tctl;
+	struct task_struct *tsk;
 
-	mutex_lock(&cfs_trace_thread_mutex);
-	if (thread_running) {
+	tsk = xchg(&tctl_task, NULL);
+	if (tsk) {
 		pr_info("shutting down debug daemon thread...\n");
-		atomic_set(&tctl->tctl_shutdown, 1);
-		wait_for_completion(&tctl->tctl_stop);
-		thread_running = 0;
+		kthread_stop(tsk);
 	}
-	mutex_unlock(&cfs_trace_thread_mutex);
 }
 
 /* percents to share the total debug memory for each type */
-- 
1.8.3.1

* [lustre-devel] [PATCH 08/14] lustre: move lu_tgt_pool out of obd_target.h
  2021-05-04  0:10 [lustre-devel] [PATCH 00/14] Update to OpenSFS tree as of May 3, 2021 James Simmons
                   ` (6 preceding siblings ...)
  2021-05-04  0:10 ` [lustre-devel] [PATCH 07/14] lnet: libcfs: simplify task management in tracefile.c James Simmons
@ 2021-05-04  0:10 ` James Simmons
  2021-05-04  0:10 ` [lustre-devel] [PATCH 09/14] lnet: libcfs: remove references to Sun Trademark James Simmons
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: James Simmons @ 2021-05-04  0:10 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

'struct lu_tgt_pool' is the only part of obd_target.h that is needed
on the client, so move it and the related declarations to lu_object.h.
lu_object.h then no longer needs to include obd_target.h, which will
only be included server-side.

WC-bug-id: https://jira.whamcloud.com/browse/LU-8837
Lustre-commit: 5a8dc02609ace484 ("LU-8837 lustre: move lu_tgt_pool out of obd_target.h")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/41951
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/lu_object.h    | 18 +++++++++++++-
 fs/lustre/include/obd_target.h   | 53 ----------------------------------------
 fs/lustre/obdclass/lu_tgt_pool.c |  2 +-
 3 files changed, 18 insertions(+), 55 deletions(-)
 delete mode 100644 fs/lustre/include/obd_target.h

diff --git a/fs/lustre/include/lu_object.h b/fs/lustre/include/lu_object.h
index 0aa28c7..a270631 100644
--- a/fs/lustre/include/lu_object.h
+++ b/fs/lustre/include/lu_object.h
@@ -38,7 +38,6 @@
 #include <linux/rhashtable.h>
 #include <linux/libcfs/libcfs.h>
 #include <linux/ctype.h>
-#include <obd_target.h>
 #include <uapi/linux/lustre/lustre_idl.h>
 #include <lu_ref.h>
 
@@ -1404,6 +1403,23 @@ struct lu_kmem_descr {
 extern u32 lu_context_tags_default;
 extern u32 lu_session_tags_default;
 
+/* Generic subset of tgts */
+struct lu_tgt_pool {
+	u32		   *op_array;	/* array of index of
+					 * lov_obd->lov_tgts
+					 */
+	unsigned int	    op_count;	/* number of tgts in the array */
+	unsigned int	    op_size;	/* allocated size of op_array */
+	struct rw_semaphore op_rw_sem;	/* to protect lu_tgt_pool use */
+};
+
+int tgt_pool_init(struct lu_tgt_pool *op, unsigned int count);
+int tgt_pool_add(struct lu_tgt_pool *op, u32 idx, unsigned int min_count);
+int tgt_pool_remove(struct lu_tgt_pool *op, u32 idx);
+int tgt_pool_free(struct lu_tgt_pool *op);
+int tgt_check_index(int idx, struct lu_tgt_pool *osts);
+int tgt_pool_extend(struct lu_tgt_pool *op, unsigned int min_count);
+
 /* bitflags used in rr / qos allocation */
 enum lq_flag {
 	LQ_DIRTY	= 0,	/* recalc qos data */
diff --git a/fs/lustre/include/obd_target.h b/fs/lustre/include/obd_target.h
deleted file mode 100644
index 5d8e8bb..0000000
--- a/fs/lustre/include/obd_target.h
+++ /dev/null
@@ -1,53 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/* GPL HEADER START
- *
- * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 only,
- * as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
- * General Public License version 2 for more details (a copy is included
- * in the LICENSE file that accompanied this code).
- *
- * You should have received a copy of the GNU General Public License
- * version 2 along with this program; If not, see
- * http://www.gnu.org/licenses/gpl-2.0.html
- *
- * GPL HEADER END
- */
-/*
- * Copyright (c) 2007, 2010, Oracle and/or its affiliates. All rights reserved.
- * Use is subject to license terms.
- *
- * Copyright (c) 2011, 2014, Intel Corporation.
- */
-/*
- * This file is part of Lustre, http://www.lustre.org/
- */
-
-#ifndef __OBD_TARGET_H
-#define __OBD_TARGET_H
-#include <lprocfs_status.h>
-
-/* Generic subset of tgts */
-struct lu_tgt_pool {
-	__u32		   *op_array;	/* array of index of
-					 * lov_obd->lov_tgts
-					 */
-	unsigned int	    op_count;	/* number of tgts in the array */
-	unsigned int	    op_size;	/* allocated size of op_array */
-	struct rw_semaphore op_rw_sem;	/* to protect lu_tgt_pool use */
-};
-
-int tgt_pool_init(struct lu_tgt_pool *op, unsigned int count);
-int tgt_pool_add(struct lu_tgt_pool *op, __u32 idx, unsigned int min_count);
-int tgt_pool_remove(struct lu_tgt_pool *op, __u32 idx);
-int tgt_pool_free(struct lu_tgt_pool *op);
-int tgt_check_index(int idx, struct lu_tgt_pool *osts);
-int tgt_pool_extend(struct lu_tgt_pool *op, unsigned int min_count);
-
-#endif /* __OBD_TARGET_H */
diff --git a/fs/lustre/obdclass/lu_tgt_pool.c b/fs/lustre/obdclass/lu_tgt_pool.c
index 5d8e362..a8e1028 100644
--- a/fs/lustre/obdclass/lu_tgt_pool.c
+++ b/fs/lustre/obdclass/lu_tgt_pool.c
@@ -42,8 +42,8 @@
 #define DEBUG_SUBSYSTEM S_CLASS
 
 #include <linux/libcfs/libcfs_private.h>
-#include <obd_target.h>
 #include <obd_support.h>
+#include <lu_object.h>
 
 /**
  * Initialize the pool data structures at startup.
-- 
1.8.3.1

* [lustre-devel] [PATCH 09/14] lnet: libcfs: remove references to Sun Trademark.
  2021-05-04  0:10 [lustre-devel] [PATCH 00/14] Update to OpenSFS tree as of May 3, 2021 James Simmons
                   ` (7 preceding siblings ...)
  2021-05-04  0:10 ` [lustre-devel] [PATCH 08/14] lustre: move lu_tgt_pool out of obd_target.h James Simmons
@ 2021-05-04  0:10 ` James Simmons
  2021-05-04  0:10 ` [lustre-devel] [PATCH 10/14] lnet: Skip discovery in LNetPrimaryNID if DD disabled James Simmons
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: James Simmons @ 2021-05-04  0:10 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

"Lustre" is no longer a trademark of Sun Microsystems.  There is no
need to acknowledge the trademark in every file, so just remove all
these claims.

WC-bug-id: https://jira.whamcloud.com/browse/LU-14487
Lustre-commit: 6621439c371e37268 ("LU-14487 libcfs: remove references to Sun Trademark.")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/42137
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/linux/libcfs/libcfs.h         | 1 -
 include/linux/libcfs/libcfs_cpu.h     | 1 -
 include/linux/libcfs/libcfs_debug.h   | 1 -
 include/linux/libcfs/libcfs_hash.h    | 1 -
 include/linux/libcfs/libcfs_private.h | 1 -
 include/linux/libcfs/libcfs_string.h  | 1 -
 net/lnet/libcfs/debug.c               | 1 -
 net/lnet/libcfs/hash.c                | 1 -
 net/lnet/libcfs/libcfs_cpu.c          | 1 -
 net/lnet/libcfs/libcfs_lock.c         | 1 -
 net/lnet/libcfs/libcfs_mem.c          | 1 -
 net/lnet/libcfs/libcfs_string.c       | 1 -
 net/lnet/libcfs/module.c              | 1 -
 net/lnet/libcfs/tracefile.c           | 1 -
 net/lnet/libcfs/tracefile.h           | 1 -
 15 files changed, 15 deletions(-)

diff --git a/include/linux/libcfs/libcfs.h b/include/linux/libcfs/libcfs.h
index c98d7a1..b59ef9b 100644
--- a/include/linux/libcfs/libcfs.h
+++ b/include/linux/libcfs/libcfs.h
@@ -28,7 +28,6 @@
  */
 /*
  * This file is part of Lustre, http://www.lustre.org/
- * Lustre is a trademark of Sun Microsystems, Inc.
  */
 
 #ifndef __LIBCFS_LIBCFS_H__
diff --git a/include/linux/libcfs/libcfs_cpu.h b/include/linux/libcfs/libcfs_cpu.h
index 310b25c..b4f1b58 100644
--- a/include/linux/libcfs/libcfs_cpu.h
+++ b/include/linux/libcfs/libcfs_cpu.h
@@ -23,7 +23,6 @@
  */
 /*
  * This file is part of Lustre, http://www.lustre.org/
- * Lustre is a trademark of Sun Microsystems, Inc.
  *
  * libcfs/include/libcfs/libcfs_cpu.h
  *
diff --git a/include/linux/libcfs/libcfs_debug.h b/include/linux/libcfs/libcfs_debug.h
index bc85bb9..373a055 100644
--- a/include/linux/libcfs/libcfs_debug.h
+++ b/include/linux/libcfs/libcfs_debug.h
@@ -28,7 +28,6 @@
  */
 /*
  * This file is part of Lustre, http://www.lustre.org/
- * Lustre is a trademark of Sun Microsystems, Inc.
  *
  * libcfs/include/libcfs/libcfs_debug.h
  *
diff --git a/include/linux/libcfs/libcfs_hash.h b/include/linux/libcfs/libcfs_hash.h
index 7f62379..d3b4875 100644
--- a/include/linux/libcfs/libcfs_hash.h
+++ b/include/linux/libcfs/libcfs_hash.h
@@ -28,7 +28,6 @@
  */
 /*
  * This file is part of Lustre, http://www.lustre.org/
- * Lustre is a trademark of Sun Microsystems, Inc.
  *
  * libcfs/include/libcfs/libcfs_hash.h
  *
diff --git a/include/linux/libcfs/libcfs_private.h b/include/linux/libcfs/libcfs_private.h
index 3996d0e..378c2a5 100644
--- a/include/linux/libcfs/libcfs_private.h
+++ b/include/linux/libcfs/libcfs_private.h
@@ -28,7 +28,6 @@
  */
 /*
  * This file is part of Lustre, http://www.lustre.org/
- * Lustre is a trademark of Sun Microsystems, Inc.
  *
  * libcfs/include/libcfs/libcfs_private.h
  *
diff --git a/include/linux/libcfs/libcfs_string.h b/include/linux/libcfs/libcfs_string.h
index 34b5521..e2b6d72 100644
--- a/include/linux/libcfs/libcfs_string.h
+++ b/include/linux/libcfs/libcfs_string.h
@@ -28,7 +28,6 @@
  */
 /*
  * This file is part of Lustre, http://www.lustre.org/
- * Lustre is a trademark of Sun Microsystems, Inc.
  *
  * libcfs/include/libcfs/libcfs_string.h
  *
diff --git a/net/lnet/libcfs/debug.c b/net/lnet/libcfs/debug.c
index cb6c33a..1bb382d 100644
--- a/net/lnet/libcfs/debug.c
+++ b/net/lnet/libcfs/debug.c
@@ -28,7 +28,6 @@
  */
 /*
  * This file is part of Lustre, http://www.lustre.org/
- * Lustre is a trademark of Sun Microsystems, Inc.
  *
  * libcfs/libcfs/debug.c
  *
diff --git a/net/lnet/libcfs/hash.c b/net/lnet/libcfs/hash.c
index f452c45..d060eaa 100644
--- a/net/lnet/libcfs/hash.c
+++ b/net/lnet/libcfs/hash.c
@@ -28,7 +28,6 @@
  */
 /*
  * This file is part of Lustre, http://www.lustre.org/
- * Lustre is a trademark of Sun Microsystems, Inc.
  *
  * libcfs/libcfs/hash.c
  *
diff --git a/net/lnet/libcfs/libcfs_cpu.c b/net/lnet/libcfs/libcfs_cpu.c
index 8e4fdb1..dca92cd 100644
--- a/net/lnet/libcfs/libcfs_cpu.c
+++ b/net/lnet/libcfs/libcfs_cpu.c
@@ -22,7 +22,6 @@
  */
 /*
  * This file is part of Lustre, http://www.lustre.org/
- * Lustre is a trademark of Sun Microsystems, Inc.
  *
  * Please see comments in libcfs/include/libcfs/libcfs_cpu.h for introduction
  *
diff --git a/net/lnet/libcfs/libcfs_lock.c b/net/lnet/libcfs/libcfs_lock.c
index 313aa95..8af77b1 100644
--- a/net/lnet/libcfs/libcfs_lock.c
+++ b/net/lnet/libcfs/libcfs_lock.c
@@ -21,7 +21,6 @@
  */
 /*
  * This file is part of Lustre, http://www.lustre.org/
- * Lustre is a trademark of Sun Microsystems, Inc.
  *
  * Author: liang@whamcloud.com
  */
diff --git a/net/lnet/libcfs/libcfs_mem.c b/net/lnet/libcfs/libcfs_mem.c
index 6a49d39..f2af90c 100644
--- a/net/lnet/libcfs/libcfs_mem.c
+++ b/net/lnet/libcfs/libcfs_mem.c
@@ -22,7 +22,6 @@
  */
 /*
  * This file is part of Lustre, http://www.lustre.org/
- * Lustre is a trademark of Sun Microsystems, Inc.
  *
  * Author: liang@whamcloud.com
  */
diff --git a/net/lnet/libcfs/libcfs_string.c b/net/lnet/libcfs/libcfs_string.c
index 66a108c..d2460f3 100644
--- a/net/lnet/libcfs/libcfs_string.c
+++ b/net/lnet/libcfs/libcfs_string.c
@@ -28,7 +28,6 @@
  */
 /*
  * This file is part of Lustre, http://www.lustre.org/
- * Lustre is a trademark of Sun Microsystems, Inc.
  *
  * String manipulation functions.
  *
diff --git a/net/lnet/libcfs/module.c b/net/lnet/libcfs/module.c
index 93e9b9e..8059569 100644
--- a/net/lnet/libcfs/module.c
+++ b/net/lnet/libcfs/module.c
@@ -28,7 +28,6 @@
  */
 /*
  * This file is part of Lustre, http://www.lustre.org/
- * Lustre is a trademark of Sun Microsystems, Inc.
  */
 #include <linux/miscdevice.h>
 #include <linux/module.h>
diff --git a/net/lnet/libcfs/tracefile.c b/net/lnet/libcfs/tracefile.c
index b1a2f3e..6321840 100644
--- a/net/lnet/libcfs/tracefile.c
+++ b/net/lnet/libcfs/tracefile.c
@@ -28,7 +28,6 @@
  */
 /*
  * This file is part of Lustre, http://www.lustre.org/
- * Lustre is a trademark of Sun Microsystems, Inc.
  *
  * libcfs/libcfs/tracefile.c
  *
diff --git a/net/lnet/libcfs/tracefile.h b/net/lnet/libcfs/tracefile.h
index 311ec8c..af21e4a 100644
--- a/net/lnet/libcfs/tracefile.h
+++ b/net/lnet/libcfs/tracefile.h
@@ -28,7 +28,6 @@
  */
 /*
  * This file is part of Lustre, http://www.lustre.org/
- * Lustre is a trademark of Sun Microsystems, Inc.
  */
 
 #ifndef __LIBCFS_TRACEFILE_H__
-- 
1.8.3.1

* [lustre-devel] [PATCH 10/14] lnet: Skip discovery in LNetPrimaryNID if DD disabled
  2021-05-04  0:10 [lustre-devel] [PATCH 00/14] Update to OpenSFS tree as of May 3, 2021 James Simmons
                   ` (8 preceding siblings ...)
  2021-05-04  0:10 ` [lustre-devel] [PATCH 09/14] lnet: libcfs: remove references to Sun Trademark James Simmons
@ 2021-05-04  0:10 ` James Simmons
  2021-05-04  0:10 ` [lustre-devel] [PATCH 11/14] lustre: ptlrpc: idle import vs lock enqueue race James Simmons
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: James Simmons @ 2021-05-04  0:10 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Chris Horn, Lustre Development List

From: Chris Horn <chris.horn@hpe.com>

If discovery is disabled locally then the discovery thread will not
modify any peer objects as a result of the discovery process. Thus,
the primary NID of any peer we're asked to discover will not change
as a result of discovery. Therefore, we do not need to actually
perform discovery in LNetPrimaryNID() if discovery is disabled
locally. Since this routine can result in long client mount times
when a Lustre server is down, we should avoid this unnecessary
discovery.
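
The control flow argued for above can be modelled with a small stand-alone sketch. It is an assumption-laden simplification, not the LNet code: struct peer, its fields, local_discovery_disabled and resolve_primary are hypothetical names, and one "round" simply counts down until the peer is considered up to date.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified peer: just the state the loop looks at. */
struct peer {
	bool uptodate;			/* discovery has completed */
	bool discovery_disabled;	/* the peer itself has discovery off */
	int rounds_needed;		/* rounds left until up to date */
};

/* Stand-in for the local "discovery disabled" setting. */
static bool local_discovery_disabled;

/* Returns the number of discovery rounds that were actually run. */
int resolve_primary(struct peer *p)
{
	int rounds = 0;

	/* With discovery off locally, the primary NID cannot change: skip. */
	while (!local_discovery_disabled && !p->uptodate) {
		rounds++;
		if (--p->rounds_needed <= 0)
			p->uptodate = true;

		/* A peer with discovery off gets a single attempt only. */
		if (p->discovery_disabled)
			break;
	}
	return rounds;
}
```

In this model the local setting avoids all rounds, including the first one that could block on an unreachable server, while the per-peer check still bounds the work to one round.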

HPE-bug-id: LUS-9887
WC-bug-id: https://jira.whamcloud.com/browse/LU-14566
Lustre-commit: 16264da9e3c43a63 ("LU-14566 lnet: Skip discovery in LNetPrimaryNID if DD disabled")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/43141
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/lnet/peer.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c
index db00514..d66a302 100644
--- a/net/lnet/lnet/peer.c
+++ b/net/lnet/lnet/peer.c
@@ -1336,7 +1336,13 @@ struct lnet_peer_ni *
 	}
 	lp = lpni->lpni_peer_net->lpn_peer;
 
-	while (!lnet_peer_is_uptodate(lp)) {
+	/* If discovery is disabled locally then we needn't bother running
+	 * discovery here because discovery will not modify whatever
+	 * primary NID is currently set for this peer. If the specified peer is
+	 * down then this discovery can introduce long delays into the mount
+	 * process, so skip it if it isn't necessary.
+	 */
+	while (!lnet_peer_discovery_disabled && !lnet_peer_is_uptodate(lp)) {
 		spin_lock(&lp->lp_lock);
 		/* force a full discovery cycle */
 		lp->lp_state |= LNET_PEER_FORCE_PING | LNET_PEER_FORCE_PUSH;
@@ -1357,7 +1363,11 @@ struct lnet_peer_ni *
 		}
 		lp = lpni->lpni_peer_net->lpn_peer;
 
-		/* Only try once if discovery is disabled */
+		/* If we find that the peer has discovery disabled then we will
+		 * not modify whatever primary NID is currently set for this
+		 * peer. Thus, we can break out of this loop even if the peer
+		 * is not fully up to date.
+		 */
 		if (lnet_is_discovery_disabled(lp))
 			break;
 	}
-- 
1.8.3.1

* [lustre-devel] [PATCH 11/14] lustre: ptlrpc: idle import vs lock enqueue race
  2021-05-04  0:10 [lustre-devel] [PATCH 00/14] Update to OpenSFS tree as of May 3, 2021 James Simmons
                   ` (9 preceding siblings ...)
  2021-05-04  0:10 ` [lustre-devel] [PATCH 10/14] lnet: Skip discovery in LNetPrimaryNID if DD disabled James Simmons
@ 2021-05-04  0:10 ` James Simmons
  2021-05-04  0:10 ` [lustre-devel] [PATCH 12/14] lustre: mdc: make rpc set for MDS_STATFS interruptible James Simmons
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: James Simmons @ 2021-05-04  0:10 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Alexander Boyko, Andriy Skulysh, Alexander Boyko,
	Lustre Development List

From: Alexander Boyko <c17825@cray.com>

There is a window between ptlrpc_check_import_is_idle() and setting
LUSTRE_IMP_CONNECTING during which a lock enqueue can slip in.
The lock gets granted on the OST and is returned to the client, but
the server's lock is then destroyed on OST_DISCONNECT.

Perform the import counters check together with setting
LUSTRE_IMP_CONNECTING, under the same imp_lock.
A regression test, test_812c, was added to sanity.
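
The shape of the fix, checking the idle condition and changing state in one critical section, can be sketched in user space as follows. This is a minimal illustration under stated assumptions, not the ptlrpc code: it uses a pthread mutex in place of the imp_lock spinlock, and struct import, its fields, can_idle and try_idle_disconnect are hypothetical simplifications.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical, reduced set of import states. */
enum imp_state { IMP_FULL, IMP_CONNECTING };

struct import {
	pthread_mutex_t lock;	/* stands in for imp_lock */
	enum imp_state state;
	int reqs;		/* in-flight requests, incl. the disconnect */
	int lock_refs;		/* stands in for the namespace refcount */
};

/* Idle means: only the disconnect request exists and no locks are held. */
static bool can_idle(const struct import *imp)
{
	return imp->reqs <= 1 && imp->lock_refs == 0;
}

/* Returns 1 if the disconnect was started, 0 if the import is not idle. */
int try_idle_disconnect(struct import *imp)
{
	int started = 0;

	pthread_mutex_lock(&imp->lock);
	/* Check and state change share one critical section: no window. */
	if (imp->state == IMP_FULL && can_idle(imp)) {
		imp->state = IMP_CONNECTING;
		started = 1;
	}
	pthread_mutex_unlock(&imp->lock);
	return started;
}
```

Returning a distinct value when the disconnect was really started lets the caller fall back to its normal path otherwise, which mirrors how the pinger in this patch only skips the ping when the function returns 1.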

HPE-bug-id: LUS-8705
WC-bug-id: https://jira.whamcloud.com/browse/LU-14397
Lustre-commit: e6af3c529021976e ("LU-14397 ptlrpc: idle import vs lock enqueue race")
Signed-off-by: Andriy Skulysh <c17819@cray.com>
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-on: https://review.whamcloud.com/41403
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andriy Skulysh <askulysh@gmail.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/lustre_net.h  |  1 +
 fs/lustre/include/obd_class.h   |  2 ++
 fs/lustre/include/obd_support.h |  1 +
 fs/lustre/obdclass/obd_mount.c  |  4 +--
 fs/lustre/osc/osc_lock.c        |  5 ++++
 fs/lustre/ptlrpc/client.c       | 11 +++++++
 fs/lustre/ptlrpc/import.c       | 63 ++++++++++++++++++++++++++++-------------
 fs/lustre/ptlrpc/pinger.c       |  5 ++--
 8 files changed, 69 insertions(+), 23 deletions(-)

diff --git a/fs/lustre/include/lustre_net.h b/fs/lustre/include/lustre_net.h
index e1eb888..f84ee46 100644
--- a/fs/lustre/include/lustre_net.h
+++ b/fs/lustre/include/lustre_net.h
@@ -1904,6 +1904,7 @@ int ptlrpc_request_bufs_pack(struct ptlrpc_request *request,
 			     u32 version, int opcode, char **bufs,
 			     struct ptlrpc_cli_ctx *ctx);
 void ptlrpc_req_finished(struct ptlrpc_request *request);
+void ptlrpc_req_finished_with_imp_lock(struct ptlrpc_request *request);
 struct ptlrpc_request *ptlrpc_request_addref(struct ptlrpc_request *req);
 struct ptlrpc_bulk_desc *ptlrpc_prep_bulk_imp(struct ptlrpc_request *req,
 					      unsigned int nfrags,
diff --git a/fs/lustre/include/obd_class.h b/fs/lustre/include/obd_class.h
index 1d1777a..eb52733 100644
--- a/fs/lustre/include/obd_class.h
+++ b/fs/lustre/include/obd_class.h
@@ -1713,6 +1713,8 @@ struct root_squash_info {
 	spinlock_t		rsi_lock;	/* protects rsi_nosquash_nids */
 };
 
+int server_name2index(const char *svname, u32 *idx, const char **endptr);
+
 /* linux-module.c */
 struct obd_ioctl_data;
 int obd_ioctl_getdata(struct obd_ioctl_data **data, int *len, void __user *arg);
diff --git a/fs/lustre/include/obd_support.h b/fs/lustre/include/obd_support.h
index 878a5cd..4628fab 100644
--- a/fs/lustre/include/obd_support.h
+++ b/fs/lustre/include/obd_support.h
@@ -365,6 +365,7 @@
 #define OBD_FAIL_PTLRPC_BULK_REPLY_ATTACH		0x522
 #define OBD_FAIL_PTLRPC_ROUND_XID			0x530
 #define OBD_FAIL_PTLRPC_CONNECT_RACE			0x531
+#define OBD_FAIL_PTLRPC_IDLE_RACE			0x533
 
 #define OBD_FAIL_OBD_PING_NET				0x600
 /*	OBD_FAIL_OBD_LOG_CANCEL_NET	0x601 obsolete since 1.5 */
diff --git a/fs/lustre/obdclass/obd_mount.c b/fs/lustre/obdclass/obd_mount.c
index fbad459..0a5e338 100644
--- a/fs/lustre/obdclass/obd_mount.c
+++ b/fs/lustre/obdclass/obd_mount.c
@@ -618,8 +618,7 @@ int server_name2fsname(const char *svname, char *fsname,
  * rc < 0  on error
  * if endptr isn't NULL it is set to end of name
  */
-static int server_name2index(const char *svname, u32 *idx,
-			     const char **endptr)
+int server_name2index(const char *svname, u32 *idx, const char **endptr)
 {
 	unsigned long index;
 	int rc;
@@ -658,6 +657,7 @@ static int server_name2index(const char *svname, u32 *idx,
 
 	return rc;
 }
+EXPORT_SYMBOL(server_name2index);
 
 /*************** mount common between server and client ***************/
 
diff --git a/fs/lustre/osc/osc_lock.c b/fs/lustre/osc/osc_lock.c
index de96fc0..e0de371 100644
--- a/fs/lustre/osc/osc_lock.c
+++ b/fs/lustre/osc/osc_lock.c
@@ -1037,6 +1037,11 @@ static int osc_lock_enqueue(const struct lu_env *env,
 		if (osc_lock_is_lockless(oscl)) {
 			oio->oi_lockless = 1;
 		} else if (!async) {
+			if (OBD_FAIL_PRECHECK(OBD_FAIL_PTLRPC_IDLE_RACE)) {
+				OBD_RACE(OBD_FAIL_PTLRPC_IDLE_RACE);
+				set_current_state(TASK_UNINTERRUPTIBLE);
+				schedule_timeout(HZ / 2);
+			}
 			LASSERT(oscl->ols_state == OLS_GRANTED);
 			LASSERT(oscl->ols_hold);
 			LASSERT(oscl->ols_dlmlock);
diff --git a/fs/lustre/ptlrpc/client.c b/fs/lustre/ptlrpc/client.c
index 97f1251..a812b29 100644
--- a/fs/lustre/ptlrpc/client.c
+++ b/fs/lustre/ptlrpc/client.c
@@ -2543,6 +2543,17 @@ static void __ptlrpc_free_req(struct ptlrpc_request *request, int locked)
 		ptlrpc_request_cache_free(request);
 }
 
+static int __ptlrpc_req_finished(struct ptlrpc_request *request, int locked);
+/**
+ * Drop one request reference. Must be called with import imp_lock held.
+ * When reference count drops to zero, request is freed.
+ */
+void ptlrpc_req_finished_with_imp_lock(struct ptlrpc_request *request)
+{
+	assert_spin_locked(&request->rq_import->imp_lock);
+	(void)__ptlrpc_req_finished(request, 1);
+}
+
 /**
  * Helper function
  * Drops one reference count for request @request.
diff --git a/fs/lustre/ptlrpc/import.c b/fs/lustre/ptlrpc/import.c
index 317f28c..1f31edb 100644
--- a/fs/lustre/ptlrpc/import.c
+++ b/fs/lustre/ptlrpc/import.c
@@ -1653,7 +1653,6 @@ static struct ptlrpc_request *ptlrpc_disconnect_prep_req(struct obd_import *imp)
 	req->rq_timeout = min_t(timeout_t, req->rq_timeout,
 				INITIAL_CONNECT_TIMEOUT);
 
-	import_set_state(imp, LUSTRE_IMP_CONNECTING);
 	req->rq_send_state =  LUSTRE_IMP_CONNECTING;
 	ptlrpc_request_set_replen(req);
 
@@ -1701,16 +1700,20 @@ int ptlrpc_disconnect_import(struct obd_import *imp, int noclose)
 				!ptlrpc_import_in_recovery(imp));
 	}
 
-	spin_lock(&imp->imp_lock);
-	if (imp->imp_state != LUSTRE_IMP_FULL)
-		goto out;
-	spin_unlock(&imp->imp_lock);
-
 	req = ptlrpc_disconnect_prep_req(imp);
 	if (IS_ERR(req)) {
 		rc = PTR_ERR(req);
 		goto set_state;
 	}
+
+	spin_lock(&imp->imp_lock);
+	if (imp->imp_state != LUSTRE_IMP_FULL) {
+		ptlrpc_req_finished_with_imp_lock(req);
+		goto out;
+	}
+	import_set_state_nolock(imp, LUSTRE_IMP_CONNECTING);
+	spin_unlock(&imp->imp_lock);
+
 	rc = ptlrpc_queue_wait(req);
 	ptlrpc_req_finished(req);
 
@@ -1794,6 +1797,21 @@ static int ptlrpc_disconnect_idle_interpret(const struct lu_env *env,
 	return 0;
 }
 
+static bool ptlrpc_can_idle(struct obd_import *imp)
+{
+	struct ldlm_namespace *ns = imp->imp_obd->obd_namespace;
+
+	/* one request for disconnect rpc */
+	if (atomic_read(&imp->imp_reqs) > 1)
+		return false;
+
+	/* any lock increases ns_bref being a resource holder */
+	if (ns && atomic_read(&ns->ns_bref) > 0)
+		return false;
+
+	return true;
+}
+
 int ptlrpc_disconnect_and_idle_import(struct obd_import *imp)
 {
 	struct ptlrpc_request *req;
@@ -1804,31 +1822,38 @@ int ptlrpc_disconnect_and_idle_import(struct obd_import *imp)
 	if (ptlrpc_import_in_recovery(imp))
 		return 0;
 
-	spin_lock(&imp->imp_lock);
+	req = ptlrpc_disconnect_prep_req(imp);
+	if (IS_ERR(req))
+		return PTR_ERR(req);
 
-	if (imp->imp_state != LUSTRE_IMP_FULL) {
+	req->rq_interpret_reply = ptlrpc_disconnect_idle_interpret;
+
+	if (OBD_FAIL_PRECHECK(OBD_FAIL_PTLRPC_IDLE_RACE)) {
+		u32 idx;
+
+		server_name2index(imp->imp_obd->obd_name, &idx, NULL);
+		if (idx == 0)
+			OBD_RACE(OBD_FAIL_PTLRPC_IDLE_RACE);
+	}
+
+	spin_lock(&imp->imp_lock);
+	if (imp->imp_state != LUSTRE_IMP_FULL || !ptlrpc_can_idle(imp)) {
+		ptlrpc_req_finished_with_imp_lock(req);
 		spin_unlock(&imp->imp_lock);
 		return 0;
 	}
+	import_set_state_nolock(imp, LUSTRE_IMP_CONNECTING);
+	/* don't make noise at reconnection */
+	imp->imp_was_idle = 1;
 	spin_unlock(&imp->imp_lock);
 
-	req = ptlrpc_disconnect_prep_req(imp);
-	if (IS_ERR(req))
-		return PTR_ERR(req);
-
 	CDEBUG_LIMIT(imp->imp_idle_debug, "%s: disconnect after %llus idle\n",
 		     imp->imp_obd->obd_name,
 		     ktime_get_real_seconds() - imp->imp_last_reply_time);
 
-	/* don't make noise at reconnection */
-	spin_lock(&imp->imp_lock);
-	imp->imp_was_idle = 1;
-	spin_unlock(&imp->imp_lock);
-
-	req->rq_interpret_reply = ptlrpc_disconnect_idle_interpret;
 	ptlrpcd_add_req(req);
 
-	return 0;
+	return 1;
 }
 EXPORT_SYMBOL(ptlrpc_disconnect_and_idle_import);
 
diff --git a/fs/lustre/ptlrpc/pinger.c b/fs/lustre/ptlrpc/pinger.c
index f565982..76a0844 100644
--- a/fs/lustre/ptlrpc/pinger.c
+++ b/fs/lustre/ptlrpc/pinger.c
@@ -127,8 +127,9 @@ static int ptlrpc_ping(struct obd_import *imp)
 {
 	struct ptlrpc_request *req;
 
-	if (ptlrpc_check_import_is_idle(imp))
-		return ptlrpc_disconnect_and_idle_import(imp);
+	if (ptlrpc_check_import_is_idle(imp) &&
+	    ptlrpc_disconnect_and_idle_import(imp) == 1)
+		return 0;
 
 	req = ptlrpc_prep_ping(imp);
 	if (!req) {
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org


* [lustre-devel] [PATCH 12/14] lustre: mdc: make rpc set for MDS_STATFS interruptible
  2021-05-04  0:10 [lustre-devel] [PATCH 00/14] Update to OpenSFS tree as of May 3, 2021 James Simmons
                   ` (10 preceding siblings ...)
  2021-05-04  0:10 ` [lustre-devel] [PATCH 11/14] lustre: ptlrpc: idle import vs lock enqueue race James Simmons
@ 2021-05-04  0:10 ` James Simmons
  2021-05-04  0:10 ` [lustre-devel] [PATCH 13/14] lustre: llite: fake symlink type of foreign file/dir James Simmons
  2021-05-04  0:10 ` [lustre-devel] [PATCH 14/14] lustre: llite: use d_is_symlink to test if dentry is a symlink James Simmons
  13 siblings, 0 replies; 15+ messages in thread
From: James Simmons @ 2021-05-04  0:10 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Alex Zhuravlev <bzzz@whamcloud.com>

Otherwise the request ignores signals, making it impossible to
interrupt the mount process with a signal, which is checked by
conf-sanity/23a.

WC-bug-id: https://jira.whamcloud.com/browse/LU-14344
Lustre-commit: f125ba1f42b9b046 ("LU-14344 mdc: make rpc set for MDS_STATFS interruptible")
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/41282
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/mdc/mdc_request.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/lustre/mdc/mdc_request.c b/fs/lustre/mdc/mdc_request.c
index e6c0df5..7df2c59 100644
--- a/fs/lustre/mdc/mdc_request.c
+++ b/fs/lustre/mdc/mdc_request.c
@@ -1611,6 +1611,7 @@ static int mdc_statfs(const struct lu_env *env,
 		rc = -ENOMEM;
 		goto output;
 	}
+	req->rq_allow_intr = 1;
 
 	if ((flags & OBD_STATFS_SUM) &&
 	    (exp_connect_flags2(exp) & OBD_CONNECT2_SUM_STATFS)) {
-- 
1.8.3.1


* [lustre-devel] [PATCH 13/14] lustre: llite: fake symlink type of foreign file/dir
  2021-05-04  0:10 [lustre-devel] [PATCH 00/14] Update to OpenSFS tree as of May 3, 2021 James Simmons
                   ` (11 preceding siblings ...)
  2021-05-04  0:10 ` [lustre-devel] [PATCH 12/14] lustre: mdc: make rpc set for MDS_STATFS interruptible James Simmons
@ 2021-05-04  0:10 ` James Simmons
  2021-05-04  0:10 ` [lustre-devel] [PATCH 14/14] lustre: llite: use d_is_symlink to test if dentry is a symlink James Simmons
  13 siblings, 0 replies; 15+ messages in thread
From: James Simmons @ 2021-05-04  0:10 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Bruno Faccini <bruno.faccini@intel.com>

This patch implements a "fake symlink" usage of the "foreign"
LOV/LMV format. It allows this particular type of foreign
file/dir to behave as a symlink from the VFS point of view:
a relative path is constructed from the LOV/LMV foreign
content, complemented with a prefix, and then exposed to
the VFS as the symlink destination. The default/internal
mechanism simply takes the full foreign free string as
the relative path. For more complex internal formats, an
upcall has been implemented to provide the format's
details to the llite layer (presently just in terms of
constant strings and substring positions in the EA, but
this can be enhanced).

Using this feature instead of real symlinks or user EAs
makes it possible to benefit from the special features
(lock, prefetch, caches) already implemented to handle
both LOV/LMV EAs.

WC-bug-id: https://jira.whamcloud.com/browse/LU-12682
Lustre-commit: 15d44e787e17ff5 ("LU-12682 llite: fake symlink type of foreign file/dir")
Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Reviewed-on: https://review.whamcloud.com/35856
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Ben Evans <beevans@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
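For readers following the commit message, the default (no upcall)
path construction can be sketched in userspace C roughly as below.
This is only an illustration, not the kernel code: fake_symlink_dest()
and SKETCH_PATH_MAX are made-up names, malloc-family calls stand in for
the kernel allocators, and it assumes the prefix is stored without a
leading slash, as foreign_symlink_alloc_and_copy_prefix() in this patch
suggests.

```c
#include <stdlib.h>
#include <string.h>

/* Assumption: mirrors the usual Linux PATH_MAX; hypothetical name. */
#define SKETCH_PATH_MAX 4096

/*
 * Hypothetical userspace analogue of the default parsing path:
 * build "/<prefix>/<value>" where value is value_len bytes long and
 * need not be NUL-terminated.
 * Returns a heap string the caller must free(), or NULL on error.
 */
char *fake_symlink_dest(const char *prefix, const char *value,
			size_t value_len)
{
	size_t prefix_len = strlen(prefix);
	size_t full_size;
	char *dest;

	/* reject oversized parts first so the sum below cannot wrap */
	if (prefix_len > SKETCH_PATH_MAX || value_len > SKETCH_PATH_MAX)
		return NULL;

	/* '/' + prefix + '/' + value + '\0' */
	full_size = prefix_len + value_len + 3;
	if (full_size > SKETCH_PATH_MAX)
		return NULL;

	dest = calloc(1, full_size);
	if (!dest)
		return NULL;

	dest[0] = '/';
	memcpy(dest + 1, prefix, prefix_len);
	dest[prefix_len + 1] = '/';
	memcpy(dest + prefix_len + 2, value, value_len);
	/* the trailing NUL is already there thanks to calloc() */

	return dest;
}
```

With a prefix of "mnt/ext" and a five-byte value "a/b/c" this should
yield "/mnt/ext/a/b/c". The length check comes before the allocation,
so an oversized value is rejected without the value ever being read.
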
 fs/lustre/llite/Makefile                |   1 +
 fs/lustre/llite/dcache.c                |  14 +-
 fs/lustre/llite/dir.c                   |  11 +
 fs/lustre/llite/file.c                  |  50 ++-
 fs/lustre/llite/foreign_symlink.h       |  49 +++
 fs/lustre/llite/llite_foreign.c         | 284 ++++++++++++
 fs/lustre/llite/llite_foreign_symlink.c | 758 ++++++++++++++++++++++++++++++++
 fs/lustre/llite/llite_internal.h        |  35 +-
 fs/lustre/llite/llite_lib.c             | 105 +++++
 fs/lustre/llite/lproc_llite.c           |  12 +
 fs/lustre/llite/namei.c                 |  33 +-
 fs/lustre/llite/pcc.c                   |   4 +-
 fs/lustre/llite/symlink.c               |   1 +
 fs/lustre/lov/lov_object.c              |   3 +-
 fs/lustre/lov/lov_pack.c                |  13 +-
 include/uapi/linux/lustre/lustre_user.h |  49 ++-
 16 files changed, 1399 insertions(+), 23 deletions(-)
 create mode 100644 fs/lustre/llite/foreign_symlink.h
 create mode 100644 fs/lustre/llite/llite_foreign.c
 create mode 100644 fs/lustre/llite/llite_foreign_symlink.c
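
The upcall-described format mentioned in the commit message (constant
strings plus position/length substrings taken from the EA) can be
sketched in userspace C as follows. All names here (sketch_item,
SKETCH_STRING, SKETCH_POSLEN, SKETCH_EOB, sketch_build_suffix()) are
hypothetical and only loosely modelled on the STRING_TYPE, POSLEN_TYPE
and EOB_TYPE items in the patch. Unlike the patch, this sketch
validates every item in the first pass, before allocating.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical item kinds; the list is terminated by SKETCH_EOB. */
enum sketch_item_type { SKETCH_EOB, SKETCH_STRING, SKETCH_POSLEN };

struct sketch_item {
	enum sketch_item_type type;
	const char *string;	/* SKETCH_STRING: constant text */
	size_t pos, len;	/* SKETCH_POSLEN: substring of value */
};

/*
 * Assemble a relative path from the items and the foreign value.
 * Returns a heap string the caller must free(), or NULL on error.
 */
char *sketch_build_suffix(const struct sketch_item *items,
			  const char *value, size_t value_len)
{
	size_t total = 0, out = 0, i;
	char *buf;

	/* first pass: validate items and compute the output size */
	for (i = 0; items[i].type != SKETCH_EOB; i++) {
		if (items[i].type == SKETCH_STRING) {
			total += strlen(items[i].string);
		} else if (items[i].type == SKETCH_POSLEN) {
			/* substring must lie entirely inside value */
			if (items[i].pos > value_len ||
			    items[i].len > value_len - items[i].pos)
				return NULL;
			total += items[i].len;
		} else {
			return NULL;
		}
	}

	buf = calloc(1, total + 1);
	if (!buf)
		return NULL;

	/* second pass: copy the pieces in order */
	for (i = 0; items[i].type != SKETCH_EOB; i++) {
		if (items[i].type == SKETCH_STRING) {
			size_t n = strlen(items[i].string);

			memcpy(buf + out, items[i].string, n);
			out += n;
		} else {
			memcpy(buf + out, value + items[i].pos,
			       items[i].len);
			out += items[i].len;
		}
	}

	return buf;
}
```

Validating up front means the copy loop has no error path, so no
partially built buffer ever needs to be released.
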

diff --git a/fs/lustre/llite/Makefile b/fs/lustre/llite/Makefile
index 3bad19c..c83d98c 100644
--- a/fs/lustre/llite/Makefile
+++ b/fs/lustre/llite/Makefile
@@ -7,6 +7,7 @@ lustre-y := dcache.o dir.o file.o llite_lib.o llite_nfs.o \
 	    xattr.o xattr_cache.o xattr_security.o \
 	    super25.o statahead.o glimpse.o lcommon_cl.o lcommon_misc.o \
 	    vvp_dev.o vvp_page.o vvp_io.o vvp_object.o \
+	    llite_foreign.o llite_foreign_symlink.o \
 	    lproc_llite.o pcc.o
 
 lustre-$(CONFIG_LUSTRE_FS_POSIX_ACL) += acl.o
diff --git a/fs/lustre/llite/dcache.c b/fs/lustre/llite/dcache.c
index cf6619f..f8b82d6 100644
--- a/fs/lustre/llite/dcache.c
+++ b/fs/lustre/llite/dcache.c
@@ -248,8 +248,18 @@ static int ll_revalidate_dentry(struct dentry *dentry,
 		return 1;
 
 	/* Symlink - always valid as long as the dentry was found */
-	if (dentry->d_inode && S_ISLNK(dentry->d_inode->i_mode))
-		return 1;
+	/* only special case is to prevent ELOOP error from VFS during open
+	 * of a foreign symlink file/dir with O_NOFOLLOW, like it happens for
+	 * real symlinks. This will allow to open foreign symlink file/dir
+	 * for get[dir]stripe/unlock ioctl()s.
+	 */
+	if (dentry->d_inode && dentry->d_inode->i_op->get_link) {
+		if (!S_ISLNK(dentry->d_inode->i_mode) &&
+		    !(lookup_flags & LOOKUP_FOLLOW))
+			return 0;
+		else
+			return 1;
+	}
 
 	/*
 	 * VFS warns us that this is the second go around and previous
diff --git a/fs/lustre/llite/dir.c b/fs/lustre/llite/dir.c
index bf2d9fe..06ca329 100644
--- a/fs/lustre/llite/dir.c
+++ b/fs/lustre/llite/dir.c
@@ -1641,6 +1641,17 @@ static long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 		ptlrpc_req_finished(root_request);
 		return rc;
 	}
+
+	case LL_IOC_UNLOCK_FOREIGN:
+		/* if not a foreign symlink do nothing */
+		if (ll_foreign_is_removable(dentry, true)) {
+			CDEBUG(D_INFO,
+			       "prevent rmdir of non-foreign dir ("DFID")\n",
+			       PFID(ll_inode2fid(inode)));
+			return -EOPNOTSUPP;
+		}
+		return 0;
+
 	case LL_IOC_RMFID:
 		return ll_rmfid(file, (void __user *)arg);
 	case LL_IOC_LOV_SWAP_LAYOUTS:
diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c
index 346e31c..78f3469 100644
--- a/fs/lustre/llite/file.c
+++ b/fs/lustre/llite/file.c
@@ -2342,12 +2342,12 @@ static int ll_lov_setstripe(struct inode *inode, struct file *file,
 		}
 
 		rc = cl_object_layout_get(env, obj, &cl);
-		if (!rc && cl.cl_is_composite)
+		if (rc >= 0 && cl.cl_is_composite)
 			rc = ll_layout_write_intent(inode, LAYOUT_INTENT_WRITE,
 						    &ext);
 
 		cl_env_put(env, &refcheck);
-		if (rc)
+		if (rc < 0)
 			goto out;
 	}
 
@@ -3001,7 +3001,7 @@ int ll_file_lock_ahead(struct file *file, struct llapi_lu_ladvise *ladvise)
 	CDEBUG(D_VFSTRACE,
 	       "Lock request: file=%pd, inode=%p, mode=%s start=%llu, end=%llu\n",
 	       dentry, dentry->d_inode,
-	       user_lockname[ladvise->lla_lockahead_mode], (__u64) start, end);
+	       user_lockname[ladvise->lla_lockahead_mode], (u64) start, end);
 
 	cl_mode = cl_mode_user_to_kernel(ladvise->lla_lockahead_mode);
 	if (cl_mode < 0) {
@@ -4086,6 +4086,20 @@ static int ll_heat_set(struct inode *inode, enum lu_heat_flag flags)
 			return -EOPNOTSUPP;
 		return llcrypt_ioctl_get_key_status(file, (void __user *)arg);
 #endif
+
+	case LL_IOC_UNLOCK_FOREIGN: {
+		struct dentry *dentry = file_dentry(file);
+
+		/* if not a foreign symlink do nothing */
+		if (ll_foreign_is_removable(dentry, true)) {
+			CDEBUG(D_INFO,
+			       "prevent unlink of non-foreign file ("DFID")\n",
+			       PFID(ll_inode2fid(inode)));
+			return -EOPNOTSUPP;
+		}
+		return 0;
+	}
+
 	default:
 		return obd_iocontrol(cmd, ll_i2dtexp(inode), 0, NULL,
 				     (void __user *)arg);
@@ -4842,10 +4856,9 @@ static int ll_merge_md_attr(struct inode *inode)
 	return 0;
 }
 
-int ll_getattr(const struct path *path, struct kstat *stat,
-	       u32 request_mask, unsigned int flags)
+int ll_getattr_dentry(struct dentry *de, struct kstat *stat, u32 request_mask,
+		      unsigned int flags, bool foreign)
 {
-	struct dentry *de = path->dentry;
 	struct inode *inode = d_inode(de);
 	struct ll_sb_info *sbi = ll_i2sbi(inode);
 	struct ll_inode_info *lli = ll_i2info(inode);
@@ -4872,7 +4885,10 @@ int ll_getattr(const struct path *path, struct kstat *stat,
 	if (rc < 0)
 		return rc;
 
-	if (S_ISREG(inode->i_mode)) {
+	/* foreign file/dir are always of zero length, so don't
+	 * need to validate size.
+	 */
+	if (S_ISREG(inode->i_mode) && !foreign) {
 		bool cached;
 
 		if (!need_glimpse)
@@ -4919,7 +4935,8 @@ int ll_getattr(const struct path *path, struct kstat *stat,
 		}
 	} else {
 		/* If object isn't regular a file then don't validate size. */
-		if (ll_dir_striped(inode)) {
+		/* foreign dir is not striped dir */
+		if (ll_dir_striped(inode) && !foreign) {
 			rc = ll_merge_md_attr(inode);
 			if (rc < 0)
 				return rc;
@@ -4948,7 +4965,13 @@ int ll_getattr(const struct path *path, struct kstat *stat,
 		stat->rdev = inode->i_rdev;
 		stat->ino = inode->i_ino;
 	}
-	stat->mode = inode->i_mode;
+
+	/* foreign symlink to be exposed as a real symlink */
+	if (!foreign)
+		stat->mode = inode->i_mode;
+	else
+		stat->mode = (inode->i_mode & ~S_IFMT) | S_IFLNK;
+
 	stat->uid = inode->i_uid;
 	stat->gid = inode->i_gid;
 	stat->atime = inode->i_atime;
@@ -4991,6 +5014,13 @@ int ll_getattr(const struct path *path, struct kstat *stat,
 	return 0;
 }
 
+int ll_getattr(const struct path *path, struct kstat *stat,
+	       u32 request_mask, unsigned int flags)
+{
+	return ll_getattr_dentry(path->dentry, stat, request_mask, flags,
+				 false);
+}
+
 int cl_falloc(struct inode *inode, int mode, loff_t offset, loff_t len)
 {
 	struct lu_env *env;
@@ -5319,7 +5349,7 @@ int ll_layout_conf(struct inode *inode, const struct cl_object_conf *conf)
 	}
 out:
 	cl_env_put(env, &refcheck);
-	return rc;
+	return rc < 0 ? rc : 0;
 }
 
 /* Fetch layout from MDT with getxattr request, if it's not ready yet */
diff --git a/fs/lustre/llite/foreign_symlink.h b/fs/lustre/llite/foreign_symlink.h
new file mode 100644
index 0000000..05e30b4
--- /dev/null
+++ b/fs/lustre/llite/foreign_symlink.h
@@ -0,0 +1,49 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * GPL HEADER START
+ *
+ * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 only,
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License version 2 for more details (a copy is included
+ * in the LICENSE file that accompanied this code).
+ *
+ * You should have received a copy of the GNU General Public License
+ * version 2 along with this program; If not, see
+ * http://www.gnu.org/licenses/gpl-2.0.html
+ *
+ * GPL HEADER END
+ */
+
+#ifndef LLITE_FOREIGN_SYMLINK_H
+#define LLITE_FOREIGN_SYMLINK_H
+
+/* llite/llite_foreign_symlink.c */
+ssize_t foreign_symlink_enable_show(struct kobject *kobj,
+				    struct attribute *attr, char *buf);
+ssize_t foreign_symlink_enable_store(struct kobject *kobj,
+				     struct attribute *attr,
+				     const char *buffer, size_t count);
+ssize_t foreign_symlink_prefix_show(struct kobject *kobj,
+				    struct attribute *attr, char *buf);
+ssize_t foreign_symlink_prefix_store(struct kobject *kobj,
+				     struct attribute *attr,
+				     const char *buffer, size_t count);
+ssize_t foreign_symlink_upcall_show(struct kobject *kobj,
+				    struct attribute *attr, char *buf);
+ssize_t foreign_symlink_upcall_store(struct kobject *kobj,
+				     struct attribute *attr,
+				     const char *buffer, size_t count);
+ssize_t foreign_symlink_upcall_info_store(struct kobject *kobj,
+				     struct attribute *attr,
+				     const char *buffer, size_t count);
+extern const struct inode_operations ll_foreign_file_symlink_inode_operations;
+extern const struct inode_operations ll_foreign_dir_symlink_inode_operations;
+
+#endif /* LLITE_FOREIGN_SYMLINK_H */
diff --git a/fs/lustre/llite/llite_foreign.c b/fs/lustre/llite/llite_foreign.c
new file mode 100644
index 0000000..ed958f5
--- /dev/null
+++ b/fs/lustre/llite/llite_foreign.c
@@ -0,0 +1,284 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * GPL HEADER START
+ *
+ * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 only,
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License version 2 for more details (a copy is included
+ * in the LICENSE file that accompanied this code).
+ *
+ * You should have received a copy of the GNU General Public License
+ * version 2 along with this program; If not, see
+ * http://www.gnu.org/licenses/gpl-2.0.html
+ *
+ * GPL HEADER END
+ */
+/*
+ * Copyright (c) 2020 Intel Corporation.
+ */
+#define DEBUG_SUBSYSTEM S_LLITE
+
+#include "llite_internal.h"
+
+static void ll_manage_foreign_file(struct inode *inode,
+				   struct lov_foreign_md *lfm)
+{
+	struct ll_sb_info *sbi = ll_i2sbi(inode);
+
+	if (le32_to_cpu(lfm->lfm_type) == LU_FOREIGN_TYPE_SYMLINK) {
+		CDEBUG(D_INFO,
+		       "%s: inode %p of fid "DFID": Foreign file of type symlink, faking a symlink\n",
+		       sbi->ll_fsname, inode, PFID(ll_inode2fid(inode)));
+		/* change inode_operations to add symlink methods, and clear
+		 * IOP_NOFOLLOW to ensure file will be treated as a symlink
+		 * by the kernel (see d_flags_for_inode()).
+		 */
+		inode->i_op = &ll_foreign_file_symlink_inode_operations;
+		inode->i_opflags &= ~IOP_NOFOLLOW;
+	} else {
+		CDEBUG(D_INFO,
+		       "%s: inode %p of fid "DFID": Foreign file of type %ux, nothing special to do\n",
+		       sbi->ll_fsname, inode, PFID(ll_inode2fid(inode)),
+		       le32_to_cpu(lfm->lfm_type));
+	}
+}
+
+static void ll_manage_foreign_dir(struct inode *inode,
+				  struct lmv_foreign_md *lfm)
+{
+	struct ll_sb_info *sbi = ll_i2sbi(inode);
+
+	if (lfm->lfm_type == LU_FOREIGN_TYPE_SYMLINK) {
+		CDEBUG(D_INFO,
+		       "%s: inode %p of fid "DFID": Foreign dir of type symlink, faking a symlink\n",
+		       sbi->ll_fsname, inode, PFID(ll_inode2fid(inode)));
+		/* change inode_operations to add symlink methods
+		 * IOP_NOFOLLOW should not be set for dirs
+		 */
+		inode->i_op = &ll_foreign_dir_symlink_inode_operations;
+	} else {
+		CDEBUG(D_INFO,
+		       "%s: inode %p of fid "DFID": Foreign dir of type %ux, nothing special to do\n",
+		       sbi->ll_fsname, inode, PFID(ll_inode2fid(inode)),
+		       le32_to_cpu(lfm->lfm_type));
+	}
+}
+
+int ll_manage_foreign(struct inode *inode, struct lustre_md *lmd)
+{
+	int rc = 0;
+
+	/* apply any foreign file/dir policy */
+	if (S_ISREG((inode)->i_mode)) {
+		struct ll_inode_info *lli = ll_i2info(inode);
+		struct cl_object *obj = lli->lli_clob;
+
+		if (lmd->layout.lb_buf && lmd->layout.lb_len != 0) {
+			struct lov_foreign_md *lfm = lmd->layout.lb_buf;
+
+			if (lfm->lfm_magic == LOV_MAGIC_FOREIGN)
+				ll_manage_foreign_file(inode, lfm);
+			goto out;
+		}
+
+		if (obj) {
+			struct lov_foreign_md lfm = {
+				.lfm_magic = LOV_MAGIC,
+			};
+			struct cl_layout cl = {
+				.cl_buf.lb_buf = &lfm,
+				.cl_buf.lb_len = sizeof(lfm),
+			};
+			struct lu_env *env;
+			u16 refcheck;
+
+			env = cl_env_get(&refcheck);
+			if (IS_ERR(env)) {
+				rc = PTR_ERR(env);
+				goto out;
+			}
+			rc = cl_object_layout_get(env, obj, &cl);
+			/* error is likely to be -ERANGE because of the small
+			 * buffer we use, only the content is significant here
+			 */
+			if (rc < 0 && rc != -ERANGE) {
+				cl_env_put(env, &refcheck);
+				goto out;
+			}
+			if (lfm.lfm_magic == LOV_MAGIC_FOREIGN)
+				ll_manage_foreign_file(inode, &lfm);
+			cl_env_put(env, &refcheck);
+		}
+	} else if (S_ISDIR((inode)->i_mode)) {
+		if (lmd->lfm &&
+		    lmd->lfm->lfm_magic == LMV_MAGIC_FOREIGN) {
+			ll_manage_foreign_dir(inode, lmd->lfm);
+		} else {
+			struct ll_inode_info *lli = ll_i2info(inode);
+			struct lmv_foreign_md *lfm;
+
+			down_read(&lli->lli_lsm_sem);
+			lfm = (struct lmv_foreign_md *)(lli->lli_lsm_md);
+			if (lfm &&  lfm->lfm_magic == LMV_MAGIC_FOREIGN)
+				ll_manage_foreign_dir(inode, lfm);
+			up_read(&lli->lli_lsm_sem);
+		}
+	}
+out:
+	return rc;
+}
+
+/* dentry must be spliced to inode (dentry->d_inode != NULL) !!! */
+bool ll_foreign_is_openable(struct dentry *dentry, unsigned int flags)
+{
+	/* check for faked symlink here as they should not be opened (unless
+	 * O_NOFOLLOW!) and thus wants ll_atomic_open() to return 1 from
+	 * finish_no_open() in order to get follow_link() to be called in both
+	 * path_lookupat() and path_openat().
+	 * This will not break regular symlink handling as they have
+	 * been treated/filtered upstream.
+	 */
+	if (d_is_symlink(dentry) && !S_ISLNK(dentry->d_inode->i_mode) &&
+	    !(flags & O_NOFOLLOW))
+		return false;
+
+	return true;
+}
+
+static bool should_preserve_foreign_file(struct lov_foreign_md *lfm,
+					 struct ll_inode_info *lli, bool unset)
+{
+	/* for now, only avoid foreign fake symlink file removal */
+
+	if (unset)
+		if (lfm->lfm_type == LU_FOREIGN_TYPE_SYMLINK) {
+			set_bit(LLIF_FOREIGN_REMOVABLE, &lli->lli_flags);
+			return true;
+		} else {
+			return false;
+		}
+	else
+		return lfm->lfm_type == LU_FOREIGN_TYPE_SYMLINK &&
+		       !test_bit(LLIF_FOREIGN_REMOVABLE, &lli->lli_flags);
+}
+
+static bool should_preserve_foreign_dir(struct lmv_foreign_md *lfm,
+					struct ll_inode_info *lli, bool unset)
+{
+	/* for now, only avoid foreign fake symlink dir removal */
+
+	if (unset)
+		if (lfm->lfm_type == LU_FOREIGN_TYPE_SYMLINK) {
+			set_bit(LLIF_FOREIGN_REMOVABLE, &lli->lli_flags);
+			return true;
+		} else {
+			return false;
+		}
+	else
+		return lfm->lfm_type == LU_FOREIGN_TYPE_SYMLINK &&
+		       !test_bit(LLIF_FOREIGN_REMOVABLE, &lli->lli_flags);
+}
+
+/* XXX
+ * instead of fetching type from foreign LOV/LMV, we may simply
+ * check (d_is_symlink(dentry) && !S_ISLNK(dentry->d_inode->i_mode))
+ * to identify a fake symlink
+ */
+bool ll_foreign_is_removable(struct dentry *dentry, bool unset)
+{
+	struct inode *inode = dentry->d_inode;
+	struct qstr *name = &dentry->d_name;
+	bool preserve_foreign = false;
+	int rc = 0;
+
+	if (!inode)
+		return 0;
+
+	/* some foreign types may not be allowed to be unlinked in order to
+	 * keep references with external objects
+	 */
+	if (S_ISREG(inode->i_mode)) {
+		struct ll_inode_info *lli = ll_i2info(inode);
+		struct cl_object *obj = lli->lli_clob;
+
+		if (obj) {
+			struct lov_foreign_md lfm = {
+				.lfm_magic = LOV_MAGIC,
+			};
+			struct cl_layout cl = {
+				.cl_buf.lb_buf = &lfm,
+				.cl_buf.lb_len = sizeof(lfm),
+			};
+			struct lu_env *env;
+			u16 refcheck;
+
+			env = cl_env_get(&refcheck);
+			if (IS_ERR(env)) {
+				rc = PTR_ERR(env);
+				goto out;
+			}
+			rc = cl_object_layout_get(env, obj, &cl);
+			/* error is likely to be -ERANGE because of the small
+			 * buffer we use, only the content is significant here
+			 */
+			if (rc < 0 && rc != -ERANGE) {
+				cl_env_put(env, &refcheck);
+				goto out;
+			} else {
+				rc = 0;
+			}
+			if (lfm.lfm_magic == LOV_MAGIC_FOREIGN)
+				preserve_foreign =
+					should_preserve_foreign_file(&lfm, lli,
+								     unset);
+			cl_env_put(env, &refcheck);
+			if (preserve_foreign) {
+				CDEBUG(D_INFO,
+				       "%s unlink of foreign file (%.*s, "DFID")\n",
+				       unset ? "allow" : "prevent",
+				       name->len, name->name,
+				       PFID(ll_inode2fid(inode)));
+				return false;
+			}
+		} else {
+			CDEBUG(D_INFO,
+			       "unable to check if file (%.*s, "DFID") is foreign...\n",
+			       name->len, name->name,
+			       PFID(ll_inode2fid(inode)));
+			/* XXX should we prevent removal ?? */
+		}
+	} else if (S_ISDIR(inode->i_mode)) {
+		struct ll_inode_info *lli = ll_i2info(inode);
+		struct lmv_foreign_md *lfm;
+
+		down_read(&lli->lli_lsm_sem);
+		lfm = (struct lmv_foreign_md *)(lli->lli_lsm_md);
+		if (!lfm)
+			CDEBUG(D_INFO,
+			       "unable to check if dir (%.*s, "DFID") is foreign...\n",
+			       name->len, name->name,
+			       PFID(ll_inode2fid(inode)));
+		else if (lfm->lfm_magic == LMV_MAGIC_FOREIGN)
+			preserve_foreign = should_preserve_foreign_dir(lfm, lli,
+								       unset);
+		up_read(&lli->lli_lsm_sem);
+		if (preserve_foreign) {
+			CDEBUG(D_INFO,
+			       "%s unlink of foreign dir (%.*s, "DFID")\n",
+			       unset ? "allow" : "prevent",
+			       name->len, name->name,
+			       PFID(ll_inode2fid(inode)));
+			return false;
+		}
+	}
+
+out:
+	return true;
+}
diff --git a/fs/lustre/llite/llite_foreign_symlink.c b/fs/lustre/llite/llite_foreign_symlink.c
new file mode 100644
index 0000000..7ba33f4
--- /dev/null
+++ b/fs/lustre/llite/llite_foreign_symlink.c
@@ -0,0 +1,758 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * GPL HEADER START
+ *
+ * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 only,
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License version 2 for more details (a copy is included
+ * in the LICENSE file that accompanied this code).
+ *
+ * You should have received a copy of the GNU General Public License
+ * version 2 along with this program; If not, see
+ * http://www.gnu.org/licenses/gpl-2.0.html
+ *
+ * GPL HEADER END
+ */
+/*
+ * Copyright (c) 2020 Intel Corporation.
+ */
+/*
+ * Foreign symlink implementation.
+ *
+ * Methods in this source file allow to construct a relative path from the
+ * LOV/LMV foreign content, to complement it with a prefix, and then to
+ * expose it to the VFS as a symlink destination.
+ * The default/internal mechanism simply takes the full foreign free string
+ * as the relative path, and for more complex internal formats an upcall has
+ * been implemented to provide format's details (presently just in terms of
+ * constant strings and substrings positions in EA, but this can be enhanced)
+ * to llite layer.
+ */
+
+#include <linux/fs.h>
+#include <linux/fs_struct.h>
+#include <linux/mm.h>
+#include <linux/stat.h>
+#include <linux/version.h>
+#define DEBUG_SUBSYSTEM S_LLITE
+
+#include "llite_internal.h"
+
+/* allocate space for "/<prefix>/<suffix>'\0'" and copy prefix in,
+ * returns start position for suffix in *destname
+ * must be called with ll_foreign_symlink_sem locked for read, to
+ * protect against sbi->ll_foreign_symlink_prefix change
+ * on output, provides position where to start prefix complement
+ */
+static int foreign_symlink_alloc_and_copy_prefix(struct ll_sb_info *sbi,
+						 struct inode *inode,
+						 char **destname,
+						 size_t suffix_size)
+{
+	size_t prefix_size, full_size;
+
+	/* allocate enough for "/<prefix>/<suffix>'\0'" */
+	prefix_size = sbi->ll_foreign_symlink_prefix_size - 1;
+	full_size = suffix_size + prefix_size + 3;
+	if (full_size > PATH_MAX) {
+		CERROR("%s: inode "DFID": resolved destination path too long\n",
+		       sbi->ll_fsname, PFID(ll_inode2fid(inode)));
+		return -EINVAL;
+	}
+	*destname = kzalloc(full_size, GFP_KERNEL);
+	if (!*destname)
+		return -ENOMEM;
+
+	memcpy(*destname + 1, sbi->ll_foreign_symlink_prefix,
+	       prefix_size);
+	(*destname)[0] = '/';
+	(*destname)[prefix_size + 1] = '/';
+
+	return prefix_size + 2;
+}
+
+/* if no upcall registered, default foreign symlink parsing method
+ * is to use the full lfm_value as a relative path to complement
+ * foreign_prefix
+ */
+static int ll_foreign_symlink_default_parse(struct ll_sb_info *sbi,
+					    struct inode *inode,
+					    struct lov_foreign_md *lfm,
+					    char **destname)
+{
+	int suffix_pos;
+
+	down_read(&sbi->ll_foreign_symlink_sem);
+	suffix_pos = foreign_symlink_alloc_and_copy_prefix(sbi, inode,
+							   destname,
+							   lfm->lfm_length);
+	up_read(&sbi->ll_foreign_symlink_sem);
+
+	if (suffix_pos < 0)
+		return suffix_pos;
+
+	memcpy(*destname + suffix_pos, lfm->lfm_value,
+	       lfm->lfm_length);
+	(*destname)[suffix_pos + lfm->lfm_length] = '\0';
+
+	return 0;
+}
+
+/* if an upcall has been registered, foreign symlink will be
+ * constructed as per upcall provided format
+ * presently we only support a series of constant strings and sub-strings
+ * to be taken from lfm_value content
+ */
+static int ll_foreign_symlink_upcall_parse(struct ll_sb_info *sbi,
+					   struct inode *inode,
+					   struct lov_foreign_md *lfm,
+					   char **destname)
+{
+	int pos = 0, suffix_pos = -1, items_size = 0;
+	struct ll_foreign_symlink_upcall_item *foreign_symlink_items =
+			sbi->ll_foreign_symlink_upcall_items;
+	int i = 0, rc = 0;
+
+	down_read(&sbi->ll_foreign_symlink_sem);
+
+	/* compute size of relative path of destination path
+	 * could be done once during upcall items/infos reading
+	 * and stored as new ll_sb_info field
+	 */
+	for (i = 0; i < sbi->ll_foreign_symlink_upcall_nb_items; i++) {
+		switch (foreign_symlink_items[i].type) {
+		case STRING_TYPE:
+			items_size += foreign_symlink_items[i].size;
+			break;
+		case POSLEN_TYPE:
+			items_size += foreign_symlink_items[i].len;
+			break;
+		case EOB_TYPE:
+			/* should be the last item */
+			break;
+		default:
+			CERROR("%s: unexpected type '%u' found in items\n",
+			       sbi->ll_fsname, foreign_symlink_items[i].type);
+			rc = -EINVAL;
+			goto failed;
+		}
+	}
+
+	suffix_pos = foreign_symlink_alloc_and_copy_prefix(sbi, inode, destname,
+							   items_size);
+	if (suffix_pos < 0) {
+		rc = suffix_pos;
+		goto failed;
+	}
+
+	/* rescan foreign_symlink_items[] to create faked symlink dest path */
+	i = 0;
+	while (foreign_symlink_items[i].type != EOB_TYPE) {
+		if (foreign_symlink_items[i].type == STRING_TYPE) {
+			memcpy(*destname + suffix_pos + pos,
+			       foreign_symlink_items[i].string,
+			       foreign_symlink_items[i].size);
+			pos += foreign_symlink_items[i].size;
+		} else if (foreign_symlink_items[i].type == POSLEN_TYPE) {
+			if (lfm->lfm_length < foreign_symlink_items[i].pos +
+					      foreign_symlink_items[i].len) {
+				CERROR("%s:  "DFID" foreign EA too short to find (%u,%u) item\n",
+				       sbi->ll_fsname,
+				       PFID(ll_inode2fid(inode)),
+				       foreign_symlink_items[i].pos,
+				       foreign_symlink_items[i].len);
+				rc = -EINVAL;
+				goto failed;
+			}
+			memcpy(*destname + suffix_pos + pos,
+			       lfm->lfm_value + foreign_symlink_items[i].pos,
+			       foreign_symlink_items[i].len);
+			pos += foreign_symlink_items[i].len;
+		} else {
+			CERROR("%s: unexpected type '%u' found in items\n",
+			       sbi->ll_fsname, foreign_symlink_items[i].type);
+			rc = -EINVAL;
+			goto failed;
+		}
+		i++;
+	}
+failed:
+	up_read(&sbi->ll_foreign_symlink_sem);
+
+	if (rc != 0 && suffix_pos >= 0) {
+		kvfree(*destname);
+		*destname = NULL;
+	}
+
+	return rc;
+}
+
+static int ll_foreign_symlink_parse(struct ll_sb_info *sbi,
+				    struct inode *inode,
+				    struct lov_foreign_md *lfm,
+				    char **destname)
+{
+	int rc;
+
+	/* if no user-land upcall registered, assuming whole free field
+	 * of foreign LOV is relative path of faked symlink destination,
+	 * to be completed by prefix
+	 */
+	if (!(sbi->ll_flags & LL_SBI_FOREIGN_SYMLINK_UPCALL))
+		rc = ll_foreign_symlink_default_parse(sbi, inode, lfm,
+						      destname);
+	else /* upcall is available */
+		rc = ll_foreign_symlink_upcall_parse(sbi, inode, lfm,
+						     destname);
+	return rc;
+}
+
+/* Don't need lli_size_mutex locked as LOV/LMV are EAs
+ * and should not be stored in data blocks
+ */
+static int ll_foreign_readlink_internal(struct inode *inode, char **symname)
+{
+	struct ll_inode_info *lli = ll_i2info(inode);
+	struct ll_sb_info *sbi = ll_i2sbi(inode);
+	struct lov_foreign_md *lfm = NULL;
+	char *destname = NULL;
+	size_t lfm_size = 0;
+	int rc;
+
+	if (S_ISREG(inode->i_mode)) {
+		struct cl_object *obj = lli->lli_clob;
+		struct cl_layout cl = {
+			.cl_buf.lb_len = 0, /* to get real size */
+		};
+		struct lu_env *env;
+		u16 refcheck;
+
+		if (!obj) {
+			CERROR("%s: inode "DFID": can not get layout, no cl_object\n",
+			       sbi->ll_fsname, PFID(ll_inode2fid(inode)));
+			rc = -EINVAL;
+			goto failed;
+		}
+
+		env = cl_env_get(&refcheck);
+		if (IS_ERR(env))
+			return PTR_ERR(env);
+		/* get layout size */
+		rc = cl_object_layout_get(env, obj, &cl);
+		if (rc <= 0) {
+			CERROR("%s: inode "DFID": error trying to get layout size : %d\n",
+			       sbi->ll_fsname, PFID(ll_inode2fid(inode)), rc);
+			cl_env_put(env, &refcheck);
+			return rc;
+		}
+		lfm = kzalloc(rc, GFP_KERNEL);
+		if (!lfm) {
+			CERROR("%s: inode "DFID": can not allocate enough mem to get layout\n",
+			       sbi->ll_fsname, PFID(ll_inode2fid(inode)));
+			cl_env_put(env, &refcheck);
+			return -ENOMEM;
+		}
+		cl.cl_buf.lb_len = rc;
+		cl.cl_buf.lb_buf = lfm;
+		/* get layout */
+		rc = cl_object_layout_get(env, obj, &cl);
+		if (rc <= 0) {
+			CERROR("%s: inode "DFID": error trying to get layout : %d\n",
+			       sbi->ll_fsname, PFID(ll_inode2fid(inode)), rc);
+			kfree(lfm);
+			cl_env_put(env, &refcheck);
+			return rc;
+		}
+		lfm_size = cl.cl_buf.lb_len;
+		cl_env_put(env, &refcheck);
+	} else if (S_ISDIR(inode->i_mode)) {
+		down_read(&lli->lli_lsm_sem);
+
+		/* should be cast to lmv_foreign_md, but it is ok as both foreign LOV
+		 * and LMV formats are identical, and then we also only need
+		 * one set of parsing routines for both foreign files and dirs!
+		 */
+		lfm = (struct lov_foreign_md *)(lli->lli_lsm_md);
+		if (lfm) {
+			CDEBUG(D_INFO, "%s: inode "DFID": LMV cached found\n",
+			       sbi->ll_fsname, PFID(ll_inode2fid(inode)));
+		} else {
+			CERROR("%s: inode "DFID": cannot get layout, no LMV cached\n",
+			       sbi->ll_fsname, PFID(ll_inode2fid(inode)));
+			rc = -EINVAL;
+			goto failed;
+		}
+	} else {
+		CERROR("%s: inode "DFID": not a regular file nor directory\n",
+		       sbi->ll_fsname, PFID(ll_inode2fid(inode)));
+		rc = -EINVAL;
+		goto failed;
+	}
+
+	/* XXX no assert nor double check of magic, length and type ? */
+
+	rc = ll_foreign_symlink_parse(sbi, inode, lfm, &destname);
+failed:
+	if (S_ISDIR(inode->i_mode))
+		up_read(&lli->lli_lsm_sem);
+
+	if (S_ISREG(inode->i_mode) && lfm)
+		kfree(lfm);
+
+	if (!rc) {
+		*symname = destname;
+		CDEBUG(D_INFO,
+		       "%s: inode "DFID": faking symlink to dest '%s'\n",
+		       sbi->ll_fsname, PFID(ll_inode2fid(inode)), destname);
+	}
+
+	return rc;
+}
+
+static void ll_foreign_put_link(void *cookie)
+{
+	/* The symlink path can be built from the foreign LOV/LMV in several
+	 * ways, so rather than allocating an unnecessarily big buffer its
+	 * size is not fixed in advance. The size is thus only known as
+	 * strlen(cookie) + 1, which is what memory leak check code would
+	 * need to avoid reporting a false positive.
+	 */
+	kvfree(cookie);
+}
+
+static const char *ll_foreign_get_link(struct dentry *dentry,
+				       struct inode *inode,
+				       struct delayed_call *done)
+{
+	char *symname = NULL;
+	int rc;
+
+	CDEBUG(D_VFSTRACE, "VFS Op\n");
+	if (!dentry)
+		return ERR_PTR(-ECHILD);
+	rc = ll_foreign_readlink_internal(inode, &symname);
+
+	/*
+	 * symname must be freed when we are done
+	 *
+	 * XXX we may avoid the need to do so if we use
+	 * lli_symlink_name cache to retain symname and
+	 * let ll_clear_inode free it...
+	 */
+	set_delayed_call(done, ll_foreign_put_link, symname);
+	return rc ? ERR_PTR(rc) : symname;
+}
+
+/*
+ * Should only be called for an already in-use/cached foreign dir inode
+ * when foreign fake-symlink behaviour has been enabled afterward
+ */
+static struct dentry *ll_foreign_dir_lookup(struct inode *parent,
+					 struct dentry *dentry,
+					 unsigned int flags)
+{
+	CDEBUG(D_VFSTRACE, "VFS Op:name=%.*s, dir="DFID"(%p)\n",
+	       dentry->d_name.len, dentry->d_name.name,
+	       PFID(ll_inode2fid(parent)), parent);
+
+	return ERR_PTR(-ENODATA);
+}
+
+static bool has_same_mount_namespace(struct ll_sb_info *sbi)
+{
+	int rc;
+
+	rc = (sbi->ll_mnt.mnt == current->fs->root.mnt);
+	if (!rc)
+		LCONSOLE_WARN("%s: client mount %s and '%s.%d' not in same mnt-namespace\n",
+			      sbi->ll_fsname, sbi->ll_kset.kobj.name,
+			      current->comm, current->pid);
+
+	return rc;
+}
+
+ssize_t foreign_symlink_enable_show(struct kobject *kobj,
+				    struct attribute *attr, char *buf)
+{
+	struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info,
+					      ll_kset.kobj);
+
+	return snprintf(buf, PAGE_SIZE, "%d\n",
+			!!(sbi->ll_flags & LL_SBI_FOREIGN_SYMLINK));
+}
+
+/*
+ * XXX
+ * Already in-use/cached inodes of foreign files/dirs will not be (or
+ * will continue to be) handled as fake symlinks, depending on whether the
+ * feature is being enabled or disabled, until they are revalidated.
+ * Also, does it require sbi->ll_lock protection ?
+ */
+ssize_t foreign_symlink_enable_store(struct kobject *kobj,
+				     struct attribute *attr,
+				     const char *buffer, size_t count)
+{
+	struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info,
+					      ll_kset.kobj);
+	unsigned int val;
+	int rc;
+
+	if (!has_same_mount_namespace(sbi))
+		return -EINVAL;
+
+	rc = kstrtouint(buffer, 10, &val);
+	if (rc)
+		return rc;
+
+	if (val)
+		sbi->ll_flags |= LL_SBI_FOREIGN_SYMLINK;
+	else
+		sbi->ll_flags &= ~LL_SBI_FOREIGN_SYMLINK;
+
+	return count;
+}
+
+ssize_t foreign_symlink_prefix_show(struct kobject *kobj,
+				    struct attribute *attr, char *buf)
+{
+	struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info,
+					      ll_kset.kobj);
+	ssize_t size;
+
+	down_read(&sbi->ll_foreign_symlink_sem);
+	size = snprintf(buf, PAGE_SIZE, "%s\n", sbi->ll_foreign_symlink_prefix);
+	up_read(&sbi->ll_foreign_symlink_sem);
+
+	return size;
+}
+
+ssize_t foreign_symlink_prefix_store(struct kobject *kobj,
+				     struct attribute *attr,
+				     const char *buffer, size_t count)
+{
+	struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info,
+					      ll_kset.kobj);
+	char *new, *old;
+	size_t new_len, old_len;
+
+	if (!has_same_mount_namespace(sbi))
+		return -EINVAL;
+
+	/* XXX strip buffer of any CR/LF,space,... ?? */
+
+	/* check buffer looks like a valid absolute path */
+	if (*buffer != '/') {
+		CERROR("foreign symlink prefix must be an absolute path\n");
+		return -EINVAL;
+	}
+	new_len = strnlen(buffer, count);
+	if (new_len < count)
+		CDEBUG(D_INFO, "NUL byte found in %zu bytes\n", count);
+	if (new_len > PATH_MAX) {
+		CERROR("%s: foreign symlink prefix length %zu > PATH_MAX\n",
+		       sbi->ll_fsname, new_len);
+		return -EINVAL;
+	}
+	new = kzalloc(new_len + 1, GFP_KERNEL);
+	if (!new) {
+		CERROR("%s: can not allocate space for foreign path prefix\n",
+		       sbi->ll_fsname);
+		return -ENOSPC;
+	}
+
+	down_write(&sbi->ll_foreign_symlink_sem);
+	old_len = sbi->ll_foreign_symlink_prefix_size;
+	old = sbi->ll_foreign_symlink_prefix;
+	memcpy(new, buffer, new_len);
+	*(new + new_len) = '\0';
+
+	sbi->ll_foreign_symlink_prefix = new;
+	sbi->ll_foreign_symlink_prefix_size = new_len + 1;
+	up_write(&sbi->ll_foreign_symlink_sem);
+
+	kfree(old);
+
+	return new_len;
+}
+
+ssize_t foreign_symlink_upcall_show(struct kobject *kobj,
+				    struct attribute *attr, char *buf)
+{
+	ssize_t size;
+	struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info,
+					      ll_kset.kobj);
+
+	down_read(&sbi->ll_foreign_symlink_sem);
+	size = snprintf(buf, PAGE_SIZE, "%s\n", sbi->ll_foreign_symlink_upcall);
+	up_read(&sbi->ll_foreign_symlink_sem);
+
+	return size;
+}
+
+ssize_t foreign_symlink_upcall_store(struct kobject *kobj,
+				     struct attribute *attr,
+				     const char *buffer, size_t count)
+{
+	struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info,
+					      ll_kset.kobj);
+	char *old = NULL, *new = NULL;
+	size_t new_len;
+
+	if (!has_same_mount_namespace(sbi))
+		return -EINVAL;
+
+	/* XXX strip buffer of any CR/LF,space,... ?? */
+
+	/* check buffer looks like a valid absolute path */
+	if (*buffer != '/' && strcmp(buffer, "none")) {
+		CERROR("foreign symlink upcall must be an absolute path\n");
+		return -EINVAL;
+	}
+	new_len = strnlen(buffer, count);
+	if (new_len < count)
+		CDEBUG(D_INFO, "NUL byte found in %zu bytes\n", count);
+	if (new_len > PATH_MAX) {
+		CERROR("%s: foreign symlink upcall path length %zu > PATH_MAX\n",
+		       sbi->ll_fsname, new_len);
+		return -EINVAL;
+	}
+
+	new = kzalloc(new_len + 1, GFP_KERNEL);
+	if (!new) {
+		CERROR("%s: can not allocate space for foreign symlink upcall path\n",
+		       sbi->ll_fsname);
+		return -ENOSPC;
+	}
+	memcpy(new, buffer, new_len);
+	*(new + new_len) = '\0';
+
+	down_write(&sbi->ll_foreign_symlink_sem);
+	old = sbi->ll_foreign_symlink_upcall;
+
+	sbi->ll_foreign_symlink_upcall = new;
+	/* LL_SBI_FOREIGN_SYMLINK_UPCALL will be set by
+	 * foreign_symlink_upcall_info_store() upon valid infos being provided
+	 * by upcall
+	 * XXX there is a potential race if there are multiple concurrent
+	 * attempts to set upcall path and executions occur in a different
+	 * order, we may end up using the format provided by a different
+	 * upcall than the one set in ll_foreign_symlink_upcall
+	 */
+	sbi->ll_flags &= ~LL_SBI_FOREIGN_SYMLINK_UPCALL;
+	up_write(&sbi->ll_foreign_symlink_sem);
+
+	if (strcmp(new, "none")) {
+		char *argv[] = {
+			  [0] = new,
+			  /* sbi sysfs object name */
+			  [1] = (char *)sbi->ll_kset.kobj.name,
+			  [2] = NULL
+		};
+		char *envp[] = {
+			  [0] = "HOME=/",
+			  [1] = "PATH=/sbin:/usr/sbin",
+			  [2] = NULL
+		};
+		int rc;
+
+		rc = call_usermodehelper(new, argv, envp, UMH_WAIT_EXEC);
+		if (rc < 0)
+			CERROR("%s: error invoking foreign symlink upcall %s: rc %d\n",
+			       sbi->ll_fsname, new, rc);
+		else
+			CDEBUG(D_INFO, "%s: invoked upcall %s\n",
+			       sbi->ll_fsname, new);
+	}
+
+	kvfree(old);
+
+	return new_len;
+}
+
+/* foreign_symlink_upcall_info_store() stores format items in
+ * foreign_symlink_items[], and foreign_symlink_upcall_parse()
+ * uses them to parse each foreign symlink LOV/LMV EA
+ */
+ssize_t foreign_symlink_upcall_info_store(struct kobject *kobj,
+				     struct attribute *attr,
+				     const char *buffer, size_t count)
+{
+	struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info,
+					      ll_kset.kobj);
+	struct ll_foreign_symlink_upcall_item items[MAX_NB_UPCALL_ITEMS], *item;
+	struct ll_foreign_symlink_upcall_item *new_items, *old_items;
+	size_t remaining = count;
+	int nb_items = 0, old_nb_items, i, rc = 0;
+
+	if (!has_same_mount_namespace(sbi))
+		return -EINVAL;
+
+	/* parse buffer to check validity of infos and fill symlink format
+	 * descriptors
+	 */
+
+	if (count % sizeof(u32) != 0) {
+		CERROR("%s: invalid size '%zu' of infos buffer returned by foreign symlink upcall\n",
+		       sbi->ll_fsname, count);
+		return -EINVAL;
+	}
+
+	/* evaluate number of items provided */
+	while (remaining > 0) {
+		item = (struct ll_foreign_symlink_upcall_item *)
+				&buffer[count - remaining];
+		switch (item->type) {
+		case STRING_TYPE: {
+			/* a constant string following */
+			if (item->size >= remaining -
+			    offsetof(struct ll_foreign_symlink_upcall_item,
+				     bytestring) - sizeof(item->type)) {
+				/* size of string must not overflow remaining
+				 * bytes minus EOB_TYPE item
+				 */
+				CERROR("%s: constant string too long in infos buffer returned by foreign symlink upcall\n",
+				       sbi->ll_fsname);
+				rc = -EINVAL;
+				goto failed;
+			}
+			items[nb_items].string = kzalloc(item->size,
+							 GFP_KERNEL);
+			if (!items[nb_items].string) {
+				CERROR("%s: constant string allocation has failed for constant string of size %zu\n",
+				       sbi->ll_fsname, item->size);
+				rc = -ENOMEM;
+				goto failed;
+			}
+			memcpy(items[nb_items].string,
+			       item->bytestring, item->size);
+			items[nb_items].size = item->size;
+			/* string items to fit on u32 boundary */
+			remaining = remaining - STRING_ITEM_SZ(item->size);
+			break;
+		}
+		case POSLEN_TYPE: {
+			/* a tuple (pos,len) following to delimit a sub-string
+			 * in lfm_value
+			 */
+			items[nb_items].pos = item->pos;
+			items[nb_items].len = item->len;
+			remaining -= POSLEN_ITEM_SZ;
+			break;
+		}
+		case EOB_TYPE:
+			if (remaining != sizeof(item->type)) {
+				CERROR("%s: early end of infos buffer returned by foreign symlink upcall\n",
+				       sbi->ll_fsname);
+				rc = -EINVAL;
+				goto failed;
+			}
+			remaining -= sizeof(item->type);
+			break;
+		default:
+			CERROR("%s: wrong type '%u' encountered at pos %zu , with %zu remaining bytes, in infos buffer returned by foreign symlink upcall\n",
+			       sbi->ll_fsname, (u32)buffer[count - remaining],
+			       count - remaining, remaining);
+			rc = -EINVAL;
+			goto failed;
+		}
+
+		items[nb_items].type = item->type;
+		nb_items++;
+		if (nb_items >= MAX_NB_UPCALL_ITEMS) {
+			CERROR("%s: too many items in infos buffer returned by foreign symlink upcall\n",
+			       sbi->ll_fsname);
+			rc = -EINVAL;
+			goto failed;
+		}
+	}
+	/* valid format has been provided by foreign symlink user upcall */
+	new_items = kvmalloc_array(nb_items,
+				   sizeof(struct ll_foreign_symlink_upcall_item),
+				   GFP_KERNEL);
+	if (!new_items) {
+		CERROR("%s: upcall items array allocation has failed for size %zu\n",
+		       sbi->ll_fsname, nb_items *
+			sizeof(struct ll_foreign_symlink_upcall_item));
+		rc = -ENOMEM;
+		goto failed;
+	}
+	for (i = 0; i < nb_items; i++)
+		*((struct ll_foreign_symlink_upcall_item *)new_items + i) =
+			items[i];
+
+	down_write(&sbi->ll_foreign_symlink_sem);
+	old_items = sbi->ll_foreign_symlink_upcall_items;
+	old_nb_items = sbi->ll_foreign_symlink_upcall_nb_items;
+	sbi->ll_foreign_symlink_upcall_items = new_items;
+	sbi->ll_foreign_symlink_upcall_nb_items = nb_items;
+	sbi->ll_flags |= LL_SBI_FOREIGN_SYMLINK_UPCALL;
+	up_write(&sbi->ll_foreign_symlink_sem);
+
+	/* free old_items */
+	if (old_items) {
+		for (i = 0 ; i < old_nb_items; i++)
+			if (old_items[i].type == STRING_TYPE)
+				kfree(old_items[i].string);
+
+		kvfree(old_items);
+	}
+
+failed:
+	/* clean items[] and free any strings */
+	if (rc != 0) {
+		for (i = 0; i < nb_items; i++) {
+			switch (items[i].type) {
+			case STRING_TYPE:
+				kfree(items[i].string);
+				items[i].string = NULL;
+				items[i].size = 0;
+				break;
+			case POSLEN_TYPE:
+				items[i].pos = 0;
+				items[i].len = 0;
+				break;
+			case EOB_TYPE:
+				break;
+			default:
+				CERROR("%s: wrong type '%u' encountered in foreign symlink upcall items\n",
+				       sbi->ll_fsname, items[i].type);
+				rc = -EINVAL;
+				break;
+			}
+			items[i].type = 0;
+		}
+	}
+
+	return rc == 0 ? count : rc;
+}
+
+static int ll_foreign_symlink_getattr(const struct path *path, struct kstat *stat,
+				      u32 request_mask, unsigned int flags)
+{
+	return ll_getattr_dentry(path->dentry, stat, request_mask, flags,
+				 true);
+}
+
+const struct inode_operations ll_foreign_file_symlink_inode_operations = {
+	.setattr	= ll_setattr,
+	.get_link	= ll_foreign_get_link,
+	.getattr	= ll_foreign_symlink_getattr,
+	.permission	= ll_inode_permission,
+	.listxattr	= ll_listxattr,
+};
+
+const struct inode_operations ll_foreign_dir_symlink_inode_operations = {
+	.lookup		= ll_foreign_dir_lookup,
+	.setattr	= ll_setattr,
+	.get_link	= ll_foreign_get_link,
+	.getattr	= ll_foreign_symlink_getattr,
+	.permission	= ll_inode_permission,
+	.listxattr	= ll_listxattr,
+};
diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h
index 669500b..b3e8a96 100644
--- a/fs/lustre/llite/llite_internal.h
+++ b/fs/lustre/llite/llite_internal.h
@@ -51,6 +51,7 @@
 
 #include "vvp_internal.h"
 #include "pcc.h"
+#include "foreign_symlink.h"
 
 /** Only used on client-side for indicating the tail of dir hash/offset. */
 #define LL_DIR_END_OFF	  0x7fffffffffffffffULL
@@ -102,6 +103,8 @@ enum ll_file_flags {
 	 * local inode atime.
 	 */
 	LLIF_UPDATE_ATIME	= 4,
+	/* foreign file/dir can be unlinked unconditionally */
+	LLIF_FOREIGN_REMOVABLE	= 5,
 	/* setting encryption context in progress */
 	LLIF_SET_ENC_CTX	= 6,
 };
@@ -619,6 +622,10 @@ enum stats_track_type {
 #define LL_SBI_FILE_HEAT    0x4000000 /* file heat support */
 #define LL_SBI_TEST_DUMMY_ENCRYPTION	0x8000000 /* test dummy encryption */
 #define LL_SBI_ENCRYPT	   0x10000000 /* client side encryption */
+#define LL_SBI_FOREIGN_SYMLINK	0x20000000 /* foreign fake-symlink support */
+/* foreign fake-symlink upcall registered */
+#define LL_SBI_FOREIGN_SYMLINK_UPCALL	0x40000000
+
 #define LL_SBI_FLAGS {	\
 	"nolck",	\
 	"checksum",	\
@@ -649,6 +656,8 @@ enum stats_track_type {
 	"file_heat",	\
 	"test_dummy_encryption", \
 	"noencrypt",	\
+	"foreign_symlink",	\
+	"foreign_symlink_upcall",	\
 }
 
 /*
@@ -761,6 +770,19 @@ struct ll_sb_info {
 
 	/* Persistent Client Cache */
 	struct pcc_super	ll_pcc_super;
+
+	/* to protect vs updates in all following foreign symlink fields */
+	struct rw_semaphore	ll_foreign_symlink_sem;
+	/* foreign symlink path prefix */
+	char			*ll_foreign_symlink_prefix;
+	/* full prefix size including trailing '\0' */
+	size_t			ll_foreign_symlink_prefix_size;
+	/* foreign symlink path upcall */
+	char			*ll_foreign_symlink_upcall;
+	/* foreign symlink path upcall infos */
+	struct ll_foreign_symlink_upcall_item *ll_foreign_symlink_upcall_items;
+	/* foreign symlink path upcall nb infos */
+	unsigned int		ll_foreign_symlink_upcall_nb_items;
 };
 
 #define SBI_DEFAULT_HEAT_DECAY_WEIGHT	((80 * 256 + 50) / 100)
@@ -951,6 +973,11 @@ static inline bool ll_sbi_has_file_heat(struct ll_sb_info *sbi)
 	return !!(sbi->ll_flags & LL_SBI_FILE_HEAT);
 }
 
+static inline bool ll_sbi_has_foreign_symlink(struct ll_sb_info *sbi)
+{
+	return !!(sbi->ll_flags & LL_SBI_FOREIGN_SYMLINK);
+}
+
 void ll_ras_enter(struct file *f, loff_t pos, size_t count);
 
 /* llite/lcommon_misc.c */
@@ -1063,6 +1090,8 @@ enum ldlm_mode ll_take_md_lock(struct inode *inode, u64 bits,
 int ll_md_real_close(struct inode *inode, fmode_t fmode);
 int ll_getattr(const struct path *path, struct kstat *stat,
 	       u32 request_mask, unsigned int flags);
+int ll_getattr_dentry(struct dentry *de, struct kstat *stat, u32 request_mask,
+		      unsigned int flags, bool foreign);
 #ifdef CONFIG_LUSTRE_FS_POSIX_ACL
 struct posix_acl *ll_get_acl(struct inode *inode, int type);
 int ll_set_acl(struct inode *inode, struct posix_acl *acl, int type);
@@ -1459,7 +1488,7 @@ static inline int cl_agl(struct inode *inode)
 int ll_file_lock_ahead(struct file *file, struct llapi_lu_ladvise *ladvise);
 
 int cl_io_get(struct inode *inode, struct lu_env **envout,
-	      struct cl_io **ioout, __u16 *refcheck);
+	      struct cl_io **ioout, u16 *refcheck);
 
 static inline int ll_glimpse_size(struct inode *inode)
 {
@@ -1671,5 +1700,9 @@ inline void ll_sbi_set_encrypt(struct ll_sb_info *sbi, bool set)
 {
 }
 #endif /* !CONFIG_FS_ENCRYPTION */
+/* llite/llite_foreign.c */
+int ll_manage_foreign(struct inode *inode, struct lustre_md *lmd);
+bool ll_foreign_is_openable(struct dentry *dentry, unsigned int flags);
+bool ll_foreign_is_removable(struct dentry *dentry, bool unset);
 
 #endif /* LLITE_INTERNAL_H */
diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c
index f520b34..1b3eef0 100644
--- a/fs/lustre/llite/llite_lib.c
+++ b/fs/lustre/llite/llite_lib.c
@@ -122,6 +122,28 @@ static struct ll_sb_info *ll_init_sbi(void)
 		goto out_destroy_ra;
 	}
 
+	/* initialize foreign symlink prefix path */
+	sbi->ll_foreign_symlink_prefix = kasprintf(GFP_KERNEL, "/mnt/");
+	if (!sbi->ll_foreign_symlink_prefix) {
+		rc = -ENOMEM;
+		goto out_destroy_ra;
+	}
+	sbi->ll_foreign_symlink_prefix_size = sizeof("/mnt/");
+
+	/* initialize foreign symlink upcall path, none by default */
+	sbi->ll_foreign_symlink_upcall = kasprintf(GFP_KERNEL, "none");
+	if (!sbi->ll_foreign_symlink_upcall) {
+		rc = -ENOMEM;
+		goto out_destroy_ra;
+	}
+
+	sbi->ll_foreign_symlink_upcall_items = NULL;
+	sbi->ll_foreign_symlink_upcall_nb_items = 0;
+	init_rwsem(&sbi->ll_foreign_symlink_sem);
+	/* foreign symlink support (LL_SBI_FOREIGN_SYMLINK in ll_flags)
+	 * not enabled by default
+	 */
+
 	sbi->ll_ra_info.ra_max_pages =
 		min(pages / 32, SBI_DEFAULT_READ_AHEAD_MAX);
 	sbi->ll_ra_info.ra_max_pages_per_file =
@@ -170,6 +192,12 @@ static struct ll_sb_info *ll_init_sbi(void)
 	sbi->ll_heat_period_second = SBI_DEFAULT_HEAT_PERIOD_SECOND;
 	return sbi;
 out_destroy_ra:
+	kfree(sbi->ll_foreign_symlink_upcall);
+	kfree(sbi->ll_foreign_symlink_prefix);
+	if (sbi->ll_cache) {
+		cl_cache_decref(sbi->ll_cache);
+		sbi->ll_cache = NULL;
+	}
 	destroy_workqueue(sbi->ll_ra_info.ll_readahead_wq);
 out_pcc:
 	pcc_super_fini(&sbi->ll_pcc_super);
@@ -190,6 +218,23 @@ static void ll_free_sbi(struct super_block *sb)
 		cl_cache_decref(sbi->ll_cache);
 		sbi->ll_cache = NULL;
 	}
+	kfree(sbi->ll_foreign_symlink_prefix);
+	sbi->ll_foreign_symlink_prefix = NULL;
+	kfree(sbi->ll_foreign_symlink_upcall);
+	sbi->ll_foreign_symlink_upcall = NULL;
+	if (sbi->ll_foreign_symlink_upcall_items) {
+		int i;
+		int nb_items = sbi->ll_foreign_symlink_upcall_nb_items;
+		struct ll_foreign_symlink_upcall_item *items;
+
+		items = sbi->ll_foreign_symlink_upcall_items;
+		for (i = 0 ; i < nb_items; i++)
+			if (items[i].type == STRING_TYPE)
+				kfree(items[i].string);
+
+		kvfree(items);
+		sbi->ll_foreign_symlink_upcall_items = NULL;
+	}
 	pcc_super_fini(&sbi->ll_pcc_super);
 	kfree(sbi);
 }
@@ -958,6 +1003,57 @@ static int ll_options(char *options, struct ll_sb_info *sbi)
 #endif
 			goto next;
 		}
+		tmp = ll_set_opt("foreign_symlink", s1, LL_SBI_FOREIGN_SYMLINK);
+		if (tmp) {
+			int prefix_pos = sizeof("foreign_symlink=") - 1;
+			int equal_pos = sizeof("foreign_symlink=") - 2;
+
+			/* non-default prefix provided ? */
+			if (strlen(s1) >= sizeof("foreign_symlink=") &&
+			    *(s1 + equal_pos) == '=') {
+				char *old = sbi->ll_foreign_symlink_prefix;
+				size_t old_len =
+					sbi->ll_foreign_symlink_prefix_size;
+
+				/* path must be absolute */
+				if (*(s1 + sizeof("foreign_symlink=") -
+				    1) != '/') {
+					LCONSOLE_ERROR_MSG(0x152,
+							   "foreign prefix '%s' must be an absolute path\n",
+							   s1 + prefix_pos);
+					return -EINVAL;
+				}
+				/* last option ? */
+				s2 = strchrnul(s1 + prefix_pos, ',');
+
+				if (sbi->ll_foreign_symlink_prefix) {
+					sbi->ll_foreign_symlink_prefix = NULL;
+					sbi->ll_foreign_symlink_prefix_size = 0;
+				}
+				/* alloc for path length and '\0' */
+				sbi->ll_foreign_symlink_prefix = kmalloc(s2 - (s1 + prefix_pos) + 1,
+									 GFP_KERNEL);
+				if (!sbi->ll_foreign_symlink_prefix) {
+					/* restore previous */
+					sbi->ll_foreign_symlink_prefix = old;
+					sbi->ll_foreign_symlink_prefix_size =
+						old_len;
+					return -ENOMEM;
+				}
+				kfree(old);
+				strncpy(sbi->ll_foreign_symlink_prefix,
+					s1 + prefix_pos,
+					s2 - (s1 + prefix_pos));
+				sbi->ll_foreign_symlink_prefix_size =
+					s2 - (s1 + prefix_pos) + 1;
+			} else {
+				LCONSOLE_ERROR_MSG(0x152,
+						   "invalid %s option\n", s1);
+			}
+			/* enable foreign symlink support */
+			*flags |= tmp;
+			goto next;
+		}
 		LCONSOLE_ERROR_MSG(0x152, "Unknown option '%s', won't mount.\n",
 				   s1);
 		return -EINVAL;
@@ -2763,6 +2859,10 @@ int ll_prep_inode(struct inode **inode, struct ptlrpc_request *req,
 
 	if (default_lmv_deleted)
 		ll_update_default_lsm_md(*inode, &md);
+
+	/* we may want to apply some policy for foreign file/dir */
+	if (ll_sbi_has_foreign_symlink(sbi))
+		rc = ll_manage_foreign(*inode, &md);
 out:
 	/* cleanup will be done if necessary */
 	md_free_lustre_md(sbi->ll_md_exp, &md);
@@ -2974,6 +3074,11 @@ int ll_show_options(struct seq_file *seq, struct dentry *dentry)
 	else
 		seq_puts(seq, ",noencrypt");
 
+	if (sbi->ll_flags & LL_SBI_FOREIGN_SYMLINK) {
+		seq_puts(seq, ",foreign_symlink=");
+		seq_puts(seq, sbi->ll_foreign_symlink_prefix);
+	}
+
 	return 0;
 }
 
diff --git a/fs/lustre/llite/lproc_llite.c b/fs/lustre/llite/lproc_llite.c
index 3ca553d..16d1497 100644
--- a/fs/lustre/llite/lproc_llite.c
+++ b/fs/lustre/llite/lproc_llite.c
@@ -290,6 +290,14 @@ static ssize_t client_type_show(struct kobject *kobj, struct attribute *attr,
 }
 LUSTRE_RO_ATTR(client_type);
 
+LUSTRE_RW_ATTR(foreign_symlink_enable);
+
+LUSTRE_RW_ATTR(foreign_symlink_prefix);
+
+LUSTRE_RW_ATTR(foreign_symlink_upcall);
+
+LUSTRE_WO_ATTR(foreign_symlink_upcall_info);
+
 static ssize_t fstype_show(struct kobject *kobj, struct attribute *attr,
 			   char *buf)
 {
@@ -1551,6 +1559,10 @@ struct ldebugfs_vars lprocfs_llite_obd_vars[] = {
 	&lustre_attr_filestotal.attr,
 	&lustre_attr_filesfree.attr,
 	&lustre_attr_client_type.attr,
+	&lustre_attr_foreign_symlink_enable.attr,
+	&lustre_attr_foreign_symlink_prefix.attr,
+	&lustre_attr_foreign_symlink_upcall.attr,
+	&lustre_attr_foreign_symlink_upcall_info.attr,
 	&lustre_attr_fstype.attr,
 	&lustre_attr_uuid.attr,
 	&lustre_attr_max_read_ahead_mb.attr,
diff --git a/fs/lustre/llite/namei.c b/fs/lustre/llite/namei.c
index 7f1fd5c..2da33d0 100644
--- a/fs/lustre/llite/namei.c
+++ b/fs/lustre/llite/namei.c
@@ -48,7 +48,7 @@
 static int ll_create_it(struct inode *dir, struct dentry *dentry,
 			struct lookup_intent *it,
 			void *secctx, u32 secctxlen, bool encrypt,
-			void *encctx, __u32 encctxlen);
+			void *encctx, u32 encctxlen);
 
 /* called from iget5_locked->find_inode() under inode_hash_lock spinlock */
 static int ll_test_inode(struct inode *inode, void *opaque)
@@ -597,6 +597,23 @@ struct dentry *ll_splice_alias(struct inode *inode, struct dentry *de)
 	} else {
 		struct dentry *new = d_splice_alias(inode, de);
 
+		/* this needs only to be done for foreign symlink dirs as
+		 * DCACHE_SYMLINK_TYPE is already set by d_flags_for_inode()
+		 * kernel routine for files with symlink ops (ie, real symlink)
+		 */
+		if (inode && ll_sbi_has_foreign_symlink(ll_i2sbi(inode)) &&
+		    inode->i_op->get_link) {
+			CDEBUG(D_INFO,
+			       "%s: inode "DFID": faking foreign dir as a symlink\n",
+			       ll_i2sbi(inode)->ll_fsname,
+			       PFID(ll_inode2fid(inode)));
+			spin_lock(&de->d_lock);
+			/* like d_flags_for_inode() already does for files */
+			de->d_flags = (de->d_flags & ~DCACHE_ENTRY_TYPE) |
+				      DCACHE_SYMLINK_TYPE;
+			spin_unlock(&de->d_lock);
+		}
+
 		if (IS_ERR(new))
 			CDEBUG(D_DENTRY,
 			       "splice inode %p as %pd gives error %lu\n",
@@ -1187,8 +1204,8 @@ static int ll_atomic_open(struct inode *dir, struct dentry *dentry,
 			}
 		}
 
-		if (d_really_is_positive(dentry) &&
-		    it_disposition(it, DISP_OPEN_OPEN)) {
+		if (dentry->d_inode && it_disposition(it, DISP_OPEN_OPEN) &&
+		    ll_foreign_is_openable(dentry, open_flags)) {
 			/* Open dentry. */
 			if (S_ISFIFO(d_inode(dentry)->i_mode)) {
 				/* We cannot call open here as it might
@@ -1270,7 +1287,7 @@ static struct inode *ll_create_node(struct inode *dir, struct lookup_intent *it)
 static int ll_create_it(struct inode *dir, struct dentry *dentry,
 			struct lookup_intent *it,
 			void *secctx, u32 secctxlen, bool encrypt,
-			void *encctx, __u32 encctxlen)
+			void *encctx, u32 encctxlen)
 {
 	struct inode *inode;
 	u64 bits = 0;
@@ -1581,6 +1598,10 @@ static int ll_unlink(struct inode *dir, struct dentry *dchild)
 	CDEBUG(D_VFSTRACE, "VFS Op:name=%pd,dir=%lu/%u(%p)\n",
 	       dchild, dir->i_ino, dir->i_generation, dir);
 
+	/* some foreign file/dir may not be allowed to be unlinked */
+	if (!ll_foreign_is_removable(dchild, false))
+		return -EPERM;
+
 	op_data = ll_prep_md_op_data(NULL, dir, NULL,
 				     dchild->d_name.name,
 				     dchild->d_name.len,
@@ -1647,6 +1668,10 @@ static int ll_rmdir(struct inode *dir, struct dentry *dchild)
 	CDEBUG(D_VFSTRACE, "VFS Op:name=%pd, dir=" DFID "(%p)\n",
 	       dchild, PFID(ll_inode2fid(dir)), dir);
 
+	/* some foreign dir may not be allowed to be removed */
+	if (!ll_foreign_is_removable(dchild, false))
+		return -EPERM;
+
 	op_data = ll_prep_md_op_data(NULL, dir, NULL,
 				     dchild->d_name.name,
 				     dchild->d_name.len,
diff --git a/fs/lustre/llite/pcc.c b/fs/lustre/llite/pcc.c
index b259fa5..8430fff 100644
--- a/fs/lustre/llite/pcc.c
+++ b/fs/lustre/llite/pcc.c
@@ -1152,12 +1152,12 @@ static int pcc_get_layout_info(struct inode *inode, struct cl_layout *clt)
 		return PTR_ERR(env);
 
 	rc = cl_object_layout_get(env, lli->lli_clob, clt);
-	if (rc)
+	if (rc < 0)
 		CDEBUG(D_INODE, "Cannot get layout for "DFID"\n",
 		       PFID(ll_inode2fid(inode)));
 
 	cl_env_put(env, &refcheck);
-	return rc;
+	return rc < 0 ? rc : 0;
 }
 
 static int pcc_fid2dataset_fullpath(char *buf, int sz, struct lu_fid *fid,
diff --git a/fs/lustre/llite/symlink.c b/fs/lustre/llite/symlink.c
index f78db86..cf5ad9e 100644
--- a/fs/lustre/llite/symlink.c
+++ b/fs/lustre/llite/symlink.c
@@ -37,6 +37,7 @@
 
 #include "llite_internal.h"
 
+/* Must be called with lli_size_mutex locked */
 static int ll_readlink_internal(struct inode *inode,
 				struct ptlrpc_request **request, char **symname)
 {
diff --git a/fs/lustre/lov/lov_object.c b/fs/lustre/lov/lov_object.c
index ee61983..16fed09 100644
--- a/fs/lustre/lov/lov_object.c
+++ b/fs/lustre/lov/lov_object.c
@@ -2177,7 +2177,8 @@ static int lov_object_layout_get(const struct lu_env *env,
 	rc = lov_lsm_pack(lsm, buf->lb_buf, buf->lb_len);
 	lov_lsm_put(lsm);
 
-	return rc < 0 ? rc : 0;
+	/* return error or number of bytes */
+	return rc;
 }
 
 static loff_t lov_object_maxbytes(struct cl_object *obj)
diff --git a/fs/lustre/lov/lov_pack.c b/fs/lustre/lov/lov_pack.c
index 9b5fb9c..438bf36 100644
--- a/fs/lustre/lov/lov_pack.c
+++ b/fs/lustre/lov/lov_pack.c
@@ -163,8 +163,14 @@ static ssize_t lov_lsm_pack_foreign(const struct lov_stripe_md *lsm, void *buf,
 	if (buf_size == 0)
 		return lfm_size;
 
-	if (buf_size < lfm_size)
+	/* if buffer too small return ERANGE but copy the size the
+	 * caller has requested anyway. This may be useful to get
+	 * only the header without the need to alloc the full size
+	 */
+	if (buf_size < lfm_size) {
+		memcpy(lfm, lsm_foreign(lsm), buf_size);
 		return -ERANGE;
+	}
 
 	/* full foreign LOV is already avail in its cache
 	 * no need to translate format fields to little-endian
@@ -189,8 +195,10 @@ ssize_t lov_lsm_pack(const struct lov_stripe_md *lsm, void *buf,
 	if (lsm->lsm_magic == LOV_MAGIC_V1 || lsm->lsm_magic == LOV_MAGIC_V3)
 		return lov_lsm_pack_v1v3(lsm, buf, buf_size);
 
-	if (lsm->lsm_magic == LOV_MAGIC_FOREIGN)
+	if (lsm->lsm_magic == LOV_MAGIC_FOREIGN) {
+		pr_info("calling lov_lsm_pack_foreign\n");
 		return lov_lsm_pack_foreign(lsm, buf, buf_size);
+	}
 
 	lmm_size = lov_comp_md_size(lsm);
 	if (buf_size == 0)
@@ -370,6 +378,7 @@ int lov_getstripe(const struct lu_env *env, struct lov_object *obj,
 
 	lmm_size = lov_lsm_pack(lsm, lmmk, lmmk_size);
 	if (lmm_size < 0) {
+		pr_info("lov_lsm_pack return rc = %zd\n", lmm_size);
 		rc = lmm_size;
 		goto out_free;
 	}
diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h
index 5d46ec9..542d2d3 100644
--- a/include/uapi/linux/lustre/lustre_user.h
+++ b/include/uapi/linux/lustre/lustre_user.h
@@ -360,6 +360,7 @@ struct ll_ioc_lease_id {
 #define LL_IOC_LMV_SETSTRIPE		_IOWR('f', 240, struct lmv_user_md)
 #define LL_IOC_LMV_GETSTRIPE		_IOWR('f', 241, struct lmv_user_md)
 #define LL_IOC_RMFID			_IOR('f', 242, struct fid_array)
+#define LL_IOC_UNLOCK_FOREIGN		_IO('f', 242)
 #define LL_IOC_SET_LEASE		_IOWR('f', 243, struct ll_ioc_lease)
 #define LL_IOC_SET_LEASE_OLD		_IOWR('f', 243, long)
 #define LL_IOC_GET_LEASE		_IO('f', 244)
@@ -769,7 +770,7 @@ struct lustre_foreign_type {
  **/
 enum lustre_foreign_types {
 	LU_FOREIGN_TYPE_NONE = 0,
-	LU_FOREIGN_TYPE_DAOS = 0xda05,
+	LU_FOREIGN_TYPE_SYMLINK = 0xda05,
 	/* must be the max/last one */
 	LU_FOREIGN_TYPE_UNKNOWN = 0xffffffff,
 };
@@ -2280,6 +2281,52 @@ struct fid_array {
 };
 #define OBD_MAX_FIDS_IN_ARRAY	4096
 
+/* more types could be defined upon need for more complex
+ * format to be used in foreign symlink LOV/LMV EAs, like
+ * one to describe a delimiter string and occurrence number
+ * of delimited sub-string, ...
+ */
+enum ll_foreign_symlink_upcall_item_type {
+	EOB_TYPE = 1,
+	STRING_TYPE = 2,
+	POSLEN_TYPE = 3,
+};
+
+/* may need to be modified to allow for more format items to be defined,
+ * as for the ll_foreign_symlink_upcall_item_type enum
+ */
+struct ll_foreign_symlink_upcall_item {
+	__u32 type;
+	union {
+		struct {
+			__u32 pos;
+			__u32 len;
+		};
+		struct {
+			size_t size;
+			union {
+				/* internal storage of constant string */
+				char *string;
+				/* upcall stores constant string inline as raw bytes */
+				char bytestring[0];
+			};
+		};
+	};
+};
+
+#define POSLEN_ITEM_SZ (offsetof(struct ll_foreign_symlink_upcall_item, len) + \
+		sizeof(((struct ll_foreign_symlink_upcall_item *)0)->len))
+#define STRING_ITEM_SZ(sz) ( \
+	offsetof(struct ll_foreign_symlink_upcall_item, bytestring) + \
+	(sz + sizeof(__u32) - 1) / sizeof(__u32) * sizeof(__u32))
+
+/* presently limited so that the max stack frame size is not reached
+ * because of the temporary automatic array of
+ * "struct ll_foreign_symlink_upcall_item" used in
+ * foreign_symlink_upcall_info_store()
+ */
+#define MAX_NB_UPCALL_ITEMS 32
+
 #if defined(__cplusplus)
 }
 #endif
-- 
1.8.3.1
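
The binary layout that foreign_symlink_upcall_info_store() parses can be
sketched from user space as follows. This is a minimal sketch, not code from
the patch: struct upcall_item is a local mirror of
struct ll_foreign_symlink_upcall_item (uint32_t stands in for __u32, and
bytestring[1] for the zero-length array), and put_poslen(), put_string() and
put_eob() are hypothetical helpers. Whether the parser expects the constant
string to carry a trailing NUL is not visible in this hunk, so none is
written here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Item types, as in the patch's enum ll_foreign_symlink_upcall_item_type. */
enum { EOB_TYPE = 1, STRING_TYPE = 2, POSLEN_TYPE = 3 };

/* Local mirror of struct ll_foreign_symlink_upcall_item (assumption: same
 * member order; bytestring[1] does not change any member offset).
 */
struct upcall_item {
	uint32_t type;
	union {
		struct {
			uint32_t pos;
			uint32_t len;
		};
		struct {
			size_t size;
			union {
				char *string;
				char bytestring[1];
			};
		};
	};
};

#define POSLEN_ITEM_SZ \
	(offsetof(struct upcall_item, len) + sizeof(uint32_t))
/* The string payload is padded up to the next u32 boundary. */
#define STRING_ITEM_SZ(sz) \
	(offsetof(struct upcall_item, bytestring) + \
	 ((sz) + sizeof(uint32_t) - 1) / sizeof(uint32_t) * sizeof(uint32_t))

/* Append a (pos,len) item; returns the number of bytes written. */
static size_t put_poslen(unsigned char *buf, uint32_t pos, uint32_t len)
{
	uint32_t type = POSLEN_TYPE;

	assert(buf);
	memset(buf, 0, POSLEN_ITEM_SZ);
	/* memcpy avoids unaligned stores into the packed buffer */
	memcpy(buf + offsetof(struct upcall_item, type), &type, sizeof(type));
	memcpy(buf + offsetof(struct upcall_item, pos), &pos, sizeof(pos));
	memcpy(buf + offsetof(struct upcall_item, len), &len, sizeof(len));
	return POSLEN_ITEM_SZ;
}

/* Append a constant-string item; returns the padded item size. */
static size_t put_string(unsigned char *buf, const char *s)
{
	uint32_t type = STRING_TYPE;
	size_t size = strlen(s);

	assert(buf);
	memset(buf, 0, STRING_ITEM_SZ(size));
	memcpy(buf + offsetof(struct upcall_item, type), &type, sizeof(type));
	memcpy(buf + offsetof(struct upcall_item, size), &size, sizeof(size));
	memcpy(buf + offsetof(struct upcall_item, bytestring), s, size);
	return STRING_ITEM_SZ(size);
}

/* Append the end-of-buffer marker, which is a bare type word. */
static size_t put_eob(unsigned char *buf)
{
	uint32_t type = EOB_TYPE;

	assert(buf);
	memcpy(buf, &type, sizeof(type));
	return sizeof(type);
}
```

Because the struct contains a size_t and a pointer, the absolute item sizes
depend on the ABI of the machine running the upcall; only the u32 padding of
the string payload and the trailing EOB word are fixed by the macros.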

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org
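
The size-then-fetch pattern that ll_foreign_readlink_internal() uses with
cl_object_layout_get(), together with the short-buffer behaviour this patch
gives lov_lsm_pack_foreign(), can be modelled in user space roughly as below.
A sketch under stated assumptions: pack_foreign() and fetch_layout() are
hypothetical stand-ins, and the full-size copy path is assumed to return the
number of bytes copied.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical stand-in for lov_lsm_pack_foreign(): a zero-sized buffer
 * queries the size, a short buffer gets a truncated copy plus -ERANGE.
 */
static ssize_t pack_foreign(const void *src, size_t src_size,
			    void *buf, size_t buf_size)
{
	if (buf_size == 0)
		return (ssize_t)src_size;

	if (buf_size < src_size) {
		/* copy what fits, e.g. to read only the header */
		memcpy(buf, src, buf_size);
		return -ERANGE;
	}

	memcpy(buf, src, src_size);
	return (ssize_t)src_size;
}

/* Two passes: ask for the size, allocate, then fetch the full layout.
 * Returns a malloc'ed copy (caller frees) or NULL with *out set to the
 * error or zero size.
 */
static void *fetch_layout(const void *src, size_t src_size, ssize_t *out)
{
	ssize_t rc;
	void *buf;

	assert(out);
	rc = pack_foreign(src, src_size, NULL, 0);
	if (rc <= 0) {
		*out = rc;
		return NULL;
	}

	buf = calloc(1, (size_t)rc);
	if (!buf) {
		*out = -ENOMEM;
		return NULL;
	}

	rc = pack_foreign(src, src_size, buf, (size_t)rc);
	if (rc <= 0) {
		free(buf);
		buf = NULL;
	}
	*out = rc;
	return buf;
}
```

Returning the byte count rather than 0 on success is what lets the first
call double as a size query, which is why the pcc.c caller in this patch had
to switch from "if (rc)" to "if (rc < 0)".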


* [lustre-devel] [PATCH 14/14] lustre: llite: use d_is_symlink to test if dentry is a symlink
  2021-05-04  0:10 [lustre-devel] [PATCH 00/14] Update to OpenSFS tree as of May 3, 2021 James Simmons
                   ` (12 preceding siblings ...)
  2021-05-04  0:10 ` [lustre-devel] [PATCH 13/14] lustre: llite: fake symlink type of foreign file/dir James Simmons
@ 2021-05-04  0:10 ` James Simmons
  13 siblings, 0 replies; 15+ messages in thread
From: James Simmons @ 2021-05-04  0:10 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

Using d_is_symlink() is preferred to testing ->get_link or
->follow_link.

A recent patch made this work for foreign files/dirs by making sure
the entry type in d_flags is correct, so we can simplify the code in
ll_revalidate_dentry().

Fixes: 94875289c356 ("lustre: llite: fake symlink type of foreign file/dir")
WC-bug-id: https://jira.whamcloud.com/browse/LU-6142
Lustre-commit: 36b1e4c4142f8a72 ("LU-6142 llite: use d_is_symlink to test if dentry is a symlink")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/41770
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/dcache.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/lustre/llite/dcache.c b/fs/lustre/llite/dcache.c
index f8b82d6..24af33e 100644
--- a/fs/lustre/llite/dcache.c
+++ b/fs/lustre/llite/dcache.c
@@ -253,7 +253,7 @@ static int ll_revalidate_dentry(struct dentry *dentry,
 	 * real symlinks. This will allow to open foreign symlink file/dir
 	 * for get[dir]stripe/unlock ioctl()s.
 	 */
-	if (dentry->d_inode && dentry->d_inode->i_op->get_link) {
+	if (d_is_symlink(dentry)) {
 		if (!S_ISLNK(dentry->d_inode->i_mode) &&
 		    !(lookup_flags & LOOKUP_FOLLOW))
 			return 0;
-- 
1.8.3.1
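
As an illustration of the branch touched by the hunk above, here is a small userspace model, not kernel code. It assumes, following the commit message, that the symlink test reads an entry-type field kept in d_flags. Every name prefixed sketch_ or SKETCH_ is hypothetical, and the flag values are mock numbers rather than the kernel's definitions. Only the file-type mode bits (0170000, 0120000, 0100000) match the usual Linux values. The model covers only the case where the shown branch returns 0. The rest of ll_revalidate_dentry() is not modeled.

```c
#include <stdbool.h>

/* Mock values, chosen only for this sketch (not the kernel's). */
#define SKETCH_ENTRY_TYPE_MASK	0x7u
#define SKETCH_SYMLINK_TYPE	0x6u
#define SKETCH_LOOKUP_FOLLOW	0x1u

/* File-type bits of a mode, as used on Linux. */
#define SKETCH_S_IFMT	0170000u
#define SKETCH_S_IFLNK	0120000u
#define SKETCH_S_IFREG	0100000u

struct sketch_inode {
	unsigned int i_mode;
};

struct sketch_dentry {
	unsigned int d_flags;
	struct sketch_inode *d_inode;
};

/* Models a type test on d_flags; the inode is not consulted at all. */
static bool sketch_d_is_symlink(const struct sketch_dentry *dentry)
{
	return (dentry->d_flags & SKETCH_ENTRY_TYPE_MASK) ==
	       SKETCH_SYMLINK_TYPE;
}

/* True when the branch shown in the hunk would return 0: the dentry is
 * typed as a symlink, the inode is not a real symlink (a foreign
 * file/dir presented as one), and the lookup does not follow links.
 */
static bool sketch_branch_returns_zero(const struct sketch_dentry *dentry,
				       unsigned int lookup_flags)
{
	if (sketch_d_is_symlink(dentry)) {
		if ((dentry->d_inode->i_mode & SKETCH_S_IFMT) !=
		    SKETCH_S_IFLNK &&
		    !(lookup_flags & SKETCH_LOOKUP_FOLLOW))
			return true;
	}
	return false;
}
```

In this model a real symlink (mode type SKETCH_S_IFLNK) never takes the early return, while a regular-file inode behind a symlink-typed dentry does so only when SKETCH_LOOKUP_FOLLOW is absent.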

