All of lore.kernel.org
 help / color / mirror / Atom feed
From: James Simmons <jsimmons@infradead.org>
To: Andreas Dilger <adilger@whamcloud.com>,
	Oleg Drokin <green@whamcloud.com>, NeilBrown <neilb@suse.de>
Cc: Serguei Smirnov <ssmirnov@whamcloud.com>,
	Lustre Development List <lustre-devel@lists.lustre.org>
Subject: [lustre-devel] [PATCH 24/25] lnet: o2iblnd: clear fatal error on successful failover
Date: Mon,  2 Aug 2021 15:50:47 -0400	[thread overview]
Message-ID: <1627933851-7603-28-git-send-email-jsimmons@infradead.org> (raw)
In-Reply-To: <1627933851-7603-1-git-send-email-jsimmons@infradead.org>

From: Serguei Smirnov <ssmirnov@whamcloud.com>

In IB bonding configuration link down event causes fatal error
flag to be set on the bonded interface so it is not selected by
LNet for tx, e.g. when just one of the two cables is pulled.
This change allows for the interface status to be restored on
successful failover.

WC-bug-id: https://jira.whamcloud.com/browse/LU-14806
Lustre-commit: 4668283cd13079dd ("LU-14806 o2iblnd: clear fatal error on successful failover")
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44139
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/klnds/o2iblnd/o2iblnd.c | 27 +++++++++++++++++++++++++--
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.c b/net/lnet/klnds/o2iblnd/o2iblnd.c
index 3141953..686581a 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd.c
+++ b/net/lnet/klnds/o2iblnd/o2iblnd.c
@@ -1487,6 +1487,21 @@ static void kiblnd_fini_fmr_poolset(struct kib_fmr_poolset *fps)
 	}
 }
 
+static int kiblnd_get_link_status(struct net_device *dev)
+{
+	int ret = -1;
+
+	LASSERT(dev);
+
+	if (!netif_running(dev))
+		ret = 0;
+	/* Some devices may not be providing link settings */
+	else if (dev->ethtool_ops->get_link)
+		ret = dev->ethtool_ops->get_link(dev);
+
+	return ret;
+}
+
 static int
 kiblnd_init_fmr_poolset(struct kib_fmr_poolset *fps, int cpt, int ncpts,
 			struct kib_net *net,
@@ -2347,6 +2362,7 @@ int kiblnd_dev_failover(struct kib_dev *dev, struct net *ns)
 	struct ib_pd *pd;
 	struct kib_net *net;
 	struct sockaddr_in addr;
+	struct net_device *netdev;
 	unsigned long flags;
 	int rc = 0;
 	int i;
@@ -2467,11 +2483,18 @@ int kiblnd_dev_failover(struct kib_dev *dev, struct net *ns)
 	if (hdev)
 		kiblnd_hdev_decref(hdev);
 
-	if (rc)
+	if (rc) {
 		dev->ibd_failed_failover++;
-	else
+	} else {
 		dev->ibd_failed_failover = 0;
 
+		rcu_read_lock();
+		netdev = dev_get_by_name_rcu(ns, dev->ibd_ifname);
+		if (netdev && (kiblnd_get_link_status(netdev) == 1))
+			kiblnd_set_ni_fatal_on(dev->ibd_hdev, 0);
+		rcu_read_unlock();
+	}
+
 	return rc;
 }
 
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

  parent reply	other threads:[~2021-08-02 19:51 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-02 19:50 [lustre-devel] [PATCH 00/25] Sync to OpenSFS tree as of Aug 2, 2021 James Simmons
2021-08-02 19:50 ` [lustre-devel] [PATCH 01/25] lustre: llite: avoid stale data reading James Simmons
2021-08-02 19:50 ` [lustre-devel] [PATCH 02/25] lustre: llite: No locked parallel DIO James Simmons
2021-08-02 19:50 ` [lustre-devel] [PATCH 03/25] lnet: discard lnet_current_net_count James Simmons
2021-08-02 19:50 ` [lustre-devel] [PATCH 04/25] lnet: convert kiblnd/ksocknal_thread_start to vararg James Simmons
2021-08-02 19:50 ` [lustre-devel] [PATCH 05/25] lnet: print device status in net show command James Simmons
2021-08-02 19:50 ` [lustre-devel] [PATCH 06/25] lustre: lmv: getattr_name("..") under striped directory James Simmons
2021-08-02 19:50 ` [lustre-devel] [PATCH 07/25] lustre: llite: revert 'simplify callback handling for async getattr' James Simmons
2021-08-02 19:50 ` [lustre-devel] [PATCH 08/25] lnet: Protect lpni deref in lnet_health_check James Simmons
2021-08-02 19:50 ` [lustre-devel] [PATCH 09/25] lustre: uapi: remove MDS_SETATTR_PORTAL and service James Simmons
2021-08-02 19:50 ` [lustre-devel] [PATCH 10/25] lustre: llite: Modify AIO/DIO reference counting James Simmons
2021-08-02 19:50 ` [lustre-devel] [PATCH 11/25] lustre: llite: Remove transient page counting James Simmons
2021-08-02 19:50 ` [lustre-devel] [PATCH 12/25] lustre: lov: Improve DIO submit James Simmons
2021-08-02 19:50 ` [lustre-devel] [PATCH 13/25] lustre: llite: Adjust dio refcounting James Simmons
2021-08-02 19:50 ` [lustre-devel] [PATCH 14/25] lustre: clio: Skip prep for transients James Simmons
2021-08-02 19:50 ` [lustre-devel] [PATCH 15/25] lustre: osc: Improve osc_queue_sync_pages James Simmons
2021-08-02 19:50 ` [lustre-devel] [PATCH 16/25] lustre: llite: avoid project quota overflow James Simmons
2021-08-02 19:50 ` [lustre-devel] [PATCH 17/25] lnet: check memdup_user_nul using IS_ERR James Simmons
2021-08-02 19:50 ` [lustre-devel] [PATCH 18/25] lustre: osc: Remove lockless truncate James Simmons
2021-08-02 19:50 ` [lustre-devel] [PATCH 19/25] lustre: osc: Remove client contention support James Simmons
2021-08-02 19:50 ` [lustre-devel] [PATCH 20/25] lustre: osc: osc: Do not flush on lockless cancel James Simmons
2021-08-02 19:50 ` [lustre-devel] [PATCH 21/26] lustre: pcc: add LCM_FL_PCC_RDONLY layout flag James Simmons
2021-08-02 19:50 ` [lustre-devel] [PATCH 21/25] lustre: update version to 2.14.53 James Simmons
2021-08-02 19:50 ` [lustre-devel] [PATCH 22/25] lustre: mdc: set default LMV on ROOT James Simmons
2021-08-02 19:50 ` [lustre-devel] [PATCH 22/26] lustre: update version to 2.14.53 James Simmons
2021-08-02 19:50 ` [lustre-devel] [PATCH 23/25] lustre: llite: enable filesystem-wide default LMV James Simmons
2021-08-02 19:50 ` [lustre-devel] [PATCH 23/26] lustre: mdc: set default LMV on ROOT James Simmons
2021-08-02 19:50 ` James Simmons [this message]
2021-08-02 19:50 ` [lustre-devel] [PATCH 24/26] lustre: llite: enable filesystem-wide default LMV James Simmons
2021-08-02 19:50 ` [lustre-devel] [PATCH 25/25] lnet: add "stats reset" to lnetctl James Simmons
2021-08-02 19:50 ` [lustre-devel] [PATCH 25/26] lnet: o2iblnd: clear fatal error on successful failover James Simmons
2021-08-02 19:50 ` [lustre-devel] [PATCH 26/26] lnet: add "stats reset" to lnetctl James Simmons

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1627933851-7603-28-git-send-email-jsimmons@infradead.org \
    --to=jsimmons@infradead.org \
    --cc=adilger@whamcloud.com \
    --cc=green@whamcloud.com \
    --cc=lustre-devel@lists.lustre.org \
    --cc=neilb@suse.de \
    --cc=ssmirnov@whamcloud.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.