From: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
To: roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org,
jackm-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org,
Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Subject: [PATCH REPOST FIXES for-3.11 4/4] IB/ipoib: Fix pkey-change flow for Virtualization environments
Date: Wed, 17 Jul 2013 17:22:42 +0300
Message-ID: <1374070962-328-5-git-send-email-ogerlitz@mellanox.com>
In-Reply-To: <1374070962-328-1-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
From: Erez Shitrit <erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
IPoIB's required behaviour w.r.t. the pkey used by the device is the following:
- For "parent" interfaces (e.g. ib0, ib1, etc.), which are created automatically as a
result of hot-plug events from the IB core, the driver needs to take whatever
pkey value it finds in index 0, and stick to that index.
- For child interfaces (e.g. ib0.8001, etc.) created by admin directive, the driver
needs to use and stick to the value provided during its creation.
In an SR-IOV environment it's possible for the VF probe to take place before the
cloud management software provisions the suitable pkey for the VF in the
paravirtualized PKEY table index 0. When this is the case, the VF IB stack will
find an invalid pkey, which is all zeros, in index 0.
Moreover, the cloud management software can assign the pkey value at index 0 at any
time during the guest life cycle.
The correct behavior for IPoIB to address these requirements for parent
interfaces is to use the PKEY_CHANGE event as a trigger to optionally re-init the
device pkey value and re-create all the relevant resources accordingly, if
the value of the pkey in index 0 has changed (from invalid to valid or from
valid value X to a different value Y).
This patch enhances the heavy flushing code, which is triggered by the pkey change
event, to behave correctly for parent devices. For child devices, the code
remains the same, namely it chases the pkey value and not the index.
Signed-off-by: Erez Shitrit <erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
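Note for reviewers (not part of the commit): the two policies described in the
change log can be illustrated with a small userspace sketch. Everything named
sketch_* or SKETCH_* below is hypothetical and exists only for this
illustration; the pkey table is a plain array standing in for what
ib_query_pkey() and ib_find_pkey() read from the port. The child lookup
compares only the low 15 bits of the pkey, which is a simplifying assumption
of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_FULL_MEMBER 0x8000u /* full membership bit, as in the patch */
#define SKETCH_TABLE_LEN   4       /* hypothetical table size */

/* Hypothetical stand-in for the driver's per-device state. */
struct sketch_dev {
	uint16_t pkey;          /* pkey value the device currently uses */
	uint16_t pkey_index;    /* table index of that pkey (child devices) */
	uint8_t  broadcast[20]; /* IPoIB hardware broadcast address */
};

/*
 * Parent policy: adopt whatever value sits at index 0.
 * Returns 0 if the pkey value changed, 1 if it did not
 * (same convention as update_parent_pkey() in the patch).
 */
static int sketch_update_parent(struct sketch_dev *d, const uint16_t *table)
{
	uint16_t prev = d->pkey;

	assert(table != NULL);
	d->pkey = table[0] | SKETCH_FULL_MEMBER;
	if (prev == d->pkey)
		return 1;

	/* bytes 8 and 9 of the broadcast address carry the pkey */
	d->broadcast[8] = d->pkey >> 8;
	d->broadcast[9] = d->pkey & 0xff;
	return 0;
}

/*
 * Child policy: the pkey value is fixed at creation time, so search the
 * table for it and report the index where it currently lives.
 * Returns 0 and stores the index on success, -1 if the value is absent.
 */
static int sketch_find_child_index(const struct sketch_dev *d,
				   const uint16_t *table, uint16_t *index)
{
	size_t i;

	assert(table != NULL && index != NULL);
	for (i = 0; i < SKETCH_TABLE_LEN; i++) {
		/* assumption: ignore the membership bit when comparing */
		if ((table[i] & 0x7fff) == (d->pkey & 0x7fff)) {
			*index = (uint16_t)i;
			return 0;
		}
	}
	return -1;
}
```

In this sketch a parent whose index 0 entry goes from all zeros to 0x0001 ends
up with pkey 0x8001 and a rewritten broadcast address, while a child created
with 0x8001 keeps that value and only follows it to whatever index it
occupies. When the value disappears from the table the child lookup fails,
which corresponds to the branch in the patch that brings the device down.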
drivers/infiniband/ulp/ipoib/ipoib_ib.c | 76 +++++++++++++++++++++++++-----
1 files changed, 63 insertions(+), 13 deletions(-)
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index 2cfa76f..196b1d1 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -932,12 +932,47 @@ int ipoib_ib_dev_init(struct net_device *dev, struct ib_device *ca, int port)
return 0;
}
+/*
+ * Takes whatever value is in pkey index 0 and updates priv->pkey.
+ * Returns 0 if the pkey value was changed, non-zero otherwise.
+ */
+static inline int update_parent_pkey(struct ipoib_dev_priv *priv)
+{
+ int result;
+ u16 prev_pkey;
+
+ prev_pkey = priv->pkey;
+ result = ib_query_pkey(priv->ca, priv->port, 0, &priv->pkey);
+ if (result) {
+ ipoib_warn(priv, "ib_query_pkey port %d failed (ret = %d)\n",
+ priv->port, result);
+ return result;
+ }
+
+ priv->pkey |= 0x8000;
+
+ if (prev_pkey != priv->pkey) {
+ ipoib_dbg(priv, "pkey changed from 0x%x to 0x%x\n",
+ prev_pkey, priv->pkey);
+ /*
+ * Update the pkey in the broadcast address, while making sure to set
+ * the full membership bit, so that we join the right broadcast group.
+ */
+ priv->dev->broadcast[8] = priv->pkey >> 8;
+ priv->dev->broadcast[9] = priv->pkey & 0xff;
+ return 0;
+ }
+
+ return 1;
+}
+
static void __ipoib_ib_dev_flush(struct ipoib_dev_priv *priv,
enum ipoib_flush_level level)
{
struct ipoib_dev_priv *cpriv;
struct net_device *dev = priv->dev;
u16 new_index;
+ int result;
mutex_lock(&priv->vlan_mutex);
@@ -951,6 +986,10 @@ static void __ipoib_ib_dev_flush(struct ipoib_dev_priv *priv,
mutex_unlock(&priv->vlan_mutex);
if (!test_bit(IPOIB_FLAG_INITIALIZED, &priv->flags)) {
+ /* non-child (parent) devices must check/update the pkey value here */
+ if (level == IPOIB_FLUSH_HEAVY &&
+ !test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags))
+ update_parent_pkey(priv);
ipoib_dbg(priv, "Not flushing - IPOIB_FLAG_INITIALIZED not set.\n");
return;
}
@@ -961,21 +1000,32 @@ static void __ipoib_ib_dev_flush(struct ipoib_dev_priv *priv,
}
if (level == IPOIB_FLUSH_HEAVY) {
- if (ib_find_pkey(priv->ca, priv->port, priv->pkey, &new_index)) {
- clear_bit(IPOIB_PKEY_ASSIGNED, &priv->flags);
- ipoib_ib_dev_down(dev, 0);
- ipoib_ib_dev_stop(dev, 0);
- if (ipoib_pkey_dev_delay_open(dev))
+ /* child devices chase their original pkey value, while non-child
+ * (parent) devices should always take what is present in pkey index 0
+ */
+ if (test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags)) {
+ if (ib_find_pkey(priv->ca, priv->port, priv->pkey, &new_index)) {
+ clear_bit(IPOIB_PKEY_ASSIGNED, &priv->flags);
+ ipoib_ib_dev_down(dev, 0);
+ ipoib_ib_dev_stop(dev, 0);
+ if (ipoib_pkey_dev_delay_open(dev))
+ return;
+ }
+ /* restart QP only if P_Key index is changed */
+ if (test_and_set_bit(IPOIB_PKEY_ASSIGNED, &priv->flags) &&
+ new_index == priv->pkey_index) {
+ ipoib_dbg(priv, "Not flushing - P_Key index not changed.\n");
return;
+ }
+ priv->pkey_index = new_index;
+ } else {
+ result = update_parent_pkey(priv);
+ /* restart QP only if P_Key value changed */
+ if (result) {
+ ipoib_dbg(priv, "Not flushing - P_Key value not changed.\n");
+ return;
+ }
}
-
- /* restart QP only if P_Key index is changed */
- if (test_and_set_bit(IPOIB_PKEY_ASSIGNED, &priv->flags) &&
- new_index == priv->pkey_index) {
- ipoib_dbg(priv, "Not flushing - P_Key index not changed.\n");
- return;
- }
- priv->pkey_index = new_index;
}
if (level == IPOIB_FLUSH_LIGHT) {
--
1.7.1