* [PATCH FIX For-3.19 v4 1/7] IB/ipoib: Fix failed multicast joins/sends
[not found] ` <cover.1421725318.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2015-01-20 3:58 ` Doug Ledford
2015-01-20 3:58 ` [PATCH FIX For-3.19 v4 2/7] IB/ipoib: Add a helper to restart the multicast task Doug Ledford
` (6 subsequent siblings)
7 siblings, 0 replies; 18+ messages in thread
From: Doug Ledford @ 2015-01-20 3:58 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA, roland-DgEjT+Ai2ygdnm+yROfE0A
Cc: Amir Vadai, Eyal Perry, Or Gerlitz, Erez Shitrit, Doug Ledford
The usage of IPOIB_MCAST_RUN as a flag is inconsistent. In some places
it is used to mean "our device is administratively allowed to send
multicast joins/leaves/packets" and in other places it means "our
multicast join task thread is currently running and will process your
request if you put it on the queue". However, this latter meaning is in
fact flawed as there is a race condition between the join task testing
the mcast list and finding it empty of remaining work, dropping the
mcast mutex and also the priv->lock spinlock, and clearing the
IPOIB_MCAST_RUN flag. Further, there are numerous locations that use
the flag in the former fashion, and when all tasks complete and the task
thread clears the RUN flag, all of those other locations will fail to
ever again queue any work. This results in the interface coming up fine
initially, but having problems adding new multicast groups after the
first round of groups have all been added and the RUN flag is cleared by
the join task thread when it thinks it is done. To resolve this issue,
convert all locations in the code to treat the RUN flag as an indicator
that the multicast portion of this interface is in fact administratively
up and joins/leaves/sends can be performed. There is no harm (other
than a slight performance penalty) to never clearing this flag and using
it in this fashion as it simply means that a few places that used to
micro-optimize how often this task was queued on a work queue will now
queue the task a few extra times. We can address that suboptimal
behavior in future patches.
Signed-off-by: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
index bc50dd0d0e4..91b8fe118ec 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
@@ -630,8 +630,6 @@ void ipoib_mcast_join_task(struct work_struct *work)
}
ipoib_dbg_mcast(priv, "successfully joined all multicast groups\n");
-
- clear_bit(IPOIB_MCAST_RUN, &priv->flags);
}
int ipoib_mcast_start_thread(struct net_device *dev)
@@ -641,8 +639,8 @@ int ipoib_mcast_start_thread(struct net_device *dev)
ipoib_dbg_mcast(priv, "starting multicast thread\n");
mutex_lock(&mcast_mutex);
- if (!test_and_set_bit(IPOIB_MCAST_RUN, &priv->flags))
- queue_delayed_work(priv->wq, &priv->mcast_task, 0);
+ set_bit(IPOIB_MCAST_RUN, &priv->flags);
+ queue_delayed_work(priv->wq, &priv->mcast_task, 0);
mutex_unlock(&mcast_mutex);
return 0;
@@ -725,7 +723,7 @@ void ipoib_mcast_send(struct net_device *dev, u8 *daddr, struct sk_buff *skb)
memcpy(mcast->mcmember.mgid.raw, mgid, sizeof (union ib_gid));
__ipoib_mcast_add(dev, mcast);
list_add_tail(&mcast->list, &priv->multicast_list);
- if (!test_and_set_bit(IPOIB_MCAST_RUN, &priv->flags))
+ if (test_bit(IPOIB_MCAST_RUN, &priv->flags))
queue_delayed_work(priv->wq, &priv->mcast_task, 0);
}
@@ -951,7 +949,8 @@ void ipoib_mcast_restart_task(struct work_struct *work)
/*
* Restart our join task if needed
*/
- ipoib_mcast_start_thread(dev);
+ if (test_bit(IPOIB_MCAST_RUN, &priv->flags))
+ queue_delayed_work(priv->wq, &priv->mcast_task, 0);
rtnl_unlock();
}
--
2.1.0
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH FIX For-3.19 v4 2/7] IB/ipoib: Add a helper to restart the multicast task
[not found] ` <cover.1421725318.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-01-20 3:58 ` [PATCH FIX For-3.19 v4 1/7] IB/ipoib: Fix failed multicast joins/sends Doug Ledford
@ 2015-01-20 3:58 ` Doug Ledford
2015-01-20 3:58 ` [PATCH FIX For-3.19 v4 3/7] IB/ipoib: make delayed tasks not hold up everything Doug Ledford
` (5 subsequent siblings)
7 siblings, 0 replies; 18+ messages in thread
From: Doug Ledford @ 2015-01-20 3:58 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA, roland-DgEjT+Ai2ygdnm+yROfE0A
Cc: Amir Vadai, Eyal Perry, Or Gerlitz, Erez Shitrit, Doug Ledford
Add a simple helper to do the right thing when restarting
the multicast thread. Call that helper from all the places
that need to restart the thread. Add two places that didn't
restart the thread, but should have.
Signed-off-by: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 46 ++++++++++++++++----------
1 file changed, 29 insertions(+), 17 deletions(-)
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
index 91b8fe118ec..cb1e495bd74 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
@@ -66,6 +66,13 @@ struct ipoib_mcast_iter {
unsigned int send_only;
};
+static void __ipoib_mcast_continue_join_thread(struct ipoib_dev_priv *priv,
+ int delay)
+{
+ if (test_bit(IPOIB_MCAST_RUN, &priv->flags))
+ queue_delayed_work(priv->wq, &priv->mcast_task, delay);
+}
+
static void ipoib_mcast_free(struct ipoib_mcast *mcast)
{
struct net_device *dev = mcast->dev;
@@ -270,6 +277,7 @@ ipoib_mcast_sendonly_join_complete(int status,
{
struct ipoib_mcast *mcast = multicast->context;
struct net_device *dev = mcast->dev;
+ struct ipoib_dev_priv *priv = netdev_priv(dev);
/*
* We have to take the mutex to force mcast_sendonly_join to
@@ -309,6 +317,7 @@ out:
complete(&mcast->done);
if (status == -ENETRESET)
status = 0;
+ __ipoib_mcast_continue_join_thread(priv, 0);
mutex_unlock(&mcast_mutex);
return status;
}
@@ -360,6 +369,7 @@ static int ipoib_mcast_sendonly_join(struct ipoib_mcast *mcast)
complete(&mcast->done);
ipoib_warn(priv, "ib_sa_join_multicast for sendonly join "
"failed (ret = %d)\n", ret);
+ __ipoib_mcast_continue_join_thread(priv, 0);
} else {
ipoib_dbg_mcast(priv, "no multicast record for %pI6, starting "
"sendonly join\n", mcast->mcmember.mgid.raw);
@@ -431,8 +441,7 @@ static int ipoib_mcast_join_complete(int status,
if (!status) {
mcast->backoff = 1;
- if (test_bit(IPOIB_MCAST_RUN, &priv->flags))
- queue_delayed_work(priv->wq, &priv->mcast_task, 0);
+ __ipoib_mcast_continue_join_thread(priv, 0);
/*
* Defer carrier on work to priv->wq to avoid a
@@ -454,6 +463,13 @@ static int ipoib_mcast_join_complete(int status,
mcast->backoff *= 2;
if (mcast->backoff > IPOIB_MAX_BACKOFF_SECONDS)
mcast->backoff = IPOIB_MAX_BACKOFF_SECONDS;
+ /*
+ * XXX - This is wrong. *Our* join failed, but because the
+ * join thread does the joins in a serial fashion, if there
+ * are any joins behind ours waiting to complete, they should
+ * not be subjected to our backoff delay.
+ */
+ __ipoib_mcast_continue_join_thread(priv, mcast->backoff * HZ);
}
out:
spin_lock_irq(&priv->lock);
@@ -463,9 +479,6 @@ out:
complete(&mcast->done);
if (status == -ENETRESET)
status = 0;
- if (status && test_bit(IPOIB_MCAST_RUN, &priv->flags))
- queue_delayed_work(priv->wq, &priv->mcast_task,
- mcast->backoff * HZ);
spin_unlock_irq(&priv->lock);
mutex_unlock(&mcast_mutex);
@@ -532,10 +545,13 @@ static void ipoib_mcast_join(struct net_device *dev, struct ipoib_mcast *mcast,
mcast->backoff *= 2;
if (mcast->backoff > IPOIB_MAX_BACKOFF_SECONDS)
mcast->backoff = IPOIB_MAX_BACKOFF_SECONDS;
-
- if (test_bit(IPOIB_MCAST_RUN, &priv->flags))
- queue_delayed_work(priv->wq, &priv->mcast_task,
- mcast->backoff * HZ);
+ /*
+ * XXX - This is wrong. *Our* join failed, but because the
+ * join thread does the joins in a serial fashion, if there
+ * are any joins behind ours waiting to complete, they should
+ * not be subjected to our backoff delay.
+ */
+ __ipoib_mcast_continue_join_thread(priv, mcast->backoff * HZ);
}
mutex_unlock(&mcast_mutex);
}
@@ -573,9 +589,7 @@ void ipoib_mcast_join_task(struct work_struct *work)
if (!broadcast) {
ipoib_warn(priv, "failed to allocate broadcast group\n");
mutex_lock(&mcast_mutex);
- if (test_bit(IPOIB_MCAST_RUN, &priv->flags))
- queue_delayed_work(priv->wq, &priv->mcast_task,
- HZ);
+ __ipoib_mcast_continue_join_thread(priv, HZ);
mutex_unlock(&mcast_mutex);
return;
}
@@ -723,8 +737,7 @@ void ipoib_mcast_send(struct net_device *dev, u8 *daddr, struct sk_buff *skb)
memcpy(mcast->mcmember.mgid.raw, mgid, sizeof (union ib_gid));
__ipoib_mcast_add(dev, mcast);
list_add_tail(&mcast->list, &priv->multicast_list);
- if (test_bit(IPOIB_MCAST_RUN, &priv->flags))
- queue_delayed_work(priv->wq, &priv->mcast_task, 0);
+ __ipoib_mcast_continue_join_thread(priv, 0);
}
if (!mcast->ah) {
@@ -947,10 +960,9 @@ void ipoib_mcast_restart_task(struct work_struct *work)
ipoib_mcast_free(mcast);
}
/*
- * Restart our join task if needed
+ * Restart our join task thread if needed
*/
- if (test_bit(IPOIB_MCAST_RUN, &priv->flags))
- queue_delayed_work(priv->wq, &priv->mcast_task, 0);
+ __ipoib_mcast_continue_join_thread(priv, 0);
rtnl_unlock();
}
--
2.1.0
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH FIX For-3.19 v4 3/7] IB/ipoib: make delayed tasks not hold up everything
[not found] ` <cover.1421725318.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-01-20 3:58 ` [PATCH FIX For-3.19 v4 1/7] IB/ipoib: Fix failed multicast joins/sends Doug Ledford
2015-01-20 3:58 ` [PATCH FIX For-3.19 v4 2/7] IB/ipoib: Add a helper to restart the multicast task Doug Ledford
@ 2015-01-20 3:58 ` Doug Ledford
2015-01-20 3:58 ` [PATCH FIX For-3.19 v4 4/7] IB/ipoib: Handle -ENETRESET properly in our callback Doug Ledford
` (4 subsequent siblings)
7 siblings, 0 replies; 18+ messages in thread
From: Doug Ledford @ 2015-01-20 3:58 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA, roland-DgEjT+Ai2ygdnm+yROfE0A
Cc: Amir Vadai, Eyal Perry, Or Gerlitz, Erez Shitrit, Doug Ledford
Currently, the multicast join thread only processes one task at a
time. It does this by having itself be scheduled as a delayed work
item, when it runs, it finds one, and only one, piece of work to
process, it then kicks that off via either the normal join process
or the sendonly join process, and then it immediately exits. Both
of those process chains are responsible for restarting the task
when they have completed their specific action. This makes the entire
join process serial with only one join supposedly ever outstanding at
a time.
However, if we fail a join, and we need to initiate a backoff delay,
that delay holds up the entire join process for all joins, not just
the failed join. So modify the design such that we can have joins
in delay, and the multicast thread will ignore them until their time
comes around, and then we can process the rest of the queue without
waiting for the delayed items.
Signed-off-by: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
drivers/infiniband/ulp/ipoib/ipoib.h | 1 +
drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 86 +++++++++++++++++---------
2 files changed, 57 insertions(+), 30 deletions(-)
diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index 8ba80a6d3a4..c79dcd5ee8a 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -154,6 +154,7 @@ struct ipoib_mcast {
unsigned long created;
unsigned long backoff;
+ unsigned long delay_until;
unsigned long flags;
unsigned char logcount;
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
index cb1e495bd74..957e7d2e80c 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
@@ -67,10 +67,34 @@ struct ipoib_mcast_iter {
};
static void __ipoib_mcast_continue_join_thread(struct ipoib_dev_priv *priv,
+ struct ipoib_mcast *mcast,
int delay)
{
- if (test_bit(IPOIB_MCAST_RUN, &priv->flags))
+ if (test_bit(IPOIB_MCAST_RUN, &priv->flags)) {
+ /*
+ * Mark this mcast for its delay and set a timer to kick the
+ * thread when the delay completes
+ */
+ if (mcast && delay) {
+ mcast->backoff *= 2;
+ if (mcast->backoff > IPOIB_MAX_BACKOFF_SECONDS)
+ mcast->backoff = IPOIB_MAX_BACKOFF_SECONDS;
+ mcast->delay_until = jiffies + (mcast->backoff * HZ);
+ queue_delayed_work(priv->wq, &priv->mcast_task,
+ mcast->backoff * HZ);
+ } else if (delay) {
+ /* Special case of retrying after a failure to
+ * allocate the broadcast multicast group, wait
+ * 1 second and try again
+ */
+ queue_delayed_work(priv->wq, &priv->mcast_task, HZ);
+ }
+ /*
+ * But also rekick the thread immediately for any other
+ * tasks in queue behind this one
+ */
queue_delayed_work(priv->wq, &priv->mcast_task, delay);
+ }
}
static void ipoib_mcast_free(struct ipoib_mcast *mcast)
@@ -110,6 +134,7 @@ static struct ipoib_mcast *ipoib_mcast_alloc(struct net_device *dev,
mcast->dev = dev;
mcast->created = jiffies;
+ mcast->delay_until = jiffies;
mcast->backoff = 1;
INIT_LIST_HEAD(&mcast->list);
@@ -309,6 +334,10 @@ ipoib_mcast_sendonly_join_complete(int status,
dev_kfree_skb_any(skb_dequeue(&mcast->pkt_queue));
}
netif_tx_unlock_bh(dev);
+ } else {
+ /* Join completed, so reset any backoff parameters */
+ mcast->backoff = 1;
+ mcast->delay_until = jiffies;
}
out:
clear_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags);
@@ -317,7 +346,7 @@ out:
complete(&mcast->done);
if (status == -ENETRESET)
status = 0;
- __ipoib_mcast_continue_join_thread(priv, 0);
+ __ipoib_mcast_continue_join_thread(priv, NULL, 0);
mutex_unlock(&mcast_mutex);
return status;
}
@@ -369,7 +398,7 @@ static int ipoib_mcast_sendonly_join(struct ipoib_mcast *mcast)
complete(&mcast->done);
ipoib_warn(priv, "ib_sa_join_multicast for sendonly join "
"failed (ret = %d)\n", ret);
- __ipoib_mcast_continue_join_thread(priv, 0);
+ __ipoib_mcast_continue_join_thread(priv, NULL, 0);
} else {
ipoib_dbg_mcast(priv, "no multicast record for %pI6, starting "
"sendonly join\n", mcast->mcmember.mgid.raw);
@@ -441,7 +470,8 @@ static int ipoib_mcast_join_complete(int status,
if (!status) {
mcast->backoff = 1;
- __ipoib_mcast_continue_join_thread(priv, 0);
+ mcast->delay_until = jiffies;
+ __ipoib_mcast_continue_join_thread(priv, NULL, 0);
/*
* Defer carrier on work to priv->wq to avoid a
@@ -460,16 +490,8 @@ static int ipoib_mcast_join_complete(int status,
}
}
- mcast->backoff *= 2;
- if (mcast->backoff > IPOIB_MAX_BACKOFF_SECONDS)
- mcast->backoff = IPOIB_MAX_BACKOFF_SECONDS;
- /*
- * XXX - This is wrong. *Our* join failed, but because the
- * join thread does the joins in a serial fashion, if there
- * are any joins behind ours waiting to complete, they should
- * not be subjected to our backoff delay.
- */
- __ipoib_mcast_continue_join_thread(priv, mcast->backoff * HZ);
+ /* Requeue this join task with a backoff delay */
+ __ipoib_mcast_continue_join_thread(priv, mcast, 1);
}
out:
spin_lock_irq(&priv->lock);
@@ -541,17 +563,8 @@ static void ipoib_mcast_join(struct net_device *dev, struct ipoib_mcast *mcast,
complete(&mcast->done);
ret = PTR_ERR(mcast->mc);
ipoib_warn(priv, "ib_sa_join_multicast failed, status %d\n", ret);
-
- mcast->backoff *= 2;
- if (mcast->backoff > IPOIB_MAX_BACKOFF_SECONDS)
- mcast->backoff = IPOIB_MAX_BACKOFF_SECONDS;
- /*
- * XXX - This is wrong. *Our* join failed, but because the
- * join thread does the joins in a serial fashion, if there
- * are any joins behind ours waiting to complete, they should
- * not be subjected to our backoff delay.
- */
- __ipoib_mcast_continue_join_thread(priv, mcast->backoff * HZ);
+ /* Requeue this join task with a backoff delay */
+ __ipoib_mcast_continue_join_thread(priv, mcast, 1);
}
mutex_unlock(&mcast_mutex);
}
@@ -589,7 +602,13 @@ void ipoib_mcast_join_task(struct work_struct *work)
if (!broadcast) {
ipoib_warn(priv, "failed to allocate broadcast group\n");
mutex_lock(&mcast_mutex);
- __ipoib_mcast_continue_join_thread(priv, HZ);
+ /*
+ * Restart us after a 1 second delay to retry
+ * creating our broadcast group and attaching to
+ * it. Until this succeeds, this ipoib dev is
+ * completely stalled (multicast wise).
+ */
+ __ipoib_mcast_continue_join_thread(priv, NULL, 1);
mutex_unlock(&mcast_mutex);
return;
}
@@ -623,7 +642,10 @@ void ipoib_mcast_join_task(struct work_struct *work)
list_for_each_entry(mcast, &priv->multicast_list, list) {
if (IS_ERR_OR_NULL(mcast->mc) &&
!test_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags) &&
- !test_bit(IPOIB_MCAST_FLAG_ATTACHED, &mcast->flags)) {
+ !test_bit(IPOIB_MCAST_FLAG_ATTACHED,
+ &mcast->flags) &&
+ (mcast->backoff == 1 ||
+ time_after_eq(jiffies, mcast->delay_until))) {
/* Found the next unjoined group */
break;
}
@@ -632,7 +654,11 @@ void ipoib_mcast_join_task(struct work_struct *work)
mutex_unlock(&mcast_mutex);
if (&mcast->list == &priv->multicast_list) {
- /* All done */
+ /* All done, unless we have delayed work from
+ * backoff retransmissions, but we will get
+ * restarted when the time is right, so we are
+ * done for now
+ */
break;
}
@@ -737,7 +763,7 @@ void ipoib_mcast_send(struct net_device *dev, u8 *daddr, struct sk_buff *skb)
memcpy(mcast->mcmember.mgid.raw, mgid, sizeof (union ib_gid));
__ipoib_mcast_add(dev, mcast);
list_add_tail(&mcast->list, &priv->multicast_list);
- __ipoib_mcast_continue_join_thread(priv, 0);
+ __ipoib_mcast_continue_join_thread(priv, NULL, 0);
}
if (!mcast->ah) {
@@ -962,7 +988,7 @@ void ipoib_mcast_restart_task(struct work_struct *work)
/*
* Restart our join task thread if needed
*/
- __ipoib_mcast_continue_join_thread(priv, 0);
+ __ipoib_mcast_continue_join_thread(priv, NULL, 0);
rtnl_unlock();
}
--
2.1.0
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH FIX For-3.19 v4 4/7] IB/ipoib: Handle -ENETRESET properly in our callback
[not found] ` <cover.1421725318.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
` (2 preceding siblings ...)
2015-01-20 3:58 ` [PATCH FIX For-3.19 v4 3/7] IB/ipoib: make delayed tasks not hold up everything Doug Ledford
@ 2015-01-20 3:58 ` Doug Ledford
2015-01-20 3:58 ` [PATCH FIX For-3.19 v4 5/7] IB/ipoib: don't restart our thread on ENETRESET Doug Ledford
` (3 subsequent siblings)
7 siblings, 0 replies; 18+ messages in thread
From: Doug Ledford @ 2015-01-20 3:58 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA, roland-DgEjT+Ai2ygdnm+yROfE0A
Cc: Amir Vadai, Eyal Perry, Or Gerlitz, Erez Shitrit, Doug Ledford
When the core layer calls our callback with ENETRESET as the error, we
clear the status to 0 before returning indicating that we are going to
handle the error ourselves. This causes the core layer to not free the
mcast->mc structure, but we are releasing our reference to it by
clearing mcast->mc. So, preserve our reference to the multicast
structure so when we next run ipoib_mcast_dev_flush, it will be able
to properly release the mcast->mc entry at the right time in the right
way.
Signed-off-by: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
index 957e7d2e80c..50d2de2270a 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
@@ -315,8 +315,10 @@ ipoib_mcast_sendonly_join_complete(int status,
mutex_lock(&mcast_mutex);
/* We trap for port events ourselves. */
- if (status == -ENETRESET)
+ if (status == -ENETRESET) {
+ status = 0;
goto out;
+ }
if (!status)
status = ipoib_mcast_join_finish(mcast, &multicast->rec);
@@ -344,8 +346,6 @@ out:
if (status)
mcast->mc = NULL;
complete(&mcast->done);
- if (status == -ENETRESET)
- status = 0;
__ipoib_mcast_continue_join_thread(priv, NULL, 0);
mutex_unlock(&mcast_mutex);
return status;
@@ -462,8 +462,10 @@ static int ipoib_mcast_join_complete(int status,
mutex_lock(&mcast_mutex);
/* We trap for port events ourselves. */
- if (status == -ENETRESET)
+ if (status == -ENETRESET) {
+ status = 0;
goto out;
+ }
if (!status)
status = ipoib_mcast_join_finish(mcast, &multicast->rec);
@@ -499,8 +501,6 @@ out:
if (status)
mcast->mc = NULL;
complete(&mcast->done);
- if (status == -ENETRESET)
- status = 0;
spin_unlock_irq(&priv->lock);
mutex_unlock(&mcast_mutex);
--
2.1.0
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH FIX For-3.19 v4 5/7] IB/ipoib: don't restart our thread on ENETRESET
[not found] ` <cover.1421725318.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
` (3 preceding siblings ...)
2015-01-20 3:58 ` [PATCH FIX For-3.19 v4 4/7] IB/ipoib: Handle -ENETRESET properly in our callback Doug Ledford
@ 2015-01-20 3:58 ` Doug Ledford
2015-01-20 3:58 ` [PATCH FIX For-3.19 v4 6/7] IB/ipoib: remove unneeded locks Doug Ledford
` (2 subsequent siblings)
7 siblings, 0 replies; 18+ messages in thread
From: Doug Ledford @ 2015-01-20 3:58 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA, roland-DgEjT+Ai2ygdnm+yROfE0A
Cc: Amir Vadai, Eyal Perry, Or Gerlitz, Erez Shitrit, Doug Ledford
Make the send only join completion handler behave like the normal join
completion handler in that we should restart the multicast join thread
whenever our join completes with either an error or success, but don't
restart the thread if we got a net reset event, let ipoib_event queue
up a device flush, which will then call ipoib_mcast_dev_flush to remove
all of the current mcast entries, and ipoib_mcase_restart_task to check
out current mcast entry list against the netdev list of mcast entries
we are supposed to have and requeue the ones we need to get back.
Signed-off-by: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
index 50d2de2270a..a03b6a04203 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
@@ -336,17 +336,18 @@ ipoib_mcast_sendonly_join_complete(int status,
dev_kfree_skb_any(skb_dequeue(&mcast->pkt_queue));
}
netif_tx_unlock_bh(dev);
+ __ipoib_mcast_continue_join_thread(priv, mcast, 1);
} else {
/* Join completed, so reset any backoff parameters */
mcast->backoff = 1;
mcast->delay_until = jiffies;
+ __ipoib_mcast_continue_join_thread(priv, NULL, 0);
}
out:
clear_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags);
if (status)
mcast->mc = NULL;
complete(&mcast->done);
- __ipoib_mcast_continue_join_thread(priv, NULL, 0);
mutex_unlock(&mcast_mutex);
return status;
}
--
2.1.0
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH FIX For-3.19 v4 6/7] IB/ipoib: remove unneeded locks
[not found] ` <cover.1421725318.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
` (4 preceding siblings ...)
2015-01-20 3:58 ` [PATCH FIX For-3.19 v4 5/7] IB/ipoib: don't restart our thread on ENETRESET Doug Ledford
@ 2015-01-20 3:58 ` Doug Ledford
2015-01-20 3:58 ` [PATCH FIX For-3.19 v4 7/7] IB/ipoib: fix race between mcast_dev_flush and mcast_join Doug Ledford
2015-01-20 16:16 ` [PATCH FIX For-3.19 v4 0/7] IB/ipoib: follow fixes for multicast handling Erez Shitrit
7 siblings, 0 replies; 18+ messages in thread
From: Doug Ledford @ 2015-01-20 3:58 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA, roland-DgEjT+Ai2ygdnm+yROfE0A
Cc: Amir Vadai, Eyal Perry, Or Gerlitz, Erez Shitrit, Doug Ledford
During the various work I've done on this, some extra locking has crept
in when things weren't working right. This is one of those spots.
Remove the unneeded spinlocks. The mutex is enough to protect against
what we need to protect against.
Signed-off-by: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
index a03b6a04203..424c602d5a0 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
@@ -497,12 +497,10 @@ static int ipoib_mcast_join_complete(int status,
__ipoib_mcast_continue_join_thread(priv, mcast, 1);
}
out:
- spin_lock_irq(&priv->lock);
clear_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags);
if (status)
mcast->mc = NULL;
complete(&mcast->done);
- spin_unlock_irq(&priv->lock);
mutex_unlock(&mcast_mutex);
return status;
--
2.1.0
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH FIX For-3.19 v4 7/7] IB/ipoib: fix race between mcast_dev_flush and mcast_join
[not found] ` <cover.1421725318.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
` (5 preceding siblings ...)
2015-01-20 3:58 ` [PATCH FIX For-3.19 v4 6/7] IB/ipoib: remove unneeded locks Doug Ledford
@ 2015-01-20 3:58 ` Doug Ledford
2015-01-20 16:16 ` [PATCH FIX For-3.19 v4 0/7] IB/ipoib: follow fixes for multicast handling Erez Shitrit
7 siblings, 0 replies; 18+ messages in thread
From: Doug Ledford @ 2015-01-20 3:58 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA, roland-DgEjT+Ai2ygdnm+yROfE0A
Cc: Amir Vadai, Eyal Perry, Or Gerlitz, Erez Shitrit, Doug Ledford
There is a classic race between mcast_dev_flush and mcast_join_thread
in that the join thread is going to loop through all of the mcast
entries in the device mcast list, while dev_flush wants to remove
those entries from the list so that they can eventually be freed when
they are no longer busy.
As always, the list traversal and the list removal must be done under
the same lock to avoid races, so modify the join_task thread code to
hold onto the spinlock for the entirety of its run up until it has
saved the mcast entry that it's going to work on into a private pointer
and set the flags on that entry to mark it as busy. Only then is it
safe to drop the lock and let mcast_dev_flush remove entries from the
list.
I also reworked the flow of the join_task thread to be more clear that
it is making a single pass at work to do and then exiting as soon as it
gets that single task queued off.
I then reworked both the regular and send only join routines to know
that they will be called with the BUSY flag already set on the mcast
entry they are passed, so any error condition means they need to
clear the flag and complete the task and they should not check the
busy status as it will now always be true.
Signed-off-by: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 112 ++++++++++++-------------
1 file changed, 56 insertions(+), 56 deletions(-)
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
index 424c602d5a0..972fa15a225 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
@@ -368,22 +368,16 @@ static int ipoib_mcast_sendonly_join(struct ipoib_mcast *mcast)
if (!test_bit(IPOIB_FLAG_OPER_UP, &priv->flags)) {
ipoib_dbg_mcast(priv, "device shutting down, no sendonly "
"multicast joins\n");
+ clear_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags);
+ complete(&mcast->done);
return -ENODEV;
}
- if (test_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags)) {
- ipoib_dbg_mcast(priv, "multicast entry busy, skipping "
- "sendonly join\n");
- return -EBUSY;
- }
-
rec.mgid = mcast->mcmember.mgid;
rec.port_gid = priv->local_gid;
rec.pkey = cpu_to_be16(priv->pkey);
mutex_lock(&mcast_mutex);
- init_completion(&mcast->done);
- set_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags);
mcast->mc = ib_sa_join_multicast(&ipoib_sa_client, priv->ca,
priv->port, &rec,
IB_SA_MCMEMBER_REC_MGID |
@@ -552,8 +546,6 @@ static void ipoib_mcast_join(struct net_device *dev, struct ipoib_mcast *mcast,
}
mutex_lock(&mcast_mutex);
- init_completion(&mcast->done);
- set_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags);
mcast->mc = ib_sa_join_multicast(&ipoib_sa_client, priv->ca, priv->port,
&rec, comp_mask, GFP_KERNEL,
ipoib_mcast_join_complete, mcast);
@@ -574,6 +566,8 @@ void ipoib_mcast_join_task(struct work_struct *work)
container_of(work, struct ipoib_dev_priv, mcast_task.work);
struct net_device *dev = priv->dev;
struct ib_port_attr port_attr;
+ struct ipoib_mcast *mcast = NULL;
+ int create = 1;
if (!test_bit(IPOIB_MCAST_RUN, &priv->flags))
return;
@@ -591,16 +585,25 @@ void ipoib_mcast_join_task(struct work_struct *work)
else
memcpy(priv->dev->dev_addr + 4, priv->local_gid.raw, sizeof (union ib_gid));
+ /*
+ * We have to hold the mutex to keep from racing with the join
+ * completion threads on setting flags on mcasts, and we have
+ * to hold the priv->lock because dev_flush will remove entries
+ * out from underneath us, so at a minimum we need the lock
+ * through the time that we do the for_each loop of the mcast
+ * list or else dev_flush can make us oops.
+ */
+ mutex_lock(&mcast_mutex);
+ spin_lock_irq(&priv->lock);
+ if (!test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags))
+ goto out;
+
if (!priv->broadcast) {
struct ipoib_mcast *broadcast;
- if (!test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags))
- return;
-
- broadcast = ipoib_mcast_alloc(dev, 1);
+ broadcast = ipoib_mcast_alloc(dev, 0);
if (!broadcast) {
ipoib_warn(priv, "failed to allocate broadcast group\n");
- mutex_lock(&mcast_mutex);
/*
* Restart us after a 1 second delay to retry
* creating our broadcast group and attaching to
@@ -608,67 +611,64 @@ void ipoib_mcast_join_task(struct work_struct *work)
* completely stalled (multicast wise).
*/
__ipoib_mcast_continue_join_thread(priv, NULL, 1);
- mutex_unlock(&mcast_mutex);
- return;
+ goto out;
}
- spin_lock_irq(&priv->lock);
memcpy(broadcast->mcmember.mgid.raw, priv->dev->broadcast + 4,
sizeof (union ib_gid));
priv->broadcast = broadcast;
__ipoib_mcast_add(dev, priv->broadcast);
- spin_unlock_irq(&priv->lock);
}
if (!test_bit(IPOIB_MCAST_FLAG_ATTACHED, &priv->broadcast->flags)) {
if (IS_ERR_OR_NULL(priv->broadcast->mc) &&
- !test_bit(IPOIB_MCAST_FLAG_BUSY, &priv->broadcast->flags))
- ipoib_mcast_join(dev, priv->broadcast, 0);
- return;
- }
-
- while (1) {
- struct ipoib_mcast *mcast = NULL;
-
- /*
- * Need the mutex so our flags are consistent, need the
- * priv->lock so we don't race with list removals in either
- * mcast_dev_flush or mcast_restart_task
- */
- mutex_lock(&mcast_mutex);
- spin_lock_irq(&priv->lock);
- list_for_each_entry(mcast, &priv->multicast_list, list) {
- if (IS_ERR_OR_NULL(mcast->mc) &&
- !test_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags) &&
- !test_bit(IPOIB_MCAST_FLAG_ATTACHED,
- &mcast->flags) &&
- (mcast->backoff == 1 ||
- time_after_eq(jiffies, mcast->delay_until))) {
- /* Found the next unjoined group */
- break;
- }
+ !test_bit(IPOIB_MCAST_FLAG_BUSY, &priv->broadcast->flags)) {
+ mcast = priv->broadcast;
+ create = 0;
}
- spin_unlock_irq(&priv->lock);
- mutex_unlock(&mcast_mutex);
+ goto out;
+ }
- if (&mcast->list == &priv->multicast_list) {
- /* All done, unless we have delayed work from
- * backoff retransmissions, but we will get
- * restarted when the time is right, so we are
- * done for now
- */
+ /* We'll never get here until the broadcast group is both allocated
+ * and attached
+ */
+ list_for_each_entry(mcast, &priv->multicast_list, list) {
+ if (IS_ERR_OR_NULL(mcast->mc) &&
+ !test_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags) &&
+ !test_bit(IPOIB_MCAST_FLAG_ATTACHED, &mcast->flags) &&
+ (mcast->backoff == 1 ||
+ time_after_eq(jiffies, mcast->delay_until))) {
+ /* Found the next unjoined group */
break;
}
+ }
+ if (&mcast->list == &priv->multicast_list) {
+ /* All done, unless we have delayed work from
+ * backoff retransmissions, but we will get
+ * restarted when the time is right, so we are
+ * done for now
+ */
+ mcast = NULL;
+ ipoib_dbg_mcast(priv, "successfully joined all "
+ "multicast groups\n");
+ }
+
+out:
+ if (mcast) {
+ init_completion(&mcast->done);
+ set_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags);
+ }
+ spin_unlock_irq(&priv->lock);
+ mutex_unlock(&mcast_mutex);
+ if (mcast) {
if (test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags))
ipoib_mcast_sendonly_join(mcast);
else
- ipoib_mcast_join(dev, mcast, 1);
- return;
+ ipoib_mcast_join(dev, mcast, create);
}
-
- ipoib_dbg_mcast(priv, "successfully joined all multicast groups\n");
+ return;
}
int ipoib_mcast_start_thread(struct net_device *dev)
--
2.1.0
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH FIX For-3.19 v4 0/7] IB/ipoib: follow fixes for multicast handling
[not found] ` <cover.1421725318.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
` (6 preceding siblings ...)
2015-01-20 3:58 ` [PATCH FIX For-3.19 v4 7/7] IB/ipoib: fix race between mcast_dev_flush and mcast_join Doug Ledford
@ 2015-01-20 16:16 ` Erez Shitrit
[not found] ` <54BE7F66.4070404-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
7 siblings, 1 reply; 18+ messages in thread
From: Erez Shitrit @ 2015-01-20 16:16 UTC (permalink / raw)
To: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
roland-DgEjT+Ai2ygdnm+yROfE0A
Cc: Amir Vadai, Eyal Perry, Or Gerlitz, Erez Shitrit
On 1/20/2015 5:58 AM, Doug Ledford wrote:
> These patches are to resolve issues created by my previous patch set.
> While that set worked fine in my testing, there were problems with
> multicast joins after the initial set of joins had completed. Since my
> testing relied upon the normal set of multicast joins that happen
> when the interface is first brought up, I missed those problems.
>
> Symptoms vary from failure to send packets due to a failed join, to
> loss of connectivity after a subnet manager restart, to failure
> to properly release multicast groups on shutdown resulting in hangs
> when the mlx4 driver attempts to unload itself via its reboot
> notifier handler.
>
> This set of patches has passed a number of tests above and beyond my
> original tests. As suggested by Or Gerlitz I added IPv6 and IPv4
> multicast tests. I also added both subnet manager restarts and
> manual shutdown/restart of individual ports at the switch in order to
> ensure that the ENETRESET path was properly tested. I included
> testing, then a subnet manager restart, then a quiescent period for
> caches to expire, then restarting testing to make sure that arp and
> neighbor discovery work after the subnet manager restart.
>
> All in all, I have not been able to trip the multicast joins up any
> longer.
>
> Additionally, the original impetus for my first 8 patch set was that
> it was simply too easy to break the IPoIB subsystem with this simple
> loop:
>
> while true; do
> ifconfig ib0 up
> ifconfig ib0 down
> done
>
> Just to be safe, I made sure this problem did not resurface.
>
> Roland, the 3.19-rc code is broken. We either need to revert my
> original patchset, or grab these, but I would not recommend leaving
> it as it currently stands.
>
> Doug Ledford (7):
> IB/ipoib: Fix failed multicast joins/sends
> IB/ipoib: Add a helper to restart the multicast task
> IB/ipoib: make delayed tasks not hold up everything
> IB/ipoib: Handle -ENETRESET properly in our callback
> IB/ipoib: don't restart our thread on ENETRESET
> IB/ipoib: remove unneeded locks
> IB/ipoib: fix race between mcast_dev_flush and mcast_join
>
> drivers/infiniband/ulp/ipoib/ipoib.h | 1 +
> drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 204 +++++++++++++++----------
> 2 files changed, 121 insertions(+), 84 deletions(-)
>
Hi Doug,
After trying your V4 patch series, I can tell that first, the endless
scheduling of
the mcast task is indeed over, but still, the multicast functionality in
ipoib is unstable.
I see that there are times that ping6 works good, and sometimes it
doesn't, to
make it clear I always use the link-local address assigned by the stack
to the
IPoIB device, see [1] below for how I run it.
I also see that send-only mcast stops working from time to time, see [2]
below
for how I run this. I can narrow the problem to be on the sender
(client) side,
since I work with a peer node which has well functioning IPoIB multicast
code.
One more phenomena, that in some cases I can see that the driver (after the
mcast_debug_level is set) prints endless message:
"ib0: no address vector, but multicast join already started"
One practical solution here would be to revert the offending commit 3.19-rc1
016d9fb "IPoIB: fix MCAST_FLAG_BUSY usage".
Thanks, Erez
1] IPv6 ping
$ ping6 fe80::202:c903:9f:3b0a -I ib0
where the IPv6 address is the one displayed by "ip addr show dev ib0" on
the remote node
[2] IPv4 multicast
# server
$ route add -net 224.0.0.0 netmask 240.0.0.0 dev ib0
$ netserver
# client
$ route add -net 224.0.0.0 netmask 240.0.0.0 dev ib0
$ netperf -H 11.134.33.1 -t omni -- -H 225.5.5.4 -T udp -R 1
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 18+ messages in thread