All of lore.kernel.org
 help / color / mirror / Atom feed
From: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
To: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	Erez Shitrit <erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
	Saeed Mahameed <saeedm-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org,
	Yevgeny Petrilin
	<yevgenyp-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Subject: Re: [PATCH 0/8] IPoIB: Fix multiple race conditions
Date: Wed, 3 Sep 2014 16:52:20 +0300	[thread overview]
Message-ID: <54071D14.9040404@mellanox.com> (raw)
In-Reply-To: <cover.1407885724.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

On 8/13/2014 2:38 AM, Doug Ledford wrote:
> Locking of multicast joins/leaves in the IPoIB layer have been problematic
> for a while.  There have been recent changes to try and make things better,
> including these changes:
>
> bea1e22 IPoIB: Fix use-after-free of multicast object
> a9c8ba5 IPoIB: Fix usage of uninitialized multicast objects
>
> Unfortunately, the following test still fails (miserably) on a plain
> upstream kernel:
>
> pass=0
> ifdown ib0
> while true; do
>      ifconfig ib0 up
>      ifconfig ib0 down
>      echo "Pass $pass"
>      let pass++
> done
>
> This usually fails within 10 to 20 passes, although I did have a lucky
> run make it to 300 or so.  If you happen to have a P_Key child interface,
> it fails even quicker.

Hi Doug,

Thanks for looking on that. I wasn't aware we're doing so badly... I 
checked here and the plan is that Erez Shitrit from Mellanox will also 
be providing feedback on the series.

Anyway, just to make sure we're on the same page, people agree that 
picking such series is too heavy for post merge window, right? so we are 
talking on 3.18, agree?

Or.

>
> In tracking down the rtnl deadlock in the multicast code, I ended up
> re-designing the flag usage and clean up just the race conditions in
> the various tasks.  Even that wasn't enough to resolve the issue entirely
> though, as if you ran the test above on multiple interfaces simultaneously,
> it could still deadlock.  So in the end I re-did the workqueue usage too
> so that we now use a workqueue per device (and that includes P_Key devices
> have dedicated workqueues) as well as one global workqueue that does
> nothing but flush tasks.  This turns out to be a much more elegant way
> of handling the workqueues and in fact enabled me to remove all of the
> klunky passing around of flush parameters to tell various functions not
> to flush the workqueue if it would deadlock the workqueue we are running
> from.
>
> Here is my test setup:
>
> 2 InfiniBand physical fabrics: ib0 and ib1
> 2 P_Keys on each fabric: default and 0x8002 on ib0 and 0x8003 on ib1
> 4 total IPoIB devices that I have named mlx4_ib0, mlx4_ib0.8002,
>    mlx4_ib1, and mlx4_ib1.8003
>
> In order to test my final patch set, I logged into my test machine on
> four different virtual terminals, I used ifdown on all of the above
> interfaces to get things in a consistent state, and then I ran the above
> loop, one per terminal per interface simultaneously.
>
> It's worth noting here that when you ifconfig a base interface up, it
> automatically brings up the child interface too, so the ifconfig mlx4_ib0
> up is in fact racing with both ups and downs of mlx4_ib0.8002.  The same
> is true for the mlx4_ib1 interface and its child.  With my patch set in
> place, these loops are currently running without a problem and have passed
> 15,000 up/down cycles per interface.
>
> Doug Ledford (8):
>    IPoIB: Consolidate rtnl_lock tasks in workqueue
>    IPoIB: Make the carrier_on_task race aware
>    IPoIB: fix MCAST_FLAG_BUSY usage
>    IPoIB: fix mcast_dev_flush/mcast_restart_task race
>    IPoIB: change init sequence ordering
>    IPoIB: Use dedicated workqueues per interface
>    IPoIB: Make ipoib_mcast_stop_thread flush the workqueue
>    IPoIB: No longer use flush as a parameter
>
>   drivers/infiniband/ulp/ipoib/ipoib.h           |  19 +-
>   drivers/infiniband/ulp/ipoib/ipoib_cm.c        |  18 +-
>   drivers/infiniband/ulp/ipoib/ipoib_ib.c        |  27 ++-
>   drivers/infiniband/ulp/ipoib/ipoib_main.c      |  49 +++--
>   drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 239 ++++++++++++++++---------
>   drivers/infiniband/ulp/ipoib/ipoib_verbs.c     |  22 ++-
>   6 files changed, 240 insertions(+), 134 deletions(-)
>

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

  parent reply	other threads:[~2014-09-03 13:52 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-08-12 23:38 [PATCH 0/8] IPoIB: Fix multiple race conditions Doug Ledford
     [not found] ` <cover.1407885724.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2014-08-12 23:38   ` [PATCH 1/8] IPoIB: Consolidate rtnl_lock tasks in workqueue Doug Ledford
     [not found]     ` <2394730ce5ae1d46522dca04066293dd842edf16.1407885724.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2014-08-15 22:11       ` Wendy Cheng
2014-09-04 14:28       ` Erez Shitrit
2014-08-12 23:38   ` [PATCH 2/8] IPoIB: Make the carrier_on_task race aware Doug Ledford
     [not found]     ` <d05cdce70f1312f35f8be2d14bafd2a06809b137.1407885724.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2014-08-18 23:26       ` Wendy Cheng
     [not found]         ` <CABgxfbE6edfZZ58=mTvhGqWSkCxsik0XuQPR0L-yayze=803cg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-08-19 20:32           ` Doug Ledford
     [not found]             ` <2CC1794B-10CD-49A2-8F5D-0C66A6684DBC-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2014-08-19 22:05               ` Wendy Cheng
2014-09-04 12:13       ` Erez Shitrit
     [not found]         ` <5408576C.7040609-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2014-09-09  7:17           ` Doug Ledford
2014-08-12 23:38   ` [PATCH 3/8] IPoIB: fix MCAST_FLAG_BUSY usage Doug Ledford
     [not found]     ` <a410e80dc5ca7cfa64229bbbf50c1337317e3bd8.1407885724.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2014-08-19 18:08       ` Wendy Cheng
     [not found]         ` <CABgxfbH-Dt3CpxJKwCAZeHTUyupaA9y_WXVXuxgiPMet26PTQw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-08-19 20:28           ` Doug Ledford
     [not found]             ` <902D5BF2-159A-4B31-A87F-7491F3C8057F-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2014-08-25 19:51               ` Wendy Cheng
     [not found]                 ` <CABgxfbHOD75vLdZ0TtWZbk8ne3kHd_eWObxPHmoJ-D8DjE0bkw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-08-25 20:03                   ` Doug Ledford
     [not found]                     ` <E3EFCBAC-2D6E-49D3-A556-DBD40701CC5F-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2014-08-26 18:04                       ` Wendy Cheng
2014-08-12 23:38   ` [PATCH 4/8] IPoIB: fix mcast_dev_flush/mcast_restart_task race Doug Ledford
     [not found]     ` <e1dbcfc25d8930b281aad12699ebf8fa82485b0e.1407885724.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2014-08-29 21:53       ` Wendy Cheng
     [not found]         ` <CABgxfbHDuUrdHuLJT2oD07Cy3Ys4_rj-bJ6eR=9+uv0CuPH7_Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-08-30 15:39           ` Wendy Cheng
     [not found]             ` <CABgxfbGJOdmAn1sokEtisDdnA=r_4mfP=PfqZVsP0cd_oL50dA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-09-03 18:06               ` Doug Ledford
     [not found]                 ` <1409767614.26762.7.camel-v+aXH1h/sVwpzh8Nc7Vzg+562jBIR2Zt@public.gmane.org>
2014-09-03 19:45                   ` Wendy Cheng
2014-09-03 17:49           ` Doug Ledford
2014-08-12 23:38   ` [PATCH 5/8] IPoIB: change init sequence ordering Doug Ledford
     [not found]     ` <ead9800512c1cb412b86cb1de3868c40f07c72be.1407885724.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2014-09-04 12:36       ` Erez Shitrit
2014-08-12 23:38   ` [PATCH 6/8] IPoIB: Use dedicated workqueues per interface Doug Ledford
     [not found]     ` <f7af9c251d722675a549e4a673f46c0f31dfa266.1407885724.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2014-08-20 15:01       ` Estrin, Alex
2014-09-04  6:49       ` Erez Shitrit
     [not found]         ` <54080B6B.8050707-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2014-09-09  7:09           ` Doug Ledford
     [not found]             ` <540EA7B2.2050805-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2014-09-10 17:27               ` Erez Shitrit
2014-08-12 23:38   ` [PATCH 7/8] IPoIB: Make ipoib_mcast_stop_thread flush the workqueue Doug Ledford
     [not found]     ` <ae3912431eeacd81d920a405a6bdeb3853791b1a.1407885724.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2014-08-20 15:03       ` Estrin, Alex
2014-08-12 23:38   ` [PATCH 8/8] IPoIB: No longer use flush as a parameter Doug Ledford
     [not found]     ` <ad7bb2b8da52f187cf2978e6a1c77ead32b60de3.1407885724.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2014-08-20 15:04       ` Estrin, Alex
2014-08-15 22:08   ` [PATCH 0/8] IPoIB: Fix multiple race conditions Wendy Cheng
     [not found]     ` <CABgxfbEGfiNGUKT4NJi1GoDRouFznxpgogDt5yr47TLfwDB7hA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-09-03 16:26       ` Wendy Cheng
2014-09-03 13:52   ` Or Gerlitz [this message]
     [not found]     ` <54071D14.9040404-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2014-09-03 18:12       ` Doug Ledford

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=54071D14.9040404@mellanox.com \
    --to=ogerlitz-vpraknaxozvwk0htik3j/w@public.gmane.org \
    --cc=dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org \
    --cc=erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org \
    --cc=linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org \
    --cc=saeedm-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org \
    --cc=yevgenyp-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.