From: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
To: Wendy Cheng <s.wendy.cheng-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Cc: "linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
	<linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	"roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org"
	<roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Subject: Re: [PATCH 4/8] IPoIB: fix mcast_dev_flush/mcast_restart_task race
Date: Wed, 03 Sep 2014 14:06:54 -0400
Message-ID: <1409767614.26762.7.camel@firewall.xsintricity.com>
In-Reply-To: <CABgxfbGJOdmAn1sokEtisDdnA=r_4mfP=PfqZVsP0cd_oL50dA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>


On Sat, 2014-08-30 at 08:39 -0700, Wendy Cheng wrote:
> On Fri, Aug 29, 2014 at 2:53 PM, Wendy Cheng <s.wendy.cheng-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> > On Tue, Aug 12, 2014 at 4:38 PM, Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> >> Our mcast_dev_flush routine and our mcast_restart_task can race against
> >> each other.  In particular, they both hold the priv->lock while
> >> manipulating the rbtree and while removing mcast entries from the
> >> multicast_list and while adding entries to the remove_list, but they
> >> also both drop their locks prior to doing the actual removes.  The
> >> mcast_dev_flush routine is run entirely under the rtnl lock and so has
> >> at least some locking.  The actual race condition is like this:
> >>
> >> Thread 1                                Thread 2
> >> ifconfig ib0 up
> >>   start multicast join for broadcast
> >>   multicast join completes for broadcast
> >>   start to add more multicast joins
> >>     call mcast_restart_task to add new entries
> >>                                         ifconfig ib0 down
> >>                                           mcast_dev_flush
> >>                                             mcast_leave(mcast A)
> >>     mcast_leave(mcast A)
> >>
> >> As mcast_leave calls ib_sa_multicast_leave, and as member in
> >> core/multicast.c is ref counted, we run into an unbalanced refcount
> >> issue.  To avoid stomping on each other's removes, take the rtnl lock
> >> specifically when we are deleting the entries from the remove list.
> >
> > Isn't "test_and_clear_bit()" atomic so it is unlikely that
> > ib_sa_free_multicast() can run multiple times  ?
> 
> Oops .. how about if the structure itself gets freed ? My bad !

Well, just like the last email, the code you are referring to is in the
original code, and had other issues.  After my patches it does not look
like that.

> However, isn't that the remove_list a local list on the caller's stack
> ? .. and  the original list entry moving (to remove_list) is protected
> by the spin lock (priv->lock), it is unlikely that the
> ib_sa_free_multicast() can operate on the same entry ?

Yes, you're right.  I had it in my mind that the remove_list was part of
the ipoib private dev, not local on the stack.  So we could probably get
away with removing the rtnl lock there.  It would need to be in a later
patch than the one you are reviewing here, though: at this point in the
series there would still be a race between the restart task and the
downing of the interface, because they all still share the same work
queue.  Once we switch to the per device work queues in patch #6, I
think this can happen in parallel safely with the flush task.
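
For readers following along, the point about the on-stack remove_list can
be sketched roughly as below.  This is a userspace illustration only, not
the IPoIB code: struct mcast_entry, struct dev_priv, flush_entries() and
the leave_calls counter are hypothetical names, and a pthread mutex stands
in for priv->lock.  It shows only why entries moved to a caller-local list
under the lock cannot be picked up by a second caller; it does not model
the shared work queue issue mentioned above.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a multicast entry. */
struct mcast_entry {
    struct mcast_entry *next;
    int leave_calls;                    /* times the "leave" step ran on this entry */
};

/* Hypothetical stand-in for the per-device private data. */
struct dev_priv {
    pthread_mutex_t lock;               /* plays the role of priv->lock */
    struct mcast_entry *multicast_list; /* shared list, protected by lock */
};

/*
 * Detach every entry from the shared list while holding the lock, then
 * process the entries from a list that lives only on this caller's stack.
 * Returns the number of entries this caller processed.
 */
int flush_entries(struct dev_priv *priv)
{
    struct mcast_entry *remove_list;    /* local: no other thread can see it */
    int processed = 0;

    assert(priv != NULL);

    pthread_mutex_lock(&priv->lock);
    remove_list = priv->multicast_list; /* take ownership of all entries */
    priv->multicast_list = NULL;        /* later callers find nothing here */
    pthread_mutex_unlock(&priv->lock);

    /* Lock dropped: only this caller holds references to these entries. */
    while (remove_list != NULL) {
        struct mcast_entry *entry = remove_list;

        remove_list = entry->next;
        entry->next = NULL;
        entry->leave_calls++;           /* stands in for the leave/free step */
        processed++;
    }
    return processed;
}
```

A second call to flush_entries() on the same dev_priv finds the shared
list empty and processes nothing, so in this sketch each entry sees the
leave step exactly once.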

> The patch itself is harmless though .. but adding the rtnl_lock is
> really not ideal.
> 
> -- Wendy


-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD



