* Hang in ipoib_mcast_stop_thread
@ 2016-06-06  9:09 Nikolay Borisov
       [not found] ` <57553DE7.2060009-6AxghH7DbtA@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Nikolay Borisov @ 2016-06-06  9:09 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, SiteGround Operations, Tejun Heo

Hello,

[Tejun, I have cc'ed you since I'm not familiar with the internals of
the workqueue code and cannot tell whether the mcast_task was queued
or whether there were any tasks in the workqueue based on the
structures I've presented below; could you please comment on that?]

I've been running an infiniband network on 4.4.10 and recently the
infiniband fabric failed (I presume), leaving the following in my logs:

[128340.462147] ib0: transmit timeout: latency 45014865 msecs
[128340.462151] ib0: queue stopped 1, tx_head 711, tx_tail 583
[128341.461122] ib0: transmit timeout: latency 45015864 msecs
[128341.461126] ib0: queue stopped 1, tx_head 711, tx_tail 583
[128342.461053] ib0: transmit timeout: latency 45016864 msecs
[128342.461056] ib0: queue stopped 1, tx_head 711, tx_tail 583

I had various commands hang because rtnl_lock was held by an ip
command that was trying to shut down the infiniband interface:

PID: 30572 TASK: ffff881bb6bf6e00 CPU: 15 COMMAND: "ip"
#0 [ffff88027f50b390] __schedule at ffffffff81631421
#1 [ffff88027f50b430] schedule at ffffffff81631cc7
#2 [ffff88027f50b450] schedule_timeout at ffffffff81634c89
#3 [ffff88027f50b4e0] wait_for_completion at ffffffff81632d73
#4 [ffff88027f50b550] flush_workqueue at ffffffff8106c761
#5 [ffff88027f50b620] ipoib_mcast_stop_thread at ffffffffa02e0bf2 [ib_ipoib]
#6 [ffff88027f50b650] ipoib_ib_dev_down at ffffffffa02de7d6 [ib_ipoib]
#7 [ffff88027f50b670] ipoib_stop at ffffffffa02dc208 [ib_ipoib]
#8 [ffff88027f50b6a0] __dev_close_many at ffffffff81562692
#9 [ffff88027f50b6c0] __dev_close at ffffffff81562716
#10 [ffff88027f50b6f0] __dev_change_flags at ffffffff815630fc
#11 [ffff88027f50b730] dev_change_flags at ffffffff81563207
#12 [ffff88027f50b760] do_setlink at ffffffff81576f5f
#13 [ffff88027f50b860] rtnl_newlink at ffffffff81578afb
#14 [ffff88027f50bb10] rtnetlink_rcv_msg at ffffffff81577ae5
#15 [ffff88027f50bb90] netlink_rcv_skb at ffffffff8159a373
#16 [ffff88027f50bbc0] rtnetlink_rcv at ffffffff81577bb5
#17 [ffff88027f50bbe0] netlink_unicast at ffffffff81599e98
#18 [ffff88027f50bc40] netlink_sendmsg at ffffffff8159aabe
#19 [ffff88027f50bd00] sock_sendmsg at ffffffff81548ce9
#20 [ffff88027f50bd20] ___sys_sendmsg at ffffffff8154ae78
#21 [ffff88027f50beb0] __sys_sendmsg at ffffffff8154b059
#22 [ffff88027f50bf40] sys_sendmsg at ffffffff8154b0a9
#23 [ffff88027f50bf50] entry_SYSCALL_64_fastpath at ffffffff81635c57


So clearly ipoib_mcast_stop_thread has hung while trying to stop the
multicast thread. Here is the state of the ipoib_wq:

struct workqueue_struct {
  pwqs = {
    next = 0xffff883ff1196770,
    prev = 0xffff883ff1196770
  },
  list = {
    next = 0xffff883fef5bea10,
    prev = 0xffff883fef201010
  },
  mutex = {
    count = {
      counter = 1
    },
    wait_lock = {
      {
        rlock = {
          raw_lock = {
            val = {
              counter = 0
            }
          }
        }
      }
    },
    wait_list = {
      next = 0xffff883fef200c28,
      prev = 0xffff883fef200c28
    },
    owner = 0x0,
    osq = {
      tail = {
        counter = 0
      }
    }
  },
  work_color = 1,
  flush_color = 0,
  nr_pwqs_to_flush = {
    counter = 1
  },
  first_flusher = 0xffff88027f50b568,
  flusher_queue = {
    next = 0xffff883fef200c60,
    prev = 0xffff883fef200c60
  },
  flusher_overflow = {
    next = 0xffff883fef200c70,
    prev = 0xffff883fef200c70
  },
  maydays = {
    next = 0xffff883fef200c80,
    prev = 0xffff883fef200c80
  },
  rescuer = 0xffff883feb073680,
  nr_drainers = 0,
  saved_max_active = 1,
  unbound_attrs = 0xffff883fef2b5c20,
  dfl_pwq = 0xffff883ff1196700,
  wq_dev = 0x0,
  name =
"ipoib_wq\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",
  rcu = {
    next = 0x0,
    func = 0x0
  },
  flags = 131082,
  cpu_pwqs = 0x0,

Also the state of the mcast_task member:

crash> struct ipoib_dev_priv.mcast_task ffff883fed6ec700
  mcast_task = {
    work = {
      data = {
        counter = 3072
      },
      entry = {
        next = 0xffff883fed6ec888,
        prev = 0xffff883fed6ec888
      },
      func = 0xffffffffa02e1220 <ipoib_mcast_join_task>
    },
    timer = {
      entry = {
        next = 0x0,
        pprev = 0x0
      },
      expires = 0,
      function = 0xffffffff8106dd20 <delayed_work_timer_fn>,
      data = 18446612406880618624,
      flags = 2097164,
      slack = -1
    },
    wq = 0x0,
    cpu = 0
  }
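
My naive reading of the above, which is exactly the part I would like
Tejun to sanity-check: work->data has bit 0 (WORK_STRUCT_PENDING_BIT)
clear, since 3072 is 0xc00, and the timer is not armed either
(entry.pprev = 0x0), so the mcast_task does not even look queued. In
code terms I mean something like the check below, which is only an
illustration on my part and not something taken from the driver:

#include <linux/workqueue.h>
#include "ipoib.h"

/*
 * Illustration only: delayed_work_pending() tests WORK_STRUCT_PENDING_BIT
 * (bit 0) of work->data.  With data.counter == 3072 (0xc00) that bit is
 * clear, so this would return false for the mcast_task shown above.
 */
static bool ipoib_mcast_join_pending(struct ipoib_dev_priv *priv)
{
	return delayed_work_pending(&priv->mcast_task);
}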


flush_workqueue is essentially waiting on
wait_for_completion(&this_flusher.done), which apparently never
returns. I'm assuming this is due to the general unavailability of the
infiniband network. However, I think it's wrong for the whole server to
be rendered unresponsive, due to rtnl_lock being held, just because
infiniband is down. Do you think it is possible to rework this code so
that it doesn't hang when the workqueue cannot be flushed? Furthermore,
do you think it's feasible to put code in ipoib_mcast_join_task to
detect such situations and not re-arm itself, and then use
cancel_delayed_work_sync?
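
Something along the lines of the sketch below is what I have in mind.
This is only a rough illustration against my reading of the 4.4 driver,
not a tested patch, and the assumption that ipoib_mcast_join_task could
simply check IPOIB_FLAG_OPER_UP before re-arming itself may well be
missing something:

/* Rough sketch only, not a tested patch. */
static void ipoib_mcast_stop_thread_sketch(struct net_device *dev)
{
	struct ipoib_dev_priv *priv = netdev_priv(dev);

	ipoib_dbg_mcast(priv, "stopping multicast thread\n");

	/* Tell the join task not to re-arm itself... */
	clear_bit(IPOIB_FLAG_OPER_UP, &priv->flags);

	/*
	 * ...and wait only for this one work item instead of flushing the
	 * whole ipoib_wq, so the wait is bounded by the currently running
	 * instance of ipoib_mcast_join_task rather than by everything
	 * queued on the workqueue.
	 */
	cancel_delayed_work_sync(&priv->mcast_task);
}

But as I said, I don't know the multicast state machine well enough to
tell whether skipping the full flush would actually be safe here.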

* Re: Hang in ipoib_mcast_stop_thread
       [not found] ` <57553DE7.2060009-6AxghH7DbtA@public.gmane.org>
@ 2016-06-06 11:51   ` Erez Shitrit
       [not found]     ` <CAAk-MO_6GMm1AHay4uQ4yZ8HHcH_Dk=Ls5gKSVZUCejexDahLQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Erez Shitrit @ 2016-06-06 11:51 UTC (permalink / raw)
  To: Nikolay Borisov
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	SiteGround Operations, Tejun Heo

Hi Nikolay

What was the scenario? (Do you have a way to reproduce that?)
Which IB card are you using?

Thanks, Erez


* Re: Hang in ipoib_mcast_stop_thread
       [not found]     ` <CAAk-MO_6GMm1AHay4uQ4yZ8HHcH_Dk=Ls5gKSVZUCejexDahLQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2016-06-06 12:11       ` Nikolay Borisov
       [not found]         ` <57556866.8040703-6AxghH7DbtA@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Nikolay Borisov @ 2016-06-06 12:11 UTC (permalink / raw)
  To: Erez Shitrit
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, SiteGround Operations



On 06/06/2016 02:51 PM, Erez Shitrit wrote:
> Hi Nikolay
> 
> What was the scenario? (Do you have a way to reproduce that?)

I have no way of reliably reproducing that, but I have experienced
mysterious hangs in the ipoib stack, e.g.

https://marc.info/?l=linux-rdma&m=145915284709901

when the connection to the infiniband network is lost. Unfortunately I
cannot even figure out where to begin debugging this ;(

As a matter of fact I currently have one server which complains:

ib0: transmit timeout: latency 139718550 msecs
ib0: queue stopped 1, tx_head 242, tx_tail 114

yet, "iblinkinfo" can show all the nodes in the infiniband network.

> Which IB card are you using?

ibstat
CA 'qib0'
	CA type: InfiniPath_QLE7342
	Number of ports: 2
	Firmware version:
	Hardware version: 2
	Node GUID: 0x001175000077b918
	System image GUID: 0x001175000077b918
	Port 1:
		State: Active
		Physical state: LinkUp
		Rate: 40
		Base lid: 68
		LMC: 0
		SM lid: 61
		Capability mask: 0x07610868
		Port GUID: 0x001175000077b918
		Link layer: InfiniBand




* Re: Hang in ipoib_mcast_stop_thread
       [not found]         ` <57556866.8040703-6AxghH7DbtA@public.gmane.org>
@ 2016-06-06 12:25           ` Yuval Shaia
       [not found]             ` <20160606122558.GB10894-Hxa29pjIrETlQW142y8m19+IiqhCXseY@public.gmane.org>
  2016-06-06 12:57           ` Erez Shitrit
  1 sibling, 1 reply; 9+ messages in thread
From: Yuval Shaia @ 2016-06-06 12:25 UTC (permalink / raw)
  To: Nikolay Borisov
  Cc: Erez Shitrit, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	SiteGround Operations

On Mon, Jun 06, 2016 at 03:11:18PM +0300, Nikolay Borisov wrote:
> 
> 
> On 06/06/2016 02:51 PM, Erez Shitrit wrote:
> > Hi Nikolay
> > 
> > What was the scenario? (Do you have a way to reproduce that?)
> 
> I have no way of reliably reproducing that, but I have experienced
> mysterious hangs in the ipoib stack, e.g.
> 
> https://marc.info/?l=linux-rdma&m=145915284709901
> 
> when the connection to the infiniband network is lost. Unfortunately I
> cannot even figure out where to begin debugging this ;(
> 
> As a matter of fact I currently have one server which complains:
> 
> ib0: transmit timeout: latency 139718550 msecs
> ib0: queue stopped 1, tx_head 242, tx_tail 114

What I can tell from our experience is that this issue is not new; we have
it with older kernels (< 3).
Also, I'm not sure it is an issue with the HCA vendor, as we see it with CX3.

This is a very slippery bug which we find hard to reproduce.


* Re: Hang in ipoib_mcast_stop_thread
       [not found]             ` <20160606122558.GB10894-Hxa29pjIrETlQW142y8m19+IiqhCXseY@public.gmane.org>
@ 2016-06-06 12:36               ` Nikolay Borisov
       [not found]                 ` <57556E67.4050904-6AxghH7DbtA@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Nikolay Borisov @ 2016-06-06 12:36 UTC (permalink / raw)
  To: Yuval Shaia
  Cc: Erez Shitrit, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	SiteGround Operations



On 06/06/2016 03:25 PM, Yuval Shaia wrote:
> On Mon, Jun 06, 2016 at 03:11:18PM +0300, Nikolay Borisov wrote:
>>
>>
>> On 06/06/2016 02:51 PM, Erez Shitrit wrote:
>>> Hi Nikolay
>>>
>>> What was the scenario? (Do you have a way to reproduce that?)
>>
>> I have no way of reliably reproducing that, but I have experienced
>> mysterious hangs in the ipoib stack, e.g.
>>
>> https://marc.info/?l=linux-rdma&m=145915284709901
>>
>> when the connection to the infiniband network is lost. Unfortunately I
>> cannot even figure out where to begin debugging this ;(
>>
>> As a matter of fact I currently have one server which complains:
>>
>> ib0: transmit timeout: latency 139718550 msecs
>> ib0: queue stopped 1, tx_head 242, tx_tail 114
> 
> What I can tell from our experience is that this issue is not new; we have
> it with older kernels (< 3).
> Also, I'm not sure it is an issue with the HCA vendor, as we see it with CX3.

Are you using ipoib or pure RDMA?


* Re: Hang in ipoib_mcast_stop_thread
       [not found]         ` <57556866.8040703-6AxghH7DbtA@public.gmane.org>
  2016-06-06 12:25           ` Yuval Shaia
@ 2016-06-06 12:57           ` Erez Shitrit
       [not found]             ` <CAAk-MO-T_9J2jtX+gtJQXeMVTW6bxbRUrq3mHRepVPatbHFO9g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  1 sibling, 1 reply; 9+ messages in thread
From: Erez Shitrit @ 2016-06-06 12:57 UTC (permalink / raw)
  To: Nikolay Borisov
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, SiteGround Operations

On Mon, Jun 6, 2016 at 3:11 PM, Nikolay Borisov <kernel-6AxghH7DbtA@public.gmane.org> wrote:
>
>
> On 06/06/2016 02:51 PM, Erez Shitrit wrote:
>> Hi Nikolay
>>
>> What was the scenario? (Do you have a way to reproduce that?)
>
> I have no way of reliably reproducing that, but I have experienced
> mysterious hangs in the ipoib stack, e.g.
>
> https://marc.info/?l=linux-rdma&m=145915284709901

I am not sure this is the same case here; this one might be due to a
missing event, or a CM issue.
(Anyway, missing events in the HW can explain both cases.)

>
> when the connection to the infiniband network is lost. Unfortunately I
> cannot even figure out where to begin debugging this ;(
>
> As a matter of fact I currently have one server which complains:
>
> ib0: transmit timeout: latency 139718550 msecs
> ib0: queue stopped 1, tx_head 242, tx_tail 114
>
> yet, "iblinkinfo" can show all the nodes in the infiniband network.
>
>> Which IB card are you using?
>
> ibstat
> CA 'qib0'
>         CA type: InfiniPath_QLE7342
>         Number of ports: 2
>         Firmware version:
>         Hardware version: 2
>         Node GUID: 0x001175000077b918
>         System image GUID: 0x001175000077b918
>         Port 1:
>                 State: Active
>                 Physical state: LinkUp
>                 Rate: 40
>                 Base lid: 68
>                 LMC: 0
>                 SM lid: 61
>                 Capability mask: 0x07610868
>                 Port GUID: 0x001175000077b918
>                 Link layer: InfiniBand
>
>
>
>>
>> Thanks, Erez
>>
>> On Mon, Jun 6, 2016 at 12:09 PM, Nikolay Borisov <kernel-6AxghH7DbtA@public.gmane.org> wrote:
>>> Hello,
>>>
>>> [Tejun, I have cc'ed you since I'm not familiar with the internals of
>>> the workqueue code and cannot tell whether the mcast_task was queued
>>> or whether there were any tasks in the workqueue based on the
>>> structures I've presented below; could you please comment on that?]
>>>
>>> I've been running an infiniband network on 4.4.10 and recently the
>>> infiniband fabric failed (I presume), leaving the following in my logs:
>>>
>>> [128340.462147] ib0: transmit timeout: latency 45014865 msecs
>>> [128340.462151] ib0: queue stopped 1, tx_head 711, tx_tail 583
>>> [128341.461122] ib0: transmit timeout: latency 45015864 msecs
>>> [128341.461126] ib0: queue stopped 1, tx_head 711, tx_tail 583
>>> [128342.461053] ib0: transmit timeout: latency 45016864 msecs
>>> [128342.461056] ib0: queue stopped 1, tx_head 711, tx_tail 583

This is the tx_timeout callback from the IPoIB driver. It indicates
that there are packets (skb's) that have been held in the driver for more
than the watchdog time (probably 5 sec) while the tx queue is stopped by
the driver.
It can happen while the HW is under stress and fails to drain the tx queue.
It can happen, but it should be rare (depends on your specific HW).
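
To make it a bit more concrete, the path is roughly as in the sketch
below (paraphrased from memory, not the verbatim driver code): the net
core watchdog invokes the driver's ndo_tx_timeout handler once the queue
has been stopped for longer than the watchdog timeout, and the IPoIB
handler only prints these warnings, it does not reset anything by itself:

/* Rough sketch of the ndo_tx_timeout handler, not the exact driver code. */
static void ipoib_timeout_sketch(struct net_device *dev)
{
	struct ipoib_dev_priv *priv = netdev_priv(dev);

	/* Called by dev_watchdog() after the tx queue has been stopped
	 * for more than dev->watchdog_timeo. */
	ipoib_warn(priv, "transmit timeout: latency %d msecs\n",
		   jiffies_to_msecs(jiffies - dev->trans_start));
	ipoib_warn(priv, "queue stopped %d, tx_head %u, tx_tail %u\n",
		   netif_queue_stopped(dev), priv->tx_head, priv->tx_tail);
	/* Nothing here restarts the queue; it is woken again only from the
	 * tx completion path, once the HW drains enough of the tx ring. */
}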


* Re: Hang in ipoib_mcast_stop_thread
       [not found]                 ` <57556E67.4050904-6AxghH7DbtA@public.gmane.org>
@ 2016-06-06 13:06                   ` Yuval Shaia
  0 siblings, 0 replies; 9+ messages in thread
From: Yuval Shaia @ 2016-06-06 13:06 UTC (permalink / raw)
  To: Nikolay Borisov
  Cc: Erez Shitrit, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	SiteGround Operations

On Mon, Jun 06, 2016 at 03:36:55PM +0300, Nikolay Borisov wrote:
> 
> 
> On 06/06/2016 03:25 PM, Yuval Shaia wrote:
> > On Mon, Jun 06, 2016 at 03:11:18PM +0300, Nikolay Borisov wrote:
> >>
> >>
> >> On 06/06/2016 02:51 PM, Erez Shitrit wrote:
> >>> Hi Nikolay
> >>>
> >>> What was the scenario? (Do you have a way to reproduce that?)
> >>
> >> I have no way of reliably reproducing that, but I have experienced
> >> mysterious hangs in the ipoib stack, e.g.
> >>
> >> https://marc.info/?l=linux-rdma&m=145915284709901
> >>
> >> when the connection to the infiniband network is lost. Unfortunately I
> >> cannot even figure out where to begin debugging this ;(
> >>
> >> As a matter of fact I currently have one server which complains:
> >>
> >> ib0: transmit timeout: latency 139718550 msecs
> >> ib0: queue stopped 1, tx_head 242, tx_tail 114
> > 
> > What I can tell from our experience is that this issue is not new; we have
> > it with older kernels (< 3).
> > Also, I'm not sure it is an issue with the HCA vendor, as we see it with CX3.
> 
> Are you using ipoib or pure RDMA?

Above is an ipoib message.


* Re: Hang in ipoib_mcast_stop_thread
       [not found]             ` <CAAk-MO-T_9J2jtX+gtJQXeMVTW6bxbRUrq3mHRepVPatbHFO9g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2016-06-06 13:08               ` Nikolay Borisov
       [not found]                 ` <575575B5.9010504-6AxghH7DbtA@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Nikolay Borisov @ 2016-06-06 13:08 UTC (permalink / raw)
  To: Erez Shitrit
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, SiteGround Operations



On 06/06/2016 03:57 PM, Erez Shitrit wrote:
> On Mon, Jun 6, 2016 at 3:11 PM, Nikolay Borisov <kernel-6AxghH7DbtA@public.gmane.org> wrote:
>>
>>
>> On 06/06/2016 02:51 PM, Erez Shitrit wrote:
>>> Hi Nikolay
>>>
>>> What was the scenario? (Do you have a way to reproduce that?)
>>
>> I have no way of reliably reproducing that, but I have experienced
>> mysterious hangs in the ipoib stack, e.g.
>>
>> https://marc.info/?l=linux-rdma&m=145915284709901
> 
> I am not sure this is the same case here; this one might be due to a
> missing event, or a CM issue.
> (Anyway, missing events in the HW can explain both cases.)
> 
>>
>> when the connection to the infiniband network is lost. Unfortunately I
>> cannot even figure out where to begin debugging this ;(
>>
>> As a matter of fact I currently have one server which complains:
>>
>> ib0: transmit timeout: latency 139718550 msecs
>> ib0: queue stopped 1, tx_head 242, tx_tail 114
>>
>> yet "iblinkinfo" can still show all the nodes in the InfiniBand network.
>>
>>> Which IB card are you using?
>>
>> ibstat
>> CA 'qib0'
>>         CA type: InfiniPath_QLE7342
>>         Number of ports: 2
>>         Firmware version:
>>         Hardware version: 2
>>         Node GUID: 0x001175000077b918
>>         System image GUID: 0x001175000077b918
>>         Port 1:
>>                 State: Active
>>                 Physical state: LinkUp
>>                 Rate: 40
>>                 Base lid: 68
>>                 LMC: 0
>>                 SM lid: 61
>>                 Capability mask: 0x07610868
>>                 Port GUID: 0x001175000077b918
>>                 Link layer: InfiniBand
>>
>>
>>
>>>
>>> Thanks, Erez
>>>
>>> On Mon, Jun 6, 2016 at 12:09 PM, Nikolay Borisov <kernel-6AxghH7DbtA@public.gmane.org> wrote:
>>>> Hello,
>>>>
>>>> [Tejun, I have cc'ed you since I'm not familiar with the internals of
>>>> the workqueue code and cannot comment whether the the mcast_task was
>>>> queued or and whether there were any tasks in the workqueue based on the
>>>> structures I've presented below, could you please comment on that?]
>>>>
>>>> I've been running an infiniband network on 4.4.10 and recently due to
>>>> the infiniband failing (I presume) and having the following in my logs:
>>>>
>>>> [128340.462147] ib0: transmit timeout: latency 45014865 msecs
>>>> [128340.462151] ib0: queue stopped 1, tx_head 711, tx_tail 583
>>>> [128341.461122] ib0: transmit timeout: latency 45015864 msecs
>>>> [128341.461126] ib0: queue stopped 1, tx_head 711, tx_tail 583
>>>> [128342.461053] ib0: transmit timeout: latency 45016864 msecs
>>>> [128342.461056] ib0: queue stopped 1, tx_head 711, tx_tail 583
> 
> This is the tx_timeout callback from the IPoIB driver. It indicates
> that there are packets (skbs) that have been held in the driver for
> longer than the watchdog time (probably 5 seconds) while the tx queue
> is stopped by the driver.
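
For reference, a paraphrased sketch of what that callback amounts to
(not the exact 4.4 source): the net core watchdog invokes it once a
stopped tx queue has made no progress for longer than
dev->watchdog_timeo, and the handler only reports state.

static void ipoib_timeout_sketch(struct net_device *dev)
{
	struct ipoib_dev_priv *priv = netdev_priv(dev);

	/* This is where the "transmit timeout" / "queue stopped" lines
	 * in the log above come from. */
	ipoib_warn(priv, "transmit timeout: latency %u msecs\n",
		   jiffies_to_msecs(jiffies - dev_trans_start(dev)));
	ipoib_warn(priv, "queue stopped %d, tx_head %u, tx_tail %u\n",
		   netif_queue_stopped(dev), priv->tx_head, priv->tx_tail);

	/* The queue is woken again only when enough send completions are
	 * reaped and tx_tail catches up to tx_head; if completions never
	 * arrive, the queue stays stopped and this warning repeats every
	 * second, as in the log above. */
}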

Be that as it may, the queue never seems to be woken up again by the
driver. Even after a restart I still keep getting these messages. One of
the network admins told me this could happen because the InfiniBand switch
port has to be reset after an outage, but I find it hard to believe that
even after a physical restart of the server the ib interface cannot connect.

Also, I find it hard to explain that I see those TX timeouts and ibping
is not working, yet iblinkinfo is still able to give an overview of the
network. Does this have to do with the fact that this information goes
over the dedicated management queue pair (QP0)?

> It can happen while the HW is under stress and fails to drain the tx queue.
> It can happen, but it should be rare (it depends on your specific HW).

But even after it happens, shouldn't the hardware recover automatically?


>>>> [snip: remainder of the original report, quoted unchanged]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Hang in ipoib_mcast_stop_thread
       [not found]                 ` <575575B5.9010504-6AxghH7DbtA@public.gmane.org>
@ 2016-06-06 16:57                   ` Jason Gunthorpe
  0 siblings, 0 replies; 9+ messages in thread
From: Jason Gunthorpe @ 2016-06-06 16:57 UTC (permalink / raw)
  To: Nikolay Borisov
  Cc: Erez Shitrit, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	SiteGround Operations

On Mon, Jun 06, 2016 at 04:08:05PM +0300, Nikolay Borisov wrote:

> Be that as it may, the queue never seems to be woken up again by the
> driver. Even after a restart I still keep getting these messages. One of
> the network admins told me this could happen because the InfiniBand switch
> port has to be reset after an outage, but I find it hard to believe that
> even after a physical restart of the server the ib interface cannot connect.

I've also randomly seen apparently hung VL0 credits that could only be
resolved by a switch reset. I was never able to pin down a root cause
either. VL15 would keep working fine.

Jason

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2016-06-06 16:57 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-06  9:09 Hang in ipoib_mcast_stop_thread Nikolay Borisov
     [not found] ` <57553DE7.2060009-6AxghH7DbtA@public.gmane.org>
2016-06-06 11:51   ` Erez Shitrit
     [not found]     ` <CAAk-MO_6GMm1AHay4uQ4yZ8HHcH_Dk=Ls5gKSVZUCejexDahLQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-06-06 12:11       ` Nikolay Borisov
     [not found]         ` <57556866.8040703-6AxghH7DbtA@public.gmane.org>
2016-06-06 12:25           ` Yuval Shaia
     [not found]             ` <20160606122558.GB10894-Hxa29pjIrETlQW142y8m19+IiqhCXseY@public.gmane.org>
2016-06-06 12:36               ` Nikolay Borisov
     [not found]                 ` <57556E67.4050904-6AxghH7DbtA@public.gmane.org>
2016-06-06 13:06                   ` Yuval Shaia
2016-06-06 12:57           ` Erez Shitrit
     [not found]             ` <CAAk-MO-T_9J2jtX+gtJQXeMVTW6bxbRUrq3mHRepVPatbHFO9g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-06-06 13:08               ` Nikolay Borisov
     [not found]                 ` <575575B5.9010504-6AxghH7DbtA@public.gmane.org>
2016-06-06 16:57                   ` Jason Gunthorpe
