On Wed, Sep 23, 2009 at 12:08 PM, Sean Hefty <sean.hefty@intel.com> wrote:
>ibnetdiscover D ffffffff80149b8d     0 26968  26544
>(L-TLB)
> ffff8102c900bd88 0000000000000046 ffff81037e8e0000 ffff81037e8e02e8
> ffff8102c900bd78 000000000000000a ffff8102c5b50820 ffff81038a929820
> 0000011837bf6105 0000000000000ede ffff8102c5b50a08 0000000100000000
>Call Trace:
> [<ffffffff80064207>] wait_for_completion+0x79/0xa2
> [<ffffffff8008b4cc>] default_wake_function+0x0/0xe
> [<ffffffff882271d9>] :ib_mad:ib_cancel_rmpp_recvs+0x87/0xde
> [<ffffffff88224485>] :ib_mad:ib_unregister_mad_agent+0x30d/0x424
> [<ffffffff883983e9>] :ib_umad:ib_umad_close+0x9d/0xd6
> [<ffffffff80012e22>] __fput+0xae/0x198
> [<ffffffff80023de6>] filp_close+0x5c/0x64
> [<ffffffff800393df>] put_files_struct+0x63/0xae
> [<ffffffff80015b26>] do_exit+0x31c/0x911
> [<ffffffff8004971a>] cpuset_exit+0x0/0x6c
> [<ffffffff8005e116>] system_call+0x7e/0x83
>
>From the dump it seems that the process is waiting on the call to
>flush_workqueue() in ib_cancel_rmpp_recvs(). The package they use is
>OFED 1.4.2.

Roland just submitted a patch in this area yesterday.  I don't know if the patch
would fix their issue, but it may be worth trying.  What kernel does 1.4.2 map
to?

What RMPP messages does ibnetdiscover use?
 
None AFAIK.
 
-- Hal
 
If the program is completing
successfully, there may be a different race with the RMPP cleanup.  I'll see if
anything else stands out in that area.

- Sean
