* [Lustre-devel] lustre-1.8.8: rdma_listen() backlog 0 breaks iWARP
From: Steve Wise @ 2014-07-23 15:44 UTC
  To: lustre-devel

Hello,

I'm trying to get lustre-1.8.8/RHEL6 running over Chelsio iWARP RNICs and connection setup
is failing at the server due to kiblnd_startup() calling rdma_listen() with a backlog of
0.  This effectively rejects all incoming connection requests.   I looked at lustre-1.8.7,
and the backlog was 256 in that release.  

Q:  Why was it changed to 0?   

Thanks,

Steve.
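The failure mode described above can be modeled in a few lines of C. This is a minimal sketch, not the iWARP CM's actual code: `struct listener`, `listener_init()`, `listener_on_connect_request()` and `listener_accept()` are hypothetical names, and the only behavior borrowed from this thread is that a listener whose backlog is 0 can never queue a connection request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a listening endpoint (not kernel or librdmacm code). */
struct listener {
	int backlog; /* value passed to the listen call */
	int pending; /* connection requests queued but not yet accepted */
};

void listener_init(struct listener *l, int backlog)
{
	assert(l != NULL && backlog >= 0);
	l->backlog = backlog;
	l->pending = 0;
}

/*
 * Queue an incoming connection request if the backlog has room.
 * When backlog == 0, "pending >= backlog" is always true, so every
 * request is rejected: the symptom seen with lustre-1.8.8 over iWARP.
 */
bool listener_on_connect_request(struct listener *l)
{
	if (l->pending >= l->backlog)
		return false; /* rejected */
	l->pending++;
	return true; /* queued until accepted */
}

/* Accepting a queued request frees one backlog slot. */
void listener_accept(struct listener *l)
{
	if (l->pending > 0)
		l->pending--;
}
```

With `listener_init(&l, 256)` (the lustre-1.8.7 value) the first request is queued; with `listener_init(&l, 0)` it is refused outright.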


* [Lustre-devel] lustre-1.8.8: rdma_listen() backlog 0 breaks iWARP
From: Dilger, Andreas @ 2014-07-23 18:52 UTC
  To: lustre-devel

On 2014/07/23, 9:44 AM, "Steve Wise" <swise@opengridcomputing.com> wrote:

>Hello,
>
>I'm trying to get lustre-1.8.8/RHEL6 running over Chelsio iWARP RNICs and
>connection setup is failing at the server due to kiblnd_startup() calling
>rdma_listen() with a backlog of 0.  This effectively rejects all incoming
>connection requests.   I looked at lustre-1.8.7, and the backlog was 256 in
>that release.
>
>Q:  Why was it changed to 0?

Since I'm not familiar with the LNET code myself, I'd recommend checking the
commit messages in Git, or the linked Jira/Bugzilla ticket, to see if there
is an explanation.

You may also want to see if this is fixed with the 1.8.9 release.

Cheers, Andreas
-- 
Andreas Dilger

Lustre Software Architect
Intel High Performance Data Division


* [Lustre-devel] lustre-1.8.8: rdma_listen() backlog 0 breaks iWARP
From: Steve Wise @ 2014-07-24 14:22 UTC
  To: lustre-devel

> >Hello,
> >
> >I'm trying to get lustre-1.8.8/RHEL6 running over Chelsio iWARP RNICs and
> >connection setup is failing at the server due to kiblnd_startup() calling
> >rdma_listen() with a backlog of 0.  This effectively rejects all incoming
> >connection requests.   I looked at lustre-1.8.7, and the backlog was 256 in
> >that release.
> >
> >Q:  Why was it changed to 0?
> 
> Since I'm not familiar with the LNET code myself, I'd recommend checking the
> commit messages in Git, or the linked Jira/Bugzilla ticket, to see if there
> is an explanation.
> 
> You may also want to see if this is fixed with the 1.8.9 release.
> 

+ Sean Hefty
+ Isaac Huang

This commit changed the backlog to 0:

commit 7b442f1a43714455fad06c527b6fbc10f82af857
Author: Isaac Huang <he.h.huang@oracle.com>
Date:   Wed Nov 17 07:14:46 2010 -0700

    b=20153 add IB bonding failover support to o2iblnd

    O2iblnd changes to support failover events from an IB
    bonding IPoIB interface. Mostly to recreate device
    specific resources, e.g. listener CMID.

    i=isaac
    i=liang

Bug: https://projectlava.xyratex.com/show_bug.cgi?id=20153

I'm not sure why it was changed to 0, though. It definitely breaks iWARP support. I'm not
yet sure what the semantics are for creating a listening cm_id with a backlog of 0. Was
the assumption that 0 means "let the system choose" or "max supported backlog"? The iWARP
CM interprets 0 to mean no connection requests are allowed. :)

Isaac, can you explain?

Thanks,

Steve.
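The question above comes down to how a listen implementation maps a requested backlog of 0 to the queue limit it actually enforces. The C sketch below is an illustration only: it encodes the three readings mentioned in this message, and the constants `SYSTEM_DEFAULT_BACKLOG` and `MAX_SUPPORTED_BACKLOG` are assumptions (256 is borrowed from lustre-1.8.7, 1024 is made up).

```c
#include <assert.h>

/* Hypothetical limits, chosen only for illustration. */
#define SYSTEM_DEFAULT_BACKLOG 256  /* the value lustre-1.8.7 passed */
#define MAX_SUPPORTED_BACKLOG  1024 /* made-up device limit */

/* The three readings of "backlog == 0" raised in this thread. */
enum zero_backlog_policy {
	ZERO_MEANS_SYSTEM_DEFAULT, /* "let the system choose" */
	ZERO_MEANS_MAX_SUPPORTED,  /* "max supported backlog" */
	ZERO_MEANS_REJECT_ALL,     /* what the iWARP CM did here */
};

/* Map the backlog an application asked for to the limit actually enforced. */
int effective_backlog(int requested, enum zero_backlog_policy policy)
{
	assert(requested >= 0);
	if (requested > 0)
		return requested; /* an explicit value is used as-is */

	switch (policy) {
	case ZERO_MEANS_SYSTEM_DEFAULT:
		return SYSTEM_DEFAULT_BACKLOG;
	case ZERO_MEANS_MAX_SUPPORTED:
		return MAX_SUPPORTED_BACKLOG;
	case ZERO_MEANS_REJECT_ALL:
	default:
		return 0; /* no request can ever be queued */
	}
}
```

Only the last policy turns a listen call with a backlog of 0 into a listener that can never accept a connection; under the other two, passing 0 behaves like passing a positive value.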


* [Lustre-devel] lustre-1.8.8: rdma_listen() backlog 0 breaks iWARP
From: Hefty, Sean @ 2014-07-24 15:08 UTC
  To: lustre-devel

> I'm not sure why it was changed to 0, though. It definitely breaks iWARP
> support. I'm not yet sure what the semantics are for creating a listening
> cm_id with a backlog of 0. Was the assumption that 0 means "let the system
> choose" or "max supported backlog"? The iWARP CM interprets 0 to mean no
> connection requests are allowed. :)

0 should mean let the system choose.  Interpreting 0 as no connections allowed doesn't really make sense, since the app can get that by not calling listen at all.


* [Lustre-devel] lustre-1.8.8: rdma_listen() backlog 0 breaks iWARP
From: Steve Wise @ 2014-07-24 15:13 UTC
  To: lustre-devel


> > I'm not sure why it was changed to 0, though. It definitely breaks iWARP
> > support. I'm not yet sure what the semantics are for creating a listening
> > cm_id with a backlog of 0. Was the assumption that 0 means "let the system
> > choose" or "max supported backlog"? The iWARP CM interprets 0 to mean no
> > connection requests are allowed. :)
> 
> 0 should mean let the system choose.  Interpreting 0 as no connections allowed doesn't
> really make sense, since the app can get that by not calling listen at all.

OK, then we can fix this in the iwcm. I'll post a patch to linux-rdma soon.

Thanks,

Steve.
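The fix proposed here, treating a backlog of 0 as "let the system choose" inside the iWARP connection manager rather than in each caller, might look roughly like the sketch below. `struct iw_listen_id`, `iw_listen_sketch()` and the tunable `iwcm_default_backlog` are hypothetical and are not taken from any actual iwcm patch; the default of 256 is an assumption borrowed from lustre-1.8.7.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical tunable; 256 mirrors the backlog lustre-1.8.7 used. */
static int iwcm_default_backlog = 256;

/* Hypothetical stand-in for a listening iWARP CM identifier. */
struct iw_listen_id {
	int backlog;   /* limit the listener will enforce */
	int listening; /* nonzero once the listener is armed */
};

/*
 * Start listening. A backlog of 0 is treated as "let the system choose"
 * and replaced by the default, so a caller such as kiblnd_startup() that
 * passes 0 still accepts connections. Negative values are rejected.
 */
int iw_listen_sketch(struct iw_listen_id *id, int backlog)
{
	if (id == NULL || backlog < 0)
		return -EINVAL;
	if (backlog == 0)
		backlog = iwcm_default_backlog;
	assert(backlog > 0); /* the default itself must admit connections */
	id->backlog = backlog;
	id->listening = 1;
	return 0;
}
```

In this model, putting the substitution in the connection manager covers every caller that passes 0, not just o2iblnd, while a caller that wants a specific limit can still pass a positive value.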


Thread overview: 5 messages
2014-07-23 15:44 [Lustre-devel] lustre-1.8.8: rdma_listen() backlog 0 breaks iWARP Steve Wise
2014-07-23 18:52 ` Dilger, Andreas
2014-07-24 14:22   ` Steve Wise
2014-07-24 15:08     ` Hefty, Sean
2014-07-24 15:13       ` Steve Wise
