* [PATCH] iw_cm: reject connect requests if cmid is not in LISTEN
@ 2012-02-22 21:43 Steve Wise
       [not found] ` <20120222214307.23921.83903.stgit-T4OLL4TyM9aNDNWfRnPdfg@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Steve Wise @ 2012-02-22 21:43 UTC (permalink / raw)
  To: roland-DgEjT+Ai2ygdnm+yROfE0A, sean.hefty-ral2JQCrhuEAvxtiuMwx3w
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

When destroying a listening cmid, the iwcm first marks the state of
the cmid as DESTROYING, then releases the lock and calls into the
iwarp provider to destroy the endpoint.  Since the cmid is not locked,
it's possible for the iwarp provider to pass a connection request event
to the iwcm, which will be silently dropped by the iwcm.  This causes
the iwarp provider to never free up the resources from this connection
because the assumption is the iwcm will accept or reject this connection.

The solution is to reject these connection requests.

Signed-off-by: Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
---

 drivers/infiniband/core/iwcm.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/core/iwcm.c b/drivers/infiniband/core/iwcm.c
index 1a696f7..6847d76 100644
--- a/drivers/infiniband/core/iwcm.c
+++ b/drivers/infiniband/core/iwcm.c
@@ -631,6 +631,8 @@ static void cm_conn_req_handler(struct iwcm_id_private *listen_id_priv,
 	spin_lock_irqsave(&listen_id_priv->lock, flags);
 	if (listen_id_priv->state != IW_CM_STATE_LISTEN) {
 		spin_unlock_irqrestore(&listen_id_priv->lock, flags);
+		iw_cm_reject(cm_id, NULL, 0);
+		iw_destroy_cm_id(cm_id);
 		goto out;
 	}
 	spin_unlock_irqrestore(&listen_id_priv->lock, flags);
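
The leak described in the commit message can be illustrated with a small userspace model. This is only a sketch, not kernel code: every name in it (leak_provider, leak_cm_conn_req, and so on) is hypothetical, and the single counter stands in for whatever per-connection resources a real iwarp provider holds until the iwcm accepts or rejects a request.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the listening cmid states. */
enum leak_state { LEAK_LISTEN, LEAK_DESTROYING };

/* Hypothetical provider: counts endpoints waiting for accept/reject. */
struct leak_provider {
	int pending_endpoints;
};

/* Provider side: a connection request arrives and pins one endpoint. */
void leak_provider_conn_req(struct leak_provider *p)
{
	p->pending_endpoints++;
}

/* Provider side: the CM rejected the request, so the endpoint is freed. */
void leak_provider_reject(struct leak_provider *p)
{
	assert(p->pending_endpoints > 0);
	p->pending_endpoints--;
}

/*
 * CM side: handle one connection request against a listener.
 * With reject_if_not_listening == false the request is silently dropped;
 * with true it is rejected. Returns true if the request was handed on
 * to the consumer.
 */
bool leak_cm_conn_req(struct leak_provider *p, enum leak_state listen_state,
		      bool reject_if_not_listening)
{
	if (listen_state != LEAK_LISTEN) {
		if (reject_if_not_listening)
			leak_provider_reject(p);
		return false;
	}
	return true;
}

/* Run one request through the model; return endpoints still pending. */
int leak_pending_after_request(enum leak_state listen_state,
			       bool reject_if_not_listening)
{
	struct leak_provider p = { 0 };

	leak_provider_conn_req(&p);
	if (leak_cm_conn_req(&p, listen_state, reject_if_not_listening))
		leak_provider_reject(&p); /* model: consumer declines later */
	return p.pending_endpoints;
}
```

In this model, a request that arrives while the listener is being destroyed leaves one endpoint pending when it is dropped and none when it is rejected.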

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [PATCH] iw_cm: reject connect requests if cmid is not in LISTEN
       [not found] ` <20120222214307.23921.83903.stgit-T4OLL4TyM9aNDNWfRnPdfg@public.gmane.org>
@ 2012-02-23  7:46   ` Roland Dreier
       [not found]     ` <CAL1RGDV7ZoKWgbh+ERF+af3_B7K2USAkXSPKWeQEg5atpHY0og-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2012-02-24 21:32   ` Roland Dreier
  1 sibling, 1 reply; 9+ messages in thread
From: Roland Dreier @ 2012-02-23  7:46 UTC (permalink / raw)
  To: Steve Wise
  Cc: sean.hefty-ral2JQCrhuEAvxtiuMwx3w, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Wed, Feb 22, 2012 at 1:43 PM, Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+v7I7tHvgBF7@public.gmane.org> wrote:
> diff --git a/drivers/infiniband/core/iwcm.c b/drivers/infiniband/core/iwcm.c
> index 1a696f7..6847d76 100644
> --- a/drivers/infiniband/core/iwcm.c
> +++ b/drivers/infiniband/core/iwcm.c
> @@ -631,6 +631,8 @@ static void cm_conn_req_handler(struct iwcm_id_private *listen_id_priv,
>        spin_lock_irqsave(&listen_id_priv->lock, flags);
>        if (listen_id_priv->state != IW_CM_STATE_LISTEN) {
>                spin_unlock_irqrestore(&listen_id_priv->lock, flags);
> +               iw_cm_reject(cm_id, NULL, 0);
> +               iw_destroy_cm_id(cm_id);
>                goto out;
>        }
>        spin_unlock_irqrestore(&listen_id_priv->lock, flags);

Thanks, this makes more sense to my brain at least.

I assume this works just as well in your testing? :)

* Re: [PATCH] iw_cm: reject connect requests if cmid is not in LISTEN
       [not found]     ` <CAL1RGDV7ZoKWgbh+ERF+af3_B7K2USAkXSPKWeQEg5atpHY0og-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-02-23 15:24       ` Steve Wise
       [not found]         ` <4F465A46.3060301-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Steve Wise @ 2012-02-23 15:24 UTC (permalink / raw)
  To: Roland Dreier
  Cc: sean.hefty-ral2JQCrhuEAvxtiuMwx3w, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On 02/23/2012 01:46 AM, Roland Dreier wrote:
> On Wed, Feb 22, 2012 at 1:43 PM, Steve Wise<swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>  wrote:
>> diff --git a/drivers/infiniband/core/iwcm.c b/drivers/infiniband/core/iwcm.c
>> index 1a696f7..6847d76 100644
>> --- a/drivers/infiniband/core/iwcm.c
>> +++ b/drivers/infiniband/core/iwcm.c
>> @@ -631,6 +631,8 @@ static void cm_conn_req_handler(struct iwcm_id_private *listen_id_priv,
>>         spin_lock_irqsave(&listen_id_priv->lock, flags);
>>         if (listen_id_priv->state != IW_CM_STATE_LISTEN) {
>>                 spin_unlock_irqrestore(&listen_id_priv->lock, flags);
>> +               iw_cm_reject(cm_id, NULL, 0);
>> +               iw_destroy_cm_id(cm_id);
>>                 goto out;
>>         }
>>         spin_unlock_irqrestore(&listen_id_priv->lock, flags);
> Thanks, this makes more sense to my brain at least.
>

Yes, this is the best fix methinks.  Thanks for the review!

> I assume this works just as well in your testing? :)

Yes, I've run some large NP MPI tests that tickle this condition and all the connections get cleaned up now.  I also ran 
some other MPI regression tests with this fix.



* Re: [PATCH] iw_cm: reject connect requests if cmid is not in LISTEN
       [not found]         ` <4F465A46.3060301-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
@ 2012-02-23 19:55           ` Steve Wise
       [not found]             ` <4F4699A1.7030402-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Steve Wise @ 2012-02-23 19:55 UTC (permalink / raw)
  To: Roland Dreier
  Cc: sean.hefty-ral2JQCrhuEAvxtiuMwx3w, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On 02/23/2012 09:24 AM, Steve Wise wrote:
> On 02/23/2012 01:46 AM, Roland Dreier wrote:
>> On Wed, Feb 22, 2012 at 1:43 PM, Steve Wise<swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>  wrote:
>>> diff --git a/drivers/infiniband/core/iwcm.c b/drivers/infiniband/core/iwcm.c
>>> index 1a696f7..6847d76 100644
>>> --- a/drivers/infiniband/core/iwcm.c
>>> +++ b/drivers/infiniband/core/iwcm.c
>>> @@ -631,6 +631,8 @@ static void cm_conn_req_handler(struct iwcm_id_private *listen_id_priv,
>>>         spin_lock_irqsave(&listen_id_priv->lock, flags);
>>>         if (listen_id_priv->state != IW_CM_STATE_LISTEN) {
>>>                 spin_unlock_irqrestore(&listen_id_priv->lock, flags);
>>> +               iw_cm_reject(cm_id, NULL, 0);
>>> +               iw_destroy_cm_id(cm_id);
>>>                 goto out;
>>>         }
>>>         spin_unlock_irqrestore(&listen_id_priv->lock, flags);
>> Thanks, this makes more sense to my brain at least.
>>
>
> Yes, this is the best fix methinks.  Thanks for the review!
>
>> I assume this works just as well in your testing? :)
>
> Yes, I've run some large NP MPI tests that tickle this condition and all the connections get cleaned up now.  I also 
> ran some other MPI regression tests with this fix.
>

Hrm.  I just hit this after more testing.  Debugging now.  Please hold off on this patch until I root-cause this.


Unable to handle kernel paging request at 0000000000200200 RIP:
  [<0000000000200200>]
PGD 183c984067 PUD 0
Oops: 0010 [1] SMP
last sysfs file: /class/infiniband/cxgb4_0/node_guid
CPU 10
Modules linked in: nfs fscache nfs_acl cxgb3(U) iw_cxgb4(U) kretprobes(U) autofs4 hidp rfcomm l2cap bluetooth lockd 
sunrpc be2iscsi iscsi_tcp bnx2i cnic uio libiscsi_tcp libiscsi2 scsi_transport_iscsi2 scsi_transport_iscsi rdma_ucm(U) 
ib_sdp(U) rdma_cm(U) iw_cm(U) ib_addr(U) ib_ipoib(U) ipoib_helper(U) ib_cm(U) ib_sa(U) ipv6 xfrm_nalgo crypto_api 
ib_uverbs(U) ib_umad(U) iw_nes(U) ib_qib(U) dca mlx4_ib(U) mlx4_en(U) mlx4_core(U) ib_mthca(U) ib_mad(U) ib_core(U) 
dm_mirror dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec dell_wmi wmi button battery asus_acpi 
acpi_memhotplug ac parport_pc lp parport joydev cxgb4(U) tpm_tis tpm e1000e tpm_bios sr_mod shpchp i7core_edac edac_mc 
cdrom i2c_i801 i2c_core serio_raw 8021q sg pcspkr dm_raid45 dm_message dm_region_hash dm_log dm_mod dm_mem_cache ahci 
libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 5708, comm: iw_cm_wq Tainted: G      2.6.18-238.el5 #1
RIP: 0010:[<0000000000200200>]  [<0000000000200200>]
RSP: 0018:ffff81183e0cfcf8  EFLAGS: 00010097
RAX: ffff810c3cf3ca58 RBX: 0c30100000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff81012aad6a58
RBP: ffff81183e0cfd30 R08: ffff81012aad6a70 R09: 0000000000000282
R10: 0000000000000000 R11: 0000000000000280 R12: 0000000000000000
R13: 0000000000003c15 R14: ffff810c3cf3ca50 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff810c6a3c42c0(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000200200 CR3: 0000000c3d5a4000 CR4: 00000000000006e0
Process iw_cm_wq (pid: 5708, threadinfo ffff81183e0ce000, task ffff810c3ea79080)
Stack:  ffffffff8008c846 0000000300000000 ffff810c3cf3ca50 0000000000000000
  0000000000000000 0000000000000282 0000000000000003 ffff81183e0cfd70
  ffffffff8002e261 0000000000000000 ffff810c3cf3c9c0 ffff810c3cf3c900
Call Trace:
  [<ffffffff8008c846>] __wake_up_common+0x3e/0x68
  [<ffffffff8002e261>] __wake_up+0x38/0x4f
  [<ffffffff8867410b>] :iw_cm:iw_cm_reject+0x5a/0xa7
  [<ffffffff88674baa>] :iw_cm:cm_work_handler+0x15e/0x424
  [<ffffffff88674a4c>] :iw_cm:cm_work_handler+0x0/0x424
  [<ffffffff8004d7ae>] run_workqueue+0x99/0xf6
  [<ffffffff80049ff6>] worker_thread+0x0/0x122
  [<ffffffff800a269c>] keventd_create_kthread+0x0/0xc4
  [<ffffffff8004a0e6>] worker_thread+0xf0/0x122
  [<ffffffff8008e40a>] default_wake_function+0x0/0xe
  [<ffffffff800a269c>] keventd_create_kthread+0x0/0xc4
  [<ffffffff80032974>] kthread+0xfe/0x132
  [<ffffffff8005dfb1>] child_rip+0xa/0x11
  [<ffffffff800a269c>] keventd_create_kthread+0x0/0xc4
  [<ffffffff80032876>] kthread+0x0/0x132
  [<ffffffff8005dfa7>] child_rip+0x0/0x11


Code:  Bad RIP value.
RIP  [<0000000000200200>]
  RSP <ffff81183e0cfcf8>
crash>



* Re: [PATCH] iw_cm: reject connect requests if cmid is not in LISTEN
       [not found]             ` <4F4699A1.7030402-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
@ 2012-02-23 20:23               ` Steve Wise
  2012-02-24  1:57               ` Roland Dreier
  1 sibling, 0 replies; 9+ messages in thread
From: Steve Wise @ 2012-02-23 20:23 UTC (permalink / raw)
  To: Roland Dreier
  Cc: sean.hefty-ral2JQCrhuEAvxtiuMwx3w, linux-rdma-u79uwXL29TY76Z2rM5mHXA


>
> Hrm.  I just hit this after more testing.  Debugging now.  Please hold off on this patch until I root-cause this.
>
>
> Unable to handle kernel paging request at 0000000000200200 RIP:
>  [<0000000000200200>]
> PGD 183c984067 PUD 0
> Oops: 0010 [1] SMP
> last sysfs file: /class/infiniband/cxgb4_0/node_guid
> CPU 10
> Modules linked in: nfs fscache nfs_acl cxgb3(U) iw_cxgb4(U) kretprobes(U) autofs4 hidp rfcomm l2cap bluetooth lockd 
> sunrpc be2iscsi iscsi_tcp bnx2i cnic uio libiscsi_tcp libiscsi2 scsi_transport_iscsi2 scsi_transport_iscsi rdma_ucm(U) 
> ib_sdp(U) rdma_cm(U) iw_cm(U) ib_addr(U) ib_ipoib(U) ipoib_helper(U) ib_cm(U) ib_sa(U) ipv6 xfrm_nalgo crypto_api 
> ib_uverbs(U) ib_umad(U) iw_nes(U) ib_qib(U) dca mlx4_ib(U) mlx4_en(U) mlx4_core(U) ib_mthca(U) ib_mad(U) ib_core(U) 
> dm_mirror dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec dell_wmi wmi button battery asus_acpi 
> acpi_memhotplug ac parport_pc lp parport joydev cxgb4(U) tpm_tis tpm e1000e tpm_bios sr_mod shpchp i7core_edac edac_mc 
> cdrom i2c_i801 i2c_core serio_raw 8021q sg pcspkr dm_raid45 dm_message dm_region_hash dm_log dm_mod dm_mem_cache ahci 
> libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
> Pid: 5708, comm: iw_cm_wq Tainted: G      2.6.18-238.el5 #1
> RIP: 0010:[<0000000000200200>]  [<0000000000200200>]
> RSP: 0018:ffff81183e0cfcf8  EFLAGS: 00010097
> RAX: ffff810c3cf3ca58 RBX: 0c30100000000000 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff81012aad6a58
> RBP: ffff81183e0cfd30 R08: ffff81012aad6a70 R09: 0000000000000282
> R10: 0000000000000000 R11: 0000000000000280 R12: 0000000000000000
> R13: 0000000000003c15 R14: ffff810c3cf3ca50 R15: 0000000000000000
> FS:  0000000000000000(0000) GS:ffff810c6a3c42c0(0000) knlGS:0000000000000000
> CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> CR2: 0000000000200200 CR3: 0000000c3d5a4000 CR4: 00000000000006e0
> Process iw_cm_wq (pid: 5708, threadinfo ffff81183e0ce000, task ffff810c3ea79080)
> Stack:  ffffffff8008c846 0000000300000000 ffff810c3cf3ca50 0000000000000000
>  0000000000000000 0000000000000282 0000000000000003 ffff81183e0cfd70
>  ffffffff8002e261 0000000000000000 ffff810c3cf3c9c0 ffff810c3cf3c900
> Call Trace:
>  [<ffffffff8008c846>] __wake_up_common+0x3e/0x68
>  [<ffffffff8002e261>] __wake_up+0x38/0x4f
>  [<ffffffff8867410b>] :iw_cm:iw_cm_reject+0x5a/0xa7
>  [<ffffffff88674baa>] :iw_cm:cm_work_handler+0x15e/0x424
>  [<ffffffff88674a4c>] :iw_cm:cm_work_handler+0x0/0x424
>  [<ffffffff8004d7ae>] run_workqueue+0x99/0xf6
>  [<ffffffff80049ff6>] worker_thread+0x0/0x122
>  [<ffffffff800a269c>] keventd_create_kthread+0x0/0xc4
>  [<ffffffff8004a0e6>] worker_thread+0xf0/0x122
>  [<ffffffff8008e40a>] default_wake_function+0x0/0xe
>  [<ffffffff800a269c>] keventd_create_kthread+0x0/0xc4
>  [<ffffffff80032974>] kthread+0xfe/0x132
>  [<ffffffff8005dfb1>] child_rip+0xa/0x11
>  [<ffffffff800a269c>] keventd_create_kthread+0x0/0xc4
>  [<ffffffff80032876>] kthread+0x0/0x132
>  [<ffffffff8005dfa7>] child_rip+0x0/0x11
>
>


Strange.  From my analysis, cm_work_handler+0x15e points into cm_conn_req_handler(), in the block where
alloc_work_entries() returns nonzero:

         cm_id_priv = container_of(cm_id, struct iwcm_id_private, id);
         cm_id_priv->state = IW_CM_STATE_CONN_RECV;

         ret = alloc_work_entries(cm_id_priv, 3);
         if (ret) {
                 iw_cm_reject(cm_id, NULL, 0);
                 iw_destroy_cm_id(cm_id);
                 goto out;
         }


So it's calling iw_cm_reject() in the block above, having just set the state to CONN_RECV.

Now, iw_cm_reject + 0x5a points to this code in iw_cm_reject():

         if (cm_id_priv->state != IW_CM_STATE_CONN_RECV) {
                 spin_unlock_irqrestore(&cm_id_priv->lock, flags);
                 clear_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags);
                 wake_up_all(&cm_id_priv->connect_wait);
                 return -EINVAL;
         }


Since the state isn't CONN_RECV, even though the calling frame had just set it to that, I can only assume some other
thread is whacking the cm_id concurrently.





* Re: [PATCH] iw_cm: reject connect requests if cmid is not in LISTEN
       [not found]             ` <4F4699A1.7030402-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
  2012-02-23 20:23               ` Steve Wise
@ 2012-02-24  1:57               ` Roland Dreier
       [not found]                 ` <CAL1RGDWkVJxEDZ5SaaSa8oA_y6a0u1NCbzTK9agsJE+V_YzimQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  1 sibling, 1 reply; 9+ messages in thread
From: Roland Dreier @ 2012-02-24  1:57 UTC (permalink / raw)
  To: Steve Wise
  Cc: sean.hefty-ral2JQCrhuEAvxtiuMwx3w, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Thu, Feb 23, 2012 at 11:55 AM, Steve Wise
<swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org> wrote:
> RIP: 0010:[<0000000000200200>]  [<0000000000200200>]

0x200200 is LIST_POISON2, which is set when something is removed
from a Linux list.
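
As an illustration of where that value comes from, here is a minimal userspace sketch of list poisoning. The poison constants match the kernel's include/linux/poison.h (ignoring the per-architecture pointer delta that later kernels add); the poison_* type and function names are hypothetical stand-ins for struct list_head and list_del(), not the kernel's own code.

```c
#include <assert.h>
#include <stdint.h>

/* Same numeric values as the kernel's LIST_POISON1 / LIST_POISON2. */
#define POISON_LIST_POISON1 ((void *)(uintptr_t)0x00100100)
#define POISON_LIST_POISON2 ((void *)(uintptr_t)0x00200200)

/* Hypothetical userspace copy of a doubly linked list node. */
struct poison_list_head {
	struct poison_list_head *next, *prev;
};

/* An empty list points at itself in both directions. */
void poison_list_init(struct poison_list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert entry right after head. */
void poison_list_add(struct poison_list_head *entry,
		     struct poison_list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Unlink entry, then poison its pointers so stale use faults loudly. */
void poison_list_del(struct poison_list_head *entry)
{
	entry->next->prev = entry->prev;
	entry->prev->next = entry->next;
	entry->next = POISON_LIST_POISON1;
	entry->prev = POISON_LIST_POISON2;
}

/* Add and remove one node; return the address left in node.prev. */
uintptr_t poison_prev_after_del(void)
{
	struct poison_list_head head, node;

	poison_list_init(&head);
	poison_list_add(&node, &head);
	poison_list_del(&node);
	/* The list itself is empty and intact again. */
	assert(head.next == &head && head.prev == &head);
	return (uintptr_t)node.prev;
}
```

Any code that later follows the removed node's prev pointer ends up at address 0x200200, the address in the oops above.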

>  [<ffffffff8008c846>] __wake_up_common+0x3e/0x68
>  [<ffffffff8002e261>] __wake_up+0x38/0x4f
>  [<ffffffff8867410b>] :iw_cm:iw_cm_reject+0x5a/0xa7

So in the code

       if (cm_id_priv->state != IW_CM_STATE_CONN_RECV) {
               spin_unlock_irqrestore(&cm_id_priv->lock, flags);
               clear_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags);
               wake_up_all(&cm_id_priv->connect_wait);
               return -EINVAL;
       }

the cm_id_priv->connect_wait is being corrupted before
or during the call to wake_up_all...

could the HW or low-level driver be generating events for
the incoming connections that are associated with the
listen ID, as the listen ID is being destroyed?

Maybe that's why the code silently ignored these requests
before :)

 - R.

* Re: [PATCH] iw_cm: reject connect requests if cmid is not in LISTEN
       [not found]                 ` <CAL1RGDWkVJxEDZ5SaaSa8oA_y6a0u1NCbzTK9agsJE+V_YzimQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-02-24 14:16                   ` Steve Wise
  0 siblings, 0 replies; 9+ messages in thread
From: Steve Wise @ 2012-02-24 14:16 UTC (permalink / raw)
  To: Roland Dreier
  Cc: sean.hefty-ral2JQCrhuEAvxtiuMwx3w, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On 02/23/2012 07:57 PM, Roland Dreier wrote:
> On Thu, Feb 23, 2012 at 11:55 AM, Steve Wise
> <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>  wrote:
>> RIP: 0010:[<0000000000200200>]  [<0000000000200200>]
> 0x200200 is LIST_POISON2, which is set when something is removed
> from a Linux list.
>
>>   [<ffffffff8008c846>] __wake_up_common+0x3e/0x68
>>   [<ffffffff8002e261>] __wake_up+0x38/0x4f
>>   [<ffffffff8867410b>] :iw_cm:iw_cm_reject+0x5a/0xa7
> So in the code
>
>         if (cm_id_priv->state != IW_CM_STATE_CONN_RECV) {
>                 spin_unlock_irqrestore(&cm_id_priv->lock, flags);
>                 clear_bit(IWCM_F_CONNECT_WAIT,&cm_id_priv->flags);
>                 wake_up_all(&cm_id_priv->connect_wait);
>                 return -EINVAL;
>         }
>
> the cm_id_priv->connect_wait is being corrupted before
> or during the call to wake_up_all...
>
> could the HW or low-level driver be generating events for
> the incoming connections that are associated with the
> listen ID, as the listen ID is being destroyed?
>
> Maybe that's why the code silently ignored these requests
> before :)
>

The low level driver at this point doesn't have the cm_id for this connect request because it has not been accepted or 
rejected by the iwcm.  So there's no way it can post any events.


* Re: [PATCH] iw_cm: reject connect requests if cmid is not in LISTEN
       [not found] ` <20120222214307.23921.83903.stgit-T4OLL4TyM9aNDNWfRnPdfg@public.gmane.org>
  2012-02-23  7:46   ` Roland Dreier
@ 2012-02-24 21:32   ` Roland Dreier
       [not found]     ` <CAL1RGDWb0ocYN5oM3QtxRj5VWCAWrp3Jtx6N1UHSrNDP2A1WEw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  1 sibling, 1 reply; 9+ messages in thread
From: Roland Dreier @ 2012-02-24 21:32 UTC (permalink / raw)
  To: Steve Wise
  Cc: sean.hefty-ral2JQCrhuEAvxtiuMwx3w, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Wed, Feb 22, 2012 at 1:43 PM, Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+v7I7tHvgBF7@public.gmane.org> wrote:
> @@ -631,6 +631,8 @@ static void cm_conn_req_handler(struct iwcm_id_private *listen_id_priv,
>        spin_lock_irqsave(&listen_id_priv->lock, flags);
>        if (listen_id_priv->state != IW_CM_STATE_LISTEN) {
>                spin_unlock_irqrestore(&listen_id_priv->lock, flags);
> +               iw_cm_reject(cm_id, NULL, 0);
> +               iw_destroy_cm_id(cm_id);
>                goto out;
>        }
>        spin_unlock_irqrestore(&listen_id_priv->lock, flags);

I think I see your bug.  Look at the whole cm_conn_req_handler()
function.  Where is this new code relative to where you initialize cm_id?
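
The ordering problem hinted at here can be sketched in userspace. The sketch assumes, as the question implies, that cm_id is only assigned after the LISTEN check, so the added reject/destroy calls would operate on an uninitialized pointer. One possible reordering is to create the child ID first and check the listener state afterwards. All ord_* names below are hypothetical, and this is an illustration of the ordering only, not the revised patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the listening cmid states. */
enum ord_state { ORD_LISTEN, ORD_DESTROYING };

/* Hypothetical child connection ID. */
struct ord_cm_id {
	bool rejected;
};

/* Number of child IDs currently allocated, to spot leaks. */
int ord_live_ids;

struct ord_cm_id *ord_create_cm_id(void)
{
	struct ord_cm_id *id = calloc(1, sizeof(*id));

	if (id)
		ord_live_ids++;
	return id;
}

void ord_reject(struct ord_cm_id *id)
{
	assert(id != NULL); /* reject needs a valid, initialized ID */
	id->rejected = true;
}

void ord_destroy_cm_id(struct ord_cm_id *id)
{
	assert(id != NULL);
	free(id);
	ord_live_ids--;
}

/*
 * Create the child ID first, then check the listener state, so that the
 * reject/destroy path always sees an initialized cm_id.
 * Returns the new child ID, or NULL if the request was not handed on.
 */
struct ord_cm_id *ord_conn_req_handler(enum ord_state listen_state,
				       bool *rejected)
{
	struct ord_cm_id *cm_id;

	*rejected = false;
	cm_id = ord_create_cm_id();
	if (!cm_id)
		return NULL;

	if (listen_state != ORD_LISTEN) {
		ord_reject(cm_id);
		*rejected = cm_id->rejected;
		ord_destroy_cm_id(cm_id);
		return NULL;
	}
	return cm_id;
}

/* Run one request; return 1 if it was rejected, 0 otherwise. */
int ord_demo(enum ord_state listen_state)
{
	bool rejected;
	struct ord_cm_id *cm_id = ord_conn_req_handler(listen_state, &rejected);

	if (cm_id)
		ord_destroy_cm_id(cm_id); /* model: consumer is done with it */
	return rejected ? 1 : 0;
}
```

With this ordering the request is rejected exactly when the listener is no longer in LISTEN, and no child ID is left allocated on either path.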

 - R.

* Re: [PATCH] iw_cm: reject connect requests if cmid is not in LISTEN
       [not found]     ` <CAL1RGDWb0ocYN5oM3QtxRj5VWCAWrp3Jtx6N1UHSrNDP2A1WEw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-02-24 21:41       ` Steve Wise
  0 siblings, 0 replies; 9+ messages in thread
From: Steve Wise @ 2012-02-24 21:41 UTC (permalink / raw)
  To: Roland Dreier
  Cc: sean.hefty-ral2JQCrhuEAvxtiuMwx3w, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On 02/24/2012 03:32 PM, Roland Dreier wrote:
> On Wed, Feb 22, 2012 at 1:43 PM, Steve Wise<swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>  wrote:
>> @@ -631,6 +631,8 @@ static void cm_conn_req_handler(struct iwcm_id_private *listen_id_priv,
>>         spin_lock_irqsave(&listen_id_priv->lock, flags);
>>         if (listen_id_priv->state != IW_CM_STATE_LISTEN) {
>>                 spin_unlock_irqrestore(&listen_id_priv->lock, flags);
>> +               iw_cm_reject(cm_id, NULL, 0);
>> +               iw_destroy_cm_id(cm_id);
>>                 goto out;
>>         }
>>         spin_unlock_irqrestore(&listen_id_priv->lock, flags);
> I think I see your bug.  Look at the whole cm_conn_req_handler()
> function.  Where is this new code relative to where you initialize cm_id?
>
>   - R.

duh. jeeze. ok, lemme try again.  I'll add printks to make sure I'm really hitting this path too.





