[PATCH] opensm/osm_state_mgr.c: force heavy sweep when fabric consists of single switch

* [PATCH] opensm/osm_state_mgr.c: force heavy sweep when fabric consists of single switch
@ 2009-11-03 10:26 Yevgeny Kliteynik
From: Yevgeny Kliteynik @ 2009-11-03 10:26 UTC (permalink / raw)
  To: Sasha Khapyorsky; +Cc: Linux RDMA

Always do a heavy sweep when there is only one node in the
fabric, this node is a switch, and SM runs on top of it.
In this setup there may be a race when OSM starts running
before the external ports are up, or if they went through
reset while SM was starting.
In this race the switch brings up the ports and turns on the
PSC bit, but OSM might get PortInfo before SwitchInfo, so it
might see all ports as down but the PSC bit on. If that
happens, OSM turns off the PSC bit and will never see the
external ports again: it won't perform any heavy sweep, only
light sweeps.

Signed-off-by: Yevgeny Kliteynik <kliteyn-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
---
 opensm/opensm/osm_state_mgr.c |   15 ++++++++++-----
 1 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/opensm/opensm/osm_state_mgr.c b/opensm/opensm/osm_state_mgr.c
index 4303d6e..537c855 100644
--- a/opensm/opensm/osm_state_mgr.c
+++ b/opensm/opensm/osm_state_mgr.c
@@ -1062,13 +1062,18 @@ static void do_sweep(osm_sm_t * sm)
 	 * Otherwise, this is probably our first discovery pass
 	 * or we are connected in loopback. In both cases do a
 	 * heavy sweep.
-	 * Note: If we are connected in loopback we want a heavy
-	 * sweep, since we will not be getting any traps if there is
-	 * a lost connection.
+	 * Note the following:
+	 * 1. If we are connected in loopback we want a heavy sweep, since we
+	 *    will not be getting any traps if there is a lost connection.
+	 * 2. If we are in DISCOVERING state - this means we are either
+	 *    initializing or waking up from STANDBY - run the heavy sweep.
+	 * 3. If there is only one node in the fabric, and this node is a
+	 *    switch, and OSM runs on top of it, there might be a race when
+	 *    OSM starts running before the external ports are up - run the
+	 *    heavy sweep.
 	 */
-	/*  if we are in DISCOVERING state - this means it is either in
-	 *  initializing or wake up from STANDBY - run the heavy sweep */
 	if (cl_qmap_count(&sm->p_subn->sw_guid_tbl)
+	    && cl_qmap_count(&sm->p_subn->node_guid_tbl) != 1
 	    && sm->p_subn->sm_state != IB_SMINFO_STATE_DISCOVERING
 	    && sm->p_subn->opt.force_heavy_sweep == FALSE
 	    && sm->p_subn->force_heavy_sweep == FALSE
-- 
1.5.1.4
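
The patched condition can be modeled on its own. This is a minimal sketch, not OpenSM code: `struct sweep_inputs` and `can_do_light_sweep()` are hypothetical names, and the fields stand in for the two `cl_qmap_count()` results, the `sm_state` comparison and the two `force_heavy_sweep` flags used in the hunk above. It assumes, as the comment in the hunk implies, that do_sweep() tries a light sweep only when the whole `if` condition holds.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the subnet fields read by do_sweep(). */
struct sweep_inputs {
	size_t switch_count;   /* cl_qmap_count(&p_subn->sw_guid_tbl) */
	size_t node_count;     /* cl_qmap_count(&p_subn->node_guid_tbl) */
	bool discovering;      /* sm_state == IB_SMINFO_STATE_DISCOVERING */
	bool opt_force_heavy;  /* p_subn->opt.force_heavy_sweep */
	bool force_heavy;      /* p_subn->force_heavy_sweep */
};

/*
 * Returns true when a light sweep may be tried, i.e. when every term of
 * the patched condition holds. A subnet with exactly one node (which,
 * given switch_count > 0, must be a switch, presumably the SM's own)
 * never qualifies, so a heavy sweep is always done in that case.
 */
bool can_do_light_sweep(const struct sweep_inputs *in)
{
	return in->switch_count != 0
	    && in->node_count != 1      /* the term added by this patch */
	    && !in->discovering
	    && !in->opt_force_heavy
	    && !in->force_heavy;
}
```

With `switch_count = 1` and `node_count = 1` the function returns false, so a heavy sweep is done; on a larger subnet it returns true unless the SM is DISCOVERING or one of the force flags is set.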

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [PATCH] opensm/osm_state_mgr.c: force heavy sweep when fabric consists of single switch
@ 2009-11-03 22:12   ` Sasha Khapyorsky
From: Sasha Khapyorsky @ 2009-11-03 22:12 UTC (permalink / raw)
  To: Yevgeny Kliteynik; +Cc: Linux RDMA

On 12:26 Tue 03 Nov     , Yevgeny Kliteynik wrote:
> Always do heavy sweep when there is only one node in the
> fabric, and this node is a switch, and SM runs on top of it -
> there may be a race when OSM starts running before the
> external ports are up, or if they went through
> reset while SM was starting.
> In this race switch brings up the ports and turns on the
> PSC bit, but OSM might get PortInfo before SwitchInfo, and it
> might see all ports as down, but PSC bit on. If that happens,
> OSM turns off PSC bit, and it will never see external ports
> again - it won't perform any heavy sweep, only light sweep

Could such race happen when there are more than one node in a fabric?

Sasha


* Re: [PATCH] opensm/osm_state_mgr.c: force heavy sweep when fabric consists of single switch
@ 2009-11-04  9:47     ` Yevgeny Kliteynik
From: Yevgeny Kliteynik @ 2009-11-04  9:47 UTC (permalink / raw)
  To: Sasha Khapyorsky; +Cc: Linux RDMA

Sasha Khapyorsky wrote:
> Could such race happen when there are more than one node in a fabric?

I think that my description of the race was misleading.
The race can happen on *any* fabric when SM runs on a switch.
But when it does happen, SM thinks that the whole subnet
is just one switch - that's what it managed to discover.
I've actually seen it happening.
So the patch fixes this particular case.

So the next question you would probably ask is whether
this race can happen on some *other* switch, not the one
SM is running on.

Well, I don't know. I have a hunch that it can't, but I
haven't been able to prove it to myself yet.

The race on the managed switch is a special case because
SM always sees port 0, and always gets responses to its
SMP queries. On any other switch, if the ports were reset,
SM won't get any response until the ports are up again.

Perhaps there is a case where SM reads some port as down,
and by the time it gets SwitchInfo with the PSC bit set the
port is already up, so SM doesn't continue discovery beyond
this port. But such a race would be fixed on the next heavy
sweep, when SM discovers the port it missed the previous
time, whereas the race on the managed switch is fatal: SM
will never do another heavy sweep.
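
The difference between the two cases can be illustrated with a toy model of the managed switch. This is a hypothetical sketch, not OpenSM code: the external ports are collapsed into a single `ext_port_up` flag, a heavy sweep reads PortInfo before it clears PSC, and a light sweep only checks PSC. `force_single_node_heavy` approximates the check added by the patch (heavy sweep while only one node is known).

```c
#include <stdbool.h>

/* Toy model of the SM's own (managed) switch; hypothetical, not OpenSM. */
struct toy_switch {
	bool ext_port_up;  /* all external ports, collapsed into one flag */
	bool psc;          /* SwitchInfo PortStateChange bit */
};

struct toy_sm {
	int nodes_known;               /* 1 = only the local switch */
	bool force_single_node_heavy;  /* true = behaviour with the patch */
};

/* Heavy sweep: read PortInfo, (maybe) hit the race, then clear PSC. */
static void heavy_sweep(struct toy_sm *sm, struct toy_switch *sw,
			bool ports_come_up_mid_sweep)
{
	bool port_seen_up = sw->ext_port_up;  /* PortInfo snapshot */

	if (ports_come_up_mid_sweep) {
		/* The race: ports come up after PortInfo was read. */
		sw->ext_port_up = true;
		sw->psc = true;
	}
	/* Discovery goes beyond the switch only if a port looked up. */
	sm->nodes_known = port_seen_up ? 2 : 1;
	sw->psc = false;  /* SM saw PSC in SwitchInfo and clears it */
}

/* Periodic sweep: heavy only if PSC is set or the patch forces it. */
static void periodic_sweep(struct toy_sm *sm, struct toy_switch *sw)
{
	bool heavy = sw->psc ||
		     (sm->force_single_node_heavy && sm->nodes_known == 1);

	if (heavy)
		heavy_sweep(sm, sw, false);
	/* a light sweep changes nothing in this model */
}

/* Initial sweep hits the race, then 'n' periodic sweeps follow. */
int run_race(bool with_patch, int n)
{
	struct toy_switch sw = { .ext_port_up = false, .psc = false };
	struct toy_sm sm = { .nodes_known = 0,
			     .force_single_node_heavy = with_patch };

	heavy_sweep(&sm, &sw, true);
	for (int i = 0; i < n; i++)
		periodic_sweep(&sm, &sw);
	return sm.nodes_known;  /* 2 = switch plus at least one neighbour */
}
```

`run_race(false, 5)` returns 1: after the race the SM still knows only its own switch after five sweeps. `run_race(true, 5)` returns 2, since the first periodic sweep is forced to be heavy and finds the external port.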

-- Yevgeny
 

* Re: [PATCH] opensm/osm_state_mgr.c: force heavy sweep when fabric consists of single switch
@ 2009-11-04 11:36         ` Line Holen
From: Line Holen @ 2009-11-04 11:36 UTC (permalink / raw)
  To: kliteyn-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb, Sasha Khapyorsky; +Cc: Linux RDMA

On 11/ 4/09 10:47 AM, Yevgeny Kliteynik wrote:
> I think that my description of the race was misleading.
> The race can happen on *any* fabric when SM runs on switch.
> But when it does happen, SM thinks that the whole subnet
> is just one switch - that's what it managed to discover.
> I've actually seen it happening.
> So the patch fixes this particular case.
> 
> So the next question that you would probably ask is can
> this race happen on some *other* switch and not the one
> SM is running on?
> 
> Well, I don't know. I have a hunch that it can't, but I
> couldn't prove it to myself yet.
> 
> The race on the managed switch is a special case because
> SM always sees port 0, and always gets responses to its
> SMP queries. On any other switch, if the ports were reset,
> SM won't get any response until the ports are up again.
> 
> Perhaps there might be a case where SM got some port as down,
> and by the time SM got SwitchInfo with PSC bit the port
> was already up, so SM won't start discovery beyond this
> port. But this race would be fixed on the next heavy sweep,
> when SM will discover this port that it missed the previous
> time, whereas race on managed switch is fatal - SM won't
> ever do any heavy sweep.
> 
> -- Yevgeny

At least for the 3.2 branch there is a general race regardless of
where the SM is running. I haven't checked the current master, but
I cannot recall seeing any patches related to this, so I assume
the race is still there.

There is a window between SM discovering a switch and clearing PSC
for the same switch. The SM will not detect a state change on the
switch ports during this time.
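
The window can be reduced to an ordering problem. A minimal sketch under stated assumptions: one port, a PSC bit that any port change sets, an SM that snapshots the port during discovery and clears PSC afterwards, and no traps at all (whether a trap covers the gap is what the rest of the thread discusses). All names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical toy model of one switch as seen by the SM; not OpenSM. */
struct sw_model {
	bool port_up;  /* actual state of one switch port */
	bool psc;      /* SwitchInfo PortStateChange bit */
};

/* Switch side: any port state change sets PSC. */
static void port_changes(struct sw_model *sw)
{
	sw->port_up = !sw->port_up;
	sw->psc = true;
}

/*
 * SM side: discover the switch (snapshot the port), then clear PSC.
 * 'change_at' selects when the port changes: 0 = inside the window
 * between discovery and the PSC clear, 1 = after the clear.
 * Returns true if the SM can still tell that its snapshot is stale,
 * i.e. PSC is set and the snapshot differs from the real port state.
 */
bool change_detected(int change_at)
{
	struct sw_model sw = { .port_up = false, .psc = false };
	bool snapshot = sw.port_up;  /* discovery reads PortInfo */

	if (change_at == 0)
		port_changes(&sw);   /* inside the window */
	sw.psc = false;              /* SM clears PSC for this switch */
	if (change_at == 1)
		port_changes(&sw);   /* after the window */

	return sw.psc && snapshot != sw.port_up;
}
```

`change_detected(0)` returns false because the clear wipes out the evidence of a change inside the window, while `change_detected(1)` returns true because a change after the clear leaves PSC set.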

I have a patch for the 3.2 branch that I can merge into master.

Line


* Re: [PATCH] opensm/osm_state_mgr.c: force heavy sweep when fabric consists of single switch
@ 2009-11-04 15:54             ` Yevgeny Kliteynik
From: Yevgeny Kliteynik @ 2009-11-04 15:54 UTC (permalink / raw)
  To: Line Holen; +Cc: Sasha Khapyorsky, Linux RDMA

Line Holen wrote:
> At least for the 3.2 branch there is a general race regardless of
> where the SM is running. I haven't checked the current master, but
> I cannot recall seeing any patches related to this so I assume
> the race is still there.
> 
> There is a window between SM discovering a switch and clearing PSC
> for the same switch. The SM will not detect a state change on the
> switch ports during this time.

If the port changes state during that period, the switch issues
a new trap 128, which (I think) should cause SM to re-discover the
fabric once this discovery cycle is over. Is this correct?

Or perhaps the more serious problem happens when the SM LID is not
configured on the switch yet, so the trap does not go to the
right place?
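
The behaviour expected here can be sketched as a pending-resweep flag: a trap arriving during a discovery cycle causes one more heavy sweep after the cycle ends. This is an illustrative model, not how OpenSM is known to implement it; `struct sweep_ctl`, `on_trap_128()` and `run_sweeps()` are hypothetical, and the unconfigured SM LID case is modeled simply as the trap never being delivered.

```c
#include <stdbool.h>

/* Hypothetical sweep bookkeeping; the names are illustrative only. */
struct sweep_ctl {
	bool resweep_pending;   /* set by a trap that arrives during a cycle */
	unsigned heavy_sweeps;  /* heavy sweeps performed so far */
};

/* SM side: a trap 128 only records that another heavy sweep is needed. */
static void on_trap_128(struct sweep_ctl *c)
{
	c->resweep_pending = true;
}

/* Switch side: the trap reaches the SM only if the SM LID is configured. */
static void switch_sends_trap_128(struct sweep_ctl *c, bool sm_lid_configured)
{
	if (sm_lid_configured)
		on_trap_128(c);
	/* otherwise the trap is never delivered to the SM */
}

/*
 * Runs heavy sweeps until no trap arrived during the last one.
 * 'trap_in_first_cycle' simulates a port state change during the
 * first cycle. Returns the total number of heavy sweeps.
 */
unsigned run_sweeps(bool trap_in_first_cycle, bool sm_lid_configured)
{
	struct sweep_ctl c = { .resweep_pending = false, .heavy_sweeps = 0 };

	do {
		c.resweep_pending = false;
		c.heavy_sweeps++;
		if (trap_in_first_cycle && c.heavy_sweeps == 1)
			switch_sends_trap_128(&c, sm_lid_configured);
	} while (c.resweep_pending);

	return c.heavy_sweeps;
}
```

With a delivered trap, `run_sweeps(true, true)` performs 2 heavy sweeps; if the SM LID is not configured, `run_sweeps(true, false)` stops after 1 and the change goes unnoticed.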

> I have a patch for the 3.2 branch that I can merge into master.

Sure, that would be nice :)

-- Yevgeny

 

* Re: [PATCH] opensm/osm_state_mgr.c: force heavy sweep when fabric consists of single switch
@ 2009-11-04 17:42                 ` Line Holen
From: Line Holen @ 2009-11-04 17:42 UTC (permalink / raw)
  To: kliteyn-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb; +Cc: Sasha Khapyorsky, Linux RDMA

On 11/ 4/09 04:54 PM, Yevgeny Kliteynik wrote:
> Line Holen wrote:
>> At least for the 3.2 branch there is a general race regardless of
>> where the SM is running. I haven't checked the current master, but
>> I cannot recall seeing any patches related to this so I assume
>> the race is still there.
>>
>> There is a window between SM discovering a switch and clearing PSC
>> for the same switch. The SM will not detect a state change on the
>> switch ports during this time.
> 
> If the port changes state during that period, the switch issues
> new trap 128, which (I think) should cause SM to re-discover the
> fabric once this discovery cycle is over. Is this correct?
> 

I think the switch shall send a trap whenever it sets the PSC bit.
Once it is set, I believe the switch will not send another trap until
the bit is cleared. Or do I misinterpret the spec?
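
This reading of the spec, that a trap is sent only when PSC goes from 0 to 1, can be written as a small edge-triggered model. The struct and functions are hypothetical, and the model encodes only the interpretation stated in this message, not verified IBA behaviour.

```c
#include <stdbool.h>

/* Hypothetical switch model for the "trap only on PSC 0 -> 1" reading. */
struct edge_switch {
	bool psc;             /* PortStateChange bit */
	unsigned traps_sent;  /* trap 128 notifications so far */
};

/* Port state change: a trap goes out only if PSC was clear; PSC is then set. */
void edge_port_change(struct edge_switch *sw)
{
	if (!sw->psc)
		sw->traps_sent++;
	sw->psc = true;
}

/* The SM clears PSC; the next port change will trigger a trap again. */
void edge_clear_psc(struct edge_switch *sw)
{
	sw->psc = false;
}
```

Under this model two port changes in a row produce a single trap; only after `edge_clear_psc()` does another change produce a second one, so a change that happens while PSC is still set is silent.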

> Or perhaps the more serious problem happens when SM LID is not
> configured yet on the switch, hence the trap is not going to the
> right place?
> 
>> I have a patch for the 3.2 branch that I can merge into master.
> 
> Sure, that would be nice :)
> 
> -- Yevgeny

* Re: [PATCH] opensm/osm_state_mgr.c: force heavy sweep when fabric consists of single switch
@ 2009-11-04 18:39                     ` Yevgeny Kliteynik
From: Yevgeny Kliteynik @ 2009-11-04 18:39 UTC (permalink / raw)
  To: Line Holen; +Cc: Sasha Khapyorsky, Linux RDMA

Line Holen wrote:
> On 11/ 4/09 04:54 PM, Yevgeny Kliteynik wrote:
>> If the port changes state during that period, the switch issues
>> new trap 128, which (I think) should cause SM to re-discover the
>> fabric once this discovery cycle is over. Is this correct?
>>
> 
> I think the switch shall send a trap whenever it sets the PSC bit.
> Once set I believe it will not send another trap until it is reset.
> Or do I misinterpret the spec ?

I may be wrong, but I thought that this is how things work:
 - port state changes
 - switch turns on PSC bit and starts sending traps
 - SM gets the trap, sends trap repress
 - switch gets trap repress and stops sending traps
 - PSC is still on
 - port state changes again (the same or any other port)
 - switch turns on PSC bit (which doesn't matter as PSC is
   already on) and starts sending traps again
 - etc...
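The sequence above can be sketched as a tiny state model. This is a hypothetical illustration only: `struct sw_model`, `port_state_changed()` and `trap_repress_received()` are invented names, not OpenSM or switch firmware code. The model assumes exactly the behaviour listed: every port state change re-arms trap 128, a trap repress stops the resending, and PSC stays set throughout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a switch SMA following the sequence above.
 * Names are illustrative only; this is not OpenSM or firmware code. */
struct sw_model {
	bool psc;      /* SwitchInfo:PortStateChange */
	bool trapping; /* switch is currently (re)sending trap 128 */
};

/* A port changes state: PSC is set (a no-op if it is already set)
 * and trap 128 is re-armed regardless of the previous PSC value. */
static void port_state_changed(struct sw_model *sw)
{
	assert(sw != NULL);
	sw->psc = true;
	sw->trapping = true;
}

/* The SM answers with a trap repress: the switch stops resending
 * the trap, but PSC stays set until the SM clears it. */
static void trap_repress_received(struct sw_model *sw)
{
	assert(sw != NULL);
	sw->trapping = false;
}
```

Calling `port_state_changed()`, then `trap_repress_received()`, then `port_state_changed()` again leaves both `psc` and `trapping` set, which is the "starts sending traps again" step in the list.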

Anyway, I'll double-check this issue.

-- Yevgeny
 
>> Or perhaps the more serious problem happens when SM LID is not
>> configured yet on the switch, hence the trap is not going to the
>> right place?
>>
>>> I have a patch for the 3.2 branch that I can merge into master.
>> Sure, that would be nice :)
>>
>> -- Yevgeny
>>
>>
>>> Line
>>>
>>>>> Sasha
>>>>>
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] opensm/osm_state_mgr.c: force heavy sweep when fabric consists of single switch
       [not found]                       ` <4AF1CA61.2020007-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2009-11-05  7:29                         ` Yevgeny Kliteynik
       [not found]                           ` <4AF27EDB.6070604-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  0 siblings, 1 reply; 18+ messages in thread
From: Yevgeny Kliteynik @ 2009-11-05  7:29 UTC (permalink / raw)
  To: kliteyn-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb
  Cc: Line Holen, Sasha Khapyorsky, Linux RDMA

Yevgeny Kliteynik wrote:
> Line Holen wrote:
>> On 11/ 4/09 04:54 PM, Yevgeny Kliteynik wrote:
>>> Line Holen wrote:
>>>> On 11/ 4/09 10:47 AM, Yevgeny Kliteynik wrote:
>>>>> Sasha Khapyorsky wrote:
>>>>>> On 12:26 Tue 03 Nov     , Yevgeny Kliteynik wrote:
>>>>>>> Always do heavy sweep when there is only one node in the
>>>>>>> fabric, and this node is a switch, and SM runs on top of it -
>>>>>>> there may be a race when OSM starts running before the
>>>>>>> external ports are up, or if they went through
>>>>>>> reset while SM was starting.
>>>>>>> In this race switch brings up the ports and turns on the
>>>>>>> PSC bit, but OSM might get PortInfo before SwitchInfo, and it
>>>>>>> might see all ports as down, but PSC bit on. If that happens,
>>>>>>> OSM turns off PSC bit, and it will never see external ports
>>>>>>> again - it won't perform any heavy sweep, only light sweep
>>>>>> Could such a race happen when there is more than one node in a fabric?
>>>>> I think that my description of the race was misleading.
>>>>> The race can happen on *any* fabric when SM runs on switch.
>>>>> But when it does happen, SM thinks that the whole subnet
>>>>> is just one switch - that's what it managed to discover.
>>>>> I've actually seen it happening.
>>>>> So the patch fixes this particular case.
>>>>>
>>>>> So the next question that you would probably ask is can
>>>>> this race happen on some *other* switch and not the one
>>>>> SM is running on?
>>>>>
>>>>> Well, I don't know. I have a hunch that it can't, but I
>>>>> couldn't prove it to myself yet.
>>>>>
>>>>> The race on the managed switch is a special case because
>>>>> SM always sees port 0, and always gets responses to its
>>>>> SMP queries. On any other switch, if the ports were reset,
>>>>> SM won't get any response until the ports are up again.
>>>>>
>>>>> Perhaps there might be a case where SM got some port as down,
>>>>> and by the time SM got SwitchInfo with PSC bit the port
>>>>> was already up, so SM won't start discovery beyond this
>>>>> port. But this race would be fixed on the next heavy sweep,
>>>>> when SM will discover this port that it missed the previous
>>>>> time, whereas race on managed switch is fatal - SM won't
>>>>> ever do any heavy sweep.
>>>>>
>>>>> -- Yevgeny
>>>> At least for the 3.2 branch there is a general race regardless of
>>>> where the SM is running. I haven't checked the current master, but
>>>> I cannot recall seeing any patches related to this so I assume
>>>> the race is still there.
>>>>
>>>> There is a window between SM discovering a switch and clearing PSC
>>>> for the same switch. The SM will not detect a state change on the
>>>> switch ports during this time.
>>> If the port changes state during that period, the switch issues
>>> new trap 128, which (I think) should cause SM to re-discover the
>>> fabric once this discovery cycle is over. Is this correct?
>>>
>>
>> I think the switch shall send a trap whenever it sets the PSC bit.
>> Once set I believe it will not send another trap until it is reset.
>> Or do I misinterpret the spec ?
> 
> I may be wrong, but I thought that this is how things work:
> - port state changes
> - switch turns on PSC bit and starts sending traps
> - SM gets the trap, sends trap repress
> - switch gets trap repress and stops sending traps
> - PSC is still on
> - port state changes again (the same or any other port)
> - switch turns on PSC bit (which doesn't matter as PSC is
>   already on) and starts sending traps again
> - etc...
> 
> Anyway, I'll double-check this issue.

Yep, verified.
The switch sends traps regardless of the PSC bit status.
Also, the spec doesn't link them together:

   o14-5.1.1: If a switch supports Traps (PortInfo:
   CapabilityMask.IsTrapSupported is one), its SMA
   shall send trap 128 to the SM indicated by the
   PortInfo:MasterSMLID under any condition that
   would cause SwitchInfo:PortStateChange to be set
   to one. (See 14.2.5.4 SwitchInfo on page 827.)
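One way to read o14-5.1.1 as code, assuming the only inputs are the ones the requirement names: whether the port advertises IsTrapSupported, whether the event would set PortStateChange, and PortInfo:MasterSMLID. The helper `trap128_dest_lid()` is hypothetical; the point it illustrates is that the current PSC value is not an input at all.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding of o14-5.1.1; not OpenSM or SMA code.
 * Returns the destination LID for trap 128, or 0 if no trap is sent.
 * The current value of SwitchInfo:PortStateChange is deliberately
 * not a parameter: the requirement does not depend on it. */
static uint16_t trap128_dest_lid(bool is_trap_supported,
				 bool change_would_set_psc,
				 uint16_t master_sm_lid)
{
	if (!is_trap_supported || !change_would_set_psc)
		return 0;
	/* The SM indicated by PortInfo:MasterSMLID. LID 0 is not a valid
	 * destination, so an unassigned MasterSMLID also yields 0 here. */
	return master_sm_lid;
}
```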

-- Yevgeny

> -- Yevgeny
> 
>>> Or perhaps the more serious problem happens when SM LID is not
>>> configured yet on the switch, hence the trap is not going to the
>>> right place?
>>>
>>>> I have a patch for the 3.2 branch that I can merge into master.
>>> Sure, that would be nice :)
>>>
>>> -- Yevgeny
>>>
>>>
>>>> Line
>>>>
>>>>>> Sasha
>>>>>>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] opensm/osm_state_mgr.c: force heavy sweep when fabric consists of single switch
       [not found]                           ` <4AF27EDB.6070604-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2009-11-08 14:30                             ` Eli Dorfman (Voltaire)
       [not found]                               ` <4AF6D619.8000908-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 18+ messages in thread
From: Eli Dorfman (Voltaire) @ 2009-11-08 14:30 UTC (permalink / raw)
  To: kliteyn-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb
  Cc: Line Holen, Sasha Khapyorsky, Linux RDMA

Yevgeny Kliteynik wrote:
> Yevgeny Kliteynik wrote:
>> Line Holen wrote:
>>> On 11/ 4/09 04:54 PM, Yevgeny Kliteynik wrote:
>>>> Line Holen wrote:
>>>>> On 11/ 4/09 10:47 AM, Yevgeny Kliteynik wrote:
>>>>>> Sasha Khapyorsky wrote:
>>>>>>> On 12:26 Tue 03 Nov     , Yevgeny Kliteynik wrote:
>>>>>>>> Always do heavy sweep when there is only one node in the
>>>>>>>> fabric, and this node is a switch, and SM runs on top of it -
>>>>>>>> there may be a race when OSM starts running before the
>>>>>>>> external ports are up, or if they went through
>>>>>>>> reset while SM was starting.
>>>>>>>> In this race switch brings up the ports and turns on the
>>>>>>>> PSC bit, but OSM might get PortInfo before SwitchInfo, and it
>>>>>>>> might see all ports as down, but PSC bit on. If that happens,
>>>>>>>> OSM turns off PSC bit, and it will never see external ports
>>>>>>>> again - it won't perform any heavy sweep, only light sweep
>>>>>>> Could such a race happen when there is more than one node in a
>>>>>>> fabric?
>>>>>> I think that my description of the race was misleading.
>>>>>> The race can happen on *any* fabric when SM runs on switch.
>>>>>> But when it does happen, SM thinks that the whole subnet
>>>>>> is just one switch - that's what it managed to discover.
>>>>>> I've actually seen it happening.
>>>>>> So the patch fixes this particular case.
>>>>>>
>>>>>> So the next question that you would probably ask is can
>>>>>> this race happen on some *other* switch and not the one
>>>>>> SM is running on?
>>>>>>
>>>>>> Well, I don't know. I have a hunch that it can't, but I
>>>>>> couldn't prove it to myself yet.
>>>>>>
>>>>>> The race on the managed switch is a special case because
>>>>>> SM always sees port 0, and always gets responses to its
>>>>>> SMP queries. On any other switch, if the ports were reset,
>>>>>> SM won't get any response until the ports are up again.
>>>>>>
>>>>>> Perhaps there might be a case where SM got some port as down,
>>>>>> and by the time SM got SwitchInfo with PSC bit the port
>>>>>> was already up, so SM won't start discovery beyond this
>>>>>> port. But this race would be fixed on the next heavy sweep,
>>>>>> when SM will discover this port that it missed the previous
>>>>>> time, whereas race on managed switch is fatal - SM won't
>>>>>> ever do any heavy sweep.
>>>>>>
>>>>>> -- Yevgeny
>>>>> At least for the 3.2 branch there is a general race regardless of
>>>>> where the SM is running. I haven't checked the current master, but
>>>>> I cannot recall seeing any patches related to this so I assume
>>>>> the race is still there.
>>>>>
>>>>> There is a window between SM discovering a switch and clearing PSC
>>>>> for the same switch. The SM will not detect a state change on the
>>>>> switch ports during this time.
>>>> If the port changes state during that period, the switch issues
>>>> new trap 128, which (I think) should cause SM to re-discover the
>>>> fabric once this discovery cycle is over. Is this correct?
>>>>
>>>
>>> I think the switch shall send a trap whenever it sets the PSC bit.
>>> Once set I believe it will not send another trap until it is reset.
>>> Or do I misinterpret the spec ?
>>
>> I may be wrong, but I thought that this is how things work:
>> - port state changes
>> - switch turns on PSC bit and starts sending traps
>> - SM gets the trap, sends trap repress
>> - switch gets trap repress and stops sending traps
>> - PSC is still on
>> - port state changes again (the same or any other port)
>> - switch turns on PSC bit (which doesn't matter as PSC is
>>   already on) and starts sending traps again
>> - etc...
>>
>> Anyway, I'll double-check this issue.
> 
> Yep, verified.
> The switch sends traps regardless of the PSC bit status.
> Also, the spec doesn't link them together:
> 
>   o14-5.1.1: If a switch supports Traps (PortInfo:
>   CapabilityMask.IsTrapSupported is one), its SMA
>   shall send trap 128 to the SM indicated by the
>   PortInfo:MasterSMLID under any condition that
>   would cause SwitchInfo:PortStateChange to be set
>   to one. (See 14.2.5.4 SwitchInfo on page 827.)
> 

The trap is sent according to the SMLID. After the first bring-up, the SMLID is not set yet, so the trap will not be sent.
In that case, OpenSM would discover the change only through the PSC bit.
For IS3 chips, the PSC bit and/or trap were set only after one or more ports changed state, so I don't understand how the SM can discover the PSC bit set while all ports are down. Or is this a change in IS4?
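The two paths described above can be sketched from the SM's side. The enum and `sm_change_signal()` are hypothetical; the sketch assumes a port has just changed state and that a trap can only be delivered when MasterSMLID holds a valid unicast LID (0x0001 to 0xBFFF). The `SIGNAL_NONE` case is the one the original patch is concerned with: no trap arrives, and PSC is no longer set when the SM next reads SwitchInfo.

```c
#include <stdbool.h>
#include <stdint.h>

enum change_signal { SIGNAL_NONE, SIGNAL_TRAP128, SIGNAL_PSC_ON_SWEEP };

/* Unicast LIDs are 0x0001..0xBFFF; 0 means MasterSMLID is not set yet. */
static bool lid_is_valid_unicast(uint16_t lid)
{
	return lid >= 0x0001 && lid <= 0xBFFF;
}

/* Hypothetical: how a port state change that has just happened reaches
 * the SM. psc_seen_by_sm says whether PSC is still set when the SM next
 * reads SwitchInfo during a sweep. */
static enum change_signal sm_change_signal(bool is_trap_supported,
					   uint16_t master_sm_lid,
					   bool psc_seen_by_sm)
{
	if (is_trap_supported && lid_is_valid_unicast(master_sm_lid))
		return SIGNAL_TRAP128;      /* trap 128 routed to MasterSMLID */
	if (psc_seen_by_sm)
		return SIGNAL_PSC_ON_SWEEP; /* only visible by polling PSC */
	return SIGNAL_NONE;                 /* the change is never noticed */
}
```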

Eli

> -- Yevgeny
> 
>> -- Yevgeny
>>
>>>> Or perhaps the more serious problem happens when SM LID is not
>>>> configured yet on the switch, hence the trap is not going to the
>>>> right place?
>>>>
>>>>> I have a patch for the 3.2 branch that I can merge into master.
>>>> Sure, that would be nice :)
>>>>
>>>> -- Yevgeny
>>>>
>>>>
>>>>> Line
>>>>>
>>>>>>> Sasha
>>>>>>>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] opensm/osm_state_mgr.c: force heavy sweep when fabric consists of single switch
       [not found]                               ` <4AF6D619.8000908-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2009-11-09  8:18                                 ` Yevgeny Kliteynik
       [not found]                                   ` <4AF7D040.2060807-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  0 siblings, 1 reply; 18+ messages in thread
From: Yevgeny Kliteynik @ 2009-11-09  8:18 UTC (permalink / raw)
  To: Eli Dorfman (Voltaire); +Cc: Line Holen, Sasha Khapyorsky, Linux RDMA

Eli Dorfman (Voltaire) wrote:
> Yevgeny Kliteynik wrote:
>> Yevgeny Kliteynik wrote:
>>> Line Holen wrote:
>>>> On 11/ 4/09 04:54 PM, Yevgeny Kliteynik wrote:
>>>>> Line Holen wrote:
>>>>>> On 11/ 4/09 10:47 AM, Yevgeny Kliteynik wrote:
>>>>>>> Sasha Khapyorsky wrote:
>>>>>>>> On 12:26 Tue 03 Nov     , Yevgeny Kliteynik wrote:
>>>>>>>>> Always do heavy sweep when there is only one node in the
>>>>>>>>> fabric, and this node is a switch, and SM runs on top of it -
>>>>>>>>> there may be a race when OSM starts running before the
>>>>>>>>> external ports are up, or if they went through
>>>>>>>>> reset while SM was starting.
>>>>>>>>> In this race switch brings up the ports and turns on the
>>>>>>>>> PSC bit, but OSM might get PortInfo before SwitchInfo, and it
>>>>>>>>> might see all ports as down, but PSC bit on. If that happens,
>>>>>>>>> OSM turns off PSC bit, and it will never see external ports
>>>>>>>>> again - it won't perform any heavy sweep, only light sweep
>>>>>>>> Could such a race happen when there is more than one node in a
>>>>>>>> fabric?
>>>>>>> I think that my description of the race was misleading.
>>>>>>> The race can happen on *any* fabric when SM runs on switch.
>>>>>>> But when it does happen, SM thinks that the whole subnet
>>>>>>> is just one switch - that's what it managed to discover.
>>>>>>> I've actually seen it happening.
>>>>>>> So the patch fixes this particular case.
>>>>>>>
>>>>>>> So the next question that you would probably ask is can
>>>>>>> this race happen on some *other* switch and not the one
>>>>>>> SM is running on?
>>>>>>>
>>>>>>> Well, I don't know. I have a hunch that it can't, but I
>>>>>>> couldn't prove it to myself yet.
>>>>>>>
>>>>>>> The race on the managed switch is a special case because
>>>>>>> SM always sees port 0, and always gets responses to its
>>>>>>> SMP queries. On any other switch, if the ports were reset,
>>>>>>> SM won't get any response until the ports are up again.
>>>>>>>
>>>>>>> Perhaps there might be a case where SM got some port as down,
>>>>>>> and by the time SM got SwitchInfo with PSC bit the port
>>>>>>> was already up, so SM won't start discovery beyond this
>>>>>>> port. But this race would be fixed on the next heavy sweep,
>>>>>>> when SM will discover this port that it missed the previous
>>>>>>> time, whereas race on managed switch is fatal - SM won't
>>>>>>> ever do any heavy sweep.
>>>>>>>
>>>>>>> -- Yevgeny
>>>>>> At least for the 3.2 branch there is a general race regardless of
>>>>>> where the SM is running. I haven't checked the current master, but
>>>>>> I cannot recall seeing any patches related to this so I assume
>>>>>> the race is still there.
>>>>>>
>>>>>> There is a window between SM discovering a switch and clearing PSC
>>>>>> for the same switch. The SM will not detect a state change on the
>>>>>> switch ports during this time.
>>>>> If the port changes state during that period, the switch issues
>>>>> new trap 128, which (I think) should cause SM to re-discover the
>>>>> fabric once this discovery cycle is over. Is this correct?
>>>>>
>>>> I think the switch shall send a trap whenever it sets the PSC bit.
>>>> Once set I believe it will not send another trap until it is reset.
>>>> Or do I misinterpret the spec ?
>>> I may be wrong, but I thought that this is how things work:
>>> - port state changes
>>> - switch turns on PSC bit and starts sending traps
>>> - SM gets the trap, sends trap repress
>>> - switch gets trap repress and stops sending traps
>>> - PSC is still on
>>> - port state changes again (the same or any other port)
>>> - switch turns on PSC bit (which doesn't matter as PSC is
>>>   already on) and starts sending traps again
>>> - etc...
>>>
>>> Anyway, I'll double-check this issue.
>> Yep, verified.
>> The switch sends traps regardless of the PSC bit status.
>> Also, the spec doesn't link them together:
>>
>>   o14-5.1.1: If a switch supports Traps (PortInfo:
>>   CapabilityMask.IsTrapSupported is one), its SMA
>>   shall send trap 128 to the SM indicated by the
>>   PortInfo:MasterSMLID under any condition that
>>   would cause SwitchInfo:PortStateChange to be set
>>   to one. (See 14.2.5.4 SwitchInfo on page 827.)
>>
> 
> The trap is sent according to the SMLID. After the first bring-up, the SMLID is not set yet, so the trap will not be sent.
> In that case, OpenSM would discover the change only through the PSC bit.
> For IS3 chips, the PSC bit and/or trap were set only after one or more ports changed state, so I don't understand how the SM can discover the PSC bit set while all ports are down. Or is this a change in IS4?

It can happen when SM runs on the switch, not on a host.
In this case, if all ports go down, SM will see
them all down and it will see the PSC bit on.
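The check that the patch modifies can be sketched as a predicate. This is a simplified, hypothetical rendering: `struct sweep_inputs` and `may_try_light_sweep()` are invented for illustration, and only the terms visible in the patch hunk are modelled; the real condition in `do_sweep()` continues past `force_heavy_sweep == FALSE`. Because the SM's own node is always discovered, a node table holding one entry while the switch table is non-empty presumably means the SM sees only the switch it runs on, which is the case described above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of the inputs to the check in do_sweep(). */
struct sweep_inputs {
	size_t num_switches;  /* cl_qmap_count(&sm->p_subn->sw_guid_tbl) */
	size_t num_nodes;     /* cl_qmap_count(&sm->p_subn->node_guid_tbl) */
	bool discovering;     /* sm_state == IB_SMINFO_STATE_DISCOVERING */
	bool opt_force_heavy; /* p_subn->opt.force_heavy_sweep */
	bool force_heavy;     /* p_subn->force_heavy_sweep */
};

/* True only if every term shown in the hunk allows skipping the heavy
 * sweep. The real condition has further terms, so true here does not
 * by itself mean OpenSM performs only a light sweep. */
static bool may_try_light_sweep(const struct sweep_inputs *in)
{
	return in->num_switches != 0
	    && in->num_nodes != 1  /* the term added by the patch */
	    && !in->discovering
	    && !in->opt_force_heavy
	    && !in->force_heavy;
}
```

For the race case (one switch, one node) the predicate is false, so a heavy sweep runs on every sweep until external ports are found; the trade-off is that a genuinely isolated switch is always heavy-swept.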

-- Yevgeny

> Eli
> 
>> -- Yevgeny
>>
>>> -- Yevgeny
>>>
>>>>> Or perhaps the more serious problem happens when SM LID is not
>>>>> configured yet on the switch, hence the trap is not going to the
>>>>> right place?
>>>>>
>>>>>> I have a patch for the 3.2 branch that I can merge into master.
>>>>> Sure, that would be nice :)
>>>>>
>>>>> -- Yevgeny
>>>>>
>>>>>
>>>>>> Line
>>>>>>
>>>>>>>> Sasha
>>>>>>>>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] opensm/osm_state_mgr.c: force heavy sweep when fabric consists of single switch
@ 2009-11-09 10:42 Eli Dorfman (Voltaire)
From: Eli Dorfman (Voltaire) @ 2009-11-09 10:42 UTC (permalink / raw)
  To: kliteyn-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb
  Cc: Line Holen, Sasha Khapyorsky, Linux RDMA

Yevgeny Kliteynik wrote:
> Eli Dorfman (Voltaire) wrote:
>> Yevgeny Kliteynik wrote:
>>> Yevgeny Kliteynik wrote:
>>>> Line Holen wrote:
>>>>> On 11/ 4/09 04:54 PM, Yevgeny Kliteynik wrote:
>>>>>> Line Holen wrote:
>>>>>>> On 11/ 4/09 10:47 AM, Yevgeny Kliteynik wrote:
>>>>>>>> Sasha Khapyorsky wrote:
>>>>>>>>> On 12:26 Tue 03 Nov     , Yevgeny Kliteynik wrote:
>>>>>>>>>> Always do a heavy sweep when there is only one node in the
>>>>>>>>>> fabric, this node is a switch, and the SM runs on top of it.
>>>>>>>>>> Otherwise there may be a race when OSM starts running before
>>>>>>>>>> the external ports are up, or if they went through a reset
>>>>>>>>>> while the SM was starting.
>>>>>>>>>> In this race, the switch brings up the ports and turns on the
>>>>>>>>>> PSC bit, but OSM might get PortInfo before SwitchInfo, so it
>>>>>>>>>> might see all ports as down but the PSC bit on. If that happens,
>>>>>>>>>> OSM turns off the PSC bit and will never see the external ports
>>>>>>>>>> again: it won't perform any heavy sweep, only light sweeps.
>>>>>>>>> Could such a race happen when there is more than one node in a
>>>>>>>>> fabric?
>>>>>>>> I think that my description of the race was misleading.
>>>>>>>> The race can happen on *any* fabric when the SM runs on a switch.
>>>>>>>> But when it does happen, the SM thinks that the whole subnet
>>>>>>>> is just one switch, because that's all it managed to discover.
>>>>>>>> I've actually seen it happen.
>>>>>>>> So the patch fixes this particular case.
>>>>>>>>
>>>>>>>> So the next question you would probably ask is: can this
>>>>>>>> race happen on some *other* switch, not the one the SM
>>>>>>>> is running on?
>>>>>>>>
>>>>>>>> Well, I don't know. I have a hunch that it can't, but I
>>>>>>>> haven't been able to prove it to myself yet.
>>>>>>>>
>>>>>>>> The race on the managed switch is a special case because
>>>>>>>> the SM always sees port 0 and always gets responses to its
>>>>>>>> SMP queries. On any other switch, if the ports were reset,
>>>>>>>> the SM won't get any response until the ports are up again.
>>>>>>>>
>>>>>>>> Perhaps there might be a case where the SM got some port as down,
>>>>>>>> and by the time the SM got SwitchInfo with the PSC bit the port
>>>>>>>> was already up, so the SM won't start discovery beyond this
>>>>>>>> port. But this race would be fixed on the next heavy sweep,
>>>>>>>> when the SM discovers the port that it missed the previous
>>>>>>>> time, whereas the race on the managed switch is fatal: the SM
>>>>>>>> will never do another heavy sweep.
>>>>>>>>
>>>>>>>> -- Yevgeny
>>>>>>> At least for the 3.2 branch there is a general race regardless of
>>>>>>> where the SM is running. I haven't checked the current master, but
>>>>>>> I cannot recall seeing any patches related to this so I assume
>>>>>>> the race is still there.
>>>>>>>
>>>>>>> There is a window between SM discovering a switch and clearing PSC
>>>>>>> for the same switch. The SM will not detect a state change on the
>>>>>>> switch ports during this time.
>>>>>> If the port changes state during that period, the switch issues
>>>>>> a new trap 128, which (I think) should cause the SM to re-discover
>>>>>> the fabric once this discovery cycle is over. Is this correct?
>>>>>>
>>>>> I think the switch shall send a trap whenever it sets the PSC bit.
>>>>> Once it is set, I believe it will not send another trap until it is
>>>>> reset. Or do I misinterpret the spec?
>>>> I may be wrong, but I thought that this is how things work:
>>>> - port state changes
>>>> - switch turns on PSC bit and starts sending traps
>>>> - SM gets the trap, sends trap repress
>>>> - switch gets trap repress and stops sending traps
>>>> - PSC is still on
>>>> - port state changes again (the same or any other port)
>>>> - switch turns on PSC bit (which doesn't matter as PSC is
>>>>   already on) and starts sending traps again
>>>> - etc...
>>>>
>>>> Anyway, I'll double-check this issue.
>>> Yep, verified.
>>> The switch sends traps regardless of the PSC bit status.
>>> Also, the spec doesn't link them together:
>>>
>>>   o14-5.1.1: If a switch supports Traps (PortInfo:
>>>   CapabilityMask.IsTrapSupported is one), its SMA
>>>   shall send trap 128 to the SM indicated by the
>>>   PortInfo:MasterSMLID under any condition that
>>>   would cause SwitchInfo:PortStateChange to be set
>>>   to one. (See 14.2.5.4 SwitchInfo on page 827.)
>>>
>>
>> The trap will be sent according to the SMLID. After the first bring-up,
>> the SMLID is not set yet, so the trap will not be sent.
>> In that case OpenSM would discover the change only through the PSC bit.
>> For IS3 chips, the PSC bit and/or trap were set only after one or more
>> ports changed their state, so I don't understand how the SM can
>> discover the PSC bit set while all ports are down. Or is this a change
>> in IS4?
> 
> It can happen when the SM runs on the switch, not on a host.
> In this case, if all ports go down, the SM will see
> them all down and it will see the PSC bit on.

So this patch is only for an SM running on a switch that is the only node in the fabric?
I don't see the race when there is more than one switch; please explain.
Also, AFAIK the PSC bit is set only after a physical port state change.
So if we clear the PSC bit and only then get PortInfo, we will still catch any new state change,
right?

Eli
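
The ordering question above can be checked with a small toy model. It is a sketch under stated assumptions, not OpenSM code: the enum, struct, and function names are hypothetical; it assumes, as Eli states, that the switch sets PSC whenever a port changes state; and it models one external port that comes up exactly once. It enumerates every point at which the port can come up relative to the SM's two steps (read PortInfo, clear PSC) and reports whether the change is lost.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not OpenSM code: all names here are hypothetical.
 * One external port starts down and comes up exactly once, and the
 * switch sets the PSC bit whenever a port changes state. */
enum sm_step { READ_PORTINFO, CLEAR_PSC };

struct toy_state {
	bool port_up;   /* real port state on the switch */
	bool psc;       /* SwitchInfo:PortStateChange */
	bool sm_saw_up; /* what the SM's PortInfo read returned */
};

static void apply_step(struct toy_state *t, enum sm_step step)
{
	assert(step == READ_PORTINFO || step == CLEAR_PSC);
	if (step == READ_PORTINFO)
		t->sm_saw_up = t->port_up;
	else
		t->psc = false;
}

/* Runs the SM's two steps in the given order. The port comes up just
 * before step `event_pos` (0 or 1), or after both steps (2).
 * Returns true if the change is not lost: either the SM saw the port
 * up, or PSC is still set for the next sweep to find. */
static bool change_caught(const enum sm_step order[2], int event_pos)
{
	struct toy_state t = { false, false, false };

	assert(event_pos >= 0 && event_pos <= 2);
	for (int i = 0; i <= 2; i++) {
		if (i == event_pos) {
			t.port_up = true;
			t.psc = true;
		}
		if (i < 2)
			apply_step(&t, order[i]);
	}
	return t.sm_saw_up || t.psc;
}

/* An ordering is safe if no placement of the event loses the change. */
static bool ordering_is_safe(const enum sm_step order[2])
{
	for (int pos = 0; pos <= 2; pos++)
		if (!change_caught(order, pos))
			return false;
	return true;
}
```

In this model, `{ READ_PORTINFO, CLEAR_PSC }` loses the change only when the port comes up between the two steps (`event_pos == 1`), which is the PortInfo-before-SwitchInfo window described in the original patch, while `{ CLEAR_PSC, READ_PORTINFO }` catches it in every position. The model covers a single port change only and does not settle the IS3/IS4 behaviour questions raised in the thread.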


> 
> -- Yevgeny
> 
>> Eli
>>
>>> -- Yevgeny
>>>
>>>> -- Yevgeny
>>>>
>>>>>> Or perhaps the more serious problem happens when SM LID is not
>>>>>> configured yet on the switch, hence the trap is not going to the
>>>>>> right place?
>>>>>>
>>>>>>> I have a patch for the 3.2 branch that I can merge into master.
>>>>>> Sure, that would be nice :)
>>>>>>
>>>>>> -- Yevgeny
>>>>>>
>>>>>>
>>>>>>> Line
>>>>>>>
>>>>>>>>> Sasha
>>>>>>>>>

* Re: [PATCH] opensm/osm_state_mgr.c: force heavy sweep when fabric consists of single switch
@ 2009-11-09 11:09 Yevgeny Kliteynik
From: Yevgeny Kliteynik @ 2009-11-09 11:09 UTC (permalink / raw)
  To: Eli Dorfman (Voltaire); +Cc: Line Holen, Sasha Khapyorsky, Linux RDMA

Eli Dorfman (Voltaire) wrote:
> Yevgeny Kliteynik wrote:
>> It can happen when the SM runs on the switch, not on a host.
>> In this case, if all ports go down, the SM will see
>> them all down and it will see the PSC bit on.
> 
> So this patch is only for an SM running on a switch that is the only node in the fabric?
> I don't see the race when there is more than one switch; please explain.

Quoting from above:

   The race can happen on *any* fabric when the SM runs on a switch.
   But when it does happen, the SM thinks that the whole subnet
   is just one switch, because that's all it managed to discover.
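
For reference, the light/heavy decision that the patch changes can be restated as a standalone predicate. This is a sketch: the struct, enum, and function names are hypothetical stand-ins, each field is annotated with the expression it replaces in the quoted hunk, and because the hunk is cut off after the `force_heavy_sweep` clause, any further conditions in the real `do_sweep()` are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real code compares against
 * IB_SMINFO_STATE_DISCOVERING. */
enum toy_sm_state { TOY_SM_DISCOVERING, TOY_SM_STANDBY, TOY_SM_MASTER };

/* Hypothetical bundle of the values the patched condition reads.
 * Comments give the expression each field stands in for. */
struct sweep_inputs {
	size_t num_switches;        /* cl_qmap_count(&sm->p_subn->sw_guid_tbl) */
	size_t num_nodes;           /* cl_qmap_count(&sm->p_subn->node_guid_tbl) */
	enum toy_sm_state state;    /* sm->p_subn->sm_state */
	bool opt_force_heavy_sweep; /* sm->p_subn->opt.force_heavy_sweep */
	bool force_heavy_sweep;     /* sm->p_subn->force_heavy_sweep */
};

/* Returns true when all visible clauses of the patched condition hold,
 * i.e. do_sweep() may try the light-sweep path. Any false clause means
 * a heavy sweep. */
static bool light_sweep_candidate(const struct sweep_inputs *in)
{
	assert(in != NULL);
	return in->num_switches != 0
	    && in->num_nodes != 1          /* clause added by the patch */
	    && in->state != TOY_SM_DISCOVERING
	    && !in->opt_force_heavy_sweep
	    && !in->force_heavy_sweep;
}
```

With at least one switch and exactly one node, the lone node must be that switch, and since the SM's own node is always part of what it discovers, this is the case where the SM has found nothing beyond the switch it runs on; the predicate then never allows the light path.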

> Also, AFAIK the PSC bit is set only after a physical port state change.

Yes, but it is set only once.
Meanwhile, the ports can change from up to down,
then the SM discovers them, and then from down to up.

-- Yevgeny
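
The sequence described above can be replayed in a toy model to show why the failure is permanent without the patch. Everything here is hypothetical and simplified: four external ports; a light sweep is assumed to escalate to a heavy sweep only when it finds PSC set, as the commit message implies; and "the fabric looks like a lone switch" is approximated as "the SM has seen no external port up".

```c
#include <assert.h>
#include <stdbool.h>

/* Toy replay of the race on the switch that hosts the SM.
 * Not OpenSM code; every name here is hypothetical. */
#define EXT_PORTS 4

struct toy_hw {            /* what the switch really looks like */
	bool up[EXT_PORTS];
	bool psc;          /* PSC latch: setting it again changes nothing */
};

struct toy_sm {            /* what the SM believes */
	int ports_up;
	int heavy_sweeps;
};

static void set_all_ports(struct toy_hw *hw, bool up)
{
	for (int i = 0; i < EXT_PORTS; i++) {
		if (hw->up[i] != up) {
			hw->up[i] = up;
			hw->psc = true;
		}
	}
}

static int count_ports_up(const struct toy_hw *hw)
{
	int n = 0;

	for (int i = 0; i < EXT_PORTS; i++)
		n += hw->up[i] ? 1 : 0;
	return n;
}

static void heavy_sweep(struct toy_hw *hw, struct toy_sm *sm)
{
	sm->ports_up = count_ports_up(hw);
	hw->psc = false;
	sm->heavy_sweeps++;
}

/* Assumption: a light sweep escalates to a heavy sweep only when it
 * finds PSC set. */
static void light_sweep(struct toy_hw *hw, struct toy_sm *sm)
{
	if (hw->psc)
		heavy_sweep(hw, sm);
}

/* Replays: ports go up -> down, the SM reads PortInfo (all down), ports
 * go down -> up, the SM reads SwitchInfo and clears PSC. Then `sweeps`
 * periodic sweeps run. If `patched`, a fabric that looks like a lone
 * switch (no external port seen up, in this model) gets a heavy sweep. */
static struct toy_sm replay_race(int sweeps, bool patched)
{
	struct toy_hw hw = { { true, true, true, true }, false };
	struct toy_sm sm = { 0, 0 };

	set_all_ports(&hw, false);
	sm.ports_up = count_ports_up(&hw);
	set_all_ports(&hw, true);
	hw.psc = false;

	for (int i = 0; i < sweeps; i++) {
		if (patched && sm.ports_up == 0)
			heavy_sweep(&hw, &sm);
		else
			light_sweep(&hw, &sm);
	}
	return sm;
}
```

Without the single-node check, any number of light sweeps leaves the model SM believing no external port is up; with it, one heavy sweep recovers the real state and later sweeps stay light.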


> So if we clear the PSC bit and only then get PortInfo, we will still catch any new state change,
> right?
> 
> Eli
> 
> 

* Re: [PATCH] opensm/osm_state_mgr.c: force heavy sweep when fabric consists of single switch
@ 2009-11-09 13:54 Eli Dorfman (Voltaire)
From: Eli Dorfman (Voltaire) @ 2009-11-09 13:54 UTC (permalink / raw)
  To: kliteyn-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb
  Cc: Line Holen, Sasha Khapyorsky, Linux RDMA

Yevgeny Kliteynik wrote:
> Eli Dorfman (Voltaire) wrote:
>> Yevgeny Kliteynik wrote:
>>> It can happen when the SM runs on the switch, not on a host.
>>> In this case, if all ports go down, the SM will see
>>> them all down and it will see the PSC bit on.
>>
>> So this patch is only for an SM running on a switch that is the only
>> node in the fabric?
>> I don't see the race when there is more than one switch; please explain.
> 
> Quoting from above:
> 
>   The race can happen on *any* fabric when the SM runs on a switch.
>   But when it does happen, the SM thinks that the whole subnet
>   is just one switch, because that's all it managed to discover.

I saw that, but I don't understand how this can happen.
If the PSC bit is set after *every* port state change and
the SM clears the PSC bit before reading PortInfo from the switch, then there is no race condition.
As I mentioned before, this holds for IS3 switches.
Is there different behavior with IS4 switches?

> 
>> Also, AFAIK the PSC bit is set only after a physical port state change.
> 
> Yes, but it is set only once.

The PSC bit should be set after *every* port state change.

Eli

> Meanwhile, the ports can change from up to down,
> then the SM discovers them, and then from down to up.
> 
> -- Yevgeny
> 
> 
>> So if we clear the PSC bit and only then get PortInfo, we will still
>> catch any new state change,
>> right?
>>
>> Eli
>>
>>

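The sweep-selection test in the quoted hunk can be read as a simple predicate. The C sketch below mirrors only the terms visible in the hunk; the struct, enum and function names (`toy_subnet`, `toy_sm_state`, `light_sweep_allowed`) are hypothetical stand-ins, not OpenSM types, and the real condition in do_sweep() continues past the quoted lines, so treat this as an illustration of the patch's intent rather than the actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the subnet fields read by the quoted hunk;
 * these are not OpenSM's real types. */
enum toy_sm_state { TOY_SM_DISCOVERING, TOY_SM_MASTER };

struct toy_subnet {
	size_t num_switches;        /* plays the role of cl_qmap_count(&sw_guid_tbl) */
	size_t num_nodes;           /* plays the role of cl_qmap_count(&node_guid_tbl) */
	enum toy_sm_state sm_state;
	bool opt_force_heavy_sweep; /* mirrors opt.force_heavy_sweep */
	bool force_heavy_sweep;     /* mirrors force_heavy_sweep */
};

/* True when the SM may try a light sweep first, evaluating only the terms
 * visible in the quoted hunk (the real condition has further terms).
 * False means do_sweep() goes on to a heavy sweep. */
static bool light_sweep_allowed(const struct toy_subnet *s)
{
	assert(s != NULL);
	return s->num_switches != 0
	    && s->num_nodes != 1        /* term added by this patch */
	    && s->sm_state != TOY_SM_DISCOVERING
	    && !s->opt_force_heavy_sweep
	    && !s->force_heavy_sweep;
}
```

With the added term, a master SM that has discovered nothing but its own switch (one node in node_guid_tbl) keeps choosing the heavy sweep, so it should pick up the external ports once they are up.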
* Re: [PATCH] opensm/osm_state_mgr.c: force heavy sweep when fabric consists of single switch
@ 2009-11-11  9:15 Yevgeny Kliteynik
From: Yevgeny Kliteynik @ 2009-11-11  9:15 UTC (permalink / raw)
  To: Eli Dorfman (Voltaire); +Cc: Line Holen, Sasha Khapyorsky, Linux RDMA

Eli Dorfman (Voltaire) wrote:
> Yevgeny Kliteynik wrote:
>> Eli Dorfman (Voltaire) wrote:
>>> Yevgeny Kliteynik wrote:
>>>> Eli Dorfman (Voltaire) wrote:
>>>>> Yevgeny Kliteynik wrote:
>>>>>> Yevgeny Kliteynik wrote:
>>>>>>> Line Holen wrote:
>>>>>>>> On 11/ 4/09 04:54 PM, Yevgeny Kliteynik wrote:
>>>>>>>>> Line Holen wrote:
>>>>>>>>>> On 11/ 4/09 10:47 AM, Yevgeny Kliteynik wrote:
>>>>>>>>>>> Sasha Khapyorsky wrote:
>>>>>>>>>>>> On 12:26 Tue 03 Nov     , Yevgeny Kliteynik wrote:
>>>>>>>>>>>>> Always do heavy sweep when there is only one node in the
>>>>>>>>>>>>> fabric, and this node is a switch, and SM runs on top of it -
>>>>>>>>>>>>> there may be a race when OSM starts running before the
>>>>>>>>>>>>> external ports are up, or if they went through
>>>>>>>>>>>>> reset while SM was starting.
>>>>>>>>>>>>> In this race switch brings up the ports and turns on the
>>>>>>>>>>>>> PSC bit, but OSM might get PortInfo before SwitchInfo, and it
>>>>>>>>>>>>> might see all ports as down, but PSC bit on. If that happens,
>>>>>>>>>>>>> OSM turns off PSC bit, and it will never see external ports
>>>>>>>>>>>>> again - it won't perform any heavy sweep, only light sweep
>>>>>>>>>>>> Could such a race happen when there is more than one node in a
>>>>>>>>>>>> fabric?
>>>>>>>>>>> I think that my description of the race was misleading.
>>>>>>>>>>> The race can happen on *any* fabric when SM runs on switch.
>>>>>>>>>>> But when it does happen, SM thinks that the whole subnet
>>>>>>>>>>> is just one switch - that's what it managed to discover.
>>>>>>>>>>> I've actually seen it happening.
>>>>>>>>>>> So the patch fixes this particular case.
>>>>>>>>>>>
>>>>>>>>>>> So the next question that you would probably ask is can
>>>>>>>>>>> this race happen on some *other* switch and not the one
>>>>>>>>>>> SM is running on?
>>>>>>>>>>>
>>>>>>>>>>> Well, I don't know. I have a hunch that it can't, but I
>>>>>>>>>>> couldn't prove it to myself yet.
>>>>>>>>>>>
>>>>>>>>>>> The race on the managed switch is a special case because
>>>>>>>>>>> SM always sees port 0, and always gets responses to its
>>>>>>>>>>> SMP queries. On any other switch, if the ports were reset,
>>>>>>>>>>> SM won't get any response until the ports are up again.
>>>>>>>>>>>
>>>>>>>>>>> Perhaps there might be a case where SM got some port as down,
>>>>>>>>>>> and by the time SM got SwitchInfo with PSC bit the port
>>>>>>>>>>> was already up, so SM won't start discovery beyond this
>>>>>>>>>>> port. But this race would be fixed on the next heavy sweep,
>>>>>>>>>>> when SM will discover this port that it missed the previous
>>>>>>>>>>> time, whereas race on managed switch is fatal - SM won't
>>>>>>>>>>> ever do any heavy sweep.
>>>>>>>>>>>
>>>>>>>>>>> -- Yevgeny
>>>>>>>>>> At least for the 3.2 branch there is a general race regardless of
>>>>>>>>>> where the SM is running. I haven't checked the current master, but
>>>>>>>>>> I cannot recall seeing any patches related to this so I assume
>>>>>>>>>> the race is still there.
>>>>>>>>>>
>>>>>>>>>> There is a window between SM discovering a switch and clearing PSC
>>>>>>>>>> for the same switch. The SM will not detect a state change on the
>>>>>>>>>> switch ports during this time.
>>>>>>>>> If the port changes state during that period, the switch issues
>>>>>>>>> new trap 128, which (I think) should cause SM to re-discover the
>>>>>>>>> fabric once this discovery cycle is over. Is this correct?
>>>>>>>>>
>>>>>>>> I think the switch shall send a trap whenever it sets the PSC bit.
>>>>>>>> Once set I believe it will not send another trap until it is reset.
>>>>>>>> Or do I misinterpret the spec ?
>>>>>>> I may be wrong, but I thought that this is how things work:
>>>>>>> - port state changes
>>>>>>> - switch turns on PSC bit and starts sending traps
>>>>>>> - SM gets the trap, sends trap repress
>>>>>>> - switch gets trap repress and stops sending traps
>>>>>>> - PSC is still on
>>>>>>> - port state changes again (the same or any other port)
>>>>>>> - switch turns on PSC bit (which doesn't matter as PSC is
>>>>>>>   already on) and starts sending traps again
>>>>>>> - etc...
>>>>>>>
>>>>>>> Anyway, I'll double-check this issue.
>>>>>> Yep, verified.
>>>>>> Switch sends traps regardless of the PSC bit status.
>>>>>> Also, the spec doesn't link them together:
>>>>>>
>>>>>>   o14-5.1.1: If a switch supports Traps (PortInfo:
>>>>>>   CapabilityMask.IsTrapSupported is one), its SMA
>>>>>>   shall send trap 128 to the SM indicated by the
>>>>>>   PortInfo:MasterSMLID under any condition that
>>>>>>   would cause SwitchInfo:PortStateChange to be set
>>>>>>   to one. (See 14.2.5.4 SwitchInfo on page 827.)
>>>>>>
>>>>> Trap will be sent according to the SMLID. After first bring up the
>>>>> SMLID is not set yet and trap will not be sent.
>>>>> In that case the opensm would discover the change only by PSC bit.
>>>>> For IS3 chips the PSC bit and/or trap were set only after one or more
>>>>> ports changed their state, so I don't understand how can the SM
>>>>> discover PSC bit set while all ports are down. Or is this a change in
>>>>> IS4?
>>>> It can happen when SM runs on the switch, not on a host.
>>>> In this case if all ports are going down, SM will see
>>>> them all down and it will see PSC bit on.
>>> So this patch is only for SM running on a switch which is the only
>>> node in the fabric?
>>> I don't see the race when there is more than one switch - please explain.
>> Quoting from above:
>>
>>   The race can happen on *any* fabric when SM runs on switch.
>>   But when it does happen, SM thinks that the whole subnet
>>   is just one switch - that's what it managed to discover.
> 
> I saw that but I don't understand how this can happen.
> If PSC bit is set after *every* port state change and
> SM clears PSC bit before reading PortInfo from the switch,

osm_node_info_rcv.c, ni_rcv_process_switch():
I see in the code that the SM receives NodeInfo, then requests
SwitchInfo, and right after that requests PortInfo for all
the ports without waiting for the SwitchInfo response.
In addition, if this happens during the first master
sweep, the SM LID is not configured yet (or is configured to the
wrong value), so no traps will be received by the SM.
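During that first master sweep the SM is therefore left with a single signal for a port state change: the PSC bit it reads in SwitchInfo. A minimal C sketch of that reasoning follows; the function names are hypothetical, and using LID 0 to mean "MasterSMLID not configured yet" is an assumption made only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model, not OpenSM code. LID 0 is assumed here to mean
 * "MasterSMLID not configured yet" on the switch. */
static bool trap_reaches_sm(uint16_t master_sm_lid, uint16_t sm_lid)
{
	/* Trap 128 is sent to PortInfo:MasterSMLID; it helps only if that is us. */
	return master_sm_lid != 0 && master_sm_lid == sm_lid;
}

/* The SM notices a port state change either through a trap or by finding
 * SwitchInfo:PortStateChange (PSC) set when it reads SwitchInfo. */
static bool sm_notices_change(uint16_t master_sm_lid, uint16_t sm_lid,
			      bool psc_set)
{
	return trap_reaches_sm(master_sm_lid, sm_lid) || psc_set;
}
```

If the trap is misdirected and the SM has already cleared PSC, both inputs are false, which is the stuck state the patch describes.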

-- Yevgeny

> then there is no race condition.
> As I mentioned before for IS3 switches that is correct.
> Is there a different behavior with IS4 switches?
> 
>>> Also AFAIK the PSC bit is set only after any physical port state change.
>> Yes, but it is set only once.
> 
> PSC bit should be set after *every* port state change.
> 
> Eli
> 
>> Meanwhile, the ports can change from up to down,
>> then SM discovers them, and then from down to up.
>>
>> -- Yevgeny
>>
>>
>>> So if we clear the PSC bit and only then get PortInfo we will still
>>> catch any new state change.
>>> right?
>>>
>>> Eli
>>>
>>>
>>>> -- Yevgeny
>>>>
>>>>> Eli
>>>>>
>>>>>> -- Yevgeny
>>>>>>
>>>>>>> -- Yevgeny
>>>>>>>
>>>>>>>>> Or perhaps the more serious problem happens when SM LID is not
>>>>>>>>> configured yet on the switch, hence the trap is not going to the
>>>>>>>>> right place?
>>>>>>>>>
>>>>>>>>>> I have a patch for the 3.2 branch that I can merge into master.
>>>>>>>>> Sure, that would be nice :)
>>>>>>>>>
>>>>>>>>> -- Yevgeny
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Line
>>>>>>>>>>
>>>>>>>>>>>> Sasha
>>>>>>>>>>>>

* Re: [PATCH] opensm/osm_state_mgr.c: force heavy sweep when fabric consists of single switch
@ 2009-11-12  8:05 Eli Dorfman (Voltaire)
From: Eli Dorfman (Voltaire) @ 2009-11-12  8:05 UTC (permalink / raw)
  To: kliteyn-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb, Sasha Khapyorsky
  Cc: Line Holen, Linux RDMA

Yevgeny Kliteynik wrote:
> Eli Dorfman (Voltaire) wrote:
>> Yevgeny Kliteynik wrote:
>>> Eli Dorfman (Voltaire) wrote:
>>>> Yevgeny Kliteynik wrote:
>>>>> Eli Dorfman (Voltaire) wrote:
>>>>>> Yevgeny Kliteynik wrote:
>>>>>>> Yevgeny Kliteynik wrote:
>>>>>>>> Line Holen wrote:
>>>>>>>>> On 11/ 4/09 04:54 PM, Yevgeny Kliteynik wrote:
>>>>>>>>>> Line Holen wrote:
>>>>>>>>>>> On 11/ 4/09 10:47 AM, Yevgeny Kliteynik wrote:
>>>>>>>>>>>> Sasha Khapyorsky wrote:
>>>>>>>>>>>>> On 12:26 Tue 03 Nov     , Yevgeny Kliteynik wrote:
>>>>>>>>>>>>>> Always do heavy sweep when there is only one node in the
>>>>>>>>>>>>>> fabric, and this node is a switch, and SM runs on top of it -
>>>>>>>>>>>>>> there may be a race when OSM starts running before the
>>>>>>>>>>>>>> external ports are up, or if they went through
>>>>>>>>>>>>>> reset while SM was starting.
>>>>>>>>>>>>>> In this race switch brings up the ports and turns on the
>>>>>>>>>>>>>> PSC bit, but OSM might get PortInfo before SwitchInfo, and it
>>>>>>>>>>>>>> might see all ports as down, but PSC bit on. If that happens,
>>>>>>>>>>>>>> OSM turns off PSC bit, and it will never see external ports
>>>>>>>>>>>>>> again - it won't perform any heavy sweep, only light sweep
>>>>>>>>>>>>> Could such a race happen when there is more than one node in a
>>>>>>>>>>>>> fabric?
>>>>>>>>>>>> I think that my description of the race was misleading.
>>>>>>>>>>>> The race can happen on *any* fabric when SM runs on switch.
>>>>>>>>>>>> But when it does happen, SM thinks that the whole subnet
>>>>>>>>>>>> is just one switch - that's what it managed to discover.
>>>>>>>>>>>> I've actually seen it happening.
>>>>>>>>>>>> So the patch fixes this particular case.
>>>>>>>>>>>>
>>>>>>>>>>>> So the next question that you would probably ask is can
>>>>>>>>>>>> this race happen on some *other* switch and not the one
>>>>>>>>>>>> SM is running on?
>>>>>>>>>>>>
>>>>>>>>>>>> Well, I don't know. I have a hunch that it can't, but I
>>>>>>>>>>>> couldn't prove it to myself yet.
>>>>>>>>>>>>
>>>>>>>>>>>> The race on the managed switch is a special case because
>>>>>>>>>>>> SM always sees port 0, and always gets responses to its
>>>>>>>>>>>> SMP queries. On any other switch, if the ports were reset,
>>>>>>>>>>>> SM won't get any response until the ports are up again.
>>>>>>>>>>>>
>>>>>>>>>>>> Perhaps there might be a case where SM got some port as down,
>>>>>>>>>>>> and by the time SM got SwitchInfo with PSC bit the port
>>>>>>>>>>>> was already up, so SM won't start discovery beyond this
>>>>>>>>>>>> port. But this race would be fixed on the next heavy sweep,
>>>>>>>>>>>> when SM will discover this port that it missed the previous
>>>>>>>>>>>> time, whereas race on managed switch is fatal - SM won't
>>>>>>>>>>>> ever do any heavy sweep.
>>>>>>>>>>>>
>>>>>>>>>>>> -- Yevgeny
>>>>>>>>>>> At least for the 3.2 branch there is a general race regardless of
>>>>>>>>>>> where the SM is running. I haven't checked the current master, but
>>>>>>>>>>> I cannot recall seeing any patches related to this so I assume
>>>>>>>>>>> the race is still there.
>>>>>>>>>>>
>>>>>>>>>>> There is a window between SM discovering a switch and clearing PSC
>>>>>>>>>>> for the same switch. The SM will not detect a state change on the
>>>>>>>>>>> switch ports during this time.
>>>>>>>>>> If the port changes state during that period, the switch issues
>>>>>>>>>> new trap 128, which (I think) should cause SM to re-discover the
>>>>>>>>>> fabric once this discovery cycle is over. Is this correct?
>>>>>>>>>>
>>>>>>>>> I think the switch shall send a trap whenever it sets the PSC bit.
>>>>>>>>> Once set I believe it will not send another trap until it is reset.
>>>>>>>>> Or do I misinterpret the spec ?
>>>>>>>> I may be wrong, but I thought that this is how things work:
>>>>>>>> - port state changes
>>>>>>>> - switch turns on PSC bit and starts sending traps
>>>>>>>> - SM gets the trap, sends trap repress
>>>>>>>> - switch gets trap repress and stops sending traps
>>>>>>>> - PSC is still on
>>>>>>>> - port state changes again (the same or any other port)
>>>>>>>> - switch turns on PSC bit (which doesn't matter as PSC is
>>>>>>>>   already on) and starts sending traps again
>>>>>>>> - etc...
>>>>>>>>
>>>>>>>> Anyway, I'll double-check this issue.
>>>>>>> Yep, verified.
>>>>>>> Switch sends traps regardless of the PSC bit status.
>>>>>>> Also, the spec doesn't link them together:
>>>>>>>
>>>>>>>   o14-5.1.1: If a switch supports Traps (PortInfo:
>>>>>>>   CapabilityMask.IsTrapSupported is one), its SMA
>>>>>>>   shall send trap 128 to the SM indicated by the
>>>>>>>   PortInfo:MasterSMLID under any condition that
>>>>>>>   would cause SwitchInfo:PortStateChange to be set
>>>>>>>   to one. (See 14.2.5.4 SwitchInfo on page 827.)
>>>>>>>
>>>>>> Trap will be sent according to the SMLID. After first bring up the
>>>>>> SMLID is not set yet and trap will not be sent.
>>>>>> In that case the opensm would discover the change only by PSC bit.
>>>>>> For IS3 chips the PSC bit and/or trap were set only after one or more
>>>>>> ports changed their state, so I don't understand how can the SM
>>>>>> discover PSC bit set while all ports are down. Or is this a change in
>>>>>> IS4?
>>>>> It can happen when SM runs on the switch, not on a host.
>>>>> In this case if all ports are going down, SM will see
>>>>> them all down and it will see PSC bit on.
>>>> So this patch is only for SM running on a switch which is the only
>>>> node in the fabric?
>>>> I don't see the race when there is more than one switch - please explain.
>>> Quoting from above:
>>>
>>>   The race can happen on *any* fabric when SM runs on switch.
>>>   But when it does happen, SM thinks that the whole subnet
>>>   is just one switch - that's what it managed to discover.
>>
>> I saw that but I don't understand how this can happen.
>> If PSC bit is set after *every* port state change and
>> SM clears PSC bit before reading PortInfo from the switch,
> 
> osm_node_info_rcv.c, ni_rcv_process_switch():
> I see in the code that the SM receives NodeInfo, then requests
> SwitchInfo, and right after that requests PortInfo for all
> the ports without waiting for the SwitchInfo response.
> In addition, if this happens during the first master
> sweep, the SM LID is not configured yet (or is configured to the
> wrong value), so no traps will be received by the SM.

I agree that the trap will not be received, but the PSC bit in SwitchInfo will still be set for any port transition.
See also the spec:
"It is set to one anytime the PortState component in the
PortInfo of any ports transitions from Down to Initialize,
Initialize to Down, Armed to Down, or Active to Down as
a result of link state machine logic. Changes in PortState
resulting from SubnSet() do not change this bit.
This bit is cleared by writing one, writing zero is
ignored."
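The quoted definition amounts to a sticky, write-one-to-clear flag. The C sketch below encodes only what the quoted text says; the enum and function names are hypothetical and are not taken from OpenSM or the spec.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the PortInfo:PortState values involved. */
enum port_state { PS_DOWN, PS_INIT, PS_ARMED, PS_ACTIVE };
enum change_source { LINK_STATE_MACHINE, SUBN_SET };

/* Does this PortState transition set SwitchInfo:PortStateChange? */
static bool transition_sets_psc(enum port_state from, enum port_state to,
				enum change_source src)
{
	if (src == SUBN_SET)
		return false;  /* changes made by SubnSet() leave PSC alone */
	return (from == PS_DOWN && to == PS_INIT) ||
	       (from == PS_INIT && to == PS_DOWN) ||
	       (from == PS_ARMED && to == PS_DOWN) ||
	       (from == PS_ACTIVE && to == PS_DOWN);
}

/* PSC is sticky: once set, it stays set until the SM clears it. */
static bool psc_after_transition(bool psc, enum port_state from,
				 enum port_state to, enum change_source src)
{
	return psc || transition_sets_psc(from, to, src);
}

/* Writing one clears PSC; writing zero is ignored. */
static bool psc_after_write(bool psc, bool written)
{
	return written ? false : psc;
}
```

Because the bit is sticky, all the SM can observe is whether at least one qualifying transition happened since its last clear.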

The problem is that the SwitchInfo PSC bit is cleared too late, in osm_ucast_mgr.c, ucast_mgr_set_fwd_top(),
and not earlier (i.e. in osm_sw_info_rcv.c, osm_si_rcv_process_existing()).
The PSC bit should be cleared before the SM reads the PortInfo from that switch.
That way, any later port state change sets PSC again and triggers discovery of the new ports even if the trap is missed.
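The proposal can be checked with a toy model that replays the race under both orderings. The struct and function names below are hypothetical, and the model collapses the asynchronous SMP traffic into a fixed sequence of steps; it also assumes, as the thread describes, that a light sweep escalates to a heavy sweep only when it finds PSC set. It is a sketch of the argument, not OpenSM code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of the SM's own switch with one external port. */
struct toy_sw {
	bool ext_port_up;
	bool psc;  /* SwitchInfo:PortStateChange, sticky until cleared */
};

/* The link state machine brings the external port up, which sets PSC. */
static void port_comes_up(struct toy_sw *sw)
{
	sw->ext_port_up = true;
	sw->psc = true;
}

/* One discovery pass in which the external port comes up only after its
 * PortInfo has been read. Returns the PSC value the next light sweep will
 * find: true means it escalates to a heavy sweep and finds the port. */
static bool psc_seen_by_next_light_sweep(bool clear_psc_before_portinfo)
{
	struct toy_sw sw = { .ext_port_up = false, .psc = true };
	bool portinfo_says_up;

	if (clear_psc_before_portinfo)
		sw.psc = false;                 /* proposed: clear PSC first */

	portinfo_says_up = sw.ext_port_up;  /* PortInfo read: port is down */
	port_comes_up(&sw);                 /* the race window */

	if (!clear_psc_before_portinfo)
		sw.psc = false;                 /* current: clear PSC late */

	assert(!portinfo_says_up);          /* discovery stopped at this port */
	return sw.psc;
}
```

With the late clear, the SM erases the only record of the port coming up; with the early clear, the transition sets PSC again after the clear and the next light sweep escalates.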

Eli

> 
> -- Yevgeny
> 
>> then there is no race condition.
>> As I mentioned before for IS3 switches that is correct.
>> Is there a different behavior with IS4 switches?
>>
>>>> Also AFAIK the PSC bit is set only after any physical port state change.
>>> Yes, but it is set only once.
>>
>> PSC bit should be set after *every* port state change.
>>
>> Eli
>>
>>> Meanwhile, the ports can change from up to down,
>>> then SM discovers them, and then from down to up.
>>>
>>> -- Yevgeny
>>>
>>>
>>>> So if we clear the PSC bit and only then get PortInfo we will still
>>>> catch any new state change.
>>>> right?
>>>>
>>>> Eli
>>>>
>>>>
>>>>> -- Yevgeny
>>>>>
>>>>>> Eli
>>>>>>
>>>>>>> -- Yevgeny
>>>>>>>
>>>>>>>> -- Yevgeny
>>>>>>>>
>>>>>>>>>> Or perhaps the more serious problem happens when SM LID is not
>>>>>>>>>> configured yet on the switch, hence the trap is not going to the
>>>>>>>>>> right place?
>>>>>>>>>>
>>>>>>>>>>> I have a patch for the 3.2 branch that I can merge into master.
>>>>>>>>>> Sure, that would be nice :)
>>>>>>>>>>
>>>>>>>>>> -- Yevgeny
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Line
>>>>>>>>>>>
>>>>>>>>>>>>> Sasha
>>>>>>>>>>>>>

* Re: [PATCH] opensm/osm_state_mgr.c: force heavy sweep when fabric consists of single switch
@ 2009-11-12 12:50 Yevgeny Kliteynik
From: Yevgeny Kliteynik @ 2009-11-12 12:50 UTC (permalink / raw)
  To: Eli Dorfman (Voltaire); +Cc: Sasha Khapyorsky, Line Holen, Linux RDMA

Eli Dorfman (Voltaire) wrote:
> Yevgeny Kliteynik wrote:
>> Eli Dorfman (Voltaire) wrote:
>>> Yevgeny Kliteynik wrote:
>>>> Eli Dorfman (Voltaire) wrote:
>>>>> Yevgeny Kliteynik wrote:
>>>>>> Eli Dorfman (Voltaire) wrote:
>>>>>>> Yevgeny Kliteynik wrote:
>>>>>>>> Yevgeny Kliteynik wrote:
>>>>>>>>> Line Holen wrote:
>>>>>>>>>> On 11/ 4/09 04:54 PM, Yevgeny Kliteynik wrote:
>>>>>>>>>>> Line Holen wrote:
>>>>>>>>>>>> On 11/ 4/09 10:47 AM, Yevgeny Kliteynik wrote:
>>>>>>>>>>>>> Sasha Khapyorsky wrote:
>>>>>>>>>>>>>> On 12:26 Tue 03 Nov     , Yevgeny Kliteynik wrote:
>>>>>>>>>>>>>>> Always do heavy sweep when there is only one node in the
>>>>>>>>>>>>>>> fabric, and this node is a switch, and SM runs on top of it -
>>>>>>>>>>>>>>> there may be a race when OSM starts running before the
>>>>>>>>>>>>>>> external ports are up, or if they went through
>>>>>>>>>>>>>>> reset while SM was starting.
>>>>>>>>>>>>>>> In this race switch brings up the ports and turns on the
>>>>>>>>>>>>>>> PSC bit, but OSM might get PortInfo before SwitchInfo, and it
>>>>>>>>>>>>>>> might see all ports as down, but PSC bit on. If that happens,
>>>>>>>>>>>>>>> OSM turns off PSC bit, and it will never see external ports
>>>>>>>>>>>>>>> again - it won't perform any heavy sweep, only light sweep
>>>>>>>>>>>>>> Could such a race happen when there is more than one node in a
>>>>>>>>>>>>>> fabric?
>>>>>>>>>>>>> I think that my description of the race was misleading.
>>>>>>>>>>>>> The race can happen on *any* fabric when SM runs on switch.
>>>>>>>>>>>>> But when it does happen, SM thinks that the whole subnet
>>>>>>>>>>>>> is just one switch - that's what it managed to discover.
>>>>>>>>>>>>> I've actually seen it happening.
>>>>>>>>>>>>> So the patch fixes this particular case.
>>>>>>>>>>>>>
>>>>>>>>>>>>> So the next question that you would probably ask is can
>>>>>>>>>>>>> this race happen on some *other* switch and not the one
>>>>>>>>>>>>> SM is running on?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Well, I don't know. I have a hunch that it can't, but I
>>>>>>>>>>>>> couldn't prove it to myself yet.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The race on the managed switch is a special case because
>>>>>>>>>>>>> SM always sees port 0, and always gets responses to its
>>>>>>>>>>>>> SMP queries. On any other switch, if the ports were reset,
>>>>>>>>>>>>> SM won't get any response until the ports are up again.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Perhaps there might be a case where SM got some port as down,
>>>>>>>>>>>>> and by the time SM got SwitchInfo with PSC bit the port
>>>>>>>>>>>>> was already up, so SM won't start discovery beyond this
>>>>>>>>>>>>> port. But this race would be fixed on the next heavy sweep,
>>>>>>>>>>>>> when SM will discover this port that it missed the previous
>>>>>>>>>>>>> time, whereas race on managed switch is fatal - SM won't
>>>>>>>>>>>>> ever do any heavy sweep.
>>>>>>>>>>>>>
>>>>>>>>>>>>> -- Yevgeny
>>>>>>>>>>>> At least for the 3.2 branch there is a general race regardless of
>>>>>>>>>>>> where the SM is running. I haven't checked the current master, but
>>>>>>>>>>>> I cannot recall seeing any patches related to this so I assume
>>>>>>>>>>>> the race is still there.
>>>>>>>>>>>>
>>>>>>>>>>>> There is a window between SM discovering a switch and clearing PSC
>>>>>>>>>>>> for the same switch. The SM will not detect a state change on the
>>>>>>>>>>>> switch ports during this time.
>>>>>>>>>>> If the port changes state during that period, the switch issues a
>>>>>>>>>>> new trap 128, which (I think) should cause SM to re-discover the
>>>>>>>>>>> fabric once this discovery cycle is over. Is this correct?
>>>>>>>>>>>
>>>>>>>>>> I think the switch shall send a trap whenever it sets the PSC bit.
>>>>>>>>>> Once set, I believe it will not send another trap until it is
>>>>>>>>>> reset.
>>>>>>>>>> Or do I misinterpret the spec?
>>>>>>>>> I may be wrong, but I thought that this is how things work:
>>>>>>>>> - port state changes
>>>>>>>>> - switch turns on PSC bit and starts sending traps
>>>>>>>>> - SM gets the trap, sends trap repress
>>>>>>>>> - switch gets trap repress and stops sending traps
>>>>>>>>> - PSC is still on
>>>>>>>>> - port state changes again (the same or any other port)
>>>>>>>>> - switch turns on PSC bit (which doesn't matter as PSC is
>>>>>>>>>   already on) and starts sending traps again
>>>>>>>>> - etc...
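The trap/repress sequence listed above can be modeled as a small C state machine. This is a hypothetical sketch, not switch firmware or OpenSM code: `struct sma_model` and the `sma_*` functions are illustrative names, and the model encodes only the behavior described in this thread, namely that a new port state change re-arms trap 128 whether or not PSC is already set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a switch SMA; names are illustrative only. */
struct sma_model {
	bool psc;            /* SwitchInfo:PortStateChange */
	bool trap_armed;     /* trap 128 is being (re)sent until repressed */
	unsigned traps_sent; /* number of trap 128 transmissions so far */
};

/* A port state change sets PSC (no effect if it is already set) and
 * re-arms trap 128 regardless of the previous PSC value. */
static void sma_port_state_change(struct sma_model *s)
{
	s->psc = true;
	s->trap_armed = true;
}

/* One retransmission interval: an armed trap is sent (again). */
static void sma_tick(struct sma_model *s)
{
	/* In this model a trap is only ever armed together with PSC. */
	assert(!s->trap_armed || s->psc);
	if (s->trap_armed)
		s->traps_sent++;
}

/* The SM answers with a trap repress: retransmission stops,
 * but PSC stays set until the SM explicitly clears it. */
static void sma_trap_repress(struct sma_model *s)
{
	s->trap_armed = false;
}
```

For example, change, tick, repress, tick leaves one trap sent and PSC still set; a second change followed by a tick sends another trap even though PSC never went back to zero.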
>>>>>>>>>
>>>>>>>>> Anyway, I'll double-check this issue.
>>>>>>>> Yep, verified.
>>>>>>>> Switch sends traps regardless of the PSC bit status.
>>>>>>>> Also, the spec doesn't link them together:
>>>>>>>>
>>>>>>>>   o14-5.1.1: If a switch supports Traps (PortInfo:
>>>>>>>>   CapabilityMask.IsTrapSupported is one), its SMA
>>>>>>>>   shall send trap 128 to the SM indicated by the
>>>>>>>>   PortInfo:MasterSMLID under any condition that would
>>>>>>>>   cause SwitchInfo:PortStateChange to be set to one.
>>>>>>>>   (See 14.2.5.4 SwitchInfo on page 827.)
>>>>>>>>
>>>>>>> The trap is sent according to the SMLID. After the first bring-up the
>>>>>>> SMLID is not set yet, so the trap will not be sent.
>>>>>>> In that case OpenSM would discover the change only by the PSC bit.
>>>>>>> For IS3 chips the PSC bit and/or trap were set only after one or more
>>>>>>> ports changed their state, so I don't understand how the SM can
>>>>>>> discover the PSC bit set while all ports are down. Or is this a change
>>>>>>> in IS4?
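A minimal sketch of the point above, assuming a trap is only useful to the SM if PortInfo:MasterSMLID on the switch already holds this SM's LID, and treating LID 0 as "not configured yet" (an assumption made for the sketch). Both function names are hypothetical. When the trap cannot reach the SM, the only remaining signal is the PSC bit seen when SwitchInfo is polled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: does a trap 128 from the switch reach the SM at sm_lid?
 * master_sm_lid is the switch's PortInfo:MasterSMLID; 0 is treated here
 * as "not configured yet". */
static bool trap_reaches_sm(uint16_t master_sm_lid, uint16_t sm_lid)
{
	return master_sm_lid != 0 && master_sm_lid == sm_lid;
}

/* The SM learns about the port change either from the trap or from the
 * PSC bit it reads when polling SwitchInfo. */
static bool sm_learns_of_change(uint16_t master_sm_lid, uint16_t sm_lid,
				bool psc_seen_on_poll)
{
	return trap_reaches_sm(master_sm_lid, sm_lid) || psc_seen_on_poll;
}
```

With an unconfigured or wrong MasterSMLID, losing the PSC bit means the SM has no way left to notice the change.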
>>>>>> It can happen when SM runs on the switch, not on a host.
>>>>>> In this case, if all ports are going down, SM will see
>>>>>> them all down and it will see the PSC bit on.
>>>>> So this patch is only for SM running on a switch which is the only
>>>>> node in the fabric?
>>>>> I don't see the race when there is more than one switch - please
>>>>> explain.
>>>> Quoting from above:
>>>>
>>>>   The race can happen on *any* fabric when SM runs on switch.
>>>>   But when it does happen, SM thinks that the whole subnet
>>>>   is just one switch - that's what it managed to discover.
>>> I saw that but I don't understand how this can happen.
>>> If PSC bit is set after *every* port state change and
>>> SM clears PSC bit before reading PortInfo from the switch,
>> osm_node_info_rcv.c, ni_rcv_process_switch():
>> I see in the code that SM receives NodeInfo, then it requests
>> SwitchInfo and right after that it requests PortInfo for all
>> the ports w/o waiting for the SwitchInfo response.
>> In addition to that, if it happens during the first master
>> sweep, SM LID is not configured yet, or configured to the
>> wrong value, so no traps will be received by the SM.
> 
> I agree that the trap will not be received, but the PSC bit in SwitchInfo will still be set for any port transition.
> See also the spec:
> "It is set to one anytime the PortState component in the
> PortInfo of any ports transitions from Down to Initialize,
> Initialize to Down, Armed to Down, or Active to Down as
> a result of link state machine logic. Changes in Portstate
> resulting from SubnSet() do not change this bit.
> This bit is cleared by writing one, writing zero is
> ignored."
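The PSC rules in the quoted spec text can be written down as two small C predicates. This is a sketch for illustration only: `enum port_state`, `transition_sets_psc()` and `psc_after_write()` are hypothetical names, not OpenSM types or functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical port states, in the spec's order. */
enum port_state { PORT_DOWN, PORT_INIT, PORT_ARMED, PORT_ACTIVE };

/* Per the quoted text, PSC is set on Down->Init, Init->Down, Armed->Down
 * and Active->Down, but only when driven by the link state machine, not
 * by a SubnSet(). */
static bool transition_sets_psc(enum port_state from, enum port_state to,
				bool caused_by_subn_set)
{
	if (caused_by_subn_set)
		return false;
	return (from == PORT_DOWN && to == PORT_INIT) ||
	       (from != PORT_DOWN && to == PORT_DOWN);
}

/* Write-one-to-clear: writing 1 clears PSC, writing 0 is ignored. */
static bool psc_after_write(bool psc, bool written_value)
{
	return written_value ? false : psc;
}
```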
> 
> The problem is that switch info PSC bit is cleared too late - in osm_ucast_mgr.c, ucast_mgr_set_fwd_top()
> and not earlier (i.e. in osm_sw_info_rcv.c, osm_si_rcv_process_existing()).
> The PSC bit should be cleared before SM reads the PortInfo from that switch.
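The ordering argument can be checked with a toy model. It is a hypothetical sketch, not OpenSM code: one switch with one external port that comes up (Down to Init, which sets PSC per the spec text quoted above), and an SM that does Get(PortInfo) and a PSC-clearing Set(SwitchInfo) in one of two orders. The port change is placed in the worst slot for each order.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one switch with one external port. */
struct sw_model {
	bool port_up; /* actual port state */
	bool psc;     /* SwitchInfo:PortStateChange */
};

/* The link comes up: Down -> Init sets PSC. */
static void link_comes_up(struct sw_model *sw)
{
	sw->port_up = true;
	sw->psc = true;
}

/* Set(SwitchInfo) writing one to PSC clears it. */
static void sm_clear_psc(struct sw_model *sw)
{
	sw->psc = false;
}

/* Returns true if the change is lost: the SM's cached port state is stale
 * and PSC is no longer set to tell a later light sweep about it. */
static bool change_is_lost(bool clear_psc_before_portinfo)
{
	struct sw_model sw = { .port_up = false, .psc = false };
	bool seen_up; /* what Get(PortInfo) returned */

	if (clear_psc_before_portinfo) {
		sm_clear_psc(&sw);
		seen_up = sw.port_up;  /* Get(PortInfo) */
		link_comes_up(&sw);    /* change after the read: PSC stays set */
	} else {
		seen_up = sw.port_up;  /* Get(PortInfo) */
		link_comes_up(&sw);    /* change between read and clear */
		sm_clear_psc(&sw);     /* wipes the only remaining evidence */
	}
	return seen_up != sw.port_up && !sw.psc;
}
```

Clearing first means any later change leaves PSC set for the next sweep; reading first leaves a window in which the clear erases it.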

This may be true, but do you really want to require
Set(SwitchInfo) *completion* before getting other PortInfo
MADs from this switch?
First of all, this means that you will have more than one
Set(SwitchInfo) for the switch (during the discovery you
don't know what is the LFT top). Second, it will slow down
the whole discovery - you will introduce lots of new
barriers that will damage the whole asynchronous chain
reaction of the discovery.

-- Yevgeny

> In that case any PSC change will trigger discovery of new ports even if trap is missed.
> 
> Eli
> 
>> -- Yevgeny
>>
>>> then there is no race condition.
>>> As I mentioned before, for IS3 switches that is correct.
>>> Is there a different behavior with IS4 switches?
>>>
>>>>> Also AFAIK the PSC bit is set only after any physical port state
>>>>> change.
>>>> Yes, but it is set only once.
>>> PSC bit should be set after *every* port state change.
>>>
>>> Eli
>>>
>>>> Meanwhile, the ports can change from up to down,
>>>> then SM discovers them, and then from down to up.
>>>>
>>>> -- Yevgeny
>>>>
>>>>
>>>>> So if we clear the PSC bit and only then get PortInfo, we will still
>>>>> catch any new state change.
>>>>> Right?
>>>>>
>>>>> Eli
>>>>>
>>>>>
>>>>>> -- Yevgeny
>>>>>>
>>>>>>> Eli
>>>>>>>
>>>>>>>> -- Yevgeny
>>>>>>>>
>>>>>>>>> -- Yevgeny
>>>>>>>>>
>>>>>>>>>>> Or perhaps the more serious problem happens when SM LID is not
>>>>>>>>>>> configured yet on the switch, hence the trap is not going to the
>>>>>>>>>>> right place?
>>>>>>>>>>>
>>>>>>>>>>>> I have a patch for the 3.2 branch that I can merge into master.
>>>>>>>>>>> Sure, that would be nice :)
>>>>>>>>>>>
>>>>>>>>>>> -- Yevgeny
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Line
>>>>>>>>>>>>
>>>>>>>>>>>>>> Sasha
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Signed-off-by: Yevgeny Kliteynik <kliteyn-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>>  opensm/opensm/osm_state_mgr.c |   15 ++++++++++-----
>>>>>>>>>>>>>>>  1 files changed, 10 insertions(+), 5 deletions(-)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> diff --git a/opensm/opensm/osm_state_mgr.c b/opensm/opensm/osm_state_mgr.c
>>>>>>>>>>>>>>> index 4303d6e..537c855 100644
>>>>>>>>>>>>>>> --- a/opensm/opensm/osm_state_mgr.c
>>>>>>>>>>>>>>> +++ b/opensm/opensm/osm_state_mgr.c
>>>>>>>>>>>>>>> @@ -1062,13 +1062,18 @@ static void do_sweep(osm_sm_t * sm)
>>>>>>>>>>>>>>>       * Otherwise, this is probably our first discovery pass
>>>>>>>>>>>>>>>       * or we are connected in loopback. In both cases do a
>>>>>>>>>>>>>>>       * heavy sweep.
>>>>>>>>>>>>>>> -     * Note: If we are connected in loopback we want a heavy
>>>>>>>>>>>>>>> -     * sweep, since we will not be getting any traps if there is
>>>>>>>>>>>>>>> -     * a lost connection.
>>>>>>>>>>>>>>> +     * Note the following:
>>>>>>>>>>>>>>> +     * 1. If we are connected in loopback we want a heavy sweep, since we
>>>>>>>>>>>>>>> +     *    will not be getting any traps if there is a lost connection.
>>>>>>>>>>>>>>> +     * 2. If we are in DISCOVERING state - this means it is either in
>>>>>>>>>>>>>>> +     *    initializing or wake up from STANDBY - run the heavy sweep.
>>>>>>>>>>>>>>> +     * 3. If there is only one node in the fabric, and this node is a
>>>>>>>>>>>>>>> +     *    switch, and OSM runs on top of it, there might be a race when
>>>>>>>>>>>>>>> +     *    OSM starts running before the external ports are up - run the
>>>>>>>>>>>>>>> +     *    heavy sweep.
>>>>>>>>>>>>>>>       */
>>>>>>>>>>>>>>> -    /*  if we are in DISCOVERING state - this means it is either in
>>>>>>>>>>>>>>> -     *  initializing or wake up from STANDBY - run the heavy sweep */
>>>>>>>>>>>>>>>      if (cl_qmap_count(&sm->p_subn->sw_guid_tbl)
>>>>>>>>>>>>>>> +        && cl_qmap_count(&sm->p_subn->node_guid_tbl) != 1
>>>>>>>>>>>>>>>          && sm->p_subn->sm_state != IB_SMINFO_STATE_DISCOVERING
>>>>>>>>>>>>>>>          && sm->p_subn->opt.force_heavy_sweep == FALSE
>>>>>>>>>>>>>>>          && sm->p_subn->force_heavy_sweep == FALSE
>>>>>>>>>>>>>>> -- 
>>>>>>>>>>>>>>> 1.5.1.4
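The condition changed by the patch can be restated as a standalone predicate. This is a sketch: `struct sweep_inputs` and `light_sweep_candidate()` are hypothetical, each field stands in for the expression named in its comment, and the real `if` in do_sweep() continues with further conditions that this hunk does not show, so they are omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the values the patched condition reads. */
struct sweep_inputs {
	size_t num_switches;  /* cl_qmap_count(&sm->p_subn->sw_guid_tbl) */
	size_t num_nodes;     /* cl_qmap_count(&sm->p_subn->node_guid_tbl) */
	bool discovering;     /* sm_state == IB_SMINFO_STATE_DISCOVERING */
	bool opt_force_heavy; /* sm->p_subn->opt.force_heavy_sweep */
	bool force_heavy;     /* sm->p_subn->force_heavy_sweep */
};

/* True when the terms visible in the hunk still permit a light sweep;
 * otherwise do_sweep() falls through to a heavy sweep. */
static bool light_sweep_candidate(const struct sweep_inputs *in)
{
	return in->num_switches != 0
	    && in->num_nodes != 1 /* the term added by the patch */
	    && !in->discovering
	    && !in->opt_force_heavy
	    && !in->force_heavy;
}
```

With the patch, a subnet view containing a single node (the managed switch itself) never qualifies, so every sweep stays heavy until more nodes are discovered.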
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -- 
>>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe
>>>>>>>>>>>>>>> linux-rdma" in
>>>>>>>>>>>>>>> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>>>>>>>>>>>>>>> More majordomo info at
>>>>>>>>>>>>>>> http://vger.kernel.org/majordomo-info.html

* Re: [PATCH] opensm/osm_state_mgr.c: force heavy sweep when fabric consists of single switch
@ 2009-11-13  8:24                                                             ` Eli Dorfman
From: Eli Dorfman @ 2009-11-13  8:24 UTC (permalink / raw)
  To: kliteyn-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb
  Cc: Sasha Khapyorsky, Line Holen, Linux RDMA

On Thu, Nov 12, 2009 at 2:50 PM, Yevgeny Kliteynik
<kliteyn-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> wrote:
> Eli Dorfman (Voltaire) wrote:
>>
>> The problem is that switch info PSC bit is cleared too late - in
>> osm_ucast_mgr.c, ucast_mgr_set_fwd_top() and not earlier (i.e. in
>> osm_sw_info_rcv.c, osm_si_rcv_process_existing()).
>> The PSC bit should be cleared before SM reads the PortInfo from that
>> switch.
>
> This may be true, but do you really want to require
> Set(SwitchInfo) *completion* before getting other PortInfo
> MADs from this switch?

I don't see any other good alternative, and I'm sure this will solve
all race conditions.
Most of the Set(SwitchInfo) requests will be sent in parallel with other
discovery MADs (PortInfo, NodeInfo, etc.).

> First of all, this means that you will have more than one
> Set(SwitchInfo) for the switch (during the discovery you
> don't know what is the LFT top).

That is correct, but the number of switches is small, and compared to all
the MADs that are sent during discovery this is a very small penalty.

> Second, it will slow down
> the whole discovery - you will introduce lots of new
> barriers that will damage the whole asynchronous chain
> reaction of the discovery.

Not necessarily.
I suggest that when the PSC bit is set in the SwitchInfo response, we send
Set(SwitchInfo) to clear the PSC bit, and in the response to that
Set(SwitchInfo) send Get(PortInfo) - this requires adding
another flag to the MAD context.
If the PSC bit is not set or SM is not in Master state, Get(PortInfo) will
be done immediately.
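A rough sketch of the proposed flow as C decision functions. Everything here is hypothetical: OpenSM has no `enum next_mad` or `struct mad_ctx_sketch`, and the `clear_psc_then_get_portinfo` field only stands in for the extra MAD-context flag mentioned above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; nothing here exists in OpenSM. */
enum next_mad {
	NEXT_SET_SWITCHINFO_CLEAR_PSC, /* write one to PSC first */
	NEXT_GET_PORTINFO              /* go straight to port discovery */
};

struct mad_ctx_sketch {
	bool clear_psc_then_get_portinfo; /* the proposed extra flag */
};

/* On a Get(SwitchInfo) response: only a master SM that sees PSC set
 * inserts the clearing Set(SwitchInfo) before querying ports. */
static enum next_mad on_get_switchinfo_resp(bool psc_set, bool sm_is_master,
					    struct mad_ctx_sketch *ctx)
{
	ctx->clear_psc_then_get_portinfo = psc_set && sm_is_master;
	return ctx->clear_psc_then_get_portinfo ?
	       NEXT_SET_SWITCHINFO_CLEAR_PSC : NEXT_GET_PORTINFO;
}

/* On a Set(SwitchInfo) response: only a PSC-clearing Set leads on to
 * Get(PortInfo). */
static bool on_set_switchinfo_resp(const struct mad_ctx_sketch *ctx)
{
	return ctx->clear_psc_then_get_portinfo;
}
```

The flag keeps other Set(SwitchInfo) MADs for the same switch, such as the later LFT-top update, from triggering port queries, and switches without PSC set take no extra round trip.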

What do you think?

Eli


* Re: [PATCH] opensm/osm_state_mgr.c: force heavy sweep when fabric consists of single switch
@ 2009-11-13 15:20                                                                 ` Sasha Khapyorsky
From: Sasha Khapyorsky @ 2009-11-13 15:20 UTC (permalink / raw)
  To: Eli Dorfman
  Cc: kliteyn-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb, Line Holen, Linux RDMA

On 10:24 Fri 13 Nov     , Eli Dorfman wrote:
> On Thu, Nov 12, 2009 at 2:50 PM, Yevgeny Kliteynik
> <kliteyn-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> wrote:
> > Eli Dorfman (Voltaire) wrote:
> >>
> >> Yevgeny Kliteynik wrote:
> >>>
> >>> Eli Dorfman (Voltaire) wrote:
> >>>>
> >>>> Yevgeny Kliteynik wrote:
> >>>>>
> >>>>> Eli Dorfman (Voltaire) wrote:
> >>>>>>
> >>>>>> Yevgeny Kliteynik wrote:
> >>>>>>>
> >>>>>>> Eli Dorfman (Voltaire) wrote:
> >>>>>>>>
> >>>>>>>> Yevgeny Kliteynik wrote:
> >>>>>>>>>
> >>>>>>>>> Yevgeny Kliteynik wrote:
> >>>>>>>>>>
> >>>>>>>>>> Line Holen wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> On 11/ 4/09 04:54 PM, Yevgeny Kliteynik wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Line Holen wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On 11/ 4/09 10:47 AM, Yevgeny Kliteynik wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Sasha Khapyorsky wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On 12:26 Tue 03 Nov     , Yevgeny Kliteynik wrote:

This is a nice and productive discussion. But please remove context
which is not related to the current point - it helps to follow the thread.

[snip]

> >> The PSC bit should be cleared before SM reads the PortInfo from that
> >> switch.
> >
> > This may be true, but do you really want to require
> > Set(SwitchInfo) *completion* before getting other PortInfo
> > MADs from this switch?
> 
> I don't see any other good alternative and I'm sure this will solve
> all race conditions.

I agree with Eli - if we want to prevent PSC bit loss (which can be
relevant for any subnet/SM topology), we need to clear it first and then
fetch PortInfo. Obviously this may have some performance penalty, but
proper discovery is more important.

> I suggest that when PSC bit is set in SwitchInfo response we will send
> Set(SwitchInfo) to clear the PSC bit and
> in SwitchInfo response send Get(PortInfo) - this requires adding
> another flag to the mad context.

Another flag, or reuse one of the existing flags.

> If PSC bit is not set or SM is not in Master state Get(PortInfo) will
> be done immediately.
> 
> what do you think?

Another potential issue is avoiding a secondary trap-triggered sweep in
cases when the PSC change is already cached.

Basically, this looks like the right direction to me.

Sasha

end of thread, other threads:[~2009-11-13 15:20 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-11-03 10:26 [PATCH] opensm/osm_state_mgr.c: force heavy sweep when fabric consists of single switch Yevgeny Kliteynik
2009-11-03 22:12   ` Sasha Khapyorsky
2009-11-04  9:47     ` Yevgeny Kliteynik
2009-11-04 11:36         ` Line Holen
2009-11-04 15:54             ` Yevgeny Kliteynik
2009-11-04 17:42                 ` Line Holen
2009-11-04 18:39                     ` Yevgeny Kliteynik
2009-11-05  7:29                         ` Yevgeny Kliteynik
2009-11-08 14:30                             ` Eli Dorfman (Voltaire)
2009-11-09  8:18                                 ` Yevgeny Kliteynik
2009-11-09 10:42                                     ` Eli Dorfman (Voltaire)
2009-11-09 11:09                                         ` Yevgeny Kliteynik
2009-11-09 13:54                                             ` Eli Dorfman (Voltaire)
2009-11-11  9:15                                                 ` Yevgeny Kliteynik
2009-11-12  8:05                                                     ` Eli Dorfman (Voltaire)
2009-11-12 12:50                                                         ` Yevgeny Kliteynik
2009-11-13  8:24                                                             ` Eli Dorfman
2009-11-13 15:20                                                                 ` Sasha Khapyorsky
