* opensm with multiple IB subnets
@ 2010-04-20 21:13 Ken Teague
       [not found] ` <k2s2d0a59b21004201413ia115ae29u661f8df428d5ad08-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Ken Teague @ 2010-04-20 21:13 UTC
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA

I have a 17-node cluster, and each node has a single IB card with
2x IB ports (ib0 and ib1).  Each node plugs in to an IB switch with
two cables: one between ib0 and the respective A-channel port, and
another between ib1 and the respective B-channel port (e.g. c2n2 ib0
leads to switch port 2A, c2n2 ib1 to 2B, c2n3 ib0 to 3A, c2n3 ib1 to
3B, etc.).  The A-channel and B-channel cable runs are on separate
subnets.  opensm is running as a daemon on the master node.

The IB switch in question has a total of 48 ports split between 24
A-channels and 24 B-channels.  Although this is 1 physical switch, the
two channels are separated internally in the circuitry.

After deploying the cluster, my A-channels were lighting up with both
the green and amber lights, but the B-channels were only lighting up
with the green link light.  I read opensm(8) and, if I'm understanding
this correctly, I need to run two instances of opensm; one for each
port.  Is this correct?

If I do need to run an instance of opensm per subnet, what is the
best way to do this automatically during boot?  /etc/ofa/opensm.conf
does allow me to specify a GUID, but will it allow me to specify
multiple GUIDs?  Should I run both instances on the same host, and is
there a benefit to doing so?  Please let me know if more information
is needed.  Thanks in advance.

Distribution:
OpenSM 3.2.2
openSUSE 10.3 (X86-64)
VERSION = 10.3
LSB_VERSION="core-2.0-noarch:core-3.0-noarch:core-2.0-x86_64:core-3.0-x86_64"

Subnets:
10.1.1.x - eth1
10.0.1.x - ib
10.0.2.x - ib2

HCA:
Microway DDR using Mellanox chipset.  Each card has 2x IB ports and 2x
EIA-422-B ports.

Switch:
Microway FasTree 48-port split between A-channels and B-channels, 24
ports per channel.  Although this is 1 physical switch, the A-channels
and B-channels are separate.

Cluster:
17 nodes including the master

/etc/hosts:
127.0.0.1       localhost.cl.mydomain.local  localhost

10.1.2.1        master.cl.mydomain.local    master
10.1.2.2        c2n2.cl.mydomain.local      c2n2
10.1.2.3        c2n3.cl.mydomain.local      c2n3
10.1.2.4        c2n4.cl.mydomain.local      c2n4
10.1.2.5        c2n5.cl.mydomain.local      c2n5
10.1.2.6        c2n6.cl.mydomain.local      c2n6
10.1.2.7        c2n7.cl.mydomain.local      c2n7
10.1.2.8        c2n8.cl.mydomain.local      c2n8
10.1.2.9        c2n9.cl.mydomain.local      c2n9
10.1.2.10       c2n10.cl.mydomain.local     c2n10
10.1.2.11       c2n11.cl.mydomain.local     c2n11
10.1.2.12       c2n12.cl.mydomain.local     c2n12
10.1.2.13       c2n13.cl.mydomain.local     c2n13
10.1.2.14       c2n14.cl.mydomain.local     c2n14
10.1.2.15       c2n15.cl.mydomain.local     c2n15
10.1.2.16       c2n16.cl.mydomain.local     c2n16
10.1.2.17       c2n17.cl.mydomain.local     c2n17

10.0.1.1        master-ib.cl.mydomain.local master-ib
10.0.1.2        c2n2-ib.cl.mydomain.local   c2n2-ib
10.0.1.3        c2n3-ib.cl.mydomain.local   c2n3-ib c2n3ib
10.0.1.4        c2n4-ib.cl.mydomain.local   c2n4-ib
10.0.1.5        c2n5-ib.cl.mydomain.local   c2n5-ib
10.0.1.6        c2n6-ib.cl.mydomain.local   c2n6-ib
10.0.1.7        c2n7-ib.cl.mydomain.local   c2n7-ib
10.0.1.8        c2n8-ib.cl.mydomain.local   c2n8-ib
10.0.1.9        c2n9-ib.cl.mydomain.local   c2n9-ib
10.0.1.10       c2n10-ib.cl.mydomain.local  c2n10-ib
10.0.1.11       c2n11-ib.cl.mydomain.local  c2n11-ib
10.0.1.12       c2n12-ib.cl.mydomain.local  c2n12-ib
10.0.1.13       c2n13-ib.cl.mydomain.local  c2n13-ib
10.0.1.14       c2n14-ib.cl.mydomain.local  c2n14-ib
10.0.1.15       c2n15-ib.cl.mydomain.local  c2n15-ib
10.0.1.16       c2n16-ib.cl.mydomain.local  c2n16-ib
10.0.1.17       c2n17-ib.cl.mydomain.local  c2n17-ib

10.0.2.1        master-ib2.cl.mydomain.local master-ib2
10.0.2.2        c2n2-ib2.cl.mydomain.local  c2n2-ib2
10.0.2.3        c2n3-ib2.cl.mydomain.local  c2n3-ib2 c2n3ib2
10.0.2.4        c2n4-ib2.cl.mydomain.local  c2n4-ib2
10.0.2.5        c2n5-ib2.cl.mydomain.local  c2n5-ib2
10.0.2.6        c2n6-ib2.cl.mydomain.local  c2n6-ib2
10.0.2.7        c2n7-ib2.cl.mydomain.local  c2n7-ib2
10.0.2.8        c2n8-ib2.cl.mydomain.local  c2n8-ib2
10.0.2.9        c2n9-ib2.cl.mydomain.local  c2n9-ib2
10.0.2.10       c2n10-ib2.cl.mydomain.local c2n10-ib2
10.0.2.11       c2n11-ib2.cl.mydomain.local c2n11-ib2
10.0.2.12       c2n12-ib2.cl.mydomain.local c2n12-ib2
10.0.2.13       c2n13-ib2.cl.mydomain.local c2n13-ib2
10.0.2.14       c2n14-ib2.cl.mydomain.local c2n14-ib2
10.0.2.15       c2n15-ib2.cl.mydomain.local c2n15-ib2
10.0.2.16       c2n16-ib2.cl.mydomain.local c2n16-ib2
10.0.2.17       c2n17-ib2.cl.mydomain.local c2n17-ib2

Routes:
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
239.2.11.71     0.0.0.0         255.255.255.255 UH    0      0        0 eth0
10.0.1.0        0.0.0.0         255.255.255.0   U     0      0        0 ib0
10.0.2.0        0.0.0.0         255.255.255.0   U     0      0        0 ib1
10.1.1.0        0.0.0.0         255.255.255.0   U     0      0        0 eth1
10.1.2.0        0.0.0.0         255.255.255.0   U     0      0        0 eth0
169.254.0.0     0.0.0.0         255.255.0.0     U     0      0        0 eth0
127.0.0.0       0.0.0.0         255.0.0.0       U     0      0        0 lo
0.0.0.0         10.1.1.1        0.0.0.0         UG    0      0        0 eth1
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: opensm with multiple IB subnets
       [not found] ` <k2s2d0a59b21004201413ia115ae29u661f8df428d5ad08-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2010-04-21  0:07   ` Ken Teague
       [not found]     ` <u2q2d0a59b21004201707gecf7f978pa585ada342ccb9b6-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Ken Teague @ 2010-04-21  0:07 UTC
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Tue, Apr 20, 2010 at 2:13 PM, Ken Teague <kteague-e+AXbWqSrlAAvxtiuMwx3w@public.gmane.org> wrote:
> I have a 17-node cluster and each node has a single IB card that has
> 2x IB ports (ib0 and ib1).....

After doing a little more research, I confirmed that my understanding
of the manual page is correct.  To run opensm for each GUID, I
modified my init script to run a for loop based on the information
returned from "ibstat -p".


I added this near the beginning of the script where the other
environment variables are located:
<snip>
OFA_HOME="/usr/local/sbin"
IBSTAT_BIN="${OFA_HOME}/ibstat"
IBSTAT_ARG="-p"
OPENSM_BIN="${OFA_HOME}/opensm"
OPENSM_ARG="-B -g"
<snip>


I replaced the single line which started opensm with this for loop:
for i in `${IBSTAT_BIN} ${IBSTAT_ARG}`
do
    ${OPENSM_BIN} ${OPENSM_ARG} ${i}
done
<snip>

If anyone has a more elegant way to handle this, I'm open to
suggestions.  Many thanks.
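
One possible hardening of the loop, offered as a sketch rather than a tested recipe: filter the "ibstat -p" output so that only lines that look like port GUIDs reach opensm. It assumes "ibstat -p" prints one GUID per line as "0x" followed by 16 hex digits; filter_port_guids is a made-up helper name, not part of any OFED tool.

```shell
#!/bin/sh
# Keep only lines that look like a port GUID: "0x" plus 16 hex digits.
# Assumption: "ibstat -p" prints one GUID per line in exactly this form.
filter_port_guids() {
    # grep exits non-zero when nothing matches; "|| true" keeps the
    # helper from failing the caller when no GUID is present.
    grep -E '^0x[0-9a-fA-F]{16}$' || true
}

# Hypothetical use inside the init script (not executed here):
#   for i in $(${IBSTAT_BIN} ${IBSTAT_ARG} | filter_port_guids); do
#       ${OPENSM_BIN} ${OPENSM_ARG} ${i}
#   done
```

With this in place, an error message or an empty line from ibstat would be dropped instead of being passed to "opensm -g".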

Ken

* Re: opensm with multiple IB subnets
       [not found]     ` <u2q2d0a59b21004201707gecf7f978pa585ada342ccb9b6-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2010-04-21 14:56       ` Yevgeny Kliteynik
       [not found]         ` <4BCF1215.7070601-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Yevgeny Kliteynik @ 2010-04-21 14:56 UTC
  To: Ken Teague; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

Ken,

On 4/21/2010 3:07 AM, Ken Teague wrote:
> On Tue, Apr 20, 2010 at 2:13 PM, Ken Teague<kteague-e+AXbWqSrlAAvxtiuMwx3w@public.gmane.org>  wrote:
>> I have a 17-node cluster and each node has a single IB card that has
>> 2x IB ports (ib0 and ib1).....
>
> After doing a little more research, I confirmed that my understanding
> of the manual page is correct.  To run opensm for each GUID, I
> modified my init script to run a for loop based on the information
> returned from "ibstat -p".
>
>
> I added this near the beginning of the script where the other
> environment variables are located:
> <snip>
> OFA_HOME="/usr/local/sbin"
> IBSTAT_BIN="${OFA_HOME}/ibstat"
> IBSTAT_ARG="-p"
> OPENSM_BIN="${OFA_HOME}/opensm"
> OPENSM_ARG="-B -g"
> <snip>
>
>
> I replaced the single line which started opensm with this for loop:
> for i in `${IBSTAT_BIN} ${IBSTAT_ARG}`
> do
>      ${OPENSM_BIN} ${OPENSM_ARG} ${i}
> done
> <snip>
>
> If anyone has a more elegant way to handle this, I'm open to
> suggestions.  Many thanks.

OpenSM dumps various files to the /var/log and /var/cache/opensm
directories.  When you run more than one OpenSM process, they will
all write to the same files, which is probably not a good idea.

To change the output directories, set the OSM_TMP_DIR and
OSM_CACHE_DIR environment variables to a different location for each
instance.  In addition, you need to make sure that each SM instance
writes its log to a different file.  You need to do something like
this:

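for guid in $(ibstat -p); do
	# one private dump/cache directory per port GUID
	export OSM_TMP_DIR=/tmp/osm_dump_dir${guid}
	export OSM_CACHE_DIR=/tmp/osm_dump_dir${guid}
	mkdir -p "${OSM_TMP_DIR}"
	# -B daemonizes, so the loop can go on to the next GUID; add your other options
	opensm -B --log_file "${OSM_TMP_DIR}/osm.log" -g "${guid}"
done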
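
Before launching anything, it may help to print what would be run for each GUID. The following is only a sketch: opensm_cmd_for and OSM_BASE are hypothetical names, the directory prefix copies the example above, and -B is taken from the OPENSM_ARG setting earlier in this thread.

```shell
#!/bin/sh
# Print the opensm command line for one port GUID without running it.
# OSM_BASE is a hypothetical override; the default prefix matches the
# /tmp/osm_dump_dir${guid} layout used in the example above.
opensm_cmd_for() {
    guid=$1
    dir="${OSM_BASE:-/tmp/osm_dump_dir}${guid}"
    printf 'OSM_TMP_DIR=%s OSM_CACHE_DIR=%s opensm -B --log_file %s/osm.log -g %s\n' \
        "$dir" "$dir" "$dir" "$guid"
}
```

Calling opensm_cmd_for once per GUID gives a quick way to confirm that no two instances share a directory or a log file.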

-- Yevgeny


* Re: opensm with multiple IB subnets
       [not found]         ` <4BCF1215.7070601-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2010-04-22 14:23           ` Justin Clift
       [not found]             ` <4BD05BD2.9000600-oNuxUQfTmABg9hUCZPvPmw@public.gmane.org>
  2010-04-22 19:47           ` Ken Teague
  1 sibling, 1 reply; 6+ messages in thread
From: Justin Clift @ 2010-04-22 14:23 UTC
  To: Yevgeny Kliteynik; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

We should really put this on the wiki. :)


On 04/22/2010 12:56 AM, Yevgeny Kliteynik wrote:
<snip>
> OpenSM dumps various files to /var/log and /var/cache/opensm folders.
> When you have more than one OpenSM process, they will all dump the
> same files, which is probably not a good idea.
>
> To change the output directories, set the OSM_TMP_DIR and
> OSM_CACHE_DIR env. variables to some other place.
> In addition, you need to make sure that each SM instance
> prints its log in a different place. You need to do
> something like this:
>
> foreach guid in guid_list
> export OSM_TMP_DIR=/tmp/osm_dump_dir${guid}
> export OSM_CACHE_DIR=/tmp/osm_dump_dir${guid}
> opensm --log_file /tmp/osm_dump_dir${guid}/osm.log -g ${guid} [your other options]
>
> -- Yevgeny
<snip>

-- 
Salasaga  -  Open Source eLearning IDE
               http://www.salasaga.org

* Re: opensm with multiple IB subnets
       [not found]             ` <4BD05BD2.9000600-oNuxUQfTmABg9hUCZPvPmw@public.gmane.org>
@ 2010-04-22 14:38               ` Yevgeny Kliteynik
  0 siblings, 0 replies; 6+ messages in thread
From: Yevgeny Kliteynik @ 2010-04-22 14:38 UTC
  To: Justin Clift; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On 4/22/2010 5:23 PM, Justin Clift wrote:
> We should really put this on the wiki. :)

Good idea :)

-- Yevgeny
  

* Re: opensm with multiple IB subnets
       [not found]         ` <4BCF1215.7070601-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  2010-04-22 14:23           ` Justin Clift
@ 2010-04-22 19:47           ` Ken Teague
  1 sibling, 0 replies; 6+ messages in thread
From: Ken Teague @ 2010-04-22 19:47 UTC
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA

Thank you, Yevgeny.  I've modified my init script accordingly and
restarted opensm.  For anyone else that may find it useful, here is
the "start" portion of my init script:

start () {
    echo -n "Starting opensm: "
    RETVAL=0
    for GUID in `${IBSTAT_BIN} ${IBSTAT_ARG}`
    do
        export OSM_TMP_DIR="/tmp/opensm/${GUID}"
        export OSM_CACHE_DIR="/var/cache/opensm/${GUID}"
        export OSM_LOG_DIR="/var/log/opensm/${GUID}"
        # mkdir -p creates missing parents and does nothing if the directory exists
        mkdir -p "${OSM_TMP_DIR}" "${OSM_CACHE_DIR}" "${OSM_LOG_DIR}"
        # OPENSM_ARG ends in -g, so the GUID must follow it directly;
        # remember the exit status of any instance that fails to start
        ${OPENSM_BIN} --log_file "${OSM_LOG_DIR}/opensm.log" ${OPENSM_ARG} ${GUID} > /dev/null || RETVAL=$?
    done
    if [[ $RETVAL -eq 0 ]]; then
        touch /var/lock/subsys/opensm
        success
    else
        failure
    fi
    echo
}
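
A quick sanity check after start() could be to look for the per-GUID log files. This is a sketch under two assumptions: the logs live under /var/log/opensm/<GUID>/opensm.log as in the function above, and each opensm instance writes something to its log shortly after starting. guids_without_log and LOG_BASE are hypothetical names.

```shell
#!/bin/sh
# List the GUIDs whose per-instance log file is missing or empty.
# LOG_BASE is a hypothetical override; the default mirrors the
# OSM_LOG_DIR layout used in start().
guids_without_log() {
    base="${LOG_BASE:-/var/log/opensm}"
    for guid in "$@"; do
        # -s is true only for a file that exists and is not empty
        [ -s "${base}/${guid}/opensm.log" ] || printf '%s\n' "$guid"
    done
}
```

Running it with the GUID list from "ibstat -p" as arguments should print nothing if every instance came up and started logging.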



end of thread, other threads:[~2010-04-22 19:47 UTC | newest]

Thread overview: 6+ messages
2010-04-20 21:13 opensm with multiple IB subnets Ken Teague
     [not found] ` <k2s2d0a59b21004201413ia115ae29u661f8df428d5ad08-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2010-04-21  0:07   ` Ken Teague
     [not found]     ` <u2q2d0a59b21004201707gecf7f978pa585ada342ccb9b6-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2010-04-21 14:56       ` Yevgeny Kliteynik
     [not found]         ` <4BCF1215.7070601-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2010-04-22 14:23           ` Justin Clift
     [not found]             ` <4BD05BD2.9000600-oNuxUQfTmABg9hUCZPvPmw@public.gmane.org>
2010-04-22 14:38               ` Yevgeny Kliteynik
2010-04-22 19:47           ` Ken Teague
