linux-kernel.vger.kernel.org archive mirror
* RE: InfiniBand BOF @ LSM - topics of interest
@ 2002-05-15 20:37 Woodruff, Robert J
  2002-05-15 20:55 ` Pete Zaitcev
  2002-05-15 21:01 ` Alan Cox
  0 siblings, 2 replies; 35+ messages in thread
From: Woodruff, Robert J @ 2002-05-15 20:37 UTC (permalink / raw)
  To: 'Stephen Hemminger', Pete Zaitcev; +Cc: Alan Cox, linux-kernel


On Tue, 2002-05-14 at 22:01, Pete Zaitcev wrote:
 
> The thing about Infiniband is that its scope is so great.
> If you consider Infiniband was only a glorified PCI with serial
> connector, the congestion control is not an issue. Credits
> are quite sufficient to provide per link flow control, and
> everything would work nicely with a couple of switches.
> Such was the original plan, anyways, but somehow cluster
> ninjas managed to hijack the spec and we have the rabid
> overengineering running amok. In fact, they ran so far
> that Intel jumped ship and created PCI Express, and we
> have discussions about congestion control. Sad, really...
> 


It's clear from this email thread that there is a lot of
confusion about the intended use of InfiniBand and its
benefits. I'll take that as a need to prepare some material
for the Linux Symposium BOF that shows some of
the benefits of InfiniBand as demonstrated on the early
InfiniBand hardware.




^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: InfiniBand BOF @ LSM - topics of interest
  2002-05-15 20:37 InfiniBand BOF @ LSM - topics of interest Woodruff, Robert J
@ 2002-05-15 20:55 ` Pete Zaitcev
  2002-05-15 21:01 ` Alan Cox
  1 sibling, 0 replies; 35+ messages in thread
From: Pete Zaitcev @ 2002-05-15 20:55 UTC (permalink / raw)
  To: Woodruff, Robert J; +Cc: linux-kernel

> From: "Woodruff, Robert J" <woody@co.intel.com>
> Date: Wed, 15 May 2002 13:37:02 -0700
 
> [...]
> It's clear from this email thread that there is a lot of
> confusion about the intended use of InfiniBand and its
> benefits. I'll take that as a need to prepare some material
> for the Linux Symposium BOF that shows some of
> the benefits of InfiniBand as demonstrated on the early
> InfiniBand hardware.

Bob, that would be much appreciated. Last year I did
a small introductory presentation, transcript of which is at
<http://www.fenrus.com/infiniband/bof.txt>.
I would be delighted to pass the banner to you, especially
given your insider knowledge of the actual history.

-- Pete

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: InfiniBand BOF @ LSM - topics of interest
  2002-05-15 20:37 InfiniBand BOF @ LSM - topics of interest Woodruff, Robert J
  2002-05-15 20:55 ` Pete Zaitcev
@ 2002-05-15 21:01 ` Alan Cox
  1 sibling, 0 replies; 35+ messages in thread
From: Alan Cox @ 2002-05-15 21:01 UTC (permalink / raw)
  To: Woodruff, Robert J
  Cc: 'Stephen Hemminger', Pete Zaitcev, Alan Cox, linux-kernel

> It's clear from this email thread that there is a lot of
> confusion about the intended use of InfiniBand and its
> benefits. I'll take that as a need to prepare some material
> for the Linux Symposium BOF that shows some of
> the benefits of InfiniBand as demonstrated on the early
> InfiniBand hardware.

That would, I think, be good, especially if we can get lots of pure
networking and pure I/O driver people at the BOF.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: InfiniBand BOF @ LSM - topics of interest
@ 2002-05-16 23:42 Woodruff, Robert J
  0 siblings, 0 replies; 35+ messages in thread
From: Woodruff, Robert J @ 2002-05-16 23:42 UTC (permalink / raw)
  To: 'Alan Cox', Woodruff, Robert J
  Cc: russ, Tony.P.Lee, wookie, lmb, Woodruff, Robert J, linux-kernel, zaitcev


> so someone could invent a new address family for sockets,
> say AF_INFINIBANDO, that is much more lightweight than the existing
> TCP/IP stack.
> Thus with a small change to the application, a good performance
> increase can be attained.

> Shouldn't be too hard. It looks like it's basically AF_PACKET combined
> with the InfiniBand notions of security.

Maybe a little higher level than raw packets, but yes,
a lightweight sockets protocol driver.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: InfiniBand BOF @ LSM - topics of interest
@ 2002-05-16 10:37 Eric Brunner-Williams in Portland Maine
  0 siblings, 0 replies; 35+ messages in thread
From: Eric Brunner-Williams in Portland Maine @ 2002-05-16 10:37 UTC (permalink / raw)
  To: ashokr2
  Cc: 'Russell Leighton',
	Tony.P.Lee, wookie, alan, lmb, linux-kernel, zaitcev, woody,
	brunner


Phil Karn's memo "Advice for Internet Subnetwork Designers",
(draft-ietf-pilc-link-design-11.txt), is interesting reading.

I'd appreciate it if someone with IB-clue would comment on
Phil's memo, and the design choices of the IB designer(s).

My two beads of wampum,
Eric

^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: InfiniBand BOF @ LSM - topics of interest
  2002-05-15 21:36 ` Russell Leighton
  2002-05-15 22:41   ` Lars Marowsky-Bree
@ 2002-05-16  4:51   ` Ashok Raj
  1 sibling, 0 replies; 35+ messages in thread
From: Ashok Raj @ 2002-05-16  4:51 UTC (permalink / raw)
  To: Russell Leighton, Tony.P.Lee
  Cc: wookie, alan, lmb, woody, linux-kernel, zaitcev

->
->Can we really have these sort of low level IB interactions and have :
->    - security issues addressed, mostly an issue if the devices are over
->a network w/other devices

The IB fabric does not replace the network. It is designed for the
data center.

IB offers even more RAS from a server standpoint. For example, today's
drivers can DMA to areas of memory owned by another driver or process.
IB protects RDMA by implementing protection domains: each driver can be
placed in a separate protection domain, and any RDMA access to areas it
does not own causes an error only for the specific driver or
application that is malfunctioning.

->    - QoS control

IB specifies Service Levels (a.k.a. SLs) and multiple data Virtual
Lanes (VLs) that can be managed via the Subnet Manager to implement
QoS. There is a dedicated workgroup in the IBTA working on some of
these issues.

Applications that require an absolute feel of the metal can use the
access layer APIs, which provide the ability to post data to the
hardware directly from user mode and to process completions entirely in
user mode, without entering the kernel, if the hardware permits it.
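
For readers who have not seen it, here is roughly what the protection
domain model looks like to a consumer. The sketch below uses the much
later OpenFabrics libibverbs API rather than the access layer discussed
in this thread, so take the names as assumptions; the point is the
shape: allocate a PD, register memory against it, and only that memory
(with those access rights) is reachable by RDMA.

/* Sketch only: libibverbs, not the access layer API discussed above. */
#include <stdio.h>
#include <stdlib.h>
#include <infiniband/verbs.h>

int main(void)
{
	struct ibv_device **devs = ibv_get_device_list(NULL);
	if (!devs || !devs[0])
		return 1;

	struct ibv_context *ctx = ibv_open_device(devs[0]);
	struct ibv_pd *pd = ctx ? ibv_alloc_pd(ctx) : NULL;	/* per-consumer protection domain */
	if (!pd)
		return 1;

	void *buf = malloc(4096);
	/* Only memory registered against this PD, with these rights, can be
	 * touched by RDMA; a stray remote access faults only this consumer. */
	struct ibv_mr *mr = ibv_reg_mr(pd, buf, 4096,
				       IBV_ACCESS_LOCAL_WRITE |
				       IBV_ACCESS_REMOTE_WRITE);
	if (!mr)
		return 1;

	printf("lkey=0x%x rkey=0x%x\n", mr->lkey, mr->rkey);

	ibv_dereg_mr(mr);
	ibv_dealloc_pd(pd);
	ibv_close_device(ctx);
	ibv_free_device_list(devs);
	free(buf);
	return 0;
}
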
->
-

->
->Thanks!
->
->Russ
->
->
->Tony.P.Lee@nokia.com wrote:
->
->>
->>For VNC type application, instead server translates
->>every X Windows, Mac, Windows GUI calls/Bitmap update
->>to TCP stream, you convert the GUI API calls to
->>IB RC messages and bitmap updates to RDMA write directly
->>to client app's frame buffer.
->>
->>For SAMBA like fs, the file read api can be translated to
->>IB RC messages on client + RDMA write to remote
->>client app's buffer directly from server.
->>
->>They won't be "standard" VNC/SAMBA any more.
->>
->>On the other hand, we can put VNC over TCP over IP over IB,
->>- "for people with hammer, every problem looks like a nail." :-)
->>
->>In theory, we can have IB DVD drive RDMA video directly
->>over IB switch to IB enable VGA frame buffer and completely
->>by pass the system.  CPU only needed setup the proper
->>connections.   The idea is to truely virtualized the system
->>resources and "resource server" RDMA the data to anyone on IB
->>switch with minimal CPU interaction in the process.
->>
->>You can also config a normal SCSI card DMA data to virtualized
->>IB address on PCI address space and have the data shows up 15 meters
->>or 2 km away on server's "virtual scsi driver" destination DMA address.
->>It made iSCSI looked like dial up modem in term of performance
->>and latency.
->>
->>
->>----------------------------------------------------------------
->>Tony
->>-
->>
->
->
->-


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: InfiniBand BOF @ LSM - topics of interest
  2002-05-15 23:58 Woodruff, Robert J
@ 2002-05-16  0:32 ` Alan Cox
  0 siblings, 0 replies; 35+ messages in thread
From: Alan Cox @ 2002-05-16  0:32 UTC (permalink / raw)
  To: Woodruff, Robert J
  Cc: 'Russell Leighton',
	Tony.P.Lee, wookie, alan, lmb, Woodruff Robert J, linux-kernel,
	zaitcev

> so someone could invent a new address family for sockets,
> say AF_INFINIBANDO, that is much more light weight than the existing TCP/IP
> stack.
> Thus with a small change to the application, a good performance increase can
> be attained.

Shouldn't be too hard. It looks like it's basically AF_PACKET combined
with the InfiniBand notions of security.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: InfiniBand BOF @ LSM - topics of interest
@ 2002-05-15 23:58 Woodruff, Robert J
  2002-05-16  0:32 ` Alan Cox
  0 siblings, 1 reply; 35+ messages in thread
From: Woodruff, Robert J @ 2002-05-15 23:58 UTC (permalink / raw)
  To: 'Russell Leighton', Tony.P.Lee
  Cc: wookie, alan, lmb, Woodruff, Robert J, linux-kernel, zaitcev


>OR is it saner to layer TCP/IP etc. over IB... it seems to me the point
>others were making was that  there is more to "network"  style services
>than just passing bits from here to there...will IB have low level
>support many of the features people have come to expect?

Yes, someone will layer IP over IB to support the gazillion applications
that currently run over that protocol without any code changes. This
should perform as well as 10 gigabit Ethernet, but may cost less.

However, since InfiniBand is already a reliable transport, one can
bypass TCP, so someone could invent a new address family for sockets,
say AF_INFINIBANDO, that is much more lightweight than the existing
TCP/IP stack. Thus, with a small change to the application, a good
performance increase can be attained. This is probably how the
InfiniBand standard Sockets Direct Protocol will initially be
implemented for Linux. Next, one could entertain a concept similar to
Winsock Direct, where no change is needed to the application, and the
kernel has a switch in the AF_INET code path that bypasses the TCP
processing for packets destined for the InfiniBand subnet. This will
likely have to come later for Linux, after the performance value of
bypassing TCP has actually been demonstrated. People are currently
skeptical that a concept that has been demonstrated to have performance
advantages in Windows 2000 will achieve the same result if implemented
in Linux.
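
To make the "small change to the application" concrete: the change
would amount to little more than the socket() call. In the sketch
below, AF_INFINIBANDO and its numeric value are made up (as above),
and whether the sockaddr would stay AF_INET-shaped is an open design
question.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef AF_INFINIBANDO
#define AF_INFINIBANDO 27	/* hypothetical; placeholder value only */
#endif

static int connect_to_server(int use_ib, uint32_t addr_be, uint16_t port)
{
	/* The one-line application change: pick the lightweight family
	 * instead of AF_INET; everything after socket() stays the same. */
	int family = use_ib ? AF_INFINIBANDO : AF_INET;
	int fd = socket(family, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;

	struct sockaddr_in sin;
	memset(&sin, 0, sizeof(sin));
	sin.sin_family = family;
	sin.sin_port = htons(port);
	sin.sin_addr.s_addr = addr_be;

	if (connect(fd, (struct sockaddr *)&sin, sizeof(sin)) < 0) {
		close(fd);
		return -1;
	}
	return fd;	/* read()/write()/send()/recv() as usual */
}
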

Finally, there are those that want to run on almost the bare metal.
These are typically the high performance computing types, or some of
the large database vendors that want to use InfiniBand for clustering.
For them, there is an InfiniBand Access Layer that allows direct access
to the hardware (via a thin S/W layer) from user space applications.
One could probably mmap() an Ethernet NIC's registers into user space,
but probably only for one process. InfiniBand was specifically designed
to allow mmap()ing the hardware into user space, even for multiple
processes.
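
The per-process mapping itself would be a plain mmap() of a page of the
HCA's doorbell registers through whatever character device the access
layer exposes; the device name below is invented for illustration.

#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* "/dev/hca0_uar" is hypothetical; each process that opens it would get
 * its own doorbell page, which is what makes multi-process user-level
 * access safe. */
int map_doorbell(volatile uint32_t **db_out)
{
	int fd = open("/dev/hca0_uar", O_RDWR);
	if (fd < 0)
		return -1;

	void *page = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
			  MAP_SHARED, fd, 0);
	close(fd);	/* the mapping stays valid after close() */
	if (page == MAP_FAILED)
		return -1;

	*db_out = (volatile uint32_t *)page;
	return 0;	/* ring the doorbell with a plain store to (*db_out)[0] */
}
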

As for a high-level answer to the security question, InfiniBand has the
concept of protection domains and read and write keys, with checking
built into the hardware. This provides protection from people RDMAing
to the wrong place.

This is a high-level, simplistic view; see the InfiniBand specification
for details. There is also an open source project on SourceForge that
is working on code. See the home page for details.

http://sourceforge.net/projects/infiniband






^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: InfiniBand BOF @ LSM - topics of interest
  2002-05-15 21:36 ` Russell Leighton
@ 2002-05-15 22:41   ` Lars Marowsky-Bree
  2002-05-16  4:51   ` Ashok Raj
  1 sibling, 0 replies; 35+ messages in thread
From: Lars Marowsky-Bree @ 2002-05-15 22:41 UTC (permalink / raw)
  To: Russell Leighton, Tony.P.Lee; +Cc: wookie, alan, woody, linux-kernel, zaitcev

On 2002-05-15T17:36:27,
   Russell Leighton <russ@elegant-software.com> said:

> Lot's of very cool ideas for IB ...not knowing much about IB, but
> being curious and interested, I have a question which may be stupid
> so I apoligize in advance if it is...
> 
> Can we really have these sort of low level IB interactions and have :
>    - security issues addressed, mostly an issue if the devices are over 
> a network w/other devices

The idea is that the IB interconnect is "trusted". Doing very low level kernel
operations cluster-style over a non-trusted link is asking for it; either you
lose security-wise or performance for authentication / encryption _will_ kill
you.

The really interesting question from my side is "availability": how does
the kernel deal with crashed nodes, loss of link, etc.? If you already
have agreed-upon or semi-standard interfaces here, we would gladly pick
them up.

These generic cluster interfaces are being discussed as part of the Open
Clustering Framework now, and it would be good if a kernel developer
reviewed, from a kernel perspective, the discussions regarding the
proposed generic event mechanism.


Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>

-- 
Immortality is an adequate definition of high availability for me.
	--- Gregory F. Pfister


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: InfiniBand BOF @ LSM - topics of interest
  2002-05-15 20:47 Tony.P.Lee
@ 2002-05-15 21:36 ` Russell Leighton
  2002-05-15 22:41   ` Lars Marowsky-Bree
  2002-05-16  4:51   ` Ashok Raj
  0 siblings, 2 replies; 35+ messages in thread
From: Russell Leighton @ 2002-05-15 21:36 UTC (permalink / raw)
  To: Tony.P.Lee; +Cc: wookie, alan, lmb, woody, linux-kernel, zaitcev


Lots of very cool ideas for IB... Not knowing much about IB, but
being curious and interested, I have a question which may be stupid,
so I apologize in advance if it is...

Can we really have these sorts of low-level IB interactions and have:
    - security issues addressed? This is mostly an issue if the devices
      are on a network with other devices doing other things for other
      processes/people... (I get a bit scared when someone suggests
      that a remote device can write directly to my frame buffer or
      disk; I hope there are security controls in place!)
    - congestion control (already mentioned)
    - QoS control

OR is it saner to layer TCP/IP etc. over IB... It seems to me the point
others were making was that there is more to "network" style services
than just passing bits from here to there... Will IB have low-level
support for many of the features people have come to expect?

Thanks!

Russ


Tony.P.Lee@nokia.com wrote:

>
>For VNC type application, instead server translates
>every X Windows, Mac, Windows GUI calls/Bitmap update
>to TCP stream, you convert the GUI API calls to 
>IB RC messages and bitmap updates to RDMA write directly 
>to client app's frame buffer.  
>
>For SAMBA like fs, the file read api can be translated to 
>IB RC messages on client + RDMA write to remote 
>client app's buffer directly from server.
>
>They won't be "standard" VNC/SAMBA any more. 
>
>On the other hand, we can put VNC over TCP over IP over IB,
>- "for people with hammer, every problem looks like a nail." :-)
>
>In theory, we can have IB DVD drive RDMA video directly 
>over IB switch to IB enable VGA frame buffer and completely
>by pass the system.  CPU only needed setup the proper 
>connections.   The idea is to truely virtualized the system
>resources and "resource server" RDMA the data to anyone on IB
>switch with minimal CPU interaction in the process.
>
>You can also config a normal SCSI card DMA data to virtualized 
>IB address on PCI address space and have the data shows up 15 meters
>or 2 km away on server's "virtual scsi driver" destination DMA address.
>It made iSCSI looked like dial up modem in term of performance
>and latency. 
>
>
>----------------------------------------------------------------
>Tony 
>



^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: InfiniBand BOF @ LSM - topics of interest
@ 2002-05-15 20:47 Tony.P.Lee
  2002-05-15 21:36 ` Russell Leighton
  0 siblings, 1 reply; 35+ messages in thread
From: Tony.P.Lee @ 2002-05-15 20:47 UTC (permalink / raw)
  To: wookie, alan; +Cc: lmb, woody, linux-kernel, zaitcev

> > Thats an assumption that is actually historically not a 
> very good one to
> > make. There are fundamental things that most of the "no 
> network layer"
> > people tend to forget
> > 
> > 1.	Van Jacobson saturated 10Mbit ethernet with a Sun 3/50
> > 2.	SGI saturated HIPPI with MIPS processors that are at 
> best comparable
> > 	to the lowest end wince PDAs
> > 3.	Having no network layer in almost every case is tied to 
> the belief
> > 	that bandwidth is infinite and you need to congestion control
> > 
>  First I want to say that the average person using the system the
> socket interface with the underlying network stack is the best
> way to go. IMHO
>  
> But if the issue isn't throughput and if the application requires
> some items to have low latency.  The size of the network stack can
> have an adverse effect on the overall performance of the system.  A
> good example is in a Distributed Lock Manager (DLM).  In this case
> the round trip including the software stack limits the number of
> locks per second that can occur. 
> 
>   So if we can fit everything that we need into a 64 byte IB packet
> for the imaginary application that would take 256 nsec to transmit
> to the other system. (2.5 Gb/sec link) If you assume zero turn around
> time you get 512 nsec as the lowest lock request/grant time possible
> which puts you in the range of doing a little under 980,000
> lock/unlock operations per second on a single lock and of course when
> you add the software to actually process the packet the number of
> lock/unlock operations per second will always be below that 
> in the real 
> world.  When you compare this to a modern processors ability to
> do lock/unlocks this is really a small number of lock/unlock 
> operations per second. 
> 
> So the ability to scarf that packet into the application and respond
> is the hard issue to solve in scalability.
> 
> Tim
> 

For a VNC type application, instead of the server translating
every X Window, Mac, or Windows GUI call/bitmap update
into a TCP stream, you convert the GUI API calls to
IB RC messages and the bitmap updates to RDMA writes directly
into the client app's frame buffer.

For a SAMBA-like fs, the file read API can be translated to
IB RC messages on the client + an RDMA write from the server
directly into the remote client app's buffer.

They won't be "standard" VNC/SAMBA any more.

On the other hand, we can put VNC over TCP over IP over IB;
"for people with a hammer, every problem looks like a nail." :-)

In theory, we can have an IB DVD drive RDMA video directly
over an IB switch to an IB-enabled VGA frame buffer and completely
bypass the system. The CPU is only needed to set up the proper
connections. The idea is to truly virtualize the system
resources and have a "resource server" RDMA the data to anyone on the
IB switch with minimal CPU interaction in the process.

You can also configure a normal SCSI card to DMA data to a virtualized
IB address in PCI address space and have the data show up 15 meters
or 2 km away at the server's "virtual SCSI driver" destination DMA
address. It makes iSCSI look like a dial-up modem in terms of
performance and latency.
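
For reference, the RDMA write itself is a single work request. Here is
a sketch against the later libibverbs interface (the QP, the registered
local buffer, and the remote address/rkey the client advertised are
assumed to exist already); the far side posts no receive and takes no
interrupt for it.

#include <stdint.h>
#include <string.h>
#include <infiniband/verbs.h>

/* Push 'len' bytes from a registered local buffer straight into the
 * client's buffer at remote_addr/rkey. */
static int rdma_write_to_client(struct ibv_qp *qp, struct ibv_mr *mr,
				void *buf, size_t len,
				uint64_t remote_addr, uint32_t rkey)
{
	struct ibv_sge sge = {
		.addr   = (uintptr_t)buf,
		.length = (uint32_t)len,
		.lkey   = mr->lkey,
	};
	struct ibv_send_wr wr, *bad_wr = NULL;

	memset(&wr, 0, sizeof(wr));
	wr.wr_id               = 1;
	wr.opcode              = IBV_WR_RDMA_WRITE;
	wr.sg_list             = &sge;
	wr.num_sge             = 1;
	wr.send_flags          = IBV_SEND_SIGNALED;
	wr.wr.rdma.remote_addr = remote_addr;
	wr.wr.rdma.rkey        = rkey;

	return ibv_post_send(qp, &wr, &bad_wr);
}
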


----------------------------------------------------------------
Tony 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: InfiniBand BOF @ LSM - topics of interest
  2002-05-15  5:01   ` Pete Zaitcev
  2002-05-15 11:29     ` Alan Cox
@ 2002-05-15 16:34     ` Stephen Hemminger
  1 sibling, 0 replies; 35+ messages in thread
From: Stephen Hemminger @ 2002-05-15 16:34 UTC (permalink / raw)
  To: Pete Zaitcev; +Cc: Alan Cox, linux-kernel

On Tue, 2002-05-14 at 22:01, Pete Zaitcev wrote:
 
> The thing about Infiniband is that its scope is so great.
> If you consider Infiniband was only a glorified PCI with serial
> connector, the congestion control is not an issue. Credits
> are quite sufficient to provide per link flow control, and
> everything would work nicely with a couple of switches.
> Such was the original plan, anyways, but somehow cluster
> ninjas managed to hijack the spec and we have the rabid
> overengineering running amok. In fact, they ran so far
> that Intel jumped ship and created PCI Express, and we
> have discussions about congestion control. Sad, really...
> 
> -- Pete

This sounds like deja vu all over again.
Each new interconnect technology like ATM seems to go through the cycle:

	Assert: all other network protocols are crap
	Deny: history
	Assert: our problem is different, therefore we must
	        reinvent everything from data transfer up to applications

	Reality strikes!

	New technology ends up being used with standard applications and
	protocols.



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: InfiniBand BOF @ LSM - topics of interest
  2002-05-14 23:58 ` Alan Cox
@ 2002-05-15 15:45   ` Timothy D. Witham
  0 siblings, 0 replies; 35+ messages in thread
From: Timothy D. Witham @ 2002-05-15 15:45 UTC (permalink / raw)
  To: Alan Cox; +Cc: Tony.P.Lee, lmb, woody, linux-kernel, Pete Zaitcev

On Tue, 2002-05-14 at 16:58, Alan Cox wrote:
> > I like to see user application such as VNC, SAMBA build directly
> > on top of IB API.  I have couple of IB cards that can 
> > send 10k 32KBytes message (320MB of data) every ~1 second over 
> > 1x link with only <7% CPU usage (single CPU xeon 700MHz).  
> > I was very impressed.  
> > 
> > Go thru the socket layer API would just slow thing down.
> 
> Thats an assumption that is actually historically not a very good one to
> make. There are fundamental things that most of the "no network layer"
> people tend to forget
> 
> 1.	Van Jacobson saturated 10Mbit ethernet with a Sun 3/50
> 2.	SGI saturated HIPPI with MIPS processors that are at best comparable
> 	to the lowest end wince PDAs
> 3.	Having no network layer in almost every case is tied to the belief
> 	that bandwidth is infinite and you need to congestion control
> 
 First I want to say that, for the average person using the system, the
socket interface with the underlying network stack is the best way to
go, IMHO.

But if the issue isn't throughput, and the application requires some
items to have low latency, the size of the network stack can have an
adverse effect on the overall performance of the system. A good
example is a Distributed Lock Manager (DLM). In this case the round
trip, including the software stack, limits the number of locks per
second that can occur.

  So if we can fit everything that we need for the imaginary
application into a 64-byte IB packet, that would take 256 nsec to
transmit to the other system (2.5 Gb/sec link). If you assume zero
turnaround time, you get 512 nsec as the lowest lock request/grant time
possible, which puts you in the range of doing a little under 980,000
lock/unlock operations per second on a single lock; and of course, when
you add the software to actually process the packet, the number of
lock/unlock operations per second will always be below that in the real
world. When you compare this to a modern processor's ability to do
lock/unlocks, this is really a small number of lock/unlock operations
per second.
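
As a sanity check on those numbers (assuming 8b/10b encoding, so a
2.5 Gb/sec 1x link carries 2 Gb/sec of payload, and assuming the
~980,000 figure counts a lock plus an unlock as two round trips):

#include <stdio.h>

int main(void)
{
	double link_bps  = 2.5e9 * 8.0 / 10.0;	/* 1x link, 8b/10b: 2 Gb/s of data */
	double pkt_bits  = 64 * 8;		/* 64-byte lock message */
	double wire_ns   = pkt_bits / link_bps * 1e9;
	double rtt_ns    = 2 * wire_ns;		/* request + grant */
	double ops_per_s = 1e9 / (2 * rtt_ns);	/* lock + unlock, one RTT each */

	/* prints roughly: 256 ns one way, 512 ns RTT, ~976k pairs/sec */
	printf("one way %.0f ns, RTT %.0f ns, %.0f lock/unlock pairs per sec\n",
	       wire_ns, rtt_ns, ops_per_s);
	return 0;
}
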

So the ability to scarf that packet into the application and respond
is the hard issue to solve in scalability.

Tim

> In a network congestion based collapse is spectacularly bad. Some of the
> internet old hands can probably tell you the horror stories of the period
> the whole internet backbone basically did that until they got their research
> right. Nagle's tinygram congestion avoidance work took Ford's network usage
> down by I believe the paper quoted 90%.
> 
> The socket API is very efficient. TCP is extremely efficient in the service
> it provides. IB can support large messages, which massively ups the throughput.
> 
> Let me ask you a much more important question 
> 
> Can you send achieve 90% efficiency on a 90% utilized fabric with multiple
> nodes and multiple hops ? If you can't then you are not talking about a 
> network you are talking about a benchmark.
> 
> Alan
> 
-- 
Timothy D. Witham - Lab Director - wookie@osdlab.org
Open Source Development Lab Inc - A non-profit corporation
15275 SW Koll Parkway - Suite H - Beaverton OR, 97006
(503)-626-2455 x11 (office)    (503)-702-2871     (cell)
(503)-626-2436     (fax)


^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: InfiniBand BOF @ LSM - topics of interest
  2002-05-15 11:29     ` Alan Cox
@ 2002-05-15 14:27       ` Ashok Raj
  0 siblings, 0 replies; 35+ messages in thread
From: Ashok Raj @ 2002-05-15 14:27 UTC (permalink / raw)
  To: Alan Cox, Pete Zaitcev; +Cc: Tony.P.Lee, lmb, woody, linux-kernel, zaitcev

The InfiniBand specification provides tools to implement QoS.

Service Levels (SLs) and Virtual Lanes (VLs) are managed by the SM,
which determines how the SL->VL mapping is performed; these attributes
are carried in the local routing headers when a node sources a packet
onto the fabric.

The SL is not modified when the packet crosses subnets. The IB spec has
more details on how these can be used. There is a separate congestion
control work group in the IBTA that is trying to address these issues,
but I have not seen it be very active, possibly because the update to
1.0a is due in a very short time.
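
In code terms the mechanism is small: each output port carries a
16-entry SL-to-VL table that the SM programs, and the 4-bit SL in the
packet's local routing header indexes it. A conceptual sketch, not any
particular stack's data structure:

#include <stdint.h>

/* Conceptual only: one SL->VL table per output port, written by the SM. */
struct port_qos {
	uint8_t sl2vl[16];	/* 4-bit SL indexes a data VL number */
};

static uint8_t pick_vl(const struct port_qos *port, uint8_t sl)
{
	/* The SL rides in the LRH and is not modified in flight. */
	return port->sl2vl[sl & 0x0f];
}
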

-----Original Message-----
From: linux-kernel-owner@vger.kernel.org
[mailto:linux-kernel-owner@vger.kernel.org]On Behalf Of Alan Cox
Sent: Wednesday, May 15, 2002 4:29 AM
To: Pete Zaitcev
Cc: Alan Cox; Tony.P.Lee@nokia.com; lmb@suse.de; woody@co.intel.com;
linux-kernel@vger.kernel.org; zaitcev@redhat.com
Subject: Re: InfiniBand BOF @ LSM - topics of interest


> The thing about Infiniband is that its scope is so great.
> If you consider Infiniband was only a glorified PCI with serial
> connector, the congestion control is not an issue. Credits

Congestion control is always an issue 8)

> are quite sufficient to provide per link flow control, and
> everything would work nicely with a couple of switches.
> Such was the original plan, anyways, but somehow cluster
> ninjas managed to hijack the spec and we have the rabid
> overengineering running amok. In fact, they ran so far
> that Intel jumped ship and created PCI Express, and we
> have discussions about congestion control. Sad, really...

My interest is in the question "does infiniband have usable congestion
control for tcp/clustering/networking". I don't actually care if it doesn't
and I'd rather have most congestion control in software anyway.

Alan



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: InfiniBand BOF @ LSM - topics of interest
  2002-05-15  5:01   ` Pete Zaitcev
@ 2002-05-15 11:29     ` Alan Cox
  2002-05-15 14:27       ` Ashok Raj
  2002-05-15 16:34     ` Stephen Hemminger
  1 sibling, 1 reply; 35+ messages in thread
From: Alan Cox @ 2002-05-15 11:29 UTC (permalink / raw)
  To: Pete Zaitcev; +Cc: Alan Cox, Tony.P.Lee, lmb, woody, linux-kernel, zaitcev

> The thing about Infiniband is that its scope is so great.
> If you consider Infiniband was only a glorified PCI with serial
> connector, the congestion control is not an issue. Credits

Congestion control is always an issue 8)

> are quite sufficient to provide per link flow control, and
> everything would work nicely with a couple of switches.
> Such was the original plan, anyways, but somehow cluster
> ninjas managed to hijack the spec and we have the rabid
> overengineering running amok. In fact, they ran so far
> that Intel jumped ship and created PCI Express, and we
> have discussions about congestion control. Sad, really...

My interest is in the question "does infiniband have usable congestion
control for tcp/clustering/networking". I don't actually care if it doesn't
and I'd rather have most congestion control in software anyway.

Alan


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: InfiniBand BOF @ LSM - topics of interest
  2002-05-15  2:35 ` Alan Cox
@ 2002-05-15  5:01   ` Pete Zaitcev
  2002-05-15 11:29     ` Alan Cox
  2002-05-15 16:34     ` Stephen Hemminger
  0 siblings, 2 replies; 35+ messages in thread
From: Pete Zaitcev @ 2002-05-15  5:01 UTC (permalink / raw)
  To: Alan Cox; +Cc: Tony.P.Lee, lmb, woody, linux-kernel, zaitcev

> Date: Wed, 15 May 2002 03:35:00 +0100 (BST)
> From: Alan Cox <alan@lxorguk.ukuu.org.uk>

> According to folks at Quantum the IB stuff isnt doing 'true' congestion
> control. At the moment its hard to tell since 1.0a doesn't deal with
> congestion management and the 2.0 congestion stuff isnt due out until
> later this year. Even then the Infiniband trade association folks use
> words like "hopefully eliminating the congestion" in their presentation to 
> describe their mechanism.

The thing about InfiniBand is that its scope is so great.
If InfiniBand were only a glorified PCI with a serial
connector, congestion control would not be an issue. Credits
are quite sufficient to provide per-link flow control, and
everything would work nicely with a couple of switches.
Such was the original plan, anyways, but somehow cluster
ninjas managed to hijack the spec and we have the rabid
overengineering running amok. In fact, they ran so far
that Intel jumped ship and created PCI Express, and we
have discussions about congestion control. Sad, really...

-- Pete

^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: InfiniBand BOF @ LSM - topics of interest
  2002-05-14 22:15 Woodruff, Robert J
@ 2002-05-15  2:42 ` Timothy D. Witham
  0 siblings, 0 replies; 35+ messages in thread
From: Timothy D. Witham @ 2002-05-15  2:42 UTC (permalink / raw)
  To: Woodruff, Robert J
  Cc: Pete Zaitcev, linux-kernel, 'Woodruff, Robert J'

  Anything that I purchase needs to be production level, but
I can purchase items that have a delivered pre-production
unit and a later upgrade.  Of course, equipment that isn't
purchased is a little more open, but I really can't require
anybody to sign an NDA to use it, as I don't have any way
of tracking that sort of environment.

Tim

On Tue, 2002-05-14 at 15:15, Woodruff, Robert J wrote:
> Hi Tim,
> 
> Does the OSDL require production H/W  ?
> I may be able to get my hands on a few early development HCAs,
> as loaners, till production HCAs are avaialble
> from HCA vendors. 
> 
> woody
> 
> 
> -----Original Message-----
> From: Timothy D. Witham [mailto:wookie@osdl.org]
> Sent: Tuesday, May 14, 2002 2:01 PM
> To: Timothy D. Witham
> Cc: Pete Zaitcev; linux-kernel@vger.kernel.org; Woodruff, Robert J
> Subject: Re: InfiniBand BOF @ LSM - topics of interest
> 
> 
>   Should of said.
> 
>   I have both IA-32 and IA-64 machines so by purchasing
> PCI HCA's we can move them around. :-)
> 
> Tim
> 
> On Tue, 2002-05-14 at 13:33, Timothy D. Witham wrote:
> >   I have IA-32 and IA-64 machines with PCI HCA's we
> > can move them around. 
> > 
> > Tim
> > 
> > On Tue, 2002-05-14 at 13:23, Pete Zaitcev wrote:
> > > > From: "Timothy D. Witham" <wookie@osdl.org>
> > > > Date: 14 May 2002 13:09:53 -0700
> > > 
> > > >   As an FYI to the group the OSDL has put out a
> > > > RFQ for equipment to build a IB development environment
> > > > and I was wondering if any of the possible developers
> > > > would like to comment on the mix.  
> > > > 
> > > > http://www.osdl.org/rfqs/infiniband_dev_env
> > > 
> > > Very interesting. Are you going to plug it into Itanic based
> > > hosts? Obviously, vendors cannot "support" that, but being
> > > Linux based we should be able to get that running, right?
> > > 
> > > -- Pete
> > -- 
> > Timothy D. Witham - Lab Director - wookie@osdlab.org
> > Open Source Development Lab Inc - A non-profit corporation
> > 15275 SW Koll Parkway - Suite H - Beaverton OR, 97006
> > (503)-626-2455 x11 (office)    (503)-702-2871     (cell)
> > (503)-626-2436     (fax)
> > 
> -- 
> Timothy D. Witham - Lab Director - wookie@osdlab.org
> Open Source Development Lab Inc - A non-profit corporation
> 15275 SW Koll Parkway - Suite H - Beaverton OR, 97006
> (503)-626-2455 x11 (office)    (503)-702-2871     (cell)
> (503)-626-2436     (fax)
-- 
Timothy D. Witham - Lab Director - wookie@osdlab.org
Open Source Development Lab Inc - A non-profit corporation
15275 SW Koll Parkway - Suite H - Beaverton OR, 97006
(503)-626-2455 x11 (office)    (503)-702-2871     (cell)
(503)-626-2436     (fax)


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: InfiniBand BOF @ LSM - topics of interest
  2002-05-15  1:29 Tony.P.Lee
@ 2002-05-15  2:35 ` Alan Cox
  2002-05-15  5:01   ` Pete Zaitcev
  0 siblings, 1 reply; 35+ messages in thread
From: Alan Cox @ 2002-05-15  2:35 UTC (permalink / raw)
  To: Tony.P.Lee; +Cc: alan, lmb, woody, linux-kernel, zaitcev

> Good points,  I prefer to see IB as replacement for
> SCSI, FibreChannel, IDE, with its RDMA, messaging and

Fibre Channel can have congestion control failures. It's not something
you see badly because SCSI timeouts are incredibly conservative (often
30 seconds for a read or write). They do deal with head-of-queue
blocking, but that's a tiny bit of it.

> I don't know if IB over multi-nodes/multi-hops in a WAN like
> setup works or not.  I like to see network experts like
> yourself try to break that since all the congestion control which
> is "supposely" done in HW as compare to doing that in software as
> in TCP/IP.  It would be nice to know how solid is the overall
> IB congestion control design in that environment and at
> what point does it break.

According to folks at Quantum, the IB stuff isn't doing 'true'
congestion control. At the moment it's hard to tell, since 1.0a doesn't
deal with congestion management and the 2.0 congestion stuff isn't due
out until later this year. Even then, the InfiniBand Trade Association
folks use words like "hopefully eliminating the congestion" in their
presentation to describe their mechanism.

I've seen no mathematical proofs and no nice answers to the fact that
FECN has latencies and that the notification assumes the Ack packet
with CA set doesn't actually get dropped. People seem to model FECN as
if the feedback were instant, whereas the feedback is one round trip on
a loaded network, if the acks get back. For unreliable traffic you end
up introducing congestion notifications, which I'm also not clear are
perfect. I understand folks have been doing the maths on this stuff,
though, and it will be interesting to see what they conclude.

Finally, for the centralised congestion manager the IBTA completely
washes its hands of policy; that's less of a problem, as it's mostly
about telling the boss to buy bigger toys.

TCP itself, btw, has issues over InfiniBand: stuff like convergence
time, clocking rates, and Reno-style window management all begin to
break down at 10Gbit. On the other hand, we have to solve those anyway,
along with other upcoming horrors like extreme packet re-ordering.

(Oh, now I found it - the reference for the great Internet congestion
 collapse is ACM Computer Communication Review, August 1988.)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: InfiniBand BOF @ LSM - topics of interest
@ 2002-05-15  1:29 Tony.P.Lee
  2002-05-15  2:35 ` Alan Cox
  0 siblings, 1 reply; 35+ messages in thread
From: Tony.P.Lee @ 2002-05-15  1:29 UTC (permalink / raw)
  To: alan; +Cc: lmb, woody, linux-kernel, zaitcev


> Thats an assumption that is actually historically not a very 
> good one to
> make. There are fundamental things that most of the "no network layer"
> people tend to forget
> 
> 1.	Van Jacobson saturated 10Mbit ethernet with a Sun 3/50
> 2.	SGI saturated HIPPI with MIPS processors that are at 
> best comparable
> 	to the lowest end wince PDAs
> 3.	Having no network layer in almost every case is tied to 
> the belief
> 	that bandwidth is infinite and you need to congestion control
> 
> In a network congestion based collapse is spectacularly bad. 
> Some of the
> internet old hands can probably tell you the horror stories 
> of the period
> the whole internet backbone basically did that until they got 
> their research
> right. Nagle's tinygram congestion avoidance work took Ford's 
> network usage
> down by I believe the paper quoted 90%.
> 
> The socket API is very efficient. TCP is extremely efficient 
> in the service
> it provides. IB can support large messages, which massively 
> ups the throughput.
> 
> Let me ask you a much more important question 
> 
> Can you send achieve 90% efficiency on a 90% utilized fabric 
> with multiple
> nodes and multiple hops ? If you can't then you are not 
> talking about a 
> network you are talking about a benchmark.
> 

Good points. I prefer to see IB as a replacement for
SCSI, Fibre Channel, and IDE, with its RDMA, messaging,
reliable connection, and user mode DMA type features. They
are the type of connections without the congestion avoidance
issues associated with the TCP design - the "no network layer"
type of work.


I don't know if IB over multiple nodes/multiple hops in a WAN-like
setup works or not. I'd like to see network experts like
yourself try to break that, since all the congestion control is
"supposedly" done in HW, as compared to doing it in software as
in TCP/IP. It would be nice to know how solid the overall
IB congestion control design is in that environment and at
what point it breaks.


IP over IB is a bit like IP over SCSI or IP over FibreChannel
for me.






^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: InfiniBand BOF @ LSM - topics of interest
  2002-05-13 16:25 Woodruff, Robert J
  2002-05-13 16:36 ` Pete Zaitcev
  2002-05-13 23:42 ` Alan Cox
@ 2002-05-15  0:11 ` Eric Brunner-Williams in Portland Maine
  2 siblings, 0 replies; 35+ messages in thread
From: Eric Brunner-Williams in Portland Maine @ 2002-05-15  0:11 UTC (permalink / raw)
  To: Woodruff, Robert J; +Cc: linux-kernel, zaitcev, brunner

> Soliciting specific topics that people would like to discuss at the LSM
> InfiniBand BOF.
> There are at least 2 that I thought might be good discussion topics;
	...
> Others ?

Well, I'm sort of partial to MPI as a reason to pursue the bandwidth/latency
data fabric goose. 

The DAT project (kDAPL) seems worth discussion too.

My two beads of wampum,
Eric

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: InfiniBand BOF @ LSM - topics of interest
  2002-05-14 20:19 Tony.P.Lee
  2002-05-14 20:42 ` Pete Zaitcev
  2002-05-14 23:25 ` Jakob Østergaard
@ 2002-05-14 23:58 ` Alan Cox
  2002-05-15 15:45   ` Timothy D. Witham
  2 siblings, 1 reply; 35+ messages in thread
From: Alan Cox @ 2002-05-14 23:58 UTC (permalink / raw)
  To: Tony.P.Lee; +Cc: alan, lmb, woody, linux-kernel, zaitcev

> I like to see user application such as VNC, SAMBA build directly
> on top of IB API.  I have couple of IB cards that can 
> send 10k 32KBytes message (320MB of data) every ~1 second over 
> 1x link with only <7% CPU usage (single CPU xeon 700MHz).  
> I was very impressed.  
> 
> Go thru the socket layer API would just slow thing down.

That's an assumption that is actually, historically, not a very good one
to make. There are fundamental things that most of the "no network
layer" people tend to forget:

1.	Van Jacobson saturated 10Mbit Ethernet with a Sun 3/50
2.	SGI saturated HIPPI with MIPS processors that are at best
	comparable to the lowest end WinCE PDAs
3.	Having no network layer is in almost every case tied to the belief
	that bandwidth is infinite and you don't need congestion control

In a network, congestion-based collapse is spectacularly bad. Some of
the Internet old hands can probably tell you the horror stories of the
period when the whole Internet backbone basically did that, until they
got their research right. Nagle's tinygram congestion avoidance work
took Ford's network usage down by, I believe the paper quoted, 90%.

The socket API is very efficient. TCP is extremely efficient in the
service it provides. IB can support large messages, which massively ups
the throughput.

Let me ask you a much more important question:

Can you achieve 90% efficiency on a 90% utilized fabric with multiple
nodes and multiple hops? If you can't, then you are not talking about a
network; you are talking about a benchmark.

Alan


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: InfiniBand BOF @ LSM - topics of interest
  2002-05-14 20:19 Tony.P.Lee
  2002-05-14 20:42 ` Pete Zaitcev
@ 2002-05-14 23:25 ` Jakob Østergaard
  2002-05-14 23:58 ` Alan Cox
  2 siblings, 0 replies; 35+ messages in thread
From: Jakob Østergaard @ 2002-05-14 23:25 UTC (permalink / raw)
  To: Tony.P.Lee; +Cc: alan, lmb, woody, linux-kernel, zaitcev

On Tue, May 14, 2002 at 01:19:13PM -0700, Tony.P.Lee@nokia.com wrote:
...
> 
> I like to see user application such as VNC, SAMBA build directly
> on top of IB API.  I have couple of IB cards that can 
> send 10k 32KBytes message (320MB of data) every ~1 second over 
> 1x link with only <7% CPU usage (single CPU xeon 700MHz).  
> I was very impressed.  
> 
> Go thru the socket layer API would just slow thing down.

Not going thru the socket layer will slow you down even more.  It will require
you to re-write every single performance-requiring application every time some
bloke designs a new network interconnect with a new API.

I'd take a 5% performance loss over re-writing all my code any day.

But why would it be any slower going over the socket API?  After all,
quite a lot of people have put quite a lot of effort into making that
API perform very well.

> 
> With IB bandwidth faster than standard 32/33MHZ PCI, one might
> run DOOM over VNC over IB on remote computer faster 
> than a normal PC running DOOM locally....

But not until you port DOOM to the API-of-the-day.   Sweet idea though  ;)

> 
> One might create a OS that miror the complete process state
> info (replicate all the modified page) everytime that 
> process is schedule out. 

Latency kills.

Adding tracks to the highway doesn't make it any shorter.

-- 
................................................................
:   jakob@unthought.net   : And I see the elder races,         :
:.........................: putrid forms of man                :
:   Jakob Østergaard      : See him rise and claim the earth,  :
:        OZ9ABN           : his downfall is at hand.           :
:.........................:............{Konkhra}...............:

^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: InfiniBand BOF @ LSM - topics of interest
@ 2002-05-14 22:15 Woodruff, Robert J
  2002-05-15  2:42 ` Timothy D. Witham
  0 siblings, 1 reply; 35+ messages in thread
From: Woodruff, Robert J @ 2002-05-14 22:15 UTC (permalink / raw)
  To: 'Timothy D. Witham'
  Cc: Pete Zaitcev, linux-kernel, 'Woodruff, Robert J'

Hi Tim,

Does the OSDL require production H/W?
I may be able to get my hands on a few early development HCAs,
as loaners, until production HCAs are available
from HCA vendors.

woody


-----Original Message-----
From: Timothy D. Witham [mailto:wookie@osdl.org]
Sent: Tuesday, May 14, 2002 2:01 PM
To: Timothy D. Witham
Cc: Pete Zaitcev; linux-kernel@vger.kernel.org; Woodruff, Robert J
Subject: Re: InfiniBand BOF @ LSM - topics of interest


  Should have said.

  I have both IA-32 and IA-64 machines so by purchasing
PCI HCA's we can move them around. :-)

Tim

On Tue, 2002-05-14 at 13:33, Timothy D. Witham wrote:
>   I have IA-32 and IA-64 machines with PCI HCA's we
> can move them around. 
> 
> Tim
> 
> On Tue, 2002-05-14 at 13:23, Pete Zaitcev wrote:
> > > From: "Timothy D. Witham" <wookie@osdl.org>
> > > Date: 14 May 2002 13:09:53 -0700
> > 
> > >   As an FYI to the group the OSDL has put out a
> > > RFQ for equipment to build a IB development environment
> > > and I was wondering if any of the possible developers
> > > would like to comment on the mix.  
> > > 
> > > http://www.osdl.org/rfqs/infiniband_dev_env
> > 
> > Very interesting. Are you going to plug it into Itanic based
> > hosts? Obviously, vendors cannot "support" that, but being
> > Linux based we should be able to get that running, right?
> > 
> > -- Pete
> -- 
> Timothy D. Witham - Lab Director - wookie@osdlab.org
> Open Source Development Lab Inc - A non-profit corporation
> 15275 SW Koll Parkway - Suite H - Beaverton OR, 97006
> (503)-626-2455 x11 (office)    (503)-702-2871     (cell)
> (503)-626-2436     (fax)
> 
-- 
Timothy D. Witham - Lab Director - wookie@osdlab.org
Open Source Development Lab Inc - A non-profit corporation
15275 SW Koll Parkway - Suite H - Beaverton OR, 97006
(503)-626-2455 x11 (office)    (503)-702-2871     (cell)
(503)-626-2436     (fax)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: InfiniBand BOF @ LSM - topics of interest
  2002-05-14 20:33       ` Timothy D. Witham
@ 2002-05-14 21:01         ` Timothy D. Witham
  0 siblings, 0 replies; 35+ messages in thread
From: Timothy D. Witham @ 2002-05-14 21:01 UTC (permalink / raw)
  To: Timothy D. Witham; +Cc: Pete Zaitcev, linux-kernel, Woodruff, Robert J

  Should have said.

  I have both IA-32 and IA-64 machines so by purchasing
PCI HCA's we can move them around. :-)

Tim

On Tue, 2002-05-14 at 13:33, Timothy D. Witham wrote:
>   I have IA-32 and IA-64 machines with PCI HCA's we
> can move them around. 
> 
> Tim
> 
> On Tue, 2002-05-14 at 13:23, Pete Zaitcev wrote:
> > > From: "Timothy D. Witham" <wookie@osdl.org>
> > > Date: 14 May 2002 13:09:53 -0700
> > 
> > >   As an FYI to the group the OSDL has put out a
> > > RFQ for equipment to build a IB development environment
> > > and I was wondering if any of the possible developers
> > > would like to comment on the mix.  
> > > 
> > > http://www.osdl.org/rfqs/infiniband_dev_env
> > 
> > Very interesting. Are you going to plug it into Itanic based
> > hosts? Obviously, vendors cannot "support" that, but being
> > Linux based we should be able to get that running, right?
> > 
> > -- Pete
> -- 
> Timothy D. Witham - Lab Director - wookie@osdlab.org
> Open Source Development Lab Inc - A non-profit corporation
> 15275 SW Koll Parkway - Suite H - Beaverton OR, 97006
> (503)-626-2455 x11 (office)    (503)-702-2871     (cell)
> (503)-626-2436     (fax)
> 
-- 
Timothy D. Witham - Lab Director - wookie@osdlab.org
Open Source Development Lab Inc - A non-profit corporation
15275 SW Koll Parkway - Suite H - Beaverton OR, 97006
(503)-626-2455 x11 (office)    (503)-702-2871     (cell)
(503)-626-2436     (fax)


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: InfiniBand BOF @ LSM - topics of interest
  2002-05-14 20:19 Tony.P.Lee
@ 2002-05-14 20:42 ` Pete Zaitcev
  2002-05-14 23:25 ` Jakob Østergaard
  2002-05-14 23:58 ` Alan Cox
  2 siblings, 0 replies; 35+ messages in thread
From: Pete Zaitcev @ 2002-05-14 20:42 UTC (permalink / raw)
  To: Tony.P.Lee; +Cc: linux-kernel, zaitcev

> Date: Tue, 14 May 2002 13:19:13 -0700
> From: <Tony.P.Lee@nokia.com>

Hrm. Didn't a guy from Nokia cook up a bizarre SM or an IB switch
software which required ultra-fast mprotect?
 
> I like to see user application such as VNC, SAMBA build directly
> on top of IB API.  I have couple of IB cards that can 
> send 10k 32KBytes message (320MB of data) every ~1 second over 
> 1x link with only <7% CPU usage (single CPU xeon 700MHz).  
> I was very impressed.  
> 
> Go thru the socket layer API would just slow thing down.
>[...]

Do you have any data to back up this claim about socket layer?

Also, time and time again I see the same bullshit (sometimes
from intelligent people!). OF COURSE it is TRIVIAL to send
messages (or receive messages); in fact, with a correct HCA
design IB allows it to be done from user mode. That is not where
the problem is. The problem is the delivery of notifications.
A notification is an interrupt (unless your application hogs
the CPU by CQ polling), and an interrupt is a context switch.
It takes an infinitely smaller amount of time to post a work
request than to retire it on InfiniBand. The best people are
cracking their heads over it. Ashok Raj, THE Intel ARCHITECT,
went as far as writing to linux-kernel asking for additional
arguments to RT signals (crazy, huh? :). Oracle gave up and said that
a) they do not mind hogging CPUs, since they are the only application
running on the box anyway, and b) they only request notifications
for one in N messages [Joel, I know you are listening, care
to nod or call foul here?]
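
To put the two completion strategies side by side, here is a sketch
against the later libibverbs API (anachronistic for this thread, and
the CQ and completion channel are assumed to have been created and
armed elsewhere):

#include <infiniband/verbs.h>

/* Strategy (a): burn a CPU polling the completion queue; no interrupt,
 * no context switch, lowest latency. */
static void poll_forever(struct ibv_cq *cq)
{
	struct ibv_wc wc;

	for (;;) {
		while (ibv_poll_cq(cq, 1, &wc) == 0)
			;	/* spin */
		/* handle wc.wr_id / wc.status here */
	}
}

/* Strategy (b): sleep until the HCA raises a completion event. Every
 * wakeup is an interrupt plus a context switch, which is the cost being
 * described above; requesting a signalled completion only on every Nth
 * work request amortizes it. */
static void wait_for_events(struct ibv_comp_channel *chan)
{
	struct ibv_cq *cq;
	struct ibv_wc wc;
	void *ctx;

	for (;;) {
		if (ibv_get_cq_event(chan, &cq, &ctx))	/* blocks in the kernel */
			break;
		ibv_ack_cq_events(cq, 1);
		ibv_req_notify_cq(cq, 0);	/* re-arm before draining */
		while (ibv_poll_cq(cq, 1, &wc) > 0)
			;	/* drain everything that accumulated */
	}
}
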

I may sound like Andre Hedrick here, but please understand
that nobody gives a flying dick about "sending 10k 32KByte
messages (320MB of data) every ~1 second over a 1x link with only
<7% CPU usage on a single CPU Xeon 700MHz." My stack can do it today,
in 1500 lines of C code. It is the notification that counts.
How many RPC messages per second your clustered SQL server
can receive AND PROCESS is how many transactions it can do.
-- Pete

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: InfiniBand BOF @ LSM - topics of interest
  2002-05-14 20:23     ` Pete Zaitcev
@ 2002-05-14 20:33       ` Timothy D. Witham
  2002-05-14 21:01         ` Timothy D. Witham
  0 siblings, 1 reply; 35+ messages in thread
From: Timothy D. Witham @ 2002-05-14 20:33 UTC (permalink / raw)
  To: Pete Zaitcev; +Cc: linux-kernel, Woodruff, Robert J

  I have IA-32 and IA-64 machines with PCI HCA's we
can move them around. 

Tim

On Tue, 2002-05-14 at 13:23, Pete Zaitcev wrote:
> > From: "Timothy D. Witham" <wookie@osdl.org>
> > Date: 14 May 2002 13:09:53 -0700
> 
> >   As an FYI to the group the OSDL has put out a
> > RFQ for equipment to build a IB development environment
> > and I was wondering if any of the possible developers
> > would like to comment on the mix.  
> > 
> > http://www.osdl.org/rfqs/infiniband_dev_env
> 
> Very interesting. Are you going to plug it into Itanic based
> hosts? Obviously, vendors cannot "support" that, but being
> Linux based we should be able to get that running, right?
> 
> -- Pete
-- 
Timothy D. Witham - Lab Director - wookie@osdlab.org
Open Source Development Lab Inc - A non-profit corporation
15275 SW Koll Parkway - Suite H - Beaverton OR, 97006
(503)-626-2455 x11 (office)    (503)-702-2871     (cell)
(503)-626-2436     (fax)


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: InfiniBand BOF @ LSM - topics of interest
  2002-05-14 20:09   ` Timothy D. Witham
@ 2002-05-14 20:23     ` Pete Zaitcev
  2002-05-14 20:33       ` Timothy D. Witham
  0 siblings, 1 reply; 35+ messages in thread
From: Pete Zaitcev @ 2002-05-14 20:23 UTC (permalink / raw)
  To: Timothy D. Witham; +Cc: Pete Zaitcev, linux-kernel, Woodruff, Robert J

> From: "Timothy D. Witham" <wookie@osdl.org>
> Date: 14 May 2002 13:09:53 -0700

>   As an FYI to the group the OSDL has put out a
> RFQ for equipment to build a IB development environment
> and I was wondering if any of the possible developers
> would like to comment on the mix.  
> 
> http://www.osdl.org/rfqs/infiniband_dev_env

Very interesting. Are you going to plug it into Itanic based
hosts? Obviously, vendors cannot "support" that, but being
Linux based we should be able to get that running, right?

-- Pete

^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: InfiniBand BOF @ LSM - topics of interest
@ 2002-05-14 20:19 Tony.P.Lee
  2002-05-14 20:42 ` Pete Zaitcev
                   ` (2 more replies)
  0 siblings, 3 replies; 35+ messages in thread
From: Tony.P.Lee @ 2002-05-14 20:19 UTC (permalink / raw)
  To: alan, lmb; +Cc: woody, linux-kernel, zaitcev

 
> > On 2002-05-14T00:42:07,
> >    Alan Cox <alan@lxorguk.ukuu.org.uk> said:
> > 
> > > Kernel mode RPC over infiniband - relevant to mosix type 
> stuff, to McVoy
> > > scalable cluster type stuff and also to things like file 
> system offload
> > 
> > For that, a generic comm interface would be a good thing to 
> have first.
> 
> It has to be fast, nonblocking and kernel callable. Cluster 
> people count
> individual microseconds so its base layers must be extremely efficient
> even if there are "easy use" layers above. The obvious "easy 
> use" layer being
> IP over infiniband.
> 

I'd like to see user applications such as VNC and SAMBA built directly
on top of the IB API.  I have a couple of IB cards that can
send 10k 32KByte messages (320MB of data) every ~1 second over a
1x link with only <7% CPU usage (single CPU Xeon 700MHz).
I was very impressed.

Going through the socket layer API would just slow things down.

With IB bandwidth faster than standard 32-bit/33MHz PCI, one might
run DOOM over VNC over IB on a remote computer faster
than a normal PC running DOOM locally....

One might create an OS that mirrors the complete process state
info (replicating all the modified pages) every time that
process is scheduled out.


----------------------------------------------------------------
Tony Lee         

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: InfiniBand BOF @ LSM - topics of interest
  2002-05-13 16:36 ` Pete Zaitcev
@ 2002-05-14 20:09   ` Timothy D. Witham
  2002-05-14 20:23     ` Pete Zaitcev
  0 siblings, 1 reply; 35+ messages in thread
From: Timothy D. Witham @ 2002-05-14 20:09 UTC (permalink / raw)
  To: Pete Zaitcev; +Cc: linux-kernel, Woodruff, Robert J

  As an FYI to the group, the OSDL has put out an
RFQ for equipment to build an IB development environment,
and I was wondering if any of the possible developers
would like to comment on the mix.

http://www.osdl.org/rfqs/infiniband_dev_env

  The idea is to have enough equipment that folks can
test out a range of configurations.  Once things
progress we will add more.

Tim

On Mon, 2002-05-13 at 09:36, Pete Zaitcev wrote:
> > From: "Woodruff, Robert J" <woody@co.intel.com>
> > Date: Mon, 13 May 2002 09:25:41 -0700
> 
> > Soliciting specific topics that people would like to discuss at the LSM
> > InfiniBand BOF.
> 
> N.B.: Woody means OLS BOF here (dunno where "LSM" comes from).
>  http://www.linuxsymposium.org/2002/bofs.php
> 
> -- Pete
-- 
Timothy D. Witham - Lab Director - wookie@osdlab.org
Open Source Development Lab Inc - A non-profit corporation
15275 SW Koll Parkway - Suite H - Beaverton OR, 97006
(503)-626-2455 x11 (office)    (503)-702-2871     (cell)
(503)-626-2436     (fax)


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: InfiniBand BOF @ LSM - topics of interest
  2002-05-13 23:26   ` Lars Marowsky-Bree
@ 2002-05-14  0:19     ` Alan Cox
  0 siblings, 0 replies; 35+ messages in thread
From: Alan Cox @ 2002-05-14  0:19 UTC (permalink / raw)
  To: Lars Marowsky-Bree; +Cc: Alan Cox, Woodruff Robert J, linux-kernel, zaitcev

> On 2002-05-14T00:42:07,
>    Alan Cox <alan@lxorguk.ukuu.org.uk> said:
> 
> > Kernel mode RPC over infiniband - relevant to mosix type stuff, to McVoy
> > scalable cluster type stuff and also to things like file system offload
> 
> For that, a generic comm interface would be a good thing to have first.

It has to be fast, nonblocking and kernel callable. Cluster people count
individual microseconds so its base layers must be extremely efficient
even if there are "easy use" layers above. The obvious "easy use" layer being
IP over infiniband.

Alan
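
As a rough illustration of what "fast, nonblocking and kernel callable"
could mean at the interface level, here is a sketch of a minimal in-kernel
message API: callers post a pre-registered buffer and get a completion
callback, and nothing in the fast path may sleep.  This is a hypothetical
shape for discussion only, not an existing kernel interface; all names
(kcomm_ep, kcomm_req, kcomm_post_send) are invented.

/* Hypothetical in-kernel comm interface -- invented names, a shape for
 * discussion rather than an existing API. */
#include <stddef.h>

struct kcomm_ep;                         /* endpoint, bound to a remote node */

struct kcomm_req {
        void   *buf;                     /* pre-registered/pinned buffer     */
        size_t  len;
        void  (*done)(struct kcomm_req *req, int status);  /* completion hook */
        void   *cookie;                  /* opaque pointer for the caller    */
};

/* Returns 0 if the request was queued to the hardware, nonzero if the send
 * queue is full (caller retries from its own context).  Never blocks, so it
 * is callable from contexts that cannot sleep. */
extern int kcomm_post_send(struct kcomm_ep *ep, struct kcomm_req *req);

static void my_done(struct kcomm_req *req, int status)
{
        /* would run in interrupt/softirq context: no sleeping here either */
}

static int send_block(struct kcomm_ep *ep, void *buf, size_t len)
{
        static struct kcomm_req req;     /* one outstanding request, for brevity */

        req.buf    = buf;
        req.len    = len;
        req.done   = my_done;
        req.cookie = NULL;
        return kcomm_post_send(ep, &req);  /* queued or "try again", never sleeps */
}

"Easy use" layers such as IP over InfiniBand would sit above something like
this, paying for convenience with extra copies and per-packet overhead.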

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: InfiniBand BOF @ LSM - topics of interest
  2002-05-13 23:42 ` Alan Cox
  2002-05-13 23:26   ` Lars Marowsky-Bree
@ 2002-05-14  0:19   ` Joel Becker
  1 sibling, 0 replies; 35+ messages in thread
From: Joel Becker @ 2002-05-14  0:19 UTC (permalink / raw)
  To: Alan Cox; +Cc: Woodruff, Robert J, linux-kernel, zaitcev

On Mon, 2002-05-13 at 16:42, Alan Cox wrote:
> > There are at least 2 that I thought might be good discussion topics;
> > 	Sockets Direct Protocol for InfiniBand
> > 	User Mode Access to the InfiniBand network
> 
> Surely these are the same topic ?

	Nope.  There are three issues here, really.  The second item, "User
Mode Access", probably covers one or both of two things.  One is a
"native" IBA interface (aka Verbs): the actual IBA work elements, which
things like Subnet Managers need access to.  The other is uDAPL (the
user Direct Access Programming Library).  uDAPL (and kDAPL) are emerging
standards for high-speed messaging systems (VIA/IBA/Myrinet type
things).  uDAPL is likely the interface most of userspace will want to
program to.

Joel 

-- 

 "I'm living so far beyond my income that we may almost be said
 to be living apart."
         - e e cummings

Joel Becker
Senior Member of Technical Staff
Oracle Corporation
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127
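
As a rough sketch of how the two levels described above differ for an
application: at the verbs level the program drives queue pairs and work
requests itself, while a DAPL-style library wraps that behind endpoint
connect/send calls.  The names below (verbs_post_send, dapl_ep_connect,
dapl_ep_send, ...) are illustrative placeholders only, not the actual
Verbs or uDAPL entry points.

/* Illustrative only -- placeholder names, not the real Verbs or uDAPL APIs. */
#include <stddef.h>

/* "Native" verbs level: the caller manages the IBA work elements itself.
 * Management code such as a Subnet Manager lives down here. */
struct verbs_qp;                                /* queue pair   */
struct verbs_wr { void *addr; size_t len; };    /* work request */
extern int verbs_post_send(struct verbs_qp *qp, struct verbs_wr *wr);
extern int verbs_poll_completion(struct verbs_qp *qp);

/* uDAPL-style level: connection-oriented endpoints hide the details above.
 * This is the sort of interface most applications would program to. */
struct dapl_ep;                                 /* endpoint */
extern struct dapl_ep *dapl_ep_connect(const char *remote_addr);
extern int dapl_ep_send(struct dapl_ep *ep, const void *buf, size_t len);

int send_both_ways(struct verbs_qp *qp, const char *remote,
                   void *buf, size_t len)
{
        struct verbs_wr wr = { buf, len };

        verbs_post_send(qp, &wr);               /* low level: post a work element */
        while (verbs_poll_completion(qp) == 0)
                ;                               /* ... and reap its completion    */

        return dapl_ep_send(dapl_ep_connect(remote), buf, len);  /* high level */
}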

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: InfiniBand BOF @ LSM - topics of interest
  2002-05-13 16:25 Woodruff, Robert J
  2002-05-13 16:36 ` Pete Zaitcev
@ 2002-05-13 23:42 ` Alan Cox
  2002-05-13 23:26   ` Lars Marowsky-Bree
  2002-05-14  0:19   ` Joel Becker
  2002-05-15  0:11 ` Eric Brunner-Williams in Portland Maine
  2 siblings, 2 replies; 35+ messages in thread
From: Alan Cox @ 2002-05-13 23:42 UTC (permalink / raw)
  To: Woodruff, Robert J; +Cc: linux-kernel, zaitcev

> 
> Soliciting specific topics that people would like to discuss at the LSM
> InfiniBand BOF.
> There are at least 2 that I thought might be good discussion topics;
> 	Sockets Direct Protocol for InfiniBand
> 	User Mode Access to the InfiniBand network

Surely these are the same topic ?

> Others ?

Kernel mode RPC over infiniband - relevant to mosix type stuff, to 
McVoy scalable cluster type stuff and also to things like file system
offload
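
For discussion, a minimal sketch of what kernel-mode RPC framing on top of
a message interconnect might look like: a small header carrying a procedure
number and a correlation id, so a file system offload or cluster layer can
match replies to outstanding calls.  The header layout and the names
(krpc_hdr, krpc_transport, krpc_post, krpc_call) are invented for
illustration only.

/* Hypothetical kernel RPC framing -- invented names and layout. */
#include <stdint.h>
#include <string.h>
#include <stddef.h>

struct krpc_hdr {
        uint32_t proc;           /* which remote procedure (e.g. a read)  */
        uint32_t xid;            /* correlation id: matches reply to call */
        uint32_t len;            /* payload bytes following this header   */
        uint32_t flags;          /* request/reply, more-fragments, ...    */
};

struct krpc_msg {
        struct krpc_hdr hdr;
        unsigned char   payload[512];   /* real code would scatter/gather */
};

/* Assumed lower layer: nonblocking post of a message to a remote node. */
struct krpc_transport;
extern int krpc_post(struct krpc_transport *t, const void *msg, size_t len);

/* Issue a call; the reply is matched by xid on the receive side, which
 * (like waiting for completion) is left out of this sketch. */
static int krpc_call(struct krpc_transport *t, uint32_t proc, uint32_t xid,
                     const void *payload, size_t payload_len)
{
        struct krpc_msg m;

        if (payload_len > sizeof(m.payload))
                return -1;
        m.hdr.proc  = proc;
        m.hdr.xid   = xid;
        m.hdr.len   = (uint32_t)payload_len;
        m.hdr.flags = 0;
        memcpy(m.payload, payload, payload_len);
        return krpc_post(t, &m, sizeof(m.hdr) + payload_len);
}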

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: InfiniBand BOF @ LSM - topics of interest
  2002-05-13 23:42 ` Alan Cox
@ 2002-05-13 23:26   ` Lars Marowsky-Bree
  2002-05-14  0:19     ` Alan Cox
  2002-05-14  0:19   ` Joel Becker
  1 sibling, 1 reply; 35+ messages in thread
From: Lars Marowsky-Bree @ 2002-05-13 23:26 UTC (permalink / raw)
  To: Alan Cox; +Cc: Woodruff, Robert J, linux-kernel, zaitcev

On 2002-05-14T00:42:07,
   Alan Cox <alan@lxorguk.ukuu.org.uk> said:

> Kernel mode RPC over infiniband - relevant to mosix type stuff, to McVoy
> scalable cluster type stuff and also to things like file system offload

For that, a generic comm interface would be a good thing to have first.


Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>

-- 
Immortality is an adequate definition of high availability for me.
	--- Gregory F. Pfister


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: InfiniBand BOF @ LSM - topics of interest
  2002-05-13 16:25 Woodruff, Robert J
@ 2002-05-13 16:36 ` Pete Zaitcev
  2002-05-14 20:09   ` Timothy D. Witham
  2002-05-13 23:42 ` Alan Cox
  2002-05-15  0:11 ` Eric Brunner-Williams in Portland Maine
  2 siblings, 1 reply; 35+ messages in thread
From: Pete Zaitcev @ 2002-05-13 16:36 UTC (permalink / raw)
  To: linux-kernel; +Cc: Woodruff, Robert J

> From: "Woodruff, Robert J" <woody@co.intel.com>
> Date: Mon, 13 May 2002 09:25:41 -0700

> Soliciting specific topics that people would like to discuss at the LSM
> InfiniBand BOF.

N.B.: Woody means OLS BOF here (dunno where "LSM" comes from).
 http://www.linuxsymposium.org/2002/bofs.php

-- Pete

^ permalink raw reply	[flat|nested] 35+ messages in thread

* InfiniBand BOF @ LSM - topics of interest
@ 2002-05-13 16:25 Woodruff, Robert J
  2002-05-13 16:36 ` Pete Zaitcev
                   ` (2 more replies)
  0 siblings, 3 replies; 35+ messages in thread
From: Woodruff, Robert J @ 2002-05-13 16:25 UTC (permalink / raw)
  To: linux-kernel, zaitcev


Hi All, 

Soliciting specific topics that people would like to discuss at the LSM
InfiniBand BOF.
There are at least 2 that I thought might be good discussion topics;
	Sockets Direct Protocol for InfiniBand
	User Mode Access to the InfiniBand network
Others ?


^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2002-05-16 23:42 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-05-15 20:37 InfiniBand BOF @ LSM - topics of interest Woodruff, Robert J
2002-05-15 20:55 ` Pete Zaitcev
2002-05-15 21:01 ` Alan Cox
  -- strict thread matches above, loose matches on Subject: below --
2002-05-16 23:42 Woodruff, Robert J
2002-05-16 10:37 Eric Brunner-Williams in Portland Maine
2002-05-15 23:58 Woodruff, Robert J
2002-05-16  0:32 ` Alan Cox
2002-05-15 20:47 Tony.P.Lee
2002-05-15 21:36 ` Russell Leighton
2002-05-15 22:41   ` Lars Marowsky-Bree
2002-05-16  4:51   ` Ashok Raj
2002-05-15  1:29 Tony.P.Lee
2002-05-15  2:35 ` Alan Cox
2002-05-15  5:01   ` Pete Zaitcev
2002-05-15 11:29     ` Alan Cox
2002-05-15 14:27       ` Ashok Raj
2002-05-15 16:34     ` Stephen Hemminger
2002-05-14 22:15 Woodruff, Robert J
2002-05-15  2:42 ` Timothy D. Witham
2002-05-14 20:19 Tony.P.Lee
2002-05-14 20:42 ` Pete Zaitcev
2002-05-14 23:25 ` Jakob Østergaard
2002-05-14 23:58 ` Alan Cox
2002-05-15 15:45   ` Timothy D. Witham
2002-05-13 16:25 Woodruff, Robert J
2002-05-13 16:36 ` Pete Zaitcev
2002-05-14 20:09   ` Timothy D. Witham
2002-05-14 20:23     ` Pete Zaitcev
2002-05-14 20:33       ` Timothy D. Witham
2002-05-14 21:01         ` Timothy D. Witham
2002-05-13 23:42 ` Alan Cox
2002-05-13 23:26   ` Lars Marowsky-Bree
2002-05-14  0:19     ` Alan Cox
2002-05-14  0:19   ` Joel Becker
2002-05-15  0:11 ` Eric Brunner-Williams in Portland Maine

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).