linux-kernel.vger.kernel.org archive mirror
* RE: InfiniBand BOF @ LSM - topics of interest
@ 2002-05-15  1:29 Tony.P.Lee
  2002-05-15  2:35 ` Alan Cox
  0 siblings, 1 reply; 35+ messages in thread
From: Tony.P.Lee @ 2002-05-15  1:29 UTC (permalink / raw)
  To: alan; +Cc: lmb, woody, linux-kernel, zaitcev


> That's an assumption that is actually historically not a very good one
> to make. There are fundamental things that most of the "no network
> layer" people tend to forget:
> 
> 1.	Van Jacobson saturated 10Mbit ethernet with a Sun 3/50
> 2.	SGI saturated HIPPI with MIPS processors that are at best
> 	comparable to the lowest-end WinCE PDAs
> 3.	Having no network layer is in almost every case tied to the belief
> 	that bandwidth is infinite and you need no congestion control
> 
> In a network, congestion-based collapse is spectacularly bad. Some of
> the internet old hands can probably tell you the horror stories of the
> period when the whole internet backbone basically did that, until they
> got their research right. Nagle's tinygram congestion avoidance work
> took Ford's network usage down by, I believe the paper quoted, 90%.
> 
> The socket API is very efficient. TCP is extremely efficient in the
> service it provides. IB can support large messages, which massively
> ups the throughput.
> 
> Let me ask you a much more important question:
> 
> Can you achieve 90% efficiency on a 90% utilized fabric with multiple
> nodes and multiple hops? If you can't then you are not talking about a
> network, you are talking about a benchmark.
> 

Good points.  I prefer to see IB as a replacement for SCSI, Fibre
Channel, and IDE, with its RDMA, messaging, reliable connection, and
user-mode DMA features.  Those are the kinds of connections without the
congestion-avoidance issues associated with TCP's design - the "no
network layer" type of work.


I don't know whether IB works over multiple nodes and multiple hops in
a WAN-like setup.  I'd like to see network experts like yourself try to
break it, since the congestion control is supposedly done in hardware
rather than in software as with TCP/IP.  It would be nice to know how
solid the overall IB congestion control design is in that environment,
and at what point it breaks.


To me, IP over IB is a bit like IP over SCSI or IP over Fibre Channel.






* RE: InfiniBand BOF @ LSM - topics of interest
@ 2002-05-16 23:42 Woodruff, Robert J
  0 siblings, 0 replies; 35+ messages in thread
From: Woodruff, Robert J @ 2002-05-16 23:42 UTC (permalink / raw)
  To: 'Alan Cox', Woodruff, Robert J
  Cc: russ, Tony.P.Lee, wookie, lmb, Woodruff, Robert J, linux-kernel, zaitcev


> so someone could invent a new address family for sockets, say
> AF_INFINIBANDO, that is much more light weight than the existing
> TCP/IP stack. Thus with a small change to the application, a good
> performance increase can be attained.

> Shouldn't be too hard. It looks like it's basically AF_PACKET combined
> with the infiniband notions of security.

Maybe a little higher level than raw packets, but yes, 
a light-weight sockets protocol driver. 
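
A minimal sketch of what that could look like from the application's
side. AF_INFINIBANDO, the family number, and the sockaddr layout below
are hypothetical illustrations taken from the idea above, not an
existing kernel interface:

/* Hypothetical sketch: AF_INFINIBANDO and sockaddr_ib_demo are made up;
 * no such address family exists.  The point is that the application
 * keeps the familiar socket()/connect()/send() calls and only the
 * address family changes, while the HCA provides reliable delivery. */
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define AF_INFINIBANDO 64              /* made-up family number */

struct sockaddr_ib_demo {              /* made-up address layout */
	sa_family_t    sib_family;     /* AF_INFINIBANDO */
	unsigned short sib_service_id; /* plays the role of a port */
	unsigned char  sib_gid[16];    /* 128-bit IB GID of the peer */
};

int main(void)
{
	int fd = socket(AF_INFINIBANDO, SOCK_STREAM, 0);
	if (fd < 0) {
		perror("socket");      /* expected to fail on a real kernel */
		return 1;
	}

	struct sockaddr_ib_demo peer;
	memset(&peer, 0, sizeof(peer));
	peer.sib_family = AF_INFINIBANDO;
	peer.sib_service_id = 42;

	if (connect(fd, (struct sockaddr *)&peer, sizeof(peer)) == 0)
		send(fd, "hello", 5, 0);

	close(fd);
	return 0;
}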

* Re: InfiniBand BOF @ LSM - topics of interest
@ 2002-05-16 10:37 Eric Brunner-Williams in Portland Maine
  0 siblings, 0 replies; 35+ messages in thread
From: Eric Brunner-Williams in Portland Maine @ 2002-05-16 10:37 UTC (permalink / raw)
  To: ashokr2
  Cc: 'Russell Leighton',
	Tony.P.Lee, wookie, alan, lmb, linux-kernel, zaitcev, woody,
	brunner


Phil Karn's memo "Advice for Internet Subnetwork Designers",
(draft-ietf-pilc-link-design-11.txt), is interesting reading.

I'd appreciate it if someone with IB-clue would comment on
Phil's memo, and the design choices of the IB designer(s).

My two beads of wampum,
Eric

* RE: InfiniBand BOF @ LSM - topics of interest
@ 2002-05-15 23:58 Woodruff, Robert J
  2002-05-16  0:32 ` Alan Cox
  0 siblings, 1 reply; 35+ messages in thread
From: Woodruff, Robert J @ 2002-05-15 23:58 UTC (permalink / raw)
  To: 'Russell Leighton', Tony.P.Lee
  Cc: wookie, alan, lmb, Woodruff, Robert J, linux-kernel, zaitcev


> OR is it saner to layer TCP/IP etc. over IB... it seems to me the point
> others were making was that there is more to "network"-style services
> than just passing bits from here to there... will IB have low-level
> support for many of the features people have come to expect?

Yes, someone will layer IP over IB to support the gazillion applications
that currently run over that protocol without any code changes. This
should perform as well as 10 gigabit Ethernet, but may cost less.

However, since InfiniBand is already a reliable transport, one can
bypass TCP, so someone could invent a new address family for sockets,
say AF_INFINIBANDO, that is much more light weight than the existing
TCP/IP stack. Thus with a small change to the application, a good
performance increase can be attained.
This is probably how the InfiniBand standard Sockets Direct Protocol
will initially be implemented for Linux. Next, one could entertain a
concept similar to Winsock Direct, where no change is needed to the
application, and the kernel has a switch in the AF_INET code path that
bypasses the TCP processing for packets destined for the InfiniBand
subnet.  This will likely have to come later for Linux, after the
performance value of bypassing TCP has actually been demonstrated.
People are currently skeptical that a concept that has been
demonstrated to have performance advantages in Windows 2000 will
achieve the same result if implemented in Linux.

Finally, there are those that want to run on almost the bare metal.
These are typically the high-performance computing types, or some of
the large database vendors that want to use InfiniBand for clustering.
For them, there is an InfiniBand Access Layer that allows direct
access to the hardware (via a thin S/W layer) from user-space
applications. One could probably mmap() an Ethernet NIC's registers
into user space, but probably only for one process.
InfiniBand was designed to specifically allow mmap()ing the hardware 
into user space,  even for multiple processes. 
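
A rough sketch of what that per-process mapping looks like from user
space. The device node /dev/ib_hca0 and the doorbell layout are made up
for illustration; only the plain POSIX open()/mmap() calls are real:

/* Hypothetical: /dev/ib_hca0 and the doorbell page layout are made up.
 * The idea is that the driver hands each process its own page of HCA
 * registers, so posting work requires no system call and several
 * processes can share the adapter safely. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	int fd = open("/dev/ib_hca0", O_RDWR);    /* made-up device node */
	if (fd < 0) {
		perror("open");
		return 1;
	}

	volatile uint32_t *doorbell =
		mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (doorbell == MAP_FAILED) {
		perror("mmap");
		close(fd);
		return 1;
	}

	doorbell[0] = 1;	/* ring: tell the HCA a work request is queued */

	munmap((void *)doorbell, 4096);
	close(fd);
	return 0;
}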

As for a high level answer to the security question,
InfiniBand has the concept of protection domains
and read and write keys with checking built into the hardware. 
This provides protection from people RDMAing to the wrong place. 
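
To make the key model concrete, here is a sketch using the verbs API
that later became libibverbs (anachronistic for this discussion, shown
purely to illustrate protection domains and remote keys, not the access
layer described above):

/* Illustration of protection domains and remote keys.  Only the buffer
 * registered below, within this PD and with these access rights, can be
 * hit by a remote RDMA carrying the matching rkey; anything else is
 * rejected by the HCA. */
#include <infiniband/verbs.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
	struct ibv_device **devs = ibv_get_device_list(NULL);
	if (!devs || !devs[0]) {
		fprintf(stderr, "no IB devices found\n");
		return 1;
	}

	struct ibv_context *ctx = ibv_open_device(devs[0]);
	if (!ctx)
		return 1;
	struct ibv_pd *pd = ibv_alloc_pd(ctx);	/* protection domain */
	if (!pd)
		return 1;

	void *buf = calloc(1, 4096);
	struct ibv_mr *mr = ibv_reg_mr(pd, buf, 4096,
				       IBV_ACCESS_LOCAL_WRITE |
				       IBV_ACCESS_REMOTE_WRITE);
	if (!mr) {
		perror("ibv_reg_mr");
		return 1;
	}

	/* The peer needs this key (plus the buffer address) to RDMA here. */
	printf("rkey for the peer: 0x%x\n", mr->rkey);

	ibv_dereg_mr(mr);
	ibv_dealloc_pd(pd);
	ibv_close_device(ctx);
	ibv_free_device_list(devs);
	free(buf);
	return 0;
}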

This is a high-level, simplistic view; see the InfiniBand
specification for details. There is also an open source
project on SourceForge that is working on code. See
the home page for details.

http://sourceforge.net/projects/infiniband






* RE: InfiniBand BOF @ LSM - topics of interest
@ 2002-05-15 20:47 Tony.P.Lee
  2002-05-15 21:36 ` Russell Leighton
  0 siblings, 1 reply; 35+ messages in thread
From: Tony.P.Lee @ 2002-05-15 20:47 UTC (permalink / raw)
  To: wookie, alan; +Cc: lmb, woody, linux-kernel, zaitcev

> > That's an assumption that is actually historically not a very good
> > one to make. There are fundamental things that most of the "no
> > network layer" people tend to forget:
> > 
> > 1.	Van Jacobson saturated 10Mbit ethernet with a Sun 3/50
> > 2.	SGI saturated HIPPI with MIPS processors that are at best
> > 	comparable to the lowest-end WinCE PDAs
> > 3.	Having no network layer is in almost every case tied to the belief
> > 	that bandwidth is infinite and you need no congestion control
> > 
>   First I want to say that for the average person using the system,
> the socket interface with the underlying network stack is the best
> way to go, IMHO.
> 
> But if the issue isn't throughput, and the application requires some
> items to have low latency, the size of the network stack can have an
> adverse effect on the overall performance of the system.  A good
> example is a Distributed Lock Manager (DLM).  In this case the round
> trip, including the software stack, limits the number of locks per
> second that can occur.
> 
>   So if we can fit everything that we need into a 64-byte IB packet
> for the imaginary application, that would take 256 nsec to transmit
> to the other system (2.5 Gb/sec link).  If you assume zero turnaround
> time you get 512 nsec as the lowest lock request/grant time possible,
> which puts you in the range of doing a little under 980,000
> lock/unlock operations per second on a single lock, and of course when
> you add the software to actually process the packet the number of
> lock/unlock operations per second will always be below that in the
> real world.  When you compare this to a modern processor's ability to
> do lock/unlocks, this is really a small number of lock/unlock
> operations per second.
> 
> So the ability to scarf that packet into the application and respond
> is the hard issue to solve in scalability.
> 
> Tim
> 
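
Writing out the arithmetic quoted above. Treating the 1x link as
2.5 Gb/s of signalling carrying 2.0 Gb/s of data after 8b/10b encoding,
and counting a lock plus an unlock as two request/grant round trips,
are assumptions made here to reproduce the quoted figures:

/* Worked version of the numbers quoted above.  Assumptions: a 1x link
 * carries 2.0 Gb/s of data (2.5 Gb/s signalling, 8b/10b encoding), and
 * a lock and an unlock each cost one 64-byte request plus one 64-byte
 * grant.  Software time is ignored, so this is an upper bound. */
#include <stdio.h>

int main(void)
{
	double data_rate_bps = 2.5e9 * 8.0 / 10.0;	  /* 2.0 Gb/s */
	double packet_ns = 64 * 8 / data_rate_bps * 1e9;  /* ~256 ns */
	double request_grant_ns = 2 * packet_ns;	  /* ~512 ns */
	double lock_unlock_ns = 2 * request_grant_ns;	  /* lock + unlock */

	printf("64-byte packet: %.0f ns, request/grant: %.0f ns\n",
	       packet_ns, request_grant_ns);
	printf("upper bound: %.0f lock/unlock operations per second\n",
	       1e9 / lock_unlock_ns);	/* ~976,562: "a little under 980,000" */
	return 0;
}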

For a VNC-type application, instead of the server translating every
X Window System, Mac, or Windows GUI call and bitmap update into a TCP
stream, you convert the GUI API calls to IB RC messages and the bitmap
updates to RDMA writes directly into the client app's frame buffer.

For a SAMBA-like filesystem, the file read API can be translated into
IB RC messages on the client, plus an RDMA write from the server
directly into the remote client app's buffer.

They won't be "standard" VNC/SAMBA any more. 
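
A rough sketch of the request that could drive the SAMBA-like read path
above; the message layout is entirely hypothetical:

/* Entirely hypothetical wire format for the SAMBA-like read path above:
 * the client pre-registers its buffer, sends the rkey and address in a
 * small IB RC message, and the server answers with an RDMA write
 * straight into that buffer, so the payload never crosses the client's
 * socket stack. */
#include <stdint.h>
#include <stdio.h>

struct ib_file_read_req {	/* sent over the RC connection */
	uint64_t file_handle;	/* which open file */
	uint64_t offset;	/* where to read from */
	uint32_t length;	/* how many bytes */
	uint64_t client_addr;	/* client buffer, already registered */
	uint32_t client_rkey;	/* remote key covering that buffer */
};

struct ib_file_read_resp {	/* small RC message after the RDMA write */
	uint32_t status;	/* 0 on success */
	uint32_t byte_count;	/* how much landed at client_addr */
};

int main(void)
{
	printf("request %zu bytes, response %zu bytes\n",
	       sizeof(struct ib_file_read_req),
	       sizeof(struct ib_file_read_resp));
	return 0;
}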

On the other hand, we can put VNC over TCP over IP over IB -
"to a person with a hammer, every problem looks like a nail." :-)

In theory, we could have an IB DVD drive RDMA video directly over an IB
switch to an IB-enabled VGA frame buffer and completely bypass the
system.  The CPU is only needed to set up the proper connections.  The
idea is to truly virtualize the system resources and have a "resource
server" RDMA the data to anyone on the IB switch with minimal CPU
interaction in the process.

You can also configure a normal SCSI card to DMA data to a virtualized
IB address in PCI address space and have the data show up 15 meters or
2 km away at the server's "virtual SCSI driver" destination DMA
address.  That makes iSCSI look like a dial-up modem in terms of
performance and latency.


----------------------------------------------------------------
Tony 

* RE: InfiniBand BOF @ LSM - topics of interest
@ 2002-05-15 20:37 Woodruff, Robert J
  2002-05-15 20:55 ` Pete Zaitcev
  2002-05-15 21:01 ` Alan Cox
  0 siblings, 2 replies; 35+ messages in thread
From: Woodruff, Robert J @ 2002-05-15 20:37 UTC (permalink / raw)
  To: 'Stephen Hemminger', Pete Zaitcev; +Cc: Alan Cox, linux-kernel


On Tue, 2002-05-14 at 22:01, Pete Zaitcev wrote:
 
> The thing about Infiniband is that its scope is so great.
> If you consider Infiniband was only a glorified PCI with serial
> connector, the congestion control is not an issue. Credits
> are quite sufficient to provide per link flow control, and
> everything would work nicely with a couple of switches.
> Such was the original plan, anyways, but somehow cluster
> ninjas managed to hijack the spec and we have the rabid
> overengineering running amok. In fact, they ran so far
> that Intel jumped ship and created PCI Express, and we
> have discussions about congestion control. Sad, really...
> 


It's clear from this email thread that there is a lot of
confusion about the intended use of InfiniBand and its
benefits. I'll take that as a need to prepare some material
for the Linux Symposium BOF that shows some of
the benefits of InfiniBand as demonstrated on the early
InfiniBand hardware.




* RE: InfiniBand BOF @ LSM - topics of interest
@ 2002-05-14 22:15 Woodruff, Robert J
  2002-05-15  2:42 ` Timothy D. Witham
  0 siblings, 1 reply; 35+ messages in thread
From: Woodruff, Robert J @ 2002-05-14 22:15 UTC (permalink / raw)
  To: 'Timothy D. Witham'
  Cc: Pete Zaitcev, linux-kernel, 'Woodruff, Robert J'

Hi Tim,

Does the OSDL require production H/W?
I may be able to get my hands on a few early development HCAs,
as loaners, until production HCAs are available
from HCA vendors.

woody


-----Original Message-----
From: Timothy D. Witham [mailto:wookie@osdl.org]
Sent: Tuesday, May 14, 2002 2:01 PM
To: Timothy D. Witham
Cc: Pete Zaitcev; linux-kernel@vger.kernel.org; Woodruff, Robert J
Subject: Re: InfiniBand BOF @ LSM - topics of interest


  Should have said.

  I have both IA-32 and IA-64 machines, so by purchasing
PCI HCAs we can move them around. :-)

Tim

On Tue, 2002-05-14 at 13:33, Timothy D. Witham wrote:
>   I have IA-32 and IA-64 machines with PCI HCA's we
> can move them around. 
> 
> Tim
> 
> On Tue, 2002-05-14 at 13:23, Pete Zaitcev wrote:
> > > From: "Timothy D. Witham" <wookie@osdl.org>
> > > Date: 14 May 2002 13:09:53 -0700
> > 
> > >   As an FYI to the group the OSDL has put out a
> > > RFQ for equipment to build a IB development environment
> > > and I was wondering if any of the possible developers
> > > would like to comment on the mix.  
> > > 
> > > http://www.osdl.org/rfqs/infiniband_dev_env
> > 
> > Very interesting. Are you going to plug it into Itanic based
> > hosts? Obviously, vendors cannot "support" that, but being
> > Linux based we should be able to get that running, right?
> > 
> > -- Pete
> -- 
> Timothy D. Witham - Lab Director - wookie@osdlab.org
> Open Source Development Lab Inc - A non-profit corporation
> 15275 SW Koll Parkway - Suite H - Beaverton OR, 97006
> (503)-626-2455 x11 (office)    (503)-702-2871     (cell)
> (503)-626-2436     (fax)
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
-- 
Timothy D. Witham - Lab Director - wookie@osdlab.org
Open Source Development Lab Inc - A non-profit corporation
15275 SW Koll Parkway - Suite H - Beaverton OR, 97006
(503)-626-2455 x11 (office)    (503)-702-2871     (cell)
(503)-626-2436     (fax)

* RE: InfiniBand BOF @ LSM - topics of interest
@ 2002-05-14 20:19 Tony.P.Lee
  2002-05-14 20:42 ` Pete Zaitcev
                   ` (2 more replies)
  0 siblings, 3 replies; 35+ messages in thread
From: Tony.P.Lee @ 2002-05-14 20:19 UTC (permalink / raw)
  To: alan, lmb; +Cc: woody, linux-kernel, zaitcev

 
> > On 2002-05-14T00:42:07,
> >    Alan Cox <alan@lxorguk.ukuu.org.uk> said:
> > 
> > > Kernel mode RPC over infiniband - relevant to mosix type stuff, to
> > > McVoy scalable cluster type stuff and also to things like file
> > > system offload
> > 
> > For that, a generic comm interface would be a good thing to have
> > first.
> 
> It has to be fast, nonblocking and kernel callable. Cluster people
> count individual microseconds so its base layers must be extremely
> efficient even if there are "easy use" layers above. The obvious
> "easy use" layer being IP over infiniband.
> 

I'd like to see user applications such as VNC and SAMBA built directly
on top of the IB API.  I have a couple of IB cards that can
send 10k 32-KByte messages (320 MB of data) every ~1 second over a
1x link with only <7% CPU usage (single-CPU 700 MHz Xeon).
I was very impressed.

Going through the socket-layer API would just slow things down.

With IB bandwidth faster than standard 32-bit/33 MHz PCI, one might
run DOOM over VNC over IB on a remote computer faster
than a normal PC running DOOM locally....

One might create an OS that mirrors the complete process state
(replicating all the modified pages) every time a
process is scheduled out.


----------------------------------------------------------------
Tony Lee         

* InfiniBand BOF @ LSM - topics of interest
@ 2002-05-13 16:25 Woodruff, Robert J
  2002-05-13 16:36 ` Pete Zaitcev
                   ` (2 more replies)
  0 siblings, 3 replies; 35+ messages in thread
From: Woodruff, Robert J @ 2002-05-13 16:25 UTC (permalink / raw)
  To: linux-kernel, zaitcev


Hi All, 

Soliciting specific topics that people would like to discuss at the LSM
InfiniBand BOF.
There are at least two that I thought might be good discussion topics:
	Sockets Direct Protocol for InfiniBand
	User Mode Access to the InfiniBand network
Others?




Thread overview: 35+ messages
2002-05-15  1:29 InfiniBand BOF @ LSM - topics of interest Tony.P.Lee
2002-05-15  2:35 ` Alan Cox
2002-05-15  5:01   ` Pete Zaitcev
2002-05-15 11:29     ` Alan Cox
2002-05-15 14:27       ` Ashok Raj
2002-05-15 16:34     ` Stephen Hemminger
  -- strict thread matches above, loose matches on Subject: below --
2002-05-16 23:42 Woodruff, Robert J
2002-05-16 10:37 Eric Brunner-Williams in Portland Maine
2002-05-15 23:58 Woodruff, Robert J
2002-05-16  0:32 ` Alan Cox
2002-05-15 20:47 Tony.P.Lee
2002-05-15 21:36 ` Russell Leighton
2002-05-15 22:41   ` Lars Marowsky-Bree
2002-05-16  4:51   ` Ashok Raj
2002-05-15 20:37 Woodruff, Robert J
2002-05-15 20:55 ` Pete Zaitcev
2002-05-15 21:01 ` Alan Cox
2002-05-14 22:15 Woodruff, Robert J
2002-05-15  2:42 ` Timothy D. Witham
2002-05-14 20:19 Tony.P.Lee
2002-05-14 20:42 ` Pete Zaitcev
2002-05-14 23:25 ` Jakob Østergaard
2002-05-14 23:58 ` Alan Cox
2002-05-15 15:45   ` Timothy D. Witham
2002-05-13 16:25 Woodruff, Robert J
2002-05-13 16:36 ` Pete Zaitcev
2002-05-14 20:09   ` Timothy D. Witham
2002-05-14 20:23     ` Pete Zaitcev
2002-05-14 20:33       ` Timothy D. Witham
2002-05-14 21:01         ` Timothy D. Witham
2002-05-13 23:42 ` Alan Cox
2002-05-13 23:26   ` Lars Marowsky-Bree
2002-05-14  0:19     ` Alan Cox
2002-05-14  0:19   ` Joel Becker
2002-05-15  0:11 ` Eric Brunner-Williams in Portland Maine
