linux-kernel.vger.kernel.org archive mirror
* Re: Linux and system area networks
       [not found] <mailman.993492125.21454.linux-kernel2news@redhat.com>
@ 2001-06-25 22:30 ` Pete Zaitcev
  2001-06-25 22:55   ` Roland Dreier
  2001-06-26  0:08   ` Alan Cox
  0 siblings, 2 replies; 11+ messages in thread
From: Pete Zaitcev @ 2001-06-25 22:30 UTC (permalink / raw)
  To: roland, linux-kernel

> I'd like to find out if anyone has thought about how Linux will handle
> some of the new network technologies people are starting to push.
> Specifically I'm talking about "System Area Networks," that is, things
> like Infiniband, as well as TCP/IP offload.

Infiniband is doing relatively well, as much as anything can
with Intel at the helm (see "Itanic"). This has nothing to
do with TCP/IP offload, which is an extremely stupid idea.
The whole thing stems from a desire by vendors to sell
"smart" (== very expensive) NICs.

RDMA in Infiniband is, in my view, a little more than
traditional DMA in any other advanced server I/O bus.
Sun UPA is packetized, for instance.

> The rough idea is that WSD is a new user space library that looks at
> sockets calls and decides if they have to go through the usual kernel
> network stack, or if they can be handed off to a "SAN service
> provider" which bypasses the network stack and uses hardware reliable
> transport and possibly RDMA.

That can be done in Linux just as easily, using same DLLs
(they are called .so for "shared object"). If you look
at Ashok Raj's Infi presentation, you may discern "user-level
sockets", if you look hard enough. I invite you to try, if
errors of others did not teach you anything.

> This means that all applications that use Winsock benefit from the
> advanced network hardware.  Also, it means that Windows is much easier
> for hardware vendors to support than other OSes.  For example,
> Alacritech's TCP/IP offload NIC only works under Windows.  Microsoft
> is also including Infiniband support in Windows XP and Windows 2002.

IMHO, Alacritech is about to join scores and scores of vendors
who tried that before. Customers understand very soon that
a properly written host based stack works much better in
the face of a changing environment: Faster CPUs, new CPUs
(IA-64), new network protocols (ECN). Besides, it is easy
to "accelerate" a bad network stack, but try to outdo a
well done stack.

> So I guess my question is whether anyone has started thinking about
> the architectural changes needed to make System Area Networking and
> TCP/IP offload easier under Linux.

Pretty much zero-copy that DaveM and Co. do addresses this.

-- Pete


* Re: Linux and system area networks
  2001-06-25 22:30 ` Linux and system area networks Pete Zaitcev
@ 2001-06-25 22:55   ` Roland Dreier
  2001-06-26  0:14     ` Alan Cox
  2001-06-26  0:08   ` Alan Cox
  1 sibling, 1 reply; 11+ messages in thread
From: Roland Dreier @ 2001-06-25 22:55 UTC (permalink / raw)
  To: linux-kernel; +Cc: Pete Zaitcev

>>>>> "Pete" == Pete Zaitcev <zaitcev@redhat.com> writes:

    Roland> The rough idea is that WSD is a new user space library
    Roland> that looks at sockets calls and decides if they have to go
    Roland> through the usual kernel network stack, or if they can be
    Roland> handed off to a "SAN service provider" which bypasses the
    Roland> network stack and uses hardware reliable transport and
    Roland> possibly RDMA.

    Pete> That can be done in Linux just as easily, using same DLLs
    Pete> (they are called .so for "shared object"). If you look at
    Pete> Ashok Raj's Infi presentation, you may discern "user-level
    Pete> sockets", if you look hard enough. I invite you to try, if
    Pete> errors of others did not teach you anything.

I think you misunderstood the point.  Microsoft is providing this WSD
DLL as a standard part of W2K now.  This means that hardware vendors
just have to write a SAN service provider, and all Winsock-using
applications benefit transparently.  No matter how good your TCP/IP
implementation is, you still lose (especially in latency) compared to
using reliable hardware transport.  Oracle-with-VI and DAFS-vs-NFS
benchmarks show this quite clearly.

Linux has nothing to compare to Winsock Direct.  I agree, one could
put an equivalent in glibc, or one could take advantage of Linux's
relatively low system call latency and put something in the kernel.
The unfortunate consequence of this is that SAN (system area network)
hardware vendors are not going to support Linux very well.

BTW, do you have a pointer to Ashok Raj's presentation?

    Roland> This means that all applications that use Winsock benefit
    Roland> from the advanced network hardware.  Also, it means that
    Roland> Windows is much easier for hardware vendors to support
    Roland> than other OSes.  For example, Alacritech's TCP/IP offload
    Roland> NIC only works under Windows.  Microsoft is also including
    Roland> Infiniband support in Windows XP and Windows 2002.

    Pete> IMHO, Alacritech is about to join scores and scores of
    Pete> vendors who tried that before. Customers understand very
    Pete> soon that a properly written host based stack works much
    Pete> better in the face of a changing environment: Faster CPUs,
    Pete> new CPUs (IA-64), new network protocols (ECN). Besides, it
    Pete> is easy to "accelerate" a bad network stack, but try to
    Pete> outdo a well done stack.

OK, how about an Infiniband network with a TCP/IP gateway at the edge?
Have we thought about how Linux servers should use the gateway to talk
to internet hosts?  Surely there's no point in running TCP/IP inside
the Infiniband network, so there needs to be some concept of "socket
over Infiniband."

Thanks,
  Roland


* Re: Linux and system area networks
  2001-06-25 22:30 ` Linux and system area networks Pete Zaitcev
  2001-06-25 22:55   ` Roland Dreier
@ 2001-06-26  0:08   ` Alan Cox
  1 sibling, 0 replies; 11+ messages in thread
From: Alan Cox @ 2001-06-26  0:08 UTC (permalink / raw)
  To: Pete Zaitcev; +Cc: roland, linux-kernel

> a properly written host based stack works much better in
> the face of a changing environment: Faster CPUs, new CPUs
> (IA-64), new network protocols (ECN). Besides, it is easy
> to "accelerate" a bad network stack, but try to outdo a
> well done stack.

Putting the stack partly in user space can sometimes be a benefit. Linux 8086
does this to cut down kernel size, for example ;)


* Re: Linux and system area networks
  2001-06-25 22:55   ` Roland Dreier
@ 2001-06-26  0:14     ` Alan Cox
  0 siblings, 0 replies; 11+ messages in thread
From: Alan Cox @ 2001-06-26  0:14 UTC (permalink / raw)
  To: Roland Dreier; +Cc: linux-kernel, Pete Zaitcev

> OK, how about an Infiniband network with a TCP/IP gateway at the edge?
> Have we thought about how Linux servers should use the gateway to talk
> to internet hosts?  Surely there's no point in running TCP/IP inside
> the Infiniband network, so there needs to be some concept of "socket
> over Infiniband."

So write the library; it shouldn't need the kernel involved, and you can
take over AF_INET socket syscalls with an LD_PRELOAD so it can be transparent.
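
A minimal sketch of the LD_PRELOAD approach suggested above, assuming a
Linux/glibc system. The names san_wants_socket() and san_socket() are
hypothetical placeholders, not part of any real library; the placeholder
provider always reports failure, so every call falls back to the normal
kernel stack.

```c
/* Assumed build: gcc -shared -fPIC -o libsanshim.so sanshim.c -ldl
 * Assumed use:   LD_PRELOAD=./libsanshim.so some_sockets_app
 */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <errno.h>
#include <sys/socket.h>

/* Hypothetical policy: only plain AF_INET stream sockets are candidates. */
int san_wants_socket(int domain, int type)
{
    int base = type & ~(SOCK_NONBLOCK | SOCK_CLOEXEC);
    return domain == AF_INET && base == SOCK_STREAM;
}

/* Hypothetical SAN provider entry point. A real provider would set up a
 * hardware connection; this placeholder always reports "unavailable". */
int san_socket(int domain, int type, int protocol)
{
    (void)domain; (void)type; (void)protocol;
    return -1;
}

/* Same name and signature as libc's socket(). When the library is
 * preloaded, the dynamic linker resolves the application's calls here. */
int socket(int domain, int type, int protocol)
{
    static int (*real_socket)(int, int, int);

    if (!real_socket)
        real_socket = (int (*)(int, int, int))dlsym(RTLD_NEXT, "socket");
    if (!real_socket) {
        errno = ENOSYS;
        return -1;
    }

    if (san_wants_socket(domain, type)) {
        int fd = san_socket(domain, type, protocol);
        if (fd >= 0)
            return fd;
    }
    /* Fall back to the regular kernel network stack. */
    return real_socket(domain, type, protocol);
}
```

A complete library would have to wrap connect(), send(), recv(), close()
and the rest in the same way; only socket() is shown here.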


* Re: Linux and system area networks
  2001-06-28 21:46       ` Roland Dreier
@ 2001-06-29  2:33         ` Bernd Eckenfels
  0 siblings, 0 replies; 11+ messages in thread
From: Bernd Eckenfels @ 2001-06-29  2:33 UTC (permalink / raw)
  To: linux-kernel

In article <52d77o46ra.fsf@love-boat.topspincom.com> you wrote:
> We seem to have come full circle.  My original question was about
> providing a better way for sockets applications to take advantage of
> SAN hardware.  W2K Datacenter introduces "Winsock Direct," which will
> bypass the protocol stack when appropriate.  The Infiniband people are
> working on a "Sockets Direct" standard, which is a similar idea.  No
> one seems to care about this for Linux.

Well, there is some work done by the zero-copy folks and the sendfile()
function. Really, not much more than an mmapped network socket is needed.

Besides, it looks like SAN will go all the way in the IP direction sooner or
later anyway :)

There are some interesting features, like accessing MS SQL 7.0 Server via VIA
architecture interfaces over SAN; I am not sure how open VIA is.

Greetings
Bernd


* Re: Linux and system area networks
  2001-06-28 19:12     ` Pekka Pietikainen
@ 2001-06-28 21:46       ` Roland Dreier
  2001-06-29  2:33         ` Bernd Eckenfels
  0 siblings, 1 reply; 11+ messages in thread
From: Roland Dreier @ 2001-06-28 21:46 UTC (permalink / raw)
  To: Pekka Pietikainen; +Cc: Bogdan Costescu, linux-kernel

    Pekka> If you used sockets, I believe the normal way to use SAN
    Pekka> boards is to just make them look like network cards with a
    Pekka> large MTU. Sure it works, but it's not very efficient :) (I
    Pekka> have to admit I've not played with that kind of toys at
    Pekka> all, though)

We seem to have come full circle.  My original question was about
providing a better way for sockets applications to take advantage of
SAN hardware.  W2K Datacenter introduces "Winsock Direct," which will
bypass the protocol stack when appropriate.  The Infiniband people are
working on a "Sockets Direct" standard, which is a similar idea.  No
one seems to care about this for Linux.

Roland


* Re: Linux and system area networks
  2001-06-28 17:28   ` Bogdan Costescu
@ 2001-06-28 19:12     ` Pekka Pietikainen
  2001-06-28 21:46       ` Roland Dreier
  0 siblings, 1 reply; 11+ messages in thread
From: Pekka Pietikainen @ 2001-06-28 19:12 UTC (permalink / raw)
  To: Bogdan Costescu; +Cc: linux-kernel

On Thu, Jun 28, 2001 at 07:28:20PM +0200, Bogdan Costescu wrote:
> On Wed, 27 Jun 2001, Pekka Pietikainen wrote:
> 
> I'm sorry, but I don't understand your reference to MPI here. MPI is a
> high-level API; MPI can run on top of whatever communication features
> exist: TCP/IP, shared memory, VI, etc.

Well, the way I understood the discussion was about how you can
utilize your new $$$ SAN boards well with your existing applications.
If you used something like MPI you just switch to a new implementation
optimized for your network (and hope the new one is compatible
with your code ;) )

Of course you can use some lower-level API and get better 
performance, but your programs will undoubtedly be more complicated
and probably need to be rewritten for new APIs every now and then.

If you used sockets, I believe the normal way to use SAN boards
is to just make them look like network cards with a large MTU.
Sure it works, but it's not very efficient :) (I have to admit 
I've not played with that kind of toys at all, though)

-- 
Pekka Pietikainen





* Re: Linux and system area networks
  2001-06-27 12:41 ` Pekka Pietikainen
@ 2001-06-28 17:28   ` Bogdan Costescu
  2001-06-28 19:12     ` Pekka Pietikainen
  0 siblings, 1 reply; 11+ messages in thread
From: Bogdan Costescu @ 2001-06-28 17:28 UTC (permalink / raw)
  To: Pekka Pietikainen; +Cc: linux-kernel

On Wed, 27 Jun 2001, Pekka Pietikainen wrote:

> Providing a wrapper library for use with Infiniband and the current
> SAN boards like WSD would probably be a useful exercise, but to really get
> good performance (especially latency-wise) you probably want to use
> something like MPI. For many applications a wrapper will be enough, though.

I'm sorry, but I don't understand your reference to MPI here. MPI is a
high-level API; MPI can run on top of whatever communication features
exist: TCP/IP, shared memory, VI, etc.
MPI (as well as other "standards" for parallel programming - PVM, OpenMP)
came from the need to have a common interface, not to have all parallel
programs include specific code to deal with TCP/IP, shared memory, VI,
etc. whenever they were available. Instead, MPI serves as a middle-man
between them and the parallel programs. So, MPI cannot be faster than the
underlying communication features.
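
The layering described above can be sketched roughly as follows. This is
not the MPI API; struct transport, shm_send() and msg_send() are
hypothetical names used only to show an application-facing call sitting
on top of an interchangeable transport.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical transport interface: one send routine per mechanism
 * (TCP/IP, shared memory, VI, ...). */
struct transport {
    const char *name;
    long (*send)(void *ctx, const void *buf, size_t len); /* bytes sent or -1 */
    void *ctx;
};

/* Stand-in for a shared-memory transport: copies into a fixed buffer. */
struct shm_ctx {
    char buf[64];
    size_t used;
};

long shm_send(void *ctx, const void *buf, size_t len)
{
    struct shm_ctx *c = ctx;

    if (len > sizeof c->buf - c->used)
        return -1;                  /* no room left */
    memcpy(c->buf + c->used, buf, len);
    c->used += len;
    return (long)len;
}

/* What the parallel program calls. It only forwards to the transport,
 * so it can be no faster than t->send itself. */
long msg_send(const struct transport *t, const void *buf, size_t len)
{
    return t->send(t->ctx, buf, len);
}
```

Switching to different hardware then means supplying another struct
transport, while code that calls msg_send() stays unchanged.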

Sincerely,

Bogdan Costescu

IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: Bogdan.Costescu@IWR.Uni-Heidelberg.De




* Re: Linux and system area networks
  2001-06-26 12:36 Jesse Pollard
@ 2001-06-27 12:41 ` Pekka Pietikainen
  2001-06-28 17:28   ` Bogdan Costescu
  0 siblings, 1 reply; 11+ messages in thread
From: Pekka Pietikainen @ 2001-06-27 12:41 UTC (permalink / raw)
  To: linux-kernel

On Tue, Jun 26, 2001 at 07:36:30AM -0500, Jesse Pollard wrote:
> > I think you misunderstood the point.  Microsoft is providing this WSD
> > DLL as a standard part of W2K now.  This means that hardware vendors
> > just have to write a SAN service provider, and all Winsock-using
> > applications benefit transparently.  No matter how good your TCP/IP
> > implementation is, you still lose (especially in latency) compared to
> > using reliable hardware transport.  Oracle-with-VI and DAFS-vs-NFS
> > benchmarks show this quite clearly.
> 
> You do lose in security. You can't use IPSec over such a device without
> some drastic overhaul.
And the performance gains are not as obvious as one would hope, as
there is some overhead caused by the WSD switch software
that transparently maps connections onto standard IP networks and SAN
boards depending on who you are talking to.
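
As a rough illustration of choosing a path "depending on who you are
talking to", here is a sketch that checks whether a peer address falls
inside a subnet reachable over the SAN. How the WSD switch really makes
this decision is not described in this thread; the subnet test, struct
san_route and peer_on_san() are assumptions made for the example.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Hypothetical description of a subnet reachable through the SAN board,
 * in host byte order. */
struct san_route {
    uint32_t net;
    uint32_t mask;
};

/* Returns 1 if the dotted-quad peer address lies inside the SAN subnet,
 * 0 otherwise (including when the address does not parse). */
int peer_on_san(const struct san_route *r, const char *dotted_quad)
{
    struct in_addr a;

    if (inet_pton(AF_INET, dotted_quad, &a) != 1)
        return 0;
    return (ntohl(a.s_addr) & r->mask) == r->net;
}
```

With a route of 10.0.0.0/8 (net 0x0A000000, mask 0xFF000000), a peer at
10.1.2.3 would take the SAN path and one at 192.168.0.1 would go through
the ordinary IP stack.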

For some performance comparisons of WSD, native VI, and TCP, there's
a paper called "WSDLite: a Lightweight Alternative to Windows Sockets Direct
Path"; there's a link to the paper at http://citeseer.nj.nec.com/388853.html
(it seems you have to use the Cached: links).

Providing a wrapper library for use with Infiniband and the current
SAN boards like WSD would probably be a useful exercise, but to really get
good performance (especially latency-wise) you probably want to use
something like MPI. For many applications a wrapper will be enough, though.
-- 
Pekka Pietikainen


* Re: Linux and system area networks
@ 2001-06-26 12:36 Jesse Pollard
  2001-06-27 12:41 ` Pekka Pietikainen
  0 siblings, 1 reply; 11+ messages in thread
From: Jesse Pollard @ 2001-06-26 12:36 UTC (permalink / raw)
  To: roland, linux-kernel; +Cc: Pete Zaitcev

---------  Received message begins Here  ---------

> 
> >>>>> "Pete" == Pete Zaitcev <zaitcev@redhat.com> writes:
> 
>     Roland> The rough idea is that WSD is a new user space library
>     Roland> that looks at sockets calls and decides if they have to go
>     Roland> through the usual kernel network stack, or if they can be
>     Roland> handed off to a "SAN service provider" which bypasses the
>     Roland> network stack and uses hardware reliable transport and
>     Roland> possibly RDMA.
> 
>     Pete> That can be done in Linux just as easily, using same DLLs
>     Pete> (they are called .so for "shared object"). If you look at
>     Pete> Ashok Raj's Infi presentation, you may discern "user-level
>     Pete> sockets", if you look hard enough. I invite you to try, if
>     Pete> errors of others did not teach you anything.
> 
> I think you misunderstood the point.  Microsoft is providing this WSD
> DLL as a standard part of W2K now.  This means that hardware vendors
> just have to write a SAN service provider, and all Winsock-using
> applications benefit transparently.  No matter how good your TCP/IP
> implementation is, you still lose (especially in latency) compared to
> using reliable hardware transport.  Oracle-with-VI and DAFS-vs-NFS
> benchmarks show this quite clearly.

You do lose in security. You can't use IPSec over such a device without
some drastic overhaul.

> Linux has nothing to compare to Winsock Direct.  I agree, one could
> put an equivalent in glibc, or one could take advantage of Linux's
> relatively low system call latency and put something in the kernel.
> The unfortunate consequence of this is that SAN (system area network)
> hardware vendors are not going to support Linux very well.
> 
> BTW, do you have a pointer to Ashok Raj's presentation?

That would be useful. We had a presentation here, but it did not
show any great detail (mostly marketing drivel "it will be faster/more
efficient/less overhead.." but nothing about security).
 
>     Roland> This means that all applications that use Winsock benefit
>     Roland> from the advanced network hardware.  Also, it means that
>     Roland> Windows is much easier for hardware vendors to support
>     Roland> than other OSes.  For example, Alacritech's TCP/IP offload
>     Roland> NIC only works under Windows.  Microsoft is also including
>     Roland> Infiniband support in Windows XP and Windows 2002.
> 
>     Pete> IMHO, Alacritech is about to join scores and scores of
>     Pete> vendors who tried that before. Customers understand very
>     Pete> soon that a properly written host based stack works much
>     Pete> better in the face of a changing environment: Faster CPUs,
>     Pete> new CPUs (IA-64), new network protocols (ECN). Besides, it
>     Pete> is easy to "accelerate" a bad network stack, but try to
>     Pete> outdo a well done stack.
> 
> OK, how about an Infiniband network with a TCP/IP gateway at the edge?
> Have we thought about how Linux servers should use the gateway to talk
> to internet hosts?  Surely there's no point in running TCP/IP inside
> the Infiniband network, so there needs to be some concept of "socket
> over Infiniband."

One of the problems I haven't seen explained is how address translation
between TCP/IP and any SAN would work, much less how security is going to
be controlled. Personally, I think it will end up equivalent to TCP/IP over
fibre channel...

-------------------------------------------------------------------------
Jesse I Pollard, II
Email: pollard@navo.hpc.mil

Any opinions expressed are solely my own.


* Linux and system area networks
@ 2001-06-25 17:59 Roland Dreier
  0 siblings, 0 replies; 11+ messages in thread
From: Roland Dreier @ 2001-06-25 17:59 UTC (permalink / raw)
  To: linux-kernel

I'd like to find out if anyone has thought about how Linux will handle
some of the new network technologies people are starting to push.
Specifically I'm talking about "System Area Networks," that is, things
like Infiniband, as well as TCP/IP offload.

In the past people have advocated VIA as a way to use network hardware
that provides reliability and remote DMA (RDMA).  However, VI never
really caught on because it requires applications to be completely
rewritten.  In addition, the corporate backers of VI seem to have
mostly given up on it.

Late last year, Network Appliance proposed something they called
"DASockets," which would mostly preserve socket semantics.  However
that seems to have been put on hold.

Microsoft recently introduced something called "Winsock Direct" in W2K
Datacenter.  For more info you can look at:

http://www.microsoft.com/windows2000/en/datacenter/help/default.asp?url=/WINDOWS2000/en/datacenter/help/WSD_and_SAN.htm

The rough idea is that WSD is a new user space library that looks at
sockets calls and decides if they have to go through the usual kernel
network stack, or if they can be handed off to a "SAN service
provider" which bypasses the network stack and uses hardware reliable
transport and possibly RDMA.

This means that all applications that use Winsock benefit from the
advanced network hardware.  Also, it means that Windows is much easier
for hardware vendors to support than other OSes.  For example,
Alacritech's TCP/IP offload NIC only works under Windows.  Microsoft
is also including Infiniband support in Windows XP and Windows 2002.
(Intel will be pushing Infiniband onto motherboards pretty soon, which
will bring reliable transport, RDMA network hardware into the
mainstream)

So I guess my question is whether anyone has started thinking about
the architectural changes needed to make System Area Networking and
TCP/IP offload easier under Linux.

Thanks,
  Roland
-- 
Roland Dreier                                <roland@digitalvampire.org>
GPG Key fingerprint = A89F B5E9 C185 F34D BD50  4009 37E2 25CC E0EE FAC0

