From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S270767AbTHFR4B (ORCPT ); Wed, 6 Aug 2003 13:56:01 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S270772AbTHFR4B (ORCPT ); Wed, 6 Aug 2003 13:56:01 -0400 Received: from mail.zmailer.org ([62.240.94.4]:3467 "EHLO mail.zmailer.org") by vger.kernel.org with ESMTP id S270767AbTHFRz7 (ORCPT ); Wed, 6 Aug 2003 13:55:59 -0400 Date: Wed, 6 Aug 2003 20:55:57 +0300 From: Matti Aarnio To: Andy Isaacson Cc: linux-kernel@vger.kernel.org Subject: Re: TOE brain dump Message-ID: <20030806175557.GB6898@mea-ext.zmailer.org> References: <20030802184901.G5798@almesberger.net> <20030804162433.L5798@almesberger.net> <20030806021304.E5798@almesberger.net> <20030806103758.H5798@almesberger.net> <20030806105830.B26920@hexapodia.org> <3F312C65.9000201@nortelnetworks.com> <20030806120145.A15543@hexapodia.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030806120145.A15543@hexapodia.org> Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Aug 06, 2003 at 12:01:45PM -0500, Andy Isaacson wrote: > On Wed, Aug 06, 2003 at 12:27:17PM -0400, Chris Friesen wrote: > > Andy Isaacson wrote: > > > On Wed, Aug 06, 2003 at 10:37:58AM -0300, Werner Almesberger wrote: > > >>Eric W. Biederman wrote: > > >>>to keep your latency down. Do any ethernet switches do cut-through? > > >>According to Google, many at least claim to do this. Quite a while back (several years) several "cut-through" routing things were introduced, primarily over ATMish core networks. The idea ran essentially as: "if you can't find header address lookup from cache, run routing and form a VC to carry rest of the flow, if you can find a VC from cache, send the packet there" (what the "VC" is in the end is not that important.) NOTHING in those implementations was (as I recall) specifying about treatment of the packet before it was fully collected into router local buffer memory. In very high speed local networks (like Cray T3 series switch fabric with _routable_ packets) one can implement protocols, which carry destination node address selector bits in header, and if the fabric is e.g. congestion free one, there is guaranteed success at delivering the bits to desired destination. To make UDPish communication a bit simpler, relevant hardware got signal back about "sent ok thru / collision", so the sender hardware could automagically retry the xmit. To certain extent one could handle e.g. ethernet in similar style by fast-switching packets by cached destination MAC addresses. When destination MAC lookup points to some destination port in local hardware, internal VC is formed (reserved in output end, presuming sufficient core bandwidth to handle everything), and incoming enet frame is sent piece by piece thru the internal switch to the output port. If the output port can not be contacted immediately, full frame (possibly two or three) need to be buffered at the receiver. That way switch internal buffering delay would be -- lets see: - preamble 7 bytes - SFD 1 byte - dest mac 6 bytes plus processing delay, but that is absolute minimum for 100BASE-T Cheap cluster super-computer makers are using ethernets, and other "off the shelf" stuff, but I don't see why semi-proprietary high performance "LANs" could not emerge for this market. E.g. I would love to have cheapish (mere 5 times price of Cu-GE card) "LAN" cards for cluster binding, especially if I get direct memory access to other machine's memory. A whole bundle of various cluster interconnects are mentioned at this white-paper from 2001: http://www.dell.com/us/en/slg/topics/power_ps4q01-ctcinter.htm VIA, VI-IP, SCI, FE, infiniband, etc... /Matti Aarnio