linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/6] myri10ge - Myri-10G Ethernet driver
@ 2006-05-10 21:22 Brice Goglin
  2006-05-10 21:34 ` [PATCH 1/6] myri10ge - Revive pci_find_ext_capability Brice Goglin
                   ` (5 more replies)
  0 siblings, 6 replies; 28+ messages in thread
From: Brice Goglin @ 2006-05-10 21:22 UTC (permalink / raw)
  To: netdev, Andrew Morton; +Cc: LKML, Andrew J. Gallatin

[PATCH 0/6] myri10ge - Myri-10G Ethernet driver

The following 6 patches introduce myri10ge, the driver for Myricom Myri-10G
boards running in Ethernet mode. The patches are against 2.6.17-rc3-mm1.

[1/6]	Restore pci_find_ext_capability.
[2/6]	Add nVidia nForce CK804 PCI-E and ServerWorks HT2000 PCI-E IDs.
[3/6]	myri10ge driver header files.
[4-5/6]	Two halves of myri10ge driver core.
[6/6]	Add Kconfig and Makefile support for the myri10ge driver.

The driver also depends on the following patches, which were sent on May 2
(http://lkml.org/lkml/2006/5/2/286 and 288) and have been merged into -mm:
add-__iowrite64_copy.patch
	Introduce __iowrite64_copy.
add-pci_cap_id_vndr.patch
	Add the vendor specific extended capability PCI_CAP_ID_VNDR.


The Myri-10G board operates as a regular PCI-Express Ethernet NIC.
If a firmware image is available through hotplug and its version matches
the driver's requirements, the driver loads it. Otherwise, the driver adopts
the firmware already running from the board's eeprom, provided it is recent
enough.
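
The firmware policy described above can be sketched as a small decision
function. This is only a user-space illustration, not the driver code: the
names fw_version_ok, choose_firmware and enum fw_choice are hypothetical, and
the sketch assumes that both the hotplug image and the running firmware are
checked with the same "major.minor must equal 1.4" rule that
myri10ge_validate_firmware applies in patch 4/6.

```c
#include <stdio.h>
#include <stddef.h>

/* Version the driver requires (MYRI10GE_MCP_MAJOR/MINOR in patch 3/6). */
#define REQUIRED_MAJOR 1
#define REQUIRED_MINOR 4

/* Hypothetical outcome codes, introduced for this sketch only. */
enum fw_choice { FW_LOAD_HOTPLUG, FW_ADOPT_RUNNING, FW_NONE };

/* Return 1 if the version string starts with the required "major.minor". */
static int fw_version_ok(const char *version)
{
	int major, minor;

	if (version == NULL || sscanf(version, "%d.%d", &major, &minor) != 2)
		return 0;
	return major == REQUIRED_MAJOR && minor == REQUIRED_MINOR;
}

/* hotplug_version is NULL when no image was found through hotplug. */
static enum fw_choice choose_firmware(const char *hotplug_version,
				      const char *running_version)
{
	if (fw_version_ok(hotplug_version))
		return FW_LOAD_HOTPLUG;
	if (fw_version_ok(running_version))
		return FW_ADOPT_RUNNING;
	return FW_NONE;
}
```

With these assumptions, a matching hotplug image always wins, and the eeprom
firmware is only a fallback.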

In particular, this driver supports NAPI, power management, IPv4 and IPv6
checksum offload, 802.1q VLAN, and TCP Segmentation Offload.

Regards,
Brice Goglin




* [PATCH 1/6] myri10ge - Revive pci_find_ext_capability
  2006-05-10 21:22 [PATCH 0/6] myri10ge - Myri-10G Ethernet driver Brice Goglin
@ 2006-05-10 21:34 ` Brice Goglin
  2006-05-10 21:35 ` [PATCH 2/6] myri10ge - Add missing PCI IDs Brice Goglin
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 28+ messages in thread
From: Brice Goglin @ 2006-05-10 21:34 UTC (permalink / raw)
  To: netdev, Andrew Morton; +Cc: LKML, Andrew J. Gallatin

[PATCH 1/6] myri10ge - Revive pci_find_ext_capability

This patch revives pci_find_ext_capability, which was disabled a couple of
months ago because nothing used it (see http://lkml.org/lkml/2006/1/20/247).
The myri10ge driver now uses it.
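
For readers unfamiliar with what such a lookup does, here is a minimal
user-space sketch of walking the PCI Express extended capability list. It is
not the kernel implementation: it works on an in-memory snapshot of the 4096
byte configuration space instead of calling pci_read_config_dword, and the
names cfg_read32 and find_ext_capability are hypothetical. It relies on the
PCI Express layout in which extended capabilities start at offset 0x100 and
each 32-bit header carries the capability ID in bits 15:0 and the next offset
in bits 31:20.

```c
#include <stdint.h>
#include <stddef.h>

#define CFG_SPACE_SIZE 4096	/* PCIe extended configuration space */
#define EXT_CAP_START  0x100	/* first extended capability header */

/* Read a little-endian 32-bit word from the config-space snapshot. */
static uint32_t cfg_read32(const uint8_t *cfg, int pos)
{
	return (uint32_t)cfg[pos] | ((uint32_t)cfg[pos + 1] << 8) |
	       ((uint32_t)cfg[pos + 2] << 16) | ((uint32_t)cfg[pos + 3] << 24);
}

/* Return the offset of extended capability `cap`, or 0 if it is absent. */
static int find_ext_capability(const uint8_t *cfg, int cap)
{
	int pos = EXT_CAP_START;
	/* headers are at least 8 bytes apart, which bounds the walk */
	int ttl = (CFG_SPACE_SIZE - EXT_CAP_START) / 8;

	while (ttl-- > 0) {
		uint32_t header = cfg_read32(cfg, pos);

		if (header == 0)		/* no (more) capabilities */
			break;
		if ((int)(header & 0xffff) == cap)	/* bits 15:0: ID */
			return pos;
		pos = (header >> 20) & 0xffc;	/* bits 31:20: next offset */
		if (pos < EXT_CAP_START)	/* 0 terminates the list */
			break;
	}
	return 0;
}
```

Like the kernel helper, the sketch returns 0 when the capability is not
found, so callers can test the result directly.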

Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Andrew J. Gallatin <gallatin@myri.com>

 drivers/pci/pci.c   |    3 +--
 include/linux/pci.h |    2 ++
 2 files changed, 3 insertions(+), 2 deletions(-)

--- linux-mm/drivers/pci/pci.c.old
+++ linux-mm/drivers/pci/pci.c
@@ -164,7 +164,6 @@ int pci_bus_find_capability(struct pci_b
 	return __pci_bus_find_cap(bus, devfn, hdr_type & 0x7f, cap);
 }
 
-#if 0
 /**
  * pci_find_ext_capability - Find an extended capability
  * @dev: PCI device to query
@@ -212,7 +211,7 @@ int pci_find_ext_capability(struct pci_d
 
 	return 0;
 }
-#endif  /*  0  */
+EXPORT_SYMBOL_GPL(pci_find_ext_capability);
 
 /**
  * pci_find_parent_resource - return resource region of parent bus of given region
--- linux-mm/include/linux/pci.h.old
+++ linux-mm/include/linux/pci.h
@@ -443,6 +443,7 @@ struct pci_dev *pci_find_device_reverse 
 struct pci_dev *pci_find_slot (unsigned int bus, unsigned int devfn);
 int pci_find_capability (struct pci_dev *dev, int cap);
 int pci_find_next_capability (struct pci_dev *dev, u8 pos, int cap);
+int pci_find_ext_capability (struct pci_dev *dev, int cap);
 struct pci_bus * pci_find_next_bus(const struct pci_bus *from);
 
 struct pci_dev *pci_get_device (unsigned int vendor, unsigned int device, struct pci_dev *from);
@@ -665,6 +666,7 @@ static inline int pci_register_driver(st
 static inline void pci_unregister_driver(struct pci_driver *drv) { }
 static inline int pci_find_capability (struct pci_dev *dev, int cap) {return 0; }
 static inline int pci_find_next_capability (struct pci_dev *dev, u8 post, int cap) { return 0; }
+static inline int pci_find_ext_capability (struct pci_dev *dev, int cap) {return 0; }
 static inline const struct pci_device_id *pci_match_device(const struct pci_device_id *ids, const struct pci_dev *dev) { return NULL; }
 
 /* Power management related routines */




* [PATCH 2/6] myri10ge - Add missing PCI IDs
  2006-05-10 21:22 [PATCH 0/6] myri10ge - Myri-10G Ethernet driver Brice Goglin
  2006-05-10 21:34 ` [PATCH 1/6] myri10ge - Revive pci_find_ext_capability Brice Goglin
@ 2006-05-10 21:35 ` Brice Goglin
  2006-05-10 21:52   ` Andi Kleen
  2006-05-10 21:36 ` [PATCH 3/6] myri10ge - Driver header files Brice Goglin
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 28+ messages in thread
From: Brice Goglin @ 2006-05-10 21:35 UTC (permalink / raw)
  To: netdev, Andrew Morton; +Cc: LKML, Andrew J. Gallatin

[PATCH 2/6] myri10ge - Add missing PCI IDs

Add nVidia nForce CK804 PCI-E bridge and 
ServerWorks HT2000 PCI-E bridge IDs.
They will be used by the myri10ge driver.

Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Andrew J. Gallatin <gallatin@myri.com>

 pci_ids.h |    2 ++
 1 file changed, 2 insertions(+)

--- linux-mm/include/linux/pci_ids.h.old	2006-05-09 03:03:23.000000000 +0200
+++ linux-mm/include/linux/pci_ids.h	2006-05-09 03:05:19.000000000 +0200
@@ -1021,6 +1021,7 @@
 #define PCI_DEVICE_ID_NVIDIA_NVENET_8		0x0056
 #define PCI_DEVICE_ID_NVIDIA_NVENET_9		0x0057
 #define PCI_DEVICE_ID_NVIDIA_CK804_AUDIO	0x0059
+#define PCI_DEVICE_ID_NVIDIA_NFORCE_CK804_PCIE	0x005d
 #define PCI_DEVICE_ID_NVIDIA_NFORCE2_SMBUS	0x0064
 #define PCI_DEVICE_ID_NVIDIA_NFORCE2_IDE	0x0065
 #define PCI_DEVICE_ID_NVIDIA_NVENET_2		0x0066
@@ -1383,6 +1384,7 @@
 #define PCI_DEVICE_ID_SERVERWORKS_LE	  0x0009
 #define PCI_DEVICE_ID_SERVERWORKS_GCNB_LE 0x0017
 #define PCI_DEVICE_ID_SERVERWORKS_EPB	  0x0103
+#define PCI_DEVICE_ID_SERVERWORKS_HT2000_PCIE	0x0132
 #define PCI_DEVICE_ID_SERVERWORKS_OSB4	  0x0200
 #define PCI_DEVICE_ID_SERVERWORKS_CSB5	  0x0201
 #define PCI_DEVICE_ID_SERVERWORKS_CSB6    0x0203




* [PATCH 3/6] myri10ge - Driver header files
  2006-05-10 21:22 [PATCH 0/6] myri10ge - Myri-10G Ethernet driver Brice Goglin
  2006-05-10 21:34 ` [PATCH 1/6] myri10ge - Revive pci_find_ext_capability Brice Goglin
  2006-05-10 21:35 ` [PATCH 2/6] myri10ge - Add missing PCI IDs Brice Goglin
@ 2006-05-10 21:36 ` Brice Goglin
  2006-05-10 21:57   ` Roland Dreier
                     ` (2 more replies)
  2006-05-10 21:40 ` [PATCH 4/6] myri10ge - First half of the driver Brice Goglin
                   ` (2 subsequent siblings)
  5 siblings, 3 replies; 28+ messages in thread
From: Brice Goglin @ 2006-05-10 21:36 UTC (permalink / raw)
  To: netdev, Andrew Morton; +Cc: LKML, Andrew J. Gallatin

[PATCH 3/6] myri10ge - Driver header files

myri10ge driver header files.
myri10ge_mcp.h is the generic header, while myri10ge_mcp_gen_header.h
is automatically generated from our firmware image.
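
The generated header documents how a driver finds the generic firmware
header: the 32-bit word at offset 0x3c (MCP_HEADER_PTR_OFFSET) of the image
holds the header location. The following user-space sketch shows that lookup
with the same sanity checks the driver applies to a hotplug image. The names
read_be32 and find_gen_header are hypothetical, and the sketch assumes the
pointer is stored big-endian (the driver reads it with ntohl) and that
sizeof(mcp_gen_header_t) is 152 bytes, that is six 4-byte fields plus the
128-byte version string.

```c
#include <stdint.h>
#include <stddef.h>

#define HEADER_PTR_OFFSET 0x3c	/* MCP_HEADER_PTR_OFFSET in the patch */
#define GEN_HEADER_SIZE   152	/* assumed sizeof(mcp_gen_header_t) */

/* Read a big-endian 32-bit word, the byte order the firmware uses. */
static uint32_t read_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Return the header offset inside the image, or -1 if the image is too
 * short, the offset is misaligned, or the header would overrun the image.
 */
static long find_gen_header(const uint8_t *image, size_t size)
{
	uint32_t off;

	if (size < HEADER_PTR_OFFSET + 4)
		return -1;
	off = read_be32(image + HEADER_PTR_OFFSET);
	if ((off & 3) || off > size || size - off < GEN_HEADER_SIZE)
		return -1;
	return (long)off;
}
```

Once the offset is known, the caller can check mcp_type and the version
string before trusting anything else in the image.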

Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Andrew J. Gallatin <gallatin@myri.com>

 myri10ge_mcp.h            |  233 ++++++++++++++++++++++++++++++++++++++++++++++
 myri10ge_mcp_gen_header.h |   73 ++++++++++++++
 2 files changed, 306 insertions(+)

--- /dev/null	2006-04-21 00:45:09.064430000 -0700
+++ linux-mm/drivers/net/myri10ge/myri10ge_mcp.h	2006-04-21 08:20:59.000000000 -0700
@@ -0,0 +1,233 @@
+#ifndef _myri10ge_mcp_h
+#define _myri10ge_mcp_h
+
+#define MYRI10GE_MCP_MAJOR	1
+#define MYRI10GE_MCP_MINOR	4
+
+#ifdef MYRI10GE_MCP
+typedef signed char          int8_t;
+typedef signed short        int16_t;
+typedef signed int          int32_t;
+typedef signed long long    int64_t;
+typedef unsigned char       uint8_t;
+typedef unsigned short     uint16_t;
+typedef unsigned int       uint32_t;
+typedef unsigned long long uint64_t;
+#endif
+
+/* 8 Bytes */
+typedef struct
+{
+  uint32_t high;
+  uint32_t low;
+} mcp_dma_addr_t;
+
+/* 4 Bytes */
+typedef struct
+{
+  uint16_t checksum;
+  uint16_t length;
+} mcp_slot_t;
+
+/* 64 Bytes */
+typedef struct
+{
+  uint32_t cmd;
+  uint32_t data0;	/* will be low portion if data > 32 bits */
+  /* 8 */
+  uint32_t data1;	/* will be high portion if data > 32 bits */
+  uint32_t data2;	/* currently unused.. */
+  /* 16 */
+  mcp_dma_addr_t response_addr;
+  /* 24 */
+  uint8_t pad[40];
+} mcp_cmd_t;
+
+/* 8 Bytes */
+typedef struct
+{
+  uint32_t data;
+  uint32_t result;
+} mcp_cmd_response_t;
+
+
+
+/* 
+   flags used in mcp_kreq_ether_send_t:
+
+   The SMALL flag is only needed in the first segment. It is set
+   for packets whose total length is less than or equal to 512 bytes.
+
+   The CKSUM flag must be set in all segments.
+
+   The PADDED flag is set if the packet needs to be padded, and it
+   must be set for all segments.
+
+   The MYRI10GE_MCP_ETHER_FLAGS_ALIGN_ODD flag must be set if the cumulative
+   length of all previous segments was odd.
+*/
+
+
+#define MYRI10GE_MCP_ETHER_FLAGS_SMALL      0x1
+#define MYRI10GE_MCP_ETHER_FLAGS_TSO_HDR    0x1
+#define MYRI10GE_MCP_ETHER_FLAGS_FIRST      0x2
+#define MYRI10GE_MCP_ETHER_FLAGS_ALIGN_ODD  0x4
+#define MYRI10GE_MCP_ETHER_FLAGS_CKSUM      0x8
+#define MYRI10GE_MCP_ETHER_FLAGS_TSO_LAST   0x8
+#define MYRI10GE_MCP_ETHER_FLAGS_NO_TSO     0x10
+#define MYRI10GE_MCP_ETHER_FLAGS_TSO_CHOP   0x10
+#define MYRI10GE_MCP_ETHER_FLAGS_TSO_PLD    0x20
+
+#define MYRI10GE_MCP_ETHER_SEND_SMALL_SIZE  1520
+#define MYRI10GE_MCP_ETHER_MAX_MTU          9400
+
+typedef union mcp_pso_or_cumlen
+{
+  uint16_t pseudo_hdr_offset;
+  uint16_t cum_len;
+} mcp_pso_or_cumlen_t;
+
+#define	MYRI10GE_MCP_ETHER_MAX_SEND_DESC 12
+#define MYRI10GE_MCP_ETHER_PAD	    2
+
+/* 16 Bytes */
+typedef struct
+{
+  uint32_t addr_high;
+  uint32_t addr_low;
+  uint16_t pseudo_hdr_offset;
+  uint16_t length;
+  uint8_t  pad;
+  uint8_t  rdma_count;
+  uint8_t  cksum_offset; 	/* where to start computing cksum */
+  uint8_t  flags;	       	/* as defined above */
+} mcp_kreq_ether_send_t;
+
+/* 8 Bytes */
+typedef struct
+{
+  uint32_t addr_high;
+  uint32_t addr_low;
+} mcp_kreq_ether_recv_t;
+
+
+/* Commands */
+
+#define MYRI10GE_MCP_CMD_OFFSET 0xf80000
+
+typedef enum {
+  MYRI10GE_MCP_CMD_NONE = 0,
+  /* Reset the mcp, it is left in a safe state, waiting
+     for the driver to set all its parameters */
+  MYRI10GE_MCP_CMD_RESET,
+
+  /* get the version number of the current firmware
+     (may also be available in the eeprom strings?) */
+  MYRI10GE_MCP_GET_MCP_VERSION,
+
+
+  /* Parameters which must be set by the driver before it can
+     issue MYRI10GE_MCP_CMD_ETHERNET_UP. They persist until the next
+     MYRI10GE_MCP_CMD_RESET is issued */
+
+  MYRI10GE_MCP_CMD_SET_INTRQ_DMA,
+  MYRI10GE_MCP_CMD_SET_BIG_BUFFER_SIZE,	/* in bytes, power of 2 */
+  MYRI10GE_MCP_CMD_SET_SMALL_BUFFER_SIZE,	/* in bytes */
+  
+
+  /* Parameters which refer to lanai SRAM addresses where the 
+     driver must issue PIO writes for various things */
+
+  MYRI10GE_MCP_CMD_GET_SEND_OFFSET,
+  MYRI10GE_MCP_CMD_GET_SMALL_RX_OFFSET,
+  MYRI10GE_MCP_CMD_GET_BIG_RX_OFFSET,
+  MYRI10GE_MCP_CMD_GET_IRQ_ACK_OFFSET,
+  MYRI10GE_MCP_CMD_GET_IRQ_DEASSERT_OFFSET,
+
+  /* Parameters which refer to rings stored on the MCP,
+     and whose size is controlled by the mcp */
+
+  MYRI10GE_MCP_CMD_GET_SEND_RING_SIZE,	/* in bytes */
+  MYRI10GE_MCP_CMD_GET_RX_RING_SIZE,		/* in bytes */
+
+  /* Parameters which refer to rings stored in the host,
+     and whose size is controlled by the host.  Note that
+     all must be physically contiguous and must contain 
+     a power of 2 number of entries.  */
+
+  MYRI10GE_MCP_CMD_SET_INTRQ_SIZE, 	/* in bytes */
+
+  /* command to bring ethernet interface up.  Above parameters
+     (plus mtu & mac address) must have been exchanged prior
+     to issuing this command  */
+  MYRI10GE_MCP_CMD_ETHERNET_UP,
+
+  /* command to bring ethernet interface down.  No further sends
+     or receives may be processed until an MYRI10GE_MCP_CMD_ETHERNET_UP
+     is issued, and all interrupt queues must be flushed prior
+     to ack'ing this command */
+
+  MYRI10GE_MCP_CMD_ETHERNET_DOWN,
+
+  /* commands the driver may issue live, without resetting
+     the nic.  Note that increasing the mtu "live" should
+     only be done if the driver has already supplied buffers
+     sufficiently large to handle the new mtu.  Decreasing
+     the mtu live is safe */
+
+  MYRI10GE_MCP_CMD_SET_MTU,
+  MYRI10GE_MCP_CMD_GET_INTR_COAL_DELAY_OFFSET,  /* in microseconds */
+  MYRI10GE_MCP_CMD_SET_STATS_INTERVAL,   /* in microseconds */
+  MYRI10GE_MCP_CMD_SET_STATS_DMA,
+
+  MYRI10GE_MCP_ENABLE_PROMISC,
+  MYRI10GE_MCP_DISABLE_PROMISC,
+  MYRI10GE_MCP_SET_MAC_ADDRESS,
+
+  MYRI10GE_MCP_ENABLE_FLOW_CONTROL,
+  MYRI10GE_MCP_DISABLE_FLOW_CONTROL,
+
+  /* do a DMA test
+     data0,data1 = DMA address
+     data2       = RDMA length (MSH), WDMA length (LSH)
+     command return data = repetitions (MSH), 0.5-ms ticks (LSH)
+  */
+  MYRI10GE_MCP_DMA_TEST
+} myri10ge_mcp_cmd_type_t;
+
+
+typedef enum {
+  MYRI10GE_MCP_CMD_OK = 0,
+  MYRI10GE_MCP_CMD_UNKNOWN,
+  MYRI10GE_MCP_CMD_ERROR_RANGE,
+  MYRI10GE_MCP_CMD_ERROR_BUSY,
+  MYRI10GE_MCP_CMD_ERROR_EMPTY,
+  MYRI10GE_MCP_CMD_ERROR_CLOSED,
+  MYRI10GE_MCP_CMD_ERROR_HASH_ERROR,
+  MYRI10GE_MCP_CMD_ERROR_BAD_PORT,
+  MYRI10GE_MCP_CMD_ERROR_RESOURCES
+} myri10ge_mcp_cmd_status_t;
+
+
+/* 40 Bytes */
+typedef struct
+{
+  uint32_t send_done_count;
+
+  uint32_t link_up;
+  uint32_t dropped_link_overflow;
+  uint32_t dropped_link_error_or_filtered;
+  uint32_t dropped_runt;
+  uint32_t dropped_overrun;
+  uint32_t dropped_no_small_buffer;
+  uint32_t dropped_no_big_buffer;
+  uint32_t rdma_tags_available;
+
+  uint8_t tx_stopped;
+  uint8_t link_down;
+  uint8_t stats_updated;
+  uint8_t valid;
+} mcp_irq_data_t;
+
+
+#endif /* _myri10ge_mcp_h */
--- /dev/null	2006-04-21 00:45:09.064430000 -0700
+++ linux-mm/drivers/net/myri10ge/myri10ge_mcp_gen_header.h	2006-04-21 08:22:06.000000000 -0700
@@ -0,0 +1,73 @@
+#ifndef _myri10ge_mcp_gen_header_h
+#define _myri10ge_mcp_gen_header_h
+
+/* This file defines a standard header used as a first entry point to
+   exchange information between the firmware and the driver.  The
+   header structure can be anywhere in the mcp. It will usually be in
+   the .data section, because some fields need to be initialized at
+   compile time.
+   The 32bit word at offset MCP_HEADER_PTR_OFFSET in the mcp must
+   contain the location of the header.
+
+   Typically a MCP will start with the following:
+   .text
+     .space 52    ! to help catch MEMORY_INT errors
+     bt start     ! jump to real code
+     nop
+     .long _gen_mcp_header
+   
+   The source will have a definition like:
+
+   mcp_gen_header_t gen_mcp_header = {
+      .header_length = sizeof(mcp_gen_header_t),
+      .mcp_type = MCP_TYPE_XXX,
+      .version = "something $Id: mcp_gen_header.h,v 1.1 2005/12/23 02:10:44 gallatin Exp $",
+      .mcp_globals = (unsigned)&Globals
+   };
+*/
+
+
+#define MCP_HEADER_PTR_OFFSET  0x3c
+
+#define MCP_TYPE_MX 0x4d582020 /* "MX  " */
+#define MCP_TYPE_PCIE 0x70636965 /* "PCIE" pcie-only MCP */
+#define MCP_TYPE_ETH 0x45544820 /* "ETH " */
+#define MCP_TYPE_MCP0 0x4d435030 /* "MCP0" */
+
+
+typedef struct mcp_gen_header {
+  /* the first 4 fields are filled at compile time */
+  unsigned header_length;
+  unsigned mcp_type;
+  char version[128];
+  unsigned mcp_globals; /* pointer to mcp-type specific structure */
+
+  /* filled by the MCP at run-time */
+  unsigned sram_size;
+  unsigned string_specs;  /* either the original STRING_SPECS or a superset */
+  unsigned string_specs_len;
+
+  /* Fields above this comment are guaranteed to be present.
+
+     Fields below this comment are extensions added in later versions
+     of this struct. Drivers should compare header_length against
+     offsetof(field) to check whether a given MCP implements them.
+
+     Never remove any field.  Keep everything naturally aligned.
+  */
+} mcp_gen_header_t;
+
+/* Macro to create a simple mcp header */
+#define MCP_GEN_HEADER_DECL(type, version_str, global_ptr)	\
+  struct mcp_gen_header mcp_gen_header = {			\
+    sizeof (struct mcp_gen_header),				\
+    (type),							\
+    version_str,						\
+    (global_ptr),						\
+    SRAM_SIZE,							\
+    (unsigned int) STRING_SPECS,				\
+    256								\
+  }
+
+
+#endif /* _myri10ge_mcp_gen_header_h */




* [PATCH 4/6] myri10ge - First half of the driver
  2006-05-10 21:22 [PATCH 0/6] myri10ge - Myri-10G Ethernet driver Brice Goglin
                   ` (2 preceding siblings ...)
  2006-05-10 21:36 ` [PATCH 3/6] myri10ge - Driver header files Brice Goglin
@ 2006-05-10 21:40 ` Brice Goglin
  2006-05-10 22:01   ` Stephen Hemminger
                     ` (2 more replies)
  2006-05-10 21:42 ` [PATCH 5/6] myri10ge - Second " Brice Goglin
  2006-05-10 21:43 ` [PATCH 6/6] myri10ge - Kconfig and Makefile Brice Goglin
  5 siblings, 3 replies; 28+ messages in thread
From: Brice Goglin @ 2006-05-10 21:40 UTC (permalink / raw)
  To: netdev, Andrew Morton; +Cc: LKML, Andrew J. Gallatin, brice

[PATCH 4/6] myri10ge - First half of the driver

The first half of the myri10ge driver core.
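
Among other things, this half parses the eeprom strings ("SN=x", then
"MAC=x:x:x:x:x:x", each NUL-terminated) to obtain the serial number and MAC
address. The user-space sketch below illustrates that parsing; it is not the
driver function. The names struct eeprom_info and parse_eeprom_strings are
hypothetical, strtoul stands in for the kernel's simple_strtoul, and the
sketch assumes the buffer ends with a NUL byte. It checks the bound before
dereferencing the pointer.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type for this sketch. */
struct eeprom_info {
	uint8_t mac[6];
	unsigned long serial;
};

/*
 * Parse a block of NUL-terminated "KEY=value" strings that ends with an
 * empty string. Returns 0 on success, -1 if the MAC entry is truncated.
 */
static int parse_eeprom_strings(const char *buf, size_t len,
				struct eeprom_info *out)
{
	const char *ptr = buf;
	const char *limit = buf + len;
	char *end;
	int i;

	while (ptr < limit && *ptr != '\0') {
		if ((size_t)(limit - ptr) >= 4 && memcmp(ptr, "MAC=", 4) == 0) {
			ptr += 4;
			for (i = 0; i < 6; i++) {
				if (ptr + 2 > limit)
					return -1;
				out->mac[i] = (uint8_t)strtoul(ptr, &end, 16);
				ptr = end;
				if (i < 5 && ptr < limit && *ptr == ':')
					ptr++;	/* skip the separator */
			}
		} else if ((size_t)(limit - ptr) >= 3 &&
			   memcmp(ptr, "SN=", 3) == 0) {
			ptr += 3;
			out->serial = strtoul(ptr, &end, 10);
			ptr = end;
		}
		/* advance past the terminating NUL of the current string */
		while (ptr < limit && *ptr++ != '\0')
			;
	}
	return 0;
}
```

Unknown keys are skipped, so additional eeprom entries do not break the
parser.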

Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Andrew J. Gallatin <gallatin@myri.com>

 myri10ge.c | 1483 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 1483 insertions(+)

--- /dev/null	2006-05-09 19:43:19.324446250 +0200
+++ linux/drivers/net/myri10ge/myri10ge.c	2006-05-09 23:00:55.000000000 +0200
@@ -0,0 +1,1483 @@
+/*************************************************************************
+ * myri10ge.c: Myricom Myri-10G Ethernet driver.
+ *
+ * Copyright (C) 2005, 2006 Myricom Inc.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ * 3. Neither the name of Myricom, Inc. nor the names of its contributors
+ *    may be used to endorse or promote products derived from this software
+ *    without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ *
+ *
+ * If the eeprom on your board is not recent enough, you will need to get a
+ * newer firmware image at:
+ *   http://www.myri.com/scs/download-Myri10GE.html
+ *
+ * Contact Information:
+ *   <help@myri.com>
+ *   Myricom, Inc., 325N Santa Anita Avenue, Arcadia, CA 91006
+ *************************************************************************/
+
+#include <linux/tcp.h>
+#include <linux/netdevice.h>
+#include <linux/skbuff.h>
+#include <linux/string.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/etherdevice.h>
+#include <linux/if_ether.h>
+#include <linux/if_vlan.h>
+#include <linux/ip.h>
+#include <linux/inet.h>
+#include <linux/in.h>
+#include <linux/ethtool.h>
+#include <linux/firmware.h>
+#include <linux/delay.h>
+#include <linux/version.h>
+#include <linux/timer.h>
+#include <linux/vmalloc.h>
+#include <linux/crc32.h>
+#include <linux/moduleparam.h>
+#include <linux/io.h>
+#include <net/checksum.h>
+#include <asm/byteorder.h>
+#include <asm/io.h>
+#include <asm/pci.h>
+#include <asm/processor.h>
+#ifdef CONFIG_MTRR
+#include <asm/mtrr.h>
+#endif
+
+#include "myri10ge_mcp.h"
+#include "myri10ge_mcp_gen_header.h"
+
+#define MYRI10GE_VERSION_STR "0.9.0"
+
+MODULE_DESCRIPTION("Myricom 10G driver (10GbE)");
+MODULE_AUTHOR("Maintainer: help@myri.com");
+MODULE_VERSION(MYRI10GE_VERSION_STR);
+MODULE_LICENSE("Dual BSD/GPL");
+
+#define MYRI10GE_MAX_ETHER_MTU 9014
+
+#define MYRI10GE_ETH_STOPPED 0
+#define MYRI10GE_ETH_STOPPING 1
+#define MYRI10GE_ETH_STARTING 2
+#define MYRI10GE_ETH_RUNNING 3
+#define MYRI10GE_ETH_OPEN_FAILED 4
+
+#define MYRI10GE_EEPROM_STRINGS_SIZE 256
+#define MYRI10GE_MCP_ETHER_MAX_SEND_DESC_TSO ((65536 / 2048) * 2)
+
+struct myri10ge_rx_buffer_state {
+	struct sk_buff *skb;
+	DECLARE_PCI_UNMAP_ADDR(bus)
+	DECLARE_PCI_UNMAP_LEN(len)
+};
+
+struct myri10ge_tx_buffer_state {
+	struct sk_buff *skb;
+	int last;
+	DECLARE_PCI_UNMAP_ADDR(bus)
+	DECLARE_PCI_UNMAP_LEN(len)
+};
+
+typedef struct {
+	uint32_t data0;
+	uint32_t data1;
+	uint32_t data2;
+} myri10ge_cmd_t;
+
+typedef struct {
+	mcp_kreq_ether_recv_t __iomem *lanai;	/* lanai ptr for recv ring */
+	volatile uint8_t __iomem *wc_fifo;	/* w/c rx dma addr fifo address */
+	mcp_kreq_ether_recv_t *shadow;	/* host shadow of recv ring */
+	struct myri10ge_rx_buffer_state *info;
+	int cnt;
+	int alloc_fail;
+	int mask;			/* number of rx slots -1 */
+} myri10ge_rx_buf_t;
+
+typedef struct {
+	mcp_kreq_ether_send_t __iomem *lanai;	/* lanai ptr for sendq */
+	volatile uint8_t __iomem *wc_fifo;	/* w/c send fifo address */
+	mcp_kreq_ether_send_t *req_list;	/* host shadow of sendq */
+	char *req_bytes;
+	struct myri10ge_tx_buffer_state *info;
+	int mask;			/* number of transmit slots -1	*/
+	int boundary;			/* boundary transmits cannot cross*/
+	int req ____cacheline_aligned;	/* transmit slots submitted	*/
+	int pkt_start;			/* packets started */
+	int done ____cacheline_aligned;	/* transmit slots completed	*/
+	int pkt_done;			/* packets completed */
+} myri10ge_tx_buf_t;
+
+typedef struct {
+	mcp_slot_t *entry;
+	dma_addr_t bus;
+	int cnt;
+	int idx;
+} myri10ge_rx_done_t;
+
+struct myri10ge_priv {
+	int running;			/* running? 		*/
+	int csum_flag;			/* rx_csums? 		*/
+	myri10ge_tx_buf_t tx;	/* transmit ring 	*/
+	myri10ge_rx_buf_t rx_small;
+	myri10ge_rx_buf_t rx_big;
+	myri10ge_rx_done_t rx_done;
+	int small_bytes;
+	struct net_device *dev;
+	struct net_device_stats stats;
+	volatile uint8_t __iomem *sram;
+	int sram_size;
+	unsigned long board_span;
+	unsigned long iomem_base;
+	volatile uint32_t __iomem *irq_claim;
+	volatile uint32_t __iomem *irq_deassert;
+	char *mac_addr_string;
+	mcp_cmd_response_t *cmd;
+	dma_addr_t cmd_bus;
+	mcp_irq_data_t *fw_stats;
+	dma_addr_t fw_stats_bus;
+	struct pci_dev *pdev;
+	int msi_enabled;
+	unsigned int link_state;
+	unsigned int rdma_tags_available;
+	int intr_coal_delay;
+	volatile uint32_t __iomem *intr_coal_delay_ptr;
+	int mtrr;
+	spinlock_t cmd_lock;
+	int wake_queue;
+	int stop_queue;
+	int down_cnt;
+	struct work_struct watchdog_work;
+	struct timer_list watchdog_timer;
+	int watchdog_tx_done;
+	int watchdog_resets;
+	int tx_linearized;
+	int pause;
+	char *fw_name;
+	char eeprom_strings[MYRI10GE_EEPROM_STRINGS_SIZE];
+	char fw_version[128];
+	uint8_t	mac_addr[6];	/* eeprom mac address */
+	unsigned long serial_number;
+	int vendor_specific_offset;
+	uint32_t devctl;
+	uint32_t msi_addr_low;
+	uint32_t msi_addr_high;
+	uint16_t msi_flags;
+	uint16_t msi_data_32;
+	uint16_t msi_data_64;
+	uint32_t pm_state[16];
+	uint32_t read_dma;
+	uint32_t write_dma;
+	uint32_t read_write_dma;
+};
+
+
+static char *myri10ge_fw_name = NULL;
+static char *myri10ge_fw_unaligned = "myri10ge_ethp_z8e.dat";
+#if defined(CONFIG_X86) || defined(CONFIG_X86_64)
+static char *myri10ge_fw_aligned = "myri10ge_eth_z8e.dat";
+static int myri10ge_ecrc_enable = 1;
+#endif
+static int myri10ge_max_intr_slots = 1024;
+static int myri10ge_small_bytes = -1;	/* -1 == auto */
+#ifdef CONFIG_PCI_MSI
+static int myri10ge_msi = -1;	/* 0: off, 1:on, otherwise auto */
+#endif
+static int myri10ge_intr_coal_delay = 25;
+static int myri10ge_flow_control = 1;
+static int myri10ge_deassert_wait = 1;
+static int myri10ge_force_firmware = 0;
+static int myri10ge_skb_cross_4k = 0;
+static int myri10ge_initial_mtu = MYRI10GE_MAX_ETHER_MTU - ETH_HLEN;
+static int myri10ge_napi = 1;
+static int myri10ge_napi_weight = 64;
+static int myri10ge_watchdog_timeout = 1;
+static int myri10ge_max_irq_loops = 1048576;
+
+module_param(myri10ge_fw_name, charp, S_IRUGO | S_IWUSR);
+module_param(myri10ge_max_intr_slots, int, S_IRUGO);
+module_param(myri10ge_small_bytes, int, S_IRUGO | S_IWUSR);
+#ifdef CONFIG_PCI_MSI
+module_param(myri10ge_msi, int, S_IRUGO);
+#endif
+module_param(myri10ge_intr_coal_delay, int, S_IRUGO);
+#if defined(CONFIG_X86) || defined(CONFIG_X86_64)
+module_param(myri10ge_ecrc_enable, int, S_IRUGO);
+#endif
+module_param(myri10ge_flow_control, int, S_IRUGO);
+module_param(myri10ge_deassert_wait, int, S_IRUGO | S_IWUSR);
+module_param(myri10ge_force_firmware, int, S_IRUGO);
+module_param(myri10ge_skb_cross_4k, int, S_IRUGO | S_IWUSR);
+module_param(myri10ge_initial_mtu, int, S_IRUGO);
+module_param(myri10ge_napi, int, S_IRUGO);
+module_param(myri10ge_napi_weight, int, S_IRUGO);
+module_param(myri10ge_watchdog_timeout, int, S_IRUGO);
+module_param(myri10ge_max_irq_loops, int, S_IRUGO);
+
+#define MYRI10GE_FW_OFFSET (1024*1024)
+#define MYRI10GE_HIGHPART_TO_U32(X) \
+((sizeof (X) == 8) ? ((uint32_t)((uint64_t)(X) >> 32)) : (0))
+#define MYRI10GE_LOWPART_TO_U32(X) ((uint32_t)(X))
+
+#define myri10ge_pio_copy(to,from,size) __iowrite64_copy(to,from,size/8)
+
+int myri10ge_hyper_msi_cap_on(struct pci_dev *pdev)
+{
+	uint8_t cap_off;
+	int nbcap = 0;
+
+	cap_off = PCI_CAPABILITY_LIST - 1;
+	/* go through all caps looking for a hypertransport msi mapping */
+	while (pci_read_config_byte(pdev, cap_off + 1, &cap_off) == 0 &&
+	       nbcap++ <= 256 / 4) {
+		uint32_t cap_hdr;
+		if (cap_off == 0 || cap_off == 0xff)
+			break;
+		cap_off &= 0xfc;
+		/* cf hypertransport spec, msi mapping section */
+		if (pci_read_config_dword(pdev, cap_off, &cap_hdr) == 0
+		    && (cap_hdr & 0xff) == 8 /* hypertransport cap */
+		    && (cap_hdr & 0xf8000000) == 0xa8000000 /* msi mapping */
+		    && (cap_hdr & 0x10000) /* msi mapping cap enabled */) {
+			/* MSI present and enabled */
+			return 1;
+		}
+	}
+	return 0;
+}
+
+#ifdef CONFIG_PCI_MSI
+static int
+myri10ge_use_msi(struct pci_dev *pdev)
+{
+	if (myri10ge_msi == 1 || myri10ge_msi == 0)
+		return myri10ge_msi;
+
+	/*  find root complex for our device */
+	while (pdev->bus && pdev->bus->self) {
+		pdev = pdev->bus->self;
+	}
+	/* go for it if chipset is intel, or has hypertransport msi cap */
+	if (pdev->vendor == PCI_VENDOR_ID_INTEL
+	    || myri10ge_hyper_msi_cap_on(pdev))
+		return 1;
+
+	/*  check if main chipset device has hypertransport msi cap */
+	pdev = pci_find_slot(pdev->bus->number, 0);
+	if (pdev && myri10ge_hyper_msi_cap_on(pdev))
+		return 1;
+
+	/* default off */
+	return 0;
+}
+#endif /* CONFIG_PCI_MSI */
+
+
+static int
+myri10ge_send_cmd(struct myri10ge_priv *mgp, uint32_t cmd,
+		  myri10ge_cmd_t *data)
+{
+	mcp_cmd_t *buf;
+	char buf_bytes[sizeof(*buf) + 8];
+	volatile mcp_cmd_response_t *response = mgp->cmd;
+	volatile char __iomem *cmd_addr = mgp->sram + MYRI10GE_MCP_CMD_OFFSET;
+	uint32_t dma_low, dma_high;
+	int sleep_total = 0;
+
+	/* ensure buf is aligned to 8 bytes */
+	buf = (mcp_cmd_t *) ((unsigned long)(buf_bytes + 7) & ~7UL);
+
+	buf->data0 = htonl(data->data0);
+	buf->data1 = htonl(data->data1);
+	buf->data2 = htonl(data->data2);
+	buf->cmd = htonl(cmd);
+	dma_low = MYRI10GE_LOWPART_TO_U32(mgp->cmd_bus);
+	dma_high = MYRI10GE_HIGHPART_TO_U32(mgp->cmd_bus);
+
+	buf->response_addr.low = htonl(dma_low);
+	buf->response_addr.high = htonl(dma_high);
+	spin_lock(&mgp->cmd_lock);
+	response->result = 0xffffffff;
+	mb();
+	myri10ge_pio_copy((void __iomem *) cmd_addr, buf, sizeof (*buf));
+
+	/* wait up to 2 seconds */
+	for (sleep_total = 0; sleep_total < (2 * 1000); sleep_total += 10) {
+		mb();
+		if (response->result != 0xffffffff) {
+			if (response->result == 0) {
+				data->data0 = ntohl(response->data);
+				spin_unlock(&mgp->cmd_lock);
+				return 0;
+			} else {
+				dev_err(&mgp->pdev->dev,
+					"command %d failed, result = %d\n",
+				       cmd, ntohl(response->result));
+				spin_unlock(&mgp->cmd_lock);
+				return -ENXIO;
+			}
+		}
+		udelay(1000 * 10);
+	}
+	spin_unlock(&mgp->cmd_lock);
+	dev_err(&mgp->pdev->dev, "command %d timed out, result = %d\n",
+	       cmd, ntohl(response->result));
+	return -EAGAIN;
+}
+
+
+/*
+ * The eeprom strings on the lanaiX have the format
+ * SN=x\0
+ * MAC=x:x:x:x:x:x\0
+ * PT:ddd mmm xx xx:xx:xx xx\0
+ * PV:ddd mmm xx xx:xx:xx xx\0
+ */
+int
+myri10ge_read_mac_addr(struct myri10ge_priv *mgp)
+{
+	char *ptr, *limit;
+	int i;
+
+	ptr = mgp->eeprom_strings;
+	limit = mgp->eeprom_strings + MYRI10GE_EEPROM_STRINGS_SIZE;
+
+	while (ptr < limit && *ptr != '\0') {
+		if (memcmp(ptr, "MAC=", 4) == 0) {
+			ptr += 4;
+			mgp->mac_addr_string = ptr;
+			for (i = 0; i < 6; i++) {
+				if ((ptr + 2) > limit)
+					goto abort;
+				mgp->mac_addr[i] = simple_strtoul(ptr, &ptr, 16);
+				ptr += 1;
+			}
+		}
+		if (memcmp((const void *) ptr, "SN=", 3) == 0) {
+			ptr += 3;
+			mgp->serial_number = simple_strtoul(ptr, &ptr, 10);
+		}
+		while (ptr < limit && *ptr++);
+	}
+
+	return 0;
+
+ abort:
+	dev_err(&mgp->pdev->dev, "failed to parse eeprom_strings\n");
+	return -ENXIO;
+}
+
+/*
+ * Enable or disable periodic RDMAs from the host to make certain
+ * chipsets resend dropped PCIe messages
+ */
+
+static void
+myri10ge_dummy_rdma(struct myri10ge_priv *mgp, int enable)
+{
+	volatile uint32_t *confirm;
+	volatile char __iomem *submit;
+	uint32_t buf[16];
+	uint32_t dma_low, dma_high;
+	int i;
+
+	/* clear confirmation addr */
+	confirm = (volatile uint32_t *) mgp->cmd;
+	*confirm = 0;
+	mb();
+
+	/* send a rdma command to the PCIe engine, and wait for the
+	 * response in the confirmation address.  The firmware should
+	 * write a -1 there to indicate it is alive and well
+	 */
+	dma_low = MYRI10GE_LOWPART_TO_U32(mgp->cmd_bus);
+	dma_high = MYRI10GE_HIGHPART_TO_U32(mgp->cmd_bus);
+
+	buf[0] = htonl(dma_high); 	/* confirm addr MSW */
+	buf[1] = htonl(dma_low); 	/* confirm addr LSW */
+	buf[2] = htonl(0xffffffff);	/* confirm data */
+	buf[3] = htonl(dma_high); 	/* dummy addr MSW */
+	buf[4] = htonl(dma_low); 	/* dummy addr LSW */
+	buf[5] = htonl(enable);		/* enable? */
+
+	submit = mgp->sram + 0xfc01c0;
+
+	myri10ge_pio_copy((void __iomem *) submit, &buf, sizeof (buf));
+	mb();
+	udelay(1000);
+	mb();
+	i = 0;
+	while (*confirm != 0xffffffff && i < 20) {
+		udelay(1000);
+		i++;
+	}
+	if (*confirm != 0xffffffff) {
+		dev_err(&mgp->pdev->dev, "dummy rdma %s failed\n",
+			(enable ? "enable" : "disable"));
+	}
+}
+
+static int
+myri10ge_validate_firmware(struct myri10ge_priv *mgp,
+			   mcp_gen_header_t *hdr)
+{
+	struct device *dev = &mgp->pdev->dev;
+	int major, minor;
+
+
+	/* check firmware type */
+	if (ntohl(hdr->mcp_type) != MCP_TYPE_ETH) {
+		dev_err(dev, "Bad firmware type: 0x%x\n",
+			ntohl(hdr->mcp_type));
+		return -EINVAL;
+	}
+
+	/* save firmware version for ethtool */
+	strncpy(mgp->fw_version, hdr->version, sizeof (mgp->fw_version));
+
+	sscanf(mgp->fw_version, "%d.%d", &major, &minor);
+
+	if (!(major == MYRI10GE_MCP_MAJOR && minor == MYRI10GE_MCP_MINOR)) {
+		dev_err(dev, "Found firmware version %s\n",
+			mgp->fw_version);
+		dev_err(dev, "Driver needs %d.%d\n", MYRI10GE_MCP_MAJOR,
+			MYRI10GE_MCP_MINOR);
+		return -EINVAL;
+	}
+	return 0;
+}
+
+static int
+myri10ge_load_hotplug_firmware(struct myri10ge_priv *mgp, uint32_t *size)
+{
+	unsigned crc, reread_crc;
+	const struct firmware *fw;
+	struct device *dev = &mgp->pdev->dev;
+	mcp_gen_header_t *hdr;
+	size_t hdr_offset;
+	int status;
+
+	if ((status = request_firmware(&fw, mgp->fw_name, dev)) < 0) {
+		dev_err(dev, "Unable to load %s firmware image via hotplug\n",
+			mgp->fw_name);
+		status = -EINVAL;
+		goto abort_with_nothing;
+	}
+
+	/* check size */
+
+	if (fw->size >= mgp->sram_size - MYRI10GE_FW_OFFSET ||
+	    fw->size < MCP_HEADER_PTR_OFFSET + 4) {
+		dev_err(dev, "Firmware size invalid:%d\n", (int)fw->size);
+		status = -EINVAL;
+		goto abort_with_fw;
+	}
+
+	/* check id */
+	hdr_offset = ntohl(*(uint32_t *) (fw->data + MCP_HEADER_PTR_OFFSET));
+	if ((hdr_offset & 3) || hdr_offset + sizeof(*hdr) > fw->size) {
+		dev_err(dev, "Bad firmware file\n");
+		status = -EINVAL;
+		goto abort_with_fw;
+	}
+	hdr = (void*) (fw->data + hdr_offset);
+
+	status = myri10ge_validate_firmware(mgp, hdr);
+	if (status != 0) {
+		goto abort_with_fw;
+	}
+
+	crc = crc32(~0, fw->data, fw->size);
+	memcpy_toio(mgp->sram + MYRI10GE_FW_OFFSET, fw->data, fw->size);
+	/* corruption checking is good for parity recovery and buggy chipsets */
+	memcpy_fromio(fw->data, mgp->sram + MYRI10GE_FW_OFFSET, fw->size);
+	reread_crc = crc32(~0, fw->data, fw->size);
+	if (crc != reread_crc) {
+		dev_err(dev, "CRC failed(fw-len=%u), got 0x%x (expect 0x%x)\n",
+		       (unsigned)fw->size, reread_crc, crc);
+		status = -EIO;
+		goto abort_with_fw;
+	}
+	*size = (uint32_t)fw->size;
+
+abort_with_fw:
+	release_firmware(fw);
+
+abort_with_nothing:
+	return status;
+}
+
+static int
+myri10ge_adopt_running_firmware(struct myri10ge_priv *mgp)
+{
+	mcp_gen_header_t *hdr;
+	struct device *dev = &mgp->pdev->dev;
+	size_t bytes, hdr_offset;
+	int status;
+
+	/* find running firmware header */
+	hdr_offset = ntohl(__raw_readl(mgp->sram + MCP_HEADER_PTR_OFFSET));
+
+	if ((hdr_offset & 3) || hdr_offset + sizeof(*hdr) > mgp->sram_size) {
+		dev_err(dev, "Running firmware has bad header offset (%d)\n",
+			(int)hdr_offset);
+		return -EIO;
+	}
+
+	/* copy header of running firmware from SRAM to host memory to
+	 * validate firmware */
+	bytes = sizeof (mcp_gen_header_t);
+	hdr = kmalloc(bytes, GFP_KERNEL);
+	if (hdr == NULL) {
+		dev_err(dev, "could not malloc firmware hdr\n");
+		return -ENOMEM;
+	}
+	memcpy_fromio(hdr, mgp->sram + hdr_offset, bytes);
+	status = myri10ge_validate_firmware(mgp, hdr);
+	kfree(hdr);
+	return status;
+}
+
+
+static int
+myri10ge_load_firmware(struct myri10ge_priv *mgp)
+{
+	volatile uint32_t *confirm;
+	volatile char __iomem *submit;
+	uint32_t buf[16];
+	uint32_t dma_low, dma_high, size;
+	int status, i;
+
+	status = myri10ge_load_hotplug_firmware(mgp, &size);
+	if (status) {
+		dev_warn(&mgp->pdev->dev,
+			 "hotplug firmware loading failed\n");
+
+		/* Do not attempt to adopt firmware if there
+		   was a bad crc */
+		if (status == -EIO) {
+			return status;
+		}
+		status = myri10ge_adopt_running_firmware(mgp);
+		if (status != 0) {
+			dev_err(&mgp->pdev->dev,
+				"failed to adopt running firmware\n");
+			return status;
+		}
+		dev_info(&mgp->pdev->dev,
+			 "Successfully adopted running firmware\n");
+		if (mgp->tx.boundary == 4096) {
+			dev_warn(&mgp->pdev->dev,
+				"Using firmware currently running on NIC"
+				 ".  For optimal\n");
+			dev_warn(&mgp->pdev->dev,
+				 "performance consider loading optimized "
+				 "firmware\n");
+			dev_warn(&mgp->pdev->dev, "via hotplug\n");
+		}
+
+		mgp->fw_name = "adopted";
+		mgp->tx.boundary = 2048;
+		return status;
+	}
+
+	/* clear confirmation addr */
+	confirm = (volatile uint32_t *) mgp->cmd;
+	*confirm = 0;
+	mb();
+
+	/* send a reload command to the bootstrap MCP, and wait for the
+	 *  response in the confirmation address.  The firmware should
+	 * write a -1 there to indicate it is alive and well
+	 */
+	dma_low = MYRI10GE_LOWPART_TO_U32(mgp->cmd_bus);
+	dma_high = MYRI10GE_HIGHPART_TO_U32(mgp->cmd_bus);
+
+	buf[0] = htonl(dma_high); 	/* confirm addr MSW */
+	buf[1] = htonl(dma_low); 	/* confirm addr LSW */
+	buf[2] = htonl(0xffffffff);	/* confirm data */
+
+	/* FIX: All newest firmware should un-protect the bottom of
+	 * the sram before handoff. However, the very first interfaces
+	 * do not. Therefore the handoff copy must skip the first 8 bytes
+	 */
+	buf[3] = htonl(MYRI10GE_FW_OFFSET + 8);	/* where the code starts */
+	buf[4] = htonl(size - 8); 		/* length of code */
+	buf[5] = htonl(8);			/* where to copy to */
+	buf[6] = htonl(0);			/* where to jump to */
+
+	submit = mgp->sram + 0xfc0000;
+
+	myri10ge_pio_copy((void __iomem *) submit, &buf, sizeof (buf));
+	mb();
+	udelay(1000);
+	mb();
+	i = 0;
+	while (*confirm != 0xffffffff && i < 20) {
+		udelay(1000);
+		i++;
+	}
+	if (*confirm != 0xffffffff) {
+		dev_err(&mgp->pdev->dev, "handoff failed\n");
+		return -ENXIO;
+	}
+	dev_info(&mgp->pdev->dev, "handoff confirmed\n");
+	myri10ge_dummy_rdma(mgp, mgp->tx.boundary != 4096);
+
+	return 0;
+}
+
+static int
+myri10ge_update_mac_address(struct myri10ge_priv *mgp, uint8_t *addr)
+{
+	myri10ge_cmd_t cmd;
+	int status;
+
+	cmd.data0 = ((addr[0] << 24) | (addr[1] << 16)
+		     | (addr[2] << 8) | addr[3]);
+
+	cmd.data1 = ((addr[4] << 8) | (addr[5]));
+
+	status = myri10ge_send_cmd(mgp, MYRI10GE_MCP_SET_MAC_ADDRESS, &cmd);
+	return status;
+}
+
+static int
+myri10ge_change_pause(struct myri10ge_priv *mgp, int pause)
+{
+	myri10ge_cmd_t cmd;
+	int status;
+
+	if (pause)
+		status = myri10ge_send_cmd(mgp, MYRI10GE_MCP_ENABLE_FLOW_CONTROL, &cmd);
+	else
+		status = myri10ge_send_cmd(mgp, MYRI10GE_MCP_DISABLE_FLOW_CONTROL, &cmd);
+
+	if (status) {
+		printk(KERN_ERR "myri10ge: %s: Failed to set flow control mode\n",
+		       mgp->dev->name);
+		return -ENXIO;
+	}
+	mgp->pause = pause;
+	return 0;
+}
+
+static void
+myri10ge_change_promisc(struct myri10ge_priv *mgp, int promisc)
+{
+	myri10ge_cmd_t cmd;
+	int status;
+
+	if (promisc)
+		status = myri10ge_send_cmd(mgp, MYRI10GE_MCP_ENABLE_PROMISC, &cmd);
+	else
+		status = myri10ge_send_cmd(mgp, MYRI10GE_MCP_DISABLE_PROMISC, &cmd);
+
+	if (status) {
+		printk(KERN_ERR "myri10ge: %s: Failed to set promisc mode\n",
+		       mgp->dev->name);
+	}
+}
+
+
+static int
+myri10ge_reset(struct myri10ge_priv *mgp)
+{
+
+	myri10ge_cmd_t cmd;
+	int status;
+	size_t bytes;
+	uint32_t len;
+
+	/* try to send a reset command to the card to see if it
+	   is alive */
+	memset(&cmd, 0, sizeof (cmd));
+	status = myri10ge_send_cmd(mgp, MYRI10GE_MCP_CMD_RESET, &cmd);
+	if (status != 0) {
+		dev_err(&mgp->pdev->dev, "failed reset\n");
+		return -ENXIO;
+	}
+
+	/* Now exchange information about interrupts  */
+
+	bytes = myri10ge_max_intr_slots * sizeof (*mgp->rx_done.entry);
+	memset(mgp->rx_done.entry, 0, bytes);
+	cmd.data0 = (uint32_t) bytes;
+	status = myri10ge_send_cmd(mgp, MYRI10GE_MCP_CMD_SET_INTRQ_SIZE, &cmd);
+	cmd.data0 = MYRI10GE_LOWPART_TO_U32(mgp->rx_done.bus);
+	cmd.data1 = MYRI10GE_HIGHPART_TO_U32(mgp->rx_done.bus);
+	status |= myri10ge_send_cmd(mgp, MYRI10GE_MCP_CMD_SET_INTRQ_DMA, &cmd);
+
+
+	status |= myri10ge_send_cmd(mgp,  MYRI10GE_MCP_CMD_GET_IRQ_ACK_OFFSET, &cmd);
+	mgp->irq_claim = (__iomem uint32_t *) (mgp->sram + cmd.data0);
+	if (!mgp->msi_enabled) {
+		status |= myri10ge_send_cmd
+			(mgp,  MYRI10GE_MCP_CMD_GET_IRQ_DEASSERT_OFFSET,
+			 &cmd);
+		mgp->irq_deassert = (__iomem uint32_t *) (mgp->sram + cmd.data0);
+
+	}
+	status |= myri10ge_send_cmd
+		(mgp, MYRI10GE_MCP_CMD_GET_INTR_COAL_DELAY_OFFSET, &cmd);
+	mgp->intr_coal_delay_ptr = (__iomem uint32_t *) (mgp->sram + cmd.data0);
+	if (status != 0) {
+		dev_err(&mgp->pdev->dev, "failed set interrupt parameters\n");
+		return status;
+	}
+	__raw_writel(htonl(mgp->intr_coal_delay), mgp->intr_coal_delay_ptr);
+
+	/* Run a small DMA test.
+	 * The magic multipliers to the length tell the firmware
+	 * to do DMA read, write, or read+write tests.  The
+	 * results are returned in cmd.data0.  The upper 16
+	 * bits of the return is the number of transfers completed.
+	 * The lower 16 bits is the time in 0.5us ticks that the
+	 * transfers took to complete.
+	 */
+
+	len = mgp->tx.boundary;
+
+	cmd.data0 = MYRI10GE_LOWPART_TO_U32(mgp->rx_done.bus);
+	cmd.data1 = MYRI10GE_HIGHPART_TO_U32(mgp->rx_done.bus);
+	cmd.data2 = len * 0x10000;
+	status |= myri10ge_send_cmd(mgp, MYRI10GE_MCP_DMA_TEST, &cmd);
+	mgp->read_dma = ((cmd.data0>>16) * len * 2)/(cmd.data0 & 0xffff);
+
+	cmd.data0 = MYRI10GE_LOWPART_TO_U32(mgp->rx_done.bus);
+	cmd.data1 = MYRI10GE_HIGHPART_TO_U32(mgp->rx_done.bus);
+	cmd.data2 = len * 0x1;
+	status |= myri10ge_send_cmd(mgp, MYRI10GE_MCP_DMA_TEST, &cmd);
+	mgp->write_dma = ((cmd.data0>>16) * len * 2)/(cmd.data0 & 0xffff);
+
+	cmd.data0 = MYRI10GE_LOWPART_TO_U32(mgp->rx_done.bus);
+	cmd.data1 = MYRI10GE_HIGHPART_TO_U32(mgp->rx_done.bus);
+	cmd.data2 = len * 0x10001;
+	status |= myri10ge_send_cmd(mgp, MYRI10GE_MCP_DMA_TEST, &cmd);
+	mgp->read_write_dma = ((cmd.data0>>16) * len * 2 * 2) /
+		(cmd.data0 & 0xffff);
+
+	memset(mgp->rx_done.entry, 0, bytes);
+
+	/* reset mcp/driver shared state back to 0 */
+	mgp->tx.req = 0;
+	mgp->tx.done = 0;
+	mgp->tx.pkt_start = 0;
+	mgp->tx.pkt_done = 0;
+	mgp->rx_big.cnt = 0;
+	mgp->rx_small.cnt = 0;
+	mgp->rx_done.idx = 0;
+	mgp->rx_done.cnt = 0;
+	status = myri10ge_update_mac_address(mgp, mgp->dev->dev_addr);
+	myri10ge_change_promisc(mgp, 0);
+	myri10ge_change_pause(mgp, mgp->pause);
+	return status;
+}
+
+static inline void
+myri10ge_submit_8rx(mcp_kreq_ether_recv_t __iomem *dst, mcp_kreq_ether_recv_t *src)
+{
+	uint32_t low;
+
+	low = src->addr_low;
+	src->addr_low = 0xffffffff;
+	myri10ge_pio_copy(dst, src, 8 * sizeof(*src));
+	mb();
+	src->addr_low = low;
+	*(uint32_t __force *) &dst->addr_low = src->addr_low;
+	mb();
+}
+
+/*
+ * Set of routines to get a new receive buffer.  Any buffer which
+ * crosses a 4KB boundary must start on a 4KB boundary due to PCIe
+ * wdma restrictions. We also try to align any smaller allocation to
+ * at least a 16 byte boundary for efficiency.  We assume the linux
+ * memory allocator works by powers of 2, and will not return memory
+ * smaller than 2KB which crosses a 4KB boundary.  If it does, we fall
+ * back to allocating 2x as much space as required.
+ */
+
+static inline struct sk_buff *
+myri10ge_alloc_big(int bytes)
+{
+	struct sk_buff *skb;
+	unsigned long data, roundup;
+
+	skb = dev_alloc_skb(bytes + 4096 + MYRI10GE_MCP_ETHER_PAD);
+	if (skb == NULL)
+		return NULL;
+
+	/* Correct skb->truesize so that socket buffer
+	 * accounting is not confused by the rounding we must
+	 * do to satisfy alignment constraints.
+	 */
+	skb->truesize -= 4096;
+
+	data = (unsigned long)(skb->data);
+	roundup = (-data) & (4095);
+	skb_reserve(skb, roundup);
+	return skb;
+}
+
+/* Allocate 2x as much space as required and use whichever portion
+   does not cross a 4KB boundary */
+static inline struct sk_buff *
+myri10ge_alloc_small_safe(unsigned int bytes)
+{
+	struct sk_buff *skb;
+	unsigned long data, boundary;
+
+	skb = dev_alloc_skb(2 * (bytes + MYRI10GE_MCP_ETHER_PAD) - 1);
+	if (unlikely(skb == NULL))
+		return NULL;
+
+	/* Correct skb->truesize so that socket buffer
+	 * accounting is not confused by the rounding we must
+	 * do to satisfy alignment constraints.
+	 */
+	skb->truesize -= bytes + MYRI10GE_MCP_ETHER_PAD;
+
+	data = (unsigned long)(skb->data);
+	boundary = (data + 4095UL) & ~4095UL;
+	if ((boundary - data) >= (bytes + MYRI10GE_MCP_ETHER_PAD)) {
+		return skb;
+	}
+	skb_reserve(skb, boundary - data);
+	return skb;
+}
+
+/* Allocate just enough space, and verify that the allocated
+   space does not cross a 4KB boundary */
+static inline struct sk_buff *
+myri10ge_alloc_small(int bytes)
+{
+	struct sk_buff *skb;
+	unsigned long roundup, data, end;
+
+	skb = dev_alloc_skb(bytes + 16 + MYRI10GE_MCP_ETHER_PAD);
+	if (unlikely(skb == NULL))
+		return NULL;
+
+	/* Round allocated buffer to 16 byte boundary */
+	data = (unsigned long)(skb->data);
+	roundup = (-data) & 15UL;
+	skb_reserve(skb, roundup);
+	/* Verify that the data buffer does not cross a page boundary */
+	data = (unsigned long)(skb->data);
+	end = data + bytes + MYRI10GE_MCP_ETHER_PAD - 1;
+	if (unlikely (((end >> 12) != (data >> 12)) && (data & 4095UL))) {
+		printk(KERN_WARNING "myri10ge_alloc_small: small skb crossed 4KB boundary\n");
+		myri10ge_skb_cross_4k = 1;
+		dev_kfree_skb_any(skb);
+		skb = myri10ge_alloc_small_safe(bytes);
+	}
+	return skb;
+}
+
+static inline int
+myri10ge_getbuf(myri10ge_rx_buf_t *rx, struct pci_dev *pdev, int bytes, int idx)
+{
+	struct sk_buff *skb;
+	dma_addr_t bus;
+	int len, retval = 0;
+
+	bytes += VLAN_HLEN;	/* account for 802.1q vlan tag */
+
+	if ((bytes + MYRI10GE_MCP_ETHER_PAD) >
+	    (4096 - 16) /* linux overhead */) {
+		skb = myri10ge_alloc_big(bytes);
+	} else {
+		if (myri10ge_skb_cross_4k) {
+			skb = myri10ge_alloc_small_safe(bytes);
+		} else {
+			skb = myri10ge_alloc_small(bytes);
+		}
+	}
+	if (unlikely(skb == NULL)) {
+		rx->alloc_fail++;
+		retval = -ENOBUFS;
+		goto done;
+	}
+
+	/* set len so that it only covers the area we
+	   need mapped for DMA */
+	len = bytes + MYRI10GE_MCP_ETHER_PAD;
+
+	bus = pci_map_single(pdev, skb->data, len, PCI_DMA_FROMDEVICE);
+	rx->info[idx].skb = skb;
+	pci_unmap_addr_set(&rx->info[idx], bus, bus);
+	pci_unmap_len_set(&rx->info[idx], len, len);
+	rx->shadow[idx].addr_low = htonl(MYRI10GE_LOWPART_TO_U32(bus));
+	rx->shadow[idx].addr_high = htonl(MYRI10GE_HIGHPART_TO_U32(bus));
+
+done:
+	/* copy 8 descriptors (64-bytes) to the mcp at a time */
+	if ((idx & 7) == 7) {
+		if (rx->wc_fifo == NULL) {
+			myri10ge_submit_8rx(&rx->lanai[idx - 7],
+					    &rx->shadow[idx - 7]);
+		} else {
+			mb();
+			myri10ge_pio_copy((void __iomem *) rx->wc_fifo,
+					  &rx->shadow[idx - 7], 64);
+		}
+	}
+	return retval;
+}
+
+static inline void
+myri10ge_vlan_ip_csum(struct sk_buff *skb, uint16_t hw_csum)
+{
+	struct vlan_hdr *vh = (struct vlan_hdr *) (skb->data);
+
+	if ((skb->protocol == htons(ETH_P_8021Q)) &&
+	    (vh->h_vlan_encapsulated_proto == htons(ETH_P_IP) ||
+	     vh->h_vlan_encapsulated_proto == htons(ETH_P_IPV6))) {
+		skb->csum = hw_csum;
+		skb->ip_summed = CHECKSUM_HW;
+	}
+}
+
+static inline unsigned long
+myri10ge_rx_done(struct myri10ge_priv *mgp, myri10ge_rx_buf_t *rx,
+		  int bytes, int len, int csum)
+{
+	dma_addr_t bus;
+	struct sk_buff *skb;
+	int idx, unmap_len;
+
+	idx = rx->cnt & rx->mask;
+	rx->cnt++;
+
+	/* save a pointer to the received skb */
+	skb = rx->info[idx].skb;
+	bus = pci_unmap_addr(&rx->info[idx], bus);
+	unmap_len = pci_unmap_len(&rx->info[idx], len);
+
+	/* try to replace the received skb */
+	if (myri10ge_getbuf(rx, mgp->pdev, bytes, idx)) {
+		/* drop the frame -- the old skbuf is re-cycled */
+		mgp->stats.rx_dropped += 1;
+		return 0;
+	}
+
+	/* unmap the recvd skb */
+	pci_unmap_single(mgp->pdev,
+			 bus, unmap_len,
+			 PCI_DMA_FROMDEVICE);
+
+	/* mcp implicitly skips the first MYRI10GE_MCP_ETHER_PAD bytes
+	 * so that the packet is properly aligned */
+	skb_reserve(skb, MYRI10GE_MCP_ETHER_PAD);
+
+	/* set the length of the frame */
+	skb_put(skb, len);
+
+	skb->protocol = eth_type_trans(skb, mgp->dev);
+	skb->dev = mgp->dev;
+	if (mgp->csum_flag) {
+		if ((skb->protocol == htons(ETH_P_IP)) ||
+		    (skb->protocol == htons(ETH_P_IPV6))) {
+			skb->csum = ntohs((uint16_t)csum);
+			skb->ip_summed = CHECKSUM_HW;
+		} else {
+			myri10ge_vlan_ip_csum(skb,
+					      ntohs((uint16_t) csum));
+		}
+	}
+
+	if (myri10ge_napi)
+		netif_receive_skb(skb);
+	else
+		netif_rx(skb);
+
+	mgp->dev->last_rx = jiffies;
+	return 1;
+}
+
+static inline void
+myri10ge_tx_done(struct myri10ge_priv *mgp, int mcp_index)
+{
+	struct pci_dev *pdev = mgp->pdev;
+	myri10ge_tx_buf_t *tx = &mgp->tx;
+	struct sk_buff *skb;
+	int idx, len;
+	int limit = 0;
+
+	while (tx->pkt_done != mcp_index) {
+		idx = tx->done & tx->mask;
+		skb = tx->info[idx].skb;
+
+		/* Mark as free */
+		tx->info[idx].skb = NULL;
+		if (tx->info[idx].last) {
+			tx->pkt_done++;
+			tx->info[idx].last = 0;
+		}
+		tx->done++;
+		len = pci_unmap_len(&tx->info[idx], len);
+		pci_unmap_len_set(&tx->info[idx], len, 0);
+		if (skb) {
+			mgp->stats.tx_bytes += skb->len;
+			mgp->stats.tx_packets++;
+			dev_kfree_skb_irq(skb);
+			if (len)
+				pci_unmap_single(pdev,
+						 pci_unmap_addr(&tx->info[idx], bus),
+						 len, PCI_DMA_TODEVICE);
+		} else {
+			if (len)
+				pci_unmap_page(pdev,
+					       pci_unmap_addr(&tx->info[idx], bus),
+					       len, PCI_DMA_TODEVICE);
+		}
+
+		/* limit potential for livelock by only handling
+		   2 full tx rings per call */
+		if (unlikely(++limit >  2 * tx->mask))
+			break;
+	}
+	/* start the queue if we've stopped it */
+	if (netif_queue_stopped(mgp->dev)
+	    && tx->req - tx->done < (tx->mask >> 1)) {
+		mgp->wake_queue++;
+		netif_wake_queue(mgp->dev);
+	}
+}
+
+
+static inline void
+myri10ge_clean_rx_done(struct myri10ge_priv *mgp, int *limit)
+{
+	myri10ge_rx_done_t *rx_done = &mgp->rx_done;
+	unsigned long rx_bytes = 0;
+	unsigned long rx_packets = 0;
+	unsigned long rx_ok;
+
+	int idx = rx_done->idx;
+	int cnt = rx_done->cnt;
+	uint16_t length;
+	uint16_t checksum;
+
+	while (rx_done->entry[idx].length != 0 &&
+		*limit != 0) {
+		length = ntohs(rx_done->entry[idx].length);
+		rx_done->entry[idx].length = 0;
+		checksum = ntohs(rx_done->entry[idx].checksum);
+		if (length <= mgp->small_bytes)
+			rx_ok = myri10ge_rx_done(mgp, &mgp->rx_small,
+						 mgp->small_bytes,
+						 length, checksum);
+		else
+			rx_ok = myri10ge_rx_done(mgp, &mgp->rx_big,
+						 mgp->dev->mtu + ETH_HLEN,
+						 length, checksum);
+		rx_packets += rx_ok;
+		rx_bytes += rx_ok * (unsigned long)length;
+		cnt++;
+		idx = cnt & (myri10ge_max_intr_slots - 1);
+
+		/* limit potential for livelock by only handling a
+		 * limited number of frames. */
+		(*limit)--;
+	}
+	rx_done->idx = idx;
+	rx_done->cnt = cnt;
+	mgp->stats.rx_packets += rx_packets;
+	mgp->stats.rx_bytes += rx_bytes;
+}
+
+static inline void
+myri10ge_check_statblock(struct myri10ge_priv *mgp)
+{
+	mcp_irq_data_t *stats = mgp->fw_stats;
+
+	if (unlikely(stats->stats_updated)) {
+		if (mgp->link_state != stats->link_up) {
+			mgp->link_state = stats->link_up;
+			if (mgp->link_state) {
+				printk(KERN_INFO "myri10ge: %s: link up\n",
+				       mgp->dev->name);
+				netif_carrier_on(mgp->dev);
+			} else {
+				printk(KERN_INFO "myri10ge: %s: link down\n",
+				       mgp->dev->name);
+				netif_carrier_off(mgp->dev);
+			}
+		}
+		if (mgp->rdma_tags_available !=
+		    ntohl(mgp->fw_stats->rdma_tags_available)) {
+			mgp->rdma_tags_available = ntohl(
+				mgp->fw_stats->rdma_tags_available);
+			printk(KERN_WARNING "myri10ge: %s: RDMA timed out! "
+			       "%d tags left\n", mgp->dev->name,
+			       mgp->rdma_tags_available);
+		}
+		mgp->down_cnt += stats->link_down;
+	}
+}
+
+static int
+myri10ge_poll(struct net_device *netdev, int *budget)
+{
+	struct myri10ge_priv *mgp = netdev_priv(netdev);
+	myri10ge_rx_done_t *rx_done = &mgp->rx_done;
+	int limit, orig_limit, work_done;
+
+	/* process as many rx events as NAPI will allow */
+	limit = min(*budget, netdev->quota);
+	orig_limit = limit;
+	myri10ge_clean_rx_done(mgp, &limit);
+	work_done = orig_limit - limit;
+	*budget -= work_done;
+	netdev->quota -= work_done;
+
+	if (rx_done->entry[rx_done->idx].length == 0 ||
+	    !netif_running(netdev)) {
+		netif_rx_complete(netdev);
+		__raw_writel(htonl(3), mgp->irq_claim);
+		return 0;
+	}
+	return 1;
+}
+
+static irqreturn_t
+myri10ge_napi_intr(int irq, void *arg, struct pt_regs *regs)
+{
+	struct myri10ge_priv *mgp = (struct myri10ge_priv *) arg;
+	mcp_irq_data_t *stats = mgp->fw_stats;
+	myri10ge_tx_buf_t *tx = &mgp->tx;
+	uint32_t send_done_count;
+	int i;
+
+	/* make sure it is our IRQ, and that the DMA has finished */
+	if (unlikely(!stats->valid)) {
+		return (IRQ_NONE);
+	}
+
+	/* low bit indicates receives are present, so schedule
+	   napi poll handler */
+	if (stats->valid & 1) {
+		netif_rx_schedule(mgp->dev);
+	}
+
+	if (!mgp->msi_enabled) {
+		__raw_writel(0, mgp->irq_deassert);
+		if (!myri10ge_deassert_wait)
+			stats->valid = 0;
+		mb();
+	} else {
+		stats->valid = 0;
+	}
+
+
+	/* Wait for IRQ line to go low, if using INTx */
+	i = 0;
+	do {
+		i++;
+		/* check for transmit completes and receives */
+		send_done_count = ntohl(stats->send_done_count);
+		if (send_done_count != tx->pkt_done)
+			myri10ge_tx_done(mgp, (int)send_done_count);
+		if (*((volatile uint8_t *) &stats->valid) == 0)
+			cpu_relax();
+		if (unlikely(i > myri10ge_max_irq_loops)) {
+			printk(KERN_WARNING "myri10ge: %s: irq stuck?\n",
+			       mgp->dev->name);
+			stats->valid = 0;
+			schedule_work(&mgp->watchdog_work);
+		}
+	} while (*((volatile uint8_t *) &stats->valid));
+
+	myri10ge_check_statblock(mgp);
+
+	__raw_writel(htonl(3), mgp->irq_claim + 1);
+	return (IRQ_HANDLED);
+}
+
+
+static irqreturn_t
+myri10ge_intr(int irq, void *arg, struct pt_regs *regs)
+{
+	struct myri10ge_priv *mgp = (struct myri10ge_priv *) arg;
+	mcp_irq_data_t *stats = mgp->fw_stats;
+	myri10ge_tx_buf_t *tx = &mgp->tx;
+	myri10ge_rx_done_t *rx_done = &mgp->rx_done;
+	uint32_t send_done_count;
+	uint8_t valid;
+	int limit, i;
+
+	/* make sure it is our IRQ, and that the DMA has finished */
+	if (unlikely(!stats->valid)) {
+		return (IRQ_NONE);
+	}
+	valid = stats->valid;
+	if (!mgp->msi_enabled) {
+		__raw_writel(0, mgp->irq_deassert);
+		if (!myri10ge_deassert_wait)
+			stats->valid = 0;
+		mb();
+	} else {
+		stats->valid = 0;
+	}
+
+	i = 0;
+	do {
+		/* check for transmit completes and receives */
+		send_done_count = ntohl(stats->send_done_count);
+		while ((send_done_count != tx->pkt_done) ||
+		       (rx_done->entry[rx_done->idx].length != 0)) {
+			myri10ge_tx_done(mgp, (int)send_done_count);
+			limit = 2 * myri10ge_max_intr_slots;
+			myri10ge_clean_rx_done(mgp, &limit);
+			send_done_count = ntohl(stats->send_done_count);
+		}
+		if (unlikely(++i > myri10ge_max_irq_loops)) {
+			printk(KERN_WARNING "myri10ge: %s: irq stuck?\n",
+			       mgp->dev->name);
+			stats->valid = 0;
+			schedule_work(&mgp->watchdog_work);
+		}
+	} while (*((volatile uint8_t *) &stats->valid));
+
+	myri10ge_check_statblock(mgp);
+	/* check to see if we have rx token, pass it back
+	   if we do */
+	if (valid & 0x1)
+		__raw_writel(htonl(3), mgp->irq_claim);
+	__raw_writel(htonl(3), mgp->irq_claim + 1);
+	return (IRQ_HANDLED);
+}
+
+static int
+myri10ge_get_settings(struct net_device *netdev, struct ethtool_cmd *cmd)
+{
+	cmd->autoneg = AUTONEG_DISABLE;
+	cmd->speed = SPEED_10000;
+	cmd->duplex = DUPLEX_FULL;
+	return 0;
+}
+
+static int
+myri10ge_set_settings(struct net_device *netdev, struct ethtool_cmd *cmd)
+{
+	return -EINVAL;
+}
+
+static void
+myri10ge_get_drvinfo(struct net_device *netdev,
+		   struct ethtool_drvinfo *info)
+{
+	struct myri10ge_priv *mgp = netdev_priv(netdev);
+
+	strlcpy(info->driver, "myri10ge", sizeof (info->driver));
+	strlcpy(info->version, MYRI10GE_VERSION_STR, sizeof (info->version));
+	strlcpy(info->fw_version, mgp->fw_version, sizeof (info->fw_version));
+	strlcpy(info->bus_info, pci_name(mgp->pdev), sizeof (info->bus_info));
+}
+
+static int
+myri10ge_get_coalesce(struct net_device *netdev,
+		     struct ethtool_coalesce *coal)
+{
+	struct myri10ge_priv *mgp = netdev_priv(netdev);
+	coal->rx_coalesce_usecs = mgp->intr_coal_delay;
+	return 0;
+}
+
+static int
+myri10ge_set_coalesce(struct net_device *netdev,
+		     struct ethtool_coalesce *coal)
+{
+	struct myri10ge_priv *mgp = netdev_priv(netdev);
+
+	mgp->intr_coal_delay = coal->rx_coalesce_usecs;
+	__raw_writel(htonl(mgp->intr_coal_delay),
+		     mgp->intr_coal_delay_ptr);
+	return 0;
+}
+
+static void
+myri10ge_get_pauseparam(struct net_device *netdev,
+			struct ethtool_pauseparam *pause)
+{
+	struct myri10ge_priv *mgp = netdev_priv(netdev);
+
+	pause->autoneg = 0;
+	pause->rx_pause = mgp->pause;
+	pause->tx_pause = mgp->pause;
+}
+
+static int
+myri10ge_set_pauseparam(struct net_device *netdev,
+			struct ethtool_pauseparam *pause)
+{
+	struct myri10ge_priv *mgp = netdev_priv(netdev);
+
+	if (pause->tx_pause != mgp->pause) {
+		return (myri10ge_change_pause(mgp, pause->tx_pause));
+	}
+	if (pause->rx_pause != mgp->pause) {
+		return (myri10ge_change_pause(mgp, pause->rx_pause));
+	}
+	if (pause->autoneg != 0)
+		return -EINVAL;
+	return 0;
+}
+
+static void
+myri10ge_get_ringparam(struct net_device *netdev,
+		       struct ethtool_ringparam *ring)
+{
+	struct myri10ge_priv *mgp = netdev_priv(netdev);
+
+	ring->rx_mini_max_pending = mgp->rx_small.mask + 1;
+	ring->rx_max_pending = mgp->rx_big.mask + 1;
+	ring->rx_jumbo_max_pending = 0;
+	ring->tx_max_pending = mgp->tx.mask + 1;
+	ring->rx_mini_pending = ring->rx_mini_max_pending;
+	ring->rx_pending = ring->rx_max_pending;
+	ring->rx_jumbo_pending = ring->rx_jumbo_max_pending;
+	ring->tx_pending = ring->tx_max_pending;
+}
+
+static u32
+myri10ge_get_rx_csum(struct net_device *netdev)
+{
+	struct myri10ge_priv *mgp = netdev_priv(netdev);
+	if (mgp->csum_flag)
+		return 1;
+	else
+		return 0;
+}
+
+static int
+myri10ge_set_rx_csum(struct net_device *netdev, u32 csum_enabled)
+{
+	struct myri10ge_priv *mgp = netdev_priv(netdev);
+	if (csum_enabled)
+		mgp->csum_flag = MYRI10GE_MCP_ETHER_FLAGS_CKSUM;
+	else
+		mgp->csum_flag = 0;
+	return 0;
+}
+
+
+static const char myri10ge_gstrings_stats[][ETH_GSTRING_LEN] = {
+	"rx_packets", "tx_packets", "rx_bytes", "tx_bytes", "rx_errors",
+	"tx_errors", "rx_dropped", "tx_dropped", "multicast", "collisions",
+	"rx_length_errors", "rx_over_errors", "rx_crc_errors",
+	"rx_frame_errors", "rx_fifo_errors", "rx_missed_errors",
+	"tx_aborted_errors", "tx_carrier_errors", "tx_fifo_errors",
+	"tx_heartbeat_errors", "tx_window_errors",
+	/* device-specific stats */
+	"read_dma_bw_MBs", "write_dma_bw_MBs", "read_write_dma_bw_MBs",
+	"serial_number", "tx_pkt_start", "tx_pkt_done",
+	"tx_req", "tx_done", "rx_small_cnt", "rx_big_cnt",
+	"wake_queue", "stop_queue", "watchdog_resets", "tx_linearized",
+	"link_up", "dropped_link_overflow", "dropped_link_error_or_filtered",
+	"dropped_runt", "dropped_overrun", "dropped_no_small_buffer",
+	"dropped_no_big_buffer"
+};
+#define MYRI10GE_NET_STATS_LEN      21
+#define MYRI10GE_STATS_LEN  (sizeof(myri10ge_gstrings_stats) / ETH_GSTRING_LEN)
+
+static void
+myri10ge_get_strings(struct net_device *netdev, u32 stringset, u8 *data)
+{
+	switch (stringset) {
+	case ETH_SS_STATS:
+		memcpy(data, *myri10ge_gstrings_stats,
+		       sizeof(myri10ge_gstrings_stats));
+		break;
+	}
+}
+
+static int
+myri10ge_get_stats_count(struct net_device *netdev)
+{
+	return MYRI10GE_STATS_LEN;
+}
+
+static void
+myri10ge_get_ethtool_stats(struct net_device *netdev,
+			   struct ethtool_stats *stats, u64 *data)
+{
+	struct myri10ge_priv *mgp = netdev_priv(netdev);
+	int i;
+
+	for (i = 0; i < MYRI10GE_NET_STATS_LEN; i++)
+		data[i] = ((unsigned long *) &mgp->stats)[i];
+
+	data[i++] = (unsigned int)mgp->read_dma;
+	data[i++] = (unsigned int)mgp->write_dma;
+	data[i++] = (unsigned int)mgp->read_write_dma;
+	data[i++] = (unsigned int)mgp->serial_number;
+	data[i++] = (unsigned int)mgp->tx.pkt_start;
+	data[i++] = (unsigned int)mgp->tx.pkt_done;
+	data[i++] = (unsigned int)mgp->tx.req;
+	data[i++] = (unsigned int)mgp->tx.done;
+	data[i++] = (unsigned int)mgp->rx_small.cnt;
+	data[i++] = (unsigned int)mgp->rx_big.cnt;
+	data[i++] = (unsigned int)mgp->wake_queue;
+	data[i++] = (unsigned int)mgp->stop_queue;
+	data[i++] = (unsigned int)mgp->watchdog_resets;
+	data[i++] = (unsigned int)mgp->tx_linearized;
+	data[i++] = (unsigned int)ntohl(mgp->fw_stats->link_up);
+	data[i++] = (unsigned int)ntohl(mgp->fw_stats->dropped_link_overflow);
+	data[i++] = (unsigned int)ntohl(mgp->fw_stats->dropped_link_error_or_filtered);
+	data[i++] = (unsigned int)ntohl(mgp->fw_stats->dropped_runt);
+	data[i++] = (unsigned int)ntohl(mgp->fw_stats->dropped_overrun);
+	data[i++] = (unsigned int)ntohl(mgp->fw_stats->dropped_no_small_buffer);
+	data[i++] = (unsigned int)ntohl(mgp->fw_stats->dropped_no_big_buffer);
+}
+
+
+static struct ethtool_ops myri10ge_ethtool_ops = {
+	.get_settings 			= myri10ge_get_settings,
+	.set_settings 			= myri10ge_set_settings,
+	.get_drvinfo			= myri10ge_get_drvinfo,
+	.get_coalesce			= myri10ge_get_coalesce,
+	.set_coalesce			= myri10ge_set_coalesce,
+	.get_pauseparam			= myri10ge_get_pauseparam,
+	.set_pauseparam			= myri10ge_set_pauseparam,
+	.get_ringparam			= myri10ge_get_ringparam,
+	.get_rx_csum			= myri10ge_get_rx_csum,
+	.set_rx_csum			= myri10ge_set_rx_csum,
+	.get_tx_csum			= ethtool_op_get_tx_csum,
+	.set_tx_csum			= ethtool_op_set_tx_csum,
+	.get_sg				= ethtool_op_get_sg,
+	.set_sg				= ethtool_op_set_sg,
+#ifdef NETIF_F_TSO
+	.get_tso			= ethtool_op_get_tso,
+	.set_tso			= ethtool_op_set_tso,
+#endif
+	.get_strings			= myri10ge_get_strings,
+	.get_stats_count		= myri10ge_get_stats_count,
+	.get_ethtool_stats		= myri10ge_get_ethtool_stats
+};
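For anyone checking the receive buffer alignment logic in this patch (myri10ge_alloc_big and myri10ge_alloc_small), the arithmetic can be tried in user space. The following is only a minimal sketch and not driver code: the helper names bytes_to_boundary and crosses_4k are hypothetical, and plain integers stand in for skb->data.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of bytes to skip so that addr lands on
 * the next multiple of align.  Same idiom as (-data) & 4095 in the
 * patch; align is assumed to be a power of two. */
static inline uintptr_t bytes_to_boundary(uintptr_t addr, uintptr_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (0 - addr) & (align - 1);
}

/* Hypothetical helper: nonzero if the len bytes starting at addr end
 * in a different 4KB page than they start in, and addr itself is not
 * 4KB aligned.  Mirrors the test in myri10ge_alloc_small(). */
static inline int crosses_4k(uintptr_t addr, uintptr_t len)
{
	uintptr_t end = addr + len - 1;

	return ((end >> 12) != (addr >> 12)) && (addr & 4095) != 0;
}
```

A buffer that starts exactly on a 4KB boundary is accepted even when it is longer than 4KB, which matches the restriction described in the comment above myri10ge_alloc_big.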


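The DMA test in myri10ge_reset packs its result into cmd.data0: the upper 16 bits are the number of transfers completed and the lower 16 bits are the elapsed time in 0.5us ticks. A rough user-space sketch of the decoding follows. The helper name dma_test_mbs is hypothetical, and returning 0 for a zero tick count is a choice made for this sketch; the patch itself divides unconditionally.

```c
#include <stdint.h>

/* Hypothetical helper: turn one DMA test result word into MB/s.
 * result: upper 16 bits = transfers completed,
 *         lower 16 bits = elapsed time in 0.5us ticks.
 * len:    bytes moved per transfer.
 * Returns 0 when no time was reported, to avoid dividing by zero
 * (an assumption of this sketch, not taken from the patch). */
static inline uint32_t dma_test_mbs(uint32_t result, uint32_t len)
{
	uint32_t transfers = result >> 16;
	uint32_t ticks = result & 0xffff;

	if (ticks == 0)
		return 0;
	/* bytes / (ticks * 0.5us) == bytes * 2 / ticks, i.e. bytes per
	 * microsecond, which is the same number as MB/s */
	return (transfers * len * 2) / ticks;
}
```

For the combined read+write test the patch multiplies by 2 once more, since each transfer moves len bytes in both directions.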

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 5/6] myri10ge - Second half of the driver
  2006-05-10 21:22 [PATCH 0/6] myri10ge - Myri-10G Ethernet driver Brice Goglin
                   ` (3 preceding siblings ...)
  2006-05-10 21:40 ` [PATCH 4/6] myri10ge - First half of the driver Brice Goglin
@ 2006-05-10 21:42 ` Brice Goglin
  2006-05-10 22:22   ` Stephen Hemminger
  2006-05-10 21:43 ` [PATCH 6/6] myri10ge - Kconfig and Makefile Brice Goglin
  5 siblings, 1 reply; 28+ messages in thread
From: Brice Goglin @ 2006-05-10 21:42 UTC (permalink / raw)
  To: netdev, Andrew Morton; +Cc: LKML, Andrew J. Gallatin, brice

[PATCH 5/6] myri10ge - Second half of the driver

The second half of the myri10ge driver core.

Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Andrew J. Gallatin <gallatin@myri.com>

 myri10ge.c | 1540 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 1540 insertions(+)
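
Both halves of the driver index their rings with a free-running counter and a mask (idx = cnt & mask), and myri10ge_getbuf pushes receive descriptors to the NIC in groups of 8. A small user-space sketch of that indexing is below. The helper names are hypothetical, and the power-of-two check is an assumption this sketch makes explicit; the driver takes the ring sizes as reported by the firmware.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: index mask for a ring holding
 * ring_bytes / entry_bytes entries.  Masking only works when the
 * entry count is a power of two, so the sketch asserts it. */
static inline unsigned int ring_mask(size_t ring_bytes, size_t entry_bytes)
{
	size_t entries = ring_bytes / entry_bytes;

	assert(entries != 0 && (entries & (entries - 1)) == 0);
	return (unsigned int)(entries - 1);
}

/* Hypothetical helper: ring slot for a free-running counter. */
static inline unsigned int ring_slot(unsigned int cnt, unsigned int mask)
{
	return cnt & mask;
}

/* Hypothetical helper: nonzero when idx is the last slot of a group
 * of 8, the point at which 8 descriptors are copied at once. */
static inline int ends_batch_of_8(unsigned int idx)
{
	return (idx & 7) == 7;
}
```

Because the counter is never reset to the ring size, it can wrap around at the integer limit without disturbing the slot sequence.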

--- linux/drivers/net/myri10ge/myri10ge.c.old	2006-05-09 23:00:54.000000000 +0200
+++ linux/drivers/net/myri10ge/myri10ge.c	2006-05-09 23:00:54.000000000 +0200
@@ -1481,3 +1481,1543 @@ static struct ethtool_ops myri10ge_ethto
 	.get_stats_count		= myri10ge_get_stats_count,
 	.get_ethtool_stats		= myri10ge_get_ethtool_stats
 };
+
+static int
+myri10ge_open(struct net_device *dev)
+{
+	struct myri10ge_priv *mgp;
+	size_t bytes;
+	myri10ge_cmd_t cmd;
+	int tx_ring_size, rx_ring_size;
+	int tx_ring_entries, rx_ring_entries;
+	int i, status, big_pow2;
+
+	mgp = dev->priv;
+
+	if (mgp->running != MYRI10GE_ETH_STOPPED)
+		return -EBUSY;
+
+	mgp->running = MYRI10GE_ETH_STARTING;
+	status = myri10ge_reset(mgp);
+	if (status != 0) {
+		printk(KERN_ERR "myri10ge: %s: failed reset\n", dev->name);
+		mgp->running = MYRI10GE_ETH_STOPPED;
+		return -ENXIO;
+	}
+
+	/* decide what small buffer size to use.  For good TCP rx
+	 * performance, it is important to not receive 1514 byte
+	 * frames into jumbo buffers, as it confuses the socket buffer
+	 * accounting code, leading to drops and erratic performance.
+	 */
+
+	if (dev->mtu <= ETH_DATA_LEN) {
+		mgp->small_bytes = 128;			/* enough for a TCP header */
+	} else {
+		mgp->small_bytes = ETH_FRAME_LEN;	/* enough for an ETH_DATA_LEN frame */
+	}
+	/* Override the small buffer size? */
+	if (myri10ge_small_bytes > 0) {
+		mgp->small_bytes = myri10ge_small_bytes;
+	}
+
+	/* If the user sets an obscenely small MTU, adjust the small
+	 * bytes down to nearly nothing */
+	if (mgp->small_bytes >= (dev->mtu + ETH_HLEN))
+		mgp->small_bytes = 64;
+
+	/* get ring sizes */
+	status = myri10ge_send_cmd(mgp, MYRI10GE_MCP_CMD_GET_SEND_RING_SIZE, &cmd);
+	tx_ring_size = cmd.data0;
+	status |= myri10ge_send_cmd(mgp, MYRI10GE_MCP_CMD_GET_RX_RING_SIZE, &cmd);
+	rx_ring_size = cmd.data0;
+
+	/* get the lanai pointers to the send and receive rings */
+
+	status |= myri10ge_send_cmd(mgp, MYRI10GE_MCP_CMD_GET_SEND_OFFSET, &cmd);
+	mgp->tx.lanai = (mcp_kreq_ether_send_t __iomem *) (mgp->sram + cmd.data0);
+
+
+	status |= myri10ge_send_cmd(mgp, MYRI10GE_MCP_CMD_GET_SMALL_RX_OFFSET, &cmd);
+	mgp->rx_small.lanai = (mcp_kreq_ether_recv_t __iomem *) (mgp->sram + cmd.data0);
+
+	status |= myri10ge_send_cmd(mgp, MYRI10GE_MCP_CMD_GET_BIG_RX_OFFSET, &cmd);
+	mgp->rx_big.lanai = (mcp_kreq_ether_recv_t __iomem *) (mgp->sram + cmd.data0);
+
+	if (status != 0) {
+		printk(KERN_ERR "myri10ge: %s: failed to get ring sizes or locations\n",
+		      dev->name);
+		mgp->running = MYRI10GE_ETH_STOPPED;
+		return -ENXIO;
+	}
+
+	if (mgp->mtrr >= 0) {
+		mgp->tx.wc_fifo = mgp->sram + 0x200000;
+		mgp->rx_small.wc_fifo = mgp->sram + 0x300000;
+		mgp->rx_big.wc_fifo = mgp->sram + 0x340000;
+	} else {
+		mgp->tx.wc_fifo = NULL;
+		mgp->rx_small.wc_fifo = NULL;
+		mgp->rx_big.wc_fifo = NULL;
+	}
+
+	tx_ring_entries = tx_ring_size / sizeof (mcp_kreq_ether_send_t);
+	rx_ring_entries = rx_ring_size / sizeof (mcp_dma_addr_t);
+	mgp->tx.mask = tx_ring_entries - 1;
+	mgp->rx_small.mask = mgp->rx_big.mask = rx_ring_entries - 1;
+
+	/* allocate the host shadow rings */
+
+	bytes = 8 + (MYRI10GE_MCP_ETHER_MAX_SEND_DESC_TSO + 4)
+		* sizeof (*mgp->tx.req_list);
+	mgp->tx.req_bytes = kmalloc(bytes, GFP_KERNEL);
+	if (mgp->tx.req_bytes == NULL)
+		goto abort_with_nothing;
+	memset(mgp->tx.req_bytes, 0, bytes);
+
+	/* ensure req_list entries are aligned to 8 bytes */
+	mgp->tx.req_list = (mcp_kreq_ether_send_t *)
+		((unsigned long)(mgp->tx.req_bytes + 7) & ~7UL);
+
+	bytes = rx_ring_entries * sizeof (*mgp->rx_small.shadow);
+	mgp->rx_small.shadow = kmalloc(bytes, GFP_KERNEL);
+	if (mgp->rx_small.shadow == NULL)
+		goto abort_with_tx_req_bytes;
+	memset(mgp->rx_small.shadow, 0, bytes);
+
+	bytes = rx_ring_entries * sizeof (*mgp->rx_big.shadow);
+	mgp->rx_big.shadow = kmalloc(bytes, GFP_KERNEL);
+	if (mgp->rx_big.shadow == NULL)
+		goto abort_with_rx_small_shadow;
+	memset(mgp->rx_big.shadow, 0, bytes);
+
+	/* allocate the host info rings */
+
+	bytes = tx_ring_entries * sizeof (*mgp->tx.info);
+	mgp->tx.info = kmalloc(bytes, GFP_KERNEL);
+	if (mgp->tx.info == NULL)
+		goto abort_with_rx_big_shadow;
+	memset(mgp->tx.info, 0, bytes);
+
+	bytes = rx_ring_entries * sizeof (*mgp->rx_small.info);
+	mgp->rx_small.info = kmalloc(bytes, GFP_KERNEL);
+	if (mgp->rx_small.info == NULL)
+		goto abort_with_tx_info;
+	memset(mgp->rx_small.info, 0, bytes);
+
+	bytes = rx_ring_entries * sizeof (*mgp->rx_big.info);
+	mgp->rx_big.info = kmalloc(bytes, GFP_KERNEL);
+	if (mgp->rx_big.info == NULL)
+		goto abort_with_rx_small_info;
+	memset(mgp->rx_big.info, 0, bytes);
+
+	/* Fill the receive rings */
+	for (i = 0; i <= mgp->rx_small.mask; i++) {
+		status = myri10ge_getbuf(&mgp->rx_small, mgp->pdev,
+					 mgp->small_bytes, i);
+		if (status) {
+			printk(KERN_ERR "myri10ge: %s: alloced only %d small bufs\n",
+			       dev->name, i);
+			goto abort_with_rx_small_ring;
+		}
+	}
+
+	for (i = 0; i <= mgp->rx_big.mask; i++) {
+		status = myri10ge_getbuf(&mgp->rx_big, mgp->pdev, dev->mtu + ETH_HLEN, i);
+		if (status) {
+			printk(KERN_ERR "myri10ge: %s: alloced only %d big bufs\n",
+			       dev->name, i);
+			goto abort_with_rx_big_ring;
+		}
+	}
+
+	/* Firmware needs the big buffer size as a power of 2.  Lie and
+	 * tell it the buffer is larger, because we only use 1
+	 * buffer/pkt, and the mtu will prevent overruns.
+	 */
+	big_pow2 = dev->mtu + ETH_HLEN + MYRI10GE_MCP_ETHER_PAD;
+	while ((big_pow2 & (big_pow2 - 1)) != 0)
+		big_pow2++;
+
+	/* now give the firmware the buffer sizes and the MTU */
+	cmd.data0 = dev->mtu + ETH_HLEN + VLAN_HLEN;
+	status = myri10ge_send_cmd(mgp, MYRI10GE_MCP_CMD_SET_MTU, &cmd);
+	cmd.data0 = mgp->small_bytes;
+	status |= myri10ge_send_cmd(mgp, MYRI10GE_MCP_CMD_SET_SMALL_BUFFER_SIZE, &cmd);
+	cmd.data0 = big_pow2;
+	status |= myri10ge_send_cmd(mgp, MYRI10GE_MCP_CMD_SET_BIG_BUFFER_SIZE, &cmd);
+	if (status) {
+		printk(KERN_ERR "myri10ge: %s: Couldn't set buffer sizes\n",
+		       dev->name);
+		goto abort_with_rx_big_ring;
+	}
+
+	cmd.data0 = MYRI10GE_LOWPART_TO_U32(mgp->fw_stats_bus);
+	cmd.data1 = MYRI10GE_HIGHPART_TO_U32(mgp->fw_stats_bus);
+	status = myri10ge_send_cmd(mgp, MYRI10GE_MCP_CMD_SET_STATS_DMA, &cmd);
+	if (status) {
+		printk(KERN_ERR "myri10ge: %s: Couldn't set stats DMA\n",
+		       dev->name);
+		goto abort_with_rx_big_ring;
+	}
+
+	mgp->link_state = -1;
+	mgp->rdma_tags_available = 15;
+
+	if (myri10ge_napi) /* must happen prior to any irq */
+		netif_poll_enable(mgp->dev);
+
+	status = myri10ge_send_cmd(mgp, MYRI10GE_MCP_CMD_ETHERNET_UP, &cmd);
+	if (status) {
+		printk(KERN_ERR "myri10ge: %s: Couldn't bring up link\n",
+		       dev->name);
+		goto abort_with_rx_big_ring;
+	}
+
+	mgp->wake_queue = 0;
+	mgp->stop_queue = 0;
+	mgp->running = MYRI10GE_ETH_RUNNING;
+	mgp->watchdog_timer.expires =
+		jiffies + myri10ge_watchdog_timeout * HZ;
+	add_timer(&mgp->watchdog_timer);
+	netif_wake_queue(dev);
+	return 0;
+
+abort_with_rx_big_ring:
+	for (i = 0; i <= mgp->rx_big.mask; i++) {
+		if (mgp->rx_big.info[i].skb != NULL)
+			dev_kfree_skb_any(mgp->rx_big.info[i].skb);
+		if (pci_unmap_len(&mgp->rx_big.info[i], len)) {
+			pci_unmap_single(mgp->pdev,
+					 pci_unmap_addr(&mgp->rx_big.info[i], bus),
+					 pci_unmap_len(&mgp->rx_big.info[i], len),
+					 PCI_DMA_FROMDEVICE);
+		}
+	}
+
+abort_with_rx_small_ring:
+	for (i = 0; i <= mgp->rx_small.mask; i++) {
+		if (mgp->rx_small.info[i].skb != NULL)
+			dev_kfree_skb_any(mgp->rx_small.info[i].skb);
+		if (pci_unmap_len(&mgp->rx_small.info[i], len)) {
+			pci_unmap_single(mgp->pdev,
+					 pci_unmap_addr(&mgp->rx_small.info[i], bus),
+					 pci_unmap_len(&mgp->rx_small.info[i], len),
+					 PCI_DMA_FROMDEVICE);
+		}
+	}
+	kfree(mgp->rx_big.info);
+
+abort_with_rx_small_info:
+	kfree(mgp->rx_small.info);
+
+abort_with_tx_info:
+	kfree(mgp->tx.info);
+
+abort_with_rx_big_shadow:
+	kfree(mgp->rx_big.shadow);
+
+abort_with_rx_small_shadow:
+	kfree(mgp->rx_small.shadow);
+
+abort_with_tx_req_bytes:
+	kfree(mgp->tx.req_bytes);
+	mgp->tx.req_bytes = NULL;
+	mgp->tx.req_list = NULL;
+
+abort_with_nothing:
+	mgp->running = MYRI10GE_ETH_STOPPED;
+	return -ENOMEM;
+
+}
+static int
+myri10ge_close(struct net_device *dev)
+{
+	struct myri10ge_priv *mgp;
+	struct sk_buff *skb;
+	myri10ge_tx_buf_t *tx;
+	int status, i, old_down_cnt, len, idx;
+	myri10ge_cmd_t cmd;
+
+	mgp = dev->priv;
+
+	if (mgp->running != MYRI10GE_ETH_RUNNING)
+		return 0;
+
+	if (mgp->tx.req_bytes == NULL)
+		return 0;
+
+	del_timer_sync(&mgp->watchdog_timer);
+	mgp->running = MYRI10GE_ETH_STOPPING;
+	if (myri10ge_napi)
+		netif_poll_disable(mgp->dev);
+	netif_carrier_off(dev);
+	netif_stop_queue(dev);
+	old_down_cnt = mgp->down_cnt;
+	mb();
+	status = myri10ge_send_cmd(mgp, MYRI10GE_MCP_CMD_ETHERNET_DOWN, &cmd);
+	if (status) {
+		printk(KERN_ERR "myri10ge: %s: Couldn't bring down link\n",
+		       dev->name);
+	}
+	set_current_state(TASK_UNINTERRUPTIBLE);
+	if (old_down_cnt == mgp->down_cnt)
+		schedule_timeout(HZ);
+	set_current_state(TASK_RUNNING);
+	if (old_down_cnt == mgp->down_cnt) {
+		printk(KERN_ERR "myri10ge: %s never got down irq\n",
+		       dev->name);
+	}
+	netif_tx_disable(dev);
+
+	for (i = 0; i <= mgp->rx_big.mask; i++) {
+		if (mgp->rx_big.info[i].skb != NULL)
+			dev_kfree_skb_any(mgp->rx_big.info[i].skb);
+		if (pci_unmap_len(&mgp->rx_big.info[i], len)) {
+			pci_unmap_single(mgp->pdev,
+					 pci_unmap_addr(&mgp->rx_big.info[i], bus),
+					 pci_unmap_len(&mgp->rx_big.info[i], len),
+					 PCI_DMA_FROMDEVICE);
+		}
+	}
+
+	for (i = 0; i <= mgp->rx_small.mask; i++) {
+		if (mgp->rx_small.info[i].skb != NULL)
+			dev_kfree_skb_any(mgp->rx_small.info[i].skb);
+		if (pci_unmap_len(&mgp->rx_small.info[i], len)) {
+			pci_unmap_single(mgp->pdev,
+					 pci_unmap_addr(&mgp->rx_small.info[i], bus),
+					 pci_unmap_len(&mgp->rx_small.info[i], len),
+					 PCI_DMA_FROMDEVICE);
+		}
+	}
+
+	tx = &mgp->tx;
+	while (tx->done != tx->req) {
+		idx = tx->done & tx->mask;
+		skb = tx->info[idx].skb;
+
+		/* Mark as free */
+		tx->info[idx].skb = NULL;
+		tx->done++;
+		len = pci_unmap_len(&tx->info[idx], len);
+		pci_unmap_len_set(&tx->info[idx], len, 0);
+		if (skb) {
+			mgp->stats.tx_dropped++;
+			dev_kfree_skb_any(skb);
+			if (len)
+				pci_unmap_single(mgp->pdev,
+						 pci_unmap_addr(&tx->info[idx], bus),
+						 len, PCI_DMA_TODEVICE);
+		} else {
+			if (len)
+				pci_unmap_page(mgp->pdev,
+					       pci_unmap_addr(&tx->info[idx], bus),
+					       len, PCI_DMA_TODEVICE);
+		}
+	}
+	kfree(mgp->rx_big.info);
+
+	kfree(mgp->rx_small.info);
+
+	kfree(mgp->tx.info);
+
+	kfree(mgp->rx_big.shadow);
+
+	kfree(mgp->rx_small.shadow);
+
+	kfree(mgp->tx.req_bytes);
+	mgp->tx.req_bytes = NULL;
+	mgp->tx.req_list = NULL;
+	mgp->running = MYRI10GE_ETH_STOPPED;
+	return 0;
+}
+
+/* copy an array of mcp_kreq_ether_send_t's to the mcp.  Copy
+ * backwards one at a time and handle ring wraps */
+
+static inline void
+myri10ge_submit_req_backwards(myri10ge_tx_buf_t *tx,
+			      mcp_kreq_ether_send_t *src, int cnt)
+{
+	int idx, starting_slot;
+	starting_slot = tx->req;
+	while (cnt > 1) {
+		cnt--;
+		idx = (starting_slot + cnt) & tx->mask;
+		myri10ge_pio_copy(&tx->lanai[idx],
+				  &src[cnt], sizeof(*src));
+		mb();
+	}
+}
+
+/*
+ * copy an array of mcp_kreq_ether_send_t's to the mcp.  Copy
+ * at most 32 bytes at a time, so as to avoid involving the software
+ * pio handler in the nic.   We re-write the first segment's flags
+ * to mark them valid only after writing the entire chain.
+ */
+
+static inline void
+myri10ge_submit_req(myri10ge_tx_buf_t *tx, mcp_kreq_ether_send_t *src,
+		    int cnt)
+{
+	int idx, i;
+	uint32_t __iomem *dst_ints;
+	uint32_t *src_ints;
+	mcp_kreq_ether_send_t __iomem *dstp, *dst;
+	mcp_kreq_ether_send_t *srcp;
+	uint8_t last_flags;
+
+	idx = tx->req & tx->mask;
+
+	last_flags = src->flags;
+	src->flags = 0;
+	mb();
+	dst = dstp = &tx->lanai[idx];
+	srcp = src;
+
+	if ((idx + cnt) < tx->mask) {
+		for (i = 0; i < (cnt - 1); i += 2) {
+			myri10ge_pio_copy(dstp, srcp, 2 * sizeof(*src));
+			mb();	/* force write every 32 bytes */
+			srcp += 2;
+			dstp += 2;
+		}
+	} else {
+		/* submit all but the first request, and ensure
+		   that it is submitted below */
+		myri10ge_submit_req_backwards(tx, src, cnt);
+		i = 0;
+	}
+	if (i < cnt) {
+		/* submit the first request */
+		myri10ge_pio_copy(dstp, srcp, sizeof(*src));
+		mb(); /* barrier before setting valid flag */
+	}
+
+	/* re-write the last 32-bits with the valid flags */
+	src->flags = last_flags;
+	src_ints = (uint32_t *) src;
+	src_ints += 3;
+	dst_ints = (uint32_t __iomem *) dst;
+	dst_ints += 3;
+	*(uint32_t __force *) dst_ints = *src_ints;
+	tx->req += cnt;
+	mb();
+}
+
+static inline void
+myri10ge_submit_req_wc(myri10ge_tx_buf_t *tx,
+		       mcp_kreq_ether_send_t *src, int cnt)
+{
+	tx->req += cnt;
+	mb();
+	while (cnt >= 4) {
+		myri10ge_pio_copy((void __iomem *) tx->wc_fifo, src, 64);
+		mb();
+		src += 4;
+		cnt -= 4;
+	}
+	if (cnt > 0) {
+		/* pad it to 64 bytes.  The src is 64 bytes bigger than it
+		 * needs to be so that we don't overrun it */
+		myri10ge_pio_copy((void __iomem *) tx->wc_fifo + (cnt<<18), src, 64);
+		mb();
+	}
+}
+
+#ifdef NETIF_F_TSO
+static inline unsigned long
+myri10ge_tcpend(struct sk_buff *skb)
+{
+	struct iphdr *ip;
+	int iphlen, tcplen;
+	struct tcphdr *tcp;
+
+	ip = (struct iphdr *) ((char *) skb->data + ETH_HLEN);
+	iphlen = ip->ihl << 2;
+	tcp = (struct tcphdr *) ((char *) ip + iphlen);
+	tcplen = tcp->doff << 2;
+	return (tcplen + iphlen + ETH_HLEN);
+}
+#endif
+
+static inline void
+myri10ge_csum_fixup(struct sk_buff *skb, int cksum_offset,
+		    int pseudo_hdr_offset)
+{
+	int csum;
+	uint16_t *csum_ptr;
+
+
+	csum = skb_checksum(skb, cksum_offset,
+			    skb->len - cksum_offset, 0);
+	csum_ptr = (uint16_t *) (skb->h.raw + skb->csum);
+	if (!pskb_may_pull(skb, pseudo_hdr_offset)) {
+		printk(KERN_ERR "myri10ge: can't pull skb %d\n",
+		       pseudo_hdr_offset);
+		return;
+	}
+	*csum_ptr = csum_fold(csum);
+	/* need to fixup IPv4 UDP packets according to RFC768 */
+	if (unlikely(*csum_ptr == 0 &&
+		     skb->protocol == htons(ETH_P_IP) &&
+		     skb->nh.iph->protocol == IPPROTO_UDP)) {
+		*csum_ptr = 0xffff;
+	}
+}
+
+/*
+ * Transmit a packet.  The packet must be split so that no single
+ * segment crosses mgp->tx.boundary, which makes counting segments
+ * up front tricky.  Rather than try to count them, we give up
+ * (and stop the queue) if fewer descriptors are currently available
+ * than a reasonably fragmented packet could need.  If we run
+ * out of segments while preparing a packet for DMA, we just linearize
+ * it and try again.
+ */
+
+static int
+myri10ge_xmit(struct sk_buff *skb, struct net_device *dev)
+{
+	struct myri10ge_priv *mgp = dev->priv;
+	mcp_kreq_ether_send_t *req;
+	myri10ge_tx_buf_t *tx = &mgp->tx;
+	struct skb_frag_struct *frag;
+	dma_addr_t bus;
+	uint32_t low, high_swapped;
+	unsigned int len;
+	int idx, last_idx, avail, frag_cnt, frag_idx, count, mss, max_segments;
+	uint16_t pseudo_hdr_offset, cksum_offset;
+	int cum_len, seglen, boundary, rdma_count;
+	uint8_t flags, odd_flag;
+
+again:
+	req = tx->req_list;
+	avail = tx->mask - 1 - (tx->req - tx->done);
+
+	mss = 0;
+	max_segments = MYRI10GE_MCP_ETHER_MAX_SEND_DESC;
+
+#ifdef NETIF_F_TSO
+	if (skb->len > (dev->mtu + ETH_HLEN)) {
+		mss = skb_shinfo(skb)->tso_size;
+		if (mss != 0)
+			max_segments = MYRI10GE_MCP_ETHER_MAX_SEND_DESC_TSO;
+	}
+#endif /*NETIF_F_TSO */
+
+	if ((unlikely(avail < max_segments))) {
+		/* we are out of transmit resources */
+		mgp->stop_queue++;
+		netif_stop_queue(dev);
+		return 1;
+	}
+
+	/* Setup checksum offloading, if needed */
+	cksum_offset = 0;
+	pseudo_hdr_offset = 0;
+	odd_flag = 0;
+	flags = (MYRI10GE_MCP_ETHER_FLAGS_NO_TSO |
+		 MYRI10GE_MCP_ETHER_FLAGS_FIRST);
+	if (likely(skb->ip_summed == CHECKSUM_HW)) {
+		cksum_offset = (skb->h.raw - skb->data);
+		pseudo_hdr_offset = (skb->h.raw + skb->csum) - skb->data;
+		/* If the headers are excessively large, then we must
+		 * fall back to a software checksum */
+		if (unlikely(cksum_offset > 255 ||
+			     pseudo_hdr_offset > 127)) {
+			myri10ge_csum_fixup(skb, cksum_offset, pseudo_hdr_offset);
+			cksum_offset = 0;
+			pseudo_hdr_offset = 0;
+		} else {
+			pseudo_hdr_offset = htons(pseudo_hdr_offset);
+			odd_flag = MYRI10GE_MCP_ETHER_FLAGS_ALIGN_ODD;
+			flags |= MYRI10GE_MCP_ETHER_FLAGS_CKSUM;
+		}
+	}
+
+	cum_len = 0;
+
+#ifdef NETIF_F_TSO
+	if (mss) { /* TSO */
+		/* this removes any CKSUM flag from before */
+		flags = (MYRI10GE_MCP_ETHER_FLAGS_TSO_HDR |
+			 MYRI10GE_MCP_ETHER_FLAGS_FIRST);
+
+		/* negative cum_len signifies to the
+		 * send loop that we are still in the
+		 * header portion of the TSO packet.
+		 * TSO header must be at most 134 bytes long */
+		cum_len = -myri10ge_tcpend(skb);
+
+		/* for TSO, pseudo_hdr_offset holds mss.
+		 * The firmware figures out where to put
+		 * the checksum by parsing the header. */
+		pseudo_hdr_offset = htons(mss);
+	} else
+#endif /*NETIF_F_TSO */
+	/* Mark small packets, and pad out tiny packets */
+	if (skb->len <= MYRI10GE_MCP_ETHER_SEND_SMALL_SIZE) {
+		flags |= MYRI10GE_MCP_ETHER_FLAGS_SMALL;
+
+		/* pad frames to at least ETH_ZLEN bytes */
+		if (unlikely(skb->len < ETH_ZLEN)) {
+			skb = skb_padto(skb, ETH_ZLEN);
+			if (skb == NULL) {
+				/* The packet is gone, so we must
+				   return 0 */
+				mgp->stats.tx_dropped += 1;
+				return 0;
+			}
+			/* adjust the len to account for the zero pad
+			   so that the nic can know how long it is */
+			skb->len = ETH_ZLEN;
+		}
+	}
+
+	/* map the skb for DMA */
+	len = skb->len - skb->data_len;
+	idx = tx->req & tx->mask;
+	tx->info[idx].skb = skb;
+	bus = pci_map_single(mgp->pdev, skb->data, len, PCI_DMA_TODEVICE);
+	pci_unmap_addr_set(&tx->info[idx], bus, bus);
+	pci_unmap_len_set(&tx->info[idx], len, len);
+
+	frag_cnt = skb_shinfo(skb)->nr_frags;
+	frag_idx = 0;
+	count = 0;
+	rdma_count = 0;
+
+	/* "rdma_count" is the number of RDMAs belonging to the
+	 * current packet BEFORE the current send request. For
+	 * non-TSO packets, this is equal to "count".
+	 * For TSO packets, rdma_count needs to be reset
+	 * to 0 after a segment cut.
+	 *
+	 * The rdma_count field of the send request is
+	 * the number of RDMAs of the packet starting at
+	 * that request. For TSO send requests with one or more cuts
+	 * in the middle, this is the number of RDMAs starting
+	 * after the last cut in the request. All previous
+	 * segments before the last cut implicitly have 1 RDMA.
+	 *
+	 * Since the number of RDMAs is not known beforehand,
+	 * it must be filled-in retroactively - after each
+	 * segmentation cut or at the end of the entire packet.
+	 */
+
+	while (1) {
+		/* Break the SKB or Fragment up into pieces which
+		   do not cross mgp->tx.boundary */
+		low = MYRI10GE_LOWPART_TO_U32(bus);
+		high_swapped = htonl(MYRI10GE_HIGHPART_TO_U32(bus));
+		while (len) {
+			uint8_t flags_next;
+			int cum_len_next;
+
+			if (unlikely(count == max_segments))
+				goto abort_linearize;
+
+			boundary = (low + tx->boundary) & ~(tx->boundary - 1);
+			seglen = boundary - low;
+			if (seglen > len)
+				seglen = len;
+			flags_next = flags & ~MYRI10GE_MCP_ETHER_FLAGS_FIRST;
+			cum_len_next = cum_len + seglen;
+#ifdef NETIF_F_TSO
+			if (mss) { /* TSO */
+				(req-rdma_count)->rdma_count = rdma_count + 1;
+
+				if (likely(cum_len >= 0)) { /* payload */
+					int next_is_first, chop;
+
+					chop = (cum_len_next>mss);
+					cum_len_next = cum_len_next % mss;
+					next_is_first = (cum_len_next == 0);
+					flags |= chop *
+						MYRI10GE_MCP_ETHER_FLAGS_TSO_CHOP;
+					flags_next |= next_is_first *
+						MYRI10GE_MCP_ETHER_FLAGS_FIRST;
+					rdma_count |= -(chop | next_is_first);
+					rdma_count += chop & !next_is_first;
+				} else if (likely(cum_len_next >= 0)) { /* header ends */
+					int small;
+
+					rdma_count = -1;
+					cum_len_next = 0;
+					seglen = -cum_len;
+					small = (mss <= MYRI10GE_MCP_ETHER_SEND_SMALL_SIZE);
+					flags_next = MYRI10GE_MCP_ETHER_FLAGS_TSO_PLD |
+						MYRI10GE_MCP_ETHER_FLAGS_FIRST |
+						(small * MYRI10GE_MCP_ETHER_FLAGS_SMALL);
+				}
+			}
+#endif /* NETIF_F_TSO */
+			req->addr_high = high_swapped;
+			req->addr_low = htonl(low);
+			req->pseudo_hdr_offset = pseudo_hdr_offset;
+			req->pad = 0;	/* complete solid 16-byte block; does this matter? */
+			req->rdma_count = 1;
+			req->length = htons(seglen);
+			req->cksum_offset = cksum_offset;
+			req->flags = flags | ((cum_len & 1) * odd_flag);
+
+			low += seglen;
+			len -= seglen;
+			cum_len = cum_len_next;
+			flags = flags_next;
+			req++;
+			count++;
+			rdma_count++;
+			if (unlikely(cksum_offset > seglen))
+				cksum_offset -= seglen;
+			else
+				cksum_offset = 0;
+		}
+		if (frag_idx == frag_cnt)
+			break;
+
+		/* map next fragment for DMA */
+		idx = (count + tx->req) & tx->mask;
+		frag = &skb_shinfo(skb)->frags[frag_idx];
+		frag_idx++;
+		len = frag->size;
+		bus = pci_map_page(mgp->pdev, frag->page, frag->page_offset,
+				   len, PCI_DMA_TODEVICE);
+		pci_unmap_addr_set(&tx->info[idx], bus, bus);
+		pci_unmap_len_set(&tx->info[idx], len, len);
+	}
+
+	(req-rdma_count)->rdma_count = rdma_count;
+#ifdef NETIF_F_TSO
+	if (mss) {
+		do {
+			req--;
+			req->flags |= MYRI10GE_MCP_ETHER_FLAGS_TSO_LAST;
+		} while (!(req->flags & (MYRI10GE_MCP_ETHER_FLAGS_TSO_CHOP |
+					 MYRI10GE_MCP_ETHER_FLAGS_FIRST)));
+	}
+#endif
+	idx = ((count - 1) + tx->req) & tx->mask;
+	tx->info[idx].last = 1;
+	if (tx->wc_fifo == NULL)
+		myri10ge_submit_req(tx, tx->req_list, count);
+	else
+		myri10ge_submit_req_wc(tx, tx->req_list, count);
+	tx->pkt_start++;
+	if ((avail - count) < MYRI10GE_MCP_ETHER_MAX_SEND_DESC) {
+		mgp->stop_queue++;
+		netif_stop_queue(dev);
+	}
+	dev->trans_start = jiffies;
+	return 0;
+
+
+abort_linearize:
+	/* Free any DMA resources we've alloced and clear out the skb
+	 * slot so as to not trip up assertions, and to avoid a
+	 * double-free if linearizing fails */
+
+	last_idx = (idx + 1) & tx->mask;
+	idx = tx->req & tx->mask;
+	tx->info[idx].skb = NULL;
+	do {
+		len = pci_unmap_len(&tx->info[idx], len);
+		if (len) {
+			if (tx->info[idx].skb != NULL) {
+				pci_unmap_single(mgp->pdev,
+						 pci_unmap_addr(&tx->info[idx], bus),
+						 len, PCI_DMA_TODEVICE);
+			} else {
+				pci_unmap_page(mgp->pdev,
+					       pci_unmap_addr(&tx->info[idx], bus),
+					       len, PCI_DMA_TODEVICE);
+			}
+			pci_unmap_len_set(&tx->info[idx], len, 0);
+			tx->info[idx].skb = NULL;
+		}
+		idx = (idx + 1) & tx->mask;
+	} while (idx != last_idx);
+	if (skb_shinfo(skb)->tso_size) {
+		printk(KERN_ERR "myri10ge: %s: TSO but wanted to linearize?!?!?\n",
+		       mgp->dev->name);
+		goto drop;
+	}
+
+	if (skb_linearize(skb, GFP_ATOMIC)) {
+		goto drop;
+	}
+	mgp->tx_linearized++;
+	goto again;
+
+drop:
+	dev_kfree_skb_any(skb);
+	mgp->stats.tx_dropped += 1;
+	return 0;
+
+
+}
+
+static struct net_device_stats *
+myri10ge_get_stats(struct net_device *dev)
+{
+	struct myri10ge_priv *mgp = dev->priv;
+	return &mgp->stats;
+}
+
+static void
+myri10ge_set_multicast_list(struct net_device *dev)
+{
+	myri10ge_change_promisc(dev->priv, dev->flags & IFF_PROMISC);
+}
+
+
+static int
+myri10ge_set_mac_address(struct net_device *dev, void *addr)
+{
+	struct sockaddr *sa = (struct sockaddr *) addr;
+	struct myri10ge_priv *mgp = dev->priv;
+	int status;
+
+	if (!is_valid_ether_addr(sa->sa_data))
+		return -EADDRNOTAVAIL;
+
+	status = myri10ge_update_mac_address(mgp, sa->sa_data);
+	if (status != 0) {
+		printk(KERN_ERR "myri10ge: %s: changing mac address failed with %d\n",
+		       dev->name, status);
+		return status;
+	}
+
+	/* change the dev structure */
+	memcpy(dev->dev_addr, sa->sa_data, ETH_ALEN);
+	return 0;
+}
+static int
+myri10ge_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
+{
+	return -EOPNOTSUPP;
+}
+
+static int
+myri10ge_init(struct net_device *dev)
+{
+	return 0;
+}
+
+static int
+myri10ge_change_mtu(struct net_device *dev, int new_mtu)
+{
+	struct myri10ge_priv *mgp = dev->priv;
+	int error = 0;
+
+	if ((new_mtu < 68) || (ETH_HLEN + new_mtu > MYRI10GE_MAX_ETHER_MTU)) {
+		printk(KERN_ERR "myri10ge: %s: new mtu (%d) is not valid\n",
+		       dev->name, new_mtu);
+		return -EINVAL;
+	}
+	printk(KERN_INFO "myri10ge: %s: changing mtu from %d to %d\n",
+	       dev->name, dev->mtu, new_mtu);
+	if (mgp->running) {
+		/* if we change the mtu on an active device, we must
+		 * reset the device so the firmware sees the change */
+		myri10ge_close(dev);
+		dev->mtu = new_mtu;
+		myri10ge_open(dev);
+	} else {
+		dev->mtu = new_mtu;
+	}
+	return error;
+}
+
+#if defined(CONFIG_X86) || defined(CONFIG_X86_64)
+
+/*
+ * Enable ECRC to align PCI-E Completion packets.  Rather than using
+ * normal pci config space writes, we must map the Nvidia config space
+ * ourselves.  This is because on opteron/nvidia class machines the
+ * 0xe000000 mapping is handled by the nvidia chipset, which means
+ * the internal PCI device (the on-chip northbridge), or the amd-8131
+ * bridge and things behind them are not visible by this method.
+ */
+
+static void
+myri10ge_enable_ecrc(struct myri10ge_priv *mgp)
+{
+	struct pci_dev *bridge = mgp->pdev->bus->self;
+	struct device * dev = &mgp->pdev->dev;
+	unsigned cap;
+	unsigned err_cap;
+	int ret;
+
+	if (!myri10ge_ecrc_enable || !bridge)
+		return;
+
+	cap = pci_find_ext_capability(bridge, PCI_EXT_CAP_ID_ERR);
+	/* nvidia ext cap is not always linked in ext cap chain */
+	if (!cap
+	    && bridge->vendor == PCI_VENDOR_ID_NVIDIA
+	    && bridge->device == PCI_DEVICE_ID_NVIDIA_NFORCE_CK804_PCIE)
+		cap = 0x160;
+
+	if (!cap)
+		return;
+
+	ret = pci_read_config_dword(bridge, cap + PCI_ERR_CAP, &err_cap);
+	if (ret) {
+		dev_err(dev, "failed reading ext-conf-space of %s\n",
+			pci_name(bridge));
+		dev_err(dev, "\t pci=nommconf in use? "
+			"or buggy/incomplete/absent acpi MCFG attr?\n");
+		return;
+	}
+	if (!(err_cap & PCI_ERR_CAP_ECRC_GENC))
+		return;
+
+	err_cap |= PCI_ERR_CAP_ECRC_GENE;
+	pci_write_config_dword(bridge, cap + PCI_ERR_CAP, err_cap);
+	dev_info(dev,
+		 "Enabled ECRC on upstream bridge %s\n",
+		 pci_name(bridge));
+	mgp->tx.boundary = 4096;
+	mgp->fw_name = myri10ge_fw_aligned;
+}
+#endif /* defined(CONFIG_X86) || defined(CONFIG_X86_64) */
+
+/*
+ * The Lanai Z8E PCI-E interface achieves higher Read-DMA throughput
+ * when the PCI-E Completion packets are aligned on an 8-byte
+ * boundary.  Some PCI-E chip sets always align Completion packets; on
+ * the ones that do not, the alignment can be enforced by enabling
+ * ECRC generation (if supported).
+ *
+ * When PCI-E Completion packets are not aligned, it is actually more
+ * efficient to limit Read-DMA transactions to 2KB, rather than 4KB.
+ *
+ * If the driver can neither enable ECRC nor verify that it has
+ * already been enabled, then it must use a firmware image which works
+ * around unaligned completion packets (myri10ge_ethp_z8e.dat), and it
+ * should also ensure that it never gives the device a Read-DMA which is
+ * larger than 2KB by setting the tx.boundary to 2KB.  If ECRC is
+ * enabled, then the driver should use the aligned (myri10ge_eth_z8e.dat)
+ * firmware image, and set tx.boundary to 4KB.
+ */
+
+static void
+myri10ge_select_firmware(struct myri10ge_priv *mgp)
+{
+	struct pci_dev *bridge = mgp->pdev->bus->self;
+
+	mgp->tx.boundary = 2048;
+	mgp->fw_name = myri10ge_fw_unaligned;
+
+	if (myri10ge_force_firmware == 0) {
+#if defined(CONFIG_X86) || defined(CONFIG_X86_64)
+		myri10ge_enable_ecrc(mgp);
+#endif
+		/* Check to see if the upstream bridge is known to
+		 * provide aligned completions */
+		if (bridge
+		    /* ServerWorks HT2000/HT1000 */
+		    && bridge->vendor == PCI_VENDOR_ID_SERVERWORKS
+		    && bridge->device == PCI_DEVICE_ID_SERVERWORKS_HT2000_PCIE) {
+			dev_info(&mgp->pdev->dev,
+				 "Assuming aligned completions (0x%x:0x%x)\n",
+				 bridge->vendor, bridge->device);
+			mgp->tx.boundary = 4096;
+			mgp->fw_name = myri10ge_fw_aligned;
+		}
+	} else {
+		if (myri10ge_force_firmware == 1) {
+			dev_info(&mgp->pdev->dev,
+				 "Assuming aligned completions (forced)\n");
+			mgp->tx.boundary = 4096;
+			mgp->fw_name = myri10ge_fw_aligned;
+		} else {
+			dev_info(&mgp->pdev->dev,
+				 "Assuming unaligned completions (forced)\n");
+			mgp->tx.boundary = 2048;
+			mgp->fw_name = myri10ge_fw_unaligned;
+		}
+	}
+	if (myri10ge_fw_name != NULL) {
+		dev_info(&mgp->pdev->dev, "overriding firmware to %s\n",
+			 myri10ge_fw_name);
+		mgp->fw_name = myri10ge_fw_name;
+	}
+}
+
+
+static void
+myri10ge_save_state(struct myri10ge_priv *mgp)
+{
+	struct pci_dev *pdev =	mgp->pdev;
+	int cap;
+
+	pci_save_state(pdev);
+	/* now save PCIe and MSI state that Linux will not
+	   save for us */
+	cap = pci_find_capability(pdev, PCI_CAP_ID_EXP);
+	pci_read_config_dword(pdev, cap + PCI_EXP_DEVCTL, &mgp->devctl);
+	cap = pci_find_capability(pdev, PCI_CAP_ID_MSI);
+	pci_read_config_word(pdev, cap + PCI_MSI_FLAGS, &mgp->msi_flags);
+	pci_read_config_dword(pdev, cap + PCI_MSI_ADDRESS_LO,
+			      &mgp->msi_addr_low);
+	pci_read_config_dword(pdev, cap + PCI_MSI_ADDRESS_HI,
+			      &mgp->msi_addr_high);
+	pci_read_config_word(pdev, cap + PCI_MSI_DATA_32,
+			     &mgp->msi_data_32);
+	pci_read_config_word(pdev, cap + PCI_MSI_DATA_64,
+			     &mgp->msi_data_64);
+}
+
+static void
+myri10ge_restore_state(struct myri10ge_priv *mgp)
+{
+	struct pci_dev *pdev =	mgp->pdev;
+	int cap;
+
+	pci_restore_state(pdev);
+	/* restore PCIe and MSI state that Linux will not */
+	cap = pci_find_capability(pdev, PCI_CAP_ID_EXP);
+	pci_write_config_dword(pdev, cap + PCI_EXP_DEVCTL, mgp->devctl);
+	cap = pci_find_capability(pdev, PCI_CAP_ID_MSI);
+	pci_write_config_word(pdev, cap + PCI_MSI_FLAGS, mgp->msi_flags);
+	pci_write_config_dword(pdev, cap + PCI_MSI_ADDRESS_LO,
+			       mgp->msi_addr_low);
+	pci_write_config_dword(pdev, cap + PCI_MSI_ADDRESS_HI,
+			       mgp->msi_addr_high);
+	pci_write_config_word(pdev, cap + PCI_MSI_DATA_32,
+			      mgp->msi_data_32);
+	pci_write_config_word(pdev, cap + PCI_MSI_DATA_64,
+			      mgp->msi_data_64);
+}
+
+#ifdef CONFIG_PM
+
+static int
+myri10ge_suspend(struct pci_dev *pdev, pm_message_t state)
+{
+	struct myri10ge_priv *mgp;
+	struct net_device *netdev;
+
+	mgp = (struct myri10ge_priv *) pci_get_drvdata(pdev);
+	if (mgp == NULL)
+		return -EINVAL;
+	netdev = mgp->dev;
+
+	if (netif_running(netdev)) {
+		printk(KERN_INFO "myri10ge: closing %s\n", netdev->name);
+		myri10ge_close(netdev);
+	}
+	myri10ge_dummy_rdma(mgp, 0);
+	free_irq(pdev->irq, mgp);
+#ifdef CONFIG_PCI_MSI
+	if (mgp->msi_enabled)
+		pci_disable_msi(pdev);
+#endif
+	netif_device_detach(netdev);
+	myri10ge_save_state(mgp);
+	pci_disable_device(pdev);
+	pci_set_power_state(pdev, pci_choose_state(pdev, state));
+	return 0;
+}
+
+static int
+myri10ge_resume(struct pci_dev *pdev)
+{
+	struct myri10ge_priv *mgp;
+	struct net_device *netdev;
+	int status;
+
+	mgp = (struct myri10ge_priv *) pci_get_drvdata(pdev);
+	if (mgp == NULL)
+		return -EINVAL;
+	netdev = mgp->dev;
+	pci_set_power_state(pdev, 0);  /* zeros conf space as a side effect */
+	msleep(5);	/* give card time to respond */
+	myri10ge_restore_state(mgp);
+	pci_enable_device(pdev);
+	pci_set_master(pdev);
+
+#ifdef CONFIG_PCI_MSI
+	if (myri10ge_use_msi(pdev)) {
+		status = pci_enable_msi(pdev);
+		if (status != 0) {
+			dev_err(&pdev->dev,
+				"Error %d setting up MSI; falling back to xPIC\n",
+				status);
+
+		} else {
+			mgp->msi_enabled = 1;
+		}
+	}
+#endif
+	if (myri10ge_napi) {
+		status = request_irq(pdev->irq, myri10ge_napi_intr, SA_SHIRQ,
+				     netdev->name, mgp);
+	} else {
+
+		status = request_irq(pdev->irq, myri10ge_intr, SA_SHIRQ,
+				     netdev->name, mgp);
+	}
+	if (status != 0) {
+		dev_err(&pdev->dev, "failed to allocate IRQ\n");
+		goto abort_with_msi;
+	}
+
+	myri10ge_reset(mgp);
+	myri10ge_dummy_rdma(mgp, mgp->tx.boundary != 4096);
+
+	/* Save configuration space to be restored if the
+	   nic resets due to a parity error */
+	myri10ge_save_state(mgp);
+
+	netif_device_attach(netdev);
+	if (netif_running(netdev))
+		myri10ge_open(netdev);
+	return 0;
+
+abort_with_msi:
+#ifdef CONFIG_PCI_MSI
+	if (mgp->msi_enabled)
+		pci_disable_msi(pdev);
+#endif
+	return -EIO;
+
+}
+
+#endif /* CONFIG_PM */
+
+static uint32_t
+myri10ge_read_reboot(struct myri10ge_priv *mgp)
+{
+	struct pci_dev *pdev = mgp->pdev;
+	int vs = mgp->vendor_specific_offset;
+	uint32_t reboot;
+
+	/* enter read32 mode */
+	pci_write_config_byte(pdev, vs + 0x10, 0x3);
+
+	/* read REBOOT_STATUS (0xfffffff0) */
+	pci_write_config_dword(pdev, vs + 0x18, 0xfffffff0);
+	pci_read_config_dword(pdev, vs + 0x14, &reboot);
+	return reboot;
+}
+
+static void
+myri10ge_watchdog(void *arg)
+{
+	struct myri10ge_priv *mgp = arg;
+	uint32_t reboot;
+	int status;
+	uint16_t cmd, vendor;
+
+	mgp->watchdog_resets++;
+	pci_read_config_word(mgp->pdev, PCI_COMMAND, &cmd);
+	if ((cmd & PCI_COMMAND_MASTER) == 0) {
+		/* Bus master DMA disabled?  Check to see
+		 * if the card rebooted due to a parity error.
+		 * For now, just report it */
+		reboot = myri10ge_read_reboot(mgp);
+		printk(KERN_ERR "myri10ge: %s: NIC rebooted (0x%x), resetting\n",
+		       mgp->dev->name, reboot);
+		/*
+		 * A rebooted nic will come back with config space as
+		 * it was after power was applied to PCIe bus.
+		 * Attempt to restore config space which was saved
+		 * when the driver was loaded, or the last time the
+		 * nic was resumed from power saving mode.
+		 */
+		myri10ge_restore_state(mgp);
+	} else {
+		/* if we get back -1's from our slot, perhaps somebody
+		   powered off our card.  Don't try to reset it in
+		   this case */
+		if (cmd == 0xffff) {
+			pci_read_config_word(mgp->pdev, PCI_VENDOR_ID, &vendor);
+			if (vendor == 0xffff) {
+				printk(KERN_ERR "myri10ge: %s: device disappeared!\n",
+				       mgp->dev->name);
+				return;
+			}
+		}
+		/* Perhaps it is a software error.  Try to reset */
+
+		printk(KERN_ERR "myri10ge: %s: device timeout, resetting\n",
+		       mgp->dev->name);
+		printk("myri10ge: %s: %d %d %d %d %d\n", mgp->dev->name,
+		       mgp->tx.req, mgp->tx.done, mgp->tx.pkt_start,
+		       mgp->tx.pkt_done,
+		       (int)ntohl(mgp->fw_stats->send_done_count));
+		set_current_state(TASK_UNINTERRUPTIBLE);
+		schedule_timeout(HZ*2);
+		set_current_state(TASK_RUNNING);
+		printk("myri10ge: %s: %d %d %d %d %d\n", mgp->dev->name,
+		       mgp->tx.req, mgp->tx.done, mgp->tx.pkt_start,
+		       mgp->tx.pkt_done,
+		       (int)ntohl(mgp->fw_stats->send_done_count));
+	}
+	myri10ge_close(mgp->dev);
+	status = myri10ge_load_firmware(mgp);
+	if (status != 0) {
+		printk(KERN_ERR "myri10ge: %s: failed to load firmware\n",
+		       mgp->dev->name);
+		return;
+	}
+	myri10ge_open(mgp->dev);
+}
+
+static void
+myri10ge_watchdog_timer(unsigned long arg)
+{
+	struct myri10ge_priv *mgp;
+
+	mgp = (struct myri10ge_priv *) arg;
+	if (mgp->tx.req != mgp->tx.done &&
+	    mgp->tx.done == mgp->watchdog_tx_done) {
+		/* nic seems like it might be stuck.. */
+		schedule_work(&mgp->watchdog_work);
+	} else {
+		/* rearm timer */
+		mod_timer(&mgp->watchdog_timer,
+			  jiffies + myri10ge_watchdog_timeout * HZ);
+	}
+	mgp->watchdog_tx_done = mgp->tx.done;
+}
+
+static int
+myri10ge_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
+{
+	struct net_device *netdev;
+	struct myri10ge_priv *mgp;
+	struct device *dev = &pdev->dev;
+	size_t bytes;
+	int i;
+	int status = -ENXIO;
+	int cap;
+	u16 val;
+
+	netdev = alloc_etherdev(sizeof(*mgp));
+	if (netdev == NULL) {
+		dev_err(dev, "Could not allocate ethernet device\n");
+		return -ENOMEM;
+	}
+
+	mgp = netdev_priv(netdev);
+	memset(mgp, 0, sizeof (*mgp));
+	mgp->dev = netdev;
+	mgp->pdev = pdev;
+	mgp->csum_flag = MYRI10GE_MCP_ETHER_FLAGS_CKSUM;
+	mgp->pause = myri10ge_flow_control;
+	mgp->intr_coal_delay = myri10ge_intr_coal_delay;
+
+	spin_lock_init(&mgp->cmd_lock);
+	if (pci_enable_device(pdev)) {
+		dev_err(&pdev->dev, "pci_enable_device call failed\n");
+		status = -ENODEV;
+		goto abort_with_netdev;
+	}
+	myri10ge_select_firmware(mgp);
+
+	/* Find the vendor-specific cap so we can check
+	   the reboot register later on */
+	mgp->vendor_specific_offset
+		= pci_find_capability(pdev, PCI_CAP_ID_VNDR);
+
+	/* Set our max read request to 4KB */
+	cap = pci_find_capability(pdev, PCI_CAP_ID_EXP);
+	if (cap < 64) {
+		dev_err(&pdev->dev, "Bad PCI_CAP_ID_EXP location %d\n", cap);
+		goto abort_with_netdev;
+	}
+	status = pci_read_config_word(pdev, cap + PCI_EXP_DEVCTL, &val);
+	if (status != 0) {
+		dev_err(&pdev->dev, "Error %d reading PCI_EXP_DEVCTL\n", status);
+		goto abort_with_netdev;
+	}
+	val = (val & ~PCI_EXP_DEVCTL_READRQ) | (5 << 12);
+	status = pci_write_config_word(pdev, cap + PCI_EXP_DEVCTL, val);
+	if (status != 0) {
+		dev_err(&pdev->dev, "Error %d writing PCI_EXP_DEVCTL\n", status);
+		goto abort_with_netdev;
+	}
+
+	pci_set_master(pdev);
+	status = pci_set_dma_mask(pdev, (dma_addr_t)~0ULL);
+	if (status != 0) {
+		dev_err(&pdev->dev, "64-bit pci address mask was refused, trying 32-bit\n");
+		status = pci_set_dma_mask(pdev, (dma_addr_t)0xffffffffULL);
+	}
+	if (status != 0) {
+		dev_err(&pdev->dev, "Error %d setting DMA mask\n", status);
+		goto abort_with_netdev;
+	}
+	mgp->cmd = (mcp_cmd_response_t *)
+		pci_alloc_consistent(pdev, sizeof (*mgp->cmd), &mgp->cmd_bus);
+	if (mgp->cmd == NULL) {
+		goto abort_with_netdev;
+	}
+
+	mgp->fw_stats = (mcp_irq_data_t *)
+		pci_alloc_consistent(pdev, sizeof (*mgp->fw_stats),
+				     &mgp->fw_stats_bus);
+	if (mgp->fw_stats == NULL) {
+		goto abort_with_cmd;
+	}
+
+	strcpy(netdev->name, "eth%d");
+	mgp->board_span = pci_resource_len(pdev, 0);
+	mgp->iomem_base = pci_resource_start(pdev, 0);
+	mgp->mtrr = -1;
+#ifdef CONFIG_MTRR
+	mgp->mtrr = mtrr_add(mgp->iomem_base, mgp->board_span,
+			     MTRR_TYPE_WRCOMB, 1);
+#endif
+	/* Hack.  need to get rid of these magic numbers */
+	mgp->sram_size = 2*1024*1024 - (2*(48*1024)+(32*1024)) - 0x100;
+	if (mgp->sram_size > mgp->board_span) {
+		dev_err(&pdev->dev, "board span %ld bytes too small\n",
+		       mgp->board_span);
+		goto abort_with_wc;
+	}
+	mgp->sram = ioremap(mgp->iomem_base, mgp->board_span);
+	if (mgp->sram == NULL) {
+		dev_err(&pdev->dev, "ioremap failed for %ld bytes at 0x%lx\n",
+		       mgp->board_span, mgp->iomem_base);
+		status = -ENXIO;
+		goto abort_with_wc;
+	}
+	memcpy_fromio(mgp->eeprom_strings,
+		      mgp->sram + mgp->sram_size - MYRI10GE_EEPROM_STRINGS_SIZE,
+		      MYRI10GE_EEPROM_STRINGS_SIZE);
+	memset(mgp->eeprom_strings + MYRI10GE_EEPROM_STRINGS_SIZE - 2, 0, 2);
+	status = myri10ge_read_mac_addr(mgp);
+	if (status) {
+		goto abort_with_ioremap;
+	}
+	for (i = 0; i < 6; i++) {
+		netdev->dev_addr[i] = mgp->mac_addr[i];
+	}
+	/* allocate rx done ring */
+	bytes = myri10ge_max_intr_slots * sizeof (*mgp->rx_done.entry);
+	mgp->rx_done.entry = (mcp_slot_t *)
+		pci_alloc_consistent(pdev, bytes, &mgp->rx_done.bus);
+	if (mgp->rx_done.entry == NULL)
+		goto abort_with_ioremap;
+	memset(mgp->rx_done.entry, 0, bytes);
+
+	status = myri10ge_load_firmware(mgp);
+	if (status != 0) {
+		dev_err(&pdev->dev, "failed to load firmware\n");
+		goto abort_with_rx_done;
+	}
+
+	status = myri10ge_reset(mgp);
+	if (status != 0) {
+		dev_err(&pdev->dev, "failed reset\n");
+		goto abort_with_firmware;
+	}
+
+#ifdef CONFIG_PCI_MSI
+	if (myri10ge_use_msi(pdev)) {
+		status = pci_enable_msi(pdev);
+		if (status != 0) {
+			dev_err(&pdev->dev,
+				"Error %d setting up MSI; falling back to xPIC\n",
+				status);
+		} else {
+			mgp->msi_enabled = 1;
+		}
+	}
+#endif
+
+	if (myri10ge_napi) {
+		status = request_irq(pdev->irq, myri10ge_napi_intr, SA_SHIRQ,
+				     netdev->name, mgp);
+	} else {
+		status = request_irq(pdev->irq, myri10ge_intr, SA_SHIRQ,
+				     netdev->name, mgp);
+	}
+	if (status != 0) {
+		dev_err(&pdev->dev, "failed to allocate IRQ\n");
+		goto abort_with_firmware;
+	}
+
+	pci_set_drvdata(pdev, mgp);
+	if ((myri10ge_initial_mtu + ETH_HLEN) > MYRI10GE_MAX_ETHER_MTU)
+		myri10ge_initial_mtu = MYRI10GE_MAX_ETHER_MTU - ETH_HLEN;
+	if ((myri10ge_initial_mtu + ETH_HLEN) < 68)
+		myri10ge_initial_mtu = 68;
+	netdev->mtu = myri10ge_initial_mtu;
+	netdev->open = myri10ge_open;
+	netdev->stop = myri10ge_close;
+	netdev->hard_start_xmit = myri10ge_xmit;
+	netdev->get_stats = myri10ge_get_stats;
+	netdev->base_addr = mgp->iomem_base;
+	netdev->irq = pdev->irq;
+	netdev->init = myri10ge_init;
+	netdev->change_mtu = myri10ge_change_mtu;
+	netdev->set_multicast_list = myri10ge_set_multicast_list;
+	netdev->set_mac_address = myri10ge_set_mac_address;
+	netdev->do_ioctl = myri10ge_ioctl;
+	netdev->features = NETIF_F_SG | NETIF_F_HW_CSUM | NETIF_F_HIGHDMA;
+#if 0
+	/* TSO can be enabled via ethtool -K eth1 tso on */
+#ifdef NETIF_F_TSO
+	netdev->features |= NETIF_F_TSO;
+#endif
+#endif
+	if (myri10ge_napi) {
+		netdev->poll = myri10ge_poll;
+		netdev->weight = myri10ge_napi_weight;
+	}
+
+	/* Save configuration space to be restored if the
+	 * nic resets due to a parity error */
+	myri10ge_save_state(mgp);
+
+	/* Setup the watchdog timer */
+	init_timer(&mgp->watchdog_timer);
+	mgp->watchdog_timer.data = (unsigned long)mgp;
+	mgp->watchdog_timer.function = myri10ge_watchdog_timer;
+
+	SET_ETHTOOL_OPS(netdev, &myri10ge_ethtool_ops);
+	INIT_WORK(&mgp->watchdog_work, myri10ge_watchdog, mgp);
+	status = register_netdev(netdev);
+	if (status != 0) {
+		dev_err(&pdev->dev, "register_netdev failed: %d\n", status);
+		goto abort_with_irq;
+	}
+
+	printk("myri10ge: %s: %s IRQ %d, tx bndry %d, fw %s, WC %s\n",
+	       netdev->name,  (mgp->msi_enabled ? "MSI" : "xPIC"),
+	       pdev->irq, mgp->tx.boundary, mgp->fw_name,
+	       (mgp->mtrr >= 0 ? "Enabled" : "Disabled"));
+
+	return 0;
+
+abort_with_irq:
+	free_irq(pdev->irq, mgp);
+#ifdef CONFIG_PCI_MSI
+	if (mgp->msi_enabled)
+		pci_disable_msi(pdev);
+#endif
+
+abort_with_firmware:
+	myri10ge_dummy_rdma(mgp, 0);
+
+abort_with_rx_done:
+	bytes = myri10ge_max_intr_slots * sizeof (*mgp->rx_done.entry);
+	pci_free_consistent(pdev, bytes, mgp->rx_done.entry, mgp->rx_done.bus);
+
+abort_with_ioremap:
+	iounmap((void __iomem *) mgp->sram);
+
+abort_with_wc:
+#ifdef CONFIG_MTRR
+	if (mgp->mtrr >= 0)
+		mtrr_del(mgp->mtrr, mgp->iomem_base, mgp->board_span);
+#endif
+	pci_free_consistent(pdev, sizeof (*mgp->fw_stats),
+			    mgp->fw_stats, mgp->fw_stats_bus);
+
+abort_with_cmd:
+	pci_free_consistent(pdev, sizeof (*mgp->cmd), mgp->cmd, mgp->cmd_bus);
+
+abort_with_netdev:
+
+	free_netdev(netdev);
+	return status;
+}
+
+/****************************************************************
+ * myri10ge_remove
+ *
+ * Does what is necessary to shut down one Myrinet device. Called
+ *   once for each Myrinet card by the kernel when a module is
+ *   unloaded.
+ ****************************************************************/
+
+static void
+myri10ge_remove(struct pci_dev *pdev)
+{
+	struct myri10ge_priv *mgp;
+	struct net_device *netdev;
+	size_t bytes;
+
+	mgp = (struct myri10ge_priv *) pci_get_drvdata(pdev);
+	if (mgp == NULL)
+		return;
+
+	flush_scheduled_work();
+	netdev = mgp->dev;
+	unregister_netdev(netdev);
+	free_irq(pdev->irq, mgp);
+#ifdef CONFIG_PCI_MSI
+	if (mgp->msi_enabled)
+		pci_disable_msi(pdev);
+#endif
+
+	myri10ge_dummy_rdma(mgp, 0);
+
+
+	bytes = myri10ge_max_intr_slots * sizeof (*mgp->rx_done.entry);
+	pci_free_consistent(pdev, bytes, mgp->rx_done.entry, mgp->rx_done.bus);
+
+	iounmap((void __iomem *) mgp->sram);
+
+#ifdef CONFIG_MTRR
+	if (mgp->mtrr >= 0)
+		mtrr_del(mgp->mtrr, mgp->iomem_base, mgp->board_span);
+#endif
+	pci_free_consistent(pdev, sizeof (*mgp->fw_stats),
+			    mgp->fw_stats, mgp->fw_stats_bus);
+
+	pci_free_consistent(pdev, sizeof (*mgp->cmd), mgp->cmd, mgp->cmd_bus);
+
+	free_netdev(netdev);
+	pci_set_drvdata(pdev, NULL);
+}
+
+
+#define MYRI10GE_PCI_VENDOR_MYRICOM 	0x14c1
+#define MYRI10GE_PCI_DEVICE_Z8E 	0x0008
+static struct pci_device_id myri10ge_pci_tbl[] = {
+	{MYRI10GE_PCI_VENDOR_MYRICOM, MYRI10GE_PCI_DEVICE_Z8E,
+	 PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0},
+	{0,},
+};
+
+static struct pci_driver myri10ge_driver = {
+	.name = "myri10ge",
+	.probe = myri10ge_probe,
+	.remove = myri10ge_remove,
+	.id_table = myri10ge_pci_tbl,
+#ifdef CONFIG_PM
+	.suspend = myri10ge_suspend,
+	.resume = myri10ge_resume,
+#endif
+};
+
+static int
+myri10ge_init_module(void)
+{
+	int rc;
+	printk("%s: Version %s\n", myri10ge_driver.name,
+	       MYRI10GE_VERSION_STR);
+	rc = pci_register_driver(&myri10ge_driver);
+	return rc < 0 ? rc : 0;
+}
+
+static void
+myri10ge_cleanup_module(void)
+{
+	pci_unregister_driver(&myri10ge_driver);
+}
+
+module_init(myri10ge_init_module);
+module_exit(myri10ge_cleanup_module);
+




* [PATCH 6/6] myri10ge - Kconfig and Makefile
  2006-05-10 21:22 [PATCH 0/6] myri10ge - Myri-10G Ethernet driver Brice Goglin
                   ` (4 preceding siblings ...)
  2006-05-10 21:42 ` [PATCH 5/6] myri10ge - Second " Brice Goglin
@ 2006-05-10 21:43 ` Brice Goglin
  2006-05-13 18:51   ` Adrian Bunk
  5 siblings, 1 reply; 28+ messages in thread
From: Brice Goglin @ 2006-05-10 21:43 UTC (permalink / raw)
  To: netdev, Andrew Morton; +Cc: LKML, Andrew J. Gallatin

[PATCH 6/6] myri10ge - Kconfig and Makefile

Add Kconfig and Makefile support for the myri10ge driver.

Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Andrew J. Gallatin <gallatin@myri.com>

 Kconfig           |   16 ++++++++++++++++
 Makefile          |    1 +
 myri10ge/Makefile |    5 +++++
 3 files changed, 22 insertions(+)

--- linux-mm/drivers/net/Kconfig.old	2006-04-10 03:44:01.000000000 -0700
+++ linux-mm/drivers/net/Kconfig	2006-04-18 03:49:11.000000000 -0700
@@ -2327,6 +2327,23 @@ config S2IO_NAPI
 
 	  If in doubt, say N.
 
+config MYRI10GE
+	tristate "Myricom Myri-10G Ethernet support"
+	depends on PCI
+	select FW_LOADER
+	select CRC32
+	---help---
+	  This driver supports Myricom Myri-10G Dual Protocol interface in
+	  Ethernet mode. If the eeprom on your board is not recent enough,
+	  you will need a newer firmware image.
+	  You may get this image, or more information, at:
+
+	  <http://www.myri.com/Myri-10G/>
+
+	  To compile this driver as a module, choose M here and read
+	  <file:Documentation/networking/net-modules.txt>.  The module
+	  will be called myri10ge.
+
 endmenu
 
 source "drivers/net/tokenring/Kconfig"
--- linux-mm/drivers/net/Makefile.old	2006-04-08 04:49:53.000000000 -0700
+++ linux-mm/drivers/net/Makefile	2006-04-21 08:10:27.000000000 -0700
@@ -192,6 +192,7 @@ obj-$(CONFIG_R8169) += r8169.o
 obj-$(CONFIG_AMD8111_ETH) += amd8111e.o
 obj-$(CONFIG_IBMVETH) += ibmveth.o
 obj-$(CONFIG_S2IO) += s2io.o
+obj-$(CONFIG_MYRI10GE) += myri10ge/
 obj-$(CONFIG_SMC91X) += smc91x.o
 obj-$(CONFIG_SMC911X) += smc911x.o
 obj-$(CONFIG_DM9000) += dm9000.o
--- /dev/null	2006-04-21 00:45:09.064430000 -0700
+++ linux-mm/drivers/net/myri10ge/Makefile	2006-04-21 08:14:21.000000000 -0700
@@ -0,0 +1,5 @@
+#
+# Makefile for the Myricom Myri-10G ethernet driver
+#
+
+obj-$(CONFIG_MYRI10GE) += myri10ge.o




* Re: [PATCH 2/6] myri10ge - Add missing PCI IDs
  2006-05-10 21:35 ` [PATCH 2/6] myri10ge - Add missing PCI IDs Brice Goglin
@ 2006-05-10 21:52   ` Andi Kleen
  0 siblings, 0 replies; 28+ messages in thread
From: Andi Kleen @ 2006-05-10 21:52 UTC (permalink / raw)
  To: Brice Goglin; +Cc: netdev, Andrew Morton, LKML, Andrew J. Gallatin

On Wednesday 10 May 2006 23:35, Brice Goglin wrote:
> [PATCH 2/6] myri10ge - Add missing PCI IDs
> 
> Add nVidia nForce CK804 PCI-E bridge and 
> ServerWorks HT2000 PCI-E bridge IDs.
> They will be used by the myri10ge driver.

That's a bad sign. It means you have code in your driver 
that should be somewhere else.

-Andi


* Re: [PATCH 3/6] myri10ge - Driver header files
  2006-05-10 21:36 ` [PATCH 3/6] myri10ge - Driver header files Brice Goglin
@ 2006-05-10 21:57   ` Roland Dreier
  2006-05-10 22:00   ` Stephen Hemminger
  2006-05-10 22:02   ` Francois Romieu
  2 siblings, 0 replies; 28+ messages in thread
From: Roland Dreier @ 2006-05-10 21:57 UTC (permalink / raw)
  To: Brice Goglin; +Cc: netdev, Andrew Morton, LKML, Andrew J. Gallatin

A few quick obvious comments:

 > +#ifdef MYRI10GE_MCP
 > +typedef signed char          int8_t;
 > +typedef signed short        int16_t;
 > +typedef signed int          int32_t;
 > +typedef signed long long    int64_t;
 > +typedef unsigned char       uint8_t;
 > +typedef unsigned short     uint16_t;
 > +typedef unsigned int       uint32_t;
 > +typedef unsigned long long uint64_t;
 > +#endif

What's this doing?  If you must use uintXX_t types the kernel already
has them.  Although it would be nicer to use u8, u16, etc.

 > +/* 8 Bytes */
 > +typedef struct
 > +{
 > +  uint32_t high;
 > +  uint32_t low;
 > +} mcp_dma_addr_t;

All of these typedefs are unnecessary.  In the kernel it's strongly
preferred to just do

struct mcp_dma_addr {
	u32 high;
	u32 low;
};

and then use "struct mcp_dma_addr" instead of "mcp_dma_addr_t".

Similarly for enums.  Just use "enum whatever" instead of "whatever_t".
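
As a rough illustration of the convention suggested above, here is a minimal userspace sketch. It assumes <stdint.h> types as stand-ins for the kernel's u32, and the enum and both helper functions are hypothetical names invented for the example, not anything from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Plain struct tag, no typedef. Kernel code would use u32 from
 * <linux/types.h>; uint32_t is only a userspace stand-in here. */
struct mcp_dma_addr {
	uint32_t high;
	uint32_t low;
};

/* Hypothetical enum, named for illustration only. */
enum mcp_cmd_status {
	MCP_CMD_OK = 0,
	MCP_CMD_UNKNOWN,
};

/* Split a 64-bit bus address into its two 32-bit halves
 * (hypothetical helper, host byte order). */
static struct mcp_dma_addr mcp_dma_addr_from_u64(uint64_t bus)
{
	struct mcp_dma_addr addr = {
		.high = (uint32_t)(bus >> 32),
		.low = (uint32_t)(bus & 0xffffffffu),
	};
	return addr;
}

/* Callers spell out "enum mcp_cmd_status" rather than a _t alias. */
static enum mcp_cmd_status mcp_check(uint32_t result)
{
	return result == 0 ? MCP_CMD_OK : MCP_CMD_UNKNOWN;
}
```

Every use site then reads "struct mcp_dma_addr" or "enum mcp_cmd_status", so the kind of type is visible without looking up a typedef.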

BTW, indentation is busted in these headers too (two spaces instead of a tab).

 - R.


* Re: [PATCH 3/6] myri10ge - Driver header files
  2006-05-10 21:36 ` [PATCH 3/6] myri10ge - Driver header files Brice Goglin
  2006-05-10 21:57   ` Roland Dreier
@ 2006-05-10 22:00   ` Stephen Hemminger
  2006-05-10 22:02   ` Francois Romieu
  2 siblings, 0 replies; 28+ messages in thread
From: Stephen Hemminger @ 2006-05-10 22:00 UTC (permalink / raw)
  To: Brice Goglin; +Cc: netdev, Andrew Morton, LKML, Andrew J. Gallatin

On Wed, 10 May 2006 23:36:18 +0200
Brice Goglin <brice@myri.com> wrote:

> [PATCH 3/6] myri10ge - Driver header files
> 
> myri10ge driver header files.
> myri10ge_mcp.h is the generic header, while myri10ge_mcp_gen_header.h
> is automatically generated from our firmware image.

Then clean it up after the auto generation.
Auto generated code still gets maintained by humans.

> Signed-off-by: Brice Goglin <brice@myri.com>
> Signed-off-by: Andrew J. Gallatin <gallatin@myri.com>
> 
>  myri10ge_mcp.h            |  233 ++++++++++++++++++++++++++++++++++++++++++++++
>  myri10ge_mcp_gen_header.h |   73 ++++++++++++++
>  2 files changed, 306 insertions(+)
> 
> --- /dev/null	2006-04-21 00:45:09.064430000 -0700
> +++ linux-mm/drivers/net/myri10ge/myri10ge_mcp.h	2006-04-21 08:20:59.000000000 -0700
> @@ -0,0 +1,233 @@
> +#ifndef _myri10ge_mcp_h
> +#define _myri10ge_mcp_h
> +
> +#define MYRI10GE_MCP_MAJOR	1
> +#define MYRI10GE_MCP_MINOR	4
> +

Major/Minor for what? You don't have a character device.

> +#ifdef MYRI10GE_MCP
> +typedef signed char          int8_t;
> +typedef signed short        int16_t;
> +typedef signed int          int32_t;
> +typedef signed long long    int64_t;
> +typedef unsigned char       uint8_t;
> +typedef unsigned short     uint16_t;
> +typedef unsigned int       uint32_t;
> +typedef unsigned long long uint64_t;
> +#endif

Use u8 u16 u32


> +/* 8 Bytes */
> +typedef struct
> +{
> +  uint32_t high;
> +  uint32_t low;
> +} mcp_dma_addr_t;

Run this through scripts/Lindent and get indentation right

> +/* 16 Bytes */
> +typedef struct
> +{
> +  uint16_t checksum;
> +  uint16_t length;
> +} mcp_slot_t;
> +
> +/* 64 Bytes */
> +typedef struct
> +{
> +  uint32_t cmd;
> +  uint32_t data0;	/* will be low portion if data > 32 bits */
> +  /* 8 */
> +  uint32_t data1;	/* will be high portion if data > 32 bits */
> +  uint32_t data2;	/* currently unused.. */
> +  /* 16 */
> +  mcp_dma_addr_t response_addr;
> +  /* 24 */
> +  uint8_t pad[40];
> +} mcp_cmd_t;
> +
> +/* 8 Bytes */
> +typedef struct
> +{
> +  uint32_t data;
> +  uint32_t result;
> +} mcp_cmd_response_t;
> +
> +
> +
> +/* 
> +   flags used in mcp_kreq_ether_send_t:
> +
> +   The SMALL flag is only needed in the first segment. It is raised
> +   for packets that are total less or equal 512 bytes.
> +
> +   The CKSUM flag must be set in all segments.
> +
> +   The PADDED flags is set if the packet needs to be padded, and it
> +   must be set for all segments.
> +
> +   The  MYRI10GE_MCP_ETHER_FLAGS_ALIGN_ODD must be set if the cumulative
> +   length of all previous segments was odd.
> +*/
> +
> +
> +#define MYRI10GE_MCP_ETHER_FLAGS_SMALL      0x1
> +#define MYRI10GE_MCP_ETHER_FLAGS_TSO_HDR    0x1
> +#define MYRI10GE_MCP_ETHER_FLAGS_FIRST      0x2
> +#define MYRI10GE_MCP_ETHER_FLAGS_ALIGN_ODD  0x4
> +#define MYRI10GE_MCP_ETHER_FLAGS_CKSUM      0x8
> +#define MYRI10GE_MCP_ETHER_FLAGS_TSO_LAST   0x8
> +#define MYRI10GE_MCP_ETHER_FLAGS_NO_TSO     0x10
> +#define MYRI10GE_MCP_ETHER_FLAGS_TSO_CHOP   0x10
> +#define MYRI10GE_MCP_ETHER_FLAGS_TSO_PLD    0x20
> +
> +#define MYRI10GE_MCP_ETHER_SEND_SMALL_SIZE  1520
> +#define MYRI10GE_MCP_ETHER_MAX_MTU          9400
> +
> +typedef union mcp_pso_or_cumlen
> +{
> +  uint16_t pseudo_hdr_offset;
> +  uint16_t cum_len;
> +} mcp_pso_or_cumlen_t;
> +
> +#define	MYRI10GE_MCP_ETHER_MAX_SEND_DESC 12
> +#define MYRI10GE_MCP_ETHER_PAD	    2
> +
> +/* 16 Bytes */
> +typedef struct
> +{
> +  uint32_t addr_high;
> +  uint32_t addr_low;
> +  uint16_t pseudo_hdr_offset;
> +  uint16_t length;
> +  uint8_t  pad;
> +  uint8_t  rdma_count;
> +  uint8_t  cksum_offset; 	/* where to start computing cksum */
> +  uint8_t  flags;	       	/* as defined above */
> +} mcp_kreq_ether_send_t;
> +
> +/* 8 Bytes */
> +typedef struct
> +{
> +  uint32_t addr_high;
> +  uint32_t addr_low;
> +} mcp_kreq_ether_recv_t;
> +
> +
> +/* Commands */
> +
> +#define MYRI10GE_MCP_CMD_OFFSET 0xf80000
> +
> +typedef enum {
> +  MYRI10GE_MCP_CMD_NONE = 0,
> +  /* Reset the mcp, it is left in a safe state, waiting
> +     for the driver to set all its parameters */
> +  MYRI10GE_MCP_CMD_RESET,
> +
> +  /* get the version number of the current firmware..
> +     (may be available in the eeprom strings..? */
> +  MYRI10GE_MCP_GET_MCP_VERSION,
> +
> +
> +  /* Parameters which must be set by the driver before it can
> +     issue MYRI10GE_MCP_CMD_ETHERNET_UP. They persist until the next
> +     MYRI10GE_MCP_CMD_RESET is issued */
> +
> +  MYRI10GE_MCP_CMD_SET_INTRQ_DMA,
> +  MYRI10GE_MCP_CMD_SET_BIG_BUFFER_SIZE,	/* in bytes, power of 2 */
> +  MYRI10GE_MCP_CMD_SET_SMALL_BUFFER_SIZE,	/* in bytes */
> +  
> +
> +  /* Parameters which refer to lanai SRAM addresses where the 
> +     driver must issue PIO writes for various things */
> +
> +  MYRI10GE_MCP_CMD_GET_SEND_OFFSET,
> +  MYRI10GE_MCP_CMD_GET_SMALL_RX_OFFSET,
> +  MYRI10GE_MCP_CMD_GET_BIG_RX_OFFSET,
> +  MYRI10GE_MCP_CMD_GET_IRQ_ACK_OFFSET,
> +  MYRI10GE_MCP_CMD_GET_IRQ_DEASSERT_OFFSET,
> +
> +  /* Parameters which refer to rings stored on the MCP,
> +     and whose size is controlled by the mcp */
> +
> +  MYRI10GE_MCP_CMD_GET_SEND_RING_SIZE,	/* in bytes */
> +  MYRI10GE_MCP_CMD_GET_RX_RING_SIZE,		/* in bytes */
> +
> +  /* Parameters which refer to rings stored in the host,
> +     and whose size is controlled by the host.  Note that
> +     all must be physically contiguous and must contain 
> +     a power of 2 number of entries.  */
> +
> +  MYRI10GE_MCP_CMD_SET_INTRQ_SIZE, 	/* in bytes */
> +
> +  /* command to bring ethernet interface up.  Above parameters
> +     (plus mtu & mac address) must have been exchanged prior
> +     to issuing this command  */
> +  MYRI10GE_MCP_CMD_ETHERNET_UP,
> +
> +  /* command to bring ethernet interface down.  No further sends
> +     or receives may be processed until an MYRI10GE_MCP_CMD_ETHERNET_UP
> +     is issued, and all interrupt queues must be flushed prior
> +     to ack'ing this command */
> +
> +  MYRI10GE_MCP_CMD_ETHERNET_DOWN,
> +
> +  /* commands the driver may issue live, without resetting
> +     the nic.  Note that increasing the mtu "live" should
> +     only be done if the driver has already supplied buffers
> +     sufficiently large to handle the new mtu.  Decreasing
> +     the mtu live is safe */
> +
> +  MYRI10GE_MCP_CMD_SET_MTU,
> +  MYRI10GE_MCP_CMD_GET_INTR_COAL_DELAY_OFFSET,  /* in microseconds */
> +  MYRI10GE_MCP_CMD_SET_STATS_INTERVAL,   /* in microseconds */
> +  MYRI10GE_MCP_CMD_SET_STATS_DMA,
> +
> +  MYRI10GE_MCP_ENABLE_PROMISC,
> +  MYRI10GE_MCP_DISABLE_PROMISC,
> +  MYRI10GE_MCP_SET_MAC_ADDRESS,
> +
> +  MYRI10GE_MCP_ENABLE_FLOW_CONTROL,
> +  MYRI10GE_MCP_DISABLE_FLOW_CONTROL,
> +
> +  /* do a DMA test
> +     data0,data1 = DMA address
> +     data2       = RDMA length (MSH), WDMA length (LSH)
> +     command return data = repetitions (MSH), 0.5-ms ticks (LSH)
> +  */
> +  MYRI10GE_MCP_DMA_TEST
> +} myri10ge_mcp_cmd_type_t;
> +
> +
> +typedef enum {
> +  MYRI10GE_MCP_CMD_OK = 0,
> +  MYRI10GE_MCP_CMD_UNKNOWN,
> +  MYRI10GE_MCP_CMD_ERROR_RANGE,
> +  MYRI10GE_MCP_CMD_ERROR_BUSY,
> +  MYRI10GE_MCP_CMD_ERROR_EMPTY,
> +  MYRI10GE_MCP_CMD_ERROR_CLOSED,
> +  MYRI10GE_MCP_CMD_ERROR_HASH_ERROR,
> +  MYRI10GE_MCP_CMD_ERROR_BAD_PORT,
> +  MYRI10GE_MCP_CMD_ERROR_RESOURCES
> +} myri10ge_mcp_cmd_status_t;
> +
> +
> +/* 32 Bytes */
> +typedef struct
> +{
> +  uint32_t send_done_count;
> +
> +  uint32_t link_up;
> +  uint32_t dropped_link_overflow;
> +  uint32_t dropped_link_error_or_filtered;
> +  uint32_t dropped_runt;
> +  uint32_t dropped_overrun;
> +  uint32_t dropped_no_small_buffer;
> +  uint32_t dropped_no_big_buffer;
> +  uint32_t rdma_tags_available;
> +
> +  uint8_t tx_stopped;
> +  uint8_t link_down;
> +  uint8_t stats_updated;
> +  uint8_t valid;
> +} mcp_irq_data_t;
> +
> +
> +#endif /* _myri10ge_mcp_h */
> --- /dev/null	2006-04-21 00:45:09.064430000 -0700
> +++ linux-mm/drivers/net/myri10ge/myri10ge_mcp_gen_header.h	2006-04-21 08:22:06.000000000 -0700
> @@ -0,0 +1,73 @@
> +#ifndef _myri10ge_mcp_gen_header_h
> +#define _myri10ge_mcp_gen_header_h
> +
> +/* this file define a standard header used as a first entry point to
> +   exchange information between firmware/driver and driver.  The
> +   header structure can be anywhere in the mcp. It will usually be in
> +   the .data section, because some fields needs to be initialized at
> +   compile time.
> +   The 32bit word at offset MX_HEADER_PTR_OFFSET in the mcp must
> +   contains the location of the header. 
> +
> +   Typically a MCP will start with the following:
> +   .text
> +     .space 52    ! to help catch MEMORY_INT errors
> +     bt start     ! jump to real code
> +     nop
> +     .long _gen_mcp_header
> +   
> +   The source will have a definition like:
> +
> +   mcp_gen_header_t gen_mcp_header = {
> +      .header_length = sizeof(mcp_gen_header_t),
> +      .mcp_type = MCP_TYPE_XXX,
> +      .version = "something $Id: mcp_gen_header.h,v 1.1 2005/12/23 02:10:44 gallatin Exp $",
> +      .mcp_globals = (unsigned)&Globals
> +   };
> +*/
> +
> +
> +#define MCP_HEADER_PTR_OFFSET  0x3c
> +
> +#define MCP_TYPE_MX 0x4d582020 /* "MX  " */
> +#define MCP_TYPE_PCIE 0x70636965 /* "PCIE" pcie-only MCP */
> +#define MCP_TYPE_ETH 0x45544820 /* "ETH " */
> +#define MCP_TYPE_MCP0 0x4d435030 /* "MCP0" */
> +
> +
> +typedef struct mcp_gen_header {
> +  /* the first 4 fields are filled at compile time */
> +  unsigned header_length;
> +  unsigned mcp_type;
> +  char version[128];
> +  unsigned mcp_globals; /* pointer to mcp-type specific structure */
> +
> +  /* filled by the MCP at run-time */
> +  unsigned sram_size;
> +  unsigned string_specs;  /* either the original STRING_SPECS or a superset */
> +  unsigned string_specs_len;
> +
> +  /* Fields above this comment are guaranteed to be present.
> +
> +     Fields below this comment are extensions added in later versions
> +     of this struct, drivers should compare the header_length against
> +     offsetof(field) to check wether a given MCP implements them.
> +
> +     Never remove any field.  Keep everything naturally align.
> +  */
> +} mcp_gen_header_t;
> +
> +/* Macro to create a simple mcp header */
> +#define MCP_GEN_HEADER_DECL(type, version_str, global_ptr)	\
> +  struct mcp_gen_header mcp_gen_header = {			\
> +    sizeof (struct mcp_gen_header),				\
> +    (type),							\
> +    version_str,						\
> +    (global_ptr),						\
> +    SRAM_SIZE,							\
> +    (unsigned int) STRING_SPECS,				\
> +    256								\
> +  }
> +
> +

Ugly macro.


* Re: [PATCH 4/6] myri10ge - First half of the driver
  2006-05-10 21:40 ` [PATCH 4/6] myri10ge - First half of the driver Brice Goglin
@ 2006-05-10 22:01   ` Stephen Hemminger
  2006-05-10 22:06     ` Roland Dreier
  2006-05-10 22:04   ` Roland Dreier
  2006-05-10 23:13   ` Francois Romieu
  2 siblings, 1 reply; 28+ messages in thread
From: Stephen Hemminger @ 2006-05-10 22:01 UTC (permalink / raw)
  To: Brice Goglin; +Cc: netdev, Andrew Morton, LKML, Andrew J. Gallatin, brice

On Wed, 10 May 2006 14:40:22 -0700 (PDT)
Brice Goglin <bgoglin@myri.com> wrote:

> [PATCH 4/6] myri10ge - First half of the driver
> 
> The first half of the myri10ge driver core.
> 

Splitting it in half, might help email restrictions, but it kills
future users of 'git bisect' who expect to have every kernel buildable.


* Re: [PATCH 3/6] myri10ge - Driver header files
  2006-05-10 21:36 ` [PATCH 3/6] myri10ge - Driver header files Brice Goglin
  2006-05-10 21:57   ` Roland Dreier
  2006-05-10 22:00   ` Stephen Hemminger
@ 2006-05-10 22:02   ` Francois Romieu
  2 siblings, 0 replies; 28+ messages in thread
From: Francois Romieu @ 2006-05-10 22:02 UTC (permalink / raw)
  To: Brice Goglin; +Cc: netdev, Andrew Morton, LKML, Andrew J. Gallatin

Brice Goglin <brice@myri.com> :
> [PATCH 3/6] myri10ge - Driver header files
> 
> myri10ge driver header files.
> myri10ge_mcp.h is the generic header, while myri10ge_mcp_gen_header.h
> is automatically generated from our firmware image.
> 
> Signed-off-by: Brice Goglin <brice@myri.com>
> Signed-off-by: Andrew J. Gallatin <gallatin@myri.com>
> 
>  myri10ge_mcp.h            |  233 ++++++++++++++++++++++++++++++++++++++++++++++
>  myri10ge_mcp_gen_header.h |   73 ++++++++++++++
>  2 files changed, 306 insertions(+)
> 
> --- /dev/null	2006-04-21 00:45:09.064430000 -0700
> +++ linux-mm/drivers/net/myri10ge/myri10ge_mcp.h	2006-04-21 08:20:59.000000000 -0700
> @@ -0,0 +1,233 @@
> +#ifndef _myri10ge_mcp_h
> +#define _myri10ge_mcp_h

Uppercase please.

[...]
> +#ifdef MYRI10GE_MCP
> +typedef signed char          int8_t;
> +typedef signed short        int16_t;
> +typedef signed int          int32_t;
> +typedef signed long long    int64_t;
> +typedef unsigned char       uint8_t;
> +typedef unsigned short     uint16_t;
> +typedef unsigned int       uint32_t;
> +typedef unsigned long long uint64_t;
> +#endif

Bloat. u8/u16/u32 and friends should be used instead.

> +/* 8 Bytes */
> +typedef struct
> +{
> +  uint32_t high;
> +  uint32_t low;
> +} mcp_dma_addr_t;

Typedefs are frowned upon.

[...]
> +/* 32 Bytes */

The struct takes 40 bytes. Does it need 32-byte alignment or something similar?

> +typedef struct
> +{
> +  uint32_t send_done_count;
> +
> +  uint32_t link_up;
> +  uint32_t dropped_link_overflow;
> +  uint32_t dropped_link_error_or_filtered;
> +  uint32_t dropped_runt;
> +  uint32_t dropped_overrun;
> +  uint32_t dropped_no_small_buffer;
> +  uint32_t dropped_no_big_buffer;
> +  uint32_t rdma_tags_available;
> +
> +  uint8_t tx_stopped;
> +  uint8_t link_down;
> +  uint8_t stats_updated;
> +  uint8_t valid;
> +} mcp_irq_data_t;
> +
> +
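
The 40-byte figure can be checked at compile time. The following is a minimal userspace sketch, assuming <stdint.h> types in place of the kernel's and a struct tag (mcp_irq_data) chosen only for this example; the size helper is likewise hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace copy of the quoted layout, for size checking only. */
struct mcp_irq_data {
	uint32_t send_done_count;

	uint32_t link_up;
	uint32_t dropped_link_overflow;
	uint32_t dropped_link_error_or_filtered;
	uint32_t dropped_runt;
	uint32_t dropped_overrun;
	uint32_t dropped_no_small_buffer;
	uint32_t dropped_no_big_buffer;
	uint32_t rdma_tags_available;

	uint8_t tx_stopped;
	uint8_t link_down;
	uint8_t stats_updated;
	uint8_t valid;
};

/* Nine 32-bit counters (36 bytes) followed by four flag bytes. */
static_assert(offsetof(struct mcp_irq_data, tx_stopped) == 36,
	      "flag bytes should start right after the counters");
static_assert(sizeof(struct mcp_irq_data) == 40,
	      "layout is 40 bytes, not the 32 stated in the comment");

/* Hypothetical helper so the size can also be inspected at run time. */
static size_t mcp_irq_data_size(void)
{
	return sizeof(struct mcp_irq_data);
}
```

A check of this kind next to the definition would keep the size comment and the actual layout from drifting apart.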

-- 
Ueimor


* Re: [PATCH 4/6] myri10ge - First half of the driver
  2006-05-10 21:40 ` [PATCH 4/6] myri10ge - First half of the driver Brice Goglin
  2006-05-10 22:01   ` Stephen Hemminger
@ 2006-05-10 22:04   ` Roland Dreier
  2006-05-11 23:53     ` Brice Goglin
  2006-05-10 23:13   ` Francois Romieu
  2 siblings, 1 reply; 28+ messages in thread
From: Roland Dreier @ 2006-05-10 22:04 UTC (permalink / raw)
  To: Brice Goglin; +Cc: netdev, Andrew Morton, LKML, Andrew J. Gallatin, brice

 > +typedef struct {
 > +	mcp_kreq_ether_recv_t __iomem *lanai;	/* lanai ptr for recv ring */
 > +	volatile uint8_t __iomem *wc_fifo;	/* w/c rx dma addr fifo address */
 > +	mcp_kreq_ether_recv_t *shadow;	/* host shadow of recv ring */
 > +	struct myri10ge_rx_buffer_state *info;
 > +	int cnt;
 > +	int alloc_fail;
 > +	int mask;			/* number of rx slots -1 */
 > +} myri10ge_rx_buf_t;

Why is wc_fifo volatile?  The only places you actually use it, you
seem to cast away the volatile anyway.

Also, again, no typedef of structures please.

 > +#define myri10ge_pio_copy(to,from,size) __iowrite64_copy(to,from,size/8)

Why do you need this wrapper?  Why not just call __iowrite64_copy()
without the obfuscation?  Anyone reading the code will just have to
search back to this define and mentally translate the size back and
forth all the time.
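
To make the unit mismatch concrete, here is a minimal userspace sketch. copy64_sketch() is a hypothetical stand-in for __iowrite64_copy() (whose count is in 64-bit quantities, not bytes), and the request struct and submit function are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for __iowrite64_copy(): copies `count`
 * 64-bit words, not `count` bytes. `to` must be 8-byte aligned.
 */
static void copy64_sketch(void *to, const void *from, size_t count)
{
	uint64_t *dst = to;
	const unsigned char *src = from;
	size_t i;

	for (i = 0; i < count; i++) {
		uint64_t word;

		/* memcpy avoids assuming the source is 8-byte aligned */
		memcpy(&word, src + i * sizeof(word), sizeof(word));
		dst[i] = word;
	}
}

/* Hypothetical 16-byte request, standing in for a send descriptor. */
struct send_req_sketch {
	uint32_t w[4];
};

static void submit_sketch(uint64_t *ring_slot,
			  const struct send_req_sketch *req)
{
	static_assert(sizeof(*req) % 8 == 0,
		      "request must be a whole number of 64-bit words");

	/* The conversion to 64-bit words is visible at the call site. */
	copy64_sketch(ring_slot, req, sizeof(*req) / 8);
}
```

With the division written out where the copy happens, a reader does not have to remember whether a wrapper expects bytes or words.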

 > +int myri10ge_hyper_msi_cap_on(struct pci_dev *pdev)
 > +{
 > +	uint8_t cap_off;
 > +	int nbcap = 0;
 > +
 > +	cap_off = PCI_CAPABILITY_LIST - 1;
 > +	/* go through all caps looking for a hypertransport msi mapping */

This looks like something that should be fixed up in the general PCI
quirk handling rather than in every driver...

 > +static int
 > +myri10ge_use_msi(struct pci_dev *pdev)
 > +{
 > +	if (myri10ge_msi == 1 || myri10ge_msi == 0)
 > +		return myri10ge_msi;
 > +
 > +	/*  find root complex for our device */
 > +	while (pdev->bus && pdev->bus->self) {
 > +		pdev = pdev->bus->self;
 > +	}

Similarly looks like generic PCI code (if it's needed at all).  If I
understand correctly you're trying to check if MSI has a chance at
working on the system, but a network device driver has no business
walking up the PCI hierarchy.

 > +	buf = (mcp_cmd_t *) ((unsigned long)(buf_bytes + 7) & ~7UL);

ALIGN() from <linux/kernel.h>?
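
For reference, a minimal userspace sketch of the equivalence. ALIGN_SKETCH is a local stand-in that is assumed to round the same way as the kernel's ALIGN() for power-of-two alignments; both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to a multiple of a; a must be a power of two. */
#define ALIGN_SKETCH(x, a) \
	(((x) + ((uintptr_t)(a) - 1)) & ~((uintptr_t)(a) - 1))

/* The open-coded form from the quoted line. */
static uintptr_t align8_open_coded(uintptr_t addr)
{
	return (addr + 7) & ~(uintptr_t)7;
}

/* The same rounding expressed through the macro. */
static uintptr_t align8_macro(uintptr_t addr)
{
	return ALIGN_SKETCH(addr, 8);
}
```

Both forms give the same result for any address; the macro just names the intent.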

 - R.


* Re: [PATCH 4/6] myri10ge - First half of the driver
  2006-05-10 22:01   ` Stephen Hemminger
@ 2006-05-10 22:06     ` Roland Dreier
  2006-05-11 23:53       ` Brice Goglin
  0 siblings, 1 reply; 28+ messages in thread
From: Roland Dreier @ 2006-05-10 22:06 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Brice Goglin, netdev, Andrew Morton, LKML, Andrew J. Gallatin, brice

    Stephen> Splitting it in half, might help email restrictions, but
    Stephen> it kills future users of 'git bisect' who expect to have
    Stephen> every kernel buildable.

Not really, since the makefile/kconfig stuff comes in a later patch.

But yes, it is cleaner to have drivers go in in sane pieces.

 - R.

* Re: [PATCH 5/6] myri10ge - Second half of the driver
  2006-05-10 21:42 ` [PATCH 5/6] myri10ge - Second " Brice Goglin
@ 2006-05-10 22:22   ` Stephen Hemminger
  2006-05-11 23:53     ` Brice Goglin
  0 siblings, 1 reply; 28+ messages in thread
From: Stephen Hemminger @ 2006-05-10 22:22 UTC (permalink / raw)
  To: Brice Goglin; +Cc: netdev, Andrew Morton, LKML, Andrew J. Gallatin, brice

On Wed, 10 May 2006 14:42:41 -0700 (PDT)
Brice Goglin <bgoglin@myri.com> wrote:

> [PATCH 5/6] myri10ge - Second half of the driver
> 
> The second half of the myri10ge driver core.
> 
> Signed-off-by: Brice Goglin <brice@myri.com>
> Signed-off-by: Andrew J. Gallatin <gallatin@myri.com>
> 
>  myri10ge.c | 1540 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 1540 insertions(+)
> 
> --- linux/drivers/net/myri10ge/myri10ge.c.old	2006-05-09 23:00:54.000000000 +0200
> +++ linux/drivers/net/myri10ge/myri10ge.c	2006-05-09 23:00:54.000000000 +0200
> @@ -1481,3 +1481,1543 @@ static struct ethtool_ops myri10ge_ethto
>  	.get_stats_count		= myri10ge_get_stats_count,
>  	.get_ethtool_stats		= myri10ge_get_ethtool_stats
>  };
> +
> +static int
> +myri10ge_open(struct net_device *dev)

It is preferred to put function declarations on one line.

static int myri10ge_open(struct net_device *dev)



> +{
> +	struct myri10ge_priv *mgp;
> +	size_t bytes;
> +	myri10ge_cmd_t cmd;
> +	int tx_ring_size, rx_ring_size;
> +	int tx_ring_entries, rx_ring_entries;
> +	int i, status, big_pow2;
> +
> +	mgp = dev->priv;

use netdev_priv(dev)

> +
> +	if (mgp->running != MYRI10GE_ETH_STOPPED)
> +		return -EBUSY;
> +
> +	mgp->running = MYRI10GE_ETH_STARTING;
> +	status = myri10ge_reset(mgp);
>
> +	/* If the user sets an obscenely small MTU, adjust the small
> +	 * bytes down to nearly nothing */
> +	if (mgp->small_bytes >= (dev->mtu + ETH_HLEN))
> +		mgp->small_bytes = 64;

You should enforce mtu >= 68 in your driver (see eth_change_mtu)
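
A minimal sketch of the kind of range check meant here, written as plain C so it can be compiled outside the kernel. DEMO_MAX_FRAME is a hypothetical value standing in for MYRI10GE_MAX_ETHER_MTU, whose definition is not shown in this hunk:

```c
#include <errno.h>

#define DEMO_ETH_HLEN   14     /* Ethernet header length in bytes */
#define DEMO_MIN_MTU    68     /* lower bound asked for above */
#define DEMO_MAX_FRAME  9014   /* hypothetical; stands in for MYRI10GE_MAX_ETHER_MTU */

/* Returns 0 if the MTU is acceptable, -EINVAL otherwise. */
static int demo_validate_mtu(int new_mtu)
{
	if (new_mtu < DEMO_MIN_MTU)
		return -EINVAL;
	if (new_mtu + DEMO_ETH_HLEN > DEMO_MAX_FRAME)
		return -EINVAL;
	return 0;
}
```

With the lower bound enforced wherever the MTU is set, the open path would not need a special case for an "obscenely small" MTU.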

>
> +static int
> +myri10ge_close(struct net_device *dev)
> +{
> +	struct myri10ge_priv *mgp;
> +	struct sk_buff *skb;
> +	myri10ge_tx_buf_t *tx;
> +	int status, i, old_down_cnt, len, idx;
> +	myri10ge_cmd_t cmd;
> +
> +	mgp = dev->priv;
> +
> +	if (mgp->running != MYRI10GE_ETH_RUNNING)
> +		return 0;
> +
> +	if (mgp->tx.req_bytes == NULL)
> +		return 0;
> +
> +	del_timer_sync(&mgp->watchdog_timer);
> +	mgp->running = MYRI10GE_ETH_STOPPING;
> +	if (myri10ge_napi)
> +		netif_poll_disable(mgp->dev);
> +	netif_carrier_off(dev);
> +	netif_stop_queue(dev);
> +	old_down_cnt = mgp->down_cnt;
> +	mb();
> +	status = myri10ge_send_cmd(mgp, MYRI10GE_MCP_CMD_ETHERNET_DOWN, &cmd);
> +	if (status) {
> +		printk(KERN_ERR "myri10ge: %s: Couldn't bring down link\n",
> +		       dev->name);
> +	}
> +	set_current_state (TASK_UNINTERRUPTIBLE);
> +	if (old_down_cnt == mgp->down_cnt)
> +		schedule_timeout(HZ);
> +	set_current_state(TASK_RUNNING);
> +	if (old_down_cnt == mgp->down_cnt) {
> +		printk(KERN_ERR "myri10ge: %s never got down irq\n",
> +		       dev->name);
> +	}

Better to use a wait_queue and wait_event()

> 
> +#ifdef NETIF_F_TSO
> +static inline unsigned long
> +myri10ge_tcpend(struct sk_buff *skb)
> +{
> +	struct iphdr *ip;
> +	int iphlen, tcplen;
> +	struct tcphdr *tcp;
> +
> +	ip = (struct iphdr *) ((char *) skb->data + 14);
> +	iphlen = ip->ihl << 2;
> +	tcp = (struct tcphdr *) ((char *) ip + iphlen);
> +	tcplen = tcp->doff << 2;
> +	return (tcplen + iphlen + 14);
> +}
> +#endif

The information you want is already in skb->nh.iph and skb->h.th,
and it works with VLANs. Your code doesn't.
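
To show why the fixed offset of 14 goes wrong, here is a userspace sketch that parses a raw frame by hand. In the driver the stack has already recorded the header positions, so no parsing would be needed; the helper names and DEMO_ constants below are hypothetical, and only a single 802.1Q tag is handled:

```c
#include <stdint.h>
#include <stddef.h>

#define DEMO_ETH_HLEN     14
#define DEMO_VLAN_HLEN     4
#define DEMO_ETH_P_8021Q  0x8100

/* Offset of the IP header; accounts for one optional 802.1Q tag. */
static size_t demo_ip_offset(const uint8_t *frame)
{
	uint16_t ethertype = (uint16_t)((frame[12] << 8) | frame[13]);

	if (ethertype == DEMO_ETH_P_8021Q)
		return DEMO_ETH_HLEN + DEMO_VLAN_HLEN;
	return DEMO_ETH_HLEN;
}

/* End of the TCP header, measured from the start of the frame. */
static size_t demo_tcp_end(const uint8_t *frame)
{
	size_t ip_off = demo_ip_offset(frame);
	size_t ip_hlen = (size_t)(frame[ip_off] & 0x0f) * 4;           /* ihl */
	size_t tcp_hlen = (size_t)(frame[ip_off + ip_hlen + 12] >> 4) * 4; /* doff */

	return ip_off + ip_hlen + tcp_hlen;
}
```

For an untagged frame with minimal IPv4 and TCP headers this gives 54 bytes; with a VLAN tag it gives 58, which a hard-coded 14 would miss.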

> +
> +static inline void
> +myri10ge_csum_fixup(struct sk_buff *skb, int cksum_offset,
> +		    int pseudo_hdr_offset)
> +{
> +	int csum;
> +	uint16_t *csum_ptr;
> +
> +
> +	csum = skb_checksum(skb, cksum_offset,
> +			    skb->len - cksum_offset, 0);
> +	csum_ptr = (uint16_t *) (skb->h.raw + skb->csum);
> +	if (!pskb_may_pull(skb, pseudo_hdr_offset)) {
> +		printk(KERN_ERR "myri10ge: can't pull skb %d\n",
> +		       pseudo_hdr_offset);
> +		return;
> +	}
> +	*csum_ptr = csum_fold(csum);
> +	/* need to fixup IPv4 UDP packets according to RFC768 */
> +	if (unlikely(*csum_ptr == 0 &&
> +		     skb->protocol == htons(ETH_P_IP) &&
> +		     skb->nh.iph->protocol == IPPROTO_UDP)) {
> +		*csum_ptr = 0xffff;
> +	}
> +}

Use skb_checksum_help() instead of this code...
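
For readers who want to see what the open-coded fixup computes, here is a userspace sketch of the one's-complement checksum, the fold step, and the RFC 768 rule for UDP. It is not the kernel implementation; all demo_ names are hypothetical:

```c
#include <stdint.h>
#include <stddef.h>

/* One's-complement sum of a buffer, read as big-endian 16-bit words. */
static uint32_t demo_csum_partial(const uint8_t *buf, size_t len, uint32_t sum)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)((buf[i] << 8) | buf[i + 1]);
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8;	/* pad odd byte with zero */
	return sum;
}

/* Fold the 32-bit accumulator to 16 bits and complement it. */
static uint16_t demo_csum_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* RFC 768: a computed UDP checksum of zero is transmitted as all ones. */
static uint16_t demo_udp_csum_fixup(uint16_t csum)
{
	return csum == 0 ? 0xffff : csum;
}
```

Letting a common helper own this logic keeps such protocol corner cases out of individual drivers.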

> +
> +/*
> + * Transmit a packet.  We need to split the packet so that a single
> + * segment does not cross myri10ge->tx.boundary, so this makes segment
> + * counting tricky.  So rather than try to count segments up front, we
> + * just give up if there are too few segments to hold a reasonably
> + * fragmented packet currently available.  If we run
> + * out of segments while preparing a packet for DMA, we just linearize
> + * it and try again.
> + */
> +
> +static int
> +myri10ge_xmit(struct sk_buff *skb, struct net_device *dev)
> +{
> +	struct myri10ge_priv *mgp = dev->priv;
> +	mcp_kreq_ether_send_t *req;
> +	myri10ge_tx_buf_t *tx = &mgp->tx;
> +	struct skb_frag_struct *frag;
> +	dma_addr_t bus;
> +	uint32_t low, high_swapped;
> +	unsigned int len;
> +	int idx, last_idx, avail, frag_cnt, frag_idx, count, mss, max_segments;
> +	uint16_t pseudo_hdr_offset, cksum_offset;
> +	int cum_len, seglen, boundary, rdma_count;
> +	uint8_t flags, odd_flag;
> +
> +again:
> +	req = tx->req_list;
> +	avail = tx->mask - 1 - (tx->req - tx->done);
> +
> +	mss = 0;
> +	max_segments = MYRI10GE_MCP_ETHER_MAX_SEND_DESC;
> +
> +#ifdef NETIF_F_TSO
> +	if (skb->len > (dev->mtu + ETH_HLEN)) {
> +		mss = skb_shinfo(skb)->tso_size;
> +		if (mss != 0)
> +			max_segments = MYRI10GE_MCP_ETHER_MAX_SEND_DESC_TSO;
> +	}
> +#endif /*NETIF_F_TSO */
> +
> +	if ((unlikely(avail < max_segments))) {
> +		/* we are out of transmit resources */
> +		mgp->stop_queue++;
> +		netif_stop_queue(dev);
> +		return 1;
> +	}
> +
> +	/* Setup checksum offloading, if needed */
> +	cksum_offset = 0;
> +	pseudo_hdr_offset = 0;
> +	odd_flag = 0;
> +	flags = (MYRI10GE_MCP_ETHER_FLAGS_NO_TSO |
> +		 MYRI10GE_MCP_ETHER_FLAGS_FIRST);
> +	if (likely(skb->ip_summed == CHECKSUM_HW)) {
> +		cksum_offset = (skb->h.raw - skb->data);
> +		pseudo_hdr_offset = (skb->h.raw + skb->csum) - skb->data;
> +		/* If the headers are excessively large, then we must
> +		 * fall back to a software checksum */
> +		if (unlikely(cksum_offset > 255 ||
> +			     pseudo_hdr_offset > 127)) {
> +			myri10ge_csum_fixup(skb, cksum_offset, pseudo_hdr_offset);


skb_checksum_help(skb, 0) will do what you want

> +			cksum_offset = 0;
> +			pseudo_hdr_offset = 0;
> +		} else {
> +			pseudo_hdr_offset = htons(pseudo_hdr_offset);
> +			odd_flag = MYRI10GE_MCP_ETHER_FLAGS_ALIGN_ODD;
> +			flags |= MYRI10GE_MCP_ETHER_FLAGS_CKSUM;
> +		}
> +	}
> +
> +	cum_len = 0;
> +
> +#ifdef NETIF_F_TSO
> +	if (mss) { /* TSO */
> +		/* this removes any CKSUM flag from before */
> +		flags = (MYRI10GE_MCP_ETHER_FLAGS_TSO_HDR |
> +			 MYRI10GE_MCP_ETHER_FLAGS_FIRST);
> +
> +		/* negative cum_len signifies to the
> +		 * send loop that we are still in the
> +		 * header portion of the TSO packet.
> +		 * TSO header must be at most 134 bytes long */
> +		cum_len = -myri10ge_tcpend(skb);
> +
> +		/* for TSO, pseudo_hdr_offset holds mss.
> +		 * The firmware figures out where to put
> +		 * the checksum by parsing the header. */
> +		pseudo_hdr_offset = htons(mss);
> +	} else
> +#endif /*NETIF_F_TSO */
> +	/* Mark small packets, and pad out tiny packets */
> +	if (skb->len <= MYRI10GE_MCP_ETHER_SEND_SMALL_SIZE) {
> +		flags |= MYRI10GE_MCP_ETHER_FLAGS_SMALL;
> +
> +		/* pad frames to at least ETH_ZLEN bytes */
> +		if (unlikely(skb->len < ETH_ZLEN)) {
> +			skb = skb_padto(skb, ETH_ZLEN);
> +			if (skb == NULL) {
> +				/* The packet is gone, so we must
> +				   return 0 */
> +				mgp->stats.tx_dropped += 1;
> +				return 0;
> +			}
> +			/* adjust the len to account for the zero pad
> +			   so that the nic can know how long it is */
> +			skb->len = ETH_ZLEN;
> +		}
> +	}
> +
> +	/* map the skb for DMA */
> +	len = skb->len - skb->data_len;
> +	idx = tx->req & tx->mask;
> +	tx->info[idx].skb = skb;
> +	bus = pci_map_single(mgp->pdev, skb->data, len, PCI_DMA_TODEVICE);
> +	pci_unmap_addr_set(&tx->info[idx], bus, bus);
> +	pci_unmap_len_set(&tx->info[idx], len, len);
> +
> +	frag_cnt = skb_shinfo(skb)->nr_frags;
> +	frag_idx = 0;
> +	count = 0;
> +	rdma_count = 0;
> +
> +	/* "rdma_count" is the number of RDMAs belonging to the
> +	 * current packet BEFORE the current send request. For
> +	 * non-TSO packets, this is equal to "count".
> +	 * For TSO packets, rdma_count needs to be reset
> +	 * to 0 after a segment cut.
> +	 *
> +	 * The rdma_count field of the send request is
> +	 * the number of RDMAs of the packet starting at
> +	 * that request. For TSO send requests with one ore more cuts
> +	 * in the middle, this is the number of RDMAs starting
> +	 * after the last cut in the request. All previous
> +	 * segments before the last cut implicitly have 1 RDMA.
> +	 *
> +	 * Since the number of RDMAs is not known beforehand,
> +	 * it must be filled-in retroactively - after each
> +	 * segmentation cut or at the end of the entire packet.
> +	 */
> +
> +	while (1) {
> +		/* Break the SKB or Fragment up into pieces which
> +		   do not cross mgp->tx.boundary */
> +		low = MYRI10GE_LOWPART_TO_U32(bus);
> +		high_swapped = htonl(MYRI10GE_HIGHPART_TO_U32(bus));
> +		while (len) {
> +			uint8_t flags_next;
> +			int cum_len_next;
> +
> +			if (unlikely(count == max_segments))
> +				goto abort_linearize;
> +
> +			boundary = (low + tx->boundary) & ~(tx->boundary - 1);
> +			seglen = boundary - low;
> +			if (seglen > len)
> +				seglen = len;
> +			flags_next = flags & ~MYRI10GE_MCP_ETHER_FLAGS_FIRST;
> +			cum_len_next = cum_len + seglen;
> +#ifdef NETIF_F_TSO
> +			if (mss) { /* TSO */
> +				(req-rdma_count)->rdma_count = rdma_count + 1;
> +
> +				if (likely(cum_len >= 0)) { /* payload */
> +					int next_is_first, chop;
> +
> +					chop = (cum_len_next>mss);
> +					cum_len_next = cum_len_next % mss;
> +					next_is_first = (cum_len_next == 0);
> +					flags |= chop *
> +						MYRI10GE_MCP_ETHER_FLAGS_TSO_CHOP;
> +					flags_next |= next_is_first *
> +						MYRI10GE_MCP_ETHER_FLAGS_FIRST;
> +					rdma_count |= -(chop | next_is_first);
> +					rdma_count += chop & !next_is_first;
> +				} else if (likely(cum_len_next >= 0)) { /* header ends */
> +					int small;
> +
> +					rdma_count = -1;
> +					cum_len_next = 0;
> +					seglen = -cum_len;
> +					small = (mss <= MYRI10GE_MCP_ETHER_SEND_SMALL_SIZE);
> +					flags_next = MYRI10GE_MCP_ETHER_FLAGS_TSO_PLD |
> +						MYRI10GE_MCP_ETHER_FLAGS_FIRST |
> +						(small * MYRI10GE_MCP_ETHER_FLAGS_SMALL);
> +				}
> +			}
> +#endif /* NETIF_F_TSO */
> +			req->addr_high = high_swapped;
> +			req->addr_low = htonl(low);
> +			req->pseudo_hdr_offset = pseudo_hdr_offset;
> +			req->pad = 0;	/* complete solid 16-byte block; does this matter? */
> +			req->rdma_count = 1;
> +			req->length = htons(seglen);
> +			req->cksum_offset = cksum_offset;
> +			req->flags = flags | ((cum_len & 1) * odd_flag);
> +
> +			low += seglen;
> +			len -= seglen;
> +			cum_len = cum_len_next;
> +			flags = flags_next;
> +			req++;
> +			count++;
> +			rdma_count++;
> +			if (unlikely(cksum_offset > seglen))
> +				cksum_offset -= seglen;
> +			else
> +				cksum_offset = 0;
> +		}
> +		if (frag_idx == frag_cnt)
> +			break;
> +
> +		/* map next fragment for DMA */
> +		idx = (count + tx->req) & tx->mask;
> +		frag = &skb_shinfo(skb)->frags[frag_idx];
> +		frag_idx++;
> +		len = frag->size;
> +		bus = pci_map_page(mgp->pdev, frag->page, frag->page_offset,
> +				   len, PCI_DMA_TODEVICE);
> +		pci_unmap_addr_set(&tx->info[idx], bus, bus);
> +		pci_unmap_len_set(&tx->info[idx], len, len);
> +	}
> +
> +	(req-rdma_count)->rdma_count = rdma_count;
> +#ifdef NETIF_F_TSO
> +	if (mss) {
> +		do {
> +			req--;
> +			req->flags |= MYRI10GE_MCP_ETHER_FLAGS_TSO_LAST;
> +		} while (!(req->flags & (MYRI10GE_MCP_ETHER_FLAGS_TSO_CHOP |
> +					 MYRI10GE_MCP_ETHER_FLAGS_FIRST)));
> +	}
> +#endif
> +	idx = ((count - 1) + tx->req) & tx->mask;
> +	tx->info[idx].last = 1;
> +	if (tx->wc_fifo == NULL)
> +		myri10ge_submit_req(tx, tx->req_list, count);
> +	else
> +		myri10ge_submit_req_wc(tx, tx->req_list, count);
> +	tx->pkt_start++;
> +	if ((avail - count) < MYRI10GE_MCP_ETHER_MAX_SEND_DESC) {
> +		mgp->stop_queue++;
> +		netif_stop_queue(dev);
> +	}
> +	dev->trans_start = jiffies;
> +	return 0;
> +
> +
> +abort_linearize:
> +	/* Free any DMA resources we've alloced and clear out the skb
> +	 * slot so as to not trip up assertions, and to avoid a
> +	 * double-free if linearizing fails */
> +
> +	last_idx = (idx + 1) & tx->mask;
> +	idx = tx->req & tx->mask;
> +	tx->info[idx].skb = NULL;
> +	do {
> +		len = pci_unmap_len(&tx->info[idx], len);
> +		if (len) {
> +			if (tx->info[idx].skb != NULL) {
> +				pci_unmap_single(mgp->pdev,
> +						 pci_unmap_addr(&tx->info[idx], bus),
> +						 len, PCI_DMA_TODEVICE);
> +			} else {
> +				pci_unmap_page(mgp->pdev,
> +					       pci_unmap_addr(&tx->info[idx], bus),
> +					       len, PCI_DMA_TODEVICE);
> +			}
> +			pci_unmap_len_set(&tx->info[idx], len, 0);
> +			tx->info[idx].skb = NULL;
> +		}
> +		idx = (idx + 1) & tx->mask;
> +	} while (idx != last_idx);
> +	if (skb_shinfo(skb)->tso_size) {
> +		printk(KERN_ERR "myri10ge: %s: TSO but wanted to linearize?!?!?\n",
> +		       mgp->dev->name);
> +		goto drop;
> +	}
> +
> +	if (skb_linearize(skb, GFP_ATOMIC)) {
> +		goto drop;
> +	}
> +	mgp->tx_linearized++;
> +	goto again;
> +
> +drop:
> +	dev_kfree_skb_any(skb);
> +	mgp->stats.tx_dropped += 1;
> +	return 0;
> +
> +
> +}
> +
> +static struct net_device_stats *
> +myri10ge_get_stats(struct net_device *dev)
> +{
> +	struct myri10ge_priv *mgp = dev->priv;
> +	return &mgp->stats;
> +}
> +
> +static void
> +myri10ge_set_multicast_list(struct net_device *dev)
> +{
> +	myri10ge_change_promisc(dev->priv, dev->flags & IFF_PROMISC);
> +}
> +
> +
> +static int
> +myri10ge_set_mac_address (struct net_device *dev, void *addr)
> +{
> +	struct sockaddr *sa = (struct sockaddr *) addr;
> +	struct myri10ge_priv *mgp = dev->priv;
> +	int status;
> +
> +	if (!is_valid_ether_addr(sa->sa_data))
> +		return -EADDRNOTAVAIL;
> +
> +	status = myri10ge_update_mac_address(mgp, sa->sa_data);
> +	if (status != 0) {
> +		printk(KERN_ERR "myri10ge: %s: changing mac address failed with %d\n",
> +		       dev->name, status);
> +		return status;
> +	}
> +
> +	/* change the dev structure */
> +	memcpy(dev->dev_addr, sa->sa_data, 6);
> +	return 0;
> +}
> +static int
> +myri10ge_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
> +{
> +	return -EOPNOTSUPP;
> +}

Just leave dev->do_ioctl as NULL then, it will do what you want

> +
> +static int
> +myri10ge_init(struct net_device *dev)
> +{
> +	return 0;
> +}
> +

You don't have to have an init routine, so the stub is unneeded.

> +static int
> +myri10ge_change_mtu(struct net_device *dev, int new_mtu)
> +{
> +	struct myri10ge_priv *mgp = dev->priv;
> +	int error = 0;
> +
> +	if ((new_mtu < 68) || (ETH_HLEN + new_mtu > MYRI10GE_MAX_ETHER_MTU)) {
> +		printk(KERN_ERR "myri10ge: %s: new mtu (%d) is not valid\n",
> +		       dev->name, new_mtu);
> +		return -EINVAL;
> +	}
> +	printk("%s: changing mtu from %d to %d\n",
> +	       dev->name, dev->mtu, new_mtu);
> +	if (mgp->running) {
> +		/* if we change the mtu on an active device, we must
> +		 * reset the device so the firmware sees the change */
> +		myri10ge_close(dev);
> +		dev->mtu = new_mtu;
> +		myri10ge_open(dev);
> +	} else {
> +		dev->mtu = new_mtu;
> +	}
> +	return error;
> +}
> +
> +#if defined(CONFIG_X86) || defined(CONFIG_X86_64)

Bad sign,... machine dependent code in driver...

> +
> +/*
> + * Enable ECRC to align PCI-E Completion packets.  Rather than using
> + * normal pci config space writes, we must map the Nvidia config space
> + * ourselves.  This is because on opteron/nvidia class machine the
> + * 0xe000000 mapping is handled by the nvidia chipset, that means
> + * the internal PCI device (the on-chip northbridge), or the amd-8131
> + * bridge and things behind them are not visible by this method.
> + */
> +

Fix the PCI support, don't do it in driver!


> +static void
> +myri10ge_enable_ecrc(struct myri10ge_priv *mgp)
> +{
> +	struct pci_dev *bridge = mgp->pdev->bus->self;
> +	struct device * dev = &mgp->pdev->dev;
> +	unsigned cap;
> +	unsigned err_cap;
> +	int ret;
> +
> +	if (!myri10ge_ecrc_enable || !bridge)
> +		return;
> +
> +	cap = pci_find_ext_capability(bridge, PCI_EXT_CAP_ID_ERR);
> +	/* nvidia ext cap is not always linked in ext cap chain */
> +	if (!cap
> +	    && bridge->vendor == PCI_VENDOR_ID_NVIDIA
> +	    && bridge->device == PCI_DEVICE_ID_NVIDIA_NFORCE_CK804_PCIE)
> +		cap = 0x160;
> +
> +	if (!cap)
> +		return;
> +
> +	ret = pci_read_config_dword(bridge, cap + PCI_ERR_CAP, &err_cap);
> +	if (ret) {
> +		dev_err(dev, "failed reading ext-conf-space of %s\n",
> +			pci_name(bridge));
> +		dev_err(dev, "\t pci=nommconf in use? "
> +			"or buggy/incomplete/absent acpi MCFG attr?\n");
> +		return;
> +	}
> +	if (!(err_cap & PCI_ERR_CAP_ECRC_GENC))
> +		return;
> +
> +	err_cap |= PCI_ERR_CAP_ECRC_GENE;
> +	pci_write_config_dword(bridge, cap + PCI_ERR_CAP, err_cap);
> +	dev_info(dev,
> +		 "Enabled ECRC on upstream bridge %s\n",
> +		 pci_name(bridge));
> +	mgp->tx.boundary = 4096;
> +	mgp->fw_name = myri10ge_fw_aligned;
> +}
> +#endif /* defined(CONFIG_X86) || defined(CONFIG_X86_64) */
> +
> +/*
> + * The Lanai Z8E PCI-E interface achieves higher Read-DMA throughput
> + * when the PCI-E Completion packets are aligned on an 8-byte
> + * boundary.  Some PCI-E chip sets always align Completion packets; on
> + * the ones that do not, the alignment can be enforced by enabling
> + * ECRC generation (if supported).
> + *
> + * When PCI-E Completion packets are not aligned, it is actually more
> + * efficient to limit Read-DMA transactions to 2KB, rather than 4KB.
> + *
> + * If the driver can neither enable ECRC nor verify that it has
> + * already been enabled, then it must use a firmware image which works
> + * around unaligned completion packets (myri10ge_ethp_z8e.dat), and it
> + * should also ensure that it never gives the device a Read-DMA which is
> + * larger than 2KB by setting the tx.boundary to 2KB.  If ECRC is
> + * enabled, then the driver should use the aligned (myri10ge_eth_z8e.dat)
> + * firmware image, and set tx.boundary to 4KB.
> + */
> +
> +static void
> +myri10ge_select_firmware(struct myri10ge_priv *mgp)
> +{
> +	struct pci_dev *bridge = mgp->pdev->bus->self;
> +
> +	mgp->tx.boundary = 2048;
> +	mgp->fw_name = myri10ge_fw_unaligned;
> +
> +	if (myri10ge_force_firmware == 0) {
> +#if defined(CONFIG_X86) || defined(CONFIG_X86_64)
> +		myri10ge_enable_ecrc(mgp);
> +#endif
> +		/* Check to see if the upstream bridge is known to
> +		 * provide aligned completions */
> +		if (bridge
> +		    /* ServerWorks HT2000/HT1000 */
> +		    && bridge->vendor == PCI_VENDOR_ID_SERVERWORKS
> +		    && bridge->device == PCI_DEVICE_ID_SERVERWORKS_HT2000_PCIE) {
> +			dev_info(&mgp->pdev->dev,
> +				 "Assuming aligned completions (0x%x:0x%x)\n",
> +				 bridge->vendor, bridge->device);
> +			mgp->tx.boundary = 4096;
> +			mgp->fw_name = myri10ge_fw_aligned;
> +		}
> +	} else {
> +		if (myri10ge_force_firmware == 1) {
> +			dev_info(&mgp->pdev->dev,
> +				 "Assuming aligned completions (forced)\n");
> +			mgp->tx.boundary = 4096;
> +			mgp->fw_name = myri10ge_fw_aligned;
> +		} else {
> +			dev_info(&mgp->pdev->dev,
> +				 "Assuming unaligned completions (forced)\n");
> +			mgp->tx.boundary = 2048;
> +			mgp->fw_name = myri10ge_fw_unaligned;
> +		}
> +	}
> +	if (myri10ge_fw_name != NULL) {
> +		dev_info(&mgp->pdev->dev, "overriding firmware to %s\n",
> +			 myri10ge_fw_name);
> +		mgp->fw_name = myri10ge_fw_name;
> +	}
> +}
> +
> +
> +static void
> +myri10ge_save_state(struct myri10ge_priv *mgp)
> +{
> +	struct pci_dev *pdev =	mgp->pdev;
> +	int cap;
> +
> +	pci_save_state(pdev);
> +	/* now save PCIe and MSI state that Linux will not
> +	   save for us */
> +	cap = pci_find_capability(pdev, PCI_CAP_ID_EXP);
> +	pci_read_config_dword(pdev, cap + PCI_EXP_DEVCTL, &mgp->devctl);
> +	cap = pci_find_capability(pdev, PCI_CAP_ID_MSI);
> +	pci_read_config_word(pdev, cap + PCI_MSI_FLAGS, &mgp->msi_flags);
> +	pci_read_config_dword(pdev, cap + PCI_MSI_ADDRESS_LO,
> +			      &mgp->msi_addr_low);
> +	pci_read_config_dword(pdev, cap + PCI_MSI_ADDRESS_HI,
> +			      &mgp->msi_addr_high);
> +	pci_read_config_word(pdev, cap + PCI_MSI_DATA_32,
> +			     &mgp->msi_data_32);
> +	pci_read_config_word(pdev, cap + PCI_MSI_DATA_64,
> +			     &mgp->msi_data_64);
> +}
> +
> +static void
> +myri10ge_restore_state(struct myri10ge_priv *mgp)
> +{
> +	struct pci_dev *pdev =	mgp->pdev;
> +	int cap;
> +
> +	pci_restore_state(pdev);
> +	/* restore PCIe and MSI state that linux will not */
> +	cap = pci_find_capability(pdev, PCI_CAP_ID_EXP);
> +	pci_write_config_dword(pdev, cap + PCI_CAP_ID_EXP, mgp->devctl);
> +	cap = pci_find_capability(pdev, PCI_CAP_ID_MSI);
> +	pci_write_config_word(pdev, cap + PCI_MSI_FLAGS, mgp->msi_flags);
> +	pci_write_config_dword(pdev, cap + PCI_MSI_ADDRESS_LO,
> +			       mgp->msi_addr_low);
> +	pci_write_config_dword(pdev, cap + PCI_MSI_ADDRESS_HI,
> +			       mgp->msi_addr_high);
> +	pci_write_config_word(pdev, cap + PCI_MSI_DATA_32,
> +			      mgp->msi_data_32);
> +	pci_write_config_word(pdev, cap + PCI_MSI_DATA_64,
> +			      mgp->msi_data_64);
> +}
> +
> +#ifdef CONFIG_PM
> +
> +static int
> +myri10ge_suspend(struct pci_dev *pdev, pm_message_t state)
> +{
> +	struct myri10ge_priv *mgp;
> +	struct net_device *netdev;
> +
> +	mgp = (struct myri10ge_priv *) pci_get_drvdata(pdev);
> +	if (mgp == NULL)
> +		return -EINVAL;
> +	netdev = mgp->dev;
> +
> +	if (netif_running(netdev)) {
> +		printk("myri10ge: closing %s\n", netdev->name);
> +		myri10ge_close(netdev);
> +	}
> +	myri10ge_dummy_rdma(mgp, 0);
> +	free_irq(pdev->irq, mgp);
> +#ifdef CONFIG_PCI_MSI
> +	if (mgp->msi_enabled)
> +		pci_disable_msi(pdev);
> +#endif
> +	netif_device_detach(netdev);
> +	myri10ge_save_state(mgp);
> +	pci_disable_device(pdev);
> +	pci_set_power_state(pdev, pci_choose_state(pdev, state));
> +	return 0;
> +}
> +
> +static int
> +myri10ge_resume(struct pci_dev *pdev)
> +{
> +	struct myri10ge_priv *mgp;
> +	struct net_device *netdev;
> +	int status;
> +
> +	mgp = (struct myri10ge_priv *) pci_get_drvdata(pdev);
> +	if (mgp == NULL)
> +		return -EINVAL;
> +	netdev = mgp->dev;
> +	pci_set_power_state(pdev, 0);  /* zeros conf space as a side effect */
> +	udelay(5000);	/* give card time to respond */
> +	myri10ge_restore_state(mgp);
> +	pci_enable_device(pdev);
> +	pci_set_master(pdev);
> +
> +#ifdef CONFIG_PCI_MSI
> +	if (myri10ge_use_msi(pdev)) {
> +		status = pci_enable_msi(pdev);
> +		if (status != 0) {
> +			dev_err(&pdev->dev,
> +				"Error %d setting up MSI; falling back to xPIC\n",
> +				status);
> +
> +		} else {
> +			mgp->msi_enabled = 1;
> +		}
> +	}
> +#endif
> +	if (myri10ge_napi) {
> +		status = request_irq(pdev->irq, myri10ge_napi_intr, SA_SHIRQ,
> +				     netdev->name, mgp);
> +	} else {
> +
> +		status = request_irq(pdev->irq, myri10ge_intr, SA_SHIRQ,
> +				     netdev->name, mgp);
> +	}

I would prefer to just have the driver always do NAPI.  It's a 10G driver, it
really needs to be NAPI to prevent machine starvation.

> +	if (status != 0) {
> +		dev_err(&pdev->dev, "failed to allocate IRQ\n");
> +		goto abort_with_msi;
> +	}
> +
> +	myri10ge_reset(mgp);
> +	myri10ge_dummy_rdma(mgp, mgp->tx.boundary != 4096);
> +
> +	/* Save configuration space to be restored if the
> +	   nic resets due to a parity error */
> +	myri10ge_save_state(mgp);
> +
> +	netif_device_attach(netdev);
> +	if (netif_running(netdev))
> +		myri10ge_open(netdev);
> +	return 0;
> +
> +abort_with_msi:
> +#ifdef CONFIG_PCI_MSI
> +	if (mgp->msi_enabled)
> +		pci_disable_msi(pdev);
> +#endif
> +	return -EIO;
> +
> +}
> +
> +#endif /* CONFIG_PM */
> +
> +static uint32_t
> +myri10ge_read_reboot(struct myri10ge_priv *mgp)
> +{
> +	struct pci_dev *pdev = mgp->pdev;
> +	int vs = mgp->vendor_specific_offset;
> +	uint32_t reboot;
> +
> +	/*enter read32 mode */
> +	pci_write_config_byte(pdev, vs + 0x10, 0x3);
> +
> +	/*read REBOOT_STATUS (0xfffffff0) */
> +	pci_write_config_dword(pdev, vs + 0x18, 0xfffffff0);
> +	pci_read_config_dword(pdev, vs + 0x14, &reboot);
> +	return reboot;
> +}
> +
> +static void
> +myri10ge_watchdog(void *arg)
> +{
> +	struct myri10ge_priv *mgp = arg;
> +	uint32_t reboot;
> +	int status;
> +	uint16_t cmd, vendor;
> +
> +	mgp->watchdog_resets++;
> +	pci_read_config_word(mgp->pdev, PCI_COMMAND, &cmd);
> +	if ((cmd & PCI_COMMAND_MASTER) == 0) {
> +		/* Bus master DMA disabled?  Check to see
> +		 * if the card rebooted due to a parity error
> +		 * For now, just report it */
> +		reboot = myri10ge_read_reboot(mgp);
> +		printk(KERN_ERR "myri10ge: %s: NIC rebooted (0x%x), resetting\n",
> +		       mgp->dev->name, reboot);
> +		/*
> +		 * A rebooted nic will come back with config space as
> +		 * it was after power was applied to PCIe bus.
> +		 * Attempt to restore config space which was saved
> +		 * when the driver was loaded, or the last time the
> +		 * nic was resumed from power saving mode.
> +		 */
> +		myri10ge_restore_state(mgp);
> +	} else {
> +		/* if we get back -1's from our slot, perhaps somebody
> +		   powered off our card.  Don't try to reset it in
> +		   this case */
> +		if (cmd == 0xffff) {
> +			pci_read_config_word(mgp->pdev, PCI_VENDOR_ID, &vendor);
> +			if (vendor == 0xffff) {
> +				printk(KERN_ERR "myri10ge: %s: device disappeared!\n",
> +				       mgp->dev->name);
> +				return;
> +			}
> +		}
> +		/* Perhaps it is a software error.  Try to reset */
> +
> +		printk(KERN_ERR "myri10ge: %s: device timeout, resetting\n",
> +		       mgp->dev->name);
> +		printk("myri10ge: %s: %d %d %d %d %d\n", mgp->dev->name,
> +		       mgp->tx.req, mgp->tx.done, mgp->tx.pkt_start,
> +		       mgp->tx.pkt_done,
> +		       (int)ntohl(mgp->fw_stats->send_done_count));
> +		set_current_state (TASK_UNINTERRUPTIBLE);
> +		schedule_timeout(HZ*2);
> +		set_current_state(TASK_RUNNING);
> +		printk("myri10ge: %s: %d %d %d %d %d\n", mgp->dev->name,
> +		       mgp->tx.req, mgp->tx.done, mgp->tx.pkt_start,
> +		       mgp->tx.pkt_done,
> +		       (int)ntohl(mgp->fw_stats->send_done_count));
> +	}
> +	myri10ge_close(mgp->dev);
> +	status = myri10ge_load_firmware(mgp);
> +	if (status != 0) {
> +		printk(KERN_ERR "myri10ge: %s: failed to load firmware\n",
> +		       mgp->dev->name);
> +		return;
> +	}
> +	myri10ge_open(mgp->dev);
> +}

Watchdogs are a sign of buggy hardware and drivers!

> +
> +static void
> +myri10ge_watchdog_timer(unsigned long arg)
> +{
> +	struct myri10ge_priv *mgp;
> +
> +	mgp = (struct myri10ge_priv *) arg;
> +	if (mgp->tx.req != mgp->tx.done &&
> +	    mgp->tx.done == mgp->watchdog_tx_done) {
> +		/* nic seems like it might be stuck.. */
> +		schedule_work(&mgp->watchdog_work);
> +	} else {
> +		/* rearm timer */
> +		mod_timer(&mgp->watchdog_timer,
> +			  jiffies + myri10ge_watchdog_timeout * HZ);
> +	}
> +	mgp->watchdog_tx_done = mgp->tx.done;
> +}
> +
> +static int
> +myri10ge_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
> +{
> +	struct net_device *netdev;
> +	struct myri10ge_priv *mgp;
> +	struct device *dev = &pdev->dev;
> +	size_t bytes;
> +	int i;
> +	int status = -ENXIO;
> +	int cap;
> +	u16 val;
> +
> +	netdev = alloc_etherdev(sizeof(*mgp));
> +	if (netdev == NULL) {
> +		dev_err(dev, "Could not allocate ethernet device\n");
> +		return -ENOMEM;
> +	}
> +
> +	mgp = netdev_priv(netdev);
> +	memset(mgp, 0, sizeof (*mgp));
> +	mgp->dev = netdev;
> +	mgp->pdev = pdev;
> +	mgp->csum_flag = MYRI10GE_MCP_ETHER_FLAGS_CKSUM;
> +	mgp->pause = myri10ge_flow_control;
> +	mgp->intr_coal_delay = myri10ge_intr_coal_delay;
> +
> +	spin_lock_init(&mgp->cmd_lock);
> +	if (pci_enable_device(pdev)) {
> +		dev_err(&pdev->dev, "pci_enable_device call failed\n");
> +		status = -ENODEV;
> +		goto abort_with_netdev;
> +	}
> +	myri10ge_select_firmware(mgp);
> +
> +	/* Find the vendor-specific cap so we can check
> +	   the reboot register later on */
> +	mgp->vendor_specific_offset
> +		= pci_find_capability(pdev, PCI_CAP_ID_VNDR);
> +
> +	/* Set our max read request to 4KB */
> +	cap = pci_find_capability(pdev, PCI_CAP_ID_EXP);
> +	if (cap < 64) {
> +		dev_err(&pdev->dev,"Bad PCI_CAP_ID_EXP location %d\n", cap);
> +		goto abort_with_netdev;
> +	}
> +	status = pci_read_config_word(pdev, cap + PCI_EXP_DEVCTL, &val);
> +	if (status != 0) {
> +		dev_err(&pdev->dev, "Error %d reading PCI_EXP_DEVCTL\n", status);
> +		goto abort_with_netdev;
> +	}
> +	val = (val & ~PCI_EXP_DEVCTL_READRQ) | (5 << 12);
> +	status = pci_write_config_word(pdev, cap + PCI_EXP_DEVCTL, val);
> +	if (status != 0) {
> +		dev_err(&pdev->dev, "Error %d writing PCI_EXP_DEVCTL\n", status);
> +		goto abort_with_netdev;
> +	}
> +
> +	pci_set_master(pdev);
> +	status = pci_set_dma_mask(pdev, (dma_addr_t)~0ULL);
> +	if (status != 0) {
> +		dev_err(&pdev->dev, "64-bit pci address mask was refused, trying 32-bit");
> +		status = pci_set_dma_mask(pdev, (dma_addr_t)0xffffffffULL);
> +	}
> +	if (status != 0) {
> +		dev_err(&pdev->dev, "Error %d setting DMA mask\n", status);
> +		goto abort_with_netdev;
> +	}
> +	mgp->cmd = (mcp_cmd_response_t *)
> +		pci_alloc_consistent(pdev, sizeof (*mgp->cmd), &mgp->cmd_bus);
> +	if (mgp->cmd == NULL) {
> +		goto abort_with_netdev;
> +	}
> +
> +	mgp->fw_stats = (mcp_irq_data_t *)
> +		pci_alloc_consistent(pdev, sizeof (*mgp->fw_stats),
> +				     &mgp->fw_stats_bus);
> +	if (mgp->fw_stats == NULL) {
> +		goto abort_with_cmd;
> +	}
> +
> +	strcpy(netdev->name, "eth%d");

Already done by alloc_etherdev()...

> +	mgp->board_span = pci_resource_len(pdev, 0);
> +	mgp->iomem_base = pci_resource_start(pdev, 0);
> +	mgp->mtrr = -1;
> +#ifdef CONFIG_MTRR
> +	mgp->mtrr = mtrr_add(mgp->iomem_base, mgp->board_span,
> +			     MTRR_TYPE_WRCOMB, 1);
> +#endif
> +	/* Hack.  need to get rid of these magic numbers */
> +	mgp->sram_size = 2*1024*1024 - (2*(48*1024)+(32*1024)) - 0x100;
> +	if (mgp->sram_size > mgp->board_span) {
> +		dev_err(&pdev->dev, "board span %ld bytes too small\n",
> +		       mgp->board_span);
> +		goto abort_with_wc;
> +	}
> +	mgp->sram = ioremap(mgp->iomem_base, mgp->board_span);
> +	if (mgp->sram == NULL) {
> +		dev_err(&pdev->dev, "ioremap failed for %ld bytes at 0x%lx\n",
> +		       mgp->board_span, mgp->iomem_base);
> +		status = -ENXIO;
> +		goto abort_with_wc;
> +	}
> +	memcpy_fromio(mgp->eeprom_strings,
> +		      mgp->sram + mgp->sram_size - MYRI10GE_EEPROM_STRINGS_SIZE,
> +		      MYRI10GE_EEPROM_STRINGS_SIZE);
> +	memset(mgp->eeprom_strings + MYRI10GE_EEPROM_STRINGS_SIZE - 2, 0, 2);
> +	status = myri10ge_read_mac_addr(mgp);
> +	if (status) {
> +		goto abort_with_ioremap;
> +	}
extra brackets for a goto

> +	for (i = 0; i < 6; i++) {
use ETH_ALEN not 6

> +		netdev->dev_addr[i] = mgp->mac_addr[i];
> +	}
> +	/* allocate rx done ring */
> +	bytes = myri10ge_max_intr_slots * sizeof (*mgp->rx_done.entry);
> +	mgp->rx_done.entry = (mcp_slot_t *)
> +		pci_alloc_consistent(pdev, bytes, &mgp->rx_done.bus);
> +	if (mgp->rx_done.entry == NULL)
> +		goto abort_with_ioremap;
> +	memset(mgp->rx_done.entry, 0, bytes);
> +
> +	status = myri10ge_load_firmware(mgp);
> +	if (status != 0) {
> +		dev_err(&pdev->dev, "failed to load firmware\n");
> +		goto abort_with_rx_done;
> +	}
> +
> +	status = myri10ge_reset(mgp);
> +	if (status != 0) {
> +		dev_err(&pdev->dev, "failed reset\n");
> +		goto abort_with_firmware;
> +	}
> +
> +#ifdef CONFIG_PCI_MSI

you don't need this ifdef because if CONFIG_PCI_MSI
is not set then pci_enable_msi() is a stub that always fails
(returns non-zero), so your code should handle that....

> +	if (myri10ge_use_msi(pdev)) {
> +		status = pci_enable_msi(pdev);
> +		if (status != 0) {
> +			dev_err(&pdev->dev,
> +				"Error %d setting up MSI; falling back to xPIC\n",
> +				status);
> +		} else {
> +			mgp->msi_enabled = 1;
> +		}
> +	}
> +#endif
> +
> +	if (myri10ge_napi) {
> +		status = request_irq(pdev->irq, myri10ge_napi_intr, SA_SHIRQ,
> +				     netdev->name, mgp);
> +	} else {
> +		status = request_irq(pdev->irq, myri10ge_intr, SA_SHIRQ,
> +				     netdev->name, mgp);
> +	}
> +	if (status != 0) {
> +		dev_err(&pdev->dev, "failed to allocate IRQ\n");
> +		goto abort_with_firmware;
> +	}
> +
> +	pci_set_drvdata(pdev, mgp);
> +	if ((myri10ge_initial_mtu + ETH_HLEN) > MYRI10GE_MAX_ETHER_MTU)
> +		myri10ge_initial_mtu = MYRI10GE_MAX_ETHER_MTU - ETH_HLEN;
> +	if ((myri10ge_initial_mtu + ETH_HLEN) < 68)
> +		myri10ge_initial_mtu = 68;
> +	netdev->mtu = myri10ge_initial_mtu;
> +	netdev->open = myri10ge_open;
> +	netdev->stop = myri10ge_close;
> +	netdev->hard_start_xmit = myri10ge_xmit;
> +	netdev->get_stats = myri10ge_get_stats;
> +	netdev->base_addr = mgp->iomem_base;
> +	netdev->irq = pdev->irq;
> +	netdev->init = myri10ge_init;
> +	netdev->change_mtu = myri10ge_change_mtu;
> +	netdev->set_multicast_list = myri10ge_set_multicast_list;
> +	netdev->set_mac_address = myri10ge_set_mac_address;
> +	netdev->do_ioctl = myri10ge_ioctl;
> +	netdev->features = NETIF_F_SG | NETIF_F_HW_CSUM | NETIF_F_HIGHDMA;

Can't enable HIGHDMA unless you set dma_mask right?

> +#if 0
> +	/* TSO can be enabled via ethtool -K eth1 tso on */
> +#ifdef NETIF_F_TSO
> +	netdev->features |= NETIF_F_TSO;
> +#endif
> +#endif

If it works enable it, if it doesn't take the code out.

> +	if (myri10ge_napi) {
> +		netdev->poll = myri10ge_poll;
> +		netdev->weight = myri10ge_napi_weight;
> +	}
> +
> +	/* Save configuration space to be restored if the
> +	 * nic resets due to a parity error */
> +	myri10ge_save_state(mgp);
> +
> +	/* Setup the watchdog timer */
> +	init_timer(&mgp->watchdog_timer);
> +	mgp->watchdog_timer.data = (unsigned long)mgp;
> +	mgp->watchdog_timer.function = myri10ge_watchdog_timer;

There is setup_timer()

> +
> +	SET_ETHTOOL_OPS(netdev, &myri10ge_ethtool_ops);
> +	INIT_WORK(&mgp->watchdog_work, myri10ge_watchdog, mgp);
> +	status = register_netdev(netdev);
> +	if (status != 0) {
> +		dev_err(&pdev->dev, "register_netdev failed: %d\n", status);
> +		goto abort_with_irq;
> +	}
> +
> +	printk("myri10ge: %s: %s IRQ %d, tx bndry %d, fw %s, WC %s\n",
> +	       netdev->name,  (mgp->msi_enabled ? "MSI" : "xPIC"),
> +	       pdev->irq, mgp->tx.boundary, mgp->fw_name,
> +	       (mgp->mtrr >= 0 ? "Enabled" : "Disabled"));
> +
> +	return 0;
> +
> +abort_with_irq:
> +	free_irq(pdev->irq, mgp);
> +#ifdef CONFIG_PCI_MSI
> +	if (mgp->msi_enabled)
> +		pci_disable_msi(pdev);
> +#endif
> +
> +abort_with_firmware:
> +	myri10ge_dummy_rdma(mgp, 0);
> +
> +abort_with_rx_done:
> +	bytes = myri10ge_max_intr_slots * sizeof (*mgp->rx_done.entry);
> +	pci_free_consistent(pdev, bytes, mgp->rx_done.entry, mgp->rx_done.bus);
> +
> +abort_with_ioremap:
> +	iounmap((void __iomem *) mgp->sram);
> +
> +abort_with_wc:
> +#ifdef CONFIG_MTRR
> +	if (mgp->mtrr >= 0)
> +		mtrr_del(mgp->mtrr, mgp->iomem_base, mgp->board_span);
> +#endif
> +	pci_free_consistent(pdev, sizeof (*mgp->fw_stats),
> +			    mgp->fw_stats, mgp->fw_stats_bus);
> +
> +abort_with_cmd:
> +	pci_free_consistent(pdev, sizeof (*mgp->cmd), mgp->cmd, mgp->cmd_bus);
> +
> +abort_with_netdev:
> +
> +	free_netdev(netdev);
> +	return status;
> +}
> +
> +/****************************************************************
> + * myri10ge_remove
> + *
> + * Does what is necessary to shutdown one Myrinet device. Called
> + *   once for each Myrinet card by the kernel when a module is
> + *   unloaded.
> + ****************************************************************/
> +
> +static void
> +myri10ge_remove(struct pci_dev *pdev)
> +{
> +	struct myri10ge_priv *mgp;
> +	struct net_device *netdev;
> +	size_t bytes;
> +
> +	mgp = (struct myri10ge_priv *) pci_get_drvdata(pdev);
> +	if (mgp == NULL)
> +		return;
> +
> +	flush_scheduled_work();
> +	netdev = mgp->dev;
> +	unregister_netdev(netdev);
> +	free_irq(pdev->irq, mgp);
> +#ifdef CONFIG_PCI_MSI
> +	if (mgp->msi_enabled)
> +		pci_disable_msi(pdev);
> +#endif
> +
> +	myri10ge_dummy_rdma(mgp, 0);
> +
> +
> +	bytes = myri10ge_max_intr_slots * sizeof (*mgp->rx_done.entry);
> +	pci_free_consistent(pdev, bytes, mgp->rx_done.entry, mgp->rx_done.bus);
> +
> +	iounmap((void __iomem *) mgp->sram);
> +
> +#ifdef CONFIG_MTRR
> +	if (mgp->mtrr >= 0)
> +		mtrr_del(mgp->mtrr, mgp->iomem_base, mgp->board_span);
> +#endif
> +	pci_free_consistent(pdev, sizeof (*mgp->fw_stats),
> +			    mgp->fw_stats, mgp->fw_stats_bus);
> +
> +	pci_free_consistent(pdev, sizeof (*mgp->cmd), mgp->cmd, mgp->cmd_bus);
> +
> +	free_netdev(netdev);
> +	pci_set_drvdata(pdev, NULL);
> +}
> +
> +
> +#define MYRI10GE_PCI_VENDOR_MYRICOM 	0x14c1
> +#define MYRI10GE_PCI_DEVICE_Z8E 	0x0008
> +static struct pci_device_id myri10ge_pci_tbl[] = {
> +	{MYRI10GE_PCI_VENDOR_MYRICOM, MYRI10GE_PCI_DEVICE_Z8E,
> +	 PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0},
> +	{0,},
There is a nice PCI_DEVICE() macro for this.

> +};
> +
> +static struct pci_driver myri10ge_driver = {
> +	.name = "myri10ge",
> +	.probe = myri10ge_probe,
> +	.remove = myri10ge_remove,
> +	.id_table = myri10ge_pci_tbl,
> +#ifdef CONFIG_PM
> +	.suspend = myri10ge_suspend,
> +	.resume = myri10ge_resume,
> +#endif
> +};
> +
> +static int
> +myri10ge_init_module(void)

static int __init myri10ge_init_module(void)

> +{
> +	int rc;
> +	printk("%s: Version %s\n", myri10ge_driver.name,
> +	       MYRI10GE_VERSION_STR);
> +	rc = pci_register_driver(&myri10ge_driver);
	return pci_register_driver() ...

> +	return rc < 0 ? rc : 0;
> +}
> +
static void __exit myri10ge_cleanup_module(void)


> +static void
> +myri10ge_cleanup_module(void)
> +{
> +	pci_unregister_driver(&myri10ge_driver);
> +}
> +
> +module_init(myri10ge_init_module);
> +module_exit(myri10ge_cleanup_module);
> +

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/6] myri10ge - First half of the driver
  2006-05-10 21:40 ` [PATCH 4/6] myri10ge - First half of the driver Brice Goglin
  2006-05-10 22:01   ` Stephen Hemminger
  2006-05-10 22:04   ` Roland Dreier
@ 2006-05-10 23:13   ` Francois Romieu
  2006-05-11 23:53     ` Brice Goglin
  2 siblings, 1 reply; 28+ messages in thread
From: Francois Romieu @ 2006-05-10 23:13 UTC (permalink / raw)
  To: Brice Goglin; +Cc: netdev, Andrew Morton, LKML, Andrew J. Gallatin, brice

Brice Goglin <bgoglin@myri.com> :
> [PATCH 4/6] myri10ge - First half of the driver
> 
> The first half of the myri10ge driver core.
> 
> Signed-off-by: Brice Goglin <brice@myri.com>
> Signed-off-by: Andrew J. Gallatin <gallatin@myri.com>
> 
>  myri10ge.c | 1483 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 1483 insertions(+)
> 
> --- /dev/null	2006-05-09 19:43:19.324446250 +0200
> +++ linux/drivers/net/myri10ge/myri10ge.c	2006-05-09 23:00:55.000000000 +0200
[...]
> +module_param(myri10ge_flow_control, int, S_IRUGO);
> +module_param(myri10ge_deassert_wait, int, S_IRUGO | S_IWUSR);
> +module_param(myri10ge_force_firmware, int, S_IRUGO);
> +module_param(myri10ge_skb_cross_4k, int, S_IRUGO | S_IWUSR);
> +module_param(myri10ge_initial_mtu, int, S_IRUGO);
> +module_param(myri10ge_napi, int, S_IRUGO);
> +module_param(myri10ge_napi_weight, int, S_IRUGO);
> +module_param(myri10ge_watchdog_timeout, int, S_IRUGO);
> +module_param(myri10ge_max_irq_loops, int, S_IRUGO);

MODULE_PARM_DESC() would be nice.

> +
> +#define MYRI10GE_FW_OFFSET 1024*1024
> +#define MYRI10GE_HIGHPART_TO_U32(X) \
> +(sizeof (X) == 8) ? ((uint32_t)((uint64_t)(X) >> 32)) : (0)
> +#define MYRI10GE_LOWPART_TO_U32(X) ((uint32_t)(X))
> +
> +#define myri10ge_pio_copy(to,from,size) __iowrite64_copy(to,from,size/8)
> +
> +int myri10ge_hyper_msi_cap_on(struct pci_dev *pdev)

static int ?

[...]
> +static int
> +myri10ge_send_cmd(struct myri10ge_priv *mgp, uint32_t cmd,
> +		  myri10ge_cmd_t *data)
> +{
> +	mcp_cmd_t *buf;
> +	char buf_bytes[sizeof(*buf) + 8];
> +	volatile mcp_cmd_response_t *response = mgp->cmd;
> +	volatile char __iomem *cmd_addr = mgp->sram + MYRI10GE_MCP_CMD_OFFSET;
> +	uint32_t dma_low, dma_high;
> +	int sleep_total = 0;
> +
> +	/* ensure buf is aligned to 8 bytes */
> +	buf = (mcp_cmd_t *) ((unsigned long)(buf_bytes + 7) & ~7UL);
> +
> +	buf->data0 = htonl(data->data0);
> +	buf->data1 = htonl(data->data1);
> +	buf->data2 = htonl(data->data2);
> +	buf->cmd = htonl(cmd);
> +	dma_low = MYRI10GE_LOWPART_TO_U32(mgp->cmd_bus);
> +	dma_high = MYRI10GE_HIGHPART_TO_U32(mgp->cmd_bus);
> +
> +	buf->response_addr.low = htonl(dma_low);
> +	buf->response_addr.high = htonl(dma_high);
> +	spin_lock(&mgp->cmd_lock);
> +	response->result = 0xffffffff;
> +	mb();
> +	myri10ge_pio_copy((void __iomem *) cmd_addr, buf, sizeof (*buf));
> +
> +	/* wait up to 2 seconds */

You must not hold a spinlock for up to 2 seconds.

> +	for (sleep_total = 0; sleep_total < (2 * 1000); sleep_total += 10) {
> +		mb();
> +		if (response->result != 0xffffffff) {
> +			if (response->result == 0) {
> +				data->data0 = ntohl(response->data);
> +				spin_unlock(&mgp->cmd_lock);
> +				return 0;
> +			} else {
> +				dev_err(&mgp->pdev->dev,
> +					"command %d failed, result = %d\n",
> +				       cmd, ntohl(response->result));
> +				spin_unlock(&mgp->cmd_lock);
> +				return -ENXIO;

Return in the middle of a spinlock-intensive function. :o(

> +			}
> +		}
> +		udelay(1000 * 10);
> +	}
> +	spin_unlock(&mgp->cmd_lock);
> +	dev_err(&mgp->pdev->dev, "command %d timed out, result = %d\n",
> +	       cmd, ntohl(response->result));
> +	return -EAGAIN;
> +}
> +
> +
> +/*
> + * The eeprom strings on the lanaiX have the format
> + * SN=x\0
> + * MAC=x:x:x:x:x:x\0
> + * PT:ddd mmm xx xx:xx:xx xx\0
> + * PV:ddd mmm xx xx:xx:xx xx\0
> + */
> +int
> +myri10ge_read_mac_addr(struct myri10ge_priv *mgp)

static int ?

[...]
> +static void
> +myri10ge_dummy_rdma(struct myri10ge_priv *mgp, int enable)
> +{
> +	volatile uint32_t *confirm;
> +	volatile char __iomem *submit;
> +	uint32_t buf[16];
> +	uint32_t dma_low, dma_high;
> +	int i;
> +
> +	/* clear confirmation addr */
> +	confirm = (volatile uint32_t *) mgp->cmd;
> +	*confirm = 0;
> +	mb();
> +
> +	/* send a rdma command to the PCIe engine, and wait for the
> +	 * response in the confirmation address.  The firmware should
> +	 * write a -1 there to indicate it is alive and well
> +	 */
> +	dma_low = MYRI10GE_LOWPART_TO_U32(mgp->cmd_bus);
> +	dma_high = MYRI10GE_HIGHPART_TO_U32(mgp->cmd_bus);
> +
> +	buf[0] = htonl(dma_high); 	/* confirm addr MSW */
> +	buf[1] = htonl(dma_low); 	/* confirm addr LSW */
> +	buf[2] = htonl(0xffffffff);	/* confirm data */
> +	buf[3] = htonl(dma_high); 	/* dummy addr MSW */
> +	buf[4] = htonl(dma_low); 	/* dummy addr LSW */
> +	buf[5] = htonl(enable);		/* enable? */
> +
> +	submit = mgp->sram + 0xfc01c0;
> +
> +	myri10ge_pio_copy((void __iomem *) submit, &buf, sizeof (buf));
> +	mb();
> +	udelay(1000);
> +	mb();
> +	i = 0;
> +	while (*confirm != 0xffffffff && i < 20) {
> +		udelay(1000);
> +		i++;
> +	}

	for (i = 0; *confirm != 0xffffffff && i < 20; i++)
		udelay(1000);


[...]
> +static int
> +myri10ge_adopt_running_firmware(struct myri10ge_priv *mgp)
> +{
> +	mcp_gen_header_t *hdr;
> +	struct device *dev = &mgp->pdev->dev;
> +	size_t bytes, hdr_offset;
> +	int status;
> +
> +	/* find running firmware header */
> +	hdr_offset = ntohl(__raw_readl(mgp->sram + MCP_HEADER_PTR_OFFSET));
> +
> +	if ((hdr_offset & 3) || hdr_offset + sizeof(*hdr) > mgp->sram_size) {
> +		dev_err(dev, "Running firmware has bad header offset (%d)\n",
> +			(int)hdr_offset);
> +		return -EIO;
> +	}
> +
> +	/* copy header of running firmware from SRAM to host memory to
> +	 * validate firmware */
> +	bytes = sizeof (mcp_gen_header_t);

const size_t bytes = ...

> +	hdr = (mcp_gen_header_t *) kmalloc(bytes, GFP_KERNEL);

Useless cast.

[...]
> +static int
> +myri10ge_change_pause(struct myri10ge_priv *mgp, int pause)
> +{
> +	myri10ge_cmd_t cmd;
> +	int status;
> +
> +	if (pause)
> +		status = myri10ge_send_cmd(mgp, MYRI10GE_MCP_ENABLE_FLOW_CONTROL, &cmd);
> +	else
> +		status = myri10ge_send_cmd(mgp, MYRI10GE_MCP_DISABLE_FLOW_CONTROL, &cmd);

	ctl = pause ? MYRI10GE_MCP_ENABLE_FLOW_CONTROL : 
		MYRI10GE_MCP_DISABLE_FLOW_CONTROL;

	status = myri10ge_send_cmd(mgp, ctl, ...)

> +
> +	if (status) {
> +		printk(KERN_ERR "myri10ge: %s: Failed to set flow control mode\n",
> +		       mgp->dev->name);
> +		return -ENXIO;

Why not use the status code returned by myri10ge_send_cmd() ?

[...]
> +static int
> +myri10ge_reset(struct myri10ge_priv *mgp)
> +{
[...]
> +	cmd.data0 = MYRI10GE_LOWPART_TO_U32(mgp->rx_done.bus);
> +	cmd.data1 = MYRI10GE_HIGHPART_TO_U32(mgp->rx_done.bus);
> +	cmd.data2 = len * 0x10001;
> +	status |= myri10ge_send_cmd(mgp, MYRI10GE_MCP_DMA_TEST, &cmd);

The status code is not used.

> +	mgp->read_write_dma = ((cmd.data0>>16) * len * 2 * 2) /
> +		(cmd.data0 & 0xffff);
> +
> +	memset(mgp->rx_done.entry, 0, bytes);
> +
> +	/* reset mcp/driver shared state back to 0 */
> +	mgp->tx.req = 0;
> +	mgp->tx.done = 0;
> +	mgp->tx.pkt_start = 0;
> +	mgp->tx.pkt_done = 0;
> +	mgp->rx_big.cnt = 0;
> +	mgp->rx_small.cnt = 0;
> +	mgp->rx_done.idx = 0;
> +	mgp->rx_done.cnt = 0;
> +	status = myri10ge_update_mac_address(mgp, mgp->dev->dev_addr);
> +	myri10ge_change_promisc(mgp, 0);
> +	myri10ge_change_pause(mgp, mgp->pause);
> +	return status;
> +}
> +
> +static inline void
> +myri10ge_submit_8rx(mcp_kreq_ether_recv_t __iomem *dst, mcp_kreq_ether_recv_t *src)
> +{
> +	uint32_t low;
> +
> +	low = src->addr_low;
> +	src->addr_low = 0xffffffff;

DMA_32BIT_MASK ?

> +	myri10ge_pio_copy(dst, src, 8 * sizeof(*src));
> +	mb();
> +	src->addr_low = low;
> +	*(uint32_t __force *) &dst->addr_low = src->addr_low;
> +	mb();
> +}
> +
> +/*
> + * Set of routines to get a new receive buffer.  Any buffer which
> + * crosses a 4KB boundary must start on a 4KB boundary due to PCIe
> + * wdma restrictions. We also try to align any smaller allocation to
> + * at least a 16 byte boundary for efficiency.  We assume the linux
> + * memory allocator works by powers of 2, and will not return memory
> + * smaller than 2KB which crosses a 4KB boundary.  If it does, we fall
> + * back to allocating 2x as much space as required.
> + */
> +
> +static inline struct sk_buff *
> +myri10ge_alloc_big(int bytes)

It fits on a single line.

> +{
> +	struct sk_buff *skb;
> +	unsigned long data, roundup;
> +
> +	skb = dev_alloc_skb(bytes + 4096 + MYRI10GE_MCP_ETHER_PAD);
> +	if (skb == NULL)
> +		return NULL;

Imho you will want to work directly with pages shortly.

[...]
> +static irqreturn_t
> +myri10ge_napi_intr(int irq, void *arg, struct pt_regs *regs)
> +{
> +	struct myri10ge_priv *mgp = (struct myri10ge_priv *) arg;

Useless cast.

[...]
> +static int
> +myri10ge_set_settings(struct net_device *netdev, struct ethtool_cmd *cmd)
> +{
> +	return -EINVAL;
> +}

Useless.

-- 
Ueimor

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/6] myri10ge - First half of the driver
  2006-05-10 22:04   ` Roland Dreier
@ 2006-05-11 23:53     ` Brice Goglin
  0 siblings, 0 replies; 28+ messages in thread
From: Brice Goglin @ 2006-05-11 23:53 UTC (permalink / raw)
  To: Roland Dreier; +Cc: netdev, LKML, Andrew J. Gallatin

Roland Dreier wrote:
>  > +#define myri10ge_pio_copy(to,from,size) __iowrite64_copy(to,from,size/8)
>
> Why do you need this wrapper?  Why not just call __iowrite64_copy()
> without the obfuscation?  Anyone reading the code will just have to
> search back to this define and mentally translate the size back and
> forth all the time.
>   

Well, I know that extra abstraction layers are bad. But in this case I really
think that a name like myri10ge_pio_copy(size) is way less obfuscating
than __iowrite64_copy(size/8).
Will fix it if it really matters.
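
For reference, the unit mismatch under discussion can be sketched in user
space. This is only an illustration under stated assumptions:
copy64_words() is a hypothetical stand-in for __iowrite64_copy(), which
takes its count in 64-bit words, and pio_copy_bytes() plays the role of
the myri10ge_pio_copy() wrapper, which takes a size in bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for __iowrite64_copy():
 * copies `count` 64-bit words, not bytes. */
static void copy64_words(volatile uint64_t *to, const uint64_t *from,
			 size_t count)
{
	size_t i;

	for (i = 0; i < count; i++)
		to[i] = from[i];
}

/* Byte-sized wrapper in the spirit of myri10ge_pio_copy().
 * Integer division silently drops a trailing partial word, so callers
 * are expected to pass a multiple of 8. */
static void pio_copy_bytes(volatile uint64_t *to, const uint64_t *from,
			   size_t bytes)
{
	copy64_words(to, from, bytes / 8);
}
```

Written as a function rather than a macro, the wrapper also evaluates its
size argument before dividing, which the unparenthesized size/8 in the
quoted #define does not guarantee for arbitrary expressions.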


>  > +int myri10ge_hyper_msi_cap_on(struct pci_dev *pdev)
>  > +{
>  > +	uint8_t cap_off;
>  > +	int nbcap = 0;
>  > +
>  > +	cap_off = PCI_CAPABILITY_LIST - 1;
>  > +	/* go through all caps looking for a hypertransport msi mapping */
>
> This looks like something that should be fixed up in the general PCI
> quirk handling rather than in every driver...
>
>  > +static int
>  > +myri10ge_use_msi(struct pci_dev *pdev)
>  > +{
>  > +	if (myri10ge_msi == 1 || myri10ge_msi == 0)
>  > +		return myri10ge_msi;
>  > +
>  > +	/*  find root complex for our device */
>  > +	while (pdev->bus && pdev->bus->self) {
>  > +		pdev = pdev->bus->self;
>  > +	}
>
> Similarly looks like generic PCI code (if it's needed at all).  If I
> understand correctly you're trying to check if MSI has a chance at
> working on the system, but a network device driver has no business
> walking up the PCI hierarchy.
>   

Right, I will look at moving all this to the core PCI code.


Thanks for all the comments.

Brice


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 5/6] myri10ge - Second half of the driver
  2006-05-10 22:22   ` Stephen Hemminger
@ 2006-05-11 23:53     ` Brice Goglin
  2006-05-12  0:31       ` Herbert Xu
  0 siblings, 1 reply; 28+ messages in thread
From: Brice Goglin @ 2006-05-11 23:53 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, LKML, Andrew J. Gallatin

Stephen Hemminger wrote:
>> +
>> +static int
>> +myri10ge_open(struct net_device *dev)
>>     
>
> It is preferred to put function declarations on one line.
>
> static int myri10ge_open(struct net_device *dev)
>   

Well, I have seen several threads about this in the archive, with some
people against and some people for. I personally like grepping for the
declaration of a function using ^name.
If this coding style is really required, I will do so.

> I would prefer to just have driver always do NAPI.  It's a 10G driver, it
> really needs to be NAPI to prevent machine starvation.
>   

When TSO is disabled, we see performance drop by about 300 Mb/s when
enabling NAPI. But we'll probably enable TSO by default (see below) so
we'll probably drop non-NAPI.

>> +	myri10ge_close(mgp->dev);
>> +	status = myri10ge_load_firmware(mgp);
>> +	if (status != 0) {
>> +		printk(KERN_ERR "myri10ge: %s: failed to load firmware\n",
>> +		       mgp->dev->name);
>> +		return;
>> +	}
>> +	myri10ge_open(mgp->dev);
>> +}
>>     
>
> Watchdogs are a sign of buggy hardware and drivers!
>   

Well... the watchdog is supposed to help detect memory parity errors
in the NIC. It's rare, but it happens with cosmic rays. The recovery
part still needs some work anyway. So we might drop the watchdog for now
and come back when recovery is ready.

Additionally, we are using our own watchdog because the linux netdev
watchdog does not seem to work well for devices with large hardware
transmit queues.  If there is a hardware problem, a single TCP stream
(or even a handful of them) will not back up into the hardware queue in a
timely fashion, leading to a long delay before the netdev watchdog
routine is called.

>> +#if 0
>> +	/* TSO can be enabled via ethtool -K eth1 tso on */
>> +#ifdef NETIF_F_TSO
>> +	netdev->features |= NETIF_F_TSO;
>> +#endif
>> +#endif
>>     
>
> If it works enable it, if it doesn't take the code out.
>   

It works. We did not enable it by default because there were some
problems in older kernels. They seem to be fixed in recent kernels. So
we'll enable TSO by default and have people disable it if it causes
problems.


>> [PATCH 3/6] myri10ge - Driver header files
>>
>> myri10ge driver header files.
>> myri10ge_mcp.h is the generic header, while myri10ge_mcp_gen_header.h
>> is automatically generated from our firmware image.
>>     
>
> Then clean it up after the auto generation.
> Auto generated code still gets maintained by humans.
>   

Oops sorry, I forgot to apply my cleaning script before sending.


>> +#define MYRI10GE_MCP_MAJOR	1
>> +#define MYRI10GE_MCP_MINOR	4
>> +
>>     
>
> Major/Minor for what. You don't have a character device.
>   

That's the firmware version, we'll find better names.



Thanks a lot for all the comments.

Brice


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/6] myri10ge - First half of the driver
  2006-05-10 22:06     ` Roland Dreier
@ 2006-05-11 23:53       ` Brice Goglin
  0 siblings, 0 replies; 28+ messages in thread
From: Brice Goglin @ 2006-05-11 23:53 UTC (permalink / raw)
  To: Roland Dreier; +Cc: Stephen Hemminger, LKML

Roland Dreier wrote:
>     Stephen> Splitting it in half, might help email restrictions, but
>     Stephen> it kills future users of 'git bisect' who expect to have
>     Stephen> every kernel buildable.
>
> Not really, since the makefile/kconfig stuff comes in a later patch.
>
> But yes, it is cleaner to have drivers go in in sane pieces.
>   

Yes sure. But the submission was not supposed to get merged as is
anyway, so I thought breaking the driver was not a big deal for this time.

By the way, what's the exact message size limit ?

Brice


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/6] myri10ge - First half of the driver
  2006-05-10 23:13   ` Francois Romieu
@ 2006-05-11 23:53     ` Brice Goglin
  2006-05-12  6:47       ` Evgeniy Polyakov
                         ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Brice Goglin @ 2006-05-11 23:53 UTC (permalink / raw)
  To: Francois Romieu; +Cc: netdev, LKML, Andrew J. Gallatin

Francois Romieu wrote:
>
>> +	spin_lock(&mgp->cmd_lock);
>> +	response->result = 0xffffffff;
>> +	mb();
>> +	myri10ge_pio_copy((void __iomem *) cmd_addr, buf, sizeof (*buf));
>> +
>> +	/* wait up to 2 seconds */
>>     
>
> You must not hold a spinlock for up to 2 seconds.
>   

We are working on reducing the delay to about 15ms. It only occurs when
the driver is loaded or the link brought up.

>> +	for (sleep_total = 0; sleep_total < (2 * 1000); sleep_total += 10) {
>> +		mb();
>> +		if (response->result != 0xffffffff) {
>> +			if (response->result == 0) {
>> +				data->data0 = ntohl(response->data);
>> +				spin_unlock(&mgp->cmd_lock);
>> +				return 0;
>> +			} else {
>> +				dev_err(&mgp->pdev->dev,
>> +					"command %d failed, result = %d\n",
>> +				       cmd, ntohl(response->result));
>> +				spin_unlock(&mgp->cmd_lock);
>> +				return -ENXIO;
>>     
>
> Return in the middle of a spinlock-intensive function. :o(
>   

What do you mean ?

>   
>> +{
>> +	struct sk_buff *skb;
>> +	unsigned long data, roundup;
>> +
>> +	skb = dev_alloc_skb(bytes + 4096 + MYRI10GE_MCP_ETHER_PAD);
>> +	if (skb == NULL)
>> +		return NULL;
>>     
>
> Imho you will want to work directly with pages shortly.
>   

We had thought about doing this, but were a little nervous since we did
not know of any other drivers that worked directly with pages.  If this
is an official direction to work directly with pages, we will. But the
existing approach is well tested through our beta cycle, and we would
prefer to leave it as is and update to a pages based approach in the
future.


Thanks a lot for all the comments.

Brice


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 5/6] myri10ge - Second half of the driver
  2006-05-11 23:53     ` Brice Goglin
@ 2006-05-12  0:31       ` Herbert Xu
  0 siblings, 0 replies; 28+ messages in thread
From: Herbert Xu @ 2006-05-12  0:31 UTC (permalink / raw)
  To: Brice Goglin; +Cc: shemminger, netdev, linux-kernel, gallatin

Brice Goglin <brice@myri.com> wrote:
>
>> It is preferred to put function declarations on one line.
>>
>> static int myri10ge_open(struct net_device *dev)
> 
> Well, I have seen several threads about this in the archive, with some
> people against and some people for. I personally like grepping for the
> declaration of a function using ^name.
> If this coding style is really required, I will do so.

Yes this is the standard coding style used in Linux so please do.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/6] myri10ge - First half of the driver
  2006-05-11 23:53     ` Brice Goglin
@ 2006-05-12  6:47       ` Evgeniy Polyakov
  2006-05-12 17:40         ` David S. Miller
  2006-05-13 16:13       ` Francois Romieu
  2006-05-15 17:02       ` Lee Revell
  2 siblings, 1 reply; 28+ messages in thread
From: Evgeniy Polyakov @ 2006-05-12  6:47 UTC (permalink / raw)
  To: Brice Goglin; +Cc: Francois Romieu, netdev, LKML, Andrew J. Gallatin

On Fri, May 12, 2006 at 01:53:44AM +0200, Brice Goglin (brice@myri.com) wrote:
> > Imho you will want to work directly with pages shortly.
> >   
> 
> We had thought about doing this, but were a little nervous since we did
> not know of any other drivers that worked directly with pages.  If this
> is an official direction to work directly with pages, we will. 

s2io does. e1000 does it with skb frags.
If your hardware allows header split and the driver can put headers into
skb->data and real data into frag_list, that makes it possible to build
various interesting things such as receive zero-copy and netchannels
support. It is work in progress, not an official direction currently,
but it will definitely help your driver support future high
performance extensions.

> Brice

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/6] myri10ge - First half of the driver
  2006-05-12  6:47       ` Evgeniy Polyakov
@ 2006-05-12 17:40         ` David S. Miller
  0 siblings, 0 replies; 28+ messages in thread
From: David S. Miller @ 2006-05-12 17:40 UTC (permalink / raw)
  To: johnpol; +Cc: brice, romieu, netdev, linux-kernel, gallatin

From: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Date: Fri, 12 May 2006 10:47:11 +0400

> On Fri, May 12, 2006 at 01:53:44AM +0200, Brice Goglin (brice@myri.com) wrote:
> > > Imho you will want to work directly with pages shortly.
> > >   
> > 
> > We had thought about doing this, but were a little nervous since we did
> > not know of any other drivers that worked directly with pages.  If this
> > is an official direction to work directly with pages, we will. 
> 
> s2io does. e1000 does it with skb frags.
> If your hardware allows header split and the driver can put headers into
> skb->data and real data into frag_list, that makes it possible to build
> various interesting things such as receive zero-copy and netchannels
> support. It is work in progress, not an official direction currently,
> but it will definitely help your driver support future high
> performance extensions.

The most important impact is not having to use order 1 pages
for jumbo MTU frames, which are more likely to fail allocations
than order 0 pages under heavy load.
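
As a rough illustration of the arithmetic (a sketch assuming 4 KB pages;
order_for() is a hypothetical helper that mimics what the kernel's
get_order() computes), the allocation order is the smallest power-of-two
number of pages that covers the buffer:

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4 KB pages */

/* Smallest order such that (SKETCH_PAGE_SIZE << order) >= bytes. */
static unsigned int order_for(size_t bytes)
{
	unsigned int order = 0;

	while ((SKETCH_PAGE_SIZE << order) < bytes)
		order++;
	return order;
}
```

Under these assumptions a 1500-byte buffer fits in an order 0 allocation,
while anything from 4097 to 8192 bytes needs order 1, i.e. two physically
contiguous pages, which are harder to find under memory pressure than a
single page.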

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/6] myri10ge - First half of the driver
  2006-05-11 23:53     ` Brice Goglin
  2006-05-12  6:47       ` Evgeniy Polyakov
@ 2006-05-13 16:13       ` Francois Romieu
  2006-05-15 17:02       ` Lee Revell
  2 siblings, 0 replies; 28+ messages in thread
From: Francois Romieu @ 2006-05-13 16:13 UTC (permalink / raw)
  To: Brice Goglin; +Cc: netdev, LKML, Andrew J. Gallatin

Brice Goglin <brice@myri.com> :
[...]
> > Return in the middle of a spinlock-intensive function. :o(
> >   
> 
> What do you mean ?

It is preferred for maintenance purposes (hello Mr Morton) to organize
the control flow with a single spin_{lock/unlock} pair: if there is a
branch in the control flow, it ought to be joined again before returning.
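
A minimal sketch of that shape, in user-space C. The lock is mocked with
a counter and decode_response() is a hypothetical helper, not code from
the driver; the error value mirrors the -ENXIO used in the patch.

```c
#include <errno.h>
#include <stdint.h>

/* Mock lock: a counter stands in for spin_lock()/spin_unlock() so the
 * pairing can be checked in user space. */
static int lock_depth;
static void mock_lock(void)   { lock_depth++; }
static void mock_unlock(void) { lock_depth--; }

/* Hypothetical helper: a result of 0 means success, anything else is
 * treated as a firmware error. Both branches fall through to the single
 * unlock and return at the end. */
static int decode_response(uint32_t result, uint32_t payload, uint32_t *out)
{
	int ret;

	mock_lock();
	if (result == 0) {
		*out = payload;
		ret = 0;
	} else {
		ret = -ENXIO;
	}
	mock_unlock();

	return ret;
}
```

With one unlock site, adding another branch later cannot leave the lock
held by accident.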

-- 
Ueimor

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 6/6] myri10ge - Kconfig and Makefile
  2006-05-10 21:43 ` [PATCH 6/6] myri10ge - Kconfig and Makefile Brice Goglin
@ 2006-05-13 18:51   ` Adrian Bunk
  2006-05-13 18:56     ` Brice Goglin
  0 siblings, 1 reply; 28+ messages in thread
From: Adrian Bunk @ 2006-05-13 18:51 UTC (permalink / raw)
  To: Brice Goglin; +Cc: netdev, Andrew Morton, LKML, Andrew J. Gallatin

On Wed, May 10, 2006 at 11:43:55PM +0200, Brice Goglin wrote:
>...
> --- linux-mm/drivers/net/Makefile.old	2006-04-08 04:49:53.000000000 -0700
> +++ linux-mm/drivers/net/Makefile	2006-04-21 08:10:27.000000000 -0700
> @@ -192,6 +192,7 @@ obj-$(CONFIG_R8169) += r8169.o
>  obj-$(CONFIG_AMD8111_ETH) += amd8111e.o
>  obj-$(CONFIG_IBMVETH) += ibmveth.o
>  obj-$(CONFIG_S2IO) += s2io.o
> +obj-$(CONFIG_MYRI10GE) += myri10ge/
>  obj-$(CONFIG_SMC91X) += smc91x.o
>  obj-$(CONFIG_SMC911X) += smc911x.o
>  obj-$(CONFIG_DM9000) += dm9000.o
> --- /dev/null	2006-04-21 00:45:09.064430000 -0700
> +++ linux-mm/drivers/net/myri10ge/Makefile	2006-04-21 08:14:21.000000000 -0700
> @@ -0,0 +1,5 @@
> +#
> +# Makefile for the Myricom Myri-10G ethernet driver
> +#
> +
> +obj-$(CONFIG_MYRI10GE) += myri10ge.o

If the driver consists of one source file, why does it need its own
subdir?

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 6/6] myri10ge - Kconfig and Makefile
  2006-05-13 18:51   ` Adrian Bunk
@ 2006-05-13 18:56     ` Brice Goglin
  0 siblings, 0 replies; 28+ messages in thread
From: Brice Goglin @ 2006-05-13 18:56 UTC (permalink / raw)
  To: Adrian Bunk; +Cc: netdev, LKML, Andrew J. Gallatin

Adrian Bunk wrote:
> On Wed, May 10, 2006 at 11:43:55PM +0200, Brice Goglin wrote:
>   
>> ...
>> --- linux-mm/drivers/net/Makefile.old	2006-04-08 04:49:53.000000000 -0700
>> +++ linux-mm/drivers/net/Makefile	2006-04-21 08:10:27.000000000 -0700
>> @@ -192,6 +192,7 @@ obj-$(CONFIG_R8169) += r8169.o
>>  obj-$(CONFIG_AMD8111_ETH) += amd8111e.o
>>  obj-$(CONFIG_IBMVETH) += ibmveth.o
>>  obj-$(CONFIG_S2IO) += s2io.o
>> +obj-$(CONFIG_MYRI10GE) += myri10ge/
>>  obj-$(CONFIG_SMC91X) += smc91x.o
>>  obj-$(CONFIG_SMC911X) += smc911x.o
>>  obj-$(CONFIG_DM9000) += dm9000.o
>> --- /dev/null	2006-04-21 00:45:09.064430000 -0700
>> +++ linux-mm/drivers/net/myri10ge/Makefile	2006-04-21 08:14:21.000000000 -0700
>> @@ -0,0 +1,5 @@
>> +#
>> +# Makefile for the Myricom Myri-10G ethernet driver
>> +#
>> +
>> +obj-$(CONFIG_MYRI10GE) += myri10ge.o
>>     
>
> If the driver consists of one source file, why does it need its own
> subdir?
>
> cu
> Adrian
>   


We also have 2 header files. But, I am fine with putting our 3 files in
drivers/net/ instead.

Brice


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/6] myri10ge - First half of the driver
  2006-05-11 23:53     ` Brice Goglin
  2006-05-12  6:47       ` Evgeniy Polyakov
  2006-05-13 16:13       ` Francois Romieu
@ 2006-05-15 17:02       ` Lee Revell
  2006-05-15 17:39         ` Brice Goglin
  2 siblings, 1 reply; 28+ messages in thread
From: Lee Revell @ 2006-05-15 17:02 UTC (permalink / raw)
  To: Brice Goglin; +Cc: Francois Romieu, netdev, LKML, Andrew J. Gallatin

On Fri, 2006-05-12 at 01:53 +0200, Brice Goglin wrote:
> Francois Romieu wrote:
> >
> >> +	spin_lock(&mgp->cmd_lock);
> >> +	response->result = 0xffffffff;
> >> +	mb();
> >> +	myri10ge_pio_copy((void __iomem *) cmd_addr, buf, sizeof (*buf));
> >> +
> >> +	/* wait up to 2 seconds */
> >>     
> >
> > You must not hold a spinlock for up to 2 seconds.
> >   
> 
> We are working on reducing the delay to about 15ms. It only occurs when
> the driver is loaded or the link brought up.

I think 15ms is quite a long time to hold a spinlock also - most
spinlocks in the kernel are held for less than 500 microseconds.

Can't you use a mutex?

Lee
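
[Editor's sketch] To make the suggestion above concrete, here is a minimal
userspace analogue of the pattern being discussed: take a sleeping lock,
mark the response as pending, then poll with short sleeps instead of
spinning while the lock is held. It is not the myri10ge code. The names
fake_nic, fake_firmware_tick and fake_send_cmd, and the 1 ms poll interval,
are assumptions made up for illustration. In the kernel the equivalent
calls would presumably be mutex_lock()/mutex_unlock() and msleep(), and
this is only valid in a context that is allowed to sleep.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <time.h>

#define CMD_PENDING 0xffffffffu	/* same sentinel value as in the quoted patch */

/* Hypothetical stand-in for the driver's private state. */
struct fake_nic {
	pthread_mutex_t cmd_lock;	/* sleeping lock instead of a spinlock */
	volatile uint32_t result;	/* stands in for response->result */
	int polls_until_done;		/* simulated firmware latency; 0 = never answers */
};

static void sleep_ms(long ms)
{
	struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };

	nanosleep(&ts, NULL);
}

/* Simulated firmware: writes result 0 once enough polls have elapsed. */
static void fake_firmware_tick(struct fake_nic *nic)
{
	if (nic->polls_until_done > 0 && --nic->polls_until_done == 0)
		nic->result = 0;
}

/* Returns 0 if the simulated firmware answered within timeout_ms, -1 otherwise. */
static int fake_send_cmd(struct fake_nic *nic, int timeout_ms)
{
	int waited;
	int ret = -1;

	assert(nic != NULL);

	pthread_mutex_lock(&nic->cmd_lock);
	nic->result = CMD_PENDING;
	/* A real driver would copy the command to the NIC at this point. */

	for (waited = 0; waited < timeout_ms; waited++) {
		fake_firmware_tick(nic);
		if (nic->result != CMD_PENDING) {
			ret = 0;
			break;
		}
		sleep_ms(1);	/* other threads can run while we wait */
	}
	pthread_mutex_unlock(&nic->cmd_lock);

	return ret;
}
```

The point of the design is that a waiter blocked on the lock sleeps rather
than burning a CPU, so the length of the wait (15 ms or 2 seconds) matters
much less than it does with a spinlock.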



* Re: [PATCH 4/6] myri10ge - First half of the driver
  2006-05-15 17:02       ` Lee Revell
@ 2006-05-15 17:39         ` Brice Goglin
  0 siblings, 0 replies; 28+ messages in thread
From: Brice Goglin @ 2006-05-15 17:39 UTC (permalink / raw)
  To: Lee Revell; +Cc: Francois Romieu, netdev, LKML, Andrew J. Gallatin

Lee Revell wrote:
> On Fri, 2006-05-12 at 01:53 +0200, Brice Goglin wrote:
>   
>> Francois Romieu wrote:
>>     
>>>> +	spin_lock(&mgp->cmd_lock);
>>>> +	response->result = 0xffffffff;
>>>> +	mb();
>>>> +	myri10ge_pio_copy((void __iomem *) cmd_addr, buf, sizeof (*buf));
>>>> +
>>>> +	/* wait up to 2 seconds */
>>>>     
>>>>         
>>> You must not hold a spinlock for up to 2 seconds.
>>>   
>>>       
>> We are working on reducing the delay to about 15ms. It only occurs when
>> the driver is loaded or the link brought up.
>>     
>
> I think 15ms is quite a long time to hold a spinlock also - most
> spinlocks in the kernel are held for less than 500 microseconds.
>
> Can't you use a mutex?
>   

It looks like rtnl_lock protects us here. We are working on it.

Brice



end of thread, other threads:[~2006-05-15 17:40 UTC | newest]

Thread overview: 28+ messages
2006-05-10 21:22 [PATCH 0/6] myri10ge - Myri-10G Ethernet driver Brice Goglin
2006-05-10 21:34 ` [PATCH 1/6] myri10ge - Revive pci_find_ext_capability Brice Goglin
2006-05-10 21:35 ` [PATCH 2/6] myri10ge - Add missing PCI IDs Brice Goglin
2006-05-10 21:52   ` Andi Kleen
2006-05-10 21:36 ` [PATCH 3/6] myri10ge - Driver header files Brice Goglin
2006-05-10 21:57   ` Roland Dreier
2006-05-10 22:00   ` Stephen Hemminger
2006-05-10 22:02   ` Francois Romieu
2006-05-10 21:40 ` [PATCH 4/6] myri10ge - First half of the driver Brice Goglin
2006-05-10 22:01   ` Stephen Hemminger
2006-05-10 22:06     ` Roland Dreier
2006-05-11 23:53       ` Brice Goglin
2006-05-10 22:04   ` Roland Dreier
2006-05-11 23:53     ` Brice Goglin
2006-05-10 23:13   ` Francois Romieu
2006-05-11 23:53     ` Brice Goglin
2006-05-12  6:47       ` Evgeniy Polyakov
2006-05-12 17:40         ` David S. Miller
2006-05-13 16:13       ` Francois Romieu
2006-05-15 17:02       ` Lee Revell
2006-05-15 17:39         ` Brice Goglin
2006-05-10 21:42 ` [PATCH 5/6] myri10ge - Second " Brice Goglin
2006-05-10 22:22   ` Stephen Hemminger
2006-05-11 23:53     ` Brice Goglin
2006-05-12  0:31       ` Herbert Xu
2006-05-10 21:43 ` [PATCH 6/6] myri10ge - Kconfig and Makefile Brice Goglin
2006-05-13 18:51   ` Adrian Bunk
2006-05-13 18:56     ` Brice Goglin
