* [PATCH 0/6] iSER (iSCSI Extensions for RDMA) initiator
From: Or Gerlitz @ 2006-04-27 12:30 UTC
  To: rdreier; +Cc: openib-general, linux-kernel

Roland,

The patch series that follows contains the iSER code which we want to submit
upstream for 2.6.18. Below I have placed some general information on iSER for
LKML reviewers (please CC openib-general@openib.org on your responses).

Details are provided here on some new code which iSER depends upon and which
is expected in 2.6.18 (I have communicated them to you already but prefer to
repeat them here for clarity).

iSER is dependent on three new changesets/functionalities which are expected 
in 2.6.18, two in iscsi and one in infiniband.

+1 libiscsi - a kernel library (module) implementing lots of common code to
   iscsi_tcp and iscsi_iser

+2 iscsi transport ep callbacks - the first patch in this RFC, which enables
   an iscsi transport to establish/disconnect its connection from the kernel

+3 the rdma cm (CMA) - a module that implements RDMA transport neutral Address
   Translation and Communication Management (CM). iSER, like most of the
   in-progress IB RC ULPs (e.g. SDP, NFSoRDMA, Lustre), is coded to the CMA API.

The patch adding libiscsi is one of 5 iSCSI patches already present in the
scsi-misc git tree; the ep callbacks patch is expected to be pushed there
by the end of this week. The CMA is present in the infiniband git tree.

To compile the code you need to patch 2.6.17-rcX with the 6 iSCSI patches
described above (iSER depends directly on only two of them, but the patches
might apply only in the full order). The patches are also available under
https://openib.org/svn/gen2/branches/backport/2.6.17

The code has been tested with 2.6.16 and 2.6.17-rc3 (drivers/infiniband and 
include/rdma being latest openib) and the user space part of latest open-iscsi. 
The only patches over this setting were the iscsi updates for 2.6.18. 

During the 2.6.17 testing a kmem_cache_destroy crash popped up which seems
unrelated to iSER; I have sent a bug report on the matter today.

The iSER targets in this testing were of two types: Voltaire's IB/FC router
and Voltaire's native IB storage box. An open source iSER target project was
also kicked off recently.

OK, here is some general information on iSER:

iSER (iSCSI Extensions for RDMA) is defined by the IETF IP Storage (IPS) working
group. An iSER annex was also recently approved to appear in the IB spec.

This driver is an iSER transport implementation for the Open iSCSI initiator
(www.open-iscsi.org) whose kernel portion and TCP transport provider are merged
as of 2.6.15 (scsi_transport_iscsi & iscsi_tcp, and with 2.6.18 also libiscsi).

Hence iSER is both a provider of the Linux iSCSI transport api (scsi/
scsi_transport_iscsi.h) and a SCSI LLD (Low Level Driver) of the Linux SCSI 
midlayer api (scsi/scsi_host.h)

The Open iSCSI initiator carries out target discovery and login from user
space; once the login negotiation is done, the transport connection is bound
to an iSCSI connection. The diagram under
http://www.open-iscsi.org/docs/open-iscsi-1.jpg shows the connecting sequence
for TCP.

Up to 2.6.18, the transport was expected to use a socket for the connection, as
Linux has the means to move a socket from user to kernel space. This restriction,
the inability to move an IB QP (Queue-Pair) from user to kernel space, and the
wish to integrate with more transports such as iSCSI offloads led to a change
in iscsi under which the transport is allowed to create/connect its native "end
point" either from user space (e.g. TCP/socket) or from the kernel (iSER/QP).
Afterwards the transport connection is bound to an iSCSI connection.

Basically, it goes like:
+1 target discovery over TCP/IP with the discovery server
+2.TCP  socket create/bind/setopt/connect to the target
+2.iSER CMA_ID/QP create/connect to the target
+3 iscsi session create
+4 iscsi connection create
+5 bind iscsi connection to the transport connection 
+6 login request/response negotiation
+7 iscsi connection start
+8 the SCSI midlayer starts its inquiry and so on

Or Gerlitz



* [PATCH 1/6] iSER's Makefile and Kconfig
From: Or Gerlitz @ 2006-04-27 12:30 UTC
  To: rdreier; +Cc: openib-general, linux-kernel

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>

--- /usr/src/linux-2.6.17-rc3/drivers/infiniband/ulp/iser-x/Makefile	1970-01-01 02:00:00.000000000 +0200
+++ /usr/src/linux-2.6.17-rc3/drivers/infiniband/ulp/iser/Makefile	2006-04-27 15:12:33.000000000 +0300
@@ -0,0 +1,6 @@
+obj-$(CONFIG_INFINIBAND_ISER)	+= ib_iser.o
+
+ib_iser-y			:= iser_verbs.o \
+				   iser_initiator.o \
+				   iser_memory.o \
+				   iscsi_iser.o 
--- /usr/src/linux-2.6.17-rc3/drivers/infiniband/ulp/iser-x/Kconfig	1970-01-01 02:00:00.000000000 +0200
+++ /usr/src/linux-2.6.17-rc3/drivers/infiniband/ulp/iser/Kconfig	2006-04-16 11:04:42.000000000 +0300
@@ -0,0 +1,12 @@
+config INFINIBAND_ISER
+	tristate "ISCSI RDMA Protocol"
+	depends on INFINIBAND && SCSI
+	select SCSI_ISCSI_ATTRS
+	---help---
+
+	  Support for the ISCSI RDMA Protocol over InfiniBand.  This
+	  allows you to access storage devices that speak ISER/ISCSI
+	  over InfiniBand.
+
+	  The ISER protocol is defined by IETF.
+	  See <http://www.ietf.org/>.



* [PATCH 2/6] iscsi_iser header file
From: Or Gerlitz @ 2006-04-27 12:31 UTC
  To: rdreier; +Cc: openib-general, linux-kernel

iscsi_iser is the counterpart of drivers/scsi/iscsi_tcp, where with the
introduction of libiscsi much of the code (which was common to both) was
moved into libiscsi.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>

--- /usr/src/linux-2.6.17-rc3/drivers/infiniband/ulp/iser-x/iscsi_iser.h	1970-01-01 02:00:00.000000000 +0200
+++ /usr/src/linux-2.6.17-rc3/drivers/infiniband/ulp/iser/iscsi_iser.h	2006-04-27 10:07:04.000000000 +0300
@@ -0,0 +1,355 @@
+/*
+ * iSER transport for the Open iSCSI Initiator & iSER transport internals
+ *
+ * Copyright (C) 2004 Dmitry Yusupov
+ * Copyright (C) 2004 Alex Aizman
+ * Copyright (C) 2005 Mike Christie
+ * based on code maintained by open-iscsi@googlegroups.com
+ *
+ * Copyright (c) 2004, 2005, 2006 Voltaire, Inc. All rights reserved.
+ * Copyright (c) 2005, 2006 Cisco Systems.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *	- Redistributions of source code must retain the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer.
+ *
+ *	- Redistributions in binary form must reproduce the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer in the documentation and/or other materials
+ *	  provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ * $Id: iscsi_iser.h 6643 2006-04-26 10:01:01Z ogerlitz $
+ */
+#ifndef __ISCSI_ISER_H__
+#define __ISCSI_ISER_H__
+
+#include <linux/types.h>
+#include <linux/net.h>
+#include <scsi/libiscsi.h>
+#include <scsi/scsi_transport_iscsi.h>
+
+#include <linux/wait.h>
+#include <linux/sched.h>
+#include <linux/list.h>
+#include <linux/slab.h>
+#include <linux/dma-mapping.h>
+#include <linux/mutex.h>
+#include <linux/mempool.h>
+#include <linux/uio.h>
+
+#include <linux/socket.h>
+#include <linux/in.h>
+#include <linux/in6.h>
+
+#include <rdma/ib_verbs.h>
+#include <rdma/ib_fmr_pool.h>
+#include <rdma/rdma_cm.h>
+
+#define PFX "iser:"
+
+#define iser_dbg(fmt, arg...)				\
+	do {						\
+		if (iser_debug_level > 0)		\
+			printk(KERN_DEBUG PFX "%s:" fmt,\
+				__func__ , ## arg);	\
+	} while (0)
+
+#define iser_err(fmt, arg...)				\
+	do {						\
+		printk(KERN_ERR PFX "%s:" fmt,          \
+		       __func__ , ## arg);		\
+	} while (0)
+
+#define iser_bug(fmt,arg...)				\
+	do {						\
+		printk(KERN_ERR PFX "%s: PANIC! " fmt,	\
+			__func__ , ## arg);		\
+		BUG();					\
+	} while(0)
+
+					/* support up to 512KB in one RDMA */
+#define ISCSI_ISER_SG_TABLESIZE         (0x80000 >> PAGE_SHIFT)
+#define ISCSI_ISER_MAX_LUN		256
+#define ISCSI_ISER_MAX_CMD_LEN		16
+
+/* QP settings */
+/* Maximal bounds on received asynchronous PDUs */
+#define ISER_MAX_RX_MISC_PDUS		4 /* NOOP_IN(2) , ASYNC_EVENT(2)   */
+
+#define ISER_MAX_TX_MISC_PDUS		6 /* NOOP_OUT(2), TEXT(1),         *
+					   * SCSI_TMFUNC(2), LOGOUT(1) */
+
+#define ISER_QP_MAX_RECV_DTOS		(ISCSI_XMIT_CMDS_MAX + \
+					ISER_MAX_RX_MISC_PDUS    +  \
+					ISER_MAX_TX_MISC_PDUS)
+
+/* the max TX (send) WR supported by the iSER QP is defined by                 *
+ * max_send_wr = T * (1 + D) + C ; D is how many inflight dataouts we expect   *
+ * to have at max for SCSI command. The tx posting & completion handling code  *
+ * supports -EAGAIN scheme where tx is suspended till the QP has room for more *
+ * send WR. D=8 comes from 64K/8K                                              */
+
+#define ISER_INFLIGHT_DATAOUTS		8
+
+#define ISER_QP_MAX_REQ_DTOS		(ISCSI_XMIT_CMDS_MAX *    \
+					(1 + ISER_INFLIGHT_DATAOUTS) + \
+					ISER_MAX_TX_MISC_PDUS        + \
+					ISER_MAX_RX_MISC_PDUS)
+
+#define ISER_VER			0x10
+#define ISER_WSV			0x08
+#define ISER_RSV			0x04
+
+struct iser_hdr {
+	u8      flags;
+	u8      rsvd[3];
+	__be32  write_stag; /* write rkey */
+	__be64  write_va;
+	__be32  read_stag;  /* read rkey */
+	__be64  read_va;
+} __attribute__((packed));
+
+
+/* Length of an object name string */
+#define ISER_OBJECT_NAME_SIZE		    64
+
+enum iser_ib_conn_state {
+	ISER_CONN_INIT,		   /* descriptor allocd, no conn          */
+	ISER_CONN_PENDING,	   /* in the process of being established */
+	ISER_CONN_UP,		   /* up and running                      */
+	ISER_CONN_TERMINATING,	   /* in the process of being terminated  */
+	ISER_CONN_DOWN,		   /* shut down                           */
+	ISER_CONN_STATES_NUM
+};
+
+enum iser_task_status {
+	ISER_TASK_STATUS_INIT = 0,
+	ISER_TASK_STATUS_STARTED,
+	ISER_TASK_STATUS_COMPLETED
+};
+
+enum iser_data_dir {
+	ISER_DIR_IN = 0,	   /* to initiator */
+	ISER_DIR_OUT,		   /* from initiator */
+	ISER_DIRS_NUM
+};
+
+struct iser_data_buf {
+	void               *buf;      /* pointer to the sg list               */
+	unsigned int       size;      /* num entries of this sg               */
+	unsigned long      data_len;  /* total data len                       */
+	unsigned int       dma_nents; /* returned by dma_map_sg               */
+	char       	   *copy_buf; /* allocated copy buf for SGs unaligned *
+	                               * for rdma which are copied            */
+	struct scatterlist sg_single; /* SG-ified clone of a non SG SC or     *
+				       * unaligned SG                         */
+};
+
+/* fwd declarations */
+struct iser_device;
+struct iscsi_iser_conn;
+struct iscsi_iser_cmd_task;
+
+struct iser_mem_reg {
+	u32  lkey;
+	u32  rkey;
+	u64  va;
+	u64  len;
+	void *mem_h;
+};
+
+struct iser_regd_buf {
+	struct iser_mem_reg     reg;        /* memory registration info        */
+	void                    *virt_addr;
+	struct iser_device      *device;    /* device->device for dma_unmap    */
+	dma_addr_t              dma_addr;   /* if non zero, addr for dma_unmap */
+	enum dma_data_direction direction;  /* direction for dma_unmap	       */
+	unsigned int            data_size;
+	atomic_t                ref_count;  /* refcount, freed when dec to 0   */
+};
+
+#define MAX_REGD_BUF_VECTOR_LEN	2
+
+struct iser_dto {
+	struct iscsi_iser_cmd_task *ctask;
+	struct iscsi_iser_conn     *conn;
+	int                        notify_enable;
+
+	/* vector of registered buffers */
+	unsigned int               regd_vector_len;
+	struct iser_regd_buf       *regd[MAX_REGD_BUF_VECTOR_LEN];
+
+	/* offset into the registered buffer may be specified */
+	unsigned int               offset[MAX_REGD_BUF_VECTOR_LEN];
+
+	/* a smaller size may be specified, if 0, then full size is used */
+	unsigned int               used_sz[MAX_REGD_BUF_VECTOR_LEN];
+};
+
+enum iser_desc_type {
+	ISCSI_RX,
+	ISCSI_TX_CONTROL ,
+	ISCSI_TX_SCSI_COMMAND,
+	ISCSI_TX_DATAOUT
+};
+
+struct iser_desc {
+	struct iser_hdr              iser_header;
+	struct iscsi_hdr             iscsi_header;
+	struct iser_regd_buf         hdr_regd_buf;
+	void                         *data;         /* used by RX & TX_CONTROL */
+	struct iser_regd_buf         data_regd_buf; /* used by RX & TX_CONTROL */
+	enum   iser_desc_type        type;
+	struct iser_dto              dto;
+};
+
+struct iser_device {
+	struct ib_device             *ib_device;
+	struct ib_pd	             *pd;
+	struct ib_cq	             *cq;
+	struct ib_mr	             *mr;
+	struct tasklet_struct	     cq_tasklet;
+	struct list_head             ig_list; /* entry in ig devices list */
+	int                          refcount;
+};
+
+struct iser_conn
+{
+	struct iscsi_iser_conn       *iser_conn; /* iser conn for upcalls  */
+	atomic_t		     state;	    /* rdma connection state   */
+	struct iser_device           *device;       /* device context          */
+	struct rdma_cm_id            *cma_id;       /* CMA ID		       */
+	struct ib_qp	             *qp;           /* QP 		       */
+	struct ib_fmr_pool           *fmr_pool;     /* pool of IB FMRs         */
+	int                          disc_evt_flag; /* disconn event delivered */
+	wait_queue_head_t	     wait;          /* waitq for conn/disconn  */
+	atomic_t                     post_recv_buf_count; /* posted rx count   */
+	atomic_t                     post_send_buf_count; /* posted tx count   */
+	struct work_struct           comperror_work; /* conn term sleepable ctx*/
+	char 			     name[ISER_OBJECT_NAME_SIZE];
+	struct iser_page_vec         *page_vec;     /* represents SG to fmr maps*
+						     * maps serialized as tx is*/
+	struct list_head	     conn_list;       /* entry in ig conn list */
+};
+
+struct iscsi_iser_conn {
+	struct iscsi_conn            *iscsi_conn;/* ptr to iscsi conn */
+	struct iser_conn             *ib_conn;   /* iSER IB conn      */
+
+	rwlock_t		     lock;
+};
+
+struct iscsi_iser_cmd_task {
+	struct iser_desc             desc;
+	struct iscsi_iser_conn	     *iser_conn;
+	int			     rdma_data_count;/* RDMA bytes           */
+	enum iser_task_status 	     status;
+	int                          command_sent;  /* set if command  sent  */
+	int                          dir[ISER_DIRS_NUM];      /* set if dir use*/
+	struct iser_regd_buf         rdma_regd[ISER_DIRS_NUM];/* regd rdma buf */
+	struct iser_data_buf         data[ISER_DIRS_NUM];     /* orig. data des*/
+	struct iser_data_buf         data_copy[ISER_DIRS_NUM];/* contig. copy  */
+};
+
+struct iser_page_vec {
+	u64 *pages;
+	int length;
+	int offset;
+	int data_size;
+};
+
+struct iser_global {
+	struct mutex      device_list_mutex;/*                   */
+	struct list_head  device_list;	     /* all iSER devices */
+	struct mutex      connlist_mutex;
+	struct list_head  connlist;		/* all iSER IB connections */
+
+	kmem_cache_t *desc_cache;
+};
+
+extern struct iser_global ig;
+extern int iser_debug_level;
+
+/* allocate connection resources needed for rdma functionality */
+int iser_conn_set_full_featured_mode(struct iscsi_conn *conn);
+
+int iser_send_control(struct iscsi_conn      *conn,
+		      struct iscsi_mgmt_task *mtask);
+
+int iser_send_command(struct iscsi_conn      *conn,
+		      struct iscsi_cmd_task  *ctask);
+
+int iser_send_data_out(struct iscsi_conn     *conn,
+		       struct iscsi_cmd_task *ctask,
+		       struct iscsi_data          *hdr);
+
+void iscsi_iser_recv(struct iscsi_conn *conn,
+		     struct iscsi_hdr       *hdr,
+		     char                   *rx_data,
+		     int                    rx_data_len);
+
+int  iser_conn_init(struct iser_conn **ib_conn);
+
+void iser_conn_terminate(struct iser_conn *ib_conn);
+
+void iser_conn_release(struct iser_conn *ib_conn);
+
+void iser_rcv_completion(struct iser_desc *desc,
+			 unsigned long    dto_xfer_len);
+
+void iser_snd_completion(struct iser_desc *desc);
+
+void iser_ctask_rdma_init(struct iscsi_iser_cmd_task     *ctask);
+
+void iser_ctask_rdma_finalize(struct iscsi_iser_cmd_task *ctask);
+
+void iser_dto_buffs_release(struct iser_dto *dto);
+
+int  iser_regd_buff_release(struct iser_regd_buf *regd_buf);
+
+void iser_reg_single(struct iser_device      *device,
+		     struct iser_regd_buf    *regd_buf,
+		     enum dma_data_direction direction);
+
+int  iser_start_rdma_unaligned_sg(struct iscsi_iser_cmd_task    *ctask,
+				  enum iser_data_dir            cmd_dir);
+
+void iser_finalize_rdma_unaligned_sg(struct iscsi_iser_cmd_task *ctask,
+				     enum iser_data_dir         cmd_dir);
+
+int  iser_reg_rdma_mem(struct iscsi_iser_cmd_task *ctask,
+		       enum   iser_data_dir        cmd_dir);
+
+int  iser_connect(struct iser_conn   *ib_conn,
+		  struct sockaddr_in *src_addr,
+		  struct sockaddr_in *dst_addr,
+		  int                non_blocking);
+
+int  iser_reg_page_vec(struct iser_conn     *ib_conn,
+		       struct iser_page_vec *page_vec,
+		       struct iser_mem_reg  *mem_reg);
+
+void iser_unreg_mem(struct iser_mem_reg *mem_reg);
+
+int  iser_post_recv(struct iser_desc *rx_desc);
+int  iser_post_send(struct iser_desc *tx_desc);
+#endif



* [PATCH 3/6] open iscsi iser transport provider code
From: Or Gerlitz @ 2006-04-27 12:31 UTC
  To: rdreier; +Cc: openib-general, linux-kernel

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>

--- /usr/src/linux-2.6.17-rc3/drivers/infiniband/ulp/iser-x/iscsi_iser.c	1970-01-01 02:00:00.000000000 +0200
+++ /usr/src/linux-2.6.17-rc3/drivers/infiniband/ulp/iser/iscsi_iser.c	2006-04-26 12:50:11.000000000 +0300
@@ -0,0 +1,800 @@
+/*
+ * iSCSI Initiator over iSER Data-Path
+ *
+ * Copyright (C) 2004 Dmitry Yusupov
+ * Copyright (C) 2004 Alex Aizman
+ * Copyright (C) 2005 Mike Christie
+ * Copyright (c) 2005, 2006 Voltaire, Inc. All rights reserved.
+ * maintained by openib-general@openib.org
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *	- Redistributions of source code must retain the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer.
+ *
+ *	- Redistributions in binary form must reproduce the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer in the documentation and/or other materials
+ *	  provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ * Credits:
+ *	Christoph Hellwig
+ *	FUJITA Tomonori
+ *	Arne Redlich
+ *	Zhenyu Wang
+ * Modified by:
+ *      Erez Zilber
+ *
+ *
+ * $Id: iscsi_iser.c 6643 2006-04-26 10:01:01Z ogerlitz $
+ */
+
+#include <linux/types.h>
+#include <linux/list.h>
+#include <linux/hardirq.h>
+#include <linux/kfifo.h>
+#include <linux/blkdev.h>
+#include <linux/init.h>
+#include <linux/ioctl.h>
+#include <linux/devfs_fs_kernel.h>
+#include <linux/cdev.h>
+#include <linux/in.h>
+#include <linux/net.h>
+#include <linux/scatterlist.h>
+#include <linux/delay.h>
+
+#include <net/sock.h>
+
+#include <asm/uaccess.h>
+
+#include <scsi/scsi_cmnd.h>
+#include <scsi/scsi_device.h>
+#include <scsi/scsi_eh.h>
+#include <scsi/scsi_request.h>
+#include <scsi/scsi_tcq.h>
+#include <scsi/scsi_host.h>
+#include <scsi/scsi.h>
+#include <scsi/scsi_transport_iscsi.h>
+
+#include "iscsi_iser.h"
+
+static unsigned int iscsi_max_lun = 512;
+module_param_named(max_lun, iscsi_max_lun, uint, S_IRUGO);
+
+#define DRV_VER	     "$Rev: 227 $"
+#define DRV_DATE     "$LastChangedDate: 2006-03-22 16:47:30 +0200 (Wed, 22 Mar 2006) $"
+
+int iser_debug_level = 0;
+
+MODULE_DESCRIPTION("iSER (iSCSI Extensions for RDMA) Datamover "
+		   "v" DRV_VER "(" DRV_DATE ")");
+MODULE_LICENSE("Dual BSD/GPL");
+MODULE_AUTHOR("Alex Nezhinsky, Dan Bar Dov, Or Gerlitz");
+
+module_param_named(debug_level, iser_debug_level, int, 0644);
+MODULE_PARM_DESC(debug_level, "Enable debug tracing if > 0 (default:disabled)");
+
+struct iser_global ig;
+
+void
+iscsi_iser_recv(struct iscsi_conn *conn,
+		struct iscsi_hdr *hdr, char *rx_data, int rx_data_len)
+{
+	int rc = 0;
+	uint32_t ret_itt;
+	int datalen;
+	int ahslen;
+
+	/* verify PDU length */
+	datalen = ntoh24(hdr->dlength);
+	if (datalen != rx_data_len) {
+		printk(KERN_ERR "iscsi_iser: datalen %d (hdr) != %d (IB) \n",
+		       datalen, rx_data_len);
+		rc = ISCSI_ERR_DATALEN;
+		goto error;
+	}
+
+	/* read AHS */
+	ahslen = hdr->hlength * 4;
+
+	/* verify itt (itt encoding: age+cid+itt) */
+	rc = iscsi_verify_itt(conn, hdr, &ret_itt);
+
+	if (!rc)
+		rc = iscsi_complete_pdu(conn, hdr, rx_data, rx_data_len);
+
+	if (rc && rc != ISCSI_ERR_NO_SCSI_CMD)
+		goto error;
+
+	return;
+error:
+	iscsi_conn_failure(conn, rc);
+}
+
+
+/**
+ * iscsi_iser_cmd_init - Initialize iSCSI SCSI_READ or SCSI_WRITE commands
+ *
+ **/
+static void
+iscsi_iser_cmd_init(struct iscsi_cmd_task *ctask)
+{
+	struct iscsi_iser_conn     *iser_conn  = ctask->conn->dd_data;
+	struct iscsi_iser_cmd_task *iser_ctask = ctask->dd_data;
+	struct scsi_cmnd  *sc = ctask->sc;
+
+	iser_ctask->command_sent = 0;
+	iser_ctask->iser_conn    = iser_conn;
+
+	if (sc->sc_data_direction == DMA_TO_DEVICE) {
+		BUG_ON(ctask->total_length == 0);
+		/* bytes to be sent via RDMA operations */
+		iser_ctask->rdma_data_count = ctask->total_length -
+					 ctask->imm_count -
+					 ctask->unsol_count;
+
+		debug_scsi("cmd [itt %x total %d imm %d unsol_data %d "
+			   "rdma_data %d]\n",
+			   ctask->itt, ctask->total_length, ctask->imm_count,
+			   ctask->unsol_count, iser_ctask->rdma_data_count);
+	} else
+		/* bytes to be sent via RDMA operations */
+		iser_ctask->rdma_data_count = ctask->total_length;
+
+	iser_ctask_rdma_init(iser_ctask);
+}
+
+/**
+ * iscsi_iser_mtask_xmit - xmit management(immediate) task
+ * @conn: iscsi connection
+ * @mtask: task management task
+ *
+ * Notes:
+ *	The function can return -EAGAIN in which case caller must
+ *	call it again later, or recover. '0' return code means successful
+ *	xmit.
+ *
+ **/
+static int
+iscsi_iser_mtask_xmit(struct iscsi_conn *conn,
+		      struct iscsi_mgmt_task *mtask)
+{
+	int error = 0;
+
+	debug_scsi("mtask deq [cid %d itt 0x%x]\n", conn->id, mtask->itt);
+
+	error = iser_send_control(conn, mtask);
+
+	/* since iser xmits control with zero copy, mtasks can not be recycled
+	 * right after sending them.
+	 * The recycling scheme is based on whether a response is expected
+	 * - if yes, the mtask is recycled at iscsi_complete_pdu
+	 * - if no,  the mtask is recycled at iser_snd_completion
+	 */
+	if (error && error != -EAGAIN)
+		iscsi_conn_failure(conn, ISCSI_ERR_CONN_FAILED);
+
+	return error;
+}
+
+static int
+iscsi_iser_ctask_xmit_unsol_data(struct iscsi_conn *conn,
+				 struct iscsi_cmd_task *ctask)
+{
+	struct iscsi_data  hdr;
+	int error = 0;
+	struct iscsi_iser_cmd_task *iser_ctask = ctask->dd_data;
+
+	/* Send data-out PDUs while there's still unsolicited data to send */
+	while (ctask->unsol_count > 0) {
+		iscsi_prep_unsolicit_data_pdu(ctask, &hdr,
+					      iser_ctask->rdma_data_count);
+
+		debug_scsi("Sending data-out: itt 0x%x, data count %d\n",
+			   hdr.itt, ctask->data_count);
+
+		/* the buffer description has been passed with the command */
+		/* Send the data-out PDU */
+		error = iser_send_data_out(conn, ctask, &hdr);
+		if (error) {
+			ctask->unsol_datasn--;
+			goto iscsi_iser_ctask_xmit_unsol_data_exit;
+		}
+		ctask->unsol_count -= ctask->data_count;
+		debug_scsi("Need to send %d more as data-out PDUs\n",
+			   ctask->unsol_count);
+	}
+
+iscsi_iser_ctask_xmit_unsol_data_exit:
+	return error;
+}
+
+static int
+iscsi_iser_ctask_xmit(struct iscsi_conn *conn,
+		      struct iscsi_cmd_task *ctask)
+{
+	struct iscsi_iser_cmd_task *iser_ctask = ctask->dd_data;
+	int error = 0;
+
+	debug_scsi("ctask deq [cid %d itt 0x%x]\n",
+		   conn->id, ctask->itt);
+
+	/*
+	 * serialize with TMF AbortTask
+	 */
+	if (ctask->mtask)
+		return error;
+
+	/* Send the cmd PDU */
+	if (!iser_ctask->command_sent) {
+		error = iser_send_command(conn, ctask);
+		if (error)
+			goto iscsi_iser_ctask_xmit_exit;
+		iser_ctask->command_sent = 1;
+	}
+
+	/* Send unsolicited data-out PDU(s) if necessary */
+	if (ctask->unsol_count)
+		error = iscsi_iser_ctask_xmit_unsol_data(conn, ctask);
+
+ iscsi_iser_ctask_xmit_exit:
+	if (error && error != -EAGAIN)
+		iscsi_conn_failure(conn, ISCSI_ERR_CONN_FAILED);
+	return error;
+}
+
+static void
+iscsi_iser_cleanup_ctask(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask)
+{
+	struct iscsi_iser_cmd_task *iser_ctask = ctask->dd_data;
+
+	if (iser_ctask->status == ISER_TASK_STATUS_STARTED) {
+		iser_ctask->status = ISER_TASK_STATUS_COMPLETED;
+		iser_ctask_rdma_finalize(iser_ctask);
+	}
+}
+
+static struct iser_conn *
+iscsi_iser_ib_conn_lookup(__u64 ep_handle)
+{
+	struct iser_conn *ib_conn;
+	struct iser_conn *uib_conn = (struct iser_conn *)(unsigned long)ep_handle;
+
+	mutex_lock(&ig.connlist_mutex);
+	list_for_each_entry(ib_conn, &ig.connlist, conn_list) {
+		if (ib_conn == uib_conn) {
+			mutex_unlock(&ig.connlist_mutex);
+			return ib_conn;
+		}
+	}
+	mutex_unlock(&ig.connlist_mutex);
+	iser_err("no conn exists for eph %llx\n",(unsigned long long)ep_handle);
+	return NULL;
+}
+
+static struct iscsi_cls_conn *
+iscsi_iser_conn_create(struct iscsi_cls_session *cls_session, uint32_t conn_idx)
+{
+	struct iscsi_conn *conn;
+	struct iscsi_cls_conn *cls_conn;
+	struct iscsi_iser_conn *iser_conn;
+
+	cls_conn = iscsi_conn_setup(cls_session, conn_idx);
+	if (!cls_conn)
+		return NULL;
+	conn = cls_conn->dd_data;
+
+	/*
+	 * due to issues with the login code re iser semantics
+	 * this is not set in iscsi_conn_setup - FIXME
+	 */
+	conn->max_recv_dlength = 128;
+
+	iser_conn = kzalloc(sizeof(*iser_conn), GFP_KERNEL);
+	if (!iser_conn)
+		goto conn_alloc_fail;
+
+	/* currently this is the only field which needs to be initialized */
+	rwlock_init(&iser_conn->lock);
+
+	conn->recv_lock = &iser_conn->lock;
+
+	conn->dd_data = iser_conn;
+	iser_conn->iscsi_conn = conn;
+
+	return cls_conn;
+
+conn_alloc_fail:
+	iscsi_conn_teardown(cls_conn);
+	return NULL;
+}
+
+static void
+iscsi_iser_conn_destroy(struct iscsi_cls_conn *cls_conn)
+{
+	struct iscsi_conn *conn = cls_conn->dd_data;
+	struct iscsi_iser_conn *iser_conn = conn->dd_data;
+
+	iscsi_conn_teardown(cls_conn);
+	kfree(iser_conn);
+}
+
+static int
+iscsi_iser_conn_bind(struct iscsi_cls_session *cls_session,
+		     struct iscsi_cls_conn *cls_conn, uint64_t transport_eph,
+		     int is_leading)
+{
+	struct iscsi_conn *conn = cls_conn->dd_data;
+	struct iscsi_iser_conn *iser_conn;
+	struct iser_conn *ib_conn;
+	int error;
+
+	error = iscsi_conn_bind(cls_session, cls_conn, is_leading);
+	if (error)
+		return error;
+
+	if (conn->stop_stage != STOP_CONN_SUSPEND) {
+		/* the transport ep handle comes from user space so it must be
+		 * verified against the global ib connections list */
+		ib_conn = iscsi_iser_ib_conn_lookup(transport_eph);
+		if (!ib_conn) {
+			iser_err("can't bind eph %llx\n",
+				 (unsigned long long)transport_eph);
+			return -EINVAL;
+		}
+		/* binds the iSER connection retrieved from the previously
+		 * connected ep_handle to the iSCSI layer connection. exchanges
+		 * connection pointers */
+		iser_dbg("binding iscsi conn %p to iser_conn %p\n",conn,ib_conn);
+		iser_conn = conn->dd_data;
+		ib_conn->iser_conn = iser_conn;
+		iser_conn->ib_conn  = ib_conn;
+	}
+
+	return 0;
+}
+
+static int
+iscsi_iser_conn_start(struct iscsi_cls_conn *cls_conn)
+{
+	struct iscsi_conn *conn = cls_conn->dd_data;
+	int err;
+
+	err = iscsi_conn_start(cls_conn);
+	if (err)
+		return err;
+
+	return iser_conn_set_full_featured_mode(conn);
+}
+
+static void
+iscsi_iser_conn_terminate(struct iscsi_conn *conn)
+{
+	struct iscsi_iser_conn *iser_conn = conn->dd_data;
+	struct iser_conn *ib_conn = iser_conn->ib_conn;
+
+	BUG_ON(!ib_conn);
+	/* starts conn teardown process, waits until all previously   *
+	 * posted buffers get flushed, deallocates all conn resources */
+	iser_conn_terminate(ib_conn);
+	iser_conn->ib_conn = NULL;
+	conn->recv_lock = NULL;
+}
+
+
+static struct iscsi_transport iscsi_iser_transport;
+
+static struct iscsi_cls_session *
+iscsi_iser_session_create(struct iscsi_transport *iscsit,
+			 struct scsi_transport_template *scsit,
+			  uint32_t initial_cmdsn, uint32_t *hostno)
+{
+	struct iscsi_cls_session *cls_session;
+	struct iscsi_session *session;
+	int i;
+	uint32_t hn;
+	struct iscsi_cmd_task  *ctask;
+	struct iscsi_mgmt_task *mtask;
+	struct iscsi_iser_cmd_task *iser_ctask;
+	struct iser_desc *desc;
+
+	cls_session = iscsi_session_setup(iscsit, scsit,
+					  sizeof(struct iscsi_iser_cmd_task),
+					  sizeof(struct iser_desc),
+					  initial_cmdsn, &hn);
+	if (!cls_session)
+		return NULL;
+
+	*hostno = hn;
+	session = class_to_transport_session(cls_session);
+
+	/* libiscsi sets up itts, data and pool, so just set desc fields */
+	for (i = 0; i < session->cmds_max; i++) {
+		ctask      = session->cmds[i];
+		iser_ctask = ctask->dd_data;
+		ctask->hdr = (struct iscsi_cmd *)&iser_ctask->desc.iscsi_header;
+	}
+
+	for (i = 0; i < session->mgmtpool_max; i++) {
+		mtask      = session->mgmt_cmds[i];
+		desc       = mtask->dd_data;
+		mtask->hdr = &desc->iscsi_header;
+		desc->data = mtask->data;
+	}
+
+	return cls_session;
+}
+
+static int
+iscsi_iser_conn_set_param(struct iscsi_cls_conn *cls_conn,
+			  enum iscsi_param param, uint32_t value)
+{
+	struct iscsi_conn *conn = cls_conn->dd_data;
+	struct iscsi_session *session = conn->session;
+
+	spin_lock_bh(&session->lock);
+	if (conn->c_stage != ISCSI_CONN_INITIAL_STAGE &&
+	    conn->stop_stage != STOP_CONN_RECOVER) {
+		printk(KERN_ERR "iscsi_iser: can not change parameter [%d]\n",
+		       param);
+		spin_unlock_bh(&session->lock);
+		return 0;
+	}
+	spin_unlock_bh(&session->lock);
+
+	switch (param) {
+	case ISCSI_PARAM_MAX_RECV_DLENGTH:
+		/* TBD */
+		break;
+	case ISCSI_PARAM_MAX_XMIT_DLENGTH:
+		conn->max_xmit_dlength =  value;
+		break;
+	case ISCSI_PARAM_HDRDGST_EN:
+		if (value) {
+			printk(KERN_ERR "HeaderDigest wasn't negotiated to None\n");
+			return -EPROTO;
+		}
+		break;
+	case ISCSI_PARAM_DATADGST_EN:
+		if (value) {
+			printk(KERN_ERR "DataDigest wasn't negotiated to None\n");
+			return -EPROTO;
+		}
+		break;
+	case ISCSI_PARAM_INITIAL_R2T_EN:
+		session->initial_r2t_en = value;
+		break;
+	case ISCSI_PARAM_IMM_DATA_EN:
+		session->imm_data_en = value;
+		break;
+	case ISCSI_PARAM_FIRST_BURST:
+		session->first_burst = value;
+		break;
+	case ISCSI_PARAM_MAX_BURST:
+		session->max_burst = value;
+		break;
+	case ISCSI_PARAM_PDU_INORDER_EN:
+		session->pdu_inorder_en = value;
+		break;
+	case ISCSI_PARAM_DATASEQ_INORDER_EN:
+		session->dataseq_inorder_en = value;
+		break;
+	case ISCSI_PARAM_ERL:
+		session->erl = value;
+		break;
+	case ISCSI_PARAM_IFMARKER_EN:
+		if (value) {
+			printk(KERN_ERR "IFMarker wasn't negotiated to No\n");
+			return -EPROTO;
+		}
+		break;
+	case ISCSI_PARAM_OFMARKER_EN:
+		if (value) {
+			printk(KERN_ERR "OFMarker wasn't negotiated to No\n");
+			return -EPROTO;
+		}
+		break;
+	default:
+		break;
+	}
+
+	return 0;
+}
+
+static int
+iscsi_iser_session_get_param(struct iscsi_cls_session *cls_session,
+			     enum iscsi_param param, uint32_t *value)
+{
+	struct Scsi_Host *shost = iscsi_session_to_shost(cls_session);
+	struct iscsi_session *session = iscsi_hostdata(shost->hostdata);
+
+	switch (param) {
+	case ISCSI_PARAM_INITIAL_R2T_EN:
+		*value = session->initial_r2t_en;
+		break;
+	case ISCSI_PARAM_MAX_R2T:
+		*value = session->max_r2t;
+		break;
+	case ISCSI_PARAM_IMM_DATA_EN:
+		*value = session->imm_data_en;
+		break;
+	case ISCSI_PARAM_FIRST_BURST:
+		*value = session->first_burst;
+		break;
+	case ISCSI_PARAM_MAX_BURST:
+		*value = session->max_burst;
+		break;
+	case ISCSI_PARAM_PDU_INORDER_EN:
+		*value = session->pdu_inorder_en;
+		break;
+	case ISCSI_PARAM_DATASEQ_INORDER_EN:
+		*value = session->dataseq_inorder_en;
+		break;
+	case ISCSI_PARAM_ERL:
+		*value = session->erl;
+		break;
+	case ISCSI_PARAM_IFMARKER_EN:
+		*value = 0;
+		break;
+	case ISCSI_PARAM_OFMARKER_EN:
+		*value = 0;
+		break;
+	default:
+		return ISCSI_ERR_PARAM_NOT_FOUND;
+	}
+
+	return 0;
+}
+
+static int
+iscsi_iser_conn_get_param(struct iscsi_cls_conn *cls_conn,
+			  enum iscsi_param param, uint32_t *value)
+{
+	struct iscsi_conn *conn = cls_conn->dd_data;
+
+	switch (param) {
+	case ISCSI_PARAM_MAX_RECV_DLENGTH:
+		*value = conn->max_recv_dlength;
+		break;
+	case ISCSI_PARAM_MAX_XMIT_DLENGTH:
+		*value = conn->max_xmit_dlength;
+		break;
+	case ISCSI_PARAM_HDRDGST_EN:
+		*value = 0;
+		break;
+	case ISCSI_PARAM_DATADGST_EN:
+		*value = 0;
+		break;
+	/*case ISCSI_PARAM_TARGET_RECV_DLENGTH:
+		*value = conn->target_recv_dlength;
+		break;
+	case ISCSI_PARAM_INITIATOR_RECV_DLENGTH:
+		*value = conn->initiator_recv_dlength;
+		break;*/
+	default:
+		return ISCSI_ERR_PARAM_NOT_FOUND;
+	}
+
+	return 0;
+}
+
+
+static void
+iscsi_iser_conn_get_stats(struct iscsi_cls_conn *cls_conn, struct iscsi_stats *stats)
+{
+	struct iscsi_conn *conn = cls_conn->dd_data;
+
+	stats->txdata_octets = conn->txdata_octets;
+	stats->rxdata_octets = conn->rxdata_octets;
+	stats->scsicmd_pdus = conn->scsicmd_pdus_cnt;
+	stats->dataout_pdus = conn->dataout_pdus_cnt;
+	stats->scsirsp_pdus = conn->scsirsp_pdus_cnt;
+	stats->datain_pdus = conn->datain_pdus_cnt; /* always 0 */
+	stats->r2t_pdus = conn->r2t_pdus_cnt; /* always 0 */
+	stats->tmfcmd_pdus = conn->tmfcmd_pdus_cnt;
+	stats->tmfrsp_pdus = conn->tmfrsp_pdus_cnt;
+	stats->custom_length = 3;
+	strcpy(stats->custom[0].desc, "qp_tx_queue_full");
+	stats->custom[0].value = 0; /* TBD iser_conn->qp_tx_queue_full; */
+	strcpy(stats->custom[1].desc, "fmr_map_not_avail");
+	stats->custom[1].value = 0; /* TBD iser_conn->fmr_map_not_avail */
+	strcpy(stats->custom[2].desc, "eh_abort_cnt");
+	stats->custom[2].value = conn->eh_abort_cnt;
+}
+
+static int
+iscsi_iser_ep_connect(struct sockaddr *dst_addr, int non_blocking,
+		      __u64 *ep_handle)
+{
+	int err;
+	struct iser_conn *ib_conn;
+
+	err = iser_conn_init(&ib_conn);
+	if (err)
+		goto out;
+
+	err = iser_connect(ib_conn, NULL, (struct sockaddr_in *)dst_addr, non_blocking);
+	if (!err)
+		*ep_handle = (__u64)(unsigned long)ib_conn;
+
+out:
+	return err;
+}
+
+static int
+iscsi_iser_ep_poll(__u64 ep_handle, int timeout_ms)
+{
+	struct iser_conn *ib_conn = iscsi_iser_ib_conn_lookup(ep_handle);
+	int rc;
+
+	if (!ib_conn)
+		return -EINVAL;
+
+	rc = wait_event_interruptible_timeout(ib_conn->wait,
+			     atomic_read(&ib_conn->state) == ISER_CONN_UP,
+			     msecs_to_jiffies(timeout_ms));
+
+	/* if conn establishment failed, return error code to iscsi */
+	if (!rc &&
+	    (atomic_read(&ib_conn->state) == ISER_CONN_TERMINATING ||
+	     atomic_read(&ib_conn->state) == ISER_CONN_DOWN))
+		rc = -1;
+
+	iser_err("ib conn %p rc = %d\n", ib_conn, rc);
+
+	if (rc > 0)
+		return 1; /* success, this is the equivalent of POLLOUT */
+	else if (!rc)
+		return 0; /* timeout */
+	else
+		return rc; /* signal, or conn establishment failed */
+}
+
+static void
+iscsi_iser_ep_disconnect(__u64 ep_handle)
+{
+	struct iser_conn *ib_conn = iscsi_iser_ib_conn_lookup(ep_handle);
+
+	if (!ib_conn)
+		return;
+
+	iser_err("ib conn %p state %d\n",ib_conn, atomic_read(&ib_conn->state));
+
+	if (atomic_read(&ib_conn->state) == ISER_CONN_UP)
+		iser_conn_terminate(ib_conn);
+
+	iser_conn_release(ib_conn);
+}
+
+static struct scsi_host_template iscsi_iser_sht = {
+	.name                   = "iSCSI Initiator over iSER, v."
+				  ISCSI_VERSION_STR,
+	.queuecommand           = iscsi_queuecommand,
+	.can_queue		= ISCSI_XMIT_CMDS_MAX - 1,
+	.sg_tablesize           = ISCSI_ISER_SG_TABLESIZE,
+	.cmd_per_lun            = ISCSI_MAX_CMD_PER_LUN,
+	.eh_abort_handler       = iscsi_eh_abort,
+	.eh_host_reset_handler	= iscsi_eh_host_reset,
+	.use_clustering         = DISABLE_CLUSTERING,
+	.proc_name              = "iscsi_iser",
+	.this_id                = -1,
+};
+
+static struct iscsi_transport iscsi_iser_transport = {
+	.owner                  = THIS_MODULE,
+	.name                   = "iser",
+	.caps                   = CAP_RECOVERY_L0 | CAP_MULTI_R2T,
+	.param_mask		= ISCSI_MAX_RECV_DLENGTH |
+				  ISCSI_MAX_XMIT_DLENGTH |
+				  ISCSI_HDRDGST_EN |
+				  ISCSI_DATADGST_EN |
+				  ISCSI_INITIAL_R2T_EN |
+				  ISCSI_MAX_R2T |
+				  ISCSI_IMM_DATA_EN |
+				  ISCSI_FIRST_BURST |
+				  ISCSI_MAX_BURST |
+				  ISCSI_PDU_INORDER_EN |
+				  ISCSI_DATASEQ_INORDER_EN,
+	.host_template          = &iscsi_iser_sht,
+	.conndata_size		= sizeof(struct iscsi_conn),
+	.max_lun                = ISCSI_ISER_MAX_LUN,
+	.max_cmd_len            = ISCSI_ISER_MAX_CMD_LEN,
+	/* session management */
+	.create_session         = iscsi_iser_session_create,
+	.destroy_session        = iscsi_session_teardown,
+	/* connection management */
+	.create_conn            = iscsi_iser_conn_create,
+	.bind_conn              = iscsi_iser_conn_bind,
+	.destroy_conn           = iscsi_iser_conn_destroy,
+	.set_param              = iscsi_iser_conn_set_param,
+	.get_conn_param		= iscsi_iser_conn_get_param,
+	.get_session_param	= iscsi_iser_session_get_param,
+	.start_conn             = iscsi_iser_conn_start,
+	.stop_conn              = iscsi_conn_stop,
+	/* these are called as part of conn recovery */
+	.suspend_conn_recv	= NULL, /* FIXME is/how this relevant to iser? */
+	.terminate_conn		= iscsi_iser_conn_terminate,
+	/* IO */
+	.send_pdu		= iscsi_conn_send_pdu,
+	.get_stats		= iscsi_iser_conn_get_stats,
+	.init_cmd_task		= iscsi_iser_cmd_init,
+	.xmit_cmd_task		= iscsi_iser_ctask_xmit,
+	.xmit_mgmt_task		= iscsi_iser_mtask_xmit,
+	.cleanup_cmd_task	= iscsi_iser_cleanup_ctask,
+	/* recovery */
+	.session_recovery_timedout = iscsi_session_recovery_timedout,
+
+	.ep_connect             = iscsi_iser_ep_connect,
+	.ep_poll                = iscsi_iser_ep_poll,
+	.ep_disconnect          = iscsi_iser_ep_disconnect
+};
+
+static int __init iser_init(void)
+{
+	int err;
+
+	iser_dbg("Starting iSER datamover...\n");
+
+	if (iscsi_max_lun < 1) {
+		printk(KERN_ERR "Invalid max_lun value of %u\n", iscsi_max_lun);
+		return -EINVAL;
+	}
+
+	iscsi_iser_transport.max_lun = iscsi_max_lun;
+
+	memset(&ig, 0, sizeof(struct iser_global));
+
+	ig.desc_cache = kmem_cache_create("iser_descriptors",
+					  sizeof (struct iser_desc),
+					  0, SLAB_HWCACHE_ALIGN,
+					  NULL, NULL);
+	if (ig.desc_cache == NULL)
+		return -ENOMEM;
+
+	/* device init is called only after the first addr resolution */
+	mutex_init(&ig.device_list_mutex);
+	INIT_LIST_HEAD(&ig.device_list);
+	mutex_init(&ig.connlist_mutex);
+	INIT_LIST_HEAD(&ig.connlist);
+
+	if (!iscsi_register_transport(&iscsi_iser_transport)) {
+		iser_err("iscsi_register_transport failed\n");
+		err = -EINVAL;
+		goto register_transport_failure;
+	}
+
+	return 0;
+
+register_transport_failure:
+	kmem_cache_destroy(ig.desc_cache);
+
+	return err;
+}
+
+static void __exit iser_exit(void)
+{
+	iser_dbg("Removing iSER datamover...\n");
+	iscsi_unregister_transport(&iscsi_iser_transport);
+	kmem_cache_destroy(ig.desc_cache);
+}
+
+module_init(iser_init);
+module_exit(iser_exit);



* [PATCH 4/6] iser initiator
  2006-04-27 12:31     ` [PATCH 3/6] open iscsi iser transport provider code Or Gerlitz
@ 2006-04-27 12:32       ` Or Gerlitz
  2006-04-27 12:32         ` [PATCH 5/6] iser RDMA CM (CMA) and IB verbs interaction Or Gerlitz
  2006-04-27 17:01       ` [PATCH 3/6] open iscsi iser transport provider code Stephen Hemminger
  1 sibling, 1 reply; 21+ messages in thread
From: Or Gerlitz @ 2006-04-27 12:32 UTC (permalink / raw)
  To: rdreier; +Cc: openib-general, linux-kernel

The main entry points to this code are iser_send_control, iser_send_command
and iser_send_data_out for the flow coming from iscsi_iser.c, and
iser_snd_completion and iser_rcv_completion for handling completions
towards iscsi_iser.c.
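
For reviewers who want the send/completion pairing at a glance, here is a
minimal user-space sketch of the flow control that the send entry points and
iser_snd_completion implement between them. It is an illustration only, not
code from the patch: the model_* names and MODEL_MAX_REQ_DTOS are made up for
the example, locking is left out, and the counter increment is folded into
model_send() on the assumption that the real one happens in iser_post_send(),
which is not part of this file.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for ISER_QP_MAX_REQ_DTOS. */
#define MODEL_MAX_REQ_DTOS 4

/* Hypothetical model of the per-connection send bookkeeping. */
struct model_conn {
	atomic_int post_send_buf_count;	/* sends posted, not yet completed */
	bool suspend_tx;		/* models ISCSI_SUSPEND_BIT */
};

/* Transmit side: refuse to post once the send window is full. */
static int model_send(struct model_conn *c)
{
	if (atomic_load(&c->post_send_buf_count) == MODEL_MAX_REQ_DTOS) {
		c->suspend_tx = true;
		return -1;	/* the patch returns -EAGAIN at this point */
	}
	/* assumption: the real increment is done when the send is posted */
	atomic_fetch_add(&c->post_send_buf_count, 1);
	return 0;
}

/* Completion side: free one slot; report whether a sender was resumed. */
static bool model_snd_completion(struct model_conn *c)
{
	bool resumed = false;

	assert(atomic_load(&c->post_send_buf_count) > 0);
	atomic_fetch_sub(&c->post_send_buf_count, 1);
	if (c->suspend_tx) {
		c->suspend_tx = false;	/* the patch requeues xmitwork here */
		resumed = true;
	}
	return resumed;
}

/* Usage: fill the window, hit one rejection, then resume after a completion. */
static int model_demo(void)
{
	struct model_conn c;
	int i, rejected = 0;

	atomic_init(&c.post_send_buf_count, 0);
	c.suspend_tx = false;

	for (i = 0; i < MODEL_MAX_REQ_DTOS + 1; i++)
		if (model_send(&c) != 0)
			rejected++;

	if (model_snd_completion(&c) && model_send(&c) != 0)
		rejected++;

	return rejected;
}
```

In the sketch only the send that finds the window full is rejected; once a
completion frees a slot the retried send goes through.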

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>

--- /usr/src/linux-2.6.17-rc3/drivers/infiniband/ulp/iser-x/iser_initiator.c	1970-01-01 02:00:00.000000000 +0200
+++ /usr/src/linux-2.6.17-rc3/drivers/infiniband/ulp/iser/iser_initiator.c	2006-04-26 12:50:11.000000000 +0300
@@ -0,0 +1,732 @@
+/*
+ * Copyright (c) 2004, 2005, 2006 Voltaire, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *	- Redistributions of source code must retain the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer.
+ *
+ *	- Redistributions in binary form must reproduce the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer in the documentation and/or other materials
+ *	  provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ * $Id: iser_initiator.c 6643 2006-04-26 10:01:01Z ogerlitz $
+ */
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <linux/mm.h>
+#include <asm/io.h>
+#include <asm/scatterlist.h>
+#include <linux/scatterlist.h>
+#include <linux/kfifo.h>
+#include <scsi/scsi_cmnd.h>
+#include <scsi/scsi_host.h>
+
+#include "iscsi_iser.h"
+
+/* Constant PDU lengths calculations */
+#define ISER_TOTAL_HEADERS_LEN  (sizeof (struct iser_hdr) + \
+				 sizeof (struct iscsi_hdr))
+
+/* iser_dto_add_regd_buff - increments the reference count for *
+ * the registered buffer & adds it to the DTO object           */
+static void iser_dto_add_regd_buff(struct iser_dto *dto,
+				   struct iser_regd_buf *regd_buf,
+				   unsigned long use_offset,
+				   unsigned long use_size)
+{
+	int add_idx;
+
+	atomic_inc(&regd_buf->ref_count);
+
+	add_idx = dto->regd_vector_len;
+	dto->regd[add_idx] = regd_buf;
+	dto->used_sz[add_idx] = use_size;
+	dto->offset[add_idx] = use_offset;
+
+	dto->regd_vector_len++;
+}
+
+static int iser_dma_map_task_data(struct iscsi_iser_cmd_task *iser_ctask,
+				  struct iser_data_buf       *data,
+				  enum   iser_data_dir       iser_dir,
+				  enum   dma_data_direction  dma_dir)
+{
+	struct device *dma_device;
+
+	iser_ctask->dir[iser_dir] = 1;
+	dma_device = iser_ctask->iser_conn->ib_conn->device->ib_device->dma_device;
+
+	data->dma_nents = dma_map_sg(dma_device, data->buf, data->size, dma_dir);
+	if (data->dma_nents == 0) {
+		iser_err("dma_map_sg failed!!!\n");
+		return -EINVAL;
+	}
+	return 0;
+}
+
+static void iser_dma_unmap_task_data(struct iscsi_iser_cmd_task *iser_ctask)
+{
+	struct device  *dma_device;
+	struct iser_data_buf *data;
+
+	dma_device = iser_ctask->iser_conn->ib_conn->device->ib_device->dma_device;
+
+	if (iser_ctask->dir[ISER_DIR_IN]) {
+		data = &iser_ctask->data[ISER_DIR_IN];
+		dma_unmap_sg(dma_device, data->buf, data->size, DMA_FROM_DEVICE);
+	}
+
+	if (iser_ctask->dir[ISER_DIR_OUT]) {
+		data = &iser_ctask->data[ISER_DIR_OUT];
+		dma_unmap_sg(dma_device, data->buf, data->size, DMA_TO_DEVICE);
+	}
+}
+
+/* Register user buffer memory and initialize passive rdma
+ *  dto descriptor. Total data size is stored in
+ *  iser_ctask->data[ISER_DIR_IN].data_len
+ */
+static int iser_prepare_read_cmd(struct iscsi_cmd_task *ctask,
+				 unsigned int edtl)
+
+{
+	struct iscsi_iser_cmd_task *iser_ctask = ctask->dd_data;
+	struct iser_regd_buf *regd_buf;
+	int err;
+	struct iser_hdr *hdr = &iser_ctask->desc.iser_header;
+	struct iser_data_buf *buf_in = &iser_ctask->data[ISER_DIR_IN];
+
+	err = iser_dma_map_task_data(iser_ctask,
+				     buf_in,
+				     ISER_DIR_IN,
+				     DMA_FROM_DEVICE);
+	if (err)
+		return err;
+
+	if (edtl > iser_ctask->data[ISER_DIR_IN].data_len) {
+		iser_err("Total data length: %ld, less than EDTL: "
+			 "%d, in READ cmd BHS itt: %d, conn: 0x%p\n",
+			 iser_ctask->data[ISER_DIR_IN].data_len, edtl,
+			 ctask->itt, iser_ctask->iser_conn);
+		return -EINVAL;
+	}
+
+	err = iser_reg_rdma_mem(iser_ctask,ISER_DIR_IN);
+	if (err) {
+		iser_err("Failed to set up Data-IN RDMA\n");
+		return err;
+	}
+	regd_buf = &iser_ctask->rdma_regd[ISER_DIR_IN];
+
+	hdr->flags    |= ISER_RSV;
+	hdr->read_stag = cpu_to_be32(regd_buf->reg.rkey);
+	hdr->read_va   = cpu_to_be64(regd_buf->reg.va);
+
+	iser_dbg("Cmd itt:%d READ tags RKEY:%#.4X VA:%#llX\n",
+		 ctask->itt, regd_buf->reg.rkey,
+		 (unsigned long long)regd_buf->reg.va);
+
+	return 0;
+}
+
+/* Register user buffer memory and initialize passive rdma
+ *  dto descriptor. Total data size is stored in
+ *  iser_ctask->data[ISER_DIR_OUT].data_len
+ */
+static int
+iser_prepare_write_cmd(struct iscsi_cmd_task *ctask,
+		       unsigned int imm_sz,
+		       unsigned int unsol_sz,
+		       unsigned int edtl)
+{
+	struct iscsi_iser_cmd_task *iser_ctask = ctask->dd_data;
+	struct iser_regd_buf *regd_buf;
+	int err;
+	struct iser_dto *send_dto = &iser_ctask->desc.dto;
+	struct iser_hdr *hdr = &iser_ctask->desc.iser_header;
+	struct iser_data_buf *buf_out = &iser_ctask->data[ISER_DIR_OUT];
+
+	err = iser_dma_map_task_data(iser_ctask,
+				     buf_out,
+				     ISER_DIR_OUT,
+				     DMA_TO_DEVICE);
+	if (err)
+		return err;
+
+	if (edtl > iser_ctask->data[ISER_DIR_OUT].data_len) {
+		iser_err("Total data length: %ld, less than EDTL: %d, "
+			 "in WRITE cmd BHS itt: %d, conn: 0x%p\n",
+			 iser_ctask->data[ISER_DIR_OUT].data_len,
+			 edtl, ctask->itt, ctask->conn);
+		return -EINVAL;
+	}
+
+	err = iser_reg_rdma_mem(iser_ctask,ISER_DIR_OUT);
+	if (err != 0) {
+		iser_err("Failed to register write cmd RDMA mem\n");
+		return err;
+	}
+
+	regd_buf = &iser_ctask->rdma_regd[ISER_DIR_OUT];
+
+	if (unsol_sz < edtl) {
+		hdr->flags     |= ISER_WSV;
+		hdr->write_stag = cpu_to_be32(regd_buf->reg.rkey);
+		hdr->write_va   = cpu_to_be64(regd_buf->reg.va + unsol_sz);
+
+		iser_dbg("Cmd itt:%d, WRITE tags, RKEY:%#.4X "
+			 "VA:%#llX + unsol:%d\n",
+			 ctask->itt, regd_buf->reg.rkey,
+			 (unsigned long long)regd_buf->reg.va, unsol_sz);
+	}
+
+	if (imm_sz > 0) {
+		iser_dbg("Cmd itt:%d, WRITE, adding imm.data sz: %d\n",
+			 ctask->itt, imm_sz);
+		iser_dto_add_regd_buff(send_dto,
+				       regd_buf,
+				       0,
+				       imm_sz);
+	}
+
+	return 0;
+}
+
+/**
+ * iser_post_receive_control - allocates, initializes and posts receive DTO.
+ */
+static int iser_post_receive_control(struct iscsi_conn *conn)
+{
+	struct iscsi_iser_conn *iser_conn = conn->dd_data;
+	struct iser_desc     *rx_desc;
+	struct iser_regd_buf *regd_hdr;
+	struct iser_regd_buf *regd_data;
+	struct iser_dto      *recv_dto = NULL;
+	struct iser_device  *device = iser_conn->ib_conn->device;
+	int rx_data_size, err = 0;
+
+	rx_desc = kmem_cache_alloc(ig.desc_cache, GFP_KERNEL);
+	if (rx_desc == NULL) {
+		iser_err("Failed to alloc desc for post recv\n");
+		return -ENOMEM;
+	}
+	rx_desc->type = ISCSI_RX;
+
+	/* for the login sequence we must support rx of up to 8K */
+	if (conn->c_stage == ISCSI_CONN_INITIAL_STAGE)
+		rx_data_size = DEFAULT_MAX_RECV_DATA_SEGMENT_LENGTH;
+	else /* FIXME till user space sets conn->max_recv_dlength correctly */
+		rx_data_size = 128;
+
+	rx_desc->data = kmalloc(rx_data_size, GFP_KERNEL);
+	if (rx_desc->data == NULL) {
+		iser_err("Failed to alloc data buf for post recv\n");
+		err = -ENOMEM;
+		goto post_rx_kmalloc_failure;
+	}
+
+	recv_dto = &rx_desc->dto;
+	recv_dto->conn          = iser_conn;
+	recv_dto->regd_vector_len = 0;
+
+	regd_hdr = &rx_desc->hdr_regd_buf;
+	memset(regd_hdr, 0, sizeof(struct iser_regd_buf));
+	regd_hdr->device  = device;
+	regd_hdr->virt_addr  = rx_desc; /* == &rx_desc->iser_header */
+	regd_hdr->data_size  = ISER_TOTAL_HEADERS_LEN;
+
+	iser_reg_single(device, regd_hdr, DMA_FROM_DEVICE);
+
+	iser_dto_add_regd_buff(recv_dto, regd_hdr, 0, 0);
+
+	regd_data = &rx_desc->data_regd_buf;
+	memset(regd_data, 0, sizeof(struct iser_regd_buf));
+	regd_data->device  = device;
+	regd_data->virt_addr  = rx_desc->data;
+	regd_data->data_size  = rx_data_size;
+
+	iser_reg_single(device, regd_data, DMA_FROM_DEVICE);
+
+	iser_dto_add_regd_buff(recv_dto, regd_data, 0, 0);
+
+	err = iser_post_recv(rx_desc);
+	if (!err)
+		return 0;
+
+	/* iser_post_recv failed */
+	iser_dto_buffs_release(recv_dto);
+	kfree(rx_desc->data);
+post_rx_kmalloc_failure:
+	kmem_cache_free(ig.desc_cache, rx_desc);
+	return err;
+}
+
+/* creates a new tx descriptor and adds header regd buffer */
+static void iser_create_send_desc(struct iscsi_iser_conn *iser_conn,
+				  struct iser_desc       *tx_desc)
+{
+	struct iser_regd_buf *regd_hdr = &tx_desc->hdr_regd_buf;
+	struct iser_dto      *send_dto = &tx_desc->dto;
+
+	memset(regd_hdr, 0, sizeof(struct iser_regd_buf));
+	regd_hdr->device  = iser_conn->ib_conn->device;
+	regd_hdr->virt_addr  = tx_desc; /* == &tx_desc->iser_header */
+	regd_hdr->data_size  = ISER_TOTAL_HEADERS_LEN;
+
+	send_dto->conn          = iser_conn;
+	send_dto->notify_enable   = 1;
+	send_dto->regd_vector_len = 0;
+
+	memset(&tx_desc->iser_header, 0, sizeof(struct iser_hdr));
+	tx_desc->iser_header.flags = ISER_VER;
+
+	iser_dto_add_regd_buff(send_dto, regd_hdr, 0, 0);
+}
+
+/**
+ *  iser_conn_set_full_featured_mode - (iSER API)
+ */
+int iser_conn_set_full_featured_mode(struct iscsi_conn *conn)
+{
+	struct iscsi_iser_conn *iser_conn = conn->dd_data;
+
+	int i;
+	/* no need to keep it in a var, we are after login so if this should
+	 * be negotiated, by now the result should be available here */
+	int initial_post_recv_bufs_num = ISER_MAX_RX_MISC_PDUS;
+
+	iser_dbg("Initially post: %d\n", initial_post_recv_bufs_num);
+
+	/* Check that there are no posted recv or send buffers left - */
+	/* they must be consumed during the login phase */
+	if (atomic_read(&iser_conn->ib_conn->post_recv_buf_count) != 0)
+		iser_bug("Number of currently posted recv bufs non-zero\n");
+	if (atomic_read(&iser_conn->ib_conn->post_send_buf_count) != 0)
+		iser_bug("Number of currently posted send bufs non-zero\n");
+
+	/* Initial post receive buffers */
+	for (i = 0; i < initial_post_recv_bufs_num; i++) {
+		if (iser_post_receive_control(conn) != 0) {
+			iser_err("Failed to post recv bufs at:%d conn:0x%p\n",
+				 i, conn);
+			return -ENOMEM;
+		}
+	}
+	iser_dbg("Posted %d post recv bufs, conn:0x%p\n", i, conn);
+	return 0;
+}
+
+static int
+iser_check_xmit(struct iscsi_conn *conn, void *task)
+{
+	int rc = 0;
+	struct iscsi_iser_conn *iser_conn = conn->dd_data;
+
+	write_lock_bh(conn->recv_lock);
+	if (atomic_read(&iser_conn->ib_conn->post_send_buf_count) ==
+	    ISER_QP_MAX_REQ_DTOS) {
+		iser_dbg("%ld can't xmit task %p, suspending tx\n",jiffies,task);
+		set_bit(ISCSI_SUSPEND_BIT, &conn->suspend_tx);
+		rc = -EAGAIN;
+	}
+	write_unlock_bh(conn->recv_lock);
+	return rc;
+}
+
+
+/**
+ * iser_send_command - send command PDU
+ */
+int iser_send_command(struct iscsi_conn     *conn,
+		      struct iscsi_cmd_task *ctask)
+{
+	struct iscsi_iser_conn *iser_conn = conn->dd_data;
+	struct iscsi_iser_cmd_task *iser_ctask = ctask->dd_data;
+	struct iser_dto *send_dto = NULL;
+	unsigned long edtl;
+	int err = 0;
+	struct iser_data_buf *data_buf;
+
+	struct iscsi_cmd *hdr =  ctask->hdr;
+	struct scsi_cmnd *sc  =  ctask->sc;
+
+	if (atomic_read(&iser_conn->ib_conn->state) != ISER_CONN_UP) {
+		iser_err("Failed to send, conn: 0x%p is not up\n", iser_conn->ib_conn);
+		return -EPERM;
+	}
+	if (iser_check_xmit(conn, ctask))
+		return -EAGAIN;
+
+	edtl = ntohl(hdr->data_length);
+
+	/* build the tx desc regd header and add it to the tx desc dto */
+	iser_ctask->desc.type = ISCSI_TX_SCSI_COMMAND;
+	send_dto = &iser_ctask->desc.dto;
+	send_dto->ctask = iser_ctask;
+	iser_create_send_desc(iser_conn, &iser_ctask->desc);
+
+	if (hdr->flags & ISCSI_FLAG_CMD_READ)
+		data_buf = &iser_ctask->data[ISER_DIR_IN];
+	else
+		data_buf = &iser_ctask->data[ISER_DIR_OUT];
+
+	if (sc->use_sg) { /* using a scatter list */
+		data_buf->buf  = sc->request_buffer;
+		data_buf->size = sc->use_sg;
+	} else { /* using a single buffer - convert it into one entry SG */
+		sg_init_one(&data_buf->sg_single,
+			    sc->request_buffer, sc->request_bufflen);
+		data_buf->buf   = &data_buf->sg_single;
+		data_buf->size  = 1;
+	}
+
+	data_buf->data_len = sc->request_bufflen;
+
+	if (hdr->flags & ISCSI_FLAG_CMD_READ) {
+		err = iser_prepare_read_cmd(ctask, edtl);
+		if (err)
+			goto send_command_error;
+	}
+	if (hdr->flags & ISCSI_FLAG_CMD_WRITE) {
+		err = iser_prepare_write_cmd(ctask,
+					     ctask->imm_count,
+					     ctask->imm_count +
+					     ctask->unsol_count,
+					     edtl);
+		if (err)
+			goto send_command_error;
+	}
+
+	iser_reg_single(iser_conn->ib_conn->device,
+			send_dto->regd[0], DMA_TO_DEVICE);
+
+	if (iser_post_receive_control(conn) != 0) {
+		iser_err("post_recv failed!\n");
+		err = -ENOMEM;
+		goto send_command_error;
+	}
+
+	iser_ctask->status = ISER_TASK_STATUS_STARTED;
+
+	err = iser_post_send(&iser_ctask->desc);
+	if (!err)
+		return 0;
+
+send_command_error:
+	iser_dto_buffs_release(send_dto);
+	iser_err("conn %p failed ctask->itt %d err %d\n",conn, ctask->itt, err);
+	return err;
+}
+
+/**
+ * iser_send_data_out - send data out PDU
+ */
+int iser_send_data_out(struct iscsi_conn     *conn,
+		       struct iscsi_cmd_task *ctask,
+		       struct iscsi_data *hdr)
+{
+	struct iscsi_iser_conn *iser_conn = conn->dd_data;
+	struct iscsi_iser_cmd_task *iser_ctask = ctask->dd_data;
+	struct iser_desc *tx_desc = NULL;
+	struct iser_dto *send_dto = NULL;
+	unsigned long buf_offset;
+	unsigned long data_seg_len;
+	unsigned int itt;
+	int err = 0;
+
+	if (atomic_read(&iser_conn->ib_conn->state) != ISER_CONN_UP) {
+		iser_err("Failed to send, conn: 0x%p is not up\n", iser_conn->ib_conn);
+		return -EPERM;
+	}
+
+	if (iser_check_xmit(conn, ctask))
+		return -EAGAIN;
+
+	itt = ntohl(hdr->itt);
+	data_seg_len = ntoh24(hdr->dlength);
+	buf_offset   = ntohl(hdr->offset);
+
+	iser_dbg("%s itt %d dseg_len %d offset %d\n",
+		 __func__,(int)itt,(int)data_seg_len,(int)buf_offset);
+
+	tx_desc = kmem_cache_alloc(ig.desc_cache, GFP_KERNEL);
+	if (tx_desc == NULL) {
+		iser_err("Failed to alloc desc for post dataout\n");
+		return -ENOMEM;
+	}
+
+	tx_desc->type = ISCSI_TX_DATAOUT;
+	memcpy(&tx_desc->iscsi_header, hdr, sizeof(struct iscsi_hdr));
+
+	/* build the tx desc regd header and add it to the tx desc dto */
+	send_dto = &tx_desc->dto;
+	send_dto->ctask = iser_ctask;
+	iser_create_send_desc(iser_conn, tx_desc);
+
+	iser_reg_single(iser_conn->ib_conn->device,
+			send_dto->regd[0], DMA_TO_DEVICE);
+
+	/* all data was registered for RDMA, we can use the lkey */
+	iser_dto_add_regd_buff(send_dto,
+			       &iser_ctask->rdma_regd[ISER_DIR_OUT],
+			       buf_offset,
+			       data_seg_len);
+
+	if (buf_offset + data_seg_len > iser_ctask->data[ISER_DIR_OUT].data_len) {
+		iser_err("Offset:%ld & DSL:%ld in Data-Out "
+			 "inconsistent with total len:%ld, itt:%d\n",
+			 buf_offset, data_seg_len,
+			 iser_ctask->data[ISER_DIR_OUT].data_len, itt);
+		err = -EINVAL;
+		goto send_data_out_error;
+	}
+	iser_dbg("data-out itt: %d, offset: %ld, sz: %ld\n",
+		 itt, buf_offset, data_seg_len);
+
+
+	err = iser_post_send(tx_desc);
+	if (!err)
+		return 0;
+
+send_data_out_error:
+	iser_dto_buffs_release(send_dto);
+	kmem_cache_free(ig.desc_cache, tx_desc);
+	iser_err("conn %p failed err %d\n",conn, err);
+	return err;
+}
+
+int iser_send_control(struct iscsi_conn *conn,
+		      struct iscsi_mgmt_task *mtask)
+{
+	struct iscsi_iser_conn *iser_conn = conn->dd_data;
+	struct iser_desc *mdesc = mtask->dd_data;
+	struct iser_dto *send_dto = NULL;
+	unsigned int itt;
+	unsigned long data_seg_len;
+	int err = 0;
+	unsigned char opcode;
+	struct iser_regd_buf *regd_buf;
+	struct iser_device *device;
+
+	if (atomic_read(&iser_conn->ib_conn->state) != ISER_CONN_UP) {
+		iser_err("Failed to send, conn: 0x%p is not up\n", iser_conn->ib_conn);
+		return -EPERM;
+	}
+
+	if (iser_check_xmit(conn,mtask))
+		return -EAGAIN;
+
+	/* build the tx desc regd header and add it to the tx desc dto */
+	mdesc->type = ISCSI_TX_CONTROL;
+	send_dto = &mdesc->dto;
+	send_dto->ctask = NULL;
+	iser_create_send_desc(iser_conn, mdesc);
+
+	device = iser_conn->ib_conn->device;
+
+	iser_reg_single(device, send_dto->regd[0], DMA_TO_DEVICE);
+
+	itt = ntohl(mtask->hdr->itt);
+	opcode = mtask->hdr->opcode & ISCSI_OPCODE_MASK;
+	data_seg_len = ntoh24(mtask->hdr->dlength);
+
+	if (data_seg_len > 0) {
+		regd_buf = &mdesc->data_regd_buf;
+		memset(regd_buf, 0, sizeof(struct iser_regd_buf));
+		regd_buf->device = device;
+		regd_buf->virt_addr = mtask->data;
+		regd_buf->data_size = mtask->data_count;
+		iser_reg_single(device, regd_buf,
+				DMA_TO_DEVICE);
+		iser_dto_add_regd_buff(send_dto, regd_buf,
+				       0,
+				       data_seg_len);
+	}
+
+	if (iser_post_receive_control(conn) != 0) {
+		iser_err("post_rcv_buff failed!\n");
+		err = -ENOMEM;
+		goto send_control_error;
+	}
+
+	err = iser_post_send(mdesc);
+	if (!err)
+		return 0;
+
+send_control_error:
+	iser_dto_buffs_release(send_dto);
+	iser_err("conn %p failed err %d\n",conn, err);
+	return err;
+}
+
+/**
+ * iser_rcv_completion - recv DTO completion
+ */
+void iser_rcv_completion(struct iser_desc *rx_desc,
+			 unsigned long dto_xfer_len)
+{
+	struct iser_dto        *dto = &rx_desc->dto;
+	struct iscsi_iser_conn *conn = dto->conn;
+	struct iscsi_session *session = conn->iscsi_conn->session;
+	struct iscsi_cmd_task *ctask;
+	struct iscsi_iser_cmd_task *iser_ctask;
+	struct iscsi_hdr *hdr;
+	char   *rx_data = NULL;
+	int     rx_data_len = 0;
+	unsigned int itt;
+	unsigned char opcode;
+
+	hdr = &rx_desc->iscsi_header;
+
+	iser_dbg("op 0x%x itt 0x%x\n", hdr->opcode,hdr->itt);
+
+	if (dto_xfer_len > ISER_TOTAL_HEADERS_LEN) { /* we have data */
+		rx_data_len = dto_xfer_len - ISER_TOTAL_HEADERS_LEN;
+		rx_data     = dto->regd[1]->virt_addr;
+		rx_data    += dto->offset[1];
+	}
+
+	opcode = hdr->opcode & ISCSI_OPCODE_MASK;
+
+	if (opcode == ISCSI_OP_SCSI_CMD_RSP) {
+		itt = hdr->itt & ISCSI_ITT_MASK; /* mask out cid and age bits */
+		if (!(itt < session->cmds_max))
+			iser_err("itt can't be matched to task!!! "
+				 "conn %p opcode %d cmds_max %d itt %d\n",
+				 conn->iscsi_conn,opcode,session->cmds_max,itt);
+		/* use the mapping given with the cmds array indexed by itt */
+		ctask = (struct iscsi_cmd_task *)session->cmds[itt];
+		iser_ctask = ctask->dd_data;
+		iser_dbg("itt %d ctask %p\n",itt,ctask);
+		iser_ctask->status = ISER_TASK_STATUS_COMPLETED;
+		iser_ctask_rdma_finalize(iser_ctask);
+	}
+
+	iser_dto_buffs_release(dto);
+
+	iscsi_iser_recv(conn->iscsi_conn, hdr, rx_data, rx_data_len);
+
+	kfree(rx_desc->data);
+	kmem_cache_free(ig.desc_cache, rx_desc);
+
+	/* Decrement conn->post_recv_buf_count only after freeing the task.
+	 * Tasks that complete in parallel with iser_conn_terminate then
+	 * need no special handling: the code that waits for the posted
+	 * rx bufs refcount to become zero covers them. */
+	atomic_dec(&conn->ib_conn->post_recv_buf_count);
+}
+
+void iser_snd_completion(struct iser_desc *tx_desc)
+{
+	struct iser_dto        *dto = &tx_desc->dto;
+	struct iscsi_iser_conn *iser_conn = dto->conn;
+	struct iscsi_conn      *conn = iser_conn->iscsi_conn;
+	struct iscsi_mgmt_task *mtask;
+
+	iser_dbg("Initiator, Data sent dto=0x%p\n", dto);
+
+	iser_dto_buffs_release(dto);
+
+	if (tx_desc->type == ISCSI_TX_DATAOUT)
+		kmem_cache_free(ig.desc_cache, tx_desc);
+
+	atomic_dec(&iser_conn->ib_conn->post_send_buf_count);
+
+	write_lock(conn->recv_lock);
+	if (conn->suspend_tx) {
+		iser_dbg("%ld resuming tx\n",jiffies);
+		clear_bit(ISCSI_SUSPEND_BIT, &conn->suspend_tx);
+		scsi_queue_work(conn->session->host, &conn->xmitwork);
+	}
+	write_unlock(conn->recv_lock);
+
+	if (tx_desc->type == ISCSI_TX_CONTROL) {
+		/* this arithmetic is legal by libiscsi dd_data allocation */
+		mtask = (void *) ((long)(void *)tx_desc -
+				  sizeof(struct iscsi_mgmt_task));
+		if (mtask->hdr->itt == cpu_to_be32(ISCSI_RESERVED_TAG)) {
+			struct iscsi_session *session = conn->session;
+
+			spin_lock(&session->lock);
+			list_del(&mtask->running);
+			__kfifo_put(session->mgmtpool.queue, (void*)&mtask,
+				    sizeof(void*));
+			spin_unlock(&session->lock);
+		}
+	}
+}
+
+void iser_ctask_rdma_init(struct iscsi_iser_cmd_task *iser_ctask)
+
+{
+	iser_ctask->status = ISER_TASK_STATUS_INIT;
+
+	iser_ctask->dir[ISER_DIR_IN] = 0;
+	iser_ctask->dir[ISER_DIR_OUT] = 0;
+
+	iser_ctask->data[ISER_DIR_IN].data_len  = 0;
+	iser_ctask->data[ISER_DIR_OUT].data_len = 0;
+
+	memset(&iser_ctask->rdma_regd[ISER_DIR_IN], 0,
+	       sizeof(struct iser_regd_buf));
+	memset(&iser_ctask->rdma_regd[ISER_DIR_OUT], 0,
+	       sizeof(struct iser_regd_buf));
+}
+
+void iser_ctask_rdma_finalize(struct iscsi_iser_cmd_task *iser_ctask)
+{
+	int deferred;
+
+	/* if we were reading, copy back to the unaligned sglist;
+	 * in either case dma_unmap and free the copy
+	 */
+	if (iser_ctask->data_copy[ISER_DIR_IN].copy_buf != NULL)
+		iser_finalize_rdma_unaligned_sg(iser_ctask, ISER_DIR_IN);
+	if (iser_ctask->data_copy[ISER_DIR_OUT].copy_buf != NULL)
+		iser_finalize_rdma_unaligned_sg(iser_ctask, ISER_DIR_OUT);
+
+	if (iser_ctask->dir[ISER_DIR_IN]) {
+		deferred = iser_regd_buff_release
+			(&iser_ctask->rdma_regd[ISER_DIR_IN]);
+		if (deferred)
+			iser_bug("References remain for BUF-IN rdma reg\n");
+	}
+
+	if (iser_ctask->dir[ISER_DIR_OUT]) {
+		deferred = iser_regd_buff_release
+			(&iser_ctask->rdma_regd[ISER_DIR_OUT]);
+		if (deferred)
+			iser_bug("References remain for BUF-OUT rdma reg\n");
+	}
+
+	iser_dma_unmap_task_data(iser_ctask);
+}
+
+void iser_dto_buffs_release(struct iser_dto *dto)
+{
+	int i;
+
+	for (i = 0; i < dto->regd_vector_len; i++)
+		iser_regd_buff_release(dto->regd[i]);
+}
+



* [PATCH 5/6] iser RDMA CM (CMA) and IB verbs interaction
  2006-04-27 12:32       ` [PATCH 4/6] iser initiator Or Gerlitz
@ 2006-04-27 12:32         ` Or Gerlitz
  2006-04-27 12:33           ` [PATCH 6/6] iser handling of memory for RDMA Or Gerlitz
  2006-04-28 23:05           ` [openib-general] [PATCH 5/6] iser RDMA CM (CMA) and IB verbsinteraction Sean Hefty
  0 siblings, 2 replies; 21+ messages in thread
From: Or Gerlitz @ 2006-04-27 12:32 UTC (permalink / raw)
  To: rdreier; +Cc: openib-general, linux-kernel

This code does the low-level work with the IB verbs and the CMA, e.g.:

+ establish/disconnect the iser connection
+ create/destroy IB resources: PD, DMA MR, CQ, QP, FMR pool
+ do fast registration (FMR) of the SG list associated with a SCSI command
+ post rx and tx requests to the QP and reap completions from the CQ
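
For reviewers, the connect/disconnect handling listed above can be read as a
small state machine. The following is only a userspace sketch of it, not code
from the patch: the names conn_model, cm_event, conn_handle_event and
conn_all_posts_flushed are hypothetical, and the three events stand in for
the CMA events (established, any connect error, disconnected) that the
handler in iser_verbs.c switches on.

```c
#include <stdbool.h>

/* Hypothetical model of the connection states used by the patch. */
enum conn_state { CONN_INIT, CONN_PENDING, CONN_UP, CONN_TERMINATING, CONN_DOWN };

/* Simplified stand-ins for the CMA events the driver reacts to. */
enum cm_event { EV_ESTABLISHED, EV_CONNECT_ERROR, EV_DISCONNECTED };

struct conn_model {
	enum conn_state state;
	int  posted_rx;  /* models post_recv_buf_count */
	int  posted_tx;  /* models post_send_buf_count */
	bool disc_seen;  /* models disc_evt_flag */
};

static void conn_handle_event(struct conn_model *c, enum cm_event ev)
{
	switch (ev) {
	case EV_ESTABLISHED:
		c->state = CONN_UP;
		break;
	case EV_CONNECT_ERROR:
		/* errors only matter while the connect is still pending */
		if (c->state == CONN_PENDING)
			c->state = CONN_DOWN;
		break;
	case EV_DISCONNECTED:
		c->disc_seen = true;
		if (c->state == CONN_PENDING) {
			c->state = CONN_DOWN;
			break;
		}
		if (c->state == CONN_UP)
			c->state = CONN_TERMINATING;
		/* finish right away only if nothing is still posted to the QP */
		if (c->posted_rx == 0 && c->posted_tx == 0)
			c->state = CONN_DOWN;
		break;
	}
}

/* Models the deferred work run once the last flushed completion was
 * handled; the caller has already dropped both counters to zero. */
static void conn_all_posts_flushed(struct conn_model *c)
{
	if (c->state == CONN_UP)
		c->state = CONN_TERMINATING;
	if (c->disc_seen)
		c->state = CONN_DOWN;
}
```

The point of the sketch is that a connection reaches the DOWN state only after
both the disconnect event was seen and no rx/tx buffers remain posted,
whichever of the two happens last.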

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>

--- /usr/src/linux-2.6.17-rc3/drivers/infiniband/ulp/iser-x/iser_verbs.c	1970-01-01 02:00:00.000000000 +0200
+++ /usr/src/linux-2.6.17-rc3/drivers/infiniband/ulp/iser/iser_verbs.c	2006-04-26 12:50:11.000000000 +0300
@@ -0,0 +1,804 @@
+/*
+ * Copyright (c) 2004, 2005, 2006 Voltaire, Inc. All rights reserved.
+ * Copyright (c) 2005, 2006 Cisco Systems.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *	- Redistributions of source code must retain the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer.
+ *
+ *	- Redistributions in binary form must reproduce the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer in the documentation and/or other materials
+ *	  provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ * $Id: iser_verbs.c 6643 2006-04-26 10:01:01Z ogerlitz $
+ */
+#include <asm/io.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/smp_lock.h>
+#include <linux/delay.h>
+#include <linux/version.h>
+
+#include "iscsi_iser.h"
+
+#define ISCSI_ISER_MAX_CONN	8
+#define ISER_MAX_CQ_LEN		((ISER_QP_MAX_RECV_DTOS + \
+				ISER_QP_MAX_REQ_DTOS) *   \
+				 ISCSI_ISER_MAX_CONN)
+
+static void iser_cq_tasklet_fn(unsigned long data);
+static void iser_cq_callback(struct ib_cq *cq, void *cq_context);
+static void iser_comp_error_worker(void *data);
+
+static void iser_cq_event_callback(struct ib_event *cause, void *context)
+{
+	iser_err("got cq event %d\n", cause->event);
+}
+
+static void iser_qp_event_callback(struct ib_event *cause, void *context)
+{
+	iser_err("got qp event %d\n",cause->event);
+}
+
+/**
+ * iser_create_device_ib_res - creates Protection Domain (PD), Completion
+ * Queue (CQ), DMA Memory Region (DMA MR) with the device associated with
+ * the adapter.
+ *
+ * returns 0 on success, -1 on failure
+ */
+static int iser_create_device_ib_res(struct iser_device *device)
+{
+	device->pd = ib_alloc_pd(device->ib_device);
+	if (IS_ERR(device->pd))
+		goto pd_err;
+
+	device->cq = ib_create_cq(device->ib_device,
+				  iser_cq_callback,
+				  iser_cq_event_callback,
+				  (void *)device,
+				  ISER_MAX_CQ_LEN);
+	if (IS_ERR(device->cq))
+		goto cq_err;
+
+	if (ib_req_notify_cq(device->cq, IB_CQ_NEXT_COMP))
+		goto cq_arm_err;
+
+	tasklet_init(&device->cq_tasklet,
+		     iser_cq_tasklet_fn,
+		     (unsigned long)device);
+
+	device->mr = ib_get_dma_mr(device->pd,
+				   IB_ACCESS_LOCAL_WRITE);
+	if (IS_ERR(device->mr))
+		goto dma_mr_err;
+
+	return 0;
+
+dma_mr_err:
+	tasklet_kill(&device->cq_tasklet);
+cq_arm_err:
+	ib_destroy_cq(device->cq);
+cq_err:
+	ib_dealloc_pd(device->pd);
+pd_err:
+	iser_err("failed to allocate an IB resource\n");
+	return -1;
+}
+
+/**
+ * iser_free_device_ib_res - destroy/dealloc/dereg the DMA MR,
+ * CQ and PD created with the device associated with the adapter.
+ *
+ * returns 0 on success, -1 on failure
+ */
+static int iser_free_device_ib_res(struct iser_device *device)
+{
+	BUG_ON(device->mr == NULL);
+
+	tasklet_kill(&device->cq_tasklet);
+
+	(void)ib_dereg_mr(device->mr);
+	(void)ib_destroy_cq(device->cq);
+	(void)ib_dealloc_pd(device->pd);
+
+	device->mr = NULL;
+	device->cq = NULL;
+	device->pd = NULL;
+	return 0;
+}
+
+/**
+ * iser_create_ib_conn_res - Creates FMR pool and Queue-Pair (QP)
+ *
+ * returns 0 on success, -1 on failure
+ */
+static int iser_create_ib_conn_res(struct iser_conn *ib_conn)
+{
+	struct iser_device	*device;
+	struct ib_qp_init_attr	init_attr;
+	int			ret;
+	struct ib_fmr_pool_param params;
+
+	BUG_ON(ib_conn->device == NULL);
+
+	device = ib_conn->device;
+
+	ib_conn->page_vec = kmalloc(sizeof(struct iser_page_vec) +
+				    (sizeof(u64) * (ISCSI_ISER_SG_TABLESIZE +1)),
+				    GFP_KERNEL);
+	if (!ib_conn->page_vec) {
+		ret = -ENOMEM;
+		goto alloc_err;
+	}
+	ib_conn->page_vec->pages = (u64 *) (ib_conn->page_vec + 1);
+
+	params.page_shift        = PAGE_SHIFT;
+	/* when the first/last SG element are not start/end *
+	 * page aligned, the map would be of N+1 pages      */
+	params.max_pages_per_fmr = ISCSI_ISER_SG_TABLESIZE + 1;
+	/* make the pool size twice the max number of SCSI commands *
+	 * the ML is expected to queue, watermark for unmap at 50%  */
+	params.pool_size	 = ISCSI_XMIT_CMDS_MAX * 2;
+	params.dirty_watermark	 = ISCSI_XMIT_CMDS_MAX;
+	params.cache		 = 0;
+	params.flush_function	 = NULL;
+	params.access		 = (IB_ACCESS_LOCAL_WRITE  |
+				    IB_ACCESS_REMOTE_WRITE |
+				    IB_ACCESS_REMOTE_READ);
+
+	ib_conn->fmr_pool = ib_create_fmr_pool(device->pd, &params);
+	if (IS_ERR(ib_conn->fmr_pool)) {
+		ret = PTR_ERR(ib_conn->fmr_pool);
+		goto fmr_pool_err;
+	}
+
+	memset(&init_attr, 0, sizeof init_attr);
+
+	init_attr.event_handler = iser_qp_event_callback;
+	init_attr.qp_context	= (void *)ib_conn;
+	init_attr.send_cq	= device->cq;
+	init_attr.recv_cq	= device->cq;
+	init_attr.cap.max_send_wr  = ISER_QP_MAX_REQ_DTOS;
+	init_attr.cap.max_recv_wr  = ISER_QP_MAX_RECV_DTOS;
+	init_attr.cap.max_send_sge = MAX_REGD_BUF_VECTOR_LEN;
+	init_attr.cap.max_recv_sge = 2;
+	init_attr.sq_sig_type	= IB_SIGNAL_REQ_WR;
+	init_attr.qp_type	= IB_QPT_RC;
+
+	ret = rdma_create_qp(ib_conn->cma_id, device->pd, &init_attr);
+	if (ret)
+		goto qp_err;
+
+	ib_conn->qp = ib_conn->cma_id->qp;
+	iser_err("setting conn %p cma_id %p: fmr_pool %p qp %p\n",
+		 ib_conn, ib_conn->cma_id,
+		 ib_conn->fmr_pool, ib_conn->cma_id->qp);
+	return ret;
+
+qp_err:
+	(void)ib_destroy_fmr_pool(ib_conn->fmr_pool);
+fmr_pool_err:
+	kfree(ib_conn->page_vec);
+alloc_err:
+	iser_err("unable to alloc mem or create resource, err %d\n", ret);
+	return ret;
+}
+
+/**
+ * releases the FMR pool, QP and CMA ID objects, returns 0 on success,
+ * -1 on failure
+ */
+static int iser_free_ib_conn_res(struct iser_conn *ib_conn)
+{
+	BUG_ON(ib_conn == NULL);
+
+	iser_err("freeing conn %p cma_id %p fmr pool %p qp %p\n",
+		 ib_conn, ib_conn->cma_id,
+		 ib_conn->fmr_pool, ib_conn->qp);
+
+	/* qp is created only once both addr & route are resolved */
+	if (ib_conn->fmr_pool != NULL)
+		ib_destroy_fmr_pool(ib_conn->fmr_pool);
+
+	if (ib_conn->qp != NULL)
+		rdma_destroy_qp(ib_conn->cma_id);
+
+	if (ib_conn->cma_id != NULL)
+		rdma_destroy_id(ib_conn->cma_id);
+
+	ib_conn->fmr_pool = NULL;
+	ib_conn->qp	  = NULL;
+	ib_conn->cma_id   = NULL;
+	kfree(ib_conn->page_vec);
+
+	return 0;
+}
+
+/**
+ * based on the resolved device node GUID, see if an iser device is already
+ * allocated for this IB device. If there is none, create one.
+ */
+static
+struct iser_device *iser_device_find_by_ib_device(struct rdma_cm_id *cma_id)
+{
+	struct list_head    *p_list;
+	struct iser_device  *device = NULL;
+
+	mutex_lock(&ig.device_list_mutex);
+
+	list_for_each(p_list, &ig.device_list) {
+		device = list_entry(p_list, struct iser_device, ig_list);
+		/* find if there's a match using the node GUID */
+		if (device->ib_device->node_guid == cma_id->device->node_guid)
+			break;
+		device = NULL; /* no match, don't keep a stale candidate */
+	}
+
+	if (device == NULL) {
+		device = kzalloc(sizeof *device, GFP_KERNEL);
+		if (device == NULL)
+			goto end;
+		/* bind the IB device to this iser device */
+		device->ib_device = cma_id->device;
+		/* init the device and link it into ig device list */
+		if (iser_create_device_ib_res(device)) {
+			kfree(device);
+			device = NULL;
+			goto end;
+		}
+		list_add(&device->ig_list, &ig.device_list);
+	}
+end:
+	BUG_ON(device == NULL);
+	device->refcount++;
+	mutex_unlock(&ig.device_list_mutex);
+	return device;
+}
+
+/* if there's no demand for this device, release it */
+static void iser_device_try_release(struct iser_device *device)
+{
+	mutex_lock(&ig.device_list_mutex);
+	device->refcount--;
+	iser_err("device %p refcount %d\n",device,device->refcount);
+	if (!device->refcount) {
+		iser_free_device_ib_res(device);
+		list_del(&device->ig_list);
+		kfree(device);
+	}
+	mutex_unlock(&ig.device_list_mutex);
+}
+
+/**
+ * triggers the start of the disconnect procedure and waits for it to complete
+ */
+void iser_conn_terminate(struct iser_conn *ib_conn)
+{
+	int err = 0;
+
+	atomic_set(&ib_conn->state, ISER_CONN_TERMINATING);
+	err = rdma_disconnect(ib_conn->cma_id);
+	if (err)
+		iser_bug("Failed to disconnect, conn: 0x%p err %d\n",ib_conn,err);
+	wait_event_interruptible(ib_conn->wait,
+				 (atomic_read(&ib_conn->state) == ISER_CONN_DOWN));
+
+	mutex_lock(&ig.connlist_mutex);
+	list_del(&ib_conn->conn_list);
+	mutex_unlock(&ig.connlist_mutex);
+
+	iser_conn_release(ib_conn);
+}
+
+static void iser_connect_error(struct rdma_cm_id *cma_id)
+{
+	struct iser_conn *ib_conn;
+	ib_conn = (struct iser_conn *)cma_id->context;
+
+	if (atomic_read(&ib_conn->state) == ISER_CONN_PENDING) {
+		atomic_set(&ib_conn->state, ISER_CONN_DOWN);
+		wake_up_interruptible(&ib_conn->wait);
+	} else
+		iser_err("Unexpected evt for conn.state: %d\n",
+			 atomic_read(&ib_conn->state));
+}
+
+static void iser_addr_handler(struct rdma_cm_id *cma_id)
+{
+	struct iser_device *device;
+	struct iser_conn   *ib_conn;
+	int    ret;
+
+	device = iser_device_find_by_ib_device(cma_id);
+	ib_conn = (struct iser_conn *)cma_id->context;
+	ib_conn->device = device;
+
+	ret = rdma_resolve_route(cma_id, 1000);
+	if (ret) {
+		iser_err("resolve route failed: %d\n", ret);
+		iser_connect_error(cma_id);
+	}
+	return;
+}
+
+static void iser_route_handler(struct rdma_cm_id *cma_id)
+{
+	struct rdma_conn_param conn_param;
+	int    ret;
+
+	ret = iser_create_ib_conn_res((struct iser_conn *)cma_id->context);
+	if (ret)
+		goto failure;
+
+	iser_dbg("path.mtu is %d setting it to %d\n",
+		 cma_id->route.path_rec->mtu, IB_MTU_1024);
+
+	/* we must set the MTU to 1024 as this is what the target is assuming */
+	if (cma_id->route.path_rec->mtu > IB_MTU_1024)
+		cma_id->route.path_rec->mtu = IB_MTU_1024;
+
+	memset(&conn_param, 0, sizeof conn_param);
+	conn_param.responder_resources = 4;
+	conn_param.initiator_depth     = 1;
+	conn_param.retry_count	       = 7;
+	conn_param.rnr_retry_count     = 6;
+
+	ret = rdma_connect(cma_id, &conn_param);
+	if (ret) {
+		iser_err("failure connecting: %d\n", ret);
+		goto failure;
+	}
+
+	return;
+failure:
+	iser_connect_error(cma_id);
+}
+
+static void iser_connected_handler(struct rdma_cm_id *cma_id)
+{
+	struct iser_conn *ib_conn;
+
+	ib_conn = (struct iser_conn *)cma_id->context;
+	atomic_set(&ib_conn->state, ISER_CONN_UP);
+	wake_up_interruptible(&ib_conn->wait);
+}
+
+static void iser_disconnected_handler(struct rdma_cm_id *cma_id)
+{
+	struct iser_conn *ib_conn;
+
+	ib_conn = (struct iser_conn *)cma_id->context;
+	ib_conn->disc_evt_flag = 1;
+
+	/* If this event is unsolicited this means that the conn is being */
+	/* terminated asynchronously from the iSCSI layer's perspective.  */
+	if (atomic_read(&ib_conn->state) == ISER_CONN_PENDING) {
+		atomic_set(&ib_conn->state, ISER_CONN_DOWN);
+		wake_up_interruptible(&ib_conn->wait);
+	} else {
+		if (atomic_read(&ib_conn->state) == ISER_CONN_UP) {
+			atomic_set(&ib_conn->state, ISER_CONN_TERMINATING);
+			iscsi_conn_failure(ib_conn->iser_conn->iscsi_conn,
+						ISCSI_ERR_CONN_FAILED);
+		}
+		/* Complete the termination process if no posts are pending */
+		if ((atomic_read(&ib_conn->post_recv_buf_count) == 0) &&
+		    (atomic_read(&ib_conn->post_send_buf_count) == 0)) {
+			atomic_set(&ib_conn->state, ISER_CONN_DOWN);
+			wake_up_interruptible(&ib_conn->wait);
+		}
+	}
+}
+
+static int iser_cma_handler(struct rdma_cm_id *cma_id, struct rdma_cm_event *event)
+{
+	int ret = 0;
+
+	iser_err("event %d conn %p id %p\n",event->event,cma_id->context,cma_id);
+
+	switch (event->event) {
+	case RDMA_CM_EVENT_ADDR_RESOLVED:
+		iser_addr_handler(cma_id);
+		break;
+	case RDMA_CM_EVENT_ROUTE_RESOLVED:
+		iser_route_handler(cma_id);
+		break;
+	case RDMA_CM_EVENT_ESTABLISHED:
+		iser_connected_handler(cma_id);
+		break;
+	case RDMA_CM_EVENT_ADDR_ERROR:
+	case RDMA_CM_EVENT_ROUTE_ERROR:
+	case RDMA_CM_EVENT_CONNECT_ERROR:
+	case RDMA_CM_EVENT_UNREACHABLE:
+	case RDMA_CM_EVENT_REJECTED:
+		iser_err("event: %d, error: %d\n", event->event, event->status);
+		iser_connect_error(cma_id);
+		break;
+	case RDMA_CM_EVENT_DISCONNECTED:
+		iser_disconnected_handler(cma_id);
+		break;
+	case RDMA_CM_EVENT_DEVICE_REMOVAL:
+		iser_bug("device removal is not handled yet\n");
+		break;
+	case RDMA_CM_EVENT_CONNECT_RESPONSE:
+		iser_bug("not expecting cma to deliver the REP!!!\n");
+		break;
+	case RDMA_CM_EVENT_CONNECT_REQUEST:
+	default:
+		break;
+	}
+	return ret;
+}
+
+int iser_conn_init(struct iser_conn **ibconn)
+{
+	struct iser_conn *ib_conn;
+
+	ib_conn = kzalloc(sizeof *ib_conn, GFP_KERNEL);
+	if (!ib_conn) {
+		iser_err("can't alloc memory for struct iser_conn\n");
+		return -ENOMEM;
+	}
+	atomic_set(&ib_conn->state, ISER_CONN_INIT);
+	init_waitqueue_head(&ib_conn->wait);
+	atomic_set(&ib_conn->post_recv_buf_count, 0);
+	atomic_set(&ib_conn->post_send_buf_count, 0);
+	INIT_WORK(&ib_conn->comperror_work, iser_comp_error_worker,
+		  ib_conn);
+
+	*ibconn = ib_conn;
+	return 0;
+}
+
+/**
+ * starts the process of connecting to the target
+ * sleeps until the connection is established or rejected
+ */
+int iser_connect(struct iser_conn   *ib_conn,
+		 struct sockaddr_in *src_addr,
+		 struct sockaddr_in *dst_addr,
+		 int                 non_blocking)
+{
+	struct sockaddr *src, *dst;
+	int err = 0;
+
+	sprintf(ib_conn->name,"%d.%d.%d.%d:%d",
+		NIPQUAD(dst_addr->sin_addr.s_addr), dst_addr->sin_port);
+
+	/* the device is known only --after-- address resolution */
+	ib_conn->device = NULL;
+
+	iser_err("connecting to: %d.%d.%d.%d, port 0x%x\n",
+		 NIPQUAD(dst_addr->sin_addr), dst_addr->sin_port);
+
+	atomic_set(&ib_conn->state, ISER_CONN_PENDING);
+
+	ib_conn->cma_id = rdma_create_id(iser_cma_handler,
+					     (void *)ib_conn,
+					     RDMA_PS_TCP);
+	if (IS_ERR(ib_conn->cma_id)) {
+		err = PTR_ERR(ib_conn->cma_id);
+		iser_err("rdma_create_id failed: %d\n", err);
+		goto id_failure;
+	}
+
+	src = (struct sockaddr *)src_addr;
+	dst = (struct sockaddr *)dst_addr;
+	err = rdma_resolve_addr(ib_conn->cma_id, src, dst, 1000);
+	if (err) {
+		iser_err("rdma_resolve_addr failed: %d\n", err);
+		goto addr_failure;
+	}
+
+	if (!non_blocking) {
+		wait_event_interruptible(ib_conn->wait,
+			 atomic_read(&ib_conn->state) != ISER_CONN_PENDING);
+
+		if (atomic_read(&ib_conn->state) != ISER_CONN_UP) {
+			err =  -EIO;
+			goto connect_failure;
+		}
+	}
+
+	mutex_lock(&ig.connlist_mutex);
+	list_add(&ib_conn->conn_list, &ig.connlist);
+	mutex_unlock(&ig.connlist_mutex);
+	return 0;
+
+id_failure:
+	ib_conn->cma_id = NULL;
+addr_failure:
+	atomic_set(&ib_conn->state, ISER_CONN_DOWN);
+connect_failure:
+	iser_conn_release(ib_conn);
+	return err;
+}
+
+/**
+ * Frees all conn objects and deallocs conn descriptor
+ */
+void iser_conn_release(struct iser_conn *ib_conn)
+{
+	struct iser_device  *device = ib_conn->device;
+
+	BUG_ON(atomic_read(&ib_conn->state) != ISER_CONN_DOWN);
+
+	iser_free_ib_conn_res(ib_conn);
+	ib_conn->device = NULL;
+	/* on EVENT_ADDR_ERROR there's no device yet for this conn */
+	if (device != NULL)
+		iser_device_try_release(device);
+	kfree(ib_conn);
+}
+
+
+/**
+ * iser_reg_page_vec - Register physical memory
+ *
+ * returns: 0 on success, errno code on failure
+ */
+int iser_reg_page_vec(struct iser_conn     *ib_conn,
+		      struct iser_page_vec *page_vec,
+		      struct iser_mem_reg  *mem_reg)
+{
+	struct ib_pool_fmr *mem;
+	u64		   io_addr;
+	u64		   *page_list;
+	int		   status;
+
+	page_list = page_vec->pages;
+	io_addr	  = page_list[0];
+
+	mem  = ib_fmr_pool_map_phys(ib_conn->fmr_pool,
+				    page_list,
+				    page_vec->length,
+				    &io_addr);
+
+	if (IS_ERR(mem)) {
+		status = (int)PTR_ERR(mem);
+		iser_err("ib_fmr_pool_map_phys failed: %d\n", status);
+		return status;
+	}
+
+	mem_reg->lkey  = mem->fmr->lkey;
+	mem_reg->rkey  = mem->fmr->rkey;
+	mem_reg->len   = page_vec->length * PAGE_SIZE;
+	mem_reg->va    = io_addr;
+	mem_reg->mem_h = (void *)mem;
+
+	mem_reg->va   += page_vec->offset;
+	mem_reg->len   = page_vec->data_size;
+
+	iser_dbg("PHYSICAL Mem.register, [PHYS p_array: 0x%p, sz: %d, "
+		 "entry[0]: (0x%08lx,%ld)] -> "
+		 "[lkey: 0x%08X mem_h: 0x%p va: 0x%08lX sz: %ld]\n",
+		 page_vec, page_vec->length,
+		 (unsigned long)page_vec->pages[0],
+		 (unsigned long)page_vec->data_size,
+		 (unsigned int)mem_reg->lkey, mem_reg->mem_h,
+		 (unsigned long)mem_reg->va, (unsigned long)mem_reg->len);
+	return 0;
+}
+
+/**
+ * Unregister (previously registered) memory.
+ */
+void iser_unreg_mem(struct iser_mem_reg *reg)
+{
+	int ret;
+
+	iser_dbg("PHYSICAL Mem.Unregister mem_h %p\n",reg->mem_h);
+
+	ret = ib_fmr_pool_unmap((struct ib_pool_fmr *)reg->mem_h);
+	if (ret)
+		iser_err("ib_fmr_pool_unmap failed %d\n", ret);
+
+	reg->mem_h = NULL;
+}
+
+/**
+ * iser_dto_to_iov - builds IOV from a dto descriptor
+ */
+static void iser_dto_to_iov(struct iser_dto *dto, struct ib_sge *iov, int iov_len)
+{
+	int		     i;
+	struct ib_sge	     *sge;
+	struct iser_regd_buf *regd_buf;
+
+	if (dto->regd_vector_len > iov_len)
+		iser_bug("iov size %d too small for posting dto of len %d\n",
+			 iov_len, dto->regd_vector_len);
+
+	for (i = 0; i < dto->regd_vector_len; i++) {
+		sge	    = &iov[i];
+		regd_buf  = dto->regd[i];
+
+		sge->addr   = regd_buf->reg.va;
+		sge->length = regd_buf->reg.len;
+		sge->lkey   = regd_buf->reg.lkey;
+
+		if (dto->used_sz[i] > 0)  /* Adjust size */
+			sge->length = dto->used_sz[i];
+
+		/* offset and length should not exceed the regd buf length */
+		if (sge->length + dto->offset[i] > regd_buf->reg.len) {
+			iser_bug("Used len:%ld + offset:%d, exceed reg.buf.len:"
+				 "%ld in dto:0x%p [%d], va:0x%08lX\n",
+				 (unsigned long)sge->length, dto->offset[i],
+				 (unsigned long)regd_buf->reg.len, dto, i,
+				 (unsigned long)sge->addr);
+		}
+
+		sge->addr += dto->offset[i]; /* Adjust offset */
+	}
+}
+
+/**
+ * iser_post_recv - Posts a receive buffer.
+ *
+ * returns 0 on success, -1 on failure
+ */
+int iser_post_recv(struct iser_desc *rx_desc)
+{
+	int		  ib_ret, ret_val = 0;
+	struct ib_recv_wr recv_wr, *recv_wr_failed;
+	struct ib_sge	  iov[2];
+	struct iser_conn  *ib_conn;
+	struct iser_dto   *recv_dto = &rx_desc->dto;
+
+	/* Retrieve conn */
+	ib_conn = recv_dto->conn->ib_conn;
+
+	iser_dto_to_iov(recv_dto, iov, 2);
+
+	recv_wr.next	= NULL;
+	recv_wr.sg_list = iov;
+	recv_wr.num_sge = recv_dto->regd_vector_len;
+	recv_wr.wr_id	= (unsigned long)rx_desc;
+
+	atomic_inc(&ib_conn->post_recv_buf_count);
+	ib_ret	= ib_post_recv(ib_conn->qp, &recv_wr, &recv_wr_failed);
+	if (ib_ret) {
+		iser_err("ib_post_recv failed ret=%d\n", ib_ret);
+		atomic_dec(&ib_conn->post_recv_buf_count);
+		ret_val = -1;
+	}
+
+	return ret_val;
+}
+
+/**
+ * iser_post_send - Initiate a Send DTO operation
+ *
+ * returns 0 on success, -1 on failure
+ */
+int iser_post_send(struct iser_desc *tx_desc)
+{
+	int		  ib_ret, ret_val = 0;
+	struct ib_send_wr send_wr, *send_wr_failed;
+	struct ib_sge	  iov[MAX_REGD_BUF_VECTOR_LEN];
+	struct iser_conn  *ib_conn;
+	struct iser_dto   *dto = &tx_desc->dto;
+
+	ib_conn = dto->conn->ib_conn;
+
+	iser_dto_to_iov(dto, iov, MAX_REGD_BUF_VECTOR_LEN);
+
+	send_wr.next	   = NULL;
+	send_wr.wr_id	   = (unsigned long)tx_desc;
+	send_wr.sg_list	   = iov;
+	send_wr.num_sge	   = dto->regd_vector_len;
+	send_wr.opcode	   = IB_WR_SEND;
+	send_wr.send_flags = dto->notify_enable ? IB_SEND_SIGNALED : 0;
+
+	atomic_inc(&ib_conn->post_send_buf_count);
+
+	ib_ret = ib_post_send(ib_conn->qp, &send_wr, &send_wr_failed);
+	if (ib_ret) {
+		iser_err("Failed to start SEND DTO, dto: 0x%p, IOV len: %d\n",
+			 dto, dto->regd_vector_len);
+		iser_err("ib_post_send failed, ret:%d\n", ib_ret);
+		atomic_dec(&ib_conn->post_send_buf_count);
+		ret_val = -1;
+	}
+
+	return ret_val;
+}
+
+static void iser_comp_error_worker(void *data)
+{
+	struct iser_conn *ib_conn = data;
+
+	if (atomic_read(&ib_conn->state) == ISER_CONN_UP) {
+		atomic_set(&ib_conn->state, ISER_CONN_TERMINATING);
+		iscsi_conn_failure(ib_conn->iser_conn->iscsi_conn,
+					ISCSI_ERR_CONN_FAILED);
+	}
+
+	/* complete the termination process if disconnect event was delivered *
+	 * note there are no more non completed posts to the QP               */
+	if (ib_conn->disc_evt_flag) {
+		atomic_set(&ib_conn->state, ISER_CONN_DOWN);
+		wake_up_interruptible(&ib_conn->wait);
+	}
+}
+
+static void iser_handle_comp_error(struct iser_desc *desc)
+{
+	struct iser_dto  *dto     = &desc->dto;
+	struct iser_conn *ib_conn = dto->conn->ib_conn;
+
+	iser_dto_buffs_release(dto);
+
+	if (desc->type == ISCSI_RX) {
+		kfree(desc->data);
+		kmem_cache_free(ig.desc_cache, desc);
+		atomic_dec(&ib_conn->post_recv_buf_count);
+	} else { /* type is TX control/command/dataout */
+		if (desc->type == ISCSI_TX_DATAOUT)
+			kmem_cache_free(ig.desc_cache, desc);
+		atomic_dec(&ib_conn->post_send_buf_count);
+	}
+
+	if (atomic_read(&ib_conn->post_recv_buf_count) == 0 &&
+	    atomic_read(&ib_conn->post_send_buf_count) == 0)
+		schedule_work(&ib_conn->comperror_work);
+}
+
+static void iser_cq_tasklet_fn(unsigned long data)
+{
+	struct iser_device  *device = (struct iser_device *)data;
+	struct ib_cq	     *cq = device->cq;
+	struct ib_wc	     wc;
+	struct iser_desc    *desc;
+	unsigned long	     xfer_len;
+
+	while (ib_poll_cq(cq, 1, &wc) == 1) {
+		desc	 = (struct iser_desc *) (unsigned long) wc.wr_id;
+
+		if (desc == NULL)
+			iser_bug("NULL desc\n");
+
+		if (wc.status == IB_WC_SUCCESS) {
+			if (desc->type == ISCSI_RX) {
+				xfer_len = (unsigned long)wc.byte_len;
+				iser_rcv_completion(desc, xfer_len);
+			} else /* type == ISCSI_TX_CONTROL/SCSI_CMD/DOUT */
+				iser_snd_completion(desc);
+		} else {
+			iser_err("comp w. error op %d status %d\n",desc->type,wc.status);
+			iser_handle_comp_error(desc);
+		}
+	}
+	/* NOTE: it is assumed here that arming the CQ only once it is empty *
+	 * would not cause interrupts to be missed                           */
+	ib_req_notify_cq(cq, IB_CQ_NEXT_COMP);
+}
+
+static void iser_cq_callback(struct ib_cq *cq, void *cq_context)
+{
+	struct iser_device  *device = (struct iser_device *)cq_context;
+
+	tasklet_schedule(&device->cq_tasklet);
+}



* [PATCH 6/6] iser handling of memory for RDMA
  2006-04-27 12:32         ` [PATCH 5/6] iser RDMA CM (CMA) and IB verbs interaction Or Gerlitz
@ 2006-04-27 12:33           ` Or Gerlitz
  2006-04-28 23:05           ` [openib-general] [PATCH 5/6] iser RDMA CM (CMA) and IB verbsinteraction Sean Hefty
  1 sibling, 0 replies; 21+ messages in thread
From: Or Gerlitz @ 2006-04-27 12:33 UTC (permalink / raw)
  To: rdreier; +Cc: openib-general, linux-kernel

This code handles SG lists which are not aligned for RDMA, in the sense
that a single VA and RKEY pair can NOT be produced for them by ANY of the
IB verbs memory registration APIs. Such a list is copied into a single
contiguous buffer, and that buffer is used for the RDMA instead.

From our experience such lists are very rare: over time, less than 0.1% of
the data sent down by the SCSI ML is represented by such SGs.
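
To illustrate what "aligned for RDMA" can mean for a page-based registration,
here is a minimal userspace sketch. It is an assumption about the rule, not
the check used by the patch: every element except the first must start on a
page boundary, and every element except the last must end on one. The
sg_model type, MODEL_PAGE_SIZE and sg_aligned_for_rdma are hypothetical names
introduced only for this example.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096UL	/* assumed page size for the example */

/* Hypothetical stand-in for one DMA-mapped scatterlist element. */
struct sg_model {
	unsigned long addr;	/* bus address of the first byte */
	unsigned long len;	/* length in bytes */
};

/* Returns true if the list could be described by one page list, i.e.
 * only the first element may start, and only the last element may end,
 * in the middle of a page. */
static bool sg_aligned_for_rdma(const struct sg_model *sg, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		unsigned long start = sg[i].addr;
		unsigned long end   = sg[i].addr + sg[i].len;

		/* every element but the first must start on a page boundary */
		if (i > 0 && (start % MODEL_PAGE_SIZE) != 0)
			return false;
		/* every element but the last must end on a page boundary */
		if (i + 1 < n && (end % MODEL_PAGE_SIZE) != 0)
			return false;
	}
	return true;
}
```

Under this assumed rule, a list whose first element ends in the middle of a
page is rejected even when all other elements are whole pages, and it would
take the copy path described above.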

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>

--- /usr/src/linux-2.6.17-rc3/drivers/infiniband/ulp/iser-x/iser_memory.c	1970-01-01 02:00:00.000000000 +0200
+++ /usr/src/linux-2.6.17-rc3/drivers/infiniband/ulp/iser/iser_memory.c	2006-04-26 12:50:11.000000000 +0300
@@ -0,0 +1,403 @@
+/*
+ * Copyright (c) 2004, 2005, 2006 Voltaire, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *	- Redistributions of source code must retain the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer.
+ *
+ *	- Redistributions in binary form must reproduce the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer in the documentation and/or other materials
+ *	  provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ * $Id: iser_memory.c 6643 2006-04-26 10:01:01Z ogerlitz $
+ */
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <linux/mm.h>
+#include <asm/io.h>
+#include <asm/scatterlist.h>
+#include <linux/scatterlist.h>
+
+#include "iscsi_iser.h"
+
+#define ISER_KMALLOC_THRESHOLD 0x20000 /* 128K - kmalloc limit */
+/**
+ * Decrements the reference count for the
+ * registered buffer & releases it
+ *
+ * returns 0 if released, 1 if deferred
+ */
+int iser_regd_buff_release(struct iser_regd_buf *regd_buf)
+{
+	struct device *dma_device;
+
+	if ((atomic_read(&regd_buf->ref_count) == 0) ||
+	    atomic_dec_and_test(&regd_buf->ref_count)) {
+		/* if we used the dma mr, unreg is just NOP */
+		if (regd_buf->reg.rkey != 0)
+			iser_unreg_mem(&regd_buf->reg);
+
+		if (regd_buf->dma_addr) {
+			dma_device = regd_buf->device->ib_device->dma_device;
+			dma_unmap_single(dma_device,
+					 regd_buf->dma_addr,
+					 regd_buf->data_size,
+					 regd_buf->direction);
+		}
+		/* else this regd buf is associated with a task whose */
+		/* data we dma_unmap_single/sg later */
+		return 0;
+	} else {
+		iser_dbg("Release deferred, regd.buff: 0x%p\n", regd_buf);
+		return 1;
+	}
+}
+
+/**
+ * iser_reg_single - fills registered buffer descriptor with
+ *		     registration information
+ */
+void iser_reg_single(struct iser_device *device,
+		     struct iser_regd_buf *regd_buf,
+		     enum dma_data_direction direction)
+{
+	dma_addr_t dma_addr;
+
+	dma_addr  = dma_map_single(device->ib_device->dma_device,
+				   regd_buf->virt_addr,
+				   regd_buf->data_size, direction);
+	if (dma_mapping_error(dma_addr))
+		iser_bug("dma_map_single failed at %p\n", regd_buf->virt_addr);
+
+	regd_buf->reg.lkey = device->mr->lkey;
+	regd_buf->reg.rkey = 0; /* indicate there's no need to unreg */
+	regd_buf->reg.len  = regd_buf->data_size;
+	regd_buf->reg.va   = dma_addr;
+
+	regd_buf->dma_addr  = dma_addr;
+	regd_buf->direction = direction;
+}
+
+/**
+ * iser_start_rdma_unaligned_sg
+ */
+int iser_start_rdma_unaligned_sg(struct iscsi_iser_cmd_task  *iser_ctask,
+				 enum iser_data_dir cmd_dir)
+{
+	int dma_nents;
+	struct device *dma_device;
+	char *mem = NULL;
+	struct iser_data_buf *data = &iser_ctask->data[cmd_dir];
+	unsigned long  cmd_data_len = data->data_len;
+
+	if (cmd_data_len > ISER_KMALLOC_THRESHOLD)
+		mem = (void *)__get_free_pages(GFP_KERNEL,
+		      long_log2(roundup_pow_of_two(cmd_data_len)) - PAGE_SHIFT);
+	else
+		mem = kmalloc(cmd_data_len, GFP_KERNEL);
+
+	if (mem == NULL) {
+		iser_err("Failed to allocate mem size %d %d for copying sglist\n",
+			 data->size,(int)cmd_data_len);
+		return -ENOMEM;
+	}
+
+	if (cmd_dir == ISER_DIR_OUT) {
+		/* copy the unaligned sg to the buffer which is used for RDMA */
+		struct scatterlist *sg = (struct scatterlist *)data->buf;
+		int i;
+		char *p, *from;
+
+		for (p = mem, i = 0; i < data->size; i++) {
+			from = kmap_atomic(sg[i].page, KM_USER0);
+			memcpy(p,
+			       from + sg[i].offset,
+			       sg[i].length);
+			kunmap_atomic(from, KM_USER0);
+			p += sg[i].length;
+		}
+	}
+
+	sg_init_one(&iser_ctask->data_copy[cmd_dir].sg_single, mem, cmd_data_len);
+	iser_ctask->data_copy[cmd_dir].buf  =
+		&iser_ctask->data_copy[cmd_dir].sg_single;
+	iser_ctask->data_copy[cmd_dir].size = 1;
+
+	iser_ctask->data_copy[cmd_dir].copy_buf  = mem;
+
+	dma_device = iser_ctask->iser_conn->ib_conn->device->ib_device->dma_device;
+
+	if (cmd_dir == ISER_DIR_OUT)
+		dma_nents = dma_map_sg(dma_device,
+				       &iser_ctask->data_copy[cmd_dir].sg_single,
+				       1, DMA_TO_DEVICE);
+	else
+		dma_nents = dma_map_sg(dma_device,
+				       &iser_ctask->data_copy[cmd_dir].sg_single,
+				       1, DMA_FROM_DEVICE);
+
+	if (dma_nents == 0)
+		iser_bug("dma_map_sg failed at %p\n", mem);
+
+	iser_ctask->data_copy[cmd_dir].dma_nents = dma_nents;
+	return 0;
+}
+
+/**
+ * iser_finalize_rdma_unaligned_sg
+ */
+void iser_finalize_rdma_unaligned_sg(struct iscsi_iser_cmd_task *iser_ctask,
+				     enum iser_data_dir         cmd_dir)
+{
+	struct device *dma_device;
+	struct iser_data_buf *mem_copy;
+	unsigned long  cmd_data_len;
+
+	dma_device = iser_ctask->iser_conn->ib_conn->device->ib_device->dma_device;
+	mem_copy   = &iser_ctask->data_copy[cmd_dir];
+
+	if (cmd_dir == ISER_DIR_OUT)
+		dma_unmap_sg(dma_device, &mem_copy->sg_single, 1,
+			     DMA_TO_DEVICE);
+	else
+		dma_unmap_sg(dma_device, &mem_copy->sg_single, 1,
+			     DMA_FROM_DEVICE);
+
+	if (cmd_dir == ISER_DIR_IN) {
+		char *mem;
+		struct scatterlist *sg;
+		unsigned char *p, *to;
+		unsigned int sg_size;
+		int i;
+
+		/* copy back read RDMA to unaligned sg */
+		mem	= mem_copy->copy_buf;
+
+		sg	= (struct scatterlist *)iser_ctask->data[ISER_DIR_IN].buf;
+		sg_size = iser_ctask->data[ISER_DIR_IN].size;
+
+		for (p = mem, i = 0; i < sg_size; i++){
+			to = kmap_atomic(sg[i].page, KM_SOFTIRQ0);
+			memcpy(to + sg[i].offset,
+			       p,
+			       sg[i].length);
+			kunmap_atomic(to, KM_SOFTIRQ0);
+			p += sg[i].length;
+		}
+	}
+
+	cmd_data_len = iser_ctask->data[cmd_dir].data_len;
+
+	if (cmd_data_len > ISER_KMALLOC_THRESHOLD)
+		free_pages((unsigned long)mem_copy->copy_buf,
+			   long_log2(roundup_pow_of_two(cmd_data_len)) - PAGE_SHIFT);
+	else
+		kfree(mem_copy->copy_buf);
+
+	mem_copy->copy_buf = NULL;
+}
+
+/**
+ * iser_sg_to_page_vec - Translates scatterlist entries to physical addresses
+ * and returns the length of resulting physical address array (may be less than
+ * the original due to possible compaction).
+ *
+ * we build a "page vec" under the assumption that the SG meets the RDMA
+ * alignment requirements. Other than the first and last SG elements, all
+ * the "internal" elements can be compacted into a list whose elements are
+ * dma addresses of physical pages. The code supports also the weird case
+ * where --few fragments of the same page-- are present in the SG as
+ * consecutive elements. Also, it handles one entry SG.
+ */
+static int iser_sg_to_page_vec(struct iser_data_buf *data,
+			       struct iser_page_vec *page_vec)
+{
+	struct scatterlist *sg = (struct scatterlist *)data->buf;
+	dma_addr_t first_addr, last_addr, page;
+	int start_aligned, end_aligned;
+	unsigned int cur_page = 0;
+	unsigned long total_sz = 0;
+	int i;
+
+	/* compute the offset of first element */
+	page_vec->offset = (u64) sg[0].offset;
+
+	for (i = 0; i < data->dma_nents; i++) {
+		total_sz += sg_dma_len(&sg[i]);
+
+		first_addr = sg_dma_address(&sg[i]);
+		last_addr  = first_addr + sg_dma_len(&sg[i]);
+
+		start_aligned = !(first_addr & ~PAGE_MASK);
+		end_aligned   = !(last_addr  & ~PAGE_MASK);
+
+		/* continue to collect page fragments till aligned or SG ends */
+		while (!end_aligned && (i + 1 < data->dma_nents)) {
+			i++;
+			total_sz += sg_dma_len(&sg[i]);
+			last_addr = sg_dma_address(&sg[i]) + sg_dma_len(&sg[i]);
+			end_aligned = !(last_addr  & ~PAGE_MASK);
+		}
+
+		first_addr = first_addr & PAGE_MASK;
+
+		for (page = first_addr; page < last_addr; page += PAGE_SIZE)
+			page_vec->pages[cur_page++] = page;
+
+	}
+	page_vec->data_size = total_sz;
+	iser_dbg("page_vec->data_size:%d cur_page %d\n", page_vec->data_size,cur_page);
+	return cur_page;
+}
+
+#define MASK_4K			((1UL << 12) - 1) /* 0xFFF */
+#define IS_4K_ALIGNED(addr)	((((unsigned long)addr) & MASK_4K) == 0)
+
+/**
+ * iser_data_buf_aligned_len - Tries to determine the maximal correctly aligned
+ * for RDMA sub-list of a scatter-gather list of memory buffers, and  returns
+ * the number of entries which are aligned correctly. Supports the case where
+ * consecutive SG elements are actually fragments of the same physical page.
+ */
+static unsigned int iser_data_buf_aligned_len(struct iser_data_buf *data)
+{
+	struct scatterlist *sg;
+	dma_addr_t end_addr, next_addr;
+	int i, cnt;
+	unsigned int ret_len = 0;
+
+	sg = (struct scatterlist *)data->buf;
+
+	for (cnt = 0, i = 0; i < data->dma_nents; i++, cnt++) {
+		/* iser_dbg("Checking sg iobuf [%d]: phys=0x%08lX "
+		   "offset: %ld sz: %ld\n", i,
+		   (unsigned long)page_to_phys(sg[i].page),
+		   (unsigned long)sg[i].offset,
+		   (unsigned long)sg[i].length); */
+		end_addr = sg_dma_address(&sg[i]) +
+			   sg_dma_len(&sg[i]);
+		/* iser_dbg("Checking sg iobuf end address "
+		       "0x%08lX\n", end_addr); */
+		if (i + 1 < data->dma_nents) {
+			next_addr = sg_dma_address(&sg[i+1]);
+			/* are i, i+1 fragments of the same page? */
+			if (end_addr == next_addr)
+				continue;
+			else if (!IS_4K_ALIGNED(end_addr)) {
+				ret_len = cnt + 1;
+				break;
+			}
+		}
+	}
+	if (i == data->dma_nents)
+		ret_len = cnt;	/* loop ended */
+	iser_dbg("Found %d aligned entries out of %d in sg:0x%p\n",
+		 ret_len, data->dma_nents, data);
+	return ret_len;
+}
+
+static void iser_data_buf_dump(struct iser_data_buf *data)
+{
+	struct scatterlist *sg = (struct scatterlist *)data->buf;
+	int i;
+
+	for (i = 0; i < data->size; i++)
+		iser_err("sg[%d] dma_addr:0x%lX page:0x%p "
+			 "off:%d sz:%d dma_len:%d\n",
+			 i, (unsigned long)sg_dma_address(&sg[i]),
+			 sg[i].page, sg[i].offset,
+			 sg[i].length,sg_dma_len(&sg[i]));
+}
+
+static void iser_dump_page_vec(struct iser_page_vec *page_vec)
+{
+	int i;
+
+	iser_err("page vec length %d data size %d\n",
+		 page_vec->length, page_vec->data_size);
+	for (i = 0; i < page_vec->length; i++)
+		iser_err("%d %lx\n",i,(unsigned long)page_vec->pages[i]);
+}
+
+static void iser_page_vec_build(struct iser_data_buf *data,
+				struct iser_page_vec *page_vec)
+{
+	int page_vec_len = 0;
+
+	page_vec->length = 0;
+	page_vec->offset = 0;
+
+	iser_dbg("Translating sg sz: %d\n", data->dma_nents);
+	page_vec_len = iser_sg_to_page_vec(data,page_vec);
+	iser_dbg("sg len %d page_vec_len %d\n", data->dma_nents,page_vec_len);
+
+	page_vec->length = page_vec_len;
+
+	if (page_vec_len * 4096 < page_vec->data_size) {
+		iser_err("dumping sg\n");
+		iser_data_buf_dump(data);
+		iser_dump_page_vec(page_vec);
+		iser_bug("page_vec too short to hold this SG\n");
+	}
+}
+
+/**
+ * iser_reg_rdma_mem - Registers memory intended for RDMA,
+ * obtaining rkey and va
+ *
+ * returns 0 on success, errno code on failure
+ */
+int iser_reg_rdma_mem(struct iscsi_iser_cmd_task *iser_ctask,
+		      enum   iser_data_dir        cmd_dir)
+{
+	struct iser_conn     *ib_conn = iser_ctask->iser_conn->ib_conn;
+	struct iser_data_buf *mem = &iser_ctask->data[cmd_dir];
+	struct iser_regd_buf *regd_buf;
+	int aligned_len;
+	int err;
+
+	regd_buf = &iser_ctask->rdma_regd[cmd_dir];
+
+	aligned_len = iser_data_buf_aligned_len(mem);
+	if (aligned_len != mem->size) {
+		iser_err("rdma alignment violation %d/%d aligned\n",
+			 aligned_len, mem->size);
+		iser_data_buf_dump(mem);
+		/* allocate copy buf, if we are writing, copy the */
+		/* unaligned scatterlist, dma map the copy        */
+		if (iser_start_rdma_unaligned_sg(iser_ctask, cmd_dir) != 0)
+				return -ENOMEM;
+		mem = &iser_ctask->data_copy[cmd_dir];
+	}
+
+	iser_page_vec_build(mem, ib_conn->page_vec);
+	err = iser_reg_page_vec(ib_conn, ib_conn->page_vec, &regd_buf->reg);
+	if (err)
+		return err;
+
+	/* take a reference on this regd buf such that it will not be released *
+	 * (eg in send dto completion) before we get the scsi response         */
+	atomic_inc(&regd_buf->ref_count);
+	return 0;
+}
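
For readers following the compaction described in the iser_sg_to_page_vec()
comment above, here is a minimal userspace sketch of the same walk. It is an
illustration under stated assumptions, not the patch's code: the sk_ names,
the fixed 4K page size, and the plain (addr, len) segment struct are
hypothetical stand-ins for the scatterlist and the sg_dma_address() /
sg_dma_len() accessors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed 4K pages; the kernel code uses PAGE_SIZE / PAGE_MASK instead. */
#define SK_PAGE_SHIFT 12
#define SK_PAGE_SIZE  ((uint64_t)1 << SK_PAGE_SHIFT)
#define SK_PAGE_MASK  (~(SK_PAGE_SIZE - 1))

/* Hypothetical stand-in for one DMA-mapped scatterlist entry. */
struct sk_seg {
	uint64_t addr;	/* bus address of the segment */
	uint32_t len;	/* length in bytes */
};

/*
 * Collapse segments into a list of page-aligned addresses. Consecutive
 * segments are merged into one run until the run ends on a page boundary
 * or the list is exhausted, so several fragments of the same page yield
 * a single page entry. Returns the number of pages written.
 */
static size_t sk_segs_to_pages(const struct sk_seg *seg, size_t nsegs,
			       uint64_t *pages, size_t max_pages,
			       uint64_t *total_len)
{
	size_t i, npages = 0;
	uint64_t total = 0;

	for (i = 0; i < nsegs; i++) {
		uint64_t first = seg[i].addr;
		uint64_t last = first + seg[i].len;
		uint64_t page;

		total += seg[i].len;

		/* keep collecting fragments until the run ends page aligned */
		while ((last & ~SK_PAGE_MASK) != 0 && i + 1 < nsegs) {
			i++;
			total += seg[i].len;
			last = seg[i].addr + seg[i].len;
		}

		/* emit every page touched by the run */
		for (page = first & SK_PAGE_MASK; page < last;
		     page += SK_PAGE_SIZE) {
			assert(npages < max_pages);
			pages[npages++] = page;
		}
	}
	*total_len = total;
	return npages;
}
```

With segments at 0x10200 (0xE00 bytes), 0x20000 (0x1000 bytes) and 0x30000
(0x800 bytes), the sketch emits pages 0x10000, 0x20000 and 0x30000. Only the
first segment starts unaligned and only the last ends unaligned, which is the
shape the comment calls RDMA aligned.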



* Re: [PATCH 1/6] iSER's Makefile and Kconfig
  2006-04-27 12:30 ` [PATCH 1/6] iSER's Makefile and Kconfig Or Gerlitz
  2006-04-27 12:31   ` [PATCH 2/6] iscsi_iser header file Or Gerlitz
@ 2006-04-27 12:40   ` Jan-Benedict Glaw
  2006-04-27 12:44     ` Or Gerlitz
  1 sibling, 1 reply; 21+ messages in thread
From: Jan-Benedict Glaw @ 2006-04-27 12:40 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: rdreier, openib-general, linux-kernel


On Thu, 2006-04-27 15:30:32 +0300, Or Gerlitz <ogerlitz@voltaire.com> wrote:
> Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
> 
> --- /usr/src/linux-2.6.17-rc3/drivers/infiniband/ulp/iser-x/Makefile	1970-01-01 02:00:00.000000000 +0200
> +++ /usr/src/linux-2.6.17-rc3/drivers/infiniband/ulp/iser/Makefile	2006-04-27 15:12:33.000000000 +0300
> @@ -0,0 +1,6 @@
> +obj-$(CONFIG_INFINIBAND_ISER)	+= ib_iser.o
> +
> +ib_iser-y			:= iser_verbs.o \
> +				   iser_initiator.o \
> +				   iser_memory.o \
> +				   iscsi_iser.o 
> --- /usr/src/linux-2.6.17-rc3/drivers/infiniband/ulp/iser-x/Kconfig	1970-01-01 02:00:00.000000000 +0200
> +++ /usr/src/linux-2.6.17-rc3/drivers/infiniband/ulp/iser/Kconfig	2006-04-16 11:04:42.000000000 +0300
> @@ -0,0 +1,12 @@
> +config INFINIBAND_ISER
> +	tristate "ISCSI RDMA Protocol"
> +	depends on INFINIBAND && SCSI
> +	select SCSI_ISCSI_ATTRS
> +	---help---
> +
> +	  Support for the ISCSI RDMA Protocol over InfiniBand.  This
> +	  allows you to access storage devices that speak ISER/ISCSI
> +	  over InfiniBand.
> +
> +	  The ISER protocol is defined by IETF.
> +	  See <http://www.ietf.org/>.

Please always order patches so that the kernel still compiles after each
one is applied.

E.g. with your first patch introducing the Makefile stuff (while the C
files are not there yet), the build will break, which makes it harder to
automatically track down unrelated breakages.

MfG, JBG

-- 
Jan-Benedict Glaw       jbglaw@lug-owl.de    . +49-172-7608481             _ O _
"Eine Freie Meinung in  einem Freien Kopf    | Gegen Zensur | Gegen Krieg  _ _ O
 für einen Freien Staat voll Freier Bürger"  | im Internet! |   im Irak!   O O O
ret = do_actions((curr | FREE_SPEECH) & ~(NEW_COPYRIGHT_LAW | DRM | TCPA));



* Re: [PATCH 1/6] iSER's Makefile and Kconfig
  2006-04-27 12:40   ` [PATCH 1/6] iSER's Makefile and Kconfig Jan-Benedict Glaw
@ 2006-04-27 12:44     ` Or Gerlitz
  0 siblings, 0 replies; 21+ messages in thread
From: Or Gerlitz @ 2006-04-27 12:44 UTC (permalink / raw)
  To: Or Gerlitz, rdreier, openib-general, linux-kernel

Jan-Benedict Glaw wrote:
> Please always order patches so that the kernel still compiles after each
> one is applied.
> 
> E.g. with your first patch introducing the Makefile stuff (while the C
> files are not there yet), the build will break, which makes it harder to
> automatically track down unrelated breakages.

OK, I understand,

thanks,

Or.




* Re: [PATCH 2/6] iscsi_iser header file
  2006-04-27 12:31   ` [PATCH 2/6] iscsi_iser header file Or Gerlitz
  2006-04-27 12:31     ` [PATCH 3/6] open iscsi iser transport provider code Or Gerlitz
@ 2006-04-27 16:58     ` Stephen Hemminger
  1 sibling, 0 replies; 21+ messages in thread
From: Stephen Hemminger @ 2006-04-27 16:58 UTC (permalink / raw)
  To: linux-kernel

> +#define PFX "iser:"
> +
> +#define iser_dbg(fmt, arg...)				\
> +	do {						\
> +		if (iser_debug_level > 0)		\
> +			printk(KERN_DEBUG PFX "%s:" fmt,\
> +				__func__ , ## arg);	\
> +	} while (0)
> +
> +#define iser_err(fmt, arg...)				\
> +	do {						\
> +		printk(KERN_ERR PFX "%s:" fmt,          \
> +		       __func__ , ## arg);		\
> +	} while (0)
> +
> +#define iser_bug(fmt,arg...)				\
> +	do {						\
> +		printk(KERN_ERR PFX "%s: PANIC! " fmt,	\
> +			__func__ , ## arg);		\
> +		BUG();					\
> +	} while(0)
> +

Why? Are pr_debug, BUG_ON, etc. not good enough for you?
Macros that obfuscate things like this make global fixups harder.

> +					/* support upto 512KB in one RDMA */
> +#define ISCSI_ISER_SG_TABLESIZE         (0x80000 >> PAGE_SHIFT)
> +#define ISCSI_ISER_MAX_LUN		256
> +#define ISCSI_ISER_MAX_CMD_LEN		16
> +
> +/* QP settings */
> +/* Maximal bounds on received asynchronous PDUs */
> +#define ISER_MAX_RX_MISC_PDUS		4 /* NOOP_IN(2) , ASYNC_EVENT(2)   */
> +
> +#define ISER_MAX_TX_MISC_PDUS		6 /* NOOP_OUT(2), TEXT(1),         *
> +					   * SCSI_TMFUNC(2), LOGOUT(1) */
> +
> +#define ISER_QP_MAX_RECV_DTOS		(ISCSI_XMIT_CMDS_MAX + \
> +					ISER_MAX_RX_MISC_PDUS    +  \
> +					ISER_MAX_TX_MISC_PDUS)
> +
> +/* the max TX (send) WR supported by the iSER QP is defined by                 *
> + * max_send_wr = T * (1 + D) + C ; D is how many inflight dataouts we expect   *
> + * to have at max for SCSI command. The tx posting & completion handling code  *
> + * supports -EAGAIN scheme where tx is suspended till the QP has room for more *
> + * send WR. D=8 comes from 64K/8K                                              */
> +
> +#define ISER_INFLIGHT_DATAOUTS		8
> +
> +#define ISER_QP_MAX_REQ_DTOS		(ISCSI_XMIT_CMDS_MAX *    \
> +					(1 + ISER_INFLIGHT_DATAOUTS) + \
> +					ISER_MAX_TX_MISC_PDUS        + \
> +					ISER_MAX_RX_MISC_PDUS)
> +
> +#define ISER_VER			0x10
> +#define ISER_WSV			0x08
> +#define ISER_RSV			0x04
> +
> +struct iser_hdr {
> +	u8      flags;
> +	u8      rsvd[3];
> +	__be32  write_stag; /* write rkey */
> +	__be64  write_va;
> +	__be32  read_stag;  /* read rkey */
> +	__be64  read_va;
> +} __attribute__((packed));
> +
> +
> +/* Length of an object name string */
> +#define ISER_OBJECT_NAME_SIZE		    64
> +
> +enum iser_ib_conn_state {
> +	ISER_CONN_INIT,		   /* descriptor allocd, no conn          */
> +	ISER_CONN_PENDING,	   /* in the process of being established */
> +	ISER_CONN_UP,		   /* up and running                      */
> +	ISER_CONN_TERMINATING,	   /* in the process of being terminated  */
> +	ISER_CONN_DOWN,		   /* shut down                           */
> +	ISER_CONN_STATES_NUM
> +};
> +
> +enum iser_task_status {
> +	ISER_TASK_STATUS_INIT = 0,
> +	ISER_TASK_STATUS_STARTED,
> +	ISER_TASK_STATUS_COMPLETED
> +};
> +
> +enum iser_data_dir {
> +	ISER_DIR_IN = 0,	   /* to initiator */
> +	ISER_DIR_OUT,		   /* from initiator */
> +	ISER_DIRS_NUM
> +};
> +
> +struct iser_data_buf {
> +	void               *buf;      /* pointer to the sg list               */
> +	unsigned int       size;      /* num entries of this sg               */
> +	unsigned long      data_len;  /* total data len                       */
> +	unsigned int       dma_nents; /* returned by dma_map_sg               */
> +	char       	   *copy_buf; /* allocated copy buf for SGs unaligned *
> +	                               * for rdma which are copied            */
> +	struct scatterlist sg_single; /* SG-ified clone of a non SG SC or     *
> +				       * unaligned SG                         */
> +  };
> +
> +/* fwd declarations */
> +struct iser_device;
> +struct iscsi_iser_conn;
> +struct iscsi_iser_cmd_task;
> +
> +struct iser_mem_reg {
> +	u32  lkey;
> +	u32  rkey;
> +	u64  va;
> +	u64  len;
> +	void *mem_h;
> +};
> +
> +struct iser_regd_buf {
> +	struct iser_mem_reg     reg;        /* memory registration info        */
> +	void                    *virt_addr;
> +	struct iser_device      *device;    /* device->device for dma_unmap    */
> +	dma_addr_t              dma_addr;   /* if non zero, addr for dma_unmap */
> +	enum dma_data_direction direction;  /* direction for dma_unmap	       */
> +	unsigned int            data_size;
> +	atomic_t                ref_count;  /* refcount, freed when dec to 0   */
> +};
> +
> +#define MAX_REGD_BUF_VECTOR_LEN	2
> +
> +struct iser_dto {
> +	struct iscsi_iser_cmd_task *ctask;
> +	struct iscsi_iser_conn     *conn;
> +	int                        notify_enable;
> +
> +	/* vector of registered buffers */
> +	unsigned int               regd_vector_len;
> +	struct iser_regd_buf       *regd[MAX_REGD_BUF_VECTOR_LEN];
> +
> +	/* offset into the registered buffer may be specified */
> +	unsigned int               offset[MAX_REGD_BUF_VECTOR_LEN];
> +
> +	/* a smaller size may be specified, if 0, then full size is used */
> +	unsigned int               used_sz[MAX_REGD_BUF_VECTOR_LEN];
> +};
> +
> +enum iser_desc_type {
> +	ISCSI_RX,
> +	ISCSI_TX_CONTROL ,
> +	ISCSI_TX_SCSI_COMMAND,
> +	ISCSI_TX_DATAOUT
> +};
> +
> +struct iser_desc {
> +	struct iser_hdr              iser_header;
> +	struct iscsi_hdr             iscsi_header;
> +	struct iser_regd_buf         hdr_regd_buf;
> +	void                         *data;         /* used by RX & TX_CONTROL */
> +	struct iser_regd_buf         data_regd_buf; /* used by RX & TX_CONTROL */
> +	enum   iser_desc_type        type;
> +	struct iser_dto              dto;
> +};
> +
> +struct iser_device {
> +	struct ib_device             *ib_device;
> +	struct ib_pd	             *pd;
> +	struct ib_cq	             *cq;
> +	struct ib_mr	             *mr;
> +	struct tasklet_struct	     cq_tasklet;
> +	struct list_head             ig_list; /* entry in ig devices list */
> +	int                          refcount;
> +};
> +
> +struct iser_conn
> +{

You were putting the opening brace on the same line as 'struct foo'; why the
sudden change of style?

> +	struct iscsi_iser_conn       *iser_conn; /* iser conn for upcalls  */
> +	atomic_t		     state;	    /* rdma connection state   */
> +	struct iser_device           *device;       /* device context          */
> +	struct rdma_cm_id            *cma_id;       /* CMA ID		       */
> +	struct ib_qp	             *qp;           /* QP 		       */
> +	struct ib_fmr_pool           *fmr_pool;     /* pool of IB FMRs         */
> +	int                          disc_evt_flag; /* disconn event delivered */
> +	wait_queue_head_t	     wait;          /* waitq for conn/disconn  */
> +	atomic_t                     post_recv_buf_count; /* posted rx count   */
> +	atomic_t                     post_send_buf_count; /* posted tx count   */
> +	struct work_struct           comperror_work; /* conn term sleepable ctx*/
> +	char 			     name[ISER_OBJECT_NAME_SIZE];
> +	struct iser_page_vec         *page_vec;     /* represents SG to fmr maps*
> +						     * maps serialized as tx is*/
> +	struct list_head	     conn_list;       /* entry in ig conn list */
> +};
> +
> +struct iscsi_iser_conn {
> +	struct iscsi_conn            *iscsi_conn;/* ptr to iscsi conn */
> +	struct iser_conn             *ib_conn;   /* iSER IB conn      */
> +
> +	rwlock_t		     lock;
> +};
> +
> +struct iscsi_iser_cmd_task {
> +	struct iser_desc             desc;
> +	struct iscsi_iser_conn	     *iser_conn;
> +	int			     rdma_data_count;/* RDMA bytes           */
> +	enum iser_task_status 	     status;
> +	int                          command_sent;  /* set if command  sent  */
> +	int                          dir[ISER_DIRS_NUM];      /* set if dir use*/
> +	struct iser_regd_buf         rdma_regd[ISER_DIRS_NUM];/* regd rdma buf */
> +	struct iser_data_buf         data[ISER_DIRS_NUM];     /* orig. data des*/
> +	struct iser_data_buf         data_copy[ISER_DIRS_NUM];/* contig. copy  */
> +};
> +
> +struct iser_page_vec {
> +	u64 *pages;
> +	int length;
> +	int offset;
> +	int data_size;
> +};
> +
> +struct iser_global {
> +	struct mutex      device_list_mutex;/*                   */
> +	struct list_head  device_list;	     /* all iSER devices */
> +	struct mutex      connlist_mutex;
> +	struct list_head  connlist;		/* all iSER IB connections */
> +
> +	kmem_cache_t *desc_cache;
> +};
> +
> +extern struct iser_global ig;
> +extern int iser_debug_level;
> +
> +/* allocate connection resources needed for rdma functionality */
> +int iser_conn_set_full_featured_mode(struct iscsi_conn *conn);
> +
> +int iser_send_control(struct iscsi_conn      *conn,
> +		      struct iscsi_mgmt_task *mtask);
> +
> +int iser_send_command(struct iscsi_conn      *conn,
> +		      struct iscsi_cmd_task  *ctask);
> +
> +int iser_send_data_out(struct iscsi_conn     *conn,
> +		       struct iscsi_cmd_task *ctask,
> +		       struct iscsi_data          *hdr);
> +
> +void iscsi_iser_recv(struct iscsi_conn *conn,
> +		     struct iscsi_hdr       *hdr,
> +		     char                   *rx_data,
> +		     int                    rx_data_len);
> +
> +int  iser_conn_init(struct iser_conn **ib_conn);
> +
> +void iser_conn_terminate(struct iser_conn *ib_conn);
> +
> +void iser_conn_release(struct iser_conn *ib_conn);
> +
> +void iser_rcv_completion(struct iser_desc *desc,
> +			 unsigned long    dto_xfer_len);
> +
> +void iser_snd_completion(struct iser_desc *desc);
> +
> +void iser_ctask_rdma_init(struct iscsi_iser_cmd_task     *ctask);
> +
> +void iser_ctask_rdma_finalize(struct iscsi_iser_cmd_task *ctask);
> +
> +void iser_dto_buffs_release(struct iser_dto *dto);
> +
> +int  iser_regd_buff_release(struct iser_regd_buf *regd_buf);
> +
> +void iser_reg_single(struct iser_device      *device,
> +		     struct iser_regd_buf    *regd_buf,
> +		     enum dma_data_direction direction);
> +
> +int  iser_start_rdma_unaligned_sg(struct iscsi_iser_cmd_task    *ctask,
> +				  enum iser_data_dir            cmd_dir);
> +
> +void iser_finalize_rdma_unaligned_sg(struct iscsi_iser_cmd_task *ctask,
> +				     enum iser_data_dir         cmd_dir);
> +
> +int  iser_reg_rdma_mem(struct iscsi_iser_cmd_task *ctask,
> +		       enum   iser_data_dir        cmd_dir);
> +
> +int  iser_connect(struct iser_conn   *ib_conn,
> +		  struct sockaddr_in *src_addr,
> +		  struct sockaddr_in *dst_addr,
> +		  int                non_blocking);
> +
> +int  iser_reg_page_vec(struct iser_conn     *ib_conn,
> +		       struct iser_page_vec *page_vec,
> +		       struct iser_mem_reg  *mem_reg);
> +
> +void iser_unreg_mem(struct iser_mem_reg *mem_reg);
> +
> +int  iser_post_recv(struct iser_desc *rx_desc);
> +int  iser_post_send(struct iser_desc *tx_desc);
> +#endif

Common practice is to put extern ahead of function prototypes in a .h file.


* Re: [PATCH 3/6] open iscsi iser transport provider code
  2006-04-27 12:31     ` [PATCH 3/6] open iscsi iser transport provider code Or Gerlitz
  2006-04-27 12:32       ` [PATCH 4/6] iser initiator Or Gerlitz
@ 2006-04-27 17:01       ` Stephen Hemminger
  1 sibling, 0 replies; 21+ messages in thread
From: Stephen Hemminger @ 2006-04-27 17:01 UTC (permalink / raw)
  To: linux-kernel

On Thu, 27 Apr 2006 15:31:52 +0300 (IDT)
Or Gerlitz <ogerlitz@voltaire.com> wrote:

> Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
> 
> --- /usr/src/linux-2.6.17-rc3/drivers/infiniband/ulp/iser-x/iscsi_iser.c	1970-01-01 02:00:00.000000000 +0200
> +++ /usr/src/linux-2.6.17-rc3/drivers/infiniband/ulp/iser/iscsi_iser.c	2006-04-26 12:50:11.000000000 +0300
> @@ -0,0 +1,800 @@
> +/*
> + * iSCSI Initiator over iSER Data-Path
> + *
> + * Copyright (C) 2004 Dmitry Yusupov
> + * Copyright (C) 2004 Alex Aizman
> + * Copyright (C) 2005 Mike Christie
> + * Copyright (c) 2005, 2006 Voltaire, Inc. All rights reserved.
> + * maintained by openib-general@openib.org
> + *
> + * This software is available to you under a choice of one of two
> + * licenses.  You may choose to be licensed under the terms of the GNU
> + * General Public License (GPL) Version 2, available from the file
> + * COPYING in the main directory of this source tree, or the
> + * OpenIB.org BSD license below:
> + *
> + *     Redistribution and use in source and binary forms, with or
> + *     without modification, are permitted provided that the following
> + *     conditions are met:
> + *
> + *	- Redistributions of source code must retain the above
> + *	  copyright notice, this list of conditions and the following
> + *	  disclaimer.
> + *
> + *	- Redistributions in binary form must reproduce the above
> + *	  copyright notice, this list of conditions and the following
> + *	  disclaimer in the documentation and/or other materials
> + *	  provided with the distribution.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
> + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
> + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
> + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
> + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
> + * SOFTWARE.
> + *
> + * Credits:
> + *	Christoph Hellwig
> + *	FUJITA Tomonori
> + *	Arne Redlich
> + *	Zhenyu Wang
> + * Modified by:
> + *      Erez Zilber
> + *
> + *
> + * $Id: iscsi_iser.c 6643 2006-04-26 10:01:01Z ogerlitz $
> + */
> +
> +#include <linux/types.h>
> +#include <linux/list.h>
> +#include <linux/hardirq.h>
> +#include <linux/kfifo.h>
> +#include <linux/blkdev.h>
> +#include <linux/init.h>
> +#include <linux/ioctl.h>
> +#include <linux/devfs_fs_kernel.h>
> +#include <linux/cdev.h>
> +#include <linux/in.h>
> +#include <linux/net.h>
> +#include <linux/scatterlist.h>
> +#include <linux/delay.h>
> +
> +#include <net/sock.h>
> +
> +#include <asm/uaccess.h>
> +
> +#include <scsi/scsi_cmnd.h>
> +#include <scsi/scsi_device.h>
> +#include <scsi/scsi_eh.h>
> +#include <scsi/scsi_request.h>
> +#include <scsi/scsi_tcq.h>
> +#include <scsi/scsi_host.h>
> +#include <scsi/scsi.h>
> +#include <scsi/scsi_transport_iscsi.h>
> +
> +#include "iscsi_iser.h"
> +
> +static unsigned int iscsi_max_lun = 512;
> +module_param_named(max_lun, iscsi_max_lun, uint, S_IRUGO);
> +
> +#define DRV_VER	     "$Rev: 227 $"
> +#define DRV_DATE     "$LastChangedDate: 2006-03-22 16:47:30 +0200 (Wed, 22 Mar 2006) $"
> +

Don't use your magic revision control tags; they won't be updated by other
revision control systems.


* RE: [openib-general] [PATCH 5/6] iser RDMA CM (CMA) and IB verbsinteraction
  2006-04-27 12:32         ` [PATCH 5/6] iser RDMA CM (CMA) and IB verbs interaction Or Gerlitz
  2006-04-27 12:33           ` [PATCH 6/6] iser handling of memory for RDMA Or Gerlitz
@ 2006-04-28 23:05           ` Sean Hefty
  2006-04-30 12:30             ` Or Gerlitz
  2006-05-01 13:02             ` Or Gerlitz
  1 sibling, 2 replies; 21+ messages in thread
From: Sean Hefty @ 2006-04-28 23:05 UTC (permalink / raw)
  To: 'Or Gerlitz', rdreier; +Cc: linux-kernel, openib-general

>+static int iser_free_device_ib_res(struct iser_device *device)
>+{
>+	BUG_ON(device->mr == NULL);
>+
>+	tasklet_kill(&device->cq_tasklet);
>+
>+	(void)ib_dereg_mr(device->mr);
>+	(void)ib_destroy_cq(device->cq);
>+	(void)ib_dealloc_pd(device->pd);
>+
>+	device->mr = NULL;
>+	device->cq = NULL;
>+	device->pd = NULL;
>+	return 0;
>+}

Can you eliminate the return code?

>+static int iser_free_ib_conn_res(struct iser_conn *ib_conn)
>+{
>+	BUG_ON(ib_conn == NULL);
>+
>+	iser_err("freeing conn %p cma_id %p fmr pool %p qp %p\n",
>+		 ib_conn, ib_conn->cma_id,
>+		 ib_conn->fmr_pool, ib_conn->qp);
>+
>+	/* qp is created only once both addr & route are resolved */
>+	if (ib_conn->fmr_pool != NULL)
>+		ib_destroy_fmr_pool(ib_conn->fmr_pool);
>+
>+	if (ib_conn->qp != NULL)
>+		rdma_destroy_qp(ib_conn->cma_id);
>+
>+	if (ib_conn->cma_id != NULL)
>+		rdma_destroy_id(ib_conn->cma_id);

Are the NULL checks needed above?  Neither iser_create_device_ib_res() nor
iser_create_ib_conn_res() sets the values to NULL if an error occurs.

>+
>+	ib_conn->fmr_pool = NULL;
>+	ib_conn->qp	  = NULL;
>+	ib_conn->cma_id   = NULL;
>+	kfree(ib_conn->page_vec);
>+
>+	return 0;
>+}
>+
>+/**
>+ * based on the resolved device node GUID see if there already allocated
>+ * device for this device. If there's no such, create one.
>+ */
>+static
>+struct iser_device *iser_device_find_by_ib_device(struct rdma_cm_id *cma_id)
>+{
>+	struct list_head    *p_list;
>+	struct iser_device  *device = NULL;
>+
>+	mutex_lock(&ig.device_list_mutex);
>+
>+	p_list = ig.device_list.next;
>+	while (p_list != &ig.device_list) {
>+		device = list_entry(p_list, struct iser_device, ig_list);
>+		/* find if there's a match using the node GUID */
>+		if (device->ib_device->node_guid == cma_id->device->node_guid)
>+			break;
>+	}
>+
>+	if (device == NULL) {
>+		device = kzalloc(sizeof *device, GFP_KERNEL);
>+		if (device == NULL)
>+			goto end;

goto out;  // see below

>+		/* assign this device to the device */
>+		device->ib_device = cma_id->device;
>+		/* init the device and link it into ig device list */
>+		if (iser_create_device_ib_res(device)) {
>+			kfree(device);
>+			device = NULL;
>+			goto end;
>+		}
>+		list_add(&device->ig_list, &ig.device_list);
>+	}
>+end:
>+	BUG_ON(device == NULL);
>+	device->refcount++;

out:

>+	mutex_unlock(&ig.device_list_mutex);
>+	return device;
>+}
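
To make the "goto out" suggestion concrete, here is a minimal userspace
sketch of a find-or-create lookup with one shared exit path. It rests on
assumptions: the sk_ names and the singly linked list are hypothetical, and
the mutex is only marked by comments. As quoted above, the original loop
never advances p_list and leaves device pointing at the last entry examined
when nothing matches; the for loop here ends with dev == NULL in that case.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct iser_device. */
struct sk_device {
	uint64_t node_guid;
	int refcount;
	struct sk_device *next;
};

/*
 * Return the device matching guid, creating and linking it if needed.
 * On allocation failure NULL is returned through the shared exit path,
 * without touching the refcount.
 */
static struct sk_device *sk_device_get(struct sk_device **head, uint64_t guid)
{
	struct sk_device *dev;

	/* the list lock would be taken here */
	for (dev = *head; dev != NULL; dev = dev->next)
		if (dev->node_guid == guid)
			break;

	if (dev == NULL) {
		dev = calloc(1, sizeof *dev);
		if (dev == NULL)
			goto out;	/* failure: skip the refcount bump */
		dev->node_guid = guid;
		dev->next = *head;
		*head = dev;
	}
	dev->refcount++;
out:
	/* the list lock would be dropped here */
	return dev;
}
```

Jumping to a label placed after the refcount increment lets the failure case
return NULL to the caller instead of reaching an assertion such as
BUG_ON(device == NULL).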
>+

>+static void iser_disconnected_handler(struct rdma_cm_id *cma_id)
>+{
>+	struct iser_conn *ib_conn;
>+
>+	ib_conn = (struct iser_conn *)cma_id->context;
>+	ib_conn->disc_evt_flag = 1;
>+
>+	/* If this event is unsolicited this means that the conn is being */
>+	/* terminated asynchronously from the iSCSI layer's perspective.  */
>+	if (atomic_read(&ib_conn->state) == ISER_CONN_PENDING) {
>+		atomic_set(&ib_conn->state, ISER_CONN_DOWN);
>+		wake_up_interruptible(&ib_conn->wait);
>+	} else {
>+		if (atomic_read(&ib_conn->state) == ISER_CONN_UP) {
>+			atomic_set(&ib_conn->state, ISER_CONN_TERMINATING);
>+			iscsi_conn_failure(ib_conn->iser_conn->iscsi_conn,
>+						ISCSI_ERR_CONN_FAILED);
>+		}
>+		/* Complete the termination process if no posts are pending */
>+		if ((atomic_read(&ib_conn->post_recv_buf_count) == 0) &&
>+		    (atomic_read(&ib_conn->post_send_buf_count) == 0)) {
>+			atomic_set(&ib_conn->state, ISER_CONN_DOWN);
>+			wake_up_interruptible(&ib_conn->wait);
>+		}
>+	}

Are there races here between reading ib_conn->state and setting it?  Could it
have changed in between the atomic_read() and atomic_set()?
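
One way to close that window is to make the check and the update a single
atomic step. The sketch below is a userspace illustration using C11
<stdatomic.h>, not the driver's code; the sk_ names are hypothetical. In
kernel code the analogous tools would be atomic_cmpxchg() or a lock held
around the state change.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical mirror of enum iser_ib_conn_state. */
enum sk_conn_state {
	SK_CONN_INIT,
	SK_CONN_PENDING,
	SK_CONN_UP,
	SK_CONN_TERMINATING,
	SK_CONN_DOWN
};

/*
 * Move *state from 'from' to 'to' in one atomic step. Returns true only
 * for the caller that actually performed the transition, so the follow-up
 * work (e.g. reporting a connection failure) runs exactly once.
 */
static bool sk_conn_transition(atomic_int *state, int from, int to)
{
	int expected = from;

	return atomic_compare_exchange_strong(state, &expected, to);
}
```

A handler would then do its UP to TERMINATING work only when
sk_conn_transition(&state, SK_CONN_UP, SK_CONN_TERMINATING) returns true; a
second context racing on the same transition sees false and backs off.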

>+	src = (struct sockaddr *)src_addr;
>+	dst = (struct sockaddr *)dst_addr;
>+	err = rdma_resolve_addr(ib_conn->cma_id, src, dst, 1000);
>+	if (err) {
>+		iser_err("rdma_resolve_addr failed: %d\n", err);
>+		goto addr_failure;
>+	}
>+
>+	if (!non_blocking) {
>+		wait_event_interruptible(ib_conn->wait,
>+			 atomic_read(&ib_conn->state) != ISER_CONN_PENDING);
>+
>+		if (atomic_read(&ib_conn->state) != ISER_CONN_UP) {
>+			err =  -EIO;
>+			goto connect_failure;
>+		}
>+	}
>+
>+	mutex_lock(&ig.connlist_mutex);
>+	list_add(&ib_conn->conn_list, &ig.connlist);
>+	mutex_unlock(&ig.connlist_mutex);

Not sure if there's a race here or not, but rdma_resolve_addr() will result in a
callback from a separate thread.  That callback could occur before the ib_conn
is added to the ig.connlist.  Do you assume that ib_conn is in the connlist in
any of the callbacks?

>+int iser_post_recv(struct iser_desc *rx_desc)
>+{
>+	int		  ib_ret, ret_val = 0;
>+	struct ib_recv_wr recv_wr, *recv_wr_failed;
>+	struct ib_sge	  iov[2];
>+	struct iser_conn  *ib_conn;
>+	struct iser_dto   *recv_dto = &rx_desc->dto;
>+
>+	/* Retrieve conn */
>+	ib_conn = recv_dto->conn->ib_conn;
>+
>+	iser_dto_to_iov(recv_dto, iov, 2);
>+
>+	recv_wr.next	= NULL;
>+	recv_wr.sg_list = iov;
>+	recv_wr.num_sge = recv_dto->regd_vector_len;
>+	recv_wr.wr_id	= (unsigned long)rx_desc;

Nit - position of "=" signs above is weird.

>+static void iser_comp_error_worker(void *data)
>+{
>+	struct iser_conn *ib_conn = data;
>+
>+	if (atomic_read(&ib_conn->state) == ISER_CONN_UP) {
>+		atomic_set(&ib_conn->state, ISER_CONN_TERMINATING);
>+		iscsi_conn_failure(ib_conn->iser_conn->iscsi_conn,
>+					ISCSI_ERR_CONN_FAILED);
>+	}

Potential race reading/setting state?

- Sean


* Re: [openib-general] [PATCH 5/6] iser RDMA CM (CMA) and IB verbsinteraction
  2006-04-28 23:05           ` [openib-general] [PATCH 5/6] iser RDMA CM (CMA) and IB verbsinteraction Sean Hefty
@ 2006-04-30 12:30             ` Or Gerlitz
  2006-05-01 13:02             ` Or Gerlitz
  1 sibling, 0 replies; 21+ messages in thread
From: Or Gerlitz @ 2006-04-30 12:30 UTC (permalink / raw)
  To: Sean Hefty; +Cc: linux-kernel, openib-general

Sean Hefty wrote:
>> +static int iser_free_device_ib_res(struct iser_device *device)
>> +{
>> +	BUG_ON(device->mr == NULL);
>> +
>> +	tasklet_kill(&device->cq_tasklet);
>> +
>> +	(void)ib_dereg_mr(device->mr);
>> +	(void)ib_destroy_cq(device->cq);
>> +	(void)ib_dealloc_pd(device->pd);
>> +
>> +	device->mr = NULL;
>> +	device->cq = NULL;
>> +	device->pd = NULL;
>> +	return 0;
>> +}
> 
> Can you eliminate the return code?

Yes

>> +static int iser_free_ib_conn_res(struct iser_conn *ib_conn)
>> +{
>> +	BUG_ON(ib_conn == NULL);
>> +
>> +	iser_err("freeing conn %p cma_id %p fmr pool %p qp %p\n",
>> +		 ib_conn, ib_conn->cma_id,
>> +		 ib_conn->fmr_pool, ib_conn->qp);
>> +
>> +	/* qp is created only once both addr & route are resolved */
>> +	if (ib_conn->fmr_pool != NULL)
>> +		ib_destroy_fmr_pool(ib_conn->fmr_pool);
>> +
>> +	if (ib_conn->qp != NULL)
>> +		rdma_destroy_qp(ib_conn->cma_id);
>> +
>> +	if (ib_conn->cma_id != NULL)
>> +		rdma_destroy_id(ib_conn->cma_id);

> Are the NULL checks needed above?  Neither iser_create_device_ib_res() or
> iser_create_ib_conn_res() set the values to NULL if an error occurred.

We are dealing here with connection resources, so the device resources
(which are shared among ib conns) are irrelevant. The ib conn struct is
kzalloc-ed on creation, and iser_free_ib_conn_res() can later be called
when only a ***subset*** of the resources was allocated. Examples are
an immediate error from rdma_resolve_addr(), or getting ADDR/ROUTE ERROR
vs. CONNECT ERROR cma events: in the first three cases only the cma id
should be destroyed, while in the latter the fmr pool and the qp must be
destroyed as well.
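
The point about NULL checks on a zero-allocated struct can be sketched
in userspace C. This is only an illustration, not the iSER code:
struct demo_conn, demo_conn_alloc(), demo_release() and
demo_conn_free_res() are hypothetical names, calloc() stands in for
kzalloc(), and plain heap pointers stand in for the cma id, fmr pool
and qp.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the connection struct; not the iSER type. */
struct demo_conn {
	void *cma_id;
	void *fmr_pool;
	void *qp;
};

/* Zeroed allocation (userspace analogue of kzalloc): all pointers start NULL. */
struct demo_conn *demo_conn_alloc(void)
{
	return calloc(1, sizeof(struct demo_conn));
}

/* Free one resource if it was ever set up; returns 1 if something was freed. */
int demo_release(void **res)
{
	if (*res == NULL)
		return 0;
	free(*res);
	*res = NULL;
	return 1;
}

/*
 * Tear down whichever subset of resources exists and return how many
 * were released. Safe to call at any point after allocation.
 */
int demo_conn_free_res(struct demo_conn *conn)
{
	int released = 0;

	assert(conn != NULL);
	released += demo_release(&conn->fmr_pool);
	released += demo_release(&conn->qp);
	released += demo_release(&conn->cma_id);
	return released;
}
```

With this shape, an early failure (only cma_id set) and a late failure
(all three set) go through the same teardown function, which is the
reason the NULL checks are kept.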

>> +/**
>> + * based on the resolved device node GUID see if there already allocated
>> + * device for this device. If there's no such, create one.
>> + */
>> +static
>> +struct iser_device *iser_device_find_by_ib_device(struct rdma_cm_id *cma_id)
>> +{
>> +	struct list_head    *p_list;
>> +	struct iser_device  *device = NULL;
>> +
>> +	mutex_lock(&ig.device_list_mutex);
>> +
>> +	p_list = ig.device_list.next;
>> +	while (p_list != &ig.device_list) {
>> +		device = list_entry(p_list, struct iser_device, ig_list);
>> +		/* find if there's a match using the node GUID */
>> +		if (device->ib_device->node_guid == cma_id->device->node_guid)
>> +			break;
>> +	}
>> +
>> +	if (device == NULL) {
>> +		device = kzalloc(sizeof *device, GFP_KERNEL);
>> +		if (device == NULL)
>> +			goto end;

> goto out;  // see below

>> +		/* assign this device to the device */
>> +		device->ib_device = cma_id->device;
>> +		/* init the device and link it into ig device list */
>> +		if (iser_create_device_ib_res(device)) {
>> +			kfree(device);
>> +			device = NULL;
>> +			goto end;
>> +		}
>> +		list_add(&device->ig_list, &ig.device_list);
>> +	}
>> +end:
>> +	BUG_ON(device == NULL);
>> +	device->refcount++;
> 
> out:

OK

Or.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [openib-general] [PATCH 5/6] iser RDMA CM (CMA) and IB verbsinteraction
  2006-04-28 23:05           ` [openib-general] [PATCH 5/6] iser RDMA CM (CMA) and IB verbsinteraction Sean Hefty
  2006-04-30 12:30             ` Or Gerlitz
@ 2006-05-01 13:02             ` Or Gerlitz
  2006-05-04 13:00               ` [openib-general] [PATCH 5/6] iser RDMA CM (CMA) and IB verbs interaction Or Gerlitz
  1 sibling, 1 reply; 21+ messages in thread
From: Or Gerlitz @ 2006-05-01 13:02 UTC (permalink / raw)
  To: Sean Hefty; +Cc: linux-kernel, openib-general

Sean Hefty wrote:
>> +static void iser_disconnected_handler(struct rdma_cm_id *cma_id)
>> +{
>> +	struct iser_conn *ib_conn;
>> +
>> +	ib_conn = (struct iser_conn *)cma_id->context;
>> +	ib_conn->disc_evt_flag = 1;
>> +
>> +	/* If this event is unsolicited this means that the conn is being */
>> +	/* terminated asynchronously from the iSCSI layer's perspective.  */
>> +	if (atomic_read(&ib_conn->state) == ISER_CONN_PENDING) {
>> +		atomic_set(&ib_conn->state, ISER_CONN_DOWN);
>> +		wake_up_interruptible(&ib_conn->wait);
>> +	} else {
>> +		if (atomic_read(&ib_conn->state) == ISER_CONN_UP) {
>> +			atomic_set(&ib_conn->state, ISER_CONN_TERMINATING);
>> +			iscsi_conn_failure(ib_conn->iser_conn->iscsi_conn,
>> +						ISCSI_ERR_CONN_FAILED);
>> +		}
>> +		/* Complete the termination process if no posts are pending */
>> +		if ((atomic_read(&ib_conn->post_recv_buf_count) == 0) &&
>> +		    (atomic_read(&ib_conn->post_send_buf_count) == 0)) {
>> +			atomic_set(&ib_conn->state, ISER_CONN_DOWN);
>> +			wake_up_interruptible(&ib_conn->wait);
>> +		}
>> +	}

> Are there races here between reading ib_conn->state and setting it?  Could it
> have changed in between the atomic_read() and atomic_set()?

It seems that a race is indeed possible here. I am now rethinking the
implementation of the ib connection state transitions; thanks for
pointing this out.

>> +	src = (struct sockaddr *)src_addr;
>> +	dst = (struct sockaddr *)dst_addr;
>> +	err = rdma_resolve_addr(ib_conn->cma_id, src, dst, 1000);
>> +	if (err) {
>> +		iser_err("rdma_resolve_addr failed: %d\n", err);
>> +		goto addr_failure;
>> +	}
>> +
>> +	if (!non_blocking) {
>> +		wait_event_interruptible(ib_conn->wait,
>> +			 atomic_read(&ib_conn->state) != ISER_CONN_PENDING);
>> +
>> +		if (atomic_read(&ib_conn->state) != ISER_CONN_UP) {
>> +			err =  -EIO;
>> +			goto connect_failure;
>> +		}
>> +	}
>> +
>> +	mutex_lock(&ig.connlist_mutex);
>> +	list_add(&ib_conn->conn_list, &ig.connlist);
>> +	mutex_unlock(&ig.connlist_mutex);

> Not sure if there's a race here or not, but rdma_resolve_addr() will result in a
> callback from a separate thread.  That callback could occur before the ib_conn
> is added to the ig.connlist.  Do you assume that ib_conn is in the connlist in
> any of the callbacks?

No, I don't assume this in the callbacks. ib_conn is inserted into the
list in iser_connect and looked up in ep_poll, conn_bind and
ep_disconnect. Calls to any of the latter three functions are
serialized with iser_connect, since they are all made by the same user
space process (iscsid, via the iscsi netlink u/k IPC mechanism).

However, while reviewing the code to fully answer your question, I
found a possible double call to iser_conn_release; the fix below
handles it.
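
One detail of the fix is that the list head inside the conn is
initialized at creation, so unlinking in the release path is harmless
even for a conn that never reached the global list. The userspace
sketch below shows why, under stated assumptions: struct demo_list and
the demo_list_*() helpers are hypothetical and only mimic a circular
doubly-linked list, and demo_list_del() re-initializes the node (closer
to the kernel's list_del_init() than to list_del()).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical circular doubly-linked list node. */
struct demo_list {
	struct demo_list *next, *prev;
};

/* A freshly initialized node points at itself. */
void demo_list_init(struct demo_list *n)
{
	assert(n != NULL);
	n->next = n;
	n->prev = n;
}

/* Insert n right after head. */
void demo_list_add(struct demo_list *n, struct demo_list *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/*
 * Unlink n from whatever list it is on. For a self-linked node this
 * only rewrites the node's own pointers, so it does not disturb any
 * other list.
 */
void demo_list_del(struct demo_list *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	demo_list_init(n);
}

int demo_list_empty(const struct demo_list *head)
{
	return head->next == head;
}
```

Without the initialization, the node's pointers would be NULL (from the
zeroed allocation) and the unlink would dereference them.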

------------------------------------------------------------------------
r6802 | ogerlitz | 2006-05-01 12:27:12 +0300 (Mon, 01 May 2006) | 5 lines

move the ib conn deletion from the global connlist to iser_conn_release,
fix ep_disconnect to call conn_terminate or conn_release but not both.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>

Index: iser_verbs.c
===================================================================
--- iser_verbs.c	(revision 6761)
+++ iser_verbs.c	(revision 6802)
@@ -301,10 +301,6 @@ void iser_conn_terminate(struct iser_con
  	wait_event_interruptible(ib_conn->wait,
  				 (atomic_read(&ib_conn->state) == ISER_CONN_DOWN));

-	mutex_lock(&ig.connlist_mutex);
-	list_del(&ib_conn->conn_list);
-	mutex_unlock(&ig.connlist_mutex);
-
  	iser_conn_release(ib_conn);
  }

@@ -463,6 +459,7 @@ int iser_conn_init(struct iser_conn **ib
  	atomic_set(&ib_conn->post_send_buf_count, 0);
  	INIT_WORK(&ib_conn->comperror_work, iser_comp_error_worker,
  		  ib_conn);
+	INIT_LIST_HEAD(&ib_conn->conn_list);

  	*ibconn = ib_conn;
  	return 0;
@@ -541,6 +538,10 @@ void iser_conn_release(struct iser_conn

  	BUG_ON(atomic_read(&ib_conn->state) != ISER_CONN_DOWN);

+	mutex_lock(&ig.connlist_mutex);
+	list_del(&ib_conn->conn_list);
+	mutex_unlock(&ig.connlist_mutex);
+
  	iser_free_ib_conn_res(ib_conn);
  	ib_conn->device = NULL;
  	/* on EVENT_ADDR_ERROR there's no device yet for this conn */
Index: iscsi_iser.c
===================================================================
--- iscsi_iser.c	(revision 6761)
+++ iscsi_iser.c	(revision 6802)
@@ -680,8 +680,8 @@ iscsi_iser_ep_disconnect(__u64 ep_handle

  	if (atomic_read(&ib_conn->state) == ISER_CONN_UP)
  		iser_conn_terminate(ib_conn);
-
-	iser_conn_release(ib_conn);
+	else
+		iser_conn_release(ib_conn);
  }

  static struct scsi_host_template iscsi_iser_sht = {





^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/6] iSER (iSCSI Extensions for RDMA) initiator
  2006-04-27 12:30 [PATCH 0/6] iSER (iSCSI Extensions for RDMA) initiator Or Gerlitz
  2006-04-27 12:30 ` [PATCH 1/6] iSER's Makefile and Kconfig Or Gerlitz
@ 2006-05-01 18:32 ` Roland Dreier
  2006-05-02  7:56   ` Or Gerlitz
  1 sibling, 1 reply; 21+ messages in thread
From: Roland Dreier @ 2006-05-01 18:32 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: openib-general, linux-kernel

Is this ready for queuing in my for-2.6.18 tree?  What is the status
of all the non-IB dependencies?

If it is ready for merging, please send me a clean patch series with
the comments from this thread addressed.  And also remind me of which
SCSI git trees this depends on...

Thanks,
  Roland

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/6] iSER (iSCSI Extensions for RDMA) initiator
  2006-05-01 18:32 ` [PATCH 0/6] iSER (iSCSI Extensions for RDMA) initiator Roland Dreier
@ 2006-05-02  7:56   ` Or Gerlitz
  0 siblings, 0 replies; 21+ messages in thread
From: Or Gerlitz @ 2006-05-02  7:56 UTC (permalink / raw)
  To: Roland Dreier; +Cc: openib-general, linux-kernel

Roland Dreier wrote:
> Is this ready for queuing in my for-2.6.18 tree?  What is the status
> of all the non-IB dependencies?
>
> If it is ready for merging, please send me a clean patch series with
> the comments from this thread addressed.  And also remind me of which
> SCSI git trees this depends on...

I am reviewing the comments and applying fixes, and will send you a
clean patch set when done.

The only non-IB dependency is the iSCSI updates for 2.6.18. The git
tree from which those updates are pushed upstream is scsi-misc-2.6.
James has so far accepted 5 of the 6 updates into it (see below), but
one is still not there. I will let you know.

Or.

Mike Christie 	[SCSI] iscsi: convert iscsi tcp to libiscsi 	
Mike Christie 	[SCSI] iscsi: add libiscsi 	
Mike Christie 	[SCSI] iscsi: fix up iscsi eh 	
Mike Christie 	[SCSI] iscsi: add sysfs attrs for uspace sync up 	
Mike Christie 	[SCSI] iscsi: rm kernel iscsi handles usage for session





^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [openib-general] [PATCH 5/6] iser RDMA CM (CMA) and IB verbs interaction
  2006-05-01 13:02             ` Or Gerlitz
@ 2006-05-04 13:00               ` Or Gerlitz
  2006-05-04 13:06                 ` Or Gerlitz
  0 siblings, 1 reply; 21+ messages in thread
From: Or Gerlitz @ 2006-05-04 13:00 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: Sean Hefty, linux-kernel, openib-general

Or Gerlitz wrote:
> Sean Hefty wrote:
>>> +static void iser_disconnected_handler(struct rdma_cm_id *cma_id)
>>> +{
>>> +    struct iser_conn *ib_conn;
>>> +
>>> +    ib_conn = (struct iser_conn *)cma_id->context;
>>> +    ib_conn->disc_evt_flag = 1;
>>> +
>>> +    /* If this event is unsolicited this means that the conn is 
>>> being */
>>> +    /* terminated asynchronously from the iSCSI layer's 
>>> perspective.  */
>>> +    if (atomic_read(&ib_conn->state) == ISER_CONN_PENDING) {
>>> +        atomic_set(&ib_conn->state, ISER_CONN_DOWN);
>>> +        wake_up_interruptible(&ib_conn->wait);
>>> +    } else {
>>> +        if (atomic_read(&ib_conn->state) == ISER_CONN_UP) {
>>> +            atomic_set(&ib_conn->state, ISER_CONN_TERMINATING);
>>> +            iscsi_conn_failure(ib_conn->iser_conn->iscsi_conn,
>>> +                        ISCSI_ERR_CONN_FAILED);
>>> +        }
>>> +        /* Complete the termination process if no posts are pending */
>>> +        if ((atomic_read(&ib_conn->post_recv_buf_count) == 0) &&
>>> +            (atomic_read(&ib_conn->post_send_buf_count) == 0)) {
>>> +            atomic_set(&ib_conn->state, ISER_CONN_DOWN);
>>> +            wake_up_interruptible(&ib_conn->wait);
>>> +        }
>>> +    }

>> Are there races here between reading ib_conn->state and setting it?  
>> Could it have changed in between the atomic_read() and atomic_set()?

> It seems that indeed a race is possible here, i am rethinking now on the 
> implementation of the ib connection states moves, thanks for pointing this.

Following a review, and the clarification I got from you regarding the
serialization of cma callbacks, I have committed this change, which
removes unneeded state checks from two flows (the disconnect handler
and connect error).

Or.

r6900 | ogerlitz | 2006-05-04 11:06:24 +0300 (Thu, 04 May 2006) | 7 lines

two fixes to iser ib conn state management:

+1 when getting DISCONNECTED cma event, iser's state can't be PENDING
+2 when connect_error is called, iser's state is PENDING, no need to check it

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>

Index: iser_verbs.c
===================================================================
--- iser_verbs.c	(revision 6802)
+++ iser_verbs.c	(revision 6900)
@@ -309,12 +309,8 @@ static void iser_connect_error(struct rd
  	struct iser_conn *ib_conn;
  	ib_conn = (struct iser_conn *)cma_id->context;

-	if (atomic_read(&ib_conn->state) == ISER_CONN_PENDING) {
-		atomic_set(&ib_conn->state, ISER_CONN_DOWN);
-		wake_up_interruptible(&ib_conn->wait);
-	} else
-		iser_err("Unexpected evt for conn.state: %d\n",
-			 atomic_read(&ib_conn->state));
+	atomic_set(&ib_conn->state, ISER_CONN_DOWN);
+	wake_up_interruptible(&ib_conn->wait);
  }

  static void iser_addr_handler(struct rdma_cm_id *cma_id)
@@ -386,21 +382,16 @@ static void iser_disconnected_handler(st

  	/* If this event is unsolicited this means that the conn is being */
  	/* terminated asynchronously from the iSCSI layer's perspective.  */
-	if (atomic_read(&ib_conn->state) == ISER_CONN_PENDING) {
+	if (atomic_read(&ib_conn->state) == ISER_CONN_UP) {
+		atomic_set(&ib_conn->state, ISER_CONN_TERMINATING);
+		iscsi_conn_failure(ib_conn->iser_conn->iscsi_conn,
+				   ISCSI_ERR_CONN_FAILED);
+	}
+	/* Complete the termination process if no posts are pending */
+	if ((atomic_read(&ib_conn->post_recv_buf_count) == 0) &&
+	    (atomic_read(&ib_conn->post_send_buf_count) == 0)) {
  		atomic_set(&ib_conn->state, ISER_CONN_DOWN);
  		wake_up_interruptible(&ib_conn->wait);
-	} else {
-		if (atomic_read(&ib_conn->state) == ISER_CONN_UP) {
-			atomic_set(&ib_conn->state, ISER_CONN_TERMINATING);
-			iscsi_conn_failure(ib_conn->iser_conn->iscsi_conn,
-						ISCSI_ERR_CONN_FAILED);
-		}
-		/* Complete the termination process if no posts are pending */
-		if ((atomic_read(&ib_conn->post_recv_buf_count) == 0) &&
-		    (atomic_read(&ib_conn->post_send_buf_count) == 0)) {
-			atomic_set(&ib_conn->state, ISER_CONN_DOWN);
-			wake_up_interruptible(&ib_conn->wait);
-		}
  	}
  }



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [openib-general] [PATCH 5/6] iser RDMA CM (CMA) and IB verbs interaction
  2006-05-04 13:00               ` [openib-general] [PATCH 5/6] iser RDMA CM (CMA) and IB verbs interaction Or Gerlitz
@ 2006-05-04 13:06                 ` Or Gerlitz
  0 siblings, 0 replies; 21+ messages in thread
From: Or Gerlitz @ 2006-05-04 13:06 UTC (permalink / raw)
  To: Sean Hefty; +Cc: linux-kernel, openib-general

Or Gerlitz wrote:
> Or Gerlitz wrote:
>> Sean Hefty wrote:
>>>> +static void iser_disconnected_handler(struct rdma_cm_id *cma_id)
>>>> +{
>>>> +    struct iser_conn *ib_conn;
>>>> +
>>>> +    ib_conn = (struct iser_conn *)cma_id->context;
>>>> +    ib_conn->disc_evt_flag = 1;
>>>> +
>>>> +    /* If this event is unsolicited this means that the conn is 
>>>> being */
>>>> +    /* terminated asynchronously from the iSCSI layer's 
>>>> perspective.  */
>>>> +    if (atomic_read(&ib_conn->state) == ISER_CONN_PENDING) {
>>>> +        atomic_set(&ib_conn->state, ISER_CONN_DOWN);
>>>> +        wake_up_interruptible(&ib_conn->wait);
>>>> +    } else {
>>>> +        if (atomic_read(&ib_conn->state) == ISER_CONN_UP) {
>>>> +            atomic_set(&ib_conn->state, ISER_CONN_TERMINATING);
>>>> +            iscsi_conn_failure(ib_conn->iser_conn->iscsi_conn,
>>>> +                        ISCSI_ERR_CONN_FAILED);
>>>> +        }
>>>> +        /* Complete the termination process if no posts are pending */
>>>> +        if ((atomic_read(&ib_conn->post_recv_buf_count) == 0) &&
>>>> +            (atomic_read(&ib_conn->post_send_buf_count) == 0)) {
>>>> +            atomic_set(&ib_conn->state, ISER_CONN_DOWN);
>>>> +            wake_up_interruptible(&ib_conn->wait);
>>>> +        }
>>>> +    }
> 
>>> Are there races here between reading ib_conn->state and setting it?  
>>> Could it have changed in between the atomic_read() and atomic_set()?

>> It seems that indeed a race is possible here, i am rethinking now on 
>> the implementation of the ib connection states moves, thanks for 
>> pointing this.

> Following a review and the clarification i have got from you re cma 
> callbacks serialization, i have committed this change which removes 
> unneeded state checks from two flows (disconnect handler and connect error)

This is the actual fix for the possible races you pointed out; thanks
for your feedback.

Or.

r6924 | ogerlitz | 2006-05-04 16:03:21 +0300 (Thu, 04 May 2006) | 7 lines

changed iser ib conn state management to be done with an int variable 
keeping the state and a lock. When a related race is possible the lock 
is used to check (comp) or change (comp_exch) the state. When no race 
can happen the state is just examined or changed.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>

$ diffstat /tmp/6900-6924
  iscsi_iser.c     |   13 +++------
  iscsi_iser.h     |    6 +++-
  iser_initiator.c |    6 ++--
  iser_verbs.c     |   77 ++++++++++++++++++++++++++++++++++++++-----------------
  4 files changed, 67 insertions(+), 35 deletions(-)


Index: iscsi_iser.h
===================================================================
--- iscsi_iser.h	(revision 6900)
+++ iscsi_iser.h	(revision 6924)
@@ -235,7 +235,8 @@ struct iser_device {
  struct iser_conn
  {
  	struct iscsi_iser_conn       *iser_conn; /* iser conn for upcalls  */
-	atomic_t		     state;	    /* rdma connection state   */
+	enum iser_ib_conn_state	     state;	    /* rdma connection state   */
+	spinlock_t		     lock;	    /* used for state changes  */
  	struct iser_device           *device;       /* device context          */
  	struct rdma_cm_id            *cma_id;       /* CMA ID		       */
  	struct ib_qp	             *qp;           /* QP 		       */
@@ -352,4 +353,7 @@ void iser_unreg_mem(struct iser_mem_reg

  int  iser_post_recv(struct iser_desc *rx_desc);
  int  iser_post_send(struct iser_desc *tx_desc);
+
+int iser_conn_state_comp(struct iser_conn *ib_conn,
+			 enum iser_ib_conn_state comp);
  #endif
Index: iser_verbs.c
===================================================================
--- iser_verbs.c	(revision 6900)
+++ iser_verbs.c	(revision 6924)
@@ -287,6 +287,30 @@ static void iser_device_try_release(stru
  	mutex_unlock(&ig.device_list_mutex);
  }

+int iser_conn_state_comp(struct iser_conn *ib_conn,
+			 enum iser_ib_conn_state comp)
+{
+        int ret;
+
+	spin_lock_bh(&ib_conn->lock);
+	ret = (ib_conn->state == comp);
+	spin_unlock_bh(&ib_conn->lock);
+	return ret;
+}
+
+static int iser_conn_state_comp_exch(struct iser_conn *ib_conn,
+				     enum iser_ib_conn_state comp,
+				     enum iser_ib_conn_state exch)
+{
+        int ret;
+
+        spin_lock_bh(&ib_conn->lock);
+        if ((ret = (ib_conn->state == comp)))
+                ib_conn->state = exch;
+        spin_unlock_bh(&ib_conn->lock);
+        return ret;
+}
+
  /**
   * triggers start of the disconnect procedures and wait for them to be done
   */
@@ -294,12 +318,17 @@ void iser_conn_terminate(struct iser_con
  {
  	int err = 0;

-	atomic_set(&ib_conn->state, ISER_CONN_TERMINATING);
-	err = rdma_disconnect(ib_conn->cma_id);
-	if (err)
-		iser_bug("Failed to disconnect, conn: 0x%p err %d\n",ib_conn,err);
+	if (iser_conn_state_comp_exch(ib_conn, ISER_CONN_UP,
+				      ISER_CONN_TERMINATING)) {
+		err = rdma_disconnect(ib_conn->cma_id);
+		if (err)
+			iser_err("Failed to disconnect, conn: 0x%p err %d\n",
+				 ib_conn,err);
+
+	}
+
  	wait_event_interruptible(ib_conn->wait,
-				 (atomic_read(&ib_conn->state) == ISER_CONN_DOWN));
+				 ib_conn->state == ISER_CONN_DOWN);

  	iser_conn_release(ib_conn);
  }
@@ -309,7 +338,7 @@ static void iser_connect_error(struct rd
  	struct iser_conn *ib_conn;
  	ib_conn = (struct iser_conn *)cma_id->context;

-	atomic_set(&ib_conn->state, ISER_CONN_DOWN);
+	ib_conn->state = ISER_CONN_DOWN;
  	wake_up_interruptible(&ib_conn->wait);
  }

@@ -369,7 +398,7 @@ static void iser_connected_handler(struc
  	struct iser_conn *ib_conn;

  	ib_conn = (struct iser_conn *)cma_id->context;
-	atomic_set(&ib_conn->state, ISER_CONN_UP);
+	ib_conn->state = ISER_CONN_UP;
  	wake_up_interruptible(&ib_conn->wait);
  }

@@ -380,17 +409,17 @@ static void iser_disconnected_handler(st
  	ib_conn = (struct iser_conn *)cma_id->context;
  	ib_conn->disc_evt_flag = 1;

-	/* If this event is unsolicited this means that the conn is being */
-	/* terminated asynchronously from the iSCSI layer's perspective.  */
-	if (atomic_read(&ib_conn->state) == ISER_CONN_UP) {
-		atomic_set(&ib_conn->state, ISER_CONN_TERMINATING);
+	/* getting here when the state is UP means that the conn is being *
+	 * terminated asynchronously from the iSCSI layer's perspective.  */
+	if (iser_conn_state_comp_exch(ib_conn, ISER_CONN_UP,
+				      ISER_CONN_TERMINATING))
  		iscsi_conn_failure(ib_conn->iser_conn->iscsi_conn,
  				   ISCSI_ERR_CONN_FAILED);
-	}
+	
  	/* Complete the termination process if no posts are pending */
  	if ((atomic_read(&ib_conn->post_recv_buf_count) == 0) &&
  	    (atomic_read(&ib_conn->post_send_buf_count) == 0)) {
-		atomic_set(&ib_conn->state, ISER_CONN_DOWN);
+		ib_conn->state = ISER_CONN_DOWN;
  		wake_up_interruptible(&ib_conn->wait);
  	}
  }
@@ -444,13 +473,14 @@ int iser_conn_init(struct iser_conn **ib
  		iser_err("can't alloc memory for struct iser_conn\n");
  		return -ENOMEM;
  	}
-	atomic_set(&ib_conn->state, ISER_CONN_INIT);
+	ib_conn->state = ISER_CONN_INIT;
  	init_waitqueue_head(&ib_conn->wait);
  	atomic_set(&ib_conn->post_recv_buf_count, 0);
  	atomic_set(&ib_conn->post_send_buf_count, 0);
  	INIT_WORK(&ib_conn->comperror_work, iser_comp_error_worker,
  		  ib_conn);
  	INIT_LIST_HEAD(&ib_conn->conn_list);
+	spin_lock_init(&ib_conn->lock);

  	*ibconn = ib_conn;
  	return 0;
@@ -477,7 +507,7 @@ int iser_connect(struct iser_conn   *ib_
  	iser_err("connecting to: %d.%d.%d.%d, port 0x%x\n",
  		 NIPQUAD(dst_addr->sin_addr), dst_addr->sin_port);

-	atomic_set(&ib_conn->state, ISER_CONN_PENDING);
+	ib_conn->state = ISER_CONN_PENDING;

  	ib_conn->cma_id = rdma_create_id(iser_cma_handler,
  					     (void *)ib_conn,
@@ -498,9 +528,9 @@ int iser_connect(struct iser_conn   *ib_

  	if (!non_blocking) {
  		wait_event_interruptible(ib_conn->wait,
-			 atomic_read(&ib_conn->state) != ISER_CONN_PENDING);
+					 (ib_conn->state != ISER_CONN_PENDING));

-		if (atomic_read(&ib_conn->state) != ISER_CONN_UP) {
+		if (ib_conn->state != ISER_CONN_UP) {
  			err =  -EIO;
  			goto connect_failure;
  		}
@@ -514,7 +544,7 @@ int iser_connect(struct iser_conn   *ib_
  id_failure:
  	ib_conn->cma_id = NULL;
  addr_failure:
-	atomic_set(&ib_conn->state, ISER_CONN_DOWN);
+	ib_conn->state = ISER_CONN_DOWN;
  connect_failure:
  	iser_conn_release(ib_conn);
  	return err;
@@ -527,7 +557,7 @@ void iser_conn_release(struct iser_conn
  {
  	struct iser_device  *device = ib_conn->device;

-	BUG_ON(atomic_read(&ib_conn->state) != ISER_CONN_DOWN);
+	BUG_ON(ib_conn->state != ISER_CONN_DOWN);

  	mutex_lock(&ig.connlist_mutex);
  	list_del(&ib_conn->conn_list);
@@ -719,16 +749,17 @@ static void iser_comp_error_worker(void
  {
  	struct iser_conn *ib_conn = data;

-	if (atomic_read(&ib_conn->state) == ISER_CONN_UP) {
-		atomic_set(&ib_conn->state, ISER_CONN_TERMINATING);
+	/* getting here when the state is UP means that the conn is being *
+	 * terminated asynchronously from the iSCSI layer's perspective.  */
+	if (iser_conn_state_comp_exch(ib_conn, ISER_CONN_UP,
+				      ISER_CONN_TERMINATING))
  		iscsi_conn_failure(ib_conn->iser_conn->iscsi_conn,
  					ISCSI_ERR_CONN_FAILED);
-	}

  	/* complete the termination process if disconnect event was delivered *
  	 * note there are no more non completed posts to the QP               */
  	if (ib_conn->disc_evt_flag) {
-		atomic_set(&ib_conn->state, ISER_CONN_DOWN);
+		ib_conn->state = ISER_CONN_DOWN;
  		wake_up_interruptible(&ib_conn->wait);
  	}
  }
Index: iser_initiator.c
===================================================================
--- iser_initiator.c	(revision 6900)
+++ iser_initiator.c	(revision 6924)
@@ -370,7 +370,7 @@ int iser_send_command(struct iscsi_conn
  	struct iscsi_cmd *hdr =  ctask->hdr;
  	struct scsi_cmnd *sc  =  ctask->sc;

-	if (atomic_read(&iser_conn->ib_conn->state) != ISER_CONN_UP) {
+	if (!iser_conn_state_comp(iser_conn->ib_conn, ISER_CONN_UP)) {
  		iser_err("Failed to send, conn: 0x%p is not up\n", iser_conn->ib_conn);
  		return -EPERM;
  	}
@@ -454,7 +454,7 @@ int iser_send_data_out(struct iscsi_conn
  	unsigned int itt;
  	int err = 0;

-	if (atomic_read(&iser_conn->ib_conn->state) != ISER_CONN_UP) {
+	if (!iser_conn_state_comp(iser_conn->ib_conn, ISER_CONN_UP)) {
  		iser_err("Failed to send, conn: 0x%p is not up\n", iser_conn->ib_conn);
  		return -EPERM;
  	}
@@ -528,7 +528,7 @@ int iser_send_control(struct iscsi_conn
  	struct iser_regd_buf *regd_buf;
  	struct iser_device *device;

-	if (atomic_read(&iser_conn->ib_conn->state) != ISER_CONN_UP) {
+	if (!iser_conn_state_comp(iser_conn->ib_conn, ISER_CONN_UP)) {
  		iser_err("Failed to send, conn: 0x%p is not up\n", iser_conn->ib_conn);
  		return -EPERM;
  	}
Index: iscsi_iser.c
===================================================================
--- iscsi_iser.c	(revision 6900)
+++ iscsi_iser.c	(revision 6924)
@@ -649,13 +649,13 @@ iscsi_iser_ep_poll(__u64 ep_handle, int
  		return -EINVAL;

  	rc = wait_event_interruptible_timeout(ib_conn->wait,
-			     atomic_read(&ib_conn->state) == ISER_CONN_UP,
+			     ib_conn->state == ISER_CONN_UP,
  			     msecs_to_jiffies(timeout_ms));

  	/* if conn establishment failed, return error code to iscsi */
  	if (!rc &&
-	    (atomic_read(&ib_conn->state) == ISER_CONN_TERMINATING ||
-	     atomic_read(&ib_conn->state) == ISER_CONN_DOWN))
+	    (ib_conn->state == ISER_CONN_TERMINATING ||
+	     ib_conn->state == ISER_CONN_DOWN))
  		rc = -1;

  	iser_err("ib conn %p rc = %d\n", ib_conn, rc);
@@ -676,12 +676,9 @@ iscsi_iser_ep_disconnect(__u64 ep_handle
  	if (!ib_conn)
  		return;

-	iser_err("ib conn %p state %d\n",ib_conn, atomic_read(&ib_conn->state));
+	iser_err("ib conn %p state %d\n",ib_conn, ib_conn->state);

-	if (atomic_read(&ib_conn->state) == ISER_CONN_UP)
-		iser_conn_terminate(ib_conn);
-	else
-		iser_conn_release(ib_conn);
+	iser_conn_terminate(ib_conn);
  }

  static struct scsi_host_template iscsi_iser_sht = {








^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 0/6] iSER (iSCSI Extensions for RDMA) initiator
@ 2006-05-11  6:59 Or Gerlitz
  0 siblings, 0 replies; 21+ messages in thread
From: Or Gerlitz @ 2006-05-11  6:59 UTC (permalink / raw)
  To: rdreier; +Cc: openib-general, linux-kernel

Roland,

I am resending the patch series, this time with a changelog description
and a Signed-off-by line; sorry for omitting them in the original post.

Or.




^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/6] iSER (iSCSI Extensions for RDMA) initiator
  2006-05-10 13:20 Or Gerlitz
@ 2006-05-10 17:26 ` Roland Dreier
  0 siblings, 0 replies; 21+ messages in thread
From: Roland Dreier @ 2006-05-10 17:26 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: openib-general, linux-kernel

    Or> To have this code compiled you would need to get the iscsi
    Or> updates for 2.6.18 into your source tree, that is pull/sync
    Or> with include/scsi and drivers/scsi of the scsi-misc-2.6 git
    Or> tree.

What is the URL of this git tree?

(Since git works on changesets and not on paths a la CVS, I can only
pull the whole tree rather than selecting certain paths; but I don't
think that matters)

    Or> There's one patch which is not yet merged there and without it
    Or> iser's compilation fails. The patch is named "iscsi: add
    Or> transport end point callbacks" and i will send it to you
    Or> offlist.

Please let me know when it is merged.  I don't want to be merging
iSCSI changes via my tree.

 - R.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 0/6] iSER (iSCSI Extensions for RDMA) initiator
@ 2006-05-10 13:20 Or Gerlitz
  2006-05-10 17:26 ` Roland Dreier
  0 siblings, 1 reply; 21+ messages in thread
From: Or Gerlitz @ 2006-05-10 13:20 UTC (permalink / raw)
  To: rdreier; +Cc: openib-general, linux-kernel

Roland,

The patch series that follows contains the iSER code which we want to submit
upstream for 2.6.18, updated to address the comments we received on the
previous post.

LKML reviewers are reminded to CC openib-general@openib.org on their responses.

Below is a log and diffstat over the changes from the previous post which is 
archived @ http://openib.org/pipermail/openib-general/2006-April/020616.html

To compile this code you need to get the iscsi updates for 2.6.18 into
your source tree, that is, pull/sync with include/scsi and drivers/scsi of
the scsi-misc-2.6 git tree.

There's one patch which is not yet merged there, and without it iser's
compilation fails. The patch is named "iscsi: add transport end point callbacks"
and I will send it to you offlist.

+ use direct BUG_ON & BUG calls instead of the iser_bug macro

+ removed usage of SVN keywords such as $LastChangedDate and $Rev

+ a few fixes related to the management of the ib conn list

+ two fixes for checks done at the ib conn state machine flow  

+ changed iser ib conn state management to be done with an int variable keeping
  the state and a lock. When a related race is possible the lock is used to check
  (comp) or change (comp_exch) the state. When no race can happen the state is
  just examined or changed.

+ always call rdma_disconnect in iser_conn_terminate so that the CMA will move
  the QP state to ERROR and we will get the FLUSHES on all the pending RX/TX WRs

+ make iser_free_device_ib_res void, change the out goto label name of 
  iser_device_find_by_ib_device

+ some whitespace cleanups

 Makefile         |    4 -
 iscsi_iser.c     |   18 ++----
 iscsi_iser.h     |   21 +++----
 iser_initiator.c |   24 ++++-----
 iser_memory.c    |   12 +---
 iser_verbs.c     |  145 +++++++++++++++++++++++++++++++------------------------
 6 files changed, 120 insertions(+), 104 deletions(-)

Or.




^ permalink raw reply	[flat|nested] 21+ messages in thread

Thread overview: 21+ messages
2006-04-27 12:30 [PATCH 0/6] iSER (iSCSI Extensions for RDMA) initiator Or Gerlitz
2006-04-27 12:30 ` [PATCH 1/6] iSER's Makefile and Kconfig Or Gerlitz
2006-04-27 12:31   ` [PATCH 2/6] iscsi_iser header file Or Gerlitz
2006-04-27 12:31     ` [PATCH 3/6] open iscsi iser transport provider code Or Gerlitz
2006-04-27 12:32       ` [PATCH 4/6] iser initiator Or Gerlitz
2006-04-27 12:32         ` [PATCH 5/6] iser RDMA CM (CMA) and IB verbs interaction Or Gerlitz
2006-04-27 12:33           ` [PATCH 6/6] iser handling of memory for RDMA Or Gerlitz
2006-04-28 23:05           ` [openib-general] [PATCH 5/6] iser RDMA CM (CMA) and IB verbsinteraction Sean Hefty
2006-04-30 12:30             ` Or Gerlitz
2006-05-01 13:02             ` Or Gerlitz
2006-05-04 13:00               ` [openib-general] [PATCH 5/6] iser RDMA CM (CMA) and IB verbs interaction Or Gerlitz
2006-05-04 13:06                 ` Or Gerlitz
2006-04-27 17:01       ` [PATCH 3/6] open iscsi iser transport provider code Stephen Hemminger
2006-04-27 16:58     ` [PATCH 2/6] iscsi_iser header file Stephen Hemminger
2006-04-27 12:40   ` [PATCH 1/6] iSER's Makefile and Kconfig Jan-Benedict Glaw
2006-04-27 12:44     ` Or Gerlitz
2006-05-01 18:32 ` [PATCH 0/6] iSER (iSCSI Extensions for RDMA) initiator Roland Dreier
2006-05-02  7:56   ` Or Gerlitz
2006-05-10 13:20 Or Gerlitz
2006-05-10 17:26 ` Roland Dreier
2006-05-11  6:59 Or Gerlitz
