* [RFC PATCH 00/28] INFINIBAND NETWORK BLOCK DEVICE (IBNBD)
@ 2017-03-24 10:45 ` Jack Wang
  0 siblings, 0 replies; 87+ messages in thread
From: Jack Wang @ 2017-03-24 10:45 UTC (permalink / raw)
  To: linux-block, linux-rdma
  Cc: dledford, axboe, hch, mail, Milind.dumbare, yun.wang, Jack Wang

From: Jack Wang <jinpu.wang@profitbricks.com>

This series introduces IBNBD/IBTRS kernel modules.

IBNBD (InfiniBand Network Block Device) allows for RDMA transfer of block IO
over an InfiniBand network. The driver presents itself as a block device on the
client side and transmits the block requests in a zero-copy fashion to the
server side via InfiniBand. The server part of the driver converts the incoming
buffers back into BIOs and hands them down to the underlying block device. As
soon as IO responses come back from the drive, they are transmitted back to the
client.

We designed and implemented this solution based on our cloud computing needs.
The key features are:
- High throughput and low latency due to:
  1) Only two RDMA messages per IO
  2) Simplified client-side management of server memory
  3) Elimination of the SCSI sublayer
- Simple configuration and handling
  1) The server side is completely passive: volumes do not need to be
     explicitly exported
  2) Only the IB port GID and the device path are needed on the client side
     to map a block device
  3) A device can be remapped automatically, e.g. after a storage reboot
- Pinning of IO-related processing to the CPU of the producer

For usage, please refer to Documentation/IBNBD.txt in a later patch.
My colleague Danil Kipnis presented IBNBD at Vault 2017, covering our design,
features, tradeoffs and performance:

http://events.linuxfoundation.org/sites/events/files/slides/IBNBD-Vault-2017.pdf

The patchset is based on Linux 4.11-rc3. I've done functional tests with our
test framework on AMD64 machines with Mellanox CX-2 and CX-3 HCAs.

TODOs:
- move some helpers to core
- use the new CQ API, drain_cq, etc.
- support the poll callback in MQ
- big-endian machine support
- better file layout

We've learned a lot from other open-source projects, namely SRP/SCST/LIO, etc.;
thanks to all the contributors. We hope IBNBD brings more value to
the open-source world.

A git tree is also available at:
https://github.com/xjtuwjp/linux-2.6/commits/ibnbdv0

As usual, comments and reviews are welcome.

Jack Wang (28):
  ibtrs: add header shared between ibtrs_client and ibtrs_server
  ibtrs: add header for log macros shared between ibtrs_client and
    ibtrs_server
  ibtrs_lib: add common functions shared by client and server
  ibtrs_clt: add header file for exported interface
  ibtrs_clt: main functionality of ibtrs_client
  ibtrs_clt: add header file shared only in ibtrs_client
  ibtrs_clt: add files for sysfs interface
  ibtrs_clt: add Makefile and Kconfig
  ibtrs_srv: add header file for exported interface
  ibtrs_srv: add main functionality for ibtrs_server
  ibtrs_srv: add header shared in ibtrs_server
  ibtrs_srv: add sysfs interface
  ibtrs_srv: add Makefile and Kconfig
  ibnbd: add headers shared by ibnbd_client and ibnbd_server
  ibnbd: add shared library functions
  ibnbd_clt: add main functionality of ibnbd_client
  ibnbd_clt: add header shared in ibnbd_client
  ibnbd_clt: add sysfs interface
  ibnbd_clt: add log helpers
  ibnbd_clt: add Makefile and Kconfig
  ibnbd_srv: add header shared in ibnbd_server
  ibnbd_srv: add main functionality
  ibnbd_srv: add abstraction for submit IO to file or block device
  ibnbd_srv: add log helpers
  ibnbd_srv: add sysfs interface
  ibnbd_srv: add Makefile and Kconfig
  ibnbd: add doc for how to use ibnbd and sysfs interface
  MAINTAINERS: Add maintainer for IBNBD/IBTRS

 Documentation/IBNBD.txt                            |  284 ++
 MAINTAINERS                                        |   14 +
 drivers/block/Kconfig                              |    3 +
 drivers/block/Makefile                             |    2 +
 drivers/block/ibnbd_client/Kconfig                 |   16 +
 drivers/block/ibnbd_client/Makefile                |    5 +
 drivers/block/ibnbd_client/ibnbd_clt.c             | 2007 ++++++++
 drivers/block/ibnbd_client/ibnbd_clt.h             |  231 +
 drivers/block/ibnbd_client/ibnbd_clt_log.h         |   79 +
 drivers/block/ibnbd_client/ibnbd_clt_sysfs.c       |  863 ++++
 drivers/block/ibnbd_client/ibnbd_clt_sysfs.h       |   64 +
 drivers/block/ibnbd_inc/ibnbd-proto.h              |  273 +
 drivers/block/ibnbd_inc/ibnbd.h                    |   55 +
 drivers/block/ibnbd_inc/log.h                      |   68 +
 drivers/block/ibnbd_lib/ibnbd-proto.c              |  244 +
 drivers/block/ibnbd_lib/ibnbd.c                    |  108 +
 drivers/block/ibnbd_server/Kconfig                 |   16 +
 drivers/block/ibnbd_server/Makefile                |    3 +
 drivers/block/ibnbd_server/ibnbd_dev.c             |  436 ++
 drivers/block/ibnbd_server/ibnbd_dev.h             |  149 +
 drivers/block/ibnbd_server/ibnbd_srv.c             | 1074 ++++
 drivers/block/ibnbd_server/ibnbd_srv.h             |  115 +
 drivers/block/ibnbd_server/ibnbd_srv_log.h         |   69 +
 drivers/block/ibnbd_server/ibnbd_srv_sysfs.c       |  317 ++
 drivers/block/ibnbd_server/ibnbd_srv_sysfs.h       |   64 +
 drivers/infiniband/Kconfig                         |    3 +
 drivers/infiniband/ulp/Makefile                    |    2 +
 drivers/infiniband/ulp/ibtrs_client/Kconfig        |    8 +
 drivers/infiniband/ulp/ibtrs_client/Makefile       |    6 +
 drivers/infiniband/ulp/ibtrs_client/ibtrs_clt.c    | 5329 ++++++++++++++++++++
 .../ulp/ibtrs_client/ibtrs_clt_internal.h          |  244 +
 .../infiniband/ulp/ibtrs_client/ibtrs_clt_sysfs.c  |  412 ++
 .../infiniband/ulp/ibtrs_client/ibtrs_clt_sysfs.h  |   62 +
 drivers/infiniband/ulp/ibtrs_lib/common.c          |  104 +
 drivers/infiniband/ulp/ibtrs_lib/heartbeat.c       |  112 +
 drivers/infiniband/ulp/ibtrs_lib/ibtrs-proto.c     |  248 +
 drivers/infiniband/ulp/ibtrs_lib/ibtrs.c           |  412 ++
 drivers/infiniband/ulp/ibtrs_lib/iu.c              |  113 +
 drivers/infiniband/ulp/ibtrs_server/Kconfig        |    8 +
 drivers/infiniband/ulp/ibtrs_server/Makefile       |    6 +
 drivers/infiniband/ulp/ibtrs_server/ibtrs_srv.c    | 3744 ++++++++++++++
 .../ulp/ibtrs_server/ibtrs_srv_internal.h          |  201 +
 .../infiniband/ulp/ibtrs_server/ibtrs_srv_sysfs.c  |  301 ++
 .../infiniband/ulp/ibtrs_server/ibtrs_srv_sysfs.h  |   59 +
 include/rdma/ibtrs.h                               |  514 ++
 include/rdma/ibtrs_clt.h                           |  316 ++
 include/rdma/ibtrs_log.h                           |   88 +
 include/rdma/ibtrs_srv.h                           |  206 +
 48 files changed, 19057 insertions(+)
 create mode 100644 Documentation/IBNBD.txt
 create mode 100644 drivers/block/ibnbd_client/Kconfig
 create mode 100644 drivers/block/ibnbd_client/Makefile
 create mode 100644 drivers/block/ibnbd_client/ibnbd_clt.c
 create mode 100644 drivers/block/ibnbd_client/ibnbd_clt.h
 create mode 100644 drivers/block/ibnbd_client/ibnbd_clt_log.h
 create mode 100644 drivers/block/ibnbd_client/ibnbd_clt_sysfs.c
 create mode 100644 drivers/block/ibnbd_client/ibnbd_clt_sysfs.h
 create mode 100644 drivers/block/ibnbd_inc/ibnbd-proto.h
 create mode 100644 drivers/block/ibnbd_inc/ibnbd.h
 create mode 100644 drivers/block/ibnbd_inc/log.h
 create mode 100644 drivers/block/ibnbd_lib/ibnbd-proto.c
 create mode 100644 drivers/block/ibnbd_lib/ibnbd.c
 create mode 100644 drivers/block/ibnbd_server/Kconfig
 create mode 100644 drivers/block/ibnbd_server/Makefile
 create mode 100644 drivers/block/ibnbd_server/ibnbd_dev.c
 create mode 100644 drivers/block/ibnbd_server/ibnbd_dev.h
 create mode 100644 drivers/block/ibnbd_server/ibnbd_srv.c
 create mode 100644 drivers/block/ibnbd_server/ibnbd_srv.h
 create mode 100644 drivers/block/ibnbd_server/ibnbd_srv_log.h
 create mode 100644 drivers/block/ibnbd_server/ibnbd_srv_sysfs.c
 create mode 100644 drivers/block/ibnbd_server/ibnbd_srv_sysfs.h
 create mode 100644 drivers/infiniband/ulp/ibtrs_client/Kconfig
 create mode 100644 drivers/infiniband/ulp/ibtrs_client/Makefile
 create mode 100644 drivers/infiniband/ulp/ibtrs_client/ibtrs_clt.c
 create mode 100644 drivers/infiniband/ulp/ibtrs_client/ibtrs_clt_internal.h
 create mode 100644 drivers/infiniband/ulp/ibtrs_client/ibtrs_clt_sysfs.c
 create mode 100644 drivers/infiniband/ulp/ibtrs_client/ibtrs_clt_sysfs.h
 create mode 100644 drivers/infiniband/ulp/ibtrs_lib/common.c
 create mode 100644 drivers/infiniband/ulp/ibtrs_lib/heartbeat.c
 create mode 100644 drivers/infiniband/ulp/ibtrs_lib/ibtrs-proto.c
 create mode 100644 drivers/infiniband/ulp/ibtrs_lib/ibtrs.c
 create mode 100644 drivers/infiniband/ulp/ibtrs_lib/iu.c
 create mode 100644 drivers/infiniband/ulp/ibtrs_server/Kconfig
 create mode 100644 drivers/infiniband/ulp/ibtrs_server/Makefile
 create mode 100644 drivers/infiniband/ulp/ibtrs_server/ibtrs_srv.c
 create mode 100644 drivers/infiniband/ulp/ibtrs_server/ibtrs_srv_internal.h
 create mode 100644 drivers/infiniband/ulp/ibtrs_server/ibtrs_srv_sysfs.c
 create mode 100644 drivers/infiniband/ulp/ibtrs_server/ibtrs_srv_sysfs.h
 create mode 100644 include/rdma/ibtrs.h
 create mode 100644 include/rdma/ibtrs_clt.h
 create mode 100644 include/rdma/ibtrs_log.h
 create mode 100644 include/rdma/ibtrs_srv.h

-- 
2.7.4

* [PATCH 01/28] ibtrs: add header shared between ibtrs_client and ibtrs_server
@ 2017-03-24 10:45   ` Jack Wang
  0 siblings, 0 replies; 87+ messages in thread
From: Jack Wang @ 2017-03-24 10:45 UTC (permalink / raw)
  To: linux-block, linux-rdma
  Cc: dledford, axboe, hch, mail, Milind.dumbare, yun.wang, Jack Wang,
	Kleber Souza, Danil Kipnis, Roman Pen

From: Jack Wang <jinpu.wang@profitbricks.com>

Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
Signed-off-by: Kleber Souza <kleber.souza@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kipnis@profitbricks.com>
Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
---
 include/rdma/ibtrs.h | 514 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 514 insertions(+)
 create mode 100644 include/rdma/ibtrs.h

diff --git a/include/rdma/ibtrs.h b/include/rdma/ibtrs.h
new file mode 100644
index 0000000..4fc572b
--- /dev/null
+++ b/include/rdma/ibtrs.h
@@ -0,0 +1,514 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler <mail@fholler.de>
+ *          Jack Wang <jinpu.wang@profitbricks.com>
+ *   	    Kleber Souza <kleber.souza@profitbricks.com>
+ * 	    Danil Kipnis <danil.kipnis@profitbricks.com>
+ *   	    Roman Pen <roman.penyaev@profitbricks.com>
+ *          Milind Dumbare <Milind.dumbare@gmail.com>
+ *
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions, and the following disclaimer,
+ *    without modification.
+ * 2. Redistributions in binary form must reproduce at minimum a disclaimer
+ *    substantially similar to the "NO WARRANTY" disclaimer below
+ *    ("Disclaimer") and any redistribution must be conditioned upon
+ *    including a substantially similar Disclaimer requirement for further
+ *    binary redistribution.
+ * 3. Neither the names of the above-listed copyright holders nor the names
+ *    of any contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * NO WARRANTY
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * HOLDERS OR CONTRIBUTORS BE LIABLE FOR SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
+ * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGES.
+ *
+ */
+
+#ifndef __IBTRS_H
+#define __IBTRS_H
+
+#include <linux/uio.h>
+#include <linux/types.h>
+#include <linux/uuid.h>
+#include <rdma/rdma_cm.h>
+#include <rdma/ib_cm.h>
+#include <linux/list.h>
+#include <linux/dma-direction.h>
+#include <rdma/ib_verbs.h>
+#include <linux/time.h>
+#include <linux/ktime.h>
+#include <linux/timekeeping.h>
+
+#define IBTRS_SERVER_PORT 1234
+#define WC_ARRAY_SIZE 16
+#define IB_APM_TIMEOUT 16 /* 4.096 us * 2 ^ 16 = ~268 msec */
+
+#define USR_MSG_CNT 64
+#define USR_CON_BUF_SIZE (USR_MSG_CNT * 2) /* double bufs for ACK's */
+
+#define DEFAULT_HEARTBEAT_TIMEOUT_MS 20000
+#define MIN_HEARTBEAT_TIMEOUT_MS 5000
+#define HEARTBEAT_INTV_MS 500
+#define HEARTBEAT_INTV_JIFFIES msecs_to_jiffies(HEARTBEAT_INTV_MS)
+
+#define MIN_RTR_CNT 1
+#define MAX_RTR_CNT 7
+
+/*
+ * With the current size of the tag allocated on the client, 4K is the maximum
+ * number of tags we can allocate. (see IBNBD-2321)
+ * This number is also used on the client to allocate the IU for the user
+ * connection to receive the RDMA addresses from the server.
+ */
+#define MAX_SESS_QUEUE_DEPTH 4096
+
+#define XX(a) case (a): return #a
+
+#define IBTRS_ADDRLEN sizeof("ipv6:[xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx]")
+
+static inline const char *ib_wc_opcode_str(enum ib_wc_opcode opcode)
+{
+	switch (opcode) {
+	XX(IB_WC_SEND);
+	XX(IB_WC_RDMA_WRITE);
+	XX(IB_WC_RDMA_READ);
+	XX(IB_WC_COMP_SWAP);
+	XX(IB_WC_FETCH_ADD);
+	/* recv-side: inbound completion */
+	XX(IB_WC_RECV);
+	XX(IB_WC_RECV_RDMA_WITH_IMM);
+	default: return "IB_WC_OPCODE_UNKNOWN";
+	}
+}
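+/*
+ * Illustrative use only (the 'wc' work completion below is hypothetical):
+ * turn a work completion opcode into a printable string, e.g. in a
+ * completion handler:
+ *
+ *	pr_debug("wc opcode: %s\n", ib_wc_opcode_str(wc->opcode));
+ */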
+
+
+struct ib_session {
+	struct ib_pd		*pd;
+	struct ib_mr		*mr;
+	struct ib_event_handler	event_handler;
+};
+
+struct ibtrs_ib_path {
+	union ib_gid    p_sgid;
+	union ib_gid    p_dgid;
+};
+
+struct ib_con {
+	struct ib_qp		*qp ____cacheline_aligned;
+	struct ib_cq		*cq ____cacheline_aligned;
+	struct ib_send_wr	beacon;
+	struct rdma_cm_id	*cm_id;
+	struct ibtrs_ib_path    pri_path;
+	struct ibtrs_ib_path   cur_path;
+	char			*addr;
+	char			*hostname;
+};
+
+struct ibtrs_iu {
+	struct list_head        list;
+	dma_addr_t              dma_addr;
+	void                    *buf;
+	size_t                  size;
+	enum dma_data_direction direction;
+	bool			is_msg;
+	u32			tag;
+};
+
+struct ibtrs_heartbeat {
+	atomic64_t	send_ts_ms;
+	atomic64_t	recv_ts_ms;
+	u32		timeout_ms;
+	u32		warn_timeout_ms;
+	char		*addr;
+	char		*hostname;
+};
+
+#define IBTRS_VERSION 2
+#define IBTRS_UUID_SIZE 16
+#define IO_MSG_SIZE 24
+#define IB_IMM_SIZE_BITS 32
+
+#define GCC_DIAGNOSTIC_AWARE ((__GNUC__ > 6))
+#if GCC_DIAGNOSTIC_AWARE
+#pragma GCC diagnostic push
+#pragma GCC diagnostic warning "-Wpadded"
+#endif
+
+/**
+ * enum ibtrs_msg_types - IBTRS message types. DO NOT REMOVE OR REORDER!!!
+ * @IBTRS_MSG_SESS_OPEN:	Client requests new session on Server
+ * @IBTRS_MSG_SESS_OPEN_RESP:	Server informs Client about session parameters
+ * @IBTRS_MSG_CON_OPEN:		Client requests new connection to server
+ * @IBTRS_MSG_RDMA_WRITE:	Client writes data to the Server via RDMA
+ * @IBTRS_MSG_REQ_RDMA_WRITE:	Client requests a data transfer via RDMA
+ * @IBTRS_MSG_USER:		Data transfer via an InfiniBand message
+ * @IBTRS_MSG_ERROR:		Fatal error happened
+ * @IBTRS_MSG_SESS_INFO:	Client sends session info (e.g. its hostname)
+ */
+enum ibtrs_msg_types {
+	IBTRS_MSG_SESS_OPEN,
+	IBTRS_MSG_SESS_OPEN_RESP,
+	IBTRS_MSG_CON_OPEN,
+	IBTRS_MSG_RDMA_WRITE,
+	IBTRS_MSG_REQ_RDMA_WRITE,
+	IBTRS_MSG_USER,
+	IBTRS_MSG_ERROR,
+	IBTRS_MSG_SESS_INFO,
+};
+
+/**
+ * struct ibtrs_msg_hdr - Common header of all IBTRS messages
+ * @type:	Message type, valid values see: enum ibtrs_msg_types
+ * @tsize:	Total size of transferred data
+ *
+ * Don't move the first 8 padding bytes! It's a workaround for a kernel bug.
+ * See IBNBD-610 for details
+ *
+ * DO NOT CHANGE!
+ */
+struct ibtrs_msg_hdr {
+	u8			__padding1;
+	u8			type;
+	u16			__padding2;
+	u32			tsize;
+};
+
+#define IBTRS_HDR_LEN sizeof(struct ibtrs_msg_hdr)
+
+/**
+ * struct ibtrs_msg_sess_open - Opens a new session between client and server
+ * @hdr:	message header
+ * @uuid:	client host identifier, unique until module reload
+ * @ver:	IBTRS protocol version
+ * @con_cnt:    number of connections in this session
+ * @reserved:   reserved fields for future usage, 28 bytes is the maximum for
+ *		all IPv6/IPv4 sessions
+ *
+ * DO NOT CHANGE members before ver.
+ */
+struct ibtrs_msg_sess_open {
+	struct ibtrs_msg_hdr	hdr;
+	u8			uuid[IBTRS_UUID_SIZE];
+	u8			ver;
+	u8			con_cnt;
+	u8			reserved[30];
+};
+
+/**
+ * struct ibtrs_msg_sess_info
+ * @hdr:		message header
+ * @hostname:		client host name
+ */
+struct ibtrs_msg_sess_info {
+	struct ibtrs_msg_hdr	hdr;
+	u8                      hostname[MAXHOSTNAMELEN];
+};
+
+#define MSG_SESS_INFO_SIZE sizeof(struct ibtrs_msg_sess_info)
+
+/*
+ *  Data Layout in RDMA-Bufs:
+ *
+ * +---------RDMA-BUF--------+
+ * |         Slice N	     |
+ * | +---------------------+ |
+ * | |      I/O data       | |
+ * | |---------------------| |
+ * | |      IBNBD MSG	   | |
+ * | |---------------------| |
+ * | |	    IBTRS MSG	   | |
+ * | +---------------------+ |
+ * +-------------------------+
+ * |	     Slice N+1	     |
+ * | +---------------------+ |
+ * | |       I/O data	   | |
+ * | |---------------------| |
+ * | |	     IBNBD MSG     | |
+ * | |---------------------| |
+ * | |       IBTRS MSG     | |
+ * | +---------------------+ |
+ * +-------------------------+
+ */
+
+#define IBTRS_MSG_RESV_LEN 128
+/**
+ * struct ibtrs_msg_sess_open_resp - Server's response to %IBTRS_MSG_SESS_OPEN
+ * @hdr:	message header
+ * @ver:	IBTRS protocol version
+ * @cnt:	Number of rdma addresses in this message
+ * @rkey:	remote key to allow client to access buffers
+ * @hostname:   hostname of local host
+ * @reserved:    reserved fields for future usage
+ * @max_inflight_msg:  max inflight messages (queue-depth) in this session
+ * @max_io_size:   max io size server supports
+ * @max_req_size:   max infiniband message size server supports
+ * @addr:	rdma addresses of buffers
+ *
+ * DO NOT CHANGE members before ver.
+ */
+struct ibtrs_msg_sess_open_resp {
+	struct ibtrs_msg_hdr	hdr;
+	u8			ver;
+	u8			__padding1;
+	u16			cnt;
+	u32			rkey;
+	u8                      hostname[MAXHOSTNAMELEN];
+	u8			reserved[IBTRS_MSG_RESV_LEN];
+	u16			max_inflight_msg;
+	u32			max_io_size;
+	u32			max_req_size;
+	u64			addr[];
+};
+
+#define IBTRS_MSG_SESS_OPEN_RESP_LEN(cnt) \
+	(sizeof(struct ibtrs_msg_sess_open_resp) + sizeof(u64) * cnt)
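+/*
+ * Example (illustrative only): a response advertising 64 RDMA buffer
+ * addresses occupies IBTRS_MSG_SESS_OPEN_RESP_LEN(64) bytes, i.e. the fixed
+ * part of struct ibtrs_msg_sess_open_resp followed by 64 u64 addresses.
+ */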
+/**
+ * struct ibtrs_msg_con_open - Opens a new connection between client and server
+ * @hdr:		message header
+ * @uuid:		client host identifier, unique until module reload
+ */
+struct ibtrs_msg_con_open {
+	struct ibtrs_msg_hdr	hdr;
+	u8			uuid[IBTRS_UUID_SIZE];
+};
+
+/**
+ * struct ibtrs_msg_user - Data exchanged via an InfiniBand message
+ * @hdr:		message header
+ * @payl:		Payload from the user module
+ */
+struct ibtrs_msg_user {
+	struct ibtrs_msg_hdr	hdr;
+	u8			payl[];
+};
+
+/**
+ * struct ibtrs_sg_desc - RDMA-Buffer entry description
+ * @addr:	Address of RDMA destination buffer
+ * @key:	Authorization rkey to write to the buffer
+ * @len:	Size of the buffer
+ */
+struct ibtrs_sg_desc {
+	u64			addr;
+	u32			key;
+	u32			len;
+};
+
+#define IBTRS_SG_DESC_LEN sizeof(struct ibtrs_sg_desc)
+
+/**
+ * struct ibtrs_msg_req_rdma_write - RDMA data transfer request from client
+ * @hdr:		message header
+ * @sg_cnt:		number of @desc entries
+ * @desc:		RDMA buffers where the server can write the result to
+ */
+struct ibtrs_msg_req_rdma_write {
+	struct ibtrs_msg_hdr	hdr;
+	u32			__padding;
+	u32			sg_cnt;
+	struct ibtrs_sg_desc    desc[];
+};
+
+/**
+ * struct ibtrs_msg_rdma_write - Message transferred to the server with RDMA-Write
+ * @hdr:		message header
+ */
+struct ibtrs_msg_rdma_write {
+	struct ibtrs_msg_hdr	hdr;
+};
+
+/**
+ * struct ibtrs_msg_error - Error message
+ * @hdr:		message header
+ * @errno:		errno value describing the error
+ */
+struct ibtrs_msg_error {
+	struct ibtrs_msg_hdr	hdr;
+	s32			errno;
+	u32			__padding;
+};
+
+#if GCC_DIAGNOSTIC_AWARE
+#pragma GCC diagnostic pop
+#endif
+
+int ibtrs_validate_message(u16 queue_depth, const void *hdr);
+
+void fill_ibtrs_msg_sess_open(struct ibtrs_msg_sess_open *msg, u8 con_cnt,
+			      const uuid_le *uuid);
+
+void fill_ibtrs_msg_con_open(struct ibtrs_msg_con_open *msg,
+			     const uuid_le *uuid);
+
+void fill_ibtrs_msg_sess_info(struct ibtrs_msg_sess_info *msg,
+			      const char *hostname);
+
+void ibtrs_heartbeat_set_send_ts(struct ibtrs_heartbeat *h);
+void ibtrs_set_last_heartbeat(struct ibtrs_heartbeat *h);
+u64 ibtrs_last_heartbeat_diff_ms(const struct ibtrs_heartbeat *h);
+u64 ibtrs_heartbeat_send_ts_diff_ms(const struct ibtrs_heartbeat *h);
+
+void ibtrs_set_heartbeat_timeout(struct ibtrs_heartbeat *h, u32 timeout_ms);
+
+void ibtrs_heartbeat_warn(const struct ibtrs_heartbeat *h);
+
+bool ibtrs_heartbeat_timeout_is_expired(const struct ibtrs_heartbeat *h);
+
+u32 ibtrs_heartbeat_get_send_delay(const struct ibtrs_heartbeat *h);
+u32 ibtrs_heartbeat_get_check_delay(const struct ibtrs_heartbeat *h);
+void ibtrs_iu_put(struct list_head *iu_list, struct ibtrs_iu *iu);
+struct ibtrs_iu *ibtrs_iu_get(struct list_head *iu_list);
+
+struct ibtrs_iu *ibtrs_iu_alloc(u32 tag, size_t size, gfp_t t,
+				struct ib_device *dev,
+				enum dma_data_direction, bool is_msg);
+
+void ibtrs_iu_free(struct ibtrs_iu *iu, enum dma_data_direction dir,
+		   struct ib_device *dev);
+
+int ibtrs_write_empty_imm(struct ib_qp *qp, u32 imm_data,
+			  enum ib_send_flags flags);
+
+int ibtrs_post_send(struct ib_qp *qp, struct ib_mr *mr, struct ibtrs_iu *iu,
+		    u32 size);
+
+int ib_post_rdma_write_imm(struct ib_qp *qp, struct ib_sge *sge,
+			   unsigned int num_sge, u32 rkey, u64 rdma_addr,
+			   u64 wr_id, u32 imm_data, enum ib_send_flags flags);
+
+int ib_post_rdma_write(struct ib_qp *qp, struct ib_sge *sge,
+		       unsigned int num_sge, u32 rkey, u64 rdma_addr,
+		       u64 wr_id);
+int post_beacon(struct ib_con *con);
+/**
+ * ib_session_init() - Create a new IB session
+ */
+int ib_session_init(struct ib_device *dev, struct ib_session *session);
+
+/**
+ * ib_con_init() - initialize and add an ib_con to the session
+ * @con:	&ib_con to initialize
+ * @session:	session the &ib_con is added to
+ * @ctx:	CQ context, returned to the user via completion handler
+ *
+ * Returns 0 on success, otherwise a negative errno code
+ */
+int ib_con_init(struct ib_con *con, struct rdma_cm_id *cm_id,
+		u32 max_send_sge,
+		ib_comp_handler comp_handler, void *ctx, int cq_vector,
+		u16 cq_size, u16 wr_queue_size, struct ib_session *session);
+
+int ibtrs_request_cq_notifications(struct ib_con *con);
+
+void ib_con_destroy(struct ib_con *con);
+
+/**
+ * ib_session_destroy() - Free a session
+ * The corresponding &ib_con must have been freed before.
+ */
+void ib_session_destroy(struct ib_session *session);
+
+int ib_get_max_wr_queue_size(struct ib_device *dev);
+
+int ibtrs_addr_to_str(const struct sockaddr_storage *addr, char *buf,
+		      size_t len);
+
+int ibtrs_heartbeat_timeout_validate(int timeout);
+
+/**
+ * kvec_length() - Total number of bytes covered by a kvec.
+ */
+static inline size_t kvec_length(const struct kvec *vec, size_t nr)
+{
+	size_t seg, ret = 0;
+
+	for (seg = 0; seg < nr; seg++)
+		ret += vec[seg].iov_len;
+	return ret;
+}
+
+/**
+ * copy_from_kvec() - Copy a kvec into a contiguous buffer.
+ */
+static inline void copy_from_kvec(void *data, const struct kvec *vec,
+				  size_t copy)
+{
+	size_t seg, len;
+
+	for (seg = 0; copy; seg++) {
+		len = min(vec[seg].iov_len, copy);
+		memcpy(data, vec[seg].iov_base, len);
+		data += len;
+		copy -= len;
+	}
+}
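+/*
+ * Illustrative sketch (the caller below is hypothetical): gather a kvec
+ * array into one contiguous buffer using the two helpers above.
+ *
+ *	size_t len = kvec_length(vec, nr);
+ *	void *buf = kmalloc(len, GFP_KERNEL);
+ *
+ *	if (buf)
+ *		copy_from_kvec(buf, vec, len);
+ */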
+
+static inline u64 timespec_to_ms(const struct timespec *ts)
+{
+	return timespec_to_ns(ts) / NSEC_PER_MSEC;
+}
+
+u64 timediff_cur_ms(u64 cur_ms);
+
+void *ibtrs_malloc(size_t size);
+void *ibtrs_zalloc(size_t size);
+
+#define STAT_STORE_FUNC(store, reset) \
+static ssize_t store##_store(struct kobject *kobj, \
+			    struct kobj_attribute *attr, \
+			    const char *buf, size_t count) \
+{ \
+	int ret = -EINVAL; \
+	struct ibtrs_session *sess = container_of(kobj, struct ibtrs_session, \
+						  kobj_stats); \
+\
+	if (sysfs_streq(buf, "1")) \
+		ret = reset(sess, true); \
+	else if (sysfs_streq(buf, "0"))\
+		ret = reset(sess, false); \
+	if (ret) \
+		return ret; \
+\
+	return count; \
+}
+
+#define STAT_SHOW_FUNC(show, print) \
+static ssize_t show##_show(struct kobject *kobj, \
+			   struct kobj_attribute *attr, \
+			   char *page) \
+{ \
+	struct ibtrs_session *sess = container_of(kobj, struct ibtrs_session, \
+						  kobj_stats); \
+\
+	return print(sess, page, PAGE_SIZE); \
+}
+
+#define STAT_ATTR(stat, print, reset) \
+STAT_STORE_FUNC(stat, reset) \
+STAT_SHOW_FUNC(stat, print) \
+static struct kobj_attribute stat##_attr = \
+		__ATTR(stat, 0644, \
+		       stat##_show, \
+		       stat##_store)
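+/*
+ * Usage sketch (illustrative only; the attribute name and the print/reset
+ * helpers named here are hypothetical and are supplied by the user of this
+ * macro):
+ *
+ *	STAT_ATTR(reconnects, ibtrs_stats_reconnects_to_str,
+ *		  ibtrs_stats_reset_reconnects);
+ *
+ * expands to reconnects_show()/reconnects_store() plus a kobj_attribute
+ * named reconnects_attr with mode 0644.
+ */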
+
+#endif /*__IBTRS_H*/
-- 
2.7.4

* [PATCH 02/28] ibtrs: add header for log macros shared between ibtrs_client and ibtrs_server
@ 2017-03-24 10:45 ` Jack Wang
  -1 siblings, 0 replies; 87+ messages in thread
From: Jack Wang @ 2017-03-24 10:45 UTC (permalink / raw)
  To: linux-block, linux-rdma
  Cc: dledford, axboe, hch, mail, Milind.dumbare, yun.wang, Jack Wang,
	Kleber Souza, Danil Kipnis, Roman Pen

From: Jack Wang <jinpu.wang@profitbricks.com>

Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
Signed-off-by: Kleber Souza <kleber.souza@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kipnis@profitbricks.com>
Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
---
 include/rdma/ibtrs_log.h | 88 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 88 insertions(+)
 create mode 100644 include/rdma/ibtrs_log.h

diff --git a/include/rdma/ibtrs_log.h b/include/rdma/ibtrs_log.h
new file mode 100644
index 0000000..28ff5b4
--- /dev/null
+++ b/include/rdma/ibtrs_log.h
@@ -0,0 +1,88 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler <mail@fholler.de>
+ *          Jack Wang <jinpu.wang@profitbricks.com>
+ *   	    Kleber Souza <kleber.souza@profitbricks.com>
+ * 	    Danil Kipnis <danil.kipnis@profitbricks.com>
+ *   	    Roman Pen <roman.penyaev@profitbricks.com>
+ *          Milind Dumbare <Milind.dumbare@gmail.com>
+ *
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions, and the following disclaimer,
+ *    without modification.
+ * 2. Redistributions in binary form must reproduce at minimum a disclaimer
+ *    substantially similar to the "NO WARRANTY" disclaimer below
+ *    ("Disclaimer") and any redistribution must be conditioned upon
+ *    including a substantially similar Disclaimer requirement for further
+ *    binary redistribution.
+ * 3. Neither the names of the above-listed copyright holders nor the names
+ *    of any contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * NO WARRANTY
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * HOLDERS OR CONTRIBUTORS BE LIABLE FOR SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
+ * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGES.
+ *
+ */
+
+#ifndef __IBTRS_LOG_H__
+#define __IBTRS_LOG_H__
+#include "ibtrs.h"
+
+#define DEB(fmt, ...) pr_debug("ibtrs L%d " fmt, __LINE__, ##__VA_ARGS__)
+#define DEB_RL(fmt, ...) pr_debug_ratelimited("ibtrs L%d " fmt, \
+					      __LINE__, ##__VA_ARGS__)
+static inline void ibtrs_deb_msg_hdr(const char *prep,
+				     const struct ibtrs_msg_hdr *hdr)
+{
+	DEB("%sibtrs msg hdr:\n"
+	    "\ttype: %d\n"
+	    "\ttsize: %d\n", prep, hdr->type, hdr->tsize);
+}
+
+#define ERR_NP(fmt, ...) pr_err("ibtrs L%d ERR: " fmt, \
+				__LINE__, ##__VA_ARGS__)
+
+#define WRN_NP(fmt, ...) pr_warn("ibtrs L%d WARN: " fmt, \
+				__LINE__, ##__VA_ARGS__)
+#define INFO_NP(fmt, ...)  pr_info("ibtrs: " fmt, ##__VA_ARGS__)
+
+#define INFO_NP_RL(fmt, ...) pr_info_ratelimited("ibtrs: " fmt, ##__VA_ARGS__)
+
+#define ibtrs_prefix(sess) ((sess->hostname[0] != '\0') ? sess->hostname : \
+							  sess->addr)
+
+#define ERR(sess, fmt, ...) pr_err("ibtrs L%d <%s> ERR: " fmt, \
+				__LINE__, ibtrs_prefix(sess), ##__VA_ARGS__)
+#define ERR_RL(sess, fmt, ...) pr_err_ratelimited("ibtrs L%d <%s> ERR: " fmt, \
+				__LINE__, ibtrs_prefix(sess), ##__VA_ARGS__)
+
+#define WRN(sess, fmt, ...) pr_warn("ibtrs L%d <%s> WARN: " fmt, \
+				__LINE__, ibtrs_prefix(sess), ##__VA_ARGS__)
+#define WRN_RL(sess, fmt, ...) pr_warn_ratelimited("ibtrs L%d <%s> WARN: " \
+			fmt, __LINE__, ibtrs_prefix(sess), ##__VA_ARGS__)
+
+#define INFO(sess, fmt, ...) pr_info("ibtrs <%s>: " fmt, \
+				    ibtrs_prefix(sess), ##__VA_ARGS__)
+#define INFO_RL(sess, fmt, ...) pr_info_ratelimited("ibtrs <%s>: " fmt, \
+					ibtrs_prefix(sess), ##__VA_ARGS__)
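+
+/*
+ * Usage examples (illustrative only):
+ *
+ *	ERR(sess, "Failed to post send, errno: %d\n", err);
+ *	INFO_NP("client started\n");
+ */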
+#endif /*__IBTRS_LOG_H__*/
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH 03/28] ibtrs_lib: add common functions shared by client and server
  2017-03-24 10:45 ` Jack Wang
                   ` (2 preceding siblings ...)
  (?)
@ 2017-03-24 10:45 ` Jack Wang
  -1 siblings, 0 replies; 87+ messages in thread
From: Jack Wang @ 2017-03-24 10:45 UTC (permalink / raw)
  To: linux-block, linux-rdma
  Cc: dledford, axboe, hch, mail, Milind.dumbare, yun.wang, Jack Wang,
	Kleber Souza, Danil Kipnis, Roman Pen

From: Jack Wang <jinpu.wang@profitbricks.com>

These files define functions used by both client and server, e.g.
protocol message validation, heartbeat helpers, memory allocation and
information unit (IU) handling.
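
As an illustration only, a minimal sketch of how the heartbeat helpers
are meant to be combined by a user of this library (the periodic check
and the reconnect hook are assumptions, not part of this patch):

	/* at session setup */
	ibtrs_set_heartbeat_timeout(h, timeout_ms);

	/* hypothetical periodic check, e.g. from a delayed work item */
	static void hb_check(struct ibtrs_heartbeat *h)
	{
		ibtrs_heartbeat_warn(h);
		if (ibtrs_heartbeat_timeout_is_expired(h))
			schedule_reconnect();	/* placeholder */
	}

	/* and on every received heartbeat message: */
	ibtrs_set_last_heartbeat(h);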

Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
Signed-off-by: Kleber Souza <kleber.souza@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kipnis@profitbricks.com>
Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
---
 drivers/infiniband/ulp/ibtrs_lib/common.c      | 104 +++++++
 drivers/infiniband/ulp/ibtrs_lib/heartbeat.c   | 112 +++++++
 drivers/infiniband/ulp/ibtrs_lib/ibtrs-proto.c | 248 +++++++++++++++
 drivers/infiniband/ulp/ibtrs_lib/ibtrs.c       | 412 +++++++++++++++++++++++++
 drivers/infiniband/ulp/ibtrs_lib/iu.c          | 113 +++++++
 5 files changed, 989 insertions(+)
 create mode 100644 drivers/infiniband/ulp/ibtrs_lib/common.c
 create mode 100644 drivers/infiniband/ulp/ibtrs_lib/heartbeat.c
 create mode 100644 drivers/infiniband/ulp/ibtrs_lib/ibtrs-proto.c
 create mode 100644 drivers/infiniband/ulp/ibtrs_lib/ibtrs.c
 create mode 100644 drivers/infiniband/ulp/ibtrs_lib/iu.c

diff --git a/drivers/infiniband/ulp/ibtrs_lib/common.c b/drivers/infiniband/ulp/ibtrs_lib/common.c
new file mode 100644
index 0000000..81affa7
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs_lib/common.c
@@ -0,0 +1,104 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler <mail@fholler.de>
+ *          Jack Wang <jinpu.wang@profitbricks.com>
+ *   	    Kleber Souza <kleber.souza@profitbricks.com>
+ * 	    Danil Kipnis <danil.kipnis@profitbricks.com>
+ *   	    Roman Pen <roman.penyaev@profitbricks.com>
+ *          Milind Dumbare <Milind.dumbare@gmail.com>
+ *
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions, and the following disclaimer,
+ *    without modification.
+ * 2. Redistributions in binary form must reproduce at minimum a disclaimer
+ *    substantially similar to the "NO WARRANTY" disclaimer below
+ *    ("Disclaimer") and any redistribution must be conditioned upon
+ *    including a substantially similar Disclaimer requirement for further
+ *    binary redistribution.
+ * 3. Neither the names of the above-listed copyright holders nor the names
+ *    of any contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * NO WARRANTY
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * HOLDERS OR CONTRIBUTORS BE LIABLE FOR SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
+ * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGES.
+ *
+ */
+
+#include <linux/slab.h>
+#include <linux/vmalloc.h>
+#include <rdma/ibtrs.h>
+
+u64 timediff_cur_ms(u64 cur_ms)
+{
+	struct timespec cur = CURRENT_TIME;
+	struct timespec ts = ns_to_timespec(cur_ms * NSEC_PER_MSEC);
+
+	if (timespec_compare(&cur, &ts) < 0)
+		return timespec_to_ms(&ts) - timespec_to_ms(&cur);
+	else
+		return timespec_to_ms(&cur) - timespec_to_ms(&ts);
+}
+
+/**
+ * ibtrs_malloc() - allocate kernel or virtual memory
+ * @size: size to be allocated
+ *
+ * The pointer returned must be freed with kvfree()
+ */
+void *ibtrs_malloc(size_t size)
+{
+	void *p;
+
+	p = kmalloc(size, GFP_KERNEL | __GFP_REPEAT);
+	if (p)
+		return p;
+
+	/* try allocating virtual memory */
+	p = vmalloc(size);
+	if (p)
+		return p;
+
+	return NULL;
+}
+
+/**
+ * ibtrs_zalloc() - allocate kernel or virtual memory
+ * @size: size to be allocated
+ *
+ * The pointer returned must be freed with kvfree()
+ */
+void *ibtrs_zalloc(size_t size)
+{
+	void *p;
+
+	p = kzalloc(size, GFP_KERNEL);
+	if (p)
+		return p;
+
+	/* try allocating virtual memory */
+	p = vzalloc(size);
+	if (p)
+		return p;
+
+	return NULL;
+}
diff --git a/drivers/infiniband/ulp/ibtrs_lib/heartbeat.c b/drivers/infiniband/ulp/ibtrs_lib/heartbeat.c
new file mode 100644
index 0000000..1575931
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs_lib/heartbeat.c
@@ -0,0 +1,112 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler <mail@fholler.de>
+ *          Jack Wang <jinpu.wang@profitbricks.com>
+ *   	    Kleber Souza <kleber.souza@profitbricks.com>
+ * 	    Danil Kipnis <danil.kipnis@profitbricks.com>
+ *   	    Roman Pen <roman.penyaev@profitbricks.com>
+ *          Milind Dumbare <Milind.dumbare@gmail.com>
+ *
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions, and the following disclaimer,
+ *    without modification.
+ * 2. Redistributions in binary form must reproduce at minimum a disclaimer
+ *    substantially similar to the "NO WARRANTY" disclaimer below
+ *    ("Disclaimer") and any redistribution must be conditioned upon
+ *    including a substantially similar Disclaimer requirement for further
+ *    binary redistribution.
+ * 3. Neither the names of the above-listed copyright holders nor the names
+ *    of any contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * NO WARRANTY
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * HOLDERS OR CONTRIBUTORS BE LIABLE FOR SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
+ * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGES.
+ *
+ */
+
+#include <rdma/ibtrs.h>
+#include <rdma/ibtrs_log.h>
+
+inline void ibtrs_heartbeat_set_send_ts(struct ibtrs_heartbeat *h)
+{
+	struct timespec ts = CURRENT_TIME;
+
+	atomic64_set(&h->send_ts_ms, timespec_to_ms(&ts));
+}
+
+inline void ibtrs_set_last_heartbeat(struct ibtrs_heartbeat *h)
+{
+	struct timespec ts = CURRENT_TIME;
+
+	atomic64_set(&h->recv_ts_ms, timespec_to_ms(&ts));
+}
+
+inline u64 ibtrs_heartbeat_send_ts_diff_ms(const struct ibtrs_heartbeat *h)
+{
+	return timediff_cur_ms(atomic64_read(&h->send_ts_ms));
+}
+
+inline u64 ibtrs_recv_ts_ms_diff_ms(const struct ibtrs_heartbeat *h)
+{
+	return timediff_cur_ms(atomic64_read(&h->recv_ts_ms));
+}
+
+void ibtrs_set_heartbeat_timeout(struct ibtrs_heartbeat *h, u32 timeout_ms)
+{
+	h->timeout_ms = timeout_ms;
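+	/* warn threshold is ~75% of the timeout: 1/2 + 1/4 */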
+	h->warn_timeout_ms = (timeout_ms >> 1) + (timeout_ms >> 2);
+}
+
+void ibtrs_heartbeat_warn(const struct ibtrs_heartbeat *h)
+{
+	u64 diff = ibtrs_recv_ts_ms_diff_ms(h);
+
+	DEB("last heartbeat message from %s was received at %lld, %llums"
+	    " ago\n", ibtrs_prefix(h), atomic64_read(&h->recv_ts_ms), diff);
+
+	if (diff >= h->warn_timeout_ms)
+		WRN(h, "Last heartbeat message received %llums ago,"
+		       " timeout: %ums\n", diff, h->timeout_ms);
+}
+
+bool ibtrs_heartbeat_timeout_is_expired(const struct ibtrs_heartbeat *h)
+{
+	u64 diff;
+
+	if (h->timeout_ms == 0)
+		return false;
+
+	diff = ibtrs_recv_ts_ms_diff_ms(h);
+
+	DEB("last heartbeat message from %s received at %lld, %llums ago\n",
+	    ibtrs_prefix(h), atomic64_read(&h->recv_ts_ms), diff);
+
+	if (diff >= h->timeout_ms) {
+		ERR(h, "Heartbeat timeout expired, no heartbeat received "
+		       "for %llums, timeout: %ums\n", diff,
+		       h->timeout_ms);
+		return true;
+	}
+
+	return false;
+}
diff --git a/drivers/infiniband/ulp/ibtrs_lib/ibtrs-proto.c b/drivers/infiniband/ulp/ibtrs_lib/ibtrs-proto.c
new file mode 100644
index 0000000..43ae8ec
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs_lib/ibtrs-proto.c
@@ -0,0 +1,248 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler <mail@fholler.de>
+ *          Jack Wang <jinpu.wang@profitbricks.com>
+ *   	    Kleber Souza <kleber.souza@profitbricks.com>
+ * 	    Danil Kipnis <danil.kipnis@profitbricks.com>
+ *   	    Roman Pen <roman.penyaev@profitbricks.com>
+ *          Milind Dumbare <Milind.dumbare@gmail.com>
+ *
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions, and the following disclaimer,
+ *    without modification.
+ * 2. Redistributions in binary form must reproduce at minimum a disclaimer
+ *    substantially similar to the "NO WARRANTY" disclaimer below
+ *    ("Disclaimer") and any redistribution must be conditioned upon
+ *    including a substantially similar Disclaimer requirement for further
+ *    binary redistribution.
+ * 3. Neither the names of the above-listed copyright holders nor the names
+ *    of any contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * NO WARRANTY
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * HOLDERS OR CONTRIBUTORS BE LIABLE FOR SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
+ * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGES.
+ *
+ */
+
+#include <linux/errno.h>
+#include <linux/printk.h>
+#include <rdma/ibtrs.h>
+#include <rdma/ibtrs_log.h>
+
+static int
+ibtrs_validate_msg_sess_open_resp(const struct ibtrs_msg_sess_open_resp *msg)
+{
+	static const int min_bufs = 1;
+
+	if (unlikely(msg->hdr.tsize !=
+				IBTRS_MSG_SESS_OPEN_RESP_LEN(msg->cnt))) {
+		ERR_NP("Session open resp msg received with unexpected length"
+		       " %dB instead of %luB\n", msg->hdr.tsize,
+		       IBTRS_MSG_SESS_OPEN_RESP_LEN(msg->cnt));
+
+		return -EINVAL;
+	}
+
+	if (msg->max_inflight_msg < min_bufs) {
+		ERR_NP("Sess Open msg received with invalid max_inflight_msg %d"
+		       " expected >= %d\n", msg->max_inflight_msg, min_bufs);
+		return -EINVAL;
+	}
+
+	if (unlikely(msg->cnt != msg->max_inflight_msg)) {
+		ERR_NP("Session open resp msg received with invalid cnt %d"
+		       " expected %d (queue_depth)\n", msg->cnt,
+		       msg->max_inflight_msg);
+		return -EINVAL;
+	}
+
+	if (msg->ver != IBTRS_VERSION) {
+		WRN_NP("Sess open resp version mismatch: client version %d,"
+		       " server version: %d\n", IBTRS_VERSION, msg->ver);
+	}
+
+	return 0;
+}
+
+static int
+ibtrs_validate_msg_user(const struct ibtrs_msg_user *msg)
+{
+	/* keep as place holder */
+	return 0;
+}
+
+static int
+ibtrs_validate_msg_rdma_write(const struct ibtrs_msg_rdma_write *msg,
+			      u16 queue_depth)
+{
+	if (unlikely(msg->hdr.tsize <= sizeof(*msg))) {
+		ERR_NP("RDMA-Write msg received with invalid length %d"
+		       " expected > %lu\n", msg->hdr.tsize, sizeof(*msg));
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int
+ibtrs_validate_msg_req_rdma_write(const struct ibtrs_msg_req_rdma_write *msg,
+				  u16 queue_depth)
+{
+	if (unlikely(msg->hdr.tsize <= sizeof(*msg))) {
+		ERR_NP("Request-RDMA-Write msg request received with invalid"
+		       " length %d expected > %lu\n", msg->hdr.tsize,
+		       sizeof(*msg));
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int
+ibtrs_validate_msg_con_open(const struct ibtrs_msg_con_open *msg)
+{
+	if (unlikely(msg->hdr.tsize != sizeof(*msg))) {
+		ERR_NP("Con Open msg received with invalid length: %d"
+		       " expected %lu\n", msg->hdr.tsize, sizeof(*msg));
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int
+ibtrs_validate_msg_sess_open(const struct ibtrs_msg_sess_open *msg)
+{
+	if (msg->hdr.tsize != sizeof(*msg)) {
+		ERR_NP("Sess open msg received with invalid length: %d"
+		       " expected %lu\n", msg->hdr.tsize, sizeof(*msg));
+		return -EPROTONOSUPPORT;
+	}
+
+	if (msg->ver != IBTRS_VERSION) {
+		WRN_NP("Sess open msg version mismatch: client version %d,"
+		       " server version: %d\n", msg->ver, IBTRS_VERSION);
+	}
+
+	return 0;
+}
+
+static int ibtrs_validate_msg_sess_info(const struct ibtrs_msg_sess_info *msg)
+{
+	if (msg->hdr.tsize != sizeof(*msg)) {
+		ERR_NP("Sess info msg received with invalid length: %d,"
+		       " expected %lu\n", msg->hdr.tsize, sizeof(*msg));
+		return -EPROTONOSUPPORT;
+	}
+
+	return 0;
+}
+
+static int ibtrs_validate_msg_error(const struct ibtrs_msg_error *msg)
+{
+	if (msg->hdr.tsize != sizeof(*msg)) {
+		ERR_NP("Error message received with invalid length: %d,"
+		       " expected %lu\n", msg->hdr.tsize, sizeof(*msg));
+		return -EPROTONOSUPPORT;
+	}
+
+	return 0;
+}
+
+int ibtrs_validate_message(u16 queue_depth, const void *data)
+{
+	const struct ibtrs_msg_hdr *hdr = data;
+
+	switch (hdr->type) {
+	case IBTRS_MSG_RDMA_WRITE: {
+		const struct ibtrs_msg_rdma_write *msg = data;
+
+		return ibtrs_validate_msg_rdma_write(msg, queue_depth);
+	}
+	case IBTRS_MSG_REQ_RDMA_WRITE: {
+		const struct ibtrs_msg_req_rdma_write *req = data;
+
+		return ibtrs_validate_msg_req_rdma_write(req, queue_depth);
+	}
+	case IBTRS_MSG_SESS_OPEN_RESP: {
+		const struct ibtrs_msg_sess_open_resp *msg = data;
+
+		return ibtrs_validate_msg_sess_open_resp(msg);
+	}
+	case IBTRS_MSG_SESS_INFO: {
+		const struct ibtrs_msg_sess_info *msg = data;
+
+		return ibtrs_validate_msg_sess_info(msg);
+	}
+	case IBTRS_MSG_USER: {
+		const struct ibtrs_msg_user *msg = data;
+
+		return ibtrs_validate_msg_user(msg);
+	}
+	case IBTRS_MSG_CON_OPEN: {
+		const struct ibtrs_msg_con_open *msg = data;
+
+		return ibtrs_validate_msg_con_open(msg);
+	}
+	case IBTRS_MSG_SESS_OPEN: {
+		const struct ibtrs_msg_sess_open *msg = data;
+
+		return ibtrs_validate_msg_sess_open(msg);
+	}
+	case IBTRS_MSG_ERROR: {
+		const struct ibtrs_msg_error *msg = data;
+
+		return ibtrs_validate_msg_error(msg);
+	}
+	default:
+		ERR_NP("Received IBTRS message with unknown type %d\n",
+		       hdr->type);
+		return -EINVAL;
+	}
+	return 0;
+}
+
+void fill_ibtrs_msg_sess_open(struct ibtrs_msg_sess_open *msg, u8 con_cnt,
+			      const uuid_le *uuid)
+{
+	msg->hdr.type		= IBTRS_MSG_SESS_OPEN;
+	msg->hdr.tsize		= sizeof(*msg);
+	msg->ver		= IBTRS_VERSION;
+	msg->con_cnt		= con_cnt;
+
+	memcpy(msg->uuid, uuid->b, IBTRS_UUID_SIZE);
+}
+
+void fill_ibtrs_msg_con_open(struct ibtrs_msg_con_open *msg,
+			     const uuid_le *uuid)
+{
+	msg->hdr.type		= IBTRS_MSG_CON_OPEN;
+	msg->hdr.tsize		= sizeof(*msg);
+	memcpy(msg->uuid, uuid->b, IBTRS_UUID_SIZE);
+}
+
+void fill_ibtrs_msg_sess_info(struct ibtrs_msg_sess_info *msg,
+			      const char *hostname)
+{
+	msg->hdr.type		= IBTRS_MSG_SESS_INFO;
+	msg->hdr.tsize		= sizeof(*msg);
+	memcpy(msg->hostname, hostname, sizeof(msg->hostname));
+}
diff --git a/drivers/infiniband/ulp/ibtrs_lib/ibtrs.c b/drivers/infiniband/ulp/ibtrs_lib/ibtrs.c
new file mode 100644
index 0000000..2bebcd0
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs_lib/ibtrs.c
@@ -0,0 +1,412 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler <mail@fholler.de>
+ *          Jack Wang <jinpu.wang@profitbricks.com>
+ *   	    Kleber Souza <kleber.souza@profitbricks.com>
+ * 	    Danil Kipnis <danil.kipnis@profitbricks.com>
+ *   	    Roman Pen <roman.penyaev@profitbricks.com>
+ *          Milind Dumbare <Milind.dumbare@gmail.com>
+ *
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions, and the following disclaimer,
+ *    without modification.
+ * 2. Redistributions in binary form must reproduce at minimum a disclaimer
+ *    substantially similar to the "NO WARRANTY" disclaimer below
+ *    ("Disclaimer") and any redistribution must be conditioned upon
+ *    including a substantially similar Disclaimer requirement for further
+ *    binary redistribution.
+ * 3. Neither the names of the above-listed copyright holders nor the names
+ *    of any contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * NO WARRANTY
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * HOLDERS OR CONTRIBUTORS BE LIABLE FOR SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
+ * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGES.
+ *
+ */
+
+#include <rdma/ibtrs.h>
+#include <rdma/ibtrs_log.h>
+#include <rdma/ib.h>
+
+int ibtrs_write_empty_imm(struct ib_qp *qp, u32 imm_data,
+			  enum ib_send_flags flags)
+{
+	struct ib_send_wr wr, *bad_wr;
+
+	memset(&wr, 0, sizeof(wr));
+	wr.send_flags	= flags;
+	wr.opcode	= IB_WR_RDMA_WRITE_WITH_IMM;
+	wr.ex.imm_data	= cpu_to_be32(imm_data);
+
+	return ib_post_send(qp, &wr, &bad_wr);
+}
+
+int ibtrs_post_send(struct ib_qp *qp, struct ib_mr *mr, struct ibtrs_iu *iu,
+		    u32 size)
+{
+	struct ib_sge list;
+	struct ib_send_wr wr, *bad_wr;
+
+	if (WARN_ON(size == 0))
+		return -EINVAL;
+
+	list.addr   = iu->dma_addr;
+	list.length = size;
+	list.lkey   = mr->lkey;
+
+	memset(&wr, 0, sizeof(wr));
+	wr.next       = NULL;
+	wr.wr_id      = (uintptr_t)iu;
+	wr.sg_list    = &list;
+	wr.num_sge    = 1;
+	wr.opcode     = IB_WR_SEND;
+	wr.send_flags = IB_SEND_SIGNALED;
+
+	return ib_post_send(qp, &wr, &bad_wr);
+}
+
+static int post_rdma_write(struct ib_qp *qp, struct ib_sge *sge, size_t num_sge,
+			   u32 rkey, u64 rdma_addr, u64 wr_id, u32 imm_data,
+			   enum ib_wr_opcode opcode, enum ib_send_flags flags)
+{
+	struct ib_send_wr *bad_wr;
+	struct ib_rdma_wr wr;
+	int i;
+
+	wr.wr.next        = NULL;
+	wr.wr.wr_id       = wr_id;
+	wr.wr.sg_list     = sge;
+	wr.wr.num_sge     = num_sge;
+	wr.rkey           = rkey;
+	wr.remote_addr    = rdma_addr;
+	wr.wr.opcode      = opcode;
+	wr.wr.ex.imm_data = cpu_to_be32(imm_data);
+	wr.wr.send_flags  = flags;
+
+	/* if one of the sges has 0 size, the operation will fail with a
+	 * length error
+	 */
+	for (i = 0; i < num_sge; i++)
+		if (WARN_ON(sge[i].length == 0))
+			return -EINVAL;
+
+	return ib_post_send(qp, &wr.wr, &bad_wr);
+}
+
+inline int ib_post_rdma_write(struct ib_qp *qp, struct ib_sge *sge,
+			      unsigned int num_sge, u32 rkey, u64 rdma_addr,
+			      u64 wr_id)
+{
+	return post_rdma_write(qp, sge, num_sge, rkey, rdma_addr, wr_id,
+			       0, IB_WR_RDMA_WRITE, 0);
+}
+
+inline int ib_post_rdma_write_imm(struct ib_qp *qp, struct ib_sge *sge,
+				  unsigned int num_sge, u32 rkey, u64 rdma_addr,
+				  u64 wr_id, u32 imm_data,
+				  enum ib_send_flags flags)
+{
+	return post_rdma_write(qp, sge, num_sge, rkey, rdma_addr, wr_id,
+			       imm_data, IB_WR_RDMA_WRITE_WITH_IMM, flags);
+}
+
+int ib_get_max_wr_queue_size(struct ib_device *dev)
+{
+	struct ib_device_attr *attr = &dev->attrs;
+
+	return attr->max_qp_wr;
+}
+
+static const char *ib_event_str(enum ib_event_type ev)
+{
+	switch (ev) {
+	case IB_EVENT_CQ_ERR:
+		return "IB_EVENT_CQ_ERR";
+	case IB_EVENT_QP_FATAL:
+		return "IB_EVENT_QP_FATAL";
+	case IB_EVENT_QP_REQ_ERR:
+		return "IB_EVENT_QP_REQ_ERR";
+	case IB_EVENT_QP_ACCESS_ERR:
+		return "IB_EVENT_QP_ACCESS_ERR";
+	case IB_EVENT_COMM_EST:
+		return "IB_EVENT_COMM_EST";
+	case IB_EVENT_SQ_DRAINED:
+		return "IB_EVENT_SQ_DRAINED";
+	case IB_EVENT_PATH_MIG:
+		return "IB_EVENT_PATH_MIG";
+	case IB_EVENT_PATH_MIG_ERR:
+		return "IB_EVENT_PATH_MIG_ERR";
+	case IB_EVENT_DEVICE_FATAL:
+		return "IB_EVENT_DEVICE_FATAL";
+	case IB_EVENT_PORT_ACTIVE:
+		return "IB_EVENT_PORT_ACTIVE";
+	case IB_EVENT_PORT_ERR:
+		return "IB_EVENT_PORT_ERR";
+	case IB_EVENT_LID_CHANGE:
+		return "IB_EVENT_LID_CHANGE";
+	case IB_EVENT_PKEY_CHANGE:
+		return "IB_EVENT_PKEY_CHANGE";
+	case IB_EVENT_SM_CHANGE:
+		return "IB_EVENT_SM_CHANGE";
+	case IB_EVENT_SRQ_ERR:
+		return "IB_EVENT_SRQ_ERR";
+	case IB_EVENT_SRQ_LIMIT_REACHED:
+		return "IB_EVENT_SRQ_LIMIT_REACHED";
+	case IB_EVENT_QP_LAST_WQE_REACHED:
+		return "IB_EVENT_QP_LAST_WQE_REACHED";
+	case IB_EVENT_CLIENT_REREGISTER:
+		return "IB_EVENT_CLIENT_REREGISTER";
+	case IB_EVENT_GID_CHANGE:
+		return "IB_EVENT_GID_CHANGE";
+	default:
+		return "Unknown IB event";
+	}
+}
+
+static void ib_event_handler(struct ib_event_handler *h, struct ib_event *ev)
+{
+	switch (ev->event) {
+	case IB_EVENT_DEVICE_FATAL:
+	case IB_EVENT_PORT_ERR:
+		WRN_NP("Received IB event %s (%d) on device %s port %d\n",
+		       ib_event_str(ev->event), ev->event,
+		       ev->device->name, ev->element.port_num);
+		break;
+	default:
+		INFO_NP("Received IB event %s (%d) on device %s port %d\n",
+			ib_event_str(ev->event), ev->event,
+			ev->device->name, ev->element.port_num);
+		break;
+	}
+}
+
+static void qp_event_handler(struct ib_event *ev, void *ctx)
+{
+	struct ib_con *con = ctx;
+
+	switch (ev->event) {
+	case IB_EVENT_COMM_EST:
+		INFO(con, "QP event %s (%d) received\n",
+		     ib_event_str(ev->event), ev->event);
+		rdma_notify(con->cm_id, IB_EVENT_COMM_EST);
+		break;
+	default:
+		INFO(con, "Unhandled QP event %s (%d) received\n",
+		     ib_event_str(ev->event), ev->event);
+		break;
+	}
+}
+
+static void cq_event_handler(struct ib_event *ev, void *ctx)
+{
+	INFO_NP("CQ event %s (%d)\n", ib_event_str(ev->event), ev->event);
+}
+
+int ib_session_init(struct ib_device *dev, struct ib_session *s)
+{
+	int err;
+
+	s->pd = ib_alloc_pd(dev, IB_PD_UNSAFE_GLOBAL_RKEY);
+	if (IS_ERR(s->pd)) {
+		ERR_NP("Allocating protection domain failed, errno: %ld\n",
+		       PTR_ERR(s->pd));
+		return PTR_ERR(s->pd);
+	}
+	s->mr = s->pd->__internal_mr;
+	INIT_IB_EVENT_HANDLER(&s->event_handler, dev, ib_event_handler);
+	err = ib_register_event_handler(&s->event_handler);
+	if (err) {
+		ERR_NP("Registering IB event handler failed, errno: %d\n",
+		       err);
+		goto err;
+	}
+
+	return 0;
+
+err:
+	ib_dealloc_pd(s->pd);
+	s->pd = NULL;
+	s->mr = NULL;
+
+	return err;
+}
+
+static int init_cq(struct ib_con *con, struct rdma_cm_id *cm_id,
+		   ib_comp_handler comp_handler, void *ctx, int cq_vector,
+		   u16 cq_size)
+{
+	struct ib_cq_init_attr cq_attr = {};
+
+	cq_attr.cqe = cq_size * 2 + 1; /* + 1 for the beacon */
+	cq_attr.comp_vector = cq_vector;
+
+	con->cq = ib_create_cq(cm_id->device, comp_handler, cq_event_handler,
+			       ctx, &cq_attr);
+	if (IS_ERR(con->cq)) {
+		ERR(con, "Creating completion queue failed, errno: %ld\n",
+		    PTR_ERR(con->cq));
+		return PTR_ERR(con->cq);
+	}
+
+	return 0;
+}
+
+inline int ibtrs_request_cq_notifications(struct ib_con *con)
+{
+	return ib_req_notify_cq(con->cq, IB_CQ_NEXT_COMP |
+				IB_CQ_REPORT_MISSED_EVENTS);
+}
+
+void ib_con_destroy(struct ib_con *con)
+{
+	int err;
+
+	err = ib_destroy_qp(con->qp);
+	if (err)
+		ERR(con, "Destroying QP failed, errno: %d\n",
+		    err);
+
+	err = ib_destroy_cq(con->cq);
+	if (err)
+		ERR(con, "Destroying CQ failed, errno: %d\n",
+		    err);
+}
+
+static int create_qp(struct ib_con *con, struct rdma_cm_id *cm_id,
+		     struct ib_pd *pd, u16 wr_queue_size, u32 max_send_sge)
+{
+	struct ib_qp_init_attr init_attr = { };
+	int ret;
+
+	init_attr.cap.max_send_wr = wr_queue_size + 1; /* + 1 for the beacon */
+	init_attr.cap.max_recv_wr = wr_queue_size;
+	init_attr.cap.max_recv_sge = 2;
+	init_attr.event_handler = qp_event_handler;
+	init_attr.qp_context = con;
+	init_attr.cap.max_send_sge = max_send_sge;
+
+	init_attr.qp_type = IB_QPT_RC;
+	init_attr.send_cq = con->cq;
+	init_attr.recv_cq = con->cq;
+	init_attr.sq_sig_type = IB_SIGNAL_REQ_WR;
+
+	ret = rdma_create_qp(cm_id, pd, &init_attr);
+	if (ret) {
+		ERR(con, "Creating QP failed, errno: %d\n", ret);
+		return ret;
+	}
+
+	con->qp = cm_id->qp;
+	return ret;
+}
+
+int post_beacon(struct ib_con *con)
+{
+	struct ib_send_wr *bad_wr;
+
+	return ib_post_send(con->qp, &con->beacon, &bad_wr);
+}
+
+int ib_con_init(struct ib_con *con, struct rdma_cm_id *cm_id,
+		u32 max_send_sge,
+		ib_comp_handler comp_handler, void *ctx, int cq_vector,
+		u16 cq_size, u16 wr_queue_size, struct ib_session *session)
+{
+	int err, ret;
+
+	err = init_cq(con, cm_id, comp_handler, ctx,
+		      cq_vector, cq_size);
+	if (err)
+		return err;
+
+	err = create_qp(con, cm_id, session->pd, wr_queue_size, max_send_sge);
+	if (err) {
+		ret = ib_destroy_cq(con->cq);
+		if (ret)
+			ERR(con, "Destroying CQ failed, errno: %d\n",
+			    ret);
+		return err;
+	}
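+	/* beacon: extra marker send WR posted during teardown, see post_beacon() */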
+	con->beacon.wr_id = (uintptr_t)&con->beacon;
+	con->beacon.opcode = IB_WR_SEND;
+	con->cm_id = cm_id;
+
+	return 0;
+}
+
+void ib_session_destroy(struct ib_session *session)
+{
+	ib_unregister_event_handler(&session->event_handler);
+
+	if (session->pd) {
+		ib_dealloc_pd(session->pd);
+		session->pd = NULL;
+		session->mr = NULL;
+	}
+}
+
+int ibtrs_addr_to_str(const struct sockaddr_storage *addr, char *buf,
+		      size_t len)
+{
+	switch (addr->ss_family) {
+	case AF_IB:
+		return scnprintf(buf, len, "gid:%pI6",
+				 &((struct sockaddr_ib *)
+				   addr)->sib_addr.sib_raw);
+	case AF_INET:
+		return scnprintf(buf, len, "ip:%pI4",
+				 &((struct sockaddr_in *)addr)->sin_addr);
+	case AF_INET6:
+		/* workaround for ip4 client addr being set to INET6 family.
+		 * This should fix it:
+		 * yotamke@mellanox.com: [PATCH for-next] RDMA/CMA: Mark
+		 * IPv4 addresses correctly when the listener is IPv6]
+		 * http://permalink.gmane.org/gmane.linux.drivers.rdma/22395
+		 *
+		 * The first byte of ip6 address can't be 0. If it is, assume
+		 * structure addr actually contains ip4 address.
+		 */
+		if (!((struct sockaddr_in6 *)addr)->sin6_addr.s6_addr[0]) {
+			return scnprintf(buf, len, "ip:%pI4",
+					 &((struct sockaddr_in *)
+					   addr)->sin_addr);
+		}
+		/* end of workaround*/
+		return scnprintf(buf, len, "ip:%pI6c",
+				 &((struct sockaddr_in6 *)addr)->sin6_addr);
+	default:
+		ERR_NP("Invalid address family\n");
+		return -EINVAL;
+	}
+}
+EXPORT_SYMBOL(ibtrs_addr_to_str);
+
+int ibtrs_heartbeat_timeout_validate(int timeout)
+{
+	if (timeout && timeout < MIN_HEARTBEAT_TIMEOUT_MS) {
+		WRN_NP("Heartbeat timeout: %d is invalid, must be 0 "
+		       "or >= %d ms\n", timeout, MIN_HEARTBEAT_TIMEOUT_MS);
+		return -EINVAL;
+	}
+
+	return 0;
+}
diff --git a/drivers/infiniband/ulp/ibtrs_lib/iu.c b/drivers/infiniband/ulp/ibtrs_lib/iu.c
new file mode 100644
index 0000000..f9102b3
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs_lib/iu.c
@@ -0,0 +1,113 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler <mail@fholler.de>
+ *          Jack Wang <jinpu.wang@profitbricks.com>
+ *   	    Kleber Souza <kleber.souza@profitbricks.com>
+ * 	    Danil Kipnis <danil.kipnis@profitbricks.com>
+ *   	    Roman Pen <roman.penyaev@profitbricks.com>
+ *          Milind Dumbare <Milind.dumbare@gmail.com>
+ *
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions, and the following disclaimer,
+ *    without modification.
+ * 2. Redistributions in binary form must reproduce at minimum a disclaimer
+ *    substantially similar to the "NO WARRANTY" disclaimer below
+ *    ("Disclaimer") and any redistribution must be conditioned upon
+ *    including a substantially similar Disclaimer requirement for further
+ *    binary redistribution.
+ * 3. Neither the names of the above-listed copyright holders nor the names
+ *    of any contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * NO WARRANTY
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * HOLDERS OR CONTRIBUTORS BE LIABLE FOR SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
+ * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGES.
+ *
+ */
+
+#include <linux/slab.h>
+#include <rdma/ibtrs.h>
+
+/*
+ * Return an IU to the free pool
+ */
+inline void ibtrs_iu_put(struct list_head *head, struct ibtrs_iu *iu)
+{
+	list_add(&iu->list, head);
+}
+
+/*
+ * Get an IU from the free pool; the caller must hold a lock protecting
+ * the list
+ */
+struct ibtrs_iu *ibtrs_iu_get(struct list_head *head)
+{
+	struct ibtrs_iu *iu;
+
+	if (list_empty(head))
+		return NULL;
+
+	iu = list_first_entry(head, struct ibtrs_iu, list);
+	list_del(&iu->list);
+	return iu;
+}
+
+struct ibtrs_iu *ibtrs_iu_alloc(u32 tag, size_t size, gfp_t gfp_mask,
+				struct ib_device *dma_dev,
+				enum dma_data_direction direction, bool is_msg)
+{
+	struct ibtrs_iu *iu;
+
+	iu = kmalloc(sizeof(*iu), gfp_mask);
+	if (!iu)
+		return NULL;
+
+	iu->buf = kzalloc(size, gfp_mask);
+	if (!iu->buf)
+		goto err1;
+
+	iu->dma_addr = ib_dma_map_single(dma_dev, iu->buf, size, direction);
+	if (ib_dma_mapping_error(dma_dev, iu->dma_addr))
+		goto err2;
+
+	iu->size      = size;
+	iu->direction = direction;
+	iu->tag       = tag;
+	iu->is_msg    = is_msg;
+	return iu;
+
+err2:
+	kfree(iu->buf);
+err1:
+	kfree(iu);
+	return NULL;
+}
+
+void ibtrs_iu_free(struct ibtrs_iu *iu, enum dma_data_direction dir,
+		   struct ib_device *ib_dev)
+{
+	if (WARN_ON(!iu))
+		return;
+
+	ib_dma_unmap_single(ib_dev, iu->dma_addr, iu->size, dir);
+	kfree(iu->buf);
+	kfree(iu);
+}
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH 04/28] ibtrs_clt: add header file for exported interface
  2017-03-24 10:45 ` Jack Wang
                   ` (3 preceding siblings ...)
  (?)
@ 2017-03-24 10:45 ` Jack Wang
  -1 siblings, 0 replies; 87+ messages in thread
From: Jack Wang @ 2017-03-24 10:45 UTC (permalink / raw)
  To: linux-block, linux-rdma
  Cc: dledford, axboe, hch, mail, Milind.dumbare, yun.wang, Jack Wang,
	Kleber Souza, Danil Kipnis, Roman Pen

From: Jack Wang <jinpu.wang@profitbricks.com>

A user module, e.g. ibnbd_client in a later patch, will use this
interface to transfer data.
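
For illustration, a rough sketch of how a user module is expected to
use this interface (not part of the patch; the callbacks, the address
setup and the numeric values are placeholders):

	static const struct ibtrs_clt_ops ops = {
		.owner   = THIS_MODULE,
		.rdma_ev = my_rdma_ev,	/* hypothetical callbacks */
		.sess_ev = my_sess_ev,
		.recv    = my_recv,
	};

	err  = ibtrs_clt_register(&ops);
	sess = ibtrs_clt_open(&addr, pdu_sz, priv,
			      5 /* reconnect delay, sec */,
			      31 /* max_segments */,
			      -1 /* reconnect forever */);

	tag = ibtrs_get_tag(sess, smp_processor_id(), nr_bytes,
			    IBTRS_TAG_WAIT);
	err = ibtrs_clt_rdma_write(sess, tag, priv, vec, nr,
				   data_len, sg, sg_len);
	...
	ibtrs_put_tag(sess, tag);
	ibtrs_clt_close(sess);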

Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
Signed-off-by: Kleber Souza <kleber.souza@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kipnis@profitbricks.com>
Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
---
 include/rdma/ibtrs_clt.h | 316 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 316 insertions(+)
 create mode 100644 include/rdma/ibtrs_clt.h

diff --git a/include/rdma/ibtrs_clt.h b/include/rdma/ibtrs_clt.h
new file mode 100644
index 0000000..4fc9b12
--- /dev/null
+++ b/include/rdma/ibtrs_clt.h
@@ -0,0 +1,316 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler <mail@fholler.de>
+ *          Jack Wang <jinpu.wang@profitbricks.com>
+ *   	    Kleber Souza <kleber.souza@profitbricks.com>
+ * 	    Danil Kipnis <danil.kipnis@profitbricks.com>
+ *   	    Roman Pen <roman.penyaev@profitbricks.com>
+ *          Milind Dumbare <Milind.dumbare@gmail.com>
+ *
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions, and the following disclaimer,
+ *    without modification.
+ * 2. Redistributions in binary form must reproduce at minimum a disclaimer
+ *    substantially similar to the "NO WARRANTY" disclaimer below
+ *    ("Disclaimer") and any redistribution must be conditioned upon
+ *    including a substantially similar Disclaimer requirement for further
+ *    binary redistribution.
+ * 3. Neither the names of the above-listed copyright holders nor the names
+ *    of any contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * NO WARRANTY
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * HOLDERS OR CONTRIBUTORS BE LIABLE FOR SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
+ * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGES.
+ *
+ */
+
+#if !defined(IBTRS_CLIENT_H)
+#define IBTRS_CLIENT_H
+
+#include <linux/scatterlist.h>
+
+struct ibtrs_session;
+
+/**
+ * ibtrs_clt_open() - Open a session to a ibtrs_server
+ * @addr: The IPv4, IPv6 or GID address of the peer
+ * @pdu_sz: Size of extra payload which can be accessed after tag allocation.
+ * @priv: Pointer passed back on &ibtrs_clt_ops->sess_ev() invocation
+ * @reconnect_delay_sec: Time in seconds between reconnect tries
+ * @max_segments: Max. number of segments per IO request
+ * @max_reconnect_attempts: Number of times to reconnect on error before giving
+ *			    up, 0 for disabled, -1 for forever
+ *
+ * Starts session establishment with the ibtrs_server. The function can block
+ * up to ~2000ms until it returns.
+ *
+ * Return a valid pointer on success otherwise PTR_ERR.
+ * -EINVAL:	The provided addr could not be resolved to an InfiniBand
+ *		address, the route to the host could not be resolved or
+ *		ibtrs_clt_register() was not called before.
+ */
+struct ibtrs_session *ibtrs_clt_open(const struct sockaddr_storage *addr,
+				     size_t pdu_sz, void *priv,
+				     u8 reconnect_delay_sec, u16 max_segments,
+				     s16 max_reconnect_attempts);
+
+/**
+ * ibtrs_clt_close() - Close a session
+ * @sess: Session handler, is freed on return
+ */
+int ibtrs_clt_close(struct ibtrs_session *sess);
+
+/**
+ * enum ibtrs_clt_rdma_ev - Events related to RDMA transfer operations
+ */
+enum ibtrs_clt_rdma_ev {
+	IBTRS_CLT_RDMA_EV_RDMA_REQUEST_WRITE_COMPL,
+	IBTRS_CLT_RDMA_EV_RDMA_WRITE_COMPL,
+};
+
+/**
+ * enum ibtrs_sess_ev - Events about connectivity state of a session
+ * @IBTRS_CLT_SESS_EV_RECONNECT:	The session was reconnected.
+ * @IBTRS_CLT_SESS_EV_DISCONNECTED:	The session was disconnected.
+ * @IBTRS_CLT_SESS_EV_MAX_RECONN_EXCEEDED: Reconnect attempts stopped because
+ *					  the max. number of reconnect attempts
+ *					  has been reached.
+ */
+enum ibtrs_clt_sess_ev {
+	IBTRS_CLT_SESS_EV_RECONNECT,
+	IBTRS_CLT_SESS_EV_DISCONNECTED,
+	IBTRS_CLT_SESS_EV_MAX_RECONN_EXCEEDED,
+};
+
+/**
+ * struct ibtrs_clt_ops - Callback functions of the user module
+ * @owner:		module that uses ibtrs_server
+ * @rdma_ev:		Event notifications for RDMA operations,
+ *      Context: in interrupt (soft irq). The function should be relatively fast.
+ *	@priv:			user supplied data that was passed to
+ *				ibtrs_clt_request_rdma_write() or
+ *				ibtrs_clt_rdma_write() before
+ *	@ev:			Occurred event
+ *	@errno:			Result of corresponding operation,
+ *				0 on success or negative ERRNO code on error
+ * @sess_ev:		Event notification for connection state changes
+ *	@priv:			user supplied data that was passed to
+ *				ibtrs_clt_open()
+ *	@ev:			Occurred event
+ *	@errno:			Result of corresponding operation,
+ *				0 on success or negative ERRNO code on error
+ * @recv:		Event notification for InfiniBand message reception
+ *	@priv:			user supplied data that was passed to
+ *				ibtrs_clt_open()
+ *	@msg:			Received data
+ *	@len:			Size of the @msg buffer
+ *
+ * The @recv and @rdma_ev are running on the same CPU that requested the RDMA
+ * operation before.
+ */
+
+typedef void (rdma_clt_ev_fn)(void *priv, enum ibtrs_clt_rdma_ev ev, int errno);
+typedef void (sess_clt_ev_fn)(void *priv, enum ibtrs_clt_sess_ev ev, int errno);
+typedef void (recv_clt_fn)(void *priv, const void *msg, size_t len);
+
+struct ibtrs_clt_ops {
+	struct module		*owner;
+	rdma_clt_ev_fn		*rdma_ev;
+	sess_clt_ev_fn		*sess_ev;
+	recv_clt_fn		*recv;
+};
+
+/**
+ * ibtrs_clt_register() - register a user module with ibtrs_client
+ * @ops:	callback functions to register
+ *
+ * Return:
+ * 0:		Success
+ * -ENOTSUPP:	Registration failed, max. number of supported user modules
+ *		reached
+ */
+int ibtrs_clt_register(const struct ibtrs_clt_ops *ops);
+
+/**
+ * ibtrs_clt_unregister() - unregister a module at ibtrs_client
+ * @ops:	struct that was passed before to ibtrs_clt_register()
+ *
+ * ibtrs_clt_unregister() must only be called after all session that were
+ * created by the user module were closed.
+ */
+void ibtrs_clt_unregister(const struct ibtrs_clt_ops *ops);
+
+enum {
+	IBTRS_TAG_NOWAIT = 0,
+	IBTRS_TAG_WAIT   = 1,
+};
+
+/**
+ * struct ibtrs_tag - tags the memory allocation for a future RDMA operation
+ */
+struct ibtrs_tag {
+	unsigned int cpu_id;
+	unsigned int mem_id;
+	unsigned int mem_id_mask;
+};
+
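+/* the user PDU is assumed to be laid out immediately after struct ibtrs_tag */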
+static inline struct ibtrs_tag *ibtrs_tag_from_pdu(void *pdu)
+{
+	return pdu - sizeof(struct ibtrs_tag);
+}
+
+static inline void *ibtrs_tag_to_pdu(struct ibtrs_tag *tag)
+{
+	return tag + 1;
+}
+
+/**
+ * ibtrs_get_tag() - allocates tag for future RDMA operation
+ * @sess:	Current session
+ * @cpu_id:	cpu_id to run
+ * @nr_bytes:	Number of bytes to consume per tag
+ * @wait:	Wait type
+ *
+ * Description:
+ *    Allocates tag for the following RDMA operation.  Tag is used
+ *    to preallocate all resources and to propagate memory pressure
+ *    up earlier.
+ *
+ * Context:
+ *    Can sleep if @wait == IBTRS_TAG_WAIT
+ */
+struct ibtrs_tag *ibtrs_get_tag(struct ibtrs_session *sess, int cpu_id,
+				size_t nr_bytes, int wait);
+
+/**
+ * ibtrs_put_tag() - puts allocated tag
+ * @sess:	Current session
+ * @tag:	Tag to be freed
+ *
+ * Context:
+ *    Does not matter
+ */
+void ibtrs_put_tag(struct ibtrs_session *sess, struct ibtrs_tag *tag);
+
+/**
+ * ibtrs_clt_rdma_write() - Transfer data to the server via RDMA.
+ * @sess:	Session
+ * @tag:	Preallocated tag
+ * @priv:	User provided data, passed back on corresponding
+ *		@ibtrs_clt_ops->rdma_ev() event
+ * @vec:	User module message to transfer together with @sg.
+ *		Sum of len of all @vec elements limited to <= IO_MSG_SIZE
+ * @nr:		Number of elements in @vec.
+ * @data_len:	Size of data in @sg
+ * @sg:		Data to be transferred, 512B aligned in the receiver's memory
+ * @sg_len:	number of elements in @sg array
+ *
+ * Return:
+ * 0:		Success
+ * <0:		Error
+ *
+ * On completion of the operation a %IBTRS_CLT_RDMA_EV_RDMA_WRITE_COMPL is
+ * generated. If an error happened on IBTRS layer for this operation a
+ * %IBTRS_CLT_RDMA_EV_ERROR is generated.
+ */
+int ibtrs_clt_rdma_write(struct ibtrs_session *sess, struct ibtrs_tag *tag,
+			 void *priv, const struct kvec *vec, size_t nr,
+			 size_t data_len, struct scatterlist *sg,
+			 unsigned int sg_len);
+
+/**
+ * ibtrs_clt_request_rdma_write() - Request data transfer from server via RDMA.
+ *
+ * @sess:	Session
+ * @tag:	Preallocated tag
+ * @priv:	User provided data, passed back on corresponding
+ *		@ibtrs_clt_ops->rdma_ev() event
+ * @vec:	Message that is sent to the server together with the request.
+ *		Sum of len of all @vec elements limited to <= IO_MSG_SIZE.
+ * @nr:		Number of elements in @vec.
+ * @result_len: Max. length of data that ibtrs_server will send back
+ * @recv_sg:	Pages in which the response of the server will be stored.
+ * @recv_sg_len: Number of elements in the @recv_sg
+ *
+ * Return:
+ * 0:		Success
+ * <0:		Error
+ *
+ * IBTRS Client will request a data transfer from Server to Client via RDMA.
+ * The data that the server will respond with will be stored in @recv_sg when
+ * the user receives an %IBTRS_CLT_RDMA_EV_RDMA_REQUEST_WRITE_COMPL event.
+ * If an error occurred on the IBTRS layer a %IBTRS_CLT_RDMA_EV_ERROR is
+ * generated instead.
+ */
+int ibtrs_clt_request_rdma_write(struct ibtrs_session *sess,
+				 struct ibtrs_tag *tag, void *priv,
+				 const struct kvec *vec, size_t nr,
+				 size_t result_len,
+				 struct scatterlist *recv_sg,
+				 unsigned int recv_sg_len);
+
+/**
+ * ibtrs_clt_send() - Send data to the server via an InfiniBand message.
+ * @sess:	Session
+ * @vec:	Data to transfer
+ * @nr:		Number of elements in @vec
+ *
+ * Return:
+ * 0:		Success
+ * <0:		Error:
+ *		-ECOMM		no connection to the server
+ *		-EINVAL		message size too big (500 bytes max)
+ *		-EAGAIN		run out of tx buffers - try again later
+ *		-<IB ERROR>	see mlx doc
+ *
+ * The operation is not confirmed. It is the responsibility of the user on the
+ * other side to send an acknowledgment if required.
+ */
+int ibtrs_clt_send(struct ibtrs_session *sess, const struct kvec *vec,
+		   size_t nr);
+
+/**
+ * ibtrs_attrs - IBTRS session attributes
+ */
+struct ibtrs_attrs {
+	u32	queue_depth;
+	u64	mr_page_mask;
+	u32	mr_page_size;
+	u32	mr_max_size;
+	u32	max_pages_per_mr;
+	u32	max_sge;
+	u32	max_io_size;
+	u8	hostname[MAXHOSTNAMELEN];
+};
+
+/**
+ * ibtrs_clt_query() - queries IBTRS session attributes
+ * @sess:	Session to query
+ * @attr:	Output buffer for the attributes
+ *
+ * Returns:
+ *    0 on success
+ *    -ECOMM		no connection to the server
+ */
+int ibtrs_clt_query(struct ibtrs_session *sess, struct ibtrs_attrs *attr);
+
+#endif
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH 05/28] ibtrs_clt: main functionality of ibtrs_client
@ 2017-03-24 10:45   ` Jack Wang
  0 siblings, 0 replies; 87+ messages in thread
From: Jack Wang @ 2017-03-24 10:45 UTC (permalink / raw)
  To: linux-block, linux-rdma
  Cc: dledford, axboe, hch, mail, Milind.dumbare, yun.wang, Jack Wang,
	Kleber Souza, Danil Kipnis, Roman Pen

From: Jack Wang <jinpu.wang@profitbricks.com>

ibtrs_client establishes a connection to a server and executes
RDMA operations requested by ibnbd_client.

Upon connection establishment, server and client exchange memory
information: the server reserves enough memory to hold queue_depth
requests of maximum IO size for that particular client. The client
is then solely responsible for managing this memory.

We rely heavily on RDMA Write with immediate data (IMM) to reduce the
number of InfiniBand messages per IO and thus lower latency.
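
Roughly, the hot path maps onto the ibtrs_lib helpers from patch 03 as
sketched below (illustrative only; the actual imm_data encoding, buffer
selection and completion handling are implemented in this file):

	/* client: push request header and payload into the server-side
	 * buffer owned by this tag, a single RDMA message */
	err = ib_post_rdma_write_imm(qp, sge, num_sge, rkey, remote_addr,
				     wr_id, imm_data, IB_SEND_SIGNALED);

	/* server: once the block layer completes the IO, answer with a
	 * single message carrying the result in imm_data */
	err = ibtrs_write_empty_imm(qp, imm_data, IB_SEND_SIGNALED);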

Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
Signed-off-by: Kleber Souza <kleber.souza@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kipnis@profitbricks.com>
Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
---
 drivers/infiniband/ulp/ibtrs_client/ibtrs_clt.c | 5329 +++++++++++++++++++++++
 1 file changed, 5329 insertions(+)
 create mode 100644 drivers/infiniband/ulp/ibtrs_client/ibtrs_clt.c

diff --git a/drivers/infiniband/ulp/ibtrs_client/ibtrs_clt.c b/drivers/infiniband/ulp/ibtrs_client/ibtrs_clt.c
new file mode 100644
index 0000000..d34d468
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs_client/ibtrs_clt.c
@@ -0,0 +1,5329 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler <mail@fholler.de>
+ *          Jack Wang <jinpu.wang@profitbricks.com>
+ *   	    Kleber Souza <kleber.souza@profitbricks.com>
+ * 	    Danil Kipnis <danil.kipnis@profitbricks.com>
+ *   	    Roman Pen <roman.penyaev@profitbricks.com>
+ *          Milind Dumbare <Milind.dumbare@gmail.com>
+ *
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions, and the following disclaimer,
+ *    without modification.
+ * 2. Redistributions in binary form must reproduce at minimum a disclaimer
+ *    substantially similar to the "NO WARRANTY" disclaimer below
+ *    ("Disclaimer") and any redistribution must be conditioned upon
+ *    including a substantially similar Disclaimer requirement for further
+ *    binary redistribution.
+ * 3. Neither the names of the above-listed copyright holders nor the names
+ *    of any contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * NO WARRANTY
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * HOLDERS OR CONTRIBUTORS BE LIABLE FOR SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
+ * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGES.
+ *
+ */
+
+#include <linux/module.h>
+#include <linux/wait.h>
+#include <linux/scatterlist.h>
+#include <linux/random.h>
+#include <linux/uuid.h>
+#include <linux/utsname.h>
+#include <rdma/rdma_cm.h>
+#include <rdma/ib_cm.h>
+#include <rdma/ib_fmr_pool.h>
+#include <rdma/ib.h>
+#include <rdma/ibtrs_clt.h>
+#include "ibtrs_clt_internal.h"
+#include "ibtrs_clt_sysfs.h"
+#include <rdma/ibtrs.h>
+#include <rdma/ibtrs_log.h>
+#include <linux/list.h>
+
+#define CONS_PER_SESSION (nr_cpu_ids + 1)
+#define RECONNECT_SEED 8
+#define MAX_SEGMENTS 31
+
+MODULE_AUTHOR("ibnbd@profitbricks.com");
+MODULE_DESCRIPTION("InfiniBand Transport Client");
+MODULE_VERSION(__stringify(IBTRS_VER));
+MODULE_LICENSE("GPL");
+
+static bool use_fr;
+module_param(use_fr, bool, 0444);
+MODULE_PARM_DESC(use_fr, "use FRWR mode for memory registration if possible."
+		 " (default: 0)");
+
+static int retry_count = 7;
+
+static int retry_count_set(const char *val, const struct kernel_param *kp)
+{
+	int err, ival;
+
+	err = kstrtoint(val, 0, &ival);
+	if (err)
+		return err;
+
+	if (ival < MIN_RTR_CNT || ival > MAX_RTR_CNT)
+		return -EINVAL;
+
+	retry_count = ival;
+
+	return 0;
+}
+
+static const struct kernel_param_ops retry_count_ops = {
+	.set		= retry_count_set,
+	.get		= param_get_int,
+};
+module_param_cb(retry_count, &retry_count_ops, &retry_count, 0644);
+
+MODULE_PARM_DESC(retry_count, "Number of times to send the message if the"
+		 " remote side didn't respond with Ack or Nack (default: 7,"
+		 " min: " __stringify(MIN_RTR_CNT) ", max: "
+		 __stringify(MAX_RTR_CNT) ")");
+
+static int fmr_sg_cnt = 4;
+module_param_named(fmr_sg_cnt, fmr_sg_cnt, int, 0644);
+MODULE_PARM_DESC(fmr_sg_cnt, "when sg_cnt is bigger than fmr_sg_cnt, enable"
+		 " FMR (default: 4)");
+
+static int default_heartbeat_timeout_ms = DEFAULT_HEARTBEAT_TIMEOUT_MS;
+
+static int default_heartbeat_timeout_set(const char *val,
+					 const struct kernel_param *kp)
+{
+	int ret, ival;
+
+	ret = kstrtoint(val, 0, &ival);
+	if (ret) {
+		ERR_NP("Failed to convert string '%s' to int\n", val);
+		return ret;
+	}
+
+	ret = ibtrs_heartbeat_timeout_validate(ival);
+	if (ret)
+		return ret;
+
+	default_heartbeat_timeout_ms = ival;
+
+	return 0;
+}
+
+static const struct kernel_param_ops heartbeat_timeout_ops = {
+	.set		= default_heartbeat_timeout_set,
+	.get		= param_get_int,
+};
+
+module_param_cb(default_heartbeat_timeout_ms, &heartbeat_timeout_ops,
+		&default_heartbeat_timeout_ms, 0644);
+MODULE_PARM_DESC(default_heartbeat_timeout_ms, "default heartbeat timeout,"
+		 " min: " __stringify(MIN_HEARTBEAT_TIMEOUT_MS)
+		 " (default:" __stringify(DEFAULT_HEARTBEAT_TIMEOUT_MS) ")");
+
+static char hostname[MAXHOSTNAMELEN] = "";
+
+static int hostname_set(const char *val, const struct kernel_param *kp)
+{
+	int ret = 0, len = strlen(val);
+
+	if (len >= sizeof(hostname))
+		return -EINVAL;
+	strlcpy(hostname, val, sizeof(hostname));
+	*strchrnul(hostname, '\n') = '\0';
+
+	INFO_NP("hostname changed to %s\n", hostname);
+	return ret;
+}
+
+static struct kparam_string hostname_kparam_str = {
+	.maxlen	= sizeof(hostname),
+	.string	= hostname
+};
+
+static const struct kernel_param_ops hostname_ops = {
+	.set	= hostname_set,
+	.get	= param_get_string,
+};
+
+module_param_cb(hostname, &hostname_ops,
+		&hostname_kparam_str, 0644);
+MODULE_PARM_DESC(hostname, "Sets the hostname of the local server; if set, it"
+		 " is sent to the other side and displayed together with the"
+		 " addr (default: empty)");
+
+#define LOCAL_INV_WR_ID_MASK	1
+#define	FAST_REG_WR_ID_MASK	2
+
+static const struct ibtrs_clt_ops *clt_ops;
+static struct workqueue_struct *ibtrs_wq;
+static LIST_HEAD(sess_list);
+static DEFINE_MUTEX(sess_mutex);
+
+static uuid_le uuid;
+
+enum csm_state {
+	_CSM_STATE_MIN,
+	CSM_STATE_RESOLVING_ADDR,
+	CSM_STATE_RESOLVING_ROUTE,
+	CSM_STATE_CONNECTING,
+	CSM_STATE_CONNECTED,
+	CSM_STATE_CLOSING,
+	CSM_STATE_FLUSHING,
+	CSM_STATE_CLOSED,
+	_CSM_STATE_MAX
+};
+
+enum csm_ev {
+	CSM_EV_ADDR_RESOLVED,
+	CSM_EV_ROUTE_RESOLVED,
+	CSM_EV_CON_ESTABLISHED,
+	CSM_EV_SESS_CLOSING,
+	CSM_EV_CON_DISCONNECTED,
+	CSM_EV_BEACON_COMPLETED,
+	CSM_EV_WC_ERROR,
+	CSM_EV_CON_ERROR
+};
+
+enum ssm_ev {
+	SSM_EV_CON_CONNECTED,
+	SSM_EV_RECONNECT,		/* in RECONNECT state only */
+	SSM_EV_RECONNECT_USER,		/* triggered by user via sysfs */
+	SSM_EV_RECONNECT_HEARTBEAT,	/* triggered by the heartbeat */
+	SSM_EV_SESS_CLOSE,
+	SSM_EV_CON_CLOSED,		/* when CSM switched to CLOSED */
+	SSM_EV_CON_ERROR,		/* triggered by CSM when smth. wrong */
+	SSM_EV_ALL_CON_CLOSED,		/* triggered when all cons closed */
+	SSM_EV_GOT_RDMA_INFO
+};
+
+static const char *ssm_state_str(enum ssm_state state)
+{
+	switch (state) {
+	case SSM_STATE_IDLE:
+		return "SSM_STATE_IDLE";
+	case SSM_STATE_IDLE_RECONNECT:
+		return "SSM_STATE_IDLE_RECONNECT";
+	case SSM_STATE_WF_INFO:
+		return "SSM_STATE_WF_INFO";
+	case SSM_STATE_WF_INFO_RECONNECT:
+		return "SSM_STATE_WF_INFO_RECONNECT";
+	case SSM_STATE_OPEN:
+		return "SSM_STATE_OPEN";
+	case SSM_STATE_OPEN_RECONNECT:
+		return "SSM_STATE_OPEN_RECONNECT";
+	case SSM_STATE_CONNECTED:
+		return "SSM_STATE_CONNECTED";
+	case SSM_STATE_RECONNECT:
+		return "SSM_STATE_RECONNECT";
+	case SSM_STATE_CLOSE_DESTROY:
+		return "SSM_STATE_CLOSE_DESTROY";
+	case SSM_STATE_CLOSE_RECONNECT:
+		return "SSM_STATE_CLOSE_RECONNECT";
+	case SSM_STATE_CLOSE_RECONNECT_IMM:
+		return "SSM_STATE_CLOSE_RECONNECT_IMM";
+	case SSM_STATE_DISCONNECTED:
+		return "SSM_STATE_DISCONNECTED";
+	case SSM_STATE_DESTROYED:
+		return "SSM_STATE_DESTROYED";
+	default:
+		return "UNKNOWN";
+	}
+}
+
+static const char *ssm_event_str(enum ssm_ev ev)
+{
+	switch (ev) {
+	case SSM_EV_CON_CONNECTED:
+		return "SSM_EV_CON_CONNECTED";
+	case SSM_EV_RECONNECT:
+		return "SSM_EV_RECONNECT";
+	case SSM_EV_RECONNECT_USER:
+		return "SSM_EV_RECONNECT_USER";
+	case SSM_EV_RECONNECT_HEARTBEAT:
+		return "SSM_EV_RECONNECT_HEARTBEAT";
+	case SSM_EV_SESS_CLOSE:
+		return "SSM_EV_SESS_CLOSE";
+	case SSM_EV_CON_CLOSED:
+		return "SSM_EV_CON_CLOSED";
+	case SSM_EV_CON_ERROR:
+		return "SSM_EV_CON_ERROR";
+	case SSM_EV_ALL_CON_CLOSED:
+		return "SSM_EV_ALL_CON_CLOSED";
+	case SSM_EV_GOT_RDMA_INFO:
+		return "SSM_EV_GOT_RDMA_INFO";
+	default:
+		return "UNKNOWN";
+	}
+}
+
+static const char *csm_state_str(enum csm_state state)
+{
+	switch (state) {
+	case CSM_STATE_RESOLVING_ADDR:
+		return "CSM_STATE_RESOLVING_ADDR";
+	case CSM_STATE_RESOLVING_ROUTE:
+		return "CSM_STATE_RESOLVING_ROUTE";
+	case CSM_STATE_CONNECTING:
+		return "CSM_STATE_CONNECTING";
+	case CSM_STATE_CONNECTED:
+		return "CSM_STATE_CONNECTED";
+	case CSM_STATE_FLUSHING:
+		return "CSM_STATE_FLUSHING";
+	case CSM_STATE_CLOSING:
+		return "CSM_STATE_CLOSING";
+	case CSM_STATE_CLOSED:
+		return "CSM_STATE_CLOSED";
+	default:
+		return "UNKNOWN";
+	}
+}
+
+static const char *csm_event_str(enum csm_ev ev)
+{
+	switch (ev) {
+	case CSM_EV_ADDR_RESOLVED:
+		return "CSM_EV_ADDR_RESOLVED";
+	case CSM_EV_ROUTE_RESOLVED:
+		return "CSM_EV_ROUTE_RESOLVED";
+	case CSM_EV_CON_ESTABLISHED:
+		return "CSM_EV_CON_ESTABLISHED";
+	case CSM_EV_BEACON_COMPLETED:
+		return "CSM_EV_BEACON_COMPLETED";
+	case CSM_EV_SESS_CLOSING:
+		return "CSM_EV_SESS_CLOSING";
+	case CSM_EV_CON_DISCONNECTED:
+		return "CSM_EV_CON_DISCONNECTED";
+	case CSM_EV_WC_ERROR:
+		return "CSM_EV_WC_ERROR";
+	case CSM_EV_CON_ERROR:
+		return "CSM_EV_CON_ERROR";
+	default:
+		return "UNKNOWN";
+	}
+}
+
+/* rdma_req connects an IU with the sglist received from the user */
+struct rdma_req {
+	struct list_head        list;
+	struct ibtrs_iu		*iu;
+	struct scatterlist	*sglist; /* list holding user data */
+	unsigned int		sg_cnt;
+	unsigned int		sg_size;
+	u32			data_len;
+	void			*priv;
+	bool			in_use;
+	struct ibtrs_con	*con;
+	union {
+		struct ib_pool_fmr	**fmr_list;
+		struct ibtrs_fr_desc	**fr_list;
+	};
+	void			*map_page;
+	struct ibtrs_tag	*tag;
+	u16			nmdesc;
+	enum dma_data_direction dir;
+	unsigned long		start_time;
+} ____cacheline_aligned;
+
+struct ibtrs_con {
+	enum  csm_state		state;
+	short			cpu;
+	bool			user; /* true if con is for user msg only */
+	atomic_t		io_cnt;
+	struct ibtrs_session	*sess;
+	struct ib_con		ib_con;
+	struct ibtrs_fr_pool	*fr_pool;
+	struct rdma_cm_id	*cm_id;
+	struct work_struct	cq_work;
+	struct workqueue_struct *cq_wq;
+	struct tasklet_struct	cq_tasklet;
+	struct ib_wc		wcs[WC_ARRAY_SIZE];
+	bool			device_being_removed;
+};
+
+struct sess_destroy_sm_wq_work {
+	struct work_struct	work;
+	struct ibtrs_session	*sess;
+};
+
+struct con_sm_work {
+	struct work_struct	work;
+	struct ibtrs_con	*con;
+	enum csm_ev		ev;
+};
+
+struct sess_sm_work {
+	struct work_struct	work;
+	struct ibtrs_session	*sess;
+	enum ssm_ev		ev;
+};
+
+struct msg_work {
+	struct work_struct	work;
+	struct ibtrs_con	*con;
+	void                    *msg;
+};
+
+static void ibtrs_clt_free_sg_list_distr_stats(struct ibtrs_session *sess)
+{
+	int i;
+
+	for (i = 0; i < num_online_cpus(); i++)
+		kfree(sess->stats.sg_list_distr[i]);
+	kfree(sess->stats.sg_list_distr);
+	sess->stats.sg_list_distr = NULL;
+	kfree(sess->stats.sg_list_total);
+	sess->stats.sg_list_total = NULL;
+}
+
+static void ibtrs_clt_free_cpu_migr_stats(struct ibtrs_session *sess)
+{
+	kfree(sess->stats.cpu_migr.to);
+	sess->stats.cpu_migr.to = NULL;
+	kfree(sess->stats.cpu_migr.from);
+	sess->stats.cpu_migr.from = NULL;
+}
+
+static void ibtrs_clt_free_rdma_lat_stats(struct ibtrs_session *sess)
+{
+	int i;
+
+	for (i = 0; i < num_online_cpus(); i++)
+		kfree(sess->stats.rdma_lat_distr[i]);
+
+	kfree(sess->stats.rdma_lat_distr);
+	sess->stats.rdma_lat_distr = NULL;
+	kfree(sess->stats.rdma_lat_max);
+	sess->stats.rdma_lat_max = NULL;
+}
+
+static void ibtrs_clt_free_wc_comp_stats(struct ibtrs_session *sess)
+{
+	kfree(sess->stats.wc_comp);
+	sess->stats.wc_comp = NULL;
+}
+
+static void ibtrs_clt_free_rdma_stats(struct ibtrs_session *sess)
+{
+	kfree(sess->stats.rdma_stats);
+	sess->stats.rdma_stats = NULL;
+}
+
+static void ibtrs_clt_free_stats(struct ibtrs_session *sess)
+{
+	ibtrs_clt_free_rdma_stats(sess);
+	ibtrs_clt_free_rdma_lat_stats(sess);
+	ibtrs_clt_free_cpu_migr_stats(sess);
+	ibtrs_clt_free_sg_list_distr_stats(sess);
+	ibtrs_clt_free_wc_comp_stats(sess);
+}
+
+static inline int get_sess(struct ibtrs_session *sess)
+{
+	return atomic_inc_not_zero(&sess->refcount);
+}
+
+static void free_con_fast_pool(struct ibtrs_con *con);
+
+static void sess_deinit_cons(struct ibtrs_session *sess)
+{
+	int i;
+
+	for (i = 0; i < CONS_PER_SESSION; i++) {
+		struct ibtrs_con *con = &sess->con[i];
+
+		if (!i)
+			destroy_workqueue(con->cq_wq);
+		else
+			tasklet_kill(&con->cq_tasklet);
+	}
+}
+
+static void put_sess(struct ibtrs_session *sess)
+{
+	if (!atomic_dec_if_positive(&sess->refcount)) {
+		struct completion *destroy_completion;
+
+		destroy_workqueue(sess->sm_wq);
+		sess_deinit_cons(sess);
+		kfree(sess->con);
+		sess->con = NULL;
+		ibtrs_clt_free_stats(sess);
+		destroy_completion = sess->destroy_completion;
+		mutex_lock(&sess_mutex);
+		list_del(&sess->list);
+		mutex_unlock(&sess_mutex);
+		INFO(sess, "Session is disconnected\n");
+		kfree(sess);
+		if (destroy_completion)
+			complete_all(destroy_completion);
+	}
+}
+
+inline int ibtrs_clt_get_user_queue_depth(struct ibtrs_session *sess)
+{
+	return sess->user_queue_depth;
+}
+
+inline int ibtrs_clt_set_user_queue_depth(struct ibtrs_session *sess,
+					  u16 queue_depth)
+{
+	if (queue_depth < 1 ||
+	    queue_depth > sess->queue_depth) {
+		ERR(sess, "Queue depth %u is out of range (1 - %u)",
+		    queue_depth,
+		    sess->queue_depth);
+		return -EINVAL;
+	}
+
+	sess->user_queue_depth = queue_depth;
+	return 0;
+}
+
+static void csm_resolving_addr(struct ibtrs_con *con, enum csm_ev ev);
+static void csm_resolving_route(struct ibtrs_con *con, enum csm_ev ev);
+static void csm_connecting(struct ibtrs_con *con, enum csm_ev ev);
+static void csm_connected(struct ibtrs_con *con, enum csm_ev ev);
+static void csm_flushing(struct ibtrs_con *con, enum csm_ev ev);
+static void csm_closing(struct ibtrs_con *con, enum csm_ev ev);
+
+static int init_con(struct ibtrs_session *sess, struct ibtrs_con *con,
+		    short cpu, bool user);
+/* ignore all events for safety */
+static void csm_closed(struct ibtrs_con *con, enum csm_ev ev)
+{
+}
+
+typedef void (ibtrs_clt_csm_ev_handler_fn)(struct ibtrs_con *, enum csm_ev);
+
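+/*
+ * Per-state event handlers of the connection state machine (CSM).  Events
+ * scheduled via csm_schedule_event() are dispatched to the handler of the
+ * connection's current state.
+ */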
+static ibtrs_clt_csm_ev_handler_fn *ibtrs_clt_csm_ev_handlers[] = {
+	[CSM_STATE_RESOLVING_ADDR]	= csm_resolving_addr,
+	[CSM_STATE_RESOLVING_ROUTE]	= csm_resolving_route,
+	[CSM_STATE_CONNECTING]		= csm_connecting,
+	[CSM_STATE_CONNECTED]		= csm_connected,
+	[CSM_STATE_CLOSING]		= csm_closing,
+	[CSM_STATE_FLUSHING]		= csm_flushing,
+	[CSM_STATE_CLOSED]		= csm_closed
+};
+
+static void csm_trigger_event(struct work_struct *work)
+{
+	struct con_sm_work *w;
+	struct ibtrs_con *con;
+	enum csm_ev ev;
+
+	w = container_of(work, struct con_sm_work, work);
+	con = w->con;
+	ev = w->ev;
+	kvfree(w);
+
+	if (WARN_ON_ONCE(con->state <= _CSM_STATE_MIN ||
+			 con->state >= _CSM_STATE_MAX)) {
+		WRN(con->sess, "Connection state is out of range\n");
+		return;
+	}
+
+	ibtrs_clt_csm_ev_handlers[con->state](con, ev);
+}
+
+static void csm_set_state(struct ibtrs_con *con, enum csm_state s)
+{
+	if (WARN(s <= _CSM_STATE_MIN || s >= _CSM_STATE_MAX,
+		 "Unknown CSM state %d\n", s))
+		return;
+	smp_wmb(); /* fence con->state change */
+	if (con->state != s) {
+		DEB("changing con %p csm state from %s to %s\n", con,
+		    csm_state_str(con->state), csm_state_str(s));
+		con->state = s;
+	}
+}
+
+inline bool ibtrs_clt_sess_is_connected(const struct ibtrs_session *sess)
+{
+	return sess->state == SSM_STATE_CONNECTED;
+}
+
+static void ssm_idle(struct ibtrs_session *sess, enum ssm_ev ev);
+static void ssm_idle_reconnect(struct ibtrs_session *sess, enum ssm_ev ev);
+static void ssm_open(struct ibtrs_session *sess, enum ssm_ev ev);
+static void ssm_open_reconnect(struct ibtrs_session *sess, enum ssm_ev ev);
+static void ssm_connected(struct ibtrs_session *sess, enum ssm_ev ev);
+static void ssm_reconnect(struct ibtrs_session *sess, enum ssm_ev ev);
+static void ssm_close_destroy(struct ibtrs_session *sess, enum ssm_ev ev);
+static void ssm_close_reconnect(struct ibtrs_session *sess, enum ssm_ev ev);
+static void ssm_close_reconnect_imm(struct ibtrs_session *sess, enum ssm_ev ev);
+static void ssm_disconnected(struct ibtrs_session *sess, enum ssm_ev ev);
+static void ssm_destroyed(struct ibtrs_session *sess, enum ssm_ev ev);
+static void ssm_wf_info(struct ibtrs_session *sess, enum ssm_ev ev);
+static void ssm_wf_info_reconnect(struct ibtrs_session *sess, enum ssm_ev ev);
+
+typedef void (ibtrs_clt_ssm_ev_handler_fn)(struct ibtrs_session *, enum ssm_ev);
+
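+/*
+ * Per-state event handlers of the session state machine (SSM), dispatched
+ * from ssm_trigger_event() based on the session's current state.
+ */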
+static ibtrs_clt_ssm_ev_handler_fn *ibtrs_clt_ev_handlers[] = {
+	[SSM_STATE_IDLE]		= ssm_idle,
+	[SSM_STATE_IDLE_RECONNECT]	= ssm_idle_reconnect,
+	[SSM_STATE_WF_INFO]		= ssm_wf_info,
+	[SSM_STATE_WF_INFO_RECONNECT]	= ssm_wf_info_reconnect,
+	[SSM_STATE_OPEN]		= ssm_open,
+	[SSM_STATE_OPEN_RECONNECT]	= ssm_open_reconnect,
+	[SSM_STATE_CONNECTED]		= ssm_connected,
+	[SSM_STATE_RECONNECT]		= ssm_reconnect,
+	[SSM_STATE_CLOSE_DESTROY]	= ssm_close_destroy,
+	[SSM_STATE_CLOSE_RECONNECT]	= ssm_close_reconnect,
+	[SSM_STATE_CLOSE_RECONNECT_IMM]	= ssm_close_reconnect_imm,
+	[SSM_STATE_DISCONNECTED]	= ssm_disconnected,
+	[SSM_STATE_DESTROYED]		= ssm_destroyed,
+};
+
+typedef int (ibtrs_clt_ssm_state_init_fn)(struct ibtrs_session *);
+static ibtrs_clt_ssm_state_init_fn	ssm_open_init;
+static ibtrs_clt_ssm_state_init_fn	ssm_close_destroy_init;
+static ibtrs_clt_ssm_state_init_fn	ssm_destroyed_init;
+static ibtrs_clt_ssm_state_init_fn	ssm_connected_init;
+static ibtrs_clt_ssm_state_init_fn	ssm_reconnect_init;
+static ibtrs_clt_ssm_state_init_fn	ssm_idle_reconnect_init;
+static ibtrs_clt_ssm_state_init_fn	ssm_disconnected_init;
+static ibtrs_clt_ssm_state_init_fn	ssm_wf_info_init;
+
+static ibtrs_clt_ssm_state_init_fn *ibtrs_clt_ssm_state_init[] = {
+	[SSM_STATE_IDLE]		= NULL,
+	[SSM_STATE_IDLE_RECONNECT]	= ssm_idle_reconnect_init,
+	[SSM_STATE_WF_INFO]		= ssm_wf_info_init,
+	[SSM_STATE_WF_INFO_RECONNECT]	= ssm_wf_info_init,
+	[SSM_STATE_OPEN]		= ssm_open_init,
+	[SSM_STATE_OPEN_RECONNECT]	= ssm_open_init,
+	[SSM_STATE_CONNECTED]		= ssm_connected_init,
+	[SSM_STATE_RECONNECT]		= ssm_reconnect_init,
+	[SSM_STATE_CLOSE_DESTROY]	= ssm_close_destroy_init,
+	[SSM_STATE_CLOSE_RECONNECT]	= ssm_close_destroy_init,
+	[SSM_STATE_CLOSE_RECONNECT_IMM]	= ssm_close_destroy_init,
+	[SSM_STATE_DISCONNECTED]	= ssm_disconnected_init,
+	[SSM_STATE_DESTROYED]		= ssm_destroyed_init,
+};
+
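+/*
+ * Switch the SSM to @state: run the init callback of the new state first,
+ * if one is defined and it differs from the current state's callback.
+ */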
+static int ssm_init_state(struct ibtrs_session *sess, enum ssm_state state)
+{
+	int err;
+
+	if (WARN(state <= _SSM_STATE_MIN || state >= _SSM_STATE_MAX,
+		 "Unknown SSM state %d\n", state))
+		return -EINVAL;
+
+	smp_rmb(); /* fence sess->state change */
+	if (sess->state == state)
+		return 0;
+
+	/* Call the init function of the new state only if:
+	 * - it is defined
+	 *   and
+	 * - it is different from the init function of the current state
+	 */
+	if (ibtrs_clt_ssm_state_init[state] &&
+	    ibtrs_clt_ssm_state_init[state] !=
+	    ibtrs_clt_ssm_state_init[sess->state]) {
+		err = ibtrs_clt_ssm_state_init[state](sess);
+		if (err) {
+			ERR(sess, "Failed to init ssm state %s from %s: %d\n",
+			    ssm_state_str(state), ssm_state_str(sess->state),
+			    err);
+			return err;
+		}
+	}
+
+	DEB("changing sess %p ssm state from %s to %s\n", sess,
+	    ssm_state_str(sess->state), ssm_state_str(state));
+
+	smp_wmb(); /* fence sess->state change */
+	sess->state = state;
+
+	return 0;
+}
+
+static void ssm_trigger_event(struct work_struct *work)
+{
+	struct sess_sm_work *w;
+	struct ibtrs_session *sess;
+	enum ssm_ev ev;
+
+	w = container_of(work, struct sess_sm_work, work);
+	sess = w->sess;
+	ev = w->ev;
+	kvfree(w);
+
+	if (WARN_ON_ONCE(sess->state <= _SSM_STATE_MIN || sess->state >=
+			 _SSM_STATE_MAX)) {
+		WRN(sess, "Session state is out of range\n");
+		return;
+	}
+
+	ibtrs_clt_ev_handlers[sess->state](sess, ev);
+}
+
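+/*
+ * Queue a CSM event on the session's state machine workqueue.  In softirq
+ * context the work item is allocated atomically; otherwise the allocation
+ * is retried until it succeeds, since the event must not be lost.
+ */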
+static void csm_schedule_event(struct ibtrs_con *con, enum csm_ev ev)
+{
+	struct con_sm_work *w = NULL;
+
+	if (in_softirq()) {
+		w = kmalloc(sizeof(*w), GFP_ATOMIC);
+		BUG_ON(!w);
+		goto out;
+	}
+	while (!w) {
+		w = ibtrs_malloc(sizeof(*w));
+		if (!w)
+			cond_resched();
+	}
+out:
+	w->con = con;
+	w->ev = ev;
+	INIT_WORK(&w->work, csm_trigger_event);
+	WARN_ON(!queue_work_on(0, con->sess->sm_wq, &w->work));
+}
+
+static void ssm_schedule_event(struct ibtrs_session *sess, enum ssm_ev ev)
+{
+	struct sess_sm_work *w = NULL;
+
+	while (!w) {
+		w = ibtrs_malloc(sizeof(*w));
+		if (!w)
+			cond_resched();
+	}
+
+	w->sess = sess;
+	w->ev = ev;
+	INIT_WORK(&w->work, ssm_trigger_event);
+	WARN_ON(!queue_work_on(0, sess->sm_wq, &w->work));
+}
+
+static inline bool clt_ops_are_valid(const struct ibtrs_clt_ops *ops)
+{
+	return ops && ops->rdma_ev && ops->sess_ev && ops->recv;
+}
+
+/**
+ * struct ibtrs_fr_desc - fast registration work request arguments
+ * @entry: Entry in ibtrs_fr_pool.free_list.
+ * @mr:    Memory region.
+ */
+struct ibtrs_fr_desc {
+	struct list_head		entry;
+	struct ib_mr			*mr;
+};
+
+/**
+ * struct ibtrs_fr_pool - pool of fast registration descriptors
+ *
+ * An entry is available for allocation if and only if it occurs in @free_list.
+ *
+ * @size:      Number of descriptors in this pool.
+ * @max_page_list_len: Maximum fast registration work request page list length.
+ * @lock:      Protects free_list.
+ * @free_list: List of free descriptors.
+ * @desc:      Fast registration descriptor pool.
+ */
+struct ibtrs_fr_pool {
+	int			size;
+	int			max_page_list_len;
+	/* lock for free_list */
+	spinlock_t		lock ____cacheline_aligned;
+	struct list_head	free_list;
+	struct ibtrs_fr_desc	desc[0];
+};
+
+/**
+ * struct ibtrs_map_state - per-request DMA memory mapping state
+ * @desc:	    Pointer to the element of the IBTRS buffer descriptor array
+ *		    that is being filled in.
+ * @pages:	    Array with DMA addresses of pages being considered for
+ *		    memory registration.
+ * @base_dma_addr:  DMA address of the first page that has not yet been mapped.
+ * @dma_len:	    Number of bytes that will be registered with the next
+ *		    FMR or FR memory registration call.
+ * @total_len:	    Total number of bytes in the sg-list being mapped.
+ * @npages:	    Number of page addresses in the pages[] array.
+ * @nmdesc:	    Number of FMR or FR memory descriptors used for mapping.
+ * @ndesc:	    Number of buffer descriptors that have been filled in.
+ */
+struct ibtrs_map_state {
+	union {
+		struct ib_pool_fmr	**next_fmr;
+		struct ibtrs_fr_desc	**next_fr;
+	};
+	struct ibtrs_sg_desc	*desc;
+	union {
+		u64			*pages;
+		struct scatterlist      *sg;
+	};
+	dma_addr_t		base_dma_addr;
+	u32			dma_len;
+	u32			total_len;
+	u32			npages;
+	u32			nmdesc;
+	u32			ndesc;
+	enum dma_data_direction dir;
+};
+
+static void free_io_bufs(struct ibtrs_session *sess);
+
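+/*
+ * Parse the sess-open response from the server: verify that the immediate
+ * data can encode a buffer id plus an offset within a chunk, store the
+ * server's RDMA buffer addresses and rkey, and reallocate the IO buffers
+ * if the queue depth changed across a reconnect.
+ */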
+static int process_open_rsp(struct ibtrs_con *con, const void *resp)
+{
+	int i;
+	const struct ibtrs_msg_sess_open_resp *msg = resp;
+	struct ibtrs_session *sess = con->sess;
+	u32 chunk_size;
+
+	rcu_read_lock();
+	smp_rmb(); /* fence con->state check */
+	if (unlikely(con->state != CSM_STATE_CONNECTED)) {
+		rcu_read_unlock();
+		INFO(sess, "Process open response failed, disconnected."
+		     " Connection state is %s, Session state is %s\n",
+		     csm_state_str(con->state),
+		     ssm_state_str(sess->state));
+		return -ECOMM;
+	}
+	rcu_read_unlock();
+
+	chunk_size = msg->max_io_size + msg->max_req_size;
+	/* check if IB immediate data size is enough to hold the mem_id and the
+	 * offset inside the memory chunk
+	 */
+	if (ilog2(msg->cnt - 1) + ilog2(chunk_size - 1) >
+		IB_IMM_SIZE_BITS) {
+		ERR(sess, "RDMA immediate size (%db) not enough to encode "
+		    "%d buffers of size %dB\n", IB_IMM_SIZE_BITS, msg->cnt,
+		    chunk_size);
+		return -EINVAL;
+	}
+
+	strlcpy(sess->hostname, msg->hostname, sizeof(sess->hostname));
+	sess->srv_rdma_buf_rkey = msg->rkey;
+	sess->user_queue_depth = msg->max_inflight_msg;
+	sess->max_io_size = msg->max_io_size;
+	sess->max_req_size = msg->max_req_size;
+	sess->chunk_size = chunk_size;
+	sess->max_desc = (msg->max_req_size - IBTRS_HDR_LEN - sizeof(u32)
+			  - sizeof(u32) - IO_MSG_SIZE) / IBTRS_SG_DESC_LEN;
+	sess->ver = min_t(u8, msg->ver, IBTRS_VERSION);
+
+	/* if the server changed the queue_depth across the reconnect,
+	 * we need to reallocate all buffers that depend on it
+	 */
+	if (sess->queue_depth &&
+	    sess->queue_depth != msg->max_inflight_msg) {
+		free_io_bufs(sess);
+		kfree(sess->srv_rdma_addr);
+		sess->srv_rdma_addr = NULL;
+	}
+
+	sess->queue_depth = msg->max_inflight_msg;
+	if (!sess->srv_rdma_addr) {
+		sess->srv_rdma_addr = kcalloc(sess->queue_depth,
+					      sizeof(*sess->srv_rdma_addr),
+					      GFP_KERNEL);
+		if (!sess->srv_rdma_addr) {
+			ERR(sess, "Failed to allocate memory for server RDMA"
+			    " addresses\n");
+			return -ENOMEM;
+		}
+	}
+
+	for (i = 0; i < msg->cnt; i++) {
+		sess->srv_rdma_addr[i] = msg->addr[i];
+		DEB("Adding contiguous buffer %d, size %u, addr: 0x%p,"
+		    " rkey: 0x%x\n", i, sess->chunk_size,
+		    (void *)sess->srv_rdma_addr[i],
+		    sess->srv_rdma_buf_rkey);
+	}
+
+	return 0;
+}
+
+static int wait_for_ssm_state(struct ibtrs_session *sess, enum ssm_state state)
+{
+	DEB("Waiting for state %s...\n", ssm_state_str(state));
+	wait_event(sess->wait_q, sess->state >= state);
+
+	if (unlikely(sess->state != state)) {
+		ERR(sess,
+		    "Waited for session state '%s', but state is '%s'\n",
+		    ssm_state_str(state), ssm_state_str(sess->state));
+		return -EHOSTUNREACH;
+	}
+
+	return 0;
+}
+
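+/*
+ * Claim a free tag by atomically setting a clear bit in the session's tag
+ * bitmap (lock-free, bounded by the user queue depth).  Returns NULL if
+ * all tags are currently in use.
+ */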
+static inline struct ibtrs_tag *__ibtrs_get_tag(struct ibtrs_session *sess,
+						int cpu_id)
+{
+	size_t max_depth = sess->user_queue_depth;
+	struct ibtrs_tag *tag;
+	int cpu, bit;
+
+	cpu = get_cpu();
+	do {
+		bit = find_first_zero_bit(sess->tags_map, max_depth);
+		if (unlikely(bit >= max_depth)) {
+			put_cpu();
+			return NULL;
+		}
+
+	} while (unlikely(test_and_set_bit_lock(bit, sess->tags_map)));
+	put_cpu();
+
+	tag = GET_TAG(sess, bit);
+	WARN_ON(tag->mem_id != bit);
+	tag->cpu_id = (cpu_id != -1 ? cpu_id : cpu);
+
+	return tag;
+}
+
+static inline void __ibtrs_put_tag(struct ibtrs_session *sess,
+				   struct ibtrs_tag *tag)
+{
+	clear_bit_unlock(tag->mem_id, sess->tags_map);
+}
+
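+/*
+ * Get a tag for an IO.  If @can_wait is set and no tag is free, sleep on
+ * the tag wait queue until ibtrs_put_tag() releases one.
+ */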
+struct ibtrs_tag *ibtrs_get_tag(struct ibtrs_session *sess, int cpu_id,
+				size_t nr_bytes, int can_wait)
+{
+	struct ibtrs_tag *tag;
+	DEFINE_WAIT(wait);
+
+	/* nr_bytes is not used for now */
+	(void)nr_bytes;
+
+	tag = __ibtrs_get_tag(sess, cpu_id);
+	if (likely(tag) || !can_wait)
+		return tag;
+
+	do {
+		prepare_to_wait(&sess->tags_wait, &wait, TASK_UNINTERRUPTIBLE);
+		tag = __ibtrs_get_tag(sess, cpu_id);
+		if (likely(tag))
+			break;
+
+		io_schedule();
+	} while (1);
+
+	finish_wait(&sess->tags_wait, &wait);
+
+	return tag;
+}
+EXPORT_SYMBOL(ibtrs_get_tag);
+
+void ibtrs_put_tag(struct ibtrs_session *sess, struct ibtrs_tag *tag)
+{
+	if (WARN_ON(tag->mem_id >= sess->queue_depth))
+		return;
+	if (WARN_ON(!test_bit(tag->mem_id, sess->tags_map)))
+		return;
+
+	__ibtrs_put_tag(sess, tag);
+
+	/* Putting a tag is a barrier, so we will observe
+	 * the new entry in the wait queue.
+	 */
+	if (waitqueue_active(&sess->tags_wait))
+		wake_up(&sess->tags_wait);
+}
+EXPORT_SYMBOL(ibtrs_put_tag);
+
+static void put_u_msg_iu(struct ibtrs_session *sess, struct ibtrs_iu *iu)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&sess->u_msg_ius_lock, flags);
+	ibtrs_iu_put(&sess->u_msg_ius_list, iu);
+	spin_unlock_irqrestore(&sess->u_msg_ius_lock, flags);
+}
+
+static struct ibtrs_iu *get_u_msg_iu(struct ibtrs_session *sess)
+{
+	struct ibtrs_iu *iu;
+	unsigned long flags;
+
+	spin_lock_irqsave(&sess->u_msg_ius_lock, flags);
+	iu = ibtrs_iu_get(&sess->u_msg_ius_list);
+	spin_unlock_irqrestore(&sess->u_msg_ius_lock, flags);
+
+	return iu;
+}
+
+/**
+ * ibtrs_destroy_fr_pool() - free the resources owned by a pool
+ * @pool: Fast registration pool to be destroyed.
+ */
+static void ibtrs_destroy_fr_pool(struct ibtrs_fr_pool *pool)
+{
+	int i;
+	struct ibtrs_fr_desc *d;
+	int ret;
+
+	if (!pool)
+		return;
+
+	for (i = 0, d = &pool->desc[0]; i < pool->size; i++, d++) {
+		if (d->mr) {
+			ret = ib_dereg_mr(d->mr);
+			if (ret)
+				ERR_NP("Failed to deregister memory region,"
+				       " errno: %d\n", ret);
+		}
+	}
+	kfree(pool);
+}
+
+/**
+ * ibtrs_create_fr_pool() - allocate and initialize a pool for fast registration
+ * @device:            IB device to allocate fast registration descriptors for.
+ * @pd:                Protection domain associated with the FR descriptors.
+ * @pool_size:         Number of descriptors to allocate.
+ * @max_page_list_len: Maximum fast registration work request page list length.
+ */
+static struct ibtrs_fr_pool *ibtrs_create_fr_pool(struct ib_device *device,
+						  struct ib_pd *pd,
+						  int pool_size,
+						  int max_page_list_len)
+{
+	struct ibtrs_fr_pool *pool;
+	struct ibtrs_fr_desc *d;
+	struct ib_mr *mr;
+	int i, ret;
+
+	if (pool_size <= 0) {
+		WRN_NP("Creating fr pool failed, invalid pool size %d\n",
+		       pool_size);
+		ret = -EINVAL;
+		goto err;
+	}
+
+	pool = kzalloc(sizeof(*pool) + pool_size * sizeof(*d), GFP_KERNEL);
+	if (!pool) {
+		ret = -ENOMEM;
+		goto err;
+	}
+
+	pool->size = pool_size;
+	pool->max_page_list_len = max_page_list_len;
+	spin_lock_init(&pool->lock);
+	INIT_LIST_HEAD(&pool->free_list);
+
+	for (i = 0, d = &pool->desc[0]; i < pool->size; i++, d++) {
+		mr = ib_alloc_mr(pd, IB_MR_TYPE_MEM_REG, max_page_list_len);
+		if (IS_ERR(mr)) {
+			WRN_NP("Failed to allocate fast region memory\n");
+			ret = PTR_ERR(mr);
+			goto destroy_pool;
+		}
+		d->mr = mr;
+		list_add_tail(&d->entry, &pool->free_list);
+	}
+
+	return pool;
+
+destroy_pool:
+	ibtrs_destroy_fr_pool(pool);
+err:
+	return ERR_PTR(ret);
+}
+
+/**
+ * ibtrs_fr_pool_get() - obtain a descriptor suitable for fast registration
+ * @pool: Pool to obtain descriptor from.
+ */
+static struct ibtrs_fr_desc *ibtrs_fr_pool_get(struct ibtrs_fr_pool *pool)
+{
+	struct ibtrs_fr_desc *d = NULL;
+
+	spin_lock_bh(&pool->lock);
+	if (!list_empty(&pool->free_list)) {
+		d = list_first_entry(&pool->free_list, typeof(*d), entry);
+		list_del(&d->entry);
+	}
+	spin_unlock_bh(&pool->lock);
+
+	return d;
+}
+
+/**
+ * ibtrs_fr_pool_put() - put an FR descriptor back in the free list
+ * @pool: Pool the descriptor was allocated from.
+ * @desc: Pointer to an array of fast registration descriptor pointers.
+ * @n:    Number of descriptors to put back.
+ *
+ * Note: The caller must already have queued an invalidation request for
+ * desc->mr->rkey before calling this function.
+ */
+static void ibtrs_fr_pool_put(struct ibtrs_fr_pool *pool,
+			      struct ibtrs_fr_desc **desc, int n)
+{
+	int i;
+
+	spin_lock_bh(&pool->lock);
+	for (i = 0; i < n; i++)
+		list_add(&desc[i]->entry, &pool->free_list);
+	spin_unlock_bh(&pool->lock);
+}
+
+static inline struct ibtrs_fr_pool *alloc_fr_pool(struct ibtrs_session *sess)
+{
+	return ibtrs_create_fr_pool(sess->ib_device, sess->ib_sess.pd,
+				    sess->queue_depth,
+				    sess->max_pages_per_mr);
+}
+
+static void ibtrs_map_desc(struct ibtrs_map_state *state, dma_addr_t dma_addr,
+			   u32 dma_len, u32 rkey, u32 max_desc)
+{
+	struct ibtrs_sg_desc *desc = state->desc;
+
+	DEB("dma_addr %llu, key %u, dma_len %u\n", dma_addr, rkey, dma_len);
+	desc->addr	= dma_addr;
+	desc->key	= rkey;
+	desc->len	= dma_len;
+
+	state->total_len += dma_len;
+	if (state->ndesc < max_desc) {
+		state->desc++;
+		state->ndesc++;
+	} else {
+		state->ndesc = INT_MIN;
+		ERR_NP("Could not fit S/G list into buffer descriptor %d.\n",
+		       max_desc);
+	}
+}
+
+static int ibtrs_map_finish_fmr(struct ibtrs_map_state *state,
+				struct ibtrs_con *con)
+{
+	struct ib_pool_fmr *fmr;
+	u64 io_addr = 0;
+	dma_addr_t dma_addr;
+
+	fmr = ib_fmr_pool_map_phys(con->sess->fmr_pool, state->pages,
+				   state->npages, io_addr);
+	if (IS_ERR(fmr)) {
+		WRN_RL(con->sess, "Failed to map FMR from FMR pool, "
+		       "errno: %ld\n", PTR_ERR(fmr));
+		return PTR_ERR(fmr);
+	}
+
+	*state->next_fmr++ = fmr;
+	state->nmdesc++;
+	dma_addr = state->base_dma_addr & ~con->sess->mr_page_mask;
+	DEB("ndesc = %d, nmdesc = %d, npages = %d\n",
+	    state->ndesc, state->nmdesc, state->npages);
+	if (state->dir == DMA_TO_DEVICE)
+		ibtrs_map_desc(state, dma_addr, state->dma_len, fmr->fmr->lkey,
+			       con->sess->max_desc);
+	else
+		ibtrs_map_desc(state, dma_addr, state->dma_len, fmr->fmr->rkey,
+			       con->sess->max_desc);
+
+	return 0;
+}
+
+static int ibtrs_map_finish_fr(struct ibtrs_map_state *state,
+			       struct ibtrs_con *con, int sg_cnt,
+			       unsigned int *sg_offset_p)
+{
+	struct ib_send_wr *bad_wr;
+	struct ib_reg_wr wr;
+	struct ibtrs_fr_desc *desc;
+	struct ib_pd *pd = con->sess->ib_sess.pd;
+	u32 rkey;
+	int n;
+
+	if (sg_cnt == 1 && (pd->flags & IB_PD_UNSAFE_GLOBAL_RKEY)) {
+		unsigned int sg_offset = sg_offset_p ? *sg_offset_p : 0;
+
+		ibtrs_map_desc(state, sg_dma_address(state->sg) + sg_offset,
+			     sg_dma_len(state->sg) - sg_offset,
+			     pd->unsafe_global_rkey, con->sess->max_desc);
+		if (sg_offset_p)
+			*sg_offset_p = 0;
+		return 1;
+	}
+
+	desc = ibtrs_fr_pool_get(con->fr_pool);
+	if (!desc) {
+		WRN_RL(con->sess, "Failed to get descriptor from FR pool\n");
+		return -ENOMEM;
+	}
+
+	rkey = ib_inc_rkey(desc->mr->rkey);
+	ib_update_fast_reg_key(desc->mr, rkey);
+
+	memset(&wr, 0, sizeof(wr));
+	n = ib_map_mr_sg(desc->mr, state->sg, sg_cnt, sg_offset_p,
+			 con->sess->mr_page_size);
+	if (unlikely(n < 0)) {
+		ibtrs_fr_pool_put(con->fr_pool, &desc, 1);
+		return n;
+	}
+
+	wr.wr.next = NULL;
+	wr.wr.opcode = IB_WR_REG_MR;
+	wr.wr.wr_id = FAST_REG_WR_ID_MASK;
+	wr.wr.num_sge = 0;
+	wr.wr.send_flags = 0;
+	wr.mr = desc->mr;
+	wr.key = desc->mr->rkey;
+	wr.access = (IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE);
+
+	*state->next_fr++ = desc;
+	state->nmdesc++;
+
+	ibtrs_map_desc(state, state->base_dma_addr, state->dma_len,
+		       desc->mr->rkey, con->sess->max_desc);
+
+	return ib_post_send(con->ib_con.qp, &wr.wr, &bad_wr);
+}
+
+static int ibtrs_finish_fmr_mapping(struct ibtrs_map_state *state,
+				    struct ibtrs_con *con)
+{
+	int ret = 0;
+	struct ib_pd *pd = con->sess->ib_sess.pd;
+
+	if (state->npages == 0)
+		return 0;
+
+	if (state->npages == 1 && (pd->flags & IB_PD_UNSAFE_GLOBAL_RKEY))
+		ibtrs_map_desc(state, state->base_dma_addr, state->dma_len,
+			       pd->unsafe_global_rkey,
+			       con->sess->max_desc);
+	else
+		ret = ibtrs_map_finish_fmr(state, con);
+
+	if (ret == 0) {
+		state->npages = 0;
+		state->dma_len = 0;
+	}
+
+	return ret;
+}
+
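+/*
+ * Split one SG entry into MR-page-sized pieces for FMR mapping.  Whenever
+ * the page array is full or the address is not page aligned, the pages
+ * accumulated so far are flushed into a new FMR mapping.
+ */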
+static int ibtrs_map_sg_entry(struct ibtrs_map_state *state,
+			      struct ibtrs_con *con, struct scatterlist *sg,
+			      int sg_count)
+{
+	struct ib_device *ibdev = con->sess->ib_device;
+	dma_addr_t dma_addr = ib_sg_dma_address(ibdev, sg);
+	unsigned int dma_len = ib_sg_dma_len(ibdev, sg);
+	unsigned int len;
+	int ret;
+
+	if (!dma_len)
+		return 0;
+
+	while (dma_len) {
+		unsigned offset = dma_addr & ~con->sess->mr_page_mask;
+
+		if (state->npages == con->sess->max_pages_per_mr ||
+		    offset != 0) {
+			ret = ibtrs_finish_fmr_mapping(state, con);
+			if (ret)
+				return ret;
+		}
+
+		len = min_t(unsigned int, dma_len,
+			    con->sess->mr_page_size - offset);
+
+		if (!state->npages)
+			state->base_dma_addr = dma_addr;
+		state->pages[state->npages++] =
+			dma_addr & con->sess->mr_page_mask;
+		state->dma_len += len;
+		dma_addr += len;
+		dma_len -= len;
+	}
+
+	/*
+	 * If the last entry of the MR wasn't a full page, then we need to
+	 * close it out and start a new one -- we can only merge at page
+	 * boundaries.
+	 */
+	ret = 0;
+	if (len != con->sess->mr_page_size)
+		ret = ibtrs_finish_fmr_mapping(state, con);
+	return ret;
+}
+
+static int ibtrs_map_fr(struct ibtrs_map_state *state, struct ibtrs_con *con,
+			struct scatterlist *sg, int sg_count)
+{
+	unsigned int sg_offset = 0;
+
+	state->sg = sg;
+
+	while (sg_count) {
+		int i, n;
+
+		n = ibtrs_map_finish_fr(state, con, sg_count, &sg_offset);
+		if (unlikely(n < 0))
+			return n;
+
+		sg_count -= n;
+		for (i = 0; i < n; i++)
+			state->sg = sg_next(state->sg);
+	}
+
+	return 0;
+}
+
+static int ibtrs_map_fmr(struct ibtrs_map_state *state, struct ibtrs_con *con,
+			 struct scatterlist *sg_first_entry, int
+			 sg_first_entry_index, int sg_count)
+{
+	int i, ret;
+	struct scatterlist *sg;
+
+	for (i = sg_first_entry_index, sg = sg_first_entry; i < sg_count;
+	     i++, sg = sg_next(sg)) {
+		ret = ibtrs_map_sg_entry(state, con, sg, sg_count);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+
+static int ibtrs_map_sg(struct ibtrs_map_state *state, struct ibtrs_con *con,
+			struct rdma_req *req)
+{
+	int ret = 0;
+
+	state->pages = req->map_page;
+	if (con->sess->fast_reg_mode == IBTRS_FAST_MEM_FR) {
+		state->next_fr = req->fr_list;
+		ret = ibtrs_map_fr(state, con, req->sglist, req->sg_cnt);
+		if (ret)
+			goto out;
+	} else if (con->sess->fast_reg_mode == IBTRS_FAST_MEM_FMR) {
+		state->next_fmr = req->fmr_list;
+		ret = ibtrs_map_fmr(state, con, req->sglist, 0,
+				    req->sg_cnt);
+		if (ret)
+			goto out;
+		ret = ibtrs_finish_fmr_mapping(state, con);
+		if (ret)
+			goto out;
+	}
+
+out:
+	req->nmdesc = state->nmdesc;
+	return ret;
+}
+
+static int ibtrs_inv_rkey(struct ibtrs_con *con, u32 rkey)
+{
+	struct ib_send_wr *bad_wr;
+	struct ib_send_wr wr = {
+		.opcode		    = IB_WR_LOCAL_INV,
+		.wr_id		    = LOCAL_INV_WR_ID_MASK,
+		.next		    = NULL,
+		.num_sge	    = 0,
+		.send_flags	    = 0,
+		.ex.invalidate_rkey = rkey,
+	};
+
+	return ib_post_send(con->ib_con.qp, &wr, &bad_wr);
+}
+
+static void ibtrs_unmap_fast_reg_data(struct ibtrs_con *con,
+				      struct rdma_req *req)
+{
+	int i, ret;
+
+	if (con->sess->fast_reg_mode == IBTRS_FAST_MEM_FR) {
+		struct ibtrs_fr_desc **pfr;
+
+		for (i = req->nmdesc, pfr = req->fr_list; i > 0; i--, pfr++) {
+			ret = ibtrs_inv_rkey(con, (*pfr)->mr->rkey);
+			if (ret < 0) {
+				ERR(con->sess,
+				    "Invalidating registered RDMA memory for"
+				    " rkey %#x failed, errno: %d\n",
+				    (*pfr)->mr->rkey, ret);
+			}
+		}
+		if (req->nmdesc)
+			ibtrs_fr_pool_put(con->fr_pool, req->fr_list,
+					  req->nmdesc);
+	} else {
+		struct ib_pool_fmr **pfmr;
+
+		for (i = req->nmdesc, pfmr = req->fmr_list; i > 0; i--, pfmr++)
+			ib_fmr_pool_unmap(*pfmr);
+	}
+	req->nmdesc = 0;
+}
+
+/*
+ * We have multiple scatter/gather entries, so use fast registration and
+ * try to merge as many entries as we can.
+ */
+static int ibtrs_fast_reg_map_data(struct ibtrs_con *con,
+				   struct ibtrs_sg_desc *desc,
+				   struct rdma_req *req)
+{
+	struct ibtrs_map_state state;
+	int ret;
+
+	memset(&state, 0, sizeof(state));
+	state.desc	= desc;
+	state.dir	= req->dir;
+	ret = ibtrs_map_sg(&state, con, req);
+
+	if (unlikely(ret))
+		goto unmap;
+
+	if (unlikely(state.ndesc <= 0)) {
+		ERR(con->sess,
+		    "Could not fit S/G list into buffer descriptor %d\n",
+		    state.ndesc);
+		ret = -EIO;
+		goto unmap;
+	}
+
+	return state.ndesc;
+unmap:
+	ibtrs_unmap_fast_reg_data(con, req);
+	return ret;
+}
+
+static int ibtrs_post_send_rdma(struct ibtrs_con *con, struct rdma_req *req,
+				u64 addr, u32 off, u32 imm)
+{
+	struct ib_sge list[1];
+	u32 cnt = atomic_inc_return(&con->io_cnt);
+
+	DEB("called, imm: %x\n", imm);
+	if (unlikely(!req->sg_size)) {
+		WRN(con->sess, "Doing RDMA Write failed, no data supplied\n");
+		return -EINVAL;
+	}
+
+	/* user data and user message in the first list element */
+	list[0].addr   = req->iu->dma_addr;
+	list[0].length = req->sg_size;
+	list[0].lkey   = con->sess->ib_sess.pd->local_dma_lkey;
+
+	return ib_post_rdma_write_imm(con->ib_con.qp, list, 1,
+				      con->sess->srv_rdma_buf_rkey,
+				      addr + off, (u64)req->iu, imm,
+				      cnt % (con->sess->queue_depth) ?
+				      0 : IB_SEND_SIGNALED);
+}
+
+static void ibtrs_set_sge_with_desc(struct ib_sge *list,
+				    struct ibtrs_sg_desc *desc)
+{
+	list->addr   = desc->addr;
+	list->length = desc->len;
+	list->lkey   = desc->key;
+	DEB("dma_addr %llu, key %u, dma_len %u\n",
+	    desc->addr, desc->key, desc->len);
+}
+
+static void ibtrs_set_rdma_desc_last(struct ibtrs_con *con, struct ib_sge *list,
+				     struct rdma_req *req,
+				     struct ib_rdma_wr *wr, int offset,
+				     struct ibtrs_sg_desc *desc, int m,
+				     int n, u64 addr, u32 size, u32 imm)
+{
+	int i;
+	struct ibtrs_session *sess = con->sess;
+	u32 cnt = atomic_inc_return(&con->io_cnt);
+
+	for (i = m; i < n; i++, desc++)
+		ibtrs_set_sge_with_desc(&list[i], desc);
+
+	list[i].addr   = req->iu->dma_addr;
+	list[i].length = size;
+	list[i].lkey   = sess->ib_sess.pd->local_dma_lkey;
+	wr->wr.wr_id = (uintptr_t)req->iu;
+	wr->wr.sg_list = &list[m];
+	wr->wr.num_sge = n - m + 1;
+	wr->remote_addr	= addr + offset;
+	wr->rkey	= sess->srv_rdma_buf_rkey;
+
+	wr->wr.opcode	= IB_WR_RDMA_WRITE_WITH_IMM;
+	wr->wr.send_flags   = cnt % (sess->queue_depth) ? 0 :
+		IB_SEND_SIGNALED;
+	wr->wr.ex.imm_data	= cpu_to_be32(imm);
+}
+
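+/*
+ * The descriptor list does not fit into a single work request: split it
+ * into several chained RDMA_WRITE WRs of at most max_sge entries each and
+ * finish with an RDMA_WRITE_WITH_IMM carrying the user message.
+ */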
+static int ibtrs_post_send_rdma_desc_more(struct ibtrs_con *con,
+					  struct ib_sge *list,
+					  struct rdma_req *req,
+					  struct ibtrs_sg_desc *desc, int n,
+					  u64 addr, u32 size, u32 imm)
+{
+	int ret;
+	size_t num_sge = 1 + n;
+	struct ibtrs_session *sess = con->sess;
+	int max_sge = sess->max_sge;
+	int num_wr =  DIV_ROUND_UP(num_sge, max_sge);
+	struct ib_send_wr *bad_wr;
+	struct ib_rdma_wr *wrs, *wr;
+	int j = 0, k, offset = 0, len = 0;
+	int m = 0;
+
+	wrs = kcalloc(num_wr, sizeof(*wrs), GFP_ATOMIC);
+	if (!wrs)
+		return -ENOMEM;
+
+	if (num_wr == 1)
+		goto last_one;
+
+	for (; j < num_wr - 1; j++) {
+		wr = &wrs[j];
+		for (k = 0; k < max_sge; k++, desc++) {
+			m = k + j * max_sge;
+			ibtrs_set_sge_with_desc(&list[m], desc);
+			len +=  desc->len;
+		}
+		wr->wr.wr_id = (uintptr_t)req->iu;
+		wr->wr.sg_list = &list[m];
+		wr->wr.num_sge = max_sge;
+		wr->remote_addr	= addr + offset;
+		wr->rkey	= sess->srv_rdma_buf_rkey;
+
+		offset += len;
+		wr->wr.next	= &wrs[j + 1].wr;
+		wr->wr.opcode	= IB_WR_RDMA_WRITE;
+	}
+
+last_one:
+	wr = &wrs[j];
+
+	ibtrs_set_rdma_desc_last(con, list, req, wr, offset, desc, m, n, addr,
+				 size, imm);
+
+	ret = ib_post_send(con->ib_con.qp, &wrs[0].wr, &bad_wr);
+	if (unlikely(ret))
+		ERR(sess, "Posting RDMA-Write-Request to QP failed,"
+		    " errno: %d\n", ret);
+	kfree(wrs);
+	return ret;
+}
+
+static int ibtrs_post_send_rdma_desc(struct ibtrs_con *con,
+				     struct rdma_req *req,
+				     struct ibtrs_sg_desc *desc, int n,
+				     u64 addr, u32 size, u32 imm)
+{
+	size_t num_sge = 1 + n;
+	struct ib_sge *list;
+	int ret, i;
+	struct ibtrs_session *sess = con->sess;
+
+	list = kmalloc_array(num_sge, sizeof(*list), GFP_ATOMIC);
+
+	if (!list)
+		return -ENOMEM;
+
+	DEB("n is %d\n", n);
+	if (num_sge < sess->max_sge) {
+		u32 cnt = atomic_inc_return(&con->io_cnt);
+
+		for (i = 0; i < n; i++, desc++)
+			ibtrs_set_sge_with_desc(&list[i], desc);
+		list[i].addr   = req->iu->dma_addr;
+		list[i].length = size;
+		list[i].lkey   = sess->ib_sess.pd->local_dma_lkey;
+
+		ret = ib_post_rdma_write_imm(con->ib_con.qp, list, num_sge,
+					     sess->srv_rdma_buf_rkey,
+					     addr, (u64)req->iu, imm,
+					     cnt %
+					     (sess->queue_depth) ?
+					     0 : IB_SEND_SIGNALED);
+	} else {
+		ret = ibtrs_post_send_rdma_desc_more(con, list, req, desc, n,
+						     addr, size, imm);
+	}
+
+	kfree(list);
+	return ret;
+}
+
+static int ibtrs_post_send_rdma_more(struct ibtrs_con *con,
+				     struct rdma_req *req,
+				     u64 addr, u32 size, u32 imm)
+{
+	int i, ret;
+	struct scatterlist *sg;
+	struct ib_device *ibdev = con->sess->ib_device;
+	size_t num_sge = 1 + req->sg_cnt;
+	struct ib_sge *list;
+	u32 cnt = atomic_inc_return(&con->io_cnt);
+
+	list = kmalloc_array(num_sge, sizeof(*list), GFP_ATOMIC);
+
+	if (!list)
+		return -ENOMEM;
+
+	for_each_sg(req->sglist, sg, req->sg_cnt, i) {
+		list[i].addr   = ib_sg_dma_address(ibdev, sg);
+		list[i].length = ib_sg_dma_len(ibdev, sg);
+		list[i].lkey   = con->sess->ib_sess.pd->local_dma_lkey;
+	}
+	list[i].addr   = req->iu->dma_addr;
+	list[i].length = size;
+	list[i].lkey   = con->sess->ib_sess.pd->local_dma_lkey;
+
+	ret = ib_post_rdma_write_imm(con->ib_con.qp, list, num_sge,
+				     con->sess->srv_rdma_buf_rkey,
+				     addr, (uintptr_t)req->iu, imm,
+				     cnt % (con->sess->queue_depth) ?
+				     0 : IB_SEND_SIGNALED);
+
+	kfree(list);
+	return ret;
+}
+
+static int ibtrs_post_recv(struct ibtrs_con *con, struct ibtrs_iu *iu)
+{
+	int err;
+	struct ib_recv_wr wr, *bad_wr;
+	struct ib_sge list;
+
+	list.addr   = iu->dma_addr;
+	list.length = iu->size;
+	list.lkey   = con->sess->ib_sess.pd->local_dma_lkey;
+
+	if (WARN_ON(list.length == 0)) {
+		WRN(con->sess, "Posting receive work request failed,"
+		    " sg list is empty\n");
+		return -EINVAL;
+	}
+
+	wr.next     = NULL;
+	wr.wr_id    = (uintptr_t)iu;
+	wr.sg_list  = &list;
+	wr.num_sge  = 1;
+
+	err = ib_post_recv(con->ib_con.qp, &wr, &bad_wr);
+	if (unlikely(err))
+		ERR(con->sess, "Posting receive work request failed, errno:"
+		    " %d\n", err);
+
+	return err;
+}
+
+static inline int ibtrs_clt_ms_to_id(unsigned long ms)
+{
+	int id = ms ? ilog2(ms) - MIN_LOG_LATENCY + 1 : 0;
+
+	return clamp(id, 0, MAX_LOG_LATENCY - MIN_LOG_LATENCY + 1);
+}
+
+static void ibtrs_clt_update_rdma_lat(struct ibtrs_clt_stats *s, bool read,
+				      unsigned long ms)
+{
+	const int id = ibtrs_clt_ms_to_id(ms);
+	const int cpu = raw_smp_processor_id();
+
+	if (read) {
+		s->rdma_lat_distr[cpu][id].read++;
+		if (s->rdma_lat_max[cpu].read < ms)
+			s->rdma_lat_max[cpu].read = ms;
+	} else {
+		s->rdma_lat_distr[cpu][id].write++;
+		if (s->rdma_lat_max[cpu].write < ms)
+			s->rdma_lat_max[cpu].write = ms;
+	}
+}
+
+static inline unsigned long ibtrs_clt_get_raw_ms(void)
+{
+	struct timespec ts;
+
+	getrawmonotonic(&ts);
+
+	return timespec_to_ms(&ts);
+}
+
+static inline void ibtrs_clt_decrease_inflight(struct ibtrs_clt_stats *s)
+{
+	s->rdma_stats[raw_smp_processor_id()].inflight--;
+}
+
+static void process_io_rsp(struct ibtrs_session *sess, u32 msg_id, s16 errno)
+{
+	struct rdma_req *req;
+	void *priv;
+	enum dma_data_direction dir;
+
+	if (unlikely(msg_id >= sess->queue_depth)) {
+		ERR(sess,
+		    "Immediate message with invalid msg id received: %d\n",
+		    msg_id);
+		return;
+	}
+
+	req = &sess->reqs[msg_id];
+
+	DEB("Processing io resp for msg_id: %u, %s\n", msg_id,
+	    req->dir == DMA_FROM_DEVICE ? "read" : "write");
+
+	if (req->sg_cnt > fmr_sg_cnt)
+		ibtrs_unmap_fast_reg_data(req->con, req);
+	if (req->sg_cnt)
+		ib_dma_unmap_sg(sess->ib_device, req->sglist,
+				req->sg_cnt, req->dir);
+	if (sess->enable_rdma_lat)
+		ibtrs_clt_update_rdma_lat(&sess->stats,
+					  req->dir == DMA_FROM_DEVICE,
+					  ibtrs_clt_get_raw_ms() -
+					  req->start_time);
+	ibtrs_clt_decrease_inflight(&sess->stats);
+
+	req->in_use = false;
+	req->con    = NULL;
+	priv = req->priv;
+	dir = req->dir;
+
+	clt_ops->rdma_ev(priv, dir == DMA_FROM_DEVICE ?
+			 IBTRS_CLT_RDMA_EV_RDMA_REQUEST_WRITE_COMPL :
+			 IBTRS_CLT_RDMA_EV_RDMA_WRITE_COMPL, errno);
+}
+
+static int ibtrs_send_msg_user_ack(struct ibtrs_con *con)
+{
+	int err;
+
+	rcu_read_lock();
+	smp_rmb(); /* fence con->state check */
+	if (unlikely(con->state != CSM_STATE_CONNECTED)) {
+		rcu_read_unlock();
+		INFO(con->sess, "Sending user msg ack failed, disconnected."
+		     " Connection state is %s, Session state is %s\n",
+		     csm_state_str(con->state),
+		     ssm_state_str(con->sess->state));
+		return -ECOMM;
+	}
+
+	err = ibtrs_write_empty_imm(con->ib_con.qp, UINT_MAX - 1,
+				    IB_SEND_SIGNALED);
+	rcu_read_unlock();
+	if (unlikely(err)) {
+		ERR_RL(con->sess, "Sending user msg ack failed, errno: %d\n",
+		       err);
+		return err;
+	}
+
+	ibtrs_heartbeat_set_send_ts(&con->sess->heartbeat);
+	return 0;
+}
+
+static void process_msg_user(struct ibtrs_con *con, struct ibtrs_msg_user *msg)
+{
+	int len;
+	struct ibtrs_session *sess = con->sess;
+
+	len = msg->hdr.tsize - IBTRS_HDR_LEN;
+
+	sess->stats.user_ib_msgs.recv_msg_cnt++;
+	sess->stats.user_ib_msgs.recv_size += len;
+
+	clt_ops->recv(sess->priv, (const void *)msg->payl, len);
+}
+
+static void process_msg_user_ack(struct ibtrs_con *con)
+{
+	struct ibtrs_session *sess = con->sess;
+
+	atomic_inc(&sess->peer_usr_msg_bufs);
+	wake_up(&con->sess->mu_buf_wait_q);
+}
+
+static void msg_worker(struct work_struct *work)
+{
+	struct msg_work *w;
+	struct ibtrs_con *con;
+	struct ibtrs_msg_user *msg;
+
+	w = container_of(work, struct msg_work, work);
+	con = w->con;
+	msg = w->msg;
+	kvfree(w);
+	process_msg_user(con, msg);
+	kvfree(msg);
+}
+
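+/*
+ * Copy the user message out of the receive buffer and hand it over to the
+ * message workqueue, so that the receive IU can be reposted immediately.
+ */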
+static int ibtrs_schedule_msg(struct ibtrs_con *con, struct ibtrs_msg_user *msg)
+{
+	struct msg_work *w;
+
+	w = ibtrs_malloc(sizeof(*w));
+	if (!w)
+		return -ENOMEM;
+
+	w->con = con;
+	w->msg = ibtrs_malloc(msg->hdr.tsize);
+	if (!w->msg) {
+		kvfree(w);
+		return -ENOMEM;
+	}
+	memcpy(w->msg, msg, msg->hdr.tsize);
+	INIT_WORK(&w->work, msg_worker);
+	queue_work(con->sess->msg_wq, &w->work);
+	return 0;
+}
+
+static void ibtrs_handle_recv(struct ibtrs_con *con, struct ibtrs_iu *iu)
+{
+	struct ibtrs_msg_hdr *hdr;
+	struct ibtrs_session *sess = con->sess;
+	int ret;
+
+	hdr = (struct ibtrs_msg_hdr *)iu->buf;
+	if (unlikely(ibtrs_validate_message(sess->queue_depth, hdr)))
+		goto err1;
+
+	DEB("recv completion, type 0x%02x\n",
+	    hdr->type);
+	print_hex_dump_debug("", DUMP_PREFIX_OFFSET, 8, 1, iu->buf,
+			     IBTRS_HDR_LEN, true);
+
+	switch (hdr->type) {
+	case IBTRS_MSG_USER:
+		ret = ibtrs_schedule_msg(con, iu->buf);
+		if (unlikely(ret)) {
+			ERR_RL(sess, "Scheduling worker of user message "
+			       "to user module failed, errno: %d\n", ret);
+			goto err1;
+		}
+		ret = ibtrs_post_recv(con, iu);
+		if (unlikely(ret)) {
+			ERR_RL(sess, "Posting receive buffer of user message "
+			       "to HCA failed, errno: %d\n", ret);
+			goto err2;
+		}
+		ret = ibtrs_send_msg_user_ack(con);
+		if (unlikely(ret)) {
+			ERR_RL(sess, "Sending ACK for user message failed, "
+			       "errno: %d\n", ret);
+			goto err2;
+		}
+		return;
+	case IBTRS_MSG_SESS_OPEN_RESP: {
+		int err;
+
+		err = process_open_rsp(con, iu->buf);
+		if (unlikely(err))
+			ssm_schedule_event(con->sess, SSM_EV_CON_ERROR);
+		else
+			ssm_schedule_event(con->sess, SSM_EV_GOT_RDMA_INFO);
+		return;
+	}
+	default:
+		WRN(sess, "Received message of unknown type: 0x%02x\n",
+		    hdr->type);
+		goto err1;
+	}
+
+err1:
+	ibtrs_post_recv(con, iu);
+err2:
+	ERR(sess, "Failed to processes IBTRS message\n");
+	csm_schedule_event(con, CSM_EV_CON_ERROR);
+}
+
+static void process_err_wc(struct ibtrs_con *con, struct ib_wc *wc)
+{
+	struct ibtrs_iu *iu;
+
+	if (wc->wr_id == (uintptr_t)&con->ib_con.beacon) {
+		csm_schedule_event(con, CSM_EV_BEACON_COMPLETED);
+		return;
+	}
+
+	if (wc->wr_id == FAST_REG_WR_ID_MASK ||
+	    wc->wr_id == LOCAL_INV_WR_ID_MASK) {
+		ERR_RL(con->sess, "Fast registration wr failed: wr_id: %d, "
+		       "status: %s\n", (int)wc->wr_id,
+		       ib_wc_status_msg(wc->status));
+		csm_schedule_event(con, CSM_EV_WC_ERROR);
+		return;
+	}
+	/* Only wc->wr_id is guaranteed to be correct in erroneous WCs;
+	 * we can't rely on wc->opcode, so use iu->direction to determine
+	 * whether it is a TX or an RX IU.
+	 */
+	iu = (struct ibtrs_iu *)wc->wr_id;
+	if (iu && iu->direction == DMA_TO_DEVICE && iu->is_msg)
+		put_u_msg_iu(con->sess, iu);
+
+	/* suppress FLUSH_ERR log when the connection is being disconnected */
+	if (unlikely(wc->status != IB_WC_WR_FLUSH_ERR ||
+		     (con->state != CSM_STATE_CLOSING &&
+		      con->state != CSM_STATE_FLUSHING)))
+		ERR_RL(con->sess, "wr_id: 0x%llx status: %d (%s),"
+		       " type: %d (%s), vendor_err: %x, len: %u,"
+		       " connection status: %s\n", wc->wr_id,
+		       wc->status, ib_wc_status_msg(wc->status),
+		       wc->opcode, ib_wc_opcode_str(wc->opcode),
+		       wc->vendor_err, wc->byte_len, csm_state_str(con->state));
+
+	csm_schedule_event(con, CSM_EV_WC_ERROR);
+}
+
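+/*
+ * Walk the array of polled work completions.  Failed WCs are handled by
+ * process_err_wc().  For RECV_RDMA_WITH_IMM completions an immediate value
+ * of UINT_MAX - 1 signals a user-message ack, any other value below
+ * UINT_MAX encodes an IO response; plain RECV completions carry IBTRS
+ * messages.
+ */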
+static int process_wcs(struct ibtrs_con *con, struct ib_wc *wcs, size_t len)
+{
+	int i, ret;
+	u32 imm;
+
+	for (i = 0; i < len; i++) {
+		u32 msg_id;
+		s16 errno;
+		struct ibtrs_msg_hdr *hdr;
+		struct ibtrs_iu *iu;
+		struct ib_wc wc = wcs[i];
+
+		if (unlikely(wc.status != IB_WC_SUCCESS)) {
+			process_err_wc(con, &wc);
+			continue;
+		}
+
+		DEB("cq complete with wr_id 0x%llx "
+		    "status %d (%s) type %d (%s) len %u\n",
+		    wc.wr_id, wc.status, ib_wc_status_msg(wc.status), wc.opcode,
+		    ib_wc_opcode_str(wc.opcode), wc.byte_len);
+
+		iu = (struct ibtrs_iu *)wc.wr_id;
+
+		switch (wc.opcode) {
+		case IB_WC_SEND:
+			if (con->user) {
+				if (iu == con->sess->sess_info_iu)
+					break;
+				put_u_msg_iu(con->sess, iu);
+				wake_up(&con->sess->mu_iu_wait_q);
+			}
+			break;
+		case IB_WC_RDMA_WRITE:
+			break;
+		case IB_WC_RECV_RDMA_WITH_IMM:
+			ibtrs_set_last_heartbeat(&con->sess->heartbeat);
+			imm = be32_to_cpu(wc.ex.imm_data);
+			ret = ibtrs_post_recv(con, iu);
+			if (ret) {
+				ERR(con->sess, "Failed to post receive "
+				    "buffer\n");
+				csm_schedule_event(con, CSM_EV_CON_ERROR);
+			}
+
+			if (imm == UINT_MAX) {
+				break;
+			} else if (imm == UINT_MAX - 1) {
+				process_msg_user_ack(con);
+				break;
+			}
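+			/* immediate data layout: msg_id in the upper 16 bits,
+			 * IO status (s16 errno) in the lower 16 bits
+			 */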
+			msg_id = imm >> 16;
+			errno = (imm << 16) >> 16;
+			process_io_rsp(con->sess, msg_id, errno);
+			break;
+
+		case IB_WC_RECV:
+			ibtrs_set_last_heartbeat(&con->sess->heartbeat);
+
+			hdr = (struct ibtrs_msg_hdr *)iu->buf;
+			ibtrs_deb_msg_hdr("Received: ", hdr);
+			ibtrs_handle_recv(con, iu);
+			break;
+
+		default:
+			WRN(con->sess, "Unexpected WC type: %s\n",
+			    ib_wc_opcode_str(wc.opcode));
+		}
+	}
+
+	return 0;
+}
+
+static void ibtrs_clt_update_wc_stats(struct ibtrs_con *con, int cnt)
+{
+	short cpu = con->cpu;
+
+	if (cnt > con->sess->stats.wc_comp[cpu].max_wc_cnt)
+		con->sess->stats.wc_comp[cpu].max_wc_cnt = cnt;
+	con->sess->stats.wc_comp[cpu].cnt++;
+	con->sess->stats.wc_comp[cpu].total_cnt += cnt;
+}
+
+static int get_process_wcs(struct ibtrs_con *con)
+{
+	int cnt, err;
+	struct ib_wc *wcs = con->wcs;
+
+	do {
+		cnt = ib_poll_cq(con->ib_con.cq, ARRAY_SIZE(con->wcs), wcs);
+		if (unlikely(cnt < 0)) {
+			ERR(con->sess, "Getting work requests from completion"
+			    " queue failed, errno: %d\n", cnt);
+			return cnt;
+		}
+		DEB("Retrieved %d wcs from CQ\n", cnt);
+
+		if (likely(cnt > 0)) {
+			err = process_wcs(con, wcs, cnt);
+			if (unlikely(err))
+				return err;
+			ibtrs_clt_update_wc_stats(con, cnt);
+		}
+	} while (cnt > 0);
+
+	return 0;
+}
+
+static void process_con_rejected(struct ibtrs_con *con,
+				 struct rdma_cm_event *event)
+{
+	const struct ibtrs_msg_error *msg;
+
+	msg = event->param.conn.private_data;
+	/* Check if the server has sent a message in the private data.
+	 * IB_CM_REJ_CONSUMER_DEFINED is set not only when ibtrs_server
+	 * provided private data for the rdma_reject() call, so the message
+	 * itself also needs to be checked.
+	 */
+	if (event->status != IB_CM_REJ_CONSUMER_DEFINED ||
+	    msg->hdr.type != IBTRS_MSG_ERROR)
+		return;
+
+	if (unlikely(ibtrs_validate_message(con->sess->queue_depth, msg))) {
+		ERR(con->sess,
+		    "Received invalid connection rejected message\n");
+		return;
+	}
+
+	if (con == &con->sess->con[0] && msg->errno == -EEXIST)
+		ERR(con->sess, "Connection rejected by the server,"
+		    " session already exists, errno: %d\n", msg->errno);
+	else
+		ERR(con->sess, "Connection rejected by the server, errno: %d\n",
+		    msg->errno);
+}
+
+static int ibtrs_clt_rdma_cm_ev_handler(struct rdma_cm_id *cm_id,
+					struct rdma_cm_event *event)
+{
+	struct ibtrs_con *con = cm_id->context;
+
+	switch (event->event) {
+	case RDMA_CM_EVENT_ADDR_RESOLVED:
+		DEB("addr resolved on cma_id is %p\n", cm_id);
+		csm_schedule_event(con, CSM_EV_ADDR_RESOLVED);
+		break;
+
+	case RDMA_CM_EVENT_ROUTE_RESOLVED: {
+		struct sockaddr_storage *peer_addr = &con->sess->peer_addr;
+		struct sockaddr_storage *self_addr = &con->sess->self_addr;
+
+		DEB("route resolved on cma_id is %p\n", cm_id);
+		/* initiator is src, target is dst */
+		memcpy(peer_addr, &cm_id->route.addr.dst_addr,
+		       sizeof(*peer_addr));
+		memcpy(self_addr, &cm_id->route.addr.src_addr,
+		       sizeof(*self_addr));
+
+		switch (peer_addr->ss_family) {
+		case AF_INET:
+			DEB("Route %pI4->%pI4 resolved\n",
+			    &((struct sockaddr_in *)
+			      self_addr)->sin_addr.s_addr,
+			    &((struct sockaddr_in *)
+			      peer_addr)->sin_addr.s_addr);
+			break;
+		case AF_INET6:
+			DEB("Route %pI6->%pI6 resolved\n",
+			    &((struct sockaddr_in6 *)self_addr)->sin6_addr,
+			    &((struct sockaddr_in6 *)peer_addr)->sin6_addr);
+			break;
+		case AF_IB:
+			DEB("Route %pI6->%pI6 resolved\n",
+			    &((struct sockaddr_ib *)self_addr)->sib_addr,
+			    &((struct sockaddr_ib *)peer_addr)->sib_addr);
+			break;
+		default:
+			DEB("Route resolved (unknown address family)\n");
+		}
+
+		csm_schedule_event(con, CSM_EV_ROUTE_RESOLVED);
+		}
+		break;
+
+	case RDMA_CM_EVENT_ESTABLISHED:
+		DEB("Connection established\n");
+
+		csm_schedule_event(con, CSM_EV_CON_ESTABLISHED);
+		break;
+
+	case RDMA_CM_EVENT_ADDR_ERROR:
+	case RDMA_CM_EVENT_ROUTE_ERROR:
+	case RDMA_CM_EVENT_CONNECT_ERROR:
+		ERR(con->sess, "Connection establishment error"
+		    " (CM event: %s, errno: %d)\n",
+		    rdma_event_msg(event->event), event->status);
+		csm_schedule_event(con, CSM_EV_CON_ERROR);
+		break;
+
+	case RDMA_CM_EVENT_DISCONNECTED:
+	case RDMA_CM_EVENT_TIMEWAIT_EXIT:
+		csm_schedule_event(con, CSM_EV_CON_DISCONNECTED);
+		break;
+
+	case RDMA_CM_EVENT_REJECTED:
+		/* the reject status is defined in an enum, not an errno */
+		ERR_RL(con->sess,
+		       "Connection rejected (CM event: %s, err: %s)\n",
+		       rdma_event_msg(event->event),
+		       rdma_reject_msg(cm_id, event->status));
+		process_con_rejected(con, event);
+		csm_schedule_event(con, CSM_EV_CON_ERROR);
+		break;
+
+	case RDMA_CM_EVENT_UNREACHABLE:
+	case RDMA_CM_EVENT_ADDR_CHANGE: {
+		ERR_RL(con->sess, "CM error (CM event: %s, errno: %d)\n",
+		       rdma_event_msg(event->event), event->status);
+
+		csm_schedule_event(con, CSM_EV_CON_ERROR);
+		break;
+	}
+	case RDMA_CM_EVENT_DEVICE_REMOVAL: {
+		struct completion dc;
+
+		ERR_RL(con->sess, "CM error (CM event: %s, errno: %d)\n",
+		       rdma_event_msg(event->event), event->status);
+
+		con->device_being_removed = true;
+		init_completion(&dc);
+		con->sess->ib_sess_destroy_completion = &dc;
+
+		/* Generating a CON_ERROR event will cause the SSM to close all
+		 * the connections and try to reconnect. Wait until all
+		 * connections are closed and the ib session destroyed before
+		 * returning to the ib core code.
+		 */
+		csm_schedule_event(con, CSM_EV_CON_ERROR);
+		wait_for_completion(&dc);
+		con->sess->ib_sess_destroy_completion = NULL;
+
+		/* return 1 so cm_id is destroyed afterwards */
+		return 1;
+	}
+	default:
+		WRN(con->sess, "Ignoring unexpected CM event %s, errno: %d\n",
+		    rdma_event_msg(event->event), event->status);
+		break;
+	}
+	return 0;
+}
+
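+/*
+ * Drain the CQ, then re-arm notifications; if ib_req_notify_cq() reports
+ * missed events, poll again so that no completion is left unprocessed.
+ */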
+static void handle_cq_comp(struct ibtrs_con *con)
+{
+	int err;
+
+	err = get_process_wcs(con);
+	if (unlikely(err))
+		goto error;
+
+	while ((err = ib_req_notify_cq(con->ib_con.cq, IB_CQ_NEXT_COMP |
+				       IB_CQ_REPORT_MISSED_EVENTS)) > 0) {
+		DEB("Missed %d CQ notifications, processing missed WCs...\n",
+		    err);
+		err = get_process_wcs(con);
+		if (unlikely(err))
+			goto error;
+	}
+
+	if (unlikely(err))
+		goto error;
+
+	return;
+
+error:
+	ERR(con->sess, "Failed to get WCs from CQ, errno: %d\n", err);
+	csm_schedule_event(con, CSM_EV_CON_ERROR);
+}
+
+static inline void tasklet_handle_cq_comp(unsigned long data)
+{
+	struct ibtrs_con *con = (struct ibtrs_con *)data;
+
+	handle_cq_comp(con);
+}
+
+static inline void wrapper_handle_cq_comp(struct work_struct *work)
+{
+	struct ibtrs_con *con = container_of(work, struct ibtrs_con, cq_work);
+
+	handle_cq_comp(con);
+}
+
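+/*
+ * CQ notification callback: completions of the user connection are
+ * processed from a workqueue (may sleep), IO connections are drained
+ * from a tasklet.
+ */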
+static void cq_event_handler(struct ib_cq *cq, void *ctx)
+{
+	struct ibtrs_con *con = ctx;
+	int cpu = raw_smp_processor_id();
+
+	if (unlikely(con->cpu != cpu)) {
+		DEB_RL("WC processing is migrated from CPU %d to %d, cstate %s,"
+		       " sstate %s, user: %s\n", con->cpu,
+		       cpu, csm_state_str(con->state),
+		       ssm_state_str(con->sess->state),
+		       con->user ? "true" : "false");
+		atomic_inc(&con->sess->stats.cpu_migr.from[con->cpu]);
+		con->sess->stats.cpu_migr.to[cpu]++;
+	}
+
+	/* queue_work() can return false here: the work may already be
+	 * queued when CQ notifications were already activated and are
+	 * activated again after the beacon was posted.
+	 */
+	if (con->user)
+		queue_work(con->cq_wq, &con->cq_work);
+	else
+		tasklet_schedule(&con->cq_tasklet);
+}
+
+static int post_io_con_recv(struct ibtrs_con *con)
+{
+	int i, ret;
+	struct ibtrs_iu *dummy_rx_iu = con->sess->dummy_rx_iu;
+
+	for (i = 0; i < con->sess->queue_depth; i++) {
+		ret = ibtrs_post_recv(con, dummy_rx_iu);
+		if (unlikely(ret)) {
+			WRN(con->sess,
+			    "Posting receive buffers to HCA failed, errno:"
+			    " %d\n", ret);
+			return ret;
+		}
+	}
+	return 0;
+}
+
+static int post_usr_con_recv(struct ibtrs_con *con)
+{
+	int i, ret;
+
+	for (i = 0; i < USR_CON_BUF_SIZE; i++) {
+		struct ibtrs_iu *iu = con->sess->usr_rx_ring[i];
+
+		ret = ibtrs_post_recv(con, iu);
+		if (unlikely(ret)) {
+			WRN(con->sess,
+			    "Posting receive buffers to HCA failed, errno:"
+			    " %d\n", ret);
+			return ret;
+		}
+	}
+	return 0;
+}
+
+static int post_init_con_recv(struct ibtrs_con *con)
+{
+	int ret;
+
+	ret = ibtrs_post_recv(con, con->sess->rdma_info_iu);
+	if (unlikely(ret))
+		WRN(con->sess,
+		    "Posting rdma info iu to HCA failed, errno: %d\n", ret);
+	return ret;
+}
+
+static int post_recv(struct ibtrs_con *con)
+{
+	if (con->user)
+		return post_init_con_recv(con);
+	else
+		return post_io_con_recv(con);
+}
+
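+/*
+ * Complete an in-flight request towards the user with -ECONNABORTED after
+ * unmapping its registered memory and DMA mappings.
+ */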
+static void fail_outstanding_req(struct ibtrs_con *con, struct rdma_req *req)
+{
+	void *priv;
+	enum dma_data_direction dir;
+
+	if (!req->in_use)
+		return;
+
+	if (req->sg_cnt > fmr_sg_cnt)
+		ibtrs_unmap_fast_reg_data(con, req);
+	if (req->sg_cnt)
+		ib_dma_unmap_sg(con->sess->ib_device, req->sglist,
+				req->sg_cnt, req->dir);
+	ibtrs_clt_decrease_inflight(&con->sess->stats);
+
+	req->in_use = false;
+	req->con    = NULL;
+	priv = req->priv;
+	dir = req->dir;
+
+	clt_ops->rdma_ev(priv, dir == DMA_FROM_DEVICE ?
+			 IBTRS_CLT_RDMA_EV_RDMA_REQUEST_WRITE_COMPL :
+			 IBTRS_CLT_RDMA_EV_RDMA_WRITE_COMPL, -ECONNABORTED);
+
+	DEB("Canceled outstanding request\n");
+}
+
+static void fail_outstanding_reqs(struct ibtrs_con *con)
+{
+	struct ibtrs_session *sess = con->sess;
+	int i;
+
+	if (!sess->reqs)
+		return;
+	for (i = 0; i < sess->queue_depth; ++i) {
+		if (sess->reqs[i].con == con)
+			fail_outstanding_req(con, &sess->reqs[i]);
+	}
+}
+
+static void fail_all_outstanding_reqs(struct ibtrs_session *sess)
+{
+	int i;
+
+	if (!sess->reqs)
+		return;
+	for (i = 0; i < sess->queue_depth; ++i)
+		fail_outstanding_req(sess->reqs[i].con, &sess->reqs[i]);
+}
+
+static void ibtrs_free_reqs(struct ibtrs_session *sess)
+{
+	struct rdma_req *req;
+	int i;
+
+	if (!sess->reqs)
+		return;
+
+	for (i = 0; i < sess->queue_depth; ++i) {
+		req = &sess->reqs[i];
+
+		if (sess->fast_reg_mode == IBTRS_FAST_MEM_FR) {
+			kfree(req->fr_list);
+			req->fr_list = NULL;
+		} else if (sess->fast_reg_mode == IBTRS_FAST_MEM_FMR) {
+			kfree(req->fmr_list);
+			req->fmr_list = NULL;
+		}
+
+		kfree(req->map_page);
+		req->map_page = NULL;
+	}
+
+	kfree(sess->reqs);
+	sess->reqs = NULL;
+}
+
+static int ibtrs_alloc_reqs(struct ibtrs_session *sess)
+{
+	struct rdma_req *req = NULL;
+	void *mr_list = NULL;
+	int i;
+
+	sess->reqs = kcalloc(sess->queue_depth, sizeof(*sess->reqs),
+			     GFP_KERNEL);
+	if (!sess->reqs)
+		return -ENOMEM;
+
+	for (i = 0; i < sess->queue_depth; ++i) {
+		req = &sess->reqs[i];
+		mr_list = kmalloc_array(sess->max_pages_per_mr,
+					sizeof(void *), GFP_KERNEL);
+		if (!mr_list)
+			goto out;
+
+		if (sess->fast_reg_mode == IBTRS_FAST_MEM_FR)
+			req->fr_list = mr_list;
+		else if (sess->fast_reg_mode == IBTRS_FAST_MEM_FMR)
+			req->fmr_list = mr_list;
+
+		req->map_page = kmalloc(sess->max_pages_per_mr *
+					sizeof(void *), GFP_KERNEL);
+		if (!req->map_page)
+			goto out;
+	}
+
+	return 0;
+
+out:
+	ibtrs_free_reqs(sess);
+	return -ENOMEM;
+}
+
+static void free_sess_rx_bufs(struct ibtrs_session *sess)
+{
+	int i;
+
+	if (!sess->usr_rx_ring)
+		return;
+
+	for (i = 0; i < USR_CON_BUF_SIZE; ++i)
+		if (sess->usr_rx_ring[i])
+			ibtrs_iu_free(sess->usr_rx_ring[i],
+				      DMA_FROM_DEVICE,
+				      sess->ib_device);
+
+	kfree(sess->usr_rx_ring);
+	sess->usr_rx_ring = NULL;
+}
+
+static void free_sess_tx_bufs(struct ibtrs_session *sess, bool check)
+{
+	int i;
+	struct ibtrs_iu *e, *next;
+
+	if (!sess->io_tx_ius)
+		return;
+
+	for (i = 0; i < sess->queue_depth; i++)
+		if (sess->io_tx_ius[i])
+			ibtrs_iu_free(sess->io_tx_ius[i], DMA_TO_DEVICE,
+				      sess->ib_device);
+
+	kfree(sess->io_tx_ius);
+	sess->io_tx_ius = NULL;
+	if (check) {
+		struct list_head *le;
+		size_t cnt = 0;
+
+		list_for_each(le, &sess->u_msg_ius_list)
+			cnt++;
+
+		WARN_ON(cnt != USR_CON_BUF_SIZE);
+	}
+	list_for_each_entry_safe(e, next, &sess->u_msg_ius_list, list) {
+		list_del(&e->list);
+		ibtrs_iu_free(e, DMA_TO_DEVICE, sess->ib_device);
+	}
+}
+
+static void free_sess_fast_pool(struct ibtrs_session *sess)
+{
+	if (sess->fast_reg_mode == IBTRS_FAST_MEM_FMR) {
+		if (sess->fmr_pool)
+			ib_destroy_fmr_pool(sess->fmr_pool);
+		sess->fmr_pool = NULL;
+	}
+}
+
+static void free_sess_tr_bufs(struct ibtrs_session *sess)
+{
+	free_sess_rx_bufs(sess);
+	free_sess_tx_bufs(sess, true);
+}
+
+static void free_sess_init_bufs(struct ibtrs_session *sess)
+{
+	if (sess->rdma_info_iu) {
+		ibtrs_iu_free(sess->rdma_info_iu, DMA_FROM_DEVICE,
+			      sess->ib_device);
+		sess->rdma_info_iu = NULL;
+	}
+
+	if (sess->dummy_rx_iu) {
+		ibtrs_iu_free(sess->dummy_rx_iu, DMA_FROM_DEVICE,
+			      sess->ib_device);
+		sess->dummy_rx_iu = NULL;
+	}
+
+	if (sess->sess_info_iu) {
+		ibtrs_iu_free(sess->sess_info_iu, DMA_TO_DEVICE,
+			      sess->ib_device);
+		sess->sess_info_iu = NULL;
+	}
+}
+
+static void free_io_bufs(struct ibtrs_session *sess)
+{
+	ibtrs_free_reqs(sess);
+	free_sess_fast_pool(sess);
+	kfree(sess->tags_map);
+	sess->tags_map = NULL;
+	kfree(sess->tags);
+	sess->tags = NULL;
+	sess->io_bufs_initialized = false;
+}
+
+static void free_sess_bufs(struct ibtrs_session *sess)
+{
+	free_sess_init_bufs(sess);
+	free_io_bufs(sess);
+}
+
+static struct ib_fmr_pool *alloc_fmr_pool(struct ibtrs_session *sess)
+{
+	struct ib_fmr_pool_param fmr_param;
+
+	memset(&fmr_param, 0, sizeof(fmr_param));
+	fmr_param.pool_size	    = sess->queue_depth *
+				      sess->max_pages_per_mr;
+	fmr_param.dirty_watermark   = fmr_param.pool_size / 4;
+	fmr_param.cache		    = 0;
+	fmr_param.max_pages_per_fmr = sess->max_pages_per_mr;
+	fmr_param.page_shift	    = ilog2(sess->mr_page_size);
+	fmr_param.access	    = (IB_ACCESS_LOCAL_WRITE |
+				       IB_ACCESS_REMOTE_WRITE);
+
+	return ib_create_fmr_pool(sess->ib_sess.pd, &fmr_param);
+}
+
+static int alloc_sess_rx_bufs(struct ibtrs_session *sess)
+{
+	int i;
+	u32 max_req_size = sess->max_req_size;
+
+	sess->usr_rx_ring = kcalloc(USR_CON_BUF_SIZE,
+				    sizeof(*sess->usr_rx_ring),
+				    GFP_KERNEL);
+	if (!sess->usr_rx_ring)
+		goto err;
+
+	for (i = 0; i < USR_CON_BUF_SIZE; ++i) {
+		/* allocate recv buffer; the open response is the biggest message */
+		sess->usr_rx_ring[i] = ibtrs_iu_alloc(i, max_req_size,
+						      GFP_KERNEL,
+						      sess->ib_device,
+						      DMA_FROM_DEVICE, true);
+		if (!sess->usr_rx_ring[i]) {
+			WRN(sess, "Failed to allocate IU for RX ring\n");
+			goto err;
+		}
+	}
+
+	return 0;
+
+err:
+	free_sess_rx_bufs(sess);
+
+	return -ENOMEM;
+}
+
+static int alloc_sess_fast_pool(struct ibtrs_session *sess)
+{
+	int err = 0;
+	struct ib_fmr_pool *fmr_pool;
+
+	if (sess->fast_reg_mode == IBTRS_FAST_MEM_FMR) {
+		fmr_pool = alloc_fmr_pool(sess);
+		if (IS_ERR(fmr_pool)) {
+			err = PTR_ERR(fmr_pool);
+			ERR(sess, "FMR pool allocation failed, errno: %d\n",
+			    err);
+			return err;
+		}
+		sess->fmr_pool = fmr_pool;
+	}
+	return err;
+}
+
+static int alloc_sess_init_bufs(struct ibtrs_session *sess)
+{
+	sess->sess_info_iu = ibtrs_iu_alloc(0, MSG_SESS_INFO_SIZE, GFP_KERNEL,
+			       sess->ib_device, DMA_TO_DEVICE, true);
+	if (unlikely(!sess->sess_info_iu)) {
+		ERR_RL(sess, "Can't allocate transfer buffer for "
+			     "sess hostname\n");
+		return -ENOMEM;
+	}
+	sess->rdma_info_iu =
+		ibtrs_iu_alloc(0,
+			       IBTRS_MSG_SESS_OPEN_RESP_LEN(MAX_SESS_QUEUE_DEPTH),
+			       GFP_KERNEL, sess->ib_device,
+			       DMA_FROM_DEVICE, true);
+	if (!sess->rdma_info_iu) {
+		WRN(sess, "Failed to allocate IU to receive "
+			  "RDMA INFO message\n");
+		goto err;
+	}
+
+	sess->dummy_rx_iu =
+		ibtrs_iu_alloc(0, IBTRS_HDR_LEN,
+			       GFP_KERNEL, sess->ib_device,
+			       DMA_FROM_DEVICE, true);
+	if (!sess->dummy_rx_iu) {
+		WRN(sess, "Failed to allocate IU to receive "
+			  "immediate messages on io connections\n");
+		goto err;
+	}
+
+	return 0;
+
+err:
+	free_sess_init_bufs(sess);
+
+	return -ENOMEM;
+}
+
+static int alloc_sess_tx_bufs(struct ibtrs_session *sess)
+{
+	int i;
+	struct ibtrs_iu *iu;
+	u32 max_req_size = sess->max_req_size;
+
+	INIT_LIST_HEAD(&sess->u_msg_ius_list);
+	spin_lock_init(&sess->u_msg_ius_lock);
+
+	sess->io_tx_ius = kcalloc(sess->queue_depth, sizeof(*sess->io_tx_ius),
+				  GFP_KERNEL);
+	if (!sess->io_tx_ius)
+		goto err;
+
+	for (i = 0; i < sess->queue_depth; ++i) {
+		iu = ibtrs_iu_alloc(i, max_req_size, GFP_KERNEL,
+				    sess->ib_device, DMA_TO_DEVICE, false);
+		if (!iu) {
+			WRN(sess, "Failed to allocate IU for TX buffer\n");
+			goto err;
+		}
+		sess->io_tx_ius[i] = iu;
+	}
+
+	for (i = 0; i < USR_CON_BUF_SIZE; ++i) {
+		iu = ibtrs_iu_alloc(i, max_req_size, GFP_KERNEL,
+				    sess->ib_device, DMA_TO_DEVICE,
+				    true);
+		if (!iu) {
+			WRN(sess, "Failed to allocate IU for TX buffer\n");
+			goto err;
+		}
+		list_add(&iu->list, &sess->u_msg_ius_list);
+	}
+	return 0;
+
+err:
+	free_sess_tx_bufs(sess, false);
+
+	return -ENOMEM;
+}
+
+static int alloc_sess_tr_bufs(struct ibtrs_session *sess)
+{
+	int err;
+
+	err = alloc_sess_rx_bufs(sess);
+	if (!err)
+		err = alloc_sess_tx_bufs(sess);
+
+	return err;
+}
+
+static int alloc_sess_tags(struct ibtrs_session *sess)
+{
+	int err, i;
+
+	sess->tags_map = kzalloc(BITS_TO_LONGS(sess->queue_depth) *
+				 sizeof(long), GFP_KERNEL);
+	if (!sess->tags_map) {
+		ERR(sess, "Failed to alloc tags bitmap\n");
+		err = -ENOMEM;
+		goto out_err;
+	}
+
+	sess->tags = kcalloc(sess->queue_depth, TAG_SIZE(sess),
+			     GFP_KERNEL);
+	if (!sess->tags) {
+		ERR(sess, "Failed to alloc memory for tags\n");
+		err = -ENOMEM;
+		goto err_map;
+	}
+
+	for (i = 0; i < sess->queue_depth; i++) {
+		struct ibtrs_tag *tag;
+
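+		/*
+		 * The buffer id goes into the upper bits of the RDMA
+		 * immediate (mem_id_mask); the lower bits are left free to
+		 * carry the offset of the trailing ibtrs message within the
+		 * chunk (see the imm computation in the RDMA write paths).
+		 */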
+		tag = GET_TAG(sess, i);
+		tag->mem_id = i;
+		tag->mem_id_mask = i << ((IB_IMM_SIZE_BITS - 1) -
+					 ilog2(sess->queue_depth - 1));
+	}
+
+	return 0;
+
+err_map:
+	kfree(sess->tags_map);
+	sess->tags_map = NULL;
+out_err:
+	return err;
+}
+
+static int connect_qp(struct ibtrs_con *con)
+{
+	int err;
+	struct rdma_conn_param conn_param;
+	struct ibtrs_msg_sess_open somsg;
+	struct ibtrs_msg_con_open comsg;
+
+	memset(&conn_param, 0, sizeof(conn_param));
+	conn_param.retry_count = retry_count;
+
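+	/*
+	 * The user connection announces the number of connections per
+	 * session and the session UUID in the CM private data; IO
+	 * connections only carry the UUID.
+	 */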
+	if (con->user) {
+		if (CONS_PER_SESSION > U8_MAX)
+			return -EINVAL;
+		fill_ibtrs_msg_sess_open(&somsg, CONS_PER_SESSION, &uuid);
+		conn_param.private_data		= &somsg;
+		conn_param.private_data_len	= sizeof(somsg);
+		conn_param.rnr_retry_count	= 7;
+	} else {
+		fill_ibtrs_msg_con_open(&comsg, &uuid);
+		conn_param.private_data		= &comsg;
+		conn_param.private_data_len	= sizeof(comsg);
+	}
+	err = rdma_connect(con->cm_id, &conn_param);
+	if (err) {
+		ERR(con->sess, "Establishing RDMA connection failed, errno:"
+		    " %d\n", err);
+		return err;
+	}
+
+	DEB("rdma_connect successful\n");
+	return 0;
+}
+
+static int resolve_addr(struct ibtrs_con *con,
+			const struct sockaddr_storage *addr)
+{
+	int err;
+
+	err = rdma_resolve_addr(con->cm_id, NULL,
+				(struct sockaddr *)addr, 1000);
+	if (err)
+		/* TODO: include the address that we tried to resolve in the
+		 * message; it can be an AF_INET, AF_INET6 or an AF_IB
+		 * address.
+		 */
+		ERR(con->sess, "Resolving server address failed, errno: %d\n",
+		    err);
+	return err;
+}
+
+static int resolve_route(struct ibtrs_con *con)
+{
+	int err;
+
+	err = rdma_resolve_route(con->cm_id, 1000);
+	if (err)
+		ERR(con->sess, "Resolving route failed, errno: %d\n",
+		    err);
+
+	return err;
+}
+
+static int query_fast_reg_mode(struct ibtrs_con *con)
+{
+	struct ib_device *ibdev = con->sess->ib_device;
+	struct ib_device_attr *dev_attr = &ibdev->attrs;
+	int mr_page_shift;
+	u64 max_pages_per_mr;
+
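+	/*
+	 * FMR is selected if the device implements the FMR verbs; fast
+	 * registration (FR) takes precedence when the device advertises
+	 * memory management extensions and use_fr is set.
+	 */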
+	if (ibdev->alloc_fmr && ibdev->dealloc_fmr &&
+	    ibdev->map_phys_fmr && ibdev->unmap_fmr) {
+		con->sess->fast_reg_mode = IBTRS_FAST_MEM_FMR;
+		INFO(con->sess, "Device %s supports FMR\n", ibdev->name);
+	}
+	if (dev_attr->device_cap_flags & IB_DEVICE_MEM_MGT_EXTENSIONS &&
+	    use_fr) {
+		con->sess->fast_reg_mode = IBTRS_FAST_MEM_FR;
+		INFO(con->sess, "Device %s supports FR\n", ibdev->name);
+	}
+
+	/*
+	 * Use the smallest page size supported by the HCA, down to a
+	 * minimum of 4096 bytes. We're unlikely to build large sglists
+	 * out of smaller entries.
+	 */
+	mr_page_shift		= max(12, ffs(dev_attr->page_size_cap) - 1);
+	con->sess->mr_page_size	= 1 << mr_page_shift;
+	con->sess->max_sge	= dev_attr->max_sge;
+	con->sess->mr_page_mask	= ~((u64)con->sess->mr_page_size - 1);
+	max_pages_per_mr	= dev_attr->max_mr_size;
+	do_div(max_pages_per_mr, con->sess->mr_page_size);
+	con->sess->max_pages_per_mr = min_t(u64, con->sess->max_pages_per_mr,
+					    max_pages_per_mr);
+	if (con->sess->fast_reg_mode == IBTRS_FAST_MEM_FR) {
+		con->sess->max_pages_per_mr =
+			min_t(u32, con->sess->max_pages_per_mr,
+			      dev_attr->max_fast_reg_page_list_len);
+	}
+	con->sess->mr_max_size	= con->sess->mr_page_size *
+				  con->sess->max_pages_per_mr;
+	DEB("%s: mr_page_shift = %d, dev_attr->max_mr_size = %#llx, "
+	    "dev_attr->max_fast_reg_page_list_len = %u, max_pages_per_mr = %d, "
+	    "mr_max_size = %#x\n", ibdev->name, mr_page_shift,
+	    dev_attr->max_mr_size, dev_attr->max_fast_reg_page_list_len,
+	    con->sess->max_pages_per_mr, con->sess->mr_max_size);
+	return 0;
+}
+
+static int send_heartbeat(struct ibtrs_session *sess)
+{
+	int err;
+	struct ibtrs_con *con;
+
+	con = &sess->con[0];
+
+	rcu_read_lock();
+	smp_rmb(); /* fence con->state check */
+	if (unlikely(con->state != CSM_STATE_CONNECTED)) {
+		rcu_read_unlock();
+		ERR_RL(sess, "Sending heartbeat message failed, not connected."
+		       " Connection state changed to %s!\n",
+		       csm_state_str(con->state));
+		return -ECOMM;
+	}
+
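+	/*
+	 * The heartbeat is an empty, signalled RDMA write with immediate
+	 * data UINT_MAX posted on the user connection.
+	 */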
+	err = ibtrs_write_empty_imm(con->ib_con.qp, UINT_MAX, IB_SEND_SIGNALED);
+	rcu_read_unlock();
+	if (unlikely(err)) {
+		WRN(sess, "Sending heartbeat failed, posting msg to QP failed,"
+		    " errno: %d\n", err);
+		return err;
+	}
+
+	ibtrs_heartbeat_set_send_ts(&sess->heartbeat);
+
+	return err;
+}
+
+static void heartbeat_work(struct work_struct *work)
+{
+	int err;
+	struct ibtrs_session *sess;
+
+	sess = container_of(to_delayed_work(work), struct ibtrs_session,
+			    heartbeat_dwork);
+
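+	/*
+	 * If the heartbeat timeout has expired, trigger a session reconnect;
+	 * otherwise send a heartbeat when the last send is older than
+	 * HEARTBEAT_INTV_MS and re-arm the delayed work.
+	 */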
+	if (ibtrs_heartbeat_timeout_is_expired(&sess->heartbeat)) {
+		ssm_schedule_event(sess, SSM_EV_RECONNECT_HEARTBEAT);
+		return;
+	}
+
+	ibtrs_heartbeat_warn(&sess->heartbeat);
+
+	if (ibtrs_heartbeat_send_ts_diff_ms(&sess->heartbeat) >=
+	    HEARTBEAT_INTV_MS) {
+		err = send_heartbeat(sess);
+		if (unlikely(err))
+			WRN(sess, "Sending heartbeat failed, errno: %d\n",
+			    err);
+	}
+
+	if (!schedule_delayed_work(&sess->heartbeat_dwork,
+				   HEARTBEAT_INTV_JIFFIES))
+		WRN(sess, "Schedule heartbeat work failed, already queued?\n");
+}
+
+static int create_cm_id_con(const struct sockaddr_storage *addr,
+			    struct ibtrs_con *con)
+{
+	int err;
+
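+	/* AF_IB addresses use the IB port space, IP addresses the TCP one. */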
+	if (addr->ss_family == AF_IB)
+		con->cm_id = rdma_create_id(&init_net,
+					    ibtrs_clt_rdma_cm_ev_handler, con,
+					    RDMA_PS_IB, IB_QPT_RC);
+	else
+		con->cm_id = rdma_create_id(&init_net,
+					    ibtrs_clt_rdma_cm_ev_handler, con,
+					    RDMA_PS_TCP, IB_QPT_RC);
+
+	if (IS_ERR(con->cm_id)) {
+		err = PTR_ERR(con->cm_id);
+		WRN(con->sess, "Failed to create CM ID, errno: %d\n", err);
+		con->cm_id = NULL;
+		return err;
+	}
+
+	return 0;
+}
+
+static int create_ib_sess(struct ibtrs_con *con)
+{
+	int err;
+	struct ibtrs_session *sess = con->sess;
+
+	if (atomic_read(&sess->ib_sess_initialized) == 1)
+		return 0;
+
+	if (WARN_ON(!con->cm_id->device)) {
+		WRN(sess, "Invalid CM ID device\n");
+		return -EINVAL;
+	}
+
+	/* TODO: ib_device_hold(con->cm_id->device); */
+	sess->ib_device = con->cm_id->device;
+
+	/* For performance reasons we want one completion vector per CPU; if
+	 * fewer are available, just warn and continue.
+	 */
+	if (sess->ib_device->num_comp_vectors < num_online_cpus()) {
+		WRN(sess,
+		    "%d cq vectors available, not enough to have one IRQ per"
+		    " CPU, >= %d vectors desired, continuing anyway.\n",
+		    sess->ib_device->num_comp_vectors, num_online_cpus());
+	}
+
+	err = ib_session_init(sess->ib_device, &sess->ib_sess);
+	if (err) {
+		WRN(sess, "Failed to initialize IB session, errno: %d\n", err);
+		goto err_out;
+	}
+
+	err = query_fast_reg_mode(con);
+	if (err) {
+		WRN(sess, "Failed to query fast registration mode, errno: %d\n",
+		    err);
+		goto err_sess;
+	}
+
+	err = alloc_sess_init_bufs(sess);
+	if (err) {
+		ERR(sess, "Failed to allocate sess bufs, errno: %d\n", err);
+		goto err_sess;
+	}
+
+	sess->msg_wq = alloc_ordered_workqueue("sess_msg_wq", 0);
+	if (!sess->msg_wq) {
+		ERR(sess, "Failed to create user message workqueue\n");
+		err = -ENOMEM;
+		goto err_buff;
+	}
+
+	atomic_set(&sess->ib_sess_initialized, 1);
+
+	return 0;
+
+err_buff:
+	free_sess_init_bufs(sess);
+err_sess:
+	ib_session_destroy(&sess->ib_sess);
+err_out:
+	/* TODO: ib_device_put(sess->ib_device); */
+	sess->ib_device = NULL;
+	return err;
+}
+
+static void ibtrs_clt_destroy_ib_session(struct ibtrs_session *sess)
+{
+	if (sess->ib_device) {
+		free_sess_bufs(sess);
+		destroy_workqueue(sess->msg_wq);
+		/* TODO: ib_device_put(sess->ib_device); */
+		sess->ib_device = NULL;
+	}
+
+	if (atomic_cmpxchg(&sess->ib_sess_initialized, 1, 0) == 1)
+		ib_session_destroy(&sess->ib_sess);
+
+	if (sess->ib_sess_destroy_completion)
+		complete_all(sess->ib_sess_destroy_completion);
+}
+
+static void free_con_fast_pool(struct ibtrs_con *con)
+{
+	if (con->user)
+		return;
+	if (con->sess->fast_reg_mode == IBTRS_FAST_MEM_FMR)
+		return;
+	if (con->sess->fast_reg_mode == IBTRS_FAST_MEM_FR) {
+		ibtrs_destroy_fr_pool(con->fr_pool);
+		con->fr_pool = NULL;
+	}
+}
+
+static int alloc_con_fast_pool(struct ibtrs_con *con)
+{
+	int err = 0;
+	struct ibtrs_fr_pool *fr_pool;
+	struct ibtrs_session *sess = con->sess;
+
+	if (con->user)
+		return 0;
+
+	if (sess->fast_reg_mode == IBTRS_FAST_MEM_FMR)
+		return 0;
+
+	if (sess->fast_reg_mode == IBTRS_FAST_MEM_FR) {
+		fr_pool = alloc_fr_pool(sess);
+		if (IS_ERR(fr_pool)) {
+			err = PTR_ERR(fr_pool);
+			ERR(sess, "FR pool allocation failed, errno: %d\n",
+			    err);
+			return err;
+		}
+		con->fr_pool = fr_pool;
+	}
+
+	return err;
+}
+
+static void ibtrs_clt_destroy_cm_id(struct ibtrs_con *con)
+{
+	if (!con->device_being_removed) {
+		rdma_destroy_id(con->cm_id);
+		con->cm_id = NULL;
+	}
+}
+
+static void con_destroy(struct ibtrs_con *con)
+{
+	if (con->user) {
+		cancel_delayed_work_sync(&con->sess->heartbeat_dwork);
+		drain_workqueue(con->cq_wq);
+		cancel_work_sync(&con->cq_work);
+	}
+	fail_outstanding_reqs(con);
+	ib_con_destroy(&con->ib_con);
+	free_con_fast_pool(con);
+	if (con->user)
+		free_sess_tr_bufs(con->sess);
+	ibtrs_clt_destroy_cm_id(con);
+
+	/* Wake up anyone waiting for a TX IU or a peer user message buffer so
+	 * they can check the connection state, give up waiting and put back
+	 * any reserved TX IU.
+	 */
+	if (con->user) {
+		wake_up(&con->sess->mu_buf_wait_q);
+		wake_up(&con->sess->mu_iu_wait_q);
+	}
+}
+
+int ibtrs_clt_stats_migration_cnt_to_str(struct ibtrs_session *sess, char *buf,
+					 size_t len)
+{
+	int i;
+	size_t used = 0;
+
+	used += scnprintf(buf + used, len - used, "    ");
+
+	for (i = 0; i < num_online_cpus(); i++)
+		used += scnprintf(buf + used, len - used, " CPU%u", i);
+
+	used += scnprintf(buf + used, len - used, "\nfrom:");
+
+	for (i = 0; i < num_online_cpus(); i++)
+		used += scnprintf(buf + used, len - used, " %d",
+				 atomic_read(&sess->stats.cpu_migr.from[i]));
+
+	used += scnprintf(buf + used, len - used, "\n"
+			 "to  :");
+
+	for (i = 0; i < num_online_cpus(); i++)
+		used += scnprintf(buf + used, len - used, " %d",
+				 sess->stats.cpu_migr.to[i]);
+
+	used += scnprintf(buf + used, len - used, "\n");
+
+	return used;
+}
+
+int ibtrs_clt_reset_reconnects_stat(struct ibtrs_session *sess, bool enable)
+{
+	if (enable) {
+		memset(&sess->stats.reconnects, 0,
+		       sizeof(sess->stats.reconnects));
+		return 0;
+	} else {
+		return -EINVAL;
+	}
+}
+
+int ibtrs_clt_stats_reconnects_to_str(struct ibtrs_session *sess, char *buf,
+				      size_t len)
+{
+	return scnprintf(buf, len, "%u %u\n",
+			sess->stats.reconnects.successful_cnt,
+			sess->stats.reconnects.fail_cnt);
+}
+
+int ibtrs_clt_reset_user_ib_msgs_stats(struct ibtrs_session *sess, bool enable)
+{
+	if (enable) {
+		memset(&sess->stats.user_ib_msgs, 0,
+		       sizeof(sess->stats.user_ib_msgs));
+		return 0;
+	} else {
+		return -EINVAL;
+	}
+}
+
+int ibtrs_clt_stats_user_ib_msgs_to_str(struct ibtrs_session *sess, char *buf,
+					size_t len)
+{
+	return scnprintf(buf, len, "%u %llu %u %llu\n",
+			sess->stats.user_ib_msgs.recv_msg_cnt,
+			sess->stats.user_ib_msgs.recv_size,
+			sess->stats.user_ib_msgs.sent_msg_cnt,
+			sess->stats.user_ib_msgs.sent_size);
+}
+
+static u32 ibtrs_clt_stats_get_max_wc_cnt(struct ibtrs_session *sess)
+{
+	int i;
+	u32 max = 0;
+
+	for (i = 0; i < num_online_cpus(); i++)
+		if (max < sess->stats.wc_comp[i].max_wc_cnt)
+			max = sess->stats.wc_comp[i].max_wc_cnt;
+	return max;
+}
+
+static u32 ibtrs_clt_stats_get_avg_wc_cnt(struct ibtrs_session *sess)
+{
+	int i;
+	u32 cnt = 0;
+	u64 sum = 0;
+
+	for (i = 0; i < num_online_cpus(); i++) {
+		sum += sess->stats.wc_comp[i].total_cnt;
+		cnt += sess->stats.wc_comp[i].cnt;
+	}
+
+	return cnt ? sum / cnt : 0;
+}
+
+int ibtrs_clt_stats_wc_completion_to_str(struct ibtrs_session *sess, char *buf,
+					 size_t len)
+{
+	return scnprintf(buf, len, "%u %u\n",
+			ibtrs_clt_stats_get_max_wc_cnt(sess),
+			ibtrs_clt_stats_get_avg_wc_cnt(sess));
+}
+
+static void sess_destroy_handler(struct work_struct *work)
+{
+	struct sess_destroy_sm_wq_work *w;
+
+	w = container_of(work, struct sess_destroy_sm_wq_work, work);
+
+	put_sess(w->sess);
+	kvfree(w);
+}
+
+static void sess_schedule_destroy(struct ibtrs_session *sess)
+{
+	struct sess_destroy_sm_wq_work *w;
+
+	while (true) {
+		w = ibtrs_malloc(sizeof(*w));
+		if (w)
+			break;
+		cond_resched();
+	}
+
+	w->sess = sess;
+	INIT_WORK(&w->work, sess_destroy_handler);
+	ibtrs_clt_destroy_sess_files(&sess->kobj, &sess->kobj_stats);
+	queue_work(ibtrs_wq, &w->work);
+}
+
+int ibtrs_clt_reset_wc_comp_stats(struct ibtrs_session *sess, bool enable)
+{
+	if (enable) {
+		memset(sess->stats.wc_comp, 0,
+		       num_online_cpus() * sizeof(*sess->stats.wc_comp));
+		return 0;
+	} else {
+		return -EINVAL;
+	}
+}
+
+static int ibtrs_clt_init_wc_comp_stats(struct ibtrs_session *sess)
+{
+	sess->stats.wc_comp = kcalloc(num_online_cpus(),
+				      sizeof(*sess->stats.wc_comp),
+				      GFP_KERNEL);
+	if (!sess->stats.wc_comp)
+		return -ENOMEM;
+
+	return 0;
+}
+
+int ibtrs_clt_reset_cpu_migr_stats(struct ibtrs_session *sess, bool enable)
+{
+	if (enable) {
+		memset(sess->stats.cpu_migr.from, 0,
+		       num_online_cpus() *
+		       sizeof(*sess->stats.cpu_migr.from));
+
+		memset(sess->stats.cpu_migr.to, 0,
+		       num_online_cpus() * sizeof(*sess->stats.cpu_migr.to));
+		return 0;
+	} else {
+		return -EINVAL;
+	}
+}
+
+static int ibtrs_clt_init_cpu_migr_stats(struct ibtrs_session *sess)
+{
+	sess->stats.cpu_migr.from = kcalloc(num_online_cpus(),
+					    sizeof(*sess->stats.cpu_migr.from),
+					    GFP_KERNEL);
+	if (!sess->stats.cpu_migr.from)
+		return -ENOMEM;
+
+	sess->stats.cpu_migr.to = kcalloc(num_online_cpus(),
+					  sizeof(*sess->stats.cpu_migr.to),
+					  GFP_KERNEL);
+	if (!sess->stats.cpu_migr.to) {
+		kfree(sess->stats.cpu_migr.from);
+		sess->stats.cpu_migr.from = NULL;
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
+static int ibtrs_clt_init_sg_list_distr_stats(struct ibtrs_session *sess)
+{
+	int i;
+
+	sess->stats.sg_list_distr = kmalloc_array(num_online_cpus(),
+					    sizeof(*sess->stats.sg_list_distr),
+					    GFP_KERNEL);
+
+	if (!sess->stats.sg_list_distr)
+		return -ENOMEM;
+
+	for (i = 0; i < num_online_cpus(); i++) {
+		sess->stats.sg_list_distr[i] =
+			kzalloc_node(sizeof(*sess->stats.sg_list_distr[0]) *
+				     (SG_DISTR_LEN + 1),
+				     GFP_KERNEL, cpu_to_node(i));
+		if (!sess->stats.sg_list_distr[i])
+			goto err;
+	}
+
+	sess->stats.sg_list_total = kcalloc(num_online_cpus(),
+					sizeof(*sess->stats.sg_list_total),
+					GFP_KERNEL);
+	if (!sess->stats.sg_list_total)
+		goto err;
+
+	return 0;
+
+err:
+	for (; i > 0; i--)
+		kfree(sess->stats.sg_list_distr[i - 1]);
+
+	kfree(sess->stats.sg_list_distr);
+	sess->stats.sg_list_distr = NULL;
+
+	return -ENOMEM;
+}
+
+int ibtrs_clt_reset_sg_list_distr_stats(struct ibtrs_session *sess,
+					bool enable)
+{
+	int i;
+
+	if (enable) {
+		memset(sess->stats.sg_list_total, 0,
+		       num_online_cpus() *
+		       sizeof(*sess->stats.sg_list_total));
+
+		for (i = 0; i < num_online_cpus(); i++)
+			memset(sess->stats.sg_list_distr[i], 0,
+			       sizeof(*sess->stats.sg_list_distr[0]) *
+			       (SG_DISTR_LEN + 1));
+		return 0;
+	} else {
+		return -EINVAL;
+	}
+}
+
+ssize_t ibtrs_clt_stats_rdma_lat_distr_to_str(struct ibtrs_session *sess,
+					      char *page, size_t len)
+{
+	ssize_t cnt = 0;
+	int i, cpu;
+	struct ibtrs_clt_stats *s = &sess->stats;
+	struct ibtrs_clt_stats_rdma_lat_entry res[MAX_LOG_LATENCY -
+						  MIN_LOG_LATENCY + 2];
+	struct ibtrs_clt_stats_rdma_lat_entry max;
+
+	max.write	= 0;
+	max.read	= 0;
+	for (cpu = 0; cpu < num_online_cpus(); cpu++) {
+		if (max.write < s->rdma_lat_max[cpu].write)
+			max.write = s->rdma_lat_max[cpu].write;
+		if (max.read < s->rdma_lat_max[cpu].read)
+			max.read = s->rdma_lat_max[cpu].read;
+	}
+
+	for (i = 0; i < ARRAY_SIZE(res); i++) {
+		res[i].write	= 0;
+		res[i].read	= 0;
+		for (cpu = 0; cpu < num_online_cpus(); cpu++) {
+			res[i].write += s->rdma_lat_distr[cpu][i].write;
+			res[i].read += s->rdma_lat_distr[cpu][i].read;
+		}
+	}
+
+	for (i = 0; i < ARRAY_SIZE(res) - 1; i++)
+		cnt += scnprintf(page + cnt, len - cnt,
+				 "< %6d ms: %llu %llu\n",
+				 1 << (i + MIN_LOG_LATENCY), res[i].read,
+				 res[i].write);
+	cnt += scnprintf(page + cnt, len - cnt, ">= %5d ms: %llu %llu\n",
+			 1 << (i - 1 + MIN_LOG_LATENCY), res[i].read,
+			 res[i].write);
+	cnt += scnprintf(page + cnt, len - cnt, " maximum ms: %llu %llu\n",
+			 max.read, max.write);
+
+	return cnt;
+}
+
+int ibtrs_clt_reset_rdma_lat_distr_stats(struct ibtrs_session *sess,
+					 bool enable)
+{
+	int i;
+	struct ibtrs_clt_stats *s = &sess->stats;
+
+	if (enable) {
+		memset(s->rdma_lat_max, 0,
+		       num_online_cpus() * sizeof(*s->rdma_lat_max));
+
+		for (i = 0; i < num_online_cpus(); i++)
+			memset(s->rdma_lat_distr[i], 0,
+			       sizeof(*s->rdma_lat_distr[0]) *
+			       (MAX_LOG_LATENCY - MIN_LOG_LATENCY + 2));
+	}
+	sess->enable_rdma_lat = enable;
+	return 0;
+}
+
+static int ibtrs_clt_init_rdma_lat_distr_stats(struct ibtrs_session *sess)
+{
+	int i;
+	struct ibtrs_clt_stats *s = &sess->stats;
+
+	s->rdma_lat_max = kzalloc(num_online_cpus() *
+				  sizeof(*s->rdma_lat_max), GFP_KERNEL);
+	if (!s->rdma_lat_max)
+		return -ENOMEM;
+
+	s->rdma_lat_distr = kmalloc_array(num_online_cpus(),
+					  sizeof(*s->rdma_lat_distr),
+					  GFP_KERNEL);
+	if (!s->rdma_lat_distr)
+		goto err1;
+
+	for (i = 0; i < num_online_cpus(); i++) {
+		s->rdma_lat_distr[i] =
+			kzalloc_node(sizeof(*s->rdma_lat_distr[0]) *
+				     (MAX_LOG_LATENCY - MIN_LOG_LATENCY + 2),
+				     GFP_KERNEL, cpu_to_node(i));
+		if (!s->rdma_lat_distr[i])
+			goto err2;
+	}
+
+	return 0;
+
+err2:
+	for (; i >= 0; i--)
+		kfree(s->rdma_lat_distr[i]);
+
+	kfree(s->rdma_lat_distr);
+	s->rdma_lat_distr = NULL;
+err1:
+	kfree(s->rdma_lat_max);
+	s->rdma_lat_max = NULL;
+
+	return -ENOMEM;
+}
+
+int ibtrs_clt_reset_rdma_stats(struct ibtrs_session *sess, bool enable)
+{
+	if (enable) {
+		struct ibtrs_clt_stats *s = &sess->stats;
+
+		memset(s->rdma_stats, 0,
+		       num_online_cpus() * sizeof(*s->rdma_stats));
+		return 0;
+	} else {
+		return -EINVAL;
+	}
+}
+
+static int ibtrs_clt_init_rdma_stats(struct ibtrs_session *sess)
+{
+	struct ibtrs_clt_stats *s = &sess->stats;
+
+	s->rdma_stats = kcalloc(num_online_cpus(), sizeof(*s->rdma_stats),
+				GFP_KERNEL);
+	if (!s->rdma_stats)
+		return -ENOMEM;
+
+	return 0;
+}
+
+ssize_t ibtrs_clt_reset_all_help(struct ibtrs_session *sess,
+				 char *page, size_t len)
+{
+	return scnprintf(page, len, "echo 1 to reset all statistics\n");
+}
+
+int ibtrs_clt_reset_all_stats(struct ibtrs_session *sess, bool enable)
+{
+	if (enable) {
+		ibtrs_clt_reset_rdma_stats(sess, enable);
+		ibtrs_clt_reset_rdma_lat_distr_stats(sess, enable);
+		ibtrs_clt_reset_sg_list_distr_stats(sess, enable);
+		ibtrs_clt_reset_cpu_migr_stats(sess, enable);
+		ibtrs_clt_reset_user_ib_msgs_stats(sess, enable);
+		ibtrs_clt_reset_reconnects_stat(sess, enable);
+		ibtrs_clt_reset_wc_comp_stats(sess, enable);
+		return 0;
+	} else {
+		return -EINVAL;
+	}
+}
+
+static int ibtrs_clt_init_stats(struct ibtrs_session *sess)
+{
+	int err;
+
+	err = ibtrs_clt_init_sg_list_distr_stats(sess);
+	if (err) {
+		ERR(sess,
+		    "Failed to init S/G list distribution stats, errno: %d\n",
+		    err);
+		return err;
+	}
+
+	err = ibtrs_clt_init_cpu_migr_stats(sess);
+	if (err) {
+		ERR(sess, "Failed to init CPU migration stats, errno: %d\n",
+		    err);
+		goto err_sg_list;
+	}
+
+	err = ibtrs_clt_init_rdma_lat_distr_stats(sess);
+	if (err) {
+		ERR(sess,
+		    "Failed to init RDMA lat distribution stats, errno: %d\n",
+		    err);
+		goto err_migr;
+	}
+
+	err = ibtrs_clt_init_wc_comp_stats(sess);
+	if (err) {
+		ERR(sess, "Failed to init WC completion stats, errno: %d\n",
+		    err);
+		goto err_rdma_lat;
+	}
+
+	err = ibtrs_clt_init_rdma_stats(sess);
+	if (err) {
+		ERR(sess, "Failed to init RDMA stats, errno: %d\n",
+		    err);
+		goto err_wc_comp;
+	}
+
+	return 0;
+
+err_wc_comp:
+	ibtrs_clt_free_wc_comp_stats(sess);
+err_rdma_lat:
+	ibtrs_clt_free_rdma_lat_stats(sess);
+err_migr:
+	ibtrs_clt_free_cpu_migr_stats(sess);
+err_sg_list:
+	ibtrs_clt_free_sg_list_distr_stats(sess);
+	return err;
+}
+
+static void ibtrs_clt_sess_reconnect_worker(struct work_struct *work)
+{
+	struct ibtrs_session *sess = container_of(to_delayed_work(work),
+						  struct ibtrs_session,
+						  reconnect_dwork);
+
+	ssm_schedule_event(sess, SSM_EV_RECONNECT);
+}
+
+static int sess_init_cons(struct ibtrs_session *sess)
+{
+	int i;
+
+	for (i = 0; i < CONS_PER_SESSION; i++) {
+		struct ibtrs_con *con = &sess->con[i];
+
+		csm_set_state(con, CSM_STATE_CLOSED);
+		con->sess = sess;
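+		/*
+		 * Connection 0 (user messages) processes completions in a
+		 * high-priority ordered workqueue, the IO connections in
+		 * per-connection tasklets.
+		 */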
+		if (!i) {
+			INIT_WORK(&con->cq_work, wrapper_handle_cq_comp);
+			con->cq_wq =
+				alloc_ordered_workqueue("ibtrs_clt_wq",
+							WQ_HIGHPRI);
+			if (!con->cq_wq) {
+				ERR(sess, "Failed to allocate cq workqueue.\n");
+				return -ENOMEM;
+			}
+		} else {
+			tasklet_init(&con->cq_tasklet,
+				     tasklet_handle_cq_comp, (unsigned
+							      long)(con));
+		}
+	}
+
+	return 0;
+}
+
+static struct ibtrs_session *sess_init(const struct sockaddr_storage *addr,
+				       size_t pdu_sz, void *priv,
+				       u8 reconnect_delay_sec,
+				       u16 max_segments,
+				       s16 max_reconnect_attempts)
+{
+	int err;
+	struct ibtrs_session *sess;
+
+	sess = kzalloc(sizeof(*sess), GFP_KERNEL);
+	if (!sess) {
+		err = -ENOMEM;
+		goto err;
+	}
+	atomic_set(&sess->refcount, 1);
+	sess->sm_wq = create_workqueue("sess_sm_wq");
+
+	if (!sess->sm_wq) {
+		ERR_NP("Failed to create SSM workqueue\n");
+		err = -ENOMEM;
+		goto err_free_sess;
+	}
+
+	sess->peer_addr	= *addr;
+	sess->pdu_sz	= pdu_sz;
+	sess->priv	= priv;
+	sess->con	= kcalloc(CONS_PER_SESSION, sizeof(*sess->con),
+				  GFP_KERNEL);
+	if (!sess->con) {
+		err = -ENOMEM;
+		goto err_free_sm_wq;
+	}
+
+	sess->rdma_info_iu = NULL;
+	err = sess_init_cons(sess);
+	if (err) {
+		ERR_NP("Failed to initialize cons\n");
+		goto err_free_con;
+	}
+
+	err = ibtrs_clt_init_stats(sess);
+	if (err) {
+		ERR_NP("Failed to initialize statistics\n");
+		goto err_cons;
+	}
+
+	sess->reconnect_delay_sec	= reconnect_delay_sec;
+	sess->max_reconnect_attempts	= max_reconnect_attempts;
+	sess->max_pages_per_mr		= max_segments;
+	init_waitqueue_head(&sess->wait_q);
+	init_waitqueue_head(&sess->mu_iu_wait_q);
+	init_waitqueue_head(&sess->mu_buf_wait_q);
+
+	init_waitqueue_head(&sess->tags_wait);
+	sess->state = SSM_STATE_IDLE;
+	mutex_lock(&sess_mutex);
+	list_add(&sess->list, &sess_list);
+	mutex_unlock(&sess_mutex);
+
+	ibtrs_set_heartbeat_timeout(&sess->heartbeat,
+				    default_heartbeat_timeout_ms <
+				    MIN_HEARTBEAT_TIMEOUT_MS ?
+				    MIN_HEARTBEAT_TIMEOUT_MS :
+				    default_heartbeat_timeout_ms);
+	atomic64_set(&sess->heartbeat.send_ts_ms, 0);
+	atomic64_set(&sess->heartbeat.recv_ts_ms, 0);
+	sess->heartbeat.addr = sess->addr;
+	sess->heartbeat.hostname = sess->hostname;
+
+	INIT_DELAYED_WORK(&sess->heartbeat_dwork, heartbeat_work);
+	INIT_DELAYED_WORK(&sess->reconnect_dwork,
+			  ibtrs_clt_sess_reconnect_worker);
+
+	return sess;
+
+err_cons:
+	sess_deinit_cons(sess);
+err_free_con:
+	kfree(sess->con);
+	sess->con = NULL;
+err_free_sm_wq:
+	destroy_workqueue(sess->sm_wq);
+err_free_sess:
+	kfree(sess);
+err:
+	return ERR_PTR(err);
+}
+
+static int init_con(struct ibtrs_session *sess, struct ibtrs_con *con,
+		    short cpu, bool user)
+{
+	int err;
+
+	con->sess			= sess;
+	con->cpu			= cpu;
+	con->user			= user;
+	con->device_being_removed	= false;
+
+	err = create_cm_id_con(&sess->peer_addr, con);
+	if (err) {
+		ERR(sess, "Failed to create CM ID for connection\n");
+		return err;
+	}
+
+	csm_set_state(con, CSM_STATE_RESOLVING_ADDR);
+	err = resolve_addr(con, &sess->peer_addr);
+	if (err) {
+		ERR(sess, "Failed to resolve address, errno: %d\n", err);
+		goto err_cm_id;
+	}
+
+	sess->active_cnt++;
+
+	return 0;
+
+err_cm_id:
+	csm_set_state(con, CSM_STATE_CLOSED);
+	ibtrs_clt_destroy_cm_id(con);
+
+	return err;
+}
+
+static int create_con(struct ibtrs_con *con)
+{
+	int err, cq_vector;
+	u16 cq_size, wr_queue_size;
+	struct ibtrs_session *sess = con->sess;
+	int num_wr = DIV_ROUND_UP(con->sess->max_pages_per_mr,
+				  con->sess->max_sge);
+
+	if (con->user) {
+		err = create_ib_sess(con);
+		if (err) {
+			ERR(sess,
+			    "Failed to create IB session, errno: %d\n", err);
+			goto err_cm_id;
+		}
+		cq_size		= USR_CON_BUF_SIZE + 1;
+		wr_queue_size	= USR_CON_BUF_SIZE + 1;
+	} else {
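+		/*
+		 * num_wr is the number of WRs needed to cover
+		 * max_pages_per_mr with max_sge entries each; the send queue
+		 * is sized to queue_depth * num_wr * 3 (FR) or * 2 (FMR),
+		 * capped at one below the device limit.
+		 */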
+		err = ib_get_max_wr_queue_size(sess->ib_device);
+		if (err < 0)
+			goto err_cm_id;
+		cq_size		= sess->queue_depth;
+		wr_queue_size	= min_t(int, err - 1,
+					sess->queue_depth * num_wr *
+					(use_fr ? 3 : 2));
+	}
+
+	err = alloc_con_fast_pool(con);
+	if (err) {
+		ERR(sess, "Failed to allocate fast memory "
+		    "pool, errno: %d\n", err);
+		goto err_cm_id;
+	}
+	con->ib_con.addr = sess->addr;
+	con->ib_con.hostname = sess->hostname;
+	cq_vector = con->cpu % sess->ib_device->num_comp_vectors;
+	err = ib_con_init(&con->ib_con, con->cm_id,
+			  sess->max_sge, cq_event_handler, con, cq_vector,
+			  cq_size, wr_queue_size, &sess->ib_sess);
+	if (err) {
+		ERR(sess,
+		    "Failed to initialize IB connection, errno: %d\n", err);
+		goto err_pool;
+	}
+
+	DEB("setup_buffers successful\n");
+	err = post_recv(con);
+	if (err)
+		goto err_ib_con;
+
+	err = connect_qp(con);
+	if (err) {
+		ERR(con->sess, "Failed to connect QP, errno: %d\n", err);
+		goto err_wq;
+	}
+
+	DEB("connect qp successful\n");
+	atomic_set(&con->io_cnt, 0);
+	return 0;
+
+err_wq:
+	rdma_disconnect(con->cm_id);
+err_ib_con:
+	ib_con_destroy(&con->ib_con);
+err_pool:
+	free_con_fast_pool(con);
+err_cm_id:
+	ibtrs_clt_destroy_cm_id(con);
+
+	return err;
+}
+
+struct ibtrs_session *ibtrs_clt_open(const struct sockaddr_storage *addr,
+				     size_t pdu_sz, void *priv,
+				     u8 reconnect_delay_sec, u16 max_segments,
+				     s16 max_reconnect_attempts)
+{
+	int err;
+	struct ibtrs_session *sess;
+	char str_addr[IBTRS_ADDRLEN];
+
+	if (!clt_ops_are_valid(clt_ops)) {
+		ERR_NP("User module did not register ops callbacks\n");
+		err = -EINVAL;
+		goto err;
+	}
+
+	err = ibtrs_addr_to_str(addr, str_addr, sizeof(str_addr));
+	if (err < 0) {
+		ERR_NP("Establishing session to server failed, converting"
+		       " addr from binary to string failed, errno: %d\n", err);
+		return ERR_PTR(err);
+	}
+
+	INFO_NP("Establishing session to server %s\n", str_addr);
+
+	sess = sess_init(addr, pdu_sz, priv, reconnect_delay_sec,
+			 max_segments, max_reconnect_attempts);
+	if (IS_ERR(sess)) {
+		ERR_NP("Establishing session to %s failed, errno: %ld\n",
+		       str_addr, PTR_ERR(sess));
+		err = PTR_ERR(sess);
+		goto err;
+	}
+
+	get_sess(sess);
+	strlcpy(sess->addr, str_addr, sizeof(sess->addr));
+	err = init_con(sess, &sess->con[0], 0, true);
+	if (err) {
+		ERR(sess, "Establishing session to server failed,"
+		    " failed to init user connection, errno: %d\n", err);
+		/* Always return 'No route to host' when the connection can't be
+		 * established.
+		 */
+		err = -EHOSTUNREACH;
+		goto err1;
+	}
+
+	err = wait_for_ssm_state(sess, SSM_STATE_CONNECTED);
+	if (err) {
+		ERR(sess, "Establishing session to server failed,"
+		    " failed to establish connections, errno: %d\n", err);
+		put_sess(sess);
+		goto err; /* state machine will do the clean up. */
+	}
+	err = ibtrs_clt_create_sess_files(&sess->kobj, &sess->kobj_stats,
+					  sess->addr);
+	if (err) {
+		ERR(sess, "Establishing session to server failed,"
+		    " failed to create session sysfs files, errno: %d\n", err);
+		put_sess(sess);
+		ibtrs_clt_close(sess);
+		goto err;
+	}
+
+	put_sess(sess);
+	return sess;
+
+err1:
+	destroy_workqueue(sess->sm_wq);
+	sess_deinit_cons(sess);
+	kfree(sess->con);
+	sess->con = NULL;
+	ibtrs_clt_free_stats(sess);
+	mutex_lock(&sess_mutex);
+	list_del(&sess->list);
+	mutex_unlock(&sess_mutex);
+	kfree(sess);
+err:
+	return ERR_PTR(err);
+}
+EXPORT_SYMBOL(ibtrs_clt_open);
+
+int ibtrs_clt_close(struct ibtrs_session *sess)
+{
+	struct completion dc;
+
+	INFO(sess, "Session will be disconnected\n");
+
+	init_completion(&dc);
+	sess->destroy_completion = &dc;
+	ssm_schedule_event(sess, SSM_EV_SESS_CLOSE);
+	wait_for_completion(&dc);
+
+	return 0;
+}
+EXPORT_SYMBOL(ibtrs_clt_close);
+
+int ibtrs_clt_reconnect(struct ibtrs_session *sess)
+{
+	ssm_schedule_event(sess, SSM_EV_RECONNECT_USER);
+
+	INFO(sess, "Session reconnect event queued\n");
+
+	return 0;
+}
+
+void ibtrs_clt_set_max_reconnect_attempts(struct ibtrs_session *sess, s16 value)
+{
+	sess->max_reconnect_attempts = value;
+}
+
+inline s16
+ibtrs_clt_get_max_reconnect_attempts(const struct ibtrs_session *sess)
+{
+	return sess->max_reconnect_attempts;
+}
+
+static inline
+void ibtrs_clt_record_sg_distr(u64 *stat, u64 *total, unsigned int cnt)
+{
+	int i;
+
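+	/*
+	 * The histogram uses linear buckets for S/G lists of up to MAX_LIN_SG
+	 * entries and logarithmic buckets above that, capped at SG_DISTR_LEN.
+	 */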
+	i = cnt > MAX_LIN_SG ? ilog2(cnt) + MAX_LIN_SG - MIN_LOG_SG + 1 : cnt;
+	i = i > SG_DISTR_LEN ? SG_DISTR_LEN : i;
+
+	stat[i]++;
+	(*total)++;
+}
+
+static int ibtrs_clt_rdma_write_desc(struct ibtrs_con *con,
+				     struct rdma_req *req, u64 buf,
+				     size_t u_msg_len, u32 imm,
+				     struct ibtrs_msg_rdma_write *msg)
+{
+	int ret;
+	size_t ndesc = con->sess->max_pages_per_mr;
+	struct ibtrs_sg_desc *desc;
+
+	desc = kmalloc_array(ndesc, sizeof(*desc), GFP_ATOMIC);
+	if (!desc) {
+		ib_dma_unmap_sg(con->sess->ib_device, req->sglist,
+				req->sg_cnt, req->dir);
+		return -ENOMEM;
+	}
+	ret = ibtrs_fast_reg_map_data(con, desc, req);
+	if (unlikely(ret < 0)) {
+		ERR_RL(con->sess,
+		       "RDMA-Write failed, fast reg. data mapping"
+		       " failed, errno: %d\n", ret);
+		ib_dma_unmap_sg(con->sess->ib_device, req->sglist,
+				req->sg_cnt, req->dir);
+		kfree(desc);
+		return ret;
+	}
+	ret = ibtrs_post_send_rdma_desc(con, req, desc, ret, buf,
+					u_msg_len + sizeof(*msg), imm);
+	if (unlikely(ret)) {
+		ERR(con->sess, "RDMA-Write failed, posting work"
+		    " request failed, errno: %d\n", ret);
+		ibtrs_unmap_fast_reg_data(con, req);
+		ib_dma_unmap_sg(con->sess->ib_device, req->sglist,
+				req->sg_cnt, req->dir);
+	}
+	kfree(desc);
+	return ret;
+}
+
+static int ibtrs_clt_rdma_write_sg(struct ibtrs_con *con, struct rdma_req *req,
+				   const struct kvec *vec, size_t u_msg_len,
+				   size_t data_len)
+{
+	int count = 0;
+	struct ibtrs_msg_rdma_write *msg;
+	u32 imm;
+	int ret;
+	int buf_id;
+	u64 buf;
+
+	const u32 tsize = sizeof(*msg) + data_len + u_msg_len;
+
+	if (unlikely(tsize > con->sess->chunk_size)) {
+		WRN_RL(con->sess, "RDMA-Write failed, data size too big %d >"
+		       " %d\n", tsize, con->sess->chunk_size);
+		return -EMSGSIZE;
+	}
+	if (req->sg_cnt) {
+		count = ib_dma_map_sg(con->sess->ib_device, req->sglist,
+				      req->sg_cnt, req->dir);
+		if (unlikely(!count)) {
+			WRN_RL(con->sess,
+			       "RDMA-Write failed, dma map failed\n");
+			return -EINVAL;
+		}
+	}
+
+	copy_from_kvec(req->iu->buf, vec, u_msg_len);
+
+	/* put ibtrs msg after sg and user message */
+	msg		= req->iu->buf + u_msg_len;
+	msg->hdr.type	= IBTRS_MSG_RDMA_WRITE;
+	msg->hdr.tsize	= tsize;
+
+	/* ibtrs message on server side will be after user data and message */
+	imm = req->tag->mem_id_mask + data_len + u_msg_len;
+	buf_id = req->tag->mem_id;
+	req->sg_size = data_len + u_msg_len + sizeof(*msg);
+
+	buf = con->sess->srv_rdma_addr[buf_id];
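+	/*
+	 * Long S/G lists (more than fmr_sg_cnt mapped entries) go through the
+	 * descriptor path, which fast-registers the data before posting;
+	 * shorter lists are posted directly.
+	 */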
+	if (count > fmr_sg_cnt)
+		return ibtrs_clt_rdma_write_desc(con, req, buf, u_msg_len, imm,
+						 msg);
+
+	ret = ibtrs_post_send_rdma_more(con, req, buf, u_msg_len + sizeof(*msg),
+					imm);
+	if (unlikely(ret)) {
+		ERR(con->sess, "RDMA-Write failed, posting work"
+		    " request failed, errno: %d\n", ret);
+		if (count)
+			ib_dma_unmap_sg(con->sess->ib_device, req->sglist,
+					req->sg_cnt, req->dir);
+	}
+	return ret;
+}
+
+static void ibtrs_clt_update_rdma_stats(struct ibtrs_clt_stats *s,
+					size_t size, bool read)
+{
+	int cpu = raw_smp_processor_id();
+
+	if (read) {
+		s->rdma_stats[cpu].cnt_read++;
+		s->rdma_stats[cpu].size_total_read += size;
+	} else {
+		s->rdma_stats[cpu].cnt_write++;
+		s->rdma_stats[cpu].size_total_write += size;
+	}
+
+	s->rdma_stats[cpu].inflight++;
+}
+
+/**
+ * ibtrs_rdma_con_id() - return the IO (RDMA) connection id for a tag
+ *
+ * Note:
+ *     IO connection ids start from 1.
+ *     Connection 0 is reserved for user messages.
+ */
+static inline int ibtrs_rdma_con_id(struct ibtrs_tag *tag)
+{
+	return (tag->cpu_id % (CONS_PER_SESSION - 1)) + 1;
+}
+
+int ibtrs_clt_rdma_write(struct ibtrs_session *sess, struct ibtrs_tag *tag,
+			 void *priv, const struct kvec *vec, size_t nr,
+			 size_t data_len, struct scatterlist *sg,
+			 unsigned int sg_len)
+{
+	struct ibtrs_iu *iu;
+	struct rdma_req *req;
+	int err;
+	struct ibtrs_con *con;
+	int con_id;
+	size_t u_msg_len;
+
+	smp_rmb(); /* fence sess->state check */
+	if (unlikely(sess->state != SSM_STATE_CONNECTED)) {
+		ERR_RL(sess,
+		       "RDMA-Write failed, not connected (session state %s)\n",
+		       ssm_state_str(sess->state));
+		return -ECOMM;
+	}
+
+	u_msg_len = kvec_length(vec, nr);
+	if (unlikely(u_msg_len > IO_MSG_SIZE)) {
+		WRN_RL(sess, "RDMA-Write failed, user message size"
+		       " is %zu B, max size is %d B\n", u_msg_len,
+		       IO_MSG_SIZE);
+		return -EMSGSIZE;
+	}
+
+	con_id = ibtrs_rdma_con_id(tag);
+	if (WARN_ON(con_id >= CONS_PER_SESSION))
+		return -EINVAL;
+	con = &sess->con[con_id];
+	rcu_read_lock();
+	smp_rmb(); /* fence con->state check */
+	if (unlikely(con->state != CSM_STATE_CONNECTED)) {
+		rcu_read_unlock();
+		ERR_RL(sess, "RDMA-Write failed, not connected"
+		       " (connection %d state %s)\n",
+		       con_id,
+		       csm_state_str(con->state));
+		return -ECOMM;
+	}
+
+	iu = sess->io_tx_ius[tag->mem_id];
+	req = &sess->reqs[tag->mem_id];
+	req->con	= con;
+	req->tag	= tag;
+	if (sess->enable_rdma_lat)
+		req->start_time = ibtrs_clt_get_raw_ms();
+	req->in_use	= true;
+
+	req->iu		= iu;
+	req->sglist	= sg;
+	req->sg_cnt	= sg_len;
+	req->priv	= priv;
+	req->dir        = DMA_TO_DEVICE;
+
+	err = ibtrs_clt_rdma_write_sg(con, req, vec, u_msg_len, data_len);
+	rcu_read_unlock();
+	if (unlikely(err)) {
+		req->in_use = false;
+		ERR_RL(sess, "RDMA-Write failed, failed to transfer scatter"
+		       " gather list, errno: %d\n", err);
+		return err;
+	}
+
+	ibtrs_heartbeat_set_send_ts(&sess->heartbeat);
+	ibtrs_clt_record_sg_distr(sess->stats.sg_list_distr[tag->cpu_id],
+				  &sess->stats.sg_list_total[tag->cpu_id],
+				  sg_len);
+	ibtrs_clt_update_rdma_stats(&sess->stats, u_msg_len + data_len, false);
+
+	return err;
+}
+EXPORT_SYMBOL(ibtrs_clt_rdma_write);
+
+static int ibtrs_clt_request_rdma_write_sg(struct ibtrs_con *con,
+					   struct rdma_req *req,
+					   const struct kvec *vec,
+					   size_t u_msg_len,
+					   size_t result_len)
+{
+	int count, i, ret;
+	struct ibtrs_msg_req_rdma_write *msg;
+	u32 imm;
+	int buf_id;
+	struct scatterlist *sg;
+	struct ib_device *ibdev = con->sess->ib_device;
+	const u32 tsize = sizeof(*msg) + result_len + u_msg_len;
+
+	if (unlikely(tsize > con->sess->chunk_size)) {
+		WRN_RL(con->sess, "Request-RDMA-Write failed, message size is"
+		       " %d, bigger than CHUNK_SIZE %d\n", tsize,
+			con->sess->chunk_size);
+		return -EMSGSIZE;
+	}
+
+	count = ib_dma_map_sg(ibdev, req->sglist, req->sg_cnt, req->dir);
+
+	if (unlikely(!count)) {
+		WRN_RL(con->sess,
+		       "Request-RDMA-Write failed, dma map failed\n");
+		return -EINVAL;
+	}
+
+	req->data_len = result_len;
+	copy_from_kvec(req->iu->buf, vec, u_msg_len);
+
+	/* put our message into req->buf after the user message */
+	msg		= req->iu->buf + u_msg_len;
+	msg->hdr.type	= IBTRS_MSG_REQ_RDMA_WRITE;
+	msg->hdr.tsize	= tsize;
+	msg->sg_cnt	= count;
+
+	if (WARN_ON(msg->hdr.tsize > con->sess->chunk_size))
+		return -EINVAL;
+	if (count > fmr_sg_cnt) {
+		ret = ibtrs_fast_reg_map_data(con, msg->desc, req);
+		if (ret < 0) {
+			ERR_RL(con->sess,
+			       "Request-RDMA-Write failed, failed to map fast"
+			       " reg. data, errno: %d\n", ret);
+			ib_dma_unmap_sg(con->sess->ib_device, req->sglist,
+					req->sg_cnt, req->dir);
+			return ret;
+		}
+		msg->sg_cnt = ret;
+	} else {
+		for_each_sg(req->sglist, sg, req->sg_cnt, i) {
+			msg->desc[i].addr = ib_sg_dma_address(ibdev, sg);
+			msg->desc[i].key = con->sess->ib_sess.mr->rkey;
+			msg->desc[i].len = ib_sg_dma_len(ibdev, sg);
+			DEB("desc addr %llu, len %u, i %d tsize %u\n",
+			    msg->desc[i].addr, msg->desc[i].len, i,
+			    msg->hdr.tsize);
+		}
+		req->nmdesc = 0;
+	}
+	/* ibtrs message will be after the space reserved for disk data and
+	 * user message
+	 */
+	imm = req->tag->mem_id_mask + result_len + u_msg_len;
+	buf_id = req->tag->mem_id;
+
+	req->sg_size = sizeof(*msg) + msg->sg_cnt * IBTRS_SG_DESC_LEN +
+		u_msg_len;
+	ret = ibtrs_post_send_rdma(con, req, con->sess->srv_rdma_addr[buf_id],
+				   result_len, imm);
+	if (unlikely(ret)) {
+		ERR(con->sess, "Request-RDMA-Write failed,"
+		    " posting work request failed, errno: %d\n", ret);
+
+		if (unlikely(count > fmr_sg_cnt)) {
+			ibtrs_unmap_fast_reg_data(con, req);
+			ib_dma_unmap_sg(con->sess->ib_device, req->sglist,
+					req->sg_cnt, req->dir);
+		}
+	}
+	return ret;
+}
+
+int ibtrs_clt_request_rdma_write(struct ibtrs_session *sess,
+				 struct ibtrs_tag *tag, void *priv,
+				 const struct kvec *vec, size_t nr,
+				 size_t result_len,
+				 struct scatterlist *recv_sg,
+				 unsigned int recv_sg_len)
+{
+	struct ibtrs_iu *iu;
+	struct rdma_req *req;
+	int err;
+	struct ibtrs_con *con;
+	int con_id;
+	size_t u_msg_len;
+
+	smp_rmb(); /* fence sess->state check */
+	if (unlikely(sess->state != SSM_STATE_CONNECTED)) {
+		ERR_RL(sess,
+		       "Request-RDMA-Write failed, not connected (session"
+		       " state %s)\n", ssm_state_str(sess->state));
+		return -ECOMM;
+	}
+
+	u_msg_len = kvec_length(vec, nr);
+	if (unlikely(u_msg_len > IO_MSG_SIZE ||
+		     sizeof(struct ibtrs_msg_req_rdma_write) +
+		     recv_sg_len * IBTRS_SG_DESC_LEN > sess->max_req_size)) {
+		WRN_RL(sess, "Request-RDMA-Write failed, user message or"
+		       " S/G list too big, user message is %zu B, max is"
+		       " %d B\n", u_msg_len, IO_MSG_SIZE);
+		return -EMSGSIZE;
+	}
+
+	con_id = ibtrs_rdma_con_id(tag);
+	if (WARN_ON(con_id >= CONS_PER_SESSION))
+		return -EINVAL;
+	con = &sess->con[con_id];
+	rcu_read_lock();
+	smp_rmb(); /* fence con->state check */
+	if (unlikely(con->state != CSM_STATE_CONNECTED)) {
+		rcu_read_unlock();
+		ERR_RL(sess, "Request-RDMA-Write failed, not connected"
+		       " (connection %d state %s)\n",
+		       con_id,
+		       csm_state_str(con->state));
+		return -ECOMM;
+	}
+
+	iu = sess->io_tx_ius[tag->mem_id];
+	req = &sess->reqs[tag->mem_id];
+	req->con	= con;
+	req->tag	= tag;
+	if (sess->enable_rdma_lat)
+		req->start_time = ibtrs_clt_get_raw_ms();
+	req->in_use	= true;
+
+	req->iu		= iu;
+	req->sglist	= recv_sg;
+	req->sg_cnt	= recv_sg_len;
+	req->priv	= priv;
+	req->dir        = DMA_FROM_DEVICE;
+
+	err = ibtrs_clt_request_rdma_write_sg(con, req, vec,
+					      u_msg_len, result_len);
+	rcu_read_unlock();
+	if (unlikely(err)) {
+		req->in_use = false;
+		ERR_RL(sess, "Request-RDMA-Write failed, failed to transfer"
+		       " scatter gather list, errno: %d\n", err);
+		return err;
+	}
+
+	ibtrs_heartbeat_set_send_ts(&sess->heartbeat);
+	ibtrs_clt_record_sg_distr(sess->stats.sg_list_distr[tag->cpu_id],
+				  &sess->stats.sg_list_total[tag->cpu_id],
+				  recv_sg_len);
+	ibtrs_clt_update_rdma_stats(&sess->stats, u_msg_len + result_len, true);
+
+	return err;
+}
+EXPORT_SYMBOL(ibtrs_clt_request_rdma_write);
+
+static bool ibtrs_clt_get_usr_msg_buf(struct ibtrs_session *sess)
+{
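+	/*
+	 * Reserve one of the peer's user message buffers; returns false when
+	 * none is left (the counter is never decremented below zero).
+	 */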
+	return atomic_dec_if_positive(&sess->peer_usr_msg_bufs) >= 0;
+}
+
+int ibtrs_clt_send(struct ibtrs_session *sess, const struct kvec *vec,
+		   size_t nr)
+{
+	struct ibtrs_con *con;
+	struct ibtrs_iu *iu = NULL;
+	struct ibtrs_msg_user *msg;
+	size_t len;
+	bool closed_st = false;
+	int err = 0;
+
+	con = &sess->con[0];
+
+	smp_rmb(); /* fence sess->state check */
+	if (unlikely(con->state != CSM_STATE_CONNECTED ||
+		     sess->state != SSM_STATE_CONNECTED)) {
+		ERR_RL(sess, "Sending user message failed, not connected,"
+		       " Connection state is %s, Session state is %s\n",
+		       csm_state_str(con->state), ssm_state_str(sess->state));
+		return -ECOMM;
+	}
+
+	len = kvec_length(vec, nr);
+
+	DEB("send user msg length=%zu, peer_msg_buf %d\n", len,
+	    atomic_read(&sess->peer_usr_msg_bufs));
+	if (len > sess->max_req_size - IBTRS_HDR_LEN) {
+		ERR_RL(sess, "Sending user message failed,"
+		       " user message length too large (len: %zu)\n", len);
+		return -EMSGSIZE;
+	}
+
+	wait_event(sess->mu_buf_wait_q,
+		   (closed_st = (con->state != CSM_STATE_CONNECTED ||
+				 sess->state != SSM_STATE_CONNECTED)) ||
+		   ibtrs_clt_get_usr_msg_buf(sess));
+
+	if (unlikely(closed_st)) {
+		ERR_RL(sess, "Sending user message failed, not connected,"
+		       " Connection state is %s, Session state is %s\n",
+		       csm_state_str(con->state), ssm_state_str(sess->state));
+		return -ECOMM;
+	}
+
+	wait_event(sess->mu_iu_wait_q,
+		   (closed_st = (con->state != CSM_STATE_CONNECTED ||
+				 sess->state != SSM_STATE_CONNECTED)) ||
+		   (iu = get_u_msg_iu(sess)) != NULL);
+
+	if (unlikely(closed_st)) {
+		ERR_RL(sess, "Sending user message failed, not connected,"
+		       " Connection state is %s, Session state is %s\n",
+		       csm_state_str(con->state), ssm_state_str(sess->state));
+		err = -ECOMM;
+		goto err_iu;
+	}
+
+	rcu_read_lock();
+	smp_rmb(); /* fence con->state check */
+	if (unlikely(con->state != CSM_STATE_CONNECTED)) {
+		rcu_read_unlock();
+		ERR_RL(sess, "Sending user message failed, not connected,"
+		       " Connection state is %s, Session state is %s\n",
+		       csm_state_str(con->state), ssm_state_str(sess->state));
+		err = -ECOMM;
+		goto err_post_send;
+	}
+
+	msg		= iu->buf;
+	msg->hdr.type	= IBTRS_MSG_USER;
+	msg->hdr.tsize	= IBTRS_HDR_LEN + len;
+	copy_from_kvec(msg->payl, vec, len);
+
+	ibtrs_deb_msg_hdr("Sending: ", &msg->hdr);
+	err = ibtrs_post_send(con->ib_con.qp, con->sess->ib_sess.mr, iu,
+			      msg->hdr.tsize);
+	rcu_read_unlock();
+	if (unlikely(err)) {
+		ERR_RL(sess, "Sending user message failed, posting work"
+		       " request failed, errno: %d\n", err);
+		goto err_post_send;
+	}
+
+	sess->stats.user_ib_msgs.sent_msg_cnt++;
+	sess->stats.user_ib_msgs.sent_size += len;
+
+	ibtrs_heartbeat_set_send_ts(&sess->heartbeat);
+
+	return 0;
+
+err_post_send:
+	put_u_msg_iu(sess, iu);
+	wake_up(&sess->mu_iu_wait_q);
+err_iu:
+	atomic_inc(&sess->peer_usr_msg_bufs);
+	wake_up(&sess->mu_buf_wait_q);
+	return err;
+}
+EXPORT_SYMBOL(ibtrs_clt_send);
+
+static void csm_resolving_addr(struct ibtrs_con *con, enum csm_ev ev)
+{
+	DEB("con %p, state %s event %s\n", con, csm_state_str(con->state),
+	    csm_event_str(ev));
+	switch (ev) {
+	case CSM_EV_ADDR_RESOLVED: {
+		int err;
+
+		csm_set_state(con, CSM_STATE_RESOLVING_ROUTE);
+		err = resolve_route(con);
+		if (err) {
+			ERR(con->sess, "Failed to resolve route, errno: %d\n",
+			    err);
+			ibtrs_clt_destroy_cm_id(con);
+			csm_set_state(con, CSM_STATE_CLOSED);
+			ssm_schedule_event(con->sess, SSM_EV_CON_CLOSED);
+		}
+		break;
+		}
+	case CSM_EV_CON_ERROR:
+	case CSM_EV_SESS_CLOSING:
+		ibtrs_clt_destroy_cm_id(con);
+		csm_set_state(con, CSM_STATE_CLOSED);
+		ssm_schedule_event(con->sess, SSM_EV_CON_CLOSED);
+		break;
+	default:
+		WRN(con->sess,
+		    "Unexpected CSM Event '%s' in state '%s' received\n",
+		    csm_event_str(ev), csm_state_str(con->state));
+		return;
+	}
+}
+
+static void csm_resolving_route(struct ibtrs_con *con, enum csm_ev ev)
+{
+	int err;
+
+	DEB("con %p, state %s event %s\n", con, csm_state_str(con->state),
+	    csm_event_str(ev));
+	switch (ev) {
+	case CSM_EV_ROUTE_RESOLVED:
+		err = create_con(con);
+		if (err) {
+			ERR(con->sess,
+			    "Failed to create connection, errno: %d\n", err);
+			csm_set_state(con, CSM_STATE_CLOSED);
+			ssm_schedule_event(con->sess, SSM_EV_CON_CLOSED);
+			return;
+		}
+		csm_set_state(con, CSM_STATE_CONNECTING);
+		break;
+	case CSM_EV_CON_ERROR:
+	case CSM_EV_SESS_CLOSING:
+		ibtrs_clt_destroy_cm_id(con);
+		csm_set_state(con, CSM_STATE_CLOSED);
+		ssm_schedule_event(con->sess, SSM_EV_CON_CLOSED);
+		break;
+	default:
+		WRN(con->sess,
+		    "Unexpected CSM Event '%s' in state '%s' received\n",
+		    csm_event_str(ev), csm_state_str(con->state));
+		return;
+	}
+}
+
+static int con_disconnect(struct ibtrs_con *con)
+{
+	int err;
+
+	err = rdma_disconnect(con->cm_id);
+	if (err)
+		ERR(con->sess,
+		    "Failed to disconnect RDMA connection, errno: %d\n", err);
+	return err;
+}
+
+static int send_msg_sess_info(struct ibtrs_con *con)
+{
+	struct ibtrs_msg_sess_info *msg;
+	int err;
+	struct ibtrs_session *sess = con->sess;
+
+	msg = sess->sess_info_iu->buf;
+
+	fill_ibtrs_msg_sess_info(msg, hostname);
+
+	err = ibtrs_post_send(con->ib_con.qp, con->sess->ib_sess.mr,
+			      sess->sess_info_iu, msg->hdr.tsize);
+	if (unlikely(err))
+		ERR(sess, "Sending sess info failed, "
+			  "posting msg to QP failed, errno: %d\n", err);
+
+	return err;
+}
+
+static void csm_connecting(struct ibtrs_con *con, enum csm_ev ev)
+{
+	DEB("con %p, state %s event %s\n", con, csm_state_str(con->state),
+	    csm_event_str(ev));
+	switch (ev) {
+	case CSM_EV_CON_ESTABLISHED:
+		csm_set_state(con, CSM_STATE_CONNECTED);
+		if (con->user) {
+			if (send_msg_sess_info(con))
+				goto destroy;
+		}
+		ssm_schedule_event(con->sess, SSM_EV_CON_CONNECTED);
+		break;
+	case CSM_EV_CON_ERROR:
+	case CSM_EV_SESS_CLOSING:
+	case CSM_EV_WC_ERROR:
+	case CSM_EV_CON_DISCONNECTED:
+destroy:
+		csm_set_state(con, CSM_STATE_CLOSING);
+		con_disconnect(con);
+		/* No CM_DISCONNECTED after rdma_disconnect, trigger the SM */
+		csm_schedule_event(con, CSM_EV_CON_DISCONNECTED);
+		break;
+	default:
+		WRN(con->sess,
+		    "Unexpected CSM Event '%s' in state '%s' received\n",
+		    csm_event_str(ev), csm_state_str(con->state));
+		return;
+	}
+}
+
+static void csm_connected(struct ibtrs_con *con, enum csm_ev ev)
+{
+	DEB("con %p, state %s event %s\n", con, csm_state_str(con->state),
+	    csm_event_str(ev));
+	switch (ev) {
+	case CSM_EV_WC_ERROR:
+	case CSM_EV_CON_ERROR:
+	case CSM_EV_CON_DISCONNECTED:
+		ssm_schedule_event(con->sess, SSM_EV_CON_ERROR);
+		csm_set_state(con, CSM_STATE_CLOSING);
+		con_disconnect(con);
+		break;
+	case CSM_EV_SESS_CLOSING:
+		csm_set_state(con, CSM_STATE_CLOSING);
+		con_disconnect(con);
+		break;
+	default:
+		WRN(con->sess,
+		    "Unexpected CSM Event '%s' in state '%s' received\n",
+		    csm_event_str(ev), csm_state_str(con->state));
+		return;
+	}
+}
+
+static void csm_closing(struct ibtrs_con *con, enum csm_ev ev)
+{
+	DEB("con %p, state %s event %s\n", con, csm_state_str(con->state),
+	    csm_event_str(ev));
+	switch (ev) {
+	case CSM_EV_CON_DISCONNECTED:
+	case CSM_EV_CON_ERROR:
+	case CSM_EV_SESS_CLOSING: {
+		int err;
+
+		csm_set_state(con, CSM_STATE_FLUSHING);
+		synchronize_rcu();
+
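+		/*
+		 * Post a beacon work request; once its completion is seen
+		 * (CSM_EV_BEACON_COMPLETED, handled in csm_flushing) the
+		 * connection is considered flushed and gets destroyed.
+		 */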
+		err = post_beacon(&con->ib_con);
+		if (err) {
+			WRN(con->sess, "Failed to post BEACON,"
+			    " will destroy connection directly\n");
+			goto destroy;
+		}
+
+		err = ibtrs_request_cq_notifications(&con->ib_con);
+		if (unlikely(err < 0)) {
+			WRN(con->sess, "Requesting CQ Notification for"
+			    " ib_con failed. Connection will be destroyed\n");
+			goto destroy;
+		} else if (err > 0) {
+			err = get_process_wcs(con);
+			if (unlikely(err))
+				goto destroy;
+			break;
+		}
+		break;
+destroy:
+		con_destroy(con);
+		csm_set_state(con, CSM_STATE_CLOSED);
+		ssm_schedule_event(con->sess, SSM_EV_CON_CLOSED);
+		break;
+		}
+	case CSM_EV_CON_ESTABLISHED:
+	case CSM_EV_WC_ERROR:
+		/* ignore WC errors */
+		break;
+	default:
+		WRN(con->sess,
+		    "Unexpected CSM Event '%s' in state '%s' received\n",
+		    csm_event_str(ev), csm_state_str(con->state));
+		return;
+	}
+}
+
+static void csm_flushing(struct ibtrs_con *con, enum csm_ev ev)
+{
+	DEB("con %p, state %s event %s\n", con, csm_state_str(con->state),
+	    csm_event_str(ev));
+	switch (ev) {
+	case CSM_EV_BEACON_COMPLETED:
+		con_destroy(con);
+		csm_set_state(con, CSM_STATE_CLOSED);
+		ssm_schedule_event(con->sess, SSM_EV_CON_CLOSED);
+		break;
+	case CSM_EV_WC_ERROR:
+	case CSM_EV_CON_ERROR:
+		/* ignore WC and CON errors */
+	case CSM_EV_CON_DISCONNECTED:
+		/* Ignore CSM_EV_CON_DISCONNECTED. At this point we could have
+		 * already received a CSM_EV_CON_DISCONNECTED for the same
+		 * connection, but an additional RDMA_CM_EVENT_DISCONNECTED or
+		 * RDMA_CM_EVENT_TIMEWAIT_EXIT could be generated.
+		 */
+	case CSM_EV_SESS_CLOSING:
+		break;
+	default:
+		WRN(con->sess,
+		    "Unexpected CSM Event '%s' in state '%s' received\n",
+		    csm_event_str(ev), csm_state_str(con->state));
+		return;
+	}
+}
+
+static void schedule_all_cons_close(struct ibtrs_session *sess)
+{
+	int i;
+
+	for (i = 0; i < CONS_PER_SESSION; i++)
+		csm_schedule_event(&sess->con[i], CSM_EV_SESS_CLOSING);
+}
+
+static void ssm_idle(struct ibtrs_session *sess, enum ssm_ev ev)
+{
+	DEB("sess %p, state %s event %s\n", sess, ssm_state_str(sess->state),
+	    ssm_event_str(ev));
+	switch (ev) {
+	case SSM_EV_CON_CONNECTED:
+		WARN_ON(++sess->connected_cnt != 1);
+		if (ssm_init_state(sess, SSM_STATE_WF_INFO))
+			ssm_init_state(sess, SSM_STATE_CLOSE_DESTROY);
+		break;
+	case SSM_EV_CON_CLOSED:
+		sess->active_cnt--;
+		DEB("active_cnt %d\n", sess->active_cnt);
+		WARN_ON(sess->active_cnt);
+		/* fall through */
+	case SSM_EV_CON_ERROR:
+		ssm_init_state(sess, SSM_STATE_CLOSE_DESTROY);
+		break;
+	default:
+		WRN(sess,
+		    "Unexpected SSM Event '%s' in state '%s' received\n",
+		    ssm_event_str(ev), ssm_state_str(sess->state));
+		return;
+	}
+}
+
+static int ssm_idle_reconnect_init(struct ibtrs_session *sess)
+{
+	int err, i;
+
+	sess->retry_cnt++;
+	INFO(sess, "Reconnecting session."
+	     " Retry counter=%d, max reconnect attempts=%d\n",
+	     sess->retry_cnt, sess->max_reconnect_attempts);
+
+	for (i = 0; i < CONS_PER_SESSION; i++) {
+		struct ibtrs_con *con = &sess->con[i];
+
+		csm_set_state(con, CSM_STATE_CLOSED);
+		con->sess = sess;
+	}
+	sess->connected_cnt = 0;
+	err = init_con(sess, &sess->con[0], 0, true);
+	if (err)
+		INFO(sess, "Reconnecting session failed, errno: %d\n", err);
+	return err;
+}
+
+static void ssm_idle_reconnect(struct ibtrs_session *sess, enum ssm_ev ev)
+{
+	DEB("sess %p, state %s event %s\n", sess, ssm_state_str(sess->state),
+	    ssm_event_str(ev));
+	switch (ev) {
+	case SSM_EV_CON_CONNECTED:
+		WARN_ON(++sess->connected_cnt != 1);
+		if (ssm_init_state(sess, SSM_STATE_WF_INFO_RECONNECT))
+			ssm_init_state(sess, SSM_STATE_CLOSE_RECONNECT);
+		break;
+	case SSM_EV_SESS_CLOSE:
+		ssm_init_state(sess, SSM_STATE_CLOSE_DESTROY);
+		break;
+	case SSM_EV_CON_CLOSED:
+		sess->active_cnt--;
+		DEB("active_cnt %d\n", sess->active_cnt);
+		WARN_ON(sess->active_cnt);
+		/* fall through */
+	case SSM_EV_CON_ERROR:
+		sess->stats.reconnects.fail_cnt++;
+		ssm_init_state(sess, SSM_STATE_CLOSE_RECONNECT);
+		break;
+	case SSM_EV_RECONNECT_USER:
+		sess->retry_cnt = 0;
+		break;
+	default:
+		WRN(sess,
+		    "Unexpected SSM Event '%s' in state '%s' received\n",
+		    ssm_event_str(ev), ssm_state_str(sess->state));
+		return;
+	}
+}
+
+static int ssm_wf_info_init(struct ibtrs_session *sess)
+{
+	int err;
+
+	err = ibtrs_request_cq_notifications(&sess->con[0].ib_con);
+	if (unlikely(err < 0)) {
+		return err;
+	} else if (err > 0) {
+		err = get_process_wcs(&sess->con[0]);
+		if (unlikely(err))
+			return err;
+	} else {
+		ibtrs_set_last_heartbeat(&sess->heartbeat);
+		WARN_ON(!schedule_delayed_work(&sess->heartbeat_dwork,
+					       HEARTBEAT_INTV_JIFFIES));
+	}
+	return err;
+}
+
+static void ssm_wf_info(struct ibtrs_session *sess, enum ssm_ev ev)
+{
+	DEB("sess %p, state %s event %s\n", sess, ssm_state_str(sess->state),
+	    ssm_event_str(ev));
+	switch (ev) {
+	case SSM_EV_GOT_RDMA_INFO:
+		if (ssm_init_state(sess, SSM_STATE_OPEN))
+			ssm_init_state(sess, SSM_STATE_CLOSE_DESTROY);
+		break;
+	case SSM_EV_CON_CLOSED:
+		sess->active_cnt--;
+		DEB("active_cnt %d\n", sess->active_cnt);
+		WARN_ON(sess->active_cnt);
+		/* fall through */
+	case SSM_EV_CON_ERROR:
+	case SSM_EV_RECONNECT_HEARTBEAT:
+		ssm_init_state(sess, SSM_STATE_CLOSE_DESTROY);
+		break;
+	default:
+		WRN(sess,
+		    "Unexpected SSM Event '%s' in state '%s' received\n",
+		    ssm_event_str(ev), ssm_state_str(sess->state));
+		return;
+	}
+}
+
+static void ssm_wf_info_reconnect(struct ibtrs_session *sess, enum ssm_ev ev)
+{
+	DEB("sess %p, state %s event %s\n", sess, ssm_state_str(sess->state),
+	    ssm_event_str(ev));
+	switch (ev) {
+	case SSM_EV_GOT_RDMA_INFO:
+		if (ssm_init_state(sess, SSM_STATE_OPEN_RECONNECT))
+			ssm_init_state(sess, SSM_STATE_CLOSE_RECONNECT);
+		break;
+	case SSM_EV_SESS_CLOSE:
+		ssm_init_state(sess, SSM_STATE_CLOSE_DESTROY);
+		break;
+	case SSM_EV_CON_CLOSED:
+		sess->active_cnt--;
+		DEB("active_cnt %d\n", sess->active_cnt);
+		WARN_ON(sess->active_cnt);
+		/* fall through */
+	case SSM_EV_CON_ERROR:
+	case SSM_EV_RECONNECT_HEARTBEAT:
+		sess->stats.reconnects.fail_cnt++;
+		ssm_init_state(sess, SSM_STATE_CLOSE_RECONNECT);
+		break;
+	case SSM_EV_RECONNECT_USER:
+		sess->retry_cnt = 0;
+		break;
+	default:
+		WRN(sess,
+		    "Unexpected SSM Event '%s' in state '%s' received\n",
+		    ssm_event_str(ev), ssm_state_str(sess->state));
+		return;
+	}
+}
+
+static void queue_destroy_sess(struct ibtrs_session *sess)
+{
+	kfree(sess->srv_rdma_addr);
+	sess->srv_rdma_addr = NULL;
+	ibtrs_clt_destroy_ib_session(sess);
+	sess_schedule_destroy(sess);
+}
+
+static int ibtrs_clt_request_cq_notifications(struct ibtrs_session *sess)
+{
+	int err, i;
+
+	for (i = 0; i < CONS_PER_SESSION; i++) {
+		struct ibtrs_con *con = &sess->con[i];
+
+		err = ibtrs_request_cq_notifications(&con->ib_con);
+		if (unlikely(err < 0)) {
+			return err;
+		} else if (err > 0) {
+			err = get_process_wcs(con);
+			if (unlikely(err))
+				return err;
+		}
+	}
+
+	return 0;
+}
+
+static int ibtrs_alloc_io_bufs(struct ibtrs_session *sess)
+{
+	int ret;
+
+	if (sess->io_bufs_initialized)
+		return 0;
+
+	ret = ibtrs_alloc_reqs(sess);
+	if (ret) {
+		ERR(sess,
+		    "Failed to allocate session request buffers, errno: %d\n",
+		    ret);
+		return ret;
+	}
+
+	ret = alloc_sess_fast_pool(sess);
+	if (ret)
+		return ret;
+
+	ret = alloc_sess_tags(sess);
+	if (ret) {
+		ERR(sess, "Failed to allocate session tags, errno: %d\n",
+		    ret);
+		return ret;
+	}
+
+	sess->io_bufs_initialized = true;
+
+	return 0;
+}
+
+static int ssm_open_init(struct ibtrs_session *sess)
+{
+	int i, ret;
+
+	ret = ibtrs_alloc_io_bufs(sess);
+	if (ret)
+		return ret;
+
+	ret = alloc_sess_tr_bufs(sess);
+	if (ret) {
+		ERR(sess,
+		    "Failed to allocate session transfer buffers, errno: %d\n",
+		    ret);
+		return ret;
+	}
+
+	ret = post_usr_con_recv(&sess->con[0]);
+	if (unlikely(ret))
+		return ret;
+	for (i = 1; i < CONS_PER_SESSION; i++) {
+		ret = init_con(sess, &sess->con[i], (i - 1) % num_online_cpus(),
+			       false);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+
+static void ssm_open(struct ibtrs_session *sess, enum ssm_ev ev)
+{
+	DEB("sess %p, state %s event %s\n", sess, ssm_state_str(sess->state),
+	    ssm_event_str(ev));
+	switch (ev) {
+	case SSM_EV_CON_CONNECTED:
+		if (++sess->connected_cnt < CONS_PER_SESSION)
+			return;
+
+		if (ssm_init_state(sess, SSM_STATE_CONNECTED)) {
+			ssm_init_state(sess, SSM_STATE_CLOSE_DESTROY);
+			return;
+		}
+
+		INFO(sess, "IBTRS session (QPs: %d) to server established\n",
+		     CONS_PER_SESSION);
+
+		wake_up(&sess->wait_q);
+		break;
+	case SSM_EV_CON_CLOSED:
+		sess->active_cnt--;
+		DEB("active_cnt %d\n", sess->active_cnt);
+		/* fall through */
+	case SSM_EV_CON_ERROR:
+		ssm_init_state(sess, SSM_STATE_CLOSE_DESTROY);
+		break;
+	default:
+		WRN(sess,
+		    "Unexpected SSM Event '%s' in state '%s' received\n",
+		    ssm_event_str(ev), ssm_state_str(sess->state));
+		return;
+	}
+}
+
+static void ssm_open_reconnect(struct ibtrs_session *sess, enum ssm_ev ev)
+{
+	DEB("sess %p, state %s event %s\n", sess, ssm_state_str(sess->state),
+	    ssm_event_str(ev));
+	switch (ev) {
+	case SSM_EV_CON_CONNECTED:
+		if (++sess->connected_cnt < CONS_PER_SESSION)
+			return;
+
+		if (ssm_init_state(sess, SSM_STATE_CONNECTED)) {
+			ssm_init_state(sess, SSM_STATE_CLOSE_RECONNECT);
+			return;
+		}
+
+		INFO(sess, "IBTRS session (QPs: %d) to server established\n",
+		     CONS_PER_SESSION);
+
+		sess->retry_cnt = 0;
+		sess->stats.reconnects.successful_cnt++;
+		clt_ops->sess_ev(sess->priv, IBTRS_CLT_SESS_EV_RECONNECT, 0);
+
+		break;
+	case SSM_EV_SESS_CLOSE:
+		ssm_init_state(sess, SSM_STATE_CLOSE_DESTROY);
+		break;
+	case SSM_EV_CON_CLOSED:
+		sess->active_cnt--;
+		DEB("active_cnt %d\n", sess->active_cnt);
+		/* fall through */
+	case SSM_EV_CON_ERROR:
+		sess->stats.reconnects.fail_cnt++;
+		ssm_init_state(sess, SSM_STATE_CLOSE_RECONNECT);
+		break;
+	case SSM_EV_RECONNECT_USER:
+		sess->retry_cnt = 0;
+		break;
+	default:
+		WRN(sess,
+		    "Unexpected SSM Event '%s' in state '%s' received\n",
+		    ssm_event_str(ev), ssm_state_str(sess->state));
+		return;
+	}
+}
+
+static int ssm_connected_init(struct ibtrs_session *sess)
+{
+	int err;
+
+	err = ibtrs_clt_request_cq_notifications(sess);
+	if (err) {
+		ERR(sess, "Establishing Session failed, requesting"
+		    " CQ completion notification failed, errno: %d\n", err);
+		return err;
+	}
+
+	atomic_set(&sess->peer_usr_msg_bufs, USR_MSG_CNT);
+
+	return 0;
+}
+
+static int sess_disconnect_cons(struct ibtrs_session *sess)
+{
+	int i;
+
+	for (i = 0; i < CONS_PER_SESSION; i++) {
+		struct ibtrs_con *con = &sess->con[i];
+
+		rcu_read_lock();
+		smp_rmb(); /* fence con->state check */
+		if (con->state == CSM_STATE_CONNECTED)
+			rdma_disconnect(con->cm_id);
+		rcu_read_unlock();
+	}
+
+	return 0;
+}
+
+static void ssm_connected(struct ibtrs_session *sess, enum ssm_ev ev)
+{
+	DEB("sess %p, state %s event %s\n", sess, ssm_state_str(sess->state),
+	    ssm_event_str(ev));
+	switch (ev) {
+	case SSM_EV_RECONNECT_USER:
+	case SSM_EV_CON_ERROR:
+	case SSM_EV_RECONNECT_HEARTBEAT:
+		INFO(sess, "Session disconnecting\n");
+
+		if (ev == SSM_EV_RECONNECT_USER)
+			ssm_init_state(sess, SSM_STATE_CLOSE_RECONNECT_IMM);
+		else
+			ssm_init_state(sess, SSM_STATE_CLOSE_RECONNECT);
+
+		wake_up(&sess->mu_buf_wait_q);
+		wake_up(&sess->mu_iu_wait_q);
+		clt_ops->sess_ev(sess->priv, IBTRS_CLT_SESS_EV_DISCONNECTED, 0);
+		sess_disconnect_cons(sess);
+		synchronize_rcu();
+		fail_all_outstanding_reqs(sess);
+		break;
+	case SSM_EV_SESS_CLOSE:
+		cancel_delayed_work_sync(&sess->heartbeat_dwork);
+		ssm_init_state(sess, SSM_STATE_CLOSE_DESTROY);
+		break;
+	default:
+		WRN(sess,
+		    "Unexpected SSM Event '%s' in state '%s' received\n",
+		    ssm_event_str(ev), ssm_state_str(sess->state));
+		return;
+	}
+}
+
+static int ssm_reconnect_init(struct ibtrs_session *sess)
+{
+	unsigned long delay_jiffies;
+	u16 delay_sec = 0;
+
+	if (sess->retry_cnt == 0) {
+		/* If there is a connection error, we wait 5
+		 * seconds for the first reconnect retry. This is needed
+		 * because if the server has initiated the disconnect,
+		 * it might not be ready to receive a new session
+		 * request immediately.
+		 */
+		delay_sec = 5;
+	} else {
+		delay_sec = sess->reconnect_delay_sec + sess->retry_cnt;
+	}
+
+	delay_sec = delay_sec + prandom_u32() % RECONNECT_SEED;
+
+	delay_jiffies = msecs_to_jiffies(1000 * (delay_sec));
+
+	INFO(sess, "Session reconnect in %ds\n", delay_sec);
+	queue_delayed_work_on(0, sess->sm_wq,
+			      &sess->reconnect_dwork, delay_jiffies);
+	return 0;
+}
+
+static void ssm_reconnect(struct ibtrs_session *sess, enum ssm_ev ev)
+{
+	int err;
+
+	DEB("sess %p, state %s event %s\n", sess, ssm_state_str(sess->state),
+	    ssm_event_str(ev));
+	switch (ev) {
+	case SSM_EV_RECONNECT_USER:
+		sess->retry_cnt = 0;
+		cancel_delayed_work_sync(&sess->reconnect_dwork);
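+		/* fall through */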
+	case SSM_EV_RECONNECT:
+		err =  ssm_init_state(sess, SSM_STATE_IDLE_RECONNECT);
+		if (err == -ENODEV) {
+			cancel_delayed_work_sync(&sess->reconnect_dwork);
+			ssm_init_state(sess, SSM_STATE_DISCONNECTED);
+		} else if (err) {
+			ssm_init_state(sess, SSM_STATE_CLOSE_RECONNECT);
+		}
+		break;
+	case SSM_EV_SESS_CLOSE:
+		cancel_delayed_work_sync(&sess->reconnect_dwork);
+		ssm_init_state(sess, SSM_STATE_DESTROYED);
+		break;
+	default:
+		WRN(sess,
+		    "Unexpected SSM Event '%s' in state '%s' received\n",
+		    ssm_event_str(ev), ssm_state_str(sess->state));
+		return;
+	}
+}
+
+static int ssm_close_destroy_init(struct ibtrs_session *sess)
+{
+	if (!sess->active_cnt)
+		ssm_schedule_event(sess, SSM_EV_ALL_CON_CLOSED);
+	else
+		schedule_all_cons_close(sess);
+
+	return 0;
+}
+
+static void ssm_close_destroy(struct ibtrs_session *sess, enum ssm_ev ev)
+{
+	DEB("sess %p, state %s event %s\n", sess, ssm_state_str(sess->state),
+	    ssm_event_str(ev));
+	switch (ev) {
+	case SSM_EV_CON_CLOSED:
+		sess->active_cnt--;
+		DEB("active_cnt %d\n", sess->active_cnt);
+		if (sess->active_cnt)
+			break;
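+		/* fall through */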
+	case SSM_EV_ALL_CON_CLOSED:
+		ssm_init_state(sess, SSM_STATE_DESTROYED);
+		wake_up(&sess->wait_q);
+		break;
+	case SSM_EV_SESS_CLOSE:
+	case SSM_EV_CON_ERROR:
+	case SSM_EV_RECONNECT_USER:
+	case SSM_EV_CON_CONNECTED:
+		break;
+	default:
+		WRN(sess,
+		    "Unexpected SSM Event '%s' in state '%s' received\n",
+		    ssm_event_str(ev), ssm_state_str(sess->state));
+		return;
+	}
+}
+
+static void ssm_close_reconnect(struct ibtrs_session *sess, enum ssm_ev ev)
+{
+	DEB("sess %p, state %s event %s\n", sess, ssm_state_str(sess->state),
+	    ssm_event_str(ev));
+	switch (ev) {
+	case SSM_EV_CON_ERROR:
+	case SSM_EV_CON_CONNECTED:
+		break;
+	case SSM_EV_CON_CLOSED:
+		sess->active_cnt--;
+		DEB("active_cnt %d\n", sess->active_cnt);
+		if (sess->active_cnt)
+			break;
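+		/* fall through */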
+	case SSM_EV_ALL_CON_CLOSED:
+		if (!sess->ib_sess_destroy_completion &&
+		    (sess->max_reconnect_attempts == -1 ||
+		    (sess->max_reconnect_attempts > 0 &&
+		     sess->retry_cnt < sess->max_reconnect_attempts))) {
+			ssm_init_state(sess, SSM_STATE_RECONNECT);
+		} else {
+			if (sess->ib_sess_destroy_completion)
+				INFO(sess, "Device is being removed, will not"
+				     " schedule reconnect of session.\n");
+			else
+				INFO(sess, "Max reconnect attempts reached, "
+				     "will not schedule reconnect of "
+				     "session. (Current reconnect attempts=%d,"
+				     " max reconnect attempts=%d)\n",
+				     sess->retry_cnt,
+				     sess->max_reconnect_attempts);
+			clt_ops->sess_ev(sess->priv,
+					 IBTRS_CLT_SESS_EV_MAX_RECONN_EXCEEDED,
+					 0);
+
+			ssm_init_state(sess, SSM_STATE_DISCONNECTED);
+		}
+		break;
+	case SSM_EV_SESS_CLOSE:
+		ssm_init_state(sess, SSM_STATE_CLOSE_DESTROY);
+		break;
+	case SSM_EV_RECONNECT_USER:
+		sess->retry_cnt = 0;
+		ssm_init_state(sess, SSM_STATE_CLOSE_RECONNECT_IMM);
+		break;
+	default:
+		WRN(sess,
+		    "Unexpected SSM Event '%s' in state '%s' received\n",
+		    ssm_event_str(ev), ssm_state_str(sess->state));
+		return;
+	}
+}
+
+static void ssm_close_reconnect_imm(struct ibtrs_session *sess, enum ssm_ev ev)
+{
+	DEB("sess %p, state %s event %s\n", sess, ssm_state_str(sess->state),
+	    ssm_event_str(ev));
+	switch (ev) {
+	case SSM_EV_CON_CLOSED:
+		sess->active_cnt--;
+		DEB("active_cnt %d\n", sess->active_cnt);
+		if (sess->active_cnt)
+			break;
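+		/* fall through */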
+	case SSM_EV_ALL_CON_CLOSED:
+		if (ssm_init_state(sess, SSM_STATE_IDLE_RECONNECT))
+			ssm_init_state(sess, SSM_STATE_DISCONNECTED);
+		break;
+	case SSM_EV_SESS_CLOSE:
+		ssm_init_state(sess, SSM_STATE_CLOSE_DESTROY);
+		break;
+	case SSM_EV_RECONNECT_USER:
+	case SSM_EV_CON_ERROR:
+		break;
+	default:
+		WRN(sess,
+		    "Unexpected SSM Event '%s' in state '%s' received\n",
+		    ssm_event_str(ev), ssm_state_str(sess->state));
+		return;
+	}
+}
+
+static int ssm_disconnected_init(struct ibtrs_session *sess)
+{
+	ibtrs_clt_destroy_ib_session(sess);
+
+	return 0;
+}
+
+static void ssm_disconnected(struct ibtrs_session *sess, enum ssm_ev ev)
+{
+	DEB("sess %p, state %s event %s\n", sess, ssm_state_str(sess->state),
+	    ssm_event_str(ev));
+
+	switch (ev) {
+	case SSM_EV_RECONNECT_USER:
+		sess->retry_cnt = 0;
+		/* stay in disconnected if can't switch to IDLE_RECONNECT */
+		ssm_init_state(sess, SSM_STATE_IDLE_RECONNECT);
+		break;
+	case SSM_EV_SESS_CLOSE:
+		ssm_init_state(sess, SSM_STATE_DESTROYED);
+		break;
+	default:
+		WRN(sess,
+		    "Unexpected SSM Event '%s' in state '%s' received\n",
+		    ssm_event_str(ev), ssm_state_str(sess->state));
+		return;
+	}
+}
+
+static int ssm_destroyed_init(struct ibtrs_session *sess)
+{
+	queue_destroy_sess(sess);
+
+	return 0;
+}
+
+static void ssm_destroyed(struct ibtrs_session *sess, enum ssm_ev ev)
+{
+	DEB("sess %p, state %s event %s\n", sess, ssm_state_str(sess->state),
+	    ssm_event_str(ev));
+
+	/* ignore all events since the session is being destroyed */
+}
+
+int ibtrs_clt_register(const struct ibtrs_clt_ops *ops)
+{
+	if (clt_ops) {
+		ERR_NP("Module %s already registered, only one user module"
+		       " supported\n", clt_ops->owner->name);
+		return -ENOTSUPP;
+	}
+	if (!clt_ops_are_valid(ops))
+		return -EINVAL;
+	clt_ops = ops;
+
+	return 0;
+}
+EXPORT_SYMBOL(ibtrs_clt_register);
+
+void ibtrs_clt_unregister(const struct ibtrs_clt_ops *ops)
+{
+	if (WARN_ON(!clt_ops))
+		return;
+
+	if (clt_ops->owner != ops->owner)
+		return;
+
+	flush_workqueue(ibtrs_wq);
+
+	mutex_lock(&sess_mutex);
+	WARN(!list_empty(&sess_list),
+	     "BUG: user module didn't close all sessions before calling %s\n",
+	     __func__);
+	mutex_unlock(&sess_mutex);
+
+	clt_ops = NULL;
+}
+EXPORT_SYMBOL(ibtrs_clt_unregister);
+
+int ibtrs_clt_query(struct ibtrs_session *sess, struct ibtrs_attrs *attr)
+{
+	if (unlikely(sess->state != SSM_STATE_CONNECTED))
+		return -ECOMM;
+
+	attr->queue_depth      = sess->queue_depth;
+	attr->mr_page_mask     = sess->mr_page_mask;
+	attr->mr_page_size     = sess->mr_page_size;
+	attr->mr_max_size      = sess->mr_max_size;
+	attr->max_pages_per_mr = sess->max_pages_per_mr;
+	attr->max_sge          = sess->max_sge;
+	attr->max_io_size      = sess->max_io_size;
+	strlcpy(attr->hostname, sess->hostname, sizeof(attr->hostname));
+
+	return 0;
+}
+EXPORT_SYMBOL(ibtrs_clt_query);
+
+static int check_module_params(void)
+{
+	if (fmr_sg_cnt > MAX_SEGMENTS || fmr_sg_cnt < 0) {
+		ERR_NP("invalid fmr_sg_cnt value: %d\n", fmr_sg_cnt);
+		return -EINVAL;
+	}
+	return 0;
+}
+
+ssize_t ibtrs_clt_stats_rdma_to_str(struct ibtrs_session *sess,
+				    char *page, size_t len)
+{
+	struct ibtrs_clt_stats_rdma_stats s;
+	struct ibtrs_clt_stats_rdma_stats *r = sess->stats.rdma_stats;
+	int i;
+
+	memset(&s, 0, sizeof(s));
+
+	for (i = 0; i < num_online_cpus(); i++) {
+		s.cnt_read		+= r[i].cnt_read;
+		s.size_total_read	+= r[i].size_total_read;
+		s.cnt_write		+= r[i].cnt_write;
+		s.size_total_write	+= r[i].size_total_write;
+		s.inflight		+= r[i].inflight;
+	}
+
+	return scnprintf(page, len, "%llu %llu %llu %llu %u\n",
+			 s.cnt_read, s.size_total_read, s.cnt_write,
+			 s.size_total_write, s.inflight);
+}
+
+int ibtrs_clt_stats_sg_list_distr_to_str(struct ibtrs_session *sess, char *buf,
+					 size_t len)
+{
+	int cnt = 0;
+	unsigned p, p_i, p_f;
+	u64 *total = sess->stats.sg_list_total;
+	u64 **distr = sess->stats.sg_list_distr;
+	int i, j;
+
+	cnt += scnprintf(buf + cnt, len - cnt, "n\\cpu:");
+	for (j = 0; j < num_online_cpus(); j++)
+		cnt += scnprintf(buf + cnt, len - cnt, "%5d", j);
+
+	for (i = 0; i < SG_DISTR_LEN + 1; i++) {
+		if (i <= MAX_LIN_SG)
+			cnt += scnprintf(buf + cnt, len - cnt, "\n= %3d:", i);
+		else if (i < SG_DISTR_LEN)
+			cnt += scnprintf(buf + cnt, len - cnt,
+					"\n< %3d:",
+					1 << (i + MIN_LOG_SG - MAX_LIN_SG));
+		else
+			cnt += scnprintf(buf + cnt, len - cnt,
+					"\n>=%3d:",
+					1 << (i + MIN_LOG_SG - MAX_LIN_SG - 1));
+
+		for (j = 0; j < num_online_cpus(); j++) {
+			p = total[j] ? distr[j][i] * 1000 / total[j] : 0;
+			p_i = p / 10;
+			p_f = p % 10;
+
+			if (distr[j][i])
+				cnt += scnprintf(buf + cnt, len - cnt,
+						 " %2u.%01u", p_i, p_f);
+			else
+				cnt += scnprintf(buf + cnt, len - cnt, "    0");
+		}
+	}
+
+	cnt += scnprintf(buf + cnt, len - cnt, "\ntotal:");
+	for (j = 0; j < num_online_cpus(); j++)
+		cnt += scnprintf(buf + cnt, len - cnt, " %llu", total[j]);
+	cnt += scnprintf(buf + cnt, len - cnt, "\n");
+
+	return cnt;
+}
+
+static int __init ibtrs_client_init(void)
+{
+	int err;
+
+	scnprintf(hostname, sizeof(hostname), "%s", utsname()->nodename);
+	INFO_NP("Loading module ibtrs_client, version: " __stringify(IBTRS_VER)
+		" (use_fr: %d, retry_count: %d,"
+		" fmr_sg_cnt: %d,"
+		" default_heartbeat_timeout_ms: %d, hostname: %s)\n", use_fr,
+		retry_count, fmr_sg_cnt,
+		default_heartbeat_timeout_ms, hostname);
+	err = check_module_params();
+	if (err) {
+		ERR_NP("Failed to load module, invalid module parameters,"
+		       " errno: %d\n", err);
+		return err;
+	}
+
+	ibtrs_wq = alloc_workqueue("ibtrs_client_wq", 0, 0);
+	if (!ibtrs_wq) {
+		ERR_NP("Failed to load module, alloc ibtrs_client_wq failed\n");
+		return -ENOMEM;
+	}
+
+	err = ibtrs_clt_create_sysfs_files();
+	if (err) {
+		ERR_NP("Failed to load module, can't create sysfs files,"
+		       " errno: %d\n", err);
+		goto out_destroy_wq;
+	}
+	uuid_le_gen(&uuid);
+	return 0;
+
+out_destroy_wq:
+	destroy_workqueue(ibtrs_wq);
+	return err;
+}
+
+static void __exit ibtrs_client_exit(void)
+{
+	INFO_NP("Unloading module\n");
+
+	mutex_lock(&sess_mutex);
+	WARN(!list_empty(&sess_list),
+	     "Session(s) still exist on module unload\n");
+	mutex_unlock(&sess_mutex);
+	ibtrs_clt_destroy_sysfs_files();
+	destroy_workqueue(ibtrs_wq);
+
+	INFO_NP("Module unloaded\n");
+}
+
+module_init(ibtrs_client_init);
+module_exit(ibtrs_client_exit);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH 05/28] ibtrs_clt: main functionality of ibtrs_client
@ 2017-03-24 10:45   ` Jack Wang
  0 siblings, 0 replies; 87+ messages in thread
From: Jack Wang @ 2017-03-24 10:45 UTC (permalink / raw)
  To: linux-block-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA, axboe-tSWWG44O7X1aa/9Udqfwiw,
	hch-jcswGhMUV9g, mail-99BIx50xQYGELgA04lAiVw,
	Milind.dumbare-Re5JQEeQqe8AvxtiuMwx3w,
	yun.wang-EIkl63zCoXaH+58JC4qpiA, Jack Wang, Kleber Souza,
	Danil Kipnis, Roman Pen

From: Jack Wang <jinpu.wang-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>

ibtrs_client establishes a connection to a server and executes
RDMA operations requested by ibnbd_client.

Upon connection establishment, server and client exchange memory
information: the server reserves enough memory to hold queue_depth
requests of maximum IO size for that particular client. The client
is then solely responsible for managing this memory.

We make heavy use of RDMA Write with immediate data (IMM) to reduce
the number of InfiniBand messages per IO and thus lower the latency.
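
As an illustration (a minimal, hypothetical sketch, not part of this
patch; 'sess' stands for an already established session), an upper
layer such as ibnbd_client is expected to throttle its requests with
the tag API exported by this module:

  struct ibtrs_tag *tag;

  /* cpu_id -1: let ibtrs pick the current CPU; nr_bytes is unused */
  tag = ibtrs_get_tag(sess, -1, 0, 1 /* can_wait */);
  if (!tag)
          return -EAGAIN; /* only possible when can_wait == 0 */
  /* ... issue the request, passing 'tag' along with it ... */
  ibtrs_put_tag(sess, tag);

With can_wait set, ibtrs_get_tag() sleeps until a tag is freed again,
so a caller can never exceed the queue depth negotiated with the
server during the handshake.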

Signed-off-by: Jack Wang <jinpu.wang-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
Signed-off-by: Kleber Souza <kleber.souza-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
Signed-off-by: Danil Kipnis <danil.kipnis-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
Signed-off-by: Roman Pen <roman.penyaev-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
---
 drivers/infiniband/ulp/ibtrs_client/ibtrs_clt.c | 5329 +++++++++++++++++++++++
 1 file changed, 5329 insertions(+)
 create mode 100644 drivers/infiniband/ulp/ibtrs_client/ibtrs_clt.c

diff --git a/drivers/infiniband/ulp/ibtrs_client/ibtrs_clt.c b/drivers/infiniband/ulp/ibtrs_client/ibtrs_clt.c
new file mode 100644
index 0000000..d34d468
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs_client/ibtrs_clt.c
@@ -0,0 +1,5329 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler <mail-99BIx50xQYGELgA04lAiVw@public.gmane.org>
+ *          Jack Wang <jinpu.wang-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
+ *          Kleber Souza <kleber.souza-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
+ *          Danil Kipnis <danil.kipnis-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
+ *          Roman Pen <roman.penyaev-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
+ *          Milind Dumbare <Milind.dumbare-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
+ *
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions, and the following disclaimer,
+ *    without modification.
+ * 2. Redistributions in binary form must reproduce at minimum a disclaimer
+ *    substantially similar to the "NO WARRANTY" disclaimer below
+ *    ("Disclaimer") and any redistribution must be conditioned upon
+ *    including a substantially similar Disclaimer requirement for further
+ *    binary redistribution.
+ * 3. Neither the names of the above-listed copyright holders nor the names
+ *    of any contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * NO WARRANTY
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * HOLDERS OR CONTRIBUTORS BE LIABLE FOR SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
+ * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGES.
+ *
+ */
+
+#include <linux/module.h>
+#include <linux/wait.h>
+#include <linux/scatterlist.h>
+#include <linux/random.h>
+#include <linux/uuid.h>
+#include <linux/utsname.h>
+#include <rdma/rdma_cm.h>
+#include <rdma/ib_cm.h>
+#include <rdma/ib_fmr_pool.h>
+#include <rdma/ib.h>
+#include <rdma/ibtrs_clt.h>
+#include "ibtrs_clt_internal.h"
+#include "ibtrs_clt_sysfs.h"
+#include <rdma/ibtrs.h>
+#include <rdma/ibtrs_log.h>
+#include <linux/list.h>
+
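+/* one IO connection per possible CPU plus one connection reserved for
+ * user messages (sess->con[0])
+ */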
+#define CONS_PER_SESSION (nr_cpu_ids + 1)
+#define RECONNECT_SEED 8
+#define MAX_SEGMENTS 31
+
+MODULE_AUTHOR("ibnbd-EIkl63zCoXaH+58JC4qpiA@public.gmane.org");
+MODULE_DESCRIPTION("InfiniBand Transport Client");
+MODULE_VERSION(__stringify(IBTRS_VER));
+MODULE_LICENSE("GPL");
+
+static bool use_fr;
+module_param(use_fr, bool, 0444);
+MODULE_PARM_DESC(use_fr, "use FRWR mode for memory registration if possible."
+		 " (default: 0)");
+
+static int retry_count = 7;
+
+static int retry_count_set(const char *val, const struct kernel_param *kp)
+{
+	int err, ival;
+
+	err = kstrtoint(val, 0, &ival);
+	if (err)
+		return err;
+
+	if (ival < MIN_RTR_CNT || ival > MAX_RTR_CNT)
+		return -EINVAL;
+
+	retry_count = ival;
+
+	return 0;
+}
+
+static const struct kernel_param_ops retry_count_ops = {
+	.set		= retry_count_set,
+	.get		= param_get_int,
+};
+module_param_cb(retry_count, &retry_count_ops, &retry_count, 0644);
+
+MODULE_PARM_DESC(retry_count, "Number of times to send the message if the"
+		 " remote side didn't respond with Ack or Nack (default: 7,"
+		 " min: " __stringify(MIN_RTR_CNT) ", max: "
+		 __stringify(MAX_RTR_CNT) ")");
+
+static int fmr_sg_cnt = 4;
+module_param_named(fmr_sg_cnt, fmr_sg_cnt, int, 0644);
+MODULE_PARM_DESC(fmr_sg_cnt, "when sg_cnt is bigger than fmr_sg_cnt, enable"
+		 " FMR (default: 4)");
+
+static int default_heartbeat_timeout_ms = DEFAULT_HEARTBEAT_TIMEOUT_MS;
+
+static int default_heartbeat_timeout_set(const char *val,
+					 const struct kernel_param *kp)
+{
+	unsigned int ival;
+	int ret;
+
+	ret = kstrtouint(val, 0, &ival);
+	if (ret) {
+		ERR_NP("Failed to convert string '%s' to unsigned int\n", val);
+		return ret;
+	}
+
+	ret = ibtrs_heartbeat_timeout_validate(ival);
+	if (ret)
+		return ret;
+
+	default_heartbeat_timeout_ms = ival;
+
+	return 0;
+}
+
+static const struct kernel_param_ops heartbeat_timeout_ops = {
+	.set		= default_heartbeat_timeout_set,
+	.get		= param_get_int,
+};
+
+module_param_cb(default_heartbeat_timeout_ms, &heartbeat_timeout_ops,
+		&default_heartbeat_timeout_ms, 0644);
+MODULE_PARM_DESC(default_heartbeat_timeout_ms, "default heartbeat timeout,"
+		 " min: " __stringify(MIN_HEARTBEAT_TIMEOUT_MS)
+		 " (default:" __stringify(DEFAULT_HEARTBEAT_TIMEOUT_MS) ")");
+
+static char hostname[MAXHOSTNAMELEN] = "";
+
+static int hostname_set(const char *val, const struct kernel_param *kp)
+{
+	int ret = 0, len = strlen(val);
+
+	if (len >= sizeof(hostname))
+		return -EINVAL;
+	strlcpy(hostname, val, sizeof(hostname));
+	*strchrnul(hostname, '\n') = '\0';
+
+	INFO_NP("hostname changed to %s\n", hostname);
+	return ret;
+}
+
+static struct kparam_string hostname_kparam_str = {
+	.maxlen	= sizeof(hostname),
+	.string	= hostname
+};
+
+static const struct kernel_param_ops hostname_ops = {
+	.set	= hostname_set,
+	.get	= param_get_string,
+};
+
+module_param_cb(hostname, &hostname_ops,
+		&hostname_kparam_str, 0644);
+MODULE_PARM_DESC(hostname, "Sets the hostname of the local server; if set, it"
+		 " will be sent to the other side and displayed together with"
+		 " the addr (default: empty)");
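+/* With permissions 0644 the hostname can also be changed at runtime, e.g.
+ * via /sys/module/ibtrs_client/parameters/hostname (assuming the module
+ * is built as ibtrs_client).
+ */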
+
+#define LOCAL_INV_WR_ID_MASK	1
+#define	FAST_REG_WR_ID_MASK	2
+
+static const struct ibtrs_clt_ops *clt_ops;
+static struct workqueue_struct *ibtrs_wq;
+static LIST_HEAD(sess_list);
+static DEFINE_MUTEX(sess_mutex);
+
+static uuid_le uuid;
+
+enum csm_state {
+	_CSM_STATE_MIN,
+	CSM_STATE_RESOLVING_ADDR,
+	CSM_STATE_RESOLVING_ROUTE,
+	CSM_STATE_CONNECTING,
+	CSM_STATE_CONNECTED,
+	CSM_STATE_CLOSING,
+	CSM_STATE_FLUSHING,
+	CSM_STATE_CLOSED,
+	_CSM_STATE_MAX
+};
+
+enum csm_ev {
+	CSM_EV_ADDR_RESOLVED,
+	CSM_EV_ROUTE_RESOLVED,
+	CSM_EV_CON_ESTABLISHED,
+	CSM_EV_SESS_CLOSING,
+	CSM_EV_CON_DISCONNECTED,
+	CSM_EV_BEACON_COMPLETED,
+	CSM_EV_WC_ERROR,
+	CSM_EV_CON_ERROR
+};
+
+enum ssm_ev {
+	SSM_EV_CON_CONNECTED,
+	SSM_EV_RECONNECT,		/* in RECONNECT state only*/
+	SSM_EV_RECONNECT_USER,		/* triggered by user via sysfs */
+	SSM_EV_RECONNECT_HEARTBEAT,	/* triggered by the heartbeat */
+	SSM_EV_SESS_CLOSE,
+	SSM_EV_CON_CLOSED,		/* when CSM switched to CLOSED */
+	SSM_EV_CON_ERROR,		/* triggered by CSM when smth. wrong */
+	SSM_EV_ALL_CON_CLOSED,		/* triggered when all cons closed */
+	SSM_EV_GOT_RDMA_INFO
+};
+
+static const char *ssm_state_str(enum ssm_state state)
+{
+	switch (state) {
+	case SSM_STATE_IDLE:
+		return "SSM_STATE_IDLE";
+	case SSM_STATE_IDLE_RECONNECT:
+		return "SSM_STATE_IDLE_RECONNECT";
+	case SSM_STATE_WF_INFO:
+		return "SSM_STATE_WF_INFO";
+	case SSM_STATE_WF_INFO_RECONNECT:
+		return "SSM_STATE_WF_INFO_RECONNECT";
+	case SSM_STATE_OPEN:
+		return "SSM_STATE_OPEN";
+	case SSM_STATE_OPEN_RECONNECT:
+		return "SSM_STATE_OPEN_RECONNECT";
+	case SSM_STATE_CONNECTED:
+		return "SSM_STATE_CONNECTED";
+	case SSM_STATE_RECONNECT:
+		return "SSM_STATE_RECONNECT";
+	case SSM_STATE_CLOSE_DESTROY:
+		return "SSM_STATE_CLOSE_DESTROY";
+	case SSM_STATE_CLOSE_RECONNECT:
+		return "SSM_STATE_CLOSE_RECONNECT";
+	case SSM_STATE_CLOSE_RECONNECT_IMM:
+		return "SSM_STATE_CLOSE_RECONNECT_IMM";
+	case SSM_STATE_DISCONNECTED:
+		return "SSM_STATE_DISCONNECTED";
+	case SSM_STATE_DESTROYED:
+		return "SSM_STATE_DESTROYED";
+	default:
+		return "UNKNOWN";
+	}
+}
+
+static const char *ssm_event_str(enum ssm_ev ev)
+{
+	switch (ev) {
+	case SSM_EV_CON_CONNECTED:
+		return "SSM_EV_CON_CONNECTED";
+	case SSM_EV_RECONNECT:
+		return "SSM_EV_RECONNECT";
+	case SSM_EV_RECONNECT_USER:
+		return "SSM_EV_RECONNECT_USER";
+	case SSM_EV_RECONNECT_HEARTBEAT:
+		return "SSM_EV_RECONNECT_HEARTBEAT";
+	case SSM_EV_SESS_CLOSE:
+		return "SSM_EV_SESS_CLOSE";
+	case SSM_EV_CON_CLOSED:
+		return "SSM_EV_CON_CLOSED";
+	case SSM_EV_CON_ERROR:
+		return "SSM_EV_CON_ERROR";
+	case SSM_EV_ALL_CON_CLOSED:
+		return "SSM_EV_ALL_CON_CLOSED";
+	case SSM_EV_GOT_RDMA_INFO:
+		return "SSM_EV_GOT_RDMA_INFO";
+	default:
+		return "UNKNOWN";
+	}
+}
+
+static const char *csm_state_str(enum csm_state state)
+{
+	switch (state) {
+	case CSM_STATE_RESOLVING_ADDR:
+		return "CSM_STATE_RESOLVING_ADDR";
+	case CSM_STATE_RESOLVING_ROUTE:
+		return "CSM_STATE_RESOLVING_ROUTE";
+	case CSM_STATE_CONNECTING:
+		return "CSM_STATE_CONNECTING";
+	case CSM_STATE_CONNECTED:
+		return "CSM_STATE_CONNECTED";
+	case CSM_STATE_FLUSHING:
+		return "CSM_STATE_FLUSHING";
+	case CSM_STATE_CLOSING:
+		return "CSM_STATE_CLOSING";
+	case CSM_STATE_CLOSED:
+		return "CSM_STATE_CLOSED";
+	default:
+		return "UNKNOWN";
+	}
+}
+
+static const char *csm_event_str(enum csm_ev ev)
+{
+	switch (ev) {
+	case CSM_EV_ADDR_RESOLVED:
+		return "CSM_EV_ADDR_RESOLVED";
+	case CSM_EV_ROUTE_RESOLVED:
+		return "CSM_EV_ROUTE_RESOLVED";
+	case CSM_EV_CON_ESTABLISHED:
+		return "CSM_EV_CON_ESTABLISHED";
+	case CSM_EV_BEACON_COMPLETED:
+		return "CSM_EV_BEACON_COMPLETED";
+	case CSM_EV_SESS_CLOSING:
+		return "CSM_EV_SESS_CLOSING";
+	case CSM_EV_CON_DISCONNECTED:
+		return "CSM_EV_CON_DISCONNECTED";
+	case CSM_EV_WC_ERROR:
+		return "CSM_EV_WC_ERROR";
+	case CSM_EV_CON_ERROR:
+		return "CSM_EV_CON_ERROR";
+	default:
+		return "UNKNOWN";
+	}
+}
+
+/* rdma_req connects an iu with the sglist received from the user */
+struct rdma_req {
+	struct list_head        list;
+	struct ibtrs_iu		*iu;
+	struct scatterlist	*sglist; /* list holding user data */
+	unsigned int		sg_cnt;
+	unsigned int		sg_size;
+	u32			data_len;
+	void			*priv;
+	bool			in_use;
+	struct ibtrs_con	*con;
+	union {
+		struct ib_pool_fmr	**fmr_list;
+		struct ibtrs_fr_desc	**fr_list;
+	};
+	void			*map_page;
+	struct ibtrs_tag	*tag;
+	u16			nmdesc;
+	enum dma_data_direction dir;
+	unsigned long		start_time;
+} ____cacheline_aligned;
+
+struct ibtrs_con {
+	enum  csm_state		state;
+	short			cpu;
+	bool			user; /* true if con is for user msg only */
+	atomic_t		io_cnt;
+	struct ibtrs_session	*sess;
+	struct ib_con		ib_con;
+	struct ibtrs_fr_pool	*fr_pool;
+	struct rdma_cm_id	*cm_id;
+	struct work_struct	cq_work;
+	struct workqueue_struct *cq_wq;
+	struct tasklet_struct	cq_tasklet;
+	struct ib_wc		wcs[WC_ARRAY_SIZE];
+	bool			device_being_removed;
+};
+
+struct sess_destroy_sm_wq_work {
+	struct work_struct	work;
+	struct ibtrs_session	*sess;
+};
+
+struct con_sm_work {
+	struct work_struct	work;
+	struct ibtrs_con	*con;
+	enum csm_ev		ev;
+};
+
+struct sess_sm_work {
+	struct work_struct	work;
+	struct ibtrs_session	*sess;
+	enum ssm_ev		ev;
+};
+
+struct msg_work {
+	struct work_struct	work;
+	struct ibtrs_con	*con;
+	void                    *msg;
+};
+
+static void ibtrs_clt_free_sg_list_distr_stats(struct ibtrs_session *sess)
+{
+	int i;
+
+	for (i = 0; i < num_online_cpus(); i++)
+		kfree(sess->stats.sg_list_distr[i]);
+	kfree(sess->stats.sg_list_distr);
+	sess->stats.sg_list_distr = NULL;
+	kfree(sess->stats.sg_list_total);
+	sess->stats.sg_list_total = NULL;
+}
+
+static void ibtrs_clt_free_cpu_migr_stats(struct ibtrs_session *sess)
+{
+	kfree(sess->stats.cpu_migr.to);
+	sess->stats.cpu_migr.to = NULL;
+	kfree(sess->stats.cpu_migr.from);
+	sess->stats.cpu_migr.from = NULL;
+}
+
+static void ibtrs_clt_free_rdma_lat_stats(struct ibtrs_session *sess)
+{
+	int i;
+
+	for (i = 0; i < num_online_cpus(); i++)
+		kfree(sess->stats.rdma_lat_distr[i]);
+
+	kfree(sess->stats.rdma_lat_distr);
+	sess->stats.rdma_lat_distr = NULL;
+	kfree(sess->stats.rdma_lat_max);
+	sess->stats.rdma_lat_max = NULL;
+}
+
+static void ibtrs_clt_free_wc_comp_stats(struct ibtrs_session *sess)
+{
+	kfree(sess->stats.wc_comp);
+	sess->stats.wc_comp = NULL;
+}
+
+static void ibtrs_clt_free_rdma_stats(struct ibtrs_session *sess)
+{
+	kfree(sess->stats.rdma_stats);
+	sess->stats.rdma_stats = NULL;
+}
+
+static void ibtrs_clt_free_stats(struct ibtrs_session *sess)
+{
+	ibtrs_clt_free_rdma_stats(sess);
+	ibtrs_clt_free_rdma_lat_stats(sess);
+	ibtrs_clt_free_cpu_migr_stats(sess);
+	ibtrs_clt_free_sg_list_distr_stats(sess);
+	ibtrs_clt_free_wc_comp_stats(sess);
+}
+
+static inline int get_sess(struct ibtrs_session *sess)
+{
+	return atomic_inc_not_zero(&sess->refcount);
+}
+
+static void free_con_fast_pool(struct ibtrs_con *con);
+
+static void sess_deinit_cons(struct ibtrs_session *sess)
+{
+	int i;
+
+	for (i = 0; i < CONS_PER_SESSION; i++) {
+		struct ibtrs_con *con = &sess->con[i];
+
+		if (!i)
+			destroy_workqueue(con->cq_wq);
+		else
+			tasklet_kill(&con->cq_tasklet);
+	}
+}
+
+static void put_sess(struct ibtrs_session *sess)
+{
+	if (!atomic_dec_if_positive(&sess->refcount)) {
+		struct completion *destroy_completion;
+
+		destroy_workqueue(sess->sm_wq);
+		sess_deinit_cons(sess);
+		kfree(sess->con);
+		sess->con = NULL;
+		ibtrs_clt_free_stats(sess);
+		destroy_completion = sess->destroy_completion;
+		mutex_lock(&sess_mutex);
+		list_del(&sess->list);
+		mutex_unlock(&sess_mutex);
+		INFO(sess, "Session is disconnected\n");
+		kfree(sess);
+		if (destroy_completion)
+			complete_all(destroy_completion);
+	}
+}
+
+inline int ibtrs_clt_get_user_queue_depth(struct ibtrs_session *sess)
+{
+	return sess->user_queue_depth;
+}
+
+inline int ibtrs_clt_set_user_queue_depth(struct ibtrs_session *sess,
+					  u16 queue_depth)
+{
+	if (queue_depth < 1 ||
+	    queue_depth > sess->queue_depth) {
+		ERR(sess, "Queue depth %u is out of range (1 - %u)\n",
+		    queue_depth,
+		    sess->queue_depth);
+		return -EINVAL;
+	}
+
+	sess->user_queue_depth = queue_depth;
+	return 0;
+}
+
+static void csm_resolving_addr(struct ibtrs_con *con, enum csm_ev ev);
+static void csm_resolving_route(struct ibtrs_con *con, enum csm_ev ev);
+static void csm_connecting(struct ibtrs_con *con, enum csm_ev ev);
+static void csm_connected(struct ibtrs_con *con, enum csm_ev ev);
+static void csm_flushing(struct ibtrs_con *con, enum csm_ev ev);
+static void csm_closing(struct ibtrs_con *con, enum csm_ev ev);
+
+static int init_con(struct ibtrs_session *sess, struct ibtrs_con *con,
+		    short cpu, bool user);
+/* ignore all events for safety */
+static void csm_closed(struct ibtrs_con *con, enum csm_ev ev)
+{
+}
+
+typedef void (ibtrs_clt_csm_ev_handler_fn)(struct ibtrs_con *, enum csm_ev);
+
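+/* Per-state event handlers of the connection state machine (CSM);
+ * csm_trigger_event() dispatches on con->state.
+ */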
+static ibtrs_clt_csm_ev_handler_fn *ibtrs_clt_csm_ev_handlers[] = {
+	[CSM_STATE_RESOLVING_ADDR]	= csm_resolving_addr,
+	[CSM_STATE_RESOLVING_ROUTE]	= csm_resolving_route,
+	[CSM_STATE_CONNECTING]		= csm_connecting,
+	[CSM_STATE_CONNECTED]		= csm_connected,
+	[CSM_STATE_CLOSING]		= csm_closing,
+	[CSM_STATE_FLUSHING]		= csm_flushing,
+	[CSM_STATE_CLOSED]		= csm_closed
+};
+
+static void csm_trigger_event(struct work_struct *work)
+{
+	struct con_sm_work *w;
+	struct ibtrs_con *con;
+	enum csm_ev ev;
+
+	w = container_of(work, struct con_sm_work, work);
+	con = w->con;
+	ev = w->ev;
+	kvfree(w);
+
+	if (WARN_ON_ONCE(con->state <= _CSM_STATE_MIN ||
+			 con->state >= _CSM_STATE_MAX)) {
+		WRN(con->sess, "Connection state is out of range\n");
+		return;
+	}
+
+	ibtrs_clt_csm_ev_handlers[con->state](con, ev);
+}
+
+static void csm_set_state(struct ibtrs_con *con, enum csm_state s)
+{
+	if (WARN(s <= _CSM_STATE_MIN || s >= _CSM_STATE_MAX,
+		 "Unknown CSM state %d\n", s))
+		return;
+	smp_wmb(); /* fence con->state change */
+	if (con->state != s) {
+		DEB("changing con %p csm state from %s to %s\n", con,
+		    csm_state_str(con->state), csm_state_str(s));
+		con->state = s;
+	}
+}
+
+inline bool ibtrs_clt_sess_is_connected(const struct ibtrs_session *sess)
+{
+	return sess->state == SSM_STATE_CONNECTED;
+}
+
+static void ssm_idle(struct ibtrs_session *sess, enum ssm_ev ev);
+static void ssm_idle_reconnect(struct ibtrs_session *sess, enum ssm_ev ev);
+static void ssm_open(struct ibtrs_session *sess, enum ssm_ev ev);
+static void ssm_open_reconnect(struct ibtrs_session *sess, enum ssm_ev ev);
+static void ssm_connected(struct ibtrs_session *sess, enum ssm_ev ev);
+static void ssm_reconnect(struct ibtrs_session *sess, enum ssm_ev ev);
+static void ssm_close_destroy(struct ibtrs_session *sess, enum ssm_ev ev);
+static void ssm_close_reconnect(struct ibtrs_session *sess, enum ssm_ev ev);
+static void ssm_close_reconnect_imm(struct ibtrs_session *sess, enum ssm_ev ev);
+static void ssm_disconnected(struct ibtrs_session *sess, enum ssm_ev ev);
+static void ssm_destroyed(struct ibtrs_session *sess, enum ssm_ev ev);
+static void ssm_wf_info(struct ibtrs_session *sess, enum ssm_ev ev);
+static void ssm_wf_info_reconnect(struct ibtrs_session *sess, enum ssm_ev ev);
+
+typedef void (ibtrs_clt_ssm_ev_handler_fn)(struct ibtrs_session *, enum ssm_ev);
+
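+/* Per-state event handlers of the session state machine (SSM);
+ * ssm_trigger_event() dispatches on sess->state.
+ */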
+static ibtrs_clt_ssm_ev_handler_fn *ibtrs_clt_ev_handlers[] = {
+	[SSM_STATE_IDLE]		= ssm_idle,
+	[SSM_STATE_IDLE_RECONNECT]	= ssm_idle_reconnect,
+	[SSM_STATE_WF_INFO]		= ssm_wf_info,
+	[SSM_STATE_WF_INFO_RECONNECT]	= ssm_wf_info_reconnect,
+	[SSM_STATE_OPEN]		= ssm_open,
+	[SSM_STATE_OPEN_RECONNECT]	= ssm_open_reconnect,
+	[SSM_STATE_CONNECTED]		= ssm_connected,
+	[SSM_STATE_RECONNECT]		= ssm_reconnect,
+	[SSM_STATE_CLOSE_DESTROY]	= ssm_close_destroy,
+	[SSM_STATE_CLOSE_RECONNECT]	= ssm_close_reconnect,
+	[SSM_STATE_CLOSE_RECONNECT_IMM]	= ssm_close_reconnect_imm,
+	[SSM_STATE_DISCONNECTED]	= ssm_disconnected,
+	[SSM_STATE_DESTROYED]		= ssm_destroyed,
+};
+
+typedef int (ibtrs_clt_ssm_state_init_fn)(struct ibtrs_session *);
+static ibtrs_clt_ssm_state_init_fn	ssm_open_init;
+static ibtrs_clt_ssm_state_init_fn	ssm_close_destroy_init;
+static ibtrs_clt_ssm_state_init_fn	ssm_destroyed_init;
+static ibtrs_clt_ssm_state_init_fn	ssm_connected_init;
+static ibtrs_clt_ssm_state_init_fn	ssm_reconnect_init;
+static ibtrs_clt_ssm_state_init_fn	ssm_idle_reconnect_init;
+static ibtrs_clt_ssm_state_init_fn	ssm_disconnected_init;
+static ibtrs_clt_ssm_state_init_fn	ssm_wf_info_init;
+
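+/* Optional per-state entry hooks invoked by ssm_init_state() when the
+ * session switches into the corresponding state.
+ */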
+static ibtrs_clt_ssm_state_init_fn *ibtrs_clt_ssm_state_init[] = {
+	[SSM_STATE_IDLE]		= NULL,
+	[SSM_STATE_IDLE_RECONNECT]	= ssm_idle_reconnect_init,
+	[SSM_STATE_WF_INFO]		= ssm_wf_info_init,
+	[SSM_STATE_WF_INFO_RECONNECT]	= ssm_wf_info_init,
+	[SSM_STATE_OPEN]		= ssm_open_init,
+	[SSM_STATE_OPEN_RECONNECT]	= ssm_open_init,
+	[SSM_STATE_CONNECTED]		= ssm_connected_init,
+	[SSM_STATE_RECONNECT]		= ssm_reconnect_init,
+	[SSM_STATE_CLOSE_DESTROY]	= ssm_close_destroy_init,
+	[SSM_STATE_CLOSE_RECONNECT]	= ssm_close_destroy_init,
+	[SSM_STATE_CLOSE_RECONNECT_IMM]	= ssm_close_destroy_init,
+	[SSM_STATE_DISCONNECTED]	= ssm_disconnected_init,
+	[SSM_STATE_DESTROYED]		= ssm_destroyed_init,
+};
+
+static int ssm_init_state(struct ibtrs_session *sess, enum ssm_state state)
+{
+	int err;
+
+	if (WARN(state <= _SSM_STATE_MIN || state >= _SSM_STATE_MAX,
+		 "Unknown SSM state %d\n", state))
+		return -EINVAL;
+
+	smp_rmb(); /* fence sess->state change */
+	if (sess->state == state)
+		return 0;
+
+	/* Call the init function of the new state only if:
+	 * - it is defined
+	 *   and
+	 * - it is different from the init function of the current state
+	 */
+	if (ibtrs_clt_ssm_state_init[state] &&
+	    ibtrs_clt_ssm_state_init[state] !=
+	    ibtrs_clt_ssm_state_init[sess->state]) {
+		err = ibtrs_clt_ssm_state_init[state](sess);
+		if (err) {
+			ERR(sess, "Failed to init ssm state %s from %s: %d\n",
+			    ssm_state_str(state), ssm_state_str(sess->state),
+			    err);
+			return err;
+		}
+	}
+
+	DEB("changing sess %p ssm state from %s to %s\n", sess,
+	    ssm_state_str(sess->state), ssm_state_str(state));
+
+	smp_wmb(); /* fence sess->state change */
+	sess->state = state;
+
+	return 0;
+}
+
+static void ssm_trigger_event(struct work_struct *work)
+{
+	struct sess_sm_work *w;
+	struct ibtrs_session *sess;
+	enum ssm_ev ev;
+
+	w = container_of(work, struct sess_sm_work, work);
+	sess = w->sess;
+	ev = w->ev;
+	kvfree(w);
+
+	if (WARN_ON_ONCE(sess->state <= _SSM_STATE_MIN || sess->state >=
+			 _SSM_STATE_MAX)) {
+		WRN(sess, "Session state is out of range\n");
+		return;
+	}
+
+	ibtrs_clt_ev_handlers[sess->state](sess, ev);
+}
+
+static void csm_schedule_event(struct ibtrs_con *con, enum csm_ev ev)
+{
+	struct con_sm_work *w = NULL;
+
+	if (in_softirq()) {
+		w = kmalloc(sizeof(*w), GFP_ATOMIC);
+		BUG_ON(!w);
+		goto out;
+	}
+	while (!w) {
+		w = ibtrs_malloc(sizeof(*w));
+		if (!w)
+			cond_resched();
+	}
+out:
+	w->con = con;
+	w->ev = ev;
+	INIT_WORK(&w->work, csm_trigger_event);
+	WARN_ON(!queue_work_on(0, con->sess->sm_wq, &w->work));
+}
+
+static void ssm_schedule_event(struct ibtrs_session *sess, enum ssm_ev ev)
+{
+	struct sess_sm_work *w = NULL;
+
+	while (!w) {
+		w = ibtrs_malloc(sizeof(*w));
+		if (!w)
+			cond_resched();
+	}
+
+	w->sess = sess;
+	w->ev = ev;
+	INIT_WORK(&w->work, ssm_trigger_event);
+	WARN_ON(!queue_work_on(0, sess->sm_wq, &w->work));
+}
+
+static inline bool clt_ops_are_valid(const struct ibtrs_clt_ops *ops)
+{
+	return ops && ops->rdma_ev && ops->sess_ev && ops->recv;
+}
+
+/**
+ * struct ibtrs_fr_desc - fast registration work request arguments
+ * @entry: Entry in ibtrs_fr_pool.free_list.
+ * @mr:    Memory region.
+ */
+struct ibtrs_fr_desc {
+	struct list_head		entry;
+	struct ib_mr			*mr;
+};
+
+/**
+ * struct ibtrs_fr_pool - pool of fast registration descriptors
+ *
+ * An entry is available for allocation if and only if it occurs in @free_list.
+ *
+ * @size:      Number of descriptors in this pool.
+ * @max_page_list_len: Maximum fast registration work request page list length.
+ * @lock:      Protects free_list.
+ * @free_list: List of free descriptors.
+ * @desc:      Fast registration descriptor pool.
+ */
+struct ibtrs_fr_pool {
+	int			size;
+	int			max_page_list_len;
+	/* lock for free_list */
+	spinlock_t		lock ____cacheline_aligned;
+	struct list_head	free_list;
+	struct ibtrs_fr_desc	desc[0];
+};
+
+/**
+ * struct ibtrs_map_state - per-request DMA memory mapping state
+ * @desc:	    Pointer to the element of the SRP buffer descriptor array
+ *		    that is being filled in.
+ * @pages:	    Array with DMA addresses of pages being considered for
+ *		    memory registration.
+ * @base_dma_addr:  DMA address of the first page that has not yet been mapped.
+ * @dma_len:	    Number of bytes that will be registered with the next
+ *		    FMR or FR memory registration call.
+ * @total_len:	    Total number of bytes in the sg-list being mapped.
+ * @npages:	    Number of page addresses in the pages[] array.
+ * @nmdesc:	    Number of FMR or FR memory descriptors used for mapping.
+ * @ndesc:	    Number of buffer descriptors that have been filled in.
+ */
+struct ibtrs_map_state {
+	union {
+		struct ib_pool_fmr	**next_fmr;
+		struct ibtrs_fr_desc	**next_fr;
+	};
+	struct ibtrs_sg_desc	*desc;
+	union {
+		u64			*pages;
+		struct scatterlist      *sg;
+	};
+	dma_addr_t		base_dma_addr;
+	u32			dma_len;
+	u32			total_len;
+	u32			npages;
+	u32			nmdesc;
+	u32			ndesc;
+	enum dma_data_direction dir;
+};
+
+static void free_io_bufs(struct ibtrs_session *sess);
+
+static int process_open_rsp(struct ibtrs_con *con, const void *resp)
+{
+	int i;
+	const struct ibtrs_msg_sess_open_resp *msg = resp;
+	struct ibtrs_session *sess = con->sess;
+	u32 chunk_size;
+
+	rcu_read_lock();
+	smp_rmb(); /* fence con->state check */
+	if (unlikely(con->state != CSM_STATE_CONNECTED)) {
+		rcu_read_unlock();
+		INFO(sess, "Process open response failed, disconnected."
+		     " Connection state is %s, Session state is %s\n",
+		     csm_state_str(con->state),
+		     ssm_state_str(sess->state));
+		return -ECOMM;
+	}
+	rcu_read_unlock();
+
+	chunk_size = msg->max_io_size + msg->max_req_size;
+	/* check if IB immediate data size is enough to hold the mem_id and the
+	 * offset inside the memory chunk
+	 */
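+	/* Example (hypothetical values): with cnt = 512 buffers and
+	 * chunk_size = 128 KiB the check below computes
+	 * ilog2(511) + ilog2(131071) = 8 + 16 = 24 bits.
+	 */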
+	if (ilog2(msg->cnt - 1) + ilog2(chunk_size - 1) >
+		IB_IMM_SIZE_BITS) {
+		ERR(sess, "RDMA immediate size (%db) not enough to encode "
+		    "%d buffers of size %dB\n", IB_IMM_SIZE_BITS, msg->cnt,
+		    chunk_size);
+		return -EINVAL;
+	}
+
+	strlcpy(sess->hostname, msg->hostname, sizeof(sess->hostname));
+	sess->srv_rdma_buf_rkey = msg->rkey;
+	sess->user_queue_depth = msg->max_inflight_msg;
+	sess->max_io_size = msg->max_io_size;
+	sess->max_req_size = msg->max_req_size;
+	sess->chunk_size = chunk_size;
+	sess->max_desc = (msg->max_req_size - IBTRS_HDR_LEN - sizeof(u32)
+			  - sizeof(u32) - IO_MSG_SIZE) / IBTRS_SG_DESC_LEN;
+	sess->ver = min_t(u8, msg->ver, IBTRS_VERSION);
+
+	/* if the server changed the queue_depth between the reconnect,
+	 * we need to reallocate all buffers that depend on it
+	 */
+	if (sess->queue_depth &&
+	    sess->queue_depth != msg->max_inflight_msg) {
+		free_io_bufs(sess);
+		kfree(sess->srv_rdma_addr);
+		sess->srv_rdma_addr = NULL;
+	}
+
+	sess->queue_depth = msg->max_inflight_msg;
+	if (!sess->srv_rdma_addr) {
+		sess->srv_rdma_addr = kcalloc(sess->queue_depth,
+					      sizeof(*sess->srv_rdma_addr),
+					      GFP_KERNEL);
+		if (!sess->srv_rdma_addr) {
+			ERR(sess, "Failed to allocate memory for server RDMA"
+			    " addresses\n");
+			return -ENOMEM;
+		}
+	}
+
+	for (i = 0; i < msg->cnt; i++) {
+		sess->srv_rdma_addr[i] = msg->addr[i];
+		DEB("Adding contiguous buffer %d, size %u, addr: 0x%p,"
+		    " rkey: 0x%x\n", i, sess->chunk_size,
+		    (void *)sess->srv_rdma_addr[i],
+		    sess->srv_rdma_buf_rkey);
+	}
+
+	return 0;
+}
+
+static int wait_for_ssm_state(struct ibtrs_session *sess, enum ssm_state state)
+{
+	DEB("Waiting for state %s...\n", ssm_state_str(state));
+	wait_event(sess->wait_q, sess->state >= state);
+
+	if (unlikely(sess->state != state)) {
+		ERR(sess,
+		    "Waited for session state '%s', but state is '%s'\n",
+		    ssm_state_str(state), ssm_state_str(sess->state));
+		return -EHOSTUNREACH;
+	}
+
+	return 0;
+}
+
+static inline struct ibtrs_tag *__ibtrs_get_tag(struct ibtrs_session *sess,
+						int cpu_id)
+{
+	size_t max_depth = sess->user_queue_depth;
+	struct ibtrs_tag *tag;
+	int cpu, bit;
+
+	cpu = get_cpu();
+	do {
+		bit = find_first_zero_bit(sess->tags_map, max_depth);
+		if (unlikely(bit >= max_depth)) {
+			put_cpu();
+			return NULL;
+		}
+
+	} while (unlikely(test_and_set_bit_lock(bit, sess->tags_map)));
+	put_cpu();
+
+	tag = GET_TAG(sess, bit);
+	WARN_ON(tag->mem_id != bit);
+	tag->cpu_id = (cpu_id != -1 ? cpu_id : cpu);
+
+	return tag;
+}
+
+static inline void __ibtrs_put_tag(struct ibtrs_session *sess,
+				   struct ibtrs_tag *tag)
+{
+	clear_bit_unlock(tag->mem_id, sess->tags_map);
+}
+
+struct ibtrs_tag *ibtrs_get_tag(struct ibtrs_session *sess, int cpu_id,
+				size_t nr_bytes, int can_wait)
+{
+	struct ibtrs_tag *tag;
+	DEFINE_WAIT(wait);
+
+	/* Is not used for now */
+	(void)nr_bytes;
+
+	tag = __ibtrs_get_tag(sess, cpu_id);
+	if (likely(tag) || !can_wait)
+		return tag;
+
+	do {
+		prepare_to_wait(&sess->tags_wait, &wait, TASK_UNINTERRUPTIBLE);
+		tag = __ibtrs_get_tag(sess, cpu_id);
+		if (likely(tag))
+			break;
+
+		io_schedule();
+	} while (1);
+
+	finish_wait(&sess->tags_wait, &wait);
+
+	return tag;
+}
+EXPORT_SYMBOL(ibtrs_get_tag);
+
+void ibtrs_put_tag(struct ibtrs_session *sess, struct ibtrs_tag *tag)
+{
+	if (WARN_ON(tag->mem_id >= sess->queue_depth))
+		return;
+	if (WARN_ON(!test_bit(tag->mem_id, sess->tags_map)))
+		return;
+
+	__ibtrs_put_tag(sess, tag);
+
+	/* Putting a tag is a barrier, so we will observe
+	 * the new entry in the wait list.
+	 */
+	if (waitqueue_active(&sess->tags_wait))
+		wake_up(&sess->tags_wait);
+}
+EXPORT_SYMBOL(ibtrs_put_tag);
+
+static void put_u_msg_iu(struct ibtrs_session *sess, struct ibtrs_iu *iu)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&sess->u_msg_ius_lock, flags);
+	ibtrs_iu_put(&sess->u_msg_ius_list, iu);
+	spin_unlock_irqrestore(&sess->u_msg_ius_lock, flags);
+}
+
+static struct ibtrs_iu *get_u_msg_iu(struct ibtrs_session *sess)
+{
+	struct ibtrs_iu *iu;
+	unsigned long flags;
+
+	spin_lock_irqsave(&sess->u_msg_ius_lock, flags);
+	iu = ibtrs_iu_get(&sess->u_msg_ius_list);
+	spin_unlock_irqrestore(&sess->u_msg_ius_lock, flags);
+
+	return iu;
+}
+
+/**
+ * ibtrs_destroy_fr_pool() - free the resources owned by a pool
+ * @pool: Fast registration pool to be destroyed.
+ */
+static void ibtrs_destroy_fr_pool(struct ibtrs_fr_pool *pool)
+{
+	int i;
+	struct ibtrs_fr_desc *d;
+	int ret;
+
+	if (!pool)
+		return;
+
+	for (i = 0, d = &pool->desc[0]; i < pool->size; i++, d++) {
+		if (d->mr) {
+			ret = ib_dereg_mr(d->mr);
+			if (ret)
+				ERR_NP("Failed to deregister memory region,"
+				       " errno: %d\n", ret);
+		}
+	}
+	kfree(pool);
+}
+
+/**
+ * ibtrs_create_fr_pool() - allocate and initialize a pool for fast registration
+ * @device:            IB device to allocate fast registration descriptors for.
+ * @pd:                Protection domain associated with the FR descriptors.
+ * @pool_size:         Number of descriptors to allocate.
+ * @max_page_list_len: Maximum fast registration work request page list length.
+ */
+static struct ibtrs_fr_pool *ibtrs_create_fr_pool(struct ib_device *device,
+						  struct ib_pd *pd,
+						  int pool_size,
+						  int max_page_list_len)
+{
+	struct ibtrs_fr_pool *pool;
+	struct ibtrs_fr_desc *d;
+	struct ib_mr *mr;
+	int i, ret;
+
+	if (pool_size <= 0) {
+		WRN_NP("Creating fr pool failed, invalid pool size %d\n",
+		       pool_size);
+		ret = -EINVAL;
+		goto err;
+	}
+
+	pool = kzalloc(sizeof(*pool) + pool_size * sizeof(*d), GFP_KERNEL);
+	if (!pool) {
+		ret = -ENOMEM;
+		goto err;
+	}
+
+	pool->size = pool_size;
+	pool->max_page_list_len = max_page_list_len;
+	spin_lock_init(&pool->lock);
+	INIT_LIST_HEAD(&pool->free_list);
+
+	for (i = 0, d = &pool->desc[0]; i < pool->size; i++, d++) {
+		mr = ib_alloc_mr(pd, IB_MR_TYPE_MEM_REG, max_page_list_len);
+		if (IS_ERR(mr)) {
+			WRN_NP("Failed to allocate fast region memory\n");
+			ret = PTR_ERR(mr);
+			goto destroy_pool;
+		}
+		d->mr = mr;
+		list_add_tail(&d->entry, &pool->free_list);
+	}
+
+	return pool;
+
+destroy_pool:
+	ibtrs_destroy_fr_pool(pool);
+err:
+	return ERR_PTR(ret);
+}
+
+/**
+ * ibtrs_fr_pool_get() - obtain a descriptor suitable for fast registration
+ * @pool: Pool to obtain descriptor from.
+ */
+static struct ibtrs_fr_desc *ibtrs_fr_pool_get(struct ibtrs_fr_pool *pool)
+{
+	struct ibtrs_fr_desc *d = NULL;
+
+	spin_lock_bh(&pool->lock);
+	if (!list_empty(&pool->free_list)) {
+		d = list_first_entry(&pool->free_list, typeof(*d), entry);
+		list_del(&d->entry);
+	}
+	spin_unlock_bh(&pool->lock);
+
+	return d;
+}
+
+/**
+ * ibtrs_fr_pool_put() - put an FR descriptor back in the free list
+ * @pool: Pool the descriptor was allocated from.
+ * @desc: Pointer to an array of fast registration descriptor pointers.
+ * @n:    Number of descriptors to put back.
+ *
+ * Note: The caller must already have queued an invalidation request for
+ * desc->mr->rkey before calling this function.
+ */
+static void ibtrs_fr_pool_put(struct ibtrs_fr_pool *pool,
+			      struct ibtrs_fr_desc **desc, int n)
+{
+	int i;
+
+	spin_lock_bh(&pool->lock);
+	for (i = 0; i < n; i++)
+		list_add(&desc[i]->entry, &pool->free_list);
+	spin_unlock_bh(&pool->lock);
+}
+
+static inline struct ibtrs_fr_pool *alloc_fr_pool(struct ibtrs_session *sess)
+{
+	return ibtrs_create_fr_pool(sess->ib_device, sess->ib_sess.pd,
+				    sess->queue_depth,
+				    sess->max_pages_per_mr);
+}
+
+static void ibtrs_map_desc(struct ibtrs_map_state *state, dma_addr_t dma_addr,
+			   u32 dma_len, u32 rkey, u32 max_desc)
+{
+	struct ibtrs_sg_desc *desc = state->desc;
+
+	DEB("dma_addr %llu, key %u, dma_len %u\n", dma_addr, rkey, dma_len);
+	desc->addr	= dma_addr;
+	desc->key	= rkey;
+	desc->len	= dma_len;
+
+	state->total_len += dma_len;
+	if (state->ndesc < max_desc) {
+		state->desc++;
+		state->ndesc++;
+	} else {
+		state->ndesc = INT_MIN;
+		ERR_NP("Could not fit S/G list into buffer descriptor %d.\n",
+		       max_desc);
+	}
+}
+
+static int ibtrs_map_finish_fmr(struct ibtrs_map_state *state,
+				struct ibtrs_con *con)
+{
+	struct ib_pool_fmr *fmr;
+	u64 io_addr = 0;
+	dma_addr_t dma_addr;
+
+	fmr = ib_fmr_pool_map_phys(con->sess->fmr_pool, state->pages,
+				   state->npages, io_addr);
+	if (IS_ERR(fmr)) {
+		WRN_RL(con->sess, "Failed to map FMR from FMR pool, "
+		       "errno: %ld\n", PTR_ERR(fmr));
+		return PTR_ERR(fmr);
+	}
+
+	*state->next_fmr++ = fmr;
+	state->nmdesc++;
+	dma_addr = state->base_dma_addr & ~con->sess->mr_page_mask;
+	DEB("ndesc = %d, nmdesc = %d, npages = %d\n",
+	    state->ndesc, state->nmdesc, state->npages);
+	if (state->dir == DMA_TO_DEVICE)
+		ibtrs_map_desc(state, dma_addr, state->dma_len, fmr->fmr->lkey,
+			       con->sess->max_desc);
+	else
+		ibtrs_map_desc(state, dma_addr, state->dma_len, fmr->fmr->rkey,
+			       con->sess->max_desc);
+
+	return 0;
+}
+
+static int ibtrs_map_finish_fr(struct ibtrs_map_state *state,
+			       struct ibtrs_con *con, int sg_cnt,
+			       unsigned int *sg_offset_p)
+{
+	struct ib_send_wr *bad_wr;
+	struct ib_reg_wr wr;
+	struct ibtrs_fr_desc *desc;
+	struct ib_pd *pd = con->sess->ib_sess.pd;
+	u32 rkey;
+	int n;
+
+	if (sg_cnt == 1 && (pd->flags & IB_PD_UNSAFE_GLOBAL_RKEY)) {
+		unsigned int sg_offset = sg_offset_p ? *sg_offset_p : 0;
+
+		ibtrs_map_desc(state, sg_dma_address(state->sg) + sg_offset,
+			     sg_dma_len(state->sg) - sg_offset,
+			     pd->unsafe_global_rkey, con->sess->max_desc);
+		if (sg_offset_p)
+			*sg_offset_p = 0;
+		return 1;
+	}
+
+	desc = ibtrs_fr_pool_get(con->fr_pool);
+	if (!desc) {
+		WRN_RL(con->sess, "Failed to get descriptor from FR pool\n");
+		return -ENOMEM;
+	}
+
+	rkey = ib_inc_rkey(desc->mr->rkey);
+	ib_update_fast_reg_key(desc->mr, rkey);
+
+	memset(&wr, 0, sizeof(wr));
+	n = ib_map_mr_sg(desc->mr, state->sg, sg_cnt, sg_offset_p,
+			 con->sess->mr_page_size);
+	if (unlikely(n < 0)) {
+		ibtrs_fr_pool_put(con->fr_pool, &desc, 1);
+		return n;
+	}
+
+	wr.wr.next = NULL;
+	wr.wr.opcode = IB_WR_REG_MR;
+	wr.wr.wr_id = FAST_REG_WR_ID_MASK;
+	wr.wr.num_sge = 0;
+	wr.wr.send_flags = 0;
+	wr.mr = desc->mr;
+	wr.key = desc->mr->rkey;
+	wr.access = (IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE);
+
+	*state->next_fr++ = desc;
+	state->nmdesc++;
+
+	ibtrs_map_desc(state, state->base_dma_addr, state->dma_len,
+		       desc->mr->rkey, con->sess->max_desc);
+
+	return ib_post_send(con->ib_con.qp, &wr.wr, &bad_wr);
+}
+
+static int ibtrs_finish_fmr_mapping(struct ibtrs_map_state *state,
+				    struct ibtrs_con *con)
+{
+	int ret = 0;
+	struct ib_pd *pd = con->sess->ib_sess.pd;
+
+	if (state->npages == 0)
+		return 0;
+
+	if (state->npages == 1 && (pd->flags & IB_PD_UNSAFE_GLOBAL_RKEY))
+		ibtrs_map_desc(state, state->base_dma_addr, state->dma_len,
+			       pd->unsafe_global_rkey,
+			       con->sess->max_desc);
+	else
+		ret = ibtrs_map_finish_fmr(state, con);
+
+	if (ret == 0) {
+		state->npages = 0;
+		state->dma_len = 0;
+	}
+
+	return ret;
+}
+
+static int ibtrs_map_sg_entry(struct ibtrs_map_state *state,
+			      struct ibtrs_con *con, struct scatterlist *sg,
+			      int sg_count)
+{
+	struct ib_device *ibdev = con->sess->ib_device;
+	dma_addr_t dma_addr = ib_sg_dma_address(ibdev, sg);
+	unsigned int dma_len = ib_sg_dma_len(ibdev, sg);
+	unsigned int len;
+	int ret;
+
+	if (!dma_len)
+		return 0;
+
+	while (dma_len) {
+		unsigned offset = dma_addr & ~con->sess->mr_page_mask;
+
+		if (state->npages == con->sess->max_pages_per_mr ||
+		    offset != 0) {
+			ret = ibtrs_finish_fmr_mapping(state, con);
+			if (ret)
+				return ret;
+		}
+
+		len = min_t(unsigned int, dma_len,
+			    con->sess->mr_page_size - offset);
+
+		if (!state->npages)
+			state->base_dma_addr = dma_addr;
+		state->pages[state->npages++] =
+			dma_addr & con->sess->mr_page_mask;
+		state->dma_len += len;
+		dma_addr += len;
+		dma_len -= len;
+	}
+
+	/*
+	 * If the last entry of the MR wasn't a full page, then we need to
+	 * close it out and start a new one -- we can only merge at page
+	 * boundaries.
+	 */
+	ret = 0;
+	if (len != con->sess->mr_page_size)
+		ret = ibtrs_finish_fmr_mapping(state, con);
+	return ret;
+}
+
+static int ibtrs_map_fr(struct ibtrs_map_state *state, struct ibtrs_con *con,
+			struct scatterlist *sg, int sg_count)
+{
+	unsigned int sg_offset = 0;
+
+	state->sg = sg;
+
+	while (sg_count) {
+		int i, n;
+
+		n = ibtrs_map_finish_fr(state, con, sg_count, &sg_offset);
+		if (unlikely(n < 0))
+			return n;
+
+		sg_count -= n;
+		for (i = 0; i < n; i++)
+			state->sg = sg_next(state->sg);
+	}
+
+	return 0;
+}
+
+static int ibtrs_map_fmr(struct ibtrs_map_state *state, struct ibtrs_con *con,
+			 struct scatterlist *sg_first_entry,
+			 int sg_first_entry_index, int sg_count)
+{
+	int i, ret;
+	struct scatterlist *sg;
+
+	for (i = sg_first_entry_index, sg = sg_first_entry; i < sg_count;
+	     i++, sg = sg_next(sg)) {
+		ret = ibtrs_map_sg_entry(state, con, sg, sg_count);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+
+static int ibtrs_map_sg(struct ibtrs_map_state *state, struct ibtrs_con *con,
+			struct rdma_req *req)
+{
+	int ret = 0;
+
+	state->pages = req->map_page;
+	if (con->sess->fast_reg_mode == IBTRS_FAST_MEM_FR) {
+		state->next_fr = req->fr_list;
+		ret = ibtrs_map_fr(state, con, req->sglist, req->sg_cnt);
+		if (ret)
+			goto out;
+	} else if (con->sess->fast_reg_mode == IBTRS_FAST_MEM_FMR) {
+		state->next_fmr = req->fmr_list;
+		ret = ibtrs_map_fmr(state, con, req->sglist, 0,
+				    req->sg_cnt);
+		if (ret)
+			goto out;
+		ret = ibtrs_finish_fmr_mapping(state, con);
+		if (ret)
+			goto out;
+	}
+
+out:
+	req->nmdesc = state->nmdesc;
+	return ret;
+}
+
+static int ibtrs_inv_rkey(struct ibtrs_con *con, u32 rkey)
+{
+	struct ib_send_wr *bad_wr;
+	struct ib_send_wr wr = {
+		.opcode		    = IB_WR_LOCAL_INV,
+		.wr_id		    = LOCAL_INV_WR_ID_MASK,
+		.next		    = NULL,
+		.num_sge	    = 0,
+		.send_flags	    = 0,
+		.ex.invalidate_rkey = rkey,
+	};
+
+	return ib_post_send(con->ib_con.qp, &wr, &bad_wr);
+}
+
+static void ibtrs_unmap_fast_reg_data(struct ibtrs_con *con,
+				      struct rdma_req *req)
+{
+	int i, ret;
+
+	if (con->sess->fast_reg_mode == IBTRS_FAST_MEM_FR) {
+		struct ibtrs_fr_desc **pfr;
+
+		for (i = req->nmdesc, pfr = req->fr_list; i > 0; i--, pfr++) {
+			ret = ibtrs_inv_rkey(con, (*pfr)->mr->rkey);
+			if (ret < 0) {
+				ERR(con->sess,
+				    "Invalidating registered RDMA memory for"
+				    " rkey %#x failed, errno: %d\n",
+				    (*pfr)->mr->rkey, ret);
+			}
+		}
+		if (req->nmdesc)
+			ibtrs_fr_pool_put(con->fr_pool, req->fr_list,
+					  req->nmdesc);
+	} else {
+		struct ib_pool_fmr **pfmr;
+
+		for (i = req->nmdesc, pfmr = req->fmr_list; i > 0; i--, pfmr++)
+			ib_fmr_pool_unmap(*pfmr);
+	}
+	req->nmdesc = 0;
+}
+
+/*
+ * We have more scatter/gather entries, so use fast_reg_map
+ * trying to merge as many entries as we can.
+ */
+static int ibtrs_fast_reg_map_data(struct ibtrs_con *con,
+				   struct ibtrs_sg_desc *desc,
+				   struct rdma_req *req)
+{
+	struct ibtrs_map_state state;
+	int ret;
+
+	memset(&state, 0, sizeof(state));
+	state.desc	= desc;
+	state.dir	= req->dir;
+	ret = ibtrs_map_sg(&state, con, req);
+
+	if (unlikely(ret))
+		goto unmap;
+
+	if (unlikely(state.ndesc <= 0)) {
+		ERR(con->sess,
+		    "Could not fit S/G list into buffer descriptor %d\n",
+		    state.ndesc);
+		ret = -EIO;
+		goto unmap;
+	}
+
+	return state.ndesc;
+unmap:
+	ibtrs_unmap_fast_reg_data(con, req);
+	return ret;
+}
+
+static int ibtrs_post_send_rdma(struct ibtrs_con *con, struct rdma_req *req,
+				u64 addr, u32 off, u32 imm)
+{
+	struct ib_sge list[1];
+	u32 cnt = atomic_inc_return(&con->io_cnt);
+
+	DEB("called, imm: %x\n", imm);
+	if (unlikely(!req->sg_size)) {
+		WRN(con->sess, "Doing RDMA Write failed, no data supplied\n");
+		return -EINVAL;
+	}
+
+	/* user data and user message in the first list element */
+	list[0].addr   = req->iu->dma_addr;
+	list[0].length = req->sg_size;
+	list[0].lkey   = con->sess->ib_sess.pd->local_dma_lkey;
+
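+	/* Request a signalled completion only for every queue_depth-th WR
+	 * to keep completion processing overhead low.
+	 */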
+	return ib_post_rdma_write_imm(con->ib_con.qp, list, 1,
+				      con->sess->srv_rdma_buf_rkey,
+				      addr + off, (u64)req->iu, imm,
+				      cnt % (con->sess->queue_depth) ?
+				      0 : IB_SEND_SIGNALED);
+}
+
+static void ibtrs_set_sge_with_desc(struct ib_sge *list,
+				    struct ibtrs_sg_desc *desc)
+{
+	list->addr   = desc->addr;
+	list->length = desc->len;
+	list->lkey   = desc->key;
+	DEB("dma_addr %llu, key %u, dma_len %u\n",
+	    desc->addr, desc->key, desc->len);
+}
+
+static void ibtrs_set_rdma_desc_last(struct ibtrs_con *con, struct ib_sge *list,
+				     struct rdma_req *req,
+				     struct ib_rdma_wr *wr, int offset,
+				     struct ibtrs_sg_desc *desc, int m,
+				     int n, u64 addr, u32 size, u32 imm)
+{
+	int i;
+	struct ibtrs_session *sess = con->sess;
+	u32 cnt = atomic_inc_return(&con->io_cnt);
+
+	for (i = m; i < n; i++, desc++)
+		ibtrs_set_sge_with_desc(&list[i], desc);
+
+	list[i].addr   = req->iu->dma_addr;
+	list[i].length = size;
+	list[i].lkey   = sess->ib_sess.pd->local_dma_lkey;
+	wr->wr.wr_id = (uintptr_t)req->iu;
+	wr->wr.sg_list = &list[m];
+	wr->wr.num_sge = n - m + 1;
+	wr->remote_addr	= addr + offset;
+	wr->rkey	= sess->srv_rdma_buf_rkey;
+
+	wr->wr.opcode	= IB_WR_RDMA_WRITE_WITH_IMM;
+	wr->wr.send_flags   = cnt % (sess->queue_depth) ? 0 :
+		IB_SEND_SIGNALED;
+	wr->wr.ex.imm_data	= cpu_to_be32(imm);
+}
+
+static int ibtrs_post_send_rdma_desc_more(struct ibtrs_con *con,
+					  struct ib_sge *list,
+					  struct rdma_req *req,
+					  struct ibtrs_sg_desc *desc, int n,
+					  u64 addr, u32 size, u32 imm)
+{
+	int ret;
+	size_t num_sge = 1 + n;
+	struct ibtrs_session *sess = con->sess;
+	int max_sge = sess->max_sge;
+	int num_wr =  DIV_ROUND_UP(num_sge, max_sge);
+	struct ib_send_wr *bad_wr;
+	struct ib_rdma_wr *wrs, *wr;
+	int j = 0, k, offset = 0, len = 0;
+	int m = 0;
+
+	wrs = kcalloc(num_wr, sizeof(*wrs), GFP_ATOMIC);
+	if (!wrs)
+		return -ENOMEM;
+
+	if (num_wr == 1)
+		goto last_one;
+
+	for (; j < num_wr - 1; j++) {
+		wr = &wrs[j];
+		len = 0;
+		for (k = 0; k < max_sge; k++, desc++) {
+			m = k + j * max_sge;
+			ibtrs_set_sge_with_desc(&list[m], desc);
+			len += desc->len;
+		}
+		wr->wr.wr_id = (uintptr_t)req->iu;
+		wr->wr.sg_list = &list[j * max_sge];
+		wr->wr.num_sge = max_sge;
+		wr->remote_addr	= addr + offset;
+		wr->rkey	= sess->srv_rdma_buf_rkey;
+
+		offset += len;
+		wr->wr.next	= &wrs[j + 1].wr;
+		wr->wr.opcode	= IB_WR_RDMA_WRITE;
+	}
+	/* the remaining descriptors plus the IU SGE go into the last WR */
+	m = j * max_sge;
+
+last_one:
+	wr = &wrs[j];
+
+	ibtrs_set_rdma_desc_last(con, list, req, wr, offset, desc, m, n, addr,
+				 size, imm);
+
+	ret = ib_post_send(con->ib_con.qp, &wrs[0].wr, &bad_wr);
+	if (unlikely(ret))
+		ERR(sess, "Posting RDMA-Write-Request to QP failed,"
+		    " errno: %d\n", ret);
+	kfree(wrs);
+	return ret;
+}
+
+static int ibtrs_post_send_rdma_desc(struct ibtrs_con *con,
+				     struct rdma_req *req,
+				     struct ibtrs_sg_desc *desc, int n,
+				     u64 addr, u32 size, u32 imm)
+{
+	size_t num_sge = 1 + n;
+	struct ib_sge *list;
+	int ret, i;
+	struct ibtrs_session *sess = con->sess;
+
+	list = kmalloc_array(num_sge, sizeof(*list), GFP_ATOMIC);
+
+	if (!list)
+		return -ENOMEM;
+
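+	/* If all descriptors plus the trailing IU SGE fit into a single WR,
+	 * post one RDMA write with immediate; otherwise chain multiple WRs.
+	 */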
+	DEB("n is %d\n", n);
+	if (num_sge < sess->max_sge) {
+		u32 cnt = atomic_inc_return(&con->io_cnt);
+
+		for (i = 0; i < n; i++, desc++)
+			ibtrs_set_sge_with_desc(&list[i], desc);
+		list[i].addr   = req->iu->dma_addr;
+		list[i].length = size;
+		list[i].lkey   = sess->ib_sess.pd->local_dma_lkey;
+
+		ret = ib_post_rdma_write_imm(con->ib_con.qp, list, num_sge,
+					     sess->srv_rdma_buf_rkey,
+					     addr, (u64)req->iu, imm,
+					     cnt %
+					     (sess->queue_depth) ?
+					     0 : IB_SEND_SIGNALED);
+	} else
+		ret = ibtrs_post_send_rdma_desc_more(con, list, req, desc, n,
+						     addr, size, imm);
+
+	kfree(list);
+	return ret;
+}
+
+static int ibtrs_post_send_rdma_more(struct ibtrs_con *con,
+				     struct rdma_req *req,
+				     u64 addr, u32 size, u32 imm)
+{
+	int i, ret;
+	struct scatterlist *sg;
+	struct ib_device *ibdev = con->sess->ib_device;
+	size_t num_sge = 1 + req->sg_cnt;
+	struct ib_sge *list;
+	u32 cnt = atomic_inc_return(&con->io_cnt);
+
+	list = kmalloc_array(num_sge, sizeof(*list), GFP_ATOMIC);
+
+	if (!list)
+		return -ENOMEM;
+
+	for_each_sg(req->sglist, sg, req->sg_cnt, i) {
+		list[i].addr   = ib_sg_dma_address(ibdev, sg);
+		list[i].length = ib_sg_dma_len(ibdev, sg);
+		list[i].lkey   = con->sess->ib_sess.pd->local_dma_lkey;
+	}
+	list[i].addr   = req->iu->dma_addr;
+	list[i].length = size;
+	list[i].lkey   = con->sess->ib_sess.pd->local_dma_lkey;
+
+	ret = ib_post_rdma_write_imm(con->ib_con.qp, list, num_sge,
+				     con->sess->srv_rdma_buf_rkey,
+				     addr, (uintptr_t)req->iu, imm,
+				     cnt % (con->sess->queue_depth) ?
+				     0 : IB_SEND_SIGNALED);
+
+	kfree(list);
+	return ret;
+}
+
+static int ibtrs_post_recv(struct ibtrs_con *con, struct ibtrs_iu *iu)
+{
+	int err;
+	struct ib_recv_wr wr, *bad_wr;
+	struct ib_sge list;
+
+	list.addr   = iu->dma_addr;
+	list.length = iu->size;
+	list.lkey   = con->sess->ib_sess.pd->local_dma_lkey;
+
+	if (WARN_ON(list.length == 0)) {
+		WRN(con->sess, "Posting receive work request failed,"
+		    " sg list is empty\n");
+		return -EINVAL;
+	}
+
+	wr.next     = NULL;
+	wr.wr_id    = (uintptr_t)iu;
+	wr.sg_list  = &list;
+	wr.num_sge  = 1;
+
+	err = ib_post_recv(con->ib_con.qp, &wr, &bad_wr);
+	if (unlikely(err))
+		ERR(con->sess, "Posting receive work request failed, errno:"
+		    " %d\n", err);
+
+	return err;
+}
+
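+/*
+ * Map an RDMA latency in ms to an index of the log2 latency histogram;
+ * index 0 collects sub-millisecond completions.
+ */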
+static inline int ibtrs_clt_ms_to_id(unsigned long ms)
+{
+	int id = ms ? ilog2(ms) - MIN_LOG_LATENCY + 1 : 0;
+
+	return clamp(id, 0, MAX_LOG_LATENCY - MIN_LOG_LATENCY + 1);
+}
+
+static void ibtrs_clt_update_rdma_lat(struct ibtrs_clt_stats *s, bool read,
+				      unsigned long ms)
+{
+	const int id = ibtrs_clt_ms_to_id(ms);
+	const int cpu = raw_smp_processor_id();
+
+	if (read) {
+		s->rdma_lat_distr[cpu][id].read++;
+		if (s->rdma_lat_max[cpu].read < ms)
+			s->rdma_lat_max[cpu].read = ms;
+	} else {
+		s->rdma_lat_distr[cpu][id].write++;
+		if (s->rdma_lat_max[cpu].write < ms)
+			s->rdma_lat_max[cpu].write = ms;
+	}
+}
+
+static inline unsigned long ibtrs_clt_get_raw_ms(void)
+{
+	struct timespec ts;
+
+	getrawmonotonic(&ts);
+
+	return timespec_to_ms(&ts);
+}
+
+static inline void ibtrs_clt_decrease_inflight(struct ibtrs_clt_stats *s)
+{
+	s->rdma_stats[raw_smp_processor_id()].inflight--;
+}
+
+static void process_io_rsp(struct ibtrs_session *sess, u32 msg_id, s16 errno)
+{
+	struct rdma_req *req;
+	void *priv;
+	enum dma_data_direction dir;
+
+	if (unlikely(msg_id >= sess->queue_depth)) {
+		ERR(sess,
+		    "Immediate message with invalid msg id received: %d\n",
+		    msg_id);
+		return;
+	}
+
+	req = &sess->reqs[msg_id];
+
+	DEB("Processing io resp for msg_id: %u, %s\n", msg_id,
+	    req->dir == DMA_FROM_DEVICE ? "read" : "write");
+
+	if (req->sg_cnt > fmr_sg_cnt)
+		ibtrs_unmap_fast_reg_data(req->con, req);
+	if (req->sg_cnt)
+		ib_dma_unmap_sg(sess->ib_device, req->sglist,
+				req->sg_cnt, req->dir);
+	if (sess->enable_rdma_lat)
+		ibtrs_clt_update_rdma_lat(&sess->stats,
+					  req->dir == DMA_FROM_DEVICE,
+					  ibtrs_clt_get_raw_ms() -
+					  req->start_time);
+	ibtrs_clt_decrease_inflight(&sess->stats);
+
+	req->in_use = false;
+	req->con    = NULL;
+	priv = req->priv;
+	dir = req->dir;
+
+	clt_ops->rdma_ev(priv, dir == DMA_FROM_DEVICE ?
+			 IBTRS_CLT_RDMA_EV_RDMA_REQUEST_WRITE_COMPL :
+			 IBTRS_CLT_RDMA_EV_RDMA_WRITE_COMPL, errno);
+}
+
+static int ibtrs_send_msg_user_ack(struct ibtrs_con *con)
+{
+	int err;
+
+	rcu_read_lock();
+	smp_rmb(); /* fence con->state check */
+	if (unlikely(con->state != CSM_STATE_CONNECTED)) {
+		rcu_read_unlock();
+		INFO(con->sess, "Sending user msg ack failed, disconnected."
+		     " Connection state is %s, session state is %s\n",
+		     csm_state_str(con->state),
+		     ssm_state_str(con->sess->state));
+		return -ECOMM;
+	}
+
+	err = ibtrs_write_empty_imm(con->ib_con.qp, UINT_MAX - 1,
+				    IB_SEND_SIGNALED);
+	rcu_read_unlock();
+	if (unlikely(err)) {
+		ERR_RL(con->sess, "Sending user msg ack failed, errno: %d\n",
+		       err);
+		return err;
+	}
+
+	ibtrs_heartbeat_set_send_ts(&con->sess->heartbeat);
+	return 0;
+}
+
+static void process_msg_user(struct ibtrs_con *con, struct ibtrs_msg_user *msg)
+{
+	int len;
+	struct ibtrs_session *sess = con->sess;
+
+	len = msg->hdr.tsize - IBTRS_HDR_LEN;
+
+	sess->stats.user_ib_msgs.recv_msg_cnt++;
+	sess->stats.user_ib_msgs.recv_size += len;
+
+	clt_ops->recv(sess->priv, (const void *)msg->payl, len);
+}
+
+static void process_msg_user_ack(struct ibtrs_con *con)
+{
+	struct ibtrs_session *sess = con->sess;
+
+	atomic_inc(&sess->peer_usr_msg_bufs);
+	wake_up(&con->sess->mu_buf_wait_q);
+}
+
+static void msg_worker(struct work_struct *work)
+{
+	struct msg_work *w;
+	struct ibtrs_con *con;
+	struct ibtrs_msg_user *msg;
+
+	w = container_of(work, struct msg_work, work);
+	con = w->con;
+	msg = w->msg;
+	kvfree(w);
+	process_msg_user(con, msg);
+	kvfree(msg);
+}
+
+static int ibtrs_schedule_msg(struct ibtrs_con *con, struct ibtrs_msg_user *msg)
+{
+	struct msg_work *w;
+
+	w = ibtrs_malloc(sizeof(*w));
+	if (!w)
+		return -ENOMEM;
+
+	w->con = con;
+	w->msg = ibtrs_malloc(msg->hdr.tsize);
+	if (!w->msg) {
+		kvfree(w);
+		return -ENOMEM;
+	}
+	memcpy(w->msg, msg, msg->hdr.tsize);
+	INIT_WORK(&w->work, msg_worker);
+	queue_work(con->sess->msg_wq, &w->work);
+	return 0;
+}
+
+static void ibtrs_handle_recv(struct ibtrs_con *con, struct ibtrs_iu *iu)
+{
+	struct ibtrs_msg_hdr *hdr;
+	struct ibtrs_session *sess = con->sess;
+	int ret;
+
+	hdr = (struct ibtrs_msg_hdr *)iu->buf;
+	if (unlikely(ibtrs_validate_message(sess->queue_depth, hdr)))
+		goto err1;
+
+	DEB("recv completion, type 0x%02x\n",
+	    hdr->type);
+	print_hex_dump_debug("", DUMP_PREFIX_OFFSET, 8, 1, iu->buf,
+			     IBTRS_HDR_LEN, true);
+
+	switch (hdr->type) {
+	case IBTRS_MSG_USER:
+		ret = ibtrs_schedule_msg(con, iu->buf);
+		if (unlikely(ret)) {
+			ERR_RL(sess, "Scheduling worker of user message "
+			       "to user module failed, errno: %d\n", ret);
+			goto err1;
+		}
+		ret = ibtrs_post_recv(con, iu);
+		if (unlikely(ret)) {
+			ERR_RL(sess, "Posting receive buffer of user message "
+			       "to HCA failed, errno: %d\n", ret);
+			goto err2;
+		}
+		ret = ibtrs_send_msg_user_ack(con);
+		if (unlikely(ret)) {
+			ERR_RL(sess, "Sending ACK for user message failed, "
+			       "errno: %d\n", ret);
+			goto err2;
+		}
+		return;
+	case IBTRS_MSG_SESS_OPEN_RESP: {
+		int err;
+
+		err = process_open_rsp(con, iu->buf);
+		if (unlikely(err))
+			ssm_schedule_event(con->sess, SSM_EV_CON_ERROR);
+		else
+			ssm_schedule_event(con->sess, SSM_EV_GOT_RDMA_INFO);
+		return;
+	}
+	default:
+		WRN(sess, "Received message of unknown type: 0x%02x\n",
+		    hdr->type);
+		goto err1;
+	}
+
+err1:
+	ibtrs_post_recv(con, iu);
+err2:
+	ERR(sess, "Failed to process IBTRS message\n");
+	csm_schedule_event(con, CSM_EV_CON_ERROR);
+}
+
+static void process_err_wc(struct ibtrs_con *con, struct ib_wc *wc)
+{
+	struct ibtrs_iu *iu;
+
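+	/* A completion for the beacon WR is forwarded to the connection
+	 * state machine so it can finish closing the connection.
+	 */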
+	if (wc->wr_id == (uintptr_t)&con->ib_con.beacon) {
+		csm_schedule_event(con, CSM_EV_BEACON_COMPLETED);
+		return;
+	}
+
+	if (wc->wr_id == FAST_REG_WR_ID_MASK ||
+	    wc->wr_id == LOCAL_INV_WR_ID_MASK) {
+		ERR_RL(con->sess, "Fast registration wr failed: wr_id: %d,"
+		       " status: %s\n", (int)wc->wr_id,
+		       ib_wc_status_msg(wc->status));
+		csm_schedule_event(con, CSM_EV_WC_ERROR);
+		return;
+	}
+	/* Only wc->wr_id is guaranteed to be valid in erroneous WCs, so we
+	 * can't rely on wc->opcode; use iu->direction to determine whether
+	 * it is a tx or rx IU.
+	 */
+	iu = (struct ibtrs_iu *)wc->wr_id;
+	if (iu && iu->direction == DMA_TO_DEVICE && iu->is_msg)
+		put_u_msg_iu(con->sess, iu);
+
+	/* suppress FLUSH_ERR log when the connection is being disconnected */
+	if (unlikely(wc->status != IB_WC_WR_FLUSH_ERR ||
+		     (con->state != CSM_STATE_CLOSING &&
+		      con->state != CSM_STATE_FLUSHING)))
+		ERR_RL(con->sess, "wr_id: 0x%llx status: %d (%s),"
+		       " type: %d (%s), vendor_err: %x, len: %u,"
+		       " connection status: %s\n", wc->wr_id,
+		       wc->status, ib_wc_status_msg(wc->status),
+		       wc->opcode, ib_wc_opcode_str(wc->opcode),
+		       wc->vendor_err, wc->byte_len, csm_state_str(con->state));
+
+	csm_schedule_event(con, CSM_EV_WC_ERROR);
+}
+
+static int process_wcs(struct ibtrs_con *con, struct ib_wc *wcs, size_t len)
+{
+	int i, ret;
+	u32 imm;
+
+	for (i = 0; i < len; i++) {
+		u32 msg_id;
+		s16 errno;
+		struct ibtrs_msg_hdr *hdr;
+		struct ibtrs_iu *iu;
+		struct ib_wc wc = wcs[i];
+
+		if (unlikely(wc.status != IB_WC_SUCCESS)) {
+			process_err_wc(con, &wc);
+			continue;
+		}
+
+		DEB("cq complete with wr_id 0x%llx "
+		    "status %d (%s) type %d (%s) len %u\n",
+		    wc.wr_id, wc.status, ib_wc_status_msg(wc.status), wc.opcode,
+		    ib_wc_opcode_str(wc.opcode), wc.byte_len);
+
+		iu = (struct ibtrs_iu *)wc.wr_id;
+
+		switch (wc.opcode) {
+		case IB_WC_SEND:
+			if (con->user) {
+				if (iu == con->sess->sess_info_iu)
+					break;
+				put_u_msg_iu(con->sess, iu);
+				wake_up(&con->sess->mu_iu_wait_q);
+			}
+			break;
+		case IB_WC_RDMA_WRITE:
+			break;
+		case IB_WC_RECV_RDMA_WITH_IMM:
+			ibtrs_set_last_heartbeat(&con->sess->heartbeat);
+			imm = be32_to_cpu(wc.ex.imm_data);
+			ret = ibtrs_post_recv(con, iu);
+			if (ret) {
+				ERR(con->sess, "Failed to post receive "
+				    "buffer\n");
+				csm_schedule_event(con, CSM_EV_CON_ERROR);
+			}
+
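+			/* Immediate value UINT_MAX is a heartbeat,
+			 * UINT_MAX - 1 acknowledges a user message; any
+			 * other value carries an IO response with the msg
+			 * id in the upper 16 bits and the errno in the
+			 * lower 16 bits.
+			 */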
+			if (imm == UINT_MAX) {
+				break;
+			} else if (imm == UINT_MAX - 1) {
+				process_msg_user_ack(con);
+				break;
+			}
+			msg_id = imm >> 16;
+			errno = (imm << 16) >> 16;
+			process_io_rsp(con->sess, msg_id, errno);
+			break;
+
+		case IB_WC_RECV:
+			ibtrs_set_last_heartbeat(&con->sess->heartbeat);
+
+			hdr = (struct ibtrs_msg_hdr *)iu->buf;
+			ibtrs_deb_msg_hdr("Received: ", hdr);
+			ibtrs_handle_recv(con, iu);
+			break;
+
+		default:
+			WRN(con->sess, "Unexpected WC type: %s\n",
+			    ib_wc_opcode_str(wc.opcode));
+		}
+	}
+
+	return 0;
+}
+
+static void ibtrs_clt_update_wc_stats(struct ibtrs_con *con, int cnt)
+{
+	short cpu = con->cpu;
+
+	if (cnt > con->sess->stats.wc_comp[cpu].max_wc_cnt)
+		con->sess->stats.wc_comp[cpu].max_wc_cnt = cnt;
+	con->sess->stats.wc_comp[cpu].cnt++;
+	con->sess->stats.wc_comp[cpu].total_cnt += cnt;
+}
+
+static int get_process_wcs(struct ibtrs_con *con)
+{
+	int cnt, err;
+	struct ib_wc *wcs = con->wcs;
+
+	do {
+		cnt = ib_poll_cq(con->ib_con.cq, ARRAY_SIZE(con->wcs), wcs);
+		if (unlikely(cnt < 0)) {
+			ERR(con->sess, "Getting work requests from completion"
+			    " queue failed, errno: %d\n", cnt);
+			return cnt;
+		}
+		DEB("Retrieved %d wcs from CQ\n", cnt);
+
+		if (likely(cnt > 0)) {
+			err = process_wcs(con, wcs, cnt);
+			if (unlikely(err))
+				return err;
+			ibtrs_clt_update_wc_stats(con, cnt);
+		}
+	} while (cnt > 0);
+
+	return 0;
+}
+
+static void process_con_rejected(struct ibtrs_con *con,
+				 struct rdma_cm_event *event)
+{
+	const struct ibtrs_msg_error *msg;
+
+	msg = event->param.conn.private_data;
+	/* Check whether the server sent a message in the private data.
+	 * IB_CM_REJ_CONSUMER_DEFINED is set not only when ibtrs_server
+	 * provided private data for the rdma_reject() call, so the message
+	 * type also needs to be checked.
+	 */
+	if (event->status != IB_CM_REJ_CONSUMER_DEFINED ||
+	    msg->hdr.type != IBTRS_MSG_ERROR)
+		return;
+
+	if (unlikely(ibtrs_validate_message(con->sess->queue_depth, msg))) {
+		ERR(con->sess,
+		    "Received invalid connection rejected message\n");
+		return;
+	}
+
+	if (con == &con->sess->con[0] && msg->errno == -EEXIST)
+		ERR(con->sess, "Connection rejected by the server,"
+		    " session already exists, errno: %d\n", msg->errno);
+	else
+		ERR(con->sess, "Connection rejected by the server, errno: %d\n",
+		    msg->errno);
+}
+
+static int ibtrs_clt_rdma_cm_ev_handler(struct rdma_cm_id *cm_id,
+					struct rdma_cm_event *event)
+{
+	struct ibtrs_con *con = cm_id->context;
+
+	switch (event->event) {
+	case RDMA_CM_EVENT_ADDR_RESOLVED:
+		DEB("addr resolved on cma_id is %p\n", cm_id);
+		csm_schedule_event(con, CSM_EV_ADDR_RESOLVED);
+		break;
+
+	case RDMA_CM_EVENT_ROUTE_RESOLVED: {
+		struct sockaddr_storage *peer_addr = &con->sess->peer_addr;
+		struct sockaddr_storage *self_addr = &con->sess->self_addr;
+
+		DEB("route resolved on cma_id is %p\n", cm_id);
+		/* initiator is src, target is dst */
+		memcpy(peer_addr, &cm_id->route.addr.dst_addr,
+		       sizeof(*peer_addr));
+		memcpy(self_addr, &cm_id->route.addr.src_addr,
+		       sizeof(*self_addr));
+
+		switch (peer_addr->ss_family) {
+		case AF_INET:
+			DEB("Route %pI4->%pI4 resolved\n",
+			    &((struct sockaddr_in *)
+			      self_addr)->sin_addr.s_addr,
+			    &((struct sockaddr_in *)
+			      peer_addr)->sin_addr.s_addr);
+			break;
+		case AF_INET6:
+			DEB("Route %pI6->%pI6 resolved\n",
+			    &((struct sockaddr_in6 *)self_addr)->sin6_addr,
+			    &((struct sockaddr_in6 *)peer_addr)->sin6_addr);
+			break;
+		case AF_IB:
+			DEB("Route %pI6->%pI6 resolved\n",
+			    &((struct sockaddr_ib *)self_addr)->sib_addr,
+			    &((struct sockaddr_ib *)peer_addr)->sib_addr);
+			break;
+		default:
+			DEB("Route resolved (unknown address family)\n");
+		}
+
+		csm_schedule_event(con, CSM_EV_ROUTE_RESOLVED);
+		}
+		break;
+
+	case RDMA_CM_EVENT_ESTABLISHED:
+		DEB("Connection established\n");
+
+		csm_schedule_event(con, CSM_EV_CON_ESTABLISHED);
+		break;
+
+	case RDMA_CM_EVENT_ADDR_ERROR:
+	case RDMA_CM_EVENT_ROUTE_ERROR:
+	case RDMA_CM_EVENT_CONNECT_ERROR:
+		ERR(con->sess, "Connection establishment error"
+		    " (CM event: %s, errno: %d)\n",
+		    rdma_event_msg(event->event), event->status);
+		csm_schedule_event(con, CSM_EV_CON_ERROR);
+		break;
+
+	case RDMA_CM_EVENT_DISCONNECTED:
+	case RDMA_CM_EVENT_TIMEWAIT_EXIT:
+		csm_schedule_event(con, CSM_EV_CON_DISCONNECTED);
+		break;
+
+	case RDMA_CM_EVENT_REJECTED:
+		/* reject status is defined in enum, not errno */
+		ERR_RL(con->sess,
+		       "Connection rejected (CM event: %s, err: %s)\n",
+		       rdma_event_msg(event->event),
+		       rdma_reject_msg(cm_id, event->status));
+		process_con_rejected(con, event);
+		csm_schedule_event(con, CSM_EV_CON_ERROR);
+		break;
+
+	case RDMA_CM_EVENT_UNREACHABLE:
+	case RDMA_CM_EVENT_ADDR_CHANGE: {
+		ERR_RL(con->sess, "CM error (CM event: %s, errno: %d)\n",
+		       rdma_event_msg(event->event), event->status);
+
+		csm_schedule_event(con, CSM_EV_CON_ERROR);
+		break;
+	}
+	case RDMA_CM_EVENT_DEVICE_REMOVAL: {
+		struct completion dc;
+
+		ERR_RL(con->sess, "CM error (CM event: %s, errno: %d)\n",
+		       rdma_event_msg(event->event), event->status);
+
+		con->device_being_removed = true;
+		init_completion(&dc);
+		con->sess->ib_sess_destroy_completion = &dc;
+
+		/* Generating a CON_ERROR event will cause the SSM to close all
+		 * the connections and try to reconnect. Wait until all
+		 * connections are closed and the ib session destroyed before
+		 * returning to the ib core code.
+		 */
+		csm_schedule_event(con, CSM_EV_CON_ERROR);
+		wait_for_completion(&dc);
+		con->sess->ib_sess_destroy_completion = NULL;
+
+		/* return 1 so cm_id is destroyed afterwards */
+		return 1;
+	}
+	default:
+		WRN(con->sess, "Ignoring unexpected CM event %s, errno: %d\n",
+		    rdma_event_msg(event->event), event->status);
+		break;
+	}
+	return 0;
+}
+
+static void handle_cq_comp(struct ibtrs_con *con)
+{
+	int err;
+
+	err = get_process_wcs(con);
+	if (unlikely(err))
+		goto error;
+
+	while ((err = ib_req_notify_cq(con->ib_con.cq, IB_CQ_NEXT_COMP |
+				       IB_CQ_REPORT_MISSED_EVENTS)) > 0) {
+		DEB("Missed %d CQ notifications, processing missed WCs...\n",
+		    err);
+		err = get_process_wcs(con);
+		if (unlikely(err))
+			goto error;
+	}
+
+	if (unlikely(err))
+		goto error;
+
+	return;
+
+error:
+	ERR(con->sess, "Failed to get WCs from CQ, errno: %d\n", err);
+	csm_schedule_event(con, CSM_EV_CON_ERROR);
+}
+
+static inline void tasklet_handle_cq_comp(unsigned long data)
+{
+	struct ibtrs_con *con = (struct ibtrs_con *)data;
+
+	handle_cq_comp(con);
+}
+
+static inline void wrapper_handle_cq_comp(struct work_struct *work)
+{
+	struct ibtrs_con *con = container_of(work, struct ibtrs_con, cq_work);
+
+	handle_cq_comp(con);
+}
+
+static void cq_event_handler(struct ib_cq *cq, void *ctx)
+{
+	struct ibtrs_con *con = ctx;
+	int cpu = raw_smp_processor_id();
+
+	if (unlikely(con->cpu != cpu)) {
+		DEB_RL("WC processing is migrated from CPU %d to %d, cstate %s,"
+		       " sstate %s, user: %s\n", con->cpu,
+		       cpu, csm_state_str(con->state),
+		       ssm_state_str(con->sess->state),
+		       con->user ? "true" : "false");
+		atomic_inc(&con->sess->stats.cpu_migr.from[con->cpu]);
+		con->sess->stats.cpu_migr.to[cpu]++;
+	}
+
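+	/* Completions on the user connection are handled in a workqueue,
+	 * completions on IO connections in a per-connection tasklet.
+	 */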
+	/* queue_work() can return false here: the work may already be
+	 * queued when CQ notifications were already activated and are
+	 * activated again after the beacon was posted.
+	 */
+	if (con->user)
+		queue_work(con->cq_wq, &con->cq_work);
+	else
+		tasklet_schedule(&con->cq_tasklet);
+}
+
+static int post_io_con_recv(struct ibtrs_con *con)
+{
+	int i, ret;
+	struct ibtrs_iu *dummy_rx_iu = con->sess->dummy_rx_iu;
+
+	for (i = 0; i < con->sess->queue_depth; i++) {
+		ret = ibtrs_post_recv(con, dummy_rx_iu);
+		if (unlikely(ret)) {
+			WRN(con->sess,
+			    "Posting receive buffers to HCA failed, errno:"
+			    " %d\n", ret);
+			return ret;
+		}
+	}
+	return 0;
+}
+
+static int post_usr_con_recv(struct ibtrs_con *con)
+{
+	int i, ret;
+
+	for (i = 0; i < USR_CON_BUF_SIZE; i++) {
+		struct ibtrs_iu *iu = con->sess->usr_rx_ring[i];
+
+		ret = ibtrs_post_recv(con, iu);
+		if (unlikely(ret)) {
+			WRN(con->sess,
+			    "Posting receive buffers to HCA failed, errno:"
+			    " %d\n", ret);
+			return ret;
+		}
+	}
+	return 0;
+}
+
+static int post_init_con_recv(struct ibtrs_con *con)
+{
+	int ret;
+
+	ret = ibtrs_post_recv(con, con->sess->rdma_info_iu);
+	if (unlikely(ret))
+		WRN(con->sess,
+		    "Posting rdma info iu to HCA failed, errno: %d\n", ret);
+	return ret;
+}
+
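+/*
+ * On the user connection only the rdma info IU is posted initially; IO
+ * connections get queue_depth dummy receive buffers, which serve the
+ * RDMA-write-with-immediate completions.
+ */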
+static int post_recv(struct ibtrs_con *con)
+{
+	if (con->user)
+		return post_init_con_recv(con);
+	else
+		return post_io_con_recv(con);
+}
+
+static void fail_outstanding_req(struct ibtrs_con *con, struct rdma_req *req)
+{
+	void *priv;
+	enum dma_data_direction dir;
+
+	if (!req->in_use)
+		return;
+
+	if (req->sg_cnt > fmr_sg_cnt)
+		ibtrs_unmap_fast_reg_data(con, req);
+	if (req->sg_cnt)
+		ib_dma_unmap_sg(con->sess->ib_device, req->sglist,
+				req->sg_cnt, req->dir);
+	ibtrs_clt_decrease_inflight(&con->sess->stats);
+
+	req->in_use = false;
+	req->con    = NULL;
+	priv = req->priv;
+	dir = req->dir;
+
+	clt_ops->rdma_ev(priv, dir == DMA_FROM_DEVICE ?
+			 IBTRS_CLT_RDMA_EV_RDMA_REQUEST_WRITE_COMPL :
+			 IBTRS_CLT_RDMA_EV_RDMA_WRITE_COMPL, -ECONNABORTED);
+
+	DEB("Canceled outstanding request\n");
+}
+
+static void fail_outstanding_reqs(struct ibtrs_con *con)
+{
+	struct ibtrs_session *sess = con->sess;
+	int i;
+
+	if (!sess->reqs)
+		return;
+	for (i = 0; i < sess->queue_depth; ++i) {
+		if (sess->reqs[i].con == con)
+			fail_outstanding_req(con, &sess->reqs[i]);
+	}
+}
+
+static void fail_all_outstanding_reqs(struct ibtrs_session *sess)
+{
+	int i;
+
+	if (!sess->reqs)
+		return;
+	for (i = 0; i < sess->queue_depth; ++i)
+		fail_outstanding_req(sess->reqs[i].con, &sess->reqs[i]);
+}
+
+static void ibtrs_free_reqs(struct ibtrs_session *sess)
+{
+	struct rdma_req *req;
+	int i;
+
+	if (!sess->reqs)
+		return;
+
+	for (i = 0; i < sess->queue_depth; ++i) {
+		req = &sess->reqs[i];
+
+		if (sess->fast_reg_mode == IBTRS_FAST_MEM_FR) {
+			kfree(req->fr_list);
+			req->fr_list = NULL;
+		} else if (sess->fast_reg_mode == IBTRS_FAST_MEM_FMR) {
+			kfree(req->fmr_list);
+			req->fmr_list = NULL;
+		}
+
+		kfree(req->map_page);
+		req->map_page = NULL;
+	}
+
+	kfree(sess->reqs);
+	sess->reqs = NULL;
+}
+
+static int ibtrs_alloc_reqs(struct ibtrs_session *sess)
+{
+	struct rdma_req *req = NULL;
+	void *mr_list = NULL;
+	int i;
+
+	sess->reqs = kcalloc(sess->queue_depth, sizeof(*sess->reqs),
+			     GFP_KERNEL);
+	if (!sess->reqs)
+		return -ENOMEM;
+
+	for (i = 0; i < sess->queue_depth; ++i) {
+		req = &sess->reqs[i];
+		mr_list = kmalloc_array(sess->max_pages_per_mr,
+					sizeof(void *), GFP_KERNEL);
+		if (!mr_list)
+			goto out;
+
+		if (sess->fast_reg_mode == IBTRS_FAST_MEM_FR)
+			req->fr_list = mr_list;
+		else if (sess->fast_reg_mode == IBTRS_FAST_MEM_FMR)
+			req->fmr_list = mr_list;
+
+		req->map_page = kmalloc_array(sess->max_pages_per_mr,
+					      sizeof(void *), GFP_KERNEL);
+		if (!req->map_page)
+			goto out;
+	}
+
+	return 0;
+
+out:
+	ibtrs_free_reqs(sess);
+	return -ENOMEM;
+}
+
+static void free_sess_rx_bufs(struct ibtrs_session *sess)
+{
+	int i;
+
+	if (!sess->usr_rx_ring)
+		return;
+
+	for (i = 0; i < USR_CON_BUF_SIZE; ++i)
+		if (sess->usr_rx_ring[i])
+			ibtrs_iu_free(sess->usr_rx_ring[i],
+				      DMA_FROM_DEVICE,
+				      sess->ib_device);
+
+	kfree(sess->usr_rx_ring);
+	sess->usr_rx_ring = NULL;
+}
+
+static void free_sess_tx_bufs(struct ibtrs_session *sess, bool check)
+{
+	int i;
+	struct ibtrs_iu *e, *next;
+
+	if (!sess->io_tx_ius)
+		return;
+
+	for (i = 0; i < sess->queue_depth; i++)
+		if (sess->io_tx_ius[i])
+			ibtrs_iu_free(sess->io_tx_ius[i], DMA_TO_DEVICE,
+				      sess->ib_device);
+
+	kfree(sess->io_tx_ius);
+	sess->io_tx_ius = NULL;
+	if (check) {
+		struct list_head *pos;
+		size_t cnt = 0;
+
+		list_for_each(pos, &sess->u_msg_ius_list)
+			cnt++;
+
+		WARN_ON(cnt != USR_CON_BUF_SIZE);
+	}
+	list_for_each_entry_safe(e, next, &sess->u_msg_ius_list, list) {
+		list_del(&e->list);
+		ibtrs_iu_free(e, DMA_TO_DEVICE, sess->ib_device);
+	}
+}
+
+static void free_sess_fast_pool(struct ibtrs_session *sess)
+{
+	if (sess->fast_reg_mode == IBTRS_FAST_MEM_FMR) {
+		if (sess->fmr_pool)
+			ib_destroy_fmr_pool(sess->fmr_pool);
+		sess->fmr_pool = NULL;
+	}
+}
+
+static void free_sess_tr_bufs(struct ibtrs_session *sess)
+{
+	free_sess_rx_bufs(sess);
+	free_sess_tx_bufs(sess, true);
+}
+
+static void free_sess_init_bufs(struct ibtrs_session *sess)
+{
+	if (sess->rdma_info_iu) {
+		ibtrs_iu_free(sess->rdma_info_iu, DMA_FROM_DEVICE,
+			      sess->ib_device);
+		sess->rdma_info_iu = NULL;
+	}
+
+	if (sess->dummy_rx_iu) {
+		ibtrs_iu_free(sess->dummy_rx_iu, DMA_FROM_DEVICE,
+			      sess->ib_device);
+		sess->dummy_rx_iu = NULL;
+	}
+
+	if (sess->sess_info_iu) {
+		ibtrs_iu_free(sess->sess_info_iu, DMA_TO_DEVICE,
+			      sess->ib_device);
+		sess->sess_info_iu = NULL;
+	}
+}
+
+static void free_io_bufs(struct ibtrs_session *sess)
+{
+	ibtrs_free_reqs(sess);
+	free_sess_fast_pool(sess);
+	kfree(sess->tags_map);
+	sess->tags_map = NULL;
+	kfree(sess->tags);
+	sess->tags = NULL;
+	sess->io_bufs_initialized = false;
+}
+
+static void free_sess_bufs(struct ibtrs_session *sess)
+{
+	free_sess_init_bufs(sess);
+	free_io_bufs(sess);
+}
+
+static struct ib_fmr_pool *alloc_fmr_pool(struct ibtrs_session *sess)
+{
+	struct ib_fmr_pool_param fmr_param;
+
+	memset(&fmr_param, 0, sizeof(fmr_param));
+	fmr_param.pool_size	    = sess->queue_depth *
+				      sess->max_pages_per_mr;
+	fmr_param.dirty_watermark   = fmr_param.pool_size / 4;
+	fmr_param.cache		    = 0;
+	fmr_param.max_pages_per_fmr = sess->max_pages_per_mr;
+	fmr_param.page_shift	    = ilog2(sess->mr_page_size);
+	fmr_param.access	    = (IB_ACCESS_LOCAL_WRITE |
+				       IB_ACCESS_REMOTE_WRITE);
+
+	return ib_create_fmr_pool(sess->ib_sess.pd, &fmr_param);
+}
+
+static int alloc_sess_rx_bufs(struct ibtrs_session *sess)
+{
+	int i;
+	u32 max_req_size = sess->max_req_size;
+
+	sess->usr_rx_ring = kcalloc(USR_CON_BUF_SIZE,
+				    sizeof(*sess->usr_rx_ring),
+				    GFP_KERNEL);
+	if (!sess->usr_rx_ring)
+		goto err;
+
+	for (i = 0; i < USR_CON_BUF_SIZE; ++i) {
+		/* alloc recv buffer, open rep is the biggest */
+		sess->usr_rx_ring[i] = ibtrs_iu_alloc(i, max_req_size,
+						      GFP_KERNEL,
+						      sess->ib_device,
+						      DMA_FROM_DEVICE, true);
+		if (!sess->usr_rx_ring[i]) {
+			WRN(sess, "Failed to allocate IU for RX ring\n");
+			goto err;
+		}
+	}
+
+	return 0;
+
+err:
+	free_sess_rx_bufs(sess);
+
+	return -ENOMEM;
+}
+
+static int alloc_sess_fast_pool(struct ibtrs_session *sess)
+{
+	int err = 0;
+	struct ib_fmr_pool *fmr_pool;
+
+	if (sess->fast_reg_mode == IBTRS_FAST_MEM_FMR) {
+		fmr_pool = alloc_fmr_pool(sess);
+		if (IS_ERR(fmr_pool)) {
+			err = PTR_ERR(fmr_pool);
+			ERR(sess, "FMR pool allocation failed, errno: %d\n",
+			    err);
+			return err;
+		}
+		sess->fmr_pool = fmr_pool;
+	}
+	return err;
+}
+
+static int alloc_sess_init_bufs(struct ibtrs_session *sess)
+{
+	sess->sess_info_iu = ibtrs_iu_alloc(0, MSG_SESS_INFO_SIZE, GFP_KERNEL,
+			       sess->ib_device, DMA_TO_DEVICE, true);
+	if (unlikely(!sess->sess_info_iu)) {
+		ERR_RL(sess, "Can't allocate transfer buffer for "
+			     "sess hostname\n");
+		return -ENOMEM;
+	}
+	sess->rdma_info_iu =
+		ibtrs_iu_alloc(0,
+			       IBTRS_MSG_SESS_OPEN_RESP_LEN(MAX_SESS_QUEUE_DEPTH),
+			       GFP_KERNEL, sess->ib_device,
+			       DMA_FROM_DEVICE, true);
+	if (!sess->rdma_info_iu) {
+		WRN(sess, "Failed to allocate IU to receive "
+			  "RDMA INFO message\n");
+		goto err;
+	}
+
+	sess->dummy_rx_iu =
+		ibtrs_iu_alloc(0, IBTRS_HDR_LEN,
+			       GFP_KERNEL, sess->ib_device,
+			       DMA_FROM_DEVICE, true);
+	if (!sess->dummy_rx_iu) {
+		WRN(sess, "Failed to allocate IU to receive "
+			  "immediate messages on io connections\n");
+		goto err;
+	}
+
+	return 0;
+
+err:
+	free_sess_init_bufs(sess);
+
+	return -ENOMEM;
+}
+
+static int alloc_sess_tx_bufs(struct ibtrs_session *sess)
+{
+	int i;
+	struct ibtrs_iu *iu;
+	u32 max_req_size = sess->max_req_size;
+
+	INIT_LIST_HEAD(&sess->u_msg_ius_list);
+	spin_lock_init(&sess->u_msg_ius_lock);
+
+	sess->io_tx_ius = kcalloc(sess->queue_depth, sizeof(*sess->io_tx_ius),
+				  GFP_KERNEL);
+	if (!sess->io_tx_ius)
+		goto err;
+
+	for (i = 0; i < sess->queue_depth; ++i) {
+		iu = ibtrs_iu_alloc(i, max_req_size, GFP_KERNEL,
+				    sess->ib_device, DMA_TO_DEVICE, false);
+		if (!iu) {
+			WRN(sess, "Failed to allocate IU for TX buffer\n");
+			goto err;
+		}
+		sess->io_tx_ius[i] = iu;
+	}
+
+	for (i = 0; i < USR_CON_BUF_SIZE; ++i) {
+		iu = ibtrs_iu_alloc(i, max_req_size, GFP_KERNEL,
+				    sess->ib_device, DMA_TO_DEVICE,
+				    true);
+		if (!iu) {
+			WRN(sess, "Failed to allocate IU for TX buffer\n");
+			goto err;
+		}
+		list_add(&iu->list, &sess->u_msg_ius_list);
+	}
+	return 0;
+
+err:
+	free_sess_tx_bufs(sess, false);
+
+	return -ENOMEM;
+}
+
+static int alloc_sess_tr_bufs(struct ibtrs_session *sess)
+{
+	int err;
+
+	err = alloc_sess_rx_bufs(sess);
+	if (!err)
+		err = alloc_sess_tx_bufs(sess);
+
+	return err;
+}
+
+static int alloc_sess_tags(struct ibtrs_session *sess)
+{
+	int err, i;
+
+	sess->tags_map = kzalloc(BITS_TO_LONGS(sess->queue_depth) *
+				 sizeof(long), GFP_KERNEL);
+	if (!sess->tags_map) {
+		ERR(sess, "Failed to alloc tags bitmap\n");
+		err = -ENOMEM;
+		goto out_err;
+	}
+
+	sess->tags = kcalloc(sess->queue_depth, TAG_SIZE(sess),
+			     GFP_KERNEL);
+	if (!sess->tags) {
+		ERR(sess, "Failed to alloc memory for tags\n");
+		err = -ENOMEM;
+		goto err_map;
+	}
+
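+	/* Each tag carries its mem_id both as a plain index and shifted
+	 * into the upper bits of the immediate data field.
+	 */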
+	for (i = 0; i < sess->queue_depth; i++) {
+		struct ibtrs_tag *tag;
+
+		tag = GET_TAG(sess, i);
+		tag->mem_id = i;
+		tag->mem_id_mask = i << ((IB_IMM_SIZE_BITS - 1) -
+					 ilog2(sess->queue_depth - 1));
+	}
+
+	return 0;
+
+err_map:
+	kfree(sess->tags_map);
+	sess->tags_map = NULL;
+out_err:
+	return err;
+}
+
+static int connect_qp(struct ibtrs_con *con)
+{
+	int err;
+	struct rdma_conn_param conn_param;
+	struct ibtrs_msg_sess_open somsg;
+	struct ibtrs_msg_con_open comsg;
+
+	memset(&conn_param, 0, sizeof(conn_param));
+	conn_param.retry_count = retry_count;
+
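+	/* The user connection opens the session and announces the number
+	 * of connections and the session uuid; IO connections only send a
+	 * connection-open request carrying the uuid.
+	 */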
+	if (con->user) {
+		if (CONS_PER_SESSION > U8_MAX)
+			return -EINVAL;
+		fill_ibtrs_msg_sess_open(&somsg, CONS_PER_SESSION, &uuid);
+		conn_param.private_data		= &somsg;
+		conn_param.private_data_len	= sizeof(somsg);
+		conn_param.rnr_retry_count	= 7;
+	} else {
+		fill_ibtrs_msg_con_open(&comsg, &uuid);
+		conn_param.private_data		= &comsg;
+		conn_param.private_data_len	= sizeof(comsg);
+	}
+	err = rdma_connect(con->cm_id, &conn_param);
+	if (err) {
+		ERR(con->sess, "Establishing RDMA connection failed, errno:"
+		    " %d\n", err);
+		return err;
+	}
+
+	DEB("rdma_connect successful\n");
+	return 0;
+}
+
+static int resolve_addr(struct ibtrs_con *con,
+			const struct sockaddr_storage *addr)
+{
+	int err;
+
+	err = rdma_resolve_addr(con->cm_id, NULL,
+				(struct sockaddr *)addr, 1000);
+	if (err)
+		/* TODO: Include in the message the address we tried to
+		 * resolve; it can be an AF_INET, AF_INET6 or AF_IB address.
+		 */
+		ERR(con->sess, "Resolving server address failed, errno: %d\n",
+		    err);
+	return err;
+}
+
+static int resolve_route(struct ibtrs_con *con)
+{
+	int err;
+
+	err = rdma_resolve_route(con->cm_id, 1000);
+	if (err)
+		ERR(con->sess, "Resolving route failed, errno: %d\n",
+		    err);
+
+	return err;
+}
+
+static int query_fast_reg_mode(struct ibtrs_con *con)
+{
+	struct ib_device *ibdev = con->sess->ib_device;
+	struct ib_device_attr *dev_attr = &ibdev->attrs;
+	int mr_page_shift;
+	u64 max_pages_per_mr;
+
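+	/* Prefer FR over FMR if the device supports both and use_fr is set. */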
+	if (ibdev->alloc_fmr && ibdev->dealloc_fmr &&
+	    ibdev->map_phys_fmr && ibdev->unmap_fmr) {
+		con->sess->fast_reg_mode = IBTRS_FAST_MEM_FMR;
+		INFO(con->sess, "Device %s supports FMR\n", ibdev->name);
+	}
+	if (dev_attr->device_cap_flags & IB_DEVICE_MEM_MGT_EXTENSIONS &&
+	    use_fr) {
+		con->sess->fast_reg_mode = IBTRS_FAST_MEM_FR;
+		INFO(con->sess, "Device %s supports FR\n", ibdev->name);
+	}
+
+	/*
+	 * Use the smallest page size supported by the HCA, down to a
+	 * minimum of 4096 bytes. We're unlikely to build large sglists
+	 * out of smaller entries.
+	 */
+	mr_page_shift		= max(12, ffs(dev_attr->page_size_cap) - 1);
+	con->sess->mr_page_size	= 1 << mr_page_shift;
+	con->sess->max_sge	= dev_attr->max_sge;
+	con->sess->mr_page_mask	= ~((u64)con->sess->mr_page_size - 1);
+	max_pages_per_mr	= dev_attr->max_mr_size;
+	do_div(max_pages_per_mr, con->sess->mr_page_size);
+	con->sess->max_pages_per_mr = min_t(u64, con->sess->max_pages_per_mr,
+					    max_pages_per_mr);
+	if (con->sess->fast_reg_mode == IBTRS_FAST_MEM_FR) {
+		con->sess->max_pages_per_mr =
+			min_t(u32, con->sess->max_pages_per_mr,
+			      dev_attr->max_fast_reg_page_list_len);
+	}
+	con->sess->mr_max_size	= con->sess->mr_page_size *
+				  con->sess->max_pages_per_mr;
+	DEB("%s: mr_page_shift = %d, dev_attr->max_mr_size = %#llx, "
+	    "dev_attr->max_fast_reg_page_list_len = %u, max_pages_per_mr = %d, "
+	    "mr_max_size = %#x\n", ibdev->name, mr_page_shift,
+	    dev_attr->max_mr_size, dev_attr->max_fast_reg_page_list_len,
+	    con->sess->max_pages_per_mr, con->sess->mr_max_size);
+	return 0;
+}
+
+static int send_heartbeat(struct ibtrs_session *sess)
+{
+	int err;
+	struct ibtrs_con *con;
+
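+	/* Heartbeats are empty RDMA writes with immediate value UINT_MAX,
+	 * sent on the first (user) connection.
+	 */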
+	con = &sess->con[0];
+
+	rcu_read_lock();
+	smp_rmb(); /* fence con->state check */
+	if (unlikely(con->state != CSM_STATE_CONNECTED)) {
+		rcu_read_unlock();
+		ERR_RL(sess, "Sending heartbeat message failed, not connected."
+		       " Connection state changed to %s!\n",
+		       csm_state_str(con->state));
+		return -ECOMM;
+	}
+
+	err = ibtrs_write_empty_imm(con->ib_con.qp, UINT_MAX, IB_SEND_SIGNALED);
+	rcu_read_unlock();
+	if (unlikely(err)) {
+		WRN(sess, "Sending heartbeat failed, posting msg to QP failed,"
+		    " errno: %d\n", err);
+		return err;
+	}
+
+	ibtrs_heartbeat_set_send_ts(&sess->heartbeat);
+
+	return err;
+}
+
+static void heartbeat_work(struct work_struct *work)
+{
+	int err;
+	struct ibtrs_session *sess;
+
+	sess = container_of(to_delayed_work(work), struct ibtrs_session,
+			    heartbeat_dwork);
+
+	if (ibtrs_heartbeat_timeout_is_expired(&sess->heartbeat)) {
+		ssm_schedule_event(sess, SSM_EV_RECONNECT_HEARTBEAT);
+		return;
+	}
+
+	ibtrs_heartbeat_warn(&sess->heartbeat);
+
+	if (ibtrs_heartbeat_send_ts_diff_ms(&sess->heartbeat) >=
+	    HEARTBEAT_INTV_MS) {
+		err = send_heartbeat(sess);
+		if (unlikely(err))
+			WRN(sess, "Sending heartbeat failed, errno: %d\n",
+			    err);
+	}
+
+	if (!schedule_delayed_work(&sess->heartbeat_dwork,
+				   HEARTBEAT_INTV_JIFFIES))
+		WRN(sess, "Schedule heartbeat work failed, already queued?\n");
+}
+
+static int create_cm_id_con(const struct sockaddr_storage *addr,
+			    struct ibtrs_con *con)
+{
+	int err;
+
+	if (addr->ss_family == AF_IB)
+		con->cm_id = rdma_create_id(&init_net,
+					    ibtrs_clt_rdma_cm_ev_handler, con,
+					    RDMA_PS_IB, IB_QPT_RC);
+	else
+		con->cm_id = rdma_create_id(&init_net,
+					    ibtrs_clt_rdma_cm_ev_handler, con,
+					    RDMA_PS_TCP, IB_QPT_RC);
+
+	if (IS_ERR(con->cm_id)) {
+		err = PTR_ERR(con->cm_id);
+		WRN(con->sess, "Failed to create CM ID, errno: %d\n", err);
+		con->cm_id = NULL;
+		return err;
+	}
+
+	return 0;
+}
+
+static int create_ib_sess(struct ibtrs_con *con)
+{
+	int err;
+	struct ibtrs_session *sess = con->sess;
+
+	if (atomic_read(&sess->ib_sess_initialized) == 1)
+		return 0;
+
+	if (WARN_ON(!con->cm_id->device)) {
+		WRN(sess, "Invalid CM ID device\n");
+		return -EINVAL;
+	}
+
+	/* TODO: ib_device_hold(con->cm_id->device) */
+	sess->ib_device = con->cm_id->device;
+
+	/* For performance reasons we want one completion vector (and thus
+	 * one interrupt) per CPU; warn if the hardware provides fewer, but
+	 * continue anyway.
+	 */
+	if (sess->ib_device->num_comp_vectors < num_online_cpus()) {
+		WRN(sess,
+		    "%d cq vectors available, not enough to have one IRQ per"
+		    " CPU, >= %d vectors required, continuing anyway.\n",
+		    sess->ib_device->num_comp_vectors, num_online_cpus());
+	}
+
+	err = ib_session_init(sess->ib_device, &sess->ib_sess);
+	if (err) {
+		WRN(sess, "Failed to initialize IB session, errno: %d\n", err);
+		goto err_out;
+	}
+
+	err = query_fast_reg_mode(con);
+	if (err) {
+		WRN(sess, "Failed to query fast registration mode, errno: %d\n",
+		    err);
+		goto err_sess;
+	}
+
+	err = alloc_sess_init_bufs(sess);
+	if (err) {
+		ERR(sess, "Failed to allocate sess bufs, errno: %d\n", err);
+		goto err_sess;
+	}
+
+	sess->msg_wq = alloc_ordered_workqueue("sess_msg_wq", 0);
+	if (!sess->msg_wq) {
+		ERR(sess, "Failed to create user message workqueue\n");
+		err = -ENOMEM;
+		goto err_buff;
+	}
+
+	atomic_set(&sess->ib_sess_initialized, 1);
+
+	return 0;
+
+err_buff:
+	free_sess_init_bufs(sess);
+err_sess:
+	ib_session_destroy(&sess->ib_sess);
+err_out:
+	/* TODO: ib_device_put(sess->ib_device) */
+	sess->ib_device = NULL;
+	return err;
+}
+
+static void ibtrs_clt_destroy_ib_session(struct ibtrs_session *sess)
+{
+	if (sess->ib_device) {
+		free_sess_bufs(sess);
+		destroy_workqueue(sess->msg_wq);
+		/* TODO: ib_device_put(sess->ib_device) */
+		sess->ib_device = NULL;
+	}
+
+	if (atomic_cmpxchg(&sess->ib_sess_initialized, 1, 0) == 1)
+		ib_session_destroy(&sess->ib_sess);
+
+	if (sess->ib_sess_destroy_completion)
+		complete_all(sess->ib_sess_destroy_completion);
+}
+
+static void free_con_fast_pool(struct ibtrs_con *con)
+{
+	if (con->user)
+		return;
+	if (con->sess->fast_reg_mode == IBTRS_FAST_MEM_FMR)
+		return;
+	if (con->sess->fast_reg_mode == IBTRS_FAST_MEM_FR) {
+		ibtrs_destroy_fr_pool(con->fr_pool);
+		con->fr_pool = NULL;
+	}
+}
+
+static int alloc_con_fast_pool(struct ibtrs_con *con)
+{
+	int err = 0;
+	struct ibtrs_fr_pool *fr_pool;
+	struct ibtrs_session *sess = con->sess;
+
+	if (con->user)
+		return 0;
+
+	if (sess->fast_reg_mode == IBTRS_FAST_MEM_FMR)
+		return 0;
+
+	if (sess->fast_reg_mode == IBTRS_FAST_MEM_FR) {
+		fr_pool = alloc_fr_pool(sess);
+		if (IS_ERR(fr_pool)) {
+			err = PTR_ERR(fr_pool);
+			ERR(sess, "FR pool allocation failed, errno: %d\n",
+			    err);
+			return err;
+		}
+		con->fr_pool = fr_pool;
+	}
+
+	return err;
+}
+
+static void ibtrs_clt_destroy_cm_id(struct ibtrs_con *con)
+{
+	if (!con->device_being_removed) {
+		rdma_destroy_id(con->cm_id);
+		con->cm_id = NULL;
+	}
+}
+
+static void con_destroy(struct ibtrs_con *con)
+{
+	if (con->user) {
+		cancel_delayed_work_sync(&con->sess->heartbeat_dwork);
+		drain_workqueue(con->cq_wq);
+		cancel_work_sync(&con->cq_work);
+	}
+	fail_outstanding_reqs(con);
+	ib_con_destroy(&con->ib_con);
+	free_con_fast_pool(con);
+	if (con->user)
+		free_sess_tr_bufs(con->sess);
+	ibtrs_clt_destroy_cm_id(con);
+
+	/* notify possible user msg ACK thread waiting for a tx iu or user msg
+	 * buffer so they can check the connection state, give up waiting and
+	 * put back any tx_iu reserved
+	 */
+	if (con->user) {
+		wake_up(&con->sess->mu_buf_wait_q);
+		wake_up(&con->sess->mu_iu_wait_q);
+	}
+}
+
+int ibtrs_clt_stats_migration_cnt_to_str(struct ibtrs_session *sess, char *buf,
+					 size_t len)
+{
+	int i;
+	size_t used = 0;
+
+	used += scnprintf(buf + used, len - used, "    ");
+
+	for (i = 0; i < num_online_cpus(); i++)
+		used += scnprintf(buf + used, len - used, " CPU%u", i);
+
+	used += scnprintf(buf + used, len - used, "\nfrom:");
+
+	for (i = 0; i < num_online_cpus(); i++)
+		used += scnprintf(buf + used, len - used, " %d",
+				 atomic_read(&sess->stats.cpu_migr.from[i]));
+
+	used += scnprintf(buf + used, len - used, "\n"
+			 "to  :");
+
+	for (i = 0; i < num_online_cpus(); i++)
+		used += scnprintf(buf + used, len - used, " %d",
+				 sess->stats.cpu_migr.to[i]);
+
+	used += scnprintf(buf + used, len - used, "\n");
+
+	return used;
+}
+
+int ibtrs_clt_reset_reconnects_stat(struct ibtrs_session *sess, bool enable)
+{
+	if (enable) {
+		memset(&sess->stats.reconnects, 0,
+		       sizeof(sess->stats.reconnects));
+		return 0;
+	} else {
+		return -EINVAL;
+	}
+}
+
+int ibtrs_clt_stats_reconnects_to_str(struct ibtrs_session *sess, char *buf,
+				      size_t len)
+{
+	return scnprintf(buf, len, "%u %u\n",
+			sess->stats.reconnects.successful_cnt,
+			sess->stats.reconnects.fail_cnt);
+}
+
+int ibtrs_clt_reset_user_ib_msgs_stats(struct ibtrs_session *sess, bool enable)
+{
+	if (enable) {
+		memset(&sess->stats.user_ib_msgs, 0,
+		       sizeof(sess->stats.user_ib_msgs));
+		return 0;
+	} else {
+		return -EINVAL;
+	}
+}
+
+int ibtrs_clt_stats_user_ib_msgs_to_str(struct ibtrs_session *sess, char *buf,
+					size_t len)
+{
+	return scnprintf(buf, len, "%u %llu %u %llu\n",
+			sess->stats.user_ib_msgs.recv_msg_cnt,
+			sess->stats.user_ib_msgs.recv_size,
+			sess->stats.user_ib_msgs.sent_msg_cnt,
+			sess->stats.user_ib_msgs.sent_size);
+}
+
+static u32 ibtrs_clt_stats_get_max_wc_cnt(struct ibtrs_session *sess)
+{
+	int i;
+	u32 max = 0;
+
+	for (i = 0; i < num_online_cpus(); i++)
+		if (max < sess->stats.wc_comp[i].max_wc_cnt)
+			max = sess->stats.wc_comp[i].max_wc_cnt;
+	return max;
+}
+
+static u32 ibtrs_clt_stats_get_avg_wc_cnt(struct ibtrs_session *sess)
+{
+	int i;
+	u32 cnt = 0;
+	u64 sum = 0;
+
+	for (i = 0; i < num_online_cpus(); i++) {
+		sum += sess->stats.wc_comp[i].total_cnt;
+		cnt += sess->stats.wc_comp[i].cnt;
+	}
+
+	return cnt ? sum / cnt : 0;
+}
+
+int ibtrs_clt_stats_wc_completion_to_str(struct ibtrs_session *sess, char *buf,
+					 size_t len)
+{
+	return scnprintf(buf, len, "%u %u\n",
+			ibtrs_clt_stats_get_max_wc_cnt(sess),
+			ibtrs_clt_stats_get_avg_wc_cnt(sess));
+}
+
+static void sess_destroy_handler(struct work_struct *work)
+{
+	struct sess_destroy_sm_wq_work *w;
+
+	w = container_of(work, struct sess_destroy_sm_wq_work, work);
+
+	put_sess(w->sess);
+	kvfree(w);
+}
+
+static void sess_schedule_destroy(struct ibtrs_session *sess)
+{
+	struct sess_destroy_sm_wq_work *w;
+
+	while (true) {
+		w = ibtrs_malloc(sizeof(*w));
+		if (w)
+			break;
+		cond_resched();
+	}
+
+	w->sess = sess;
+	INIT_WORK(&w->work, sess_destroy_handler);
+	ibtrs_clt_destroy_sess_files(&sess->kobj, &sess->kobj_stats);
+	queue_work(ibtrs_wq, &w->work);
+}
+
+int ibtrs_clt_reset_wc_comp_stats(struct ibtrs_session *sess, bool enable)
+{
+	if (enable) {
+		memset(sess->stats.wc_comp, 0,
+		       num_online_cpus() * sizeof(*sess->stats.wc_comp));
+		return 0;
+	} else {
+		return -EINVAL;
+	}
+}
+
+static int ibtrs_clt_init_wc_comp_stats(struct ibtrs_session *sess)
+{
+	sess->stats.wc_comp = kcalloc(num_online_cpus(),
+				      sizeof(*sess->stats.wc_comp),
+				      GFP_KERNEL);
+	if (!sess->stats.wc_comp)
+		return -ENOMEM;
+
+	return 0;
+}
+
+int ibtrs_clt_reset_cpu_migr_stats(struct ibtrs_session *sess, bool enable)
+{
+	if (enable) {
+		memset(sess->stats.cpu_migr.from, 0,
+		       num_online_cpus() *
+		       sizeof(*sess->stats.cpu_migr.from));
+
+		memset(sess->stats.cpu_migr.to, 0,
+		       num_online_cpus() * sizeof(*sess->stats.cpu_migr.to));
+		return 0;
+	} else {
+		return -EINVAL;
+	}
+}
+
+static int ibtrs_clt_init_cpu_migr_stats(struct ibtrs_session *sess)
+{
+	sess->stats.cpu_migr.from = kcalloc(num_online_cpus(),
+					    sizeof(*sess->stats.cpu_migr.from),
+					    GFP_KERNEL);
+	if (!sess->stats.cpu_migr.from)
+		return -ENOMEM;
+
+	sess->stats.cpu_migr.to = kcalloc(num_online_cpus(),
+					  sizeof(*sess->stats.cpu_migr.to),
+					  GFP_KERNEL);
+	if (!sess->stats.cpu_migr.to) {
+		kfree(sess->stats.cpu_migr.from);
+		sess->stats.cpu_migr.from = NULL;
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
+static int ibtrs_clt_init_sg_list_distr_stats(struct ibtrs_session *sess)
+{
+	int i;
+
+	sess->stats.sg_list_distr = kmalloc_array(num_online_cpus(),
+					    sizeof(*sess->stats.sg_list_distr),
+					    GFP_KERNEL);
+
+	if (!sess->stats.sg_list_distr)
+		return -ENOMEM;
+
+	for (i = 0; i < num_online_cpus(); i++) {
+		sess->stats.sg_list_distr[i] =
+			kzalloc_node(sizeof(*sess->stats.sg_list_distr[0]) *
+				     (SG_DISTR_LEN + 1),
+				     GFP_KERNEL, cpu_to_node(i));
+		if (!sess->stats.sg_list_distr[i])
+			goto err;
+	}
+
+	sess->stats.sg_list_total = kcalloc(num_online_cpus(),
+					sizeof(*sess->stats.sg_list_total),
+					GFP_KERNEL);
+	if (!sess->stats.sg_list_total)
+		goto err;
+
+	return 0;
+
+err:
+	for (; i > 0; i--)
+		kfree(sess->stats.sg_list_distr[i - 1]);
+
+	kfree(sess->stats.sg_list_distr);
+	sess->stats.sg_list_distr = NULL;
+
+	return -ENOMEM;
+}
+
+int ibtrs_clt_reset_sg_list_distr_stats(struct ibtrs_session *sess,
+					bool enable)
+{
+	int i;
+
+	if (enable) {
+		memset(sess->stats.sg_list_total, 0,
+		       num_online_cpus() *
+		       sizeof(*sess->stats.sg_list_total));
+
+		for (i = 0; i < num_online_cpus(); i++)
+			memset(sess->stats.sg_list_distr[i], 0,
+			       sizeof(*sess->stats.sg_list_distr[0]) *
+			       (SG_DISTR_LEN + 1));
+		return 0;
+	} else {
+		return -EINVAL;
+	}
+}
+
+ssize_t ibtrs_clt_stats_rdma_lat_distr_to_str(struct ibtrs_session *sess,
+					      char *page, size_t len)
+{
+	ssize_t cnt = 0;
+	int i, cpu;
+	struct ibtrs_clt_stats *s = &sess->stats;
+	struct ibtrs_clt_stats_rdma_lat_entry res[MAX_LOG_LATENCY -
+						  MIN_LOG_LATENCY + 2];
+	struct ibtrs_clt_stats_rdma_lat_entry max;
+
+	max.write	= 0;
+	max.read	= 0;
+	for (cpu = 0; cpu < num_online_cpus(); cpu++) {
+		if (max.write < s->rdma_lat_max[cpu].write)
+			max.write = s->rdma_lat_max[cpu].write;
+		if (max.read < s->rdma_lat_max[cpu].read)
+			max.read = s->rdma_lat_max[cpu].read;
+	}
+
+	for (i = 0; i < ARRAY_SIZE(res); i++) {
+		res[i].write	= 0;
+		res[i].read	= 0;
+		for (cpu = 0; cpu < num_online_cpus(); cpu++) {
+			res[i].write += s->rdma_lat_distr[cpu][i].write;
+			res[i].read += s->rdma_lat_distr[cpu][i].read;
+		}
+	}
+
+	for (i = 0; i < ARRAY_SIZE(res) - 1; i++)
+		cnt += scnprintf(page + cnt, len - cnt,
+				 "< %6d ms: %llu %llu\n",
+				 1 << (i + MIN_LOG_LATENCY), res[i].read,
+				 res[i].write);
+	cnt += scnprintf(page + cnt, len - cnt, ">= %5d ms: %llu %llu\n",
+			 1 << (i - 1 + MIN_LOG_LATENCY), res[i].read,
+			 res[i].write);
+	cnt += scnprintf(page + cnt, len - cnt, " maximum ms: %llu %llu\n",
+			 max.read, max.write);
+
+	return cnt;
+}
+
+int ibtrs_clt_reset_rdma_lat_distr_stats(struct ibtrs_session *sess,
+					 bool enable)
+{
+	int i;
+	struct ibtrs_clt_stats *s = &sess->stats;
+
+	if (enable) {
+		memset(s->rdma_lat_max, 0,
+		       num_online_cpus() * sizeof(*s->rdma_lat_max));
+
+		for (i = 0; i < num_online_cpus(); i++)
+			memset(s->rdma_lat_distr[i], 0,
+			       sizeof(*s->rdma_lat_distr[0]) *
+			       (MAX_LOG_LATENCY - MIN_LOG_LATENCY + 2));
+	}
+	sess->enable_rdma_lat = enable;
+	return 0;
+}
+
+static int ibtrs_clt_init_rdma_lat_distr_stats(struct ibtrs_session *sess)
+{
+	int i;
+	struct ibtrs_clt_stats *s = &sess->stats;
+
+	s->rdma_lat_max = kzalloc(num_online_cpus() *
+				  sizeof(*s->rdma_lat_max), GFP_KERNEL);
+	if (!s->rdma_lat_max)
+		return -ENOMEM;
+
+	s->rdma_lat_distr = kmalloc_array(num_online_cpus(),
+					  sizeof(*s->rdma_lat_distr),
+					  GFP_KERNEL);
+	if (!s->rdma_lat_distr)
+		goto err1;
+
+	for (i = 0; i < num_online_cpus(); i++) {
+		s->rdma_lat_distr[i] =
+			kzalloc_node(sizeof(*s->rdma_lat_distr[0]) *
+				     (MAX_LOG_LATENCY - MIN_LOG_LATENCY + 2),
+				     GFP_KERNEL, cpu_to_node(i));
+		if (!s->rdma_lat_distr[i])
+			goto err2;
+	}
+
+	return 0;
+
+err2:
+	for (; i >= 0; i--)
+		kfree(s->rdma_lat_distr[i]);
+
+	kfree(s->rdma_lat_distr);
+	s->rdma_lat_distr = NULL;
+err1:
+	kfree(s->rdma_lat_max);
+	s->rdma_lat_max = NULL;
+
+	return -ENOMEM;
+}
+
+int ibtrs_clt_reset_rdma_stats(struct ibtrs_session *sess, bool enable)
+{
+	if (enable) {
+		struct ibtrs_clt_stats *s = &sess->stats;
+
+		memset(s->rdma_stats, 0,
+		       num_online_cpus() * sizeof(*s->rdma_stats));
+		return 0;
+	} else {
+		return -EINVAL;
+	}
+}
+
+static int ibtrs_clt_init_rdma_stats(struct ibtrs_session *sess)
+{
+	struct ibtrs_clt_stats *s = &sess->stats;
+
+	s->rdma_stats = kcalloc(num_online_cpus(), sizeof(*s->rdma_stats),
+				GFP_KERNEL);
+	if (!s->rdma_stats)
+		return -ENOMEM;
+
+	return 0;
+}
+
+ssize_t ibtrs_clt_reset_all_help(struct ibtrs_session *sess,
+				 char *page, size_t len)
+{
+	return scnprintf(page, len, "echo 1 to reset all statistics\n");
+}
+
+int ibtrs_clt_reset_all_stats(struct ibtrs_session *sess, bool enable)
+{
+	if (enable) {
+		ibtrs_clt_reset_rdma_stats(sess, enable);
+		ibtrs_clt_reset_rdma_lat_distr_stats(sess, enable);
+		ibtrs_clt_reset_sg_list_distr_stats(sess, enable);
+		ibtrs_clt_reset_cpu_migr_stats(sess, enable);
+		ibtrs_clt_reset_user_ib_msgs_stats(sess, enable);
+		ibtrs_clt_reset_reconnects_stat(sess, enable);
+		ibtrs_clt_reset_wc_comp_stats(sess, enable);
+		return 0;
+	} else {
+		return -EINVAL;
+	}
+}
+
+static int ibtrs_clt_init_stats(struct ibtrs_session *sess)
+{
+	int err;
+
+	err = ibtrs_clt_init_sg_list_distr_stats(sess);
+	if (err) {
+		ERR(sess,
+		    "Failed to init S/G list distribution stats, errno: %d\n",
+		    err);
+		return err;
+	}
+
+	err = ibtrs_clt_init_cpu_migr_stats(sess);
+	if (err) {
+		ERR(sess, "Failed to init CPU migration stats, errno: %d\n",
+		    err);
+		goto err_sg_list;
+	}
+
+	err = ibtrs_clt_init_rdma_lat_distr_stats(sess);
+	if (err) {
+		ERR(sess,
+		    "Failed to init RDMA lat distribution stats, errno: %d\n",
+		    err);
+		goto err_migr;
+	}
+
+	err = ibtrs_clt_init_wc_comp_stats(sess);
+	if (err) {
+		ERR(sess, "Failed to init WC completion stats, errno: %d\n",
+		    err);
+		goto err_rdma_lat;
+	}
+
+	err = ibtrs_clt_init_rdma_stats(sess);
+	if (err) {
+		ERR(sess, "Failed to init RDMA stats, errno: %d\n",
+		    err);
+		goto err_wc_comp;
+	}
+
+	return 0;
+
+err_wc_comp:
+	ibtrs_clt_free_wc_comp_stats(sess);
+err_rdma_lat:
+	ibtrs_clt_free_rdma_lat_stats(sess);
+err_migr:
+	ibtrs_clt_free_cpu_migr_stats(sess);
+err_sg_list:
+	ibtrs_clt_free_sg_list_distr_stats(sess);
+	return err;
+}
+
+static void ibtrs_clt_sess_reconnect_worker(struct work_struct *work)
+{
+	struct ibtrs_session *sess = container_of(to_delayed_work(work),
+						  struct ibtrs_session,
+						  reconnect_dwork);
+
+	ssm_schedule_event(sess, SSM_EV_RECONNECT);
+}
+
+static int sess_init_cons(struct ibtrs_session *sess)
+{
+	int i;
+
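+	/* con[0] is the user connection and processes completions in a
+	 * high priority workqueue; all other (IO) connections use tasklets.
+	 */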
+	for (i = 0; i < CONS_PER_SESSION; i++) {
+		struct ibtrs_con *con = &sess->con[i];
+
+		csm_set_state(con, CSM_STATE_CLOSED);
+		con->sess = sess;
+		if (!i) {
+			INIT_WORK(&con->cq_work, wrapper_handle_cq_comp);
+			con->cq_wq =
+				alloc_ordered_workqueue("ibtrs_clt_wq",
+							WQ_HIGHPRI);
+			if (!con->cq_wq) {
+				ERR(sess, "Failed to allocate cq workqueue.\n");
+				return -ENOMEM;
+			}
+		} else {
+			tasklet_init(&con->cq_tasklet,
+				     tasklet_handle_cq_comp,
+				     (unsigned long)con);
+		}
+	}
+
+	return 0;
+}
+
+static struct ibtrs_session *sess_init(const struct sockaddr_storage *addr,
+				       size_t pdu_sz, void *priv,
+				       u8 reconnect_delay_sec,
+				       u16 max_segments,
+				       s16 max_reconnect_attempts)
+{
+	int err;
+	struct ibtrs_session *sess;
+
+	sess = kzalloc(sizeof(*sess), GFP_KERNEL);
+	if (!sess) {
+		err = -ENOMEM;
+		goto err;
+	}
+	atomic_set(&sess->refcount, 1);
+	sess->sm_wq = create_workqueue("sess_sm_wq");
+
+	if (!sess->sm_wq) {
+		ERR_NP("Failed to create SSM workqueue\n");
+		err = -ENOMEM;
+		goto err_free_sess;
+	}
+
+	sess->peer_addr	= *addr;
+	sess->pdu_sz	= pdu_sz;
+	sess->priv	= priv;
+	sess->con	= kcalloc(CONS_PER_SESSION, sizeof(*sess->con),
+				  GFP_KERNEL);
+	if (!sess->con) {
+		err = -ENOMEM;
+		goto err_free_sm_wq;
+	}
+
+	sess->rdma_info_iu = NULL;
+	err = sess_init_cons(sess);
+	if (err) {
+		ERR_NP("Failed to initialize cons\n");
+		goto err_free_con;
+	}
+
+	err = ibtrs_clt_init_stats(sess);
+	if (err) {
+		ERR_NP("Failed to initialize statistics\n");
+		goto err_cons;
+	}
+
+	sess->reconnect_delay_sec	= reconnect_delay_sec;
+	sess->max_reconnect_attempts	= max_reconnect_attempts;
+	sess->max_pages_per_mr		= max_segments;
+	init_waitqueue_head(&sess->wait_q);
+	init_waitqueue_head(&sess->mu_iu_wait_q);
+	init_waitqueue_head(&sess->mu_buf_wait_q);
+
+	init_waitqueue_head(&sess->tags_wait);
+	sess->state = SSM_STATE_IDLE;
+	mutex_lock(&sess_mutex);
+	list_add(&sess->list, &sess_list);
+	mutex_unlock(&sess_mutex);
+
+	ibtrs_set_heartbeat_timeout(&sess->heartbeat,
+				    default_heartbeat_timeout_ms <
+				    MIN_HEARTBEAT_TIMEOUT_MS ?
+				    MIN_HEARTBEAT_TIMEOUT_MS :
+				    default_heartbeat_timeout_ms);
+	atomic64_set(&sess->heartbeat.send_ts_ms, 0);
+	atomic64_set(&sess->heartbeat.recv_ts_ms, 0);
+	sess->heartbeat.addr = sess->addr;
+	sess->heartbeat.hostname = sess->hostname;
+
+	INIT_DELAYED_WORK(&sess->heartbeat_dwork, heartbeat_work);
+	INIT_DELAYED_WORK(&sess->reconnect_dwork,
+			  ibtrs_clt_sess_reconnect_worker);
+
+	return sess;
+
+err_cons:
+	sess_deinit_cons(sess);
+err_free_con:
+	kfree(sess->con);
+	sess->con = NULL;
+err_free_sm_wq:
+	destroy_workqueue(sess->sm_wq);
+err_free_sess:
+	kfree(sess);
+err:
+	return ERR_PTR(err);
+}
+
+static int init_con(struct ibtrs_session *sess, struct ibtrs_con *con,
+		    short cpu, bool user)
+{
+	int err;
+
+	con->sess			= sess;
+	con->cpu			= cpu;
+	con->user			= user;
+	con->device_being_removed	= false;
+
+	err = create_cm_id_con(&sess->peer_addr, con);
+	if (err) {
+		ERR(sess, "Failed to create CM ID for connection\n");
+		return err;
+	}
+
+	csm_set_state(con, CSM_STATE_RESOLVING_ADDR);
+	err = resolve_addr(con, &sess->peer_addr);
+	if (err) {
+		ERR(sess, "Failed to resolve address, errno: %d\n", err);
+		goto err_cm_id;
+	}
+
+	sess->active_cnt++;
+
+	return 0;
+
+err_cm_id:
+	csm_set_state(con, CSM_STATE_CLOSED);
+	ibtrs_clt_destroy_cm_id(con);
+
+	return err;
+}
+
+static int create_con(struct ibtrs_con *con)
+{
+	int err, cq_vector;
+	u16 cq_size, wr_queue_size;
+	struct ibtrs_session *sess = con->sess;
+	int num_wr = DIV_ROUND_UP(con->sess->max_pages_per_mr,
+				  con->sess->max_sge);
+
+	if (con->user) {
+		err = create_ib_sess(con);
+		if (err) {
+			ERR(sess,
+			    "Failed to create IB session, errno: %d\n", err);
+			goto err_cm_id;
+		}
+		cq_size		= USR_CON_BUF_SIZE + 1;
+		wr_queue_size	= USR_CON_BUF_SIZE + 1;
+	} else {
+		err = ib_get_max_wr_queue_size(sess->ib_device);
+		if (err < 0)
+			goto err_cm_id;
+		cq_size		= sess->queue_depth;
+		wr_queue_size	= min_t(int, err - 1,
+					sess->queue_depth * num_wr *
+					(use_fr ? 3 : 2));
+	}
+
+	err = alloc_con_fast_pool(con);
+	if (err) {
+		ERR(sess, "Failed to allocate fast memory "
+		    "pool, errno: %d\n", err);
+		goto err_cm_id;
+	}
+	con->ib_con.addr = sess->addr;
+	con->ib_con.hostname = sess->hostname;
+	cq_vector = con->cpu % sess->ib_device->num_comp_vectors;
+	err = ib_con_init(&con->ib_con, con->cm_id,
+			  sess->max_sge, cq_event_handler, con, cq_vector,
+			  cq_size, wr_queue_size, &sess->ib_sess);
+	if (err) {
+		ERR(sess,
+		    "Failed to initialize IB connection, errno: %d\n", err);
+		goto err_pool;
+	}
+
+	DEB("ib_con_init successful\n");
+	err = post_recv(con);
+	if (err)
+		goto err_ib_con;
+
+	err = connect_qp(con);
+	if (err) {
+		ERR(con->sess, "Failed to connect QP, errno: %d\n", err);
+		goto err_wq;
+	}
+
+	DEB("connect qp successful\n");
+	atomic_set(&con->io_cnt, 0);
+	return 0;
+
+err_wq:
+	rdma_disconnect(con->cm_id);
+err_ib_con:
+	ib_con_destroy(&con->ib_con);
+err_pool:
+	free_con_fast_pool(con);
+err_cm_id:
+	ibtrs_clt_destroy_cm_id(con);
+
+	return err;
+}
+
+struct ibtrs_session *ibtrs_clt_open(const struct sockaddr_storage *addr,
+				     size_t pdu_sz, void *priv,
+				     u8 reconnect_delay_sec, u16 max_segments,
+				     s16 max_reconnect_attempts)
+{
+	int err;
+	struct ibtrs_session *sess;
+	char str_addr[IBTRS_ADDRLEN];
+
+	if (!clt_ops_are_valid(clt_ops)) {
+		ERR_NP("User module did not register ops callbacks\n");
+		err = -EINVAL;
+		goto err;
+	}
+
+	err = ibtrs_addr_to_str(addr, str_addr, sizeof(str_addr));
+	if (err < 0) {
+		ERR_NP("Establishing session to server failed, converting"
+		       " addr from binary to string failed, errno: %d\n", err);
+		return ERR_PTR(err);
+	}
+
+	INFO_NP("Establishing session to server %s\n", str_addr);
+
+	sess = sess_init(addr, pdu_sz, priv, reconnect_delay_sec,
+			 max_segments, max_reconnect_attempts);
+	if (IS_ERR(sess)) {
+		ERR_NP("Establishing session to %s failed, errno: %ld\n",
+		       str_addr, PTR_ERR(sess));
+		err = PTR_ERR(sess);
+		goto err;
+	}
+
+	get_sess(sess);
+	strlcpy(sess->addr, str_addr, sizeof(sess->addr));
+	err = init_con(sess, &sess->con[0], 0, true);
+	if (err) {
+		ERR(sess, "Establishing session to server failed,"
+		    " failed to init user connection, errno: %d\n", err);
+		/* Always return 'No route to host' when the connection can't be
+		 * established.
+		 */
+		err = -EHOSTUNREACH;
+		goto err1;
+	}
+
+	err = wait_for_ssm_state(sess, SSM_STATE_CONNECTED);
+	if (err) {
+		ERR(sess, "Establishing session to server failed,"
+		    " failed to establish connections, errno: %d\n", err);
+		put_sess(sess);
+		goto err; /* state machine will do the clean up. */
+	}
+	err = ibtrs_clt_create_sess_files(&sess->kobj, &sess->kobj_stats,
+					  sess->addr);
+	if (err) {
+		ERR(sess, "Establishing session to server failed,"
+		    " failed to create session sysfs files, errno: %d\n", err);
+		put_sess(sess);
+		ibtrs_clt_close(sess);
+		goto err;
+	}
+
+	put_sess(sess);
+	return sess;
+
+err1:
+	destroy_workqueue(sess->sm_wq);
+	sess_deinit_cons(sess);
+	kfree(sess->con);
+	sess->con = NULL;
+	ibtrs_clt_free_stats(sess);
+	mutex_lock(&sess_mutex);
+	list_del(&sess->list);
+	mutex_unlock(&sess_mutex);
+	kfree(sess);
+err:
+	return ERR_PTR(err);
+}
+EXPORT_SYMBOL(ibtrs_clt_open);
+
+int ibtrs_clt_close(struct ibtrs_session *sess)
+{
+	struct completion dc;
+
+	INFO(sess, "Session will be disconnected\n");
+
+	init_completion(&dc);
+	sess->destroy_completion = &dc;
+	ssm_schedule_event(sess, SSM_EV_SESS_CLOSE);
+	wait_for_completion(&dc);
+
+	return 0;
+}
+EXPORT_SYMBOL(ibtrs_clt_close);
+
+int ibtrs_clt_reconnect(struct ibtrs_session *sess)
+{
+	ssm_schedule_event(sess, SSM_EV_RECONNECT_USER);
+
+	INFO(sess, "Session reconnect event queued\n");
+
+	return 0;
+}
+
+void ibtrs_clt_set_max_reconnect_attempts(struct ibtrs_session *sess, s16 value)
+{
+	sess->max_reconnect_attempts = value;
+}
+
+s16 ibtrs_clt_get_max_reconnect_attempts(const struct ibtrs_session *sess)
+{
+	return sess->max_reconnect_attempts;
+}
+
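+/*
+ * Bucket an S/G list length into the distribution histogram: lengths up to
+ * MAX_LIN_SG are counted in linear buckets, longer lists in log2 buckets;
+ * anything past the last bucket is accumulated in the final slot.
+ */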
+static inline
+void ibtrs_clt_record_sg_distr(u64 *stat, u64 *total, unsigned int cnt)
+{
+	int i;
+
+	i = cnt > MAX_LIN_SG ? ilog2(cnt) + MAX_LIN_SG - MIN_LOG_SG + 1 : cnt;
+	i = i > SG_DISTR_LEN ? SG_DISTR_LEN : i;
+
+	stat[i]++;
+	(*total)++;
+}
+
+static int ibtrs_clt_rdma_write_desc(struct ibtrs_con *con,
+				     struct rdma_req *req, u64 buf,
+				     size_t u_msg_len, u32 imm,
+				     struct ibtrs_msg_rdma_write *msg)
+{
+	int ret;
+	size_t ndesc = con->sess->max_pages_per_mr;
+	struct ibtrs_sg_desc *desc;
+
+	desc = kmalloc_array(ndesc, sizeof(*desc), GFP_ATOMIC);
+	if (!desc) {
+		ib_dma_unmap_sg(con->sess->ib_device, req->sglist,
+				req->sg_cnt, req->dir);
+		return -ENOMEM;
+	}
+	ret = ibtrs_fast_reg_map_data(con, desc, req);
+	if (unlikely(ret < 0)) {
+		ERR_RL(con->sess,
+		       "RDMA-Write failed, fast reg. data mapping"
+		       " failed, errno: %d\n", ret);
+		ib_dma_unmap_sg(con->sess->ib_device, req->sglist,
+				req->sg_cnt, req->dir);
+		kfree(desc);
+		return ret;
+	}
+	ret = ibtrs_post_send_rdma_desc(con, req, desc, ret, buf,
+					u_msg_len + sizeof(*msg), imm);
+	if (unlikely(ret)) {
+		ERR(con->sess, "RDMA-Write failed, posting work"
+		    " request failed, errno: %d\n", ret);
+		ibtrs_unmap_fast_reg_data(con, req);
+		ib_dma_unmap_sg(con->sess->ib_device, req->sglist,
+				req->sg_cnt, req->dir);
+	}
+	kfree(desc);
+	return ret;
+}
+
+static int ibtrs_clt_rdma_write_sg(struct ibtrs_con *con, struct rdma_req *req,
+				   const struct kvec *vec, size_t u_msg_len,
+				   size_t data_len)
+{
+	int count = 0;
+	struct ibtrs_msg_rdma_write *msg;
+	u32 imm;
+	int ret;
+	int buf_id;
+	u64 buf;
+
+	const u32 tsize = sizeof(*msg) + data_len + u_msg_len;
+
+	if (unlikely(tsize > con->sess->chunk_size)) {
+		WRN_RL(con->sess, "RDMA-Write failed, data size too big %d >"
+		       " %d\n", tsize, con->sess->chunk_size);
+		return -EMSGSIZE;
+	}
+	if (req->sg_cnt) {
+		count = ib_dma_map_sg(con->sess->ib_device, req->sglist,
+				      req->sg_cnt, req->dir);
+		if (unlikely(!count)) {
+			WRN_RL(con->sess,
+			       "RDMA-Write failed, dma map failed\n");
+			return -EINVAL;
+		}
+	}
+
+	copy_from_kvec(req->iu->buf, vec, u_msg_len);
+
+	/* put ibtrs msg after sg and user message */
+	msg		= req->iu->buf + u_msg_len;
+	msg->hdr.type	= IBTRS_MSG_RDMA_WRITE;
+	msg->hdr.tsize	= tsize;
+
+	/* ibtrs message on server side will be after user data and message */
+	imm = req->tag->mem_id_mask + data_len + u_msg_len;
+	buf_id = req->tag->mem_id;
+	req->sg_size = data_len + u_msg_len + sizeof(*msg);
+
+	buf = con->sess->srv_rdma_addr[buf_id];
+	if (count > fmr_sg_cnt)
+		return ibtrs_clt_rdma_write_desc(con, req, buf, u_msg_len, imm,
+						 msg);
+
+	ret = ibtrs_post_send_rdma_more(con, req, buf, u_msg_len + sizeof(*msg),
+					imm);
+	if (unlikely(ret)) {
+		ERR(con->sess, "RDMA-Write failed, posting work"
+		    " request failed, errno: %d\n", ret);
+		if (count)
+			ib_dma_unmap_sg(con->sess->ib_device, req->sglist,
+					req->sg_cnt, req->dir);
+	}
+	return ret;
+}
+
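+/*
+ * Account a single RDMA read or write of @size bytes on the current CPU;
+ * the per-CPU counters are summed up in ibtrs_clt_stats_rdma_to_str().
+ */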
+static void ibtrs_clt_update_rdma_stats(struct ibtrs_clt_stats *s,
+					size_t size, bool read)
+{
+	int cpu = raw_smp_processor_id();
+
+	if (read) {
+		s->rdma_stats[cpu].cnt_read++;
+		s->rdma_stats[cpu].size_total_read += size;
+	} else {
+		s->rdma_stats[cpu].cnt_write++;
+		s->rdma_stats[cpu].size_total_write += size;
+	}
+
+	s->rdma_stats[cpu].inflight++;
+}
+
+/**
+ * ibtrs_rdma_con_id() - returns RDMA connection id
+ *
+ * Note:
+ *     RDMA connections start at index 1.
+ *     Connection 0 is reserved for user messages.
+ */
+static inline int ibtrs_rdma_con_id(struct ibtrs_tag *tag)
+{
+	return (tag->cpu_id % (CONS_PER_SESSION - 1)) + 1;
+}
+
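+/*
+ * Post an RDMA write transferring the user message and data to the
+ * server-side buffer selected by the tag's mem_id. The IO connection is
+ * chosen based on the tag's CPU id; connection 0 is reserved for user
+ * messages.
+ */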
+int ibtrs_clt_rdma_write(struct ibtrs_session *sess, struct ibtrs_tag *tag,
+			 void *priv, const struct kvec *vec, size_t nr,
+			 size_t data_len, struct scatterlist *sg,
+			 unsigned int sg_len)
+{
+	struct ibtrs_iu *iu;
+	struct rdma_req *req;
+	int err;
+	struct ibtrs_con *con;
+	int con_id;
+	size_t u_msg_len;
+
+	smp_rmb(); /* fence sess->state check */
+	if (unlikely(sess->state != SSM_STATE_CONNECTED)) {
+		ERR_RL(sess,
+		       "RDMA-Write failed, not connected (session state %s)\n",
+		       ssm_state_str(sess->state));
+		return -ECOMM;
+	}
+
+	u_msg_len = kvec_length(vec, nr);
+	if (unlikely(u_msg_len > IO_MSG_SIZE)) {
+		WRN_RL(sess, "RDMA-Write failed, user message size"
+		       " is %zu B big, max size is %d B\n", u_msg_len,
+		       IO_MSG_SIZE);
+		return -EMSGSIZE;
+	}
+
+	con_id = ibtrs_rdma_con_id(tag);
+	if (WARN_ON(con_id >= CONS_PER_SESSION))
+		return -EINVAL;
+	con = &sess->con[con_id];
+	rcu_read_lock();
+	smp_rmb(); /* fence con->state check */
+	if (unlikely(con->state != CSM_STATE_CONNECTED)) {
+		rcu_read_unlock();
+		ERR_RL(sess, "RDMA-Write failed, not connected"
+		       " (connection %d state %s)\n",
+		       con_id,
+		       csm_state_str(con->state));
+		return -ECOMM;
+	}
+
+	iu = sess->io_tx_ius[tag->mem_id];
+	req = &sess->reqs[tag->mem_id];
+	req->con	= con;
+	req->tag	= tag;
+	if (sess->enable_rdma_lat)
+		req->start_time = ibtrs_clt_get_raw_ms();
+	req->in_use	= true;
+
+	req->iu		= iu;
+	req->sglist	= sg;
+	req->sg_cnt	= sg_len;
+	req->priv	= priv;
+	req->dir        = DMA_TO_DEVICE;
+
+	err = ibtrs_clt_rdma_write_sg(con, req, vec, u_msg_len, data_len);
+	rcu_read_unlock();
+	if (unlikely(err)) {
+		req->in_use = false;
+		ERR_RL(sess, "RDMA-Write failed, failed to transfer scatter"
+		       " gather list, errno: %d\n", err);
+		return err;
+	}
+
+	ibtrs_heartbeat_set_send_ts(&sess->heartbeat);
+	ibtrs_clt_record_sg_distr(sess->stats.sg_list_distr[tag->cpu_id],
+				  &sess->stats.sg_list_total[tag->cpu_id],
+				  sg_len);
+	ibtrs_clt_update_rdma_stats(&sess->stats, u_msg_len + data_len, false);
+
+	return err;
+}
+EXPORT_SYMBOL(ibtrs_clt_rdma_write);
+
+static int ibtrs_clt_request_rdma_write_sg(struct ibtrs_con *con,
+					   struct rdma_req *req,
+					   const struct kvec *vec,
+					   size_t u_msg_len,
+					   size_t result_len)
+{
+	int count, i, ret;
+	struct ibtrs_msg_req_rdma_write *msg;
+	u32 imm;
+	int buf_id;
+	struct scatterlist *sg;
+	struct ib_device *ibdev = con->sess->ib_device;
+	const u32 tsize = sizeof(*msg) + result_len + u_msg_len;
+
+	if (unlikely(tsize > con->sess->chunk_size)) {
+		WRN_RL(con->sess, "Request-RDMA-Write failed, message size is"
+		       " %d, bigger than CHUNK_SIZE %d\n", tsize,
+			con->sess->chunk_size);
+		return -EMSGSIZE;
+	}
+
+	count = ib_dma_map_sg(ibdev, req->sglist, req->sg_cnt, req->dir);
+
+	if (unlikely(!count)) {
+		WRN_RL(con->sess,
+		       "Request-RDMA-Write failed, dma map failed\n");
+		return -EINVAL;
+	}
+
+	req->data_len = result_len;
+	copy_from_kvec(req->iu->buf, vec, u_msg_len);
+
+	/* put our message into req->buf after the user message */
+	msg		= req->iu->buf + u_msg_len;
+	msg->hdr.type	= IBTRS_MSG_REQ_RDMA_WRITE;
+	msg->hdr.tsize	= tsize;
+	msg->sg_cnt	= count;
+
+	if (WARN_ON(msg->hdr.tsize > con->sess->chunk_size))
+		return -EINVAL;
+	if (count > fmr_sg_cnt) {
+		ret = ibtrs_fast_reg_map_data(con, msg->desc, req);
+		if (ret < 0) {
+			ERR_RL(con->sess,
+			       "Request-RDMA-Write failed, failed to map fast"
+			       " reg. data, errno: %d\n", ret);
+			ib_dma_unmap_sg(con->sess->ib_device, req->sglist,
+					req->sg_cnt, req->dir);
+			return ret;
+		}
+		msg->sg_cnt = ret;
+	} else {
+		for_each_sg(req->sglist, sg, req->sg_cnt, i) {
+			msg->desc[i].addr = ib_sg_dma_address(ibdev, sg);
+			msg->desc[i].key = con->sess->ib_sess.mr->rkey;
+			msg->desc[i].len = ib_sg_dma_len(ibdev, sg);
+			DEB("desc addr %llu, len %u, i %d tsize %u\n",
+			    msg->desc[i].addr, msg->desc[i].len, i,
+			    msg->hdr.tsize);
+		}
+		req->nmdesc = 0;
+	}
+	/* ibtrs message will be after the space reserved for disk data and
+	 * user message
+	 */
+	imm = req->tag->mem_id_mask + result_len + u_msg_len;
+	buf_id = req->tag->mem_id;
+
+	req->sg_size = sizeof(*msg) + msg->sg_cnt * IBTRS_SG_DESC_LEN +
+		u_msg_len;
+	ret = ibtrs_post_send_rdma(con, req, con->sess->srv_rdma_addr[buf_id],
+				   result_len, imm);
+	if (unlikely(ret)) {
+		ERR(con->sess, "Request-RDMA-Write failed,"
+		    " posting work request failed, errno: %d\n", ret);
+
+		if (unlikely(count > fmr_sg_cnt)) {
+			ibtrs_unmap_fast_reg_data(con, req);
+			ib_dma_unmap_sg(con->sess->ib_device, req->sglist,
+					req->sg_cnt, req->dir);
+		}
+	}
+	return ret;
+}
+
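+/*
+ * Request an RDMA write from the server: the posted message carries the
+ * descriptors of the local receive buffers (recv_sg), so the server can
+ * write the result directly into them.
+ */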
+int ibtrs_clt_request_rdma_write(struct ibtrs_session *sess,
+				 struct ibtrs_tag *tag, void *priv,
+				 const struct kvec *vec, size_t nr,
+				 size_t result_len,
+				 struct scatterlist *recv_sg,
+				 unsigned int recv_sg_len)
+{
+	struct ibtrs_iu *iu;
+	struct rdma_req *req;
+	int err;
+	struct ibtrs_con *con;
+	int con_id;
+	size_t u_msg_len;
+
+	smp_rmb(); /* fence sess->state check */
+	if (unlikely(sess->state != SSM_STATE_CONNECTED)) {
+		ERR_RL(sess,
+		       "Request-RDMA-Write failed, not connected (session"
+		       " state %s)\n", ssm_state_str(sess->state));
+		return -ECOMM;
+	}
+
+	u_msg_len = kvec_length(vec, nr);
+	if (unlikely(u_msg_len > IO_MSG_SIZE ||
+		     sizeof(struct ibtrs_msg_req_rdma_write) +
+		     recv_sg_len * IBTRS_SG_DESC_LEN > sess->max_req_size)) {
+		WRN_RL(sess, "Request-RDMA-Write failed, user message size"
+		       " is %zu B big, max size is %d B\n", u_msg_len,
+		       IO_MSG_SIZE);
+		return -EMSGSIZE;
+	}
+
+	con_id = ibtrs_rdma_con_id(tag);
+	if (WARN_ON(con_id >= CONS_PER_SESSION))
+		return -EINVAL;
+	con = &sess->con[con_id];
+	rcu_read_lock();
+	smp_rmb(); /* fence con->state check */
+	if (unlikely(con->state != CSM_STATE_CONNECTED)) {
+		rcu_read_unlock();
+		ERR_RL(sess, "Request-RDMA-Write failed, not connected"
+		       " (connection %d state %s)\n",
+		       con_id,
+		       csm_state_str(con->state));
+		return -ECOMM;
+	}
+
+	iu = sess->io_tx_ius[tag->mem_id];
+	req = &sess->reqs[tag->mem_id];
+	req->con	= con;
+	req->tag	= tag;
+	if (sess->enable_rdma_lat)
+		req->start_time = ibtrs_clt_get_raw_ms();
+	req->in_use	= true;
+
+	req->iu		= iu;
+	req->sglist	= recv_sg;
+	req->sg_cnt	= recv_sg_len;
+	req->priv	= priv;
+	req->dir        = DMA_FROM_DEVICE;
+
+	err = ibtrs_clt_request_rdma_write_sg(con, req, vec,
+					      u_msg_len, result_len);
+	rcu_read_unlock();
+	if (unlikely(err)) {
+		req->in_use = false;
+		ERR_RL(sess, "Request-RDMA-Write failed, failed to transfer"
+		       " scatter gather list, errno: %d\n", err);
+		return err;
+	}
+
+	ibtrs_heartbeat_set_send_ts(&sess->heartbeat);
+	ibtrs_clt_record_sg_distr(sess->stats.sg_list_distr[tag->cpu_id],
+				  &sess->stats.sg_list_total[tag->cpu_id],
+				  recv_sg_len);
+	ibtrs_clt_update_rdma_stats(&sess->stats, u_msg_len + result_len, true);
+
+	return err;
+}
+EXPORT_SYMBOL(ibtrs_clt_request_rdma_write);
+
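+/* Claim one of the peer's user message buffers; false if none are left. */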
+static bool ibtrs_clt_get_usr_msg_buf(struct ibtrs_session *sess)
+{
+	return atomic_dec_if_positive(&sess->peer_usr_msg_bufs) >= 0;
+}
+
+int ibtrs_clt_send(struct ibtrs_session *sess, const struct kvec *vec,
+		   size_t nr)
+{
+	struct ibtrs_con *con;
+	struct ibtrs_iu *iu = NULL;
+	struct ibtrs_msg_user *msg;
+	size_t len;
+	bool closed_st = false;
+	int err = 0;
+
+	con = &sess->con[0];
+
+	smp_rmb(); /* fence sess->state check */
+	if (unlikely(con->state != CSM_STATE_CONNECTED ||
+		     sess->state != SSM_STATE_CONNECTED)) {
+		ERR_RL(sess, "Sending user message failed, not connected,"
+		       " Connection state is %s, Session state is %s\n",
+		       csm_state_str(con->state), ssm_state_str(sess->state));
+		return -ECOMM;
+	}
+
+	len = kvec_length(vec, nr);
+
+	DEB("send user msg length=%zu, peer_msg_buf %d\n", len,
+	    atomic_read(&sess->peer_usr_msg_bufs));
+	if (len > sess->max_req_size - IBTRS_HDR_LEN) {
+		ERR_RL(sess, "Sending user message failed,"
+		       " user message length too large (len: %zu)\n", len);
+		return -EMSGSIZE;
+	}
+
+	wait_event(sess->mu_buf_wait_q,
+		   (closed_st = (con->state != CSM_STATE_CONNECTED ||
+				 sess->state != SSM_STATE_CONNECTED)) ||
+		   ibtrs_clt_get_usr_msg_buf(sess));
+
+	if (unlikely(closed_st)) {
+		ERR_RL(sess, "Sending user message failed, not connected,"
+		       " Connection state is %s, Session state is %s\n",
+		       csm_state_str(con->state), ssm_state_str(sess->state));
+		return -ECOMM;
+	}
+
+	wait_event(sess->mu_iu_wait_q,
+		   (closed_st = (con->state != CSM_STATE_CONNECTED ||
+				 sess->state != SSM_STATE_CONNECTED)) ||
+		   (iu = get_u_msg_iu(sess)) != NULL);
+
+	if (unlikely(closed_st)) {
+		ERR_RL(sess, "Sending user message failed, not connected,"
+		       " Connection state is %s, Session state is %s\n",
+		       csm_state_str(con->state), ssm_state_str(sess->state));
+		err = -ECOMM;
+		goto err_iu;
+	}
+
+	rcu_read_lock();
+	smp_rmb(); /* fence con->state check */
+	if (unlikely(con->state != CSM_STATE_CONNECTED)) {
+		rcu_read_unlock();
+		ERR_RL(sess, "Sending user message failed, not connected,"
+		       " Connection state is %s, Session state is %s\n",
+		       csm_state_str(con->state), ssm_state_str(sess->state));
+		err = -ECOMM;
+		goto err_post_send;
+	}
+
+	msg		= iu->buf;
+	msg->hdr.type	= IBTRS_MSG_USER;
+	msg->hdr.tsize	= IBTRS_HDR_LEN + len;
+	copy_from_kvec(msg->payl, vec, len);
+
+	ibtrs_deb_msg_hdr("Sending: ", &msg->hdr);
+	err = ibtrs_post_send(con->ib_con.qp, con->sess->ib_sess.mr, iu,
+			      msg->hdr.tsize);
+	rcu_read_unlock();
+	if (unlikely(err)) {
+		ERR_RL(sess, "Sending user message failed, posting work"
+		       " request failed, errno: %d\n", err);
+		goto err_post_send;
+	}
+
+	sess->stats.user_ib_msgs.sent_msg_cnt++;
+	sess->stats.user_ib_msgs.sent_size += len;
+
+	ibtrs_heartbeat_set_send_ts(&sess->heartbeat);
+
+	return 0;
+
+err_post_send:
+	put_u_msg_iu(sess, iu);
+	wake_up(&sess->mu_iu_wait_q);
+err_iu:
+	atomic_inc(&sess->peer_usr_msg_bufs);
+	wake_up(&sess->mu_buf_wait_q);
+	return err;
+}
+EXPORT_SYMBOL(ibtrs_clt_send);
+
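+/*
+ * Connection state machine (CSM): one handler per state, each handling the
+ * events valid in that state and reporting progress to the session state
+ * machine via ssm_schedule_event().
+ */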
+static void csm_resolving_addr(struct ibtrs_con *con, enum csm_ev ev)
+{
+	DEB("con %p, state %s event %s\n", con, csm_state_str(con->state),
+	    csm_event_str(ev));
+	switch (ev) {
+	case CSM_EV_ADDR_RESOLVED: {
+		int err;
+
+		csm_set_state(con, CSM_STATE_RESOLVING_ROUTE);
+		err = resolve_route(con);
+		if (err) {
+			ERR(con->sess, "Failed to resolve route, errno: %d\n",
+			    err);
+			ibtrs_clt_destroy_cm_id(con);
+			csm_set_state(con, CSM_STATE_CLOSED);
+			ssm_schedule_event(con->sess, SSM_EV_CON_CLOSED);
+		}
+		break;
+		}
+	case CSM_EV_CON_ERROR:
+	case CSM_EV_SESS_CLOSING:
+		ibtrs_clt_destroy_cm_id(con);
+		csm_set_state(con, CSM_STATE_CLOSED);
+		ssm_schedule_event(con->sess, SSM_EV_CON_CLOSED);
+		break;
+	default:
+		WRN(con->sess,
+		    "Unexpected CSM Event '%s' in state '%s' received\n",
+		    csm_event_str(ev), csm_state_str(con->state));
+		return;
+	}
+}
+
+static void csm_resolving_route(struct ibtrs_con *con, enum csm_ev ev)
+{
+	int err;
+
+	DEB("con %p, state %s event %s\n", con, csm_state_str(con->state),
+	    csm_event_str(ev));
+	switch (ev) {
+	case CSM_EV_ROUTE_RESOLVED:
+		err = create_con(con);
+		if (err) {
+			ERR(con->sess,
+			    "Failed to create connection, errno: %d\n", err);
+			csm_set_state(con, CSM_STATE_CLOSED);
+			ssm_schedule_event(con->sess, SSM_EV_CON_CLOSED);
+			return;
+		}
+		csm_set_state(con, CSM_STATE_CONNECTING);
+		break;
+	case CSM_EV_CON_ERROR:
+	case CSM_EV_SESS_CLOSING:
+		ibtrs_clt_destroy_cm_id(con);
+		csm_set_state(con, CSM_STATE_CLOSED);
+		ssm_schedule_event(con->sess, SSM_EV_CON_CLOSED);
+		break;
+	default:
+		WRN(con->sess,
+		    "Unexpected CSM Event '%s' in state '%s' received\n",
+		    csm_event_str(ev), csm_state_str(con->state));
+		return;
+	}
+}
+
+static int con_disconnect(struct ibtrs_con *con)
+{
+	int err;
+
+	err = rdma_disconnect(con->cm_id);
+	if (err)
+		ERR(con->sess,
+		    "Failed to disconnect RDMA connection, errno: %d\n", err);
+	return err;
+}
+
+static int send_msg_sess_info(struct ibtrs_con *con)
+{
+	struct ibtrs_msg_sess_info *msg;
+	int err;
+	struct ibtrs_session *sess = con->sess;
+
+	msg = sess->sess_info_iu->buf;
+
+	fill_ibtrs_msg_sess_info(msg, hostname);
+
+	err = ibtrs_post_send(con->ib_con.qp, con->sess->ib_sess.mr,
+			      sess->sess_info_iu, msg->hdr.tsize);
+	if (unlikely(err))
+		ERR(sess, "Sending sess info failed, "
+			  "posting msg to QP failed, errno: %d\n", err);
+
+	return err;
+}
+
+static void csm_connecting(struct ibtrs_con *con, enum csm_ev ev)
+{
+	DEB("con %p, state %s event %s\n", con, csm_state_str(con->state),
+	    csm_event_str(ev));
+	switch (ev) {
+	case CSM_EV_CON_ESTABLISHED:
+		csm_set_state(con, CSM_STATE_CONNECTED);
+		if (con->user) {
+			if (send_msg_sess_info(con))
+				goto destroy;
+		}
+		ssm_schedule_event(con->sess, SSM_EV_CON_CONNECTED);
+		break;
+	case CSM_EV_CON_ERROR:
+	case CSM_EV_SESS_CLOSING:
+	case CSM_EV_WC_ERROR:
+	case CSM_EV_CON_DISCONNECTED:
+destroy:
+		csm_set_state(con, CSM_STATE_CLOSING);
+		con_disconnect(con);
+		/* No CM_DISCONNECTED after rdma_disconnect, trigger sm */
+		csm_schedule_event(con, CSM_EV_CON_DISCONNECTED);
+		break;
+	default:
+		WRN(con->sess,
+		    "Unexpected CSM Event '%s' in state '%s' received\n",
+		    csm_event_str(ev), csm_state_str(con->state));
+		return;
+	}
+}
+
+static void csm_connected(struct ibtrs_con *con, enum csm_ev ev)
+{
+	DEB("con %p, state %s event %s\n", con, csm_state_str(con->state),
+	    csm_event_str(ev));
+	switch (ev) {
+	case CSM_EV_WC_ERROR:
+	case CSM_EV_CON_ERROR:
+	case CSM_EV_CON_DISCONNECTED:
+		ssm_schedule_event(con->sess, SSM_EV_CON_ERROR);
+		csm_set_state(con, CSM_STATE_CLOSING);
+		con_disconnect(con);
+		break;
+	case CSM_EV_SESS_CLOSING:
+		csm_set_state(con, CSM_STATE_CLOSING);
+		con_disconnect(con);
+		break;
+	default:
+		WRN(con->sess,
+		    "Unexpected CSM Event '%s' in state '%s' received\n",
+		    csm_event_str(ev), csm_state_str(con->state));
+		return;
+	}
+}
+
+static void csm_closing(struct ibtrs_con *con, enum csm_ev ev)
+{
+	DEB("con %p, state %s event %s\n", con, csm_state_str(con->state),
+	    csm_event_str(ev));
+	switch (ev) {
+	case CSM_EV_CON_DISCONNECTED:
+	case CSM_EV_CON_ERROR:
+	case CSM_EV_SESS_CLOSING: {
+		int err;
+
+		csm_set_state(con, CSM_STATE_FLUSHING);
+		synchronize_rcu();
+
+		err = post_beacon(&con->ib_con);
+		if (err) {
+			WRN(con->sess, "Failed to post BEACON,"
+			    " will destroy connection directly\n");
+			goto destroy;
+		}
+
+		err = ibtrs_request_cq_notifications(&con->ib_con);
+		if (unlikely(err < 0)) {
+			WRN(con->sess, "Requesting CQ Notification for"
+			    " ib_con failed. Connection will be destroyed\n");
+			goto destroy;
+		} else if (err > 0) {
+			err = get_process_wcs(con);
+			if (unlikely(err))
+				goto destroy;
+			break;
+		}
+		break;
+destroy:
+		con_destroy(con);
+		csm_set_state(con, CSM_STATE_CLOSED);
+		ssm_schedule_event(con->sess, SSM_EV_CON_CLOSED);
+		break;
+		}
+	case CSM_EV_CON_ESTABLISHED:
+	case CSM_EV_WC_ERROR:
+		/* ignore WC errors */
+		break;
+	default:
+		WRN(con->sess,
+		    "Unexpected CSM Event '%s' in state '%s' received\n",
+		    csm_event_str(ev), csm_state_str(con->state));
+		return;
+	}
+}
+
+static void csm_flushing(struct ibtrs_con *con, enum csm_ev ev)
+{
+	DEB("con %p, state %s event %s\n", con, csm_state_str(con->state),
+	    csm_event_str(ev));
+	switch (ev) {
+	case CSM_EV_BEACON_COMPLETED:
+		con_destroy(con);
+		csm_set_state(con, CSM_STATE_CLOSED);
+		ssm_schedule_event(con->sess, SSM_EV_CON_CLOSED);
+		break;
+	case CSM_EV_WC_ERROR:
+	case CSM_EV_CON_ERROR:
+		/* ignore WC and CON errors */
+	case CSM_EV_CON_DISCONNECTED:
+		/* Ignore CSM_EV_CON_DISCONNECTED. At this point we could have
+		 * already received a CSM_EV_CON_DISCONNECTED for the same
+		 * connection, but an additional RDMA_CM_EVENT_DISCONNECTED or
+		 * RDMA_CM_EVENT_TIMEWAIT_EXIT could be generated.
+		 */
+	case CSM_EV_SESS_CLOSING:
+		break;
+	default:
+		WRN(con->sess,
+		    "Unexpected CSM Event '%s' in state '%s' received\n",
+		    csm_event_str(ev), csm_state_str(con->state));
+		return;
+	}
+}
+
+static void schedule_all_cons_close(struct ibtrs_session *sess)
+{
+	int i;
+
+	for (i = 0; i < CONS_PER_SESSION; i++)
+		csm_schedule_event(&sess->con[i], CSM_EV_SESS_CLOSING);
+}
+
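+/*
+ * Session state machine (SSM): drives connection setup, info exchange,
+ * reconnects and teardown of a session based on events coming from the
+ * connection state machines and from the user.
+ */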
+static void ssm_idle(struct ibtrs_session *sess, enum ssm_ev ev)
+{
+	DEB("sess %p, state %s event %s\n", sess, ssm_state_str(sess->state),
+	    ssm_event_str(ev));
+	switch (ev) {
+	case SSM_EV_CON_CONNECTED:
+		WARN_ON(++sess->connected_cnt != 1);
+		if (ssm_init_state(sess, SSM_STATE_WF_INFO))
+			ssm_init_state(sess, SSM_STATE_CLOSE_DESTROY);
+		break;
+	case SSM_EV_CON_CLOSED:
+		sess->active_cnt--;
+		DEB("active_cnt %d\n", sess->active_cnt);
+		WARN_ON(sess->active_cnt);
+		/* fall through */
+	case SSM_EV_CON_ERROR:
+		ssm_init_state(sess, SSM_STATE_CLOSE_DESTROY);
+		break;
+	default:
+		WRN(sess,
+		    "Unexpected SSM Event '%s' in state '%s' received\n",
+		    ssm_event_str(ev), ssm_state_str(sess->state));
+		return;
+	}
+}
+
+static int ssm_idle_reconnect_init(struct ibtrs_session *sess)
+{
+	int err, i;
+
+	sess->retry_cnt++;
+	INFO(sess, "Reconnecting session."
+	     " Retry counter=%d, max reconnect attempts=%d\n",
+	     sess->retry_cnt, sess->max_reconnect_attempts);
+
+	for (i = 0; i < CONS_PER_SESSION; i++) {
+		struct ibtrs_con *con = &sess->con[i];
+
+		csm_set_state(con, CSM_STATE_CLOSED);
+		con->sess = sess;
+	}
+	sess->connected_cnt = 0;
+	err = init_con(sess, &sess->con[0], 0, true);
+	if (err)
+		INFO(sess, "Reconnecting session failed, errno: %d\n", err);
+	return err;
+}
+
+static void ssm_idle_reconnect(struct ibtrs_session *sess, enum ssm_ev ev)
+{
+	DEB("sess %p, state %s event %s\n", sess, ssm_state_str(sess->state),
+	    ssm_event_str(ev));
+	switch (ev) {
+	case SSM_EV_CON_CONNECTED:
+		WARN_ON(++sess->connected_cnt != 1);
+		if (ssm_init_state(sess, SSM_STATE_WF_INFO_RECONNECT))
+			ssm_init_state(sess, SSM_STATE_CLOSE_RECONNECT);
+		break;
+	case SSM_EV_SESS_CLOSE:
+		ssm_init_state(sess, SSM_STATE_CLOSE_DESTROY);
+		break;
+	case SSM_EV_CON_CLOSED:
+		sess->active_cnt--;
+		DEB("active_cnt %d\n", sess->active_cnt);
+		WARN_ON(sess->active_cnt);
+		/* fall through */
+	case SSM_EV_CON_ERROR:
+		sess->stats.reconnects.fail_cnt++;
+		ssm_init_state(sess, SSM_STATE_CLOSE_RECONNECT);
+		break;
+	case SSM_EV_RECONNECT_USER:
+		sess->retry_cnt = 0;
+		break;
+	default:
+		WRN(sess,
+		    "Unexpected SSM Event '%s' in state '%s' received\n",
+		    ssm_event_str(ev), ssm_state_str(sess->state));
+		return;
+	}
+}
+
+static int ssm_wf_info_init(struct ibtrs_session *sess)
+{
+	int err;
+
+	err = ibtrs_request_cq_notifications(&sess->con[0].ib_con);
+	if (unlikely(err < 0)) {
+		return err;
+	} else if (err > 0) {
+		err = get_process_wcs(&sess->con[0]);
+		if (unlikely(err))
+			return err;
+	} else {
+		ibtrs_set_last_heartbeat(&sess->heartbeat);
+		WARN_ON(!schedule_delayed_work(&sess->heartbeat_dwork,
+					       HEARTBEAT_INTV_JIFFIES));
+	}
+	return err;
+}
+
+static void ssm_wf_info(struct ibtrs_session *sess, enum ssm_ev ev)
+{
+	DEB("sess %p, state %s event %s\n", sess, ssm_state_str(sess->state),
+	    ssm_event_str(ev));
+	switch (ev) {
+	case SSM_EV_GOT_RDMA_INFO:
+		if (ssm_init_state(sess, SSM_STATE_OPEN))
+			ssm_init_state(sess, SSM_STATE_CLOSE_DESTROY);
+		break;
+	case SSM_EV_CON_CLOSED:
+		sess->active_cnt--;
+		DEB("active_cnt %d\n", sess->active_cnt);
+		WARN_ON(sess->active_cnt);
+		/* fall through */
+	case SSM_EV_CON_ERROR:
+	case SSM_EV_RECONNECT_HEARTBEAT:
+		ssm_init_state(sess, SSM_STATE_CLOSE_DESTROY);
+		break;
+	default:
+		WRN(sess,
+		    "Unexpected SSM Event '%s' in state '%s' received\n",
+		    ssm_event_str(ev), ssm_state_str(sess->state));
+		return;
+	}
+}
+
+static void ssm_wf_info_reconnect(struct ibtrs_session *sess, enum ssm_ev ev)
+{
+	DEB("sess %p, state %s event %s\n", sess, ssm_state_str(sess->state),
+	    ssm_event_str(ev));
+	switch (ev) {
+	case SSM_EV_GOT_RDMA_INFO:
+		if (ssm_init_state(sess, SSM_STATE_OPEN_RECONNECT))
+			ssm_init_state(sess, SSM_STATE_CLOSE_RECONNECT);
+		break;
+	case SSM_EV_SESS_CLOSE:
+		ssm_init_state(sess, SSM_STATE_CLOSE_DESTROY);
+		break;
+	case SSM_EV_CON_CLOSED:
+		sess->active_cnt--;
+		DEB("active_cnt %d\n", sess->active_cnt);
+		WARN_ON(sess->active_cnt);
+		/* fall through */
+	case SSM_EV_CON_ERROR:
+	case SSM_EV_RECONNECT_HEARTBEAT:
+		sess->stats.reconnects.fail_cnt++;
+		ssm_init_state(sess, SSM_STATE_CLOSE_RECONNECT);
+		break;
+	case SSM_EV_RECONNECT_USER:
+		sess->retry_cnt = 0;
+		break;
+	default:
+		WRN(sess,
+		    "Unexpected SSM Event '%s' in state '%s' received\n",
+		    ssm_event_str(ev), ssm_state_str(sess->state));
+		return;
+	}
+}
+
+static void queue_destroy_sess(struct ibtrs_session *sess)
+{
+	kfree(sess->srv_rdma_addr);
+	sess->srv_rdma_addr = NULL;
+	ibtrs_clt_destroy_ib_session(sess);
+	sess_schedule_destroy(sess);
+}
+
+static int ibtrs_clt_request_cq_notifications(struct ibtrs_session *sess)
+{
+	int err, i;
+
+	for (i = 0; i < CONS_PER_SESSION; i++) {
+		struct ibtrs_con *con = &sess->con[i];
+
+		err = ibtrs_request_cq_notifications(&con->ib_con);
+		if (unlikely(err < 0)) {
+			return err;
+		} else if (err > 0) {
+			err = get_process_wcs(con);
+			if (unlikely(err))
+				return err;
+		}
+	}
+
+	return 0;
+}
+
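+/*
+ * Allocate the per-session IO resources (requests, fast memory pool and
+ * tags) only once; reconnects reuse them, guarded by io_bufs_initialized.
+ */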
+static int ibtrs_alloc_io_bufs(struct ibtrs_session *sess)
+{
+	int ret;
+
+	if (sess->io_bufs_initialized)
+		return 0;
+
+	ret = ibtrs_alloc_reqs(sess);
+	if (ret) {
+		ERR(sess,
+		    "Failed to allocate session request buffers, errno: %d\n",
+		    ret);
+		return ret;
+	}
+
+	ret = alloc_sess_fast_pool(sess);
+	if (ret)
+		return ret;
+
+	ret = alloc_sess_tags(sess);
+	if (ret) {
+		ERR(sess, "Failed to allocate session tags, errno: %d\n",
+		    ret);
+		return ret;
+	}
+
+	sess->io_bufs_initialized = true;
+
+	return 0;
+}
+
+static int ssm_open_init(struct ibtrs_session *sess)
+{
+	int i, ret;
+
+	ret = ibtrs_alloc_io_bufs(sess);
+	if (ret)
+		return ret;
+
+	ret = alloc_sess_tr_bufs(sess);
+	if (ret) {
+		ERR(sess,
+		    "Failed to allocate session transfer buffers, errno: %d\n",
+		    ret);
+		return ret;
+	}
+
+	ret = post_usr_con_recv(&sess->con[0]);
+	if (unlikely(ret))
+		return ret;
+	for (i = 1; i < CONS_PER_SESSION; i++) {
+		ret = init_con(sess, &sess->con[i], (i - 1) % num_online_cpus(),
+			       false);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+
+static void ssm_open(struct ibtrs_session *sess, enum ssm_ev ev)
+{
+	DEB("sess %p, state %s event %s\n", sess, ssm_state_str(sess->state),
+	    ssm_event_str(ev));
+	switch (ev) {
+	case SSM_EV_CON_CONNECTED:
+		if (++sess->connected_cnt < CONS_PER_SESSION)
+			return;
+
+		if (ssm_init_state(sess, SSM_STATE_CONNECTED)) {
+			ssm_init_state(sess, SSM_STATE_CLOSE_DESTROY);
+			return;
+		}
+
+		INFO(sess, "IBTRS session (QPs: %d) to server established\n",
+		     CONS_PER_SESSION);
+
+		wake_up(&sess->wait_q);
+		break;
+	case SSM_EV_CON_CLOSED:
+		sess->active_cnt--;
+		DEB("active_cnt %d\n", sess->active_cnt);
+		/* fall through */
+	case SSM_EV_CON_ERROR:
+		ssm_init_state(sess, SSM_STATE_CLOSE_DESTROY);
+		break;
+	default:
+		WRN(sess,
+		    "Unexpected SSM Event '%s' in state '%s' received\n",
+		    ssm_event_str(ev), ssm_state_str(sess->state));
+		return;
+	}
+}
+
+static void ssm_open_reconnect(struct ibtrs_session *sess, enum ssm_ev ev)
+{
+	DEB("sess %p, state %s event %s\n", sess, ssm_state_str(sess->state),
+	    ssm_event_str(ev));
+	switch (ev) {
+	case SSM_EV_CON_CONNECTED:
+		if (++sess->connected_cnt < CONS_PER_SESSION)
+			return;
+
+		if (ssm_init_state(sess, SSM_STATE_CONNECTED)) {
+			ssm_init_state(sess, SSM_STATE_CLOSE_RECONNECT);
+			return;
+		}
+
+		INFO(sess, "IBTRS session (QPs: %d) to server established\n",
+		     CONS_PER_SESSION);
+
+		sess->retry_cnt = 0;
+		sess->stats.reconnects.successful_cnt++;
+		clt_ops->sess_ev(sess->priv, IBTRS_CLT_SESS_EV_RECONNECT, 0);
+
+		break;
+	case SSM_EV_SESS_CLOSE:
+		ssm_init_state(sess, SSM_STATE_CLOSE_DESTROY);
+		break;
+	case SSM_EV_CON_CLOSED:
+		sess->active_cnt--;
+		DEB("active_cnt %d\n", sess->active_cnt);
+		/* fall through */
+	case SSM_EV_CON_ERROR:
+		sess->stats.reconnects.fail_cnt++;
+		ssm_init_state(sess, SSM_STATE_CLOSE_RECONNECT);
+		break;
+	case SSM_EV_RECONNECT_USER:
+		sess->retry_cnt = 0;
+		break;
+	default:
+		WRN(sess,
+		    "Unexpected SSM Event '%s' in state '%s' received\n",
+		    ssm_event_str(ev), ssm_state_str(sess->state));
+		return;
+	}
+}
+
+static int ssm_connected_init(struct ibtrs_session *sess)
+{
+	int err;
+
+	err = ibtrs_clt_request_cq_notifications(sess);
+	if (err) {
+		ERR(sess, "Establishing Session failed, requesting"
+		    " CQ completion notification failed, errno: %d\n", err);
+		return err;
+	}
+
+	atomic_set(&sess->peer_usr_msg_bufs, USR_MSG_CNT);
+
+	return 0;
+}
+
+static int sess_disconnect_cons(struct ibtrs_session *sess)
+{
+	int i;
+
+	for (i = 0; i < CONS_PER_SESSION; i++) {
+		struct ibtrs_con *con = &sess->con[i];
+
+		rcu_read_lock();
+		smp_rmb(); /* fence con->state check */
+		if (con->state == CSM_STATE_CONNECTED)
+			rdma_disconnect(con->cm_id);
+		rcu_read_unlock();
+	}
+
+	return 0;
+}
+
+static void ssm_connected(struct ibtrs_session *sess, enum ssm_ev ev)
+{
+	DEB("sess %p, state %s event %s\n", sess, ssm_state_str(sess->state),
+	    ssm_event_str(ev));
+	switch (ev) {
+	case SSM_EV_RECONNECT_USER:
+	case SSM_EV_CON_ERROR:
+	case SSM_EV_RECONNECT_HEARTBEAT:
+		INFO(sess, "Session disconnecting\n");
+
+		if (ev == SSM_EV_RECONNECT_USER)
+			ssm_init_state(sess, SSM_STATE_CLOSE_RECONNECT_IMM);
+		else
+			ssm_init_state(sess, SSM_STATE_CLOSE_RECONNECT);
+
+		wake_up(&sess->mu_buf_wait_q);
+		wake_up(&sess->mu_iu_wait_q);
+		clt_ops->sess_ev(sess->priv, IBTRS_CLT_SESS_EV_DISCONNECTED, 0);
+		sess_disconnect_cons(sess);
+		synchronize_rcu();
+		fail_all_outstanding_reqs(sess);
+		break;
+	case SSM_EV_SESS_CLOSE:
+		cancel_delayed_work_sync(&sess->heartbeat_dwork);
+		ssm_init_state(sess, SSM_STATE_CLOSE_DESTROY);
+		break;
+	default:
+		WRN(sess,
+		    "Unexpected SSM Event '%s' in state '%s' received\n",
+		    ssm_event_str(ev), ssm_state_str(sess->state));
+		return;
+	}
+}
+
+static int ssm_reconnect_init(struct ibtrs_session *sess)
+{
+	unsigned long delay_jiffies;
+	u16 delay_sec = 0;
+
+	if (sess->retry_cnt == 0) {
+		/* If there is a connection error, we wait 5
+		 * seconds for the first reconnect retry. This is needed
+		 * because if the server has initiated the disconnect,
+		 * it might not be ready to receive a new session
+		 * request immediately.
+		 */
+		delay_sec = 5;
+	} else {
+		delay_sec = sess->reconnect_delay_sec + sess->retry_cnt;
+	}
+
+	delay_sec = delay_sec + prandom_u32() % RECONNECT_SEED;
+
+	delay_jiffies = msecs_to_jiffies(1000 * (delay_sec));
+
+	INFO(sess, "Session reconnect in %ds\n", delay_sec);
+	queue_delayed_work_on(0, sess->sm_wq,
+			      &sess->reconnect_dwork, delay_jiffies);
+	return 0;
+}
+
+static void ssm_reconnect(struct ibtrs_session *sess, enum ssm_ev ev)
+{
+	int err;
+
+	DEB("sess %p, state %s event %s\n", sess, ssm_state_str(sess->state),
+	    ssm_event_str(ev));
+	switch (ev) {
+	case SSM_EV_RECONNECT_USER:
+		sess->retry_cnt = 0;
+		cancel_delayed_work_sync(&sess->reconnect_dwork);
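+		/* fall through */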
+	case SSM_EV_RECONNECT:
+		err =  ssm_init_state(sess, SSM_STATE_IDLE_RECONNECT);
+		if (err == -ENODEV) {
+			cancel_delayed_work_sync(&sess->reconnect_dwork);
+			ssm_init_state(sess, SSM_STATE_DISCONNECTED);
+		} else if (err) {
+			ssm_init_state(sess, SSM_STATE_CLOSE_RECONNECT);
+		}
+		break;
+	case SSM_EV_SESS_CLOSE:
+		cancel_delayed_work_sync(&sess->reconnect_dwork);
+		ssm_init_state(sess, SSM_STATE_DESTROYED);
+		break;
+	default:
+		WRN(sess,
+		    "Unexpected SSM Event '%s' in state '%s' received\n",
+		    ssm_event_str(ev), ssm_state_str(sess->state));
+		return;
+	}
+}
+
+static int ssm_close_destroy_init(struct ibtrs_session *sess)
+{
+	if (!sess->active_cnt)
+		ssm_schedule_event(sess, SSM_EV_ALL_CON_CLOSED);
+	else
+		schedule_all_cons_close(sess);
+
+	return 0;
+}
+
+static void ssm_close_destroy(struct ibtrs_session *sess, enum ssm_ev ev)
+{
+	DEB("sess %p, state %s event %s\n", sess, ssm_state_str(sess->state),
+	    ssm_event_str(ev));
+	switch (ev) {
+	case SSM_EV_CON_CLOSED:
+		sess->active_cnt--;
+		DEB("active_cnt %d\n", sess->active_cnt);
+		if (sess->active_cnt)
+			break;
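+		/* fall through */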
+	case SSM_EV_ALL_CON_CLOSED:
+		ssm_init_state(sess, SSM_STATE_DESTROYED);
+		wake_up(&sess->wait_q);
+		break;
+	case SSM_EV_SESS_CLOSE:
+	case SSM_EV_CON_ERROR:
+	case SSM_EV_RECONNECT_USER:
+	case SSM_EV_CON_CONNECTED:
+		break;
+	default:
+		WRN(sess,
+		    "Unexpected SSM Event '%s' in state '%s' received\n",
+		    ssm_event_str(ev), ssm_state_str(sess->state));
+		return;
+	}
+}
+
+static void ssm_close_reconnect(struct ibtrs_session *sess, enum ssm_ev ev)
+{
+	DEB("sess %p, state %s event %s\n", sess, ssm_state_str(sess->state),
+	    ssm_event_str(ev));
+	switch (ev) {
+	case SSM_EV_CON_ERROR:
+	case SSM_EV_CON_CONNECTED:
+		break;
+	case SSM_EV_CON_CLOSED:
+		sess->active_cnt--;
+		DEB("active_cnt %d\n", sess->active_cnt);
+		if (sess->active_cnt)
+			break;
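+		/* fall through */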
+	case SSM_EV_ALL_CON_CLOSED:
+		if (!sess->ib_sess_destroy_completion &&
+		    (sess->max_reconnect_attempts == -1 ||
+		    (sess->max_reconnect_attempts > 0 &&
+		     sess->retry_cnt < sess->max_reconnect_attempts))) {
+			ssm_init_state(sess, SSM_STATE_RECONNECT);
+		} else {
+			if (sess->ib_sess_destroy_completion)
+				INFO(sess, "Device is being removed, will not"
+				     " schedule reconnect of session.\n");
+			else
+				INFO(sess, "Max reconnect attempts reached, "
+				     "will not schedule reconnect of "
+				     "session. (Current reconnect attempts=%d,"
+				     " max reconnect attempts=%d)\n",
+				     sess->retry_cnt,
+				     sess->max_reconnect_attempts);
+			clt_ops->sess_ev(sess->priv,
+					 IBTRS_CLT_SESS_EV_MAX_RECONN_EXCEEDED,
+					 0);
+
+			ssm_init_state(sess, SSM_STATE_DISCONNECTED);
+		}
+		break;
+	case SSM_EV_SESS_CLOSE:
+		ssm_init_state(sess, SSM_STATE_CLOSE_DESTROY);
+		break;
+	case SSM_EV_RECONNECT_USER:
+		sess->retry_cnt = 0;
+		ssm_init_state(sess, SSM_STATE_CLOSE_RECONNECT_IMM);
+		break;
+	default:
+		WRN(sess,
+		    "Unexpected SSM Event '%s' in state '%s' received\n",
+		    ssm_event_str(ev), ssm_state_str(sess->state));
+		return;
+	}
+}
+
+static void ssm_close_reconnect_imm(struct ibtrs_session *sess, enum ssm_ev ev)
+{
+	DEB("sess %p, state %s event %s\n", sess, ssm_state_str(sess->state),
+	    ssm_event_str(ev));
+	switch (ev) {
+	case SSM_EV_CON_CLOSED:
+		sess->active_cnt--;
+		DEB("active_cnt %d\n", sess->active_cnt);
+		if (sess->active_cnt)
+			break;
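+		/* fall through */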
+	case SSM_EV_ALL_CON_CLOSED:
+		if (ssm_init_state(sess, SSM_STATE_IDLE_RECONNECT))
+			ssm_init_state(sess, SSM_STATE_DISCONNECTED);
+		break;
+	case SSM_EV_SESS_CLOSE:
+		ssm_init_state(sess, SSM_STATE_CLOSE_DESTROY);
+		break;
+	case SSM_EV_RECONNECT_USER:
+	case SSM_EV_CON_ERROR:
+		break;
+	default:
+		WRN(sess,
+		    "Unexpected SSM Event '%s' in state '%s' received\n",
+		    ssm_event_str(ev), ssm_state_str(sess->state));
+		return;
+	}
+}
+
+static int ssm_disconnected_init(struct ibtrs_session *sess)
+{
+	ibtrs_clt_destroy_ib_session(sess);
+
+	return 0;
+}
+
+static void ssm_disconnected(struct ibtrs_session *sess, enum ssm_ev ev)
+{
+	DEB("sess %p, state %s event %s\n", sess, ssm_state_str(sess->state),
+	    ssm_event_str(ev));
+
+	switch (ev) {
+	case SSM_EV_RECONNECT_USER:
+		sess->retry_cnt = 0;
+		/* stay in disconnected if can't switch to IDLE_RECONNECT */
+		ssm_init_state(sess, SSM_STATE_IDLE_RECONNECT);
+		break;
+	case SSM_EV_SESS_CLOSE:
+		ssm_init_state(sess, SSM_STATE_DESTROYED);
+		break;
+	default:
+		WRN(sess,
+		    "Unexpected SSM Event '%s' in state '%s' received\n",
+		    ssm_event_str(ev), ssm_state_str(sess->state));
+		return;
+	}
+}
+
+static int ssm_destroyed_init(struct ibtrs_session *sess)
+{
+	queue_destroy_sess(sess);
+
+	return 0;
+}
+
+static void ssm_destroyed(struct ibtrs_session *sess, enum ssm_ev ev)
+{
+	DEB("sess %p, state %s event %s\n", sess, ssm_state_str(sess->state),
+	    ssm_event_str(ev));
+
+	/* ignore all events since the session is being destroyed */
+}
+
+int ibtrs_clt_register(const struct ibtrs_clt_ops *ops)
+{
+	if (clt_ops) {
+		ERR_NP("Module %s already registered, only one user module"
+		       " supported\n", clt_ops->owner->name);
+		return -ENOTSUPP;
+	}
+	if (!clt_ops_are_valid(ops))
+		return -EINVAL;
+	clt_ops = ops;
+
+	return 0;
+}
+EXPORT_SYMBOL(ibtrs_clt_register);
+
+void ibtrs_clt_unregister(const struct ibtrs_clt_ops *ops)
+{
+	if (WARN_ON(!clt_ops))
+		return;
+
+	if (clt_ops->owner != ops->owner)
+		return;
+
+	flush_workqueue(ibtrs_wq);
+
+	mutex_lock(&sess_mutex);
+	WARN(!list_empty(&sess_list),
+	     "BUG: user module didn't close all sessions before calling %s\n",
+	     __func__);
+	mutex_unlock(&sess_mutex);
+
+	clt_ops = NULL;
+}
+EXPORT_SYMBOL(ibtrs_clt_unregister);
+
+int ibtrs_clt_query(struct ibtrs_session *sess, struct ibtrs_attrs *attr)
+{
+	if (unlikely(sess->state != SSM_STATE_CONNECTED))
+		return -ECOMM;
+
+	attr->queue_depth      = sess->queue_depth;
+	attr->mr_page_mask     = sess->mr_page_mask;
+	attr->mr_page_size     = sess->mr_page_size;
+	attr->mr_max_size      = sess->mr_max_size;
+	attr->max_pages_per_mr = sess->max_pages_per_mr;
+	attr->max_sge          = sess->max_sge;
+	attr->max_io_size      = sess->max_io_size;
+	strlcpy(attr->hostname, sess->hostname, sizeof(attr->hostname));
+
+	return 0;
+}
+EXPORT_SYMBOL(ibtrs_clt_query);
+
+static int check_module_params(void)
+{
+	if (fmr_sg_cnt > MAX_SEGMENTS || fmr_sg_cnt < 0) {
+		ERR_NP("invalid fmr_sg_cnt value: %d\n", fmr_sg_cnt);
+		return -EINVAL;
+	}
+	return 0;
+}
+
+ssize_t ibtrs_clt_stats_rdma_to_str(struct ibtrs_session *sess,
+				    char *page, size_t len)
+{
+	struct ibtrs_clt_stats_rdma_stats s;
+	struct ibtrs_clt_stats_rdma_stats *r = sess->stats.rdma_stats;
+	int i;
+
+	memset(&s, 0, sizeof(s));
+
+	for (i = 0; i < num_online_cpus(); i++) {
+		s.cnt_read		+= r[i].cnt_read;
+		s.size_total_read	+= r[i].size_total_read;
+		s.cnt_write		+= r[i].cnt_write;
+		s.size_total_write	+= r[i].size_total_write;
+		s.inflight		+= r[i].inflight;
+	}
+
+	return scnprintf(page, len, "%llu %llu %llu %llu %u\n",
+			 s.cnt_read, s.size_total_read, s.cnt_write,
+			 s.size_total_write, s.inflight);
+}
+
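+/*
+ * Print the S/G list length distribution as a table: one column per online
+ * CPU, one row per histogram bucket, each cell showing the bucket's share
+ * of that CPU's total in percent with one decimal place.
+ */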
+int ibtrs_clt_stats_sg_list_distr_to_str(struct ibtrs_session *sess, char *buf,
+					 size_t len)
+{
+	int cnt = 0;
+	unsigned p, p_i, p_f;
+	u64 *total = sess->stats.sg_list_total;
+	u64 **distr = sess->stats.sg_list_distr;
+	int i, j;
+
+	unsigned int p, p_i, p_f;
+	for (j = 0; j < num_online_cpus(); j++)
+		cnt += scnprintf(buf + cnt, len - cnt, "%5d", j);
+
+	for (i = 0; i < SG_DISTR_LEN + 1; i++) {
+		if (i <= MAX_LIN_SG)
+			cnt += scnprintf(buf + cnt, len - cnt, "\n= %3d:", i);
+		else if (i < SG_DISTR_LEN)
+			cnt += scnprintf(buf + cnt, len - cnt,
+					"\n< %3d:",
+					1 << (i + MIN_LOG_SG - MAX_LIN_SG));
+		else
+			cnt += scnprintf(buf + cnt, len - cnt,
+					"\n>=%3d:",
+					1 << (i + MIN_LOG_SG - MAX_LIN_SG - 1));
+
+		for (j = 0; j < num_online_cpus(); j++) {
+			p = total[j] ? distr[j][i] * 1000 / total[j] : 0;
+			p_i = p / 10;
+			p_f = p % 10;
+
+			if (distr[j][i])
+				cnt += scnprintf(buf + cnt, len - cnt,
+						 " %2u.%01u", p_i, p_f);
+			else
+				cnt += scnprintf(buf + cnt, len - cnt, "    0");
+		}
+	}
+
+	cnt += scnprintf(buf + cnt, len - cnt, "\ntotal:");
+	for (j = 0; j < num_online_cpus(); j++)
+		cnt += scnprintf(buf + cnt, len - cnt, " %llu", total[j]);
+	cnt += scnprintf(buf + cnt, len - cnt, "\n");
+
+	return cnt;
+}
+
+static int __init ibtrs_client_init(void)
+{
+	int err;
+
+	scnprintf(hostname, sizeof(hostname), "%s", utsname()->nodename);
+	INFO_NP("Loading module ibtrs_client, version: " __stringify(IBTRS_VER)
+		" (use_fr: %d, retry_count: %d,"
+		" fmr_sg_cnt: %d,"
+		" default_heartbeat_timeout_ms: %d, hostname: %s)\n", use_fr,
+		retry_count, fmr_sg_cnt,
+		default_heartbeat_timeout_ms, hostname);
+	err = check_module_params();
+	if (err) {
+		ERR_NP("Failed to load module, invalid module parameters,"
+		       " errno: %d\n", err);
+		return err;
+	}
+
+	ibtrs_wq = alloc_workqueue("ibtrs_client_wq", 0, 0);
+	if (!ibtrs_wq) {
+		ERR_NP("Failed to load module, alloc ibtrs_client_wq failed\n");
+		return -ENOMEM;
+	}
+
+	err = ibtrs_clt_create_sysfs_files();
+	if (err) {
+		ERR_NP("Failed to load module, can't create sysfs files,"
+		       " errno: %d\n", err);
+		goto out_destroy_wq;
+	}
+	uuid_le_gen(&uuid);
+	return 0;
+
+out_destroy_wq:
+	destroy_workqueue(ibtrs_wq);
+	return err;
+}
+
+static void __exit ibtrs_client_exit(void)
+{
+	INFO_NP("Unloading module\n");
+
+	mutex_lock(&sess_mutex);
+	WARN(!list_empty(&sess_list),
+	     "Session(s) still exist on module unload\n");
+	mutex_unlock(&sess_mutex);
+	ibtrs_clt_destroy_sysfs_files();
+	destroy_workqueue(ibtrs_wq);
+
+	INFO_NP("Module unloaded\n");
+}
+
+module_init(ibtrs_client_init);
+module_exit(ibtrs_client_exit);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH 06/28] ibtrs_clt: add header file shared only in ibtrs_client
  2017-03-24 10:45 ` Jack Wang
                   ` (5 preceding siblings ...)
  (?)
@ 2017-03-24 10:45 ` Jack Wang
  -1 siblings, 0 replies; 87+ messages in thread
From: Jack Wang @ 2017-03-24 10:45 UTC (permalink / raw)
  To: linux-block, linux-rdma
  Cc: dledford, axboe, hch, mail, Milind.dumbare, yun.wang, Jack Wang,
	Kleber Souza, Danil Kipnis, Roman Pen

From: Jack Wang <jinpu.wang@profitbricks.com>

Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
Signed-off-by: Kleber Souza <kleber.souza@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kipnis@profitbricks.com>
Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
---
 .../ulp/ibtrs_client/ibtrs_clt_internal.h          | 244 +++++++++++++++++++++
 1 file changed, 244 insertions(+)
 create mode 100644 drivers/infiniband/ulp/ibtrs_client/ibtrs_clt_internal.h

diff --git a/drivers/infiniband/ulp/ibtrs_client/ibtrs_clt_internal.h b/drivers/infiniband/ulp/ibtrs_client/ibtrs_clt_internal.h
new file mode 100644
index 0000000..7274b2d
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs_client/ibtrs_clt_internal.h
@@ -0,0 +1,244 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler <mail@fholler.de>
+ *          Jack Wang <jinpu.wang@profitbricks.com>
+ *          Kleber Souza <kleber.souza@profitbricks.com>
+ *          Danil Kipnis <danil.kipnis@profitbricks.com>
+ *          Roman Pen <roman.penyaev@profitbricks.com>
+ *          Milind Dumbare <Milind.dumbare@gmail.com>
+ *
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions, and the following disclaimer,
+ *    without modification.
+ * 2. Redistributions in binary form must reproduce at minimum a disclaimer
+ *    substantially similar to the "NO WARRANTY" disclaimer below
+ *    ("Disclaimer") and any redistribution must be conditioned upon
+ *    including a substantially similar Disclaimer requirement for further
+ *    binary redistribution.
+ * 3. Neither the names of the above-listed copyright holders nor the names
+ *    of any contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * NO WARRANTY
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * HOLDERS OR CONTRIBUTORS BE LIABLE FOR SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
+ * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGES.
+ *
+ */
+
+#if !defined(IBTRS_CLT_INTERNAL_H)
+#define IBTRS_CLT_INTERNAL_H
+
+#include <rdma/ibtrs.h>
+
+enum ssm_state {
+	_SSM_STATE_MIN,
+	SSM_STATE_IDLE,
+	SSM_STATE_IDLE_RECONNECT,
+	SSM_STATE_WF_INFO,
+	SSM_STATE_WF_INFO_RECONNECT,
+	SSM_STATE_OPEN,
+	SSM_STATE_OPEN_RECONNECT,
+	SSM_STATE_CONNECTED,
+	SSM_STATE_RECONNECT,
+	SSM_STATE_RECONNECT_IMM,
+	SSM_STATE_CLOSE_DESTROY,
+	SSM_STATE_CLOSE_RECONNECT,
+	SSM_STATE_CLOSE_RECONNECT_IMM,
+	SSM_STATE_DISCONNECTED,
+	SSM_STATE_DESTROYED,
+	_SSM_STATE_MAX
+};
+
+enum ibtrs_fast_reg {
+	IBTRS_FAST_MEM_NONE,
+	IBTRS_FAST_MEM_FR,
+	IBTRS_FAST_MEM_FMR
+};
+
+struct ibtrs_stats_reconnects {
+	u32 successful_cnt;
+	u32 fail_cnt;
+};
+
+struct ibtrs_stats_wc_comp {
+	u32 max_wc_cnt;
+	u32 cnt;
+	u64 total_cnt;
+};
+
+struct ibtrs_stats_cpu_migration {
+	atomic_t *from;
+	int *to;
+};
+
+struct ibtrs_clt_stats_rdma_stats {
+	u64 cnt_read;
+	u64 size_total_read;
+	u64 cnt_write;
+	u64 size_total_write;
+
+	u16 inflight;
+};
+
+#define MIN_LOG_SG 2
+#define MAX_LOG_SG 5
+#define MAX_LIN_SG BIT(MIN_LOG_SG)
+#define SG_DISTR_LEN (MAX_LOG_SG - MIN_LOG_SG + MAX_LIN_SG + 1)
+
+struct ibtrs_clt_stats_rdma_lat_entry {
+	u64 read;
+	u64 write;
+};
+
+#define MAX_LOG_LATENCY	16
+#define MIN_LOG_LATENCY	0
+
+struct ibtrs_clt_stats_user_ib_msgs {
+	u32 recv_msg_cnt;
+	u32 sent_msg_cnt;
+	u64 recv_size;
+	u64 sent_size;
+};
+
+struct ibtrs_clt_stats {
+	struct ibtrs_stats_cpu_migration	cpu_migr;
+	struct ibtrs_clt_stats_rdma_stats	*rdma_stats;
+	u64					*sg_list_total;
+	u64					**sg_list_distr;
+	struct ibtrs_stats_reconnects		reconnects;
+	struct ibtrs_clt_stats_rdma_lat_entry	**rdma_lat_distr;
+	struct ibtrs_clt_stats_rdma_lat_entry	*rdma_lat_max;
+	struct ibtrs_clt_stats_user_ib_msgs	user_ib_msgs;
+	struct ibtrs_stats_wc_comp		*wc_comp;
+};
+
+struct ibtrs_session {
+	struct list_head	list; /* global session list */
+	wait_queue_head_t	wait_q;
+	enum ssm_state		state;
+	struct ibtrs_con	*con;
+	struct ib_session	ib_sess;
+	struct ib_device	*ib_device;
+	struct ibtrs_iu		*rdma_info_iu;
+	struct ibtrs_iu		*sess_info_iu;
+	struct ibtrs_iu		*dummy_rx_iu;
+	struct ibtrs_iu		**usr_rx_ring;
+	struct ibtrs_iu		**io_tx_ius;
+
+	spinlock_t              u_msg_ius_lock ____cacheline_aligned;
+	struct list_head	u_msg_ius_list;
+
+	struct rdma_req		*reqs;
+	struct ib_fmr_pool	*fmr_pool;
+	atomic_t		ib_sess_initialized;
+	bool			io_bufs_initialized;
+	size_t			pdu_sz;
+	void			*priv;
+	struct workqueue_struct	*sm_wq;
+	struct workqueue_struct	*msg_wq;
+	struct delayed_work	heartbeat_dwork;
+	u32			heartbeat_timeout_srv_ms;
+	struct delayed_work	reconnect_dwork;
+	struct ibtrs_heartbeat	heartbeat;
+	atomic_t		refcount;
+	u8			active_cnt;
+	bool			enable_rdma_lat;
+	u8			ver;
+	u8			connected_cnt;
+	u32			retry_cnt;
+	s16			max_reconnect_attempts;
+	u8			reconnect_delay_sec;
+	void			*tags;
+	unsigned long		*tags_map;
+	wait_queue_head_t	tags_wait;
+	u64			*srv_rdma_addr;
+	u32			srv_rdma_buf_rkey;
+	u32			max_io_size;
+	u32			max_req_size;
+	u32			chunk_size;
+	u32			max_desc;
+	u32			queue_depth;
+	u16			user_queue_depth;
+	enum ibtrs_fast_reg	fast_reg_mode;
+	u64			mr_page_mask;
+	u32			mr_page_size;
+	u32			mr_max_size;
+	u32			max_pages_per_mr;
+	int			max_sge;
+	struct sockaddr_storage peer_addr;
+	struct sockaddr_storage self_addr;
+	struct completion	*destroy_completion;
+	struct kobject		kobj;
+	struct kobject		kobj_stats;
+	char			addr[IBTRS_ADDRLEN];
+	char			hostname[MAXHOSTNAMELEN];
+	struct ibtrs_clt_stats  stats;
+	wait_queue_head_t	mu_iu_wait_q;
+	wait_queue_head_t	mu_buf_wait_q;
+	atomic_t		peer_usr_msg_bufs;
+	struct completion	*ib_sess_destroy_completion;
+};
+
+#define TAG_SIZE(sess) (sizeof(struct ibtrs_tag) + (sess)->pdu_sz)
+#define GET_TAG(sess, idx) ((sess)->tags + TAG_SIZE(sess) * (idx))
+
+/**
+ * ibtrs_clt_reconnect() - Reconnect the session
+ * @sess: Session handler
+ */
+int ibtrs_clt_reconnect(struct ibtrs_session *sess);
+
+void ibtrs_clt_set_max_reconnect_attempts(struct ibtrs_session *sess,
+					  s16 value);
+
+s16 ibtrs_clt_get_max_reconnect_attempts(const struct ibtrs_session *sess);
+int ibtrs_clt_get_user_queue_depth(struct ibtrs_session *sess);
+int ibtrs_clt_set_user_queue_depth(struct ibtrs_session *sess, u16 queue_depth);
+int ibtrs_clt_reset_sg_list_distr_stats(struct ibtrs_session *sess,
+					bool enable);
+int ibtrs_clt_stats_sg_list_distr_to_str(struct ibtrs_session *sess,
+					 char *buf, size_t len);
+int ibtrs_clt_reset_rdma_lat_distr_stats(struct ibtrs_session *sess,
+					 bool enable);
+ssize_t ibtrs_clt_stats_rdma_lat_distr_to_str(struct ibtrs_session *sess,
+					      char *page, size_t len);
+int ibtrs_clt_reset_cpu_migr_stats(struct ibtrs_session *sess, bool enable);
+int ibtrs_clt_stats_migration_cnt_to_str(struct ibtrs_session *sess, char *buf,
+					 size_t len);
+int ibtrs_clt_reset_reconnects_stat(struct ibtrs_session *sess, bool enable);
+int ibtrs_clt_stats_reconnects_to_str(struct ibtrs_session *sess, char *buf,
+				      size_t len);
+int ibtrs_clt_reset_user_ib_msgs_stats(struct ibtrs_session *sess, bool enable);
+int ibtrs_clt_stats_user_ib_msgs_to_str(struct ibtrs_session *sess, char *buf,
+					size_t len);
+int ibtrs_clt_reset_wc_comp_stats(struct ibtrs_session *sess, bool enable);
+int ibtrs_clt_stats_wc_completion_to_str(struct ibtrs_session *sess, char *buf,
+					 size_t len);
+int ibtrs_clt_reset_rdma_stats(struct ibtrs_session *sess, bool enable);
+ssize_t ibtrs_clt_stats_rdma_to_str(struct ibtrs_session *sess,
+				    char *page, size_t len);
+bool ibtrs_clt_sess_is_connected(const struct ibtrs_session *sess);
+int ibtrs_clt_reset_all_stats(struct ibtrs_session *sess, bool enable);
+ssize_t ibtrs_clt_reset_all_help(struct ibtrs_session *sess,
+				 char *page, size_t len);
+
+#endif
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 87+ messages in thread
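
The tag pool declared above is addressed purely through the TAG_SIZE()/GET_TAG()
macros: queue_depth fixed-size slots, each holding a struct ibtrs_tag followed by
pdu_sz bytes of caller PDU, with tags_map as the free-slot bitmap and tags_wait
for callers that must wait for a slot. As an illustration only (the helper name
is hypothetical and the allocation scheme is an assumption; the real allocator
lives elsewhere in the series), a claim path could look like this:

#include <linux/bitops.h>

static struct ibtrs_tag *sketch_clt_get_tag(struct ibtrs_session *sess)
{
	unsigned long bit;

	do {
		bit = find_first_zero_bit(sess->tags_map, sess->queue_depth);
		if (bit >= sess->queue_depth)
			return NULL;	/* exhausted: sleep on sess->tags_wait */
	} while (test_and_set_bit(bit, sess->tags_map));

	return GET_TAG(sess, bit);	/* pdu_sz bytes follow the tag header */
}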

* [PATCH 07/28] ibtrs_clt: add files for sysfs interface
  2017-03-24 10:45 ` Jack Wang
                   ` (6 preceding siblings ...)
  (?)
@ 2017-03-24 10:45 ` Jack Wang
  -1 siblings, 0 replies; 87+ messages in thread
From: Jack Wang @ 2017-03-24 10:45 UTC (permalink / raw)
  To: linux-block, linux-rdma
  Cc: dledford, axboe, hch, mail, Milind.dumbare, yun.wang, Jack Wang,
	Kleber Souza, Danil Kipnis, Roman Pen

From: Jack Wang <jinpu.wang@profitbricks.com>

Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
Signed-off-by: Kleber Souza <kleber.souza@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kipnis@profitbricks.com>
Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
---
 .../infiniband/ulp/ibtrs_client/ibtrs_clt_sysfs.c  | 412 +++++++++++++++++++++
 .../infiniband/ulp/ibtrs_client/ibtrs_clt_sysfs.h  |  62 ++++
 2 files changed, 474 insertions(+)
 create mode 100644 drivers/infiniband/ulp/ibtrs_client/ibtrs_clt_sysfs.c
 create mode 100644 drivers/infiniband/ulp/ibtrs_client/ibtrs_clt_sysfs.h

diff --git a/drivers/infiniband/ulp/ibtrs_client/ibtrs_clt_sysfs.c b/drivers/infiniband/ulp/ibtrs_client/ibtrs_clt_sysfs.c
new file mode 100644
index 0000000..d430af0
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs_client/ibtrs_clt_sysfs.c
@@ -0,0 +1,412 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler < mail@fholler.de>
+ *          Jack Wang <jinpu.wang@profitbricks.com>
+ *   	    Kleber Souza <kleber.souza@profitbricks.com>
+ * 	    Danil Kipnis <danil.kipnis@profitbricks.com>
+ *   	    Roman Pen <roman.penyaev@profitbricks.com>
+ *          Milind Dumbare <Milind.dumbare@gmail.com>
+ *
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions, and the following disclaimer,
+ *    without modification.
+ * 2. Redistributions in binary form must reproduce at minimum a disclaimer
+ *    substantially similar to the "NO WARRANTY" disclaimer below
+ *    ("Disclaimer") and any redistribution must be conditioned upon
+ *    including a substantially similar Disclaimer requirement for further
+ *    binary redistribution.
+ * 3. Neither the names of the above-listed copyright holders nor the names
+ *    of any contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * NO WARRANTY
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * HOLDERS OR CONTRIBUTORS BE LIABLE FOR SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
+ * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGES.
+ *
+ */
+
+#include <linux/types.h>
+#include "ibtrs_clt_internal.h"
+#include <rdma/ibtrs_clt.h>
+#include "ibtrs_clt_sysfs.h"
+#include <rdma/ibtrs.h>
+#include <rdma/ibtrs_log.h>
+#include <rdma/ib.h>
+
+static struct kobject *sessions_kobj;
+static struct kobject *ibtrs_kobj;
+
+#define MIN_MAX_RECONN_ATT -1
+#define MAX_MAX_RECONN_ATT 9999
+
+static ssize_t ibtrs_clt_max_reconn_attempts_show(struct kobject *kobj,
+						  struct kobj_attribute *attr,
+						  char *page)
+{
+	struct ibtrs_session *sess = container_of(kobj, struct ibtrs_session,
+						  kobj);
+
+	return sprintf(page, "%d\n",
+		       ibtrs_clt_get_max_reconnect_attempts(sess));
+}
+
+static ssize_t ibtrs_clt_max_reconn_attempts_store(struct kobject *kobj,
+						   struct kobj_attribute *attr,
+						   const char *buf,
+						   size_t count)
+{
+	int ret;
+	s16 value;
+	struct ibtrs_session *sess = container_of(kobj, struct ibtrs_session,
+						  kobj);
+
+	ret = kstrtos16(buf, 10, &value);
+	if (unlikely(ret)) {
+		ERR(sess, "%s: failed to convert string '%s' to int\n",
+		    attr->attr.name, buf);
+		return ret;
+	}
+	if (unlikely(value > MAX_MAX_RECONN_ATT ||
+		     value < MIN_MAX_RECONN_ATT)) {
+		ERR(sess, "%s: invalid range"
+		    " (provided: '%s', accepted: min: %d, max: %d)\n",
+		    attr->attr.name, buf, MIN_MAX_RECONN_ATT,
+		    MAX_MAX_RECONN_ATT);
+		return -EINVAL;
+	}
+
+	INFO(sess, "%s: changing value from %d to %d\n", attr->attr.name,
+	     ibtrs_clt_get_max_reconnect_attempts(sess), value);
+	ibtrs_clt_set_max_reconnect_attempts(sess, value);
+	return count;
+}
+
+static struct kobj_attribute max_ibtrs_clt_reconnect_attempts_attr =
+		__ATTR(max_reconnect_attempts, 0644,
+		       ibtrs_clt_max_reconn_attempts_show,
+		       ibtrs_clt_max_reconn_attempts_store);
+
+static ssize_t ibtrs_clt_hb_timeout_show(struct kobject *kobj,
+					 struct kobj_attribute *attr,
+					 char *page)
+{
+	struct ibtrs_session *sess = container_of(kobj, struct ibtrs_session,
+						  kobj);
+
+	return scnprintf(page, PAGE_SIZE, "%u\n", sess->heartbeat.timeout_ms);
+}
+
+static ssize_t ibtrs_clt_hb_timeout_store(struct kobject *kobj,
+					  struct kobj_attribute *attr,
+					  const char *buf, size_t count)
+{
+	int ret;
+	u32 timeout_ms;
+	struct ibtrs_session *sess = container_of(kobj, struct ibtrs_session,
+						  kobj);
+
+	ret = kstrtouint(buf, 0, &timeout_ms);
+	if (ret) {
+		ERR(sess,
+		    "%s: failed to convert string '%s' to unsigned int\n",
+		    attr->attr.name, buf);
+		return ret;
+	}
+
+	ret = ibtrs_heartbeat_timeout_validate(timeout_ms);
+	if (ret)
+		return ret;
+
+	INFO(sess, "%s: changing value from %u to %u\n", attr->attr.name,
+	     sess->heartbeat.timeout_ms, timeout_ms);
+	ibtrs_set_heartbeat_timeout(&sess->heartbeat, timeout_ms);
+	return count;
+}
+
+static struct kobj_attribute ibtrs_clt_heartbeat_timeout_ms_attr =
+		__ATTR(heartbeat_timeout_ms, 0644,
+		       ibtrs_clt_hb_timeout_show, ibtrs_clt_hb_timeout_store);
+
+static ssize_t ibtrs_clt_state_show(struct kobject *kobj,
+				    struct kobj_attribute *attr, char *page)
+{
+	struct ibtrs_session *sess = container_of(kobj, struct ibtrs_session,
+						  kobj);
+	if (ibtrs_clt_sess_is_connected(sess))
+		return sprintf(page, "connected\n");
+
+	return sprintf(page, "disconnected\n");
+}
+
+static struct kobj_attribute ibtrs_clt_state_attr = __ATTR(state, 0444,
+							   ibtrs_clt_state_show,
+							   NULL);
+
+static ssize_t ibtrs_clt_hostname_show(struct kobject *kobj,
+				       struct kobj_attribute *attr, char *page)
+{
+	struct ibtrs_session *sess = container_of(kobj, struct ibtrs_session,
+						  kobj);
+	return sprintf(page, "%s\n", sess->hostname);
+}
+
+static struct kobj_attribute ibtrs_clt_hostname_attr =
+		__ATTR(hostname, 0444, ibtrs_clt_hostname_show, NULL);
+
+static ssize_t ibtrs_clt_reconnect_show(struct kobject *kobj,
+					struct kobj_attribute *attr, char *page)
+{
+	return scnprintf(page, PAGE_SIZE, "Usage: echo 1 > %s\n",
+			 attr->attr.name);
+}
+
+static ssize_t ibtrs_clt_reconnect_store(struct kobject *kobj,
+					 struct kobj_attribute *attr,
+					 const char *buf, size_t count)
+{
+	struct ibtrs_session *sess;
+	int ret;
+
+	sess = container_of(kobj, struct ibtrs_session, kobj);
+
+	if (!sysfs_streq(buf, "1")) {
+		ERR(sess, "%s: unknown value: '%s'\n", attr->attr.name, buf);
+		return -EINVAL;
+	}
+
+	ret = ibtrs_clt_reconnect(sess);
+	if (ret) {
+		ERR(sess, "%s: failed to reconnect session, errno: %d\n",
+		    attr->attr.name, ret);
+		return ret;
+	}
+	return count;
+}
+
+static struct kobj_attribute ibtrs_clt_reconnect_attr =
+		__ATTR(reconnect, 0644, ibtrs_clt_reconnect_show,
+		       ibtrs_clt_reconnect_store);
+
+static ssize_t ibtrs_clt_queue_show(struct kobject *kobj,
+				    struct kobj_attribute *attr, char *page)
+{
+	struct ibtrs_session *sess = container_of(kobj, struct ibtrs_session,
+						  kobj);
+	return scnprintf(page, PAGE_SIZE, "%d\n",
+			 ibtrs_clt_get_user_queue_depth(sess));
+}
+
+static ssize_t ibtrs_clt_queue_store(struct kobject *kobj,
+				     struct kobj_attribute *attr,
+				     const char *buf, size_t count)
+{
+	int res;
+	u16 old_queue_depth, queue_depth;
+	struct ibtrs_session *sess = container_of(kobj, struct ibtrs_session,
+						  kobj);
+	res = kstrtou16(buf, 0, &queue_depth);
+	if (res) {
+		ERR(sess,
+		    "%s: failed to convert string '%s' to unsigned int\n",
+		    attr->attr.name, buf);
+		return res;
+	}
+
+	old_queue_depth = ibtrs_clt_get_user_queue_depth(sess);
+	res = ibtrs_clt_set_user_queue_depth(sess, queue_depth);
+	if (!res) {
+		INFO(sess, "%s: changed value from %u to %u\n",
+		     attr->attr.name, old_queue_depth, queue_depth);
+	} else {
+		ERR(sess, "%s: failed to set queue depth, errno: %d\n",
+		    attr->attr.name, res);
+		return res;
+	}
+	return count;
+}
+
+STAT_ATTR(cpu_migration, ibtrs_clt_stats_migration_cnt_to_str,
+	  ibtrs_clt_reset_cpu_migr_stats);
+
+STAT_ATTR(sg_entries, ibtrs_clt_stats_sg_list_distr_to_str,
+	  ibtrs_clt_reset_sg_list_distr_stats);
+
+STAT_ATTR(reconnects, ibtrs_clt_stats_reconnects_to_str,
+	  ibtrs_clt_reset_reconnects_stat);
+
+STAT_ATTR(rdma_lat, ibtrs_clt_stats_rdma_lat_distr_to_str,
+	  ibtrs_clt_reset_rdma_lat_distr_stats);
+
+STAT_ATTR(user_ib_messages, ibtrs_clt_stats_user_ib_msgs_to_str,
+	  ibtrs_clt_reset_user_ib_msgs_stats);
+
+STAT_ATTR(wc_completion, ibtrs_clt_stats_wc_completion_to_str,
+	  ibtrs_clt_reset_wc_comp_stats);
+
+STAT_ATTR(rdma, ibtrs_clt_stats_rdma_to_str,
+	  ibtrs_clt_reset_rdma_stats);
+
+STAT_ATTR(reset_all, ibtrs_clt_reset_all_help, ibtrs_clt_reset_all_stats);
+
+static struct attribute *ibtrs_clt_default_stats_attrs[] = {
+	&sg_entries_attr.attr,
+	&cpu_migration_attr.attr,
+	&reconnects_attr.attr,
+	&rdma_lat_attr.attr,
+	&user_ib_messages_attr.attr,
+	&wc_completion_attr.attr,
+	&rdma_attr.attr,
+	&reset_all_attr.attr,
+	NULL,
+};
+
+static struct attribute_group ibtrs_clt_default_stats_attr_group = {
+	.attrs = ibtrs_clt_default_stats_attrs,
+};
+
+static struct kobj_type ibtrs_stats_ktype = {
+	.sysfs_ops = &kobj_sysfs_ops,
+};
+
+static int ibtrs_clt_create_stats_files(struct kobject *kobj,
+					struct kobject *kobj_stats)
+{
+	int ret;
+
+	ret = kobject_init_and_add(kobj_stats, &ibtrs_stats_ktype, kobj,
+				   "stats");
+	if (ret) {
+		ERR_NP("Failed to init and add stats kobject, errno: %d\n",
+		       ret);
+		return ret;
+	}
+
+	ret = sysfs_create_group(kobj_stats,
+				 &ibtrs_clt_default_stats_attr_group);
+	if (ret) {
+		ERR_NP("failed to create stats sysfs group, errno: %d\n", ret);
+		goto err;
+	}
+
+	return 0;
+
+err:
+	kobject_del(kobj_stats);
+	kobject_put(kobj_stats);
+
+	return ret;
+}
+
+static struct kobj_attribute ibtrs_clt_queue_depth_attr =
+		__ATTR(queue_depth, 0644, ibtrs_clt_queue_show,
+		       ibtrs_clt_queue_store);
+
+static struct attribute *ibtrs_clt_default_sess_attrs[] = {
+	&max_ibtrs_clt_reconnect_attempts_attr.attr,
+	&ibtrs_clt_heartbeat_timeout_ms_attr.attr,
+	&ibtrs_clt_state_attr.attr,
+	&ibtrs_clt_hostname_attr.attr,
+	&ibtrs_clt_reconnect_attr.attr,
+	&ibtrs_clt_queue_depth_attr.attr,
+	NULL,
+};
+
+static struct attribute_group ibtrs_clt_default_sess_attr_group = {
+	.attrs = ibtrs_clt_default_sess_attrs,
+};
+
+static struct kobj_type ibtrs_session_ktype = {
+	.sysfs_ops = &kobj_sysfs_ops,
+};
+
+int ibtrs_clt_create_sess_files(struct kobject *kobj,
+				struct kobject *kobj_stats, const char *ip)
+{
+	int ret;
+
+	ret = kobject_init_and_add(kobj, &ibtrs_session_ktype, sessions_kobj,
+				   "%s", ip);
+	if (ret) {
+		ERR_NP("Failed to create session kobject, errno: %d\n", ret);
+		return ret;
+	}
+
+	ret = sysfs_create_group(kobj, &ibtrs_clt_default_sess_attr_group);
+	if (ret) {
+		ERR_NP("Failed to create session sysfs group, errno: %d\n",
+		       ret);
+		goto err;
+	}
+
+	ret = ibtrs_clt_create_stats_files(kobj, kobj_stats);
+	if (ret) {
+		ERR_NP("Failed to create stats files, errno: %d\n", ret);
+		goto err1;
+	}
+
+	return 0;
+
+err1:
+	sysfs_remove_group(kobj, &ibtrs_clt_default_sess_attr_group);
+err:
+	kobject_del(kobj);
+	kobject_put(kobj);
+
+	return ret;
+}
+
+void ibtrs_clt_destroy_sess_files(struct kobject *kobj,
+				  struct kobject *kobj_stats)
+{
+	if (kobj->state_in_sysfs) {
+		kobject_del(kobj_stats);
+		kobject_put(kobj_stats);
+		kobject_del(kobj);
+		kobject_put(kobj);
+	}
+}
+
+int ibtrs_clt_create_sysfs_files(void)
+{
+	ibtrs_kobj = kobject_create_and_add("ibtrs", kernel_kobj);
+	if (!ibtrs_kobj) {
+		ERR_NP("Failed to create 'ibtrs' kobject\n");
+		return -ENOMEM;
+	}
+
+	sessions_kobj = kobject_create_and_add("sessions", ibtrs_kobj);
+	if (!sessions_kobj) {
+		ERR_NP("Failed to create 'sessions' kobject\n");
+		kobject_del(ibtrs_kobj);
+		kobject_put(ibtrs_kobj);
+		return -ENOMEM;
+	}
+	return 0;
+}
+
+void ibtrs_clt_destroy_sysfs_files(void)
+{
+	kobject_del(sessions_kobj);
+	kobject_del(ibtrs_kobj);
+	kobject_put(sessions_kobj);
+	kobject_put(ibtrs_kobj);
+}
diff --git a/drivers/infiniband/ulp/ibtrs_client/ibtrs_clt_sysfs.h b/drivers/infiniband/ulp/ibtrs_client/ibtrs_clt_sysfs.h
new file mode 100644
index 0000000..d3ae563
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs_client/ibtrs_clt_sysfs.h
@@ -0,0 +1,62 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler < mail@fholler.de>
+ *          Jack Wang <jinpu.wang@profitbricks.com>
+ *   	    Kleber Souza <kleber.souza@profitbricks.com>
+ * 	    Danil Kipnis <danil.kipnis@profitbricks.com>
+ *   	    Roman Pen <roman.penyaev@profitbricks.com>
+ *          Milind Dumbare <Milind.dumbare@gmail.com>
+ *
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions, and the following disclaimer,
+ *    without modification.
+ * 2. Redistributions in binary form must reproduce at minimum a disclaimer
+ *    substantially similar to the "NO WARRANTY" disclaimer below
+ *    ("Disclaimer") and any redistribution must be conditioned upon
+ *    including a substantially similar Disclaimer requirement for further
+ *    binary redistribution.
+ * 3. Neither the names of the above-listed copyright holders nor the names
+ *    of any contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * NO WARRANTY
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * HOLDERS OR CONTRIBUTORS BE LIABLE FOR SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
+ * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGES.
+ *
+ */
+
+#ifndef _IBTRS_CLT_SYFS_H
+#define _IBTRS_CLT_SYFS_H
+
+#include <linux/kobject.h>
+
+int ibtrs_clt_create_sysfs_files(void);
+
+void ibtrs_clt_destroy_sysfs_files(void);
+
+int ibtrs_clt_create_sess_files(struct kobject *kobj, struct kobject *kobj_sess,
+				const char *ip);
+
+void ibtrs_clt_destroy_sess_files(struct kobject *kobj,
+				  struct kobject *kobj_sess);
+
+#endif
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 87+ messages in thread
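
The code above ends up creating /sys/kernel/ibtrs/sessions/<addr>/ with the
per-session attributes and a stats/ child kobject carrying the STAT_ATTR()
entries. STAT_ATTR() itself is not part of this patch; assuming it lives in
ibtrs_clt_internal.h, a macro along the following lines (a sketch, not the
actual definition) would generate the <name>_attr kobj_attributes referenced
above, with show printing the statistic via the *_to_str() helper and store
resetting it:

#define STAT_ATTR(name, print, reset)					\
static ssize_t name##_show(struct kobject *kobj,			\
			   struct kobj_attribute *attr, char *page)	\
{									\
	struct ibtrs_session *sess =					\
		container_of(kobj, struct ibtrs_session, kobj_stats);	\
									\
	return print(sess, page, PAGE_SIZE);				\
}									\
									\
static ssize_t name##_store(struct kobject *kobj,			\
			    struct kobj_attribute *attr,		\
			    const char *buf, size_t count)		\
{									\
	struct ibtrs_session *sess =					\
		container_of(kobj, struct ibtrs_session, kobj_stats);	\
	bool enable;							\
	int ret;							\
									\
	ret = kstrtobool(buf, &enable);					\
	if (ret)							\
		return ret;						\
	ret = reset(sess, enable);					\
	return ret ? ret : count;					\
}									\
									\
static struct kobj_attribute name##_attr =				\
	__ATTR(name, 0644, name##_show, name##_store)

With something like that in place, reading .../stats/rdma dumps the RDMA
counters and writing 1 to .../stats/reset_all clears them (cf. the reset_all
help attribute).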

* [PATCH 08/28] ibtrs_clt: add Makefile and Kconfig
  2017-03-24 10:45 ` Jack Wang
                   ` (7 preceding siblings ...)
  (?)
@ 2017-03-24 10:45 ` Jack Wang
  2017-03-25  5:51     ` kbuild test robot
  2017-03-25  6:55     ` kbuild test robot
  -1 siblings, 2 replies; 87+ messages in thread
From: Jack Wang @ 2017-03-24 10:45 UTC (permalink / raw)
  To: linux-block, linux-rdma
  Cc: dledford, axboe, hch, mail, Milind.dumbare, yun.wang, Jack Wang

From: Jack Wang <jinpu.wang@profitbricks.com>

Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
---
 drivers/infiniband/Kconfig                   | 2 ++
 drivers/infiniband/ulp/Makefile              | 1 +
 drivers/infiniband/ulp/ibtrs_client/Kconfig  | 8 ++++++++
 drivers/infiniband/ulp/ibtrs_client/Makefile | 6 ++++++
 4 files changed, 17 insertions(+)
 create mode 100644 drivers/infiniband/ulp/ibtrs_client/Kconfig
 create mode 100644 drivers/infiniband/ulp/ibtrs_client/Makefile

diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig
index 66f8602..cb1b864 100644
--- a/drivers/infiniband/Kconfig
+++ b/drivers/infiniband/Kconfig
@@ -85,6 +85,8 @@ source "drivers/infiniband/ulp/srpt/Kconfig"
 source "drivers/infiniband/ulp/iser/Kconfig"
 source "drivers/infiniband/ulp/isert/Kconfig"
 
+source "drivers/infiniband/ulp/ibtrs_client/Kconfig"
+
 source "drivers/infiniband/sw/rdmavt/Kconfig"
 source "drivers/infiniband/sw/rxe/Kconfig"
 
diff --git a/drivers/infiniband/ulp/Makefile b/drivers/infiniband/ulp/Makefile
index f3c7dcf..acd8ce6 100644
--- a/drivers/infiniband/ulp/Makefile
+++ b/drivers/infiniband/ulp/Makefile
@@ -3,3 +3,4 @@ obj-$(CONFIG_INFINIBAND_SRP)		+= srp/
 obj-$(CONFIG_INFINIBAND_SRPT)		+= srpt/
 obj-$(CONFIG_INFINIBAND_ISER)		+= iser/
 obj-$(CONFIG_INFINIBAND_ISERT)		+= isert/
+obj-$(CONFIG_INFINIBAND_IBTRS_CLT)      += ibtrs_client/
diff --git a/drivers/infiniband/ulp/ibtrs_client/Kconfig b/drivers/infiniband/ulp/ibtrs_client/Kconfig
new file mode 100644
index 0000000..3cf0728
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs_client/Kconfig
@@ -0,0 +1,8 @@
+config INFINIBAND_IBTRS_CLT
+	tristate "InfiniBand IBTRS CLIENT"
+	depends on INFINIBAND_ADDR_TRANS
+	---help---
+	  Support for simplified data transfer over InfiniBand.
+	  This offers an API to the user module IBNBD_CLIENT.
+
+	  The IBTRS protocol is defined by ProfitBricks GmbH.
diff --git a/drivers/infiniband/ulp/ibtrs_client/Makefile b/drivers/infiniband/ulp/ibtrs_client/Makefile
new file mode 100644
index 0000000..d0fb226
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs_client/Makefile
@@ -0,0 +1,6 @@
+
+obj-$(CONFIG_INFINIBAND_IBTRS_CLT)	+= ibtrs_client.o
+
+ibtrs_client-y		:= ibtrs_clt.o ibtrs_clt_sysfs.o \
+	../ibtrs_lib/ibtrs.o ../ibtrs_lib/ibtrs-proto.o ../ibtrs_lib/iu.o \
+	../ibtrs_lib/heartbeat.o ../ibtrs_lib/common.o
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH 09/28] ibtrs_srv: add header file for exported interface
@ 2017-03-24 10:45   ` Jack Wang
  0 siblings, 0 replies; 87+ messages in thread
From: Jack Wang @ 2017-03-24 10:45 UTC (permalink / raw)
  To: linux-block, linux-rdma
  Cc: dledford, axboe, hch, mail, Milind.dumbare, yun.wang, Jack Wang,
	Kleber Souza, Danil Kipnis, Roman Pen

From: Jack Wang <jinpu.wang@profitbricks.com>

Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
Signed-off-by: Kleber Souza <kleber.souza@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kipnis@profitbricks.com>
Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
---
 include/rdma/ibtrs_srv.h | 206 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 206 insertions(+)
 create mode 100644 include/rdma/ibtrs_srv.h

diff --git a/include/rdma/ibtrs_srv.h b/include/rdma/ibtrs_srv.h
new file mode 100644
index 0000000..dbd535f
--- /dev/null
+++ b/include/rdma/ibtrs_srv.h
@@ -0,0 +1,206 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler < mail@fholler.de>
+ *          Jack Wang <jinpu.wang@profitbricks.com>
+ *   	    Kleber Souza <kleber.souza@profitbricks.com>
+ * 	    Danil Kipnis <danil.kipnis@profitbricks.com>
+ *   	    Roman Pen <roman.penyaev@profitbricks.com>
+ *          Milind Dumbare <Milind.dumbare@gmail.com>
+ *
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions, and the following disclaimer,
+ *    without modification.
+ * 2. Redistributions in binary form must reproduce at minimum a disclaimer
+ *    substantially similar to the "NO WARRANTY" disclaimer below
+ *    ("Disclaimer") and any redistribution must be conditioned upon
+ *    including a substantially similar Disclaimer requirement for further
+ *    binary redistribution.
+ * 3. Neither the names of the above-listed copyright holders nor the names
+ *    of any contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * NO WARRANTY
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * HOLDERS OR CONTRIBUTORS BE LIABLE FOR SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
+ * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGES.
+ *
+ */
+
+#ifndef _IBTRS_SRV_H
+#define _IBTRS_SRV_H
+
+#include <linux/socket.h>
+
+struct ibtrs_session;
+struct ibtrs_ops_id;
+
+enum ibtrs_srv_rdma_ev {
+	IBTRS_SRV_RDMA_EV_RECV,
+	IBTRS_SRV_RDMA_EV_WRITE_REQ,
+};
+
+/**
+ * enum ibtrs_srv_sess_ev - Session events
+ * @IBTRS_SRV_SESS_EV_CONNECTED:	Connection from client established
+ * @IBTRS_SRV_SESS_EV_DISCONNECTING:	Connection is being torn down; sending
+ *					data through it may fail, but messages
+ *					can still be received.
+ * @IBTRS_SRV_SESS_EV_DISCONNECTED:	Connection was disconnected, all
+ *					connection IBTRS resources were freed.
+ */
+
+enum ibtrs_srv_sess_ev {
+	IBTRS_SRV_SESS_EV_CONNECTED,
+	IBTRS_SRV_SESS_EV_DISCONNECTING,
+	IBTRS_SRV_SESS_EV_DISCONNECTED,
+};
+
+/**
+ * struct ibtrs_srv_ops - Callbacks for ibtrs_server
+ * @owner:		module that uses ibtrs_server
+ * @rdma_ev:		Event notification for RDMA operations
+ *			If the callback returns a value != 0, an error message
+ *			for the data transfer will be sent to the client.
+ *
+ *	@sess:		Session
+ *	@priv:		Private data from user
+ *	@id:		internal IBTRS id
+ *	@ev:		Event
+ *	@data:		Data received from the client. The message of the
+ *			ibtrs_client user is allocated at the end of the
+ *			buffer; the data of the ibtrs_client itself is
+ *			located before the message.
+ *			If the event is %IBTRS_SRV_RDMA_EV_WRITE_REQ, the user
+ *			can write its response into @data. When
+ *			ibtrs_srv_resp_rdma() is called, this @data will be
+ *			transferred to the client.
+ *	@len:		length of data in @data
+ *
+ * @sess_ev:		Events about connection state changes
+ *			If the callback returns != 0 for the
+ *			%IBTRS_SRV_SESS_EV_CONNECTED event, the corresponding
+ *			session will be destroyed.
+ *	@sess:		Session
+ *	@ev:		event
+ *	@priv:		Private data from user if previously set with
+ *			ibtrs_srv_set_sess_priv()
+ *
+ * @recv:		Event notification for InfiniBand message reception
+ *	@sess:		Session
+ *	@priv:		Private data from user if previously set with
+ *			ibtrs_srv_set_sess_priv()
+ *	@msg:		Received message
+ *	@len:		length of @msg
+ */
+
+typedef int (rdma_ev_fn)(struct ibtrs_session *sess, void *priv,
+			 struct ibtrs_ops_id *id, enum ibtrs_srv_rdma_ev ev,
+			 void *data, size_t len);
+typedef int (sess_ev_fn)(struct ibtrs_session *sess, enum ibtrs_srv_sess_ev ev,
+			 void *priv);
+typedef void (recv_fn)(struct ibtrs_session *sess, void *priv, const void *msg,
+		       size_t len);
+
+struct ibtrs_srv_ops {
+	struct module *owner;
+
+	rdma_ev_fn	*rdma_ev;
+	sess_ev_fn	*sess_ev;
+	recv_fn		*recv;
+};
+
+/**
+ * ibtrs_srv_register() - register a module with ibtrs_server
+ * @ops:		callback functions
+ *
+ * Registers a module with the ibtrs_server. The user module passes the
+ * function pointers that ibtrs_server can call to communicate with it.
+ *
+ * Return:
+ * 0:		Success
+ * <0:		Error
+ */
+int ibtrs_srv_register(const struct ibtrs_srv_ops *ops);
+
+/**
+ * ibtrs_srv_unregister - unregister a module with ibtrs_server
+ * @ops: the struct that was passed to ibtrs_srv_register() before
+ *
+ * Unregisters a module from the ibtrs_server. All open connections will be
+ * terminated.
+ */
+void ibtrs_srv_unregister(const struct ibtrs_srv_ops *ops);
+
+/**
+ * ibtrs_srv_resp_rdma() - Finish an RDMA request
+ *
+ * @id:		Internal IBTRS operation identifier
+ * @errno:	Response code sent to the other side for this operation.
+ *		0 = success, <0 = error
+ * Return:
+ *  0:		Success
+ * <0:		Error
+ *
+ * Finish an RDMA operation. A message is sent to the client and the
+ * corresponding memory areas will be released.
+ */
+int ibtrs_srv_resp_rdma(struct ibtrs_ops_id *id, int errno);
+
+/**
+ * ibtrs_srv_send() - Send data to the remote side in an InfiniBand message.
+ * @sess:	Session
+ * @vec:	Data to send
+ * @nr:		Length of @vec
+ *
+ * Return:
+ * 0:		Success
+ * <0:		Error
+ * -EINVAL:	the total length of @vec is too big
+ */
+int ibtrs_srv_send(struct ibtrs_session *sess, const struct kvec *vec,
+		   size_t nr);
+
+/**
+ * ibtrs_srv_set_sess_priv() - Set private pointer in ibtrs_session.
+ * @sess:	Session
+ * @priv:	The private pointer that is associated with the session.
+ */
+void ibtrs_srv_set_sess_priv(struct ibtrs_session *sess, void *priv);
+
+/**
+ * ibtrs_srv_get_sess_qdepth() - Get ibtrs_session qdepth.
+ * @sess:	Session
+ */
+int ibtrs_srv_get_sess_qdepth(struct ibtrs_session *sess);
+
+/**
+ * ibtrs_srv_get_sess_addr() - Get ibtrs_session address.
+ * @sess:	Session
+ */
+const char *ibtrs_srv_get_sess_addr(struct ibtrs_session *sess);
+
+/**
+ * ibtrs_srv_get_sess_hostname() - Get ibtrs_session peer hostname.
+ * @sess:	Session
+ */
+const char *ibtrs_srv_get_sess_hostname(struct ibtrs_session *sess);
+
+#endif
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 87+ messages in thread
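
To make the exported interface above concrete, here is a minimal sketch of a
user module wiring itself up: fill struct ibtrs_srv_ops with the three
callbacks, register on load, unregister on unload. Everything named example_*
is hypothetical and the callback bodies are placeholders, not the ibnbd_server
implementation that comes later in the series:

#include <linux/module.h>
#include <linux/kernel.h>
#include <rdma/ibtrs_srv.h>

static int example_rdma_ev(struct ibtrs_session *sess, void *priv,
			   struct ibtrs_ops_id *id, enum ibtrs_srv_rdma_ev ev,
			   void *data, size_t len)
{
	/*
	 * A real consumer would queue the IO described by @data and call
	 * ibtrs_srv_resp_rdma(id, result) from its completion path; here the
	 * operation is completed immediately for brevity.
	 */
	return ibtrs_srv_resp_rdma(id, 0);
}

static int example_sess_ev(struct ibtrs_session *sess,
			   enum ibtrs_srv_sess_ev ev, void *priv)
{
	switch (ev) {
	case IBTRS_SRV_SESS_EV_CONNECTED:
		pr_info("session from %s connected\n",
			ibtrs_srv_get_sess_addr(sess));
		ibtrs_srv_set_sess_priv(sess, NULL);	/* per-session state */
		break;
	case IBTRS_SRV_SESS_EV_DISCONNECTING:
	case IBTRS_SRV_SESS_EV_DISCONNECTED:
		break;
	}
	return 0;
}

static void example_recv(struct ibtrs_session *sess, void *priv,
			 const void *msg, size_t len)
{
	/* out-of-band InfiniBand message from the client */
}

static struct ibtrs_srv_ops example_ops = {
	.owner	 = THIS_MODULE,
	.rdma_ev = example_rdma_ev,
	.sess_ev = example_sess_ev,
	.recv	 = example_recv,
};

static int __init example_init(void)
{
	return ibtrs_srv_register(&example_ops);
}

static void __exit example_exit(void)
{
	ibtrs_srv_unregister(&example_ops);
}

module_init(example_init);
module_exit(example_exit);
MODULE_LICENSE("GPL");

Note that registration is refused unless all three callbacks are set
(cf. srv_ops_are_valid() in the next patch).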

* [PATCH 10/28] ibtrs_srv: add main functionality for ibtrs_server
  2017-03-24 10:45 ` Jack Wang
                   ` (9 preceding siblings ...)
  (?)
@ 2017-03-24 10:45 ` Jack Wang
  -1 siblings, 0 replies; 87+ messages in thread
From: Jack Wang @ 2017-03-24 10:45 UTC (permalink / raw)
  To: linux-block, linux-rdma
  Cc: dledford, axboe, hch, mail, Milind.dumbare, yun.wang, Jack Wang,
	Kleber Souza, Danil Kipnis, Roman Pen

From: Jack Wang <jinpu.wang@profitbricks.com>

The service accepts connection requests from clients and reserves memory
for them.

It executes RDMA transfers and hands the received data over to ibnbd_server.

Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
Signed-off-by: Kleber Souza <kleber.souza@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kipnis@profitbricks.com>
Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
---
 drivers/infiniband/ulp/ibtrs_server/ibtrs_srv.c | 3744 +++++++++++++++++++++++
 1 file changed, 3744 insertions(+)
 create mode 100644 drivers/infiniband/ulp/ibtrs_server/ibtrs_srv.c

diff --git a/drivers/infiniband/ulp/ibtrs_server/ibtrs_srv.c b/drivers/infiniband/ulp/ibtrs_server/ibtrs_srv.c
new file mode 100644
index 0000000..513e90a
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs_server/ibtrs_srv.c
@@ -0,0 +1,3744 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler < mail@fholler.de>
+ *          Jack Wang <jinpu.wang@profitbricks.com>
+ *   	    Kleber Souza <kleber.souza@profitbricks.com>
+ * 	    Danil Kipnis <danil.kipnis@profitbricks.com>
+ *   	    Roman Pen <roman.penyaev@profitbricks.com>
+ *          Milind Dumbare <Milind.dumbare@gmail.com>
+ *
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions, and the following disclaimer,
+ *    without modification.
+ * 2. Redistributions in binary form must reproduce at minimum a disclaimer
+ *    substantially similar to the "NO WARRANTY" disclaimer below
+ *    ("Disclaimer") and any redistribution must be conditioned upon
+ *    including a substantially similar Disclaimer requirement for further
+ *    binary redistribution.
+ * 3. Neither the names of the above-listed copyright holders nor the names
+ *    of any contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * NO WARRANTY
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * HOLDERS OR CONTRIBUTORS BE LIABLE FOR SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
+ * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGES.
+ *
+ */
+
+#include <linux/module.h>
+#include <linux/sizes.h>
+#include <linux/utsname.h>
+#include <linux/cpumask.h>
+#include <linux/debugfs.h>
+#include <rdma/ib_verbs.h>
+#include <rdma/rdma_cm.h>
+#include <rdma/ib.h>
+
+#include <rdma/ibtrs_srv.h>
+#include "ibtrs_srv_sysfs.h"
+#include "ibtrs_srv_internal.h"
+#include <rdma/ibtrs.h>
+#include <rdma/ibtrs_log.h>
+
+MODULE_AUTHOR("ibnbd@profitbricks.com");
+MODULE_DESCRIPTION("InfiniBand Transport Server");
+MODULE_VERSION(__stringify(IBTRS_VER));
+MODULE_LICENSE("GPL");
+
+#define DEFAULT_MAX_IO_SIZE_KB 128
+#define DEFAULT_MAX_IO_SIZE (DEFAULT_MAX_IO_SIZE_KB * 1024)
+static int max_io_size = DEFAULT_MAX_IO_SIZE;
+#define MAX_REQ_SIZE PAGE_SIZE
+static int rcv_buf_size = DEFAULT_MAX_IO_SIZE + MAX_REQ_SIZE;
+
+static int max_io_size_set(const char *val, const struct kernel_param *kp)
+{
+	int err, ival;
+
+	err = kstrtoint(val, 0, &ival);
+	if (err)
+		return err;
+
+	if (ival < 4096 || ival + MAX_REQ_SIZE > (4096 * 1024) ||
+	    (ival + MAX_REQ_SIZE) % 512 != 0) {
+		ERR_NP("Invalid max io size value %d, has to be"
+		       " > %d, < %d\n", ival, 4096, 4194304);
+		return -EINVAL;
+	}
+
+	max_io_size = ival;
+	rcv_buf_size = max_io_size + MAX_REQ_SIZE;
+	INFO_NP("max io size changed to %d\n", ival);
+
+	return 0;
+}
+
+static const struct kernel_param_ops max_io_size_ops = {
+	.set		= max_io_size_set,
+	.get		= param_get_int,
+};
+module_param_cb(max_io_size, &max_io_size_ops, &max_io_size, 0444);
+MODULE_PARM_DESC(max_io_size,
+		 "Max size for each IO request; when changing it, the unit is"
+		 " bytes (default: " __stringify(DEFAULT_MAX_IO_SIZE_KB) "KB)");
+
+#define DEFAULT_SESS_QUEUE_DEPTH 512
+static int sess_queue_depth = DEFAULT_SESS_QUEUE_DEPTH;
+module_param_named(sess_queue_depth, sess_queue_depth, int, 0444);
+MODULE_PARM_DESC(sess_queue_depth,
+		 "Number of buffers for pending I/O requests to allocate"
+		 " per session. Maximum: " __stringify(MAX_SESS_QUEUE_DEPTH)
+		 " (default: " __stringify(DEFAULT_SESS_QUEUE_DEPTH) ")");
+
+#define DEFAULT_INIT_POOL_SIZE 10
+static int init_pool_size = DEFAULT_INIT_POOL_SIZE;
+module_param_named(init_pool_size, init_pool_size, int, 0444);
+MODULE_PARM_DESC(init_pool_size,
+		 "Maximum size of the RDMA buffers pool to pre-allocate on"
+		 " module load, in number of sessions. (default: "
+		 __stringify(DEFAULT_INIT_POOL_SIZE) ")");
+
+#define DEFAULT_POOL_SIZE_HI_WM 100
+static int pool_size_hi_wm = DEFAULT_POOL_SIZE_HI_WM;
+module_param_named(pool_size_hi_wm, pool_size_hi_wm, int, 0444);
+MODULE_PARM_DESC(pool_size_hi_wm,
+		 "High watermark value for the size of RDMA buffers pool"
+		 " (in number of sessions). Newly allocated buffers will be"
+		 " added to the pool until pool_size_hi_wm is reached."
+		 " (default: " __stringify(DEFAULT_POOL_SIZE_HI_WM) ")");
+
+static int retry_count = 7;
+
+static int retry_count_set(const char *val, const struct kernel_param *kp)
+{
+	int err, ival;
+
+	err = kstrtoint(val, 0, &ival);
+	if (err)
+		return err;
+
+	if (ival < MIN_RTR_CNT || ival > MAX_RTR_CNT) {
+		ERR_NP("Invalid retry count value %d, has to be"
+		       " > %d, < %d\n", ival, MIN_RTR_CNT, MAX_RTR_CNT);
+		return -EINVAL;
+	}
+
+	retry_count = ival;
+	INFO_NP("QP retry count changed to %d\n", ival);
+
+	return 0;
+}
+
+static const struct kernel_param_ops retry_count_ops = {
+	.set		= retry_count_set,
+	.get		= param_get_int,
+};
+module_param_cb(retry_count, &retry_count_ops, &retry_count, 0644);
+
+MODULE_PARM_DESC(retry_count, "Number of times to send the message if the"
+		 " remote side didn't respond with Ack or Nack (default: 7,"
+		 " min: " __stringify(MIN_RTR_CNT) ", max: "
+		 __stringify(MAX_RTR_CNT) ")");
+
+static int default_heartbeat_timeout_ms = DEFAULT_HEARTBEAT_TIMEOUT_MS;
+
+static int default_heartbeat_timeout_set(const char *val,
+					 const struct kernel_param *kp)
+{
+	int ret, ival;
+
+	ret = kstrtoint(val, 0, &ival);
+	if (ret)
+		return ret;
+
+	ret = ibtrs_heartbeat_timeout_validate(ival);
+	if (ret)
+		return ret;
+
+	default_heartbeat_timeout_ms = ival;
+	INFO_NP("Default heartbeat timeout changed to %d\n", ival);
+
+	return 0;
+}
+
+static const struct kernel_param_ops heartbeat_timeout_ops = {
+	.set		= default_heartbeat_timeout_set,
+	.get		= param_get_int,
+};
+
+module_param_cb(default_heartbeat_timeout_ms, &heartbeat_timeout_ops,
+		&default_heartbeat_timeout_ms, 0644);
+MODULE_PARM_DESC(default_heartbeat_timeout_ms, "default heartbeat timeout,"
+		 " min. " __stringify(MIN_HEARTBEAT_TIMEOUT_MS)
+		 " (default:" __stringify(DEFAULT_HEARTBEAT_TIMEOUT_MS) ")");
+
+static char cq_affinity_list[256] = "";
+static cpumask_t cq_affinity_mask = { CPU_BITS_ALL };
+
+static void init_cq_affinity(void)
+{
+	sprintf(cq_affinity_list, "0-%d", nr_cpu_ids - 1);
+}
+
+static int cq_affinity_list_set(const char *val, const struct kernel_param *kp)
+{
+	int ret = 0, len = strlen(val);
+	cpumask_var_t new_value;
+
+	if (!strlen(cq_affinity_list))
+		init_cq_affinity();
+
+	if (len >= sizeof(cq_affinity_list))
+		return -EINVAL;
+	if (!alloc_cpumask_var(&new_value, GFP_KERNEL))
+		return -ENOMEM;
+
+	ret = cpulist_parse(val, new_value);
+	if (ret) {
+		ERR_NP("Can't set cq_affinity_list \"%s\": %d\n", val, ret);
+		goto free_cpumask;
+	}
+
+	strlcpy(cq_affinity_list, val, sizeof(cq_affinity_list));
+	*strchrnul(cq_affinity_list, '\n') = '\0';
+	cpumask_copy(&cq_affinity_mask, new_value);
+
+	INFO_NP("cq_affinity_list changed to %*pbl\n",
+		cpumask_pr_args(&cq_affinity_mask));
+free_cpumask:
+	free_cpumask_var(new_value);
+	return ret;
+}
+
+static struct kparam_string cq_affinity_list_kparam_str = {
+	.maxlen	= sizeof(cq_affinity_list),
+	.string	= cq_affinity_list
+};
+
+static const struct kernel_param_ops cq_affinity_list_ops = {
+	.set	= cq_affinity_list_set,
+	.get	= param_get_string,
+};
+
+module_param_cb(cq_affinity_list, &cq_affinity_list_ops,
+		&cq_affinity_list_kparam_str, 0644);
+MODULE_PARM_DESC(cq_affinity_list, "Sets the list of cpus to use as cq vectors"
+				   " (default: use all possible CPUs)");
+
+static char hostname[MAXHOSTNAMELEN] = "";
+
+static int hostname_set(const char *val, const struct kernel_param *kp)
+{
+	int ret = 0, len = strlen(val);
+
+	if (len >= sizeof(hostname))
+		return -EINVAL;
+	strlcpy(hostname, val, sizeof(hostname));
+	*strchrnul(hostname, '\n') = '\0';
+
+	INFO_NP("hostname changed to %s\n", hostname);
+	return ret;
+}
+
+static struct kparam_string hostname_kparam_str = {
+	.maxlen	= sizeof(hostname),
+	.string	= hostname
+};
+
+static const struct kernel_param_ops hostname_ops = {
+	.set	= hostname_set,
+	.get	= param_get_string,
+};
+
+module_param_cb(hostname, &hostname_ops,
+		&hostname_kparam_str, 0644);
+MODULE_PARM_DESC(hostname, "Sets the hostname of the local server; if set, it"
+		 " is sent to the other side and displayed together with addr"
+		 " (default: empty)");
+
+static struct dentry *ibtrs_srv_debugfs_dir;
+static struct dentry *mempool_debugfs_dir;
+
+static struct rdma_cm_id	*cm_id_ip;
+static struct rdma_cm_id	*cm_id_ib;
+static DEFINE_MUTEX(sess_mutex);
+static LIST_HEAD(sess_list);
+static DECLARE_WAIT_QUEUE_HEAD(sess_list_waitq);
+static struct workqueue_struct *destroy_wq;
+
+static LIST_HEAD(device_list);
+static DEFINE_MUTEX(device_list_mutex);
+
+static DEFINE_MUTEX(buf_pool_mutex);
+static LIST_HEAD(free_buf_pool_list);
+static int nr_free_buf_pool;
+static int nr_total_buf_pool;
+static int nr_active_sessions;
+
+static const struct ibtrs_srv_ops *srv_ops;
+enum ssm_ev {
+	SSM_EV_CON_DISCONNECTED,
+	SSM_EV_CON_EST_ERR,
+	SSM_EV_CON_CONNECTED,
+	SSM_EV_SESS_CLOSE,
+	SSM_EV_SYSFS_DISCONNECT
+};
+
+static inline const char *ssm_ev_str(enum ssm_ev ev)
+{
+	switch (ev) {
+	case SSM_EV_CON_DISCONNECTED:
+		return "SSM_EV_CON_DISCONNECTED";
+	case SSM_EV_CON_EST_ERR:
+		return "SSM_EV_CON_EST_ERR";
+	case SSM_EV_CON_CONNECTED:
+		return "SSM_EV_CON_CONNECTED";
+	case SSM_EV_SESS_CLOSE:
+		return "SSM_EV_SESS_CLOSE";
+	case SSM_EV_SYSFS_DISCONNECT:
+		return "SSM_EV_SYSFS_DISCONNECT";
+	default:
+		return "UNKNOWN";
+	}
+}
+
+static const char *ssm_state_str(enum ssm_state state)
+{
+	switch (state) {
+	case SSM_STATE_IDLE:
+		return "SSM_STATE_IDLE";
+	case SSM_STATE_CONNECTED:
+		return "SSM_STATE_CONNECTED";
+	case SSM_STATE_CLOSING:
+		return "SSM_STATE_CLOSING";
+	case SSM_STATE_CLOSED:
+		return "SSM_STATE_CLOSED";
+	default:
+		return "UNKNOWN";
+	}
+}
+
+enum csm_state {
+	CSM_STATE_REQUESTED,
+	CSM_STATE_CONNECTED,
+	CSM_STATE_CLOSING,
+	CSM_STATE_FLUSHING,
+	CSM_STATE_CLOSED
+};
+
+static inline const char *csm_state_str(enum csm_state s)
+{
+	switch (s) {
+	case CSM_STATE_REQUESTED:
+		return "CSM_STATE_REQUESTED";
+	case CSM_STATE_CONNECTED:
+		return "CSM_STATE_CONNECTED";
+	case CSM_STATE_CLOSING:
+		return "CSM_STATE_CLOSING";
+	case CSM_STATE_FLUSHING:
+		return "CSM_STATE_FLUSHING";
+	case CSM_STATE_CLOSED:
+		return "CSM_STATE_CLOSED";
+	default:
+		return "UNKNOWN";
+	}
+}
+
+enum csm_ev {
+	CSM_EV_CON_REQUEST,
+	CSM_EV_CON_ESTABLISHED,
+	CSM_EV_CON_ERROR,
+	CSM_EV_DEVICE_REMOVAL,
+	CSM_EV_SESS_CLOSING,
+	CSM_EV_CON_DISCONNECTED,
+	CSM_EV_BEACON_COMPLETED
+};
+
+static inline const char *csm_ev_str(enum csm_ev ev)
+{
+	switch (ev) {
+	case CSM_EV_CON_REQUEST:
+		return "CSM_EV_CON_REQUEST";
+	case CSM_EV_CON_ESTABLISHED:
+		return "CSM_EV_CON_ESTABLISHED";
+	case CSM_EV_CON_ERROR:
+		return "CSM_EV_CON_ERROR";
+	case CSM_EV_DEVICE_REMOVAL:
+		return "CSM_EV_DEVICE_REMOVAL";
+	case CSM_EV_SESS_CLOSING:
+		return "CSM_EV_SESS_CLOSING";
+	case CSM_EV_CON_DISCONNECTED:
+		return "CSM_EV_CON_DISCONNECTED";
+	case CSM_EV_BEACON_COMPLETED:
+		return "CSM_EV_BEACON_COMPLETED";
+	default:
+		return "UNKNOWN";
+	}
+}
+
+struct sess_put_work {
+	struct ibtrs_session	*sess;
+	struct work_struct	work;
+};
+
+struct ibtrs_srv_sysfs_put_work {
+	struct work_struct	work;
+	struct ibtrs_session	*sess;
+};
+
+struct ssm_create_con_work {
+	struct ibtrs_session	*sess;
+	struct rdma_cm_id	*cm_id;
+	struct work_struct	work;
+	bool			user;/* true if con is for user msg only */
+};
+
+struct ssm_work {
+	struct ibtrs_session	*sess;
+	enum ssm_ev		ev;
+	struct work_struct	work;
+};
+
+struct ibtrs_con {
+	/* list for ibtrs_session->con_list */
+	struct list_head	list;
+	enum csm_state		state;
+	/* true if con is for user msg only */
+	bool			user;
+	bool			failover_enabled;
+	struct ib_con		ib_con;
+	atomic_t		wr_cnt;
+	struct rdma_cm_id	*cm_id;
+	int			cq_vector;
+	struct ibtrs_session	*sess;
+	struct work_struct	cq_work;
+	struct workqueue_struct *cq_wq;
+	struct workqueue_struct *rdma_resp_wq;
+	struct ib_wc		wcs[WC_ARRAY_SIZE];
+	bool			device_being_removed;
+};
+
+struct csm_work {
+	struct ibtrs_con	*con;
+	enum csm_ev		ev;
+	struct work_struct	work;
+};
+
+struct msg_work {
+	struct work_struct	work;
+	struct ibtrs_con	*con;
+	void                    *msg;
+};
+
+struct ibtrs_device {
+	struct list_head	entry;
+	struct ib_device	*device;
+	struct ib_session	ib_sess;
+	struct completion	*ib_sess_destroy_completion;
+	struct kref		ref;
+};
+
+struct ibtrs_ops_id {
+	struct ibtrs_con		*con;
+	u32				msg_id;
+	u8				dir;
+	u64				data_dma_addr;
+	struct ibtrs_msg_req_rdma_write *req;
+	struct ib_rdma_wr		*tx_wr;
+	struct ib_sge			*tx_sg;
+	int				status;
+	struct work_struct		work;
+} ____cacheline_aligned;
+
+static void csm_set_state(struct ibtrs_con *con, enum csm_state s)
+{
+	if (con->state != s) {
+		DEB("changing con %p csm state from %s to %s\n", con,
+		    csm_state_str(con->state), csm_state_str(s));
+		con->state = s;
+	}
+}
+
+static void ssm_set_state(struct ibtrs_session *sess, enum ssm_state state)
+{
+	if (sess->state != state) {
+		DEB("changing sess %p ssm state from %s to %s\n", sess,
+		    ssm_state_str(sess->state), ssm_state_str(state));
+		sess->state = state;
+	}
+}
+
+static struct ibtrs_con *ibtrs_srv_get_user_con(struct ibtrs_session *sess)
+{
+	struct ibtrs_con *con;
+
+	if (sess->est_cnt > 0) {
+		list_for_each_entry(con, &sess->con_list, list) {
+			if (con->user && con->state == CSM_STATE_CONNECTED)
+				return con;
+		}
+	}
+	return NULL;
+}
+
+static void csm_init(struct ibtrs_con *con);
+static void csm_schedule_event(struct ibtrs_con *con, enum csm_ev ev);
+static int ssm_init(struct ibtrs_session *sess);
+static int ssm_schedule_event(struct ibtrs_session *sess, enum ssm_ev ev);
+
+static int ibtrs_srv_get_sess_current_port_num(struct ibtrs_session *sess)
+{
+	struct ibtrs_con *con, *next;
+	struct ibtrs_con *ucon = ibtrs_srv_get_user_con(sess);
+
+	if (sess->state != SSM_STATE_CONNECTED || !ucon)
+		return -ECOMM;
+
+	mutex_lock(&sess->lock);
+	if (WARN_ON(!sess->cm_id)) {
+		mutex_unlock(&sess->lock);
+		return -ENODEV;
+	}
+	list_for_each_entry_safe(con, next, &sess->con_list, list) {
+		if (unlikely(con->state != CSM_STATE_CONNECTED)) {
+			mutex_unlock(&sess->lock);
+			return -ECOMM;
+		}
+		if (con->cm_id->port_num != sess->cm_id->port_num) {
+			mutex_unlock(&sess->lock);
+			return 0;
+		}
+	}
+	mutex_unlock(&sess->lock);
+	return sess->cm_id->port_num;
+}
+
+int ibtrs_srv_current_hca_port_to_str(struct ibtrs_session *sess,
+				      char *buf, size_t len)
+{
+	if (!ibtrs_srv_get_sess_current_port_num(sess))
+		return scnprintf(buf, len, "migrating\n");
+
+	if (ibtrs_srv_get_sess_current_port_num(sess) < 0)
+		return ibtrs_srv_get_sess_current_port_num(sess);
+
+	return scnprintf(buf, len, "%u\n",
+			 ibtrs_srv_get_sess_current_port_num(sess));
+}
+
+inline const char *ibtrs_srv_get_sess_hca_name(struct ibtrs_session *sess)
+{
+	struct ibtrs_con *con = ibtrs_srv_get_user_con(sess);
+
+	if (con)
+		return sess->dev->device->name;
+	return "n/a";
+}
+
+static void ibtrs_srv_update_rdma_stats(struct ibtrs_srv_stats *s,
+					size_t size, bool read)
+{
+	int inflight;
+
+	if (read) {
+		atomic64_inc(&s->rdma_stats.cnt_read);
+		atomic64_add(size, &s->rdma_stats.size_total_read);
+	} else {
+		atomic64_inc(&s->rdma_stats.cnt_write);
+		atomic64_add(size, &s->rdma_stats.size_total_write);
+	}
+
+	inflight = atomic_inc_return(&s->rdma_stats.inflight);
+	atomic64_add(inflight, &s->rdma_stats.inflight_total);
+}
+
+static inline void ibtrs_srv_stats_dec_inflight(struct ibtrs_session *sess)
+{
+	if (!atomic_dec_return(&sess->stats.rdma_stats.inflight))
+		wake_up(&sess->bufs_wait);
+}
+
+int ibtrs_srv_reset_rdma_stats(struct ibtrs_session *sess, bool enable)
+{
+	if (enable) {
+		struct ibtrs_srv_stats_rdma_stats *r = &sess->stats.rdma_stats;
+
+		/*
+		 * TODO: inflight is used for flow control
+		 * we can't memset the whole structure, so reset each member
+		 */
+		atomic64_set(&r->cnt_read, 0);
+		atomic64_set(&r->size_total_read, 0);
+		atomic64_set(&r->cnt_write, 0);
+		atomic64_set(&r->size_total_write, 0);
+		atomic64_set(&r->inflight_total, 0);
+		return 0;
+	} else {
+		return -EINVAL;
+	}
+}
+
+ssize_t ibtrs_srv_stats_rdma_to_str(struct ibtrs_session *sess,
+				    char *page, size_t len)
+{
+	struct ibtrs_srv_stats_rdma_stats *r = &sess->stats.rdma_stats;
+
+	return scnprintf(page, len, "%ld %ld %ld %ld %u %ld\n",
+			 atomic64_read(&r->cnt_read),
+			 atomic64_read(&r->size_total_read),
+			 atomic64_read(&r->cnt_write),
+			 atomic64_read(&r->size_total_write),
+			 atomic_read(&r->inflight),
+			 (atomic64_read(&r->cnt_read) +
+			  atomic64_read(&r->cnt_write)) ?
+			 atomic64_read(&r->inflight_total) /
+			 (atomic64_read(&r->cnt_read) +
+			  atomic64_read(&r->cnt_write)) : 0);
+}
+
+int ibtrs_srv_reset_user_ib_msgs_stats(struct ibtrs_session *sess, bool enable)
+{
+	if (enable) {
+		memset(&sess->stats.user_ib_msgs, 0,
+		       sizeof(sess->stats.user_ib_msgs));
+		return 0;
+	} else {
+		return -EINVAL;
+	}
+}
+
+int ibtrs_srv_stats_user_ib_msgs_to_str(struct ibtrs_session *sess, char *buf,
+					size_t len)
+{
+	return snprintf(buf, len, "%ld %ld %ld %ld\n",
+			atomic64_read(&sess->stats.user_ib_msgs.recv_msg_cnt),
+			atomic64_read(&sess->stats.user_ib_msgs.recv_size),
+			atomic64_read(&sess->stats.user_ib_msgs.sent_msg_cnt),
+			atomic64_read(&sess->stats.user_ib_msgs.sent_size));
+}
+
+int ibtrs_srv_reset_wc_completion_stats(struct ibtrs_session *sess, bool enable)
+{
+	if (enable) {
+		memset(&sess->stats.wc_comp, 0, sizeof(sess->stats.wc_comp));
+		return 0;
+	} else {
+		return -EINVAL;
+	}
+}
+
+int ibtrs_srv_stats_wc_completion_to_str(struct ibtrs_session *sess, char *buf,
+					 size_t len)
+{
+	return snprintf(buf, len, "%d %ld %ld\n",
+			atomic_read(&sess->stats.wc_comp.max_wc_cnt),
+			atomic64_read(&sess->stats.wc_comp.total_wc_cnt),
+			atomic64_read(&sess->stats.wc_comp.calls));
+}
+
+ssize_t ibtrs_srv_reset_all_help(struct ibtrs_session *sess,
+				 char *page, size_t len)
+{
+	return scnprintf(page, len, "echo 1 to reset all statistics\n");
+}
+
+int ibtrs_srv_reset_all_stats(struct ibtrs_session *sess, bool enable)
+{
+	if (enable) {
+		ibtrs_srv_reset_wc_completion_stats(sess, enable);
+		ibtrs_srv_reset_user_ib_msgs_stats(sess, enable);
+		ibtrs_srv_reset_rdma_stats(sess, enable);
+		return 0;
+	} else {
+		return -EINVAL;
+	}
+}
+
+static inline bool srv_ops_are_valid(const struct ibtrs_srv_ops *ops)
+{
+	return ops && ops->sess_ev && ops->rdma_ev && ops->recv;
+}
+
+static int ibtrs_srv_sess_ev(struct ibtrs_session *sess,
+			     enum ibtrs_srv_sess_ev ev)
+{
+	if (!sess->session_announced_to_user &&
+	    ev != IBTRS_SRV_SESS_EV_CONNECTED)
+		return 0;
+
+	if (ev == IBTRS_SRV_SESS_EV_CONNECTED)
+		sess->session_announced_to_user = true;
+
+	return srv_ops->sess_ev(sess, ev, sess->priv);
+}
+
+static void free_id(struct ibtrs_ops_id *id)
+{
+	if (!id)
+		return;
+	kfree(id->tx_wr);
+	kfree(id->tx_sg);
+	kvfree(id);
+}
+
+static void free_sess_tx_bufs(struct ibtrs_session *sess)
+{
+	struct ibtrs_iu *e, *next;
+	int i;
+
+	if (sess->rdma_info_iu) {
+		ibtrs_iu_free(sess->rdma_info_iu, DMA_TO_DEVICE,
+			      sess->dev->device);
+		sess->rdma_info_iu = NULL;
+	}
+
+	WARN_ON(sess->tx_bufs_used);
+	list_for_each_entry_safe(e, next, &sess->tx_bufs, list) {
+		list_del(&e->list);
+		ibtrs_iu_free(e, DMA_TO_DEVICE, sess->dev->device);
+	}
+
+	if (sess->ops_ids) {
+		for (i = 0; i < sess->queue_depth; i++)
+			free_id(sess->ops_ids[i]);
+		kfree(sess->ops_ids);
+		sess->ops_ids = NULL;
+	}
+}
+
+static void put_tx_iu(struct ibtrs_session *sess, struct ibtrs_iu *iu)
+{
+	spin_lock(&sess->tx_bufs_lock);
+	ibtrs_iu_put(&sess->tx_bufs, iu);
+	sess->tx_bufs_used--;
+	spin_unlock(&sess->tx_bufs_lock);
+}
+
+static struct ibtrs_iu *get_tx_iu(struct ibtrs_session *sess)
+{
+	struct ibtrs_iu *iu;
+
+	spin_lock(&sess->tx_bufs_lock);
+	iu = ibtrs_iu_get(&sess->tx_bufs);
+	if (iu)
+		sess->tx_bufs_used++;
+	spin_unlock(&sess->tx_bufs_lock);
+
+	return iu;
+}
+
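+/*
+ * Answer a read request by posting one RDMA-Write per scatter-gather element
+ * described by the client; the last WR is an RDMA-Write with immediate
+ * carrying the msg_id, and it is signaled only once per queue_depth posted
+ * WRs to limit completion overhead.
+ */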
+static int rdma_write_sg(struct ibtrs_ops_id *id)
+{
+	int err, i, offset;
+	struct ib_send_wr *bad_wr;
+	struct ib_rdma_wr *wr = NULL;
+	struct ibtrs_session *sess = id->con->sess;
+
+	if (unlikely(id->req->sg_cnt == 0))
+		return -EINVAL;
+
+	offset = 0;
+	for (i = 0; i < id->req->sg_cnt; i++) {
+		struct ib_sge *list;
+
+		wr		= &id->tx_wr[i];
+		list		= &id->tx_sg[i];
+		list->addr	= id->data_dma_addr + offset;
+		list->length	= id->req->desc[i].len;
+
+		/* WR will fail with a length error if the length is 0 */
+		if (unlikely(list->length == 0)) {
+			ERR(sess, "Invalid RDMA-Write sg list length 0\n");
+			return -EINVAL;
+		}
+
+		list->lkey = sess->dev->ib_sess.pd->local_dma_lkey;
+		offset += list->length;
+
+		wr->wr.wr_id		= (uintptr_t)id;
+		wr->wr.sg_list		= list;
+		wr->wr.num_sge		= 1;
+		wr->remote_addr	= id->req->desc[i].addr;
+		wr->rkey	= id->req->desc[i].key;
+
+		if (i < (id->req->sg_cnt - 1)) {
+			wr->wr.next	= &id->tx_wr[i + 1].wr;
+			wr->wr.opcode	= IB_WR_RDMA_WRITE;
+			wr->wr.ex.imm_data	= 0;
+			wr->wr.send_flags	= 0;
+		}
+	}
+
+	wr->wr.opcode	= IB_WR_RDMA_WRITE_WITH_IMM;
+	wr->wr.next	= NULL;
+	wr->wr.send_flags	= atomic_inc_return(&id->con->wr_cnt) %
+				sess->queue_depth ? 0 : IB_SEND_SIGNALED;
+	wr->wr.ex.imm_data	= cpu_to_be32(id->msg_id << 16);
+
+	err = ib_post_send(id->con->ib_con.qp, &id->tx_wr[0].wr, &bad_wr);
+	if (unlikely(err))
+		ERR(sess,
+		    "Posting RDMA-Write-Request to QP failed, errno: %d\n",
+		    err);
+
+	return err;
+}
+
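+/*
+ * Send an I/O response that carries no payload: a zero-length RDMA-Write with
+ * immediate data encoding the msg_id in the upper 16 bits and the I/O status
+ * in the lower 16 bits.
+ */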
+static int send_io_resp_imm(struct ibtrs_con *con, int msg_id, s16 errno)
+{
+	int err;
+
+	err = ibtrs_write_empty_imm(con->ib_con.qp, (msg_id << 16) | (u16)errno,
+				    atomic_inc_return(&con->wr_cnt) %
+				    con->sess->queue_depth ? 0 :
+				    IB_SEND_SIGNALED);
+	if (unlikely(err))
+		ERR_RL(con->sess, "Posting I/O response (empty RDMA-Write with"
+		       " immediate) to QP failed, errno: %d\n", err);
+
+	return err;
+}
+
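+/*
+ * A heartbeat is an empty RDMA-Write with immediate data UINT_MAX;
+ * send_heartbeat() posts it on the user connection of the session.
+ */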
+static int send_heartbeat_raw(struct ibtrs_con *con)
+{
+	int err;
+
+	err = ibtrs_write_empty_imm(con->ib_con.qp, UINT_MAX, IB_SEND_SIGNALED);
+	if (unlikely(err)) {
+		ERR(con->sess,
+		    "Sending heartbeat failed, posting msg to QP failed,"
+		    " errno: %d\n", err);
+		return err;
+	}
+
+	ibtrs_heartbeat_set_send_ts(&con->sess->heartbeat);
+	return err;
+}
+
+static int send_heartbeat(struct ibtrs_session *sess)
+{
+	struct ibtrs_con *con;
+
+	if (unlikely(list_empty(&sess->con_list)))
+		return -ENOENT;
+
+	con = list_first_entry(&sess->con_list, struct ibtrs_con, list);
+	WARN_ON(!con->user);
+
+	if (unlikely(con->state != CSM_STATE_CONNECTED))
+		return -ENOTCONN;
+
+	return send_heartbeat_raw(con);
+}
+
+static int ibtrs_srv_queue_resp_rdma(struct ibtrs_ops_id *id)
+{
+	if (unlikely(id->con->state != CSM_STATE_CONNECTED)) {
+		ERR_RL(id->con->sess, "Sending I/O response failed,"
+		       " session is disconnected, sess state %s,"
+		       " con state %s\n", ssm_state_str(id->con->sess->state),
+		       csm_state_str(id->con->state));
+		return -ECOMM;
+	}
+
+	if (WARN_ON(!queue_work(id->con->rdma_resp_wq, &id->work))) {
+		ERR_RL(id->con->sess, "Sending I/O response failed,"
+		       " couldn't queue work\n");
+		return -EPERM;
+	}
+
+	return 0;
+}
+
+static void ibtrs_srv_resp_rdma_worker(struct work_struct *work)
+{
+	struct ibtrs_ops_id *id;
+	int err;
+	struct ibtrs_session *sess;
+
+	id = container_of(work, struct ibtrs_ops_id, work);
+	sess = id->con->sess;
+
+	if (id->status || id->dir == WRITE) {
+		DEB("err or write msg_id=%d, status=%d, sending response\n",
+		    id->msg_id, id->status);
+
+		err = send_io_resp_imm(id->con, id->msg_id, id->status);
+		if (unlikely(err)) {
+			ERR_RL(sess, "Sending imm msg failed, errno: %d\n",
+			       err);
+			if (err == -ENOMEM && !ibtrs_srv_queue_resp_rdma(id))
+				return;
+			csm_schedule_event(id->con, CSM_EV_CON_ERROR);
+		}
+
+		ibtrs_heartbeat_set_send_ts(&sess->heartbeat);
+		ibtrs_srv_stats_dec_inflight(sess);
+		return;
+	}
+
+	DEB("read req msg_id=%d completed, sending data\n", id->msg_id);
+	err = rdma_write_sg(id);
+	if (unlikely(err)) {
+		ERR_RL(sess, "Sending I/O read response failed, errno: %d\n",
+		       err);
+		if (err == -ENOMEM && !ibtrs_srv_queue_resp_rdma(id))
+			return;
+		csm_schedule_event(id->con, CSM_EV_CON_ERROR);
+	}
+	ibtrs_heartbeat_set_send_ts(&sess->heartbeat);
+	ibtrs_srv_stats_dec_inflight(sess);
+}
+
+/*
+ * This function may be called from an interrupt context, e.g. from the
+ * bio_endio callback in the user module. Queue the real work on a workqueue
+ * so we don't need to hold an irq spinlock.
+ */
+int ibtrs_srv_resp_rdma(struct ibtrs_ops_id *id, int status)
+{
+	int err = 0;
+
+	if (unlikely(!id)) {
+		ERR_NP("Sending I/O response failed, I/O ops id NULL\n");
+		return -EINVAL;
+	}
+
+	id->status = status;
+	INIT_WORK(&id->work, ibtrs_srv_resp_rdma_worker);
+
+	err = ibtrs_srv_queue_resp_rdma(id);
+	if (err)
+		ibtrs_srv_stats_dec_inflight(id->con->sess);
+	return err;
+}
+EXPORT_SYMBOL(ibtrs_srv_resp_rdma);
+
+static bool ibtrs_srv_get_usr_msg_buf(struct ibtrs_session *sess)
+{
+	return atomic_dec_if_positive(&sess->peer_usr_msg_bufs) >= 0;
+}
+
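+/*
+ * Send a user (control) message to the client. Flow control: a credit is
+ * taken from peer_usr_msg_bufs (returned when the client acks the message,
+ * see process_msg_user_ack()) and a tx IU is taken from the session tx list;
+ * both waits are woken up on connection close so the caller can bail out.
+ */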
+int ibtrs_srv_send(struct ibtrs_session *sess, const struct kvec *vec,
+		   size_t nr)
+{
+	struct ibtrs_iu *iu = NULL;
+	struct ibtrs_con *con;
+	struct ibtrs_msg_user *msg;
+	size_t len;
+	bool closed_st = false;
+	int err;
+
+	if (WARN_ONCE(list_empty(&sess->con_list),
+		      "Sending message failed, no connection available\n"))
+		return -ECOMM;
+	con = ibtrs_srv_get_user_con(sess);
+
+	if (unlikely(!con)) {
+		WRN(sess,
+		    "Sending message failed, no user connection exists\n");
+		return -ECOMM;
+	}
+
+	len = kvec_length(vec, nr);
+
+	if (unlikely(len + IBTRS_HDR_LEN > MAX_REQ_SIZE)) {
+		WRN_RL(sess, "Sending message failed, passed data too big,"
+		       " %zu > %lu\n", len, MAX_REQ_SIZE - IBTRS_HDR_LEN);
+		return -EMSGSIZE;
+	}
+
+	wait_event(sess->mu_buf_wait_q,
+		   (closed_st = (con->state != CSM_STATE_CONNECTED)) ||
+		   ibtrs_srv_get_usr_msg_buf(sess));
+
+	if (unlikely(closed_st)) {
+		ERR_RL(sess, "Sending message failed, not connected (state"
+		       " %s)\n", csm_state_str(con->state));
+		return -ECOMM;
+	}
+
+	wait_event(sess->mu_iu_wait_q,
+		   (closed_st = (con->state != CSM_STATE_CONNECTED)) ||
+		   (iu = get_tx_iu(sess)) != NULL);
+
+	if (unlikely(closed_st)) {
+		ERR_RL(sess, "Sending message failed, not connected (state"
+		       " %s)\n", csm_state_str(con->state));
+		err = -ECOMM;
+		goto err_iu;
+	}
+
+	msg		= iu->buf;
+	msg->hdr.type	= IBTRS_MSG_USER;
+	msg->hdr.tsize	= len + IBTRS_HDR_LEN;
+	copy_from_kvec(msg->payl, vec, len);
+
+	ibtrs_deb_msg_hdr("Sending: ", &msg->hdr);
+	err = ibtrs_post_send(con->ib_con.qp,
+			      con->sess->dev->ib_sess.pd->__internal_mr, iu,
+			      msg->hdr.tsize);
+	if (unlikely(err)) {
+		ERR_RL(sess, "Sending message failed, posting message to QP"
+		       " failed, errno: %d\n", err);
+		goto err_post_send;
+	}
+	ibtrs_heartbeat_set_send_ts(&sess->heartbeat);
+
+	atomic64_inc(&sess->stats.user_ib_msgs.sent_msg_cnt);
+	atomic64_add(len, &sess->stats.user_ib_msgs.sent_size);
+
+	return 0;
+
+err_post_send:
+	put_tx_iu(sess, iu);
+	wake_up(&con->sess->mu_iu_wait_q);
+err_iu:
+	atomic_inc(&sess->peer_usr_msg_bufs);
+	wake_up(&con->sess->mu_buf_wait_q);
+	return err;
+}
+EXPORT_SYMBOL(ibtrs_srv_send);
+
+inline void ibtrs_srv_set_sess_priv(struct ibtrs_session *sess, void *priv)
+{
+	sess->priv = priv;
+}
+EXPORT_SYMBOL(ibtrs_srv_set_sess_priv);
+
+static int ibtrs_post_recv(struct ibtrs_con *con, struct ibtrs_iu *iu)
+{
+	struct ib_recv_wr wr, *bad_wr;
+	struct ib_sge list;
+	int err;
+
+	list.addr   = iu->dma_addr;
+	list.length = iu->size;
+	list.lkey   = con->sess->dev->ib_sess.pd->local_dma_lkey;
+
+	if (unlikely(list.length == 0)) {
+		ERR_RL(con->sess, "Posting recv buffer failed, invalid sg list"
+		       " length 0\n");
+		return -EINVAL;
+	}
+
+	wr.next     = NULL;
+	wr.wr_id    = (uintptr_t)iu;
+	wr.sg_list  = &list;
+	wr.num_sge  = 1;
+
+	err = ib_post_recv(con->ib_con.qp, &wr, &bad_wr);
+	if (unlikely(err))
+		ERR_RL(con->sess, "Posting recv buffer failed, errno: %d\n",
+		       err);
+
+	return err;
+}
+
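+/*
+ * Allocate sess_queue_depth receive buffers of rcv_buf_size bytes each.
+ * High-order allocations are tried first so that several buffers share one
+ * contiguous chunk; on failure we fall back to alloc_pages_exact() for a
+ * single buffer. All chunks are tracked on pool->chunk_list for freeing.
+ */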
+static struct ibtrs_rcv_buf_pool *alloc_rcv_buf_pool(void)
+{
+	struct ibtrs_rcv_buf_pool *pool;
+	struct page *cont_pages = NULL;
+	struct ibtrs_mem_chunk *mem_chunk;
+	int alloced_bufs = 0;
+	int rcv_buf_order = get_order(rcv_buf_size);
+	int max_order, alloc_order;
+	unsigned int alloced_size;
+
+	pool = kzalloc(sizeof(*pool), GFP_KERNEL);
+	if (!pool) {
+		ERR_NP("Failed to allocate memory for buffer pool struct\n");
+		return NULL;
+	}
+
+	pool->rcv_bufs = kcalloc(sess_queue_depth, sizeof(*pool->rcv_bufs),
+				 GFP_KERNEL);
+	if (!pool->rcv_bufs) {
+		ERR_NP("Failed to allocate array for receive buffers\n");
+		kfree(pool);
+		return NULL;
+	}
+	INIT_LIST_HEAD(&pool->chunk_list);
+
+	while (alloced_bufs < sess_queue_depth) {
+		mem_chunk = kzalloc(sizeof(*mem_chunk), GFP_KERNEL);
+		if (!mem_chunk) {
+			ERR_NP("Failed to allocate memory for memory chunk"
+			       " struct\n");
+			goto alloc_fail;
+		}
+
+		max_order = min(MAX_ORDER - 1,
+				get_order((sess_queue_depth - alloced_bufs) *
+					  rcv_buf_size));
+		for (alloc_order = max_order; alloc_order > rcv_buf_order;
+		     alloc_order--) {
+			cont_pages = alloc_pages(__GFP_NORETRY | __GFP_NOWARN |
+						 __GFP_ZERO, alloc_order);
+			if (cont_pages) {
+				DEB("Allocated order %d pages\n", alloc_order);
+				break;
+			}
+			DEB("Failed to allocate order %d pages\n", alloc_order);
+		}
+
+		if (cont_pages) {
+			void *recv_buf_start;
+
+			mem_chunk->order = alloc_order;
+			mem_chunk->addr = page_address(cont_pages);
+			list_add_tail(&mem_chunk->list, &pool->chunk_list);
+			alloced_size = (1 << alloc_order) * PAGE_SIZE;
+
+			DEB("Memory chunk size: %d, address: %p\n",
+			    alloced_size, mem_chunk->addr);
+
+			recv_buf_start = mem_chunk->addr;
+			while (alloced_size > rcv_buf_size &&
+			       alloced_bufs < sess_queue_depth) {
+				pool->rcv_bufs[alloced_bufs].buf =
+					recv_buf_start;
+				alloced_bufs++;
+				recv_buf_start += rcv_buf_size;
+				alloced_size -= rcv_buf_size;
+			}
+		} else {
+			/* allocating pages large enough to hold multiple
+			 * rcv_bufs failed, fall back to allocating the exact
+			 * number of pages for a single buffer
+			 */
+			gfp_t gfp_mask = (GFP_KERNEL | __GFP_REPEAT |
+					  __GFP_ZERO);
+			void *addr = alloc_pages_exact(rcv_buf_size, gfp_mask);
+
+			if (!addr) {
+				ERR_NP("Failed to allocate memory for "
+				       " receive buffer (size %dB)\n",
+				       rcv_buf_size);
+				goto alloc_fail;
+			}
+
+			DEB("Alloced pages exact at %p for rcv_bufs[%d]\n",
+			    addr, alloced_bufs);
+
+			mem_chunk->addr = addr;
+			mem_chunk->order = IBTRS_MEM_CHUNK_NOORDER;
+			list_add_tail(&mem_chunk->list, &pool->chunk_list);
+
+			pool->rcv_bufs[alloced_bufs].buf = addr;
+			alloced_bufs++;
+		}
+	}
+
+	return pool;
+
+alloc_fail:
+	if (!list_empty(&pool->chunk_list)) {
+		struct ibtrs_mem_chunk *tmp;
+
+		list_for_each_entry_safe(mem_chunk, tmp, &pool->chunk_list,
+					 list) {
+			if (mem_chunk->order != IBTRS_MEM_CHUNK_NOORDER)
+				free_pages((unsigned long)mem_chunk->addr,
+					   mem_chunk->order);
+			else
+				free_pages_exact(mem_chunk->addr, rcv_buf_size);
+			list_del(&mem_chunk->list);
+			kfree(mem_chunk);
+		}
+	}
+	kfree(pool->rcv_bufs);
+	kfree(pool);
+	return NULL;
+}
+
+static struct ibtrs_rcv_buf_pool *__get_pool_from_list(void)
+{
+	struct ibtrs_rcv_buf_pool *pool = NULL;
+
+	if (!list_empty(&free_buf_pool_list)) {
+		DEB("Getting buf pool from pre-allocated list\n");
+		pool = list_first_entry(&free_buf_pool_list,
+					struct ibtrs_rcv_buf_pool, list);
+		list_del(&pool->list);
+		nr_free_buf_pool--;
+	}
+
+	return pool;
+}
+
+static void __put_pool_on_list(struct ibtrs_rcv_buf_pool *pool)
+{
+	list_add(&pool->list, &free_buf_pool_list);
+	nr_free_buf_pool++;
+	DEB("Put buf pool back to the free list (nr_free_buf_pool: %d)\n",
+	    nr_free_buf_pool);
+}
+
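+/*
+ * Get a receive buffer pool for a new session. Up to pool_size_hi_wm pools
+ * are kept cached on free_buf_pool_list; once the number of active sessions
+ * exceeds the watermark, additional pools are allocated and freed per
+ * session (see put_rcv_buf_pool()).
+ */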
+static struct ibtrs_rcv_buf_pool *get_alloc_rcv_buf_pool(void)
+{
+	struct ibtrs_rcv_buf_pool *pool = NULL;
+
+	mutex_lock(&buf_pool_mutex);
+	if (nr_active_sessions >= pool_size_hi_wm) {
+		WARN_ON(nr_free_buf_pool || !list_empty(&free_buf_pool_list));
+		DEB("nr_active_sessions (%d) >= pool_size_hi_wm (%d),"
+		    " allocating.\n", nr_active_sessions, pool_size_hi_wm);
+		pool = alloc_rcv_buf_pool();
+	} else if (nr_total_buf_pool < pool_size_hi_wm) {
+		/* try to allocate a new pool while used + free is less than
+		 * the watermark
+		 */
+		DEB("nr_total_buf_pool (%d) smaller than pool_size_hi_wm (%d)"
+		    ", trying to allocate.\n", nr_total_buf_pool,
+		    pool_size_hi_wm);
+		pool = alloc_rcv_buf_pool();
+		if (pool)
+			nr_total_buf_pool++;
+		else
+			pool = __get_pool_from_list();
+	} else if (nr_total_buf_pool == pool_size_hi_wm) {
+		/* pool size has already reached watermark, check if there are
+		 * free pools on the list
+		 */
+		if (nr_free_buf_pool) {
+			pool = __get_pool_from_list();
+			WARN_ON(!pool);
+			DEB("Got pool from free list (nr_free_buf_pool: %d)\n",
+			    nr_free_buf_pool);
+		} else {
+			/* all pools are already being used */
+			DEB("No free pool on the list\n");
+			WARN_ON((nr_active_sessions != nr_total_buf_pool) ||
+				nr_free_buf_pool);
+			pool = alloc_rcv_buf_pool();
+		}
+	} else {
+		/* all possibilities should be covered */
+		WARN_ON(1);
+	}
+
+	if (pool)
+		nr_active_sessions++;
+
+	mutex_unlock(&buf_pool_mutex);
+
+	return pool;
+}
+
+static void free_recv_buf_pool(struct ibtrs_rcv_buf_pool *pool)
+{
+	struct ibtrs_mem_chunk *mem_chunk, *tmp;
+
+	DEB("Freeing memory chunks for %d receive buffers\n", sess_queue_depth);
+
+	list_for_each_entry_safe(mem_chunk, tmp, &pool->chunk_list, list) {
+		if (mem_chunk->order != IBTRS_MEM_CHUNK_NOORDER)
+			free_pages((unsigned long)mem_chunk->addr,
+				   mem_chunk->order);
+		else
+			free_pages_exact(mem_chunk->addr, rcv_buf_size);
+		list_del(&mem_chunk->list);
+		kfree(mem_chunk);
+	}
+
+	kfree(pool->rcv_bufs);
+	kfree(pool);
+}
+
+static void put_rcv_buf_pool(struct ibtrs_rcv_buf_pool *pool)
+{
+	mutex_lock(&buf_pool_mutex);
+	nr_active_sessions--;
+	if (nr_active_sessions >= pool_size_hi_wm) {
+		mutex_unlock(&buf_pool_mutex);
+		DEB("Freeing buf pool"
+		    " (nr_active_sessions: %d, pool_size_hi_wm: %d)\n",
+		    nr_active_sessions, pool_size_hi_wm);
+		free_recv_buf_pool(pool);
+	} else {
+		__put_pool_on_list(pool);
+		mutex_unlock(&buf_pool_mutex);
+	}
+}
+
+static void unreg_cont_bufs(struct ibtrs_session *sess)
+{
+	struct ibtrs_rcv_buf *buf;
+	int i;
+
+	DEB("Unregistering %d RDMA buffers\n", sess_queue_depth);
+	for (i = 0; i < sess_queue_depth; i++) {
+		buf = &sess->rcv_buf_pool->rcv_bufs[i];
+
+		ib_dma_unmap_single(sess->dev->device, buf->rdma_addr,
+				    rcv_buf_size, DMA_BIDIRECTIONAL);
+	}
+}
+
+static void release_cont_bufs(struct ibtrs_session *sess)
+{
+	unreg_cont_bufs(sess);
+	put_rcv_buf_pool(sess->rcv_buf_pool);
+	sess->rcv_buf_pool = NULL;
+}
+
+static int setup_cont_bufs(struct ibtrs_session *sess)
+{
+	struct ibtrs_rcv_buf *buf;
+	int i, err;
+
+	sess->rcv_buf_pool = get_alloc_rcv_buf_pool();
+	if (!sess->rcv_buf_pool) {
+		ERR(sess, "Failed to allocate receive buffers for session\n");
+		return -ENOMEM;
+	}
+
+	DEB("Mapping %d buffers for RDMA\n", sess->queue_depth);
+	for (i = 0; i < sess->queue_depth; i++) {
+		buf = &sess->rcv_buf_pool->rcv_bufs[i];
+
+		buf->rdma_addr = ib_dma_map_single(sess->dev->device, buf->buf,
+						   rcv_buf_size,
+						   DMA_BIDIRECTIONAL);
+		if (unlikely(ib_dma_mapping_error(sess->dev->device,
+						  buf->rdma_addr))) {
+			ERR_NP("Registering RDMA buf failed,"
+			       " DMA mapping failed\n");
+			err = -EIO;
+			goto err_map;
+		}
+	}
+
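+	/*
+	 * The 32-bit immediate data of an incoming I/O request encodes the
+	 * receive buffer id in the upper bits and the offset within that
+	 * buffer in the lower off_len bits; off_len is chosen so that
+	 * queue_depth buffer ids fit into the remaining upper bits.
+	 */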
+	sess->off_len = 31 - ilog2(sess->queue_depth - 1);
+	sess->off_mask = (1 << sess->off_len) - 1;
+
+	INFO(sess, "Allocated %d %dKB RDMA receive buffers, %dKB in total\n",
+	     sess->queue_depth, rcv_buf_size >> 10,
+	     sess->queue_depth * rcv_buf_size >> 10);
+
+	return 0;
+
+err_map:
+	for (i = 0; i < sess->queue_depth; i++) {
+		buf = &sess->rcv_buf_pool->rcv_bufs[i];
+
+		if (buf->rdma_addr &&
+		    !ib_dma_mapping_error(sess->dev->device, buf->rdma_addr))
+			ib_dma_unmap_single(sess->dev->device, buf->rdma_addr,
+					    rcv_buf_size, DMA_BIDIRECTIONAL);
+	}
+	return err;
+}
+
+static void fill_ibtrs_msg_sess_open_resp(struct ibtrs_msg_sess_open_resp *msg,
+					  struct ibtrs_con *con)
+{
+	int i;
+
+	msg->hdr.type   = IBTRS_MSG_SESS_OPEN_RESP;
+	msg->hdr.tsize  = IBTRS_MSG_SESS_OPEN_RESP_LEN(con->sess->queue_depth);
+
+	msg->ver = con->sess->ver;
+	strlcpy(msg->hostname, hostname, sizeof(msg->hostname));
+	msg->cnt = con->sess->queue_depth;
+	msg->rkey = con->sess->dev->ib_sess.pd->unsafe_global_rkey;
+	msg->max_inflight_msg = con->sess->queue_depth;
+	msg->max_io_size = max_io_size;
+	msg->max_req_size = MAX_REQ_SIZE;
+	for (i = 0; i < con->sess->queue_depth; i++)
+		msg->addr[i] = con->sess->rcv_buf_pool->rcv_bufs[i].rdma_addr;
+}
+
+static void free_sess_rx_bufs(struct ibtrs_session *sess)
+{
+	int i;
+
+	if (sess->dummy_rx_iu) {
+		ibtrs_iu_free(sess->dummy_rx_iu, DMA_FROM_DEVICE,
+			      sess->dev->device);
+		sess->dummy_rx_iu = NULL;
+	}
+
+	if (sess->usr_rx_ring) {
+		for (i = 0; i < USR_CON_BUF_SIZE; ++i)
+			if (sess->usr_rx_ring[i])
+				ibtrs_iu_free(sess->usr_rx_ring[i],
+					      DMA_FROM_DEVICE,
+					      sess->dev->device);
+		kfree(sess->usr_rx_ring);
+		sess->usr_rx_ring = NULL;
+	}
+}
+
+static int alloc_sess_tx_bufs(struct ibtrs_session *sess)
+{
+	struct ibtrs_iu *iu;
+	struct ibtrs_ops_id *id;
+	struct ib_device *ib_dev = sess->dev->device;
+	int i;
+
+	sess->rdma_info_iu =
+		ibtrs_iu_alloc(0, IBTRS_MSG_SESS_OPEN_RESP_LEN(
+			       sess->queue_depth), GFP_KERNEL, ib_dev,
+			       DMA_TO_DEVICE, true);
+	if (unlikely(!sess->rdma_info_iu)) {
+		ERR_RL(sess, "Can't allocate transfer buffer for "
+			     "sess open resp\n");
+		return -ENOMEM;
+	}
+
+	sess->ops_ids = kcalloc(sess->queue_depth, sizeof(*sess->ops_ids),
+				GFP_KERNEL);
+	if (unlikely(!sess->ops_ids)) {
+		ERR_RL(sess, "Can't alloc ops_ids for the session\n");
+		goto err;
+	}
+
+	for (i = 0; i < sess->queue_depth; ++i) {
+		id = ibtrs_zalloc(sizeof(*id));
+		if (unlikely(!id)) {
+			ERR_RL(sess, "Can't alloc ops id for session\n");
+			goto err;
+		}
+		sess->ops_ids[i] = id;
+	}
+
+	for (i = 0; i < USR_MSG_CNT; ++i) {
+		iu = ibtrs_iu_alloc(i, MAX_REQ_SIZE, GFP_KERNEL,
+				    ib_dev, DMA_TO_DEVICE, true);
+		if (!iu) {
+			ERR_RL(sess, "Can't alloc tx bufs for user msgs\n");
+			goto err;
+		}
+		list_add(&iu->list, &sess->tx_bufs);
+	}
+
+	return 0;
+
+err:
+	free_sess_tx_bufs(sess);
+	return -ENOMEM;
+}
+
+static int alloc_sess_rx_bufs(struct ibtrs_session *sess)
+{
+	int i;
+
+	sess->dummy_rx_iu =
+		ibtrs_iu_alloc(0, IBTRS_HDR_LEN, GFP_KERNEL, sess->dev->device,
+			       DMA_FROM_DEVICE, true);
+	if (!sess->dummy_rx_iu) {
+		ERR(sess, "Failed to allocate dummy IU to receive "
+			  "immediate messages on io connections\n");
+		goto err;
+	}
+
+	sess->usr_rx_ring = kcalloc(USR_CON_BUF_SIZE,
+				    sizeof(*sess->usr_rx_ring), GFP_KERNEL);
+	if (!sess->usr_rx_ring) {
+		ERR(sess, "Alloc usr_rx_ring for session failed\n");
+		goto err;
+	}
+
+	for (i = 0; i < USR_CON_BUF_SIZE; ++i) {
+		sess->usr_rx_ring[i] =
+			ibtrs_iu_alloc(i, MAX_REQ_SIZE, GFP_KERNEL,
+				       sess->dev->device, DMA_FROM_DEVICE,
+				       true);
+		if (!sess->usr_rx_ring[i]) {
+			ERR(sess, "Failed to allocate iu for usr_rx_ring\n");
+			goto err;
+		}
+	}
+
+	return 0;
+
+err:
+	free_sess_rx_bufs(sess);
+	return -ENOMEM;
+}
+
+static int alloc_sess_bufs(struct ibtrs_session *sess)
+{
+	int err;
+
+	err = alloc_sess_rx_bufs(sess);
+	if (err)
+		return err;
+
+	return alloc_sess_tx_bufs(sess);
+}
+
+static int post_io_con_recv(struct ibtrs_con *con)
+{
+	int i, ret;
+
+	for (i = 0; i < con->sess->queue_depth; i++) {
+		ret = ibtrs_post_recv(con, con->sess->dummy_rx_iu);
+		if (unlikely(ret))
+			return ret;
+	}
+
+	return 0;
+}
+
+static int post_user_con_recv(struct ibtrs_con *con)
+{
+	int i, ret;
+
+	for (i = 0; i < USR_CON_BUF_SIZE; i++) {
+		struct ibtrs_iu *iu = con->sess->usr_rx_ring[i];
+
+		ret = ibtrs_post_recv(con, iu);
+		if (unlikely(ret))
+			return ret;
+	}
+
+	return 0;
+}
+
+static int post_recv(struct ibtrs_con *con)
+{
+	if (con->user)
+		return post_user_con_recv(con);
+
+	return post_io_con_recv(con);
+}
+
+static void free_sess_bufs(struct ibtrs_session *sess)
+{
+	free_sess_rx_bufs(sess);
+	free_sess_tx_bufs(sess);
+}
+
+static int init_transfer_bufs(struct ibtrs_con *con)
+{
+	int err;
+	struct ibtrs_session *sess = con->sess;
+
+	if (con->user) {
+		err = alloc_sess_bufs(sess);
+		if (err) {
+			ERR(sess, "Alloc sess bufs failed: %d\n", err);
+			return err;
+		}
+	}
+
+	return post_recv(con);
+}
+
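+/*
+ * Handle a client read request (IBTRS_MSG_REQ_RDMA_WRITE): allocate the WRs
+ * and SGEs needed to RDMA-Write the data back to the client and hand the
+ * receive buffer to the user module; the actual transfer happens later from
+ * ibtrs_srv_resp_rdma() via rdma_write_sg().
+ */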
+static void process_rdma_write_req(struct ibtrs_con *con,
+				   struct ibtrs_msg_req_rdma_write *req,
+				   u32 buf_id, u32 off)
+{
+	int ret;
+	struct ibtrs_ops_id *id;
+	struct ibtrs_session *sess = con->sess;
+
+	if (unlikely(sess->state != SSM_STATE_CONNECTED ||
+		     con->state != CSM_STATE_CONNECTED)) {
+		ERR_RL(sess, "Processing RDMA-Write-Req failed,"
+		       " session is disconnected, sess state %s,"
+		       " con state %s\n", ssm_state_str(sess->state),
+		       csm_state_str(con->state));
+		return;
+	}
+	ibtrs_srv_update_rdma_stats(&sess->stats, off, true);
+	id = sess->ops_ids[buf_id];
+	kfree(id->tx_wr);
+	kfree(id->tx_sg);
+	id->con		= con;
+	id->dir		= READ;
+	id->msg_id	= buf_id;
+	id->req		= req;
+	id->tx_wr	= kcalloc(req->sg_cnt, sizeof(*id->tx_wr), GFP_KERNEL);
+	id->tx_sg	= kcalloc(req->sg_cnt, sizeof(*id->tx_sg), GFP_KERNEL);
+	if (!id->tx_wr || !id->tx_sg) {
+		ERR_RL(sess, "Processing RDMA-Write-Req failed, work request "
+		       "or scatter gather allocation failed for msg_id %d\n",
+		       buf_id);
+		ret = -ENOMEM;
+		goto send_err_msg;
+	}
+
+	id->data_dma_addr = sess->rcv_buf_pool->rcv_bufs[buf_id].rdma_addr;
+	ret = srv_ops->rdma_ev(con->sess, sess->priv, id,
+			       IBTRS_SRV_RDMA_EV_WRITE_REQ,
+			       sess->rcv_buf_pool->rcv_bufs[buf_id].buf, off);
+
+	if (unlikely(ret)) {
+		ERR_RL(sess, "Processing RDMA-Write-Req failed, user "
+		       "module cb reported for msg_id %d, errno: %d\n",
+		       buf_id, ret);
+		goto send_err_msg;
+	}
+
+	return;
+
+send_err_msg:
+	ret = send_io_resp_imm(con, buf_id, ret);
+	if (ret < 0) {
+		ERR_RL(sess, "Sending err msg for failed RDMA-Write-Req"
+		       " failed, msg_id %d, errno: %d\n", buf_id, ret);
+		csm_schedule_event(con, CSM_EV_CON_ERROR);
+	}
+	ibtrs_srv_stats_dec_inflight(sess);
+}
+
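+/*
+ * Handle a client write (IBTRS_MSG_RDMA_WRITE): the payload already sits in
+ * the receive buffer, so just pass it to the user module; the response is
+ * sent later from ibtrs_srv_resp_rdma() as an empty RDMA-Write with
+ * immediate.
+ */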
+static void process_rdma_write(struct ibtrs_con *con,
+			       struct ibtrs_msg_rdma_write *req,
+			       u32 buf_id, u32 off)
+{
+	int ret;
+	struct ibtrs_ops_id *id;
+	struct ibtrs_session *sess = con->sess;
+
+	if (unlikely(sess->state != SSM_STATE_CONNECTED ||
+		     con->state != CSM_STATE_CONNECTED)) {
+		ERR_RL(sess, "Processing RDMA-Write request failed,"
+		       " session is disconnected, sess state %s,"
+		       " con state %s\n", ssm_state_str(sess->state),
+		       csm_state_str(con->state));
+		return;
+	}
+	ibtrs_srv_update_rdma_stats(&sess->stats, off, false);
+	id = con->sess->ops_ids[buf_id];
+	id->con    = con;
+	id->dir    = WRITE;
+	id->msg_id = buf_id;
+
+	ret = srv_ops->rdma_ev(sess, sess->priv, id, IBTRS_SRV_RDMA_EV_RECV,
+			       sess->rcv_buf_pool->rcv_bufs[buf_id].buf, off);
+	if (unlikely(ret)) {
+		ERR_RL(sess, "Processing RDMA-Write failed, user module"
+		       " callback reports errno: %d\n", ret);
+		goto send_err_msg;
+	}
+
+	return;
+
+send_err_msg:
+	ret = send_io_resp_imm(con, buf_id, ret);
+	if (ret < 0) {
+		ERR_RL(sess, "Processing RDMA-Write failed, sending I/O"
+		       " response failed, msg_id %d, errno: %d\n",
+		       buf_id, ret);
+		csm_schedule_event(con, CSM_EV_CON_ERROR);
+	}
+	ibtrs_srv_stats_dec_inflight(sess);
+}
+
+static int ibtrs_send_usr_msg_ack(struct ibtrs_con *con)
+{
+	struct ibtrs_session *sess;
+	int err;
+
+	sess = con->sess;
+
+	if (unlikely(con->state != CSM_STATE_CONNECTED)) {
+		ERR_RL(sess, "Sending user msg ack failed, disconnected."
+		       " Connection state is %s\n", csm_state_str(con->state));
+		return -ECOMM;
+	}
+	DEB("Sending user message ack\n");
+	err = ibtrs_write_empty_imm(con->ib_con.qp, UINT_MAX - 1,
+				    IB_SEND_SIGNALED);
+	if (unlikely(err)) {
+		ERR_RL(sess, "Sending user Ack msg failed, errno: %d\n", err);
+		return err;
+	}
+
+	ibtrs_heartbeat_set_send_ts(&sess->heartbeat);
+	return 0;
+}
+
+static void process_msg_user(struct ibtrs_con *con,
+			     struct ibtrs_msg_user *msg)
+{
+	int len;
+	struct ibtrs_session *sess = con->sess;
+
+	len = msg->hdr.tsize - IBTRS_HDR_LEN;
+	if (unlikely(sess->state < SSM_STATE_CONNECTED || !sess->priv)) {
+		ERR_RL(sess, "Processing user msg failed, session isn't ready."
+		       " Session state is %s\n", ssm_state_str(sess->state));
+		return;
+	}
+
+	srv_ops->recv(sess, sess->priv, msg->payl, len);
+
+	atomic64_inc(&sess->stats.user_ib_msgs.recv_msg_cnt);
+	atomic64_add(len, &sess->stats.user_ib_msgs.recv_size);
+}
+
+static void process_msg_user_ack(struct ibtrs_con *con)
+{
+	struct ibtrs_session *sess = con->sess;
+
+	atomic_inc(&sess->peer_usr_msg_bufs);
+	wake_up(&con->sess->mu_buf_wait_q);
+}
+
+static void ibtrs_handle_write(struct ibtrs_con *con, struct ibtrs_iu *iu,
+			       struct ibtrs_msg_hdr *hdr, u32 id, u32 off)
+{
+	struct ibtrs_session *sess = con->sess;
+	int ret;
+
+	if (unlikely(ibtrs_validate_message(sess->queue_depth, hdr))) {
+		ERR(sess,
+		    "Processing I/O failed, message validation failed\n");
+		ret = ibtrs_post_recv(con, iu);
+		if (unlikely(ret != 0))
+			ERR(sess,
+			    "Failed to post receive buffer to HCA, errno: %d\n",
+			    ret);
+		goto err;
+	}
+
+	DEB("recv completion, type 0x%02x, tag %u, id %u, off %u\n",
+	    hdr->type, iu->tag, id, off);
+	print_hex_dump_debug("", DUMP_PREFIX_OFFSET, 8, 1,
+			     hdr, IBTRS_HDR_LEN + 32, true);
+	ret = ibtrs_post_recv(con, iu);
+	if (unlikely(ret != 0)) {
+		ERR(sess, "Posting receive buffer to HCA failed, errno: %d\n",
+		    ret);
+		goto err;
+	}
+
+	switch (hdr->type) {
+	case IBTRS_MSG_RDMA_WRITE:
+		process_rdma_write(con, (struct ibtrs_msg_rdma_write *)hdr,
+				   id, off);
+		break;
+	case IBTRS_MSG_REQ_RDMA_WRITE:
+		process_rdma_write_req(con,
+				       (struct ibtrs_msg_req_rdma_write *)hdr,
+				       id, off);
+		break;
+	default:
+		ERR(sess, "Processing I/O request failed, "
+		    "unknown message type received: 0x%02x\n", hdr->type);
+		goto err;
+	}
+
+	return;
+
+err:
+	csm_schedule_event(con, CSM_EV_CON_ERROR);
+}
+
+static void msg_worker(struct work_struct *work)
+{
+	struct msg_work *w;
+	struct ibtrs_con *con;
+	struct ibtrs_msg_user *msg;
+
+	w = container_of(work, struct msg_work, work);
+	con = w->con;
+	msg = w->msg;
+	kvfree(w);
+	process_msg_user(con, msg);
+	kvfree(msg);
+}
+
+static int ibtrs_schedule_msg(struct ibtrs_con *con, struct ibtrs_msg_user *msg)
+{
+	struct msg_work *w;
+
+	w = ibtrs_malloc(sizeof(*w));
+	if (!w)
+		return -ENOMEM;
+
+	w->con = con;
+	w->msg = ibtrs_malloc(msg->hdr.tsize);
+	if (!w->msg) {
+		kvfree(w);
+		return -ENOMEM;
+	}
+	memcpy(w->msg, msg, msg->hdr.tsize);
+	INIT_WORK(&w->work, msg_worker);
+	queue_work(con->sess->msg_wq, &w->work);
+	return 0;
+}
+
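+/*
+ * Dispatch a message received on the user connection: user messages are
+ * copied and processed on the session message workqueue, the receive buffer
+ * is re-posted and an ack is returned to the client; sess info messages just
+ * update the peer hostname.
+ */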
+static void ibtrs_handle_recv(struct ibtrs_con *con,  struct ibtrs_iu *iu)
+{
+	struct ibtrs_msg_hdr *hdr;
+	struct ibtrs_msg_sess_info *req;
+	struct ibtrs_session *sess = con->sess;
+	int ret;
+	u8 type;
+
+	hdr = (struct ibtrs_msg_hdr *)iu->buf;
+	if (unlikely(ibtrs_validate_message(sess->queue_depth, hdr)))
+		goto err1;
+
+	type = hdr->type;
+
+	DEB("recv completion, type 0x%02x, tag %u\n",
+	    type, iu->tag);
+	print_hex_dump_debug("", DUMP_PREFIX_OFFSET, 8, 1,
+			     iu->buf, IBTRS_HDR_LEN, true);
+
+	switch (type) {
+	case IBTRS_MSG_USER:
+		ret = ibtrs_schedule_msg(con, iu->buf);
+		if (unlikely(ret)) {
+			ERR_RL(sess, "Scheduling worker of user message "
+			       "to user module failed, errno: %d\n", ret);
+			goto err1;
+		}
+		ret = ibtrs_post_recv(con, iu);
+		if (unlikely(ret)) {
+			ERR_RL(sess, "Posting receive buffer of user message "
+			       "to HCA failed, errno: %d\n", ret);
+			goto err2;
+		}
+		ret = ibtrs_send_usr_msg_ack(con);
+		if (unlikely(ret)) {
+			ERR_RL(sess, "Sending ACK for user message failed, "
+			       "errno: %d\n", ret);
+			goto err2;
+		}
+		return;
+	case IBTRS_MSG_SESS_INFO:
+		ret = ibtrs_post_recv(con, iu);
+		if (unlikely(ret)) {
+			ERR_RL(sess, "Posting receive buffer of sess info "
+			       "to HCA failed, errno: %d\n", ret);
+			goto err2;
+		}
+		req = (struct ibtrs_msg_sess_info *)hdr;
+		strlcpy(sess->hostname, req->hostname, sizeof(sess->hostname));
+		return;
+	default:
+		ERR(sess, "Processing received message failed, "
+		    "unknown type: 0x%02x\n", type);
+		goto err1;
+	}
+
+err1:
+	ibtrs_post_recv(con, iu);
+err2:
+	ERR(sess, "Failed to process IBTRS message\n");
+	csm_schedule_event(con, CSM_EV_CON_ERROR);
+}
+
+static void add_con_to_list(struct ibtrs_session *sess, struct ibtrs_con *con)
+{
+	mutex_lock(&sess->lock);
+	list_add_tail(&con->list, &sess->con_list);
+	mutex_unlock(&sess->lock);
+}
+
+static void remove_con_from_list(struct ibtrs_con *con)
+{
+	if (WARN_ON(!con->sess))
+		return;
+	mutex_lock(&con->sess->lock);
+	list_del(&con->list);
+	mutex_unlock(&con->sess->lock);
+}
+
+static void close_con(struct ibtrs_con *con)
+{
+	struct ibtrs_session *sess = con->sess;
+
+	DEB("Closing connection %p\n", con);
+
+	if (con->user)
+		cancel_delayed_work(&sess->send_heartbeat_dwork);
+
+	cancel_work_sync(&con->cq_work);
+	destroy_workqueue(con->rdma_resp_wq);
+
+	ib_con_destroy(&con->ib_con);
+	if (!con->user && !con->device_being_removed)
+		rdma_destroy_id(con->cm_id);
+
+	destroy_workqueue(con->cq_wq);
+
+	if (con->user) {
+		/* notify any threads waiting for a tx iu or a user msg buffer
+		 * so they can check the connection state, give up waiting and
+		 * put back any tx iu they have reserved
+		 */
+		wake_up(&sess->mu_buf_wait_q);
+		wake_up(&sess->mu_iu_wait_q);
+		destroy_workqueue(sess->msg_wq);
+	}
+
+	con->sess->active_cnt--;
+}
+
+static void destroy_con(struct ibtrs_con *con)
+{
+	remove_con_from_list(con);
+	kvfree(con);
+}
+
+static void destroy_sess(struct kref *kref)
+{
+	struct ibtrs_session *sess = container_of(kref, struct ibtrs_session,
+						  kref);
+	struct ibtrs_con *con, *con_next;
+
+	if (sess->cm_id)
+		rdma_destroy_id(sess->cm_id);
+
+	destroy_workqueue(sess->sm_wq);
+
+	list_for_each_entry_safe(con, con_next, &sess->con_list, list)
+		destroy_con(con);
+
+	mutex_lock(&sess_mutex);
+	list_del(&sess->list);
+	mutex_unlock(&sess_mutex);
+	wake_up(&sess_list_waitq);
+
+	INFO(sess, "Session is closed\n");
+	kvfree(sess);
+}
+
+int ibtrs_srv_sess_get(struct ibtrs_session *sess)
+{
+	return kref_get_unless_zero(&sess->kref);
+}
+
+void ibtrs_srv_sess_put(struct ibtrs_session *sess)
+{
+	kref_put(&sess->kref, destroy_sess);
+}
+
+static void sess_put_worker(struct work_struct *work)
+{
+	struct sess_put_work *w = container_of(work, struct sess_put_work,
+					       work);
+
+	ibtrs_srv_sess_put(w->sess);
+	kvfree(w);
+}
+
+static void schedule_sess_put(struct ibtrs_session *sess)
+{
+	struct sess_put_work *w;
+
+	while (true) {
+		w = ibtrs_malloc(sizeof(*w));
+		if (w)
+			break;
+		cond_resched();
+	}
+
+	/* Since we can be closing this session from a session workqueue,
+	 * we need to schedule another work on the global workqueue to put the
+	 * session, which can destroy the session workqueue and free the
+	 * session.
+	 */
+	w->sess = sess;
+	INIT_WORK(&w->work, sess_put_worker);
+	queue_work(destroy_wq, &w->work);
+}
+
+static void ibtrs_srv_sysfs_put_worker(struct work_struct *work)
+{
+	struct ibtrs_srv_sysfs_put_work *w;
+
+	w = container_of(work, struct ibtrs_srv_sysfs_put_work, work);
+	kobject_put(&w->sess->kobj_stats);
+	kobject_put(&w->sess->kobj);
+
+	kvfree(w);
+}
+
+static void ibtrs_srv_schedule_sysfs_put(struct ibtrs_session *sess)
+{
+	struct ibtrs_srv_sysfs_put_work *w = ibtrs_malloc(sizeof(*w));
+
+	if (WARN_ON(!w))
+		return;
+
+	w->sess	= sess;
+
+	INIT_WORK(&w->work, ibtrs_srv_sysfs_put_worker);
+	queue_work(destroy_wq, &w->work);
+}
+
+static void ibtrs_free_dev(struct kref *ref)
+{
+	struct ibtrs_device *ndev =
+		container_of(ref, struct ibtrs_device, ref);
+
+	mutex_lock(&device_list_mutex);
+	list_del(&ndev->entry);
+	mutex_unlock(&device_list_mutex);
+	ib_session_destroy(&ndev->ib_sess);
+	if (ndev->ib_sess_destroy_completion)
+		complete_all(ndev->ib_sess_destroy_completion);
+	kfree(ndev);
+}
+
+static struct ibtrs_device *
+ibtrs_find_get_device(struct rdma_cm_id *cm_id)
+{
+	struct ibtrs_device *ndev;
+	int err;
+
+	mutex_lock(&device_list_mutex);
+	list_for_each_entry(ndev, &device_list, entry) {
+		if (ndev->device->node_guid == cm_id->device->node_guid &&
+		    kref_get_unless_zero(&ndev->ref))
+			goto out_unlock;
+	}
+
+	ndev = kzalloc(sizeof(*ndev), GFP_KERNEL);
+	if (!ndev)
+		goto out_err;
+
+	ndev->device = cm_id->device;
+	kref_init(&ndev->ref);
+
+	err = ib_session_init(cm_id->device, &ndev->ib_sess);
+	if (err)
+		goto out_free;
+
+	list_add(&ndev->entry, &device_list);
+	DEB("added %s.\n", ndev->device->name);
+out_unlock:
+	mutex_unlock(&device_list_mutex);
+	return ndev;
+
+out_free:
+	kfree(ndev);
+out_err:
+	mutex_unlock(&device_list_mutex);
+	return NULL;
+}
+
+static void ibtrs_srv_destroy_ib_session(struct ibtrs_session *sess)
+{
+	release_cont_bufs(sess);
+	free_sess_bufs(sess);
+	kref_put(&sess->dev->ref, ibtrs_free_dev);
+}
+
+static void process_err_wc(struct ibtrs_con *con, struct ib_wc *wc)
+{
+	struct ibtrs_iu *iu;
+
+	if (wc->wr_id == (uintptr_t)&con->ib_con.beacon) {
+		DEB("beacon received for con %p\n", con);
+		csm_schedule_event(con, CSM_EV_BEACON_COMPLETED);
+		return;
+	}
+
+	/* only wc->wr_id is guaranteed to be valid in erroneous WCs, so we
+	 * can't rely on wc->opcode; use iu->direction to determine whether it
+	 * is a tx or rx IU
+	 */
+	iu = (struct ibtrs_iu *)wc->wr_id;
+	if (iu && iu->direction == DMA_TO_DEVICE &&
+	    iu != con->sess->rdma_info_iu)
+		put_tx_iu(con->sess, iu);
+
+	if (wc->status != IB_WC_WR_FLUSH_ERR ||
+	    (con->state != CSM_STATE_CLOSING &&
+	     con->state != CSM_STATE_FLUSHING)) {
+		/* suppress flush errors when the connection has
+		 * just called rdma_disconnect() and is in
+		 * DISCONNECTING state waiting for the second
+		 * CM_DISCONNECTED event
+		 */
+		ERR_RL(con->sess, "%s (wr_id: 0x%llx,"
+		       " type: %s, vendor_err: 0x%x, len: %u)\n",
+		       ib_wc_status_msg(wc->status), wc->wr_id,
+		       ib_wc_opcode_str(wc->opcode),
+		       wc->vendor_err, wc->byte_len);
+	}
+	csm_schedule_event(con, CSM_EV_CON_ERROR);
+}
+
+static int process_wcs(struct ibtrs_con *con, struct ib_wc *wcs, size_t len)
+{
+	int i, ret;
+	struct ibtrs_iu *iu;
+	struct ibtrs_session *sess = con->sess;
+
+	for (i = 0; i < len; i++) {
+		struct ib_wc wc = wcs[i];
+
+		if (unlikely(wc.status != IB_WC_SUCCESS)) {
+			process_err_wc(con, &wc);
+			continue;
+		}
+
+		/* DEB("cq complete with wr_id 0x%llx, len %u "
+		 *  "status %d (%s) type %d (%s)\n", wc.wr_id,
+		 *  wc.byte_len, wc.status, ib_wc_status_msg(wc.status),
+		 *  wc.opcode, ib_wc_opcode_str(wc.opcode));
+		 */
+
+		switch (wc.opcode) {
+		case IB_WC_SEND:
+			iu = (struct ibtrs_iu *)(uintptr_t)wc.wr_id;
+			if (iu == con->sess->rdma_info_iu)
+				break;
+			put_tx_iu(sess, iu);
+			if (con->user)
+				wake_up(&sess->mu_iu_wait_q);
+			break;
+
+		case IB_WC_RECV_RDMA_WITH_IMM: {
+			u32 imm, id, off;
+			struct ibtrs_msg_hdr *hdr;
+
+			ibtrs_set_last_heartbeat(&sess->heartbeat);
+
+			iu = (struct ibtrs_iu *)(uintptr_t)wc.wr_id;
+			imm = be32_to_cpu(wc.ex.imm_data);
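+			/* special immediate values: UINT_MAX is a heartbeat,
+			 * UINT_MAX - 1 acks a user message; anything else
+			 * encodes the receive buffer id and offset of an I/O
+			 * request
+			 */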
+			if (imm == UINT_MAX) {
+				ret = ibtrs_post_recv(con, iu);
+				if (unlikely(ret != 0)) {
+					ERR(sess, "post receive buffer failed,"
+					    " errno: %d\n", ret);
+					return ret;
+				}
+				break;
+			} else if (imm == UINT_MAX - 1) {
+				ret = ibtrs_post_recv(con, iu);
+				if (unlikely(ret))
+					ERR_RL(sess, "Posting receive buffer of"
+					       " user Ack msg to HCA failed,"
+					       " errno: %d\n", ret);
+				process_msg_user_ack(con);
+				break;
+			}
+			id = imm >> sess->off_len;
+			off = imm & sess->off_mask;
+
+			if (id >= sess->queue_depth || off >= rcv_buf_size) {
+				ERR(sess, "Processing I/O failed, contiguous "
+				    "buf addr is out of reserved area\n");
+				ret = ibtrs_post_recv(con, iu);
+				if (unlikely(ret != 0))
+					ERR(sess, "Processing I/O failed, "
+					    "post receive buffer failed, "
+					    "errno: %d\n", ret);
+				return -EIO;
+			}
+
+			hdr = (struct ibtrs_msg_hdr *)
+				(sess->rcv_buf_pool->rcv_bufs[id].buf + off);
+
+			ibtrs_handle_write(con, iu, hdr, id, off);
+			break;
+		}
+
+		case IB_WC_RDMA_WRITE:
+			break;
+
+		case IB_WC_RECV: {
+			struct ibtrs_msg_hdr *hdr;
+
+			ibtrs_set_last_heartbeat(&sess->heartbeat);
+			iu = (struct ibtrs_iu *)(uintptr_t)wc.wr_id;
+			hdr = (struct ibtrs_msg_hdr *)iu->buf;
+			ibtrs_deb_msg_hdr("Received: ", hdr);
+			ibtrs_handle_recv(con, iu);
+			break;
+		}
+
+		default:
+			ERR(sess, "Processing work completion failed,"
+			    " WC has unknown opcode: %s\n",
+			    ib_wc_opcode_str(wc.opcode));
+			return -EINVAL;
+		}
+	}
+	return 0;
+}
+
+static void ibtrs_srv_update_wc_stats(struct ibtrs_con *con, int cnt)
+{
+	int old_max = atomic_read(&con->sess->stats.wc_comp.max_wc_cnt);
+	int act_max;
+
+	while (cnt > old_max) {
+		act_max = atomic_cmpxchg(&con->sess->stats.wc_comp.max_wc_cnt,
+					 old_max, cnt);
+		if (likely(act_max == old_max))
+			break;
+		old_max = act_max;
+	}
+
+	atomic64_inc(&con->sess->stats.wc_comp.calls);
+	atomic64_add(cnt, &con->sess->stats.wc_comp.total_wc_cnt);
+}
+
+static int get_process_wcs(struct ibtrs_con *con, int *total_cnt)
+{
+	int cnt, err;
+
+	do {
+		cnt = ib_poll_cq(con->ib_con.cq, ARRAY_SIZE(con->wcs),
+				 con->wcs);
+		if (unlikely(cnt < 0)) {
+			ERR(con->sess, "Polling completion queue failed, "
+			    "errno: %d\n", cnt);
+			return cnt;
+		}
+
+		if (likely(cnt > 0)) {
+			err = process_wcs(con, con->wcs, cnt);
+			*total_cnt += cnt;
+			if (unlikely(err))
+				return err;
+		}
+	} while (cnt > 0);
+
+	return 0;
+}
+
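+/*
+ * Drain the completion queue, then re-arm notifications; if
+ * ib_req_notify_cq() reports missed events, poll again so that no completion
+ * is lost between draining and re-arming.
+ */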
+static void wrapper_handle_cq_comp(struct work_struct *work)
+{
+	int err;
+	struct ibtrs_con *con = container_of(work, struct ibtrs_con, cq_work);
+	struct ibtrs_session *sess = con->sess;
+	int total_cnt = 0;
+
+	if (unlikely(con->state == CSM_STATE_CLOSED)) {
+		ERR(sess, "Retrieving work completions from completion"
+		    " queue failed, connection is disconnected\n");
+		goto error;
+	}
+
+	err = get_process_wcs(con, &total_cnt);
+	if (unlikely(err))
+		goto error;
+
+	while ((err = ib_req_notify_cq(con->ib_con.cq, IB_CQ_NEXT_COMP |
+				       IB_CQ_REPORT_MISSED_EVENTS)) > 0) {
+		DEB("Missed %d CQ notifications, processing missed WCs...\n",
+		    err);
+		err = get_process_wcs(con, &total_cnt);
+		if (unlikely(err))
+			goto error;
+	}
+
+	if (unlikely(err))
+		goto error;
+
+	ibtrs_srv_update_wc_stats(con, total_cnt);
+	return;
+
+error:
+	csm_schedule_event(con, CSM_EV_CON_ERROR);
+}
+
+static void cq_event_handler(struct ib_cq *cq, void *ctx)
+{
+	struct ibtrs_con *con = ctx;
+
+	/* queue_work() can return false here: the work may already be queued
+	 * when CQ notifications were already activated and are activated
+	 * again after the beacon was posted.
+	 */
+	if (con->state != CSM_STATE_CLOSED)
+		queue_work(con->cq_wq, &con->cq_work);
+}
+
+static int accept(struct ibtrs_con *con)
+{
+	struct rdma_conn_param conn_param;
+	int ret;
+	struct ibtrs_session *sess = con->sess;
+
+	memset(&conn_param, 0, sizeof(conn_param));
+	conn_param.retry_count = retry_count;
+
+	if (con->user)
+		conn_param.rnr_retry_count = 7;
+
+	ret = rdma_accept(con->cm_id, &conn_param);
+	if (ret) {
+		ERR(sess, "Accepting RDMA connection request failed,"
+		    " errno: %d\n", ret);
+		return ret;
+	}
+
+	return 0;
+}
+
+static struct ibtrs_session *
+__create_sess(struct rdma_cm_id *cm_id, const struct ibtrs_msg_sess_open *req)
+{
+	struct ibtrs_session *sess;
+	int err;
+
+	sess = ibtrs_zalloc(sizeof(*sess));
+	if (!sess) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	err = ibtrs_addr_to_str(&cm_id->route.addr.dst_addr, sess->addr,
+				sizeof(sess->addr));
+	if (err < 0)
+		goto err1;
+
+	sess->est_cnt = 0;
+	sess->state_in_sysfs = false;
+	sess->cur_cq_vector = -1;
+	INIT_LIST_HEAD(&sess->con_list);
+	mutex_init(&sess->lock);
+
+	INIT_LIST_HEAD(&sess->tx_bufs);
+	spin_lock_init(&sess->tx_bufs_lock);
+
+	err = ib_get_max_wr_queue_size(cm_id->device);
+	if (err < 0)
+		goto err1;
+
+	sess->wq_size = err - 1;
+
+	sess->queue_depth		= sess_queue_depth;
+	sess->con_cnt			= req->con_cnt;
+	sess->ver			= min_t(u8, req->ver, IBTRS_VERSION);
+	sess->primary_port_num		= cm_id->port_num;
+
+	init_waitqueue_head(&sess->mu_iu_wait_q);
+	init_waitqueue_head(&sess->mu_buf_wait_q);
+	ibtrs_set_heartbeat_timeout(&sess->heartbeat,
+				    default_heartbeat_timeout_ms <
+				    MIN_HEARTBEAT_TIMEOUT_MS ?
+				    MIN_HEARTBEAT_TIMEOUT_MS :
+				    default_heartbeat_timeout_ms);
+	atomic64_set(&sess->heartbeat.send_ts_ms, 0);
+	atomic64_set(&sess->heartbeat.recv_ts_ms, 0);
+	sess->heartbeat.addr = sess->addr;
+	sess->heartbeat.hostname = sess->hostname;
+
+	atomic_set(&sess->peer_usr_msg_bufs, USR_MSG_CNT);
+	sess->dev = ibtrs_find_get_device(cm_id);
+	if (!sess->dev) {
+		err = -ENOMEM;
+		WRN(sess, "Failed to alloc ibtrs_device\n");
+		goto err1;
+	}
+	err = setup_cont_bufs(sess);
+	if (err)
+		goto err2;
+
+	memcpy(sess->uuid, req->uuid, IBTRS_UUID_SIZE);
+	err = ssm_init(sess);
+	if (err) {
+		WRN(sess, "Failed to initialize the session state machine\n");
+		goto err3;
+	}
+
+	kref_init(&sess->kref);
+	init_waitqueue_head(&sess->bufs_wait);
+
+	list_add(&sess->list, &sess_list);
+	INFO(sess, "IBTRS Session created (queue depth: %d)\n",
+	     sess->queue_depth);
+
+	return sess;
+
+err3:
+	release_cont_bufs(sess);
+err2:
+	kref_put(&sess->dev->ref, ibtrs_free_dev);
+err1:
+	kvfree(sess);
+out:
+	return ERR_PTR(err);
+}
+
+inline const char *ibtrs_srv_get_sess_hostname(struct ibtrs_session *sess)
+{
+	return sess->hostname;
+}
+EXPORT_SYMBOL(ibtrs_srv_get_sess_hostname);
+
+inline const char *ibtrs_srv_get_sess_addr(struct ibtrs_session *sess)
+{
+	return sess->addr;
+}
+EXPORT_SYMBOL(ibtrs_srv_get_sess_addr);
+
+inline int ibtrs_srv_get_sess_qdepth(struct ibtrs_session *sess)
+{
+	return sess->queue_depth;
+}
+EXPORT_SYMBOL(ibtrs_srv_get_sess_qdepth);
+
+static struct ibtrs_session *__find_active_sess(const char *uuid)
+{
+	struct ibtrs_session *n;
+
+	list_for_each_entry(n, &sess_list, list) {
+		if (!memcmp(n->uuid, uuid, sizeof(n->uuid)) &&
+		    n->state != SSM_STATE_CLOSING &&
+		    n->state != SSM_STATE_CLOSED)
+			return n;
+	}
+
+	return NULL;
+}
+
+static int rdma_con_reject(struct rdma_cm_id *cm_id, s16 errno)
+{
+	struct ibtrs_msg_error msg;
+	int ret;
+
+	memset(&msg, 0, sizeof(msg));
+	msg.hdr.type	= IBTRS_MSG_ERROR;
+	msg.hdr.tsize	= sizeof(msg);
+	msg.errno	= errno;
+
+	ret = rdma_reject(cm_id, &msg, sizeof(msg));
+	if (ret)
+		ERR_NP("Rejecting RDMA connection request failed, errno: %d\n",
+		       ret);
+
+	return ret;
+}
+
+static int find_next_bit_ring(int cur)
+{
+	int v = cpumask_next(cur, &cq_affinity_mask);
+
+	if (v >= nr_cpu_ids)
+		v = cpumask_first(&cq_affinity_mask);
+	return v;
+}
+
+static int ibtrs_srv_get_next_cq_vector(struct ibtrs_session *sess)
+{
+	sess->cur_cq_vector = find_next_bit_ring(sess->cur_cq_vector);
+
+	return sess->cur_cq_vector;
+}
+
+static void ssm_create_con_worker(struct work_struct *work)
+{
+	struct ssm_create_con_work *ssm_w =
+			container_of(work, struct ssm_create_con_work, work);
+	struct ibtrs_session *sess = ssm_w->sess;
+	struct rdma_cm_id *cm_id = ssm_w->cm_id;
+	bool user = ssm_w->user;
+	struct ibtrs_con *con;
+	int ret;
+	u16 cq_size, wr_queue_size;
+
+	kvfree(ssm_w);
+
+	if (sess->state == SSM_STATE_CLOSING ||
+	    sess->state == SSM_STATE_CLOSED) {
+		WRN(sess, "Creating connection failed, "
+		    "session is being closed\n");
+		ret = -ECOMM;
+		goto err_reject;
+	}
+
+	con = ibtrs_zalloc(sizeof(*con));
+	if (!con) {
+		ERR(sess, "Creating connection failed, "
+		    "can't allocate memory for connection\n");
+		ret = -ENOMEM;
+		goto err_reject;
+	}
+
+	con->cm_id			= cm_id;
+	con->sess			= sess;
+	con->user			= user;
+	con->device_being_removed	= false;
+
+	atomic_set(&con->wr_cnt, 0);
+	if (con->user) {
+		cq_size		= USR_CON_BUF_SIZE + 1;
+		wr_queue_size	= USR_CON_BUF_SIZE + 1;
+	} else {
+		cq_size		= con->sess->queue_depth;
+		wr_queue_size	= sess->wq_size;
+	}
+
+	con->cq_vector = ibtrs_srv_get_next_cq_vector(sess);
+
+	con->ib_con.addr = sess->addr;
+	con->ib_con.hostname = sess->hostname;
+	ret = ib_con_init(&con->ib_con, con->cm_id,
+			  1, cq_event_handler, con, con->cq_vector, cq_size,
+			  wr_queue_size, &con->sess->dev->ib_sess);
+	if (ret)
+		goto err_init;
+
+	INIT_WORK(&con->cq_work, wrapper_handle_cq_comp);
+	if (con->user)
+		con->cq_wq = alloc_ordered_workqueue("%s",
+						     WQ_HIGHPRI,
+						     "ibtrs_srv_wq");
+	else
+		con->cq_wq = alloc_workqueue("%s",
+					     WQ_CPU_INTENSIVE | WQ_HIGHPRI, 0,
+					     "ibtrs_srv_wq");
+	if (!con->cq_wq) {
+		ERR(sess, "Creating connection failed, can't allocate "
+		    "work queue for completion queue\n");
+		goto err_wq1;
+	}
+
+	con->rdma_resp_wq = alloc_workqueue("%s", WQ_HIGHPRI, 0,
+					    "ibtrs_rdma_resp");
+
+	if (!con->rdma_resp_wq) {
+		ERR(sess, "Creating connection failed, can't allocate"
+		    " work queue for send response\n");
+		goto err_wq2;
+	}
+
+	ret = init_transfer_bufs(con);
+	if (ret) {
+		ERR(sess, "Creating connection failed, can't init"
+		    " transfer buffers, errno: %d\n", ret);
+		goto err_buf;
+	}
+
+	csm_init(con);
+	add_con_to_list(sess, con);
+
+	cm_id->context = con;
+	if (con->user) {
+		con->sess->msg_wq = alloc_ordered_workqueue("sess_msg_wq", 0);
+		if (!con->sess->msg_wq) {
+			ERR(con->sess, "Failed to create user message"
+			    " workqueue\n");
+			ret = -ENOMEM;
+			goto err_accept;
+		}
+	}
+
+	DEB("accept request\n");
+	ret = accept(con);
+	if (ret)
+		goto err_msg;
+
+	if (con->user)
+		con->sess->cm_id = cm_id;
+
+	con->sess->active_cnt++;
+
+	return;
+err_msg:
+	if (con->user)
+		destroy_workqueue(con->sess->msg_wq);
+err_accept:
+	cm_id->context = NULL;
+	remove_con_from_list(con);
+err_buf:
+	destroy_workqueue(con->rdma_resp_wq);
+err_wq2:
+	destroy_workqueue(con->cq_wq);
+err_wq1:
+	ib_con_destroy(&con->ib_con);
+err_init:
+	kvfree(con);
+err_reject:
+	rdma_destroy_id(cm_id);
+
+	ssm_schedule_event(sess, SSM_EV_CON_EST_ERR);
+}
+
+static int ssm_schedule_create_con(struct ibtrs_session *sess,
+				   struct rdma_cm_id *cm_id,
+				   bool user)
+{
+	struct ssm_create_con_work *w;
+
+	w = ibtrs_malloc(sizeof(*w));
+	if (!w)
+		return -ENOMEM;
+
+	w->sess		= sess;
+	w->cm_id	= cm_id;
+	w->user		= user;
+	INIT_WORK(&w->work, ssm_create_con_worker);
+	queue_work(sess->sm_wq, &w->work);
+
+	return 0;
+}
+
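+/*
+ * Handle an RDMA connection request. The private data carries either a
+ * sess_open message (first connection of a session, the user connection,
+ * which creates the session) or a con_open message (an additional I/O
+ * connection attached to an existing session looked up by uuid).
+ */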
+static int rdma_con_establish(struct rdma_cm_id *cm_id, const void *data,
+			      size_t size)
+{
+	struct ibtrs_session *sess;
+	int ret;
+	const char *uuid = NULL;
+	const struct ibtrs_msg_hdr *hdr = data;
+	bool user = false;
+
+	if (unlikely(!srv_ops_are_valid(srv_ops))) {
+		ERR_NP("Establishing connection failed, "
+		       "no user module registered!\n");
+		ret = -ECOMM;
+		goto err_reject;
+	}
+
+	if (unlikely((size < sizeof(struct ibtrs_msg_con_open)) ||
+		     (size < sizeof(struct ibtrs_msg_sess_open)) ||
+		     ibtrs_validate_message(0, hdr))) {
+		ERR_NP("Establishing connection failed, "
+		       "connection request payload size unexpected "
+		       "%zu != %zu or %zu\n", size,
+		       sizeof(struct ibtrs_msg_con_open),
+		       sizeof(struct ibtrs_msg_sess_open));
+		ret = -EINVAL;
+		goto err_reject;
+	}
+
+	if (hdr->type == IBTRS_MSG_SESS_OPEN)
+		uuid = ((struct ibtrs_msg_sess_open *)data)->uuid;
+	else if (hdr->type == IBTRS_MSG_CON_OPEN)
+		uuid = ((struct ibtrs_msg_con_open *)data)->uuid;
+
+	mutex_lock(&sess_mutex);
+	sess = __find_active_sess(uuid);
+	if (sess) {
+		if (unlikely(hdr->type == IBTRS_MSG_SESS_OPEN)) {
+			INFO(sess, "Connection request rejected, "
+			     "session already exists\n");
+			mutex_unlock(&sess_mutex);
+			ret = -EEXIST;
+			goto err_reject;
+		}
+		if (!ibtrs_srv_sess_get(sess)) {
+			INFO(sess, "Connection request rejected,"
+			     " session is being closed\n");
+			mutex_unlock(&sess_mutex);
+			ret = -EINVAL;
+			goto err_reject;
+		}
+	} else {
+		if (unlikely(hdr->type == IBTRS_MSG_CON_OPEN)) {
+			mutex_unlock(&sess_mutex);
+			INFO_NP("Connection request rejected,"
+				" received con_open msg but no active session"
+				" exists.\n");
+			ret = -EINVAL;
+			goto err_reject;
+		}
+
+		sess = __create_sess(cm_id, (struct ibtrs_msg_sess_open *)data);
+		if (IS_ERR(sess)) {
+			mutex_unlock(&sess_mutex);
+			ret = PTR_ERR(sess);
+			ERR_NP("Establishing connection failed, "
+			       "creating local session resource failed, errno:"
+			       " %d\n", ret);
+			goto err_reject;
+		}
+		ibtrs_srv_sess_get(sess);
+		user = true;
+	}
+
+	mutex_unlock(&sess_mutex);
+
+	ret = ssm_schedule_create_con(sess, cm_id, user);
+	if (ret) {
+		ERR(sess, "Unable to schedule creation of connection,"
+		    " session will be closed.\n");
+		goto err_close;
+	}
+
+	ibtrs_srv_sess_put(sess);
+	return 0;
+
+err_close:
+	ssm_schedule_event(sess, SSM_EV_CON_EST_ERR);
+	ibtrs_srv_sess_put(sess);
+err_reject:
+	rdma_con_reject(cm_id, ret);
+	return ret;
+}
+
+static int ibtrs_srv_rdma_cm_ev_handler(struct rdma_cm_id *cm_id,
+					struct rdma_cm_event *event)
+{
+	struct ibtrs_con *con = cm_id->context;
+	int ret = 0;
+
+	DEB("cma_event type %d cma_id %p(%s) on con: %p\n", event->event,
+	    cm_id, rdma_event_msg(event->event), con);
+	if (!con && event->event != RDMA_CM_EVENT_CONNECT_REQUEST) {
+		INFO_NP("Ignore cma_event type %d cma_id %p(%s)\n",
+			event->event, cm_id, rdma_event_msg(event->event));
+		return 0;
+	}
+
+	switch (event->event) {
+	case RDMA_CM_EVENT_CONNECT_REQUEST:
+		ret = rdma_con_establish(cm_id, event->param.conn.private_data,
+					 event->param.conn.private_data_len);
+		break;
+	case RDMA_CM_EVENT_ESTABLISHED:
+		csm_schedule_event(con, CSM_EV_CON_ESTABLISHED);
+		break;
+	case RDMA_CM_EVENT_DISCONNECTED:
+	case RDMA_CM_EVENT_TIMEWAIT_EXIT:
+		csm_schedule_event(con, CSM_EV_CON_DISCONNECTED);
+		break;
+
+	case RDMA_CM_EVENT_DEVICE_REMOVAL: {
+		struct completion dc;
+
+		ERR_RL(con->sess,
+		       "IB Device was removed, disconnecting session.\n");
+
+		con->device_being_removed = true;
+		init_completion(&dc);
+		con->sess->dev->ib_sess_destroy_completion = &dc;
+
+		csm_schedule_event(con, CSM_EV_DEVICE_REMOVAL);
+		wait_for_completion(&dc);
+
+		/* If it's user connection, the cm_id will be destroyed by
+		 * destroy_sess(), so return 0 to signal that we will destroy
+		 * it later. Otherwise, return 1 so CMA will destroy it.
+		 */
+		if (con->user)
+			return 0;
+		else
+			return 1;
+	}
+	case RDMA_CM_EVENT_CONNECT_ERROR:
+	case RDMA_CM_EVENT_ROUTE_ERROR:
+	case RDMA_CM_EVENT_UNREACHABLE:
+	case RDMA_CM_EVENT_ADDR_CHANGE:
+		ERR_RL(con->sess, "CM error (CM event: %s, errno: %d)\n",
+		       rdma_event_msg(event->event), event->status);
+
+		csm_schedule_event(con, CSM_EV_CON_ERROR);
+		break;
+	case RDMA_CM_EVENT_REJECTED:
+		/* reject status is defined in enum, not errno */
+		ERR_RL(con->sess,
+		       "Connection rejected (CM event: %s, err: %s)\n",
+		       rdma_event_msg(event->event),
+		       rdma_reject_msg(cm_id, event->status));
+		csm_schedule_event(con, CSM_EV_CON_ERROR);
+		break;
+	default:
+		WRN(con->sess, "Ignoring unexpected CM event %s, errno %d\n",
+		    rdma_event_msg(event->event), event->status);
+		break;
+	}
+	return ret;
+}
+
+static int ibtrs_srv_cm_init(struct rdma_cm_id **cm_id, struct sockaddr *addr,
+			     enum rdma_port_space ps)
+{
+	int ret;
+
+	*cm_id = rdma_create_id(&init_net, ibtrs_srv_rdma_cm_ev_handler, NULL,
+				ps, IB_QPT_RC);
+	if (IS_ERR(*cm_id)) {
+		ret = PTR_ERR(*cm_id);
+		ERR_NP("Creating id for RDMA connection failed, errno: %d\n",
+		       ret);
+		goto err_out;
+	}
+	DEB("created cm_id %p\n", *cm_id);
+	ret = rdma_bind_addr(*cm_id, addr);
+	if (ret) {
+		ERR_NP("Binding RDMA address failed, errno: %d\n", ret);
+		goto err_cm;
+	}
+	DEB("rdma_bind_addr successful\n");
+	/* we currently accept 64 rdma_connects */
+	ret = rdma_listen(*cm_id, 64);
+	if (ret) {
+		ERR_NP("Listening on RDMA connection failed, errno: %d\n", ret);
+		goto err_cm;
+	}
+
+	switch (addr->sa_family) {
+	case AF_INET:
+		DEB("listening on port %u\n",
+		    ntohs(((struct sockaddr_in *)addr)->sin_port));
+		break;
+	case AF_INET6:
+		DEB("listening on port %u\n",
+		    ntohs(((struct sockaddr_in6 *)addr)->sin6_port));
+		break;
+	case AF_IB:
+		DEB("listening on service id 0x%016llx\n",
+		    be64_to_cpu(rdma_get_service_id(*cm_id, addr)));
+		break;
+	default:
+		DEB("listening on address family %u\n", addr->sa_family);
+	}
+
+	return 0;
+
+err_cm:
+	rdma_destroy_id(*cm_id);
+err_out:
+	return ret;
+}
+
+static int ibtrs_srv_rdma_init(void)
+{
+	int ret = 0;
+	struct sockaddr_in6 sin = {
+		.sin6_family	= AF_INET6,
+		.sin6_addr	= IN6ADDR_ANY_INIT,
+		.sin6_port	= htons(IBTRS_SERVER_PORT),
+	};
+	struct sockaddr_ib sib = {
+		.sib_family			= AF_IB,
+		.sib_addr.sib_subnet_prefix	= 0ULL,
+		.sib_addr.sib_interface_id	= 0ULL,
+		.sib_sid	= cpu_to_be64(RDMA_IB_IP_PS_IB |
+					      IBTRS_SERVER_PORT),
+		.sib_sid_mask	= cpu_to_be64(0xffffffffffffffffULL),
+		.sib_pkey	= cpu_to_be16(0xffff),
+	};
+
+	/*
+	 * We accept both IPoIB and IB connections, so we need to keep
+	 * two cm id's, one for each socket type and port space.
+	 * If the cm initialization of one of the id's fails, we abort
+	 * everything.
+	 */
+
+	ret = ibtrs_srv_cm_init(&cm_id_ip, (struct sockaddr *)&sin,
+				RDMA_PS_TCP);
+	if (ret)
+		return ret;
+
+	ret = ibtrs_srv_cm_init(&cm_id_ib, (struct sockaddr *)&sib, RDMA_PS_IB);
+	if (ret)
+		goto err_cm_ib;
+
+	return ret;
+
+err_cm_ib:
+	rdma_destroy_id(cm_id_ip);
+	return ret;
+}
+
+static void ibtrs_srv_destroy_buf_pool(void)
+{
+	struct ibtrs_rcv_buf_pool *pool, *pool_next;
+
+	mutex_lock(&buf_pool_mutex);
+	list_for_each_entry_safe(pool, pool_next, &free_buf_pool_list, list) {
+		list_del(&pool->list);
+		nr_free_buf_pool--;
+		free_recv_buf_pool(pool);
+	}
+	mutex_unlock(&buf_pool_mutex);
+}
+
+static void ibtrs_srv_alloc_ini_buf_pool(void)
+{
+	struct ibtrs_rcv_buf_pool *pool;
+	int i;
+
+	if (init_pool_size == 0)
+		return;
+
+	INFO_NP("Trying to allocate RDMA buffers pool for %d client(s)\n",
+		init_pool_size);
+	for (i = 0; i < init_pool_size; i++) {
+		pool = alloc_rcv_buf_pool();
+		if (!pool) {
+			ERR_NP("Failed to allocate initial RDMA buffer pool"
+			       " #%d\n", i + 1);
+			break;
+		}
+		mutex_lock(&buf_pool_mutex);
+		list_add(&pool->list, &free_buf_pool_list);
+		nr_free_buf_pool++;
+		nr_total_buf_pool++;
+		mutex_unlock(&buf_pool_mutex);
+		DEB("Allocated buffer pool #%d\n", i);
+	}
+
+	INFO_NP("Allocated RDMA buffers pool for %d client(s)\n", i);
+}
+
+int ibtrs_srv_register(const struct ibtrs_srv_ops *ops)
+{
+	int err;
+
+	if (srv_ops) {
+		ERR_NP("Registration failed, module %s already registered,"
+		       " only 1 user module supported\n",
+		srv_ops->owner->name);
+		return -ENOTSUPP;
+	}
+
+	if (unlikely(!srv_ops_are_valid(ops))) {
+		ERR_NP("Registration failed, user module supploed invalid ops"
+		       " parameter\n");
+		return -EFAULT;
+	}
+
+	ibtrs_srv_alloc_ini_buf_pool();
+
+	err = ibtrs_srv_rdma_init();
+	if (err) {
+		ERR_NP("Can't init RDMA resource, errno: %d\n", err);
+		return err;
+	}
+	srv_ops = ops;
+
+	return 0;
+}
+EXPORT_SYMBOL(ibtrs_srv_register);
+
+inline void ibtrs_srv_queue_close(struct ibtrs_session *sess)
+{
+	ssm_schedule_event(sess, SSM_EV_SYSFS_DISCONNECT);
+}
+
+static void close_sessions(void)
+{
+	struct ibtrs_session *sess;
+
+	mutex_lock(&sess_mutex);
+	list_for_each_entry(sess, &sess_list, list) {
+		if (!ibtrs_srv_sess_get(sess))
+			continue;
+		ssm_schedule_event(sess, SSM_EV_SESS_CLOSE);
+		ibtrs_srv_sess_put(sess);
+	}
+	mutex_unlock(&sess_mutex);
+
+	wait_event(sess_list_waitq, list_empty(&sess_list));
+}
+
+void ibtrs_srv_unregister(const struct ibtrs_srv_ops *ops)
+{
+	if (!srv_ops) {
+		WRN_NP("Nothing to unregister - srv_ops = NULL\n");
+		return;
+	}
+
+	/* TODO: in order to support registration of multiple modules,
+	 * introduce a list with srv_ops and search for the correct
+	 * one.
+	 */
+
+	if (srv_ops != ops) {
+		ERR_NP("Ops is not the ops we have registered\n");
+		return;
+	}
+
+	rdma_destroy_id(cm_id_ip);
+	cm_id_ip = NULL;
+	rdma_destroy_id(cm_id_ib);
+	cm_id_ib = NULL;
+	close_sessions();
+	flush_workqueue(destroy_wq);
+	ibtrs_srv_destroy_buf_pool();
+	srv_ops = NULL;
+}
+EXPORT_SYMBOL(ibtrs_srv_unregister);
+
+static int check_module_params(void)
+{
+	if (sess_queue_depth < 1 || sess_queue_depth > MAX_SESS_QUEUE_DEPTH) {
+		ERR_NP("Invalid sess_queue_depth parameter value\n");
+		return -EINVAL;
+	}
+
+	/* check if IB immediate data size is enough to hold the mem_id and the
+	 * offset inside the memory chunk
+	 */
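+	/*
+	 * Illustrative arithmetic (hypothetical values, not the module
+	 * defaults): with sess_queue_depth = 256 and rcv_buf_size = 64K the
+	 * check below computes ilog2(255) + ilog2(65535) = 7 + 15 = 22,
+	 * which must not exceed IB_IMM_SIZE_BITS.
+	 */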
+	if (ilog2(sess_queue_depth - 1) + ilog2(rcv_buf_size - 1) >
+	    IB_IMM_SIZE_BITS) {
+		ERR_NP("RDMA immediate size (%db) not enough to encode "
+		       "%d buffers of size %dB. Reduce 'sess_queue_depth' "
+		       "or 'max_io_size' parameters.\n", IB_IMM_SIZE_BITS,
+		       sess_queue_depth, rcv_buf_size);
+		return -EINVAL;
+	}
+
+	if (init_pool_size < 0) {
+		ERR_NP("Invalid 'init_pool_size' parameter value."
+		       " Value must be positive.\n");
+		return -EINVAL;
+	}
+
+	if (pool_size_hi_wm < init_pool_size) {
+		ERR_NP("Invalid 'pool_size_hi_wm' parameter value. Value must"
+		       " be iqual or higher than 'init_pool_size'.\n");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static void csm_init(struct ibtrs_con *con)
+{
+	DEB("initializing csm to %s\n", csm_state_str(CSM_STATE_REQUESTED));
+	csm_set_state(con, CSM_STATE_REQUESTED);
+}
+
+static int send_msg_sess_open_resp(struct ibtrs_con *con)
+{
+	struct ibtrs_msg_sess_open_resp *msg;
+	int err;
+	struct ibtrs_session *sess = con->sess;
+
+	msg = sess->rdma_info_iu->buf;
+
+	fill_ibtrs_msg_sess_open_resp(msg, con);
+
+	err = ibtrs_post_send(con->ib_con.qp, con->sess->dev->ib_sess.mr,
+			      sess->rdma_info_iu, msg->hdr.tsize);
+	if (unlikely(err))
+		ERR(sess, "Sending sess open resp failed, "
+			  "posting msg to QP failed, errno: %d\n", err);
+
+	return err;
+}
+
+static void queue_heartbeat_dwork(struct ibtrs_session *sess)
+{
+	ibtrs_set_last_heartbeat(&sess->heartbeat);
+	WARN_ON(!queue_delayed_work(sess->sm_wq,
+				    &sess->send_heartbeat_dwork,
+				    HEARTBEAT_INTV_JIFFIES));
+	WARN_ON(!queue_delayed_work(sess->sm_wq,
+				    &sess->check_heartbeat_dwork,
+				    HEARTBEAT_INTV_JIFFIES));
+}
+
+static void csm_requested(struct ibtrs_con *con, enum csm_ev ev)
+{
+	struct ibtrs_session *sess = con->sess;
+	enum csm_state state = con->state;
+
+	DEB("con %p, event %s\n", con, csm_ev_str(ev));
+	switch (ev) {
+	case CSM_EV_CON_ESTABLISHED: {
+		csm_set_state(con, CSM_STATE_CONNECTED);
+		if (con->user) {
+			/* send back rdma info */
+			if (send_msg_sess_open_resp(con))
+				goto destroy;
+			queue_heartbeat_dwork(con->sess);
+		}
+		ssm_schedule_event(sess, SSM_EV_CON_CONNECTED);
+		break;
+	}
+	case CSM_EV_DEVICE_REMOVAL:
+	case CSM_EV_CON_ERROR:
+	case CSM_EV_SESS_CLOSING:
+	case CSM_EV_CON_DISCONNECTED:
+destroy:
+		csm_set_state(con, CSM_STATE_CLOSED);
+		close_con(con);
+		ssm_schedule_event(sess, SSM_EV_CON_EST_ERR);
+		break;
+	default:
+		ERR(sess, "Connection received unexpected event %s "
+		    "in %s state.\n", csm_ev_str(ev), csm_state_str(state));
+	}
+}
+
+static void csm_connected(struct ibtrs_con *con, enum csm_ev ev)
+{
+	struct ibtrs_session *sess = con->sess;
+	enum csm_state state = con->state;
+
+	DEB("con %p, event %s\n", con, csm_ev_str(ev));
+	switch (ev) {
+	case CSM_EV_CON_ERROR:
+	case CSM_EV_SESS_CLOSING: {
+		int err;
+
+		csm_set_state(con, CSM_STATE_CLOSING);
+		err = rdma_disconnect(con->cm_id);
+		if (err)
+			ERR(sess, "Connection received event %s "
+			    "in %s state, new state is %s, but failed to "
+			    "disconnect connection.\n", csm_ev_str(ev),
+			    csm_state_str(state), csm_state_str(con->state));
+		break;
+		}
+	case CSM_EV_DEVICE_REMOVAL:
+		/* Send an SSM_EV_SESS_CLOSE event to the session to speed up
+		 * the closing of the other connections. If we just wait for
+		 * the client to close all connections this can take a while.
+		 */
+		ssm_schedule_event(sess, SSM_EV_SESS_CLOSE);
+		/* fall-through */
+	case CSM_EV_CON_DISCONNECTED: {
+		int err, cnt = 0;
+
+		csm_set_state(con, CSM_STATE_FLUSHING);
+		err = rdma_disconnect(con->cm_id);
+		if (err)
+			ERR(sess, "Connection received event %s "
+			    "in %s state, new state is %s, but failed to "
+			    "disconnect connection.\n", csm_ev_str(ev),
+			    csm_state_str(state), csm_state_str(con->state));
+
+		wait_event(sess->bufs_wait,
+			   !atomic_read(&sess->stats.rdma_stats.inflight));
+		DEB("posting beacon on con %p\n", con);
+		err = post_beacon(&con->ib_con);
+		if (err) {
+			ERR(sess, "Connection received event %s "
+			    "in %s state, new state is %s but failed to post"
+			    " beacon, closing connection.\n", csm_ev_str(ev),
+			    csm_state_str(state), csm_state_str(con->state));
+			goto destroy;
+		}
+
+		err = ibtrs_request_cq_notifications(&con->ib_con);
+		if (unlikely(err < 0)) {
+			WRN(con->sess, "Requesting CQ Notification for"
+			    " ib_con failed. Connection will be destroyed\n");
+			goto destroy;
+		} else if (err > 0) {
+			err = get_process_wcs(con, &cnt);
+			if (unlikely(err))
+				goto destroy;
+			break;
+		}
+		break;
+
+destroy:
+		csm_set_state(con, CSM_STATE_CLOSED);
+		close_con(con);
+		ssm_schedule_event(sess, SSM_EV_CON_DISCONNECTED);
+
+		break;
+		}
+	default:
+		ERR(sess, "Connection received unexpected event %s "
+		    "in %s state\n", csm_ev_str(ev), csm_state_str(state));
+	}
+}
+
+static void csm_closing(struct ibtrs_con *con, enum csm_ev ev)
+{
+	struct ibtrs_session *sess = con->sess;
+	enum csm_state state = con->state;
+
+	DEB("con %p, event %s\n", con, csm_ev_str(ev));
+	switch (ev) {
+	case CSM_EV_DEVICE_REMOVAL:
+	case CSM_EV_CON_DISCONNECTED: {
+		int err, cnt = 0;
+
+		csm_set_state(con, CSM_STATE_FLUSHING);
+
+		wait_event(sess->bufs_wait,
+			   !atomic_read(&sess->stats.rdma_stats.inflight));
+
+		DEB("posting beacon on con %p\n", con);
+		if (post_beacon(&con->ib_con)) {
+			ERR(sess, "Connection received event %s "
+			    "in %s state, new state is %s but failed to post"
+			    " beacon, closing connection.\n", csm_ev_str(ev),
+			    csm_state_str(state), csm_state_str(con->state));
+			goto destroy;
+		}
+
+		err = ibtrs_request_cq_notifications(&con->ib_con);
+		if (unlikely(err < 0)) {
+			WRN(con->sess, "Requesting CQ Notification for"
+			    " ib_con failed. Connection will be destroyed\n");
+			goto destroy;
+		} else if (err > 0) {
+			err = get_process_wcs(con, &cnt);
+			if (unlikely(err))
+				goto destroy;
+			break;
+		}
+		break;
+
+destroy:
+		csm_set_state(con, CSM_STATE_CLOSED);
+		close_con(con);
+		ssm_schedule_event(sess, SSM_EV_CON_DISCONNECTED);
+		break;
+	}
+	case CSM_EV_CON_ERROR:
+		/* ignore connection errors, just wait for CM_DISCONNECTED */
+	case CSM_EV_SESS_CLOSING:
+		break;
+	default:
+		ERR(sess, "Connection received unexpected event %s "
+		    "in %s state\n", csm_ev_str(ev), csm_state_str(state));
+	}
+}
+
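+/*
+ * CSM_STATE_FLUSHING: a beacon work request has been posted on the queue
+ * pair; its completion is delivered as CSM_EV_BEACON_COMPLETED and is used
+ * as the signal that the connection can be closed and freed.
+ */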
+static void csm_flushing(struct ibtrs_con *con, enum csm_ev ev)
+{
+	struct ibtrs_session *sess = con->sess;
+	enum csm_state state = con->state;
+
+	DEB("con %p, event %s\n", con, csm_ev_str(ev));
+
+	switch (ev) {
+	case CSM_EV_BEACON_COMPLETED:
+		csm_set_state(con, CSM_STATE_CLOSED);
+		close_con(con);
+		ssm_schedule_event(sess, SSM_EV_CON_DISCONNECTED);
+		break;
+	case CSM_EV_SESS_CLOSING:
+	case CSM_EV_DEVICE_REMOVAL:
+		/* Ignore CSM_EV_DEVICE_REMOVAL and CSM_EV_SESS_CLOSING in
+		 * this state. The beacon was already posted, so the
+		 * CSM_EV_BEACON_COMPLETED event should arrive anytime soon.
+		 */
+		break;
+	case CSM_EV_CON_ERROR:
+		break;
+	case CSM_EV_CON_DISCONNECTED:
+		/* Ignore CSM_EV_CON_DISCONNECTED. At this point we could have
+		 * already received a CSM_EV_CON_DISCONNECTED for the same
+		 * connection, but an additional RDMA_CM_EVENT_DISCONNECTED or
+		 * RDMA_CM_EVENT_TIMEWAIT_EXIT could be generated.
+		 */
+		break;
+	default:
+		ERR(sess, "Connection received unexpected event %s "
+		    "in %s state\n", csm_ev_str(ev), csm_state_str(state));
+	}
+}
+
+static void csm_closed(struct ibtrs_con *con, enum csm_ev ev)
+{
+	/* in this state, we ignore every event scheduled for this connection
+	 * and just wait for the session workqueue to be flushed and the
+	 * connection freed
+	 */
+	DEB("con %p, event %s\n", con, csm_ev_str(ev));
+}
+
+typedef void (ibtrs_srv_csm_ev_handler_fn)(struct ibtrs_con *, enum csm_ev);
+
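+/*
+ * Table-driven dispatch: a connection event is handled by the function
+ * registered for the connection's current state, so all transitions of a
+ * state live in one place. The session state machine below uses the same
+ * pattern.
+ */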
+static ibtrs_srv_csm_ev_handler_fn *ibtrs_srv_csm_ev_handlers[] = {
+	[CSM_STATE_REQUESTED]		= csm_requested,
+	[CSM_STATE_CONNECTED]		= csm_connected,
+	[CSM_STATE_CLOSING]		= csm_closing,
+	[CSM_STATE_FLUSHING]		= csm_flushing,
+	[CSM_STATE_CLOSED]		= csm_closed,
+};
+
+static inline void ibtrs_srv_csm_ev_handle(struct ibtrs_con *con,
+					   enum csm_ev ev)
+{
+	return (*ibtrs_srv_csm_ev_handlers[con->state])(con, ev);
+}
+
+static void csm_worker(struct work_struct *work)
+{
+	struct csm_work *csm_w = container_of(work, struct csm_work, work);
+
+	ibtrs_srv_csm_ev_handle(csm_w->con, csm_w->ev);
+	kvfree(csm_w);
+}
+
+static void csm_schedule_event(struct ibtrs_con *con, enum csm_ev ev)
+{
+	struct csm_work *w;
+
+	if (!ibtrs_srv_sess_get(con->sess))
+		return;
+
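+	/* Retry the small allocation until it succeeds; bail out only if the
+	 * connection has reached CSM_STATE_CLOSED in the meantime.
+	 */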
+	while (true) {
+		if (con->state == CSM_STATE_CLOSED)
+			goto out;
+		w = ibtrs_malloc(sizeof(*w));
+		if (w)
+			break;
+		cond_resched();
+	}
+
+	w->con = con;
+	w->ev = ev;
+	INIT_WORK(&w->work, csm_worker);
+	queue_work(con->sess->sm_wq, &w->work);
+
+out:
+	ibtrs_srv_sess_put(con->sess);
+}
+
+static void sess_schedule_csm_event(struct ibtrs_session *sess, enum csm_ev ev)
+{
+	struct ibtrs_con *con;
+
+	list_for_each_entry(con, &sess->con_list, list)
+		csm_schedule_event(con, ev);
+}
+
+static void remove_sess_from_sysfs(struct ibtrs_session *sess)
+{
+	if (!sess->state_in_sysfs)
+		return;
+
+	kobject_del(&sess->kobj_stats);
+	kobject_del(&sess->kobj);
+	sess->state_in_sysfs = false;
+
+	ibtrs_srv_schedule_sysfs_put(sess);
+}
+
+static __always_inline int
+__ibtrs_srv_request_cq_notifications(struct ibtrs_con *con)
+{
+	return ibtrs_request_cq_notifications(&con->ib_con);
+}
+
+static int ibtrs_srv_request_cq_notifications(struct ibtrs_session *sess)
+{
+	struct ibtrs_con *con;
+	int err, cnt = 0;
+
+	list_for_each_entry(con, &sess->con_list, list)  {
+		if (con->state == CSM_STATE_CONNECTED) {
+			err = __ibtrs_srv_request_cq_notifications(con);
+			if (unlikely(err < 0)) {
+				return err;
+			} else if (err > 0) {
+				err = get_process_wcs(con, &cnt);
+				if (unlikely(err))
+					return err;
+			}
+		}
+	}
+
+	return 0;
+}
+
+static void ssm_idle(struct ibtrs_session *sess, enum ssm_ev ev)
+{
+	enum ssm_state state = sess->state;
+
+	DEB("sess %p, event %s, est_cnt=%d\n", sess, ssm_ev_str(ev),
+	    sess->est_cnt);
+	switch (ev) {
+	case SSM_EV_CON_DISCONNECTED:
+		sess->est_cnt--;
+		/* fall through */
+	case SSM_EV_CON_EST_ERR:
+		if (!sess->active_cnt) {
+			ibtrs_srv_destroy_ib_session(sess);
+			ssm_set_state(sess, SSM_STATE_CLOSED);
+			cancel_delayed_work(&sess->check_heartbeat_dwork);
+			schedule_sess_put(sess);
+		} else {
+			ssm_set_state(sess, SSM_STATE_CLOSING);
+		}
+		break;
+	case SSM_EV_CON_CONNECTED: {
+		int err;
+
+		sess->est_cnt++;
+		if (sess->est_cnt != sess->con_cnt)
+			break;
+
+		err = ibtrs_srv_create_sess_files(sess);
+		if (err) {
+			if (err == -EEXIST)
+				ERR(sess,
+				    "Session sysfs files already exist,"
+				    " possibly a user-space process is"
+				    " holding them\n");
+			else
+				ERR(sess,
+				    "Create session sysfs files failed,"
+				    " errno: %d\n", err);
+			goto destroy;
+		}
+
+		sess->state_in_sysfs = true;
+
+		err = ibtrs_srv_sess_ev(sess, IBTRS_SRV_SESS_EV_CONNECTED);
+		if (err) {
+			ERR(sess, "Notifying user session event"
+			    " failed, errno: %d\n. Session is closed", err);
+			goto destroy;
+		}
+
+		ssm_set_state(sess, SSM_STATE_CONNECTED);
+		err = ibtrs_srv_request_cq_notifications(sess);
+		if (err) {
+			ERR(sess, "Requesting CQ completion notifications"
+			    " failed, errno: %d. Session will be closed.\n",
+			    err);
+			goto destroy;
+		}
+
+		break;
+destroy:
+		remove_sess_from_sysfs(sess);
+		ssm_set_state(sess, SSM_STATE_CLOSING);
+		sess_schedule_csm_event(sess, CSM_EV_SESS_CLOSING);
+		break;
+	}
+	case SSM_EV_SESS_CLOSE:
+		ssm_set_state(sess, SSM_STATE_CLOSING);
+		sess_schedule_csm_event(sess, CSM_EV_SESS_CLOSING);
+		break;
+	default:
+		ERR(sess, "Session received unexpected event %s "
+		    "in %s state.\n", ssm_ev_str(ev), ssm_state_str(state));
+	}
+}
+
+static void ssm_connected(struct ibtrs_session *sess, enum ssm_ev ev)
+{
+	enum ssm_state state = sess->state;
+
+	DEB("sess %p, event %s, est_cnt=%d\n", sess, ssm_ev_str(ev),
+	    sess->est_cnt);
+	switch (ev) {
+	case SSM_EV_CON_DISCONNECTED:
+		remove_sess_from_sysfs(sess);
+		sess->est_cnt--;
+
+		ssm_set_state(sess, SSM_STATE_CLOSING);
+		ibtrs_srv_sess_ev(sess, IBTRS_SRV_SESS_EV_DISCONNECTING);
+		break;
+	case SSM_EV_SESS_CLOSE:
+	case SSM_EV_SYSFS_DISCONNECT:
+		remove_sess_from_sysfs(sess);
+		ssm_set_state(sess, SSM_STATE_CLOSING);
+		ibtrs_srv_sess_ev(sess, IBTRS_SRV_SESS_EV_DISCONNECTING);
+
+		sess_schedule_csm_event(sess, CSM_EV_SESS_CLOSING);
+		break;
+	default:
+		ERR(sess, "Session received unexpected event %s "
+		    "in %s state.\n", ssm_ev_str(ev), ssm_state_str(state));
+	}
+}
+
+static void ssm_closing(struct ibtrs_session *sess, enum ssm_ev ev)
+{
+	enum ssm_state state = sess->state;
+
+	DEB("sess %p, event %s, est_cnt=%d\n", sess, ssm_ev_str(ev),
+	    sess->est_cnt);
+	switch (ev) {
+	case SSM_EV_CON_CONNECTED:
+		sess->est_cnt++;
+		break;
+	case SSM_EV_CON_DISCONNECTED:
+		sess->est_cnt--;
+		/* fall through */
+	case SSM_EV_CON_EST_ERR:
+		if (sess->active_cnt == 0) {
+			ibtrs_srv_destroy_ib_session(sess);
+			ssm_set_state(sess, SSM_STATE_CLOSED);
+			ibtrs_srv_sess_ev(sess, IBTRS_SRV_SESS_EV_DISCONNECTED);
+			cancel_delayed_work(&sess->check_heartbeat_dwork);
+			schedule_sess_put(sess);
+		}
+		break;
+	case SSM_EV_SESS_CLOSE:
+		sess_schedule_csm_event(sess, CSM_EV_SESS_CLOSING);
+		break;
+	case SSM_EV_SYSFS_DISCONNECT:
+		/* just ignore it, the connection should have a
+		 * CSM_EV_SESS_CLOSING event on the queue to be
+		 * processed later
+		 */
+		break;
+	default:
+		ERR(sess, "Session received unexpected event %s "
+		    "in %s state.\n", ssm_ev_str(ev), ssm_state_str(state));
+	}
+}
+
+static void ssm_closed(struct ibtrs_session *sess, enum ssm_ev ev)
+{
+	/* in this state, we ignore every event and wait for the session
+	 * to be destroyed
+	 */
+	DEB("sess %p, event %s, est_cnt=%d\n", sess, ssm_ev_str(ev),
+	    sess->est_cnt);
+}
+
+typedef void (ssm_ev_handler_fn)(struct ibtrs_session *, enum ssm_ev);
+
+static ssm_ev_handler_fn *ibtrs_srv_ev_handlers[] = {
+	[SSM_STATE_IDLE]		= ssm_idle,
+	[SSM_STATE_CONNECTED]		= ssm_connected,
+	[SSM_STATE_CLOSING]		= ssm_closing,
+	[SSM_STATE_CLOSED]		= ssm_closed,
+};
+
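+/*
+ * Heartbeat handling: send_heartbeat_work() periodically sends a heartbeat
+ * to the client and check_heartbeat_work() closes the session once the
+ * heartbeat timeout expires; both re-arm themselves every
+ * HEARTBEAT_INTV_JIFFIES.
+ */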
+static void check_heartbeat_work(struct work_struct *work)
+{
+	struct ibtrs_session *sess;
+
+	sess = container_of(to_delayed_work(work), struct ibtrs_session,
+			    check_heartbeat_dwork);
+
+	if (ibtrs_heartbeat_timeout_is_expired(&sess->heartbeat)) {
+		ssm_schedule_event(sess, SSM_EV_SESS_CLOSE);
+		return;
+	}
+
+	ibtrs_heartbeat_warn(&sess->heartbeat);
+
+	if (WARN_ON(!queue_delayed_work(sess->sm_wq,
+					&sess->check_heartbeat_dwork,
+					HEARTBEAT_INTV_JIFFIES)))
+		WRN_RL(sess, "Schedule check heartbeat work failed, "
+		       "check_heartbeat worker already queued?\n");
+}
+
+static void send_heartbeat_work(struct work_struct *work)
+{
+	struct ibtrs_session *sess;
+	int err;
+
+	sess = container_of(to_delayed_work(work), struct ibtrs_session,
+			    send_heartbeat_dwork);
+
+	if (ibtrs_heartbeat_send_ts_diff_ms(&sess->heartbeat) >=
+	    HEARTBEAT_INTV_MS) {
+		err = send_heartbeat(sess);
+		if (unlikely(err)) {
+			WRN_RL(sess,
+			       "Sending heartbeat failed, errno: %d,"
+			       " no further heartbeat will be sent\n", err);
+			return;
+		}
+	}
+
+	if (WARN_ON(!queue_delayed_work(sess->sm_wq,
+					&sess->send_heartbeat_dwork,
+					HEARTBEAT_INTV_JIFFIES)))
+		WRN_RL(sess, "schedule send heartbeat work failed, "
+		       "send_heartbeat worker already queued?\n");
+}
+
+static inline void ssm_ev_handle(struct ibtrs_session *sess, enum ssm_ev ev)
+{
+	return (*ibtrs_srv_ev_handlers[sess->state])(sess, ev);
+}
+
+static void ssm_worker(struct work_struct *work)
+{
+	struct ssm_work *ssm_w = container_of(work, struct ssm_work, work);
+
+	ssm_ev_handle(ssm_w->sess, ssm_w->ev);
+	kvfree(ssm_w);
+}
+
+static int ssm_schedule_event(struct ibtrs_session *sess, enum ssm_ev ev)
+{
+	struct ssm_work *w;
+	int ret = 0;
+
+	if (!ibtrs_srv_sess_get(sess))
+		return -EPERM;
+
+	while (true) {
+		if (sess->state == SSM_STATE_CLOSED) {
+			ret = -EPERM;
+			goto out;
+		}
+		w = ibtrs_malloc(sizeof(*w));
+		if (w)
+			break;
+		cond_resched();
+	}
+
+	w->sess = sess;
+	w->ev = ev;
+	INIT_WORK(&w->work, ssm_worker);
+	queue_work(sess->sm_wq, &w->work);
+
+out:
+	ibtrs_srv_sess_put(sess);
+	return ret;
+}
+
+static int ssm_init(struct ibtrs_session *sess)
+{
+	sess->sm_wq = create_singlethread_workqueue("ibtrs_ssm_wq");
+	if (!sess->sm_wq)
+		return -ENOMEM;
+
+	INIT_DELAYED_WORK(&sess->check_heartbeat_dwork, check_heartbeat_work);
+	INIT_DELAYED_WORK(&sess->send_heartbeat_dwork, send_heartbeat_work);
+
+	ssm_set_state(sess, SSM_STATE_IDLE);
+
+	return 0;
+}
+
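+/*
+ * Expose the receive buffer mempool counters (and the active session count)
+ * read-only under <debugfs>/ibtrs_server/mempool, typically
+ * /sys/kernel/debug/ibtrs_server/mempool. Failure is not fatal: the caller
+ * only warns and continues without debugfs files.
+ */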
+static int ibtrs_srv_create_debugfs_files(void)
+{
+	int ret = 0;
+	struct dentry *file;
+
+	ibtrs_srv_debugfs_dir = debugfs_create_dir("ibtrs_server", NULL);
+	if (IS_ERR_OR_NULL(ibtrs_srv_debugfs_dir)) {
+		ret = PTR_ERR(ibtrs_srv_debugfs_dir) ?: -ENOMEM;
+		ibtrs_srv_debugfs_dir = NULL;
+		if (ret == -ENODEV)
+			WRN_NP("Debugfs not enabled in kernel\n");
+		else
+			WRN_NP("Failed to create top-level debugfs directory,"
+			       " errno: %d\n", ret);
+		goto out;
+	}
+
+	mempool_debugfs_dir = debugfs_create_dir("mempool",
+						 ibtrs_srv_debugfs_dir);
+	if (IS_ERR_OR_NULL(mempool_debugfs_dir)) {
+		ret = PTR_ERR(mempool_debugfs_dir) ?: -ENOMEM;
+		WRN_NP("Failed to create mempool debugfs directory,"
+		       " errno: %d\n", ret);
+		goto out_remove;
+	}
+
+	file = debugfs_create_u32("nr_free_buf_pool", 0444,
+				  mempool_debugfs_dir, &nr_free_buf_pool);
+	if (IS_ERR_OR_NULL(file)) {
+		WRN_NP("Failed to create mempool \"nr_free_buf_pool\""
+		       " debugfs file\n");
+		ret = -EINVAL;
+		goto out_remove;
+	}
+
+	file = debugfs_create_u32("nr_total_buf_pool", 0444,
+				  mempool_debugfs_dir, &nr_total_buf_pool);
+	if (IS_ERR_OR_NULL(file)) {
+		WRN_NP("Failed to create mempool \"nr_total_buf_pool\""
+		       " debugfs file\n");
+		ret = -EINVAL;
+		goto out_remove;
+	}
+
+	file = debugfs_create_u32("nr_active_sessions", 0444,
+				  mempool_debugfs_dir, &nr_active_sessions);
+	if (IS_ERR_OR_NULL(file)) {
+		WRN_NP("Failed to create mempool \"nr_active_sessions\""
+		       " debugfs file\n");
+		ret = -EINVAL;
+		goto out_remove;
+	}
+
+	goto out;
+
+out_remove:
+	debugfs_remove_recursive(ibtrs_srv_debugfs_dir);
+	ibtrs_srv_debugfs_dir = NULL;
+	mempool_debugfs_dir = NULL;
+out:
+	return ret;
+}
+
+static void ibtrs_srv_destroy_debugfs_files(void)
+{
+	debugfs_remove_recursive(ibtrs_srv_debugfs_dir);
+}
+
+static int __init ibtrs_server_init(void)
+{
+	int err;
+
+	if (!strlen(cq_affinity_list))
+		init_cq_affinity();
+
+	scnprintf(hostname, sizeof(hostname), "%s", utsname()->nodename);
+	INFO_NP("Loading module ibtrs_server, version: %s ("
+		" retry_count: %d, "
+		" default_heartbeat_timeout_ms: %d,"
+		" cq_affinity_list: %s, max_io_size: %d,"
+		" sess_queue_depth: %d, init_pool_size: %d,"
+		" pool_size_hi_wm: %d, hostname: %s)\n",
+		__stringify(IBTRS_VER),
+		retry_count, default_heartbeat_timeout_ms,
+		cq_affinity_list, max_io_size, sess_queue_depth,
+		init_pool_size, pool_size_hi_wm, hostname);
+
+	err = check_module_params();
+	if (err) {
+		ERR_NP("Failed to load module, invalid module parameters,"
+		       " errno: %d\n", err);
+		return err;
+	}
+
+	destroy_wq = alloc_workqueue("ibtrs_server_destroy_wq", 0, 0);
+	if (!destroy_wq) {
+		ERR_NP("Failed to load module,"
+		       " alloc ibtrs_server_destroy_wq failed\n");
+		return -ENOMEM;
+	}
+
+	err = ibtrs_srv_create_sysfs_files();
+	if (err) {
+		ERR_NP("Failed to load module, can't create sysfs files,"
+		       " errno: %d\n", err);
+		goto out_destroy_wq;
+	}
+
+	err = ibtrs_srv_create_debugfs_files();
+	if (err)
+		WRN_NP("Unable to create debugfs files, errno: %d."
+		       " Continuing without debugfs\n", err);
+
+	return 0;
+
+out_destroy_wq:
+	destroy_workqueue(destroy_wq);
+	return err;
+}
+
+static void __exit ibtrs_server_exit(void)
+{
+	INFO_NP("Unloading module\n");
+	ibtrs_srv_destroy_debugfs_files();
+	ibtrs_srv_destroy_sysfs_files();
+	destroy_workqueue(destroy_wq);
+
+	INFO_NP("Module unloaded\n");
+}
+
+module_init(ibtrs_server_init);
+module_exit(ibtrs_server_exit);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 87+ messages in thread

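For orientation, a minimal illustrative sketch (not part of the patch set) of
how a user module such as ibnbd_server is expected to attach to the interface
exported above. The example_* names are made up for illustration; the callback
members of struct ibtrs_srv_ops are declared in <rdma/ibtrs_srv.h> elsewhere
in the series and are omitted here:

#include <linux/module.h>
#include <rdma/ibtrs_srv.h>

static const struct ibtrs_srv_ops example_ops = {
	/* .owner and the IO/session-event callbacks go here */
};

static int __init example_init(void)
{
	/* Only one user module may be registered at a time. */
	return ibtrs_srv_register(&example_ops);
}

static void __exit example_exit(void)
{
	ibtrs_srv_unregister(&example_ops);
}

module_init(example_init);
module_exit(example_exit);
MODULE_LICENSE("GPL");
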
* [PATCH 11/28] ibtrs_srv: add header shared in ibtrs_server
@ 2017-03-24 10:45   ` Jack Wang
  0 siblings, 0 replies; 87+ messages in thread
From: Jack Wang @ 2017-03-24 10:45 UTC (permalink / raw)
  To: linux-block, linux-rdma
  Cc: dledford, axboe, hch, mail, Milind.dumbare, yun.wang, Jack Wang,
	Kleber Souza, Danil Kipnis, Roman Pen

From: Jack Wang <jinpu.wang@profitbricks.com>

Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
Signed-off-by: Kleber Souza <kleber.souza@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kipnis@profitbricks.com>
Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
---
 .../ulp/ibtrs_server/ibtrs_srv_internal.h          | 201 +++++++++++++++++++++
 1 file changed, 201 insertions(+)
 create mode 100644 drivers/infiniband/ulp/ibtrs_server/ibtrs_srv_internal.h

diff --git a/drivers/infiniband/ulp/ibtrs_server/ibtrs_srv_internal.h b/drivers/infiniband/ulp/ibtrs_server/ibtrs_srv_internal.h
new file mode 100644
index 0000000..79130a1
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs_server/ibtrs_srv_internal.h
@@ -0,0 +1,201 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler < mail@fholler.de>
+ *          Jack Wang <jinpu.wang@profitbricks.com>
+ *   	    Kleber Souza <kleber.souza@profitbricks.com>
+ * 	    Danil Kipnis <danil.kipnis@profitbricks.com>
+ *   	    Roman Pen <roman.penyaev@profitbricks.com>
+ *          Milind Dumbare <Milind.dumbare@gmail.com>
+ *
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions, and the following disclaimer,
+ *    without modification.
+ * 2. Redistributions in binary form must reproduce at minimum a disclaimer
+ *    substantially similar to the "NO WARRANTY" disclaimer below
+ *    ("Disclaimer") and any redistribution must be conditioned upon
+ *    including a substantially similar Disclaimer requirement for further
+ *    binary redistribution.
+ * 3. Neither the names of the above-listed copyright holders nor the names
+ *    of any contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * NO WARRANTY
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * HOLDERS OR CONTRIBUTORS BE LIABLE FOR SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
+ * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGES.
+ *
+ */
+
+#ifndef _IBTRS_SRV_INTERNAL_H
+#define _IBTRS_SRV_INTERNAL_H
+
+#include <rdma/ibtrs.h>
+
+enum ssm_state {
+	SSM_STATE_IDLE,
+	SSM_STATE_CONNECTED,
+	SSM_STATE_CLOSING,
+	SSM_STATE_CLOSED
+};
+
+/*
+ * Describes the RDMA buffer managed by the client and used for its RDMA
+ * writes. The RDMA info has to be sent to the client in the OPEN_RESP message.
+ */
+struct ibtrs_rcv_buf {
+	dma_addr_t	rdma_addr;
+	void		*buf;
+};
+
+/* to indicate that memory chunk was not allocated from a N-order contiguous
+ * pages area
+ */
+#define IBTRS_MEM_CHUNK_NOORDER -1
+
+struct ibtrs_mem_chunk {
+	struct list_head	list;
+	int			order;
+	void			*addr;
+};
+
+struct ibtrs_rcv_buf_pool {
+	struct list_head	list;
+	struct list_head	chunk_list;
+	struct ibtrs_rcv_buf	*rcv_bufs;
+};
+
+struct ibtrs_stats_wc_comp {
+	atomic_t	max_wc_cnt;
+	atomic64_t	calls;
+	atomic64_t	total_wc_cnt;
+};
+
+struct ibtrs_srv_stats_rdma_stats {
+	atomic64_t	cnt_read;
+	atomic64_t	size_total_read;
+	atomic64_t	cnt_write;
+	atomic64_t	size_total_write;
+
+	atomic_t	inflight;
+	atomic64_t	inflight_total;
+};
+
+struct ibtrs_srv_stats_user_ib_msgs {
+	atomic64_t recv_msg_cnt;
+	atomic64_t sent_msg_cnt;
+	atomic64_t recv_size;
+	atomic64_t sent_size;
+};
+
+struct ibtrs_srv_stats {
+	struct ibtrs_srv_stats_rdma_stats	rdma_stats;
+	struct ibtrs_srv_stats_user_ib_msgs	user_ib_msgs;
+	atomic_t				apm_cnt;
+	struct ibtrs_stats_wc_comp		wc_comp;
+};
+
+struct ibtrs_session {
+	struct list_head	list;
+	enum ssm_state		state;
+	struct kref		kref;
+	struct workqueue_struct *sm_wq;	/* event processing */
+	struct workqueue_struct *msg_wq;
+	struct ibtrs_device	*dev; /* ib dev with mempool */
+	struct rdma_cm_id	*cm_id;	/* cm_id used to create the session */
+	struct mutex            lock; /* to protect con_list */
+	int			cur_cq_vector;
+	struct list_head        con_list;
+	struct ibtrs_iu		*rdma_info_iu;
+	struct ibtrs_iu		*dummy_rx_iu;
+	struct ibtrs_iu		**usr_rx_ring;
+	struct ibtrs_ops_id	**ops_ids;
+	/* lock for tx_bufs */
+	spinlock_t              tx_bufs_lock ____cacheline_aligned;
+	struct list_head	tx_bufs;
+	u16			tx_bufs_used;
+	unsigned int		est_cnt; /* number of established connections */
+	unsigned int		active_cnt; /* number of active (not closed)
+					     * connections
+					     */
+	u8			con_cnt;
+	u8			ver;
+	bool			state_in_sysfs;
+	bool			session_announced_to_user;
+	struct ibtrs_rcv_buf_pool *rcv_buf_pool;
+	wait_queue_head_t	bufs_wait;
+	u8			off_len; /* number of bits for offset in
+					  * one client buffer.
+					  * 32 - ilog2(sess->queue_depth)
+					  */
+	u32			off_mask; /* mask to get offset in client buf
+					   * out of the imm field
+					   */
+	u16			queue_depth;
+	u16			wq_size;
+	u8			uuid[IBTRS_UUID_SIZE];
+	struct ibtrs_heartbeat	heartbeat;
+	struct delayed_work	check_heartbeat_dwork;
+	struct delayed_work	send_heartbeat_dwork;
+	void			*priv;
+	struct kobject		kobj;
+	struct kobject		kobj_stats;
+	char			addr[IBTRS_ADDRLEN]; /* client address */
+	char			hostname[MAXHOSTNAMELEN];
+	u8			primary_port_num;
+	struct ibtrs_srv_stats	stats;
+	wait_queue_head_t	mu_iu_wait_q;
+	wait_queue_head_t	mu_buf_wait_q;
+	atomic_t		peer_usr_msg_bufs;
+};
+
+void ibtrs_srv_queue_close(struct ibtrs_session *sess);
+
+u8 ibtrs_srv_get_sess_primary_port_num(struct ibtrs_session *sess);
+
+int ibtrs_srv_current_hca_port_to_str(struct ibtrs_session *sess,
+				      char *buf, size_t len);
+int ibtrs_srv_failover_hca_port_to_str(struct ibtrs_session *sess,
+				       char *buf, size_t len);
+const char *ibtrs_srv_get_sess_hca_name(struct ibtrs_session *sess);
+int ibtrs_srv_migrate(struct ibtrs_session *sess, u8 port_num);
+int ibtrs_srv_reset_rdma_stats(struct ibtrs_session *sess, bool enable);
+ssize_t ibtrs_srv_stats_rdma_to_str(struct ibtrs_session *sess,
+				    char *page, size_t len);
+int ibtrs_srv_reset_user_ib_msgs_stats(struct ibtrs_session *sess, bool enable);
+int ibtrs_srv_stats_user_ib_msgs_to_str(struct ibtrs_session *sess, char *buf,
+					size_t len);
+int ibtrs_srv_reset_apm_stats(struct ibtrs_session *sess, bool enable);
+int ibtrs_srv_stats_apm_to_str(struct ibtrs_session *sess, char *buf,
+			       size_t len);
+int ibtrs_srv_reset_wc_completion_stats(struct ibtrs_session *sess,
+					bool enable);
+int ibtrs_srv_stats_wc_completion_to_str(struct ibtrs_session *sess, char *buf,
+					 size_t len);
+int ibtrs_srv_reset_all_stats(struct ibtrs_session *sess, bool enable);
+ssize_t ibtrs_srv_reset_all_help(struct ibtrs_session *sess,
+				 char *page, size_t len);
+int heartbeat_timeout_validate(int timeout);
+
+int ibtrs_srv_sess_get(struct ibtrs_session *sess);
+
+void ibtrs_srv_sess_put(struct ibtrs_session *sess);
+
+#endif
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH 12/28] ibtrs_srv: add sysfs interface
  2017-03-24 10:45 ` Jack Wang
                   ` (11 preceding siblings ...)
  (?)
@ 2017-03-24 10:45 ` Jack Wang
  -1 siblings, 0 replies; 87+ messages in thread
From: Jack Wang @ 2017-03-24 10:45 UTC (permalink / raw)
  To: linux-block, linux-rdma
  Cc: dledford, axboe, hch, mail, Milind.dumbare, yun.wang, Jack Wang,
	Kleber Souza, Danil Kipnis, Roman Pen

From: Jack Wang <jinpu.wang@profitbricks.com>

Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
Signed-off-by: Kleber Souza <kleber.souza@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kipnis@profitbricks.com>
Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
---
 .../infiniband/ulp/ibtrs_server/ibtrs_srv_sysfs.c  | 301 +++++++++++++++++++++
 .../infiniband/ulp/ibtrs_server/ibtrs_srv_sysfs.h  |  59 ++++
 2 files changed, 360 insertions(+)
 create mode 100644 drivers/infiniband/ulp/ibtrs_server/ibtrs_srv_sysfs.c
 create mode 100644 drivers/infiniband/ulp/ibtrs_server/ibtrs_srv_sysfs.h

diff --git a/drivers/infiniband/ulp/ibtrs_server/ibtrs_srv_sysfs.c b/drivers/infiniband/ulp/ibtrs_server/ibtrs_srv_sysfs.c
new file mode 100644
index 0000000..c95a124
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs_server/ibtrs_srv_sysfs.c
@@ -0,0 +1,301 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler < mail@fholler.de>
+ *          Jack Wang <jinpu.wang@profitbricks.com>
+ *   	    Kleber Souza <kleber.souza@profitbricks.com>
+ * 	    Danil Kipnis <danil.kipnis@profitbricks.com>
+ *   	    Roman Pen <roman.penyaev@profitbricks.com>
+ *          Milind Dumbare <Milind.dumbare@gmail.com>
+ *
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions, and the following disclaimer,
+ *    without modification.
+ * 2. Redistributions in binary form must reproduce at minimum a disclaimer
+ *    substantially similar to the "NO WARRANTY" disclaimer below
+ *    ("Disclaimer") and any redistribution must be conditioned upon
+ *    including a substantially similar Disclaimer requirement for further
+ *    binary redistribution.
+ * 3. Neither the names of the above-listed copyright holders nor the names
+ *    of any contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * NO WARRANTY
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * HOLDERS OR CONTRIBUTORS BE LIABLE FOR SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
+ * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGES.
+ *
+ */
+
+#include "ibtrs_srv_sysfs.h"
+#include "ibtrs_srv_internal.h"
+#include <rdma/ibtrs_srv.h>
+#include <rdma/ibtrs.h>
+#include <rdma/ibtrs_log.h>
+
+static struct kobject *ibtrs_srv_kobj;
+static struct kobject *ibtrs_srv_sessions_kobj;
+
+static ssize_t ibtrs_srv_hb_timeout_show(struct kobject *kobj,
+					 struct kobj_attribute *attr,
+					 char *page)
+{
+	struct ibtrs_session *sess = container_of(kobj, struct ibtrs_session,
+						  kobj);
+
+	return scnprintf(page, PAGE_SIZE, "%u\n", sess->heartbeat.timeout_ms);
+}
+
+static ssize_t ibtrs_srv_hb_timeout_store(struct kobject *kobj,
+					  struct kobj_attribute *attr,
+					  const char *buf, size_t count)
+{
+	int ret;
+	u32 timeout_ms;
+	struct ibtrs_session *sess = container_of(kobj, struct ibtrs_session,
+						  kobj);
+
+	ret = kstrtouint(buf, 0, &timeout_ms);
+	if (ret)
+		return ret;
+
+	ret = ibtrs_heartbeat_timeout_validate(timeout_ms);
+	if (ret)
+		return ret;
+
+	INFO(sess, "%s: changing value from %u to %u\n", attr->attr.name,
+	     sess->heartbeat.timeout_ms, timeout_ms);
+	ibtrs_set_heartbeat_timeout(&sess->heartbeat, timeout_ms);
+	return count;
+}
+
+static struct kobj_attribute ibtrs_srv_heartbeat_timeout_ms_attr =
+	__ATTR(heartbeat_timeout_ms, 0644,
+	       ibtrs_srv_hb_timeout_show, ibtrs_srv_hb_timeout_store);
+
+static ssize_t ibtrs_srv_disconnect_show(struct kobject *kobj,
+					 struct kobj_attribute *attr,
+					 char *page)
+{
+	return scnprintf(page, PAGE_SIZE, "Usage: echo 1 > %s\n",
+			 attr->attr.name);
+}
+
+static ssize_t ibtrs_srv_disconnect_store(struct kobject *kobj,
+					  struct kobj_attribute *attr,
+					  const char *buf, size_t count)
+{
+	struct ibtrs_session *sess = container_of(kobj, struct ibtrs_session,
+						  kobj);
+
+	if (!sysfs_streq(buf, "1")) {
+		ERR(sess, "%s: invalid value: '%s'\n", attr->attr.name, buf);
+		return -EINVAL;
+	}
+
+	INFO(sess, "%s: Session disconnect requested\n", attr->attr.name);
+	ibtrs_srv_queue_close(sess);
+
+	return count;
+}
+
+static struct kobj_attribute disconnect_attr =
+	__ATTR(disconnect, 0644,
+	       ibtrs_srv_disconnect_show, ibtrs_srv_disconnect_store);
+
+static ssize_t ibtrs_srv_current_hca_port_show(struct kobject *kobj,
+					       struct kobj_attribute *attr,
+					       char *page)
+{
+	struct ibtrs_session *sess = container_of(kobj, struct ibtrs_session,
+						  kobj);
+	return ibtrs_srv_current_hca_port_to_str(sess, page, PAGE_SIZE);
+}
+
+static struct kobj_attribute current_hca_port_attr =
+	__ATTR(current_hca_port, 0444, ibtrs_srv_current_hca_port_show,
+	       NULL);
+
+static ssize_t ibtrs_srv_hca_name_show(struct kobject *kobj,
+				       struct kobj_attribute *attr,
+				       char *page)
+{
+	struct ibtrs_session *sess = container_of(kobj, struct ibtrs_session,
+						  kobj);
+
+	return scnprintf(page, PAGE_SIZE, "%s\n",
+			 ibtrs_srv_get_sess_hca_name(sess));
+}
+
+static struct kobj_attribute hca_name_attr =
+	__ATTR(hca_name, 0444, ibtrs_srv_hca_name_show, NULL);
+
+static ssize_t hostname_show(struct kobject *kobj,
+			     struct kobj_attribute *attr, char *page)
+{
+	struct ibtrs_session *sess = container_of(kobj, struct ibtrs_session,
+						  kobj);
+	return scnprintf(page, PAGE_SIZE, "%s\n", sess->hostname);
+}
+
+static struct kobj_attribute hostname_attr =
+		__ATTR(hostname, 0444, hostname_show, NULL);
+
+static struct attribute *default_sess_attrs[] = {
+	&hca_name_attr.attr,
+	&hostname_attr.attr,
+	&current_hca_port_attr.attr,
+	&disconnect_attr.attr,
+	&ibtrs_srv_heartbeat_timeout_ms_attr.attr,
+	NULL,
+};
+
+static struct attribute_group default_sess_attr_group = {
+	.attrs = default_sess_attrs,
+};
+
+static void ibtrs_srv_sess_release(struct kobject *kobj)
+{
+	struct ibtrs_session *sess = container_of(kobj, struct ibtrs_session,
+						  kobj);
+
+	ibtrs_srv_sess_put(sess);
+}
+
+static struct kobj_type ibtrs_srv_sess_ktype = {
+	.sysfs_ops	= &kobj_sysfs_ops,
+	.release	= ibtrs_srv_sess_release,
+};
+
+STAT_ATTR(rdma, ibtrs_srv_stats_rdma_to_str, ibtrs_srv_reset_rdma_stats);
+
+STAT_ATTR(user_ib_messages, ibtrs_srv_stats_user_ib_msgs_to_str,
+	  ibtrs_srv_reset_user_ib_msgs_stats);
+
+STAT_ATTR(wc_completion, ibtrs_srv_stats_wc_completion_to_str,
+	  ibtrs_srv_reset_wc_completion_stats);
+
+STAT_ATTR(reset_all, ibtrs_srv_reset_all_help, ibtrs_srv_reset_all_stats);
+
+static struct attribute *ibtrs_srv_default_stats_attrs[] = {
+	&rdma_attr.attr,
+	&user_ib_messages_attr.attr,
+	&wc_completion_attr.attr,
+	&reset_all_attr.attr,
+	NULL,
+};
+
+static struct attribute_group ibtrs_srv_default_stats_attr_group = {
+	.attrs = ibtrs_srv_default_stats_attrs,
+};
+
+static struct kobj_type ibtrs_stats_ktype = {
+	.sysfs_ops = &kobj_sysfs_ops,
+};
+
+static int ibtrs_srv_create_stats_files(struct ibtrs_session *sess)
+{
+	int ret;
+
+	ret = kobject_init_and_add(&sess->kobj_stats, &ibtrs_stats_ktype,
+				   &sess->kobj, "stats");
+	if (ret) {
+		ERR(sess,
+		    "Failed to init and add sysfs directory for session stats,"
+		    " errno: %d\n", ret);
+		return ret;
+	}
+
+	ret = sysfs_create_group(&sess->kobj_stats,
+				 &ibtrs_srv_default_stats_attr_group);
+	if (ret) {
+		ERR(sess, "Failed to create sysfs group for session stats,"
+		    " errno: %d\n", ret);
+		goto err;
+	}
+
+	return 0;
+
+err:
+	kobject_put(&sess->kobj_stats);
+
+	return ret;
+}
+
+int ibtrs_srv_create_sess_files(struct ibtrs_session *sess)
+{
+	int ret;
+
+	DEB("creating sysfs files for sess %s\n", sess->addr);
+
+	if (WARN_ON(!ibtrs_srv_sess_get(sess)))
+		return -EINVAL;
+
+	ret = kobject_init_and_add(&sess->kobj, &ibtrs_srv_sess_ktype,
+				   ibtrs_srv_sessions_kobj, "%s", sess->addr);
+	if (ret) {
+		ERR(sess, "Failed to init and add sysfs directory for session,"
+		    " errno: %d\n", ret);
+		ibtrs_srv_sess_put(sess);
+		return ret;
+	}
+
+	ret = sysfs_create_group(&sess->kobj, &default_sess_attr_group);
+	if (ret) {
+		ERR(sess, "Failed to create sysfs group for session,"
+		    " errno: %d\n", ret);
+		goto err;
+	}
+
+	ret = ibtrs_srv_create_stats_files(sess);
+	if (ret)
+		goto err1;
+
+	return 0;
+
+err1:
+	sysfs_remove_group(&sess->kobj, &default_sess_attr_group);
+err:
+	kobject_put(&sess->kobj);
+
+	return ret;
+}
+
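+/*
+ * Create the static sysfs entries: /sys/kernel/ibtrs and its "sessions"
+ * subdirectory. Per-session directories (named after the client address)
+ * are added later by ibtrs_srv_create_sess_files() and contain the
+ * attributes defined above plus a "stats" subdirectory.
+ */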
+int ibtrs_srv_create_sysfs_files(void)
+{
+	ibtrs_srv_kobj = kobject_create_and_add("ibtrs", kernel_kobj);
+	if (!ibtrs_srv_kobj)
+		return -ENOMEM;
+
+	ibtrs_srv_sessions_kobj = kobject_create_and_add("sessions",
+							 ibtrs_srv_kobj);
+	if (!ibtrs_srv_sessions_kobj) {
+		kobject_put(ibtrs_srv_kobj);
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
+void ibtrs_srv_destroy_sysfs_files(void)
+{
+	kobject_put(ibtrs_srv_sessions_kobj);
+	kobject_put(ibtrs_srv_kobj);
+}
diff --git a/drivers/infiniband/ulp/ibtrs_server/ibtrs_srv_sysfs.h b/drivers/infiniband/ulp/ibtrs_server/ibtrs_srv_sysfs.h
new file mode 100644
index 0000000..e5dfa62
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs_server/ibtrs_srv_sysfs.h
@@ -0,0 +1,59 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler < mail@fholler.de>
+ *          Jack Wang <jinpu.wang@profitbricks.com>
+ *   	    Kleber Souza <kleber.souza@profitbricks.com>
+ * 	    Danil Kipnis <danil.kipnis@profitbricks.com>
+ *   	    Roman Pen <roman.penyaev@profitbricks.com>
+ *          Milind Dumbare <Milind.dumbare@gmail.com>
+ *
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions, and the following disclaimer,
+ *    without modification.
+ * 2. Redistributions in binary form must reproduce at minimum a disclaimer
+ *    substantially similar to the "NO WARRANTY" disclaimer below
+ *    ("Disclaimer") and any redistribution must be conditioned upon
+ *    including a substantially similar Disclaimer requirement for further
+ *    binary redistribution.
+ * 3. Neither the names of the above-listed copyright holders nor the names
+ *    of any contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * NO WARRANTY
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * HOLDERS OR CONTRIBUTORS BE LIABLE FOR SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
+ * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGES.
+ *
+ */
+
+#ifndef _IBTRS_SRV_SYFS_H
+#define _IBTRS_SRV_SYFS_H
+
+#include <linux/kobject.h>
+#include "ibtrs_srv_internal.h"
+
+int ibtrs_srv_create_sysfs_files(void);
+
+void ibtrs_srv_destroy_sysfs_files(void);
+
+int ibtrs_srv_create_sess_files(struct ibtrs_session *sess);
+
+#endif
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH 13/28] ibtrs_srv: add Makefile and Kconfig
@ 2017-03-24 10:45   ` Jack Wang
  0 siblings, 0 replies; 87+ messages in thread
From: Jack Wang @ 2017-03-24 10:45 UTC (permalink / raw)
  To: linux-block, linux-rdma
  Cc: dledford, axboe, hch, mail, Milind.dumbare, yun.wang, Jack Wang

From: Jack Wang <jinpu.wang@profitbricks.com>

Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
---
 drivers/infiniband/Kconfig                   | 1 +
 drivers/infiniband/ulp/Makefile              | 1 +
 drivers/infiniband/ulp/ibtrs_server/Kconfig  | 8 ++++++++
 drivers/infiniband/ulp/ibtrs_server/Makefile | 6 ++++++
 4 files changed, 16 insertions(+)
 create mode 100644 drivers/infiniband/ulp/ibtrs_server/Kconfig
 create mode 100644 drivers/infiniband/ulp/ibtrs_server/Makefile

diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig
index cb1b864..07aa050 100644
--- a/drivers/infiniband/Kconfig
+++ b/drivers/infiniband/Kconfig
@@ -86,6 +86,7 @@ source "drivers/infiniband/ulp/iser/Kconfig"
 source "drivers/infiniband/ulp/isert/Kconfig"
 
 source "drivers/infiniband/ulp/ibtrs_client/Kconfig"
+source "drivers/infiniband/ulp/ibtrs_server/Kconfig"
 
 source "drivers/infiniband/sw/rdmavt/Kconfig"
 source "drivers/infiniband/sw/rxe/Kconfig"
diff --git a/drivers/infiniband/ulp/Makefile b/drivers/infiniband/ulp/Makefile
index acd8ce6..eb4da3f 100644
--- a/drivers/infiniband/ulp/Makefile
+++ b/drivers/infiniband/ulp/Makefile
@@ -4,3 +4,4 @@ obj-$(CONFIG_INFINIBAND_SRPT)		+= srpt/
 obj-$(CONFIG_INFINIBAND_ISER)		+= iser/
 obj-$(CONFIG_INFINIBAND_ISERT)		+= isert/
 obj-$(CONFIG_INFINIBAND_IBTRS_CLT)      += ibtrs_client/
+obj-$(CONFIG_INFINIBAND_IBTRS_SRV)      += ibtrs_server/
diff --git a/drivers/infiniband/ulp/ibtrs_server/Kconfig b/drivers/infiniband/ulp/ibtrs_server/Kconfig
new file mode 100644
index 0000000..6fbdc54
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs_server/Kconfig
@@ -0,0 +1,8 @@
+config INFINIBAND_IBTRS_SRV
+	tristate "InfiniBand IBTRS SERVER"
+	depends on INFINIBAND_ADDR_TRANS
+	---help---
+	  Support for simplified data transfer over InfiniBand.
+	  This offers the API used by the IBNBD_SERVER user module.
+
+	  The IBTRS protocol is defined by ProfitBricks GmbH.
diff --git a/drivers/infiniband/ulp/ibtrs_server/Makefile b/drivers/infiniband/ulp/ibtrs_server/Makefile
new file mode 100644
index 0000000..39d9e1d
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs_server/Makefile
@@ -0,0 +1,6 @@
+
+obj-$(CONFIG_INFINIBAND_IBTRS_SRV)	+= ibtrs_server.o
+
+ibtrs_server-y		:= ibtrs_srv.o ibtrs_srv_sysfs.o \
+			   ../ibtrs_lib/ibtrs.o ../ibtrs_lib/ibtrs-proto.o ../ibtrs_lib/iu.o \
+			   ../ibtrs_lib/heartbeat.o ../ibtrs_lib/common.o
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH 14/28] ibnbd: add headers shared by ibnbd_client and ibnbd_server
@ 2017-03-24 10:45 ` Jack Wang
  -1 siblings, 0 replies; 87+ messages in thread
From: Jack Wang @ 2017-03-24 10:45 UTC (permalink / raw)
  To: linux-block, linux-rdma
  Cc: dledford, axboe, hch, mail, Milind.dumbare, yun.wang, Jack Wang,
	Kleber Souza, Danil Kipnis, Roman Pen

From: Jack Wang <jinpu.wang@profitbricks.com>

Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
Signed-off-by: Kleber Souza <kleber.souza@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kipnis@profitbricks.com>
Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
---
 drivers/block/ibnbd_inc/ibnbd-proto.h | 273 ++++++++++++++++++++++++++++++++++
 drivers/block/ibnbd_inc/ibnbd.h       |  55 +++++++
 drivers/block/ibnbd_inc/log.h         |  68 +++++++++
 3 files changed, 396 insertions(+)
 create mode 100644 drivers/block/ibnbd_inc/ibnbd-proto.h
 create mode 100644 drivers/block/ibnbd_inc/ibnbd.h
 create mode 100644 drivers/block/ibnbd_inc/log.h

diff --git a/drivers/block/ibnbd_inc/ibnbd-proto.h b/drivers/block/ibnbd_inc/ibnbd-proto.h
new file mode 100644
index 0000000..4838177
--- /dev/null
+++ b/drivers/block/ibnbd_inc/ibnbd-proto.h
@@ -0,0 +1,273 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler < mail@fholler.de>
+ *          Jack Wang <jinpu.wang@profitbricks.com>
+ *   	    Kleber Souza <kleber.souza@profitbricks.com>
+ * 	    Danil Kipnis <danil.kipnis@profitbricks.com>
+ *   	    Roman Pen <roman.penyaev@profitbricks.com>
+ *          Milind Dumbare <Milind.dumbare@gmail.com>
+ *
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions, and the following disclaimer,
+ *    without modification.
+ * 2. Redistributions in binary form must reproduce at minimum a disclaimer
+ *    substantially similar to the "NO WARRANTY" disclaimer below
+ *    ("Disclaimer") and any redistribution must be conditioned upon
+ *    including a substantially similar Disclaimer requirement for further
+ *    binary redistribution.
+ * 3. Neither the names of the above-listed copyright holders nor the names
+ *    of any contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * NO WARRANTY
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * HOLDERS OR CONTRIBUTORS BE LIABLE FOR SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
+ * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGES.
+ *
+ */
+
+#ifndef __IBNBD_PROTO_H
+#define __IBNBD_PROTO_H
+#include <linux/limits.h>
+#include "ibnbd.h"
+
+#define IBNBD_VERSION 1
+
+#define GCC_DIAGNOSTIC_AWARE ((__GNUC__ > 6))
+#if GCC_DIAGNOSTIC_AWARE
+#pragma GCC diagnostic push
+#pragma GCC diagnostic warning "-Wpadded"
+#endif
+
+/**
+ * enum ibnbd_msg_type - IBNBD message types
+ * @IBNBD_MSG_SESS_INFO:	initial session info from client to server
+ * @IBNBD_MSG_SESS_INFO_RSP:	initial session info from server to client
+ * @IBNBD_MSG_OPEN:		open connection to ibnbd server instance
+ * @IBNBD_MSG_OPEN_RSP:		response to an @IBNBD_MSG_OPEN
+ * @IBNBD_MSG_IO:		request a block device read or write operation
+ * @IBNBD_MSG_CLOSE:		request to close a remote device
+ * @IBNBD_MSG_CLOSE_RSP:	response to an @IBNBD_MSG_CLOSE
+ * @IBNBD_MSG_REVAL:		notify client about changed device size
+ *
+ * Note: DO NOT REORDER THE MEMBERS OF THIS ENUM!
+ * If necessary, add new members after the last one.
+ */
+enum ibnbd_msg_type {
+	__IBNBD_MSG_MIN,
+	IBNBD_MSG_SESS_INFO,
+	IBNBD_MSG_SESS_INFO_RSP,
+	IBNBD_MSG_OPEN,
+	IBNBD_MSG_OPEN_RSP,
+	IBNBD_MSG_IO,
+	IBNBD_MSG_CLOSE,
+	IBNBD_MSG_CLOSE_RSP,
+	IBNBD_MSG_REVAL,
+	__IBNBD_MSG_MAX
+};
+
+/**
+ * struct ibnbd_msg_hdr - header of IBNBD messages
+ * @type:	message type, see enum ibnbd_msg_type for valid values
+ */
+struct ibnbd_msg_hdr {
+	u16		type;
+	u16		__padding;
+};
+
+enum ibnbd_access_mode {
+	IBNBD_ACCESS_RO,
+	IBNBD_ACCESS_RW,
+	IBNBD_ACCESS_MIGRATION,
+};
+
+#define _IBNBD_FILEIO  0
+#define _IBNBD_BLOCKIO 1
+#define _IBNBD_AUTOIO  2
+
+enum ibnbd_io_mode {
+	IBNBD_FILEIO = _IBNBD_FILEIO,
+	IBNBD_BLOCKIO = _IBNBD_BLOCKIO,
+	IBNBD_AUTOIO = _IBNBD_AUTOIO,
+};
+
+/**
+ * struct ibnbd_msg_sess_info - initial session info from client to server
+ * @hdr:		message header
+ * @ver:		IBNBD protocol version
+ *
+ * Note: DO NOT CHANGE THE ORDER OF THE MEMBERS BEFORE 'ver'
+ */
+struct ibnbd_msg_sess_info {
+	struct ibnbd_msg_hdr hdr;
+
+	u8		ver;
+	u8		reserved[31];
+};
+
+/**
+ * struct ibnbd_msg_sess_info_rsp - initial session info from server to client
+ * @hdr:		message header
+ * @ver:		IBNBD protocol version
+ *
+ * Note: DO NOT CHANGE THE ORDER OF THE MEMBERS BEFORE 'ver'
+ */
+struct ibnbd_msg_sess_info_rsp {
+	struct ibnbd_msg_hdr hdr;
+
+	u8		ver;
+	u8		reserved[31];
+};
+
+/**
+ * struct ibnbd_msg_open - request to open a remote device.
+ * @hdr:		message header
+ * @clt_device_id:	device_id on client side to identify the device
+ * @access_mode:	mode to open the remote device with, see
+ *			enum ibnbd_access_mode for valid values
+ * @io_mode:		open the volume on the server as a block device or a file
+ * @dev_name:		device path on the remote side
+ */
+struct ibnbd_msg_open {
+	struct ibnbd_msg_hdr hdr;
+	u32		clt_device_id;
+	u8		access_mode;
+	u8		io_mode;
+	s8		dev_name[NAME_MAX];
+	u8		__padding[3];
+};
+
+/**
+ * struct ibnbd_msg_close - request to close a remote device.
+ * @hdr:	message header
+ * @device_id:	device_id on server side to identify the device
+ */
+struct ibnbd_msg_close {
+	struct ibnbd_msg_hdr hdr;
+	u32		device_id;
+};
+
+/**
+ * struct ibnbd_msg_close_rsp - response to a close device message.
+ * @hdr:	message header
+ * @clt_device_id:	device_id on client side
+ */
+struct ibnbd_msg_close_rsp {
+	struct ibnbd_msg_hdr hdr;
+	u32		clt_device_id;
+};
+
+/**
+ * struct ibnbd_msg_open_rsp - response message to IBNBD_MSG_OPEN
+ * @hdr:		message header
+ * @result:		0 on success or negative error code on failure
+ * @clt_device_id:	device_id on client side
+ * @device_id:		device_id on server side to identify the device
+ * @max_hw_sectors:	max hardware sectors in the usual 512b unit
+ * @max_write_same_sectors: max sectors for WRITE SAME in the 512b unit
+ * @max_discard_sectors: max. sectors that can be discarded at once
+ * @discard_zeroes_data: discarded areas are overwritten with 0?
+ * @discard_granularity: size of the internal discard allocation unit
+ * @discard_alignment: offset from internal allocation assignment
+ * @physical_block_size: physical block size device supports
+ * @logical_block_size: logical block size device supports
+ * @max_segments:	max segments hardware support in one transfer
+ * @nsectors:		number of sectors
+ * @secure_discard:	supports secure discard
+ * @rotational:	is the device a rotational disk?
+ * @io_mode:		io_mode the device was opened with
+ */
+struct ibnbd_msg_open_rsp {
+	struct ibnbd_msg_hdr	hdr;
+	s32			result;
+	u32			clt_device_id;
+	u32			device_id;
+	u32			max_hw_sectors;
+	u32			max_write_same_sectors;
+	u32			max_discard_sectors;
+	u32			discard_zeroes_data;
+	u32			discard_granularity;
+	u32			discard_alignment;
+	u16			physical_block_size;
+	u16			logical_block_size;
+	u16			max_segments;
+	u16			secure_discard;
+	u64			nsectors;
+	u8			rotational;
+	u8			io_mode;
+	u8			__padding[6];
+};
+
+/**
+ * enum ibnbd_io_flags - IBNBD request types from rq_flag_bits
+ * @IBNBD_RW_REQ_WRITE:	bit not set = read, bit set = write
+ * @IBNBD_RW_REQ_SYNC:	request is sync
+ * @IBNBD_RW_REQ_DISCARD: request to discard sectors
+ * @IBNBD_RW_REQ_SECURE: secure discard request
+ * @IBNBD_RW_REQ_WRITE_SAME: write same block many times
+ * @IBNBD_RW_REQ_FUA:	forced unit access (FUA) request
+ * @IBNBD_RW_REQ_FLUSH:	cache flush request
+ */
+enum ibnbd_io_flags {
+	IBNBD_RW_REQ_WRITE		= 1 << 1,
+	IBNBD_RW_REQ_SYNC		= 1 << 2,
+	IBNBD_RW_REQ_DISCARD		= 1 << 3,
+	IBNBD_RW_REQ_SECURE		= 1 << 4,
+	IBNBD_RW_REQ_WRITE_SAME		= 1 << 5,
+	IBNBD_RW_REQ_FUA		= 1 << 6,
+	IBNBD_RW_REQ_FLUSH		= 1 << 7
+};
+
+/**
+ * struct ibnbd_msg_revalidate - notify client about new device size
+ * @hdr:		message header
+ * @clt_device_id:	device_id on client side
+ * @nsectors:		number of sectors
+ */
+struct ibnbd_msg_revalidate {
+	struct ibnbd_msg_hdr	hdr;
+	u32			clt_device_id;
+	u64			nsectors;
+};
+
+/**
+ * struct ibnbd_msg_io - message for I/O read/write
+ * @hdr:	message header
+ * @device_id:	device_id on server side to find the right device
+ * @sector:	bi_sector attribute from struct bio
+ * @rw:		bitmask, valid values are defined in enum ibnbd_io_flags
+ * @bi_size:   number of bytes for I/O read/write
+ */
+struct ibnbd_msg_io {
+	struct ibnbd_msg_hdr hdr;
+	u32		device_id;
+	u64		sector;
+	u32		rw;
+	u32		bi_size;
+};
+
+#if GCC_DIAGNOSTIC_AWARE
+#pragma GCC diagnostic pop
+#endif
+
+int ibnbd_validate_message(const void *data, size_t len);
+const char *ibnbd_io_mode_str(enum ibnbd_io_mode mode);
+const char *ibnbd_access_mode_str(enum ibnbd_access_mode mode);
+
+#endif /* __IBNBD_PROTO_H */
diff --git a/drivers/block/ibnbd_inc/ibnbd.h b/drivers/block/ibnbd_inc/ibnbd.h
new file mode 100644
index 0000000..4b691dc
--- /dev/null
+++ b/drivers/block/ibnbd_inc/ibnbd.h
@@ -0,0 +1,55 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler < mail@fholler.de>
+ *          Jack Wang <jinpu.wang@profitbricks.com>
+ *   	    Kleber Souza <kleber.souza@profitbricks.com>
+ * 	    Danil Kipnis <danil.kipnis@profitbricks.com>
+ *   	    Roman Pen <roman.penyaev@profitbricks.com>
+ *          Milind Dumbare <Milind.dumbare@gmail.com>
+ *
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions, and the following disclaimer,
+ *    without modification.
+ * 2. Redistributions in binary form must reproduce at minimum a disclaimer
+ *    substantially similar to the "NO WARRANTY" disclaimer below
+ *    ("Disclaimer") and any redistribution must be conditioned upon
+ *    including a substantially similar Disclaimer requirement for further
+ *    binary redistribution.
+ * 3. Neither the names of the above-listed copyright holders nor the names
+ *    of any contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * NO WARRANTY
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * HOLDERS OR CONTRIBUTORS BE LIABLE FOR SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
+ * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGES.
+ *
+ */
+
+#ifndef __IBNBD_H
+#define __IBNBD_H
+#include <linux/types.h>
+#include <linux/blkdev.h>
+
+
+u32 rq_cmd_to_ibnbd_io_flags(struct request *rq);
+u32 ibnbd_io_flags_to_bi_rw(u32 flags);
+#endif
diff --git a/drivers/block/ibnbd_inc/log.h b/drivers/block/ibnbd_inc/log.h
new file mode 100644
index 0000000..9048bff
--- /dev/null
+++ b/drivers/block/ibnbd_inc/log.h
@@ -0,0 +1,68 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler < mail@fholler.de>
+ *          Jack Wang <jinpu.wang@profitbricks.com>
+ *   	    Kleber Souza <kleber.souza@profitbricks.com>
+ * 	    Danil Kipnis <danil.kipnis@profitbricks.com>
+ *   	    Roman Pen <roman.penyaev@profitbricks.com>
+ *          Milind Dumbare <Milind.dumbare@gmail.com>
+ *
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions, and the following disclaimer,
+ *    without modification.
+ * 2. Redistributions in binary form must reproduce at minimum a disclaimer
+ *    substantially similar to the "NO WARRANTY" disclaimer below
+ *    ("Disclaimer") and any redistribution must be conditioned upon
+ *    including a substantially similar Disclaimer requirement for further
+ *    binary redistribution.
+ * 3. Neither the names of the above-listed copyright holders nor the names
+ *    of any contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * NO WARRANTY
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * HOLDERS OR CONTRIBUTORS BE LIABLE FOR SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
+ * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGES.
+ *
+ */
+
+#ifndef __IBNBD_LOG_H__
+#define __IBNBD_LOG_H__
+
+#define ERR_NP(fmt, ...) pr_err("ibnbd L%d ERR: " fmt, \
+				__LINE__, ##__VA_ARGS__)
+#define ERR_NP_RL(fmt, ...) pr_err_ratelimited("ibnbd L%d ERR: " fmt, \
+				__LINE__, ##__VA_ARGS__)
+
+#define WRN_NP(fmt, ...) pr_warn("ibnbd L%d WARN: " fmt, \
+				__LINE__, ##__VA_ARGS__)
+#define WRN_NP_RL(fmt, ...) pr_warn_ratelimited("ibnbd L%d WARN: " fmt,\
+				__LINE__, ##__VA_ARGS__)
+
+#define INFO_NP(fmt, ...)  pr_info("ibnbd: " fmt, ##__VA_ARGS__)
+#define INFO_NP_RL(fmt, ...) pr_info_ratelimited("ibnbd: " fmt, ##__VA_ARGS__)
+
+#define DEB(fmt, ...) pr_debug("ibnbd L%d " fmt, __LINE__, ##__VA_ARGS__)
+
+#define ibnbd_prefix(dev) ((dev->sess->hostname[0] != '\0') ? \
+			    dev->sess->hostname : dev->sess->str_addr)
+
+#endif /*__IBNBD_LOG_H__*/
-- 
2.7.4
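
To make the open-response field list above more concrete, here is a minimal,
hypothetical sketch of how a server side could fill struct ibnbd_msg_open_rsp
from the backing device's request-queue limits.  The helper name, its arguments
and the subset of fields shown are assumptions for illustration only, not the
actual server implementation; it assumes <linux/blkdev.h> and ibnbd-proto.h are
included.

static void example_fill_open_rsp(struct ibnbd_msg_open_rsp *rsp,
				  struct block_device *bdev,
				  u32 clt_device_id, u32 srv_device_id)
{
	/* queue limits of the backing block device */
	struct request_queue *q = bdev_get_queue(bdev);

	rsp->hdr.type		 = IBNBD_MSG_OPEN_RSP;
	rsp->result		 = 0;
	rsp->clt_device_id	 = clt_device_id;
	rsp->device_id		 = srv_device_id;
	rsp->nsectors		 = get_capacity(bdev->bd_disk);
	rsp->logical_block_size	 = queue_logical_block_size(q);
	rsp->physical_block_size = queue_physical_block_size(q);
	rsp->max_hw_sectors	 = queue_max_hw_sectors(q);
	rsp->max_segments	 = queue_max_segments(q);
	rsp->rotational		 = !blk_queue_nonrot(q);
	/* discard and write-same related fields omitted for brevity */
}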

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH 15/28] ibnbd: add shared library functions
@ 2017-03-24 10:45   ` Jack Wang
  0 siblings, 0 replies; 87+ messages in thread
From: Jack Wang @ 2017-03-24 10:45 UTC (permalink / raw)
  To: linux-block, linux-rdma
  Cc: dledford, axboe, hch, mail, Milind.dumbare, yun.wang, Jack Wang,
	Kleber Souza, Danil Kipnis, Roman Pen

From: Jack Wang <jinpu.wang@profitbricks.com>

Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
Signed-off-by: Kleber Souza <kleber.souza@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kipnis@profitbricks.com>
Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
---
 drivers/block/ibnbd_lib/ibnbd-proto.c | 244 ++++++++++++++++++++++++++++++++++
 drivers/block/ibnbd_lib/ibnbd.c       | 108 +++++++++++++++
 2 files changed, 352 insertions(+)
 create mode 100644 drivers/block/ibnbd_lib/ibnbd-proto.c
 create mode 100644 drivers/block/ibnbd_lib/ibnbd.c

diff --git a/drivers/block/ibnbd_lib/ibnbd-proto.c b/drivers/block/ibnbd_lib/ibnbd-proto.c
new file mode 100644
index 0000000..c6d83f2
--- /dev/null
+++ b/drivers/block/ibnbd_lib/ibnbd-proto.c
@@ -0,0 +1,244 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler < mail@fholler.de>
+ *          Jack Wang <jinpu.wang@profitbricks.com>
+ *   	    Kleber Souza <kleber.souza@profitbricks.com>
+ * 	    Danil Kipnis <danil.kipnis@profitbricks.com>
+ *   	    Roman Pen <roman.penyaev@profitbricks.com>
+ *          Milind Dumbare <Milind.dumbare@gmail.com>
+ *
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions, and the following disclaimer,
+ *    without modification.
+ * 2. Redistributions in binary form must reproduce at minimum a disclaimer
+ *    substantially similar to the "NO WARRANTY" disclaimer below
+ *    ("Disclaimer") and any redistribution must be conditioned upon
+ *    including a substantially similar Disclaimer requirement for further
+ *    binary redistribution.
+ * 3. Neither the names of the above-listed copyright holders nor the names
+ *    of any contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * NO WARRANTY
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * HOLDERS OR CONTRIBUTORS BE LIABLE FOR SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
+ * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGES.
+ *
+ */
+
+#include "../ibnbd_inc/ibnbd-proto.h"
+#include "../ibnbd_inc/log.h"
+
+static int ibnbd_validate_msg_sess_info(const struct ibnbd_msg_sess_info *msg,
+					size_t len)
+{
+	if (unlikely(len != sizeof(*msg))) {
+		ERR_NP("Sess info message received with unexpected length"
+		       " %zu instead of %zu\n", len, sizeof(*msg));
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int
+ibnbd_validate_msg_sess_info_rsp(const struct ibnbd_msg_sess_info_rsp *msg,
+				 size_t len)
+{
+	if (unlikely(len != sizeof(*msg))) {
+		ERR_NP("Sess info message received with unexpected length"
+		       " %zu instead of %zu\n", len, sizeof(*msg));
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int ibnbd_validate_msg_open_resp(const struct ibnbd_msg_open_rsp *msg,
+					size_t len)
+{
+	if (unlikely(msg->result))
+		return 0;
+
+	if (unlikely(len != sizeof(*msg))) {
+		ERR_NP("Open Response msg received with unexpected length"
+		       " %zuB instead of %zuB\n", len, sizeof(*msg));
+		return -EINVAL;
+	}
+
+	if (unlikely(!msg->logical_block_size)) {
+		ERR_NP("Open Resp msg received with invalid"
+		       " logical_block_size value %d\n",
+		       msg->logical_block_size);
+		return -EINVAL;
+	}
+
+	if (unlikely(!msg->physical_block_size)) {
+		ERR_NP("Open Resp msg received with invalid"
+		       " physical_block_size value %d\n",
+		       msg->physical_block_size);
+		return -EINVAL;
+	}
+
+	if (unlikely(!msg->max_hw_sectors)) {
+		ERR_NP("Open Resp msg received with invalid"
+		       " max_hw_sectors value %d\n", msg->max_hw_sectors);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int ibnbd_validate_msg_revalidate(const struct ibnbd_msg_revalidate *msg,
+					 size_t len)
+{
+	if (unlikely(len != sizeof(*msg))) {
+		ERR_NP("Device resize message received with unexpected length"
+		       " %zu instead of %zu\n", len, sizeof(*msg));
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int ibnbd_validate_msg_open(const struct ibnbd_msg_open *msg,
+				   size_t len)
+{
+	if (len != sizeof(*msg)) {
+		ERR_NP("Open msg received with unexpected length"
+		       " %zuB instead of %zuB\n", len, sizeof(*msg));
+		return -EINVAL;
+	}
+	if (msg->dev_name[strnlen(msg->dev_name, NAME_MAX)] != '\0') {
+		ERR_NP("Open msg received with invalid dev_name value,"
+		       " null terminator missing\n");
+		return -EINVAL;
+	}
+
+	if (unlikely(msg->access_mode != IBNBD_ACCESS_RO &&
+		     msg->access_mode != IBNBD_ACCESS_RW &&
+		     msg->access_mode != IBNBD_ACCESS_MIGRATION)) {
+		ERR_NP("Open msg received with invalid access_mode value %d\n",
+		       msg->access_mode);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int ibnbd_validate_msg_close(const struct ibnbd_msg_close *msg,
+				    size_t len)
+{
+	if (unlikely(len != sizeof(*msg))) {
+		ERR_NP("Close msg received with unexpected length %zu instead"
+		       " of %zu\n", len, sizeof(*msg));
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int ibnbd_validate_msg_close_rsp(const struct ibnbd_msg_close_rsp *msg,
+					size_t len)
+{
+	if (unlikely(len != sizeof(*msg))) {
+		ERR_NP("Close_rsp msg received with unexpected length %zu"
+		       " instead of %zu\n", len, sizeof(*msg));
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+int ibnbd_validate_message(const void *data, size_t len)
+{
+	const struct ibnbd_msg_hdr *hdr = data;
+
+	switch (hdr->type) {
+	case IBNBD_MSG_SESS_INFO: {
+		const struct ibnbd_msg_sess_info *msg = data;
+
+		return ibnbd_validate_msg_sess_info(msg, len);
+	}
+	case IBNBD_MSG_SESS_INFO_RSP: {
+		const struct ibnbd_msg_sess_info_rsp *msg = data;
+
+		return ibnbd_validate_msg_sess_info_rsp(msg, len);
+	}
+	case IBNBD_MSG_OPEN_RSP: {
+		const struct ibnbd_msg_open_rsp *msg = data;
+
+		return ibnbd_validate_msg_open_resp(msg, len);
+	}
+	case IBNBD_MSG_REVAL: {
+		const struct ibnbd_msg_revalidate *msg = data;
+
+		return ibnbd_validate_msg_revalidate(msg, len);
+	}
+	case IBNBD_MSG_OPEN: {
+		const struct ibnbd_msg_open *msg = data;
+
+		return ibnbd_validate_msg_open(msg, len);
+	}
+	case IBNBD_MSG_CLOSE: {
+		const struct ibnbd_msg_close *msg = data;
+
+		return ibnbd_validate_msg_close(msg, len);
+	}
+	case IBNBD_MSG_CLOSE_RSP: {
+		const struct ibnbd_msg_close_rsp *msg = data;
+
+		return ibnbd_validate_msg_close_rsp(msg, len);
+	}
+	default:
+		ERR_NP("Ignoring received message with unknown type %d\n",
+		       hdr->type);
+		return -EINVAL;
+	}
+}
+
+const char *ibnbd_io_mode_str(enum ibnbd_io_mode mode)
+{
+	switch (mode) {
+	case IBNBD_FILEIO:
+		return "fileio";
+	case IBNBD_BLOCKIO:
+		return "blockio";
+	case IBNBD_AUTOIO:
+		return "autoio";
+	default:
+		return "unknown";
+	}
+}
+
+const char *ibnbd_access_mode_str(enum ibnbd_access_mode mode)
+{
+	switch (mode) {
+	case IBNBD_ACCESS_RO:
+		return "ro";
+	case IBNBD_ACCESS_RW:
+		return "rw";
+	case IBNBD_ACCESS_MIGRATION:
+		return "migration";
+	default:
+		return "unknown";
+	}
+}
diff --git a/drivers/block/ibnbd_lib/ibnbd.c b/drivers/block/ibnbd_lib/ibnbd.c
new file mode 100644
index 0000000..1ba4777
--- /dev/null
+++ b/drivers/block/ibnbd_lib/ibnbd.c
@@ -0,0 +1,108 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler < mail@fholler.de>
+ *          Jack Wang <jinpu.wang@profitbricks.com>
+ *   	    Kleber Souza <kleber.souza@profitbricks.com>
+ * 	    Danil Kipnis <danil.kipnis@profitbricks.com>
+ *   	    Roman Pen <roman.penyaev@profitbricks.com>
+ *          Milind Dumbare <Milind.dumbare@gmail.com>
+ *
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions, and the following disclaimer,
+ *    without modification.
+ * 2. Redistributions in binary form must reproduce at minimum a disclaimer
+ *    substantially similar to the "NO WARRANTY" disclaimer below
+ *    ("Disclaimer") and any redistribution must be conditioned upon
+ *    including a substantially similar Disclaimer requirement for further
+ *    binary redistribution.
+ * 3. Neither the names of the above-listed copyright holders nor the names
+ *    of any contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * NO WARRANTY
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * HOLDERS OR CONTRIBUTORS BE LIABLE FOR SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
+ * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGES.
+ *
+ */
+
+#include <linux/ctype.h>
+#include "../ibnbd_inc/ibnbd.h"
+#include "../ibnbd_inc/ibnbd-proto.h"
+
+u32 ibnbd_io_flags_to_bi_rw(u32 flags)
+{
+	u32 result = 0;
+
+	if (flags == 0)
+		return result;
+
+	if (flags & IBNBD_RW_REQ_WRITE)
+		result |= WRITE;
+
+	if (flags & IBNBD_RW_REQ_SYNC)
+		result |= REQ_SYNC;
+
+	if (flags & IBNBD_RW_REQ_DISCARD)
+		result |= REQ_OP_DISCARD;
+
+	if (flags & IBNBD_RW_REQ_SECURE)
+		result |= REQ_OP_SECURE_ERASE;
+
+	if (flags & IBNBD_RW_REQ_WRITE_SAME)
+		result |= REQ_OP_WRITE_SAME;
+
+	if (flags & IBNBD_RW_REQ_FUA)
+		result |= REQ_FUA;
+
+	if (flags & IBNBD_RW_REQ_FLUSH)
+		result |= REQ_OP_FLUSH | REQ_PREFLUSH;
+
+	return result;
+}
+
+u32 rq_cmd_to_ibnbd_io_flags(struct request *rq)
+{
+	u32 result = 0;
+
+	if (req_op(rq) == REQ_OP_WRITE)
+		result |= IBNBD_RW_REQ_WRITE;
+
+	if (rq_is_sync(rq))
+		result |= IBNBD_RW_REQ_SYNC;
+
+	if (req_op(rq) == REQ_OP_DISCARD)
+		result |= IBNBD_RW_REQ_DISCARD;
+
+	if (req_op(rq) == REQ_OP_SECURE_ERASE)
+		result |= IBNBD_RW_REQ_SECURE;
+
+	if (req_op(rq) == REQ_OP_WRITE_SAME)
+		result |= IBNBD_RW_REQ_WRITE_SAME;
+
+	if (rq->cmd_flags & REQ_FUA)
+		result |= IBNBD_RW_REQ_FUA;
+
+	if (req_op(rq) == REQ_OP_FLUSH)
+		result |= IBNBD_RW_REQ_FLUSH;
+
+	return result;
+}
-- 
2.7.4
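
A hedged usage sketch of the translation helpers above: the client encodes a
block request's attributes with rq_cmd_to_ibnbd_io_flags() before putting them
on the wire, and a server running in block-I/O mode can decode them back into
the op/flag bits of a bio it is about to resubmit.  Both wrappers below are
hypothetical and only illustrate the direction each helper is meant for;
whether the real server assigns bi_opf directly is an assumption.

/* client side: encode a request for the rw field of struct ibnbd_msg_io */
static u32 example_encode_io_flags(struct request *rq)
{
	return rq_cmd_to_ibnbd_io_flags(rq);
}

/* server side: apply decoded wire flags to a bio before submitting it */
static void example_apply_io_flags(struct bio *bio, u32 wire_flags)
{
	bio->bi_opf = ibnbd_io_flags_to_bi_rw(wire_flags);
}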

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH 16/28] ibnbd_clt: add main functionality of ibnbd_client
@ 2017-03-24 10:45   ` Jack Wang
  0 siblings, 0 replies; 87+ messages in thread
From: Jack Wang @ 2017-03-24 10:45 UTC (permalink / raw)
  To: linux-block, linux-rdma
  Cc: dledford, axboe, hch, mail, Milind.dumbare, yun.wang, Jack Wang,
	Kleber Souza, Danil Kipnis, Roman Pen

From: Jack Wang <jinpu.wang@profitbricks.com>

It provides an interface to map remote devices as local block devices
(/dev/ibnbdx) and to prepare the IO for the transfer.

It supports both single-queue request mode and multiqueue (blk-mq) mode.
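
As an illustration of how a request is prepared for the transfer, the
hypothetical helper below packs the attributes of a block request into the
struct ibnbd_msg_io defined in the shared protocol header before the message is
handed to the transport.  The helper name and the way the caller obtains the
server-side device id are assumptions; the real request path in this patch
additionally sets up the scatter/gather list and handles requeueing.

/* Illustration only: pack a block request into the IBNBD wire format.
 * Assumes <linux/blkdev.h> and ../ibnbd_inc/ibnbd-proto.h. */
static void example_fill_msg_io(struct ibnbd_msg_io *msg,
				struct request *rq, u32 device_id)
{
	msg->hdr.type	= IBNBD_MSG_IO;
	msg->device_id	= device_id;			/* server-side id */
	msg->sector	= blk_rq_pos(rq);		/* start sector */
	msg->bi_size	= blk_rq_bytes(rq);		/* bytes to transfer */
	msg->rw		= rq_cmd_to_ibnbd_io_flags(rq);	/* op and flags */
}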

Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
Signed-off-by: Kleber Souza <kleber.souza@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kipnis@profitbricks.com>
Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
---
 drivers/block/ibnbd_client/ibnbd_clt.c | 2007 ++++++++++++++++++++++++++++++++
 1 file changed, 2007 insertions(+)
 create mode 100644 drivers/block/ibnbd_client/ibnbd_clt.c

diff --git a/drivers/block/ibnbd_client/ibnbd_clt.c b/drivers/block/ibnbd_client/ibnbd_clt.c
new file mode 100644
index 0000000..945c8df
--- /dev/null
+++ b/drivers/block/ibnbd_client/ibnbd_clt.c
@@ -0,0 +1,2007 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler < mail@fholler.de>
+ *          Jack Wang <jinpu.wang@profitbricks.com>
+ *   	    Kleber Souza <kleber.souza@profitbricks.com>
+ * 	    Danil Kipnis <danil.kipnis@profitbricks.com>
+ *   	    Roman Pen <roman.penyaev@profitbricks.com>
+ *          Milind Dumbare <Milind.dumbare@gmail.com>
+ *
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions, and the following disclaimer,
+ *    without modification.
+ * 2. Redistributions in binary form must reproduce at minimum a disclaimer
+ *    substantially similar to the "NO WARRANTY" disclaimer below
+ *    ("Disclaimer") and any redistribution must be conditioned upon
+ *    including a substantially similar Disclaimer requirement for further
+ *    binary redistribution.
+ * 3. Neither the names of the above-listed copyright holders nor the names
+ *    of any contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * NO WARRANTY
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * HOLDERS OR CONTRIBUTORS BE LIABLE FOR SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
+ * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGES.
+ *
+ */
+
+#include <linux/module.h>
+#include <linux/blkdev.h>
+#include <linux/hdreg.h>		/* for hd_geometry */
+#include <linux/scatterlist.h>
+#include <linux/idr.h>
+#include <linux/fs.h>
+#include <linux/uaccess.h>
+#include <rdma/ib.h>
+#include <uapi/linux/in6.h>
+
+#include "ibnbd_clt.h"
+#include "ibnbd_clt_sysfs.h"
+#include "../ibnbd_inc/ibnbd.h"
+#include <rdma/ibtrs.h>
+
+MODULE_AUTHOR("ibnbd@profitbricks.com");
+MODULE_DESCRIPTION("InfiniBand Network Block Device Client");
+MODULE_VERSION(__stringify(IBNBD_VER));
+MODULE_LICENSE("GPL");
+
+static int ibnbd_client_major;
+static DEFINE_IDR(g_index_idr);
+static DEFINE_RWLOCK(g_index_lock);
+static DEFINE_SPINLOCK(sess_lock);
+static DEFINE_SPINLOCK(dev_lock);
+static LIST_HEAD(session_list);
+static LIST_HEAD(devs_list);
+static DECLARE_WAIT_QUEUE_HEAD(sess_list_waitq);
+static struct ibtrs_clt_ops ops;
+
+static bool softirq_enable;
+module_param(softirq_enable, bool, 0444);
+MODULE_PARM_DESC(softirq_enable, "finish request in softirq_fn."
+		 " (default: 0)");
+/*
+ * Maximum number of partitions an instance can have.
+ * 6 bits = 64 minors = 63 partitions (one minor is used for the device itself)
+ */
+#define IBNBD_PART_BITS		6
+#define KERNEL_SECTOR_SIZE      512
+
+inline bool ibnbd_clt_dev_is_open(struct ibnbd_dev *dev)
+{
+	return dev->dev_state == DEV_STATE_OPEN;
+}
+
+static void ibnbd_clt_put_dev(struct ibnbd_dev *dev)
+{
+	if (!atomic_dec_if_positive(&dev->refcount)) {
+		write_lock(&g_index_lock);
+		idr_remove(&g_index_idr, dev->clt_device_id);
+		write_unlock(&g_index_lock);
+		kfree(dev->hw_queues);
+		kfree(dev->close_compl);
+		ibnbd_clt_put_sess(dev->sess);
+		kfree(dev);
+	}
+}
+
+static int ibnbd_clt_get_dev(struct ibnbd_dev *dev)
+{
+	return atomic_inc_not_zero(&dev->refcount);
+}
+
+static struct ibnbd_dev *g_get_dev(int dev_id)
+{
+	struct ibnbd_dev *dev;
+
+	read_lock(&g_index_lock);
+	dev = idr_find(&g_index_idr, dev_id);
+	if (!dev)
+		dev = ERR_PTR(-ENXIO);
+	read_unlock(&g_index_lock);
+
+	return dev;
+}
+
+static void ibnbd_clt_set_dev_attr(struct ibnbd_dev *dev,
+				   const struct ibnbd_msg_open_rsp *rsp)
+{
+	dev->device_id			= rsp->device_id;
+	dev->nsectors			= rsp->nsectors;
+	dev->logical_block_size		= rsp->logical_block_size;
+	dev->physical_block_size	= rsp->physical_block_size;
+	dev->max_write_same_sectors	= rsp->max_write_same_sectors;
+	dev->max_discard_sectors	= rsp->max_discard_sectors;
+	dev->discard_zeroes_data	= rsp->discard_zeroes_data;
+	dev->discard_granularity	= rsp->discard_granularity;
+	dev->discard_alignment		= rsp->discard_alignment;
+	dev->secure_discard		= rsp->secure_discard;
+	dev->rotational			= rsp->rotational;
+	dev->remote_io_mode		= rsp->io_mode;
+
+	if (dev->remote_io_mode == IBNBD_FILEIO) {
+		dev->max_hw_sectors = dev->sess->max_io_size /
+			rsp->logical_block_size;
+		dev->max_segments = BMAX_SEGMENTS;
+	} else {
+		dev->max_hw_sectors = dev->sess->max_io_size /
+			rsp->logical_block_size <
+			rsp->max_hw_sectors ?
+			dev->sess->max_io_size /
+			rsp->logical_block_size : rsp->max_hw_sectors;
+		dev->max_segments = rsp->max_segments > BMAX_SEGMENTS ?
+				    BMAX_SEGMENTS : rsp->max_segments;
+	}
+}
+
+static void ibnbd_clt_revalidate_disk(struct ibnbd_dev *dev,
+				      size_t new_nsectors)
+{
+	int err = 0;
+
+	INFO(dev, "Device size changed from %zu to %zu sectors\n",
+	     dev->nsectors, new_nsectors);
+	dev->nsectors = new_nsectors;
+	set_capacity(dev->gd,
+		     dev->nsectors * (dev->logical_block_size /
+				      KERNEL_SECTOR_SIZE));
+	err = revalidate_disk(dev->gd);
+	if (err)
+		ERR(dev, "Failed to change device size from"
+		    " %zu to %zu, errno: %d\n", dev->nsectors,
+		     new_nsectors, err);
+}
+
+static void process_msg_sess_info_rsp(struct ibnbd_session *sess,
+				      struct ibnbd_msg_sess_info_rsp *msg)
+{
+	sess->ver = min_t(u8, msg->ver, IBNBD_VERSION);
+	DEB("Session to %s (%s) using protocol version %d (client version: %d,"
+	    " server version: %d)\n", sess->str_addr, sess->hostname, sess->ver,
+	    IBNBD_VERSION, msg->ver);
+}
+
+static int process_msg_open_rsp(struct ibnbd_session *sess,
+				struct ibnbd_msg_open_rsp *rsp)
+{
+	struct ibnbd_dev *dev;
+	int err = 0;
+
+	dev = g_get_dev(rsp->clt_device_id);
+	if (IS_ERR(dev)) {
+		ERR_NP("Open-Response message received from session %s"
+		       " for unknown device (id: %d)\n", sess->str_addr,
+		       rsp->clt_device_id);
+		return -ENOENT;
+	}
+
+	if (!ibnbd_clt_get_dev(dev)) {
+		ERR_NP("Failed to process Open-Response message from session"
+		       " %s, unable to get reference to device (id: %d)",
+		       sess->str_addr, rsp->clt_device_id);
+		return -ENOENT;
+	}
+
+	mutex_lock(&dev->lock);
+
+	if (dev->dev_state == DEV_STATE_UNMAPPED) {
+		INFO(dev, "Ignoring Open-Response message from server for"
+		     " unmapped device\n");
+		err = -ENOENT;
+		goto out;
+	}
+
+	if (rsp->result) {
+		ERR(dev, "Server failed to open device for mapping, errno:"
+		    " %d\n", rsp->result);
+		dev->open_errno = rsp->result;
+		if (dev->open_compl)
+			complete(dev->open_compl);
+		goto out;
+	}
+
+	if (dev->dev_state == DEV_STATE_CLOSED) {
+		/* if the device was remapped and the size changed in the
+		 * meantime we need to revalidate it
+		 */
+		if (dev->nsectors != rsp->nsectors)
+			ibnbd_clt_revalidate_disk(dev, (size_t)rsp->nsectors);
+		INFO(dev, "Device online, device remapped successfully\n");
+	}
+
+	ibnbd_clt_set_dev_attr(dev, rsp);
+
+	dev->dev_state = DEV_STATE_OPEN;
+	if (dev->open_compl)
+		complete(dev->open_compl);
+
+out:
+	mutex_unlock(&dev->lock);
+	ibnbd_clt_put_dev(dev);
+
+	return err;
+}
+
+static void process_msg_revalidate(struct ibnbd_session *sess,
+				   struct ibnbd_msg_revalidate *msg)
+{
+	struct ibnbd_dev *dev;
+
+	dev = g_get_dev(msg->clt_device_id);
+	if (IS_ERR(dev)) {
+		ERR_NP("Received device revalidation message from session %s"
+		       " for non-existent device (id %d)\n", sess->str_addr,
+		       msg->clt_device_id);
+		return;
+	}
+
+	if (!ibnbd_clt_get_dev(dev)) {
+		ERR_NP("Failed to process device revalidation message from"
+		       " session %s, unable to get reference to device"
+		       " (id: %d)", sess->str_addr, msg->clt_device_id);
+		return;
+	}
+
+	mutex_lock(&dev->lock);
+
+	if (dev->dev_state == DEV_STATE_UNMAPPED) {
+		ERR(dev, "Received device revalidation message"
+		    " for unmapped device\n");
+		goto out;
+	}
+
+	if (dev->nsectors != msg->nsectors &&
+	    dev->dev_state == DEV_STATE_OPEN) {
+		ibnbd_clt_revalidate_disk(dev, (size_t)msg->nsectors);
+	} else {
+		INFO(dev, "Ignoring device revalidate message, "
+		     "current device size is the same as in the "
+		     "revalidate message, %llu sectors\n", msg->nsectors);
+	}
+
+out:
+	mutex_unlock(&dev->lock);
+	ibnbd_clt_put_dev(dev);
+}
+
+static int send_msg_close(struct ibtrs_session *sess, u32 device_id)
+{
+	struct ibnbd_msg_close msg;
+	struct kvec vec = {
+		.iov_base = &msg,
+		.iov_len  = sizeof(msg)
+	};
+
+	msg.hdr.type	= IBNBD_MSG_CLOSE;
+	msg.device_id	= device_id;
+
+	return ibtrs_clt_send(sess, &vec, 1);
+}
+
+static void ibnbd_clt_recv(void *priv, const void *msg, size_t len)
+{
+	const struct ibnbd_msg_hdr *hdr = msg;
+	struct ibnbd_session *sess = priv;
+
+	if (unlikely(WARN_ON(!hdr) || ibnbd_validate_message(msg, len)))
+		return;
+
+	print_hex_dump_debug("", DUMP_PREFIX_OFFSET, 8, 1, msg, len, true);
+
+	switch (hdr->type) {
+	case IBNBD_MSG_SESS_INFO_RSP: {
+		struct ibnbd_msg_sess_info_rsp *rsp =
+			(struct ibnbd_msg_sess_info_rsp *)msg;
+
+		process_msg_sess_info_rsp(sess, rsp);
+		if (sess->sess_info_compl)
+			complete(sess->sess_info_compl);
+		break;
+	}
+	case IBNBD_MSG_OPEN_RSP: {
+		int err;
+		struct ibnbd_msg_open_rsp *rsp =
+			(struct ibnbd_msg_open_rsp *)msg;
+
+		if (process_msg_open_rsp(sess, rsp) && !rsp->result) {
+			ERR_NP("Failed to process open response message from"
+			       " server, sending close message for dev id:"
+			       " %u\n", rsp->device_id);
+
+			err = send_msg_close(sess->sess, rsp->device_id);
+			if (err)
+				ERR_NP("Failed to send close msg for device"
+				       " with id: %u, errno: %d\n",
+				       rsp->device_id, err);
+		}
+
+		break;
+	}
+	case IBNBD_MSG_CLOSE_RSP: {
+		struct ibnbd_dev *dev;
+		struct ibnbd_msg_close_rsp *rsp =
+			(struct ibnbd_msg_close_rsp *)msg;
+
+		dev = g_get_dev(rsp->clt_device_id);
+		if (IS_ERR(dev)) {
+			ERR_NP("Close-Response message received from session %s"
+			       " for unknown device (id: %u)\n", sess->str_addr,
+			       rsp->clt_device_id);
+			break;
+		}
+		if (dev->close_compl && dev->dev_state == DEV_STATE_UNMAPPED)
+			complete(dev->close_compl);
+
+		break;
+	}
+	case IBNBD_MSG_REVAL:
+		process_msg_revalidate(sess,
+				       (struct ibnbd_msg_revalidate *)msg);
+		break;
+	default:
+		ERR_NP("IBNBD message with unknown type %d received from"
+		       " session %s\n", hdr->type, sess->str_addr);
+		break;
+	}
+}
+
+static void ibnbd_blk_delay_work(struct work_struct *work)
+{
+	struct ibnbd_dev *dev;
+
+	dev = container_of(work, struct ibnbd_dev, rq_delay_work.work);
+	spin_lock_irq(dev->queue->queue_lock);
+	blk_start_queue(dev->queue);
+	spin_unlock_irq(dev->queue->queue_lock);
+}
+
+/*
+ * Unlike blk_delay_queue(), the delayed work used here calls blk_start_queue(),
+ * so the queue's stopped flag is cleared again, which mirrors the MQ behaviour.
+ */
+static void ibnbd_blk_delay_queue(struct ibnbd_dev *dev, unsigned long msecs)
+{
+	int cpu = get_cpu();
+
+	kblockd_schedule_delayed_work_on(cpu, &dev->rq_delay_work,
+					 msecs_to_jiffies(msecs));
+	put_cpu();
+}
+
+static inline void ibnbd_dev_requeue(struct ibnbd_queue *q)
+{
+	struct ibnbd_dev *dev = q->dev;
+
+	if (dev->queue_mode == BLK_MQ) {
+		if (WARN_ON(!q->hctx))
+			return;
+		blk_mq_delay_queue(q->hctx, 0);
+	} else if (dev->queue_mode == BLK_RQ) {
+		ibnbd_blk_delay_queue(q->dev, 0);
+	} else {
+		WARN(1, "We support requeueing only for RQ or MQ");
+	}
+}
+
+enum {
+	IBNBD_DELAY_10ms   = 10,
+	IBNBD_DELAY_IFBUSY = -1,
+};
+
+/**
+ * ibnbd_get_cpu_qlist() - finds a list with HW queues to be requeued
+ *
+ * Description:
+ *     Each CPU has a list of HW queues, which need to be requeued.  If a list
+ *     is not empty, its bit is set in a bitmap.  This function finds the first
+ *     set bit in the bitmap and returns the corresponding CPU list.
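+ *
+ *     For example, with nr_cpu_ids == 4 and cpu == 2, bits 2..3 are scanned
+ *     first; if none of them is set, bits 0..1 are scanned next.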
+ */
+static struct ibnbd_cpu_qlist *ibnbd_get_cpu_qlist(struct ibnbd_session *sess,
+						   int cpu)
+{
+	int bit;
+
+	/* First half */
+	bit = find_next_bit(sess->cpu_queues_bm, nr_cpu_ids, cpu);
+	if (bit < nr_cpu_ids) {
+		return per_cpu_ptr(sess->cpu_queues, bit);
+	} else if (cpu != 0) {
+		/* Second half */
+		bit = find_next_bit(sess->cpu_queues_bm, cpu, 0);
+		if (bit < cpu)
+			return per_cpu_ptr(sess->cpu_queues, bit);
+	}
+
+	return NULL;
+}
+
+static inline int nxt_cpu(int cpu)
+{
+	return (cpu + 1) % NR_CPUS;
+}
+
+/**
+ * get_cpu_rr_var() - returns pointer to percpu var containing last cpu requeued
+ *
+ * It also sets the var to the current cpu if the var was never set before
+ * (== -1).
+ */
+#define get_cpu_rr_var(percpu)				\
+({							\
+	int *cpup;					\
+							\
+	cpup = &get_cpu_var(*percpu);			\
+	if (unlikely(*cpup < 0))			\
+		*cpup = smp_processor_id();		\
+	cpup;						\
+})
+
+/**
+ * ibnbd_requeue_if_needed() - requeue if CPU queue is marked as non empty
+ *
+ * Description:
+ *     Each CPU has its own list of HW queues, which should be requeued.
+ *     Function finds such list with HW queues, takes a list lock, picks up
+ *     the first HW queue out of the list and requeues it.
+ *
+ * Return:
+ *     True if the queue was requeued, false otherwise.
+ *
+ * Context:
+ *     Does not matter.
+ */
+static inline bool ibnbd_requeue_if_needed(struct ibnbd_session *sess)
+{
+	struct ibnbd_queue *q = NULL;
+	struct ibnbd_cpu_qlist *cpu_q;
+	unsigned long flags;
+	int cpuv;
+
+	int *uninitialized_var(cpup);
+
+	/*
+	 * To keep fairness and not let other queues starve, we always try to
+	 * wake up someone else in a round-robin manner.  That of course
+	 * increases latency, but every queue always gets a chance to run.
+	 */
+	cpup = get_cpu_rr_var(sess->cpu_rr);
+	cpuv = (*cpup + 1) % num_online_cpus();
+	for (cpu_q = ibnbd_get_cpu_qlist(sess, cpuv); cpu_q;
+	     cpu_q = ibnbd_get_cpu_qlist(sess, nxt_cpu(cpu_q->cpu))) {
+		if (!spin_trylock_irqsave(&cpu_q->requeue_lock, flags))
+			continue;
+		if (likely(test_bit(cpu_q->cpu, sess->cpu_queues_bm))) {
+			q = list_first_entry_or_null(&cpu_q->requeue_list,
+						     typeof(*q), requeue_list);
+			if (WARN_ON(!q))
+				goto clear_bit;
+			list_del_init(&q->requeue_list);
+			clear_bit_unlock(0, &q->in_list);
+
+			if (list_empty(&cpu_q->requeue_list)) {
+				/* Clear bit if nothing is left */
+clear_bit:
+				clear_bit(cpu_q->cpu, sess->cpu_queues_bm);
+			}
+		}
+		spin_unlock_irqrestore(&cpu_q->requeue_lock, flags);
+
+		if (q)
+			break;
+	}
+
+	/*
+	 * Saves the CPU that is going to be requeued on the per-cpu var. Just
+	 * incrementing it doesn't work because ibnbd_get_cpu_qlist() will
+	 * always return the first CPU with something on the queue list when the
+	 * value stored on the var is greater than the last CPU with something
+	 * on the list.
+	 */
+	if (cpu_q)
+		*cpup = cpu_q->cpu;
+	put_cpu_var(sess->cpu_rr);
+
+	if (q)
+		ibnbd_dev_requeue(q);
+
+	return !!q;
+}
+
+/**
+ * ibnbd_requeue_all_if_idle() - requeue all queues left in the list if
+ *     session is idling (there are no requests in-flight).
+ *
+ * Description:
+ *     This function tries to rerun all stopped queues if there are no
+ *     requests in-flight anymore.  It tries to solve an obvious problem:
+ *     the number of tags can be smaller than the number of queues (hctxs)
+ *     which are stopped and put to sleep.  If the last tag, which has just
+ *     been put, does not wake up all remaining queues (hctxs), IO requests
+ *     hang forever.
+ *
+ *     That can happen when all N tags have been exhausted from one CPU and
+ *     there are many block devices per session, say M.  Each block device
+ *     has its own queue (hctx) for each CPU, so eventually up to M x NR_CPUS
+ *     queues (hctxs) can be put to sleep.  If the number of tags
+ *     N < M x NR_CPUS, we finally get an IO hang.
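+ *
+ *     For example, with N = 128 tags, M = 4 devices and 64 possible CPUs,
+ *     up to 4 x 64 = 256 queues (hctxs) can be stopped, while at most 128
+ *     tag releases will happen; some of the stopped queues would then never
+ *     be woken up without this last-caller wakeup.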
+ *
+ *     To avoid this hang last caller of ibnbd_put_tag() (last caller is the
+ *     one who observes sess->busy == 0) must wake up all remaining queues.
+ *
+ * Context:
+ *     Does not matter.
+ */
+static inline void ibnbd_requeue_all_if_idle(struct ibnbd_session *sess)
+{
+	bool requeued;
+
+	do {
+		requeued = ibnbd_requeue_if_needed(sess);
+	} while (atomic_read(&sess->busy) == 0 && requeued);
+}
+
+static struct ibtrs_tag *ibnbd_get_tag(struct ibnbd_session *sess, int cpu,
+				       size_t tag_bytes, int wait)
+{
+	struct ibtrs_tag *tag;
+
+	tag = ibtrs_get_tag(sess->sess, cpu, tag_bytes,
+			    wait ? IBTRS_TAG_WAIT : IBTRS_TAG_NOWAIT);
+	if (likely(tag))
+		/* We have a subtle rare case here, when all tags can be
+		 * consumed before the busy counter is incremented.  This is
+		 * safe, because the loser will get NULL as a tag, observe a
+		 * zero busy counter and immediately restart the queue itself.
+		 */
+		atomic_inc(&sess->busy);
+
+	return tag;
+}
+
+static void ibnbd_put_tag(struct ibnbd_session *sess, struct ibtrs_tag *tag)
+{
+	ibtrs_put_tag(sess->sess, tag);
+	atomic_dec(&sess->busy);
+	/* Paired with ibnbd_dev_add_to_requeue().  Decrement first
+	 * and then check queue bits.
+	 */
+	smp_mb__after_atomic();
+	ibnbd_requeue_all_if_idle(sess);
+}
+
+static struct ibnbd_iu *ibnbd_get_iu(struct ibnbd_session *sess,
+				     size_t tag_bytes, int wait)
+{
+	struct ibnbd_iu *iu;
+	struct ibtrs_tag *tag;
+
+	tag = ibnbd_get_tag(sess, -1, tag_bytes,
+			    wait ? IBTRS_TAG_WAIT : IBTRS_TAG_NOWAIT);
+	if (unlikely(!tag))
+		return NULL;
+	iu = ibtrs_tag_to_pdu(tag);
+	iu->tag = tag; /* yes, ibtrs_tag_from_pdu() would be nicer here,
+			* but we also have to think about MQ mode
+			*/
+
+	return iu;
+}
+
+static void ibnbd_put_iu(struct ibnbd_session *sess, struct ibnbd_iu *iu)
+{
+	ibnbd_put_tag(sess, iu->tag);
+}
+
+static void ibnbd_softirq_done_fn(struct request *rq)
+{
+	struct ibnbd_dev *dev		= rq->rq_disk->private_data;
+	struct ibnbd_session *sess	= dev->sess;
+	struct ibnbd_iu *iu;
+
+	switch (dev->queue_mode) {
+	case BLK_MQ:
+		iu = blk_mq_rq_to_pdu(rq);
+		ibnbd_put_tag(sess, iu->tag);
+		blk_mq_end_request(rq, iu->errno);
+		if (rq->rq_flags & RQF_SPECIAL_PAYLOAD)
+			__free_page(rq->special_vec.bv_page);
+		break;
+	case BLK_RQ:
+		iu = rq->special;
+		blk_end_request_all(rq, iu->errno);
+		if (rq->rq_flags & RQF_SPECIAL_PAYLOAD)
+			__free_page(rq->special_vec.bv_page);
+		break;
+	default:
+		WARN(true, "dev->queue_mode , contains unexpected"
+		     " value: %d. Memory Corruption? Inflight I/O stalled!\n",
+		     dev->queue_mode);
+		return;
+	}
+}
+
+static void ibnbd_clt_rdma_ev(void *priv, enum ibtrs_clt_rdma_ev ev, int errno)
+{
+	struct ibnbd_iu *iu		= (struct ibnbd_iu *)priv;
+	struct ibnbd_dev *dev		= iu->dev;
+	struct request *rq;
+	const int flags = iu->msg.rw;
+	bool is_read;
+
+	switch (dev->queue_mode) {
+	case BLK_MQ:
+		rq = iu->rq;
+		is_read = req_op(rq) == REQ_OP_READ;
+		iu->errno = errno;
+		if (softirq_enable) {
+			blk_mq_complete_request(rq, errno);
+		} else {
+			ibnbd_put_tag(dev->sess, iu->tag);
+			blk_mq_end_request(rq, errno);
+
+			if (rq->rq_flags & RQF_SPECIAL_PAYLOAD)
+				__free_page(rq->special_vec.bv_page);
+
+		}
+		break;
+	case BLK_RQ:
+		rq = iu->rq;
+		is_read = req_op(rq) == REQ_OP_READ;
+		iu->errno = errno;
+		if (softirq_enable) {
+			blk_complete_request(rq);
+		} else {
+			blk_end_request_all(rq, errno);
+			if (rq->rq_flags & RQF_SPECIAL_PAYLOAD)
+				__free_page(rq->special_vec.bv_page);
+		}
+		break;
+	default:
+		WARN(true, "dev->queue_mode , contains unexpected"
+		     " value: %d. Memory Corruption? Inflight I/O stalled!\n",
+		     dev->queue_mode);
+		return;
+	}
+
+	if (errno)
+		INFO_RL(dev, "%s I/O failed with status: %d, flags: 0x%x\n",
+			is_read ? "read" : "write", errno, flags);
+}
+
+static int send_msg_open(struct ibnbd_dev *dev)
+{
+	int err;
+	struct ibnbd_msg_open msg;
+	struct kvec vec = {
+		.iov_base = &msg,
+		.iov_len  = sizeof(msg)
+	};
+
+	msg.hdr.type		= IBNBD_MSG_OPEN;
+	msg.clt_device_id	= dev->clt_device_id;
+	msg.access_mode		= dev->access_mode;
+	msg.io_mode		= dev->io_mode;
+	strlcpy(msg.dev_name, dev->pathname, sizeof(msg.dev_name));
+
+	err = ibtrs_clt_send(dev->sess->sess, &vec, 1);
+
+	return err;
+}
+
+static int send_msg_sess_info(struct ibnbd_session *sess)
+{
+	struct ibnbd_msg_sess_info msg;
+	struct kvec vec = {
+		.iov_base = &msg,
+		.iov_len  = sizeof(msg)
+	};
+
+	msg.hdr.type	= IBNBD_MSG_SESS_INFO;
+	msg.ver		= IBNBD_VERSION;
+
+	return ibtrs_clt_send(sess->sess, &vec, 1);
+}
+
+int open_remote_device(struct ibnbd_dev *dev)
+{
+	int err;
+
+	err = send_msg_open(dev);
+	if (unlikely(err)) {
+		ERR(dev, "Failed to send open msg, err: %d\n", err);
+		return err;
+	}
+	return 0;
+}
+
+static int find_dev_cb(int id, void *ptr, void *data)
+{
+	struct ibnbd_dev *dev = ptr;
+	struct ibnbd_session *sess = data;
+
+	if (dev->sess == sess && dev->dev_state == DEV_STATE_INIT &&
+	    dev->open_compl) {
+		dev->dev_state = DEV_STATE_INIT_CLOSED;
+		dev->open_errno = -ECOMM;
+		complete(dev->open_compl);
+		ERR(dev, "Device offline, session disconnected.\n");
+	} else if (dev->sess == sess && dev->dev_state == DEV_STATE_UNMAPPED &&
+	    dev->close_compl) {
+		complete(dev->close_compl);
+		ERR(dev, "Device closed, session disconnected.\n");
+	}
+
+	return 0;
+}
+
+static void __set_dev_states_closed(struct ibnbd_session *sess)
+{
+	struct ibnbd_dev *dev;
+
+	list_for_each_entry(dev, &sess->devs_list, list) {
+		mutex_lock(&dev->lock);
+		dev->dev_state = DEV_STATE_CLOSED;
+		dev->open_errno = -ECOMM;
+		if (dev->open_compl)
+			complete(dev->open_compl);
+		ERR(dev, "Device offline, session disconnected.\n");
+		mutex_unlock(&dev->lock);
+	}
+	read_lock(&g_index_lock);
+	idr_for_each(&g_index_idr, find_dev_cb, sess);
+	read_unlock(&g_index_lock);
+}
+
+static int update_sess_info(struct ibnbd_session *sess)
+{
+	int err;
+
+	sess->sess_info_compl = kmalloc(sizeof(*sess->sess_info_compl),
+					GFP_KERNEL);
+	if (!sess->sess_info_compl) {
+		ERR_NP("Failed to allocate memory for completion for session"
+		       " %s (%s)\n", sess->str_addr, sess->hostname);
+		return -ENOMEM;
+	}
+
+	init_completion(sess->sess_info_compl);
+
+	err = send_msg_sess_info(sess);
+	if (unlikely(err)) {
+		ERR_NP("Failed to send SESS_INFO message for session %s (%s)\n",
+		       sess->str_addr, sess->hostname);
+		goto out;
+	}
+
+	/* wait for IBNBD_MSG_SESS_INFO_RSP from server */
+	wait_for_completion(sess->sess_info_compl);
+out:
+	kfree(sess->sess_info_compl);
+	sess->sess_info_compl = NULL;
+
+	return err;
+}
+
+static void reopen_worker(struct work_struct *work)
+{
+	struct ibnbd_work *w;
+	struct ibnbd_session *sess;
+	struct ibnbd_dev *dev;
+	int err;
+
+	w = container_of(work, struct ibnbd_work, work);
+	sess = w->sess;
+	kfree(w);
+
+	mutex_lock(&sess->lock);
+	if (sess->state == SESS_STATE_DESTROYED) {
+		mutex_unlock(&sess->lock);
+		return;
+	}
+	err = update_sess_info(sess);
+	if (unlikely(err))
+		goto out;
+	list_for_each_entry(dev, &sess->devs_list, list) {
+		INFO(dev, "session reconnected, remapping device\n");
+		open_remote_device(dev);
+	}
+out:
+	mutex_unlock(&sess->lock);
+
+	ibnbd_clt_put_sess(sess);
+}
+
+static int ibnbd_schedule_reopen(struct ibnbd_session *sess)
+{
+	struct ibnbd_work *w;
+
+	w = kmalloc(sizeof(*w), GFP_KERNEL);
+	if (!w) {
+		ERR_NP("Failed to allocate memory to schedule reopen of"
+		       " devices for session %s\n", sess->str_addr);
+		return -ENOMEM;
+	}
+
+	if (WARN_ON(!ibnbd_clt_get_sess(sess))) {
+		kfree(w);
+		return -ENOENT;
+	}
+
+	w->sess = sess;
+	INIT_WORK(&w->work, reopen_worker);
+	schedule_work(&w->work);
+
+	return 0;
+}
+
+static void ibnbd_clt_sess_ev(void *priv, enum ibtrs_clt_sess_ev ev, int errno)
+{
+	struct ibnbd_session *sess = priv;
+	struct ibtrs_attrs attrs;
+
+	switch (ev) {
+	case IBTRS_CLT_SESS_EV_DISCONNECTED:
+		if (sess->sess_info_compl)
+			complete(sess->sess_info_compl);
+		mutex_lock(&sess->lock);
+		if (sess->state == SESS_STATE_DESTROYED) {
+			mutex_unlock(&sess->lock);
+			return;
+		}
+		sess->state = SESS_STATE_DISCONNECTED;
+		__set_dev_states_closed(sess);
+		mutex_unlock(&sess->lock);
+		break;
+	case IBTRS_CLT_SESS_EV_RECONNECT:
+		mutex_lock(&sess->lock);
+		if (sess->state == SESS_STATE_DESTROYED) {
+			/* This may happen if the session started to be closed
+			 * before the reconnect event arrived. In this case, we
+			 * just return and the session will be closed later
+			 */
+			mutex_unlock(&sess->lock);
+			return;
+		}
+		sess->state = SESS_STATE_READY;
+
+		mutex_unlock(&sess->lock);
+		memset(&attrs, 0, sizeof(attrs));
+		ibtrs_clt_query(sess->sess, &attrs);
+		strlcpy(sess->hostname, attrs.hostname, sizeof(sess->hostname));
+		sess->max_io_size = attrs.max_io_size;
+		ibnbd_schedule_reopen(sess);
+		break;
+	case IBTRS_CLT_SESS_EV_MAX_RECONN_EXCEEDED:
+		INFO_NP("Reconnect attempts exceeded for session %s\n",
+			sess->str_addr);
+		break;
+	default:
+		ERR_NP("Unknown session event received (%d), session: (%s)\n",
+		       ev, sess->str_addr);
+	}
+}
+
+static int ibnbd_cmp_sock_addr(const struct sockaddr_storage *a,
+			       const struct sockaddr_storage *b)
+{
+	if (a->ss_family == b->ss_family) {
+		switch (a->ss_family) {
+		case AF_INET:
+			return memcmp(&((struct sockaddr_in *)a)->sin_addr,
+				      &((struct sockaddr_in *)b)->sin_addr,
+				      sizeof(struct in_addr));
+		case AF_INET6:
+			return memcmp(&((struct sockaddr_in6 *)a)->sin6_addr,
+				      &((struct sockaddr_in6 *)b)->sin6_addr,
+				      sizeof(struct in6_addr));
+		case AF_IB:
+			return memcmp(&((struct sockaddr_ib *)a)->sib_addr,
+				      &((struct sockaddr_ib *)b)->sib_addr,
+				      sizeof(struct ib_addr));
+		default:
+			ERR_NP("Unknown address family: %d\n", a->ss_family);
+			return -EINVAL;
+		}
+	} else {
+		return -1;
+	}
+}
+
+struct ibnbd_session *ibnbd_clt_find_sess(const struct sockaddr_storage *addr)
+{
+	struct ibnbd_session *sess;
+
+	spin_lock(&sess_lock);
+	list_for_each_entry(sess, &session_list, list)
+		if (!ibnbd_cmp_sock_addr(&sess->addr, addr)) {
+			spin_unlock(&sess_lock);
+			return sess;
+		}
+	spin_unlock(&sess_lock);
+
+	return NULL;
+}
+
+static void ibnbd_init_cpu_qlists(struct ibnbd_cpu_qlist __percpu *cpu_queues)
+{
+	unsigned int cpu;
+	struct ibnbd_cpu_qlist *cpu_q;
+
+	for_each_online_cpu(cpu) {
+		cpu_q = per_cpu_ptr(cpu_queues, cpu);
+
+		cpu_q->cpu = cpu;
+		INIT_LIST_HEAD(&cpu_q->requeue_list);
+		spin_lock_init(&cpu_q->requeue_lock);
+	}
+}
+
+static struct blk_mq_ops ibnbd_mq_ops;
+static int setup_mq_tags(struct ibnbd_session *sess)
+{
+	struct blk_mq_tag_set *tags = &sess->tag_set;
+
+	memset(tags, 0, sizeof(*tags));
+	tags->ops		= &ibnbd_mq_ops;
+	tags->queue_depth	= sess->queue_depth;
+	tags->numa_node		= NUMA_NO_NODE;
+	tags->flags		= BLK_MQ_F_SHOULD_MERGE |
+				  BLK_MQ_F_SG_MERGE     |
+				  BLK_MQ_F_TAG_SHARED;
+	tags->cmd_size		= sizeof(struct ibnbd_iu);
+	tags->nr_hw_queues	= num_online_cpus();
+
+	return blk_mq_alloc_tag_set(tags);
+}
+
+static void destroy_mq_tags(struct ibnbd_session *sess)
+{
+	blk_mq_free_tag_set(&sess->tag_set);
+}
+
+struct ibnbd_session *ibnbd_create_session(const struct sockaddr_storage *addr)
+{
+	struct ibnbd_session *sess;
+	struct ibtrs_attrs attrs;
+	char str_addr[IBTRS_ADDRLEN];
+	int err;
+	int cpu;
+
+	err = ibtrs_addr_to_str(addr, str_addr, sizeof(str_addr));
+	if (err < 0) {
+		ERR_NP("Can't create session, invalid address\n");
+		return ERR_PTR(err);
+	}
+
+	DEB("Establishing session to %s\n", str_addr);
+
+	if (ibnbd_clt_find_sess(addr)) {
+		ERR_NP("Can't create session, session to %s already exists\n",
+		       str_addr);
+		return ERR_PTR(-EEXIST);
+	}
+
+	sess = kzalloc_node(sizeof(*sess), GFP_KERNEL, NUMA_NO_NODE);
+	if (unlikely(!sess)) {
+		ERR_NP("Failed to create session to %s,"
+		       " allocating session struct failed\n", str_addr);
+		return ERR_PTR(-ENOMEM);
+	}
+	sess->cpu_queues = alloc_percpu(struct ibnbd_cpu_qlist);
+	if (unlikely(!sess->cpu_queues)) {
+		ERR_NP("Failed to create session to %s,"
+		       " alloc of percpu var (cpu_queues) failed\n", str_addr);
+		kvfree(sess);
+		return ERR_PTR(-ENOMEM);
+	}
+	ibnbd_init_cpu_qlists(sess->cpu_queues);
+
+	/*
+	 * This is a simple per-cpu variable which stores CPU indices and is
+	 * advanced on each access.  We need it for the sake of fairness, to
+	 * wake up queues in a round-robin manner.
+	 */
+	sess->cpu_rr = alloc_percpu(int);
+	if (unlikely(!sess->cpu_rr)) {
+		ERR_NP("Failed to create session to %s,"
+		       " alloc of percpu var (cpu_rr) failed\n", str_addr);
+		free_percpu(sess->cpu_queues);
+		kfree(sess);
+		return ERR_PTR(-ENOMEM);
+	}
+	for_each_online_cpu(cpu) {
+		*per_cpu_ptr(sess->cpu_rr, cpu) = -1;
+	}
+
+	memset(&attrs, 0, sizeof(attrs));
+	memcpy(&sess->addr, addr, sizeof(sess->addr));
+	strlcpy(sess->str_addr, str_addr, sizeof(sess->str_addr));
+
+	spin_lock(&sess_lock);
+	list_add(&sess->list, &session_list);
+	spin_unlock(&sess_lock);
+
+	atomic_set(&sess->busy, 0);
+	mutex_init(&sess->lock);
+	INIT_LIST_HEAD(&sess->devs_list);
+	bitmap_zero(sess->cpu_queues_bm, NR_CPUS);
+	kref_init(&sess->refcount);
+	sess->state = SESS_STATE_DISCONNECTED;
+
+	sess->sess = ibtrs_clt_open(addr, sizeof(struct ibnbd_iu), sess,
+				    RECONNECT_DELAY, BMAX_SEGMENTS,
+				    MAX_RECONNECTS);
+	if (!IS_ERR(sess->sess)) {
+		mutex_lock(&sess->lock);
+		sess->state = SESS_STATE_READY;
+		mutex_unlock(&sess->lock);
+	} else {
+		err = PTR_ERR(sess->sess);
+		goto out_free;
+	}
+
+	ibtrs_clt_query(sess->sess, &attrs);
+	strlcpy(sess->hostname, attrs.hostname, sizeof(sess->hostname));
+	sess->max_io_size = attrs.max_io_size;
+	sess->queue_depth = attrs.queue_depth;
+
+	err = setup_mq_tags(sess);
+	if (unlikely(err))
+		goto close_sess;
+
+	err = update_sess_info(sess);
+	if (unlikely(err))
+		goto destroy_tags;
+
+	return sess;
+
+destroy_tags:
+	destroy_mq_tags(sess);
+close_sess:
+	ibtrs_clt_close(sess->sess);
+out_free:
+	spin_lock(&sess_lock);
+	list_del(&sess->list);
+	spin_unlock(&sess_lock);
+	free_percpu(sess->cpu_queues);
+	free_percpu(sess->cpu_rr);
+	kfree(sess);
+	return ERR_PTR(err);
+}
+
+static void ibnbd_clt_destroy_session(struct ibnbd_session *sess)
+{
+	mutex_lock(&sess->lock);
+	sess->state = SESS_STATE_DESTROYED;
+
+	if (!list_empty(&sess->devs_list)) {
+		mutex_unlock(&sess->lock);
+		WRN_NP("Device list is not empty,"
+		       " closing session to %s failed\n", sess->str_addr);
+		return;
+	}
+	mutex_unlock(&sess->lock);
+	ibtrs_clt_close(sess->sess);
+
+	destroy_mq_tags(sess);
+	spin_lock(&sess_lock);
+	list_del(&sess->list);
+	spin_unlock(&sess_lock);
+	wake_up(&sess_list_waitq);
+
+	free_percpu(sess->cpu_queues);
+	free_percpu(sess->cpu_rr);
+	kfree(sess);
+}
+
+void ibnbd_clt_sess_release(struct kref *ref)
+{
+	struct ibnbd_session *sess = container_of(ref, struct ibnbd_session,
+						  refcount);
+
+	ibnbd_clt_destroy_session(sess);
+}
+
+static int ibnbd_client_open(struct block_device *block_device, fmode_t mode)
+{
+	struct ibnbd_dev *dev = block_device->bd_disk->private_data;
+
+	if (dev->read_only && (mode & FMODE_WRITE))
+		return -EPERM;
+
+	if (dev->dev_state == DEV_STATE_UNMAPPED ||
+	    !ibnbd_clt_get_dev(dev))
+		return -EIO;
+
+	DEB("OPEN, name=%s, open_cnt=%d\n", dev->gd->disk_name,
+	    atomic_read(&dev->refcount) - 1);
+
+	return 0;
+}
+
+static void ibnbd_client_release(struct gendisk *gen, fmode_t mode)
+{
+	struct ibnbd_dev *dev = gen->private_data;
+
+	DEB("RELEASE, name=%s, open_cnt %d\n", dev->gd->disk_name,
+	    atomic_read(&dev->refcount) - 1);
+
+	ibnbd_clt_put_dev(dev);
+}
+
+static int ibnbd_client_getgeo(struct block_device *block_device,
+			       struct hd_geometry *geo)
+{
+	u64 size;
+	struct ibnbd_dev *dev;
+
+	dev = block_device->bd_disk->private_data;
+	size = dev->size * (dev->logical_block_size / KERNEL_SECTOR_SIZE);
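+	/* Fake geometry: 4 heads and 16 sectors per track give 64 sectors
+	 * per cylinder, hence the division by 64 below.
+	 */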
+	geo->cylinders	= (size & ~0x3f) >> 6;	/* size/64 */
+	geo->heads	= 4;
+	geo->sectors	= 16;
+	geo->start	= 0;
+
+	return 0;
+}
+
+static const struct block_device_operations ibnbd_client_ops = {
+	.owner		= THIS_MODULE,
+	.open		= ibnbd_client_open,
+	.release	= ibnbd_client_release,
+	.getgeo		= ibnbd_client_getgeo
+};
+
+static size_t ibnbd_clt_get_sg_size(struct scatterlist *sglist, u32 len)
+{
+	struct scatterlist *sg;
+	size_t tsize = 0;
+	int i;
+
+	for_each_sg(sglist, sg, len, i)
+		tsize += sg->length;
+	return tsize;
+}
+
+static inline int ibnbd_clt_setup_discard(struct request *rq)
+{
+	struct page *page;
+
+	page = alloc_page(GFP_ATOMIC | __GFP_ZERO);
+	if (!page)
+		return -ENOMEM;
+	rq->special_vec.bv_page = page;
+	rq->special_vec.bv_offset = 0;
+	rq->special_vec.bv_len = PAGE_SIZE;
+	rq->rq_flags |= RQF_SPECIAL_PAYLOAD;
+	return 0;
+}
+
+static int ibnbd_client_xfer_request(struct ibnbd_dev *dev, struct request *rq,
+				     struct ibnbd_iu *iu)
+{
+	int err;
+
+	unsigned int sg_cnt;
+	size_t size;
+	struct kvec vec;
+	struct ibtrs_session *sess = dev->sess->sess;
+	struct ibtrs_tag *tag = iu->tag;
+	struct scatterlist *sg = iu->sglist;
+
+	if (req_op(rq) == REQ_OP_DISCARD) {
+		err = ibnbd_clt_setup_discard(rq);
+		if (err)
+			return err;
+	}
+	sg_cnt = blk_rq_nr_phys_segments(rq);
+
+	if (sg_cnt == 0)
+		sg_mark_end(&sg[0]);
+	else
+		sg_mark_end(&sg[sg_cnt - 1]);
+
+	iu->rq		= rq;
+	iu->dev		= dev;
+	iu->msg.sector	= blk_rq_pos(rq);
+	iu->msg.bi_size = blk_rq_bytes(rq);
+	iu->msg.rw	= rq_cmd_to_ibnbd_io_flags(rq);
+
+	sg_cnt = blk_rq_map_sg(dev->queue, rq, sg);
+
+	iu->msg.hdr.type	= IBNBD_MSG_IO;
+	iu->msg.device_id	= dev->device_id;
+
+	size = ibnbd_clt_get_sg_size(sg, sg_cnt);
+	vec = (struct kvec) {
+		.iov_base = &iu->msg,
+		.iov_len  = sizeof(iu->msg)
+	};
+
+	if (req_op(rq) == REQ_OP_READ)
+		err = ibtrs_clt_request_rdma_write(sess, tag, iu, &vec, 1, size,
+						   sg, sg_cnt);
+	else
+		err = ibtrs_clt_rdma_write(sess, tag, iu, &vec, 1, size, sg,
+					   sg_cnt);
+	if (unlikely(err)) {
+		ERR_RL(dev, "IBTRS failed to transfer IO, errno: %d\n", err);
+		return err;
+	}
+
+	return 0;
+}
+
+/**
+ * ibnbd_dev_add_to_requeue() - add device to requeue if session is busy
+ *
+ * Description:
+ *     If the session is busy, someone will requeue us when resources are
+ *     freed.  If the session is not doing anything, the device is not added
+ *     to the list and false is returned.
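+ *
+ *     The bit set in cpu_queues_bm and the read of sess->busy here pair with
+ *     the busy-counter decrement and the bitmap check in ibnbd_put_tag() and
+ *     ibnbd_requeue_if_needed(); see the memory barrier comments in both
+ *     places.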
+ */
+static inline bool ibnbd_dev_add_to_requeue(struct ibnbd_dev *dev,
+					    struct ibnbd_queue *q)
+{
+	struct ibnbd_session *sess = dev->sess;
+	struct ibnbd_cpu_qlist *cpu_q;
+	unsigned long flags;
+	bool added = true;
+	bool need_set;
+
+	cpu_q = get_cpu_ptr(sess->cpu_queues);
+	spin_lock_irqsave(&cpu_q->requeue_lock, flags);
+
+	if (likely(!test_and_set_bit_lock(0, &q->in_list))) {
+		if (WARN_ON(!list_empty(&q->requeue_list)))
+			goto unlock;
+
+		need_set = !test_bit(cpu_q->cpu, sess->cpu_queues_bm);
+		if (need_set) {
+			set_bit(cpu_q->cpu, sess->cpu_queues_bm);
+			/* Paired with ibnbd_put_tag().	 Set a bit first
+			 * and then observe the busy counter.
+			 */
+			smp_mb__before_atomic();
+		}
+		if (likely(atomic_read(&sess->busy))) {
+			list_add_tail(&q->requeue_list, &cpu_q->requeue_list);
+		} else {
+			/* Very unlikely, but possible: busy counter was
+			 * observed as zero.  Drop all bits and return
+			 * false to restart the queue by ourselves.
+			 */
+			if (need_set)
+				clear_bit(cpu_q->cpu, sess->cpu_queues_bm);
+			clear_bit_unlock(0, &q->in_list);
+			added = false;
+		}
+	}
+unlock:
+	spin_unlock_irqrestore(&cpu_q->requeue_lock, flags);
+	put_cpu_ptr(sess->cpu_queues);
+
+	return added;
+}
+
+static void ibnbd_dev_kick_mq_queue(struct ibnbd_dev *dev,
+				    struct blk_mq_hw_ctx *hctx,
+				    int delay)
+{
+	struct ibnbd_queue *q = hctx->driver_data;
+
+	if (WARN_ON(dev->queue_mode != BLK_MQ))
+		return;
+	blk_mq_stop_hw_queue(hctx);
+
+	if (delay != IBNBD_DELAY_IFBUSY)
+		blk_mq_delay_queue(hctx, delay);
+	else if (unlikely(!ibnbd_dev_add_to_requeue(dev, q)))
+		/* If session is not busy we have to restart
+		 * the queue ourselves.
+		 */
+		blk_mq_delay_queue(hctx, IBNBD_DELAY_10ms);
+}
+
+static void ibnbd_dev_kick_queue(struct ibnbd_dev *dev, int delay)
+{
+	if (WARN_ON(dev->queue_mode != BLK_RQ))
+		return;
+	blk_stop_queue(dev->queue);
+
+	if (delay != IBNBD_DELAY_IFBUSY)
+		ibnbd_blk_delay_queue(dev, delay);
+	else if (unlikely(!ibnbd_dev_add_to_requeue(dev, dev->hw_queues)))
+		/* If session is not busy we have to restart
+		 * the queue ourselves.
+		 */
+		ibnbd_blk_delay_queue(dev, IBNBD_DELAY_10ms);
+}
+
+static int ibnbd_queue_rq(struct blk_mq_hw_ctx *hctx,
+			  const struct blk_mq_queue_data *bd)
+{
+	struct request *rq = bd->rq;
+	struct ibnbd_dev *dev = rq->rq_disk->private_data;
+	struct ibnbd_iu *iu = blk_mq_rq_to_pdu(rq);
+	int err;
+
+	if (unlikely(!ibnbd_clt_dev_is_open(dev)))
+		return BLK_MQ_RQ_QUEUE_ERROR;
+
+	iu->tag = ibnbd_get_tag(dev->sess, hctx->next_cpu, blk_rq_bytes(rq),
+				IBTRS_TAG_NOWAIT);
+	if (unlikely(!iu->tag)) {
+		ibnbd_dev_kick_mq_queue(dev, hctx, IBNBD_DELAY_IFBUSY);
+		return BLK_MQ_RQ_QUEUE_BUSY;
+	}
+
+	blk_mq_start_request(rq);
+	err = ibnbd_client_xfer_request(dev, rq, iu);
+	if (likely(err == 0))
+		return BLK_MQ_RQ_QUEUE_OK;
+	if (unlikely(err == -EAGAIN || err == -ENOMEM)) {
+		ibnbd_dev_kick_mq_queue(dev, hctx, IBNBD_DELAY_10ms);
+		ibnbd_put_tag(dev->sess, iu->tag);
+		return BLK_MQ_RQ_QUEUE_BUSY;
+	}
+
+	ibnbd_put_tag(dev->sess, iu->tag);
+	return BLK_MQ_RQ_QUEUE_ERROR;
+}
+
+static int ibnbd_init_request(void *data, struct request *rq,
+			      unsigned int hctx_idx, unsigned int request_idx,
+			      unsigned int numa_node)
+{
+	struct ibnbd_iu *iu = blk_mq_rq_to_pdu(rq);
+
+	sg_init_table(iu->sglist, BMAX_SEGMENTS);
+	return 0;
+}
+
+static inline void ibnbd_init_hw_queue(struct ibnbd_dev *dev,
+				       struct ibnbd_queue *q,
+				       struct blk_mq_hw_ctx *hctx)
+{
+	INIT_LIST_HEAD(&q->requeue_list);
+	q->dev  = dev;
+	q->hctx = hctx;
+}
+
+static void ibnbd_init_mq_hw_queues(struct ibnbd_dev *dev)
+{
+	int i;
+	struct blk_mq_hw_ctx *hctx;
+	struct ibnbd_queue *q;
+
+	queue_for_each_hw_ctx(dev->queue, hctx, i) {
+		q = &dev->hw_queues[i];
+		ibnbd_init_hw_queue(dev, q, hctx);
+		hctx->driver_data = q;
+	}
+}
+
+static struct blk_mq_ops ibnbd_mq_ops = {
+	.queue_rq	= ibnbd_queue_rq,
+	.init_request	= ibnbd_init_request,
+	.complete	= ibnbd_softirq_done_fn,
+};
+
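+/* Each device reserves 1 << IBNBD_PART_BITS minors (64 with IBNBD_PART_BITS
+ * == 6), so device index N starts at minor N << IBNBD_PART_BITS and vice
+ * versa.
+ */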
+static int index_to_minor(int index)
+{
+	return index << IBNBD_PART_BITS;
+}
+
+static int minor_to_index(int minor)
+{
+	return minor >> IBNBD_PART_BITS;
+}
+
+static int ibnbd_rq_prep_fn(struct request_queue *q, struct request *rq)
+{
+	struct ibnbd_dev *dev = q->queuedata;
+	struct ibnbd_iu *iu;
+
+	iu = ibnbd_get_iu(dev->sess, blk_rq_bytes(rq), IBTRS_TAG_NOWAIT);
+	if (likely(iu)) {
+		rq->special = iu;
+		rq->rq_flags |= RQF_DONTPREP;
+
+		return BLKPREP_OK;
+	}
+
+	ibnbd_dev_kick_queue(dev, IBNBD_DELAY_IFBUSY);
+	return BLKPREP_DEFER;
+}
+
+static void ibnbd_rq_unprep_fn(struct request_queue *q, struct request *rq)
+{
+	struct ibnbd_dev *dev = q->queuedata;
+
+	if (WARN_ON(!rq->special))
+		return;
+	ibnbd_put_iu(dev->sess, rq->special);
+	rq->special = NULL;
+	rq->rq_flags &= ~RQF_DONTPREP;
+}
+
+static void ibnbd_clt_request(struct request_queue *q)
+__must_hold(q->queue_lock)
+{
+	int err;
+	struct request *req;
+	struct ibnbd_iu *iu;
+	struct ibnbd_dev *dev = q->queuedata;
+
+	while ((req = blk_fetch_request(q)) != NULL) {
+		spin_unlock_irq(q->queue_lock);
+
+		if (unlikely(!ibnbd_clt_dev_is_open(dev))) {
+			err = -EIO;
+			goto next;
+		}
+
+		iu = req->special;
+		if (WARN_ON(!iu)) {
+			err = -EIO;
+			goto next;
+		}
+
+		sg_init_table(iu->sglist, dev->max_segments);
+		err = ibnbd_client_xfer_request(dev, req, iu);
+next:
+		if (unlikely(err == -EAGAIN || err == -ENOMEM)) {
+			ibnbd_rq_unprep_fn(q, req);
+			spin_lock_irq(q->queue_lock);
+			blk_requeue_request(q, req);
+			ibnbd_dev_kick_queue(dev, IBNBD_DELAY_10ms);
+			break;
+		} else if (err) {
+			blk_end_request_all(req, err);
+		}
+
+		spin_lock_irq(q->queue_lock);
+	}
+}
+
+static int setup_mq_dev(struct ibnbd_dev *dev)
+{
+	dev->queue = blk_mq_init_queue(&dev->sess->tag_set);
+	if (IS_ERR(dev->queue)) {
+		ERR(dev, "Initializing multiqueue queue failed, errno: %ld\n",
+		    PTR_ERR(dev->queue));
+		return PTR_ERR(dev->queue);
+	}
+	ibnbd_init_mq_hw_queues(dev);
+	return 0;
+}
+
+static int setup_rq_dev(struct ibnbd_dev *dev)
+{
+	dev->queue = blk_init_queue(ibnbd_clt_request, NULL);
+	if (IS_ERR_OR_NULL(dev->queue)) {
+		if (IS_ERR(dev->queue)) {
+			ERR(dev, "Initializing request queue failed, "
+			    "errno: %ld\n", PTR_ERR(dev->queue));
+			return PTR_ERR(dev->queue);
+		}
+		ERR(dev, "Initializing request queue failed\n");
+		return -ENOMEM;
+	}
+
+	blk_queue_prep_rq(dev->queue, ibnbd_rq_prep_fn);
+	blk_queue_softirq_done(dev->queue, ibnbd_softirq_done_fn);
+	blk_queue_unprep_rq(dev->queue, ibnbd_rq_unprep_fn);
+
+	return 0;
+}
+
+static void setup_request_queue(struct ibnbd_dev *dev)
+{
+	blk_queue_logical_block_size(dev->queue, dev->logical_block_size);
+	blk_queue_physical_block_size(dev->queue, dev->physical_block_size);
+	blk_queue_max_hw_sectors(dev->queue, dev->max_hw_sectors);
+	blk_queue_max_write_same_sectors(dev->queue,
+					 dev->max_write_same_sectors);
+
+	blk_queue_max_discard_sectors(dev->queue, dev->max_discard_sectors);
+	dev->queue->limits.discard_zeroes_data	= dev->discard_zeroes_data;
+	dev->queue->limits.discard_granularity	= dev->discard_granularity;
+	dev->queue->limits.discard_alignment	= dev->discard_alignment;
+	if (dev->max_discard_sectors)
+		queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, dev->queue);
+	if (dev->secure_discard)
+		queue_flag_set_unlocked(QUEUE_FLAG_SECERASE, dev->queue);
+
+	queue_flag_set_unlocked(QUEUE_FLAG_SAME_COMP, dev->queue);
+	queue_flag_set_unlocked(QUEUE_FLAG_SAME_FORCE, dev->queue);
+	/* our HCA supports only 32 SG entries; the protocol uses one,
+	 * so 31 are left
+	 */
+	blk_queue_max_segments(dev->queue, dev->max_segments);
+	blk_queue_io_opt(dev->queue, dev->sess->max_io_size);
+	blk_queue_write_cache(dev->queue, true, true);
+	dev->queue->queuedata = dev;
+}
+
+static void ibnbd_clt_setup_gen_disk(struct ibnbd_dev *dev, int idx)
+{
+	dev->gd->major		= ibnbd_client_major;
+	dev->gd->first_minor	= index_to_minor(idx);
+	dev->gd->fops		= &ibnbd_client_ops;
+	dev->gd->queue		= dev->queue;
+	dev->gd->private_data	= dev;
+	snprintf(dev->gd->disk_name, sizeof(dev->gd->disk_name), "ibnbd%d",
+		 idx);
+	DEB("disk_name=%s, capacity=%zu, queue_mode=%s\n", dev->gd->disk_name,
+	    dev->nsectors * (dev->logical_block_size / KERNEL_SECTOR_SIZE),
+	    ibnbd_queue_mode_str(dev->queue_mode));
+
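+	/* nsectors is in units of the device's logical block size, while
+	 * set_capacity() expects 512-byte sectors, hence the conversion.
+	 */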
+	set_capacity(dev->gd, dev->nsectors * (dev->logical_block_size /
+					       KERNEL_SECTOR_SIZE));
+
+	if (dev->access_mode == IBNBD_ACCESS_RO) {
+		dev->read_only = true;
+		set_disk_ro(dev->gd, true);
+	} else {
+		dev->read_only = false;
+	}
+
+	if (!dev->rotational)
+		queue_flag_set_unlocked(QUEUE_FLAG_NONROT, dev->queue);
+}
+
+static void ibnbd_clt_add_gen_disk(struct ibnbd_dev *dev)
+{
+	add_disk(dev->gd);
+}
+
+static int ibnbd_client_setup_device(struct ibnbd_session *sess,
+				     struct ibnbd_dev *dev, int idx)
+{
+	int err;
+
+	dev->size = dev->nsectors * dev->logical_block_size;
+
+	switch (dev->queue_mode) {
+	case BLK_MQ:
+		err = setup_mq_dev(dev);
+		break;
+	case BLK_RQ:
+		err = setup_rq_dev(dev);
+		break;
+	default:
+		err = -EINVAL;
+	}
+
+	if (err)
+		return err;
+
+	setup_request_queue(dev);
+
+	dev->gd = alloc_disk_node(1 << IBNBD_PART_BITS,	NUMA_NO_NODE);
+	if (!dev->gd) {
+		ERR(dev, "Failed to allocate disk node\n");
+		blk_cleanup_queue(dev->queue);
+		return -ENOMEM;
+	}
+
+	ibnbd_clt_setup_gen_disk(dev, idx);
+
+	return 0;
+}
+
+static struct ibnbd_dev *init_dev(struct ibnbd_session *sess,
+				  enum ibnbd_access_mode access_mode,
+				  enum ibnbd_queue_mode queue_mode,
+				  const char *pathname)
+{
+	int ret;
+	struct ibnbd_dev *dev;
+	size_t nr;
+
+	dev = kzalloc_node(sizeof(*dev), GFP_KERNEL, NUMA_NO_NODE);
+	if (!dev) {
+		ERR_NP("Failed to initialize device '%s' from session %s,"
+		       " allocating device structure failed\n", pathname,
+		       sess->str_addr);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	nr = (queue_mode == BLK_MQ ? num_online_cpus() :
+	      queue_mode == BLK_RQ ? 1 : 0);
+	if (nr) {
+		dev->hw_queues = kcalloc(nr, sizeof(*dev->hw_queues),
+					 GFP_KERNEL);
+		if (unlikely(!dev->hw_queues)) {
+			ERR_NP("Failed to initialize device '%s' from session"
+			       " %s, allocating hw_queues failed.", pathname,
+			       sess->str_addr);
+			ret = -ENOMEM;
+			goto out_alloc;
+		}
+		/* for MQ mode we will init all hw queues after the
+		 * request queue is created
+		 */
+		if (queue_mode == BLK_RQ)
+			ibnbd_init_hw_queue(dev, dev->hw_queues, NULL);
+	}
+
+	idr_preload(GFP_KERNEL);
+	write_lock(&g_index_lock);
+	ret = idr_alloc(&g_index_idr, dev, 0, minor_to_index(1 << MINORBITS),
+			GFP_ATOMIC);
+	write_unlock(&g_index_lock);
+	idr_preload_end();
+	if (ret < 0) {
+		ERR_NP("Failed to initialize device '%s' from session %s,"
+		       " allocating idr failed, errno: %d\n", pathname,
+		       sess->str_addr, ret);
+		goto out_queues;
+	}
+
+	dev->clt_device_id	= ret;
+	dev->close_compl	= kmalloc(sizeof(*dev->close_compl),
+					  GFP_KERNEL);
+	if (!dev->close_compl) {
+		ERR_NP("Failed to initialize device '%s' from session %s,"
+		       " allocating close completion failed, errno: %d\n",
+		       pathname, sess->str_addr, ret);
+		ret = -ENOMEM;
+		goto out_idr;
+	}
+	init_completion(dev->close_compl);
+	dev->sess		= sess;
+	dev->access_mode	= access_mode;
+	dev->queue_mode		= queue_mode;
+
+	strlcpy(dev->pathname, pathname, sizeof(dev->pathname));
+
+	INIT_DELAYED_WORK(&dev->rq_delay_work, ibnbd_blk_delay_work);
+	mutex_init(&dev->lock);
+	atomic_set(&dev->refcount, 1);
+	dev->dev_state = DEV_STATE_INIT;
+
+	return dev;
+
+out_idr:
+	write_lock(&g_index_lock);
+	idr_remove(&g_index_idr, dev->clt_device_id);
+	write_unlock(&g_index_lock);
+out_queues:
+	kfree(dev->hw_queues);
+out_alloc:
+	kfree(dev);
+	return ERR_PTR(ret);
+}
+
+bool ibnbd_clt_dev_is_mapped(const char *pathname)
+{
+	struct ibnbd_dev *dev;
+
+	spin_lock(&dev_lock);
+	list_for_each_entry(dev, &devs_list, g_list)
+		if (!strncmp(dev->pathname, pathname, sizeof(dev->pathname))) {
+			spin_unlock(&dev_lock);
+			return true;
+		}
+	spin_unlock(&dev_lock);
+
+	return false;
+}
+
+static struct ibnbd_dev *__find_sess_dev(const struct ibnbd_session *sess,
+					 const char *pathname)
+{
+	struct ibnbd_dev *dev;
+
+	list_for_each_entry(dev, &sess->devs_list, list)
+		if (!strncmp(dev->pathname, pathname, sizeof(dev->pathname)))
+			return dev;
+
+	return NULL;
+}
+
+struct ibnbd_dev *ibnbd_client_add_device(struct ibnbd_session *sess,
+					  const char *pathname,
+					  enum ibnbd_access_mode access_mode,
+					  enum ibnbd_queue_mode queue_mode,
+					  enum ibnbd_io_mode io_mode)
+{
+	int ret;
+	struct ibnbd_dev *dev;
+	struct completion *open_compl;
+
+	DEB("Add remote device: server=%s, path='%s', access_mode=%d,"
+	    " queue_mode=%d\n", sess->str_addr, pathname, access_mode,
+	    queue_mode);
+
+	mutex_lock(&sess->lock);
+
+	if (sess->state != SESS_STATE_READY) {
+		mutex_unlock(&sess->lock);
+		ERR_NP("map_device: failed to map device '%s' from session %s,"
+		       " session is not connected\n", pathname, sess->str_addr);
+		return ERR_PTR(-ENOENT);
+	}
+
+	if (__find_sess_dev(sess, pathname)) {
+		mutex_unlock(&sess->lock);
+		ERR_NP("map_device: failed to map device '%s' from session %s,"
+		       " device with same path is already mapped\n", pathname,
+		       sess->str_addr);
+		return ERR_PTR(-EEXIST);
+	}
+
+	mutex_unlock(&sess->lock);
+	dev = init_dev(sess, access_mode, queue_mode, pathname);
+	if (IS_ERR(dev)) {
+		ERR_NP("map_device: failed to map device '%s' from session %s,"
+		       " can't initialize device, errno: %ld\n", pathname,
+		       sess->str_addr, PTR_ERR(dev));
+		return dev;
+	}
+
+	ibnbd_clt_get_sess(sess);
+
+	open_compl = kmalloc(sizeof(*open_compl), GFP_KERNEL);
+	if (!open_compl) {
+		ERR(dev, "map_device: failed, Can't allocate memory for"
+		    " completion\n");
+		ret = -ENOMEM;
+		goto out;
+	}
+	init_completion(open_compl);
+	dev->open_compl = open_compl;
+	dev->io_mode = io_mode;
+
+	ret = open_remote_device(dev);
+	if (ret) {
+		ERR(dev, "map_device: failed, can't open remote device,"
+		    " errno: %d\n", ret);
+		kfree(open_compl);
+		dev->open_compl = NULL;
+		ret = -EINVAL;
+		goto out;
+	}
+	wait_for_completion(dev->open_compl);
+	mutex_lock(&dev->lock);
+
+	kfree(open_compl);
+	dev->open_compl = NULL;
+
+	if (!ibnbd_clt_dev_is_open(dev)) {
+		mutex_unlock(&dev->lock);
+		ret = dev->open_errno;
+		ERR(dev, "map_device: failed errno: %d\n", ret);
+		goto out;
+	}
+
+	mutex_lock(&sess->lock);
+	list_add(&dev->list, &sess->devs_list);
+	mutex_unlock(&sess->lock);
+
+	spin_lock(&dev_lock);
+	list_add(&dev->g_list, &devs_list);
+	spin_unlock(&dev_lock);
+
+	DEB("Opened remote device: server=%s, path='%s'\n", sess->str_addr,
+	    pathname);
+	ret = ibnbd_client_setup_device(sess, dev, dev->clt_device_id);
+	if (ret) {
+		ERR(dev, "map_device: Failed to configure device, errno: %d\n",
+		    ret);
+		mutex_unlock(&dev->lock);
+		ret = -EINVAL;
+		goto out_close;
+	}
+
+	INFO(dev, "map_device: Device mapped as %s (nsectors: %zu,"
+	     " logical_block_size: %d, physical_block_size: %d,"
+	     " max_write_same_sectors: %d, max_discard_sectors: %d,"
+	     " discard_zeroes_data: %d, discard_granularity: %d,"
+	     " discard_alignment: %d, secure_discard: %d, max_segments: %d,"
+	     " max_hw_sectors: %d, rotational: %d)\n", dev->gd->disk_name,
+	     dev->nsectors, dev->logical_block_size, dev->physical_block_size,
+	     dev->max_write_same_sectors, dev->max_discard_sectors,
+	     dev->discard_zeroes_data, dev->discard_granularity,
+	     dev->discard_alignment, dev->secure_discard,
+	     dev->max_segments, dev->max_hw_sectors, dev->rotational);
+
+	mutex_unlock(&dev->lock);
+
+	ibnbd_clt_add_gen_disk(dev);
+
+	return dev;
+
+out_close:
+	if (!WARN_ON(ibnbd_close_device(dev, true)))
+		wait_for_completion(dev->close_compl);
+out:
+	ibnbd_clt_put_dev(dev);
+	return ERR_PTR(ret);
+}
+
+void ibnbd_destroy_gen_disk(struct ibnbd_dev *dev)
+{
+	del_gendisk(dev->gd);
+	/*
+	 * Before marking the queue as dying (blk_cleanup_queue() does that)
+	 * we have to be sure that everything in-flight has gone.
+	 * Cycle freeze/unfreeze to flush it out.
+	 */
+	blk_mq_freeze_queue(dev->queue);
+	blk_mq_unfreeze_queue(dev->queue);
+	blk_cleanup_queue(dev->queue);
+	put_disk(dev->gd);
+
+	ibnbd_clt_put_dev(dev);
+}
+
+static int __close_device(struct ibnbd_dev *dev, bool force)
+__must_hold(&dev->sess->lock)
+{
+	enum ibnbd_dev_state prev_state;
+	int refcount, ret = 0;
+
+	mutex_lock(&dev->lock);
+
+	if (dev->dev_state == DEV_STATE_UNMAPPED) {
+		INFO(dev, "Device is already being unmapped\n");
+		ret = -EALREADY;
+		goto out;
+	}
+
+	refcount = atomic_read(&dev->refcount);
+	if (!force && refcount > 1) {
+		ERR(dev, "Closing device failed, device is in use,"
+		    " (%d device users)\n", refcount - 1);
+		ret = -EBUSY;
+		goto out;
+	}
+
+	prev_state = dev->dev_state;
+	dev->dev_state = DEV_STATE_UNMAPPED;
+
+	list_del(&dev->list);
+
+	spin_lock(&dev_lock);
+	list_del(&dev->g_list);
+	spin_unlock(&dev_lock);
+
+	ibnbd_clt_remove_dev_symlink(dev);
+	mutex_unlock(&dev->lock);
+
+	mutex_unlock(&dev->sess->lock);
+	if (prev_state == DEV_STATE_OPEN && dev->sess->sess) {
+		if (send_msg_close(dev->sess->sess, dev->device_id))
+			complete(dev->close_compl);
+	} else {
+		complete(dev->close_compl);
+	}
+
+	mutex_lock(&dev->sess->lock);
+	INFO(dev, "Device is unmapped\n");
+	return 0;
+out:
+	mutex_unlock(&dev->lock);
+	return ret;
+}
+
+int ibnbd_close_device(struct ibnbd_dev *dev, bool force)
+{
+	int ret;
+
+	mutex_lock(&dev->sess->lock);
+	ret = __close_device(dev, force);
+	mutex_unlock(&dev->sess->lock);
+
+	return ret;
+}
+
+static void ibnbd_destroy_sessions(void)
+{
+	struct ibnbd_session *sess, *sn;
+	struct ibnbd_dev *dev, *tn;
+	int ret;
+
+	list_for_each_entry_safe(sess, sn, &session_list, list) {
+		if (!ibnbd_clt_get_sess(sess))
+			continue;
+		mutex_lock(&sess->lock);
+		sess->state = SESS_STATE_DESTROYED;
+		list_for_each_entry_safe(dev, tn, &sess->devs_list, list) {
+			if (!kobject_get(&dev->kobj))
+				continue;
+			ret = __close_device(dev, true);
+			if (ret)
+				WRN(dev, "Closing device failed, errno: %d\n",
+				    ret);
+			else
+				wait_for_completion(dev->close_compl);
+			ibnbd_clt_schedule_dev_destroy(dev);
+			kobject_put(&dev->kobj);
+		}
+		mutex_unlock(&sess->lock);
+		ibnbd_clt_put_sess(sess);
+	}
+}
+
+static int __init ibnbd_client_init(void)
+{
+	int err;
+
+	INFO_NP("Loading module ibnbd_client, version: "
+		__stringify(IBNBD_VER) " (softirq_enable: %d)\n",
+		softirq_enable);
+
+	ibnbd_client_major = register_blkdev(ibnbd_client_major, "ibnbd");
+	if (ibnbd_client_major <= 0) {
+		ERR_NP("Failed to load module,"
+		       " block device registration failed\n");
+		err = -EBUSY;
+		goto out;
+	}
+
+	ops.owner	= THIS_MODULE;
+	ops.recv	= ibnbd_clt_recv;
+	ops.rdma_ev	= ibnbd_clt_rdma_ev;
+	ops.sess_ev	= ibnbd_clt_sess_ev;
+	err = ibtrs_clt_register(&ops);
+	if (err) {
+		ERR_NP("Failed to load module, IBTRS registration failed,"
+		       " errno: %d\n", err);
+		goto out_unregister_blk;
+	}
+	err = ibnbd_clt_create_sysfs_files();
+	if (err) {
+		ERR_NP("Failed to load module,"
+		       " creating sysfs device files failed, error: %d\n", err);
+		goto out_unregister;
+	}
+
+	return 0;
+
+out_unregister:
+	ibtrs_clt_unregister(&ops);
+out_unregister_blk:
+	unregister_blkdev(ibnbd_client_major, "ibnbd");
+out:
+	return err;
+}
+
+static void __exit ibnbd_client_exit(void)
+{
+	INFO_NP("Unloading module\n");
+	ibnbd_clt_destroy_default_group();
+	flush_scheduled_work();
+	ibnbd_destroy_sessions();
+	wait_event(sess_list_waitq, list_empty(&session_list));
+	ibnbd_clt_destroy_sysfs_files();
+	ibtrs_clt_unregister(&ops);
+	unregister_blkdev(ibnbd_client_major, "ibnbd");
+	idr_destroy(&g_index_idr);
+	INFO_NP("Module unloaded\n");
+}
+
+module_init(ibnbd_client_init);
+module_exit(ibnbd_client_exit);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH 16/28] ibnbd_clt: add main functionality of ibnbd_client
@ 2017-03-24 10:45   ` Jack Wang
  0 siblings, 0 replies; 87+ messages in thread
From: Jack Wang @ 2017-03-24 10:45 UTC (permalink / raw)
  To: linux-block-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA, axboe-tSWWG44O7X1aa/9Udqfwiw,
	hch-jcswGhMUV9g, mail-99BIx50xQYGELgA04lAiVw,
	Milind.dumbare-Re5JQEeQqe8AvxtiuMwx3w,
	yun.wang-EIkl63zCoXaH+58JC4qpiA, Jack Wang, Kleber Souza,
	Danil Kipnis, Roman Pen

From: Jack Wang <jinpu.wang-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>

It provides an interface to map remote devices as local block devices
(/dev/ibnbd<N>) and prepares the IO requests for the transfer.

It supports both request mode and multiqueue mode.
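
A rough sketch of how these pieces are intended to fit together. The helper
below is illustrative only and not part of the patch: it ignores session
reference counting and picks IBNBD_ACCESS_RO, BLK_MQ and IBNBD_FILEIO purely
as example mode values; only the ibnbd_* calls and their signatures come
from the code in this file:

  static struct ibnbd_dev *map_device_sketch(const struct sockaddr_storage *addr,
                                             const char *pathname)
  {
          struct ibnbd_session *sess;

          /* Reuse an existing session to this server, if any. */
          sess = ibnbd_clt_find_sess(addr);
          if (!sess) {
                  sess = ibnbd_create_session(addr);
                  if (IS_ERR(sess))
                          return ERR_CAST(sess);
          }

          /* Sends IBNBD_MSG_OPEN and sets up the gendisk on success. */
          return ibnbd_client_add_device(sess, pathname, IBNBD_ACCESS_RO,
                                         BLK_MQ, IBNBD_FILEIO);
  }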

Signed-off-by: Jack Wang <jinpu.wang-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
Signed-off-by: Kleber Souza <kleber.souza-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
Signed-off-by: Danil Kipnis <danil.kipnis-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
Signed-off-by: Roman Pen <roman.penyaev-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
---
 drivers/block/ibnbd_client/ibnbd_clt.c | 2007 ++++++++++++++++++++++++++++++++
 1 file changed, 2007 insertions(+)
 create mode 100644 drivers/block/ibnbd_client/ibnbd_clt.c

diff --git a/drivers/block/ibnbd_client/ibnbd_clt.c b/drivers/block/ibnbd_client/ibnbd_clt.c
new file mode 100644
index 0000000..945c8df
--- /dev/null
+++ b/drivers/block/ibnbd_client/ibnbd_clt.c
@@ -0,0 +1,2007 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler < mail-99BIx50xQYGELgA04lAiVw@public.gmane.org>
+ *          Jack Wang <jinpu.wang-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
+ *   	    Kleber Souza <kleber.souza-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
+ * 	    Danil Kipnis <danil.kipnis-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
+ *   	    Roman Pen <roman.penyaev-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
+ *          Milind Dumbare <Milind.dumbare-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
+ *
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions, and the following disclaimer,
+ *    without modification.
+ * 2. Redistributions in binary form must reproduce at minimum a disclaimer
+ *    substantially similar to the "NO WARRANTY" disclaimer below
+ *    ("Disclaimer") and any redistribution must be conditioned upon
+ *    including a substantially similar Disclaimer requirement for further
+ *    binary redistribution.
+ * 3. Neither the names of the above-listed copyright holders nor the names
+ *    of any contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * NO WARRANTY
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * HOLDERS OR CONTRIBUTORS BE LIABLE FOR SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
+ * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGES.
+ *
+ */
+
+#include <linux/module.h>
+#include <linux/blkdev.h>
+#include <linux/hdreg.h>		/* for hd_geometry */
+#include <linux/scatterlist.h>
+#include <linux/idr.h>
+#include <linux/fs.h>
+#include <linux/uaccess.h>
+#include <rdma/ib.h>
+#include <uapi/linux/in6.h>
+
+#include "ibnbd_clt.h"
+#include "ibnbd_clt_sysfs.h"
+#include "../ibnbd_inc/ibnbd.h"
+#include <rdma/ibtrs.h>
+
+MODULE_AUTHOR("ibnbd-EIkl63zCoXaH+58JC4qpiA@public.gmane.org");
+MODULE_DESCRIPTION("InfiniBand Network Block Device Client");
+MODULE_VERSION(__stringify(IBNBD_VER));
+MODULE_LICENSE("GPL");
+
+static int ibnbd_client_major;
+static DEFINE_IDR(g_index_idr);
+static DEFINE_RWLOCK(g_index_lock);
+static DEFINE_SPINLOCK(sess_lock);
+static DEFINE_SPINLOCK(dev_lock);
+static LIST_HEAD(session_list);
+static LIST_HEAD(devs_list);
+static DECLARE_WAIT_QUEUE_HEAD(sess_list_waitq);
+static struct ibtrs_clt_ops ops;
+
+static bool softirq_enable;
+module_param(softirq_enable, bool, 0444);
+MODULE_PARM_DESC(softirq_enable, "finish request in softirq_fn."
+		 " (default: 0)");
+/*
+ * Maximum number of partitions an instance can have.
+ * 6 bits = 64 minors = 63 partitions (one minor is used for the device itself)
+ */
+#define IBNBD_PART_BITS		6
+#define KERNEL_SECTOR_SIZE      512
+
+inline bool ibnbd_clt_dev_is_open(struct ibnbd_dev *dev)
+{
+	return dev->dev_state == DEV_STATE_OPEN;
+}
+
+static void ibnbd_clt_put_dev(struct ibnbd_dev *dev)
+{
+	if (!atomic_dec_if_positive(&dev->refcount)) {
+		write_lock(&g_index_lock);
+		idr_remove(&g_index_idr, dev->clt_device_id);
+		write_unlock(&g_index_lock);
+		kfree(dev->hw_queues);
+		kfree(dev->close_compl);
+		ibnbd_clt_put_sess(dev->sess);
+		kfree(dev);
+	}
+}
+
+static int ibnbd_clt_get_dev(struct ibnbd_dev *dev)
+{
+	return atomic_inc_not_zero(&dev->refcount);
+}
+
+static struct ibnbd_dev *g_get_dev(int dev_id)
+{
+	struct ibnbd_dev *dev;
+
+	read_lock(&g_index_lock);
+	dev = idr_find(&g_index_idr, dev_id);
+	if (!dev)
+		dev = ERR_PTR(-ENXIO);
+	read_unlock(&g_index_lock);
+
+	return dev;
+}
+
+static void ibnbd_clt_set_dev_attr(struct ibnbd_dev *dev,
+				   const struct ibnbd_msg_open_rsp *rsp)
+{
+	dev->device_id			= rsp->device_id;
+	dev->nsectors			= rsp->nsectors;
+	dev->logical_block_size		= rsp->logical_block_size;
+	dev->physical_block_size	= rsp->physical_block_size;
+	dev->max_write_same_sectors	= rsp->max_write_same_sectors;
+	dev->max_discard_sectors	= rsp->max_discard_sectors;
+	dev->discard_zeroes_data	= rsp->discard_zeroes_data;
+	dev->discard_granularity	= rsp->discard_granularity;
+	dev->discard_alignment		= rsp->discard_alignment;
+	dev->secure_discard		= rsp->secure_discard;
+	dev->rotational			= rsp->rotational;
+	dev->remote_io_mode		= rsp->io_mode;
+
+	if (dev->remote_io_mode == IBNBD_FILEIO) {
+		dev->max_hw_sectors = dev->sess->max_io_size /
+			rsp->logical_block_size;
+		dev->max_segments = BMAX_SEGMENTS;
+	} else {
+		dev->max_hw_sectors = dev->sess->max_io_size /
+			rsp->logical_block_size <
+			rsp->max_hw_sectors ?
+			dev->sess->max_io_size /
+			rsp->logical_block_size : rsp->max_hw_sectors;
+		dev->max_segments = rsp->max_segments > BMAX_SEGMENTS ?
+				    BMAX_SEGMENTS : rsp->max_segments;
+	}
+}
+
+static void ibnbd_clt_revalidate_disk(struct ibnbd_dev *dev,
+				      size_t new_nsectors)
+{
+	int err = 0;
+
+	INFO(dev, "Device size changed from %zu to %zu sectors\n",
+	     dev->nsectors, new_nsectors);
+	dev->nsectors = new_nsectors;
+	set_capacity(dev->gd,
+		     dev->nsectors * (dev->logical_block_size /
+				      KERNEL_SECTOR_SIZE));
+	err = revalidate_disk(dev->gd);
+	if (err)
+		ERR(dev, "Failed to change device size from"
+		    " %zu to %zu, errno: %d\n", dev->nsectors,
+		     new_nsectors, err);
+}
+
+static void process_msg_sess_info_rsp(struct ibnbd_session *sess,
+				      struct ibnbd_msg_sess_info_rsp *msg)
+{
+	sess->ver = min_t(u8, msg->ver, IBNBD_VERSION);
+	DEB("Session to %s (%s) using protocol version %d (client version: %d,"
+	    " server version: %d)\n", sess->str_addr, sess->hostname, sess->ver,
+	    IBNBD_VERSION, msg->ver);
+}
+
+static int process_msg_open_rsp(struct ibnbd_session *sess,
+				struct ibnbd_msg_open_rsp *rsp)
+{
+	struct ibnbd_dev *dev;
+	int err = 0;
+
+	dev = g_get_dev(rsp->clt_device_id);
+	if (IS_ERR(dev)) {
+		ERR_NP("Open-Response message received from session %s"
+		       " for unknown device (id: %d)\n", sess->str_addr,
+		       rsp->clt_device_id);
+		return -ENOENT;
+	}
+
+	if (!ibnbd_clt_get_dev(dev)) {
+		ERR_NP("Failed to process Open-Response message from session"
+		       " %s, unable to get reference to device (id: %d)",
+		       sess->str_addr, rsp->clt_device_id);
+		return -ENOENT;
+	}
+
+	mutex_lock(&dev->lock);
+
+	if (dev->dev_state == DEV_STATE_UNMAPPED) {
+		INFO(dev, "Ignoring Open-Response message from server for "
+		     " unmapped device\n");
+		err = -ENOENT;
+		goto out;
+	}
+
+	if (rsp->result) {
+		ERR(dev, "Server failed to open device for mapping, errno:"
+		    " %d\n", rsp->result);
+		dev->open_errno = rsp->result;
+		if (dev->open_compl)
+			complete(dev->open_compl);
+		goto out;
+	}
+
+	if (dev->dev_state == DEV_STATE_CLOSED) {
+		/* if the device was remapped and the size changed in the
+		 * meantime we need to revalidate it
+		 */
+		if (dev->nsectors != rsp->nsectors)
+			ibnbd_clt_revalidate_disk(dev, (size_t)rsp->nsectors);
+		INFO(dev, "Device online, device remapped successfully\n");
+	}
+
+	ibnbd_clt_set_dev_attr(dev, rsp);
+
+	dev->dev_state = DEV_STATE_OPEN;
+	if (dev->open_compl)
+		complete(dev->open_compl);
+
+out:
+	mutex_unlock(&dev->lock);
+	ibnbd_clt_put_dev(dev);
+
+	return err;
+}
+
+static void process_msg_revalidate(struct ibnbd_session *sess,
+				   struct ibnbd_msg_revalidate *msg)
+{
+	struct ibnbd_dev *dev;
+
+	dev = g_get_dev(msg->clt_device_id);
+	if (IS_ERR(dev)) {
+		ERR_NP("Received device revalidation message from session %s"
+		       " for non-existent device (id %d)\n", sess->str_addr,
+		       msg->clt_device_id);
+		return;
+	}
+
+	if (!ibnbd_clt_get_dev(dev)) {
+		ERR_NP("Failed to process device revalidation message from"
+		       " session %s, unable to get reference to device"
+		       " (id: %d)", sess->str_addr, msg->clt_device_id);
+		return;
+	}
+
+	mutex_lock(&dev->lock);
+
+	if (dev->dev_state == DEV_STATE_UNMAPPED) {
+		ERR(dev, "Received device revalidation message"
+		    " for unmapped device\n");
+		goto out;
+	}
+
+	if (dev->nsectors != msg->nsectors &&
+	    dev->dev_state == DEV_STATE_OPEN) {
+		ibnbd_clt_revalidate_disk(dev, (size_t)msg->nsectors);
+	} else {
+		INFO(dev, "Ignoring device revalidate message, "
+		     "current device size is the same as in the "
+		     "revalidate message, %llu sectors\n", msg->nsectors);
+	}
+
+out:
+	mutex_unlock(&dev->lock);
+	ibnbd_clt_put_dev(dev);
+}
+
+static int send_msg_close(struct ibtrs_session *sess, u32 device_id)
+{
+	struct ibnbd_msg_close msg;
+	struct kvec vec = {
+		.iov_base = &msg,
+		.iov_len  = sizeof(msg)
+	};
+
+	msg.hdr.type	= IBNBD_MSG_CLOSE;
+	msg.device_id	= device_id;
+
+	return ibtrs_clt_send(sess, &vec, 1);
+}
+
+static void ibnbd_clt_recv(void *priv, const void *msg, size_t len)
+{
+	const struct ibnbd_msg_hdr *hdr = msg;
+	struct ibnbd_session *sess = priv;
+
+	if (unlikely(WARN_ON(!hdr) || ibnbd_validate_message(msg, len)))
+		return;
+
+	print_hex_dump_debug("", DUMP_PREFIX_OFFSET, 8, 1, msg, len, true);
+
+	switch (hdr->type) {
+	case IBNBD_MSG_SESS_INFO_RSP: {
+		struct ibnbd_msg_sess_info_rsp *rsp =
+			(struct ibnbd_msg_sess_info_rsp *)msg;
+
+		process_msg_sess_info_rsp(sess, rsp);
+		if (sess->sess_info_compl)
+			complete(sess->sess_info_compl);
+		break;
+	}
+	case IBNBD_MSG_OPEN_RSP: {
+		int err;
+		struct ibnbd_msg_open_rsp *rsp =
+			(struct ibnbd_msg_open_rsp *)msg;
+
+		if (process_msg_open_rsp(sess, rsp) && !rsp->result) {
+			ERR_NP("Failed to process open response message from"
+			       " server, sending close message for dev id:"
+			       " %u\n", rsp->device_id);
+
+			err = send_msg_close(sess->sess, rsp->device_id);
+			if (err)
+				ERR_NP("Failed to send close msg for device"
+				       " with id: %u, errno: %d\n",
+				       rsp->device_id, err);
+		}
+
+		break;
+	}
+	case IBNBD_MSG_CLOSE_RSP: {
+		struct ibnbd_dev *dev;
+		struct ibnbd_msg_close_rsp *rsp =
+			(struct ibnbd_msg_close_rsp *)msg;
+
+		dev = g_get_dev(rsp->clt_device_id);
+		if (IS_ERR(dev)) {
+			ERR_NP("Close-Response message received from session %s"
+			       " for unknown device (id: %u)\n", sess->str_addr,
+			       rsp->clt_device_id);
+			break;
+		}
+		if (dev->close_compl && dev->dev_state == DEV_STATE_UNMAPPED)
+			complete(dev->close_compl);
+
+		break;
+	}
+	case IBNBD_MSG_REVAL:
+		process_msg_revalidate(sess,
+				       (struct ibnbd_msg_revalidate *)msg);
+		break;
+	default:
+		ERR_NP("IBNBD message with unknown type %d received from"
+		       " session %s\n", hdr->type, sess->str_addr);
+		break;
+	}
+}
+
+static void ibnbd_blk_delay_work(struct work_struct *work)
+{
+	struct ibnbd_dev *dev;
+
+	dev = container_of(work, struct ibnbd_dev, rq_delay_work.work);
+	spin_lock_irq(dev->queue->queue_lock);
+	blk_start_queue(dev->queue);
+	spin_unlock_irq(dev->queue->queue_lock);
+}
+
+/*
+ * Unlike the original blk_delay_queue(), the delayed work scheduled here
+ * restarts the queue via blk_start_queue(), which clears the stopped flag,
+ * so the behaviour matches the MQ path.
+ */
+static void ibnbd_blk_delay_queue(struct ibnbd_dev *dev, unsigned long msecs)
+{
+	int cpu = get_cpu();
+
+	kblockd_schedule_delayed_work_on(cpu, &dev->rq_delay_work,
+					 msecs_to_jiffies(msecs));
+	put_cpu();
+}
+
+static inline void ibnbd_dev_requeue(struct ibnbd_queue *q)
+{
+	struct ibnbd_dev *dev = q->dev;
+
+	if (dev->queue_mode == BLK_MQ) {
+		if (WARN_ON(!q->hctx))
+			return;
+		blk_mq_delay_queue(q->hctx, 0);
+	} else if (dev->queue_mode == BLK_RQ) {
+		ibnbd_blk_delay_queue(q->dev, 0);
+	} else {
+		WARN(1, "We support requeueing only for RQ or MQ");
+	}
+}
+
+enum {
+	IBNBD_DELAY_10ms   = 10,
+	IBNBD_DELAY_IFBUSY = -1,
+};
+
+/**
+ * ibnbd_get_cpu_qlist() - finds a list with HW queues to be requeued
+ *
+ * Description:
+ *     Each CPU has a list of HW queues which need to be requeued.  If a list
+ *     is not empty, it is marked with a bit in a bitmap.  This function finds
+ *     the first set bit starting from @cpu and returns the corresponding
+ *     CPU list.
+ */
+static struct ibnbd_cpu_qlist *ibnbd_get_cpu_qlist(struct ibnbd_session *sess,
+						   int cpu)
+{
+	int bit;
+
+	/* First half */
+	bit = find_next_bit(sess->cpu_queues_bm, nr_cpu_ids, cpu);
+	if (bit < nr_cpu_ids) {
+		return per_cpu_ptr(sess->cpu_queues, bit);
+	} else if (cpu != 0) {
+		/* Second half */
+		bit = find_next_bit(sess->cpu_queues_bm, cpu, 0);
+		if (bit < cpu)
+			return per_cpu_ptr(sess->cpu_queues, bit);
+	}
+
+	return NULL;
+}
+
+static inline int nxt_cpu(int cpu)
+{
+	return (cpu + 1) % NR_CPUS;
+}
+
+/**
+ * get_cpu_rr_var() - returns pointer to percpu var containing last cpu requeued
+ *
+ * It also sets the var to the current cpu if the var was never set before
+ * (== -1).
+ */
+#define get_cpu_rr_var(percpu)				\
+({							\
+	int *cpup;					\
+							\
+	cpup = &get_cpu_var(*percpu);			\
+	if (unlikely(*cpup < 0))			\
+		*cpup = smp_processor_id();		\
+	cpup;						\
+})
+
+/**
+ * ibnbd_requeue_if_needed() - requeue if CPU queue is marked as non empty
+ *
+ * Description:
+ *     Each CPU has its own list of HW queues which should be requeued.
+ *     Function finds such list with HW queues, takes a list lock, picks up
+ *     the first HW queue out of the list and requeues it.
+ *
+ * Return:
+ *     True if the queue was requeued, false otherwise.
+ *
+ * Context:
+ *     Does not matter.
+ */
+static inline bool ibnbd_requeue_if_needed(struct ibnbd_session *sess)
+{
+	struct ibnbd_queue *q = NULL;
+	struct ibnbd_cpu_qlist *cpu_q;
+	unsigned long flags;
+	int cpuv;
+
+	int *uninitialized_var(cpup);
+
+	/*
+	 * To keep fairness and not let other queues starve, we always try
+	 * to wake up someone else in a round-robin manner.  That of course
+	 * increases latency, but every queue gets a chance to be executed.
+	 */
+	cpup = get_cpu_rr_var(sess->cpu_rr);
+	cpuv = (*cpup + 1) % num_online_cpus();
+	for (cpu_q = ibnbd_get_cpu_qlist(sess, cpuv); cpu_q;
+	     cpu_q = ibnbd_get_cpu_qlist(sess, nxt_cpu(cpu_q->cpu))) {
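+		/*
+		 * Skip this per-CPU list if its lock is contended and move
+		 * on to the next CPU.
+		 */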
+		if (!spin_trylock_irqsave(&cpu_q->requeue_lock, flags))
+			continue;
+		if (likely(test_bit(cpu_q->cpu, sess->cpu_queues_bm))) {
+			q = list_first_entry_or_null(&cpu_q->requeue_list,
+						     typeof(*q), requeue_list);
+			if (WARN_ON(!q))
+				goto clear_bit;
+			list_del_init(&q->requeue_list);
+			clear_bit_unlock(0, &q->in_list);
+
+			if (list_empty(&cpu_q->requeue_list)) {
+				/* Clear bit if nothing is left */
+clear_bit:
+				clear_bit(cpu_q->cpu, sess->cpu_queues_bm);
+			}
+		}
+		spin_unlock_irqrestore(&cpu_q->requeue_lock, flags);
+
+		if (q)
+			break;
+	}
+
+	/*
+	 * Save the CPU that is going to be requeued in the per-cpu var.  Just
+	 * incrementing it doesn't work, because ibnbd_get_cpu_qlist() would
+	 * always return the first CPU with something on the queue list when
+	 * the value stored in the var is greater than the last CPU with
+	 * something on the list.
+	 */
+	if (cpu_q)
+		*cpup = cpu_q->cpu;
+	put_cpu_var(sess->cpu_rr);
+
+	if (q)
+		ibnbd_dev_requeue(q);
+
+	return !!q;
+}
+
+/**
+ * ibnbd_requeue_all_if_idle() - requeue all queues left in the list if
+ *     session is idling (there are no requests in-flight).
+ *
+ * Description:
+ *     This function tries to rerun all stopped queues if there are no
+ *     requests in-flight anymore.  It solves an obvious problem that arises
+ *     when the number of tags is smaller than the number of queues (hctxs)
+ *     which are stopped and put to sleep.  If the last tag that has just
+ *     been put does not wake up all remaining queues (hctxs), IO requests
+ *     hang forever.
+ *
+ *     That can happen when all tags, say N, have been exhausted from one
+ *     CPU, and we have many block devices per session, say M.  Each block
+ *     device has its own queue (hctx) for each CPU, so eventually we can
+ *     put that number of queues (hctxs) to sleep: M x NR_CPUS.
+ *     If the number of tags N < M x NR_CPUS, we will finally get an IO hang.
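+ *
+ *     For example (illustrative numbers only): with N = 256 tags, M = 32
+ *     devices and 16 CPUs, up to 32 x 16 = 512 hctxs can end up stopped
+ *     while only 256 tags exist.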
+ *
+ *     To avoid this hang, the last caller of ibnbd_put_tag() (i.e. the one
+ *     that observes sess->busy == 0) must wake up all remaining queues.
+ *
+ * Context:
+ *     Does not matter.
+ */
+static inline void ibnbd_requeue_all_if_idle(struct ibnbd_session *sess)
+{
+	bool requeued;
+
+	do {
+		requeued = ibnbd_requeue_if_needed(sess);
+	} while (atomic_read(&sess->busy) == 0 && requeued);
+}
+
+static struct ibtrs_tag *ibnbd_get_tag(struct ibnbd_session *sess, int cpu,
+				       size_t tag_bytes, int wait)
+{
+	struct ibtrs_tag *tag;
+
+	tag = ibtrs_get_tag(sess->sess, cpu, tag_bytes,
+			    wait ? IBTRS_TAG_WAIT : IBTRS_TAG_NOWAIT);
+	if (likely(tag))
+		/* We have a subtle rare case here, when all tags can be
+		 * consumed before the busy counter is increased.  This is
+		 * safe, because the loser will get NULL as a tag, observe
+		 * a zero busy counter and immediately restart the queue
+		 * itself.
+		 */
+		atomic_inc(&sess->busy);
+
+	return tag;
+}
+
+static void ibnbd_put_tag(struct ibnbd_session *sess, struct ibtrs_tag *tag)
+{
+	ibtrs_put_tag(sess->sess, tag);
+	atomic_dec(&sess->busy);
+	/* Paired with ibnbd_dev_add_to_requeue().  Decrement first
+	 * and then check queue bits.
+	 */
+	smp_mb__after_atomic();
+	ibnbd_requeue_all_if_idle(sess);
+}
+
+static struct ibnbd_iu *ibnbd_get_iu(struct ibnbd_session *sess,
+				     size_t tag_bytes, int wait)
+{
+	struct ibnbd_iu *iu;
+	struct ibtrs_tag *tag;
+
+	tag = ibnbd_get_tag(sess, -1, tag_bytes,
+			    wait ? IBTRS_TAG_WAIT : IBTRS_TAG_NOWAIT);
+	if (unlikely(!tag))
+		return NULL;
+	iu = ibtrs_tag_to_pdu(tag);
+	iu->tag = tag; /* ibtrs_tag_from_pdu() could be used instead of
+			* storing the tag here, but we also have to keep
+			* MQ mode in mind
+			*/
+
+	return iu;
+}
+
+static void ibnbd_put_iu(struct ibnbd_session *sess, struct ibnbd_iu *iu)
+{
+	ibnbd_put_tag(sess, iu->tag);
+}
+
+static void ibnbd_softirq_done_fn(struct request *rq)
+{
+	struct ibnbd_dev *dev		= rq->rq_disk->private_data;
+	struct ibnbd_session *sess	= dev->sess;
+	struct ibnbd_iu *iu;
+
+	switch (dev->queue_mode) {
+	case BLK_MQ:
+		iu = blk_mq_rq_to_pdu(rq);
+		ibnbd_put_tag(sess, iu->tag);
+		blk_mq_end_request(rq, iu->errno);
+		if (rq->rq_flags & RQF_SPECIAL_PAYLOAD)
+			__free_page(rq->special_vec.bv_page);
+		break;
+	case BLK_RQ:
+		iu = rq->special;
+		blk_end_request_all(rq, iu->errno);
+		if (rq->rq_flags & RQF_SPECIAL_PAYLOAD)
+			__free_page(rq->special_vec.bv_page);
+		break;
+	default:
+		WARN(true, "dev->queue_mode contains unexpected"
+		     " value: %d. Memory corruption? In-flight I/O stalled!\n",
+		     dev->queue_mode);
+		return;
+	}
+}
+
+static void ibnbd_clt_rdma_ev(void *priv, enum ibtrs_clt_rdma_ev ev, int errno)
+{
+	struct ibnbd_iu *iu		= (struct ibnbd_iu *)priv;
+	struct ibnbd_dev *dev		= iu->dev;
+	struct request *rq;
+	const int flags = iu->msg.rw;
+	bool is_read;
+
+	switch (dev->queue_mode) {
+	case BLK_MQ:
+		rq = iu->rq;
+		is_read = req_op(rq) == REQ_OP_READ;
+		iu->errno = errno;
+		if (softirq_enable) {
+			blk_mq_complete_request(rq, errno);
+		} else {
+			ibnbd_put_tag(dev->sess, iu->tag);
+			blk_mq_end_request(rq, errno);
+
+			if (rq->rq_flags & RQF_SPECIAL_PAYLOAD)
+				__free_page(rq->special_vec.bv_page);
+
+		}
+		break;
+	case BLK_RQ:
+		rq = iu->rq;
+		is_read = req_op(rq) == REQ_OP_READ;
+		iu->errno = errno;
+		if (softirq_enable) {
+			blk_complete_request(rq);
+		} else {
+			blk_end_request_all(rq, errno);
+			if (rq->rq_flags & RQF_SPECIAL_PAYLOAD)
+				__free_page(rq->special_vec.bv_page);
+		}
+		break;
+	default:
+		WARN(true, "dev->queue_mode contains unexpected"
+		     " value: %d. Memory corruption? In-flight I/O stalled!\n",
+		     dev->queue_mode);
+		return;
+	}
+
+	if (errno)
+		INFO_RL(dev, "%s I/O failed with status: %d, flags: 0x%x\n",
+			is_read ? "read" : "write", errno, flags);
+}
+
+static int send_msg_open(struct ibnbd_dev *dev)
+{
+	int err;
+	struct ibnbd_msg_open msg;
+	struct kvec vec = {
+		.iov_base = &msg,
+		.iov_len  = sizeof(msg)
+	};
+
+	msg.hdr.type		= IBNBD_MSG_OPEN;
+	msg.clt_device_id	= dev->clt_device_id;
+	msg.access_mode		= dev->access_mode;
+	msg.io_mode		= dev->io_mode;
+	strlcpy(msg.dev_name, dev->pathname, sizeof(msg.dev_name));
+
+	err = ibtrs_clt_send(dev->sess->sess, &vec, 1);
+
+	return err;
+}
+
+static int send_msg_sess_info(struct ibnbd_session *sess)
+{
+	struct ibnbd_msg_sess_info msg;
+	struct kvec vec = {
+		.iov_base = &msg,
+		.iov_len  = sizeof(msg)
+	};
+
+	msg.hdr.type	= IBNBD_MSG_SESS_INFO;
+	msg.ver		= IBNBD_VERSION;
+
+	return ibtrs_clt_send(sess->sess, &vec, 1);
+}
+
+int open_remote_device(struct ibnbd_dev *dev)
+{
+	int err;
+
+	err = send_msg_open(dev);
+	if (unlikely(err)) {
+		ERR(dev, "Failed to send open msg, err: %d\n", err);
+		return err;
+	}
+	return 0;
+}
+
+static int find_dev_cb(int id, void *ptr, void *data)
+{
+	struct ibnbd_dev *dev = ptr;
+	struct ibnbd_session *sess = data;
+
+	if (dev->sess == sess && dev->dev_state == DEV_STATE_INIT &&
+	    dev->open_compl) {
+		dev->dev_state = DEV_STATE_INIT_CLOSED;
+		dev->open_errno = -ECOMM;
+		complete(dev->open_compl);
+		ERR(dev, "Device offline, session disconnected.\n");
+	} else if (dev->sess == sess && dev->dev_state == DEV_STATE_UNMAPPED &&
+	    dev->close_compl) {
+		complete(dev->close_compl);
+		ERR(dev, "Device closed, session disconnected.\n");
+	}
+
+	return 0;
+}
+
+static void __set_dev_states_closed(struct ibnbd_session *sess)
+{
+	struct ibnbd_dev *dev;
+
+	list_for_each_entry(dev, &sess->devs_list, list) {
+		mutex_lock(&dev->lock);
+		dev->dev_state = DEV_STATE_CLOSED;
+		dev->open_errno = -ECOMM;
+		if (dev->open_compl)
+			complete(dev->open_compl);
+		ERR(dev, "Device offline, session disconnected.\n");
+		mutex_unlock(&dev->lock);
+	}
+	read_lock(&g_index_lock);
+	idr_for_each(&g_index_idr, find_dev_cb, sess);
+	read_unlock(&g_index_lock);
+}
+
+static int update_sess_info(struct ibnbd_session *sess)
+{
+	int err;
+
+	sess->sess_info_compl = kmalloc(sizeof(*sess->sess_info_compl),
+					GFP_KERNEL);
+	if (!sess->sess_info_compl) {
+		ERR_NP("Failed to allocate memory for completion for session"
+		       " %s (%s)\n", sess->str_addr, sess->hostname);
+		return -ENOMEM;
+	}
+
+	init_completion(sess->sess_info_compl);
+
+	err = send_msg_sess_info(sess);
+	if (unlikely(err)) {
+		ERR_NP("Failed to send SESS_INFO message for session %s (%s)\n",
+		       sess->str_addr, sess->hostname);
+		goto out;
+	}
+
+	/* wait for IBNBD_MSG_SESS_INFO_RSP from server */
+	wait_for_completion(sess->sess_info_compl);
+out:
+	kfree(sess->sess_info_compl);
+	sess->sess_info_compl = NULL;
+
+	return err;
+}
+
+static void reopen_worker(struct work_struct *work)
+{
+	struct ibnbd_work *w;
+	struct ibnbd_session *sess;
+	struct ibnbd_dev *dev;
+	int err;
+
+	w = container_of(work, struct ibnbd_work, work);
+	sess = w->sess;
+	kfree(w);
+
+	mutex_lock(&sess->lock);
+	if (sess->state == SESS_STATE_DESTROYED) {
+		mutex_unlock(&sess->lock);
+		return;
+	}
+	err = update_sess_info(sess);
+	if (unlikely(err))
+		goto out;
+	list_for_each_entry(dev, &sess->devs_list, list) {
+		INFO(dev, "session reconnected, remapping device\n");
+		open_remote_device(dev);
+	}
+out:
+	mutex_unlock(&sess->lock);
+
+	ibnbd_clt_put_sess(sess);
+}
+
+static int ibnbd_schedule_reopen(struct ibnbd_session *sess)
+{
+	struct ibnbd_work *w;
+
+	w = kmalloc(sizeof(*w), GFP_KERNEL);
+	if (!w) {
+		ERR_NP("Failed to allocate memory to schedule reopen of"
+		       " devices for session %s\n", sess->str_addr);
+		return -ENOMEM;
+	}
+
+	if (WARN_ON(!ibnbd_clt_get_sess(sess))) {
+		kfree(w);
+		return -ENOENT;
+	}
+
+	w->sess = sess;
+	INIT_WORK(&w->work, reopen_worker);
+	schedule_work(&w->work);
+
+	return 0;
+}
+
+static void ibnbd_clt_sess_ev(void *priv, enum ibtrs_clt_sess_ev ev, int errno)
+{
+	struct ibnbd_session *sess = priv;
+	struct ibtrs_attrs attrs;
+
+	switch (ev) {
+	case IBTRS_CLT_SESS_EV_DISCONNECTED:
+		if (sess->sess_info_compl)
+			complete(sess->sess_info_compl);
+		mutex_lock(&sess->lock);
+		if (sess->state == SESS_STATE_DESTROYED) {
+			mutex_unlock(&sess->lock);
+			return;
+		}
+		sess->state = SESS_STATE_DISCONNECTED;
+		__set_dev_states_closed(sess);
+		mutex_unlock(&sess->lock);
+		break;
+	case IBTRS_CLT_SESS_EV_RECONNECT:
+		mutex_lock(&sess->lock);
+		if (sess->state == SESS_STATE_DESTROYED) {
+			/* This may happen if the session started to be closed
+			 * before the reconnect event arrived. In this case, we
+			 * just return and the session will be closed later
+			 */
+			mutex_unlock(&sess->lock);
+			return;
+		}
+		sess->state = SESS_STATE_READY;
+
+		mutex_unlock(&sess->lock);
+		memset(&attrs, 0, sizeof(attrs));
+		ibtrs_clt_query(sess->sess, &attrs);
+		strlcpy(sess->hostname, attrs.hostname, sizeof(sess->hostname));
+		sess->max_io_size = attrs.max_io_size;
+		ibnbd_schedule_reopen(sess);
+		break;
+	case IBTRS_CLT_SESS_EV_MAX_RECONN_EXCEEDED:
+		INFO_NP("Reconnect attempts exceeded for session %s\n",
+			sess->str_addr);
+		break;
+	default:
+		ERR_NP("Unknown session event received (%d), session: (%s)\n",
+		       ev, sess->str_addr);
+	}
+}
+
+static int ibnbd_cmp_sock_addr(const struct sockaddr_storage *a,
+			       const struct sockaddr_storage *b)
+{
+	if (a->ss_family == b->ss_family) {
+		switch (a->ss_family) {
+		case AF_INET:
+			return memcmp(&((struct sockaddr_in *)a)->sin_addr,
+				      &((struct sockaddr_in *)b)->sin_addr,
+				      sizeof(struct in_addr));
+		case AF_INET6:
+			return memcmp(&((struct sockaddr_in6 *)a)->sin6_addr,
+				      &((struct sockaddr_in6 *)b)->sin6_addr,
+				      sizeof(struct in6_addr));
+		case AF_IB:
+			return memcmp(&((struct sockaddr_ib *)a)->sib_addr,
+				      &((struct sockaddr_ib *)b)->sib_addr,
+				      sizeof(struct ib_addr));
+		default:
+			ERR_NP("Unknown address family: %d\n", a->ss_family);
+			return -EINVAL;
+		}
+	} else {
+		return -1;
+	}
+}
+
+struct ibnbd_session *ibnbd_clt_find_sess(const struct sockaddr_storage *addr)
+{
+	struct ibnbd_session *sess;
+
+	spin_lock(&sess_lock);
+	list_for_each_entry(sess, &session_list, list)
+		if (!ibnbd_cmp_sock_addr(&sess->addr, addr)) {
+			spin_unlock(&sess_lock);
+			return sess;
+		}
+	spin_unlock(&sess_lock);
+
+	return NULL;
+}
+
+static void ibnbd_init_cpu_qlists(struct ibnbd_cpu_qlist __percpu *cpu_queues)
+{
+	unsigned int cpu;
+	struct ibnbd_cpu_qlist *cpu_q;
+
+	for_each_online_cpu(cpu) {
+		cpu_q = per_cpu_ptr(cpu_queues, cpu);
+
+		cpu_q->cpu = cpu;
+		INIT_LIST_HEAD(&cpu_q->requeue_list);
+		spin_lock_init(&cpu_q->requeue_lock);
+	}
+}
+
+static struct blk_mq_ops ibnbd_mq_ops;
+static int setup_mq_tags(struct ibnbd_session *sess)
+{
+	struct blk_mq_tag_set *tags = &sess->tag_set;
+
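+	/*
+	 * One tag set per session, shared by all devices mapped over it
+	 * (note BLK_MQ_F_TAG_SHARED below).
+	 */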
+	memset(tags, 0, sizeof(*tags));
+	tags->ops		= &ibnbd_mq_ops;
+	tags->queue_depth	= sess->queue_depth;
+	tags->numa_node		= NUMA_NO_NODE;
+	tags->flags		= BLK_MQ_F_SHOULD_MERGE |
+				  BLK_MQ_F_SG_MERGE     |
+				  BLK_MQ_F_TAG_SHARED;
+	tags->cmd_size		= sizeof(struct ibnbd_iu);
+	tags->nr_hw_queues	= num_online_cpus();
+
+	return blk_mq_alloc_tag_set(tags);
+}
+
+static void destroy_mq_tags(struct ibnbd_session *sess)
+{
+	blk_mq_free_tag_set(&sess->tag_set);
+}
+
+struct ibnbd_session *ibnbd_create_session(const struct sockaddr_storage *addr)
+{
+	struct ibnbd_session *sess;
+	struct ibtrs_attrs attrs;
+	char str_addr[IBTRS_ADDRLEN];
+	int err;
+	int cpu;
+
+	err = ibtrs_addr_to_str(addr, str_addr, sizeof(str_addr));
+	if (err < 0) {
+		ERR_NP("Can't create session, invalid address\n");
+		return ERR_PTR(err);
+	}
+
+	DEB("Establishing session to %s\n", str_addr);
+
+	if (ibnbd_clt_find_sess(addr)) {
+		ERR_NP("Can't create session, session to %s already exists\n",
+		       str_addr);
+		return ERR_PTR(-EEXIST);
+	}
+
+	sess = kzalloc_node(sizeof(*sess), GFP_KERNEL, NUMA_NO_NODE);
+	if (unlikely(!sess)) {
+		ERR_NP("Failed to create session to %s,"
+		       " allocating session struct failed\n", str_addr);
+		return ERR_PTR(-ENOMEM);
+	}
+	sess->cpu_queues = alloc_percpu(struct ibnbd_cpu_qlist);
+	if (unlikely(!sess->cpu_queues)) {
+		ERR_NP("Failed to create session to %s,"
+		       " alloc of percpu var (cpu_queues) failed\n", str_addr);
+		kvfree(sess);
+		return ERR_PTR(-ENOMEM);
+	}
+	ibnbd_init_cpu_qlists(sess->cpu_queues);
+
+	/*
+	 * This is a simple percpu variable which stores CPU indices that are
+	 * incremented on each access.  We need it for the sake of fairness,
+	 * to wake up queues in a round-robin manner.
+	 */
+	sess->cpu_rr = alloc_percpu(int);
+	if (unlikely(!sess->cpu_rr)) {
+		ERR_NP("Failed to create session to %s,"
+		       " alloc of percpu var (cpu_rr) failed\n", str_addr);
+		free_percpu(sess->cpu_queues);
+		kfree(sess);
+		return ERR_PTR(-ENOMEM);
+	}
+	for_each_online_cpu(cpu) {
+		*per_cpu_ptr(sess->cpu_rr, cpu) = -1;
+	}
+
+	memset(&attrs, 0, sizeof(attrs));
+	memcpy(&sess->addr, addr, sizeof(sess->addr));
+	strlcpy(sess->str_addr, str_addr, sizeof(sess->str_addr));
+
+	spin_lock(&sess_lock);
+	list_add(&sess->list, &session_list);
+	spin_unlock(&sess_lock);
+
+	atomic_set(&sess->busy, 0);
+	mutex_init(&sess->lock);
+	INIT_LIST_HEAD(&sess->devs_list);
+	bitmap_zero(sess->cpu_queues_bm, NR_CPUS);
+	kref_init(&sess->refcount);
+	sess->state = SESS_STATE_DISCONNECTED;
+
+	sess->sess = ibtrs_clt_open(addr, sizeof(struct ibnbd_iu), sess,
+				    RECONNECT_DELAY, BMAX_SEGMENTS,
+				    MAX_RECONNECTS);
+	if (!IS_ERR(sess->sess)) {
+		mutex_lock(&sess->lock);
+		sess->state = SESS_STATE_READY;
+		mutex_unlock(&sess->lock);
+	} else {
+		err = PTR_ERR(sess->sess);
+		goto out_free;
+	}
+
+	ibtrs_clt_query(sess->sess, &attrs);
+	strlcpy(sess->hostname, attrs.hostname, sizeof(sess->hostname));
+	sess->max_io_size = attrs.max_io_size;
+	sess->queue_depth = attrs.queue_depth;
+
+	err = setup_mq_tags(sess);
+	if (unlikely(err))
+		goto close_sess;
+
+	err = update_sess_info(sess);
+	if (unlikely(err))
+		goto destroy_tags;
+
+	return sess;
+
+destroy_tags:
+	destroy_mq_tags(sess);
+close_sess:
+	ibtrs_clt_close(sess->sess);
+out_free:
+	spin_lock(&sess_lock);
+	list_del(&sess->list);
+	spin_unlock(&sess_lock);
+	free_percpu(sess->cpu_queues);
+	free_percpu(sess->cpu_rr);
+	kfree(sess);
+	return ERR_PTR(err);
+}
+
+static void ibnbd_clt_destroy_session(struct ibnbd_session *sess)
+{
+	mutex_lock(&sess->lock);
+	sess->state = SESS_STATE_DESTROYED;
+
+	if (!list_empty(&sess->devs_list)) {
+		mutex_unlock(&sess->lock);
+		WRN_NP("Device list is not empty,"
+		       " closing session to %s failed\n", sess->str_addr);
+		return;
+	}
+	mutex_unlock(&sess->lock);
+	ibtrs_clt_close(sess->sess);
+
+	destroy_mq_tags(sess);
+	spin_lock(&sess_lock);
+	list_del(&sess->list);
+	spin_unlock(&sess_lock);
+	wake_up(&sess_list_waitq);
+
+	free_percpu(sess->cpu_queues);
+	free_percpu(sess->cpu_rr);
+	kfree(sess);
+}
+
+void ibnbd_clt_sess_release(struct kref *ref)
+{
+	struct ibnbd_session *sess = container_of(ref, struct ibnbd_session,
+						  refcount);
+
+	ibnbd_clt_destroy_session(sess);
+}
+
+static int ibnbd_client_open(struct block_device *block_device, fmode_t mode)
+{
+	struct ibnbd_dev *dev = block_device->bd_disk->private_data;
+
+	if (dev->read_only && (mode & FMODE_WRITE))
+		return -EPERM;
+
+	if (dev->dev_state == DEV_STATE_UNMAPPED ||
+	    !ibnbd_clt_get_dev(dev))
+		return -EIO;
+
+	DEB("OPEN, name=%s, open_cnt=%d\n", dev->gd->disk_name,
+	    atomic_read(&dev->refcount) - 1);
+
+	return 0;
+}
+
+static void ibnbd_client_release(struct gendisk *gen, fmode_t mode)
+{
+	struct ibnbd_dev *dev = gen->private_data;
+
+	DEB("RELEASE, name=%s, open_cnt %d\n", dev->gd->disk_name,
+	    atomic_read(&dev->refcount) - 1);
+
+	ibnbd_clt_put_dev(dev);
+}
+
+static int ibnbd_client_getgeo(struct block_device *block_device,
+			       struct hd_geometry *geo)
+{
+	u64 size;
+	struct ibnbd_dev *dev;
+
+	dev = block_device->bd_disk->private_data;
+	size = dev->size * (dev->logical_block_size / KERNEL_SECTOR_SIZE);
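+	/* 4 heads * 16 sectors per track = 64 sectors per cylinder */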
+	geo->cylinders	= (size & ~0x3f) >> 6;	/* size/64 */
+	geo->heads	= 4;
+	geo->sectors	= 16;
+	geo->start	= 0;
+
+	return 0;
+}
+
+static const struct block_device_operations ibnbd_client_ops = {
+	.owner		= THIS_MODULE,
+	.open		= ibnbd_client_open,
+	.release	= ibnbd_client_release,
+	.getgeo		= ibnbd_client_getgeo
+};
+
+static size_t ibnbd_clt_get_sg_size(struct scatterlist *sglist, u32 len)
+{
+	struct scatterlist *sg;
+	size_t tsize = 0;
+	int i;
+
+	for_each_sg(sglist, sg, len, i)
+		tsize += sg->length;
+	return tsize;
+}
+
+static inline int ibnbd_clt_setup_discard(struct request *rq)
+{
+	struct page *page;
+
+	page = alloc_page(GFP_ATOMIC | __GFP_ZERO);
+	if (!page)
+		return -ENOMEM;
+	rq->special_vec.bv_page = page;
+	rq->special_vec.bv_offset = 0;
+	rq->special_vec.bv_len = PAGE_SIZE;
+	rq->rq_flags |= RQF_SPECIAL_PAYLOAD;
+	return 0;
+}
+
+static int ibnbd_client_xfer_request(struct ibnbd_dev *dev, struct request *rq,
+				     struct ibnbd_iu *iu)
+{
+	int err;
+
+	unsigned int sg_cnt;
+	size_t size;
+	struct kvec vec;
+	struct ibtrs_session *sess = dev->sess->sess;
+	struct ibtrs_tag *tag = iu->tag;
+	struct scatterlist *sg = iu->sglist;
+
+	if (req_op(rq) == REQ_OP_DISCARD) {
+		err = ibnbd_clt_setup_discard(rq);
+		if (err)
+			return err;
+	}
+	sg_cnt = blk_rq_nr_phys_segments(rq);
+
+	if (sg_cnt == 0)
+		sg_mark_end(&sg[0]);
+	else
+		sg_mark_end(&sg[sg_cnt - 1]);
+
+	iu->rq		= rq;
+	iu->dev		= dev;
+	iu->msg.sector	= blk_rq_pos(rq);
+	iu->msg.bi_size = blk_rq_bytes(rq);
+	iu->msg.rw	= rq_cmd_to_ibnbd_io_flags(rq);
+
+	sg_cnt = blk_rq_map_sg(dev->queue, rq, sg);
+
+	iu->msg.hdr.type	= IBNBD_MSG_IO;
+	iu->msg.device_id	= dev->device_id;
+
+	size = ibnbd_clt_get_sg_size(sg, sg_cnt);
+	vec = (struct kvec) {
+		.iov_base = &iu->msg,
+		.iov_len  = sizeof(iu->msg)
+	};
+
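+	/*
+	 * Reads use ibtrs_clt_request_rdma_write() to have the server
+	 * RDMA-write the data into our sglist; writes use
+	 * ibtrs_clt_rdma_write() to push the request data to the server.
+	 */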
+	if (req_op(rq) == REQ_OP_READ)
+		err = ibtrs_clt_request_rdma_write(sess, tag, iu, &vec, 1, size,
+						   sg, sg_cnt);
+	else
+		err = ibtrs_clt_rdma_write(sess, tag, iu, &vec, 1, size, sg,
+					   sg_cnt);
+	if (unlikely(err)) {
+		ERR_RL(dev, "IBTRS failed to transfer IO, errno: %d\n", err);
+		return err;
+	}
+
+	return 0;
+}
+
+/**
+ * ibnbd_dev_add_to_requeue() - add device to requeue if session is busy
+ *
+ * Description:
+ *     If session is busy, that means someone will requeue us when resources
+ *     are freed.  If session is not doing anything - device is not added to
+ *     the list and @false is returned.
+ */
+static inline bool ibnbd_dev_add_to_requeue(struct ibnbd_dev *dev,
+					    struct ibnbd_queue *q)
+{
+	struct ibnbd_session *sess = dev->sess;
+	struct ibnbd_cpu_qlist *cpu_q;
+	unsigned long flags;
+	bool added = true;
+	bool need_set;
+
+	cpu_q = get_cpu_ptr(sess->cpu_queues);
+	spin_lock_irqsave(&cpu_q->requeue_lock, flags);
+
+	if (likely(!test_and_set_bit_lock(0, &q->in_list))) {
+		if (WARN_ON(!list_empty(&q->requeue_list)))
+			goto unlock;
+
+		need_set = !test_bit(cpu_q->cpu, sess->cpu_queues_bm);
+		if (need_set) {
+			set_bit(cpu_q->cpu, sess->cpu_queues_bm);
+			/* Paired with ibnbd_put_tag().	 Set a bit first
+			 * and then observe the busy counter.
+			 */
+			smp_mb__before_atomic();
+		}
+		if (likely(atomic_read(&sess->busy))) {
+			list_add_tail(&q->requeue_list, &cpu_q->requeue_list);
+		} else {
+			/* Very unlikely, but possible: busy counter was
+			 * observed as zero.  Drop all bits and return
+			 * false to restart the queue by ourselves.
+			 */
+			if (need_set)
+				clear_bit(cpu_q->cpu, sess->cpu_queues_bm);
+			clear_bit_unlock(0, &q->in_list);
+			added = false;
+		}
+	}
+unlock:
+	spin_unlock_irqrestore(&cpu_q->requeue_lock, flags);
+	put_cpu_ptr(sess->cpu_queues);
+
+	return added;
+}
+
+static void ibnbd_dev_kick_mq_queue(struct ibnbd_dev *dev,
+				    struct blk_mq_hw_ctx *hctx,
+				    int delay)
+{
+	struct ibnbd_queue *q = hctx->driver_data;
+
+	if (WARN_ON(dev->queue_mode != BLK_MQ))
+		return;
+	blk_mq_stop_hw_queue(hctx);
+
+	if (delay != IBNBD_DELAY_IFBUSY)
+		blk_mq_delay_queue(hctx, delay);
+	else if (unlikely(!ibnbd_dev_add_to_requeue(dev, q)))
+		/* If session is not busy we have to restart
+		 * the queue ourselves.
+		 */
+		blk_mq_delay_queue(hctx, IBNBD_DELAY_10ms);
+}
+
+static void ibnbd_dev_kick_queue(struct ibnbd_dev *dev, int delay)
+{
+	if (WARN_ON(dev->queue_mode != BLK_RQ))
+		return;
+	blk_stop_queue(dev->queue);
+
+	if (delay != IBNBD_DELAY_IFBUSY)
+		ibnbd_blk_delay_queue(dev, delay);
+	else if (unlikely(!ibnbd_dev_add_to_requeue(dev, dev->hw_queues)))
+		/* If session is not busy we have to restart
+		 * the queue ourselves.
+		 */
+		ibnbd_blk_delay_queue(dev, IBNBD_DELAY_10ms);
+}
+
+static int ibnbd_queue_rq(struct blk_mq_hw_ctx *hctx,
+			  const struct blk_mq_queue_data *bd)
+{
+	struct request *rq = bd->rq;
+	struct ibnbd_dev *dev = rq->rq_disk->private_data;
+	struct ibnbd_iu *iu = blk_mq_rq_to_pdu(rq);
+	int err;
+
+	if (unlikely(!ibnbd_clt_dev_is_open(dev)))
+		return BLK_MQ_RQ_QUEUE_ERROR;
+
+	iu->tag = ibnbd_get_tag(dev->sess, hctx->next_cpu, blk_rq_bytes(rq),
+				IBTRS_TAG_NOWAIT);
+	if (unlikely(!iu->tag)) {
+		ibnbd_dev_kick_mq_queue(dev, hctx, IBNBD_DELAY_IFBUSY);
+		return BLK_MQ_RQ_QUEUE_BUSY;
+	}
+
+	blk_mq_start_request(rq);
+	err = ibnbd_client_xfer_request(dev, rq, iu);
+	if (likely(err == 0))
+		return BLK_MQ_RQ_QUEUE_OK;
+	if (unlikely(err == -EAGAIN || err == -ENOMEM)) {
+		ibnbd_dev_kick_mq_queue(dev, hctx, IBNBD_DELAY_10ms);
+		ibnbd_put_tag(dev->sess, iu->tag);
+		return BLK_MQ_RQ_QUEUE_BUSY;
+	}
+
+	ibnbd_put_tag(dev->sess, iu->tag);
+	return BLK_MQ_RQ_QUEUE_ERROR;
+}
+
+static int ibnbd_init_request(void *data, struct request *rq,
+			      unsigned int hctx_idx, unsigned int request_idx,
+			      unsigned int numa_node)
+{
+	struct ibnbd_iu *iu = blk_mq_rq_to_pdu(rq);
+
+	sg_init_table(iu->sglist, BMAX_SEGMENTS);
+	return 0;
+}
+
+static inline void ibnbd_init_hw_queue(struct ibnbd_dev *dev,
+				       struct ibnbd_queue *q,
+				       struct blk_mq_hw_ctx *hctx)
+{
+	INIT_LIST_HEAD(&q->requeue_list);
+	q->dev  = dev;
+	q->hctx = hctx;
+}
+
+static void ibnbd_init_mq_hw_queues(struct ibnbd_dev *dev)
+{
+	int i;
+	struct blk_mq_hw_ctx *hctx;
+	struct ibnbd_queue *q;
+
+	queue_for_each_hw_ctx(dev->queue, hctx, i) {
+		q = &dev->hw_queues[i];
+		ibnbd_init_hw_queue(dev, q, hctx);
+		hctx->driver_data = q;
+	}
+}
+
+static struct blk_mq_ops ibnbd_mq_ops = {
+	.queue_rq	= ibnbd_queue_rq,
+	.init_request	= ibnbd_init_request,
+	.complete	= ibnbd_softirq_done_fn,
+};
+
+static int index_to_minor(int index)
+{
+	return index << IBNBD_PART_BITS;
+}
+
+static int minor_to_index(int minor)
+{
+	return minor >> IBNBD_PART_BITS;
+}
+
+static int ibnbd_rq_prep_fn(struct request_queue *q, struct request *rq)
+{
+	struct ibnbd_dev *dev = q->queuedata;
+	struct ibnbd_iu *iu;
+
+	iu = ibnbd_get_iu(dev->sess, blk_rq_bytes(rq), IBTRS_TAG_NOWAIT);
+	if (likely(iu)) {
+		rq->special = iu;
+		rq->rq_flags |= RQF_DONTPREP;
+
+		return BLKPREP_OK;
+	}
+
+	ibnbd_dev_kick_queue(dev, IBNBD_DELAY_IFBUSY);
+	return BLKPREP_DEFER;
+}
+
+static void ibnbd_rq_unprep_fn(struct request_queue *q, struct request *rq)
+{
+	struct ibnbd_dev *dev = q->queuedata;
+
+	if (WARN_ON(!rq->special))
+		return;
+	ibnbd_put_iu(dev->sess, rq->special);
+	rq->special = NULL;
+	rq->rq_flags &= ~RQF_DONTPREP;
+}
+
+static void ibnbd_clt_request(struct request_queue *q)
+__must_hold(q->queue_lock)
+{
+	int err;
+	struct request *req;
+	struct ibnbd_iu *iu;
+	struct ibnbd_dev *dev = q->queuedata;
+
+	while ((req = blk_fetch_request(q)) != NULL) {
+		spin_unlock_irq(q->queue_lock);
+
+		if (unlikely(!ibnbd_clt_dev_is_open(dev))) {
+			err = -EIO;
+			goto next;
+		}
+
+		iu = req->special;
+		if (WARN_ON(!iu)) {
+			err = -EIO;
+			goto next;
+		}
+
+		sg_init_table(iu->sglist, dev->max_segments);
+		err = ibnbd_client_xfer_request(dev, req, iu);
+next:
+		if (unlikely(err == -EAGAIN || err == -ENOMEM)) {
+			ibnbd_rq_unprep_fn(q, req);
+			spin_lock_irq(q->queue_lock);
+			blk_requeue_request(q, req);
+			ibnbd_dev_kick_queue(dev, IBNBD_DELAY_10ms);
+			break;
+		} else if (err) {
+			blk_end_request_all(req, err);
+		}
+
+		spin_lock_irq(q->queue_lock);
+	}
+}
+
+static int setup_mq_dev(struct ibnbd_dev *dev)
+{
+	dev->queue = blk_mq_init_queue(&dev->sess->tag_set);
+	if (IS_ERR(dev->queue)) {
+		ERR(dev, "Initializing multiqueue queue failed, errno: %ld\n",
+		    PTR_ERR(dev->queue));
+		return PTR_ERR(dev->queue);
+	}
+	ibnbd_init_mq_hw_queues(dev);
+	return 0;
+}
+
+static int setup_rq_dev(struct ibnbd_dev *dev)
+{
+	dev->queue = blk_init_queue(ibnbd_clt_request, NULL);
+	if (IS_ERR_OR_NULL(dev->queue)) {
+		if (IS_ERR(dev->queue)) {
+			ERR(dev, "Initializing request queue failed, "
+			    "errno: %ld\n", PTR_ERR(dev->queue));
+			return PTR_ERR(dev->queue);
+		}
+		ERR(dev, "Initializing request queue failed\n");
+		return -ENOMEM;
+	}
+
+	blk_queue_prep_rq(dev->queue, ibnbd_rq_prep_fn);
+	blk_queue_softirq_done(dev->queue, ibnbd_softirq_done_fn);
+	blk_queue_unprep_rq(dev->queue, ibnbd_rq_unprep_fn);
+
+	return 0;
+}
+
+static void setup_request_queue(struct ibnbd_dev *dev)
+{
+	blk_queue_logical_block_size(dev->queue, dev->logical_block_size);
+	blk_queue_physical_block_size(dev->queue, dev->physical_block_size);
+	blk_queue_max_hw_sectors(dev->queue, dev->max_hw_sectors);
+	blk_queue_max_write_same_sectors(dev->queue,
+					 dev->max_write_same_sectors);
+
+	blk_queue_max_discard_sectors(dev->queue, dev->max_discard_sectors);
+	dev->queue->limits.discard_zeroes_data	= dev->discard_zeroes_data;
+	dev->queue->limits.discard_granularity	= dev->discard_granularity;
+	dev->queue->limits.discard_alignment	= dev->discard_alignment;
+	if (dev->max_discard_sectors)
+		queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, dev->queue);
+	if (dev->secure_discard)
+		queue_flag_set_unlocked(QUEUE_FLAG_SECERASE, dev->queue);
+
+	queue_flag_set_unlocked(QUEUE_FLAG_SAME_COMP, dev->queue);
+	queue_flag_set_unlocked(QUEUE_FLAG_SAME_FORCE, dev->queue);
+	/* our HCA supports only 32 SG entries, the protocol uses one, so 31 are left */
+	blk_queue_max_segments(dev->queue, dev->max_segments);
+	blk_queue_io_opt(dev->queue, dev->sess->max_io_size);
+	blk_queue_write_cache(dev->queue, true, true);
+	dev->queue->queuedata = dev;
+}
+
+static void ibnbd_clt_setup_gen_disk(struct ibnbd_dev *dev, int idx)
+{
+	dev->gd->major		= ibnbd_client_major;
+	dev->gd->first_minor	= index_to_minor(idx);
+	dev->gd->fops		= &ibnbd_client_ops;
+	dev->gd->queue		= dev->queue;
+	dev->gd->private_data	= dev;
+	snprintf(dev->gd->disk_name, sizeof(dev->gd->disk_name), "ibnbd%d",
+		 idx);
+	DEB("disk_name=%s, capacity=%zu, queue_mode=%s\n", dev->gd->disk_name,
+	    dev->nsectors * (dev->logical_block_size / KERNEL_SECTOR_SIZE),
+	    ibnbd_queue_mode_str(dev->queue_mode));
+
+	set_capacity(dev->gd, dev->nsectors * (dev->logical_block_size /
+					       KERNEL_SECTOR_SIZE));
+
+	if (dev->access_mode == IBNBD_ACCESS_RO) {
+		dev->read_only = true;
+		set_disk_ro(dev->gd, true);
+	} else {
+		dev->read_only = false;
+	}
+
+	if (!dev->rotational)
+		queue_flag_set_unlocked(QUEUE_FLAG_NONROT, dev->queue);
+}
+
+static void ibnbd_clt_add_gen_disk(struct ibnbd_dev *dev)
+{
+	add_disk(dev->gd);
+}
+
+static int ibnbd_client_setup_device(struct ibnbd_session *sess,
+				     struct ibnbd_dev *dev, int idx)
+{
+	int err;
+
+	dev->size = dev->nsectors * dev->logical_block_size;
+
+	switch (dev->queue_mode) {
+	case BLK_MQ:
+		err = setup_mq_dev(dev);
+		break;
+	case BLK_RQ:
+		err = setup_rq_dev(dev);
+		break;
+	default:
+		err = -EINVAL;
+	}
+
+	if (err)
+		return err;
+
+	setup_request_queue(dev);
+
+	dev->gd = alloc_disk_node(1 << IBNBD_PART_BITS,	NUMA_NO_NODE);
+	if (!dev->gd) {
+		ERR(dev, "Failed to allocate disk node\n");
+		blk_cleanup_queue(dev->queue);
+		return -ENOMEM;
+	}
+
+	ibnbd_clt_setup_gen_disk(dev, idx);
+
+	return 0;
+}
+
+static struct ibnbd_dev *init_dev(struct ibnbd_session *sess,
+				  enum ibnbd_access_mode access_mode,
+				  enum ibnbd_queue_mode queue_mode,
+				  const char *pathname)
+{
+	int ret;
+	struct ibnbd_dev *dev;
+	size_t nr;
+
+	dev = kzalloc_node(sizeof(*dev), GFP_KERNEL, NUMA_NO_NODE);
+	if (!dev) {
+		ERR_NP("Failed to initialize device '%s' from session %s,"
+		       " allocating device structure failed\n", pathname,
+		       sess->str_addr);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	nr = (queue_mode == BLK_MQ ? num_online_cpus() :
+	      queue_mode == BLK_RQ ? 1 : 0);
+	if (nr) {
+		dev->hw_queues = kcalloc(nr, sizeof(*dev->hw_queues),
+					 GFP_KERNEL);
+		if (unlikely(!dev->hw_queues)) {
+			ERR_NP("Failed to initialize device '%s' from session"
+			       " %s, allocating hw_queues failed\n", pathname,
+			       sess->str_addr);
+			ret = -ENOMEM;
+			goto out_alloc;
+		}
+		/* for MQ mode we will init all hw queues after the
+		 * request queue is created
+		 */
+		if (queue_mode == BLK_RQ)
+			ibnbd_init_hw_queue(dev, dev->hw_queues, NULL);
+	}
+
+	idr_preload(GFP_KERNEL);
+	write_lock(&g_index_lock);
+	ret = idr_alloc(&g_index_idr, dev, 0, minor_to_index(1 << MINORBITS),
+			GFP_ATOMIC);
+	write_unlock(&g_index_lock);
+	idr_preload_end();
+	if (ret < 0) {
+		ERR_NP("Failed to initialize device '%s' from session %s,"
+		       " allocating idr failed, errno: %d\n", pathname,
+		       sess->str_addr, ret);
+		goto out_queues;
+	}
+
+	dev->clt_device_id	= ret;
+	dev->close_compl	= kmalloc(sizeof(*dev->close_compl),
+					  GFP_KERNEL);
+	if (!dev->close_compl) {
+		ERR_NP("Failed to initialize device '%s' from session %s,"
+		       " allocating close completion failed\n",
+		       pathname, sess->str_addr);
+		ret = -ENOMEM;
+		goto out_idr;
+	}
+	init_completion(dev->close_compl);
+	dev->sess		= sess;
+	dev->access_mode	= access_mode;
+	dev->queue_mode		= queue_mode;
+
+	strlcpy(dev->pathname, pathname, sizeof(dev->pathname));
+
+	INIT_DELAYED_WORK(&dev->rq_delay_work, ibnbd_blk_delay_work);
+	mutex_init(&dev->lock);
+	atomic_set(&dev->refcount, 1);
+	dev->dev_state = DEV_STATE_INIT;
+
+	return dev;
+
+out_idr:
+	write_lock(&g_index_lock);
+	idr_remove(&g_index_idr, dev->clt_device_id);
+	write_unlock(&g_index_lock);
+out_queues:
+	kfree(dev->hw_queues);
+out_alloc:
+	kfree(dev);
+	return ERR_PTR(ret);
+}
+
+bool ibnbd_clt_dev_is_mapped(const char *pathname)
+{
+	struct ibnbd_dev *dev;
+
+	spin_lock(&dev_lock);
+	list_for_each_entry(dev, &devs_list, g_list)
+		if (!strncmp(dev->pathname, pathname, sizeof(dev->pathname))) {
+			spin_unlock(&dev_lock);
+			return true;
+		}
+	spin_unlock(&dev_lock);
+
+	return false;
+}
+
+static struct ibnbd_dev *__find_sess_dev(const struct ibnbd_session *sess,
+					 const char *pathname)
+{
+	struct ibnbd_dev *dev;
+
+	list_for_each_entry(dev, &sess->devs_list, list)
+		if (!strncmp(dev->pathname, pathname, sizeof(dev->pathname)))
+			return dev;
+
+	return NULL;
+}
+
+struct ibnbd_dev *ibnbd_client_add_device(struct ibnbd_session *sess,
+					  const char *pathname,
+					  enum ibnbd_access_mode access_mode,
+					  enum ibnbd_queue_mode queue_mode,
+					  enum ibnbd_io_mode io_mode)
+{
+	int ret;
+	struct ibnbd_dev *dev;
+	struct completion *open_compl;
+
+	DEB("Add remote device: server=%s, path='%s', access_mode=%d,"
+	    " queue_mode=%d\n", sess->str_addr, pathname, access_mode,
+	    queue_mode);
+
+	mutex_lock(&sess->lock);
+
+	if (sess->state != SESS_STATE_READY) {
+		mutex_unlock(&sess->lock);
+		ERR_NP("map_device: failed to map device '%s' from session %s,"
+		       " session is not connected\n", pathname, sess->str_addr);
+		return ERR_PTR(-ENOENT);
+	}
+
+	if (__find_sess_dev(sess, pathname)) {
+		mutex_unlock(&sess->lock);
+		ERR_NP("map_device: failed to map device '%s' from session %s,"
+		       " device with same path is already mapped\n", pathname,
+		       sess->str_addr);
+		return ERR_PTR(-EEXIST);
+	}
+
+	mutex_unlock(&sess->lock);
+	dev = init_dev(sess, access_mode, queue_mode, pathname);
+	if (IS_ERR(dev)) {
+		ERR_NP("map_device: failed to map device '%s' from session %s,"
+		       " can't initialize device, errno: %ld\n", pathname,
+		       sess->str_addr, PTR_ERR(dev));
+		return dev;
+	}
+
+	ibnbd_clt_get_sess(sess);
+
+	open_compl = kmalloc(sizeof(*open_compl), GFP_KERNEL);
+	if (!open_compl) {
+		ERR(dev, "map_device: failed, can't allocate memory for"
+		    " completion\n");
+		ret = -ENOMEM;
+		goto out;
+	}
+	init_completion(open_compl);
+	dev->open_compl = open_compl;
+	dev->io_mode = io_mode;
+
+	ret = open_remote_device(dev);
+	if (ret) {
+		ERR(dev, "map_device: failed, can't open remote device,"
+		    " errno: %d\n", ret);
+		kfree(open_compl);
+		dev->open_compl = NULL;
+		ret = -EINVAL;
+		goto out;
+	}
+	wait_for_completion(dev->open_compl);
+	mutex_lock(&dev->lock);
+
+	kfree(open_compl);
+	dev->open_compl = NULL;
+
+	if (!ibnbd_clt_dev_is_open(dev)) {
+		mutex_unlock(&dev->lock);
+		ret = dev->open_errno;
+		ERR(dev, "map_device: failed, errno: %d\n", ret);
+		goto out;
+	}
+
+	mutex_lock(&sess->lock);
+	list_add(&dev->list, &sess->devs_list);
+	mutex_unlock(&sess->lock);
+
+	spin_lock(&dev_lock);
+	list_add(&dev->g_list, &devs_list);
+	spin_unlock(&dev_lock);
+
+	DEB("Opened remote device: server=%s, path='%s'\n", sess->str_addr,
+	    pathname);
+	ret = ibnbd_client_setup_device(sess, dev, dev->clt_device_id);
+	if (ret) {
+		ERR(dev, "map_device: Failed to configure device, errno: %d\n",
+		    ret);
+		mutex_unlock(&dev->lock);
+		ret = -EINVAL;
+		goto out_close;
+	}
+
+	INFO(dev, "map_device: Device mapped as %s (nsectors: %zu,"
+	     " logical_block_size: %d, physical_block_size: %d,"
+	     " max_write_same_sectors: %d, max_discard_sectors: %d,"
+	     " discard_zeroes_data: %d, discard_granularity: %d,"
+	     " discard_alignment: %d, secure_discard: %d, max_segments: %d,"
+	     " max_hw_sectors: %d, rotational: %d)\n", dev->gd->disk_name,
+	     dev->nsectors, dev->logical_block_size, dev->physical_block_size,
+	     dev->max_write_same_sectors, dev->max_discard_sectors,
+	     dev->discard_zeroes_data, dev->discard_granularity,
+	     dev->discard_alignment, dev->secure_discard,
+	     dev->max_segments, dev->max_hw_sectors, dev->rotational);
+
+	mutex_unlock(&dev->lock);
+
+	ibnbd_clt_add_gen_disk(dev);
+
+	return dev;
+
+out_close:
+	if (!WARN_ON(ibnbd_close_device(dev, true)))
+		wait_for_completion(dev->close_compl);
+out:
+	ibnbd_clt_put_dev(dev);
+	return ERR_PTR(ret);
+}
+
+void ibnbd_destroy_gen_disk(struct ibnbd_dev *dev)
+{
+	del_gendisk(dev->gd);
+	/*
+	 * Before marking queue as dying (blk_cleanup_queue() does that)
+	 * we have to be sure that everything in-flight has gone.
+	 * Blink with freeze/unfreeze.
+	 */
+	blk_mq_freeze_queue(dev->queue);
+	blk_mq_unfreeze_queue(dev->queue);
+	blk_cleanup_queue(dev->queue);
+	put_disk(dev->gd);
+
+	ibnbd_clt_put_dev(dev);
+}
+
+static int __close_device(struct ibnbd_dev *dev, bool force)
+__must_hold(&dev->sess->lock)
+{
+	enum ibnbd_dev_state prev_state;
+	int refcount, ret = 0;
+
+	mutex_lock(&dev->lock);
+
+	if (dev->dev_state == DEV_STATE_UNMAPPED) {
+		INFO(dev, "Device is already being unmapped\n");
+		ret = -EALREADY;
+		goto out;
+	}
+
+	refcount = atomic_read(&dev->refcount);
+	if (!force && refcount > 1) {
+		ERR(dev, "Closing device failed, device is in use,"
+		    " (%d device users)\n", refcount - 1);
+		ret = -EBUSY;
+		goto out;
+	}
+
+	prev_state = dev->dev_state;
+	dev->dev_state = DEV_STATE_UNMAPPED;
+
+	list_del(&dev->list);
+
+	spin_lock(&dev_lock);
+	list_del(&dev->g_list);
+	spin_unlock(&dev_lock);
+
+	ibnbd_clt_remove_dev_symlink(dev);
+	mutex_unlock(&dev->lock);
+
+	mutex_unlock(&dev->sess->lock);
+	if (prev_state == DEV_STATE_OPEN && dev->sess->sess) {
+		if (send_msg_close(dev->sess->sess, dev->device_id))
+			complete(dev->close_compl);
+	} else {
+		complete(dev->close_compl);
+	}
+
+	mutex_lock(&dev->sess->lock);
+	INFO(dev, "Device is unmapped\n");
+	return 0;
+out:
+	mutex_unlock(&dev->lock);
+	return ret;
+}
+
+int ibnbd_close_device(struct ibnbd_dev *dev, bool force)
+{
+	int ret;
+
+	mutex_lock(&dev->sess->lock);
+	ret = __close_device(dev, force);
+	mutex_unlock(&dev->sess->lock);
+
+	return ret;
+}
+
+static void ibnbd_destroy_sessions(void)
+{
+	struct ibnbd_session *sess, *sn;
+	struct ibnbd_dev *dev, *tn;
+	int ret;
+
+	list_for_each_entry_safe(sess, sn, &session_list, list) {
+		if (!ibnbd_clt_get_sess(sess))
+			continue;
+		mutex_lock(&sess->lock);
+		sess->state = SESS_STATE_DESTROYED;
+		list_for_each_entry_safe(dev, tn, &sess->devs_list, list) {
+			if (!kobject_get(&dev->kobj))
+				continue;
+			ret = __close_device(dev, true);
+			if (ret)
+				WRN(dev, "Closing device failed, errno: %d\n",
+				    ret);
+			else
+				wait_for_completion(dev->close_compl);
+			ibnbd_clt_schedule_dev_destroy(dev);
+			kobject_put(&dev->kobj);
+		}
+		mutex_unlock(&sess->lock);
+		ibnbd_clt_put_sess(sess);
+	}
+}
+
+static int __init ibnbd_client_init(void)
+{
+	int err;
+
+	INFO_NP("Loading module ibnbd_client, version: "
+		__stringify(IBNBD_VER) " (softirq_enable: %d)\n",
+		softirq_enable);
+
+	ibnbd_client_major = register_blkdev(ibnbd_client_major, "ibnbd");
+	if (ibnbd_client_major <= 0) {
+		ERR_NP("Failed to load module,"
+		       " block device registration failed\n");
+		err = -EBUSY;
+		goto out;
+	}
+
+	ops.owner	= THIS_MODULE;
+	ops.recv	= ibnbd_clt_recv;
+	ops.rdma_ev	= ibnbd_clt_rdma_ev;
+	ops.sess_ev	= ibnbd_clt_sess_ev;
+	err = ibtrs_clt_register(&ops);
+	if (err) {
+		ERR_NP("Failed to load module, IBTRS registration failed,"
+		       " errno: %d\n", err);
+		goto out_unregister_blk;
+	}
+	err = ibnbd_clt_create_sysfs_files();
+	if (err) {
+		ERR_NP("Failed to load module,"
+		       " creating sysfs device files failed, error: %d\n", err);
+		goto out_unregister;
+	}
+
+	return 0;
+
+out_unregister:
+	ibtrs_clt_unregister(&ops);
+out_unregister_blk:
+	unregister_blkdev(ibnbd_client_major, "ibnbd");
+out:
+	return err;
+}
+
+static void __exit ibnbd_client_exit(void)
+{
+	INFO_NP("Unloading module\n");
+	ibnbd_clt_destroy_default_group();
+	flush_scheduled_work();
+	ibnbd_destroy_sessions();
+	wait_event(sess_list_waitq, list_empty(&session_list));
+	ibnbd_clt_destroy_sysfs_files();
+	ibtrs_clt_unregister(&ops);
+	unregister_blkdev(ibnbd_client_major, "ibnbd");
+	idr_destroy(&g_index_idr);
+	INFO_NP("Module unloaded\n");
+}
+
+module_init(ibnbd_client_init);
+module_exit(ibnbd_client_exit);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH 17/28] ibnbd_clt: add header shared in ibnbd_client
@ 2017-03-24 10:45   ` Jack Wang
  0 siblings, 0 replies; 87+ messages in thread
From: Jack Wang @ 2017-03-24 10:45 UTC (permalink / raw)
  To: linux-block, linux-rdma
  Cc: dledford, axboe, hch, mail, Milind.dumbare, yun.wang, Jack Wang,
	Kleber Souza, Danil Kipnis, Roman Pen

From: Jack Wang <jinpu.wang@profitbricks.com>

Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
Signed-off-by: Kleber Souza <kleber.souza@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kipnis@profitbricks.com>
Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
---
 drivers/block/ibnbd_client/ibnbd_clt.h | 231 +++++++++++++++++++++++++++++++++
 1 file changed, 231 insertions(+)
 create mode 100644 drivers/block/ibnbd_client/ibnbd_clt.h

diff --git a/drivers/block/ibnbd_client/ibnbd_clt.h b/drivers/block/ibnbd_client/ibnbd_clt.h
new file mode 100644
index 0000000..3f0db78
--- /dev/null
+++ b/drivers/block/ibnbd_client/ibnbd_clt.h
@@ -0,0 +1,231 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler <mail@fholler.de>
+ *          Jack Wang <jinpu.wang@profitbricks.com>
+ *          Kleber Souza <kleber.souza@profitbricks.com>
+ *          Danil Kipnis <danil.kipnis@profitbricks.com>
+ *          Roman Pen <roman.penyaev@profitbricks.com>
+ *          Milind Dumbare <Milind.dumbare@gmail.com>
+ *
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions, and the following disclaimer,
+ *    without modification.
+ * 2. Redistributions in binary form must reproduce at minimum a disclaimer
+ *    substantially similar to the "NO WARRANTY" disclaimer below
+ *    ("Disclaimer") and any redistribution must be conditioned upon
+ *    including a substantially similar Disclaimer requirement for further
+ *    binary redistribution.
+ * 3. Neither the names of the above-listed copyright holders nor the names
+ *    of any contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * NO WARRANTY
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * HOLDERS OR CONTRIBUTORS BE LIABLE FOR SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
+ * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGES.
+ *
+ */
+
+#ifndef _IBNBD_CLT_H
+#define _IBNBD_CLT_H
+#include <linux/blkdev.h>
+#include <linux/wait.h>			/* for wait_queue_head_t */
+#include <linux/in.h>			/* for sockaddr_in */
+#include <linux/inet.h>			/* for sockaddr_in */
+#include <linux/blk-mq.h>
+#include "ibnbd_clt_log.h"
+#include "../ibnbd_inc/ibnbd.h"
+#include "../ibnbd_inc/ibnbd-proto.h"	/* ibnbd protocol messages */
+#include <rdma/ibtrs_clt.h>	/* for ibtrs api */
+#include <rdma/ibtrs.h>
+
+#define IP_PREFIX "ip:"
+#define IP_PREFIX_LEN strlen(IP_PREFIX)
+#define GID_PREFIX "gid:"
+#define GID_PREFIX_LEN strlen(GID_PREFIX)
+
+#define BMAX_SEGMENTS 31
+#define RECONNECT_DELAY 30
+#define MAX_RECONNECTS -1
+
+enum ibnbd_dev_state {
+	DEV_STATE_INIT,
+	DEV_STATE_INIT_CLOSED,
+	DEV_STATE_CLOSED,
+	DEV_STATE_UNMAPPED,
+	DEV_STATE_OPEN
+};
+
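+/*
+ * IO path selection: BLK_MQ drives the device through the multiqueue block
+ * layer (blk-mq), BLK_RQ through the legacy request queue.
+ */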
+enum ibnbd_queue_mode {
+	BLK_MQ,
+	BLK_RQ
+};
+
+struct ibnbd_iu {
+	struct request		*rq;
+	struct ibtrs_tag	*tag;
+	struct ibnbd_dev	*dev;
+	struct ibnbd_msg_io	msg;
+	int			errno;
+	struct scatterlist	sglist[BMAX_SEGMENTS];
+};
+
+struct ibnbd_cpu_qlist {
+	struct list_head	requeue_list;
+	spinlock_t		requeue_lock;
+	unsigned int		cpu;
+};
+
+enum sess_state {
+	SESS_STATE_READY,
+	SESS_STATE_DISCONNECTED,
+	SESS_STATE_DESTROYED,
+};
+
+struct ibnbd_session {
+	struct list_head        list;
+	struct ibtrs_session    *sess;
+	struct ibnbd_cpu_qlist	__percpu
+				*cpu_queues;
+	DECLARE_BITMAP(cpu_queues_bm, NR_CPUS);
+	int	__percpu	*cpu_rr; /* per-cpu var for CPU round-robin */
+	atomic_t		busy;
+	int			queue_depth;
+	u32			max_io_size;
+	struct blk_mq_tag_set	tag_set;
+	struct mutex		lock; /* protects state and devs_list */
+	struct list_head        devs_list; /* list of struct ibnbd_dev */
+	struct kref		refcount;
+	struct sockaddr_storage addr;
+	char			str_addr[IBTRS_ADDRLEN];
+	char			hostname[MAXHOSTNAMELEN];
+	enum sess_state		state;
+	u8			ver; /* protocol version */
+	struct completion	*sess_info_compl;
+};
+
+struct ibnbd_work {
+	struct work_struct	work;
+	struct ibnbd_session	*sess;
+};
+
+/**
+ * Submission queues.
+ */
+struct ibnbd_queue {
+	struct list_head	requeue_list;
+	unsigned long		in_list;
+	struct ibnbd_dev	*dev;
+	struct blk_mq_hw_ctx	*hctx;
+};
+
+struct ibnbd_dev {
+	struct list_head        g_list;
+	struct ibnbd_session	*sess;
+	struct request_queue	*queue;
+	struct ibnbd_queue	*hw_queues;
+	struct delayed_work	rq_delay_work;
+	u32			device_id;
+	u32			clt_device_id;
+	struct completion	*open_compl;	/* completion for open msg */
+	int			open_errno;
+	struct mutex		lock;
+	enum ibnbd_dev_state	dev_state;
+	enum ibnbd_queue_mode	queue_mode;
+	enum ibnbd_io_mode	io_mode; /* user requested */
+	enum ibnbd_io_mode	remote_io_mode; /* server really used */
+	/* local Idr index - used to track minor number allocations. */
+	char			pathname[NAME_MAX];
+	enum ibnbd_access_mode	access_mode;
+	bool			read_only;
+	bool			rotational;
+	u32			max_hw_sectors;
+	u32			max_write_same_sectors;
+	u32			max_discard_sectors;
+	u32			discard_zeroes_data;
+	u32			discard_granularity;
+	u32			discard_alignment;
+	u16			secure_discard;
+	u16			physical_block_size;
+	u16			logical_block_size;
+	u16			max_segments;
+	size_t			nsectors;
+	u64			size;		/* device size in bytes */
+	struct list_head        list;
+	struct gendisk		*gd;
+	struct kobject		kobj;
+	char			blk_symlink_name[NAME_MAX];
+	struct completion	*close_compl;
+	atomic_t		refcount;
+};
+
+static inline const char *ibnbd_queue_mode_str(enum ibnbd_queue_mode mode)
+{
+	switch (mode) {
+	case BLK_RQ:
+		return "rq";
+	case BLK_MQ:
+		return "mq";
+	default:
+		return "unknown";
+	}
+}
+
+int ibnbd_close_device(struct ibnbd_dev *dev, bool force);
+struct ibnbd_session *ibnbd_create_session(const struct sockaddr_storage *addr);
+struct ibnbd_session *ibnbd_clt_find_sess(const struct sockaddr_storage *addr);
+void ibnbd_clt_sess_release(struct kref *ref);
+struct ibnbd_dev *ibnbd_client_add_device(struct ibnbd_session *sess,
+					  const char *pathname,
+					  enum ibnbd_access_mode access_mode,
+					  enum ibnbd_queue_mode queue_mode,
+					  enum ibnbd_io_mode io_mode);
+void ibnbd_destroy_gen_disk(struct ibnbd_dev *dev);
+int ibnbd_addr_to_str(const struct sockaddr_storage *addr,
+		      char *buf, size_t len);
+bool ibnbd_clt_dev_is_open(struct ibnbd_dev *dev);
+bool ibnbd_clt_dev_is_mapped(const char *pathname);
+int open_remote_device(struct ibnbd_dev *dev);
+
+const char *ibnbd_clt_get_io_mode(const struct ibnbd_dev *dev);
+
+#define ERR_DEVS(sess, fmt, ...)	\
+({	struct ibnbd_dev *dev;		\
+					\
+	mutex_lock(&sess->lock);	\
+	list_for_each_entry(dev, &sess->devs_list, list) \
+		pr_err("ibnbd L%d <%s@%s> ERR:" fmt, \
+			__LINE__, dev->pathname, dev->sess->str_addr,\
+			##__VA_ARGS__); \
+	mutex_unlock(&sess->lock);	\
+})
+
+#define INFO_DEVS(sess, fmt, ...)	\
+({	struct ibnbd_dev *dev;		\
+					\
+	mutex_lock(&sess->lock);	\
+	list_for_each_entry(dev, &sess->devs_list, list) \
+		pr_info("ibnbd <%s@%s> INFO:" fmt, \
+			dev->pathname, dev->sess->str_addr,\
+			##__VA_ARGS__);	\
+	mutex_unlock(&sess->lock);	\
+})
+#endif /* _IBNBD_CLT_H */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH 17/28] ibnbd_clt: add header shared in ibnbd_client
@ 2017-03-24 10:45   ` Jack Wang
  0 siblings, 0 replies; 87+ messages in thread
From: Jack Wang @ 2017-03-24 10:45 UTC (permalink / raw)
  To: linux-block-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA, axboe-tSWWG44O7X1aa/9Udqfwiw,
	hch-jcswGhMUV9g, mail-99BIx50xQYGELgA04lAiVw,
	Milind.dumbare-Re5JQEeQqe8AvxtiuMwx3w,
	yun.wang-EIkl63zCoXaH+58JC4qpiA, Jack Wang, Kleber Souza,
	Danil Kipnis, Roman Pen

From: Jack Wang <jinpu.wang-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>

Signed-off-by: Jack Wang <jinpu.wang-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
Signed-off-by: Kleber Souza <kleber.souza-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
Signed-off-by: Danil Kipnis <danil.kipnis-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
Signed-off-by: Roman Pen <roman.penyaev-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
---
 drivers/block/ibnbd_client/ibnbd_clt.h | 231 +++++++++++++++++++++++++++++++++
 1 file changed, 231 insertions(+)
 create mode 100644 drivers/block/ibnbd_client/ibnbd_clt.h

diff --git a/drivers/block/ibnbd_client/ibnbd_clt.h b/drivers/block/ibnbd_client/ibnbd_clt.h
new file mode 100644
index 0000000..3f0db78
--- /dev/null
+++ b/drivers/block/ibnbd_client/ibnbd_clt.h
@@ -0,0 +1,231 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler < mail-99BIx50xQYGELgA04lAiVw@public.gmane.org>
+ *          Jack Wang <jinpu.wang-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
+ *   	    Kleber Souza <kleber.souza-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
+ * 	    Danil Kipnis <danil.kipnis-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
+ *   	    Roman Pen <roman.penyaev-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
+ *          Milind Dumbare <Milind.dumbare-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
+ *
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions, and the following disclaimer,
+ *    without modification.
+ * 2. Redistributions in binary form must reproduce at minimum a disclaimer
+ *    substantially similar to the "NO WARRANTY" disclaimer below
+ *    ("Disclaimer") and any redistribution must be conditioned upon
+ *    including a substantially similar Disclaimer requirement for further
+ *    binary redistribution.
+ * 3. Neither the names of the above-listed copyright holders nor the names
+ *    of any contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * NO WARRANTY
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * HOLDERS OR CONTRIBUTORS BE LIABLE FOR SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
+ * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGES.
+ *
+ */
+
+#ifndef _IBNBD_CLT_H
+#define _IBNBD_CLT_H
+#include <linux/blkdev.h>
+#include <linux/wait.h>			/* for wait_queue_head_t */
+#include <linux/in.h>			/* for sockaddr_in */
+#include <linux/inet.h>			/* for sockaddr_in */
+#include <linux/blk-mq.h>
+#include "ibnbd_clt_log.h"
+#include "../ibnbd_inc/ibnbd.h"
+#include "../ibnbd_inc/ibnbd-proto.h"	/* ibnbd protocol messages */
+#include <rdma/ibtrs_clt.h>	/* for ibtrs api */
+#include <rdma/ibtrs.h>
+
+#define IP_PREFIX "ip:"
+#define IP_PREFIX_LEN strlen(IP_PREFIX)
+#define GID_PREFIX "gid:"
+#define GID_PREFIX_LEN strlen(GID_PREFIX)
+
+#define BMAX_SEGMENTS 31
+#define RECONNECT_DELAY 30
+#define MAX_RECONNECTS -1
+
+enum ibnbd_dev_state {
+	DEV_STATE_INIT,
+	DEV_STATE_INIT_CLOSED,
+	DEV_STATE_CLOSED,
+	DEV_STATE_UNMAPPED,
+	DEV_STATE_OPEN
+};
+
+enum ibnbd_queue_mode {
+	BLK_MQ,
+	BLK_RQ
+};
+
+struct ibnbd_iu {
+	struct request		*rq;
+	struct ibtrs_tag	*tag;
+	struct ibnbd_dev	*dev;
+	struct ibnbd_msg_io	msg;
+	int			errno;
+	struct scatterlist	sglist[BMAX_SEGMENTS];
+};
+
+struct ibnbd_cpu_qlist {
+	struct list_head	requeue_list;
+	spinlock_t		requeue_lock;
+	unsigned int		cpu;
+};
+
+enum sess_state {
+	SESS_STATE_READY,
+	SESS_STATE_DISCONNECTED,
+	SESS_STATE_DESTROYED,
+};
+
+struct ibnbd_session {
+	struct list_head        list;
+	struct ibtrs_session    *sess;
+	struct ibnbd_cpu_qlist	__percpu
+				*cpu_queues;
+	DECLARE_BITMAP(cpu_queues_bm, NR_CPUS);
+	int	__percpu	*cpu_rr; /* per-cpu var for CPU round-robin */
+	atomic_t		busy;
+	int			queue_depth;
+	u32			max_io_size;
+	struct blk_mq_tag_set	tag_set;
+	struct mutex		lock; /* protects state and devs_list */
+	struct list_head        devs_list; /* list of struct ibnbd_dev */
+	struct kref		refcount;
+	struct sockaddr_storage addr;
+	char			str_addr[IBTRS_ADDRLEN];
+	char			hostname[MAXHOSTNAMELEN];
+	enum sess_state		state;
+	u8			ver; /* protocol version */
+	struct completion	*sess_info_compl;
+};
+
+struct ibnbd_work {
+	struct work_struct	work;
+	struct ibnbd_session	*sess;
+};
+
+/*
+ * Submission queues.
+ */
+struct ibnbd_queue {
+	struct list_head	requeue_list;
+	unsigned long		in_list;
+	struct ibnbd_dev	*dev;
+	struct blk_mq_hw_ctx	*hctx;
+};
+
+struct ibnbd_dev {
+	struct list_head        g_list;
+	struct ibnbd_session	*sess;
+	struct request_queue	*queue;
+	struct ibnbd_queue	*hw_queues;
+	struct delayed_work	rq_delay_work;
+	u32			device_id;
+	/* local Idr index - used to track minor number allocations. */
+	u32			clt_device_id;
+	struct completion	*open_compl;	/* completion for open msg */
+	int			open_errno;
+	struct mutex		lock;
+	enum ibnbd_dev_state	dev_state;
+	enum ibnbd_queue_mode	queue_mode;
+	enum ibnbd_io_mode	io_mode; /* user requested */
+	enum ibnbd_io_mode	remote_io_mode; /* server really used */
+	char			pathname[NAME_MAX];
+	enum ibnbd_access_mode	access_mode;
+	bool			read_only;
+	bool			rotational;
+	u32			max_hw_sectors;
+	u32			max_write_same_sectors;
+	u32			max_discard_sectors;
+	u32			discard_zeroes_data;
+	u32			discard_granularity;
+	u32			discard_alignment;
+	u16			secure_discard;
+	u16			physical_block_size;
+	u16			logical_block_size;
+	u16			max_segments;
+	size_t			nsectors;
+	u64			size;		/* device size in bytes */
+	struct list_head        list;
+	struct gendisk		*gd;
+	struct kobject		kobj;
+	char			blk_symlink_name[NAME_MAX];
+	struct completion	*close_compl;
+	atomic_t		refcount;
+};
+
+static inline const char *ibnbd_queue_mode_str(enum ibnbd_queue_mode mode)
+{
+	switch (mode) {
+	case BLK_RQ:
+		return "rq";
+	case BLK_MQ:
+		return "mq";
+	default:
+		return "unknown";
+	}
+}
+
+int ibnbd_close_device(struct ibnbd_dev *dev, bool force);
+struct ibnbd_session *ibnbd_create_session(const struct sockaddr_storage *addr);
+struct ibnbd_session *ibnbd_clt_find_sess(const struct sockaddr_storage *addr);
+void ibnbd_clt_sess_release(struct kref *ref);
+struct ibnbd_dev *ibnbd_client_add_device(struct ibnbd_session *sess,
+					  const char *pathname,
+					  enum ibnbd_access_mode access_mode,
+					  enum ibnbd_queue_mode queue_mode,
+					  enum ibnbd_io_mode io_mode);
+void ibnbd_destroy_gen_disk(struct ibnbd_dev *dev);
+int ibnbd_addr_to_str(const struct sockaddr_storage *addr,
+		      char *buf, size_t len);
+bool ibnbd_clt_dev_is_open(struct ibnbd_dev *dev);
+bool ibnbd_clt_dev_is_mapped(const char *pathname);
+int open_remote_device(struct ibnbd_dev *dev);
+
+const char *ibnbd_clt_get_io_mode(const struct ibnbd_dev *dev);
+
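+/* Log a message for each device mapped on the session (takes sess->lock) */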
+#define ERR_DEVS(sess, fmt, ...)	\
+({	struct ibnbd_dev *dev;		\
+					\
+	mutex_lock(&sess->lock);	\
+	list_for_each_entry(dev, &sess->devs_list, list) \
+		pr_err("ibnbd L%d <%s@%s> ERR:" fmt, \
+			__LINE__, dev->pathname, dev->sess->str_addr,\
+			##__VA_ARGS__); \
+	mutex_unlock(&sess->lock);	\
+})
+
+#define INFO_DEVS(sess, fmt, ...)	\
+({	struct ibnbd_dev *dev;		\
+					\
+	mutex_lock(&sess->lock);	\
+	list_for_each_entry(dev, &sess->devs_list, list) \
+		pr_info("ibnbd <%s@%s> INFO:" fmt, \
+			dev->pathname, dev->sess->str_addr,\
+			##__VA_ARGS__);	\
+	mutex_unlock(&sess->lock);	\
+})
+#endif /* _IBNBD_CLT_H */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH 18/28] ibnbd_clt: add sysfs interface
@ 2017-03-24 10:45   ` Jack Wang
  0 siblings, 0 replies; 87+ messages in thread
From: Jack Wang @ 2017-03-24 10:45 UTC (permalink / raw)
  To: linux-block, linux-rdma
  Cc: dledford, axboe, hch, mail, Milind.dumbare, yun.wang, Jack Wang,
	Kleber Souza, Danil Kipnis, Roman Pen

From: Jack Wang <jinpu.wang@profitbricks.com>

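Export a sysfs interface so mappings can be managed from user space:

 - /sys/kernel/ibnbd/map_device maps a remote device, e.g. (the server
   address and device path below are illustrative only):

     echo "server=ip:192.168.122.1 device_path=/dev/ram0 access_mode=rw" \
          > /sys/kernel/ibnbd/map_device

   Supported options are server=, device_path=, access_mode=
   (ro|rw|migration), input_mode= (mq|rq) and io_mode=
   (blockio|fileio); the address may be given as ip:<ipv4>, ip:<ipv6>
   or gid:<gid>.

 - /sys/kernel/ibnbd/devices/ contains one symlink per mapped device,
   named after the remote path with '/' replaced by '!'.

 - Each mapped disk gets an "ibnbd" directory in its block device
   sysfs entry with read-only attributes (state, mapping_path,
   session, input_mode, io_mode) and unmap_device/remap_device
   controls, e.g. (ibnbd0 is an example device name):

     echo normal > /sys/block/ibnbd0/ibnbd/unmap_device
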
Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
Signed-off-by: Kleber Souza <kleber.souza@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kipnis@profitbricks.com>
Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
---
 drivers/block/ibnbd_client/ibnbd_clt_sysfs.c | 863 +++++++++++++++++++++++++++
 drivers/block/ibnbd_client/ibnbd_clt_sysfs.h |  64 ++
 2 files changed, 927 insertions(+)
 create mode 100644 drivers/block/ibnbd_client/ibnbd_clt_sysfs.c
 create mode 100644 drivers/block/ibnbd_client/ibnbd_clt_sysfs.h

diff --git a/drivers/block/ibnbd_client/ibnbd_clt_sysfs.c b/drivers/block/ibnbd_client/ibnbd_clt_sysfs.c
new file mode 100644
index 0000000..89d487c
--- /dev/null
+++ b/drivers/block/ibnbd_client/ibnbd_clt_sysfs.c
@@ -0,0 +1,863 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler <mail@fholler.de>
+ *          Jack Wang <jinpu.wang@profitbricks.com>
+ *          Kleber Souza <kleber.souza@profitbricks.com>
+ *          Danil Kipnis <danil.kipnis@profitbricks.com>
+ *          Roman Pen <roman.penyaev@profitbricks.com>
+ *          Milind Dumbare <Milind.dumbare@gmail.com>
+ *
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions, and the following disclaimer,
+ *    without modification.
+ * 2. Redistributions in binary form must reproduce at minimum a disclaimer
+ *    substantially similar to the "NO WARRANTY" disclaimer below
+ *    ("Disclaimer") and any redistribution must be conditioned upon
+ *    including a substantially similar Disclaimer requirement for further
+ *    binary redistribution.
+ * 3. Neither the names of the above-listed copyright holders nor the names
+ *    of any contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * NO WARRANTY
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * HOLDERS OR CONTRIBUTORS BE LIABLE FOR SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
+ * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGES.
+ *
+ */
+
+#include <linux/types.h>
+#include <linux/ctype.h>
+#include <linux/parser.h>
+#include <linux/module.h>
+#include "ibnbd_clt_sysfs.h"
+#include "ibnbd_clt.h"
+#include <rdma/ibtrs.h>
+#include <rdma/ib.h>
+
+static struct kobject *ibnbd_kobject;
+static struct kobject *ibnbd_devices_kobject;
+static DEFINE_MUTEX(sess_lock);
+
+struct ibnbd_clt_dev_destroy_kobj_work {
+	struct ibnbd_dev	*dev;
+	struct work_struct	work;
+};
+
+enum {
+	IBNBD_OPT_ERR		= 0,
+	IBNBD_OPT_SERVER	= 1 << 0,
+	IBNBD_OPT_DEV_PATH	= 1 << 1,
+	IBNBD_OPT_ACCESS_MODE	= 1 << 3,
+	IBNBD_OPT_INPUT_MODE	= 1 << 4,
+	IBNBD_OPT_IO_MODE	= 1 << 5,
+};
+
+static unsigned ibnbd_opt_mandatory[] = {
+	IBNBD_OPT_SERVER,
+	IBNBD_OPT_DEV_PATH,
+};
+
+static const match_table_t ibnbd_opt_tokens = {
+	{	IBNBD_OPT_SERVER,	"server=%s"		},
+	{	IBNBD_OPT_DEV_PATH,	"device_path=%s"	},
+	{	IBNBD_OPT_ACCESS_MODE,	"access_mode=%s"	},
+	{	IBNBD_OPT_INPUT_MODE,	"input_mode=%s"		},
+	{	IBNBD_OPT_IO_MODE,	"io_mode=%s"		},
+	{	IBNBD_OPT_ERR,		NULL			},
+};
+
+/* remove newline characters from the string */
+static void strip(char *s)
+{
+	char *p = s;
+
+	while (*s != '\0') {
+		if (*s != '\n')
+			*p++ = *s++;
+		else
+			++s;
+	}
+	*p = '\0';
+}
+
+static int ibnbd_clt_parse_map_options(const char *buf, char *server_addr,
+				       char *pathname,
+				       enum ibnbd_access_mode *access_mode,
+				       enum ibnbd_queue_mode *queue_mode,
+				       enum ibnbd_io_mode *io_mode)
+{
+	char *options, *sep_opt;
+	char *p;
+	substring_t args[MAX_OPT_ARGS];
+	int opt_mask = 0;
+	int token;
+	int ret = -EINVAL;
+	int i;
+
+	options = kstrdup(buf, GFP_KERNEL);
+	if (!options)
+		return -ENOMEM;
+
+	strip(options);
+	/*
+	 * strstrip() may advance past leading whitespace, so keep 'options'
+	 * itself untouched for the kfree() below and parse from sep_opt.
+	 */
+	sep_opt = strstrip(options);
+	while ((p = strsep(&sep_opt, " ")) != NULL) {
+		if (!*p)
+			continue;
+
+		token = match_token(p, ibnbd_opt_tokens, args);
+		opt_mask |= token;
+
+		switch (token) {
+		case IBNBD_OPT_SERVER:
+			p = match_strdup(args);
+			if (!p) {
+				ret = -ENOMEM;
+				goto out;
+			}
+			if (strlen(p) >= IBTRS_ADDRLEN) {
+				ERR_NP("map_device: Server address too long\n");
+				ret = -EINVAL;
+				goto out;
+			}
+			strlcpy(server_addr, p, IBTRS_ADDRLEN);
+			kfree(p);
+			break;
+
+		case IBNBD_OPT_DEV_PATH:
+			p = match_strdup(args);
+			if (!p) {
+				ret = -ENOMEM;
+				goto out;
+			}
+			if (strlen(p) >= NAME_MAX) {
+				ERR_NP("map_device: Device path too long\n");
+				ret = -EINVAL;
+				goto out;
+			}
+			strlcpy(pathname, p, NAME_MAX);
+			kfree(p);
+			break;
+
+		case IBNBD_OPT_ACCESS_MODE:
+			p = match_strdup(args);
+			if (!p) {
+				ret = -ENOMEM;
+				goto out;
+			}
+
+			if (!strcmp(p, "ro")) {
+				*access_mode = IBNBD_ACCESS_RO;
+			} else if (!strcmp(p, "rw")) {
+				*access_mode = IBNBD_ACCESS_RW;
+			} else if (!strcmp(p, "migration")) {
+				*access_mode = IBNBD_ACCESS_MIGRATION;
+			} else {
+				ERR_NP("map_device: Invalid access_mode:"
+				       " '%s'\n", p);
+				ret = -EINVAL;
+				kfree(p);
+				goto out;
+			}
+
+			kfree(p);
+			break;
+
+		case IBNBD_OPT_INPUT_MODE:
+			p = match_strdup(args);
+			if (!p) {
+				ret = -ENOMEM;
+				goto out;
+			}
+			if (!strcmp(p, "mq")) {
+				*queue_mode = BLK_MQ;
+			} else if (!strcmp(p, "rq")) {
+				*queue_mode = BLK_RQ;
+			} else {
+				ERR_NP("map_device: Invalid input_mode: "
+				       "'%s'.\n", p);
+				ret = -EINVAL;
+				kfree(p);
+				goto out;
+			}
+			kfree(p);
+			break;
+
+		case IBNBD_OPT_IO_MODE:
+			p = match_strdup(args);
+			if (!p) {
+				ret = -ENOMEM;
+				goto out;
+			}
+			if (!strcmp(p, "blockio")) {
+				*io_mode = IBNBD_BLOCKIO;
+			} else if (!strcmp(p, "fileio")) {
+				*io_mode = IBNBD_FILEIO;
+			} else {
+				ERR_NP("map_device: Invalid io_mode: '%s'.\n",
+				       p);
+				ret = -EINVAL;
+				kfree(p);
+				goto out;
+			}
+			kfree(p);
+			break;
+
+		default:
+			ERR_NP("map_device: Unknown parameter or missing value"
+			       " '%s'\n", p);
+			ret = -EINVAL;
+			goto out;
+		}
+	}
+
+	for (i = 0; i < ARRAY_SIZE(ibnbd_opt_mandatory); i++) {
+		if ((opt_mask & ibnbd_opt_mandatory[i])) {
+			ret = 0;
+		} else {
+			ERR_NP("map_device: Parameters missing\n");
+			ret = -EINVAL;
+			break;
+		}
+	}
+
+out:
+	kfree(options);
+	return ret;
+}
+
+int ibnbd_clt_get_sess(struct ibnbd_session *sess)
+{
+	return kref_get_unless_zero(&sess->refcount);
+}
+
+void ibnbd_clt_put_sess(struct ibnbd_session *sess)
+{
+	mutex_lock(&sess_lock);
+	kref_put(&sess->refcount, ibnbd_clt_sess_release);
+	mutex_unlock(&sess_lock);
+}
+
+static void ibnbd_clt_dev_destroy_kobjs(struct ibnbd_dev *dev)
+{
+	kobject_del(&dev->kobj);
+	kobject_put(&dev->kobj);
+}
+
+static ssize_t ibnbd_clt_state_show(struct kobject *kobj,
+				    struct kobj_attribute *attr, char *page)
+{
+	struct ibnbd_dev *dev = container_of(kobj, struct ibnbd_dev, kobj);
+
+	switch (dev->dev_state) {
+	case (DEV_STATE_INIT):
+		return scnprintf(page, PAGE_SIZE, "init\n");
+	case (DEV_STATE_OPEN):
+		return scnprintf(page, PAGE_SIZE, "open\n");
+	case (DEV_STATE_CLOSED):
+		return scnprintf(page, PAGE_SIZE, "closed\n");
+	case (DEV_STATE_UNMAPPED):
+		return scnprintf(page, PAGE_SIZE, "unmapped\n");
+	default:
+		return scnprintf(page, PAGE_SIZE, "unknown\n");
+	}
+}
+
+static struct kobj_attribute ibnbd_clt_state_attr =
+	__ATTR(state, 0444, ibnbd_clt_state_show, NULL);
+
+static ssize_t ibnbd_clt_input_mode_show(struct kobject *kobj,
+					 struct kobj_attribute *attr,
+					 char *page)
+{
+	struct ibnbd_dev *dev = container_of(kobj, struct ibnbd_dev, kobj);
+
+	return scnprintf(page, PAGE_SIZE, "%s\n",
+			 ibnbd_queue_mode_str(dev->queue_mode));
+}
+
+static struct kobj_attribute ibnbd_clt_input_mode_attr =
+	__ATTR(input_mode, 0444, ibnbd_clt_input_mode_show, NULL);
+
+static ssize_t ibnbd_clt_mapping_path_show(struct kobject *kobj,
+					   struct kobj_attribute *attr,
+					   char *page)
+{
+	struct ibnbd_dev *dev;
+
+	dev = container_of(kobj, struct ibnbd_dev, kobj);
+
+	return scnprintf(page, PAGE_SIZE, "%s\n", dev->pathname);
+}
+
+static struct kobj_attribute ibnbd_clt_mapping_path_attr =
+	__ATTR(mapping_path, 0444, ibnbd_clt_mapping_path_show, NULL);
+
+static ssize_t ibnbd_clt_io_mode_show(struct kobject *kobj,
+				      struct kobj_attribute *attr, char *page)
+{
+	struct ibnbd_dev *dev = container_of(kobj, struct ibnbd_dev, kobj);
+
+	return scnprintf(page, PAGE_SIZE, "%s\n",
+			 ibnbd_io_mode_str(dev->remote_io_mode));
+}
+
+static struct kobj_attribute ibnbd_clt_io_mode =
+	__ATTR(io_mode, 0444, ibnbd_clt_io_mode_show, NULL);
+
+static ssize_t ibnbd_clt_unmap_dev_show(struct kobject *kobj,
+					struct kobj_attribute *attr, char *page)
+{
+	return scnprintf(page, PAGE_SIZE, "Usage: echo <normal|force> > %s\n",
+			 attr->attr.name);
+}
+
+static void ibnbd_clt_dev_kobj_destroy_worker(struct work_struct *work)
+{
+	struct ibnbd_clt_dev_destroy_kobj_work *destroy_work;
+	struct ibnbd_dev *dev;
+
+	destroy_work = container_of(work,
+				    struct ibnbd_clt_dev_destroy_kobj_work,
+				    work);
+	dev = destroy_work->dev;
+	kobject_get(&dev->kobj);
+	ibnbd_clt_dev_destroy_kobjs(dev);
+	kobject_put(&dev->kobj);
+	kfree(destroy_work);
+}
+
+void ibnbd_clt_schedule_dev_destroy(struct ibnbd_dev *dev)
+{
+	struct ibnbd_clt_dev_destroy_kobj_work *destroy_work = NULL;
+
+	/* memory allocation cannot fail, otherwise the last reference to the
+	 * session will never be put
+	 */
+	while (!destroy_work) {
+		destroy_work = kmalloc(sizeof(*destroy_work),
+				       (GFP_KERNEL | __GFP_REPEAT));
+		if (!destroy_work)
+			cond_resched();
+	}
+
+	destroy_work->dev = dev;
+	INIT_WORK(&destroy_work->work, ibnbd_clt_dev_kobj_destroy_worker);
+	if (WARN(!schedule_work(&destroy_work->work),
+		 "failed to schedule work\n"))
+		kfree(destroy_work);
+}
+
+static ssize_t ibnbd_clt_unmap_dev_store(struct kobject *kobj,
+					 struct kobj_attribute *attr,
+					 const char *buf, size_t count)
+{
+	int err;
+	struct ibnbd_dev *dev;
+	bool force;
+	char *options, *opt;
+
+	if (!try_module_get(THIS_MODULE))
+		return -ENODEV;
+
+	options = kstrdup(buf, GFP_KERNEL);
+	if (!options) {
+		module_put(THIS_MODULE);
+		return -ENOMEM;
+	}
+
+	/* keep 'options' for kfree(); strstrip() may advance the pointer */
+	opt = strstrip(options);
+	strip(opt);
+
+	dev = container_of(kobj, struct ibnbd_dev, kobj);
+
+	if (dev->dev_state == DEV_STATE_UNMAPPED) {
+		err = -EALREADY;
+		goto out;
+	}
+
+	if (sysfs_streq(opt, "normal")) {
+		force = false;
+	} else if (sysfs_streq(opt, "force")) {
+		force = true;
+	} else {
+		ERR(dev, "unmap_device: Invalid value: %s\n", opt);
+		err = -EINVAL;
+		goto out;
+	}
+
+	INFO(dev, "Unmapping device, option: %s.\n",
+	     force ? "force" : "normal");
+
+	err = ibnbd_close_device(dev, force);
+	if (err) {
+		ERR(dev, "unmap_device: Failed to close device, errno: %d\n",
+		    err);
+		goto out;
+	}
+
+	wait_for_completion(dev->close_compl);
+
+	ibnbd_clt_schedule_dev_destroy(dev);
+
+	module_put(THIS_MODULE);
+	kfree(options);
+	return count;
+out:
+	module_put(THIS_MODULE);
+	kfree(options);
+	return err;
+}
+
+static struct kobj_attribute ibnbd_clt_unmap_device_attr =
+	__ATTR(unmap_device, 0644, ibnbd_clt_unmap_dev_show,
+	       ibnbd_clt_unmap_dev_store);
+
+static ssize_t ibnbd_clt_remap_dev_show(struct kobject *kobj,
+					struct kobj_attribute *attr, char *page)
+{
+	return scnprintf(page, PAGE_SIZE, "Usage: echo <1> > %s\n",
+			 attr->attr.name);
+}
+
+static ssize_t ibnbd_clt_remap_dev_store(struct kobject *kobj,
+					 struct kobj_attribute *attr,
+					 const char *buf, size_t count)
+{
+	int err;
+	struct ibnbd_dev *dev;
+	char *options, *opt;
+
+	options = kstrdup(buf, GFP_KERNEL);
+	if (!options)
+		return -ENOMEM;
+
+	/* keep 'options' for kfree(); strstrip() may advance the pointer */
+	opt = strstrip(options);
+	strip(opt);
+
+	dev = container_of(kobj, struct ibnbd_dev, kobj);
+	if (!sysfs_streq(opt, "1")) {
+		ERR(dev, "remap_device: Invalid value: %s\n", opt);
+		err = -EINVAL;
+		goto out;
+	}
+
+	mutex_lock(&dev->lock);
+	if (dev->dev_state == DEV_STATE_UNMAPPED) {
+		err = -EIO;
+		mutex_unlock(&dev->lock);
+		goto out;
+	} else if (dev->dev_state == DEV_STATE_OPEN) {
+		mutex_unlock(&dev->lock);
+		goto out1;
+	} else if (dev->dev_state == DEV_STATE_CLOSED) {
+		mutex_unlock(&dev->lock);
+		INFO(dev, "Remapping device.\n");
+
+		err = open_remote_device(dev);
+		if (err) {
+			ERR(dev, "remap_device: Failed to remap device,"
+			    " errno: %d\n", err);
+			goto out;
+		}
+	} else {
+		/* still initializing: do not leak dev->lock on fall-through */
+		mutex_unlock(&dev->lock);
+	}
+
+out1:
+	kfree(options);
+	return count;
+out:
+	kfree(options);
+	return err;
+}
+
+static struct kobj_attribute ibnbd_clt_remap_device_attr =
+	__ATTR(remap_device, 0644, ibnbd_clt_remap_dev_show,
+	       ibnbd_clt_remap_dev_store);
+
+static ssize_t ibnbd_clt_session_show(struct kobject *kobj,
+				      struct kobj_attribute *attr,
+				      char *page)
+{
+	struct ibnbd_dev *dev = container_of(kobj, struct ibnbd_dev, kobj);
+	char server_addr[IBTRS_ADDRLEN];
+	int ret;
+
+	if (dev->dev_state == DEV_STATE_UNMAPPED)
+		return -EIO;
+
+	ret = ibtrs_addr_to_str(&dev->sess->addr, server_addr,
+				sizeof(server_addr));
+
+	if (ret < 0)
+		return ret;
+	if (ret >= sizeof(server_addr))
+		return -ENOBUFS;
+
+	return scnprintf(page, PAGE_SIZE, "%s\n", server_addr);
+}
+
+static struct kobj_attribute ibnbd_clt_session_attr =
+	__ATTR(session, 0444, ibnbd_clt_session_show, NULL);
+
+static struct attribute *ibnbd_dev_attrs[] = {
+	&ibnbd_clt_unmap_device_attr.attr,
+	&ibnbd_clt_remap_device_attr.attr,
+	&ibnbd_clt_mapping_path_attr.attr,
+	&ibnbd_clt_state_attr.attr,
+	&ibnbd_clt_input_mode_attr.attr,
+	&ibnbd_clt_session_attr.attr,
+	&ibnbd_clt_io_mode.attr,
+	NULL,
+};
+
+void ibnbd_clt_remove_dev_symlink(struct ibnbd_dev *dev)
+{
+	if (strlen(dev->blk_symlink_name))
+		sysfs_remove_link(ibnbd_devices_kobject, dev->blk_symlink_name);
+}
+
+static void ibnbd_clt_dev_release(struct kobject *kobj)
+{
+	struct ibnbd_dev *dev;
+
+	dev = container_of(kobj, struct ibnbd_dev, kobj);
+	ibnbd_destroy_gen_disk(dev);
+}
+
+static struct kobj_type ibnbd_dev_ktype = {
+	.sysfs_ops      = &kobj_sysfs_ops,
+	.default_attrs  = ibnbd_dev_attrs,
+	.release	= ibnbd_clt_dev_release,
+};
+
+static int ibnbd_clt_add_dev_kobj(struct ibnbd_dev *dev)
+{
+	int ret;
+	struct kobject *gd_kobj = &disk_to_dev(dev->gd)->kobj;
+
+	ret = kobject_init_and_add(&dev->kobj, &ibnbd_dev_ktype, gd_kobj, "%s",
+				   "ibnbd");
+	if (ret)
+		ERR(dev, "Failed to create device sysfs dir, errno: %d\n", ret);
+
+	return ret;
+}
+
+static int ibnbd_clt_str_ipv4_to_sockaddr(const char *con_addr,
+					  struct sockaddr_storage *dst)
+{
+	int ret;
+	char ipaddr[INET6_ADDRSTRLEN];
+	struct sockaddr_in *dst_sin = (struct sockaddr_in *)dst;
+	u8 ip4[4];
+
+	strlcpy(ipaddr, &con_addr[IP_PREFIX_LEN], sizeof(ipaddr));
+
+	ret = in4_pton(ipaddr, strlen(ipaddr), ip4, '\0', NULL);
+	if (ret == 0)
+		return -EINVAL;
+
+	memcpy(&dst_sin->sin_addr.s_addr, ip4,
+	       sizeof(dst_sin->sin_addr.s_addr));
+	dst_sin->sin_family = AF_INET;
+	dst_sin->sin_port = htons(IBTRS_SERVER_PORT);
+
+	return 0;
+}
+
+static int ibnbd_clt_str_ipv6_to_sockaddr(const char *con_addr,
+					  struct sockaddr_storage *dst)
+{
+	int ret;
+	char ipaddr[INET6_ADDRSTRLEN];
+	struct sockaddr_in6 *dst_sin6 = (struct sockaddr_in6 *)dst;
+
+	strlcpy(ipaddr, &con_addr[IP_PREFIX_LEN], sizeof(ipaddr));
+
+	ret = in6_pton(ipaddr, strlen(ipaddr),
+		       dst_sin6->sin6_addr.s6_addr,
+		       '\0', NULL);
+	if (ret != 1)
+		return -EINVAL;
+
+	dst_sin6->sin6_family = AF_INET6;
+	dst_sin6->sin6_port = htons(IBTRS_SERVER_PORT);
+
+	return 0;
+}
+
+static int ibnbd_clt_str_gid_to_sockaddr(const char *con_addr,
+					 struct sockaddr_storage *dst)
+{
+	int ret;
+	char gid[INET6_ADDRSTRLEN];
+	struct sockaddr_ib *dst_ib = (struct sockaddr_ib *)dst;
+
+	strlcpy(gid, &con_addr[GID_PREFIX_LEN], sizeof(gid));
+
+	/* We can use some of the I6 functions since GID is a valid
+	 * IPv6 address format
+	 */
+	ret = in6_pton(gid, strlen(gid),
+		       dst_ib->sib_addr.sib_raw, '\0', NULL);
+	if (ret == 0)
+		return -EINVAL;
+
+	dst_ib->sib_family = AF_IB;
+	/*
+	 * Use the same TCP server port number as the IB service ID
+	 * on the IB port space range
+	 */
+	dst_ib->sib_sid = cpu_to_be64(RDMA_IB_IP_PS_IB | IBTRS_SERVER_PORT);
+	dst_ib->sib_sid_mask = cpu_to_be64(0xffffffffffffffffULL);
+	dst_ib->sib_pkey = cpu_to_be16(0xffff);
+
+	return 0;
+}
+
+static int ibnbd_clt_str_to_sockaddr(char *addr,
+				     struct sockaddr_storage *sockaddr)
+{
+	if (strncmp(addr, GID_PREFIX, GID_PREFIX_LEN) == 0) {
+		return ibnbd_clt_str_gid_to_sockaddr(addr, sockaddr);
+	} else if (strncmp(addr, IP_PREFIX, IP_PREFIX_LEN) == 0) {
+		if (ibnbd_clt_str_ipv4_to_sockaddr(addr, sockaddr))
+			return ibnbd_clt_str_ipv6_to_sockaddr(addr, sockaddr);
+		else
+			return 0;
+	}
+	return -EPROTONOSUPPORT;
+}
+
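+/*
+ * Find an existing session to @sockaddr and take a reference to it,
+ * or create a new one if none exists.
+ */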
+static struct ibnbd_session *
+ibnbd_clt_get_create_sess(struct sockaddr_storage *sockaddr)
+{
+	struct ibnbd_session *sess;
+
+	mutex_lock(&sess_lock);
+	sess = ibnbd_clt_find_sess(sockaddr);
+	if (sess) {
+		if (sess->state != SESS_STATE_READY ||
+		    !ibnbd_clt_get_sess(sess)) {
+			ERR_NP("Session is not connected or "
+			       "is being destroyed\n");
+			sess = ERR_PTR(-EIO);
+		}
+	} else {
+		sess = ibnbd_create_session(sockaddr);
+	}
+	mutex_unlock(&sess_lock);
+
+	return sess;
+}
+
+static ssize_t ibnbd_clt_map_device_show(struct kobject *kobj,
+					 struct kobj_attribute *attr,
+					 char *page)
+{
+	return scnprintf(page, PAGE_SIZE, "Usage: echo \"server=<address>"
+			 " device_path=<full path on remote side>"
+			 " [access_mode=<ro|rw|migration>]"
+			 " [input_mode=<mq|rq>]"
+			 " [io_mode=<fileio|blockio>]\" > %s\n\n"
+			 "address ::= [ ip:<ipv4> | ip:<ipv6> | gid:<gid> ]\n",
+			 attr->attr.name);
+}
+
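+/*
+ * Build the name of the symlink created under /sys/kernel/ibnbd/devices:
+ * the remote path with every '/' replaced by '!'.
+ */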
+static int ibnbd_clt_get_path_name(struct ibnbd_dev *dev, char *buf,
+				   size_t len)
+{
+	int ret;
+	char pathname[NAME_MAX], *s;
+
+	strlcpy(pathname, dev->pathname, sizeof(pathname));
+	while ((s = strchr(pathname, '/')))
+		s[0] = '!';
+
+	ret = snprintf(buf, len, "%s", pathname);
+	if (ret >= len)
+		return -ENAMETOOLONG;
+
+	return 0;
+}
+
+static int ibnbd_clt_add_dev_symlink(struct ibnbd_dev *dev)
+{
+	struct kobject *gd_kobj = &disk_to_dev(dev->gd)->kobj;
+	int ret;
+
+	ret = ibnbd_clt_get_path_name(dev, dev->blk_symlink_name,
+				      sizeof(dev->blk_symlink_name));
+	if (ret) {
+		ERR(dev, "Failed to get /sys/block symlink path, error: %d.\n",
+		    ret);
+		goto out_err;
+	}
+
+	ret = sysfs_create_link(ibnbd_devices_kobject, gd_kobj,
+				dev->blk_symlink_name);
+	if (ret) {
+		ERR(dev, "Creating /sys/block symlink failed, error: %d.\n",
+		    ret);
+		goto out_err;
+	}
+
+	return 0;
+
+out_err:
+	dev->blk_symlink_name[0] = '\0';
+	return ret;
+}
+
+static ssize_t ibnbd_clt_map_device_store(struct kobject *kobj,
+					  struct kobj_attribute *attr,
+					  const char *buf, size_t count)
+{
+	struct ibnbd_session *sess;
+	struct ibnbd_dev *dev;
+	int ret;
+	char pathname[NAME_MAX];
+	char server_addr[IBTRS_ADDRLEN];
+	enum ibnbd_access_mode access_mode = IBNBD_ACCESS_RW;
+	enum ibnbd_queue_mode queue_mode = BLK_MQ;
+	enum ibnbd_io_mode io_mode = IBNBD_AUTOIO;
+	struct sockaddr_storage sockaddr;
+
+	ret = ibnbd_clt_parse_map_options(buf, server_addr, pathname,
+					  &access_mode, &queue_mode,
+					  &io_mode);
+	if (ret)
+		return ret;
+
+	ret = ibnbd_clt_str_to_sockaddr(server_addr, &sockaddr);
+	if (ret) {
+		if (ret == -EPROTONOSUPPORT)
+			ERR_NP("Invalid address protocol provided: %s\n",
+			       server_addr);
+		else
+			ERR_NP("Converting address to binary format failed: "
+			       "%s\n", server_addr);
+		return -EINVAL;
+	}
+
+	if (ibnbd_clt_dev_is_mapped(pathname)) {
+		ERR_NP("map_device: failed, Device with same path '%s' is"
+		       " already mapped\n", pathname);
+		return -EEXIST;
+	}
+
+	INFO_NP("Mapping device %s from server %s,"
+	      " (access_mode: %s, input_mode: %s, io_mode: %s)\n",
+	      pathname, server_addr, ibnbd_access_mode_str(access_mode),
+	      ibnbd_queue_mode_str(queue_mode), ibnbd_io_mode_str(io_mode));
+
+	sess = ibnbd_clt_get_create_sess(&sockaddr);
+	if (IS_ERR(sess))
+		return PTR_ERR(sess);
+
+	dev = ibnbd_client_add_device(sess, pathname, access_mode, queue_mode,
+				      io_mode);
+	if (IS_ERR(dev)) {
+		ret = PTR_ERR(dev);
+		goto out_sess_put;
+	}
+
+	ret = ibnbd_clt_add_dev_kobj(dev);
+	if (ret) {
+		if (!WARN_ON(ibnbd_close_device(dev, true)))
+			wait_for_completion(dev->close_compl);
+		/* ibnbd_destroy_gen_disk() will put the reference that was
+		 * acquired by ibnbd_client_add_device()
+		 */
+		ibnbd_destroy_gen_disk(dev);
+		goto out_sess_put;
+	}
+
+	ret = ibnbd_clt_add_dev_symlink(dev);
+	if (ret)
+		goto out_close_dev;
+
+	ibnbd_clt_put_sess(sess);
+	return count;
+
+out_close_dev:
+	if (!WARN_ON(ibnbd_close_device(dev, true)))
+		wait_for_completion(dev->close_compl);
+	kobject_del(&dev->kobj);
+	kobject_put(&dev->kobj);
+out_sess_put:
+	ibnbd_clt_put_sess(sess);
+
+	return ret;
+}
+
+static struct kobj_attribute ibnbd_clt_map_device_attr =
+	__ATTR(map_device, 0644,
+	       ibnbd_clt_map_device_show, ibnbd_clt_map_device_store);
+
+static struct attribute *default_attrs[] = {
+	&ibnbd_clt_map_device_attr.attr,
+	NULL,
+};
+
+static struct attribute_group default_attr_group = {
+	.attrs = default_attrs,
+};
+
+int ibnbd_clt_create_sysfs_files(void)
+{
+	int err = 0;
+
+	ibnbd_kobject = kobject_create_and_add("ibnbd", kernel_kobj);
+	if (!ibnbd_kobject) {
+		err = -ENOMEM;
+		goto err1;
+	}
+
+	ibnbd_devices_kobject = kobject_create_and_add("devices",
+						       ibnbd_kobject);
+	if (!ibnbd_devices_kobject) {
+		err = -ENOMEM;
+		goto err2;
+	}
+
+	err = sysfs_create_group(ibnbd_kobject, &default_attr_group);
+	if (err)
+		goto err3;
+
+	return 0;
+
+err3:
+	kobject_put(ibnbd_devices_kobject);
+err2:
+	kobject_put(ibnbd_kobject);
+err1:
+	return err;
+}
+
+void ibnbd_clt_destroy_default_group(void)
+{
+	sysfs_remove_group(ibnbd_kobject, &default_attr_group);
+}
+
+void ibnbd_clt_destroy_sysfs_files(void)
+{
+	kobject_put(ibnbd_devices_kobject);
+	kobject_put(ibnbd_kobject);
+}
diff --git a/drivers/block/ibnbd_client/ibnbd_clt_sysfs.h b/drivers/block/ibnbd_client/ibnbd_clt_sysfs.h
new file mode 100644
index 0000000..34f6013
--- /dev/null
+++ b/drivers/block/ibnbd_client/ibnbd_clt_sysfs.h
@@ -0,0 +1,64 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler <mail@fholler.de>
+ *          Jack Wang <jinpu.wang@profitbricks.com>
+ *          Kleber Souza <kleber.souza@profitbricks.com>
+ *          Danil Kipnis <danil.kipnis@profitbricks.com>
+ *          Roman Pen <roman.penyaev@profitbricks.com>
+ *          Milind Dumbare <Milind.dumbare@gmail.com>
+ *
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions, and the following disclaimer,
+ *    without modification.
+ * 2. Redistributions in binary form must reproduce at minimum a disclaimer
+ *    substantially similar to the "NO WARRANTY" disclaimer below
+ *    ("Disclaimer") and any redistribution must be conditioned upon
+ *    including a substantially similar Disclaimer requirement for further
+ *    binary redistribution.
+ * 3. Neither the names of the above-listed copyright holders nor the names
+ *    of any contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * NO WARRANTY
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * HOLDERS OR CONTRIBUTORS BE LIABLE FOR SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
+ * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGES.
+ *
+ */
+
+#ifndef _IBNBD_CLT_SYSFS_H
+#define _IBNBD_CLT_SYSFS_H
+
+#include "ibnbd_clt.h"
+
+int ibnbd_clt_create_sysfs_files(void);
+
+void ibnbd_clt_destroy_sysfs_files(void);
+void ibnbd_clt_destroy_default_group(void);
+void ibnbd_clt_schedule_dev_destroy(struct ibnbd_dev *dev);
+
+void ibnbd_clt_remove_dev_symlink(struct ibnbd_dev *dev);
+
+int ibnbd_clt_get_sess(struct ibnbd_session *sess);
+
+void ibnbd_clt_put_sess(struct ibnbd_session *sess);
+
+#endif
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH 18/28] ibnbd_clt: add sysfs interface
@ 2017-03-24 10:45   ` Jack Wang
  0 siblings, 0 replies; 87+ messages in thread
From: Jack Wang @ 2017-03-24 10:45 UTC (permalink / raw)
  To: linux-block-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA, axboe-tSWWG44O7X1aa/9Udqfwiw,
	hch-jcswGhMUV9g, mail-99BIx50xQYGELgA04lAiVw,
	Milind.dumbare-Re5JQEeQqe8AvxtiuMwx3w,
	yun.wang-EIkl63zCoXaH+58JC4qpiA, Jack Wang, Kleber Souza,
	Danil Kipnis, Roman Pen

From: Jack Wang <jinpu.wang-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>

Signed-off-by: Jack Wang <jinpu.wang-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
Signed-off-by: Kleber Souza <kleber.souza-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
Signed-off-by: Danil Kipnis <danil.kipnis-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
Signed-off-by: Roman Pen <roman.penyaev-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
---
 drivers/block/ibnbd_client/ibnbd_clt_sysfs.c | 863 +++++++++++++++++++++++++++
 drivers/block/ibnbd_client/ibnbd_clt_sysfs.h |  64 ++
 2 files changed, 927 insertions(+)
 create mode 100644 drivers/block/ibnbd_client/ibnbd_clt_sysfs.c
 create mode 100644 drivers/block/ibnbd_client/ibnbd_clt_sysfs.h

diff --git a/drivers/block/ibnbd_client/ibnbd_clt_sysfs.c b/drivers/block/ibnbd_client/ibnbd_clt_sysfs.c
new file mode 100644
index 0000000..89d487c
--- /dev/null
+++ b/drivers/block/ibnbd_client/ibnbd_clt_sysfs.c
@@ -0,0 +1,863 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler < mail-99BIx50xQYGELgA04lAiVw@public.gmane.org>
+ *          Jack Wang <jinpu.wang-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
+ *   	    Kleber Souza <kleber.souza-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
+ * 	    Danil Kipnis <danil.kipnis-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
+ *   	    Roman Pen <roman.penyaev-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
+ *          Milind Dumbare <Milind.dumbare-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
+ *
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions, and the following disclaimer,
+ *    without modification.
+ * 2. Redistributions in binary form must reproduce at minimum a disclaimer
+ *    substantially similar to the "NO WARRANTY" disclaimer below
+ *    ("Disclaimer") and any redistribution must be conditioned upon
+ *    including a substantially similar Disclaimer requirement for further
+ *    binary redistribution.
+ * 3. Neither the names of the above-listed copyright holders nor the names
+ *    of any contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * NO WARRANTY
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * HOLDERS OR CONTRIBUTORS BE LIABLE FOR SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
+ * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGES.
+ *
+ */
+
+#include <linux/types.h>
+#include <linux/ctype.h>
+#include <linux/parser.h>
+#include <linux/module.h>
+#include "ibnbd_clt_sysfs.h"
+#include "ibnbd_clt.h"
+#include <rdma/ibtrs.h>
+#include <rdma/ib.h>
+
+static struct kobject *ibnbd_kobject;
+static struct kobject *ibnbd_devices_kobject;
+static DEFINE_MUTEX(sess_lock);
+
+struct ibnbd_clt_dev_destroy_kobj_work {
+	struct ibnbd_dev	*dev;
+	struct work_struct	work;
+};
+
+enum {
+	IBNBD_OPT_ERR		= 0,
+	IBNBD_OPT_SERVER	= 1 << 0,
+	IBNBD_OPT_DEV_PATH	= 1 << 1,
+	IBNBD_OPT_ACCESS_MODE	= 1 << 3,
+	IBNBD_OPT_INPUT_MODE	= 1 << 4,
+	IBNBD_OPT_IO_MODE	= 1 << 5,
+};
+
+static unsigned ibnbd_opt_mandatory[] = {
+	IBNBD_OPT_SERVER,
+	IBNBD_OPT_DEV_PATH,
+};
+
+static const match_table_t ibnbd_opt_tokens = {
+	{	IBNBD_OPT_SERVER,	"server=%s"		},
+	{	IBNBD_OPT_DEV_PATH,	"device_path=%s"	},
+	{	IBNBD_OPT_ACCESS_MODE,	"access_mode=%s"	},
+	{	IBNBD_OPT_INPUT_MODE,	"input_mode=%s"		},
+	{	IBNBD_OPT_IO_MODE,	"io_mode=%s"		},
+	{	IBNBD_OPT_ERR,		NULL			},
+};
+
+/* remove new line from string */
+static void strip(char *s)
+{
+	char *p = s;
+
+	while (*s != '\0') {
+		if (*s != '\n')
+			*p++ = *s++;
+		else
+			++s;
+	}
+	*p = '\0';
+}
+
+static int ibnbd_clt_parse_map_options(const char *buf, char *server_addr,
+				       char *pathname,
+				       enum ibnbd_access_mode *access_mode,
+				       enum ibnbd_queue_mode *queue_mode,
+				       enum ibnbd_io_mode *io_mode)
+{
+	char *options, *sep_opt;
+	char *p;
+	substring_t args[MAX_OPT_ARGS];
+	int opt_mask = 0;
+	int token;
+	int ret = -EINVAL;
+	int i;
+
+	options = kstrdup(buf, GFP_KERNEL);
+	if (!options)
+		return -ENOMEM;
+
+	options = strstrip(options);
+	strip(options);
+	sep_opt = options;
+	while ((p = strsep(&sep_opt, " ")) != NULL) {
+		if (!*p)
+			continue;
+
+		token = match_token(p, ibnbd_opt_tokens, args);
+		opt_mask |= token;
+
+		switch (token) {
+		case IBNBD_OPT_SERVER:
+			p = match_strdup(args);
+			if (!p) {
+				ret = -ENOMEM;
+				goto out;
+			}
+			if (strlen(p) > IBTRS_ADDRLEN) {
+				ERR_NP("map_device: Server address too long\n");
+				ret = -EINVAL;
+				goto out;
+			}
+			strlcpy(server_addr, p, IBTRS_ADDRLEN);
+			kfree(p);
+			break;
+
+		case IBNBD_OPT_DEV_PATH:
+			p = match_strdup(args);
+			if (!p) {
+				ret = -ENOMEM;
+				goto out;
+			}
+			if (strlen(p) > NAME_MAX) {
+				ERR_NP("map_device: Device path too long\n");
+				ret = -EINVAL;
+				goto out;
+			}
+			strlcpy(pathname, p, NAME_MAX);
+			kfree(p);
+			break;
+
+		case IBNBD_OPT_ACCESS_MODE:
+			p = match_strdup(args);
+			if (!p) {
+				ret = -ENOMEM;
+				goto out;
+			}
+
+			if (!strcmp(p, "ro")) {
+				*access_mode = IBNBD_ACCESS_RO;
+			} else if (!strcmp(p, "rw")) {
+				*access_mode = IBNBD_ACCESS_RW;
+			} else if (!strcmp(p, "migration")) {
+				*access_mode = IBNBD_ACCESS_MIGRATION;
+			} else {
+				ERR_NP("map_device: Invalid access_mode:"
+				       " '%s'\n", p);
+				ret = -EINVAL;
+				kfree(p);
+				goto out;
+			}
+
+			kfree(p);
+			break;
+
+		case IBNBD_OPT_INPUT_MODE:
+			p = match_strdup(args);
+			if (!p) {
+				ret = -ENOMEM;
+				goto out;
+			}
+			if (!strcmp(p, "mq")) {
+				*queue_mode = BLK_MQ;
+			} else if (!strcmp(p, "rq")) {
+				*queue_mode = BLK_RQ;
+			} else {
+				ERR_NP("map_device: Invalid input_mode: "
+				       "'%s'.\n", p);
+				ret = -EINVAL;
+				kfree(p);
+				goto out;
+			}
+			kfree(p);
+			break;
+
+		case IBNBD_OPT_IO_MODE:
+			p = match_strdup(args);
+			if (!p) {
+				ret = -ENOMEM;
+				goto out;
+			}
+			if (!strcmp(p, "blockio")) {
+				*io_mode = IBNBD_BLOCKIO;
+			} else if (!strcmp(p, "fileio")) {
+				*io_mode = IBNBD_FILEIO;
+			} else {
+				ERR_NP("map_device: Invalid io_mode: '%s'.\n",
+				       p);
+				ret = -EINVAL;
+				kfree(p);
+				goto out;
+			}
+			kfree(p);
+			break;
+
+		default:
+			ERR_NP("map_device: Unknown parameter or missing value"
+			       " '%s'\n", p);
+			ret = -EINVAL;
+			goto out;
+		}
+	}
+
+	for (i = 0; i < ARRAY_SIZE(ibnbd_opt_mandatory); i++) {
+		if ((opt_mask & ibnbd_opt_mandatory[i])) {
+			ret = 0;
+		} else {
+			ERR_NP("map_device: Parameters missing\n");
+			ret = -EINVAL;
+			break;
+		}
+	}
+
+out:
+	kfree(options);
+	return ret;
+}
+
+int ibnbd_clt_get_sess(struct ibnbd_session *sess)
+{
+	return kref_get_unless_zero(&sess->refcount);
+}
+
+void ibnbd_clt_put_sess(struct ibnbd_session *sess)
+{
+	mutex_lock(&sess_lock);
+	kref_put(&sess->refcount, ibnbd_clt_sess_release);
+	mutex_unlock(&sess_lock);
+}
+
+static void ibnbd_clt_dev_destroy_kobjs(struct ibnbd_dev *dev)
+{
+	kobject_del(&dev->kobj);
+	kobject_put(&dev->kobj);
+}
+
+static ssize_t ibnbd_clt_state_show(struct kobject *kobj,
+				    struct kobj_attribute *attr, char *page)
+{
+	struct ibnbd_dev *dev = container_of(kobj, struct ibnbd_dev, kobj);
+
+	switch (dev->dev_state) {
+	case (DEV_STATE_INIT):
+		return scnprintf(page, PAGE_SIZE, "init\n");
+	case (DEV_STATE_OPEN):
+		return scnprintf(page, PAGE_SIZE, "open\n");
+	case (DEV_STATE_CLOSED):
+		return scnprintf(page, PAGE_SIZE, "closed\n");
+	case (DEV_STATE_UNMAPPED):
+		return scnprintf(page, PAGE_SIZE, "unmapped\n");
+	default:
+		return scnprintf(page, PAGE_SIZE, "unknown\n");
+	}
+}
+
+static struct kobj_attribute ibnbd_clt_state_attr =
+	__ATTR(state, 0444, ibnbd_clt_state_show, NULL);
+
+static ssize_t ibnbd_clt_input_mode_show(struct kobject *kobj,
+					 struct kobj_attribute *attr,
+					 char *page)
+{
+	struct ibnbd_dev *dev = container_of(kobj, struct ibnbd_dev, kobj);
+
+	return scnprintf(page, PAGE_SIZE, "%s\n",
+			 ibnbd_queue_mode_str(dev->queue_mode));
+}
+
+static struct kobj_attribute ibnbd_clt_input_mode_attr =
+	__ATTR(input_mode, 0444, ibnbd_clt_input_mode_show, NULL);
+
+static ssize_t ibnbd_clt_mapping_path_show(struct kobject *kobj,
+					   struct kobj_attribute *attr,
+					   char *page)
+{
+	struct ibnbd_dev *dev;
+
+	dev = container_of(kobj, struct ibnbd_dev, kobj);
+
+	return scnprintf(page, PAGE_SIZE, "%s\n", dev->pathname);
+}
+
+static struct kobj_attribute ibnbd_clt_mapping_path_attr =
+	__ATTR(mapping_path, 0444, ibnbd_clt_mapping_path_show, NULL);
+
+static ssize_t ibnbd_clt_io_mode_show(struct kobject *kobj,
+				      struct kobj_attribute *attr, char *page)
+{
+	struct ibnbd_dev *dev = container_of(kobj, struct ibnbd_dev, kobj);
+
+	return scnprintf(page, PAGE_SIZE, "%s\n",
+			 ibnbd_io_mode_str(dev->remote_io_mode));
+}
+
+static struct kobj_attribute ibnbd_clt_io_mode =
+	__ATTR(io_mode, 0444, ibnbd_clt_io_mode_show, NULL);
+
+static ssize_t ibnbd_clt_unmap_dev_show(struct kobject *kobj,
+					struct kobj_attribute *attr, char *page)
+{
+	return scnprintf(page, PAGE_SIZE, "Usage: echo <normal|force> > %s\n",
+			 attr->attr.name);
+}
+
+static void ibnbd_clt_dev_kobj_destroy_worker(struct work_struct *work)
+{
+	struct ibnbd_clt_dev_destroy_kobj_work *destroy_work;
+	struct ibnbd_dev *dev;
+
+	destroy_work = container_of(work,
+				    struct ibnbd_clt_dev_destroy_kobj_work,
+				    work);
+	dev = destroy_work->dev;
+	kobject_get(&dev->kobj);
+	ibnbd_clt_dev_destroy_kobjs(dev);
+	kobject_put(&dev->kobj);
+	kfree(destroy_work);
+}
+
+void ibnbd_clt_schedule_dev_destroy(struct ibnbd_dev *dev)
+{
+	struct ibnbd_clt_dev_destroy_kobj_work *destroy_work = NULL;
+
+	/* memory allocation cannot fail, otherwise the last reference to the
+	 * session will never be put
+	 */
+	while (!destroy_work) {
+		destroy_work = kmalloc(sizeof(*destroy_work),
+				       (GFP_KERNEL | __GFP_REPEAT));
+		if (!destroy_work)
+			cond_resched();
+	}
+
+	destroy_work->dev = dev;
+	INIT_WORK(&destroy_work->work, ibnbd_clt_dev_kobj_destroy_worker);
+	if (WARN(!schedule_work(&destroy_work->work),
+		 "failed to schedule work\n"))
+		kfree(destroy_work);
+}
+
+static ssize_t ibnbd_clt_unmap_dev_store(struct kobject *kobj,
+					 struct kobj_attribute *attr,
+					 const char *buf, size_t count)
+{
+	int err;
+	struct ibnbd_dev *dev;
+	bool force;
+	char *options;
+
+	if (!try_module_get(THIS_MODULE))
+		return -ENODEV;
+	options = kstrdup(buf, GFP_KERNEL);
+	if (!options)
+		return -ENOMEM;
+
+	options = strstrip(options);
+	strip(options);
+
+	dev = container_of(kobj, struct ibnbd_dev, kobj);
+
+	if (dev->dev_state == DEV_STATE_UNMAPPED) {
+		err = -EALREADY;
+		goto out;
+	}
+
+	if (sysfs_streq(options, "normal")) {
+		force = false;
+	} else if (sysfs_streq(options, "force")) {
+		force = true;
+	} else {
+		ERR(dev, "unmap_device: Invalid value: %s\n", options);
+		err = -EINVAL;
+		goto out;
+	}
+
+	INFO(dev, "Unmapping device, option: %s.\n",
+	     force ? "force" : "normal");
+
+	err = ibnbd_close_device(dev, force);
+	if (err) {
+		ERR(dev, "unmap_device: Failed to close device, errno: %d\n",
+		    err);
+		goto out;
+	}
+
+	wait_for_completion(dev->close_compl);
+
+	ibnbd_clt_schedule_dev_destroy(dev);
+
+	module_put(THIS_MODULE);
+	kfree(options);
+	return count;
+out:
+	module_put(THIS_MODULE);
+	kfree(options);
+	return err;
+}
+
+static struct kobj_attribute ibnbd_clt_unmap_device_attr =
+	__ATTR(unmap_device, 0644, ibnbd_clt_unmap_dev_show,
+	       ibnbd_clt_unmap_dev_store);
+
+static ssize_t ibnbd_clt_remap_dev_show(struct kobject *kobj,
+					struct kobj_attribute *attr, char *page)
+{
+	return scnprintf(page, PAGE_SIZE, "Usage: echo <1> > %s\n",
+			 attr->attr.name);
+}
+
+static ssize_t ibnbd_clt_remap_dev_store(struct kobject *kobj,
+					 struct kobj_attribute *attr,
+					 const char *buf, size_t count)
+{
+	int err;
+	struct ibnbd_dev *dev;
+	char *options;
+
+	options = kstrdup(buf, GFP_KERNEL);
+	if (!options)
+		return -ENOMEM;
+
+	options = strstrip(options);
+	strip(options);
+
+	dev = container_of(kobj, struct ibnbd_dev, kobj);
+	if (!sysfs_streq(options, "1")) {
+		ERR(dev, "remap_device: Invalid value: %s\n", options);
+		err = -EINVAL;
+		goto out;
+	}
+
+	mutex_lock(&dev->lock);
+	if (dev->dev_state == DEV_STATE_UNMAPPED) {
+		err = -EIO;
+		mutex_unlock(&dev->lock);
+		goto out;
+	} else if (dev->dev_state == DEV_STATE_OPEN) {
+		mutex_unlock(&dev->lock);
+		goto out1;
+	} else if (dev->dev_state == DEV_STATE_CLOSED) {
+		mutex_unlock(&dev->lock);
+		INFO(dev, "Remapping device.\n");
+
+		err = open_remote_device(dev);
+		if (err) {
+			ERR(dev, "remap_device: Failed to remap device,"
+			    " errno: %d\n", err);
+			goto out;
+		}
+	}
+
+out1:
+	kfree(options);
+	return count;
+out:
+	kfree(options);
+	return err;
+}
+
+static struct kobj_attribute ibnbd_clt_remap_device_attr =
+	__ATTR(remap_device, 0644, ibnbd_clt_remap_dev_show,
+	       ibnbd_clt_remap_dev_store);
+
+static ssize_t ibnbd_clt_session_show(struct kobject *kobj,
+				      struct kobj_attribute *attr,
+				      char *page)
+{
+	struct ibnbd_dev *dev = container_of(kobj, struct ibnbd_dev, kobj);
+	char server_addr[IBTRS_ADDRLEN];
+	int ret;
+
+	if (dev->dev_state == DEV_STATE_UNMAPPED)
+		return -EIO;
+
+	ret = ibtrs_addr_to_str(&dev->sess->addr, server_addr,
+				sizeof(server_addr));
+
+	if (ret >= sizeof(server_addr))
+		return -ENOBUFS;
+	if (ret < 0)
+		return ret;
+
+	return scnprintf(page, PAGE_SIZE, "%s\n", server_addr);
+}
+
+static struct kobj_attribute ibnbd_clt_session_attr =
+	__ATTR(session, 0444, ibnbd_clt_session_show, NULL);
+
+static struct attribute *ibnbd_dev_attrs[] = {
+	&ibnbd_clt_unmap_device_attr.attr,
+	&ibnbd_clt_remap_device_attr.attr,
+	&ibnbd_clt_mapping_path_attr.attr,
+	&ibnbd_clt_state_attr.attr,
+	&ibnbd_clt_input_mode_attr.attr,
+	&ibnbd_clt_session_attr.attr,
+	&ibnbd_clt_io_mode.attr,
+	NULL,
+};
+
+void ibnbd_clt_remove_dev_symlink(struct ibnbd_dev *dev)
+{
+	if (strlen(dev->blk_symlink_name))
+		sysfs_remove_link(ibnbd_devices_kobject, dev->blk_symlink_name);
+}
+
+static void ibnbd_clt_dev_release(struct kobject *kobj)
+{
+	struct ibnbd_dev *dev;
+
+	dev = container_of(kobj, struct ibnbd_dev, kobj);
+	ibnbd_destroy_gen_disk(dev);
+}
+
+static struct kobj_type ibnbd_dev_ktype = {
+	.sysfs_ops      = &kobj_sysfs_ops,
+	.default_attrs  = ibnbd_dev_attrs,
+	.release	= ibnbd_clt_dev_release,
+};
+
+static int ibnbd_clt_add_dev_kobj(struct ibnbd_dev *dev)
+{
+	int ret;
+	struct kobject *gd_kobj = &disk_to_dev(dev->gd)->kobj;
+
+	ret = kobject_init_and_add(&dev->kobj, &ibnbd_dev_ktype, gd_kobj, "%s",
+				   "ibnbd");
+	if (ret)
+		ERR(dev, "Failed to create device sysfs dir, errno: %d\n", ret);
+
+	return ret;
+}
+
+static int ibnbd_clt_str_ipv4_to_sockaddr(const char *con_addr,
+					  struct sockaddr_storage *dst)
+{
+	int ret;
+	char ipaddr[INET6_ADDRSTRLEN];
+	struct sockaddr_in *dst_sin = (struct sockaddr_in *)dst;
+	u8 ip4[4];
+
+	strlcpy(ipaddr, &con_addr[IP_PREFIX_LEN], sizeof(ipaddr));
+
+	ret = in4_pton(ipaddr, strlen(ipaddr), ip4, '\0', NULL);
+	if (ret == 0)
+		return -EINVAL;
+
+	memcpy(&dst_sin->sin_addr.s_addr, ip4,
+	       sizeof(dst_sin->sin_addr.s_addr));
+	dst_sin->sin_family = AF_INET;
+	dst_sin->sin_port = htons(IBTRS_SERVER_PORT);
+
+	return 0;
+}
+
+static int ibnbd_clt_str_ipv6_to_sockaddr(const char *con_addr,
+					  struct sockaddr_storage *dst)
+{
+	int ret;
+	char ipaddr[INET6_ADDRSTRLEN];
+	struct sockaddr_in6 *dst_sin6 = (struct sockaddr_in6 *)dst;
+
+	strlcpy(ipaddr, &con_addr[IP_PREFIX_LEN], sizeof(ipaddr));
+
+	ret = in6_pton(ipaddr, strlen(ipaddr),
+		       dst_sin6->sin6_addr.s6_addr,
+		       '\0', NULL);
+	if (ret != 1)
+		return -EINVAL;
+
+	dst_sin6->sin6_family = AF_INET6;
+	dst_sin6->sin6_port = htons(IBTRS_SERVER_PORT);
+
+	return 0;
+}
+
+static int ibnbd_clt_str_gid_to_sockaddr(const char *con_addr,
+					 struct sockaddr_storage *dst)
+{
+	int ret;
+	char gid[INET6_ADDRSTRLEN];
+	struct sockaddr_ib *dst_ib = (struct sockaddr_ib *)dst;
+
+	strlcpy(gid, &con_addr[GID_PREFIX_LEN], sizeof(gid));
+
+	/* We can use some of the I6 functions since GID is a valid
+	 * IPv6 address format
+	 */
+	ret = in6_pton(gid, strlen(gid),
+		       dst_ib->sib_addr.sib_raw, '\0', NULL);
+	if (ret == 0)
+		return -EINVAL;
+
+	dst_ib->sib_family = AF_IB;
+	/*
+	 * Use the same TCP server port number as the IB service ID
+	 * on the IB port space range
+	 */
+	dst_ib->sib_sid = cpu_to_be64(RDMA_IB_IP_PS_IB | IBTRS_SERVER_PORT);
+	dst_ib->sib_sid_mask = cpu_to_be64(0xffffffffffffffffULL);
+	dst_ib->sib_pkey = cpu_to_be16(0xffff);
+
+	return 0;
+}
+
+static int ibnbd_clt_str_to_sockaddr(char *addr,
+				     struct sockaddr_storage *sockaddr)
+{
+	if (strncmp(addr, GID_PREFIX, GID_PREFIX_LEN) == 0) {
+		return ibnbd_clt_str_gid_to_sockaddr(addr, sockaddr);
+	} else if (strncmp(addr, IP_PREFIX, IP_PREFIX_LEN) == 0) {
+		if (ibnbd_clt_str_ipv4_to_sockaddr(addr, sockaddr))
+			return ibnbd_clt_str_ipv6_to_sockaddr(addr, sockaddr);
+		else
+			return 0;
+	}
+	return -EPROTONOSUPPORT;
+}
+
+static struct ibnbd_session *
+ibnbd_clt_get_create_sess(struct sockaddr_storage *sockaddr)
+{
+	struct ibnbd_session *sess;
+
+	mutex_lock(&sess_lock);
+	sess = ibnbd_clt_find_sess(sockaddr);
+	if (sess) {
+		if (sess->state != SESS_STATE_READY ||
+		    !ibnbd_clt_get_sess(sess)) {
+			ERR_NP("Session is not connected or "
+			       "is being destroyed\n");
+			sess = ERR_PTR(-EIO);
+		}
+	} else {
+		sess = ibnbd_create_session(sockaddr);
+	}
+	mutex_unlock(&sess_lock);
+
+	return sess;
+}
+
+static ssize_t ibnbd_clt_map_device_show(struct kobject *kobj,
+					 struct kobj_attribute *attr,
+					 char *page)
+{
+	return scnprintf(page, PAGE_SIZE, "Usage: echo \"server=<address>"
+			 " device_path=<full path on remote side>"
+			 " [access_mode=<ro|rw|migration>]"
+			 " [input_mode=<mq|rq>]"
+			 " [io_mode=<fileio|blockio>]\" > %s\n\n"
+			 "address ::= [ ip:<ipv4> | ip:<ipv6> | gid:<gid> ]\n",
+			 attr->attr.name);
+}
+
+static int ibnbd_clt_get_path_name(struct ibnbd_dev *dev, char *buf,
+				   size_t len)
+{
+	int ret;
+	char pathname[NAME_MAX], *s;
+
+	strlcpy(pathname, dev->pathname, sizeof(pathname));
+	while ((s = strchr(pathname, '/')))
+		s[0] = '!';
+
+	ret = snprintf(buf, len, "%s", pathname);
+	if (ret >= len)
+		return -ENAMETOOLONG;
+
+	return 0;
+}
+
+static int ibnbd_clt_add_dev_symlink(struct ibnbd_dev *dev)
+{
+	struct kobject *gd_kobj = &disk_to_dev(dev->gd)->kobj;
+	int ret;
+
+	ret = ibnbd_clt_get_path_name(dev, dev->blk_symlink_name,
+				      sizeof(dev->blk_symlink_name));
+	if (ret) {
+		ERR(dev, "Failed to get /sys/block symlink path, error: %d.\n",
+		    ret);
+		goto out_err;
+	}
+
+	ret = sysfs_create_link(ibnbd_devices_kobject, gd_kobj,
+				dev->blk_symlink_name);
+	if (ret) {
+		ERR(dev, "Creating /sys/block symlink failed, error: %d.\n",
+		    ret);
+		goto out_err;
+	}
+
+	return 0;
+
+out_err:
+	dev->blk_symlink_name[0] = '\0';
+	return ret;
+}
+
+static ssize_t ibnbd_clt_map_device_store(struct kobject *kobj,
+					  struct kobj_attribute *attr,
+					  const char *buf, size_t count)
+{
+	struct ibnbd_session *sess;
+	struct ibnbd_dev *dev;
+	int ret;
+	char pathname[NAME_MAX];
+	char server_addr[IBTRS_ADDRLEN];
+	enum ibnbd_access_mode access_mode = IBNBD_ACCESS_RW;
+	enum ibnbd_queue_mode queue_mode = BLK_MQ;
+	enum ibnbd_io_mode io_mode = IBNBD_AUTOIO;
+	struct sockaddr_storage sockaddr;
+
+	ret = ibnbd_clt_parse_map_options(buf, server_addr, pathname,
+					  &access_mode, &queue_mode,
+					  &io_mode);
+	if (ret)
+		return ret;
+
+	ret = ibnbd_clt_str_to_sockaddr(server_addr, &sockaddr);
+	if (ret) {
+		if (ret == -EPROTONOSUPPORT)
+			ERR_NP("Invalid address protocol provided: %s\n",
+			       server_addr);
+		else
+			ERR_NP("Converting address to binary format failed: "
+			       "%s\n", server_addr);
+		return -EINVAL;
+	}
+
+	if (ibnbd_clt_dev_is_mapped(pathname)) {
+		ERR_NP("map_device: failed, Device with same path '%s' is"
+		       " already mapped\n", pathname);
+		return -EEXIST;
+	}
+
+	INFO_NP("Mapping device %s from server %s,"
+	      " (access_mode: %s, input_mode: %s, io_mode: %s)\n",
+	      pathname, server_addr, ibnbd_access_mode_str(access_mode),
+	      ibnbd_queue_mode_str(queue_mode), ibnbd_io_mode_str(io_mode));
+
+	sess = ibnbd_clt_get_create_sess(&sockaddr);
+	if (IS_ERR(sess))
+		return PTR_ERR(sess);
+
+	dev = ibnbd_client_add_device(sess, pathname, access_mode, queue_mode,
+				      io_mode);
+	if (IS_ERR(dev)) {
+		ret = PTR_ERR(dev);
+		goto out_sess_put;
+	}
+
+	ret = ibnbd_clt_add_dev_kobj(dev);
+	if (ret) {
+		if (!WARN_ON(ibnbd_close_device(dev, true)))
+			wait_for_completion(dev->close_compl);
+		/* ibnbd_destroy_gen_disk() will put the reference that was
+		 * acquired by ibnbd_client_add_device()
+		 */
+		ibnbd_destroy_gen_disk(dev);
+		goto out_sess_put;
+	}
+
+	ret = ibnbd_clt_add_dev_symlink(dev);
+	if (ret)
+		goto out_close_dev;
+
+	ibnbd_clt_put_sess(sess);
+	return count;
+
+out_close_dev:
+	if (!WARN_ON(ibnbd_close_device(dev, true)))
+		wait_for_completion(dev->close_compl);
+	kobject_del(&dev->kobj);
+	kobject_put(&dev->kobj);
+out_sess_put:
+	ibnbd_clt_put_sess(sess);
+
+	return ret;
+}
+
+static struct kobj_attribute ibnbd_clt_map_device_attr =
+	__ATTR(map_device, 0644,
+	       ibnbd_clt_map_device_show, ibnbd_clt_map_device_store);
+
+static struct attribute *default_attrs[] = {
+	&ibnbd_clt_map_device_attr.attr,
+	NULL,
+};
+
+static struct attribute_group default_attr_group = {
+	.attrs = default_attrs,
+};
+
+int ibnbd_clt_create_sysfs_files(void)
+{
+	int err = 0;
+
+	ibnbd_kobject = kobject_create_and_add("ibnbd", kernel_kobj);
+	if (!ibnbd_kobject) {
+		err = -ENOMEM;
+		goto err1;
+	}
+
+	ibnbd_devices_kobject = kobject_create_and_add("devices",
+						       ibnbd_kobject);
+	if (!ibnbd_devices_kobject) {
+		err = -ENOMEM;
+		goto err2;
+	}
+
+	err = sysfs_create_group(ibnbd_kobject, &default_attr_group);
+	if (err)
+		goto err3;
+
+	return 0;
+
+err3:
+	kobject_put(ibnbd_devices_kobject);
+err2:
+	kobject_put(ibnbd_kobject);
+err1:
+	return err;
+}
+
+void ibnbd_clt_destroy_default_group(void)
+{
+	sysfs_remove_group(ibnbd_kobject, &default_attr_group);
+}
+
+void ibnbd_clt_destroy_sysfs_files(void)
+{
+	kobject_put(ibnbd_devices_kobject);
+	kobject_put(ibnbd_kobject);
+}
diff --git a/drivers/block/ibnbd_client/ibnbd_clt_sysfs.h b/drivers/block/ibnbd_client/ibnbd_clt_sysfs.h
new file mode 100644
index 0000000..34f6013
--- /dev/null
+++ b/drivers/block/ibnbd_client/ibnbd_clt_sysfs.h
@@ -0,0 +1,64 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler < mail-99BIx50xQYGELgA04lAiVw@public.gmane.org>
+ *          Jack Wang <jinpu.wang-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
+ *   	    Kleber Souza <kleber.souza-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
+ * 	    Danil Kipnis <danil.kipnis-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
+ *   	    Roman Pen <roman.penyaev-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
+ *          Milind Dumbare <Milind.dumbare-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
+ *
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions, and the following disclaimer,
+ *    without modification.
+ * 2. Redistributions in binary form must reproduce at minimum a disclaimer
+ *    substantially similar to the "NO WARRANTY" disclaimer below
+ *    ("Disclaimer") and any redistribution must be conditioned upon
+ *    including a substantially similar Disclaimer requirement for further
+ *    binary redistribution.
+ * 3. Neither the names of the above-listed copyright holders nor the names
+ *    of any contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * NO WARRANTY
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * HOLDERS OR CONTRIBUTORS BE LIABLE FOR SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
+ * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGES.
+ *
+ */
+
+#ifndef _IBNBD_CLT_SYSFS_H
+#define _IBNBD_CLT_SYSFS_H
+
+#include "ibnbd_clt.h"
+
+int ibnbd_clt_create_sysfs_files(void);
+
+void ibnbd_clt_destroy_sysfs_files(void);
+void ibnbd_clt_destroy_default_group(void);
+void ibnbd_clt_schedule_dev_destroy(struct ibnbd_dev *dev);
+
+void ibnbd_clt_remove_dev_symlink(struct ibnbd_dev *dev);
+
+int ibnbd_clt_get_sess(struct ibnbd_session *sess);
+
+void ibnbd_clt_put_sess(struct ibnbd_session *sess);
+
+#endif
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 87+ messages in thread
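
A note on the sysfs interface above: the header pairs one create call with two
teardown helpers (one removes the default attribute group, the other drops the
"ibnbd" and "devices" kobjects). A minimal sketch of how a module init/exit
path could pair them follows; the actual caller lives in ibnbd_clt.c elsewhere
in this series and is not shown here, so the function names and the exact
ordering are illustrative assumptions only.

/*
 * Illustrative sketch only -- assumes the client module wires up the
 * sysfs helpers declared in ibnbd_clt_sysfs.h roughly like this.
 */
static int __init example_ibnbd_client_init(void)
{
	int err;

	/* sets up the ibnbd kobjects and the default attribute group */
	err = ibnbd_clt_create_sysfs_files();
	if (err)
		return err;

	/* ... register the block driver and IBTRS client callbacks ... */
	return 0;
}

static void __exit example_ibnbd_client_exit(void)
{
	/* remove the attribute group first, then drop the kobjects */
	ibnbd_clt_destroy_default_group();
	ibnbd_clt_destroy_sysfs_files();
}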

* [PATCH 19/28] ibnbd_clt: add log helpers
  2017-03-24 10:45 ` Jack Wang
                   ` (18 preceding siblings ...)
@ 2017-03-24 10:45 ` Jack Wang
  -1 siblings, 0 replies; 87+ messages in thread
From: Jack Wang @ 2017-03-24 10:45 UTC (permalink / raw)
  To: linux-block, linux-rdma
  Cc: dledford, axboe, hch, mail, Milind.dumbare, yun.wang, Jack Wang,
	Kleber Souza, Danil Kipnis, Roman Pen

From: Jack Wang <jinpu.wang@profitbricks.com>

Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
Signed-off-by: Kleber Souza <kleber.souza@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kipnis@profitbricks.com>
Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
---
 drivers/block/ibnbd_client/ibnbd_clt_log.h | 79 ++++++++++++++++++++++++++++++
 1 file changed, 79 insertions(+)
 create mode 100644 drivers/block/ibnbd_client/ibnbd_clt_log.h

diff --git a/drivers/block/ibnbd_client/ibnbd_clt_log.h b/drivers/block/ibnbd_client/ibnbd_clt_log.h
new file mode 100644
index 0000000..b3184b7
--- /dev/null
+++ b/drivers/block/ibnbd_client/ibnbd_clt_log.h
@@ -0,0 +1,79 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler < mail@fholler.de>
+ *          Jack Wang <jinpu.wang@profitbricks.com>
+ *   	    Kleber Souza <kleber.souza@profitbricks.com>
+ * 	    Danil Kipnis <danil.kipnis@profitbricks.com>
+ *   	    Roman Pen <roman.penyaev@profitbricks.com>
+ *          Milind Dumbare <Milind.dumbare@gmail.com>
+ *
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions, and the following disclaimer,
+ *    without modification.
+ * 2. Redistributions in binary form must reproduce at minimum a disclaimer
+ *    substantially similar to the "NO WARRANTY" disclaimer below
+ *    ("Disclaimer") and any redistribution must be conditioned upon
+ *    including a substantially similar Disclaimer requirement for further
+ *    binary redistribution.
+ * 3. Neither the names of the above-listed copyright holders nor the names
+ *    of any contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * NO WARRANTY
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * HOLDERS OR CONTRIBUTORS BE LIABLE FOR SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
+ * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGES.
+ *
+ */
+
+#ifndef __IBNBD_CLT_LOG_H__
+#define __IBNBD_CLT_LOG_H__
+
+#include "../ibnbd_inc/log.h"
+
+#define blkdev_name(dev) ((dev->gd == NULL) ? "<no dev>" : dev->gd->disk_name)
+
+#define ERR(dev, fmt, ...) pr_err("ibnbd L%d <%s@%s> %s ERR: " fmt,\
+				__LINE__, dev->pathname, ibnbd_prefix(dev),\
+				blkdev_name(dev), ##__VA_ARGS__)
+
+#define ERR_RL(dev, fmt, ...) pr_err_ratelimited("ibnbd L%d <%s@%s> %s ERR: "\
+				fmt, __LINE__, dev->pathname,\
+				ibnbd_prefix(dev), blkdev_name(dev),\
+				##__VA_ARGS__)
+
+#define WRN(dev, fmt, ...) pr_warn("ibnbd L%d <%s@%s> %s WARN: " fmt,\
+				__LINE__, dev->pathname, ibnbd_prefix(dev),\
+				blkdev_name(dev), ##__VA_ARGS__)
+
+#define WRN_RL(dev, fmt, ...) pr_warn_ratelimited("ibnbd L%d <%s@%s> %s WARN: "\
+			fmt, __LINE__, dev->pathname, ibnbd_prefix(dev),\
+			blkdev_name(dev), ##__VA_ARGS__)
+
+#define INFO(dev, fmt, ...) pr_info("ibnbd <%s@%s> %s: " \
+			fmt, dev->pathname, ibnbd_prefix(dev),\
+			blkdev_name(dev), ##__VA_ARGS__)
+
+#define INFO_RL(dev, fmt, ...) pr_info_ratelimited("ibnbd <%s@%s> %s: " \
+			fmt, dev->pathname, ibnbd_prefix(dev),\
+			blkdev_name(dev), ##__VA_ARGS__)
+
+#endif /*__IBNBD_CLT_LOG_H__*/
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 87+ messages in thread
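
For readability, a hypothetical call site for the macros above (assuming the
client device struct is called ibnbd_clt_dev and is defined in ibnbd_clt.h,
which is added elsewhere in this series; the macros themselves only
dereference ->pathname and ->gd plus the ibnbd_prefix() helper, as the
definitions show):

/* Hypothetical example, not part of the series. */
static int example_resize(struct ibnbd_clt_dev *dev, sector_t new_nsectors)
{
	if (!new_nsectors) {
		/* expands to pr_err("ibnbd L<line> <pathname@prefix> <disk> ERR: ...") */
		ERR(dev, "resize to zero sectors rejected\n");
		return -EINVAL;
	}
	/* INFO() uses the same prefix but omits __LINE__ and the severity tag */
	INFO(dev, "resizing device to %llu sectors\n",
	     (unsigned long long)new_nsectors);
	return 0;
}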

* [PATCH 20/28] ibnbd_clt: add Makefile and Kconfig
@ 2017-03-24 10:45   ` Jack Wang
  0 siblings, 0 replies; 87+ messages in thread
From: Jack Wang @ 2017-03-24 10:45 UTC (permalink / raw)
  To: linux-block, linux-rdma
  Cc: dledford, axboe, hch, mail, Milind.dumbare, yun.wang, Jack Wang

From: Jack Wang <jinpu.wang@profitbricks.com>

Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
---
 drivers/block/Kconfig               |  2 ++
 drivers/block/Makefile              |  1 +
 drivers/block/ibnbd_client/Kconfig  | 16 ++++++++++++++++
 drivers/block/ibnbd_client/Makefile |  5 +++++
 4 files changed, 24 insertions(+)
 create mode 100644 drivers/block/ibnbd_client/Kconfig
 create mode 100644 drivers/block/ibnbd_client/Makefile

diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
index f744de7..c309e57 100644
--- a/drivers/block/Kconfig
+++ b/drivers/block/Kconfig
@@ -275,6 +275,8 @@ config BLK_DEV_CRYPTOLOOP
 
 source "drivers/block/drbd/Kconfig"
 
+source "drivers/block/ibnbd_client/Kconfig"
+
 config BLK_DEV_NBD
 	tristate "Network block device support"
 	depends on NET
diff --git a/drivers/block/Makefile b/drivers/block/Makefile
index 1e9661e..7da1813 100644
--- a/drivers/block/Makefile
+++ b/drivers/block/Makefile
@@ -34,6 +34,7 @@ obj-$(CONFIG_BLK_DEV_HD)	+= hd.o
 
 obj-$(CONFIG_XEN_BLKDEV_FRONTEND)	+= xen-blkfront.o
 obj-$(CONFIG_XEN_BLKDEV_BACKEND)	+= xen-blkback/
+obj-$(CONFIG_BLK_DEV_IBNBD_CLT)	+= ibnbd_client/
 obj-$(CONFIG_BLK_DEV_DRBD)     += drbd/
 obj-$(CONFIG_BLK_DEV_RBD)     += rbd.o
 obj-$(CONFIG_BLK_DEV_PCIESSD_MTIP32XX)	+= mtip32xx/
diff --git a/drivers/block/ibnbd_client/Kconfig b/drivers/block/ibnbd_client/Kconfig
new file mode 100644
index 0000000..162e4e1
--- /dev/null
+++ b/drivers/block/ibnbd_client/Kconfig
@@ -0,0 +1,16 @@
+config BLK_DEV_IBNBD_CLT
+	tristate "Network block device over InfiniBand client support"
+	depends on INFINIBAND_IBTRS_CLT
+	---help---
+	  Saying Y here will allow your computer to be a client for network
+	  block devices over InfiniBand, i.e. it will be able to use block
+	  devices exported by servers (mount file systems on them, etc.).
+	  Communication between client and server works over InfiniBand
+	  networking, but to the client program this is hidden: it looks like
+	  regular local access to a block device special file such as
+	  /dev/ibnbd0.
+
+	  To compile this driver as a module, choose M here: the
+	  module will be called ibnbd_client.
+
+	  If unsure, say N.
diff --git a/drivers/block/ibnbd_client/Makefile b/drivers/block/ibnbd_client/Makefile
new file mode 100644
index 0000000..bbf211f
--- /dev/null
+++ b/drivers/block/ibnbd_client/Makefile
@@ -0,0 +1,5 @@
+
+obj-$(CONFIG_BLK_DEV_IBNBD_CLT)	+= ibnbd_client.o
+
+ibnbd_client-y 	:= ibnbd_clt.o ibnbd_clt_sysfs.o ../ibnbd_lib/ibnbd.o \
+			   ../ibnbd_lib/ibnbd-proto.o
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH 21/28] ibnbd_srv: add header shared in ibnbd_server
@ 2017-03-24 10:45   ` Jack Wang
  0 siblings, 0 replies; 87+ messages in thread
From: Jack Wang @ 2017-03-24 10:45 UTC (permalink / raw)
  To: linux-block, linux-rdma
  Cc: dledford, axboe, hch, mail, Milind.dumbare, yun.wang, Jack Wang,
	Kleber Souza, Danil Kipnis, Roman Pen

From: Jack Wang <jinpu.wang@profitbricks.com>

Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
Signed-off-by: Kleber Souza <kleber.souza@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kipnis@profitbricks.com>
Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
---
 drivers/block/ibnbd_server/ibnbd_srv.h | 115 +++++++++++++++++++++++++++++++++
 1 file changed, 115 insertions(+)
 create mode 100644 drivers/block/ibnbd_server/ibnbd_srv.h

diff --git a/drivers/block/ibnbd_server/ibnbd_srv.h b/drivers/block/ibnbd_server/ibnbd_srv.h
new file mode 100644
index 0000000..764a31f
--- /dev/null
+++ b/drivers/block/ibnbd_server/ibnbd_srv.h
@@ -0,0 +1,115 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler < mail@fholler.de>
+ *          Jack Wang <jinpu.wang@profitbricks.com>
+ *   	    Kleber Souza <kleber.souza@profitbricks.com>
+ * 	    Danil Kipnis <danil.kipnis@profitbricks.com>
+ *   	    Roman Pen <roman.penyaev@profitbricks.com>
+ *          Milind Dumbare <Milind.dumbare@gmail.com>
+ *
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions, and the following disclaimer,
+ *    without modification.
+ * 2. Redistributions in binary form must reproduce at minimum a disclaimer
+ *    substantially similar to the "NO WARRANTY" disclaimer below
+ *    ("Disclaimer") and any redistribution must be conditioned upon
+ *    including a substantially similar Disclaimer requirement for further
+ *    binary redistribution.
+ * 3. Neither the names of the above-listed copyright holders nor the names
+ *    of any contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * NO WARRANTY
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * HOLDERS OR CONTRIBUTORS BE LIABLE FOR SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
+ * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGES.
+ *
+ */
+
+#ifndef _IBNBD_SRV_H
+#define _IBNBD_SRV_H
+
+#include <linux/types.h>
+#include <linux/idr.h>
+#include <linux/kref.h>
+#include "../ibnbd_inc/ibnbd.h"
+#include "../ibnbd_inc/ibnbd-proto.h"
+#include <rdma/ibtrs.h>
+
+enum sess_state {
+	SESS_STATE_CONNECTED,
+	SESS_STATE_DISCONNECTED
+};
+
+struct ibnbd_srv_session {
+	struct list_head        list; /* for the global sess_list */
+	struct ibtrs_session    *ibtrs_sess;
+	char			str_addr[IBTRS_ADDRLEN];
+	char			hostname[MAXHOSTNAMELEN];
+	int			queue_depth;
+	enum sess_state         state;
+	struct bio_set		*sess_bio_set;
+
+	rwlock_t                index_lock ____cacheline_aligned;
+	struct idr              index_idr;
+	struct mutex		lock; /* protects sess_dev_list */
+	struct list_head        sess_dev_list; /* list of struct ibnbd_srv_sess_dev */
+	u8			ver; /* IBNBD protocol version */
+};
+
+struct ibnbd_srv_dev {
+	struct list_head                list; /* global dev_list */
+
+	struct kobject                  dev_kobj;
+	struct kobject                  dev_clients_kobj;
+
+	struct kref                     kref;
+	char				id[NAME_MAX];
+
+	struct mutex			lock; /* protects sess_dev_list and open_write_cnt */
+	struct list_head		sess_dev_list; /* list of struct ibnbd_srv_sess_dev */
+	int				open_write_cnt;
+	enum ibnbd_io_mode		mode;
+};
+
+struct ibnbd_srv_sess_dev {
+	struct list_head		dev_list; /* for struct ibnbd_srv_dev->sess_dev_list */
+	struct list_head		sess_list; /* for struct ibnbd_srv_session->sess_dev_list */
+
+	struct ibnbd_dev		*ibnbd_dev;
+	struct ibnbd_srv_session        *sess;
+	struct ibnbd_srv_dev		*dev;
+	struct kobject                  kobj;
+	struct completion		*sysfs_release_compl;
+
+	u32                             device_id;
+	u32                             clt_device_id;
+	fmode_t                         open_flags;
+	struct kref			kref;
+	struct completion               *destroy_comp;
+	char				pathname[NAME_MAX];
+	size_t				nsectors;
+	bool                            is_visible;
+};
+
+int ibnbd_srv_revalidate_dev(struct ibnbd_srv_dev *dev);
+
+#endif
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 87+ messages in thread
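
The three structures above form a small ownership graph: an ibnbd_srv_session
owns its ibnbd_srv_sess_dev entries via ->sess_dev_list and indexes them by
device_id in ->index_idr under ->index_lock, while each sess_dev also sits on
the shared ibnbd_srv_dev's ->sess_dev_list and holds a reference on it through
the krefs. The next patch implements the lookup as ibnbd_get_sess_dev(); the
sketch below only restates that pattern in isolation for readers of this
header.

/* Sketch of the lookup pattern implied by the header above. */
static struct ibnbd_srv_sess_dev *
example_find_sess_dev(struct ibnbd_srv_session *sess, u32 device_id)
{
	struct ibnbd_srv_sess_dev *sd;

	read_lock(&sess->index_lock);
	sd = idr_find(&sess->index_idr, device_id);
	/* only take a reference if the entry is not already being torn down */
	if (sd && !kref_get_unless_zero(&sd->kref))
		sd = NULL;
	read_unlock(&sess->index_lock);

	return sd;	/* caller drops the kref when done */
}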

* [PATCH 22/28] ibnbd_srv: add main functionality
  2017-03-24 10:45 ` Jack Wang
                   ` (21 preceding siblings ...)
@ 2017-03-24 10:45 ` Jack Wang
  -1 siblings, 0 replies; 87+ messages in thread
From: Jack Wang @ 2017-03-24 10:45 UTC (permalink / raw)
  To: linux-block, linux-rdma
  Cc: dledford, axboe, hch, mail, Milind.dumbare, yun.wang, Jack Wang,
	Kleber Souza, Danil Kipnis, Roman Pen

From: Jack Wang <jinpu.wang@profitbricks.com>

Process incoming IO requests from the IBTRS server and hand them down
to the underlying block device.

Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
Signed-off-by: Kleber Souza <kleber.souza@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kipnis@profitbricks.com>
Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
---
 drivers/block/ibnbd_server/ibnbd_srv.c | 1074 ++++++++++++++++++++++++++++++++
 1 file changed, 1074 insertions(+)
 create mode 100644 drivers/block/ibnbd_server/ibnbd_srv.c

diff --git a/drivers/block/ibnbd_server/ibnbd_srv.c b/drivers/block/ibnbd_server/ibnbd_srv.c
new file mode 100644
index 0000000..13832b6
--- /dev/null
+++ b/drivers/block/ibnbd_server/ibnbd_srv.c
@@ -0,0 +1,1074 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler < mail@fholler.de>
+ *          Jack Wang <jinpu.wang@profitbricks.com>
+ *   	    Kleber Souza <kleber.souza@profitbricks.com>
+ * 	    Danil Kipnis <danil.kipnis@profitbricks.com>
+ *   	    Roman Pen <roman.penyaev@profitbricks.com>
+ *          Milind Dumbare <Milind.dumbare@gmail.com>
+ *
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions, and the following disclaimer,
+ *    without modification.
+ * 2. Redistributions in binary form must reproduce at minimum a disclaimer
+ *    substantially similar to the "NO WARRANTY" disclaimer below
+ *    ("Disclaimer") and any redistribution must be conditioned upon
+ *    including a substantially similar Disclaimer requirement for further
+ *    binary redistribution.
+ * 3. Neither the names of the above-listed copyright holders nor the names
+ *    of any contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * NO WARRANTY
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * HOLDERS OR CONTRIBUTORS BE LIABLE FOR SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
+ * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGES.
+ *
+ */
+
+#include <linux/module.h>
+#include <linux/blkdev.h>
+#include <linux/idr.h>
+#include <rdma/ibtrs.h>
+#include "../ibnbd_inc/ibnbd-proto.h"
+#include <rdma/ibtrs_srv.h>
+#include "../ibnbd_inc/ibnbd.h"
+#include "ibnbd_srv.h"
+#include "ibnbd_srv_log.h"
+#include "ibnbd_srv_sysfs.h"
+#include "ibnbd_dev.h"
+
+MODULE_AUTHOR("ibnbd@profitbricks.com");
+MODULE_VERSION(__stringify(IBNBD_VER));
+MODULE_DESCRIPTION("InfiniBand Network Block Device Server");
+MODULE_LICENSE("GPL");
+
+#define DEFAULT_DEV_SEARCH_PATH "/"
+
+static char dev_search_path[PATH_MAX] = DEFAULT_DEV_SEARCH_PATH;
+
+static int dev_search_path_set(const char *val, const struct kernel_param *kp)
+{
+	char *dup;
+
+	if (strlen(val) >= sizeof(dev_search_path))
+		return -EINVAL;
+
+	dup = kstrdup(val, GFP_KERNEL);
+	if (!dup)
+		return -ENOMEM;
+
+	/* strip a trailing newline, if any */
+	if (strlen(dup) && dup[strlen(dup) - 1] == '\n')
+		dup[strlen(dup) - 1] = '\0';
+
+	strlcpy(dev_search_path, dup, sizeof(dev_search_path));
+
+	kfree(dup);
+	INFO_NP("dev_search_path changed to '%s'\n", dev_search_path);
+
+	return 0;
+}
+
+static struct kparam_string dev_search_path_kparam_str = {
+	.maxlen	= sizeof(dev_search_path),
+	.string	= dev_search_path
+};
+
+static const struct kernel_param_ops dev_search_path_ops = {
+	.set	= dev_search_path_set,
+	.get	= param_get_string,
+};
+
+module_param_cb(dev_search_path, &dev_search_path_ops,
+		&dev_search_path_kparam_str, 0444);
+MODULE_PARM_DESC(dev_search_path, "Sets the dev_search_path."
+		 " When a device is mapped this path is prepended to the"
+		 " device_path from the map_device operation."
+		 " (default: " DEFAULT_DEV_SEARCH_PATH ")");
+
+static int def_io_mode = IBNBD_BLOCKIO;
+module_param(def_io_mode, int, 0444);
+MODULE_PARM_DESC(def_io_mode, "By default, export devices in"
+		 " blockio(" __stringify(_IBNBD_BLOCKIO) ") or"
+		 " fileio(" __stringify(_IBNBD_FILEIO) ") mode."
+		 " (default: " __stringify(_IBNBD_BLOCKIO) " (blockio))");
+
+static DEFINE_MUTEX(sess_lock);
+static DEFINE_SPINLOCK(dev_lock);
+
+static LIST_HEAD(sess_list);
+static LIST_HEAD(dev_list);
+
+
+struct ibnbd_io_private {
+	struct ibtrs_ops_id		*id;
+	struct ibnbd_srv_sess_dev	*sess_dev;
+};
+
+static struct ibtrs_srv_ops ibnbd_srv_ops;
+
+static void ibnbd_sess_dev_release(struct kref *kref)
+{
+	struct ibnbd_srv_sess_dev *sess_dev;
+
+	sess_dev = container_of(kref, struct ibnbd_srv_sess_dev, kref);
+	complete(sess_dev->destroy_comp);
+}
+
+static inline void ibnbd_put_sess_dev(struct ibnbd_srv_sess_dev *sess_dev)
+{
+	kref_put(&sess_dev->kref, ibnbd_sess_dev_release);
+}
+
+static void ibnbd_endio(void *priv, int error)
+{
+	int ret;
+	struct ibnbd_io_private *ibnbd_priv = priv;
+	struct ibnbd_srv_sess_dev *sess_dev = ibnbd_priv->sess_dev;
+
+	ibnbd_put_sess_dev(sess_dev);
+
+	ret = ibtrs_srv_resp_rdma(ibnbd_priv->id, error);
+	if (unlikely(ret))
+		ERR_RL(sess_dev, "Sending I/O response failed, errno: %d\n",
+		       ret);
+
+	kfree(priv);
+}
+
+static struct ibnbd_srv_sess_dev *
+ibnbd_get_sess_dev(int dev_id, struct ibnbd_srv_session *srv_sess)
+{
+	struct ibnbd_srv_sess_dev *sess_dev;
+	int ret = 0;
+
+	read_lock(&srv_sess->index_lock);
+	sess_dev = idr_find(&srv_sess->index_idr, dev_id);
+	if (likely(sess_dev))
+		ret = kref_get_unless_zero(&sess_dev->kref);
+	read_unlock(&srv_sess->index_lock);
+
+	if (unlikely(!sess_dev || !ret))
+		return ERR_PTR(-ENXIO);
+
+	return sess_dev;
+}
+
+static int process_rdma(struct ibtrs_session *sess,
+			struct ibnbd_srv_session *srv_sess,
+			struct ibtrs_ops_id *id, void *data, u32 len)
+{
+	struct ibnbd_io_private *priv;
+	struct ibnbd_srv_sess_dev *sess_dev;
+	struct ibnbd_msg_io *msg;
+	size_t data_len;
+	int err;
+	u32 dev_id;
+
+	priv = kmalloc(sizeof(*priv), GFP_KERNEL);
+	if (unlikely(!priv))
+		return -ENOMEM;
+
+	data_len = len - sizeof(*msg);
+	/* ibnbd message is after disk data */
+	msg = (struct ibnbd_msg_io *)(data + data_len);
+
+	dev_id = msg->device_id;
+
+	sess_dev = ibnbd_get_sess_dev(dev_id, srv_sess);
+	if (unlikely(IS_ERR(sess_dev))) {
+		ERR_NP_RL("Got I/O request from client %s for unknown device id"
+			  " %d\n", srv_sess->str_addr, dev_id);
+		err = -ENOTCONN;
+		goto err;
+	}
+
+	priv->sess_dev = sess_dev;
+	priv->id = id;
+
+	err = ibnbd_dev_submit_io(sess_dev->ibnbd_dev, msg->sector, data,
+				  data_len, msg->bi_size, msg->rw, priv);
+	if (unlikely(err)) {
+		ERR(sess_dev, "Submitting I/O to device failed, errno: %d\n",
+		    err);
+		goto sess_dev_put;
+	}
+
+	return 0;
+
+sess_dev_put:
+	ibnbd_put_sess_dev(sess_dev);
+err:
+	kfree(priv);
+	return err;
+}
+
+static void destroy_device(struct ibnbd_srv_dev *dev)
+{
+	WARN(!list_empty(&dev->sess_dev_list),
+	     "Device %s is being destroyed but still in use!\n",
+	     dev->id);
+
+	spin_lock(&dev_lock);
+	list_del(&dev->list);
+	spin_unlock(&dev_lock);
+
+	if (dev->dev_kobj.state_in_sysfs)
+		/*
+		 * Destroy kobj only if it was really created.
+		 * The following call should be sync, because
+		 *  we free the memory afterwards.
+		 */
+		ibnbd_srv_destroy_dev_sysfs(dev);
+
+	kfree(dev);
+}
+
+static void destroy_device_cb(struct kref *kref)
+{
+	struct ibnbd_srv_dev *dev;
+
+	dev = container_of(kref, struct ibnbd_srv_dev, kref);
+
+	destroy_device(dev);
+}
+
+static void ibnbd_put_srv_dev(struct ibnbd_srv_dev *dev)
+{
+	kref_put(&dev->kref, destroy_device_cb);
+}
+
+static void ibnbd_destroy_sess_dev(struct ibnbd_srv_sess_dev *sess_dev,
+				   bool locked)
+{
+	struct completion dc;
+
+	write_lock(&sess_dev->sess->index_lock);
+	idr_remove(&sess_dev->sess->index_idr, sess_dev->device_id);
+	write_unlock(&sess_dev->sess->index_lock);
+
+	init_completion(&dc);
+	sess_dev->destroy_comp = &dc;
+	ibnbd_put_sess_dev(sess_dev);
+	wait_for_completion(&dc);
+
+	ibnbd_dev_close(sess_dev->ibnbd_dev);
+	if (!locked)
+		mutex_lock(&sess_dev->sess->lock);
+	list_del(&sess_dev->sess_list);
+	if (!locked)
+		mutex_unlock(&sess_dev->sess->lock);
+
+	mutex_lock(&sess_dev->dev->lock);
+	list_del(&sess_dev->dev_list);
+	if (sess_dev->open_flags & FMODE_WRITE)
+		sess_dev->dev->open_write_cnt--;
+	mutex_unlock(&sess_dev->dev->lock);
+
+	ibnbd_put_srv_dev(sess_dev->dev);
+
+	INFO(sess_dev, "Device closed\n");
+	kfree(sess_dev);
+}
+
+static void destroy_sess(struct ibnbd_srv_session *srv_sess)
+{
+	struct ibnbd_srv_sess_dev *sess_dev, *tmp;
+
+	srv_sess->state = SESS_STATE_DISCONNECTED;
+
+	if (list_empty(&srv_sess->sess_dev_list))
+		goto out;
+
+	mutex_lock(&srv_sess->lock);
+	list_for_each_entry_safe(sess_dev, tmp, &srv_sess->sess_dev_list,
+				 sess_list) {
+		ibnbd_srv_destroy_dev_client_sysfs(sess_dev);
+		ibnbd_destroy_sess_dev(sess_dev, true);
+	}
+	mutex_unlock(&srv_sess->lock);
+
+out:
+	idr_destroy(&srv_sess->index_idr);
+	bioset_free(srv_sess->sess_bio_set);
+
+	INFO_NP("IBTRS Session to %s disconnected\n", srv_sess->str_addr);
+
+	mutex_lock(&sess_lock);
+	list_del(&srv_sess->list);
+	mutex_unlock(&sess_lock);
+
+	kfree(srv_sess);
+}
+
+static int create_sess(struct ibtrs_session *sess)
+{
+	struct ibnbd_srv_session *srv_sess;
+
+	srv_sess = kzalloc(sizeof(*srv_sess), GFP_KERNEL);
+	if (!srv_sess) {
+		ERR_NP("Allocating srv_session for client %s failed\n",
+		       ibtrs_srv_get_sess_addr(sess));
+		return -ENOMEM;
+	}
+	srv_sess->queue_depth = ibtrs_srv_get_sess_qdepth(sess);
+	srv_sess->sess_bio_set = bioset_create(srv_sess->queue_depth, 0);
+	if (!srv_sess->sess_bio_set) {
+		ERR_NP("Allocating bio_set for client %s failed\n",
+		       ibtrs_srv_get_sess_addr(sess));
+		kfree(srv_sess);
+		return -ENOMEM;
+	}
+
+	idr_init(&srv_sess->index_idr);
+	rwlock_init(&srv_sess->index_lock);
+	INIT_LIST_HEAD(&srv_sess->sess_dev_list);
+	mutex_init(&srv_sess->lock);
+	srv_sess->state = SESS_STATE_CONNECTED;
+	mutex_lock(&sess_lock);
+	list_add(&srv_sess->list, &sess_list);
+	mutex_unlock(&sess_lock);
+
+	srv_sess->ibtrs_sess = sess;
+	srv_sess->queue_depth = ibtrs_srv_get_sess_qdepth(sess);
+	strlcpy(srv_sess->str_addr, ibtrs_srv_get_sess_addr(sess),
+		sizeof(srv_sess->str_addr));
+
+	ibtrs_srv_set_sess_priv(sess, srv_sess);
+
+	return 0;
+}
+
+static int ibnbd_srv_sess_ev(struct ibtrs_session *sess,
+			     enum ibtrs_srv_sess_ev ev, void *priv)
+{
+	struct ibnbd_srv_session *srv_sess = priv;
+
+	switch (ev) {
+	case IBTRS_SRV_SESS_EV_CONNECTED:
+		INFO_NP("IBTRS session to %s established\n",
+			ibtrs_srv_get_sess_addr(sess));
+		return create_sess(sess);
+
+	case IBTRS_SRV_SESS_EV_DISCONNECTING:
+		if (WARN_ON(!priv ||
+			    srv_sess->state != SESS_STATE_CONNECTED))
+			return -EINVAL;
+
+		INFO_NP("IBTRS Session to %s will be disconnected.\n",
+			srv_sess->str_addr);
+		srv_sess->state = SESS_STATE_DISCONNECTED;
+
+		return 0;
+
+	case IBTRS_SRV_SESS_EV_DISCONNECTED:
+		if (WARN_ON(!priv))
+			return -EINVAL;
+
+		destroy_sess(srv_sess);
+		return 0;
+
+	default:
+		WRN_NP("Received unknown IBTRS session event %d from session"
+		       " %s\n", ev, srv_sess->str_addr);
+		return -EINVAL;
+	}
+}
+
+static int ibnbd_srv_rdma_ev(struct ibtrs_session *sess, void *priv,
+			     struct ibtrs_ops_id *id, enum ibtrs_srv_rdma_ev ev,
+			     void *data, size_t len)
+{
+	struct ibnbd_srv_session *srv_sess = priv;
+
+	if (unlikely(WARN_ON(!srv_sess) ||
+		     srv_sess->state == SESS_STATE_DISCONNECTED))
+		return -ENODEV;
+
+	switch (ev) {
+	case IBTRS_SRV_RDMA_EV_RECV:
+	case IBTRS_SRV_RDMA_EV_WRITE_REQ:
+		return process_rdma(sess, srv_sess, id, data, len);
+
+	default:
+		WRN_NP("Received unexpected RDMA event %d from session %s\n",
+		       ev, srv_sess->str_addr);
+		return -EINVAL;
+	}
+}
+
+static struct ibnbd_srv_sess_dev
+*ibnbd_sess_dev_alloc(struct ibnbd_srv_session *srv_sess)
+{
+	struct ibnbd_srv_sess_dev *sess_dev;
+	int error;
+
+	sess_dev = kzalloc(sizeof(*sess_dev), GFP_KERNEL);
+	if (!sess_dev)
+		return ERR_PTR(-ENOMEM);
+
+	idr_preload(GFP_KERNEL);
+	write_lock(&srv_sess->index_lock);
+
+	error = idr_alloc(&srv_sess->index_idr, sess_dev, 0, -1, GFP_NOWAIT);
+	if (error < 0) {
+		WRN_NP("Allocating idr failed, errno: %d\n", error);
+		goto out_unlock;
+	}
+
+	sess_dev->device_id = error;
+	error = 0;
+
+out_unlock:
+	write_unlock(&srv_sess->index_lock);
+	idr_preload_end();
+	if (error) {
+		kfree(sess_dev);
+		return ERR_PTR(error);
+	}
+
+	return sess_dev;
+}
+
+static struct ibnbd_srv_dev *ibnbd_srv_init_srv_dev(const char *id,
+						    enum ibnbd_io_mode mode)
+{
+	struct ibnbd_srv_dev *dev;
+
+	dev = kzalloc(sizeof(*dev), GFP_KERNEL);
+	if (!dev)
+		return ERR_PTR(-ENOMEM);
+
+	strlcpy(dev->id, id, sizeof(dev->id));
+	dev->mode = mode;
+	kref_init(&dev->kref);
+	INIT_LIST_HEAD(&dev->sess_dev_list);
+	mutex_init(&dev->lock);
+
+	return dev;
+}
+
+static struct ibnbd_srv_dev *
+ibnbd_srv_find_or_add_srv_dev(struct ibnbd_srv_dev *new_dev)
+{
+	struct ibnbd_srv_dev *dev;
+
+	spin_lock(&dev_lock);
+	list_for_each_entry(dev, &dev_list, list) {
+		if (!strncmp(dev->id, new_dev->id, sizeof(dev->id))) {
+			if (!kref_get_unless_zero(&dev->kref))
+				/*
+				 * We lost the race, device is almost dead.
+				 *  Continue traversing to find a valid one.
+				 */
+				continue;
+			spin_unlock(&dev_lock);
+			return dev;
+		}
+	}
+	list_add(&new_dev->list, &dev_list);
+	spin_unlock(&dev_lock);
+
+	return new_dev;
+}
+
+static int ibnbd_srv_check_update_open_perm(struct ibnbd_srv_dev *srv_dev,
+					    struct ibnbd_srv_session *srv_sess,
+					    enum ibnbd_io_mode io_mode,
+					    enum ibnbd_access_mode access_mode)
+{
+	int ret = -EPERM;
+
+	mutex_lock(&srv_dev->lock);
+
+	if (srv_dev->mode != io_mode) {
+		ERR_NP("Mapping device '%s' for client %s in %s mode forbidden,"
+		       " device is already mapped from other client(s) in"
+		       " %s mode\n", srv_dev->id, srv_sess->str_addr,
+		       ibnbd_io_mode_str(io_mode),
+		       ibnbd_io_mode_str(srv_dev->mode));
+		goto out;
+	}
+
+	switch (access_mode) {
+	case IBNBD_ACCESS_RO:
+		ret = 0;
+		break;
+	case IBNBD_ACCESS_RW:
+		if (srv_dev->open_write_cnt == 0)  {
+			srv_dev->open_write_cnt++;
+			ret = 0;
+		} else {
+			ERR_NP("Mapping device '%s' for client %s with"
+			       " RW permissions failed. Device already opened"
+			       " as 'RW' by %d client(s) in %s mode.\n",
+			       srv_dev->id, srv_sess->str_addr,
+			       srv_dev->open_write_cnt,
+			       ibnbd_io_mode_str(srv_dev->mode));
+		}
+		break;
+	case IBNBD_ACCESS_MIGRATION:
+		if (srv_dev->open_write_cnt < 2) {
+			srv_dev->open_write_cnt++;
+			ret = 0;
+		} else {
+			ERR_NP("Mapping device '%s' for client %s with"
+			       " migration permissions failed. Device already"
+			       " opened as 'RW' by %d client(s) in %s mode.\n",
+			       srv_dev->id, srv_sess->str_addr,
+			       srv_dev->open_write_cnt,
+			       ibnbd_io_mode_str(srv_dev->mode));
+		}
+		break;
+	default:
+		ERR_NP("Received mapping request for device '%s' from client %s"
+		       " with invalid access mode: %d\n", srv_dev->id,
+		       srv_sess->str_addr, access_mode);
+		ret = -EINVAL;
+	}
+
+out:
+	mutex_unlock(&srv_dev->lock);
+
+	return ret;
+}
+
+static struct ibnbd_srv_dev *
+ibnbd_srv_get_or_create_srv_dev(struct ibnbd_dev *ibnbd_dev,
+				struct ibnbd_srv_session *srv_sess,
+				enum ibnbd_io_mode io_mode,
+				enum ibnbd_access_mode access_mode)
+{
+	int ret;
+	struct ibnbd_srv_dev *new_dev, *dev;
+	const char *dev_name = ibnbd_dev_get_name(ibnbd_dev);
+
+	new_dev = ibnbd_srv_init_srv_dev(dev_name, io_mode);
+	if (IS_ERR(new_dev))
+		return new_dev;
+
+	dev = ibnbd_srv_find_or_add_srv_dev(new_dev);
+	if (dev != new_dev)
+		kfree(new_dev);
+
+	ret = ibnbd_srv_check_update_open_perm(dev, srv_sess, io_mode,
+					       access_mode);
+	if (ret) {
+		ibnbd_put_srv_dev(dev);
+		return ERR_PTR(ret);
+	}
+
+	return dev;
+}
+
+static inline void
+ibnbd_srv_fill_msg_open_rsp_header(struct ibnbd_msg_open_rsp *rsp,
+				   u32 clt_device_id)
+{
+	rsp->hdr.type		= IBNBD_MSG_OPEN_RSP;
+	rsp->clt_device_id	= clt_device_id;
+}
+
+static void ibnbd_srv_fill_msg_open_rsp(struct ibnbd_msg_open_rsp *rsp,
+					u32 device_id, u32 clt_device_id,
+					size_t nsectors,
+					const struct ibnbd_dev *ibnbd_dev)
+{
+	struct block_device *bdev;
+
+	ibnbd_srv_fill_msg_open_rsp_header(rsp, clt_device_id);
+
+	rsp->result			= 0;
+	rsp->device_id			= device_id;
+	rsp->nsectors			= nsectors;
+	rsp->logical_block_size		=
+		ibnbd_dev_get_logical_bsize(ibnbd_dev);
+	rsp->physical_block_size	= ibnbd_dev_get_phys_bsize(ibnbd_dev);
+	rsp->max_segments		= ibnbd_dev_get_max_segs(ibnbd_dev);
+	rsp->max_hw_sectors		= ibnbd_dev_get_max_hw_sects(ibnbd_dev);
+	rsp->max_write_same_sectors	=
+		ibnbd_dev_get_max_write_same_sects(ibnbd_dev);
+
+	rsp->max_discard_sectors	=
+		ibnbd_dev_get_max_discard_sects(ibnbd_dev);
+	rsp->discard_zeroes_data	=
+		ibnbd_dev_get_discard_zeroes_data(ibnbd_dev);
+	rsp->discard_granularity	=
+		ibnbd_dev_get_discard_granularity(ibnbd_dev);
+
+	rsp->discard_alignment	= ibnbd_dev_get_discard_alignment(ibnbd_dev);
+	rsp->secure_discard	= ibnbd_dev_get_secure_discard(ibnbd_dev);
+
+	bdev = ibnbd_dev_get_bdev(ibnbd_dev);
+	rsp->rotational	= !blk_queue_nonrot(bdev_get_queue(bdev));
+	rsp->io_mode	= ibnbd_dev->mode;
+
+	DEB("nsectors = %llu, logical_block_size = %d, "
+	    "physical_block_size = %d, max_segments = %d, "
+	    "max_hw_sectors = %d, max_write_same_sects = %d, "
+	    "max_discard_sectors = %d, rotational = %d, io_mode = %d\n",
+	    rsp->nsectors, rsp->logical_block_size, rsp->physical_block_size,
+	    rsp->max_segments, rsp->max_hw_sectors, rsp->max_write_same_sectors,
+	    rsp->max_discard_sectors, rsp->rotational, rsp->io_mode);
+}
+
+static struct ibnbd_srv_sess_dev *
+ibnbd_srv_create_set_sess_dev(struct ibnbd_srv_session *srv_sess,
+			      const struct ibnbd_msg_open *open_msg,
+			      struct ibnbd_dev *ibnbd_dev, fmode_t open_flags,
+			      struct ibnbd_srv_dev *srv_dev)
+{
+	struct ibnbd_srv_sess_dev *sdev = ibnbd_sess_dev_alloc(srv_sess);
+
+	if (IS_ERR(sdev))
+		return sdev;
+
+	kref_init(&sdev->kref);
+
+	strlcpy(sdev->pathname, open_msg->dev_name, sizeof(sdev->pathname));
+
+	sdev->ibnbd_dev		= ibnbd_dev;
+	sdev->sess		= srv_sess;
+	sdev->dev		= srv_dev;
+	sdev->open_flags	= open_flags;
+	sdev->clt_device_id	= open_msg->clt_device_id;
+
+	return sdev;
+}
+
+static char *ibnbd_srv_get_full_path(const char *dev_name)
+{
+	char *full_path;
+	char *a, *b;
+
+	full_path = kmalloc(PATH_MAX, GFP_KERNEL);
+	if (!full_path)
+		return ERR_PTR(-ENOMEM);
+
+	snprintf(full_path, PATH_MAX, "%s/%s", dev_search_path, dev_name);
+
+	/* eliminate duplicated slashes */
+	a = strchr(full_path, '/');
+	b = a;
+	while (*b != '\0') {
+		if (*b == '/' && *a == '/') {
+			b++;
+		} else {
+			a++;
+			*a = *b;
+			b++;
+		}
+	}
+	a++;
+	*a = '\0';
+
+	return full_path;
+}
+
+static void process_msg_sess_info(struct ibtrs_session *s,
+				  struct ibnbd_srv_session *srv_sess,
+				  const void *msg, size_t len)
+{
+	int err;
+	const struct ibnbd_msg_sess_info *sess_info_msg = msg;
+	struct ibnbd_msg_sess_info_rsp rsp;
+	struct kvec vec = {
+		.iov_base = &rsp,
+		.iov_len  = sizeof(rsp)
+	};
+
+	if (srv_sess->hostname[0] == '\0')
+		strlcpy(srv_sess->hostname, ibtrs_srv_get_sess_hostname(s),
+			sizeof(srv_sess->hostname));
+
+	srv_sess->ver = min_t(u8, sess_info_msg->ver, IBNBD_VERSION);
+	DEB("Session to %s (%s) using protocol version %d (client version: %d,"
+	    " server version: %d)\n", srv_sess->str_addr, srv_sess->hostname,
+	    srv_sess->ver, sess_info_msg->ver, IBNBD_VERSION);
+
+	rsp.hdr.type = IBNBD_MSG_SESS_INFO_RSP;
+	rsp.ver = srv_sess->ver;
+
+	err = ibtrs_srv_send(s, &vec, 1);
+	if (unlikely(err))
+		ERR_NP("Failed to send session info response to client"
+		       " %s (%s)\n", srv_sess->str_addr, srv_sess->hostname);
+}
+
+static void process_msg_open(struct ibtrs_session *s,
+			     struct ibnbd_srv_session *srv_sess,
+			     const void *msg, size_t len)
+{
+	int ret;
+	struct ibnbd_srv_dev *srv_dev;
+	struct ibnbd_srv_sess_dev *srv_sess_dev;
+	const struct ibnbd_msg_open *open_msg = msg;
+	fmode_t open_flags;
+	char *full_path;
+	struct ibnbd_dev *ibnbd_dev;
+	enum ibnbd_io_mode io_mode;
+	struct ibnbd_msg_open_rsp rsp;
+	struct kvec vec = {
+		.iov_base = &rsp,
+		.iov_len  = sizeof(rsp)
+	};
+
+	DEB("Open message received: client='%s' path='%s' access_mode=%d"
+	    " io_mode=%d\n", srv_sess->str_addr, open_msg->dev_name,
+	    open_msg->access_mode, open_msg->io_mode);
+	open_flags = FMODE_READ;
+	if (open_msg->access_mode != IBNBD_ACCESS_RO)
+		open_flags |= FMODE_WRITE;
+
+	if ((strlen(dev_search_path) + strlen(open_msg->dev_name))
+	    >= PATH_MAX) {
+		ERR_NP("Opening device for client %s failed, device path too"
+		       " long. '%s/%s' is longer than PATH_MAX (%d)\n",
+		       srv_sess->str_addr, dev_search_path, open_msg->dev_name,
+		       PATH_MAX);
+		ret = -EINVAL;
+		goto reject;
+	}
+	full_path = ibnbd_srv_get_full_path(open_msg->dev_name);
+	if (IS_ERR(full_path)) {
+		ret = PTR_ERR(full_path);
+		ERR_NP("Opening device '%s' for client %s failed,"
+		       " failed to get device full path, errno: %d\n",
+		       open_msg->dev_name, srv_sess->str_addr, ret);
+		goto reject;
+	}
+
+	if (open_msg->io_mode == IBNBD_BLOCKIO)
+		io_mode = IBNBD_BLOCKIO;
+	else if (open_msg->io_mode == IBNBD_FILEIO)
+		io_mode = IBNBD_FILEIO;
+	else
+		io_mode = def_io_mode;
+
+	ibnbd_dev = ibnbd_dev_open(full_path, open_flags, io_mode,
+				   srv_sess->sess_bio_set, ibnbd_endio);
+	if (IS_ERR(ibnbd_dev)) {
+		ERR_NP("Opening device '%s' for client %s failed,"
+		       " failed to open the block device, errno:"
+		       " %ld\n", full_path, srv_sess->str_addr,
+		       PTR_ERR(ibnbd_dev));
+		ret = PTR_ERR(ibnbd_dev);
+		goto free_path;
+	}
+
+	srv_dev = ibnbd_srv_get_or_create_srv_dev(ibnbd_dev, srv_sess, io_mode,
+						  open_msg->access_mode);
+	if (IS_ERR(srv_dev)) {
+		ERR_NP("Opening device '%s' for client %s failed,"
+		       " creating srv_dev failed, errno: %ld\n", full_path,
+		       srv_sess->str_addr, PTR_ERR(srv_dev));
+		ret = PTR_ERR(srv_dev);
+		goto ibnbd_dev_close;
+	}
+
+	srv_sess_dev = ibnbd_srv_create_set_sess_dev(srv_sess, open_msg,
+						     ibnbd_dev, open_flags,
+						     srv_dev);
+	if (IS_ERR(srv_sess_dev)) {
+		ERR_NP("Opening device '%s' for client %s failed,"
+		       " creating sess_dev failed, errno: %ld\n", full_path,
+		       srv_sess->str_addr, PTR_ERR(srv_sess_dev));
+		ret = PTR_ERR(srv_sess_dev);
+		goto srv_dev_put;
+	}
+
+	/* Create the srv_dev sysfs files if they haven't been created yet. The
+	 * reason to delay the creation is not to create the sysfs files before
+	 * we are sure the device can be opened.
+	 */
+	mutex_lock(&srv_dev->lock);
+	if (!srv_dev->dev_kobj.state_in_sysfs) {
+		ret = ibnbd_srv_create_dev_sysfs(srv_dev,
+						 ibnbd_dev_get_bdev(ibnbd_dev),
+						 ibnbd_dev_get_name(ibnbd_dev));
+		if (ret) {
+			mutex_unlock(&srv_dev->lock);
+			ERR(srv_sess_dev, "Opening device failed, failed to"
+			    " create device sysfs files, errno: %d\n", ret);
+			goto free_srv_sess_dev;
+		}
+	}
+
+	ret = ibnbd_srv_create_dev_client_sysfs(srv_sess_dev);
+	if (ret) {
+		mutex_unlock(&srv_dev->lock);
+		ERR(srv_sess_dev, "Opening device failed, failed to create"
+		    " dev client sysfs files, errno: %d\n", ret);
+		goto free_srv_sess_dev;
+	}
+
+	list_add(&srv_sess_dev->dev_list, &srv_dev->sess_dev_list);
+	mutex_unlock(&srv_dev->lock);
+
+	mutex_lock(&srv_sess->lock);
+	list_add(&srv_sess_dev->sess_list, &srv_sess->sess_dev_list);
+	mutex_unlock(&srv_sess->lock);
+
+	srv_sess_dev->nsectors = ibnbd_dev_get_capacity(ibnbd_dev);
+
+	ibnbd_srv_fill_msg_open_rsp(&rsp, srv_sess_dev->device_id,
+				    open_msg->clt_device_id,
+				    srv_sess_dev->nsectors, ibnbd_dev);
+
+	if (unlikely(srv_sess->state == SESS_STATE_DISCONNECTED)) {
+		ret = -ENODEV;
+		ERR(srv_sess_dev, "Opening device failed, session"
+		    " is disconnected, errno: %d\n", ret);
+		goto remove_srv_sess_dev;
+	}
+
+	ret = ibtrs_srv_send(s, &vec, 1);
+	if (unlikely(ret)) {
+		ERR(srv_sess_dev, "Opening device failed, sending open"
+		    " response msg failed, errno: %d\n", ret);
+		goto remove_srv_sess_dev;
+	}
+	srv_sess_dev->is_visible = true;
+	INFO(srv_sess_dev, "Opened device '%s' in %s mode\n",
+	     srv_dev->id, ibnbd_io_mode_str(io_mode));
+
+	kfree(full_path);
+	return;
+
+remove_srv_sess_dev:
+	ibnbd_srv_destroy_dev_client_sysfs(srv_sess_dev);
+	mutex_lock(&srv_sess->lock);
+	list_del(&srv_sess_dev->sess_list);
+	mutex_unlock(&srv_sess->lock);
+
+	mutex_lock(&srv_dev->lock);
+	list_del(&srv_sess_dev->dev_list);
+	mutex_unlock(&srv_dev->lock);
+free_srv_sess_dev:
+	write_lock(&srv_sess->index_lock);
+	idr_remove(&srv_sess->index_idr, srv_sess_dev->device_id);
+	write_unlock(&srv_sess->index_lock);
+	kfree(srv_sess_dev);
+srv_dev_put:
+	if (open_msg->access_mode != IBNBD_ACCESS_RO) {
+		mutex_lock(&srv_dev->lock);
+		srv_dev->open_write_cnt--;
+		mutex_unlock(&srv_dev->lock);
+	}
+	ibnbd_put_srv_dev(srv_dev);
+ibnbd_dev_close:
+	ibnbd_dev_close(ibnbd_dev);
+free_path:
+	kfree(full_path);
+reject:
+	DEB("Sending negative response to client %s for device '%s': %d\n",
+	    srv_sess->str_addr, open_msg->dev_name, ret);
+	ibnbd_srv_fill_msg_open_rsp_header(&rsp, open_msg->clt_device_id);
+	rsp.result = ret;
+	if (unlikely(srv_sess->state == SESS_STATE_DISCONNECTED))
+		return;
+	ret = ibtrs_srv_send(s, &vec, 1);
+	if (ret)
+		ERR_NP("Rejecting mapping request of device '%s' from client %s"
+		       " failed, errno: %d\n", open_msg->dev_name,
+		       srv_sess->str_addr, ret);
+}
+
+static int send_msg_close_rsp(struct ibtrs_session *sess, u32 clt_device_id)
+{
+	struct ibnbd_msg_close_rsp msg;
+	struct kvec vec = {
+		.iov_base = &msg,
+		.iov_len  = sizeof(msg)
+	};
+
+	msg.hdr.type	= IBNBD_MSG_CLOSE_RSP;
+	msg.clt_device_id	= clt_device_id;
+
+	return ibtrs_srv_send(sess, &vec, 1);
+}
+
+static void process_msg_close(struct ibtrs_session *s,
+			      struct ibnbd_srv_session *srv_sess,
+			      const void *msg, size_t len)
+{
+	const struct ibnbd_msg_close *close_msg = msg;
+	struct ibnbd_srv_sess_dev *sess_dev;
+	u32 dev_id;
+
+	dev_id = close_msg->device_id;
+
+	sess_dev = ibnbd_get_sess_dev(dev_id, srv_sess);
+	if (likely(!IS_ERR(sess_dev))) {
+		u32 clt_device_id = sess_dev->clt_device_id;
+
+		ibnbd_srv_destroy_dev_client_sysfs(sess_dev);
+		ibnbd_put_sess_dev(sess_dev);
+		ibnbd_destroy_sess_dev(sess_dev, false);
+		send_msg_close_rsp(s, clt_device_id);
+	} else {
+		ERR_NP("Destroying device id %d from client %s failed,"
+		       " device not open\n", dev_id, srv_sess->str_addr);
+	}
+}
+
+static void ibnbd_srv_recv(struct ibtrs_session *sess, void *priv,
+			   const void *msg, size_t len)
+{
+	struct ibnbd_msg_hdr *hdr;
+	struct ibnbd_srv_session *srv_sess;
+
+	hdr = (struct ibnbd_msg_hdr *)msg;
+	srv_sess = priv;
+
+	if (unlikely(WARN_ON(!srv_sess)))
+		return;
+	if (unlikely(WARN_ON(!hdr) || ibnbd_validate_message(msg, len)))
+		return;
+
+	print_hex_dump_debug("", DUMP_PREFIX_OFFSET, 8, 1, msg, len, true);
+
+	switch (hdr->type) {
+	case IBNBD_MSG_SESS_INFO:
+		process_msg_sess_info(sess, srv_sess, msg, len);
+		break;
+	case IBNBD_MSG_OPEN:
+		process_msg_open(sess, srv_sess, msg, len);
+		break;
+	case IBNBD_MSG_CLOSE:
+		process_msg_close(sess, srv_sess, msg, len);
+		break;
+	default:
+		WRN_NP("Message with unexpected type %d received from client"
+		       " %s\n", hdr->type, srv_sess->str_addr);
+		break;
+	}
+}
+
+static int ibnbd_srv_revalidate_sess_dev(struct ibnbd_srv_sess_dev *sess_dev)
+{
+	int ret;
+	size_t nsectors;
+	struct ibnbd_msg_revalidate msg;
+	struct kvec vec = {
+		.iov_base = &msg,
+		.iov_len  = sizeof(msg)
+	};
+
+	nsectors = ibnbd_dev_get_capacity(sess_dev->ibnbd_dev);
+
+	msg.hdr.type		= IBNBD_MSG_REVAL;
+	msg.clt_device_id	= sess_dev->clt_device_id;
+	msg.nsectors		= nsectors;
+
+	if (unlikely(sess_dev->sess->state == SESS_STATE_DISCONNECTED))
+		return -ENODEV;
+
+	if (!sess_dev->is_visible) {
+		INFO(sess_dev, "revalidating device failed, open reply has"
+		     " not been sent to the client yet\n");
+		return -EAGAIN;
+	}
+
+	ret = ibtrs_srv_send(sess_dev->sess->ibtrs_sess, &vec, 1);
+	if (unlikely(ret)) {
+		ERR(sess_dev, "revalidate: Sending new device size"
+		    " to client failed, errno: %d\n", ret);
+	} else {
+		INFO(sess_dev, "notified client about device size change"
+		     " (old nsectors: %zu, new nsectors: %zu)\n",
+		     sess_dev->nsectors, nsectors);
+		sess_dev->nsectors = nsectors;
+	}
+
+	return ret;
+}
+
+int ibnbd_srv_revalidate_dev(struct ibnbd_srv_dev *dev)
+{
+	struct ibnbd_srv_sess_dev *sess_dev;
+	int ret = 0;
+
+	mutex_lock(&dev->lock);
+	list_for_each_entry(sess_dev, &dev->sess_dev_list, dev_list)
+		ret += ibnbd_srv_revalidate_sess_dev(sess_dev);
+	mutex_unlock(&dev->lock);
+
+	if (ret)
+		return -EIO;
+
+	return 0;
+}
+
+static int __init ibnbd_srv_init_module(void)
+{
+	int err;
+
+	INFO_NP("Loading module ibnbd_server, version: %s (dev_search_path: "
+		"'%s', def_io_mode: '%s')\n", __stringify(IBNBD_VER),
+		dev_search_path, ibnbd_io_mode_str(def_io_mode));
+
+	ibnbd_srv_ops.owner	= THIS_MODULE;
+	ibnbd_srv_ops.recv	= ibnbd_srv_recv;
+	ibnbd_srv_ops.rdma_ev	= ibnbd_srv_rdma_ev;
+	ibnbd_srv_ops.sess_ev	= ibnbd_srv_sess_ev;
+
+	err = ibtrs_srv_register(&ibnbd_srv_ops);
+	if (err) {
+		ERR_NP("Failed to load module, IBTRS registration failed,"
+		       " errno: %d\n", err);
+		goto out;
+	}
+
+	err = ibnbd_dev_init();
+	if (err) {
+		ERR_NP("Failed to load module, init device resources failed,"
+		       " errno: %d\n", err);
+		goto unreg;
+	}
+
+	err = ibnbd_srv_create_sysfs_files();
+	if (err) {
+		ERR_NP("Failed to load module, create sysfs files failed,"
+		       " errno: %d\n", err);
+		goto dev_destroy;
+	}
+
+	return 0;
+
+dev_destroy:
+	ibnbd_dev_destroy();
+unreg:
+	ibtrs_srv_unregister(&ibnbd_srv_ops);
+out:
+	return err;
+}
+
+static void __exit ibnbd_srv_cleanup_module(void)
+{
+	INFO_NP("Unloading module\n");
+	ibtrs_srv_unregister(&ibnbd_srv_ops);
+	WARN_ON(!list_empty(&sess_list));
+	ibnbd_srv_destroy_sysfs_files();
+	ibnbd_dev_destroy();
+	INFO_NP("Module unloaded\n");
+}
+
+module_init(ibnbd_srv_init_module);
+module_exit(ibnbd_srv_cleanup_module);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 87+ messages in thread
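
One detail of process_rdma() in the patch above that is easy to miss: the
incoming IBTRS payload carries the raw disk data first and the
struct ibnbd_msg_io trailer last, which is why the handler computes
data_len = len - sizeof(*msg) and reads the message at data + data_len. A
hedged sketch of the matching packing step follows; the real client-side
encoding lives in the ibnbd_clt patches and may differ in detail, so treat
this purely as an illustration of the layout.

/*
 * Wire layout unpacked by process_rdma():
 *   [ data_len bytes of disk data ][ struct ibnbd_msg_io ]
 * where data_len == total length - sizeof(struct ibnbd_msg_io).
 */
static size_t example_pack_io(void *buf, const void *data, size_t data_len,
			      const struct ibnbd_msg_io *msg)
{
	memcpy(buf, data, data_len);		   /* disk data first */
	memcpy(buf + data_len, msg, sizeof(*msg)); /* message trailer last */

	return data_len + sizeof(*msg);
}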

* [PATCH 23/28] ibnbd_srv: add abstraction for submit IO to file or block device
@ 2017-03-24 10:45   ` Jack Wang
  0 siblings, 0 replies; 87+ messages in thread
From: Jack Wang @ 2017-03-24 10:45 UTC (permalink / raw)
  To: linux-block, linux-rdma
  Cc: dledford, axboe, hch, mail, Milind.dumbare, yun.wang, Jack Wang,
	Kleber Souza, Danil Kipnis, Roman Pen

From: Jack Wang <jinpu.wang@profitbricks.com>

Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
Signed-off-by: Kleber Souza <kleber.souza@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kipnis@profitbricks.com>
Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
---
 drivers/block/ibnbd_server/ibnbd_dev.c | 436 +++++++++++++++++++++++++++++++++
 drivers/block/ibnbd_server/ibnbd_dev.h | 149 +++++++++++
 2 files changed, 585 insertions(+)
 create mode 100644 drivers/block/ibnbd_server/ibnbd_dev.c
 create mode 100644 drivers/block/ibnbd_server/ibnbd_dev.h

diff --git a/drivers/block/ibnbd_server/ibnbd_dev.c b/drivers/block/ibnbd_server/ibnbd_dev.c
new file mode 100644
index 0000000..5f6b453
--- /dev/null
+++ b/drivers/block/ibnbd_server/ibnbd_dev.c
@@ -0,0 +1,436 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler < mail@fholler.de>
+ *          Jack Wang <jinpu.wang@profitbricks.com>
+ *   	    Kleber Souza <kleber.souza@profitbricks.com>
+ * 	    Danil Kipnis <danil.kipnis@profitbricks.com>
+ *   	    Roman Pen <roman.penyaev@profitbricks.com>
+ *          Milind Dumbare <Milind.dumbare@gmail.com>
+ *
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions, and the following disclaimer,
+ *    without modification.
+ * 2. Redistributions in binary form must reproduce at minimum a disclaimer
+ *    substantially similar to the "NO WARRANTY" disclaimer below
+ *    ("Disclaimer") and any redistribution must be conditioned upon
+ *    including a substantially similar Disclaimer requirement for further
+ *    binary redistribution.
+ * 3. Neither the names of the above-listed copyright holders nor the names
+ *    of any contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * NO WARRANTY
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * HOLDERS OR CONTRIBUTORS BE LIABLE FOR SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
+ * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGES.
+ *
+ */
+
+#include "ibnbd_dev.h"
+#include "ibnbd_srv_log.h"
+
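+/* max_active of 0 lets alloc_workqueue() fall back to its default concurrency limit */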
+#define IBNBD_DEV_MAX_FILEIO_ACTIVE_WORKERS 0
+
+struct ibnbd_dev_file_io_work {
+	struct ibnbd_dev	*dev;
+	void			*priv;
+
+	sector_t		sector;
+	void			*data;
+	size_t			len;
+	size_t			bi_size;
+	enum ibnbd_io_flags	flags;
+
+	struct work_struct	work;
+};
+
+struct ibnbd_dev_blk_io {
+	struct ibnbd_dev *dev;
+	void		 *priv;
+};
+
+static struct workqueue_struct *fileio_wq;
+
+int ibnbd_dev_init(void)
+{
+	fileio_wq = alloc_workqueue("%s", WQ_UNBOUND,
+				    IBNBD_DEV_MAX_FILEIO_ACTIVE_WORKERS,
+				    "ibnbd_server_fileio_wq");
+	if (!fileio_wq)
+		return -ENOMEM;
+
+	return 0;
+}
+
+void ibnbd_dev_destroy(void)
+{
+	destroy_workqueue(fileio_wq);
+}
+
+static inline struct block_device *ibnbd_dev_open_bdev(const char *path,
+						       fmode_t flags)
+{
+	return blkdev_get_by_path(path, flags, THIS_MODULE);
+}
+
+static int ibnbd_dev_blk_open(struct ibnbd_dev *dev, const char *path,
+			      fmode_t flags)
+{
+	dev->bdev = ibnbd_dev_open_bdev(path, flags);
+	return PTR_ERR_OR_ZERO(dev->bdev);
+}
+
+static int ibnbd_dev_vfs_open(struct ibnbd_dev *dev, const char *path,
+			      fmode_t flags)
+{
+	int oflags = O_DSYNC; /* enable write-through */
+
+	if (flags & FMODE_WRITE)
+		oflags |= O_RDWR;
+	else if (flags & FMODE_READ)
+		oflags |= O_RDONLY;
+	else
+		return -EINVAL;
+
+	dev->file = filp_open(path, oflags, 0);
+	return PTR_ERR_OR_ZERO(dev->file);
+}
+
+struct ibnbd_dev *ibnbd_dev_open(const char *path, fmode_t flags,
+				 enum ibnbd_io_mode mode, struct bio_set *bs,
+				 ibnbd_dev_io_fn io_cb)
+{
+	struct ibnbd_dev *dev;
+	int ret;
+
+	dev = kzalloc(sizeof(*dev), GFP_KERNEL);
+	if (!dev)
+		return ERR_PTR(-ENOMEM);
+
+	if (mode == IBNBD_BLOCKIO) {
+		dev->blk_open_flags = flags;
+		ret = ibnbd_dev_blk_open(dev, path, dev->blk_open_flags);
+		if (ret)
+			goto err;
+	} else if (mode == IBNBD_FILEIO) {
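+		/*
+		 * Open the block device read-only as well, so that the
+		 * ibnbd_dev_get_*() helpers can query its geometry even
+		 * though I/O will be submitted through the VFS.
+		 */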
+		dev->blk_open_flags = FMODE_READ;
+		ret = ibnbd_dev_blk_open(dev, path, dev->blk_open_flags);
+		if (ret)
+			goto err;
+
+		ret = ibnbd_dev_vfs_open(dev, path, flags);
+		if (ret)
+			goto blk_put;
+	}
+
+	dev->blk_open_flags	= flags;
+	dev->mode		= mode;
+	dev->io_cb		= io_cb;
+	bdevname(dev->bdev, dev->name);
+	dev->ibd_bio_set	= bs;
+
+	return dev;
+
+blk_put:
+	blkdev_put(dev->bdev, dev->blk_open_flags);
+err:
+	kfree(dev);
+	return ERR_PTR(ret);
+}
+
+void ibnbd_dev_close(struct ibnbd_dev *dev)
+{
+	flush_workqueue(fileio_wq);
+	blkdev_put(dev->bdev, dev->blk_open_flags);
+	if (dev->mode == IBNBD_FILEIO)
+		filp_close(dev->file, NULL);
+	kfree(dev);
+}
+
+static void ibnbd_dev_bi_end_io(struct bio *bio)
+{
+	struct ibnbd_dev_blk_io *io = bio->bi_private;
+	int error = bio->bi_error;
+
+	io->dev->io_cb(io->priv, error);
+
+	bio_put(bio);
+	kfree(io);
+}
+
+static void bio_map_kern_endio(struct bio *bio)
+{
+	bio_put(bio);
+}
+
+/**
+ *	ibnbd_bio_map_kern	-	map kernel address into bio
+ *	@q: the struct request_queue for the bio
+ *	@data: pointer to buffer to map
+ *	@bs: bio_set to use.
+ *	@len: length in bytes
+ *	@gfp_mask: allocation flags for bio allocation
+ *
+ *	Map the kernel address into a bio suitable for io to a block
+ *	device. Returns an error pointer in case of error.
+ */
+static struct bio *ibnbd_bio_map_kern(struct request_queue *q, void *data,
+				      struct bio_set *bs,
+				      unsigned int len, gfp_t gfp_mask)
+{
+	unsigned long kaddr = (unsigned long)data;
+	unsigned long end = (kaddr + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
+	unsigned long start = kaddr >> PAGE_SHIFT;
+	const int nr_pages = end - start;
+	int offset, i;
+	struct bio *bio;
+
+	bio = bio_alloc_bioset(gfp_mask, nr_pages, bs);
+	if (!bio)
+		return ERR_PTR(-ENOMEM);
+
+	offset = offset_in_page(kaddr);
+	for (i = 0; i < nr_pages; i++) {
+		unsigned int bytes = PAGE_SIZE - offset;
+
+		if (len <= 0)
+			break;
+
+		if (bytes > len)
+			bytes = len;
+
+		if (bio_add_pc_page(q, bio, virt_to_page(data), bytes,
+				    offset) < bytes) {
+			/* we don't support partial mappings */
+			bio_put(bio);
+			return ERR_PTR(-EINVAL);
+		}
+
+		data += bytes;
+		len -= bytes;
+		offset = 0;
+	}
+
+	bio->bi_end_io = bio_map_kern_endio;
+	return bio;
+}
+
+static int ibnbd_dev_blk_submit_io(struct ibnbd_dev *dev, sector_t sector,
+				   void *data, size_t len, u32 bi_size,
+				   enum ibnbd_io_flags flags, void *priv)
+{
+	struct request_queue *q = bdev_get_queue(dev->bdev);
+	struct ibnbd_dev_blk_io *io;
+	struct bio *bio;
+
+	/* check if the buffer is suitable for bdev */
+	if (unlikely(WARN_ON(!blk_rq_aligned(q, (unsigned long)data, len))))
+		return -EINVAL;
+
+	/* Generate bio with pages pointing to the rdma buffer */
+	bio = ibnbd_bio_map_kern(q, data, dev->ibd_bio_set, len, GFP_KERNEL);
+	if (unlikely(IS_ERR(bio)))
+		return PTR_ERR(bio);
+
+	io = kmalloc(sizeof(*io), GFP_KERNEL);
+	if (unlikely(!io)) {
+		bio_put(bio);
+		return -ENOMEM;
+	}
+
+	io->dev		= dev;
+	io->priv	= priv;
+
+	bio->bi_end_io		= ibnbd_dev_bi_end_io;
+	bio->bi_bdev		= dev->bdev;
+	bio->bi_private		= io;
+	bio->bi_opf		= ibnbd_io_flags_to_bi_rw(flags);
+	bio->bi_iter.bi_sector	= sector;
+	bio->bi_iter.bi_size	= bi_size;
+
+	submit_bio(bio);
+
+	return 0;
+}
+
+static int ibnbd_dev_file_handle_flush(struct ibnbd_dev_file_io_work *w,
+				       loff_t start)
+{
+	int ret;
+	loff_t end;
+	int len = w->bi_size;
+
+	if (len)
+		end = start + len - 1;
+	else
+		end = LLONG_MAX;
+
+	ret = vfs_fsync_range(w->dev->file, start, end, 1);
+	if (unlikely(ret))
+		INFO_NP_RL("I/O FLUSH failed on %s, vfs_sync errno: %d\n",
+			   w->dev->name, ret);
+	return ret;
+}
+
+static int ibnbd_dev_file_handle_fua(struct ibnbd_dev_file_io_work *w,
+				     loff_t start)
+{
+	int ret;
+	loff_t end;
+	int len = w->bi_size;
+
+	if (len)
+		end = start + len - 1;
+	else
+		end = LLONG_MAX;
+
+	ret = vfs_fsync_range(w->dev->file, start, end, 1);
+	if (unlikely(ret))
+		INFO_NP_RL("I/O FUA failed on %s, vfs_sync errno: %d\n",
+			   w->dev->name, ret);
+	return ret;
+}
+
+static int ibnbd_dev_file_handle_write_same(struct ibnbd_dev_file_io_work *w)
+{
+	int i;
+
+	if (unlikely(WARN_ON(w->bi_size % w->len)))
+		return -EINVAL;
+
+	for (i = 1; i < w->bi_size / w->len; i++)
+		memcpy(w->data + i * w->len, w->data, w->len);
+
+	return 0;
+}
+
+static void ibnbd_dev_file_submit_io_worker(struct work_struct *w)
+{
+	struct ibnbd_dev_file_io_work *dev_work;
+	loff_t off;
+	int ret;
+	int len;
+	struct file *f;
+
+	dev_work = container_of(w, struct ibnbd_dev_file_io_work, work);
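+	/* translate the sector number into a byte offset for VFS I/O */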
+	off = dev_work->sector * ibnbd_dev_get_logical_bsize(dev_work->dev);
+	f = dev_work->dev->file;
+	len = dev_work->bi_size;
+
+	if (dev_work->flags & IBNBD_RW_REQ_FLUSH) {
+		ret = ibnbd_dev_file_handle_flush(dev_work, off);
+		if (unlikely(ret))
+			goto out;
+	}
+
+	if (dev_work->flags & IBNBD_RW_REQ_WRITE_SAME) {
+		ret = ibnbd_dev_file_handle_write_same(dev_work);
+		if (unlikely(ret))
+			goto out;
+	}
+
+	/* TODO Implement support for DIRECT */
+	if (dev_work->bi_size) {
+		if (dev_work->flags & IBNBD_RW_REQ_WRITE)
+			ret = kernel_write(f, dev_work->data, dev_work->bi_size,
+					   off);
+		else
+			ret = kernel_read(f, off, dev_work->data,
+					  dev_work->bi_size);
+
+		if (unlikely(ret < 0)) {
+			goto out;
+		} else if (unlikely(ret != dev_work->bi_size)) {
+			/* TODO implement support for partial completions */
+			ret = -EIO;
+			goto out;
+		} else {
+			ret = 0;
+		}
+	}
+
+	if (dev_work->flags & IBNBD_RW_REQ_FUA)
+		ret = ibnbd_dev_file_handle_fua(dev_work, off);
+out:
+	dev_work->dev->io_cb(dev_work->priv, ret);
+	kfree(dev_work);
+}
+
+static inline bool ibnbd_dev_file_io_flags_supported(enum ibnbd_io_flags flags)
+{
+	flags &= ~IBNBD_RW_REQ_WRITE;
+	flags &= ~IBNBD_RW_REQ_SYNC;
+	flags &= ~IBNBD_RW_REQ_FUA;
+	flags &= ~IBNBD_RW_REQ_FLUSH;
+	flags &= ~IBNBD_RW_REQ_WRITE_SAME;
+
+	return (!flags);
+}
+
+static int ibnbd_dev_file_submit_io(struct ibnbd_dev *dev, sector_t sector,
+				    void *data, size_t len, size_t bi_size,
+				    enum ibnbd_io_flags flags, void *priv)
+{
+	struct ibnbd_dev_file_io_work *w;
+
+	if (!ibnbd_dev_file_io_flags_supported(flags)) {
+		INFO_NP_RL("Unsupported I/O flags: 0x%x on device %s\n", flags,
+			   dev->name);
+		return -ENOTSUPP;
+	}
+
+	w = kmalloc(sizeof(*w), GFP_KERNEL);
+	if (!w)
+		return -ENOMEM;
+
+	w->dev		= dev;
+	w->priv		= priv;
+	w->sector	= sector;
+	w->data		= data;
+	w->len		= len;
+	w->bi_size	= bi_size;
+	w->flags	= flags;
+	INIT_WORK(&w->work, ibnbd_dev_file_submit_io_worker);
+
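+	/* queue_work() returns false only if this work item is already queued */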
+	if (unlikely(!queue_work(fileio_wq, &w->work))) {
+		kfree(w);
+		return -EEXIST;
+	}
+
+	return 0;
+}
+
+int ibnbd_dev_submit_io(struct ibnbd_dev *dev, sector_t sector, void *data,
+			size_t len, u32 bi_size, enum ibnbd_io_flags flags,
+			void *priv)
+{
+	if (dev->mode == IBNBD_FILEIO)
+		return ibnbd_dev_file_submit_io(dev, sector, data, len, bi_size,
+						flags, priv);
+	else if (dev->mode == IBNBD_BLOCKIO)
+		return ibnbd_dev_blk_submit_io(dev, sector, data, len, bi_size,
+					       flags, priv);
+
+	WRN_NP("Submitting I/O to %s failed, dev->mode contains invalid value: '%d', memory corrupted?\n",
+	       dev->name, dev->mode);
+	return -EINVAL;
+}
diff --git a/drivers/block/ibnbd_server/ibnbd_dev.h b/drivers/block/ibnbd_server/ibnbd_dev.h
new file mode 100644
index 0000000..7c73d64
--- /dev/null
+++ b/drivers/block/ibnbd_server/ibnbd_dev.h
@@ -0,0 +1,149 @@
+#ifndef _IBNBD_DEV_H
+#define _IBNBD_DEV_H
+
+#include <linux/fs.h>
+#include "../ibnbd_inc/ibnbd-proto.h"
+
+typedef void ibnbd_dev_io_fn(void *priv, int error);
+
+struct ibnbd_dev {
+	struct block_device	*bdev;
+	struct bio_set		*ibd_bio_set;
+	struct file		*file;
+	fmode_t			blk_open_flags;
+	enum ibnbd_io_mode	mode;
+	char			name[BDEVNAME_SIZE];
+	ibnbd_dev_io_fn		*io_cb;
+};
+
+
+/**
+ * ibnbd_dev_init() - Initialize ibnbd_dev
+ *
+ * This function initializes the ibnbd-dev component.
+ * It has to be called once before ibnbd_dev_open() is used.
+ */
+int ibnbd_dev_init(void);
+
+/**
+ * ibnbd_dev_destroy() - Destroy ibnbd_dev
+ *
+ * This function destroys the ibnbd-dev component.
+ * It has to be called after the last device has been closed.
+ */
+void ibnbd_dev_destroy(void);
+
+/**
+ * ibnbd_dev_open() - Open a device
+ * @path:	path of the device to open
+ * @flags:	open flags
+ * @mode:	open via VFS or block layer
+ * @bs:		bio_set to use during block I/O
+ * @io_cb:	callback invoked when an I/O completes
+ */
+struct ibnbd_dev *ibnbd_dev_open(const char *path, fmode_t flags,
+				 enum ibnbd_io_mode mode, struct bio_set *bs,
+				 ibnbd_dev_io_fn io_cb);
+
+/**
+ * ibnbd_dev_close() - Close a device
+ */
+void ibnbd_dev_close(struct ibnbd_dev *dev);
+
+static inline size_t ibnbd_dev_get_capacity(const struct ibnbd_dev *dev)
+{
+	return get_capacity(dev->bdev->bd_disk);
+}
+
+static inline int ibnbd_dev_get_logical_bsize(const struct ibnbd_dev *dev)
+{
+	return bdev_logical_block_size(dev->bdev);
+}
+
+static inline int ibnbd_dev_get_phys_bsize(const struct ibnbd_dev *dev)
+{
+	return bdev_physical_block_size(dev->bdev);
+}
+
+static inline int ibnbd_dev_get_max_segs(const struct ibnbd_dev *dev)
+{
+	return queue_max_segments(bdev_get_queue(dev->bdev));
+}
+
+static inline int ibnbd_dev_get_max_hw_sects(const struct ibnbd_dev *dev)
+{
+	return queue_max_hw_sectors(bdev_get_queue(dev->bdev));
+}
+
+static inline int
+ibnbd_dev_get_max_write_same_sects(const struct ibnbd_dev *dev)
+{
+	return bdev_write_same(dev->bdev);
+}
+
+static inline int ibnbd_dev_get_secure_discard(const struct ibnbd_dev *dev)
+{
+	if (dev->mode == IBNBD_BLOCKIO)
+		return blk_queue_secure_erase(bdev_get_queue(dev->bdev));
+	return 0;
+}
+
+static inline int ibnbd_dev_get_max_discard_sects(const struct ibnbd_dev *dev)
+{
+	if (!blk_queue_discard(bdev_get_queue(dev->bdev)))
+		return 0;
+
+	if (dev->mode == IBNBD_BLOCKIO)
+		return blk_queue_get_max_sectors(bdev_get_queue(dev->bdev),
+						 REQ_OP_DISCARD);
+	return 0;
+}
+
+static inline int ibnbd_dev_get_discard_zeroes_data(const struct ibnbd_dev *dev)
+{
+	if (dev->mode == IBNBD_BLOCKIO)
+		return bdev_get_queue(dev->bdev)->limits.discard_zeroes_data;
+	return 0;
+}
+
+static inline int ibnbd_dev_get_discard_granularity(const struct ibnbd_dev *dev)
+{
+	if (dev->mode == IBNBD_BLOCKIO)
+		return bdev_get_queue(dev->bdev)->limits.discard_granularity;
+	return 0;
+}
+
+static inline int ibnbd_dev_get_discard_alignment(const struct ibnbd_dev *dev)
+{
+	if (dev->mode == IBNBD_BLOCKIO)
+		return bdev_get_queue(dev->bdev)->limits.discard_alignment;
+	return 0;
+}
+
+
+/**
+ * ibnbd_dev_get_name() - Return the device name
+ * Return:	device name, at most %BDEVNAME_SIZE bytes long
+ */
+static inline const char *ibnbd_dev_get_name(const struct ibnbd_dev *dev)
+{
+	return dev->name;
+}
+
+static inline struct block_device *
+ibnbd_dev_get_bdev(const struct ibnbd_dev *dev)
+{
+	return dev->bdev;
+}
+
+
+/**
+ * ibnbd_dev_submit_io() - Submit an I/O to the disk
+ * @dev:	device to which the I/O is submitted
+ * @sector:	address to read/write data to
+ * @data:	I/O data to write or buffer to read I/O data into
+ * @len:	length of @data
+ * @bi_size:	amount of data that will be read/written
+ * @flags:	IBNBD I/O flags of the request
+ * @priv:	private data passed to the io_cb completion callback
+ */
+int ibnbd_dev_submit_io(struct ibnbd_dev *dev, sector_t sector, void *data,
+			size_t len, u32 bi_size, enum ibnbd_io_flags flags,
+			void *priv);
+#endif
-- 
2.7.4
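
A minimal usage sketch of the above abstraction (illustration only, not part of
the patch: the function names, the "/dev/ram0" path and the trimmed error
handling are placeholders):

static void example_io_done(void *priv, int error)
{
	/* runs from bio completion (BLOCKIO) or from the fileio workqueue (FILEIO) */
	pr_info("example io finished, err: %d\n", error);
}

static int example_map(struct bio_set *bs, void *buf, size_t len)
{
	struct ibnbd_dev *dev;
	int err;

	/* ibnbd_dev_init() must have been called once before this point */
	dev = ibnbd_dev_open("/dev/ram0", FMODE_READ | FMODE_WRITE,
			     IBNBD_BLOCKIO, bs, example_io_done);
	if (IS_ERR(dev))
		return PTR_ERR(dev);

	/* buf must satisfy blk_rq_aligned() for BLOCKIO; flags 0 means a plain read */
	err = ibnbd_dev_submit_io(dev, 0, buf, len, len, 0, NULL);

	/*
	 * In real use the device stays open for the lifetime of the mapping;
	 * ibnbd_dev_close(dev) may only be called once all I/O has completed.
	 */
	return err;
}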

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH 24/28] ibnbd_srv: add log helpers
  2017-03-24 10:45 ` Jack Wang
                   ` (23 preceding siblings ...)
  (?)
@ 2017-03-24 10:45 ` Jack Wang
  -1 siblings, 0 replies; 87+ messages in thread
From: Jack Wang @ 2017-03-24 10:45 UTC (permalink / raw)
  To: linux-block, linux-rdma
  Cc: dledford, axboe, hch, mail, Milind.dumbare, yun.wang, Jack Wang,
	Kleber Souza, Danil Kipnis

From: Jack Wang <jinpu.wang@profitbricks.com>

Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
Signed-off-by: Kleber Souza <kleber.souza@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kipnis@profitbricks.com>
---
 drivers/block/ibnbd_server/ibnbd_srv_log.h | 69 ++++++++++++++++++++++++++++++
 1 file changed, 69 insertions(+)
 create mode 100644 drivers/block/ibnbd_server/ibnbd_srv_log.h

diff --git a/drivers/block/ibnbd_server/ibnbd_srv_log.h b/drivers/block/ibnbd_server/ibnbd_srv_log.h
new file mode 100644
index 0000000..9217804
--- /dev/null
+++ b/drivers/block/ibnbd_server/ibnbd_srv_log.h
@@ -0,0 +1,69 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler < mail@fholler.de>
+ *          Jack Wang <jinpu.wang@profitbricks.com>
+ *   	    Kleber Souza <kleber.souza@profitbricks.com>
+ * 	    Danil Kipnis <danil.kipnis@profitbricks.com>
+ *   	    Roman Pen <roman.penyaev@profitbricks.com>
+ *          Milind Dumbare <Milind.dumbare@gmail.com>
+ *
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions, and the following disclaimer,
+ *    without modification.
+ * 2. Redistributions in binary form must reproduce at minimum a disclaimer
+ *    substantially similar to the "NO WARRANTY" disclaimer below
+ *    ("Disclaimer") and any redistribution must be conditioned upon
+ *    including a substantially similar Disclaimer requirement for further
+ *    binary redistribution.
+ * 3. Neither the names of the above-listed copyright holders nor the names
+ *    of any contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * NO WARRANTY
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * HOLDERS OR CONTRIBUTORS BE LIABLE FOR SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
+ * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGES.
+ *
+ */
+
+#ifndef __IBNBD_SRV_LOG_H__
+#define __IBNBD_SRV_LOG_H__
+
+#include "../ibnbd_inc/log.h"
+
+#define ERR(dev, fmt, ...) pr_err("ibnbd L%d <%s@%s> ERR: " fmt, \
+				__LINE__, dev->pathname, ibnbd_prefix(dev),\
+				##__VA_ARGS__)
+#define ERR_RL(dev, fmt, ...) pr_err_ratelimited("ibnbd L%d <%s@%s> ERR: " fmt,\
+				__LINE__, dev->pathname, ibnbd_prefix(dev),\
+				##__VA_ARGS__)
+#define WRN(dev, fmt, ...) pr_warn("ibnbd L%d <%s@%s> WARN: " fmt,\
+				__LINE__, dev->pathname, ibnbd_prefix(dev),\
+				##__VA_ARGS__)
+#define WRN_RL(dev, fmt, ...) pr_warn_ratelimited("ibnbd L%d <%s@%s> WARN: " \
+			fmt, __LINE__, dev->pathname, ibnbd_prefix(dev),\
+			##__VA_ARGS__)
+#define INFO(dev, fmt, ...) pr_info("ibnbd <%s@%s>: " \
+			fmt, dev->pathname, ibnbd_prefix(dev), ##__VA_ARGS__)
+#define INFO_RL(dev, fmt, ...) pr_info_ratelimited("ibnbd <%s@%s>: " \
+			fmt, dev->pathname, ibnbd_prefix(dev), ##__VA_ARGS__)
+
+#endif /*__IBNBD_SRV_LOG_H__*/
-- 
2.7.4
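
A hypothetical call site, for illustration only (sess_dev stands for any
server-side object with a pathname member that ibnbd_prefix() accepts; err and
state_str are made up):

ERR(sess_dev, "opening device failed, err: %d\n", err);
INFO_RL(sess_dev, "state changed to %s\n", state_str);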

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH 25/28] ibnbd_srv: add sysfs interface
@ 2017-03-24 10:45   ` Jack Wang
  0 siblings, 0 replies; 87+ messages in thread
From: Jack Wang @ 2017-03-24 10:45 UTC (permalink / raw)
  To: linux-block, linux-rdma
  Cc: dledford, axboe, hch, mail, Milind.dumbare, yun.wang, Jack Wang,
	Kleber Souza, Danil Kipnis, Roman Pen

From: Jack Wang <jinpu.wang@profitbricks.com>

Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
Signed-off-by: Kleber Souza <kleber.souza@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kipnis@profitbricks.com>
Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
---
 drivers/block/ibnbd_server/ibnbd_srv_sysfs.c | 317 +++++++++++++++++++++++++++
 drivers/block/ibnbd_server/ibnbd_srv_sysfs.h |  64 ++++++
 2 files changed, 381 insertions(+)
 create mode 100644 drivers/block/ibnbd_server/ibnbd_srv_sysfs.c
 create mode 100644 drivers/block/ibnbd_server/ibnbd_srv_sysfs.h

diff --git a/drivers/block/ibnbd_server/ibnbd_srv_sysfs.c b/drivers/block/ibnbd_server/ibnbd_srv_sysfs.c
new file mode 100644
index 0000000..8774abe
--- /dev/null
+++ b/drivers/block/ibnbd_server/ibnbd_srv_sysfs.c
@@ -0,0 +1,317 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler < mail@fholler.de>
+ *          Jack Wang <jinpu.wang@profitbricks.com>
+ *   	    Kleber Souza <kleber.souza@profitbricks.com>
+ * 	    Danil Kipnis <danil.kipnis@profitbricks.com>
+ *   	    Roman Pen <roman.penyaev@profitbricks.com>
+ *          Milind Dumbare <Milind.dumbare@gmail.com>
+ *
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions, and the following disclaimer,
+ *    without modification.
+ * 2. Redistributions in binary form must reproduce at minimum a disclaimer
+ *    substantially similar to the "NO WARRANTY" disclaimer below
+ *    ("Disclaimer") and any redistribution must be conditioned upon
+ *    including a substantially similar Disclaimer requirement for further
+ *    binary redistribution.
+ * 3. Neither the names of the above-listed copyright holders nor the names
+ *    of any contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * NO WARRANTY
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * HOLDERS OR CONTRIBUTORS BE LIABLE FOR SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
+ * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGES.
+ *
+ */
+
+#include <uapi/linux/limits.h>
+#include <linux/kobject.h>
+#include <linux/sysfs.h>
+#include <linux/stat.h>
+#include <linux/genhd.h>
+#include <linux/list.h>
+
+#include "../ibnbd_inc/ibnbd.h"
+#include "ibnbd_srv.h"
+#include "ibnbd_srv_log.h"
+#include "ibnbd_srv_sysfs.h"
+
+static struct kobject *ibnbd_srv_kobj;
+static struct kobject *ibnbd_srv_devices_kobj;
+#define IBNBD_SYSFS_DIR "ibnbd"
+static char ibnbd_sysfs_dir[64] = IBNBD_SYSFS_DIR;
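+
+/*
+ * sysfs layout created by this file (relative to /sys/kernel):
+ *   ibnbd/devices/<device>/revalidate
+ *   ibnbd/devices/<device>/block_dev      (symlink to the backing disk)
+ *   ibnbd/devices/<device>/clients/<client address>/read_only
+ *   ibnbd/devices/<device>/clients/<client address>/mapping_path
+ */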
+
+static ssize_t ibnbd_srv_revalidate_dev_show(struct kobject *kobj,
+					     struct kobj_attribute *attr,
+					     char *page)
+{
+	return scnprintf(page, PAGE_SIZE,
+			 "Usage: echo 1 > %s\n", attr->attr.name);
+}
+
+static ssize_t ibnbd_srv_revalidate_dev_store(struct kobject *kobj,
+					      struct kobj_attribute *attr,
+					      const char *buf, size_t count)
+{
+	int ret;
+	struct ibnbd_srv_dev *dev = container_of(kobj, struct ibnbd_srv_dev,
+						 dev_kobj);
+
+	if (!sysfs_streq(buf, "1")) {
+		ERR_NP("%s: invalid value: '%s'\n", attr->attr.name, buf);
+		return -EINVAL;
+	}
+	ret = ibnbd_srv_revalidate_dev(dev);
+	if (ret)
+		return ret;
+
+	return count;
+}
+
+static struct kobj_attribute ibnbd_srv_revalidate_dev_attr =
+					__ATTR(revalidate,
+					       0644,
+					       ibnbd_srv_revalidate_dev_show,
+					       ibnbd_srv_revalidate_dev_store);
+
+static struct attribute *ibnbd_srv_default_dev_attrs[] = {
+	&ibnbd_srv_revalidate_dev_attr.attr,
+	NULL,
+};
+
+static struct attribute_group ibnbd_srv_default_dev_attr_group = {
+	.attrs = ibnbd_srv_default_dev_attrs,
+};
+
+static ssize_t ibnbd_srv_attr_show(struct kobject *kobj, struct attribute *attr,
+				   char *page)
+{
+	struct kobj_attribute *kattr;
+	int ret = -EIO;
+
+	kattr = container_of(attr, struct kobj_attribute, attr);
+	if (kattr->show)
+		ret = kattr->show(kobj, kattr, page);
+	return ret;
+}
+
+static ssize_t ibnbd_srv_attr_store(struct kobject *kobj,
+				    struct attribute *attr,
+				    const char *page, size_t length)
+{
+	struct kobj_attribute *kattr;
+	int ret = -EIO;
+
+	kattr = container_of(attr, struct kobj_attribute, attr);
+	if (kattr->store)
+		ret = kattr->store(kobj, kattr, page, length);
+	return ret;
+}
+
+static const struct sysfs_ops ibnbd_srv_sysfs_ops = {
+	.show	= ibnbd_srv_attr_show,
+	.store	= ibnbd_srv_attr_store,
+};
+
+static struct kobj_type ibnbd_srv_dev_ktype = {
+	.sysfs_ops	= &ibnbd_srv_sysfs_ops,
+};
+
+static struct kobj_type ibnbd_srv_dev_clients_ktype = {
+	.sysfs_ops	= &ibnbd_srv_sysfs_ops,
+};
+
+int ibnbd_srv_create_dev_sysfs(struct ibnbd_srv_dev *dev,
+			       struct block_device *bdev,
+			       const char *dir_name)
+{
+	struct kobject *bdev_kobj;
+	int ret;
+
+	ret = kobject_init_and_add(&dev->dev_kobj, &ibnbd_srv_dev_ktype,
+				   ibnbd_srv_devices_kobj, dir_name);
+	if (ret)
+		return ret;
+
+	ret = kobject_init_and_add(&dev->dev_clients_kobj,
+				   &ibnbd_srv_dev_clients_ktype,
+				   &dev->dev_kobj, "clients");
+	if (ret)
+		goto err;
+
+	ret = sysfs_create_group(&dev->dev_kobj,
+				 &ibnbd_srv_default_dev_attr_group);
+	if (ret)
+		goto err2;
+
+	bdev_kobj = &disk_to_dev(bdev->bd_disk)->kobj;
+	ret = sysfs_create_link(&dev->dev_kobj, bdev_kobj, "block_dev");
+	if (ret)
+		goto err3;
+
+	return 0;
+
+err3:
+	sysfs_remove_group(&dev->dev_kobj,
+			   &ibnbd_srv_default_dev_attr_group);
+err2:
+	kobject_del(&dev->dev_clients_kobj);
+	kobject_put(&dev->dev_clients_kobj);
+err:
+	kobject_del(&dev->dev_kobj);
+	kobject_put(&dev->dev_kobj);
+	return ret;
+}
+
+void ibnbd_srv_destroy_dev_sysfs(struct ibnbd_srv_dev *dev)
+{
+	sysfs_remove_link(&dev->dev_kobj, "block_dev");
+	sysfs_remove_group(&dev->dev_kobj, &ibnbd_srv_default_dev_attr_group);
+	kobject_del(&dev->dev_clients_kobj);
+	kobject_put(&dev->dev_clients_kobj);
+	kobject_del(&dev->dev_kobj);
+	kobject_put(&dev->dev_kobj);
+}
+
+static ssize_t ibnbd_srv_dev_client_ro_show(struct kobject *kobj,
+					    struct kobj_attribute *attr,
+					    char *page)
+{
+	struct ibnbd_srv_sess_dev *sess_dev;
+
+	sess_dev = container_of(kobj, struct ibnbd_srv_sess_dev, kobj);
+
+	return scnprintf(page, PAGE_SIZE, "%s\n",
+			 (sess_dev->open_flags & FMODE_WRITE) ? "0" : "1");
+}
+
+static struct kobj_attribute ibnbd_srv_dev_client_ro_attr =
+					__ATTR(read_only, 0444,
+					       ibnbd_srv_dev_client_ro_show,
+					       NULL);
+
+static ssize_t ibnbd_srv_dev_client_mapping_path_show(
+						struct kobject *kobj,
+						struct kobj_attribute *attr,
+						char *page)
+{
+	struct ibnbd_srv_sess_dev *sess_dev;
+
+	sess_dev = container_of(kobj, struct ibnbd_srv_sess_dev, kobj);
+
+	return scnprintf(page, PAGE_SIZE, "%s\n", sess_dev->pathname);
+}
+
+static struct kobj_attribute ibnbd_srv_dev_client_mapping_path_attr =
+				__ATTR(mapping_path, 0444,
+				       ibnbd_srv_dev_client_mapping_path_show,
+				       NULL);
+
+static struct attribute *ibnbd_srv_default_dev_clients_attrs[] = {
+	&ibnbd_srv_dev_client_ro_attr.attr,
+	&ibnbd_srv_dev_client_mapping_path_attr.attr,
+	NULL,
+};
+
+static struct attribute_group ibnbd_srv_default_dev_client_attr_group = {
+	.attrs = ibnbd_srv_default_dev_clients_attrs,
+};
+
+void ibnbd_srv_destroy_dev_client_sysfs(struct ibnbd_srv_sess_dev *sess_dev)
+{
+	struct completion sysfs_compl;
+
+	sysfs_remove_group(&sess_dev->kobj,
+			   &ibnbd_srv_default_dev_client_attr_group);
+
+	init_completion(&sysfs_compl);
+	sess_dev->sysfs_release_compl = &sysfs_compl;
+	kobject_del(&sess_dev->kobj);
+	kobject_put(&sess_dev->kobj);
+	wait_for_completion(&sysfs_compl);
+}
+
+static void ibnbd_srv_sess_dev_release(struct kobject *kobj)
+{
+	struct ibnbd_srv_sess_dev *sess_dev;
+
+	sess_dev = container_of(kobj, struct ibnbd_srv_sess_dev, kobj);
+	if (sess_dev->sysfs_release_compl)
+		complete_all(sess_dev->sysfs_release_compl);
+}
+
+static struct kobj_type ibnbd_srv_sess_dev_ktype = {
+	.sysfs_ops	= &ibnbd_srv_sysfs_ops,
+	.release	= ibnbd_srv_sess_dev_release,
+};
+
+int ibnbd_srv_create_dev_client_sysfs(struct ibnbd_srv_sess_dev *sess_dev)
+{
+	int ret;
+
+	ret = kobject_init_and_add(&sess_dev->kobj, &ibnbd_srv_sess_dev_ktype,
+				   &sess_dev->dev->dev_clients_kobj, "%s",
+				   sess_dev->sess->str_addr);
+	if (ret)
+		return ret;
+
+	ret = sysfs_create_group(&sess_dev->kobj,
+				 &ibnbd_srv_default_dev_client_attr_group);
+	if (ret)
+		goto err;
+
+	return 0;
+
+err:
+	kobject_del(&sess_dev->kobj);
+	kobject_put(&sess_dev->kobj);
+	return ret;
+}
+
+int ibnbd_srv_create_sysfs_files(void)
+{
+	int err;
+
+	ibnbd_srv_kobj = kobject_create_and_add(ibnbd_sysfs_dir, kernel_kobj);
+	if (!ibnbd_srv_kobj)
+		return -ENOMEM;
+
+	ibnbd_srv_devices_kobj = kobject_create_and_add("devices",
+							ibnbd_srv_kobj);
+	if (!ibnbd_srv_devices_kobj) {
+		err = -ENOMEM;
+		goto err;
+	}
+
+	return 0;
+
+err:
+	kobject_put(ibnbd_srv_kobj);
+	return err;
+}
+
+void ibnbd_srv_destroy_sysfs_files(void)
+{
+	kobject_put(ibnbd_srv_devices_kobj);
+	kobject_put(ibnbd_srv_kobj);
+}
diff --git a/drivers/block/ibnbd_server/ibnbd_srv_sysfs.h b/drivers/block/ibnbd_server/ibnbd_srv_sysfs.h
new file mode 100644
index 0000000..1df232a
--- /dev/null
+++ b/drivers/block/ibnbd_server/ibnbd_srv_sysfs.h
@@ -0,0 +1,64 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler < mail@fholler.de>
+ *          Jack Wang <jinpu.wang@profitbricks.com>
+ *   	    Kleber Souza <kleber.souza@profitbricks.com>
+ * 	    Danil Kipnis <danil.kipnis@profitbricks.com>
+ *   	    Roman Pen <roman.penyaev@profitbricks.com>
+ *          Milind Dumbare <Milind.dumbare@gmail.com>
+ *
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions, and the following disclaimer,
+ *    without modification.
+ * 2. Redistributions in binary form must reproduce at minimum a disclaimer
+ *    substantially similar to the "NO WARRANTY" disclaimer below
+ *    ("Disclaimer") and any redistribution must be conditioned upon
+ *    including a substantially similar Disclaimer requirement for further
+ *    binary redistribution.
+ * 3. Neither the names of the above-listed copyright holders nor the names
+ *    of any contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * NO WARRANTY
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * HOLDERS OR CONTRIBUTORS BE LIABLE FOR SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
+ * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGES.
+ *
+ */
+
+#ifndef _IBNBD_SRV_SYSFS_H
+#define _IBNBD_SRV_SYSFS_H
+
+int ibnbd_srv_create_dev_sysfs(struct ibnbd_srv_dev *dev,
+			       struct block_device *bdev,
+			       const char *dir_name);
+
+void ibnbd_srv_destroy_dev_sysfs(struct ibnbd_srv_dev *dev);
+
+int ibnbd_srv_create_dev_client_sysfs(struct ibnbd_srv_sess_dev *sess_dev);
+
+void ibnbd_srv_destroy_dev_client_sysfs(struct ibnbd_srv_sess_dev *sess_dev);
+
+int ibnbd_srv_create_sysfs_files(void);
+
+void ibnbd_srv_destroy_sysfs_files(void);
+
+#endif
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH 25/28] ibnbd_srv: add sysfs interface
@ 2017-03-24 10:45   ` Jack Wang
  0 siblings, 0 replies; 87+ messages in thread
From: Jack Wang @ 2017-03-24 10:45 UTC (permalink / raw)
  To: linux-block-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA, axboe-tSWWG44O7X1aa/9Udqfwiw,
	hch-jcswGhMUV9g, mail-99BIx50xQYGELgA04lAiVw,
	Milind.dumbare-Re5JQEeQqe8AvxtiuMwx3w,
	yun.wang-EIkl63zCoXaH+58JC4qpiA, Jack Wang, Kleber Souza,
	Danil Kipnis, Roman Pen

From: Jack Wang <jinpu.wang-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>

Signed-off-by: Jack Wang <jinpu.wang-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
Signed-off-by: Kleber Souza <kleber.souza-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
Signed-off-by: Danil Kipnis <danil.kipnis-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
Signed-off-by: Roman Pen <roman.penyaev-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
---
 drivers/block/ibnbd_server/ibnbd_srv_sysfs.c | 317 +++++++++++++++++++++++++++
 drivers/block/ibnbd_server/ibnbd_srv_sysfs.h |  64 ++++++
 2 files changed, 381 insertions(+)
 create mode 100644 drivers/block/ibnbd_server/ibnbd_srv_sysfs.c
 create mode 100644 drivers/block/ibnbd_server/ibnbd_srv_sysfs.h

diff --git a/drivers/block/ibnbd_server/ibnbd_srv_sysfs.c b/drivers/block/ibnbd_server/ibnbd_srv_sysfs.c
new file mode 100644
index 0000000..8774abe
--- /dev/null
+++ b/drivers/block/ibnbd_server/ibnbd_srv_sysfs.c
@@ -0,0 +1,317 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler < mail-99BIx50xQYGELgA04lAiVw@public.gmane.org>
+ *          Jack Wang <jinpu.wang-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
+ *   	    Kleber Souza <kleber.souza-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
+ * 	    Danil Kipnis <danil.kipnis-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
+ *   	    Roman Pen <roman.penyaev-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
+ *          Milind Dumbare <Milind.dumbare-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
+ *
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions, and the following disclaimer,
+ *    without modification.
+ * 2. Redistributions in binary form must reproduce at minimum a disclaimer
+ *    substantially similar to the "NO WARRANTY" disclaimer below
+ *    ("Disclaimer") and any redistribution must be conditioned upon
+ *    including a substantially similar Disclaimer requirement for further
+ *    binary redistribution.
+ * 3. Neither the names of the above-listed copyright holders nor the names
+ *    of any contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * NO WARRANTY
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * HOLDERS OR CONTRIBUTORS BE LIABLE FOR SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
+ * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGES.
+ *
+ */
+
+#include <uapi/linux/limits.h>
+#include <linux/kobject.h>
+#include <linux/sysfs.h>
+#include <linux/stat.h>
+#include <linux/genhd.h>
+#include <linux/list.h>
+
+#include "../ibnbd_inc/ibnbd.h"
+#include "ibnbd_srv.h"
+#include "ibnbd_srv_log.h"
+#include "ibnbd_srv_sysfs.h"
+
+static struct kobject *ibnbd_srv_kobj;
+static struct kobject *ibnbd_srv_devices_kobj;
+#define IBNBD_SYSFS_DIR "ibnbd"
+static char ibnbd_sysfs_dir[64] = IBNBD_SYSFS_DIR;
+
+static ssize_t ibnbd_srv_revalidate_dev_show(struct kobject *kobj,
+					     struct kobj_attribute *attr,
+					     char *page)
+{
+	return scnprintf(page, PAGE_SIZE,
+			 "Usage: echo 1 > %s\n", attr->attr.name);
+}
+
+static ssize_t ibnbd_srv_revalidate_dev_store(struct kobject *kobj,
+					      struct kobj_attribute *attr,
+					      const char *buf, size_t count)
+{
+	int ret;
+	struct ibnbd_srv_dev *dev = container_of(kobj, struct ibnbd_srv_dev,
+						 dev_kobj);
+
+	if (!sysfs_streq(buf, "1")) {
+		ERR_NP("%s: invalid value: '%s'\n", attr->attr.name, buf);
+		return -EINVAL;
+	}
+	ret = ibnbd_srv_revalidate_dev(dev);
+	if (ret)
+		return ret;
+
+	return count;
+}
+
+static struct kobj_attribute ibnbd_srv_revalidate_dev_attr =
+					__ATTR(revalidate,
+					       0644,
+					       ibnbd_srv_revalidate_dev_show,
+					       ibnbd_srv_revalidate_dev_store);
+
+static struct attribute *ibnbd_srv_default_dev_attrs[] = {
+	&ibnbd_srv_revalidate_dev_attr.attr,
+	NULL,
+};
+
+static struct attribute_group ibnbd_srv_default_dev_attr_group = {
+	.attrs = ibnbd_srv_default_dev_attrs,
+};
+
+static ssize_t ibnbd_srv_attr_show(struct kobject *kobj, struct attribute *attr,
+				   char *page)
+{
+	struct kobj_attribute *kattr;
+	int ret = -EIO;
+
+	kattr = container_of(attr, struct kobj_attribute, attr);
+	if (kattr->show)
+		ret = kattr->show(kobj, kattr, page);
+	return ret;
+}
+
+static ssize_t ibnbd_srv_attr_store(struct kobject *kobj,
+				    struct attribute *attr,
+				    const char *page, size_t length)
+{
+	struct kobj_attribute *kattr;
+	int ret = -EIO;
+
+	kattr = container_of(attr, struct kobj_attribute, attr);
+	if (kattr->store)
+		ret = kattr->store(kobj, kattr, page, length);
+	return ret;
+}
+
+static const struct sysfs_ops ibnbd_srv_sysfs_ops = {
+	.show	= ibnbd_srv_attr_show,
+	.store	= ibnbd_srv_attr_store,
+};
+
+static struct kobj_type ibnbd_srv_dev_ktype = {
+	.sysfs_ops	= &ibnbd_srv_sysfs_ops,
+};
+
+static struct kobj_type ibnbd_srv_dev_clients_ktype = {
+	.sysfs_ops	= &ibnbd_srv_sysfs_ops,
+};
+
+int ibnbd_srv_create_dev_sysfs(struct ibnbd_srv_dev *dev,
+			       struct block_device *bdev,
+			       const char *dir_name)
+{
+	struct kobject *bdev_kobj;
+	int ret;
+
+	ret = kobject_init_and_add(&dev->dev_kobj, &ibnbd_srv_dev_ktype,
+				   ibnbd_srv_devices_kobj, dir_name);
+	if (ret)
+		return ret;
+
+	ret = kobject_init_and_add(&dev->dev_clients_kobj,
+				   &ibnbd_srv_dev_clients_ktype,
+				   &dev->dev_kobj, "clients");
+	if (ret)
+		goto err;
+
+	ret = sysfs_create_group(&dev->dev_kobj,
+				 &ibnbd_srv_default_dev_attr_group);
+	if (ret)
+		goto err2;
+
+	bdev_kobj = &disk_to_dev(bdev->bd_disk)->kobj;
+	ret = sysfs_create_link(&dev->dev_kobj, bdev_kobj, "block_dev");
+	if (ret)
+		goto err3;
+
+	return 0;
+
+err3:
+	sysfs_remove_group(&dev->dev_kobj,
+			   &ibnbd_srv_default_dev_attr_group);
+err2:
+	kobject_del(&dev->dev_clients_kobj);
+	kobject_put(&dev->dev_clients_kobj);
+err:
+	kobject_del(&dev->dev_kobj);
+	kobject_put(&dev->dev_kobj);
+	return ret;
+}
+
+void ibnbd_srv_destroy_dev_sysfs(struct ibnbd_srv_dev *dev)
+{
+	sysfs_remove_link(&dev->dev_kobj, "block_dev");
+	sysfs_remove_group(&dev->dev_kobj, &ibnbd_srv_default_dev_attr_group);
+	kobject_del(&dev->dev_clients_kobj);
+	kobject_put(&dev->dev_clients_kobj);
+	kobject_del(&dev->dev_kobj);
+	kobject_put(&dev->dev_kobj);
+}
+
+static ssize_t ibnbd_srv_dev_client_ro_show(struct kobject *kobj,
+					    struct kobj_attribute *attr,
+					    char *page)
+{
+	struct ibnbd_srv_sess_dev *sess_dev;
+
+	sess_dev = container_of(kobj, struct ibnbd_srv_sess_dev, kobj);
+
+	return scnprintf(page, PAGE_SIZE, "%s\n",
+			 (sess_dev->open_flags & FMODE_WRITE) ? "0" : "1");
+}
+
+static struct kobj_attribute ibnbd_srv_dev_client_ro_attr =
+					__ATTR(read_only, 0444,
+					       ibnbd_srv_dev_client_ro_show,
+					       NULL);
+
+static ssize_t ibnbd_srv_dev_client_mapping_path_show(
+						struct kobject *kobj,
+						struct kobj_attribute *attr,
+						char *page)
+{
+	struct ibnbd_srv_sess_dev *sess_dev;
+
+	sess_dev = container_of(kobj, struct ibnbd_srv_sess_dev, kobj);
+
+	return scnprintf(page, PAGE_SIZE, "%s\n", sess_dev->pathname);
+}
+
+static struct kobj_attribute ibnbd_srv_dev_client_mapping_path_attr =
+				__ATTR(mapping_path, 0444,
+				       ibnbd_srv_dev_client_mapping_path_show,
+				       NULL);
+
+static struct attribute *ibnbd_srv_default_dev_clients_attrs[] = {
+	&ibnbd_srv_dev_client_ro_attr.attr,
+	&ibnbd_srv_dev_client_mapping_path_attr.attr,
+	NULL,
+};
+
+static struct attribute_group ibnbd_srv_default_dev_client_attr_group = {
+	.attrs = ibnbd_srv_default_dev_clients_attrs,
+};
+
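+/*
+ * Removes the per-client sysfs entry and waits, via the kobject release
+ * callback, until the kobject is really gone before returning.
+ */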
+void ibnbd_srv_destroy_dev_client_sysfs(struct ibnbd_srv_sess_dev *sess_dev)
+{
+	struct completion sysfs_compl;
+
+	sysfs_remove_group(&sess_dev->kobj,
+			   &ibnbd_srv_default_dev_client_attr_group);
+
+	init_completion(&sysfs_compl);
+	sess_dev->sysfs_release_compl = &sysfs_compl;
+	kobject_del(&sess_dev->kobj);
+	kobject_put(&sess_dev->kobj);
+	wait_for_completion(&sysfs_compl);
+}
+
+static void ibnbd_srv_sess_dev_release(struct kobject *kobj)
+{
+	struct ibnbd_srv_sess_dev *sess_dev;
+
+	sess_dev = container_of(kobj, struct ibnbd_srv_sess_dev, kobj);
+	if (sess_dev->sysfs_release_compl)
+		complete_all(sess_dev->sysfs_release_compl);
+}
+
+static struct kobj_type ibnbd_srv_sess_dev_ktype = {
+	.sysfs_ops	= &ibnbd_srv_sysfs_ops,
+	.release	= ibnbd_srv_sess_dev_release,
+};
+
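+/*
+ * Creates /sys/kernel/ibnbd/devices/<dev>/clients/<session-address>/ and
+ * populates it with the read_only and mapping_path attributes.
+ */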
+int ibnbd_srv_create_dev_client_sysfs(struct ibnbd_srv_sess_dev *sess_dev)
+{
+	int ret;
+
+	ret = kobject_init_and_add(&sess_dev->kobj, &ibnbd_srv_sess_dev_ktype,
+				   &sess_dev->dev->dev_clients_kobj, "%s",
+				   sess_dev->sess->str_addr);
+	if (ret)
+		return ret;
+
+	ret = sysfs_create_group(&sess_dev->kobj,
+				 &ibnbd_srv_default_dev_client_attr_group);
+	if (ret)
+		goto err;
+
+	return 0;
+
+err:
+	kobject_del(&sess_dev->kobj);
+	kobject_put(&sess_dev->kobj);
+	return ret;
+}
+
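+/* Creates the top-level /sys/kernel/ibnbd/ and its devices/ subdirectory */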
+int ibnbd_srv_create_sysfs_files(void)
+{
+	int err;
+
+	ibnbd_srv_kobj = kobject_create_and_add(ibnbd_sysfs_dir, kernel_kobj);
+	if (!ibnbd_srv_kobj)
+		return -ENOMEM;
+
+	ibnbd_srv_devices_kobj = kobject_create_and_add("devices",
+							ibnbd_srv_kobj);
+	if (!ibnbd_srv_devices_kobj) {
+		err = -ENOMEM;
+		goto err;
+	}
+
+	return 0;
+
+err:
+	kobject_put(ibnbd_srv_kobj);
+	return err;
+}
+
+void ibnbd_srv_destroy_sysfs_files(void)
+{
+	kobject_put(ibnbd_srv_devices_kobj);
+	kobject_put(ibnbd_srv_kobj);
+}
diff --git a/drivers/block/ibnbd_server/ibnbd_srv_sysfs.h b/drivers/block/ibnbd_server/ibnbd_srv_sysfs.h
new file mode 100644
index 0000000..1df232a
--- /dev/null
+++ b/drivers/block/ibnbd_server/ibnbd_srv_sysfs.h
@@ -0,0 +1,64 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler < mail-99BIx50xQYGELgA04lAiVw@public.gmane.org>
+ *          Jack Wang <jinpu.wang-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
+ *   	    Kleber Souza <kleber.souza-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
+ * 	    Danil Kipnis <danil.kipnis-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
+ *   	    Roman Pen <roman.penyaev-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
+ *          Milind Dumbare <Milind.dumbare-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
+ *
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions, and the following disclaimer,
+ *    without modification.
+ * 2. Redistributions in binary form must reproduce at minimum a disclaimer
+ *    substantially similar to the "NO WARRANTY" disclaimer below
+ *    ("Disclaimer") and any redistribution must be conditioned upon
+ *    including a substantially similar Disclaimer requirement for further
+ *    binary redistribution.
+ * 3. Neither the names of the above-listed copyright holders nor the names
+ *    of any contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * NO WARRANTY
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * HOLDERS OR CONTRIBUTORS BE LIABLE FOR SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
+ * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGES.
+ *
+ */
+
+#ifndef _IBNBD_SRV_SYSFS_H
+#define _IBNBD_SRV_SYSFS_H
+
+int ibnbd_srv_create_dev_sysfs(struct ibnbd_srv_dev *dev,
+			       struct block_device *bdev,
+			       const char *dir_name);
+
+void ibnbd_srv_destroy_dev_sysfs(struct ibnbd_srv_dev *dev);
+
+int ibnbd_srv_create_dev_client_sysfs(struct ibnbd_srv_sess_dev *sess_dev);
+
+void ibnbd_srv_destroy_dev_client_sysfs(struct ibnbd_srv_sess_dev *sess_dev);
+
+int ibnbd_srv_create_sysfs_files(void);
+
+void ibnbd_srv_destroy_sysfs_files(void);
+
+#endif
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH 26/28] ibnbd_srv: add Makefile and Kconfig
  2017-03-24 10:45 ` Jack Wang
                   ` (25 preceding siblings ...)
  (?)
@ 2017-03-24 10:45 ` Jack Wang
  2017-03-25  9:27     ` kbuild test robot
  -1 siblings, 1 reply; 87+ messages in thread
From: Jack Wang @ 2017-03-24 10:45 UTC (permalink / raw)
  To: linux-block, linux-rdma
  Cc: dledford, axboe, hch, mail, Milind.dumbare, yun.wang, Jack Wang

From: Jack Wang <jinpu.wang@profitbricks.com>

Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
---
 drivers/block/Kconfig               |  1 +
 drivers/block/Makefile              |  1 +
 drivers/block/ibnbd_server/Kconfig  | 16 ++++++++++++++++
 drivers/block/ibnbd_server/Makefile |  3 +++
 4 files changed, 21 insertions(+)
 create mode 100644 drivers/block/ibnbd_server/Kconfig
 create mode 100644 drivers/block/ibnbd_server/Makefile

diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
index c309e57..e4823c4 100644
--- a/drivers/block/Kconfig
+++ b/drivers/block/Kconfig
@@ -276,6 +276,7 @@ config BLK_DEV_CRYPTOLOOP
 source "drivers/block/drbd/Kconfig"
 
 source "drivers/block/ibnbd_client/Kconfig"
+source "drivers/block/ibnbd_server/Kconfig"
 
 config BLK_DEV_NBD
 	tristate "Network block device support"
diff --git a/drivers/block/Makefile b/drivers/block/Makefile
index 7da1813..cd20888 100644
--- a/drivers/block/Makefile
+++ b/drivers/block/Makefile
@@ -35,6 +35,7 @@ obj-$(CONFIG_BLK_DEV_HD)	+= hd.o
 obj-$(CONFIG_XEN_BLKDEV_FRONTEND)	+= xen-blkfront.o
 obj-$(CONFIG_XEN_BLKDEV_BACKEND)	+= xen-blkback/
 obj-$(CONFIG_BLK_DEV_IBNBD_CLT)	+= ibnbd_client/
+obj-$(CONFIG_BLK_DEV_IBNBD_SRV)	+= ibnbd_server/
 obj-$(CONFIG_BLK_DEV_DRBD)     += drbd/
 obj-$(CONFIG_BLK_DEV_RBD)     += rbd.o
 obj-$(CONFIG_BLK_DEV_PCIESSD_MTIP32XX)	+= mtip32xx/
diff --git a/drivers/block/ibnbd_server/Kconfig b/drivers/block/ibnbd_server/Kconfig
new file mode 100644
index 0000000..943e1b2
--- /dev/null
+++ b/drivers/block/ibnbd_server/Kconfig
@@ -0,0 +1,16 @@
+config BLK_DEV_IBNBD_SRV
+	tristate "Network block device over Infiniband server support"
+	depends on INFINIBAND_IBTRS_SRV
+	---help---
+	  Saying Y here will allow your computer to act as an IBNBD server:
+	  block devices of this machine can be accessed by remote IBNBD
+	  clients over an InfiniBand network. The server side is completely
+	  passive, block devices do not need to be explicitly exported.
+
+	  To compile this driver as a module, choose M here: the
+	  module will be called ibnbd_server.
+
+	  If unsure, say N.
diff --git a/drivers/block/ibnbd_server/Makefile b/drivers/block/ibnbd_server/Makefile
new file mode 100644
index 0000000..e66860f
--- /dev/null
+++ b/drivers/block/ibnbd_server/Makefile
@@ -0,0 +1,3 @@
+obj-$(CONFIG_BLK_DEV_IBNBD_SRV) += ibnbd_server.o
+ibnbd_server-objs 	:= ibnbd_srv.o ibnbd_srv_sysfs.o ibnbd_dev.o \
+	../ibnbd_lib/ibnbd.o ../ibnbd_lib/ibnbd-proto.o
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH 27/28] ibnbd: add doc for how to use ibnbd and sysfs interface
  2017-03-24 10:45 ` Jack Wang
                   ` (26 preceding siblings ...)
  (?)
@ 2017-03-24 10:45 ` Jack Wang
  2017-03-25  7:44     ` kbuild test robot
  -1 siblings, 1 reply; 87+ messages in thread
From: Jack Wang @ 2017-03-24 10:45 UTC (permalink / raw)
  To: linux-block, linux-rdma
  Cc: dledford, axboe, hch, mail, Milind.dumbare, yun.wang, Jack Wang

From: Jack Wang <jinpu.wang@profitbricks.com>

Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
---
 Documentation/IBNBD.txt | 284 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 284 insertions(+)
 create mode 100644 Documentation/IBNBD.txt

diff --git a/Documentation/IBNBD.txt b/Documentation/IBNBD.txt
new file mode 100644
index 0000000..f7f490a
--- /dev/null
+++ b/Documentation/IBNBD.txt
@@ -0,0 +1,284 @@
+Infiniband Network Block Device (IBNBD)
+=======================================
+
+Introduction
+------------
+
+IBNBD (InfiniBand Network Block Device) is a pair of kernel modules (client and
+server) that allows clients to access remote storage devices on a server via an
+InfiniBand network.
+Mapped storage devices appear transparent to the client and act like any other
+regular local block device.
+
+The data transport between client and server over the InfiniBand network
+is performed by the IBTRS (InfiniBand Transport) kernel modules.
+
+The administration of these modules is done via sysfs. A command-line tool
+(ibnbd-cli) is also available for a more user-friendly experience.
+
+Requirements
+------------
+  - IBTRS kernel modules (available as git-submodule)
+
+Quick Start
+-----------
+Server:
+  # insmod ibtrs/ibtrs_server/ibtrs_server.ko
+  # insmod ibnbd_server/ibnbd_server.ko
+
+Client:
+  # insmod ibtrs/ibtrs_client/ibtrs_client.ko
+  # insmod ibnbd_client/ibnbd_client.ko
+  # echo "server=<SERVER-ADDRESS> device_path=<DEV-PATH-ON-SERVER>" > /sys/kernel/ibnbd/map_device
+
+The block device <DEV-PATH-ON-SERVER> will become available on the client as
+/dev/ibnbd<NR>. It can be used like a local block device.
+
+Client Userspace Interface
+--------------------------
+This chapter describes only the most important files of the userspace
+interface. Full documentation can be found in the Architecture Documentation.
+
+All sysfs files that are not read-only return usage information when they
+are read.
+
+Example:
+  $ cat /sys/kernel/ibnbd/map_device
+
+
+/sys/kernel/ibnbd/ entries
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+map_device (RW)
+^^^^^^^^^^^^^^^
+To map a volume on the client, information about the device has to be written
+to:
+  /sys/kernel/ibnbd/map_device
+
+The format of the input is:
+  "server=<server-address> device_path=<relative-path-to-device-on-server>
+   [access_mode=<ro|rw|migration] [input_mode=(mq|rq)]
+   [io_mode=fileio|blockio]"
+
+Server Parameter
+++++++++++++++++
+A server address has to be in one of the following formats:
+ - ip:<IPv6>
+ - ip:<IPv4>
+ - gid:<GID>
+
+device_path Parameter
++++++++++++++++++++++
+A device can be mapped by specifying its path relative to the configured
+dev_search_path on the server side.
+The ibnbd_server prepends the configured dev_search_path to the device_path
+passed in the map_device operation and tries to open a block device with the
+path dev_search_path/device_path.
+On success, a /dev/ibnbd<NR> device file, a /sys/block/ibnbd<NR>/ directory
+and an entry in /sys/kernel/ibnbd/devices will be created.
+
+access_mode Parameter
++++++++++++++++++++++
+The access_mode parameter specifies if the device is to be mapped as read-only
+or read-write. The "migration" access mode has the same effect as "rw" and
+should be used by the client that a VM is being migrated to in a VM migration
+scenario.
+If not specified, 'rw' is used.
+
+input_mode Parameter
+++++++++++++++++++++
+The input_mode parameter specifies the internal I/O processing mode of the
+network block device on the client.
+If not specified, 'mq' mode is used.
+
+io_mode Parameter
++++++++++++++++++
+The io_mode parameter specifies if the device on the server will be opened as a
+block device (blockio) or as a file (fileio).
+When the device is opened as a file, the VFS page cache is used for read I/O
+operations; write I/O operations bypass the page cache and go directly to disk
+(except metadata updates, like file access time).
+When the device is opened as a block device, the block device is accessed
+directly and no VFS page cache is used.
+If not specified, 'fileio' mode is used.
+
+Exit Codes
+++++++++++
+If the device is already mapped it will fail with EEXIST. If the input has an
+invalid format it will return EINVAL. If the device path cannot be found on the
+server, it will fail with ENOENT.
+
+Examples
+++++++++
+  # echo "server=ip:10.50.100.64 device_path=/dev/ram1" input_mode=mq > /sys/kernel/ibnbd/map_device
+  # echo "server=ip:10.50.100.64 device_path=3F2504E0-4F89-41D3-9A0C-0305E82C3301" > /sys/kernel/ibnbd/map_device
+
+Finding device file after mapping
++++++++++++++++++++++++++++++++++
+After mapping, the device file can be found by:
+1.) The symlink /sys/kernel/ibnbd/devices/<device_id> points to
+    /sys/block/<dev-name>.
+    The last part of the symlink destination is the same as the device name.
+    By extracting the last part of the path, the path to the device
+    /dev/<dev-name> can be built.
+2.) /dev/block/$(cat /sys/kernel/ibnbd/devices/<device_id>/dev)
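+
+A minimal example of the first method (a sketch: it assumes a device mapped
+with device_path /dev/ram1, i.e. <device_id> "!dev!ram1", that became ibnbd0;
+the actual names depend on the mapping):
+  # readlink /sys/kernel/ibnbd/devices/\!dev\!ram1
+  /sys/block/ibnbd0/
+  # basename $(readlink /sys/kernel/ibnbd/devices/\!dev\!ram1)
+  ibnbd0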
+
+How to find the <device_id> of the device is described in the next chapter
+(devices/ directory).
+
+devices/ (DIRECTORY)
+^^^^^^^^^^^^^^^^^^^^
+For each device mapped on the client a new symbolic link is created as
+/sys/kernel/ibnbd/devices/<device_id>, which points to the block device created
+by ibnbd (/sys/block/ibnbd<NR>/). The <device_id> of each device is created as
+follows:
+
+- If the 'device_path' provided during mapping contains slashes ("/"), they are
+  replaced by exclamation marks ("!") and the result is used as the <device_id>.
+  Otherwise, the <device_id> will be the same as the 'device_path' provided.
+
+
+Examples
+++++++++
+    /sys/kernel/ibnbd/devices/3F2504E0-4F89-41D3-9A0C-0305E82C3301 -> /sys/block/ibnbd1/
+    /sys/kernel/ibnbd/devices/!dev!ram1 -> /sys/block/ibnbd0/
+
+
+/sys/block/ibnbd<NR>/ibnbd/ entries
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+unmap_device (RW)
+^^^^^^^^^^^^^^^^^
+To unmap a volume, 'normal' or 'force' has to be written to:
+  /sys/block/ibnbd<NR>/ibnbd/unmap_device
+
+When 'normal' is used, the operation will fail with EBUSY if any process is
+using the device.
+When 'force' is used, the device is unmapped even while it is in use.
+All I/Os that are in progress will fail. It can happen that the device
+file (/dev/ibnbd<NR>) still exists after the unmapping: the kernel could
+not remove the file because it was still in use, but it is marked as unused.
+The device file will be freed once no process refers to it.
+
+In a subsequent IBNBD mapping the remote device can be reused, but
+ibnbd may generate a different device file for it.
+
+Examples
+++++++++
+   # echo "normal" > /sys/block/ibnbd0/ibnbd/unmap_device
+
+state (RO)
+^^^^^^^^^^
+The file contains the current state of the block device. The state file returns
+'open' when the device is successfully mapped from the server and accepting I/O
+requests. When the connection to the server is lost due to an error
+(e.g. link failure), the state file returns 'closed' and all I/O requests
+will fail with -EIO.
+
+session (RO)
+^^^^^^^^^^^^
+IBNBD uses an IBTRS session to transport the data between client and server.
+The file 'session' contains the address of the server that was used to
+establish the IBTRS session.
+It is the same address that was passed as the server parameter to the
+map_device file.
+
+mapping_path (RO)
+^^^^^^^^^^^^^^^^^
+Contains the path that was passed as device_path to the map_device operation.
+
+/sys/kernel/ibtrs/sessions/ entries
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+The connections to the servers are created and destroyed on demand. When the
+first device is mapped from a server, an IBTRS connection will be created with
+this server and the following directory will be created:
+
+/sys/kernel/ibtrs/sessions/<server-address>/
+
+If the connection establishment fails, detailed error information can be found
+in the kernel log (dmesg).
+
+When the last device is unmapped from a server, the connection will be closed
+and the directory will be deleted.
+
+
+Server Userspace Interface
+--------------------------
+
+/sys/kernel/ibnbd/ entries
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+/sys/kernel/ibnbd/devices/ entries
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+When a client maps a device, a directory entry with the name of the block
+device is created under /sys/kernel/ibnbd/devices/. If the device path provided
+by the client is a symbolic link to a block device, the target block device name
+is used instead of the mapping path name.
+
+block_dev
+^^^^^^^^^
+block_dev is a symlink to the sysfs entry of the exported block device.
+
+Examples
+++++++++
+  block_dev -> ../../../../devices/virtual/block/nullb1
+
+revalidate
+^^^^^^^^^^
+When the size of an exported block device changes on the server, the clients
+have to be notified so they can resize the mapped device.
+
+Notification of the clients about a device change is triggered by writing '1'
+to the revalidate file.
+
+Examples
+++++++++
+ # echo 1 > /sys/kernel/ibnbd/devices/nullb1/revalidate
+
+/sys/kernel/ibnbd/devices/<device_name>/clients entries
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+When the device is mapped from a client, the following directory will be
+created:
+
+/sys/kernel/ibnbd/devices/<device_name>/clients/<client-address> entries
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+When the device is unmapped, the directory will be removed.
+
+read_only
+^^^^^^^^^
+Contains '1' if device is mapped read-only, otherwise '0'.
+
+mapping_path
+^^^^^^^^^^^^
+Contains the relative device path provided by the user during mapping.
+
+
+IBNBD-Server Module Parameters
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+dev_search_path
+^^^^^^^^^^^^^^^
+When a device is mapped from the client, the server generates the path to the
+block device on the server side by concatenating dev_search_path and the
+device_path that was specified in the map_device operation.
+
+The format of the input is
+  path ::= Absolute linux path name,
+           Max. length depends on PATH_MAX define (usually 4095 chars)
+
+The default dev_search_path is: "/".
+
+Example
++++++++
+
+The configured dev_search_path on the server is /dev/storage/ and the client
+maps a device by:
+  # echo "server=ip:10.50.100.64 device_path=3F2504E0-4F89-41D3-9A0C-0305E82C3301" > /sys/kernel/ibnbd/map_device
+
+The server tries to open a block device with the path:
+  /dev/storage/3F2504E0-4F89-41D3-9A0C-0305E82C3301
+
+
+Contact
+-------
+Mailing list: ibnbd@profitbricks.com
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH 28/28] MAINTAINERS: Add maintainer for IBNBD/IBTRS
  2017-03-24 10:45 ` Jack Wang
                   ` (27 preceding siblings ...)
  (?)
@ 2017-03-24 10:45 ` Jack Wang
  -1 siblings, 0 replies; 87+ messages in thread
From: Jack Wang @ 2017-03-24 10:45 UTC (permalink / raw)
  To: linux-block, linux-rdma
  Cc: dledford, axboe, hch, mail, Milind.dumbare, yun.wang, Jack Wang

From: Jack Wang <jinpu.wang@profitbricks.com>

Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
---
 MAINTAINERS | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index c776906..12a528a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6263,6 +6263,20 @@ IBM ServeRAID RAID DRIVER
 S:	Orphan
 F:	drivers/scsi/ips.*
 
+IBTRS TRANSPORT DRIVERS
+M:	Jack Wang <jinpu.wang@profitbricks.com>
+L:	linux-rdma@vger.kernel.org
+S:	Maintained
+F:	include/linux/ibtrs*.h
+F:	drivers/infiniband/ulp/ibtrs*
+
+IBNBD BLOCK DRIVERS
+M:	Jack Wang <jinpu.wang@profitbricks.com>
+L:	linux-rdma@vger.kernel.org
+S:	Maintained
+F:	Documentation/IBNBD.txt
+F:	drivers/block/ibnbd*
+
 ICH LPC AND GPIO DRIVER
 M:	Peter Tyser <ptyser@xes-inc.com>
 S:	Maintained
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* Re: [RFC PATCH 00/28] INFINIBAND NETWORK BLOCK DEVICE (IBNBD)
@ 2017-03-24 12:15   ` Johannes Thumshirn
  0 siblings, 0 replies; 87+ messages in thread
From: Johannes Thumshirn @ 2017-03-24 12:15 UTC (permalink / raw)
  To: Jack Wang
  Cc: linux-block, linux-rdma, dledford, axboe, hch, mail,
	Milind.dumbare, yun.wang

On Fri, Mar 24, 2017 at 11:45:15AM +0100, Jack Wang wrote:
> From: Jack Wang <jinpu.wang@profitbricks.com>
> 
> This series introduces IBNBD/IBTRS kernel modules.
> 
> IBNBD (InfiniBand network block device) allows for an RDMA transfer of block IO
> over InfiniBand network. The driver presents itself as a block device on client
> side and transmits the block requests in a zero-copy fashion to the server-side
> via InfiniBand. The server part of the driver converts the incoming buffers back
> into BIOs and hands them down to the underlying block device. As soon as IO
> responses come back from the drive, they are being transmitted back to the
> client.
> 
> We design and implement this solution based on our need for Cloud Computing,
> the key features are:
> - High throughput and low latency due to:
> 1) Only two rdma messages per IO
> 2) Simplified client side server memory management
> 3) Eliminated SCSI sublayer
> - Simple configuration and handling
> 1) Server side is completely passive: volumes do not need to be
> explicitly exported
> 2) Only IB port GID and device path needed on client side to map
> a block device
> 3) A device can be remapped automatically i.e. after storage
> reboot
> - Pinning of IO-related processing to the CPU of the producer
> 
> For usage please refer to Documentation/IBNBD.txt in later patch.
> My colleague Danil Kpnis presents IBNBD in Vault-2017 about our design/feature/
> tradeoff/performance:
> 
> http://events.linuxfoundation.org/sites/events/files/slides/IBNBD-Vault-2017.pdf
> 

Hi Jack,

Sorry to ask (I haven't attended the Vault presentation) but why can't you use
NVMe over Fabrics in your environment? From what I see in your presentation
and cover letter, it provides all you need and is in fact a standard that Linux
and Windows have already implemented.

Thanks,
	Johannes
-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 01/28] ibtrs: add header shared between ibtrs_client and ibtrs_server
  2017-03-24 10:45   ` Jack Wang
@ 2017-03-24 12:35     ` Johannes Thumshirn
  -1 siblings, 0 replies; 87+ messages in thread
From: Johannes Thumshirn @ 2017-03-24 12:35 UTC (permalink / raw)
  To: Jack Wang
  Cc: linux-block, linux-rdma, dledford, axboe, hch, mail,
	Milind.dumbare, yun.wang, Kleber Souza, Danil Kipnis, Roman Pen

On Fri, Mar 24, 2017 at 11:45:16AM +0100, Jack Wang wrote:
> From: Jack Wang <jinpu.wang@profitbricks.com>
> 
> Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
> Signed-off-by: Kleber Souza <kleber.souza@profitbricks.com>
> Signed-off-by: Danil Kipnis <danil.kipnis@profitbricks.com>
> Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
> ---

[...]

> +
> +#define XX(a) case (a): return #a

please no macros with return in them, and XX isn't a very descriptive name
either.

[...]

> +static inline const char *ib_wc_opcode_str(enum ib_wc_opcode opcode)
> +{
> +	switch (opcode) {
> +	XX(IB_WC_SEND);
> +	XX(IB_WC_RDMA_WRITE);
> +	XX(IB_WC_RDMA_READ);
> +	XX(IB_WC_COMP_SWAP);
> +	XX(IB_WC_FETCH_ADD);
> +	/* recv-side); inbound completion */
> +	XX(IB_WC_RECV);
> +	XX(IB_WC_RECV_RDMA_WITH_IMM);
> +	default: return "IB_WC_OPCODE_UNKNOWN";
> +	}
> +}

How about:

struct {
	char *name;
	enum ib_wc_opcode opcode;
} ib_wc_opcode_table[] = {
	{ stringyfy(IB_WC_SEND), IB_WC_SEND },
	{ stringyfy(IB_WC_RDMA_WRITE), IB_WC_RDMA_WRITE },
	{ stringyfy(IB_WC_RDMA_READ), IB_WC_RDMA_READ },
	{ stringyfy(IB_WC_COMP_SWAP), IB_WC_COMP_SWAP },
	{ stringyfy(IB_WC_FETCH_ADD), IB_WC_FETCH_ADD },
	{ stringyfy(IB_WC_RECV), IB_WC_RECV },
	{ stringyfy(IB_WC_RECV_RDMA_WITH_IMM), IB_WC_RECV_RDMA_WITH_IMM },
	{ NULL, 0 },
};

static inline const char *ib_wc_opcode_str(enum ib_wc_opcode opcode)
{
	int i;

	for (i = 0; i < ARRAY_SIZE(ib_wc_opcode_table); i++)
		if (ib_wc_opcode_table[i].opcode == opcode)
			return ib_wc_opcode_table[i].name;

	return "IB_WC_OPCODE_UNKNOWN";
}


[...]

> +/**
> + * struct ibtrs_msg_hdr - Common header of all IBTRS messages
> + * @type:	Message type, valid values see: enum ibtrs_msg_types
> + * @tsize:	Total size of transferred data
> + *
> + * Don't move the first 8 padding bytes! It's a workaround for a kernel bug.
> + * See IBNBD-610 for details

What about resolving the kernel bug instead of making workarounds?

> + *
> + * DO NOT CHANGE!
> + */
> +struct ibtrs_msg_hdr {
> +	u8			__padding1;
> +	u8			type;
> +	u16			__padding2;
> +	u32			tsize;
> +};

[...]

-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC PATCH 00/28] INFINIBAND NETWORK BLOCK DEVICE (IBNBD)
@ 2017-03-24 12:46     ` Jinpu Wang
  0 siblings, 0 replies; 87+ messages in thread
From: Jinpu Wang @ 2017-03-24 12:46 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: linux-block-u79uwXL29TY76Z2rM5mHXA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford, Jens Axboe, hch,
	Fabian Holler, Milind Dumbare, Michael Wang

On Fri, Mar 24, 2017 at 1:15 PM, Johannes Thumshirn <jthumshirn-l3A5Bk7waGM@public.gmane.org> wrote:
> On Fri, Mar 24, 2017 at 11:45:15AM +0100, Jack Wang wrote:
>> From: Jack Wang <jinpu.wang-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
>>
>> This series introduces IBNBD/IBTRS kernel modules.
>>
>> IBNBD (InfiniBand network block device) allows for an RDMA transfer of block IO
>> over InfiniBand network. The driver presents itself as a block device on client
>> side and transmits the block requests in a zero-copy fashion to the server-side
>> via InfiniBand. The server part of the driver converts the incoming buffers back
>> into BIOs and hands them down to the underlying block device. As soon as IO
>> responses come back from the drive, they are being transmitted back to the
>> client.
>>
>> We design and implement this solution based on our need for Cloud Computing,
>> the key features are:
>> - High throughput and low latency due to:
>> 1) Only two rdma messages per IO
>> 2) Simplified client side server memory management
>> 3) Eliminated SCSI sublayer
>> - Simple configuration and handling
>> 1) Server side is completely passive: volumes do not need to be
>> explicitly exported
>> 2) Only IB port GID and device path needed on client side to map
>> a block device
>> 3) A device can be remapped automatically i.e. after storage
>> reboot
>> - Pinning of IO-related processing to the CPU of the producer
>>
>> For usage please refer to Documentation/IBNBD.txt in later patch.
>> My colleague Danil Kpnis presents IBNBD in Vault-2017 about our design/feature/
>> tradeoff/performance:
>>
>> http://events.linuxfoundation.org/sites/events/files/slides/IBNBD-Vault-2017.pdf
>>
>
> Hi Jack,
>
> Sorry to ask (I haven't attented the Vault presentation) but why can't you use
> NVMe over Fabrics in your environment? From what I see in your presentation
> and cover letter, it provides all you need and is in fact a standard Linux and
> Windows already have implemented.
>
> Thanks,
>         Johannes
> --
> Johannes Thumshirn                                          Storage
> jthumshirn-l3A5Bk7waGM@public.gmane.org                                +49 911 74053 689
> SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: Felix Imendörffer, Jane Smithard, Graham Norton
> HRB 21284 (AG Nürnberg)
> Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

Hi Johannes,

Our IBNBD project was started 3 years ago based on our need for Cloud
Computing; NVMeOF is a bit younger.
- IBNBD is one of our components, part of our software defined storage solution.
- As I listed in the features, IBNBD has its own features.

We're planning to look more into NVMeOF, but it's not a replacement for IBNBD.

Thanks,
-- 
Jack Wang
Linux Kernel Developer

ProfitBricks GmbH
Greifswalder Str. 207
D - 10405 Berlin

Tel:       +49 30 577 008  042
Fax:      +49 30 577 008 299
Email:    jinpu.wang-EIkl63zCoXaH+58JC4qpiA@public.gmane.org
URL:      https://www.profitbricks.de

Sitz der Gesellschaft: Berlin
Registergericht: Amtsgericht Charlottenburg, HRB 125506 B
Geschäftsführer: Achim Weiss

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC PATCH 00/28] INFINIBAND NETWORK BLOCK DEVICE (IBNBD)
  2017-03-24 12:46     ` Jinpu Wang
@ 2017-03-24 12:48       ` Johannes Thumshirn
  -1 siblings, 0 replies; 87+ messages in thread
From: Johannes Thumshirn @ 2017-03-24 12:48 UTC (permalink / raw)
  To: Jinpu Wang
  Cc: linux-block, linux-rdma, Doug Ledford, Jens Axboe, hch,
	Fabian Holler, Milind Dumbare, Michael Wang

On Fri, Mar 24, 2017 at 01:46:02PM +0100, Jinpu Wang wrote:
> Hi Johnnes,
> 
> Our IBNBD project was started 3 years ago based on our need for Cloud
> Computing, NVMeOF is a bit younger.
> - IBNBD is one of our components, part of our software defined storage solution.
> - As I listed in features, IBNBD has it's own features
> 
> We're planning to look more into NVMeOF, but it's not a replacement for IBNBD.

Ok thanks for the clarification.

Byte,
	Johannes

-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 01/28] ibtrs: add header shared between ibtrs_client and ibtrs_server
@ 2017-03-24 12:54       ` Jinpu Wang
  0 siblings, 0 replies; 87+ messages in thread
From: Jinpu Wang @ 2017-03-24 12:54 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: linux-block-u79uwXL29TY76Z2rM5mHXA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford, Jens Axboe, hch,
	Fabian Holler, Milind Dumbare, Michael Wang, Kleber Souza,
	Danil Kipnis, Roman Pen

>> +
>> +#define XX(a) case (a): return #a
>
> please no macros with retun in them and XX isn't quite too descriptive as
> well.
>
> [...]
>
>> +static inline const char *ib_wc_opcode_str(enum ib_wc_opcode opcode)
>> +{
>> +     switch (opcode) {
>> +     XX(IB_WC_SEND);
>> +     XX(IB_WC_RDMA_WRITE);
>> +     XX(IB_WC_RDMA_READ);
>> +     XX(IB_WC_COMP_SWAP);
>> +     XX(IB_WC_FETCH_ADD);
>> +     /* recv-side); inbound completion */
>> +     XX(IB_WC_RECV);
>> +     XX(IB_WC_RECV_RDMA_WITH_IMM);
>> +     default: return "IB_WC_OPCODE_UNKNOWN";
>> +     }
>> +}
>
> How about:
>
> struct {
>         char *name;
>         enum ib_wc_opcode opcode;
> } ib_wc_opcode_table[] = {
>         { stringyfy(IB_WC_SEND), IB_WC_SEND },
>         { stringyfy(IB_WC_RDMA_WRITE), IB_WC_RDMA_WRITE },
>         { stringyfy(IB_WC_RDMA_READ ), IB_WC_RDMA_READ }
>         { stringyfy(IB_WC_COMP_SWAP), IB_WC_COMP_SWAP },
>         { stringyfy(IB_WC_FETCH_ADD), IB_WC_FETCH_ADD },
>         { stringyfy(IB_WC_RECV), IB_WC_RECV },
>         { stringyfy(IB_WC_RECV_RDMA_WITH_IMM), IB_WC_RECV_RDMA_WITH_IMM },
>         { NULL, 0 },
> };
>
> static inline const char *ib_wc_opcode_str(enum ib_wc_opcode opcode)
> {
>         int i;
>
>         for (i = 0; i < ARRAY_SIZE(ib_wc_opcode_table); i++)
>                 if (ib_wc_opcode_table[i].opcode == opcode)
>                         return ib_wc_opcode_table[i].name;
>
>         return "IB_WC_OPCODE_UNKNOWN";
> }
>
Looks nice, might be better to put it into ib_verbs.h?

>
> [...]
>
>> +/**
>> + * struct ibtrs_msg_hdr - Common header of all IBTRS messages
>> + * @type:    Message type, valid values see: enum ibtrs_msg_types
>> + * @tsize:   Total size of transferred data
>> + *
>> + * Don't move the first 8 padding bytes! It's a workaround for a kernel bug.
>> + * See IBNBD-610 for details
>
> What about resolving the kernel bug instead of making workarounds?
I tried to send a patch upstream, but it was rejected by Sean.
http://www.spinics.net/lists/linux-rdma/msg22381.html

>
>> + *
>> + * DO NOT CHANGE!
>> + */
>> +struct ibtrs_msg_hdr {
>> +     u8                      __padding1;
>> +     u8                      type;
>> +     u16                     __padding2;
>> +     u32                     tsize;
>> +};
>
> [...]
>
> --
> Johannes Thumshirn                                          Storage
> jthumshirn-l3A5Bk7waGM@public.gmane.org                                +49 911 74053 689
> SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: Felix Imendörffer, Jane Smithard, Graham Norton
> HRB 21284 (AG Nürnberg)
> Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

Thanks Johannes for the review.


-- 
Jack Wang
Linux Kernel Developer

ProfitBricks GmbH
Greifswalder Str. 207
D - 10405 Berlin

Tel:       +49 30 577 008  042
Fax:      +49 30 577 008 299
Email:    jinpu.wang-EIkl63zCoXaH+58JC4qpiA@public.gmane.org
URL:      https://www.profitbricks.de

Sitz der Gesellschaft: Berlin
Registergericht: Amtsgericht Charlottenburg, HRB 125506 B
Geschäftsführer: Achim Weiss

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC PATCH 00/28] INFINIBAND NETWORK BLOCK DEVICE (IBNBD)
@ 2017-03-24 13:31       ` Bart Van Assche
  0 siblings, 0 replies; 87+ messages in thread
From: Bart Van Assche @ 2017-03-24 13:31 UTC (permalink / raw)
  To: jthumshirn, jinpu.wang
  Cc: linux-block, linux-rdma, mail, yun.wang, hch, axboe,
	Milind.dumbare, dledford

On Fri, 2017-03-24 at 13:46 +0100, Jinpu Wang wrote:
> Our IBNBD project was started 3 years ago based on our need for Cloud
> Computing, NVMeOF is a bit younger.
> - IBNBD is one of our components, part of our software defined storage solution.
> - As I listed in features, IBNBD has it's own features
> 
> We're planning to look more into NVMeOF, but it's not a replacement for IBNBD.

Hello Jack, Danil and Roman,

Thanks for having taken the time to open source this work and to travel to
Boston to present this work at the Vault conference. However, my
understanding of IBNBD is that this driver has several shortcomings neither
NVMeOF nor iSER nor SRP have:
* Doesn't scale in terms of number of CPUs submitting I/O. The graphs shown
  during the Vault talk clearly illustrate this. This is probably the result
  of sharing a data structure across all client CPUs, maybe the bitmap that
  tracks which parts of the target buffer space are in use.
* Supports IB but none of the other RDMA transports (RoCE / iWARP).

We also need performance numbers that compare IBNBD against SRP and/or
NVMeOF with memory registration disabled to see whether and how much faster
IBNBD is compared to these two protocols.

The fact that IBNBD only needs two messages per I/O is an advantage it has
today over SRP but not over NVMeOF nor over iSER. The upstream initiator
drivers for the latter two protocols already support inline data.

Another question I have is whether integration with multipathd is supported?
If multipathd tries to run scsi_id against an IBNBD client device that will
fail.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* RE: [RFC PATCH 00/28] INFINIBAND NETWORK BLOCK DEVICE (IBNBD)
  2017-03-24 10:45 ` Jack Wang
@ 2017-03-24 14:20   ` Steve Wise
  -1 siblings, 0 replies; 87+ messages in thread
From: Steve Wise @ 2017-03-24 14:20 UTC (permalink / raw)
  To: 'Jack Wang', linux-block, linux-rdma
  Cc: dledford, axboe, hch, mail, Milind.dumbare, yun.wang

> 
> From: Jack Wang <jinpu.wang@profitbricks.com>
> 
> This series introduces IBNBD/IBTRS kernel modules.
> 
> IBNBD (InfiniBand network block device) allows for an RDMA transfer of block IO
> over InfiniBand network. The driver presents itself as a block device on client
> side and transmits the block requests in a zero-copy fashion to the server-side
> via InfiniBand. The server part of the driver converts the incoming buffers back
> into BIOs and hands them down to the underlying block device. As soon as IO
> responses come back from the drive, they are being transmitted back to the
> client.

Hey Jack, why is this IB specific?  Can it work over iWARP transports as well?

Steve.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC PATCH 00/28] INFINIBAND NETWORK BLOCK DEVICE (IBNBD)
@ 2017-03-24 14:24         ` Jinpu Wang
  0 siblings, 0 replies; 87+ messages in thread
From: Jinpu Wang @ 2017-03-24 14:24 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: jthumshirn, linux-block, linux-rdma, mail, yun.wang, hch, axboe,
	Milind.dumbare, dledford, Danil Kipnis, Roman Penyaev

On Fri, Mar 24, 2017 at 2:31 PM, Bart Van Assche
<Bart.VanAssche@sandisk.com> wrote:
> On Fri, 2017-03-24 at 13:46 +0100, Jinpu Wang wrote:
>> Our IBNBD project was started 3 years ago based on our need for Cloud
>> Computing, NVMeOF is a bit younger.
>> - IBNBD is one of our components, part of our software defined storage solution.
>> - As I listed in features, IBNBD has its own features
>>
>> We're planning to look more into NVMeOF, but it's not a replacement for IBNBD.
>
> Hello Jack, Danil and Roman,
>
> Thanks for having taken the time to open source this work and to travel to
> Boston to present this work at the Vault conference. However, my
> understanding of IBNBD is that this driver has several shortcomings neither
> NVMeOF nor iSER nor SRP have:
> * Doesn't scale in terms of number of CPUs submitting I/O. The graphs shown
>   during the Vault talk clearly illustrate this. This is probably the result
>   of sharing a data structure across all client CPUs, maybe the bitmap that
>   tracks which parts of the target buffer space are in use.
> * Supports IB but none of the other RDMA transports (RoCE / iWARP).
>
> We also need performance numbers that compare IBNBD against SRP and/or
> NVMeOF with memory registration disabled to see whether and how much faster
> IBNBD is compared to these two protocols.
>
> The fact that IBNBD only needs two messages per I/O is an advantage it has
> today over SRP but not over NVMeOF or iSER. The upstream initiator
> drivers for the latter two protocols already support inline data.
>
> Another question I have is whether integration with multipathd is supported.
> If multipathd tries to run scsi_id against an IBNBD client device, that will
> fail.
>
> Thanks,
>
> Bart.
Hello Bart,

Thanks for your comments. As usual, an in-house driver mainly covers the
needs of ProfitBricks: we only tested it in our own hardware environment,
and we only use IB, not RoCE/iWARP. The idea behind open-sourcing it is:
- Present our design/implementation/tradeoffs; others might be interested.
- Attract more attention from developers/testers, so we can improve
the project better and faster.

We will gather performance data comparing against NVMeOF for the next submission.

multipath is not supported; we're using APM for failover (a patch from
Mellanox developers).

Thanks,
-- 
Jack Wang
Linux Kernel Developer

ProfitBricks GmbH
Greifswalder Str. 207
D - 10405 Berlin

Tel:       +49 30 577 008  042
Fax:      +49 30 577 008 299
Email:    jinpu.wang@profitbricks.com
URL:      https://www.profitbricks.de

Sitz der Gesellschaft: Berlin
Registergericht: Amtsgericht Charlottenburg, HRB 125506 B
Geschäftsführer: Achim Weiss

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 01/28] ibtrs: add header shared between ibtrs_client and ibtrs_server
@ 2017-03-24 14:31         ` Johannes Thumshirn
  0 siblings, 0 replies; 87+ messages in thread
From: Johannes Thumshirn @ 2017-03-24 14:31 UTC (permalink / raw)
  To: Jinpu Wang
  Cc: linux-block, linux-rdma, Doug Ledford, Jens Axboe, hch,
	Fabian Holler, Milind Dumbare, Michael Wang, Kleber Souza,
	Danil Kipnis, Roman Pen

On Fri, Mar 24, 2017 at 01:54:04PM +0100, Jinpu Wang wrote:
> >> +
> >> +#define XX(a) case (a): return #a
> >
> > please no macros with return in them, and XX isn't too descriptive either.
> >
> > [...]
> >
> >> +static inline const char *ib_wc_opcode_str(enum ib_wc_opcode opcode)
> >> +{
> >> +     switch (opcode) {
> >> +     XX(IB_WC_SEND);
> >> +     XX(IB_WC_RDMA_WRITE);
> >> +     XX(IB_WC_RDMA_READ);
> >> +     XX(IB_WC_COMP_SWAP);
> >> +     XX(IB_WC_FETCH_ADD);
> >> +     /* recv-side); inbound completion */
> >> +     XX(IB_WC_RECV);
> >> +     XX(IB_WC_RECV_RDMA_WITH_IMM);
> >> +     default: return "IB_WC_OPCODE_UNKNOWN";
> >> +     }
> >> +}
> >
> > How about:
> >
> > struct {
> >         char *name;
> >         enum ib_wc_opcode opcode;
> > } ib_wc_opcode_table[] = {
> >         { stringyfy(IB_WC_SEND), IB_WC_SEND },
> >         { stringyfy(IB_WC_RDMA_WRITE), IB_WC_RDMA_WRITE },
> >         { stringyfy(IB_WC_RDMA_READ ), IB_WC_RDMA_READ }
> >         { stringyfy(IB_WC_COMP_SWAP), IB_WC_COMP_SWAP },
> >         { stringyfy(IB_WC_FETCH_ADD), IB_WC_FETCH_ADD },
> >         { stringyfy(IB_WC_RECV), IB_WC_RECV },
> >         { stringyfy(IB_WC_RECV_RDMA_WITH_IMM), IB_WC_RECV_RDMA_WITH_IMM },
> >         { NULL, 0 },
> > };
> >
> > static inline const char *ib_wc_opcode_str(enum ib_wc_opcode opcode)
> > {
> >         int i;
> >
> >         for (i = 0; i < ARRAY_SIZE(ib_wc_opcode_table); i++)
> >                 if (ib_wc_opcode_table[i].opcode == opcode)
> >                         return ib_wc_opcode_table[i].name;
> >
> >         return "IB_WC_OPCODE_UNKNOWN";
> > }
> >
> Looks nice, might be better to put it into ib_verbs.h?

Probably yes, as are your kvec functions for lib/iov_iter.c

[...]

> > What about resolving the kernel bug instead of making workarounds?
> I tried to send a patch upstream, but it was rejected by Sean.
> http://www.spinics.net/lists/linux-rdma/msg22381.html
> 

I don't see a NACK in this thread.

From http://www.spinics.net/lists/linux-rdma/msg22410.html:
"The port space (which maps to the service ID) needs to be included as part of
the check that determines the format of the private data, and not simply the
address family." 

After such a statement I would have expected to see a v2 of the patch with
the above comment addressed.

Byte,
	Johannes
-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 87+ messages in thread
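
(As a reference for the suggestion above: a minimal sketch of how such a lookup
helper might look if it were moved into a shared header such as ib_verbs.h. It
assumes the kernel's __stringify() macro from <linux/stringify.h> and the
existing IB_WC_* opcodes; the table shape follows the proposal quoted above,
with the spelling and the missing comma fixed, and the NULL sentinel dropped in
favour of ARRAY_SIZE().)

#include <linux/kernel.h>      /* ARRAY_SIZE() */
#include <linux/stringify.h>   /* __stringify() */
#include <rdma/ib_verbs.h>     /* enum ib_wc_opcode */

static const struct {
        const char *name;
        enum ib_wc_opcode opcode;
} ib_wc_opcode_table[] = {
        { __stringify(IB_WC_SEND),               IB_WC_SEND },
        { __stringify(IB_WC_RDMA_WRITE),         IB_WC_RDMA_WRITE },
        { __stringify(IB_WC_RDMA_READ),          IB_WC_RDMA_READ },
        { __stringify(IB_WC_COMP_SWAP),          IB_WC_COMP_SWAP },
        { __stringify(IB_WC_FETCH_ADD),          IB_WC_FETCH_ADD },
        { __stringify(IB_WC_RECV),               IB_WC_RECV },
        { __stringify(IB_WC_RECV_RDMA_WITH_IMM), IB_WC_RECV_RDMA_WITH_IMM },
};

static inline const char *ib_wc_opcode_str(enum ib_wc_opcode opcode)
{
        int i;

        /* linear scan is fine: the table is tiny and this is a debug helper */
        for (i = 0; i < ARRAY_SIZE(ib_wc_opcode_table); i++)
                if (ib_wc_opcode_table[i].opcode == opcode)
                        return ib_wc_opcode_table[i].name;

        return "IB_WC_OPCODE_UNKNOWN";
}

Keeping the strings next to the opcodes this way means adding a new opcode only
touches one table entry, and the loop bound stays in sync automatically.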

* Re: [PATCH 01/28] ibtrs: add header shared between ibtrs_client and ibtrs_server
  2017-03-24 14:31         ` Johannes Thumshirn
@ 2017-03-24 14:35           ` Jinpu Wang
  -1 siblings, 0 replies; 87+ messages in thread
From: Jinpu Wang @ 2017-03-24 14:35 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: linux-block, linux-rdma, Doug Ledford, Jens Axboe, hch,
	Fabian Holler, Milind Dumbare, Michael Wang, Kleber Souza,
	Danil Kipnis, Roman Pen

On Fri, Mar 24, 2017 at 3:31 PM, Johannes Thumshirn <jthumshirn@suse.de> wrote:
> On Fri, Mar 24, 2017 at 01:54:04PM +0100, Jinpu Wang wrote:
>> >> +
>> >> +#define XX(a) case (a): return #a
>> >
>> > please no macros with return in them, and XX isn't too descriptive either.
>> >
>> > [...]
>> >
>> >> +static inline const char *ib_wc_opcode_str(enum ib_wc_opcode opcode)
>> >> +{
>> >> +     switch (opcode) {
>> >> +     XX(IB_WC_SEND);
>> >> +     XX(IB_WC_RDMA_WRITE);
>> >> +     XX(IB_WC_RDMA_READ);
>> >> +     XX(IB_WC_COMP_SWAP);
>> >> +     XX(IB_WC_FETCH_ADD);
>> >> +     /* recv-side); inbound completion */
>> >> +     XX(IB_WC_RECV);
>> >> +     XX(IB_WC_RECV_RDMA_WITH_IMM);
>> >> +     default: return "IB_WC_OPCODE_UNKNOWN";
>> >> +     }
>> >> +}
>> >
>> > How about:
>> >
>> > struct {
>> >         char *name;
>> >         enum ib_wc_opcode opcode;
>> > } ib_wc_opcode_table[] = {
>> >         { stringyfy(IB_WC_SEND), IB_WC_SEND },
>> >         { stringyfy(IB_WC_RDMA_WRITE), IB_WC_RDMA_WRITE },
>> >         { stringyfy(IB_WC_RDMA_READ ), IB_WC_RDMA_READ }
>> >         { stringyfy(IB_WC_COMP_SWAP), IB_WC_COMP_SWAP },
>> >         { stringyfy(IB_WC_FETCH_ADD), IB_WC_FETCH_ADD },
>> >         { stringyfy(IB_WC_RECV), IB_WC_RECV },
>> >         { stringyfy(IB_WC_RECV_RDMA_WITH_IMM), IB_WC_RECV_RDMA_WITH_IMM },
>> >         { NULL, 0 },
>> > };
>> >
>> > static inline const char *ib_wc_opcode_str(enum ib_wc_opcode opcode)
>> > {
>> >         int i;
>> >
>> >         for (i = 0; i < ARRAY_SIZE(ib_wc_opcode_table); i++)
>> >                 if (ib_wc_opcode_table[i].opcode == opcode)
>> >                         return ib_wc_opcode_table[i].name;
>> >
>> >         return "IB_WC_OPCODE_UNKNOWN";
>> > }
>> >
>> Looks nice, might be better to put it into ib_verbs.h?
>
> Probably yes, as are your kvec functions for lib/iov_iter.c
Thanks, will do in next round!

>
> [...]
>
>> > What about resolving the kernel bug instead of making workarounds?
>> I tried to send a patch upstream, but it was rejected by Sean.
>> http://www.spinics.net/lists/linux-rdma/msg22381.html
>>
>
> I don't see a NACK in this thread.
>
> From http://www.spinics.net/lists/linux-rdma/msg22410.html:
> "The port space (which maps to the service ID) needs to be included as pa=
rt of
> the check that determines the format of the private data, and not simply =
the
> address family."
>
> After such a statement I would have expected to see a v2 of the patch with
> the above comment addressed.
I might have been busy with other stuff at that time; I will check again and
revisit the bug.

>
> Byte,
>         Johannes
> --
> Johannes Thumshirn                                          Storage
> jthumshirn@suse.de                                +49 911 74053 689
> SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: Felix Imendörffer, Jane Smithard, Graham Norton
> HRB 21284 (AG Nürnberg)
> Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

Regards,
-- 
Jack Wang
Linux Kernel Developer

ProfitBricks GmbH
Greifswalder Str. 207
D - 10405 Berlin

Tel:       +49 30 577 008  042
Fax:      +49 30 577 008 299
Email:    jinpu.wang@profitbricks.com
URL:      https://www.profitbricks.de

Sitz der Gesellschaft: Berlin
Registergericht: Amtsgericht Charlottenburg, HRB 125506 B
Geschäftsführer: Achim Weiss

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC PATCH 00/28] INFINIBAND NETWORK BLOCK DEVICE (IBNBD)
  2017-03-24 14:20   ` Steve Wise
@ 2017-03-24 14:37     ` Jinpu Wang
  -1 siblings, 0 replies; 87+ messages in thread
From: Jinpu Wang @ 2017-03-24 14:37 UTC (permalink / raw)
  To: Steve Wise
  Cc: linux-block, linux-rdma, Doug Ledford, Jens Axboe, hch,
	Fabian Holler, Milind Dumbare, Michael Wang

On Fri, Mar 24, 2017 at 3:20 PM, Steve Wise <swise@opengridcomputing.com> wrote:
>>
>> From: Jack Wang <jinpu.wang@profitbricks.com>
>>
>> This series introduces IBNBD/IBTRS kernel modules.
>>
>> IBNBD (InfiniBand network block device) allows for an RDMA transfer of block IO
>> over InfiniBand network. The driver presents itself as a block device on client
>> side and transmits the block requests in a zero-copy fashion to the server-side
>> via InfiniBand. The server part of the driver converts the incoming buffers back
>> into BIOs and hands them down to the underlying block device. As soon as IO
>> responses come back from the drive, they are being transmitted back to the
>> client.
>
> Hey Jack, why is this IB specific?  Can it work over iWARP transports as well?
>
> Steve.
>
>
>
Hi Steve,

Because we only use IB in our production; as I replied to Bart, sorry, not yet.

Regards,
-- 
Jack Wang
Linux Kernel Developer

ProfitBricks GmbH
Greifswalder Str. 207
D - 10405 Berlin

Tel:       +49 30 577 008  042
Fax:      +49 30 577 008 299
Email:    jinpu.wang@profitbricks.com
URL:      https://www.profitbricks.de

Sitz der Gesellschaft: Berlin
Registergericht: Amtsgericht Charlottenburg, HRB 125506 B
Geschäftsführer: Achim Weiss

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 08/28] ibtrs_clt: add Makefile and Kconfig
@ 2017-03-25  5:51     ` kbuild test robot
  0 siblings, 0 replies; 87+ messages in thread
From: kbuild test robot @ 2017-03-25  5:51 UTC (permalink / raw)
  To: Jack Wang
  Cc: kbuild-all, linux-block, linux-rdma, dledford, axboe, hch, mail,
	Milind.dumbare, yun.wang, Jack Wang

[-- Attachment #1: Type: text/plain, Size: 14663 bytes --]

Hi Jack,

[auto build test WARNING on linus/master]
[also build test WARNING on v4.11-rc3 next-20170324]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Jack-Wang/INFINIBAND-NETWORK-BLOCK-DEVICE-IBNBD/20170325-101629
config: i386-allmodconfig (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All warnings (new ones prefixed by >>):

   In file included from include/linux/printk.h:329:0,
                    from include/linux/kernel.h:13,
                    from include/linux/list.h:8,
                    from include/linux/module.h:9,
                    from drivers/infiniband/ulp/ibtrs_client/ibtrs_clt.c:47:
   drivers/infiniband/ulp/ibtrs_client/ibtrs_clt.c: In function 'process_open_rsp':
>> drivers/infiniband/ulp/ibtrs_client/ibtrs_clt.c:859:7: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
          (void *)sess->srv_rdma_addr[i],
          ^
   include/linux/dynamic_debug.h:127:10: note: in definition of macro 'dynamic_pr_debug'
           ##__VA_ARGS__);  \
             ^~~~~~~~~~~
>> include/rdma/ibtrs_log.h:51:23: note: in expansion of macro 'pr_debug'
    #define DEB(fmt, ...) pr_debug("ibtrs L%d " fmt, __LINE__, ##__VA_ARGS__)
                          ^~~~~~~~
>> drivers/infiniband/ulp/ibtrs_client/ibtrs_clt.c:857:3: note: in expansion of macro 'DEB'
      DEB("Adding contiguous buffer %d, size %u, addr: 0x%p,"
      ^~~
   In file included from include/linux/kernel.h:13:0,
                    from include/linux/list.h:8,
                    from include/linux/module.h:9,
                    from drivers/infiniband/ulp/ibtrs_client/ibtrs_clt.c:47:
   drivers/infiniband/ulp/ibtrs_client/ibtrs_clt.c: In function 'ibtrs_map_desc':
>> include/rdma/ibtrs_log.h:51:32: warning: format '%llu' expects argument of type 'long long unsigned int', but argument 4 has type 'dma_addr_t {aka unsigned int}' [-Wformat=]
    #define DEB(fmt, ...) pr_debug("ibtrs L%d " fmt, __LINE__, ##__VA_ARGS__)
                                   ^
   include/linux/printk.h:285:21: note: in definition of macro 'pr_fmt'
    #define pr_fmt(fmt) fmt
                        ^~~
   include/linux/printk.h:333:2: note: in expansion of macro 'dynamic_pr_debug'
     dynamic_pr_debug(fmt, ##__VA_ARGS__)
     ^~~~~~~~~~~~~~~~
>> include/rdma/ibtrs_log.h:51:23: note: in expansion of macro 'pr_debug'
    #define DEB(fmt, ...) pr_debug("ibtrs L%d " fmt, __LINE__, ##__VA_ARGS__)
                          ^~~~~~~~
   drivers/infiniband/ulp/ibtrs_client/ibtrs_clt.c:1106:2: note: in expansion of macro 'DEB'
     DEB("dma_addr %llu, key %u, dma_len %u\n", dma_addr, rkey, dma_len);
     ^~~
   drivers/infiniband/ulp/ibtrs_client/ibtrs_clt.c: In function 'ibtrs_post_send_rdma':
>> drivers/infiniband/ulp/ibtrs_client/ibtrs_clt.c:1440:23: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
              addr + off, (u64)req->iu, imm,
                          ^
   drivers/infiniband/ulp/ibtrs_client/ibtrs_clt.c: In function 'ibtrs_post_send_rdma_desc':
   drivers/infiniband/ulp/ibtrs_client/ibtrs_clt.c:1565:17: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
              addr, (u64)req->iu, imm,
                    ^
   drivers/infiniband/ulp/ibtrs_client/ibtrs_clt.c: In function 'process_err_wc':
   drivers/infiniband/ulp/ibtrs_client/ibtrs_clt.c:1882:7: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
     iu = (struct ibtrs_iu *)wc->wr_id;
          ^
   drivers/infiniband/ulp/ibtrs_client/ibtrs_clt.c: In function 'process_wcs':
   drivers/infiniband/ulp/ibtrs_client/ibtrs_clt.c:1922:8: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
      iu = (struct ibtrs_iu *)wc.wr_id;
           ^
--
   In file included from include/linux/printk.h:6:0,
                    from drivers/infiniband/ulp/ibtrs_client/../ibtrs_lib/ibtrs-proto.c:48:
   drivers/infiniband/ulp/ibtrs_client/../ibtrs_lib/ibtrs-proto.c: In function 'ibtrs_validate_msg_sess_open_resp':
   include/linux/kern_levels.h:4:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'unsigned int' [-Wformat=]
    #define KERN_SOH "\001"  /* ASCII Start Of Header */
                     ^
   include/linux/kern_levels.h:10:18: note: in expansion of macro 'KERN_SOH'
    #define KERN_ERR KERN_SOH "3" /* error conditions */
                     ^~~~~~~~
   include/linux/printk.h:301:9: note: in expansion of macro 'KERN_ERR'
     printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
            ^~~~~~~~
>> include/rdma/ibtrs_log.h:62:26: note: in expansion of macro 'pr_err'
    #define ERR_NP(fmt, ...) pr_err("ibtrs L%d ERR: " fmt, \
                             ^~~~~~
>> drivers/infiniband/ulp/ibtrs_client/../ibtrs_lib/ibtrs-proto.c:59:3: note: in expansion of macro 'ERR_NP'
      ERR_NP("Session open resp msg received with unexpected length"
      ^~~~~~
   drivers/infiniband/ulp/ibtrs_client/../ibtrs_lib/ibtrs-proto.c: In function 'ibtrs_validate_msg_rdma_write':
   include/linux/kern_levels.h:4:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'unsigned int' [-Wformat=]
    #define KERN_SOH "\001"  /* ASCII Start Of Header */
                     ^
   include/linux/kern_levels.h:10:18: note: in expansion of macro 'KERN_SOH'
    #define KERN_ERR KERN_SOH "3" /* error conditions */
                     ^~~~~~~~
   include/linux/printk.h:301:9: note: in expansion of macro 'KERN_ERR'
     printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
            ^~~~~~~~
>> include/rdma/ibtrs_log.h:62:26: note: in expansion of macro 'pr_err'
    #define ERR_NP(fmt, ...) pr_err("ibtrs L%d ERR: " fmt, \
                             ^~~~~~
   drivers/infiniband/ulp/ibtrs_client/../ibtrs_lib/ibtrs-proto.c:99:3: note: in expansion of macro 'ERR_NP'
      ERR_NP("RDMA-Write msg received with invalid length %d"
      ^~~~~~
   drivers/infiniband/ulp/ibtrs_client/../ibtrs_lib/ibtrs-proto.c: In function 'ibtrs_validate_msg_req_rdma_write':
   include/linux/kern_levels.h:4:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'unsigned int' [-Wformat=]
    #define KERN_SOH "\001"  /* ASCII Start Of Header */
                     ^
   include/linux/kern_levels.h:10:18: note: in expansion of macro 'KERN_SOH'
    #define KERN_ERR KERN_SOH "3" /* error conditions */
                     ^~~~~~~~
   include/linux/printk.h:301:9: note: in expansion of macro 'KERN_ERR'
     printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
            ^~~~~~~~
>> include/rdma/ibtrs_log.h:62:26: note: in expansion of macro 'pr_err'
    #define ERR_NP(fmt, ...) pr_err("ibtrs L%d ERR: " fmt, \
                             ^~~~~~
   drivers/infiniband/ulp/ibtrs_client/../ibtrs_lib/ibtrs-proto.c:112:3: note: in expansion of macro 'ERR_NP'
      ERR_NP("Request-RDMA-Write msg request received with invalid"
      ^~~~~~
   drivers/infiniband/ulp/ibtrs_client/../ibtrs_lib/ibtrs-proto.c: In function 'ibtrs_validate_msg_con_open':
   include/linux/kern_levels.h:4:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'unsigned int' [-Wformat=]
    #define KERN_SOH "\001"  /* ASCII Start Of Header */
                     ^
   include/linux/kern_levels.h:10:18: note: in expansion of macro 'KERN_SOH'
    #define KERN_ERR KERN_SOH "3" /* error conditions */
                     ^~~~~~~~
   include/linux/printk.h:301:9: note: in expansion of macro 'KERN_ERR'
     printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
            ^~~~~~~~
>> include/rdma/ibtrs_log.h:62:26: note: in expansion of macro 'pr_err'
    #define ERR_NP(fmt, ...) pr_err("ibtrs L%d ERR: " fmt, \
                             ^~~~~~
   drivers/infiniband/ulp/ibtrs_client/../ibtrs_lib/ibtrs-proto.c:125:3: note: in expansion of macro 'ERR_NP'
      ERR_NP("Con Open msg received with invalid length: %d"
      ^~~~~~
   drivers/infiniband/ulp/ibtrs_client/../ibtrs_lib/ibtrs-proto.c: In function 'ibtrs_validate_msg_sess_open':
   include/linux/kern_levels.h:4:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'unsigned int' [-Wformat=]
    #define KERN_SOH "\001"  /* ASCII Start Of Header */
                     ^
   include/linux/kern_levels.h:10:18: note: in expansion of macro 'KERN_SOH'
    #define KERN_ERR KERN_SOH "3" /* error conditions */
                     ^~~~~~~~
   include/linux/printk.h:301:9: note: in expansion of macro 'KERN_ERR'
     printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
            ^~~~~~~~
>> include/rdma/ibtrs_log.h:62:26: note: in expansion of macro 'pr_err'
    #define ERR_NP(fmt, ...) pr_err("ibtrs L%d ERR: " fmt, \
                             ^~~~~~
   drivers/infiniband/ulp/ibtrs_client/../ibtrs_lib/ibtrs-proto.c:137:3: note: in expansion of macro 'ERR_NP'
      ERR_NP("Sess open msg received with invalid length: %d"
      ^~~~~~
   drivers/infiniband/ulp/ibtrs_client/../ibtrs_lib/ibtrs-proto.c: In function 'ibtrs_validate_msg_sess_info':
   include/linux/kern_levels.h:4:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'unsigned int' [-Wformat=]
    #define KERN_SOH "\001"  /* ASCII Start Of Header */
                     ^
   include/linux/kern_levels.h:10:18: note: in expansion of macro 'KERN_SOH'
    #define KERN_ERR KERN_SOH "3" /* error conditions */
                     ^~~~~~~~
   include/linux/printk.h:301:9: note: in expansion of macro 'KERN_ERR'
     printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
            ^~~~~~~~
>> include/rdma/ibtrs_log.h:62:26: note: in expansion of macro 'pr_err'
    #define ERR_NP(fmt, ...) pr_err("ibtrs L%d ERR: " fmt, \
                             ^~~~~~
   drivers/infiniband/ulp/ibtrs_client/../ibtrs_lib/ibtrs-proto.c:153:3: note: in expansion of macro 'ERR_NP'
      ERR_NP("Error message received with invalid length: %d,"
      ^~~~~~
   drivers/infiniband/ulp/ibtrs_client/../ibtrs_lib/ibtrs-proto.c: In function 'ibtrs_validate_msg_error':
   include/linux/kern_levels.h:4:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'unsigned int' [-Wformat=]
    #define KERN_SOH "\001"  /* ASCII Start Of Header */
                     ^
   include/linux/kern_levels.h:10:18: note: in expansion of macro 'KERN_SOH'
    #define KERN_ERR KERN_SOH "3" /* error conditions */
                     ^~~~~~~~
   include/linux/printk.h:301:9: note: in expansion of macro 'KERN_ERR'
     printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
            ^~~~~~~~
>> include/rdma/ibtrs_log.h:62:26: note: in expansion of macro 'pr_err'
    #define ERR_NP(fmt, ...) pr_err("ibtrs L%d ERR: " fmt, \
                             ^~~~~~
   drivers/infiniband/ulp/ibtrs_client/../ibtrs_lib/ibtrs-proto.c:164:3: note: in expansion of macro 'ERR_NP'
      ERR_NP("Error message received with invalid length: %d,"
      ^~~~~~
--
   In file included from include/linux/kernel.h:13:0,
                    from include/linux/uio.h:12,
                    from include/rdma/ibtrs.h:50,
                    from drivers/infiniband/ulp/ibtrs_client/../ibtrs_lib/heartbeat.c:47:
   drivers/infiniband/ulp/ibtrs_client/../ibtrs_lib/heartbeat.c: In function 'ibtrs_heartbeat_warn':
>> include/rdma/ibtrs_log.h:51:32: warning: format '%lu' expects argument of type 'long unsigned int', but argument 5 has type 'long long int' [-Wformat=]
    #define DEB(fmt, ...) pr_debug("ibtrs L%d " fmt, __LINE__, ##__VA_ARGS__)
                                   ^
   include/linux/printk.h:285:21: note: in definition of macro 'pr_fmt'
    #define pr_fmt(fmt) fmt
                        ^~~
   include/linux/printk.h:333:2: note: in expansion of macro 'dynamic_pr_debug'
     dynamic_pr_debug(fmt, ##__VA_ARGS__)
     ^~~~~~~~~~~~~~~~
>> include/rdma/ibtrs_log.h:51:23: note: in expansion of macro 'pr_debug'
    #define DEB(fmt, ...) pr_debug("ibtrs L%d " fmt, __LINE__, ##__VA_ARGS__)
                          ^~~~~~~~
>> drivers/infiniband/ulp/ibtrs_client/../ibtrs_lib/heartbeat.c:84:2: note: in expansion of macro 'DEB'
     DEB("last heartbeat message from %s was received %lu, %llums"
     ^~~
   drivers/infiniband/ulp/ibtrs_client/../ibtrs_lib/heartbeat.c: In function 'ibtrs_heartbeat_timeout_is_expired':
>> include/rdma/ibtrs_log.h:51:32: warning: format '%lu' expects argument of type 'long unsigned int', but argument 5 has type 'long long int' [-Wformat=]
    #define DEB(fmt, ...) pr_debug("ibtrs L%d " fmt, __LINE__, ##__VA_ARGS__)
                                   ^
   include/linux/printk.h:285:21: note: in definition of macro 'pr_fmt'
    #define pr_fmt(fmt) fmt
                        ^~~
   include/linux/printk.h:333:2: note: in expansion of macro 'dynamic_pr_debug'
     dynamic_pr_debug(fmt, ##__VA_ARGS__)
     ^~~~~~~~~~~~~~~~
>> include/rdma/ibtrs_log.h:51:23: note: in expansion of macro 'pr_debug'
    #define DEB(fmt, ...) pr_debug("ibtrs L%d " fmt, __LINE__, ##__VA_ARGS__)
                          ^~~~~~~~
   drivers/infiniband/ulp/ibtrs_client/../ibtrs_lib/heartbeat.c:101:2: note: in expansion of macro 'DEB'
     DEB("last heartbeat message from %s received %lu, %llums ago\n",
     ^~~

vim +859 drivers/infiniband/ulp/ibtrs_client/ibtrs_clt.c

89b85024 Jack Wang 2017-03-24  851  			return -ENOMEM;
89b85024 Jack Wang 2017-03-24  852  		}
89b85024 Jack Wang 2017-03-24  853  	}
89b85024 Jack Wang 2017-03-24  854  
89b85024 Jack Wang 2017-03-24  855  	for (i = 0; i < msg->cnt; i++) {
89b85024 Jack Wang 2017-03-24  856  		sess->srv_rdma_addr[i] = msg->addr[i];
89b85024 Jack Wang 2017-03-24 @857  		DEB("Adding contiguous buffer %d, size %u, addr: 0x%p,"
89b85024 Jack Wang 2017-03-24  858  		    " rkey: 0x%x\n", i, sess->chunk_size,
89b85024 Jack Wang 2017-03-24 @859  		    (void *)sess->srv_rdma_addr[i],
89b85024 Jack Wang 2017-03-24  860  		    sess->srv_rdma_buf_rkey);
89b85024 Jack Wang 2017-03-24  861  	}
89b85024 Jack Wang 2017-03-24  862  

:::::: The code at line 859 was first introduced by commit
:::::: 89b85024b8ff15d239ba06be993378fe6a940693 ibtrs_clt: main functionality of ibtrs_client

:::::: TO: Jack Wang <jinpu.wang@profitbricks.com>
:::::: CC: 0day robot <fengguang.wu@intel.com>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 58998 bytes --]

^ permalink raw reply	[flat|nested] 87+ messages in thread
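
(As a reference for the i386 warnings above: a minimal sketch of the idioms
commonly used to silence them. This is an illustration only, not the driver's
actual fix, and the helper names are made up for the example. dma_addr_t is
printed through an explicit unsigned long long cast, or alternatively with the
%pad specifier and &dma_addr; pointers stored in the 64-bit wr_id field go
through uintptr_t, which is well defined on both 32-bit and 64-bit builds.
struct ibtrs_iu is the driver's own type and is treated as opaque here.)

#include <linux/types.h>
#include <linux/printk.h>

struct ibtrs_iu;        /* opaque here; defined by the driver */

static inline u64 iu_to_wr_id(struct ibtrs_iu *iu)
{
        /* pointer -> u64 via uintptr_t avoids -Wpointer-to-int-cast on i386 */
        return (u64)(uintptr_t)iu;
}

static inline struct ibtrs_iu *wr_id_to_iu(u64 wr_id)
{
        /* u64 -> pointer, again via uintptr_t */
        return (struct ibtrs_iu *)(uintptr_t)wr_id;
}

static inline void log_mapping(dma_addr_t dma_addr, u32 rkey, u32 dma_len)
{
        /* dma_addr_t may be 32 or 64 bits depending on the config; cast it
         * explicitly for %llu, or print it with %pad and &dma_addr */
        pr_debug("dma_addr %llu, key %u, dma_len %u\n",
                 (unsigned long long)dma_addr, rkey, dma_len);
}

The remaining ibtrs-proto.c warnings are the usual format-string mismatches
around sizeof(), which yields a size_t and is normally printed with %zu.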

* Re: [PATCH 08/28] ibtrs_clt: add Makefile and Kconfig
  2017-03-24 10:45 ` [PATCH 08/28] ibtrs_clt: add Makefile and Kconfig Jack Wang
@ 2017-03-25  6:55     ` kbuild test robot
  2017-03-25  6:55     ` kbuild test robot
  1 sibling, 0 replies; 87+ messages in thread
From: kbuild test robot @ 2017-03-25  6:55 UTC (permalink / raw)
  To: Jack Wang
  Cc: kbuild-all, linux-block, linux-rdma, dledford, axboe, hch, mail,
	Milind.dumbare, yun.wang, Jack Wang

[-- Attachment #1: Type: text/plain, Size: 8114 bytes --]

Hi Jack,

[auto build test WARNING on linus/master]
[also build test WARNING on v4.11-rc3 next-20170324]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Jack-Wang/INFINIBAND-NETWORK-BLOCK-DEVICE-IBNBD/20170325-101629
config: i386-allyesconfig (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All warnings (new ones prefixed by >>):

   drivers/infiniband/ulp/ibtrs_client/ibtrs_clt.c: In function 'process_open_rsp':
   drivers/infiniband/ulp/ibtrs_client/ibtrs_clt.c:857:26: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
      DEB("Adding contiguous buffer %d, size %u, addr: 0x%p,"
                             ^
   drivers/infiniband/ulp/ibtrs_client/ibtrs_clt.c: In function 'ibtrs_map_desc':
>> drivers/infiniband/ulp/ibtrs_client/ibtrs_clt.c:1106:24: warning: format '%llu' expects argument of type 'long long unsigned int', but argument 3 has type 'dma_addr_t {aka unsigned int}' [-Wformat=]
     DEB("dma_addr %llu, key %u, dma_len %u\n", dma_addr, rkey, dma_len);
                           ^~~~~~
   drivers/infiniband/ulp/ibtrs_client/ibtrs_clt.c: In function 'ibtrs_post_send_rdma':
   drivers/infiniband/ulp/ibtrs_client/ibtrs_clt.c:1440:23: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
              addr + off, (u64)req->iu, imm,
                          ^
   drivers/infiniband/ulp/ibtrs_client/ibtrs_clt.c: In function 'ibtrs_post_send_rdma_desc':
   drivers/infiniband/ulp/ibtrs_client/ibtrs_clt.c:1565:17: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
              addr, (u64)req->iu, imm,
                    ^
   drivers/infiniband/ulp/ibtrs_client/ibtrs_clt.c: In function 'process_err_wc':
   drivers/infiniband/ulp/ibtrs_client/ibtrs_clt.c:1882:7: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
     iu = (struct ibtrs_iu *)wc->wr_id;
          ^
   drivers/infiniband/ulp/ibtrs_client/ibtrs_clt.c: In function 'process_wcs':
   drivers/infiniband/ulp/ibtrs_client/ibtrs_clt.c:1922:8: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
      iu = (struct ibtrs_iu *)wc.wr_id;
           ^
--
   drivers/infiniband/ulp/ibtrs_client/../ibtrs_lib/ibtrs-proto.c: In function 'ibtrs_validate_msg_sess_open_resp':
>> drivers/infiniband/ulp/ibtrs_client/../ibtrs_lib/ibtrs-proto.c:59:10: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'unsigned int' [-Wformat=]
      ERR_NP("Session open resp msg received with unexpected length"
             ^~~~~~
   drivers/infiniband/ulp/ibtrs_client/../ibtrs_lib/ibtrs-proto.c: In function 'ibtrs_validate_msg_rdma_write':
   drivers/infiniband/ulp/ibtrs_client/../ibtrs_lib/ibtrs-proto.c:99:10: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'unsigned int' [-Wformat=]
      ERR_NP("RDMA-Write msg received with invalid length %d"
             ^~~~~~
   drivers/infiniband/ulp/ibtrs_client/../ibtrs_lib/ibtrs-proto.c: In function 'ibtrs_validate_msg_req_rdma_write':
   drivers/infiniband/ulp/ibtrs_client/../ibtrs_lib/ibtrs-proto.c:112:10: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'unsigned int' [-Wformat=]
      ERR_NP("Request-RDMA-Write msg request received with invalid"
             ^~~~~~
   drivers/infiniband/ulp/ibtrs_client/../ibtrs_lib/ibtrs-proto.c: In function 'ibtrs_validate_msg_con_open':
   drivers/infiniband/ulp/ibtrs_client/../ibtrs_lib/ibtrs-proto.c:125:10: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'unsigned int' [-Wformat=]
      ERR_NP("Con Open msg received with invalid length: %d"
             ^~~~~~
   drivers/infiniband/ulp/ibtrs_client/../ibtrs_lib/ibtrs-proto.c: In function 'ibtrs_validate_msg_sess_open':
   drivers/infiniband/ulp/ibtrs_client/../ibtrs_lib/ibtrs-proto.c:137:10: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'unsigned int' [-Wformat=]
      ERR_NP("Sess open msg received with invalid length: %d"
             ^~~~~~
   drivers/infiniband/ulp/ibtrs_client/../ibtrs_lib/ibtrs-proto.c: In function 'ibtrs_validate_msg_sess_info':
   drivers/infiniband/ulp/ibtrs_client/../ibtrs_lib/ibtrs-proto.c:153:10: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'unsigned int' [-Wformat=]
      ERR_NP("Error message received with invalid length: %d,"
             ^~~~~~
   drivers/infiniband/ulp/ibtrs_client/../ibtrs_lib/ibtrs-proto.c: In function 'ibtrs_validate_msg_error':
   drivers/infiniband/ulp/ibtrs_client/../ibtrs_lib/ibtrs-proto.c:164:10: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'unsigned int' [-Wformat=]
      ERR_NP("Error message received with invalid length: %d,"
             ^~~~~~
--
   drivers/infiniband/ulp/ibtrs_client/../ibtrs_lib/heartbeat.c: In function 'ibtrs_heartbeat_warn':
>> drivers/infiniband/ulp/ibtrs_client/../ibtrs_lib/heartbeat.c:84:24: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'long long int' [-Wformat=]
     DEB("last heartbeat message from %s was received %lu, %llums"
                           ^~~~~~
   drivers/infiniband/ulp/ibtrs_client/../ibtrs_lib/heartbeat.c: In function 'ibtrs_heartbeat_timeout_is_expired':
   drivers/infiniband/ulp/ibtrs_client/../ibtrs_lib/heartbeat.c:101:24: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'long long int' [-Wformat=]
     DEB("last heartbeat message from %s received %lu, %llums ago\n",
                           ^~~~~~
   warning: __mcount_loc already exists: drivers/infiniband/ulp/ibtrs_client/../ibtrs_lib/heartbeat.o

vim +1106 drivers/infiniband/ulp/ibtrs_client/ibtrs_clt.c

89b85024 Jack Wang 2017-03-24  1090  		list_add(&desc[i]->entry, &pool->free_list);
89b85024 Jack Wang 2017-03-24  1091  	spin_unlock_bh(&pool->lock);
89b85024 Jack Wang 2017-03-24  1092  }
89b85024 Jack Wang 2017-03-24  1093  
89b85024 Jack Wang 2017-03-24  1094  static inline struct ibtrs_fr_pool *alloc_fr_pool(struct ibtrs_session *sess)
89b85024 Jack Wang 2017-03-24  1095  {
89b85024 Jack Wang 2017-03-24  1096  	return ibtrs_create_fr_pool(sess->ib_device, sess->ib_sess.pd,
89b85024 Jack Wang 2017-03-24  1097  				    sess->queue_depth,
89b85024 Jack Wang 2017-03-24  1098  				    sess->max_pages_per_mr);
89b85024 Jack Wang 2017-03-24  1099  }
89b85024 Jack Wang 2017-03-24  1100  
89b85024 Jack Wang 2017-03-24  1101  static void ibtrs_map_desc(struct ibtrs_map_state *state, dma_addr_t dma_addr,
89b85024 Jack Wang 2017-03-24  1102  			   u32 dma_len, u32 rkey, u32 max_desc)
89b85024 Jack Wang 2017-03-24  1103  {
89b85024 Jack Wang 2017-03-24  1104  	struct ibtrs_sg_desc *desc = state->desc;
89b85024 Jack Wang 2017-03-24  1105  
89b85024 Jack Wang 2017-03-24 @1106  	DEB("dma_addr %llu, key %u, dma_len %u\n", dma_addr, rkey, dma_len);
89b85024 Jack Wang 2017-03-24  1107  	desc->addr	= dma_addr;
89b85024 Jack Wang 2017-03-24  1108  	desc->key	= rkey;
89b85024 Jack Wang 2017-03-24  1109  	desc->len	= dma_len;
89b85024 Jack Wang 2017-03-24  1110  
89b85024 Jack Wang 2017-03-24  1111  	state->total_len += dma_len;
89b85024 Jack Wang 2017-03-24  1112  	if (state->ndesc < max_desc) {
89b85024 Jack Wang 2017-03-24  1113  		state->desc++;
89b85024 Jack Wang 2017-03-24  1114  		state->ndesc++;

:::::: The code at line 1106 was first introduced by commit
:::::: 89b85024b8ff15d239ba06be993378fe6a940693 ibtrs_clt: main functionality of ibtrs_client

:::::: TO: Jack Wang <jinpu.wang@profitbricks.com>
:::::: CC: 0day robot <fengguang.wu@intel.com>
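
Two recurring warning classes in this report have well-known portable spellings in the kernel: a dma_addr_t can be printed with the %pad extension (passed by reference), and pointers stashed in the 64-bit wr_id field should go through uintptr_t in both directions. A sketch under those assumptions (wr stands for the ib_send_wr being posted; this is not necessarily how the series later resolved it):

	/* %pad prints a dma_addr_t correctly on both 32- and 64-bit builds;
	 * note that it takes the address of the variable, not its value.
	 */
	DEB("dma_addr %pad, key %u, dma_len %u\n", &dma_addr, rkey, dma_len);

	/* Round-trip the iu pointer through uintptr_t so neither direction
	 * warns about pointer/integer size mismatches on i386.
	 */
	wr.wr_id = (uintptr_t)req->iu;
	iu = (struct ibtrs_iu *)(uintptr_t)wc->wr_id;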

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 58338 bytes --]

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 27/28] ibnbd: add doc for how to use ibnbd and sysfs interface
  2017-03-24 10:45 ` [PATCH 27/28] ibnbd: add doc for how to use ibnbd and sysfs interface Jack Wang
@ 2017-03-25  7:44     ` kbuild test robot
  0 siblings, 0 replies; 87+ messages in thread
From: kbuild test robot @ 2017-03-25  7:44 UTC (permalink / raw)
  To: Jack Wang
  Cc: kbuild-all, linux-block, linux-rdma, dledford, axboe, hch, mail,
	Milind.dumbare, yun.wang, Jack Wang

[-- Attachment #1: Type: text/plain, Size: 1946 bytes --]

Hi Jack,

[auto build test ERROR on linus/master]
[also build test ERROR on v4.11-rc3 next-20170324]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Jack-Wang/INFINIBAND-NETWORK-BLOCK-DEVICE-IBNBD/20170325-101629
config: ia64-allyesconfig (attached as .config)
compiler: ia64-linux-gcc (GCC) 6.2.0
reproduce:
        wget https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=ia64 

All errors (new ones prefixed by >>):

   drivers/block/ibnbd_server/built-in.o: In function `ibnbd_io_mode_str':
>> (.text+0x5fa0): multiple definition of `ibnbd_io_mode_str'
   drivers/block/ibnbd_client/built-in.o:(.text+0xb7a0): first defined here
   drivers/block/ibnbd_server/built-in.o: In function `ibnbd_validate_message':
>> (.text+0x5a40): multiple definition of `ibnbd_validate_message'
   drivers/block/ibnbd_client/built-in.o:(.text+0xb240): first defined here
   drivers/block/ibnbd_server/built-in.o: In function `rq_cmd_to_ibnbd_io_flags':
>> (.text+0x5880): multiple definition of `rq_cmd_to_ibnbd_io_flags'
   drivers/block/ibnbd_client/built-in.o:(.text+0xb080): first defined here
   drivers/block/ibnbd_server/built-in.o: In function `ibnbd_access_mode_str':
>> (.text+0x6000): multiple definition of `ibnbd_access_mode_str'
   drivers/block/ibnbd_client/built-in.o:(.text+0xb800): first defined here
   drivers/block/ibnbd_server/built-in.o: In function `ibnbd_io_flags_to_bi_rw':
>> (.text+0x57e0): multiple definition of `ibnbd_io_flags_to_bi_rw'
   drivers/block/ibnbd_client/built-in.o:(.text+0xafe0): first defined here
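
The link failures come from the shared ibnbd_lib objects being compiled into both the client and the server; with allyesconfig both are built in, so the linker sees two copies of every global helper. One way out (a sketch only, with placeholder enum values and prototypes that differ from the real protocol code) is to turn the small shared helpers into static inline functions in the common header, so no global symbol is emitted at all; alternatively the shared code can be built once as its own object or module and exported:

/* ibnbd-proto.h (sketch): placeholder enum and mapping, not the real protocol. */
enum ibnbd_io_mode_example { IBNBD_EXAMPLE_BLOCKIO, IBNBD_EXAMPLE_FILEIO };

static inline const char *ibnbd_io_mode_str(enum ibnbd_io_mode_example mode)
{
	return mode == IBNBD_EXAMPLE_FILEIO ? "fileio" : "blockio";
}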

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 49726 bytes --]

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 13/28] ibtrs_srv: add Makefile and Kconfig
  2017-03-24 10:45   ` Jack Wang
@ 2017-03-25  7:55     ` kbuild test robot
  -1 siblings, 0 replies; 87+ messages in thread
From: kbuild test robot @ 2017-03-25  7:55 UTC (permalink / raw)
  To: Jack Wang
  Cc: kbuild-all, linux-block, linux-rdma, dledford, axboe, hch, mail,
	Milind.dumbare, yun.wang, Jack Wang

[-- Attachment #1: Type: text/plain, Size: 17817 bytes --]

Hi Jack,

[auto build test WARNING on linus/master]
[also build test WARNING on v4.11-rc3 next-20170324]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Jack-Wang/INFINIBAND-NETWORK-BLOCK-DEVICE-IBNBD/20170325-101629
config: i386-allmodconfig (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All warnings (new ones prefixed by >>):

   In file included from include/linux/printk.h:6:0,
                    from drivers/infiniband/ulp/ibtrs_server/../ibtrs_lib/ibtrs-proto.c:48:
   drivers/infiniband/ulp/ibtrs_server/../ibtrs_lib/ibtrs-proto.c: In function 'ibtrs_validate_msg_sess_open_resp':
   include/linux/kern_levels.h:4:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'unsigned int' [-Wformat=]
    #define KERN_SOH "\001"  /* ASCII Start Of Header */
                     ^
   include/linux/kern_levels.h:10:18: note: in expansion of macro 'KERN_SOH'
    #define KERN_ERR KERN_SOH "3" /* error conditions */
                     ^~~~~~~~
   include/linux/printk.h:301:9: note: in expansion of macro 'KERN_ERR'
     printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
            ^~~~~~~~
   include/rdma/ibtrs_log.h:62:26: note: in expansion of macro 'pr_err'
    #define ERR_NP(fmt, ...) pr_err("ibtrs L%d ERR: " fmt, \
                             ^~~~~~
>> drivers/infiniband/ulp/ibtrs_server/../ibtrs_lib/ibtrs-proto.c:59:3: note: in expansion of macro 'ERR_NP'
      ERR_NP("Session open resp msg received with unexpected length"
      ^~~~~~
   drivers/infiniband/ulp/ibtrs_server/../ibtrs_lib/ibtrs-proto.c: In function 'ibtrs_validate_msg_rdma_write':
   include/linux/kern_levels.h:4:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'unsigned int' [-Wformat=]
    #define KERN_SOH "\001"  /* ASCII Start Of Header */
                     ^
   include/linux/kern_levels.h:10:18: note: in expansion of macro 'KERN_SOH'
    #define KERN_ERR KERN_SOH "3" /* error conditions */
                     ^~~~~~~~
   include/linux/printk.h:301:9: note: in expansion of macro 'KERN_ERR'
     printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
            ^~~~~~~~
   include/rdma/ibtrs_log.h:62:26: note: in expansion of macro 'pr_err'
    #define ERR_NP(fmt, ...) pr_err("ibtrs L%d ERR: " fmt, \
                             ^~~~~~
   drivers/infiniband/ulp/ibtrs_server/../ibtrs_lib/ibtrs-proto.c:99:3: note: in expansion of macro 'ERR_NP'
      ERR_NP("RDMA-Write msg received with invalid length %d"
      ^~~~~~
   drivers/infiniband/ulp/ibtrs_server/../ibtrs_lib/ibtrs-proto.c: In function 'ibtrs_validate_msg_req_rdma_write':
   include/linux/kern_levels.h:4:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'unsigned int' [-Wformat=]
    #define KERN_SOH "\001"  /* ASCII Start Of Header */
                     ^
   include/linux/kern_levels.h:10:18: note: in expansion of macro 'KERN_SOH'
    #define KERN_ERR KERN_SOH "3" /* error conditions */
                     ^~~~~~~~
   include/linux/printk.h:301:9: note: in expansion of macro 'KERN_ERR'
     printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
            ^~~~~~~~
   include/rdma/ibtrs_log.h:62:26: note: in expansion of macro 'pr_err'
    #define ERR_NP(fmt, ...) pr_err("ibtrs L%d ERR: " fmt, \
                             ^~~~~~
   drivers/infiniband/ulp/ibtrs_server/../ibtrs_lib/ibtrs-proto.c:112:3: note: in expansion of macro 'ERR_NP'
      ERR_NP("Request-RDMA-Write msg request received with invalid"
      ^~~~~~
   drivers/infiniband/ulp/ibtrs_server/../ibtrs_lib/ibtrs-proto.c: In function 'ibtrs_validate_msg_con_open':
   include/linux/kern_levels.h:4:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'unsigned int' [-Wformat=]
    #define KERN_SOH "\001"  /* ASCII Start Of Header */
                     ^
   include/linux/kern_levels.h:10:18: note: in expansion of macro 'KERN_SOH'
    #define KERN_ERR KERN_SOH "3" /* error conditions */
                     ^~~~~~~~
   include/linux/printk.h:301:9: note: in expansion of macro 'KERN_ERR'
     printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
            ^~~~~~~~
   include/rdma/ibtrs_log.h:62:26: note: in expansion of macro 'pr_err'
    #define ERR_NP(fmt, ...) pr_err("ibtrs L%d ERR: " fmt, \
                             ^~~~~~
   drivers/infiniband/ulp/ibtrs_server/../ibtrs_lib/ibtrs-proto.c:125:3: note: in expansion of macro 'ERR_NP'
      ERR_NP("Con Open msg received with invalid length: %d"
      ^~~~~~
   drivers/infiniband/ulp/ibtrs_server/../ibtrs_lib/ibtrs-proto.c: In function 'ibtrs_validate_msg_sess_open':
   include/linux/kern_levels.h:4:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'unsigned int' [-Wformat=]
    #define KERN_SOH "\001"  /* ASCII Start Of Header */
                     ^
   include/linux/kern_levels.h:10:18: note: in expansion of macro 'KERN_SOH'
    #define KERN_ERR KERN_SOH "3" /* error conditions */
                     ^~~~~~~~
   include/linux/printk.h:301:9: note: in expansion of macro 'KERN_ERR'
     printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
            ^~~~~~~~
   include/rdma/ibtrs_log.h:62:26: note: in expansion of macro 'pr_err'
    #define ERR_NP(fmt, ...) pr_err("ibtrs L%d ERR: " fmt, \
                             ^~~~~~
   drivers/infiniband/ulp/ibtrs_server/../ibtrs_lib/ibtrs-proto.c:137:3: note: in expansion of macro 'ERR_NP'
      ERR_NP("Sess open msg received with invalid length: %d"
      ^~~~~~
   drivers/infiniband/ulp/ibtrs_server/../ibtrs_lib/ibtrs-proto.c: In function 'ibtrs_validate_msg_sess_info':
   include/linux/kern_levels.h:4:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'unsigned int' [-Wformat=]
    #define KERN_SOH "\001"  /* ASCII Start Of Header */
                     ^
   include/linux/kern_levels.h:10:18: note: in expansion of macro 'KERN_SOH'
    #define KERN_ERR KERN_SOH "3" /* error conditions */
                     ^~~~~~~~
   include/linux/printk.h:301:9: note: in expansion of macro 'KERN_ERR'
     printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
            ^~~~~~~~
   include/rdma/ibtrs_log.h:62:26: note: in expansion of macro 'pr_err'
    #define ERR_NP(fmt, ...) pr_err("ibtrs L%d ERR: " fmt, \
                             ^~~~~~
   drivers/infiniband/ulp/ibtrs_server/../ibtrs_lib/ibtrs-proto.c:153:3: note: in expansion of macro 'ERR_NP'
      ERR_NP("Error message received with invalid length: %d,"
      ^~~~~~
   drivers/infiniband/ulp/ibtrs_server/../ibtrs_lib/ibtrs-proto.c: In function 'ibtrs_validate_msg_error':
   include/linux/kern_levels.h:4:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'unsigned int' [-Wformat=]
    #define KERN_SOH "\001"  /* ASCII Start Of Header */
                     ^
   include/linux/kern_levels.h:10:18: note: in expansion of macro 'KERN_SOH'
    #define KERN_ERR KERN_SOH "3" /* error conditions */
                     ^~~~~~~~
   include/linux/printk.h:301:9: note: in expansion of macro 'KERN_ERR'
     printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
            ^~~~~~~~
   include/rdma/ibtrs_log.h:62:26: note: in expansion of macro 'pr_err'
    #define ERR_NP(fmt, ...) pr_err("ibtrs L%d ERR: " fmt, \
                             ^~~~~~
   drivers/infiniband/ulp/ibtrs_server/../ibtrs_lib/ibtrs-proto.c:164:3: note: in expansion of macro 'ERR_NP'
      ERR_NP("Error message received with invalid length: %d,"
      ^~~~~~
   warning: __mcount_loc already exists: drivers/infiniband/ulp/ibtrs_server/../ibtrs_lib/ibtrs-proto.o
--
   In file included from include/linux/kernel.h:13:0,
                    from include/linux/uio.h:12,
                    from include/rdma/ibtrs.h:50,
                    from drivers/infiniband/ulp/ibtrs_server/../ibtrs_lib/heartbeat.c:47:
   drivers/infiniband/ulp/ibtrs_server/../ibtrs_lib/heartbeat.c: In function 'ibtrs_heartbeat_warn':
   include/rdma/ibtrs_log.h:51:32: warning: format '%lu' expects argument of type 'long unsigned int', but argument 5 has type 'long long int' [-Wformat=]
    #define DEB(fmt, ...) pr_debug("ibtrs L%d " fmt, __LINE__, ##__VA_ARGS__)
                                   ^
   include/linux/printk.h:285:21: note: in definition of macro 'pr_fmt'
    #define pr_fmt(fmt) fmt
                        ^~~
   include/linux/printk.h:333:2: note: in expansion of macro 'dynamic_pr_debug'
     dynamic_pr_debug(fmt, ##__VA_ARGS__)
     ^~~~~~~~~~~~~~~~
   include/rdma/ibtrs_log.h:51:23: note: in expansion of macro 'pr_debug'
    #define DEB(fmt, ...) pr_debug("ibtrs L%d " fmt, __LINE__, ##__VA_ARGS__)
                          ^~~~~~~~
>> drivers/infiniband/ulp/ibtrs_server/../ibtrs_lib/heartbeat.c:84:2: note: in expansion of macro 'DEB'
     DEB("last heartbeat message from %s was received %lu, %llums"
     ^~~
   drivers/infiniband/ulp/ibtrs_server/../ibtrs_lib/heartbeat.c: In function 'ibtrs_heartbeat_timeout_is_expired':
   include/rdma/ibtrs_log.h:51:32: warning: format '%lu' expects argument of type 'long unsigned int', but argument 5 has type 'long long int' [-Wformat=]
    #define DEB(fmt, ...) pr_debug("ibtrs L%d " fmt, __LINE__, ##__VA_ARGS__)
                                   ^
   include/linux/printk.h:285:21: note: in definition of macro 'pr_fmt'
    #define pr_fmt(fmt) fmt
                        ^~~
   include/linux/printk.h:333:2: note: in expansion of macro 'dynamic_pr_debug'
     dynamic_pr_debug(fmt, ##__VA_ARGS__)
     ^~~~~~~~~~~~~~~~
   include/rdma/ibtrs_log.h:51:23: note: in expansion of macro 'pr_debug'
    #define DEB(fmt, ...) pr_debug("ibtrs L%d " fmt, __LINE__, ##__VA_ARGS__)
                          ^~~~~~~~
   drivers/infiniband/ulp/ibtrs_server/../ibtrs_lib/heartbeat.c:101:2: note: in expansion of macro 'DEB'
     DEB("last heartbeat message from %s received %lu, %llums ago\n",
     ^~~
--
   drivers/infiniband/ulp/ibtrs_server/ibtrs_srv.c: In function 'ibtrs_srv_stats_rdma_to_str':
>> drivers/infiniband/ulp/ibtrs_server/ibtrs_srv.c:605:33: warning: format '%ld' expects argument of type 'long int', but argument 4 has type 'long long int' [-Wformat=]
     return scnprintf(page, len, "%ld %ld %ld %ld %u %ld\n",
                                    ^
   drivers/infiniband/ulp/ibtrs_server/ibtrs_srv.c:605:37: warning: format '%ld' expects argument of type 'long int', but argument 5 has type 'long long int' [-Wformat=]
     return scnprintf(page, len, "%ld %ld %ld %ld %u %ld\n",
                                        ^
   drivers/infiniband/ulp/ibtrs_server/ibtrs_srv.c:605:41: warning: format '%ld' expects argument of type 'long int', but argument 6 has type 'long long int' [-Wformat=]
     return scnprintf(page, len, "%ld %ld %ld %ld %u %ld\n",
                                            ^
   drivers/infiniband/ulp/ibtrs_server/ibtrs_srv.c:605:45: warning: format '%ld' expects argument of type 'long int', but argument 7 has type 'long long int' [-Wformat=]
     return scnprintf(page, len, "%ld %ld %ld %ld %u %ld\n",
                                                ^
   drivers/infiniband/ulp/ibtrs_server/ibtrs_srv.c:605:52: warning: format '%ld' expects argument of type 'long int', but argument 9 has type 'long long int' [-Wformat=]
     return scnprintf(page, len, "%ld %ld %ld %ld %u %ld\n",
                                                       ^
   drivers/infiniband/ulp/ibtrs_server/ibtrs_srv.c: In function 'ibtrs_srv_stats_user_ib_msgs_to_str':
   drivers/infiniband/ulp/ibtrs_server/ibtrs_srv.c:632:31: warning: format '%ld' expects argument of type 'long int', but argument 4 has type 'long long int' [-Wformat=]
     return snprintf(buf, len, "%ld %ld %ld %ld\n",
                                  ^
   drivers/infiniband/ulp/ibtrs_server/ibtrs_srv.c:632:35: warning: format '%ld' expects argument of type 'long int', but argument 5 has type 'long long int' [-Wformat=]
     return snprintf(buf, len, "%ld %ld %ld %ld\n",
                                      ^
   drivers/infiniband/ulp/ibtrs_server/ibtrs_srv.c:632:39: warning: format '%ld' expects argument of type 'long int', but argument 6 has type 'long long int' [-Wformat=]
     return snprintf(buf, len, "%ld %ld %ld %ld\n",
                                          ^
   drivers/infiniband/ulp/ibtrs_server/ibtrs_srv.c:632:43: warning: format '%ld' expects argument of type 'long int', but argument 7 has type 'long long int' [-Wformat=]
     return snprintf(buf, len, "%ld %ld %ld %ld\n",
                                              ^
   drivers/infiniband/ulp/ibtrs_server/ibtrs_srv.c: In function 'ibtrs_srv_stats_wc_completion_to_str':
   drivers/infiniband/ulp/ibtrs_server/ibtrs_srv.c:652:34: warning: format '%ld' expects argument of type 'long int', but argument 5 has type 'long long int' [-Wformat=]
     return snprintf(buf, len, "%d %ld %ld\n",
                                     ^
   drivers/infiniband/ulp/ibtrs_server/ibtrs_srv.c:652:38: warning: format '%ld' expects argument of type 'long int', but argument 6 has type 'long long int' [-Wformat=]
     return snprintf(buf, len, "%d %ld %ld\n",
                                         ^
   drivers/infiniband/ulp/ibtrs_server/ibtrs_srv.c: In function 'process_err_wc':
>> drivers/infiniband/ulp/ibtrs_server/ibtrs_srv.c:2065:7: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
     iu = (struct ibtrs_iu *)wc->wr_id;
          ^
   In file included from include/linux/printk.h:6:0,
                    from include/linux/kernel.h:13,
                    from include/linux/list.h:8,
                    from include/linux/module.h:9,
                    from drivers/infiniband/ulp/ibtrs_server/ibtrs_srv.c:47:
   drivers/infiniband/ulp/ibtrs_server/ibtrs_srv.c: In function 'rdma_con_establish':
   include/linux/kern_levels.h:4:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'unsigned int' [-Wformat=]
    #define KERN_SOH "\001"  /* ASCII Start Of Header */
                     ^
   include/linux/kern_levels.h:10:18: note: in expansion of macro 'KERN_SOH'
    #define KERN_ERR KERN_SOH "3" /* error conditions */
                     ^~~~~~~~
   include/linux/printk.h:301:9: note: in expansion of macro 'KERN_ERR'
     printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
            ^~~~~~~~
   include/rdma/ibtrs_log.h:62:26: note: in expansion of macro 'pr_err'
    #define ERR_NP(fmt, ...) pr_err("ibtrs L%d ERR: " fmt, \
                             ^~~~~~
>> drivers/infiniband/ulp/ibtrs_server/ibtrs_srv.c:2617:3: note: in expansion of macro 'ERR_NP'
      ERR_NP("Establishing connection failed, "
      ^~~~~~
   include/linux/kern_levels.h:4:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 5 has type 'unsigned int' [-Wformat=]
    #define KERN_SOH "\001"  /* ASCII Start Of Header */
                     ^
   include/linux/kern_levels.h:10:18: note: in expansion of macro 'KERN_SOH'
    #define KERN_ERR KERN_SOH "3" /* error conditions */
                     ^~~~~~~~
   include/linux/printk.h:301:9: note: in expansion of macro 'KERN_ERR'
     printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
            ^~~~~~~~
   include/rdma/ibtrs_log.h:62:26: note: in expansion of macro 'pr_err'
    #define ERR_NP(fmt, ...) pr_err("ibtrs L%d ERR: " fmt, \
                             ^~~~~~
>> drivers/infiniband/ulp/ibtrs_server/ibtrs_srv.c:2617:3: note: in expansion of macro 'ERR_NP'
      ERR_NP("Establishing connection failed, "
      ^~~~~~

vim +/ERR_NP +59 drivers/infiniband/ulp/ibtrs_server/../ibtrs_lib/ibtrs-proto.c

f2a5844d Jack Wang 2017-03-24  43   * POSSIBILITY OF SUCH DAMAGES.
f2a5844d Jack Wang 2017-03-24  44   *
f2a5844d Jack Wang 2017-03-24  45   */
f2a5844d Jack Wang 2017-03-24  46  
f2a5844d Jack Wang 2017-03-24  47  #include <linux/errno.h>
f2a5844d Jack Wang 2017-03-24  48  #include <linux/printk.h>
f2a5844d Jack Wang 2017-03-24  49  #include <rdma/ibtrs.h>
f2a5844d Jack Wang 2017-03-24  50  #include <rdma/ibtrs_log.h>
f2a5844d Jack Wang 2017-03-24  51  
f2a5844d Jack Wang 2017-03-24  52  static int
f2a5844d Jack Wang 2017-03-24  53  ibtrs_validate_msg_sess_open_resp(const struct ibtrs_msg_sess_open_resp *msg)
f2a5844d Jack Wang 2017-03-24  54  {
f2a5844d Jack Wang 2017-03-24  55  	static const int min_bufs = 1;
f2a5844d Jack Wang 2017-03-24  56  
f2a5844d Jack Wang 2017-03-24  57  	if (unlikely(msg->hdr.tsize !=
f2a5844d Jack Wang 2017-03-24  58  				IBTRS_MSG_SESS_OPEN_RESP_LEN(msg->cnt))) {
f2a5844d Jack Wang 2017-03-24 @59  		ERR_NP("Session open resp msg received with unexpected length"
f2a5844d Jack Wang 2017-03-24  60  		       " %dB instead of %luB\n", msg->hdr.tsize,
f2a5844d Jack Wang 2017-03-24  61  		       IBTRS_MSG_SESS_OPEN_RESP_LEN(msg->cnt));
f2a5844d Jack Wang 2017-03-24  62  
f2a5844d Jack Wang 2017-03-24  63  		return -EINVAL;
f2a5844d Jack Wang 2017-03-24  64  	}
f2a5844d Jack Wang 2017-03-24  65  
f2a5844d Jack Wang 2017-03-24  66  	if (msg->max_inflight_msg < min_bufs) {
f2a5844d Jack Wang 2017-03-24  67  		ERR_NP("Sess Open msg received with invalid max_inflight_msg %d"

:::::: The code at line 59 was first introduced by commit
:::::: f2a5844d27aa77dee51bee108f1654f9ca4a3ac6 ibtrs_lib: add common functions shared by client and server

:::::: TO: Jack Wang <jinpu.wang@profitbricks.com>
:::::: CC: 0day robot <fengguang.wu@intel.com>
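
The format warnings in this report likewise come down to type/specifier mismatches: a length that evaluates to unsigned int (or to size_t in some expansions) is fed to %lu, and the per-session statistics are 64-bit counters fed to %ld. A sketch of matching spellings follows; the stats counter name is a placeholder, and if the length macro expands to a sizeof()-based size_t then %zu is the exact match instead of the cast:

	/* Cast the length expression to a type that matches the conversion. */
	ERR_NP("Session open resp msg received with unexpected length"
	       " %dB instead of %uB\n", msg->hdr.tsize,
	       (unsigned int)IBTRS_MSG_SESS_OPEN_RESP_LEN(msg->cnt));

	/* 64-bit counters (e.g. atomic64_t) print portably via %lld plus a cast. */
	return scnprintf(page, len, "%lld\n",
			 (long long)atomic64_read(&stats->rdma_read_cnt));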

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 59004 bytes --]

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 13/28] ibtrs_srv: add Makefile and Kconfig
@ 2017-03-25  7:55     ` kbuild test robot
  0 siblings, 0 replies; 87+ messages in thread
From: kbuild test robot @ 2017-03-25  7:55 UTC (permalink / raw)
  Cc: kbuild-all, linux-block, linux-rdma, dledford, axboe, hch, mail,
	Milind.dumbare, yun.wang, Jack Wang

[-- Attachment #1: Type: text/plain, Size: 17817 bytes --]

Hi Jack,

[auto build test WARNING on linus/master]
[also build test WARNING on v4.11-rc3 next-20170324]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Jack-Wang/INFINIBAND-NETWORK-BLOCK-DEVICE-IBNBD/20170325-101629
config: i386-allmodconfig (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All warnings (new ones prefixed by >>):

   In file included from include/linux/printk.h:6:0,
                    from drivers/infiniband/ulp/ibtrs_server/../ibtrs_lib/ibtrs-proto.c:48:
   drivers/infiniband/ulp/ibtrs_server/../ibtrs_lib/ibtrs-proto.c: In function 'ibtrs_validate_msg_sess_open_resp':
   include/linux/kern_levels.h:4:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'unsigned int' [-Wformat=]
    #define KERN_SOH "\001"  /* ASCII Start Of Header */
                     ^
   include/linux/kern_levels.h:10:18: note: in expansion of macro 'KERN_SOH'
    #define KERN_ERR KERN_SOH "3" /* error conditions */
                     ^~~~~~~~
   include/linux/printk.h:301:9: note: in expansion of macro 'KERN_ERR'
     printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
            ^~~~~~~~
   include/rdma/ibtrs_log.h:62:26: note: in expansion of macro 'pr_err'
    #define ERR_NP(fmt, ...) pr_err("ibtrs L%d ERR: " fmt, \
                             ^~~~~~
>> drivers/infiniband/ulp/ibtrs_server/../ibtrs_lib/ibtrs-proto.c:59:3: note: in expansion of macro 'ERR_NP'
      ERR_NP("Session open resp msg received with unexpected length"
      ^~~~~~
   drivers/infiniband/ulp/ibtrs_server/../ibtrs_lib/ibtrs-proto.c: In function 'ibtrs_validate_msg_rdma_write':
   include/linux/kern_levels.h:4:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'unsigned int' [-Wformat=]
    #define KERN_SOH "\001"  /* ASCII Start Of Header */
                     ^
   include/linux/kern_levels.h:10:18: note: in expansion of macro 'KERN_SOH'
    #define KERN_ERR KERN_SOH "3" /* error conditions */
                     ^~~~~~~~
   include/linux/printk.h:301:9: note: in expansion of macro 'KERN_ERR'
     printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
            ^~~~~~~~
   include/rdma/ibtrs_log.h:62:26: note: in expansion of macro 'pr_err'
    #define ERR_NP(fmt, ...) pr_err("ibtrs L%d ERR: " fmt, \
                             ^~~~~~
   drivers/infiniband/ulp/ibtrs_server/../ibtrs_lib/ibtrs-proto.c:99:3: note: in expansion of macro 'ERR_NP'
      ERR_NP("RDMA-Write msg received with invalid length %d"
      ^~~~~~
   drivers/infiniband/ulp/ibtrs_server/../ibtrs_lib/ibtrs-proto.c: In function 'ibtrs_validate_msg_req_rdma_write':
   include/linux/kern_levels.h:4:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'unsigned int' [-Wformat=]
    #define KERN_SOH "\001"  /* ASCII Start Of Header */
                     ^
   include/linux/kern_levels.h:10:18: note: in expansion of macro 'KERN_SOH'
    #define KERN_ERR KERN_SOH "3" /* error conditions */
                     ^~~~~~~~
   include/linux/printk.h:301:9: note: in expansion of macro 'KERN_ERR'
     printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
            ^~~~~~~~
   include/rdma/ibtrs_log.h:62:26: note: in expansion of macro 'pr_err'
    #define ERR_NP(fmt, ...) pr_err("ibtrs L%d ERR: " fmt, \
                             ^~~~~~
   drivers/infiniband/ulp/ibtrs_server/../ibtrs_lib/ibtrs-proto.c:112:3: note: in expansion of macro 'ERR_NP'
      ERR_NP("Request-RDMA-Write msg request received with invalid"
      ^~~~~~
   drivers/infiniband/ulp/ibtrs_server/../ibtrs_lib/ibtrs-proto.c: In function 'ibtrs_validate_msg_con_open':
   include/linux/kern_levels.h:4:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'unsigned int' [-Wformat=]
    #define KERN_SOH "\001"  /* ASCII Start Of Header */
                     ^
   include/linux/kern_levels.h:10:18: note: in expansion of macro 'KERN_SOH'
    #define KERN_ERR KERN_SOH "3" /* error conditions */
                     ^~~~~~~~
   include/linux/printk.h:301:9: note: in expansion of macro 'KERN_ERR'
     printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
            ^~~~~~~~
   include/rdma/ibtrs_log.h:62:26: note: in expansion of macro 'pr_err'
    #define ERR_NP(fmt, ...) pr_err("ibtrs L%d ERR: " fmt, \
                             ^~~~~~
   drivers/infiniband/ulp/ibtrs_server/../ibtrs_lib/ibtrs-proto.c:125:3: note: in expansion of macro 'ERR_NP'
      ERR_NP("Con Open msg received with invalid length: %d"
      ^~~~~~
   drivers/infiniband/ulp/ibtrs_server/../ibtrs_lib/ibtrs-proto.c: In function 'ibtrs_validate_msg_sess_open':
   include/linux/kern_levels.h:4:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'unsigned int' [-Wformat=]
    #define KERN_SOH "\001"  /* ASCII Start Of Header */
                     ^
   include/linux/kern_levels.h:10:18: note: in expansion of macro 'KERN_SOH'
    #define KERN_ERR KERN_SOH "3" /* error conditions */
                     ^~~~~~~~
   include/linux/printk.h:301:9: note: in expansion of macro 'KERN_ERR'
     printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
            ^~~~~~~~
   include/rdma/ibtrs_log.h:62:26: note: in expansion of macro 'pr_err'
    #define ERR_NP(fmt, ...) pr_err("ibtrs L%d ERR: " fmt, \
                             ^~~~~~
   drivers/infiniband/ulp/ibtrs_server/../ibtrs_lib/ibtrs-proto.c:137:3: note: in expansion of macro 'ERR_NP'
      ERR_NP("Sess open msg received with invalid length: %d"
      ^~~~~~
   drivers/infiniband/ulp/ibtrs_server/../ibtrs_lib/ibtrs-proto.c: In function 'ibtrs_validate_msg_sess_info':
   include/linux/kern_levels.h:4:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'unsigned int' [-Wformat=]
    #define KERN_SOH "\001"  /* ASCII Start Of Header */
                     ^
   include/linux/kern_levels.h:10:18: note: in expansion of macro 'KERN_SOH'
    #define KERN_ERR KERN_SOH "3" /* error conditions */
                     ^~~~~~~~
   include/linux/printk.h:301:9: note: in expansion of macro 'KERN_ERR'
     printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
            ^~~~~~~~
   include/rdma/ibtrs_log.h:62:26: note: in expansion of macro 'pr_err'
    #define ERR_NP(fmt, ...) pr_err("ibtrs L%d ERR: " fmt, \
                             ^~~~~~
   drivers/infiniband/ulp/ibtrs_server/../ibtrs_lib/ibtrs-proto.c:153:3: note: in expansion of macro 'ERR_NP'
      ERR_NP("Error message received with invalid length: %d,"
      ^~~~~~
   drivers/infiniband/ulp/ibtrs_server/../ibtrs_lib/ibtrs-proto.c: In function 'ibtrs_validate_msg_error':
   include/linux/kern_levels.h:4:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'unsigned int' [-Wformat=]
    #define KERN_SOH "\001"  /* ASCII Start Of Header */
                     ^
   include/linux/kern_levels.h:10:18: note: in expansion of macro 'KERN_SOH'
    #define KERN_ERR KERN_SOH "3" /* error conditions */
                     ^~~~~~~~
   include/linux/printk.h:301:9: note: in expansion of macro 'KERN_ERR'
     printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
            ^~~~~~~~
   include/rdma/ibtrs_log.h:62:26: note: in expansion of macro 'pr_err'
    #define ERR_NP(fmt, ...) pr_err("ibtrs L%d ERR: " fmt, \
                             ^~~~~~
   drivers/infiniband/ulp/ibtrs_server/../ibtrs_lib/ibtrs-proto.c:164:3: note: in expansion of macro 'ERR_NP'
      ERR_NP("Error message received with invalid length: %d,"
      ^~~~~~
   warning: __mcount_loc already exists: drivers/infiniband/ulp/ibtrs_server/../ibtrs_lib/ibtrs-proto.o
--
   In file included from include/linux/kernel.h:13:0,
                    from include/linux/uio.h:12,
                    from include/rdma/ibtrs.h:50,
                    from drivers/infiniband/ulp/ibtrs_server/../ibtrs_lib/heartbeat.c:47:
   drivers/infiniband/ulp/ibtrs_server/../ibtrs_lib/heartbeat.c: In function 'ibtrs_heartbeat_warn':
   include/rdma/ibtrs_log.h:51:32: warning: format '%lu' expects argument of type 'long unsigned int', but argument 5 has type 'long long int' [-Wformat=]
    #define DEB(fmt, ...) pr_debug("ibtrs L%d " fmt, __LINE__, ##__VA_ARGS__)
                                   ^
   include/linux/printk.h:285:21: note: in definition of macro 'pr_fmt'
    #define pr_fmt(fmt) fmt
                        ^~~
   include/linux/printk.h:333:2: note: in expansion of macro 'dynamic_pr_debug'
     dynamic_pr_debug(fmt, ##__VA_ARGS__)
     ^~~~~~~~~~~~~~~~
   include/rdma/ibtrs_log.h:51:23: note: in expansion of macro 'pr_debug'
    #define DEB(fmt, ...) pr_debug("ibtrs L%d " fmt, __LINE__, ##__VA_ARGS__)
                          ^~~~~~~~
>> drivers/infiniband/ulp/ibtrs_server/../ibtrs_lib/heartbeat.c:84:2: note: in expansion of macro 'DEB'
     DEB("last heartbeat message from %s was received %lu, %llums"
     ^~~
   drivers/infiniband/ulp/ibtrs_server/../ibtrs_lib/heartbeat.c: In function 'ibtrs_heartbeat_timeout_is_expired':
   include/rdma/ibtrs_log.h:51:32: warning: format '%lu' expects argument of type 'long unsigned int', but argument 5 has type 'long long int' [-Wformat=]
    #define DEB(fmt, ...) pr_debug("ibtrs L%d " fmt, __LINE__, ##__VA_ARGS__)
                                   ^
   include/linux/printk.h:285:21: note: in definition of macro 'pr_fmt'
    #define pr_fmt(fmt) fmt
                        ^~~
   include/linux/printk.h:333:2: note: in expansion of macro 'dynamic_pr_debug'
     dynamic_pr_debug(fmt, ##__VA_ARGS__)
     ^~~~~~~~~~~~~~~~
   include/rdma/ibtrs_log.h:51:23: note: in expansion of macro 'pr_debug'
    #define DEB(fmt, ...) pr_debug("ibtrs L%d " fmt, __LINE__, ##__VA_ARGS__)
                          ^~~~~~~~
   drivers/infiniband/ulp/ibtrs_server/../ibtrs_lib/heartbeat.c:101:2: note: in expansion of macro 'DEB'
     DEB("last heartbeat message from %s received %lu, %llums ago\n",
     ^~~
--
   drivers/infiniband/ulp/ibtrs_server/ibtrs_srv.c: In function 'ibtrs_srv_stats_rdma_to_str':
>> drivers/infiniband/ulp/ibtrs_server/ibtrs_srv.c:605:33: warning: format '%ld' expects argument of type 'long int', but argument 4 has type 'long long int' [-Wformat=]
     return scnprintf(page, len, "%ld %ld %ld %ld %u %ld\n",
                                    ^
   drivers/infiniband/ulp/ibtrs_server/ibtrs_srv.c:605:37: warning: format '%ld' expects argument of type 'long int', but argument 5 has type 'long long int' [-Wformat=]
     return scnprintf(page, len, "%ld %ld %ld %ld %u %ld\n",
                                        ^
   drivers/infiniband/ulp/ibtrs_server/ibtrs_srv.c:605:41: warning: format '%ld' expects argument of type 'long int', but argument 6 has type 'long long int' [-Wformat=]
     return scnprintf(page, len, "%ld %ld %ld %ld %u %ld\n",
                                            ^
   drivers/infiniband/ulp/ibtrs_server/ibtrs_srv.c:605:45: warning: format '%ld' expects argument of type 'long int', but argument 7 has type 'long long int' [-Wformat=]
     return scnprintf(page, len, "%ld %ld %ld %ld %u %ld\n",
                                                ^
   drivers/infiniband/ulp/ibtrs_server/ibtrs_srv.c:605:52: warning: format '%ld' expects argument of type 'long int', but argument 9 has type 'long long int' [-Wformat=]
     return scnprintf(page, len, "%ld %ld %ld %ld %u %ld\n",
                                                       ^
   drivers/infiniband/ulp/ibtrs_server/ibtrs_srv.c: In function 'ibtrs_srv_stats_user_ib_msgs_to_str':
   drivers/infiniband/ulp/ibtrs_server/ibtrs_srv.c:632:31: warning: format '%ld' expects argument of type 'long int', but argument 4 has type 'long long int' [-Wformat=]
     return snprintf(buf, len, "%ld %ld %ld %ld\n",
                                  ^
   drivers/infiniband/ulp/ibtrs_server/ibtrs_srv.c:632:35: warning: format '%ld' expects argument of type 'long int', but argument 5 has type 'long long int' [-Wformat=]
     return snprintf(buf, len, "%ld %ld %ld %ld\n",
                                      ^
   drivers/infiniband/ulp/ibtrs_server/ibtrs_srv.c:632:39: warning: format '%ld' expects argument of type 'long int', but argument 6 has type 'long long int' [-Wformat=]
     return snprintf(buf, len, "%ld %ld %ld %ld\n",
                                          ^
   drivers/infiniband/ulp/ibtrs_server/ibtrs_srv.c:632:43: warning: format '%ld' expects argument of type 'long int', but argument 7 has type 'long long int' [-Wformat=]
     return snprintf(buf, len, "%ld %ld %ld %ld\n",
                                              ^
   drivers/infiniband/ulp/ibtrs_server/ibtrs_srv.c: In function 'ibtrs_srv_stats_wc_completion_to_str':
   drivers/infiniband/ulp/ibtrs_server/ibtrs_srv.c:652:34: warning: format '%ld' expects argument of type 'long int', but argument 5 has type 'long long int' [-Wformat=]
     return snprintf(buf, len, "%d %ld %ld\n",
                                     ^
   drivers/infiniband/ulp/ibtrs_server/ibtrs_srv.c:652:38: warning: format '%ld' expects argument of type 'long int', but argument 6 has type 'long long int' [-Wformat=]
     return snprintf(buf, len, "%d %ld %ld\n",
                                         ^
   drivers/infiniband/ulp/ibtrs_server/ibtrs_srv.c: In function 'process_err_wc':
>> drivers/infiniband/ulp/ibtrs_server/ibtrs_srv.c:2065:7: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
     iu = (struct ibtrs_iu *)wc->wr_id;
          ^
   In file included from include/linux/printk.h:6:0,
                    from include/linux/kernel.h:13,
                    from include/linux/list.h:8,
                    from include/linux/module.h:9,
                    from drivers/infiniband/ulp/ibtrs_server/ibtrs_srv.c:47:
   drivers/infiniband/ulp/ibtrs_server/ibtrs_srv.c: In function 'rdma_con_establish':
   include/linux/kern_levels.h:4:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'unsigned int' [-Wformat=]
    #define KERN_SOH "\001"  /* ASCII Start Of Header */
                     ^
   include/linux/kern_levels.h:10:18: note: in expansion of macro 'KERN_SOH'
    #define KERN_ERR KERN_SOH "3" /* error conditions */
                     ^~~~~~~~
   include/linux/printk.h:301:9: note: in expansion of macro 'KERN_ERR'
     printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
            ^~~~~~~~
   include/rdma/ibtrs_log.h:62:26: note: in expansion of macro 'pr_err'
    #define ERR_NP(fmt, ...) pr_err("ibtrs L%d ERR: " fmt, \
                             ^~~~~~
>> drivers/infiniband/ulp/ibtrs_server/ibtrs_srv.c:2617:3: note: in expansion of macro 'ERR_NP'
      ERR_NP("Establishing connection failed, "
      ^~~~~~
   include/linux/kern_levels.h:4:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 5 has type 'unsigned int' [-Wformat=]
    #define KERN_SOH "\001"  /* ASCII Start Of Header */
                     ^
   include/linux/kern_levels.h:10:18: note: in expansion of macro 'KERN_SOH'
    #define KERN_ERR KERN_SOH "3" /* error conditions */
                     ^~~~~~~~
   include/linux/printk.h:301:9: note: in expansion of macro 'KERN_ERR'
     printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
            ^~~~~~~~
   include/rdma/ibtrs_log.h:62:26: note: in expansion of macro 'pr_err'
    #define ERR_NP(fmt, ...) pr_err("ibtrs L%d ERR: " fmt, \
                             ^~~~~~
>> drivers/infiniband/ulp/ibtrs_server/ibtrs_srv.c:2617:3: note: in expansion of macro 'ERR_NP'
      ERR_NP("Establishing connection failed, "
      ^~~~~~

vim +/ERR_NP +59 drivers/infiniband/ulp/ibtrs_server/../ibtrs_lib/ibtrs-proto.c

f2a5844d Jack Wang 2017-03-24  43   * POSSIBILITY OF SUCH DAMAGES.
f2a5844d Jack Wang 2017-03-24  44   *
f2a5844d Jack Wang 2017-03-24  45   */
f2a5844d Jack Wang 2017-03-24  46  
f2a5844d Jack Wang 2017-03-24  47  #include <linux/errno.h>
f2a5844d Jack Wang 2017-03-24  48  #include <linux/printk.h>
f2a5844d Jack Wang 2017-03-24  49  #include <rdma/ibtrs.h>
f2a5844d Jack Wang 2017-03-24  50  #include <rdma/ibtrs_log.h>
f2a5844d Jack Wang 2017-03-24  51  
f2a5844d Jack Wang 2017-03-24  52  static int
f2a5844d Jack Wang 2017-03-24  53  ibtrs_validate_msg_sess_open_resp(const struct ibtrs_msg_sess_open_resp *msg)
f2a5844d Jack Wang 2017-03-24  54  {
f2a5844d Jack Wang 2017-03-24  55  	static const int min_bufs = 1;
f2a5844d Jack Wang 2017-03-24  56  
f2a5844d Jack Wang 2017-03-24  57  	if (unlikely(msg->hdr.tsize !=
f2a5844d Jack Wang 2017-03-24  58  				IBTRS_MSG_SESS_OPEN_RESP_LEN(msg->cnt))) {
f2a5844d Jack Wang 2017-03-24 @59  		ERR_NP("Session open resp msg received with unexpected length"
f2a5844d Jack Wang 2017-03-24  60  		       " %dB instead of %luB\n", msg->hdr.tsize,
f2a5844d Jack Wang 2017-03-24  61  		       IBTRS_MSG_SESS_OPEN_RESP_LEN(msg->cnt));
f2a5844d Jack Wang 2017-03-24  62  
f2a5844d Jack Wang 2017-03-24  63  		return -EINVAL;
f2a5844d Jack Wang 2017-03-24  64  	}
f2a5844d Jack Wang 2017-03-24  65  
f2a5844d Jack Wang 2017-03-24  66  	if (msg->max_inflight_msg < min_bufs) {
f2a5844d Jack Wang 2017-03-24  67  		ERR_NP("Sess Open msg received with invalid max_inflight_msg %d"

:::::: The code at line 59 was first introduced by commit
:::::: f2a5844d27aa77dee51bee108f1654f9ca4a3ac6 ibtrs_lib: add common functions shared by client and server

:::::: TO: Jack Wang <jinpu.wang@profitbricks.com>
:::::: CC: 0day robot <fengguang.wu@intel.com>
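
Reports like this one all boil down to the same 32-bit issue: the flagged arguments are either size_t (unsigned int on i386) or message-length macros that evaluate to unsigned int there, while the format strings hard-code %lu. A minimal sketch of one portable fix, applied to the check quoted above and reusing only names from the patch context: print the computed length as a size_t via %zu:

    if (unlikely(msg->hdr.tsize !=
                 IBTRS_MSG_SESS_OPEN_RESP_LEN(msg->cnt))) {
            ERR_NP("Session open resp msg received with unexpected length %dB instead of %zuB\n",
                   msg->hdr.tsize,
                   (size_t)IBTRS_MSG_SESS_OPEN_RESP_LEN(msg->cnt));
            return -EINVAL;
    }

On x86_64 size_t happens to be unsigned long, which is why %lu builds cleanly there; %zu is correct on both architectures.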

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 59004 bytes --]

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 20/28] ibnbd_clt: add Makefile and Kconfig
  2017-03-24 10:45   ` Jack Wang
@ 2017-03-25  8:38     ` kbuild test robot
  -1 siblings, 0 replies; 87+ messages in thread
From: kbuild test robot @ 2017-03-25  8:38 UTC (permalink / raw)
  To: Jack Wang
  Cc: kbuild-all, linux-block, linux-rdma, dledford, axboe, hch, mail,
	Milind.dumbare, yun.wang, Jack Wang

[-- Attachment #1: Type: text/plain, Size: 6230 bytes --]

Hi Jack,

[auto build test WARNING on linus/master]
[also build test WARNING on v4.11-rc3 next-20170324]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Jack-Wang/INFINIBAND-NETWORK-BLOCK-DEVICE-IBNBD/20170325-101629
config: i386-allyesconfig (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All warnings (new ones prefixed by >>):

   drivers/block/ibnbd_client/../ibnbd_lib/ibnbd-proto.c: In function 'ibnbd_validate_msg_sess_info':
>> drivers/block/ibnbd_client/../ibnbd_lib/ibnbd-proto.c:54:10: warning: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'size_t {aka unsigned int}' [-Wformat=]
      ERR_NP("Sess info message with unexpected length received"
             ^~~~~~
>> drivers/block/ibnbd_client/../ibnbd_lib/ibnbd-proto.c:54:10: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'unsigned int' [-Wformat=]
   drivers/block/ibnbd_client/../ibnbd_lib/ibnbd-proto.c: In function 'ibnbd_validate_msg_sess_info_rsp':
   drivers/block/ibnbd_client/../ibnbd_lib/ibnbd-proto.c:67:10: warning: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'size_t {aka unsigned int}' [-Wformat=]
      ERR_NP("Sess info message with unexpected length received"
             ^~~~~~
   drivers/block/ibnbd_client/../ibnbd_lib/ibnbd-proto.c:67:10: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'unsigned int' [-Wformat=]
   drivers/block/ibnbd_client/../ibnbd_lib/ibnbd-proto.c: In function 'ibnbd_validate_msg_open_resp':
   drivers/block/ibnbd_client/../ibnbd_lib/ibnbd-proto.c:82:10: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'unsigned int' [-Wformat=]
      ERR_NP("Open Response msg received with unexpected length"
             ^~~~~~
   drivers/block/ibnbd_client/../ibnbd_lib/ibnbd-proto.c: In function 'ibnbd_validate_msg_revalidate':
   drivers/block/ibnbd_client/../ibnbd_lib/ibnbd-proto.c:114:10: warning: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'size_t {aka unsigned int}' [-Wformat=]
      ERR_NP("Device resize message with unexpected length received"
             ^~~~~~
   drivers/block/ibnbd_client/../ibnbd_lib/ibnbd-proto.c:114:10: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'unsigned int' [-Wformat=]
   drivers/block/ibnbd_client/../ibnbd_lib/ibnbd-proto.c: In function 'ibnbd_validate_msg_open':
   drivers/block/ibnbd_client/../ibnbd_lib/ibnbd-proto.c:126:10: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'unsigned int' [-Wformat=]
      ERR_NP("Open msg received with unexpected length"
             ^~~~~~
   drivers/block/ibnbd_client/../ibnbd_lib/ibnbd-proto.c: In function 'ibnbd_validate_msg_close':
   drivers/block/ibnbd_client/../ibnbd_lib/ibnbd-proto.c:151:10: warning: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'size_t {aka unsigned int}' [-Wformat=]
      ERR_NP("Close msg received with unexpected length %lu instead"
             ^~~~~~
   drivers/block/ibnbd_client/../ibnbd_lib/ibnbd-proto.c:151:10: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'unsigned int' [-Wformat=]
   drivers/block/ibnbd_client/../ibnbd_lib/ibnbd-proto.c: In function 'ibnbd_validate_msg_close_rsp':
   drivers/block/ibnbd_client/../ibnbd_lib/ibnbd-proto.c:163:10: warning: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'size_t {aka unsigned int}' [-Wformat=]
      ERR_NP("Close_rsp msg received with unexpected length %lu"
             ^~~~~~
   drivers/block/ibnbd_client/../ibnbd_lib/ibnbd-proto.c:163:10: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'unsigned int' [-Wformat=]

vim +54 drivers/block/ibnbd_client/../ibnbd_lib/ibnbd-proto.c

46a31b32 Jack Wang 2017-03-24  38   * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
46a31b32 Jack Wang 2017-03-24  39   * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
46a31b32 Jack Wang 2017-03-24  40   * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
46a31b32 Jack Wang 2017-03-24  41   * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
46a31b32 Jack Wang 2017-03-24  42   * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
46a31b32 Jack Wang 2017-03-24  43   * POSSIBILITY OF SUCH DAMAGES.
46a31b32 Jack Wang 2017-03-24  44   *
46a31b32 Jack Wang 2017-03-24  45   */
46a31b32 Jack Wang 2017-03-24  46  
46a31b32 Jack Wang 2017-03-24  47  #include "../ibnbd_inc/ibnbd-proto.h"
46a31b32 Jack Wang 2017-03-24  48  #include "../ibnbd_inc/log.h"
46a31b32 Jack Wang 2017-03-24  49  
46a31b32 Jack Wang 2017-03-24  50  static int ibnbd_validate_msg_sess_info(const struct ibnbd_msg_sess_info *msg,
46a31b32 Jack Wang 2017-03-24  51  					size_t len)
46a31b32 Jack Wang 2017-03-24  52  {
46a31b32 Jack Wang 2017-03-24  53  	if (unlikely(len != sizeof(*msg))) {
46a31b32 Jack Wang 2017-03-24 @54  		ERR_NP("Sess info message with unexpected length received"
46a31b32 Jack Wang 2017-03-24  55  		       " %lu instead of %lu\n", len, sizeof(*msg));
46a31b32 Jack Wang 2017-03-24  56  		return -EINVAL;
46a31b32 Jack Wang 2017-03-24  57  	}
46a31b32 Jack Wang 2017-03-24  58  
46a31b32 Jack Wang 2017-03-24  59  	return 0;
46a31b32 Jack Wang 2017-03-24  60  }
46a31b32 Jack Wang 2017-03-24  61  
46a31b32 Jack Wang 2017-03-24  62  static int

:::::: The code at line 54 was first introduced by commit
:::::: 46a31b323d8198184e9325139e4906941d9ef007 ibnbd: add shared library functions

:::::: TO: Jack Wang <jinpu.wang@profitbricks.com>
:::::: CC: 0day robot <fengguang.wu@intel.com>
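
Here the flagged arguments are len (a size_t parameter) and sizeof(*msg), so switching the specifiers to %zu is enough; no cast is needed. A sketch of the validator quoted above with only the format string changed (the struct and the ERR_NP macro come from the patch's own headers):

    static int ibnbd_validate_msg_sess_info(const struct ibnbd_msg_sess_info *msg,
                                            size_t len)
    {
            if (unlikely(len != sizeof(*msg))) {
                    ERR_NP("Sess info message with unexpected length received %zu instead of %zu\n",
                           len, sizeof(*msg));
                    return -EINVAL;
            }

            return 0;
    }

The remaining warnings in this report are the same pattern repeated in the other ibnbd-proto.c validators.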

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 58350 bytes --]

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 26/28] ibnbd_srv: add Makefile and Kconfig
@ 2017-03-25  9:27     ` kbuild test robot
  0 siblings, 0 replies; 87+ messages in thread
From: kbuild test robot @ 2017-03-25  9:27 UTC (permalink / raw)
  To: Jack Wang
  Cc: kbuild-all, linux-block, linux-rdma, dledford, axboe, hch, mail,
	Milind.dumbare, yun.wang, Jack Wang

[-- Attachment #1: Type: text/plain, Size: 6782 bytes --]

Hi Jack,

[auto build test WARNING on linus/master]
[also build test WARNING on v4.11-rc3 next-20170324]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Jack-Wang/INFINIBAND-NETWORK-BLOCK-DEVICE-IBNBD/20170325-101629
config: i386-allyesconfig (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All warnings (new ones prefixed by >>):

   drivers/block/ibnbd_server/ibnbd_srv.c: In function 'ibnbd_srv_revalidate_sess_dev':
>> drivers/block/ibnbd_server/ibnbd_srv.c:994:10: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'size_t {aka unsigned int}' [-Wformat=]
      INFO(sess_dev, "notified client about device size change"
             ^~~~~~
   drivers/block/ibnbd_server/ibnbd_srv.c:994:10: warning: format '%lu' expects argument of type 'long unsigned int', but argument 5 has type 'size_t {aka unsigned int}' [-Wformat=]
--
   drivers/block/ibnbd_server/../ibnbd_lib/ibnbd-proto.c: In function 'ibnbd_validate_msg_sess_info':
>> drivers/block/ibnbd_server/../ibnbd_lib/ibnbd-proto.c:54:10: warning: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'size_t {aka unsigned int}' [-Wformat=]
      ERR_NP("Sess info message with unexpected length received"
             ^~~~~~
>> drivers/block/ibnbd_server/../ibnbd_lib/ibnbd-proto.c:54:10: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'unsigned int' [-Wformat=]
   drivers/block/ibnbd_server/../ibnbd_lib/ibnbd-proto.c: In function 'ibnbd_validate_msg_sess_info_rsp':
   drivers/block/ibnbd_server/../ibnbd_lib/ibnbd-proto.c:67:10: warning: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'size_t {aka unsigned int}' [-Wformat=]
      ERR_NP("Sess info message with unexpected length received"
             ^~~~~~
   drivers/block/ibnbd_server/../ibnbd_lib/ibnbd-proto.c:67:10: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'unsigned int' [-Wformat=]
   drivers/block/ibnbd_server/../ibnbd_lib/ibnbd-proto.c: In function 'ibnbd_validate_msg_open_resp':
   drivers/block/ibnbd_server/../ibnbd_lib/ibnbd-proto.c:82:10: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'unsigned int' [-Wformat=]
      ERR_NP("Open Response msg received with unexpected length"
             ^~~~~~
   drivers/block/ibnbd_server/../ibnbd_lib/ibnbd-proto.c: In function 'ibnbd_validate_msg_revalidate':
   drivers/block/ibnbd_server/../ibnbd_lib/ibnbd-proto.c:114:10: warning: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'size_t {aka unsigned int}' [-Wformat=]
      ERR_NP("Device resize message with unexpected length received"
             ^~~~~~
   drivers/block/ibnbd_server/../ibnbd_lib/ibnbd-proto.c:114:10: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'unsigned int' [-Wformat=]
   drivers/block/ibnbd_server/../ibnbd_lib/ibnbd-proto.c: In function 'ibnbd_validate_msg_open':
   drivers/block/ibnbd_server/../ibnbd_lib/ibnbd-proto.c:126:10: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'unsigned int' [-Wformat=]
      ERR_NP("Open msg received with unexpected length"
             ^~~~~~
   drivers/block/ibnbd_server/../ibnbd_lib/ibnbd-proto.c: In function 'ibnbd_validate_msg_close':
   drivers/block/ibnbd_server/../ibnbd_lib/ibnbd-proto.c:151:10: warning: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'size_t {aka unsigned int}' [-Wformat=]
      ERR_NP("Close msg received with unexpected length %lu instead"
             ^~~~~~
   drivers/block/ibnbd_server/../ibnbd_lib/ibnbd-proto.c:151:10: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'unsigned int' [-Wformat=]
   drivers/block/ibnbd_server/../ibnbd_lib/ibnbd-proto.c: In function 'ibnbd_validate_msg_close_rsp':
   drivers/block/ibnbd_server/../ibnbd_lib/ibnbd-proto.c:163:10: warning: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'size_t {aka unsigned int}' [-Wformat=]
      ERR_NP("Close_rsp msg received with unexpected length %lu"
             ^~~~~~
   drivers/block/ibnbd_server/../ibnbd_lib/ibnbd-proto.c:163:10: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'unsigned int' [-Wformat=]
   warning: __mcount_loc already exists: drivers/block/ibnbd_server/../ibnbd_lib/ibnbd-proto.o

vim +994 drivers/block/ibnbd_server/ibnbd_srv.c

346428ec Jack Wang 2017-03-24   978  	msg.nsectors		= nsectors;
346428ec Jack Wang 2017-03-24   979  
346428ec Jack Wang 2017-03-24   980  	if (unlikely(sess_dev->sess->state == SESS_STATE_DISCONNECTED))
346428ec Jack Wang 2017-03-24   981  		return -ENODEV;
346428ec Jack Wang 2017-03-24   982  
346428ec Jack Wang 2017-03-24   983  	if (!sess_dev->is_visible) {
346428ec Jack Wang 2017-03-24   984  		INFO(sess_dev, "revalidate device failed, wait for sending "
346428ec Jack Wang 2017-03-24   985  		     "open reply first\n");
346428ec Jack Wang 2017-03-24   986  		return -EAGAIN;
346428ec Jack Wang 2017-03-24   987  	}
346428ec Jack Wang 2017-03-24   988  
346428ec Jack Wang 2017-03-24   989  	ret = ibtrs_srv_send(sess_dev->sess->ibtrs_sess, &vec, 1);
346428ec Jack Wang 2017-03-24   990  	if (unlikely(ret)) {
346428ec Jack Wang 2017-03-24   991  		ERR(sess_dev, "revalidate: Sending new device size"
346428ec Jack Wang 2017-03-24   992  		    " to client failed, errno: %d\n", ret);
346428ec Jack Wang 2017-03-24   993  	} else {
346428ec Jack Wang 2017-03-24  @994  		INFO(sess_dev, "notified client about device size change"
346428ec Jack Wang 2017-03-24   995  		     " (old nsectors: %lu, new nsectors: %lu)\n",
346428ec Jack Wang 2017-03-24   996  		     sess_dev->nsectors, nsectors);
346428ec Jack Wang 2017-03-24   997  		sess_dev->nsectors = nsectors;
346428ec Jack Wang 2017-03-24   998  	}
346428ec Jack Wang 2017-03-24   999  
346428ec Jack Wang 2017-03-24  1000  	return ret;
346428ec Jack Wang 2017-03-24  1001  }
346428ec Jack Wang 2017-03-24  1002  

:::::: The code at line 994 was first introduced by commit
:::::: 346428ec19d9ec225850f10b7fc26d98051d5f58 ibnbd_srv: add main functionality

:::::: TO: Jack Wang <jinpu.wang@profitbricks.com>
:::::: CC: 0day robot <fengguang.wu@intel.com>
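
Two separate things show up in this report. The nsectors values are size_t, so the INFO format wants %zu rather than %lu; a sketch of the corrected line, using only identifiers visible in the quoted context:

    INFO(sess_dev, "notified client about device size change (old nsectors: %zu, new nsectors: %zu)\n",
         sess_dev->nsectors, nsectors);

The trailing "__mcount_loc already exists" line looks like a side effect of ibnbd-proto.c being built into the same object twice through the ../ibnbd_lib relative path (once from the client Makefile, once from the server Makefile); building the shared code only once, as sketched after the ibtrs link-error report below, should also make that warning disappear.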

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 58354 bytes --]

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 13/28] ibtrs_srv: add Makefile and Kconfig
@ 2017-03-25 10:54     ` kbuild test robot
  0 siblings, 0 replies; 87+ messages in thread
From: kbuild test robot @ 2017-03-25 10:54 UTC (permalink / raw)
  To: Jack Wang
  Cc: kbuild-all, linux-block, linux-rdma, dledford, axboe, hch, mail,
	Milind.dumbare, yun.wang, Jack Wang

[-- Attachment #1: Type: text/plain, Size: 5997 bytes --]

Hi Jack,

[auto build test ERROR on linus/master]
[also build test ERROR on v4.11-rc3 next-20170324]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Jack-Wang/INFINIBAND-NETWORK-BLOCK-DEVICE-IBNBD/20170325-101629
config: arm-allyesconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
        wget https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=arm 

All errors (new ones prefixed by >>):

   drivers/infiniband/ulp/ibtrs_server/built-in.o: In function `ib_con_destroy':
>> common.c:(.text+0x81bc): multiple definition of `ib_con_destroy'
   drivers/infiniband/ulp/ibtrs_client/built-in.o:common.c:(.text+0xcca8): first defined here
   drivers/infiniband/ulp/ibtrs_server/built-in.o: In function `ibtrs_malloc':
>> common.c:(.text+0x8f0c): multiple definition of `ibtrs_malloc'
   drivers/infiniband/ulp/ibtrs_client/built-in.o:common.c:(.text+0xd9f8): first defined here
   drivers/infiniband/ulp/ibtrs_server/built-in.o: In function `ibtrs_write_empty_imm':
>> common.c:(.text+0x7f34): multiple definition of `ibtrs_write_empty_imm'
   drivers/infiniband/ulp/ibtrs_client/built-in.o:common.c:(.text+0xca20): first defined here
   drivers/infiniband/ulp/ibtrs_server/built-in.o: In function `ib_post_rdma_write':
>> common.c:(.text+0x8044): multiple definition of `ib_post_rdma_write'
   drivers/infiniband/ulp/ibtrs_client/built-in.o:common.c:(.text+0xcb30): first defined here
   drivers/infiniband/ulp/ibtrs_server/built-in.o: In function `fill_ibtrs_msg_sess_open':
>> common.c:(.text+0x86dc): multiple definition of `fill_ibtrs_msg_sess_open'
   drivers/infiniband/ulp/ibtrs_client/built-in.o:common.c:(.text+0xd1c8): first defined here
   drivers/infiniband/ulp/ibtrs_server/built-in.o: In function `ibtrs_addr_to_str':
>> common.c:(.text+0x7e80): multiple definition of `ibtrs_addr_to_str'
   drivers/infiniband/ulp/ibtrs_client/built-in.o:common.c:(.text+0xc96c): first defined here
   drivers/infiniband/ulp/ibtrs_server/built-in.o: In function `ibtrs_post_send':
>> common.c:(.text+0x7f94): multiple definition of `ibtrs_post_send'
   drivers/infiniband/ulp/ibtrs_client/built-in.o:common.c:(.text+0xca80): first defined here
   drivers/infiniband/ulp/ibtrs_server/built-in.o: In function `ibtrs_iu_put':
>> common.c:(.text+0x87d8): multiple definition of `ibtrs_iu_put'
   drivers/infiniband/ulp/ibtrs_client/built-in.o:common.c:(.text+0xd2c4): first defined here
   drivers/infiniband/ulp/ibtrs_server/built-in.o: In function `ib_session_init':
>> common.c:(.text+0x80e0): multiple definition of `ib_session_init'
   drivers/infiniband/ulp/ibtrs_client/built-in.o:common.c:(.text+0xcbcc): first defined here
   drivers/infiniband/ulp/ibtrs_server/built-in.o: In function `post_beacon':
>> common.c:(.text+0x8244): multiple definition of `post_beacon'
   drivers/infiniband/ulp/ibtrs_client/built-in.o:common.c:(.text+0xcd30): first defined here
   drivers/infiniband/ulp/ibtrs_server/built-in.o: In function `ibtrs_request_cq_notifications':
>> common.c:(.text+0x8194): multiple definition of `ibtrs_request_cq_notifications'
   drivers/infiniband/ulp/ibtrs_client/built-in.o:common.c:(.text+0xcc80): first defined here
   drivers/infiniband/ulp/ibtrs_server/built-in.o: In function `ibtrs_iu_get':
>> common.c:(.text+0x8818): multiple definition of `ibtrs_iu_get'
   drivers/infiniband/ulp/ibtrs_client/built-in.o:common.c:(.text+0xd304): first defined here
   drivers/infiniband/ulp/ibtrs_server/built-in.o: In function `ibtrs_heartbeat_set_send_ts':
>> common.c:(.text+0x8b30): multiple definition of `ibtrs_heartbeat_set_send_ts'
   drivers/infiniband/ulp/ibtrs_client/built-in.o:common.c:(.text+0xd61c): first defined here
   drivers/infiniband/ulp/ibtrs_server/built-in.o: In function `ibtrs_iu_free':
>> common.c:(.text+0x8a60): multiple definition of `ibtrs_iu_free'
   drivers/infiniband/ulp/ibtrs_client/built-in.o:common.c:(.text+0xd54c): first defined here
   drivers/infiniband/ulp/ibtrs_server/built-in.o: In function `ibtrs_heartbeat_send_ts_diff_ms':
>> common.c:(.text+0x8bf0): multiple definition of `ibtrs_heartbeat_send_ts_diff_ms'
   drivers/infiniband/ulp/ibtrs_client/built-in.o:common.c:(.text+0xd6dc): first defined here
   drivers/infiniband/ulp/ibtrs_server/built-in.o: In function `ib_get_max_wr_queue_size':
>> common.c:(.text+0x80c4): multiple definition of `ib_get_max_wr_queue_size'
   drivers/infiniband/ulp/ibtrs_client/built-in.o:common.c:(.text+0xcbb0): first defined here
   drivers/infiniband/ulp/ibtrs_server/built-in.o: In function `fill_ibtrs_msg_con_open':
>> common.c:(.text+0x8730): multiple definition of `fill_ibtrs_msg_con_open'
   drivers/infiniband/ulp/ibtrs_client/built-in.o:common.c:(.text+0xd21c): first defined here
   drivers/infiniband/ulp/ibtrs_server/built-in.o: In function `ibtrs_set_last_heartbeat':
>> common.c:(.text+0x8b90): multiple definition of `ibtrs_set_last_heartbeat'
   drivers/infiniband/ulp/ibtrs_client/built-in.o:common.c:(.text+0xd67c): first defined here
   drivers/infiniband/ulp/ibtrs_server/built-in.o: In function `ibtrs_zalloc':
>> common.c:(.text+0x8f44): multiple definition of `ibtrs_zalloc'
   drivers/infiniband/ulp/ibtrs_client/built-in.o:common.c:(.text+0xda30): first defined here
   drivers/infiniband/ulp/ibtrs_server/built-in.o: In function `ibtrs_heartbeat_warn':
>> common.c:(.text+0x8c4c): multiple definition of `ibtrs_heartbeat_warn'
   drivers/infiniband/ulp/ibtrs_client/built-in.o:common.c:(.text+0xd738): first defined here
   drivers/infiniband/ulp/ibtrs_server/built-in.o: In function `ib_session_destroy':

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 60767 bytes --]

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 20/28] ibnbd_clt: add Makefile and Kconfig
  2017-03-24 10:45   ` Jack Wang
@ 2017-03-25 11:17     ` kbuild test robot
  -1 siblings, 0 replies; 87+ messages in thread
From: kbuild test robot @ 2017-03-25 11:17 UTC (permalink / raw)
  To: Jack Wang
  Cc: kbuild-all, linux-block, linux-rdma, dledford, axboe, hch, mail,
	Milind.dumbare, yun.wang, Jack Wang

[-- Attachment #1: Type: text/plain, Size: 14414 bytes --]

Hi Jack,

[auto build test WARNING on linus/master]
[also build test WARNING on next-20170324]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Jack-Wang/INFINIBAND-NETWORK-BLOCK-DEVICE-IBNBD/20170325-101629
config: i386-allmodconfig (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All warnings (new ones prefixed by >>):

   In file included from include/linux/printk.h:6:0,
                    from include/linux/kernel.h:13,
                    from arch/x86/include/asm/percpu.h:44,
                    from arch/x86/include/asm/current.h:5,
                    from include/linux/sched.h:11,
                    from include/linux/blkdev.h:4,
                    from drivers/block/ibnbd_client/../ibnbd_lib/../ibnbd_inc/ibnbd.h:50,
                    from drivers/block/ibnbd_client/../ibnbd_lib/../ibnbd_inc/ibnbd-proto.h:50,
                    from drivers/block/ibnbd_client/../ibnbd_lib/ibnbd-proto.c:47:
   drivers/block/ibnbd_client/../ibnbd_lib/ibnbd-proto.c: In function 'ibnbd_validate_msg_sess_info':
>> include/linux/kern_levels.h:4:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'size_t {aka unsigned int}' [-Wformat=]
    #define KERN_SOH "\001"  /* ASCII Start Of Header */
                     ^
   include/linux/kern_levels.h:10:18: note: in expansion of macro 'KERN_SOH'
    #define KERN_ERR KERN_SOH "3" /* error conditions */
                     ^~~~~~~~
   include/linux/printk.h:301:9: note: in expansion of macro 'KERN_ERR'
     printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
            ^~~~~~~~
>> drivers/block/ibnbd_client/../ibnbd_lib/../ibnbd_inc/log.h:50:26: note: in expansion of macro 'pr_err'
    #define ERR_NP(fmt, ...) pr_err("ibnbd L%d ERR: " fmt, \
                             ^~~~~~
>> drivers/block/ibnbd_client/../ibnbd_lib/ibnbd-proto.c:54:3: note: in expansion of macro 'ERR_NP'
      ERR_NP("Sess info message with unexpected length received"
      ^~~~~~
   include/linux/kern_levels.h:4:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'unsigned int' [-Wformat=]
    #define KERN_SOH "\001"  /* ASCII Start Of Header */
                     ^
   include/linux/kern_levels.h:10:18: note: in expansion of macro 'KERN_SOH'
    #define KERN_ERR KERN_SOH "3" /* error conditions */
                     ^~~~~~~~
   include/linux/printk.h:301:9: note: in expansion of macro 'KERN_ERR'
     printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
            ^~~~~~~~
>> drivers/block/ibnbd_client/../ibnbd_lib/../ibnbd_inc/log.h:50:26: note: in expansion of macro 'pr_err'
    #define ERR_NP(fmt, ...) pr_err("ibnbd L%d ERR: " fmt, \
                             ^~~~~~
>> drivers/block/ibnbd_client/../ibnbd_lib/ibnbd-proto.c:54:3: note: in expansion of macro 'ERR_NP'
      ERR_NP("Sess info message with unexpected length received"
      ^~~~~~
   drivers/block/ibnbd_client/../ibnbd_lib/ibnbd-proto.c: In function 'ibnbd_validate_msg_sess_info_rsp':
>> include/linux/kern_levels.h:4:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'size_t {aka unsigned int}' [-Wformat=]
    #define KERN_SOH "\001"  /* ASCII Start Of Header */
                     ^
   include/linux/kern_levels.h:10:18: note: in expansion of macro 'KERN_SOH'
    #define KERN_ERR KERN_SOH "3" /* error conditions */
                     ^~~~~~~~
   include/linux/printk.h:301:9: note: in expansion of macro 'KERN_ERR'
     printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
            ^~~~~~~~
>> drivers/block/ibnbd_client/../ibnbd_lib/../ibnbd_inc/log.h:50:26: note: in expansion of macro 'pr_err'
    #define ERR_NP(fmt, ...) pr_err("ibnbd L%d ERR: " fmt, \
                             ^~~~~~
   drivers/block/ibnbd_client/../ibnbd_lib/ibnbd-proto.c:67:3: note: in expansion of macro 'ERR_NP'
      ERR_NP("Sess info message with unexpected length received"
      ^~~~~~
   include/linux/kern_levels.h:4:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'unsigned int' [-Wformat=]
    #define KERN_SOH "\001"  /* ASCII Start Of Header */
                     ^
   include/linux/kern_levels.h:10:18: note: in expansion of macro 'KERN_SOH'
    #define KERN_ERR KERN_SOH "3" /* error conditions */
                     ^~~~~~~~
   include/linux/printk.h:301:9: note: in expansion of macro 'KERN_ERR'
     printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
            ^~~~~~~~
>> drivers/block/ibnbd_client/../ibnbd_lib/../ibnbd_inc/log.h:50:26: note: in expansion of macro 'pr_err'
    #define ERR_NP(fmt, ...) pr_err("ibnbd L%d ERR: " fmt, \
                             ^~~~~~
   drivers/block/ibnbd_client/../ibnbd_lib/ibnbd-proto.c:67:3: note: in expansion of macro 'ERR_NP'
      ERR_NP("Sess info message with unexpected length received"
      ^~~~~~
   drivers/block/ibnbd_client/../ibnbd_lib/ibnbd-proto.c: In function 'ibnbd_validate_msg_open_resp':
   include/linux/kern_levels.h:4:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'unsigned int' [-Wformat=]
    #define KERN_SOH "\001"  /* ASCII Start Of Header */
                     ^
   include/linux/kern_levels.h:10:18: note: in expansion of macro 'KERN_SOH'
    #define KERN_ERR KERN_SOH "3" /* error conditions */
                     ^~~~~~~~
   include/linux/printk.h:301:9: note: in expansion of macro 'KERN_ERR'
     printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
            ^~~~~~~~
>> drivers/block/ibnbd_client/../ibnbd_lib/../ibnbd_inc/log.h:50:26: note: in expansion of macro 'pr_err'
    #define ERR_NP(fmt, ...) pr_err("ibnbd L%d ERR: " fmt, \
                             ^~~~~~
   drivers/block/ibnbd_client/../ibnbd_lib/ibnbd-proto.c:82:3: note: in expansion of macro 'ERR_NP'
      ERR_NP("Open Response msg received with unexpected length"
      ^~~~~~
   drivers/block/ibnbd_client/../ibnbd_lib/ibnbd-proto.c: In function 'ibnbd_validate_msg_revalidate':
>> include/linux/kern_levels.h:4:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'size_t {aka unsigned int}' [-Wformat=]
    #define KERN_SOH "\001"  /* ASCII Start Of Header */
                     ^
   include/linux/kern_levels.h:10:18: note: in expansion of macro 'KERN_SOH'
    #define KERN_ERR KERN_SOH "3" /* error conditions */
                     ^~~~~~~~
   include/linux/printk.h:301:9: note: in expansion of macro 'KERN_ERR'
     printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
            ^~~~~~~~
>> drivers/block/ibnbd_client/../ibnbd_lib/../ibnbd_inc/log.h:50:26: note: in expansion of macro 'pr_err'
    #define ERR_NP(fmt, ...) pr_err("ibnbd L%d ERR: " fmt, \
                             ^~~~~~
   drivers/block/ibnbd_client/../ibnbd_lib/ibnbd-proto.c:114:3: note: in expansion of macro 'ERR_NP'
      ERR_NP("Device resize message with unexpected length received"
      ^~~~~~
   include/linux/kern_levels.h:4:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'unsigned int' [-Wformat=]
    #define KERN_SOH "\001"  /* ASCII Start Of Header */
                     ^
   include/linux/kern_levels.h:10:18: note: in expansion of macro 'KERN_SOH'
    #define KERN_ERR KERN_SOH "3" /* error conditions */
                     ^~~~~~~~
   include/linux/printk.h:301:9: note: in expansion of macro 'KERN_ERR'
     printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
            ^~~~~~~~
>> drivers/block/ibnbd_client/../ibnbd_lib/../ibnbd_inc/log.h:50:26: note: in expansion of macro 'pr_err'
    #define ERR_NP(fmt, ...) pr_err("ibnbd L%d ERR: " fmt, \
                             ^~~~~~
   drivers/block/ibnbd_client/../ibnbd_lib/ibnbd-proto.c:114:3: note: in expansion of macro 'ERR_NP'
      ERR_NP("Device resize message with unexpected length received"
      ^~~~~~
   drivers/block/ibnbd_client/../ibnbd_lib/ibnbd-proto.c: In function 'ibnbd_validate_msg_open':
   include/linux/kern_levels.h:4:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'unsigned int' [-Wformat=]
    #define KERN_SOH "\001"  /* ASCII Start Of Header */
                     ^
   include/linux/kern_levels.h:10:18: note: in expansion of macro 'KERN_SOH'
    #define KERN_ERR KERN_SOH "3" /* error conditions */
                     ^~~~~~~~
   include/linux/printk.h:301:9: note: in expansion of macro 'KERN_ERR'
     printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
            ^~~~~~~~
>> drivers/block/ibnbd_client/../ibnbd_lib/../ibnbd_inc/log.h:50:26: note: in expansion of macro 'pr_err'
    #define ERR_NP(fmt, ...) pr_err("ibnbd L%d ERR: " fmt, \
                             ^~~~~~
   drivers/block/ibnbd_client/../ibnbd_lib/ibnbd-proto.c:126:3: note: in expansion of macro 'ERR_NP'
      ERR_NP("Open msg received with unexpected length"
      ^~~~~~
   drivers/block/ibnbd_client/../ibnbd_lib/ibnbd-proto.c: In function 'ibnbd_validate_msg_close':
>> include/linux/kern_levels.h:4:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'size_t {aka unsigned int}' [-Wformat=]
    #define KERN_SOH "\001"  /* ASCII Start Of Header */
                     ^
   include/linux/kern_levels.h:10:18: note: in expansion of macro 'KERN_SOH'
    #define KERN_ERR KERN_SOH "3" /* error conditions */
                     ^~~~~~~~
   include/linux/printk.h:301:9: note: in expansion of macro 'KERN_ERR'
     printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
            ^~~~~~~~
>> drivers/block/ibnbd_client/../ibnbd_lib/../ibnbd_inc/log.h:50:26: note: in expansion of macro 'pr_err'
    #define ERR_NP(fmt, ...) pr_err("ibnbd L%d ERR: " fmt, \
                             ^~~~~~
   drivers/block/ibnbd_client/../ibnbd_lib/ibnbd-proto.c:151:3: note: in expansion of macro 'ERR_NP'
      ERR_NP("Close msg received with unexpected length %lu instead"
      ^~~~~~
   include/linux/kern_levels.h:4:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'unsigned int' [-Wformat=]
    #define KERN_SOH "\001"  /* ASCII Start Of Header */
                     ^
   include/linux/kern_levels.h:10:18: note: in expansion of macro 'KERN_SOH'
    #define KERN_ERR KERN_SOH "3" /* error conditions */
                     ^~~~~~~~
   include/linux/printk.h:301:9: note: in expansion of macro 'KERN_ERR'
     printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
            ^~~~~~~~
>> drivers/block/ibnbd_client/../ibnbd_lib/../ibnbd_inc/log.h:50:26: note: in expansion of macro 'pr_err'
    #define ERR_NP(fmt, ...) pr_err("ibnbd L%d ERR: " fmt, \
                             ^~~~~~
   drivers/block/ibnbd_client/../ibnbd_lib/ibnbd-proto.c:151:3: note: in expansion of macro 'ERR_NP'
      ERR_NP("Close msg received with unexpected length %lu instead"
      ^~~~~~
   drivers/block/ibnbd_client/../ibnbd_lib/ibnbd-proto.c: In function 'ibnbd_validate_msg_close_rsp':
>> include/linux/kern_levels.h:4:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'size_t {aka unsigned int}' [-Wformat=]
    #define KERN_SOH "\001"  /* ASCII Start Of Header */
                     ^
   include/linux/kern_levels.h:10:18: note: in expansion of macro 'KERN_SOH'
    #define KERN_ERR KERN_SOH "3" /* error conditions */
                     ^~~~~~~~
   include/linux/printk.h:301:9: note: in expansion of macro 'KERN_ERR'
     printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
            ^~~~~~~~
>> drivers/block/ibnbd_client/../ibnbd_lib/../ibnbd_inc/log.h:50:26: note: in expansion of macro 'pr_err'
    #define ERR_NP(fmt, ...) pr_err("ibnbd L%d ERR: " fmt, \
                             ^~~~~~
   drivers/block/ibnbd_client/../ibnbd_lib/ibnbd-proto.c:163:3: note: in expansion of macro 'ERR_NP'
      ERR_NP("Close_rsp msg received with unexpected length %lu"
      ^~~~~~
   include/linux/kern_levels.h:4:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'unsigned int' [-Wformat=]
    #define KERN_SOH "\001"  /* ASCII Start Of Header */
                     ^
   include/linux/kern_levels.h:10:18: note: in expansion of macro 'KERN_SOH'
    #define KERN_ERR KERN_SOH "3" /* error conditions */
                     ^~~~~~~~
   include/linux/printk.h:301:9: note: in expansion of macro 'KERN_ERR'
     printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
            ^~~~~~~~
>> drivers/block/ibnbd_client/../ibnbd_lib/../ibnbd_inc/log.h:50:26: note: in expansion of macro 'pr_err'
    #define ERR_NP(fmt, ...) pr_err("ibnbd L%d ERR: " fmt, \
                             ^~~~~~
   drivers/block/ibnbd_client/../ibnbd_lib/ibnbd-proto.c:163:3: note: in expansion of macro 'ERR_NP'
      ERR_NP("Close_rsp msg received with unexpected length %lu"
      ^~~~~~

vim +4 include/linux/kern_levels.h

314ba352 Joe Perches 2012-07-30   1  #ifndef __KERN_LEVELS_H__
314ba352 Joe Perches 2012-07-30   2  #define __KERN_LEVELS_H__
314ba352 Joe Perches 2012-07-30   3  
04d2c8c8 Joe Perches 2012-07-30  @4  #define KERN_SOH	"\001"		/* ASCII Start Of Header */
04d2c8c8 Joe Perches 2012-07-30   5  #define KERN_SOH_ASCII	'\001'
04d2c8c8 Joe Perches 2012-07-30   6  
04d2c8c8 Joe Perches 2012-07-30   7  #define KERN_EMERG	KERN_SOH "0"	/* system is unusable */
04d2c8c8 Joe Perches 2012-07-30   8  #define KERN_ALERT	KERN_SOH "1"	/* action must be taken immediately */
04d2c8c8 Joe Perches 2012-07-30   9  #define KERN_CRIT	KERN_SOH "2"	/* critical conditions */
04d2c8c8 Joe Perches 2012-07-30  10  #define KERN_ERR	KERN_SOH "3"	/* error conditions */
04d2c8c8 Joe Perches 2012-07-30  11  #define KERN_WARNING	KERN_SOH "4"	/* warning conditions */
04d2c8c8 Joe Perches 2012-07-30  12  #define KERN_NOTICE	KERN_SOH "5"	/* normal but significant condition */

:::::: The code at line 4 was first introduced by commit
:::::: 04d2c8c83d0e3ac5f78aeede51babb3236200112 printk: convert the format for KERN_<LEVEL> to a 2 byte pattern

:::::: TO: Joe Perches <joe@perches.com>
:::::: CC: Linus Torvalds <torvalds@linux-foundation.org>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 59015 bytes --]

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC PATCH 00/28] INFINIBAND NETWORK BLOCK DEVICE (IBNBD)
@ 2017-03-27  2:20   ` Sagi Grimberg
  0 siblings, 0 replies; 87+ messages in thread
From: Sagi Grimberg @ 2017-03-27  2:20 UTC (permalink / raw)
  To: Jack Wang, linux-block, linux-rdma
  Cc: dledford, axboe, hch, mail, Milind.dumbare, yun.wang


> This series introduces IBNBD/IBTRS kernel modules.
>
> IBNBD (InfiniBand network block device) allows for an RDMA transfer of block IO
> over InfiniBand network. The driver presents itself as a block device on client
> side and transmits the block requests in a zero-copy fashion to the server-side
> via InfiniBand. The server part of the driver converts the incoming buffers back
> into BIOs and hands them down to the underlying block device. As soon as IO
> responses come back from the drive, they are being transmitted back to the
> client.

Hi Jack, Danil and Roman,

I met Danil and Roman last week at Vault, and I think you guys
are awesome, thanks a lot for open-sourcing your work! However,
I have a couple of issues here, some related to the code and
some actually fundamental.

- Is there room for this ibnbd? If we were to take every block driver
   that was submitted without sufficient justification, it'd be very
   hard to maintain. What advantage (if any) does this buy anyone over
   existing rdma based protocols (srp, iser, nvmf)? I'm really (*really*)
   not sold on this one...

- To me, the fundamental design that the client side owns a pool of
   buffers that it issues writes to seems inferior to the
   one taken in iser/nvmf (immediate data). IMO, the ibnbd design has
   scalability issues both in terms of server side resources, client
   side contention and network congestion (on infiniband the latter is
   less severe).

- I suggest that for your next post, you provide a real-life use-case
   where each of the existing drivers can't suffice, and by can't
   suffice I mean that it has a fundamental issue with it, not something
   that merely requires a fix. With that our feedback can be much more
   concrete and (at least on my part) more open to accepting it.

- I'm not exactly sure why you would suggest that your implementation
   supports only infiniband if you use rdma_cm for address resolution,
   nor do I understand why you emphasize feature (2) below, nor why,
   even in the presence of rdma_cm, you have ibtrs_ib_path? (confused...)
   iWARP needs a bit more attention if you don't use the new generic
   interfaces though...

- I honestly do not understand why you need *19057* LOC to implement
   an rdma based block driver. That's almost as large as all of our
   existing block drivers combined... A first glance at the code provides
   some explanations: (1) you have some strange code that has no business
   in a block driver, like ibtrs_malloc/ibtrs_zalloc (yikes), or
   open-coding of various existing logging routines, (2) you are for some
   reason adding a second tag allocation scheme (why?), (3) you are open
   coding a lot of stuff that we added to the stack in the past months...
   (4) you seem to over-layer your code for reasons that I do not
   really understand. And I didn't really look deep at all into the
   code, just to get the feel of it, and it seems like it needs a lot
   of work before it can even be considered upstream ready.

> We design and implement this solution based on our need for Cloud Computing,
> the key features are:
> - High throughput and low latency due to:
> 1) Only two rdma messages per IO

Where exactly did you witness latency that was meaningfully affected by
having another rdma message on the wire? That's only for writes anyway, and
we have first data bursts for that...

> 2) Simplified client side server memory management
> 3) Eliminated SCSI sublayer

That's hardly an advantage given all we are losing without it...

...

Cheers,
Sagi.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC PATCH 00/28] INFINIBAND NETWORK BLOCK DEVICE (IBNBD)
  2017-03-27  2:20   ` Sagi Grimberg
@ 2017-03-27 10:21     ` Jinpu Wang
  -1 siblings, 0 replies; 87+ messages in thread
From: Jinpu Wang @ 2017-03-27 10:21 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: linux-block, linux-rdma, Doug Ledford, Jens Axboe, hch,
	Fabian Holler, Milind Dumbare, Michael Wang, Roman Penyaev,
	Danil Kipnis

Hi Sagi,

On Mon, Mar 27, 2017 at 4:20 AM, Sagi Grimberg <sagi@grimberg.me> wrote:
>
>> This series introduces IBNBD/IBTRS kernel modules.
>>
>> IBNBD (InfiniBand network block device) allows for an RDMA transfer of
>> block IO
>> over InfiniBand network. The driver presents itself as a block device on
>> client
>> side and transmits the block requests in a zero-copy fashion to the
>> server-side
>> via InfiniBand. The server part of the driver converts the incoming
>> buffers back
>> into BIOs and hands them down to the underlying block device. As soon as
>> IO
>> responses come back from the drive, they are being transmitted back to the
>> client.
>
>
> Hi Jack, Danil and Roman,
>
> I met Danil and Roman last week at Vault, and I think you guys
> are awesome, thanks a lot for open-sourcing your work! However,
> I have a couple of issues here, some are related to the code and
> some are fundamental actually.

Thanks for the comments and suggestions; my replies are inline.

>
> - Is there room for this ibnbd? If we were to take every block driver
>   that was submitted without sufficient justification, it'd be very
>   hard to maintain. What advantage (if any) does this buy anyone over
>   existing rdma based protocols (srp, iser, nvmf)? I'm really (*really*)
>   not sold on this one...
>
> - To me, the fundamental design that the client side owns a pool of
>   buffers that it issues writes too, seems inferior than the
>   one taken in iser/nvmf (immediate data). IMO, the ibnbd design has
>   scalability issues both in terms of server side resources, client
>   side contention and network congestion (on infiniband the latter is
>   less severe).
>
> - I suggest that for your next post, you provide a real-life use-case
>   where each of the existing drivers can't suffice, and by can't
>   suffice I mean that it has a fundamental issue with it, not something
>   that requires a fix. With that our feedback can be much more concrete
>   and (at least on my behalf) more open to accept it.
>
> - I'm not exactly sure why you would suggest that your implementation
>   supports only infiniband if you use rdma_cm for address resolution,
>   nor I understand why you emphasize feature (2) below, nor why even
>   in the presence of rdma_cm you have ibtrs_ib_path? (confused...)
>   iWARP needs a bit more attention if you don't use the new generic
>   interfaces though...
You remind me: we also tested with the rxe driver in the past, but not iWARP.
It might work.
ibtrs_ib_path is a leftover from the APM feature we used in-house; it will
be removed in the next round.

>
> - I honestly do not understand why you need *19057* LOC to implement
>   a rdma based block driver. Thats almost larger than all of our
>   existing block drivers combined... First glance at the code provides
>   some explanations, (1) you have some strange code that has no business
>   in a block driver like ibtrs_malloc/ibtrs_zalloc (yikes) or
>   open-coding various existing logging routines, (2) you are for some
>   reason adding a second tag allocation scheme (why?), (3) you are open
>   coding a lot of stuff that we added to the stack in the past months...
>   (4) you seem to over-layer your code for reasons that I do not
>   really understand. And I didn't really look deep at all into the
>   code, just to get the feel of it, and it seems like it needs a lot
>   of work before it can even be considered upstream ready.
Agreed, we will clean up the code further; that's why I sent it as an RFC,
to get early feedback.


>
>> We design and implement this solution based on our need for Cloud
>> Computing,
>> the key features are:
>> - High throughput and low latency due to:
>> 1) Only two rdma messages per IO
>
>
> Where exactly did you witnessed latency that was meaningful by having
> another rdma message on the wire? That's only for writes, anyway, and
> we have first data bursts for that..
Clearly, we need to benchmark on the latest kernel.

>
>> 2) Simplified client side server memory management
>> 3) Eliminated SCSI sublayer
>
>
> That's hardly an advantage given all we are losing without it...
>
> ...
>
> Cheers,
> Sagi.

Thanks,
-- 
Jack Wang
Linux Kernel Developer

ProfitBricks GmbH
Greifswalder Str. 207
D - 10405 Berlin

Tel:       +49 30 577 008  042
Fax:      +49 30 577 008 299
Email:    jinpu.wang@profitbricks.com
URL:      https://www.profitbricks.de

Sitz der Gesellschaft: Berlin
Registergericht: Amtsgericht Charlottenburg, HRB 125506 B
Geschäftsführer: Achim Weiss

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC PATCH 00/28] INFINIBAND NETWORK BLOCK DEVICE (IBNBD)
@ 2017-03-28 14:17       ` Roman Penyaev
  0 siblings, 0 replies; 87+ messages in thread
From: Roman Penyaev @ 2017-03-28 14:17 UTC (permalink / raw)
  To: Jinpu Wang
  Cc: Sagi Grimberg, linux-block, linux-rdma, Doug Ledford, Jens Axboe,
	hch, Fabian Holler, Milind Dumbare, Michael Wang, Danil Kipnis

Hi Bart and Sagi,

Thanks for the warm welcome and early feedback.  I will respond to both of
you here, on Jack's email, since I am not in CC on the original cover letter
(what a bummer).  Sorry for the mess.

Sagi Grimberg <sagi@grimberg.me> wrote:

> - Is there room for this ibnbd? If we were to take every block driver
>   that was submitted without sufficient justification, it'd be very
>   hard to maintain. What advantage (if any) does this buy anyone over
>   existing rdma based protocols (srp, iser, nvmf)? I'm really (*really*)
>   not sold on this one...

It seems better to start with the history.  The IBNBD project, as it was
presented, is not only the block device (which is supposed to be thin), but
mainly an rdma transport with client and server logic, which is called IBTRS
in our terms.  IBTRS was the starting point, the main idea for us, which is
planned to bind our own replicated storage solution via infiniband.

We wanted a clear transport interface, not very different from what normal
BSD sockets provide, e.g.:

  ibtrs_clt_open()               - Opens a session to a server.
  ibtrs_clt_rdma_write()         - WRITE, i.e. transfer data to server.
  ibtrs_clt_request_rdma_write() - READ, i.e. request data transfer from server.

We did not want to rely on, depend on, or embed any existing command sets or
protocols (e.g. SCSI) inside our transport.  IBTRS should stay apart from
any storage knowledge and should be able to do only two things:

  1. establish connection to a server (accept connections from clients)
  2. read/write any data

Thinking about the transport as a layer, IBNBD is just a user of that layer:
a thin block device with only 4 commands, which establishes one connection
to the server and maps N block devices through that connection.

I realize pretty well that I've described obvious things with this
layering, but again, what we wanted and achieved is an independent
IB transport, which is planned to be used for replication, and in that
project IBNBD won't exist.
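
To make that socket-like interface concrete, here is a purely hypothetical
usage sketch: the session type, the argument lists and the ibtrs_clt_close()
helper are invented for illustration and are not taken from the patches.

	/*
	 * Hypothetical usage sketch only: the real ibtrs_clt_* prototypes
	 * live in the ibtrs_clt patches; the session type, argument lists
	 * and ibtrs_clt_close() are invented here for illustration.
	 */
	#include <linux/err.h>

	static int example_session_write(const char *server_addr,
					 void *buf, size_t len)
	{
		struct ibtrs_session *sess;
		int err;

		/* 1. establish a connection (session) to the server */
		sess = ibtrs_clt_open(server_addr);
		if (IS_ERR(sess))
			return PTR_ERR(sess);

		/* 2. transfer data to the server (WRITE) */
		err = ibtrs_clt_rdma_write(sess, buf, len);

		ibtrs_clt_close(sess);	/* assumed counterpart of ibtrs_clt_open() */
		return err;
	}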

> - To me, the fundamental design that the client side owns a pool of
>   buffers that it issues writes too, seems inferior than the
>   one taken in iser/nvmf (immediate data).

That is of course debatable, since we consider it a feature :)
But indeed, we never tested against nvmf.  And of course, I am open
to any surprises.

>   IMO, the ibnbd design has
>   scalability issues both in terms of server side resources,

Each client connection eats ~64 MB of memory on the server side for the IO
buffer pool.  Since we are talking about hundreds of connections, this
is a reasonable trade-off (memory is comparatively cheap) to avoid any
allocations on the IO path and make it completely lockless.

>   client side contention

We do not have any noticeable contention.  What we have is a single
queue of buffers for all devices mapped on a client per connection
(session), and I can consider that a single point of resource, where
each device has to fight in order to get an empty bit from a bitmap.
In practice, all the benchmarks we did (I have to say this was on a
pretty old 4.4 kernel, and getting fresh data is part of the work that
obviously should be done) show that even though we share a queue, the
bottleneck is always the infiniband link.

>   and network congestion (on infiniband the latter is
>   less severe).

We rely on IB flow control, and if the server side has not responded
with

  ib_post_recv()

we do not reach the wire from the client.  That should be enough.
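
As a rough illustration of that receiver-driven flow control, the sketch
below shows a server re-posting one receive work request per freed IO
buffer, so the client never has more requests in flight than the server has
posted receives for.  The helper name and the dma/lkey parameters are
stand-ins, not code from the series.

	/*
	 * Hypothetical sketch: re-post a receive WR only when a server-side
	 * IO buffer is free again.  Names and parameters are stand-ins.
	 */
	#include <rdma/ib_verbs.h>

	static int srv_repost_recv(struct ib_qp *qp, u64 buf_dma_addr,
				   u32 buf_len, u32 lkey)
	{
		struct ib_sge sge = {
			.addr   = buf_dma_addr,
			.length = buf_len,
			.lkey   = lkey,
		};
		struct ib_recv_wr wr = {
			.sg_list = &sge,
			.num_sge = 1,
		};
		struct ib_recv_wr *bad_wr;

		return ib_post_recv(qp, &wr, &bad_wr);
	}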

> - I honestly do not understand why you need *19057* LOC to implement
>   a rdma based block driver. Thats almost larger than all of our
>   existing block drivers combined...

Oh, those LOC numbers :)  The IBNBD client (block device) itself is around
2800 lines with all the handy sysfs things.  What is really bloated is the
transport client and server sides, which pretend to be smart and cover:

  o reconnects
  o heartbeats
  o a quite large FSM implementation for each connection (session)

True, this code can be split into common parts and deflated.

>   First glance at the code provides
>   some explanations,
> (1) you have some strange code that has no business
>   in a block driver like ibtrs_malloc/ibtrs_zalloc (yikes)

Indeed, that is crap left over from the time when we tried
to wrap all generic kernel calls in order to do fault injection.

>   open-coding various existing logging routines,

No excuses; they will be removed.

> (2) you are for some
>   reason adding a second tag allocation scheme (why?)

Several reasons.  a) We share a single queue of buffers for the mapped
devices per client connection, and we still support RQ mode.  b) MQ shared
tags are indeed shared, but not between different hctxs.  I.e., because of
our single queue for N devices per connection, we are not able to create
M hctxs and share this single queue between them.  Sometimes we need to
call blk_mq_stop_hw_queue() in order to stop the hw queue and then restart
it from a completion.  Obviously, the simplest approach is to share a
static number of buffers in the queue equally between N devices.  But that
won't work, since with a big N we end up with a very small queue depth per
device.

Of course, it would be nice if, along with BLK_MQ_F_TAG_SHARED, another flag
existed, say BLK_MQ_F_TAG_GLOBALLY_SHARED.  Unfortunately, with our design
MQ does not help us much with the current shared tags implementation.
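
A minimal sketch of the kind of globally shared, bitmap-based tag pool
described above; the structure and function names are hypothetical, and only
standard kernel bitmap/waitqueue primitives are assumed.

	/*
	 * Hypothetical sketch of a tag pool shared across all devices of one
	 * session: every device competes for a free bit, and a caller sleeps
	 * when the pool is exhausted until a completion releases a tag.
	 */
	#include <linux/bitmap.h>
	#include <linux/bitops.h>
	#include <linux/wait.h>

	struct sess_tag_pool {
		unsigned long		*bits;		/* one bit per shared IO buffer */
		unsigned int		nr_tags;
		wait_queue_head_t	waitq;
	};

	static int sess_tag_get(struct sess_tag_pool *pool)
	{
		unsigned long tag;

		do {
			tag = find_first_zero_bit(pool->bits, pool->nr_tags);
			if (tag < pool->nr_tags &&
			    !test_and_set_bit(tag, pool->bits))
				return tag;
			/* pool exhausted (or we raced): wait for a free bit */
		} while (!wait_event_interruptible(pool->waitq,
				!bitmap_full(pool->bits, pool->nr_tags)));

		return -EINTR;
	}

	static void sess_tag_put(struct sess_tag_pool *pool, unsigned int tag)
	{
		clear_bit(tag, pool->bits);
		wake_up(&pool->waitq);
	}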

> (3) you are open
>   coding a lot of stuff that we added to the stack in the past months...

Could you please point precisely to the code you mean?

If you are saying that you added a lot to nvmf, I would like to take a
pause and postpone the nvmf discussion until we have fresh numbers and more
understanding.

>   (4) you seem to over-layer your code for reasons that I do not
>   really understand.

Frankly, I did not get this.  'Over-layer your code' sounds very abstract
to me here.

>   And I didn't really look deep at all into the
>   code, just to get the feel of it, and it seems like it needs a lot
>   of work before it can even be considered upstream ready.

Definitely.



Bart Van Assche <Bart.VanAssche@sandisk.com> wrote:

> * Doesn't scale in terms of number of CPUs submitting I/O. The graphs shown
>   during the Vault talk clearly illustrate this. This is probably the result
>   of sharing a data structure across all client CPUs, maybe the bitmap that
>   tracks which parts of the target buffer space are in use.

Probably that was not clear from the presentation, but that is exactly what
was fixed by CPU affinity (the next slide after those hapless graphs).  What
is left unclear to us is the NUMA effect, which is not related to the
distances reported by numactl.

But still, your advice to run everything under perf and check cache misses
is more than valid.  Thanks.

> * Supports IB but none of the other RDMA transports (RoCE / iWARP).

We did a lot of experiments with SoftRoCE in a virtualized environment
in order to get quick testing results on one host without hardware.
That works quite well (despite some SoftRoCE bugs, which were fixed locally).
So it should not be a big deal to cover iWARP.

> We also need performance numbers that compare IBNBD against SRP and/or
> NVMeOF with memory registration disabled to see whether and how much faster
> IBNBD is compared to these two protocols.

Indeed, we've never tested against nvmf, and that is a must.


Thanks.

--
Roman

^ permalink raw reply	[flat|nested] 87+ messages in thread

end of thread, other threads:[~2017-03-28 14:17 UTC | newest]

Thread overview: 87+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-24 10:45 [RFC PATCH 00/28] INFINIBAND NETWORK BLOCK DEVICE (IBNBD) Jack Wang
2017-03-24 10:45 ` Jack Wang
2017-03-24 10:45 ` [PATCH 01/28] ibtrs: add header shared between ibtrs_client and ibtrs_server Jack Wang
2017-03-24 10:45   ` Jack Wang
2017-03-24 12:35   ` Johannes Thumshirn
2017-03-24 12:35     ` Johannes Thumshirn
2017-03-24 12:54     ` Jinpu Wang
2017-03-24 12:54       ` Jinpu Wang
2017-03-24 14:31       ` Johannes Thumshirn
2017-03-24 14:31         ` Johannes Thumshirn
2017-03-24 14:35         ` Jinpu Wang
2017-03-24 14:35           ` Jinpu Wang
2017-03-24 10:45 ` [PATCH 02/28] ibtrs: add header for log MICROs " Jack Wang
2017-03-24 10:45 ` [PATCH 03/28] ibtrs_lib: add common functions shared by client and server Jack Wang
2017-03-24 10:45 ` [PATCH 04/28] ibtrs_clt: add header file for exported interface Jack Wang
2017-03-24 10:45 ` [PATCH 05/28] ibtrs_clt: main functionality of ibtrs_client Jack Wang
2017-03-24 10:45   ` Jack Wang
2017-03-24 10:45 ` [PATCH 06/28] ibtrs_clt: add header file shared only in ibtrs_client Jack Wang
2017-03-24 10:45 ` [PATCH 07/28] ibtrs_clt: add files for sysfs interface Jack Wang
2017-03-24 10:45 ` [PATCH 08/28] ibtrs_clt: add Makefile and Kconfig Jack Wang
2017-03-25  5:51   ` kbuild test robot
2017-03-25  5:51     ` kbuild test robot
2017-03-25  6:55   ` kbuild test robot
2017-03-25  6:55     ` kbuild test robot
2017-03-24 10:45 ` [PATCH 09/28] ibtrs_srv: add header file for exported interface Jack Wang
2017-03-24 10:45   ` Jack Wang
2017-03-24 10:45 ` [PATCH 10/28] ibtrs_srv: add main functionality for ibtrs_server Jack Wang
2017-03-24 10:45 ` [PATCH 11/28] ibtrs_srv: add header shared in ibtrs_server Jack Wang
2017-03-24 10:45   ` Jack Wang
2017-03-24 10:45 ` [PATCH 12/28] ibtrs_srv: add sysfs interface Jack Wang
2017-03-24 10:45 ` [PATCH 13/28] ibtrs_srv: add Makefile and Kconfig Jack Wang
2017-03-24 10:45   ` Jack Wang
2017-03-25  7:55   ` kbuild test robot
2017-03-25  7:55     ` kbuild test robot
2017-03-25 10:54   ` kbuild test robot
2017-03-25 10:54     ` kbuild test robot
2017-03-24 10:45 ` [PATCH 14/28] ibnbd: add headers shared by ibnbd_client and ibnbd_server Jack Wang
2017-03-24 10:45 ` [PATCH 15/28] ibnbd: add shared library functions Jack Wang
2017-03-24 10:45   ` Jack Wang
2017-03-24 10:45 ` [PATCH 16/28] ibnbd_clt: add main functionality of ibnbd_client Jack Wang
2017-03-24 10:45   ` Jack Wang
2017-03-24 10:45 ` [PATCH 17/28] ibnbd_clt: add header shared in ibnbd_client Jack Wang
2017-03-24 10:45   ` Jack Wang
2017-03-24 10:45 ` [PATCH 18/28] ibnbd_clt: add sysfs interface Jack Wang
2017-03-24 10:45   ` Jack Wang
2017-03-24 10:45 ` [PATCH 19/28] ibnbd_clt: add log helpers Jack Wang
2017-03-24 10:45 ` [PATCH 20/28] ibnbd_clt: add Makefile and Kconfig Jack Wang
2017-03-24 10:45   ` Jack Wang
2017-03-25  8:38   ` kbuild test robot
2017-03-25  8:38     ` kbuild test robot
2017-03-25 11:17   ` kbuild test robot
2017-03-25 11:17     ` kbuild test robot
2017-03-24 10:45 ` [PATCH 21/28] ibnbd_srv: add header shared in ibnbd_server Jack Wang
2017-03-24 10:45   ` Jack Wang
2017-03-24 10:45 ` [PATCH 22/28] ibnbd_srv: add main functionality Jack Wang
2017-03-24 10:45 ` [PATCH 23/28] ibnbd_srv: add abstraction for submit IO to file or block device Jack Wang
2017-03-24 10:45   ` Jack Wang
2017-03-24 10:45 ` [PATCH 24/28] ibnbd_srv: add log helpers Jack Wang
2017-03-24 10:45 ` [PATCH 25/28] ibnbd_srv: add sysfs interface Jack Wang
2017-03-24 10:45   ` Jack Wang
2017-03-24 10:45 ` [PATCH 26/28] ibnbd_srv: add Makefile and Kconfig Jack Wang
2017-03-25  9:27   ` kbuild test robot
2017-03-25  9:27     ` kbuild test robot
2017-03-24 10:45 ` [PATCH 27/28] ibnbd: add doc for how to use ibnbd and sysfs interface Jack Wang
2017-03-25  7:44   ` kbuild test robot
2017-03-25  7:44     ` kbuild test robot
2017-03-24 10:45 ` [PATCH 28/28] MAINTRAINERS: Add maintainer for IBNBD/IBTRS Jack Wang
2017-03-24 12:15 ` [RFC PATCH 00/28] INFINIBAND NETWORK BLOCK DEVICE (IBNBD) Johannes Thumshirn
2017-03-24 12:15   ` Johannes Thumshirn
2017-03-24 12:46   ` Jinpu Wang
2017-03-24 12:46     ` Jinpu Wang
2017-03-24 12:48     ` Johannes Thumshirn
2017-03-24 12:48       ` Johannes Thumshirn
2017-03-24 13:31     ` Bart Van Assche
2017-03-24 13:31       ` Bart Van Assche
2017-03-24 14:24       ` Jinpu Wang
2017-03-24 14:24         ` Jinpu Wang
2017-03-24 14:20 ` Steve Wise
2017-03-24 14:20   ` Steve Wise
2017-03-24 14:37   ` Jinpu Wang
2017-03-24 14:37     ` Jinpu Wang
2017-03-27  2:20 ` Sagi Grimberg
2017-03-27  2:20   ` Sagi Grimberg
2017-03-27 10:21   ` Jinpu Wang
2017-03-27 10:21     ` Jinpu Wang
2017-03-28 14:17     ` Roman Penyaev
2017-03-28 14:17       ` Roman Penyaev
