* [PATCH V4 for-next 0/4] Add receive Flow Steering support
@ 2013-08-28 12:47 Matan Barak
       [not found] ` <1377694075-29287-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Matan Barak @ 2013-08-28 12:47 UTC (permalink / raw)
  To: Roland Dreier
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Matan Barak, Or Gerlitz,
	Hadar Har-Zion, shawn.bohrer-Re5JQEeQqe8AvxtiuMwx3w,
	sean.hefty-ral2JQCrhuEAvxtiuMwx3w, tzahio-VPRAkNaXOzVWk0Htik3J/w

Hi Roland,

V6 changes:
  - Addressed comments from Roland on patch #1 and #3
  - Don't allow unsupported comp_mask values
  - Added a check in ib_uverbs_create_flow to verify
    the size passed from the user space.
  - Cosmetic changes: don't use anonymous union as the only field in a struct.

V5 changes: 
  - Addressed comments from Roland on patch #2 -- enhanced the change-log

V4 changes:
  - Addressed comments from Sean on the uverbs code that copies the
    flow specifications from user space. We now make sure that the passed
    sizes match the kernel flow specs and, in any case, don't look at
    addresses beyond the overall size passed by the write system call.

V3 changes:
  - Addressed comments from Sean:
  - modified the change-log of patch #1 to be clearer on the priority and domain
    semantics and usage
  - re-arranged the fields of struct ib_flow_attr
  - removed check from ib_flow_destroy
  - removed the IB flow spec, which wasn't in line with the L2/L3/L4 approach
    used for Ethernet/IP/TCP|UDP; proper IB flow specs will be used when adding
    support for IPoIB flow steering

     
V2 changes:
  - dropped struct ib_kern_flow from patch #3; this structure wasn't
    used and was left there by mistake (bug, thanks Roland)
  - removed the void *flow_context field from struct ib_flow; this was
    pointing to driver private data for that flow, but doesn't belong here,
    i.e. it need not be seen by the verbs consumer but rather kept hidden
  - renamed struct mlx4_flow_handle to mlx4_ib_flow, a structure that contains
    the verbs level struct ib_flow and the mlx4 registration ID for that flow

V1 changes:

 - dropped the five pre-patches which were accepted into 3.10
 - rebased the patches against Roland's for-next / 3.10-rc4
 - in patch #3, ib_uverbs_destroy_flow was returning too quickly when the driver
   returned failure for ib_destroy_flow; some uverbs resources need to be freed first
 - in patch #4, check index before accessing the array at mlx4_ib_create/destroy_flow

These patches add Flow Steering support to the kernel IB core, to uverbs and 
to the mlx4 IB (verbs) driver along with one patch to uverbs which adds 
some code to support extensions.

  IB/core: Add receive Flow Steering support
  IB/core: Infra-structure to support verbs extensions through uverbs
  IB/core: Export ib_create/destroy_flow through uverbs
  IB/mlx4: Add receive Flow Steering support

The main patch, which introduces the Flow Steering API, is "IB/core: Add receive Flow
Steering support"; see its change log. The "Network Adapter Flow Steering" slides that
Tzahi Oved presented at the annual OFA 2012 meeting may also be helpful:
https://www.openfabrics.org/resources/document-downloads/presentations/doc_download/518-network-adapter-flow-steering.html

Or and Matan.


Hadar Hen Zion (3):
  IB/core: Add receive Flow Steering support
  IB/core: Export ib_create/destroy_flow through uverbs
  IB/mlx4: Add receive Flow Steering support

Igor Ivanov (1):
  IB/core: Infrastructure to support verbs extensions through uverbs

 drivers/infiniband/core/uverbs.h      |    3 +
 drivers/infiniband/core/uverbs_cmd.c  |  228 ++++++++++++++++++++++++++++++++
 drivers/infiniband/core/uverbs_main.c |   42 +++++-
 drivers/infiniband/core/verbs.c       |   27 ++++
 drivers/infiniband/hw/mlx4/main.c     |  235 +++++++++++++++++++++++++++++++++
 drivers/infiniband/hw/mlx4/mlx4_ib.h  |   12 ++
 include/linux/mlx4/device.h           |    5 -
 include/rdma/ib_verbs.h               |  122 +++++++++++++++++-
 include/uapi/rdma/ib_user_verbs.h     |   99 ++++++++++++++-
 9 files changed, 759 insertions(+), 14 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* [PATCH V4 for-next 1/4] IB/core: Add receive Flow Steering support
       [not found] ` <1377694075-29287-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2013-08-28 12:47   ` Matan Barak
  2013-08-28 12:47   ` [PATCH V4 for-next 2/4] IB/core: Infrastructure to support verbs extensions through uverbs Matan Barak
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 17+ messages in thread
From: Matan Barak @ 2013-08-28 12:47 UTC (permalink / raw)
  To: Roland Dreier
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Matan Barak, Or Gerlitz,
	Hadar Har-Zion, shawn.bohrer-Re5JQEeQqe8AvxtiuMwx3w,
	sean.hefty-ral2JQCrhuEAvxtiuMwx3w, tzahio-VPRAkNaXOzVWk0Htik3J/w

From: Hadar Hen Zion <hadarh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

The RDMA stack allows applications to create IB_QPT_RAW_PACKET QPs, which
carry plain Ethernet packets, specifically packets that don't carry any QPN
to be matched on by the receiving side.

Applications using these QPs must be provided with a method to
program some steering rule with the HW so packets arriving at
the local port can be routed to them.

This patch adds ib_create_flow, which allows providing a flow specification
for a QP such that when a received packet matches the specification, it can
be forwarded to that QP, in a manner similar to the way ib_attach_multicast
is used for IB UD multicast handling.

Flow specifications are provided as instances of struct ib_flow_spec_yyy,
which describe L2, L3 and L4 headers; currently, specs for Ethernet, IPv4,
TCP and UDP are defined. Flow specs are made of values and masks.

The input to ib_create_flow is an instance of struct ib_flow_attr, which
contains a few mandatory control elements and optional flow specs.

struct ib_flow_attr {
	enum ib_flow_attr_type type;
	u16      size;
	u16      priority;
	u32      flags;
	u8       num_of_specs;
	u8       port;
	/* Following are the optional layers according to user request
	 * struct ib_flow_spec_yyy
	 * struct ib_flow_spec_zzz
	 */
};

As these specs eventually come from user space, they are defined and used in
a way which allows adding new spec types without a kernel/user ABI change,
and with only a small API enhancement that defines the newly added spec.

The flow spec structures are defined in a TLV (Type-Length-Value) manner,
which allows calling ib_create_flow with a variable-length list of optional
specs.

For the actual processing of ib_flow_attr, the driver uses the mandatory
num_of_specs and size fields along with the TLV nature of the specs.

Steering rule processing order is according to the domain over which the rule
is set and the rule priority. All rules set by user space applications fall
into the IB_FLOW_DOMAIN_USER domain; other domains could be used by future
IPoIB RFS and Ethtool flow-steering interface implementations. A lower
numerical priority value means a higher priority.

The value returned from ib_create_flow is an instance of struct ib_flow,
which contains a database pointer (handle) provided by the HW driver to be
used when calling ib_destroy_flow.

Applications that offload TCP/IP traffic could also be written over IB UD
QPs. As such, the ib_create_flow / ib_destroy_flow API is designed to support
UD QPs too. The HW driver sets IB_DEVICE_MANAGED_FLOW_STEERING to denote
support for flow steering.

The ib_flow_attr enum type relates to usage of flow steering for promiscuous
and sniffer purposes:

IB_FLOW_ATTR_NORMAL - "regular" rule, steering according to rule specification

IB_FLOW_ATTR_ALL_DEFAULT - default unicast and multicast rule, receive
all Ethernet traffic which isn't steered to any QP

IB_FLOW_ATTR_MC_DEFAULT - same as IB_FLOW_ATTR_ALL_DEFAULT but only for multicast

IB_FLOW_ATTR_SNIFFER - sniffer rule, receive all port traffic

The ALL_DEFAULT and MC_DEFAULT rule options are valid only for the Ethernet link type.

Change-Id: I3feff1dfb3717feebfe6ffa23fe609568ca06bcd
Signed-off-by: Hadar Hen Zion <hadarh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/verbs.c |   27 +++++++++
 include/rdma/ib_verbs.h         |  121 ++++++++++++++++++++++++++++++++++++++-
 2 files changed, 146 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 077fd64..a321df2 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1257,3 +1257,30 @@ int ib_dealloc_xrcd(struct ib_xrcd *xrcd)
 	return xrcd->device->dealloc_xrcd(xrcd);
 }
 EXPORT_SYMBOL(ib_dealloc_xrcd);
+
+struct ib_flow *ib_create_flow(struct ib_qp *qp,
+			       struct ib_flow_attr *flow_attr,
+			       int domain)
+{
+	struct ib_flow *flow_id;
+	if (!qp->device->create_flow)
+		return ERR_PTR(-ENOSYS);
+
+	flow_id = qp->device->create_flow(qp, flow_attr, domain);
+	if (!IS_ERR(flow_id))
+		atomic_inc(&qp->usecnt);
+	return flow_id;
+}
+EXPORT_SYMBOL(ib_create_flow);
+
+int ib_destroy_flow(struct ib_flow *flow_id)
+{
+	int err;
+	struct ib_qp *qp = flow_id->qp;
+
+	err = qp->device->destroy_flow(flow_id);
+	if (!err)
+		atomic_dec(&qp->usecnt);
+	return err;
+}
+EXPORT_SYMBOL(ib_destroy_flow);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index a84d3df..6f4109b 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -116,7 +116,8 @@ enum ib_device_cap_flags {
 	IB_DEVICE_MEM_MGT_EXTENSIONS	= (1<<21),
 	IB_DEVICE_BLOCK_MULTICAST_LOOPBACK = (1<<22),
 	IB_DEVICE_MEM_WINDOW_TYPE_2A	= (1<<23),
-	IB_DEVICE_MEM_WINDOW_TYPE_2B	= (1<<24)
+	IB_DEVICE_MEM_WINDOW_TYPE_2B	= (1<<24),
+	IB_DEVICE_MANAGED_FLOW_STEERING = (1<<29)
 };
 
 enum ib_atomic_cap {
@@ -1039,7 +1040,8 @@ struct ib_qp {
 	struct ib_srq	       *srq;
 	struct ib_xrcd	       *xrcd; /* XRC TGT QPs only */
 	struct list_head	xrcd_list;
-	atomic_t		usecnt; /* count times opened, mcast attaches */
+	/* count times opened, mcast attaches, flow attaches */
+	atomic_t		usecnt;
 	struct list_head	open_list;
 	struct ib_qp           *real_qp;
 	struct ib_uobject      *uobject;
@@ -1074,6 +1076,112 @@ struct ib_fmr {
 	u32			rkey;
 };
 
+/* Supported steering options */
+enum ib_flow_attr_type {
+	/* steering according to rule specifications */
+	IB_FLOW_ATTR_NORMAL		= 0x0,
+	/* default unicast and multicast rule -
+	 * receive all Eth traffic which isn't steered to any QP
+	 */
+	IB_FLOW_ATTR_ALL_DEFAULT	= 0x1,
+	/* default multicast rule -
+	 * receive all Eth multicast traffic which isn't steered to any QP
+	 */
+	IB_FLOW_ATTR_MC_DEFAULT		= 0x2,
+	/* sniffer rule - receive all port traffic */
+	IB_FLOW_ATTR_SNIFFER		= 0x3
+};
+
+/* Supported steering header types */
+enum ib_flow_spec_type {
+	/* L2 headers*/
+	IB_FLOW_SPEC_ETH	= 0x20,
+	/* L3 header*/
+	IB_FLOW_SPEC_IPV4	= 0x30,
+	/* L4 headers*/
+	IB_FLOW_SPEC_TCP	= 0x40,
+	IB_FLOW_SPEC_UDP	= 0x41
+};
+
+#define IB_FLOW_SPEC_SUPPORT_LAYERS 4
+
+/* Flow steering rule priority is set according to its domain.
+ * Lower domain value means higher priority.
+ */
+enum ib_flow_domain {
+	IB_FLOW_DOMAIN_USER,
+	IB_FLOW_DOMAIN_ETHTOOL,
+	IB_FLOW_DOMAIN_RFS,
+	IB_FLOW_DOMAIN_NIC,
+	IB_FLOW_DOMAIN_NUM /* Must be last */
+};
+
+struct ib_flow_eth_filter {
+	u8	dst_mac[6];
+	u8	src_mac[6];
+	__be16	ether_type;
+	__be16	vlan_tag;
+};
+
+struct ib_flow_spec_eth {
+	enum ib_flow_spec_type	  type;
+	u16			  size;
+	struct ib_flow_eth_filter val;
+	struct ib_flow_eth_filter mask;
+};
+
+struct ib_flow_ipv4_filter {
+	__be32	src_ip;
+	__be32	dst_ip;
+};
+
+struct ib_flow_spec_ipv4 {
+	enum ib_flow_spec_type	   type;
+	u16			   size;
+	struct ib_flow_ipv4_filter val;
+	struct ib_flow_ipv4_filter mask;
+};
+
+struct ib_flow_tcp_udp_filter {
+	__be16	dst_port;
+	__be16	src_port;
+};
+
+struct ib_flow_spec_tcp_udp {
+	enum ib_flow_spec_type	      type;
+	u16			      size;
+	struct ib_flow_tcp_udp_filter val;
+	struct ib_flow_tcp_udp_filter mask;
+};
+
+union ib_flow_spec {
+	struct {
+		enum ib_flow_spec_type	type;
+		u16			size;
+	};
+	struct ib_flow_spec_eth eth;
+	struct ib_flow_spec_ipv4 ipv4;
+	struct ib_flow_spec_tcp_udp tcp_udp;
+};
+
+struct ib_flow_attr {
+	enum ib_flow_attr_type type;
+	u16	     size;
+	u16	     priority;
+	u32	     flags;
+	u8	     num_of_specs;
+	u8	     port;
+	/* Following are the optional layers according to user request
+	 * struct ib_flow_spec_xxx
+	 * struct ib_flow_spec_yyy
+	 */
+};
+
+struct ib_flow {
+	struct ib_qp		*qp;
+	struct ib_uobject	*uobject;
+};
+
 struct ib_mad;
 struct ib_grh;
 
@@ -1306,6 +1414,11 @@ struct ib_device {
 						 struct ib_ucontext *ucontext,
 						 struct ib_udata *udata);
 	int			   (*dealloc_xrcd)(struct ib_xrcd *xrcd);
+	struct ib_flow *	   (*create_flow)(struct ib_qp *qp,
+						  struct ib_flow_attr
+						  *flow_attr,
+						  int domain);
+	int			   (*destroy_flow)(struct ib_flow *flow_id);
 
 	struct ib_dma_mapping_ops   *dma_ops;
 
@@ -2266,4 +2379,8 @@ struct ib_xrcd *ib_alloc_xrcd(struct ib_device *device);
  */
 int ib_dealloc_xrcd(struct ib_xrcd *xrcd);
 
+struct ib_flow *ib_create_flow(struct ib_qp *qp,
+			       struct ib_flow_attr *flow_attr, int domain);
+int ib_destroy_flow(struct ib_flow *flow_id);
+
 #endif /* IB_VERBS_H */
-- 
1.7.1



* [PATCH V4 for-next 2/4] IB/core: Infrastructure to support verbs extensions through uverbs
       [not found] ` <1377694075-29287-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2013-08-28 12:47   ` [PATCH V4 for-next 1/4] IB/core: " Matan Barak
@ 2013-08-28 12:47   ` Matan Barak
  2013-08-28 12:47   ` [PATCH V4 for-next 3/4] IB/core: Export ib_create/destroy_flow " Matan Barak
  2013-08-28 12:47   ` [PATCH V4 for-next 4/4] IB/mlx4: Add receive Flow Steering support Matan Barak
  3 siblings, 0 replies; 17+ messages in thread
From: Matan Barak @ 2013-08-28 12:47 UTC (permalink / raw)
  To: Roland Dreier
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Matan Barak, Or Gerlitz,
	Hadar Har-Zion, shawn.bohrer-Re5JQEeQqe8AvxtiuMwx3w,
	sean.hefty-ral2JQCrhuEAvxtiuMwx3w, tzahio-VPRAkNaXOzVWk0Htik3J/w,
	Igor Ivanov

From: Igor Ivanov <Igor.Ivanov-wN0M4riKYwLQT0dZR+AlfA@public.gmane.org>

Add infrastructure to support extended uverbs capabilities in a
forward/backward compatible manner. Uverbs command opcodes which are based on
the verbs extensions approach should be greater than or equal to
IB_USER_VERBS_CMD_THRESHOLD. They have a new header format and are processed
a bit differently.

Whenever a specific IB_USER_VERBS_CMD_XXX is extended, which practically means
it needs to have additional arguments, we will be able to add them without creating
a completely new IB_USER_VERBS_CMD_YYY command or bumping the uverbs ABI version.

This patch by itself doesn't provide the whole scheme, which also depends on
adding a comp_mask field to each extended uverbs command struct.

The new header framework allows for future extension of the CMD arguments
(ib_uverbs_cmd_hdr.in_words, ib_uverbs_cmd_hdr.out_words) for an existing new
command (that is, a command that supports the new uverbs command header
format suggested in this patch) without bumping the ABI version, while
maintaining backward and forward compatibility with both new and old
libibverbs versions.

In a uverbs command we pass both the uverbs arguments and the provider
arguments. We split ib_uverbs_cmd_hdr.in_words so that it now carries only
the uverbs input argument struct size, and add
ib_uverbs_cmd_hdr.provider_in_words to carry the provider input argument
size. The same goes for the response (the uverbs CMD output argument).

For example take the create_cq call and the mlx4_ib provider:

The uverbs layer gets libibverbs' struct ibv_create_cq (named struct
ib_uverbs_create_cq in the kernel), mlx4_ib gets libmlx4's struct
mlx4_create_cq (which includes struct ibv_create_cq and is named struct
mlx4_ib_create_cq in the kernel), and in_words = sizeof(mlx4_create_cq)/4.

Thus ib_uverbs_cmd_hdr.in_words carries both the uverbs and the mlx4_ib input
argument sizes, where uverbs assumes it knows the size of its own input
argument, struct ibv_create_cq.

Now, if we wish to add a field to struct ibv_create_cq, we can add a
comp_mask field to the struct, which is basically a bit field indicating
which fields exist in the struct (as done for the libibverbs API extension).
But we need a way to tell the total size of the struct rather than assuming
it is predefined (since we may get different struct sizes from different user
libibverbs versions), so that we know at which point the provider input
argument (struct mlx4_create_cq) begins. The same goes for extending the
provider struct mlx4_create_cq. Thus we split ib_uverbs_cmd_hdr.in_words so
that it carries only the uverbs input argument struct size, and add
ib_uverbs_cmd_hdr.provider_in_words to carry the provider (mlx4_ib) input
argument size.

Signed-off-by: Igor Ivanov <Igor.Ivanov-wN0M4riKYwLQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Hadar Hen Zion <hadarh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/uverbs_main.c |   29 ++++++++++++++++++++++++-----
 include/uapi/rdma/ib_user_verbs.h     |   10 ++++++++++
 2 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index 2c6f0f2..e4e7b24 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -583,9 +583,6 @@ static ssize_t ib_uverbs_write(struct file *filp, const char __user *buf,
 	if (copy_from_user(&hdr, buf, sizeof hdr))
 		return -EFAULT;
 
-	if (hdr.in_words * 4 != count)
-		return -EINVAL;
-
 	if (hdr.command >= ARRAY_SIZE(uverbs_cmd_table) ||
 	    !uverbs_cmd_table[hdr.command])
 		return -EINVAL;
@@ -597,8 +594,30 @@ static ssize_t ib_uverbs_write(struct file *filp, const char __user *buf,
 	if (!(file->device->ib_dev->uverbs_cmd_mask & (1ull << hdr.command)))
 		return -ENOSYS;
 
-	return uverbs_cmd_table[hdr.command](file, buf + sizeof hdr,
-					     hdr.in_words * 4, hdr.out_words * 4);
+	if (hdr.command >= IB_USER_VERBS_CMD_THRESHOLD) {
+		struct ib_uverbs_cmd_hdr_ex hdr_ex;
+
+		if (copy_from_user(&hdr_ex, buf, sizeof(hdr_ex)))
+			return -EFAULT;
+
+		if (((hdr_ex.in_words + hdr_ex.provider_in_words) * 4) != count)
+			return -EINVAL;
+
+		return uverbs_cmd_table[hdr.command](file,
+						     buf + sizeof(hdr_ex),
+						     (hdr_ex.in_words +
+						      hdr_ex.provider_in_words) * 4,
+						     (hdr_ex.out_words +
+						      hdr_ex.provider_out_words) * 4);
+	} else {
+		if (hdr.in_words * 4 != count)
+			return -EINVAL;
+
+		return uverbs_cmd_table[hdr.command](file,
+						     buf + sizeof(hdr),
+						     hdr.in_words * 4,
+						     hdr.out_words * 4);
+	}
 }
 
 static int ib_uverbs_mmap(struct file *filp, struct vm_area_struct *vma)
diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
index 805711e..61535aa 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -43,6 +43,7 @@
  * compatibility are made.
  */
 #define IB_USER_VERBS_ABI_VERSION	6
+#define IB_USER_VERBS_CMD_THRESHOLD    50
 
 enum {
 	IB_USER_VERBS_CMD_GET_CONTEXT,
@@ -123,6 +124,15 @@ struct ib_uverbs_cmd_hdr {
 	__u16 out_words;
 };
 
+struct ib_uverbs_cmd_hdr_ex {
+	__u32 command;
+	__u16 in_words;
+	__u16 out_words;
+	__u16 provider_in_words;
+	__u16 provider_out_words;
+	__u32 cmd_hdr_reserved;
+};
+
 struct ib_uverbs_get_context {
 	__u64 response;
 	__u64 driver_data[0];
-- 
1.7.1



* [PATCH V4 for-next 3/4] IB/core: Export ib_create/destroy_flow through uverbs
       [not found] ` <1377694075-29287-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2013-08-28 12:47   ` [PATCH V4 for-next 1/4] IB/core: " Matan Barak
  2013-08-28 12:47   ` [PATCH V4 for-next 2/4] IB/core: Infrastructure to support verbs extensions through uverbs Matan Barak
@ 2013-08-28 12:47   ` Matan Barak
       [not found]     ` <1377694075-29287-4-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2013-08-28 12:47   ` [PATCH V4 for-next 4/4] IB/mlx4: Add receive Flow Steering support Matan Barak
  3 siblings, 1 reply; 17+ messages in thread
From: Matan Barak @ 2013-08-28 12:47 UTC (permalink / raw)
  To: Roland Dreier
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Matan Barak, Or Gerlitz,
	Hadar Har-Zion, shawn.bohrer-Re5JQEeQqe8AvxtiuMwx3w,
	sean.hefty-ral2JQCrhuEAvxtiuMwx3w, tzahio-VPRAkNaXOzVWk0Htik3J/w

From: Hadar Hen Zion <hadarh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Implement ib_uverbs_create_flow and ib_uverbs_destroy_flow to
support flow steering for user space applications.

Change-Id: I0bc6dbe26f4776d00f95b6200ff43ccb24000e31
Signed-off-by: Hadar Hen Zion <hadarh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/uverbs.h      |    3 +
 drivers/infiniband/core/uverbs_cmd.c  |  228 +++++++++++++++++++++++++++++++++
 drivers/infiniband/core/uverbs_main.c |   13 ++-
 include/rdma/ib_verbs.h               |    1 +
 include/uapi/rdma/ib_user_verbs.h     |   89 +++++++++++++-
 5 files changed, 332 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h
index b8431d6..d040b87 100644
--- a/drivers/infiniband/core/uverbs.h
+++ b/drivers/infiniband/core/uverbs.h
@@ -156,6 +156,7 @@ extern struct idr ib_uverbs_cq_idr;
 extern struct idr ib_uverbs_qp_idr;
 extern struct idr ib_uverbs_srq_idr;
 extern struct idr ib_uverbs_xrcd_idr;
+extern struct idr ib_uverbs_rule_idr;
 
 void idr_remove_uobj(struct idr *idp, struct ib_uobject *uobj);
 
@@ -216,5 +217,7 @@ IB_UVERBS_DECLARE_CMD(destroy_srq);
 IB_UVERBS_DECLARE_CMD(create_xsrq);
 IB_UVERBS_DECLARE_CMD(open_xrcd);
 IB_UVERBS_DECLARE_CMD(close_xrcd);
+IB_UVERBS_DECLARE_CMD(create_flow);
+IB_UVERBS_DECLARE_CMD(destroy_flow);
 
 #endif /* UVERBS_H */
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index b105140..2856696 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -54,6 +54,7 @@ static struct uverbs_lock_class qp_lock_class	= { .name = "QP-uobj" };
 static struct uverbs_lock_class ah_lock_class	= { .name = "AH-uobj" };
 static struct uverbs_lock_class srq_lock_class	= { .name = "SRQ-uobj" };
 static struct uverbs_lock_class xrcd_lock_class = { .name = "XRCD-uobj" };
+static struct uverbs_lock_class rule_lock_class = { .name = "RULE-uobj" };
 
 #define INIT_UDATA(udata, ibuf, obuf, ilen, olen)			\
 	do {								\
@@ -330,6 +331,7 @@ ssize_t ib_uverbs_get_context(struct ib_uverbs_file *file,
 	INIT_LIST_HEAD(&ucontext->srq_list);
 	INIT_LIST_HEAD(&ucontext->ah_list);
 	INIT_LIST_HEAD(&ucontext->xrcd_list);
+	INIT_LIST_HEAD(&ucontext->rule_list);
 	ucontext->closing = 0;
 
 	resp.num_comp_vectors = file->device->num_comp_vectors;
@@ -2597,6 +2599,232 @@ out_put:
 	return ret ? ret : in_len;
 }
 
+static int kern_spec_to_ib_spec(struct ib_kern_spec *kern_spec,
+				union ib_flow_spec *ib_spec)
+{
+	ib_spec->type = kern_spec->type;
+
+	switch (ib_spec->type) {
+	case IB_FLOW_SPEC_ETH:
+		ib_spec->eth.size = sizeof(struct ib_flow_spec_eth);
+		if (ib_spec->eth.size != kern_spec->eth.size)
+			return -EINVAL;
+		memcpy(&ib_spec->eth.val, &kern_spec->eth.val,
+		       sizeof(struct ib_flow_eth_filter));
+		memcpy(&ib_spec->eth.mask, &kern_spec->eth.mask,
+		       sizeof(struct ib_flow_eth_filter));
+		break;
+	case IB_FLOW_SPEC_IPV4:
+		ib_spec->ipv4.size = sizeof(struct ib_flow_spec_ipv4);
+		if (ib_spec->ipv4.size != kern_spec->ipv4.size)
+			return -EINVAL;
+		memcpy(&ib_spec->ipv4.val, &kern_spec->ipv4.val,
+		       sizeof(struct ib_flow_ipv4_filter));
+		memcpy(&ib_spec->ipv4.mask, &kern_spec->ipv4.mask,
+		       sizeof(struct ib_flow_ipv4_filter));
+		break;
+	case IB_FLOW_SPEC_TCP:
+	case IB_FLOW_SPEC_UDP:
+		ib_spec->tcp_udp.size = sizeof(struct ib_flow_spec_tcp_udp);
+		if (ib_spec->tcp_udp.size != kern_spec->tcp_udp.size)
+			return -EINVAL;
+		memcpy(&ib_spec->tcp_udp.val, &kern_spec->tcp_udp.val,
+		       sizeof(struct ib_flow_tcp_udp_filter));
+		memcpy(&ib_spec->tcp_udp.mask, &kern_spec->tcp_udp.mask,
+		       sizeof(struct ib_flow_tcp_udp_filter));
+		break;
+	default:
+		return -EINVAL;
+	}
+	return 0;
+}
+
+ssize_t ib_uverbs_create_flow(struct ib_uverbs_file *file,
+			      const char __user *buf, int in_len,
+			      int out_len)
+{
+	struct ib_uverbs_create_flow	  cmd;
+	struct ib_uverbs_create_flow_resp resp;
+	struct ib_uobject		  *uobj;
+	struct ib_flow			  *flow_id;
+	struct ib_kern_flow_attr	  *kern_flow_attr;
+	struct ib_flow_attr		  *flow_attr;
+	struct ib_qp			  *qp;
+	int err = 0;
+	void *kern_spec;
+	void *ib_spec;
+	int i;
+	int kern_attr_size;
+
+	if (out_len < sizeof(resp))
+		return -ENOSPC;
+
+	if (copy_from_user(&cmd, buf, sizeof(cmd)))
+		return -EFAULT;
+
+	if (cmd.comp_mask)
+		return -EINVAL;
+
+	if ((cmd.flow_attr.type == IB_FLOW_ATTR_SNIFFER &&
+	     !capable(CAP_NET_ADMIN)) || !capable(CAP_NET_RAW))
+		return -EPERM;
+
+	if (cmd.flow_attr.num_of_specs < 0 ||
+	    cmd.flow_attr.num_of_specs > IB_FLOW_SPEC_SUPPORT_LAYERS)
+		return -EINVAL;
+
+	kern_attr_size = cmd.flow_attr.size - sizeof(cmd) -
+			 sizeof(struct ib_uverbs_cmd_hdr_ex);
+
+	if (cmd.flow_attr.size < 0 || cmd.flow_attr.size > in_len ||
+	    kern_attr_size < 0 || kern_attr_size >
+	    (cmd.flow_attr.num_of_specs * sizeof(struct ib_kern_spec)))
+		return -EINVAL;
+
+	if (cmd.flow_attr.num_of_specs) {
+		kern_flow_attr = kmalloc(cmd.flow_attr.size, GFP_KERNEL);
+		if (!kern_flow_attr)
+			return -ENOMEM;
+
+		memcpy(kern_flow_attr, &cmd.flow_attr, sizeof(*kern_flow_attr));
+		if (copy_from_user(kern_flow_attr + 1, buf + sizeof(cmd),
+				   kern_attr_size)) {
+			err = -EFAULT;
+			goto err_free_attr;
+		}
+	} else {
+		kern_flow_attr = &cmd.flow_attr;
+		kern_attr_size = sizeof(cmd.flow_attr);
+	}
+
+	uobj = kmalloc(sizeof(*uobj), GFP_KERNEL);
+	if (!uobj) {
+		err = -ENOMEM;
+		goto err_free_attr;
+	}
+	init_uobj(uobj, 0, file->ucontext, &rule_lock_class);
+	down_write(&uobj->mutex);
+
+	qp = idr_read_qp(cmd.qp_handle, file->ucontext);
+	if (!qp) {
+		err = -EINVAL;
+		goto err_uobj;
+	}
+
+	flow_attr = kmalloc(cmd.flow_attr.size, GFP_KERNEL);
+	if (!flow_attr) {
+		err = -ENOMEM;
+		goto err_put;
+	}
+
+	flow_attr->type = kern_flow_attr->type;
+	flow_attr->priority = kern_flow_attr->priority;
+	flow_attr->num_of_specs = kern_flow_attr->num_of_specs;
+	flow_attr->port = kern_flow_attr->port;
+	flow_attr->flags = kern_flow_attr->flags;
+	flow_attr->size = sizeof(*flow_attr);
+
+	kern_spec = kern_flow_attr + 1;
+	ib_spec = flow_attr + 1;
+	for (i = 0; i < flow_attr->num_of_specs && kern_attr_size > 0; i++) {
+		err = kern_spec_to_ib_spec(kern_spec, ib_spec);
+		if (err)
+			goto err_free;
+		flow_attr->size +=
+			((union ib_flow_spec *)ib_spec)->size;
+		kern_attr_size -= ((struct ib_kern_spec *)kern_spec)->size;
+		kern_spec += ((struct ib_kern_spec *)kern_spec)->size;
+		ib_spec += ((union ib_flow_spec *)ib_spec)->size;
+	}
+	if (kern_attr_size) {
+		pr_warn("create flow failed, %d bytes left from uverb cmd\n",
+			kern_attr_size);
+		goto err_free;
+	}
+	flow_id = ib_create_flow(qp, flow_attr, IB_FLOW_DOMAIN_USER);
+	if (IS_ERR(flow_id)) {
+		err = PTR_ERR(flow_id);
+		goto err_free;
+	}
+	flow_id->qp = qp;
+	flow_id->uobject = uobj;
+	uobj->object = flow_id;
+
+	err = idr_add_uobj(&ib_uverbs_rule_idr, uobj);
+	if (err)
+		goto destroy_flow;
+
+	memset(&resp, 0, sizeof(resp));
+	resp.flow_handle = uobj->id;
+
+	if (copy_to_user((void __user *)(unsigned long) cmd.response,
+			 &resp, sizeof(resp))) {
+		err = -EFAULT;
+		goto err_copy;
+	}
+
+	put_qp_read(qp);
+	mutex_lock(&file->mutex);
+	list_add_tail(&uobj->list, &file->ucontext->rule_list);
+	mutex_unlock(&file->mutex);
+
+	uobj->live = 1;
+
+	up_write(&uobj->mutex);
+	kfree(flow_attr);
+	if (cmd.flow_attr.num_of_specs)
+		kfree(kern_flow_attr);
+	return in_len;
+err_copy:
+	idr_remove_uobj(&ib_uverbs_rule_idr, uobj);
+destroy_flow:
+	ib_destroy_flow(flow_id);
+err_free:
+	kfree(flow_attr);
+err_put:
+	put_qp_read(qp);
+err_uobj:
+	put_uobj_write(uobj);
+err_free_attr:
+	if (cmd.flow_attr.num_of_specs)
+		kfree(kern_flow_attr);
+	return err;
+}
+
+ssize_t ib_uverbs_destroy_flow(struct ib_uverbs_file *file,
+			       const char __user *buf, int in_len,
+			       int out_len) {
+	struct ib_uverbs_destroy_flow	cmd;
+	struct ib_flow			*flow_id;
+	struct ib_uobject		*uobj;
+	int				ret;
+
+	if (copy_from_user(&cmd, buf, sizeof(cmd)))
+		return -EFAULT;
+
+	uobj = idr_write_uobj(&ib_uverbs_rule_idr, cmd.flow_handle,
+			      file->ucontext);
+	if (!uobj)
+		return -EINVAL;
+	flow_id = uobj->object;
+
+	ret = ib_destroy_flow(flow_id);
+	if (!ret)
+		uobj->live = 0;
+
+	put_uobj_write(uobj);
+
+	idr_remove_uobj(&ib_uverbs_rule_idr, uobj);
+
+	mutex_lock(&file->mutex);
+	list_del(&uobj->list);
+	mutex_unlock(&file->mutex);
+
+	put_uobj(uobj);
+
+	return ret ? ret : in_len;
+}
+
 static int __uverbs_create_xsrq(struct ib_uverbs_file *file,
 				struct ib_uverbs_create_xsrq *cmd,
 				struct ib_udata *udata)
diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index e4e7b24..75ad86c 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -73,6 +73,7 @@ DEFINE_IDR(ib_uverbs_cq_idr);
 DEFINE_IDR(ib_uverbs_qp_idr);
 DEFINE_IDR(ib_uverbs_srq_idr);
 DEFINE_IDR(ib_uverbs_xrcd_idr);
+DEFINE_IDR(ib_uverbs_rule_idr);
 
 static DEFINE_SPINLOCK(map_lock);
 static DECLARE_BITMAP(dev_map, IB_UVERBS_MAX_DEVICES);
@@ -113,7 +114,9 @@ static ssize_t (*uverbs_cmd_table[])(struct ib_uverbs_file *file,
 	[IB_USER_VERBS_CMD_OPEN_XRCD]		= ib_uverbs_open_xrcd,
 	[IB_USER_VERBS_CMD_CLOSE_XRCD]		= ib_uverbs_close_xrcd,
 	[IB_USER_VERBS_CMD_CREATE_XSRQ]		= ib_uverbs_create_xsrq,
-	[IB_USER_VERBS_CMD_OPEN_QP]		= ib_uverbs_open_qp
+	[IB_USER_VERBS_CMD_OPEN_QP]		= ib_uverbs_open_qp,
+	[IB_USER_VERBS_CMD_CREATE_FLOW]		= ib_uverbs_create_flow,
+	[IB_USER_VERBS_CMD_DESTROY_FLOW]	= ib_uverbs_destroy_flow
 };
 
 static void ib_uverbs_add_one(struct ib_device *device);
@@ -212,6 +215,14 @@ static int ib_uverbs_cleanup_ucontext(struct ib_uverbs_file *file,
 		kfree(uobj);
 	}
 
+	list_for_each_entry_safe(uobj, tmp, &context->rule_list, list) {
+		struct ib_flow *flow_id = uobj->object;
+
+		idr_remove_uobj(&ib_uverbs_rule_idr, uobj);
+		ib_destroy_flow(flow_id);
+		kfree(uobj);
+	}
+
 	list_for_each_entry_safe(uobj, tmp, &context->qp_list, list) {
 		struct ib_qp *qp = uobj->object;
 		struct ib_uqp_object *uqp =
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 6f4109b..c08b75e 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -960,6 +960,7 @@ struct ib_ucontext {
 	struct list_head	srq_list;
 	struct list_head	ah_list;
 	struct list_head	xrcd_list;
+	struct list_head	rule_list;
 	int			closing;
 };
 
diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
index 61535aa..0b233c5 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -86,7 +86,9 @@ enum {
 	IB_USER_VERBS_CMD_OPEN_XRCD,
 	IB_USER_VERBS_CMD_CLOSE_XRCD,
 	IB_USER_VERBS_CMD_CREATE_XSRQ,
-	IB_USER_VERBS_CMD_OPEN_QP
+	IB_USER_VERBS_CMD_OPEN_QP,
+	IB_USER_VERBS_CMD_CREATE_FLOW = IB_USER_VERBS_CMD_THRESHOLD,
+	IB_USER_VERBS_CMD_DESTROY_FLOW
 };
 
 /*
@@ -694,6 +696,91 @@ struct ib_uverbs_detach_mcast {
 	__u64 driver_data[0];
 };
 
+struct ib_kern_eth_filter {
+	__u8  dst_mac[6];
+	__u8  src_mac[6];
+	__be16 ether_type;
+	__be16 vlan_tag;
+};
+
+struct ib_kern_spec_eth {
+	__u32  type;
+	__u16  size;
+	__u16  reserved;
+	struct ib_kern_eth_filter val;
+	struct ib_kern_eth_filter mask;
+};
+
+struct ib_kern_ipv4_filter {
+	__be32 src_ip;
+	__be32 dst_ip;
+};
+
+struct ib_kern_spec_ipv4 {
+	__u32  type;
+	__u16  size;
+	__u16  reserved;
+	struct ib_kern_ipv4_filter val;
+	struct ib_kern_ipv4_filter mask;
+};
+
+struct ib_kern_tcp_udp_filter {
+	__be16 dst_port;
+	__be16 src_port;
+};
+
+struct ib_kern_spec_tcp_udp {
+	__u32  type;
+	__u16  size;
+	__u16  reserved;
+	struct ib_kern_tcp_udp_filter val;
+	struct ib_kern_tcp_udp_filter mask;
+};
+
+struct ib_kern_spec {
+	union {
+		struct {
+			__u32 type;
+			__u16 size;
+			__u16 reserved;
+		};
+		struct ib_kern_spec_eth	    eth;
+		struct ib_kern_spec_ipv4    ipv4;
+		struct ib_kern_spec_tcp_udp tcp_udp;
+	};
+};
+
+struct ib_kern_flow_attr {
+	__u32 type;
+	__u16 size;
+	__u16 priority;
+	__u8  num_of_specs;
+	__u8  reserved[2];
+	__u8  port;
+	__u32 flags;
+	/* Following are the optional layers according to user request
+	 * struct ib_flow_spec_xxx
+	 * struct ib_flow_spec_yyy
+	 */
+};
+
+struct ib_uverbs_create_flow  {
+	__u32 comp_mask;
+	__u64 response;
+	__u32 qp_handle;
+	struct ib_kern_flow_attr flow_attr;
+};
+
+struct ib_uverbs_create_flow_resp {
+	__u32 comp_mask;
+	__u32 flow_handle;
+};
+
+struct ib_uverbs_destroy_flow  {
+	__u32 comp_mask;
+	__u32 flow_handle;
+};
+
 struct ib_uverbs_create_srq {
 	__u64 response;
 	__u64 user_handle;
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH V4 for-next 4/4] IB/mlx4: Add receive Flow Steering support
       [not found] ` <1377694075-29287-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (2 preceding siblings ...)
  2013-08-28 12:47   ` [PATCH V4 for-next 3/4] IB/core: Export ib_create/destroy_flow " Matan Barak
@ 2013-08-28 12:47   ` Matan Barak
  3 siblings, 0 replies; 17+ messages in thread
From: Matan Barak @ 2013-08-28 12:47 UTC (permalink / raw)
  To: Roland Dreier
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Matan Barak, Or Gerlitz,
	Hadar Har-Zion, shawn.bohrer-Re5JQEeQqe8AvxtiuMwx3w,
	sean.hefty-ral2JQCrhuEAvxtiuMwx3w, tzahio-VPRAkNaXOzVWk0Htik3J/w

From: Hadar Hen Zion <hadarh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Implement ib_create_flow and ib_destroy_flow.

Translate the verbs structures provided by the user to HW structures
and call the MLX4_QP_FLOW_STEERING_ATTACH/DETACH firmware commands.

On ATTACH command completion, the firmware provides a 64-bit registration
ID which is placed into struct mlx4_ib_flow, the wrapper around the
struct ib_flow instance returned to the caller. Later, this reg ID is used
to detach that flow from the firmware.

Change-Id: I441650d8f2199addd0b8f0392608337a79d85875
Signed-off-by: Hadar Hen Zion <hadarh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/hw/mlx4/main.c    |  235 ++++++++++++++++++++++++++++++++++
 drivers/infiniband/hw/mlx4/mlx4_ib.h |   12 ++
 include/linux/mlx4/device.h          |    5 -
 3 files changed, 247 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index a188d31..8563a48 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -54,6 +54,8 @@
 #define DRV_VERSION	"1.0"
 #define DRV_RELDATE	"April 4, 2008"
 
+#define MLX4_IB_FLOW_MAX_PRIO 0xFFF
+
 MODULE_AUTHOR("Roland Dreier");
 MODULE_DESCRIPTION("Mellanox ConnectX HCA InfiniBand driver");
 MODULE_LICENSE("Dual BSD/GPL");
@@ -88,6 +90,25 @@ static void init_query_mad(struct ib_smp *mad)
 
 static union ib_gid zgid;
 
+static int check_flow_steering_support(struct mlx4_dev *dev)
+{
+	int ib_num_ports = 0;
+	int i;
+
+	mlx4_foreach_port(i, dev, MLX4_PORT_TYPE_IB)
+		ib_num_ports++;
+
+	if (dev->caps.steering_mode == MLX4_STEERING_MODE_DEVICE_MANAGED) {
+		if (ib_num_ports || mlx4_is_mfunc(dev)) {
+			pr_warn("Device managed flow steering is unavailable "
+				"for IB ports or in multifunction env.\n");
+			return 0;
+		}
+		return 1;
+	}
+	return 0;
+}
+
 static int mlx4_ib_query_device(struct ib_device *ibdev,
 				struct ib_device_attr *props)
 {
@@ -144,6 +165,8 @@ static int mlx4_ib_query_device(struct ib_device *ibdev,
 			props->device_cap_flags |= IB_DEVICE_MEM_WINDOW_TYPE_2B;
 		else
 			props->device_cap_flags |= IB_DEVICE_MEM_WINDOW_TYPE_2A;
+	if (check_flow_steering_support(dev->dev))
+		props->device_cap_flags |= IB_DEVICE_MANAGED_FLOW_STEERING;
 	}
 
 	props->vendor_id	   = be32_to_cpup((__be32 *) (out_mad->data + 36)) &
@@ -798,6 +821,209 @@ struct mlx4_ib_steering {
 	union ib_gid gid;
 };
 
+static int parse_flow_attr(struct mlx4_dev *dev,
+			   union  ib_flow_spec *ib_spec,
+			   struct _rule_hw *mlx4_spec)
+{
+	enum mlx4_net_trans_rule_id type;
+
+	switch (ib_spec->type) {
+	case IB_FLOW_SPEC_ETH:
+		type = MLX4_NET_TRANS_RULE_ID_ETH;
+		memcpy(mlx4_spec->eth.dst_mac, ib_spec->eth.val.dst_mac,
+		       ETH_ALEN);
+		memcpy(mlx4_spec->eth.dst_mac_msk, ib_spec->eth.mask.dst_mac,
+		       ETH_ALEN);
+		mlx4_spec->eth.vlan_tag = ib_spec->eth.val.vlan_tag;
+		mlx4_spec->eth.vlan_tag_msk = ib_spec->eth.mask.vlan_tag;
+		break;
+
+	case IB_FLOW_SPEC_IPV4:
+		type = MLX4_NET_TRANS_RULE_ID_IPV4;
+		mlx4_spec->ipv4.src_ip = ib_spec->ipv4.val.src_ip;
+		mlx4_spec->ipv4.src_ip_msk = ib_spec->ipv4.mask.src_ip;
+		mlx4_spec->ipv4.dst_ip = ib_spec->ipv4.val.dst_ip;
+		mlx4_spec->ipv4.dst_ip_msk = ib_spec->ipv4.mask.dst_ip;
+		break;
+
+	case IB_FLOW_SPEC_TCP:
+	case IB_FLOW_SPEC_UDP:
+		type = ib_spec->type == IB_FLOW_SPEC_TCP ?
+					MLX4_NET_TRANS_RULE_ID_TCP :
+					MLX4_NET_TRANS_RULE_ID_UDP;
+		mlx4_spec->tcp_udp.dst_port = ib_spec->tcp_udp.val.dst_port;
+		mlx4_spec->tcp_udp.dst_port_msk = ib_spec->tcp_udp.mask.dst_port;
+		mlx4_spec->tcp_udp.src_port = ib_spec->tcp_udp.val.src_port;
+		mlx4_spec->tcp_udp.src_port_msk = ib_spec->tcp_udp.mask.src_port;
+		break;
+
+	default:
+		return -EINVAL;
+	}
+	if (mlx4_map_sw_to_hw_steering_id(dev, type) < 0 ||
+	    mlx4_hw_rule_sz(dev, type) < 0)
+		return -EINVAL;
+	mlx4_spec->id = cpu_to_be16(mlx4_map_sw_to_hw_steering_id(dev, type));
+	mlx4_spec->size = mlx4_hw_rule_sz(dev, type) >> 2;
+	return mlx4_hw_rule_sz(dev, type);
+}
+
+static int __mlx4_ib_create_flow(struct ib_qp *qp, struct ib_flow_attr *flow_attr,
+			  int domain,
+			  enum mlx4_net_trans_promisc_mode flow_type,
+			  u64 *reg_id)
+{
+	int ret, i;
+	int size = 0;
+	void *ib_flow;
+	struct mlx4_ib_dev *mdev = to_mdev(qp->device);
+	struct mlx4_cmd_mailbox *mailbox;
+	struct mlx4_net_trans_rule_hw_ctrl *ctrl;
+	size_t rule_size = sizeof(struct mlx4_net_trans_rule_hw_ctrl) +
+			   (sizeof(struct _rule_hw) * flow_attr->num_of_specs);
+
+	static const u16 __mlx4_domain[] = {
+		[IB_FLOW_DOMAIN_USER] = MLX4_DOMAIN_UVERBS,
+		[IB_FLOW_DOMAIN_ETHTOOL] = MLX4_DOMAIN_ETHTOOL,
+		[IB_FLOW_DOMAIN_RFS] = MLX4_DOMAIN_RFS,
+		[IB_FLOW_DOMAIN_NIC] = MLX4_DOMAIN_NIC,
+	};
+
+	if (flow_attr->priority > MLX4_IB_FLOW_MAX_PRIO) {
+		pr_err("Invalid priority value %d\n", flow_attr->priority);
+		return -EINVAL;
+	}
+
+	if (domain >= IB_FLOW_DOMAIN_NUM) {
+		pr_err("Invalid domain value %d\n", domain);
+		return -EINVAL;
+	}
+
+	if (mlx4_map_sw_to_hw_steering_mode(mdev->dev, flow_type) < 0)
+		return -EINVAL;
+
+	mailbox = mlx4_alloc_cmd_mailbox(mdev->dev);
+	if (IS_ERR(mailbox))
+		return PTR_ERR(mailbox);
+	memset(mailbox->buf, 0, rule_size);
+	ctrl = mailbox->buf;
+
+	ctrl->prio = cpu_to_be16(__mlx4_domain[domain] |
+				 flow_attr->priority);
+	ctrl->type = mlx4_map_sw_to_hw_steering_mode(mdev->dev, flow_type);
+	ctrl->port = flow_attr->port;
+	ctrl->qpn = cpu_to_be32(qp->qp_num);
+
+	ib_flow = flow_attr + 1;
+	size += sizeof(struct mlx4_net_trans_rule_hw_ctrl);
+	for (i = 0; i < flow_attr->num_of_specs; i++) {
+		ret = parse_flow_attr(mdev->dev, ib_flow, mailbox->buf + size);
+		if (ret < 0) {
+			mlx4_free_cmd_mailbox(mdev->dev, mailbox);
+			return -EINVAL;
+		}
+		ib_flow += ((union ib_flow_spec *)ib_flow)->size;
+		size += ret;
+	}
+
+	ret = mlx4_cmd_imm(mdev->dev, mailbox->dma, reg_id, size >> 2, 0,
+			   MLX4_QP_FLOW_STEERING_ATTACH, MLX4_CMD_TIME_CLASS_A,
+			   MLX4_CMD_NATIVE);
+	if (ret == -ENOMEM)
+		pr_err("mcg table is full. Fail to register network rule.\n");
+	else if (ret == -ENXIO)
+		pr_err("Device managed flow steering is disabled. Fail to register network rule.\n");
+	else if (ret)
+		pr_err("Invalid argumant. Fail to register network rule.\n");
+
+	mlx4_free_cmd_mailbox(mdev->dev, mailbox);
+	return ret;
+}
+
+static int __mlx4_ib_destroy_flow(struct mlx4_dev *dev, u64 reg_id)
+{
+	int err;
+	err = mlx4_cmd(dev, reg_id, 0, 0,
+		       MLX4_QP_FLOW_STEERING_DETACH, MLX4_CMD_TIME_CLASS_A,
+		       MLX4_CMD_NATIVE);
+	if (err)
+		pr_err("Fail to detach network rule. registration id = 0x%llx\n",
+		       reg_id);
+	return err;
+}
+
+static struct ib_flow *mlx4_ib_create_flow(struct ib_qp *qp,
+				    struct ib_flow_attr *flow_attr,
+				    int domain)
+{
+	int err = 0, i = 0;
+	struct mlx4_ib_flow *mflow;
+	enum mlx4_net_trans_promisc_mode type[2];
+
+	memset(type, 0, sizeof(type));
+
+	mflow = kzalloc(sizeof(*mflow), GFP_KERNEL);
+	if (!mflow) {
+		err = -ENOMEM;
+		goto err_free;
+	}
+
+	switch (flow_attr->type) {
+	case IB_FLOW_ATTR_NORMAL:
+		type[0] = MLX4_FS_REGULAR;
+		break;
+
+	case IB_FLOW_ATTR_ALL_DEFAULT:
+		type[0] = MLX4_FS_ALL_DEFAULT;
+		break;
+
+	case IB_FLOW_ATTR_MC_DEFAULT:
+		type[0] = MLX4_FS_MC_DEFAULT;
+		break;
+
+	case IB_FLOW_ATTR_SNIFFER:
+		type[0] = MLX4_FS_UC_SNIFFER;
+		type[1] = MLX4_FS_MC_SNIFFER;
+		break;
+
+	default:
+		err = -EINVAL;
+		goto err_free;
+	}
+
+	while (i < ARRAY_SIZE(type) && type[i]) {
+		err = __mlx4_ib_create_flow(qp, flow_attr, domain, type[i],
+					    &mflow->reg_id[i]);
+		if (err)
+			goto err_free;
+		i++;
+	}
+
+	return &mflow->ibflow;
+
+err_free:
+	kfree(mflow);
+	return ERR_PTR(err);
+}
+
+static int mlx4_ib_destroy_flow(struct ib_flow *flow_id)
+{
+	int err, ret = 0;
+	int i = 0;
+	struct mlx4_ib_dev *mdev = to_mdev(flow_id->qp->device);
+	struct mlx4_ib_flow *mflow = to_mflow(flow_id);
+
+	while (i < ARRAY_SIZE(mflow->reg_id) && mflow->reg_id[i]) {
+		err = __mlx4_ib_destroy_flow(mdev->dev, mflow->reg_id[i]);
+		if (err)
+			ret = err;
+		i++;
+	}
+
+	kfree(mflow);
+	return ret;
+}
+
 static int mlx4_ib_mcg_attach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
 {
 	int err;
@@ -1461,6 +1687,15 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)
 			(1ull << IB_USER_VERBS_CMD_CLOSE_XRCD);
 	}
 
+	if (check_flow_steering_support(dev)) {
+		ibdev->ib_dev.create_flow	= mlx4_ib_create_flow;
+		ibdev->ib_dev.destroy_flow	= mlx4_ib_destroy_flow;
+
+		ibdev->ib_dev.uverbs_cmd_mask	|=
+			(1ull << IB_USER_VERBS_CMD_CREATE_FLOW) |
+			(1ull << IB_USER_VERBS_CMD_DESTROY_FLOW);
+	}
+
 	mlx4_ib_alloc_eqs(dev, ibdev);
 
 	spin_lock_init(&iboe->lock);
diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h
index f61ec26..036b663 100644
--- a/drivers/infiniband/hw/mlx4/mlx4_ib.h
+++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h
@@ -132,6 +132,12 @@ struct mlx4_ib_fmr {
 	struct mlx4_fmr         mfmr;
 };
 
+struct mlx4_ib_flow {
+	struct ib_flow ibflow;
+	/* translating DMFS verbs sniffer rule to FW API requires two reg IDs */
+	u64 reg_id[2];
+};
+
 struct mlx4_ib_wq {
 	u64		       *wrid;
 	spinlock_t		lock;
@@ -552,6 +558,12 @@ static inline struct mlx4_ib_fmr *to_mfmr(struct ib_fmr *ibfmr)
 {
 	return container_of(ibfmr, struct mlx4_ib_fmr, ibfmr);
 }
+
+static inline struct mlx4_ib_flow *to_mflow(struct ib_flow *ibflow)
+{
+	return container_of(ibflow, struct mlx4_ib_flow, ibflow);
+}
+
 static inline struct mlx4_ib_qp *to_mqp(struct ib_qp *ibqp)
 {
 	return container_of(ibqp, struct mlx4_ib_qp, ibqp);
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index 52c23a8..d73423c 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -1052,11 +1052,6 @@ struct _rule_hw {
 	};
 };
 
-/* translating DMFS verbs sniffer rule to the FW API would need two reg IDs */
-struct mlx4_flow_handle {
-	u64 reg_id[2];
-};
-
 int mlx4_flow_steer_promisc_add(struct mlx4_dev *dev, u8 port, u32 qpn,
 				enum mlx4_net_trans_promisc_mode mode);
 int mlx4_flow_steer_promisc_remove(struct mlx4_dev *dev, u8 port,
-- 
1.7.1


* Re: [PATCH V4 for-next 3/4] IB/core: Export ib_create/destroy_flow through uverbs
       [not found]     ` <1377694075-29287-4-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2013-08-28 16:20       ` Jason Gunthorpe
       [not found]         ` <20130828162050.GA31381-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2013-09-11 14:04       ` Yann Droneaud
  1 sibling, 1 reply; 17+ messages in thread
From: Jason Gunthorpe @ 2013-08-28 16:20 UTC (permalink / raw)
  To: Matan Barak
  Cc: Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz,
	Hadar Har-Zion, shawn.bohrer-Re5JQEeQqe8AvxtiuMwx3w,
	sean.hefty-ral2JQCrhuEAvxtiuMwx3w, tzahio-VPRAkNaXOzVWk0Htik3J/w

On Wed, Aug 28, 2013 at 03:47:54PM +0300, Matan Barak wrote:
> +
> +	if (cmd.comp_mask)
> +		return -EINVAL;

So, how do you propose to interoperate with new user space/old
kernels?

How will user space know what comp_mask values the kernel will
support?

The notion that was established in the verbs patches is that extra
structure fields are ignored by old software.

> +	if ((cmd.flow_attr.type == IB_FLOW_ATTR_SNIFFER &&
> +	     !capable(CAP_NET_ADMIN)) || !capable(CAP_NET_RAW))
> +		return -EPERM;
> +
> +	if (cmd.flow_attr.num_of_specs < 0 ||
> +	    cmd.flow_attr.num_of_specs > IB_FLOW_SPEC_SUPPORT_LAYERS)
> +		return -EINVAL;
> +
> +	kern_attr_size = cmd.flow_attr.size - sizeof(cmd) -
> +			 sizeof(struct ib_uverbs_cmd_hdr_ex);
> +
> +	if (cmd.flow_attr.size < 0 || cmd.flow_attr.size > in_len ||
> +	    kern_attr_size < 0 || kern_attr_size >
> +	    (cmd.flow_attr.num_of_specs * sizeof(struct ib_kern_spec)))
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Watch out for integer overflow here..

Jason

* Re: [PATCH V4 for-next 3/4] IB/core: Export ib_create/destroy_flow through uverbs
       [not found]         ` <20130828162050.GA31381-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2013-09-01  9:18           ` Matan Barak
       [not found]             ` <52230648.5000302-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Matan Barak @ 2013-09-01  9:18 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz,
	Hadar Hen Zion, shawn.bohrer-Re5JQEeQqe8AvxtiuMwx3w,
	sean.hefty-ral2JQCrhuEAvxtiuMwx3w, Tzahi Oved

On 28/8/2013 7:21 PM, Jason Gunthorpe wrote:
> On Wed, Aug 28, 2013 at 03:47:54PM +0300, Matan Barak wrote:
>> +
>> +	if (cmd.comp_mask)
>> +		return -EINVAL;
>
> So, how do you propose to interoperate with new user space/old
> kernels?
>
> How will user space know what comp_mask values the kernel will
> support?
>

struct ib_uverbs_create_flow_resp contains a comp_mask value. This value
should report which comp_mask fields the kernel supports.
Currently, we don't support any extensions, so we zero this field by 
doing "memset(&resp, 0, sizeof(resp));"

I suggest returning an error and setting the response comp_mask field.

> The notion that was established in the verbs patches is that extra
> structure fields are ignored by old software.
>
I'm not aware of any such concrete example. Could you please point one out?
I think it's safer to return an error. Imagine that ibv_modify_qp was 
extended for some crucial field, say IB_QP_AV2. Creating a QP without 
indicating its AV (or AV2 in that case) is invalid behavior. This
example is a bit artificial, though some future extensions could have 
such mandatory fields.
The user should do ibv_query_device before requesting features that 
might be unsupported.

>> +	if ((cmd.flow_attr.type == IB_FLOW_ATTR_SNIFFER &&
>> +	     !capable(CAP_NET_ADMIN)) || !capable(CAP_NET_RAW))
>> +		return -EPERM;
>> +
>> +	if (cmd.flow_attr.num_of_specs < 0 ||
>> +	    cmd.flow_attr.num_of_specs > IB_FLOW_SPEC_SUPPORT_LAYERS)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ <-- checked here
>> +		return -EINVAL;
>> +
>> +	kern_attr_size = cmd.flow_attr.size - sizeof(cmd) -
>> +			 sizeof(struct ib_uverbs_cmd_hdr_ex);
>> +
>> +	if (cmd.flow_attr.size < 0 || cmd.flow_attr.size > in_len ||
>> +	    kern_attr_size < 0 || kern_attr_size >
>> +	    (cmd.flow_attr.num_of_specs * sizeof(struct ib_kern_spec)))
>                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> Watch out for integer overflow here..
I don't think there's a risk of integer overflow, since sizeof(struct 
ib_kern_spec) is a constant of about 50 bytes and 
cmd.flow_attr.num_of_specs is checked above to be at most 4.
>
> Jason
>

Since the code was already merged, I'll post an incremental patch and I 
suggest that we continue the discussion there.

- Matan


* Re: [PATCH V4 for-next 3/4] IB/core: Export ib_create/destroy_flow through uverbs
       [not found]             ` <52230648.5000302-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2013-09-01 22:23               ` Jason Gunthorpe
       [not found]                 ` <20130901222304.GB3422-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Jason Gunthorpe @ 2013-09-01 22:23 UTC (permalink / raw)
  To: Matan Barak
  Cc: Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz,
	Hadar Hen Zion, shawn.bohrer-Re5JQEeQqe8AvxtiuMwx3w,
	sean.hefty-ral2JQCrhuEAvxtiuMwx3w, Tzahi Oved

On Sun, Sep 01, 2013 at 12:18:00PM +0300, Matan Barak wrote:
> >How will user space know what comp_mask values the kernel will
> >support?
> 
> struct ib_uverbs_create_flow_resp contains a comp_mask value. This value
> should contain the supported comp_masks fields.
> Currently, we don't support any extensions, so we zero this field by
> doing "memset(&resp, 0, sizeof(resp));"
> 
> I suggest returning an error and setting the response comp_mask field.

But this is inconsistent. The comp_mask field should work the same in
both directions, so if you want to return an error then the kernel
should also never return a comp_mask that user space can't support.

All that is hard, which is why the approach for the verbs extensions
was that everyone always sets the maximum comp_mask they support, and
receivers ignore what they don't understand.

>> The notion that was established in the verbs patches is that extra
>> structure fields are ignored by old software.

> I'm not aware of any such concrete example. Could you please point
> out ?

It was established during the discussion, I guess I now need to check
that the proposed patches followed it.

> I think it's safer to return an error. Imagine that ibv_modify_qp
> was extended for some crucial field, say IB_QP_AV2. Creating a QP
> without indicating its AV (or AV2 in that case) is an invalid
> behavior. This example is a bit artificial, though some future
> extensions could have such mandatory fields.
> The user should do ibv_query_device before requesting features that
> might be unsupported.

No, you are mixing things. The comp_mask should only be used to
indicate the memory in the structure was initialized to something
valid (which might be a no-action initialization)

It should never be used to indicate that the receiver must do
something with the data.

Requesting a fictional AV2 must be done via some other means that has
a suitable failure mode, eg check the device caps before setting an
actionable value.

At least in uverbs the notion should be that comp_mask can be reset if
the soname was bumped.

Further, we shouldn't burden userspace with setting a 'correct'
comp_mask. That is a horrible ABI, at least a 3 on the 'Rusty Scale'.

The correct comp_mask should always match the fields that are
initialized by the caller. That is simple to explain and easy to audit
for correctness.

Anything else is too hard for users.

Jason

* Re: [PATCH V4 for-next 3/4] IB/core: Export ib_create/destroy_flow through uverbs
       [not found]                 ` <20130901222304.GB3422-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2013-09-02  9:26                   ` Matan Barak
       [not found]                     ` <522459AE.9070107-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Matan Barak @ 2013-09-02  9:26 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz,
	Hadar Hen Zion, shawn.bohrer-Re5JQEeQqe8AvxtiuMwx3w,
	sean.hefty-ral2JQCrhuEAvxtiuMwx3w, Tzahi Oved

On 2/9/2013 1:23 AM, Jason Gunthorpe wrote:
> On Sun, Sep 01, 2013 at 12:18:00PM +0300, Matan Barak wrote:
>>> How will user space know what comp_mask values the kernel will
>>> support?
>>
>> struct ib_uverbs_create_flow_resp contains a comp_mask value. This value
>> should contain the supported comp_masks fields.
>> Currently, we don't support any extensions, so we zero this field by
>> doing "memset(&resp, 0, sizeof(resp));"
>>
>> I suggest returning an error and setting the response comp_mask field.
>
> But this is inconsistent. The comp_mask field should work the same in
> both directions, so if you want to return an error then the kernel
> should also never return a comp_mask that user space can't support.
>
> All that is hard, which is why the approach for the verbs extensions
> was that everyone always sets the maximum comp_mask they support, and
> receivers ignore what they don't understand.
>
This seems reasonable: returning an error shouldn't be accompanied by a 
comp_mask. I agree, we need other ways of telling the user which fields 
are supported.
>>> The notion that was established in the verbs patches is that extra
>>> structure fields are ignored by old software.
>
>> I'm not aware of any such concrete example. Could you please point
>> out ?
>
> It was established during the discussion, I guess I now need to check
> that the proposed patches followed it.
>
The merged patches just ignore the comp_mask. Roland asked whether we 
should check that the comp_mask is 0 so we're not silently ignoring some 
future extension fields, and I agree.
I think we all agree that some fields are mandatory to the nature of 
the command, and the kernel should return an error on receiving such a 
field it doesn't understand. But if the kernel doesn't know the field 
and we're working on a "default ignore" policy, how will the kernel 
know it should reject the request?

On the contrary, if a field is optional, the user could ask the kernel 
in advance if it supports this field and act accordingly.

>> I think it's safer to return an error. Imagine that ibv_modify_qp
>> was extended for some crucial field, say IB_QP_AV2. Creating a QP
>> without indicating its AV (or AV2 in that case) is an invalid
>> behavior. This example is a bit artificial, though some future
>> extensions could have such mandatory fields.
>> The user should do ibv_query_device before requesting features that
>> might be unsupported.
>
> No, you are mixing things. The comp_mask should only be used to
> indicate the memory in the structure was initialized to something
> valid (which might be a no-action initialization)
>
> It should never be used to indicate that the receiver must do
> something with the data.
If the receiver shouldn't do something with the data, why was this field 
passed in the first place?
>
> Requesting a fictional AV2 must be done via some other means that has
> a suitable failure mode, eg check the device caps before setting an
> actionable value.
I agree, we should use query device caps to know whether a feature is 
supported. Once a feature is supported, all its associated fields in 
comp_mask should be available to the user.
>
> At least in uverbs the notion should be that comp_mask can be reset if
> the soname was bumped.
>
> Further, we shouldn't burden userspace with setting a 'correct'
> comp_mask. That is a horrible ABI, at least a 3 on the 'Rusty Scale'.
>
> The correct comp_mask should always match the fields that are
> initialized by the caller. That is simple to explain and easy to audit
> for correctness.
I agree, but the request could be rejected if an unsupported comp_mask 
field was given. The user should query the device capabilities, see if 
the feature he wants is supported and set the fields accordingly. It's 
simple, it's like every other stack and it's robust.
>
> Anything else is too hard for users.
>
> Jason
>
Matan

* Re: [PATCH V4 for-next 3/4] IB/core: Export ib_create/destroy_flow through uverbs
       [not found]                     ` <522459AE.9070107-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2013-09-03  4:16                       ` Jason Gunthorpe
       [not found]                         ` <20130903041636.GA3875-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Jason Gunthorpe @ 2013-09-03  4:16 UTC (permalink / raw)
  To: Matan Barak
  Cc: Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz,
	Hadar Hen Zion, shawn.bohrer-Re5JQEeQqe8AvxtiuMwx3w,
	sean.hefty-ral2JQCrhuEAvxtiuMwx3w, Tzahi Oved

On Mon, Sep 02, 2013 at 12:26:06PM +0300, Matan Barak wrote:

> >The correct comp_mask should always match the fields that are
> >initialized by the caller. That is simple to explain and easy to audit
> >for correctness.

> I agree, but the request could be rejected if an unsupported
> comp_mask field was given. The user should query the device

No, that is the exact opposite of what I said.

You are trying to make comp_mask indicate if the command should use
the data, I only want the comp_mask to indicate that the data has been
initialized (all new members must have a no-action value), they are
completely different things.

We already have commands that don't touch every member of an input
structure depending on context. eg commands acting on a UD QP don't
touch fields that are used only by RC QPs.

> capabilities, see if the feature he wants is supported and set the
> fields accordingly. It's simple, it's like every other stack and
> it's robust.

It is certainly not robust. You are relying on userspace to set the
correct value for the kernel version being used, and there is no way
for developers to test this because the latest kernel will accept all
values in comp_mask.

That is horrible. Our carefully designed ABI compatibility is worth
nothing in practice if commands return EINVAL on old kernels!

Old and new kernels must behave consistently.

The way to handle comp_mask is to have new kernels *VALIDATE* and old
kernels ignore. New kernels will require a *no action* value in all
situations where the feature is not supported (eg an older HCA,
feature not enabled, feature not present in device caps, etc).

This way the new kernel runtime tests the input, in a manner that is
consistent with the handling of old kernels (old kernels treat
unsupported members as no-action). Testing your app on a new
kernel, with an HCA that does not support the feature, is sufficient to
show that it will work on past kernels that do not support the
feature.

Jason

* Re: [PATCH V4 for-next 3/4] IB/core: Export ib_create/destroy_flow through uverbs
       [not found]                         ` <20130903041636.GA3875-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2013-09-03 13:36                           ` Matan Barak
  0 siblings, 0 replies; 17+ messages in thread
From: Matan Barak @ 2013-09-03 13:36 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, roland-DgEjT+Ai2ygdnm+yROfE0A,
	hadarh-VPRAkNaXOzVWk0Htik3J/w, Shawn Bohrer, Hefty, Sean,
	Or Gerlitz

On 3/9/2013 7:16 AM, Jason Gunthorpe wrote:
> On Mon, Sep 02, 2013 at 12:26:06PM +0300, Matan Barak wrote:
>
>>> The correct comp_mask should always match the fields that are
>>> initialized by the caller. That is simple to explain and easy to audit
>>> for correctness.
>
>> I agree, but the request could be rejected if an unsupported
>> comp_mask field was given. The user should query the device
>
> No, that is the exact opposite of what I said.
>
> You are trying to make comp_mask indicate if the command should use
> the data, I only want the comp_mask to indicate that the data has been
> initialized (all new members must have a no-action value), they are
> completely different things.
>

If the kernel treats unprovided fields as "no action" by default (for a 
new kernel with an old user space), why does it matter whether the user 
initialized them with "no action" or the kernel just assumes so? Why do 
we need a comp_mask for this?

I think that comp_mask should be used only for fields with an "action" 
value that the user requests the kernel to handle.

> We already have commands that don't touch every member of an input
> structure depending on context. eg commands acting on a UD QP don't
> touch fields that are used only by RC QPs.
>

We use attr_mask and the QP type in order to know which fields should be 
treated.

>> capabilities, see if the feature he wants is supported and set the
>> fields accordingly. It's simple, it's like every other stack and
>> it's robust.
>
> It is certainly not robust. You are relying on userspace to set the
> correct value for the kernel version being used, and there is no way
> for developers to test this because the latest kernel will accept all
> values in comp_mask.

If comp_mask is treated as marking "action" fields (instead of 
"initialized" fields), comp_mask = 0 is always valid: it means that all 
extended fields have their default/"older" values. In other words, the 
kernel always knows how to support "older" user space.

>
> That is horrible. Our carefully designed ABI compatibility is worth
> nothing in practice if commands return EINVAL on old kernels!
>
> Old and new kernels must behave consistently.
>
> The way to handle comp_mask is to have new kernels *VALIDATE* and old
> kernels ignore. New kernels will require a *no action* value in all
> situations where the feature is not supported (eg an older HCA,
> feature not enabled, feature not present in device caps, etc).
>
> This way the new kernel runtime tests the input, in a manner that is
> consistent with the handling of old kernels (old kernel treat
> unsupported members as no-action). Testing your app on a new
> kernel, with an HCA that does not support the feature is sufficient to
> show that it will work on past kernels that do not support the
> feature.
>

Basically, I think I understand the ABI you suggest. I think it could 
lead to a situation where an old kernel ignores an "action" field and 
processes the command differently than expected.
That said, I'm open to other opinions.

> Jason
>

Matan


* Re: [PATCH V4 for-next 3/4] IB/core: Export ib_create/destroy_flow through uverbs
       [not found]     ` <1377694075-29287-4-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2013-08-28 16:20       ` Jason Gunthorpe
@ 2013-09-11 14:04       ` Yann Droneaud
       [not found]         ` <1378908269.2258.31.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
  1 sibling, 1 reply; 17+ messages in thread
From: Yann Droneaud @ 2013-09-11 14:04 UTC (permalink / raw)
  To: Matan Barak
  Cc: Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz,
	Hadar Har-Zion, shawn.bohrer-Re5JQEeQqe8AvxtiuMwx3w,
	sean.hefty-ral2JQCrhuEAvxtiuMwx3w, tzahio-VPRAkNaXOzVWk0Htik3J/w

Le mercredi 28 août 2013 à 15:47 +0300, Matan Barak a écrit :
> From: Hadar Hen Zion <hadarh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> 
> Implement ib_uverbs_create_flow and ib_uverbs_destroy_flow to
> support flow steering for user space applications.
> 
> Change-Id: I0bc6dbe26f4776d00f95b6200ff43ccb24000e31
> Signed-off-by: Hadar Hen Zion <hadarh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Seen this in linux-next as commit
436f2ad05a0b65b1467ddf51bc68171c381bf844

> ---
>  drivers/infiniband/core/uverbs.h      |    3 +
>  drivers/infiniband/core/uverbs_cmd.c  |  228 +++++++++++++++++++++++++++++++++
>  drivers/infiniband/core/uverbs_main.c |   13 ++-
>  include/rdma/ib_verbs.h               |    1 +
>  include/uapi/rdma/ib_user_verbs.h     |   89 +++++++++++++-
>  5 files changed, 332 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
> index b105140..2856696 100644
> --- a/drivers/infiniband/core/uverbs_cmd.c
> +++ b/drivers/infiniband/core/uverbs_cmd.c
> @@ -2597,6 +2599,232 @@ out_put:
>  	return ret ? ret : in_len;
>  }
>  
> +static int kern_spec_to_ib_spec(struct ib_kern_spec *kern_spec,
> +				union ib_flow_spec *ib_spec)
> +{
> +	ib_spec->type = kern_spec->type;
> +
> +	switch (ib_spec->type) {
> +	case IB_FLOW_SPEC_ETH:
> +		ib_spec->eth.size = sizeof(struct ib_flow_spec_eth);
> +		if (ib_spec->eth.size != kern_spec->eth.size)
> +			return -EINVAL;
> +		memcpy(&ib_spec->eth.val, &kern_spec->eth.val,
> +		       sizeof(struct ib_flow_eth_filter));
> +		memcpy(&ib_spec->eth.mask, &kern_spec->eth.mask,
> +		       sizeof(struct ib_flow_eth_filter));
> +		break;
> +	case IB_FLOW_SPEC_IPV4:
> +		ib_spec->ipv4.size = sizeof(struct ib_flow_spec_ipv4);
> +		if (ib_spec->ipv4.size != kern_spec->ipv4.size)
> +			return -EINVAL;
> +		memcpy(&ib_spec->ipv4.val, &kern_spec->ipv4.val,
> +		       sizeof(struct ib_flow_ipv4_filter));
> +		memcpy(&ib_spec->ipv4.mask, &kern_spec->ipv4.mask,
> +		       sizeof(struct ib_flow_ipv4_filter));
> +		break;
> +	case IB_FLOW_SPEC_TCP:
> +	case IB_FLOW_SPEC_UDP:
> +		ib_spec->tcp_udp.size = sizeof(struct ib_flow_spec_tcp_udp);
> +		if (ib_spec->tcp_udp.size != kern_spec->tcp_udp.size)
> +			return -EINVAL;
> +		memcpy(&ib_spec->tcp_udp.val, &kern_spec->tcp_udp.val,
> +		       sizeof(struct ib_flow_tcp_udp_filter));
> +		memcpy(&ib_spec->tcp_udp.mask, &kern_spec->tcp_udp.mask,
> +		       sizeof(struct ib_flow_tcp_udp_filter));
> +		break;
> +	default:
> +		return -EINVAL;
> +	}
> +	return 0;
> +}
> +
> +ssize_t ib_uverbs_create_flow(struct ib_uverbs_file *file,
> +			      const char __user *buf, int in_len,
> +			      int out_len)
> +{
> +	struct ib_uverbs_create_flow	  cmd;
> +	struct ib_uverbs_create_flow_resp resp;
> +	struct ib_uobject		  *uobj;
> +	struct ib_flow			  *flow_id;
> +	struct ib_kern_flow_attr	  *kern_flow_attr;
> +	struct ib_flow_attr		  *flow_attr;
> +	struct ib_qp			  *qp;
> +	int err = 0;
> +	void *kern_spec;
> +	void *ib_spec;

Why void * here? There's a type for them, and the function
kern_spec_to_ib_spec() has a prototype with those types.

> +	int i;
> +	int kern_attr_size;
> +
> +	if (out_len < sizeof(resp))
> +		return -ENOSPC;
> +
> +	if (copy_from_user(&cmd, buf, sizeof(cmd)))
> +		return -EFAULT;
> +
> +	if (cmd.comp_mask)
> +		return -EINVAL;
> +
> +	if ((cmd.flow_attr.type == IB_FLOW_ATTR_SNIFFER &&
> +	     !capable(CAP_NET_ADMIN)) || !capable(CAP_NET_RAW))
> +		return -EPERM;
> +
> +	if (cmd.flow_attr.num_of_specs < 0 ||

num_of_specs is unsigned ...

> +	    cmd.flow_attr.num_of_specs > IB_FLOW_SPEC_SUPPORT_LAYERS)
> +		return -EINVAL;
> +
> +	kern_attr_size = cmd.flow_attr.size - sizeof(cmd) -
> +			 sizeof(struct ib_uverbs_cmd_hdr_ex);
> +

Please, no under/overflow. Check that cmd.flow_attr.size is bigger than
sizeof(cmd) + sizeof(struct ib_uverbs_cmd_hdr_ex) before computing
kern_attr_size.
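
[Editorial sketch: the check-before-subtract pattern Yann asks for, so the
size can never wrap negative, in userspace C. Names and types are
hypothetical stand-ins; hdr_len stands in for sizeof(cmd) +
sizeof(struct ib_uverbs_cmd_hdr_ex).]

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Check before subtracting: reject the input while both operands are
 * still known-good, instead of computing a possibly-wrapped size and
 * trying to recognize the damage afterwards. */
static int compute_attr_size(size_t flow_attr_size, size_t hdr_len,
			     size_t *kern_attr_size)
{
	if (flow_attr_size < hdr_len)
		return -EINVAL;	/* user input too small: fail early */
	*kern_attr_size = flow_attr_size - hdr_len;
	return 0;
}
```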

> +	if (cmd.flow_attr.size < 0 || cmd.flow_attr.size > in_len ||

just like num_of_specs, it's unsigned !

cmd.flow_attr.size is certainly less than in_len, so you probably mean
cmd.flow_attr.size > (in_len - sizeof(cmd) - sizeof(struct
ib_uverbs_cmd_hdr_ex)).

> +	    kern_attr_size < 0 || kern_attr_size >
> +	    (cmd.flow_attr.num_of_specs * sizeof(struct ib_kern_spec)))
> +		return -EINVAL;
> +

let's make kern_attr_size unsigned.

And be stricter: kern_attr_size != cmd.flow_attr.num_of_specs *
sizeof(struct ib_kern_spec) might be welcome.

> +	if (cmd.flow_attr.num_of_specs) {
> +		kern_flow_attr = kmalloc(cmd.flow_attr.size, GFP_KERNEL);
> +		if (!kern_flow_attr)
> +			return -ENOMEM;
> +
> +		memcpy(kern_flow_attr, &cmd.flow_attr, sizeof(*kern_flow_attr));

might be sizeof(struct ib_kern_spec) as used in the computation.

> +		if (copy_from_user(kern_flow_attr + 1, buf + sizeof(cmd),
> +				   kern_attr_size)) {
> +			err = -EFAULT;
> +			goto err_free_attr;
> +		}

I don't feel comfortable with the bounds: previously it was 
sizeof(cmd) + sizeof(struct ib_uverbs_cmd_hdr_ex), and here
copy_from_user() is copying starting from buf + sizeof(cmd) ... so where
is the struct ib_uverbs_cmd_hdr_ex ?

And I dislike the + 1 throughout this portion of the code: why is the first
element so different from the others? I didn't dig enough into
the code to understand, but from my point of view, it doesn't fit.

> +	kern_spec = kern_flow_attr + 1;
> +	ib_spec = flow_attr + 1;
> +	for (i = 0; i < flow_attr->num_of_specs && kern_attr_size > 0; i++) {
> +		err = kern_spec_to_ib_spec(kern_spec, ib_spec);
> +		if (err)
> +			goto err_free;
> +		flow_attr->size +=
> +			((union ib_flow_spec *)ib_spec)->size;
> +		kern_attr_size -= ((struct ib_kern_spec *)kern_spec)->size;
> +		kern_spec += ((struct ib_kern_spec *)kern_spec)->size;
> +		ib_spec += ((union ib_flow_spec *)ib_spec)->size;

Why use void * for ib_spec and kern_spec if there's a union type for
them ?

> diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
> index 61535aa..0b233c5 100644
> --- a/include/uapi/rdma/ib_user_verbs.h
> +++ b/include/uapi/rdma/ib_user_verbs.h
> @@ -86,7 +86,9 @@ enum {

> +struct ib_kern_spec {
> +	union {
> +		struct {
> +			__u32 type;
> +			__u16 size;
> +			__u16 reserved;
> +		};
> +		struct ib_kern_spec_eth	    eth;
> +		struct ib_kern_spec_ipv4    ipv4;
> +		struct ib_kern_spec_tcp_udp tcp_udp;
> +	};
> +};
> +

[Aside note: no IPv6 spec ?]


> +struct ib_kern_flow_attr {
> +	__u32 type;
> +	__u16 size;
> +	__u16 priority;
> +	__u8  num_of_specs;
> +	__u8  reserved[2];
> +	__u8  port;
> +	__u32 flags;
> +	/* Following are the optional layers according to user request
> +	 * struct ib_flow_spec_xxx
> +	 * struct ib_flow_spec_yyy
> +	 */
> +};
> +
> +struct ib_uverbs_create_flow  {
> +	__u32 comp_mask;

Alignment notice: There's a hole here on 64bits system with stricter
alignment rules.

> +	__u64 response;
> +	__u32 qp_handle;
> +	struct ib_kern_flow_attr flow_attr;
> +};
> +

Sorry I've missed that before it got included in the infiniband git tree.

Regards.

-- 
Yann Droneaud
OPTEYA





* Re: [PATCH V4 for-next 3/4] IB/core: Export ib_create/destroy_flow through uverbs
       [not found]         ` <1378908269.2258.31.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
@ 2013-09-11 15:10           ` Or Gerlitz
       [not found]             ` <523087CE.4080007-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2013-09-17 12:35           ` Matan Barak
  1 sibling, 1 reply; 17+ messages in thread
From: Or Gerlitz @ 2013-09-11 15:10 UTC (permalink / raw)
  To: Yann Droneaud
  Cc: Matan Barak, Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Hadar Har-Zion, shawn.bohrer-Re5JQEeQqe8AvxtiuMwx3w,
	sean.hefty-ral2JQCrhuEAvxtiuMwx3w, tzahio-VPRAkNaXOzVWk0Htik3J/w

On 11/09/2013 17:04, Yann Droneaud wrote:

[...]
>> +	int i;
>> +	int kern_attr_size;
>> +
>> +	if (out_len < sizeof(resp))
>> +		return -ENOSPC;
>> +
>> +	if (copy_from_user(&cmd, buf, sizeof(cmd)))
>> +		return -EFAULT;
>> +
>> +	if (cmd.comp_mask)
>> +		return -EINVAL;
>> +
>> +	if ((cmd.flow_attr.type == IB_FLOW_ATTR_SNIFFER &&
>> +	     !capable(CAP_NET_ADMIN)) || !capable(CAP_NET_RAW))
>> +		return -EPERM;
>> +
>> +	if (cmd.flow_attr.num_of_specs < 0 ||
> num_of_specs is unsigned ...

sure, will send a fix that eliminates this redundant check
>
>> +	    cmd.flow_attr.num_of_specs > IB_FLOW_SPEC_SUPPORT_LAYERS)
>> +		return -EINVAL;
>> +
>> +	kern_attr_size = cmd.flow_attr.size - sizeof(cmd) -
>> +			 sizeof(struct ib_uverbs_cmd_hdr_ex);
>> +
> Please, no under/overflow. Check that cmd.flow_attr.size is bigger than
> sizeof(cmd) + sizeof(struct ib_uverbs_cmd_hdr_ex) before computing kern_attr_size.

We've got feedback on this code from Sean and Roland, and Matan Barak 
from our team addressed it in detail.
Matan is OOO for a couple of days, plus we're having a few holidays here.
One way or another (accepting the comments and sending fixes, or arguing 
with you over the list), all your feedback will be addressed before 
3.12-rc2 is out. Thanks for looking over this!

>
>> +	if (cmd.flow_attr.size < 0 || cmd.flow_attr.size > in_len ||
> just like num_of_specs, it's unsigned !

ditto here, will fix.

[...]

diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
index 61535aa..0b233c5 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -86,7 +86,9 @@ enum {

>> +struct ib_kern_spec {
>> +	union {
>> +		struct {
>> +			__u32 type;
>> +			__u16 size;
>> +			__u16 reserved;
>> +		};
>> +		struct ib_kern_spec_eth	    eth;
>> +		struct ib_kern_spec_ipv4    ipv4;
>> +		struct ib_kern_spec_tcp_udp tcp_udp;
>> +	};
>> +};
>> +
> [Aside note: no IPv6 spec ?]

Indeed, but note that the way the specs are defined allows for adding 
future spec types without breaking anything; it's done in a TLV 
(Type-Length-Value) manner. See the change log of the main IB core 
flow steering patch:
319a441 IB/core: Add receive flow steering support
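
[Editorial sketch: why the TLV layout is forward-compatible. A parser
keyed on the spec's type field cleanly rejects types it doesn't know,
while a newer parser adds one case without touching existing layouts.
The enum values below are hypothetical, not the real uverbs constants.]

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical spec types; SPEC_IPV6 stands in for a future addition. */
enum { SPEC_ETH = 0x20, SPEC_IPV4 = 0x30, SPEC_IPV6 = 0x31 };

/* A parser built before SPEC_IPV6 existed: the unknown type is rejected
 * with -1 (as kern_spec_to_ib_spec returns -EINVAL), never misparsed,
 * because each spec carries its own type and size. */
static int parse_spec_type(uint32_t type)
{
	switch (type) {
	case SPEC_ETH:
	case SPEC_IPV4:
		return 0;
	default:
		return -1;	/* unknown/future type, e.g. SPEC_IPV6 */
	}
}
```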

Or.


* Re: [PATCH V4 for-next 3/4] IB/core: Export ib_create/destroy_flow through uverbs
       [not found]             ` <523087CE.4080007-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2013-09-11 15:25               ` Yann Droneaud
  0 siblings, 0 replies; 17+ messages in thread
From: Yann Droneaud @ 2013-09-11 15:25 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Matan Barak, Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Hadar Har-Zion, shawn.bohrer-Re5JQEeQqe8AvxtiuMwx3w,
	sean.hefty-ral2JQCrhuEAvxtiuMwx3w, tzahio-VPRAkNaXOzVWk0Htik3J/w

Le 11.09.2013 17:10, Or Gerlitz a écrit :
> On 11/09/2013 17:04, Yann Droneaud wrote:
> 
> [...]
>>> +	int i;
>>> +	int kern_attr_size;
>>> +
>>> +	if (out_len < sizeof(resp))
>>> +		return -ENOSPC;
>>> +
>>> +	if (copy_from_user(&cmd, buf, sizeof(cmd)))
>>> +		return -EFAULT;
>>> +
>>> +	if (cmd.comp_mask)
>>> +		return -EINVAL;
>>> +
>>> +	if ((cmd.flow_attr.type == IB_FLOW_ATTR_SNIFFER &&
>>> +	     !capable(CAP_NET_ADMIN)) || !capable(CAP_NET_RAW))
>>> +		return -EPERM;
>>> +
>>> +	if (cmd.flow_attr.num_of_specs < 0 ||
>> num_of_specs is unsigned ...
> 
> sure, will send fix that eliminates this redundant check
>> 
>>> +	    cmd.flow_attr.num_of_specs > IB_FLOW_SPEC_SUPPORT_LAYERS)
>>> +		return -EINVAL;
>>> +
>>> +	kern_attr_size = cmd.flow_attr.size - sizeof(cmd) -
>>> +			 sizeof(struct ib_uverbs_cmd_hdr_ex);
>>> +
>> Please, no under/overflow. Check that cmd.flow_attr.size is bigger 
>> than
>> sizeof(cmd) + sizeof(struct ib_uverbs_cmd_hdr_ex) before computing 
>> kern_attr_size.
> 
> We've got feedback on this code from Sean and Roland, and Matan Barak
> from our team addressed it in detail.
> Matan is OOO for a couple of days, plus we're having a few holidays here.
> One way or another (accepting the comments and sending fixes, or
> arguing with you over the list), all your feedback will be addressed
> before 3.12-rc2 is out. Thanks for looking over this!
> 

Thanks a lot.

>> 
>>> +	if (cmd.flow_attr.size < 0 || cmd.flow_attr.size > in_len ||
>> just like num_of_specs, it's unsigned !
> 
> ditto here, will fix.
> 
> [...]
> 
> diff --git a/include/uapi/rdma/ib_user_verbs.h
> b/include/uapi/rdma/ib_user_verbs.h
> index 61535aa..0b233c5 100644
> --- a/include/uapi/rdma/ib_user_verbs.h
> +++ b/include/uapi/rdma/ib_user_verbs.h
> @@ -86,7 +86,9 @@ enum {
> 
>>> +struct ib_kern_spec {
>>> +	union {
>>> +		struct {
>>> +			__u32 type;
>>> +			__u16 size;
>>> +			__u16 reserved;
>>> +		};
>>> +		struct ib_kern_spec_eth	    eth;
>>> +		struct ib_kern_spec_ipv4    ipv4;
>>> +		struct ib_kern_spec_tcp_udp tcp_udp;
>>> +	};
>>> +};
>>> +
>> [Aside note: no IPv6 spec ?]
> 
> Indeed, but note that the way the specs are defined allows for adding
> future spec types without breaking anything; it's done in a TLV
> (Type-Length-Value) manner. See the change log of the main IB core
> flow steering patch:
> 319a441 IB/core: Add receive flow steering support
> 

I've noticed the TLV nature of ib_kern_spec, but a bit too late, so my 
comments about the + 1 and the first element being handled differently 
from the others are rather pointless.

I will try to provide a patch proposing another way of describing the 
data (same layout) before the v3.12 release.
Hopefully I'll have more time since Linus' hard drive broke ;)

Regards.

-- 
Yann Droneaud
OPTEYA



* Re: [PATCH V4 for-next 3/4] IB/core: Export ib_create/destroy_flow through uverbs
       [not found]         ` <1378908269.2258.31.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
  2013-09-11 15:10           ` Or Gerlitz
@ 2013-09-17 12:35           ` Matan Barak
       [not found]             ` <52384C9D.6050900-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  1 sibling, 1 reply; 17+ messages in thread
From: Matan Barak @ 2013-09-17 12:35 UTC (permalink / raw)
  To: Yann Droneaud
  Cc: Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz,
	Hadar Har-Zion, shawn.bohrer-Re5JQEeQqe8AvxtiuMwx3w,
	sean.hefty-ral2JQCrhuEAvxtiuMwx3w, tzahio-VPRAkNaXOzVWk0Htik3J/w

On 11/9/2013 5:04 PM, Yann Droneaud wrote:
> Le mercredi 28 août 2013 à 15:47 +0300, Matan Barak a écrit :
>> From: Hadar Hen Zion <hadarh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>
>> Implement ib_uverbs_create_flow and ib_uverbs_destroy_flow to
>> support flow steering for user space applications.
>>
>> Change-Id: I0bc6dbe26f4776d00f95b6200ff43ccb24000e31
>> Signed-off-by: Hadar Hen Zion <hadarh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>> Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>> Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>
> Seen this in linux-next as commit
> 436f2ad05a0b65b1467ddf51bc68171c381bf844
>
>> ---
>>   drivers/infiniband/core/uverbs.h      |    3 +
>>   drivers/infiniband/core/uverbs_cmd.c  |  228 +++++++++++++++++++++++++++++++++
>>   drivers/infiniband/core/uverbs_main.c |   13 ++-
>>   include/rdma/ib_verbs.h               |    1 +
>>   include/uapi/rdma/ib_user_verbs.h     |   89 +++++++++++++-
>>   5 files changed, 332 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
>> index b105140..2856696 100644
>> --- a/drivers/infiniband/core/uverbs_cmd.c
>> +++ b/drivers/infiniband/core/uverbs_cmd.c
>> @@ -2597,6 +2599,232 @@ out_put:
>>   	return ret ? ret : in_len;
>>   }
>>
>> +static int kern_spec_to_ib_spec(struct ib_kern_spec *kern_spec,
>> +				union ib_flow_spec *ib_spec)
>> +{
>> +	ib_spec->type = kern_spec->type;
>> +
>> +	switch (ib_spec->type) {
>> +	case IB_FLOW_SPEC_ETH:
>> +		ib_spec->eth.size = sizeof(struct ib_flow_spec_eth);
>> +		if (ib_spec->eth.size != kern_spec->eth.size)
>> +			return -EINVAL;
>> +		memcpy(&ib_spec->eth.val, &kern_spec->eth.val,
>> +		       sizeof(struct ib_flow_eth_filter));
>> +		memcpy(&ib_spec->eth.mask, &kern_spec->eth.mask,
>> +		       sizeof(struct ib_flow_eth_filter));
>> +		break;
>> +	case IB_FLOW_SPEC_IPV4:
>> +		ib_spec->ipv4.size = sizeof(struct ib_flow_spec_ipv4);
>> +		if (ib_spec->ipv4.size != kern_spec->ipv4.size)
>> +			return -EINVAL;
>> +		memcpy(&ib_spec->ipv4.val, &kern_spec->ipv4.val,
>> +		       sizeof(struct ib_flow_ipv4_filter));
>> +		memcpy(&ib_spec->ipv4.mask, &kern_spec->ipv4.mask,
>> +		       sizeof(struct ib_flow_ipv4_filter));
>> +		break;
>> +	case IB_FLOW_SPEC_TCP:
>> +	case IB_FLOW_SPEC_UDP:
>> +		ib_spec->tcp_udp.size = sizeof(struct ib_flow_spec_tcp_udp);
>> +		if (ib_spec->tcp_udp.size != kern_spec->tcp_udp.size)
>> +			return -EINVAL;
>> +		memcpy(&ib_spec->tcp_udp.val, &kern_spec->tcp_udp.val,
>> +		       sizeof(struct ib_flow_tcp_udp_filter));
>> +		memcpy(&ib_spec->tcp_udp.mask, &kern_spec->tcp_udp.mask,
>> +		       sizeof(struct ib_flow_tcp_udp_filter));
>> +		break;
>> +	default:
>> +		return -EINVAL;
>> +	}
>> +	return 0;
>> +}
>> +
>> +ssize_t ib_uverbs_create_flow(struct ib_uverbs_file *file,
>> +			      const char __user *buf, int in_len,
>> +			      int out_len)
>> +{
>> +	struct ib_uverbs_create_flow	  cmd;
>> +	struct ib_uverbs_create_flow_resp resp;
>> +	struct ib_uobject		  *uobj;
>> +	struct ib_flow			  *flow_id;
>> +	struct ib_kern_flow_attr	  *kern_flow_attr;
>> +	struct ib_flow_attr		  *flow_attr;
>> +	struct ib_qp			  *qp;
>> +	int err = 0;
>> +	void *kern_spec;
>> +	void *ib_spec;
>
> why void * here, there's a type for them, and the function
> kern_spec_to_ib_spec() has a prototype with those types.

We could use struct ib_kern_spec and union ib_flow_spec, but since there 
is a lot of pointer arithmetic here, I think void * makes for more 
readable code. Since the actual specs are of different sizes and are 
placed one after another, traversing the specs requires each spec's 
actual size rather than the union size.
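
[Editorial sketch: the traversal Matan describes, in userspace C. Each
spec is advanced by its own size field, which is why the kernel code
walks a byte pointer rather than indexing an array of the union type.
The header struct below is a simplified stand-in for the common prefix
of struct ib_kern_spec, and memcpy is used instead of a cast to keep the
userspace sketch portable.]

```c
#include <stdint.h>
#include <string.h>

/* Simplified TLV header mirroring the common prefix of struct ib_kern_spec. */
struct tlv_hdr {
	uint32_t type;
	uint16_t size;		/* total size of this spec, header included */
	uint16_t reserved;
};

/* Walk variable-size specs packed back to back, advancing by each
 * spec's own size field, and validate the size before consuming it. */
static int count_specs(const void *buf, int remaining)
{
	const char *p = buf;
	int n = 0;

	while (remaining > 0) {
		struct tlv_hdr h;

		if ((size_t)remaining < sizeof(h))
			return -1;	/* truncated header */
		memcpy(&h, p, sizeof(h));
		if (h.size < sizeof(h) || h.size > (size_t)remaining)
			return -1;	/* malformed spec */
		remaining -= h.size;
		p += h.size;		/* advance by the spec's own size */
		n++;
	}
	return n;
}
```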

>
>> +	int i;
>> +	int kern_attr_size;
>> +
>> +	if (out_len < sizeof(resp))
>> +		return -ENOSPC;
>> +
>> +	if (copy_from_user(&cmd, buf, sizeof(cmd)))
>> +		return -EFAULT;
>> +
>> +	if (cmd.comp_mask)
>> +		return -EINVAL;
>> +
>> +	if ((cmd.flow_attr.type == IB_FLOW_ATTR_SNIFFER &&
>> +	     !capable(CAP_NET_ADMIN)) || !capable(CAP_NET_RAW))
>> +		return -EPERM;
>> +
>> +	if (cmd.flow_attr.num_of_specs < 0 ||
>
> num_of_specs is unsigned ...

Correct, this will be fixed.

>
>> +	    cmd.flow_attr.num_of_specs > IB_FLOW_SPEC_SUPPORT_LAYERS)
>> +		return -EINVAL;
>> +
>> +	kern_attr_size = cmd.flow_attr.size - sizeof(cmd) -
>> +			 sizeof(struct ib_uverbs_cmd_hdr_ex);
>> +
>
> Please, no under/overflow. Check that cmd.flow_attr.size is bigger than
> sizeof(cmd) + sizeof(struct ib_uverbs_cmd_hdr_ex) before computing
> kern_attr_size.

This introduces code duplication, since we would check the value and then 
compute it. Since this value has upper and lower limits and both 
boundaries are checked, why is this problematic?

>
>> +	if (cmd.flow_attr.size < 0 || cmd.flow_attr.size > in_len ||
>
> just like num_of_specs, it's unsigned !

Correct, this will be removed.

>
> cmd.flow_attr.size is certainly less than in_len, so you probably mean
> cmd.flow_attr.size > (in_len - sizeof(cmd) - sizeof(struct
> ib_uverbs_cmd_hdr_ex)).

It's not certainly less than in_len, as the user initializes 
cmd.flow_attr.size. Currently, our version of libibverbs places the 
entire command size in cmd.flow_attr.size.

>
>> +	    kern_attr_size < 0 || kern_attr_size >
>> +	    (cmd.flow_attr.num_of_specs * sizeof(struct ib_kern_spec)))
>> +		return -EINVAL;
>> +
>
> let's make kern_attr_size unsigned.

I think it's not a good idea, since we iterate until we've reached 
num_of_specs or hit an invalid size in one of the spec structures. 
Since kern_attr_size has an upper limit, I don't think it's problematic 
to keep it as an int.

>
> And be stricter: kern_attr_size != cmd.flow_attr.num_of_specs *
> sizeof(struct ib_kern_spec) might be welcome.

The structures are TLVs, so I can only check the upper limit, not the 
exact size, without traversing them.

>
>> +	if (cmd.flow_attr.num_of_specs) {
>> +		kern_flow_attr = kmalloc(cmd.flow_attr.size, GFP_KERNEL);
>> +		if (!kern_flow_attr)
>> +			return -ENOMEM;
>> +
>> +		memcpy(kern_flow_attr, &cmd.flow_attr, sizeof(*kern_flow_attr));
>
> might be sizeof(struct ib_kern_spec) as used in the computation.

This only copies the "header", not the specs themselves.

>
>> +		if (copy_from_user(kern_flow_attr + 1, buf + sizeof(cmd),
>> +				   kern_attr_size)) {

This copies the specs.

>> +			err = -EFAULT;
>> +			goto err_free_attr;
>> +		}
>
> I don't feel comfortable with the bounds: previously it was
> sizeof(cmd) + sizeof(struct ib_uverbs_cmd_hdr_ex), and here
> copy_from_user() is copying starting from buf + sizeof(cmd) ... so where
> is the struct ib_uverbs_cmd_hdr_ex ?

The cmd starts straight after ib_uverbs_cmd_hdr_ex, but in_len includes 
sizeof(struct ib_uverbs_cmd_hdr_ex). Please take a look at the patch 
which adds the extension verbs mechanism.

>
> And I dislike the + 1 throughout this portion of the code: why the first
> element is being so different than the others. I didn't dig enough in
> the code to understand, but from my point of view, it doesn't fit.
>
>> +	kern_spec = kern_flow_attr + 1;
>> +	ib_spec = flow_attr + 1;
>> +	for (i = 0; i < flow_attr->num_of_specs && kern_attr_size > 0; i++) {
>> +		err = kern_spec_to_ib_spec(kern_spec, ib_spec);
>> +		if (err)
>> +			goto err_free;
>> +		flow_attr->size +=
>> +			((union ib_flow_spec *)ib_spec)->size;
>> +		kern_attr_size -= ((struct ib_kern_spec *)kern_spec)->size;
>> +		kern_spec += ((struct ib_kern_spec *)kern_spec)->size;
>> +		ib_spec += ((union ib_flow_spec *)ib_spec)->size;
>
> Why use void * for ib_spec and kern_spec if there's a union type for
> them ?

Since we use TLVs, using void * actually requires fewer casts.

>
>> diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
>> index 61535aa..0b233c5 100644
>> --- a/include/uapi/rdma/ib_user_verbs.h
>> +++ b/include/uapi/rdma/ib_user_verbs.h
>> @@ -86,7 +86,9 @@ enum {
>
>> +struct ib_kern_spec {
>> +	union {
>> +		struct {
>> +			__u32 type;
>> +			__u16 size;
>> +			__u16 reserved;
>> +		};
>> +		struct ib_kern_spec_eth	    eth;
>> +		struct ib_kern_spec_ipv4    ipv4;
>> +		struct ib_kern_spec_tcp_udp tcp_udp;
>> +	};
>> +};
>> +
>
> [Aside note: no IPv6 spec ?]
>
>
>> +struct ib_kern_flow_attr {
>> +	__u32 type;
>> +	__u16 size;
>> +	__u16 priority;
>> +	__u8  num_of_specs;
>> +	__u8  reserved[2];
>> +	__u8  port;
>> +	__u32 flags;
>> +	/* Following are the optional layers according to user request
>> +	 * struct ib_flow_spec_xxx
>> +	 * struct ib_flow_spec_yyy
>> +	 */
>> +};
>> +
>> +struct ib_uverbs_create_flow  {
>> +	__u32 comp_mask;
>
> Alignment notice: There's a hole here on 64bits system with stricter
> alignment rules.

Thanks! Do you recommend adding a u32 reserved field?

>
>> +	__u64 response;
>> +	__u32 qp_handle;
>> +	struct ib_kern_flow_attr flow_attr;
>> +};
>> +
>
> Sorry I've missed that before it got included in infiniband git tree.
>
> Regards.
>



* Re: [PATCH V4 for-next 3/4] IB/core: Export ib_create/destroy_flow through uverbs
       [not found]             ` <52384C9D.6050900-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2013-09-17 13:00               ` Yann Droneaud
  0 siblings, 0 replies; 17+ messages in thread
From: Yann Droneaud @ 2013-09-17 13:00 UTC (permalink / raw)
  To: Matan Barak
  Cc: Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz,
	Hadar Har-Zion, shawn.bohrer-Re5JQEeQqe8AvxtiuMwx3w,
	sean.hefty-ral2JQCrhuEAvxtiuMwx3w, tzahio-VPRAkNaXOzVWk0Htik3J/w

Le 17.09.2013 14:35, Matan Barak a écrit :
> On 11/9/2013 5:04 PM, Yann Droneaud wrote:
>> Le mercredi 28 août 2013 à 15:47 +0300, Matan Barak a écrit :
>>> From: Hadar Hen Zion <hadarh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>> 
>>> Implement ib_uverbs_create_flow and ib_uverbs_destroy_flow to
>>> support flow steering for user space applications.
>>> 
>>> Change-Id: I0bc6dbe26f4776d00f95b6200ff43ccb24000e31
>>> Signed-off-by: Hadar Hen Zion <hadarh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>> Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>> Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>> 
>> Seen this in linux-next as commit
>> 436f2ad05a0b65b1467ddf51bc68171c381bf844
>> 

>> 
>> Please, no under/overflow. Check that cmd.flow_attr.size is bigger 
>> than
>> sizeof(cmd) + sizeof(struct ib_uverbs_cmd_hdr_ex) before computing
>> kern_attr_size.
> 
> This introduces code duplication since we check that value and then
> compute it. Since this value has an upper and lower limits and both
> boundaries are checked, why is this problematic ?
> 

Because a negative value says: "I don't take care of checking bounds:
checking bounds is a thing for pedantic security researchers
who never wrote real code but write poems in reaction to the nonsense
of life"

Please check user input first; don't miss the chance to catch errors 
early.

> 
>> 
>> cmd.flow_attr.size is certainly less then in_len, so you probably mean
>> cmd.flow_attr.size > (in_len - sizeof(cmd) - sizeof(struct
>> ib_uverbs_cmd_hdr_ex)).
> 
> It's not certainly less than in_len, as the user initializes
> cmd.flow_attr.size. Currently, our version of libibverbs place the
> entire command size on cmd.flow_attr.size.
> 

I would prefer to leave the fixed, known size of the command out of this 
field, and have it hold only the sum of the TLV sizes.

Then the .size field would carry more information.

>> 
>>> +	    kern_attr_size < 0 || kern_attr_size >
>>> +	    (cmd.flow_attr.num_of_specs * sizeof(struct ib_kern_spec)))
>>> +		return -EINVAL;
>>> +
>> 
>> let's make kern_attr_size unsigned.
> 
> I think that it's not a good idea since we're iterating until we've
> got to num_of_specs or we've got an invalid size in one of the spec
> structures. Since we have an upper limit to kern_attr_size, I don't
> think it's problematic to keep it as int.
> 

I think it's not a good idea to iterate until the iterator gets negative.
Probably a matter of taste or experience.
If there's an invalid size in a TLV, check it before subtracting, not 
after. Otherwise it's not bounds checking, it's error recovery.

>> 
>> And be stricter: kern_attr_size != cmd.flow_attr.num_of_specs *
>> sizeof(struct ib_kern_spec) might be welcome.
> 
> The structures are TLV, so I can only check upper limit but not the
> exact size without traversing them.
> 

OK.

>>> diff --git a/include/uapi/rdma/ib_user_verbs.h 
>>> b/include/uapi/rdma/ib_user_verbs.h
>>> index 61535aa..0b233c5 100644
>>> --- a/include/uapi/rdma/ib_user_verbs.h
>>> +++ b/include/uapi/rdma/ib_user_verbs.h
>>> @@ -86,7 +86,9 @@ enum {
>> 
>>> +struct ib_kern_flow_attr {
>>> +	__u32 type;
>>> +	__u16 size;
>>> +	__u16 priority;
>>> +	__u8  num_of_specs;
>>> +	__u8  reserved[2];
>>> +	__u8  port;
>>> +	__u32 flags;
>>> +	/* Following are the optional layers according to user request
>>> +	 * struct ib_flow_spec_xxx
>>> +	 * struct ib_flow_spec_yyy
>>> +	 */
>>> +};
>>> +
>>> +struct ib_uverbs_create_flow  {
>>> +	__u32 comp_mask;
>> 
>> Alignment notice: There's a hole here on 64bits system with stricter
>> alignment rules.
> 
> Thanks! Do you recommend adding a u32 reserved field ?
> 

Sure, it must be added.
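
[Editorial sketch: the effect of the explicit reserved field, checked
with offsetof on a typical LP64 target. Both layouts below are
illustrative stand-ins for struct ib_uverbs_create_flow, not the final
ABI.]

```c
#include <stddef.h>
#include <stdint.h>

/* Original layout: on LP64, the u64 after a lone u32 takes 8-byte
 * alignment, so the compiler silently inserts a 4-byte hole. */
struct create_flow_holey {
	uint32_t comp_mask;
	uint64_t response;	/* implicit 4-byte hole before this */
};

/* The fix Yann asks for: an explicit reserved u32 makes the padding
 * part of the ABI, so 32-bit and 64-bit layouts match. */
struct create_flow_fixed {
	uint32_t comp_mask;
	uint32_t reserved;	/* explicit, named padding */
	uint64_t response;
};
```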

Regards.

-- 
Yann Droneaud
OPTEYA



* [PATCH V4 for-next 3/4] IB/core: Export ib_create/destroy_flow through uverbs
       [not found] ` <1375873322-19384-1-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2013-08-07 11:02   ` Or Gerlitz
  0 siblings, 0 replies; 17+ messages in thread
From: Or Gerlitz @ 2013-08-07 11:02 UTC (permalink / raw)
  To: roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, hadarh-VPRAkNaXOzVWk0Htik3J/w,
	matanb-VPRAkNaXOzVWk0Htik3J/w,
	shawn.bohrer-Re5JQEeQqe8AvxtiuMwx3w,
	sean.hefty-ral2JQCrhuEAvxtiuMwx3w, Or Gerlitz

From: Hadar Hen Zion <hadarh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Implement ib_uverbs_create_flow and ib_uverbs_destroy_flow to
support flow steering for user space applications.

Signed-off-by: Hadar Hen Zion <hadarh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/uverbs.h      |    3 +
 drivers/infiniband/core/uverbs_cmd.c  |  214 +++++++++++++++++++++++++++++++++
 drivers/infiniband/core/uverbs_main.c |   13 ++-
 include/rdma/ib_verbs.h               |    1 +
 include/uapi/rdma/ib_user_verbs.h     |   89 ++++++++++++++-
 5 files changed, 318 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h
index 0fcd7aa..ad9d102 100644
--- a/drivers/infiniband/core/uverbs.h
+++ b/drivers/infiniband/core/uverbs.h
@@ -155,6 +155,7 @@ extern struct idr ib_uverbs_cq_idr;
 extern struct idr ib_uverbs_qp_idr;
 extern struct idr ib_uverbs_srq_idr;
 extern struct idr ib_uverbs_xrcd_idr;
+extern struct idr ib_uverbs_rule_idr;
 
 void idr_remove_uobj(struct idr *idp, struct ib_uobject *uobj);
 
@@ -215,5 +216,7 @@ IB_UVERBS_DECLARE_CMD(destroy_srq);
 IB_UVERBS_DECLARE_CMD(create_xsrq);
 IB_UVERBS_DECLARE_CMD(open_xrcd);
 IB_UVERBS_DECLARE_CMD(close_xrcd);
+IB_UVERBS_DECLARE_CMD(create_flow);
+IB_UVERBS_DECLARE_CMD(destroy_flow);
 
 #endif /* UVERBS_H */
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index b3c07b0..aa4a5da 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -54,6 +54,7 @@ static struct uverbs_lock_class qp_lock_class	= { .name = "QP-uobj" };
 static struct uverbs_lock_class ah_lock_class	= { .name = "AH-uobj" };
 static struct uverbs_lock_class srq_lock_class	= { .name = "SRQ-uobj" };
 static struct uverbs_lock_class xrcd_lock_class = { .name = "XRCD-uobj" };
+static struct uverbs_lock_class rule_lock_class = { .name = "RULE-uobj" };
 
 #define INIT_UDATA(udata, ibuf, obuf, ilen, olen)			\
 	do {								\
@@ -330,6 +331,7 @@ ssize_t ib_uverbs_get_context(struct ib_uverbs_file *file,
 	INIT_LIST_HEAD(&ucontext->srq_list);
 	INIT_LIST_HEAD(&ucontext->ah_list);
 	INIT_LIST_HEAD(&ucontext->xrcd_list);
+	INIT_LIST_HEAD(&ucontext->rule_list);
 	ucontext->closing = 0;
 
 	resp.num_comp_vectors = file->device->num_comp_vectors;
@@ -2587,6 +2589,218 @@ out_put:
 	return ret ? ret : in_len;
 }
 
+static int kern_spec_to_ib_spec(struct ib_kern_spec *kern_spec,
+				struct _ib_flow_spec *ib_spec)
+{
+	ib_spec->type = kern_spec->type;
+
+	switch (ib_spec->type) {
+	case IB_FLOW_SPEC_ETH:
+		ib_spec->eth.size = sizeof(struct ib_flow_spec_eth);
+		if (ib_spec->eth.size != kern_spec->eth.size)
+			return -EINVAL;
+		memcpy(&ib_spec->eth.val, &kern_spec->eth.val,
+		       sizeof(struct ib_flow_eth_filter));
+		memcpy(&ib_spec->eth.mask, &kern_spec->eth.mask,
+		       sizeof(struct ib_flow_eth_filter));
+		break;
+	case IB_FLOW_SPEC_IPV4:
+		ib_spec->ipv4.size = sizeof(struct ib_flow_spec_ipv4);
+		if (ib_spec->ipv4.size != kern_spec->ipv4.size)
+			return -EINVAL;
+		memcpy(&ib_spec->ipv4.val, &kern_spec->ipv4.val,
+		       sizeof(struct ib_flow_ipv4_filter));
+		memcpy(&ib_spec->ipv4.mask, &kern_spec->ipv4.mask,
+		       sizeof(struct ib_flow_ipv4_filter));
+		break;
+	case IB_FLOW_SPEC_TCP:
+	case IB_FLOW_SPEC_UDP:
+		ib_spec->tcp_udp.size = sizeof(struct ib_flow_spec_tcp_udp);
+		if (ib_spec->tcp_udp.size != kern_spec->tcp_udp.size)
+			return -EINVAL;
+		memcpy(&ib_spec->tcp_udp.val, &kern_spec->tcp_udp.val,
+		       sizeof(struct ib_flow_tcp_udp_filter));
+		memcpy(&ib_spec->tcp_udp.mask, &kern_spec->tcp_udp.mask,
+		       sizeof(struct ib_flow_tcp_udp_filter));
+		break;
+	default:
+		return -EINVAL;
+	}
+	return 0;
+}
+
+ssize_t ib_uverbs_create_flow(struct ib_uverbs_file *file,
+			      const char __user *buf, int in_len,
+			      int out_len)
+{
+	struct ib_uverbs_create_flow	  cmd;
+	struct ib_uverbs_create_flow_resp resp;
+	struct ib_uobject		  *uobj;
+	struct ib_flow			  *flow_id;
+	struct ib_kern_flow_attr	  *kern_flow_attr;
+	struct ib_flow_attr		  *flow_attr;
+	struct ib_qp			  *qp;
+	int err = 0;
+	void *kern_spec;
+	void *ib_spec;
+	int i;
+	int kern_attr_size;
+
+	if (out_len < sizeof(resp))
+		return -ENOSPC;
+
+	if (copy_from_user(&cmd, buf, sizeof(cmd)))
+		return -EFAULT;
+
+	if ((cmd.flow_attr.type == IB_FLOW_ATTR_SNIFFER &&
+	     !capable(CAP_NET_ADMIN)) || !capable(CAP_NET_RAW))
+		return -EPERM;
+
+	if (cmd.flow_attr.num_of_specs) {
+		kern_flow_attr = kmalloc(cmd.flow_attr.size, GFP_KERNEL);
+		if (!kern_flow_attr)
+			return -ENOMEM;
+
+		memcpy(kern_flow_attr, &cmd.flow_attr, sizeof(*kern_flow_attr));
+		kern_attr_size = cmd.flow_attr.size - sizeof(cmd) - sizeof(struct ib_uverbs_cmd_hdr_ex);
+		if (copy_from_user(kern_flow_attr + 1, buf + sizeof(cmd),
+				   kern_attr_size)) {
+			err = -EFAULT;
+			goto err_free_attr;
+		}
+	} else {
+		kern_flow_attr = &cmd.flow_attr;
+		kern_attr_size = sizeof(cmd.flow_attr);
+	}
+
+	uobj = kmalloc(sizeof(*uobj), GFP_KERNEL);
+	if (!uobj) {
+		err = -ENOMEM;
+		goto err_free_attr;
+	}
+	init_uobj(uobj, 0, file->ucontext, &rule_lock_class);
+	down_write(&uobj->mutex);
+
+	qp = idr_read_qp(cmd.qp_handle, file->ucontext);
+	if (!qp) {
+		err = -EINVAL;
+		goto err_uobj;
+	}
+
+	flow_attr = kmalloc(cmd.flow_attr.size, GFP_KERNEL);
+	if (!flow_attr) {
+		err = -ENOMEM;
+		goto err_put;
+	}
+
+	flow_attr->type = kern_flow_attr->type;
+	flow_attr->priority = kern_flow_attr->priority;
+	flow_attr->num_of_specs = kern_flow_attr->num_of_specs;
+	flow_attr->port = kern_flow_attr->port;
+	flow_attr->flags = kern_flow_attr->flags;
+	flow_attr->size = sizeof(*flow_attr);
+
+	kern_spec = kern_flow_attr + 1;
+	ib_spec = flow_attr + 1;
+	for (i = 0; i < flow_attr->num_of_specs && kern_attr_size > 0; i++) {
+		err = kern_spec_to_ib_spec(kern_spec, ib_spec);
+		if (err)
+			goto err_free;
+		flow_attr->size +=
+			((struct _ib_flow_spec *)ib_spec)->size;
+		kern_attr_size -= ((struct ib_kern_spec *)kern_spec)->size;
+		kern_spec += ((struct ib_kern_spec *)kern_spec)->size;
+		ib_spec += ((struct _ib_flow_spec *)ib_spec)->size;
+	}
+	if (kern_attr_size) {
+		pr_warn("create flow failed, %d bytes left from uverbs cmd\n",
+			kern_attr_size);
+		err = -EINVAL;
+		goto err_free;
+	}
+	flow_id = ib_create_flow(qp, flow_attr, IB_FLOW_DOMAIN_USER);
+	if (IS_ERR(flow_id)) {
+		err = PTR_ERR(flow_id);
+		goto err_free;
+	}
+	flow_id->qp = qp;
+	flow_id->uobject = uobj;
+	uobj->object = flow_id;
+
+	err = idr_add_uobj(&ib_uverbs_rule_idr, uobj);
+	if (err)
+		goto destroy_flow;
+
+	memset(&resp, 0, sizeof(resp));
+	resp.flow_handle = uobj->id;
+
+	if (copy_to_user((void __user *)(unsigned long) cmd.response,
+			 &resp, sizeof(resp))) {
+		err = -EFAULT;
+		goto err_copy;
+	}
+
+	put_qp_read(qp);
+	mutex_lock(&file->mutex);
+	list_add_tail(&uobj->list, &file->ucontext->rule_list);
+	mutex_unlock(&file->mutex);
+
+	uobj->live = 1;
+
+	up_write(&uobj->mutex);
+	kfree(flow_attr);
+	if (cmd.flow_attr.num_of_specs)
+		kfree(kern_flow_attr);
+	return in_len;
+err_copy:
+	idr_remove_uobj(&ib_uverbs_rule_idr, uobj);
+destroy_flow:
+	ib_destroy_flow(flow_id);
+err_free:
+	kfree(flow_attr);
+err_put:
+	put_qp_read(qp);
+err_uobj:
+	put_uobj_write(uobj);
+err_free_attr:
+	if (cmd.flow_attr.num_of_specs)
+		kfree(kern_flow_attr);
+	return err;
+}
+
+ssize_t ib_uverbs_destroy_flow(struct ib_uverbs_file *file,
+			       const char __user *buf, int in_len,
+			       int out_len)
+{
+	struct ib_uverbs_destroy_flow	cmd;
+	struct ib_flow			*flow_id;
+	struct ib_uobject		*uobj;
+	int				ret;
+
+	if (copy_from_user(&cmd, buf, sizeof(cmd)))
+		return -EFAULT;
+
+	uobj = idr_write_uobj(&ib_uverbs_rule_idr, cmd.flow_handle,
+			      file->ucontext);
+	if (!uobj)
+		return -EINVAL;
+	flow_id = uobj->object;
+
+	ret = ib_destroy_flow(flow_id);
+	if (!ret)
+		uobj->live = 0;
+
+	put_uobj_write(uobj);
+
+	idr_remove_uobj(&ib_uverbs_rule_idr, uobj);
+
+	mutex_lock(&file->mutex);
+	list_del(&uobj->list);
+	mutex_unlock(&file->mutex);
+
+	put_uobj(uobj);
+
+	return ret ? ret : in_len;
+}
+
 static int __uverbs_create_xsrq(struct ib_uverbs_file *file,
 				struct ib_uverbs_create_xsrq *cmd,
 				struct ib_udata *udata)
diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index e4e7b24..75ad86c 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -73,6 +73,7 @@ DEFINE_IDR(ib_uverbs_cq_idr);
 DEFINE_IDR(ib_uverbs_qp_idr);
 DEFINE_IDR(ib_uverbs_srq_idr);
 DEFINE_IDR(ib_uverbs_xrcd_idr);
+DEFINE_IDR(ib_uverbs_rule_idr);
 
 static DEFINE_SPINLOCK(map_lock);
 static DECLARE_BITMAP(dev_map, IB_UVERBS_MAX_DEVICES);
@@ -113,7 +114,9 @@ static ssize_t (*uverbs_cmd_table[])(struct ib_uverbs_file *file,
 	[IB_USER_VERBS_CMD_OPEN_XRCD]		= ib_uverbs_open_xrcd,
 	[IB_USER_VERBS_CMD_CLOSE_XRCD]		= ib_uverbs_close_xrcd,
 	[IB_USER_VERBS_CMD_CREATE_XSRQ]		= ib_uverbs_create_xsrq,
-	[IB_USER_VERBS_CMD_OPEN_QP]		= ib_uverbs_open_qp
+	[IB_USER_VERBS_CMD_OPEN_QP]		= ib_uverbs_open_qp,
+	[IB_USER_VERBS_CMD_CREATE_FLOW]		= ib_uverbs_create_flow,
+	[IB_USER_VERBS_CMD_DESTROY_FLOW]	= ib_uverbs_destroy_flow
 };
 
 static void ib_uverbs_add_one(struct ib_device *device);
@@ -212,6 +215,14 @@ static int ib_uverbs_cleanup_ucontext(struct ib_uverbs_file *file,
 		kfree(uobj);
 	}
 
+	list_for_each_entry_safe(uobj, tmp, &context->rule_list, list) {
+		struct ib_flow *flow_id = uobj->object;
+
+		idr_remove_uobj(&ib_uverbs_rule_idr, uobj);
+		ib_destroy_flow(flow_id);
+		kfree(uobj);
+	}
+
 	list_for_each_entry_safe(uobj, tmp, &context->qp_list, list) {
 		struct ib_qp *qp = uobj->object;
 		struct ib_uqp_object *uqp =
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 3d72d53..27d27a7 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -954,6 +954,7 @@ struct ib_ucontext {
 	struct list_head	srq_list;
 	struct list_head	ah_list;
 	struct list_head	xrcd_list;
+	struct list_head	rule_list;
 	int			closing;
 };
 
diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
index 61535aa..0b233c5 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -86,7 +86,9 @@ enum {
 	IB_USER_VERBS_CMD_OPEN_XRCD,
 	IB_USER_VERBS_CMD_CLOSE_XRCD,
 	IB_USER_VERBS_CMD_CREATE_XSRQ,
-	IB_USER_VERBS_CMD_OPEN_QP
+	IB_USER_VERBS_CMD_OPEN_QP,
+	IB_USER_VERBS_CMD_CREATE_FLOW = IB_USER_VERBS_CMD_THRESHOLD,
+	IB_USER_VERBS_CMD_DESTROY_FLOW
 };
 
 /*
@@ -694,6 +696,91 @@ struct ib_uverbs_detach_mcast {
 	__u64 driver_data[0];
 };
 
+struct ib_kern_eth_filter {
+	__u8  dst_mac[6];
+	__u8  src_mac[6];
+	__be16 ether_type;
+	__be16 vlan_tag;
+};
+
+struct ib_kern_spec_eth {
+	__u32  type;
+	__u16  size;
+	__u16  reserved;
+	struct ib_kern_eth_filter val;
+	struct ib_kern_eth_filter mask;
+};
+
+struct ib_kern_ipv4_filter {
+	__be32 src_ip;
+	__be32 dst_ip;
+};
+
+struct ib_kern_spec_ipv4 {
+	__u32  type;
+	__u16  size;
+	__u16  reserved;
+	struct ib_kern_ipv4_filter val;
+	struct ib_kern_ipv4_filter mask;
+};
+
+struct ib_kern_tcp_udp_filter {
+	__be16 dst_port;
+	__be16 src_port;
+};
+
+struct ib_kern_spec_tcp_udp {
+	__u32  type;
+	__u16  size;
+	__u16  reserved;
+	struct ib_kern_tcp_udp_filter val;
+	struct ib_kern_tcp_udp_filter mask;
+};
+
+struct ib_kern_spec {
+	union {
+		struct {
+			__u32 type;
+			__u16 size;
+			__u16 reserved;
+		};
+		struct ib_kern_spec_eth	    eth;
+		struct ib_kern_spec_ipv4    ipv4;
+		struct ib_kern_spec_tcp_udp tcp_udp;
+	};
+};
+
+struct ib_kern_flow_attr {
+	__u32 type;
+	__u16 size;
+	__u16 priority;
+	__u8  num_of_specs;
+	__u8  reserved[2];
+	__u8  port;
+	__u32 flags;
+	/* Following are the optional layers according to user request
+	 * struct ib_flow_spec_xxx
+	 * struct ib_flow_spec_yyy
+	 */
+};
+
+struct ib_uverbs_create_flow  {
+	__u32 comp_mask;
+	__u64 response;
+	__u32 qp_handle;
+	struct ib_kern_flow_attr flow_attr;
+};
+
+struct ib_uverbs_create_flow_resp {
+	__u32 comp_mask;
+	__u32 flow_handle;
+};
+
+struct ib_uverbs_destroy_flow  {
+	__u32 comp_mask;
+	__u32 flow_handle;
+};
+
 struct ib_uverbs_create_srq {
 	__u64 response;
 	__u64 user_handle;
-- 
1.7.1



end of thread, other threads:[~2013-09-17 13:00 UTC | newest]

Thread overview: 17+ messages
2013-08-28 12:47 [PATCH V4 for-next 0/4] Add receive Flow Steering support Matan Barak
     [not found] ` <1377694075-29287-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2013-08-28 12:47   ` [PATCH V4 for-next 1/4] IB/core: " Matan Barak
2013-08-28 12:47   ` [PATCH V4 for-next 2/4] IB/core: Infrastructure to support verbs extensions through uverbs Matan Barak
2013-08-28 12:47   ` [PATCH V4 for-next 3/4] IB/core: Export ib_create/destroy_flow " Matan Barak
     [not found]     ` <1377694075-29287-4-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2013-08-28 16:20       ` Jason Gunthorpe
     [not found]         ` <20130828162050.GA31381-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2013-09-01  9:18           ` Matan Barak
     [not found]             ` <52230648.5000302-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2013-09-01 22:23               ` Jason Gunthorpe
     [not found]                 ` <20130901222304.GB3422-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2013-09-02  9:26                   ` Matan Barak
     [not found]                     ` <522459AE.9070107-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2013-09-03  4:16                       ` Jason Gunthorpe
     [not found]                         ` <20130903041636.GA3875-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2013-09-03 13:36                           ` Matan Barak
2013-09-11 14:04       ` Yann Droneaud
     [not found]         ` <1378908269.2258.31.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2013-09-11 15:10           ` Or Gerlitz
     [not found]             ` <523087CE.4080007-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2013-09-11 15:25               ` Yann Droneaud
2013-09-17 12:35           ` Matan Barak
     [not found]             ` <52384C9D.6050900-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2013-09-17 13:00               ` Yann Droneaud
2013-08-28 12:47   ` [PATCH V4 for-next 4/4] IB/mlx4: Add receive Flow Steering support Matan Barak
  -- strict thread matches above, loose matches on Subject: below --
2013-08-07 11:01 [PATCH V4 for-next 0/4] " Or Gerlitz
     [not found] ` <1375873322-19384-1-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2013-08-07 11:02   ` [PATCH V4 for-next 3/4] IB/core: Export ib_create/destroy_flow through uverbs Or Gerlitz
