[PATCH net-next 0/2] net/sctp: Avoid allocating high order memory with kmalloc()

* [PATCH net-next 0/2] net/sctp: Avoid allocating high order memory with kmalloc()
@ 2018-04-23 18:41 ` Oleg Babin
  0 siblings, 0 replies; 64+ messages in thread
From: Oleg Babin @ 2018-04-23 18:41 UTC (permalink / raw)
  To: netdev, linux-sctp
  Cc: David S. Miller, Vlad Yasevich, Neil Horman, Xin Long,
	Marcelo Ricardo Leitner, Andrey Ryabinin

Each SCTP association can have up to 65535 input and output streams.
For each stream type an array of sctp_stream_in or sctp_stream_out
structures is allocated with kmalloc_array(). This function allocates
physically contiguous memory, so it can end up requesting regions of
very high order, e.g.:

  sizeof(struct sctp_stream_out) == 24,
  ((65535 * 24) / 4096) == 383 memory pages (4096 bytes per page),
  which means an order-9 allocation.

This can lead to memory allocation failures on systems under
memory pressure.
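
As a quick check of the arithmetic above, here is a small userspace
sketch (not kernel code). The helper names and the PAGE_SIZE_BYTES
constant are made up for illustration; the 4096-byte page size and the
24-byte element size are taken from the text. Rounding up instead of
truncating gives 384 pages, which still needs an order-9 block
(512 pages).

```c
#include <stddef.h>

#define PAGE_SIZE_BYTES 4096u	/* assumed page size, as in the text above */

/* Number of whole pages needed to hold `bytes`, rounded up. */
static size_t pages_needed(size_t bytes)
{
	return (bytes + PAGE_SIZE_BYTES - 1) / PAGE_SIZE_BYTES;
}

/* Smallest order such that (1 << order) pages cover `bytes`. */
static unsigned int alloc_order(size_t bytes)
{
	size_t pages = pages_needed(bytes);
	unsigned int order = 0;

	while (((size_t)1 << order) < pages)
		order++;

	return order;
}
```

Calling alloc_order((size_t)65535 * 24) models the worst case for the
out-stream array under these assumptions.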

We do not actually need these arrays to be physically contiguous.
A simple solution would be to use kvmalloc() instead of kmalloc(),
as kvmalloc() can fall back to physically scattered pages if
contiguous pages are not available. The problem is that the
allocation can happen in softirq context with the GFP_ATOMIC flag
set, and kvmalloc() cannot be used in this scenario.

So the other possible solution is to use flexible arrays instead of
contiguous arrays of memory, so that the memory is allocated on a
per-page basis.
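
To illustrate the per-page idea, here is a minimal userspace sketch.
It is not the kernel flex_array API: struct paged_array and its
helpers are hypothetical names, and calloc() stands in for a page
allocation. The point is only that no single allocation is larger
than one part, while elements are still addressed by index.

```c
#include <stddef.h>
#include <stdlib.h>

#define PART_BYTES 4096u	/* assumed size of one separately allocated part */

/* Hypothetical stand-in for a page-granular array. */
struct paged_array {
	size_t elem_size;	/* size of one element, at most PART_BYTES */
	size_t total;		/* number of elements */
	size_t per_part;	/* elements that fit in one part */
	size_t nr_parts;	/* number of entries in parts[] */
	void **parts;		/* each part is allocated on its own */
};

static void paged_array_free(struct paged_array *pa)
{
	size_t i;

	if (!pa)
		return;
	for (i = 0; i < pa->nr_parts; i++)
		free(pa->parts[i]);	/* free(NULL) is a no-op */
	free(pa->parts);
	free(pa);
}

static struct paged_array *paged_array_alloc(size_t elem_size, size_t total)
{
	struct paged_array *pa;
	size_t i;

	if (elem_size == 0 || elem_size > PART_BYTES || total == 0)
		return NULL;

	pa = calloc(1, sizeof(*pa));
	if (!pa)
		return NULL;

	pa->elem_size = elem_size;
	pa->total = total;
	pa->per_part = PART_BYTES / elem_size;
	pa->nr_parts = (total + pa->per_part - 1) / pa->per_part;

	pa->parts = calloc(pa->nr_parts, sizeof(*pa->parts));
	if (!pa->parts) {
		free(pa);
		return NULL;
	}

	/* One small, zeroed allocation per part instead of one big block. */
	for (i = 0; i < pa->nr_parts; i++) {
		pa->parts[i] = calloc(1, PART_BYTES);
		if (!pa->parts[i]) {
			paged_array_free(pa);
			return NULL;
		}
	}

	return pa;
}

/* Return a pointer to element idx, or NULL if idx is out of range. */
static void *paged_array_get(const struct paged_array *pa, size_t idx)
{
	if (idx >= pa->total)
		return NULL;

	return (char *)pa->parts[idx / pa->per_part] +
	       (idx % pa->per_part) * pa->elem_size;
}
```

With 24-byte elements, 170 elements fit in each 4096-byte part, so
65535 elements need 386 parts plus a small pointer table.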

This patchset replaces kmalloc_array() with flex_array usage.
It consists of two parts:

  * First patch is preparatory - it mechanically wraps all direct
    access to assoc->stream.out[] and assoc->stream.in[] arrays
    with SCTP_SO() and SCTP_SI() wrappers so that later a direct
    array access could be easily changed to an access to a
    flex_array (or any other possible alternative).
  * Second patch replaces kmalloc_array() with flex_array usage.
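
The wrapper approach described in the list above can be sketched in
isolation as follows. This is a simplified userspace illustration and
not the patch itself: the demo_* names and the two-field structure are
invented here, while the real SCTP_SO()/SCTP_SI() wrappers operate on
struct sctp_stream. Call sites go through the accessor only, so the
backing storage can later change without touching them.

```c
#include <stddef.h>

/* Simplified stand-in; the real structure has more fields. */
struct demo_stream_out {
	unsigned short ssn;
	unsigned char state;
};

struct demo_stream {
	struct demo_stream_out *out;	/* backing storage, contiguous here */
	unsigned short outcnt;
};

/* The only place that knows how elements are laid out in memory. */
static struct demo_stream_out *
demo_stream_out_ptr(const struct demo_stream *stream, unsigned short sid)
{
	return stream->out + sid;
}

#define DEMO_SO(s, i) demo_stream_out_ptr((s), (i))

/* A call site written against the wrapper rather than the array. */
static void demo_open_all(struct demo_stream *stream)
{
	unsigned short i;

	for (i = 0; i < stream->outcnt; i++)
		DEMO_SO(stream, i)->state = 1;
}
```

If the storage were switched to a paged layout, only
demo_stream_out_ptr() would need a new body in this sketch.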

Oleg Babin (2):
  net/sctp: Make wrappers for accessing in/out streams
  net/sctp: Replace in/out stream arrays with flex_array

 include/net/sctp/structs.h   |  31 +++++---
 net/sctp/chunk.c             |   6 +-
 net/sctp/outqueue.c          |  11 +--
 net/sctp/socket.c            |   4 +-
 net/sctp/stream.c            | 165 +++++++++++++++++++++++++++++--------------
 net/sctp/stream_interleave.c |   2 +-
 net/sctp/stream_sched.c      |  13 ++--
 net/sctp/stream_sched_prio.c |  22 +++---
 net/sctp/stream_sched_rr.c   |   8 +--
 9 files changed, 167 insertions(+), 95 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH net-next 1/2] net/sctp: Make wrappers for accessing in/out streams
  2018-04-23 18:41 ` Oleg Babin
@ 2018-04-23 18:41   ` Oleg Babin
  -1 siblings, 0 replies; 64+ messages in thread
From: Oleg Babin @ 2018-04-23 18:41 UTC (permalink / raw)
  To: netdev, linux-sctp
  Cc: David S. Miller, Vlad Yasevich, Neil Horman, Xin Long,
	Marcelo Ricardo Leitner, Andrey Ryabinin

This patch introduces wrappers for accessing in/out streams indirectly.
This makes it possible to replace the physically contiguous arrays
of streams with flexible arrays (or any other appropriate mechanism)
that allocate memory on a per-page basis.

Signed-off-by: Oleg Babin <obabin@virtuozzo.com>
---
 include/net/sctp/structs.h   |  30 +++++++-----
 net/sctp/chunk.c             |   6 ++-
 net/sctp/outqueue.c          |  11 +++--
 net/sctp/socket.c            |   4 +-
 net/sctp/stream.c            | 107 +++++++++++++++++++++++++------------------
 net/sctp/stream_interleave.c |   2 +-
 net/sctp/stream_sched.c      |  13 +++---
 net/sctp/stream_sched_prio.c |  22 ++++-----
 net/sctp/stream_sched_rr.c   |   8 ++--
 9 files changed, 116 insertions(+), 87 deletions(-)

diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index a0ec462..578bb40 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -394,37 +394,37 @@ int sctp_stream_init(struct sctp_stream *stream, __u16 outcnt, __u16 incnt,
 
 /* What is the current SSN number for this stream? */
 #define sctp_ssn_peek(stream, type, sid) \
-	((stream)->type[sid].ssn)
+	(sctp_stream_##type##_ptr((stream), (sid))->ssn)
 
 /* Return the next SSN number for this stream.	*/
 #define sctp_ssn_next(stream, type, sid) \
-	((stream)->type[sid].ssn++)
+	(sctp_stream_##type##_ptr((stream), (sid))->ssn++)
 
 /* Skip over this ssn and all below. */
 #define sctp_ssn_skip(stream, type, sid, ssn) \
-	((stream)->type[sid].ssn = ssn + 1)
+	(sctp_stream_##type##_ptr((stream), (sid))->ssn = ssn + 1)
 
 /* What is the current MID number for this stream? */
 #define sctp_mid_peek(stream, type, sid) \
-	((stream)->type[sid].mid)
+	(sctp_stream_##type##_ptr((stream), (sid))->mid)
 
 /* Return the next MID number for this stream.  */
 #define sctp_mid_next(stream, type, sid) \
-	((stream)->type[sid].mid++)
+	(sctp_stream_##type##_ptr((stream), (sid))->mid++)
 
 /* Skip over this mid and all below. */
 #define sctp_mid_skip(stream, type, sid, mid) \
-	((stream)->type[sid].mid = mid + 1)
+	(sctp_stream_##type##_ptr((stream), (sid))->mid = mid + 1)
 
-#define sctp_stream_in(asoc, sid) (&(asoc)->stream.in[sid])
+#define sctp_stream_in(asoc, sid) sctp_stream_in_ptr(&(asoc)->stream, (sid))
 
 /* What is the current MID_uo number for this stream? */
 #define sctp_mid_uo_peek(stream, type, sid) \
-	((stream)->type[sid].mid_uo)
+	(sctp_stream_##type##_ptr((stream), (sid))->mid_uo)
 
 /* Return the next MID_uo number for this stream.  */
 #define sctp_mid_uo_next(stream, type, sid) \
-	((stream)->type[sid].mid_uo++)
+	(sctp_stream_##type##_ptr((stream), (sid))->mid_uo++)
 
 /*
  * Pointers to address related SCTP functions.
@@ -1428,8 +1428,8 @@ struct sctp_stream_in {
 };
 
 struct sctp_stream {
-	struct sctp_stream_out *out;
-	struct sctp_stream_in *in;
+	struct flex_array *out;
+	struct flex_array *in;
 	__u16 outcnt;
 	__u16 incnt;
 	/* Current stream being sent, if any */
@@ -1451,6 +1451,14 @@ struct sctp_stream {
 	struct sctp_stream_interleave *si;
 };
 
+struct sctp_stream_out *sctp_stream_out_ptr(const struct sctp_stream *stream,
+					    __u16 sid);
+struct sctp_stream_in *sctp_stream_in_ptr(const struct sctp_stream *stream,
+					  __u16 sid);
+
+#define SCTP_SO(s, i) sctp_stream_out_ptr((s), (i))
+#define SCTP_SI(s, i) sctp_stream_in_ptr((s), (i))
+
 #define SCTP_STREAM_CLOSED		0x00
 #define SCTP_STREAM_OPEN		0x01
 
diff --git a/net/sctp/chunk.c b/net/sctp/chunk.c
index be296d6..4b9310e 100644
--- a/net/sctp/chunk.c
+++ b/net/sctp/chunk.c
@@ -333,7 +333,8 @@ int sctp_chunk_abandoned(struct sctp_chunk *chunk)
 	if (SCTP_PR_TTL_ENABLED(chunk->sinfo.sinfo_flags) &&
 	    time_after(jiffies, chunk->msg->expires_at)) {
 		struct sctp_stream_out *streamout =
-			&chunk->asoc->stream.out[chunk->sinfo.sinfo_stream];
+			SCTP_SO(&chunk->asoc->stream,
+				chunk->sinfo.sinfo_stream);
 
 		if (chunk->sent_count) {
 			chunk->asoc->abandoned_sent[SCTP_PR_INDEX(TTL)]++;
@@ -347,7 +348,8 @@ int sctp_chunk_abandoned(struct sctp_chunk *chunk)
 	} else if (SCTP_PR_RTX_ENABLED(chunk->sinfo.sinfo_flags) &&
 		   chunk->sent_count > chunk->sinfo.sinfo_timetolive) {
 		struct sctp_stream_out *streamout =
-			&chunk->asoc->stream.out[chunk->sinfo.sinfo_stream];
+			SCTP_SO(&chunk->asoc->stream,
+				chunk->sinfo.sinfo_stream);
 
 		chunk->asoc->abandoned_sent[SCTP_PR_INDEX(RTX)]++;
 		streamout->ext->abandoned_sent[SCTP_PR_INDEX(RTX)]++;
diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
index f211b3d..8d5d811 100644
--- a/net/sctp/outqueue.c
+++ b/net/sctp/outqueue.c
@@ -80,7 +80,7 @@ static inline void sctp_outq_head_data(struct sctp_outq *q,
 	q->out_qlen += ch->skb->len;
 
 	stream = sctp_chunk_stream_no(ch);
-	oute = q->asoc->stream.out[stream].ext;
+	oute = SCTP_SO(&q->asoc->stream, stream)->ext;
 	list_add(&ch->stream_list, &oute->outq);
 }
 
@@ -101,7 +101,7 @@ static inline void sctp_outq_tail_data(struct sctp_outq *q,
 	q->out_qlen += ch->skb->len;
 
 	stream = sctp_chunk_stream_no(ch);
-	oute = q->asoc->stream.out[stream].ext;
+	oute = SCTP_SO(&q->asoc->stream, stream)->ext;
 	list_add_tail(&ch->stream_list, &oute->outq);
 }
 
@@ -372,7 +372,7 @@ static int sctp_prsctp_prune_sent(struct sctp_association *asoc,
 		sctp_insert_list(&asoc->outqueue.abandoned,
 				 &chk->transmitted_list);
 
-		streamout = &asoc->stream.out[chk->sinfo.sinfo_stream];
+		streamout = SCTP_SO(&asoc->stream, chk->sinfo.sinfo_stream);
 		asoc->sent_cnt_removable--;
 		asoc->abandoned_sent[SCTP_PR_INDEX(PRIO)]++;
 		streamout->ext->abandoned_sent[SCTP_PR_INDEX(PRIO)]++;
@@ -416,7 +416,7 @@ static int sctp_prsctp_prune_unsent(struct sctp_association *asoc,
 		asoc->abandoned_unsent[SCTP_PR_INDEX(PRIO)]++;
 		if (chk->sinfo.sinfo_stream < asoc->stream.outcnt) {
 			struct sctp_stream_out *streamout =
-				&asoc->stream.out[chk->sinfo.sinfo_stream];
+				SCTP_SO(&asoc->stream, chk->sinfo.sinfo_stream);
 
 			streamout->ext->abandoned_unsent[SCTP_PR_INDEX(PRIO)]++;
 		}
@@ -1050,6 +1050,7 @@ static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp)
 		/* Finally, transmit new packets.  */
 		while ((chunk = sctp_outq_dequeue_data(q)) != NULL) {
 			__u32 sid = ntohs(chunk->subh.data_hdr->stream);
+			__u8 stream_state = SCTP_SO(&asoc->stream, sid)->state;
 
 			/* Has this chunk expired? */
 			if (sctp_chunk_abandoned(chunk)) {
@@ -1059,7 +1060,7 @@ static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp)
 				continue;
 			}
 
-			if (asoc->stream.out[sid].state == SCTP_STREAM_CLOSED) {
+			if (stream_state == SCTP_STREAM_CLOSED) {
 				sctp_outq_head_data(q, chunk);
 				goto sctp_flush_out;
 			}
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 80835ac..3442f7c 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -1907,7 +1907,7 @@ static int sctp_sendmsg_to_asoc(struct sctp_association *asoc,
 		goto err;
 	}
 
-	if (unlikely(!asoc->stream.out[sinfo->sinfo_stream].ext)) {
+	if (unlikely(!SCTP_SO(&asoc->stream, sinfo->sinfo_stream)->ext)) {
 		err = sctp_stream_init_ext(&asoc->stream, sinfo->sinfo_stream);
 		if (err)
 			goto err;
@@ -6942,7 +6942,7 @@ static int sctp_getsockopt_pr_streamstatus(struct sock *sk, int len,
 	if (!asoc || params.sprstat_sid >= asoc->stream.outcnt)
 		goto out;
 
-	streamoute = asoc->stream.out[params.sprstat_sid].ext;
+	streamoute = SCTP_SO(&asoc->stream, params.sprstat_sid)->ext;
 	if (!streamoute) {
 		/* Not allocated yet, means all stats are 0 */
 		params.sprstat_abandoned_unsent = 0;
diff --git a/net/sctp/stream.c b/net/sctp/stream.c
index f799043..16e36c0 100644
--- a/net/sctp/stream.c
+++ b/net/sctp/stream.c
@@ -37,6 +37,18 @@
 #include <net/sctp/sm.h>
 #include <net/sctp/stream_sched.h>
 
+struct sctp_stream_out *sctp_stream_out_ptr(const struct sctp_stream *stream,
+					    __u16 sid)
+{
+	return ((struct sctp_stream_out *)(stream->out)) + sid;
+}
+
+struct sctp_stream_in *sctp_stream_in_ptr(const struct sctp_stream *stream,
+					  __u16 sid)
+{
+	return ((struct sctp_stream_in *)(stream->in)) + sid;
+}
+
 /* Migrates chunks from stream queues to new stream queues if needed,
  * but not across associations. Also, removes those chunks to streams
  * higher than the new max.
@@ -78,34 +90,35 @@ static void sctp_stream_outq_migrate(struct sctp_stream *stream,
 		 * sctp_stream_update will swap ->out pointers.
 		 */
 		for (i = 0; i < outcnt; i++) {
-			kfree(new->out[i].ext);
-			new->out[i].ext = stream->out[i].ext;
-			stream->out[i].ext = NULL;
+			kfree(SCTP_SO(new, i)->ext);
+			SCTP_SO(new, i)->ext = SCTP_SO(stream, i)->ext;
+			SCTP_SO(stream, i)->ext = NULL;
 		}
 	}
 
 	for (i = outcnt; i < stream->outcnt; i++)
-		kfree(stream->out[i].ext);
+		kfree(SCTP_SO(stream, i)->ext);
 }
 
 static int sctp_stream_alloc_out(struct sctp_stream *stream, __u16 outcnt,
 				 gfp_t gfp)
 {
-	struct sctp_stream_out *out;
+	struct flex_array *out;
+	size_t elem_size = sizeof(struct sctp_stream_out);
 
-	out = kmalloc_array(outcnt, sizeof(*out), gfp);
+	out = kmalloc_array(outcnt, elem_size, gfp);
 	if (!out)
 		return -ENOMEM;
 
 	if (stream->out) {
 		memcpy(out, stream->out, min(outcnt, stream->outcnt) *
-					 sizeof(*out));
+					 elem_size);
 		kfree(stream->out);
 	}
 
 	if (outcnt > stream->outcnt)
-		memset(out + stream->outcnt, 0,
-		       (outcnt - stream->outcnt) * sizeof(*out));
+		memset(((struct sctp_stream_out *)out) + stream->outcnt, 0,
+		       (outcnt - stream->outcnt) * elem_size);
 
 	stream->out = out;
 
@@ -115,22 +128,23 @@ static int sctp_stream_alloc_out(struct sctp_stream *stream, __u16 outcnt,
 static int sctp_stream_alloc_in(struct sctp_stream *stream, __u16 incnt,
 				gfp_t gfp)
 {
-	struct sctp_stream_in *in;
+	struct flex_array *in;
+	size_t elem_size = sizeof(struct sctp_stream_in);
 
-	in = kmalloc_array(incnt, sizeof(*stream->in), gfp);
+	in = kmalloc_array(incnt, elem_size, gfp);
 
 	if (!in)
 		return -ENOMEM;
 
 	if (stream->in) {
 		memcpy(in, stream->in, min(incnt, stream->incnt) *
-				       sizeof(*in));
+				       elem_size);
 		kfree(stream->in);
 	}
 
 	if (incnt > stream->incnt)
-		memset(in + stream->incnt, 0,
-		       (incnt - stream->incnt) * sizeof(*in));
+		memset(((struct sctp_stream_in *)in) + stream->incnt, 0,
+		       (incnt - stream->incnt) * elem_size);
 
 	stream->in = in;
 
@@ -162,7 +176,7 @@ int sctp_stream_init(struct sctp_stream *stream, __u16 outcnt, __u16 incnt,
 
 	stream->outcnt = outcnt;
 	for (i = 0; i < stream->outcnt; i++)
-		stream->out[i].state = SCTP_STREAM_OPEN;
+		SCTP_SO(stream, i)->state = SCTP_STREAM_OPEN;
 
 	sched->init(stream);
 
@@ -193,7 +207,7 @@ int sctp_stream_init_ext(struct sctp_stream *stream, __u16 sid)
 	soute = kzalloc(sizeof(*soute), GFP_KERNEL);
 	if (!soute)
 		return -ENOMEM;
-	stream->out[sid].ext = soute;
+	SCTP_SO(stream, sid)->ext = soute;
 
 	return sctp_sched_init_sid(stream, sid, GFP_KERNEL);
 }
@@ -205,7 +219,7 @@ void sctp_stream_free(struct sctp_stream *stream)
 
 	sched->free(stream);
 	for (i = 0; i < stream->outcnt; i++)
-		kfree(stream->out[i].ext);
+		kfree(SCTP_SO(stream, i)->ext);
 	kfree(stream->out);
 	kfree(stream->in);
 }
@@ -215,12 +229,12 @@ void sctp_stream_clear(struct sctp_stream *stream)
 	int i;
 
 	for (i = 0; i < stream->outcnt; i++) {
-		stream->out[i].mid = 0;
-		stream->out[i].mid_uo = 0;
+		SCTP_SO(stream, i)->mid = 0;
+		SCTP_SO(stream, i)->mid_uo = 0;
 	}
 
 	for (i = 0; i < stream->incnt; i++)
-		stream->in[i].mid = 0;
+		SCTP_SI(stream, i)->mid = 0;
 }
 
 void sctp_stream_update(struct sctp_stream *stream, struct sctp_stream *new)
@@ -271,8 +285,8 @@ static bool sctp_stream_outq_is_empty(struct sctp_stream *stream,
 	for (i = 0; i < str_nums; i++) {
 		__u16 sid = ntohs(str_list[i]);
 
-		if (stream->out[sid].ext &&
-		    !list_empty(&stream->out[sid].ext->outq))
+		if (SCTP_SO(stream, sid)->ext &&
+		    !list_empty(&SCTP_SO(stream, sid)->ext->outq))
 			return false;
 	}
 
@@ -359,11 +373,11 @@ int sctp_send_reset_streams(struct sctp_association *asoc,
 	if (out) {
 		if (str_nums)
 			for (i = 0; i < str_nums; i++)
-				stream->out[str_list[i]].state =
+				SCTP_SO(stream, str_list[i])->state =
 						       SCTP_STREAM_CLOSED;
 		else
 			for (i = 0; i < stream->outcnt; i++)
-				stream->out[i].state = SCTP_STREAM_CLOSED;
+				SCTP_SO(stream, i)->state = SCTP_STREAM_CLOSED;
 	}
 
 	asoc->strreset_chunk = chunk;
@@ -378,11 +392,11 @@ int sctp_send_reset_streams(struct sctp_association *asoc,
 
 		if (str_nums)
 			for (i = 0; i < str_nums; i++)
-				stream->out[str_list[i]].state =
+				SCTP_SO(stream, str_list[i])->state =
 						       SCTP_STREAM_OPEN;
 		else
 			for (i = 0; i < stream->outcnt; i++)
-				stream->out[i].state = SCTP_STREAM_OPEN;
+				SCTP_SO(stream, i)->state = SCTP_STREAM_OPEN;
 
 		goto out;
 	}
@@ -416,7 +430,7 @@ int sctp_send_reset_assoc(struct sctp_association *asoc)
 
 	/* Block further xmit of data until this request is completed */
 	for (i = 0; i < stream->outcnt; i++)
-		stream->out[i].state = SCTP_STREAM_CLOSED;
+		SCTP_SO(stream, i)->state = SCTP_STREAM_CLOSED;
 
 	asoc->strreset_chunk = chunk;
 	sctp_chunk_hold(asoc->strreset_chunk);
@@ -427,7 +441,7 @@ int sctp_send_reset_assoc(struct sctp_association *asoc)
 		asoc->strreset_chunk = NULL;
 
 		for (i = 0; i < stream->outcnt; i++)
-			stream->out[i].state = SCTP_STREAM_OPEN;
+			SCTP_SO(stream, i)->state = SCTP_STREAM_OPEN;
 
 		return retval;
 	}
@@ -607,10 +621,10 @@ struct sctp_chunk *sctp_process_strreset_outreq(
 		}
 
 		for (i = 0; i < nums; i++)
-			stream->in[ntohs(str_p[i])].mid = 0;
+			SCTP_SI(stream, ntohs(str_p[i]))->mid = 0;
 	} else {
 		for (i = 0; i < stream->incnt; i++)
-			stream->in[i].mid = 0;
+			SCTP_SI(stream, i)->mid = 0;
 	}
 
 	result = SCTP_STRRESET_PERFORMED;
@@ -681,11 +695,11 @@ struct sctp_chunk *sctp_process_strreset_inreq(
 
 	if (nums)
 		for (i = 0; i < nums; i++)
-			stream->out[ntohs(str_p[i])].state =
+			SCTP_SO(stream, ntohs(str_p[i]))->state =
 					       SCTP_STREAM_CLOSED;
 	else
 		for (i = 0; i < stream->outcnt; i++)
-			stream->out[i].state = SCTP_STREAM_CLOSED;
+			SCTP_SO(stream, i)->state = SCTP_STREAM_CLOSED;
 
 	asoc->strreset_chunk = chunk;
 	asoc->strreset_outstanding = 1;
@@ -784,11 +798,11 @@ struct sctp_chunk *sctp_process_strreset_tsnreq(
 	 *      incoming and outgoing streams.
 	 */
 	for (i = 0; i < stream->outcnt; i++) {
-		stream->out[i].mid = 0;
-		stream->out[i].mid_uo = 0;
+		SCTP_SO(stream, i)->mid = 0;
+		SCTP_SO(stream, i)->mid_uo = 0;
 	}
 	for (i = 0; i < stream->incnt; i++)
-		stream->in[i].mid = 0;
+		SCTP_SI(stream, i)->mid = 0;
 
 	result = SCTP_STRRESET_PERFORMED;
 
@@ -977,15 +991,18 @@ struct sctp_chunk *sctp_process_strreset_resp(
 		       sizeof(__u16);
 
 		if (result == SCTP_STRRESET_PERFORMED) {
+			struct sctp_stream_out *sout;
 			if (nums) {
 				for (i = 0; i < nums; i++) {
-					stream->out[ntohs(str_p[i])].mid = 0;
-					stream->out[ntohs(str_p[i])].mid_uo = 0;
+					sout = SCTP_SO(stream, ntohs(str_p[i]));
+					sout->mid = 0;
+					sout->mid_uo = 0;
 				}
 			} else {
 				for (i = 0; i < stream->outcnt; i++) {
-					stream->out[i].mid = 0;
-					stream->out[i].mid_uo = 0;
+					sout = SCTP_SO(stream, i);
+					sout->mid = 0;
+					sout->mid_uo = 0;
 				}
 			}
 
@@ -993,7 +1010,7 @@ struct sctp_chunk *sctp_process_strreset_resp(
 		}
 
 		for (i = 0; i < stream->outcnt; i++)
-			stream->out[i].state = SCTP_STREAM_OPEN;
+			SCTP_SO(stream, i)->state = SCTP_STREAM_OPEN;
 
 		*evp = sctp_ulpevent_make_stream_reset_event(asoc, flags,
 			nums, str_p, GFP_ATOMIC);
@@ -1048,15 +1065,15 @@ struct sctp_chunk *sctp_process_strreset_resp(
 			asoc->adv_peer_ack_point = asoc->ctsn_ack_point;
 
 			for (i = 0; i < stream->outcnt; i++) {
-				stream->out[i].mid = 0;
-				stream->out[i].mid_uo = 0;
+				SCTP_SO(stream, i)->mid = 0;
+				SCTP_SO(stream, i)->mid_uo = 0;
 			}
 			for (i = 0; i < stream->incnt; i++)
-				stream->in[i].mid = 0;
+				SCTP_SI(stream, i)->mid = 0;
 		}
 
 		for (i = 0; i < stream->outcnt; i++)
-			stream->out[i].state = SCTP_STREAM_OPEN;
+			SCTP_SO(stream, i)->state = SCTP_STREAM_OPEN;
 
 		*evp = sctp_ulpevent_make_assoc_reset_event(asoc, flags,
 			stsn, rtsn, GFP_ATOMIC);
@@ -1070,7 +1087,7 @@ struct sctp_chunk *sctp_process_strreset_resp(
 
 		if (result == SCTP_STRRESET_PERFORMED)
 			for (i = number; i < stream->outcnt; i++)
-				stream->out[i].state = SCTP_STREAM_OPEN;
+				SCTP_SO(stream, i)->state = SCTP_STREAM_OPEN;
 		else
 			stream->outcnt = number;
 
diff --git a/net/sctp/stream_interleave.c b/net/sctp/stream_interleave.c
index d3764c1..46f9fb6 100644
--- a/net/sctp/stream_interleave.c
+++ b/net/sctp/stream_interleave.c
@@ -1053,7 +1053,7 @@ static void sctp_intl_abort_pd(struct sctp_ulpq *ulpq, gfp_t gfp)
 	__u16 sid;
 
 	for (sid = 0; sid < stream->incnt; sid++) {
-		struct sctp_stream_in *sin = &stream->in[sid];
+		struct sctp_stream_in *sin = SCTP_SI(stream, sid);
 		__u32 mid;
 
 		if (sin->pd_mode_uo) {
diff --git a/net/sctp/stream_sched.c b/net/sctp/stream_sched.c
index f5fcd42..a6c04a9 100644
--- a/net/sctp/stream_sched.c
+++ b/net/sctp/stream_sched.c
@@ -161,7 +161,7 @@ int sctp_sched_set_sched(struct sctp_association *asoc,
 
 		/* Give the next scheduler a clean slate. */
 		for (i = 0; i < asoc->stream.outcnt; i++) {
-			void *p = asoc->stream.out[i].ext;
+			void *p = SCTP_SO(&asoc->stream, i)->ext;
 
 			if (!p)
 				continue;
@@ -175,7 +175,7 @@ int sctp_sched_set_sched(struct sctp_association *asoc,
 	asoc->outqueue.sched = n;
 	n->init(&asoc->stream);
 	for (i = 0; i < asoc->stream.outcnt; i++) {
-		if (!asoc->stream.out[i].ext)
+		if (!SCTP_SO(&asoc->stream, i)->ext)
 			continue;
 
 		ret = n->init_sid(&asoc->stream, i, GFP_KERNEL);
@@ -217,7 +217,7 @@ int sctp_sched_set_value(struct sctp_association *asoc, __u16 sid,
 	if (sid >= asoc->stream.outcnt)
 		return -EINVAL;
 
-	if (!asoc->stream.out[sid].ext) {
+	if (!SCTP_SO(&asoc->stream, sid)->ext) {
 		int ret;
 
 		ret = sctp_stream_init_ext(&asoc->stream, sid);
@@ -234,7 +234,7 @@ int sctp_sched_get_value(struct sctp_association *asoc, __u16 sid,
 	if (sid >= asoc->stream.outcnt)
 		return -EINVAL;
 
-	if (!asoc->stream.out[sid].ext)
+	if (!SCTP_SO(&asoc->stream, sid)->ext)
 		return 0;
 
 	return asoc->outqueue.sched->get(&asoc->stream, sid, value);
@@ -252,7 +252,7 @@ void sctp_sched_dequeue_done(struct sctp_outq *q, struct sctp_chunk *ch)
 		 * priority stream comes in.
 		 */
 		sid = sctp_chunk_stream_no(ch);
-		sout = &q->asoc->stream.out[sid];
+		sout = SCTP_SO(&q->asoc->stream, sid);
 		q->asoc->stream.out_curr = sout;
 		return;
 	}
@@ -272,8 +272,9 @@ void sctp_sched_dequeue_common(struct sctp_outq *q, struct sctp_chunk *ch)
 int sctp_sched_init_sid(struct sctp_stream *stream, __u16 sid, gfp_t gfp)
 {
 	struct sctp_sched_ops *sched = sctp_sched_ops_from_stream(stream);
+	struct sctp_stream_out_ext *ext = SCTP_SO(stream, sid)->ext;
 
-	INIT_LIST_HEAD(&stream->out[sid].ext->outq);
+	INIT_LIST_HEAD(&ext->outq);
 	return sched->init_sid(stream, sid, gfp);
 }
 
diff --git a/net/sctp/stream_sched_prio.c b/net/sctp/stream_sched_prio.c
index 7997d35..2245083 100644
--- a/net/sctp/stream_sched_prio.c
+++ b/net/sctp/stream_sched_prio.c
@@ -75,10 +75,10 @@ static struct sctp_stream_priorities *sctp_sched_prio_get_head(
 
 	/* No luck. So we search on all streams now. */
 	for (i = 0; i < stream->outcnt; i++) {
-		if (!stream->out[i].ext)
+		if (!SCTP_SO(stream, i)->ext)
 			continue;
 
-		p = stream->out[i].ext->prio_head;
+		p = SCTP_SO(stream, i)->ext->prio_head;
 		if (!p)
 			/* Means all other streams won't be initialized
 			 * as well.
@@ -165,7 +165,7 @@ static void sctp_sched_prio_sched(struct sctp_stream *stream,
 static int sctp_sched_prio_set(struct sctp_stream *stream, __u16 sid,
 			       __u16 prio, gfp_t gfp)
 {
-	struct sctp_stream_out *sout = &stream->out[sid];
+	struct sctp_stream_out *sout = SCTP_SO(stream, sid);
 	struct sctp_stream_out_ext *soute = sout->ext;
 	struct sctp_stream_priorities *prio_head, *old;
 	bool reschedule = false;
@@ -186,7 +186,7 @@ static int sctp_sched_prio_set(struct sctp_stream *stream, __u16 sid,
 		return 0;
 
 	for (i = 0; i < stream->outcnt; i++) {
-		soute = stream->out[i].ext;
+		soute = SCTP_SO(stream, i)->ext;
 		if (soute && soute->prio_head == old)
 			/* It's still in use, nothing else to do here. */
 			return 0;
@@ -201,7 +201,7 @@ static int sctp_sched_prio_set(struct sctp_stream *stream, __u16 sid,
 static int sctp_sched_prio_get(struct sctp_stream *stream, __u16 sid,
 			       __u16 *value)
 {
-	*value = stream->out[sid].ext->prio_head->prio;
+	*value = SCTP_SO(stream, sid)->ext->prio_head->prio;
 	return 0;
 }
 
@@ -215,7 +215,7 @@ static int sctp_sched_prio_init(struct sctp_stream *stream)
 static int sctp_sched_prio_init_sid(struct sctp_stream *stream, __u16 sid,
 				    gfp_t gfp)
 {
-	INIT_LIST_HEAD(&stream->out[sid].ext->prio_list);
+	INIT_LIST_HEAD(&SCTP_SO(stream, sid)->ext->prio_list);
 	return sctp_sched_prio_set(stream, sid, 0, gfp);
 }
 
@@ -233,9 +233,9 @@ static void sctp_sched_prio_free(struct sctp_stream *stream)
 	 */
 	sctp_sched_prio_unsched_all(stream);
 	for (i = 0; i < stream->outcnt; i++) {
-		if (!stream->out[i].ext)
+		if (!SCTP_SO(stream, i)->ext)
 			continue;
-		prio = stream->out[i].ext->prio_head;
+		prio = SCTP_SO(stream, i)->ext->prio_head;
 		if (prio && list_empty(&prio->prio_sched))
 			list_add(&prio->prio_sched, &list);
 	}
@@ -255,7 +255,7 @@ static void sctp_sched_prio_enqueue(struct sctp_outq *q,
 	ch = list_first_entry(&msg->chunks, struct sctp_chunk, frag_list);
 	sid = sctp_chunk_stream_no(ch);
 	stream = &q->asoc->stream;
-	sctp_sched_prio_sched(stream, stream->out[sid].ext);
+	sctp_sched_prio_sched(stream, SCTP_SO(stream, sid)->ext);
 }
 
 static struct sctp_chunk *sctp_sched_prio_dequeue(struct sctp_outq *q)
@@ -297,7 +297,7 @@ static void sctp_sched_prio_dequeue_done(struct sctp_outq *q,
 	 * this priority.
 	 */
 	sid = sctp_chunk_stream_no(ch);
-	soute = q->asoc->stream.out[sid].ext;
+	soute = SCTP_SO(&q->asoc->stream, sid)->ext;
 	prio = soute->prio_head;
 
 	sctp_sched_prio_next_stream(prio);
@@ -317,7 +317,7 @@ static void sctp_sched_prio_sched_all(struct sctp_stream *stream)
 		__u16 sid;
 
 		sid = sctp_chunk_stream_no(ch);
-		sout = &stream->out[sid];
+		sout = SCTP_SO(stream, sid);
 		if (sout->ext)
 			sctp_sched_prio_sched(stream, sout->ext);
 	}
diff --git a/net/sctp/stream_sched_rr.c b/net/sctp/stream_sched_rr.c
index 1155692..52ba743 100644
--- a/net/sctp/stream_sched_rr.c
+++ b/net/sctp/stream_sched_rr.c
@@ -100,7 +100,7 @@ static int sctp_sched_rr_init(struct sctp_stream *stream)
 static int sctp_sched_rr_init_sid(struct sctp_stream *stream, __u16 sid,
 				  gfp_t gfp)
 {
-	INIT_LIST_HEAD(&stream->out[sid].ext->rr_list);
+	INIT_LIST_HEAD(&SCTP_SO(stream, sid)->ext->rr_list);
 
 	return 0;
 }
@@ -120,7 +120,7 @@ static void sctp_sched_rr_enqueue(struct sctp_outq *q,
 	ch = list_first_entry(&msg->chunks, struct sctp_chunk, frag_list);
 	sid = sctp_chunk_stream_no(ch);
 	stream = &q->asoc->stream;
-	sctp_sched_rr_sched(stream, stream->out[sid].ext);
+	sctp_sched_rr_sched(stream, SCTP_SO(stream, sid)->ext);
 }
 
 static struct sctp_chunk *sctp_sched_rr_dequeue(struct sctp_outq *q)
@@ -154,7 +154,7 @@ static void sctp_sched_rr_dequeue_done(struct sctp_outq *q,
 
 	/* Last chunk on that msg, move to the next stream */
 	sid = sctp_chunk_stream_no(ch);
-	soute = q->asoc->stream.out[sid].ext;
+	soute = SCTP_SO(&q->asoc->stream, sid)->ext;
 
 	sctp_sched_rr_next_stream(&q->asoc->stream);
 
@@ -173,7 +173,7 @@ static void sctp_sched_rr_sched_all(struct sctp_stream *stream)
 		__u16 sid;
 
 		sid = sctp_chunk_stream_no(ch);
-		soute = stream->out[sid].ext;
+		soute = SCTP_SO(stream, sid)->ext;
 		if (soute)
 			sctp_sched_rr_sched(stream, soute);
 	}
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH net-next 1/2] net/sctp: Make wrappers for accessing in/out streams
@ 2018-04-23 18:41   ` Oleg Babin
  0 siblings, 0 replies; 64+ messages in thread
From: Oleg Babin @ 2018-04-23 18:41 UTC (permalink / raw)
  To: netdev, linux-sctp
  Cc: David S. Miller, Vlad Yasevich, Neil Horman, Xin Long,
	Marcelo Ricardo Leitner, Andrey Ryabinin

This patch introduces wrappers for accessing in/out streams indirectly.
This will enable to replace physically contiguous memory arrays
of streams with flexible arrays (or maybe any other appropriate
mechanism) which do memory allocation on a per-page basis.

Signed-off-by: Oleg Babin <obabin@virtuozzo.com>
---
 include/net/sctp/structs.h   |  30 +++++++-----
 net/sctp/chunk.c             |   6 ++-
 net/sctp/outqueue.c          |  11 +++--
 net/sctp/socket.c            |   4 +-
 net/sctp/stream.c            | 107 +++++++++++++++++++++++++------------------
 net/sctp/stream_interleave.c |   2 +-
 net/sctp/stream_sched.c      |  13 +++---
 net/sctp/stream_sched_prio.c |  22 ++++-----
 net/sctp/stream_sched_rr.c   |   8 ++--
 9 files changed, 116 insertions(+), 87 deletions(-)

diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index a0ec462..578bb40 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -394,37 +394,37 @@ int sctp_stream_init(struct sctp_stream *stream, __u16 outcnt, __u16 incnt,
 
 /* What is the current SSN number for this stream? */
 #define sctp_ssn_peek(stream, type, sid) \
-	((stream)->type[sid].ssn)
+	(sctp_stream_##type##_ptr((stream), (sid))->ssn)
 
 /* Return the next SSN number for this stream.	*/
 #define sctp_ssn_next(stream, type, sid) \
-	((stream)->type[sid].ssn++)
+	(sctp_stream_##type##_ptr((stream), (sid))->ssn++)
 
 /* Skip over this ssn and all below. */
 #define sctp_ssn_skip(stream, type, sid, ssn) \
-	((stream)->type[sid].ssn = ssn + 1)
+	(sctp_stream_##type##_ptr((stream), (sid))->ssn = ssn + 1)
 
 /* What is the current MID number for this stream? */
 #define sctp_mid_peek(stream, type, sid) \
-	((stream)->type[sid].mid)
+	(sctp_stream_##type##_ptr((stream), (sid))->mid)
 
 /* Return the next MID number for this stream.  */
 #define sctp_mid_next(stream, type, sid) \
-	((stream)->type[sid].mid++)
+	(sctp_stream_##type##_ptr((stream), (sid))->mid++)
 
 /* Skip over this mid and all below. */
 #define sctp_mid_skip(stream, type, sid, mid) \
-	((stream)->type[sid].mid = mid + 1)
+	(sctp_stream_##type##_ptr((stream), (sid))->mid = mid + 1)
 
-#define sctp_stream_in(asoc, sid) (&(asoc)->stream.in[sid])
+#define sctp_stream_in(asoc, sid) sctp_stream_in_ptr(&(asoc)->stream, (sid))
 
 /* What is the current MID_uo number for this stream? */
 #define sctp_mid_uo_peek(stream, type, sid) \
-	((stream)->type[sid].mid_uo)
+	(sctp_stream_##type##_ptr((stream), (sid))->mid_uo)
 
 /* Return the next MID_uo number for this stream.  */
 #define sctp_mid_uo_next(stream, type, sid) \
-	((stream)->type[sid].mid_uo++)
+	(sctp_stream_##type##_ptr((stream), (sid))->mid_uo++)
 
 /*
  * Pointers to address related SCTP functions.
@@ -1428,8 +1428,8 @@ struct sctp_stream_in {
 };
 
 struct sctp_stream {
-	struct sctp_stream_out *out;
-	struct sctp_stream_in *in;
+	struct flex_array *out;
+	struct flex_array *in;
 	__u16 outcnt;
 	__u16 incnt;
 	/* Current stream being sent, if any */
@@ -1451,6 +1451,14 @@ struct sctp_stream {
 	struct sctp_stream_interleave *si;
 };
 
+struct sctp_stream_out *sctp_stream_out_ptr(const struct sctp_stream *stream,
+					    __u16 sid);
+struct sctp_stream_in *sctp_stream_in_ptr(const struct sctp_stream *stream,
+					  __u16 sid);
+
+#define SCTP_SO(s, i) sctp_stream_out_ptr((s), (i))
+#define SCTP_SI(s, i) sctp_stream_in_ptr((s), (i))
+
 #define SCTP_STREAM_CLOSED		0x00
 #define SCTP_STREAM_OPEN		0x01
 
diff --git a/net/sctp/chunk.c b/net/sctp/chunk.c
index be296d6..4b9310e 100644
--- a/net/sctp/chunk.c
+++ b/net/sctp/chunk.c
@@ -333,7 +333,8 @@ int sctp_chunk_abandoned(struct sctp_chunk *chunk)
 	if (SCTP_PR_TTL_ENABLED(chunk->sinfo.sinfo_flags) &&
 	    time_after(jiffies, chunk->msg->expires_at)) {
 		struct sctp_stream_out *streamout =
-			&chunk->asoc->stream.out[chunk->sinfo.sinfo_stream];
+			SCTP_SO(&chunk->asoc->stream,
+				chunk->sinfo.sinfo_stream);
 
 		if (chunk->sent_count) {
 			chunk->asoc->abandoned_sent[SCTP_PR_INDEX(TTL)]++;
@@ -347,7 +348,8 @@ int sctp_chunk_abandoned(struct sctp_chunk *chunk)
 	} else if (SCTP_PR_RTX_ENABLED(chunk->sinfo.sinfo_flags) &&
 		   chunk->sent_count > chunk->sinfo.sinfo_timetolive) {
 		struct sctp_stream_out *streamout =
-			&chunk->asoc->stream.out[chunk->sinfo.sinfo_stream];
+			SCTP_SO(&chunk->asoc->stream,
+				chunk->sinfo.sinfo_stream);
 
 		chunk->asoc->abandoned_sent[SCTP_PR_INDEX(RTX)]++;
 		streamout->ext->abandoned_sent[SCTP_PR_INDEX(RTX)]++;
diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
index f211b3d..8d5d811 100644
--- a/net/sctp/outqueue.c
+++ b/net/sctp/outqueue.c
@@ -80,7 +80,7 @@ static inline void sctp_outq_head_data(struct sctp_outq *q,
 	q->out_qlen += ch->skb->len;
 
 	stream = sctp_chunk_stream_no(ch);
-	oute = q->asoc->stream.out[stream].ext;
+	oute = SCTP_SO(&q->asoc->stream, stream)->ext;
 	list_add(&ch->stream_list, &oute->outq);
 }
 
@@ -101,7 +101,7 @@ static inline void sctp_outq_tail_data(struct sctp_outq *q,
 	q->out_qlen += ch->skb->len;
 
 	stream = sctp_chunk_stream_no(ch);
-	oute = q->asoc->stream.out[stream].ext;
+	oute = SCTP_SO(&q->asoc->stream, stream)->ext;
 	list_add_tail(&ch->stream_list, &oute->outq);
 }
 
@@ -372,7 +372,7 @@ static int sctp_prsctp_prune_sent(struct sctp_association *asoc,
 		sctp_insert_list(&asoc->outqueue.abandoned,
 				 &chk->transmitted_list);
 
-		streamout = &asoc->stream.out[chk->sinfo.sinfo_stream];
+		streamout = SCTP_SO(&asoc->stream, chk->sinfo.sinfo_stream);
 		asoc->sent_cnt_removable--;
 		asoc->abandoned_sent[SCTP_PR_INDEX(PRIO)]++;
 		streamout->ext->abandoned_sent[SCTP_PR_INDEX(PRIO)]++;
@@ -416,7 +416,7 @@ static int sctp_prsctp_prune_unsent(struct sctp_association *asoc,
 		asoc->abandoned_unsent[SCTP_PR_INDEX(PRIO)]++;
 		if (chk->sinfo.sinfo_stream < asoc->stream.outcnt) {
 			struct sctp_stream_out *streamout =
-				&asoc->stream.out[chk->sinfo.sinfo_stream];
+				SCTP_SO(&asoc->stream, chk->sinfo.sinfo_stream);
 
 			streamout->ext->abandoned_unsent[SCTP_PR_INDEX(PRIO)]++;
 		}
@@ -1050,6 +1050,7 @@ static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp)
 		/* Finally, transmit new packets.  */
 		while ((chunk = sctp_outq_dequeue_data(q)) != NULL) {
 			__u32 sid = ntohs(chunk->subh.data_hdr->stream);
+			__u8 stream_state = SCTP_SO(&asoc->stream, sid)->state;
 
 			/* Has this chunk expired? */
 			if (sctp_chunk_abandoned(chunk)) {
@@ -1059,7 +1060,7 @@ static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp)
 				continue;
 			}
 
-			if (asoc->stream.out[sid].state == SCTP_STREAM_CLOSED) {
+			if (stream_state == SCTP_STREAM_CLOSED) {
 				sctp_outq_head_data(q, chunk);
 				goto sctp_flush_out;
 			}
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 80835ac..3442f7c 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -1907,7 +1907,7 @@ static int sctp_sendmsg_to_asoc(struct sctp_association *asoc,
 		goto err;
 	}
 
-	if (unlikely(!asoc->stream.out[sinfo->sinfo_stream].ext)) {
+	if (unlikely(!SCTP_SO(&asoc->stream, sinfo->sinfo_stream)->ext)) {
 		err = sctp_stream_init_ext(&asoc->stream, sinfo->sinfo_stream);
 		if (err)
 			goto err;
@@ -6942,7 +6942,7 @@ static int sctp_getsockopt_pr_streamstatus(struct sock *sk, int len,
 	if (!asoc || params.sprstat_sid >= asoc->stream.outcnt)
 		goto out;
 
-	streamoute = asoc->stream.out[params.sprstat_sid].ext;
+	streamoute = SCTP_SO(&asoc->stream, params.sprstat_sid)->ext;
 	if (!streamoute) {
 		/* Not allocated yet, means all stats are 0 */
 		params.sprstat_abandoned_unsent = 0;
diff --git a/net/sctp/stream.c b/net/sctp/stream.c
index f799043..16e36c0 100644
--- a/net/sctp/stream.c
+++ b/net/sctp/stream.c
@@ -37,6 +37,18 @@
 #include <net/sctp/sm.h>
 #include <net/sctp/stream_sched.h>
 
+struct sctp_stream_out *sctp_stream_out_ptr(const struct sctp_stream *stream,
+					    __u16 sid)
+{
+	return ((struct sctp_stream_out *)(stream->out)) + sid;
+}
+
+struct sctp_stream_in *sctp_stream_in_ptr(const struct sctp_stream *stream,
+					  __u16 sid)
+{
+	return ((struct sctp_stream_in *)(stream->in)) + sid;
+}
+
 /* Migrates chunks from stream queues to new stream queues if needed,
  * but not across associations. Also, removes those chunks to streams
  * higher than the new max.
@@ -78,34 +90,35 @@ static void sctp_stream_outq_migrate(struct sctp_stream *stream,
 		 * sctp_stream_update will swap ->out pointers.
 		 */
 		for (i = 0; i < outcnt; i++) {
-			kfree(new->out[i].ext);
-			new->out[i].ext = stream->out[i].ext;
-			stream->out[i].ext = NULL;
+			kfree(SCTP_SO(new, i)->ext);
+			SCTP_SO(new, i)->ext = SCTP_SO(stream, i)->ext;
+			SCTP_SO(stream, i)->ext = NULL;
 		}
 	}
 
 	for (i = outcnt; i < stream->outcnt; i++)
-		kfree(stream->out[i].ext);
+		kfree(SCTP_SO(stream, i)->ext);
 }
 
 static int sctp_stream_alloc_out(struct sctp_stream *stream, __u16 outcnt,
 				 gfp_t gfp)
 {
-	struct sctp_stream_out *out;
+	struct flex_array *out;
+	size_t elem_size = sizeof(struct sctp_stream_out);
 
-	out = kmalloc_array(outcnt, sizeof(*out), gfp);
+	out = kmalloc_array(outcnt, elem_size, gfp);
 	if (!out)
 		return -ENOMEM;
 
 	if (stream->out) {
 		memcpy(out, stream->out, min(outcnt, stream->outcnt) *
-					 sizeof(*out));
+					 elem_size);
 		kfree(stream->out);
 	}
 
 	if (outcnt > stream->outcnt)
-		memset(out + stream->outcnt, 0,
-		       (outcnt - stream->outcnt) * sizeof(*out));
+		memset(((struct sctp_stream_out *)out) + stream->outcnt, 0,
+		       (outcnt - stream->outcnt) * elem_size);
 
 	stream->out = out;
 
@@ -115,22 +128,23 @@ static int sctp_stream_alloc_out(struct sctp_stream *stream, __u16 outcnt,
 static int sctp_stream_alloc_in(struct sctp_stream *stream, __u16 incnt,
 				gfp_t gfp)
 {
-	struct sctp_stream_in *in;
+	struct flex_array *in;
+	size_t elem_size = sizeof(struct sctp_stream_in);
 
-	in = kmalloc_array(incnt, sizeof(*stream->in), gfp);
+	in = kmalloc_array(incnt, elem_size, gfp);
 
 	if (!in)
 		return -ENOMEM;
 
 	if (stream->in) {
 		memcpy(in, stream->in, min(incnt, stream->incnt) *
-				       sizeof(*in));
+				       elem_size);
 		kfree(stream->in);
 	}
 
 	if (incnt > stream->incnt)
-		memset(in + stream->incnt, 0,
-		       (incnt - stream->incnt) * sizeof(*in));
+		memset(((struct sctp_stream_in *)in) + stream->incnt, 0,
+		       (incnt - stream->incnt) * elem_size);
 
 	stream->in = in;
 
@@ -162,7 +176,7 @@ int sctp_stream_init(struct sctp_stream *stream, __u16 outcnt, __u16 incnt,
 
 	stream->outcnt = outcnt;
 	for (i = 0; i < stream->outcnt; i++)
-		stream->out[i].state = SCTP_STREAM_OPEN;
+		SCTP_SO(stream, i)->state = SCTP_STREAM_OPEN;
 
 	sched->init(stream);
 
@@ -193,7 +207,7 @@ int sctp_stream_init_ext(struct sctp_stream *stream, __u16 sid)
 	soute = kzalloc(sizeof(*soute), GFP_KERNEL);
 	if (!soute)
 		return -ENOMEM;
-	stream->out[sid].ext = soute;
+	SCTP_SO(stream, sid)->ext = soute;
 
 	return sctp_sched_init_sid(stream, sid, GFP_KERNEL);
 }
@@ -205,7 +219,7 @@ void sctp_stream_free(struct sctp_stream *stream)
 
 	sched->free(stream);
 	for (i = 0; i < stream->outcnt; i++)
-		kfree(stream->out[i].ext);
+		kfree(SCTP_SO(stream, i)->ext);
 	kfree(stream->out);
 	kfree(stream->in);
 }
@@ -215,12 +229,12 @@ void sctp_stream_clear(struct sctp_stream *stream)
 	int i;
 
 	for (i = 0; i < stream->outcnt; i++) {
-		stream->out[i].mid = 0;
-		stream->out[i].mid_uo = 0;
+		SCTP_SO(stream, i)->mid = 0;
+		SCTP_SO(stream, i)->mid_uo = 0;
 	}
 
 	for (i = 0; i < stream->incnt; i++)
-		stream->in[i].mid = 0;
+		SCTP_SI(stream, i)->mid = 0;
 }
 
 void sctp_stream_update(struct sctp_stream *stream, struct sctp_stream *new)
@@ -271,8 +285,8 @@ static bool sctp_stream_outq_is_empty(struct sctp_stream *stream,
 	for (i = 0; i < str_nums; i++) {
 		__u16 sid = ntohs(str_list[i]);
 
-		if (stream->out[sid].ext &&
-		    !list_empty(&stream->out[sid].ext->outq))
+		if (SCTP_SO(stream, sid)->ext &&
+		    !list_empty(&SCTP_SO(stream, sid)->ext->outq))
 			return false;
 	}
 
@@ -359,11 +373,11 @@ int sctp_send_reset_streams(struct sctp_association *asoc,
 	if (out) {
 		if (str_nums)
 			for (i = 0; i < str_nums; i++)
-				stream->out[str_list[i]].state =
+				SCTP_SO(stream, str_list[i])->state =
 						       SCTP_STREAM_CLOSED;
 		else
 			for (i = 0; i < stream->outcnt; i++)
-				stream->out[i].state = SCTP_STREAM_CLOSED;
+				SCTP_SO(stream, i)->state = SCTP_STREAM_CLOSED;
 	}
 
 	asoc->strreset_chunk = chunk;
@@ -378,11 +392,11 @@ int sctp_send_reset_streams(struct sctp_association *asoc,
 
 		if (str_nums)
 			for (i = 0; i < str_nums; i++)
-				stream->out[str_list[i]].state =
+				SCTP_SO(stream, str_list[i])->state =
 						       SCTP_STREAM_OPEN;
 		else
 			for (i = 0; i < stream->outcnt; i++)
-				stream->out[i].state = SCTP_STREAM_OPEN;
+				SCTP_SO(stream, i)->state = SCTP_STREAM_OPEN;
 
 		goto out;
 	}
@@ -416,7 +430,7 @@ int sctp_send_reset_assoc(struct sctp_association *asoc)
 
 	/* Block further xmit of data until this request is completed */
 	for (i = 0; i < stream->outcnt; i++)
-		stream->out[i].state = SCTP_STREAM_CLOSED;
+		SCTP_SO(stream, i)->state = SCTP_STREAM_CLOSED;
 
 	asoc->strreset_chunk = chunk;
 	sctp_chunk_hold(asoc->strreset_chunk);
@@ -427,7 +441,7 @@ int sctp_send_reset_assoc(struct sctp_association *asoc)
 		asoc->strreset_chunk = NULL;
 
 		for (i = 0; i < stream->outcnt; i++)
-			stream->out[i].state = SCTP_STREAM_OPEN;
+			SCTP_SO(stream, i)->state = SCTP_STREAM_OPEN;
 
 		return retval;
 	}
@@ -607,10 +621,10 @@ struct sctp_chunk *sctp_process_strreset_outreq(
 		}
 
 		for (i = 0; i < nums; i++)
-			stream->in[ntohs(str_p[i])].mid = 0;
+			SCTP_SI(stream, ntohs(str_p[i]))->mid = 0;
 	} else {
 		for (i = 0; i < stream->incnt; i++)
-			stream->in[i].mid = 0;
+			SCTP_SI(stream, i)->mid = 0;
 	}
 
 	result = SCTP_STRRESET_PERFORMED;
@@ -681,11 +695,11 @@ struct sctp_chunk *sctp_process_strreset_inreq(
 
 	if (nums)
 		for (i = 0; i < nums; i++)
-			stream->out[ntohs(str_p[i])].state =
+			SCTP_SO(stream, ntohs(str_p[i]))->state =
 					       SCTP_STREAM_CLOSED;
 	else
 		for (i = 0; i < stream->outcnt; i++)
-			stream->out[i].state = SCTP_STREAM_CLOSED;
+			SCTP_SO(stream, i)->state = SCTP_STREAM_CLOSED;
 
 	asoc->strreset_chunk = chunk;
 	asoc->strreset_outstanding = 1;
@@ -784,11 +798,11 @@ struct sctp_chunk *sctp_process_strreset_tsnreq(
 	 *      incoming and outgoing streams.
 	 */
 	for (i = 0; i < stream->outcnt; i++) {
-		stream->out[i].mid = 0;
-		stream->out[i].mid_uo = 0;
+		SCTP_SO(stream, i)->mid = 0;
+		SCTP_SO(stream, i)->mid_uo = 0;
 	}
 	for (i = 0; i < stream->incnt; i++)
-		stream->in[i].mid = 0;
+		SCTP_SI(stream, i)->mid = 0;
 
 	result = SCTP_STRRESET_PERFORMED;
 
@@ -977,15 +991,18 @@ struct sctp_chunk *sctp_process_strreset_resp(
 		       sizeof(__u16);
 
 		if (result == SCTP_STRRESET_PERFORMED) {
+			struct sctp_stream_out *sout;
 			if (nums) {
 				for (i = 0; i < nums; i++) {
-					stream->out[ntohs(str_p[i])].mid = 0;
-					stream->out[ntohs(str_p[i])].mid_uo = 0;
+					sout = SCTP_SO(stream, ntohs(str_p[i]));
+					sout->mid = 0;
+					sout->mid_uo = 0;
 				}
 			} else {
 				for (i = 0; i < stream->outcnt; i++) {
-					stream->out[i].mid = 0;
-					stream->out[i].mid_uo = 0;
+					sout = SCTP_SO(stream, i);
+					sout->mid = 0;
+					sout->mid_uo = 0;
 				}
 			}
 
@@ -993,7 +1010,7 @@ struct sctp_chunk *sctp_process_strreset_resp(
 		}
 
 		for (i = 0; i < stream->outcnt; i++)
-			stream->out[i].state = SCTP_STREAM_OPEN;
+			SCTP_SO(stream, i)->state = SCTP_STREAM_OPEN;
 
 		*evp = sctp_ulpevent_make_stream_reset_event(asoc, flags,
 			nums, str_p, GFP_ATOMIC);
@@ -1048,15 +1065,15 @@ struct sctp_chunk *sctp_process_strreset_resp(
 			asoc->adv_peer_ack_point = asoc->ctsn_ack_point;
 
 			for (i = 0; i < stream->outcnt; i++) {
-				stream->out[i].mid = 0;
-				stream->out[i].mid_uo = 0;
+				SCTP_SO(stream, i)->mid = 0;
+				SCTP_SO(stream, i)->mid_uo = 0;
 			}
 			for (i = 0; i < stream->incnt; i++)
-				stream->in[i].mid = 0;
+				SCTP_SI(stream, i)->mid = 0;
 		}
 
 		for (i = 0; i < stream->outcnt; i++)
-			stream->out[i].state = SCTP_STREAM_OPEN;
+			SCTP_SO(stream, i)->state = SCTP_STREAM_OPEN;
 
 		*evp = sctp_ulpevent_make_assoc_reset_event(asoc, flags,
 			stsn, rtsn, GFP_ATOMIC);
@@ -1070,7 +1087,7 @@ struct sctp_chunk *sctp_process_strreset_resp(
 
 		if (result == SCTP_STRRESET_PERFORMED)
 			for (i = number; i < stream->outcnt; i++)
-				stream->out[i].state = SCTP_STREAM_OPEN;
+				SCTP_SO(stream, i)->state = SCTP_STREAM_OPEN;
 		else
 			stream->outcnt = number;
 
diff --git a/net/sctp/stream_interleave.c b/net/sctp/stream_interleave.c
index d3764c1..46f9fb6 100644
--- a/net/sctp/stream_interleave.c
+++ b/net/sctp/stream_interleave.c
@@ -1053,7 +1053,7 @@ static void sctp_intl_abort_pd(struct sctp_ulpq *ulpq, gfp_t gfp)
 	__u16 sid;
 
 	for (sid = 0; sid < stream->incnt; sid++) {
-		struct sctp_stream_in *sin = &stream->in[sid];
+		struct sctp_stream_in *sin = SCTP_SI(stream, sid);
 		__u32 mid;
 
 		if (sin->pd_mode_uo) {
diff --git a/net/sctp/stream_sched.c b/net/sctp/stream_sched.c
index f5fcd42..a6c04a9 100644
--- a/net/sctp/stream_sched.c
+++ b/net/sctp/stream_sched.c
@@ -161,7 +161,7 @@ int sctp_sched_set_sched(struct sctp_association *asoc,
 
 		/* Give the next scheduler a clean slate. */
 		for (i = 0; i < asoc->stream.outcnt; i++) {
-			void *p = asoc->stream.out[i].ext;
+			void *p = SCTP_SO(&asoc->stream, i)->ext;
 
 			if (!p)
 				continue;
@@ -175,7 +175,7 @@ int sctp_sched_set_sched(struct sctp_association *asoc,
 	asoc->outqueue.sched = n;
 	n->init(&asoc->stream);
 	for (i = 0; i < asoc->stream.outcnt; i++) {
-		if (!asoc->stream.out[i].ext)
+		if (!SCTP_SO(&asoc->stream, i)->ext)
 			continue;
 
 		ret = n->init_sid(&asoc->stream, i, GFP_KERNEL);
@@ -217,7 +217,7 @@ int sctp_sched_set_value(struct sctp_association *asoc, __u16 sid,
 	if (sid >= asoc->stream.outcnt)
 		return -EINVAL;
 
-	if (!asoc->stream.out[sid].ext) {
+	if (!SCTP_SO(&asoc->stream, sid)->ext) {
 		int ret;
 
 		ret = sctp_stream_init_ext(&asoc->stream, sid);
@@ -234,7 +234,7 @@ int sctp_sched_get_value(struct sctp_association *asoc, __u16 sid,
 	if (sid >= asoc->stream.outcnt)
 		return -EINVAL;
 
-	if (!asoc->stream.out[sid].ext)
+	if (!SCTP_SO(&asoc->stream, sid)->ext)
 		return 0;
 
 	return asoc->outqueue.sched->get(&asoc->stream, sid, value);
@@ -252,7 +252,7 @@ void sctp_sched_dequeue_done(struct sctp_outq *q, struct sctp_chunk *ch)
 		 * priority stream comes in.
 		 */
 		sid = sctp_chunk_stream_no(ch);
-		sout = &q->asoc->stream.out[sid];
+		sout = SCTP_SO(&q->asoc->stream, sid);
 		q->asoc->stream.out_curr = sout;
 		return;
 	}
@@ -272,8 +272,9 @@ void sctp_sched_dequeue_common(struct sctp_outq *q, struct sctp_chunk *ch)
 int sctp_sched_init_sid(struct sctp_stream *stream, __u16 sid, gfp_t gfp)
 {
 	struct sctp_sched_ops *sched = sctp_sched_ops_from_stream(stream);
+	struct sctp_stream_out_ext *ext = SCTP_SO(stream, sid)->ext;
 
-	INIT_LIST_HEAD(&stream->out[sid].ext->outq);
+	INIT_LIST_HEAD(&ext->outq);
 	return sched->init_sid(stream, sid, gfp);
 }
 
diff --git a/net/sctp/stream_sched_prio.c b/net/sctp/stream_sched_prio.c
index 7997d35..2245083 100644
--- a/net/sctp/stream_sched_prio.c
+++ b/net/sctp/stream_sched_prio.c
@@ -75,10 +75,10 @@ static struct sctp_stream_priorities *sctp_sched_prio_get_head(
 
 	/* No luck. So we search on all streams now. */
 	for (i = 0; i < stream->outcnt; i++) {
-		if (!stream->out[i].ext)
+		if (!SCTP_SO(stream, i)->ext)
 			continue;
 
-		p = stream->out[i].ext->prio_head;
+		p = SCTP_SO(stream, i)->ext->prio_head;
 		if (!p)
 			/* Means all other streams won't be initialized
 			 * as well.
@@ -165,7 +165,7 @@ static void sctp_sched_prio_sched(struct sctp_stream *stream,
 static int sctp_sched_prio_set(struct sctp_stream *stream, __u16 sid,
 			       __u16 prio, gfp_t gfp)
 {
-	struct sctp_stream_out *sout = &stream->out[sid];
+	struct sctp_stream_out *sout = SCTP_SO(stream, sid);
 	struct sctp_stream_out_ext *soute = sout->ext;
 	struct sctp_stream_priorities *prio_head, *old;
 	bool reschedule = false;
@@ -186,7 +186,7 @@ static int sctp_sched_prio_set(struct sctp_stream *stream, __u16 sid,
 		return 0;
 
 	for (i = 0; i < stream->outcnt; i++) {
-		soute = stream->out[i].ext;
+		soute = SCTP_SO(stream, i)->ext;
 		if (soute && soute->prio_head == old)
 			/* It's still in use, nothing else to do here. */
 			return 0;
@@ -201,7 +201,7 @@ static int sctp_sched_prio_set(struct sctp_stream *stream, __u16 sid,
 static int sctp_sched_prio_get(struct sctp_stream *stream, __u16 sid,
 			       __u16 *value)
 {
-	*value = stream->out[sid].ext->prio_head->prio;
+	*value = SCTP_SO(stream, sid)->ext->prio_head->prio;
 	return 0;
 }
 
@@ -215,7 +215,7 @@ static int sctp_sched_prio_init(struct sctp_stream *stream)
 static int sctp_sched_prio_init_sid(struct sctp_stream *stream, __u16 sid,
 				    gfp_t gfp)
 {
-	INIT_LIST_HEAD(&stream->out[sid].ext->prio_list);
+	INIT_LIST_HEAD(&SCTP_SO(stream, sid)->ext->prio_list);
 	return sctp_sched_prio_set(stream, sid, 0, gfp);
 }
 
@@ -233,9 +233,9 @@ static void sctp_sched_prio_free(struct sctp_stream *stream)
 	 */
 	sctp_sched_prio_unsched_all(stream);
 	for (i = 0; i < stream->outcnt; i++) {
-		if (!stream->out[i].ext)
+		if (!SCTP_SO(stream, i)->ext)
 			continue;
-		prio = stream->out[i].ext->prio_head;
+		prio = SCTP_SO(stream, i)->ext->prio_head;
 		if (prio && list_empty(&prio->prio_sched))
 			list_add(&prio->prio_sched, &list);
 	}
@@ -255,7 +255,7 @@ static void sctp_sched_prio_enqueue(struct sctp_outq *q,
 	ch = list_first_entry(&msg->chunks, struct sctp_chunk, frag_list);
 	sid = sctp_chunk_stream_no(ch);
 	stream = &q->asoc->stream;
-	sctp_sched_prio_sched(stream, stream->out[sid].ext);
+	sctp_sched_prio_sched(stream, SCTP_SO(stream, sid)->ext);
 }
 
 static struct sctp_chunk *sctp_sched_prio_dequeue(struct sctp_outq *q)
@@ -297,7 +297,7 @@ static void sctp_sched_prio_dequeue_done(struct sctp_outq *q,
 	 * this priority.
 	 */
 	sid = sctp_chunk_stream_no(ch);
-	soute = q->asoc->stream.out[sid].ext;
+	soute = SCTP_SO(&q->asoc->stream, sid)->ext;
 	prio = soute->prio_head;
 
 	sctp_sched_prio_next_stream(prio);
@@ -317,7 +317,7 @@ static void sctp_sched_prio_sched_all(struct sctp_stream *stream)
 		__u16 sid;
 
 		sid = sctp_chunk_stream_no(ch);
-		sout = &stream->out[sid];
+		sout = SCTP_SO(stream, sid);
 		if (sout->ext)
 			sctp_sched_prio_sched(stream, sout->ext);
 	}
diff --git a/net/sctp/stream_sched_rr.c b/net/sctp/stream_sched_rr.c
index 1155692..52ba743 100644
--- a/net/sctp/stream_sched_rr.c
+++ b/net/sctp/stream_sched_rr.c
@@ -100,7 +100,7 @@ static int sctp_sched_rr_init(struct sctp_stream *stream)
 static int sctp_sched_rr_init_sid(struct sctp_stream *stream, __u16 sid,
 				  gfp_t gfp)
 {
-	INIT_LIST_HEAD(&stream->out[sid].ext->rr_list);
+	INIT_LIST_HEAD(&SCTP_SO(stream, sid)->ext->rr_list);
 
 	return 0;
 }
@@ -120,7 +120,7 @@ static void sctp_sched_rr_enqueue(struct sctp_outq *q,
 	ch = list_first_entry(&msg->chunks, struct sctp_chunk, frag_list);
 	sid = sctp_chunk_stream_no(ch);
 	stream = &q->asoc->stream;
-	sctp_sched_rr_sched(stream, stream->out[sid].ext);
+	sctp_sched_rr_sched(stream, SCTP_SO(stream, sid)->ext);
 }
 
 static struct sctp_chunk *sctp_sched_rr_dequeue(struct sctp_outq *q)
@@ -154,7 +154,7 @@ static void sctp_sched_rr_dequeue_done(struct sctp_outq *q,
 
 	/* Last chunk on that msg, move to the next stream */
 	sid = sctp_chunk_stream_no(ch);
-	soute = q->asoc->stream.out[sid].ext;
+	soute = SCTP_SO(&q->asoc->stream, sid)->ext;
 
 	sctp_sched_rr_next_stream(&q->asoc->stream);
 
@@ -173,7 +173,7 @@ static void sctp_sched_rr_sched_all(struct sctp_stream *stream)
 		__u16 sid;
 
 		sid = sctp_chunk_stream_no(ch);
-		soute = stream->out[sid].ext;
+		soute = SCTP_SO(stream, sid)->ext;
 		if (soute)
 			sctp_sched_rr_sched(stream, soute);
 	}
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH net-next 2/2] net/sctp: Replace in/out stream arrays with flex_array
  2018-04-23 18:41 ` Oleg Babin
@ 2018-04-23 18:41   ` Oleg Babin
  -1 siblings, 0 replies; 64+ messages in thread
From: Oleg Babin @ 2018-04-23 18:41 UTC (permalink / raw)
  To: netdev, linux-sctp
  Cc: David S. Miller, Vlad Yasevich, Neil Horman, Xin Long,
	Marcelo Ricardo Leitner, Andrey Ryabinin

This patch replaces the physically contiguous memory arrays
allocated using kmalloc_array() with flexible arrays.
This avoids memory allocation failures on systems under
memory pressure.
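
How a per-page array avoids one large contiguous allocation can be sketched
in userspace as follows. This is a simplified model under stated assumptions,
not the kernel's flex_array implementation: struct demo_paged_array and the
demo_paged_*() helpers are hypothetical names, the page size is assumed to be
4096 bytes, and calloc() stands in for the kernel's page allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096u	/* assumed page size */

/* Hypothetical simplified model; NOT the kernel's struct flex_array. */
struct demo_paged_array {
	size_t elem_size;
	size_t elems_per_page;
	size_t nr_pages;
	void **pages;	/* each entry is a separate page-sized allocation */
};

static void demo_paged_free(struct demo_paged_array *pa)
{
	size_t i;

	if (!pa)
		return;
	if (pa->pages)
		for (i = 0; i < pa->nr_pages; i++)
			free(pa->pages[i]);	/* free(NULL) is a no-op */
	free(pa->pages);
	free(pa);
}

/* Allocate zeroed storage for count elements, one page at a time. */
static struct demo_paged_array *demo_paged_alloc(size_t elem_size, size_t count)
{
	struct demo_paged_array *pa;
	size_t i;

	if (!elem_size || elem_size > DEMO_PAGE_SIZE || !count)
		return NULL;

	pa = calloc(1, sizeof(*pa));
	if (!pa)
		return NULL;

	pa->elem_size = elem_size;
	pa->elems_per_page = DEMO_PAGE_SIZE / elem_size;
	pa->nr_pages = (count + pa->elems_per_page - 1) / pa->elems_per_page;
	pa->pages = calloc(pa->nr_pages, sizeof(*pa->pages));
	if (!pa->pages) {
		demo_paged_free(pa);
		return NULL;
	}

	for (i = 0; i < pa->nr_pages; i++) {
		pa->pages[i] = calloc(1, DEMO_PAGE_SIZE);
		if (!pa->pages[i]) {
			demo_paged_free(pa);
			return NULL;
		}
	}

	return pa;
}

/* Element lookup: pick the page first, then the slot within that page. */
static void *demo_paged_get(const struct demo_paged_array *pa, size_t index)
{
	size_t page = index / pa->elems_per_page;
	size_t slot = index % pa->elems_per_page;

	if (page >= pa->nr_pages)
		return NULL;
	return (char *)pa->pages[page] + slot * pa->elem_size;
}
```

In this model no single allocation is larger than one page plus the table of
page pointers, at the cost of one extra pointer lookup per element access.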

Signed-off-by: Oleg Babin <obabin@virtuozzo.com>
---
 include/net/sctp/structs.h |  1 +
 net/sctp/stream.c          | 78 +++++++++++++++++++++++++++++++++++-----------
 2 files changed, 61 insertions(+), 18 deletions(-)

diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index 578bb40..c7f42b4 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -57,6 +57,7 @@
 #include <linux/atomic.h>		/* This gets us atomic counters.  */
 #include <linux/skbuff.h>	/* We need sk_buff_head. */
 #include <linux/workqueue.h>	/* We need tq_struct.	 */
+#include <linux/flex_array.h>	/* We need flex_array.   */
 #include <linux/sctp.h>		/* We need sctp* header structs.  */
 #include <net/sctp/auth.h>	/* We need auth specific structs */
 #include <net/ip.h>		/* For inet_skb_parm */
diff --git a/net/sctp/stream.c b/net/sctp/stream.c
index 16e36c0..be372b0 100644
--- a/net/sctp/stream.c
+++ b/net/sctp/stream.c
@@ -40,13 +40,60 @@
 struct sctp_stream_out *sctp_stream_out_ptr(const struct sctp_stream *stream,
 					    __u16 sid)
 {
-	return ((struct sctp_stream_out *)(stream->out)) + sid;
+	return flex_array_get(stream->out, sid);
 }
 
 struct sctp_stream_in *sctp_stream_in_ptr(const struct sctp_stream *stream,
 					  __u16 sid)
 {
-	return ((struct sctp_stream_in *)(stream->in)) + sid;
+	return flex_array_get(stream->in, sid);
+}
+
+static struct flex_array *fa_alloc(size_t elem_size, size_t elem_count,
+				   gfp_t gfp)
+{
+	struct flex_array *result;
+	int err;
+
+	result = flex_array_alloc(elem_size, elem_count, gfp);
+	if (result) {
+		err = flex_array_prealloc(result, 0, elem_count, gfp);
+		if (err) {
+			flex_array_free(result);
+			result = NULL;
+		}
+	}
+
+	return result;
+}
+
+static void fa_free(struct flex_array *fa)
+{
+	if (fa)
+		flex_array_free(fa);
+}
+
+static void fa_copy(struct flex_array *fa, struct flex_array *from,
+		    size_t index, size_t count)
+{
+	void *elem;
+
+	while (count--) {
+		elem = flex_array_get(from, index);
+		flex_array_put(fa, index, elem, 0);
+		index++;
+	}
+}
+
+static void fa_zero(struct flex_array *fa, size_t index, size_t count)
+{
+	void *elem;
+
+	while (count--) {
+		elem = flex_array_get(fa, index);
+		memset(elem, 0, fa->element_size);
+		index++;
+	}
 }
 
 /* Migrates chunks from stream queues to new stream queues if needed,
@@ -106,19 +153,17 @@ static int sctp_stream_alloc_out(struct sctp_stream *stream, __u16 outcnt,
 	struct flex_array *out;
 	size_t elem_size = sizeof(struct sctp_stream_out);
 
-	out = kmalloc_array(outcnt, elem_size, gfp);
+	out = fa_alloc(elem_size, outcnt, gfp);
 	if (!out)
 		return -ENOMEM;
 
 	if (stream->out) {
-		memcpy(out, stream->out, min(outcnt, stream->outcnt) *
-					 elem_size);
-		kfree(stream->out);
+		fa_copy(out, stream->out, 0, min(outcnt, stream->outcnt));
+		fa_free(stream->out);
 	}
 
 	if (outcnt > stream->outcnt)
-		memset(((struct sctp_stream_out *)out) + stream->outcnt, 0,
-		       (outcnt - stream->outcnt) * elem_size);
+		fa_zero(out, stream->outcnt, (outcnt - stream->outcnt));
 
 	stream->out = out;
 
@@ -131,20 +176,17 @@ static int sctp_stream_alloc_in(struct sctp_stream *stream, __u16 incnt,
 	struct flex_array *in;
 	size_t elem_size = sizeof(struct sctp_stream_in);
 
-	in = kmalloc_array(incnt, elem_size, gfp);
-
+	in = fa_alloc(elem_size, incnt, gfp);
 	if (!in)
 		return -ENOMEM;
 
 	if (stream->in) {
-		memcpy(in, stream->in, min(incnt, stream->incnt) *
-				       elem_size);
-		kfree(stream->in);
+		fa_copy(in, stream->in, 0, min(incnt, stream->incnt));
+		fa_free(stream->in);
 	}
 
 	if (incnt > stream->incnt)
-		memset(((struct sctp_stream_in *)in) + stream->incnt, 0,
-		       (incnt - stream->incnt) * elem_size);
+		fa_zero(in, stream->incnt, (incnt - stream->incnt));
 
 	stream->in = in;
 
@@ -188,7 +230,7 @@ int sctp_stream_init(struct sctp_stream *stream, __u16 outcnt, __u16 incnt,
 	ret = sctp_stream_alloc_in(stream, incnt, gfp);
 	if (ret) {
 		sched->free(stream);
-		kfree(stream->out);
+		fa_free(stream->out);
 		stream->out = NULL;
 		stream->outcnt = 0;
 		goto out;
@@ -220,8 +262,8 @@ void sctp_stream_free(struct sctp_stream *stream)
 	sched->free(stream);
 	for (i = 0; i < stream->outcnt; i++)
 		kfree(SCTP_SO(stream, i)->ext);
-	kfree(stream->out);
-	kfree(stream->in);
+	fa_free(stream->out);
+	fa_free(stream->in);
 }
 
 void sctp_stream_clear(struct sctp_stream *stream)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* Re: [PATCH net-next 0/2] net/sctp: Avoid allocating high order memory with kmalloc()
  2018-04-23 18:41 ` Oleg Babin
@ 2018-04-23 21:33   ` Marcelo Ricardo Leitner
  -1 siblings, 0 replies; 64+ messages in thread
From: Marcelo Ricardo Leitner @ 2018-04-23 21:33 UTC (permalink / raw)
  To: Oleg Babin
  Cc: netdev, linux-sctp, David S. Miller, Vlad Yasevich, Neil Horman,
	Xin Long, Andrey Ryabinin

Hi,

On Mon, Apr 23, 2018 at 09:41:04PM +0300, Oleg Babin wrote:
> Each SCTP association can have up to 65535 input and output streams.
> For each stream type an array of sctp_stream_in or sctp_stream_out
> structures is allocated using kmalloc_array() function. This function
> allocates physically contiguous memory regions, so this can lead
> to allocation of memory regions of very high order, i.e.:
>
>   sizeof(struct sctp_stream_out) == 24,
>   ((65535 * 24) / 4096) == 383 memory pages (4096 byte per page),
>   which means 9th memory order.
>
> This can lead to a memory allocation failures on the systems
> under a memory stress.

Did you do performance tests while actually using these 65k streams
and with 256 (so it gets 2 pages)?

This will introduce another deref on each access to an element, but
I'm not expecting any impact due to it.
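
As a worked check of the page counts mentioned here, the arithmetic can
be sketched in C. The 24-byte element size and the 4096-byte page size
are taken from the cover letter; pages_needed() and alloc_order() are
hypothetical helper names, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_BYTES 4096u	/* page size used in the cover letter */

/* Pages needed for `count` elements of `elem_size` bytes, rounded up. */
static size_t pages_needed(size_t count, size_t elem_size)
{
	assert(elem_size > 0);
	return (count * elem_size + PAGE_BYTES - 1) / PAGE_BYTES;
}

/* Smallest order n such that 2^n pages cover the request. */
static unsigned int alloc_order(size_t pages)
{
	unsigned int order = 0;

	while (((size_t)1 << order) < pages)
		order++;
	return order;
}
```

For 65535 streams of 24 bytes this gives 384 pages (the cover letter's
383 comes from truncating division) and order 9. For 256 streams it
gives 2 pages and order 1.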

  Marcelo

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH net-next 1/2] net/sctp: Make wrappers for accessing in/out streams
  2018-04-23 18:41   ` Oleg Babin
@ 2018-04-23 21:33     ` Marcelo Ricardo Leitner
  -1 siblings, 0 replies; 64+ messages in thread
From: Marcelo Ricardo Leitner @ 2018-04-23 21:33 UTC (permalink / raw)
  To: Oleg Babin
  Cc: netdev, linux-sctp, David S. Miller, Vlad Yasevich, Neil Horman,
	Xin Long, Andrey Ryabinin

On Mon, Apr 23, 2018 at 09:41:05PM +0300, Oleg Babin wrote:
> This patch introduces wrappers for accessing in/out streams indirectly.
> This will enable to replace physically contiguous memory arrays
> of streams with flexible arrays (or maybe any other appropriate
> mechanism) which do memory allocation on a per-page basis.
>
> Signed-off-by: Oleg Babin <obabin@virtuozzo.com>
> ---
>  include/net/sctp/structs.h   |  30 +++++++-----
>  net/sctp/chunk.c             |   6 ++-
>  net/sctp/outqueue.c          |  11 +++--
>  net/sctp/socket.c            |   4 +-
>  net/sctp/stream.c            | 107 +++++++++++++++++++++++++------------------
>  net/sctp/stream_interleave.c |   2 +-
>  net/sctp/stream_sched.c      |  13 +++---
>  net/sctp/stream_sched_prio.c |  22 ++++-----
>  net/sctp/stream_sched_rr.c   |   8 ++--
>  9 files changed, 116 insertions(+), 87 deletions(-)
>
> diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
> index a0ec462..578bb40 100644
> --- a/include/net/sctp/structs.h
> +++ b/include/net/sctp/structs.h
> @@ -394,37 +394,37 @@ int sctp_stream_init(struct sctp_stream *stream, __u16 outcnt, __u16 incnt,
>
>  /* What is the current SSN number for this stream? */
>  #define sctp_ssn_peek(stream, type, sid) \
> -	((stream)->type[sid].ssn)
> +	(sctp_stream_##type##_ptr((stream), (sid))->ssn)
>
>  /* Return the next SSN number for this stream.	*/
>  #define sctp_ssn_next(stream, type, sid) \
> -	((stream)->type[sid].ssn++)
> +	(sctp_stream_##type##_ptr((stream), (sid))->ssn++)
>
>  /* Skip over this ssn and all below. */
>  #define sctp_ssn_skip(stream, type, sid, ssn) \
> -	((stream)->type[sid].ssn = ssn + 1)
> +	(sctp_stream_##type##_ptr((stream), (sid))->ssn = ssn + 1)
>
>  /* What is the current MID number for this stream? */
>  #define sctp_mid_peek(stream, type, sid) \
> -	((stream)->type[sid].mid)
> +	(sctp_stream_##type##_ptr((stream), (sid))->mid)
>
>  /* Return the next MID number for this stream.  */
>  #define sctp_mid_next(stream, type, sid) \
> -	((stream)->type[sid].mid++)
> +	(sctp_stream_##type##_ptr((stream), (sid))->mid++)
>
>  /* Skip over this mid and all below. */
>  #define sctp_mid_skip(stream, type, sid, mid) \
> -	((stream)->type[sid].mid = mid + 1)
> +	(sctp_stream_##type##_ptr((stream), (sid))->mid = mid + 1)
>
> -#define sctp_stream_in(asoc, sid) (&(asoc)->stream.in[sid])
> +#define sctp_stream_in(asoc, sid) sctp_stream_in_ptr(&(asoc)->stream, (sid))

This will get confusing:
- sctp_stream_in(asoc, sid)
- sctp_stream_in_ptr(stream, sid)

Considering all usages of sctp_stream_in(), it seems you can just update
them to do the ->stream deref and keep only the latter implementation,
which then doesn't need the _ptr suffix.
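
A rough sketch of what the single-accessor variant could look like,
using simplified userspace stand-ins for the kernel structures (the real
structs have more fields, and the bounds assert is only for the sketch).
demo_next_ssn() is a hypothetical caller added for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins; not the kernel definitions. */
struct sctp_stream_in {
	uint32_t mid;
	uint16_t ssn;
};

struct sctp_stream {
	struct sctp_stream_in *in;
	uint16_t incnt;
};

struct sctp_association {
	struct sctp_stream stream;
};

/* The only accessor: it takes the stream, so no _ptr twin is needed. */
static inline struct sctp_stream_in *
sctp_stream_in(struct sctp_stream *stream, uint16_t sid)
{
	assert(sid < stream->incnt);
	return &stream->in[sid];
}

/* Token pasting picks the accessor by direction, as in the quoted macros. */
#define sctp_ssn_peek(stream, type, sid) \
	(sctp_stream_##type((stream), (sid))->ssn)
#define sctp_ssn_next(stream, type, sid) \
	(sctp_stream_##type((stream), (sid))->ssn++)

/* Hypothetical caller: it does the ->stream deref itself. */
static uint16_t demo_next_ssn(struct sctp_association *asoc, uint16_t sid)
{
	return sctp_ssn_next(&asoc->stream, in, sid);
}
```

Call sites that used to pass asoc now pass &asoc->stream, and the macros
and direct callers share one function.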

  Marcelo

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH net-next 0/2] net/sctp: Avoid allocating high order memory with kmalloc()
  2018-04-23 21:33   ` Marcelo Ricardo Leitner
@ 2018-04-26 22:14     ` Oleg Babin
  -1 siblings, 0 replies; 64+ messages in thread
From: Oleg Babin @ 2018-04-26 22:14 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: netdev, linux-sctp, David S. Miller, Vlad Yasevich, Neil Horman,
	Xin Long, Andrey Ryabinin

Hi Marcelo,

On 04/24/2018 12:33 AM, Marcelo Ricardo Leitner wrote:
> Hi,
> 
> On Mon, Apr 23, 2018 at 09:41:04PM +0300, Oleg Babin wrote:
>> Each SCTP association can have up to 65535 input and output streams.
>> For each stream type an array of sctp_stream_in or sctp_stream_out
>> structures is allocated using kmalloc_array() function. This function
>> allocates physically contiguous memory regions, so this can lead
>> to allocation of memory regions of very high order, i.e.:
>>
>>   sizeof(struct sctp_stream_out) == 24,
>>   ((65535 * 24) / 4096) == 383 memory pages (4096 byte per page),
>>   which means 9th memory order.
>>
>> This can lead to a memory allocation failures on the systems
>> under a memory stress.
> 
> Did you do performance tests while actually using these 65k streams
> and with 256 (so it gets 2 pages)?
> 
> This will introduce another deref on each access to an element, but
> I'm not expecting any impact due to it.
> 

No, I didn't do such tests. Could you please tell me what methodology
you usually use to measure performance properly?

I'm trying to do measurements with iperf3 on an unmodified kernel and get
very strange results like this:

ovbabin@ovbabin-laptop:~$ ~/programs/iperf/bin/iperf3 -c 169.254.11.150 --sctp
Connecting to host 169.254.11.150, port 5201
[  5] local 169.254.11.150 port 46330 connected to 169.254.11.150 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  9.88 MBytes  82.8 Mbits/sec                  
[  5]   1.00-2.00   sec   226 MBytes  1.90 Gbits/sec                  
[  5]   2.00-3.00   sec   832 KBytes  6.82 Mbits/sec                  
[  5]   3.00-4.00   sec   640 KBytes  5.24 Mbits/sec                  
[  5]   4.00-5.00   sec   756 MBytes  6.34 Gbits/sec                  
[  5]   5.00-6.00   sec   522 MBytes  4.38 Gbits/sec                  
[  5]   6.00-7.00   sec   896 KBytes  7.34 Mbits/sec                  
[  5]   7.00-8.00   sec   519 MBytes  4.35 Gbits/sec                  
[  5]   8.00-9.00   sec   504 MBytes  4.23 Gbits/sec                  
[  5]   9.00-10.00  sec   475 MBytes  3.98 Gbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.00  sec  2.94 GBytes  2.53 Gbits/sec                  sender
[  5]   0.00-10.04  sec  2.94 GBytes  2.52 Gbits/sec                  receiver

iperf Done.

The values are spread enormously, from a few megabits to several gigabits
per second. I get similar results with netperf. This particular result was
obtained with client and server running on the same machine. I also tried
this on different machines with different kernel versions, and the situation
was similar. I compiled the latest versions of iperf and netperf from source.

Could it possibly be that I am missing something very obvious?

Thanks!

-- 
Best regards,
Oleg
  

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH net-next 1/2] net/sctp: Make wrappers for accessing in/out streams
  2018-04-23 21:33     ` Marcelo Ricardo Leitner
@ 2018-04-26 22:19       ` Oleg Babin
  -1 siblings, 0 replies; 64+ messages in thread
From: Oleg Babin @ 2018-04-26 22:19 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: netdev, linux-sctp, David S. Miller, Vlad Yasevich, Neil Horman,
	Xin Long, Andrey Ryabinin

On 04/24/2018 12:33 AM, Marcelo Ricardo Leitner wrote:
> On Mon, Apr 23, 2018 at 09:41:05PM +0300, Oleg Babin wrote:
>> This patch introduces wrappers for accessing in/out streams indirectly.
>> This will enable to replace physically contiguous memory arrays
>> of streams with flexible arrays (or maybe any other appropriate
>> mechanism) which do memory allocation on a per-page basis.
>>
>> Signed-off-by: Oleg Babin <obabin@virtuozzo.com>
>> ---
>>  include/net/sctp/structs.h   |  30 +++++++-----
>>  net/sctp/chunk.c             |   6 ++-
>>  net/sctp/outqueue.c          |  11 +++--
>>  net/sctp/socket.c            |   4 +-
>>  net/sctp/stream.c            | 107 +++++++++++++++++++++++++------------------
>>  net/sctp/stream_interleave.c |   2 +-
>>  net/sctp/stream_sched.c      |  13 +++---
>>  net/sctp/stream_sched_prio.c |  22 ++++-----
>>  net/sctp/stream_sched_rr.c   |   8 ++--
>>  9 files changed, 116 insertions(+), 87 deletions(-)
>>
>> diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
>> index a0ec462..578bb40 100644
>> --- a/include/net/sctp/structs.h
>> +++ b/include/net/sctp/structs.h
>> @@ -394,37 +394,37 @@ int sctp_stream_init(struct sctp_stream *stream, __u16 outcnt, __u16 incnt,
>>
>>  /* What is the current SSN number for this stream? */
>>  #define sctp_ssn_peek(stream, type, sid) \
>> -	((stream)->type[sid].ssn)
>> +	(sctp_stream_##type##_ptr((stream), (sid))->ssn)
>>
>>  /* Return the next SSN number for this stream.	*/
>>  #define sctp_ssn_next(stream, type, sid) \
>> -	((stream)->type[sid].ssn++)
>> +	(sctp_stream_##type##_ptr((stream), (sid))->ssn++)
>>
>>  /* Skip over this ssn and all below. */
>>  #define sctp_ssn_skip(stream, type, sid, ssn) \
>> -	((stream)->type[sid].ssn = ssn + 1)
>> +	(sctp_stream_##type##_ptr((stream), (sid))->ssn = ssn + 1)
>>
>>  /* What is the current MID number for this stream? */
>>  #define sctp_mid_peek(stream, type, sid) \
>> -	((stream)->type[sid].mid)
>> +	(sctp_stream_##type##_ptr((stream), (sid))->mid)
>>
>>  /* Return the next MID number for this stream.  */
>>  #define sctp_mid_next(stream, type, sid) \
>> -	((stream)->type[sid].mid++)
>> +	(sctp_stream_##type##_ptr((stream), (sid))->mid++)
>>
>>  /* Skip over this mid and all below. */
>>  #define sctp_mid_skip(stream, type, sid, mid) \
>> -	((stream)->type[sid].mid = mid + 1)
>> +	(sctp_stream_##type##_ptr((stream), (sid))->mid = mid + 1)
>>
>> -#define sctp_stream_in(asoc, sid) (&(asoc)->stream.in[sid])
>> +#define sctp_stream_in(asoc, sid) sctp_stream_in_ptr(&(asoc)->stream, (sid))
> 
> This will get confusing:
> - sctp_stream_in(asoc, sid)
> - sctp_stream_in_ptr(stream, sid)
> 
> Considering all usages of sctp_stream_in(), seems you can just update
> them to do the ->stream deref and keep only the later implementation.
> Which then don't need the _ptr suffix.
Ok, I'll change that in the next patch version.

-- 
Best regards,
Oleg Babin

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH net-next 0/2] net/sctp: Avoid allocating high order memory with kmalloc()
  2018-04-26 22:14     ` Oleg Babin
@ 2018-04-26 22:28       ` Marcelo Ricardo Leitner
  -1 siblings, 0 replies; 64+ messages in thread
From: Marcelo Ricardo Leitner @ 2018-04-26 22:28 UTC (permalink / raw)
  To: Oleg Babin
  Cc: netdev, linux-sctp, David S. Miller, Vlad Yasevich, Neil Horman,
	Xin Long, Andrey Ryabinin

On Fri, Apr 27, 2018 at 01:14:56AM +0300, Oleg Babin wrote:
> Hi Marcelo,
>
> On 04/24/2018 12:33 AM, Marcelo Ricardo Leitner wrote:
> > Hi,
> >
> > On Mon, Apr 23, 2018 at 09:41:04PM +0300, Oleg Babin wrote:
> >> Each SCTP association can have up to 65535 input and output streams.
> >> For each stream type an array of sctp_stream_in or sctp_stream_out
> >> structures is allocated using kmalloc_array() function. This function
> >> allocates physically contiguous memory regions, so this can lead
> >> to allocation of memory regions of very high order, i.e.:
> >>
> >>   sizeof(struct sctp_stream_out) == 24,
> >>   ((65535 * 24) / 4096) == 383 memory pages (4096 byte per page),
> >>   which means 9th memory order.
> >>
> >> This can lead to a memory allocation failures on the systems
> >> under a memory stress.
> >
> > Did you do performance tests while actually using these 65k streams
> > and with 256 (so it gets 2 pages)?
> >
> > This will introduce another deref on each access to an element, but
> > I'm not expecting any impact due to it.
> >
>
> No, I didn't do such tests. Could you please tell me what methodology
> do you usually use to measure performance properly?
>
> I'm trying to do measurements with iperf3 on unmodified kernel and get
> very strange results like this:
...

I've been trying to fight this fluctuation for some time now but
couldn't really fix it yet. One thing that usually helps (quite a lot)
is increasing the socket buffer sizes and/or using smaller messages,
so there is more cushion in the buffers.

What I have seen in my tests is that when it fluctuates like this, it is
because the socket buffers swing between 0 and full and don't get into a
steady state. I believe this is because the socket buffer size is used
to limit the amount of memory used by the socket, instead of being
the amount of payload that the buffer can hold. This causes some
discrepancy, especially because in SCTP we don't defrag the buffer (as
TCP does with its collapse operation), and the announced rwnd may
turn out to be a lie in the end, which triggers rx drops, then tx cwnd
reduction, and so on. The SCTP min_rto of 1s also doesn't help much in
this situation.

On netperf, you may use -S 200000,200000 -s 200000,200000. That should
help it.
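
For a test program that opens its own sockets, the same buffer sizes can
be requested with setsockopt(). This is a minimal sketch, assuming the
generic SOL_SOCKET options, which apply to any socket type; the helper
names set_sock_buf() and enlarge_buffers() are hypothetical, and 200000
is simply the value from the netperf options above.

```c
#include <assert.h>
#include <sys/socket.h>

/*
 * Request a buffer size and return what the kernel actually granted,
 * or -1 on error. On Linux the value read back is normally double the
 * request and is capped by net.core.rmem_max / net.core.wmem_max.
 */
static int set_sock_buf(int fd, int optname, int bytes)
{
	int granted = 0;
	socklen_t len = sizeof(granted);

	assert(optname == SO_SNDBUF || optname == SO_RCVBUF);
	if (setsockopt(fd, SOL_SOCKET, optname, &bytes, sizeof(bytes)) < 0)
		return -1;
	if (getsockopt(fd, SOL_SOCKET, optname, &granted, &len) < 0)
		return -1;
	return granted;
}

/* Apply the 200000-byte request to both send and receive buffers. */
static int enlarge_buffers(int fd)
{
	if (set_sock_buf(fd, SO_SNDBUF, 200000) < 0)
		return -1;
	if (set_sock_buf(fd, SO_RCVBUF, 200000) < 0)
		return -1;
	return 0;
}
```

Reading the value back matters because the kernel may silently grant
less than requested when the sysctl limits are lower.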

Cheers,
Marcelo

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH net-next 0/2] net/sctp: Avoid allocating high order memory with kmalloc()
  2018-04-26 22:28       ` Marcelo Ricardo Leitner
@ 2018-04-26 22:45         ` Oleg Babin
  -1 siblings, 0 replies; 64+ messages in thread
From: Oleg Babin @ 2018-04-26 22:45 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: netdev, linux-sctp, David S. Miller, Vlad Yasevich, Neil Horman,
	Xin Long, Andrey Ryabinin

On 04/27/2018 01:28 AM, Marcelo Ricardo Leitner wrote:
> On Fri, Apr 27, 2018 at 01:14:56AM +0300, Oleg Babin wrote:
>> Hi Marcelo,
>>
>> On 04/24/2018 12:33 AM, Marcelo Ricardo Leitner wrote:
>>> Hi,
>>>
>>> On Mon, Apr 23, 2018 at 09:41:04PM +0300, Oleg Babin wrote:
>>>> Each SCTP association can have up to 65535 input and output streams.
>>>> For each stream type an array of sctp_stream_in or sctp_stream_out
>>>> structures is allocated using kmalloc_array() function. This function
>>>> allocates physically contiguous memory regions, so this can lead
>>>> to allocation of memory regions of very high order, i.e.:
>>>>
>>>>   sizeof(struct sctp_stream_out) == 24,
>>>>   ((65535 * 24) / 4096) == 383 memory pages (4096 byte per page),
>>>>   which means 9th memory order.
>>>>
>>>> This can lead to a memory allocation failures on the systems
>>>> under a memory stress.
>>>
>>> Did you do performance tests while actually using these 65k streams
>>> and with 256 (so it gets 2 pages)?
>>>
>>> This will introduce another deref on each access to an element, but
>>> I'm not expecting any impact due to it.
>>>
>>
>> No, I didn't do such tests. Could you please tell me what methodology
>> do you usually use to measure performance properly?
>>
>> I'm trying to do measurements with iperf3 on unmodified kernel and get
>> very strange results like this:
> ...
> 
> I've been trying to fight this fluctuation for some time now but
> couldn't really fix it yet. One thing that usually helps (quite a lot)
> is increasing the socket buffer sizes and/or using smaller messages,
> so there is more cushion in the buffers.
> 
> What I have seen in my tests is that when it floats like this, is
> because socket buffers floats between 0 and full and don't get into a
> steady state. I believe this is because of socket buffer size is used
> for limiting the amount of memory used by the socket, instead of being
> the amount of payload that the buffer can hold. This causes some
> discrepancy, especially because in SCTP we don't defrag the buffer (as
> TCP does, it's the collapse operation), and the announced rwnd may
> turn up being a lie in the end, which triggers rx drops, then tx cwnd
> reduction, and so on. SCTP min_rto of 1s also doesn't help much on
> this situation.
> 
> On netperf, you may use -S 200000,200000 -s 200000,200000. That should
> help it.
>

Thank you very much! I'll try this and get back with results later.

-- 
Best regards,
Oleg

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH net-next 0/2] net/sctp: Avoid allocating high order memory with kmalloc()
  2018-04-26 22:28       ` Marcelo Ricardo Leitner
@ 2018-07-24 15:35         ` Konstantin Khorenko
  -1 siblings, 0 replies; 64+ messages in thread
From: Konstantin Khorenko @ 2018-07-24 15:35 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: oleg.babin, netdev, linux-sctp, David S. Miller, Vlad Yasevich,
	Neil Horman, Xin Long, Andrey Ryabinin

On 04/27/2018 01:28 AM, Marcelo Ricardo Leitner wrote:
 > On Fri, Apr 27, 2018 at 01:14:56AM +0300, Oleg Babin wrote:
 >> Hi Marcelo,
 >>
 >> On 04/24/2018 12:33 AM, Marcelo Ricardo Leitner wrote:
 >>> Hi,
 >>>
 >>> On Mon, Apr 23, 2018 at 09:41:04PM +0300, Oleg Babin wrote:
 >>>> Each SCTP association can have up to 65535 input and output streams.
 >>>> For each stream type an array of sctp_stream_in or sctp_stream_out
 >>>> structures is allocated using kmalloc_array() function. This function
 >>>> allocates physically contiguous memory regions, so this can lead
 >>>> to allocation of memory regions of very high order, i.e.:
 >>>>
 >>>>   sizeof(struct sctp_stream_out) == 24,
 >>>>   ((65535 * 24) / 4096) == 383 memory pages (4096 byte per page),
 >>>>   which means 9th memory order.
 >>>>
 >>>> This can lead to a memory allocation failures on the systems
 >>>> under a memory stress.
 >>>
 >>> Did you do performance tests while actually using these 65k streams
 >>> and with 256 (so it gets 2 pages)?
 >>>
 >>> This will introduce another deref on each access to an element, but
 >>> I'm not expecting any impact due to it.
 >>>
 >>
 >> No, I didn't do such tests. Could you please tell me what methodology
 >> do you usually use to measure performance properly?
 >>
 >> I'm trying to do measurements with iperf3 on unmodified kernel and get
 >> very strange results like this:
 > ...
 >
 > I've been trying to fight this fluctuation for some time now but
 > couldn't really fix it yet. One thing that usually helps (quite a lot)
 > is increasing the socket buffer sizes and/or using smaller messages,
 > so there is more cushion in the buffers.
 >
 > What I have seen in my tests is that when it fluctuates like this, it is
 > because the socket buffers swing between empty and full and never reach a
 > steady state. I believe this is because the socket buffer size limits
 > the amount of memory used by the socket, rather than the amount of
 > payload that the buffer can hold. This causes some discrepancy,
 > especially because SCTP doesn't defragment the buffer (as TCP does
 > with its collapse operation), so the announced rwnd may turn out to
 > be a lie in the end, which triggers rx drops, then tx cwnd reduction,
 > and so on. The SCTP min_rto of 1s doesn't help much in this situation
 > either.
 >
 > On netperf, you may use -S 200000,200000 -s 200000,200000. That should
 > help it.

Hi Marcelo,

It would be a pity to abandon Oleg's attempt to avoid high-order
allocations by using flex_array instead, so I ran the performance
measurements with the options you kindly suggested.

Here are the results:
   * Kernel: v4.18-rc6 - stock and with 2 patches from Oleg (earlier in this thread)
   * Node: CPU (8 cores): Intel(R) Xeon(R) CPU E31230 @ 3.20GHz
           RAM: 32 GB

   * netperf: taken from https://github.com/HewlettPackard/netperf.git,
	     compiled from sources with sctp support
   * netperf server and client are run on the same node

The script used to run tests:
# cat run_tests.sh
#!/bin/bash

for test in SCTP_STREAM SCTP_STREAM_MANY SCTP_RR SCTP_RR_MANY; do
   echo "TEST: $test";
   for i in `seq 1 3`; do
     echo "Iteration: $i";
     set -x
     netperf -t $test -H localhost -p 22222 -S 200000,200000 -s 200000,200000 -l 60;
     set +x
   done
done
================================================

Results (a bit reformatted to be more readable):
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

				v4.18-rc6	v4.18-rc6 + fixes
TEST: SCTP_STREAM
212992 212992 212992    60.11       4.11	4.11
212992 212992 212992    60.11       4.11	4.11
212992 212992 212992    60.11       4.11	4.11
TEST: SCTP_STREAM_MANY
212992 212992   4096    60.00    1769.26	2283.85
212992 212992   4096    60.00    2309.59	858.43
212992 212992   4096    60.00    5300.65	3351.24

===========
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

					v4.18-rc6	v4.18-rc6 + fixes
TEST: SCTP_RR
212992 212992 1        1       60.00    44832.10	45148.68
212992 212992 1        1       60.00    44835.72	44662.95
212992 212992 1        1       60.00    45199.21	45055.86
TEST: SCTP_RR_MANY
212992 212992 1        1       60.00      40.90		45.55
212992 212992 1        1       60.00      40.65		45.88
212992 212992 1        1       60.00      44.53		42.15

As we can see, the single-stream tests do not show any noticeable degradation.
The spread in the SCTP_*_MANY tests decreased significantly with the -S/-s options,
but it is still too large to judge whether the performance test passes or fails.
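
One rough way to put a number on that dispersion is the relative spread,
(max - min) / mean, of the three runs. The following is only a minimal
sketch, assuming that metric is good enough for a first look;
relative_spread() and the array names are made up for illustration and
are not part of netperf or of the patches. The samples are the
SCTP_STREAM and SCTP_STREAM_MANY throughput values from the table above.

```c
#include <stddef.h>

/*
 * Relative spread of a set of samples: (max - min) / mean.
 * Hypothetical helper for illustration only.
 */
double relative_spread(const double *v, size_t n)
{
	double min, max, sum = 0.0;
	size_t i;

	if (n == 0)
		return 0.0;

	min = max = v[0];
	for (i = 0; i < n; i++) {
		if (v[i] < min)
			min = v[i];
		if (v[i] > max)
			max = v[i];
		sum += v[i];
	}
	if (sum == 0.0)
		return 0.0;

	return (max - min) / (sum / (double)n);
}

/* Throughput samples (10^6 bits/sec) copied from the results table. */
const double stream_stock[]      = { 4.11, 4.11, 4.11 };
const double stream_many_stock[] = { 1769.26, 2309.59, 5300.65 };
const double stream_many_fixes[] = { 2283.85, 858.43, 3351.24 };
```

For both SCTP_STREAM_MANY columns the range between the best and the worst
run is larger than the mean of the runs, which is why three iterations are
not enough to tell the two kernels apart.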

Can you please advise anything else to try to reduce the dispersion?
Or can we consider the values fine, so that I just rework the patch according
to your comment about sctp_stream_in(asoc, sid)/sctp_stream_in_ptr(stream, sid)
and that's it?

Thank you in advance!

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH net-next 0/2] net/sctp: Avoid allocating high order memory with kmalloc()
  2018-07-24 15:35         ` Konstantin Khorenko
@ 2018-07-24 17:36           ` Marcelo Ricardo Leitner
  -1 siblings, 0 replies; 64+ messages in thread
From: Marcelo Ricardo Leitner @ 2018-07-24 17:36 UTC (permalink / raw)
  To: Konstantin Khorenko
  Cc: oleg.babin, netdev, linux-sctp, David S. Miller, Vlad Yasevich,
	Neil Horman, Xin Long, Andrey Ryabinin

On Tue, Jul 24, 2018 at 06:35:35PM +0300, Konstantin Khorenko wrote:
> Hi Marcelo,
> 
> pity to abandon Oleg's attempt to avoid high order allocations and use
> flex_array instead, so i tried to do the performance measurements with
> options you kindly suggested.

Nice, thanks!

...
> As we can see single stream tests do not show any noticeable degradation,
> and SCTP_*_MANY tests spread decreased significantly when -S/-s options are used,
> but still too big to consider the performance test pass or fail.
> 
> Can you please advise anything else to try - to decrease the dispersion rate -

In addition, you can try using a veth tunnel or reducing the lo MTU
down to 1500, and also use the SCTP test-specific option -m 1452 (it
needs to come after the "--"). These will alleviate the issues with
cwnd handling that happen on loopback due to the big MTU, and minimize
the issues with rwnd/buffer size too.
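
The value 1452 is presumably the largest message that still fits into a
single packet at MTU 1500; the mail does not spell this out, so treat the
following as a sketch under that assumption. It also assumes IPv4 without
options and one DATA chunk per packet. The function and constant names
are hypothetical.

```c
/*
 * Header sizes in bytes, as defined by the protocols:
 * IPv4 header without options, SCTP common header, SCTP DATA chunk header.
 */
enum {
	IPV4_HDR_LEN        = 20,
	SCTP_COMMON_HDR_LEN = 12,
	SCTP_DATA_HDR_LEN   = 16,
};

/*
 * Largest user message that fits into one unfragmented DATA chunk
 * in a single IPv4 packet of the given MTU.
 */
int sctp_max_payload_ipv4(int mtu)
{
	return mtu - IPV4_HDR_LEN - SCTP_COMMON_HDR_LEN - SCTP_DATA_HDR_LEN;
}
```

With an MTU of 1500 this yields 1452, matching the -m value above.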

Even with -S, -s, -m and the lower MTU, it is usual to see some
fluctuation, but not that much.

> or can we just consider values are fine and i'm reworking the patch according
> to your comment about sctp_stream_in(asoc, sid)/sctp_stream_in_ptr(stream, sid)
> and that's it?

Ok, thanks. It seems so, yes.

  Marcelo

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH v2 0/2] net/sctp: Avoid allocating high order memory with kmalloc()
  2018-07-24 17:36           ` Marcelo Ricardo Leitner
@ 2018-08-03 16:21             ` Konstantin Khorenko
  -1 siblings, 0 replies; 64+ messages in thread
From: Konstantin Khorenko @ 2018-08-03 16:21 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: oleg.babin, netdev, linux-sctp, David S . Miller, Vlad Yasevich,
	Neil Horman, Xin Long, Andrey Ryabinin, Konstantin Khorenko

Each SCTP association can have up to 65535 input and output streams.
For each stream type, an array of sctp_stream_in or sctp_stream_out
structures is allocated using the kmalloc_array() function. This function
allocates physically contiguous memory regions, so this can lead
to the allocation of memory regions of very high order, e.g.:

  sizeof(struct sctp_stream_out) == 24,
  ((65535 * 24) / 4096) == 383 memory pages (4096 bytes per page),
  which means an order-9 allocation.

This can lead to memory allocation failures on systems
under memory stress.
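
The arithmetic above can be checked with a small userspace sketch. It is
not kernel code: pages_needed(), alloc_order() and SKETCH_PAGE_SIZE are
hypothetical names, and a 4096-byte page is assumed as in the text.
Rounding the page count up gives 384 rather than 383, which still needs
an order-9 block (512 pages).

```c
#define SKETCH_PAGE_SIZE 4096UL	/* page size assumed in the text */

/* Number of pages needed for nmemb elements of size bytes, rounded up. */
unsigned long pages_needed(unsigned long nmemb, unsigned long size)
{
	return (nmemb * size + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}

/* Smallest order such that (1 << order) pages cover the request. */
unsigned int alloc_order(unsigned long nmemb, unsigned long size)
{
	unsigned long pages = pages_needed(nmemb, size);
	unsigned int order = 0;

	while ((1UL << order) < pages)
		order++;

	return order;
}
```

For comparison, 256 streams of 24 bytes each need only 2 pages, i.e. an
order-1 allocation.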

We do not actually need these arrays to be physically contiguous.
A simple solution would be to use kvmalloc() instead of kmalloc(),
as kvmalloc() can allocate physically scattered pages if contiguous
pages are not available. The problem is that the allocation can
happen in softirq context with the GFP_ATOMIC flag set, and
kvmalloc() cannot be used in this scenario.

So the other possible solution is to use flexible arrays instead of
contiguous arrays of memory, so that the memory is allocated
on a per-page basis.
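
To illustrate the per-page idea, here is a minimal userspace sketch of a
two-level array: a table of pointers to page-sized parts, each allocated
separately. It is not the kernel's flex_array implementation or API; the
chunked_array_* names and PART_SIZE are hypothetical, and malloc-family
calls stand in for page allocations.

```c
#include <stdlib.h>

#define PART_SIZE 4096	/* one "page" per part; assumed page size */

/* Hypothetical two-level array: pointers to separately allocated parts. */
struct chunked_array {
	size_t elem_size;
	size_t nelems;
	size_t elems_per_part;
	size_t nparts;
	void **parts;
};

void chunked_array_free(struct chunked_array *ca)
{
	size_t i;

	if (!ca)
		return;
	for (i = 0; i < ca->nparts; i++)
		free(ca->parts[i]);
	free(ca->parts);
	free(ca);
}

struct chunked_array *chunked_array_alloc(size_t elem_size, size_t total)
{
	struct chunked_array *ca;
	size_t i;

	if (elem_size == 0 || elem_size > PART_SIZE)
		return NULL;

	ca = calloc(1, sizeof(*ca));
	if (!ca)
		return NULL;

	ca->elem_size = elem_size;
	ca->nelems = total;
	ca->elems_per_part = PART_SIZE / elem_size;
	ca->nparts = (total + ca->elems_per_part - 1) / ca->elems_per_part;
	ca->parts = calloc(ca->nparts ? ca->nparts : 1, sizeof(*ca->parts));
	if (!ca->parts) {
		free(ca);
		return NULL;
	}

	/* Every part is its own small allocation; parts need not be adjacent. */
	for (i = 0; i < ca->nparts; i++) {
		ca->parts[i] = calloc(1, PART_SIZE);
		if (!ca->parts[i]) {
			chunked_array_free(ca);
			return NULL;
		}
	}
	return ca;
}

/* Element lookup costs one extra dereference compared to a flat array. */
void *chunked_array_get(const struct chunked_array *ca, size_t idx)
{
	if (idx >= ca->nelems)
		return NULL;

	return (char *)ca->parts[idx / ca->elems_per_part] +
	       (idx % ca->elems_per_part) * ca->elem_size;
}
```

With 24-byte elements, 170 of them fit into one 4096-byte part, so 65535
elements take 386 parts, and no single allocation exceeds one page.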

This patchset replaces kmalloc_array() with flex_array usage.
It consists of two parts:

  * The first patch is preparatory: it mechanically wraps all direct
    accesses to the asoc->stream.out[] and asoc->stream.in[] arrays
    with the SCTP_SO() and SCTP_SI() wrappers, so that direct
    array access can later be easily changed to access to a
    flex_array (or any other possible alternative).
  * The second patch replaces kmalloc_array() with flex_array usage.

Oleg Babin (2):
  net/sctp: Make wrappers for accessing in/out streams
  net/sctp: Replace in/out stream arrays with flex_array

 include/net/sctp/structs.h   |  31 ++++----
 net/sctp/chunk.c             |   6 +-
 net/sctp/outqueue.c          |  11 +--
 net/sctp/socket.c            |   4 +-
 net/sctp/stream.c            | 165 +++++++++++++++++++++++++++++--------------
 net/sctp/stream_interleave.c |  20 +++---
 net/sctp/stream_sched.c      |  13 ++--
 net/sctp/stream_sched_prio.c |  22 +++---
 net/sctp/stream_sched_rr.c   |   8 +--
 9 files changed, 175 insertions(+), 105 deletions(-)

v2 changes:
 sctp_stream_in() users are updated to provide stream as an argument,
 sctp_stream_{in,out}_ptr() are now just sctp_stream_{in,out}().

Performance results:
====================
  * Kernel: v4.18-rc6 - stock and with 2 patches from Oleg (earlier in this thread)
  * Node: CPU (8 cores): Intel(R) Xeon(R) CPU E31230 @ 3.20GHz
          RAM: 32 GB

  * netperf: taken from https://github.com/HewlettPackard/netperf.git,
	     compiled from sources with sctp support
  * netperf server and client are run on the same node
  * ip link set lo mtu 1500

The script used to run tests:
 # cat run_tests.sh
 #!/bin/bash

for test in SCTP_STREAM SCTP_STREAM_MANY SCTP_RR SCTP_RR_MANY; do
  echo "TEST: $test";
  for i in `seq 1 3`; do
    echo "Iteration: $i";
    set -x
    netperf -t $test -H localhost -p 22222 -S 200000,200000 -s 200000,200000 \
            -l 60 -- -m 1452;
    set +x
  done
done
================================================

Results (a bit reformatted to be more readable):
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

				v4.18-rc7	v4.18-rc7 + fixes
TEST: SCTP_STREAM
212992 212992   1452    60.21	1125.52		1247.04
212992 212992   1452    60.20	1376.38		1149.95
212992 212992   1452    60.20	1131.40		1163.85
TEST: SCTP_STREAM_MANY
212992 212992   1452    60.00	1111.00		1310.05
212992 212992   1452    60.00	1188.55		1130.50
212992 212992   1452    60.00	1108.06		1162.50

===========
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

					v4.18-rc7	v4.18-rc7 + fixes
TEST: SCTP_RR
212992 212992 1        1       60.00	45486.98	46089.43
212992 212992 1        1       60.00	45584.18	45994.21
212992 212992 1        1       60.00	45703.86	45720.84
TEST: SCTP_RR_MANY
212992 212992 1        1       60.00	40.75		40.77
212992 212992 1        1       60.00	40.58		40.08
212992 212992 1        1       60.00	39.98		39.97

-- 
2.15.1

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH v2 1/2] net/sctp: Make wrappers for accessing in/out streams
  2018-08-03 16:21             ` Konstantin Khorenko
@ 2018-08-03 16:21               ` Konstantin Khorenko
  -1 siblings, 0 replies; 64+ messages in thread
From: Konstantin Khorenko @ 2018-08-03 16:21 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: oleg.babin, netdev, linux-sctp, David S . Miller, Vlad Yasevich,
	Neil Horman, Xin Long, Andrey Ryabinin, Konstantin Khorenko

This patch introduces wrappers for accessing in/out streams indirectly.
This will make it possible to replace the physically contiguous memory
arrays of streams with flexible arrays (or any other appropriate
mechanism) which allocate memory on a per-page basis.

Signed-off-by: Oleg Babin <obabin@virtuozzo.com>
Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>

---
v2 changes:
 sctp_stream_in() users are updated to provide stream as an argument,
 sctp_stream_{in,out}_ptr() are now just sctp_stream_{in,out}().
---
 include/net/sctp/structs.h   |  30 +++++++-----
 net/sctp/chunk.c             |   6 ++-
 net/sctp/outqueue.c          |  11 +++--
 net/sctp/socket.c            |   4 +-
 net/sctp/stream.c            | 107 +++++++++++++++++++++++++------------------
 net/sctp/stream_interleave.c |  20 ++++----
 net/sctp/stream_sched.c      |  13 +++---
 net/sctp/stream_sched_prio.c |  22 ++++-----
 net/sctp/stream_sched_rr.c   |   8 ++--
 9 files changed, 124 insertions(+), 97 deletions(-)

diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index dbe1b911a24d..dc48c8e2b293 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -394,37 +394,35 @@ void sctp_stream_update(struct sctp_stream *stream, struct sctp_stream *new);
 
 /* What is the current SSN number for this stream? */
 #define sctp_ssn_peek(stream, type, sid) \
-	((stream)->type[sid].ssn)
+	(sctp_stream_##type((stream), (sid))->ssn)
 
 /* Return the next SSN number for this stream.	*/
 #define sctp_ssn_next(stream, type, sid) \
-	((stream)->type[sid].ssn++)
+	(sctp_stream_##type((stream), (sid))->ssn++)
 
 /* Skip over this ssn and all below. */
 #define sctp_ssn_skip(stream, type, sid, ssn) \
-	((stream)->type[sid].ssn = ssn + 1)
+	(sctp_stream_##type((stream), (sid))->ssn = ssn + 1)
 
 /* What is the current MID number for this stream? */
 #define sctp_mid_peek(stream, type, sid) \
-	((stream)->type[sid].mid)
+	(sctp_stream_##type((stream), (sid))->mid)
 
 /* Return the next MID number for this stream.  */
 #define sctp_mid_next(stream, type, sid) \
-	((stream)->type[sid].mid++)
+	(sctp_stream_##type((stream), (sid))->mid++)
 
 /* Skip over this mid and all below. */
 #define sctp_mid_skip(stream, type, sid, mid) \
-	((stream)->type[sid].mid = mid + 1)
-
-#define sctp_stream_in(asoc, sid) (&(asoc)->stream.in[sid])
+	(sctp_stream_##type((stream), (sid))->mid = mid + 1)
 
 /* What is the current MID_uo number for this stream? */
 #define sctp_mid_uo_peek(stream, type, sid) \
-	((stream)->type[sid].mid_uo)
+	(sctp_stream_##type((stream), (sid))->mid_uo)
 
 /* Return the next MID_uo number for this stream.  */
 #define sctp_mid_uo_next(stream, type, sid) \
-	((stream)->type[sid].mid_uo++)
+	(sctp_stream_##type((stream), (sid))->mid_uo++)
 
 /*
  * Pointers to address related SCTP functions.
@@ -1433,8 +1431,8 @@ struct sctp_stream_in {
 };
 
 struct sctp_stream {
-	struct sctp_stream_out *out;
-	struct sctp_stream_in *in;
+	struct flex_array *out;
+	struct flex_array *in;
 	__u16 outcnt;
 	__u16 incnt;
 	/* Current stream being sent, if any */
@@ -1456,6 +1454,14 @@ struct sctp_stream {
 	struct sctp_stream_interleave *si;
 };
 
+struct sctp_stream_out *sctp_stream_out(const struct sctp_stream *stream,
+					__u16 sid);
+struct sctp_stream_in *sctp_stream_in(const struct sctp_stream *stream,
+				      __u16 sid);
+
+#define SCTP_SO(s, i) sctp_stream_out((s), (i))
+#define SCTP_SI(s, i) sctp_stream_in((s), (i))
+
 #define SCTP_STREAM_CLOSED		0x00
 #define SCTP_STREAM_OPEN		0x01
 
diff --git a/net/sctp/chunk.c b/net/sctp/chunk.c
index bfb9f812e2ef..ce8087846f05 100644
--- a/net/sctp/chunk.c
+++ b/net/sctp/chunk.c
@@ -325,7 +325,8 @@ int sctp_chunk_abandoned(struct sctp_chunk *chunk)
 	if (SCTP_PR_TTL_ENABLED(chunk->sinfo.sinfo_flags) &&
 	    time_after(jiffies, chunk->msg->expires_at)) {
 		struct sctp_stream_out *streamout =
-			&chunk->asoc->stream.out[chunk->sinfo.sinfo_stream];
+			SCTP_SO(&chunk->asoc->stream,
+				chunk->sinfo.sinfo_stream);
 
 		if (chunk->sent_count) {
 			chunk->asoc->abandoned_sent[SCTP_PR_INDEX(TTL)]++;
@@ -339,7 +340,8 @@ int sctp_chunk_abandoned(struct sctp_chunk *chunk)
 	} else if (SCTP_PR_RTX_ENABLED(chunk->sinfo.sinfo_flags) &&
 		   chunk->sent_count > chunk->sinfo.sinfo_timetolive) {
 		struct sctp_stream_out *streamout =
-			&chunk->asoc->stream.out[chunk->sinfo.sinfo_stream];
+			SCTP_SO(&chunk->asoc->stream,
+				chunk->sinfo.sinfo_stream);
 
 		chunk->asoc->abandoned_sent[SCTP_PR_INDEX(RTX)]++;
 		streamout->ext->abandoned_sent[SCTP_PR_INDEX(RTX)]++;
diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
index d68aa33485a9..d74d00b29942 100644
--- a/net/sctp/outqueue.c
+++ b/net/sctp/outqueue.c
@@ -80,7 +80,7 @@ static inline void sctp_outq_head_data(struct sctp_outq *q,
 	q->out_qlen += ch->skb->len;
 
 	stream = sctp_chunk_stream_no(ch);
-	oute = q->asoc->stream.out[stream].ext;
+	oute = SCTP_SO(&q->asoc->stream, stream)->ext;
 	list_add(&ch->stream_list, &oute->outq);
 }
 
@@ -101,7 +101,7 @@ static inline void sctp_outq_tail_data(struct sctp_outq *q,
 	q->out_qlen += ch->skb->len;
 
 	stream = sctp_chunk_stream_no(ch);
-	oute = q->asoc->stream.out[stream].ext;
+	oute = SCTP_SO(&q->asoc->stream, stream)->ext;
 	list_add_tail(&ch->stream_list, &oute->outq);
 }
 
@@ -372,7 +372,7 @@ static int sctp_prsctp_prune_sent(struct sctp_association *asoc,
 		sctp_insert_list(&asoc->outqueue.abandoned,
 				 &chk->transmitted_list);
 
-		streamout = &asoc->stream.out[chk->sinfo.sinfo_stream];
+		streamout = SCTP_SO(&asoc->stream, chk->sinfo.sinfo_stream);
 		asoc->sent_cnt_removable--;
 		asoc->abandoned_sent[SCTP_PR_INDEX(PRIO)]++;
 		streamout->ext->abandoned_sent[SCTP_PR_INDEX(PRIO)]++;
@@ -416,7 +416,7 @@ static int sctp_prsctp_prune_unsent(struct sctp_association *asoc,
 		asoc->abandoned_unsent[SCTP_PR_INDEX(PRIO)]++;
 		if (chk->sinfo.sinfo_stream < asoc->stream.outcnt) {
 			struct sctp_stream_out *streamout =
-				&asoc->stream.out[chk->sinfo.sinfo_stream];
+				SCTP_SO(&asoc->stream, chk->sinfo.sinfo_stream);
 
 			streamout->ext->abandoned_unsent[SCTP_PR_INDEX(PRIO)]++;
 		}
@@ -1082,6 +1082,7 @@ static void sctp_outq_flush_data(struct sctp_flush_ctx *ctx,
 	/* Finally, transmit new packets.  */
 	while ((chunk = sctp_outq_dequeue_data(ctx->q)) != NULL) {
 		__u32 sid = ntohs(chunk->subh.data_hdr->stream);
+		__u8 stream_state = SCTP_SO(&ctx->asoc->stream, sid)->state;
 
 		/* Has this chunk expired? */
 		if (sctp_chunk_abandoned(chunk)) {
@@ -1091,7 +1092,7 @@ static void sctp_outq_flush_data(struct sctp_flush_ctx *ctx,
 			continue;
 		}
 
-		if (ctx->asoc->stream.out[sid].state == SCTP_STREAM_CLOSED) {
+		if (stream_state == SCTP_STREAM_CLOSED) {
 			sctp_outq_head_data(ctx->q, chunk);
 			break;
 		}
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index ce620e878538..4582ab25bc4e 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -1905,7 +1905,7 @@ static int sctp_sendmsg_to_asoc(struct sctp_association *asoc,
 		goto err;
 	}
 
-	if (unlikely(!asoc->stream.out[sinfo->sinfo_stream].ext)) {
+	if (unlikely(!SCTP_SO(&asoc->stream, sinfo->sinfo_stream)->ext)) {
 		err = sctp_stream_init_ext(&asoc->stream, sinfo->sinfo_stream);
 		if (err)
 			goto err;
@@ -6958,7 +6958,7 @@ static int sctp_getsockopt_pr_streamstatus(struct sock *sk, int len,
 	if (!asoc || params.sprstat_sid >= asoc->stream.outcnt)
 		goto out;
 
-	streamoute = asoc->stream.out[params.sprstat_sid].ext;
+	streamoute = SCTP_SO(&asoc->stream, params.sprstat_sid)->ext;
 	if (!streamoute) {
 		/* Not allocated yet, means all stats are 0 */
 		params.sprstat_abandoned_unsent = 0;
diff --git a/net/sctp/stream.c b/net/sctp/stream.c
index f1f1d1b232ba..56fadeec7cba 100644
--- a/net/sctp/stream.c
+++ b/net/sctp/stream.c
@@ -37,6 +37,18 @@
 #include <net/sctp/sm.h>
 #include <net/sctp/stream_sched.h>
 
+struct sctp_stream_out *sctp_stream_out(const struct sctp_stream *stream,
+					__u16 sid)
+{
+	return ((struct sctp_stream_out *)(stream->out)) + sid;
+}
+
+struct sctp_stream_in *sctp_stream_in(const struct sctp_stream *stream,
+				      __u16 sid)
+{
+	return ((struct sctp_stream_in *)(stream->in)) + sid;
+}
+
 /* Migrates chunks from stream queues to new stream queues if needed,
  * but not across associations. Also, removes those chunks to streams
  * higher than the new max.
@@ -78,34 +90,35 @@ static void sctp_stream_outq_migrate(struct sctp_stream *stream,
 		 * sctp_stream_update will swap ->out pointers.
 		 */
 		for (i = 0; i < outcnt; i++) {
-			kfree(new->out[i].ext);
-			new->out[i].ext = stream->out[i].ext;
-			stream->out[i].ext = NULL;
+			kfree(SCTP_SO(new, i)->ext);
+			SCTP_SO(new, i)->ext = SCTP_SO(stream, i)->ext;
+			SCTP_SO(stream, i)->ext = NULL;
 		}
 	}
 
 	for (i = outcnt; i < stream->outcnt; i++)
-		kfree(stream->out[i].ext);
+		kfree(SCTP_SO(stream, i)->ext);
 }
 
 static int sctp_stream_alloc_out(struct sctp_stream *stream, __u16 outcnt,
 				 gfp_t gfp)
 {
-	struct sctp_stream_out *out;
+	struct flex_array *out;
+	size_t elem_size = sizeof(struct sctp_stream_out);
 
-	out = kmalloc_array(outcnt, sizeof(*out), gfp);
+	out = kmalloc_array(outcnt, elem_size, gfp);
 	if (!out)
 		return -ENOMEM;
 
 	if (stream->out) {
 		memcpy(out, stream->out, min(outcnt, stream->outcnt) *
-					 sizeof(*out));
+					 elem_size);
 		kfree(stream->out);
 	}
 
 	if (outcnt > stream->outcnt)
-		memset(out + stream->outcnt, 0,
-		       (outcnt - stream->outcnt) * sizeof(*out));
+		memset(((struct sctp_stream_out *)out) + stream->outcnt, 0,
+		       (outcnt - stream->outcnt) * elem_size);
 
 	stream->out = out;
 
@@ -115,22 +128,23 @@ static int sctp_stream_alloc_out(struct sctp_stream *stream, __u16 outcnt,
 static int sctp_stream_alloc_in(struct sctp_stream *stream, __u16 incnt,
 				gfp_t gfp)
 {
-	struct sctp_stream_in *in;
+	struct flex_array *in;
+	size_t elem_size = sizeof(struct sctp_stream_in);
 
-	in = kmalloc_array(incnt, sizeof(*stream->in), gfp);
+	in = kmalloc_array(incnt, elem_size, gfp);
 
 	if (!in)
 		return -ENOMEM;
 
 	if (stream->in) {
 		memcpy(in, stream->in, min(incnt, stream->incnt) *
-				       sizeof(*in));
+				       elem_size);
 		kfree(stream->in);
 	}
 
 	if (incnt > stream->incnt)
-		memset(in + stream->incnt, 0,
-		       (incnt - stream->incnt) * sizeof(*in));
+		memset(((struct sctp_stream_in *)in) + stream->incnt, 0,
+		       (incnt - stream->incnt) * elem_size);
 
 	stream->in = in;
 
@@ -162,7 +176,7 @@ int sctp_stream_init(struct sctp_stream *stream, __u16 outcnt, __u16 incnt,
 
 	stream->outcnt = outcnt;
 	for (i = 0; i < stream->outcnt; i++)
-		stream->out[i].state = SCTP_STREAM_OPEN;
+		SCTP_SO(stream, i)->state = SCTP_STREAM_OPEN;
 
 	sched->init(stream);
 
@@ -193,7 +207,7 @@ int sctp_stream_init_ext(struct sctp_stream *stream, __u16 sid)
 	soute = kzalloc(sizeof(*soute), GFP_KERNEL);
 	if (!soute)
 		return -ENOMEM;
-	stream->out[sid].ext = soute;
+	SCTP_SO(stream, sid)->ext = soute;
 
 	return sctp_sched_init_sid(stream, sid, GFP_KERNEL);
 }
@@ -205,7 +219,7 @@ void sctp_stream_free(struct sctp_stream *stream)
 
 	sched->free(stream);
 	for (i = 0; i < stream->outcnt; i++)
-		kfree(stream->out[i].ext);
+		kfree(SCTP_SO(stream, i)->ext);
 	kfree(stream->out);
 	kfree(stream->in);
 }
@@ -215,12 +229,12 @@ void sctp_stream_clear(struct sctp_stream *stream)
 	int i;
 
 	for (i = 0; i < stream->outcnt; i++) {
-		stream->out[i].mid = 0;
-		stream->out[i].mid_uo = 0;
+		SCTP_SO(stream, i)->mid = 0;
+		SCTP_SO(stream, i)->mid_uo = 0;
 	}
 
 	for (i = 0; i < stream->incnt; i++)
-		stream->in[i].mid = 0;
+		SCTP_SI(stream, i)->mid = 0;
 }
 
 void sctp_stream_update(struct sctp_stream *stream, struct sctp_stream *new)
@@ -273,8 +287,8 @@ static bool sctp_stream_outq_is_empty(struct sctp_stream *stream,
 	for (i = 0; i < str_nums; i++) {
 		__u16 sid = ntohs(str_list[i]);
 
-		if (stream->out[sid].ext &&
-		    !list_empty(&stream->out[sid].ext->outq))
+		if (SCTP_SO(stream, sid)->ext &&
+		    !list_empty(&SCTP_SO(stream, sid)->ext->outq))
 			return false;
 	}
 
@@ -361,11 +375,11 @@ int sctp_send_reset_streams(struct sctp_association *asoc,
 	if (out) {
 		if (str_nums)
 			for (i = 0; i < str_nums; i++)
-				stream->out[str_list[i]].state =
+				SCTP_SO(stream, str_list[i])->state =
 						       SCTP_STREAM_CLOSED;
 		else
 			for (i = 0; i < stream->outcnt; i++)
-				stream->out[i].state = SCTP_STREAM_CLOSED;
+				SCTP_SO(stream, i)->state = SCTP_STREAM_CLOSED;
 	}
 
 	asoc->strreset_chunk = chunk;
@@ -380,11 +394,11 @@ int sctp_send_reset_streams(struct sctp_association *asoc,
 
 		if (str_nums)
 			for (i = 0; i < str_nums; i++)
-				stream->out[str_list[i]].state =
+				SCTP_SO(stream, str_list[i])->state =
 						       SCTP_STREAM_OPEN;
 		else
 			for (i = 0; i < stream->outcnt; i++)
-				stream->out[i].state = SCTP_STREAM_OPEN;
+				SCTP_SO(stream, i)->state = SCTP_STREAM_OPEN;
 
 		goto out;
 	}
@@ -418,7 +432,7 @@ int sctp_send_reset_assoc(struct sctp_association *asoc)
 
 	/* Block further xmit of data until this request is completed */
 	for (i = 0; i < stream->outcnt; i++)
-		stream->out[i].state = SCTP_STREAM_CLOSED;
+		SCTP_SO(stream, i)->state = SCTP_STREAM_CLOSED;
 
 	asoc->strreset_chunk = chunk;
 	sctp_chunk_hold(asoc->strreset_chunk);
@@ -429,7 +443,7 @@ int sctp_send_reset_assoc(struct sctp_association *asoc)
 		asoc->strreset_chunk = NULL;
 
 		for (i = 0; i < stream->outcnt; i++)
-			stream->out[i].state = SCTP_STREAM_OPEN;
+			SCTP_SO(stream, i)->state = SCTP_STREAM_OPEN;
 
 		return retval;
 	}
@@ -609,10 +623,10 @@ struct sctp_chunk *sctp_process_strreset_outreq(
 		}
 
 		for (i = 0; i < nums; i++)
-			stream->in[ntohs(str_p[i])].mid = 0;
+			SCTP_SI(stream, ntohs(str_p[i]))->mid = 0;
 	} else {
 		for (i = 0; i < stream->incnt; i++)
-			stream->in[i].mid = 0;
+			SCTP_SI(stream, i)->mid = 0;
 	}
 
 	result = SCTP_STRRESET_PERFORMED;
@@ -683,11 +697,11 @@ struct sctp_chunk *sctp_process_strreset_inreq(
 
 	if (nums)
 		for (i = 0; i < nums; i++)
-			stream->out[ntohs(str_p[i])].state =
+			SCTP_SO(stream, ntohs(str_p[i]))->state =
 					       SCTP_STREAM_CLOSED;
 	else
 		for (i = 0; i < stream->outcnt; i++)
-			stream->out[i].state = SCTP_STREAM_CLOSED;
+			SCTP_SO(stream, i)->state = SCTP_STREAM_CLOSED;
 
 	asoc->strreset_chunk = chunk;
 	asoc->strreset_outstanding = 1;
@@ -786,11 +800,11 @@ struct sctp_chunk *sctp_process_strreset_tsnreq(
 	 *      incoming and outgoing streams.
 	 */
 	for (i = 0; i < stream->outcnt; i++) {
-		stream->out[i].mid = 0;
-		stream->out[i].mid_uo = 0;
+		SCTP_SO(stream, i)->mid = 0;
+		SCTP_SO(stream, i)->mid_uo = 0;
 	}
 	for (i = 0; i < stream->incnt; i++)
-		stream->in[i].mid = 0;
+		SCTP_SI(stream, i)->mid = 0;
 
 	result = SCTP_STRRESET_PERFORMED;
 
@@ -979,15 +993,18 @@ struct sctp_chunk *sctp_process_strreset_resp(
 		       sizeof(__u16);
 
 		if (result == SCTP_STRRESET_PERFORMED) {
+			struct sctp_stream_out *sout;
 			if (nums) {
 				for (i = 0; i < nums; i++) {
-					stream->out[ntohs(str_p[i])].mid = 0;
-					stream->out[ntohs(str_p[i])].mid_uo = 0;
+					sout = SCTP_SO(stream, ntohs(str_p[i]));
+					sout->mid = 0;
+					sout->mid_uo = 0;
 				}
 			} else {
 				for (i = 0; i < stream->outcnt; i++) {
-					stream->out[i].mid = 0;
-					stream->out[i].mid_uo = 0;
+					sout = SCTP_SO(stream, i);
+					sout->mid = 0;
+					sout->mid_uo = 0;
 				}
 			}
 
@@ -995,7 +1012,7 @@ struct sctp_chunk *sctp_process_strreset_resp(
 		}
 
 		for (i = 0; i < stream->outcnt; i++)
-			stream->out[i].state = SCTP_STREAM_OPEN;
+			SCTP_SO(stream, i)->state = SCTP_STREAM_OPEN;
 
 		*evp = sctp_ulpevent_make_stream_reset_event(asoc, flags,
 			nums, str_p, GFP_ATOMIC);
@@ -1050,15 +1067,15 @@ struct sctp_chunk *sctp_process_strreset_resp(
 			asoc->adv_peer_ack_point = asoc->ctsn_ack_point;
 
 			for (i = 0; i < stream->outcnt; i++) {
-				stream->out[i].mid = 0;
-				stream->out[i].mid_uo = 0;
+				SCTP_SO(stream, i)->mid = 0;
+				SCTP_SO(stream, i)->mid_uo = 0;
 			}
 			for (i = 0; i < stream->incnt; i++)
-				stream->in[i].mid = 0;
+				SCTP_SI(stream, i)->mid = 0;
 		}
 
 		for (i = 0; i < stream->outcnt; i++)
-			stream->out[i].state = SCTP_STREAM_OPEN;
+			SCTP_SO(stream, i)->state = SCTP_STREAM_OPEN;
 
 		*evp = sctp_ulpevent_make_assoc_reset_event(asoc, flags,
 			stsn, rtsn, GFP_ATOMIC);
@@ -1072,7 +1089,7 @@ struct sctp_chunk *sctp_process_strreset_resp(
 
 		if (result == SCTP_STRRESET_PERFORMED)
 			for (i = number; i < stream->outcnt; i++)
-				stream->out[i].state = SCTP_STREAM_OPEN;
+				SCTP_SO(stream, i)->state = SCTP_STREAM_OPEN;
 		else
 			stream->outcnt = number;
 
diff --git a/net/sctp/stream_interleave.c b/net/sctp/stream_interleave.c
index d3764c181299..0a78cdf86463 100644
--- a/net/sctp/stream_interleave.c
+++ b/net/sctp/stream_interleave.c
@@ -197,7 +197,7 @@ static struct sctp_ulpevent *sctp_intl_retrieve_partial(
 	__u32 next_fsn = 0;
 	int is_last = 0;
 
-	sin = sctp_stream_in(ulpq->asoc, event->stream);
+	sin = sctp_stream_in(&ulpq->asoc->stream, event->stream);
 
 	skb_queue_walk(&ulpq->reasm, pos) {
 		struct sctp_ulpevent *cevent = sctp_skb2event(pos);
@@ -278,7 +278,7 @@ static struct sctp_ulpevent *sctp_intl_retrieve_reassembled(
 	__u32 pd_len = 0;
 	__u32 mid = 0;
 
-	sin = sctp_stream_in(ulpq->asoc, event->stream);
+	sin = sctp_stream_in(&ulpq->asoc->stream, event->stream);
 
 	skb_queue_walk(&ulpq->reasm, pos) {
 		struct sctp_ulpevent *cevent = sctp_skb2event(pos);
@@ -368,7 +368,7 @@ static struct sctp_ulpevent *sctp_intl_reasm(struct sctp_ulpq *ulpq,
 
 	sctp_intl_store_reasm(ulpq, event);
 
-	sin = sctp_stream_in(ulpq->asoc, event->stream);
+	sin = sctp_stream_in(&ulpq->asoc->stream, event->stream);
 	if (sin->pd_mode && event->mid == sin->mid &&
 	    event->fsn == sin->fsn)
 		retval = sctp_intl_retrieve_partial(ulpq, event);
@@ -575,7 +575,7 @@ static struct sctp_ulpevent *sctp_intl_retrieve_partial_uo(
 	__u32 next_fsn = 0;
 	int is_last = 0;
 
-	sin = sctp_stream_in(ulpq->asoc, event->stream);
+	sin = sctp_stream_in(&ulpq->asoc->stream, event->stream);
 
 	skb_queue_walk(&ulpq->reasm_uo, pos) {
 		struct sctp_ulpevent *cevent = sctp_skb2event(pos);
@@ -659,7 +659,7 @@ static struct sctp_ulpevent *sctp_intl_retrieve_reassembled_uo(
 	__u32 pd_len = 0;
 	__u32 mid = 0;
 
-	sin = sctp_stream_in(ulpq->asoc, event->stream);
+	sin = sctp_stream_in(&ulpq->asoc->stream, event->stream);
 
 	skb_queue_walk(&ulpq->reasm_uo, pos) {
 		struct sctp_ulpevent *cevent = sctp_skb2event(pos);
@@ -750,7 +750,7 @@ static struct sctp_ulpevent *sctp_intl_reasm_uo(struct sctp_ulpq *ulpq,
 
 	sctp_intl_store_reasm_uo(ulpq, event);
 
-	sin = sctp_stream_in(ulpq->asoc, event->stream);
+	sin = sctp_stream_in(&ulpq->asoc->stream, event->stream);
 	if (sin->pd_mode_uo && event->mid == sin->mid_uo &&
 	    event->fsn == sin->fsn_uo)
 		retval = sctp_intl_retrieve_partial_uo(ulpq, event);
@@ -774,7 +774,7 @@ static struct sctp_ulpevent *sctp_intl_retrieve_first_uo(struct sctp_ulpq *ulpq)
 	skb_queue_walk(&ulpq->reasm_uo, pos) {
 		struct sctp_ulpevent *cevent = sctp_skb2event(pos);
 
-		csin = sctp_stream_in(ulpq->asoc, cevent->stream);
+		csin = sctp_stream_in(&ulpq->asoc->stream, cevent->stream);
 		if (csin->pd_mode_uo)
 			continue;
 
@@ -875,7 +875,7 @@ static struct sctp_ulpevent *sctp_intl_retrieve_first(struct sctp_ulpq *ulpq)
 	skb_queue_walk(&ulpq->reasm, pos) {
 		struct sctp_ulpevent *cevent = sctp_skb2event(pos);
 
-		csin = sctp_stream_in(ulpq->asoc, cevent->stream);
+		csin = sctp_stream_in(&ulpq->asoc->stream, cevent->stream);
 		if (csin->pd_mode)
 			continue;
 
@@ -1053,7 +1053,7 @@ static void sctp_intl_abort_pd(struct sctp_ulpq *ulpq, gfp_t gfp)
 	__u16 sid;
 
 	for (sid = 0; sid < stream->incnt; sid++) {
-		struct sctp_stream_in *sin = &stream->in[sid];
+		struct sctp_stream_in *sin = SCTP_SI(stream, sid);
 		__u32 mid;
 
 		if (sin->pd_mode_uo) {
@@ -1247,7 +1247,7 @@ static void sctp_handle_fwdtsn(struct sctp_ulpq *ulpq, struct sctp_chunk *chunk)
 static void sctp_intl_skip(struct sctp_ulpq *ulpq, __u16 sid, __u32 mid,
 			   __u8 flags)
 {
-	struct sctp_stream_in *sin = sctp_stream_in(ulpq->asoc, sid);
+	struct sctp_stream_in *sin = sctp_stream_in(&ulpq->asoc->stream, sid);
 	struct sctp_stream *stream  = &ulpq->asoc->stream;
 
 	if (flags & SCTP_FTSN_U_BIT) {
diff --git a/net/sctp/stream_sched.c b/net/sctp/stream_sched.c
index f5fcd425232a..a6c04a94b08f 100644
--- a/net/sctp/stream_sched.c
+++ b/net/sctp/stream_sched.c
@@ -161,7 +161,7 @@ int sctp_sched_set_sched(struct sctp_association *asoc,
 
 		/* Give the next scheduler a clean slate. */
 		for (i = 0; i < asoc->stream.outcnt; i++) {
-			void *p = asoc->stream.out[i].ext;
+			void *p = SCTP_SO(&asoc->stream, i)->ext;
 
 			if (!p)
 				continue;
@@ -175,7 +175,7 @@ int sctp_sched_set_sched(struct sctp_association *asoc,
 	asoc->outqueue.sched = n;
 	n->init(&asoc->stream);
 	for (i = 0; i < asoc->stream.outcnt; i++) {
-		if (!asoc->stream.out[i].ext)
+		if (!SCTP_SO(&asoc->stream, i)->ext)
 			continue;
 
 		ret = n->init_sid(&asoc->stream, i, GFP_KERNEL);
@@ -217,7 +217,7 @@ int sctp_sched_set_value(struct sctp_association *asoc, __u16 sid,
 	if (sid >= asoc->stream.outcnt)
 		return -EINVAL;
 
-	if (!asoc->stream.out[sid].ext) {
+	if (!SCTP_SO(&asoc->stream, sid)->ext) {
 		int ret;
 
 		ret = sctp_stream_init_ext(&asoc->stream, sid);
@@ -234,7 +234,7 @@ int sctp_sched_get_value(struct sctp_association *asoc, __u16 sid,
 	if (sid >= asoc->stream.outcnt)
 		return -EINVAL;
 
-	if (!asoc->stream.out[sid].ext)
+	if (!SCTP_SO(&asoc->stream, sid)->ext)
 		return 0;
 
 	return asoc->outqueue.sched->get(&asoc->stream, sid, value);
@@ -252,7 +252,7 @@ void sctp_sched_dequeue_done(struct sctp_outq *q, struct sctp_chunk *ch)
 		 * priority stream comes in.
 		 */
 		sid = sctp_chunk_stream_no(ch);
-		sout = &q->asoc->stream.out[sid];
+		sout = SCTP_SO(&q->asoc->stream, sid);
 		q->asoc->stream.out_curr = sout;
 		return;
 	}
@@ -272,8 +272,9 @@ void sctp_sched_dequeue_common(struct sctp_outq *q, struct sctp_chunk *ch)
 int sctp_sched_init_sid(struct sctp_stream *stream, __u16 sid, gfp_t gfp)
 {
 	struct sctp_sched_ops *sched = sctp_sched_ops_from_stream(stream);
+	struct sctp_stream_out_ext *ext = SCTP_SO(stream, sid)->ext;
 
-	INIT_LIST_HEAD(&stream->out[sid].ext->outq);
+	INIT_LIST_HEAD(&ext->outq);
 	return sched->init_sid(stream, sid, gfp);
 }
 
diff --git a/net/sctp/stream_sched_prio.c b/net/sctp/stream_sched_prio.c
index 7997d35dd0fd..2245083a98f2 100644
--- a/net/sctp/stream_sched_prio.c
+++ b/net/sctp/stream_sched_prio.c
@@ -75,10 +75,10 @@ static struct sctp_stream_priorities *sctp_sched_prio_get_head(
 
 	/* No luck. So we search on all streams now. */
 	for (i = 0; i < stream->outcnt; i++) {
-		if (!stream->out[i].ext)
+		if (!SCTP_SO(stream, i)->ext)
 			continue;
 
-		p = stream->out[i].ext->prio_head;
+		p = SCTP_SO(stream, i)->ext->prio_head;
 		if (!p)
 			/* Means all other streams won't be initialized
 			 * as well.
@@ -165,7 +165,7 @@ static void sctp_sched_prio_sched(struct sctp_stream *stream,
 static int sctp_sched_prio_set(struct sctp_stream *stream, __u16 sid,
 			       __u16 prio, gfp_t gfp)
 {
-	struct sctp_stream_out *sout = &stream->out[sid];
+	struct sctp_stream_out *sout = SCTP_SO(stream, sid);
 	struct sctp_stream_out_ext *soute = sout->ext;
 	struct sctp_stream_priorities *prio_head, *old;
 	bool reschedule = false;
@@ -186,7 +186,7 @@ static int sctp_sched_prio_set(struct sctp_stream *stream, __u16 sid,
 		return 0;
 
 	for (i = 0; i < stream->outcnt; i++) {
-		soute = stream->out[i].ext;
+		soute = SCTP_SO(stream, i)->ext;
 		if (soute && soute->prio_head == old)
 			/* It's still in use, nothing else to do here. */
 			return 0;
@@ -201,7 +201,7 @@ static int sctp_sched_prio_set(struct sctp_stream *stream, __u16 sid,
 static int sctp_sched_prio_get(struct sctp_stream *stream, __u16 sid,
 			       __u16 *value)
 {
-	*value = stream->out[sid].ext->prio_head->prio;
+	*value = SCTP_SO(stream, sid)->ext->prio_head->prio;
 	return 0;
 }
 
@@ -215,7 +215,7 @@ static int sctp_sched_prio_init(struct sctp_stream *stream)
 static int sctp_sched_prio_init_sid(struct sctp_stream *stream, __u16 sid,
 				    gfp_t gfp)
 {
-	INIT_LIST_HEAD(&stream->out[sid].ext->prio_list);
+	INIT_LIST_HEAD(&SCTP_SO(stream, sid)->ext->prio_list);
 	return sctp_sched_prio_set(stream, sid, 0, gfp);
 }
 
@@ -233,9 +233,9 @@ static void sctp_sched_prio_free(struct sctp_stream *stream)
 	 */
 	sctp_sched_prio_unsched_all(stream);
 	for (i = 0; i < stream->outcnt; i++) {
-		if (!stream->out[i].ext)
+		if (!SCTP_SO(stream, i)->ext)
 			continue;
-		prio = stream->out[i].ext->prio_head;
+		prio = SCTP_SO(stream, i)->ext->prio_head;
 		if (prio && list_empty(&prio->prio_sched))
 			list_add(&prio->prio_sched, &list);
 	}
@@ -255,7 +255,7 @@ static void sctp_sched_prio_enqueue(struct sctp_outq *q,
 	ch = list_first_entry(&msg->chunks, struct sctp_chunk, frag_list);
 	sid = sctp_chunk_stream_no(ch);
 	stream = &q->asoc->stream;
-	sctp_sched_prio_sched(stream, stream->out[sid].ext);
+	sctp_sched_prio_sched(stream, SCTP_SO(stream, sid)->ext);
 }
 
 static struct sctp_chunk *sctp_sched_prio_dequeue(struct sctp_outq *q)
@@ -297,7 +297,7 @@ static void sctp_sched_prio_dequeue_done(struct sctp_outq *q,
 	 * this priority.
 	 */
 	sid = sctp_chunk_stream_no(ch);
-	soute = q->asoc->stream.out[sid].ext;
+	soute = SCTP_SO(&q->asoc->stream, sid)->ext;
 	prio = soute->prio_head;
 
 	sctp_sched_prio_next_stream(prio);
@@ -317,7 +317,7 @@ static void sctp_sched_prio_sched_all(struct sctp_stream *stream)
 		__u16 sid;
 
 		sid = sctp_chunk_stream_no(ch);
-		sout = &stream->out[sid];
+		sout = SCTP_SO(stream, sid);
 		if (sout->ext)
 			sctp_sched_prio_sched(stream, sout->ext);
 	}
diff --git a/net/sctp/stream_sched_rr.c b/net/sctp/stream_sched_rr.c
index 1155692448f1..52ba743fa7a7 100644
--- a/net/sctp/stream_sched_rr.c
+++ b/net/sctp/stream_sched_rr.c
@@ -100,7 +100,7 @@ static int sctp_sched_rr_init(struct sctp_stream *stream)
 static int sctp_sched_rr_init_sid(struct sctp_stream *stream, __u16 sid,
 				  gfp_t gfp)
 {
-	INIT_LIST_HEAD(&stream->out[sid].ext->rr_list);
+	INIT_LIST_HEAD(&SCTP_SO(stream, sid)->ext->rr_list);
 
 	return 0;
 }
@@ -120,7 +120,7 @@ static void sctp_sched_rr_enqueue(struct sctp_outq *q,
 	ch = list_first_entry(&msg->chunks, struct sctp_chunk, frag_list);
 	sid = sctp_chunk_stream_no(ch);
 	stream = &q->asoc->stream;
-	sctp_sched_rr_sched(stream, stream->out[sid].ext);
+	sctp_sched_rr_sched(stream, SCTP_SO(stream, sid)->ext);
 }
 
 static struct sctp_chunk *sctp_sched_rr_dequeue(struct sctp_outq *q)
@@ -154,7 +154,7 @@ static void sctp_sched_rr_dequeue_done(struct sctp_outq *q,
 
 	/* Last chunk on that msg, move to the next stream */
 	sid = sctp_chunk_stream_no(ch);
-	soute = q->asoc->stream.out[sid].ext;
+	soute = SCTP_SO(&q->asoc->stream, sid)->ext;
 
 	sctp_sched_rr_next_stream(&q->asoc->stream);
 
@@ -173,7 +173,7 @@ static void sctp_sched_rr_sched_all(struct sctp_stream *stream)
 		__u16 sid;
 
 		sid = sctp_chunk_stream_no(ch);
-		soute = stream->out[sid].ext;
+		soute = SCTP_SO(stream, sid)->ext;
 		if (soute)
 			sctp_sched_rr_sched(stream, soute);
 	}
-- 
2.15.1


* [PATCH v2 1/2] net/sctp: Make wrappers for accessing in/out streams
@ 2018-08-03 16:21               ` Konstantin Khorenko
  0 siblings, 0 replies; 64+ messages in thread
From: Konstantin Khorenko @ 2018-08-03 16:21 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: oleg.babin, netdev, linux-sctp, David S . Miller, Vlad Yasevich,
	Neil Horman, Xin Long, Andrey Ryabinin, Konstantin Khorenko

This patch introduces wrappers for accessing in/out streams indirectly.
This makes it possible to replace the physically contiguous arrays
of streams with flexible arrays (or any other appropriate
mechanism) that allocate memory on a per-page basis.
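
For illustration only, here is a minimal userspace sketch of the wrapper
idea, not the kernel code itself. The demo_* names and the simplified
structures are hypothetical, and the bounds assert is an addition that
the patch does not have. The point is that only the accessor knows how
elements are stored, so the backing storage can later change without
touching callers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for struct sctp_stream_out. */
struct demo_stream_out {
	uint32_t mid;
	uint8_t state;
};

/* Callers see only an opaque pointer, not the element layout. */
struct demo_stream {
	void *out;
	uint16_t outcnt;
};

/* The single place that knows how elements are stored. */
static struct demo_stream_out *demo_stream_out(const struct demo_stream *stream,
					       uint16_t sid)
{
	assert(sid < stream->outcnt);	/* illustration only, not in the patch */
	return ((struct demo_stream_out *)stream->out) + sid;
}

#define DEMO_SO(s, i) demo_stream_out((s), (i))

/* Allocate zeroed storage for outcnt streams; returns 0 on success. */
static int demo_stream_init(struct demo_stream *stream, uint16_t outcnt)
{
	stream->out = calloc(outcnt, sizeof(struct demo_stream_out));
	if (!stream->out)
		return -1;
	stream->outcnt = outcnt;
	return 0;
}

/* Mark every stream open, going through the wrapper only. */
static void demo_stream_open_all(struct demo_stream *stream)
{
	int i;

	for (i = 0; i < stream->outcnt; i++)
		DEMO_SO(stream, i)->state = 1;
}
```

If the storage behind demo_stream.out were later switched to a per-page
scheme, only demo_stream_out() and demo_stream_init() would need to
change in this sketch.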

Signed-off-by: Oleg Babin <obabin@virtuozzo.com>
Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>

---
v2 changes:
 sctp_stream_in() users are updated to provide stream as an argument,
 sctp_stream_{in,out}_ptr() are now just sctp_stream_{in,out}().
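
 As a rough illustration of how the sctp_ssn_*()/sctp_mid_*() macros in
 this patch pick the accessor, here is a userspace sketch of the same
 token-pasting pattern. All demo_* names and the reduced structures are
 hypothetical; only the "prefix##type" shape is taken from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, reduced stand-ins for the in/out stream structures. */
struct demo_stream_out { uint16_t ssn; };
struct demo_stream_in  { uint16_t ssn; };

struct demo_stream {
	void *out;	/* opaque storage for outgoing streams */
	void *in;	/* opaque storage for incoming streams */
};

static struct demo_stream_out *demo_stream_out(const struct demo_stream *stream,
					       uint16_t sid)
{
	return ((struct demo_stream_out *)stream->out) + sid;
}

static struct demo_stream_in *demo_stream_in(const struct demo_stream *stream,
					     uint16_t sid)
{
	return ((struct demo_stream_in *)stream->in) + sid;
}

/* `type` is pasted onto the prefix, so it must be literally in or out. */
#define demo_ssn_peek(stream, type, sid) \
	(demo_stream_##type((stream), (sid))->ssn)

/* Post-increment: yields the current value, then advances it. */
#define demo_ssn_next(stream, type, sid) \
	(demo_stream_##type((stream), (sid))->ssn++)
```

 With this shape, demo_ssn_next(&s, out, 2) expands to a call to
 demo_stream_out(), and demo_ssn_peek(&s, in, 2) to demo_stream_in().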
---
 include/net/sctp/structs.h   |  30 +++++++-----
 net/sctp/chunk.c             |   6 ++-
 net/sctp/outqueue.c          |  11 +++--
 net/sctp/socket.c            |   4 +-
 net/sctp/stream.c            | 107 +++++++++++++++++++++++++------------------
 net/sctp/stream_interleave.c |  20 ++++----
 net/sctp/stream_sched.c      |  13 +++---
 net/sctp/stream_sched_prio.c |  22 ++++-----
 net/sctp/stream_sched_rr.c   |   8 ++--
 9 files changed, 124 insertions(+), 97 deletions(-)

diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index dbe1b911a24d..dc48c8e2b293 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -394,37 +394,35 @@ void sctp_stream_update(struct sctp_stream *stream, struct sctp_stream *new);
 
 /* What is the current SSN number for this stream? */
 #define sctp_ssn_peek(stream, type, sid) \
-	((stream)->type[sid].ssn)
+	(sctp_stream_##type((stream), (sid))->ssn)
 
 /* Return the next SSN number for this stream.	*/
 #define sctp_ssn_next(stream, type, sid) \
-	((stream)->type[sid].ssn++)
+	(sctp_stream_##type((stream), (sid))->ssn++)
 
 /* Skip over this ssn and all below. */
 #define sctp_ssn_skip(stream, type, sid, ssn) \
-	((stream)->type[sid].ssn = ssn + 1)
+	(sctp_stream_##type((stream), (sid))->ssn = ssn + 1)
 
 /* What is the current MID number for this stream? */
 #define sctp_mid_peek(stream, type, sid) \
-	((stream)->type[sid].mid)
+	(sctp_stream_##type((stream), (sid))->mid)
 
 /* Return the next MID number for this stream.  */
 #define sctp_mid_next(stream, type, sid) \
-	((stream)->type[sid].mid++)
+	(sctp_stream_##type((stream), (sid))->mid++)
 
 /* Skip over this mid and all below. */
 #define sctp_mid_skip(stream, type, sid, mid) \
-	((stream)->type[sid].mid = mid + 1)
-
-#define sctp_stream_in(asoc, sid) (&(asoc)->stream.in[sid])
+	(sctp_stream_##type((stream), (sid))->mid = mid + 1)
 
 /* What is the current MID_uo number for this stream? */
 #define sctp_mid_uo_peek(stream, type, sid) \
-	((stream)->type[sid].mid_uo)
+	(sctp_stream_##type((stream), (sid))->mid_uo)
 
 /* Return the next MID_uo number for this stream.  */
 #define sctp_mid_uo_next(stream, type, sid) \
-	((stream)->type[sid].mid_uo++)
+	(sctp_stream_##type((stream), (sid))->mid_uo++)
 
 /*
  * Pointers to address related SCTP functions.
@@ -1433,8 +1431,8 @@ struct sctp_stream_in {
 };
 
 struct sctp_stream {
-	struct sctp_stream_out *out;
-	struct sctp_stream_in *in;
+	struct flex_array *out;
+	struct flex_array *in;
 	__u16 outcnt;
 	__u16 incnt;
 	/* Current stream being sent, if any */
@@ -1456,6 +1454,14 @@ struct sctp_stream {
 	struct sctp_stream_interleave *si;
 };
 
+struct sctp_stream_out *sctp_stream_out(const struct sctp_stream *stream,
+					__u16 sid);
+struct sctp_stream_in *sctp_stream_in(const struct sctp_stream *stream,
+				      __u16 sid);
+
+#define SCTP_SO(s, i) sctp_stream_out((s), (i))
+#define SCTP_SI(s, i) sctp_stream_in((s), (i))
+
 #define SCTP_STREAM_CLOSED		0x00
 #define SCTP_STREAM_OPEN		0x01
 
diff --git a/net/sctp/chunk.c b/net/sctp/chunk.c
index bfb9f812e2ef..ce8087846f05 100644
--- a/net/sctp/chunk.c
+++ b/net/sctp/chunk.c
@@ -325,7 +325,8 @@ int sctp_chunk_abandoned(struct sctp_chunk *chunk)
 	if (SCTP_PR_TTL_ENABLED(chunk->sinfo.sinfo_flags) &&
 	    time_after(jiffies, chunk->msg->expires_at)) {
 		struct sctp_stream_out *streamout =
-			&chunk->asoc->stream.out[chunk->sinfo.sinfo_stream];
+			SCTP_SO(&chunk->asoc->stream,
+				chunk->sinfo.sinfo_stream);
 
 		if (chunk->sent_count) {
 			chunk->asoc->abandoned_sent[SCTP_PR_INDEX(TTL)]++;
@@ -339,7 +340,8 @@ int sctp_chunk_abandoned(struct sctp_chunk *chunk)
 	} else if (SCTP_PR_RTX_ENABLED(chunk->sinfo.sinfo_flags) &&
 		   chunk->sent_count > chunk->sinfo.sinfo_timetolive) {
 		struct sctp_stream_out *streamout =
-			&chunk->asoc->stream.out[chunk->sinfo.sinfo_stream];
+			SCTP_SO(&chunk->asoc->stream,
+				chunk->sinfo.sinfo_stream);
 
 		chunk->asoc->abandoned_sent[SCTP_PR_INDEX(RTX)]++;
 		streamout->ext->abandoned_sent[SCTP_PR_INDEX(RTX)]++;
diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
index d68aa33485a9..d74d00b29942 100644
--- a/net/sctp/outqueue.c
+++ b/net/sctp/outqueue.c
@@ -80,7 +80,7 @@ static inline void sctp_outq_head_data(struct sctp_outq *q,
 	q->out_qlen += ch->skb->len;
 
 	stream = sctp_chunk_stream_no(ch);
-	oute = q->asoc->stream.out[stream].ext;
+	oute = SCTP_SO(&q->asoc->stream, stream)->ext;
 	list_add(&ch->stream_list, &oute->outq);
 }
 
@@ -101,7 +101,7 @@ static inline void sctp_outq_tail_data(struct sctp_outq *q,
 	q->out_qlen += ch->skb->len;
 
 	stream = sctp_chunk_stream_no(ch);
-	oute = q->asoc->stream.out[stream].ext;
+	oute = SCTP_SO(&q->asoc->stream, stream)->ext;
 	list_add_tail(&ch->stream_list, &oute->outq);
 }
 
@@ -372,7 +372,7 @@ static int sctp_prsctp_prune_sent(struct sctp_association *asoc,
 		sctp_insert_list(&asoc->outqueue.abandoned,
 				 &chk->transmitted_list);
 
-		streamout = &asoc->stream.out[chk->sinfo.sinfo_stream];
+		streamout = SCTP_SO(&asoc->stream, chk->sinfo.sinfo_stream);
 		asoc->sent_cnt_removable--;
 		asoc->abandoned_sent[SCTP_PR_INDEX(PRIO)]++;
 		streamout->ext->abandoned_sent[SCTP_PR_INDEX(PRIO)]++;
@@ -416,7 +416,7 @@ static int sctp_prsctp_prune_unsent(struct sctp_association *asoc,
 		asoc->abandoned_unsent[SCTP_PR_INDEX(PRIO)]++;
 		if (chk->sinfo.sinfo_stream < asoc->stream.outcnt) {
 			struct sctp_stream_out *streamout =
-				&asoc->stream.out[chk->sinfo.sinfo_stream];
+				SCTP_SO(&asoc->stream, chk->sinfo.sinfo_stream);
 
 			streamout->ext->abandoned_unsent[SCTP_PR_INDEX(PRIO)]++;
 		}
@@ -1082,6 +1082,7 @@ static void sctp_outq_flush_data(struct sctp_flush_ctx *ctx,
 	/* Finally, transmit new packets.  */
 	while ((chunk = sctp_outq_dequeue_data(ctx->q)) != NULL) {
 		__u32 sid = ntohs(chunk->subh.data_hdr->stream);
+		__u8 stream_state = SCTP_SO(&ctx->asoc->stream, sid)->state;
 
 		/* Has this chunk expired? */
 		if (sctp_chunk_abandoned(chunk)) {
@@ -1091,7 +1092,7 @@ static void sctp_outq_flush_data(struct sctp_flush_ctx *ctx,
 			continue;
 		}
 
-		if (ctx->asoc->stream.out[sid].state == SCTP_STREAM_CLOSED) {
+		if (stream_state == SCTP_STREAM_CLOSED) {
 			sctp_outq_head_data(ctx->q, chunk);
 			break;
 		}
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index ce620e878538..4582ab25bc4e 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -1905,7 +1905,7 @@ static int sctp_sendmsg_to_asoc(struct sctp_association *asoc,
 		goto err;
 	}
 
-	if (unlikely(!asoc->stream.out[sinfo->sinfo_stream].ext)) {
+	if (unlikely(!SCTP_SO(&asoc->stream, sinfo->sinfo_stream)->ext)) {
 		err = sctp_stream_init_ext(&asoc->stream, sinfo->sinfo_stream);
 		if (err)
 			goto err;
@@ -6958,7 +6958,7 @@ static int sctp_getsockopt_pr_streamstatus(struct sock *sk, int len,
 	if (!asoc || params.sprstat_sid >= asoc->stream.outcnt)
 		goto out;
 
-	streamoute = asoc->stream.out[params.sprstat_sid].ext;
+	streamoute = SCTP_SO(&asoc->stream, params.sprstat_sid)->ext;
 	if (!streamoute) {
 		/* Not allocated yet, means all stats are 0 */
 		params.sprstat_abandoned_unsent = 0;
diff --git a/net/sctp/stream.c b/net/sctp/stream.c
index f1f1d1b232ba..56fadeec7cba 100644
--- a/net/sctp/stream.c
+++ b/net/sctp/stream.c
@@ -37,6 +37,18 @@
 #include <net/sctp/sm.h>
 #include <net/sctp/stream_sched.h>
 
+struct sctp_stream_out *sctp_stream_out(const struct sctp_stream *stream,
+					__u16 sid)
+{
+	return ((struct sctp_stream_out *)(stream->out)) + sid;
+}
+
+struct sctp_stream_in *sctp_stream_in(const struct sctp_stream *stream,
+				      __u16 sid)
+{
+	return ((struct sctp_stream_in *)(stream->in)) + sid;
+}
+
 /* Migrates chunks from stream queues to new stream queues if needed,
  * but not across associations. Also, removes those chunks to streams
  * higher than the new max.
@@ -78,34 +90,35 @@ static void sctp_stream_outq_migrate(struct sctp_stream *stream,
 		 * sctp_stream_update will swap ->out pointers.
 		 */
 		for (i = 0; i < outcnt; i++) {
-			kfree(new->out[i].ext);
-			new->out[i].ext = stream->out[i].ext;
-			stream->out[i].ext = NULL;
+			kfree(SCTP_SO(new, i)->ext);
+			SCTP_SO(new, i)->ext = SCTP_SO(stream, i)->ext;
+			SCTP_SO(stream, i)->ext = NULL;
 		}
 	}
 
 	for (i = outcnt; i < stream->outcnt; i++)
-		kfree(stream->out[i].ext);
+		kfree(SCTP_SO(stream, i)->ext);
 }
 
 static int sctp_stream_alloc_out(struct sctp_stream *stream, __u16 outcnt,
 				 gfp_t gfp)
 {
-	struct sctp_stream_out *out;
+	struct flex_array *out;
+	size_t elem_size = sizeof(struct sctp_stream_out);
 
-	out = kmalloc_array(outcnt, sizeof(*out), gfp);
+	out = kmalloc_array(outcnt, elem_size, gfp);
 	if (!out)
 		return -ENOMEM;
 
 	if (stream->out) {
 		memcpy(out, stream->out, min(outcnt, stream->outcnt) *
-					 sizeof(*out));
+					 elem_size);
 		kfree(stream->out);
 	}
 
 	if (outcnt > stream->outcnt)
-		memset(out + stream->outcnt, 0,
-		       (outcnt - stream->outcnt) * sizeof(*out));
+		memset(((struct sctp_stream_out *)out) + stream->outcnt, 0,
+		       (outcnt - stream->outcnt) * elem_size);
 
 	stream->out = out;
 
@@ -115,22 +128,23 @@ static int sctp_stream_alloc_out(struct sctp_stream *stream, __u16 outcnt,
 static int sctp_stream_alloc_in(struct sctp_stream *stream, __u16 incnt,
 				gfp_t gfp)
 {
-	struct sctp_stream_in *in;
+	struct flex_array *in;
+	size_t elem_size = sizeof(struct sctp_stream_in);
 
-	in = kmalloc_array(incnt, sizeof(*stream->in), gfp);
+	in = kmalloc_array(incnt, elem_size, gfp);
 
 	if (!in)
 		return -ENOMEM;
 
 	if (stream->in) {
 		memcpy(in, stream->in, min(incnt, stream->incnt) *
-				       sizeof(*in));
+				       elem_size);
 		kfree(stream->in);
 	}
 
 	if (incnt > stream->incnt)
-		memset(in + stream->incnt, 0,
-		       (incnt - stream->incnt) * sizeof(*in));
+		memset(((struct sctp_stream_in *)in) + stream->incnt, 0,
+		       (incnt - stream->incnt) * elem_size);
 
 	stream->in = in;
 
@@ -162,7 +176,7 @@ int sctp_stream_init(struct sctp_stream *stream, __u16 outcnt, __u16 incnt,
 
 	stream->outcnt = outcnt;
 	for (i = 0; i < stream->outcnt; i++)
-		stream->out[i].state = SCTP_STREAM_OPEN;
+		SCTP_SO(stream, i)->state = SCTP_STREAM_OPEN;
 
 	sched->init(stream);
 
@@ -193,7 +207,7 @@ int sctp_stream_init_ext(struct sctp_stream *stream, __u16 sid)
 	soute = kzalloc(sizeof(*soute), GFP_KERNEL);
 	if (!soute)
 		return -ENOMEM;
-	stream->out[sid].ext = soute;
+	SCTP_SO(stream, sid)->ext = soute;
 
 	return sctp_sched_init_sid(stream, sid, GFP_KERNEL);
 }
@@ -205,7 +219,7 @@ void sctp_stream_free(struct sctp_stream *stream)
 
 	sched->free(stream);
 	for (i = 0; i < stream->outcnt; i++)
-		kfree(stream->out[i].ext);
+		kfree(SCTP_SO(stream, i)->ext);
 	kfree(stream->out);
 	kfree(stream->in);
 }
@@ -215,12 +229,12 @@ void sctp_stream_clear(struct sctp_stream *stream)
 	int i;
 
 	for (i = 0; i < stream->outcnt; i++) {
-		stream->out[i].mid = 0;
-		stream->out[i].mid_uo = 0;
+		SCTP_SO(stream, i)->mid = 0;
+		SCTP_SO(stream, i)->mid_uo = 0;
 	}
 
 	for (i = 0; i < stream->incnt; i++)
-		stream->in[i].mid = 0;
+		SCTP_SI(stream, i)->mid = 0;
 }
 
 void sctp_stream_update(struct sctp_stream *stream, struct sctp_stream *new)
@@ -273,8 +287,8 @@ static bool sctp_stream_outq_is_empty(struct sctp_stream *stream,
 	for (i = 0; i < str_nums; i++) {
 		__u16 sid = ntohs(str_list[i]);
 
-		if (stream->out[sid].ext &&
-		    !list_empty(&stream->out[sid].ext->outq))
+		if (SCTP_SO(stream, sid)->ext &&
+		    !list_empty(&SCTP_SO(stream, sid)->ext->outq))
 			return false;
 	}
 
@@ -361,11 +375,11 @@ int sctp_send_reset_streams(struct sctp_association *asoc,
 	if (out) {
 		if (str_nums)
 			for (i = 0; i < str_nums; i++)
-				stream->out[str_list[i]].state =
+				SCTP_SO(stream, str_list[i])->state =
 						       SCTP_STREAM_CLOSED;
 		else
 			for (i = 0; i < stream->outcnt; i++)
-				stream->out[i].state = SCTP_STREAM_CLOSED;
+				SCTP_SO(stream, i)->state = SCTP_STREAM_CLOSED;
 	}
 
 	asoc->strreset_chunk = chunk;
@@ -380,11 +394,11 @@ int sctp_send_reset_streams(struct sctp_association *asoc,
 
 		if (str_nums)
 			for (i = 0; i < str_nums; i++)
-				stream->out[str_list[i]].state =
+				SCTP_SO(stream, str_list[i])->state =
 						       SCTP_STREAM_OPEN;
 		else
 			for (i = 0; i < stream->outcnt; i++)
-				stream->out[i].state = SCTP_STREAM_OPEN;
+				SCTP_SO(stream, i)->state = SCTP_STREAM_OPEN;
 
 		goto out;
 	}
@@ -418,7 +432,7 @@ int sctp_send_reset_assoc(struct sctp_association *asoc)
 
 	/* Block further xmit of data until this request is completed */
 	for (i = 0; i < stream->outcnt; i++)
-		stream->out[i].state = SCTP_STREAM_CLOSED;
+		SCTP_SO(stream, i)->state = SCTP_STREAM_CLOSED;
 
 	asoc->strreset_chunk = chunk;
 	sctp_chunk_hold(asoc->strreset_chunk);
@@ -429,7 +443,7 @@ int sctp_send_reset_assoc(struct sctp_association *asoc)
 		asoc->strreset_chunk = NULL;
 
 		for (i = 0; i < stream->outcnt; i++)
-			stream->out[i].state = SCTP_STREAM_OPEN;
+			SCTP_SO(stream, i)->state = SCTP_STREAM_OPEN;
 
 		return retval;
 	}
@@ -609,10 +623,10 @@ struct sctp_chunk *sctp_process_strreset_outreq(
 		}
 
 		for (i = 0; i < nums; i++)
-			stream->in[ntohs(str_p[i])].mid = 0;
+			SCTP_SI(stream, ntohs(str_p[i]))->mid = 0;
 	} else {
 		for (i = 0; i < stream->incnt; i++)
-			stream->in[i].mid = 0;
+			SCTP_SI(stream, i)->mid = 0;
 	}
 
 	result = SCTP_STRRESET_PERFORMED;
@@ -683,11 +697,11 @@ struct sctp_chunk *sctp_process_strreset_inreq(
 
 	if (nums)
 		for (i = 0; i < nums; i++)
-			stream->out[ntohs(str_p[i])].state =
+			SCTP_SO(stream, ntohs(str_p[i]))->state =
 					       SCTP_STREAM_CLOSED;
 	else
 		for (i = 0; i < stream->outcnt; i++)
-			stream->out[i].state = SCTP_STREAM_CLOSED;
+			SCTP_SO(stream, i)->state = SCTP_STREAM_CLOSED;
 
 	asoc->strreset_chunk = chunk;
 	asoc->strreset_outstanding = 1;
@@ -786,11 +800,11 @@ struct sctp_chunk *sctp_process_strreset_tsnreq(
 	 *      incoming and outgoing streams.
 	 */
 	for (i = 0; i < stream->outcnt; i++) {
-		stream->out[i].mid = 0;
-		stream->out[i].mid_uo = 0;
+		SCTP_SO(stream, i)->mid = 0;
+		SCTP_SO(stream, i)->mid_uo = 0;
 	}
 	for (i = 0; i < stream->incnt; i++)
-		stream->in[i].mid = 0;
+		SCTP_SI(stream, i)->mid = 0;
 
 	result = SCTP_STRRESET_PERFORMED;
 
@@ -979,15 +993,18 @@ struct sctp_chunk *sctp_process_strreset_resp(
 		       sizeof(__u16);
 
 		if (result == SCTP_STRRESET_PERFORMED) {
+			struct sctp_stream_out *sout;
 			if (nums) {
 				for (i = 0; i < nums; i++) {
-					stream->out[ntohs(str_p[i])].mid = 0;
-					stream->out[ntohs(str_p[i])].mid_uo = 0;
+					sout = SCTP_SO(stream, ntohs(str_p[i]));
+					sout->mid = 0;
+					sout->mid_uo = 0;
 				}
 			} else {
 				for (i = 0; i < stream->outcnt; i++) {
-					stream->out[i].mid = 0;
-					stream->out[i].mid_uo = 0;
+					sout = SCTP_SO(stream, i);
+					sout->mid = 0;
+					sout->mid_uo = 0;
 				}
 			}
 
@@ -995,7 +1012,7 @@ struct sctp_chunk *sctp_process_strreset_resp(
 		}
 
 		for (i = 0; i < stream->outcnt; i++)
-			stream->out[i].state = SCTP_STREAM_OPEN;
+			SCTP_SO(stream, i)->state = SCTP_STREAM_OPEN;
 
 		*evp = sctp_ulpevent_make_stream_reset_event(asoc, flags,
 			nums, str_p, GFP_ATOMIC);
@@ -1050,15 +1067,15 @@ struct sctp_chunk *sctp_process_strreset_resp(
 			asoc->adv_peer_ack_point = asoc->ctsn_ack_point;
 
 			for (i = 0; i < stream->outcnt; i++) {
-				stream->out[i].mid = 0;
-				stream->out[i].mid_uo = 0;
+				SCTP_SO(stream, i)->mid = 0;
+				SCTP_SO(stream, i)->mid_uo = 0;
 			}
 			for (i = 0; i < stream->incnt; i++)
-				stream->in[i].mid = 0;
+				SCTP_SI(stream, i)->mid = 0;
 		}
 
 		for (i = 0; i < stream->outcnt; i++)
-			stream->out[i].state = SCTP_STREAM_OPEN;
+			SCTP_SO(stream, i)->state = SCTP_STREAM_OPEN;
 
 		*evp = sctp_ulpevent_make_assoc_reset_event(asoc, flags,
 			stsn, rtsn, GFP_ATOMIC);
@@ -1072,7 +1089,7 @@ struct sctp_chunk *sctp_process_strreset_resp(
 
 		if (result == SCTP_STRRESET_PERFORMED)
 			for (i = number; i < stream->outcnt; i++)
-				stream->out[i].state = SCTP_STREAM_OPEN;
+				SCTP_SO(stream, i)->state = SCTP_STREAM_OPEN;
 		else
 			stream->outcnt = number;
 
diff --git a/net/sctp/stream_interleave.c b/net/sctp/stream_interleave.c
index d3764c181299..0a78cdf86463 100644
--- a/net/sctp/stream_interleave.c
+++ b/net/sctp/stream_interleave.c
@@ -197,7 +197,7 @@ static struct sctp_ulpevent *sctp_intl_retrieve_partial(
 	__u32 next_fsn = 0;
 	int is_last = 0;
 
-	sin = sctp_stream_in(ulpq->asoc, event->stream);
+	sin = sctp_stream_in(&ulpq->asoc->stream, event->stream);
 
 	skb_queue_walk(&ulpq->reasm, pos) {
 		struct sctp_ulpevent *cevent = sctp_skb2event(pos);
@@ -278,7 +278,7 @@ static struct sctp_ulpevent *sctp_intl_retrieve_reassembled(
 	__u32 pd_len = 0;
 	__u32 mid = 0;
 
-	sin = sctp_stream_in(ulpq->asoc, event->stream);
+	sin = sctp_stream_in(&ulpq->asoc->stream, event->stream);
 
 	skb_queue_walk(&ulpq->reasm, pos) {
 		struct sctp_ulpevent *cevent = sctp_skb2event(pos);
@@ -368,7 +368,7 @@ static struct sctp_ulpevent *sctp_intl_reasm(struct sctp_ulpq *ulpq,
 
 	sctp_intl_store_reasm(ulpq, event);
 
-	sin = sctp_stream_in(ulpq->asoc, event->stream);
+	sin = sctp_stream_in(&ulpq->asoc->stream, event->stream);
 	if (sin->pd_mode && event->mid == sin->mid &&
 	    event->fsn == sin->fsn)
 		retval = sctp_intl_retrieve_partial(ulpq, event);
@@ -575,7 +575,7 @@ static struct sctp_ulpevent *sctp_intl_retrieve_partial_uo(
 	__u32 next_fsn = 0;
 	int is_last = 0;
 
-	sin = sctp_stream_in(ulpq->asoc, event->stream);
+	sin = sctp_stream_in(&ulpq->asoc->stream, event->stream);
 
 	skb_queue_walk(&ulpq->reasm_uo, pos) {
 		struct sctp_ulpevent *cevent = sctp_skb2event(pos);
@@ -659,7 +659,7 @@ static struct sctp_ulpevent *sctp_intl_retrieve_reassembled_uo(
 	__u32 pd_len = 0;
 	__u32 mid = 0;
 
-	sin = sctp_stream_in(ulpq->asoc, event->stream);
+	sin = sctp_stream_in(&ulpq->asoc->stream, event->stream);
 
 	skb_queue_walk(&ulpq->reasm_uo, pos) {
 		struct sctp_ulpevent *cevent = sctp_skb2event(pos);
@@ -750,7 +750,7 @@ static struct sctp_ulpevent *sctp_intl_reasm_uo(struct sctp_ulpq *ulpq,
 
 	sctp_intl_store_reasm_uo(ulpq, event);
 
-	sin = sctp_stream_in(ulpq->asoc, event->stream);
+	sin = sctp_stream_in(&ulpq->asoc->stream, event->stream);
 	if (sin->pd_mode_uo && event->mid == sin->mid_uo &&
 	    event->fsn == sin->fsn_uo)
 		retval = sctp_intl_retrieve_partial_uo(ulpq, event);
@@ -774,7 +774,7 @@ static struct sctp_ulpevent *sctp_intl_retrieve_first_uo(struct sctp_ulpq *ulpq)
 	skb_queue_walk(&ulpq->reasm_uo, pos) {
 		struct sctp_ulpevent *cevent = sctp_skb2event(pos);
 
-		csin = sctp_stream_in(ulpq->asoc, cevent->stream);
+		csin = sctp_stream_in(&ulpq->asoc->stream, cevent->stream);
 		if (csin->pd_mode_uo)
 			continue;
 
@@ -875,7 +875,7 @@ static struct sctp_ulpevent *sctp_intl_retrieve_first(struct sctp_ulpq *ulpq)
 	skb_queue_walk(&ulpq->reasm, pos) {
 		struct sctp_ulpevent *cevent = sctp_skb2event(pos);
 
-		csin = sctp_stream_in(ulpq->asoc, cevent->stream);
+		csin = sctp_stream_in(&ulpq->asoc->stream, cevent->stream);
 		if (csin->pd_mode)
 			continue;
 
@@ -1053,7 +1053,7 @@ static void sctp_intl_abort_pd(struct sctp_ulpq *ulpq, gfp_t gfp)
 	__u16 sid;
 
 	for (sid = 0; sid < stream->incnt; sid++) {
-		struct sctp_stream_in *sin = &stream->in[sid];
+		struct sctp_stream_in *sin = SCTP_SI(stream, sid);
 		__u32 mid;
 
 		if (sin->pd_mode_uo) {
@@ -1247,7 +1247,7 @@ static void sctp_handle_fwdtsn(struct sctp_ulpq *ulpq, struct sctp_chunk *chunk)
 static void sctp_intl_skip(struct sctp_ulpq *ulpq, __u16 sid, __u32 mid,
 			   __u8 flags)
 {
-	struct sctp_stream_in *sin = sctp_stream_in(ulpq->asoc, sid);
+	struct sctp_stream_in *sin = sctp_stream_in(&ulpq->asoc->stream, sid);
 	struct sctp_stream *stream  = &ulpq->asoc->stream;
 
 	if (flags & SCTP_FTSN_U_BIT) {
diff --git a/net/sctp/stream_sched.c b/net/sctp/stream_sched.c
index f5fcd425232a..a6c04a94b08f 100644
--- a/net/sctp/stream_sched.c
+++ b/net/sctp/stream_sched.c
@@ -161,7 +161,7 @@ int sctp_sched_set_sched(struct sctp_association *asoc,
 
 		/* Give the next scheduler a clean slate. */
 		for (i = 0; i < asoc->stream.outcnt; i++) {
-			void *p = asoc->stream.out[i].ext;
+			void *p = SCTP_SO(&asoc->stream, i)->ext;
 
 			if (!p)
 				continue;
@@ -175,7 +175,7 @@ int sctp_sched_set_sched(struct sctp_association *asoc,
 	asoc->outqueue.sched = n;
 	n->init(&asoc->stream);
 	for (i = 0; i < asoc->stream.outcnt; i++) {
-		if (!asoc->stream.out[i].ext)
+		if (!SCTP_SO(&asoc->stream, i)->ext)
 			continue;
 
 		ret = n->init_sid(&asoc->stream, i, GFP_KERNEL);
@@ -217,7 +217,7 @@ int sctp_sched_set_value(struct sctp_association *asoc, __u16 sid,
 	if (sid >= asoc->stream.outcnt)
 		return -EINVAL;
 
-	if (!asoc->stream.out[sid].ext) {
+	if (!SCTP_SO(&asoc->stream, sid)->ext) {
 		int ret;
 
 		ret = sctp_stream_init_ext(&asoc->stream, sid);
@@ -234,7 +234,7 @@ int sctp_sched_get_value(struct sctp_association *asoc, __u16 sid,
 	if (sid >= asoc->stream.outcnt)
 		return -EINVAL;
 
-	if (!asoc->stream.out[sid].ext)
+	if (!SCTP_SO(&asoc->stream, sid)->ext)
 		return 0;
 
 	return asoc->outqueue.sched->get(&asoc->stream, sid, value);
@@ -252,7 +252,7 @@ void sctp_sched_dequeue_done(struct sctp_outq *q, struct sctp_chunk *ch)
 		 * priority stream comes in.
 		 */
 		sid = sctp_chunk_stream_no(ch);
-		sout = &q->asoc->stream.out[sid];
+		sout = SCTP_SO(&q->asoc->stream, sid);
 		q->asoc->stream.out_curr = sout;
 		return;
 	}
@@ -272,8 +272,9 @@ void sctp_sched_dequeue_common(struct sctp_outq *q, struct sctp_chunk *ch)
 int sctp_sched_init_sid(struct sctp_stream *stream, __u16 sid, gfp_t gfp)
 {
 	struct sctp_sched_ops *sched = sctp_sched_ops_from_stream(stream);
+	struct sctp_stream_out_ext *ext = SCTP_SO(stream, sid)->ext;
 
-	INIT_LIST_HEAD(&stream->out[sid].ext->outq);
+	INIT_LIST_HEAD(&ext->outq);
 	return sched->init_sid(stream, sid, gfp);
 }
 
diff --git a/net/sctp/stream_sched_prio.c b/net/sctp/stream_sched_prio.c
index 7997d35dd0fd..2245083a98f2 100644
--- a/net/sctp/stream_sched_prio.c
+++ b/net/sctp/stream_sched_prio.c
@@ -75,10 +75,10 @@ static struct sctp_stream_priorities *sctp_sched_prio_get_head(
 
 	/* No luck. So we search on all streams now. */
 	for (i = 0; i < stream->outcnt; i++) {
-		if (!stream->out[i].ext)
+		if (!SCTP_SO(stream, i)->ext)
 			continue;
 
-		p = stream->out[i].ext->prio_head;
+		p = SCTP_SO(stream, i)->ext->prio_head;
 		if (!p)
 			/* Means all other streams won't be initialized
 			 * as well.
@@ -165,7 +165,7 @@ static void sctp_sched_prio_sched(struct sctp_stream *stream,
 static int sctp_sched_prio_set(struct sctp_stream *stream, __u16 sid,
 			       __u16 prio, gfp_t gfp)
 {
-	struct sctp_stream_out *sout = &stream->out[sid];
+	struct sctp_stream_out *sout = SCTP_SO(stream, sid);
 	struct sctp_stream_out_ext *soute = sout->ext;
 	struct sctp_stream_priorities *prio_head, *old;
 	bool reschedule = false;
@@ -186,7 +186,7 @@ static int sctp_sched_prio_set(struct sctp_stream *stream, __u16 sid,
 		return 0;
 
 	for (i = 0; i < stream->outcnt; i++) {
-		soute = stream->out[i].ext;
+		soute = SCTP_SO(stream, i)->ext;
 		if (soute && soute->prio_head == old)
 			/* It's still in use, nothing else to do here. */
 			return 0;
@@ -201,7 +201,7 @@ static int sctp_sched_prio_set(struct sctp_stream *stream, __u16 sid,
 static int sctp_sched_prio_get(struct sctp_stream *stream, __u16 sid,
 			       __u16 *value)
 {
-	*value = stream->out[sid].ext->prio_head->prio;
+	*value = SCTP_SO(stream, sid)->ext->prio_head->prio;
 	return 0;
 }
 
@@ -215,7 +215,7 @@ static int sctp_sched_prio_init(struct sctp_stream *stream)
 static int sctp_sched_prio_init_sid(struct sctp_stream *stream, __u16 sid,
 				    gfp_t gfp)
 {
-	INIT_LIST_HEAD(&stream->out[sid].ext->prio_list);
+	INIT_LIST_HEAD(&SCTP_SO(stream, sid)->ext->prio_list);
 	return sctp_sched_prio_set(stream, sid, 0, gfp);
 }
 
@@ -233,9 +233,9 @@ static void sctp_sched_prio_free(struct sctp_stream *stream)
 	 */
 	sctp_sched_prio_unsched_all(stream);
 	for (i = 0; i < stream->outcnt; i++) {
-		if (!stream->out[i].ext)
+		if (!SCTP_SO(stream, i)->ext)
 			continue;
-		prio = stream->out[i].ext->prio_head;
+		prio = SCTP_SO(stream, i)->ext->prio_head;
 		if (prio && list_empty(&prio->prio_sched))
 			list_add(&prio->prio_sched, &list);
 	}
@@ -255,7 +255,7 @@ static void sctp_sched_prio_enqueue(struct sctp_outq *q,
 	ch = list_first_entry(&msg->chunks, struct sctp_chunk, frag_list);
 	sid = sctp_chunk_stream_no(ch);
 	stream = &q->asoc->stream;
-	sctp_sched_prio_sched(stream, stream->out[sid].ext);
+	sctp_sched_prio_sched(stream, SCTP_SO(stream, sid)->ext);
 }
 
 static struct sctp_chunk *sctp_sched_prio_dequeue(struct sctp_outq *q)
@@ -297,7 +297,7 @@ static void sctp_sched_prio_dequeue_done(struct sctp_outq *q,
 	 * this priority.
 	 */
 	sid = sctp_chunk_stream_no(ch);
-	soute = q->asoc->stream.out[sid].ext;
+	soute = SCTP_SO(&q->asoc->stream, sid)->ext;
 	prio = soute->prio_head;
 
 	sctp_sched_prio_next_stream(prio);
@@ -317,7 +317,7 @@ static void sctp_sched_prio_sched_all(struct sctp_stream *stream)
 		__u16 sid;
 
 		sid = sctp_chunk_stream_no(ch);
-		sout = &stream->out[sid];
+		sout = SCTP_SO(stream, sid);
 		if (sout->ext)
 			sctp_sched_prio_sched(stream, sout->ext);
 	}
diff --git a/net/sctp/stream_sched_rr.c b/net/sctp/stream_sched_rr.c
index 1155692448f1..52ba743fa7a7 100644
--- a/net/sctp/stream_sched_rr.c
+++ b/net/sctp/stream_sched_rr.c
@@ -100,7 +100,7 @@ static int sctp_sched_rr_init(struct sctp_stream *stream)
 static int sctp_sched_rr_init_sid(struct sctp_stream *stream, __u16 sid,
 				  gfp_t gfp)
 {
-	INIT_LIST_HEAD(&stream->out[sid].ext->rr_list);
+	INIT_LIST_HEAD(&SCTP_SO(stream, sid)->ext->rr_list);
 
 	return 0;
 }
@@ -120,7 +120,7 @@ static void sctp_sched_rr_enqueue(struct sctp_outq *q,
 	ch = list_first_entry(&msg->chunks, struct sctp_chunk, frag_list);
 	sid = sctp_chunk_stream_no(ch);
 	stream = &q->asoc->stream;
-	sctp_sched_rr_sched(stream, stream->out[sid].ext);
+	sctp_sched_rr_sched(stream, SCTP_SO(stream, sid)->ext);
 }
 
 static struct sctp_chunk *sctp_sched_rr_dequeue(struct sctp_outq *q)
@@ -154,7 +154,7 @@ static void sctp_sched_rr_dequeue_done(struct sctp_outq *q,
 
 	/* Last chunk on that msg, move to the next stream */
 	sid = sctp_chunk_stream_no(ch);
-	soute = q->asoc->stream.out[sid].ext;
+	soute = SCTP_SO(&q->asoc->stream, sid)->ext;
 
 	sctp_sched_rr_next_stream(&q->asoc->stream);
 
@@ -173,7 +173,7 @@ static void sctp_sched_rr_sched_all(struct sctp_stream *stream)
 		__u16 sid;
 
 		sid = sctp_chunk_stream_no(ch);
-		soute = stream->out[sid].ext;
+		soute = SCTP_SO(stream, sid)->ext;
 		if (soute)
 			sctp_sched_rr_sched(stream, soute);
 	}
-- 
2.15.1


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v2 2/2] net/sctp: Replace in/out stream arrays with flex_array
  2018-08-03 16:21             ` Konstantin Khorenko
@ 2018-08-03 16:21               ` Konstantin Khorenko
  -1 siblings, 0 replies; 64+ messages in thread
From: Konstantin Khorenko @ 2018-08-03 16:21 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: oleg.babin, netdev, linux-sctp, David S . Miller, Vlad Yasevich,
	Neil Horman, Xin Long, Andrey Ryabinin, Konstantin Khorenko

This patch replaces physically contiguous memory arrays
allocated using kmalloc_array() with flexible arrays.
This makes it possible to avoid memory allocation failures
on systems under memory stress.
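
For illustration only, here is a rough userspace sketch of the idea behind
a flexible array: elements live in independently allocated page-sized
chunks, so no single large contiguous region is ever requested. This is
not the kernel flex_array implementation. The names chunked_array,
CHUNK_BYTES and the chunked_*() helpers are hypothetical, and calloc()
stands in for per-page allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CHUNK_BYTES 4096	/* assumed stand-in for PAGE_SIZE */

/* Hypothetical analogue of a flex_array: a table of separately
 * allocated fixed-size chunks instead of one contiguous block.
 */
struct chunked_array {
	size_t elem_size;
	size_t elem_count;
	size_t per_chunk;	/* whole elements that fit in one chunk */
	size_t nr_chunks;
	void **chunks;
};

static void chunked_free(struct chunked_array *ca)
{
	size_t i;

	if (!ca)
		return;
	for (i = 0; i < ca->nr_chunks; i++)
		free(ca->chunks[i]);	/* free(NULL) is a no-op */
	free(ca->chunks);
	free(ca);
}

/* Allocate every chunk up front and zero it, roughly what the patch
 * does with flex_array_alloc() followed by flex_array_prealloc().
 */
static struct chunked_array *chunked_alloc(size_t elem_size, size_t elem_count)
{
	struct chunked_array *ca;
	size_t i;

	if (!elem_size || elem_size > CHUNK_BYTES)
		return NULL;

	ca = calloc(1, sizeof(*ca));
	if (!ca)
		return NULL;

	ca->elem_size = elem_size;
	ca->elem_count = elem_count;
	ca->per_chunk = CHUNK_BYTES / elem_size;
	ca->nr_chunks = (elem_count + ca->per_chunk - 1) / ca->per_chunk;

	ca->chunks = calloc(ca->nr_chunks ? ca->nr_chunks : 1, sizeof(void *));
	if (!ca->chunks) {
		free(ca);
		return NULL;
	}

	for (i = 0; i < ca->nr_chunks; i++) {
		ca->chunks[i] = calloc(1, CHUNK_BYTES);
		if (!ca->chunks[i]) {
			chunked_free(ca);
			return NULL;
		}
	}
	return ca;
}

/* Map a flat index to (chunk, offset); elements never straddle chunks. */
static void *chunked_get(const struct chunked_array *ca, size_t index)
{
	assert(index < ca->elem_count);
	return (char *)ca->chunks[index / ca->per_chunk] +
	       (index % ca->per_chunk) * ca->elem_size;
}
```

With 24-byte elements and 65535 entries, this sketch needs 386 separate
4096-byte chunks plus a small pointer table, rather than one region of
roughly 1.5 MB.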

Signed-off-by: Oleg Babin <obabin@virtuozzo.com>
---
 include/net/sctp/structs.h |  1 +
 net/sctp/stream.c          | 78 +++++++++++++++++++++++++++++++++++-----------
 2 files changed, 61 insertions(+), 18 deletions(-)

diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index dc48c8e2b293..884d33965e89 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -57,6 +57,7 @@
 #include <linux/atomic.h>		/* This gets us atomic counters.  */
 #include <linux/skbuff.h>	/* We need sk_buff_head. */
 #include <linux/workqueue.h>	/* We need tq_struct.	 */
+#include <linux/flex_array.h>	/* We need flex_array.   */
 #include <linux/sctp.h>		/* We need sctp* header structs.  */
 #include <net/sctp/auth.h>	/* We need auth specific structs */
 #include <net/ip.h>		/* For inet_skb_parm */
diff --git a/net/sctp/stream.c b/net/sctp/stream.c
index 56fadeec7cba..3e55db1a38d0 100644
--- a/net/sctp/stream.c
+++ b/net/sctp/stream.c
@@ -40,13 +40,60 @@
 struct sctp_stream_out *sctp_stream_out(const struct sctp_stream *stream,
 					__u16 sid)
 {
-	return ((struct sctp_stream_out *)(stream->out)) + sid;
+	return flex_array_get(stream->out, sid);
 }
 
 struct sctp_stream_in *sctp_stream_in(const struct sctp_stream *stream,
 				      __u16 sid)
 {
-	return ((struct sctp_stream_in *)(stream->in)) + sid;
+	return flex_array_get(stream->in, sid);
+}
+
+static struct flex_array *fa_alloc(size_t elem_size, size_t elem_count,
+				   gfp_t gfp)
+{
+	struct flex_array *result;
+	int err;
+
+	result = flex_array_alloc(elem_size, elem_count, gfp);
+	if (result) {
+		err = flex_array_prealloc(result, 0, elem_count, gfp);
+		if (err) {
+			flex_array_free(result);
+			result = NULL;
+		}
+	}
+
+	return result;
+}
+
+static void fa_free(struct flex_array *fa)
+{
+	if (fa)
+		flex_array_free(fa);
+}
+
+static void fa_copy(struct flex_array *fa, struct flex_array *from,
+		    size_t index, size_t count)
+{
+	void *elem;
+
+	while (count--) {
+		elem = flex_array_get(from, index);
+		flex_array_put(fa, index, elem, 0);
+		index++;
+	}
+}
+
+static void fa_zero(struct flex_array *fa, size_t index, size_t count)
+{
+	void *elem;
+
+	while (count--) {
+		elem = flex_array_get(fa, index);
+		memset(elem, 0, fa->element_size);
+		index++;
+	}
 }
 
 /* Migrates chunks from stream queues to new stream queues if needed,
@@ -106,19 +153,17 @@ static int sctp_stream_alloc_out(struct sctp_stream *stream, __u16 outcnt,
 	struct flex_array *out;
 	size_t elem_size = sizeof(struct sctp_stream_out);
 
-	out = kmalloc_array(outcnt, elem_size, gfp);
+	out = fa_alloc(elem_size, outcnt, gfp);
 	if (!out)
 		return -ENOMEM;
 
 	if (stream->out) {
-		memcpy(out, stream->out, min(outcnt, stream->outcnt) *
-					 elem_size);
-		kfree(stream->out);
+		fa_copy(out, stream->out, 0, min(outcnt, stream->outcnt));
+		fa_free(stream->out);
 	}
 
 	if (outcnt > stream->outcnt)
-		memset(((struct sctp_stream_out *)out) + stream->outcnt, 0,
-		       (outcnt - stream->outcnt) * elem_size);
+		fa_zero(out, stream->outcnt, (outcnt - stream->outcnt));
 
 	stream->out = out;
 
@@ -131,20 +176,17 @@ static int sctp_stream_alloc_in(struct sctp_stream *stream, __u16 incnt,
 	struct flex_array *in;
 	size_t elem_size = sizeof(struct sctp_stream_in);
 
-	in = kmalloc_array(incnt, elem_size, gfp);
-
+	in = fa_alloc(elem_size, incnt, gfp);
 	if (!in)
 		return -ENOMEM;
 
 	if (stream->in) {
-		memcpy(in, stream->in, min(incnt, stream->incnt) *
-				       elem_size);
-		kfree(stream->in);
+		fa_copy(in, stream->in, 0, min(incnt, stream->incnt));
+		fa_free(stream->in);
 	}
 
 	if (incnt > stream->incnt)
-		memset(((struct sctp_stream_in *)in) + stream->incnt, 0,
-		       (incnt - stream->incnt) * elem_size);
+		fa_zero(in, stream->incnt, (incnt - stream->incnt));
 
 	stream->in = in;
 
@@ -188,7 +230,7 @@ int sctp_stream_init(struct sctp_stream *stream, __u16 outcnt, __u16 incnt,
 	ret = sctp_stream_alloc_in(stream, incnt, gfp);
 	if (ret) {
 		sched->free(stream);
-		kfree(stream->out);
+		fa_free(stream->out);
 		stream->out = NULL;
 		stream->outcnt = 0;
 		goto out;
@@ -220,8 +262,8 @@ void sctp_stream_free(struct sctp_stream *stream)
 	sched->free(stream);
 	for (i = 0; i < stream->outcnt; i++)
 		kfree(SCTP_SO(stream, i)->ext);
-	kfree(stream->out);
-	kfree(stream->in);
+	fa_free(stream->out);
+	fa_free(stream->in);
 }
 
 void sctp_stream_clear(struct sctp_stream *stream)
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* RE: [PATCH v2 1/2] net/sctp: Make wrappers for accessing in/out streams
  2018-08-03 16:21               ` Konstantin Khorenko
@ 2018-08-03 16:41                 ` David Laight
  -1 siblings, 0 replies; 64+ messages in thread
From: David Laight @ 2018-08-03 16:41 UTC (permalink / raw)
  To: 'Konstantin Khorenko', Marcelo Ricardo Leitner
  Cc: oleg.babin, netdev, linux-sctp, David S . Miller, Vlad Yasevich,
	Neil Horman, Xin Long, Andrey Ryabinin

From: Konstantin Khorenko
> Sent: 03 August 2018 17:21
...
> --- a/net/sctp/stream.c
> +++ b/net/sctp/stream.c
> @@ -37,6 +37,18 @@
>  #include <net/sctp/sm.h>
>  #include <net/sctp/stream_sched.h>
> 
> +struct sctp_stream_out *sctp_stream_out(const struct sctp_stream *stream,
> +					__u16 sid)
> +{
> +	return ((struct sctp_stream_out *)(stream->out)) + sid;
> +}
> +
> +struct sctp_stream_in *sctp_stream_in(const struct sctp_stream *stream,
> +				      __u16 sid)
> +{
> +	return ((struct sctp_stream_in *)(stream->in)) + sid;
> +}
> +

Those look like they ought to be static inlines in the header file.
Otherwise you'll be making SCTP performance worse than it is already.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 64+ messages in thread

* RE: [PATCH v2 0/2] net/sctp: Avoid allocating high order memory with kmalloc()
  2018-08-03 16:21             ` Konstantin Khorenko
@ 2018-08-03 16:43               ` David Laight
  -1 siblings, 0 replies; 64+ messages in thread
From: David Laight @ 2018-08-03 16:43 UTC (permalink / raw)
  To: 'Konstantin Khorenko', Marcelo Ricardo Leitner
  Cc: oleg.babin, netdev, linux-sctp, David S . Miller, Vlad Yasevich,
	Neil Horman, Xin Long, Andrey Ryabinin

From: Konstantin Khorenko
> Sent: 03 August 2018 17:21
> 
> Each SCTP association can have up to 65535 input and output streams.
> For each stream type an array of sctp_stream_in or sctp_stream_out
> structures is allocated using kmalloc_array() function. This function
> allocates physically contiguous memory regions, so this can lead
> to allocation of memory regions of very high order, i.e.:
...

Given how useless SCTP streams are, does anything actually use
more than about 4?

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v2 1/2] net/sctp: Make wrappers for accessing in/out streams
  2018-08-03 16:21               ` Konstantin Khorenko
@ 2018-08-03 19:50                 ` David Miller
  -1 siblings, 0 replies; 64+ messages in thread
From: David Miller @ 2018-08-03 19:50 UTC (permalink / raw)
  To: khorenko
  Cc: marcelo.leitner, oleg.babin, netdev, linux-sctp, vyasevich,
	nhorman, lucien.xin, aryabinin

From: Konstantin Khorenko <khorenko@virtuozzo.com>
Date: Fri,  3 Aug 2018 19:21:01 +0300

> +struct sctp_stream_out *sctp_stream_out(const struct sctp_stream *stream,
> +					__u16 sid)
> +{
> +	return ((struct sctp_stream_out *)(stream->out)) + sid;
> +}
> +
> +struct sctp_stream_in *sctp_stream_in(const struct sctp_stream *stream,
> +				      __u16 sid)
> +{
> +	return ((struct sctp_stream_in *)(stream->in)) + sid;
> +}

I agree with David that these should be in a header file, and marked
inline.
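
A minimal user-space sketch of what that suggestion could look like,
assuming the stream arrays are still plain typed pointers at this point
in the series. The struct layouts, the outcnt/incnt fields and the
bounds check are stand-ins for illustration and are not taken from the
patch:

```c
#include <assert.h>
#include <stdint.h>

/* User-space stand-ins; the field layouts are illustrative, not the kernel's. */
struct sctp_stream_out { uint32_t mid; uint16_t ssn; };
struct sctp_stream_in  { uint32_t mid; uint16_t ssn; };

struct sctp_stream {
	struct sctp_stream_out *out;	/* plain array, before any flex_array change */
	struct sctp_stream_in *in;
	uint16_t outcnt;		/* entries in out[] (assumed field) */
	uint16_t incnt;			/* entries in in[] (assumed field) */
};

/* Defined static inline in a header, so each caller gets the pointer
 * arithmetic inlined instead of paying for a function call. */
static inline struct sctp_stream_out *
sctp_stream_out(const struct sctp_stream *stream, uint16_t sid)
{
	assert(sid < stream->outcnt);	/* debug-only check, not in the patch */
	return stream->out + sid;
}

static inline struct sctp_stream_in *
sctp_stream_in(const struct sctp_stream *stream, uint16_t sid)
{
	assert(sid < stream->incnt);	/* debug-only check, not in the patch */
	return stream->in + sid;
}
```

Callers would then write sctp_stream_out(stream, sid)->ssn rather than
stream->out[sid].ssn, which keeps the backing storage free to change
later.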

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v2 0/2] net/sctp: Avoid allocating high order memory with kmalloc()
  2018-08-03 16:43               ` David Laight
@ 2018-08-03 20:30                 ` Marcelo Ricardo Leitner
  -1 siblings, 0 replies; 64+ messages in thread
From: Marcelo Ricardo Leitner @ 2018-08-03 20:30 UTC (permalink / raw)
  To: David Laight
  Cc: 'Konstantin Khorenko',
	oleg.babin, netdev, linux-sctp, David S . Miller, Vlad Yasevich,
	Neil Horman, Xin Long, Andrey Ryabinin, Michael Tuexen

On Fri, Aug 03, 2018 at 04:43:28PM +0000, David Laight wrote:
> From: Konstantin Khorenko
> > Sent: 03 August 2018 17:21
> > 
> > Each SCTP association can have up to 65535 input and output streams.
> > For each stream type an array of sctp_stream_in or sctp_stream_out
> > structures is allocated using kmalloc_array() function. This function
> > allocates physically contiguous memory regions, so this can lead
> > to allocation of memory regions of very high order, i.e.:
> ...
> 
> Given how useless SCTP streams are, does anything actually use
> more than about 4?

Maybe Michael can help us with that. I'm also curious now.

  Marcelo

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v2 1/2] net/sctp: Make wrappers for accessing in/out streams
  2018-08-03 16:21               ` Konstantin Khorenko
@ 2018-08-03 20:40                 ` Marcelo Ricardo Leitner
  -1 siblings, 0 replies; 64+ messages in thread
From: Marcelo Ricardo Leitner @ 2018-08-03 20:40 UTC (permalink / raw)
  To: Konstantin Khorenko
  Cc: oleg.babin, netdev, linux-sctp, David S . Miller, Vlad Yasevich,
	Neil Horman, Xin Long, Andrey Ryabinin

On Fri, Aug 03, 2018 at 07:21:01PM +0300, Konstantin Khorenko wrote:
> This patch introduces wrappers for accessing in/out streams indirectly.
> This will enable to replace physically contiguous memory arrays
> of streams with flexible arrays (or maybe any other appropriate
> mechanism) which do memory allocation on a per-page basis.
> 
> Signed-off-by: Oleg Babin <obabin@virtuozzo.com>
> Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
> 
> ---
> v2 changes:
>  sctp_stream_in() users are updated to provide stream as an argument,
>  sctp_stream_{in,out}_ptr() are now just sctp_stream_{in,out}().
> ---

...

>  
>  struct sctp_stream {
> -	struct sctp_stream_out *out;
> -	struct sctp_stream_in *in;
> +	struct flex_array *out;
> +	struct flex_array *in;

If this patch was meant to be a preparation, shouldn't this belong to
the next patch instead?

  Marcelo

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v2 0/2] net/sctp: Avoid allocating high order memory with kmalloc()
  2018-08-03 20:30                 ` Marcelo Ricardo Leitner
@ 2018-08-03 20:56                   ` Michael Tuexen
  -1 siblings, 0 replies; 64+ messages in thread
From: Michael Tuexen @ 2018-08-03 20:56 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: David Laight, Konstantin Khorenko, oleg.babin, netdev,
	linux-sctp, David S . Miller, Vlad Yasevich, Neil Horman,
	Xin Long, Andrey Ryabinin



> On 3. Aug 2018, at 22:30, Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> wrote:
> 
> On Fri, Aug 03, 2018 at 04:43:28PM +0000, David Laight wrote:
>> From: Konstantin Khorenko
>>> Sent: 03 August 2018 17:21
>>> 
>>> Each SCTP association can have up to 65535 input and output streams.
>>> For each stream type an array of sctp_stream_in or sctp_stream_out
>>> structures is allocated using kmalloc_array() function. This function
>>> allocates physically contiguous memory regions, so this can lead
>>> to allocation of memory regions of very high order, i.e.:
>> ...
>> 
>> Given how useless SCTP streams are, does anything actually use
>> more than about 4?
> 
> Maybe Michael can help us with that. I'm also curious now.
In the context of SIGTRAN I have seen 17 streams...

In the context of WebRTC I have seen more streams. In general,
the streams concept seems to be useful. QUIC has lots of streams.

So I'm wondering why they are considered useless.
David, can you elaborate on this?

Best regards
Michael
> 
>  Marcelo

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v2 0/2] net/sctp: Avoid allocating high order memory with kmalloc()
  2018-08-03 16:21             ` Konstantin Khorenko
@ 2018-08-03 23:36               ` Marcelo Ricardo Leitner
  -1 siblings, 0 replies; 64+ messages in thread
From: Marcelo Ricardo Leitner @ 2018-08-03 23:36 UTC (permalink / raw)
  To: Konstantin Khorenko
  Cc: oleg.babin, netdev, linux-sctp, David S . Miller, Vlad Yasevich,
	Neil Horman, Xin Long, Andrey Ryabinin

On Fri, Aug 03, 2018 at 07:21:00PM +0300, Konstantin Khorenko wrote:
...
> Performance results:
> ====================
>   * Kernel: v4.18-rc6 - stock and with 2 patches from Oleg (earlier in this thread)
>   * Node: CPU (8 cores): Intel(R) Xeon(R) CPU E31230 @ 3.20GHz
>           RAM: 32 Gb
> 
>   * netperf: taken from https://github.com/HewlettPackard/netperf.git,
> 	     compiled from sources with sctp support
>   * netperf server and client are run on the same node
>   * ip link set lo mtu 1500
> 
> The script used to run tests:
>  # cat run_tests.sh
>  #!/bin/bash
> 
> for test in SCTP_STREAM SCTP_STREAM_MANY SCTP_RR SCTP_RR_MANY; do
>   echo "TEST: $test";
>   for i in `seq 1 3`; do
>     echo "Iteration: $i";
>     set -x
>     netperf -t $test -H localhost -p 22222 -S 200000,200000 -s 200000,200000 \
>             -l 60 -- -m 1452;
>     set +x
>   done
> done
> ================================================
> 
> Results (a bit reformatted to be more readable):
...

Nice, good numbers.

I'm missing some test that actually uses more than 1 stream. All tests
in netperf use only 1 stream. They can use 1 or Many associations on
a socket, but not multiple streams. That means the numbers here show
that we shouldn't see any regression on the more traditional uses, per
Michael's reply on the other email, but it is not testing how it will
behave if we go crazy and use the 64k streams (worst case).

You'll need some other tool to test it. One idea is sctp_test, from
lksctp-tools. Something like:

Server side: 
	./sctp_test -H 172.0.0.1 -P 22222 -l -d 0
Client side: 
	time ./sctp_test -H 172.0.0.1 -P 22221 \
		-h 172.0.0.1 -p 22222 -s \
		-c 1 -M 65535 -T -t 1 -x 100000 -d 0

And then measure the difference on how long each test took. Can you
get these too?

Interesting that on my laptop just starting this test for the first
time can take some *seconds*. Seems the kernel had a hard time
defragmenting the memory here. :)

Thanks,
Marcelo

^ permalink raw reply	[flat|nested] 64+ messages in thread

* RE: [PATCH v2 0/2] net/sctp: Avoid allocating high order memory with kmalloc()
  2018-08-03 20:56                   ` Michael Tuexen
@ 2018-08-06  9:34                     ` David Laight
  -1 siblings, 0 replies; 64+ messages in thread
From: David Laight @ 2018-08-06  9:34 UTC (permalink / raw)
  To: 'Michael Tuexen', Marcelo Ricardo Leitner
  Cc: Konstantin Khorenko, oleg.babin, netdev, linux-sctp,
	David S . Miller, Vlad Yasevich, Neil Horman, Xin Long,
	Andrey Ryabinin

From: Michael Tuexen
> Sent: 03 August 2018 21:57
...
> >> Given how useless SCTP streams are, does anything actually use
> >> more than about 4?
> >
> > Maybe Michael can help us with that. I'm also curious now.
> In the context of SIGTRAN I have seen 17 streams...

Ok, I've seen 17 there as well, 5 is probably more common.

> In the context of WebRTC I have seen more streams. In general,
> the streams concept seems to be useful. QUIC has lots of streams.
> 
> So I'm wondering why they are considered useless.
> David, can you elaborate on this?

I don't think a lot of people know what they actually are.

Streams just allow some received data to be forwarded to applications when
message(s) on other stream(s) are lost and have to be retransmitted.

I suspect some people think that the separate streams have separate flow control,
not just separate data sequences.
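
A toy model of that distinction, as a sketch only: each stream keeps its
own expected sequence number, so a gap on one stream holds back that
stream alone, while nothing here gives a stream its own receive window.
The names rx_state and deliverable are made up for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSTREAMS 4	/* arbitrary small stream count for the example */

struct rx_state {
	/* Next stream sequence number expected on each stream. */
	uint16_t next_ssn[NSTREAMS];
};

/*
 * Returns true if the ordered message (sid, ssn) can be handed to the
 * application now, false if it must wait for an earlier message on the
 * same stream. Other streams are not consulted at all.
 */
static bool deliverable(struct rx_state *rx, uint16_t sid, uint16_t ssn)
{
	assert(sid < NSTREAMS);
	if (ssn != rx->next_ssn[sid])
		return false;	/* gap on this stream: hold until retransmission */
	rx->next_ssn[sid]++;
	return true;
}
```

If message 0 on stream 1 is lost, message 1 on stream 1 is held, but
message 0 on stream 2 is still delivered. The buffered data all counts
against the one association-wide window.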

M2PA separates control messages (stream 0) from user data (stream 1).
I think the spec even suggests this is so control messages get through when
user data is flow controlled off - not true (it would be true for ISO
transport's 'expedited data').

M3UA will use 16 streams (one for each (ITU) SLS), but uses stream 0 for control.
If a data message is lost then data for the other sls can be passed to the
userpart/mtp3 - this might save bursty processing when the SACK-requested
retransmission arrives. But I doubt you'd want to run M3UA on anything lossy
enough for more than 4 data streams to make sense.

Even M3UA separating control onto stream 0 and data onto 1-n doesn't seem useful to me.

If QUIC is using 'lots of streams' is it just using the stream-id as a qualifier
for the data? Rather than requiring the 'not head of line blocking' feature
of sctp streams?

Thought....
Could we let the application set large stream-ids, but actually mask them
down to (say) 32 for the protocol code?

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v2 0/2] net/sctp: Avoid allocating high order memory with kmalloc()
  2018-08-06  9:34                     ` David Laight
@ 2018-08-08 14:48                       ` Marcelo Ricardo Leitner
  -1 siblings, 0 replies; 64+ messages in thread
From: Marcelo Ricardo Leitner @ 2018-08-08 14:48 UTC (permalink / raw)
  To: David Laight
  Cc: 'Michael Tuexen',
	Konstantin Khorenko, oleg.babin, netdev, linux-sctp,
	David S . Miller, Vlad Yasevich, Neil Horman, Xin Long,
	Andrey Ryabinin

On Mon, Aug 06, 2018 at 09:34:05AM +0000, David Laight wrote:
> From: Michael Tuexen
> > Sent: 03 August 2018 21:57
> ...
> > >> Given how useless SCTP streams are, does anything actually use
> > >> more than about 4?
> > >
> > > Maybe Michael can help us with that. I'm also curious now.
> > In the context of SIGTRAN I have seen 17 streams...
> 
> Ok, I've seen 17 there as well, 5 is probably more common.
> 
> > In the context of WebRTC I have seen more streams. In general,
> > the streams concept seems to be useful. QUIC has lots of streams.

That means the migration to flex_array should not have a noticeable
impact, as for a small number of streams it will behave just the
same as it does now. (yes, assuming the app won't use high-order
stream ids just because)
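
As a rough check of the sizes involved, here is a sketch of the
arithmetic, assuming 4096-byte pages and the 24-byte
sizeof(struct sctp_stream_out) quoted in the cover letter. The helper
names are made up, and the page count is rounded up, so the worst case
comes out as 384 pages where the cover letter's truncating division
gives 383:

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE_BYTES	4096u	/* assumed page size, as in the cover letter */
#define STREAM_OUT_SIZE	24u	/* sizeof(struct sctp_stream_out) per the cover letter */

/* Whole pages needed to hold nstreams entries of entry_size bytes. */
static size_t pages_needed(size_t nstreams, size_t entry_size)
{
	return (nstreams * entry_size + PAGE_SIZE_BYTES - 1) / PAGE_SIZE_BYTES;
}

/* Smallest order such that (1 << order) contiguous pages cover the request. */
static unsigned int alloc_order(size_t pages)
{
	unsigned int order = 0;

	assert(pages > 0);
	while (((size_t)1 << order) < pages)
		order++;
	return order;
}
```

With the 17 streams seen for SIGTRAN, pages_needed(17, STREAM_OUT_SIZE)
is 1, i.e. an order-0 allocation either way. Only the 65535-stream worst
case reaches 384 pages and therefore order 9 when the array has to be
contiguous.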

...

> 
> Thought....
> Could we let the application set large stream-ids, but actually mask them
> down to (say) 32 for the protocol code?

This would require both peers to know about the mapping, as stream ids
must be the same on both sides. Seems to me it is better to just adjust
the application and make use of low numbers.
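
To make the mapping concern concrete, a small sketch of the masking
idea, using the "(say) 32" figure from the suggestion. PROTO_STREAMS
and mask_sid are hypothetical names, not anything in the SCTP code:

```c
#include <assert.h>
#include <stdint.h>

#define PROTO_STREAMS 32u	/* hypothetical internal limit, "(say) 32" */

static_assert((PROTO_STREAMS & (PROTO_STREAMS - 1)) == 0,
	      "masking only works for a power-of-two limit");

/* Fold an application-visible stream id onto an internal slot. */
static inline uint16_t mask_sid(uint16_t app_sid)
{
	return (uint16_t)(app_sid & (PROTO_STREAMS - 1));
}
```

The fold is many-to-one: application ids 5 and 37 both land on slot 5,
so the receiver cannot tell which one the sender meant unless it
applies the same rule and the application never relies on the
difference.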

  Marcelo

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v2 1/2] net/sctp: Make wrappers for accessing in/out streams
  2018-08-03 19:50                 ` David Miller
@ 2018-08-09  8:39                   ` Konstantin Khorenko
  -1 siblings, 0 replies; 64+ messages in thread
From: Konstantin Khorenko @ 2018-08-09  8:39 UTC (permalink / raw)
  To: David Miller, David Laight
  Cc: marcelo.leitner, oleg.babin, netdev, linux-sctp, vyasevich,
	nhorman, lucien.xin, aryabinin

On 08/03/2018 10:50 PM, David Miller wrote:
> From: Konstantin Khorenko <khorenko@virtuozzo.com>
> Date: Fri,  3 Aug 2018 19:21:01 +0300
>
>> +struct sctp_stream_out *sctp_stream_out(const struct sctp_stream *stream,
>> +					__u16 sid)
>> +{
>> +	return ((struct sctp_stream_out *)(stream->out)) + sid;
>> +}
>> +
>> +struct sctp_stream_in *sctp_stream_in(const struct sctp_stream *stream,
>> +				      __u16 sid)
>> +{
>> +	return ((struct sctp_stream_in *)(stream->in)) + sid;
>> +}
>
> I agree with David that these should be in a header file, and marked
> inline.

David and David,

sure, will move them, thank you!

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v2 1/2] net/sctp: Make wrappers for accessing in/out streams
  2018-08-03 20:40                 ` Marcelo Ricardo Leitner
@ 2018-08-09  8:40                   ` Konstantin Khorenko
  -1 siblings, 0 replies; 64+ messages in thread
From: Konstantin Khorenko @ 2018-08-09  8:40 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: oleg.babin, netdev, linux-sctp, David S . Miller, Vlad Yasevich,
	Neil Horman, Xin Long, Andrey Ryabinin

On 08/03/2018 11:40 PM, Marcelo Ricardo Leitner wrote:
> On Fri, Aug 03, 2018 at 07:21:01PM +0300, Konstantin Khorenko wrote:
>> This patch introduces wrappers for accessing in/out streams indirectly.
>> This will enable to replace physically contiguous memory arrays
>> of streams with flexible arrays (or maybe any other appropriate
>> mechanism) which do memory allocation on a per-page basis.
>>
>> Signed-off-by: Oleg Babin <obabin@virtuozzo.com>
>> Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
>>
>> ---
>> v2 changes:
>>  sctp_stream_in() users are updated to provide stream as an argument,
>>  sctp_stream_{in,out}_ptr() are now just sctp_stream_{in,out}().
>> ---
>
> ...
>
>>
>>  struct sctp_stream {
>> -	struct sctp_stream_out *out;
>> -	struct sctp_stream_in *in;
>> +	struct flex_array *out;
>> +	struct flex_array *in;
>
> If this patch was meant to be a preparation, shouldn't this belong to
> the next patch instead?

Marcelo,

agree, that will be better, will move the hunk along with changes in
sctp_stream_alloc_{in,out}().

Thank you!

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v2 0/2] net/sctp: Avoid allocating high order memory with kmalloc()
  2018-08-03 23:36               ` Marcelo Ricardo Leitner
@ 2018-08-09  8:43                 ` Konstantin Khorenko
  -1 siblings, 0 replies; 64+ messages in thread
From: Konstantin Khorenko @ 2018-08-09  8:43 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: oleg.babin, netdev, linux-sctp, David S . Miller, Vlad Yasevich,
	Neil Horman, Xin Long, Andrey Ryabinin

On 08/04/2018 02:36 AM, Marcelo Ricardo Leitner wrote:
> On Fri, Aug 03, 2018 at 07:21:00PM +0300, Konstantin Khorenko wrote:
> ...
>> Performance results:
>> ====================
>>   * Kernel: v4.18-rc6 - stock and with 2 patches from Oleg (earlier in this thread)
>>   * Node: CPU (8 cores): Intel(R) Xeon(R) CPU E31230 @ 3.20GHz
>>           RAM: 32 Gb
>>
>>   * netperf: taken from https://github.com/HewlettPackard/netperf.git,
>> 	     compiled from sources with sctp support
>>   * netperf server and client are run on the same node
>>   * ip link set lo mtu 1500
>>
>> The script used to run tests:
>>  # cat run_tests.sh
>>  #!/bin/bash
>>
>> for test in SCTP_STREAM SCTP_STREAM_MANY SCTP_RR SCTP_RR_MANY; do
>>   echo "TEST: $test";
>>   for i in `seq 1 3`; do
>>     echo "Iteration: $i";
>>     set -x
>>     netperf -t $test -H localhost -p 22222 -S 200000,200000 -s 200000,200000 \
>>             -l 60 -- -m 1452;
>>     set +x
>>   done
>> done
>> ================================================
>>
>> Results (a bit reformatted to be more readable):
> ...
>
> Nice, good numbers.
>
> I'm missing some test that actually uses more than 1 stream. All tests
> in netperf use only 1 stream. They can use 1 or Many associations on
> a socket, but not multiple streams. That means the numbers here show
> that we shouldn't see any regression on the more traditional uses, per
> Michael's reply on the other email, but it is not testing how it will
> behave if we go crazy and use the 64k streams (worst case).
>
> You'll need some other tool to test it. One idea is sctp_test, from
> lksctp-tools. Something like:
>
> Server side:
> 	./sctp_test -H 172.0.0.1 -P 22222 -l -d 0
> Client side:
> 	time ./sctp_test -H 172.0.0.1 -P 22221 \
> 		-h 172.0.0.1 -p 22222 -s \
> 		-c 1 -M 65535 -T -t 1 -x 100000 -d 0
>
> And then measure the difference on how long each test took. Can you
> get these too?
>
> Interesting that on my laptop just starting this test for the first
> time can take some *seconds*. Seems the kernel had a hard time
> defragmenting the memory here. :)

No problem, will do those measurements as well.

Even more, to get the test results more repeatable, I think I'll make
the memory highly fragmented before the test. :)

Thank you!

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v2 0/2] net/sctp: Avoid allocating high order memory with kmalloc()
  2018-08-09  8:43                 ` Konstantin Khorenko
@ 2018-08-10 17:03                   ` Konstantin Khorenko
  -1 siblings, 0 replies; 64+ messages in thread
From: Konstantin Khorenko @ 2018-08-10 17:03 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: oleg.babin, netdev, linux-sctp, David S . Miller, Vlad Yasevich,
	Neil Horman, Xin Long, Andrey Ryabinin

On 08/09/2018 11:43 AM, Konstantin Khorenko wrote:
> On 08/04/2018 02:36 AM, Marcelo Ricardo Leitner wrote:
>> On Fri, Aug 03, 2018 at 07:21:00PM +0300, Konstantin Khorenko wrote:
>> ...
>>> Performance results:
>>> ====================
>>>   * Kernel: v4.18-rc6 - stock and with 2 patches from Oleg (earlier in this thread)
>>>   * Node: CPU (8 cores): Intel(R) Xeon(R) CPU E31230 @ 3.20GHz
>>>           RAM: 32 Gb
>>>
>>>   * netperf: taken from https://github.com/HewlettPackard/netperf.git,
>>> 	     compiled from sources with sctp support
>>>   * netperf server and client are run on the same node
>>>   * ip link set lo mtu 1500
>>>
>>> The script used to run tests:
>>>  # cat run_tests.sh
>>>  #!/bin/bash
>>>
>>> for test in SCTP_STREAM SCTP_STREAM_MANY SCTP_RR SCTP_RR_MANY; do
>>>   echo "TEST: $test";
>>>   for i in `seq 1 3`; do
>>>     echo "Iteration: $i";
>>>     set -x
>>>     netperf -t $test -H localhost -p 22222 -S 200000,200000 -s 200000,200000 \
>>>             -l 60 -- -m 1452;
>>>     set +x
>>>   done
>>> done
>>> ================================================
>>>
>>> Results (a bit reformatted to be more readable):
>> ...
>>
>> Nice, good numbers.
>>
>> I'm missing some test that actually uses more than 1 stream. All tests
>> in netperf use only 1 stream. They can use 1 or Many associations on
>> a socket, but not multiple streams. That means the numbers here show
>> that we shouldn't see any regression on the more traditional uses, per
>> Michael's reply on the other email, but it is not testing how it will
>> behave if we go crazy and use the 64k streams (worst case).
>>
>> You'll need some other tool to test it. One idea is sctp_test, from
>> lksctp-tools. Something like:
>>
>> Server side:
>> 	./sctp_test -H 172.0.0.1 -P 22222 -l -d 0
>> Client side:
>> 	time ./sctp_test -H 172.0.0.1 -P 22221 \
>> 		-h 172.0.0.1 -p 22222 -s \
>> 		-c 1 -M 65535 -T -t 1 -x 100000 -d 0
>>
>> And then measure the difference on how long each test took. Can you
>> get these too?
>>
>> Interesting that on my laptop just starting this test for the first
>> time can take some *seconds*. Seems the kernel had a hard time
>> defragmenting the memory here. :)

Hi Marcelo,

I got 3 of 4 results, please take a look. I failed to measure the test on
the stock kernel with fragmented memory, because there the test fails with
         *** connect:  Cannot allocate memory ***


Performance results:
====================
   * Kernel: v4.18-rc8 - stock and with 2 patches v3
   * Node: CPU (8 cores): Intel(R) Xeon(R) CPU E31230 @ 3.20GHz
           RAM: 32 Gb

   * sctp_test: https://github.com/sctp/lksctp-tools
   * both server and client are run on the same node
   * ip link set lo mtu 1500
   * sysctl -w vm.max_map_count=65530000 (needed to make memory fragmented)

The script used to run tests:
=============================
# cat run_sctp_test.sh
#!/bin/bash

set -x

uname -r
ip link set lo mtu 1500
swapoff -a

free
cat /proc/buddyinfo

./src/apps/sctp_test -H 127.0.0.1 -P 22222 -l -d 0 &
sleep 3

time ./src/apps/sctp_test -H 127.0.0.1 -P 22221 -h 127.0.0.1 -p 22222 \
         -s -c 1 -M 65535 -T -t 1 -x 100000 -d 0 1>/dev/null

killall -9 lt-sctp_test
===============================

Results (a bit reformatted to be more readable):

1) stock kernel v4.18-rc8, no memory fragmentation
Memory info (more or less the same across iterations):
# free
               total        used        free      shared  buff/cache   available
Mem:       32906008      213156    32178184         764      514668    32260968
Swap:             0           0           0

cat /proc/buddyinfo
Node 0, zone      DMA      0      1      1      0      2      1      1      0      1      1      3
Node 0, zone    DMA32      1      3      5      4      2      2      3      6      6      4    867
Node 0, zone   Normal    551    422    160    204    193     34     15      7     22     19   6956

	test 1		test 2		test 3
real    0m14.715s	0m14.593s	0m15.954s
user    0m0.954s	0m0.955s	0m0.854s
sys     0m13.388s	0m12.537s	0m13.749s

2) kernel with fixes, no memory fragmentation
'free' and 'buddyinfo' similar to 1)

	test 1		test 2		test 3
real    0m14.959s	0m14.693s	0m14.762s
user    0m0.948s	0m0.921s	0m0.929s
sys     0m13.538s	0m13.225s	0m13.217s

3) kernel with fixes, memory fragmented
(memory fragmented as follows: mmap() all available RAM, touch all pages, munmap() half of the pages (every second page), then do it again for RAM/2)
'free':
               total        used        free      shared  buff/cache   available
Mem:       32906008    30555200      302740         764     2048068      266452
Mem:       32906008    30379948      541436         764     1984624      442376
Mem:       32906008    30717312      262380         764     1926316      109908

/proc/buddyinfo:
Node 0, zone   Normal  40773     37     34     29      0      0      0      0      0      0      0
Node 0, zone   Normal 100332     68      8      4      2      1      1      0      0      0      0
Node 0, zone   Normal  31113      7      2      1      0      0      0      0      0      0      0

	test 1		test 2		test 3
real    0m14.159s	0m15.252s	0m15.826s
user    0m0.839s	0m1.004s	0m1.048s
sys     0m11.827s	0m14.240s	0m14.778s
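
The buddyinfo lines above can be read more easily with a small helper. This
is only an illustrative sketch, not something used for the measurements: the
function name free_blocks_at_least is made up here, and it assumes the usual
/proc/buddyinfo layout in which fields 5..15 of each line hold the free block
counts for orders 0..10.

```shell
#!/bin/bash
# Illustrative helper: sum the free blocks of at least a given order from
# /proc/buddyinfo-style lines read on stdin.
# Assumption: fields 5..15 hold the counts for orders 0..10.

# free_blocks_at_least ORDER
free_blocks_at_least() {
    awk -v min="$1" '
        $1 == "Node" {
            for (i = 5 + min; i <= NF; i++) total += $i
        }
        END { print total + 0 }
    '
}

# Sample lines copied from the results in this mail.
unfragmented='Node 0, zone   Normal    551    422    160    204    193     34     15      7     22     19   6956'
fragmented='Node 0, zone   Normal  40773     37     34     29      0      0      0      0      0      0      0'

echo "order>=9 free blocks, unfragmented: $(echo "$unfragmented" | free_blocks_at_least 9)"
echo "order>=9 free blocks, fragmented:   $(echo "$fragmented" | free_blocks_at_least 9)"
```

For the two sample lines this reports 6975 and 0 free blocks of order 9 or
higher. On a live system the same function can be fed with
"free_blocks_at_least 9 < /proc/buddyinfo" (it then sums over all zones).
Having no free order-9 blocks is consistent with the stock kernel failing
with "Cannot allocate memory" in the fragmented case, although this sketch
by itself does not prove the cause.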

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH v3 0/2] net/sctp: Avoid allocating high order memory with kmalloc()
  2018-08-10 17:03                   ` Konstantin Khorenko
@ 2018-08-10 17:11                     ` Konstantin Khorenko
  -1 siblings, 0 replies; 64+ messages in thread
From: Konstantin Khorenko @ 2018-08-10 17:11 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: oleg.babin, netdev, linux-sctp, David S . Miller, Vlad Yasevich,
	Neil Horman, Xin Long, Andrey Ryabinin, Konstantin Khorenko

Each SCTP association can have up to 65535 input and output streams.
For each stream type an array of sctp_stream_in or sctp_stream_out
structures is allocated using the kmalloc_array() function. This function
allocates physically contiguous memory regions, so this can lead
to allocation of memory regions of very high order, e.g.:

  sizeof(struct sctp_stream_out) == 24,
  ((65535 * 24) / 4096) == 383 full memory pages plus a partial one
  (4096 bytes per page), i.e. 384 pages,
  which means an order-9 allocation (512 contiguous pages).

This can lead to memory allocation failures on systems
under memory stress.
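
The arithmetic above can be reproduced with a short script. This is just an
illustrative sketch and not part of the patchset: the helper names
pages_needed and order_for_pages are made up, and a 4096-byte page is
assumed, as in the calculation above.

```shell
#!/bin/bash
# Illustrative helper (not part of the patchset): how many pages, and which
# buddy allocator order, a physically contiguous array needs.

PAGE_SIZE=4096  # assumed page size, as in the cover letter

# pages_needed COUNT ELEM_SIZE -> number of pages, rounded up
pages_needed() {
    echo $(( ($1 * $2 + PAGE_SIZE - 1) / PAGE_SIZE ))
}

# order_for_pages PAGES -> smallest order such that 2^order >= PAGES
order_for_pages() {
    order=0
    while [ $(( 1 << order )) -lt "$1" ]; do
        order=$(( order + 1 ))
    done
    echo "$order"
}

pages=$(pages_needed 65535 24)
echo "pages=$pages order=$(order_for_pages "$pages")"
```

For 65535 elements of 24 bytes each this prints "pages=384 order=9", i.e.
the request has to be served from a block of 512 contiguous pages.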

We actually do not need these arrays of memory to be physically
contiguous. A simple possible solution would be to use kvmalloc()
instead of kmalloc(), as kvmalloc() can allocate physically scattered
pages if contiguous pages are not available. But the problem
is that the allocation can happen in a softirq context with the
GFP_ATOMIC flag set, and kvmalloc() cannot be used in this scenario.

So the other possible solution is to use flexible arrays instead of
contiguous arrays of memory so that the memory would be allocated
on a per-page basis.
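
To illustrate what per-page allocation means for indexing, here is a sketch
of the arithmetic only. It is not the kernel flex_array API; the helper
names are made up, and it assumes 4096-byte pages, 24-byte elements and that
no element straddles a page boundary.

```shell
#!/bin/bash
# Illustrative arithmetic only; this is not the kernel flex_array API.
# Assumption: each 4096-byte page holds whole elements, none straddles pages.

PAGE_SIZE=4096
ELEM_SIZE=24    # sizeof(struct sctp_stream_out) quoted in the cover letter

# elems_per_page -> how many whole elements fit into one page
elems_per_page() {
    echo $(( PAGE_SIZE / ELEM_SIZE ))
}

# locate INDEX -> "page offset": which page holds the element, and the
# byte offset of the element inside that page
locate() {
    per_page=$(elems_per_page)
    echo "$(( $1 / per_page )) $(( ($1 % per_page) * ELEM_SIZE ))"
}

# pages_total COUNT -> number of separately allocated pages for COUNT elements
pages_total() {
    per_page=$(elems_per_page)
    echo $(( ($1 + per_page - 1) / per_page ))
}

echo "elements per page: $(elems_per_page)"
echo "stream 65534 lives at (page offset): $(locate 65534)"
echo "pages for 65535 streams: $(pages_total 65535)"
```

Under these assumptions 170 elements fit into a page, the last stream
(index 65534) lands on page 385 at offset 2016, and 386 single pages are
needed in total. That is slightly more memory than the contiguous array,
but every allocation is of order 0.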

This patchset replaces kmalloc_array() with flex_array usage.
It consists of two parts:

  * First patch is preparatory - it mechanically wraps all direct
    access to assoc->stream.out[] and assoc->stream.in[] arrays
    with SCTP_SO() and SCTP_SI() wrappers so that later a direct
    array access could be easily changed to an access to a
    flex_array (or any other possible alternative).
  * Second patch replaces kmalloc_array() with flex_array usage.

Oleg Babin (2):
  net/sctp: Make wrappers for accessing in/out streams
  net/sctp: Replace in/out stream arrays with flex_array

 include/net/sctp/structs.h   |  40 +++++++----
 net/sctp/chunk.c             |   6 +-
 net/sctp/outqueue.c          |  11 ++--
 net/sctp/socket.c            |   4 +-
 net/sctp/stream.c            | 153 ++++++++++++++++++++++++++++---------------
 net/sctp/stream_interleave.c |  20 +++---
 net/sctp/stream_sched.c      |  13 ++--
 net/sctp/stream_sched_prio.c |  22 +++----
 net/sctp/stream_sched_rr.c   |   8 +--
 9 files changed, 172 insertions(+), 105 deletions(-)

v2 changes:
 sctp_stream_in() users are updated to provide stream as an argument,
 sctp_stream_{in,out}_ptr() are now just sctp_stream_{in,out}().

v3 changes:
 Move type changes struct sctp_stream_out -> flex_array to next patch.
 Make sctp_stream_{in,out}() static inline and move them to a header.

Performance results (single stream):
====================================
  * Kernel: v4.18-rc6 - stock and with 2 patches from Oleg (earlier in this thread)
  * Node: CPU (8 cores): Intel(R) Xeon(R) CPU E31230 @ 3.20GHz
          RAM: 32 Gb

  * netperf: taken from https://github.com/HewlettPackard/netperf.git,
	     compiled from sources with sctp support
  * netperf server and client are run on the same node
  * ip link set lo mtu 1500

The script used to run tests:
 # cat run_tests.sh
 #!/bin/bash

for test in SCTP_STREAM SCTP_STREAM_MANY SCTP_RR SCTP_RR_MANY; do
  echo "TEST: $test";
  for i in `seq 1 3`; do
    echo "Iteration: $i";
    set -x
    netperf -t $test -H localhost -p 22222 -S 200000,200000 -s 200000,200000 \
            -l 60 -- -m 1452;
    set +x
  done
done
================================================

Results (a bit reformatted to be more readable):
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

				v4.18-rc7	v4.18-rc7 + fixes
TEST: SCTP_STREAM
212992 212992   1452    60.21	1125.52		1247.04
212992 212992   1452    60.20	1376.38		1149.95
212992 212992   1452    60.20	1131.40		1163.85
TEST: SCTP_STREAM_MANY
212992 212992   1452    60.00	1111.00		1310.05
212992 212992   1452    60.00	1188.55		1130.50
212992 212992   1452    60.00	1108.06		1162.50

===========
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

					v4.18-rc7	v4.18-rc7 + fixes
TEST: SCTP_RR
212992 212992 1        1       60.00	45486.98	46089.43
212992 212992 1        1       60.00	45584.18	45994.21
212992 212992 1        1       60.00	45703.86	45720.84
TEST: SCTP_RR_MANY
212992 212992 1        1       60.00	40.75		40.77
212992 212992 1        1       60.00	40.58		40.08
212992 212992 1        1       60.00	39.98		39.97

Performance results for many streams:
=====================================
   * Kernel: v4.18-rc8 - stock and with 2 patches v3
   * Node: CPU (8 cores): Intel(R) Xeon(R) CPU E31230 @ 3.20GHz
           RAM: 32 Gb

   * sctp_test: https://github.com/sctp/lksctp-tools
   * both server and client are run on the same node
   * ip link set lo mtu 1500
   * sysctl -w vm.max_map_count=65530000 (needed to make memory fragmented)

The script used to run tests:
=============================
 # cat run_sctp_test.sh
 #!/bin/bash

set -x

uname -r
ip link set lo mtu 1500
swapoff -a

free
cat /proc/buddyinfo

./src/apps/sctp_test -H 127.0.0.1 -P 22222 -l -d 0 &
sleep 3

time ./src/apps/sctp_test -H 127.0.0.1 -P 22221 -h 127.0.0.1 -p 22222 \
         -s -c 1 -M 65535 -T -t 1 -x 100000 -d 0 1>/dev/null

killall -9 lt-sctp_test
===============================

Results (a bit reformatted to be more readable):

1) stock kernel v4.18-rc8, no memory fragmentation
	test 1		test 2		test 3
real    0m14.715s	0m14.593s	0m15.954s
user    0m0.954s	0m0.955s	0m0.854s
sys     0m13.388s	0m12.537s	0m13.749s

2) kernel with fixes, no memory fragmentation
	test 1		test 2		test 3
real    0m14.959s	0m14.693s	0m14.762s
user    0m0.948s	0m0.921s	0m0.929s
sys     0m13.538s	0m13.225s	0m13.217s

3) kernel with fixes, memory fragmented
'free':
               total        used        free      shared  buff/cache   available
Mem:       32906008    30555200      302740         764     2048068      266452
Mem:       32906008    30379948      541436         764     1984624      442376
Mem:       32906008    30717312      262380         764     1926316      109908

/proc/buddyinfo:
Node 0, zone   Normal  40773     37     34     29      0      0      0      0      0      0      0
Node 0, zone   Normal 100332     68      8      4      2      1      1      0      0      0      0
Node 0, zone   Normal  31113      7      2      1      0      0      0      0      0      0      0

	test 1		test 2		test 3
real    0m14.159s	0m15.252s	0m15.826s
user    0m0.839s	0m1.004s	0m1.048s
sys     0m11.827s	0m14.240s	0m14.778s
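
To compare the three configurations at a glance, the "real" times can be
averaged. The helper below is only an illustrative sketch (the name
mean_seconds is made up); it assumes the XmY.YYYs format printed by the
bash "time" builtin, as in the tables above.

```shell
#!/bin/bash
# Illustrative helper: average the "real" times reported by `time`,
# given values in the XmY.YYYs format used in the tables above.

# mean_seconds T1 T2 ... -> mean in seconds, three decimals
mean_seconds() {
    printf '%s\n' "$@" | LC_ALL=C awk -F'[ms]' '
        { sum += $1 * 60 + $2; n++ }
        END { if (n) printf "%.3f\n", sum / n }
    '
}

echo "stock, unfragmented:   $(mean_seconds 0m14.715s 0m14.593s 0m15.954s)"
echo "patched, unfragmented: $(mean_seconds 0m14.959s 0m14.693s 0m14.762s)"
echo "patched, fragmented:   $(mean_seconds 0m14.159s 0m15.252s 0m15.826s)"
```

This gives means of 15.087, 14.805 and 15.079 seconds. The differences
between the means (about 0.3 s) are smaller than the spread between
individual runs (roughly 14.2 s to 16.0 s), so with three runs per
configuration these numbers do not indicate a regression.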

-- 
2.15.1

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH v3 1/2] net/sctp: Make wrappers for accessing in/out streams
  2018-08-10 17:11                     ` Konstantin Khorenko
@ 2018-08-10 17:11                       ` Konstantin Khorenko
  -1 siblings, 0 replies; 64+ messages in thread
From: Konstantin Khorenko @ 2018-08-10 17:11 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: oleg.babin, netdev, linux-sctp, David S . Miller, Vlad Yasevich,
	Neil Horman, Xin Long, Andrey Ryabinin, Konstantin Khorenko

This patch introduces wrappers for accessing in/out streams indirectly.
This will make it possible to replace physically contiguous memory arrays
of streams with flexible arrays (or maybe any other appropriate
mechanism) which do memory allocation on a per-page basis.

Signed-off-by: Oleg Babin <obabin@virtuozzo.com>
Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>

---
v2 changes:
 sctp_stream_in() users are updated to provide stream as an argument,
 sctp_stream_{in,out}_ptr() are now just sctp_stream_{in,out}().

v3 changes:
 Move type changes struct sctp_stream_out -> flex_array to next patch.
 Make sctp_stream_{in,out}() static inline and move them to a header.
---
 include/net/sctp/structs.h   | 35 +++++++++++++++++-------
 net/sctp/chunk.c             |  6 ++--
 net/sctp/outqueue.c          | 11 ++++----
 net/sctp/socket.c            |  4 +--
 net/sctp/stream.c            | 65 +++++++++++++++++++++++---------------------
 net/sctp/stream_interleave.c | 20 +++++++-------
 net/sctp/stream_sched.c      | 13 +++++----
 net/sctp/stream_sched_prio.c | 22 +++++++--------
 net/sctp/stream_sched_rr.c   |  8 +++---
 9 files changed, 103 insertions(+), 81 deletions(-)

diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index dbe1b911a24d..ce4bf844f573 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -394,37 +394,35 @@ void sctp_stream_update(struct sctp_stream *stream, struct sctp_stream *new);
 
 /* What is the current SSN number for this stream? */
 #define sctp_ssn_peek(stream, type, sid) \
-	((stream)->type[sid].ssn)
+	(sctp_stream_##type((stream), (sid))->ssn)
 
 /* Return the next SSN number for this stream.	*/
 #define sctp_ssn_next(stream, type, sid) \
-	((stream)->type[sid].ssn++)
+	(sctp_stream_##type((stream), (sid))->ssn++)
 
 /* Skip over this ssn and all below. */
 #define sctp_ssn_skip(stream, type, sid, ssn) \
-	((stream)->type[sid].ssn = ssn + 1)
+	(sctp_stream_##type((stream), (sid))->ssn = ssn + 1)
 
 /* What is the current MID number for this stream? */
 #define sctp_mid_peek(stream, type, sid) \
-	((stream)->type[sid].mid)
+	(sctp_stream_##type((stream), (sid))->mid)
 
 /* Return the next MID number for this stream.  */
 #define sctp_mid_next(stream, type, sid) \
-	((stream)->type[sid].mid++)
+	(sctp_stream_##type((stream), (sid))->mid++)
 
 /* Skip over this mid and all below. */
 #define sctp_mid_skip(stream, type, sid, mid) \
-	((stream)->type[sid].mid = mid + 1)
-
-#define sctp_stream_in(asoc, sid) (&(asoc)->stream.in[sid])
+	(sctp_stream_##type((stream), (sid))->mid = mid + 1)
 
 /* What is the current MID_uo number for this stream? */
 #define sctp_mid_uo_peek(stream, type, sid) \
-	((stream)->type[sid].mid_uo)
+	(sctp_stream_##type((stream), (sid))->mid_uo)
 
 /* Return the next MID_uo number for this stream.  */
 #define sctp_mid_uo_next(stream, type, sid) \
-	((stream)->type[sid].mid_uo++)
+	(sctp_stream_##type((stream), (sid))->mid_uo++)
 
 /*
  * Pointers to address related SCTP functions.
@@ -1456,6 +1454,23 @@ struct sctp_stream {
 	struct sctp_stream_interleave *si;
 };
 
+static inline struct sctp_stream_out *sctp_stream_out(
+	const struct sctp_stream *stream,
+	__u16 sid)
+{
+	return ((struct sctp_stream_out *)(stream->out)) + sid;
+}
+
+static inline struct sctp_stream_in *sctp_stream_in(
+	const struct sctp_stream *stream,
+	__u16 sid)
+{
+	return ((struct sctp_stream_in *)(stream->in)) + sid;
+}
+
+#define SCTP_SO(s, i) sctp_stream_out((s), (i))
+#define SCTP_SI(s, i) sctp_stream_in((s), (i))
+
 #define SCTP_STREAM_CLOSED		0x00
 #define SCTP_STREAM_OPEN		0x01
 
diff --git a/net/sctp/chunk.c b/net/sctp/chunk.c
index bfb9f812e2ef..ce8087846f05 100644
--- a/net/sctp/chunk.c
+++ b/net/sctp/chunk.c
@@ -325,7 +325,8 @@ int sctp_chunk_abandoned(struct sctp_chunk *chunk)
 	if (SCTP_PR_TTL_ENABLED(chunk->sinfo.sinfo_flags) &&
 	    time_after(jiffies, chunk->msg->expires_at)) {
 		struct sctp_stream_out *streamout =
-			&chunk->asoc->stream.out[chunk->sinfo.sinfo_stream];
+			SCTP_SO(&chunk->asoc->stream,
+				chunk->sinfo.sinfo_stream);
 
 		if (chunk->sent_count) {
 			chunk->asoc->abandoned_sent[SCTP_PR_INDEX(TTL)]++;
@@ -339,7 +340,8 @@ int sctp_chunk_abandoned(struct sctp_chunk *chunk)
 	} else if (SCTP_PR_RTX_ENABLED(chunk->sinfo.sinfo_flags) &&
 		   chunk->sent_count > chunk->sinfo.sinfo_timetolive) {
 		struct sctp_stream_out *streamout =
-			&chunk->asoc->stream.out[chunk->sinfo.sinfo_stream];
+			SCTP_SO(&chunk->asoc->stream,
+				chunk->sinfo.sinfo_stream);
 
 		chunk->asoc->abandoned_sent[SCTP_PR_INDEX(RTX)]++;
 		streamout->ext->abandoned_sent[SCTP_PR_INDEX(RTX)]++;
diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
index d68aa33485a9..d74d00b29942 100644
--- a/net/sctp/outqueue.c
+++ b/net/sctp/outqueue.c
@@ -80,7 +80,7 @@ static inline void sctp_outq_head_data(struct sctp_outq *q,
 	q->out_qlen += ch->skb->len;
 
 	stream = sctp_chunk_stream_no(ch);
-	oute = q->asoc->stream.out[stream].ext;
+	oute = SCTP_SO(&q->asoc->stream, stream)->ext;
 	list_add(&ch->stream_list, &oute->outq);
 }
 
@@ -101,7 +101,7 @@ static inline void sctp_outq_tail_data(struct sctp_outq *q,
 	q->out_qlen += ch->skb->len;
 
 	stream = sctp_chunk_stream_no(ch);
-	oute = q->asoc->stream.out[stream].ext;
+	oute = SCTP_SO(&q->asoc->stream, stream)->ext;
 	list_add_tail(&ch->stream_list, &oute->outq);
 }
 
@@ -372,7 +372,7 @@ static int sctp_prsctp_prune_sent(struct sctp_association *asoc,
 		sctp_insert_list(&asoc->outqueue.abandoned,
 				 &chk->transmitted_list);
 
-		streamout = &asoc->stream.out[chk->sinfo.sinfo_stream];
+		streamout = SCTP_SO(&asoc->stream, chk->sinfo.sinfo_stream);
 		asoc->sent_cnt_removable--;
 		asoc->abandoned_sent[SCTP_PR_INDEX(PRIO)]++;
 		streamout->ext->abandoned_sent[SCTP_PR_INDEX(PRIO)]++;
@@ -416,7 +416,7 @@ static int sctp_prsctp_prune_unsent(struct sctp_association *asoc,
 		asoc->abandoned_unsent[SCTP_PR_INDEX(PRIO)]++;
 		if (chk->sinfo.sinfo_stream < asoc->stream.outcnt) {
 			struct sctp_stream_out *streamout =
-				&asoc->stream.out[chk->sinfo.sinfo_stream];
+				SCTP_SO(&asoc->stream, chk->sinfo.sinfo_stream);
 
 			streamout->ext->abandoned_unsent[SCTP_PR_INDEX(PRIO)]++;
 		}
@@ -1082,6 +1082,7 @@ static void sctp_outq_flush_data(struct sctp_flush_ctx *ctx,
 	/* Finally, transmit new packets.  */
 	while ((chunk = sctp_outq_dequeue_data(ctx->q)) != NULL) {
 		__u32 sid = ntohs(chunk->subh.data_hdr->stream);
+		__u8 stream_state = SCTP_SO(&ctx->asoc->stream, sid)->state;
 
 		/* Has this chunk expired? */
 		if (sctp_chunk_abandoned(chunk)) {
@@ -1091,7 +1092,7 @@ static void sctp_outq_flush_data(struct sctp_flush_ctx *ctx,
 			continue;
 		}
 
-		if (ctx->asoc->stream.out[sid].state == SCTP_STREAM_CLOSED) {
+		if (stream_state == SCTP_STREAM_CLOSED) {
 			sctp_outq_head_data(ctx->q, chunk);
 			break;
 		}
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index ce620e878538..4582ab25bc4e 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -1905,7 +1905,7 @@ static int sctp_sendmsg_to_asoc(struct sctp_association *asoc,
 		goto err;
 	}
 
-	if (unlikely(!asoc->stream.out[sinfo->sinfo_stream].ext)) {
+	if (unlikely(!SCTP_SO(&asoc->stream, sinfo->sinfo_stream)->ext)) {
 		err = sctp_stream_init_ext(&asoc->stream, sinfo->sinfo_stream);
 		if (err)
 			goto err;
@@ -6958,7 +6958,7 @@ static int sctp_getsockopt_pr_streamstatus(struct sock *sk, int len,
 	if (!asoc || params.sprstat_sid >= asoc->stream.outcnt)
 		goto out;
 
-	streamoute = asoc->stream.out[params.sprstat_sid].ext;
+	streamoute = SCTP_SO(&asoc->stream, params.sprstat_sid)->ext;
 	if (!streamoute) {
 		/* Not allocated yet, means all stats are 0 */
 		params.sprstat_abandoned_unsent = 0;
diff --git a/net/sctp/stream.c b/net/sctp/stream.c
index f1f1d1b232ba..7ca6fe4e7882 100644
--- a/net/sctp/stream.c
+++ b/net/sctp/stream.c
@@ -162,7 +162,7 @@ int sctp_stream_init(struct sctp_stream *stream, __u16 outcnt, __u16 incnt,
 
 	stream->outcnt = outcnt;
 	for (i = 0; i < stream->outcnt; i++)
-		stream->out[i].state = SCTP_STREAM_OPEN;
+		SCTP_SO(stream, i)->state = SCTP_STREAM_OPEN;
 
 	sched->init(stream);
 
@@ -193,7 +193,7 @@ int sctp_stream_init_ext(struct sctp_stream *stream, __u16 sid)
 	soute = kzalloc(sizeof(*soute), GFP_KERNEL);
 	if (!soute)
 		return -ENOMEM;
-	stream->out[sid].ext = soute;
+	SCTP_SO(stream, sid)->ext = soute;
 
 	return sctp_sched_init_sid(stream, sid, GFP_KERNEL);
 }
@@ -205,7 +205,7 @@ void sctp_stream_free(struct sctp_stream *stream)
 
 	sched->free(stream);
 	for (i = 0; i < stream->outcnt; i++)
-		kfree(stream->out[i].ext);
+		kfree(SCTP_SO(stream, i)->ext);
 	kfree(stream->out);
 	kfree(stream->in);
 }
@@ -215,12 +215,12 @@ void sctp_stream_clear(struct sctp_stream *stream)
 	int i;
 
 	for (i = 0; i < stream->outcnt; i++) {
-		stream->out[i].mid = 0;
-		stream->out[i].mid_uo = 0;
+		SCTP_SO(stream, i)->mid = 0;
+		SCTP_SO(stream, i)->mid_uo = 0;
 	}
 
 	for (i = 0; i < stream->incnt; i++)
-		stream->in[i].mid = 0;
+		SCTP_SI(stream, i)->mid = 0;
 }
 
 void sctp_stream_update(struct sctp_stream *stream, struct sctp_stream *new)
@@ -273,8 +273,8 @@ static bool sctp_stream_outq_is_empty(struct sctp_stream *stream,
 	for (i = 0; i < str_nums; i++) {
 		__u16 sid = ntohs(str_list[i]);
 
-		if (stream->out[sid].ext &&
-		    !list_empty(&stream->out[sid].ext->outq))
+		if (SCTP_SO(stream, sid)->ext &&
+		    !list_empty(&SCTP_SO(stream, sid)->ext->outq))
 			return false;
 	}
 
@@ -361,11 +361,11 @@ int sctp_send_reset_streams(struct sctp_association *asoc,
 	if (out) {
 		if (str_nums)
 			for (i = 0; i < str_nums; i++)
-				stream->out[str_list[i]].state =
+				SCTP_SO(stream, str_list[i])->state =
 						       SCTP_STREAM_CLOSED;
 		else
 			for (i = 0; i < stream->outcnt; i++)
-				stream->out[i].state = SCTP_STREAM_CLOSED;
+				SCTP_SO(stream, i)->state = SCTP_STREAM_CLOSED;
 	}
 
 	asoc->strreset_chunk = chunk;
@@ -380,11 +380,11 @@ int sctp_send_reset_streams(struct sctp_association *asoc,
 
 		if (str_nums)
 			for (i = 0; i < str_nums; i++)
-				stream->out[str_list[i]].state =
+				SCTP_SO(stream, str_list[i])->state =
 						       SCTP_STREAM_OPEN;
 		else
 			for (i = 0; i < stream->outcnt; i++)
-				stream->out[i].state = SCTP_STREAM_OPEN;
+				SCTP_SO(stream, i)->state = SCTP_STREAM_OPEN;
 
 		goto out;
 	}
@@ -418,7 +418,7 @@ int sctp_send_reset_assoc(struct sctp_association *asoc)
 
 	/* Block further xmit of data until this request is completed */
 	for (i = 0; i < stream->outcnt; i++)
-		stream->out[i].state = SCTP_STREAM_CLOSED;
+		SCTP_SO(stream, i)->state = SCTP_STREAM_CLOSED;
 
 	asoc->strreset_chunk = chunk;
 	sctp_chunk_hold(asoc->strreset_chunk);
@@ -429,7 +429,7 @@ int sctp_send_reset_assoc(struct sctp_association *asoc)
 		asoc->strreset_chunk = NULL;
 
 		for (i = 0; i < stream->outcnt; i++)
-			stream->out[i].state = SCTP_STREAM_OPEN;
+			SCTP_SO(stream, i)->state = SCTP_STREAM_OPEN;
 
 		return retval;
 	}
@@ -609,10 +609,10 @@ struct sctp_chunk *sctp_process_strreset_outreq(
 		}
 
 		for (i = 0; i < nums; i++)
-			stream->in[ntohs(str_p[i])].mid = 0;
+			SCTP_SI(stream, ntohs(str_p[i]))->mid = 0;
 	} else {
 		for (i = 0; i < stream->incnt; i++)
-			stream->in[i].mid = 0;
+			SCTP_SI(stream, i)->mid = 0;
 	}
 
 	result = SCTP_STRRESET_PERFORMED;
@@ -683,11 +683,11 @@ struct sctp_chunk *sctp_process_strreset_inreq(
 
 	if (nums)
 		for (i = 0; i < nums; i++)
-			stream->out[ntohs(str_p[i])].state =
+			SCTP_SO(stream, ntohs(str_p[i]))->state =
 					       SCTP_STREAM_CLOSED;
 	else
 		for (i = 0; i < stream->outcnt; i++)
-			stream->out[i].state = SCTP_STREAM_CLOSED;
+			SCTP_SO(stream, i)->state = SCTP_STREAM_CLOSED;
 
 	asoc->strreset_chunk = chunk;
 	asoc->strreset_outstanding = 1;
@@ -786,11 +786,11 @@ struct sctp_chunk *sctp_process_strreset_tsnreq(
 	 *      incoming and outgoing streams.
 	 */
 	for (i = 0; i < stream->outcnt; i++) {
-		stream->out[i].mid = 0;
-		stream->out[i].mid_uo = 0;
+		SCTP_SO(stream, i)->mid = 0;
+		SCTP_SO(stream, i)->mid_uo = 0;
 	}
 	for (i = 0; i < stream->incnt; i++)
-		stream->in[i].mid = 0;
+		SCTP_SI(stream, i)->mid = 0;
 
 	result = SCTP_STRRESET_PERFORMED;
 
@@ -979,15 +979,18 @@ struct sctp_chunk *sctp_process_strreset_resp(
 		       sizeof(__u16);
 
 		if (result == SCTP_STRRESET_PERFORMED) {
+			struct sctp_stream_out *sout;
 			if (nums) {
 				for (i = 0; i < nums; i++) {
-					stream->out[ntohs(str_p[i])].mid = 0;
-					stream->out[ntohs(str_p[i])].mid_uo = 0;
+					sout = SCTP_SO(stream, ntohs(str_p[i]));
+					sout->mid = 0;
+					sout->mid_uo = 0;
 				}
 			} else {
 				for (i = 0; i < stream->outcnt; i++) {
-					stream->out[i].mid = 0;
-					stream->out[i].mid_uo = 0;
+					sout = SCTP_SO(stream, i);
+					sout->mid = 0;
+					sout->mid_uo = 0;
 				}
 			}
 
@@ -995,7 +998,7 @@ struct sctp_chunk *sctp_process_strreset_resp(
 		}
 
 		for (i = 0; i < stream->outcnt; i++)
-			stream->out[i].state = SCTP_STREAM_OPEN;
+			SCTP_SO(stream, i)->state = SCTP_STREAM_OPEN;
 
 		*evp = sctp_ulpevent_make_stream_reset_event(asoc, flags,
 			nums, str_p, GFP_ATOMIC);
@@ -1050,15 +1053,15 @@ struct sctp_chunk *sctp_process_strreset_resp(
 			asoc->adv_peer_ack_point = asoc->ctsn_ack_point;
 
 			for (i = 0; i < stream->outcnt; i++) {
-				stream->out[i].mid = 0;
-				stream->out[i].mid_uo = 0;
+				SCTP_SO(stream, i)->mid = 0;
+				SCTP_SO(stream, i)->mid_uo = 0;
 			}
 			for (i = 0; i < stream->incnt; i++)
-				stream->in[i].mid = 0;
+				SCTP_SI(stream, i)->mid = 0;
 		}
 
 		for (i = 0; i < stream->outcnt; i++)
-			stream->out[i].state = SCTP_STREAM_OPEN;
+			SCTP_SO(stream, i)->state = SCTP_STREAM_OPEN;
 
 		*evp = sctp_ulpevent_make_assoc_reset_event(asoc, flags,
 			stsn, rtsn, GFP_ATOMIC);
@@ -1072,7 +1075,7 @@ struct sctp_chunk *sctp_process_strreset_resp(
 
 		if (result == SCTP_STRRESET_PERFORMED)
 			for (i = number; i < stream->outcnt; i++)
-				stream->out[i].state = SCTP_STREAM_OPEN;
+				SCTP_SO(stream, i)->state = SCTP_STREAM_OPEN;
 		else
 			stream->outcnt = number;
 
diff --git a/net/sctp/stream_interleave.c b/net/sctp/stream_interleave.c
index d3764c181299..0a78cdf86463 100644
--- a/net/sctp/stream_interleave.c
+++ b/net/sctp/stream_interleave.c
@@ -197,7 +197,7 @@ static struct sctp_ulpevent *sctp_intl_retrieve_partial(
 	__u32 next_fsn = 0;
 	int is_last = 0;
 
-	sin = sctp_stream_in(ulpq->asoc, event->stream);
+	sin = sctp_stream_in(&ulpq->asoc->stream, event->stream);
 
 	skb_queue_walk(&ulpq->reasm, pos) {
 		struct sctp_ulpevent *cevent = sctp_skb2event(pos);
@@ -278,7 +278,7 @@ static struct sctp_ulpevent *sctp_intl_retrieve_reassembled(
 	__u32 pd_len = 0;
 	__u32 mid = 0;
 
-	sin = sctp_stream_in(ulpq->asoc, event->stream);
+	sin = sctp_stream_in(&ulpq->asoc->stream, event->stream);
 
 	skb_queue_walk(&ulpq->reasm, pos) {
 		struct sctp_ulpevent *cevent = sctp_skb2event(pos);
@@ -368,7 +368,7 @@ static struct sctp_ulpevent *sctp_intl_reasm(struct sctp_ulpq *ulpq,
 
 	sctp_intl_store_reasm(ulpq, event);
 
-	sin = sctp_stream_in(ulpq->asoc, event->stream);
+	sin = sctp_stream_in(&ulpq->asoc->stream, event->stream);
 	if (sin->pd_mode && event->mid == sin->mid &&
 	    event->fsn == sin->fsn)
 		retval = sctp_intl_retrieve_partial(ulpq, event);
@@ -575,7 +575,7 @@ static struct sctp_ulpevent *sctp_intl_retrieve_partial_uo(
 	__u32 next_fsn = 0;
 	int is_last = 0;
 
-	sin = sctp_stream_in(ulpq->asoc, event->stream);
+	sin = sctp_stream_in(&ulpq->asoc->stream, event->stream);
 
 	skb_queue_walk(&ulpq->reasm_uo, pos) {
 		struct sctp_ulpevent *cevent = sctp_skb2event(pos);
@@ -659,7 +659,7 @@ static struct sctp_ulpevent *sctp_intl_retrieve_reassembled_uo(
 	__u32 pd_len = 0;
 	__u32 mid = 0;
 
-	sin = sctp_stream_in(ulpq->asoc, event->stream);
+	sin = sctp_stream_in(&ulpq->asoc->stream, event->stream);
 
 	skb_queue_walk(&ulpq->reasm_uo, pos) {
 		struct sctp_ulpevent *cevent = sctp_skb2event(pos);
@@ -750,7 +750,7 @@ static struct sctp_ulpevent *sctp_intl_reasm_uo(struct sctp_ulpq *ulpq,
 
 	sctp_intl_store_reasm_uo(ulpq, event);
 
-	sin = sctp_stream_in(ulpq->asoc, event->stream);
+	sin = sctp_stream_in(&ulpq->asoc->stream, event->stream);
 	if (sin->pd_mode_uo && event->mid == sin->mid_uo &&
 	    event->fsn == sin->fsn_uo)
 		retval = sctp_intl_retrieve_partial_uo(ulpq, event);
@@ -774,7 +774,7 @@ static struct sctp_ulpevent *sctp_intl_retrieve_first_uo(struct sctp_ulpq *ulpq)
 	skb_queue_walk(&ulpq->reasm_uo, pos) {
 		struct sctp_ulpevent *cevent = sctp_skb2event(pos);
 
-		csin = sctp_stream_in(ulpq->asoc, cevent->stream);
+		csin = sctp_stream_in(&ulpq->asoc->stream, cevent->stream);
 		if (csin->pd_mode_uo)
 			continue;
 
@@ -875,7 +875,7 @@ static struct sctp_ulpevent *sctp_intl_retrieve_first(struct sctp_ulpq *ulpq)
 	skb_queue_walk(&ulpq->reasm, pos) {
 		struct sctp_ulpevent *cevent = sctp_skb2event(pos);
 
-		csin = sctp_stream_in(ulpq->asoc, cevent->stream);
+		csin = sctp_stream_in(&ulpq->asoc->stream, cevent->stream);
 		if (csin->pd_mode)
 			continue;
 
@@ -1053,7 +1053,7 @@ static void sctp_intl_abort_pd(struct sctp_ulpq *ulpq, gfp_t gfp)
 	__u16 sid;
 
 	for (sid = 0; sid < stream->incnt; sid++) {
-		struct sctp_stream_in *sin = &stream->in[sid];
+		struct sctp_stream_in *sin = SCTP_SI(stream, sid);
 		__u32 mid;
 
 		if (sin->pd_mode_uo) {
@@ -1247,7 +1247,7 @@ static void sctp_handle_fwdtsn(struct sctp_ulpq *ulpq, struct sctp_chunk *chunk)
 static void sctp_intl_skip(struct sctp_ulpq *ulpq, __u16 sid, __u32 mid,
 			   __u8 flags)
 {
-	struct sctp_stream_in *sin = sctp_stream_in(ulpq->asoc, sid);
+	struct sctp_stream_in *sin = sctp_stream_in(&ulpq->asoc->stream, sid);
 	struct sctp_stream *stream  = &ulpq->asoc->stream;
 
 	if (flags & SCTP_FTSN_U_BIT) {
diff --git a/net/sctp/stream_sched.c b/net/sctp/stream_sched.c
index f5fcd425232a..a6c04a94b08f 100644
--- a/net/sctp/stream_sched.c
+++ b/net/sctp/stream_sched.c
@@ -161,7 +161,7 @@ int sctp_sched_set_sched(struct sctp_association *asoc,
 
 		/* Give the next scheduler a clean slate. */
 		for (i = 0; i < asoc->stream.outcnt; i++) {
-			void *p = asoc->stream.out[i].ext;
+			void *p = SCTP_SO(&asoc->stream, i)->ext;
 
 			if (!p)
 				continue;
@@ -175,7 +175,7 @@ int sctp_sched_set_sched(struct sctp_association *asoc,
 	asoc->outqueue.sched = n;
 	n->init(&asoc->stream);
 	for (i = 0; i < asoc->stream.outcnt; i++) {
-		if (!asoc->stream.out[i].ext)
+		if (!SCTP_SO(&asoc->stream, i)->ext)
 			continue;
 
 		ret = n->init_sid(&asoc->stream, i, GFP_KERNEL);
@@ -217,7 +217,7 @@ int sctp_sched_set_value(struct sctp_association *asoc, __u16 sid,
 	if (sid >= asoc->stream.outcnt)
 		return -EINVAL;
 
-	if (!asoc->stream.out[sid].ext) {
+	if (!SCTP_SO(&asoc->stream, sid)->ext) {
 		int ret;
 
 		ret = sctp_stream_init_ext(&asoc->stream, sid);
@@ -234,7 +234,7 @@ int sctp_sched_get_value(struct sctp_association *asoc, __u16 sid,
 	if (sid >= asoc->stream.outcnt)
 		return -EINVAL;
 
-	if (!asoc->stream.out[sid].ext)
+	if (!SCTP_SO(&asoc->stream, sid)->ext)
 		return 0;
 
 	return asoc->outqueue.sched->get(&asoc->stream, sid, value);
@@ -252,7 +252,7 @@ void sctp_sched_dequeue_done(struct sctp_outq *q, struct sctp_chunk *ch)
 		 * priority stream comes in.
 		 */
 		sid = sctp_chunk_stream_no(ch);
-		sout = &q->asoc->stream.out[sid];
+		sout = SCTP_SO(&q->asoc->stream, sid);
 		q->asoc->stream.out_curr = sout;
 		return;
 	}
@@ -272,8 +272,9 @@ void sctp_sched_dequeue_common(struct sctp_outq *q, struct sctp_chunk *ch)
 int sctp_sched_init_sid(struct sctp_stream *stream, __u16 sid, gfp_t gfp)
 {
 	struct sctp_sched_ops *sched = sctp_sched_ops_from_stream(stream);
+	struct sctp_stream_out_ext *ext = SCTP_SO(stream, sid)->ext;
 
-	INIT_LIST_HEAD(&stream->out[sid].ext->outq);
+	INIT_LIST_HEAD(&ext->outq);
 	return sched->init_sid(stream, sid, gfp);
 }
 
diff --git a/net/sctp/stream_sched_prio.c b/net/sctp/stream_sched_prio.c
index 7997d35dd0fd..2245083a98f2 100644
--- a/net/sctp/stream_sched_prio.c
+++ b/net/sctp/stream_sched_prio.c
@@ -75,10 +75,10 @@ static struct sctp_stream_priorities *sctp_sched_prio_get_head(
 
 	/* No luck. So we search on all streams now. */
 	for (i = 0; i < stream->outcnt; i++) {
-		if (!stream->out[i].ext)
+		if (!SCTP_SO(stream, i)->ext)
 			continue;
 
-		p = stream->out[i].ext->prio_head;
+		p = SCTP_SO(stream, i)->ext->prio_head;
 		if (!p)
 			/* Means all other streams won't be initialized
 			 * as well.
@@ -165,7 +165,7 @@ static void sctp_sched_prio_sched(struct sctp_stream *stream,
 static int sctp_sched_prio_set(struct sctp_stream *stream, __u16 sid,
 			       __u16 prio, gfp_t gfp)
 {
-	struct sctp_stream_out *sout = &stream->out[sid];
+	struct sctp_stream_out *sout = SCTP_SO(stream, sid);
 	struct sctp_stream_out_ext *soute = sout->ext;
 	struct sctp_stream_priorities *prio_head, *old;
 	bool reschedule = false;
@@ -186,7 +186,7 @@ static int sctp_sched_prio_set(struct sctp_stream *stream, __u16 sid,
 		return 0;
 
 	for (i = 0; i < stream->outcnt; i++) {
-		soute = stream->out[i].ext;
+		soute = SCTP_SO(stream, i)->ext;
 		if (soute && soute->prio_head == old)
 			/* It's still in use, nothing else to do here. */
 			return 0;
@@ -201,7 +201,7 @@ static int sctp_sched_prio_set(struct sctp_stream *stream, __u16 sid,
 static int sctp_sched_prio_get(struct sctp_stream *stream, __u16 sid,
 			       __u16 *value)
 {
-	*value = stream->out[sid].ext->prio_head->prio;
+	*value = SCTP_SO(stream, sid)->ext->prio_head->prio;
 	return 0;
 }
 
@@ -215,7 +215,7 @@ static int sctp_sched_prio_init(struct sctp_stream *stream)
 static int sctp_sched_prio_init_sid(struct sctp_stream *stream, __u16 sid,
 				    gfp_t gfp)
 {
-	INIT_LIST_HEAD(&stream->out[sid].ext->prio_list);
+	INIT_LIST_HEAD(&SCTP_SO(stream, sid)->ext->prio_list);
 	return sctp_sched_prio_set(stream, sid, 0, gfp);
 }
 
@@ -233,9 +233,9 @@ static void sctp_sched_prio_free(struct sctp_stream *stream)
 	 */
 	sctp_sched_prio_unsched_all(stream);
 	for (i = 0; i < stream->outcnt; i++) {
-		if (!stream->out[i].ext)
+		if (!SCTP_SO(stream, i)->ext)
 			continue;
-		prio = stream->out[i].ext->prio_head;
+		prio = SCTP_SO(stream, i)->ext->prio_head;
 		if (prio && list_empty(&prio->prio_sched))
 			list_add(&prio->prio_sched, &list);
 	}
@@ -255,7 +255,7 @@ static void sctp_sched_prio_enqueue(struct sctp_outq *q,
 	ch = list_first_entry(&msg->chunks, struct sctp_chunk, frag_list);
 	sid = sctp_chunk_stream_no(ch);
 	stream = &q->asoc->stream;
-	sctp_sched_prio_sched(stream, stream->out[sid].ext);
+	sctp_sched_prio_sched(stream, SCTP_SO(stream, sid)->ext);
 }
 
 static struct sctp_chunk *sctp_sched_prio_dequeue(struct sctp_outq *q)
@@ -297,7 +297,7 @@ static void sctp_sched_prio_dequeue_done(struct sctp_outq *q,
 	 * this priority.
 	 */
 	sid = sctp_chunk_stream_no(ch);
-	soute = q->asoc->stream.out[sid].ext;
+	soute = SCTP_SO(&q->asoc->stream, sid)->ext;
 	prio = soute->prio_head;
 
 	sctp_sched_prio_next_stream(prio);
@@ -317,7 +317,7 @@ static void sctp_sched_prio_sched_all(struct sctp_stream *stream)
 		__u16 sid;
 
 		sid = sctp_chunk_stream_no(ch);
-		sout = &stream->out[sid];
+		sout = SCTP_SO(stream, sid);
 		if (sout->ext)
 			sctp_sched_prio_sched(stream, sout->ext);
 	}
diff --git a/net/sctp/stream_sched_rr.c b/net/sctp/stream_sched_rr.c
index 1155692448f1..52ba743fa7a7 100644
--- a/net/sctp/stream_sched_rr.c
+++ b/net/sctp/stream_sched_rr.c
@@ -100,7 +100,7 @@ static int sctp_sched_rr_init(struct sctp_stream *stream)
 static int sctp_sched_rr_init_sid(struct sctp_stream *stream, __u16 sid,
 				  gfp_t gfp)
 {
-	INIT_LIST_HEAD(&stream->out[sid].ext->rr_list);
+	INIT_LIST_HEAD(&SCTP_SO(stream, sid)->ext->rr_list);
 
 	return 0;
 }
@@ -120,7 +120,7 @@ static void sctp_sched_rr_enqueue(struct sctp_outq *q,
 	ch = list_first_entry(&msg->chunks, struct sctp_chunk, frag_list);
 	sid = sctp_chunk_stream_no(ch);
 	stream = &q->asoc->stream;
-	sctp_sched_rr_sched(stream, stream->out[sid].ext);
+	sctp_sched_rr_sched(stream, SCTP_SO(stream, sid)->ext);
 }
 
 static struct sctp_chunk *sctp_sched_rr_dequeue(struct sctp_outq *q)
@@ -154,7 +154,7 @@ static void sctp_sched_rr_dequeue_done(struct sctp_outq *q,
 
 	/* Last chunk on that msg, move to the next stream */
 	sid = sctp_chunk_stream_no(ch);
-	soute = q->asoc->stream.out[sid].ext;
+	soute = SCTP_SO(&q->asoc->stream, sid)->ext;
 
 	sctp_sched_rr_next_stream(&q->asoc->stream);
 
@@ -173,7 +173,7 @@ static void sctp_sched_rr_sched_all(struct sctp_stream *stream)
 		__u16 sid;
 
 		sid = sctp_chunk_stream_no(ch);
-		soute = stream->out[sid].ext;
+		soute = SCTP_SO(stream, sid)->ext;
 		if (soute)
 			sctp_sched_rr_sched(stream, soute);
 	}
-- 
2.15.1

* [PATCH v3 1/2] net/sctp: Make wrappers for accessing in/out streams
@ 2018-08-10 17:11                       ` Konstantin Khorenko
  0 siblings, 0 replies; 64+ messages in thread
From: Konstantin Khorenko @ 2018-08-10 17:11 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: oleg.babin, netdev, linux-sctp, David S . Miller, Vlad Yasevich,
	Neil Horman, Xin Long, Andrey Ryabinin, Konstantin Khorenko

This patch introduces wrappers for accessing in/out streams indirectly.
This will make it possible to replace the physically contiguous arrays
of streams with flexible arrays (or any other appropriate mechanism)
that allocate memory on a per-page basis.

Signed-off-by: Oleg Babin <obabin@virtuozzo.com>
Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>

---
v2 changes:
 sctp_stream_in() users are updated to provide stream as an argument,
 sctp_stream_{in,out}_ptr() are now just sctp_stream_{in,out}().

v3 changes:
 Move type changes struct sctp_stream_out -> flex_array to next patch.
 Make sctp_stream_{in,out}() static inline and move them to a header.
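
For readers following the thread, here is a minimal user-space sketch of
the accessor pattern this patch introduces. It is not kernel code: the
demo_* names and the DEMO_SO()/DEMO_SI() macros are hypothetical, the
structures are cut down to a few fields, calloc() stands in for the
kernel allocator, and the assert() bounds checks are a demo-only addition
that the wrappers in the patch do not have.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Cut-down stand-ins for the per-stream structures (illustrative only). */
struct demo_stream_out {
	uint32_t mid;
	uint32_t mid_uo;
	uint8_t state;
};

struct demo_stream_in {
	uint32_t mid;
};

struct demo_stream {
	struct demo_stream_out *out;	/* still one contiguous array here */
	struct demo_stream_in *in;
	uint16_t outcnt;
	uint16_t incnt;
};

/* Accessors: the only code that knows how the elements are stored. */
static inline struct demo_stream_out *demo_stream_out(
	const struct demo_stream *stream, uint16_t sid)
{
	assert(sid < stream->outcnt);	/* demo-only check */
	return stream->out + sid;
}

static inline struct demo_stream_in *demo_stream_in(
	const struct demo_stream *stream, uint16_t sid)
{
	assert(sid < stream->incnt);	/* demo-only check */
	return stream->in + sid;
}

#define DEMO_SO(s, i) demo_stream_out((s), (i))
#define DEMO_SI(s, i) demo_stream_in((s), (i))

/* Token pasting picks demo_stream_in() or demo_stream_out() from "type". */
#define demo_mid_next(stream, type, sid) \
	(demo_stream_##type((stream), (sid))->mid++)

/* Allocate zeroed arrays; both counts are assumed to be non-zero. */
static inline int demo_stream_init(struct demo_stream *stream,
				   uint16_t outcnt, uint16_t incnt)
{
	stream->out = calloc(outcnt, sizeof(*stream->out));
	stream->in = calloc(incnt, sizeof(*stream->in));
	if (!stream->out || !stream->in) {
		free(stream->out);
		free(stream->in);
		return -1;
	}
	stream->outcnt = outcnt;
	stream->incnt = incnt;
	return 0;
}

/* Reset message IDs through the wrappers, in the style of sctp_stream_clear(). */
static inline void demo_stream_clear(struct demo_stream *stream)
{
	int i;

	for (i = 0; i < stream->outcnt; i++) {
		DEMO_SO(stream, i)->mid = 0;
		DEMO_SO(stream, i)->mid_uo = 0;
	}
	for (i = 0; i < stream->incnt; i++)
		DEMO_SI(stream, i)->mid = 0;
}

static inline void demo_stream_free(struct demo_stream *stream)
{
	free(stream->out);
	free(stream->in);
}
```

Since callers only go through DEMO_SO()/DEMO_SI(), changing the storage
behind the two accessors (for example to per-page chunks) would not
require touching any call site, which is the reason for doing this step
before the flex_array conversion.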
---
 include/net/sctp/structs.h   | 35 +++++++++++++++++-------
 net/sctp/chunk.c             |  6 ++--
 net/sctp/outqueue.c          | 11 ++++----
 net/sctp/socket.c            |  4 +--
 net/sctp/stream.c            | 65 +++++++++++++++++++++++---------------------
 net/sctp/stream_interleave.c | 20 +++++++-------
 net/sctp/stream_sched.c      | 13 +++++----
 net/sctp/stream_sched_prio.c | 22 +++++++--------
 net/sctp/stream_sched_rr.c   |  8 +++---
 9 files changed, 103 insertions(+), 81 deletions(-)

diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index dbe1b911a24d..ce4bf844f573 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -394,37 +394,35 @@ void sctp_stream_update(struct sctp_stream *stream, struct sctp_stream *new);
 
 /* What is the current SSN number for this stream? */
 #define sctp_ssn_peek(stream, type, sid) \
-	((stream)->type[sid].ssn)
+	(sctp_stream_##type((stream), (sid))->ssn)
 
 /* Return the next SSN number for this stream.	*/
 #define sctp_ssn_next(stream, type, sid) \
-	((stream)->type[sid].ssn++)
+	(sctp_stream_##type((stream), (sid))->ssn++)
 
 /* Skip over this ssn and all below. */
 #define sctp_ssn_skip(stream, type, sid, ssn) \
-	((stream)->type[sid].ssn = ssn + 1)
+	(sctp_stream_##type((stream), (sid))->ssn = ssn + 1)
 
 /* What is the current MID number for this stream? */
 #define sctp_mid_peek(stream, type, sid) \
-	((stream)->type[sid].mid)
+	(sctp_stream_##type((stream), (sid))->mid)
 
 /* Return the next MID number for this stream.  */
 #define sctp_mid_next(stream, type, sid) \
-	((stream)->type[sid].mid++)
+	(sctp_stream_##type((stream), (sid))->mid++)
 
 /* Skip over this mid and all below. */
 #define sctp_mid_skip(stream, type, sid, mid) \
-	((stream)->type[sid].mid = mid + 1)
-
-#define sctp_stream_in(asoc, sid) (&(asoc)->stream.in[sid])
+	(sctp_stream_##type((stream), (sid))->mid = mid + 1)
 
 /* What is the current MID_uo number for this stream? */
 #define sctp_mid_uo_peek(stream, type, sid) \
-	((stream)->type[sid].mid_uo)
+	(sctp_stream_##type((stream), (sid))->mid_uo)
 
 /* Return the next MID_uo number for this stream.  */
 #define sctp_mid_uo_next(stream, type, sid) \
-	((stream)->type[sid].mid_uo++)
+	(sctp_stream_##type((stream), (sid))->mid_uo++)
 
 /*
  * Pointers to address related SCTP functions.
@@ -1456,6 +1454,23 @@ struct sctp_stream {
 	struct sctp_stream_interleave *si;
 };
 
+static inline struct sctp_stream_out *sctp_stream_out(
+	const struct sctp_stream *stream,
+	__u16 sid)
+{
+	return ((struct sctp_stream_out *)(stream->out)) + sid;
+}
+
+static inline struct sctp_stream_in *sctp_stream_in(
+	const struct sctp_stream *stream,
+	__u16 sid)
+{
+	return ((struct sctp_stream_in *)(stream->in)) + sid;
+}
+
+#define SCTP_SO(s, i) sctp_stream_out((s), (i))
+#define SCTP_SI(s, i) sctp_stream_in((s), (i))
+
 #define SCTP_STREAM_CLOSED		0x00
 #define SCTP_STREAM_OPEN		0x01
 
diff --git a/net/sctp/chunk.c b/net/sctp/chunk.c
index bfb9f812e2ef..ce8087846f05 100644
--- a/net/sctp/chunk.c
+++ b/net/sctp/chunk.c
@@ -325,7 +325,8 @@ int sctp_chunk_abandoned(struct sctp_chunk *chunk)
 	if (SCTP_PR_TTL_ENABLED(chunk->sinfo.sinfo_flags) &&
 	    time_after(jiffies, chunk->msg->expires_at)) {
 		struct sctp_stream_out *streamout =
-			&chunk->asoc->stream.out[chunk->sinfo.sinfo_stream];
+			SCTP_SO(&chunk->asoc->stream,
+				chunk->sinfo.sinfo_stream);
 
 		if (chunk->sent_count) {
 			chunk->asoc->abandoned_sent[SCTP_PR_INDEX(TTL)]++;
@@ -339,7 +340,8 @@ int sctp_chunk_abandoned(struct sctp_chunk *chunk)
 	} else if (SCTP_PR_RTX_ENABLED(chunk->sinfo.sinfo_flags) &&
 		   chunk->sent_count > chunk->sinfo.sinfo_timetolive) {
 		struct sctp_stream_out *streamout =
-			&chunk->asoc->stream.out[chunk->sinfo.sinfo_stream];
+			SCTP_SO(&chunk->asoc->stream,
+				chunk->sinfo.sinfo_stream);
 
 		chunk->asoc->abandoned_sent[SCTP_PR_INDEX(RTX)]++;
 		streamout->ext->abandoned_sent[SCTP_PR_INDEX(RTX)]++;
diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
index d68aa33485a9..d74d00b29942 100644
--- a/net/sctp/outqueue.c
+++ b/net/sctp/outqueue.c
@@ -80,7 +80,7 @@ static inline void sctp_outq_head_data(struct sctp_outq *q,
 	q->out_qlen += ch->skb->len;
 
 	stream = sctp_chunk_stream_no(ch);
-	oute = q->asoc->stream.out[stream].ext;
+	oute = SCTP_SO(&q->asoc->stream, stream)->ext;
 	list_add(&ch->stream_list, &oute->outq);
 }
 
@@ -101,7 +101,7 @@ static inline void sctp_outq_tail_data(struct sctp_outq *q,
 	q->out_qlen += ch->skb->len;
 
 	stream = sctp_chunk_stream_no(ch);
-	oute = q->asoc->stream.out[stream].ext;
+	oute = SCTP_SO(&q->asoc->stream, stream)->ext;
 	list_add_tail(&ch->stream_list, &oute->outq);
 }
 
@@ -372,7 +372,7 @@ static int sctp_prsctp_prune_sent(struct sctp_association *asoc,
 		sctp_insert_list(&asoc->outqueue.abandoned,
 				 &chk->transmitted_list);
 
-		streamout = &asoc->stream.out[chk->sinfo.sinfo_stream];
+		streamout = SCTP_SO(&asoc->stream, chk->sinfo.sinfo_stream);
 		asoc->sent_cnt_removable--;
 		asoc->abandoned_sent[SCTP_PR_INDEX(PRIO)]++;
 		streamout->ext->abandoned_sent[SCTP_PR_INDEX(PRIO)]++;
@@ -416,7 +416,7 @@ static int sctp_prsctp_prune_unsent(struct sctp_association *asoc,
 		asoc->abandoned_unsent[SCTP_PR_INDEX(PRIO)]++;
 		if (chk->sinfo.sinfo_stream < asoc->stream.outcnt) {
 			struct sctp_stream_out *streamout =
-				&asoc->stream.out[chk->sinfo.sinfo_stream];
+				SCTP_SO(&asoc->stream, chk->sinfo.sinfo_stream);
 
 			streamout->ext->abandoned_unsent[SCTP_PR_INDEX(PRIO)]++;
 		}
@@ -1082,6 +1082,7 @@ static void sctp_outq_flush_data(struct sctp_flush_ctx *ctx,
 	/* Finally, transmit new packets.  */
 	while ((chunk = sctp_outq_dequeue_data(ctx->q)) != NULL) {
 		__u32 sid = ntohs(chunk->subh.data_hdr->stream);
+		__u8 stream_state = SCTP_SO(&ctx->asoc->stream, sid)->state;
 
 		/* Has this chunk expired? */
 		if (sctp_chunk_abandoned(chunk)) {
@@ -1091,7 +1092,7 @@ static void sctp_outq_flush_data(struct sctp_flush_ctx *ctx,
 			continue;
 		}
 
-		if (ctx->asoc->stream.out[sid].state == SCTP_STREAM_CLOSED) {
+		if (stream_state == SCTP_STREAM_CLOSED) {
 			sctp_outq_head_data(ctx->q, chunk);
 			break;
 		}
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index ce620e878538..4582ab25bc4e 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -1905,7 +1905,7 @@ static int sctp_sendmsg_to_asoc(struct sctp_association *asoc,
 		goto err;
 	}
 
-	if (unlikely(!asoc->stream.out[sinfo->sinfo_stream].ext)) {
+	if (unlikely(!SCTP_SO(&asoc->stream, sinfo->sinfo_stream)->ext)) {
 		err = sctp_stream_init_ext(&asoc->stream, sinfo->sinfo_stream);
 		if (err)
 			goto err;
@@ -6958,7 +6958,7 @@ static int sctp_getsockopt_pr_streamstatus(struct sock *sk, int len,
 	if (!asoc || params.sprstat_sid >= asoc->stream.outcnt)
 		goto out;
 
-	streamoute = asoc->stream.out[params.sprstat_sid].ext;
+	streamoute = SCTP_SO(&asoc->stream, params.sprstat_sid)->ext;
 	if (!streamoute) {
 		/* Not allocated yet, means all stats are 0 */
 		params.sprstat_abandoned_unsent = 0;
diff --git a/net/sctp/stream.c b/net/sctp/stream.c
index f1f1d1b232ba..7ca6fe4e7882 100644
--- a/net/sctp/stream.c
+++ b/net/sctp/stream.c
@@ -162,7 +162,7 @@ int sctp_stream_init(struct sctp_stream *stream, __u16 outcnt, __u16 incnt,
 
 	stream->outcnt = outcnt;
 	for (i = 0; i < stream->outcnt; i++)
-		stream->out[i].state = SCTP_STREAM_OPEN;
+		SCTP_SO(stream, i)->state = SCTP_STREAM_OPEN;
 
 	sched->init(stream);
 
@@ -193,7 +193,7 @@ int sctp_stream_init_ext(struct sctp_stream *stream, __u16 sid)
 	soute = kzalloc(sizeof(*soute), GFP_KERNEL);
 	if (!soute)
 		return -ENOMEM;
-	stream->out[sid].ext = soute;
+	SCTP_SO(stream, sid)->ext = soute;
 
 	return sctp_sched_init_sid(stream, sid, GFP_KERNEL);
 }
@@ -205,7 +205,7 @@ void sctp_stream_free(struct sctp_stream *stream)
 
 	sched->free(stream);
 	for (i = 0; i < stream->outcnt; i++)
-		kfree(stream->out[i].ext);
+		kfree(SCTP_SO(stream, i)->ext);
 	kfree(stream->out);
 	kfree(stream->in);
 }
@@ -215,12 +215,12 @@ void sctp_stream_clear(struct sctp_stream *stream)
 	int i;
 
 	for (i = 0; i < stream->outcnt; i++) {
-		stream->out[i].mid = 0;
-		stream->out[i].mid_uo = 0;
+		SCTP_SO(stream, i)->mid = 0;
+		SCTP_SO(stream, i)->mid_uo = 0;
 	}
 
 	for (i = 0; i < stream->incnt; i++)
-		stream->in[i].mid = 0;
+		SCTP_SI(stream, i)->mid = 0;
 }
 
 void sctp_stream_update(struct sctp_stream *stream, struct sctp_stream *new)
@@ -273,8 +273,8 @@ static bool sctp_stream_outq_is_empty(struct sctp_stream *stream,
 	for (i = 0; i < str_nums; i++) {
 		__u16 sid = ntohs(str_list[i]);
 
-		if (stream->out[sid].ext &&
-		    !list_empty(&stream->out[sid].ext->outq))
+		if (SCTP_SO(stream, sid)->ext &&
+		    !list_empty(&SCTP_SO(stream, sid)->ext->outq))
 			return false;
 	}
 
@@ -361,11 +361,11 @@ int sctp_send_reset_streams(struct sctp_association *asoc,
 	if (out) {
 		if (str_nums)
 			for (i = 0; i < str_nums; i++)
-				stream->out[str_list[i]].state =
+				SCTP_SO(stream, str_list[i])->state =
 						       SCTP_STREAM_CLOSED;
 		else
 			for (i = 0; i < stream->outcnt; i++)
-				stream->out[i].state = SCTP_STREAM_CLOSED;
+				SCTP_SO(stream, i)->state = SCTP_STREAM_CLOSED;
 	}
 
 	asoc->strreset_chunk = chunk;
@@ -380,11 +380,11 @@ int sctp_send_reset_streams(struct sctp_association *asoc,
 
 		if (str_nums)
 			for (i = 0; i < str_nums; i++)
-				stream->out[str_list[i]].state =
+				SCTP_SO(stream, str_list[i])->state =
 						       SCTP_STREAM_OPEN;
 		else
 			for (i = 0; i < stream->outcnt; i++)
-				stream->out[i].state = SCTP_STREAM_OPEN;
+				SCTP_SO(stream, i)->state = SCTP_STREAM_OPEN;
 
 		goto out;
 	}
@@ -418,7 +418,7 @@ int sctp_send_reset_assoc(struct sctp_association *asoc)
 
 	/* Block further xmit of data until this request is completed */
 	for (i = 0; i < stream->outcnt; i++)
-		stream->out[i].state = SCTP_STREAM_CLOSED;
+		SCTP_SO(stream, i)->state = SCTP_STREAM_CLOSED;
 
 	asoc->strreset_chunk = chunk;
 	sctp_chunk_hold(asoc->strreset_chunk);
@@ -429,7 +429,7 @@ int sctp_send_reset_assoc(struct sctp_association *asoc)
 		asoc->strreset_chunk = NULL;
 
 		for (i = 0; i < stream->outcnt; i++)
-			stream->out[i].state = SCTP_STREAM_OPEN;
+			SCTP_SO(stream, i)->state = SCTP_STREAM_OPEN;
 
 		return retval;
 	}
@@ -609,10 +609,10 @@ struct sctp_chunk *sctp_process_strreset_outreq(
 		}
 
 		for (i = 0; i < nums; i++)
-			stream->in[ntohs(str_p[i])].mid = 0;
+			SCTP_SI(stream, ntohs(str_p[i]))->mid = 0;
 	} else {
 		for (i = 0; i < stream->incnt; i++)
-			stream->in[i].mid = 0;
+			SCTP_SI(stream, i)->mid = 0;
 	}
 
 	result = SCTP_STRRESET_PERFORMED;
@@ -683,11 +683,11 @@ struct sctp_chunk *sctp_process_strreset_inreq(
 
 	if (nums)
 		for (i = 0; i < nums; i++)
-			stream->out[ntohs(str_p[i])].state =
+			SCTP_SO(stream, ntohs(str_p[i]))->state =
 					       SCTP_STREAM_CLOSED;
 	else
 		for (i = 0; i < stream->outcnt; i++)
-			stream->out[i].state = SCTP_STREAM_CLOSED;
+			SCTP_SO(stream, i)->state = SCTP_STREAM_CLOSED;
 
 	asoc->strreset_chunk = chunk;
 	asoc->strreset_outstanding = 1;
@@ -786,11 +786,11 @@ struct sctp_chunk *sctp_process_strreset_tsnreq(
 	 *      incoming and outgoing streams.
 	 */
 	for (i = 0; i < stream->outcnt; i++) {
-		stream->out[i].mid = 0;
-		stream->out[i].mid_uo = 0;
+		SCTP_SO(stream, i)->mid = 0;
+		SCTP_SO(stream, i)->mid_uo = 0;
 	}
 	for (i = 0; i < stream->incnt; i++)
-		stream->in[i].mid = 0;
+		SCTP_SI(stream, i)->mid = 0;
 
 	result = SCTP_STRRESET_PERFORMED;
 
@@ -979,15 +979,18 @@ struct sctp_chunk *sctp_process_strreset_resp(
 		       sizeof(__u16);
 
 		if (result == SCTP_STRRESET_PERFORMED) {
+			struct sctp_stream_out *sout;
 			if (nums) {
 				for (i = 0; i < nums; i++) {
-					stream->out[ntohs(str_p[i])].mid = 0;
-					stream->out[ntohs(str_p[i])].mid_uo = 0;
+					sout = SCTP_SO(stream, ntohs(str_p[i]));
+					sout->mid = 0;
+					sout->mid_uo = 0;
 				}
 			} else {
 				for (i = 0; i < stream->outcnt; i++) {
-					stream->out[i].mid = 0;
-					stream->out[i].mid_uo = 0;
+					sout = SCTP_SO(stream, i);
+					sout->mid = 0;
+					sout->mid_uo = 0;
 				}
 			}
 
@@ -995,7 +998,7 @@ struct sctp_chunk *sctp_process_strreset_resp(
 		}
 
 		for (i = 0; i < stream->outcnt; i++)
-			stream->out[i].state = SCTP_STREAM_OPEN;
+			SCTP_SO(stream, i)->state = SCTP_STREAM_OPEN;
 
 		*evp = sctp_ulpevent_make_stream_reset_event(asoc, flags,
 			nums, str_p, GFP_ATOMIC);
@@ -1050,15 +1053,15 @@ struct sctp_chunk *sctp_process_strreset_resp(
 			asoc->adv_peer_ack_point = asoc->ctsn_ack_point;
 
 			for (i = 0; i < stream->outcnt; i++) {
-				stream->out[i].mid = 0;
-				stream->out[i].mid_uo = 0;
+				SCTP_SO(stream, i)->mid = 0;
+				SCTP_SO(stream, i)->mid_uo = 0;
 			}
 			for (i = 0; i < stream->incnt; i++)
-				stream->in[i].mid = 0;
+				SCTP_SI(stream, i)->mid = 0;
 		}
 
 		for (i = 0; i < stream->outcnt; i++)
-			stream->out[i].state = SCTP_STREAM_OPEN;
+			SCTP_SO(stream, i)->state = SCTP_STREAM_OPEN;
 
 		*evp = sctp_ulpevent_make_assoc_reset_event(asoc, flags,
 			stsn, rtsn, GFP_ATOMIC);
@@ -1072,7 +1075,7 @@ struct sctp_chunk *sctp_process_strreset_resp(
 
 		if (result == SCTP_STRRESET_PERFORMED)
 			for (i = number; i < stream->outcnt; i++)
-				stream->out[i].state = SCTP_STREAM_OPEN;
+				SCTP_SO(stream, i)->state = SCTP_STREAM_OPEN;
 		else
 			stream->outcnt = number;
 
diff --git a/net/sctp/stream_interleave.c b/net/sctp/stream_interleave.c
index d3764c181299..0a78cdf86463 100644
--- a/net/sctp/stream_interleave.c
+++ b/net/sctp/stream_interleave.c
@@ -197,7 +197,7 @@ static struct sctp_ulpevent *sctp_intl_retrieve_partial(
 	__u32 next_fsn = 0;
 	int is_last = 0;
 
-	sin = sctp_stream_in(ulpq->asoc, event->stream);
+	sin = sctp_stream_in(&ulpq->asoc->stream, event->stream);
 
 	skb_queue_walk(&ulpq->reasm, pos) {
 		struct sctp_ulpevent *cevent = sctp_skb2event(pos);
@@ -278,7 +278,7 @@ static struct sctp_ulpevent *sctp_intl_retrieve_reassembled(
 	__u32 pd_len = 0;
 	__u32 mid = 0;
 
-	sin = sctp_stream_in(ulpq->asoc, event->stream);
+	sin = sctp_stream_in(&ulpq->asoc->stream, event->stream);
 
 	skb_queue_walk(&ulpq->reasm, pos) {
 		struct sctp_ulpevent *cevent = sctp_skb2event(pos);
@@ -368,7 +368,7 @@ static struct sctp_ulpevent *sctp_intl_reasm(struct sctp_ulpq *ulpq,
 
 	sctp_intl_store_reasm(ulpq, event);
 
-	sin = sctp_stream_in(ulpq->asoc, event->stream);
+	sin = sctp_stream_in(&ulpq->asoc->stream, event->stream);
 	if (sin->pd_mode && event->mid == sin->mid &&
 	    event->fsn == sin->fsn)
 		retval = sctp_intl_retrieve_partial(ulpq, event);
@@ -575,7 +575,7 @@ static struct sctp_ulpevent *sctp_intl_retrieve_partial_uo(
 	__u32 next_fsn = 0;
 	int is_last = 0;
 
-	sin = sctp_stream_in(ulpq->asoc, event->stream);
+	sin = sctp_stream_in(&ulpq->asoc->stream, event->stream);
 
 	skb_queue_walk(&ulpq->reasm_uo, pos) {
 		struct sctp_ulpevent *cevent = sctp_skb2event(pos);
@@ -659,7 +659,7 @@ static struct sctp_ulpevent *sctp_intl_retrieve_reassembled_uo(
 	__u32 pd_len = 0;
 	__u32 mid = 0;
 
-	sin = sctp_stream_in(ulpq->asoc, event->stream);
+	sin = sctp_stream_in(&ulpq->asoc->stream, event->stream);
 
 	skb_queue_walk(&ulpq->reasm_uo, pos) {
 		struct sctp_ulpevent *cevent = sctp_skb2event(pos);
@@ -750,7 +750,7 @@ static struct sctp_ulpevent *sctp_intl_reasm_uo(struct sctp_ulpq *ulpq,
 
 	sctp_intl_store_reasm_uo(ulpq, event);
 
-	sin = sctp_stream_in(ulpq->asoc, event->stream);
+	sin = sctp_stream_in(&ulpq->asoc->stream, event->stream);
 	if (sin->pd_mode_uo && event->mid == sin->mid_uo &&
 	    event->fsn == sin->fsn_uo)
 		retval = sctp_intl_retrieve_partial_uo(ulpq, event);
@@ -774,7 +774,7 @@ static struct sctp_ulpevent *sctp_intl_retrieve_first_uo(struct sctp_ulpq *ulpq)
 	skb_queue_walk(&ulpq->reasm_uo, pos) {
 		struct sctp_ulpevent *cevent = sctp_skb2event(pos);
 
-		csin = sctp_stream_in(ulpq->asoc, cevent->stream);
+		csin = sctp_stream_in(&ulpq->asoc->stream, cevent->stream);
 		if (csin->pd_mode_uo)
 			continue;
 
@@ -875,7 +875,7 @@ static struct sctp_ulpevent *sctp_intl_retrieve_first(struct sctp_ulpq *ulpq)
 	skb_queue_walk(&ulpq->reasm, pos) {
 		struct sctp_ulpevent *cevent = sctp_skb2event(pos);
 
-		csin = sctp_stream_in(ulpq->asoc, cevent->stream);
+		csin = sctp_stream_in(&ulpq->asoc->stream, cevent->stream);
 		if (csin->pd_mode)
 			continue;
 
@@ -1053,7 +1053,7 @@ static void sctp_intl_abort_pd(struct sctp_ulpq *ulpq, gfp_t gfp)
 	__u16 sid;
 
 	for (sid = 0; sid < stream->incnt; sid++) {
-		struct sctp_stream_in *sin = &stream->in[sid];
+		struct sctp_stream_in *sin = SCTP_SI(stream, sid);
 		__u32 mid;
 
 		if (sin->pd_mode_uo) {
@@ -1247,7 +1247,7 @@ static void sctp_handle_fwdtsn(struct sctp_ulpq *ulpq, struct sctp_chunk *chunk)
 static void sctp_intl_skip(struct sctp_ulpq *ulpq, __u16 sid, __u32 mid,
 			   __u8 flags)
 {
-	struct sctp_stream_in *sin = sctp_stream_in(ulpq->asoc, sid);
+	struct sctp_stream_in *sin = sctp_stream_in(&ulpq->asoc->stream, sid);
 	struct sctp_stream *stream  = &ulpq->asoc->stream;
 
 	if (flags & SCTP_FTSN_U_BIT) {
diff --git a/net/sctp/stream_sched.c b/net/sctp/stream_sched.c
index f5fcd425232a..a6c04a94b08f 100644
--- a/net/sctp/stream_sched.c
+++ b/net/sctp/stream_sched.c
@@ -161,7 +161,7 @@ int sctp_sched_set_sched(struct sctp_association *asoc,
 
 		/* Give the next scheduler a clean slate. */
 		for (i = 0; i < asoc->stream.outcnt; i++) {
-			void *p = asoc->stream.out[i].ext;
+			void *p = SCTP_SO(&asoc->stream, i)->ext;
 
 			if (!p)
 				continue;
@@ -175,7 +175,7 @@ int sctp_sched_set_sched(struct sctp_association *asoc,
 	asoc->outqueue.sched = n;
 	n->init(&asoc->stream);
 	for (i = 0; i < asoc->stream.outcnt; i++) {
-		if (!asoc->stream.out[i].ext)
+		if (!SCTP_SO(&asoc->stream, i)->ext)
 			continue;
 
 		ret = n->init_sid(&asoc->stream, i, GFP_KERNEL);
@@ -217,7 +217,7 @@ int sctp_sched_set_value(struct sctp_association *asoc, __u16 sid,
 	if (sid >= asoc->stream.outcnt)
 		return -EINVAL;
 
-	if (!asoc->stream.out[sid].ext) {
+	if (!SCTP_SO(&asoc->stream, sid)->ext) {
 		int ret;
 
 		ret = sctp_stream_init_ext(&asoc->stream, sid);
@@ -234,7 +234,7 @@ int sctp_sched_get_value(struct sctp_association *asoc, __u16 sid,
 	if (sid >= asoc->stream.outcnt)
 		return -EINVAL;
 
-	if (!asoc->stream.out[sid].ext)
+	if (!SCTP_SO(&asoc->stream, sid)->ext)
 		return 0;
 
 	return asoc->outqueue.sched->get(&asoc->stream, sid, value);
@@ -252,7 +252,7 @@ void sctp_sched_dequeue_done(struct sctp_outq *q, struct sctp_chunk *ch)
 		 * priority stream comes in.
 		 */
 		sid = sctp_chunk_stream_no(ch);
-		sout = &q->asoc->stream.out[sid];
+		sout = SCTP_SO(&q->asoc->stream, sid);
 		q->asoc->stream.out_curr = sout;
 		return;
 	}
@@ -272,8 +272,9 @@ void sctp_sched_dequeue_common(struct sctp_outq *q, struct sctp_chunk *ch)
 int sctp_sched_init_sid(struct sctp_stream *stream, __u16 sid, gfp_t gfp)
 {
 	struct sctp_sched_ops *sched = sctp_sched_ops_from_stream(stream);
+	struct sctp_stream_out_ext *ext = SCTP_SO(stream, sid)->ext;
 
-	INIT_LIST_HEAD(&stream->out[sid].ext->outq);
+	INIT_LIST_HEAD(&ext->outq);
 	return sched->init_sid(stream, sid, gfp);
 }
 
diff --git a/net/sctp/stream_sched_prio.c b/net/sctp/stream_sched_prio.c
index 7997d35dd0fd..2245083a98f2 100644
--- a/net/sctp/stream_sched_prio.c
+++ b/net/sctp/stream_sched_prio.c
@@ -75,10 +75,10 @@ static struct sctp_stream_priorities *sctp_sched_prio_get_head(
 
 	/* No luck. So we search on all streams now. */
 	for (i = 0; i < stream->outcnt; i++) {
-		if (!stream->out[i].ext)
+		if (!SCTP_SO(stream, i)->ext)
 			continue;
 
-		p = stream->out[i].ext->prio_head;
+		p = SCTP_SO(stream, i)->ext->prio_head;
 		if (!p)
 			/* Means all other streams won't be initialized
 			 * as well.
@@ -165,7 +165,7 @@ static void sctp_sched_prio_sched(struct sctp_stream *stream,
 static int sctp_sched_prio_set(struct sctp_stream *stream, __u16 sid,
 			       __u16 prio, gfp_t gfp)
 {
-	struct sctp_stream_out *sout = &stream->out[sid];
+	struct sctp_stream_out *sout = SCTP_SO(stream, sid);
 	struct sctp_stream_out_ext *soute = sout->ext;
 	struct sctp_stream_priorities *prio_head, *old;
 	bool reschedule = false;
@@ -186,7 +186,7 @@ static int sctp_sched_prio_set(struct sctp_stream *stream, __u16 sid,
 		return 0;
 
 	for (i = 0; i < stream->outcnt; i++) {
-		soute = stream->out[i].ext;
+		soute = SCTP_SO(stream, i)->ext;
 		if (soute && soute->prio_head == old)
 			/* It's still in use, nothing else to do here. */
 			return 0;
@@ -201,7 +201,7 @@ static int sctp_sched_prio_set(struct sctp_stream *stream, __u16 sid,
 static int sctp_sched_prio_get(struct sctp_stream *stream, __u16 sid,
 			       __u16 *value)
 {
-	*value = stream->out[sid].ext->prio_head->prio;
+	*value = SCTP_SO(stream, sid)->ext->prio_head->prio;
 	return 0;
 }
 
@@ -215,7 +215,7 @@ static int sctp_sched_prio_init(struct sctp_stream *stream)
 static int sctp_sched_prio_init_sid(struct sctp_stream *stream, __u16 sid,
 				    gfp_t gfp)
 {
-	INIT_LIST_HEAD(&stream->out[sid].ext->prio_list);
+	INIT_LIST_HEAD(&SCTP_SO(stream, sid)->ext->prio_list);
 	return sctp_sched_prio_set(stream, sid, 0, gfp);
 }
 
@@ -233,9 +233,9 @@ static void sctp_sched_prio_free(struct sctp_stream *stream)
 	 */
 	sctp_sched_prio_unsched_all(stream);
 	for (i = 0; i < stream->outcnt; i++) {
-		if (!stream->out[i].ext)
+		if (!SCTP_SO(stream, i)->ext)
 			continue;
-		prio = stream->out[i].ext->prio_head;
+		prio = SCTP_SO(stream, i)->ext->prio_head;
 		if (prio && list_empty(&prio->prio_sched))
 			list_add(&prio->prio_sched, &list);
 	}
@@ -255,7 +255,7 @@ static void sctp_sched_prio_enqueue(struct sctp_outq *q,
 	ch = list_first_entry(&msg->chunks, struct sctp_chunk, frag_list);
 	sid = sctp_chunk_stream_no(ch);
 	stream = &q->asoc->stream;
-	sctp_sched_prio_sched(stream, stream->out[sid].ext);
+	sctp_sched_prio_sched(stream, SCTP_SO(stream, sid)->ext);
 }
 
 static struct sctp_chunk *sctp_sched_prio_dequeue(struct sctp_outq *q)
@@ -297,7 +297,7 @@ static void sctp_sched_prio_dequeue_done(struct sctp_outq *q,
 	 * this priority.
 	 */
 	sid = sctp_chunk_stream_no(ch);
-	soute = q->asoc->stream.out[sid].ext;
+	soute = SCTP_SO(&q->asoc->stream, sid)->ext;
 	prio = soute->prio_head;
 
 	sctp_sched_prio_next_stream(prio);
@@ -317,7 +317,7 @@ static void sctp_sched_prio_sched_all(struct sctp_stream *stream)
 		__u16 sid;
 
 		sid = sctp_chunk_stream_no(ch);
-		sout = &stream->out[sid];
+		sout = SCTP_SO(stream, sid);
 		if (sout->ext)
 			sctp_sched_prio_sched(stream, sout->ext);
 	}
diff --git a/net/sctp/stream_sched_rr.c b/net/sctp/stream_sched_rr.c
index 1155692448f1..52ba743fa7a7 100644
--- a/net/sctp/stream_sched_rr.c
+++ b/net/sctp/stream_sched_rr.c
@@ -100,7 +100,7 @@ static int sctp_sched_rr_init(struct sctp_stream *stream)
 static int sctp_sched_rr_init_sid(struct sctp_stream *stream, __u16 sid,
 				  gfp_t gfp)
 {
-	INIT_LIST_HEAD(&stream->out[sid].ext->rr_list);
+	INIT_LIST_HEAD(&SCTP_SO(stream, sid)->ext->rr_list);
 
 	return 0;
 }
@@ -120,7 +120,7 @@ static void sctp_sched_rr_enqueue(struct sctp_outq *q,
 	ch = list_first_entry(&msg->chunks, struct sctp_chunk, frag_list);
 	sid = sctp_chunk_stream_no(ch);
 	stream = &q->asoc->stream;
-	sctp_sched_rr_sched(stream, stream->out[sid].ext);
+	sctp_sched_rr_sched(stream, SCTP_SO(stream, sid)->ext);
 }
 
 static struct sctp_chunk *sctp_sched_rr_dequeue(struct sctp_outq *q)
@@ -154,7 +154,7 @@ static void sctp_sched_rr_dequeue_done(struct sctp_outq *q,
 
 	/* Last chunk on that msg, move to the next stream */
 	sid = sctp_chunk_stream_no(ch);
-	soute = q->asoc->stream.out[sid].ext;
+	soute = SCTP_SO(&q->asoc->stream, sid)->ext;
 
 	sctp_sched_rr_next_stream(&q->asoc->stream);
 
@@ -173,7 +173,7 @@ static void sctp_sched_rr_sched_all(struct sctp_stream *stream)
 		__u16 sid;
 
 		sid = sctp_chunk_stream_no(ch);
-		soute = stream->out[sid].ext;
+		soute = SCTP_SO(stream, sid)->ext;
 		if (soute)
 			sctp_sched_rr_sched(stream, soute);
 	}
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v3 2/2] net/sctp: Replace in/out stream arrays with flex_array
  2018-08-10 17:11                     ` Konstantin Khorenko
@ 2018-08-10 17:11                       ` Konstantin Khorenko
  -1 siblings, 0 replies; 64+ messages in thread
From: Konstantin Khorenko @ 2018-08-10 17:11 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: oleg.babin, netdev, linux-sctp, David S . Miller, Vlad Yasevich,
	Neil Horman, Xin Long, Andrey Ryabinin, Konstantin Khorenko

This patch replaces physically contiguous memory arrays
allocated using kmalloc_array() with flexible arrays.
This avoids memory allocation failures on
systems under memory stress.

Signed-off-by: Oleg Babin <obabin@virtuozzo.com>
Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>

---
 include/net/sctp/structs.h |  9 ++---
 net/sctp/stream.c          | 88 ++++++++++++++++++++++++++++++++++------------
 2 files changed, 71 insertions(+), 26 deletions(-)

diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index ce4bf844f573..f922db8029e6 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -57,6 +57,7 @@
 #include <linux/atomic.h>		/* This gets us atomic counters.  */
 #include <linux/skbuff.h>	/* We need sk_buff_head. */
 #include <linux/workqueue.h>	/* We need tq_struct.	 */
+#include <linux/flex_array.h>	/* We need flex_array.   */
 #include <linux/sctp.h>		/* We need sctp* header structs.  */
 #include <net/sctp/auth.h>	/* We need auth specific structs */
 #include <net/ip.h>		/* For inet_skb_parm */
@@ -1431,8 +1432,8 @@ struct sctp_stream_in {
 };
 
 struct sctp_stream {
-	struct sctp_stream_out *out;
-	struct sctp_stream_in *in;
+	struct flex_array *out;
+	struct flex_array *in;
 	__u16 outcnt;
 	__u16 incnt;
 	/* Current stream being sent, if any */
@@ -1458,14 +1459,14 @@ static inline struct sctp_stream_out *sctp_stream_out(
 	const struct sctp_stream *stream,
 	__u16 sid)
 {
-	return ((struct sctp_stream_out *)(stream->out)) + sid;
+	return flex_array_get(stream->out, sid);
 }
 
 static inline struct sctp_stream_in *sctp_stream_in(
 	const struct sctp_stream *stream,
 	__u16 sid)
 {
-	return ((struct sctp_stream_in *)(stream->in)) + sid;
+	return flex_array_get(stream->in, sid);
 }
 
 #define SCTP_SO(s, i) sctp_stream_out((s), (i))
diff --git a/net/sctp/stream.c b/net/sctp/stream.c
index 7ca6fe4e7882..ffb940d3b57c 100644
--- a/net/sctp/stream.c
+++ b/net/sctp/stream.c
@@ -37,6 +37,53 @@
 #include <net/sctp/sm.h>
 #include <net/sctp/stream_sched.h>
 
+static struct flex_array *fa_alloc(size_t elem_size, size_t elem_count,
+				   gfp_t gfp)
+{
+	struct flex_array *result;
+	int err;
+
+	result = flex_array_alloc(elem_size, elem_count, gfp);
+	if (result) {
+		err = flex_array_prealloc(result, 0, elem_count, gfp);
+		if (err) {
+			flex_array_free(result);
+			result = NULL;
+		}
+	}
+
+	return result;
+}
+
+static void fa_free(struct flex_array *fa)
+{
+	if (fa)
+		flex_array_free(fa);
+}
+
+static void fa_copy(struct flex_array *fa, struct flex_array *from,
+		    size_t index, size_t count)
+{
+	void *elem;
+
+	while (count--) {
+		elem = flex_array_get(from, index);
+		flex_array_put(fa, index, elem, 0);
+		index++;
+	}
+}
+
+static void fa_zero(struct flex_array *fa, size_t index, size_t count)
+{
+	void *elem;
+
+	while (count--) {
+		elem = flex_array_get(fa, index);
+		memset(elem, 0, fa->element_size);
+		index++;
+	}
+}
+
 /* Migrates chunks from stream queues to new stream queues if needed,
  * but not across associations. Also, removes those chunks to streams
  * higher than the new max.
@@ -78,34 +125,33 @@ static void sctp_stream_outq_migrate(struct sctp_stream *stream,
 		 * sctp_stream_update will swap ->out pointers.
 		 */
 		for (i = 0; i < outcnt; i++) {
-			kfree(new->out[i].ext);
-			new->out[i].ext = stream->out[i].ext;
-			stream->out[i].ext = NULL;
+			kfree(SCTP_SO(new, i)->ext);
+			SCTP_SO(new, i)->ext = SCTP_SO(stream, i)->ext;
+			SCTP_SO(stream, i)->ext = NULL;
 		}
 	}
 
 	for (i = outcnt; i < stream->outcnt; i++)
-		kfree(stream->out[i].ext);
+		kfree(SCTP_SO(stream, i)->ext);
 }
 
 static int sctp_stream_alloc_out(struct sctp_stream *stream, __u16 outcnt,
 				 gfp_t gfp)
 {
-	struct sctp_stream_out *out;
+	struct flex_array *out;
+	size_t elem_size = sizeof(struct sctp_stream_out);
 
-	out = kmalloc_array(outcnt, sizeof(*out), gfp);
+	out = fa_alloc(elem_size, outcnt, gfp);
 	if (!out)
 		return -ENOMEM;
 
 	if (stream->out) {
-		memcpy(out, stream->out, min(outcnt, stream->outcnt) *
-					 sizeof(*out));
-		kfree(stream->out);
+		fa_copy(out, stream->out, 0, min(outcnt, stream->outcnt));
+		fa_free(stream->out);
 	}
 
 	if (outcnt > stream->outcnt)
-		memset(out + stream->outcnt, 0,
-		       (outcnt - stream->outcnt) * sizeof(*out));
+		fa_zero(out, stream->outcnt, (outcnt - stream->outcnt));
 
 	stream->out = out;
 
@@ -115,22 +161,20 @@ static int sctp_stream_alloc_out(struct sctp_stream *stream, __u16 outcnt,
 static int sctp_stream_alloc_in(struct sctp_stream *stream, __u16 incnt,
 				gfp_t gfp)
 {
-	struct sctp_stream_in *in;
-
-	in = kmalloc_array(incnt, sizeof(*stream->in), gfp);
+	struct flex_array *in;
+	size_t elem_size = sizeof(struct sctp_stream_in);
 
+	in = fa_alloc(elem_size, incnt, gfp);
 	if (!in)
 		return -ENOMEM;
 
 	if (stream->in) {
-		memcpy(in, stream->in, min(incnt, stream->incnt) *
-				       sizeof(*in));
-		kfree(stream->in);
+		fa_copy(in, stream->in, 0, min(incnt, stream->incnt));
+		fa_free(stream->in);
 	}
 
 	if (incnt > stream->incnt)
-		memset(in + stream->incnt, 0,
-		       (incnt - stream->incnt) * sizeof(*in));
+		fa_zero(in, stream->incnt, (incnt - stream->incnt));
 
 	stream->in = in;
 
@@ -174,7 +218,7 @@ int sctp_stream_init(struct sctp_stream *stream, __u16 outcnt, __u16 incnt,
 	ret = sctp_stream_alloc_in(stream, incnt, gfp);
 	if (ret) {
 		sched->free(stream);
-		kfree(stream->out);
+		fa_free(stream->out);
 		stream->out = NULL;
 		stream->outcnt = 0;
 		goto out;
@@ -206,8 +250,8 @@ void sctp_stream_free(struct sctp_stream *stream)
 	sched->free(stream);
 	for (i = 0; i < stream->outcnt; i++)
 		kfree(SCTP_SO(stream, i)->ext);
-	kfree(stream->out);
-	kfree(stream->in);
+	fa_free(stream->out);
+	fa_free(stream->in);
 }
 
 void sctp_stream_clear(struct sctp_stream *stream)
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 64+ messages in thread
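
For readers unfamiliar with flex_array, the idea behind the patch above can
be sketched in userspace C. This is only an illustration under stated
assumptions, not the kernel implementation: the names chunked_array,
ca_parts_needed(), ca_alloc(), ca_get() and ca_free() are hypothetical,
PART_SIZE is assumed to be a 4096-byte page, and calloc() stands in for the
kernel's per-page allocations. Each part is allocated separately, so no
single request is ever larger than one page.

```c
#include <stddef.h>
#include <stdlib.h>

#define PART_SIZE 4096u	/* assumption: one part == one 4096-byte page */

/* Hypothetical userspace stand-in for a page-chunked array. */
struct chunked_array {
	size_t elem_size;	/* size of one element in bytes */
	size_t count;		/* number of usable elements */
	size_t per_part;	/* elements that fit into one part */
	size_t nr_parts;	/* number of separately allocated parts */
	void **parts;		/* table of pointers to the parts */
};

/* Number of page-sized parts needed to hold 'count' elements. */
static size_t ca_parts_needed(size_t elem_size, size_t count)
{
	size_t per_part;

	if (elem_size == 0 || elem_size > PART_SIZE)
		return 0;
	per_part = PART_SIZE / elem_size;
	return (count + per_part - 1) / per_part;
}

static void ca_free(struct chunked_array *ca)
{
	size_t i;

	if (!ca)
		return;
	if (ca->parts)
		for (i = 0; i < ca->nr_parts; i++)
			free(ca->parts[i]);	/* free(NULL) is a no-op */
	free(ca->parts);
	free(ca);
}

/* Allocate all parts up front and zero them; NULL on any failure. */
static struct chunked_array *ca_alloc(size_t elem_size, size_t count)
{
	struct chunked_array *ca;
	size_t i;

	if (elem_size == 0 || elem_size > PART_SIZE || count == 0)
		return NULL;

	ca = calloc(1, sizeof(*ca));
	if (!ca)
		return NULL;

	ca->elem_size = elem_size;
	ca->count = count;
	ca->per_part = PART_SIZE / elem_size;
	ca->nr_parts = ca_parts_needed(elem_size, count);

	ca->parts = calloc(ca->nr_parts, sizeof(*ca->parts));
	if (!ca->parts) {
		ca_free(ca);
		return NULL;
	}

	for (i = 0; i < ca->nr_parts; i++) {
		ca->parts[i] = calloc(1, PART_SIZE);
		if (!ca->parts[i]) {
			ca_free(ca);
			return NULL;
		}
	}
	return ca;
}

/* Map a flat index to (part, offset inside the part). */
static void *ca_get(const struct chunked_array *ca, size_t idx)
{
	if (!ca || idx >= ca->count)
		return NULL;
	return (char *)ca->parts[idx / ca->per_part] +
	       (idx % ca->per_part) * ca->elem_size;
}
```

With the 24-byte element size quoted in the cover letter, 170 elements fit
into one part, so ca_alloc(24, 65535) needs 386 single-page parts instead of
one physically contiguous region of about 383 pages. Elements never straddle
a part boundary, which is why a few more pages are used in total.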

* Re: [PATCH v2 0/2] net/sctp: Avoid allocating high order memory with kmalloc()
  2018-08-10 17:03                   ` Konstantin Khorenko
@ 2018-08-10 17:41                     ` Marcelo Ricardo Leitner
  -1 siblings, 0 replies; 64+ messages in thread
From: Marcelo Ricardo Leitner @ 2018-08-10 17:41 UTC (permalink / raw)
  To: Konstantin Khorenko
  Cc: oleg.babin, netdev, linux-sctp, David S . Miller, Vlad Yasevich,
	Neil Horman, Xin Long, Andrey Ryabinin

On Fri, Aug 10, 2018 at 08:03:51PM +0300, Konstantin Khorenko wrote:
> On 08/09/2018 11:43 AM, Konstantin Khorenko wrote:
> > On 08/04/2018 02:36 AM, Marcelo Ricardo Leitner wrote:
> > > On Fri, Aug 03, 2018 at 07:21:00PM +0300, Konstantin Khorenko wrote:
> > > ...
> > > > Performance results:
> > > > ====================
> > > >   * Kernel: v4.18-rc6 - stock and with 2 patches from Oleg (earlier in this thread)
> > > >   * Node: CPU (8 cores): Intel(R) Xeon(R) CPU E31230 @ 3.20GHz
> > > >           RAM: 32 Gb
> > > > 
> > > >   * netperf: taken from https://github.com/HewlettPackard/netperf.git,
> > > > 	     compiled from sources with sctp support
> > > >   * netperf server and client are run on the same node
> > > >   * ip link set lo mtu 1500
> > > > 
> > > > The script used to run tests:
> > > >  # cat run_tests.sh
> > > >  #!/bin/bash
> > > > 
> > > > for test in SCTP_STREAM SCTP_STREAM_MANY SCTP_RR SCTP_RR_MANY; do
> > > >   echo "TEST: $test";
> > > >   for i in `seq 1 3`; do
> > > >     echo "Iteration: $i";
> > > >     set -x
> > > >     netperf -t $test -H localhost -p 22222 -S 200000,200000 -s 200000,200000 \
> > > >             -l 60 -- -m 1452;
> > > >     set +x
> > > >   done
> > > > done
> > > > ================================================
> > > > 
> > > > Results (a bit reformatted to be more readable):
> > > ...
> > > 
> > > Nice, good numbers.
> > > 
> > > I'm missing some test that actually uses more than 1 stream. All tests
> > > in netperf use only 1 stream. They can use 1 or many associations on
> > > a socket, but not multiple streams. That means the numbers here show
> > > that we shouldn't see any regression in the more traditional uses, per
> > > Michael's reply in the other email, but they do not test how it will
> > > behave if we go crazy and use the 64k streams (worst case).
> > > 
> > > You'll need some other tool to test it. One idea is sctp_test, from
> > > lksctp-tools. Something like:
> > > 
> > > Server side:
> > > 	./sctp_test -H 172.0.0.1 -P 22222 -l -d 0
> > > Client side:
> > > 	time ./sctp_test -H 172.0.0.1 -P 22221 \
> > > 		-h 172.0.0.1 -p 22222 -s \
> > > 		-c 1 -M 65535 -T -t 1 -x 100000 -d 0
> > > 
> > > And then measure the difference on how long each test took. Can you
> > > get these too?
> > > 
> > > Interestingly, on my laptop just starting this test for the first
> > > time can take some *seconds*. It seems the kernel had a hard time
> > > defragmenting the memory here. :)
> 
> Hi Marcelo,
> 
> I got 3 of 4 results, please take a look. I failed to measure
> the test on the stock kernel with fragmented memory; the test fails with
>         *** connect:  Cannot allocate memory ***

Hah, okay.

> 
> 
> Performance results:
> ====================
>   * Kernel: v4.18-rc8 - stock and with 2 patches v3
>   * Node: CPU (8 cores): Intel(R) Xeon(R) CPU E31230 @ 3.20GHz
>           RAM: 32 Gb
> 
>   * sctp_test: https://github.com/sctp/lksctp-tools
>   * both server and client are run on the same node
>   * ip link set lo mtu 1500
>   * sysctl -w vm.max_map_count=65530000 (need it to make memory fragmented)
> 
> The script used to run tests:
> =============================
> # cat run_sctp_test.sh
> #!/bin/bash
> 
> set -x
> 
> uname -r
> ip link set lo mtu 1500
> swapoff -a
> 
> free
> cat /proc/buddyinfo
> 
> ./src/apps/sctp_test -H 127.0.0.1 -P 22222 -l -d 0 &
> sleep 3
> 
> time ./src/apps/sctp_test -H 127.0.0.1 -P 22221 -h 127.0.0.1 -p 22222 \
>         -s -c 1 -M 65535 -T -t 1 -x 100000 -d 0 1>/dev/null
> 
> killall -9 lt-sctp_test
> ===============================
> 
> Results (a bit reformatted to be more readable):
> 
> 1) ms stock kernel v4.18-rc8, no memory fragmentation
> Memory info (more or less the same across iterations):
> # free
>               total        used        free      shared  buff/cache   available
> Mem:       32906008      213156    32178184         764      514668    32260968
> Swap:             0           0           0
> 
> cat /proc/buddyinfo
> Node 0, zone      DMA      0      1      1      0      2      1      1      0      1      1      3
> Node 0, zone    DMA32      1      3      5      4      2      2      3      6      6      4    867
> Node 0, zone   Normal    551    422    160    204    193     34     15      7     22     19   6956
> 
> 	test 1		test 2		test 3
> real    0m14.715s	0m14.593s	0m15.954s
> user    0m0.954s	0m0.955s	0m0.854s
> sys     0m13.388s	0m12.537s	0m13.749s
> 
> 2) kernel with fixes, no memory fragmentation
> 'free' and 'buddyinfo' similar to 1)
> 
> 	test 1		test 2		test 3
> real    0m14.959s	0m14.693s	0m14.762s
> user    0m0.948s	0m0.921s	0m0.929s
> sys     0m13.538s	0m13.225s	0m13.217s
> 
> 3) kernel with fixes, memory fragmented
> (mmap() all available RAM, touch all pages, munmap() half of pages (each second page), do it again for RAM/2)
> 'free':
>               total        used        free      shared  buff/cache   available
> Mem:       32906008    30555200      302740         764     2048068      266452
> Mem:       32906008    30379948      541436         764     1984624      442376
> Mem:       32906008    30717312      262380         764     1926316      109908
> 
> /proc/buddyinfo:
> Node 0, zone   Normal  40773     37     34     29      0      0      0      0      0      0      0
> Node 0, zone   Normal 100332     68      8      4      2      1      1      0      0      0      0
> Node 0, zone   Normal  31113      7      2      1      0      0      0      0      0      0      0
> 
> 	test 1		test 2		test 3
> real    0m14.159s	0m15.252s	0m15.826s
> user    0m0.839s	0m1.004s	0m1.048s
> sys     0m11.827s	0m14.240s	0m14.778s

Nice. Looks like there won't be any (noticeable) performance regression
where it was already functional, and it will make it functional when
memory is fragmented. With some overhead, but at least it works.

Thanks for running all of these.
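
As a rough check of the "no noticeable regression" reading, the three "real"
times per scenario can be averaged. The sketch below is only an illustration:
the numbers are copied from the tables above, while the helper names
(to_seconds, mean_real_times) are made up for this example.

```python
import re
from statistics import mean

# "real" wall-clock times copied from the three result tables above.
REAL_TIMES = {
    "stock, no fragmentation": ["0m14.715s", "0m14.593s", "0m15.954s"],
    "patched, no fragmentation": ["0m14.959s", "0m14.693s", "0m14.762s"],
    "patched, fragmented": ["0m14.159s", "0m15.252s", "0m15.826s"],
}


def to_seconds(stamp):
    """Convert a bash `time` value such as '0m14.715s' to float seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unexpected time format: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def mean_real_times(table=REAL_TIMES):
    """Return the mean 'real' time in seconds for every scenario."""
    return {
        name: mean(to_seconds(stamp) for stamp in runs)
        for name, runs in table.items()
    }


if __name__ == "__main__":
    for name, avg in mean_real_times().items():
        print(f"{name:28s} {avg:7.3f} s")
```

With only three runs per scenario this says nothing about statistical
significance; it merely shows that the three means sit within about 0.3 s of
each other, at roughly 15 s.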

> 
> 
> --
> Best regards,
> 
> Konstantin Khorenko,
> Virtuozzo Linux Kernel Team
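
The /proc/buddyinfo snapshots above also hint at why the stock kernel failed
with "Cannot allocate memory" once memory was fragmented: no free block of a
high enough order was left. A minimal sketch, assuming the usual buddyinfo
layout (one free-block count per order, starting at order 0); the two sample
lines are copied from the message, the function names are hypothetical.

```python
# Lines copied from the /proc/buddyinfo output quoted in the message.
UNFRAGMENTED = ("Node 0, zone   Normal    551    422    160    204    193"
                "     34     15      7     22     19   6956")
FRAGMENTED = ("Node 0, zone   Normal  40773     37     34     29      0"
              "      0      0      0      0      0      0")


def free_blocks_by_order(line):
    """Return the per-order free block counts from one buddyinfo line."""
    fields = line.split()
    # Layout: "Node", "0,", "zone", "<name>", then one count per order.
    return [int(count) for count in fields[4:]]


def highest_free_order(line):
    """Largest order that still has a free block, or -1 if there is none."""
    counts = free_blocks_by_order(line)
    orders = [order for order, count in enumerate(counts) if count > 0]
    return max(orders, default=-1)


def has_free_block(line, order):
    """True if the snapshot shows a free block of at least `order`."""
    return highest_free_order(line) >= order


if __name__ == "__main__":
    for label, line in (("unfragmented", UNFRAGMENTED),
                        ("fragmented", FRAGMENTED)):
        print(label, "highest free order:", highest_free_order(line),
              "order-9 block available:", has_free_block(line, 9))
```

This looks only at a snapshot and ignores reclaim and compaction, so it
illustrates the situation rather than predicting what the allocator will do.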


* Re: [PATCH v3 0/2] net/sctp: Avoid allocating high order memory with kmalloc()
  2018-08-10 17:11                     ` Konstantin Khorenko
@ 2018-08-11 19:36                       ` David Miller
  -1 siblings, 0 replies; 64+ messages in thread
From: David Miller @ 2018-08-11 19:36 UTC (permalink / raw)
  To: khorenko
  Cc: marcelo.leitner, oleg.babin, netdev, linux-sctp, vyasevich,
	nhorman, lucien.xin, aryabinin

From: Konstantin Khorenko <khorenko@virtuozzo.com>
Date: Fri, 10 Aug 2018 20:11:41 +0300

> Each SCTP association can have up to 65535 input and output streams.
> For each stream type an array of sctp_stream_in or sctp_stream_out
> structures is allocated using kmalloc_array() function. This function
> allocates physically contiguous memory regions, so this can lead
> to allocation of memory regions of very high order, i.e.:
> 
>   sizeof(struct sctp_stream_out) == 24,
>   ((65535 * 24) / 4096) == 383 memory pages (4096 bytes per page),
>   which means 9th memory order.
> 
> This can lead to memory allocation failures on systems
> under memory stress.
> 
> We actually do not need these arrays of memory to be physically
> contiguous. A possible simple solution would be to use kvmalloc()
> instead of kmalloc(), as kvmalloc() can allocate physically scattered
> pages if contiguous pages are not available. But the problem
> is that the allocation can happen in a softirq context with
> the GFP_ATOMIC flag set, and kvmalloc() cannot be used in this scenario.
> 
> So the other possible solution is to use flexible arrays instead of
> contiguous arrays of memory so that the memory would be allocated
> on a per-page basis.
> 
> This patchset replaces kmalloc_array() with flex_array usage.
> It consists of two parts:
> 
>   * First patch is preparatory - it mechanically wraps all direct
>     access to assoc->stream.out[] and assoc->stream.in[] arrays
>     with SCTP_SO() and SCTP_SI() wrappers so that later a direct
>     array access could be easily changed to an access to a
>     flex_array (or any other possible alternative).
>   * Second patch replaces kmalloc_array() with flex_array usage.

Looks good, series applied, thanks!
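
The order arithmetic in the cover letter can be reproduced with a few lines.
This is a sketch of the calculation only, not kernel code: the structure size
(24 bytes) and page size (4096 bytes) are the values quoted above, and
pages_needed() / allocation_order() are illustrative names.

```python
PAGE_SIZE = 4096        # bytes per page, as quoted in the cover letter
STREAM_OUT_SIZE = 24    # sizeof(struct sctp_stream_out), as quoted
MAX_STREAMS = 65535     # maximum number of streams per direction


def pages_needed(nbytes, page_size=PAGE_SIZE):
    """Number of whole pages required to hold nbytes (rounded up)."""
    return -(-nbytes // page_size)


def allocation_order(nbytes, page_size=PAGE_SIZE):
    """Smallest order such that 2**order contiguous pages hold nbytes."""
    pages = pages_needed(nbytes, page_size)
    order = 0
    while (1 << order) < pages:
        order += 1
    return order


if __name__ == "__main__":
    nbytes = MAX_STREAMS * STREAM_OUT_SIZE
    print("bytes:", nbytes)
    print("pages:", pages_needed(nbytes))
    print("order:", allocation_order(nbytes))
```

The cover letter's 383 is the truncated quotient; rounding up gives 384 pages.
Either way the request falls into order 9, that is a physically contiguous
block of 512 pages (2 MiB with 4096-byte pages).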

