All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [dpdk-dev] [PATCH 5/8] event/dsw: avoid migration waves in large systems
@ 2020-03-09  7:17 Jerin Jacob Kollanukkaran
  2020-03-09  7:58 ` Mattias Rönnblom
  0 siblings, 1 reply; 5+ messages in thread
From: Jerin Jacob Kollanukkaran @ 2020-03-09  7:17 UTC (permalink / raw)
  To: Mattias Rönnblom; +Cc: dev, stefan.sundkvist, Ola.Liljedahl

> -----Original Message-----
> From: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> Sent: Monday, March 9, 2020 12:21 PM
> To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>
> Cc: dev@dpdk.org; stefan.sundkvist@ericsson.com; Ola.Liljedahl@arm.com;
> Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> Subject:  [PATCH 5/8] event/dsw: avoid migration waves in large systems
> 
> ----------------------------------------------------------------------
> DSW limits the rate of migrations on a per-port basis. Hence, as the number of
> cores grows, so does the total migration capacity.
> 
> In high core-count systems, this allows for a situation where flows are migrated
> to a lightly loaded port which recently already received a number of new flows
> (from other ports). The processing load generated by these new flows may not
> yet be reflected in the lightly loaded port's load estimate. The result is that the
> previously lightly loaded port is now overloaded.
> 
> This patch adds a rough estimate of the size of the inbound migrations to a
> particular port, which can be factored into the migration logic, avoiding the
> above problem.
> 
> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> ---
> @@ -491,6 +502,9 @@ dsw_select_emigration_target(struct dsw_evdev *dsw,
>  	target_qfs[*targets_len] = *candidate_qf;
>  	(*targets_len)++;
> 
> +	rte_atomic32_add(&dsw->ports[candidate_port_id].immigration_load,
> +			 candidate_flow_load);

These are the full barriers in arm64 and PowerPC. 
Request to change the C11 mem model[1] with Load and acquire semantics
For better performance enhancement on non x86 machines.

drivers/event/opdl is already moved to C11 mem model.

[1]
https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html


> +
>  	return true;

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [dpdk-dev] [PATCH 5/8] event/dsw: avoid migration waves in large systems
  2020-03-09  7:17 [dpdk-dev] [PATCH 5/8] event/dsw: avoid migration waves in large systems Jerin Jacob Kollanukkaran
@ 2020-03-09  7:58 ` Mattias Rönnblom
  2020-03-09  8:12   ` Jerin Jacob Kollanukkaran
  0 siblings, 1 reply; 5+ messages in thread
From: Mattias Rönnblom @ 2020-03-09  7:58 UTC (permalink / raw)
  To: Jerin Jacob Kollanukkaran; +Cc: dev, Stefan Sundkvist, Ola.Liljedahl

On 2020-03-09 08:17, Jerin Jacob Kollanukkaran wrote:
>> -----Original Message-----
>> From: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
>> Sent: Monday, March 9, 2020 12:21 PM
>> To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>
>> Cc: dev@dpdk.org; stefan.sundkvist@ericsson.com; Ola.Liljedahl@arm.com;
>> Mattias Rönnblom <mattias.ronnblom@ericsson.com>
>> Subject:  [PATCH 5/8] event/dsw: avoid migration waves in large systems
>>
>> ----------------------------------------------------------------------
>> DSW limits the rate of migrations on a per-port basis. Hence, as the number of
>> cores grows, so does the total migration capacity.
>>
>> In high core-count systems, this allows for a situation where flows are migrated
>> to a lightly loaded port which recently already received a number of new flows
>> (from other ports). The processing load generated by these new flows may not
>> yet be reflected in the lightly loaded port's load estimate. The result is that the
>> previously lightly loaded port is now overloaded.
>>
>> This patch adds a rough estimate of the size of the inbound migrations to a
>> particular port, which can be factored into the migration logic, avoiding the
>> above problem.
>>
>> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
>> ---
>> @@ -491,6 +502,9 @@ dsw_select_emigration_target(struct dsw_evdev *dsw,
>>   	target_qfs[*targets_len] = *candidate_qf;
>>   	(*targets_len)++;
>>
>> +	rte_atomic32_add(&dsw->ports[candidate_port_id].immigration_load,
>> +			 candidate_flow_load);
> These are the full barriers in arm64 and PowerPC.
> Request to change the C11 mem model[1] with Load and acquire semantics
> For better performance enhancement on non x86 machines.
>
> drivers/event/opdl is already moved to C11 mem model.
>
> [1]
> https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html
>
The performance impacts would be small, since this is in the slow path, 
with something like a handful of memory barrier per core per ms.

Arguably, it could be done for consistency reasons, but then you should 
change all DSW atomics.

>> +
>>   	return true;



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [dpdk-dev] [PATCH 5/8] event/dsw: avoid migration waves in large systems
  2020-03-09  7:58 ` Mattias Rönnblom
@ 2020-03-09  8:12   ` Jerin Jacob Kollanukkaran
  2020-03-09  8:41     ` Mattias Rönnblom
  0 siblings, 1 reply; 5+ messages in thread
From: Jerin Jacob Kollanukkaran @ 2020-03-09  8:12 UTC (permalink / raw)
  To: Mattias Rönnblom; +Cc: dev, Stefan Sundkvist, Ola.Liljedahl

> -----Original Message-----
> From: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> Sent: Monday, March 9, 2020 1:28 PM
> To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>
> Cc: dev@dpdk.org; Stefan Sundkvist <stefan.sundkvist@ericsson.com>;
> Ola.Liljedahl@arm.com
> Subject: [EXT] Re: [PATCH 5/8] event/dsw: avoid migration waves in large
> systems
> 
> On 2020-03-09 08:17, Jerin Jacob Kollanukkaran wrote:
> >> -----Original Message-----
> >> From: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> >> Sent: Monday, March 9, 2020 12:21 PM
> >> To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>
> >> Cc: dev@dpdk.org; stefan.sundkvist@ericsson.com;
> >> Ola.Liljedahl@arm.com; Mattias Rönnblom
> >> <mattias.ronnblom@ericsson.com>
> >> Subject:  [PATCH 5/8] event/dsw: avoid migration waves in large
> >> systems
> >>
> >> ---------------------------------------------------------------------
> >> - DSW limits the rate of migrations on a per-port basis. Hence, as
> >> the number of cores grows, so does the total migration capacity.
> >>
> >> In high core-count systems, this allows for a situation where flows
> >> are migrated to a lightly loaded port which recently already received
> >> a number of new flows (from other ports). The processing load
> >> generated by these new flows may not yet be reflected in the lightly
> >> loaded port's load estimate. The result is that the previously lightly loaded
> port is now overloaded.
> >>
> >> This patch adds a rough estimate of the size of the inbound
> >> migrations to a particular port, which can be factored into the
> >> migration logic, avoiding the above problem.
> >>
> >> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> >> ---
> >> @@ -491,6 +502,9 @@ dsw_select_emigration_target(struct dsw_evdev
> *dsw,
> >>   	target_qfs[*targets_len] = *candidate_qf;
> >>   	(*targets_len)++;
> >>
> >> +	rte_atomic32_add(&dsw->ports[candidate_port_id].immigration_load,
> >> +			 candidate_flow_load);
> > These are the full barriers in arm64 and PowerPC.
> > Request to change the C11 mem model[1] with Load and acquire semantics
> > For better performance enhancement on non x86 machines.
> >
> > drivers/event/opdl is already moved to C11 mem model.
> >
> > [1]
> > https://urldefense.proofpoint.com/v2/url?u=https-3A__gcc.gnu.org_onlin
> > edocs_gcc_-5F005f-5F005fatomic-
> 2DBuiltins.html&d=DwIGaQ&c=nKjWec2b6R0m
> >
> OyPaz7xtfQ&r=1DGob4H4rxz6H8uITozGOCa0s5f4wCNtTa4UUKvcsvI&m=WWfY
> IvEKR8a
> >
> _FuTltGFBbtERAKU1akjXuokLpv2zSz0&s=bEjlLRgN4LriVpVzwYcdgcTV39OI_MZY
> OG0
> > QDhjmezw&e=
> >
> The performance impacts would be small, since this is in the slow path, with
> something like a handful of memory barrier per core per ms.

OK. If it is slow path, then yes, no point in changing.

How about the other following uses in the DSW driver? Does it comes in fastpath or slowpath?

drivers/event/dsw/dsw_event.c:  new_total_on_loan = rte_atomic32_add_return(&dsw->credits_on_loan,
drivers/event/dsw/dsw_event.c:          rte_atomic32_sub(&dsw->credits_on_loan, acquired_credits);
drivers/event/dsw/dsw_event.c:          rte_atomic32_sub(&dsw->credits_on_loan, return_credits);


> 
> Arguably, it could be done for consistency reasons, but then you should change
> all DSW atomics.
> 
> >> +
> >>   	return true;
> 


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [dpdk-dev] [PATCH 5/8] event/dsw: avoid migration waves in large systems
  2020-03-09  8:12   ` Jerin Jacob Kollanukkaran
@ 2020-03-09  8:41     ` Mattias Rönnblom
  0 siblings, 0 replies; 5+ messages in thread
From: Mattias Rönnblom @ 2020-03-09  8:41 UTC (permalink / raw)
  To: Jerin Jacob Kollanukkaran; +Cc: dev, Stefan Sundkvist, Ola.Liljedahl

On 2020-03-09 09:12, Jerin Jacob Kollanukkaran wrote:
>> -----Original Message-----
>> From: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
>> Sent: Monday, March 9, 2020 1:28 PM
>> To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>
>> Cc: dev@dpdk.org; Stefan Sundkvist <stefan.sundkvist@ericsson.com>;
>> Ola.Liljedahl@arm.com
>> Subject: [EXT] Re: [PATCH 5/8] event/dsw: avoid migration waves in large
>> systems
>>
>> On 2020-03-09 08:17, Jerin Jacob Kollanukkaran wrote:
>>>> -----Original Message-----
>>>> From: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
>>>> Sent: Monday, March 9, 2020 12:21 PM
>>>> To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>
>>>> Cc: dev@dpdk.org; stefan.sundkvist@ericsson.com;
>>>> Ola.Liljedahl@arm.com; Mattias Rönnblom
>>>> <mattias.ronnblom@ericsson.com>
>>>> Subject:  [PATCH 5/8] event/dsw: avoid migration waves in large
>>>> systems
>>>>
>>>> ---------------------------------------------------------------------
>>>> - DSW limits the rate of migrations on a per-port basis. Hence, as
>>>> the number of cores grows, so does the total migration capacity.
>>>>
>>>> In high core-count systems, this allows for a situation where flows
>>>> are migrated to a lightly loaded port which recently already received
>>>> a number of new flows (from other ports). The processing load
>>>> generated by these new flows may not yet be reflected in the lightly
>>>> loaded port's load estimate. The result is that the previously lightly loaded
>> port is now overloaded.
>>>> This patch adds a rough estimate of the size of the inbound
>>>> migrations to a particular port, which can be factored into the
>>>> migration logic, avoiding the above problem.
>>>>
>>>> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
>>>> ---
>>>> @@ -491,6 +502,9 @@ dsw_select_emigration_target(struct dsw_evdev
>> *dsw,
>>>>    	target_qfs[*targets_len] = *candidate_qf;
>>>>    	(*targets_len)++;
>>>>
>>>> +	rte_atomic32_add(&dsw->ports[candidate_port_id].immigration_load,
>>>> +			 candidate_flow_load);
>>> These are the full barriers in arm64 and PowerPC.
>>> Request to change the C11 mem model[1] with Load and acquire semantics
>>> For better performance enhancement on non x86 machines.
>>>
>>> drivers/event/opdl is already moved to C11 mem model.
>>>
>>> [1]
>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__gcc.gnu.org_onlin
>>> edocs_gcc_-5F005f-5F005fatomic-
>> 2DBuiltins.html&d=DwIGaQ&c=nKjWec2b6R0m
>> OyPaz7xtfQ&r=1DGob4H4rxz6H8uITozGOCa0s5f4wCNtTa4UUKvcsvI&m=WWfY
>> IvEKR8a
>> _FuTltGFBbtERAKU1akjXuokLpv2zSz0&s=bEjlLRgN4LriVpVzwYcdgcTV39OI_MZY
>> OG0
>>> QDhjmezw&e=
>>>
>> The performance impacts would be small, since this is in the slow path, with
>> something like a handful of memory barrier per core per ms.
> OK. If it is slow path, then yes, no point in changing.
>
> How about the other following uses in the DSW driver? Does it comes in fastpath or slowpath?
>
> drivers/event/dsw/dsw_event.c:  new_total_on_loan = rte_atomic32_add_return(&dsw->credits_on_loan,
> drivers/event/dsw/dsw_event.c:          rte_atomic32_sub(&dsw->credits_on_loan, acquired_credits);
> drivers/event/dsw/dsw_event.c:          rte_atomic32_sub(&dsw->credits_on_loan, return_credits);
>
Technically still the slow path, but a path much more often taken. For 
producer- and consumer-only ports, it's once per 64 events (per port). 
For ports that do both, it's less often.

Sounds like a bug that rte_atomic32_sub() needs a full barrier.

>> Arguably, it could be done for consistency reasons, but then you should change
>> all DSW atomics.
>>
>>>> +
>>>>    	return true;



^ permalink raw reply	[flat|nested] 5+ messages in thread

* [dpdk-dev] [PATCH 5/8] event/dsw: avoid migration waves in large systems
  2020-03-09  6:50 [dpdk-dev] [PATCH 0/8] DSW performance and statistics improvements Mattias Rönnblom
@ 2020-03-09  6:51 ` Mattias Rönnblom
  0 siblings, 0 replies; 5+ messages in thread
From: Mattias Rönnblom @ 2020-03-09  6:51 UTC (permalink / raw)
  To: jerinj; +Cc: dev, stefan.sundkvist, Ola.Liljedahl, Mattias Rönnblom

DSW limits the rate of migrations on a per-port basis. Hence, as the
number of cores grows, so does the total migration capacity.

In high core-count systems, this allows for a situation where flows
are migrated to a lightly loaded port which recently already received
a number of new flows (from other ports). The processing load
generated by these new flows may not yet be reflected in the lightly
loaded port's load estimate. The result is that the previously lightly
loaded port is now overloaded.

This patch adds a rough estimate of the size of the inbound migrations
to a particular port, which can be factored into the migration logic,
avoiding the above problem.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 drivers/event/dsw/dsw_evdev.c |  1 +
 drivers/event/dsw/dsw_evdev.h |  2 ++
 drivers/event/dsw/dsw_event.c | 18 ++++++++++++++++--
 3 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/event/dsw/dsw_evdev.c b/drivers/event/dsw/dsw_evdev.c
index 7798a38ad..e796975df 100644
--- a/drivers/event/dsw/dsw_evdev.c
+++ b/drivers/event/dsw/dsw_evdev.c
@@ -62,6 +62,7 @@ dsw_port_setup(struct rte_eventdev *dev, uint8_t port_id,
 	port->ctl_in_ring = ctl_in_ring;
 
 	rte_atomic16_init(&port->load);
+	rte_atomic32_init(&port->immigration_load);
 
 	port->load_update_interval =
 		(DSW_LOAD_UPDATE_INTERVAL * rte_get_timer_hz()) / US_PER_S;
diff --git a/drivers/event/dsw/dsw_evdev.h b/drivers/event/dsw/dsw_evdev.h
index ced40ef8d..6cb77cfc4 100644
--- a/drivers/event/dsw/dsw_evdev.h
+++ b/drivers/event/dsw/dsw_evdev.h
@@ -220,6 +220,8 @@ struct dsw_port {
 
 	/* Estimate of current port load. */
 	rte_atomic16_t load __rte_cache_aligned;
+	/* Estimate of flows currently migrating to this port. */
+	rte_atomic32_t immigration_load __rte_cache_aligned;
 } __rte_cache_aligned;
 
 struct dsw_queue {
diff --git a/drivers/event/dsw/dsw_event.c b/drivers/event/dsw/dsw_event.c
index 21c102275..f87656703 100644
--- a/drivers/event/dsw/dsw_event.c
+++ b/drivers/event/dsw/dsw_event.c
@@ -160,6 +160,11 @@ dsw_port_load_update(struct dsw_port *port, uint64_t now)
 		(DSW_OLD_LOAD_WEIGHT+1);
 
 	rte_atomic16_set(&port->load, new_load);
+
+	/* The load of the recently immigrated flows should hopefully
+	 * be reflected the load estimate by now.
+	 */
+	rte_atomic32_set(&port->immigration_load, 0);
 }
 
 static void
@@ -362,7 +367,13 @@ dsw_retrieve_port_loads(struct dsw_evdev *dsw, int16_t *port_loads,
 	uint16_t i;
 
 	for (i = 0; i < dsw->num_ports; i++) {
-		int16_t load = rte_atomic16_read(&dsw->ports[i].load);
+		int16_t measured_load = rte_atomic16_read(&dsw->ports[i].load);
+		int32_t immigration_load =
+			rte_atomic32_read(&dsw->ports[i].immigration_load);
+		int32_t load = measured_load + immigration_load;
+
+		load = RTE_MIN(load, DSW_MAX_LOAD);
+
 		if (load < load_limit)
 			below_limit = true;
 		port_loads[i] = load;
@@ -491,6 +502,9 @@ dsw_select_emigration_target(struct dsw_evdev *dsw,
 	target_qfs[*targets_len] = *candidate_qf;
 	(*targets_len)++;
 
+	rte_atomic32_add(&dsw->ports[candidate_port_id].immigration_load,
+			 candidate_flow_load);
+
 	return true;
 }
 
@@ -503,7 +517,7 @@ dsw_select_emigration_targets(struct dsw_evdev *dsw,
 	struct dsw_queue_flow *target_qfs = source_port->emigration_target_qfs;
 	uint8_t *target_port_ids = source_port->emigration_target_port_ids;
 	uint8_t *targets_len = &source_port->emigration_targets_len;
-	uint8_t i;
+	uint16_t i;
 
 	for (i = 0; i < DSW_MAX_FLOWS_PER_MIGRATION; i++) {
 		bool found;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2020-03-09  8:41 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-09  7:17 [dpdk-dev] [PATCH 5/8] event/dsw: avoid migration waves in large systems Jerin Jacob Kollanukkaran
2020-03-09  7:58 ` Mattias Rönnblom
2020-03-09  8:12   ` Jerin Jacob Kollanukkaran
2020-03-09  8:41     ` Mattias Rönnblom
  -- strict thread matches above, loose matches on Subject: below --
2020-03-09  6:50 [dpdk-dev] [PATCH 0/8] DSW performance and statistics improvements Mattias Rönnblom
2020-03-09  6:51 ` [dpdk-dev] [PATCH 5/8] event/dsw: avoid migration waves in large systems Mattias Rönnblom

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.