All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] timer: fix break list when timer_cb reset running timer
@ 2016-07-17 18:08 Hiroyuki Mikita
  2016-07-20 20:17 ` Sanford, Robert
  2016-07-22 22:14 ` Sanford, Robert
  0 siblings, 2 replies; 11+ messages in thread
From: Hiroyuki Mikita @ 2016-07-17 18:08 UTC (permalink / raw)
  To: rsanford; +Cc: dev

When timer_cb resets another running timer on the same lcore,
the list of expired timers is chained to the pending-list.
This commit prevents a running timer from being reset
by not its own timer_cb.

Signed-off-by: Hiroyuki Mikita <h.mikita89@gmail.com>
---
 lib/librte_timer/rte_timer.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/lib/librte_timer/rte_timer.c b/lib/librte_timer/rte_timer.c
index 3dcdab5..00e80c9 100644
--- a/lib/librte_timer/rte_timer.c
+++ b/lib/librte_timer/rte_timer.c
@@ -69,6 +69,9 @@ struct priv_timer {
 
 	unsigned prev_lcore;              /**< used for lcore round robin */
 
+	/** running timer on this lcore now */
+	struct rte_timer *running_tim;
+
 #ifdef RTE_LIBRTE_TIMER_DEBUG
 	/** per-lcore statistics */
 	struct rte_timer_debug_stats stats;
@@ -135,9 +138,12 @@ timer_set_config_state(struct rte_timer *tim,
 	while (success == 0) {
 		prev_status.u32 = tim->status.u32;
 
-		/* timer is running on another core, exit */
+		/* timer is running on another core
+		 * or ready to run on local core, exit
+		 */
 		if (prev_status.state == RTE_TIMER_RUNNING &&
-		    prev_status.owner != (uint16_t)lcore_id)
+		    (prev_status.owner != (uint16_t)lcore_id ||
+		     tim != priv_timer[lcore_id].running_tim))
 			return -1;
 
 		/* timer is being configured on another core */
@@ -580,6 +586,7 @@ void rte_timer_manage(void)
 	for (tim = run_first_tim; tim != NULL; tim = next_tim) {
 		next_tim = tim->sl_next[0];
 		priv_timer[lcore_id].updated = 0;
+		priv_timer[lcore_id].running_tim = tim;
 
 		/* execute callback function with list unlocked */
 		tim->f(tim, tim->arg);
@@ -610,6 +617,7 @@ void rte_timer_manage(void)
 			rte_spinlock_unlock(&priv_timer[lcore_id].list_lock);
 		}
 	}
+	priv_timer[lcore_id].running_tim = NULL;
 }
 
 /* dump statistics about timers */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] timer: fix break list when timer_cb reset running timer
  2016-07-17 18:08 [PATCH] timer: fix break list when timer_cb reset running timer Hiroyuki Mikita
@ 2016-07-20 20:17 ` Sanford, Robert
  2016-07-21 10:34   ` Hiroyuki Mikita
  2016-07-22 22:14 ` Sanford, Robert
  1 sibling, 1 reply; 11+ messages in thread
From: Sanford, Robert @ 2016-07-20 20:17 UTC (permalink / raw)
  To: Hiroyuki Mikita; +Cc: dev, Thomas Monjalon

Hi Hiroyuki,

I am reviewing your 3 timer patches.
Can you please explain in more detail your use-case that results in a
problem?

For example, is it when timer A's callback tries to reset
(rte_timer_reset) timer B?
If yes, is timer B in PENDING state and likely to expire soon?

--
Thanks,
Robert


On 7/17/16 2:08 PM, "Hiroyuki Mikita" <h.mikita89@gmail.com> wrote:

>When timer_cb resets another running timer on the same lcore,
>the list of expired timers is chained to the pending-list.
>This commit prevents a running timer from being reset
>by not its own timer_cb.
>
>Signed-off-by: Hiroyuki Mikita <h.mikita89@gmail.com>
>---
> lib/librte_timer/rte_timer.c | 12 ++++++++++--
> 1 file changed, 10 insertions(+), 2 deletions(-)
>
>diff --git a/lib/librte_timer/rte_timer.c b/lib/librte_timer/rte_timer.c
>index 3dcdab5..00e80c9 100644
>--- a/lib/librte_timer/rte_timer.c
>+++ b/lib/librte_timer/rte_timer.c
>@@ -69,6 +69,9 @@ struct priv_timer {
> 
> 	unsigned prev_lcore;              /**< used for lcore round robin */
> 
>+	/** running timer on this lcore now */
>+	struct rte_timer *running_tim;
>+
> #ifdef RTE_LIBRTE_TIMER_DEBUG
> 	/** per-lcore statistics */
> 	struct rte_timer_debug_stats stats;
>@@ -135,9 +138,12 @@ timer_set_config_state(struct rte_timer *tim,
> 	while (success == 0) {
> 		prev_status.u32 = tim->status.u32;
> 
>-		/* timer is running on another core, exit */
>+		/* timer is running on another core
>+		 * or ready to run on local core, exit
>+		 */
> 		if (prev_status.state == RTE_TIMER_RUNNING &&
>-		    prev_status.owner != (uint16_t)lcore_id)
>+		    (prev_status.owner != (uint16_t)lcore_id ||
>+		     tim != priv_timer[lcore_id].running_tim))
> 			return -1;
> 
> 		/* timer is being configured on another core */
>@@ -580,6 +586,7 @@ void rte_timer_manage(void)
> 	for (tim = run_first_tim; tim != NULL; tim = next_tim) {
> 		next_tim = tim->sl_next[0];
> 		priv_timer[lcore_id].updated = 0;
>+		priv_timer[lcore_id].running_tim = tim;
> 
> 		/* execute callback function with list unlocked */
> 		tim->f(tim, tim->arg);
>@@ -610,6 +617,7 @@ void rte_timer_manage(void)
> 			rte_spinlock_unlock(&priv_timer[lcore_id].list_lock);
> 		}
> 	}
>+	priv_timer[lcore_id].running_tim = NULL;
> }
> 
> /* dump statistics about timers */
>-- 
>2.7.4
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] timer: fix break list when timer_cb reset running timer
  2016-07-20 20:17 ` Sanford, Robert
@ 2016-07-21 10:34   ` Hiroyuki Mikita
  0 siblings, 0 replies; 11+ messages in thread
From: Hiroyuki Mikita @ 2016-07-21 10:34 UTC (permalink / raw)
  To: Sanford, Robert; +Cc: dev, Thomas Monjalon

Hi Robert,

Thank you for reviewing.

In the following case, the skip list is broken.
- Timer A and timer B are configured on the same lcore, in the same
pending list.
- The expire time of timer A is earlier than that of timer B.
- rte_timer_manage() is called on the lcore after the expire time of timer B.
- The callback of timer A resets timer B.

In rte_timer_manage() process,
both timers are expired at the same time, so the running list includes
both timers.
States of both timers are transited from PENDING to RUNNING, then
callbacks are executed sequentially.
The callback of timer A resets timer B which is still RUNNING, so
timer B is added to the pending-list.
In this time, timer B is in both the running list and the pending list.
It means that the running list is chained to the pending list. Both
lists are broken.

Regards,
Hiroyuki

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] timer: fix break list when timer_cb reset running timer
  2016-07-17 18:08 [PATCH] timer: fix break list when timer_cb reset running timer Hiroyuki Mikita
  2016-07-20 20:17 ` Sanford, Robert
@ 2016-07-22 22:14 ` Sanford, Robert
  2016-07-23  8:49   ` Thomas Monjalon
  2016-07-25 15:53   ` Thomas Monjalon
  1 sibling, 2 replies; 11+ messages in thread
From: Sanford, Robert @ 2016-07-22 22:14 UTC (permalink / raw)
  To: Hiroyuki Mikita, dev, Thomas Monjalon



On 7/17/16 2:08 PM, "Hiroyuki Mikita" <h.mikita89@gmail.com> wrote:

>When timer_cb resets another running timer on the same lcore,
>the list of expired timers is chained to the pending-list.
>This commit prevents a running timer from being reset
>by not its own timer_cb.
>
>Signed-off-by: Hiroyuki Mikita <h.mikita89@gmail.com>
>---
> lib/librte_timer/rte_timer.c | 12 ++++++++++--
> 1 file changed, 10 insertions(+), 2 deletions(-)
>
>diff --git a/lib/librte_timer/rte_timer.c b/lib/librte_timer/rte_timer.c
>index 3dcdab5..00e80c9 100644
>--- a/lib/librte_timer/rte_timer.c
>+++ b/lib/librte_timer/rte_timer.c
>@@ -69,6 +69,9 @@ struct priv_timer {
> 
> 	unsigned prev_lcore;              /**< used for lcore round robin */
> 
>+	/** running timer on this lcore now */
>+	struct rte_timer *running_tim;
>+
> #ifdef RTE_LIBRTE_TIMER_DEBUG
> 	/** per-lcore statistics */
> 	struct rte_timer_debug_stats stats;
>@@ -135,9 +138,12 @@ timer_set_config_state(struct rte_timer *tim,
> 	while (success == 0) {
> 		prev_status.u32 = tim->status.u32;
> 
>-		/* timer is running on another core, exit */
>+		/* timer is running on another core
>+		 * or ready to run on local core, exit
>+		 */
> 		if (prev_status.state == RTE_TIMER_RUNNING &&
>-		    prev_status.owner != (uint16_t)lcore_id)
>+		    (prev_status.owner != (uint16_t)lcore_id ||
>+		     tim != priv_timer[lcore_id].running_tim))
> 			return -1;
> 
> 		/* timer is being configured on another core */
>@@ -580,6 +586,7 @@ void rte_timer_manage(void)
> 	for (tim = run_first_tim; tim != NULL; tim = next_tim) {
> 		next_tim = tim->sl_next[0];
> 		priv_timer[lcore_id].updated = 0;
>+		priv_timer[lcore_id].running_tim = tim;
> 
> 		/* execute callback function with list unlocked */
> 		tim->f(tim, tim->arg);
>@@ -610,6 +617,7 @@ void rte_timer_manage(void)
> 			rte_spinlock_unlock(&priv_timer[lcore_id].list_lock);
> 		}
> 	}
>+	priv_timer[lcore_id].running_tim = NULL;
> }
> 
> /* dump statistics about timers */
>-- 
>2.7.4
>

Acked-by: Robert Sanford <rsanford@akamai.com>


I tested the three timer patches with app/test timer_autotest and
timer_racecond_autotest, and additional private tests.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] timer: fix break list when timer_cb reset running timer
  2016-07-22 22:14 ` Sanford, Robert
@ 2016-07-23  8:49   ` Thomas Monjalon
  2016-07-25 12:29     ` Sanford, Robert
  2016-07-25 15:53   ` Thomas Monjalon
  1 sibling, 1 reply; 11+ messages in thread
From: Thomas Monjalon @ 2016-07-23  8:49 UTC (permalink / raw)
  To: Sanford, Robert; +Cc: Hiroyuki Mikita, dev

2016-07-23 0:14 GMT+02:00 Sanford, Robert <rsanford@akamai.com>:
> Acked-by: Robert Sanford <rsanford@akamai.com>
>
> I tested the three timer patches with app/test timer_autotest and
> timer_racecond_autotest, and additional private tests.

Thanks Robert.
Are you confident enough to integrate them in the last days of 16.07?
How critical are they?
Should we make a RC5 for them?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] timer: fix break list when timer_cb reset running timer
  2016-07-23  8:49   ` Thomas Monjalon
@ 2016-07-25 12:29     ` Sanford, Robert
  2016-07-25 14:21       ` Hiroyuki Mikita
  0 siblings, 1 reply; 11+ messages in thread
From: Sanford, Robert @ 2016-07-25 12:29 UTC (permalink / raw)
  To: Thomas Monjalon, Hiroyuki Mikita; +Cc: dev


On 7/23/16 4:49 AM, "Thomas Monjalon" <thomas.monjalon@6wind.com> wrote:

>2016-07-23 0:14 GMT+02:00 Sanford, Robert <rsanford@akamai.com>:
>> Acked-by: Robert Sanford <rsanford@akamai.com>
>>
>> I tested the three timer patches with app/test timer_autotest and
>> timer_racecond_autotest, and additional private tests.
>
>Thanks Robert.
>Are you confident enough to integrate them in the last days of 16.07?
>How critical are they?
>Should we make a RC5 for them?

Yes, I'm confident that the patches are safe and correct.
However, I'm not sure that we should make a RC just for them.

Patch 1 fixes a problem where we incorrectly lower the depth of the skip
list, resulting in performance that does not live up to O(log n) that we
expect. Summary: performance degradation with large number of timers.

Patch 2 fixes a minor inefficiency when timer_manage() races with
timer_stop() or timer_reset().

Patch 3 fixes the most serious problem: We may corrupt timer list links if
multiple timers expire at roughly the same time, and one of those timers'
callback tries to stop/reset other(s) that are scheduled to run in the
same call to timer_manage().

Question for Hiroyuki:
How did you discover timer linked-list corruption? By code inspection, or
do you have a real application that needs that functionality (timers
stop/reset each other at roughly the same time)?

Regards,
Robert

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] timer: fix break list when timer_cb reset running timer
  2016-07-25 12:29     ` Sanford, Robert
@ 2016-07-25 14:21       ` Hiroyuki Mikita
  2016-07-25 14:43         ` Thomas Monjalon
  0 siblings, 1 reply; 11+ messages in thread
From: Hiroyuki Mikita @ 2016-07-25 14:21 UTC (permalink / raw)
  To: Sanford, Robert; +Cc: Thomas Monjalon, dev

Hi Robert,

My application had timers which reset another timer and sometimes did not work.
Its profile by 'perf' command showed timer_add occupied 99.9% CPU. It
seemed that an infinite loop occurred in rte_timer.
I inspected timer codes and found that the main cause was a bug of the
patch 3. In this process, I also found the other bugs.

Regards,
Hiroyuki

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] timer: fix break list when timer_cb reset running timer
  2016-07-25 14:21       ` Hiroyuki Mikita
@ 2016-07-25 14:43         ` Thomas Monjalon
  2016-07-25 15:14           ` Hiroyuki Mikita
  0 siblings, 1 reply; 11+ messages in thread
From: Thomas Monjalon @ 2016-07-25 14:43 UTC (permalink / raw)
  To: Hiroyuki Mikita, Sanford, Robert; +Cc: dev

Hiroyuki, Robert,
I would like to apply these patches quickly.
Please could you provide some "Fixes:" line to know the origin
of the bugs?
Thanks

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] timer: fix break list when timer_cb reset running timer
  2016-07-25 14:43         ` Thomas Monjalon
@ 2016-07-25 15:14           ` Hiroyuki Mikita
  2016-07-25 15:21             ` Thomas Monjalon
  0 siblings, 1 reply; 11+ messages in thread
From: Hiroyuki Mikita @ 2016-07-25 15:14 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: Sanford, Robert, dev

Fixes: a4b7a5a45cf5 ("timer: fix race condition")

2016-07-25 23:43 GMT+09:00 Thomas Monjalon <thomas.monjalon@6wind.com>:
> Hiroyuki, Robert,
> I would like to apply these patches quickly.
> Please could you provide some "Fixes:" line to know the origin
> of the bugs?
> Thanks
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] timer: fix break list when timer_cb reset running timer
  2016-07-25 15:14           ` Hiroyuki Mikita
@ 2016-07-25 15:21             ` Thomas Monjalon
  0 siblings, 0 replies; 11+ messages in thread
From: Thomas Monjalon @ 2016-07-25 15:21 UTC (permalink / raw)
  To: Hiroyuki Mikita; +Cc: Sanford, Robert, dev

2016-07-26 00:14, Hiroyuki Mikita:
> Fixes: a4b7a5a45cf5 ("timer: fix race condition")
> 
> 2016-07-25 23:43 GMT+09:00 Thomas Monjalon <thomas.monjalon@6wind.com>:
> > Hiroyuki, Robert,
> > I would like to apply these patches quickly.
> > Please could you provide some "Fixes:" line to know the origin
> > of the bugs?
> > Thanks

Thanks a lot, Hiroyuki

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] timer: fix break list when timer_cb reset running timer
  2016-07-22 22:14 ` Sanford, Robert
  2016-07-23  8:49   ` Thomas Monjalon
@ 2016-07-25 15:53   ` Thomas Monjalon
  1 sibling, 0 replies; 11+ messages in thread
From: Thomas Monjalon @ 2016-07-25 15:53 UTC (permalink / raw)
  To: Hiroyuki Mikita; +Cc: Sanford, Robert, dev

> >When timer_cb resets another running timer on the same lcore,
> >the list of expired timers is chained to the pending-list.
> >This commit prevents a running timer from being reset
> >by not its own timer_cb.
> >
> >Signed-off-by: Hiroyuki Mikita <h.mikita89@gmail.com>
> 
> Acked-by: Robert Sanford <rsanford@akamai.com>
> 
> I tested the three timer patches with app/test timer_autotest and
> timer_racecond_autotest, and additional private tests.

Applied, thanks

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2016-07-25 15:53 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-17 18:08 [PATCH] timer: fix break list when timer_cb reset running timer Hiroyuki Mikita
2016-07-20 20:17 ` Sanford, Robert
2016-07-21 10:34   ` Hiroyuki Mikita
2016-07-22 22:14 ` Sanford, Robert
2016-07-23  8:49   ` Thomas Monjalon
2016-07-25 12:29     ` Sanford, Robert
2016-07-25 14:21       ` Hiroyuki Mikita
2016-07-25 14:43         ` Thomas Monjalon
2016-07-25 15:14           ` Hiroyuki Mikita
2016-07-25 15:21             ` Thomas Monjalon
2016-07-25 15:53   ` Thomas Monjalon

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.