From mboxrd@z Thu Jan 1 00:00:00 1970 MIME-Version: 1.0 References: <8149f85c-4d2e-8411-ee9a-86669c5a1532@siemens.com> <878sah7hyu.fsf@xenomai.org> <871rg54mkh.fsf@xenomai.org> <90061cec-1289-025e-8896-5d85df0d1817@siemens.com> <332c15fb-574a-5cfc-fc65-0ed4c56657bc@siemens.com> In-Reply-To: <332c15fb-574a-5cfc-fc65-0ed4c56657bc@siemens.com> From: Ronny Meeus Date: Mon, 25 Jan 2021 16:10:37 +0100 Message-ID: Subject: Re: race in timerobj Content-Type: text/plain; charset="UTF-8" List-Id: Discussions about the Xenomai project List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Jan Kiszka Cc: Philippe Gerum , xenomai@xenomai.org Jan, This is my last attempt since I do not want to waste more time on this issue. Feel free to change whatever you want to the patch. I put my gmail in the correct mode but it keeps on wrapping the text and git email also doesn't work (we do not use git overhere) Best regards, Ronny commit 352b0355f617bfbb38f9bd7b1f7d74967f44c857 Author: Ronny Meeus Date: Mon Jan 25 15:11:52 2021 +0100 copperplate/timerobj: fix corrupted timer list. We observe an issue that the timer-list gets corrupted resulting in an endless loop executed by the timer-server thread. During the processing of the timeout list, a pointer to the next timer to be handled is kept in the tmp stack variable. Just before calling the timer handler of the current timer the lock on the timer list is released giving other threads to change the list. If the timer currently referenced by tmp is deleted, we end up with an invalid node (next pointer pointing to itself) and this will result in an endless loop of the timer server. Test code is not available but I have seen this issue in our real production code and after applying this path, the issue is solved. The patch basically changes the timer server logic to always start from the beginning of the list since when a timer is processed, it is either removed (one-shot) or reinserted in a different location in the list. The processing of the list will stop anyhow if all timers that need to expire up to "now" are handled. Signed-off-by: Ronny Meeus diff --git a/lib/copperplate/timerobj.c b/lib/copperplate/timerobj.c index 08cc0d3..96ad2e5 100644 --- a/lib/copperplate/timerobj.c +++ b/lib/copperplate/timerobj.c @@ -100,7 +100,7 @@ static void *timerobj_server(void *arg) { void (*handler)(struct timerobj *tmobj); struct timespec now, value, interval; - struct timerobj *tmobj, *tmp; + struct timerobj *tmobj; sigset_t set; int sig, ret; @@ -119,7 +119,9 @@ static void *timerobj_server(void *arg) __RT(clock_gettime(CLOCK_COPPERPLATE, &now)); - pvlist_for_each_entry_safe(tmobj, tmp, &svtimers, next) { + while (!pvlist_empty(&svtimers)) { + tmobj = pvlist_first_entry(&svtimers, typeof(*tmobj), next); + value = tmobj->itspec.it_value; interval = tmobj->itspec.it_interval; handler = tmobj->handler; Op ma 25 jan. 2021 om 15:36 schreef Jan Kiszka : > > On 25.01.21 15:27, Ronny Meeus wrote: > > commit b38a39bcd758872201e71c07a24d8b5e7f26c3ac > > Author: Ronny Meeus > > Date: Mon Jan 25 15:11:52 2021 +0100 > > > > We observe an issue that the timer-list gets corrupted resulting in an > > endless loop executed by the timer-server thread. > > > > During the processing of the timeout list, a pointer to the next timer > > to be handled is kept in the tmp stack variable. > > Just before calling the timer handler of the current timer the lock on > > the timer list is released giving other threads the opportunity to > > change the list. > > If the timer currently referenced by tmp is deleted, we and up with an > > invalid node (next pointer pointing to itself) and this will result in > > an endless loop of the timer server. > > > > Test code is not available but I have seen this issue in our real > > production code and after applying this path, the issue is solved. > > > > The patch basically changes the timer server logic to always start > > from the beginning of the list since when a timer is processed, it is > > either removed (one-shot) or reinserted in a different location in the > > list. > > The processing of the list will stop anyhow if all timers that need > > to expire up to "now" are handled. > > > > DCO (https://developercertificate.org) via signed-off-by missing. > > > diff --git a/lib/copperplate/timerobj.c b/lib/copperplate/timerobj.c > > index 08cc0d3..b6c6abb 100644 > > --- a/lib/copperplate/timerobj.c > > +++ b/lib/copperplate/timerobj.c > > @@ -100,7 +100,7 @@ static void *timerobj_server(void *arg) > > { > > void (*handler)(struct timerobj *tmobj); > > struct timespec now, value, interval; > > - struct timerobj *tmobj, *tmp; > > + struct timerobj *tmobj; > > sigset_t set; > > int sig, ret; > > > > @@ -119,7 +119,12 @@ static void *timerobj_server(void *arg) > > > > __RT(clock_gettime(CLOCK_COPPERPLATE, &now)); > > > > - pvlist_for_each_entry_safe(tmobj, tmp, &svtimers, next) { > > + for (;;) { > > + if (pvlist_empty(&svtimers)) { > > How about "while (!pvlist_empty(&svtimers)) ..."? > > > + break; > > + } > > + tmobj = pvlist_first_entry(&svtimers, > > typeof(*tmobj), next); > > + > > And then your mail client rewraps the email. If possible, disable this, > otherwise use git send-email. > > > value = tmobj->itspec.it_value; > > interval = tmobj->itspec.it_interval; > > handler = tmobj->handler; > > > > > > Thanks, > Jan > > -- > Siemens AG, T RDA IOT > Corporate Competence Center Embedded Linux