From mboxrd@z Thu Jan 1 00:00:00 1970 Subject: Re: race in timerobj References: <8149f85c-4d2e-8411-ee9a-86669c5a1532@siemens.com> <878sah7hyu.fsf@xenomai.org> <871rg54mkh.fsf@xenomai.org> <90061cec-1289-025e-8896-5d85df0d1817@siemens.com> <332c15fb-574a-5cfc-fc65-0ed4c56657bc@siemens.com> From: Jan Kiszka Message-ID: <43de9af0-0ed1-d8c4-3611-f87751436a32@siemens.com> Date: Mon, 25 Jan 2021 16:26:49 +0100 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit List-Id: Discussions about the Xenomai project List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Ronny Meeus , Philippe Gerum Cc: xenomai@xenomai.org On 25.01.21 16:10, Ronny Meeus wrote: > Jan, > > This is my last attempt since I do not want to waste more time on this issue. > Feel free to change whatever you want to the patch. > I put my gmail in the correct mode but it keeps on wrapping > the text and git email also doesn't work (we do not use git overhere) That's why there is git send-email, to by-pass unsuitable interfaces (including web browsers). Thanks anyway, I've hand-edited the patch to make it apply. Philippe, any feedback from you on the approach? Jan > > Best regards, > Ronny > > commit 352b0355f617bfbb38f9bd7b1f7d74967f44c857 > Author: Ronny Meeus > Date: Mon Jan 25 15:11:52 2021 +0100 > > copperplate/timerobj: fix corrupted timer list. > > We observe an issue that the timer-list gets corrupted resulting in an > endless loop executed by the timer-server thread. > > During the processing of the timeout list, a pointer to the next timer > to be handled is kept in the tmp stack variable. > Just before calling the timer handler of the current timer the lock on > the timer list is released giving other threads to change the list. > If the timer currently referenced by tmp is deleted, we end up with an > invalid node (next pointer pointing to itself) and this will result in > an endless loop of the timer server. > > Test code is not available but I have seen this issue in our real > production code and after applying this path, the issue is solved. > > The patch basically changes the timer server logic to always start > from the beginning of the list since when a timer is processed, it is > either removed (one-shot) or reinserted in a different location in the > list. > The processing of the list will stop anyhow if all timers that need > to expire up to "now" are handled. > > Signed-off-by: Ronny Meeus > > diff --git a/lib/copperplate/timerobj.c b/lib/copperplate/timerobj.c > index 08cc0d3..96ad2e5 100644 > --- a/lib/copperplate/timerobj.c > +++ b/lib/copperplate/timerobj.c > @@ -100,7 +100,7 @@ static void *timerobj_server(void *arg) > { > void (*handler)(struct timerobj *tmobj); > struct timespec now, value, interval; > - struct timerobj *tmobj, *tmp; > + struct timerobj *tmobj; > sigset_t set; > int sig, ret; > > @@ -119,7 +119,9 @@ static void *timerobj_server(void *arg) > > __RT(clock_gettime(CLOCK_COPPERPLATE, &now)); > > - pvlist_for_each_entry_safe(tmobj, tmp, &svtimers, next) { > + while (!pvlist_empty(&svtimers)) { > + tmobj = pvlist_first_entry(&svtimers, > typeof(*tmobj), next); > + > value = tmobj->itspec.it_value; > interval = tmobj->itspec.it_interval; > handler = tmobj->handler; > > > Op ma 25 jan. 2021 om 15:36 schreef Jan Kiszka : >> >> On 25.01.21 15:27, Ronny Meeus wrote: >>> commit b38a39bcd758872201e71c07a24d8b5e7f26c3ac >>> Author: Ronny Meeus >>> Date: Mon Jan 25 15:11:52 2021 +0100 >>> >>> We observe an issue that the timer-list gets corrupted resulting in an >>> endless loop executed by the timer-server thread. >>> >>> During the processing of the timeout list, a pointer to the next timer >>> to be handled is kept in the tmp stack variable. >>> Just before calling the timer handler of the current timer the lock on >>> the timer list is released giving other threads the opportunity to >>> change the list. >>> If the timer currently referenced by tmp is deleted, we and up with an >>> invalid node (next pointer pointing to itself) and this will result in >>> an endless loop of the timer server. >>> >>> Test code is not available but I have seen this issue in our real >>> production code and after applying this path, the issue is solved. >>> >>> The patch basically changes the timer server logic to always start >>> from the beginning of the list since when a timer is processed, it is >>> either removed (one-shot) or reinserted in a different location in the >>> list. >>> The processing of the list will stop anyhow if all timers that need >>> to expire up to "now" are handled. >>> >> >> DCO (https://developercertificate.org) via signed-off-by missing. >> >>> diff --git a/lib/copperplate/timerobj.c b/lib/copperplate/timerobj.c >>> index 08cc0d3..b6c6abb 100644 >>> --- a/lib/copperplate/timerobj.c >>> +++ b/lib/copperplate/timerobj.c >>> @@ -100,7 +100,7 @@ static void *timerobj_server(void *arg) >>> { >>> void (*handler)(struct timerobj *tmobj); >>> struct timespec now, value, interval; >>> - struct timerobj *tmobj, *tmp; >>> + struct timerobj *tmobj; >>> sigset_t set; >>> int sig, ret; >>> >>> @@ -119,7 +119,12 @@ static void *timerobj_server(void *arg) >>> >>> __RT(clock_gettime(CLOCK_COPPERPLATE, &now)); >>> >>> - pvlist_for_each_entry_safe(tmobj, tmp, &svtimers, next) { >>> + for (;;) { >>> + if (pvlist_empty(&svtimers)) { >> >> How about "while (!pvlist_empty(&svtimers)) ..."? >> >>> + break; >>> + } >>> + tmobj = pvlist_first_entry(&svtimers, >>> typeof(*tmobj), next); >>> + >> >> And then your mail client rewraps the email. If possible, disable this, >> otherwise use git send-email. >> >>> value = tmobj->itspec.it_value; >>> interval = tmobj->itspec.it_interval; >>> handler = tmobj->handler; >>> >>> >> >> Thanks, >> Jan >> >> -- >> Siemens AG, T RDA IOT >> Corporate Competence Center Embedded Linux