xenomai.lists.linux.dev archive mirror
* Issues with evl_mutex_trylock()
@ 2022-08-25 20:47 Giulio Moro
  2022-08-26 14:09 ` Philippe Gerum
  0 siblings, 1 reply; 17+ messages in thread
From: Giulio Moro @ 2022-08-25 20:47 UTC (permalink / raw)
  To: xenomai

In the toy example below I start some pthreads with SCHED_FIFO, priority 94 and setting the affinity of one thread per core. I then attach them to the EVL core and share a lock between them and the main thread. The main thread has a settable priority and schedule (SCHED_OTHER with prio 0 or SCHED_FIFO with higher priority) and its affinity is set to all available cores.

Here's a summary of what I get when running the test with some parameters (arguments are: "max iterations", "number of threads to create", "priority of the main thread", "flags"; the t flag means throttled printing, so next to nothing is printed). The error code reported alongside failures is the base 2 representation of the return code: 100 means that errors were detected in evl_unlock_mutex() in the main thread following a successful call to evl_trylock_mutex() (ERR_UNLOCK, 1 << 2, in the code below). During these tests, the output is piped to cat and then redirected to /dev/null.

mutex-test 100000 0 0 t :  Success real 0m13.583s user 0m1.218s sys 0m0.000s
mutex-test 100000 0 90 t :  Success real 0m13.350s user 0m1.177s sys 0m0.000s
mutex-test 100000 0 99 t :  Success real 0m13.347s user 0m1.179s sys 0m0.000s
mutex-test 100000 1 0 t :  Success real 0m20.087s user 0m3.575s sys 0m0.000s
mutex-test 100000 1 90 t :  Success real 0m17.610s user 0m2.133s sys 0m0.000s
mutex-test 100000 1 99 t :  Success real 0m15.466s user 0m1.333s sys 0m0.000s
mutex-test 100000 2 0 t :  Failed 100 real 0m0.259s user 0m0.083s sys 0m0.000s
mutex-test 100000 2 90 t :  Failed 100 real 0m0.374s user 0m0.176s sys 0m0.000s
mutex-test 100000 2 99 t :  Failed 100 real 0m0.207s user 0m0.109s sys 0m0.000s
mutex-test 100000 3 0 t :  Failed 100 real 0m0.253s user 0m0.124s sys 0m0.000s
mutex-test 100000 3 90 t :  Failed 100 real 0m0.383s user 0m0.149s sys 0m0.000s
mutex-test 100000 3 99 t :  Failed 100 real 0m0.283s user 0m0.151s sys 0m0.000s
mutex-test 100000 4 0 t :  Failed 100 real 0m0.526s user 0m0.221s sys 0m0.000s
mutex-test 100000 4 90 t :  Failed 100 real 0m0.438s user 0m0.178s sys 0m0.000s
mutex-test 100000 4 99 t :  Failed 100 real 0m0.434s user 0m0.178s sys 0m0.000s

So, when running with 2 or more threads, the same error keeps surfacing very quickly.

Typically, when looking at the log of a failed test (needs to run with the `a` flag), you'd get something like this:
  ./mutex-test 10000 2 0 a
---snip---
thread 1 unlocked
thread 1 waiting
thread 0 locked
main try_lock 72
main try_lock: busy
main try_lock 73
main try_lock: busy
main try_lock 74
main try_lock: busy
main try_lock 75
thread 0 unlocked
main locked
thread 0 waiting
thread 1 locked
main ERROR unlock: -1 Operation not permitted
---snip---

So - if the evl_printf() order is to be trusted - the main thread's try_lock() call succeeds, and when it tries to release the lock it fails because it doesn't actually own the lock (EPERM). Note that after the main thread locks it and before it unlocks it, thread 1 has also managed to acquire the lock.
This typically happens the first time that main has successfully acquired the lock after the other threads have started.

I tried a quick and dirty pthread equivalent (build with -DUSE_PTHREAD) and that doesn't fail, which makes me think the logic of my code is OK.

I further observe the following unexpected behaviours:
- in the 0 and 1 thread cases, I get increasing values in the ISW column for the main thread when monitoring with evl ps -st. This happens regardless of the priority of the main thread and of the amount of data printed to the screen; it doesn't happen with 2, 3 or 4 threads (for these I have to run with 'k' to keep going after the errors and be able to observe the output of evl ps)
- running `./mutex-test 100000 1 X a` (where 'a' stands for "print all", and for any priority X) will almost always quickly show the following behaviour: the main thread stops running, and the process kworker/u8:3+stdout:3033.O runs at 100% CPU. I don't think it's expected that a SCHED_FIFO thread attached to the EVL core becomes blocked because it calls evl_printf(). Note that only the main thread is blocked, the other thread keeps running. This doesn't happen when numThreads == 0 or numThreads >= 2.
- running with `./mutex-test 100000 1 X q`, where 'q' is for "don't print anything", gives me an immediate hard lockup of the board 90% of the time (not even a stacktrace out of the UART!) (X can be any priority). Note that if this is run with `t` (throttled printing) or `a` (print all), it works kind of OK, apart from the issue mentioned in the previous bullet point
- occasionally I get other lockups of the board during these tests that are harder to reproduce. For instance, one happened at some point while running `./mutex-test 10000 3 0 t`
- I do get some EVL-related stacktraces relatively often when hitting ctrl-C on a program that is still starting up. I will make a separate post about this.

As previously, this is tested on TI's SK-AM62 with this branch https://source.denx.de/Xenomai/xenomai4/linux-evl/-/tree/vendor/ti/v5.10.120 and commit 7ccc58d62905724f1b8de4823e0fe8e0129b0fb7 of libevl (r37 + 1)

Code below for posterity. Same code and testing routine also available here https://github.com/giuliomoro/evl-mutex-test.

#define _GNU_SOURCE
#include <sched.h>
#include <unistd.h>
#ifndef USE_PTHREAD
#include <evl/evl.h>
#endif // USE_PTHREAD
#include <pthread.h>
#include <string.h>
#include <error.h>
#include <stdlib.h>
#include <sys/sysinfo.h>
#include <stdio.h>
#include <errno.h>
#include <signal.h>

#define ERR_OTHER (1 << 0)
#define ERR_TRYLOCK (1 << 1)
#define ERR_UNLOCK (1 << 2)
#define ERR_LOCK (1 << 3)
#define ERR_BITS 4

#define PRINT_NONE 1
#define PRINT_ERRORS 2
#define PRINT_THROTTLED 3
#define PRINT_ALL 4
int printFlags = PRINT_ALL;
const int throttledPrints = 500;

#ifdef __EVL__
typedef struct evl_mutex mutex_t;
#else
typedef pthread_mutex_t mutex_t;
int nop(){ return 0; }
#define evl_init(...) nop()
#define evl_attach_thread(...) nop()
#define evl_attach_self(...) nop()
#define evl_printf(...) printf(__VA_ARGS__)
#define evl_create_mutex(mutex, ...) \
	-pthread_mutex_init(mutex, NULL);

int evl_lock_mutex(mutex_t* mutex)
{
	return -pthread_mutex_lock(mutex);
}
int evl_trylock_mutex(mutex_t* mutex)
{
	return -pthread_mutex_trylock(mutex);
}
int evl_unlock_mutex(mutex_t* mutex)
{
	return -pthread_mutex_unlock(mutex);
}
#define evl_usleep(...) usleep(__VA_ARGS__)
#endif

mutex_t mutex;
volatile int shouldStop;
pthread_t* threads;

void* kThreadSuccess = (void*)2l;
void* kThreadError = (void*)1l;

void* threadFunc(void* arg)
{
	int n = (long int)arg;
	int shouldPrint = (PRINT_NONE != printFlags);
	shouldPrint && evl_printf("Thread %d\n", n);
	int ret = evl_attach_thread(EVL_CLONE_PRIVATE, "thread-%d", n);
	if(ret < 0)
	{
		fprintf(stderr, "evl_attach_thread for thread %d failed with %s (%d)\n", n, strerror(-ret), -ret);
		return (void*)ERR_OTHER;
	}
	cpu_set_t readset;
	ret = pthread_getaffinity_np(threads[n], sizeof(readset), &readset);
	if(ret)
	{
		fprintf(stderr, "pthread_getaffinity_np %d %s\n", ret, strerror(ret));
		return (void*)ERR_OTHER;
	}
	char str[1000];
	char* p = str;
	// assemble the string before evl_printf to avoid scrambling because of
	// writing from multiple threads
	if(shouldPrint)
	{
		p += snprintf(p, sizeof(str) - (p - str), "Affinity for %d: [ ", n);
		for (int j = 0; j < CPU_SETSIZE; j++)
			if (CPU_ISSET(j, &readset))
			{
				if(p < str + sizeof(str))
					p += snprintf(p, sizeof(str) - (p - str), "%d  ", j);
			}
		evl_printf("%s]\n", str);
	}
	long int err = 0;
	unsigned int c = 0;
	while(!shouldStop)
	{
		int shouldPrint = (PRINT_ALL == printFlags) || ((PRINT_THROTTLED == printFlags) && (0 == (n % throttledPrints)));
		int shouldPrintError = (PRINT_NONE != printFlags);
		c++;
		shouldPrint && evl_printf("thread %d waiting\n", n);
		int ret = evl_lock_mutex(&mutex);
		if(ret) {
			shouldPrintError && evl_printf("thread %d ERROR lock() %d %s\n", n, ret, strerror(-ret));
			err |= ERR_LOCK;
		} else {
			shouldPrint && evl_printf("thread %d locked\n", n);
			for(volatile unsigned int n = 0; n < 100000; ++n) // do some work
				;
			ret = evl_unlock_mutex(&mutex);
			if(ret) {
				shouldPrintError && evl_printf("thread %d ERROR unlock() %d %s\n", n, ret, strerror(-ret));
				err |= ERR_UNLOCK;
			} else {
				shouldPrint && evl_printf("thread %d unlocked\n", n);
			}
		}
	}
	return (void*)err;
}

void sig_handler(int var)
{
	shouldStop = 1;
}


int main(int argc, char** argv)
{
#ifdef __EVL__
	char backend[] = "EVL";
#else
	char backend[] = "pthread";
#endif
	int maxThreads = get_nprocs();
	if(argc < 4)
	{
		fprintf(stderr, "Usage \"%s numIter numThreads mainPrio [flags]\"\n"
				"Where flags is a combination of:\n"
				" k: keep going on error (default: no)\n"
				" q: print nothing\n"
				" e: print errors\n"
				" t: print every %d iterations or errors\n"
				" a: print all (default)\n"
				, argv[0], throttledPrints);
		return ERR_OTHER;
	}
	int numIter = atoi(argv[1]);
	int numThreads = atoi(argv[2]);
	if(numThreads > maxThreads)
	{
		fprintf(stderr, "Too many threads\n");
		return ERR_OTHER;
	}
	int mainPrio = atoi(argv[3]);
	if(mainPrio > sched_get_priority_max(SCHED_FIFO))
	{
		fprintf(stderr, "Too high priority\n");
		return ERR_OTHER;
	}
	int keepGoing = 0;
	if(argc >= 5)
	{
		const char* str = argv[4];
		for(unsigned int n = 0; n < strlen(str); ++n)
		{
			switch (str[n])
			{
			case 'k':
				keepGoing = 1;
				break;
			case 'e':
				printFlags = PRINT_ERRORS;
				break;
			case 'q':
				printFlags = PRINT_NONE;
				break;
			case 't':
				printFlags = PRINT_THROTTLED;
				break;
			case 'a':
				printFlags = PRINT_ALL;
				break;
			}
		}
	}
	int shouldPrint = (PRINT_NONE != printFlags);
	shouldPrint && printf("Using %s\n", backend);
	shouldPrint && printf("NumThreads: %d\n", numThreads);

	int threadPrio = 94;
	signal(SIGINT, sig_handler);
	threads = (pthread_t*)malloc(sizeof(threads[0]) * numThreads);
	int ret = evl_init();
	if(ret < 0)
	{
		fprintf(stderr, "evl_init: %d %s\n", ret, strerror(-ret));
		return ERR_OTHER;
	}
	ret = evl_create_mutex(&mutex, EVL_CLOCK_MONOTONIC, 0, EVL_MUTEX_NORMAL | EVL_CLONE_PRIVATE, "test-mutex");
	if(ret < 0)
	{
		fprintf(stderr, "evl_create_mutex: %d %s\n", ret, strerror(-ret));
		return ERR_OTHER;
	}
	ret = evl_attach_self("main");
	if(ret < 0)
	{
		fprintf(stderr, "evl_attach_self: %d %s\n", ret, strerror(-ret));
		return ERR_OTHER;
	}
	// EVL will have pinned this thread on the current CPU. Allow it to run
	// on all CPUs. Critically, this needs to be done before allowing
	// threads to run, or we may be starved by whichever one runs on our core.
	cpu_set_t cpuset;
	CPU_ZERO(&cpuset);
	for(unsigned int n = 0; n < maxThreads; ++n)
		CPU_SET(n, &cpuset);
	ret = pthread_setaffinity_np(pthread_self(), sizeof(cpuset), &cpuset);
	if(ret)
	{
		fprintf(stderr, "pthread_setaffinity_np: %d %s\n", ret, strerror(ret));
		return ERR_OTHER;
	}
	int policy = mainPrio ? SCHED_FIFO : SCHED_OTHER;
	struct sched_param param;
	param.sched_priority = mainPrio;
	ret = pthread_setschedparam(pthread_self(), policy, &param);
	if(ret)
	{
		fprintf(stderr, "pthread_setschedparam: %d %s\n", ret, strerror(ret));
		return ERR_OTHER;
	}
	ret = evl_lock_mutex(&mutex);
	if(ret < 0)
	{
		fprintf(stderr, "evl_lock_mutex: %d %s\n", ret, strerror(-ret));
		return ERR_OTHER;
	}
	for(unsigned int n = 0; n < numThreads; ++n)
	{
		cpu_set_t cpuset;
		CPU_ZERO(&cpuset);
		// start from the last CPU to avoid using 100% of CPU 0 when
		// there is no other thread contending for the lock
		CPU_SET(maxThreads - 1 - n, &cpuset);
		pthread_attr_t attr;
		ret = pthread_attr_init(&attr);
		ret |= pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
		ret |= pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
		ret |= pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
		ret |= pthread_attr_setaffinity_np(&attr, sizeof(cpuset), &cpuset);
		struct sched_param param;
		param.sched_priority = threadPrio;
		ret |= pthread_attr_setschedparam(&attr, &param);
		ret |= pthread_create(&threads[n], &attr, threadFunc, (void*)(long int)n);
		if(ret)
		{
			fprintf(stderr, "An error occurred while creating thread %d\n", n);
			return ERR_OTHER;
		}
	}
	shouldPrint && evl_printf("All threads go\n");
	ret = evl_unlock_mutex(&mutex);
	if(ret)
	{
		evl_printf("main ERROR unlock before loop\n");
		return ERR_OTHER; // TODO: cleanup
	}
	shouldPrint && evl_printf("main unlocked before loop\n");

	long int err = 0;
	for(unsigned int n = 0; n < numIter && !shouldStop; ++n)
	{
		int shouldPrint = (PRINT_ALL == printFlags) || ((PRINT_THROTTLED == printFlags) && (0 == (n % throttledPrints)));
		int shouldPrintError = (PRINT_NONE != printFlags);
		shouldPrint && evl_printf("main try_lock %d\n", n);
		int ret = evl_trylock_mutex(&mutex);
		if(0 == ret)
		{
			shouldPrint && evl_printf("main locked\n");
			ret = evl_unlock_mutex(&mutex);
			if(ret) {
				shouldPrintError && evl_printf("main ERROR unlock: %d %s\n", ret, strerror(-ret));
				err |= ERR_UNLOCK;
				if(!keepGoing)
					break;
			}
			else
				shouldPrint && evl_printf("main unlocked\n");
		} else if (-EBUSY == ret) {
			shouldPrint && evl_printf("main try_lock: busy\n");
		} else {
			shouldPrintError && evl_printf("main ERROR try_lock: %d %s\n", ret, strerror(-ret));
			err |= ERR_TRYLOCK;
			if(!keepGoing)
				break;
		}
		evl_usleep(100);
	}
	shouldStop = 1;
	for(unsigned int n = 0; n < numThreads; ++n)
	{
		shouldPrint && evl_printf("Joining %d\n", n);
		void* res;
		ret = pthread_join(threads[n], &res);
		shouldPrint && evl_printf("Joined %d : %s; result is %p\n", n, ret ? strerror(ret) : "", res);
		err |= (((long int)res) << ERR_BITS);
	}
	free(threads);
	if(err)
		shouldPrint && evl_printf("ERROR\n");
	else
		shouldPrint && evl_printf("SUCCESS\n");
	return err;
}




^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Issues with evl_mutex_trylock()
  2022-08-25 20:47 Issues with evl_mutex_trylock() Giulio Moro
@ 2022-08-26 14:09 ` Philippe Gerum
  2022-08-26 15:16   ` Giulio Moro
  2022-08-26 16:17   ` Philippe Gerum
  0 siblings, 2 replies; 17+ messages in thread
From: Philippe Gerum @ 2022-08-26 14:09 UTC (permalink / raw)
  To: Giulio Moro; +Cc: xenomai


Giulio Moro <giulio@bela.io> writes:

> In the toy example below I start some pthreads with SCHED_FIFO, priority 94 and setting the affinity of one thread per core. I then attach them to the EVL core and share a lock between them and the main thread. The main thread has a settable priority and schedule (SCHED_OTHER with prio 0 or SCHED_FIFO with higher priority) and its affinity is set to all available cores.
>
> Here's a summary of what I get when running the test with some parameters (arguments are: "max iterations", "number of threads to create", "priority of the main thread", "flags (t means don't print anything). The error code reported alongside failures is the base 2 representation of the return code. 100 means that errors were detected in evl_unlock_mutex() in the main thread following a successful call to evl_trylock_mutex(). During these tests, the output is piped to cat and then redirected to /dev/null.
>
> mutex-test 100000 0 0 t :  Success real 0m13.583s user 0m1.218s sys 0m0.000s
> mutex-test 100000 0 90 t :  Success real 0m13.350s user 0m1.177s sys 0m0.000s
> mutex-test 100000 0 99 t :  Success real 0m13.347s user 0m1.179s sys 0m0.000s
> mutex-test 100000 1 0 t :  Success real 0m20.087s user 0m3.575s sys 0m0.000s
> mutex-test 100000 1 90 t :  Success real 0m17.610s user 0m2.133s sys 0m0.000s
> mutex-test 100000 1 99 t :  Success real 0m15.466s user 0m1.333s sys 0m0.000s
> mutex-test 100000 2 0 t :  Failed 100 real 0m0.259s user 0m0.083s sys 0m0.000s
> mutex-test 100000 2 90 t :  Failed 100 real 0m0.374s user 0m0.176s sys 0m0.000s
> mutex-test 100000 2 99 t :  Failed 100 real 0m0.207s user 0m0.109s sys 0m0.000s
> mutex-test 100000 3 0 t :  Failed 100 real 0m0.253s user 0m0.124s sys 0m0.000s
> mutex-test 100000 3 90 t :  Failed 100 real 0m0.383s user 0m0.149s sys 0m0.000s
> mutex-test 100000 3 99 t :  Failed 100 real 0m0.283s user 0m0.151s sys 0m0.000s
> mutex-test 100000 4 0 t :  Failed 100 real 0m0.526s user 0m0.221s sys 0m0.000s
> mutex-test 100000 4 90 t :  Failed 100 real 0m0.438s user 0m0.178s sys 0m0.000s
> mutex-test 100000 4 99 t :  Failed 100 real 0m0.434s user 0m0.178s sys 0m0.000s
>
> So, when running with 2 or more threads, the same error keeps surfacing very quickly.
>

Confirmed. There is something wrong.

> I further observe the following unexpected behaviours:
> - in the 0 and 1 thread cases, I get increasing values in the ISW
> column for the main thread when monitoring with evl ps -st. This
> happens regardless of the priority of the main thread and of the
> amount of data printed to the screen doesn't happen with 2, 3 or 4
> threads (for these I have to run with 'k' to keep going after the
> errors and be able to observe the output of evl ps)

This is due to an issue in your test code: evl_attach_thread() inherits
the current POSIX scheduling params for the caller, so you need to set
them prior to calling this routine. Otherwise, you need to change them
EVL-wise after attachment using evl_set_schedattr().
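
Something along these lines (untested sketch, error checking omitted; the
evl_sched_attrs field names are from memory, please double-check
evl/sched.h):

	/* Option 1: set the POSIX scheduling params first, so that the
	 * attachment call inherits them. */
	struct sched_param param = { .sched_priority = 94 };
	pthread_setschedparam(pthread_self(), SCHED_FIFO, &param);
	int efd = evl_attach_self("main");

	/* Option 2: attach first, then fix the policy up EVL-wise. */
	struct evl_sched_attrs attrs;
	memset(&attrs, 0, sizeof(attrs));
	attrs.sched_policy = SCHED_FIFO;
	attrs.sched_priority = 94;
	evl_set_schedattr(efd, &attrs);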

> - running `./mutex-test 100000 1 X a` (where 'a' stands for "print
> all", and for any priority X) will almost always quickly show the
> following behaviour: the main thread to stops running, and process
> kworker/u8:3+stdout:3033.O runs at 100% CPU. I don't think it's
> expected that a SCHED_FIFO thread attached to the EVL core becomes
> blocked because it calls evl_printf(). Note that only the main thread
> is blocked, the other thread runs. This doesn't happen when numThreads
> == 0 or numThreads >= 2.

This is the expected behavior unless you switch the file descriptor to
the stdout/stderr proxy to non-blocking I/O. If you do, the oob caller
would not wait for the output to drain and would return immediately with
-EAGAIN, dropping the current message in the same move.
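
For instance, something like this (sketch only, untested; it assumes
evl_stdout() hands you back the file descriptor of the stdout proxy and
that the usual fcntl() O_NONBLOCK flag is honored on it):

	int fd = evl_stdout();	/* assumed to return the stdout proxy fd */
	int flags = fcntl(fd, F_GETFL, 0);
	fcntl(fd, F_SETFL, flags | O_NONBLOCK);
	/* from now on an oob evl_printf() would get -EAGAIN and drop the
	 * message instead of waiting for the output to drain */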

> - running with `./mutex-test 100000 1 X q`, where 'q' is for "don't print anything", gives me 90% of the time an immediate hard lock up of the board (not even a stacktrace out of the UART!) (X can be any priority). Note that if this is run with `t` (throttled printing) or `a` (print all), it works kind of OK, apart from the issue mentione in the previous bullet point

Which is expected as well: although your main thread does sleep for 100
µs at each iteration, threads 1-n don't, thus starving the system of CPU
time on their respective processors. Enabling the watchdog in the EVL
debug settings would likely trigger a report from the core.
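
For reference, the knobs I mean live in the EVL debug section of the
kernel configuration, something like this (option names from memory):

	CONFIG_EVL_WATCHDOG=y
	CONFIG_EVL_WATCHDOG_TIMEOUT=4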

> - occasionally I get other lockups of the board during these tests that harder to reproduce. For instance, one happened at some point while running `./mutex-test 10000 3 0 t`

Could be the same issue as above.

> - I do get some EVL-related stacktraces relatively often when ctrl-C a starting program. I will make a separate post about this.
>
> As previously, this is tested on TI's SK-AM62 with this branch https://source.denx.de/Xenomai/xenomai4/linux-evl/-/tree/vendor/ti/v5.10.120 and commit 7ccc58d62905724f1b8de4823e0fe8e0129b0fb7 of libevl (r37 + 1 )
>

-- 
Philippe.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Issues with evl_mutex_trylock()
  2022-08-26 14:09 ` Philippe Gerum
@ 2022-08-26 15:16   ` Giulio Moro
  2022-08-26 15:33     ` Philippe Gerum
  2022-08-26 16:17   ` Philippe Gerum
  1 sibling, 1 reply; 17+ messages in thread
From: Giulio Moro @ 2022-08-26 15:16 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai

> This is due to an issue in your test code: evl_attach_thread() inherits
> the current POSIX scheduling params for the caller, so you need to set
> them prior to calling this routine. Otherwise, you need to change them
> EVL-wise after attachment using evl_set_schedattr().

OK thanks, I fixed it and the ISWs on the main thread go away when the main thread is set to SCHED_FIFO before evl_attach_self().

It's probably worth adding a note about it in the description of evl_attach_thread() https://evlproject.org/core/user-api/thread/ around the table discussing scheduling policies. The statement there about using `sched_setaffinity()` to set affinity post-attach suggested that regular sched_* calls would still be valid. By the way, is there a way to contribute to the documentation?

> This is the expected behavior unless you switch the file descriptor to
> the stdout/stderr proxy to non blocking I/O. If you do, the oob caller
> would not wait for the output to drain and return immediately with
> -EAGAIN, dropping the current message in the same move.

Ah well, that explains why I don't get the usual output scrambling that I'd get with Xenomai's rt_printf() when printing plenty of data. I guess everything comes with a price. I am wondering though: why do these ISW only take place when the thread is SCHED_WEAK and not with SCHED_FIFO?

>> - running with `./mutex-test 100000 1 X q`, where 'q' is for "don't print anything", gives me 90% of the time an immediate hard lock up of the board (not even a stacktrace out of the UART!) (X can be any priority). Note that if this is run with `t` (throttled printing) or `a` (print all), it works kind of OK, apart from the issue mentione in the previous bullet point
> 
> Which is expected as well: although your main thread does sleep for 100
> µs at each iteration, thread 1-n don't, the starving the system from CPU
> time on their respecting processors. Enabling the watchdog in the EVL
> debug settings would likely trigger a report from the core.

Just to clarify, in this case, the code is running with only 1 worker thread and the main thread. main() will have affinity set to all CPUs, while the other thread will be on the last CPU. In other words, I am not hogging all CPUs and in particular not CPU 0. I will try again with EVL_WATCHDOG enabled.

>> - occasionally I get other lockups of the board during these tests that harder to reproduce. For instance, one happened at some point while running `./mutex-test 10000 3 0 t`
> 
> Could be the same issue as above

It makes sense, however when there is more than one worker thread, no thread will run at 100% CPU, because they are contending for the lock and doing actual work while holding it. evl ps indicates that the load gets spread evenly across the worker threads.

Thanks,
Giulio

Philippe Gerum wrote on 26/08/2022 16:09:
> 
> Giulio Moro <giulio@bela.io> writes:
> 
>> In the toy example below I start some pthreads with SCHED_FIFO, priority 94 and setting the affinity of one thread per core. I then attach them to the EVL core and share a lock between them and the main thread. The main thread has a settable priority and schedule (SCHED_OTHER with prio 0 or SCHED_FIFO with higher priority) and its affinity is set to all available cores.
>>
>> Here's a summary of what I get when running the test with some parameters (arguments are: "max iterations", "number of threads to create", "priority of the main thread", "flags (t means don't print anything). The error code reported alongside failures is the base 2 representation of the return code. 100 means that errors were detected in evl_unlock_mutex() in the main thread following a successful call to evl_trylock_mutex(). During these tests, the output is piped to cat and then redirected to /dev/null.
>>
>> mutex-test 100000 0 0 t :  Success real 0m13.583s user 0m1.218s sys 0m0.000s
>> mutex-test 100000 0 90 t :  Success real 0m13.350s user 0m1.177s sys 0m0.000s
>> mutex-test 100000 0 99 t :  Success real 0m13.347s user 0m1.179s sys 0m0.000s
>> mutex-test 100000 1 0 t :  Success real 0m20.087s user 0m3.575s sys 0m0.000s
>> mutex-test 100000 1 90 t :  Success real 0m17.610s user 0m2.133s sys 0m0.000s
>> mutex-test 100000 1 99 t :  Success real 0m15.466s user 0m1.333s sys 0m0.000s
>> mutex-test 100000 2 0 t :  Failed 100 real 0m0.259s user 0m0.083s sys 0m0.000s
>> mutex-test 100000 2 90 t :  Failed 100 real 0m0.374s user 0m0.176s sys 0m0.000s
>> mutex-test 100000 2 99 t :  Failed 100 real 0m0.207s user 0m0.109s sys 0m0.000s
>> mutex-test 100000 3 0 t :  Failed 100 real 0m0.253s user 0m0.124s sys 0m0.000s
>> mutex-test 100000 3 90 t :  Failed 100 real 0m0.383s user 0m0.149s sys 0m0.000s
>> mutex-test 100000 3 99 t :  Failed 100 real 0m0.283s user 0m0.151s sys 0m0.000s
>> mutex-test 100000 4 0 t :  Failed 100 real 0m0.526s user 0m0.221s sys 0m0.000s
>> mutex-test 100000 4 90 t :  Failed 100 real 0m0.438s user 0m0.178s sys 0m0.000s
>> mutex-test 100000 4 99 t :  Failed 100 real 0m0.434s user 0m0.178s sys 0m0.000s
>>
>> So, when running with 2 or more threads, the same error keeps surfacing very quickly.
>>
> 
> Confirmed. There is something wrong.
> 
>> I further observe the following unexpected behaviours:
>> - in the 0 and 1 thread cases, I get increasing values in the ISW
>> column for the main thread when monitoring with evl ps -st. This
>> happens regardless of the priority of the main thread and of the
>> amount of data printed to the screen doesn't happen with 2, 3 or 4
>> threads (for these I have to run with 'k' to keep going after the
>> errors and be able to observe the output of evl ps)
> 
> This is due to an issue in your test code: evl_attach_thread() inherits
> the current POSIX scheduling params for the caller, so you need to set
> them prior to calling this routine. Otherwise, you need to change them
> EVL-wise after attachment using evl_set_schedattr().
> 
>> - running `./mutex-test 100000 1 X a` (where 'a' stands for "print
>> all", and for any priority X) will almost always quickly show the
>> following behaviour: the main thread to stops running, and process
>> kworker/u8:3+stdout:3033.O runs at 100% CPU. I don't think it's
>> expected that a SCHED_FIFO thread attached to the EVL core becomes
>> blocked because it calls evl_printf(). Note that only the main thread
>> is blocked, the other thread runs. This doesn't happen when numThreads
>> == 0 or numThreads >= 2.
> 
> This is the expected behavior unless you switch the file descriptor to
> the stdout/stderr proxy to non blocking I/O. If you do, the oob caller
> would not wait for the output to drain and return immediately with
> -EAGAIN, dropping the current message in the same move.
> 
>> - running with `./mutex-test 100000 1 X q`, where 'q' is for "don't print anything", gives me 90% of the time an immediate hard lock up of the board (not even a stacktrace out of the UART!) (X can be any priority). Note that if this is run with `t` (throttled printing) or `a` (print all), it works kind of OK, apart from the issue mentione in the previous bullet point
> 
> Which is expected as well: although your main thread does sleep for 100
> µs at each iteration, thread 1-n don't, the starving the system from CPU
> time on their respecting processors. Enabling the watchdog in the EVL
> debug settings would likely trigger a report from the core.
> 
>> - occasionally I get other lockups of the board during these tests that harder to reproduce. For instance, one happened at some point while running `./mutex-test 10000 3 0 t`
> 
> Could be the same issue as above.
> 
>> - I do get some EVL-related stacktraces relatively often when ctrl-C a starting program. I will make a separate post about this.
>>
>> As previously, this is tested on TI's SK-AM62 with this branch https://source.denx.de/Xenomai/xenomai4/linux-evl/-/tree/vendor/ti/v5.10.120 and commit 7ccc58d62905724f1b8de4823e0fe8e0129b0fb7 of libevl (r37 + 1 )
>>
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Issues with evl_mutex_trylock()
  2022-08-26 15:16   ` Giulio Moro
@ 2022-08-26 15:33     ` Philippe Gerum
  0 siblings, 0 replies; 17+ messages in thread
From: Philippe Gerum @ 2022-08-26 15:33 UTC (permalink / raw)
  To: Giulio Moro; +Cc: xenomai


Giulio Moro <giulio@bela.io> writes:

>> This is due to an issue in your test code: evl_attach_thread() inherits
>> the current POSIX scheduling params for the caller, so you need to set
>> them prior to calling this routine. Otherwise, you need to change them
>> EVL-wise after attachment using evl_set_schedattr().
>
> OK thanks, I fixed it and ISW on the main thread go away when the main thread is set to SCHED_FIFO before evl_attach_self().
>
> It's probably worth adding a note about it in the description of
> evl_attach_thread() https://evlproject.org/core/user-api/thread/
> around the table discussing scheduling policies. The statement there
> about using `sched_setaffinity()` to set affinity post-attach
> suggested that regular sched_* calls would still be valid. By the way,
> is there a way to contribute to the documentation?
>

Sure, involving people who do use the API is actually the best way to
get it right. Please clone the following repo, patches should be sent to
this list:

https://source.denx.de/Xenomai/xenomai4/website

This is markdown format parsed by the Hugo static site generator.

>> This is the expected behavior unless you switch the file descriptor to
>> the stdout/stderr proxy to non blocking I/O. If you do, the oob caller
>> would not wait for the output to drain and return immediately with
>> -EAGAIN, dropping the current message in the same move.
>
> Ah well that explains why I don't get the usual output scrambling that
> I'd get with Xenomai's rt_printf() when printing plenty of data. I

There is a (kernel) interface to set the size of the output buffer for
these proxies already. The current limit is set to the STDSTREAM_BUFSZ
constant in the library code ATM. Maybe a user interface is missing to
make it a bit more flexible, at least at application startup (e.g. via
evl_init()).

> guess everything comes with a price. I am wondering though: why do
> these ISW only take place only when the thread is SCHED_WEAK and not
> with SCHED_FIFO.

Because SCHED_WEAK means that in-band is the regular execution stage for
the thread, with the ability to switch oob when required by some EVL
syscall it issues. When it comes to mutexes, a thread undergoing the
SCHED_WEAK policy switches oob as a result of grabbing the lock, and
stays there until it drops it, at which point the core switches it back
automatically to the in-band stage. The latter causes the ISW counter to
increase. This is obviously a costly sequence of operations, so this
should not be done lightly, and certainly not in a tight loop.

>
>>> - running with `./mutex-test 100000 1 X q`, where 'q' is for "don't print anything", gives me 90% of the time an immediate hard lock up of the board (not even a stacktrace out of the UART!) (X can be any priority). Note that if this is run with `t` (throttled printing) or `a` (print all), it works kind of OK, apart from the issue mentione in the previous bullet point
>> Which is expected as well: although your main thread does sleep for
>> 100
>> µs at each iteration, thread 1-n don't, the starving the system from CPU
>> time on their respecting processors. Enabling the watchdog in the EVL
>> debug settings would likely trigger a report from the core.
>
> Just to clarify, in this case, the code is running with only 1 worker thread and the main thread. main() will have affinity set to all CPUs, while the other thread will be on the last CPU. In other words, I am not hogging all CPUs and in particular not CPU 0. I will try again with EVL_WATCHDOG enabled.
>

Every CPU must be allowed a small portion of the bandwidth in order for
the in-band kernel to deal with housekeeping work (e.g. serving IPIs
from other cores).

>>> - occasionally I get other lockups of the board during these tests that harder to reproduce. For instance, one happened at some point while running `./mutex-test 10000 3 0 t`
>> Could be the same issue as above
>
> It makes sense, however when there are more than 1 worker threads, no thread will run at 100% CPU, because they are contending for the lock and doing actual work while holding it. env ps indicates that the load gets spread evenly across the worker threads.
>

Maybe not 100% would be consumed, but possibly still too much for the
in-band kernel to run the housekeeping tasks. We should have a look at
this when the mutex-related issue is fixed (I'm on it), to make sure we
start investigating from a clean state.

-- 
Philippe.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Issues with evl_mutex_trylock()
  2022-08-26 14:09 ` Philippe Gerum
  2022-08-26 15:16   ` Giulio Moro
@ 2022-08-26 16:17   ` Philippe Gerum
  2022-08-26 18:07     ` Philippe Gerum
                       ` (3 more replies)
  1 sibling, 4 replies; 17+ messages in thread
From: Philippe Gerum @ 2022-08-26 16:17 UTC (permalink / raw)
  To: Giulio Moro; +Cc: xenomai, jan.kiszka


Philippe Gerum <rpm@xenomai.org> writes:

> Giulio Moro <giulio@bela.io> writes:
>
>> In the toy example below I start some pthreads with SCHED_FIFO, priority 94 and setting the affinity of one thread per core. I then attach them to the EVL core and share a lock between them and the main thread. The main thread has a settable priority and schedule (SCHED_OTHER with prio 0 or SCHED_FIFO with higher priority) and its affinity is set to all available cores.
>>
>> Here's a summary of what I get when running the test with some
>> parameters (arguments are: "max iterations", "number of threads to
>> create", "priority of the main thread", "flags (t means don't print
>> anything). The error code reported alongside failures is the base 2
>> representation of the return code. 100 means that errors were
>> detected in evl_unlock_mutex() in the main thread following a
>> successful call to evl_trylock_mutex(). During these tests, the
>> output is piped to cat and then redirected to /dev/null.
>>
>> mutex-test 100000 0 0 t :  Success real 0m13.583s user 0m1.218s sys 0m0.000s
>> mutex-test 100000 0 90 t :  Success real 0m13.350s user 0m1.177s sys 0m0.000s
>> mutex-test 100000 0 99 t :  Success real 0m13.347s user 0m1.179s sys 0m0.000s
>> mutex-test 100000 1 0 t :  Success real 0m20.087s user 0m3.575s sys 0m0.000s
>> mutex-test 100000 1 90 t :  Success real 0m17.610s user 0m2.133s sys 0m0.000s
>> mutex-test 100000 1 99 t :  Success real 0m15.466s user 0m1.333s sys 0m0.000s
>> mutex-test 100000 2 0 t :  Failed 100 real 0m0.259s user 0m0.083s sys 0m0.000s
>> mutex-test 100000 2 90 t :  Failed 100 real 0m0.374s user 0m0.176s sys 0m0.000s
>> mutex-test 100000 2 99 t :  Failed 100 real 0m0.207s user 0m0.109s sys 0m0.000s
>> mutex-test 100000 3 0 t :  Failed 100 real 0m0.253s user 0m0.124s sys 0m0.000s
>> mutex-test 100000 3 90 t :  Failed 100 real 0m0.383s user 0m0.149s sys 0m0.000s
>> mutex-test 100000 3 99 t :  Failed 100 real 0m0.283s user 0m0.151s sys 0m0.000s
>> mutex-test 100000 4 0 t :  Failed 100 real 0m0.526s user 0m0.221s sys 0m0.000s
>> mutex-test 100000 4 90 t :  Failed 100 real 0m0.438s user 0m0.178s sys 0m0.000s
>> mutex-test 100000 4 99 t :  Failed 100 real 0m0.434s user 0m0.178s sys 0m0.000s
>>
>> So, when running with 2 or more threads, the same error keeps surfacing very quickly.
>>
>
> Confirmed. There is something wrong.
>

Please retry with this patch in. There is clearly a race in the current
implementation of the mutex code, specifically in the release path.

@Jan: I believe this gremlin may be in Cobalt too. The key issue is
with the handle transitioning to NO_HANDLE before transfer_ownership()
switches it to the handle of the first contender in line. If userland
manages to slip in with a fast_acquire() during this tiny window, we may
be toast, with two threads actually holding the same lock. The issue was
made more obvious with EVL, which does not serialize the trylock code
around cmpxchg in kernel space, triggering the issue faster. I'll have a
second look at Cobalt, but since you know the mutex code in depth there,
four eyeballs would be better.
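
To make the window explicit, this is the interleaving I suspect (rough
pseudocode, handles simplified; T1 owns the mutex with a waiter W queued
on it, M calls evl_trylock_mutex() from user space):

	T1, kernel, __evl_unlock_mutex():
		h = atomic_read(lockp);
		atomic_cmpxchg(lockp, h, EVL_NO_HANDLE);  /* lock word now reads "free" */
	M, user space, trylock fast path:
		cmpxchg(lockp, EVL_NO_HANDLE, Mh) succeeds, M believes it owns the mutex
	T1, kernel, transfer_ownership():
		lockp is overwritten with W's handle, W is woken up as the new owner

At this point M and W both think they hold the lock, and when M later
calls evl_unlock_mutex() the core no longer sees it as the owner, which
matches the -EPERM Giulio reported.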

diff --git a/kernel/evl/mutex.c b/kernel/evl/mutex.c
index e0cf1ba21d5b4..d96f8c9f04df7 100644
--- a/kernel/evl/mutex.c
+++ b/kernel/evl/mutex.c
@@ -899,14 +899,16 @@ void __evl_unlock_mutex(struct evl_mutex *mutex)
 	if (mutex->flags & EVL_MUTEX_CEILING)
 		clear_boost_locked(mutex, curr, EVL_MUTEX_CEILING);
 
+	/*
+	 * The whole logic is based on the invariant that the current
+	 * thread does own the mutex being released.
+	 * IOW: currh == atomic_read(lockp) & ~(EVL_MUTEX_FLCLAIM|EVL_MUTEX_FLCEIL).
+	 */
 	h = atomic_read(lockp);
-	h = atomic_cmpxchg(lockp, h, EVL_NO_HANDLE);
-	if ((h & ~EVL_MUTEX_FLCEIL) != currh) {
-		/* FLCLAIM set, mutex is contended. */
+	if (h & EVL_MUTEX_FLCLAIM) { /* Is there a contender? */
 		transfer_ownership(mutex, curr);
 	} else {
-		if (h != currh)	/* FLCEIL set, FLCLAIM clear. */
-			atomic_set(lockp, EVL_NO_HANDLE);
+		atomic_set(lockp, EVL_NO_HANDLE);
 		untrack_owner(mutex);
 	}
 
-- 
Philippe.

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: Issues with evl_mutex_trylock()
  2022-08-26 16:17   ` Philippe Gerum
@ 2022-08-26 18:07     ` Philippe Gerum
  2022-08-27 16:12     ` Giulio Moro
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 17+ messages in thread
From: Philippe Gerum @ 2022-08-26 18:07 UTC (permalink / raw)
  To: Giulio Moro; +Cc: xenomai, jan.kiszka


Philippe Gerum <rpm@xenomai.org> writes:

> Philippe Gerum <rpm@xenomai.org> writes:
>
>> Giulio Moro <giulio@bela.io> writes:
>>
>>> In the toy example below I start some pthreads with SCHED_FIFO, priority 94 and setting the affinity of one thread per core. I then attach them to the EVL core and share a lock between them and the main thread. The main thread has a settable priority and schedule (SCHED_OTHER with prio 0 or SCHED_FIFO with higher priority) and its affinity is set to all available cores.
>>>
>>> Here's a summary of what I get when running the test with some
>>> parameters (arguments are: "max iterations", "number of threads to
>>> create", "priority of the main thread", "flags (t means don't print
>>> anything). The error code reported alongside failures is the base 2
>>> representation of the return code. 100 means that errors were
>>> detected in evl_unlock_mutex() in the main thread following a
>>> successful call to evl_trylock_mutex(). During these tests, the
>>> output is piped to cat and then redirected to /dev/null.
>>>
>>> mutex-test 100000 0 0 t :  Success real 0m13.583s user 0m1.218s sys 0m0.000s
>>> mutex-test 100000 0 90 t :  Success real 0m13.350s user 0m1.177s sys 0m0.000s
>>> mutex-test 100000 0 99 t :  Success real 0m13.347s user 0m1.179s sys 0m0.000s
>>> mutex-test 100000 1 0 t :  Success real 0m20.087s user 0m3.575s sys 0m0.000s
>>> mutex-test 100000 1 90 t :  Success real 0m17.610s user 0m2.133s sys 0m0.000s
>>> mutex-test 100000 1 99 t :  Success real 0m15.466s user 0m1.333s sys 0m0.000s
>>> mutex-test 100000 2 0 t :  Failed 100 real 0m0.259s user 0m0.083s sys 0m0.000s
>>> mutex-test 100000 2 90 t :  Failed 100 real 0m0.374s user 0m0.176s sys 0m0.000s
>>> mutex-test 100000 2 99 t :  Failed 100 real 0m0.207s user 0m0.109s sys 0m0.000s
>>> mutex-test 100000 3 0 t :  Failed 100 real 0m0.253s user 0m0.124s sys 0m0.000s
>>> mutex-test 100000 3 90 t :  Failed 100 real 0m0.383s user 0m0.149s sys 0m0.000s
>>> mutex-test 100000 3 99 t :  Failed 100 real 0m0.283s user 0m0.151s sys 0m0.000s
>>> mutex-test 100000 4 0 t :  Failed 100 real 0m0.526s user 0m0.221s sys 0m0.000s
>>> mutex-test 100000 4 90 t :  Failed 100 real 0m0.438s user 0m0.178s sys 0m0.000s
>>> mutex-test 100000 4 99 t :  Failed 100 real 0m0.434s user 0m0.178s sys 0m0.000s
>>>
>>> So, when running with 2 or more threads, the same error keeps surfacing very quickly.
>>>
>>
>> Confirmed. There is something wrong.
>>
>
> Please retry with this patch in. There is clearly a race in the current
> implementation of the mutex release code, specifically the release path.
>
> @Jan: If believe this gremlin may be in Cobalt too. The key issue is
> with the handle transitioning to NO_HANDLE before transfer_ownership()
> switches it to the handle of the first contender in line. If userland
> manages to slip in with a fast_acquire() during this tiny window, we may
> be toast, with two threads actually holding the same lock. The issue was
> made more obvious with EVL which does not serialize the trylock code
> around cmpxchg in kernel space, triggering the issue faster. I'll have a
> second look to Cobalt, but since you know the mutex code in depth there,
> four eyeballs would be better.
>
> diff --git a/kernel/evl/mutex.c b/kernel/evl/mutex.c
> index e0cf1ba21d5b4..d96f8c9f04df7 100644
> --- a/kernel/evl/mutex.c
> +++ b/kernel/evl/mutex.c
> @@ -899,14 +899,16 @@ void __evl_unlock_mutex(struct evl_mutex *mutex)
>  	if (mutex->flags & EVL_MUTEX_CEILING)
>  		clear_boost_locked(mutex, curr, EVL_MUTEX_CEILING);
>  
> +	/*
> +	 * The whole logic is based on the invariant that the current
> +	 * thread does own the mutex being released.
> +	 * IOW: currh == atomic_read(lockp) & ~(EVL_MUTEX_FLCLAIM|EVL_MUTEX_FLCEIL).
> +	 */
>  	h = atomic_read(lockp);
> -	h = atomic_cmpxchg(lockp, h, EVL_NO_HANDLE);
> -	if ((h & ~EVL_MUTEX_FLCEIL) != currh) {
> -		/* FLCLAIM set, mutex is contended. */
> +	if (h & EVL_MUTEX_FLCLAIM) { /* Is there a contender? */
>  		transfer_ownership(mutex, curr);
>  	} else {
> -		if (h != currh)	/* FLCEIL set, FLCLAIM clear. */
> -			atomic_set(lockp, EVL_NO_HANDLE);
> +		atomic_set(lockp, EVL_NO_HANDLE);
>  		untrack_owner(mutex);
>  	}

NOTE: a second patch may follow later, related to some housekeeping
specific to the priority ceiling protocol (not involved in your test
code so far).

-- 
Philippe.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Issues with evl_mutex_trylock()
  2022-08-26 16:17   ` Philippe Gerum
  2022-08-26 18:07     ` Philippe Gerum
@ 2022-08-27 16:12     ` Giulio Moro
  2022-08-27 17:53     ` Philippe Gerum
  2023-08-11 12:31     ` Jan Kiszka
  3 siblings, 0 replies; 17+ messages in thread
From: Giulio Moro @ 2022-08-27 16:12 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai, jan.kiszka

> Please retry with this patch in. There is clearly a race in the current
> implementation of the mutex release code, specifically the release path.

I retried with your recently pushed branch which includes that patch (alongside 59700fc66877ea9ffa958b46e2cdaf2925840f0a evl/mutex: simplify the release path) and the tests do indeed succeed.
Thanks,
Giulio


Philippe Gerum wrote on 26/08/2022 18:17:
> 
> Philippe Gerum <rpm@xenomai.org> writes:
> 
>> Giulio Moro <giulio@bela.io> writes:
>>
>>> In the toy example below I start some pthreads with SCHED_FIFO, priority 94 and setting the affinity of one thread per core. I then attach them to the EVL core and share a lock between them and the main thread. The main thread has a settable priority and schedule (SCHED_OTHER with prio 0 or SCHED_FIFO with higher priority) and its affinity is set to all available cores.
>>>
>>> Here's a summary of what I get when running the test with some
>>> parameters (arguments are: "max iterations", "number of threads to
>>> create", "priority of the main thread", "flags (t means don't print
>>> anything). The error code reported alongside failures is the base 2
>>> representation of the return code. 100 means that errors were
>>> detected in evl_unlock_mutex() in the main thread following a
>>> successful call to evl_trylock_mutex(). During these tests, the
>>> output is piped to cat and then redirected to /dev/null.
>>>
>>> mutex-test 100000 0 0 t :  Success real 0m13.583s user 0m1.218s sys 0m0.000s
>>> mutex-test 100000 0 90 t :  Success real 0m13.350s user 0m1.177s sys 0m0.000s
>>> mutex-test 100000 0 99 t :  Success real 0m13.347s user 0m1.179s sys 0m0.000s
>>> mutex-test 100000 1 0 t :  Success real 0m20.087s user 0m3.575s sys 0m0.000s
>>> mutex-test 100000 1 90 t :  Success real 0m17.610s user 0m2.133s sys 0m0.000s
>>> mutex-test 100000 1 99 t :  Success real 0m15.466s user 0m1.333s sys 0m0.000s
>>> mutex-test 100000 2 0 t :  Failed 100 real 0m0.259s user 0m0.083s sys 0m0.000s
>>> mutex-test 100000 2 90 t :  Failed 100 real 0m0.374s user 0m0.176s sys 0m0.000s
>>> mutex-test 100000 2 99 t :  Failed 100 real 0m0.207s user 0m0.109s sys 0m0.000s
>>> mutex-test 100000 3 0 t :  Failed 100 real 0m0.253s user 0m0.124s sys 0m0.000s
>>> mutex-test 100000 3 90 t :  Failed 100 real 0m0.383s user 0m0.149s sys 0m0.000s
>>> mutex-test 100000 3 99 t :  Failed 100 real 0m0.283s user 0m0.151s sys 0m0.000s
>>> mutex-test 100000 4 0 t :  Failed 100 real 0m0.526s user 0m0.221s sys 0m0.000s
>>> mutex-test 100000 4 90 t :  Failed 100 real 0m0.438s user 0m0.178s sys 0m0.000s
>>> mutex-test 100000 4 99 t :  Failed 100 real 0m0.434s user 0m0.178s sys 0m0.000s
>>>
>>> So, when running with 2 or more threads, the same error keeps surfacing very quickly.
>>>
>>
>> Confirmed. There is something wrong.
>>
> 
> Please retry with this patch in. There is clearly a race in the current
> implementation of the mutex release code, specifically the release path.
> 
> @Jan: If believe this gremlin may be in Cobalt too. The key issue is
> with the handle transitioning to NO_HANDLE before transfer_ownership()
> switches it to the handle of the first contender in line. If userland
> manages to slip in with a fast_acquire() during this tiny window, we may
> be toast, with two threads actually holding the same lock. The issue was
> made more obvious with EVL which does not serialize the trylock code
> around cmpxchg in kernel space, triggering the issue faster. I'll have a
> second look to Cobalt, but since you know the mutex code in depth there,
> four eyeballs would be better.
> 
> diff --git a/kernel/evl/mutex.c b/kernel/evl/mutex.c
> index e0cf1ba21d5b4..d96f8c9f04df7 100644
> --- a/kernel/evl/mutex.c
> +++ b/kernel/evl/mutex.c
> @@ -899,14 +899,16 @@ void __evl_unlock_mutex(struct evl_mutex *mutex)
>   	if (mutex->flags & EVL_MUTEX_CEILING)
>   		clear_boost_locked(mutex, curr, EVL_MUTEX_CEILING);
>   
> +	/*
> +	 * The whole logic is based on the invariant that the current
> +	 * thread does own the mutex being released.
> +	 * IOW: currh == atomic_read(lockp) & ~(EVL_MUTEX_FLCLAIM|EVL_MUTEX_FLCEIL).
> +	 */
>   	h = atomic_read(lockp);
> -	h = atomic_cmpxchg(lockp, h, EVL_NO_HANDLE);
> -	if ((h & ~EVL_MUTEX_FLCEIL) != currh) {
> -		/* FLCLAIM set, mutex is contended. */
> +	if (h & EVL_MUTEX_FLCLAIM) { /* Is there a contender? */
>   		transfer_ownership(mutex, curr);
>   	} else {
> -		if (h != currh)	/* FLCEIL set, FLCLAIM clear. */
> -			atomic_set(lockp, EVL_NO_HANDLE);
> +		atomic_set(lockp, EVL_NO_HANDLE);
>   		untrack_owner(mutex);
>   	}
>   
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Issues with evl_mutex_trylock()
  2022-08-26 16:17   ` Philippe Gerum
  2022-08-26 18:07     ` Philippe Gerum
  2022-08-27 16:12     ` Giulio Moro
@ 2022-08-27 17:53     ` Philippe Gerum
  2023-08-10  6:00       ` Jan Kiszka
  2023-08-11 11:31       ` Jan Kiszka
  2023-08-11 12:31     ` Jan Kiszka
  3 siblings, 2 replies; 17+ messages in thread
From: Philippe Gerum @ 2022-08-27 17:53 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai


Philippe Gerum <rpm@xenomai.org> writes:

> Philippe Gerum <rpm@xenomai.org> writes:
>
>> Giulio Moro <giulio@bela.io> writes:
>>
>>> In the toy example below I start some pthreads with SCHED_FIFO, priority 94 and setting the affinity of one thread per core. I then attach them to the EVL core and share a lock between them and the main thread. The main thread has a settable priority and schedule (SCHED_OTHER with prio 0 or SCHED_FIFO with higher priority) and its affinity is set to all available cores.
>>>
>>> Here's a summary of what I get when running the test with some
>>> parameters (arguments are: "max iterations", "number of threads to
>>> create", "priority of the main thread", "flags (t means don't print
>>> anything). The error code reported alongside failures is the base 2
>>> representation of the return code. 100 means that errors were
>>> detected in evl_unlock_mutex() in the main thread following a
>>> successful call to evl_trylock_mutex(). During these tests, the
>>> output is piped to cat and then redirected to /dev/null.
>>>
>>> mutex-test 100000 0 0 t :  Success real 0m13.583s user 0m1.218s sys 0m0.000s
>>> mutex-test 100000 0 90 t :  Success real 0m13.350s user 0m1.177s sys 0m0.000s
>>> mutex-test 100000 0 99 t :  Success real 0m13.347s user 0m1.179s sys 0m0.000s
>>> mutex-test 100000 1 0 t :  Success real 0m20.087s user 0m3.575s sys 0m0.000s
>>> mutex-test 100000 1 90 t :  Success real 0m17.610s user 0m2.133s sys 0m0.000s
>>> mutex-test 100000 1 99 t :  Success real 0m15.466s user 0m1.333s sys 0m0.000s
>>> mutex-test 100000 2 0 t :  Failed 100 real 0m0.259s user 0m0.083s sys 0m0.000s
>>> mutex-test 100000 2 90 t :  Failed 100 real 0m0.374s user 0m0.176s sys 0m0.000s
>>> mutex-test 100000 2 99 t :  Failed 100 real 0m0.207s user 0m0.109s sys 0m0.000s
>>> mutex-test 100000 3 0 t :  Failed 100 real 0m0.253s user 0m0.124s sys 0m0.000s
>>> mutex-test 100000 3 90 t :  Failed 100 real 0m0.383s user 0m0.149s sys 0m0.000s
>>> mutex-test 100000 3 99 t :  Failed 100 real 0m0.283s user 0m0.151s sys 0m0.000s
>>> mutex-test 100000 4 0 t :  Failed 100 real 0m0.526s user 0m0.221s sys 0m0.000s
>>> mutex-test 100000 4 90 t :  Failed 100 real 0m0.438s user 0m0.178s sys 0m0.000s
>>> mutex-test 100000 4 99 t :  Failed 100 real 0m0.434s user 0m0.178s sys 0m0.000s
>>>
>>> So, when running with 2 or more threads, the same error keeps surfacing very quickly.
>>>
>>
>> Confirmed. There is something wrong.
>>
>
> Please retry with this patch in. There is clearly a race in the current
> implementation of the mutex release code, specifically the release path.
>
> @Jan: If believe this gremlin may be in Cobalt too. The key issue is
> with the handle transitioning to NO_HANDLE before transfer_ownership()
> switches it to the handle of the first contender in line. If userland
> manages to slip in with a fast_acquire() during this tiny window, we may
> be toast, with two threads actually holding the same lock. The issue was
> made more obvious with EVL which does not serialize the trylock code
> around cmpxchg in kernel space, triggering the issue faster. I'll have a
> second look to Cobalt, but since you know the mutex code in depth there,
> four eyeballs would be better.
>

I believe that the problem just fixed in the EVL core also affects
Cobalt. This is a suggested patch for fixing the same issue in the
latter on top of 3.2.2. Smokey's posix_mutex test is still happy with it.

NOTE: the issue at hand is two threads being granted the _same_ lock
concurrently, so this might be potentially catastrophic.

---

commit 4417674f427a78ba7fe2c8e5b2607d5ca6886ec0 (HEAD -> master)
Author: Philippe Gerum <rpm@xenomai.org>
Date:   Sat Aug 27 19:44:30 2022 +0200

    cobalt/sync: fix race in release path
    
    The mutex owner handle transitions to XN_NO_HANDLE before
    transfer_ownership() may switch it to the handle of the first
    contender in line. If a user context manages to slip in with a
    successful cmpxchg() in this tiny window, two threads may actually
    end up holding the same lock.
    
    To address this bug, set the handle directly to its final value as a
    result of transferring ownership, without transitioning via
    XN_NO_HANDLE, closing the race window.
    
    At this chance, stop duplicating the logic which transfer_ownership()
    already provides.  We may unconditionally rely on this routine for
    either passing the lock to the next waiter in line, or marking the
    lock as free, updating the fast handle accordingly in both cases.
    
    Signed-off-by: Philippe Gerum <rpm@xenomai.org>

diff --git a/kernel/cobalt/synch.c b/kernel/cobalt/synch.c
index 6e50e537f..c5398dba3 100644
--- a/kernel/cobalt/synch.c
+++ b/kernel/cobalt/synch.c
@@ -850,6 +850,17 @@ static bool transfer_ownership(struct xnsynch *synch,
 	xnhandle_t nextownerh;
 	atomic_t *lockp;
 
+	/*
+	 * FLCEIL may only be raised by the owner, or when the owner
+	 * is blocked waiting for the synch (ownership transfer). In
+	 * addition, only the current owner of a synch may release it,
+	 * therefore we can't race while testing FLCEIL locklessly.
+	 * All updates to FLCLAIM are covered by the superlock.
+	 *
+	 * Therefore, clearing the fastlock racelessly in this routine
+	 * without leaking FLCEIL/FLCLAIM updates can be achieved by
+	 * holding the superlock.
+	 */
 	lockp = xnsynch_fastlock(synch);
 
 	/*
@@ -862,6 +873,8 @@ static bool transfer_ownership(struct xnsynch *synch,
 		return false;
 	}
 
+	XENO_WARN_ON(COBALT, !xnsynch_fast_is_claimed(atomic_read(lockp)));
+
 	nextowner = list_first_entry(&synch->pendq, struct xnthread, plink);
 	list_del(&nextowner->plink);
 	nextowner->wchan = NULL;
@@ -916,8 +929,6 @@ static bool transfer_ownership(struct xnsynch *synch,
 bool xnsynch_release(struct xnsynch *synch, struct xnthread *curr)
 {
 	bool need_resched = false;
-	xnhandle_t currh, h;
-	atomic_t *lockp;
 	spl_t s;
 
 	XENO_BUG_ON(COBALT, (synch->status & XNSYNCH_OWNER) == 0);
@@ -927,19 +938,6 @@ bool xnsynch_release(struct xnsynch *synch, struct xnthread *curr)
 	if (xnthread_put_resource(curr))
 		return false;
 
-	lockp = xnsynch_fastlock(synch);
-	currh = curr->handle;
-	/*
-	 * FLCEIL may only be raised by the owner, or when the owner
-	 * is blocked waiting for the synch (ownership transfer). In
-	 * addition, only the current owner of a synch may release it,
-	 * therefore we can't race while testing FLCEIL locklessly.
-	 * All updates to FLCLAIM are covered by the superlock.
-	 *
-	 * Therefore, clearing the fastlock racelessly in this routine
-	 * without leaking FLCEIL/FLCLAIM updates can be achieved by
-	 * holding the superlock.
-	 */
 	xnlock_get_irqsave(&nklock, s);
 
 	if (synch->status & XNSYNCH_CEILING) {
@@ -947,12 +945,7 @@ bool xnsynch_release(struct xnsynch *synch, struct xnthread *curr)
 		need_resched = true;
 	}
 
-	h = atomic_cmpxchg(lockp, currh, XN_NO_HANDLE);
-	if ((h & ~XNSYNCH_FLCEIL) != currh)
-		/* FLCLAIM set, synch is contended. */
-		need_resched = transfer_ownership(synch, curr);
-	else if (h != currh)	/* FLCEIL set, FLCLAIM clear. */
-		atomic_set(lockp, XN_NO_HANDLE);
+	need_resched |= transfer_ownership(synch, curr);
 
 	xnlock_put_irqrestore(&nklock, s);
 
-- 
Philippe.

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: Issues with evl_mutex_trylock()
  2022-08-27 17:53     ` Philippe Gerum
@ 2023-08-10  6:00       ` Jan Kiszka
  2023-08-10  6:40         ` sang yo
  2023-08-10  8:18         ` Philippe Gerum
  2023-08-11 11:31       ` Jan Kiszka
  1 sibling, 2 replies; 17+ messages in thread
From: Jan Kiszka @ 2023-08-10  6:00 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai, sang yo

On 27.08.22 19:53, Philippe Gerum wrote:
> 
> Philippe Gerum <rpm@xenomai.org> writes:
> 
>> Philippe Gerum <rpm@xenomai.org> writes:
>>
>>> Giulio Moro <giulio@bela.io> writes:
>>>
>>>> In the toy example below I start some pthreads with SCHED_FIFO, priority 94 and setting the affinity of one thread per core. I then attach them to the EVL core and share a lock between them and the main thread. The main thread has a settable priority and schedule (SCHED_OTHER with prio 0 or SCHED_FIFO with higher priority) and its affinity is set to all available cores.
>>>>
>>>> Here's a summary of what I get when running the test with some
>>>> parameters (arguments are: "max iterations", "number of threads to
>>>> create", "priority of the main thread", "flags (t means don't print
>>>> anything). The error code reported alongside failures is the base 2
>>>> representation of the return code. 100 means that errors were
>>>> detected in evl_unlock_mutex() in the main thread following a
>>>> successful call to evl_trylock_mutex(). During these tests, the
>>>> output is piped to cat and then redirected to /dev/null.
>>>>
>>>> mutex-test 100000 0 0 t :  Success real 0m13.583s user 0m1.218s sys 0m0.000s
>>>> mutex-test 100000 0 90 t :  Success real 0m13.350s user 0m1.177s sys 0m0.000s
>>>> mutex-test 100000 0 99 t :  Success real 0m13.347s user 0m1.179s sys 0m0.000s
>>>> mutex-test 100000 1 0 t :  Success real 0m20.087s user 0m3.575s sys 0m0.000s
>>>> mutex-test 100000 1 90 t :  Success real 0m17.610s user 0m2.133s sys 0m0.000s
>>>> mutex-test 100000 1 99 t :  Success real 0m15.466s user 0m1.333s sys 0m0.000s
>>>> mutex-test 100000 2 0 t :  Failed 100 real 0m0.259s user 0m0.083s sys 0m0.000s
>>>> mutex-test 100000 2 90 t :  Failed 100 real 0m0.374s user 0m0.176s sys 0m0.000s
>>>> mutex-test 100000 2 99 t :  Failed 100 real 0m0.207s user 0m0.109s sys 0m0.000s
>>>> mutex-test 100000 3 0 t :  Failed 100 real 0m0.253s user 0m0.124s sys 0m0.000s
>>>> mutex-test 100000 3 90 t :  Failed 100 real 0m0.383s user 0m0.149s sys 0m0.000s
>>>> mutex-test 100000 3 99 t :  Failed 100 real 0m0.283s user 0m0.151s sys 0m0.000s
>>>> mutex-test 100000 4 0 t :  Failed 100 real 0m0.526s user 0m0.221s sys 0m0.000s
>>>> mutex-test 100000 4 90 t :  Failed 100 real 0m0.438s user 0m0.178s sys 0m0.000s
>>>> mutex-test 100000 4 99 t :  Failed 100 real 0m0.434s user 0m0.178s sys 0m0.000s
>>>>
>>>> So, when running with 2 or more threads, the same error keeps surfacing very quickly.
>>>>
>>>
>>> Confirmed. There is something wrong.
>>>
>>
>> Please retry with this patch in. There is clearly a race in the current
>> implementation of the mutex release code, specifically the release path.
>>
>> @Jan: If believe this gremlin may be in Cobalt too. The key issue is
>> with the handle transitioning to NO_HANDLE before transfer_ownership()
>> switches it to the handle of the first contender in line. If userland
>> manages to slip in with a fast_acquire() during this tiny window, we may
>> be toast, with two threads actually holding the same lock. The issue was
>> made more obvious with EVL which does not serialize the trylock code
>> around cmpxchg in kernel space, triggering the issue faster. I'll have a
>> second look to Cobalt, but since you know the mutex code in depth there,
>> four eyeballs would be better.
>>
> 
> I believe that the problem just fixed in the EVL core also affects
> Cobalt. This is a suggested patch for fixing the same issue in the
> latter on top of 3.2.2. Smokey's posix_mutex test still happy with it.
> 
> NOTE: the issue at hand is two threads being granted the _same_ lock
> concurrently, so this might be potentially catastrophic.
> 
> ---
> 
> commit 4417674f427a78ba7fe2c8e5b2607d5ca6886ec0 (HEAD -> master)
> Author: Philippe Gerum <rpm@xenomai.org>
> Date:   Sat Aug 27 19:44:30 2022 +0200
> 
>     cobalt/sync: fix race in release path
>     
>     The mutex owner handle transitions to XN_NO_HANDLE before
>     transfer_ownership() may switch it to the handle of the first
>     contender in line. If a user context manage to slip in with a
>     succcessful cmpxchg() in this tiny window, two threads may actually
>     end up holding the same lock.
>     
>     To address this bug, set the handle directly to its final value as a
>     result of transferring ownership, without transitioning via
>     XN_NO_HANDLE, closing the race window.
>     
>     At this chance, stop duplicating the logic which transfer_ownership()
>     already provides.  We may unconditionally rely on this routine for
>     either passing the lock to the next waiter in line, or marking the
>     lock as free, updating the fast handle accordingly in both cases.
>     
>     Signed-off-by: Philippe Gerum <rpm@xenomai.org>
> 
> diff --git a/kernel/cobalt/synch.c b/kernel/cobalt/synch.c
> index 6e50e537f..c5398dba3 100644
> --- a/kernel/cobalt/synch.c
> +++ b/kernel/cobalt/synch.c
> @@ -850,6 +850,17 @@ static bool transfer_ownership(struct xnsynch *synch,
>  	xnhandle_t nextownerh;
>  	atomic_t *lockp;
>  
> +	/*
> +	 * FLCEIL may only be raised by the owner, or when the owner
> +	 * is blocked waiting for the synch (ownership transfer). In
> +	 * addition, only the current owner of a synch may release it,
> +	 * therefore we can't race while testing FLCEIL locklessly.
> +	 * All updates to FLCLAIM are covered by the superlock.
> +	 *
> +	 * Therefore, clearing the fastlock racelessly in this routine
> +	 * without leaking FLCEIL/FLCLAIM updates can be achieved by
> +	 * holding the superlock.
> +	 */
>  	lockp = xnsynch_fastlock(synch);
>  
>  	/*
> @@ -862,6 +873,8 @@ static bool transfer_ownership(struct xnsynch *synch,
>  		return false;
>  	}
>  
> +	XENO_WARN_ON(COBALT, !xnsynch_fast_is_claimed(atomic_read(lockp)));
> +
>  	nextowner = list_first_entry(&synch->pendq, struct xnthread, plink);
>  	list_del(&nextowner->plink);
>  	nextowner->wchan = NULL;
> @@ -916,8 +929,6 @@ static bool transfer_ownership(struct xnsynch *synch,
>  bool xnsynch_release(struct xnsynch *synch, struct xnthread *curr)
>  {
>  	bool need_resched = false;
> -	xnhandle_t currh, h;
> -	atomic_t *lockp;
>  	spl_t s;
>  
>  	XENO_BUG_ON(COBALT, (synch->status & XNSYNCH_OWNER) == 0);
> @@ -927,19 +938,6 @@ bool xnsynch_release(struct xnsynch *synch, struct xnthread *curr)
>  	if (xnthread_put_resource(curr))
>  		return false;
>  
> -	lockp = xnsynch_fastlock(synch);
> -	currh = curr->handle;
> -	/*
> -	 * FLCEIL may only be raised by the owner, or when the owner
> -	 * is blocked waiting for the synch (ownership transfer). In
> -	 * addition, only the current owner of a synch may release it,
> -	 * therefore we can't race while testing FLCEIL locklessly.
> -	 * All updates to FLCLAIM are covered by the superlock.
> -	 *
> -	 * Therefore, clearing the fastlock racelessly in this routine
> -	 * without leaking FLCEIL/FLCLAIM updates can be achieved by
> -	 * holding the superlock.
> -	 */
>  	xnlock_get_irqsave(&nklock, s);
>  
>  	if (synch->status & XNSYNCH_CEILING) {
> @@ -947,12 +945,7 @@ bool xnsynch_release(struct xnsynch *synch, struct xnthread *curr)
>  		need_resched = true;
>  	}
>  
> -	h = atomic_cmpxchg(lockp, currh, XN_NO_HANDLE);
> -	if ((h & ~XNSYNCH_FLCEIL) != currh)
> -		/* FLCLAIM set, synch is contended. */
> -		need_resched = transfer_ownership(synch, curr);
> -	else if (h != currh)	/* FLCEIL set, FLCLAIM clear. */
> -		atomic_set(lockp, XN_NO_HANDLE);
> +	need_resched |= transfer_ownership(synch, curr);
>  
>  	xnlock_put_irqrestore(&nklock, s);
>  

This apparently fell through the cracks on my side and was never
submitted as a top-level patch; thanks for pointing it out, Sang Yo.

Philippe, is this still valid? How can we check whether we need it and
whether it does the thing it is supposed to do? Sang Yo, did you see the
issue, or what caught your attention?
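
For what it is worth, the shape of such a check would be a trylock/unlock
hammer across several threads, flagging any unlock that fails right after a
successful trylock. A minimal sketch, assuming plain POSIX calls (real-time
scheduling, affinity and the Cobalt-specific setup of Giulio's original test
are omitted here):

#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NTHREADS 2
#define ITERS    100000

static pthread_mutex_t lock;

static void *hammer(void *arg)
{
	int i;

	(void)arg;
	for (i = 0; i < ITERS; i++) {
		pthread_mutex_lock(&lock);
		pthread_mutex_unlock(&lock);
	}
	return NULL;
}

int main(void)
{
	pthread_t tid[NTHREADS];
	pthread_mutexattr_t attr;
	int ret, t, i;

	/* Error-checking type, so an unlock by a non-owner is reported. */
	pthread_mutexattr_init(&attr);
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	pthread_mutex_init(&lock, &attr);

	for (t = 0; t < NTHREADS; t++)
		pthread_create(&tid[t], NULL, hammer, NULL);

	for (i = 0; i < ITERS; i++) {
		if (pthread_mutex_trylock(&lock) == 0) {
			ret = pthread_mutex_unlock(&lock);
			if (ret) {
				fprintf(stderr,
					"unlock after successful trylock: %s\n",
					strerror(ret));
				exit(1);
			}
		}
	}

	for (t = 0; t < NTHREADS; t++)
		pthread_join(tid[t], NULL);

	puts("no ownership inconsistency observed");
	return 0;
}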

Thanks,
Jan

-- 
Siemens AG, Technology
Linux Expert Center


^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: Issues with evl_mutex_trylock()
  2023-08-10  6:00       ` Jan Kiszka
@ 2023-08-10  6:40         ` sang yo
  2023-08-10  8:18         ` Philippe Gerum
  1 sibling, 0 replies; 17+ messages in thread
From: sang yo @ 2023-08-10  6:40 UTC (permalink / raw)
  To: Jan Kiszka, Philippe Gerum; +Cc: xenomai

I found that my code randomly failed to acquire a lock before applying the patch Philippe committed (v3.2 stable branch).

I was simply wondering, since the issue was addressed a long time ago but the patch has not landed on any branch.


Thanks

-----Original Message-----
From: Jan Kiszka <jan.kiszka@siemens.com> 
Sent: Thursday, August 10, 2023 2:01 PM
To: Philippe Gerum <rpm@xenomai.org>
Cc: xenomai@lists.linux.dev; sang yo <stlfatboy@hotmail.com>
Subject: Re: Issues with evl_mutex_trylock()

On 27.08.22 19:53, Philippe Gerum wrote:
> 
> Philippe Gerum <rpm@xenomai.org> writes:
> 
>> Philippe Gerum <rpm@xenomai.org> writes:
>>
>>> Giulio Moro <giulio@bela.io> writes:
>>>
>>>> In the toy example below I start some pthreads with SCHED_FIFO, priority 94 and setting the affinity of one thread per core. I then attach them to the EVL core and share a lock between them and the main thread. The main thread has a settable priority and schedule (SCHED_OTHER with prio 0 or SCHED_FIFO with higher priority) and its affinity is set to all available cores.
>>>>
>>>> Here's a summary of what I get when running the test with some 
>>>> parameters (arguments are: "max iterations", "number of threads to 
>>>> create", "priority of the main thread", "flags (t means don't print 
>>>> anything). The error code reported alongside failures is the base 2 
>>>> representation of the return code. 100 means that errors were 
>>>> detected in evl_unlock_mutex() in the main thread following a 
>>>> successful call to evl_trylock_mutex(). During these tests, the 
>>>> output is piped to cat and then redirected to /dev/null.
>>>>
>>>> mutex-test 100000 0 0 t :  Success real 0m13.583s user 0m1.218s sys 0m0.000s
>>>> mutex-test 100000 0 90 t :  Success real 0m13.350s user 0m1.177s sys 0m0.000s
>>>> mutex-test 100000 0 99 t :  Success real 0m13.347s user 0m1.179s sys 0m0.000s
>>>> mutex-test 100000 1 0 t :  Success real 0m20.087s user 0m3.575s sys 0m0.000s
>>>> mutex-test 100000 1 90 t :  Success real 0m17.610s user 0m2.133s sys 0m0.000s
>>>> mutex-test 100000 1 99 t :  Success real 0m15.466s user 0m1.333s sys 0m0.000s
>>>> mutex-test 100000 2 0 t :  Failed 100 real 0m0.259s user 0m0.083s sys 0m0.000s
>>>> mutex-test 100000 2 90 t :  Failed 100 real 0m0.374s user 0m0.176s sys 0m0.000s
>>>> mutex-test 100000 2 99 t :  Failed 100 real 0m0.207s user 0m0.109s sys 0m0.000s
>>>> mutex-test 100000 3 0 t :  Failed 100 real 0m0.253s user 0m0.124s sys 0m0.000s
>>>> mutex-test 100000 3 90 t :  Failed 100 real 0m0.383s user 0m0.149s sys 0m0.000s
>>>> mutex-test 100000 3 99 t :  Failed 100 real 0m0.283s user 0m0.151s sys 0m0.000s
>>>> mutex-test 100000 4 0 t :  Failed 100 real 0m0.526s user 0m0.221s sys 0m0.000s
>>>> mutex-test 100000 4 90 t :  Failed 100 real 0m0.438s user 0m0.178s sys 0m0.000s
>>>> mutex-test 100000 4 99 t :  Failed 100 real 0m0.434s user 0m0.178s sys 0m0.000s
>>>>
>>>> So, when running with 2 or more threads, the same error keeps surfacing very quickly.
>>>>
>>>
>>> Confirmed. There is something wrong.
>>>
>>
>> Please retry with this patch in. There is clearly a race in the 
>> current implementation of the mutex release code, specifically the release path.
>>
>> @Jan: If believe this gremlin may be in Cobalt too. The key issue is 
>> with the handle transitioning to NO_HANDLE before 
>> transfer_ownership() switches it to the handle of the first contender 
>> in line. If userland manages to slip in with a fast_acquire() during 
>> this tiny window, we may be toast, with two threads actually holding 
>> the same lock. The issue was made more obvious with EVL which does 
>> not serialize the trylock code around cmpxchg in kernel space, 
>> triggering the issue faster. I'll have a second look to Cobalt, but 
>> since you know the mutex code in depth there, four eyeballs would be better.
>>
> 
> I believe that the problem just fixed in the EVL core also affects 
> Cobalt. This is a suggested patch for fixing the same issue in the 
> latter on top of 3.2.2. Smokey's posix_mutex test still happy with it.
> 
> NOTE: the issue at hand is two threads being granted the _same_ lock 
> concurrently, so this might be potentially catastrophic.
> 
> ---
> 
> commit 4417674f427a78ba7fe2c8e5b2607d5ca6886ec0 (HEAD -> master)
> Author: Philippe Gerum <rpm@xenomai.org>
> Date:   Sat Aug 27 19:44:30 2022 +0200
> 
>     cobalt/sync: fix race in release path
>     
>     The mutex owner handle transitions to XN_NO_HANDLE before
>     transfer_ownership() may switch it to the handle of the first
>     contender in line. If a user context manage to slip in with a
>     succcessful cmpxchg() in this tiny window, two threads may actually
>     end up holding the same lock.
>     
>     To address this bug, set the handle directly to its final value as a
>     result of transferring ownership, without transitioning via
>     XN_NO_HANDLE, closing the race window.
>     
>     At this chance, stop duplicating the logic which transfer_ownership()
>     already provides.  We may unconditionally rely on this routine for
>     either passing the lock to the next waiter in line, or marking the
>     lock as free, updating the fast handle accordingly in both cases.
>     
>     Signed-off-by: Philippe Gerum <rpm@xenomai.org>
> 
> diff --git a/kernel/cobalt/synch.c b/kernel/cobalt/synch.c
> index 6e50e537f..c5398dba3 100644
> --- a/kernel/cobalt/synch.c
> +++ b/kernel/cobalt/synch.c
> @@ -850,6 +850,17 @@ static bool transfer_ownership(struct xnsynch *synch,
>  	xnhandle_t nextownerh;
>  	atomic_t *lockp;
>  
> +	/*
> +	 * FLCEIL may only be raised by the owner, or when the owner
> +	 * is blocked waiting for the synch (ownership transfer). In
> +	 * addition, only the current owner of a synch may release it,
> +	 * therefore we can't race while testing FLCEIL locklessly.
> +	 * All updates to FLCLAIM are covered by the superlock.
> +	 *
> +	 * Therefore, clearing the fastlock racelessly in this routine
> +	 * without leaking FLCEIL/FLCLAIM updates can be achieved by
> +	 * holding the superlock.
> +	 */
>  	lockp = xnsynch_fastlock(synch);
>  
>  	/*
> @@ -862,6 +873,8 @@ static bool transfer_ownership(struct xnsynch *synch,
>  		return false;
>  	}
>  
> +	XENO_WARN_ON(COBALT, !xnsynch_fast_is_claimed(atomic_read(lockp)));
> +
>  	nextowner = list_first_entry(&synch->pendq, struct xnthread, plink);
>  	list_del(&nextowner->plink);
>  	nextowner->wchan = NULL;
> @@ -916,8 +929,6 @@ static bool transfer_ownership(struct xnsynch *synch,
>  bool xnsynch_release(struct xnsynch *synch, struct xnthread *curr)
>  {
>  	bool need_resched = false;
> -	xnhandle_t currh, h;
> -	atomic_t *lockp;
>  	spl_t s;
>  
>  	XENO_BUG_ON(COBALT, (synch->status & XNSYNCH_OWNER) == 0);
> @@ -927,19 +938,6 @@ bool xnsynch_release(struct xnsynch *synch, struct xnthread *curr)
>  	if (xnthread_put_resource(curr))
>  		return false;
>  
> -	lockp = xnsynch_fastlock(synch);
> -	currh = curr->handle;
> -	/*
> -	 * FLCEIL may only be raised by the owner, or when the owner
> -	 * is blocked waiting for the synch (ownership transfer). In
> -	 * addition, only the current owner of a synch may release it,
> -	 * therefore we can't race while testing FLCEIL locklessly.
> -	 * All updates to FLCLAIM are covered by the superlock.
> -	 *
> -	 * Therefore, clearing the fastlock racelessly in this routine
> -	 * without leaking FLCEIL/FLCLAIM updates can be achieved by
> -	 * holding the superlock.
> -	 */
>  	xnlock_get_irqsave(&nklock, s);
>  
>  	if (synch->status & XNSYNCH_CEILING) {
> @@ -947,12 +945,7 @@ bool xnsynch_release(struct xnsynch *synch, struct xnthread *curr)
>  		need_resched = true;
>  	}
>  
> -	h = atomic_cmpxchg(lockp, currh, XN_NO_HANDLE);
> -	if ((h & ~XNSYNCH_FLCEIL) != currh)
> -		/* FLCLAIM set, synch is contended. */
> -		need_resched = transfer_ownership(synch, curr);
> -	else if (h != currh)	/* FLCEIL set, FLCLAIM clear. */
> -		atomic_set(lockp, XN_NO_HANDLE);
> +	need_resched |= transfer_ownership(synch, curr);
>  
>  	xnlock_put_irqrestore(&nklock, s);
>  

This apparently fell through the cracks on my side and was never submitted as a top-level patch; thanks for pointing it out, Sang Yo.

Philippe, is this still valid? How can we check whether we need it and whether it does the thing it is supposed to do? Sang Yo, did you see the issue, or what caught your attention?

Thanks,
Jan

--
Siemens AG, Technology
Linux Expert Center


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Issues with evl_mutex_trylock()
  2023-08-10  6:00       ` Jan Kiszka
  2023-08-10  6:40         ` sang yo
@ 2023-08-10  8:18         ` Philippe Gerum
  2023-08-11  7:58           ` Jan Kiszka
  1 sibling, 1 reply; 17+ messages in thread
From: Philippe Gerum @ 2023-08-10  8:18 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai, sang yo


Jan Kiszka <jan.kiszka@siemens.com> writes:

> On 27.08.22 19:53, Philippe Gerum wrote:
>> 
>> Philippe Gerum <rpm@xenomai.org> writes:
>> 
>>> Philippe Gerum <rpm@xenomai.org> writes:
>>>
>>>> Giulio Moro <giulio@bela.io> writes:
>>>>
>>>>> In the toy example below I start some pthreads with SCHED_FIFO, priority 94 and setting the affinity of one thread per core. I then attach them to the EVL core and share a lock between them and the main thread. The main thread has a settable priority and schedule (SCHED_OTHER with prio 0 or SCHED_FIFO with higher priority) and its affinity is set to all available cores.
>>>>>
>>>>> Here's a summary of what I get when running the test with some
>>>>> parameters (arguments are: "max iterations", "number of threads to
>>>>> create", "priority of the main thread", "flags (t means don't print
>>>>> anything). The error code reported alongside failures is the base 2
>>>>> representation of the return code. 100 means that errors were
>>>>> detected in evl_unlock_mutex() in the main thread following a
>>>>> successful call to evl_trylock_mutex(). During these tests, the
>>>>> output is piped to cat and then redirected to /dev/null.
>>>>>
>>>>> mutex-test 100000 0 0 t :  Success real 0m13.583s user 0m1.218s sys 0m0.000s
>>>>> mutex-test 100000 0 90 t :  Success real 0m13.350s user 0m1.177s sys 0m0.000s
>>>>> mutex-test 100000 0 99 t :  Success real 0m13.347s user 0m1.179s sys 0m0.000s
>>>>> mutex-test 100000 1 0 t :  Success real 0m20.087s user 0m3.575s sys 0m0.000s
>>>>> mutex-test 100000 1 90 t :  Success real 0m17.610s user 0m2.133s sys 0m0.000s
>>>>> mutex-test 100000 1 99 t :  Success real 0m15.466s user 0m1.333s sys 0m0.000s
>>>>> mutex-test 100000 2 0 t :  Failed 100 real 0m0.259s user 0m0.083s sys 0m0.000s
>>>>> mutex-test 100000 2 90 t :  Failed 100 real 0m0.374s user 0m0.176s sys 0m0.000s
>>>>> mutex-test 100000 2 99 t :  Failed 100 real 0m0.207s user 0m0.109s sys 0m0.000s
>>>>> mutex-test 100000 3 0 t :  Failed 100 real 0m0.253s user 0m0.124s sys 0m0.000s
>>>>> mutex-test 100000 3 90 t :  Failed 100 real 0m0.383s user 0m0.149s sys 0m0.000s
>>>>> mutex-test 100000 3 99 t :  Failed 100 real 0m0.283s user 0m0.151s sys 0m0.000s
>>>>> mutex-test 100000 4 0 t :  Failed 100 real 0m0.526s user 0m0.221s sys 0m0.000s
>>>>> mutex-test 100000 4 90 t :  Failed 100 real 0m0.438s user 0m0.178s sys 0m0.000s
>>>>> mutex-test 100000 4 99 t :  Failed 100 real 0m0.434s user 0m0.178s sys 0m0.000s
>>>>>
>>>>> So, when running with 2 or more threads, the same error keeps surfacing very quickly.
>>>>>
>>>>
>>>> Confirmed. There is something wrong.
>>>>
>>>
>>> Please retry with this patch in. There is clearly a race in the current
>>> implementation of the mutex release code, specifically the release path.
>>>
>>> @Jan: If believe this gremlin may be in Cobalt too. The key issue is
>>> with the handle transitioning to NO_HANDLE before transfer_ownership()
>>> switches it to the handle of the first contender in line. If userland
>>> manages to slip in with a fast_acquire() during this tiny window, we may
>>> be toast, with two threads actually holding the same lock. The issue was
>>> made more obvious with EVL which does not serialize the trylock code
>>> around cmpxchg in kernel space, triggering the issue faster. I'll have a
>>> second look to Cobalt, but since you know the mutex code in depth there,
>>> four eyeballs would be better.
>>>
>> 
>> I believe that the problem just fixed in the EVL core also affects
>> Cobalt. This is a suggested patch for fixing the same issue in the
>> latter on top of 3.2.2. Smokey's posix_mutex test still happy with it.
>> 
>> NOTE: the issue at hand is two threads being granted the _same_ lock
>> concurrently, so this might be potentially catastrophic.
>> 
>> ---
>> 
>> commit 4417674f427a78ba7fe2c8e5b2607d5ca6886ec0 (HEAD -> master)
>> Author: Philippe Gerum <rpm@xenomai.org>
>> Date:   Sat Aug 27 19:44:30 2022 +0200
>> 
>>     cobalt/sync: fix race in release path
>>     
>>     The mutex owner handle transitions to XN_NO_HANDLE before
>>     transfer_ownership() may switch it to the handle of the first
>>     contender in line. If a user context manage to slip in with a
>>     succcessful cmpxchg() in this tiny window, two threads may actually
>>     end up holding the same lock.
>>     
>>     To address this bug, set the handle directly to its final value as a
>>     result of transferring ownership, without transitioning via
>>     XN_NO_HANDLE, closing the race window.
>>     
>>     At this chance, stop duplicating the logic which transfer_ownership()
>>     already provides.  We may unconditionally rely on this routine for
>>     either passing the lock to the next waiter in line, or marking the
>>     lock as free, updating the fast handle accordingly in both cases.
>>     
>>     Signed-off-by: Philippe Gerum <rpm@xenomai.org>
>> 
>> diff --git a/kernel/cobalt/synch.c b/kernel/cobalt/synch.c
>> index 6e50e537f..c5398dba3 100644
>> --- a/kernel/cobalt/synch.c
>> +++ b/kernel/cobalt/synch.c
>> @@ -850,6 +850,17 @@ static bool transfer_ownership(struct xnsynch *synch,
>>  	xnhandle_t nextownerh;
>>  	atomic_t *lockp;
>>  
>> +	/*
>> +	 * FLCEIL may only be raised by the owner, or when the owner
>> +	 * is blocked waiting for the synch (ownership transfer). In
>> +	 * addition, only the current owner of a synch may release it,
>> +	 * therefore we can't race while testing FLCEIL locklessly.
>> +	 * All updates to FLCLAIM are covered by the superlock.
>> +	 *
>> +	 * Therefore, clearing the fastlock racelessly in this routine
>> +	 * without leaking FLCEIL/FLCLAIM updates can be achieved by
>> +	 * holding the superlock.
>> +	 */
>>  	lockp = xnsynch_fastlock(synch);
>>  
>>  	/*
>> @@ -862,6 +873,8 @@ static bool transfer_ownership(struct xnsynch *synch,
>>  		return false;
>>  	}
>>  
>> +	XENO_WARN_ON(COBALT, !xnsynch_fast_is_claimed(atomic_read(lockp)));
>> +
>>  	nextowner = list_first_entry(&synch->pendq, struct xnthread, plink);
>>  	list_del(&nextowner->plink);
>>  	nextowner->wchan = NULL;
>> @@ -916,8 +929,6 @@ static bool transfer_ownership(struct xnsynch *synch,
>>  bool xnsynch_release(struct xnsynch *synch, struct xnthread *curr)
>>  {
>>  	bool need_resched = false;
>> -	xnhandle_t currh, h;
>> -	atomic_t *lockp;
>>  	spl_t s;
>>  
>>  	XENO_BUG_ON(COBALT, (synch->status & XNSYNCH_OWNER) == 0);
>> @@ -927,19 +938,6 @@ bool xnsynch_release(struct xnsynch *synch, struct xnthread *curr)
>>  	if (xnthread_put_resource(curr))
>>  		return false;
>>  
>> -	lockp = xnsynch_fastlock(synch);
>> -	currh = curr->handle;
>> -	/*
>> -	 * FLCEIL may only be raised by the owner, or when the owner
>> -	 * is blocked waiting for the synch (ownership transfer). In
>> -	 * addition, only the current owner of a synch may release it,
>> -	 * therefore we can't race while testing FLCEIL locklessly.
>> -	 * All updates to FLCLAIM are covered by the superlock.
>> -	 *
>> -	 * Therefore, clearing the fastlock racelessly in this routine
>> -	 * without leaking FLCEIL/FLCLAIM updates can be achieved by
>> -	 * holding the superlock.
>> -	 */
>>  	xnlock_get_irqsave(&nklock, s);
>>  
>>  	if (synch->status & XNSYNCH_CEILING) {
>> @@ -947,12 +945,7 @@ bool xnsynch_release(struct xnsynch *synch, struct xnthread *curr)
>>  		need_resched = true;
>>  	}
>>  
>> -	h = atomic_cmpxchg(lockp, currh, XN_NO_HANDLE);
>> -	if ((h & ~XNSYNCH_FLCEIL) != currh)
>> -		/* FLCLAIM set, synch is contended. */
>> -		need_resched = transfer_ownership(synch, curr);
>> -	else if (h != currh)	/* FLCEIL set, FLCLAIM clear. */
>> -		atomic_set(lockp, XN_NO_HANDLE);
>> +	need_resched |= transfer_ownership(synch, curr);
>>  
>>  	xnlock_put_irqrestore(&nklock, s);
>>  
>
> This apparently fell through the cracks on my side and was never
> submitted as a top-level patch; thanks for pointing it out, Sang Yo.
>
> Philippe, is this still valid?

From reading xnsynch_release() in Cobalt and the explanations in that
commit log, it is. Since you were a direct recipient of this patch, I
assumed you queued it for a deep review given its criticality, so I
moved on, sorry for this.

> How can we check whether we need it and
> whether it does the thing it is supposed to do?

Unfortunately I have no xenomai3 test at hand; this bug was originally
uncovered in xenomai4, which initially imported the xenomai3 mutex logic
into the EVL core before I reworked it heavily last year, starting with
fixing this bug. Maybe the test case [1] Giulio posted back then could
also trigger the issue with xenomai3, which still runs the original
mutex implementation.

[1]
https://lore.kernel.org/xenomai/a4c1ed74-2ae9-9427-994b-54dd7773399e@bela.io/

-- 
Philippe.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Issues with evl_mutex_trylock()
  2023-08-10  8:18         ` Philippe Gerum
@ 2023-08-11  7:58           ` Jan Kiszka
  0 siblings, 0 replies; 17+ messages in thread
From: Jan Kiszka @ 2023-08-11  7:58 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai, sang yo

On 10.08.23 10:18, Philippe Gerum wrote:
> 
> Jan Kiszka <jan.kiszka@siemens.com> writes:
> 
>> On 27.08.22 19:53, Philippe Gerum wrote:
>>>
>>> Philippe Gerum <rpm@xenomai.org> writes:
>>>
>>>> Philippe Gerum <rpm@xenomai.org> writes:
>>>>
>>>>> Giulio Moro <giulio@bela.io> writes:
>>>>>
>>>>>> In the toy example below I start some pthreads with SCHED_FIFO, priority 94 and setting the affinity of one thread per core. I then attach them to the EVL core and share a lock between them and the main thread. The main thread has a settable priority and schedule (SCHED_OTHER with prio 0 or SCHED_FIFO with higher priority) and its affinity is set to all available cores.
>>>>>>
>>>>>> Here's a summary of what I get when running the test with some
>>>>>> parameters (arguments are: "max iterations", "number of threads to
>>>>>> create", "priority of the main thread", "flags (t means don't print
>>>>>> anything). The error code reported alongside failures is the base 2
>>>>>> representation of the return code. 100 means that errors were
>>>>>> detected in evl_unlock_mutex() in the main thread following a
>>>>>> successful call to evl_trylock_mutex(). During these tests, the
>>>>>> output is piped to cat and then redirected to /dev/null.
>>>>>>
>>>>>> mutex-test 100000 0 0 t :  Success real 0m13.583s user 0m1.218s sys 0m0.000s
>>>>>> mutex-test 100000 0 90 t :  Success real 0m13.350s user 0m1.177s sys 0m0.000s
>>>>>> mutex-test 100000 0 99 t :  Success real 0m13.347s user 0m1.179s sys 0m0.000s
>>>>>> mutex-test 100000 1 0 t :  Success real 0m20.087s user 0m3.575s sys 0m0.000s
>>>>>> mutex-test 100000 1 90 t :  Success real 0m17.610s user 0m2.133s sys 0m0.000s
>>>>>> mutex-test 100000 1 99 t :  Success real 0m15.466s user 0m1.333s sys 0m0.000s
>>>>>> mutex-test 100000 2 0 t :  Failed 100 real 0m0.259s user 0m0.083s sys 0m0.000s
>>>>>> mutex-test 100000 2 90 t :  Failed 100 real 0m0.374s user 0m0.176s sys 0m0.000s
>>>>>> mutex-test 100000 2 99 t :  Failed 100 real 0m0.207s user 0m0.109s sys 0m0.000s
>>>>>> mutex-test 100000 3 0 t :  Failed 100 real 0m0.253s user 0m0.124s sys 0m0.000s
>>>>>> mutex-test 100000 3 90 t :  Failed 100 real 0m0.383s user 0m0.149s sys 0m0.000s
>>>>>> mutex-test 100000 3 99 t :  Failed 100 real 0m0.283s user 0m0.151s sys 0m0.000s
>>>>>> mutex-test 100000 4 0 t :  Failed 100 real 0m0.526s user 0m0.221s sys 0m0.000s
>>>>>> mutex-test 100000 4 90 t :  Failed 100 real 0m0.438s user 0m0.178s sys 0m0.000s
>>>>>> mutex-test 100000 4 99 t :  Failed 100 real 0m0.434s user 0m0.178s sys 0m0.000s
>>>>>>
>>>>>> So, when running with 2 or more threads, the same error keeps surfacing very quickly.
>>>>>>
>>>>>
>>>>> Confirmed. There is something wrong.
>>>>>
>>>>
>>>> Please retry with this patch in. There is clearly a race in the current
>>>> implementation of the mutex release code, specifically the release path.
>>>>
>>>> @Jan: If believe this gremlin may be in Cobalt too. The key issue is
>>>> with the handle transitioning to NO_HANDLE before transfer_ownership()
>>>> switches it to the handle of the first contender in line. If userland
>>>> manages to slip in with a fast_acquire() during this tiny window, we may
>>>> be toast, with two threads actually holding the same lock. The issue was
>>>> made more obvious with EVL which does not serialize the trylock code
>>>> around cmpxchg in kernel space, triggering the issue faster. I'll have a
>>>> second look to Cobalt, but since you know the mutex code in depth there,
>>>> four eyeballs would be better.
>>>>
>>>
>>> I believe that the problem just fixed in the EVL core also affects
>>> Cobalt. This is a suggested patch for fixing the same issue in the
>>> latter on top of 3.2.2. Smokey's posix_mutex test still happy with it.
>>>
>>> NOTE: the issue at hand is two threads being granted the _same_ lock
>>> concurrently, so this might be potentially catastrophic.
>>>
>>> ---
>>>
>>> commit 4417674f427a78ba7fe2c8e5b2607d5ca6886ec0 (HEAD -> master)
>>> Author: Philippe Gerum <rpm@xenomai.org>
>>> Date:   Sat Aug 27 19:44:30 2022 +0200
>>>
>>>     cobalt/sync: fix race in release path
>>>     
>>>     The mutex owner handle transitions to XN_NO_HANDLE before
>>>     transfer_ownership() may switch it to the handle of the first
>>>     contender in line. If a user context manage to slip in with a
>>>     succcessful cmpxchg() in this tiny window, two threads may actually
>>>     end up holding the same lock.
>>>     
>>>     To address this bug, set the handle directly to its final value as a
>>>     result of transferring ownership, without transitioning via
>>>     XN_NO_HANDLE, closing the race window.
>>>     
>>>     At this chance, stop duplicating the logic which transfer_ownership()
>>>     already provides.  We may unconditionally rely on this routine for
>>>     either passing the lock to the next waiter in line, or marking the
>>>     lock as free, updating the fast handle accordingly in both cases.
>>>     
>>>     Signed-off-by: Philippe Gerum <rpm@xenomai.org>
>>>
>>> diff --git a/kernel/cobalt/synch.c b/kernel/cobalt/synch.c
>>> index 6e50e537f..c5398dba3 100644
>>> --- a/kernel/cobalt/synch.c
>>> +++ b/kernel/cobalt/synch.c
>>> @@ -850,6 +850,17 @@ static bool transfer_ownership(struct xnsynch *synch,
>>>  	xnhandle_t nextownerh;
>>>  	atomic_t *lockp;
>>>  
>>> +	/*
>>> +	 * FLCEIL may only be raised by the owner, or when the owner
>>> +	 * is blocked waiting for the synch (ownership transfer). In
>>> +	 * addition, only the current owner of a synch may release it,
>>> +	 * therefore we can't race while testing FLCEIL locklessly.
>>> +	 * All updates to FLCLAIM are covered by the superlock.
>>> +	 *
>>> +	 * Therefore, clearing the fastlock racelessly in this routine
>>> +	 * without leaking FLCEIL/FLCLAIM updates can be achieved by
>>> +	 * holding the superlock.
>>> +	 */
>>>  	lockp = xnsynch_fastlock(synch);
>>>  
>>>  	/*
>>> @@ -862,6 +873,8 @@ static bool transfer_ownership(struct xnsynch *synch,
>>>  		return false;
>>>  	}
>>>  
>>> +	XENO_WARN_ON(COBALT, !xnsynch_fast_is_claimed(atomic_read(lockp)));
>>> +
>>>  	nextowner = list_first_entry(&synch->pendq, struct xnthread, plink);
>>>  	list_del(&nextowner->plink);
>>>  	nextowner->wchan = NULL;
>>> @@ -916,8 +929,6 @@ static bool transfer_ownership(struct xnsynch *synch,
>>>  bool xnsynch_release(struct xnsynch *synch, struct xnthread *curr)
>>>  {
>>>  	bool need_resched = false;
>>> -	xnhandle_t currh, h;
>>> -	atomic_t *lockp;
>>>  	spl_t s;
>>>  
>>>  	XENO_BUG_ON(COBALT, (synch->status & XNSYNCH_OWNER) == 0);
>>> @@ -927,19 +938,6 @@ bool xnsynch_release(struct xnsynch *synch, struct xnthread *curr)
>>>  	if (xnthread_put_resource(curr))
>>>  		return false;
>>>  
>>> -	lockp = xnsynch_fastlock(synch);
>>> -	currh = curr->handle;
>>> -	/*
>>> -	 * FLCEIL may only be raised by the owner, or when the owner
>>> -	 * is blocked waiting for the synch (ownership transfer). In
>>> -	 * addition, only the current owner of a synch may release it,
>>> -	 * therefore we can't race while testing FLCEIL locklessly.
>>> -	 * All updates to FLCLAIM are covered by the superlock.
>>> -	 *
>>> -	 * Therefore, clearing the fastlock racelessly in this routine
>>> -	 * without leaking FLCEIL/FLCLAIM updates can be achieved by
>>> -	 * holding the superlock.
>>> -	 */
>>>  	xnlock_get_irqsave(&nklock, s);
>>>  
>>>  	if (synch->status & XNSYNCH_CEILING) {
>>> @@ -947,12 +945,7 @@ bool xnsynch_release(struct xnsynch *synch, struct xnthread *curr)
>>>  		need_resched = true;
>>>  	}
>>>  
>>> -	h = atomic_cmpxchg(lockp, currh, XN_NO_HANDLE);
>>> -	if ((h & ~XNSYNCH_FLCEIL) != currh)
>>> -		/* FLCLAIM set, synch is contended. */
>>> -		need_resched = transfer_ownership(synch, curr);
>>> -	else if (h != currh)	/* FLCEIL set, FLCLAIM clear. */
>>> -		atomic_set(lockp, XN_NO_HANDLE);
>>> +	need_resched |= transfer_ownership(synch, curr);
>>>  
>>>  	xnlock_put_irqrestore(&nklock, s);
>>>  
>>
>> This apparently fell through the cracks on my side and was never
>> submitted as a top-level patch; thanks for pointing it out, Sang Yo.
>>
>> Philippe, is this still valid?
> 
> From reading xnsynch_release() in Cobalt and the explanations in that
> commit log, it is. Since you were a direct recipient of this patch, I
> assumed you queued it for a deep review given its criticality, so I
> moved on, sorry for this.
> 
>> How can we check whether we need it and
>> whether it does the thing it is supposed to do?
> 
> Unfortunately I have no xenomai3 test at hand; this bug was originally
> uncovered in xenomai4, which initially imported the xenomai3 mutex logic
> into the EVL core before I reworked it heavily last year, starting with
> fixing this bug. Maybe the test case [1] Giulio posted back then could
> also trigger the issue with xenomai3, which still runs the original
> mutex implementation.
> 
> [1]
> https://lore.kernel.org/xenomai/a4c1ed74-2ae9-9427-994b-54dd7773399e@bela.io/
> 

I tried that and had to fix the test first (its thread setup was racy),
but the issue does not trigger, unfortunately.
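
For the record, the kind of setup race meant above is typically removed by
making every thread rendezvous on a barrier before the locking loop starts;
a minimal sketch of that pattern (not the actual change applied to the test)
looks like this:

#include <pthread.h>
#include <stdio.h>

#define NTHREADS 2
#define ITERS    100000

static pthread_barrier_t start;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
	int i;

	(void)arg;
	/* Per-thread setup (scheduling class, affinity, ...) would go here,
	 * then wait until everyone is ready before touching the mutex. */
	pthread_barrier_wait(&start);

	for (i = 0; i < ITERS; i++) {
		pthread_mutex_lock(&lock);
		pthread_mutex_unlock(&lock);
	}
	return NULL;
}

int main(void)
{
	pthread_t tid[NTHREADS];
	int t;

	pthread_barrier_init(&start, NULL, NTHREADS + 1);

	for (t = 0; t < NTHREADS; t++)
		pthread_create(&tid[t], NULL, worker, NULL);

	pthread_barrier_wait(&start);	/* release all workers at once */

	for (t = 0; t < NTHREADS; t++)
		pthread_join(tid[t], NULL);

	puts("done");
	return 0;
}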

Jan

-- 
Siemens AG, Technology
Linux Expert Center


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Issues with evl_mutex_trylock()
  2022-08-27 17:53     ` Philippe Gerum
  2023-08-10  6:00       ` Jan Kiszka
@ 2023-08-11 11:31       ` Jan Kiszka
  1 sibling, 0 replies; 17+ messages in thread
From: Jan Kiszka @ 2023-08-11 11:31 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai

On 27.08.22 19:53, Philippe Gerum wrote:
> 
> Philippe Gerum <rpm@xenomai.org> writes:
> 
>> Philippe Gerum <rpm@xenomai.org> writes:
>>
>>> Giulio Moro <giulio@bela.io> writes:
>>>
>>>> In the toy example below I start some pthreads with SCHED_FIFO, priority 94 and setting the affinity of one thread per core. I then attach them to the EVL core and share a lock between them and the main thread. The main thread has a settable priority and schedule (SCHED_OTHER with prio 0 or SCHED_FIFO with higher priority) and its affinity is set to all available cores.
>>>>
>>>> Here's a summary of what I get when running the test with some
>>>> parameters (arguments are: "max iterations", "number of threads to
>>>> create", "priority of the main thread", "flags (t means don't print
>>>> anything). The error code reported alongside failures is the base 2
>>>> representation of the return code. 100 means that errors were
>>>> detected in evl_unlock_mutex() in the main thread following a
>>>> successful call to evl_trylock_mutex(). During these tests, the
>>>> output is piped to cat and then redirected to /dev/null.
>>>>
>>>> mutex-test 100000 0 0 t :  Success real 0m13.583s user 0m1.218s sys 0m0.000s
>>>> mutex-test 100000 0 90 t :  Success real 0m13.350s user 0m1.177s sys 0m0.000s
>>>> mutex-test 100000 0 99 t :  Success real 0m13.347s user 0m1.179s sys 0m0.000s
>>>> mutex-test 100000 1 0 t :  Success real 0m20.087s user 0m3.575s sys 0m0.000s
>>>> mutex-test 100000 1 90 t :  Success real 0m17.610s user 0m2.133s sys 0m0.000s
>>>> mutex-test 100000 1 99 t :  Success real 0m15.466s user 0m1.333s sys 0m0.000s
>>>> mutex-test 100000 2 0 t :  Failed 100 real 0m0.259s user 0m0.083s sys 0m0.000s
>>>> mutex-test 100000 2 90 t :  Failed 100 real 0m0.374s user 0m0.176s sys 0m0.000s
>>>> mutex-test 100000 2 99 t :  Failed 100 real 0m0.207s user 0m0.109s sys 0m0.000s
>>>> mutex-test 100000 3 0 t :  Failed 100 real 0m0.253s user 0m0.124s sys 0m0.000s
>>>> mutex-test 100000 3 90 t :  Failed 100 real 0m0.383s user 0m0.149s sys 0m0.000s
>>>> mutex-test 100000 3 99 t :  Failed 100 real 0m0.283s user 0m0.151s sys 0m0.000s
>>>> mutex-test 100000 4 0 t :  Failed 100 real 0m0.526s user 0m0.221s sys 0m0.000s
>>>> mutex-test 100000 4 90 t :  Failed 100 real 0m0.438s user 0m0.178s sys 0m0.000s
>>>> mutex-test 100000 4 99 t :  Failed 100 real 0m0.434s user 0m0.178s sys 0m0.000s
>>>>
>>>> So, when running with 2 or more threads, the same error keeps surfacing very quickly.
>>>>
>>>
>>> Confirmed. There is something wrong.
>>>
>>
>> Please retry with this patch in. There is clearly a race in the current
>> implementation of the mutex release code, specifically the release path.
>>
>> @Jan: If believe this gremlin may be in Cobalt too. The key issue is
>> with the handle transitioning to NO_HANDLE before transfer_ownership()
>> switches it to the handle of the first contender in line. If userland
>> manages to slip in with a fast_acquire() during this tiny window, we may
>> be toast, with two threads actually holding the same lock. The issue was
>> made more obvious with EVL which does not serialize the trylock code
>> around cmpxchg in kernel space, triggering the issue faster. I'll have a
>> second look to Cobalt, but since you know the mutex code in depth there,
>> four eyeballs would be better.
>>
> 
> I believe that the problem just fixed in the EVL core also affects
> Cobalt. This is a suggested patch for fixing the same issue in the
> latter on top of 3.2.2. Smokey's posix_mutex test still happy with it.
> 
> NOTE: the issue at hand is two threads being granted the _same_ lock
> concurrently, so this might be potentially catastrophic.
> 
> ---
> 
> commit 4417674f427a78ba7fe2c8e5b2607d5ca6886ec0 (HEAD -> master)
> Author: Philippe Gerum <rpm@xenomai.org>
> Date:   Sat Aug 27 19:44:30 2022 +0200
> 
>     cobalt/sync: fix race in release path
>     
>     The mutex owner handle transitions to XN_NO_HANDLE before
>     transfer_ownership() may switch it to the handle of the first
>     contender in line. If a user context manage to slip in with a
>     succcessful cmpxchg() in this tiny window, two threads may actually
>     end up holding the same lock.
>     
>     To address this bug, set the handle directly to its final value as a
>     result of transferring ownership, without transitioning via
>     XN_NO_HANDLE, closing the race window.
>     
>     At this chance, stop duplicating the logic which transfer_ownership()
>     already provides.  We may unconditionally rely on this routine for
>     either passing the lock to the next waiter in line, or marking the
>     lock as free, updating the fast handle accordingly in both cases.
>     
>     Signed-off-by: Philippe Gerum <rpm@xenomai.org>
> 
> diff --git a/kernel/cobalt/synch.c b/kernel/cobalt/synch.c
> index 6e50e537f..c5398dba3 100644
> --- a/kernel/cobalt/synch.c
> +++ b/kernel/cobalt/synch.c
> @@ -850,6 +850,17 @@ static bool transfer_ownership(struct xnsynch *synch,
>  	xnhandle_t nextownerh;
>  	atomic_t *lockp;
>  
> +	/*
> +	 * FLCEIL may only be raised by the owner, or when the owner
> +	 * is blocked waiting for the synch (ownership transfer). In
> +	 * addition, only the current owner of a synch may release it,
> +	 * therefore we can't race while testing FLCEIL locklessly.
> +	 * All updates to FLCLAIM are covered by the superlock.
> +	 *
> +	 * Therefore, clearing the fastlock racelessly in this routine
> +	 * without leaking FLCEIL/FLCLAIM updates can be achieved by
> +	 * holding the superlock.
> +	 */
>  	lockp = xnsynch_fastlock(synch);
>  
>  	/*
> @@ -862,6 +873,8 @@ static bool transfer_ownership(struct xnsynch *synch,
>  		return false;
>  	}
>  
> +	XENO_WARN_ON(COBALT, !xnsynch_fast_is_claimed(atomic_read(lockp)));
> +
>  	nextowner = list_first_entry(&synch->pendq, struct xnthread, plink);
>  	list_del(&nextowner->plink);
>  	nextowner->wchan = NULL;
> @@ -916,8 +929,6 @@ static bool transfer_ownership(struct xnsynch *synch,
>  bool xnsynch_release(struct xnsynch *synch, struct xnthread *curr)
>  {
>  	bool need_resched = false;
> -	xnhandle_t currh, h;
> -	atomic_t *lockp;
>  	spl_t s;
>  
>  	XENO_BUG_ON(COBALT, (synch->status & XNSYNCH_OWNER) == 0);
> @@ -927,19 +938,6 @@ bool xnsynch_release(struct xnsynch *synch, struct xnthread *curr)
>  	if (xnthread_put_resource(curr))
>  		return false;
>  
> -	lockp = xnsynch_fastlock(synch);
> -	currh = curr->handle;
> -	/*
> -	 * FLCEIL may only be raised by the owner, or when the owner
> -	 * is blocked waiting for the synch (ownership transfer). In
> -	 * addition, only the current owner of a synch may release it,
> -	 * therefore we can't race while testing FLCEIL locklessly.
> -	 * All updates to FLCLAIM are covered by the superlock.
> -	 *
> -	 * Therefore, clearing the fastlock racelessly in this routine
> -	 * without leaking FLCEIL/FLCLAIM updates can be achieved by
> -	 * holding the superlock.
> -	 */
>  	xnlock_get_irqsave(&nklock, s);
>  
>  	if (synch->status & XNSYNCH_CEILING) {
> @@ -947,12 +945,7 @@ bool xnsynch_release(struct xnsynch *synch, struct xnthread *curr)
>  		need_resched = true;
>  	}
>  
> -	h = atomic_cmpxchg(lockp, currh, XN_NO_HANDLE);

I've tried widening the race window by injecting some udelay after this
line, but still no luck in seeing the issue with Giulio's test case
under Xenomai 3. Am I on the right track with this at all?
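
(For reference, the instrumentation meant here would be something along these
lines, i.e. a debug-only delay right after the cmpxchg in xnsynch_release(),
with <linux/delay.h> pulled in:)

	h = atomic_cmpxchg(lockp, currh, XN_NO_HANDLE);
	udelay(100); /* debug-only: widen any would-be window before
			transfer_ownership() can run */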

Jan

> -	if ((h & ~XNSYNCH_FLCEIL) != currh)
> -		/* FLCLAIM set, synch is contended. */
> -		need_resched = transfer_ownership(synch, curr);
> -	else if (h != currh)	/* FLCEIL set, FLCLAIM clear. */
> -		atomic_set(lockp, XN_NO_HANDLE);
> +	need_resched |= transfer_ownership(synch, curr);
>  
>  	xnlock_put_irqrestore(&nklock, s);
>  

-- 
Siemens AG, Technology
Linux Expert Center


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Issues with evl_mutex_trylock()
  2022-08-26 16:17   ` Philippe Gerum
                       ` (2 preceding siblings ...)
  2022-08-27 17:53     ` Philippe Gerum
@ 2023-08-11 12:31     ` Jan Kiszka
  2023-08-11 13:17       ` Philippe Gerum
  3 siblings, 1 reply; 17+ messages in thread
From: Jan Kiszka @ 2023-08-11 12:31 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai, yo sang, Giulio Moro

On 26.08.22 18:17, Philippe Gerum wrote:
> 
> Philippe Gerum <rpm@xenomai.org> writes:
> 
>> Giulio Moro <giulio@bela.io> writes:
>>
>>> In the toy example below I start some pthreads with SCHED_FIFO, priority 94 and setting the affinity of one thread per core. I then attach them to the EVL core and share a lock between them and the main thread. The main thread has a settable priority and schedule (SCHED_OTHER with prio 0 or SCHED_FIFO with higher priority) and its affinity is set to all available cores.
>>>
>>> Here's a summary of what I get when running the test with some
>>> parameters (arguments are: "max iterations", "number of threads to
>>> create", "priority of the main thread", "flags (t means don't print
>>> anything). The error code reported alongside failures is the base 2
>>> representation of the return code. 100 means that errors were
>>> detected in evl_unlock_mutex() in the main thread following a
>>> successful call to evl_trylock_mutex(). During these tests, the
>>> output is piped to cat and then redirected to /dev/null.
>>>
>>> mutex-test 100000 0 0 t :  Success real 0m13.583s user 0m1.218s sys 0m0.000s
>>> mutex-test 100000 0 90 t :  Success real 0m13.350s user 0m1.177s sys 0m0.000s
>>> mutex-test 100000 0 99 t :  Success real 0m13.347s user 0m1.179s sys 0m0.000s
>>> mutex-test 100000 1 0 t :  Success real 0m20.087s user 0m3.575s sys 0m0.000s
>>> mutex-test 100000 1 90 t :  Success real 0m17.610s user 0m2.133s sys 0m0.000s
>>> mutex-test 100000 1 99 t :  Success real 0m15.466s user 0m1.333s sys 0m0.000s
>>> mutex-test 100000 2 0 t :  Failed 100 real 0m0.259s user 0m0.083s sys 0m0.000s
>>> mutex-test 100000 2 90 t :  Failed 100 real 0m0.374s user 0m0.176s sys 0m0.000s
>>> mutex-test 100000 2 99 t :  Failed 100 real 0m0.207s user 0m0.109s sys 0m0.000s
>>> mutex-test 100000 3 0 t :  Failed 100 real 0m0.253s user 0m0.124s sys 0m0.000s
>>> mutex-test 100000 3 90 t :  Failed 100 real 0m0.383s user 0m0.149s sys 0m0.000s
>>> mutex-test 100000 3 99 t :  Failed 100 real 0m0.283s user 0m0.151s sys 0m0.000s
>>> mutex-test 100000 4 0 t :  Failed 100 real 0m0.526s user 0m0.221s sys 0m0.000s
>>> mutex-test 100000 4 90 t :  Failed 100 real 0m0.438s user 0m0.178s sys 0m0.000s
>>> mutex-test 100000 4 99 t :  Failed 100 real 0m0.434s user 0m0.178s sys 0m0.000s
>>>
>>> So, when running with 2 or more threads, the same error keeps surfacing very quickly.
>>>
>>
>> Confirmed. There is something wrong.
>>
> 
> Please retry with this patch in. There is clearly a race in the current
> implementation of the mutex release code, specifically the release path.
> 
> @Jan: If believe this gremlin may be in Cobalt too. The key issue is
> with the handle transitioning to NO_HANDLE before transfer_ownership()
> switches it to the handle of the first contender in line. If userland
> manages to slip in with a fast_acquire() during this tiny window, we may
> be toast, with two threads actually holding the same lock. The issue was
> made more obvious with EVL which does not serialize the trylock code
> around cmpxchg in kernel space, triggering the issue faster. I'll have a
> second look to Cobalt, but since you know the mutex code in depth there,
> four eyeballs would be better.
> 
> diff --git a/kernel/evl/mutex.c b/kernel/evl/mutex.c
> index e0cf1ba21d5b4..d96f8c9f04df7 100644
> --- a/kernel/evl/mutex.c
> +++ b/kernel/evl/mutex.c
> @@ -899,14 +899,16 @@ void __evl_unlock_mutex(struct evl_mutex *mutex)
>  	if (mutex->flags & EVL_MUTEX_CEILING)
>  		clear_boost_locked(mutex, curr, EVL_MUTEX_CEILING);
>  
> +	/*
> +	 * The whole logic is based on the invariant that the current
> +	 * thread does own the mutex being released.
> +	 * IOW: currh == atomic_read(lockp) & ~(EVL_MUTEX_FLCLAIM|EVL_MUTEX_FLCEIL).
> +	 */
>  	h = atomic_read(lockp);
> -	h = atomic_cmpxchg(lockp, h, EVL_NO_HANDLE);

This here was different from current Xenomai 3:

	currh = curr->handle;
	[...]
	h = atomic_cmpxchg(lockp, currh, XN_NO_HANDLE);

Under cobalt, we are using the thread handle as expectation value for
the fast lock. If FLCLAIM should have been set there, the cmpxchg will
fail, and the fast lock will NOT contain XN_NO_HANDLE until we switch
things in transfer_ownership(). IOW, I don't see how Xenomai 3 could be
affected in the same way by the issue that you fixed in EVL.

Am I missing something?
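
To make the difference concrete, here is a small stand-alone user-space model
of the two release paths, using C11 atomics; the handle value and bit masks
are made up for illustration, this is not the kernel code:

#include <stdatomic.h>
#include <stdio.h>

#define NO_HANDLE	0u
#define FLCLAIM		0x80000000u	/* stand-in for the "claimed" bit */
#define FLCEIL		0x40000000u	/* stand-in for the "ceiling" bit */

static atomic_uint word;

/* Old EVL path: cmpxchg against whatever was just read, so even a claimed
 * word (currh | FLCLAIM) is swapped to NO_HANDLE here, before the modeled
 * transfer_ownership() would run: that is the window. */
static void evl_old_release(unsigned int currh)
{
	unsigned int h = atomic_load(&word);
	unsigned int expected = h;

	atomic_compare_exchange_strong(&word, &expected, NO_HANDLE);
	if ((h & ~FLCEIL) != currh) {
		/* Contended: the word already reads NO_HANDLE at this point,
		 * so a concurrent fast acquire could slip in. */
	}
}

/* Cobalt path: cmpxchg against the bare owner handle only, so a claimed
 * word never matches and never passes through NO_HANDLE. */
static void cobalt_release(unsigned int currh)
{
	unsigned int expected = currh;

	if (!atomic_compare_exchange_strong(&word, &expected, NO_HANDLE)) {
		/* FLCLAIM and/or FLCEIL set: the word is left untouched until
		 * transfer_ownership()/atomic_set() deal with it. */
	}
}

int main(void)
{
	unsigned int owner = 42;

	atomic_store(&word, owner | FLCLAIM);	/* contended lock */
	evl_old_release(owner);
	printf("old EVL, contended: %#x\n", atomic_load(&word));  /* 0 */

	atomic_store(&word, owner | FLCLAIM);	/* contended lock */
	cobalt_release(owner);
	printf("Cobalt,  contended: %#x\n", atomic_load(&word));  /* 0x8000002a */

	return 0;
}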

Jan

> -	if ((h & ~EVL_MUTEX_FLCEIL) != currh) {
> -		/* FLCLAIM set, mutex is contended. */
> +	if (h & EVL_MUTEX_FLCLAIM) { /* Is there a contender? */
>  		transfer_ownership(mutex, curr);
>  	} else {
> -		if (h != currh)	/* FLCEIL set, FLCLAIM clear. */
> -			atomic_set(lockp, EVL_NO_HANDLE);
> +		atomic_set(lockp, EVL_NO_HANDLE);
>  		untrack_owner(mutex);
>  	}
>  

-- 
Siemens AG, Technology
Linux Expert Center


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Issues with evl_mutex_trylock()
  2023-08-11 12:31     ` Jan Kiszka
@ 2023-08-11 13:17       ` Philippe Gerum
  2023-08-11 13:49         ` Jan Kiszka
  0 siblings, 1 reply; 17+ messages in thread
From: Philippe Gerum @ 2023-08-11 13:17 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai, yo sang, Giulio Moro


Jan Kiszka <jan.kiszka@siemens.com> writes:

> On 26.08.22 18:17, Philippe Gerum wrote:
>> 
>> Philippe Gerum <rpm@xenomai.org> writes:
>> 
>>> Giulio Moro <giulio@bela.io> writes:
>>>
>>>> In the toy example below I start some pthreads with SCHED_FIFO, priority 94 and setting the affinity of one thread per core. I then attach them to the EVL core and share a lock between them and the main thread. The main thread has a settable priority and schedule (SCHED_OTHER with prio 0 or SCHED_FIFO with higher priority) and its affinity is set to all available cores.
>>>>
>>>> Here's a summary of what I get when running the test with some
>>>> parameters (arguments are: "max iterations", "number of threads to
>>>> create", "priority of the main thread", "flags (t means don't print
>>>> anything). The error code reported alongside failures is the base 2
>>>> representation of the return code. 100 means that errors were
>>>> detected in evl_unlock_mutex() in the main thread following a
>>>> successful call to evl_trylock_mutex(). During these tests, the
>>>> output is piped to cat and then redirected to /dev/null.
>>>>
>>>> mutex-test 100000 0 0 t :  Success real 0m13.583s user 0m1.218s sys 0m0.000s
>>>> mutex-test 100000 0 90 t :  Success real 0m13.350s user 0m1.177s sys 0m0.000s
>>>> mutex-test 100000 0 99 t :  Success real 0m13.347s user 0m1.179s sys 0m0.000s
>>>> mutex-test 100000 1 0 t :  Success real 0m20.087s user 0m3.575s sys 0m0.000s
>>>> mutex-test 100000 1 90 t :  Success real 0m17.610s user 0m2.133s sys 0m0.000s
>>>> mutex-test 100000 1 99 t :  Success real 0m15.466s user 0m1.333s sys 0m0.000s
>>>> mutex-test 100000 2 0 t :  Failed 100 real 0m0.259s user 0m0.083s sys 0m0.000s
>>>> mutex-test 100000 2 90 t :  Failed 100 real 0m0.374s user 0m0.176s sys 0m0.000s
>>>> mutex-test 100000 2 99 t :  Failed 100 real 0m0.207s user 0m0.109s sys 0m0.000s
>>>> mutex-test 100000 3 0 t :  Failed 100 real 0m0.253s user 0m0.124s sys 0m0.000s
>>>> mutex-test 100000 3 90 t :  Failed 100 real 0m0.383s user 0m0.149s sys 0m0.000s
>>>> mutex-test 100000 3 99 t :  Failed 100 real 0m0.283s user 0m0.151s sys 0m0.000s
>>>> mutex-test 100000 4 0 t :  Failed 100 real 0m0.526s user 0m0.221s sys 0m0.000s
>>>> mutex-test 100000 4 90 t :  Failed 100 real 0m0.438s user 0m0.178s sys 0m0.000s
>>>> mutex-test 100000 4 99 t :  Failed 100 real 0m0.434s user 0m0.178s sys 0m0.000s
>>>>
>>>> So, when running with 2 or more threads, the same error keeps surfacing very quickly.
>>>>
>>>
>>> Confirmed. There is something wrong.
>>>
>> 
>> Please retry with this patch in. There is clearly a race in the current
>> implementation of the mutex release code, specifically the release path.
>> 
>> @Jan: If believe this gremlin may be in Cobalt too. The key issue is
>> with the handle transitioning to NO_HANDLE before transfer_ownership()
>> switches it to the handle of the first contender in line. If userland
>> manages to slip in with a fast_acquire() during this tiny window, we may
>> be toast, with two threads actually holding the same lock. The issue was
>> made more obvious with EVL which does not serialize the trylock code
>> around cmpxchg in kernel space, triggering the issue faster. I'll have a
>> second look to Cobalt, but since you know the mutex code in depth there,
>> four eyeballs would be better.
>> 
>> diff --git a/kernel/evl/mutex.c b/kernel/evl/mutex.c
>> index e0cf1ba21d5b4..d96f8c9f04df7 100644
>> --- a/kernel/evl/mutex.c
>> +++ b/kernel/evl/mutex.c
>> @@ -899,14 +899,16 @@ void __evl_unlock_mutex(struct evl_mutex *mutex)
>>  	if (mutex->flags & EVL_MUTEX_CEILING)
>>  		clear_boost_locked(mutex, curr, EVL_MUTEX_CEILING);
>>  
>> +	/*
>> +	 * The whole logic is based on the invariant that the current
>> +	 * thread does own the mutex being released.
>> +	 * IOW: currh == atomic_read(lockp) & ~(EVL_MUTEX_FLCLAIM|EVL_MUTEX_FLCEIL).
>> +	 */
>>  	h = atomic_read(lockp);
>> -	h = atomic_cmpxchg(lockp, h, EVL_NO_HANDLE);
>
> This here was different from current Xenomai 3:
>
> 	currh = curr->handle;
> 	[...]
> 	h = atomic_cmpxchg(lockp, currh, XN_NO_HANDLE);
>
> Under cobalt, we are using the thread handle as expectation value for
> the fast lock. If FLCLAIM should have been set there, the cmpxchg will
> fail, and the fast lock will NOT contain XN_NO_HANDLE until we switch
> things in transfer_ownership(). IOW, I don't see how Xenomai 3 could be
> affected in the same way by the issue that you fixed in EVL.
>
> Am I missing something?

No, I agree, Xenomai3 is immune. Either FLCLAIM is set, in which case we
would not match the current handle and therefore the fast handle would
not be up for grabs, or it is clear and we would not attempt to transfer
the ownership.
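
Spelled out against the Cobalt release path quoted above, with the three
cases annotated (illustration only, the statements themselves are unchanged):

	h = atomic_cmpxchg(lockp, currh, XN_NO_HANDLE);
	if ((h & ~XNSYNCH_FLCEIL) != currh)
		/* FLCLAIM set: the cmpxchg did not match, the fast handle
		 * still carries the owner bits, so no user-space fast acquire
		 * can slip in before transfer_ownership() runs. */
		need_resched = transfer_ownership(synch, curr);
	else if (h != currh)
		/* FLCEIL set, FLCLAIM clear: nobody is waiting, publishing
		 * XN_NO_HANDLE here simply marks the lock as free. */
		atomic_set(lockp, XN_NO_HANDLE);
	/* Neither bit set: the cmpxchg itself released the fast handle
	 * atomically, so there is no window at all. */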

False alarm, my bad.

-- 
Philippe.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Issues with evl_mutex_trylock()
  2023-08-11 13:17       ` Philippe Gerum
@ 2023-08-11 13:49         ` Jan Kiszka
  2023-08-15  6:07           ` sang yo
  0 siblings, 1 reply; 17+ messages in thread
From: Jan Kiszka @ 2023-08-11 13:49 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai, yo sang, Giulio Moro

On 11.08.23 15:17, Philippe Gerum wrote:
> 
> Jan Kiszka <jan.kiszka@siemens.com> writes:
> 
>> On 26.08.22 18:17, Philippe Gerum wrote:
>>>
>>> Philippe Gerum <rpm@xenomai.org> writes:
>>>
>>>> Giulio Moro <giulio@bela.io> writes:
>>>>
>>>>> In the toy example below I start some pthreads with SCHED_FIFO, priority 94 and setting the affinity of one thread per core. I then attach them to the EVL core and share a lock between them and the main thread. The main thread has a settable priority and schedule (SCHED_OTHER with prio 0 or SCHED_FIFO with higher priority) and its affinity is set to all available cores.
>>>>>
>>>>> Here's a summary of what I get when running the test with some
>>>>> parameters (arguments are: "max iterations", "number of threads to
>>>>> create", "priority of the main thread", "flags (t means don't print
>>>>> anything). The error code reported alongside failures is the base 2
>>>>> representation of the return code. 100 means that errors were
>>>>> detected in evl_unlock_mutex() in the main thread following a
>>>>> successful call to evl_trylock_mutex(). During these tests, the
>>>>> output is piped to cat and then redirected to /dev/null.
>>>>>
>>>>> mutex-test 100000 0 0 t :  Success real 0m13.583s user 0m1.218s sys 0m0.000s
>>>>> mutex-test 100000 0 90 t :  Success real 0m13.350s user 0m1.177s sys 0m0.000s
>>>>> mutex-test 100000 0 99 t :  Success real 0m13.347s user 0m1.179s sys 0m0.000s
>>>>> mutex-test 100000 1 0 t :  Success real 0m20.087s user 0m3.575s sys 0m0.000s
>>>>> mutex-test 100000 1 90 t :  Success real 0m17.610s user 0m2.133s sys 0m0.000s
>>>>> mutex-test 100000 1 99 t :  Success real 0m15.466s user 0m1.333s sys 0m0.000s
>>>>> mutex-test 100000 2 0 t :  Failed 100 real 0m0.259s user 0m0.083s sys 0m0.000s
>>>>> mutex-test 100000 2 90 t :  Failed 100 real 0m0.374s user 0m0.176s sys 0m0.000s
>>>>> mutex-test 100000 2 99 t :  Failed 100 real 0m0.207s user 0m0.109s sys 0m0.000s
>>>>> mutex-test 100000 3 0 t :  Failed 100 real 0m0.253s user 0m0.124s sys 0m0.000s
>>>>> mutex-test 100000 3 90 t :  Failed 100 real 0m0.383s user 0m0.149s sys 0m0.000s
>>>>> mutex-test 100000 3 99 t :  Failed 100 real 0m0.283s user 0m0.151s sys 0m0.000s
>>>>> mutex-test 100000 4 0 t :  Failed 100 real 0m0.526s user 0m0.221s sys 0m0.000s
>>>>> mutex-test 100000 4 90 t :  Failed 100 real 0m0.438s user 0m0.178s sys 0m0.000s
>>>>> mutex-test 100000 4 99 t :  Failed 100 real 0m0.434s user 0m0.178s sys 0m0.000s
>>>>>
>>>>> So, when running with 2 or more threads, the same error keeps surfacing very quickly.
>>>>>
>>>>
>>>> Confirmed. There is something wrong.
>>>>
>>>
>>> Please retry with this patch in. There is clearly a race in the current
>>> implementation of the mutex release code, specifically the release path.
>>>
>>> @Jan: If believe this gremlin may be in Cobalt too. The key issue is
>>> with the handle transitioning to NO_HANDLE before transfer_ownership()
>>> switches it to the handle of the first contender in line. If userland
>>> manages to slip in with a fast_acquire() during this tiny window, we may
>>> be toast, with two threads actually holding the same lock. The issue was
>>> made more obvious with EVL which does not serialize the trylock code
>>> around cmpxchg in kernel space, triggering the issue faster. I'll have a
>>> second look to Cobalt, but since you know the mutex code in depth there,
>>> four eyeballs would be better.
>>>
>>> diff --git a/kernel/evl/mutex.c b/kernel/evl/mutex.c
>>> index e0cf1ba21d5b4..d96f8c9f04df7 100644
>>> --- a/kernel/evl/mutex.c
>>> +++ b/kernel/evl/mutex.c
>>> @@ -899,14 +899,16 @@ void __evl_unlock_mutex(struct evl_mutex *mutex)
>>>  	if (mutex->flags & EVL_MUTEX_CEILING)
>>>  		clear_boost_locked(mutex, curr, EVL_MUTEX_CEILING);
>>>  
>>> +	/*
>>> +	 * The whole logic is based on the invariant that the current
>>> +	 * thread does own the mutex being released.
>>> +	 * IOW: currh == atomic_read(lockp) & ~(EVL_MUTEX_FLCLAIM|EVL_MUTEX_FLCEIL).
>>> +	 */
>>>  	h = atomic_read(lockp);
>>> -	h = atomic_cmpxchg(lockp, h, EVL_NO_HANDLE);
>>
>> This here was different from current Xenomai 3:
>>
>> 	currh = curr->handle;
>> 	[...]
>> 	h = atomic_cmpxchg(lockp, currh, XN_NO_HANDLE);
>>
>> Under cobalt, we are using the thread handle as expectation value for
>> the fast lock. If FLCLAIM should have been set there, the cmpxchg will
>> fail, and the fast lock will NOT contain XN_NO_HANDLE until we switch
>> things in transfer_ownership(). IOW, I don't see how Xenomai 3 could be
>> affected in the same way by the issue that you fixed in EVL.
>>
>> Am I missing something?
> 
> No, I agree, Xenomai3 is immune. Either FLCLAIM is set, in which case we
> would not match the current handle and therefore the fast handle would
> not be up for grabs, or it is clear and we would not attempt to transfer
> the ownership.
> 
> False alarm, my bad.
> 
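For contrast, a minimal sketch of the Xenomai 3 style release just discussed, using the same simplified lock-word conventions as the earlier sketch (an illustration, not the actual Cobalt code):

#include <stdatomic.h>

typedef unsigned int handle_t;
#define NO_HANDLE 0u

static void cobalt_style_unlock(atomic_uint *lockp, handle_t currh)
{
	unsigned int expected = currh;	/* bare owner handle, no flag bits */

	if (!atomic_compare_exchange_strong(lockp, &expected, NO_HANDLE)) {
		/*
		 * FLCLAIM (or FLCEIL) was set, so the stored value did not
		 * match the bare handle and the word never read NO_HANDLE:
		 * no fast trylock can slip in. Ownership is handed to the
		 * first waiter in kernel space instead.
		 */
	}
	/*
	 * Otherwise the cmpxchg succeeded, but FLCLAIM was clear, so there
	 * was no waiter to transfer the mutex to in the first place.
	 */
}

Either way, the lock word is only set to NO_HANDLE when there is nobody to hand the mutex to, which is why the same window does not open under Cobalt.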

Well, we still have Sang reporting that he saw an issue that was gone
after applying your patch.

Sang, could you share your test case to validate what you were seeing?
Maybe there is a different race somewhere, and the patch changes timing
but does not fix things for real.

Jan

-- 
Siemens AG, Technology
Linux Expert Center


^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: Issues with evl_mutex_trylock()
  2023-08-11 13:49         ` Jan Kiszka
@ 2023-08-15  6:07           ` sang yo
  0 siblings, 0 replies; 17+ messages in thread
From: sang yo @ 2023-08-15  6:07 UTC (permalink / raw)
  To: Jan Kiszka, Philippe Gerum; +Cc: xenomai, Giulio Moro

Thanks for the clarification.

I'm trying to reproduce the issue, but it has been a long time and my code has changed a lot.

I will update if I discover something.

Thanks



^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2023-08-15  6:07 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-25 20:47 Issues with evl_mutex_trylock() Giulio Moro
2022-08-26 14:09 ` Philippe Gerum
2022-08-26 15:16   ` Giulio Moro
2022-08-26 15:33     ` Philippe Gerum
2022-08-26 16:17   ` Philippe Gerum
2022-08-26 18:07     ` Philippe Gerum
2022-08-27 16:12     ` Giulio Moro
2022-08-27 17:53     ` Philippe Gerum
2023-08-10  6:00       ` Jan Kiszka
2023-08-10  6:40         ` sang yo
2023-08-10  8:18         ` Philippe Gerum
2023-08-11  7:58           ` Jan Kiszka
2023-08-11 11:31       ` Jan Kiszka
2023-08-11 12:31     ` Jan Kiszka
2023-08-11 13:17       ` Philippe Gerum
2023-08-11 13:49         ` Jan Kiszka
2023-08-15  6:07           ` sang yo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).