* Conflicting EVL Processing Loops @ 2023-01-04 22:28 Russell Johnson 2023-01-05 7:49 ` Philippe Gerum 2023-01-11 15:57 ` Russell Johnson 0 siblings, 2 replies; 10+ messages in thread From: Russell Johnson @ 2023-01-04 22:28 UTC (permalink / raw) To: xenomai; +Cc: Bryan Butler [-- Attachment #1: Type: text/plain, Size: 713 bytes --] Hello, We have two independent processing loops, each consisting of their own set of EVL threads and interrupts. Each loop completes its processing and then performs an evl_sleep_until to delay until the next processing deadline occurs. If we run either loop by itself, everything is fine, and our timing margins are met. However, if we try to run both simultaneously, the timing error is increased significantly, and the loops never meet their processing deadlines. If we compile the code for Linux (substituting all EVL primitives with Linux equivalents), then we are able to run both loops simultaneously without issue. Any clue what could be causing us troubles or where to start looking? Thanks, Russell [-- Attachment #2: smime.p7s --] [-- Type: application/pkcs7-signature, Size: 6759 bytes --] ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Conflicting EVL Processing Loops 2023-01-04 22:28 Conflicting EVL Processing Loops Russell Johnson @ 2023-01-05 7:49 ` Philippe Gerum 2023-01-11 15:57 ` Russell Johnson 1 sibling, 0 replies; 10+ messages in thread From: Philippe Gerum @ 2023-01-05 7:49 UTC (permalink / raw) To: Russell Johnson; +Cc: xenomai, Bryan Butler Russell Johnson <russell.johnson@kratosdefense.com> writes: > [[S/MIME Signed Part:Undecided]] > Hello, > > We have two independent processing loops, each consisting of their own set > of EVL threads and interrupts. Each loop completes its processing and then > performs an evl_sleep_until to delay until the next processing deadline > occurs. If we run either loop by itself, everything is fine, and our timing > margins are met. However, if we try to run both simultaneously, the timing > error is increased significantly, and the loops never meet their processing > deadlines. If we compile the code for Linux (substituting all EVL primitives > with Linux equivalents), then we are able to run both loops simultaneously > without issue. Any clue what could be causing us troubles or where to start > looking? > In absence of any code to review, the question is too broad to figure out what might happen. Quick check though: make sure to disable all the kernel debug options which may be turned on for your EVL kernel (PROVE_LOCKING, DEBUG_LIST, KASAN and others). evl check may help with this. -- Philippe. ^ permalink raw reply [flat|nested] 10+ messages in thread
* RE: Conflicting EVL Processing Loops 2023-01-04 22:28 Conflicting EVL Processing Loops Russell Johnson 2023-01-05 7:49 ` Philippe Gerum @ 2023-01-11 15:57 ` Russell Johnson 2023-01-11 16:44 ` Russell Johnson 1 sibling, 1 reply; 10+ messages in thread From: Russell Johnson @ 2023-01-11 15:57 UTC (permalink / raw) To: xenomai; +Cc: Bryan Butler [-- Attachment #1: Type: text/plain, Size: 920 bytes --] Hi Philippe, Digging more into this, it appears that the culprit is the EVL heap. As I mentioned before - both process loops in our EVL app are independent and run concurrently. I have overridden the global new/delete to use a singular master EVL heap for any dynamic memory allocation that is done. It would seem that both process loops are fighting for the use of the heap. I know that alloc/free are guarded by a mutex, and for some reason I guess they are both constantly fighting over it, which is slowing all of our threads down significantly. I ran a test with the EVL heap disabled - of course there are a lot of syscall warnings from EVL, but our timing was as we would expect. So that guy is definitely the culprit. We may have to look into trying to use a separate EVL heap for each process loop in the app. Unless there is some other way to improve the heap performance that we are seeing? Thanks, Russell [-- Attachment #2: smime.p7s --] [-- Type: application/pkcs7-signature, Size: 6759 bytes --] ^ permalink raw reply [flat|nested] 10+ messages in thread
* RE: Conflicting EVL Processing Loops 2023-01-11 15:57 ` Russell Johnson @ 2023-01-11 16:44 ` Russell Johnson 2023-01-11 20:33 ` Russell Johnson 0 siblings, 1 reply; 10+ messages in thread From: Russell Johnson @ 2023-01-11 16:44 UTC (permalink / raw) To: xenomai; +Cc: Bryan Butler [-- Attachment #1: Type: text/plain, Size: 296 bytes --] Also, I would assume the STL heap is implemented in a somewhat similar way, as there would have to be mutex protection around allocation calls. So why do we not get any slowdowns when using the STL heap versus using the EVL heap? Is there a significant design difference there? Thanks, Russell [-- Attachment #2: smime.p7s --] [-- Type: application/pkcs7-signature, Size: 6759 bytes --] ^ permalink raw reply [flat|nested] 10+ messages in thread
* RE: Conflicting EVL Processing Loops 2023-01-11 16:44 ` Russell Johnson @ 2023-01-11 20:33 ` Russell Johnson 2023-01-12 17:23 ` Philippe Gerum 0 siblings, 1 reply; 10+ messages in thread From: Russell Johnson @ 2023-01-11 20:33 UTC (permalink / raw) To: xenomai; +Cc: Bryan Butler [-- Attachment #1.1: Type: text/plain, Size: 1519 bytes --] I went ahead and put together a very simple test application that proves what I am seeing when it comes to the EVL heap performance being substantially slower than the Linux STL heap. In the app, there are 2 pthreads that are attached to EVL and started one after the other. Each thread creates/destroys 100k std::strings (which use new/delete behind the scenes). The total thread time is calculated and printed to the console before the app shuts down. If the EVL heap is enabled, the global new/delete is overridden to use the EVL Heap API. Scenario 1 is an EVL application using the STL heap. Build with the following command: "g++ -Wall -g -std=c++11 -o test test.cpp -I/opt/evl/include -L/opt/evl/lib -levl -lpthread". When this app is run on my x86 system, I can see that the average time for the 2 threads to complete is about 0.01 seconds. Scenario 2 is an EVL application using the EVL heap. Build with the following command: "g++ -Wall -g -std=c++11 -o test test.cpp -I/opt/evl/include -L/opt/evl/lib -levl -lpthread -D EVL_HEAP". When this app is run on my x86 system, I can see that the average time for the 2 threads to complete is about 0.8 seconds. This is a very simple example, but even here we can see that there is a significant slowdown using the EVL heap. That is only magnified when running our much more complex application. Is this expected behavior out of the EVL heap? If so, is using multiple EVL heaps the recommendation? If not, where do we think the problem lies?
Thanks, Russell

[-- Attachment #1.2: test.cpp --]
[-- Type: application/octet-stream, Size: 4923 bytes --]

#include <evl/evl.h>
#include <evl/thread.h>
#include <evl/clock.h>
#include <evl/heap.h>
#include <thread>
#include <string>
#include <new>
#include <cstdio>
#include <pthread.h>
#include <sched.h>
#include <unistd.h>
#include <system_error>

static char heap_storage[EVL_HEAP_RAW_SIZE(1024 * 1024)]; /* 1 MiB heap */
static struct evl_heap runtime_heap;

#if defined(EVL_HEAP)
void* operator new(std::size_t n)
{
    void* mem = evl_alloc_block(&runtime_heap, n);
    if (mem == nullptr) {
        throw std::bad_alloc();
    }
    return mem;
}

void* operator new(std::size_t n, const std::nothrow_t& nothrow_value) noexcept
{
    return evl_alloc_block(&runtime_heap, n);
}

void operator delete(void* p) noexcept
{
    if (p == nullptr) {
        return;
    }
    evl_free_block(&runtime_heap, p);
}

void* operator new[](std::size_t n)
{
    void* mem = evl_alloc_block(&runtime_heap, n);
    if (mem == nullptr) {
        throw std::bad_alloc();
    }
    return mem;
}

void* operator new[](std::size_t n, const std::nothrow_t& nothrow_value) noexcept
{
    return evl_alloc_block(&runtime_heap, n);
}

void operator delete[](void *p) noexcept
{
    if (p == nullptr) {
        return;
    }
    evl_free_block(&runtime_heap, p);
}
#endif

namespace {
const size_t NUM_ALLOCS = 100000;
}

double tdiff(const struct timespec& ta, const struct timespec& tb)
{
    double sdiff;
    double nsdiff;

    if (ta.tv_nsec >= tb.tv_nsec) {
        nsdiff = double(ta.tv_nsec - tb.tv_nsec) / 1e9;
        sdiff = double(ta.tv_sec - tb.tv_sec);
    } else {
        // Borrow required.
        nsdiff = double(ta.tv_nsec + 1000000000 - tb.tv_nsec) / 1e9;
        sdiff = double((ta.tv_sec - 1) - tb.tv_sec);
    }

    return (sdiff + nsdiff);
}

void* Thread1(void*)
{
    pthread_setname_np(pthread_self(), "Thread1");
    evl_attach_thread(EVL_CLONE_OBSERVABLE | EVL_CLONE_NONBLOCK, "Thread1");
    evl_printf("Thread 1 woken up\n");

    // Get start time
    struct timespec tstart;
    evl_read_clock(EVL_CLOCK_MONOTONIC, &tstart);

    // Allocate
    for (size_t i = 0; i < NUM_ALLOCS; i++) {
        std::string msg = "This is a test string";
    }

    // Get end time
    struct timespec tend;
    evl_read_clock(EVL_CLOCK_MONOTONIC, &tend);

    // Calculate total time
    evl_printf("Thread 1 Total Time: %f\n", tdiff(tend, tstart));
    return nullptr;
}

void* Thread2(void*)
{
    pthread_setname_np(pthread_self(), "Thread2");
    evl_attach_thread(EVL_CLONE_OBSERVABLE | EVL_CLONE_NONBLOCK, "Thread2");
    evl_printf("Thread 2 woken up\n");

    // Get start time
    struct timespec tstart;
    evl_read_clock(EVL_CLOCK_MONOTONIC, &tstart);

    // Allocate
    for (size_t i = 0; i < NUM_ALLOCS; i++) {
        std::string msg = "This is a test string";
    }

    // Get end time
    struct timespec tend;
    evl_read_clock(EVL_CLOCK_MONOTONIC, &tend);

    // Calculate total time
    evl_printf("Thread 2 Total Time: %f\n", tdiff(tend, tstart));
    return nullptr;
}

int main(int argc, char *argv[])
{
#if defined(EVL_HEAP)
    printf("Using EVL Heap\n");
#else
    printf("Using STL Heap\n");
#endif

    // Init EVL
    int ret = evl_init();
    if (ret) {
        printf("EVL Init failed with error: %d\n", ret);
        return -1;
    }

    ret = evl_init_heap(&runtime_heap, heap_storage, sizeof heap_storage);
    if (ret) {
        printf("EVL Heap Init failed with error: %d\n", ret);
        return -1;
    }

    // Thread 1
    pthread_attr_t tattr;
    sched_param param;
    pthread_t tid;
    cpu_set_t tkcpu;
    CPU_ZERO(&tkcpu);
    CPU_SET(1, &tkcpu);
    pthread_attr_init(&tattr);
    pthread_attr_getschedparam(&tattr, &param);
    pthread_attr_setstacksize(&tattr, 1024 * 1024);
    pthread_attr_setaffinity_np(&tattr, sizeof(cpu_set_t), &tkcpu);
    pthread_attr_setschedpolicy(&tattr, SCHED_FIFO);
    param.sched_priority = 83;
    pthread_attr_setschedparam(&tattr, &param);
    pthread_attr_setinheritsched(&tattr, PTHREAD_EXPLICIT_SCHED);
    pthread_create(&tid, &tattr, Thread1, NULL);

    // Thread 2
    pthread_attr_t tattr2;
    sched_param param2;
    pthread_t tid2;
    cpu_set_t tkcpu2;
    CPU_ZERO(&tkcpu2);
    CPU_SET(2, &tkcpu2);
    pthread_attr_init(&tattr2);
    pthread_attr_getschedparam(&tattr2, &param2);
    pthread_attr_setstacksize(&tattr2, 1024 * 1024);
    pthread_attr_setaffinity_np(&tattr2, sizeof(cpu_set_t), &tkcpu2);
    pthread_attr_setschedpolicy(&tattr2, SCHED_FIFO);
    param2.sched_priority = 82;
    pthread_attr_setschedparam(&tattr2, &param2);
    pthread_attr_setinheritsched(&tattr2, PTHREAD_EXPLICIT_SCHED);
    pthread_create(&tid2, &tattr2, Thread2, NULL);

    sleep(5); // sleep for a bit
    pthread_join(tid, NULL);
    pthread_join(tid2, NULL);

    return 0;
}

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 6759 bytes --]

^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Conflicting EVL Processing Loops 2023-01-11 20:33 ` Russell Johnson @ 2023-01-12 17:23 ` Philippe Gerum 2023-02-02 17:58 ` [External] - " Bryan Butler 2023-02-02 21:08 ` Russell Johnson 0 siblings, 2 replies; 10+ messages in thread From: Philippe Gerum @ 2023-01-12 17:23 UTC (permalink / raw) To: Russell Johnson; +Cc: xenomai, Bryan Butler Russell Johnson <russell.johnson@kratosdefense.com> writes: > [[S/MIME Signed Part:Undecided]] > I went ahead and put together a very simple test appllication that proves > what I am seeing when it comes to the EVL heap performance being > substantially slower than the Linux STL Heap. In the app, there are 2 > pthreads that are attached to EVL and started one after the other. Each > thread creates/destroys 100k std::strings (which use new/delete behind the > scenes). The total thread time is calcluated and printed to the console > before the app shutsdown. If enabling the EVL heap, the global new/delete is > overridden to use the EVL Heap API. > > Scenario 1 is an EVL application using the STL Heap. Build with the > following command: " g++ -Wall -g -std=c++11 -o test test.cpp > -I/opt/evl/include -L/opt/evl/lib -levl -lpthread". When this app is run on > my x86 system, I can see that the average time for the 2 threads to complete > is about 0.01 seconds. > > Scenario 2 is an EVL application using the EVL Heap. Build with the > following command: " g++ -Wall -g -std=c++11 -o test test.cpp > -I/opt/evl/include -L/opt/evl/lib -levl -lpthread -D EVL_HEAP". When this > app is run on my x86 system, I can see that the average time for the 2 > threads to complete is about 0.8 seconds. > > This is a very simple example, but even here we can see that there is a > significant slow down using the EVL heap. That is only magnified when > running our much more complex application. > > Is this expected behavior out of the EVL heap? If so, is using multiple EVL > heaps the recommendation? If not, where do we think the problem lies? 
> > Thanks, > > Russell > > [2. application/octet-stream; test.cpp]... > > [[End of S/MIME Signed Part]]

That is fun stuff, sort of. It looks like the difference in the performance numbers between the EVL heap (which is a clone of the Xenomai 3 allocator) and malloc/free boils down to the latter implementing "fast bins". A fast bin links recently freed small chunks so that the next allocation can find and extract them very quickly should they satisfy the request, without going through the whole allocation dance.

- The test scenario favors using the fast bins every time, since it allocates then frees the very same object at each iteration.
- Fast bins do not require serialization via mutex; only a CAS operation is needed to pull a recycled chunk from there.
- The test scenario runs the very same code loops on separate CPUs in parallel, making conflicting accesses very likely.

With fast bins, a conflict goes unnoticed, since we only need one CAS operation to push/pull a block on free/alloc operations, without jumping to the kernel. Without fast bins, we always go through the longish allocation path, leading to contention on the mutex guarding the heap when both threads conflict, in which case the code must issue a bunch of system calls, which explains the slowdown. This behavior may be quite random. For instance, this is a slow run using the EVL heap captured on an imx6q mira board.
root@homelab-phytec-mira:~# ./evl-heap
Using EVL Heap
Thread 1 woken up
Thread 2 woken up
Thread 1 Total Time: 0.789410
Thread 2 Total Time: 0.809079

And then, the very next run a couple of secs later with no change gave this:

root@homelab-phytec-mira:~# ./evl-heap
Using EVL Heap
Thread 1 woken up
Thread 1 Total Time: 0.126860
Thread 2 woken up
Thread 2 Total Time: 0.125764

A slight shift in the timings which would cause the threads to avoid conflicts explains the better results above; in this case we did not have any mutex-related syscall showing up, because we could use the fast locking which libevl provides (also CAS-based) instead of jumping to the kernel. e.g.:

CPU   PID    SCHED  PRIO  ISW  CTXSW  SYS  RWA  STAT  TIMEOUT  %CPU  CPUTIME    WCHAN  NAME
  1   11428  fifo     83    1      1    3    0  Xo    -         0.0  0:126.945  -      Thread1
  1   11431  fifo     82    1      1    3    0  Xo    -         0.0  0:125.605  -      Thread2

Likewise, the ISW field remained steady with the malloc-based test, confirming that no futex syscall had to be issued by malloc/free in the absence of any access conflict (thanks to fast bins). Conversely, the first run with the EVL heap had the CTXSW, SYS and RWA figures skyrocket (> 30k), because the test endured many sleep-then-wakeup sequences as it had to grab the mutex the slow way.

What could you do to solve this quickly? A private heap like you mentioned would make sense, using the _unlocked API of the EVL heap. No lock, no problem. Now, this allocation pattern is common enough to think about having some kind of fast-bin scheme in the EVL heap implementation as well, avoiding sleeping locks as much as possible. -- Philippe. ^ permalink raw reply [flat|nested] 10+ messages in thread
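The fast-bin mechanism Philippe describes can be sketched in portable C++. This is a simplified, single-size-class illustration of the concept, not glibc's or EVL's actual code; a production version would also keep one bin per size class and guard against the ABA problem (e.g. with version-tagged pointers):

```cpp
#include <atomic>
#include <cstdlib>

// Lock-free LIFO of recently freed chunks of one size class. Both the
// free path (push) and the allocation path (pop) need only a single
// CAS, so two threads recycling blocks concurrently never block on a
// mutex or enter the kernel.
struct FastBin {
    struct Node { Node *next; };
    std::atomic<Node *> head{nullptr};

    // free(): push the chunk onto the bin with one CAS loop.
    void push(void *chunk) {
        Node *n = static_cast<Node *>(chunk);
        Node *old = head.load(std::memory_order_relaxed);
        do {
            n->next = old;
        } while (!head.compare_exchange_weak(old, n,
                        std::memory_order_release,
                        std::memory_order_relaxed));
    }

    // malloc(): try to pop a recycled chunk; nullptr means "bin empty,
    // fall back to the slow (mutex-guarded) allocation path".
    void *pop() {
        Node *old = head.load(std::memory_order_acquire);
        while (old != nullptr &&
               !head.compare_exchange_weak(old, old->next,
                        std::memory_order_acquire,
                        std::memory_order_acquire))
            ;
        return old;
    }
};
```

In a real allocator, free() would push a small chunk here and malloc() would try pop() before taking the heap mutex, which is why the contended case measured above disappears when fast bins are in play.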
* RE: [External] - Re: Conflicting EVL Processing Loops 2023-01-12 17:23 ` Philippe Gerum @ 2023-02-02 17:58 ` Bryan Butler 2023-02-02 21:08 ` Russell Johnson 1 sibling, 0 replies; 10+ messages in thread From: Bryan Butler @ 2023-02-02 17:58 UTC (permalink / raw) To: Philippe Gerum, Russell Johnson; +Cc: xenomai Philippe, An update and question for you. First, as you know, we found that the built-in heap management in Xenomai 4/EVL was causing us substantial performance problems, due to the need to perform locking on all memory allocations. The nolock API is not an option for us since we perform memory operations in most of our threads. We also tried the TLSF manager, but saw similar performance issues. The good news is that we've adapted the mimalloc memory management library, which I believe implements something like the fast bins you mention in your earlier email. The performance of mimalloc looks to be very good, and we are able to get our dual processing loop system running within our real-time constraints. The current implementation is still a bit "hackish", and we're continuing to test it and clean it up. I am hoping to get you the specifics about what we had to do to implement it, since I think it could be a good option for other X4/EVL users. At a high level, it is essentially a "go-between" with the Xenomai heap management at the bottom layer, replacing the low-level sbrk() used to get memory in a Linux run-time environment. One nagging problem is that we're still plagued by occasional page faults. We have tried to prefault everything we can think of, but we're obviously missing something. I go through each accessible section in the /proc/self/maps file, prefaulting each one (this includes all code and data segments). I have also added a hack to the kernel so that when the "switched inband (fault)" occurs, the faulting address is displayed in dmesg. 
So far, all of the runtime page faults we see are in the heap section, which I have attempted to prefault completely, even doing it multiple times during startup, since the heap section seems to be growing as we start up our real time threads. So, I'm looking at one of 2 possibilities: 1. My prefaulting code, which touches one memory location in each page, is not actually doing what it is supposed to. I've declared variables in the prefaulting function to be "volatile" so that they don't get optimized out. But I don't know any way to really verify that the pages are being mapped in and locked. 2. A kernel bug, where the pages are not, in fact, being locked into memory. We're calling "mlockall(MCL_CURRENT | MCL_FUTURE)", so, even if the heap is growing, I don't understand why any future pages are not being populated and locked into memory at the very beginning. And, the kernel should not be unmapping any of our pages, but perhaps it is? I know this isn't likely a problem with the EVL code, but we're just about out of ideas for how to find and kill this problem. I'm not much of a kernel expert. If you have any ideas for how to isolate this problem, especially if there's a way to verify whether our process pages are really locked or not, they would be greatly appreciated. ^ permalink raw reply [flat|nested] 10+ messages in thread
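The prefaulting approach described above (touching one location in every page listed in /proc/self/maps) might look something like this sketch. It is Linux-specific and illustrative only, not the poster's actual code; it restricts itself to private writable mappings and writes back the byte it read, since for anonymous memory a plain read only maps the shared zero page and leaves a copy-on-write fault pending for the first real store:

```cpp
#include <fstream>
#include <string>
#include <cstdio>
#include <unistd.h>

// Walk /proc/self/maps and touch one byte per page of each private
// writable mapping. Returns the number of pages touched. The volatile
// access keeps the compiler from eliding the read/write-back.
long prefault_writable_mappings()
{
    const long page = sysconf(_SC_PAGESIZE);
    long touched = 0;
    std::ifstream maps("/proc/self/maps");
    std::string line;

    while (std::getline(maps, line)) {
        unsigned long lo = 0, hi = 0;
        char perms[8] = {0};
        if (std::sscanf(line.c_str(), "%lx-%lx %7s", &lo, &hi, perms) != 3)
            continue;
        // Only private writable mappings (e.g. "rw-p").
        if (perms[1] != 'w' || perms[3] != 'p')
            continue;
        // Leave kernel-managed special regions alone.
        if (line.find("[vvar]") != std::string::npos ||
            line.find("[vsyscall]") != std::string::npos)
            continue;
        for (unsigned long p = lo; p < hi; p += page) {
            volatile char *c = reinterpret_cast<volatile char *>(p);
            *c = *c; // read the byte and store it back unchanged
            ++touched;
        }
    }
    return touched;
}
```

Note that this only forces the pages in at the moment it runs; it cannot by itself prevent a later migration or reclaim from invalidating the page table entries again.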
* RE: [External] - Re: Conflicting EVL Processing Loops 2023-01-12 17:23 ` Philippe Gerum 2023-02-02 17:58 ` [External] - " Bryan Butler @ 2023-02-02 21:08 ` Russell Johnson 2023-02-05 17:29 ` Philippe Gerum 1 sibling, 1 reply; 10+ messages in thread From: Russell Johnson @ 2023-02-02 21:08 UTC (permalink / raw) To: Philippe Gerum; +Cc: xenomai, Bryan Butler [-- Attachment #1: Type: text/plain, Size: 2876 bytes --] Philippe, An update and question for you. First, as you know, we found that the built-in heap management in Xenomai 4/EVL was causing us substantial performance problems, due to the need to perform locking on all memory allocations. The nolock API is not an option for us since we perform memory operations in most of our threads. We also tried the TLSF manager, but saw similar performance issues. The good news is that we've adapted the mimalloc memory management library, which I believe implements something like the fast bins you mention in your earlier email. The performance of mimalloc looks to be very good, and we are able to get our dual processing loop system running within our real-time constraints. The current implementation is still a bit "hackish", and we're continuing to test it and clean it up. I am hoping to get you the specifics about what we had to do to implement it, since I think it could be a good option for other X4/EVL users. At a high level, it is essentially a "go-between" with the Xenomai heap management at the bottom layer, replacing the low-level sbrk() used to get memory in a Linux run-time environment. One nagging problem is that we're still plagued by occasional page faults. We have tried to prefault everything we can think of, but we're obviously missing something. I go through each accessible section in the /proc/self/maps file, prefaulting each one (this includes all code and data segments). I have also added a hack to the kernel so that when the "switched inband (fault)" occurs, the faulting address is displayed in dmesg. 
So far, all of the runtime page faults we see are in the heap section, which I have attempted to prefault completely, even doing it multiple times during startup, since the heap section seems to be growing as we start up our real time threads. So, I'm looking at one of 2 possibilities: 1. My prefaulting code, which touches one memory location in each page, is not actually doing what it is supposed to. I've declared variables in the prefaulting function to be "volatile" so that they don't get optimized out. But I don't know any way to really verify that the pages are being mapped in and locked. 2. A kernel bug, where the pages are not, in fact, being locked into memory. We're calling "mlockall(MCL_CURRENT | MCL_FUTURE)", so, even if the heap is growing, I don't understand why any future pages are not being populated and locked into memory at the very beginning. And, the kernel should not be unmapping any of our pages, but perhaps it is? I know this isn't likely a problem with the EVL code, but we're just about out of ideas for how to find and kill this problem. I'm not much of a kernel expert. If you have any ideas for how to isolate this problem, especially if there's a way to verify whether our process pages are really locked or not, they would be greatly appreciated. [-- Attachment #2: smime.p7s --] [-- Type: application/pkcs7-signature, Size: 6759 bytes --] ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [External] - Re: Conflicting EVL Processing Loops 2023-02-02 21:08 ` Russell Johnson @ 2023-02-05 17:29 ` Philippe Gerum 0 siblings, 0 replies; 10+ messages in thread From: Philippe Gerum @ 2023-02-05 17:29 UTC (permalink / raw) To: Russell Johnson; +Cc: xenomai, Bryan Butler Russell Johnson <russell.johnson@kratosdefense.com> writes: > [[S/MIME Signed Part:Undecided]] > Philippe, > > An update and question for you. > > First, as you know, we found that the built-in heap management in Xenomai > 4/EVL was causing us substantial performance problems, due to the need to > perform locking on all memory allocations. The nolock API is not an option > for us since we perform memory operations in most of our threads. We also > tried the TLSF manager, but saw similar performance issues. > > The good news is that we've adapted the mimalloc memory management library, > which I believe implements something like the fast bins you mention in your > earlier email. The performance of mimalloc looks to be very good, and we are > able to get our dual processing loop system running within our real-time > constraints. The current implementation is still a bit "hackish", and we're > continuing to test it and clean it up. I am hoping to get you the specifics > about what we had to do to implement it, since I think it could be a good > option for other X4/EVL users. At a high level, it is essentially a > "go-between" with the Xenomai heap management at the bottom layer, replacing > the low-level sbrk() used to get memory in a Linux run-time environment. > > One nagging problem is that we're still plagued by occasional page faults. > We have tried to prefault everything we can think of, but we're obviously > missing something. I go through each accessible section in the > /proc/self/maps file, prefaulting each one (this includes all code and data > segments). 
I have also added a hack to the kernel so that when the "switched > inband (fault)" occurs, the faulting address is displayed in dmesg. So far, > all of the runtime page faults we see are in the heap section, which I have > attempted to prefault completely, even doing it multiple times during > startup, since the heap section seems to be growing as we start up our real > time threads. > > So, I'm looking at one of 2 possibilities: > 1. My prefaulting code, which touches one memory location in each page, is > not actually doing what it is supposed to. I've declared variables in the > prefaulting function to be "volatile" so that they don't get optimized out. > But I don't know any way to really verify that the pages are being mapped in > and locked. > 2. A kernel bug, where the pages are not, in fact, being locked into memory. > We're calling "mlockall(MCL_CURRENT | MCL_FUTURE)", so, even if the heap is > growing, I don't understand why any future pages are not being populated and > locked into memory at the very beginning. And, the kernel should not be > unmapping any of our pages, but perhaps it is? > > I know this isn't likely a problem with the EVL code, but we're just about > out of ideas for how to find and kill this problem. I'm not much of a kernel > expert. If you have any ideas for how to isolate this problem, especially if > there's a way to verify whether our process pages are really locked or not, > they would be greatly appreciated. > > [[End of S/MIME Signed Part]] Mlocked pages may be migrated (Documentation/vm/unevictable-lru.txt gives details about this), in which case such pages would not be immune from minor faults when accessed anew after migration, which may be the events the core detects. NUMA support and transparent huge pages are the usual suspects in this case (CONFIG_NUMA, CONFIG_TRANSPARENT_HUGEPAGE), and/or memory compaction (CONFIG_COMPACTION). 
If turning these off is not a suitable option, you could try to fiddle with the related runtime settings to see what helps, e.g.:

sysctl -w kernel.numa_balancing=0
echo 0 > /proc/sys/vm/compact_unevictable_allowed

Bottom line is that you would need to stop vmscan from invalidating the page table entries of the mlocked pages (at the expense of less flexibility in memory management, but that may not be the most important issue at hand in this case). -- Philippe. ^ permalink raw reply [flat|nested] 10+ messages in thread
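Regarding the earlier question of how to verify that pages are really mapped in: on Linux, mincore(2) gives a direct answer about residency, independently of what mlockall() promised. A minimal sketch, generic Linux and nothing EVL-specific (the demo function and its names are illustrative, not from the thread):

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cstring>
#include <vector>

// mincore(2) fills one status byte per page of a range; bit 0 is set
// when the page is resident in RAM. Return true if every page of
// [addr, addr+len) is resident. addr must be page-aligned.
bool all_resident(void *addr, size_t len)
{
    const long page = sysconf(_SC_PAGESIZE);
    const size_t npages = (len + page - 1) / page;
    std::vector<unsigned char> vec(npages);

    if (mincore(addr, len, vec.data()) != 0)
        return false; // e.g. the range is not mapped
    for (unsigned char b : vec)
        if (!(b & 1))
            return false;
    return true;
}

// Demo: map four anonymous pages, fault them in by writing, then check.
// mlock() may fail without CAP_IPC_LOCK; residency still holds here
// because the memset already populated every page.
bool demo_region_resident()
{
    const long page = sysconf(_SC_PAGESIZE);
    const size_t len = 4 * static_cast<size_t>(page);
    void *p = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return false;
    std::memset(p, 0xA5, len); // touch every page
    mlock(p, len);             // then try to pin them
    const bool ok = all_resident(p, len);
    munmap(p, len);
    return ok;
}
```

Running the residency check periodically over the heap region would show whether pages that were prefaulted at startup later lose residency, which is exactly the migration scenario described above.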
* Conflicting EVL Processing Loops @ 2023-01-04 20:08 Russell Johnson 0 siblings, 0 replies; 10+ messages in thread From: Russell Johnson @ 2023-01-04 20:08 UTC (permalink / raw) To: xenomai; +Cc: Bryan Butler [-- Attachment #1.1: Type: text/plain, Size: 725 bytes --] Hello, We have two independent processing loops, each consisting of their own set of EVL threads and interrupts. Each loop completes its processing and then performs an evl_sleep_until to delay until the next processing deadline occurs. If we run either loop by itself, everything is fine, and our timing margins are met. However, if we try to run both simultaneously, the timing error is increased significantly, and the loops never meet their processing deadlines. If we compile the code for Linux (substituting all EVL primitives with Linux equivalents), then we are able to run both loops simultaneously without issue. Any clue what could be causing us troubles or where to start looking? Thanks, Russell [-- Attachment #1.2: Type: text/html, Size: 2380 bytes --] [-- Attachment #2: smime.p7s --] [-- Type: application/pkcs7-signature, Size: 6759 bytes --] ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2023-02-05 17:50 UTC | newest] Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2023-01-04 22:28 Conflicting EVL Processing Loops Russell Johnson 2023-01-05 7:49 ` Philippe Gerum 2023-01-11 15:57 ` Russell Johnson 2023-01-11 16:44 ` Russell Johnson 2023-01-11 20:33 ` Russell Johnson 2023-01-12 17:23 ` Philippe Gerum 2023-02-02 17:58 ` [External] - " Bryan Butler 2023-02-02 21:08 ` Russell Johnson 2023-02-05 17:29 ` Philippe Gerum -- strict thread matches above, loose matches on Subject: below -- 2023-01-04 20:08 Russell Johnson
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).