xenomai.lists.linux.dev archive mirror
* Conflicting EVL Processing Loops
@ 2023-01-04 22:28 Russell Johnson
  2023-01-05  7:49 ` Philippe Gerum
  2023-01-11 15:57 ` Russell Johnson
  0 siblings, 2 replies; 9+ messages in thread
From: Russell Johnson @ 2023-01-04 22:28 UTC (permalink / raw)
  To: xenomai; +Cc: Bryan Butler

[-- Attachment #1: Type: text/plain, Size: 713 bytes --]

Hello,

We have two independent processing loops, each consisting of its own set
of EVL threads and interrupts. Each loop completes its processing and then
calls evl_sleep_until() to delay until the next processing deadline. If we
run either loop by itself, everything is fine and our timing margins are
met. However, if we try to run both simultaneously, the timing error
increases significantly and the loops never meet their processing
deadlines. If we compile the code for plain Linux (substituting all EVL
primitives with Linux equivalents), we are able to run both loops
simultaneously without issue. Any clue what could be causing our trouble
or where to start looking?


Thanks,

Russell

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 6759 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Conflicting EVL Processing Loops
  2023-01-04 22:28 Conflicting EVL Processing Loops Russell Johnson
@ 2023-01-05  7:49 ` Philippe Gerum
  2023-01-11 15:57 ` Russell Johnson
  1 sibling, 0 replies; 9+ messages in thread
From: Philippe Gerum @ 2023-01-05  7:49 UTC (permalink / raw)
  To: Russell Johnson; +Cc: xenomai, Bryan Butler


Russell Johnson <russell.johnson@kratosdefense.com> writes:

> [[S/MIME Signed Part:Undecided]]
> Hello,
>
> We have two independent processing loops, each consisting of its own set
> of EVL threads and interrupts. Each loop completes its processing and then
> calls evl_sleep_until() to delay until the next processing deadline. If we
> run either loop by itself, everything is fine and our timing margins are
> met. However, if we try to run both simultaneously, the timing error
> increases significantly and the loops never meet their processing
> deadlines. If we compile the code for plain Linux (substituting all EVL
> primitives with Linux equivalents), we are able to run both loops
> simultaneously without issue. Any clue what could be causing our trouble
> or where to start looking?
>

In the absence of any code to review, the question is too broad to figure
out what might be happening. Quick check though: make sure to disable all
the kernel debug options which may be turned on for your EVL kernel
(PROVE_LOCKING, DEBUG_LIST, KASAN and others). The 'evl check' command may
help with this.

-- 
Philippe.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: Conflicting EVL Processing Loops
  2023-01-04 22:28 Conflicting EVL Processing Loops Russell Johnson
  2023-01-05  7:49 ` Philippe Gerum
@ 2023-01-11 15:57 ` Russell Johnson
  2023-01-11 16:44   ` Russell Johnson
  1 sibling, 1 reply; 9+ messages in thread
From: Russell Johnson @ 2023-01-11 15:57 UTC (permalink / raw)
  To: xenomai; +Cc: Bryan Butler

[-- Attachment #1: Type: text/plain, Size: 920 bytes --]

Hi Philippe,

Digging more into this, it appears that the culprit is the EVL heap. As I
mentioned before, both process loops in our EVL app are independent and run
concurrently. I have overridden the global new/delete to use a single
master EVL heap for any dynamic memory allocation. It would seem that both
process loops are fighting over the use of that heap. I know that
alloc/free are guarded by a mutex, and apparently the two loops are
constantly contending for it, which slows all of our threads down
significantly. I ran a test with the EVL heap disabled - of course there
are a lot of syscall warnings from EVL, but our timing was as we would
expect. So that is definitely the culprit. We may have to look into using
a separate EVL heap for each process loop in the app, unless there is some
other way to improve the heap performance that we are seeing?

Thanks,

Russell

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 6759 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: Conflicting EVL Processing Loops
  2023-01-11 15:57 ` Russell Johnson
@ 2023-01-11 16:44   ` Russell Johnson
  2023-01-11 20:33     ` Russell Johnson
  0 siblings, 1 reply; 9+ messages in thread
From: Russell Johnson @ 2023-01-11 16:44 UTC (permalink / raw)
  To: xenomai; +Cc: Bryan Butler

[-- Attachment #1: Type: text/plain, Size: 296 bytes --]

Also, I would assume the STL heap is implemented in a somewhat similar
way, as there would have to be mutex protection around allocation calls.
So why do we not see any slowdown when using the STL heap versus the EVL
heap? Is there a significant design difference there?

Thanks,

Russell

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 6759 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: Conflicting EVL Processing Loops
  2023-01-11 16:44   ` Russell Johnson
@ 2023-01-11 20:33     ` Russell Johnson
  2023-01-12 17:23       ` Philippe Gerum
  0 siblings, 1 reply; 9+ messages in thread
From: Russell Johnson @ 2023-01-11 20:33 UTC (permalink / raw)
  To: xenomai; +Cc: Bryan Butler


[-- Attachment #1.1: Type: text/plain, Size: 1519 bytes --]

I went ahead and put together a very simple test application that
demonstrates what I am seeing with the EVL heap performance being
substantially slower than the Linux STL heap. In the app, there are two
pthreads that are attached to EVL and started one after the other. Each
thread creates/destroys 100k std::strings (which use new/delete behind the
scenes). The total thread time is calculated and printed to the console
before the app shuts down. If the EVL heap is enabled, the global
new/delete is overridden to use the EVL heap API.

Scenario 1 is an EVL application using the STL heap. Build with the
following command: "g++ -Wall -g -std=c++11 -o test test.cpp
-I/opt/evl/include -L/opt/evl/lib -levl -lpthread". When this app is run on
my x86 system, the average time for the two threads to complete is about
0.01 seconds.

Scenario 2 is an EVL application using the EVL heap. Build with the
following command: "g++ -Wall -g -std=c++11 -o test test.cpp
-I/opt/evl/include -L/opt/evl/lib -levl -lpthread -D EVL_HEAP". When this
app is run on my x86 system, the average time for the two threads to
complete is about 0.8 seconds.

This is a very simple example, but even here we can see that there is a
significant slowdown when using the EVL heap. That is only magnified when
running our much more complex application.

Is this expected behavior from the EVL heap? If so, is using multiple EVL
heaps the recommended approach? If not, where might the problem lie?


Thanks,

Russell


[-- Attachment #1.2: test.cpp --]
[-- Type: application/octet-stream, Size: 4923 bytes --]

#include <evl/evl.h>
#include <evl/thread.h>
#include <evl/clock.h>
#include <evl/heap.h>
#include <pthread.h>
#include <sched.h>
#include <unistd.h>
#include <cstdio>
#include <new>
#include <string>
#include <system_error>
#include <thread>

static char heap_storage[EVL_HEAP_RAW_SIZE(1024 * 1024)]; /* 1 MiB heap */
static struct evl_heap runtime_heap;

#if defined(EVL_HEAP)
void* operator new(std::size_t n)
{
    void* mem = evl_alloc_block(&runtime_heap, n);
    if (mem == nullptr)
    {
        throw std::bad_alloc();
    }
    return mem;
}
void* operator new(std::size_t n, const std::nothrow_t& nothrow_value) noexcept
{
    return evl_alloc_block(&runtime_heap, n);
}
void operator delete(void* p) noexcept
{
    if (p == nullptr)
    {
        return;
    }
    evl_free_block(&runtime_heap, p);
}

void* operator new[](std::size_t n)
{
    void* mem = evl_alloc_block(&runtime_heap, n);
    if (mem == nullptr)
    {
        throw std::bad_alloc();
    }
    return mem;
}
void* operator new[](std::size_t n, const std::nothrow_t& nothrow_value) noexcept
{
    return evl_alloc_block(&runtime_heap, n);
}
void operator delete[](void *p) noexcept
{
    if (p == nullptr)
    {
        return;
    }
    evl_free_block(&runtime_heap, p);
}
#endif

namespace
{
    const size_t NUM_ALLOCS = 100000;
}

double tdiff(const struct timespec& ta, const struct timespec& tb)
{
    double sdiff;
    double nsdiff;
    
    if (ta.tv_nsec >= tb.tv_nsec)
    {
        nsdiff = double(ta.tv_nsec - tb.tv_nsec) / 1e9;
        sdiff = double(ta.tv_sec - tb.tv_sec);
    }
    else
    {
        // Borrow required.
        nsdiff = double(ta.tv_nsec+1000000000 - tb.tv_nsec) / 1e9;
        sdiff = double((ta.tv_sec-1) - tb.tv_sec);
    }
    return(sdiff + nsdiff);
}

void* Thread1(void*)
{
    pthread_setname_np(pthread_self(), "Thread1");

    evl_attach_thread(EVL_CLONE_OBSERVABLE | EVL_CLONE_NONBLOCK, "Thread1");

    evl_printf("Thread 1 woken up\n");

    // Get start time
    struct timespec tstart;
    evl_read_clock(EVL_CLOCK_MONOTONIC, &tstart);

    // Allocate
    for (size_t i = 0; i < NUM_ALLOCS; i++)
    {
        std::string msg = "This is a test string";
    }

    // Get end time
    struct timespec tend;
    evl_read_clock(EVL_CLOCK_MONOTONIC, &tend);

    // Calculate total time
    evl_printf("Thread 1 Total Time: %f\n", tdiff(tend, tstart));

    return nullptr;
}

void* Thread2(void*)
{
    pthread_setname_np(pthread_self(), "Thread2");

    evl_attach_thread(EVL_CLONE_OBSERVABLE | EVL_CLONE_NONBLOCK, "Thread2");

    evl_printf("Thread 2 woken up\n");

    // Get start time
    struct timespec tstart;
    evl_read_clock(EVL_CLOCK_MONOTONIC, &tstart);

    // Allocate
    for (size_t i = 0; i < NUM_ALLOCS; i++)
    {
        std::string msg = "This is a test string";
    }

    // Get end time
    struct timespec tend;
    evl_read_clock(EVL_CLOCK_MONOTONIC, &tend);

    // Calculate total time
    evl_printf("Thread 2 Total Time: %f\n", tdiff(tend, tstart));

    return nullptr;
}

int main(int argc, char *argv[])
{
#if defined(EVL_HEAP)
    printf("Using EVL Heap\n");
#else
    printf("Using STL Heap\n");
#endif

    // Init EVL
    int ret = evl_init();
    if (ret)
    {
        printf("EVL Init failed with error: %d\n", ret);
        return -1;
    }

    ret = evl_init_heap(&runtime_heap, heap_storage, sizeof heap_storage);
    if (ret)
    {
        printf("EVL Heap Init failed with error: %d\n", ret);
        return -1;
    }

    // Thread 1
    pthread_attr_t tattr;
    sched_param param;
    pthread_t tid;
    cpu_set_t tkcpu;
    CPU_ZERO(&tkcpu);
    CPU_SET(1, &tkcpu);
    pthread_attr_init(&tattr);
    pthread_attr_getschedparam(&tattr, &param);
    pthread_attr_setstacksize(&tattr, 1024*1024);
    pthread_attr_setaffinity_np(&tattr, sizeof(cpu_set_t), &tkcpu);
    pthread_attr_setschedpolicy(&tattr, SCHED_FIFO);
    param.sched_priority = 83;
    pthread_attr_setschedparam(&tattr, &param);
    pthread_attr_setinheritsched(&tattr, PTHREAD_EXPLICIT_SCHED);
    pthread_create(&tid, &tattr, Thread1, NULL);

    // Thread 2
    pthread_attr_t tattr2;
    sched_param param2;
    pthread_t tid2;
    cpu_set_t tkcpu2;
    CPU_ZERO(&tkcpu2);
    CPU_SET(2, &tkcpu2);
    pthread_attr_init(&tattr2);
    pthread_attr_getschedparam(&tattr2, &param2);
    pthread_attr_setstacksize(&tattr2, 1024*1024);
    pthread_attr_setaffinity_np(&tattr2, sizeof(cpu_set_t), &tkcpu2);
    pthread_attr_setschedpolicy(&tattr2, SCHED_FIFO);
    param2.sched_priority = 82;
    pthread_attr_setschedparam(&tattr2, &param2);
    pthread_attr_setinheritsched(&tattr2, PTHREAD_EXPLICIT_SCHED);
    pthread_create(&tid2, &tattr2, Thread2, NULL);

    sleep(5); // sleep for a bit

    pthread_join(tid, NULL);
    pthread_join(tid2, NULL);

    return 0;
}

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 6759 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Conflicting EVL Processing Loops
  2023-01-11 20:33     ` Russell Johnson
@ 2023-01-12 17:23       ` Philippe Gerum
  2023-02-02 17:58         ` [External] - " Bryan Butler
  2023-02-02 21:08         ` Russell Johnson
  0 siblings, 2 replies; 9+ messages in thread
From: Philippe Gerum @ 2023-01-12 17:23 UTC (permalink / raw)
  To: Russell Johnson; +Cc: xenomai, Bryan Butler


Russell Johnson <russell.johnson@kratosdefense.com> writes:

> [[S/MIME Signed Part:Undecided]]
> I went ahead and put together a very simple test application that
> demonstrates what I am seeing with the EVL heap performance being
> substantially slower than the Linux STL heap. In the app, there are two
> pthreads that are attached to EVL and started one after the other. Each
> thread creates/destroys 100k std::strings (which use new/delete behind the
> scenes). The total thread time is calculated and printed to the console
> before the app shuts down. If the EVL heap is enabled, the global
> new/delete is overridden to use the EVL heap API.
>
> Scenario 1 is an EVL application using the STL heap. Build with the
> following command: "g++ -Wall -g -std=c++11 -o test test.cpp
> -I/opt/evl/include -L/opt/evl/lib -levl -lpthread". When this app is run on
> my x86 system, the average time for the two threads to complete is about
> 0.01 seconds.
>
> Scenario 2 is an EVL application using the EVL heap. Build with the
> following command: "g++ -Wall -g -std=c++11 -o test test.cpp
> -I/opt/evl/include -L/opt/evl/lib -levl -lpthread -D EVL_HEAP". When this
> app is run on my x86 system, the average time for the two threads to
> complete is about 0.8 seconds.
>
> This is a very simple example, but even here we can see that there is a
> significant slowdown when using the EVL heap. That is only magnified when
> running our much more complex application.
>
> Is this expected behavior from the EVL heap? If so, is using multiple EVL
> heaps the recommended approach? If not, where might the problem lie?
>
>
> Thanks,
>
> Russell
>
> [2. application/octet-stream; test.cpp]...
>
> [[End of S/MIME Signed Part]]

That is fun stuff, sort of. It looks like the difference in the
performance numbers between the EVL heap (which is a clone of the
Xenomai 3 allocator) and malloc/free boils down to the latter
implementing "fast bins". A fast bin links recently freed small chunks
so that the next allocation can find and extract them very quickly
should they satisfy the request, without going through the whole
allocation dance.

- The test scenario favors using the fast bins every time, since it
  allocates then frees the very same object at each iteration.

- Fast bins do not require serialization via a mutex; only a CAS operation
  is needed to pull a recycled chunk from there.

- The test scenario runs the very same code loops on separate CPUs in
  parallel, making conflicting accesses very likely.

With fast bins, a conflict goes unnoticed, since we only need one CAS
operation to push/pull a block on free/alloc operations, without jumping
to the kernel. Without fast bins, we always go through the longer
allocation path, leading to contention on the mutex guarding the heap
when both threads conflict, in which case the code must issue a bunch of
system calls, which explains the slowdown.
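
To give an idea of what such a fast path looks like, here is a minimal
sketch of a fast-bin style free list (an illustration in C++ only, not the
glibc or EVL code; a real allocator also has to deal with ABA hazards,
size classes and arena selection):

#include <atomic>

struct FreeChunk {
    FreeChunk *next;    /* link stored inside the freed block itself */
};

class FastBin {
    std::atomic<FreeChunk *> head{nullptr};
public:
    /* free path: push the chunk with a single CAS, no mutex, no syscall */
    void push(FreeChunk *chunk)
    {
        FreeChunk *old = head.load(std::memory_order_relaxed);
        do {
            chunk->next = old;
        } while (!head.compare_exchange_weak(old, chunk,
                                             std::memory_order_release,
                                             std::memory_order_relaxed));
    }

    /* alloc path: pop a recycled chunk if one is available, again with a
       single CAS; fall back to the slow (locked) allocation path only
       when the bin is empty */
    FreeChunk *pop()
    {
        FreeChunk *old = head.load(std::memory_order_acquire);
        while (old && !head.compare_exchange_weak(old, old->next,
                                                  std::memory_order_acquire,
                                                  std::memory_order_acquire))
            ;
        return old;
    }
};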

This behavior may be quite random. For instance, this is a slow run
using the EVL heap captured on an imx6q mira board.

root@homelab-phytec-mira:~# ./evl-heap 
Using EVL Heap
Thread 1 woken up
Thread 2 woken up
Thread 1 Total Time: 0.789410
Thread 2 Total Time: 0.809079

And then, the very next run a couple of seconds later with no change gave
this:

root@homelab-phytec-mira:~# ./evl-heap 
Using EVL Heap
Thread 1 woken up
Thread 1 Total Time: 0.126860
Thread 2 woken up
Thread 2 Total Time: 0.125764

A slight shift in the timings, which caused the threads to avoid
conflicts, explains the better results above: in this case no
mutex-related syscall showed up, because we could use the fast locking
which libevl provides (also CAS-based) instead of jumping to the
kernel. e.g.:

CPU    PID    SCHED  PRIO  ISW  CTXSW  SYS  RWA  STAT  TIMEOUT  %CPU  CPUTIME    WCHAN  NAME
  1  11428    fifo     83    1      1    3    0  Xo    -         0.0  0:126.945  -      Thread1
  1  11431    fifo     82    1      1    3    0  Xo    -         0.0  0:125.605  -      Thread2

Likewise, the ISW field remained steady with the malloc-based test,
confirming that no futex syscall had to be issued by malloc/free in the
absence of any access conflict (thanks to fast bins).

Conversely, the first run with the EVL heap had the CTXSW, SYS and
RWA figures skyrocket (> 30k), because the test endured many
sleep-then-wakeup sequences as it had to grab the mutex the slow way.

What could you do to solve this quickly? A private heap like you
mentioned would make sense, using the _unlocked API of the EVL heap. No
lock, no problem.
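
For instance, a rough sketch of how the new/delete overrides from your
test could route allocations to a per-loop private heap (heap names and
sizes are made up for the example; this assumes each block is freed by
the same loop that allocated it, and that new/delete are only called once
the calling thread has selected its heap):

#include <evl/heap.h>
#include <cstddef>
#include <new>

/* one private heap per processing loop */
static char loop1_storage[EVL_HEAP_RAW_SIZE(1024 * 1024)];
static char loop2_storage[EVL_HEAP_RAW_SIZE(1024 * 1024)];
static struct evl_heap loop1_heap, loop2_heap;

/* each thread picks the heap of the loop it belongs to at startup,
   e.g. my_heap = &loop1_heap; after evl_init_heap() has run */
static thread_local struct evl_heap *my_heap;

void *operator new(std::size_t n)
{
    void *p = evl_alloc_block(my_heap, n); /* or the _unlocked variant, if
                                              a heap is only ever touched
                                              by a single thread */
    if (p == nullptr)
        throw std::bad_alloc();
    return p;
}

void operator delete(void *p) noexcept
{
    if (p)
        evl_free_block(my_heap, p);
}

Even with the locked calls, the per-loop split removes the cross-loop
contention, since each heap mutex is then only taken by threads belonging
to the same loop.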

Now, this allocation pattern is common enough to think about having some
kind of fast bin scheme in the EVL heap implementation as well, avoiding
sleeping locks as much as possible.

-- 
Philippe.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [External] - Re: Conflicting EVL Processing Loops
  2023-01-12 17:23       ` Philippe Gerum
@ 2023-02-02 17:58         ` Bryan Butler
  2023-02-02 21:08         ` Russell Johnson
  1 sibling, 0 replies; 9+ messages in thread
From: Bryan Butler @ 2023-02-02 17:58 UTC (permalink / raw)
  To: Philippe Gerum, Russell Johnson; +Cc: xenomai

Philippe,

An update and question for you.

First, as you know, we found that the built-in heap management in Xenomai 4/EVL was causing us substantial performance problems, due to the need to perform locking on all memory allocations. The nolock API is not an option for us since we perform memory operations in most of our threads. We also tried the TLSF manager, but saw similar performance issues.

The good news is that we've adapted the mimalloc memory management library, which I believe implements something like the fast bins you mention in your earlier email. The performance of mimalloc looks to be very good, and we are able to get our dual processing loop system running within our real-time constraints. The current implementation is still a bit "hackish", and we're continuing to test it and clean it up. I am hoping to get you the specifics about what we had to do to implement it, since I think it could be a good option for other X4/EVL users. At a high level, it is essentially a "go-between" with the Xenomai heap management at the bottom layer, replacing the low-level sbrk() used to get memory in a Linux run-time environment.

One nagging problem is that we're still plagued by occasional page faults. We have tried to prefault everything we can think of, but we're obviously missing something. I go through each accessible section in the /proc/self/maps file, prefaulting each one (this includes all code and data segments). I have also added a hack to the kernel so that when the "switched inband (fault)" occurs, the faulting address is displayed in dmesg. So far, all of the runtime page faults we see are in the heap section, which I have attempted to prefault completely, even doing it multiple times during startup, since the heap section seems to be growing as we start up our real time threads. 

So, I'm looking at one of 2 possibilities:
1. My prefaulting code, which touches one memory location in each page, is not actually doing what it is supposed to. I've declared variables in the prefaulting function to be "volatile" so that they don't get optimized out. But I don't know any way to really verify that the pages are being mapped in and locked.
2. A kernel bug, where the pages are not, in fact, being locked into memory. We're calling "mlockall(MCL_CURRENT | MCL_FUTURE)", so, even if the heap is growing, I don't understand why any future pages are not being populated and locked into memory at the very beginning. And, the kernel should not be unmapping any of our pages, but perhaps it is?

I know this isn't likely a problem with the EVL code, but we're just about out of ideas for how to find and kill this problem. I'm not much of a kernel expert. If you have any ideas for how to isolate this problem, especially if there's a way to verify whether our process pages are really locked or not, they would be greatly appreciated.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [External] - Re: Conflicting EVL Processing Loops
  2023-01-12 17:23       ` Philippe Gerum
  2023-02-02 17:58         ` [External] - " Bryan Butler
@ 2023-02-02 21:08         ` Russell Johnson
  2023-02-05 17:29           ` Philippe Gerum
  1 sibling, 1 reply; 9+ messages in thread
From: Russell Johnson @ 2023-02-02 21:08 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai, Bryan Butler

[-- Attachment #1: Type: text/plain, Size: 2876 bytes --]

Philippe,

An update and question for you.

First, as you know, we found that the built-in heap management in Xenomai
4/EVL was causing us substantial performance problems, due to the need to
perform locking on all memory allocations. The nolock API is not an option
for us since we perform memory operations in most of our threads. We also
tried the TLSF manager, but saw similar performance issues.

The good news is that we've adapted the mimalloc memory management library,
which I believe implements something like the fast bins you mention in your
earlier email. The performance of mimalloc looks to be very good, and we are
able to get our dual processing loop system running within our real-time
constraints. The current implementation is still a bit "hackish", and we're
continuing to test it and clean it up. I am hoping to get you the specifics
about what we had to do to implement it, since I think it could be a good
option for other X4/EVL users. At a high level, it is essentially a
"go-between" with the Xenomai heap management at the bottom layer, replacing
the low-level sbrk() used to get memory in a Linux run-time environment.

One nagging problem is that we're still plagued by occasional page faults.
We have tried to prefault everything we can think of, but we're obviously
missing something. I go through each accessible section in the
/proc/self/maps file, prefaulting each one (this includes all code and data
segments). I have also added a hack to the kernel so that when the "switched
inband (fault)" occurs, the faulting address is displayed in dmesg. So far,
all of the runtime page faults we see are in the heap section, which I have
attempted to prefault completely, even doing it multiple times during
startup, since the heap section seems to be growing as we start up our
real-time threads.

So, I'm looking at one of two possibilities:
1. My prefaulting code, which touches one memory location in each page, is
not actually doing what it is supposed to (a simplified sketch of this kind
of loop follows below the list). I've declared variables in the prefaulting
function to be "volatile" so that they don't get optimized out. But I don't
know any way to really verify that the pages are being mapped in and
locked.
2. A kernel bug, where the pages are not, in fact, being locked into memory.
We're calling "mlockall(MCL_CURRENT | MCL_FUTURE)", so, even if the heap is
growing, I don't understand why any future pages are not being populated and
locked into memory at the very beginning. And, the kernel should not be
unmapping any of our pages, but perhaps it is?
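
For reference, a simplified sketch of that kind of prefaulting loop
(illustration only, not our exact code) looks like this:

#include <cstdio>
#include <cstring>
#include <unistd.h>

/* Walk /proc/self/maps and read one byte from each page of every
   readable mapping, so the pages are resident before the real-time
   threads start. The volatile access keeps the compiler from
   optimizing the touch away. */
static void prefault_mappings(void)
{
    FILE *maps = fopen("/proc/self/maps", "r");
    if (maps == nullptr)
        return;

    const long page_size = sysconf(_SC_PAGESIZE);
    char line[512];

    while (fgets(line, sizeof line, maps)) {
        unsigned long start, end;
        char perms[5];

        if (sscanf(line, "%lx-%lx %4s", &start, &end, perms) != 3)
            continue;
        if (perms[0] != 'r' || strstr(line, "[vsyscall]"))
            continue;    /* skip inaccessible and special mappings */

        for (unsigned long addr = start; addr < end; addr += page_size) {
            volatile const char *p =
                reinterpret_cast<volatile const char *>(addr);
            (void)*p;    /* touch one location per page */
        }
    }
    fclose(maps);
}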

I know this isn't likely a problem with the EVL code, but we're just about
out of ideas for how to find and kill this problem. I'm not much of a kernel
expert. If you have any ideas for how to isolate this problem, especially if
there's a way to verify whether our process pages are really locked or not,
they would be greatly appreciated.


[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 6759 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [External] - Re: Conflicting EVL Processing Loops
  2023-02-02 21:08         ` Russell Johnson
@ 2023-02-05 17:29           ` Philippe Gerum
  0 siblings, 0 replies; 9+ messages in thread
From: Philippe Gerum @ 2023-02-05 17:29 UTC (permalink / raw)
  To: Russell Johnson; +Cc: xenomai, Bryan Butler


Russell Johnson <russell.johnson@kratosdefense.com> writes:

> [[S/MIME Signed Part:Undecided]]
> Philippe,
>
> An update and question for you.
>
> First, as you know, we found that the built-in heap management in Xenomai
> 4/EVL was causing us substantial performance problems, due to the need to
> perform locking on all memory allocations. The nolock API is not an option
> for us since we perform memory operations in most of our threads. We also
> tried the TLSF manager, but saw similar performance issues.
>
> The good news is that we've adapted the mimalloc memory management library,
> which I believe implements something like the fast bins you mention in your
> earlier email. The performance of mimalloc looks to be very good, and we are
> able to get our dual processing loop system running within our real-time
> constraints. The current implementation is still a bit "hackish", and we're
> continuing to test it and clean it up. I am hoping to get you the specifics
> about what we had to do to implement it, since I think it could be a good
> option for other X4/EVL users. At a high level, it is essentially a
> "go-between" with the Xenomai heap management at the bottom layer, replacing
> the low-level sbrk() used to get memory in a Linux run-time environment.
>
> One nagging problem is that we're still plagued by occasional page faults.
> We have tried to prefault everything we can think of, but we're obviously
> missing something. I go through each accessible section in the
> /proc/self/maps file, prefaulting each one (this includes all code and data
> segments). I have also added a hack to the kernel so that when the "switched
> inband (fault)" occurs, the faulting address is displayed in dmesg. So far,
> all of the runtime page faults we see are in the heap section, which I have
> attempted to prefault completely, even doing it multiple times during
> startup, since the heap section seems to be growing as we start up our real
> time threads. 
>
> So, I'm looking at one of 2 possibilities:
> 1. My prefaulting code, which touches one memory location in each page, is
> not actually doing what it is supposed to. I've declared variables in the
> prefaulting function to be "volatile" so that they don't get optimized out.
> But I don't know any way to really verify that the pages are being mapped in
> and locked.
> 2. A kernel bug, where the pages are not, in fact, being locked into memory.
> We're calling "mlockall(MCL_CURRENT | MCL_FUTURE)", so, even if the heap is
> growing, I don't understand why any future pages are not being populated and
> locked into memory at the very beginning. And, the kernel should not be
> unmapping any of our pages, but perhaps it is?
>
> I know this isn't likely a problem with the EVL code, but we're just about
> out of ideas for how to find and kill this problem. I'm not much of a kernel
> expert. If you have any ideas for how to isolate this problem, especially if
> there's a way to verify whether our process pages are really locked or not,
> they would be greatly appreciated.
>
> [[End of S/MIME Signed Part]]

Mlocked pages may be migrated (Documentation/vm/unevictable-lru.txt
gives details about this), in which case such pages would not be immune
from minor faults when accessed anew after migration, which may be the
events the core detects.

NUMA support and transparent huge pages are the usual suspects in this
case (CONFIG_NUMA, CONFIG_TRANSPARENT_HUGEPAGE), and/or memory
compaction (CONFIG_COMPACTION).

If turning these off is not a suitable option, you could try to fiddle
with the related runtime settings to see what helps, e.g.:

sysctl -w kernel.numa_balancing=0
echo 0 > /proc/sys/vm/compact_unevictable_allowed

Bottom line is that you would need to stop vmscan from invalidating the
page table entries of the mlocked pages (at the expense of less
flexibility in memory management, but that may not be the most important
issue at hand in this case).
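
Regarding the verification question: one quick cross-check is to compare
the VmLck and VmSize figures reported in /proc/self/status; with
mlockall(MCL_CURRENT | MCL_FUTURE) in effect, the two should stay roughly
equal as the process grows. A simple sketch:

#include <cstdio>
#include <cstring>

/* Print VmSize and VmLck from /proc/self/status, e.g. from a periodic
   in-band watchdog thread. */
static void report_locked_memory(void)
{
    FILE *status = fopen("/proc/self/status", "r");
    if (status == nullptr)
        return;

    char line[256];
    while (fgets(line, sizeof line, status)) {
        if (strncmp(line, "VmSize:", 7) == 0 ||
            strncmp(line, "VmLck:", 6) == 0)
            fputs(line, stdout);
    }
    fclose(status);
}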

-- 
Philippe.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2023-02-05 17:50 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-04 22:28 Conflicting EVL Processing Loops Russell Johnson
2023-01-05  7:49 ` Philippe Gerum
2023-01-11 15:57 ` Russell Johnson
2023-01-11 16:44   ` Russell Johnson
2023-01-11 20:33     ` Russell Johnson
2023-01-12 17:23       ` Philippe Gerum
2023-02-02 17:58         ` [External] - " Bryan Butler
2023-02-02 21:08         ` Russell Johnson
2023-02-05 17:29           ` Philippe Gerum
