From: Russell Johnson <russell.johnson@kratosdefense.com>
To: "xenomai@lists.linux.dev" <xenomai@lists.linux.dev>,
	Philippe Gerum <rpm@xenomai.org>
Cc: Bryan Butler <Bryan.Butler@kratosdefense.com>,
	Shawn McManus <shawn.mcmanus@kratosdefense.com>
Subject: EVL Memory
Date: Wed, 9 Nov 2022 17:04:34 +0000	[thread overview]
Message-ID: <PH1P110MB1050307D7FC58A2B99A40094E23E9@PH1P110MB1050.NAMP110.PROD.OUTLOOK.COM> (raw)
In-Reply-To: <PH1P110MB10508DD9688B7D0E82AB30CDE23E9@PH1P110MB1050.NAMP110.PROD.OUTLOOK.COM>



Hello,

 

We have been running into memory issues with our realtime EVL
application. Recently, a core dump is written immediately on startup, and
when we run under the debugger the call stack makes no sense - it looks
almost random. I have the latest EVL kernel built with all of the EVL debug
options enabled, including EVL_DEBUG_MEM, and I also turned on KASAN. When
this core dump happens I see no kernel output at all except for a couple of
basic lines:

 

EVL: RxMain switching in-band [pid=4327, excpt=14, user_pc=0x0]

RxMain[4327]: segfault at 0 ip 0000000000000000 sp 00007f7ab5305468 error 14 in realtime_sfhm[400000+161000]

Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.

 

Those lines are not very helpful on their own. I know that exception 14 on
x86 is a page fault, and a PC value of 0 certainly does not look good
(memory corruption?).

 

I wanted to get your opinion on our overall memory setup, and to see
whether there are obvious issues or anything you would recommend trying. We
set up one EVL heap at application startup, and this heap is used for all
dynamic allocations throughout the entire app: we overrode the global
new/delete operators to use the EVL heap rather than malloc/free. The
reason we overrode global new/delete is that this app uses a lot of STL
objects as well as third-party libraries, and it would be a very long and
challenging process to go through and modify all of them to use custom
allocators. In main(), we spawn a "Main" EVL thread, and all other threads
are spawned from this parent EVL thread. When the Main EVL thread starts,
it sets a static flag that makes the global new/delete use the EVL heap,
and then every library and process of the realtime app is created and
started. We also prefault the EVL heap right after it is created, to try to
avoid page faults while running the realtime loop. Even so, we still see
occasional page faults in our realtime EVL threads while the main realtime
loop is running. I don't understand how there can be a page fault if the
EVL heap is large enough (verified) and prefaulted.
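
For reference, here is a simplified sketch of what the override looks like.
This is illustrative only - heap_manager, its alloc_block()/free_block()
methods and the s_use_evl flag come from the attached heap_prealloc.txt,
while the header name and the remaining wiring are made up:

#include <cstdlib>
#include <new>

#include "heap_manager.h" // illustrative header name for the attached heap_manager code

// Route dynamic allocations to the EVL heap once the Main EVL thread has set
// heap_manager::s_use_evl; before that, fall back to the regular allocator.
void* operator new(std::size_t bytes)
{
    if (heap_manager::s_use_evl)
        return heap_manager::alloc_block(bytes);   // evl_alloc_block() underneath

    void* p = std::malloc(bytes);
    if (p == nullptr)
        throw std::bad_alloc();
    return p;
}

void operator delete(void* p) noexcept
{
    if (p == nullptr)
        return;

    if (heap_manager::s_use_evl) {
        try {
            heap_manager::free_block(p);           // evl_free_block() underneath
        } catch (...) {
            // free_block() throws on error, but operator delete must not throw
        }
    } else {
        std::free(p);
    }
}

// The operator new[]/delete[] and sized-delete variants are overridden the
// same way (omitted here).

The flag is flipped once by the Main EVL thread at startup, and flipped back
to the Linux heap only if an EVL allocation fails (see alloc_block() in the
attachment).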

 

Now back to the issue mentioned at the beginning of this email - we are not
sure yet whether the page fault reported there is the cause of these core
dumps, a consequence of them, or the primary error itself. Typically, when I
see the occasional page fault in other EVL threads in the app, there is no
core dump - just the log line from the kernel. The main reason I wanted to
run this past you is that I saw no core dumps at all when I disabled the
flag in the global new/delete and just used malloc/free (I did see some
in-band switches, of course, but that was it).

 

The combination of occasional page faults, dynamic memory allocation
through the global new/delete, and now these possible memory-corruption
segfaults is becoming a larger concern for us. We want to make sure we
understand how to use memory properly in an EVL application, and we would
like to know whether there are any recommended ways of tracking these
problems down in an EVL application. I have tried building and running with
the gcc address sanitizer, but EVL threads failed to attach while it was
enabled. I have also tried running under valgrind, but that produced nothing
useful. And, of course, I have run under gdb, where the stack traces are not
helpful. At this point, any guidance, thoughts, and/or recommendations would
be greatly appreciated. I have added some more specific questions below.

 

A few specific questions:

1. Is this a reasonable model for an EVL application, or do you expect the
model to revolve more around static allocation?
2. Are you aware of other EVL users overriding the global new/delete to use
the EVL heap?
3. Do you have any tools for debugging the EVL heap, or have you adapted any
existing tools (such as valgrind) to do so?
4. Is there any known way of protecting against a stack overflow?
5. A PC value of 0 is never valid, and we have no evidence of an
uninitialized pointer in our C++ code. Is there any way to use information
from EVL to help track this issue down?
6. We are allocating 2 GB of EVL heap memory and pre-faulting all of it on
startup. We also use pthread_attr_setstacksize to pre-allocate the stack for
each EVL thread we have. EVL still reports rare page faults. How is this
possible? Are we missing something? (Our heap pre-faulting logic is
attached; see also the thread-creation sketch after this list.)
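
To make question 6 concrete, here is a minimal sketch of the way one of our
EVL threads is created with a preset stack size. This is illustrative only -
thread_main, spawn_rt_thread, STACK_BYTES and the thread name are made-up
names, and the only libevl call assumed here is evl_attach_self():

#include <cstddef>
#include <pthread.h>
#include <unistd.h>

#include <evl/thread.h>

// Example per-thread stack size; the real value is tuned per thread.
static const size_t STACK_BYTES = 8 * 1024 * 1024;

static void* thread_main(void* arg)
{
    // Attach the calling thread to the EVL core (scheduling policy setup omitted).
    int efd = evl_attach_self("rx-worker:%d", getpid());
    if (efd < 0)
        return nullptr;

    // ... realtime work loop ...
    return nullptr;
}

static int spawn_rt_thread(pthread_t* tid)
{
    pthread_attr_t attr;
    pthread_attr_init(&attr);

    // Pre-allocate the stack for this EVL thread, as described in question 6.
    pthread_attr_setstacksize(&attr, STACK_BYTES);

    int ret = pthread_create(tid, &attr, thread_main, nullptr);
    pthread_attr_destroy(&attr);
    return ret;
}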

 

 

Thanks,

 

Russell



[-- Attachment #1.2: heap_prealloc.txt --]
[-- Type: text/plain, Size: 1362 bytes --]

// Static variables
struct evl_heap heap_manager::s_runtime_heap;
bool heap_manager::s_use_evl = false; // Flag used in global new/delete to switch between Linux and EVL heaps
size_t heap_manager::s_init_bytes = 2147483648; // 2GB

///////////////////////////////////////////////////////////////////////////////
void* heap_manager::alloc_block(size_t bytes)
{
    void* data = evl_alloc_block(&s_runtime_heap, bytes);
    if (data == NULL)
    {
        // Switch back to Linux heap to avoid infinite alloc loop
        s_use_evl = false;
        throw(std::runtime_error("Error allocating a block on the EVL heap"));
    }
    return data;
}

///////////////////////////////////////////////////////////////////////////////
void heap_manager::free_block(void* block)
{
    int ret = evl_free_block(&s_runtime_heap, block);
    if (ret < 0)
    {
        sfhm::rtString msg;
        msg.Format("%s:%d: Error freeing a block on the EVL heap", __FILE__, __LINE__);
        // evl_free_block() returns a negative errno value, so negate it for system_error
        throw std::system_error(-ret, std::generic_category(), msg);
    }
}

///////////////////////////////////////////////////////////////////////////////
void heap_manager::prefault()
{
    char *dummy = (char*)alloc_block(s_init_bytes);
    for (size_t i = 0; i < s_init_bytes; i += sysconf(_SC_PAGESIZE))
        dummy[i] = 1;  
    free_block(dummy);
}


