* [Xenomai] Possible Xenomai fuse filesystem/registry queue status files issue?
@ 2018-04-01 17:28 Steve Freyder
  2018-04-02 13:41 ` Philippe Gerum
  0 siblings, 1 reply; 15+ messages in thread
From: Steve Freyder @ 2018-04-01 17:28 UTC (permalink / raw)
  To: xenomai

Greetings again.

As I understand it, for each rt_queue there's supposed to be a
"status file" located in the fuse filesystem underneath the
"/run/xenomai/user/session/pid/alchemy/queues" directory, with
the file name being the queue name.  This used to contain very
useful info about queue status, message counts, etc.  I don't know
when it broke or whether it's something I'm doing wrong but I'm
now getting a "memory exhausted" message on the console when I
attempt to do a "cat" on the status file.

Here's a small C program that just creates a queue, and then does
a pause to hold the accessor count non-zero.

----------qc.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>
#include <errno.h>
#include <alchemy/queue.h>
#include <alchemy/task.h>

int main(
     int argc,
     char *argv[]
){
     RT_QUEUE q ;
     char *name = argv[1] ;
     int err ;

{
     /* Bind the calling thread to the Xenomai core before using Alchemy calls. */
     static RT_TASK MyShadow ;

     mlockall(MCL_CURRENT|MCL_FUTURE);
     err = rt_task_shadow(&MyShadow,NULL,1,0) ;
     if (err < 0)  {
         fprintf(stderr,"shadow: %s\n",strerror(-err)) ;
         exit(-err) ;
     }
}
     /* 1 MiB message pool, up to 1024 pending messages, FIFO ordering. */
     err = rt_queue_create(&q,name,1024*1024,1024,Q_FIFO) ;
     if (err < 0)  {
         fprintf(stderr,"rtqc %s: %s\n",name,strerror(-err)) ;
         return(1) ;
     }
     /* Sleep until a signal arrives, keeping the queue's accessor count non-zero;
        tolerate a couple of EINTR wakeups before giving up. */
     for (int nerrno = 2 ; pause() < 0 && errno == EINTR && --nerrno > 0 ;) ;
     return(0) ;
}
----------qc.c end

A shell script to conduct the test:

----------qtest.sh
#!/bin/sh

set -x
./qc --mem-pool-size=64M --session=mysession foo &
sleep 1
find /run/xenomai
qfile=/run/xenomai/*/*/*/alchemy/queues/foo
cat $qfile
----------qtest.sh end

The resulting output (logged in via the system console):

# sh qtest.sh
+ sleep 1
+ ./qc --mem-pool-size=64M --session=mysession foo
+ find /run/xenomai
/run/xenomai
/run/xenomai/root
/run/xenomai/root/mysession
/run/xenomai/root/mysession/821
/run/xenomai/root/mysession/821/alchemy
/run/xenomai/root/mysession/821/alchemy/tasks
/run/xenomai/root/mysession/821/alchemy/tasks/task@1[821]
/run/xenomai/root/mysession/821/alchemy/queues
/run/xenomai/root/mysession/821/alchemy/queues/foo
/run/xenomai/root/mysession/system
/run/xenomai/root/mysession/system/threads
/run/xenomai/root/mysession/system/heaps
/run/xenomai/root/mysession/system/version
+ qfile='/run/xenomai/*/*/*/alchemy/queues/foo'
+ cat /run/xenomai/root/mysession/821/alchemy/queues/foo
memory exhausted

At this point, it hangs, although SIGINT usually terminates it.

I've seen some cases where SIGINT won't terminate it, and a reboot is
required to clean things up.  I see this message appears to be logged
in the obstack error handler.  I don't think I'm running out of memory,
which makes me think "heap corruption".  Not much of an analysis!  I did
try varying queue sizes and max message counts - no change.

Thanks in advance,
Best regards,
Steve




* Re: [Xenomai] Possible Xenomai fuse filesystem/registry queue status files issue?
  2018-04-01 17:28 [Xenomai] Possible Xenomai fuse filesystem/registry queue status files issue? Steve Freyder
@ 2018-04-02 13:41 ` Philippe Gerum
  2018-04-02 14:54   ` Steve Freyder
  0 siblings, 1 reply; 15+ messages in thread
From: Philippe Gerum @ 2018-04-02 13:41 UTC (permalink / raw)
  To: Steve Freyder, xenomai

On 04/01/2018 07:28 PM, Steve Freyder wrote:
> Greetings again.
> 
> As I understand it, for each rt_queue there's supposed to be a
> "status file" located in the fuse filesystem underneath the
> "/run/xenomai/user/session/pid/alchemy/queues" directory, with
> the file name being the queue name.  This used to contain very
> useful info about queue status, message counts, etc.  I don't know
> when it broke or whether it's something I'm doing wrong but I'm
> now getting a "memory exhausted" message on the console when I
> attempt to do a "cat" on the status file.
> 
> Here's a small C program that just creates a queue, and then does
> a pause to hold the accessor count non-zero.
> 

<snip>

> The resulting output (logged in via the system console):
> 
> # sh qtest.sh
> + sleep 1
> + ./qc --mem-pool-size=64M --session=mysession foo
> + find /run/xenomai
> /run/xenomai
> /run/xenomai/root
> /run/xenomai/root/mysession
> /run/xenomai/root/mysession/821
> /run/xenomai/root/mysession/821/alchemy
> /run/xenomai/root/mysession/821/alchemy/tasks
> /run/xenomai/root/mysession/821/alchemy/tasks/task@1[821]
> /run/xenomai/root/mysession/821/alchemy/queues
> /run/xenomai/root/mysession/821/alchemy/queues/foo
> /run/xenomai/root/mysession/system
> /run/xenomai/root/mysession/system/threads
> /run/xenomai/root/mysession/system/heaps
> /run/xenomai/root/mysession/system/version
> + qfile='/run/xenomai/*/*/*/alchemy/queues/foo'
> + cat /run/xenomai/root/mysession/821/alchemy/queues/foo
> memory exhausted
> 
> At this point, it hangs, although SIGINT usually terminates it.
> 
> I've seen some cases where SIGINT won't terminate it, and a reboot is
> required to clean things up.  I see this message appears to be logged
> in the obstack error handler.  I don't think I'm running out of memory,
> which makes me think "heap corruption".  Not much of an analysis!  I did
> try varying queue sizes and max message counts - no change.
> 

I can't reproduce this. I would suspect a rampant memory corruption too,
although running the test code over valgrind (mercury build) did not
reveal any issue.

- which Xenomai version are you using?
- cobalt / mercury ?
- do you enable the shared heap when configuring ? (--enable-pshared)
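
Running the test binary with --dump-config should answer most of these
in one go, e.g. (the grep pattern is only for illustration):

  ./qc --dump-config | grep -E 'VERSION|COBALT|MERCURY|PSHARED'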

-- 
Philippe.



* Re: [Xenomai] Possible Xenomai fuse filesystem/registry queue status files issue?
  2018-04-02 13:41 ` Philippe Gerum
@ 2018-04-02 14:54   ` Steve Freyder
  2018-04-02 15:20     ` Philippe Gerum
  0 siblings, 1 reply; 15+ messages in thread
From: Steve Freyder @ 2018-04-02 14:54 UTC (permalink / raw)
  To: xenomai

On 4/2/2018 8:41 AM, Philippe Gerum wrote:
> On 04/01/2018 07:28 PM, Steve Freyder wrote:
>> Greetings again.
>>
>> As I understand it, for each rt_queue there's supposed to be a
>> "status file" located in the fuse filesystem underneath the
>> "/run/xenomai/user/session/pid/alchemy/queues" directory, with
>> the file name being the queue name.  This used to contain very
>> useful info about queue status, message counts, etc.  I don't know
>> when it broke or whether it's something I'm doing wrong but I'm
>> now getting a "memory exhausted" message on the console when I
>> attempt to do a "cat" on the status file.
>>
>> Here's a small C program that just creates a queue, and then does
>> a pause to hold the accessor count non-zero.
>>
> <snip>
>
>> The resulting output (logged in via the system console):
>>
>> # sh qtest.sh
>> + sleep 1
>> + ./qc --mem-pool-size=64M --session=mysession foo
>> + find /run/xenomai
>> /run/xenomai
>> /run/xenomai/root
>> /run/xenomai/root/mysession
>> /run/xenomai/root/mysession/821
>> /run/xenomai/root/mysession/821/alchemy
>> /run/xenomai/root/mysession/821/alchemy/tasks
>> /run/xenomai/root/mysession/821/alchemy/tasks/task@1[821]
>> /run/xenomai/root/mysession/821/alchemy/queues
>> /run/xenomai/root/mysession/821/alchemy/queues/foo
>> /run/xenomai/root/mysession/system
>> /run/xenomai/root/mysession/system/threads
>> /run/xenomai/root/mysession/system/heaps
>> /run/xenomai/root/mysession/system/version
>> + qfile='/run/xenomai/*/*/*/alchemy/queues/foo'
>> + cat /run/xenomai/root/mysession/821/alchemy/queues/foo
>> memory exhausted
>>
>> At this point, it hangs, although SIGINT usually terminates it.
>>
>> I've seen some cases where SIGINT won't terminate it, and a reboot is
>> required to clean things up.  I see this message appears to be logged
>> in the obstack error handler.  I don't think I'm running out of memory,
>> which makes me think "heap corruption".  Not much of an analysis!  I did
>> try varying queue sizes and max message counts - no change.
>>
> I can't reproduce this. I would suspect a rampant memory corruption too,
> although running the test code over valgrind (mercury build) did not
> reveal any issue.
>
> - which Xenomai version are you using?
> - cobalt / mercury ?
> - do you enable the shared heap when configuring ? (--enable-pshared)
>

I'm using Cobalt.  uname -a reports:

Linux sdftest 4.1.18_C01571-15S00-00.000.zimg+83fdace666 #2 SMP Fri Mar 9 11:07:52 CST 2018 armv7l GNU/Linux

Here is the config dump:

CONFIG_MMU=1
CONFIG_SMP=1
CONFIG_XENO_BUILD_ARGS=" '--build=x86_64-linux' 
'--host=arm-emac-linux-gnueabi' '--target=arm-emac-linux-gnueabi' 
'--with-core=cobalt' '--enable-pshared' '--enable-smp' '--prefix=/usr' 
'--exec_prefix=/usr' '--includedir=/usr/include/xenomai' 
'--enable-registry' 'build_alias=x86_64-linux' 
'host_alias=arm-emac-linux-gnueabi' 
'target_alias=arm-emac-linux-gnueabi' 'CC=arm-emac-linux-gnueabi-gcc  
-march=armv7-a -mfpu=neon -mfloat-abi=softfp 
--sysroot=/home/developer/oe/build_c01571-15/tmp/sysroots/c01571-15' 
'CFLAGS=-march=armv7-a' 'LDFLAGS=-D_FILE_OFFSET_BITS=64 
-I/home/developer/oe/build_c01571-15/tmp/sysroots/c01571-15/usr/include/fuse 
-lfuse -pthread' 'CPPFLAGS=' 'CPP=arm-emac-linux-gnueabi-gcc -E 
--sysroot=/home/developer/oe/build_c01571-15/tmp/sysroots/c01571-15 
-march=armv7-a -mfpu=neon  -mfloat-abi=softfp' 
'PKG_CONFIG_PATH=/home/developer/oe/build_c01571-15/tmp/sysroots/c01571-15/usr/lib/pkgconfig:/home/developer/oe/build_c01571-15/tmp/sysroots/c01571-15/usr/share/pkgconfig' 
'PKG_CONFIG_LIBDIR=/home/developer/oe/build_c01571-15/tmp/sysroots/c01571-15/usr/lib/pkgconfig'"
CONFIG_XENO_BUILD_STRING="x86_64-pc-linux-gnu"
CONFIG_XENO_COBALT=1
CONFIG_XENO_COMPILER="gcc version 5.3.0 (GCC) "
CONFIG_XENO_DEFAULT_PERIOD=1000000
CONFIG_XENO_FORTIFY=1
CONFIG_XENO_HOST_STRING="arm-emac-linux-gnueabi"
CONFIG_XENO_LORES_CLOCK_DISABLED=1
CONFIG_XENO_PREFIX="/usr"
CONFIG_XENO_PSHARED=1
CONFIG_XENO_RAW_CLOCK_ENABLED=1
CONFIG_XENO_REGISTRY=1
CONFIG_XENO_REGISTRY_ROOT="/var/run/xenomai"
CONFIG_XENO_REVISION_LEVEL=6
CONFIG_XENO_SANITY=1
CONFIG_XENO_TLSF=1
CONFIG_XENO_TLS_MODEL="initial-exec"
CONFIG_XENO_UAPI_LEVEL=14
CONFIG_XENO_VERSION_MAJOR=3
CONFIG_XENO_VERSION_MINOR=0
CONFIG_XENO_VERSION_NAME="Stellar Parallax"
CONFIG_XENO_VERSION_STRING="3.0.6"
---
CONFIG_XENO_ASYNC_CANCEL is OFF
CONFIG_XENO_COPPERPLATE_CLOCK_RESTRICTED is OFF
CONFIG_XENO_DEBUG is OFF
CONFIG_XENO_DEBUG_FULL is OFF
CONFIG_XENO_LIBS_DLOPEN is OFF
CONFIG_XENO_MERCURY is OFF
CONFIG_XENO_VALGRIND_API is OFF
CONFIG_XENO_WORKAROUND_CONDVAR_PI is OFF
CONFIG_XENO_X86_VSYSCALL is OFF
---
PTHREAD_STACK_DEFAULT=65536
AUTOMATIC_BOOTSTRAP=1

Best regards,
Steve




* Re: [Xenomai] Possible Xenomai fuse filesystem/registry queue status files issue?
  2018-04-02 14:54   ` Steve Freyder
@ 2018-04-02 15:20     ` Philippe Gerum
  2018-04-02 16:11       ` Steve Freyder
  0 siblings, 1 reply; 15+ messages in thread
From: Philippe Gerum @ 2018-04-02 15:20 UTC (permalink / raw)
  To: Steve Freyder, xenomai

On 04/02/2018 04:54 PM, Steve Freyder wrote:
> On 4/2/2018 8:41 AM, Philippe Gerum wrote:
>> On 04/01/2018 07:28 PM, Steve Freyder wrote:
>>> Greetings again.
>>>
>>> As I understand it, for each rt_queue there's supposed to be a
>>> "status file" located in the fuse filesystem underneath the
>>> "/run/xenomai/user/session/pid/alchemy/queues" directory, with
>>> the file name being the queue name.  This used to contain very
>>> useful info about queue status, message counts, etc.  I don't know
>>> when it broke or whether it's something I'm doing wrong but I'm
>>> now getting a "memory exhausted" message on the console when I
>>> attempt to do a "cat" on the status file.
>>>
>>> Here's a small C program that just creates a queue, and then does
>>> a pause to hold the accessor count non-zero.
>>>
>> <snip>
>>
>>> The resulting output (logged in via the system console):
>>>
>>> # sh qtest.sh
>>> + sleep 1
>>> + ./qc --mem-pool-size=64M --session=mysession foo
>>> + find /run/xenomai
>>> /run/xenomai
>>> /run/xenomai/root
>>> /run/xenomai/root/mysession
>>> /run/xenomai/root/mysession/821
>>> /run/xenomai/root/mysession/821/alchemy
>>> /run/xenomai/root/mysession/821/alchemy/tasks
>>> /run/xenomai/root/mysession/821/alchemy/tasks/task@1[821]
>>> /run/xenomai/root/mysession/821/alchemy/queues
>>> /run/xenomai/root/mysession/821/alchemy/queues/foo
>>> /run/xenomai/root/mysession/system
>>> /run/xenomai/root/mysession/system/threads
>>> /run/xenomai/root/mysession/system/heaps
>>> /run/xenomai/root/mysession/system/version
>>> + qfile='/run/xenomai/*/*/*/alchemy/queues/foo'
>>> + cat /run/xenomai/root/mysession/821/alchemy/queues/foo
>>> memory exhausted
>>>
>>> At this point, it hangs, although SIGINT usually terminates it.
>>>
>>> I've seen some cases where SIGINT won't terminate it, and a reboot is
>>> required to clean things up.  I see this message appears to be logged
>>> in the obstack error handler.  I don't think I'm running out of memory,
>>> which makes me think "heap corruption".  Not much of an analysis!  I did
>>> try varying queue sizes and max message counts - no change.
>>>
>> I can't reproduce this. I would suspect a rampant memory corruption too,
>> although running the test code over valgrind (mercury build) did not
>> reveal any issue.
>>
>> - which Xenomai version are you using?
>> - cobalt / mercury ?
>> - do you enable the shared heap when configuring ? (--enable-pshared)
>>
> 
> I'm using Cobalt.  uname -a reports:
> 
> Linux sdftest 4.1.18_C01571-15S00-00.000.zimg+83fdace666 #2 SMP Fri Mar
> 9 11:07:52 CST 2018 armv7l GNU/Linux
> 
> Here is the config dump:
> 
> CONFIG_XENO_PSHARED=1

Any chance you could have some leftover files in /dev/shm from aborted
runs, which would steal RAM?
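
For instance, something like this would show and clear them - the glob
is only an example, match it to your session name, and only remove
files once no process from that session is left running:

  ls -l /dev/shm
  rm -f /dev/shm/*mysession*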

-- 
Philippe.



* Re: [Xenomai] Possible Xenomai fuse filesystem/registry queue status files issue?
  2018-04-02 15:20     ` Philippe Gerum
@ 2018-04-02 16:11       ` Steve Freyder
  2018-04-02 16:51         ` Philippe Gerum
  0 siblings, 1 reply; 15+ messages in thread
From: Steve Freyder @ 2018-04-02 16:11 UTC (permalink / raw)
  To: xenomai

On 4/2/2018 10:20 AM, Philippe Gerum wrote:
> On 04/02/2018 04:54 PM, Steve Freyder wrote:
>> On 4/2/2018 8:41 AM, Philippe Gerum wrote:
>>> On 04/01/2018 07:28 PM, Steve Freyder wrote:
>>>> Greetings again.
>>>>
>>>> As I understand it, for each rt_queue there's supposed to be a
>>>> "status file" located in the fuse filesystem underneath the
>>>> "/run/xenomai/user/session/pid/alchemy/queues" directory, with
>>>> the file name being the queue name.  This used to contain very
>>>> useful info about queue status, message counts, etc.  I don't know
>>>> when it broke or whether it's something I'm doing wrong but I'm
>>>> now getting a "memory exhausted" message on the console when I
>>>> attempt to do a "cat" on the status file.
>>>>
>>>> Here's a small C program that just creates a queue, and then does
>>>> a pause to hold the accessor count non-zero.
>>>>
>>> <snip>
>>>
>>>> The resulting output (logged in via the system console):
>>>>
>>>> # sh qtest.sh
>>>> + sleep 1
>>>> + ./qc --mem-pool-size=64M --session=mysession foo
>>>> + find /run/xenomai
>>>> /run/xenomai
>>>> /run/xenomai/root
>>>> /run/xenomai/root/mysession
>>>> /run/xenomai/root/mysession/821
>>>> /run/xenomai/root/mysession/821/alchemy
>>>> /run/xenomai/root/mysession/821/alchemy/tasks
>>>> /run/xenomai/root/mysession/821/alchemy/tasks/task@1[821]
>>>> /run/xenomai/root/mysession/821/alchemy/queues
>>>> /run/xenomai/root/mysession/821/alchemy/queues/foo
>>>> /run/xenomai/root/mysession/system
>>>> /run/xenomai/root/mysession/system/threads
>>>> /run/xenomai/root/mysession/system/heaps
>>>> /run/xenomai/root/mysession/system/version
>>>> + qfile='/run/xenomai/*/*/*/alchemy/queues/foo'
>>>> + cat /run/xenomai/root/mysession/821/alchemy/queues/foo
>>>> memory exhausted
>>>>
>>>> At this point, it hangs, although SIGINT usually terminates it.
>>>>
>>>> I've seen some cases where SIGINT won't terminate it, and a reboot is
>>>> required to clean things up.  I see this message appears to be logged
>>>> in the obstack error handler.  I don't think I'm running out of memory,
>>>> which makes me think "heap corruption".  Not much of an analysis!  I did
>>>> try varying queue sizes and max message counts - no change.
>>>>
>>> I can't reproduce this. I would suspect a rampant memory corruption too,
>>> although running the test code over valgrind (mercury build) did not
>>> reveal any issue.
>>>
>>> - which Xenomai version are you using?
>>> - cobalt / mercury ?
>>> - do you enable the shared heap when configuring ? (--enable-pshared)
>>>
>> I'm using Cobalt.  uname -a reports:
>>
>> Linux sdftest 4.1.18_C01571-15S00-00.000.zimg+83fdace666 #2 SMP Fri Mar
>> 9 11:07:52 CST 2018 armv7l GNU/Linux
>>
>> Here is the config dump:
>>
>> CONFIG_XENO_PSHARED=1
> Any chance you could have some leftover files in /dev/shm from aborted
> runs, which would steal RAM?
>
I've been rebooting before each test run, but I'll keep that in mind for
future testing.

Sounds like I need to try rolling back to an older build, I have a 3.0.5
and a 3.0.3 build handy.




* Re: [Xenomai] Possible Xenomai fuse filesystem/registry queue status files issue?
  2018-04-02 16:11       ` Steve Freyder
@ 2018-04-02 16:51         ` Philippe Gerum
  2018-04-08 23:01           ` Steve Freyder
  0 siblings, 1 reply; 15+ messages in thread
From: Philippe Gerum @ 2018-04-02 16:51 UTC (permalink / raw)
  To: Steve Freyder, xenomai

On 04/02/2018 06:11 PM, Steve Freyder wrote:
> On 4/2/2018 10:20 AM, Philippe Gerum wrote:
>> On 04/02/2018 04:54 PM, Steve Freyder wrote:
>>> On 4/2/2018 8:41 AM, Philippe Gerum wrote:
>>>> On 04/01/2018 07:28 PM, Steve Freyder wrote:
>>>>> Greetings again.
>>>>>
>>>>> As I understand it, for each rt_queue there's supposed to be a
>>>>> "status file" located in the fuse filesystem underneath the
>>>>> "/run/xenomai/user/session/pid/alchemy/queues" directory, with
>>>>> the file name being the queue name.  This used to contain very
>>>>> useful info about queue status, message counts, etc.  I don't know
>>>>> when it broke or whether it's something I'm doing wrong but I'm
>>>>> now getting a "memory exhausted" message on the console when I
>>>>> attempt to do a "cat" on the status file.
>>>>>
>>>>> Here's a small C program that just creates a queue, and then does
>>>>> a pause to hold the accessor count non-zero.
>>>>>
>>>> <snip>
>>>>
>>>>> The resulting output (logged in via the system console):
>>>>>
>>>>> # sh qtest.sh
>>>>> + sleep 1
>>>>> + ./qc --mem-pool-size=64M --session=mysession foo
>>>>> + find /run/xenomai
>>>>> /run/xenomai
>>>>> /run/xenomai/root
>>>>> /run/xenomai/root/mysession
>>>>> /run/xenomai/root/mysession/821
>>>>> /run/xenomai/root/mysession/821/alchemy
>>>>> /run/xenomai/root/mysession/821/alchemy/tasks
>>>>> /run/xenomai/root/mysession/821/alchemy/tasks/task@1[821]
>>>>> /run/xenomai/root/mysession/821/alchemy/queues
>>>>> /run/xenomai/root/mysession/821/alchemy/queues/foo
>>>>> /run/xenomai/root/mysession/system
>>>>> /run/xenomai/root/mysession/system/threads
>>>>> /run/xenomai/root/mysession/system/heaps
>>>>> /run/xenomai/root/mysession/system/version
>>>>> + qfile='/run/xenomai/*/*/*/alchemy/queues/foo'
>>>>> + cat /run/xenomai/root/mysession/821/alchemy/queues/foo
>>>>> memory exhausted
>>>>>
>>>>> At this point, it hangs, although SIGINT usually terminates it.
>>>>>
>>>>> I've seen some cases where SIGINT won't terminate it, and a reboot is
>>>>> required to clean things up.  I see this message appears to be logged
>>>>> in the obstack error handler.  I don't think I'm running out of
>>>>> memory,
>>>>> which makes me think "heap corruption".  Not much of an analysis! 
>>>>> I did
>>>>> try varying queue sizes and max message counts - no change.
>>>>>
>>>> I can't reproduce this. I would suspect a rampant memory corruption
>>>> too,
>>>> although running the test code over valgrind (mercury build) did not
>>>> reveal any issue.
>>>>
>>>> - which Xenomai version are you using?
>>>> - cobalt / mercury ?
>>>> - do you enable the shared heap when configuring ? (--enable-pshared)
>>>>
>>> I'm using Cobalt.  uname -a reports:
>>>
>>> Linux sdftest 4.1.18_C01571-15S00-00.000.zimg+83fdace666 #2 SMP Fri Mar
>>> 9 11:07:52 CST 2018 armv7l GNU/Linux
>>>
>>> Here is the config dump:
>>>
>>> CONFIG_XENO_PSHARED=1
>> Any chance you could have some leftover files in /dev/shm from aborted
>> runs, which would steal RAM?
>>
> I've been rebooting before each test run, but I'll keep that in mind for
> future testing.
> 
> Sounds like I need to try rolling back to an older build, I have a 3.0.5
> and a 3.0.3 build handy.

The standalone test should work with the shared heap disabled; could you
check it against a build configured with --disable-pshared? Thanks,
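
Something along these lines should do, i.e. your original configure
line with the pshared switch flipped (a trimmed sketch only - the
cross-toolchain variables from your build args are omitted):

  ./configure --with-core=cobalt --enable-smp --enable-registry \
              --disable-pshared --prefix=/usr \
              --includedir=/usr/include/xenomai \
              --host=arm-emac-linux-gnueabi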

-- 
Philippe.



* Re: [Xenomai] Possible Xenomai fuse filesystem/registry queue status files issue?
  2018-04-02 16:51         ` Philippe Gerum
@ 2018-04-08 23:01           ` Steve Freyder
  2018-04-11 14:37             ` Philippe Gerum
  2018-04-12  9:31             ` Philippe Gerum
  0 siblings, 2 replies; 15+ messages in thread
From: Steve Freyder @ 2018-04-08 23:01 UTC (permalink / raw)
  To: xenomai

On 4/2/2018 11:51 AM, Philippe Gerum wrote:
> On 04/02/2018 06:11 PM, Steve Freyder wrote:
>> On 4/2/2018 10:20 AM, Philippe Gerum wrote:
>>> On 04/02/2018 04:54 PM, Steve Freyder wrote:
>>>> On 4/2/2018 8:41 AM, Philippe Gerum wrote:
>>>>> On 04/01/2018 07:28 PM, Steve Freyder wrote:
>>>>>> Greetings again.
>>>>>>
>>>>>> As I understand it, for each rt_queue there's supposed to be a
>>>>>> "status file" located in the fuse filesystem underneath the
>>>>>> "/run/xenomai/user/session/pid/alchemy/queues" directory, with
>>>>>> the file name being the queue name.  This used to contain very
>>>>>> useful info about queue status, message counts, etc.  I don't know
>>>>>> when it broke or whether it's something I'm doing wrong but I'm
>>>>>> now getting a "memory exhausted" message on the console when I
>>>>>> attempt to do a "cat" on the status file.
>>>>>>
>>>>>> Here's a small C program that just creates a queue, and then does
>>>>>> a pause to hold the accessor count non-zero.
>>>>>>
>>>>> <snip>
>>>>>
>>>>>> The resulting output (logged in via the system console):
>>>>>>
>>>>>> # sh qtest.sh
>>>>>> + sleep 1
>>>>>> + ./qc --mem-pool-size=64M --session=mysession foo
>>>>>> + find /run/xenomai
>>>>>> /run/xenomai
>>>>>> /run/xenomai/root
>>>>>> /run/xenomai/root/mysession
>>>>>> /run/xenomai/root/mysession/821
>>>>>> /run/xenomai/root/mysession/821/alchemy
>>>>>> /run/xenomai/root/mysession/821/alchemy/tasks
>>>>>> /run/xenomai/root/mysession/821/alchemy/tasks/task@1[821]
>>>>>> /run/xenomai/root/mysession/821/alchemy/queues
>>>>>> /run/xenomai/root/mysession/821/alchemy/queues/foo
>>>>>> /run/xenomai/root/mysession/system
>>>>>> /run/xenomai/root/mysession/system/threads
>>>>>> /run/xenomai/root/mysession/system/heaps
>>>>>> /run/xenomai/root/mysession/system/version
>>>>>> + qfile='/run/xenomai/*/*/*/alchemy/queues/foo'
>>>>>> + cat /run/xenomai/root/mysession/821/alchemy/queues/foo
>>>>>> memory exhausted
>>>>>>
>>>>>> At this point, it hangs, although SIGINT usually terminates it.
>>>>>>
>>>>>> I've seen some cases where SIGINT won't terminate it, and a reboot is
>>>>>> required to clean things up.  I see this message appears to be logged
>>>>>> in the obstack error handler.  I don't think I'm running out of
>>>>>> memory,
>>>>>> which makes me think "heap corruption".  Not much of an analysis!
>>>>>> I did
>>>>>> try varying queue sizes and max message counts - no change.
>>>>>>
>>>>> I can't reproduce this. I would suspect a rampant memory corruption
>>>>> too,
>>>>> although running the test code over valgrind (mercury build) did not
>>>>> reveal any issue.
>>>>>
>>>>> - which Xenomai version are you using?
>>>>> - cobalt / mercury ?
>>>>> - do you enable the shared heap when configuring ? (--enable-pshared)
>>>>>
>>>> I'm using Cobalt.  uname -a reports:
>>>>
>>>> Linux sdftest 4.1.18_C01571-15S00-00.000.zimg+83fdace666 #2 SMP Fri Mar
>>>> 9 11:07:52 CST 2018 armv7l GNU/Linux
>>>>
>>>> Here is the config dump:
>>>>
>>>> CONFIG_XENO_PSHARED=1
>>> Any chance you could have some leftover files in /dev/shm from aborted
>>> runs, which would steal RAM?
>>>
>> I've been rebooting before each test run, but I'll keep that in mind for
>> future testing.
>>
>> Sounds like I need to try rolling back to an older build, I have a 3.0.5
>> and a 3.0.3 build handy.
> The standalone test should work with the shared heap disabled, could you
> check it against a build configure with --disable-pshared? Thanks,
>
Philippe,

Sorry for the delay - our vendor had been doing all of our kernel and SDK
builds so I had to do a lot of learning to get this all going.

With the --disable-pshared in effect:

/.g3l # ./qc --dump-config | grep SHARED
based on Xenomai/cobalt v3.0.6 -- #6e34bb5 (2018-04-01 10:50:59 +0200)
CONFIG_XENO_PSHARED is OFF

/.g3l # ./qc foo &
/.g3l # find /run/xenomai/
/run/xenomai/
/run/xenomai/root
/run/xenomai/root/opus
/run/xenomai/root/opus/3477
/run/xenomai/root/opus/3477/alchemy
/run/xenomai/root/opus/3477/alchemy/tasks
/run/xenomai/root/opus/3477/alchemy/tasks/qcreate3477
/run/xenomai/root/opus/3477/alchemy/queues
/run/xenomai/root/opus/3477/alchemy/queues/foo
/run/xenomai/root/opus/system
/run/xenomai/root/opus/system/threads
/run/xenomai/root/opus/system/heaps
/run/xenomai/root/opus/system/version
root@ICB-G3L:/.g3l # cat run/xenomai/root/opus/3477/alchemy/queues/foo
[TYPE]  [TOTALMEM]  [USEDMEM]  [QLIMIT]  [MCOUNT]
  FIFO        5344       3248        10         0

Perfect!

What's the next step?

Best,
Steve




* Re: [Xenomai] Possible Xenomai fuse filesystem/registry queue status files issue?
  2018-04-08 23:01           ` Steve Freyder
@ 2018-04-11 14:37             ` Philippe Gerum
  2018-04-12  9:31             ` Philippe Gerum
  1 sibling, 0 replies; 15+ messages in thread
From: Philippe Gerum @ 2018-04-11 14:37 UTC (permalink / raw)
  To: Steve Freyder, xenomai

On 04/09/2018 01:01 AM, Steve Freyder wrote:
> On 4/2/2018 11:51 AM, Philippe Gerum wrote:
>> On 04/02/2018 06:11 PM, Steve Freyder wrote:
>>> On 4/2/2018 10:20 AM, Philippe Gerum wrote:
>>>> On 04/02/2018 04:54 PM, Steve Freyder wrote:
>>>>> On 4/2/2018 8:41 AM, Philippe Gerum wrote:
>>>>>> On 04/01/2018 07:28 PM, Steve Freyder wrote:
>>>>>>> Greetings again.
>>>>>>>
>>>>>>> As I understand it, for each rt_queue there's supposed to be a
>>>>>>> "status file" located in the fuse filesystem underneath the
>>>>>>> "/run/xenomai/user/session/pid/alchemy/queues" directory, with
>>>>>>> the file name being the queue name.  This used to contain very
>>>>>>> useful info about queue status, message counts, etc.  I don't know
>>>>>>> when it broke or whether it's something I'm doing wrong but I'm
>>>>>>> now getting a "memory exhausted" message on the console when I
>>>>>>> attempt to do a "cat" on the status file.
>>>>>>>
>>>>>>> Here's a small C program that just creates a queue, and then does
>>>>>>> a pause to hold the accessor count non-zero.
>>>>>>>
>>>>>> <snip>
>>>>>>
>>>>>>> The resulting output (logged in via the system console):
>>>>>>>
>>>>>>> # sh qtest.sh
>>>>>>> + sleep 1
>>>>>>> + ./qc --mem-pool-size=64M --session=mysession foo
>>>>>>> + find /run/xenomai
>>>>>>> /run/xenomai
>>>>>>> /run/xenomai/root
>>>>>>> /run/xenomai/root/mysession
>>>>>>> /run/xenomai/root/mysession/821
>>>>>>> /run/xenomai/root/mysession/821/alchemy
>>>>>>> /run/xenomai/root/mysession/821/alchemy/tasks
>>>>>>> /run/xenomai/root/mysession/821/alchemy/tasks/task@1[821]
>>>>>>> /run/xenomai/root/mysession/821/alchemy/queues
>>>>>>> /run/xenomai/root/mysession/821/alchemy/queues/foo
>>>>>>> /run/xenomai/root/mysession/system
>>>>>>> /run/xenomai/root/mysession/system/threads
>>>>>>> /run/xenomai/root/mysession/system/heaps
>>>>>>> /run/xenomai/root/mysession/system/version
>>>>>>> + qfile='/run/xenomai/*/*/*/alchemy/queues/foo'
>>>>>>> + cat /run/xenomai/root/mysession/821/alchemy/queues/foo
>>>>>>> memory exhausted
>>>>>>>
>>>>>>> At this point, it hangs, although SIGINT usually terminates it.
>>>>>>>
>>>>>>> I've seen some cases where SIGINT won't terminate it, and a
>>>>>>> reboot is
>>>>>>> required to clean things up.  I see this message appears to be
>>>>>>> logged
>>>>>>> in the obstack error handler.  I don't think I'm running out of
>>>>>>> memory,
>>>>>>> which makes me think "heap corruption".  Not much of an analysis!
>>>>>>> I did
>>>>>>> try varying queue sizes and max message counts - no change.
>>>>>>>
>>>>>> I can't reproduce this. I would suspect a rampant memory corruption
>>>>>> too,
>>>>>> although running the test code over valgrind (mercury build) did not
>>>>>> reveal any issue.
>>>>>>
>>>>>> - which Xenomai version are you using?
>>>>>> - cobalt / mercury ?
>>>>>> - do you enable the shared heap when configuring ? (--enable-pshared)
>>>>>>
>>>>> I'm using Cobalt.  uname -a reports:
>>>>>
>>>>> Linux sdftest 4.1.18_C01571-15S00-00.000.zimg+83fdace666 #2 SMP Fri
>>>>> Mar
>>>>> 9 11:07:52 CST 2018 armv7l GNU/Linux
>>>>>
>>>>> Here is the config dump:
>>>>>
>>>>> CONFIG_XENO_PSHARED=1
>>>> Any chance you could have some leftover files in /dev/shm from aborted
>>>> runs, which would steal RAM?
>>>>
>>> I've been rebooting before each test run, but I'll keep that in mind for
>>> future testing.
>>>
>>> Sounds like I need to try rolling back to an older build, I have a 3.0.5
>>> and a 3.0.3 build handy.
>> The standalone test should work with the shared heap disabled, could you
>> check it against a build configure with --disable-pshared? Thanks,
>>
> Philippe,
> 
> Sorry for the delay - our vendor had been doing all of our kernel and SDK
> builds so I had to do a lot of learning to get this all going.
> 
> With the --disable-pshared in effect:
> 
> /.g3l # ./qc --dump-config | grep SHARED
> based on Xenomai/cobalt v3.0.6 -- #6e34bb5 (2018-04-01 10:50:59 +0200)
> CONFIG_XENO_PSHARED is OFF
> 
> /.g3l # ./qc foo &
> /.g3l # find /run/xenomai/
> /run/xenomai/
> /run/xenomai/root
> /run/xenomai/root/opus
> /run/xenomai/root/opus/3477
> /run/xenomai/root/opus/3477/alchemy
> /run/xenomai/root/opus/3477/alchemy/tasks
> /run/xenomai/root/opus/3477/alchemy/tasks/qcreate3477
> /run/xenomai/root/opus/3477/alchemy/queues
> /run/xenomai/root/opus/3477/alchemy/queues/foo
> /run/xenomai/root/opus/system
> /run/xenomai/root/opus/system/threads
> /run/xenomai/root/opus/system/heaps
> /run/xenomai/root/opus/system/version
> root@ICB-G3L:/.g3l # cat run/xenomai/root/opus/3477/alchemy/queues/foo
> [TYPE]  [TOTALMEM]  [USEDMEM]  [QLIMIT]  [MCOUNT]
>  FIFO        5344       3248        10         0
> 
> Perfect!
> 
> What's the next step?

We need to get to the bottom of this issue, because we just can't
release 3.0.7 with a bug in the pshared allocator. I could not reproduce
this bug last time I tried using the test snippet, but I did not have
your full config settings then, so I need to redo the whole test using
the same configuration. I'll follow up on this.

Thanks for the feedback.

-- 
Philippe.



* Re: [Xenomai] Possible Xenomai fuse filesystem/registry queue status files issue?
  2018-04-08 23:01           ` Steve Freyder
  2018-04-11 14:37             ` Philippe Gerum
@ 2018-04-12  9:31             ` Philippe Gerum
  2018-04-12 10:23               ` Philippe Gerum
  1 sibling, 1 reply; 15+ messages in thread
From: Philippe Gerum @ 2018-04-12  9:31 UTC (permalink / raw)
  To: Steve Freyder, xenomai

On 04/09/2018 01:01 AM, Steve Freyder wrote:
> On 4/2/2018 11:51 AM, Philippe Gerum wrote:
>> On 04/02/2018 06:11 PM, Steve Freyder wrote:
>>> On 4/2/2018 10:20 AM, Philippe Gerum wrote:
>>>> On 04/02/2018 04:54 PM, Steve Freyder wrote:
>>>>> On 4/2/2018 8:41 AM, Philippe Gerum wrote:
>>>>>> On 04/01/2018 07:28 PM, Steve Freyder wrote:
>>>>>>> Greetings again.
>>>>>>>
>>>>>>> As I understand it, for each rt_queue there's supposed to be a
>>>>>>> "status file" located in the fuse filesystem underneath the
>>>>>>> "/run/xenomai/user/session/pid/alchemy/queues" directory, with
>>>>>>> the file name being the queue name.  This used to contain very
>>>>>>> useful info about queue status, message counts, etc.  I don't know
>>>>>>> when it broke or whether it's something I'm doing wrong but I'm
>>>>>>> now getting a "memory exhausted" message on the console when I
>>>>>>> attempt to do a "cat" on the status file.
>>>>>>>
>>>>>>> Here's a small C program that just creates a queue, and then does
>>>>>>> a pause to hold the accessor count non-zero.
>>>>>>>
>>>>>> <snip>
>>>>>>
>>>>>>> The resulting output (logged in via the system console):
>>>>>>>
>>>>>>> # sh qtest.sh
>>>>>>> + sleep 1
>>>>>>> + ./qc --mem-pool-size=64M --session=mysession foo
>>>>>>> + find /run/xenomai
>>>>>>> /run/xenomai
>>>>>>> /run/xenomai/root
>>>>>>> /run/xenomai/root/mysession
>>>>>>> /run/xenomai/root/mysession/821
>>>>>>> /run/xenomai/root/mysession/821/alchemy
>>>>>>> /run/xenomai/root/mysession/821/alchemy/tasks
>>>>>>> /run/xenomai/root/mysession/821/alchemy/tasks/task@1[821]
>>>>>>> /run/xenomai/root/mysession/821/alchemy/queues
>>>>>>> /run/xenomai/root/mysession/821/alchemy/queues/foo
>>>>>>> /run/xenomai/root/mysession/system
>>>>>>> /run/xenomai/root/mysession/system/threads
>>>>>>> /run/xenomai/root/mysession/system/heaps
>>>>>>> /run/xenomai/root/mysession/system/version
>>>>>>> + qfile='/run/xenomai/*/*/*/alchemy/queues/foo'
>>>>>>> + cat /run/xenomai/root/mysession/821/alchemy/queues/foo
>>>>>>> memory exhausted
>>>>>>>
>>>>>>> At this point, it hangs, although SIGINT usually terminates it.
>>>>>>>
>>>>>>> I've seen some cases where SIGINT won't terminate it, and a
>>>>>>> reboot is
>>>>>>> required to clean things up.  I see this message appears to be
>>>>>>> logged
>>>>>>> in the obstack error handler.  I don't think I'm running out of
>>>>>>> memory,
>>>>>>> which makes me think "heap corruption".  Not much of an analysis!
>>>>>>> I did
>>>>>>> try varying queue sizes and max message counts - no change.
>>>>>>>
>>>>>> I can't reproduce this. I would suspect a rampant memory corruption
>>>>>> too,
>>>>>> although running the test code over valgrind (mercury build) did not
>>>>>> reveal any issue.
>>>>>>
>>>>>> - which Xenomai version are you using?
>>>>>> - cobalt / mercury ?
>>>>>> - do you enable the shared heap when configuring ? (--enable-pshared)
>>>>>>
>>>>> I'm using Cobalt.  uname -a reports:
>>>>>
>>>>> Linux sdftest 4.1.18_C01571-15S00-00.000.zimg+83fdace666 #2 SMP Fri
>>>>> Mar
>>>>> 9 11:07:52 CST 2018 armv7l GNU/Linux
>>>>>
>>>>> Here is the config dump:
>>>>>
>>>>> CONFIG_XENO_PSHARED=1
>>>> Any chance you could have some leftover files in /dev/shm from aborted
>>>> runs, which would steal RAM?
>>>>
>>> I've been rebooting before each test run, but I'll keep that in mind for
>>> future testing.
>>>
>>> Sounds like I need to try rolling back to an older build, I have a 3.0.5
>>> and a 3.0.3 build handy.
>> The standalone test should work with the shared heap disabled, could you
>> check it against a build configure with --disable-pshared? Thanks,
>>
> Philippe,
> 
> Sorry for the delay - our vendor had been doing all of our kernel and SDK
> builds so I had to do a lot of learning to get this all going.
> 
> With the --disable-pshared in effect:
> 
> /.g3l # ./qc --dump-config | grep SHARED
> based on Xenomai/cobalt v3.0.6 -- #6e34bb5 (2018-04-01 10:50:59 +0200)
> CONFIG_XENO_PSHARED is OFF
> 
> /.g3l # ./qc foo &
> /.g3l # find /run/xenomai/
> /run/xenomai/
> /run/xenomai/root
> /run/xenomai/root/opus
> /run/xenomai/root/opus/3477
> /run/xenomai/root/opus/3477/alchemy
> /run/xenomai/root/opus/3477/alchemy/tasks
> /run/xenomai/root/opus/3477/alchemy/tasks/qcreate3477
> /run/xenomai/root/opus/3477/alchemy/queues
> /run/xenomai/root/opus/3477/alchemy/queues/foo
> /run/xenomai/root/opus/system
> /run/xenomai/root/opus/system/threads
> /run/xenomai/root/opus/system/heaps
> /run/xenomai/root/opus/system/version
> root@ICB-G3L:/.g3l # cat run/xenomai/root/opus/3477/alchemy/queues/foo
> [TYPE]  [TOTALMEM]  [USEDMEM]  [QLIMIT]  [MCOUNT]
>  FIFO        5344       3248        10         0
> 
> Perfect!
> 
> What's the next step?
> 

I can reproduce this issue. I'm on it.

-- 
Philippe.



* Re: [Xenomai] Possible Xenomai fuse filesystem/registry queue status files issue?
  2018-04-12  9:31             ` Philippe Gerum
@ 2018-04-12 10:23               ` Philippe Gerum
  2018-04-12 15:44                 ` Steve Freyder
  0 siblings, 1 reply; 15+ messages in thread
From: Philippe Gerum @ 2018-04-12 10:23 UTC (permalink / raw)
  To: Steve Freyder, xenomai

On 04/12/2018 11:31 AM, Philippe Gerum wrote:
> On 04/09/2018 01:01 AM, Steve Freyder wrote:
>> On 4/2/2018 11:51 AM, Philippe Gerum wrote:
>>> On 04/02/2018 06:11 PM, Steve Freyder wrote:
>>>> On 4/2/2018 10:20 AM, Philippe Gerum wrote:
>>>>> On 04/02/2018 04:54 PM, Steve Freyder wrote:
>>>>>> On 4/2/2018 8:41 AM, Philippe Gerum wrote:
>>>>>>> On 04/01/2018 07:28 PM, Steve Freyder wrote:
>>>>>>>> Greetings again.
>>>>>>>>
>>>>>>>> As I understand it, for each rt_queue there's supposed to be a
>>>>>>>> "status file" located in the fuse filesystem underneath the
>>>>>>>> "/run/xenomai/user/session/pid/alchemy/queues" directory, with
>>>>>>>> the file name being the queue name.  This used to contain very
>>>>>>>> useful info about queue status, message counts, etc.  I don't know
>>>>>>>> when it broke or whether it's something I'm doing wrong but I'm
>>>>>>>> now getting a "memory exhausted" message on the console when I
>>>>>>>> attempt to do a "cat" on the status file.
>>>>>>>>
>>>>>>>> Here's a small C program that just creates a queue, and then does
>>>>>>>> a pause to hold the accessor count non-zero.
>>>>>>>>
>>>>>>> <snip>
>>>>>>>
>>>>>>>> The resulting output (logged in via the system console):
>>>>>>>>
>>>>>>>> # sh qtest.sh
>>>>>>>> + sleep 1
>>>>>>>> + ./qc --mem-pool-size=64M --session=mysession foo
>>>>>>>> + find /run/xenomai
>>>>>>>> /run/xenomai
>>>>>>>> /run/xenomai/root
>>>>>>>> /run/xenomai/root/mysession
>>>>>>>> /run/xenomai/root/mysession/821
>>>>>>>> /run/xenomai/root/mysession/821/alchemy
>>>>>>>> /run/xenomai/root/mysession/821/alchemy/tasks
>>>>>>>> /run/xenomai/root/mysession/821/alchemy/tasks/task@1[821]
>>>>>>>> /run/xenomai/root/mysession/821/alchemy/queues
>>>>>>>> /run/xenomai/root/mysession/821/alchemy/queues/foo
>>>>>>>> /run/xenomai/root/mysession/system
>>>>>>>> /run/xenomai/root/mysession/system/threads
>>>>>>>> /run/xenomai/root/mysession/system/heaps
>>>>>>>> /run/xenomai/root/mysession/system/version
>>>>>>>> + qfile='/run/xenomai/*/*/*/alchemy/queues/foo'
>>>>>>>> + cat /run/xenomai/root/mysession/821/alchemy/queues/foo
>>>>>>>> memory exhausted
>>>>>>>>
>>>>>>>> At this point, it hangs, although SIGINT usually terminates it.
>>>>>>>>
>>>>>>>> I've seen some cases where SIGINT won't terminate it, and a
>>>>>>>> reboot is
>>>>>>>> required to clean things up.  I see this message appears to be
>>>>>>>> logged
>>>>>>>> in the obstack error handler.  I don't think I'm running out of
>>>>>>>> memory,
>>>>>>>> which makes me think "heap corruption".  Not much of an analysis!
>>>>>>>> I did
>>>>>>>> try varying queue sizes and max message counts - no change.
>>>>>>>>
>>>>>>> I can't reproduce this. I would suspect a rampant memory corruption
>>>>>>> too,
>>>>>>> although running the test code over valgrind (mercury build) did not
>>>>>>> reveal any issue.
>>>>>>>
>>>>>>> - which Xenomai version are you using?
>>>>>>> - cobalt / mercury ?
>>>>>>> - do you enable the shared heap when configuring ? (--enable-pshared)
>>>>>>>
>>>>>> I'm using Cobalt.  uname -a reports:
>>>>>>
>>>>>> Linux sdftest 4.1.18_C01571-15S00-00.000.zimg+83fdace666 #2 SMP Fri
>>>>>> Mar
>>>>>> 9 11:07:52 CST 2018 armv7l GNU/Linux
>>>>>>
>>>>>> Here is the config dump:
>>>>>>
>>>>>> CONFIG_XENO_PSHARED=1
>>>>> Any chance you could have some leftover files in /dev/shm from aborted
>>>>> runs, which would steal RAM?
>>>>>
>>>> I've been rebooting before each test run, but I'll keep that in mind for
>>>> future testing.
>>>>
>>>> Sounds like I need to try rolling back to an older build, I have a 3.0.5
>>>> and a 3.0.3 build handy.
>>> The standalone test should work with the shared heap disabled, could you
>>> check it against a build configure with --disable-pshared? Thanks,
>>>
>> Philippe,
>>
>> Sorry for the delay - our vendor had been doing all of our kernel and SDK
>> builds so I had to do a lot of learning to get this all going.
>>
>> With the --disable-pshared in effect:
>>
>> /.g3l # ./qc --dump-config | grep SHARED
>> based on Xenomai/cobalt v3.0.6 -- #6e34bb5 (2018-04-01 10:50:59 +0200)
>> CONFIG_XENO_PSHARED is OFF
>>
>> /.g3l # ./qc foo &
>> /.g3l # find /run/xenomai/
>> /run/xenomai/
>> /run/xenomai/root
>> /run/xenomai/root/opus
>> /run/xenomai/root/opus/3477
>> /run/xenomai/root/opus/3477/alchemy
>> /run/xenomai/root/opus/3477/alchemy/tasks
>> /run/xenomai/root/opus/3477/alchemy/tasks/qcreate3477
>> /run/xenomai/root/opus/3477/alchemy/queues
>> /run/xenomai/root/opus/3477/alchemy/queues/foo
>> /run/xenomai/root/opus/system
>> /run/xenomai/root/opus/system/threads
>> /run/xenomai/root/opus/system/heaps
>> /run/xenomai/root/opus/system/version
>> root@ICB-G3L:/.g3l # cat run/xenomai/root/opus/3477/alchemy/queues/foo
>> [TYPE]  [TOTALMEM]  [USEDMEM]  [QLIMIT]  [MCOUNT]
>>  FIFO        5344       3248        10         0
>>
>> Perfect!
>>
>> What's the next step?
>>
> 
> I can reproduce this issue. I'm on it.
> 

The patch below should solve the problem for the registry; however, this
may have uncovered a bug in the "tlsf" allocator (once again), which
should not have failed allocating memory. Two separate issues then.

diff --git a/include/copperplate/registry-obstack.h b/include/copperplate/registry-obstack.h
index fe192faf7..48e453bc3 100644
--- a/include/copperplate/registry-obstack.h
+++ b/include/copperplate/registry-obstack.h
@@ -29,11 +29,12 @@ struct threadobj;
 struct syncobj;

 /*
- * Assume we may want fast allocation of private memory from real-time
- * mode when growing the obstack.
+ * Obstacks are grown from handlers called by the fusefs server
+ * thread, which has no real-time requirement: malloc/free is fine for
+ * memory management.
  */
-#define obstack_chunk_alloc	pvmalloc
-#define obstack_chunk_free	pvfree
+#define obstack_chunk_alloc	malloc
+#define obstack_chunk_free	free

 struct threadobj;


-- 
Philippe.



* Re: [Xenomai] Possible Xenomai fuse filesystem/registry queue status files issue?
  2018-04-12 10:23               ` Philippe Gerum
@ 2018-04-12 15:44                 ` Steve Freyder
  2018-04-12 16:05                   ` Philippe Gerum
  0 siblings, 1 reply; 15+ messages in thread
From: Steve Freyder @ 2018-04-12 15:44 UTC (permalink / raw)
  To: Philippe Gerum, xenomai

On 4/12/2018 5:23 AM, Philippe Gerum wrote:
> On 04/12/2018 11:31 AM, Philippe Gerum wrote:
>> On 04/09/2018 01:01 AM, Steve Freyder wrote:
>>> On 4/2/2018 11:51 AM, Philippe Gerum wrote:
>>>> On 04/02/2018 06:11 PM, Steve Freyder wrote:
>>>>> On 4/2/2018 10:20 AM, Philippe Gerum wrote:
>>>>>> On 04/02/2018 04:54 PM, Steve Freyder wrote:
>>>>>>> On 4/2/2018 8:41 AM, Philippe Gerum wrote:
>>>>>>>> On 04/01/2018 07:28 PM, Steve Freyder wrote:
>>>>>>>>> Greetings again.
>>>>>>>>>
>>>>>>>>> As I understand it, for each rt_queue there's supposed to be a
>>>>>>>>> "status file" located in the fuse filesystem underneath the
>>>>>>>>> "/run/xenomai/user/session/pid/alchemy/queues" directory, with
>>>>>>>>> the file name being the queue name.  This used to contain very
>>>>>>>>> useful info about queue status, message counts, etc.  I don't know
>>>>>>>>> when it broke or whether it's something I'm doing wrong but I'm
>>>>>>>>> now getting a "memory exhausted" message on the console when I
>>>>>>>>> attempt to do a "cat" on the status file.
>>>>>>>>>
>>>>>>>>> Here's a small C program that just creates a queue, and then does
>>>>>>>>> a pause to hold the accessor count non-zero.
>>>>>>>>>
>>>>>>>> <snip>
>>>>>>>>
>>>>>>>>> The resulting output (logged in via the system console):
>>>>>>>>>
>>>>>>>>> # sh qtest.sh
>>>>>>>>> + sleep 1
>>>>>>>>> + ./qc --mem-pool-size=64M --session=mysession foo
>>>>>>>>> + find /run/xenomai
>>>>>>>>> /run/xenomai
>>>>>>>>> /run/xenomai/root
>>>>>>>>> /run/xenomai/root/mysession
>>>>>>>>> /run/xenomai/root/mysession/821
>>>>>>>>> /run/xenomai/root/mysession/821/alchemy
>>>>>>>>> /run/xenomai/root/mysession/821/alchemy/tasks
>>>>>>>>> /run/xenomai/root/mysession/821/alchemy/tasks/task@1[821]
>>>>>>>>> /run/xenomai/root/mysession/821/alchemy/queues
>>>>>>>>> /run/xenomai/root/mysession/821/alchemy/queues/foo
>>>>>>>>> /run/xenomai/root/mysession/system
>>>>>>>>> /run/xenomai/root/mysession/system/threads
>>>>>>>>> /run/xenomai/root/mysession/system/heaps
>>>>>>>>> /run/xenomai/root/mysession/system/version
>>>>>>>>> + qfile='/run/xenomai/*/*/*/alchemy/queues/foo'
>>>>>>>>> + cat /run/xenomai/root/mysession/821/alchemy/queues/foo
>>>>>>>>> memory exhausted
>>>>>>>>>
>>>>>>>>> At this point, it hangs, although SIGINT usually terminates it.
>>>>>>>>>
>>>>>>>>> I've seen some cases where SIGINT won't terminate it, and a
>>>>>>>>> reboot is
>>>>>>>>> required to clean things up.  I see this message appears to be
>>>>>>>>> logged
>>>>>>>>> in the obstack error handler.  I don't think I'm running out of
>>>>>>>>> memory,
>>>>>>>>> which makes me think "heap corruption".  Not much of an analysis!
>>>>>>>>> I did
>>>>>>>>> try varying queue sizes and max message counts - no change.
>>>>>>>>>
>>>>>>>> I can't reproduce this. I would suspect a rampant memory corruption
>>>>>>>> too,
>>>>>>>> although running the test code over valgrind (mercury build) did not
>>>>>>>> reveal any issue.
>>>>>>>>
>>>>>>>> - which Xenomai version are you using?
>>>>>>>> - cobalt / mercury ?
>>>>>>>> - do you enable the shared heap when configuring ? (--enable-pshared)
>>>>>>>>
>>>>>>> I'm using Cobalt.  uname -a reports:
>>>>>>>
>>>>>>> Linux sdftest 4.1.18_C01571-15S00-00.000.zimg+83fdace666 #2 SMP Fri
>>>>>>> Mar
>>>>>>> 9 11:07:52 CST 2018 armv7l GNU/Linux
>>>>>>>
>>>>>>> Here is the config dump:
>>>>>>>
>>>>>>> CONFIG_XENO_PSHARED=1
>>>>>> Any chance you could have some leftover files in /dev/shm from aborted
>>>>>> runs, which would steal RAM?
>>>>>>
>>>>> I've been rebooting before each test run, but I'll keep that in mind for
>>>>> future testing.
>>>>>
>>>>> Sounds like I need to try rolling back to an older build, I have a 3.0.5
>>>>> and a 3.0.3 build handy.
>>>> The standalone test should work with the shared heap disabled, could you
>>>> check it against a build configure with --disable-pshared? Thanks,
>>>>
>>> Philippe,
>>>
>>> Sorry for the delay - our vendor had been doing all of our kernel and SDK
>>> builds so I had to do a lot of learning to get this all going.
>>>
>>> With the --disable-pshared in effect:
>>>
>>> /.g3l # ./qc --dump-config | grep SHARED
>>> based on Xenomai/cobalt v3.0.6 -- #6e34bb5 (2018-04-01 10:50:59 +0200)
>>> CONFIG_XENO_PSHARED is OFF
>>>
>>> /.g3l # ./qc foo &
>>> /.g3l # find /run/xenomai/
>>> /run/xenomai/
>>> /run/xenomai/root
>>> /run/xenomai/root/opus
>>> /run/xenomai/root/opus/3477
>>> /run/xenomai/root/opus/3477/alchemy
>>> /run/xenomai/root/opus/3477/alchemy/tasks
>>> /run/xenomai/root/opus/3477/alchemy/tasks/qcreate3477
>>> /run/xenomai/root/opus/3477/alchemy/queues
>>> /run/xenomai/root/opus/3477/alchemy/queues/foo
>>> /run/xenomai/root/opus/system
>>> /run/xenomai/root/opus/system/threads
>>> /run/xenomai/root/opus/system/heaps
>>> /run/xenomai/root/opus/system/version
>>> root@ICB-G3L:/.g3l # cat run/xenomai/root/opus/3477/alchemy/queues/foo
>>> [TYPE]  [TOTALMEM]  [USEDMEM]  [QLIMIT]  [MCOUNT]
>>>   FIFO        5344       3248        10         0
>>>
>>> Perfect!
>>>
>>> What's the next step?
>>>
>> I can reproduce this issue. I'm on it.
>>
> The patch below should solve the problem for the registry, however this
> may have uncovered a bug in the "tlsf" allocator (once again), which
> should not have failed allocating memory. Two separate issues then.
>
> diff --git a/include/copperplate/registry-obstack.h
> b/include/copperplate/registry-obstack.h
> index fe192faf7..48e453bc3 100644
> --- a/include/copperplate/registry-obstack.h
> +++ b/include/copperplate/registry-obstack.h
> @@ -29,11 +29,12 @@ struct threadobj;
>   struct syncobj;
>
>   /*
> - * Assume we may want fast allocation of private memory from real-time
> - * mode when growing the obstack.
> + * Obstacks are grown from handlers called by the fusefs server
> + * thread, which has no real-time requirement: malloc/free is fine for
> + * memory management.
>    */
> -#define obstack_chunk_alloc	pvmalloc
> -#define obstack_chunk_free	pvfree
> +#define obstack_chunk_alloc	malloc
> +#define obstack_chunk_free	free
>
>   struct threadobj;
>
>
Thanks Philippe,

I shall add this to my build ASAP.  If I understand correctly, this is
switching the entire registry-obstack-related dynamic storage allocation
mechanism from the "pv" routines (TLSF allocator?) paradigm to the
standard malloc/free paradigm.

I ask because my next issue report was going to be about a SEGV that I have
been seeing occasionally in registry_add_file() after having called
pvstrdup() and having gotten a NULL return back.  The caller there
apparently does not expect a NULL return, so when you said "should not
have failed allocating memory" that brought my attention back to the SEGV
issue.  This appears to be related to what I will call "heavy registry
activity" when I am initializing - creating lots of RT tasks, queues,
mutexes, etc., causing hot activity in registry_add_file, I would expect.

I'm thinking creation of a "registry exerciser" program may be in order...
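
Roughly what I have in mind - an untested sketch along the lines of qc.c
above; the queue count and pool sizes are arbitrary, and it would be
launched the same way (e.g. with --session and --mem-pool-size):

----------qstress.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>
#include <alchemy/queue.h>
#include <alchemy/task.h>

#define NQUEUES 64	/* arbitrary; raise to increase registry pressure */

int main(
     int argc,
     char *argv[]
){
     static RT_TASK MyShadow ;
     RT_QUEUE q[NQUEUES] ;
     char name[32] ;
     int n, err ;

     mlockall(MCL_CURRENT|MCL_FUTURE);
     err = rt_task_shadow(&MyShadow,NULL,1,0) ;
     if (err < 0)  {
         fprintf(stderr,"shadow: %s\n",strerror(-err)) ;
         exit(-err) ;
     }
     /* Create many registered queues back to back to hammer registry_add_file(). */
     for (n = 0 ; n < NQUEUES ; n++)  {
         snprintf(name,sizeof(name),"stress%d",n) ;
         err = rt_queue_create(&q[n],name,16384,16,Q_FIFO) ;
         if (err < 0)  {
             fprintf(stderr,"%s: %s\n",name,strerror(-err)) ;
             return(1) ;
         }
     }
     pause() ;	/* keep the registry entries alive for inspection */
     return(0) ;
}
----------qstress.c end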




* Re: [Xenomai] Possible Xenomai fuse filesystem/registry queue status files issue?
  2018-04-12 15:44                 ` Steve Freyder
@ 2018-04-12 16:05                   ` Philippe Gerum
  2018-04-12 17:56                     ` Steve Freyder
  0 siblings, 1 reply; 15+ messages in thread
From: Philippe Gerum @ 2018-04-12 16:05 UTC (permalink / raw)
  To: Steve Freyder, xenomai

On 04/12/2018 05:44 PM, Steve Freyder wrote:
> On 4/12/2018 5:23 AM, Philippe Gerum wrote:
>> On 04/12/2018 11:31 AM, Philippe Gerum wrote:
>>> On 04/09/2018 01:01 AM, Steve Freyder wrote:
>>>> On 4/2/2018 11:51 AM, Philippe Gerum wrote:
>>>>> On 04/02/2018 06:11 PM, Steve Freyder wrote:
>>>>>> On 4/2/2018 10:20 AM, Philippe Gerum wrote:
>>>>>>> On 04/02/2018 04:54 PM, Steve Freyder wrote:
>>>>>>>> On 4/2/2018 8:41 AM, Philippe Gerum wrote:
>>>>>>>>> On 04/01/2018 07:28 PM, Steve Freyder wrote:
>>>>>>>>>> Greetings again.
>>>>>>>>>>
>>>>>>>>>> As I understand it, for each rt_queue there's supposed to be a
>>>>>>>>>> "status file" located in the fuse filesystem underneath the
>>>>>>>>>> "/run/xenomai/user/session/pid/alchemy/queues" directory, with
>>>>>>>>>> the file name being the queue name.  This used to contain very
>>>>>>>>>> useful info about queue status, message counts, etc.  I don't
>>>>>>>>>> know
>>>>>>>>>> when it broke or whether it's something I'm doing wrong but I'm
>>>>>>>>>> now getting a "memory exhausted" message on the console when I
>>>>>>>>>> attempt to do a "cat" on the status file.
>>>>>>>>>>
>>>>>>>>>> Here's a small C program that just creates a queue, and then does
>>>>>>>>>> a pause to hold the accessor count non-zero.
>>>>>>>>>>
>>>>>>>>> <snip>
>>>>>>>>>
>>>>>>>>>> The resulting output (logged in via the system console):
>>>>>>>>>>
>>>>>>>>>> # sh qtest.sh
>>>>>>>>>> + sleep 1
>>>>>>>>>> + ./qc --mem-pool-size=64M --session=mysession foo
>>>>>>>>>> + find /run/xenomai
>>>>>>>>>> /run/xenomai
>>>>>>>>>> /run/xenomai/root
>>>>>>>>>> /run/xenomai/root/mysession
>>>>>>>>>> /run/xenomai/root/mysession/821
>>>>>>>>>> /run/xenomai/root/mysession/821/alchemy
>>>>>>>>>> /run/xenomai/root/mysession/821/alchemy/tasks
>>>>>>>>>> /run/xenomai/root/mysession/821/alchemy/tasks/task@1[821]
>>>>>>>>>> /run/xenomai/root/mysession/821/alchemy/queues
>>>>>>>>>> /run/xenomai/root/mysession/821/alchemy/queues/foo
>>>>>>>>>> /run/xenomai/root/mysession/system
>>>>>>>>>> /run/xenomai/root/mysession/system/threads
>>>>>>>>>> /run/xenomai/root/mysession/system/heaps
>>>>>>>>>> /run/xenomai/root/mysession/system/version
>>>>>>>>>> + qfile='/run/xenomai/*/*/*/alchemy/queues/foo'
>>>>>>>>>> + cat /run/xenomai/root/mysession/821/alchemy/queues/foo
>>>>>>>>>> memory exhausted
>>>>>>>>>>
>>>>>>>>>> At this point, it hangs, although SIGINT usually terminates it.
>>>>>>>>>>
>>>>>>>>>> I've seen some cases where SIGINT won't terminate it, and a
>>>>>>>>>> reboot is
>>>>>>>>>> required to clean things up.  I see this message appears to be
>>>>>>>>>> logged
>>>>>>>>>> in the obstack error handler.  I don't think I'm running out of
>>>>>>>>>> memory,
>>>>>>>>>> which makes me think "heap corruption".  Not much of an analysis!
>>>>>>>>>> I did
>>>>>>>>>> try varying queue sizes and max message counts - no change.
>>>>>>>>>>
>>>>>>>>> I can't reproduce this. I would suspect a rampant memory
>>>>>>>>> corruption
>>>>>>>>> too,
>>>>>>>>> although running the test code over valgrind (mercury build)
>>>>>>>>> did not
>>>>>>>>> reveal any issue.
>>>>>>>>>
>>>>>>>>> - which Xenomai version are you using?
>>>>>>>>> - cobalt / mercury ?
>>>>>>>>> - do you enable the shared heap when configuring ?
>>>>>>>>> (--enable-pshared)
>>>>>>>>>
>>>>>>>> I'm using Cobalt.  uname -a reports:
>>>>>>>>
>>>>>>>> Linux sdftest 4.1.18_C01571-15S00-00.000.zimg+83fdace666 #2 SMP Fri
>>>>>>>> Mar
>>>>>>>> 9 11:07:52 CST 2018 armv7l GNU/Linux
>>>>>>>>
>>>>>>>> Here is the config dump:
>>>>>>>>
>>>>>>>> CONFIG_XENO_PSHARED=1
>>>>>>> Any chance you could have some leftover files in /dev/shm from
>>>>>>> aborted
>>>>>>> runs, which would steal RAM?
>>>>>>>
>>>>>> I've been rebooting before each test run, but I'll keep that in
>>>>>> mind for
>>>>>> future testing.
>>>>>>
>>>>>> Sounds like I need to try rolling back to an older build, I have a
>>>>>> 3.0.5
>>>>>> and a 3.0.3 build handy.
>>>>> The standalone test should work with the shared heap disabled,
>>>>> could you
>>>>> check it against a build configure with --disable-pshared? Thanks,
>>>>>
>>>> Philippe,
>>>>
>>>> Sorry for the delay - our vendor had been doing all of our kernel
>>>> and SDK
>>>> builds so I had to do a lot of learning to get this all going.
>>>>
>>>> With the --disable-pshared in effect:
>>>>
>>>> /.g3l # ./qc --dump-config | grep SHARED
>>>> based on Xenomai/cobalt v3.0.6 -- #6e34bb5 (2018-04-01 10:50:59 +0200)
>>>> CONFIG_XENO_PSHARED is OFF
>>>>
>>>> /.g3l # ./qc foo &
>>>> /.g3l # find /run/xenomai/
>>>> /run/xenomai/
>>>> /run/xenomai/root
>>>> /run/xenomai/root/opus
>>>> /run/xenomai/root/opus/3477
>>>> /run/xenomai/root/opus/3477/alchemy
>>>> /run/xenomai/root/opus/3477/alchemy/tasks
>>>> /run/xenomai/root/opus/3477/alchemy/tasks/qcreate3477
>>>> /run/xenomai/root/opus/3477/alchemy/queues
>>>> /run/xenomai/root/opus/3477/alchemy/queues/foo
>>>> /run/xenomai/root/opus/system
>>>> /run/xenomai/root/opus/system/threads
>>>> /run/xenomai/root/opus/system/heaps
>>>> /run/xenomai/root/opus/system/version
>>>> root@ICB-G3L:/.g3l # cat run/xenomai/root/opus/3477/alchemy/queues/foo
>>>> [TYPE]  [TOTALMEM]  [USEDMEM]  [QLIMIT]  [MCOUNT]
>>>>   FIFO        5344       3248        10         0
>>>>
>>>> Perfect!
>>>>
>>>> What's the next step?
>>>>
>>> I can reproduce this issue. I'm on it.
>>>
>> The patch below should solve the problem for the registry, however this
>> may have uncovered a bug in the "tlsf" allocator (once again), which
>> should not have failed allocating memory. Two separate issues then.
>>
>> diff --git a/include/copperplate/registry-obstack.h
>> b/include/copperplate/registry-obstack.h
>> index fe192faf7..48e453bc3 100644
>> --- a/include/copperplate/registry-obstack.h
>> +++ b/include/copperplate/registry-obstack.h
>> @@ -29,11 +29,12 @@ struct threadobj;
>>   struct syncobj;
>>
>>   /*
>> - * Assume we may want fast allocation of private memory from real-time
>> - * mode when growing the obstack.
>> + * Obstacks are grown from handlers called by the fusefs server
>> + * thread, which has no real-time requirement: malloc/free is fine for
>> + * memory management.
>>    */
>> -#define obstack_chunk_alloc    pvmalloc
>> -#define obstack_chunk_free    pvfree
>> +#define obstack_chunk_alloc    malloc
>> +#define obstack_chunk_free    free
>>
>>   struct threadobj;
>>
>>
> Thanks Philippe,
> 
> I shall add this to my build ASAP.  If I understand correctly, this is
> switching the entire registry-obstack-related dynamic storage allocation
> mechanism from the "pv" routines (TLSF allocator?) paradigm to the
> standard malloc/free paradigm.

Yes, because the context that runs those calls is a non-rt one in
essence, i.e. the fuse filesystem server thread.
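
For readers who have not met GNU obstacks before, the two macros above are
the only memory-management hooks the obstack library exposes: every chunk an
obstack grows into is requested through obstack_chunk_alloc, and the default
allocation-failure handler is what prints the "memory exhausted" message seen
in the trace above.  The stand-alone sketch below (plain glibc, not Xenomai
source; the file name is made up for illustration) shows how those hooks are
consumed.  After the patch, the registry obstacks pull their chunks the same
way, only from malloc/free instead of pvmalloc/pvfree.

----------obstack-demo.c (illustrative sketch)
#include <stdio.h>
#include <stdlib.h>
#include <obstack.h>

/* obstack.h routes every chunk request through these two hooks. */
#define obstack_chunk_alloc malloc
#define obstack_chunk_free  free

int main(void)
{
    struct obstack ob;
    char *s;

    obstack_init(&ob);                /* first chunk comes from obstack_chunk_alloc */
    obstack_grow(&ob, "status: ", 8);
    obstack_grow0(&ob, "FIFO", 4);    /* append and NUL-terminate */
    s = obstack_finish(&ob);          /* close the object being grown */
    printf("%s\n", s);
    obstack_free(&ob, NULL);          /* all chunks go back through obstack_chunk_free */
    return 0;
}
----------obstack-demo.c end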

> 
> I ask because my next issue report was going to be about a SEGV that I
> have been seeing occasionally in registry_add_file() after having called
> pvstrdup() and having gotten a NULL return back.  The caller there
> apparently does not expect a NULL return, so when you said "should not
> have failed allocating memory", that brought my attention back to the
> SEGV issue.  This appears to be related to what I will call "heavy
> registry activity" when I am initializing - creating lots of RT tasks,
> queues, mutexes, etc., causing hot activity in registry_add_file, I
> would expect.

Your assumption is spot on. There is a flaw in the way the private
memory allocator is sized in pshared mode (i.e. tlsf in that case). Only
8k are reserved for the main private pool pvmalloc() refers to; that
size was picked in order to limit the amount of locked memory reserved
by an allocator which is usually under low pressure when pshared is
enabled. The consequence of random pvmalloc() calls implicitly pulling
memory from that heap was overlooked.

The problem is that high registry activity defeats exactly that
assumption, consuming much more memory as obstacks grow with printout data.

I've been converting pvmalloc() calls to plain malloc() when applicable
lately, but we may still need a way to let the user specify which amount
of private memory would be required for the remaining pvmalloc() calls.

> 
> I'm thinking creation of a "registry exerciser" program may be in order...
> 
> 

Yeah, possibly. The fuse-based registry was assumed to be straightforward
to implement in theory, but ended up with lots of brain-damaging corner cases.
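
For what it's worth, a minimal exerciser along those lines could be as small
as the sketch below.  It only uses documented Alchemy calls, is untested, and
every name, count and size in it is an arbitrary pick rather than a value
taken from this thread; the idea is simply to create a burst of named objects
so that registry_add_file() and the fusefs handlers get exercised while
/run/xenomai is being populated.

----------regstress.c (illustrative sketch, untested)
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>
#include <alchemy/task.h>
#include <alchemy/queue.h>

#define NOBJ 128    /* arbitrary burst size */

int main(int argc, char *argv[])
{
    static RT_TASK shadow;
    static RT_QUEUE queues[NOBJ];
    char name[32];
    int i, err;

    (void)argc;
    (void)argv;
    mlockall(MCL_CURRENT | MCL_FUTURE);

    err = rt_task_shadow(&shadow, NULL, 1, 0);
    if (err < 0) {
        fprintf(stderr, "shadow: %s\n", strerror(-err));
        return 1;
    }

    /* Create many named queues back to back; each creation adds a node
     * to the fuse registry.  Sizes are arbitrary; raise --mem-pool-size
     * on the command line if the session heap runs out first. */
    for (i = 0; i < NOBJ; i++) {
        snprintf(name, sizeof(name), "stress-q%03d", i);
        err = rt_queue_create(&queues[i], name, 16384, 16, Q_FIFO);
        if (err < 0) {
            fprintf(stderr, "queue %d: %s\n", i, strerror(-err));
            return 1;
        }
    }

    /* Hold the objects so the /run/xenomai entries can be inspected. */
    pause();
    return 0;
}
----------regstress.c end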

-- 
Philippe.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Xenomai] Possible Xenomai fuse filesystem/registry queue status files issue?
  2018-04-12 16:05                   ` Philippe Gerum
@ 2018-04-12 17:56                     ` Steve Freyder
  2018-04-13  6:36                       ` Philippe Gerum
  0 siblings, 1 reply; 15+ messages in thread
From: Steve Freyder @ 2018-04-12 17:56 UTC (permalink / raw)
  To: Philippe Gerum, xenomai

On 4/12/2018 11:05 AM, Philippe Gerum wrote:
> <snip>
>> I ask because my next issue report was going to be about a SEGV that I
>> have been seeing occasionally in registry_add_file() after having called
>> pvstrdup() and having gotten a NULL return back.  The caller there
>> apparently does not expect a NULL return, so when you said "should not
>> have failed allocating memory", that brought my attention back to the
>> SEGV issue.  This appears to be related to what I will call "heavy
>> registry activity" when I am initializing - creating lots of RT tasks,
>> queues, mutexes, etc., causing hot activity in registry_add_file, I
>> would expect.
> Your assumption is spot on. There is a flaw in the way the private
> memory allocator is sized in pshared mode (i.e. tlsf in that case). Only
> 8k are reserved for the main private pool pvmalloc() refers to; that
> size was picked in order to limit the amount of locked memory reserved
> by an allocator which is usually under low pressure when pshared is
> enabled. The consequence of random pvmalloc() calls implicitly pulling
> memory from that heap was overlooked.
>
> The problem is that high registry activity defeats exactly that
> assumption, consuming much more memory as obstacks grow with printout data.
>
> I've been converting pvmalloc() calls to plain malloc() when applicable
> lately, but we may still need a way to let the user specify which amount
> of private memory would be required for the remaining pvmalloc() calls.
>
>> I'm thinking creation of a "registry exerciser" program may be in order...
>>
>>
> Yeah, possibly. The fuse-based registry was assumed to be straightforward
> to implement in theory, but ended up with lots of brain-damaging corner cases.
>
That 8K allocation, is that on a "per-process" basis?  I had started
breaking my code into more processes because I had an inkling that
there was some kind of limit on how many registry entries a process
could have - maybe now I understand where that limit originates!

Do you think that an increase from 8 to 64K would be acceptable?  A
"temporary workaround" at best, but I'm pretty sure that would be
enough to get my code running in short order.

Steve



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Xenomai] Possible Xenomai fuse filesystem/registry queue status files issue?
  2018-04-12 17:56                     ` Steve Freyder
@ 2018-04-13  6:36                       ` Philippe Gerum
  2018-04-13 16:25                         ` Steve Freyder
  0 siblings, 1 reply; 15+ messages in thread
From: Philippe Gerum @ 2018-04-13  6:36 UTC (permalink / raw)
  To: Steve Freyder, xenomai

On 04/12/2018 07:56 PM, Steve Freyder wrote:
> <snip>
> That 8K allocation, is that on a "per-process" basis?  I had started
> breaking my code into more processes because I had an inkling that
> there was some kind of limit on how many registry entries a process
> could have - maybe now I understand where that limit originates!
> 
> Do you think that an increase from 8 to 64K would be acceptable?  A
> "temporary workaround" at best, but I'm pretty sure that would be
> enough to get my code running in short order.
> 

As a workaround, just raise MIN_TLSF_HEAPSZ to the value you see fit.
There is no limitation on the number of registry entries whatsoever,
only transient memory pressure on the private allocator with FUSE when
running pshared, which can lead to OOM situations.
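
Concretely, that stop-gap is a one-line change followed by a rebuild of the
user libraries.  The snippet below is only a sketch: the file location
(lib/copperplate/heapobj-tlsf.c) and the exact original expression are
assumptions, and the 64k figure simply follows Steve's suggestion above, so
locate the actual definition of MIN_TLSF_HEAPSZ in your tree and adjust it
there.

/*
 * Workaround sketch only: the file location and original expression are
 * assumed, and the 64k value follows Steve's suggestion above.
 */
#define MIN_TLSF_HEAPSZ  (64 * 1024)    /* default private pool is only 8k */

Then rebuild the Xenomai user libraries (and relink the applications if
copperplate is linked statically) so the larger private pool takes effect.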

The situation may get worse because TLSF is really bad at dealing with
internal fragmentation. I'm working on a drop-in replacement for the
-next branch for this reason.

-- 
Philippe.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Xenomai] Possible Xenomai fuse filesystem/registry queue status files issue?
  2018-04-13  6:36                       ` Philippe Gerum
@ 2018-04-13 16:25                         ` Steve Freyder
  0 siblings, 0 replies; 15+ messages in thread
From: Steve Freyder @ 2018-04-13 16:25 UTC (permalink / raw)
  To: Philippe Gerum, xenomai

On 4/13/2018 1:36 AM, Philippe Gerum wrote:
> <snip>
> As a workaround, just raise MIN_TLSF_HEAPSZ to the value you see fit.
> There is no limitation on the number of registry entries whatsoever,
> only transient memory pressure on the private allocator with FUSE when
> running pshared, which can lead to OOM situations.
>
> The situation may get worse because TLSF is really bad at dealing with
> internal fragmentation. I'm working on a drop-in replacement for the
> -next branch for this reason.
>
I will do that.

Thank you for all of this and the associated enlightenment, very
much a pleasure.
--
Steve



^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2018-04-13 16:25 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-01 17:28 [Xenomai] Possible Xenomai fuse filesystem/registry queue status files issue? Steve Freyder
2018-04-02 13:41 ` Philippe Gerum
2018-04-02 14:54   ` Steve Freyder
2018-04-02 15:20     ` Philippe Gerum
2018-04-02 16:11       ` Steve Freyder
2018-04-02 16:51         ` Philippe Gerum
2018-04-08 23:01           ` Steve Freyder
2018-04-11 14:37             ` Philippe Gerum
2018-04-12  9:31             ` Philippe Gerum
2018-04-12 10:23               ` Philippe Gerum
2018-04-12 15:44                 ` Steve Freyder
2018-04-12 16:05                   ` Philippe Gerum
2018-04-12 17:56                     ` Steve Freyder
2018-04-13  6:36                       ` Philippe Gerum
2018-04-13 16:25                         ` Steve Freyder
