* [Xenomai] Xv3.0.5 /Lv4.9.35 on wandboard: page domain fault in rtudp.ko
@ 2017-07-15 20:37 Andreas Glatz
  2017-07-17 15:12 ` Philippe Gerum
  0 siblings, 1 reply; 5+ messages in thread
From: Andreas Glatz @ 2017-07-15 20:37 UTC (permalink / raw)
  To: xenomai

Hi

I managed to compile and bring-up Xenomai v3.0.5, Ipipe v2, Linux
v4.9.35 on a Wandboard Quad C1 [1]. I ran all unit-tests successfully
[2] and was able to get RTnet ping to an external PC working. RT ping
stops working after about 1000 pings... but that's a different story,
I guess.

So when I try to get a simple UDP server [3] running on top of RTnet I
get "Unhandled fault: page domain fault (0x01b)" oopses, where the
stacktrace indicates it has to do with rtudp.ko, e.g.
rt_udp_recvmsg+03c/0x38c. Further googling [4] told me that these
faults are related to missing copy_from_user() calls in the udp
driver, which seems to have been enforced in the most recent kernel
versions (after 4.1.x?). So if I go into the rtudp.ko source code and
start introducing copy_to_user() statements, the page domain
faults go away one after the other...
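
For illustration, the receive path in question can be exercised from
user space with a server along the following lines. This is only a
minimal sketch, assuming plain POSIX socket calls; udp_bind() and
udp_echo_once() are hypothetical helper names, and this is neither the
code from [3] nor the exact program that triggered the oops.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create a UDP socket bound to addr:port (both in
 * host byte order). Port 0 lets the stack pick a free port.
 * Returns the socket descriptor, or -1 on error. */
static int udp_bind(uint32_t addr, uint16_t port)
{
	struct sockaddr_in sa;
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	if (fd < 0)
		return -1;

	memset(&sa, 0, sizeof(sa));
	sa.sin_family = AF_INET;
	sa.sin_addr.s_addr = htonl(addr);
	sa.sin_port = htons(port);

	if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Hypothetical helper: receive one datagram and echo it back to the
 * sender. Returns the number of bytes echoed, or -1 on error.
 * buf and peer are user-space buffers that the kernel side has to fill. */
static ssize_t udp_echo_once(int fd)
{
	char buf[512];
	struct sockaddr_in peer;
	socklen_t plen = sizeof(peer);
	ssize_t n;

	n = recvfrom(fd, buf, sizeof(buf), 0, (struct sockaddr *)&peer, &plen);
	if (n < 0)
		return -1;

	return sendto(fd, buf, (size_t)n, 0, (struct sockaddr *)&peer, plen);
}
```

Each recvfrom() hands the kernel two user-space buffers (buf and peer),
which is presumably why the fault shows up in the recvmsg path of the
protocol driver.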

At my company I basically want to propose to upgrade our current
kernel (3.0.35) with a more recent one. So I'm looking for a hint
what's the best+latest+supported kernel version for i.MX6 in a
production grade setup, which can be patched with Xenomai3 (incl.
RTnet)?

Cheers,

Andreas



[1] https://www.wandboard.org
[2] https://xenomai.org/installing-xenomai-3-x/#Testing_the_real-time_system_both_cores
[3] http://www.binarytides.com/programming-udp-sockets-c-linux/
[4] https://stackoverflow.com/questions/39515407/how-to-handle-a-page-domain-fault-in-a-self-written-character-device-kernel-modu



* Re: [Xenomai] Xv3.0.5 /Lv4.9.35 on wandboard: page domain fault in rtudp.ko
  2017-07-15 20:37 [Xenomai] Xv3.0.5 /Lv4.9.35 on wandboard: page domain fault in rtudp.ko Andreas Glatz
@ 2017-07-17 15:12 ` Philippe Gerum
  2017-07-19  9:26   ` Andreas Glatz
  0 siblings, 1 reply; 5+ messages in thread
From: Philippe Gerum @ 2017-07-17 15:12 UTC (permalink / raw)
  To: Andreas Glatz, xenomai

On 07/15/2017 10:37 PM, Andreas Glatz wrote:
> Hi
> 
> I managed to compile and bring-up Xenomai v3.0.5, Ipipe v2, Linux
> v4.9.35 on a Wandboard Quad C1 [1]. I ran all unit-tests successfully
> [2] and was able to get RTnet ping to an external PC working. RT ping
> stops working after about 1000 pings... but that's a different story,
> I guess.
> 
> So when I try to get a simple UDP server [3] running on top of RTnet I
> get "Unhandled fault: page domain fault (0x01b)" oopses, where the
> stacktrace indicates it has to do with rtudp.ko, e.g.
> rt_udp_recvmsg+03c/0x38c. Further googling [4] told me that these
> faults are related to missing copy_from_user() calls in the udp
> driver, which seems to have been enforced in the most recent kernel
> versions (after 4.1.x?). So if I go into the rtudp.ko source code and
> start introducing copy_to_user() statements, the page domain
> faults go away one after the other...
> 
> At my company I basically want to propose to upgrade our current
> kernel (3.0.35) with a more recent one. So I'm looking for a hint
> what's the best+latest+supported kernel version for i.MX6 in a
> production grade setup, which can be patched with Xenomai3 (incl.
> RTnet)?
> 

I've been working on i.MX6 (4/2/1) + Xenomai 3.x for the last couple of
years over kernels 3.18, 4.1 and 4.4, for production grade software: so
far so good, provided one picks the latest I-pipe patches available.

Although Xenomai 3 over kernel 4.9/arm is a bit younger, I have been
using this combo recently on i.MX6 for (Xenomai) development purposes
successfully, so far.

I don't think that preferring any of the kernel releases mentioned above
among others should have a visible impact on RTnet user-wise. I only
tried with kernel 4.9 though.

-- 
Philippe.



* Re: [Xenomai] Xv3.0.5 /Lv4.9.35 on wandboard: page domain fault in rtudp.ko
  2017-07-17 15:12 ` Philippe Gerum
@ 2017-07-19  9:26   ` Andreas Glatz
  2017-07-19 13:13     ` Philippe Gerum
  0 siblings, 1 reply; 5+ messages in thread
From: Andreas Glatz @ 2017-07-19  9:26 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai

>>
>> I managed to compile and bring-up Xenomai v3.0.5, Ipipe v2, Linux
>> v4.9.35 on a Wandboard Quad C1 [1]. I ran all unit-tests successfully
>> [2] and was able to get RTnet ping to an external PC working. RT ping
>> stops working after about 1000 pings... but that's a different story,
>> I guess.
>>
>> So when I try to get a simple UDP server [3] running on top of RTnet I
>> get "Unhandled fault: page domain fault (0x01b)" oopses, where the
>> stacktrace indicates it has to do with rtudp.ko, e.g.
>> rt_udp_recvmsg+03c/0x38c. Further googling [4] told me that these
>> faults are related to missing copy_from_user() calls in the udp
>> driver, which seems to have been enforced in the most recent kernel
>> versions (after 4.1.x?). So if I go into the rtudp.ko source code and
>> start introducing copy_to_user() statements, the page domain
>> faults go away one after the other...

I got the 4.1.x kernel running with the latest Ipipe/Xenomai patch and
this solves the issue I was seeing with the 4.9.x kernel: The UDP
server works now on top of the unpatched RTnet and I can communicate
with it through rteth0 (rt_fec.ko) from a remote computer. So,
something must have changed between 4.1.x (also potentially 4.4.x) and
4.9.x in terms of syscall interface. I'll also experiment to get the
4.4.x kernel running...

W.r.t. failing ping tests after 1000 rtpings: I think that's related to
too few rtskbufs. Recently, I increased the rtskbuf pool to 256 (cat
/proc/rtnet/rtskb). With this the rtpings keep on working beyond the
1000 rtping mark. It is interesting though that I didn't get any kernel
warning or error messages, which would tell me that I actually ran out
of rtskbufs - ping just silently stopped working...

>>
>> At my company I basically want to propose to upgrade our current
>> kernel (3.0.35) with a more recent one. So I'm looking for a hint
>> what's the best+latest+supported kernel version for i.MX6 in a
>> production grade setup, which can be patched with Xenomai3 (incl.
>> RTnet)?
>>
>
> I've been working on i.MX6 (4/2/1) + Xenomai 3.x for the last couple of
> years over kernels 3.18, 4.1 and 4.4, for production grade software: so
> far so good, provided one picks the latest I-pipe patches available.
>
> Although Xenomai 3 over kernel 4.9/arm is a bit younger, I have been
> using this combo recently on i.MX6 for (Xenomai) development purposes
> successfully, so far.
>
> I don't think that preferring any of the kernel releases mentioned above
> among others should have a visible impact on RTnet user-wise. I only
> tried with kernel 4.9 though.
>

Thanks Philippe. As said further above, I'll try Xenomai/RTnet on top
of 4.4.x as well and see if that works for me...



* Re: [Xenomai] Xv3.0.5 /Lv4.9.35 on wandboard: page domain fault in rtudp.ko
  2017-07-19  9:26   ` Andreas Glatz
@ 2017-07-19 13:13     ` Philippe Gerum
  2017-07-19 14:46       ` Philippe Gerum
  0 siblings, 1 reply; 5+ messages in thread
From: Philippe Gerum @ 2017-07-19 13:13 UTC (permalink / raw)
  To: Andreas Glatz; +Cc: xenomai

On 07/19/2017 11:26 AM, Andreas Glatz wrote:
>>>
>>> I managed to compile and bring-up Xenomai v3.0.5, Ipipe v2, Linux
>>> v4.9.35 on a Wandboard Quad C1 [1]. I ran all unit-tests successfully
>>> [2] and was able to get RTnet ping to an external PC working. RT ping
>>> stops working after about 1000 pings... but that's a different story,
>>> I guess.
>>>
>>> So when I try to get a simple UDP server [3] running on top of RTnet I
>>> get "Unhandled fault: page domain fault (0x01b)" oopses, where the
>>> stacktrace indicates it has to do with rtudp.ko, e.g.
>>> rt_udp_recvmsg+03c/0x38c. Further googling [4] told me that these
>>> faults are related to missing copy_from_user() calls in the udp
>>> driver, which seems to have been enforced in the most recent kernel
>>> versions (after 4.1.x?). So if I go into the rtudp.ko source code and
>>> start introducing copy_to_user() statements, the page domain
>>> faults go away one after the other...
> 
> I got the 4.1.x kernel running with the latest Ipipe/Xenomai patch and
> this solves the issue I was seeing with the 4.9.x kernel: The UDP
> server works now on top of the unpatched RTnet and I can communicate
> with it through rteth0 (rt_fec.ko) from a remote computer. So,
> something must have changed between 4.1.x (also potentially 4.4.x) and
> 4.9.x in terms of syscall interface. I'll also experiment to get the
> 4.4.x kernel running...
> 
> W.r.t. failing ping tests after 1000 rtpings: I think that's related to
> too few rtskbufs. Recently, I increased the rtskbuf pool to 256 (cat
> /proc/rtnet/rtskb). With this the rtpings keep on working beyond the
> 1000 rtping mark. It is interesting though that I didn't get any kernel
> warning or error messages, which would tell me that I actually ran out
> of rtskbufs - ping just silently stopped working...
> 
>>>
>>> At my company I basically want to propose to upgrade our current
>>> kernel (3.0.35) with a more recent one. So I'm looking for a hint
>>> what's the best+latest+supported kernel version for i.MX6 in a
>>> production grade setup, which can be patched with Xenomai3 (incl.
>>> RTnet)?
>>>
>>
>> I've been working on i.MX6 (4/2/1) + Xenomai 3.x for the last couple of
>> years over kernels 3.18, 4.1 and 4.4, for production grade software: so
>> far so good, provided one picks the latest I-pipe patches available.
>>
>> Although Xenomai 3 over kernel 4.9/arm is a bit younger, I have been
>> using this combo recently on i.MX6 for (Xenomai) development purposes
>> successfully, so far.
>>
>> I don't think that preferring any of the kernel releases mentioned above
>> among others should have a visible impact on RTnet user-wise. I only
>> tried with kernel 4.9 though.
>>
> 
> Thanks Philippe. As said further above, I'll try Xenomai/RTnet on top
> of 4.4.x as well and see if that works for me...
> 

That would help, thanks. I'm currently chasing what looks like a mm
breakage with ipipe-4.9 upgraded to 4.9.38 over a quad-core Cortex A53
(rpi3) which I did not detect after weeks running the same kernel over
several A9-based SoCs. The issues we are both seeing may be rooted at
the same place.

-- 
Philippe.



* Re: [Xenomai] Xv3.0.5 /Lv4.9.35 on wandboard: page domain fault in rtudp.ko
  2017-07-19 13:13     ` Philippe Gerum
@ 2017-07-19 14:46       ` Philippe Gerum
  0 siblings, 0 replies; 5+ messages in thread
From: Philippe Gerum @ 2017-07-19 14:46 UTC (permalink / raw)
  To: Andreas Glatz; +Cc: xenomai

On 07/19/2017 03:13 PM, Philippe Gerum wrote:
> On 07/19/2017 11:26 AM, Andreas Glatz wrote:
>>>>
>>>> I managed to compile and bring-up Xenomai v3.0.5, Ipipe v2, Linux
>>>> v4.9.35 on a Wandboard Quad C1 [1]. I ran all unit-tests successfully
>>>> [2] and was able to get RTnet ping to an external PC working. RT ping
>>>> stops working after about 1000 pings... but that's a different story,
>>>> I guess.
>>>>
>>>> So when I try to get a simple UDP server [3] running on top of RTnet I
>>>> get "Unhandled fault: page domain fault (0x01b)" oopses, where the
>>>> stacktrace indicates it has to do with rtudp.ko, e.g.
>>>> rt_udp_recvmsg+03c/0x38c. Further googling [4] told me that these
>>>> faults are related to missing copy_from_user() calls in the udp
>>>> driver, which seems to have been enforced in the most recent kernel
>>>> versions (after 4.1.x?). So if I go into the rtudp.ko source code and
>>>> start introducing copy_to_user() statements, the page domain
>>>> faults go away one after the other...
>>
>> I got the 4.1.x kernel running with the latest Ipipe/Xenomai patch and
>> this solves the issue I was seeing with the 4.9.x kernel: The UDP
>> server works now on top of the unpatched RTnet and I can communicate
>> with it through rteth0 (rt_fec.ko) from a remote computer. So,
>> something must have changed between 4.1.x (also potentially 4.4.x) and
>> 4.9.x in terms of syscall interface. I'll also experiment to get the
>> 4.4.x kernel running...
>>
>> W.r.t. failing ping tests after 1000 rtpings: I think that's related to
>> too few rtskbufs. Recently, I increased the rtskbuf pool to 256 (cat
>> /proc/rtnet/rtskb). With this the rtpings keep on working beyond the
>> 1000 rtping mark. It is interesting though that I didn't get any kernel
>> warning or error messages, which would tell me that I actually ran out
>> of rtskbufs - ping just silently stopped working...
>>
>>>>
>>>> At my company I basically want to propose to upgrade our current
>>>> kernel (3.0.35) with a more recent one. So I'm looking for a hint
>>>> what's the best+latest+supported kernel version for i.MX6 in a
>>>> production grade setup, which can be patched with Xenomai3 (incl.
>>>> RTnet)?
>>>>
>>>
>>> I've been working on i.MX6 (4/2/1) + Xenomai 3.x for the last couple of
>>> years over kernels 3.18, 4.1 and 4.4, for production grade software: so
>>> far so good, provided one picks the latest I-pipe patches available.
>>>
>>> Although Xenomai 3 over kernel 4.9/arm is a bit younger, I have been
>>> using this combo recently on i.MX6 for (Xenomai) development purposes
>>> successfully, so far.
>>>
>>> I don't think that preferring any of the kernel releases mentioned above
>>> among others should have a visible impact on RTnet user-wise. I only
>>> tried with kernel 4.9 though.
>>>
>>
>> Thanks Philippe. As said further above, I'll try Xenomai/RTnet on top
>> of 4.4.x as well and see if that works for me...
>>
> 
> That would help, thanks. I'm currently chasing what looks like a mm
> breakage with ipipe-4.9 upgraded to 4.9.38 over a quad-core Cortex A53
> (rpi3) which I did not detect after weeks running the same kernel over
> several A9-based SoCs. The issues we are both seeing may be rooted at
> the same place.
> 

Ok, so on my end, ipipe-4.9 HEAD (4.9.24) is fine on the several
Cortex-A based SoCs I have tested so far, including the RPI3 in 32bit mode. The
breakage may have been introduced by a wrong fixup when upgrading from
.24 to .38 locally, or a problem with the RPI vendor tree I initially
worked on. Anyway, we can target mainline for RPI, this looks mature
enough Xenomai-wise.

So I believe that downgrading to 4.1 specifically "solves" (more
precisely does not trigger) the copy_to/from_user issue with page
domains RTnet suffers from, instead of revealing any nasty mm bug in
later I-pipe/kernel releases.
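
To make the distinction concrete, here is a user-space mock of the
pattern at stake. Nothing in it is RTnet code: mock_user_mem,
mock_access_ok(), mock_copy_to_user() and both recv_*() handlers are
hypothetical stand-ins. The mock helper follows the kernel's
copy_to_user() convention of returning the number of bytes that could
not be copied. As an assumption about the cause: ARM kernels from 4.3
on can enable CONFIG_CPU_SW_DOMAIN_PAN, under which a direct
dereference of a user pointer from kernel code faults, whereas 4.1 has
no such check. In an RTDM driver the real calls would presumably be
the rtdm_copy_to_user() family rather than the mock below.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mock "user" memory window; any pointer outside it counts as invalid. */
static char mock_user_mem[64];

/* Hypothetical stand-in for the kernel's user-range check. */
static int mock_access_ok(const void *p, size_t n)
{
	uintptr_t start = (uintptr_t)mock_user_mem;
	uintptr_t addr = (uintptr_t)p;

	return addr >= start && n <= sizeof(mock_user_mem) &&
	       addr - start <= sizeof(mock_user_mem) - n;
}

/* Same return convention as copy_to_user(): bytes NOT copied. */
static unsigned long mock_copy_to_user(void *to, const void *from,
				       unsigned long n)
{
	if (!mock_access_ok(to, n))
		return n;
	memcpy(to, from, n);
	return 0;
}

/* Unchecked variant: writes straight through the user pointer. This is
 * the pattern that goes unnoticed without enforcement and would fault
 * with it. Only safe here when ubuf really is valid. */
static long recv_unchecked(void *ubuf, size_t ulen,
			   const void *pkt, size_t plen)
{
	size_t n = plen < ulen ? plen : ulen;

	memcpy(ubuf, pkt, n);
	return (long)n;
}

/* Checked variant: every access to the user buffer goes through the
 * copy helper, and a failed copy is reported as -EFAULT. */
static long recv_checked(void *ubuf, size_t ulen,
			 const void *pkt, size_t plen)
{
	size_t n = plen < ulen ? plen : ulen;

	if (mock_copy_to_user(ubuf, pkt, n))
		return -EFAULT;
	return (long)n;
}
```

With a valid buffer both variants behave the same, which matches the
observation that the unpatched code appears to work on 4.1. Only the
checked variant turns a bad pointer into an error code instead of a
fault.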

-- 
Philippe.


