* AMP on an SMP system
@ 2013-08-02  8:33 Michael Schnell
  2013-08-02 11:42 ` Robert Schwebel
                   ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Michael Schnell @ 2013-08-02  8:33 UTC (permalink / raw)
  To: linux-embedded

Hi Experts.

Is there a kind of "official" way to set aside one of the available 
cores in an SMP system from the Linux OS to do deeply embedded 
extremely-low-latency stuff in a kind of single task "main loop" type 
environment ? I.e. creating a true coprocessor from an SMP hardware.

Some of the problems that come to mind here include:

  - how to make the Linux initialization ignore one of the available 
cores  or free a core later on ?
Here I found this:
http://www.linuxtopia.org/online_books/linux_kernel/kernel_configuration/re46.html
So using one of the four cores for special purpose in fact is viable.

  - how to have  a Linux task start the free running main loop ?

  - how to assign certain interrupts to that core and have ISRs run 
there only dedicatedly interrupting the "main loop" and not ever being 
blocked by any Linux activity ?
here I found this:
https://access.redhat.com/site/solutions/15482
In fact of course the hardware defines if/how a certain Interrupt can be 
assigned to a certain CPU. How is this usually done when using ARM 
Cortex A9+ cores ?

  - what about MMU issues ?

  - how to have a Linux application communicate with the non-Linux 
application running on the dedicated core ?
Here I found this:
http://lwn.net/Articles/464391
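
A minimal sketch for the first question, assuming the kernel was booted
with a parameter such as `isolcpus=3` (core stays online but is excluded
from normal scheduling) or `maxcpus=3` (core is never brought up). It only
parses the kernel's CPU list format, for example the contents of
/sys/devices/system/cpu/online, so a program can check which cores Linux
actually uses. The helper names are hypothetical.

```python
def parse_cpulist(text):
    """Parse the kernel's CPU list format, e.g. "0-2,4", into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty list, e.g. no CPUs in this category
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def read_cpulist(path="/sys/devices/system/cpu/online"):
    """Read one of the sysfs CPU list files and return it as a set."""
    with open(path) as f:
        return parse_cpulist(f.read())
```

On a quad-core booted with `maxcpus=3`, `read_cpulist()` would be expected
to report cores 0 to 2 only; with `isolcpus=3` all four stay online and the
isolation only affects where the scheduler places tasks.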


For example, I would like to take a (now rather cheap) standard quad-core 
ARM Cortex A9 processor chip and modify a Debian distribution in a way 
that supports this.

Thanks for any pointers.

-Michael


* Re: AMP on an SMP system
  2013-08-02  8:33 AMP on an SMP system Michael Schnell
@ 2013-08-02 11:42 ` Robert Schwebel
  2013-08-02 12:13   ` Michael Schnell
  2013-08-04 21:28 ` Lambrecht Jürgen
  2013-08-08  7:41 ` Michael Schnell
  2 siblings, 1 reply; 25+ messages in thread
From: Robert Schwebel @ 2013-08-02 11:42 UTC (permalink / raw)
  To: Michael Schnell; +Cc: linux-embedded

On Fri, Aug 02, 2013 at 10:33:37AM +0200, Michael Schnell wrote:
> Is there a kind of "official" way to set aside one of the available
> cores in an SMP system from the Linux OS to do deeply embedded
> extremely-low-latency stuff in a kind of single task "main loop" type
> environment ? I.e. creating a true coprocessor from an SMP hardware.

Before hacking around (which might also lead to interesting solutions),
I would start using a kernel with preempt-rt support and play with the
cpu affinity:

http://lxr.linux.no/#linux+v3.10.4/Documentation/kernel-parameters.txt#L1257
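
A minimal sketch of the user-space side of CPU affinity, assuming Python 3
on Linux. It pins the calling process to a single core through the
sched_setaffinity interface; `pin_to_cpu` is a hypothetical helper name,
and choosing the highest-numbered allowed core is only an illustrative
default.

```python
import os


def pin_to_cpu(cpu=None):
    """Restrict the calling process to one CPU and return that CPU number."""
    allowed = os.sched_getaffinity(0)  # CPUs this process may currently use
    if cpu is None:
        cpu = max(allowed)  # illustrative default: highest-numbered core
    if cpu not in allowed:
        raise ValueError(f"CPU {cpu} is not in the allowed set {sorted(allowed)}")
    os.sched_setaffinity(0, {cpu})  # pid 0 means the calling process
    return cpu
```

Affinity only keeps this task on the chosen core. Keeping other tasks off
that core needs something in addition, such as the `isolcpus` boot
parameter or a cpuset.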

rsc
-- 
Pengutronix e.K.                           |                             |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |


* Re: AMP on an SMP system
  2013-08-02 11:42 ` Robert Schwebel
@ 2013-08-02 12:13   ` Michael Schnell
  2013-08-02 14:53     ` Marco Stornelli
  0 siblings, 1 reply; 25+ messages in thread
From: Michael Schnell @ 2013-08-02 12:13 UTC (permalink / raw)
  To: Robert Schwebel; +Cc: linux-embedded

On 08/02/2013 01:42 PM, Robert Schwebel wrote:
> Before hacking around (which might also lead to interesting solutions),
> I would start using a kernel with preempt-rt support and play with the
> cpu affinity:
>
> http://lxr.linux.no/#linux+v3.10.4/Documentation/kernel-parameters.txt#L1257
>

Robert !
Nice to see you here (I do own your "Embedded Linux Handbuch für 
Entwickler" :-) )

Thanks for the pointer !

I do already know "preempt-rt", but I was not aware of cpu affinity.

So this might help.

In fact I need guaranteed, very low latency. Given the high clock rate 
(about 1 GHz) that modern ARM chips provide, preempt-rt combined with 
CPU affinity might be a decent way to go.

The remaining questions include
  - how to calculate the maximum latency that can be guaranteed ? (i.e. 
does the Kernel impose any spinlocks and interrupt disables on the would 
be AMP subsystem ?)
  - how to assign an interrupt (e.g. a dedicated timer) to the subsystem ?
  - Do the interrupts immediately call the ISR of the cpu "under 
affinity" or is some additional latency imposed by the Kernel (and how 
many cpu cycles at max are needed to enter the ISR) ?

Thanks,
-Michael


* Re: AMP on an SMP system
  2013-08-02 12:13   ` Michael Schnell
@ 2013-08-02 14:53     ` Marco Stornelli
  2013-08-02 15:24       ` Michael Schnell
  2013-08-03 19:11       ` Robert Schwebel
  0 siblings, 2 replies; 25+ messages in thread
From: Marco Stornelli @ 2013-08-02 14:53 UTC (permalink / raw)
  To: Michael Schnell; +Cc: Robert Schwebel, linux-embedded

On 02/08/2013 14:13, Michael Schnell wrote:
> On 08/02/2013 01:42 PM, Robert Schwebel wrote:
>> Before hacking around (which might also lead to interesting solutions),
>> I would start using a kernel with preempt-rt support and play with the
>> cpu affinity:
>>
>> http://lxr.linux.no/#linux+v3.10.4/Documentation/kernel-parameters.txt#L1257
>>
>>
>
> Robert !
> Nice to see you here (I do own your "Embedded Linux Handbuch für
> Entwickler" :-) )
>
> Thanks for the pointer !
>
> I do already know "preempt-rt", but I was not aware of cpu affinity.
>
> So this might help.
>
> In fact I need a way to do very guaranteed low latency. regarding the
> high clock rate (about 1 GHz) modern ARM chips can provide, maybe
> preempt-rt with the cpu affinity might be a decent way to go.
>

Just to be clear: at the moment there isn't an easy way to dedicate a 
CPU "completely" to a task. The last time I tried to use an exclusive 
cpuset (some years ago, actually), the scheduler didn't do a good job 
because it was designed for SMP, not for SMP minus some piece. However, 
you can try it and report your results. It would be interesting.

> The remaining questions include
>   - how to calculate the maximum latency that can be guaranteed ? (i.e.
> does the Kernel impose any spinlocks and interrupt disables on the would
> be AMP subsystem ?)

No. You can use full dynticks, for example, to disable the timer 
interrupt, but it has pros and cons, especially with very low latency 
requirements.

>   - how to assign an interrupt (e.g. a dedicated timer) to the subsystem ?

Interrupt handlers are kernel threads, so you can schedule your kernel 
thread on your "normal" cpu.

>   - Do the interrupts immediately call the ISR of the cpu "under
> affinity" or is some additional latency imposed by the Kernel

AFAIC, no latency for cpu "under affinity".

> (and how
> many cpu cycles at max are needed to enter the ISR) ?

It's difficult to answer this question because the performance 
depends on your system. From my last measurements, with an rt 
Linux kernel you can stay below 50us for the interrupt latency.

Marco


* Re: AMP on an SMP system
  2013-08-02 14:53     ` Marco Stornelli
@ 2013-08-02 15:24       ` Michael Schnell
  2013-08-02 15:37         ` Marco Stornelli
  2013-08-03 19:11       ` Robert Schwebel
  1 sibling, 1 reply; 25+ messages in thread
From: Michael Schnell @ 2013-08-02 15:24 UTC (permalink / raw)
  Cc: linux-embedded

On 08/02/2013 04:53 PM, Marco Stornelli wrote:
>
>>   - how to assign an interrupt (e.g. a dedicated timer) to the 
>> subsystem ?
>
> Interrupt handler are kernel thread, so you can schedule your kernel 
> thread on your "normal" cpu.
Sorry. I don't understand.

The point I'd like to make is, that for really low latency stuff the ISR 
needs to take place immediately when the hardware fires an interrupt.

As the Linux kernel will (for the SMP CPUs it handles) need to disable 
interrupt in certain cases, it is essential that the "really low 
latency" interrupt is assigned to the AMP cpu (that the Kernel will 
never touch).

> AFAIC, no latency for cpu "under affinity".
That would be great, but it needs the setup described above.
In fact the interrupt would need to be assigned to the AMP cpu by some 
hardware means (that I don't know anything about yet), and not be 
"forwarded" in any way from some other cpu (which is managed by the 
Kernel and thus might be in an "interrupt disabled" state at some point 
in time).

>> (and how
>> many cpu cycles at max are needed to enter the ISR) ?
>
> It's difficult to answer to this question because the performance 
> depends on your system. From my last statistics I saw that with an rt 
> linux kernel you can stay below 50us for the interrupt latency.
Of course (sorry for the unclear language). I was asking for a pointer to 
start developing a method for predicting the max latency the 
system can offer.

Here we could do these calculations both for a "really 
dedicated AMP" CPU and for a system using "preempt-rt" and "cpu 
affinity" appropriately, to see if a more "standard" way of doing things 
might be good enough.

Thanks for your answers,
-Michael


* Re: AMP on an SMP system
  2013-08-02 15:24       ` Michael Schnell
@ 2013-08-02 15:37         ` Marco Stornelli
  2013-08-02 16:00           ` Michael Schnell
  0 siblings, 1 reply; 25+ messages in thread
From: Marco Stornelli @ 2013-08-02 15:37 UTC (permalink / raw)
  To: Michael Schnell; +Cc: linux-embedded

On 02/08/2013 17:24, Michael Schnell wrote:
> On 08/02/2013 04:53 PM, Marco Stornelli wrote:
>>
>>>   - how to assign an interrupt (e.g. a dedicated timer) to the
>>> subsystem ?
>>
>> Interrupt handler are kernel thread, so you can schedule your kernel
>> thread on your "normal" cpu.
> Sorry. I don't understand.
>
> The point I'd like to make is, that for really low latency stuff the ISR
> needs to take place immediately when the hardware fires an interrupt.
>
> As the Linux kernel will (for the SMP CPUs it handles) need to disable
> interrupt in certain cases, it is essential that the "really low
> latency" interrupt is assigned to the AMP cpu (that the Kernel will
> never touch).
>
>> AFAIC, no latency for cpu "under affinity".
> That would be great but it need the stuff described above.
> In fact the interrupt would need to be assigned to the AMP cpu by some
> hardware means (that I don't know anything about yet), and not be
> "forwarded" in any way from some other cpu (which is managed by the
> Kernel) and thus might be in a "interrupt disable state at some point in
> time.
>

I don't know your hw, so my considerations are really general. In an rt 
kernel, ISRs effectively don't exist, or at least their only job is to 
wake up the kernel thread that handles the interrupt. What you can do is 
move the kernel thread for interrupt X to the CPU where you want to 
handle it, or you can set a specific scheduling policy. For example, you 
can use SCHED_FIFO with a very high priority for your "really low 
latency" tasks. The RT kernel does the work for you :) 
See here: http://lwn.net/Articles/146861/
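
A minimal sketch of the SCHED_FIFO suggestion, assuming Python 3 on Linux.
It asks the kernel to run the calling process under SCHED_FIFO at a given
priority. The helper name and the default priority of 50 are assumptions
made for illustration; the call normally needs CAP_SYS_NICE or a suitable
RLIMIT_RTPRIO, so the sketch reports failure instead of raising.

```python
import os


def make_realtime(priority=50):
    """Try to switch the calling process to SCHED_FIFO; return True on success."""
    lowest = os.sched_get_priority_min(os.SCHED_FIFO)
    highest = os.sched_get_priority_max(os.SCHED_FIFO)
    if not lowest <= priority <= highest:
        raise ValueError(f"priority must be between {lowest} and {highest}")
    try:
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
    except OSError:
        # Most commonly EPERM: no CAP_SYS_NICE and no RLIMIT_RTPRIO allowance.
        return False
    return True
```

The same policy can be applied to an interrupt thread of an rt kernel by
passing that thread's PID instead of 0, which is what the `chrt` tool does
from the shell.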

Marco


* Re: AMP on an SMP system
  2013-08-02 16:00           ` Michael Schnell
@ 2013-08-02 15:58             ` Marco Stornelli
  0 siblings, 0 replies; 25+ messages in thread
From: Marco Stornelli @ 2013-08-02 15:58 UTC (permalink / raw)
  To: Michael Schnell; +Cc: linux-embedded

On 02/08/2013 18:00, Michael Schnell wrote:
> On 08/02/2013 05:37 PM, Marco Stornelli wrote:
>>
>> I don't know your hw so my consideration are really general.
> The hardware is not decided yet (it will be some A9 thingy). So for me
> "really general" is just fine.
>
>> ISRs in rt kernel doesn't exist or at least the only work is to wake
>> up the kernel thread for the management.
> I see.
>
> But how to determine the max latency for this ?
>

Maybe you can find some numbers on eLinux.org.

Marco


* Re: AMP on an SMP system
  2013-08-02 15:37         ` Marco Stornelli
@ 2013-08-02 16:00           ` Michael Schnell
  2013-08-02 15:58             ` Marco Stornelli
  0 siblings, 1 reply; 25+ messages in thread
From: Michael Schnell @ 2013-08-02 16:00 UTC (permalink / raw)
  Cc: linux-embedded

On 08/02/2013 05:37 PM, Marco Stornelli wrote:
>
> I don't know your hw so my consideration are really general. 
The hardware is not decided yet (it will be some A9 thingy). So for me 
"really general" is just fine.

> ISRs in rt kernel doesn't exist or at least the only work is to wake 
> up the kernel thread for the management. 
I see.

But how to determine the max latency for this ?

> The thing you can do is to move the kernel thread for interrupt X 
> where you want to manage it, or you can set a specific scheduler 
> policy. For example you can set a SCHED_FIFO with a very high priority 
> for your "really low latency" tasks. RT kernel does the work for you 
> :) You can see here: http://lwn.net/Articles/146861/

This seems to be rather easy to do. If this is "good enough" latency-wise, 
it seems to be the way to go.

-Michael


* Re: AMP on an SMP system
  2013-08-02 14:53     ` Marco Stornelli
  2013-08-02 15:24       ` Michael Schnell
@ 2013-08-03 19:11       ` Robert Schwebel
  2013-08-05  7:25         ` Michael Schnell
  1 sibling, 1 reply; 25+ messages in thread
From: Robert Schwebel @ 2013-08-03 19:11 UTC (permalink / raw)
  To: Marco Stornelli; +Cc: Michael Schnell, linux-embedded

Hi,

On Fri, Aug 02, 2013 at 04:53:50PM +0200, Marco Stornelli wrote:
> > In fact I need a way to do very guaranteed low latency. regarding the
> > high clock rate (about 1 GHz) modern ARM chips can provide, maybe
> > preempt-rt with the cpu affinity might be a decent way to go.

Modern hardware has lots of features which make it basically
non-deterministic.

One recent example: on MX6 (quadcore Cortex-A9), when you run a certain
cpuburn tool, the temperature rises up to the maximum allowed value in
just a couple of seconds, and then you either have the choice to burn
your hardware or to use the tempmon interrupt to throttle down the speed
of the cpu. You can imagine what all that means for latencies.

So if you want to use reasonably modern hardware, it is always a
matter of a "high system probability" of not missing a cycle.

> Just to be clear: at the moment there isn't an easy way to dedicate
> "completely" a cpu for a task. The last time I tried (some years ago
> actually) to use exclusive cpu set, the scheduler didn't do a good
> work because it was designed for SMP, not SMP minus some piece.
> However you can try and you can report your results. It would be
> interesting.

Without having done some tests myself, I would expect that the -rt folks
have such scenarios.

> > The remaining questions include
> >   - how to calculate the maximum latency that can be guaranteed ? (i.e.
> > does the Kernel impose any spinlocks and interrupt disables on the would
> > be AMP subsystem ?)

You can't. And you can't, even if you try to run bare-metal software on
a dedicated core. I can't imagine how for example the cache influences
between the cores could be determined.

> No. You can use full dyn tick for example to disable timer
> interrupt, but it has got some pros and cons, especially with very
> low latency requirement.
> 
> >  - how to assign an interrupt (e.g. a dedicated timer) to the subsystem ?
> 
> Interrupt handler are kernel thread, so you can schedule your kernel
> thread on your "normal" cpu.

The really great thing with preempt-rt is that it is all Linux and
POSIX. No need to invent new things like program starters, inter-
process-communication and even inter-processor-communication.
 
> >  - Do the interrupts immediately call the ISR of the cpu "under
> >affinity" or is some additional latency imposed by the Kernel
> 
> AFAIC, no latency for cpu "under affinity".
> 
> >(and how
> >many cpu cycles at max are needed to enter the ISR) ?
> 
> It's difficult to answer to this question because the performance
> depends on your system. From my last statistics I saw that with an
> rt linux kernel you can stay below 50us for the interrupt latency.

Here is an MX53 (single core cortex-a8) in the OSADL testlab:
https://www.osadl.org/Latency-plot-of-system-in-rack-0-slot.qa-latencyplot-r0s5.0.html

But note that multicore is quite different. I'd suggest that you measure
yourself.

rsc


* Re: AMP on an SMP system
  2013-08-02  8:33 AMP on an SMP system Michael Schnell
  2013-08-02 11:42 ` Robert Schwebel
@ 2013-08-04 21:28 ` Lambrecht Jürgen
  2013-08-05  7:36   ` Michael Schnell
  2013-08-05 10:00   ` Lambrecht Jürgen
  2013-08-08  7:41 ` Michael Schnell
  2 siblings, 2 replies; 25+ messages in thread
From: Lambrecht Jürgen @ 2013-08-04 21:28 UTC (permalink / raw)
  To: Michael Schnell; +Cc: linux-embedded

On 08/02/2013 10:33 AM, Michael Schnell wrote:
> Hi Experts.
Hi, I am rather new to Linux, but know the RTOS eCos quite well, and have 
done some bare metal coding.
This year, a student did his Master's project on a topic of mine: 
evaluating the Xilinx Zynq, a dual-core ARM Cortex A9 with an FPGA (as a 
sort of co-processor).
He demonstrated AMP with Linux on CPU0 and a bare metal program 
running on CPU1.
> Is there a kind of "official" way to set aside one of the available
> cores in an SMP system from the Linux OS to do deeply embedded
> extremely-low-latency stuff in a kind of single task "main loop" type
> environment ? I.e. creating a true coprocessor from an SMP hardware.
I should start reading some multi-core datasheets, but I would expect 
all embedded multi-core processors to be AMP capable.
>
> Some of the problems that come to mind here include:
>
>    - how to make the Linux initialization ignore one of the available
> cores  or free a core later on ?
> Here I found this:
> http://www.linuxtopia.org/online_books/linux_kernel/kernel_configuration/re46.html
> So using one of the four cores for special purpose in fact is viable.
My student used Device Tree with 'maxcpus=1'
>
>    - how to have  a Linux task start the free running main loop ?
He followed Xilinx application notes (and Google...). The boot flow is 
such that CPU0 always starts first, and CPU1 waits until its start 
address is written to a specific memory location; my student did that 
with a small Linux program.
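
A minimal sketch of what such a small Linux program could look like,
assuming the wake-up location is reachable through /dev/mem (root only,
and possibly blocked by CONFIG_STRICT_DEVMEM). It writes one 32-bit
little-endian word at a physical address. The function name is
hypothetical, and the actual address and entry point must come from the
vendor's application note; none are given here.

```python
import mmap
import os
import struct


def write_word(path, phys_addr, value):
    """Write a 32-bit little-endian word at phys_addr via a mapping of path."""
    page = mmap.PAGESIZE
    base = phys_addr & ~(page - 1)  # mmap offsets must be page aligned
    offset = phys_addr - base
    fd = os.open(path, os.O_RDWR | os.O_SYNC)
    try:
        with mmap.mmap(fd, page, flags=mmap.MAP_SHARED,
                       prot=mmap.PROT_READ | mmap.PROT_WRITE,
                       offset=base) as mem:
            mem[offset:offset + 4] = struct.pack("<I", value)
    finally:
        os.close(fd)
```

Usage would be along the lines of `write_word("/dev/mem", start_addr_reg,
entry_point)`, where both arguments are placeholders for values taken from
the chip documentation. The waiting core may additionally need an event or
interrupt to notice the write, which this sketch does not cover.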
>
>    - how to assign certain interrupts to that core and have ISRs run
> there only dedicatedly interrupting the "main loop" and not ever being
> blocked by any Linux activity ?
> here I found this:
> https://access.redhat.com/site/solutions/15482
> In fact of course the hardware defines if/how a certain Interrupt can be
> assigned to a certain CPU. How is this usually done when using ARM
> Cortex A9+ cores ?.
The ARM A9 datasheet will say which registers to write to route IRQs to 
CPU1; Linux must then be kept from using those IRQs.
The max. latency is then determined by the clock speed and the number of 
CPU cycles the bare metal program needs to react (this should be in the 
datasheet).
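
For interrupts that stay under Linux control, the routing is exposed as a
hexadecimal CPU bitmask in /proc/irq/<N>/smp_affinity. A minimal sketch of
building and writing that mask, assuming at most 32 CPUs (larger systems
use comma-separated 32-bit groups, which this does not handle); the helper
names and the `proc_root` parameter are hypothetical.

```python
def cpus_to_mask(cpus):
    """Return the hex bitmask string for a set of CPU numbers, e.g. {1} -> "2"."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu  # bit n set means CPU n may receive the interrupt
    return format(mask, "x")


def set_irq_affinity(irq, cpus, proc_root="/proc/irq"):
    """Write the affinity mask for one IRQ (needs root on a real system)."""
    with open(f"{proc_root}/{irq}/smp_affinity", "w") as f:
        f.write(cpus_to_mask(cpus))
```

This only applies to a core that Linux still manages. For a core that runs
bare metal code, the interrupt controller has to be programmed directly as
described in the chip documentation.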

About the non-determinism of modern hardware: if a chip is AMP capable, 
the heating up of one core should not influence the other core. I believe 
heat spreads vertically (to the heatsink) and not so much horizontally. 
So an RTOS should run at a stable frequency (anyhow, Linux should not 
touch the other CPU, or need to touch it).
>
>    - what about MMU issues ?
The bare metal program does not use the MMU. The L1 caches are separate, 
and CPU1 did not use the shared L2 cache, to avoid problems.
The RAM was divided into 2 separate parts.
I think it is not too difficult to share some RAM (a third section of 
the RAM) and let one of the cores be its master in order to share data.
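
One way to picture a shared RAM section with a single master is a
sequence-counter mailbox: only the master writes, and the other side
re-reads until it sees a consistent snapshot. The sketch below models the
idea on a plain bytearray. The layout (a 32-bit sequence word followed by
a counter and a signed sample) is purely an assumption, and a real
two-core version would also need memory barriers and an uncached or
coherent mapping, which are not shown.

```python
import struct

HEADER = struct.Struct("<I")    # sequence word: odd while an update is in progress
PAYLOAD = struct.Struct("<Ii")  # hypothetical payload: counter, signed sample


def publish(buf, counter, sample):
    """Master side: mark busy, write the payload, then mark consistent."""
    (seq,) = HEADER.unpack_from(buf, 0)
    HEADER.pack_into(buf, 0, (seq + 1) & 0xFFFFFFFF)  # odd: writer active
    PAYLOAD.pack_into(buf, HEADER.size, counter, sample)
    HEADER.pack_into(buf, 0, (seq + 2) & 0xFFFFFFFF)  # even: data valid


def read_latest(buf, retries=100):
    """Reader side: return (counter, sample), or None if no stable snapshot."""
    for _ in range(retries):
        (before,) = HEADER.unpack_from(buf, 0)
        if before & 1:
            continue  # writer is in the middle of an update
        data = PAYLOAD.unpack_from(buf, HEADER.size)
        (after,) = HEADER.unpack_from(buf, 0)
        if before == after:
            return data
    return None
```

Because only one side ever writes, no lock shared between Linux and the
bare metal program is needed; the reader simply retries when it catches an
update in progress.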
>
>    - how to have a Linux application communicate with the non-Linux
> application running on the dedicated core ?
> Here I found this:
> http://lwn.net/Articles/464391
>
>
> For example I (e.g.) would like a (now rather cheap) standard quadcore
> ARM Cortex A9 processor chip and modify a Debian distribution in a way
> that support this stuff.
>
> Thanks for any pointers ?
The Xilinx Zynq is of course purpose-built for this kind of stuff. Also 
Altera has such a SoC (System on Chip).
My student also found examples of an AMP solution with Linux/FreeRTOS 
and Linux/eCos.

Let me know if you need more info, but I fear my answer is too deeply 
embedded and too far from Linux (judging by the other answers).

Success anyhow, and I would be happy to read about your project!

Jürgen
>
> -Michael


-- 
Jürgen Lambrecht
R&D Associate
Mobile: +32 499 644 531
Tel: +32 (0)51 303045    Fax: +32 (0)51 310670
http://www.televic-rail.com
Televic Rail NV - Leo Bekaertlaan 1 - 8870 Izegem - Belgium
Company number 0825.539.581 - RPR Kortrijk


* Re: AMP on an SMP system
  2013-08-03 19:11       ` Robert Schwebel
@ 2013-08-05  7:25         ` Michael Schnell
  2013-08-05  8:17           ` Robert Schwebel
  0 siblings, 1 reply; 25+ messages in thread
From: Michael Schnell @ 2013-08-05  7:25 UTC (permalink / raw)
  Cc: linux-embedded

On 08/03/2013 09:11 PM, Robert Schwebel wrote:
> One recent example: on MX6 (quadcore Cortex-A9), when you run a 
> certain cpuburn tool, the temperature rises up to the maximum allowed 
> value in just a couple of seconds, and then you either have the choice 
> to burn your hardware or to use the tempmon interrupt to throttle down 
> the speed of the cpu. You can imagine what all that means for 
> latencies. So if you want to use a reasonably modern hardware, it is 
> always a matter of "high system probability" of not missing a cycle. 
I see.

OTOH, with hard realtime embedded systems you always need to determine 
the max latency for any critical reaction. Of course that deadline is 
missed when the system is defective, e.g. because the temperature gets 
higher than the design worst case (say, after a fan failure) or because 
of a power failure. In such cases the system needs to do a safe 
shutdown before the worst case hits.

(In fact what I have in mind is "virtual peripherals", which I define as 
"hard realtime with extremely low latency", usable as an alternative to 
FPGAs.)

> You can't. And you can't, even if you try to run bare-metal software on
> a dedicated core. I can't imagine how for example the cache influences
> between the cores could be determined.
This would render all efforts for hard realtime embedded Linux 
applications useless. You always need to calculate the max latency.

Of course you are right that cache issues need to be taken into account. 
But this is on top of the latency that is imposed by process scheduling 
(if you consider user space) and by kernel locks and interrupt delays 
(if you consider kernel space).

Using a dedicated Core will certainly reduce the max latency, but of 
course it will not do away with the necessary calculations that take 
into account what the other CPUs might do (regarding second level Cache, 
cache synchronization, bus scheduling, etc.)

To eliminate this, you would need another dedicated chip (processor or 
FPGA). And this is exactly what I would like to avoid, given that a 
quad-core system with a much higher clock rate nowadays costs less 
than any homebrew multi-chip solution.

IMHO, a good compromise is the TI 335x chip, which has a single 1 GHz 
Cortex A8 and two 300 MHz Cortex A3 cores for a very reasonable price, 
and "embedded" specs for temperature range and future chip 
availability.
> The really great thing with preempt-rt is that it is all Linux and 
> POSIX. No need to invent new things like program starters, inter- 
> process-communication and even inter-processor-communication. 
Sounds very good, provided it is possible to calculate the max 
latency and that it is a lot better than "usual".
> I'd suggest that you measure yourself.
That of course is necessary in any case.

Right now, I am just investigating viable ways to do things, before 
making a preliminary decision on any hardware and starting to dive into 
that.

-Michael


* Re: AMP on an SMP system
  2013-08-04 21:28 ` Lambrecht Jürgen
@ 2013-08-05  7:36   ` Michael Schnell
  2013-08-05 10:00   ` Lambrecht Jürgen
  1 sibling, 0 replies; 25+ messages in thread
From: Michael Schnell @ 2013-08-05  7:36 UTC (permalink / raw)
  Cc: linux-embedded

On 08/04/2013 11:28 PM, Lambrecht Jürgen wrote:
> The Xilinx Zynq is of course purpose-built for this kind of stuff. 
> Also Altera has such a SoC (System on Chip). My student also found 
> examples of an AMP solution with Linux/FreeRTOS and Linux/eCos. 

I feel that for many "virtual Peripheral" applications (at least for 
those I have in mind right now), a single dedicated 1 GHz Cortex should 
be good enough. So I'd like to avoid the cost for hardware and 
development of an additional FPGA .

> Let me know if you need more info, but I fear my answer is too deep 
> embedded an away from Linux (when I read the other answers). 
Thanks a lot for your encouraging notes. I think the gap between 
embedded Linux and more proprietary embedded OSes.

> Success anyhow, and I would be happy to read about your project!
Of course I am "here to stay". But in fact this is not really a project 
yet but a rather long term thingy.

-Michael


* Re: AMP on an SMP system
  2013-08-05  7:25         ` Michael Schnell
@ 2013-08-05  8:17           ` Robert Schwebel
  2013-08-05  9:04             ` Michael Schnell
  0 siblings, 1 reply; 25+ messages in thread
From: Robert Schwebel @ 2013-08-05  8:17 UTC (permalink / raw)
  To: Michael Schnell; +Cc: linux-embedded

On Mon, Aug 05, 2013 at 09:25:18AM +0200, Michael Schnell wrote:
> > You can't. And you can't, even if you try to run bare-metal software
> > on a dedicated core. I can't imagine how for example the cache
> > influences between the cores could be determined.
>
> This would render all efforts for hard realtime embedded Linux
> applications useless. You always need to calculate the max latency.

You can't calculate the max latency with today's complex processor
hardware any more. It's all a matter of system failure probabilities.

> Using a dedicated Core will certainly reduce the max latency, but of
> course it will not do away with the necessary calculations that take
> into account what the other CPUs might do (regarding second level
> Cache, cache synchronization, bus scheduling, etc.)

I don't think it is possible to get an analytic result for the max
latency introduced by other CPUs.

What we do today is to run preempt-rt systems, measure them under
realistic and extreme load and look at the max latencies. If you design
your system in a way that it runs at factor X above this max latency, it
should be fine.
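
A rough sketch of this measure-then-add-margin approach, assuming Python 3
on Linux. It sleeps periodically and records how late each wakeup is,
loosely in the spirit of what the cyclictest tool from rt-tests measures.
The interpreter adds jitter of its own, so the numbers only illustrate the
method; the function names and the safety factor are assumptions.

```python
import time


def measure_wakeup_latency(interval_s=0.001, samples=1000):
    """Sleep periodically and record how late each wakeup is, in nanoseconds."""
    interval_ns = int(interval_s * 1e9)
    late_values = []
    next_wake = time.monotonic_ns() + interval_ns
    for _ in range(samples):
        remaining = next_wake - time.monotonic_ns()
        if remaining > 0:
            time.sleep(remaining / 1e9)
        late_values.append(time.monotonic_ns() - next_wake)
        next_wake += interval_ns  # absolute schedule, so lateness is not hidden
    return {
        "samples": len(late_values),
        "min_ns": min(late_values),
        "avg_ns": sum(late_values) / len(late_values),
        "max_ns": max(late_values),
    }


def design_budget_ns(measured_max_ns, safety_factor):
    """Latency the design should tolerate: measured maximum times factor X."""
    return measured_max_ns * safety_factor
```

The measurement is only meaningful if it runs for a long time while the
system is under realistic and extreme load, since the maximum is what
matters and it may show up rarely.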

The advantage of preempt-rt is that you have only one software
environment for rt and non-rt. Nevertheless, there always have been
settings where you could get rid of all realtime complexity by adding
a 1-Euro microcontroller to the BOM.

> IMHO, a good compromise is the TI 335x chip that has a single 1 GHz
> Cortex A8 and two 300 MHz Cortex A3 cores for a very reasonable price
> and and "embedded" specs for temperature range and future chip
> availability.

AM335x has PRU subprocessors (not ARM architecture). Yes, that can be a
solution. Freescale's Vybrid has Cortex-M3 cores.

> Right now, I am just investigating viable ways to do things, before
> doing a pre-decision for any hardware and starting do dive into that.

What kind of application is that?

rsc


* Re: AMP on an SMP system
  2013-08-05  8:17           ` Robert Schwebel
@ 2013-08-05  9:04             ` Michael Schnell
  0 siblings, 0 replies; 25+ messages in thread
From: Michael Schnell @ 2013-08-05  9:04 UTC (permalink / raw)
  Cc: linux-embedded

On 08/05/2013 10:17 AM, Robert Schwebel wrote:
> On Mon, Aug 05, 2013 at 09:25:18AM +0200, Michael Schnell wrote:
>>> You can't. And you can't, even if you try to run bare-metal software
>>> on a dedicated core. I can't imagine how for example the cache
>>> influences between the cores could be determined.
>> This would render all efforts for hard realtime embedded Linux
>> applications useless. You always need to calculate the max latency.
> You can't calculate the max latency with today's complex processor
> hardware any more. It's all a matter of system failure probabilities.
So don't use them for realtime embedded applications ?

There are companies such as SysGo that seem to claim this is possible 
with their PikeOS (see 
http://www.sysgo.com/products/pikeos-rtos-and-virtualization-concept/rtos-technology/ 
). AFAIK, they aren't even able to use dedicated cores (yet). Of 
course they don't support "virtual peripheral" technology here, but 
strict determinism is a strong requirement for the critical "security" 
applications they have in mind.


> Nevertheless, there always have been settings where you could get rid 
> of all realtime complexity by spending a 1-Euro microcontroller to the 
> BOM. 

For "virtual peripherals" applications you will need either a fast CPU 
or an FPGA.

> AM335x has PRU subprocessors (not ARM architecture).
The 4788 page "AM335x Applications Processor Technical Reference Manual" 
(SPRUH73 – October 2011) on page 226 depicts the "ARM Cortex M3 Memory 
Map".

> What kind of application is that?

At first we are discussing DMX I/O (there already is a running project 
doing this with the 335x PRUs on a BeagleBone board).

But this is only a sample "virtual peripheral" project with rather low 
demands that could easily be done with "a 1-Euro microcontroller" (and 
in fact we already did this using a PIC33).

But in the future we are planning for several kinds of proprietary 
digital waveforms that are to be generated or analyzed.

-Michael


* Re: AMP on an SMP system
  2013-08-04 21:28 ` Lambrecht Jürgen
  2013-08-05  7:36   ` Michael Schnell
@ 2013-08-05 10:00   ` Lambrecht Jürgen
  2013-08-07  8:23     ` Michael Schnell
  1 sibling, 1 reply; 25+ messages in thread
From: Lambrecht Jürgen @ 2013-08-05 10:00 UTC (permalink / raw)
  To: Michael Schnell; +Cc: linux-embedded

On 08/04/2013 11:28 PM, Lambrecht Jürgen wrote:
> On 08/02/2013 10:33 AM, Michael Schnell wrote:
> [snip]
>>     - how to assign certain interrupts to that core and have ISRs run
>> there only dedicatedly interrupting the "main loop" and not ever being
>> blocked by any Linux activity ?
>> here I found this:
>> https://access.redhat.com/site/solutions/15482
>> In fact of course the hardware defines if/how a certain Interrupt can be
>> assigned to a certain CPU. How is this usually done when using ARM
>> Cortex A9+ cores ?.
> The ARM A9 datasheet will say what registers to write to assign IRQs to
> CPU1, and make Linux not to use those IRQs.
> Then the max. latency is determined by the clock speed and CPU cycles
> the bare metal program needs to react (should be in datasheet).
I asked a Freescale FAE, and the Cortex-A9 is AMP capable (I also needed 
to know this for my project):

"Actually, you can check on ARM community web site, where you will see that the CortexA9/GIC infrastructure enables AMP implementation.
http://forums.arm.com/index.php?/topic/15656-cortex-a9-amp/

The Global Interrupt Controller gives you the possibility to assign specific IT to specific cores. But a CortexA9 is not very RT oriented (for that ARM has created the Cortex R Family, with improved RT execution time)."
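
As a rough illustration of what "assigning specific interrupts to
specific cores" means at register level: in the ARM GIC architecture
(v1/v2, as used with the Cortex-A9 MPCore) the distributor holds one
8-bit CPU target field per interrupt ID, starting at offset 0x800
(GICD_ITARGETSRn). The following is only a sketch under assumptions: the
helper names are made up for illustration, the distributor base address
is SoC specific and has to come from the chip's reference manual, and
neither enable/priority setup nor keeping Linux away from the chosen
interrupt is shown.

```c
#include <assert.h>
#include <stdint.h>

/* Offset of the Interrupt Processor Targets Registers (GICD_ITARGETSRn)
 * inside the GIC distributor, per the GIC v1/v2 architecture. */
#define GICD_ITARGETSR_OFFSET 0x800u

/* Byte offset (from the distributor base) of the 8-bit target field for
 * interrupt `irq`. There is one byte per interrupt ID. */
static inline uint32_t gic_target_byte_offset(uint32_t irq)
{
    return GICD_ITARGETSR_OFFSET + irq;
}

/* Route shared peripheral interrupt `irq` to exactly one CPU.
 * `dist` is assumed to point at the already mapped distributor
 * registers. IDs 0..31 are banked per CPU and cannot be retargeted,
 * so only SPIs (32..1019) are accepted here. */
static inline void gic_route_irq_to_cpu(volatile uint8_t *dist,
                                        uint32_t irq, unsigned cpu)
{
    assert(irq >= 32 && irq < 1020);
    assert(cpu < 8);
    /* One bit per CPU interface; a single set bit means "this CPU only". */
    dist[gic_target_byte_offset(irq)] = (uint8_t)(1u << cpu);
}
```

With this layout, routing (hypothetical) interrupt 61 to CPU1 would write
0x02 to the byte at distributor offset 0x83D.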


>
> About the non-determinism of modern hardware: if a chip is AMP capable
> the heating up of 1 core should not influence the other core. I believe
> heat spreads vertically (to the heatsink) and not so much horizontally.
> So an RTOS should run with a stable frequency. (anyhow, Linux should not
> touch the other CPU, or need to touch it).
That Freescale FAE warns about the voltage scaling: "you have only one 
power line to supply all the cores, so all processor would be impacted. 
There is no way to change that."
So this is indeed a problem with modern hardware.

Kind regards,
Jürgen
--
To unsubscribe from this list: send the line "unsubscribe linux-embedded" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: AMP on an SMP system
  2013-08-05 10:00   ` Lambrecht Jürgen
@ 2013-08-07  8:23     ` Michael Schnell
  2013-08-07  8:29       ` Michael Schnell
  2013-08-07  9:04       ` Michael Schnell
  0 siblings, 2 replies; 25+ messages in thread
From: Michael Schnell @ 2013-08-07  8:23 UTC (permalink / raw)
  Cc: linux-embedded

On 08/05/2013 12:00 PM, Lambrecht Jürgen wrote:
> "Actually, you can check on ARM community web site, where you will see that the CortexA9/GIC infrastructure enables AMP implementation.
> http://forums.arm.com/index.php?/topic/15656-cortex-a9-amp/
>
Here I see:

"In the Cortex-A9 MPCore Technical Reference Manual I found the SMPnAMP 
Signal which switches between SMP or AMP for each Processor."

While it is encouraging to see that AMP _is_ a supported feature with 
A9, I fail to understand what the use of making this known to the 
hardware might be.

I supposed AMP vs SMP would be just a matter of software.

-Michael

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: AMP on an SMP system
  2013-08-07  8:23     ` Michael Schnell
@ 2013-08-07  8:29       ` Michael Schnell
  2013-08-07  9:04       ` Michael Schnell
  1 sibling, 0 replies; 25+ messages in thread
From: Michael Schnell @ 2013-08-07  8:29 UTC (permalink / raw)
  Cc: linux-embedded

On 08/07/2013 10:23 AM, Michael Schnell wrote:
>
>
> "In the Cortex-A9 MPCore Technical Reference Manual I found the 
> SMPnAMP Signal which switches between SMP or AMP for each Processor."
>
I found this:

SMPnAMP -> "Set whether the processor is part of a coherent domain."

Which does not help, because I don't know what a "coherent domain" is 
supposed to mean.

-Michael

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: AMP on an SMP system
  2013-08-07  8:23     ` Michael Schnell
  2013-08-07  8:29       ` Michael Schnell
@ 2013-08-07  9:04       ` Michael Schnell
  1 sibling, 0 replies; 25+ messages in thread
From: Michael Schnell @ 2013-08-07  9:04 UTC (permalink / raw)
  Cc: linux-embedded

I also found:

"This processor is in the inner shared domain, and uses its cache 
coherency protocol."

I understand that some kind of memory coherency needs to be guaranteed 
in AMP applications as well, if memory is used for communication between 
the systems.

Of course with AMP one can avoid using shared memory or restrict shared 
memory usage to certain small areas necessary for communication. Perhaps 
some kind of "cache bypass" method (that might be provided by the MPU 
for DMA purposes) can be requested for the memory region used for 
communication.

Thus, maybe setting SMPnAMP to "AMP" helps to avoid cache 
synchronization latency and thereby greatly improves the calculated max 
latency. For my favorite issue, "virtual peripherals", this might be 
exactly the missing feature that allows avoiding most of the 
"non-determinism" problems Robert made us aware of.
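
For reference, a sketch of where this switch lives in software: as far
as the Cortex-A9 TRM describes it, the SMPnAMP signal reflects the SMP
bit (bit 6) of the CP15 Auxiliary Control Register (ACTLR), and bit 0
(FW) controls broadcasting of cache/TLB maintenance operations. The
helper names below are made up for illustration. Whether clearing the
bit really improves worst-case latency is untested here, and a core
leaving the coherent domain would presumably need its data cache cleaned
first.

```c
#include <assert.h>
#include <stdint.h>

/* Cortex-A9 Auxiliary Control Register (ACTLR) bits. */
#define ACTLR_FW  (1u << 0)  /* broadcast cache/TLB maintenance ops */
#define ACTLR_SMP (1u << 6)  /* 1: core takes part in cache coherency */

static_assert(ACTLR_SMP == 0x40u, "SMP is bit 6 of ACTLR");

/* ACTLR value for a core that should leave the coherent domain (AMP). */
static inline uint32_t actlr_for_amp(uint32_t actlr)
{
    return actlr & ~(ACTLR_SMP | ACTLR_FW);
}

/* ACTLR value for a core that should join the coherent domain (SMP). */
static inline uint32_t actlr_for_smp(uint32_t actlr)
{
    return actlr | ACTLR_SMP | ACTLR_FW;
}

#if defined(__arm__) && defined(__GNUC__)
/* Privileged access only; on some systems ACTLR is writable from the
 * secure world only. Barriers and cache maintenance are not shown. */
static inline uint32_t actlr_read(void)
{
    uint32_t v;
    __asm__ volatile("mrc p15, 0, %0, c1, c0, 1" : "=r"(v));
    return v;
}

static inline void actlr_write(uint32_t v)
{
    __asm__ volatile("mcr p15, 0, %0, c1, c0, 1" : : "r"(v) : "memory");
}
#endif
```

On the dedicated core, bare metal startup code would then do something
like actlr_write(actlr_for_amp(actlr_read())) before enabling its caches.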

-Michael

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: AMP on an SMP system
  2013-08-02  8:33 AMP on an SMP system Michael Schnell
  2013-08-02 11:42 ` Robert Schwebel
  2013-08-04 21:28 ` Lambrecht Jürgen
@ 2013-08-08  7:41 ` Michael Schnell
  2 siblings, 0 replies; 25+ messages in thread
From: Michael Schnell @ 2013-08-08  7:41 UTC (permalink / raw)
  To: linux-embedded

As a résumé of this discussion I feel that it would be very viable to do 
a commercial or non-commercial project that allows for easy use of a 
single (or even multiple) dedicated AMP CPU(s) working together with an 
SMP Linux system in a multi-core ARM Cortex A9 chip.

Here the supplier would need to provide:
  - means to dedicate one (or more) CPU(s) to an AMP system 
(including setting the AMP bit to prevent 1st level cache 
synchronization for these CPU(s), and possibly including a patch for the 
scheduler that prevents performance degradation due to the count of 
managed CPUs not being identical to the count found in hardware)
  - means for communication with the AMP system(s) (i.e. a prototype for 
a kernel driver that allows for bidirectional "message-queue"- / 
pipe-like communication using a DMA-like non-cached memory region and 
mutual interrupts for notification)
  - means to load and start a program in an AMP system (presumably 
provided by the same kernel driver). Here some kind of cache flush 
presumably needs to be done, as the cache synchronization is switched 
off for the AMP CPU(s).
  - appropriate documentation (including a definition on how to do the 
software for the AMP system and including hints on how to calculate the 
max latency)

What do you think ?

-Michael


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: AMP on an SMP system
  2013-08-05  9:06 Guenter Ebermann
@ 2013-08-05  9:34 ` Michael Schnell
  0 siblings, 0 replies; 25+ messages in thread
From: Michael Schnell @ 2013-08-05  9:34 UTC (permalink / raw)
  To: Guenter Ebermann; +Cc: linux-embedded

On 08/05/2013 11:06 AM, Guenter Ebermann wrote:
> I am using vanilla linux in an AMP setup on freescale P1022
> successfully....
Great to know this !

Thanks,
-Michael

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: AMP on an SMP system
@ 2013-08-05  9:06 Guenter Ebermann
  2013-08-05  9:34 ` Michael Schnell
  0 siblings, 1 reply; 25+ messages in thread
From: Guenter Ebermann @ 2013-08-05  9:06 UTC (permalink / raw)
  To: Michael Schnell; +Cc: linux-embedded

Hi,

I am using vanilla Linux in an AMP setup on a Freescale P1022
successfully. The needed
Linux bootargs are:
maxcpus=1 mem=448M memmap=64M$0x1C000000
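
Read this way, the bootargs keep Linux on one core and in the first
448 MiB of RAM, and mark the 64 MiB starting at 0x1C000000 (which is
exactly 448 MiB) as reserved, so the window ends at 512 MiB. The
arithmetic, plus one possible way to reach the window from userspace,
could be sketched as follows. The /dev/mem mapping is an assumption for
illustration, not necessarily what the setup described here does; it
needs root and a kernel that permits access to that range, and whether
O_SYNC yields an uncached mapping depends on the architecture.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#define MIB            (1024u * 1024u)
#define LINUX_RAM_SIZE (448u * MIB)    /* mem=448M */
#define SHARED_BASE    0x1C000000u     /* memmap=64M$0x1C000000 */
#define SHARED_SIZE    (64u * MIB)

/* The reserved window starts right where Linux' RAM ends. */
static_assert(LINUX_RAM_SIZE == SHARED_BASE, "window follows Linux RAM");

/* Map the reserved window into this process; returns NULL on failure. */
static inline void *map_shared_window(void)
{
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;
    void *p = mmap(NULL, SHARED_SIZE, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, (off_t)SHARED_BASE);
    close(fd);  /* the mapping stays valid after closing the descriptor */
    return p == MAP_FAILED ? NULL : p;
}
```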

Then there is a linux userspace program and a kernel module (using kernel
function mpic_reset_core) to start an application (another OS) on the
second core.

For communication with the second core I use a non-blocking FIFO
in a shared RAM section.
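
For readers who want a starting point, a non-blocking FIFO of this kind
is often built as a single-producer/single-consumer ring buffer. The
following is a minimal sketch with C11 atomics, not the actual
implementation used here; the struct layout and function names are
invented for illustration. It assumes one writer core and one reader
core per FIFO, and that the shared section is either cache coherent
between the cores or mapped non-cached on both sides.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define FIFO_SLOTS 16u  /* power of two, so free-running indices wrap cleanly */

static_assert((FIFO_SLOTS & (FIFO_SLOTS - 1u)) == 0, "must be a power of two");

struct amp_fifo {
    _Atomic uint32_t head;      /* advanced only by the consumer */
    _Atomic uint32_t tail;      /* advanced only by the producer */
    uint32_t slot[FIFO_SLOTS];
};

/* Producer side: returns false instead of blocking when the FIFO is full. */
static inline bool amp_fifo_put(struct amp_fifo *f, uint32_t msg)
{
    uint32_t tail = atomic_load_explicit(&f->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&f->head, memory_order_acquire);
    if (tail - head == FIFO_SLOTS)
        return false;
    f->slot[tail % FIFO_SLOTS] = msg;
    /* Publish the slot contents before the new tail becomes visible. */
    atomic_store_explicit(&f->tail, tail + 1u, memory_order_release);
    return true;
}

/* Consumer side: returns false instead of blocking when the FIFO is empty. */
static inline bool amp_fifo_get(struct amp_fifo *f, uint32_t *msg)
{
    uint32_t head = atomic_load_explicit(&f->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&f->tail, memory_order_acquire);
    if (head == tail)
        return false;
    *msg = f->slot[head % FIFO_SLOTS];
    /* Release the slot only after its contents have been read. */
    atomic_store_explicit(&f->head, head + 1u, memory_order_release);
    return true;
}
```

A zero-filled struct is an empty FIFO, so one side only has to clear the
shared section once before the other core is started. For bidirectional
traffic, two such FIFOs (one per direction) would be placed in the
shared region.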

Best regards,
Günter

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: AMP on an SMP system
  2013-08-05  8:21   ` Robert Schwebel
@ 2013-08-05  8:42     ` Michael Schnell
  0 siblings, 0 replies; 25+ messages in thread
From: Michael Schnell @ 2013-08-05  8:42 UTC (permalink / raw)
  Cc: linux-embedded

On 08/05/2013 10:21 AM, Robert Schwebel wrote:
> https://www.osadl.org/QA-Farm-Realtime.qa-farm-about.0.html

Interesting stuff !

-Michael


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: AMP on an SMP system
  2013-08-05  7:45 ` Michael Schnell
@ 2013-08-05  8:21   ` Robert Schwebel
  2013-08-05  8:42     ` Michael Schnell
  0 siblings, 1 reply; 25+ messages in thread
From: Robert Schwebel @ 2013-08-05  8:21 UTC (permalink / raw)
  To: Michael Schnell; +Cc: linux-embedded

On Mon, Aug 05, 2013 at 09:45:23AM +0200, Michael Schnell wrote:
> On 08/02/2013 06:16 PM, Jon Sevy wrote:
> >You might try the Open Source Automation Development Labs website for real-time Linux latency measurements and methodology:
> >http://www.osadl.org/
> Thanks for the pointer !
> 
> At first glance I see that this might help with legal and similar
> issues, but I did not find anything regarding latency, yet.

https://www.osadl.org/QA-Farm-Realtime.qa-farm-about.0.html

rsc
-- 
Pengutronix e.K.                           |                             |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: AMP on an SMP system
  2013-08-02 16:16 Jon Sevy
@ 2013-08-05  7:45 ` Michael Schnell
  2013-08-05  8:21   ` Robert Schwebel
  0 siblings, 1 reply; 25+ messages in thread
From: Michael Schnell @ 2013-08-05  7:45 UTC (permalink / raw)
  Cc: linux-embedded

On 08/02/2013 06:16 PM, Jon Sevy wrote:
> You might try the Open Source Automation Development Labs website for real-time Linux latency measurements and methodology:
> http://www.osadl.org/
Thanks for the pointer !

At first glance I see that this might help with legal and similar 
issues, but I did not find anything regarding latency, yet.

-Michael

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: AMP on an SMP system
@ 2013-08-02 16:16 Jon Sevy
  2013-08-05  7:45 ` Michael Schnell
  0 siblings, 1 reply; 25+ messages in thread
From: Jon Sevy @ 2013-08-02 16:16 UTC (permalink / raw)
  To: Marco Stornelli; +Cc: Michael Schnell, linux-embedded

You might try the Open Source Automation Development Labs website for real-time Linux latency measurements and methodology:
http://www.osadl.org/
  Jon

Marco Stornelli <marco.stornelli@gmail.com> wrote:

>Il 02/08/2013 18:00, Michael Schnell ha scritto:
>> On 08/02/2013 05:37 PM, Marco Stornelli wrote:
>>>
>>> I don't know your hw so my consideration are really general.
>> The hardware is not decided yet (it will be some A9 thingy). So for me
>> "really general" is just fine.
>>
>>> ISRs in rt kernel doesn't exist or at least the only work is to wake
>>> up the kernel thread for the management.
>> I see.
>>
>> But how to determine the max latency for this ?
>>
>
>Maybe on eLinux.org you can find some number.
>
>Marco
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2013-08-08  7:41 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-08-02  8:33 AMP on an SMP system Michael Schnell
2013-08-02 11:42 ` Robert Schwebel
2013-08-02 12:13   ` Michael Schnell
2013-08-02 14:53     ` Marco Stornelli
2013-08-02 15:24       ` Michael Schnell
2013-08-02 15:37         ` Marco Stornelli
2013-08-02 16:00           ` Michael Schnell
2013-08-02 15:58             ` Marco Stornelli
2013-08-03 19:11       ` Robert Schwebel
2013-08-05  7:25         ` Michael Schnell
2013-08-05  8:17           ` Robert Schwebel
2013-08-05  9:04             ` Michael Schnell
2013-08-04 21:28 ` Lambrecht Jürgen
2013-08-05  7:36   ` Michael Schnell
2013-08-05 10:00   ` Lambrecht Jürgen
2013-08-07  8:23     ` Michael Schnell
2013-08-07  8:29       ` Michael Schnell
2013-08-07  9:04       ` Michael Schnell
2013-08-08  7:41 ` Michael Schnell
2013-08-02 16:16 Jon Sevy
2013-08-05  7:45 ` Michael Schnell
2013-08-05  8:21   ` Robert Schwebel
2013-08-05  8:42     ` Michael Schnell
2013-08-05  9:06 Guenter Ebermann
2013-08-05  9:34 ` Michael Schnell
