* Controlling DMA priority between GPU and CPU on x86
@ 2019-10-03 15:23 Rossier Daniel
  2019-10-04  8:14 ` Jan Kiszka
  0 siblings, 1 reply; 6+ messages in thread
From: Rossier Daniel @ 2019-10-03 15:23 UTC (permalink / raw)
  To: xenomai

Hi all,

We are currently working on a project called OpenCN, a framework heavily inspired by LinuxCNC but with a revisited architecture.
OpenCN is therefore aimed at machine control.

Our framework mainly uses an AMP approach and runs the Xenomai/Cobalt kernel on a dedicated CPU, without the I-pipe (we reworked it out).
(We plan to release the framework by the first of November.)
The framework works well on an x86 PC, but we face an issue regarding DMA priority between the GPU and the CPU: when the user interacts with the GUI,
latencies on the order of 10 ms appear.
Although the question is not directly related to Xenomai, I would be very interested to know whether it is possible to give the CPU priority over the GPU,
and how to manage the related DMA controller. Can it be configured? Or is there any other hint to avoid this latency?

Many thanks in advance for any input.

Daniel



* Re: Controlling DMA priority between GPU and CPU on x86
  2019-10-03 15:23 Controlling DMA priority between GPU and CPU on x86 Rossier Daniel
@ 2019-10-04  8:14 ` Jan Kiszka
  2019-10-05  8:03   ` Rossier Daniel
  0 siblings, 1 reply; 6+ messages in thread
From: Jan Kiszka @ 2019-10-04  8:14 UTC (permalink / raw)
  To: Rossier Daniel, xenomai

On 03.10.19 17:23, Rossier Daniel via Xenomai wrote:
> Hi all,
> 
> We are currently working on a project called OpenCN which is a framework deeply inspired from LinuxCNC, but with a revisited architecture.
> Hence, OpenCN is related to machine control.
> 
> Our framework mainly use a AMP approach and runs the Xenomai/Cobalt kernel on a dedicated CPU with no I-pipe (reworked out).
> (We plan to release the framework by first of November).

IOW, "bare-metal" Cobalt? Collaborative AMP, or partitioned via virtualization?

> Our framework works well on a x86 PC, but we have to face an issue regarding DMA priority between the GPU and CPU; when the user plays with the GUI,
> some small latencies (O(10ms)) raise up.
> Although the question is not directly related to Xenomai, I would be very interested if somebody knows if it possible to give CPU priority against the GPU,
> and how to manage the related DMA controller? Is it possible to configure it ? Or any other hint to avoid this latency ?
> 
> Many thanks in advance for any inputs.

I didn't dig into anything that manages PCI bandwidth yet. I heard that 
something exists, but at that time it was not publicly documented. And it's 
likely highly platform specific (SoC/chipset).

Did you clearly identify that it's DMA contention which increased your latency? 
Do your cache-miss delays increase (the PMU should tell you)? Or is there 
contention between an RT device doing DMA and the GPU?

Jan

-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux



* RE: Controlling DMA priority between GPU and CPU on x86
  2019-10-04  8:14 ` Jan Kiszka
@ 2019-10-05  8:03   ` Rossier Daniel
  2019-10-07  7:14     ` Stéphane Ancelot
  0 siblings, 1 reply; 6+ messages in thread
From: Rossier Daniel @ 2019-10-05  8:03 UTC (permalink / raw)
  To: Jan Kiszka, xenomai

> -----Original Message-----
> From: Jan Kiszka <jan.kiszka@siemens.com>
> Sent: vendredi, 4 octobre 2019 10:15
> To: Rossier Daniel <Daniel.Rossier@heig-vd.ch>; xenomai@xenomai.org
> Subject: Re: Controlling DMA priority between GPU and CPU on x86
> 
> On 03.10.19 17:23, Rossier Daniel via Xenomai wrote:
> > Hi all,
> >
> > We are currently working on a project called OpenCN which is a framework
> deeply inspired from LinuxCNC, but with a revisited architecture.
> > Hence, OpenCN is related to machine control.
> >
> > Our framework mainly use a AMP approach and runs the Xenomai/Cobalt
> kernel on a dedicated CPU with no I-pipe (reworked out).
> > (We plan to release the framework by first of November).
> 
> IOW, "bare-metal" Cobalt? Collaborative AMP, or partitioned via
> virtualization?

Actually, we have heavily patched Xenomai/Cobalt to keep the Cobalt scheduler and main functions (mutex, task create/delete, timer, etc.) running on CPU #1, and
we manage interactions with CPU #0 via specific IPIs. The Linux scheduler does not intervene here.

> 
> > Our framework works well on a x86 PC, but we have to face an issue
> > regarding DMA priority between the GPU and CPU; when the user plays
> with the GUI, some small latencies (O(10ms)) raise up.
> > Although the question is not directly related to Xenomai, I would be
> > very interested if somebody knows if it possible to give CPU priority against
> the GPU, and how to manage the related DMA controller? Is it possible to
> configure it ? Or any other hint to avoid this latency ?
> >
> > Many thanks in advance for any inputs.
> 
> I didn't dig into anything that manages PCI bandwidth yet. I heard that
> something exists, but at that time it was not publicly documented. And it's
> likely highly platform specific (SoC/chipset).
> 
> Did you clearly identify that it's DMA contention which increased your
> latency?

The latency appears when there is some activity on the GPU (for example, when the user moves windows).
Otherwise, the system is very stable, even over long periods.

> Do you cache-miss delays increase (PMU should tell you)? Or is there a
> contention between a RT device doing DMA and the GPU?

How can I get that information from the PMU? The network card (e1000e) probably performs DMA accesses;
however, we do not use interrupts here, only polling, since it is a patched EtherCAT-based driver.

I also attached some information about our lab PC.

Daniel

> 
> Jan
> 
> --
> Siemens AG, Corporate Technology, CT RDA IOT SES-DE Corporate
> Competence Center Embedded Linux
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: lscpu.txt
URL: <http://xenomai.org/pipermail/xenomai/attachments/20191005/16b7c474/attachment.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: lscpu_C.txt
URL: <http://xenomai.org/pipermail/xenomai/attachments/20191005/16b7c474/attachment-0001.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: lspci_knnvv.txt
URL: <http://xenomai.org/pipermail/xenomai/attachments/20191005/16b7c474/attachment-0002.txt>


* Re: Controlling DMA priority between GPU and CPU on x86
  2019-10-05  8:03   ` Rossier Daniel
@ 2019-10-07  7:14     ` Stéphane Ancelot
  2019-10-07  7:59       ` Rossier Daniel
  0 siblings, 1 reply; 6+ messages in thread
From: Stéphane Ancelot @ 2019-10-07  7:14 UTC (permalink / raw)
  To: Rossier Daniel, Jan Kiszka, xenomai

Hi,

You won't be able to achieve real-time behaviour with the current Intel 
Linux video driver when the GPU is in use.

The driver is not preemptible.

This mainly happens when the GPU receives a batch job (e.g. launching a 
program that creates many graphical widgets, or moving a window on a 
19-inch display).

The memory is locked by the GPU. Isolating RT tasks/interrupts on dedicated 
CPUs won't solve it, since the memory is shared by all CPUs (e.g. there is 
only one NUMA node).

The problem is easier to observe by splitting a real-time task into 2 or 3 
tasks (task1 executes and wakes up task2, task2 executes and wakes up 
task3).

We might manage to get the problem solved if we could provide Intel with a 
unit test program that reproduces the phenomenon.

Reducing the GPU frequency makes the problem less frequent but does not solve it.

For reference:

https://bugs.freedesktop.org/show_bug.cgi?id=110950

Regards,

S.Ancelot




* RE: Controlling DMA priority between GPU and CPU on x86
  2019-10-07  7:14     ` Stéphane Ancelot
@ 2019-10-07  7:59       ` Rossier Daniel
  2019-10-07  8:06         ` Stéphane Ancelot
  0 siblings, 1 reply; 6+ messages in thread
From: Rossier Daniel @ 2019-10-07  7:59 UTC (permalink / raw)
  To: Stéphane Ancelot, Jan Kiszka, xenomai

> -----Original Message-----
> From: Stéphane Ancelot <sancelot@numalliance.com>
> Sent: lundi, 7 octobre 2019 09:15
> To: Rossier Daniel <Daniel.Rossier@heig-vd.ch>; Jan Kiszka
> <jan.kiszka@siemens.com>; xenomai@xenomai.org
> Subject: Re: Controlling DMA priority between GPU and CPU on x86
> 
> Hi,
> 
> You won't be able to achieve realtime using the actual intel linux video driver
> using GPU.
> 
> The driver is not preemptible.
> 
> This mainly happens when GPU receives batch job (eg launching a program
> that creates many graphic widgets , moving a window with a 19 inch display) .
> 
> The memory is locked by the GPU. Isolating rt tasks /IT to cpus won't solve it,
> since the memory is shared with all CPUs (eg only one  numa node)
> 
> The problem can better being watched splitting a realtime task in 2 or 3 tasks,
> (task1 executes, and wakes up task2, task2 executes and wakes up task3) .
> 
> We may manage to solve the problem if being able to provide intel a unit
> testing program being able to reproduce the phenomen.
> Reducing GPU frequency decreases problem frequency but does not solve it.
> 
> For reference:
> https://bugs.freedesktop.org/show_bug.cgi?id=110950

Thanks a lot for this feedback. We will try to move the GUI to a remote PC.
Although the network card will also perform DMA accesses, it may cause less latency,
or the network card could be configured to operate in polling mode. Do you have any hints about
such an approach?

Regards,
Daniel




* Re: Controlling DMA priority between GPU and CPU on x86
  2019-10-07  7:59       ` Rossier Daniel
@ 2019-10-07  8:06         ` Stéphane Ancelot
  0 siblings, 0 replies; 6+ messages in thread
From: Stéphane Ancelot @ 2019-10-07  8:06 UTC (permalink / raw)
  To: Rossier Daniel, Jan Kiszka, xenomai


Le 07/10/2019 à 09:59, Rossier Daniel a écrit :
>> -----Original Message-----
>> From: Stéphane Ancelot <sancelot@numalliance.com>
>> Sent: lundi, 7 octobre 2019 09:15
>> To: Rossier Daniel <Daniel.Rossier@heig-vd.ch>; Jan Kiszka
>> <jan.kiszka@siemens.com>; xenomai@xenomai.org
>> Subject: Re: Controlling DMA priority between GPU and CPU on x86
>>
>> Hi,
>>
>> You won't be able to achieve realtime using the actual intel linux video driver
>> using GPU.
>>
>> The driver is not preemptible.
>>
>> This mainly happens when GPU receives batch job (eg launching a program
>> that creates many graphic widgets , moving a window with a 19 inch display) .
>>
>> The memory is locked by the GPU. Isolating rt tasks /IT to cpus won't solve it,
>> since the memory is shared with all CPUs (eg only one  numa node)
>>
>> The problem can better being watched splitting a realtime task in 2 or 3 tasks,
>> (task1 executes, and wakes up task2, task2 executes and wakes up task3) .
>>
>> We may manage to solve the problem if being able to provide intel a unit
>> testing program being able to reproduce the phenomen.
>> Reducing GPU frequency decreases problem frequency but does not solve it.
>>
>> For reference:
>> https://bugs.freedesktop.org/show_bug.cgi?id=110950
> Thanks a lot for this feedback. We will try to move the GUI on a remote PC.
> Although the network card will also perform DMA access, it could involve less latencies,
> or the network card can be configured to be used in a polling mode. Do you have some hints about
> such an approach?
In my experience, the problem did not appear when redirecting all 
graphics through an ssh -X connection (and disabling X11 on the RT computer).

