linux-arm-kernel.lists.infradead.org archive mirror
* Is it possible to run a linux on multiple clusters if CCI is missing
@ 2022-07-12 15:02 Li Chen
  2022-07-12 15:43 ` Arnd Bergmann
  0 siblings, 1 reply; 5+ messages in thread
From: Li Chen @ 2022-07-12 15:02 UTC (permalink / raw)
  To: linux-arm-kernel

Hi all,

Say an SoC has 4 clusters, and each cluster has an A78. Every A78 has its own SCU to maintain cache coherence,
but this SoC doesn't have a CCI, so there is no hardware to maintain cache coherence between the 4 clusters.

My question: is it possible to run a single Linux kernel across all 4 clusters?
If so, how can it be done, and how can cache coherence between the clusters be maintained?

Regards,
Li

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


* Re: Is it possible to run a linux on multiple clusters if CCI is missing
  2022-07-12 15:02 Is it possible to run a linux on multiple clusters if CCI is missing Li Chen
@ 2022-07-12 15:43 ` Arnd Bergmann
  2022-07-13  4:07   ` Li Chen
  0 siblings, 1 reply; 5+ messages in thread
From: Arnd Bergmann @ 2022-07-12 15:43 UTC (permalink / raw)
  To: Li Chen; +Cc: linux-arm-kernel

On Tue, Jul 12, 2022 at 5:02 PM Li Chen <me@linux.beauty> wrote:
>
> Hi all,
>
> Say an SoC has 4 clusters, and each cluster has an A78. Every A78 has its own SCU to maintain cache coherence,
> but this SoC doesn't have a CCI, so there is no hardware to maintain cache coherence between the 4 clusters.
>
> My question: is it possible to run a single Linux kernel across all 4 clusters?
> If so, how can it be done, and how can cache coherence between the clusters be maintained?

Hi Li,

Please note that sending an open question like this to the linux-arm-kernel list
probably won't get you a reply, since almost nobody reads all the emails sent
to the list. It's more common to address specific recipients and just Cc the
list to have an archive of the discussion.  A better place for this might be the
#armlinux IRC channel on irc.libera.chat.

I did get your email in my inbox since I recently replied to another thread
of yours.

Regarding your question, I'm pretty sure that you cannot run Linux across
multiple clusters without a CCI, as the kernel relies, among other things, on
behavior documented in Documentation/memory-barriers.txt that is not guaranteed
otherwise.

The only way I can think of for using that kind of system would be to run
a separate kernel on each cluster, assign each device to one of the
instances, and use explicit cache management for a communication
channel between them.

         Arnd


* Re: Is it possible to run a linux on multiple clusters if CCI is missing
  2022-07-12 15:43 ` Arnd Bergmann
@ 2022-07-13  4:07   ` Li Chen
  2022-07-13  7:19     ` Arnd Bergmann
  0 siblings, 1 reply; 5+ messages in thread
From: Li Chen @ 2022-07-13  4:07 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: linux-arm-kernel

Hi Arnd,
 ---- On Tue, 12 Jul 2022 23:43:06 +0800  Arnd Bergmann <arnd@arndb.de> wrote --- 
 > On Tue, Jul 12, 2022 at 5:02 PM Li Chen <me@linux.beauty> wrote:
 > >
 > > Hi all,
 > >
 > > Say an SoC has 4 clusters, and each cluster has an A78. Every A78 has its own SCU to maintain cache coherence,
 > > but this SoC doesn't have a CCI, so there is no hardware to maintain cache coherence between the 4 clusters.
 > >
 > > My question: is it possible to run a single Linux kernel across all 4 clusters?
 > > If so, how can it be done, and how can cache coherence between the clusters be maintained?
 > 
 > Hi Li,
 > 
 > Please note that sending an open question like this to the linux-arm-kernel list
 > probably won't get you a reply, since almost nobody reads all the emails sent
 > to the list. It's more common to address specific recipients and just Cc the
 > list to have an archive of the discussion.  A better place for this might be the
 > #armlinux IRC channel on irc.libera.chat.
 > 
 > I did get your email in my inbox since I recently replied to another thread
 > of yours.

Got it. I will come to #armlinux after fixing the IRC connection issue with my company network.
But I prefer the mailing list over IRC because most IRC channels don't have archives available, so they
are not searchable on Google.

 
 > Regarding your question, I'm pretty sure that you cannot run Linux across
 > multiple clusters without a CCI, as the kernel relies, among other things, on
 > behavior documented in Documentation/memory-barriers.txt that is not guaranteed
 > otherwise.
 
Good point, but I don't know how CCI deals with memory barriers. Can you share more about it?

Apart from memory barriers and coherence, would TLB invalidation also fail to work properly across the four clusters without CCI?

 > The only way I can think of for using that kind of system would be to run
 > a separate kernel on each cluster, assign each device to one of the
 > instances, and use explicit cache management for a communication
 > channel between them.

Sorry for my three noob questions:
1. How do I assign each device to one of the instances? Can a NIC do it?
2. When you say "explicit cache management", do you mean flushing/invalidating the cache in the kernel manually with flush_cache*
    and flush_tlb*?
3. What kind of communication channel? Share a region of memory for communication and monitor it with the PMU?

Regards,
Li


* Re: Is it possible to run a linux on multiple clusters if CCI is missing
  2022-07-13  4:07   ` Li Chen
@ 2022-07-13  7:19     ` Arnd Bergmann
  2022-07-13 10:34       ` Li Chen
  0 siblings, 1 reply; 5+ messages in thread
From: Arnd Bergmann @ 2022-07-13  7:19 UTC (permalink / raw)
  To: Li Chen; +Cc: Arnd Bergmann, linux-arm-kernel

On Wed, Jul 13, 2022 at 6:07 AM Li Chen <me@linux.beauty> wrote:
>
> Got it. I will come to #armlinux after fixing the IRC connection issue with my company network.
> But I prefer the mailing list over IRC because most IRC channels don't have archives available, so they
> are not searchable on Google.

The easiest way is usually to pay for an irccloud.com account, which gives you
access through normal https connections and a downloadable archive.

>  > Regarding your question, I'm pretty sure that you cannot run Linux across
>  > multiple clusters without a CCI, as the kernel relies, among other things, on
>  > behavior documented in Documentation/memory-barriers.txt that is not guaranteed
>  > otherwise.
>
> Good point, but I don't know how CCI deals with memory barriers. Can you share more about it?
>
> Apart from memory barriers and coherence, would TLB invalidation also fail to work properly
> across the four clusters without CCI?

The problem is more fundamental than this: without cache coherency, a CPU
can keep an outdated copy of a cache line in its local cache indefinitely after
another CPU writes to it, and the barriers that are meant to serialize access
have no effect here. I have no idea what happens with TLB invalidation. I suppose
it is similar, but you won't even get that far.
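
As a rough illustration of the stale-copy problem, here is a toy userspace
model, not kernel code. It assumes one word of RAM and one private write-back
cache line per cluster, with nothing snooping between them. All names
(toy_soc, toy_read, toy_write, toy_barrier) are hypothetical, and the barrier
is modelled as a no-op because it only orders the issuing CPU's own accesses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model: one word of RAM and one private write-back cache line per
 * cluster. Nothing snoops the other cluster's cache (no CCI). */
struct toy_line {
    bool valid;
    uint32_t data;
};

struct toy_soc {
    uint32_t ram;
    struct toy_line cache[2];
};

static uint32_t toy_read(struct toy_soc *s, int cpu)
{
    assert(cpu == 0 || cpu == 1);
    struct toy_line *l = &s->cache[cpu];

    if (!l->valid) {            /* miss: fill from RAM */
        l->data = s->ram;
        l->valid = true;
    }
    return l->data;             /* hit: RAM and the other cache are ignored */
}

static void toy_write(struct toy_soc *s, int cpu, uint32_t v)
{
    assert(cpu == 0 || cpu == 1);
    s->cache[cpu].data = v;     /* write-back: the store stays local */
    s->cache[cpu].valid = true;
}

/* Modelling assumption: a barrier only orders the issuing CPU's own
 * accesses, so there is nothing for it to do in this model. */
static void toy_barrier(void)
{
}

/* Returns what cluster 1 reads after cluster 0 has written 42. */
uint32_t toy_stale_read_demo(void)
{
    struct toy_soc s = { .ram = 0 };

    (void)toy_read(&s, 1);      /* cluster 1 caches the old value 0 */
    toy_write(&s, 0, 42);       /* cluster 0 updates its own copy only */
    toy_barrier();              /* a barrier on either side changes nothing */
    return toy_read(&s, 1);     /* stale hit in cluster 1's cache */
}
```

In this model toy_stale_read_demo() returns 0 rather than 42, and no number of
barriers changes that, because nothing ever moves the new value out of
cluster 0's cache or the old value out of cluster 1's cache.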

>  > The only way I can think of for using that kind of system would be to run
>  > a separate kernel on each cluster, assign each device to one of the
>  > instances, and use explicit cache management for a communication
>  > channel between them.
>
> Sorry for my three noob questions:
> 1. How do I assign each device to one of the instances? Can a NIC do it?

The easiest way would be to just have DT files that don't overlap. Since you
cannot share memory or devices, you already need a separate set of devices
(root file system, network, etc) for each instance.
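
One way to sanity-check such a split is to verify that the memory ranges given
to the instances are pairwise disjoint. The sketch below is a standalone
userspace helper, not a kernel or DT tool. The names mem_region and
partitions_disjoint are hypothetical, and it assumes base + size does not wrap
around 64 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One memory range handed to one kernel instance. */
struct mem_region {
    uint64_t base;
    uint64_t size;
};

/* Half-open ranges [base, base + size) overlap if each one starts
 * before the other one ends. Assumes base + size does not overflow. */
static bool regions_overlap(const struct mem_region *a,
                            const struct mem_region *b)
{
    return a->base < b->base + b->size && b->base < a->base + a->size;
}

/* Returns true if no two of the n regions overlap. */
bool partitions_disjoint(const struct mem_region *regs, size_t n)
{
    assert(regs != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (regions_overlap(&regs[i], &regs[j]))
                return false;
    return true;
}
```

The same pairwise check would apply to any other resource described by a
base and a length, such as the register windows of the devices assigned to
each instance.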

In a more sophisticated setup, you could have a small hypervisor that
controls all instances and provides memory protection and communication
between them.

> 2. When you say "explicit cache management", do you mean flushing/invalidating
>  the cache in the kernel manually with flush_cache* and flush_tlb*?

Each instance would appear as a DMA device to the other ones, so this
comes down to the normal dma-mapping.h interfaces. You could use
uncached memory from dma_alloc_coherent() for simple shared memory
channels, or streaming mappings using dma_map_single() or similar
to perform cache flushes.
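
The effect of the cache maintenance behind a streaming mapping can be sketched
with a toy userspace model, not kernel code. It assumes one word of RAM and
one private write-back cache line per instance, and all names (nc_soc,
nc_clean, nc_invalidate) are hypothetical. In a real driver the clean and
invalidate steps would be done for you, roughly speaking, by calls such as
dma_map_single() or dma_sync_single_for_device() on the sending side and
dma_unmap_single() or dma_sync_single_for_cpu() on the receiving side.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct nc_line {
    bool valid;
    bool dirty;
    uint32_t data;
};

struct nc_soc {
    uint32_t ram;
    struct nc_line cache[2];    /* one private line per instance */
};

static uint32_t nc_read(struct nc_soc *s, int cpu)
{
    assert(cpu == 0 || cpu == 1);
    struct nc_line *l = &s->cache[cpu];

    if (!l->valid) {            /* miss: fill from RAM */
        l->data = s->ram;
        l->valid = true;
        l->dirty = false;
    }
    return l->data;
}

static void nc_write(struct nc_soc *s, int cpu, uint32_t v)
{
    assert(cpu == 0 || cpu == 1);
    struct nc_line *l = &s->cache[cpu];

    l->data = v;                /* write-back: RAM is not updated yet */
    l->valid = true;
    l->dirty = true;
}

/* Clean: write a dirty line back to RAM (done by the sender). */
static void nc_clean(struct nc_soc *s, int cpu)
{
    assert(cpu == 0 || cpu == 1);
    struct nc_line *l = &s->cache[cpu];

    if (l->valid && l->dirty) {
        s->ram = l->data;
        l->dirty = false;
    }
}

/* Invalidate: drop the local copy so the next read refills from RAM
 * (done by the receiver). Any dirty data in the line is discarded. */
static void nc_invalidate(struct nc_soc *s, int cpu)
{
    assert(cpu == 0 || cpu == 1);
    s->cache[cpu].valid = false;
    s->cache[cpu].dirty = false;
}

/* Returns what instance 1 reads after an explicit hand-off from instance 0. */
uint32_t nc_handoff_demo(void)
{
    struct nc_soc s = { .ram = 0 };

    (void)nc_read(&s, 1);       /* instance 1 caches the old value 0 */
    nc_write(&s, 0, 42);        /* instance 0 writes into its own cache */
    nc_clean(&s, 0);            /* sender pushes the data out to RAM */
    nc_invalidate(&s, 1);       /* receiver drops its stale copy */
    return nc_read(&s, 1);      /* refill from RAM */
}
```

Here nc_handoff_demo() returns 42. Both steps are needed: without the clean
the new value never reaches RAM, and without the invalidate the receiver keeps
hitting its old copy. Uncached memory from dma_alloc_coherent() avoids both
steps at the cost of slower accesses.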

> 3. What kind of communication channel? Share a region of memory for
> communication and monitor it with the PMU?

A SoC design that is meant for running multiple OSs would typically have
some hardware support for this, using a combination of mailbox,
sram, doorbell or hwspinlock, which one can use to build higher-level
abstractions for device drivers. This is obviously hardware specific.
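
A minimal sketch of how a mailbox and a doorbell could fit together, as a toy
userspace model rather than a driver for any real hardware. The names
(toy_mailbox, toy_mbox_send, toy_mbox_recv) and the four-word message size are
assumptions made for the example. The array stands in for uncached shared
SRAM and the flag stands in for a hardware doorbell bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MBOX_WORDS 4

struct toy_mailbox {
    uint32_t sram[MBOX_WORDS];  /* stands in for uncached shared SRAM */
    bool doorbell;              /* stands in for a hardware doorbell bit */
};

/* Sender: refuse if the previous message has not been consumed,
 * otherwise copy the payload and ring the doorbell. */
bool toy_mbox_send(struct toy_mailbox *m, const uint32_t msg[MBOX_WORDS])
{
    assert(m != NULL && msg != NULL);

    if (m->doorbell)
        return false;
    memcpy(m->sram, msg, sizeof(m->sram));
    m->doorbell = true;
    return true;
}

/* Receiver: if the doorbell is set, copy the payload out and clear
 * the doorbell so the slot can be reused. */
bool toy_mbox_recv(struct toy_mailbox *m, uint32_t out[MBOX_WORDS])
{
    assert(m != NULL && out != NULL);

    if (!m->doorbell)
        return false;
    memcpy(out, m->sram, sizeof(m->sram));
    m->doorbell = false;
    return true;
}
```

On real hardware the doorbell would normally raise an interrupt on the
receiving side instead of being polled, and the single slot would typically
grow into a ring of descriptors.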

Most likely, the answer is that it's not worth trying to run Linux on
more than one cluster given this type of hardware. A more useful
model might be to have Linux on one cluster and run a single
bare-metal application on the other ones, which is accessed from
Linux using a device driver.

         Arnd


* Re: Is it possible to run a linux on multiple clusters if CCI is missing
  2022-07-13  7:19     ` Arnd Bergmann
@ 2022-07-13 10:34       ` Li Chen
  0 siblings, 0 replies; 5+ messages in thread
From: Li Chen @ 2022-07-13 10:34 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: linux-arm-kernel

Hi Arnd
 ---- On Wed, 13 Jul 2022 15:19:23 +0800  Arnd Bergmann <arnd@arndb.de> wrote --- 
 > On Wed, Jul 13, 2022 at 6:07 AM Li Chen <me@linux.beauty> wrote:
 > >
 > > Got it. I will come to #armlinux after fixing the IRC connection issue with my company network.
 > > But I prefer the mailing list over IRC because most IRC channels don't have archives available, so they
 > > are not searchable on Google.
 > 
 > The easiest way is usually to pay for an irccloud.com account, which gives you
 > access through normal https connections and a downloadable archive.

 irccloud works perfectly for me!

 > 
 > >  > Regarding your question, I'm pretty sure that you cannot run Linux across
 > >  > multiple clusters without a CCI, as the kernel relies, among other things, on
 > >  > behavior documented in Documentation/memory-barriers.txt that is not guaranteed
 > >  > otherwise.
 > >
 > > Good point, but I don't know how CCI deals with memory barriers. Can you share more about it?
 > >
 > > Apart from memory barriers and coherence, would TLB invalidation also fail to work properly
 > > across the four clusters without CCI?
 > 
 > The problem is more fundamental than this: without cache coherency, a CPU
 > can keep an outdated copy of a cache line in its local cache indefinitely after
 > another CPU writes to it, and the barriers that are meant to serialize access
 > have no effect here. I have no idea what happens with TLB invalidation. I suppose
 > it is similar, but you won't even get that far.
 > 
 > >  > The only way I can think of for using that kind of system would be to run
 > >  > a separate kernel on each cluster, assign each device to one of the
 > >  > instances, and use explicit cache management for a communication
 > >  > channel between them.
 > >
 > > Sorry for my three noob questions:
 > > 1. How do I assign each device to one of the instances? Can a NIC do it?
 > 
 > The easiest way would be to just have DT files that don't overlap. Since you
 > cannot share memory or devices, you already need a separate set of devices
 > (root file system, network, etc) for each instance.
 > 
 > In a more sophisticated setup, you could have a small hypervisor that
 > controls all instances and provides memory protection and communication
 > between them.
 > 
 > > 2. When you say "explicit cache management", do you mean flushing/invalidating
 > >  the cache in the kernel manually with flush_cache* and flush_tlb*?
 > 
 > Each instance would appear as a DMA device to the other ones, so this
 > comes down to the normal dma-mapping.h interfaces. You could use
 > uncached memory from dma_alloc_coherent() for simple shared memory
 > channels, or streaming mappings using dma_map_single() or similar
 > to perform cache flushes.
 > 
 > > 3. What kind of communication channel? Share a region of memory for
 > > communication and monitor it with the PMU?
 > 
 > A SoC design that is meant for running multiple OSs would typically have
 > some hardware support for this, using a combination of mailbox,
 > sram, doorbell or hwspinlock, which one can use to build higher-level
 > abstractions for device drivers. This is obviously hardware specific.
 > 
 > Most likely, the answer is that it's not worth trying to run Linux on
 > more than one cluster given this type of hardware. A more useful
 > model might be to have Linux on one cluster and run a single
 > bare-metal application on the other ones, which is accessed from
 > Linux using a device driver.
 > 
 >          Arnd
 > 

Thanks a lot for your answers.


Regards,
Li
