All of lore.kernel.org
 help / color / mirror / Atom feed
* [U-Boot] driver model is not smp safe
@ 2015-07-30  4:12 Bin Meng
  2015-07-31 12:30 ` Andrew Bradford
  2015-07-31 14:31 ` Tom Rini
  0 siblings, 2 replies; 11+ messages in thread
From: Bin Meng @ 2015-07-30  4:12 UTC (permalink / raw)
  To: u-boot

Hi Simon,

When adding x86 multi-cpu initialization on a board with 4 cores, I found:

=> cpu list
  0: cpu at 0               Genuine Intel(R) CPU         @ 1.58GHz
  1: cpu at 1               Genuine Intel(R) CPU         @ 1.58GHz
  2: cpu at 2               Genuine Intel(R) CPU         @ 1.58GHz
  2: cpu at 3               Genuine Intel(R) CPU         @ 1.58GHz

cpu at 2 and cpu at 3 have the same sequence number, which indicates they
are running parallelly to get the same sequence number. The call chain
on an ap is: mp_init_cpu() -> device_probe() -> uclass_resolve_seq().
Apparently ap2 and ap3 are running at the same time to get the same
number.

Note so far all x86 boards that we have enabled x86 multi-cpu
initialization on only have 2 cores, which will not expose such issue
as there is no parallel execution among aps.

What does this mean?

-  Driver model is not smp safe. But given U-Boot is a single-threaded
environment, I don't think we want to add such support to driver
model.

OR:

- We are using driver model incorrectly on x86 mp initialization codes.

What do you think?

Regards,
Bin

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [U-Boot] driver model is not smp safe
  2015-07-30  4:12 [U-Boot] driver model is not smp safe Bin Meng
@ 2015-07-31 12:30 ` Andrew Bradford
  2015-07-31 13:42   ` Bin Meng
  2015-07-31 14:31 ` Tom Rini
  1 sibling, 1 reply; 11+ messages in thread
From: Andrew Bradford @ 2015-07-31 12:30 UTC (permalink / raw)
  To: u-boot

Hi Bin,

On 07/30 12:12, Bin Meng wrote:
> Hi Simon,
> 
> When adding x86 multi-cpu initialization on a board with 4 cores, I found:
> 
> => cpu list
>   0: cpu at 0               Genuine Intel(R) CPU         @ 1.58GHz
>   1: cpu at 1               Genuine Intel(R) CPU         @ 1.58GHz
>   2: cpu at 2               Genuine Intel(R) CPU         @ 1.58GHz
>   2: cpu at 3               Genuine Intel(R) CPU         @ 1.58GHz
> 
> cpu at 2 and cpu at 3 have the same sequence number, which indicates they
> are running parallelly to get the same sequence number. The call chain
> on an ap is: mp_init_cpu() -> device_probe() -> uclass_resolve_seq().
> Apparently ap2 and ap3 are running at the same time to get the same
> number.
> 
> Note so far all x86 boards that we have enabled x86 multi-cpu
> initialization on only have 2 cores, which will not expose such issue
> as there is no parallel execution among aps.
> 
> What does this mean?
> 
> -  Driver model is not smp safe. But given U-Boot is a single-threaded
> environment, I don't think we want to add such support to driver
> model.
> 
> OR:
> 
> - We are using driver model incorrectly on x86 mp initialization codes.
> 
> What do you think?

I'm not sure what to do about this (if anything) but I also see this on
an E3845 based board.  I don't think it has affected me in any way.

Does this also affect non-x86 processors?

=> cpu list
  0: cpu at 0              Intel(R) Atom(TM) CPU  E3845  @ 1.91GHz
  2: cpu at 1              Intel(R) Atom(TM) CPU  E3845  @ 1.91GHz
  2: cpu at 2              Intel(R) Atom(TM) CPU  E3845  @ 1.91GHz
  1: cpu at 3              Intel(R) Atom(TM) CPU  E3845  @ 1.91GHz

Thanks,
Andrew

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [U-Boot] driver model is not smp safe
  2015-07-31 12:30 ` Andrew Bradford
@ 2015-07-31 13:42   ` Bin Meng
  2015-07-31 14:13     ` Simon Glass
  0 siblings, 1 reply; 11+ messages in thread
From: Bin Meng @ 2015-07-31 13:42 UTC (permalink / raw)
  To: u-boot

Hi Andrew,

On Fri, Jul 31, 2015 at 8:30 PM, Andrew Bradford
<andrew@bradfordembedded.com> wrote:
> Hi Bin,
>
> On 07/30 12:12, Bin Meng wrote:
>> Hi Simon,
>>
>> When adding x86 multi-cpu initialization on a board with 4 cores, I found:
>>
>> => cpu list
>>   0: cpu at 0               Genuine Intel(R) CPU         @ 1.58GHz
>>   1: cpu at 1               Genuine Intel(R) CPU         @ 1.58GHz
>>   2: cpu at 2               Genuine Intel(R) CPU         @ 1.58GHz
>>   2: cpu at 3               Genuine Intel(R) CPU         @ 1.58GHz
>>
>> cpu at 2 and cpu at 3 have the same sequence number, which indicates they
>> are running parallelly to get the same sequence number. The call chain
>> on an ap is: mp_init_cpu() -> device_probe() -> uclass_resolve_seq().
>> Apparently ap2 and ap3 are running at the same time to get the same
>> number.
>>
>> Note so far all x86 boards that we have enabled x86 multi-cpu
>> initialization on only have 2 cores, which will not expose such issue
>> as there is no parallel execution among aps.
>>
>> What does this mean?
>>
>> -  Driver model is not smp safe. But given U-Boot is a single-threaded
>> environment, I don't think we want to add such support to driver
>> model.
>>
>> OR:
>>
>> - We are using driver model incorrectly on x86 mp initialization codes.
>>
>> What do you think?
>
> I'm not sure what to do about this (if anything) but I also see this on
> an E3845 based board.  I don't think it has affected me in any way.
>

So far I only see the seq number is not correct. Booting Linux with MP
table seems to be OK as Linux can start all 4 cores correctly.

> Does this also affect non-x86 processors?
>

I believe currently the cpu uclass is only implemented on x86, so only
x86 is affected for now.

> => cpu list
>   0: cpu at 0              Intel(R) Atom(TM) CPU  E3845  @ 1.91GHz
>   2: cpu at 1              Intel(R) Atom(TM) CPU  E3845  @ 1.91GHz
>   2: cpu at 2              Intel(R) Atom(TM) CPU  E3845  @ 1.91GHz
>   1: cpu at 3              Intel(R) Atom(TM) CPU  E3845  @ 1.91GHz
>


Regards,
Bin

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [U-Boot] driver model is not smp safe
  2015-07-31 13:42   ` Bin Meng
@ 2015-07-31 14:13     ` Simon Glass
  0 siblings, 0 replies; 11+ messages in thread
From: Simon Glass @ 2015-07-31 14:13 UTC (permalink / raw)
  To: u-boot

Hi,

On 31 July 2015 at 07:42, Bin Meng <bmeng.cn@gmail.com> wrote:
> Hi Andrew,
>
> On Fri, Jul 31, 2015 at 8:30 PM, Andrew Bradford
> <andrew@bradfordembedded.com> wrote:
>> Hi Bin,
>>
>> On 07/30 12:12, Bin Meng wrote:
>>> Hi Simon,
>>>
>>> When adding x86 multi-cpu initialization on a board with 4 cores, I found:
>>>
>>> => cpu list
>>>   0: cpu at 0               Genuine Intel(R) CPU         @ 1.58GHz
>>>   1: cpu at 1               Genuine Intel(R) CPU         @ 1.58GHz
>>>   2: cpu at 2               Genuine Intel(R) CPU         @ 1.58GHz
>>>   2: cpu at 3               Genuine Intel(R) CPU         @ 1.58GHz
>>>
>>> cpu at 2 and cpu at 3 have the same sequence number, which indicates they
>>> are running parallelly to get the same sequence number. The call chain
>>> on an ap is: mp_init_cpu() -> device_probe() -> uclass_resolve_seq().
>>> Apparently ap2 and ap3 are running at the same time to get the same
>>> number.
>>>
>>> Note so far all x86 boards that we have enabled x86 multi-cpu
>>> initialization on only have 2 cores, which will not expose such issue
>>> as there is no parallel execution among aps.
>>>
>>> What does this mean?
>>>
>>> -  Driver model is not smp safe. But given U-Boot is a single-threaded
>>> environment, I don't think we want to add such support to driver
>>> model.
>>>
>>> OR:
>>>
>>> - We are using driver model incorrectly on x86 mp initialization codes.
>>>
>>> What do you think?
>>
>> I'm not sure what to do about this (if anything) but I also see this on
>> an E3845 based board.  I don't think it has affected me in any way.
>>
>
> So far I only see the seq number is not correct. Booting Linux with MP
> table seems to be OK as Linux can start all 4 cores correctly.
>
>> Does this also affect non-x86 processors?
>>
>
> I believe currently the cpu uclass is only implemented on x86, so only
> x86 is affected for now.
>
>> => cpu list
>>   0: cpu at 0              Intel(R) Atom(TM) CPU  E3845  @ 1.91GHz
>>   2: cpu at 1              Intel(R) Atom(TM) CPU  E3845  @ 1.91GHz
>>   2: cpu at 2              Intel(R) Atom(TM) CPU  E3845  @ 1.91GHz
>>   1: cpu at 3              Intel(R) Atom(TM) CPU  E3845  @ 1.91GHz
>>

My intention with this was that the sequence numbering would be fixed
in the device tree and each CPU would discover its number in
find_cpu_by_apic_id(). However this only works if you add aliases for
each CPU.

In general I don't think we should be making driver model thread-safe.
But we could add code to sipi_vector.S or even ap_init() to avoid this
problem.

One option which should work is to add something like this to mp_init_cpu():

dev->req_seq = fdtdec_get_init(blob, dev->of_offset, "reg", -1);

Regards,
Simon

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [U-Boot] driver model is not smp safe
  2015-07-30  4:12 [U-Boot] driver model is not smp safe Bin Meng
  2015-07-31 12:30 ` Andrew Bradford
@ 2015-07-31 14:31 ` Tom Rini
  2015-08-03 18:52   ` Simon Glass
  1 sibling, 1 reply; 11+ messages in thread
From: Tom Rini @ 2015-07-31 14:31 UTC (permalink / raw)
  To: u-boot

On Thu, Jul 30, 2015 at 12:12:03PM +0800, Bin Meng wrote:

> Hi Simon,
> 
> When adding x86 multi-cpu initialization on a board with 4 cores, I found:
> 
> => cpu list
>   0: cpu at 0               Genuine Intel(R) CPU         @ 1.58GHz
>   1: cpu at 1               Genuine Intel(R) CPU         @ 1.58GHz
>   2: cpu at 2               Genuine Intel(R) CPU         @ 1.58GHz
>   2: cpu at 3               Genuine Intel(R) CPU         @ 1.58GHz
> 
> cpu at 2 and cpu at 3 have the same sequence number, which indicates they
> are running parallelly to get the same sequence number. The call chain
> on an ap is: mp_init_cpu() -> device_probe() -> uclass_resolve_seq().
> Apparently ap2 and ap3 are running at the same time to get the same
> number.
> 
> Note so far all x86 boards that we have enabled x86 multi-cpu
> initialization on only have 2 cores, which will not expose such issue
> as there is no parallel execution among aps.

So what exactly are we doing with these additional cores?  My
recollection of what we do on other arches when we even deal with other
cores is that we bring them "up" and then usually put them in a holding
pattern for the real OS to deal with _or_ it's one of those cases where
we have multiple OSes running and we do what we need to load and release
those other OSes.

-- 
Tom
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 836 bytes
Desc: Digital signature
URL: <http://lists.denx.de/pipermail/u-boot/attachments/20150731/bd82cdf4/attachment.sig>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [U-Boot] driver model is not smp safe
  2015-07-31 14:31 ` Tom Rini
@ 2015-08-03 18:52   ` Simon Glass
  2015-08-03 19:06     ` Tom Rini
  0 siblings, 1 reply; 11+ messages in thread
From: Simon Glass @ 2015-08-03 18:52 UTC (permalink / raw)
  To: u-boot

Hi Tom,

On 31 July 2015 at 08:31, Tom Rini <trini@konsulko.com> wrote:
> On Thu, Jul 30, 2015 at 12:12:03PM +0800, Bin Meng wrote:
>
>> Hi Simon,
>>
>> When adding x86 multi-cpu initialization on a board with 4 cores, I found:
>>
>> => cpu list
>>   0: cpu at 0               Genuine Intel(R) CPU         @ 1.58GHz
>>   1: cpu at 1               Genuine Intel(R) CPU         @ 1.58GHz
>>   2: cpu at 2               Genuine Intel(R) CPU         @ 1.58GHz
>>   2: cpu at 3               Genuine Intel(R) CPU         @ 1.58GHz
>>
>> cpu at 2 and cpu at 3 have the same sequence number, which indicates they
>> are running parallelly to get the same sequence number. The call chain
>> on an ap is: mp_init_cpu() -> device_probe() -> uclass_resolve_seq().
>> Apparently ap2 and ap3 are running at the same time to get the same
>> number.
>>
>> Note so far all x86 boards that we have enabled x86 multi-cpu
>> initialization on only have 2 cores, which will not expose such issue
>> as there is no parallel execution among aps.
>
> So what exactly are we doing with these additional cores?  My
> recollection of what we do on other arches when we even deal with other
> cores is that we bring them "up" and then usually put them in a holding
> pattern for the real OS to deal with _or_ it's one of those cases where
> we have multiple OSes running and we do what we need to load and release
> those other OSes.

In this case they end up at stop_this_cpu() which is just a hlt
instruction in each case.

Regards,
Simon

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [U-Boot] driver model is not smp safe
  2015-08-03 18:52   ` Simon Glass
@ 2015-08-03 19:06     ` Tom Rini
  2015-08-03 19:27       ` Simon Glass
  0 siblings, 1 reply; 11+ messages in thread
From: Tom Rini @ 2015-08-03 19:06 UTC (permalink / raw)
  To: u-boot

On Mon, Aug 03, 2015 at 12:52:19PM -0600, Simon Glass wrote:
> Hi Tom,
> 
> On 31 July 2015 at 08:31, Tom Rini <trini@konsulko.com> wrote:
> > On Thu, Jul 30, 2015 at 12:12:03PM +0800, Bin Meng wrote:
> >
> >> Hi Simon,
> >>
> >> When adding x86 multi-cpu initialization on a board with 4 cores, I found:
> >>
> >> => cpu list
> >>   0: cpu at 0               Genuine Intel(R) CPU         @ 1.58GHz
> >>   1: cpu at 1               Genuine Intel(R) CPU         @ 1.58GHz
> >>   2: cpu at 2               Genuine Intel(R) CPU         @ 1.58GHz
> >>   2: cpu at 3               Genuine Intel(R) CPU         @ 1.58GHz
> >>
> >> cpu at 2 and cpu at 3 have the same sequence number, which indicates they
> >> are running parallelly to get the same sequence number. The call chain
> >> on an ap is: mp_init_cpu() -> device_probe() -> uclass_resolve_seq().
> >> Apparently ap2 and ap3 are running at the same time to get the same
> >> number.
> >>
> >> Note so far all x86 boards that we have enabled x86 multi-cpu
> >> initialization on only have 2 cores, which will not expose such issue
> >> as there is no parallel execution among aps.
> >
> > So what exactly are we doing with these additional cores?  My
> > recollection of what we do on other arches when we even deal with other
> > cores is that we bring them "up" and then usually put them in a holding
> > pattern for the real OS to deal with _or_ it's one of those cases where
> > we have multiple OSes running and we do what we need to load and release
> > those other OSes.
> 
> In this case they end up at stop_this_cpu() which is just a hlt
> instruction in each case.

So do we really have to be doing anything here?  Or is this just
pre-emptive work for an async MP type setup down the road?  We could
probably live with this with a big comment noting why we know it's
misbehaving.

-- 
Tom
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 836 bytes
Desc: Digital signature
URL: <http://lists.denx.de/pipermail/u-boot/attachments/20150803/08d53826/attachment.sig>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [U-Boot] driver model is not smp safe
  2015-08-03 19:06     ` Tom Rini
@ 2015-08-03 19:27       ` Simon Glass
  2015-08-05  8:43         ` Bin Meng
  0 siblings, 1 reply; 11+ messages in thread
From: Simon Glass @ 2015-08-03 19:27 UTC (permalink / raw)
  To: u-boot

Hi Tom,

On 3 August 2015 at 13:06, Tom Rini <trini@konsulko.com> wrote:
> On Mon, Aug 03, 2015 at 12:52:19PM -0600, Simon Glass wrote:
>> Hi Tom,
>>
>> On 31 July 2015 at 08:31, Tom Rini <trini@konsulko.com> wrote:
>> > On Thu, Jul 30, 2015 at 12:12:03PM +0800, Bin Meng wrote:
>> >
>> >> Hi Simon,
>> >>
>> >> When adding x86 multi-cpu initialization on a board with 4 cores, I found:
>> >>
>> >> => cpu list
>> >>   0: cpu at 0               Genuine Intel(R) CPU         @ 1.58GHz
>> >>   1: cpu at 1               Genuine Intel(R) CPU         @ 1.58GHz
>> >>   2: cpu at 2               Genuine Intel(R) CPU         @ 1.58GHz
>> >>   2: cpu at 3               Genuine Intel(R) CPU         @ 1.58GHz
>> >>
>> >> cpu at 2 and cpu at 3 have the same sequence number, which indicates they
>> >> are running parallelly to get the same sequence number. The call chain
>> >> on an ap is: mp_init_cpu() -> device_probe() -> uclass_resolve_seq().
>> >> Apparently ap2 and ap3 are running at the same time to get the same
>> >> number.
>> >>
>> >> Note so far all x86 boards that we have enabled x86 multi-cpu
>> >> initialization on only have 2 cores, which will not expose such issue
>> >> as there is no parallel execution among aps.
>> >
>> > So what exactly are we doing with these additional cores?  My
>> > recollection of what we do on other arches when we even deal with other
>> > cores is that we bring them "up" and then usually put them in a holding
>> > pattern for the real OS to deal with _or_ it's one of those cases where
>> > we have multiple OSes running and we do what we need to load and release
>> > those other OSes.
>>
>> In this case they end up at stop_this_cpu() which is just a hlt
>> instruction in each case.
>
> So do we really have to be doing anything here?  Or is this just
> pre-emptive work for an async MP type setup down the road?  We could
> probably live with this with a big comment noting why we know it's
> misbehaving.

I think we should fix it - I suggested some options above and Bin may
have ideas also. Bin may be able to send a patch since he can repeat
the problem.

Regards,
Simon

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [U-Boot] driver model is not smp safe
  2015-08-03 19:27       ` Simon Glass
@ 2015-08-05  8:43         ` Bin Meng
  2015-08-07 19:09           ` Simon Glass
  0 siblings, 1 reply; 11+ messages in thread
From: Bin Meng @ 2015-08-05  8:43 UTC (permalink / raw)
  To: u-boot

Hi Simon, Tom,

On Tue, Aug 4, 2015 at 3:27 AM, Simon Glass <sjg@chromium.org> wrote:
> Hi Tom,
>
> On 3 August 2015 at 13:06, Tom Rini <trini@konsulko.com> wrote:
>> On Mon, Aug 03, 2015 at 12:52:19PM -0600, Simon Glass wrote:
>>> Hi Tom,
>>>
>>> On 31 July 2015 at 08:31, Tom Rini <trini@konsulko.com> wrote:
>>> > On Thu, Jul 30, 2015 at 12:12:03PM +0800, Bin Meng wrote:
>>> >
>>> >> Hi Simon,
>>> >>
>>> >> When adding x86 multi-cpu initialization on a board with 4 cores, I found:
>>> >>
>>> >> => cpu list
>>> >>   0: cpu at 0               Genuine Intel(R) CPU         @ 1.58GHz
>>> >>   1: cpu at 1               Genuine Intel(R) CPU         @ 1.58GHz
>>> >>   2: cpu at 2               Genuine Intel(R) CPU         @ 1.58GHz
>>> >>   2: cpu at 3               Genuine Intel(R) CPU         @ 1.58GHz
>>> >>
>>> >> cpu at 2 and cpu at 3 have the same sequence number, which indicates they
>>> >> are running parallelly to get the same sequence number. The call chain
>>> >> on an ap is: mp_init_cpu() -> device_probe() -> uclass_resolve_seq().
>>> >> Apparently ap2 and ap3 are running at the same time to get the same
>>> >> number.
>>> >>
>>> >> Note so far all x86 boards that we have enabled x86 multi-cpu
>>> >> initialization on only have 2 cores, which will not expose such issue
>>> >> as there is no parallel execution among aps.
>>> >
>>> > So what exactly are we doing with these additional cores?  My
>>> > recollection of what we do on other arches when we even deal with other
>>> > cores is that we bring them "up" and then usually put them in a holding
>>> > pattern for the real OS to deal with _or_ it's one of those cases where
>>> > we have multiple OSes running and we do what we need to load and release
>>> > those other OSes.
>>>
>>> In this case they end up at stop_this_cpu() which is just a hlt
>>> instruction in each case.
>>
>> So do we really have to be doing anything here?  Or is this just
>> pre-emptive work for an async MP type setup down the road?  We could
>> probably live with this with a big comment noting why we know it's
>> misbehaving.
>
> I think we should fix it - I suggested some options above and Bin may
> have ideas also. Bin may be able to send a patch since he can repeat
> the problem.
>

Yes we should fix it. But IMHO, just fixing the seq number only
resolves the surface problem. What concerns me is that multiple cpu
running the same piece of codes (in this case, the DM core codes)
without any protection. I have no idea whether these core structures
(like the device list) still look good from the DM core perspective.
Although right now it seems that it only exposes the seq number issue,
we don't know if there are other potential DM issues. Thus I was
thinking fundamentally we are using DM CPU uclass in a wrong way.

Regards,
Bin

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [U-Boot] driver model is not smp safe
  2015-08-05  8:43         ` Bin Meng
@ 2015-08-07 19:09           ` Simon Glass
  2015-08-08  0:49             ` Bin Meng
  0 siblings, 1 reply; 11+ messages in thread
From: Simon Glass @ 2015-08-07 19:09 UTC (permalink / raw)
  To: u-boot

Hi Bin,

On 5 August 2015 at 02:43, Bin Meng <bmeng.cn@gmail.com> wrote:
> Hi Simon, Tom,
>
> On Tue, Aug 4, 2015 at 3:27 AM, Simon Glass <sjg@chromium.org> wrote:
>> Hi Tom,
>>
>> On 3 August 2015 at 13:06, Tom Rini <trini@konsulko.com> wrote:
>>> On Mon, Aug 03, 2015 at 12:52:19PM -0600, Simon Glass wrote:
>>>> Hi Tom,
>>>>
>>>> On 31 July 2015 at 08:31, Tom Rini <trini@konsulko.com> wrote:
>>>> > On Thu, Jul 30, 2015 at 12:12:03PM +0800, Bin Meng wrote:
>>>> >
>>>> >> Hi Simon,
>>>> >>
>>>> >> When adding x86 multi-cpu initialization on a board with 4 cores, I found:
>>>> >>
>>>> >> => cpu list
>>>> >>   0: cpu at 0               Genuine Intel(R) CPU         @ 1.58GHz
>>>> >>   1: cpu at 1               Genuine Intel(R) CPU         @ 1.58GHz
>>>> >>   2: cpu at 2               Genuine Intel(R) CPU         @ 1.58GHz
>>>> >>   2: cpu at 3               Genuine Intel(R) CPU         @ 1.58GHz
>>>> >>
>>>> >> cpu at 2 and cpu at 3 have the same sequence number, which indicates they
>>>> >> are running parallelly to get the same sequence number. The call chain
>>>> >> on an ap is: mp_init_cpu() -> device_probe() -> uclass_resolve_seq().
>>>> >> Apparently ap2 and ap3 are running at the same time to get the same
>>>> >> number.
>>>> >>
>>>> >> Note so far all x86 boards that we have enabled x86 multi-cpu
>>>> >> initialization on only have 2 cores, which will not expose such issue
>>>> >> as there is no parallel execution among aps.
>>>> >
>>>> > So what exactly are we doing with these additional cores?  My
>>>> > recollection of what we do on other arches when we even deal with other
>>>> > cores is that we bring them "up" and then usually put them in a holding
>>>> > pattern for the real OS to deal with _or_ it's one of those cases where
>>>> > we have multiple OSes running and we do what we need to load and release
>>>> > those other OSes.
>>>>
>>>> In this case they end up at stop_this_cpu() which is just a hlt
>>>> instruction in each case.
>>>
>>> So do we really have to be doing anything here?  Or is this just
>>> pre-emptive work for an async MP type setup down the road?  We could
>>> probably live with this with a big comment noting why we know it's
>>> misbehaving.
>>
>> I think we should fix it - I suggested some options above and Bin may
>> have ideas also. Bin may be able to send a patch since he can repeat
>> the problem.
>>
>
> Yes we should fix it. But IMHO, just fixing the seq number only
> resolves the surface problem. What concerns me is that multiple cpu
> running the same piece of codes (in this case, the DM core codes)
> without any protection. I have no idea whether these core structures
> (like the device list) still look good from the DM core perspective.
> Although right now it seems that it only exposes the seq number issue,
> we don't know if there are other potential DM issues. Thus I was
> thinking fundamentally we are using DM CPU uclass in a wrong way.

We don't add devices when running on the AP CPUs - we only scan lists.
So long as the boot CPU creates all the devices and then waits for
them to populate, we are OK. I don't see any fundamental problem.

Regards,
Simon

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [U-Boot] driver model is not smp safe
  2015-08-07 19:09           ` Simon Glass
@ 2015-08-08  0:49             ` Bin Meng
  0 siblings, 0 replies; 11+ messages in thread
From: Bin Meng @ 2015-08-08  0:49 UTC (permalink / raw)
  To: u-boot

Hi Simon,

On Sat, Aug 8, 2015 at 3:09 AM, Simon Glass <sjg@chromium.org> wrote:
> Hi Bin,
>
> On 5 August 2015 at 02:43, Bin Meng <bmeng.cn@gmail.com> wrote:
>> Hi Simon, Tom,
>>
>> On Tue, Aug 4, 2015 at 3:27 AM, Simon Glass <sjg@chromium.org> wrote:
>>> Hi Tom,
>>>
>>> On 3 August 2015 at 13:06, Tom Rini <trini@konsulko.com> wrote:
>>>> On Mon, Aug 03, 2015 at 12:52:19PM -0600, Simon Glass wrote:
>>>>> Hi Tom,
>>>>>
>>>>> On 31 July 2015 at 08:31, Tom Rini <trini@konsulko.com> wrote:
>>>>> > On Thu, Jul 30, 2015 at 12:12:03PM +0800, Bin Meng wrote:
>>>>> >
>>>>> >> Hi Simon,
>>>>> >>
>>>>> >> When adding x86 multi-cpu initialization on a board with 4 cores, I found:
>>>>> >>
>>>>> >> => cpu list
>>>>> >>   0: cpu at 0               Genuine Intel(R) CPU         @ 1.58GHz
>>>>> >>   1: cpu at 1               Genuine Intel(R) CPU         @ 1.58GHz
>>>>> >>   2: cpu at 2               Genuine Intel(R) CPU         @ 1.58GHz
>>>>> >>   2: cpu at 3               Genuine Intel(R) CPU         @ 1.58GHz
>>>>> >>
>>>>> >> cpu at 2 and cpu at 3 have the same sequence number, which indicates they
>>>>> >> are running parallelly to get the same sequence number. The call chain
>>>>> >> on an ap is: mp_init_cpu() -> device_probe() -> uclass_resolve_seq().
>>>>> >> Apparently ap2 and ap3 are running at the same time to get the same
>>>>> >> number.
>>>>> >>
>>>>> >> Note so far all x86 boards that we have enabled x86 multi-cpu
>>>>> >> initialization on only have 2 cores, which will not expose such issue
>>>>> >> as there is no parallel execution among aps.
>>>>> >
>>>>> > So what exactly are we doing with these additional cores?  My
>>>>> > recollection of what we do on other arches when we even deal with other
>>>>> > cores is that we bring them "up" and then usually put them in a holding
>>>>> > pattern for the real OS to deal with _or_ it's one of those cases where
>>>>> > we have multiple OSes running and we do what we need to load and release
>>>>> > those other OSes.
>>>>>
>>>>> In this case they end up at stop_this_cpu() which is just a hlt
>>>>> instruction in each case.
>>>>
>>>> So do we really have to be doing anything here?  Or is this just
>>>> pre-emptive work for an async MP type setup down the road?  We could
>>>> probably live with this with a big comment noting why we know it's
>>>> misbehaving.
>>>
>>> I think we should fix it - I suggested some options above and Bin may
>>> have ideas also. Bin may be able to send a patch since he can repeat
>>> the problem.
>>>
>>
>> Yes we should fix it. But IMHO, just fixing the seq number only
>> resolves the surface problem. What concerns me is that multiple cpu
>> running the same piece of codes (in this case, the DM core codes)
>> without any protection. I have no idea whether these core structures
>> (like the device list) still look good from the DM core perspective.
>> Although right now it seems that it only exposes the seq number issue,
>> we don't know if there are other potential DM issues. Thus I was
>> thinking fundamentally we are using DM CPU uclass in a wrong way.
>
> We don't add devices when running on the AP CPUs - we only scan lists.
> So long as the boot CPU creates all the devices and then waits for
> them to populate, we are OK. I don't see any fundamental problem.
>

OK, that makes me feel better, if we only need to resolve the seq
number issue. I will submit a patch for that.

Regards,
Bin

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2015-08-08  0:49 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-30  4:12 [U-Boot] driver model is not smp safe Bin Meng
2015-07-31 12:30 ` Andrew Bradford
2015-07-31 13:42   ` Bin Meng
2015-07-31 14:13     ` Simon Glass
2015-07-31 14:31 ` Tom Rini
2015-08-03 18:52   ` Simon Glass
2015-08-03 19:06     ` Tom Rini
2015-08-03 19:27       ` Simon Glass
2015-08-05  8:43         ` Bin Meng
2015-08-07 19:09           ` Simon Glass
2015-08-08  0:49             ` Bin Meng

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.