* [U-Boot] [EXT] Re: Cavium/Marvell Octeon Support
@ 2019-11-06  0:03 Aaron Williams
  2019-11-07  0:34 ` Tom Rini
  0 siblings, 1 reply; 32+ messages in thread
From: Aaron Williams @ 2019-11-06  0:03 UTC
  To: u-boot

Hi Tom,

________________________________
From: Tom Rini
Sent: Tuesday, November 5, 2019 6:16 AM
To: Wolfgang Denk
Cc: Aaron Williams; Daniel Schwierzeck; u-boot at lists.denx.de
Subject: Re: [EXT] Re: Cavium/Marvell Octeon Support

> On Tue, Nov 05, 2019 at 09:33:35AM +0100, Wolfgang Denk wrote:
> > Dear Aaron,
> >
> > In message <5376617.97hUrJXovB@flash> you wrote:
> > >
> > > > Again you don't answer my question.  Why do you need a special new
> > > > API for such code?  Why do you not just link that code with the rest
> > > > of U-Boot?
> > >
> > > The code in question that is calling the API is not GPL and hence cannot be
> > > linked with U-Boot though the phy code is GPL.
> >
> > Ouch.  I was afraid to hear that.
> >
> > Please be aware that your newly created API does NOT implement a GPL
> > license exception.  The only interface that allows for non-GPL code
> > to be run under control of U-Boot is the standalone program
> > interface, which is intentionally very restricted.
> >
> > In other words: what you are doing here is a clear (and intentional,
> > which makes it even worse) GPL license violation.
> >
> > > > It has been mentioned before, but just to be sure: this code which
> > > > uses your new API is licensed under a GPLv2 conforming license?
> > > >
> > > There should be no need. None of the code is linked against U-Boot, either at
> > > compile time or at runtime. The application doesn't even know where it is
> > > located except by looking for a named block of memory.
> >
> > It does not have to be linked.  You access internal interfaces of
> > U-Boot that have not been exported for non-GPL use, so your code
> > still has to be licensed under GPLv2 or a compatible license.
>
> I'm just following up to say that I agree with Wolfgang here.

Sorry for the broken formatting (our IT department forces the Outhouse web client).

I think there is some misunderstanding here. All of the code we include in U-Boot IS GPL or GPL compatible, including the API.

"Even though U-Boot in general is covered by the GPL-2.0/GPL-2.0+,
this does *not* cover the so-called "standalone" applications that
use U-Boot services by means of the jump table provided by U-Boot
exactly for this purpose - this is merely considered normal use of
U-Boot, and does *not* fall under the heading of "derived work"."

No part of U-Boot is included in these applications and no application code is included in U-Boot. We DO have SDK files used in U-Boot, but the SDK files are under a BSD-like license: basically, do whatever you want with the code but don't hold us responsible. The SDK code is also used in stand-alone applications as well as in the Linux kernel, where derivatives were upstreamed long ago.

In any event, I think at this point we can remove this support. I don't think it's used any longer. It also looks like EFI does allow for vendor defined services. I hadn't looked at this code for a while but looking at it again it also appears the phy code has been removed. I think the remaining code for QLM configuration could be modified to just use a hook from some environment variables, removing this issue entirely.

> --
> Tom

Regards,

Aaron


* [U-Boot] [EXT] Re: Cavium/Marvell Octeon Support
  2019-11-06  0:03 [U-Boot] [EXT] Re: Cavium/Marvell Octeon Support Aaron Williams
@ 2019-11-07  0:34 ` Tom Rini
  0 siblings, 0 replies; 32+ messages in thread
From: Tom Rini @ 2019-11-07  0:34 UTC
  To: u-boot

On Wed, Nov 06, 2019 at 12:03:23AM +0000, Aaron Williams wrote:
> Hi Tom,
> 
> ________________________________
> From: Tom Rini
> Sent: Tuesday, November 5, 2019 6:16 AM
> To: Wolfgang Denk
> Cc: Aaron Williams; Daniel Schwierzeck; u-boot at lists.denx.de
> Subject: Re: [EXT] Re: Cavium/Marvell Octeon Support
> 
> On Tue, Nov 05, 2019 at 09:33:35AM +0100, Wolfgang Denk wrote:
> > Dear Aaron,
> >
> > In message <5376617.97hUrJXovB@flash> you wrote:
> > >
> > > > Again you don't answer my question.  Why do you need a special new
> > > > API for such code?  Why do you not just link that code with the rest
> > > > of U-Boot?
> > >
> > > The code in question that is calling the API is not GPL and hence cannot be
> > > linked with U-Boot though the phy code is GPL.
> >
> > Ouch.  I was afraid to hear that.
> >
> > Please be aware that your newly created API does NOT implement a GPL
> > > license exception.  The only interface that allows for non-GPL code
> > to be run under control of U-Boot is the standalone program
> > interface, which is intentionally very restricted.
> >
> > In other words: what you are doing here is a clear (and intentional,
> > which makes it even worse) GPL license violation.
> >
> > > > > It has been mentioned before, but just to be sure: this code which
> > > > > uses your new API is licensed under a GPLv2 conforming license?
> > > > >
> > > > There should be no need. None of the code is linked against U-Boot, either at
> > > > compile time or at runtime. The application doesn't even know where it is
> > > > located except by looking for a named block of memory.
> >
> > It does not have to be linked.  You access internal interfaces of
> > U-Boot that have not been exported for non-GPL use, so your code
> > still has to be licensed under GPLv2 or a compatible license.
> 
> I'm just following up to say that I agree with Wolfgang here.
> 
> Sorry for the broken formatting (our IT department forces the Outhouse web client).
> 
> I think there is some misunderstanding here. All of the code we include in U-Boot IS GPL or GPL compatible, including the API.
> 
> "Even though U-Boot in general is covered by the GPL-2.0/GPL-2.0+,
> this does *not* cover the so-called "standalone" applications that
> use U-Boot services by means of the jump table provided by U-Boot
> exactly for this purpose - this is merely considered normal use of
> U-Boot, and does *not* fall under the heading of "derived work"."
> 
> No part of U-Boot is included in these applications and no application code is included in U-Boot. We DO have SDK files used in U-Boot, but the SDK files are under a BSD-like license: basically, do whatever you want with the code but don't hold us responsible. The SDK code is also used in stand-alone applications as well as in the Linux kernel, where derivatives were upstreamed long ago.
> 
> In any event, I think at this point we can remove this support. I don't think it's used any longer. It also looks like EFI does allow for vendor defined services. I hadn't looked at this code for a while but looking at it again it also appears the phy code has been removed. I think the remaining code for QLM configuration could be modified to just use a hook from some environment variables, removing this issue entirely.

Not needing to worry about how to deal with this support is indeed the
best case for everyone, thanks!

-- 
Tom

* [U-Boot] [EXT] Re: Cavium/Marvell Octeon Support
  2019-11-06 22:18                             ` Aaron Williams
@ 2019-11-07  0:21                               ` Tom Rini
  0 siblings, 0 replies; 32+ messages in thread
From: Tom Rini @ 2019-11-07  0:21 UTC
  To: u-boot

On Wed, Nov 06, 2019 at 10:18:45PM +0000, Aaron Williams wrote:
> Hi Wolfgang,
> 
> On Wednesday, November 6, 2019 7:06:17 AM PST Wolfgang Denk wrote:
> > Dear Aaron,
> > 
> > In message <BYAPR18MB24402A81E226896D208669F5B17E0@BYAPR18MB2440.namprd18.prod.outlook.com> you wrote:
> > > > Definitely not.  You could not implement any of this without heavily
> > > > relying on and deriving from internal interfaces of U-Boot which are
> > > > not exported for non-GPL use.
> > > 
> > > See
> > > https://www.gnu.org/licenses/old-licenses/gpl-2.0-faq.en.html#GPLInProprietarySystem
> > > This behaves exactly in the manner that is permitted by the GPL.
> > > They are completely separate programs.
> > 
> > Are they?
> > 
> > You wrote:
> > 
> > "There is no linking. Only a call table descriptor is published in a
> > named block of memory."
> > 
> > I can only interpret from that that there is a call table, where your
> > applications call into interfaces that have not been exported for
> > non-GPL use.  This is not what I call "completely separate".
> > 
> > 
> > Best regards,
> > 
> > Wolfgang Denk
> 
> Calling directly into U-Boot would be bad. We don't do that. It wouldn't work 
> anyway on our 32-bit bootloader due to the required TLB mapping.
> 
> There is no call table. There is a single XKPhys address that points to some 
> assembly code that saves the state of the calling application and sets up the 
> memory mapping and stack for U-Boot (we map it to 0xFFFFFFFFC0000000), then 
> looks at the opcode and parameters that were passed. From there it performs 
> one of several functions based on the opcode. On the way out the reverse is 
> done: the state and the TLB are restored before returning to the outside 
> application. The calling application has its own virtual memory map, so that 
> has to be saved on entry and restored on exit by the assembly code as well.
> 
> Since U-Boot uses a TLB for mapping, it's just not possible for an outside 
> application to call into U-Boot using a function table, so everything must go 
> through the one assembly function. The old U-Boot code was written before EFI 
> support was added. It looks like I'll be removing it anyway, though. We have 
> never exported any U-Boot functions save for the assembly code and the API 
> functionality. The API functionality was not usable by our applications since 
> our applications were typically 64-bit whereas our old U-Boot was 32-bit 
> running in mapped memory (0xFFFFFFFFC0000000/0xC0000000) and physically 
> located at the top of physical memory.

Alright, so I think here's the important thing to look at moving
forward.  In mainline U-Boot, the options for communication between
closed source components and U-Boot itself (where GPLv2 is the minimum
license) are either the defined ABI or making use of the EFI ABI.  We do
not want to add or support a 3rd method.  Thanks!

-- 
Tom

* [U-Boot] [EXT] Re: Cavium/Marvell Octeon Support
  2019-11-06 15:06                           ` Wolfgang Denk
@ 2019-11-06 22:18                             ` Aaron Williams
  2019-11-07  0:21                               ` Tom Rini
  0 siblings, 1 reply; 32+ messages in thread
From: Aaron Williams @ 2019-11-06 22:18 UTC
  To: u-boot

Hi Wolfgang,

On Wednesday, November 6, 2019 7:06:17 AM PST Wolfgang Denk wrote:
> Dear Aaron,
> 
> In message <BYAPR18MB24402A81E226896D208669F5B17E0@BYAPR18MB2440.namprd18.prod.outlook.com> you wrote:
> > > Definitely not.  You could not implement any of this without heavily
> > > relying on and deriving from internal interfaces of U-Boot which are
> > > not exported for non-GPL use.
> > 
> > See
> > https://www.gnu.org/licenses/old-licenses/gpl-2.0-faq.en.html#GPLInProprietarySystem
> > 
> > This behaves exactly in the manner that is permitted by the GPL.
> > They are completely separate programs.
> 
> Are they?
> 
> You wrote:
> 
> "There is no linking. Only a call table descriptor is published in a
> named block of memory."
> 
> I can only interpret from that that there is a call table, where your
> applications call into interfaces that have not been exported for
> non-GPL use.  This is not what I call "completely separate".
> 
> 
> Best regards,
> 
> Wolfgang Denk

Calling directly into U-Boot would be bad. We don't do that. It wouldn't work 
anyway on our 32-bit bootloader due to the required TLB mapping.

There is no call table. There is a single XKPhys address that points to some 
assembly code that saves the state of the calling application and sets up the 
memory mapping and stack for U-Boot (we map it to 0xFFFFFFFFC0000000), then 
looks at the opcode and parameters that were passed. From there it performs 
one of several functions based on the opcode. On the way out the reverse is 
done: the state and the TLB are restored before returning to the outside 
application. The calling application has its own virtual memory map, so that 
has to be saved on entry and restored on exit by the assembly code as well.

Since U-Boot uses a TLB for mapping, it's just not possible for an outside 
application to call into U-Boot using a function table, so everything must go 
through the one assembly function. The old U-Boot code was written before EFI 
support was added. It looks like I'll be removing it anyway, though. We have 
never exported any U-Boot functions save for the assembly code and the API 
functionality. The API functionality was not usable by our applications since 
our applications were typically 64-bit whereas our old U-Boot was 32-bit 
running in mapped memory (0xFFFFFFFFC0000000/0xC0000000) and physically 
located at the top of physical memory.
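
To make the two address forms mentioned above concrete, here is a minimal C
sketch. It assumes the standard MIPS64 conventions as I understand them: a
32-bit virtual address is sign-extended to 64 bits (which is why 0xC0000000
and 0xFFFFFFFFC0000000 name the same mapped location), and an XKPhys address
carries a cache attribute in bits 61:59 above the physical address. The helper
names are hypothetical and are not taken from the Octeon U-Boot code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: sign-extend a 32-bit virtual address to 64 bits. */
uint64_t sign_extend32(uint32_t addr32)
{
    if (addr32 & 0x80000000u)
        return 0xFFFFFFFF00000000ULL | addr32;
    return addr32;
}

/* XKPhys: bits 63:62 = 0b10, bits 61:59 = cache attribute, low bits = PA. */
#define XKPHYS_BASE    0x8000000000000000ULL
#define XKPHYS_PA_MASK ((1ULL << 59) - 1)

/* Hypothetical helper: build an XKPhys address for a physical address. */
uint64_t xkphys_addr(unsigned int cca, uint64_t phys)
{
    assert(cca < 8); /* the cache attribute field is three bits wide */
    return XKPHYS_BASE | ((uint64_t)cca << 59) | (phys & XKPHYS_PA_MASK);
}

/* Hypothetical helper: recover the physical address from an XKPhys address. */
uint64_t xkphys_to_phys(uint64_t vaddr)
{
    return vaddr & XKPHYS_PA_MASK;
}
```

Because XKPhys is unmapped, an address built this way is usable regardless of
which TLB entries the calling application has installed, which is presumably
why the entry stub is published as an XKPhys address.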

-Aaron


* [U-Boot] [EXT] Re: Cavium/Marvell Octeon Support
  2019-11-05 23:09                         ` Aaron Williams
@ 2019-11-06 15:06                           ` Wolfgang Denk
  2019-11-06 22:18                             ` Aaron Williams
  0 siblings, 1 reply; 32+ messages in thread
From: Wolfgang Denk @ 2019-11-06 15:06 UTC
  To: u-boot

Dear Aaron,

In message <BYAPR18MB24402A81E226896D208669F5B17E0@BYAPR18MB2440.namprd18.prod.outlook.com> you wrote:
>
> > Definitely not.  You could not implement any of this without heavily
> > relying on and deriving from internal interfaces of U-Boot which are
> > not exported for non-GPL use.
>
> See https://www.gnu.org/licenses/old-licenses/gpl-2.0-faq.en.html#GPLInProprietarySystem
>
> This behaves exactly in the manner that is permitted by the GPL.
> They are completely separate programs.

Are they?

You wrote:

"There is no linking. Only a call table descriptor is published in a
named block of memory."

I can only interpret from that that there is a call table, where your
applications call into interfaces that have not been exported for
non-GPL use.  This is not what I call "completely separate".


Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd at denx.de
A conservative is a man who believes that nothing should be done for
the first time.                                   - Alfred E. Wiggam


* [U-Boot] [EXT] Re: Cavium/Marvell Octeon Support
  2019-11-05 11:36                       ` Wolfgang Denk
@ 2019-11-05 23:09                         ` Aaron Williams
  2019-11-06 15:06                           ` Wolfgang Denk
  0 siblings, 1 reply; 32+ messages in thread
From: Aaron Williams @ 2019-11-05 23:09 UTC
  To: u-boot



________________________________
From: Wolfgang Denk <wd@denx.de>
Sent: Tuesday, November 5, 2019 3:36 AM
To: Aaron Williams <awilliams@marvell.com>
Cc: Tom Rini <trini@konsulko.com>; Daniel Schwierzeck <daniel.schwierzeck@gmail.com>; u-boot at lists.denx.de <u-boot@lists.denx.de>
Subject: Re: [EXT] Re: Cavium/Marvell Octeon Support

Hi Wolfgang,

I apologize in advance for the lack of email formatting (blame our IT department for forcing Linux users to use the broken Outhouse web client).

> Dear Aaron,
>
> In message <2609392.0ByMiX4J6F@flash> you wrote:
> >
> > U-Boot OS might be fun for people writing applications where they want bare
> > metal (i.e. hard real-time), though that's already provided with the API and
> > examples.
>
> Urgh... no!!! U-Boot is definitely *not* suitable for any kind of
> real-time tasks.  By design it implements strict single-tasking with
> usually polling hardware access only.  No multi-tasking, no
> interrupts, no locking, no timers, nothing...

And I wouldn't ask U-Boot to do this. We don't do any multi-tasking with U-Boot with the exception of SoC specific code that deals with starting simple executive applications. Our API uses a single giant spinlock to prevent there being any multi-tasking within U-Boot.

Now there is other SoC specific code that does use locks and does support multiple cores simultaneously running code. This is needed when we start these Simple Executive applications. The code allows for multiple applications as well as the Linux kernel to be started simultaneously from within U-Boot. The code is executed by all cores in use and does things like set up memory and TLB mapping for the simple executive applications for each core. None of this code would be exposed outside of our SoC code and there is zero interaction with any of U-Boot's code. Each simple executive application has a core mask of cores assigned to it. Obviously in order to be able to do this there is locking within the SoC specific code. It does not involve any code outside of the SoC in order to do this.


> > You can't get much more arm's length than that except perhaps requiring U-Boot
> > to use an interrupt. They are, by just about any definition, completely
> > separate binaries. I'm no lawyer, but reading the GPL FAQ I think we fall well
> > within the arm's length separation.
>
> Definitely not.  You could not implement any of this without heavily
> relying on and deriving from internal interfaces of U-Boot which are
> not exported for non-GPL use.

See https://www.gnu.org/licenses/old-licenses/gpl-2.0-faq.en.html#GPLInProprietarySystem

This behaves exactly in the manner that is permitted by the GPL. They are completely separate programs.

> Best regards,
>
> Wolfgang Denk

Regards,

Aaron Williams

> --
> DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
> Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd at denx.de
> The IQ of the group is the lowest IQ of a member of the group divided
> by the number of people in the group.


* [U-Boot] [EXT] Re: Cavium/Marvell Octeon Support
  2019-11-05  8:33                 ` Wolfgang Denk
@ 2019-11-05 14:16                   ` Tom Rini
  0 siblings, 0 replies; 32+ messages in thread
From: Tom Rini @ 2019-11-05 14:16 UTC
  To: u-boot

On Tue, Nov 05, 2019 at 09:33:35AM +0100, Wolfgang Denk wrote:
> Dear Aaron,
> 
> In message <5376617.97hUrJXovB@flash> you wrote:
> >
> > > Again you don't answer my question.  Why do you need a special new
> > > API for such code?  Why do you not just link that code with the rest
> > > of U-Boot?
> >
> > The code in question that is calling the API is not GPL and hence cannot be 
> > linked with U-Boot though the phy code is GPL.
> 
> Ouch.  I was afraid to hear that.
> 
> Please be aware that your newly created API does NOT implement a GPL
> license exception.  The only interface that allows for non-GPL code
> to be run under control of U-Boot is the standalone program
> interface, which is intentionally very restricted.
> 
> In other words: what you are doing here is a clear (and intentional,
> which makes it even worse) GPL license violation.
> 
> > > It has been mentioned before, but just to be sure: this code which
> > > uses your new API is licensed under a GPLv2 conforming license?
> > > 
> > There should be no need. None of the code is linked against U-Boot, either at 
> > compile time or at runtime. The application doesn't even know where it is 
> > located except by looking for a named block of memory.
> 
> It does not have to be linked.  You access internal interfaces of
> U-Boot that have not been exported for non-GPL use, so your code
> still has to be licensed under GPLv2 or a compatible license.

I'm just following up to say that I agree with Wolfgang here.

-- 
Tom

* [U-Boot] [EXT] Re: Cavium/Marvell Octeon Support
  2019-11-05  2:08                 ` Aaron Williams
  2019-11-05  8:37                   ` Wolfgang Denk
@ 2019-11-05 14:15                   ` Tom Rini
  1 sibling, 0 replies; 32+ messages in thread
From: Tom Rini @ 2019-11-05 14:15 UTC
  To: u-boot

On Tue, Nov 05, 2019 at 02:08:54AM +0000, Aaron Williams wrote:
> On Monday, November 4, 2019 8:23:08 AM PST Tom Rini wrote:
> > On Mon, Nov 04, 2019 at 04:44:18PM +0100, Wolfgang Denk wrote:
> > > Dear Aaron,
> > > 
> > > In message <2710076.TiSPtmOvtb@flash> you wrote:
> > > > > What exactly do you need this for?  Why don't you just link your
> > > > > code with the rest of U-Boot?
> > > > 
> > > > We need it to obtain and modify the phy parameters. This is a custom 25G
> > > > gearbox that needs a lot of hand holding. This may end up being a low
> > > > priority (not the gearbox, but the API). It's only a few hundred lines
> > > > of code (the API).
> > > 
> > > Again you don't answer my question.  Why do you need a special new
> > > API for such code?  Why do you not just link that code with the rest
> > > of U-Boot?
> > > 
> > > It has been mentioned before, but just to be sure: this code which
> > > uses your new API is licensed under a GPLv2 conforming license?
> > 
> > And, to be blunt, if it is not, handling your non-GPLv2 applications
> > via an EFI binary is the way forward, not extending the U-Boot binary
> > ABI, in my opinion.
> 
> To be blunt, the current U-Boot EFI driver does not provide the required 
> functionality. It would need to be extended in order to work. In addition, 
> spinlocks would be required in order to handle the case of reentrancy. Also, 
> how does the EFI loader deal with loading multiple applications across 
> multiple cores? The block support is the least important part of it. There are 
> several other services not related to block devices or network calls.

If there are parts of the EFI specification that we do not implement,
but could implement, it would be a much appreciated contribution to the
code.  If once you're up in the EFI world there are things you cannot do
that you need to do, that should be taken up with the UEFI consortium.

-- 
Tom

* [U-Boot] [EXT] Re: Cavium/Marvell Octeon Support
  2019-11-05  2:13           ` Aaron Williams
@ 2019-11-05 14:09             ` Tom Rini
  0 siblings, 0 replies; 32+ messages in thread
From: Tom Rini @ 2019-11-05 14:09 UTC
  To: u-boot

On Tue, Nov 05, 2019 at 02:13:13AM +0000, Aaron Williams wrote:
> Hi Wolfgang,
> 
> On Monday, November 4, 2019 9:22:16 AM PST Tom Rini wrote:
> > On Thu, Oct 31, 2019 at 06:01:34PM +0000, Aaron Williams wrote:
> > > Hi Wolfgang,
> > > 
> > > On Thursday, October 31, 2019 3:40:27 AM PDT Wolfgang Denk wrote:
> > > > Dear Aaron,
> > > > 
> > > > In message <1889679.7FQr5zsBR1@flash> you wrote:
> > > > > Currently we are using 39MB under arch/mips. I think I can easily cut
> > > > > this down to 15MB or smaller, especially by moving some code here to
> > > > > the appropriate driver directories (e.g. DRAM, pcie, watchdog, etc.)
> > > > > 
> > > > > It will still be a large SoC, though.
> > > > 
> > > > Have you already looked at formal requirements, like coding style
> > > > etc.?   Did you ever run your additions through checkpatch.pl, for
> > > > example?
> > > 
> > > We did follow the formal coding style. Everything will go through
> > > checkpatch. My biggest complaint about it is the 80 columns for debug and
> > > other print statements.
> > 
> > checkpatch doesn't complain about those when they use standard logging
> > functions, however.
> 
> It complains plenty about printf(), debug() and a number of other standard
> U-Boot logging calls.

Yes, but not about pr_debug, etc, which are what really should be used.
Thanks!

-- 
Tom

* [U-Boot] [EXT] Re: Cavium/Marvell Octeon Support
  2019-11-05 10:22                     ` Aaron Williams
@ 2019-11-05 11:36                       ` Wolfgang Denk
  2019-11-05 23:09                         ` Aaron Williams
  0 siblings, 1 reply; 32+ messages in thread
From: Wolfgang Denk @ 2019-11-05 11:36 UTC
  To: u-boot

Dear Aaron,

In message <2609392.0ByMiX4J6F@flash> you wrote:
>
> U-Boot OS might be fun for people writing applications where they want bare 
> metal (i.e. hard real-time), though that's already provided with the API and 
> examples.

Urgh... no!!! U-Boot is definitely *not* suitable for any kind of
real-time tasks.  By design it implements strict single-tasking with
usually polling hardware access only.  No multi-tasking, no
interrupts, no locking, no timers, nothing...

> You can't get much more arm's length than that except perhaps requiring U-Boot 
> to use an interrupt. They are, by just about any definition, completely 
> separate binaries. I'm no lawyer, but reading the GPL FAQ I think we fall well 
> within the arm's length separation.

Definitely not.  You could not implement any of this without heavily
relying on and deriving from internal interfaces of U-Boot which are
not exported for non-GPL use.


Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd at denx.de
The IQ of the group is the lowest IQ of a member of the group divided
by the number of people in the group.


* [U-Boot] [EXT] Re: Cavium/Marvell Octeon Support
  2019-11-05  8:37                   ` Wolfgang Denk
@ 2019-11-05 10:22                     ` Aaron Williams
  2019-11-05 11:36                       ` Wolfgang Denk
  0 siblings, 1 reply; 32+ messages in thread
From: Aaron Williams @ 2019-11-05 10:22 UTC
  To: u-boot

Hi Wolfgang,

On Tuesday, November 5, 2019 12:37:26 AM PST Wolfgang Denk wrote:
> Dear Aaron,
> 
> In message <1838672.aZrPjDvGh8@flash> you wrote:
> > To be blunt, the current U-Boot EFI driver does not provide the required
> > functionality. It would need to be extended in order to work. In addition,
> > spinlocks would be required in order to handle the case of reentrancy.
> > Also, how does the EFI loader deal with loading multiple applications
> > across multiple cores? The block support is the least important part of
> > it. There are several other services not related to block devices or
> > network calls.
> Maybe you are just trying to squeeze too much of operating system
> functionality into a mere boot loader?
> 
> Using tools for purposes they have not been designed for has never
> been a good idea...
> 
> Best regards,
> 
> Wolfgang Denk

With the complexity of U-Boot, it certainly exceeds a number of operating 
systems I've used :)

U-Boot OS might be fun for people writing applications where they want bare 
metal (i.e. hard real-time), though that's already provided with the API and 
examples.

Our API is very much at arm's length. It consists of a descriptor placed into a 
named block of memory that has the physical address of a single entry point, 
version information and a magic number, similar to EFI. There has to be some 
way to hand the CPU over to U-Boot, after all. That single entry point is 
basically a syscall. It saves the context of the caller, performs a TLB 
context switch, and sets up a new stack and the TLB mapping for U-Boot (we run 
U-Boot at 0xFFFFFFFFC0000000). There is also a spinlock so that no other core 
may enter U-Boot until the current request finishes. The C code then 
interprets the opcode and copies any data (using physical addresses) into 
buffers used by U-Boot; when done, it copies the data back to the 
application's pointers (which are physical addresses). U-Boot code other than 
the API never sees outside pointers, and all data is copied to a local buffer. 
It's not fast but it's been very reliable.  The external program doesn't need 
to know anything other than how to pass some parameters and call the address 
to hand the CPU context over to U-Boot. Neither side knows anything about the 
other. You can't get much more arm's length than that except perhaps requiring 
U-Boot to use an interrupt. They are, by just about any definition, completely 
separate binaries. I'm no lawyer, but reading the GPL FAQ I think we fall well 
within the arm's length separation.
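
The mechanism described above can be sketched in C roughly as follows. This is
only an illustration under stated assumptions, not the Octeon code: the magic
value, version, opcodes, structure layout and function names are all
hypothetical, a function pointer stands in for the physical entry address, and
the context save, TLB switch and stack setup done by the real assembly stub
are omitted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define API_MAGIC   0x4f435442u /* hypothetical magic number */
#define API_VERSION 1u          /* hypothetical version */

enum api_opcode { API_OP_ECHO = 1, API_OP_SUM = 2 }; /* hypothetical opcodes */

typedef int (*api_entry_fn)(uint32_t op, const void *in, void *out, size_t len);

/* Descriptor published in a named block of memory (layout is assumed). */
struct api_descriptor {
    uint32_t magic;
    uint32_t version;
    api_entry_fn entry; /* stands in for the physical entry address */
};

static atomic_flag api_lock = ATOMIC_FLAG_INIT;

/* Bootloader side: the single entry point, serialized by one spinlock. */
int api_entry(uint32_t op, const void *in, void *out, size_t len)
{
    uint8_t local[64]; /* the bootloader only ever touches this copy */
    int ret = 0;

    if (in == NULL || out == NULL || len > sizeof(local))
        return -1;

    while (atomic_flag_test_and_set_explicit(&api_lock, memory_order_acquire))
        ; /* spin until no other core is inside */

    memcpy(local, in, len); /* copy the caller's data in */

    switch (op) {
    case API_OP_ECHO:
        break;
    case API_OP_SUM: {
        uint32_t sum = 0;
        for (size_t i = 0; i < len; i++)
            sum += local[i];
        memset(local, 0, len);
        if (len > 0)
            local[0] = (uint8_t)sum; /* low byte of the sum */
        break;
    }
    default:
        ret = -1; /* unknown opcode */
        break;
    }

    if (ret == 0)
        memcpy(out, local, len); /* copy the result back out */

    atomic_flag_clear_explicit(&api_lock, memory_order_release);
    return ret;
}

/* Application side: validate the descriptor, then hand over control. */
int api_call(const struct api_descriptor *desc, uint32_t op,
             const void *in, void *out, size_t len)
{
    assert(desc != NULL);
    if (desc->magic != API_MAGIC || desc->version != API_VERSION)
        return -1;
    return desc->entry(op, in, out, len);
}
```

The point of the copy-in/copy-out step in this sketch is that code behind the
entry point never dereferences a caller-supplied pointer after the initial
copy, which matches the description of U-Boot proper never seeing outside
pointers.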

At least on MIPS, U-Boot doesn't seem to care which core it's running on as 
long as only one core is executing at a time. It's proven to be quite 
reliable. It's not meant to be a heavy-duty OS and by design it limits how 
much I/O can be performed. It's only meant to load and save configuration and 
a few other operations. Even functions like getc/putc are not supported (since 
the native application can do that). The main functions used are for changing 
the phy parameters and the MAC quad-lane-module parameters like amplitude and 
equalization which goes along with the phy code.

It also provides some very basic file I/O and block I/O and environment 
variable support like EFI. EFI would be nice to use, but it would require the 
proper lock support and a few other things to work in a multi-core 
environment.

It could be converted over to EFI, though EFI would need to be expanded in 
order to provide the spinlocks and a few other minor changes for the SoC. EFI 
would also need to be expanded to allow for platform-specific calls to be 
supported related to the phy and QLM.

Ideally we won't need this at all with some of the work we're doing on the 
Linux kernel.

Regards,

Aaron


* [U-Boot] [EXT] Re: Cavium/Marvell Octeon Support
  2019-11-05  2:08                 ` Aaron Williams
@ 2019-11-05  8:37                   ` Wolfgang Denk
  2019-11-05 10:22                     ` Aaron Williams
  2019-11-05 14:15                   ` Tom Rini
  1 sibling, 1 reply; 32+ messages in thread
From: Wolfgang Denk @ 2019-11-05  8:37 UTC (permalink / raw)
  To: u-boot

Dear Aaron,

In message <1838672.aZrPjDvGh8@flash> you wrote:
>
> To be blunt, the current U-Boot EFI driver does not provide the required 
> functionality. It would need to be extended in order to work. In addition, 
> spinlocks would be required in order to handle the case of reentrancy. Also, 
> how does the EFI loader deal with loading multiple applications across 
> multiple cores? The block support is the least important part of it. There are 
> several other services not related to block devices or network calls.

Maybe you are just trying to squeeze too much operating system
functionality into a mere boot loader?

Using tools for purposes they have not been designed for has never
been a good idea...

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd at denx.de
I think animal testing is a terrible idea; they get all  nervous  and
give the wrong answers.

* [U-Boot] [EXT] Re: Cavium/Marvell Octeon Support
  2019-11-05  1:57               ` Aaron Williams
@ 2019-11-05  8:33                 ` Wolfgang Denk
  2019-11-05 14:16                   ` Tom Rini
  0 siblings, 1 reply; 32+ messages in thread
From: Wolfgang Denk @ 2019-11-05  8:33 UTC (permalink / raw)
  To: u-boot

Dear Aaron,

In message <5376617.97hUrJXovB@flash> you wrote:
>
> > Again you don't answer my question.  Why do you need a special new
> > API for such code?  Why do you not just link that code with the rest
> > of U-Boot?
>
> The code in question that is calling the API is not GPL and hence cannot be 
> linked with U-Boot though the phy code is GPL.

Ouch.  I was afraid to hear that.

Please be aware that your newly created API does NOT implement a GPL
license exception.  The only interface that allows for non-GPL code
to be run under control of U-Boot is the standalone program
interface, which is intentionally very restricted.

In other words: what you are doing here is a clear (and intentional,
which makes it even worse) GPL license violation.

> > It has been mentioned before, but just to be sure: this code which
> > uses your new API is licensed under a GPLv2 conforming license?
> > 
> There should be no need. None of the code is linked against U-Boot, either at 
> compile time or at runtime. The application doesn't even know where it is 
> located except by looking for a named block of memory.

It does not have to be linked.  You access internal interfaces of
U-Boot that have not been exported for non-GPL use, so your code
still has to be licensed under GPLv2 or a compatible license.

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd at denx.de
"I've finally learned what `upward compatible' means. It means we get
to keep all our old mistakes." - Dennie van Tassel

* [U-Boot] [EXT] Re: Cavium/Marvell Octeon Support
  2019-11-04 17:22         ` Tom Rini
@ 2019-11-05  2:13           ` Aaron Williams
  2019-11-05 14:09             ` Tom Rini
  0 siblings, 1 reply; 32+ messages in thread
From: Aaron Williams @ 2019-11-05  2:13 UTC (permalink / raw)
  To: u-boot

Hi Wolfgang,

On Monday, November 4, 2019 9:22:16 AM PST Tom Rini wrote:
> On Thu, Oct 31, 2019 at 06:01:34PM +0000, Aaron Williams wrote:
> > Hi Wolfgang,
> > 
> > On Thursday, October 31, 2019 3:40:27 AM PDT Wolfgang Denk wrote:
> > > Dear Aaron,
> > > 
> > > In message <1889679.7FQr5zsBR1@flash> you wrote:
> > > > Currently we are using 39MB under arch/mips. I think I can easily cut
> > > > this
> > > > down to 15MB or smaller, especially by moving some code here to the
> > > > appropriate driver directories (e.g. DRAM, PCIe, watchdog)
> > > > 
> > > > It will still be a large SoC, though.
> > > 
> > > Have you already looked at formal requirements, like coding style
> > > etc.?   Did you ever run your additions through checkpatch.pl, for
> > > example?
> > 
> > We did follow the formal coding style. Everything will go through
> > checkpatch. My biggest complaint about it is the 80 columns for debug and
> > other print statements.
> 
> checkpatch doesn't complain about those when they use standard logging
> functions, however.

It complains plenty about printf(), debug() and a number of other standard 
U-Boot logging calls.

Regards,

Aaron

* [U-Boot] [EXT] Re: Cavium/Marvell Octeon Support
  2019-11-04 16:23               ` Tom Rini
@ 2019-11-05  2:08                 ` Aaron Williams
  2019-11-05  8:37                   ` Wolfgang Denk
  2019-11-05 14:15                   ` Tom Rini
  0 siblings, 2 replies; 32+ messages in thread
From: Aaron Williams @ 2019-11-05  2:08 UTC (permalink / raw)
  To: u-boot

On Monday, November 4, 2019 8:23:08 AM PST Tom Rini wrote:
> On Mon, Nov 04, 2019 at 04:44:18PM +0100, Wolfgang Denk wrote:
> > Dear Aaron,
> > 
> > In message <2710076.TiSPtmOvtb@flash> you wrote:
> > > > What exactly do you need this for?  Why don't you just link your
> > > > code with the rest of U-Boot?
> > > 
> > > We need it to obtain and modify the phy parameters. This is a custom 25G
> > > gearbox that needs a lot of hand holding. This may end up being a low
> > > priority (not the gearbox, but the API). It's only a few hundred lines
> > > of code (the API).
> > 
> > Again you don't answer my question.  Why do you need a special new
> > API for such code?  Why do you not just link that code with the rest
> > of U-Boot?
> > 
> > It has been mentioned before, but just to be sure: this code which
> > > uses your new API is licensed under a GPLv2 conforming license?
> 
> And, to be blunt, if it is not, handling your non-GPLv2 applications
> via an EFI binary is the way forward, not extending the U-Boot binary
> ABI, in my opinion.

To be blunt, the current U-Boot EFI driver does not provide the required 
functionality. It would need to be extended in order to work. In addition, 
spinlocks would be required in order to handle the case of reentrancy. Also, 
how does the EFI loader deal with loading multiple applications across 
multiple cores? The block support is the least important part of it. There are 
several other services not related to block devices or network calls.

-Aaron

* [U-Boot] [EXT] Re: Cavium/Marvell Octeon Support
  2019-11-04 15:44             ` Wolfgang Denk
  2019-11-04 16:23               ` Tom Rini
@ 2019-11-05  1:57               ` Aaron Williams
  2019-11-05  8:33                 ` Wolfgang Denk
  1 sibling, 1 reply; 32+ messages in thread
From: Aaron Williams @ 2019-11-05  1:57 UTC (permalink / raw)
  To: u-boot

Hi Wolfgang,

On Monday, November 4, 2019 7:44:18 AM PST Wolfgang Denk wrote:
> Dear Aaron,
> 
> In message <2710076.TiSPtmOvtb@flash> you wrote:
> > > What exactly do you need this for?  Why don't you just link your
> > > code with the rest of U-Boot?
> > 
> > We need it to obtain and modify the phy parameters. This is a custom 25G
> > gearbox that needs a lot of hand holding. This may end up being a low
> > priority (not the gearbox, but the API). It's only a few hundred lines of
> > code (the API).
> 
> Again you don't answer my question.  Why do you need a special new
> API for such code?  Why do you not just link that code with the rest
> of U-Boot?

The code in question that is calling the API is not GPL and hence cannot be 
linked with U-Boot, though the phy code is GPL. The applications that are 
calling also have their own virtual memory configuration and there can be 
multiple applications running on multiple cores that can make simultaneous 
calls. Because of the way the phy must be maintained with a lot of state 
information, the code controlling it cannot be spread between the separate 
independent applications which run on their own dedicated cores and address 
spaces. The API I wrote takes care of the required context switching and 
provides the services for these applications, such as control of the phy, 
access to devices like eMMC, tuning our QLM interfaces (this code is required 
for U-Boot networking anyway), etc. There is no linking. Only a call table 
descriptor is published in a named block of memory. The API also provides the 
necessary spinlocks and stack switching. The code in question adds around 36K 
in total, so it is fairly small. The main differences are the addition of a 
number of calls that are unique to our needs and the method of calling, since 
a context switch is required in addition to the spinlocks.
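
As a rough illustration (not the real Octeon API), a published call table guarded by a spinlock could be sketched in C11 as below. Every name here (`api_call_table`, `API_MAGIC`, `svc_echo`, `svc_add_one`, `demo_table`, `api_call`) is made up for the example, and the stack and context switch that the real code performs is deliberately left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define API_MAGIC     0x41504931u /* arbitrary tag, hypothetical */
#define API_MAX_CALLS 4

typedef int (*api_fn)(uint64_t arg);

/* Hypothetical descriptor a caller would find in a named block. */
struct api_call_table {
	uint32_t    magic;                /* sanity tag checked by callers */
	uint32_t    num_calls;            /* number of valid entries */
	atomic_flag lock;                 /* serializes calls across cores */
	api_fn      calls[API_MAX_CALLS]; /* service entry points */
};

/* Two toy services standing in for phy/QLM operations. */
static int svc_echo(uint64_t arg)    { return (int)(arg & 0x7fffffff); }
static int svc_add_one(uint64_t arg) { return (int)(arg & 0x7ffffffe) + 1; }

struct api_call_table demo_table = {
	.magic     = API_MAGIC,
	.num_calls = 2,
	.lock      = ATOMIC_FLAG_INIT,
	.calls     = { svc_echo, svc_add_one },
};

/* Dispatch one call by index; returns -1 on a bad table or index. */
int api_call(struct api_call_table *t, uint32_t idx, uint64_t arg)
{
	int ret;

	if (!t || t->magic != API_MAGIC || idx >= t->num_calls)
		return -1;
	assert(t->calls[idx]);

	/* Spin until this core owns the lock. */
	while (atomic_flag_test_and_set_explicit(&t->lock, memory_order_acquire))
		;
	ret = t->calls[idx](arg);
	atomic_flag_clear_explicit(&t->lock, memory_order_release);
	return ret;
}
```

The lock ensures that only one core executes service code at a time, which is the property the paragraph above relies on.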

The phy in question also does not fit in the normal phy framework. It doesn't 
even communicate with SMI. It is a complex gearbox where there needs to be 
interaction between applications and the gearbox where some code runs on the 
phy itself but a lot needs to be external.

The API also provides a number of other services such as access to and saving 
environment variables as well as access to block devices and filesystems. It 
is centralized in U-Boot because 1) the functionality is already available in 
U-Boot which is in memory anyway and 2) it's centralized and accessible by all 
applications so it can safely provide services to multiple applications 
simultaneously.

These applications are primarily bare-metal applications.

It may be that this functionality isn't needed. I will try and remove it if I 
can.

> 
> It has been mentioned before, but just to be sure: this code which
> uses your new API is licensed under a GPLv2 conforming license?
> 
There should be no need. None of the code is linked against U-Boot, either at 
compile time or at runtime. The application doesn't even know where it is 
located except by looking for a named block of memory.

This is another thing we make use of in Octeon. There is a concept of named 
blocks in memory. These named blocks are used by U-Boot, simple executive 
applications and the Linux kernel. This allows physical memory to be 
partitioned between Linux and Simple Executive applications as well as 
providing some blocks that are used by some hardware blocks. I believe this 
support is already in the upstream Linux kernel for Octeon.
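
To make the idea concrete, a lookup over a table of named blocks could be sketched as follows. The structure layout, the name length and the sample entries are assumptions made for this example only; they are not the actual Octeon bootmem descriptor format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NAMED_BLOCK_NAME_LEN 32 /* hypothetical maximum name length */

/* Hypothetical named-block entry: a physical range plus a name. */
struct named_block {
	uint64_t base;                       /* physical base address */
	uint64_t size;                       /* size in bytes, 0 = free slot */
	char     name[NAMED_BLOCK_NAME_LEN]; /* NUL-terminated name */
};

/* Sample table with made-up addresses and names. */
const struct named_block demo_blocks[] = {
	{ 0x10000000, 0x00100000, "linux-reserved" },
	{ 0,          0,          "" },
	{ 0x20000000, 0x00004000, "phy-api" },
};

/* Return the entry with the given name, or NULL if none is in use. */
const struct named_block *named_block_find(const struct named_block *tbl,
					   size_t count, const char *name)
{
	assert(tbl && name);

	for (size_t i = 0; i < count; i++) {
		if (tbl[i].size != 0 &&
		    strncmp(tbl[i].name, name, NAMED_BLOCK_NAME_LEN) == 0)
			return &tbl[i];
	}
	return NULL;
}
```

An application would look up a block such as "phy-api" by name and read the descriptor stored there, without knowing anything else about where U-Boot lives.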

> Best regards,
> 
> Wolfgang Denk

Regards,

Aaron

* [U-Boot] [EXT] Re: Cavium/Marvell Octeon Support
  2019-10-31 18:01       ` Aaron Williams
@ 2019-11-04 17:22         ` Tom Rini
  2019-11-05  2:13           ` Aaron Williams
  0 siblings, 1 reply; 32+ messages in thread
From: Tom Rini @ 2019-11-04 17:22 UTC (permalink / raw)
  To: u-boot

On Thu, Oct 31, 2019 at 06:01:34PM +0000, Aaron Williams wrote:
> Hi Wolfgang,
> 
> On Thursday, October 31, 2019 3:40:27 AM PDT Wolfgang Denk wrote:
> > Dear Aaron,
> > 
> > In message <1889679.7FQr5zsBR1@flash> you wrote:
> > > Currently we are using 39MB under arch/mips. I think I can easily cut this
> > > down to 15MB or smaller, especially by moving some code here to the
> > > > appropriate driver directories (e.g. DRAM, PCIe, watchdog)
> > > 
> > > It will still be a large SoC, though.
> > 
> > Have you already looked at formal requirements, like coding style
> > etc.?   Did you ever run your additions through checkpatch.pl, for
> > example?
> 
> We did follow the formal coding style. Everything will go through checkpatch. 
> My biggest complaint about it is the 80 columns for debug and other print 
> statements.

checkpatch doesn't complain about those when they use standard logging
functions, however.

-- 
Tom

* [U-Boot] [EXT] Re: Cavium/Marvell Octeon Support
  2019-11-04 15:44             ` Wolfgang Denk
@ 2019-11-04 16:23               ` Tom Rini
  2019-11-05  2:08                 ` Aaron Williams
  2019-11-05  1:57               ` Aaron Williams
  1 sibling, 1 reply; 32+ messages in thread
From: Tom Rini @ 2019-11-04 16:23 UTC (permalink / raw)
  To: u-boot

On Mon, Nov 04, 2019 at 04:44:18PM +0100, Wolfgang Denk wrote:
> Dear Aaron,
> 
> In message <2710076.TiSPtmOvtb@flash> you wrote:
> >
> > > What exactly do you need this for?  Why don't you just link your
> > > code with the rest of U-Boot?
> >
> > We need it to obtain and modify the phy parameters. This is a custom 25G 
> > gearbox that needs a lot of hand holding. This may end up being a low priority 
> > (not the gearbox, but the API). It's only a few hundred lines of code (the 
> > API).
> 
> Again you don't answer my question.  Why do you need a special new
> API for such code?  Why do you not just link that code with the rest
> of U-Boot?
> 
> It has been mentioned before, but just to be sure: this code which
> uses your new API is licensed under a GPLv2 conforming license?

And, to be blunt, if it is not, handling your non-GPLv2 applications
via an EFI binary is the way forward, not extending the U-Boot binary
ABI, in my opinion.

-- 
Tom

* [U-Boot] [EXT] Re: Cavium/Marvell Octeon Support
  2019-10-31 17:59           ` Aaron Williams
@ 2019-11-04 15:44             ` Wolfgang Denk
  2019-11-04 16:23               ` Tom Rini
  2019-11-05  1:57               ` Aaron Williams
  0 siblings, 2 replies; 32+ messages in thread
From: Wolfgang Denk @ 2019-11-04 15:44 UTC (permalink / raw)
  To: u-boot

Dear Aaron,

In message <2710076.TiSPtmOvtb@flash> you wrote:
>
> > What exactly do you need this for?  Why don't you just link your
> > code with the rest of U-Boot?
>
> We need it to obtain and modify the phy parameters. This is a custom 25G 
> gearbox that needs a lot of hand holding. This may end up being a low priority 
> (not the gearbox, but the API). It's only a few hundred lines of code (the 
> API).

Again you don't answer my question.  Why do you need a special new
API for such code?  Why do you not just link that code with the rest
of U-Boot?

It has been mentioned before, but just to be sure: this code which
uses your new API is licensed under a GPLv2 conforming license?

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd at denx.de
If you believe that feeling bad or worrying long enough will change a
past or future event, then you are residing on another planet with  a
different reality system.

* [U-Boot] [EXT] Re:  Cavium/Marvell Octeon Support
  2019-10-31 13:26     ` Tom Rini
@ 2019-10-31 18:04       ` Aaron Williams
  0 siblings, 0 replies; 32+ messages in thread
From: Aaron Williams @ 2019-10-31 18:04 UTC (permalink / raw)
  To: u-boot

On Thursday, October 31, 2019 6:26:51 AM PDT Tom Rini wrote:
> On Wed, Oct 30, 2019 at 11:36:19PM +0000, Aaron Williams wrote:
> > On Wednesday, October 30, 2019 3:05:25 PM PDT Tom Rini wrote:
> > > External Email
> > > 
> > > ----------------------------------------------------------------------
> > > 
> > > On Wed, Oct 23, 2019 at 03:50:00AM +0000, Aaron Williams wrote:
> > > > Hi all,
> > > > 
> > > > I have been tasked with porting our Octeon U-Boot to the latest U-Boot
> > > > and merging it upstream.
> > > 
> > > [snip]
> > > 
> > > > I want to jump back up to the top of this thread.  And first I want
> > > > to say that I am glad that there is official desire to upstream support.
> > > > This is good.  My concern is that the plan seems to be, at a very high
> > > > level, "get everything we have for every feature upstream".  But as has been
> > > said elsewhere this would roughly double the total LOC for the project,
> > > and it's not like we're a new project with a small handful of things :)
> > > It's impossible for the community to review that much code in any
> > > meaningful way over anything less than a period of several years.  I
> > > know you've said that to support various customer use cases you need all
> > > sorts of other things, and while I'm certain that's true, I believe the
> > > plan needs to be to step back and pick the smallest possible testable
> > > unit, and upstream it.  And add to it, small pieces at a time.  Thanks!
> > 
> > It might be easier if I were a maintainer for our SOC to limit the needed
> > review of a fair bit of the code. I have already found I can cut out a
> > large chunk of our code by removing support for our older models. Much of
> > the code has been very well tested, for example our serdes initialization
> > and DRAM initialization code. That's not to say I can't do some cleanup.
> 
> Don't worry, I totally expect you to become the maintainer for your SoC,
> that's key to making sure the SoC-specific stuff is done right :)  But,
> you're not the first big SoC to migrate from an internal fork to
> mainline and have a seemingly impossible number of LOC to deal with.
> It's why I'm saying you need to start with something absolutely as small
> as possible, and move forward.
> 
> > The changes to U-Boot itself should be relatively small as long as I can
> > keep much of our code under arch/mips/arch-octeon and
> > arch/mips/cpu/octeon much like how our ARM code is. For anything that is
> > applicable to other architectures I will place it in the appropriate
> > locations.
> > 
> > Our ARM code is quite a bit simpler than MIPS because on ARM most of the
> > heavy lifting is done by our "BDK" bootloader as well as ATF. On ARM,
> > U-Boot doesn't need to deal with SFPs, serdes initialization, DRAM
> > initialization, hot plug or a myriad of other issues.
> > 
> > I noticed the same with X86.  If it weren't for these other layers the X86
> > code would also be quite large.
> > 
> > I have already identified quite a few very large files that will be
> > > removed, such as the error handling code for Octeon2 and earlier CPUs.
> > 
> > Additionally I have identified a number of register definition files I can
> > get rid of, including one that's 2.3MB in size! These files tend to be
> > huge because they contain definitions for every single chip and revision
> > as well as big and little endian definitions. On top of that, there are a
> > huge number of comments. Each field contains all of the text that is in
> > our hardware reference manuals. There is no dearth of comments. I should
> > be able to cut the size of the remaining files to 1/4th their current
> > size or even smaller.
> > 
> > > Some files will still remain quite large, however, such as our serdes
> > > initialization and DRAM initialization code (which I plan to re-architect
> > > because the original author didn't believe in functions due to stack
> > > limitations; it is well commented, though). If you ever want to learn all
> > > the gory details of DDR4 link training and finding trends and so forth
> > it's all in there. The current memory initialization code is over 1MB in
> > size. I plan to cut this down and break it up in a clean manner. The
> > initialization code has grown in complexity and size over the years as
> > various instabilities have been identified and fixed. The DRAM
> > initialization code for our OcteonTX2 CPU is almost as large, though this
> > code has been cleaned up and re-written. There really is no way to avoid
> > this. The OcteonTX and OcteonTX2 DDR initialization code is similar to
> > that for Octeon. In the case of U-Boot on our ARM SoCs, though,
> > initialization is done before U-Boot is loaded.
> > 
> > I'll move the init code to drivers/ram/marvell/octeon. It will be about
> > twice as large as the AXP driver (which only handles DDR3). The serdes
> > init code I figure could go under drivers/soc/marvell/octeon.
> > 
> > I noticed that there are several directories under drivers for memory.
> > There's drivers/ram, drivers/memory and drivers/ddr. These should be
> > consolidated. I think some code might be able to be common, such as the
> > SPD decoding code. It's even possible that some algorithms might be able
> > to be made common such as deskew training and read/write leveling.
> > 
> > In terms of Octeon specific features, there really aren't too many of
> > those
> > but most of the ones we have are essential in the bootloader. There's no
> > avoiding the Serdes and low-level network initialization. The serdes init
> > code works across all networking interface types (SGMII, 1000Base-X,
> > > XAUI, RXAUI, XFI, XLAUI, 25G (XLAUI), SATA, PCIe, SRIO, plus all the
> > > variants, e.g. KR). It also configures all the clocks and equalization.
> > It's not like a simple gigabit NIC nor is it offloaded to some other
> > layer. Some of this code will come later, for example support for NUMA
> > with CN78XX (96 cores, 256GiB of RAM).
> > 
> > Currently we are using 39MB under arch/mips. I think I can easily cut this
> > down to 15MB or smaller, especially by moving some code here to the
> > > appropriate driver directories (e.g. DRAM, PCIe, watchdog)
> > 
> > It will still be a large SoC, though.
> 
> Most modern SoCs are pretty large.  Take this one step at a time,
> evaluating and re-architecting code along the way, and we'll get there.
> You're probably going to run into a lot of code that needs to be
> adapted to new frameworks, too.  What I strongly encourage from the
> example of previous SoCs that started out this way is to think of your
> internal tree as a reference only.  Sure, you'll want to grab as much of
> the complex init sequence code when moving things over, but it shouldn't
> be thought of as "move board X/Y/Z over" but "start adding board X with
> minimal peripherals" and add on top.

This is the goal. It should be easier to develop the first port without 
networking support, since the image can be booted over PCIe. Networking 
support will still be key, though, because the customer disables this access. 
We plan to adapt to the new model. I've been working with it for some time 
with our OcteonTX line, which was just upstreamed.

-Aaron

* [U-Boot] [EXT] Re: Cavium/Marvell Octeon Support
  2019-10-31 10:40     ` Wolfgang Denk
@ 2019-10-31 18:01       ` Aaron Williams
  2019-11-04 17:22         ` Tom Rini
  0 siblings, 1 reply; 32+ messages in thread
From: Aaron Williams @ 2019-10-31 18:01 UTC (permalink / raw)
  To: u-boot

Hi Wolfgang,

On Thursday, October 31, 2019 3:40:27 AM PDT Wolfgang Denk wrote:
> Dear Aaron,
> 
> In message <1889679.7FQr5zsBR1@flash> you wrote:
> > Currently we are using 39MB under arch/mips. I think I can easily cut this
> > down to 15MB or smaller, especially by moving some code here to the
> > appropriate driver directories (e.g. DRAM, PCIe, watchdog)
> > 
> > It will still be a large SoC, though.
> 
> Have you already looked at formal requirements, like coding style
> etc.?   Did you ever run your additions through checkpatch.pl, for
> example?

We did follow the formal coding style. Everything will go through checkpatch. 
My biggest complaint about it is the 80 columns for debug and other print 
statements.
> 
> Best regards,
> 
> Wolfgang Denk

-Aaron

* [U-Boot] [EXT] Re: Cavium/Marvell Octeon Support
  2019-10-31 10:36         ` Wolfgang Denk
@ 2019-10-31 17:59           ` Aaron Williams
  2019-11-04 15:44             ` Wolfgang Denk
  0 siblings, 1 reply; 32+ messages in thread
From: Aaron Williams @ 2019-10-31 17:59 UTC (permalink / raw)
  To: u-boot

On Thursday, October 31, 2019 3:36:10 AM PDT Wolfgang Denk wrote:
> Dear Aaron,
> 
> In message <1932577.QJWW3v3lL8@flash> you wrote:
> > We do this relocation as well, however the way we do it is by changing a
> > couple of TLB entries. This lets U-Boot begin execution from any memory
> > location, be it flash, L2 cache or RAM. It also lets us statically link
> > U-Boot to run at a fixed address, in our case 0xC0000000. The relocation
> > happens
> It seems you have missed the primary purpose of relocation.  The
> interesting thing is not the start address, but the end address of
> U-Boot in memory, as we always try to place the U-Boot code and data
> at the very end of the available memory (and yes, this includes
> systems which can come with different memory sizes). Additionally, we
> want to be able to reserve additional memory at the end of RAM, above
> U-Boot, so it can even be kept across warm boots.  Features like
> protected RAM (PRAM), shared log buffers, shared video memory etc.
> come to mind here.
This is exactly what we do. We use a high virtual address and always move it 
to the end of physical memory.
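
As a small worked example of the end-of-RAM placement being discussed, the destination address can be computed from the detected RAM size, the amount reserved above U-Boot, and the image size. This is a sketch under assumed names (`reloc_addr`, `ALIGN_DOWN`) and made-up sizes, not U-Boot's actual relocation code.

```c
#include <assert.h>
#include <stdint.h>

/* Round x down to a multiple of a, where a is a power of two. */
#define ALIGN_DOWN(x, a) ((x) & ~((uint64_t)(a) - 1))

/*
 * Compute where the image would be placed: just below the region reserved
 * at the top of RAM (for example PRAM or a shared log buffer), aligned down.
 * Parameter names and the helper itself are hypothetical.
 */
uint64_t reloc_addr(uint64_t ram_base, uint64_t ram_size,
		    uint64_t reserved_top, uint64_t image_size,
		    uint64_t align)
{
	uint64_t top;

	assert(align != 0 && (align & (align - 1)) == 0);
	assert(reserved_top + image_size <= ram_size);

	top = ram_base + ram_size - reserved_top; /* first byte of the reserve */
	return ALIGN_DOWN(top - image_size, align);
}
```

With 1 GiB of RAM at address 0, 1 MiB reserved at the top, a 512 KiB image and 64 KiB alignment, this yields 0x3FE80000; with 2 GiB it yields 0x7FE80000. The start address moves with the RAM size while the image stays directly below the reserved region, which is the point being made about the end address.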

> 
> > This might be something to consider in the future on some platforms where
> > "relocation" could be performed by just adjusting the TLB or page tables.
> > MIPS makes this particularly easy.
> 
> This cannot be done, not without castrating U-Boot from a number of
> features that require allocation at the end of the available RAM,
> see above.
> 
> > That's fine. The code is actually quite small. It has some custom APIs
> > unique to our needs. We have need to call into the phy code from these
> > applications. I don't know if this could work with the general API or
> > not. One reason we did
> What exactly do you need this for?  Why don't you just link your
> code with the rest of U-Boot?
> 
We need it to obtain and modify the phy parameters. This is a custom 25G 
gearbox that needs a lot of hand holding. This may end up being a low priority 
(not the gearbox, but the API). It's only a few hundred lines of code (the 
API).

> 
> Best regards,
> 
> Wolfgang Denk

-Aaron

* [U-Boot] [EXT] Re:  Cavium/Marvell Octeon Support
  2019-10-30 23:36   ` [U-Boot] [EXT] " Aaron Williams
  2019-10-31 10:40     ` Wolfgang Denk
@ 2019-10-31 13:26     ` Tom Rini
  2019-10-31 18:04       ` Aaron Williams
  1 sibling, 1 reply; 32+ messages in thread
From: Tom Rini @ 2019-10-31 13:26 UTC (permalink / raw)
  To: u-boot

On Wed, Oct 30, 2019 at 11:36:19PM +0000, Aaron Williams wrote:
> On Wednesday, October 30, 2019 3:05:25 PM PDT Tom Rini wrote:
> > External Email
> > 
> > ----------------------------------------------------------------------
> > 
> > On Wed, Oct 23, 2019 at 03:50:00AM +0000, Aaron Williams wrote:
> > > Hi all,
> > > 
> > > I have been tasked with porting our Octeon U-Boot to the latest U-Boot
> > > and merging it upstream.
> > 
> > [snip]
> > 
> > > I want to jump back up to the top of this thread.  And first I want
> > > to say that I am glad that there is official desire to upstream support.
> > > This is good.  My concern is that the plan seems to be, at a very high
> > > level, "get everything we have for every feature upstream".  But as has been
> > said elsewhere this would roughly double the total LOC for the project,
> > and it's not like we're a new project with a small handful of things :)
> > It's impossible for the community to review that much code in any
> > meaningful way over anything less than a period of several years.  I
> > know you've said that to support various customer use cases you need all
> > sorts of other things, and while I'm certain that's true, I believe the
> > plan needs to be to step back and pick the smallest possible testable
> > unit, and upstream it.  And add to it, small pieces at a time.  Thanks!
> 
> It might be easier if I were a maintainer for our SOC to limit the needed 
> review of a fair bit of the code. I have already found I can cut out a large 
> chunk of our code by removing support for our older models. Much of the code 
> has been very well tested, for example our serdes initialization and DRAM 
> initialization code. That's not to say I can't do some cleanup.

Don't worry, I totally expect you to become the maintainer for your SoC,
that's key to making sure the SoC-specific stuff is done right :)  But,
you're not the first big SoC to migrate from an internal fork to
mainline and have a seemingly impossible number of LOC to deal with.
It's why I'm saying you need to start with something absolutely as small
as possible, and move forward.

> The changes to U-Boot itself should be relatively small as long as I can keep 
> much of our code under arch/mips/arch-octeon and arch/mips/cpu/octeon much 
> like how our ARM code is. For anything that is applicable to other 
> architectures I will place it in the appropriate locations. 
> 
> Our ARM code is quite a bit simpler than MIPS because on ARM most of the heavy 
> lifting is done by our "BDK" bootloader as well as ATF. On ARM, U-Boot doesn't 
> need to deal with SFPs, serdes initialization, DRAM initialization, hot plug 
> or a myriad of other issues.
> 
> I noticed the same with X86.  If it weren't for these other layers the X86 
> code would also be quite large.
> 
> I have already identified quite a few very large files that will be removed, 
> > such as the error handling code for Octeon2 and earlier CPUs.
> 
> Additionally I have identified a number of register definition files I can get 
> rid of, including one that's 2.3MB in size! These files tend to be huge 
> because they contain definitions for every single chip and revision as well as 
> big and little endian definitions. On top of that, there are a huge number of 
> comments. Each field contains all of the text that is in our hardware 
> reference manuals. There is no dearth of comments. I should be able to cut the 
> size of the remaining files to 1/4th their current size or even smaller.
> 
> > Some files will still remain quite large, however, such as our serdes 
> > initialization and DRAM initialization code (which I plan to re-architect 
> > because the original author didn't believe in functions due to stack 
> > limitations; it is well commented, though). If you ever want to learn all the 
> > gory details of DDR4 link training and finding trends and so forth it's all in 
> there. The current memory initialization code is over 1MB in size. I plan to 
> cut this down and break it up in a clean manner. The initialization code has 
> grown in complexity and size over the years as various instabilities have been 
> identified and fixed. The DRAM initialization code for our OcteonTX2 CPU is 
> almost as large, though this code has been cleaned up and re-written. There 
> really is no way to avoid this. The OcteonTX and OcteonTX2 DDR initialization 
> code is similar to that for Octeon. In the case of U-Boot on our ARM SoCs, 
> though, initialization is done before U-Boot is loaded.
> 
> I'll move the init code to drivers/ram/marvell/octeon. It will be about twice 
> as large as the AXP driver (which only handles DDR3). The serdes init code I 
> figure could go under drivers/soc/marvell/octeon.
> 
> I noticed that there are several directories under drivers for memory. There's 
> drivers/ram, drivers/memory and drivers/ddr. These should be consolidated. I 
> think some code might be able to be common, such as the SPD decoding code. 
> It's even possible that some algorithms might be able to be made common such 
> as deskew training and read/write leveling.
> 
> In terms of Octeon specific features, there really aren't too many of those 
> but most of the ones we have are essential in the bootloader. There's no 
> avoiding the Serdes and low-level network initialization. The serdes init code 
> works across all networking interface types (SGMII, 1000Base-X, XAUI, RXAUI, 
> XFI, XLAUI, 25G (XLAUI), SATA, PCIe, SRIO plus all the variants (i.e. KR). It 
> also configures all the clocks and equalization. It's not like a simple 
> gigabit NIC nor is it offloaded to some other layer. Some of this code will 
> come later, for example support for NUMA with CN78XX (96 cores, 256GiB of 
> RAM).
> 
> Currently we are using 39MB under arch/mips. I think I can easily cut this 
> down to 15MB or smaller, especially by moving some code here to the 
> appropriate driver directories (i.e. DRAM,  pcie, watchdog, etc.)
> 
> It will still be a large SoC, though.

Most modern SoCs are pretty large.  If we take this one step at a time,
evaluating and re-architecting code along the way, we'll get there.
You're probably going to run into a lot of code that needs to be
adapted to new frameworks, too.  What I strongly encourage, based on the
example of previous SoCs that started out this way, is to think of your
internal tree as a reference only.  Sure, you'll want to grab as much of
the complex init sequence code as you can when moving things over, but it
shouldn't be thought of as "move board X/Y/Z over" but as "start adding
board X with minimal peripherals" and add on top.

-- 
Tom


* [U-Boot] [EXT] Re: Cavium/Marvell Octeon Support
  2019-10-30 23:36   ` [U-Boot] [EXT] " Aaron Williams
@ 2019-10-31 10:40     ` Wolfgang Denk
  2019-10-31 18:01       ` Aaron Williams
  2019-10-31 13:26     ` Tom Rini
  1 sibling, 1 reply; 32+ messages in thread
From: Wolfgang Denk @ 2019-10-31 10:40 UTC (permalink / raw)
  To: u-boot

Dear Aaron,

In message <1889679.7FQr5zsBR1@flash> you wrote:
>
> Currently we are using 39MB under arch/mips. I think I can easily cut this 
> down to 15MB or smaller, especially by moving some code here to the 
> appropriate driver directories (i.e. DRAM,  pcie, watchdog, etc.)
> 
> It will still be a large SoC, though.

Have you already looked at formal requirements, like coding style
etc.?   Did you ever run your additions through checkpatch.pl, for
example?

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd at denx.de
If ignorance is bliss, why aren't there more happy people?


* [U-Boot] [EXT] Re: Cavium/Marvell Octeon Support
  2019-10-30 21:23       ` Aaron Williams
@ 2019-10-31 10:36         ` Wolfgang Denk
  2019-10-31 17:59           ` Aaron Williams
  0 siblings, 1 reply; 32+ messages in thread
From: Wolfgang Denk @ 2019-10-31 10:36 UTC (permalink / raw)
  To: u-boot

Dear Aaron,

In message <1932577.QJWW3v3lL8@flash> you wrote:
> 
> We do this relocation as well, however the way we do it is by changing a 
> couple of TLB entries. This lets U-Boot begin execution from any memory 
> location, be it flash, L2 cache or RAM. It also lets us statically link U-Boot 
> to run at a fixed address, in our case 0xC0000000. The relocation happens 

It seems you have missed the primary purpose of relocation.  The
interesting thing is not the start address, but the end address of
U-Boot in memory, as we always try to place the U-Boot code and data
at the very end of the available memory (and yes, this includes
systems which can come with different memory sizes). Additionally, we
want to be able to reserve additional memory at the end of RAM, above
U-Boot, so it can even be kept across warm boots.  Features like
protected RAM (PRAM), shared log buffers, shared video memory etc.
come to mind here.
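
As a rough illustration only (a sketch with made-up names and sizes, not
the actual U-Boot board init code), placing things downward from the top
of RAM could look like this: the reserved area is carved out first, and
the U-Boot image lands directly below it, wherever the top of RAM
happens to be on a given board.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout result; field names are illustrative, not U-Boot's. */
struct reloc_layout {
	uint64_t pram_base;   /* start of the region kept across warm boots */
	uint64_t reloc_addr;  /* where the U-Boot image would be copied to */
};

/* Round addr down to a power-of-two alignment. */
static uint64_t align_down(uint64_t addr, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return addr & ~(align - 1);
}

/*
 * Carve regions downward from the top of RAM: first the reserved area
 * above U-Boot, then the U-Boot image itself.
 */
static struct reloc_layout plan_relocation(uint64_t ram_top, uint64_t pram_size,
					   uint64_t image_size, uint64_t align)
{
	struct reloc_layout l;

	assert(pram_size + image_size <= ram_top);
	l.pram_base = align_down(ram_top - pram_size, align);
	l.reloc_addr = align_down(l.pram_base - image_size, align);
	return l;
}
```

The same binary run on a board with more memory simply ends up at a
higher address; a fixed start address cannot give you that.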

> This might be something to consider in the future on some platforms where 
> "relocation" could be performed by just adjusting the TLB or page tables. MIPS 
> makes this particularly easy.

This cannot be done, not without castrating U-Boot from a number of
features that require allocation at the end of the available RAM,
see above.

> That's fine. The code is actually quite small. It has some custom APIs unique 
> to our needs. We have need to call into the phy code from these applications. 
> I don't know if this could work with the general API or not. One reason we did 

What exactly do you need this for?  Why don't you just link your
code with the rest of U-Boot?


Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd at denx.de
Many companies that have made themselves dependent on [the  equipment
of  a  certain  major  manufacturer] (and in doing so have sold their
soul to the devil) will collapse under the sheer weight  of  the  un-
mastered complexity of their data processing systems.
          -- Edsger W. Dijkstra, SIGPLAN Notices, Volume 17, Number 5


* [U-Boot] [EXT] Re:  Cavium/Marvell Octeon Support
  2019-10-30 22:05 ` [U-Boot] " Tom Rini
@ 2019-10-30 23:36   ` Aaron Williams
  2019-10-31 10:40     ` Wolfgang Denk
  2019-10-31 13:26     ` Tom Rini
  0 siblings, 2 replies; 32+ messages in thread
From: Aaron Williams @ 2019-10-30 23:36 UTC (permalink / raw)
  To: u-boot

On Wednesday, October 30, 2019 3:05:25 PM PDT Tom Rini wrote:
> On Wed, Oct 23, 2019 at 03:50:00AM +0000, Aaron Williams wrote:
> > Hi all,
> > 
> > I have been tasked with porting our Octeon U-Boot to the latest U-Boot
> > and merging it upstream.
> 
> [snip]
> 
> I want to jump back up back to the top of this thread.  And first I want
> to say that I am glad that there is official desire to upstream support.
> This is good.  My concern is that the plan seems to be, at a very high
> level, "get everything we have for every feature upstream".  But as been
> said elsewhere this would roughly double the total LOC for the project,
> and it's not like we're a new project with a small handful of things :)
> It's impossible for the community to review that much code in any
> meaningful way over anything less than a period of several years.  I
> know you've said that to support various customer use cases you need all
> sorts of other things, and while I'm certain that's true, I believe the
> plan needs to be to step back and pick the smallest possible testable
> unit, and upstream it.  And add to it, small pieces at a time.  Thanks!

It might be easier if I were a maintainer for our SOC to limit the needed 
review of a fair bit of the code. I have already found I can cut out a large 
chunk of our code by removing support for our older models. Much of the code 
has been very well tested, for example our serdes initialization and DRAM 
initialization code. That's not to say I can't do some cleanup.

The changes to U-Boot itself should be relatively small as long as I can keep 
much of our code under arch/mips/arch-octeon and arch/mips/cpu/octeon much 
like how our ARM code is. For anything that is applicable to other 
architectures I will place it in the appropriate locations. 

Our ARM code is quite a bit simpler than MIPS because on ARM most of the heavy 
lifting is done by our "BDK" bootloader as well as ATF. On ARM, U-Boot doesn't 
need to deal with SFPs, serdes initialization, DRAM initialization, hot plug 
or a myriad of other issues.

I noticed the same with X86.  If it weren't for these other layers the X86 
code would also be quite large.

I have already identified quite a few very large files that will be removed, 
such as the error handling code for our Octeon2 and earlier CPUs.

Additionally I have identified a number of register definition files I can get 
rid of, including one that's 2.3MB in size! These files tend to be huge 
because they contain definitions for every single chip and revision as well as 
big and little endian definitions. On top of that, there are a huge number of 
comments. Each field contains all of the text that is in our hardware 
reference manuals. There is no dearth of comments. I should be able to cut the 
size of the remaining files to 1/4th their current size or even smaller.

Some files will still remain quite large, however, such as our serdes 
initialization and DRAM initialization code (which I plan to re-architect 
because the original author didn't believe in functions due to stack 
limitations; it is well commented, though). If you ever want to learn all the 
gory details of DDR4 link training and finding trends and so forth, it's all in 
there. The current memory initialization code is over 1MB in size. I plan to 
cut this down and break it up in a clean manner. The initialization code has 
grown in complexity and size over the years as various instabilities have been 
identified and fixed. The DRAM initialization code for our OcteonTX2 CPU is 
almost as large, though this code has been cleaned up and re-written. There 
really is no way to avoid this. The OcteonTX and OcteonTX2 DDR initialization 
code is similar to that for Octeon. In the case of U-Boot on our ARM SoCs, 
though, initialization is done before U-Boot is loaded.

I'll move the init code to drivers/ram/marvell/octeon. It will be about twice 
as large as the AXP driver (which only handles DDR3). The serdes init code I 
figure could go under drivers/soc/marvell/octeon.

I noticed that there are several directories under drivers for memory. There's 
drivers/ram, drivers/memory and drivers/ddr. These should be consolidated. I 
think some code might be able to be common, such as the SPD decoding code. 
It's even possible that some algorithms might be able to be made common such 
as deskew training and read/write leveling.

In terms of Octeon specific features, there really aren't too many of those 
but most of the ones we have are essential in the bootloader. There's no 
avoiding the serdes and low-level network initialization. The serdes init code 
works across all networking interface types (SGMII, 1000Base-X, XAUI, RXAUI, 
XFI, XLAUI, 25G (XLAUI), SATA, PCIe and SRIO, plus all the variants, e.g. KR). It 
also configures all the clocks and equalization. It's not like a simple 
gigabit NIC nor is it offloaded to some other layer. Some of this code will 
come later, for example support for NUMA with CN78XX (96 cores, 256GiB of 
RAM).

Currently we are using 39MB under arch/mips. I think I can easily cut this 
down to 15MB or smaller, especially by moving some code here to the 
appropriate driver directories (e.g. DRAM, PCIe, watchdog).

It will still be a large SoC, though.

-Aaron


* [U-Boot] [EXT] Re: Cavium/Marvell Octeon Support
  2019-10-30 16:20     ` Daniel Schwierzeck
  2019-10-30 17:21       ` Wolfgang Denk
@ 2019-10-30 21:23       ` Aaron Williams
  2019-10-31 10:36         ` Wolfgang Denk
  1 sibling, 1 reply; 32+ messages in thread
From: Aaron Williams @ 2019-10-30 21:23 UTC (permalink / raw)
  To: u-boot

Hi Daniel,

On Wednesday, October 30, 2019 9:20:31 AM PDT Daniel Schwierzeck wrote:
> Hi Aaron,
> 
> Am 27.10.19 um 03:34 schrieb Aaron Williams:
> > Hi Daniel,
> > 
> > On Friday, October 25, 2019 8:13:57 AM PDT Daniel Schwierzeck wrote:
> >> Hi Aaron,
> >> 
> >> Am 23.10.19 um 05:50 schrieb Aaron Williams:
> >>> Hi all,
> >>> 
> >>> I have been tasked with porting our Octeon U-Boot to the latest U-Boot
> >>> and merging it upstream. This will involve a very significant amount of
> >>> code that generally will not be compatible with other MIPS processors
> >>> due to our needs and requirements. For example, the start.S will need to
> >>> be completely different than what is present. For example, our existing
> >>> start.S is 3577 lines of code in order to deal with things like RAS,
> >>> exceptions, virtual memory and more. We need to use virtual memory since
> >>> U-Boot can be loaded at any 4MB boundary in memory, not just 0xbfc00000.
> >>> A number of drivers will need to be updated in order to properly map
> >>> pointers to physical addresses. This is needed anyway, since I see
> >>> numerous drivers that assume that a pointer is a DMA address. For MIPS
> >>> this is never the case (I'm looking at XHCI).
> >> 
> >> Good to see some progress in mainline Octeon support. Could you briefly
> >> describe the differences and commonalities in booting an Octeon CPU
> >> compared to other "generic" MIPS cores? Or could you point me to a
> >> public Git tree? It can't be that different because Linux kernel is also
> >> able to share most of the code ;)
> > 
> > Actually the low level code is significantly different. First of all, we
> > need the U-Boot bootloader to be able to boot from different memory
> > locations. Because of this, we use mapped memory for U-Boot. A side
> > effect of this is that it eliminates the need for relocation when it is
> > shifted to the top of memory. All we need to do is just set a couple of
> > TLB entries.
> 
> Understood. but still U-Boot relocates itself from its initial entry
> memory address to its destination memory address based on gd->ram_top.
> Maybe this is ineffective nowadays with various SPL/TPL boot methods
> because U-Boot proper is already loaded to an executable memory location
> by SPL, but you have to initially deal with that design. Feel free to
> suggest/submit a patch for the generic board init code to make the
> relocation configurable.
> 
We do this relocation as well, however the way we do it is by changing a 
couple of TLB entries. This lets U-Boot begin execution from any memory 
location, be it flash, L2 cache or RAM. It also lets us statically link U-Boot 
to run at a fixed address, in our case 0xC0000000. The relocation happens 
transparently in the start.S code. This also makes our bootloader smaller. 
None of the U-Boot code is affected since on MIPS pointers cannot be used for 
DMA anyway. The functions that map pointers to DMA addresses work as they 
should. The only issues I have found are drivers that don't use this and would 
break on MIPS anyway. We have a SPL loader for our CN7XXX series since the L2 
cache is too small to otherwise fit the entire bootloader. Even this is a 
challenge to make fit since the code to initialize DDR4 memory is very large 
so every bit of space savings helps.
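
To make the idea concrete, here is a minimal sketch of the address 
arithmetic behind this kind of mapping. It is not taken from our start.S; 
the structure and function names are made up for illustration, and only the 
0xC0000000 link address and the 4MB granularity come from what I described 
above.

```c
#include <assert.h>
#include <stdint.h>

#define LINK_BASE   0xC0000000u          /* fixed virtual link address */
#define LOAD_ALIGN  (4u * 1024 * 1024)   /* image may sit on any 4MB boundary */

/* Hypothetical description of the mapping programmed into the TLB. */
struct uboot_map {
	uint64_t phys_base;  /* where the image actually sits (flash, L2, RAM) */
	uint32_t size;       /* mapped size in bytes */
};

/* Translate a linked (virtual) address to the physical address behind it. */
static uint64_t uboot_virt_to_phys(const struct uboot_map *m, uint32_t vaddr)
{
	assert(m->phys_base % LOAD_ALIGN == 0);
	assert(vaddr >= LINK_BASE && vaddr - LINK_BASE < m->size);
	return m->phys_base + (vaddr - LINK_BASE);
}
```

The code always sees the same virtual addresses; only phys_base changes 
depending on where the image was started from.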

As far as U-Boot is concerned, we just treat it as if relocation is disabled 
since with virtual memory it isn't needed.  I even got it working with the API 
for running standalone apps without requiring any changes to the existing code 
other than to add the MIPS specific changes for our environment.

This might be something to consider in the future on some platforms where 
"relocation" could be performed by just adjusting the TLB or page tables. MIPS 
makes this particularly easy.

I have attached a copy of our existing start.S code. It needs a bit of work 
for the new U-Boot since currently locking the cache and allocating GD on the 
stack are done in board_init_f(). The changes are fairly easy to make. I also 
need to strip out the code for CN6XXX and earlier.

> > The assembly code is significantly different and is far more extensive.
> > 
> > Additionally, the way Octeon Linux is booted is different.
> > 
> > The generic start.S is not usable in our case.
> > 
> > We have a significant amount of code for dealing with the cache and for
> > things like copying U-Boot from flash into the L2 cache. We also have to
> > deal with taking other cores out of reset in our start.S. Our exception
> > handler has also been extended to handle multiple cores.
> 
> it's hard to discuss this without example code but I still think the
> basic principles of cache and exception handling can't be that different
> from generic MIPS cores. Locking cache lines and loading code to it
> could be useful for other MIPS platforms and should be added as generic
> feature. BTW the exception handler code is a port of the Linux one, I
> only skipped the stack trace output because of the complicated stack
> unwinding code. I think the current dump of general and CP0 and EPC
> registers is more than feasible for a bootloader. It already helped me
> multiple times to quickly locate code locations with e.g. null pointer
> dereferencing.
> 
I have attached our start.S code which includes this. In addition, our version 
also dumps out the stack. NULL pointers aren't the easiest to catch since 
typically 0 is a valid memory location. I suppose I could just add a TLB entry 
to mark the first 4K of memory as invalid.

> > Some other things we have included are a native API that allows Simple
> > Executive applications to make calls into U-Boot for such things as
> > environment variable access as well as access to block devices and
> > filesystems.
> 
> This is one of the parts that shouldn't be needed for basic upstream
> support. If your API is a parallel and independent implementation of the
> API that U-Boot already has for standalone applications, then I'm afraid
> this won't be accepted and should be kept in a downstream fork.
> 
That's fine. The code is actually quite small. It has some custom APIs unique 
to our needs. We need to call into the phy code from these applications. 
I don't know if this could work with the general API or not. One reason we did 
this is because we wanted all addresses passed to U-Boot to be physical 
addresses. We need to context switch since these applications have their own 
memory mapping (hence the requirement for physical addresses). We save the TLB 
mapping of the application and set up the U-Boot TLBs then restore that 
afterwards. For pointers we just use XKPHYS addresses. With the API, though, I 
set it up so that applications are linked at another virtual address which can 
access the U-Boot virtual address directly. I think I used 0xd0000000 for 
those. This didn't require any changes to the API other than the assembly code 
and linker scripts.
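
For anyone not familiar with XKPHYS, here is a small sketch of how a 
physical address maps into that segment on MIPS64 (address bits 63:62 are 
binary 10 and bits 61:59 carry the cache coherency attribute). The helper 
names are hypothetical, and which attribute value our code actually uses is 
not something I am stating here.

```c
#include <assert.h>
#include <stdint.h>

#define XKPHYS_BASE      0x8000000000000000ULL  /* address bits 63:62 = 0b10 */
#define XKPHYS_CCA_SHIFT 59                     /* cache attribute in bits 61:59 */

/* Build an XKPHYS virtual address for a physical address. */
static uint64_t phys_to_xkphys(unsigned int cca, uint64_t paddr)
{
	assert(cca < 8);
	assert(paddr < (1ULL << XKPHYS_CCA_SHIFT));
	return XKPHYS_BASE | ((uint64_t)cca << XKPHYS_CCA_SHIFT) | paddr;
}

/* Recover the physical address from an XKPHYS virtual address. */
static uint64_t xkphys_to_phys(uint64_t vaddr)
{
	assert((vaddr >> 62) == 2);
	return vaddr & ((1ULL << XKPHYS_CCA_SHIFT) - 1);
}
```

Since the translation is pure bit manipulation, it works the same no matter 
which TLB entries the application or U-Boot currently has installed.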

> > We used to have our Octeon SDK available for download but it seems this
> > has
> > been taken down :( I'm trying to find out how I can make it available but
> > I'm getting pushback in sharing our GPLed U-Boot even though it is GPL.
> > 
> >> In principle you could compile an own start.S in your mach-octeon
> >> directory, but you should try to use the generic start.S which is
> >> already customisable and extensible. If needed, we could add more
> >> extension points to it. Booting from any custom memory address is
> >> already supported and very common for other MIPS based SoC's. Exception
> >> support is also already there.
> > 
> > The bootloader needs to be able to start from multiple memory locations
> > without recompiling. Our existing bootloader can run from any 4MB boundary
> > without recompiling or relocation. It can start out of flash (from any
> > sector boundary, not just 0) or L2 cache. Starting by L2 cache is
> > supported by eMMC, SPI and PCI target bootloaders. Additionally the same
> > bootloader can be started from RAM such as when the failsafe bootloader
> > starts the main bootloader. In most cases, the failsafe is the same
> > full-featured bootloader since it fits entirely within the L2 cache. Our
> > only bootloader requirement is that it fits in the L2 cache (except when
> > booting from Flash, though this is preferred for speed) and that it
> > remain under 4 MiB in size.
> > 
> > I believe our exception handling is more extensive than the standard
> > U-Boot
> > exception handler. It includes the stack output as well as numerous COP0
> > registers and decoding the cause of the exception. The exception handler
> > is
> > also independent of a working C environment. We also need to handle
> > exceptions occurring on multiple cores as they're brought out of reset
> > and not all cases are exceptions.
> 
> as I wrote above, the current exception handling is already feasible in
> almost all cases to quickly locate code bugs and doesn't need much code.
> Adding stack trace output would require adding a lot more code. But
> if you are only missing some registers or want to dump the stack itself,
> feel free to extend the current code.

That's fine. The only other thing we do is we carve out a bit of the L1 cache 
for a temporary stack. That way the exception handler has zero dependency on 
memory. Currently it's all in assembly language as well.
> 
> Cores are first powered on and kept in a halted state, then

We do more than that. We need to take the cores out of the halted state and do 
some more processing before starting applications. I hope to provide some 
examples later.

> 
> > later when we start the Linux kernel or simple executive applications, the
> > exception handler is updated (via a bootbus moveable memory region)  and
> > an
> > NMI is generated for the cores where they will begin executing code out of
> > start.S before moving to the code that sets up the environment for booting
> > Linux and/or simple executive applications. In the latter case, TLB
> > entries
> > are programmed in for each core.
> > 
> >>> The new Octeon U-Boot will be native 64-bit instead of how the earlier
> >>> one was 32-bit using the N32 ABI (so 64-bit addresses could be
> >>> accessed). We had to jump through some hoops to make a 32-bit U-Boot
> >>> fully support 64-bit hardware.
> >> 
> >> We have 64 bit support for MIPS. I even sync'ed the asm/io stuff from
> >> Linux in the past (which includes support for Octeon) so that you would
> >> be able to use the standard IO primitives and ioremap stuff and hook in
> >> your platform-specifc memory mappings.
> > 
> > That is good to know. What I have run into is the fact that many drivers
> > do
> > not support I/O remapping. I.e. XHCI assumes that a pointer is a DMA
> > address. Also, does the 64-bit support handle multiple cores in U-Boot?
> 
> we already have stuff like dev_remap_addr(struct udevice* dev) as part
> of the driver model API to map your physical addresses from device tree
> to virtual addresses. This is used in all drivers compatible with MIPS.
> That function is backed by the MIPS specific ioremap_nocache() function
> (also ported from Linux) so that you can hook in platform specific
> mapping code. If you want to use existing drivers which don't do
> remapping yet, you have to patch them. But this should be simple, we
> recently did that on Broadcom or Mediatek platforms, which are sharing
> drivers between their MIPS and ARM CPUs.
> 
That's what we take advantage of :) This allows the drivers to work fine when 
virtual memory is used.

> For XHCI you probably only need to patch the xhci_readl() and
> xhci_writel() functions and establish the memory mappings in your
> platform specific glue code. But USB support shouldn't be your first
> priority ;)
> 
The readl and writel are used for accessing the registers. Those aren't the 
problem. The problem comes when setting up the descriptors in memory. The 
descriptors need to use the memory mapping. That's the part that's missing. 
It's not difficult to fix. I think I also found a few endian issues as well 
since we run in big endian mode.
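
Roughly, what a descriptor setup path has to do looks like the sketch 
below. This is not the U-Boot XHCI code; the pool, the mapping hook and 
the bus address are invented for the example. The two points are that the 
value written into the descriptor is the mapped bus address rather than the 
CPU pointer, and that it is stored in little-endian byte order regardless 
of the CPU's endianness.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_SIZE     4096u
#define POOL_BUS_BASE 0x123450000ULL  /* made-up bus address of the pool */

static uint8_t dma_pool[POOL_SIZE];   /* stand-in for DMA-able memory */

/* Hypothetical mapping hook: CPU pointer inside the pool -> bus address. */
static uint64_t pool_virt_to_bus(const void *ptr)
{
	const uint8_t *p = ptr;

	assert(p >= dma_pool && p < dma_pool + POOL_SIZE);
	return POOL_BUS_BASE + (uint64_t)(p - dma_pool);
}

/* Store val as little-endian bytes, independent of CPU endianness. */
static void put_le64(uint8_t out[8], uint64_t val)
{
	for (int i = 0; i < 8; i++)
		out[i] = (uint8_t)(val >> (8 * i));
}

/* Write the buffer pointer field of a descriptor the way the device reads it. */
static void desc_set_buffer(uint8_t field[8], const void *buf)
{
	put_le64(field, pool_virt_to_bus(buf));
}
```

A driver that instead stores the raw pointer in the field happens to work 
on platforms where pointers and bus addresses coincide and the CPU is 
little endian, and breaks on ours.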

> > I agree about using the standard ioremap stuff. I'm only pointing out that
> > there are places where it is missing in the common U-Boot code. Where it
> > is
> > present, there won't be any issues since traditionally I used those
> > methods to call our platform specific remapping. I will look to see what
> > is present and if it will work or not.
> 
> yes, those places need some patching anyway. There is already an ongoing
> task to address this:
> 
> https://gitlab.denx.de/u-boot/custodians/u-boot-mips/issues/15

I think I can help there. I've already spent a fair bit of time on this with 
XHCI which I backported. I still have a major common XHCI issue to fix when 
short packets are received. The U-Boot code does not handle this case 
properly. It's easy to reproduce the case. Use a USB to Ethernet adapter and 
have the receive buffer cross a 64K boundary and bad things will happen.
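
As I understand the xHCI rules, the data buffer of a single TRB must not 
span a 64K boundary, so a transfer that crosses one has to be split. The 
sketch below shows only that splitting arithmetic; it is not the U-Boot 
ring code and not the fix itself, and the names are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BOUNDARY 0x10000u  /* 64K */

struct chunk {
	uint64_t addr;
	uint32_t len;
};

/*
 * Split [addr, addr + len) so that no chunk crosses a 64K boundary.
 * Returns the number of chunks written, or -1 if max_chunks is too small.
 */
static int split_at_64k(uint64_t addr, uint32_t len,
			struct chunk *out, size_t max_chunks)
{
	size_t n = 0;

	assert(out != NULL);
	while (len > 0) {
		uint32_t room = BOUNDARY - (uint32_t)(addr & (BOUNDARY - 1));
		uint32_t take = len < room ? len : room;

		if (n == max_chunks)
			return -1;
		out[n].addr = addr;
		out[n].len = take;
		n++;
		addr += take;
		len -= take;
	}
	return (int)n;
}
```

A receive buffer of 0x40 bytes starting at 0x1FFF0, for example, comes out 
as two chunks: 0x10 bytes up to the boundary and 0x30 bytes after it.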

> >>> I think we can shrink the code by removing support for starting "simple
> >>> executive" tasks. Simple executive tasks are bare metal applications
> >>> that can run on dedicated cores beside Linux (or without Linux). I will
> >>> also not be porting any support for anything older than Octeon3.
> >>> 
> >>> We also make heavy use of our SDK in order to perform hardware
> >>> initialization and networking. In our old U-Boot, we have almost 900K
> >>> lines of code. I can cut out much of this but much will remain.
> >>> 
> >>> We also have added extensive infrastructure for handling SFP and QSFP
> >>> cables as well as very extensive phy support for phys from
> >>> Aquantia/Marvell, Vitesse/Microsemi, Inphi/Cortina and an Avago gearbox.
> >>> Our customer wants us to port all of this to the new U-Boot and upstream
> >>> it. I'm worried about the sheer amount of code since it is absolutely
> >>> massive.
> >> 
> >> Maybe you should cut down your customers expectations a bit. According
> >> to sloccount we currently have 1.6M SLOC for the whole U-Boot. I guess
> >> Tom or Wolfgang wouldn't agree with adding another 900k only for one
> >> CPU. Actually what should be upstream is the basic CPU, driver and board
> >> support to be able to boot a mainline kernel. Everything else like
> >> custom bare metal applications or the SFP/PHY handling stuff mentioned
> >> below could also be maintained in a downstream tree. Maybe Wolfgang is
> >> willing to host one on gitlab.denx.de.
> > 
> > I will try and cut it down. Much of the code is register definitions. The
> > register definition files are auto-generated and tend to be huge. They're
> > fully commented and include both big and little endian bitfields. In this
> > case I can do like I did for OcteonTX and modify the scripts that
> > generate these headers to strip out the little-endian and comments. There
> > is a huge amount of code for configuring our QLM hardware interfaces. We
> > also have a lot of code for SFP/QSFP ports.
> > 
> > There are some other huge files that can also be eliminated by dropping
> > support for Octeon II and earlier. The error handling files are massive
> > for
> > those chips.
> > 
> > Much of the rest can be shrunk somewhat, but a lot of that code is still
> > required.
> > 
> > There is a huge amount of code for dealing with our quad-lane modules
> > (QLMs). The QLMs can be configured to run in a variety of modes, from
> > PCIe, SGMII, SATA, XLAUI, XFI, Interlaken, SVRIO, QSGMII, XAUI, RXAUI and
> > more. There is a lot of tuning and configuration code needed in order to
> > handle different clocks, equalization, gain, AGC and a whole host of
> > other serdes issues.
> > 
> > The MAC code is also quite large and complex since there are many
> > coprocessors that must be configured. These chips are designed as network
> > processors. While it makes their networking quite powerful and fast, it
> > also means that a lot of programming is needed before they will work.
> > There are input parser engines, buffer management engines, queueing
> > engines, output engines and more that must be fully configured before any
> > packets can be sent or received.
> 
> what I meant was that your customer shouldn't expect to get his custom
> code merged upstream as it is only with some cleanups. Of course an
> user/customer can decide to use U-Boot as system management and hardware
> initialisation tool but that doesn't correspond with U-Boot's design. I
> think most people would agree, that a proper OS like Linux should be
> doing the heavy network initialisation and hardware-offloading stuff as
> well as booting all remaining CPU cores. U-Boot's responsibilty should
> only be to boot that OS in the first CPU ;)
> 
> > There is a fair bit of code used to bring additional cores out of reset.
> > In
> > our biggest configuration, there can be two Octeon CN78XX chips connected
> > in tandem where each chip has 48 cores. In this case there is a lot of
> > tuning that needs to happen with the lanes connecting the two chips
> > before this configuration works reliably. There is a tuning process that
> > is required to run on both sides (and the second chip runs a small binary
> > image as well to perform its half of the tuning).
> > 
> > I do not know if this will change or not but the way the Linux kernel is
> > booted on Octeon is not compatible with the standard boot commands. Part
> > of
> > this is due to the fact that Linux can be run in parallel with Simple
> > Executive applications. It's even possible to run two copies of Linux
> > simultaneously on different cores. To go along with this, there is also a
> > mechanism with named memory blocks that is used. When bring cores out of
> > reset for SE applications, the TLB entries need to be configured. There
> > also is a fair bit of code dealing with core masks when choosing which
> > cores are used for what.
> > 
> > We also have a named memory block feature which is used by Linux and
> > simple
> > executive applications where blocks of memory can be carved up. U-Boot
> > needs to tie into this.
> > 
> > There are also a numerous other I/O interfaces that we also need to
> > initialize. Unfortunately we also have some erratas we need to work around
> > as well and a few are non-trivial.
> > 
> > The DRAM initialization code is also massive.  It handles DDR3 and DDR4
> > for
> > both registered and unregistered memory with ECC.
> > 
> > In many cases, the reason for the size of the code is due to the
> > complexity of the SoC and the platforms built around it. You can think of
> > CN78XX as being more like an enterprise-class server than a simple
> > embedded device. The CN73XX is not too far behind the CN78XX. The only
> > reason our Octeon TX2 U-Boot is so much smaller is that most of the early
> > initialization takes place before U- Boot is started and the fact that a
> > lot of the networking support (such as SFP management and PHY support) is
> > handled by ATF as well as on-chip managment cores. This is necessary
> > because Linux does not have any SFP management support
> 
> last year the PHY framework has been reworked to a phylink framework
> which supports hot-plugging and dynamically linking of PHY drivers with
> MAC drivers especially to support SFP modules. A SFP module driver is
> there as well. There was a talk on ELCE 2018 about this:
> 

I will look at this. The code I wrote can handle some really crazy 
configurations. I may want to modify some of the drivers we have to be 
"virtual MACs" such as Inphi. Also note that not all phys use MDIO. Two of 
the ones I work with use i2c and there has been talk of using other methods of 
communicating with the phy.

> https://events19.linuxfoundation.org/wp-content/uploads/2017/12/chevallier-tenart-from-the-ethernet-mac-to-the-link-partner.pdf
> 
> > nor can it handle the complex topologies we're frequently running into
> > today.  The requirements of Red Hat also preclude any additional software
> > being installed in order for the networking support to run.
> > 
> > One thing I may need to re-introduce to U-Boot is the temperature sensor
> > support for devices like this, since thermal monitoring is important.
> 
> this should be easy as U-Boot already has a thermal uclass within the
> driver model.
> 
I just noticed that. It looked like for a while it was removed. :)

> > Some boards require a background task to perform periodic monitoring for
> > certain events, including the board that needs to be upstreamed. I haven't
> > checked if anything is available now, but what I did in the past was hook
> > into the input function and while waiting for input it calls a
> > user-defined polling function.
> > 
> > If interrupts are supported it makes the polling job easier.
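
The polling hook described in the quoted text above can be sketched roughly as follows: while the console waits for input, a registered callback runs on every idle iteration. This is a simplified illustration, not the actual Octeon implementation. register_poll_fn(), wait_for_input() and the two demo helpers are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*poll_fn_t)(void *ctx);

static poll_fn_t board_poll_fn;
static void *board_poll_ctx;

/* A board registers at most one background polling function. */
void register_poll_fn(poll_fn_t fn, void *ctx)
{
	board_poll_fn = fn;
	board_poll_ctx = ctx;
}

/*
 * Spin until input_ready() reports pending input, calling the registered
 * poll function on every idle iteration. The return value (number of idle
 * iterations) exists only to make the demonstration observable.
 */
unsigned int wait_for_input(int (*input_ready)(void *), void *input_ctx)
{
	unsigned int idle = 0;

	assert(input_ready != NULL);
	while (!input_ready(input_ctx)) {
		if (board_poll_fn)
			board_poll_fn(board_poll_ctx);
		idle++;
	}
	return idle;
}

/* Demo stand-in: input "arrives" after *ctx idle checks. */
int demo_countdown_ready(void *ctx)
{
	int *remaining = ctx;

	if (*remaining == 0)
		return 1;
	(*remaining)--;
	return 0;
}

/* Demo stand-in for a board task such as checking SFP GPIO pins. */
void demo_count_poll(void *ctx)
{
	unsigned int *polls = ctx;

	(*polls)++;
}
```

The poll function has to be short and non-blocking, otherwise console input becomes sluggish.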
> > 
> >>> Some of these phy drivers are extremely complex and need to tie
> >>> into the SFP management. We also need to use a background polling thread
> >>> while at the command prompt. A fair bit of our phy code is not in the
> >>> normal phy drivers because it did not fit the model. Some of these phy
> >>> drivers need to interact with the SFP support code in order to handle
> >>> hot plug events in order to reconfigure themselves based on the cable
> >>> type. The existing SFP code handles everything from SFP to SFP28 as well
> >>> as QSFP and 100G QSFP (never tested).
> >>> 
> >>> In the old U-Boot the PHY support had to be significantly enhanced due
> >>> to requirements for hot-plugging and how some of the PHYs are
> >>> configured. It gets quite complicated with phys like the Inphi where one
> >>> phy can handle either four ports (XFI/SGMII) or a single 4-lane port
> >>> (XLAUI). It gets even worse since in some boards we use reclocking chips
> >>> and there is one chip that handles the receive path of a QSFP and
> >>> another that handles the transmit path. Further complicating things,
> >>> with a QSFP it can be treated either as XLAUI or as four XFI ports, so
> >>> you can have four ports spread across two chips, with each port using
> >>> different slices of each chip. In the case of the Inphi/Cortina chip, a
> >>> single device can handle one or four ports based on the configuration
> >>> and it is configured by "slice" which is basically an offset into the
> >>> MDIO register space. We had to jump through hoops in order to have this
> >>> stuff work in a sane way in the device tree. We added entries for SFP
> >>> and QSFP slots in the device tree which point to the MACs, GPIOs and I2C
> >>> bus because pointing them to the phys just got too insane. This will
> >>> need to be ported to the new U-Boot. It should not break the existing
> >>> support since most of it was implemented outside of the core PHY
> >>> handling code. In the port, it would be far better if this could be
> >>> integrated in. The SFP management code is architecture agnostic as is
> >>> all of the PHY support. The callbacks for the SFP support are used by
> >>> the MAC which then notifies the PHY since the MAC often needs to
> >>> reconfigure itself. It can handle some crazy configurations.
> >>> 
> >>> While I see some phy drivers that we also support, i.e. Cortina, our
> >>> drivers tend to have a lot more functionality. For example, all of our
> >>> phy drivers that support firmware support commands for upgrading the
> >>> firmware as well as things like cable testing and other features.
> >> 
> >> PHY drivers and ethernet drivers should be really reduced to the
> >> required functionality to enable basic networking like Ping, DHCP, TFTP.
> >> U-Boot is still "just" a bootloader and not a system management tool ;)
> >> You should do that stuff either in Linux or in a downstream fork.
> > 
> > This is the case for the most part. Unfortunately, many of these drivers
> > require a lot of code and some require frequent monitoring to make
> > adjustments. The SFP support is required to monitor what cable type is
> > plugged in and to reprogram the phy as needed based on the type of cable.
> > The 10G and 25G phys need different settings for optical/active vs
> > passive copper vs SFP connectors. In addition, some require different
> > settings based on the cable length and in some cases exceptions are
> > needed for certain modules (there are a series of Avago SFP to Gigabit
> > modules that require autonegotiation to be disabled in 1000Base-X mode).
> > In at least one case there needs to be frequent polling to make
> > adjustments (25G) as the equalization settings can change based on
> > temperature. The SFP management code identifies the type of cable
> > connected and its parameters so that the phy driver can adjust the
> > appropriate settings. The SFP management code is generic and not tied to
> > any one type of phy or MAC or brand of module. It also monitors all of
> > the GPIO pins and will make callbacks when needed. Many phys lack the
> > support for doing this themselves. Phys I have worked with that need this
> > support include Cortina/Inphi and several Microsemi/Vitesse devices.
> > 
> > The Inphi devices will typically handle four XFI lanes with four
> > bi-directional slices with each slice given a different register range.
> > Further complicating matters is that a QSFP port can either be four XFI
> > interfaces or a single XLAUI interface. We have code to update the
> > firmware for the Inphi chips, but this is small compared to the rest of
> > the initialization code. These chips require that equalization and gain
> > be configured on each slice based on the board and cable characteristics
> > as well as LED configuration.
> > 
> > With the Microsemi reclocking chips, each chip has four unidirectional
> > lanes. For a QSFP port, two chips are required with one chip configured
> > for ingress and the other for egress. This can support either XLAUI or
> > four XFI interfaces. When it is configured for XFI there are four XFI
> > interfaces, since now four MACs are shared with two chips with each MAC
> > going to one lane on each chip.
> > 
> > Also making things fun is that Inphi and the reclocking chips do not
> > conform to the clause 45 standard at all. In the case of Inphi, the ID
> > registers are 0.0 and 0.1 instead of 1.2 and 1.3 as they are in Clause
> > 45.
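
The difference in ID register location described above can be handled by making the location a per-device parameter instead of hard-coding 1.2/1.3. The sketch below assumes a Clause 45 style read callback. Which of the two registers holds the upper half of the ID on the Inphi parts is an assumption, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical Clause 45 style read: (device address, register) -> value. */
typedef uint16_t (*c45_read_t)(void *bus, int devad, int reg);

struct phy_id_location {
	int devad;	/* MMD / device address */
	int reg_hi;	/* register holding the upper 16 bits */
	int reg_lo;	/* register holding the lower 16 bits */
};

/* Standard Clause 45 PMA/PMD device identifier: registers 1.2 and 1.3. */
const struct phy_id_location c45_pma_pmd_id = { 1, 2, 3 };

/* Location given in the thread for Inphi: 0.0 and 0.1 (hi/lo order assumed). */
const struct phy_id_location inphi_style_id = { 0, 0, 1 };

uint32_t phy_read_id(c45_read_t rd, void *bus,
		     const struct phy_id_location *loc)
{
	uint32_t hi, lo;

	assert(rd && loc);
	hi = rd(bus, loc->devad, loc->reg_hi);
	lo = rd(bus, loc->devad, loc->reg_lo);
	return (hi << 16) | lo;
}

/* Fake bus for the demonstration, indexed as regs[devad][reg]. */
struct fake_c45_bus {
	uint16_t regs[2][4];
};

uint16_t fake_c45_read(void *bus, int devad, int reg)
{
	struct fake_c45_bus *b = bus;

	return b->regs[devad][reg];
}
```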
> > 
> > The MAC drivers are also non-trivial. The Octeon chips are designed as
> > network processors with a lot of hardware offloading and coprocessors.
> > Bringing up a "simple Ethernet" interface is anything but simple. There
> > are numerous offload engines that must be configured before it will work.
> > While we do have one "simple" interface that can be configured, it often
> > isn't because it's usually only good for a management port and many
> > boards do not have this and the customers desire to be able to use any
> > port.
> > 
> > Just configuring the interface between the MAC and PHY is also
> > non-trivial.
> > The Octeon (and later CPUs) have what are called "QLMs" or quad lane
> > modules. These QLMs contain programmable serdes which can be configured
> > for PCIe, SATA, XFI, XAUI, RXAUI, SGMII, 1000Base-X, XLAUI and a whole
> > host of other interface types with a lot of tuning for things like
> > equalization and clocks. The amount of QLM initialization code is quite
> > large but necessary. There are a lot of clock and analog tuning
> > parameters and sequences that must be run.
> > 
> > Sadly all of this is needed just for basic ping and DHCP. This isn't like
> > a simple e1000 NIC or the NICs common with most SoCs.
> 
> as already stated this heavy networking stuff should be the task of an
> OS. I understand why you chose another way because Linux only recently
> got real support for SFP or more hardware-offloading capabilities but
> maybe you should take the chance and update your system design and
> submit missing functionality to Linux rather than adding a lot of
> network management stuff to U-Boot.
> 
Unfortunately, without the support in U-Boot, networking just won't work at 
all. The U-Boot drivers do not use any of the heavy lifting features. 
Unfortunately there is still a lot of code that needs to execute just for 
ping.

> > Think of scaling from a Raspberry Pi to a dual-CPU XEON enterprise-class
> > server with 96 cores and 256GiB of RAM with 10, 25 and 40Gbe ports but
> > without a BCM or MCU to handle low-level board changes while also having
> > many enterprise-class requirements for RAS, etc. That is why our code is
> > so large and complex. There are a lot of hardware engines for offloading
> > a lot of tasks since the chips are often used in security appliances.
> > There are engines for ZIP compression, hardware regex engines, packet
> > ordering engines, packet parsing engines, buffer management engines, RAID
> > engines and a whole host of others. Many are not used in U-Boot, but a
> > fair number are required for basic packet I/O.
> > 
> > For example, one of the boxes contains a CN78XX with 8 10G ports (where
> > either can also be configured in XLAUI using 4 to 1 using a QSFP to SFP+
> > splitter cable). It has 128GiB of registered DDR4 DIMMs, 4 SATA drives,
> > redundant power supplies and a whole host of other things including
> > multiple temperature monitors. This uses an Inphi/Cortina phy chip that
> > requires full SFP management support. With Inphi phys, the phy cannot
> > drive LEDs based on traffic since it has no concept of packets,
> > especially in XLAUI mode since each lane is independent of the others.
> > 
> > Another board, one I specifically have been told to upstream is a NIC that
> > contains a CN73XX and two 10G/25G ports that go through a complex gearbox
> > chip. Since there is no hardware support for LEDs in the Octeon SoC to
> > indicate link and packet I/O this must be done in software (including
> > U-Boot, customer requirement) and SFP port management is also a must. The
> > phy is not at all a traditional phy. It uses i2c instead of MDIO and
> > requires frequent monitoring of the link parameters (it's an older custom
> > gearbox chip, there are newer and better chips that don't require this
> > now). I have a hook while U-Boot is sitting at the prompt which allows
> > for background tasks to operate while it's sitting.
> > 
> > I have several other NICs to support that use a Microsemi reclocking chip
> > that has four unidirectional lanes per chip. The chip has zero
> > intelligence and is shared between ports (and on some devices, multiple
> > chips are shared between ports). Everything must be tuned based on the
> > SFP/QSFP module type and cable length. LEDs also must be software driven.
> > (The software driving of LEDs is eliminated in OcteonTX2). These chips
> > have no way to drive the LEDs themselves to indicate packet I/O or link
> > status.
> > 
> > There are also other boards that use the Microsemi reclocking chips. They
> > were chosen in part due to the power budget and these chips are very low
> > power (and inexpensive).
> > 
> > In all of these phy cases, all of the parameters are maintained in the
> > device tree so the drivers are generic. Unfortunately these drivers also
> > require SFP and QSFP management support.
> > 
> > I figure if there are several boards I need to upstream, it's not much
> > more effort to port all of the boards to the new U-Boot. I've worked hard to
> > minimize the board-specific code and make as much of it generic and based
> > on the device tree as possible.
> > 
> > Someday I would love for SFP/QSFP infrastructure to get into Linux. Some
> > NIC cards do it in their drivers, but I'd like to see generic
> > infrastructure (like my U-Boot support). This might make it harder for
> > some drivers to only support certain brands of modules too :) The generic
> > code I wrote works with most modules except Intel (because they have bad
> > checksums, but counterfeit Intel modules work fine!). It still can be
> > expanded at some point since there is no support for module diagnostics
> > other than identifying if it is present. Pretty much all it does is
> > monitor the GPIO pins and parse and decode the EEPROM. The SFP code is
> > generic enough such that any phy driver that needs it can easily hook
> > into it.
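
For context on the checksum remark above: in the SFP serial ID layout defined by SFF-8472, byte 63 (CC_BASE) holds the low 8 bits of the sum of bytes 0-62 and byte 95 (CC_EXT) holds the low 8 bits of the sum of bytes 64-94. A minimal sketch of that check follows. The function names are hypothetical and this is not the SFP code discussed in the thread. QSFP modules use a different layout and are not covered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SFP_CC_BASE_OFFSET	63	/* checksum over bytes 0..62 */
#define SFP_CC_EXT_OFFSET	95	/* checksum over bytes 64..94 */

/* 8-bit wrapping sum of buf[first..last], inclusive. */
static uint8_t sfp_sum8(const uint8_t *buf, size_t first, size_t last)
{
	uint8_t sum = 0;
	size_t i;

	for (i = first; i <= last; i++)
		sum += buf[i];
	return sum;
}

/* eeprom must point to at least the first 96 bytes of the serial ID page. */
bool sfp_base_checksum_ok(const uint8_t *eeprom)
{
	assert(eeprom != NULL);
	return sfp_sum8(eeprom, 0, SFP_CC_BASE_OFFSET - 1) ==
	       eeprom[SFP_CC_BASE_OFFSET];
}

bool sfp_ext_checksum_ok(const uint8_t *eeprom)
{
	assert(eeprom != NULL);
	return sfp_sum8(eeprom, SFP_CC_BASE_OFFSET + 1, SFP_CC_EXT_OFFSET - 1) ==
	       eeprom[SFP_CC_EXT_OFFSET];
}
```

Whether a failed checksum should reject the module or only produce a warning is a policy decision for the caller.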
> 
> as already noted this is already in Linux:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/phy/phylink.c
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/phy/sfp.c

Unfortunately, for high speed interfaces (which our customers use in U-Boot 
for tftpboot), a fair bit needs to be implemented just to work. The way the 
code is architected there isn't much impact to the existing U-Boot code 
unless it needs to take advantage of it.

> >>> Our bootloader needs to be able to be booted from a variety of sources,
> >>> including SPI, eMMC, NOR flash and booting over the PCI bus from a host
> >>> system. This is one reason we use virtual memory. The other reason is
> >>> that it eliminates the need to perform relocation. Our start.S code
> >>> handles all of these different cases as well as exception handling.
> >> 
> >> This is already supported for MIPS. You should try to use the generic
> >> SPL framework for that. Whether you like the relocation or not, it's one
> >> of the basic design principles of U-Boot. I guess it likely won't be
> >> accepted if you circumvent this. In fact by now we're sharing the same
> >> technology as Linux to have relocatable binaries without using gcc's
> >> -fPIC or -mabicalls to reduce the binary footprint. You can configure
> >> gd->ram_top to any address of your liking as reference address for the
> >> relocation.
> > 
> > I will look into this. One other complication is the fact that we require
> > both a failsafe as well as a default bootloader. With the older U-Boot we
> > got around all of this by just using TLB entries to map U-Boot to always
> > run in the same virtual address regardless of the physical address. It
> > eliminated any need for -fPIC and helped keep the binary small. For our
> > older bootloader, it always executes at 0xC0000000 regardless of where it
> > sits in physical memory. Using virtual memory also helps keep U-Boot
> > simple and small.
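
The fixed virtual address scheme described above amounts to a constant offset between the virtual window and wherever the image was loaded. The sketch below shows only the address arithmetic, using the figures given in this thread (0xC0000000 virtual base, 4 MB load alignment, image under 4 MiB). It does not program any TLB entries, and the names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define UBOOT_VIRT_BASE	0xC0000000u	/* fixed execution address (from the thread) */
#define UBOOT_LOAD_ALIGN	(4u << 20)	/* image may sit on any 4 MiB boundary */
#define UBOOT_MAX_SIZE	(4u << 20)	/* image stays under 4 MiB */

bool uboot_load_base_valid(uint64_t phys_base)
{
	return (phys_base & (UBOOT_LOAD_ALIGN - 1)) == 0;
}

/*
 * Translate a virtual address inside the U-Boot window to a physical
 * address, given the physical base the image was loaded at. Returns
 * false if the base is misaligned or the address is outside the window.
 */
bool uboot_virt_to_phys(uint64_t phys_base, uint32_t virt, uint64_t *phys)
{
	assert(phys != NULL);
	if (!uboot_load_base_valid(phys_base))
		return false;
	if (virt < UBOOT_VIRT_BASE || virt - UBOOT_VIRT_BASE >= UBOOT_MAX_SIZE)
		return false;
	*phys = phys_base + (virt - UBOOT_VIRT_BASE);
	return true;
}
```

Because the code is linked for the virtual base only, the same binary runs unmodified from any valid load address. Only the mapping changes.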
> > 
> >>> I will also say up front that the memory initialization code is a mess
> >>> and quite large (it was written by a hardware engineer who never heard
> >>> of functions).
> >>> 
> >>> One thing is that this will break mips unless it is refactored like ARM
> >>> is, for example, separating armv7 and armv8. This way we could have
> >>> arch/mips/cpu/octeon. I did this with the old bootloader to separate our
> >>> stuff. I'm open to suggestions as for the naming. I don't see how we can
> >>> share much of the code with the other MIPS CPUs.
> >> 
> >> We have the same mach directory handling as in Linux MIPS. So you could
> >> easily add all your platform specific code (except drivers) to
> >> arch/mips/mach-octeon or (-cavium). Inside that directory you can have
> >> an include directory for your custom header files, you can even override
> >> the generic files from arch/mips/include like in Linux. arch/mips/cpu
> >> and arch/mips/lib should only contain generic code. As already mentioned
> >> you could provide an own start.S inside arch/mips/mach-octeon but if
> >> possible you should try to reuse or extend the generic variant.
> > 
> > We can't use the existing start.S. We have a lot of requirements that are
> > not supported there as well as a fair bit of code dedicated to dealing
> > with the cache and TLBs and bringing additional cores out of reset. We
> > make use of a boot bus movable region in order to do this and handle
> > other cases like NMIs and the watchdog. Our start.S currently sits at
> > around 3800 lines of code. Some is common but most is not.
> > 
> > Our start.S is designed to be able to boot both a failsafe and
> > non-failsafe image and supports adjusting the flash mapping in order to
> > start from an offset other than zero in the flash. There is also a fair
> > bit of code for copying the image out of flash into the L2 cache for a
> > significant speedup for DRAM initialization. I'm trying to get permission
> > to share our existing code but I'm getting push-back (even though it's
> > GPL!?!). How they want me to upstream it without sharing the code is
> > beyond me.
> > 
> > While U-Boot has an exception handler, I believe ours is more
> > comprehensive. It is written entirely in assembler and is not dependent
> > on a working C runtime environment. It also dumps more information than
> > just the registers such as the stack and a number of other exception
> > registers and does some exception decoding. It's quite a bit better than
> > the ARMv8 exception handler IMHO.
> > 
> > Putting this under mach-octeon will make it much easier. I'll try and
> > re-use where I can.
> > 
> >>> All in all, I think the final port will add between 500K-1M lines of
> >>> code for the Octeon CPU. It is much more extensive than what is required
> >>> for OcteonTX since in the latter case most of the hardware
> >>> initialization is done by earlier stage bootloaders and the ATF handles
> >>> things like SFP port management and many of the networking operations.
> >>> 
> >>> I'm not sure how well I'll be able to upstream all of this code at this
> >>> point since I was just handed this task. We already have at least 1M
> >>> lines of code added to the old U-Boot which is based off of 2013.08 with
> >>> a lot of backports.
> > 
> > I'm trying to get our existing code made available someplace online. I'm
> > getting pushback even though U-Boot is GPL and the license on our SDK is
> > BSD-like (i.e. do whatever you want but don't hold us responsible). It
> > looks like it used to be available but was taken down. I don't
> > understand lawyers. All of the code I wrote is GPL. There is some U-Boot
> > specific code in our SDK, but none was copied from U-Boot. There also is
> > some duplication of functionality between U-Boot and our SDK that I'll
> > try and eliminate.
> > 
> > I have implemented just about every feature in U-Boot I could with our
> > Octeon SoC. That's another reason it's so large. Some customer always
> > comes back and says they want feature X to work. Fortunately, the changes
> > to the U-Boot supplied code are generally minimal, despite it being so
> > large.
> > 
> > I likely will need to add some more hooks to board_f.c and board_r.c. I
> > have run into many cases where we need a specific order of initialization
> > that does not match the normal U-Boot order. Perhaps make init_sequence_f
> > and init_sequence_r weak so that they can be overridden if needed by a
> > specific board or architecture. While much of the current init order
> > works, we need some things initialized as quickly as possible and others
> > initialized later. For example, the first thing we call is an
> > early_errate_workaround function in the init sequence before anything
> > else is called.
> 
> I guess overriding the complete generic board init code is not
> acceptable. It was once hard work to unify this. A hook like
> early_errate_workaround() sounds reasonable but could also be called
> from start.S before handing over to board_init_f(). But everything else
> should fit into the existing init hooks. There are quite a lot.

I agree. I did some more research and noticed that it's not uncommon to have 
other functions called before board_init_f by the start code. I also noticed 
that there appear to be quite a few places where custom board_init_f functions 
are defined. I will try and avoid this. Back when I did this port in 2012 
things were a lot more limited.

Would marking a few functions as weak be acceptable? This would help keep 
#ifdefs to a minimum. I have found that doing this as well as adding hooks in 
some key places can really minimize the use of #ifdefs and keep the code 
cleaner. In our common board code I did this a lot. That way there is nothing 
specific to any single board in there and any board can override whatever 
functionality it needs to do. Our existing U-Boot supports 83 boards, though 
many of these will go away (and some are no longer tested).
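
To make the question concrete, here is a minimal sketch of the weak-hook pattern combined with a NULL-terminated init list that stops at the first failure. U-Boot spells the attribute __weak. A local macro is used here so the sketch builds on its own. board_early_hook and the demo_* names are hypothetical and not existing U-Boot functions.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for U-Boot's __weak so the sketch is self-contained. */
#define WEAK_HOOK __attribute__((weak))

typedef int (*init_fnc_t)(void);

/*
 * Hypothetical early hook. The weak default does nothing; a board or SoC
 * file can provide a normal (strong) function with the same name and the
 * linker will pick that one instead, with no #ifdef in common code.
 */
WEAK_HOOK int board_early_hook(void)
{
	return 0;
}

/* Stand-in init steps used only for the demonstration. */
int demo_console_calls;

int demo_console_init(void)
{
	demo_console_calls++;
	return 0;
}

int demo_failing_step(void)
{
	return -1;
}

/* Run a NULL-terminated list in order, stopping at the first failure. */
int run_init_list(const init_fnc_t *list)
{
	assert(list != NULL);
	for (; *list; list++) {
		int ret = (*list)();

		if (ret)
			return ret;
	}
	return 0;
}

/* The hook is listed first, so it runs before any other init step. */
const init_fnc_t demo_init_sequence[] = {
	board_early_hook,
	demo_console_init,
	NULL,
};
```

With this arrangement the common list stays identical for every board, and only boards that need the early workaround provide their own board_early_hook().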

-Aaron
-------------- next part --------------
A non-text attachment was scrubbed...
Name: start.S
Type: text/x-csrc
Size: 95688 bytes
Desc: start.S
URL: <http://lists.denx.de/pipermail/u-boot/attachments/20191030/93718458/attachment.c>


* [U-Boot] [EXT] Re: Cavium/Marvell Octeon Support
  2019-10-30 16:20     ` Daniel Schwierzeck
@ 2019-10-30 17:21       ` Wolfgang Denk
  2019-10-30 21:23       ` Aaron Williams
  1 sibling, 0 replies; 32+ messages in thread
From: Wolfgang Denk @ 2019-10-30 17:21 UTC (permalink / raw)
  To: u-boot

Dear Daniel & Aaron,

In message <7fdf93f6-412e-5fcf-da5e-17665daadb30@gmail.com> you wrote:
>
> > Some other things we have included are a native API that allows Simple 
> > Executive applications to make calls into U-Boot for such things as 
> > environment variable access as well as access to block devices and 
> > filesystems.
>
> This is one of the parts that shouldn't be needed for basic upstream
> support. If your API is a parallel and independent implementation of the
> API that U-Boot already has for standalone applications, then I'm afraid
> this won't be accepted and should be kept in a downstream fork.

The big question here is what these are intended for.

If they are indeed intended as standalone applications, especially
containing code that shall not be disclosed under GPL, then there is
a licensing issue - the pretty hard restrictions of the API for
standalone applications are intentional, and attempts to work around
it are license violations.

But if it's just normal GPL code that is somehow dependent on U-Boot
services, then why is it not linked against U-Boot?

Or this might be something like dynamically loadable modules - well,
then a close look is needed because such an approach has to be
generic enough (and probably borrow much from Linux).

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd at denx.de
There you go man, Keep as cool as you can. It riles them  to  believe
that you perceive the web they weave. Keep on being free!


* [U-Boot] [EXT] Re: Cavium/Marvell Octeon Support
  2019-10-27  2:34   ` Aaron Williams
  2019-10-29 13:12     ` Wolfgang Denk
@ 2019-10-30 16:20     ` Daniel Schwierzeck
  2019-10-30 17:21       ` Wolfgang Denk
  2019-10-30 21:23       ` Aaron Williams
  1 sibling, 2 replies; 32+ messages in thread
From: Daniel Schwierzeck @ 2019-10-30 16:20 UTC (permalink / raw)
  To: u-boot

Hi Aaron,

On 27.10.19 at 03:34, Aaron Williams wrote:
> Hi Daniel,
> 
> On Friday, October 25, 2019 8:13:57 AM PDT Daniel Schwierzeck wrote:
>> Hi Aaron,
>>
>> On 23.10.19 at 05:50, Aaron Williams wrote:
>>> Hi all,
>>>
>>> I have been tasked with porting our Octeon U-Boot to the latest U-Boot
>>> and merging it upstream. This will involve a very significant amount of
>>> code that generally will not be compatible with other MIPS processors
>>> due to our needs and requirements. For example, the start.S will need to
>>> be completely different than what is present. Our existing
>>> start.S is 3577 lines of code in order to deal with things like RAS,
>>> exceptions, virtual memory and more. We need to use virtual memory since
>>> U-Boot can be loaded at any 4MB boundary in memory, not just 0xbfc00000.
>>> A number of drivers will need to be updated in order to properly map
>>> pointers to physical addresses. This is needed anyway, since I see
>>> numerous drivers that assume that a pointer is a DMA address. For MIPS
>>> this is never the case (I'm looking at XHCI).
>>
>> Good to see some progress in mainline Octeon support. Could you briefly
>> describe the differences and commonalities in booting an Octeon CPU
>> compared to other "generic" MIPS cores? Or could you point me to a
>> public Git tree? It can't be that different because Linux kernel is also
>> able to share most of the code ;)
>>
> 
> Actually the low level code is significantly different. First of all, we need 
> the U-Boot bootloader to be able to boot from different memory locations. 
> Because of this, we use mapped memory for U-Boot. A side effect of this is 
> that it eliminates the need for relocation when it is shifted to the top of 
> memory. All we need to do is just set a couple of TLB entries.

Understood, but U-Boot still relocates itself from its initial entry
memory address to its destination memory address based on gd->ram_top.
Maybe this is ineffective nowadays with various SPL/TPL boot methods
because U-Boot proper is already loaded to an executable memory location
by SPL, but you have to initially deal with that design. Feel free to
suggest/submit a patch for the generic board init code to make the
relocation configurable.

> 
> The assembly code is significantly different and is far more extensive.
> 
> Additionally, the way Octeon Linux is booted is different.
> 
> The generic start.S is not usable in our case.
> 
> We have a significant amount of code for dealing with the cache and for things 
> like copying U-Boot from flash into the L2 cache. We also have to deal with 
> taking other cores out of reset in our start.S. Our exception handler has also 
> been extended to handle multiple cores.

it's hard to discuss this without example code but I still think the
basic principles of cache and exception handling can't be that different
from generic MIPS cores. Locking cache lines and loading code to it
could be useful for other MIPS platforms and should be added as generic
feature. BTW the exception handler code is a port of the Linux one, I
only skipped the stack trace output because of the complicated stack
unwinding code. I think the current dump of general and CP0 and EPC
registers is more than feasible for a bootloader. It already helped me
multiple times to quickly locate code locations with e.g. null pointer
dereferencing.

> 
> Some other things we have included are a native API that allows Simple 
> Executive applications to make calls into U-Boot for such things as 
> environment variable access as well as access to block devices and 
> filesystems.

This is one of the parts that shouldn't be needed for basic upstream
support. If your API is a parallel and independent implementation of the
API that U-Boot already has for standalone applications, then I'm afraid
this won't be accepted and should be kept in a downstream fork.

> 
> 
> We used to have our Octeon SDK available for download but it seems this has 
> been taken down :( I'm trying to find out how I can make it available but I'm 
> getting pushback in sharing our GPLed U-Boot even though it is GPL.
> 
>> In principle you could compile an own start.S in your mach-octeon
>> directory, but you should try to use the generic start.S which is
>> already customisable and extensible. If needed, we could add more
>> extension points to it. Booting from any custom memory address is
>> already supported and very common for other MIPS based SoC's. Exception
>> support is also already there.
>>
> 
> The bootloader needs to be able to start from multiple memory locations 
> without recompiling. Our existing bootloader can run from any 4MB boundary 
> without recompiling or relocation. It can start out of flash (from any sector 
> boundary, not just 0) or L2 cache. Starting from L2 cache is supported by eMMC, 
> SPI and PCI target bootloaders. Additionally the same bootloader can be 
> started from RAM such as when the failsafe bootloader starts the main 
> bootloader. In most cases, the failsafe is the same full-featured bootloader 
> since it fits entirely within the L2 cache. Our only bootloader requirement is 
> that it fits in the L2 cache (except when booting from Flash, though this is 
> preferred for speed) and that it remain under 4 MiB in size.
> 
> I believe our exception handling is more extensive than the standard U-Boot 
> exception handler. It includes the stack output as well as numerous COP0 
> registers and decoding the cause of the exception. The exception handler is 
> also independent of a working C environment. We also need to handle exceptions 
> occurring on multiple cores as they're brought out of reset and not all cases 
> are exceptions. 

as I wrote above, the current exception handling is already feasible in
almost all cases to quickly locate code bugs and doesn't need much code.
Adding stack trace output would require adding a lot more code. But
if you are only missing some registers or want to dump the stack itself,
feel free to extend the current code.

> Cores are first powered on and kept in a halted state, then
> later when we start the Linux kernel or simple executive applications, the 
> exception handler is updated (via a bootbus moveable memory region) and an 
> NMI is generated for the cores where they will begin executing code out of 
> start.S before moving to the code that sets up the environment for booting 
> Linux and/or simple executive applications. In the latter case, TLB entries 
> are programmed in for each core.
> 
>>> The new Octeon U-Boot will be native 64-bit instead of how the earlier
>>> one was 32-bit using the N32 ABI (so 64-bit addresses could be
>>> accessed). We had to jump through some hoops to make a 32-bit U-Boot
>>> fully support 64-bit hardware.
>>
>> We have 64 bit support for MIPS. I even sync'ed the asm/io stuff from
>> Linux in the past (which includes support for Octeon) so that you would
>> be able to use the standard IO primitives and ioremap stuff and hook in
>> your platform-specific memory mappings.
>>
> That is good to know. What I have run into is the fact that many drivers do 
> not support I/O remapping. I.e. XHCI assumes that a pointer is a DMA address. 
> Also, does the 64-bit support handle multiple cores in U-Boot?

we already have stuff like dev_remap_addr(struct udevice* dev) as part
of the driver model API to map your physical addresses from device tree
to virtual addresses. This is used in all drivers compatible with MIPS.
That function is backed by the MIPS specific ioremap_nocache() function
(also ported from Linux) so that you can hook in platform specific
mapping code. If you want to use existing drivers which don't do
remapping yet, you have to patch them. But this should be simple, we
recently did that on Broadcom or Mediatek platforms, which are sharing
drivers between their MIPS and ARM CPUs.

For XHCI you probably only need to patch the xhci_readl() and
xhci_writel() functions and establish the memory mappings in your
platform specific glue code. But USB support shouldn't be your first
priority ;)
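
As background for why a pointer cannot be handed to a device as a DMA address on MIPS: in the classic 32-bit segment layout, KSEG0 (0x80000000) and KSEG1 (0xA0000000) are both fixed windows onto the low 512 MiB of physical memory, so the physical address is the pointer with its top three bits cleared. The sketch below shows just that arithmetic. It ignores the 64-bit segments and any platform-specific mapping an SoC such as Octeon may add, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define KSEG0_BASE	0x80000000u	/* unmapped, cached */
#define KSEG1_BASE	0xA0000000u	/* unmapped, uncached */
#define KSEG2_BASE	0xC0000000u	/* mapped through the TLB */

/* True for addresses in the unmapped KSEG0/KSEG1 windows. */
bool kseg_unmapped(uint32_t va)
{
	return va >= KSEG0_BASE && va < KSEG2_BASE;
}

/*
 * Physical address behind a KSEG0/KSEG1 pointer. Mapped addresses need a
 * TLB or table lookup instead, so they are rejected here.
 */
uint32_t kseg_to_phys(uint32_t va)
{
	assert(kseg_unmapped(va));
	return va & 0x1FFFFFFFu;
}
```

For example, the reset vector 0xBFC00000 in KSEG1 corresponds to physical address 0x1FC00000, which is the value a DMA engine would need, not the pointer itself.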


> 
> I agree about using the standard ioremap stuff. I'm only pointing out that 
> there are places where it is missing in the common U-Boot code. Where it is 
> present, there won't be any issues since traditionally I used those methods to 
> call our platform specific remapping. I will look to see what is present and 
> if it will work or not.

yes, those places need some patching anyway. There is already an ongoing
task to address this:

https://gitlab.denx.de/u-boot/custodians/u-boot-mips/issues/15

> 
>>> I think we can shrink the code by removing support for starting "simple
>>> executive" tasks. Simple executive tasks are bare metal applications
>>> that can run on dedicated cores beside Linux (or without Linux). I will
>>> also not be porting any support for anything older than Octeon3.
>>>
>>> We also make heavy use of our SDK in order to perform hardware
>>> initialization and networking. In our old U-Boot, we have almost 900K
>>> lines of code. I can cut out much of this but much will remain.
>>>
>>> We also have added extensive infrastructure for handling SFP and QSFP
>>> cables as well as very extensive phy support for phys from
>>> Aquantia/Marvell, Vitesse/Microsemi, Inphi/Cortina and an Avago gearbox.
>>> Our customer wants us to port all of this to the new U-Boot and upstream
>>> it. I'm worried about the sheer amount of code since it is absolutely
>>> massive.
>>
>> Maybe you should cut down your customer's expectations a bit. According
>> to sloccount we currently have 1.6M SLOC for the whole U-Boot. I guess
>> Tom or Wolfgang wouldn't agree with adding another 900k only for one
>> CPU. Actually what should be upstream is the basic CPU, driver and board
>> support to be able to boot a mainline kernel. Everything else like
>> custom bare metal applications or the SFP/PHY handling stuff mentioned
>> below could also be maintained in a downstream tree. Maybe Wolfgang is
>> willing to host one on gitlab.denx.de.
>>
> 
> I will try and cut it down. Much of the code is register definitions. The 
> register definition files are auto-generated and tend to be huge. They're 
> fully commented and include both big and little endian bitfields. In this case 
> I can do like I did for OcteonTX and modify the scripts that generate these 
> headers to strip out the little-endian and comments. There is a huge amount of 
> code for configuring our QLM hardware interfaces. We also have a lot of code 
> for SFP/QSFP ports. 
> 
> There are some other huge files that can also be eliminated by dropping 
> support for Octeon II and earlier. The error handling files are massive for 
> those chips.
> 
> Much of the rest can be shrunk somewhat, but a lot of that code is still 
> required.
> 
> There is a huge amount of code for dealing with our quad-lane modules (QLMs). 
> The QLMs can be configured to run in a variety of modes, from PCIe, SGMII, 
> SATA, XLAUI, XFI, Interlaken, SVRIO, QSGMII, XAUI, RXAUI and more. There is a 
> lot of tuning and configuration code needed in order to handle different 
> clocks, equalization, gain, AGC and a whole host of other serdes issues.
> 
> The MAC code is also quite large and complex since there are many coprocessors 
> that must be configured. These chips are designed as network processors. While 
> it makes their networking quite powerful and fast, it also means that a lot of 
> programming is needed before they will work. There are input parser engines, 
> buffer management engines, queueing engines, output engines and more that must 
> be fully configured before any packets can be sent or received.

what I meant was that your customer shouldn't expect to get his custom
code merged upstream as is, with only some cleanups. Of course a
user/customer can decide to use U-Boot as a system management and hardware
initialisation tool but that doesn't correspond with U-Boot's design. I
think most people would agree that a proper OS like Linux should be
doing the heavy network initialisation and hardware-offloading stuff as
well as booting all remaining CPU cores. U-Boot's responsibility should
only be to boot that OS on the first CPU ;)

> 
> There is a fair bit of code used to bring additional cores out of reset. In 
> our biggest configuration, there can be two Octeon CN78XX chips connected in 
> tandem where each chip has 48 cores. In this case there is a lot of tuning 
> that needs to happen with the lanes connecting the two chips before this 
> configuration works reliably. There is a tuning process that is required to 
> run on both sides (and the second chip runs a small binary image as well to 
> perform its half of the tuning).
> 
> I do not know if this will change or not but the way the Linux kernel is 
> booted on Octeon is not compatible with the standard boot commands. Part of 
> this is due to the fact that Linux can be run in parallel with Simple 
> Executive applications. It's even possible to run two copies of Linux 
> simultaneously on different cores. To go along with this, there is also a 
> mechanism with named memory blocks that is used. When bringing cores out of reset 
> for SE applications, the TLB entries need to be configured. There also is a 
> fair bit of code dealing with core masks when choosing which cores are used 
> for what.
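
To illustrate the core mask bookkeeping described in the quoted text, here
is a small sketch sized for the 96-core case (two 48-core chips). The
struct and the coremask_* helpers are hypothetical and are not the SDK's
actual core mask API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CORES      96 /* two 48-core chips, per the quoted text */
#define COREMASK_WORDS ((MAX_CORES + 63) / 64)

struct coremask {
    uint64_t bits[COREMASK_WORDS];
};

/* Mark a core as assigned; out-of-range core numbers are ignored. */
static void coremask_set(struct coremask *m, unsigned int core)
{
    assert(m != NULL);
    if (core < MAX_CORES)
        m->bits[core / 64] |= 1ULL << (core % 64);
}

static bool coremask_test(const struct coremask *m, unsigned int core)
{
    return core < MAX_CORES && ((m->bits[core / 64] >> (core % 64)) & 1);
}

/* Number of cores present in the mask. */
static unsigned int coremask_count(const struct coremask *m)
{
    unsigned int n = 0;

    for (unsigned int c = 0; c < MAX_CORES; c++)
        n += coremask_test(m, c);
    return n;
}

/* True if two masks claim at least one common core. */
static bool coremask_overlaps(const struct coremask *a,
                              const struct coremask *b)
{
    for (unsigned int i = 0; i < COREMASK_WORDS; i++)
        if (a->bits[i] & b->bits[i])
            return true;
    return false;
}
```

An overlap check like this is what would reject a request to start an
application on cores that are already assigned to something else.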
> 
> We also have a named memory block feature which is used by Linux and simple 
> executive applications where blocks of memory can be carved up. U-Boot needs 
> to tie into this.
> 
> There are also numerous other I/O interfaces that we need to 
> initialize. Unfortunately we also have some errata we need to work around, 
> and a few are non-trivial.
> 
> The DRAM initialization code is also massive.  It handles DDR3 and DDR4 for 
> both registered and unregistered memory with ECC.
> 
> In many cases, the reason for the size of the code is due to the complexity of 
> the SoC and the platforms built around it. You can think of CN78XX as being 
> more like an enterprise-class server than a simple embedded device. The CN73XX 
> is not too far behind the CN78XX. The only reason our Octeon TX2 U-Boot is so 
> much smaller is that most of the early initialization takes place before 
> U-Boot is started and the fact that a lot of the networking support (such as SFP 
> management and PHY support) is handled by ATF as well as on-chip management 
> cores. This is necessary because Linux does not have any SFP management 
> support 

last year the Linux PHY framework was reworked into a phylink framework
which supports hot-plugging and dynamic linking of PHY drivers with
MAC drivers, especially to support SFP modules. An SFP module driver is
there as well. There was a talk at ELCE 2018 about this:

https://events19.linuxfoundation.org/wp-content/uploads/2017/12/chevallier-tenart-from-the-ethernet-mac-to-the-link-partner.pdf

> nor can it handle the complex topologies we're frequently running into
> today.  The requirements of Redhat also preclude any additional software being 
> installed in order for the networking support to run.
> 
> One thing I may need to re-introduce to U-Boot is the temperature sensor 
> support for devices like this, since thermal monitoring is important.

this should be easy as U-Boot already has a thermal uclass within the
driver model.
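
As a rough, self-contained illustration of the shape of such a driver: an
ops table with a get_temp callback, loosely modeled on the thermal uclass.
The types below (sketch_dev, sketch_thermal_ops, fake_sensor) are stand-ins
so the example builds on its own; they are not the U-Boot headers, and the
unit handling is an assumption.

```c
#include <stddef.h>

/* Stand-in for a driver model device; not the real struct udevice. */
struct sketch_dev {
    const char *name;
    void *priv;
};

/* Ops table with a single temperature callback. */
struct sketch_thermal_ops {
    int (*get_temp)(struct sketch_dev *dev, int *temp);
};

/* Hypothetical sensor state: last reading in millidegrees Celsius. */
struct fake_sensor {
    int millideg;
};

/* Driver callback: report whole degrees Celsius, 0 on success. */
static int fake_get_temp(struct sketch_dev *dev, int *temp)
{
    const struct fake_sensor *s = dev ? dev->priv : NULL;

    if (!s || !temp)
        return -1;
    *temp = s->millideg / 1000;
    return 0;
}

static const struct sketch_thermal_ops fake_ops = {
    .get_temp = fake_get_temp,
};

/* Uclass-style wrapper: dispatch through the ops table. */
static int sketch_thermal_get_temp(const struct sketch_thermal_ops *ops,
                                   struct sketch_dev *dev, int *temp)
{
    if (!ops || !ops->get_temp)
        return -1;
    return ops->get_temp(dev, temp);
}
```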

> 
> Some boards require a background task to perform periodic monitoring for 
> certain events, including the board that needs to be upstreamed. I haven't 
> checked if anything is available now, but what I did in the past was hook into 
> the input function and while waiting for input it calls a user-defined polling 
> function.
> 
> If interrupts are supported it makes the polling job easier.
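
The polling arrangement described in the quoted text can be sketched as
below: a registered callback runs each time the console loop finds no
input. All names (register_idle_poll, wait_for_input and the demo_*
helpers) are hypothetical and exist only to make the sketch self-contained;
this is not an existing U-Boot interface.

```c
#include <stdbool.h>
#include <stddef.h>

typedef void (*poll_fn)(void *ctx);

static poll_fn idle_poll;
static void *idle_poll_ctx;

/* Install the board's background polling function. */
static void register_idle_poll(poll_fn fn, void *ctx)
{
    idle_poll = fn;
    idle_poll_ctx = ctx;
}

/* Wait until input_ready() reports data, running the polling hook
 * between checks. Returns the number of idle iterations. */
static unsigned int wait_for_input(bool (*input_ready)(void *),
                                   void *input_ctx)
{
    unsigned int spins = 0;

    while (!input_ready(input_ctx)) {
        if (idle_poll)
            idle_poll(idle_poll_ctx);
        spins++;
    }
    return spins;
}

/* Demo input source: becomes ready after *ctx unsuccessful checks. */
static bool demo_input_ready(void *ctx)
{
    unsigned int *remaining = ctx;

    if (*remaining == 0)
        return true;
    (*remaining)--;
    return false;
}

/* Demo polling task: just counts how often it was called. */
static void demo_count_poll(void *ctx)
{
    unsigned int *polls = ctx;

    (*polls)++;
}
```

The polling function has to be short and non-blocking, since it runs on
the same core that services the prompt.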
>>> Some of these phy drivers are extremely complex and need to tie
>>> into the SFP management. We also need to use a background polling thread
>>> while at the command prompt. A fair bit of our phy code is not in the
>>> normal phy drivers because it did not fit the model. Some of these phy
>>> drivers need to interact with the SFP support code in order to handle
>>> hot plug events in order to reconfigure themselves based on the cable
>>> type. The existing SFP code handles everything from SFP to SFP28 as well
>>> as QSFP and 100G QSFP (never tested).
>>>
>>> In the old U-Boot the PHY support had to be significantly enhanced due
>>> to requirements for hot-plugging and how some of the PHYs are
>>> configured. It gets quite complicated with phys like the Inphi where one
>>> phy can handle either four ports (XFI/SGMII) or a single 4-lane port
>>> (XLAUI). It gets even worse since in some boards we use reclocking chips
>>> and there is one chip that handles the receive path of a QSFP and
>>> another that handles the transmit path. Further complicating things,
>>> with a QSFP it can be treated either as XLAUI or as four XFI ports, so
>>> you can have four ports spread across two chips, with each port using
>>> different slices of each chip. In the case of the Inphi/Cortina chip, a
>>> single device can handle one or four ports based on the configuration
>>> and it is configured by "slice" which is basically an offset into the
>>> MDIO register space. We had to jump through hoops in order to have this
>>> stuff work in a sane way in the device tree. We added entries for SFP
>>> and QSFP slots in the device tree which point to the MACs, GPIOs and I2C
>>> bus because pointing them to the phys just got too insane. This will
>>> need to be ported to the new U-Boot. It should not break the existing
>>> support since most of it was implemented outside of the core PHY
>>> handling code. In the port, it would be far better if this could be
>>> integrated in. The SFP management code is architecture agnostic as is
>>> all of the PHY support. The callbacks for the SFP support are used by
>>> the MAC which then notifies the PHY since the MAC often needs to
>>> reconfigure itself. It can handle some crazy configurations.
>>>
>>> While I see some phy drivers that we also support, i.e. Cortina, our
>>> drivers tend to have a lot more functionality. For example, all of our
>>> phy drivers that support firmware support commands for upgrading the
>>> firmware as well as things like cable testing and other features.
>>
>> PHY drivers and ethernet drivers should be really reduced to the
>> required functionality to enable basic networking like Ping, DHCP, TFTP.
>> U-Boot is still "just" a bootloader and not a system management tool ;)
>> You should do that stuff either in Linux or in a downstream fork.
>>
> 
> This is the case for the most part. Unfortunately, many of these drivers 
> require a lot of code and some require frequent monitoring to make 
> adjustments. The SFP support is required to monitor what cable type is plugged 
> in and to reprogram the phy as needed based on the type of cable. The 10G and 
> 25G phys need different settings for optical/active vs passive copper vs SFP 
> connectors. In addition, some require different settings based on the cable 
> length and in some cases exceptions are needed for certain modules (there are 
> a series of Avago SFP to Gigabit modules that require autonegotiation to be 
> disabled in 1000Base-X mode). In at least one case there needs to be frequent 
> polling to make adjustments (25G) as the equalization settings can change 
> based on temperature. The SFP management code identifies the type of cable 
> connected and its parameters so that the phy driver can adjust the appropriate 
> settings. The SFP management code is generic and not tied to any one type of 
> phy or MAC or brand of module. It also monitors all of the GPIO pins and will 
> make callbacks when needed. Many phys lack the support for doing this 
> themselves. Phys I have worked with that need this support include 
> Cortina/Inphi and several Microsemi/Vitesse devices.
> 
> The Inphi devices will typically handle four XFI lanes with four 
> bi-directional slices with each slice given a different register range. Further 
> complicating matters is that a QSFP port can either be four XFI interfaces or 
> a single XLAUI interface. We have code to update the firmware for the Inphi 
> chips, but this is small compared to the rest of the initialization code. 
> These chips require that equalization and gain be configured on each slice 
> based on the board and cable characteristics as well as LED configuration.
> 
> With the Microsemi reclocking chips, each chip has four unidirectional lanes. 
> For a QSFP port, two chips are required with one chip configured for ingress 
> and the other for egress. This can support either XLAUI or four XFI 
> interfaces. When it is configured for XFI there are four XFI interfaces, since 
> now four MACs are shared with two chips with each MAC going to one lane on 
> each chip.
> 
> Also making things fun is that Inphi and the reclocking chips do not conform 
> to the clause 45 standard at all. In the case of Inphi, the ID registers are 
> 0.0 and 0.1 instead of 1.2 and 1.3 as they are in Clause 45.
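
One way to cope with the ID register difference described in the quoted
text is to make the ID location data rather than code. The sketch below
assumes a generic MDIO read callback; phy_id_loc, read_phy_id and the fake
bus are hypothetical names for illustration and do not correspond to an
existing U-Boot or Linux interface.

```c
#include <stdint.h>

/* Where a device keeps its two 16-bit identifier registers. */
struct phy_id_loc {
    uint8_t devad;   /* MMD (device address) */
    uint16_t reg_hi; /* upper 16 bits of the ID */
    uint16_t reg_lo; /* lower 16 bits of the ID */
};

/* Clause 45 PMA/PMD identifier: registers 1.2 and 1.3. */
static const struct phy_id_loc c45_std_id = { 1, 2, 3 };
/* Location given in the quoted text for Inphi: 0.0 and 0.1. */
static const struct phy_id_loc inphi_id = { 0, 0, 1 };

typedef uint16_t (*mdio_read_fn)(void *bus, uint8_t devad, uint16_t reg);

/* Combine the two 16-bit registers into one 32-bit ID. */
static uint32_t read_phy_id(mdio_read_fn rd, void *bus,
                            const struct phy_id_loc *loc)
{
    uint32_t hi = rd(bus, loc->devad, loc->reg_hi);
    uint32_t lo = rd(bus, loc->devad, loc->reg_lo);

    return (hi << 16) | lo;
}

/* Fake bus for demonstration: two MMDs with four registers each. */
struct fake_bus {
    uint16_t regs[2][4];
};

static uint16_t fake_mdio_read(void *bus, uint8_t devad, uint16_t reg)
{
    const struct fake_bus *b = bus;

    if (devad >= 2 || reg >= 4)
        return 0xffff; /* nothing responds at this address */
    return b->regs[devad][reg];
}
```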
> 
> The MAC drivers are also non-trivial. The Octeon chips are designed as network 
> processors with a lot of hardware offloading and coprocessors. Bringing up a 
> "simple Ethernet" interface is anything but simple. There are numerous offload 
> engines that must be configured before it will work. While we do have one 
> "simple" interface that can be configured, it often isn't because it's usually 
> only good for a management port and many boards do not have this and the 
> customers desire to be able to use any port.
> 
> Just configuring the interface between the MAC and PHY is also non-trivial. 
> The Octeon (and later CPUs) have what are called "QLMs" or quad lane modules. 
> These QLMs contain programmable serdes which can be configured for PCIe, SATA, 
> XFI, XAUI, RXAUI, SGMII, 1000Base-X, XLAUI and a whole host of other interface 
> types with a lot of tuning for things like equalization and clocks. The amount 
> of QLM initialization code is quite large but necessary. There are a lot of 
> clock and analog tuning parameters and sequences that must be run.
> 
> Sadly all of this is needed just for basic ping and DHCP. This isn't like a 
> simple e1000 NIC or the NICs common with most SoCs.

as already stated this heavy networking stuff should be the task of an
OS. I understand why you chose another way because Linux only recently
got real support for SFP or more hardware-offloading capabilities but
maybe you should take the chance and update your system design and
submit missing functionality to Linux rather than adding a lot of
network management stuff to U-Boot.

> 
> Think of scaling from a Raspberry Pi to a dual-CPU XEON enterprise-class 
> server with 96 cores and 256GiB of RAM with 10, 25 and 40Gbe ports but without 
> a BCM or MCU to handle low-level board changes while also having many 
> enterprise-class requirements for RAS, etc. That is why our code is so large 
> and complex. There are a lot of hardware engines for offloading a lot of tasks 
> since the chips are often used in security appliances. There are engines for 
> ZIP compression, hardware regex engines, packet ordering engines, packet 
> parsing engines, buffer management engines, RAID engines and a whole host of 
> others. Many are not used in U-Boot, but a fair number are required for basic 
> packet I/O.
> 
> For example, one of the boxes contains a CN78XX with 8 10G ports (where either 
> can also be configured in XLAUI using 4 to 1 using a QSFP to SFP+ splitter 
> cable). It has 128GiB of registered DDR4 DIMMs, 4 SATA drives, redundant power 
> supplies and a whole host of other things including multiple temperature 
> monitors. This uses an Inphi/Cortina phy chip that requires full SFP 
> management support. With Inphi phys, the phy cannot drive LEDs based on 
> traffic since it has no concept of packets, especially in XLAUI mode since 
> each lane is independent of the others.
> 
> Another board, one I specifically have been told to upstream, is a NIC that 
> contains a CN73XX and two 10G/25G ports that go through a complex gearbox 
> chip. Since there is no hardware support for LEDs in the Octeon SoC to 
> indicate link and packet I/O this must be done in software (including U-Boot, 
> customer requirement) and SFP port management is also a must. The phy is not 
> at all a traditional phy. It uses i2c instead of MDIO and requires frequent 
> monitoring of the link parameters (it's an older custom gearbox chip, there 
> are newer and better chips that don't require this now). I have a hook while 
> U-Boot is sitting at the prompt which allows for background tasks to operate 
> while it's sitting.
> 
> I have several other NICs to support that use a Microsemi reclocking chip that 
> has four unidirectional lanes per chip. The chip has zero intelligence and is 
> shared between ports (and on some devices, multiple chips are shared between 
> ports). Everything must be tuned based on the SFP/QSFP module type and cable 
> length. LEDs also must be software driven. (The software driving of LEDs is 
> eliminated in OcteonTX2). These chips have no way to drive the LEDs themselves 
> to indicate packet I/O or link status.
> 
> There are also other boards that use the Microsemi reclocking chips. They were 
> chosen in part due to the power budget and these chips are very low power (and 
> inexpensive).
> 
> In all of these phy cases, all of the parameters are maintained in the device 
> tree so the drivers are generic. Unfortunately these drivers also require SFP 
> and QSFP management support.
> 
> I figure if there are several boards I need to upstream, it's not much more 
> effort to port all of the boards to the new U-Boot. I've worked hard to 
> minimize the board-specific code and make as much of it generic and based on 
> the device tree as possible.
> 
> Someday I would love for SFP/QSFP infrastructure to get into Linux. Some NIC 
> cards do it in their drivers, but I'd like to see generic infrastructure (like 
> my U-Boot support). This might make it harder for some drivers to only support 
> certain brands of modules too :) The generic code I wrote works with most 
> modules except Intel (because they have bad checksums, but counterfeit Intel 
> modules work fine!). It still can be expanded at some point since there is no 
> support for module diagnostics other than identifying if it is present. Pretty 
> much all it does is monitor the GPIO pins and parse and decode the EEPROM. The 
> SFP code is generic enough such that any phy driver that needs it can easily 
> hook into it.

as already noted this is already in Linux:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/phy/phylink.c

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/phy/sfp.c

> 
>>> Our bootloader needs to be able to be booted from a variety of sources,
>>> including SPI, eMMC, NOR flash and booting over the PCI bus from a host
>>> system. This is one reason we use virtual memory. The other reason is
>>> that it eliminates the need to perform relocation. Our start.S code
>>> handles all of these different cases as well as exception handling.
>>
>> This is already supported for MIPS. You should try to use the generic
>> SPL framework for that. Whether you like the relocation or not, it's one
>> of the basic design principles of U-Boot. I guess it likely won't be
>> accepted if you circumvent this. In fact by now we're sharing the same
>> technology as Linux to have relocatable binaries without using gcc's
>> -fPIC or -mabicalls to reduce the binary footprint. You can configure
>> gd->ram_top to any address of your liking as reference address for the
>> relocation.
>>
> 
> I will look into this. One other complication is the fact that we require both 
> a failsafe as well as a default bootloader. With the older U-Boot we got 
> around all of this by just using TLB entries to map U-Boot to always run in 
> the same virtual address regardless of the physical address. It eliminated any 
> need for -fPIC and helped keep the binary small. For our older bootloader, it 
> always executes at 0xC0000000 regardless of where it sits in physical memory. 
> Using virtual memory also helps keep U-Boot simple and small.
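
The address arithmetic behind the fixed virtual mapping in the quoted text
is simple enough to sketch. The 0xC0000000 base comes from the text above
and the 4MB load alignment is stated elsewhere in this thread; the helper
names are hypothetical, and the actual TLB programming is not shown.

```c
#include <stdbool.h>
#include <stdint.h>

#define UBOOT_VIRT_BASE 0xC0000000u /* fixed run address from the text */
#define LOAD_ALIGN      (4u << 20)  /* image sits on a 4MB boundary */

/* A physical load address is usable only on a 4MB boundary. */
static bool load_base_valid(uint64_t phys_base)
{
    return (phys_base % LOAD_ALIGN) == 0;
}

/* Translate an address inside the mapped image back to physical. */
static uint64_t uboot_virt_to_phys(uint64_t phys_base, uint32_t virt)
{
    return phys_base + (virt - UBOOT_VIRT_BASE);
}
```

Because the virtual address never changes, the image needs no relocation
fixups; only phys_base differs between boot sources.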
> 
>>> I will also say up front that the memory initialization code is a mess
>>> and quite large (it was written by a hardware engineer who never heard
>>> of functions).
>>>
>>> One thing is that this will break mips unless it is refactored like ARM
>>> is, for example, separating armv7 and armv8. This way we could have
>>> arch/mips/cpu/octeon. I did this with the old bootloader to separate our
>>> stuff. I'm open to suggestions as for the naming. I don't see how we can
>>> share much of the code with the other MIPS CPUs.
>>
>> We have the same mach directory handling as in Linux MIPS. So you could
>> easily add all your platform specific code (except drivers) to
>> arch/mips/mach-octeon or (-cavium). Inside that directory you can have
>> an include directory for your custom header files, you can even override
>> the generic files from arch/mips/include like in Linux. arch/mips/cpu
>> and arch/mips/lib should only contain generic code. As already mentioned
>> you could provide an own start.S inside arch/mips/mach-octeon but if
>> possible you should try to reuse or extend the generic variant.
>>
> 
> We can't use the existing start.S. We have a lot of requirements that are not 
> supported there as well as a fair bit of code dedicated to dealing with the 
> cache and TLBs and bringing additional cores out of reset. We make use of a 
> boot bus movable region in order to do this and handle other cases like NMIs 
> and the watchdog. Our start.S currently sits at around 3800 lines of code. 
> Some is common but most is not.
> 
> Our start.S is designed to be able to boot both a failsafe and non-failsafe 
> image and supports adjusting the flash mapping in order to start from an 
> offset other than zero in the flash. There is also a fair bit of code for 
> copying the image out of flash into the L2 cache for a significant speedup for 
> DRAM initialization. I'm trying to get permission to share our existing code 
> but I'm getting push-back (even though it's GPL!?!). How they want me to 
> upstream it without sharing the code is beyond me.
> 
> While U-Boot has an exception handler, I believe ours is more comprehensive. 
> It is written entirely in assembler and is not dependent on a working C 
> runtime environment. It also dumps more information than just the registers 
> such as the stack and a number of other exception registers and does some 
> exception decoding. It's quite a bit better than the ARMv8 exception handler 
> IMHO.
> 
> Putting this under mach-octeon will make it much easier. I'll try and re-use 
> where I can.
> 
>>> All in all, I think the final port will add between 500K-1M lines of
>>> code for the Octeon CPU. It is much more extensive than what is required
>>> for OcteonTX since in the latter case most of the hardware
>>> initialization is done by earlier stage bootloaders and the ATF handles
>>> things like SFP port management and many of the networking operations.
>>>
>>> I'm not sure how well I'll be able to upstream all of this code at this
>>> point since I was just handed this task. We already have at least 1M
>>> lines of code added to the old U-Boot which is based off of 2013.08 with
>>> a lot of backports.
> 
> I'm trying to get our existing code made available someplace online. I'm 
> getting pushback even though U-Boot is GPL and the license on our SDK is 
> BSD-like (i.e. do whatever you want but don't hold us responsible). It looks like 
> it used to be available but was taken down. I don't understand lawyers. All 
> of the code I wrote is GPL. There is some U-Boot specific code in our SDK, but 
> none was copied from U-Boot. There also is some duplication of functionality 
> between U-Boot and our SDK that I'll try and eliminate.
> 
> I have implemented just about every feature in U-Boot I could with our Octeon 
> SoC. That's another reason it's so large. Some customer always comes back and 
> says they want feature X to work. Fortunately, the changes to the U-Boot 
> supplied code are generally minimal, despite it being so large.
> 
> I likely will need to add some more hooks to board_f.c and board_r.c. I have 
> run into many cases where we need a specific order of initialization that does 
> not match the normal U-Boot order. Perhaps make init_sequence_f and 
> init_sequence_r weak so that they can be overridden if needed by a specific 
> board or architecture. While much of the current init order works,  we need 
> some things initialized as quickly as possible and others initialized later. 
> For example, the first thing we call is an early_errate_workaround function in 
> the init sequence before anything else is called. 
> 

I guess overriding the complete generic board init code is not
acceptable. It was once hard work to unify this. A hook like
early_errate_workaround() sounds reasonable but could also be called
from start.S before handing over to board_init_f(). But everything else
should fit into the existing init hooks. There are quite a lot.
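
A sketch of what I mean: the generic sequence stays a fixed,
NULL-terminated list of init functions (in the spirit of the lists in
board_f.c), and the platform only gets a weak early hook. The hook name is
the one from your mail; it does not exist in U-Boot today, and
run_init_sequence() plus the demo_* steps are hypothetical names for this
self-contained example. The weak attribute is GCC/Clang syntax.

```c
#include <stddef.h>

typedef int (*init_fn)(void);

/* Weak default that does nothing; a platform may supply a strong
 * definition with the same name to override it. */
__attribute__((weak)) int early_errate_workaround(void)
{
    return 0;
}

/* Run a NULL-terminated list of init functions in order and stop at
 * the first one that reports an error. */
static int run_init_sequence(const init_fn *seq)
{
    for (; *seq; seq++) {
        int ret = (*seq)();

        if (ret)
            return ret;
    }
    return 0;
}

/* Demo steps so the sketch can be exercised. */
static int demo_steps_run;

static int demo_step_ok(void)
{
    demo_steps_run++;
    return 0;
}

static int demo_step_fail(void)
{
    demo_steps_run++;
    return -5;
}
```

Here the hook is simply placed first in the list for demonstration; on
real hardware it could just as well be called from start.S.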

-- 
- Daniel


* [U-Boot] [EXT] Re: Cavium/Marvell Octeon Support
  2019-10-27  2:34   ` Aaron Williams
@ 2019-10-29 13:12     ` Wolfgang Denk
  2019-10-30 16:20     ` Daniel Schwierzeck
  1 sibling, 0 replies; 32+ messages in thread
From: Wolfgang Denk @ 2019-10-29 13:12 UTC (permalink / raw)
  To: u-boot

Dear Aaron,

In message <4176494.JIoP81OjG2@flash> you wrote:
>
> Actually the low level code is significantly different. First of all, we need 
> the U-Boot bootloader to be able to boot from different memory locations. 
> Because of this, we use mapped memory for U-Boot. A side effect of this is 
> that it eliminates the need for relocation when it is shifted to the top of 
> memory. All we need to do is just set a couple of TLB entries.
>
> The assembly code is significantly different and is far more extensive.
>
> Additionally, the way Octeon Linux is booted is different.
>
> The generic start.S is not usable in our case.

Please excuse my ignorance - I have never touched a Cavium system yet
(at least not knowingly), and never looked into any of that code.
So for me it would be really helpful if you would not only describe
what you have, or what you need, or that things are different or
cannot be used, but actually explain _why_ this is the case, and why
you cannot use the existing structure of U-Boot mainline code.

I know that it is always difficult to upstream code that has been
developed out of tree and without synchronizing the design with the
mainline maintainers, but as long as you don't explain why it was
mandatory to do things differently, it is impossible to understand if
this is the only sane way things can be implemented, or if you just
don't want to change the code that has grown over the years in an
uncontrolled way to avoid the efforts for cleaning it up.

> We have a significant amount of code for dealing with the cache and for things 
> like copying U-Boot from flash into the L2 cache. We also have to deal with 
> taking other cores out of reset in our start.S. Our exception handler has also 
> been extended to handle multiple cores.

We should be able to understand why you need this.  There might be
areas where your code overlaps with things that are already
available in U-Boot mainline, and if there are good reasons to
duplicate such areas, you should explain them.

Daniel already pointed out that doubling the code size of U-Boot by
adding just a single new CPU simply makes no sense.  I don't know
what you are using U-Boot for, but we should keep in mind that it's
a boot loader, whose main purpose should be to execute as fast as
possible just to be replaced by an operating system.

I have to admit that I have problems understanding why someone would
need hot plug support for hardware in U-Boot.

It would be best to restrict initial upstreaming to a minimal sub-set
that gives maintainers even a chance to review it.

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd at denx.de
Alliance: In international politics, the union  of  two  thieves  who
have  their hands so deeply inserted in each other's pocket that they
cannot separately plunder a third.                   - Ambrose Bierce


* [U-Boot] [EXT] Re: Cavium/Marvell Octeon Support
  2019-10-26 22:15   ` Tom Rini
@ 2019-10-27  3:08     ` Aaron Williams
  0 siblings, 0 replies; 32+ messages in thread
From: Aaron Williams @ 2019-10-27  3:08 UTC (permalink / raw)
  To: u-boot

On Saturday, October 26, 2019 3:15:36 PM PDT Tom Rini wrote:
> 
> On Fri, Oct 25, 2019 at 05:13:57PM +0200, Daniel Schwierzeck wrote:
> > Hi Aaron,
> > 
> > Am 23.10.19 um 05:50 schrieb Aaron Williams:
> > > Hi all,
> > > 
> > > I have been tasked with porting our Octeon U-Boot to the latest U-Boot
> > > and merging it upstream. This will involve a very significant amount of
> > > code that generally will not be compatible with other MIPS processors
> > > due to our needs and requirements. For example, the start.S will need to
> > > be completely different than what is present. For example, our existing
> > > start.S is 3577 lines of code in order to deal with things like RAS,
> > > exceptions, virtual memory and more. We need to use virtual memory since
> > > U-Boot can be loaded at any 4MB boundary in memory, not just 0xbfc00000.
> > > A number of drivers will need to be updated in order to properly map
> > > pointers to physical addresses. This is needed anyway, since I see
> > > numerous drivers that assume that a pointer is a DMA address. For MIPS
> > > this is never the case (I'm looking at XHCI).
> > 
> > Good to see some progress in mainline Octeon support. Could you briefly
> > describe the differences and commonalities in booting an Octeon CPU
> > compared to other "generic" MIPS cores? Or could you point me to a
> > public Git tree? It can't be that different because Linux kernel is also
> > able to share most of the code ;)
> > 
> > In principle you could compile an own start.S in your mach-octeon
> > directory, but you should try to use the generic start.S which is
> > already customisable and extensible. If needed, we could add more
> > extension points to it. Booting from any custom memory address is
> > already supported and very common for other MIPS based SoC's. Exception
> > support is also already there.
> > 
> > > The new Octeon U-Boot will be native 64-bit instead of how the earlier
> > > one was 32-bit using the N32 ABI (so 64-bit addresses could be
> > > accessed). We had to jump through some hoops to make a 32-bit U-Boot
> > > fully support 64-bit hardware.
> > 
> > We have 64 bit support for MIPS. I even sync'ed the asm/io stuff from
> > Linux in the past (which includes support for Octeon) so that you would
> > be able to use the standard IO primitives and ioremap stuff and hook in
> > your platform-specific memory mappings.
> > 
> > > I think we can shrink the code by removing support for starting "simple
> > > executive" tasks. Simple executive tasks are bare metal applications
> > > that can run on dedicated cores beside Linux (or without Linux). I will
> > > also not be porting any support for anything older than Octeon3.
> > > 
> > > We also make heavy use of our SDK in order to perform hardware
> > > initialization and networking. In our old U-Boot, we have almost 900K
> > > lines of code. I can cut out much of this but much will remain.
> > > 
> > > We also have added extensive infrastructure for handling SFP and QSFP
> > > cables as well as very extensive phy support for phys from
> > > Aquantia/Marvell, Vitesse/Microsemi, Inphi/Cortina and an Avago gearbox.
> > > Our customer wants us to port all of this to the new U-Boot and upstream
> > > it. I'm worried about the sheer amount of code since it is absolutely
> > > massive.
> > 
> > Maybe you should cut down your customer's expectations a bit. According
> > to sloccount we currently have 1.6M SLOC for the whole U-Boot. I guess
> > Tom or Wolfgang wouldn't agree with adding another 900k only for one
> > CPU. Actually what should be upstream is the basic CPU, driver and board
> > support to be able to boot a mainline kernel. Everything else like
> > custom bare metal applications or the SFP/PHY handling stuff mentioned
> > below could also be maintained in a downstream tree. Maybe Wolfgang is
> > willing to host one on gitlab.denx.de.
> > 
> > > Some of these phy drivers are extremely complex and need to tie
> > > into the SFP management. We also need to use a background polling thread
> > > while at the command prompt. A fair bit of our phy code is not in the
> > > normal phy drivers because it did not fit the model. Some of these phy
> > > drivers need to interact with the SFP support code in order to handle
> > > hot plug events in order to reconfigure themselves based on the cable
> > > type. The existing SFP code handles everything from SFP to SFP28 as well
> > > as QSFP and 100G QSFP (never tested).
> > > 
> > > In the old U-Boot the PHY support had to be significantly enhanced due
> > > to requirements for hot-plugging and how some of the PHYs are
> > > configured. It gets quite complicated with phys like the Inphi where one
> > > phy can handle either four ports (XFI/SGMII) or a single 4-lane port
> > > (XLAUI). It gets even worse since in some boards we use reclocking chips
> > > and there is one chip that handles the receive path of a QSFP and
> > > another that handles the transmit path. Further complicating things,
> > > with a QSFP it can be treated either as XLAUI or as four XFI ports, so
> > > you can have four ports spread across two chips, with each port using
> > > different slices of each chip. In the case of the Inphi/Cortina chip, a
> > > single device can handle one or four ports based on the configuration
> > > and it is configured by "slice" which is basically an offset into the
> > > MDIO register space. We had to jump through hoops in order to have this
> > > stuff work in a sane way in the device tree. We added entries for SFP
> > > and QSFP slots in the device tree which point to the MACs, GPIOs and I2C
> > > bus because pointing them to the phys just got too insane. This will
> > > need to be ported to the new U-Boot. It should not break the existing
> > > support since most of it was implemented outside of the core PHY
> > > handling code. In the port, it would be far better if this could be
> > > integrated in. The SFP management code is architecture agnostic as is
> > > all of the PHY support. The callbacks for the SFP support are used by
> > > the MAC which then notifies the PHY since the MAC often needs to
> > > reconfigure itself. It can handle some crazy configurations.
> > > 
> > > While I see some phy drivers that we also support, i.e. Cortina, our
> > > drivers tend to have a lot more functionality. For example, all of our
> > > phy drivers that support firmware support commands for upgrading the
> > > firmware as well as things like cable testing and other features.
> > 
> > PHY drivers and ethernet drivers should be really reduced to the
> > required functionality to enable basic networking like Ping, DHCP, TFTP.
> > U-Boot is still "just" a bootloader and not a system management tool ;)
> > You should do that stuff either in Linux or in a downstream fork.
> > 
> > > Our bootloader needs to be able to be booted from a variety of sources,
> > > including SPI, eMMC, NOR flash and booting over the PCI bus from a host
> > > system. This is one reason we use virtual memory. The other reason is
> > > that it eliminates the need to perform relocation. Our start.S code
> > > handles all of these different cases as well as exception handling.
> > 
> > This is already supported for MIPS. You should try to use the generic
> > SPL framework for that. Whether you like the relocation or not, it's one
> > of the basic design principles of U-Boot. I guess it likely won't be
> > accepted if you circumvent this. In fact by now we're sharing the same
> > technology as Linux to have relocatable binaries without using gcc's
> > -fPIC or -mabicalls to reduce the binary footprint. You can configure
> > gd->ram_top to any address of your liking as reference address for the
> > relocation.
> > 
> > > I will also say up front that the memory initialization code is a mess
> > > and quite large (it was written by a hardware engineer who never heard
> > > of functions).
> > > 
> > > One thing is that this will break mips unless it is refactored like ARM
> > > is, for example, separating armv7 and armv8. This way we could have
> > > arch/mips/cpu/octeon. I did this with the old bootloader to separate our
> > > stuff. I'm open to suggestions as for the naming. I don't see how we can
> > > share much of the code with the other MIPS CPUs.
> > 
> > We have the same mach directory handling as in Linux MIPS. So you could
> > easily add all your platform specific code (except drivers) to
> > arch/mips/mach-octeon or (-cavium). Inside that directory you can have
> > an include directory for your custom header files, you can even override
> > the generic files from arch/mips/include like in Linux. arch/mips/cpu
> > and arch/mips/lib should only contain generic code. As already mentioned
> > you could provide an own start.S inside arch/mips/mach-octeon but if
> > possible you should try to reuse or extend the generic variant.
> > 
> > > All in all, I think the final port will add between 500K-1M lines of
> > > code for the Octeon CPU. It is much more extensive than what is required
> > > for OcteonTX since in the latter case most of the hardware
> > > initialization is done by earlier stage bootloaders and the ATF handles
> > > things like SFP port management and many of the networking operations.
> > > 
> > > I'm not sure how well I'll be able to upstream all of this code at this
> > > point since I was just handed this task. We already have at least 1M
> > > lines of code added to the old U-Boot which is based off of 2013.08 with
> > > a lot of backports.
> 
> Daniel makes a lot of good points and I defer to him on general MIPS
> questions.  What I do want to add is that it's a good idea to start by
> focusing on the minimum needs to be able to boot Linux and aim for a
> medium term goal of having enough upstream that all of the other things
> that can live downstream, as Daniel suggests, be applied in your
> internal tree and work over time to minimize that delta, either by
> re-evaluating use-cases or submitting more code upstream.

This is my goal. Unfortunately, getting it to this point requires that most of
the stuff works. I'll start on the "simpler" boards like the one the customer
requires we support first, though there's not much simple about it. It
requires the full networking support, SFP management and one of the more
complex phys (and a custom one at that). Booting Linux also requires a lot of
stuff to work, including our custom command for booting Linux and all the code
to bring cores out of reset and initialize them, at least for the current
Linux kernel. Hopefully we can move away from this, but we will still need to
support the current stuff. I think much of our existing code can be used and
cleaned up. We had to jump through some hoops because our current U-Boot is
32-bit but we're dealing with a 64-bit environment, so moving to 64-bit allows
some code to be cleaned up and simplified. Even though the current U-Boot is
32-bit, it can still natively perform 64-bit addressing using the N32 ABI.

The required networking and initialization code alone is massive, and that's 
just for ping, dhcp and tftp! The Linux code is much smaller because U-Boot 
needs to do all the low-level hardware initialization first. Fortunately I've 
generally been fairly strict at following the U-Boot coding standard (such as
it was) and tried to keep the code fairly modular. I can move a few drivers
out of the arch section and into the driver section. It's also generally well 
commented (which leads to some of the size).

I'll basically strip out all the support for earlier Octeon devices, which will
help some. Unfortunately, most of the current code is for Octeon3.

My goal is to re-use as much existing U-Boot code as possible and make the 
smallest impact on it as I can. There are a handful of changes I will need to 
make to the U-Boot core code, but most of these are generally quite minor.

--Aaron


* [U-Boot] [EXT] Re: Cavium/Marvell Octeon Support
  2019-10-25 15:13 ` Daniel Schwierzeck
  2019-10-26 22:15   ` Tom Rini
@ 2019-10-27  2:34   ` Aaron Williams
  2019-10-29 13:12     ` Wolfgang Denk
  2019-10-30 16:20     ` Daniel Schwierzeck
  1 sibling, 2 replies; 32+ messages in thread
From: Aaron Williams @ 2019-10-27  2:34 UTC (permalink / raw)
  To: u-boot

Hi Daniel,

On Friday, October 25, 2019 8:13:57 AM PDT Daniel Schwierzeck wrote:
> Hi Aaron,
> 
> Am 23.10.19 um 05:50 schrieb Aaron Williams:
> > Hi all,
> > 
> > I have been tasked with porting our Octeon U-Boot to the latest U-Boot
> > and merging it upstream. This will involve a very significant amount of
> > code that generally will not be compatible with other MIPS processors
> > due to our needs and requirements. For example, the start.S will need to
> > be completely different than what is present. For example, our existing
> > start.S is 3577 lines of code in order to deal with things like RAS,
> > exceptions, virtual memory and more. We need to use virtual memory since
> > U-Boot can be loaded at any 4MB boundary in memory, not just 0xbfc00000.
> > A number of drivers will need to be updated in order to properly map
> > pointers to physical addresses. This is needed anyway, since I see
> > numerous drivers that assume that a pointer is a DMA address. For MIPS
> > this is never the case (I'm looking at XHCI).
> 
> Good to see some progress in mainline Octeon support. Could you briefly
> describe the differences and commonalities in booting an Octeon CPU
> compared to other "generic" MIPS cores? Or could you point me to a
> public Git tree? It can't be that different because Linux kernel is also
> able to share most of the code ;)
> 

Actually the low level code is significantly different. First of all, we need 
the U-Boot bootloader to be able to boot from different memory locations. 
Because of this, we use mapped memory for U-Boot. A side effect of this is 
that it eliminates the need for relocation when it is shifted to the top of 
memory. All we need to do is just set a couple of TLB entries.

The assembly code is significantly different and is far more extensive.

Additionally, the way Octeon Linux is booted is different.

The generic start.S is not usable in our case.

We have a significant amount of code for dealing with the cache and for things 
like copying U-Boot from flash into the L2 cache. We also have to deal with 
taking other cores out of reset in our start.S. Our exception handler has also 
been extended to handle multiple cores.

Some other things we have included are a native API that allows Simple 
Executive applications to make calls into U-Boot for such things as 
environment variable access as well as access to block devices and 
filesystems.


We used to have our Octeon SDK available for download but it seems this has
been taken down :( I'm trying to find out how I can make it available, but I'm
getting pushback on sharing our U-Boot even though it is GPL.

> In principle you could compile an own start.S in your mach-octeon
> directory, but you should try to use the generic start.S which is
> already customisable and extensible. If needed, we could add more
> extension points to it. Booting from any custom memory address is
> already supported and very common for other MIPS based SoC's. Exception
> support is also already there.
> 

The bootloader needs to be able to start from multiple memory locations 
without recompiling. Our existing bootloader can run from any 4MB boundary 
without recompiling or relocation. It can start out of flash (from any sector 
boundary, not just 0) or L2 cache. Starting from L2 cache is supported by the
eMMC, SPI and PCI target bootloaders. Additionally, the same bootloader can be
started from RAM, such as when the failsafe bootloader starts the main
bootloader. In most cases, the failsafe is the same full-featured bootloader
since it fits entirely within the L2 cache. Our only requirements for the
bootloader are that it fits in the L2 cache (not strictly needed when booting
from flash, though still preferred for speed) and that it remains under 4 MiB
in size.

I believe our exception handling is more extensive than the standard U-Boot 
exception handler. It includes the stack output as well as numerous COP0 
registers and decoding the cause of the exception. The exception handler is 
also independent of a working C environment. We also need to handle exceptions 
occurring on multiple cores as they're brought out of reset and not all cases 
are exceptions. Cores are first powered on and kept in a halted state, then 
later when we start the Linux kernel or simple executive applications, the 
exception handler is updated (via a boot bus movable memory region) and an
NMI is generated for the cores where they will begin executing code out of 
start.S before moving to the code that sets up the environment for booting 
Linux and/or simple executive applications. In the latter case, TLB entries 
are programmed in for each core.

> > The new Octeon U-Boot will be native 64-bit instead of how the earlier
> > one was 32-bit using the N32 ABI (so 64-bit addresses could be
> > accessed). We had to jump through some hoops to make a 32-bit U-Boot
> > fully support 64-bit hardware.
> 
> We have 64 bit support for MIPS. I even sync'ed the asm/io stuff from
> Linux in the past (which includes support for Octeon) so that you would
> be able to use the standard IO primitives and ioremap stuff and hook in
> your platform-specific memory mappings.
> 
That is good to know. What I have run into is the fact that many drivers do
not support I/O remapping. For example, XHCI assumes that a pointer is a DMA
address. Also, does the 64-bit support handle multiple cores in U-Boot?

I agree about using the standard ioremap stuff. I'm only pointing out that 
there are places where it is missing in the common U-Boot code. Where it is 
present, there won't be any issues since traditionally I used those methods to 
call our platform specific remapping. I will look to see what is present and 
if it will work or not.
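
To illustrate why a pointer cannot be handed to a device as-is on MIPS: in
the 32-bit unmapped segments, KSEG0 and KSEG1 both alias the low 512 MiB of
physical memory, so the physical address is the virtual address with the top
three bits cleared. The sketch below covers only that case. The helper name
kseg_to_phys is hypothetical, and addresses in TLB-mapped segments (or 64-bit
XKPHYS) need a different translation. In U-Boot a driver would normally call
virt_to_phys() from asm/io.h rather than open-code this.

```c
#include <stdint.h>

#define KSEG0_BASE 0x80000000u	/* unmapped, cached */
#define KSEG1_BASE 0xa0000000u	/* unmapped, uncached */
#define KSEG2_BASE 0xc0000000u	/* TLB-mapped from here up */

/*
 * Hypothetical helper: translate a KSEG0/KSEG1 virtual address to the
 * physical address a DMA engine would have to be given.
 * Returns 0 on success, -1 if the address is not in an unmapped segment.
 */
int kseg_to_phys(uint32_t vaddr, uint32_t *paddr)
{
	if (vaddr < KSEG0_BASE || vaddr >= KSEG2_BASE)
		return -1;

	/* Both segments are 512 MiB windows onto physical 0x00000000. */
	*paddr = vaddr & 0x1fffffffu;
	return 0;
}
```

For example, the reset vector 0xbfc00000 (KSEG1) translates to physical
0x1fc00000, while an address at or above 0xc0000000 is rejected because only
the TLB knows where it points.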

> > I think we can shrink the code by removing support for starting "simple
> > executive" tasks. Simple executive tasks are bare metal applications
> > that can run on dedicated cores beside Linux (or without Linux). I will
> > also not be porting any support for anything older than Octeon3.
> > 
> > We also make heavy use of our SDK in order to perform hardware
> > initialization and networking. In our old U-Boot, we have almost 900K
> > lines of code. I can cut out much of this but much will remain.
> > 
> > We also have added extensive infrastructure for handling SFP and QSFP
> > cables as well as very extensive phy support for phys from
> > Aquantia/Marvell, Vitesse/Microsemi, Inphi/Cortina and an Avago gearbox.
> > Our customer wants us to port all of this to the new U-Boot and upstream
> > it. I'm worried about the sheer amount of code since it is absolutely
> > massive.
> 
> Maybe you should cut down your customer's expectations a bit. According
> to sloccount we currently have 1.6M SLOC for the whole U-Boot. I guess
> Tom or Wolfgang wouldn't agree with adding another 900k only for one
> CPU. Actually what should be upstream is the basic CPU, driver and board
> support to be able to boot a mainline kernel. Everything else like
> custom bare metal applications or the SFP/PHY handling stuff mentioned
> below could also be maintained in a downstream tree. Maybe Wolfgang is
> willing to host one on gitlab.denx.de.
> 

I will try and cut it down. Much of the code is register definitions. The 
register definition files are auto-generated and tend to be huge. They're 
fully commented and include both big and little endian bitfields. In this case
I can do what I did for OcteonTX and modify the scripts that generate these
headers to strip out the little-endian bitfields and the comments. There is a
huge amount of code for configuring our QLM hardware interfaces. We also have
a lot of code for SFP/QSFP ports.

There are some other huge files that can also be eliminated by dropping 
support for Octeon II and earlier. The error handling files are massive for 
those chips.

Much of the rest can be shrunk somewhat, but a lot of that code is still 
required.

There is a huge amount of code for dealing with our quad-lane modules (QLMs). 
The QLMs can be configured to run in a variety of modes, from PCIe, SGMII, 
SATA, XLAUI, XFI, Interlaken, SVRIO, QSGMII, XAUI, RXAUI and more. There is a 
lot of tuning and configuration code needed in order to handle different 
clocks, equalization, gain, AGC and a whole host of other serdes issues.

The MAC code is also quite large and complex since there are many coprocessors 
that must be configured. These chips are designed as network processors. While 
it makes their networking quite powerful and fast, it also means that a lot of 
programming is needed before they will work. There are input parser engines, 
buffer management engines, queueing engines, output engines and more that must 
be fully configured before any packets can be sent or received.

There is a fair bit of code used to bring additional cores out of reset. In 
our biggest configuration, there can be two Octeon CN78XX chips connected in 
tandem where each chip has 48 cores. In this case there is a lot of tuning 
that needs to happen with the lanes connecting the two chips before this 
configuration works reliably. There is a tuning process that is required to 
run on both sides (and the second chip runs a small binary image as well to 
perform its half of the tuning).

I do not know if this will change or not but the way the Linux kernel is 
booted on Octeon is not compatible with the standard boot commands. Part of 
this is due to the fact that Linux can be run in parallel with Simple 
Executive applications. It's even possible to run two copies of Linux 
simultaneously on different cores. To go along with this, there is also a 
mechanism with named memory blocks that is used. When bringing cores out of
reset for SE applications, the TLB entries need to be configured. There also is a
fair bit of code dealing with core masks when choosing which cores are used 
for what.
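
As a rough illustration of the core mask bookkeeping mentioned above (not
code from the Octeon tree), the sketch below treats a mask as one bit per
core and checks that the cores given to Linux and to Simple Executive
applications do not overlap. The type and function names are hypothetical,
and a single 64-bit word is an assumption that only covers one 48-core chip,
not the two-chip configuration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: one bit per core, enough for a single 48-core chip. */
typedef uint64_t coremask_t;

/* Build a mask of 'count' consecutive cores starting at core 'first'. */
coremask_t coremask_range(unsigned int first, unsigned int count)
{
	coremask_t m;

	if (first >= 64 || count == 0)
		return 0;
	m = (count >= 64) ? ~0ull : ((1ull << count) - 1);
	return m << first;
}

/* True if any core appears in both masks. */
bool coremask_overlaps(coremask_t a, coremask_t b)
{
	return (a & b) != 0;
}

/* Number of cores selected by the mask. */
unsigned int coremask_count(coremask_t m)
{
	unsigned int n = 0;

	for (; m; m &= m - 1)	/* clear the lowest set bit each pass */
		n++;
	return n;
}
```

With this, giving cores 0-3 to Linux and cores 4-47 to SE applications yields
two disjoint masks whose union covers all 48 cores.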

We also have a named memory block feature which is used by Linux and simple 
executive applications where blocks of memory can be carved up. U-Boot needs 
to tie into this.

There are also numerous other I/O interfaces that we need to initialize.
Unfortunately we also have some errata we need to work around, and a few are
non-trivial.

The DRAM initialization code is also massive.  It handles DDR3 and DDR4 for 
both registered and unregistered memory with ECC.

In many cases, the reason for the size of the code is due to the complexity of 
the SoC and the platforms built around it. You can think of CN78XX as being 
more like an enterprise-class server than a simple embedded device. The CN73XX 
is not too far behind the CN78XX. The only reason our Octeon TX2 U-Boot is so
much smaller is that most of the early initialization takes place before
U-Boot is started and that a lot of the networking support (such as SFP
management and PHY support) is handled by ATF as well as on-chip management
cores. This is necessary because Linux does not have any SFP management
support, nor can it handle the complex topologies we're frequently running into
today. The requirements of Red Hat also preclude any additional software being
installed in order for the networking support to run.

One thing I may need to re-introduce to U-Boot is the temperature sensor 
support for devices like this, since thermal monitoring is important.

Some boards require a background task to perform periodic monitoring for 
certain events, including the board that needs to be upstreamed. I haven't 
checked if anything is available now, but what I did in the past was hook into 
the input function and while waiting for input it calls a user-defined polling 
function.
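
A minimal sketch of that kind of hook, assuming a board registers one polling
callback and the console's wait loop calls it on every pass. All names here
are hypothetical and this is not the actual Octeon or U-Boot code. The
count_poll and ready_after helpers exist only to exercise the loop.

```c
#include <stddef.h>

typedef void (*poll_fn)(void *ctx);
typedef int (*input_ready_fn)(void *ctx);

static poll_fn board_poll;
static void *board_poll_ctx;

/* Register the board's background polling function (NULL disables it). */
void board_poll_register(poll_fn fn, void *ctx)
{
	board_poll = fn;
	board_poll_ctx = ctx;
}

/*
 * Wait until 'ready' reports input, running the board hook on each pass.
 * Returns the number of passes spent waiting.
 */
unsigned int wait_for_input(input_ready_fn ready, void *ready_ctx)
{
	unsigned int passes = 0;

	while (!ready(ready_ctx)) {
		if (board_poll)
			board_poll(board_poll_ctx);
		passes++;
	}
	return passes;
}

/* Illustration only: a hook that counts how often it was called. */
void count_poll(void *ctx)
{
	(*(unsigned int *)ctx)++;
}

/* Illustration only: input becomes ready after *ctx unsuccessful checks. */
int ready_after(void *ctx)
{
	unsigned int *remaining = ctx;

	if (*remaining == 0)
		return 1;
	(*remaining)--;
	return 0;
}
```

The hook has to return quickly, since the prompt is unresponsive while it
runs. That is also why interrupt support would make the job easier.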

If interrupts are supported it makes the polling job easier.

> > Some of these phy drivers are extremely complex and need to tie
> > into the SFP management. We also need to use a background polling thread
> > while at the command prompt. A fair bit of our phy code is not in the
> > normal phy drivers because it did not fit the model. Some of these phy
> > drivers need to interact with the SFP support code in order to handle
> > hot plug events in order to reconfigure themselves based on the cable
> > type. The existing SFP code handles everything from SFP to SFP28 as well
> > as QSFP and 100G QSFP (never tested).
> > 
> > In the old U-Boot the PHY support had to be significantly enhanced due
> > to requirements for hot-plugging and how some of the PHYs are
> > configured. It gets quite complicated with phys like the Inphi where one
> > phy can handle either four ports (XFI/SGMII) or a single 4-lane port
> > (XLAUI). It gets even worse since in some boards we use reclocking chips
> > and there is one chip that handles the receive path of a QSFP and
> > another that handles the transmit path. Further complicating things,
> > with a QSFP it can be treated either as XLAUI or as four XFI ports, so
> > you can have four ports spread across two chips, with each port using
> > different slices of each chip. In the case of the Inphi/Cortina chip, a
> > single device can handle one or four ports based on the configuration
> > and it is configured by "slice" which is basically an offset into the
> > MDIO register space. We had to jump through hoops in order to have this
> > stuff work in a sane way in the device tree. We added entries for SFP
> > and QSFP slots in the device tree which point to the MACs, GPIOs and I2C
> > bus because pointing them to the phys just got too insane. This will
> > need to be ported to the new U-Boot. It should not break the existing
> > support since most of it was implemented outside of the core PHY
> > handling code. In the port, it would be far better if this could be
> > integrated in. The SFP management code is architecture agnostic as is
> > all of the PHY support. The callbacks for the SFP support are used by
> > the MAC which then notifies the PHY since the MAC often needs to
> > reconfigure itself. It can handle some crazy configurations.
> > 
> > While I see some phy drivers that we also support, i.e. Cortina, our
> > drivers tend to have a lot more functionality. For example, all of our
> > phy drivers that support firmware support commands for upgrading the
> > firmware as well as things like cable testing and other features.
> 
> PHY drivers and ethernet drivers should be really reduced to the
> required functionality to enable basic networking like Ping, DHCP, TFTP.
> U-Boot is still "just" a bootloader and not a system management tool ;)
> You should do that stuff either in Linux or in a downstream fork.
> 

This is the case for the most part. Unfortunately, many of these drivers 
require a lot of code and some require frequent monitoring to make 
adjustments. The SFP support is required to monitor what cable type is plugged 
in and to reprogram the phy as needed based on the type of cable. The 10G and 
25G phys need different settings for optical/active vs passive copper vs SFP 
connectors. In addition, some require different settings based on the cable 
length and in some cases exceptions are needed for certain modules (there are 
a series of Avago SFP to Gigabit modules that require autonegotiation to be 
disabled in 1000Base-X mode). In at least one case there needs to be frequent 
polling to make adjustments (25G) as the equalization settings can change 
based on temperature. The SFP management code identifies the type of cable 
connected and its parameters so that the phy driver can adjust the appropriate 
settings. The SFP management code is generic and not tied to any one type of 
phy or MAC or brand of module. It also monitors all of the GPIO pins and will 
make callbacks when needed. Many phys lack the support for doing this
themselves. Phys I have worked with that need this support include
Cortina/Inphi and several Microsemi/Vitesse devices.

The Inphi devices will typically handle four XFI lanes with four
bidirectional slices, with each slice given a different register range. Further
complicating matters is that a QSFP port can either be four XFI interfaces or 
a single XLAUI interface. We have code to update the firmware for the Inphi 
chips, but this is small compared to the rest of the initialization code. 
These chips require that equalization and gain be configured on each slice 
based on the board and cable characteristics as well as LED configuration.
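
The per-slice register ranges can be pictured as a fixed stride added to a
slice-relative register number. The sketch below is only an illustration of
that idea: the structure, the function name and the stride value used in the
example are hypothetical, and the real offsets would come from the datasheet
or the device tree.

```c
#include <stdint.h>

#define PHY_NUM_SLICES 4	/* four slices per device, as described above */

/* Hypothetical: distance in registers between two consecutive slices. */
struct slice_map {
	uint16_t stride;
};

/*
 * Compute the absolute MDIO register for 'reg' within 'slice'.
 * Returns 0 on success, -1 if the slice number is out of range.
 */
int slice_reg(const struct slice_map *map, unsigned int slice,
	      uint16_t reg, uint16_t *out)
{
	if (slice >= PHY_NUM_SLICES)
		return -1;

	*out = (uint16_t)(slice * map->stride + reg);
	return 0;
}
```

With a made-up stride of 0x1000, register 0x10 of slice 2 would land at
0x2010. A driver written this way can serve one XLAUI port or four XFI ports
with the same code, just by choosing which slices it touches.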

With the Microsemi reclocking chips, each chip has four unidirectional lanes. 
For a QSFP port, two chips are required with one chip configured for ingress 
and the other for egress. This can support either XLAUI or four XFI
interfaces. When it is configured as four XFI interfaces, four MACs share the
two chips, with each MAC going to one lane on each chip.
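
A small sketch of that two-chip arrangement, under the assumption that XFI
port n uses lane n on both the ingress and the egress chip. That one-to-one
lane numbering, and every name below, is hypothetical; on real boards the
mapping is described in the device tree.

```c
#include <stdint.h>

enum {
	RECLOCK_INGRESS_CHIP = 0,	/* receive path */
	RECLOCK_EGRESS_CHIP = 1,	/* transmit path */
	RECLOCK_LANES = 4,		/* unidirectional lanes per chip */
};

struct lane_ref {
	uint8_t chip;
	uint8_t lane;
};

struct port_lanes {
	struct lane_ref rx;
	struct lane_ref tx;
};

/*
 * Look up which lane on which chip carries each direction of an XFI port.
 * Returns 0 on success, -1 if the port number is out of range.
 */
int xfi_port_lanes(unsigned int port, struct port_lanes *out)
{
	if (port >= RECLOCK_LANES)
		return -1;

	out->rx = (struct lane_ref){ RECLOCK_INGRESS_CHIP, (uint8_t)port };
	out->tx = (struct lane_ref){ RECLOCK_EGRESS_CHIP, (uint8_t)port };
	return 0;
}
```

The point of the structure is that a single port never owns a whole chip, so
any per-lane tuning has to be applied to two devices for one link.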

Also making things fun is that Inphi and the reclocking chips do not conform
to the Clause 45 standard at all. In the case of Inphi, the ID registers are
0.0 and 0.1 instead of 1.2 and 1.3 as they are in Clause 45.
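
One way a generic ID probe could accommodate this is to make the location of
the ID registers a per-device property. The sketch below is an illustration
only: the names are hypothetical, and it assumes the Inphi registers hold the
high and low halves in the same order as the standard pair, which the text
above does not state.

```c
#include <stdint.h>

/* A register written as <device>.<register>, e.g. "1.2". */
struct c45_reg {
	uint8_t devad;
	uint16_t reg;
};

enum phy_id_layout {
	PHY_ID_STANDARD,	/* Clause 45: ID in 1.2 and 1.3 */
	PHY_ID_INPHI,		/* per the text above: ID in 0.0 and 0.1 */
};

/* Return the register holding the high (word 0) or low (word 1) ID half. */
struct c45_reg phy_id_reg(enum phy_id_layout layout, unsigned int word)
{
	struct c45_reg r;

	if (layout == PHY_ID_INPHI) {
		r.devad = 0;
		r.reg = word ? 1 : 0;
	} else {
		r.devad = 1;
		r.reg = word ? 3 : 2;
	}
	return r;
}

/* Combine the two 16-bit halves into the usual 32-bit PHY ID. */
uint32_t phy_id_combine(uint16_t hi, uint16_t lo)
{
	return ((uint32_t)hi << 16) | lo;
}
```

A driver would read both registers returned by phy_id_reg() over its MDIO
bus and pass the values to phy_id_combine() before matching against its ID
table.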

The MAC drivers are also non-trivial. The Octeon chips are designed as network 
processors with a lot of hardware offloading and coprocessors. Bringing up a 
"simple Ethernet" interface is anything but simple. There are numerous offload 
engines that must be configured before it will work. While we do have one 
"simple" interface that can be configured, it often isn't because it's usually 
only good for a management port and many boards do not have this and the 
customers desire to be able to use any port.

Just configuring the interface between the MAC and PHY is also non-trivial. 
The Octeon (and later CPUs) have what are called "QLMs" or quad lane modules. 
These QLMs contain programmable serdes which can be configured for PCIe, SATA, 
XFI, XAUI, RXAUI, SGMII, 1000Base-X, XLAUI and a whole host of other interface 
types with a lot of tuning for things like equalization and clocks. The amount 
of QLM initialization code is quite large but necessary. There are a lot of 
clock and analog tuning parameters and sequences that must be run.

Sadly all of this is needed just for basic ping and DHCP. This isn't like a 
simple e1000 NIC or the NICs common with most SoCs.

Think of scaling from a Raspberry Pi to a dual-CPU Xeon enterprise-class
server with 96 cores and 256GiB of RAM with 10, 25 and 40GbE ports but without
a BMC or MCU to handle low-level board changes while also having many
enterprise-class requirements for RAS, etc. That is why our code is so large 
and complex. There are a lot of hardware engines for offloading a lot of tasks 
since the chips are often used in security appliances. There are engines for 
ZIP compression, hardware regex engines, packet ordering engines, packet 
parsing engines, buffer management engines, RAID engines and a whole host of 
others. Many are not used in U-Boot, but a fair number are required for basic 
packet I/O.

For example, one of the boxes contains a CN78XX with 8 10G ports (which can
also be configured as XLAUI, 4 to 1, using a QSFP to SFP+ splitter cable).
It has 128GiB of registered DDR4 DIMMs, 4 SATA drives, redundant power
supplies and a whole host of other things including multiple temperature 
monitors. This uses an Inphi/Cortina phy chip that requires full SFP 
management support. With Inphi phys, the phy cannot drive LEDs based on 
traffic since it has no concept of packets, especially in XLAUI mode since 
each lane is independent of the others.

Another board, one I specifically have been told to upstream, is a NIC that
contains a CN73XX and two 10G/25G ports that go through a complex gearbox
chip. Since there is no hardware support for LEDs in the Octeon SoC to
indicate link and packet I/O, this must be done in software (including in
U-Boot, a customer requirement) and SFP port management is also a must. The phy is not
at all a traditional phy. It uses i2c instead of MDIO and requires frequent 
monitoring of the link parameters (it's an older custom gearbox chip, there 
are newer and better chips that don't require this now). I have a hook while 
U-Boot is sitting at the prompt which allows for background tasks to operate 
while it's sitting.

I have several other NICs to support that use a Microsemi reclocking chip that 
has four unidirectional lanes per chip. The chip has zero intelligence and is 
shared between ports (and on some devices, multiple chips are shared between 
ports). Everything must be tuned based on the SFP/QSFP module type and cable 
length. LEDs also must be software driven. (The software driving of LEDs is 
eliminated in OcteonTX2). These chips have no way to drive the LEDs themselves 
to indicate packet I/O or link status.

There are also other boards that use the Microsemi reclocking chips. They were 
chosen in part due to the power budget and these chips are very low power (and 
inexpensive).

In all of these phy cases, all of the parameters are maintained in the device 
tree so the drivers are generic. Unfortunately these drivers also require SFP 
and QSFP management support.

I figure if there are several boards I need to upstream, it's not much more 
effort to port all of the boards to the new U-Boot. I've worked hard to 
minimize the board-specific code and make as much of it generic and based on 
the device tree as possible.

Someday I would love for SFP/QSFP infrastructure to get into Linux. Some NIC 
cards do it in their drivers, but I'd like to see generic infrastructure (like 
my U-Boot support). This might make it harder for some drivers to only support 
certain brands of modules too :) The generic code I wrote works with most 
modules except Intel (because they have bad checksums, but counterfeit Intel 
modules work fine!). It still can be expanded at some point since there is no 
support for module diagnostics other than identifying if it is present. Pretty 
much all it does is monitor the GPIO pins and parse and decode the EEPROM. The 
SFP code is generic enough such that any phy driver that needs it can easily 
hook into it.

> > Our bootloader needs to be able to be booted from a variety of sources,
> > including SPI, eMMC, NOR flash and booting over the PCI bus from a host
> > system. This is one reason we use virtual memory. The other reason is
> > that it eliminates the need to perform relocation. Our start.S code
> > handles all of these different cases as well as exception handling.
> 
> This is already supported for MIPS. You should try to use the generic
> SPL framework for that. Whether you like the relocation or not, it's one
> of the basic design principles of U-Boot. I guess it likely won't be
> accepted if you circumvent this. In fact by now we're sharing the same
> technology as Linux to have relocatable binaries without using gcc's
> -fPIC or -mabicalls to reduce the binary footprint. You can configure
> gd->ram_top to any address of your liking as reference address for the
> relocation.
> 

I will look into this. One other complication is the fact that we require both 
a failsafe as well as a default bootloader. With the older U-Boot we got 
around all of this by just using TLB entries to map U-Boot to always run in 
the same virtual address regardless of the physical address. It eliminated any 
need for -fPIC and helped keep the binary small. For our older bootloader, it 
always executes at 0xC0000000 regardless of where it sits in physical memory. 
Using virtual memory also helps keep U-Boot simple and small.
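
The idea can be modeled in a few lines of C. This is only a sketch of the address arithmetic, not the real TLB setup code: struct fixed_mapping and fixed_virt_to_phys are hypothetical names, and the only value taken from the description above is the 0xC0000000 virtual base.

```c
#include <assert.h>
#include <stdint.h>

/* Virtual address the image is always linked and run at. */
#define FIXED_VIRT_BASE 0xC0000000u

/* Hypothetical model of one fixed mapping set up by early boot code. */
struct fixed_mapping {
	uint64_t phys_base;	/* where the image actually sits */
	uint32_t size;		/* bytes covered by the mapping */
};

/* Translate a virtual address inside the mapping to its physical address. */
static uint64_t fixed_virt_to_phys(const struct fixed_mapping *m,
				   uint32_t vaddr)
{
	assert(vaddr >= FIXED_VIRT_BASE);
	assert(vaddr - FIXED_VIRT_BASE < m->size);
	return m->phys_base + (vaddr - FIXED_VIRT_BASE);
}
```

Only phys_base changes between boot sources. Every address the code itself uses stays the same, which is why no relocation pass or position-independent code is needed in this scheme.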

> > I will also say up front that the memory initialization code is a mess
> > and quite large (it was written by a hardware engineer who never heard
> > of functions).
> > 
> > One thing is that this will break mips unless it is refactored like ARM
> > is, for example, separating armv7 and armv8. This way we could have
> > arch/mips/cpu/octeon. I did this with the old bootloader to separate our
> > stuff. I'm open to suggestions as for the naming. I don't see how we can
> > share much of the code with the other MIPS CPUs.
> 
> We have the same mach directory handling as in Linux MIPS. So you could
> easily add all your platform specific code (except drivers) to
> arch/mips/mach-octeon or (-cavium). Inside that directory you can have
> an include directory for your custom header files, you can even override
> the generic files from arch/mips/include like in Linux. arch/mips/cpu
> and arch/mips/lib should only contain generic code. As already mentioned
> you could provide an own start.S inside arch/mips/mach-octeon but if
> possible you should try to reuse or extend the generic variant.
> 

We can't use the existing start.S. We have a lot of requirements that are not 
supported there as well as a fair bit of code dedicated to dealing with the 
cache and TLBs and bringing additional cores out of reset. We make use of a 
boot bus movable region in order to do this and handle other cases like NMIs 
and the watchdog. Our start.S currently sits at around 3800 lines of code. 
Some is common but most is not.

Our start.S is designed to be able to boot both a failsafe and non-failsafe 
image and supports adjusting the flash mapping in order to start from an 
offset other than zero in the flash. There is also a fair bit of code for 
copying the image out of flash into the L2 cache for a significant speedup for 
DRAM initialization. I'm trying to get permission to share our existing code 
but I'm getting push-back (even though it's GPL!?!). How they want me to 
upstream it without sharing the code is beyond me.

While U-Boot has an exception handler, I believe ours is more comprehensive. 
It is written entirely in assembler and is not dependent on a working C 
runtime environment. It also dumps more information than just the registers 
such as the stack and a number of other exception registers and does some 
exception decoding. It's quite a bit better than the ARMv8 exception handler 
IMHO.

Putting this under mach-octeon will make it much easier. I'll try and re-use 
where I can.

> > All in all, I think the final port will add between 500K-1M lines of
> > code for the Octeon CPU. It is much more extensive than what is required
> > for OcteonTX since in the latter case most of the hardware
> > initialization is done by earlier stage bootloaders and the ATF handles
> > things like SFP port management and many of the networking operations.
> > 
> > I'm not sure how well I'll be able to upstream all of this code at this
> > point since I was just handed this task. We already have at least 1M
> > lines of code added to the old U-Boot which is based off of 2013.08 with
> > a lot of backports.

I'm trying to get our existing code made available someplace online. I'm
getting pushback even though U-Boot is GPL and the license on our SDK is
BSD-like (i.e. do whatever you want but don't hold us responsible). It looks
like it used to be available but was taken down. I don't understand lawyers. All
of the code I wrote is GPL. There is some U-Boot specific code in our SDK, but 
none was copied from U-Boot. There also is some duplication of functionality 
between U-Boot and our SDK that I'll try and eliminate.

I have implemented just about every feature in U-Boot I could with our Octeon 
SoC. That's another reason it's so large. Some customer always comes back and 
says they want feature X to work. Fortunately, the changes to the U-Boot 
supplied code are generally minimal, despite it being so large.

I likely will need to add some more hooks to board_f.c and board_r.c. I have 
run into many cases where we need a specific order of initialization that does 
not match the normal U-Boot order. Perhaps make init_sequence_f and 
init_sequence_r weak so that they can be overridden if needed by a specific 
board or architecture. While much of the current init order works, we need
some things initialized as quickly as possible and others initialized later. 
For example, the first thing we call is an early_errate_workaround function in 
the init sequence before anything else is called. 
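
One possible shape for such a hook is sketched below. It is not how board_f.c is written today. It assumes a GCC/Clang toolchain that supports the weak attribute (U-Boot wraps this as __weak), and arch_init_sequence_f, run_init_list and the two generic steps are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*init_fnc_t)(void);

/* Records the order in which steps ran, for illustration only. */
static char init_log[8];
static size_t init_log_len;

static void log_step(char tag)
{
	assert(init_log_len < sizeof(init_log) - 1);
	init_log[init_log_len++] = tag;
}

/* Stand-ins for generic init steps. */
static int generic_cpu_init(void)   { log_step('c'); return 0; }
static int generic_timer_init(void) { log_step('t'); return 0; }

/*
 * Weak default. A board or architecture could provide a strong
 * definition with the same name in another file, for example one
 * whose first entry is an early errata workaround.
 */
__attribute__((weak)) const init_fnc_t *arch_init_sequence_f(void)
{
	static const init_fnc_t seq[] = {
		generic_cpu_init,
		generic_timer_init,
		NULL,
	};
	return seq;
}

/* Run a NULL-terminated list, stopping at the first failure. */
static int run_init_list(const init_fnc_t *seq)
{
	for (; *seq; seq++) {
		int ret = (*seq)();

		if (ret)
			return ret;
	}
	return 0;
}
```

With this layout the generic code would call run_init_list(arch_init_sequence_f()), and the linker picks the strong override when one exists.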

Regards,

-Aaron
