linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Jernej Škrabec" <jernej.skrabec@gmail.com>
To: "Ondřej Jirman <megous@megous.com>,
	Clément Péron" <peron.clem@gmail.com>,
	"Maxime Ripard" <maxime@cerno.tech>,
	"Rob Herring" <robh@kernel.org>,
	"Tomeu Vizoso" <tomeu.vizoso@collabora.com>,
	"Steven Price" <steven.price@arm.com>,
	"Alyssa Rosenzweig" <alyssa.rosenzweig@collabora.com>,
	"Viresh Kumar" <vireshk@kernel.org>, "Nishanth Menon" <nm@ti.com>,
	"Stephen Boyd" <sboyd@kernel.org>, "Chen-Yu Tsai" <wens@csie.org>,
	dri-devel <dri-devel@lists.freedesktop.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	linux-sunxi <linux-sunxi@googlegroups.com>
Subject: Re: [linux-sunxi] Re: [PATCH v2 13/14] [DO NOT MERGE] arm64: dts: allwinner: h6: Add GPU OPP table
Date: Fri, 28 Aug 2020 15:01:21 +0200	[thread overview]
Message-ID: <1857384.8dBkWBLq5R@jernej-laptop> (raw)
In-Reply-To: <20200828122119.eadup4aiohnqldam@core.my.home>

Dne petek, 28. avgust 2020 ob 14:21:19 CEST je Ondřej Jirman napisal(a):
> On Fri, Aug 28, 2020 at 02:16:36PM +0200, Clément Péron wrote:
> > Hi Maxime,
> > 
> > On Tue, 25 Aug 2020 at 15:35, Maxime Ripard <maxime@cerno.tech> wrote:
> > > Hi Clement,
> > > 
> > > On Mon, Aug 03, 2020 at 09:54:05AM +0200, Clément Péron wrote:
> > > > Hi Maxime and All,
> > > > 
> > > > On Sat, 4 Jul 2020 at 16:56, Clément Péron <peron.clem@gmail.com> 
wrote:
> > > > > Hi Maxime,
> > > > > 
> > > > > On Sat, 4 Jul 2020 at 14:13, Maxime Ripard <maxime@cerno.tech> 
wrote:
> > > > > > Hi,
> > > > > > 
> > > > > > On Sat, Jul 04, 2020 at 12:25:34PM +0200, Clément Péron wrote:
> > > > > > > Add an Operating Performance Points table for the GPU to
> > > > > > > enable Dynamic Voltage & Frequency Scaling on the H6.
> > > > > > > 
> > > > > > > The voltage range is set with minival voltage set to the target
> > > > > > > and the maximal voltage set to 1.2V. This allow DVFS framework
> > > > > > > to
> > > > > > > work properly on board with fixed regulator.
> > > > > > > 
> > > > > > > Signed-off-by: Clément Péron <peron.clem@gmail.com>
> > > > > > 
> > > > > > That patch seems reasonable, why shouldn't we merge it?
> > > > > 
> > > > > I didn't test it a lot and last time I did, some frequencies looked
> > > > > unstable. https://lore.kernel.org/patchwork/cover/1239739/
> > > > > 
> > > > > This series adds regulator support to Panfrost devfreq, I will send
> > > > > a
> > > > > new one if DVFS on the H6 GPU is stable.
> > > > > 
> > > > > I got this running glmark2 last time
> > > > > # glmark2-es2-drm
> > > > > =======================================================
> > > > > 
> > > > >     glmark2 2017.07
> > > > > 
> > > > > =======================================================
> > > > > 
> > > > >     OpenGL Information
> > > > >     GL_VENDOR:     Panfrost
> > > > >     GL_RENDERER:   Mali T720 (Panfrost)
> > > > >     GL_VERSION:    OpenGL ES 2.0 Mesa 20.0.5
> > > > > 
> > > > > =======================================================
> > > > > 
> > > > > [   93.550063] panfrost 1800000.gpu: GPU Fault 0x00000088 (UNKNOWN)
> > > > > at
> > > > > 0x0000000080117100
> > > > > [   94.045401] panfrost 1800000.gpu: gpu sched timeout, js=0,
> > > > > config=0x3700, status=0x8, head=0x21d6c00, tail=0x21d6c00,
> > > > > sched_job=00000000e3c2132f
> > > > > 
> > > > > [  328.871070] panfrost 1800000.gpu: Unhandled Page fault in AS0 at
> > > > > VA
> > > > > 0x0000000000000000
> > > > > [  328.871070] Reason: TODO
> > > > > [  328.871070] raw fault status: 0xAA0003C2
> > > > > [  328.871070] decoded fault status: SLAVE FAULT
> > > > > [  328.871070] exception type 0xC2: TRANSLATION_FAULT_LEVEL2
> > > > > [  328.871070] access type 0x3: WRITE
> > > > > [  328.871070] source id 0xAA00
> > > > > [  329.373327] panfrost 1800000.gpu: gpu sched timeout, js=1,
> > > > > config=0x3700, status=0x8, head=0xa1a4900, tail=0xa1a4900,
> > > > > sched_job=000000007ac31097
> > > > > [  329.386527] panfrost 1800000.gpu: js fault, js=0,
> > > > > status=DATA_INVALID_FAULT, head=0xa1a4c00, tail=0xa1a4c00
> > > > > [  329.396293] panfrost 1800000.gpu: gpu sched timeout, js=0,
> > > > > config=0x3700, status=0x58, head=0xa1a4c00, tail=0xa1a4c00,
> > > > > sched_job=0000000004c90381
> > > > > [  329.411521] panfrost 1800000.gpu: Unhandled Page fault in AS0 at
> > > > > VA
> > > > > 0x0000000000000000
> > > > > [  329.411521] Reason: TODO
> > > > > [  329.411521] raw fault status: 0xAA0003C2
> > > > > [  329.411521] decoded fault status: SLAVE FAULT
> > > > > [  329.411521] exception type 0xC2: TRANSLATION_FAULT_LEVEL2
> > > > > [  329.411521] access type 0x3: WRITE
> > > > > [  329.411521] source id 0xAA00
> > > > 
> > > > Just to keep a track of this issue.
> > > > 
> > > > Piotr Oniszczuk give more test and seems to be software related:
> > > > https://www.spinics.net/lists/dri-devel/msg264279.html
> > > > 
> > > > Ondrej gave a great explanation about a possible origin of this issue:
> > > > https://freenode.irclog.whitequark.org/linux-sunxi/2020-07-11
> > > > 
> > > > 20:12 <megi> looks like gpu pll on H6 is NKMP clock, and those are
> > > > implemented in such a way in mainline that they are prone to
> > > > overshooting the frequency during output divider reduction
> > > > 20:13 <megi> so disabling P divider may help
> > > > 20:13 <megi> or fixing the dividers
> > > > 20:14 <megi> and just allowing N to change
> > > > 20:22 <megi> hmm, I haven't looked at this for quite some time, but H6
> > > > BSP way of setting PLL factors actually makes the most sense out of
> > > > everything I've seen/tested so far
> > > > 20:23 <megi> it waits for lock not after setting NK factors, but after
> > > > reducing the M factor (pre-divider)
> > > > 20:24 <megi> I might as well re-run my CPU PLL tester with this
> > > > algorithm, to see if it fixes the lockups
> > > > 20:26 <megi> it makes sense to wait for PLL to stabilize "after"
> > > > changing all the factors that actually affect the VCO, and not just
> > > > some of them
> > > > 20:27 <megi> warpme_: ^
> > > > 20:28 <megi> it may be the same thing that plagues the CPU PLL rate
> > > > changes at runtime
> > > 
> > > I guess it's one of the bugs we never heard of...
> > > 
> > > It would be a good idea to test it on another platform (like Rockchip?)
> > > to rule out any driver issue?
> > > 
> > > What do you think?
> > 
> > I can't exclude a bug in the driver, but if it was the case LE
> > community or Panfrost maintainer would have heard of that.
> > 
> > Megi's explanations match what I observed.
> > NKMP drivers seem the perfect guilty here or maybe it's a combination of
> > both...
> > 
> > Jernej sent me this patch to test:
> > https://github.com/clementperon/linux/commit/56bde359beaf8e827ce53ede1fe4a
> > 0ad233cb79b But it didn't fix the issue, If someone want to have a look at
> > it :)
> Not sure how that patch is supposed to work, but it seems to apply
> all factors at once to me.

It's hackish but in essence, it is what vendor clock driver does. From A83T or 
so onwards, vendor driver uses "lock_mode = PLL_LOCK_NEW_MODE" and this hack 
tries to mimick that locking behaviour (good enough for tests).

And yes, this code still applies all factors at once, because vendor driver 
does that too on H6. Only case where vendor driver applies factors one by one 
is for V3s SoC.

Btw, only recently I noticed following block in h6 clk driver:
https://elixir.bootlin.com/linux/latest/source/drivers/clk/sunxi-ng/ccu-sun50i-h6.c#L1196

Other clk drivers don't have it. Maybe it has some influence in this matter.

Best regards,
Jernej

> 
> regards,
> 	o.
> 
> > Regards,
> > Clement
> > 
> > > Maxime





  reply	other threads:[~2020-08-28 13:01 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-04 10:25 [PATCH v2 00/14] Add regulator devfreq support to Panfrost Clément Péron
2020-07-04 10:25 ` [PATCH v2 01/14] drm/panfrost: avoid static declaration Clément Péron
2020-07-04 10:25 ` [PATCH v2 02/14] drm/panfrost: clean headers in devfreq Clément Péron
2020-07-04 10:25 ` [PATCH v2 03/14] drm/panfrost: don't use pfdevfreq.busy_count to know if hw is idle Clément Péron
2020-07-04 10:25 ` [PATCH v2 04/14] drm/panfrost: introduce panfrost_devfreq struct Clément Péron
2020-07-04 10:25 ` [PATCH v2 05/14] drm/panfrost: use spinlock instead of atomic Clément Péron
2020-07-04 10:25 ` [PATCH v2 06/14] drm/panfrost: properly handle error in probe Clément Péron
2020-07-04 10:25 ` [PATCH v2 07/14] drm/panfrost: rename error labels in device_init Clément Péron
2020-07-04 10:25 ` [PATCH v2 08/14] drm/panfrost: move devfreq_init()/fini() in device Clément Péron
2020-07-04 10:25 ` [PATCH v2 09/14] drm/panfrost: dynamically alloc regulators Clément Péron
2020-07-04 10:25 ` [PATCH v2 10/14] drm/panfrost: add regulators to devfreq Clément Péron
2020-07-04 10:25 ` [PATCH v2 11/14] arm64: defconfig: Enable devfreq cooling device Clément Péron
2020-07-04 10:25 ` [PATCH v2 12/14] arm64: dts: allwinner: h6: Add cooling map for GPU Clément Péron
2020-07-04 10:25 ` [PATCH v2 13/14] [DO NOT MERGE] arm64: dts: allwinner: h6: Add GPU OPP table Clément Péron
2020-07-04 12:13   ` Maxime Ripard
2020-07-04 14:56     ` Clément Péron
2020-08-03  7:54       ` Clément Péron
2020-08-24 13:11         ` Maxime Ripard
2020-08-28 12:16           ` Clément Péron
2020-08-28 12:21             ` Ondřej Jirman
2020-08-28 13:01               ` Jernej Škrabec [this message]
2020-07-04 10:25 ` [PATCH v2 14/14] [DO NOT MERGE] arm64: dts: allwinner: force GPU regulator to be always Clément Péron
2020-07-04 10:30   ` Clément Péron
2020-07-04 10:28 ` [PATCH v2 00/14] Add regulator devfreq support to Panfrost Clément Péron
2020-07-06 13:26 ` Alyssa Rosenzweig
2020-07-07  0:16 ` Ondřej Jirman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1857384.8dBkWBLq5R@jernej-laptop \
    --to=jernej.skrabec@gmail.com \
    --cc=alyssa.rosenzweig@collabora.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-sunxi@googlegroups.com \
    --cc=maxime@cerno.tech \
    --cc=nm@ti.com \
    --cc=peron.clem@gmail.com \
    --cc=robh@kernel.org \
    --cc=sboyd@kernel.org \
    --cc=steven.price@arm.com \
    --cc=tomeu.vizoso@collabora.com \
    --cc=vireshk@kernel.org \
    --cc=wens@csie.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).