All of lore.kernel.org
 help / color / mirror / Atom feed
From: Cristian Marussi <cristian.marussi@arm.com>
To: Florian Fainelli <f.fainelli@gmail.com>
Cc: linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, sudeep.holla@arm.com,
	james.quinlan@broadcom.com, Jonathan.Cameron@huawei.com,
	etienne.carriere@linaro.org, vincent.guittot@linaro.org,
	souvik.chakravarty@arm.com
Subject: Re: [PATCH 11/22] firmware: arm_scmi: Add SCMIv3.1 extended names protocols support
Date: Wed, 15 Jun 2022 18:32:31 +0100	[thread overview]
Message-ID: <YqoXgsbg6t46rZeI@e120937-lin> (raw)
In-Reply-To: <f194f9b6-578d-ae08-16c3-6d464da10452@gmail.com>

On Wed, Jun 15, 2022 at 10:19:53AM -0700, Florian Fainelli wrote:
> On 6/15/22 09:29, Cristian Marussi wrote:
> > On Wed, Jun 15, 2022 at 09:10:03AM -0700, Florian Fainelli wrote:
> > > On 6/15/22 02:40, Cristian Marussi wrote:
> > > > On Wed, Jun 15, 2022 at 09:18:03AM +0100, Cristian Marussi wrote:
> > > > > On Wed, Jun 15, 2022 at 05:45:11AM +0200, Florian Fainelli wrote:
> > > > > > 
> > > > > > 
> > > > > > On 3/30/2022 5:05 PM, Cristian Marussi wrote:
> > > > > > > Using the common protocol helper implementation add support for all new
> > > > > > > SCMIv3.1 extended names commands related to all protocols with the
> > > > > > > exception of SENSOR_AXIS_GET_NAME.
> > > > > > > 
> > > > > > > Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
> > > > > > 
> > > > > > This causes the following splat on a platform where regulators fail to
> > > > > > initialize:
> > > > > > 
> > > > > 
> > > > > Hi Florian,
> > > > > 
> > > > > thanks for the report.
> > > > > 
> > > > > It seems a memory error while allocating so it was not meant to be
> > > > > solved by the fixes, anyway, I've never seen this splat in my testing
> > > > > and at first sight I cannot see anything wrong in the devm_k* calls
> > > > > inside scmi_voltage_protocol_init...is there any particular config in
> > > > > your setup ?
> > > > > 
> > > > > Moreover, the WARNING line 5402 seems to match v5.19-rc1 and it has
> > > > > slightly changed with -rc-1, so I'll try rebasing on that at first and
> > > > > see if I can reproduce the issue locally.
> > > > > 
> > > > 
> > > > I just re-tested the series rebased on v519-rc1 plus fixes and I cannot
> > > > reproduce in my setup with a few (~9) good and bad voltage domains.
> > > > 
> > > > How many voltage domains are advertised by the platform in your setup ?
> > > 
> > > There are 11 voltage regulators on this platform, and of course, now that I
> > > am trying to reproduce the splat I reported I just cannot anymore... I will
> > > let you know if there is anything that needs to be done. Thanks for being
> > > responsive as usual!
> > 
> > ... you're welcome...
> > 
> > I'm trying to figure out where an abnormal mem request could happen...
> 
> I think the problem is/was with the number of voltage domains being reported
> which was way too big and passed directly, without any clamping to
> devm_kcalloc() resulting the splat indicating that the allocation was beyond
> the MAX_ORDER. The specification allows for up to 2^16 domains which would
> still be way too much to allocate using kmalloc() so we could/should
> consider vmalloc() here eventually?
> 

Yes I was suspicious about the same and I was starting to think about vmalloc
in general across all domain enumerations even if this is may not be the case
for your splat...

> In all likelihood though we probably won't find a system with 65k voltage
> domains.
> 
> > 
> > can you try adding this (for brutal debugging) when you try ?
> > (...just to rule out funny fw replies.... :D)
> 
> Sure, nothing weird coming out and it succeeded in enumerating all of the
> regulators, I smell a transient issue with our firmware implementation,
> maybe...
> 
> [    0.560544] arm-scmi brcm_scmi@0: num_returned:1  num_remaining:0
> [    0.560617] arm-scmi brcm_scmi@0: num_returned:1  num_remaining:0
> [    0.560673] arm-scmi brcm_scmi@0: num_returned:1  num_remaining:0
> [    0.560730] arm-scmi brcm_scmi@0: num_returned:1  num_remaining:0
> [    0.560881] arm-scmi brcm_scmi@0: num_returned:1  num_remaining:0
> [    0.560940] arm-scmi brcm_scmi@0: num_returned:1  num_remaining:0
> [    0.560996] arm-scmi brcm_scmi@0: num_returned:1  num_remaining:0
> [    0.561054] arm-scmi brcm_scmi@0: num_returned:1  num_remaining:0
> [    0.561110] arm-scmi brcm_scmi@0: num_returned:1  num_remaining:0
> [    0.561168] arm-scmi brcm_scmi@0: num_returned:1  num_remaining:0
> [    0.561225] arm-scmi brcm_scmi@0: num_returned:1  num_remaining:0
> [    0.561652] scmi-regulator scmi_dev.2: Regulator stb_vreg_2 registered
> for domain [2]
> [    0.561858] scmi-regulator scmi_dev.2: Regulator stb_vreg_3 registered
> for domain [3]
> [    0.562030] scmi-regulator scmi_dev.2: Regulator stb_vreg_4 registered
> for domain [4]
> [    0.562190] scmi-regulator scmi_dev.2: Regulator stb_vreg_5 registered
> for domain [5]
> [    0.564427] scmi-regulator scmi_dev.2: Regulator stb_vreg_6 registered
> for domain [6]
> [    0.564638] scmi-regulator scmi_dev.2: Regulator stb_vreg_7 registered
> for domain [7]
> [    0.564817] scmi-regulator scmi_dev.2: Regulator stb_vreg_8 registered
> for domain [8]
> [    0.565030] scmi-regulator scmi_dev.2: Regulator stb_vreg_9 registered
> for domain [9]
> [    0.565191] scmi-regulator scmi_dev.2: Regulator stb_vreg_10 registered
> for domain [10]
> 

Ok, this was another place where a reply carrying a consistent number of
non-segmented entries could trigger an jumbo-request to kmalloc...

..maybe this is where a transient fw issue can reply with a dramatic
numbers of possible (non-segmented) levels (num_returned+num_remaining)

(I filter out replies carrying descriptors for segmented voltage-levels
that does not come in triplets (returned:3  remaining:0) since they are out
of spec...but I just hit today on another fw sending such out of spec
reply for clocks and I will relax such req probably not to break too
many out-of-spec blobs out in the wild :P)

So let me know if got new splats...

Thanks,
Cristian


WARNING: multiple messages have this Message-ID (diff)
From: Cristian Marussi <cristian.marussi@arm.com>
To: Florian Fainelli <f.fainelli@gmail.com>
Cc: linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, sudeep.holla@arm.com,
	james.quinlan@broadcom.com, Jonathan.Cameron@huawei.com,
	etienne.carriere@linaro.org, vincent.guittot@linaro.org,
	souvik.chakravarty@arm.com
Subject: Re: [PATCH 11/22] firmware: arm_scmi: Add SCMIv3.1 extended names protocols support
Date: Wed, 15 Jun 2022 18:32:31 +0100	[thread overview]
Message-ID: <YqoXgsbg6t46rZeI@e120937-lin> (raw)
In-Reply-To: <f194f9b6-578d-ae08-16c3-6d464da10452@gmail.com>

On Wed, Jun 15, 2022 at 10:19:53AM -0700, Florian Fainelli wrote:
> On 6/15/22 09:29, Cristian Marussi wrote:
> > On Wed, Jun 15, 2022 at 09:10:03AM -0700, Florian Fainelli wrote:
> > > On 6/15/22 02:40, Cristian Marussi wrote:
> > > > On Wed, Jun 15, 2022 at 09:18:03AM +0100, Cristian Marussi wrote:
> > > > > On Wed, Jun 15, 2022 at 05:45:11AM +0200, Florian Fainelli wrote:
> > > > > > 
> > > > > > 
> > > > > > On 3/30/2022 5:05 PM, Cristian Marussi wrote:
> > > > > > > Using the common protocol helper implementation add support for all new
> > > > > > > SCMIv3.1 extended names commands related to all protocols with the
> > > > > > > exception of SENSOR_AXIS_GET_NAME.
> > > > > > > 
> > > > > > > Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
> > > > > > 
> > > > > > This causes the following splat on a platform where regulators fail to
> > > > > > initialize:
> > > > > > 
> > > > > 
> > > > > Hi Florian,
> > > > > 
> > > > > thanks for the report.
> > > > > 
> > > > > It seems a memory error while allocating so it was not meant to be
> > > > > solved by the fixes, anyway, I've never seen this splat in my testing
> > > > > and at first sight I cannot see anything wrong in the devm_k* calls
> > > > > inside scmi_voltage_protocol_init...is there any particular config in
> > > > > your setup ?
> > > > > 
> > > > > Moreover, the WARNING line 5402 seems to match v5.19-rc1 and it has
> > > > > slightly changed with -rc-1, so I'll try rebasing on that at first and
> > > > > see if I can reproduce the issue locally.
> > > > > 
> > > > 
> > > > I just re-tested the series rebased on v519-rc1 plus fixes and I cannot
> > > > reproduce in my setup with a few (~9) good and bad voltage domains.
> > > > 
> > > > How many voltage domains are advertised by the platform in your setup ?
> > > 
> > > There are 11 voltage regulators on this platform, and of course, now that I
> > > am trying to reproduce the splat I reported I just cannot anymore... I will
> > > let you know if there is anything that needs to be done. Thanks for being
> > > responsive as usual!
> > 
> > ... you're welcome...
> > 
> > I'm trying to figure out where an abnormal mem request could happen...
> 
> I think the problem is/was with the number of voltage domains being reported
> which was way too big and passed directly, without any clamping to
> devm_kcalloc() resulting the splat indicating that the allocation was beyond
> the MAX_ORDER. The specification allows for up to 2^16 domains which would
> still be way too much to allocate using kmalloc() so we could/should
> consider vmalloc() here eventually?
> 

Yes I was suspicious about the same and I was starting to think about vmalloc
in general across all domain enumerations even if this is may not be the case
for your splat...

> In all likelihood though we probably won't find a system with 65k voltage
> domains.
> 
> > 
> > can you try adding this (for brutal debugging) when you try ?
> > (...just to rule out funny fw replies.... :D)
> 
> Sure, nothing weird coming out and it succeeded in enumerating all of the
> regulators, I smell a transient issue with our firmware implementation,
> maybe...
> 
> [    0.560544] arm-scmi brcm_scmi@0: num_returned:1  num_remaining:0
> [    0.560617] arm-scmi brcm_scmi@0: num_returned:1  num_remaining:0
> [    0.560673] arm-scmi brcm_scmi@0: num_returned:1  num_remaining:0
> [    0.560730] arm-scmi brcm_scmi@0: num_returned:1  num_remaining:0
> [    0.560881] arm-scmi brcm_scmi@0: num_returned:1  num_remaining:0
> [    0.560940] arm-scmi brcm_scmi@0: num_returned:1  num_remaining:0
> [    0.560996] arm-scmi brcm_scmi@0: num_returned:1  num_remaining:0
> [    0.561054] arm-scmi brcm_scmi@0: num_returned:1  num_remaining:0
> [    0.561110] arm-scmi brcm_scmi@0: num_returned:1  num_remaining:0
> [    0.561168] arm-scmi brcm_scmi@0: num_returned:1  num_remaining:0
> [    0.561225] arm-scmi brcm_scmi@0: num_returned:1  num_remaining:0
> [    0.561652] scmi-regulator scmi_dev.2: Regulator stb_vreg_2 registered
> for domain [2]
> [    0.561858] scmi-regulator scmi_dev.2: Regulator stb_vreg_3 registered
> for domain [3]
> [    0.562030] scmi-regulator scmi_dev.2: Regulator stb_vreg_4 registered
> for domain [4]
> [    0.562190] scmi-regulator scmi_dev.2: Regulator stb_vreg_5 registered
> for domain [5]
> [    0.564427] scmi-regulator scmi_dev.2: Regulator stb_vreg_6 registered
> for domain [6]
> [    0.564638] scmi-regulator scmi_dev.2: Regulator stb_vreg_7 registered
> for domain [7]
> [    0.564817] scmi-regulator scmi_dev.2: Regulator stb_vreg_8 registered
> for domain [8]
> [    0.565030] scmi-regulator scmi_dev.2: Regulator stb_vreg_9 registered
> for domain [9]
> [    0.565191] scmi-regulator scmi_dev.2: Regulator stb_vreg_10 registered
> for domain [10]
> 

Ok, this was another place where a reply carrying a consistent number of
non-segmented entries could trigger an jumbo-request to kmalloc...

..maybe this is where a transient fw issue can reply with a dramatic
numbers of possible (non-segmented) levels (num_returned+num_remaining)

(I filter out replies carrying descriptors for segmented voltage-levels
that does not come in triplets (returned:3  remaining:0) since they are out
of spec...but I just hit today on another fw sending such out of spec
reply for clocks and I will relax such req probably not to break too
many out-of-spec blobs out in the wild :P)

So let me know if got new splats...

Thanks,
Cristian


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2022-06-15 17:33 UTC|newest]

Thread overview: 100+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-30 15:05 [PATCH 00/22] SCMIv3.1 Miscellaneous changes Cristian Marussi
2022-03-30 15:05 ` Cristian Marussi
2022-03-30 15:05 ` [PATCH 01/22] firmware: arm_scmi: Fix sorting of retrieved clock rates Cristian Marussi
2022-03-30 15:05   ` Cristian Marussi
2022-03-30 15:05 ` [PATCH 02/22] firmware: arm_scmi: Make protocols init fail on basic errors Cristian Marussi
2022-03-30 15:05   ` Cristian Marussi
2022-04-26 15:35   ` Sudeep Holla
2022-04-26 15:35     ` Sudeep Holla
2022-04-26 16:25     ` Cristian Marussi
2022-04-26 16:25       ` Cristian Marussi
2022-04-28 10:25       ` Sudeep Holla
2022-04-28 10:25         ` Sudeep Holla
2022-04-28 12:07         ` Cristian Marussi
2022-04-28 12:07           ` Cristian Marussi
2022-03-30 15:05 ` [PATCH 03/22] firmware: arm_scmi: Fix Base list protocols enumeration Cristian Marussi
2022-03-30 15:05   ` Cristian Marussi
2022-03-30 15:05 ` [PATCH 04/22] firmware: arm_scmi: Validate BASE_DISCOVER_LIST_PROTOCOLS reply Cristian Marussi
2022-03-30 15:05   ` Cristian Marussi
2022-04-28 10:07   ` Sudeep Holla
2022-04-28 10:07     ` Sudeep Holla
2022-04-28 13:45     ` Cristian Marussi
2022-04-28 13:45       ` Cristian Marussi
2022-04-28 13:55       ` Sudeep Holla
2022-04-28 13:55         ` Sudeep Holla
2022-04-28 14:03         ` Cristian Marussi
2022-04-28 14:03           ` Cristian Marussi
2022-03-30 15:05 ` [PATCH 05/22] firmware: arm_scmi: Dynamically allocate protocols array Cristian Marussi
2022-03-30 15:05   ` Cristian Marussi
2022-04-28 10:27   ` Sudeep Holla
2022-04-28 10:27     ` Sudeep Holla
2022-03-30 15:05 ` [PATCH 06/22] firmware: arm_scmi: Make name_get operations return a const Cristian Marussi
2022-03-30 15:05   ` Cristian Marussi
2022-03-30 15:05 ` [PATCH 07/22] firmware: arm_scmi: Check CLOCK_RATE_SET_COMPLETE async reply Cristian Marussi
2022-03-30 15:05   ` Cristian Marussi
2022-03-30 15:05 ` [PATCH 08/22] firmware: arm_scmi: Remove unneeded NULL termination of clk name Cristian Marussi
2022-03-30 15:05   ` Cristian Marussi
2022-03-30 15:05 ` [PATCH 09/22] firmware: arm_scmi: Split protocol specific definitions in a dedicated header Cristian Marussi
2022-03-30 15:05   ` Cristian Marussi
2022-03-30 15:05 ` [PATCH 10/22] firmware: arm_scmi: Introduce a common SCMIv3.1 .extended_name_get helper Cristian Marussi
2022-03-30 15:05   ` Cristian Marussi
2022-03-30 15:05 ` [PATCH 11/22] firmware: arm_scmi: Add SCMIv3.1 extended names protocols support Cristian Marussi
2022-03-30 15:05   ` Cristian Marussi
2022-06-15  3:45   ` Florian Fainelli
2022-06-15  3:45     ` Florian Fainelli
2022-06-15  8:17     ` Cristian Marussi
2022-06-15  8:17       ` Cristian Marussi
2022-06-15  9:40       ` Cristian Marussi
2022-06-15  9:40         ` Cristian Marussi
2022-06-15 16:10         ` Florian Fainelli
2022-06-15 16:10           ` Florian Fainelli
2022-06-15 16:29           ` Cristian Marussi
2022-06-15 16:29             ` Cristian Marussi
2022-06-15 17:19             ` Florian Fainelli
2022-06-15 17:19               ` Florian Fainelli
2022-06-15 17:32               ` Cristian Marussi [this message]
2022-06-15 17:32                 ` Cristian Marussi
2022-06-15 22:58                 ` Florian Fainelli
2022-06-15 22:58                   ` Florian Fainelli
2022-03-30 15:05 ` [PATCH 12/22] firmware: arm_scmi: Parse clock_enable_latency conditionally Cristian Marussi
2022-03-30 15:05   ` Cristian Marussi
2022-03-30 15:05 ` [PATCH 13/22] firmware: arm_scmi: Add iterators for multi-part commands Cristian Marussi
2022-03-30 15:05   ` Cristian Marussi
2022-03-30 15:05 ` [PATCH 14/22] firmware: arm_scmi: Use common iterators in Sensor protocol Cristian Marussi
2022-03-30 15:05   ` Cristian Marussi
2022-03-30 15:05 ` [PATCH 15/22] firmware: arm_scmi: Add SCMIv3.1 SENSOR_AXIS_NAME_GET support Cristian Marussi
2022-03-30 15:05   ` Cristian Marussi
2022-06-02 14:25   ` Peter Hilber
2022-06-02 14:25     ` Peter Hilber
2022-06-06  8:18     ` Cristian Marussi
2022-06-06  8:18       ` Cristian Marussi
2022-06-08  8:40       ` Peter Hilber
2022-06-08  8:40         ` Peter Hilber
2022-06-08  8:49         ` Cristian Marussi
2022-06-08  8:49           ` Cristian Marussi
2022-03-30 15:05 ` [PATCH 16/22] firmware: arm_scmi: Use common iterators in Clock protocol Cristian Marussi
2022-03-30 15:05   ` Cristian Marussi
2022-03-30 15:05 ` [PATCH 17/22] firmware: arm_scmi: Use common iterators in Voltage protocol Cristian Marussi
2022-03-30 15:05   ` Cristian Marussi
2022-03-30 15:05 ` [PATCH 18/22] firmware: arm_scmi: Use common iterators in Perf protocol Cristian Marussi
2022-03-30 15:05   ` Cristian Marussi
2022-03-30 15:05 ` [PATCH 19/22] firmware: arm_scmi: Add SCMIv3.1 Clock notifications Cristian Marussi
2022-03-30 15:05   ` Cristian Marussi
2022-03-30 15:05 ` [PATCH 20/22] firmware: arm_scmi: Add SCMIv3.1 VOLTAGE_LEVEL_SET_COMPLETE Cristian Marussi
2022-03-30 15:05   ` Cristian Marussi
2022-03-30 15:05 ` [PATCH 21/22] firmware: arm_scmi: Add SCMI v3.1 Perf power-cost in microwatts Cristian Marussi
2022-03-30 15:05   ` Cristian Marussi
2022-03-30 16:46   ` Lukasz Luba
2022-03-30 16:46     ` Lukasz Luba
2022-03-30 15:05 ` [PATCH 22/22] firmware: arm_scmi: Add SCMIv3.1 PERFORMANCE_LIMITS_SET checks Cristian Marussi
2022-03-30 15:05   ` Cristian Marussi
2022-04-28 13:13   ` Sudeep Holla
2022-04-28 13:13     ` Sudeep Holla
2022-04-28 13:49     ` Cristian Marussi
2022-04-28 13:49       ` Cristian Marussi
2022-04-28 13:52       ` Sudeep Holla
2022-04-28 13:52         ` Sudeep Holla
2022-04-28 13:46 ` [PATCH 00/22] SCMIv3.1 Miscellaneous changes Sudeep Holla
2022-04-28 13:46   ` Sudeep Holla
2022-05-03  8:03 ` Sudeep Holla
2022-05-03  8:03   ` Sudeep Holla

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YqoXgsbg6t46rZeI@e120937-lin \
    --to=cristian.marussi@arm.com \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=etienne.carriere@linaro.org \
    --cc=f.fainelli@gmail.com \
    --cc=james.quinlan@broadcom.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=souvik.chakravarty@arm.com \
    --cc=sudeep.holla@arm.com \
    --cc=vincent.guittot@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.