[AMD Official Use Only - General]



The notifier block is embedded in smu context of a device. If there are 4 devices and 3 of them are interested, they register using the notifier block within their smu context. Notifier is called in a chain and when received they use the container_of to get the smu context and communicate with the respective device's FW on the actions to do.

BTW, I don't know why dGPU PMFW would be interested in SMT change. On AMD APU + dGPU we already have things like smartshift and it will take care if communicated to APU alone.

Thanks,
Lijo
________________________________
From: Limonciello, Mario <Mario.Limonciello@amd.com>
Sent: Friday, March 24, 2023 11:19:55 PM
To: Lazar, Lijo <Lijo.Lazar@amd.com>; Yang, WenYou <WenYou.Yang@amd.com>; Deucher, Alexander <Alexander.Deucher@amd.com>; Koenig, Christian <Christian.Koenig@amd.com>; Pan, Xinhui <Xinhui.Pan@amd.com>
Cc: Li, Ying <YING.LI@amd.com>; Liu, Kun <Kun.Liu2@amd.com>; Liang, Richard qi <Richardqi.Liang@amd.com>; amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>
Subject: RE: [v1,2/3] drm/amd/pm: send the SMT-enable message to pmfw

[AMD Official Use Only - General]



> -----Original Message-----
> From: Lazar, Lijo <Lijo.Lazar@amd.com>
> Sent: Thursday, March 23, 2023 21:29
> To: Limonciello, Mario <Mario.Limonciello@amd.com>; Yang, WenYou
> <WenYou.Yang@amd.com>; Deucher, Alexander
> <Alexander.Deucher@amd.com>; Koenig, Christian
> <Christian.Koenig@amd.com>; Pan, Xinhui <Xinhui.Pan@amd.com>
> Cc: Li, Ying <YING.LI@amd.com>; Liu, Kun <Kun.Liu2@amd.com>; Liang,
> Richard qi <Richardqi.Liang@amd.com>; amd-gfx@lists.freedesktop.org
> Subject: Re: [v1,2/3] drm/amd/pm: send the SMT-enable message to pmfw
>
>
>
> On 3/23/2023 11:36 PM, Limonciello, Mario wrote:
> > On 3/23/2023 12:41, Limonciello, Mario wrote:
> >> On 3/22/2023 00:48, Wenyou Yang wrote:
> >>> When the CPU SMT status change in the fly, sent the SMT-enable
> >>> message to pmfw to notify it that the SMT status changed.
> >>>
> >>> Signed-off-by: Wenyou Yang <WenYou.Yang@amd.com>
> >>> ---
> >>>   drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c     | 41
> +++++++++++++++++++
> >>>   drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h |  5 +++
> >>>   2 files changed, 46 insertions(+)
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
> >>> b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
> >>> index b5d64749990e..5cd85a9d149d 100644
> >>> --- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
> >>> +++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
> >>> @@ -22,6 +22,7 @@
> >>>   #define SWSMU_CODE_LAYER_L1
> >>> +#include <linux/cpu.h>
> >>>   #include <linux/firmware.h>
> >>>   #include <linux/pci.h>
> >>> @@ -69,6 +70,14 @@ static int smu_set_fan_speed_rpm(void *handle,
> >>> uint32_t speed);
> >>>   static int smu_set_gfx_cgpg(struct smu_context *smu, bool enabled);
> >>>   static int smu_set_mp1_state(void *handle, enum pp_mp1_state
> >>> mp1_state);
> >>> +static int smt_notifier_callback(struct notifier_block *nb, unsigned
> >>> long action, void *data);
> >>> +
> >>> +extern struct raw_notifier_head smt_notifier_head;
> >>> +
> >>> +static struct notifier_block smt_notifier = {
> >>> +    .notifier_call = smt_notifier_callback,
> >>> +};
> >>> +
> >>>   static int smu_sys_get_pp_feature_mask(void *handle,
> >>>                          char *buf)
> >>>   {
> >>> @@ -625,6 +634,8 @@ static int smu_set_funcs(struct amdgpu_device
> *adev)
> >>>       return 0;
> >>>   }
> >>> +static struct smu_context *current_smu;
> >>> +
> >>>   static int smu_early_init(void *handle)
> >>>   {
> >>>       struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> >>> @@ -645,6 +656,7 @@ static int smu_early_init(void *handle)
> >>>       mutex_init(&smu->message_lock);
> >>>       adev->powerplay.pp_handle = smu;
> >>> +    current_smu = smu;
> >
> > Although this series is intended for the Van Gogh case right now, I
> > dont't think this would scale well for multiple GPUs in a system.
> >
> > I think that instead you may want to move the notifier callback to be a
> > level "higher" in amdgpu.  Perhaps amdgpu_device.c?  Then when that
> > notifier call is received you'll want to walk through the PCI device
> > space to find any GPUs that are bound with AMDGPU a series of
> > wrappers/calls that end up calling smu_set_cpu_smt_enable with the
> > approriate arguments.
> >
>
> This is not required when the notifier is registered only within Vangogh
> ppt function. Then follow Evan's suggestion of keeping the notifier
> block inside smu. From the notifier block, it can find the smu block and
> then call cpu_smt_enable/disable. That way notifier callback comes only
> once even with multiple dGPUs + Vangogh and processed for the
> corresponding smu.
>
> This notifier doesn't need to be registered for platforms only with
> dGPUs or APUs which don't need this.

They don't right now, but I was thinking how this could scale to other
APUs or dGPUs if they are interested in adding support for this message
too.

>
> Thanks,
> Lijo
>
> >
> >>>       adev->powerplay.pp_funcs = &swsmu_pm_funcs;
> >>>       r = smu_set_funcs(adev);
> >>> @@ -1105,6 +1117,8 @@ static int smu_sw_init(void *handle)
> >>>       if (!smu->ppt_funcs->get_fan_control_mode)
> >>>           smu->adev->pm.no_fan = true;
> >>> +    raw_notifier_chain_register(&smt_notifier_head, &smt_notifier);
> >>> +
> >>>       return 0;
> >>>   }
> >>> @@ -1122,6 +1136,8 @@ static int smu_sw_fini(void *handle)
> >>>       smu_fini_microcode(smu);
> >>> +    raw_notifier_chain_unregister(&smt_notifier_head, &smt_notifier);
> >>> +
> >>>       return 0;
> >>>   }
> >>> @@ -3241,3 +3257,28 @@ int smu_send_hbm_bad_channel_flag(struct
> >>> smu_context *smu, uint32_t size)
> >>>       return ret;
> >>>   }
> >>> +
> >>> +static int smu_set_cpu_smt_enable(struct smu_context *smu, bool
> enable)
> >>> +{
> >>> +    int ret = -EINVAL;
> >>> +
> >>> +    if (smu->ppt_funcs && smu->ppt_funcs->set_cpu_smt_enable)
> >>> +        ret = smu->ppt_funcs->set_cpu_smt_enable(smu, enable);
> >>> +
> >>> +    return ret;
> >>> +}
> >>> +
> >>> +static int smt_notifier_callback(struct notifier_block *nb,
> >>> +                 unsigned long action, void *data)
> >>> +{
> >>> +    struct smu_context *smu = current_smu;
> >>> +    int ret = NOTIFY_OK;
> >>
> >> This initialization is pointless, it's clobbered in the next line.
> >>
> >>> +
> >>> +    ret = (action == SMT_ENABLED) ?
> >>> +                smu_set_cpu_smt_enable(smu, true) :
> >>> +                smu_set_cpu_smt_enable(smu, false);
> >>
> >> How about this instead, it should be more readable:
> >>
> >>      ret = smu_set_cpu_smt_enable(smu, action == SMT_ENABLED);
> >>
> >>> +    if (ret)
> >>> +        ret = NOTIFY_BAD;
> >>> +
> >>> +    return ret;
> >>
> >> How about instead:
> >>
> >>      dev_dbg(adev->dev, "failed to %sable SMT: %d\n", action ==
> >> SMT_ENABLED ? "en" : "dis", ret);
> >>
> >>      return ret ? NOTIFY_BAD : NOTIFY_OK;
> >>
> >>> +}
> >>> diff --git a/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
> >>> b/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
> >>> index 09469c750a96..7c6594bba796 100644
> >>> --- a/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
> >>> +++ b/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
> >>> @@ -1354,6 +1354,11 @@ struct pptable_funcs {
> >>>        * @init_pptable_microcode: Prepare the pptable microcode to
> >>> upload via PSP
> >>>        */
> >>>       int (*init_pptable_microcode)(struct smu_context *smu);
> >>> +
> >>> +    /**
> >>> +     * @set_cpu_smt_enable: Set the CPU SMT status
> >>> +     */
> >>> +    int (*set_cpu_smt_enable)(struct smu_context *smu, bool enable);
> >>>   };
> >>>   typedef enum {
> >>
> >