dri-devel.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/2] drm/amdgpu: Move racy global PMU list into device
@ 2022-10-28 22:48 Brian Norris
  2022-10-28 22:48 ` [PATCH 2/2] drm/amdgpu: Set PROBE_PREFER_ASYNCHRONOUS Brian Norris
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Brian Norris @ 2022-10-28 22:48 UTC (permalink / raw)
  To: Alex Deucher, Christian König, Xinhui
  Cc: Brian Norris, amd-gfx, linux-kernel, dri-devel

If there are multiple amdgpu devices, this list processing can be racy.

We're really treating this like a per-device list, so make that explicit
and remove the global list.

Signed-off-by: Brian Norris <briannorris@chromium.org>
---

 drivers/gpu/drm/amd/amdgpu/amdgpu.h     |  4 ++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_pmu.c | 12 +++++-------
 2 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 0e6ddf05c23c..e968b7f2417c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1063,6 +1063,10 @@ struct amdgpu_device {
 	struct work_struct		reset_work;
 
 	bool                            job_hang;
+
+#if IS_ENABLED(CONFIG_PERF_EVENTS)
+	struct list_head pmu_list;
+#endif
 };
 
 static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_pmu.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_pmu.c
index 71ee361d0972..24f2055a2f23 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_pmu.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_pmu.c
@@ -23,6 +23,7 @@
 
 #include <linux/perf_event.h>
 #include <linux/init.h>
+#include <linux/list.h>
 #include "amdgpu.h"
 #include "amdgpu_pmu.h"
 
@@ -72,9 +73,6 @@ static ssize_t amdgpu_pmu_event_show(struct device *dev,
 			amdgpu_pmu_attr->event_str, amdgpu_pmu_attr->type);
 }
 
-static LIST_HEAD(amdgpu_pmu_list);
-
-
 struct amdgpu_pmu_attr {
 	const char *name;
 	const char *config;
@@ -558,7 +556,7 @@ static int init_pmu_entry_by_type_and_add(struct amdgpu_pmu_entry *pmu_entry,
 		pr_info("Detected AMDGPU %d Perf Events.\n", total_num_events);
 
 
-	list_add_tail(&pmu_entry->entry, &amdgpu_pmu_list);
+	list_add_tail(&pmu_entry->entry, &pmu_entry->adev->pmu_list);
 
 	return 0;
 err_register:
@@ -579,9 +577,7 @@ void amdgpu_pmu_fini(struct amdgpu_device *adev)
 {
 	struct amdgpu_pmu_entry *pe, *temp;
 
-	list_for_each_entry_safe(pe, temp, &amdgpu_pmu_list, entry) {
-		if (pe->adev != adev)
-			continue;
+	list_for_each_entry_safe(pe, temp, &adev->pmu_list, entry) {
 		list_del(&pe->entry);
 		perf_pmu_unregister(&pe->pmu);
 		kfree(pe->pmu.attr_groups);
@@ -623,6 +619,8 @@ int amdgpu_pmu_init(struct amdgpu_device *adev)
 	int ret = 0;
 	struct amdgpu_pmu_entry *pmu_entry, *pmu_entry_df;
 
+	INIT_LIST_HEAD(&adev->pmu_list);
+
 	switch (adev->asic_type) {
 	case CHIP_VEGA20:
 		pmu_entry_df = create_pmu_entry(adev, AMDGPU_PMU_PERF_TYPE_DF,
-- 
2.38.1.273.g43a17bfeac-goog


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 2/2] drm/amdgpu: Set PROBE_PREFER_ASYNCHRONOUS
  2022-10-28 22:48 [PATCH 1/2] drm/amdgpu: Move racy global PMU list into device Brian Norris
@ 2022-10-28 22:48 ` Brian Norris
  2022-11-08 16:11 ` [PATCH 1/2] drm/amdgpu: Move racy global PMU list into device Alex Deucher
  2022-11-08 16:50 ` Felix Kuehling
  2 siblings, 0 replies; 5+ messages in thread
From: Brian Norris @ 2022-10-28 22:48 UTC (permalink / raw)
  To: Alex Deucher, Christian König, Xinhui
  Cc: Brian Norris, amd-gfx, linux-kernel, dri-devel

This driver often takes over 200ms to start, so it can improve boot
speed to probe it asynchronously.

I did a short review of the driver, and apart from an issue fixed in the
parent patch ("drm/amdgpu: Move racy global PMU list into device"),
there don't appear to be many cross-device dependencies or racy accesses
to global state, so this should be safe.

This driver was pinpointed as part of a survey of top slowest initcalls
(i.e., are built in, and probing synchronously) on a lab of ChromeOS
systems.

Signed-off-by: Brian Norris <briannorris@chromium.org>
---

 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 3c9fecdd6b2f..2d180e48df1b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -2793,7 +2793,10 @@ static struct pci_driver amdgpu_kms_pci_driver = {
 	.probe = amdgpu_pci_probe,
 	.remove = amdgpu_pci_remove,
 	.shutdown = amdgpu_pci_shutdown,
-	.driver.pm = &amdgpu_pm_ops,
+	.driver = {
+		.pm = &amdgpu_pm_ops,
+		.probe_type = PROBE_PREFER_ASYNCHRONOUS,
+	},
 	.err_handler = &amdgpu_pci_err_handler,
 	.dev_groups = amdgpu_sysfs_groups,
 };
-- 
2.38.1.273.g43a17bfeac-goog


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/2] drm/amdgpu: Move racy global PMU list into device
  2022-10-28 22:48 [PATCH 1/2] drm/amdgpu: Move racy global PMU list into device Brian Norris
  2022-10-28 22:48 ` [PATCH 2/2] drm/amdgpu: Set PROBE_PREFER_ASYNCHRONOUS Brian Norris
@ 2022-11-08 16:11 ` Alex Deucher
  2022-11-08 16:50 ` Felix Kuehling
  2 siblings, 0 replies; 5+ messages in thread
From: Alex Deucher @ 2022-11-08 16:11 UTC (permalink / raw)
  To: Brian Norris, Kim, Jonathan, Kuehling, Felix
  Cc: Xinhui, linux-kernel, dri-devel, amd-gfx, Alex Deucher,
	Christian König

On Fri, Oct 28, 2022 at 6:48 PM Brian Norris <briannorris@chromium.org> wrote:
>
> If there are multiple amdgpu devices, this list processing can be racy.
>
> We're really treating this like a per-device list, so make that explicit
> and remove the global list.
>
> Signed-off-by: Brian Norris <briannorris@chromium.org>

@Kuehling, Felix @Kim, Jonathan can you take a look at this patch?

Thanks,

Alex


> ---
>
>  drivers/gpu/drm/amd/amdgpu/amdgpu.h     |  4 ++++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_pmu.c | 12 +++++-------
>  2 files changed, 9 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index 0e6ddf05c23c..e968b7f2417c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -1063,6 +1063,10 @@ struct amdgpu_device {
>         struct work_struct              reset_work;
>
>         bool                            job_hang;
> +
> +#if IS_ENABLED(CONFIG_PERF_EVENTS)
> +       struct list_head pmu_list;
> +#endif
>  };
>
>  static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev)
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_pmu.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_pmu.c
> index 71ee361d0972..24f2055a2f23 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_pmu.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_pmu.c
> @@ -23,6 +23,7 @@
>
>  #include <linux/perf_event.h>
>  #include <linux/init.h>
> +#include <linux/list.h>
>  #include "amdgpu.h"
>  #include "amdgpu_pmu.h"
>
> @@ -72,9 +73,6 @@ static ssize_t amdgpu_pmu_event_show(struct device *dev,
>                         amdgpu_pmu_attr->event_str, amdgpu_pmu_attr->type);
>  }
>
> -static LIST_HEAD(amdgpu_pmu_list);
> -
> -
>  struct amdgpu_pmu_attr {
>         const char *name;
>         const char *config;
> @@ -558,7 +556,7 @@ static int init_pmu_entry_by_type_and_add(struct amdgpu_pmu_entry *pmu_entry,
>                 pr_info("Detected AMDGPU %d Perf Events.\n", total_num_events);
>
>
> -       list_add_tail(&pmu_entry->entry, &amdgpu_pmu_list);
> +       list_add_tail(&pmu_entry->entry, &pmu_entry->adev->pmu_list);
>
>         return 0;
>  err_register:
> @@ -579,9 +577,7 @@ void amdgpu_pmu_fini(struct amdgpu_device *adev)
>  {
>         struct amdgpu_pmu_entry *pe, *temp;
>
> -       list_for_each_entry_safe(pe, temp, &amdgpu_pmu_list, entry) {
> -               if (pe->adev != adev)
> -                       continue;
> +       list_for_each_entry_safe(pe, temp, &adev->pmu_list, entry) {
>                 list_del(&pe->entry);
>                 perf_pmu_unregister(&pe->pmu);
>                 kfree(pe->pmu.attr_groups);
> @@ -623,6 +619,8 @@ int amdgpu_pmu_init(struct amdgpu_device *adev)
>         int ret = 0;
>         struct amdgpu_pmu_entry *pmu_entry, *pmu_entry_df;
>
> +       INIT_LIST_HEAD(&adev->pmu_list);
> +
>         switch (adev->asic_type) {
>         case CHIP_VEGA20:
>                 pmu_entry_df = create_pmu_entry(adev, AMDGPU_PMU_PERF_TYPE_DF,
> --
> 2.38.1.273.g43a17bfeac-goog
>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/2] drm/amdgpu: Move racy global PMU list into device
  2022-10-28 22:48 [PATCH 1/2] drm/amdgpu: Move racy global PMU list into device Brian Norris
  2022-10-28 22:48 ` [PATCH 2/2] drm/amdgpu: Set PROBE_PREFER_ASYNCHRONOUS Brian Norris
  2022-11-08 16:11 ` [PATCH 1/2] drm/amdgpu: Move racy global PMU list into device Alex Deucher
@ 2022-11-08 16:50 ` Felix Kuehling
  2022-11-09  1:22   ` Brian Norris
  2 siblings, 1 reply; 5+ messages in thread
From: Felix Kuehling @ 2022-11-08 16:50 UTC (permalink / raw)
  To: Brian Norris, Alex Deucher, Christian König, Xinhui
  Cc: dri-devel, linux-kernel, amd-gfx

On 2022-10-28 18:48, Brian Norris wrote:
> If there are multiple amdgpu devices, this list processing can be racy.
>
> We're really treating this like a per-device list, so make that explicit
> and remove the global list.

I agree with the problem and the solution. See one comment inline.


>
> Signed-off-by: Brian Norris <briannorris@chromium.org>
> ---
>
>   drivers/gpu/drm/amd/amdgpu/amdgpu.h     |  4 ++++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_pmu.c | 12 +++++-------
>   2 files changed, 9 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index 0e6ddf05c23c..e968b7f2417c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -1063,6 +1063,10 @@ struct amdgpu_device {
>   	struct work_struct		reset_work;
>   
>   	bool                            job_hang;
> +
> +#if IS_ENABLED(CONFIG_PERF_EVENTS)
> +	struct list_head pmu_list;
> +#endif
>   };
>   
>   static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev)
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_pmu.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_pmu.c
> index 71ee361d0972..24f2055a2f23 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_pmu.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_pmu.c
> @@ -23,6 +23,7 @@
>   
>   #include <linux/perf_event.h>
>   #include <linux/init.h>
> +#include <linux/list.h>
>   #include "amdgpu.h"
>   #include "amdgpu_pmu.h"
>   
> @@ -72,9 +73,6 @@ static ssize_t amdgpu_pmu_event_show(struct device *dev,
>   			amdgpu_pmu_attr->event_str, amdgpu_pmu_attr->type);
>   }
>   
> -static LIST_HEAD(amdgpu_pmu_list);
> -
> -
>   struct amdgpu_pmu_attr {
>   	const char *name;
>   	const char *config;
> @@ -558,7 +556,7 @@ static int init_pmu_entry_by_type_and_add(struct amdgpu_pmu_entry *pmu_entry,
>   		pr_info("Detected AMDGPU %d Perf Events.\n", total_num_events);
>   
>   
> -	list_add_tail(&pmu_entry->entry, &amdgpu_pmu_list);
> +	list_add_tail(&pmu_entry->entry, &pmu_entry->adev->pmu_list);

While you're making the pmu list per-device, I'd suggest removing adev 
from the pmu entry because it is now redundant. The device is implied by 
the list that the entry is on. Instead, add an adev parameter to 
init_pmu_entry_by_type_and_add. Or you could move the list_add_tail to 
amdgpu_pmu_init and remove "_and_add" from the function name.

Other than that, the patch looks good to me.

Regards,
   Felix


>   
>   	return 0;
>   err_register:
> @@ -579,9 +577,7 @@ void amdgpu_pmu_fini(struct amdgpu_device *adev)
>   {
>   	struct amdgpu_pmu_entry *pe, *temp;
>   
> -	list_for_each_entry_safe(pe, temp, &amdgpu_pmu_list, entry) {
> -		if (pe->adev != adev)
> -			continue;
> +	list_for_each_entry_safe(pe, temp, &adev->pmu_list, entry) {
>   		list_del(&pe->entry);
>   		perf_pmu_unregister(&pe->pmu);
>   		kfree(pe->pmu.attr_groups);
> @@ -623,6 +619,8 @@ int amdgpu_pmu_init(struct amdgpu_device *adev)
>   	int ret = 0;
>   	struct amdgpu_pmu_entry *pmu_entry, *pmu_entry_df;
>   
> +	INIT_LIST_HEAD(&adev->pmu_list);
> +
>   	switch (adev->asic_type) {
>   	case CHIP_VEGA20:
>   		pmu_entry_df = create_pmu_entry(adev, AMDGPU_PMU_PERF_TYPE_DF,

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/2] drm/amdgpu: Move racy global PMU list into device
  2022-11-08 16:50 ` Felix Kuehling
@ 2022-11-09  1:22   ` Brian Norris
  0 siblings, 0 replies; 5+ messages in thread
From: Brian Norris @ 2022-11-09  1:22 UTC (permalink / raw)
  To: Felix Kuehling
  Cc: Xinhui, linux-kernel, dri-devel, amd-gfx, Alex Deucher,
	Christian König

On Tue, Nov 08, 2022 at 11:50:04AM -0500, Felix Kuehling wrote:
> While you're making the pmu list per-device, I'd suggest removing adev from
> the pmu entry because it is now redundant. The device is implied by the list
> that the entry is on. Instead, add an adev parameter to
> init_pmu_entry_by_type_and_add. Or you could move the list_add_tail to
> amdgpu_pmu_init and remove "_and_add" from the function name.

Sorry if I'm being naive here, but does that mean trying to navigate the
list pointers to move from a 'pmu_entry' to an 'adev'
(list_first_entry(), etc.)? There are quite a few cases where we're
trying to go between 'pmu_entry' and 'adev'. I guess I could turn that
into a mini helper.

I'll also need to scrounge around a bit to see if I have an amdgpu
system around that actually supports PMU. I realized the one I tested on
doesn't actually hit this code path... and this would be getting a
little less obvious/trivial.

> Other than that, the patch looks good to me.

Thanks for looking!

Brian

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2022-11-09  1:22 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-28 22:48 [PATCH 1/2] drm/amdgpu: Move racy global PMU list into device Brian Norris
2022-10-28 22:48 ` [PATCH 2/2] drm/amdgpu: Set PROBE_PREFER_ASYNCHRONOUS Brian Norris
2022-11-08 16:11 ` [PATCH 1/2] drm/amdgpu: Move racy global PMU list into device Alex Deucher
2022-11-08 16:50 ` Felix Kuehling
2022-11-09  1:22   ` Brian Norris

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).