All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] drm/amdkfd: Fix potential NULL pointer dereference
@ 2022-04-04 22:21 ` Grigory Vasilyev
  0 siblings, 0 replies; 7+ messages in thread
From: Grigory Vasilyev @ 2022-04-04 22:21 UTC (permalink / raw)
  To: Rodrigo Siqueira, Melissa Wen
  Cc: Grigory Vasilyev, Felix Kuehling, Alex Deucher,
	Christian König, Pan, Xinhui, David Airlie, Daniel Vetter,
	amd-gfx, dri-devel, linux-kernel

In the amdgpu_amdkfd_get_xgmi_bandwidth_mbytes function,
the peer_adev pointer can be NULL and is passed to amdgpu_xgmi_get_num_links.

In amdgpu_xgmi_get_num_links, peer_adev pointer is dereferenced
without any checks: peer_adev->gmc.xgmi.node_id .

Signed-off-by: Grigory Vasilyev <h0tc0d3@gmail.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index be1a55f8b8c5..1a1006b18016 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -541,11 +541,10 @@ int amdgpu_amdkfd_get_xgmi_bandwidth_mbytes(struct amdgpu_device *dst,
 	struct amdgpu_device *adev = dst, *peer_adev;
 	int num_links;
 
-	if (adev->asic_type != CHIP_ALDEBARAN)
+	if (!src || adev->asic_type != CHIP_ALDEBARAN)
 		return 0;
 
-	if (src)
-		peer_adev = src;
+	peer_adev = src;
 
 	/* num links returns 0 for indirect peers since indirect route is unknown. */
 	num_links = is_min ? 1 : amdgpu_xgmi_get_num_links(adev, peer_adev);
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH] drm/amdkfd: Fix potential NULL pointer dereference
@ 2022-04-04 22:21 ` Grigory Vasilyev
  0 siblings, 0 replies; 7+ messages in thread
From: Grigory Vasilyev @ 2022-04-04 22:21 UTC (permalink / raw)
  To: Rodrigo Siqueira, Melissa Wen
  Cc: Grigory Vasilyev, Felix Kuehling, Pan, Xinhui, linux-kernel,
	amd-gfx, David Airlie, dri-devel, Alex Deucher,
	Christian König

In the amdgpu_amdkfd_get_xgmi_bandwidth_mbytes function,
the peer_adev pointer can be NULL and is passed to amdgpu_xgmi_get_num_links.

In amdgpu_xgmi_get_num_links, peer_adev pointer is dereferenced
without any checks: peer_adev->gmc.xgmi.node_id .

Signed-off-by: Grigory Vasilyev <h0tc0d3@gmail.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index be1a55f8b8c5..1a1006b18016 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -541,11 +541,10 @@ int amdgpu_amdkfd_get_xgmi_bandwidth_mbytes(struct amdgpu_device *dst,
 	struct amdgpu_device *adev = dst, *peer_adev;
 	int num_links;
 
-	if (adev->asic_type != CHIP_ALDEBARAN)
+	if (!src || adev->asic_type != CHIP_ALDEBARAN)
 		return 0;
 
-	if (src)
-		peer_adev = src;
+	peer_adev = src;
 
 	/* num links returns 0 for indirect peers since indirect route is unknown. */
 	num_links = is_min ? 1 : amdgpu_xgmi_get_num_links(adev, peer_adev);
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH] drm/amdkfd: Fix potential NULL pointer dereference
@ 2022-04-04 22:21 ` Grigory Vasilyev
  0 siblings, 0 replies; 7+ messages in thread
From: Grigory Vasilyev @ 2022-04-04 22:21 UTC (permalink / raw)
  To: Rodrigo Siqueira, Melissa Wen
  Cc: Grigory Vasilyev, Felix Kuehling, Pan, Xinhui, linux-kernel,
	amd-gfx, David Airlie, dri-devel, Daniel Vetter, Alex Deucher,
	Christian König

In the amdgpu_amdkfd_get_xgmi_bandwidth_mbytes function,
the peer_adev pointer can be NULL and is passed to amdgpu_xgmi_get_num_links.

In amdgpu_xgmi_get_num_links, peer_adev pointer is dereferenced
without any checks: peer_adev->gmc.xgmi.node_id .

Signed-off-by: Grigory Vasilyev <h0tc0d3@gmail.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index be1a55f8b8c5..1a1006b18016 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -541,11 +541,10 @@ int amdgpu_amdkfd_get_xgmi_bandwidth_mbytes(struct amdgpu_device *dst,
 	struct amdgpu_device *adev = dst, *peer_adev;
 	int num_links;
 
-	if (adev->asic_type != CHIP_ALDEBARAN)
+	if (!src || adev->asic_type != CHIP_ALDEBARAN)
 		return 0;
 
-	if (src)
-		peer_adev = src;
+	peer_adev = src;
 
 	/* num links returns 0 for indirect peers since indirect route is unknown. */
 	num_links = is_min ? 1 : amdgpu_xgmi_get_num_links(adev, peer_adev);
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] drm/amdkfd: Fix potential NULL pointer dereference
  2022-04-04 22:21 ` Grigory Vasilyev
@ 2022-04-05 17:00   ` Felix Kuehling
  -1 siblings, 0 replies; 7+ messages in thread
From: Felix Kuehling @ 2022-04-05 17:00 UTC (permalink / raw)
  To: Grigory Vasilyev, Rodrigo Siqueira, Melissa Wen
  Cc: David Airlie, Pan, Xinhui, linux-kernel, dri-devel, amd-gfx,
	Alex Deucher, Christian König

Am 2022-04-04 um 18:21 schrieb Grigory Vasilyev:
> In the amdgpu_amdkfd_get_xgmi_bandwidth_mbytes function,
> the peer_adev pointer can be NULL and is passed to amdgpu_xgmi_get_num_links.
>
> In amdgpu_xgmi_get_num_links, peer_adev pointer is dereferenced
> without any checks: peer_adev->gmc.xgmi.node_id .

What's worse, peer_adev is uninitialized with an undefined value if src 
is NULL. So that code was definitely bogus.

However, I think your patch will result in incorrect results. Currently 
amdgpu_amdkfd_get_xgmi_bandwidth is always called with is_min=true if 
src==NULL, and with is_min=false if src!=NULL. The intention is, that we 
assume a single XGMI link in the case that src==NULL. That means the 
is_min parameter is redundant. What we should do instead is, assume that 
num_links==1 if src==NULL, and drop the is_min parameter.

That would keep things working the way they do now, and prevent 
potential problems in the future.

Regards,
   Felix


>
> Signed-off-by: Grigory Vasilyev <h0tc0d3@gmail.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 5 ++---
>   1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> index be1a55f8b8c5..1a1006b18016 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> @@ -541,11 +541,10 @@ int amdgpu_amdkfd_get_xgmi_bandwidth_mbytes(struct amdgpu_device *dst,
>   	struct amdgpu_device *adev = dst, *peer_adev;
>   	int num_links;
>   
> -	if (adev->asic_type != CHIP_ALDEBARAN)
> +	if (!src || adev->asic_type != CHIP_ALDEBARAN)
>   		return 0;
>   
> -	if (src)
> -		peer_adev = src;
> +	peer_adev = src;
>   
>   	/* num links returns 0 for indirect peers since indirect route is unknown. */
>   	num_links = is_min ? 1 : amdgpu_xgmi_get_num_links(adev, peer_adev);

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] drm/amdkfd: Fix potential NULL pointer dereference
@ 2022-04-05 17:00   ` Felix Kuehling
  0 siblings, 0 replies; 7+ messages in thread
From: Felix Kuehling @ 2022-04-05 17:00 UTC (permalink / raw)
  To: Grigory Vasilyev, Rodrigo Siqueira, Melissa Wen
  Cc: Pan, Xinhui, linux-kernel, amd-gfx, David Airlie, dri-devel,
	Alex Deucher, Christian König

Am 2022-04-04 um 18:21 schrieb Grigory Vasilyev:
> In the amdgpu_amdkfd_get_xgmi_bandwidth_mbytes function,
> the peer_adev pointer can be NULL and is passed to amdgpu_xgmi_get_num_links.
>
> In amdgpu_xgmi_get_num_links, peer_adev pointer is dereferenced
> without any checks: peer_adev->gmc.xgmi.node_id .

What's worse, peer_adev is uninitialized with an undefined value if src 
is NULL. So that code was definitely bogus.

However, I think your patch will result in incorrect results. Currently 
amdgpu_amdkfd_get_xgmi_bandwidth is always called with is_min=true if 
src==NULL, and with is_min=false if src!=NULL. The intention is, that we 
assume a single XGMI link in the case that src==NULL. That means the 
is_min parameter is redundant. What we should do instead is, assume that 
num_links==1 if src==NULL, and drop the is_min parameter.

That would keep things working the way they do now, and prevent 
potential problems in the future.

Regards,
   Felix


>
> Signed-off-by: Grigory Vasilyev <h0tc0d3@gmail.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 5 ++---
>   1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> index be1a55f8b8c5..1a1006b18016 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> @@ -541,11 +541,10 @@ int amdgpu_amdkfd_get_xgmi_bandwidth_mbytes(struct amdgpu_device *dst,
>   	struct amdgpu_device *adev = dst, *peer_adev;
>   	int num_links;
>   
> -	if (adev->asic_type != CHIP_ALDEBARAN)
> +	if (!src || adev->asic_type != CHIP_ALDEBARAN)
>   		return 0;
>   
> -	if (src)
> -		peer_adev = src;
> +	peer_adev = src;
>   
>   	/* num links returns 0 for indirect peers since indirect route is unknown. */
>   	num_links = is_min ? 1 : amdgpu_xgmi_get_num_links(adev, peer_adev);

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] drm/amdkfd: Fix potential NULL pointer dereference
  2022-04-05 17:00   ` Felix Kuehling
@ 2022-04-06 13:41     ` Григорий
  -1 siblings, 0 replies; 7+ messages in thread
From: Григорий @ 2022-04-06 13:41 UTC (permalink / raw)
  To: Felix Kuehling
  Cc: David Airlie, Pan, Xinhui, Rodrigo Siqueira, LKML, amd-gfx list,
	Melissa Wen, Maling list - DRI developers, Alex Deucher,
	Christian König

Thanks. You are right. I found a potential bug, and as I understand
it, the code only applies to the Aldebaran GPU and I can not check the
correctness of the code. I only test code on my navi 10 and run GPU
stress tests.
My knowledge of amdgpu is limited, and fixing potential bugs allows me
to learn more about amdgpu code. But there are many that I still don't
understand. In any case, we need to fix the code to eliminate
problems in the future.

Regards, Grigory.

вт, 5 апр. 2022 г. в 20:00, Felix Kuehling <felix.kuehling@amd.com>:
>
> Am 2022-04-04 um 18:21 schrieb Grigory Vasilyev:
> > In the amdgpu_amdkfd_get_xgmi_bandwidth_mbytes function,
> > the peer_adev pointer can be NULL and is passed to amdgpu_xgmi_get_num_links.
> >
> > In amdgpu_xgmi_get_num_links, peer_adev pointer is dereferenced
> > without any checks: peer_adev->gmc.xgmi.node_id .
>
> What's worse, peer_adev is uninitialized with an undefined value if src
> is NULL. So that code was definitely bogus.
>
> However, I think your patch will result in incorrect results. Currently
> amdgpu_amdkfd_get_xgmi_bandwidth is always called with is_min=true if
> src==NULL, and with is_min=false if src!=NULL. The intention is, that we
> assume a single XGMI link in the case that src==NULL. That means the
> is_min parameter is redundant. What we should do instead is, assume that
> num_links==1 if src==NULL, and drop the is_min parameter.
>
> That would keep things working the way they do now, and prevent
> potential problems in the future.
>
> Regards,
>    Felix
>
>
> >
> > Signed-off-by: Grigory Vasilyev <h0tc0d3@gmail.com>
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 5 ++---
> >   1 file changed, 2 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> > index be1a55f8b8c5..1a1006b18016 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> > @@ -541,11 +541,10 @@ int amdgpu_amdkfd_get_xgmi_bandwidth_mbytes(struct amdgpu_device *dst,
> >       struct amdgpu_device *adev = dst, *peer_adev;
> >       int num_links;
> >
> > -     if (adev->asic_type != CHIP_ALDEBARAN)
> > +     if (!src || adev->asic_type != CHIP_ALDEBARAN)
> >               return 0;
> >
> > -     if (src)
> > -             peer_adev = src;
> > +     peer_adev = src;
> >
> >       /* num links returns 0 for indirect peers since indirect route is unknown. */
> >       num_links = is_min ? 1 : amdgpu_xgmi_get_num_links(adev, peer_adev);

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] drm/amdkfd: Fix potential NULL pointer dereference
@ 2022-04-06 13:41     ` Григорий
  0 siblings, 0 replies; 7+ messages in thread
From: Григорий @ 2022-04-06 13:41 UTC (permalink / raw)
  To: Felix Kuehling
  Cc: Rodrigo Siqueira, Melissa Wen, Pan, Xinhui, LKML, amd-gfx list,
	David Airlie, Maling list - DRI developers, Alex Deucher,
	Christian König

Thanks. You are right. I found a potential bug, and as I understand
it, the code only applies to the Aldebaran GPU and I can not check the
correctness of the code. I only test code on my navi 10 and run GPU
stress tests.
My knowledge of amdgpu is limited, and fixing potential bugs allows me
to learn more about amdgpu code. But there are many that I still don't
understand. In any case, we need to fix the code to eliminate
problems in the future.

Regards, Grigory.

вт, 5 апр. 2022 г. в 20:00, Felix Kuehling <felix.kuehling@amd.com>:
>
> Am 2022-04-04 um 18:21 schrieb Grigory Vasilyev:
> > In the amdgpu_amdkfd_get_xgmi_bandwidth_mbytes function,
> > the peer_adev pointer can be NULL and is passed to amdgpu_xgmi_get_num_links.
> >
> > In amdgpu_xgmi_get_num_links, peer_adev pointer is dereferenced
> > without any checks: peer_adev->gmc.xgmi.node_id .
>
> What's worse, peer_adev is uninitialized with an undefined value if src
> is NULL. So that code was definitely bogus.
>
> However, I think your patch will result in incorrect results. Currently
> amdgpu_amdkfd_get_xgmi_bandwidth is always called with is_min=true if
> src==NULL, and with is_min=false if src!=NULL. The intention is, that we
> assume a single XGMI link in the case that src==NULL. That means the
> is_min parameter is redundant. What we should do instead is, assume that
> num_links==1 if src==NULL, and drop the is_min parameter.
>
> That would keep things working the way they do now, and prevent
> potential problems in the future.
>
> Regards,
>    Felix
>
>
> >
> > Signed-off-by: Grigory Vasilyev <h0tc0d3@gmail.com>
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 5 ++---
> >   1 file changed, 2 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> > index be1a55f8b8c5..1a1006b18016 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> > @@ -541,11 +541,10 @@ int amdgpu_amdkfd_get_xgmi_bandwidth_mbytes(struct amdgpu_device *dst,
> >       struct amdgpu_device *adev = dst, *peer_adev;
> >       int num_links;
> >
> > -     if (adev->asic_type != CHIP_ALDEBARAN)
> > +     if (!src || adev->asic_type != CHIP_ALDEBARAN)
> >               return 0;
> >
> > -     if (src)
> > -             peer_adev = src;
> > +     peer_adev = src;
> >
> >       /* num links returns 0 for indirect peers since indirect route is unknown. */
> >       num_links = is_min ? 1 : amdgpu_xgmi_get_num_links(adev, peer_adev);

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2022-04-07  7:32 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-04 22:21 [PATCH] drm/amdkfd: Fix potential NULL pointer dereference Grigory Vasilyev
2022-04-04 22:21 ` Grigory Vasilyev
2022-04-04 22:21 ` Grigory Vasilyev
2022-04-05 17:00 ` Felix Kuehling
2022-04-05 17:00   ` Felix Kuehling
2022-04-06 13:41   ` Григорий
2022-04-06 13:41     ` Григорий

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.