* [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems @ 2021-03-29 17:53 ` Xℹ Ruoyao 0 siblings, 0 replies; 54+ messages in thread From: Xℹ Ruoyao @ 2021-03-29 17:53 UTC (permalink / raw) To: Alex Deucher, Christian König Cc: David Airlie, Daniel Vetter, amd-gfx, dri-devel, linux-kernel, Felix Kuehling, Xℹ Ruoyao, Dan Horák, stable If the initial value of `num_entires` (calculated at line 1654) is not an integral multiple of `AMDGPU_GPU_PAGES_IN_CPU_PAGE`, in line 1681 a value greater than the initial value will be assigned to it. That causes `start > last + 1` after line 1708. Then in the next iteration an underflow happens at line 1654. It causes message *ERROR* Couldn't update BO_VA (-12) printed in kernel log, and GPU hanging. Fortify the criteria of the loop to fix this issue. BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1549 Fixes: a39f2a8d7066 ("drm/amdgpu: nuke amdgpu_vm_bo_split_mapping v2") Reported-by: Xi Ruoyao <xry111@mengyan1223.wang> Reported-by: Dan Horák <dan@danny.cz> Cc: stable@vger.kernel.org Signed-off-by: Xi Ruoyao <xry111@mengyan1223.wang> --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c index ad91c0c3c423..cee0cc9c8085 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c @@ -1707,7 +1707,7 @@ static int amdgpu_vm_bo_update_mapping(struct amdgpu_device *adev, } start = tmp; - } while (unlikely(start != last + 1)); + } while (unlikely(start < last + 1)); r = vm->update_funcs->commit(¶ms, fence); base-commit: a5e13c6df0e41702d2b2c77c8ad41677ebb065b3 -- 2.31.1 ^ permalink raw reply related [flat|nested] 54+ messages in thread
* [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems @ 2021-03-29 17:53 ` Xℹ Ruoyao 0 siblings, 0 replies; 54+ messages in thread From: Xℹ Ruoyao @ 2021-03-29 17:53 UTC (permalink / raw) To: Alex Deucher, Christian König Cc: David Airlie, Felix Kuehling, linux-kernel, dri-devel, Dan Horák, amd-gfx, Daniel Vetter, stable, Xℹ Ruoyao If the initial value of `num_entires` (calculated at line 1654) is not an integral multiple of `AMDGPU_GPU_PAGES_IN_CPU_PAGE`, in line 1681 a value greater than the initial value will be assigned to it. That causes `start > last + 1` after line 1708. Then in the next iteration an underflow happens at line 1654. It causes message *ERROR* Couldn't update BO_VA (-12) printed in kernel log, and GPU hanging. Fortify the criteria of the loop to fix this issue. BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1549 Fixes: a39f2a8d7066 ("drm/amdgpu: nuke amdgpu_vm_bo_split_mapping v2") Reported-by: Xi Ruoyao <xry111@mengyan1223.wang> Reported-by: Dan Horák <dan@danny.cz> Cc: stable@vger.kernel.org Signed-off-by: Xi Ruoyao <xry111@mengyan1223.wang> --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c index ad91c0c3c423..cee0cc9c8085 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c @@ -1707,7 +1707,7 @@ static int amdgpu_vm_bo_update_mapping(struct amdgpu_device *adev, } start = tmp; - } while (unlikely(start != last + 1)); + } while (unlikely(start < last + 1)); r = vm->update_funcs->commit(¶ms, fence); base-commit: a5e13c6df0e41702d2b2c77c8ad41677ebb065b3 -- 2.31.1 _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply related [flat|nested] 54+ messages in thread
* [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems @ 2021-03-29 17:53 ` Xℹ Ruoyao 0 siblings, 0 replies; 54+ messages in thread From: Xℹ Ruoyao @ 2021-03-29 17:53 UTC (permalink / raw) To: Alex Deucher, Christian König Cc: David Airlie, Felix Kuehling, linux-kernel, dri-devel, Dan Horák, amd-gfx, stable, Xℹ Ruoyao If the initial value of `num_entires` (calculated at line 1654) is not an integral multiple of `AMDGPU_GPU_PAGES_IN_CPU_PAGE`, in line 1681 a value greater than the initial value will be assigned to it. That causes `start > last + 1` after line 1708. Then in the next iteration an underflow happens at line 1654. It causes message *ERROR* Couldn't update BO_VA (-12) printed in kernel log, and GPU hanging. Fortify the criteria of the loop to fix this issue. BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1549 Fixes: a39f2a8d7066 ("drm/amdgpu: nuke amdgpu_vm_bo_split_mapping v2") Reported-by: Xi Ruoyao <xry111@mengyan1223.wang> Reported-by: Dan Horák <dan@danny.cz> Cc: stable@vger.kernel.org Signed-off-by: Xi Ruoyao <xry111@mengyan1223.wang> --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c index ad91c0c3c423..cee0cc9c8085 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c @@ -1707,7 +1707,7 @@ static int amdgpu_vm_bo_update_mapping(struct amdgpu_device *adev, } start = tmp; - } while (unlikely(start != last + 1)); + } while (unlikely(start < last + 1)); r = vm->update_funcs->commit(¶ms, fence); base-commit: a5e13c6df0e41702d2b2c77c8ad41677ebb065b3 -- 2.31.1 _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply related [flat|nested] 54+ messages in thread
* Re: [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems 2021-03-29 17:53 ` Xℹ Ruoyao (?) @ 2021-03-29 18:04 ` Christian König -1 siblings, 0 replies; 54+ messages in thread From: Christian König @ 2021-03-29 18:04 UTC (permalink / raw) To: Xℹ Ruoyao, Alex Deucher, Christian König Cc: David Airlie, Felix Kuehling, linux-kernel, dri-devel, Dan Horák, amd-gfx, Daniel Vetter, stable Am 29.03.21 um 19:53 schrieb Xℹ Ruoyao: > If the initial value of `num_entires` (calculated at line 1654) is not > an integral multiple of `AMDGPU_GPU_PAGES_IN_CPU_PAGE`, in line 1681 a > value greater than the initial value will be assigned to it. That causes > `start > last + 1` after line 1708. Then in the next iteration an > underflow happens at line 1654. It causes message > > *ERROR* Couldn't update BO_VA (-12) > > printed in kernel log, and GPU hanging. > > Fortify the criteria of the loop to fix this issue. NAK the value of num_entries must always be a multiple of AMDGPU_GPU_PAGES_IN_CPU_PAGE or otherwise we corrupt the page tables. How do you trigger that? Christian. > > BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1549 > Fixes: a39f2a8d7066 ("drm/amdgpu: nuke amdgpu_vm_bo_split_mapping v2") > Reported-by: Xi Ruoyao <xry111@mengyan1223.wang> > Reported-by: Dan Horák <dan@danny.cz> > Cc: stable@vger.kernel.org > Signed-off-by: Xi Ruoyao <xry111@mengyan1223.wang> > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > index ad91c0c3c423..cee0cc9c8085 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > @@ -1707,7 +1707,7 @@ static int amdgpu_vm_bo_update_mapping(struct amdgpu_device *adev, > } > start = tmp; > > - } while (unlikely(start != last + 1)); > + } while (unlikely(start < last + 1)); > > r = vm->update_funcs->commit(¶ms, fence); > > > base-commit: a5e13c6df0e41702d2b2c77c8ad41677ebb065b3 ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems @ 2021-03-29 18:04 ` Christian König 0 siblings, 0 replies; 54+ messages in thread From: Christian König @ 2021-03-29 18:04 UTC (permalink / raw) To: Xℹ Ruoyao, Alex Deucher, Christian König Cc: David Airlie, Felix Kuehling, linux-kernel, amd-gfx, Dan Horák, dri-devel, Daniel Vetter, stable Am 29.03.21 um 19:53 schrieb Xℹ Ruoyao: > If the initial value of `num_entires` (calculated at line 1654) is not > an integral multiple of `AMDGPU_GPU_PAGES_IN_CPU_PAGE`, in line 1681 a > value greater than the initial value will be assigned to it. That causes > `start > last + 1` after line 1708. Then in the next iteration an > underflow happens at line 1654. It causes message > > *ERROR* Couldn't update BO_VA (-12) > > printed in kernel log, and GPU hanging. > > Fortify the criteria of the loop to fix this issue. NAK the value of num_entries must always be a multiple of AMDGPU_GPU_PAGES_IN_CPU_PAGE or otherwise we corrupt the page tables. How do you trigger that? Christian. > > BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1549 > Fixes: a39f2a8d7066 ("drm/amdgpu: nuke amdgpu_vm_bo_split_mapping v2") > Reported-by: Xi Ruoyao <xry111@mengyan1223.wang> > Reported-by: Dan Horák <dan@danny.cz> > Cc: stable@vger.kernel.org > Signed-off-by: Xi Ruoyao <xry111@mengyan1223.wang> > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > index ad91c0c3c423..cee0cc9c8085 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > @@ -1707,7 +1707,7 @@ static int amdgpu_vm_bo_update_mapping(struct amdgpu_device *adev, > } > start = tmp; > > - } while (unlikely(start != last + 1)); > + } while (unlikely(start < last + 1)); > > r = vm->update_funcs->commit(¶ms, fence); > > > base-commit: a5e13c6df0e41702d2b2c77c8ad41677ebb065b3 _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems @ 2021-03-29 18:04 ` Christian König 0 siblings, 0 replies; 54+ messages in thread From: Christian König @ 2021-03-29 18:04 UTC (permalink / raw) To: Xℹ Ruoyao, Alex Deucher, Christian König Cc: David Airlie, Felix Kuehling, linux-kernel, amd-gfx, Dan Horák, dri-devel, stable Am 29.03.21 um 19:53 schrieb Xℹ Ruoyao: > If the initial value of `num_entires` (calculated at line 1654) is not > an integral multiple of `AMDGPU_GPU_PAGES_IN_CPU_PAGE`, in line 1681 a > value greater than the initial value will be assigned to it. That causes > `start > last + 1` after line 1708. Then in the next iteration an > underflow happens at line 1654. It causes message > > *ERROR* Couldn't update BO_VA (-12) > > printed in kernel log, and GPU hanging. > > Fortify the criteria of the loop to fix this issue. NAK the value of num_entries must always be a multiple of AMDGPU_GPU_PAGES_IN_CPU_PAGE or otherwise we corrupt the page tables. How do you trigger that? Christian. > > BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1549 > Fixes: a39f2a8d7066 ("drm/amdgpu: nuke amdgpu_vm_bo_split_mapping v2") > Reported-by: Xi Ruoyao <xry111@mengyan1223.wang> > Reported-by: Dan Horák <dan@danny.cz> > Cc: stable@vger.kernel.org > Signed-off-by: Xi Ruoyao <xry111@mengyan1223.wang> > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > index ad91c0c3c423..cee0cc9c8085 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > @@ -1707,7 +1707,7 @@ static int amdgpu_vm_bo_update_mapping(struct amdgpu_device *adev, > } > start = tmp; > > - } while (unlikely(start != last + 1)); > + } while (unlikely(start < last + 1)); > > r = vm->update_funcs->commit(¶ms, fence); > > > base-commit: a5e13c6df0e41702d2b2c77c8ad41677ebb065b3 _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems 2021-03-29 18:04 ` Christian König (?) @ 2021-03-29 18:08 ` Xi Ruoyao -1 siblings, 0 replies; 54+ messages in thread From: Xi Ruoyao @ 2021-03-29 18:08 UTC (permalink / raw) To: Christian König, Alex Deucher, Christian König Cc: David Airlie, Felix Kuehling, linux-kernel, dri-devel, Dan Horák, amd-gfx, Daniel Vetter, stable On 2021-03-29 20:04 +0200, Christian König wrote: > Am 29.03.21 um 19:53 schrieb Xℹ Ruoyao: > > If the initial value of `num_entires` (calculated at line 1654) is not > > an integral multiple of `AMDGPU_GPU_PAGES_IN_CPU_PAGE`, in line 1681 a > > value greater than the initial value will be assigned to it. That causes > > `start > last + 1` after line 1708. Then in the next iteration an > > underflow happens at line 1654. It causes message > > > > *ERROR* Couldn't update BO_VA (-12) > > > > printed in kernel log, and GPU hanging. > > > > Fortify the criteria of the loop to fix this issue. > > NAK the value of num_entries must always be a multiple of > AMDGPU_GPU_PAGES_IN_CPU_PAGE or otherwise we corrupt the page tables. > > How do you trigger that? Simply run "OpenGL area" from gtk3-demo (which just renders a triangle with GL) under Xorg, on MIPS64. See the BugLink. > Christian. > > > > > BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1549 > > Fixes: a39f2a8d7066 ("drm/amdgpu: nuke amdgpu_vm_bo_split_mapping v2") > > Reported-by: Xi Ruoyao <xry111@mengyan1223.wang> > > Reported-by: Dan Horák <dan@danny.cz> > > Cc: stable@vger.kernel.org > > Signed-off-by: Xi Ruoyao <xry111@mengyan1223.wang> > > --- > > drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > > index ad91c0c3c423..cee0cc9c8085 100644 > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > > @@ -1707,7 +1707,7 @@ static int amdgpu_vm_bo_update_mapping(struct > > amdgpu_device *adev, > > } > > start = tmp; > > > > - } while (unlikely(start != last + 1)); > > + } while (unlikely(start < last + 1)); > > > > r = vm->update_funcs->commit(¶ms, fence); > > > > > > base-commit: a5e13c6df0e41702d2b2c77c8ad41677ebb065b3 > -- Xi Ruoyao <xry111@mengyan1223.wang> School of Aerospace Science and Technology, Xidian University ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems @ 2021-03-29 18:08 ` Xi Ruoyao 0 siblings, 0 replies; 54+ messages in thread From: Xi Ruoyao @ 2021-03-29 18:08 UTC (permalink / raw) To: Christian König, Alex Deucher, Christian König Cc: David Airlie, Felix Kuehling, linux-kernel, amd-gfx, Dan Horák, dri-devel, Daniel Vetter, stable On 2021-03-29 20:04 +0200, Christian König wrote: > Am 29.03.21 um 19:53 schrieb Xℹ Ruoyao: > > If the initial value of `num_entires` (calculated at line 1654) is not > > an integral multiple of `AMDGPU_GPU_PAGES_IN_CPU_PAGE`, in line 1681 a > > value greater than the initial value will be assigned to it. That causes > > `start > last + 1` after line 1708. Then in the next iteration an > > underflow happens at line 1654. It causes message > > > > *ERROR* Couldn't update BO_VA (-12) > > > > printed in kernel log, and GPU hanging. > > > > Fortify the criteria of the loop to fix this issue. > > NAK the value of num_entries must always be a multiple of > AMDGPU_GPU_PAGES_IN_CPU_PAGE or otherwise we corrupt the page tables. > > How do you trigger that? Simply run "OpenGL area" from gtk3-demo (which just renders a triangle with GL) under Xorg, on MIPS64. See the BugLink. > Christian. > > > > > BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1549 > > Fixes: a39f2a8d7066 ("drm/amdgpu: nuke amdgpu_vm_bo_split_mapping v2") > > Reported-by: Xi Ruoyao <xry111@mengyan1223.wang> > > Reported-by: Dan Horák <dan@danny.cz> > > Cc: stable@vger.kernel.org > > Signed-off-by: Xi Ruoyao <xry111@mengyan1223.wang> > > --- > > drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > > index ad91c0c3c423..cee0cc9c8085 100644 > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > > @@ -1707,7 +1707,7 @@ static int amdgpu_vm_bo_update_mapping(struct > > amdgpu_device *adev, > > } > > start = tmp; > > > > - } while (unlikely(start != last + 1)); > > + } while (unlikely(start < last + 1)); > > > > r = vm->update_funcs->commit(¶ms, fence); > > > > > > base-commit: a5e13c6df0e41702d2b2c77c8ad41677ebb065b3 > -- Xi Ruoyao <xry111@mengyan1223.wang> School of Aerospace Science and Technology, Xidian University _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems @ 2021-03-29 18:08 ` Xi Ruoyao 0 siblings, 0 replies; 54+ messages in thread From: Xi Ruoyao @ 2021-03-29 18:08 UTC (permalink / raw) To: Christian König, Alex Deucher, Christian König Cc: David Airlie, Felix Kuehling, linux-kernel, amd-gfx, Dan Horák, dri-devel, stable On 2021-03-29 20:04 +0200, Christian König wrote: > Am 29.03.21 um 19:53 schrieb Xℹ Ruoyao: > > If the initial value of `num_entires` (calculated at line 1654) is not > > an integral multiple of `AMDGPU_GPU_PAGES_IN_CPU_PAGE`, in line 1681 a > > value greater than the initial value will be assigned to it. That causes > > `start > last + 1` after line 1708. Then in the next iteration an > > underflow happens at line 1654. It causes message > > > > *ERROR* Couldn't update BO_VA (-12) > > > > printed in kernel log, and GPU hanging. > > > > Fortify the criteria of the loop to fix this issue. > > NAK the value of num_entries must always be a multiple of > AMDGPU_GPU_PAGES_IN_CPU_PAGE or otherwise we corrupt the page tables. > > How do you trigger that? Simply run "OpenGL area" from gtk3-demo (which just renders a triangle with GL) under Xorg, on MIPS64. See the BugLink. > Christian. > > > > > BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1549 > > Fixes: a39f2a8d7066 ("drm/amdgpu: nuke amdgpu_vm_bo_split_mapping v2") > > Reported-by: Xi Ruoyao <xry111@mengyan1223.wang> > > Reported-by: Dan Horák <dan@danny.cz> > > Cc: stable@vger.kernel.org > > Signed-off-by: Xi Ruoyao <xry111@mengyan1223.wang> > > --- > > drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > > index ad91c0c3c423..cee0cc9c8085 100644 > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > > @@ -1707,7 +1707,7 @@ static int amdgpu_vm_bo_update_mapping(struct > > amdgpu_device *adev, > > } > > start = tmp; > > > > - } while (unlikely(start != last + 1)); > > + } while (unlikely(start < last + 1)); > > > > r = vm->update_funcs->commit(¶ms, fence); > > > > > > base-commit: a5e13c6df0e41702d2b2c77c8ad41677ebb065b3 > -- Xi Ruoyao <xry111@mengyan1223.wang> School of Aerospace Science and Technology, Xidian University _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems 2021-03-29 18:08 ` Xi Ruoyao (?) @ 2021-03-29 18:10 ` Christian König -1 siblings, 0 replies; 54+ messages in thread From: Christian König @ 2021-03-29 18:10 UTC (permalink / raw) To: Xi Ruoyao, Alex Deucher, Christian König Cc: David Airlie, Felix Kuehling, linux-kernel, dri-devel, Dan Horák, amd-gfx, Daniel Vetter, stable Am 29.03.21 um 20:08 schrieb Xi Ruoyao: > On 2021-03-29 20:04 +0200, Christian König wrote: >> Am 29.03.21 um 19:53 schrieb Xℹ Ruoyao: >>> If the initial value of `num_entires` (calculated at line 1654) is not >>> an integral multiple of `AMDGPU_GPU_PAGES_IN_CPU_PAGE`, in line 1681 a >>> value greater than the initial value will be assigned to it. That causes >>> `start > last + 1` after line 1708. Then in the next iteration an >>> underflow happens at line 1654. It causes message >>> >>> *ERROR* Couldn't update BO_VA (-12) >>> >>> printed in kernel log, and GPU hanging. >>> >>> Fortify the criteria of the loop to fix this issue. >> NAK the value of num_entries must always be a multiple of >> AMDGPU_GPU_PAGES_IN_CPU_PAGE or otherwise we corrupt the page tables. >> >> How do you trigger that? > Simply run "OpenGL area" from gtk3-demo (which just renders a triangle with GL) > under Xorg, on MIPS64. See the BugLink. You need to identify the root cause of this, most likely start or last are not a multiple of AMDGPU_GPU_PAGES_IN_CPU_PAGE. Christian. > >> Christian. >> >>> BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1549 >>> Fixes: a39f2a8d7066 ("drm/amdgpu: nuke amdgpu_vm_bo_split_mapping v2") >>> Reported-by: Xi Ruoyao <xry111@mengyan1223.wang> >>> Reported-by: Dan Horák <dan@danny.cz> >>> Cc: stable@vger.kernel.org >>> Signed-off-by: Xi Ruoyao <xry111@mengyan1223.wang> >>> --- >>> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c >>> index ad91c0c3c423..cee0cc9c8085 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c >>> @@ -1707,7 +1707,7 @@ static int amdgpu_vm_bo_update_mapping(struct >>> amdgpu_device *adev, >>> } >>> start = tmp; >>> >>> - } while (unlikely(start != last + 1)); >>> + } while (unlikely(start < last + 1)); >>> >>> r = vm->update_funcs->commit(¶ms, fence); >>> >>> >>> base-commit: a5e13c6df0e41702d2b2c77c8ad41677ebb065b3 ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems @ 2021-03-29 18:10 ` Christian König 0 siblings, 0 replies; 54+ messages in thread From: Christian König @ 2021-03-29 18:10 UTC (permalink / raw) To: Xi Ruoyao, Alex Deucher, Christian König Cc: David Airlie, Felix Kuehling, linux-kernel, amd-gfx, Dan Horák, dri-devel, Daniel Vetter, stable Am 29.03.21 um 20:08 schrieb Xi Ruoyao: > On 2021-03-29 20:04 +0200, Christian König wrote: >> Am 29.03.21 um 19:53 schrieb Xℹ Ruoyao: >>> If the initial value of `num_entires` (calculated at line 1654) is not >>> an integral multiple of `AMDGPU_GPU_PAGES_IN_CPU_PAGE`, in line 1681 a >>> value greater than the initial value will be assigned to it. That causes >>> `start > last + 1` after line 1708. Then in the next iteration an >>> underflow happens at line 1654. It causes message >>> >>> *ERROR* Couldn't update BO_VA (-12) >>> >>> printed in kernel log, and GPU hanging. >>> >>> Fortify the criteria of the loop to fix this issue. >> NAK the value of num_entries must always be a multiple of >> AMDGPU_GPU_PAGES_IN_CPU_PAGE or otherwise we corrupt the page tables. >> >> How do you trigger that? > Simply run "OpenGL area" from gtk3-demo (which just renders a triangle with GL) > under Xorg, on MIPS64. See the BugLink. You need to identify the root cause of this, most likely start or last are not a multiple of AMDGPU_GPU_PAGES_IN_CPU_PAGE. Christian. > >> Christian. >> >>> BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1549 >>> Fixes: a39f2a8d7066 ("drm/amdgpu: nuke amdgpu_vm_bo_split_mapping v2") >>> Reported-by: Xi Ruoyao <xry111@mengyan1223.wang> >>> Reported-by: Dan Horák <dan@danny.cz> >>> Cc: stable@vger.kernel.org >>> Signed-off-by: Xi Ruoyao <xry111@mengyan1223.wang> >>> --- >>> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c >>> index ad91c0c3c423..cee0cc9c8085 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c >>> @@ -1707,7 +1707,7 @@ static int amdgpu_vm_bo_update_mapping(struct >>> amdgpu_device *adev, >>> } >>> start = tmp; >>> >>> - } while (unlikely(start != last + 1)); >>> + } while (unlikely(start < last + 1)); >>> >>> r = vm->update_funcs->commit(¶ms, fence); >>> >>> >>> base-commit: a5e13c6df0e41702d2b2c77c8ad41677ebb065b3 _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems @ 2021-03-29 18:10 ` Christian König 0 siblings, 0 replies; 54+ messages in thread From: Christian König @ 2021-03-29 18:10 UTC (permalink / raw) To: Xi Ruoyao, Alex Deucher, Christian König Cc: David Airlie, Felix Kuehling, linux-kernel, amd-gfx, Dan Horák, dri-devel, stable Am 29.03.21 um 20:08 schrieb Xi Ruoyao: > On 2021-03-29 20:04 +0200, Christian König wrote: >> Am 29.03.21 um 19:53 schrieb Xℹ Ruoyao: >>> If the initial value of `num_entires` (calculated at line 1654) is not >>> an integral multiple of `AMDGPU_GPU_PAGES_IN_CPU_PAGE`, in line 1681 a >>> value greater than the initial value will be assigned to it. That causes >>> `start > last + 1` after line 1708. Then in the next iteration an >>> underflow happens at line 1654. It causes message >>> >>> *ERROR* Couldn't update BO_VA (-12) >>> >>> printed in kernel log, and GPU hanging. >>> >>> Fortify the criteria of the loop to fix this issue. >> NAK the value of num_entries must always be a multiple of >> AMDGPU_GPU_PAGES_IN_CPU_PAGE or otherwise we corrupt the page tables. >> >> How do you trigger that? > Simply run "OpenGL area" from gtk3-demo (which just renders a triangle with GL) > under Xorg, on MIPS64. See the BugLink. You need to identify the root cause of this, most likely start or last are not a multiple of AMDGPU_GPU_PAGES_IN_CPU_PAGE. Christian. > >> Christian. >> >>> BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1549 >>> Fixes: a39f2a8d7066 ("drm/amdgpu: nuke amdgpu_vm_bo_split_mapping v2") >>> Reported-by: Xi Ruoyao <xry111@mengyan1223.wang> >>> Reported-by: Dan Horák <dan@danny.cz> >>> Cc: stable@vger.kernel.org >>> Signed-off-by: Xi Ruoyao <xry111@mengyan1223.wang> >>> --- >>> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c >>> index ad91c0c3c423..cee0cc9c8085 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c >>> @@ -1707,7 +1707,7 @@ static int amdgpu_vm_bo_update_mapping(struct >>> amdgpu_device *adev, >>> } >>> start = tmp; >>> >>> - } while (unlikely(start != last + 1)); >>> + } while (unlikely(start < last + 1)); >>> >>> r = vm->update_funcs->commit(¶ms, fence); >>> >>> >>> base-commit: a5e13c6df0e41702d2b2c77c8ad41677ebb065b3 _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems 2021-03-29 18:10 ` Christian König (?) @ 2021-03-29 18:21 ` Xi Ruoyao -1 siblings, 0 replies; 54+ messages in thread From: Xi Ruoyao @ 2021-03-29 18:21 UTC (permalink / raw) To: Christian König, Alex Deucher, Christian König Cc: David Airlie, Felix Kuehling, linux-kernel, dri-devel, Dan Horák, amd-gfx, Daniel Vetter, stable On 2021-03-29 20:10 +0200, Christian König wrote: > You need to identify the root cause of this, most likely start or last > are not a multiple of AMDGPU_GPU_PAGES_IN_CPU_PAGE. I printk'ed the value of start & last, they are all a multiple of 4 (AMDGPU_GPU_PAGES_IN_CPU_PAGE). However... `num_entries = last - start + 1` so it became some irrational thing... Either this line is wrong, or someone called amdgpu_vm_bo_update_mapping with [start, last) instead of [start, last], which is unexpected. > > > > BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1549 > > > > Fixes: a39f2a8d7066 ("drm/amdgpu: nuke amdgpu_vm_bo_split_mapping v2") > > > > Reported-by: Xi Ruoyao <xry111@mengyan1223.wang> > > > > Reported-by: Dan Horák <dan@danny.cz> > > > > Cc: stable@vger.kernel.org > > > > Signed-off-by: Xi Ruoyao <xry111@mengyan1223.wang> > > > > --- > > > > drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +- > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > > > > index ad91c0c3c423..cee0cc9c8085 100644 > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > > > > @@ -1707,7 +1707,7 @@ static int amdgpu_vm_bo_update_mapping(struct > > > > amdgpu_device *adev, > > > > } > > > > start = tmp; > > > > > > > > - } while (unlikely(start != last + 1)); > > > > + } while (unlikely(start < last + 1)); > > > > > > > > r = vm->update_funcs->commit(¶ms, fence); > > > > > > > > > > > > base-commit: a5e13c6df0e41702d2b2c77c8ad41677ebb065b3 > -- Xi Ruoyao <xry111@mengyan1223.wang> School of Aerospace Science and Technology, Xidian University ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems @ 2021-03-29 18:21 ` Xi Ruoyao 0 siblings, 0 replies; 54+ messages in thread From: Xi Ruoyao @ 2021-03-29 18:21 UTC (permalink / raw) To: Christian König, Alex Deucher, Christian König Cc: David Airlie, Felix Kuehling, linux-kernel, amd-gfx, Dan Horák, dri-devel, Daniel Vetter, stable On 2021-03-29 20:10 +0200, Christian König wrote: > You need to identify the root cause of this, most likely start or last > are not a multiple of AMDGPU_GPU_PAGES_IN_CPU_PAGE. I printk'ed the value of start & last, they are all a multiple of 4 (AMDGPU_GPU_PAGES_IN_CPU_PAGE). However... `num_entries = last - start + 1` so it became some irrational thing... Either this line is wrong, or someone called amdgpu_vm_bo_update_mapping with [start, last) instead of [start, last], which is unexpected. > > > > BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1549 > > > > Fixes: a39f2a8d7066 ("drm/amdgpu: nuke amdgpu_vm_bo_split_mapping v2") > > > > Reported-by: Xi Ruoyao <xry111@mengyan1223.wang> > > > > Reported-by: Dan Horák <dan@danny.cz> > > > > Cc: stable@vger.kernel.org > > > > Signed-off-by: Xi Ruoyao <xry111@mengyan1223.wang> > > > > --- > > > > drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +- > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > > > > index ad91c0c3c423..cee0cc9c8085 100644 > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > > > > @@ -1707,7 +1707,7 @@ static int amdgpu_vm_bo_update_mapping(struct > > > > amdgpu_device *adev, > > > > } > > > > start = tmp; > > > > > > > > - } while (unlikely(start != last + 1)); > > > > + } while (unlikely(start < last + 1)); > > > > > > > > r = vm->update_funcs->commit(¶ms, fence); > > > > > > > > > > > > base-commit: a5e13c6df0e41702d2b2c77c8ad41677ebb065b3 > -- Xi Ruoyao <xry111@mengyan1223.wang> School of Aerospace Science and Technology, Xidian University _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems @ 2021-03-29 18:21 ` Xi Ruoyao 0 siblings, 0 replies; 54+ messages in thread From: Xi Ruoyao @ 2021-03-29 18:21 UTC (permalink / raw) To: Christian König, Alex Deucher, Christian König Cc: David Airlie, Felix Kuehling, linux-kernel, amd-gfx, Dan Horák, dri-devel, stable On 2021-03-29 20:10 +0200, Christian König wrote: > You need to identify the root cause of this, most likely start or last > are not a multiple of AMDGPU_GPU_PAGES_IN_CPU_PAGE. I printk'ed the value of start & last, they are all a multiple of 4 (AMDGPU_GPU_PAGES_IN_CPU_PAGE). However... `num_entries = last - start + 1` so it became some irrational thing... Either this line is wrong, or someone called amdgpu_vm_bo_update_mapping with [start, last) instead of [start, last], which is unexpected. > > > > BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1549 > > > > Fixes: a39f2a8d7066 ("drm/amdgpu: nuke amdgpu_vm_bo_split_mapping v2") > > > > Reported-by: Xi Ruoyao <xry111@mengyan1223.wang> > > > > Reported-by: Dan Horák <dan@danny.cz> > > > > Cc: stable@vger.kernel.org > > > > Signed-off-by: Xi Ruoyao <xry111@mengyan1223.wang> > > > > --- > > > > drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +- > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > > > > index ad91c0c3c423..cee0cc9c8085 100644 > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > > > > @@ -1707,7 +1707,7 @@ static int amdgpu_vm_bo_update_mapping(struct > > > > amdgpu_device *adev, > > > > } > > > > start = tmp; > > > > > > > > - } while (unlikely(start != last + 1)); > > > > + } while (unlikely(start < last + 1)); > > > > > > > > r = vm->update_funcs->commit(¶ms, fence); > > > > > > > > > > > > base-commit: a5e13c6df0e41702d2b2c77c8ad41677ebb065b3 > -- Xi Ruoyao <xry111@mengyan1223.wang> School of Aerospace Science and Technology, Xidian University _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems 2021-03-29 18:21 ` Xi Ruoyao (?) @ 2021-03-29 18:30 ` Xi Ruoyao -1 siblings, 0 replies; 54+ messages in thread From: Xi Ruoyao @ 2021-03-29 18:30 UTC (permalink / raw) To: Christian König, Alex Deucher, Christian König Cc: David Airlie, Felix Kuehling, linux-kernel, dri-devel, Dan Horák, amd-gfx, Daniel Vetter, stable On 2021-03-30 02:21 +0800, Xi Ruoyao wrote: > On 2021-03-29 20:10 +0200, Christian König wrote: > > You need to identify the root cause of this, most likely start or last > > are not a multiple of AMDGPU_GPU_PAGES_IN_CPU_PAGE. > > I printk'ed the value of start & last, they are all a multiple of 4 > (AMDGPU_GPU_PAGES_IN_CPU_PAGE). > > However... `num_entries = last - start + 1` so it became some irrational > thing... Either this line is wrong, or someone called > amdgpu_vm_bo_update_mapping with [start, last) instead of [start, last], which > is unexpected. I added BUG_ON(num_entries % AMDGPU_GPU_PAGES_IN_CPU_PAGE != 0), get: > Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff83c90750>] > amdgpu_vm_bo_update_mapping.constprop.0+0x218/0xae8 > Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff83c922b8>] > amdgpu_vm_bo_update+0x270/0x4c0 > Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff83c823ec>] > amdgpu_gem_va_ioctl+0x40c/0x430 > Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff83c1cae4>] > drm_ioctl_kernel+0xcc/0x120 > Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff83c1cfd8>] > drm_ioctl+0x220/0x408 > Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff83c5ea48>] > amdgpu_drm_ioctl+0x58/0x98 > Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff838feac4>] sys_ioctl+0xcc/0xe8 > Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff836e74f0>] > syscall_common+0x34/0x58 > > > > > > BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1549 > > > > > Fixes: a39f2a8d7066 ("drm/amdgpu: nuke amdgpu_vm_bo_split_mapping v2") > > > > > Reported-by: Xi Ruoyao <xry111@mengyan1223.wang> > > > > > Reported-by: Dan Horák <dan@danny.cz> > > > > > Cc: stable@vger.kernel.org > > > > > Signed-off-by: Xi Ruoyao <xry111@mengyan1223.wang> > > > > > --- > > > > > drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +- > > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > > > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > > > > > index ad91c0c3c423..cee0cc9c8085 100644 > > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > > > > > @@ -1707,7 +1707,7 @@ static int amdgpu_vm_bo_update_mapping(struct > > > > > amdgpu_device *adev, > > > > > } > > > > > start = tmp; > > > > > > > > > > - } while (unlikely(start != last + 1)); > > > > > + } while (unlikely(start < last + 1)); > > > > > > > > > > r = vm->update_funcs->commit(¶ms, fence); > > > > > > > > > > > > > > > base-commit: a5e13c6df0e41702d2b2c77c8ad41677ebb065b3 > > > -- Xi Ruoyao <xry111@mengyan1223.wang> School of Aerospace Science and Technology, Xidian University ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems @ 2021-03-29 18:30 ` Xi Ruoyao 0 siblings, 0 replies; 54+ messages in thread From: Xi Ruoyao @ 2021-03-29 18:30 UTC (permalink / raw) To: Christian König, Alex Deucher, Christian König Cc: David Airlie, Felix Kuehling, linux-kernel, amd-gfx, Dan Horák, dri-devel, Daniel Vetter, stable On 2021-03-30 02:21 +0800, Xi Ruoyao wrote: > On 2021-03-29 20:10 +0200, Christian König wrote: > > You need to identify the root cause of this, most likely start or last > > are not a multiple of AMDGPU_GPU_PAGES_IN_CPU_PAGE. > > I printk'ed the value of start & last, they are all a multiple of 4 > (AMDGPU_GPU_PAGES_IN_CPU_PAGE). > > However... `num_entries = last - start + 1` so it became some irrational > thing... Either this line is wrong, or someone called > amdgpu_vm_bo_update_mapping with [start, last) instead of [start, last], which > is unexpected. I added BUG_ON(num_entries % AMDGPU_GPU_PAGES_IN_CPU_PAGE != 0), get: > Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff83c90750>] > amdgpu_vm_bo_update_mapping.constprop.0+0x218/0xae8 > Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff83c922b8>] > amdgpu_vm_bo_update+0x270/0x4c0 > Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff83c823ec>] > amdgpu_gem_va_ioctl+0x40c/0x430 > Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff83c1cae4>] > drm_ioctl_kernel+0xcc/0x120 > Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff83c1cfd8>] > drm_ioctl+0x220/0x408 > Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff83c5ea48>] > amdgpu_drm_ioctl+0x58/0x98 > Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff838feac4>] sys_ioctl+0xcc/0xe8 > Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff836e74f0>] > syscall_common+0x34/0x58 > > > > > > BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1549 > > > > > Fixes: a39f2a8d7066 ("drm/amdgpu: nuke amdgpu_vm_bo_split_mapping v2") > > > > > Reported-by: Xi Ruoyao <xry111@mengyan1223.wang> > > > > > Reported-by: Dan Horák <dan@danny.cz> > > > > > Cc: stable@vger.kernel.org > > > > > Signed-off-by: Xi Ruoyao <xry111@mengyan1223.wang> > > > > > --- > > > > > drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +- > > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > > > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > > > > > index ad91c0c3c423..cee0cc9c8085 100644 > > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > > > > > @@ -1707,7 +1707,7 @@ static int amdgpu_vm_bo_update_mapping(struct > > > > > amdgpu_device *adev, > > > > > } > > > > > start = tmp; > > > > > > > > > > - } while (unlikely(start != last + 1)); > > > > > + } while (unlikely(start < last + 1)); > > > > > > > > > > r = vm->update_funcs->commit(¶ms, fence); > > > > > > > > > > > > > > > base-commit: a5e13c6df0e41702d2b2c77c8ad41677ebb065b3 > > > -- Xi Ruoyao <xry111@mengyan1223.wang> School of Aerospace Science and Technology, Xidian University _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems @ 2021-03-29 18:30 ` Xi Ruoyao 0 siblings, 0 replies; 54+ messages in thread From: Xi Ruoyao @ 2021-03-29 18:30 UTC (permalink / raw) To: Christian König, Alex Deucher, Christian König Cc: David Airlie, Felix Kuehling, linux-kernel, amd-gfx, Dan Horák, dri-devel, stable On 2021-03-30 02:21 +0800, Xi Ruoyao wrote: > On 2021-03-29 20:10 +0200, Christian König wrote: > > You need to identify the root cause of this, most likely start or last > > are not a multiple of AMDGPU_GPU_PAGES_IN_CPU_PAGE. > > I printk'ed the value of start & last, they are all a multiple of 4 > (AMDGPU_GPU_PAGES_IN_CPU_PAGE). > > However... `num_entries = last - start + 1` so it became some irrational > thing... Either this line is wrong, or someone called > amdgpu_vm_bo_update_mapping with [start, last) instead of [start, last], which > is unexpected. I added BUG_ON(num_entries % AMDGPU_GPU_PAGES_IN_CPU_PAGE != 0), get: > Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff83c90750>] > amdgpu_vm_bo_update_mapping.constprop.0+0x218/0xae8 > Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff83c922b8>] > amdgpu_vm_bo_update+0x270/0x4c0 > Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff83c823ec>] > amdgpu_gem_va_ioctl+0x40c/0x430 > Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff83c1cae4>] > drm_ioctl_kernel+0xcc/0x120 > Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff83c1cfd8>] > drm_ioctl+0x220/0x408 > Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff83c5ea48>] > amdgpu_drm_ioctl+0x58/0x98 > Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff838feac4>] sys_ioctl+0xcc/0xe8 > Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff836e74f0>] > syscall_common+0x34/0x58 > > > > > > BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1549 > > > > > Fixes: a39f2a8d7066 ("drm/amdgpu: nuke amdgpu_vm_bo_split_mapping v2") > > > > > Reported-by: Xi Ruoyao <xry111@mengyan1223.wang> > > > > > Reported-by: Dan Horák <dan@danny.cz> > > > > > Cc: stable@vger.kernel.org > > > > > Signed-off-by: Xi Ruoyao <xry111@mengyan1223.wang> > > > > > --- > > > > > drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +- > > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > > > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > > > > > index ad91c0c3c423..cee0cc9c8085 100644 > > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > > > > > @@ -1707,7 +1707,7 @@ static int amdgpu_vm_bo_update_mapping(struct > > > > > amdgpu_device *adev, > > > > > } > > > > > start = tmp; > > > > > > > > > > - } while (unlikely(start != last + 1)); > > > > > + } while (unlikely(start < last + 1)); > > > > > > > > > > r = vm->update_funcs->commit(¶ms, fence); > > > > > > > > > > > > > > > base-commit: a5e13c6df0e41702d2b2c77c8ad41677ebb065b3 > > > -- Xi Ruoyao <xry111@mengyan1223.wang> School of Aerospace Science and Technology, Xidian University _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems 2021-03-29 18:30 ` Xi Ruoyao (?) @ 2021-03-29 19:27 ` Xi Ruoyao -1 siblings, 0 replies; 54+ messages in thread From: Xi Ruoyao @ 2021-03-29 19:27 UTC (permalink / raw) To: Christian König, Alex Deucher, Christian König Cc: David Airlie, Felix Kuehling, linux-kernel, dri-devel, Dan Horák, amd-gfx, Daniel Vetter, stable Hi Christian, I don't think there is any constraint implemented to ensure `num_entries % AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0`. For example, in `amdgpu_vm_bo_map()`: /* validate the parameters */ if (saddr & AMDGPU_GPU_PAGE_MASK || offset & AMDGPU_GPU_PAGE_MASK || size == 0 || size & AMDGPU_GPU_PAGE_MASK) return -EINVAL; /* snip */ saddr /= AMDGPU_GPU_PAGE_SIZE; eaddr /= AMDGPU_GPU_PAGE_SIZE; /* snip */ mapping->start = saddr; mapping->last = eaddr; If we really want to ensure (mapping->last - mapping->start + 1) % AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0, then we should replace "AMDGPU_GPU_PAGE_MASK" in "validate the parameters" with "PAGE_MASK". I tried it and it broke userspace: Xorg startup fails with EINVAL with this change. On 2021-03-30 02:30 +0800, Xi Ruoyao wrote: > On 2021-03-30 02:21 +0800, Xi Ruoyao wrote: > > On 2021-03-29 20:10 +0200, Christian König wrote: > > > You need to identify the root cause of this, most likely start or last > > > are not a multiple of AMDGPU_GPU_PAGES_IN_CPU_PAGE. > > > > I printk'ed the value of start & last, they are all a multiple of 4 > > (AMDGPU_GPU_PAGES_IN_CPU_PAGE). > > > > However... `num_entries = last - start + 1` so it became some irrational > > thing... Either this line is wrong, or someone called > > amdgpu_vm_bo_update_mapping with [start, last) instead of [start, last], which > > is unexpected. > > I added BUG_ON(num_entries % AMDGPU_GPU_PAGES_IN_CPU_PAGE != 0), get: > > > Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff83c90750>] > > amdgpu_vm_bo_update_mapping.constprop.0+0x218/0xae8 > > Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff83c922b8>] > > amdgpu_vm_bo_update+0x270/0x4c0 > > Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff83c823ec>] > > amdgpu_gem_va_ioctl+0x40c/0x430 > > Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff83c1cae4>] > > drm_ioctl_kernel+0xcc/0x120 > > Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff83c1cfd8>] > > drm_ioctl+0x220/0x408 > > Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff83c5ea48>] > > amdgpu_drm_ioctl+0x58/0x98 > > Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff838feac4>] sys_ioctl+0xcc/0xe8 > > Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff836e74f0>] > > syscall_common+0x34/0x58 > > > > > > > > > BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1549 > > > > > > Fixes: a39f2a8d7066 ("drm/amdgpu: nuke amdgpu_vm_bo_split_mapping v2") > > > > > > Reported-by: Xi Ruoyao <xry111@mengyan1223.wang> > > > > > > Reported-by: Dan Horák <dan@danny.cz> > > > > > > Cc: stable@vger.kernel.org > > > > > > Signed-off-by: Xi Ruoyao <xry111@mengyan1223.wang> > > > > > > --- > > > > > > drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +- > > > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > > > > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > > > > > > index ad91c0c3c423..cee0cc9c8085 100644 > > > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > > > > > > @@ -1707,7 +1707,7 @@ static int amdgpu_vm_bo_update_mapping(struct > > > > > > amdgpu_device *adev, > > > > > > } > > > > > > start = tmp; > > > > > > > > > > > > - } while (unlikely(start != last + 1)); > > > > > > + } while (unlikely(start < last + 1)); > > > > > > > > > > > > r = vm->update_funcs->commit(¶ms, fence); > > > > > > > > > > > > > > > > > > base-commit: a5e13c6df0e41702d2b2c77c8ad41677ebb065b3 > > > > > > -- Xi Ruoyao <xry111@mengyan1223.wang> School of Aerospace Science and Technology, Xidian University ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems @ 2021-03-29 19:27 ` Xi Ruoyao 0 siblings, 0 replies; 54+ messages in thread From: Xi Ruoyao @ 2021-03-29 19:27 UTC (permalink / raw) To: Christian König, Alex Deucher, Christian König Cc: David Airlie, Felix Kuehling, linux-kernel, amd-gfx, Dan Horák, dri-devel, Daniel Vetter, stable Hi Christian, I don't think there is any constraint implemented to ensure `num_entries % AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0`. For example, in `amdgpu_vm_bo_map()`: /* validate the parameters */ if (saddr & AMDGPU_GPU_PAGE_MASK || offset & AMDGPU_GPU_PAGE_MASK || size == 0 || size & AMDGPU_GPU_PAGE_MASK) return -EINVAL; /* snip */ saddr /= AMDGPU_GPU_PAGE_SIZE; eaddr /= AMDGPU_GPU_PAGE_SIZE; /* snip */ mapping->start = saddr; mapping->last = eaddr; If we really want to ensure (mapping->last - mapping->start + 1) % AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0, then we should replace "AMDGPU_GPU_PAGE_MASK" in "validate the parameters" with "PAGE_MASK". I tried it and it broke userspace: Xorg startup fails with EINVAL with this change. On 2021-03-30 02:30 +0800, Xi Ruoyao wrote: > On 2021-03-30 02:21 +0800, Xi Ruoyao wrote: > > On 2021-03-29 20:10 +0200, Christian König wrote: > > > You need to identify the root cause of this, most likely start or last > > > are not a multiple of AMDGPU_GPU_PAGES_IN_CPU_PAGE. > > > > I printk'ed the value of start & last, they are all a multiple of 4 > > (AMDGPU_GPU_PAGES_IN_CPU_PAGE). > > > > However... `num_entries = last - start + 1` so it became some irrational > > thing... Either this line is wrong, or someone called > > amdgpu_vm_bo_update_mapping with [start, last) instead of [start, last], which > > is unexpected. > > I added BUG_ON(num_entries % AMDGPU_GPU_PAGES_IN_CPU_PAGE != 0), get: > > > Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff83c90750>] > > amdgpu_vm_bo_update_mapping.constprop.0+0x218/0xae8 > > Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff83c922b8>] > > amdgpu_vm_bo_update+0x270/0x4c0 > > Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff83c823ec>] > > amdgpu_gem_va_ioctl+0x40c/0x430 > > Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff83c1cae4>] > > drm_ioctl_kernel+0xcc/0x120 > > Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff83c1cfd8>] > > drm_ioctl+0x220/0x408 > > Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff83c5ea48>] > > amdgpu_drm_ioctl+0x58/0x98 > > Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff838feac4>] sys_ioctl+0xcc/0xe8 > > Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff836e74f0>] > > syscall_common+0x34/0x58 > > > > > > > > > BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1549 > > > > > > Fixes: a39f2a8d7066 ("drm/amdgpu: nuke amdgpu_vm_bo_split_mapping v2") > > > > > > Reported-by: Xi Ruoyao <xry111@mengyan1223.wang> > > > > > > Reported-by: Dan Horák <dan@danny.cz> > > > > > > Cc: stable@vger.kernel.org > > > > > > Signed-off-by: Xi Ruoyao <xry111@mengyan1223.wang> > > > > > > --- > > > > > > drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +- > > > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > > > > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > > > > > > index ad91c0c3c423..cee0cc9c8085 100644 > > > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > > > > > > @@ -1707,7 +1707,7 @@ static int amdgpu_vm_bo_update_mapping(struct > > > > > > amdgpu_device *adev, > > > > > > } > > > > > > start = tmp; > > > > > > > > > > > > - } while (unlikely(start != last + 1)); > > > > > > + } while (unlikely(start < last + 1)); > > > > > > > > > > > > r = vm->update_funcs->commit(¶ms, fence); > > > > > > > > > > > > > > > > > > base-commit: a5e13c6df0e41702d2b2c77c8ad41677ebb065b3 > > > > > > -- Xi Ruoyao <xry111@mengyan1223.wang> School of Aerospace Science and Technology, Xidian University _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems @ 2021-03-29 19:27 ` Xi Ruoyao 0 siblings, 0 replies; 54+ messages in thread From: Xi Ruoyao @ 2021-03-29 19:27 UTC (permalink / raw) To: Christian König, Alex Deucher, Christian König Cc: David Airlie, Felix Kuehling, linux-kernel, amd-gfx, Dan Horák, dri-devel, stable Hi Christian, I don't think there is any constraint implemented to ensure `num_entries % AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0`. For example, in `amdgpu_vm_bo_map()`: /* validate the parameters */ if (saddr & AMDGPU_GPU_PAGE_MASK || offset & AMDGPU_GPU_PAGE_MASK || size == 0 || size & AMDGPU_GPU_PAGE_MASK) return -EINVAL; /* snip */ saddr /= AMDGPU_GPU_PAGE_SIZE; eaddr /= AMDGPU_GPU_PAGE_SIZE; /* snip */ mapping->start = saddr; mapping->last = eaddr; If we really want to ensure (mapping->last - mapping->start + 1) % AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0, then we should replace "AMDGPU_GPU_PAGE_MASK" in "validate the parameters" with "PAGE_MASK". I tried it and it broke userspace: Xorg startup fails with EINVAL with this change. On 2021-03-30 02:30 +0800, Xi Ruoyao wrote: > On 2021-03-30 02:21 +0800, Xi Ruoyao wrote: > > On 2021-03-29 20:10 +0200, Christian König wrote: > > > You need to identify the root cause of this, most likely start or last > > > are not a multiple of AMDGPU_GPU_PAGES_IN_CPU_PAGE. > > > > I printk'ed the value of start & last, they are all a multiple of 4 > > (AMDGPU_GPU_PAGES_IN_CPU_PAGE). > > > > However... `num_entries = last - start + 1` so it became some irrational > > thing... Either this line is wrong, or someone called > > amdgpu_vm_bo_update_mapping with [start, last) instead of [start, last], which > > is unexpected. > > I added BUG_ON(num_entries % AMDGPU_GPU_PAGES_IN_CPU_PAGE != 0), get: > > > Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff83c90750>] > > amdgpu_vm_bo_update_mapping.constprop.0+0x218/0xae8 > > Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff83c922b8>] > > amdgpu_vm_bo_update+0x270/0x4c0 > > Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff83c823ec>] > > amdgpu_gem_va_ioctl+0x40c/0x430 > > Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff83c1cae4>] > > drm_ioctl_kernel+0xcc/0x120 > > Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff83c1cfd8>] > > drm_ioctl+0x220/0x408 > > Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff83c5ea48>] > > amdgpu_drm_ioctl+0x58/0x98 > > Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff838feac4>] sys_ioctl+0xcc/0xe8 > > Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff836e74f0>] > > syscall_common+0x34/0x58 > > > > > > > > > BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1549 > > > > > > Fixes: a39f2a8d7066 ("drm/amdgpu: nuke amdgpu_vm_bo_split_mapping v2") > > > > > > Reported-by: Xi Ruoyao <xry111@mengyan1223.wang> > > > > > > Reported-by: Dan Horák <dan@danny.cz> > > > > > > Cc: stable@vger.kernel.org > > > > > > Signed-off-by: Xi Ruoyao <xry111@mengyan1223.wang> > > > > > > --- > > > > > > drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +- > > > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > > > > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > > > > > > index ad91c0c3c423..cee0cc9c8085 100644 > > > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > > > > > > @@ -1707,7 +1707,7 @@ static int amdgpu_vm_bo_update_mapping(struct > > > > > > amdgpu_device *adev, > > > > > > } > > > > > > start = tmp; > > > > > > > > > > > > - } while (unlikely(start != last + 1)); > > > > > > + } while (unlikely(start < last + 1)); > > > > > > > > > > > > r = vm->update_funcs->commit(¶ms, fence); > > > > > > > > > > > > > > > > > > base-commit: a5e13c6df0e41702d2b2c77c8ad41677ebb065b3 > > > > > > -- Xi Ruoyao <xry111@mengyan1223.wang> School of Aerospace Science and Technology, Xidian University _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems 2021-03-29 19:27 ` Xi Ruoyao (?) @ 2021-03-29 19:36 ` Christian König -1 siblings, 0 replies; 54+ messages in thread From: Christian König @ 2021-03-29 19:36 UTC (permalink / raw) To: Xi Ruoyao, Alex Deucher, Christian König Cc: David Airlie, Felix Kuehling, linux-kernel, dri-devel, Dan Horák, amd-gfx, Daniel Vetter, stable Am 29.03.21 um 21:27 schrieb Xi Ruoyao: > Hi Christian, > > I don't think there is any constraint implemented to ensure `num_entries % > AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0`. For example, in `amdgpu_vm_bo_map()`: > > /* validate the parameters */ > if (saddr & AMDGPU_GPU_PAGE_MASK || offset & AMDGPU_GPU_PAGE_MASK || > size == 0 || size & AMDGPU_GPU_PAGE_MASK) > return -EINVAL; > > /* snip */ > > saddr /= AMDGPU_GPU_PAGE_SIZE; > eaddr /= AMDGPU_GPU_PAGE_SIZE; > > /* snip */ > > mapping->start = saddr; > mapping->last = eaddr; > > > If we really want to ensure (mapping->last - mapping->start + 1) % > AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0, then we should replace "AMDGPU_GPU_PAGE_MASK" > in "validate the parameters" with "PAGE_MASK". Yeah, good point. > I tried it and it broke userspace: Xorg startup fails with EINVAL with this > change. Well in theory it is possible that we always fill the GPUVM on a 4k basis while the native page size of the CPU is larger. Let me double check the code. BTW: What code base are you based on? The code your post here is quite outdated. Christian. > > On 2021-03-30 02:30 +0800, Xi Ruoyao wrote: >> On 2021-03-30 02:21 +0800, Xi Ruoyao wrote: >>> On 2021-03-29 20:10 +0200, Christian König wrote: >>>> You need to identify the root cause of this, most likely start or last >>>> are not a multiple of AMDGPU_GPU_PAGES_IN_CPU_PAGE. >>> I printk'ed the value of start & last, they are all a multiple of 4 >>> (AMDGPU_GPU_PAGES_IN_CPU_PAGE). >>> >>> However... `num_entries = last - start + 1` so it became some irrational >>> thing... Either this line is wrong, or someone called >>> amdgpu_vm_bo_update_mapping with [start, last) instead of [start, last], which >>> is unexpected. >> I added BUG_ON(num_entries % AMDGPU_GPU_PAGES_IN_CPU_PAGE != 0), get: >> >>> Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff83c90750>] >>> amdgpu_vm_bo_update_mapping.constprop.0+0x218/0xae8 >>> Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff83c922b8>] >>> amdgpu_vm_bo_update+0x270/0x4c0 >>> Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff83c823ec>] >>> amdgpu_gem_va_ioctl+0x40c/0x430 >>> Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff83c1cae4>] >>> drm_ioctl_kernel+0xcc/0x120 >>> Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff83c1cfd8>] >>> drm_ioctl+0x220/0x408 >>> Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff83c5ea48>] >>> amdgpu_drm_ioctl+0x58/0x98 >>> Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff838feac4>] sys_ioctl+0xcc/0xe8 >>> Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff836e74f0>] >>> syscall_common+0x34/0x58 >>> >>>>>>> BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1549 >>>>>>> Fixes: a39f2a8d7066 ("drm/amdgpu: nuke amdgpu_vm_bo_split_mapping v2") >>>>>>> Reported-by: Xi Ruoyao <xry111@mengyan1223.wang> >>>>>>> Reported-by: Dan Horák <dan@danny.cz> >>>>>>> Cc: stable@vger.kernel.org >>>>>>> Signed-off-by: Xi Ruoyao <xry111@mengyan1223.wang> >>>>>>> --- >>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +- >>>>>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>>>>> >>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c >>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c >>>>>>> index ad91c0c3c423..cee0cc9c8085 100644 >>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c >>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c >>>>>>> @@ -1707,7 +1707,7 @@ static int amdgpu_vm_bo_update_mapping(struct >>>>>>> amdgpu_device *adev, >>>>>>> } >>>>>>> start = tmp; >>>>>>> >>>>>>> - } while (unlikely(start != last + 1)); >>>>>>> + } while (unlikely(start < last + 1)); >>>>>>> >>>>>>> r = vm->update_funcs->commit(¶ms, fence); >>>>>>> >>>>>>> >>>>>>> base-commit: a5e13c6df0e41702d2b2c77c8ad41677ebb065b3 ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems @ 2021-03-29 19:36 ` Christian König 0 siblings, 0 replies; 54+ messages in thread From: Christian König @ 2021-03-29 19:36 UTC (permalink / raw) To: Xi Ruoyao, Alex Deucher, Christian König Cc: David Airlie, Felix Kuehling, linux-kernel, amd-gfx, Dan Horák, dri-devel, Daniel Vetter, stable Am 29.03.21 um 21:27 schrieb Xi Ruoyao: > Hi Christian, > > I don't think there is any constraint implemented to ensure `num_entries % > AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0`. For example, in `amdgpu_vm_bo_map()`: > > /* validate the parameters */ > if (saddr & AMDGPU_GPU_PAGE_MASK || offset & AMDGPU_GPU_PAGE_MASK || > size == 0 || size & AMDGPU_GPU_PAGE_MASK) > return -EINVAL; > > /* snip */ > > saddr /= AMDGPU_GPU_PAGE_SIZE; > eaddr /= AMDGPU_GPU_PAGE_SIZE; > > /* snip */ > > mapping->start = saddr; > mapping->last = eaddr; > > > If we really want to ensure (mapping->last - mapping->start + 1) % > AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0, then we should replace "AMDGPU_GPU_PAGE_MASK" > in "validate the parameters" with "PAGE_MASK". Yeah, good point. > I tried it and it broke userspace: Xorg startup fails with EINVAL with this > change. Well in theory it is possible that we always fill the GPUVM on a 4k basis while the native page size of the CPU is larger. Let me double check the code. BTW: What code base are you based on? The code your post here is quite outdated. Christian. > > On 2021-03-30 02:30 +0800, Xi Ruoyao wrote: >> On 2021-03-30 02:21 +0800, Xi Ruoyao wrote: >>> On 2021-03-29 20:10 +0200, Christian König wrote: >>>> You need to identify the root cause of this, most likely start or last >>>> are not a multiple of AMDGPU_GPU_PAGES_IN_CPU_PAGE. >>> I printk'ed the value of start & last, they are all a multiple of 4 >>> (AMDGPU_GPU_PAGES_IN_CPU_PAGE). >>> >>> However... `num_entries = last - start + 1` so it became some irrational >>> thing... Either this line is wrong, or someone called >>> amdgpu_vm_bo_update_mapping with [start, last) instead of [start, last], which >>> is unexpected. >> I added BUG_ON(num_entries % AMDGPU_GPU_PAGES_IN_CPU_PAGE != 0), get: >> >>> Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff83c90750>] >>> amdgpu_vm_bo_update_mapping.constprop.0+0x218/0xae8 >>> Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff83c922b8>] >>> amdgpu_vm_bo_update+0x270/0x4c0 >>> Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff83c823ec>] >>> amdgpu_gem_va_ioctl+0x40c/0x430 >>> Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff83c1cae4>] >>> drm_ioctl_kernel+0xcc/0x120 >>> Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff83c1cfd8>] >>> drm_ioctl+0x220/0x408 >>> Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff83c5ea48>] >>> amdgpu_drm_ioctl+0x58/0x98 >>> Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff838feac4>] sys_ioctl+0xcc/0xe8 >>> Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff836e74f0>] >>> syscall_common+0x34/0x58 >>> >>>>>>> BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1549 >>>>>>> Fixes: a39f2a8d7066 ("drm/amdgpu: nuke amdgpu_vm_bo_split_mapping v2") >>>>>>> Reported-by: Xi Ruoyao <xry111@mengyan1223.wang> >>>>>>> Reported-by: Dan Horák <dan@danny.cz> >>>>>>> Cc: stable@vger.kernel.org >>>>>>> Signed-off-by: Xi Ruoyao <xry111@mengyan1223.wang> >>>>>>> --- >>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +- >>>>>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>>>>> >>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c >>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c >>>>>>> index ad91c0c3c423..cee0cc9c8085 100644 >>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c >>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c >>>>>>> @@ -1707,7 +1707,7 @@ static int amdgpu_vm_bo_update_mapping(struct >>>>>>> amdgpu_device *adev, >>>>>>> } >>>>>>> start = tmp; >>>>>>> >>>>>>> - } while (unlikely(start != last + 1)); >>>>>>> + } while (unlikely(start < last + 1)); >>>>>>> >>>>>>> r = vm->update_funcs->commit(¶ms, fence); >>>>>>> >>>>>>> >>>>>>> base-commit: a5e13c6df0e41702d2b2c77c8ad41677ebb065b3 _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems @ 2021-03-29 19:36 ` Christian König 0 siblings, 0 replies; 54+ messages in thread From: Christian König @ 2021-03-29 19:36 UTC (permalink / raw) To: Xi Ruoyao, Alex Deucher, Christian König Cc: David Airlie, Felix Kuehling, linux-kernel, amd-gfx, Dan Horák, dri-devel, stable Am 29.03.21 um 21:27 schrieb Xi Ruoyao: > Hi Christian, > > I don't think there is any constraint implemented to ensure `num_entries % > AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0`. For example, in `amdgpu_vm_bo_map()`: > > /* validate the parameters */ > if (saddr & AMDGPU_GPU_PAGE_MASK || offset & AMDGPU_GPU_PAGE_MASK || > size == 0 || size & AMDGPU_GPU_PAGE_MASK) > return -EINVAL; > > /* snip */ > > saddr /= AMDGPU_GPU_PAGE_SIZE; > eaddr /= AMDGPU_GPU_PAGE_SIZE; > > /* snip */ > > mapping->start = saddr; > mapping->last = eaddr; > > > If we really want to ensure (mapping->last - mapping->start + 1) % > AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0, then we should replace "AMDGPU_GPU_PAGE_MASK" > in "validate the parameters" with "PAGE_MASK". Yeah, good point. > I tried it and it broke userspace: Xorg startup fails with EINVAL with this > change. Well in theory it is possible that we always fill the GPUVM on a 4k basis while the native page size of the CPU is larger. Let me double check the code. BTW: What code base are you based on? The code your post here is quite outdated. Christian. > > On 2021-03-30 02:30 +0800, Xi Ruoyao wrote: >> On 2021-03-30 02:21 +0800, Xi Ruoyao wrote: >>> On 2021-03-29 20:10 +0200, Christian König wrote: >>>> You need to identify the root cause of this, most likely start or last >>>> are not a multiple of AMDGPU_GPU_PAGES_IN_CPU_PAGE. >>> I printk'ed the value of start & last, they are all a multiple of 4 >>> (AMDGPU_GPU_PAGES_IN_CPU_PAGE). >>> >>> However... `num_entries = last - start + 1` so it became some irrational >>> thing... Either this line is wrong, or someone called >>> amdgpu_vm_bo_update_mapping with [start, last) instead of [start, last], which >>> is unexpected. >> I added BUG_ON(num_entries % AMDGPU_GPU_PAGES_IN_CPU_PAGE != 0), get: >> >>> Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff83c90750>] >>> amdgpu_vm_bo_update_mapping.constprop.0+0x218/0xae8 >>> Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff83c922b8>] >>> amdgpu_vm_bo_update+0x270/0x4c0 >>> Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff83c823ec>] >>> amdgpu_gem_va_ioctl+0x40c/0x430 >>> Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff83c1cae4>] >>> drm_ioctl_kernel+0xcc/0x120 >>> Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff83c1cfd8>] >>> drm_ioctl+0x220/0x408 >>> Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff83c5ea48>] >>> amdgpu_drm_ioctl+0x58/0x98 >>> Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff838feac4>] sys_ioctl+0xcc/0xe8 >>> Mar 30 02:28:27 xry111-A1901 kernel: [<ffffffff836e74f0>] >>> syscall_common+0x34/0x58 >>> >>>>>>> BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1549 >>>>>>> Fixes: a39f2a8d7066 ("drm/amdgpu: nuke amdgpu_vm_bo_split_mapping v2") >>>>>>> Reported-by: Xi Ruoyao <xry111@mengyan1223.wang> >>>>>>> Reported-by: Dan Horák <dan@danny.cz> >>>>>>> Cc: stable@vger.kernel.org >>>>>>> Signed-off-by: Xi Ruoyao <xry111@mengyan1223.wang> >>>>>>> --- >>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +- >>>>>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>>>>> >>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c >>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c >>>>>>> index ad91c0c3c423..cee0cc9c8085 100644 >>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c >>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c >>>>>>> @@ -1707,7 +1707,7 @@ static int amdgpu_vm_bo_update_mapping(struct >>>>>>> amdgpu_device *adev, >>>>>>> } >>>>>>> start = tmp; >>>>>>> >>>>>>> - } while (unlikely(start != last + 1)); >>>>>>> + } while (unlikely(start < last + 1)); >>>>>>> >>>>>>> r = vm->update_funcs->commit(¶ms, fence); >>>>>>> >>>>>>> >>>>>>> base-commit: a5e13c6df0e41702d2b2c77c8ad41677ebb065b3 _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems 2021-03-29 19:36 ` Christian König (?) @ 2021-03-29 19:40 ` Xi Ruoyao -1 siblings, 0 replies; 54+ messages in thread From: Xi Ruoyao @ 2021-03-29 19:40 UTC (permalink / raw) To: Christian König, Alex Deucher, Christian König Cc: David Airlie, Felix Kuehling, linux-kernel, dri-devel, Dan Horák, amd-gfx, Daniel Vetter, stable On 2021-03-29 21:36 +0200, Christian König wrote: > Am 29.03.21 um 21:27 schrieb Xi Ruoyao: > > Hi Christian, > > > > I don't think there is any constraint implemented to ensure `num_entries % > > AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0`. For example, in `amdgpu_vm_bo_map()`: > > > > /* validate the parameters */ > > if (saddr & AMDGPU_GPU_PAGE_MASK || offset & AMDGPU_GPU_PAGE_MASK > > || > > size == 0 || size & AMDGPU_GPU_PAGE_MASK) > > return -EINVAL; > > > > /* snip */ > > > > saddr /= AMDGPU_GPU_PAGE_SIZE; > > eaddr /= AMDGPU_GPU_PAGE_SIZE; > > > > /* snip */ > > > > mapping->start = saddr; > > mapping->last = eaddr; > > > > > > If we really want to ensure (mapping->last - mapping->start + 1) % > > AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0, then we should replace > > "AMDGPU_GPU_PAGE_MASK" > > in "validate the parameters" with "PAGE_MASK". > > Yeah, good point. > > > I tried it and it broke userspace: Xorg startup fails with EINVAL with this > > change. > > Well in theory it is possible that we always fill the GPUVM on a 4k > basis while the native page size of the CPU is larger. Let me double > check the code. > > BTW: What code base are you based on? The code your post here is quite > outdated. Linus' tree. I'll go to sleep now (it's 03:39 here :( ), when I wake up I can try to fetch drm-next or something. -- Xi Ruoyao <xry111@mengyan1223.wang> School of Aerospace Science and Technology, Xidian University ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems @ 2021-03-29 19:40 ` Xi Ruoyao 0 siblings, 0 replies; 54+ messages in thread From: Xi Ruoyao @ 2021-03-29 19:40 UTC (permalink / raw) To: Christian König, Alex Deucher, Christian König Cc: David Airlie, Felix Kuehling, linux-kernel, amd-gfx, Dan Horák, dri-devel, Daniel Vetter, stable On 2021-03-29 21:36 +0200, Christian König wrote: > Am 29.03.21 um 21:27 schrieb Xi Ruoyao: > > Hi Christian, > > > > I don't think there is any constraint implemented to ensure `num_entries % > > AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0`. For example, in `amdgpu_vm_bo_map()`: > > > > /* validate the parameters */ > > if (saddr & AMDGPU_GPU_PAGE_MASK || offset & AMDGPU_GPU_PAGE_MASK > > || > > size == 0 || size & AMDGPU_GPU_PAGE_MASK) > > return -EINVAL; > > > > /* snip */ > > > > saddr /= AMDGPU_GPU_PAGE_SIZE; > > eaddr /= AMDGPU_GPU_PAGE_SIZE; > > > > /* snip */ > > > > mapping->start = saddr; > > mapping->last = eaddr; > > > > > > If we really want to ensure (mapping->last - mapping->start + 1) % > > AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0, then we should replace > > "AMDGPU_GPU_PAGE_MASK" > > in "validate the parameters" with "PAGE_MASK". > > Yeah, good point. > > > I tried it and it broke userspace: Xorg startup fails with EINVAL with this > > change. > > Well in theory it is possible that we always fill the GPUVM on a 4k > basis while the native page size of the CPU is larger. Let me double > check the code. > > BTW: What code base are you based on? The code your post here is quite > outdated. Linus' tree. I'll go to sleep now (it's 03:39 here :( ), when I wake up I can try to fetch drm-next or something. -- Xi Ruoyao <xry111@mengyan1223.wang> School of Aerospace Science and Technology, Xidian University _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems @ 2021-03-29 19:40 ` Xi Ruoyao 0 siblings, 0 replies; 54+ messages in thread From: Xi Ruoyao @ 2021-03-29 19:40 UTC (permalink / raw) To: Christian König, Alex Deucher, Christian König Cc: David Airlie, Felix Kuehling, linux-kernel, amd-gfx, Dan Horák, dri-devel, stable On 2021-03-29 21:36 +0200, Christian König wrote: > Am 29.03.21 um 21:27 schrieb Xi Ruoyao: > > Hi Christian, > > > > I don't think there is any constraint implemented to ensure `num_entries % > > AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0`. For example, in `amdgpu_vm_bo_map()`: > > > > /* validate the parameters */ > > if (saddr & AMDGPU_GPU_PAGE_MASK || offset & AMDGPU_GPU_PAGE_MASK > > || > > size == 0 || size & AMDGPU_GPU_PAGE_MASK) > > return -EINVAL; > > > > /* snip */ > > > > saddr /= AMDGPU_GPU_PAGE_SIZE; > > eaddr /= AMDGPU_GPU_PAGE_SIZE; > > > > /* snip */ > > > > mapping->start = saddr; > > mapping->last = eaddr; > > > > > > If we really want to ensure (mapping->last - mapping->start + 1) % > > AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0, then we should replace > > "AMDGPU_GPU_PAGE_MASK" > > in "validate the parameters" with "PAGE_MASK". > > Yeah, good point. > > > I tried it and it broke userspace: Xorg startup fails with EINVAL with this > > change. > > Well in theory it is possible that we always fill the GPUVM on a 4k > basis while the native page size of the CPU is larger. Let me double > check the code. > > BTW: What code base are you based on? The code your post here is quite > outdated. Linus' tree. I'll go to sleep now (it's 03:39 here :( ), when I wake up I can try to fetch drm-next or something. -- Xi Ruoyao <xry111@mengyan1223.wang> School of Aerospace Science and Technology, Xidian University _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems 2021-03-29 19:40 ` Xi Ruoyao (?) @ 2021-03-30 12:04 ` Xi Ruoyao -1 siblings, 0 replies; 54+ messages in thread From: Xi Ruoyao @ 2021-03-30 12:04 UTC (permalink / raw) To: Christian König, Alex Deucher, Christian König Cc: David Airlie, Felix Kuehling, linux-kernel, dri-devel, Dan Horák, amd-gfx, Daniel Vetter, stable On 2021-03-30 03:40 +0800, Xi Ruoyao wrote: > On 2021-03-29 21:36 +0200, Christian König wrote: > > Am 29.03.21 um 21:27 schrieb Xi Ruoyao: > > > Hi Christian, > > > > > > I don't think there is any constraint implemented to ensure `num_entries % > > > AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0`. For example, in `amdgpu_vm_bo_map()`: > > > > > > /* validate the parameters */ > > > if (saddr & AMDGPU_GPU_PAGE_MASK || offset & AMDGPU_GPU_PAGE_MASK > > > > > > > > size == 0 || size & AMDGPU_GPU_PAGE_MASK) > > > return -EINVAL; > > > > > > /* snip */ > > > > > > saddr /= AMDGPU_GPU_PAGE_SIZE; > > > eaddr /= AMDGPU_GPU_PAGE_SIZE; > > > > > > /* snip */ > > > > > > mapping->start = saddr; > > > mapping->last = eaddr; > > > > > > > > > If we really want to ensure (mapping->last - mapping->start + 1) % > > > AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0, then we should replace > > > "AMDGPU_GPU_PAGE_MASK" > > > in "validate the parameters" with "PAGE_MASK". It should be "~PAGE_MASK", "PAGE_MASK" has an opposite convention of "AMDGPU_GPU_PAGE_MASK" :(. > > Yeah, good point. > > > > > I tried it and it broke userspace: Xorg startup fails with EINVAL with > > > this > > > change. > > > > Well in theory it is possible that we always fill the GPUVM on a 4k > > basis while the native page size of the CPU is larger. Let me double > > check the code. On my platform, there are two issues both causing the VM corruption. One is fixed by agd5f/linux@fe001e7. Another is in Mesa from userspace: it uses `dev_info->gart_page_size` as the alignment, but the kernel AMDGPU driver expects it to use `dev_info->virtual_address_alignment`. If we can make the change to fill the GPUVM on a 4k basis, we can fix this issue and make virtual_address_alignment = 4K. Otherwise, we should fortify the parameter validation, changing "AMDGPU_GPU_PAGE_MASK" to "~PAGE_MASK". Then the userspace will just get an EINVAL, instead of a slient VM corruption. And someone should tell Mesa developers to fix the code in this case. -- Xi Ruoyao <xry111@mengyan1223.wang> School of Aerospace Science and Technology, Xidian University ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems @ 2021-03-30 12:04 ` Xi Ruoyao 0 siblings, 0 replies; 54+ messages in thread From: Xi Ruoyao @ 2021-03-30 12:04 UTC (permalink / raw) To: Christian König, Alex Deucher, Christian König Cc: David Airlie, Felix Kuehling, linux-kernel, amd-gfx, Dan Horák, dri-devel, Daniel Vetter, stable On 2021-03-30 03:40 +0800, Xi Ruoyao wrote: > On 2021-03-29 21:36 +0200, Christian König wrote: > > Am 29.03.21 um 21:27 schrieb Xi Ruoyao: > > > Hi Christian, > > > > > > I don't think there is any constraint implemented to ensure `num_entries % > > > AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0`. For example, in `amdgpu_vm_bo_map()`: > > > > > > /* validate the parameters */ > > > if (saddr & AMDGPU_GPU_PAGE_MASK || offset & AMDGPU_GPU_PAGE_MASK > > > > > > > > size == 0 || size & AMDGPU_GPU_PAGE_MASK) > > > return -EINVAL; > > > > > > /* snip */ > > > > > > saddr /= AMDGPU_GPU_PAGE_SIZE; > > > eaddr /= AMDGPU_GPU_PAGE_SIZE; > > > > > > /* snip */ > > > > > > mapping->start = saddr; > > > mapping->last = eaddr; > > > > > > > > > If we really want to ensure (mapping->last - mapping->start + 1) % > > > AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0, then we should replace > > > "AMDGPU_GPU_PAGE_MASK" > > > in "validate the parameters" with "PAGE_MASK". It should be "~PAGE_MASK", "PAGE_MASK" has an opposite convention of "AMDGPU_GPU_PAGE_MASK" :(. > > Yeah, good point. > > > > > I tried it and it broke userspace: Xorg startup fails with EINVAL with > > > this > > > change. > > > > Well in theory it is possible that we always fill the GPUVM on a 4k > > basis while the native page size of the CPU is larger. Let me double > > check the code. On my platform, there are two issues both causing the VM corruption. One is fixed by agd5f/linux@fe001e7. Another is in Mesa from userspace: it uses `dev_info->gart_page_size` as the alignment, but the kernel AMDGPU driver expects it to use `dev_info->virtual_address_alignment`. If we can make the change to fill the GPUVM on a 4k basis, we can fix this issue and make virtual_address_alignment = 4K. Otherwise, we should fortify the parameter validation, changing "AMDGPU_GPU_PAGE_MASK" to "~PAGE_MASK". Then the userspace will just get an EINVAL, instead of a slient VM corruption. And someone should tell Mesa developers to fix the code in this case. -- Xi Ruoyao <xry111@mengyan1223.wang> School of Aerospace Science and Technology, Xidian University _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems @ 2021-03-30 12:04 ` Xi Ruoyao 0 siblings, 0 replies; 54+ messages in thread From: Xi Ruoyao @ 2021-03-30 12:04 UTC (permalink / raw) To: Christian König, Alex Deucher, Christian König Cc: David Airlie, Felix Kuehling, linux-kernel, amd-gfx, Dan Horák, dri-devel, stable On 2021-03-30 03:40 +0800, Xi Ruoyao wrote: > On 2021-03-29 21:36 +0200, Christian König wrote: > > Am 29.03.21 um 21:27 schrieb Xi Ruoyao: > > > Hi Christian, > > > > > > I don't think there is any constraint implemented to ensure `num_entries % > > > AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0`. For example, in `amdgpu_vm_bo_map()`: > > > > > > /* validate the parameters */ > > > if (saddr & AMDGPU_GPU_PAGE_MASK || offset & AMDGPU_GPU_PAGE_MASK > > > > > > > > size == 0 || size & AMDGPU_GPU_PAGE_MASK) > > > return -EINVAL; > > > > > > /* snip */ > > > > > > saddr /= AMDGPU_GPU_PAGE_SIZE; > > > eaddr /= AMDGPU_GPU_PAGE_SIZE; > > > > > > /* snip */ > > > > > > mapping->start = saddr; > > > mapping->last = eaddr; > > > > > > > > > If we really want to ensure (mapping->last - mapping->start + 1) % > > > AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0, then we should replace > > > "AMDGPU_GPU_PAGE_MASK" > > > in "validate the parameters" with "PAGE_MASK". It should be "~PAGE_MASK", "PAGE_MASK" has an opposite convention of "AMDGPU_GPU_PAGE_MASK" :(. > > Yeah, good point. > > > > > I tried it and it broke userspace: Xorg startup fails with EINVAL with > > > this > > > change. > > > > Well in theory it is possible that we always fill the GPUVM on a 4k > > basis while the native page size of the CPU is larger. Let me double > > check the code. On my platform, there are two issues both causing the VM corruption. One is fixed by agd5f/linux@fe001e7. Another is in Mesa from userspace: it uses `dev_info->gart_page_size` as the alignment, but the kernel AMDGPU driver expects it to use `dev_info->virtual_address_alignment`. If we can make the change to fill the GPUVM on a 4k basis, we can fix this issue and make virtual_address_alignment = 4K. Otherwise, we should fortify the parameter validation, changing "AMDGPU_GPU_PAGE_MASK" to "~PAGE_MASK". Then the userspace will just get an EINVAL, instead of a slient VM corruption. And someone should tell Mesa developers to fix the code in this case. -- Xi Ruoyao <xry111@mengyan1223.wang> School of Aerospace Science and Technology, Xidian University _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems 2021-03-30 12:04 ` Xi Ruoyao (?) @ 2021-03-30 12:55 ` Christian König -1 siblings, 0 replies; 54+ messages in thread From: Christian König @ 2021-03-30 12:55 UTC (permalink / raw) To: Xi Ruoyao, Christian König, Alex Deucher Cc: David Airlie, Felix Kuehling, linux-kernel, dri-devel, Dan Horák, amd-gfx, Daniel Vetter, stable Am 30.03.21 um 14:04 schrieb Xi Ruoyao: > On 2021-03-30 03:40 +0800, Xi Ruoyao wrote: >> On 2021-03-29 21:36 +0200, Christian König wrote: >>> Am 29.03.21 um 21:27 schrieb Xi Ruoyao: >>>> Hi Christian, >>>> >>>> I don't think there is any constraint implemented to ensure `num_entries % >>>> AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0`. For example, in `amdgpu_vm_bo_map()`: >>>> >>>> /* validate the parameters */ >>>> if (saddr & AMDGPU_GPU_PAGE_MASK || offset & AMDGPU_GPU_PAGE_MASK >>>> size == 0 || size & AMDGPU_GPU_PAGE_MASK) >>>> return -EINVAL; >>>> >>>> /* snip */ >>>> >>>> saddr /= AMDGPU_GPU_PAGE_SIZE; >>>> eaddr /= AMDGPU_GPU_PAGE_SIZE; >>>> >>>> /* snip */ >>>> >>>> mapping->start = saddr; >>>> mapping->last = eaddr; >>>> >>>> >>>> If we really want to ensure (mapping->last - mapping->start + 1) % >>>> AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0, then we should replace >>>> "AMDGPU_GPU_PAGE_MASK" >>>> in "validate the parameters" with "PAGE_MASK". > It should be "~PAGE_MASK", "PAGE_MASK" has an opposite convention of > "AMDGPU_GPU_PAGE_MASK" :(. > >>> Yeah, good point. >>> >>>> I tried it and it broke userspace: Xorg startup fails with EINVAL with >>>> this >>>> change. >>> Well in theory it is possible that we always fill the GPUVM on a 4k >>> basis while the native page size of the CPU is larger. Let me double >>> check the code. > On my platform, there are two issues both causing the VM corruption. One is > fixed by agd5f/linux@fe001e7. What is fe001e7? A commit id? If yes then that is to short and I can't find it. > Another is in Mesa from userspace: it uses > `dev_info->gart_page_size` as the alignment, but the kernel AMDGPU driver > expects it to use `dev_info->virtual_address_alignment`. Mhm, looking at the kernel code I would rather say Mesa is correct and the kernel driver is broken. The gart_page_size is limited by the CPU page size, but the virtual_address_alignment isn't. > If we can make the change to fill the GPUVM on a 4k basis, we can fix this issue > and make virtual_address_alignment = 4K. Otherwise, we should fortify the > parameter validation, changing "AMDGPU_GPU_PAGE_MASK" to "~PAGE_MASK". Then the > userspace will just get an EINVAL, instead of a slient VM corruption. And > someone should tell Mesa developers to fix the code in this case. I rather see this as a kernel bug. Can you test if this code fragment fixes your issue: diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c index 64beb3399604..e1260b517e1b 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c @@ -780,7 +780,7 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file *filp) } dev_info->virtual_address_alignment = max((int)PAGE_SIZE, AMDGPU_GPU_PAGE_SIZE); dev_info->pte_fragment_size = (1 << adev->vm_manager.fragment_size) * AMDGPU_GPU_PAGE_SIZE; - dev_info->gart_page_size = AMDGPU_GPU_PAGE_SIZE; + dev_info->gart_page_size = dev_info->virtual_address_alignment; dev_info->cu_active_number = adev->gfx.cu_info.number; dev_info->cu_ao_mask = adev->gfx.cu_info.ao_cu_mask; dev_info->ce_ram_size = adev->gfx.ce_ram_size; Thanks, Christian. > -- > Xi Ruoyao <xry111@mengyan1223.wang> > School of Aerospace Science and Technology, Xidian University > ^ permalink raw reply related [flat|nested] 54+ messages in thread
* Re: [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems @ 2021-03-30 12:55 ` Christian König 0 siblings, 0 replies; 54+ messages in thread From: Christian König @ 2021-03-30 12:55 UTC (permalink / raw) To: Xi Ruoyao, Christian König, Alex Deucher Cc: David Airlie, Felix Kuehling, linux-kernel, amd-gfx, Dan Horák, dri-devel, Daniel Vetter, stable Am 30.03.21 um 14:04 schrieb Xi Ruoyao: > On 2021-03-30 03:40 +0800, Xi Ruoyao wrote: >> On 2021-03-29 21:36 +0200, Christian König wrote: >>> Am 29.03.21 um 21:27 schrieb Xi Ruoyao: >>>> Hi Christian, >>>> >>>> I don't think there is any constraint implemented to ensure `num_entries % >>>> AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0`. For example, in `amdgpu_vm_bo_map()`: >>>> >>>> /* validate the parameters */ >>>> if (saddr & AMDGPU_GPU_PAGE_MASK || offset & AMDGPU_GPU_PAGE_MASK >>>> size == 0 || size & AMDGPU_GPU_PAGE_MASK) >>>> return -EINVAL; >>>> >>>> /* snip */ >>>> >>>> saddr /= AMDGPU_GPU_PAGE_SIZE; >>>> eaddr /= AMDGPU_GPU_PAGE_SIZE; >>>> >>>> /* snip */ >>>> >>>> mapping->start = saddr; >>>> mapping->last = eaddr; >>>> >>>> >>>> If we really want to ensure (mapping->last - mapping->start + 1) % >>>> AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0, then we should replace >>>> "AMDGPU_GPU_PAGE_MASK" >>>> in "validate the parameters" with "PAGE_MASK". > It should be "~PAGE_MASK", "PAGE_MASK" has an opposite convention of > "AMDGPU_GPU_PAGE_MASK" :(. > >>> Yeah, good point. >>> >>>> I tried it and it broke userspace: Xorg startup fails with EINVAL with >>>> this >>>> change. >>> Well in theory it is possible that we always fill the GPUVM on a 4k >>> basis while the native page size of the CPU is larger. Let me double >>> check the code. > On my platform, there are two issues both causing the VM corruption. One is > fixed by agd5f/linux@fe001e7. What is fe001e7? A commit id? If yes then that is to short and I can't find it. > Another is in Mesa from userspace: it uses > `dev_info->gart_page_size` as the alignment, but the kernel AMDGPU driver > expects it to use `dev_info->virtual_address_alignment`. Mhm, looking at the kernel code I would rather say Mesa is correct and the kernel driver is broken. The gart_page_size is limited by the CPU page size, but the virtual_address_alignment isn't. > If we can make the change to fill the GPUVM on a 4k basis, we can fix this issue > and make virtual_address_alignment = 4K. Otherwise, we should fortify the > parameter validation, changing "AMDGPU_GPU_PAGE_MASK" to "~PAGE_MASK". Then the > userspace will just get an EINVAL, instead of a slient VM corruption. And > someone should tell Mesa developers to fix the code in this case. I rather see this as a kernel bug. Can you test if this code fragment fixes your issue: diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c index 64beb3399604..e1260b517e1b 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c @@ -780,7 +780,7 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file *filp) } dev_info->virtual_address_alignment = max((int)PAGE_SIZE, AMDGPU_GPU_PAGE_SIZE); dev_info->pte_fragment_size = (1 << adev->vm_manager.fragment_size) * AMDGPU_GPU_PAGE_SIZE; - dev_info->gart_page_size = AMDGPU_GPU_PAGE_SIZE; + dev_info->gart_page_size = dev_info->virtual_address_alignment; dev_info->cu_active_number = adev->gfx.cu_info.number; dev_info->cu_ao_mask = adev->gfx.cu_info.ao_cu_mask; dev_info->ce_ram_size = adev->gfx.ce_ram_size; Thanks, Christian. > -- > Xi Ruoyao <xry111@mengyan1223.wang> > School of Aerospace Science and Technology, Xidian University > _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply related [flat|nested] 54+ messages in thread
* Re: [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems @ 2021-03-30 12:55 ` Christian König 0 siblings, 0 replies; 54+ messages in thread From: Christian König @ 2021-03-30 12:55 UTC (permalink / raw) To: Xi Ruoyao, Christian König, Alex Deucher Cc: David Airlie, Felix Kuehling, linux-kernel, amd-gfx, Dan Horák, dri-devel, stable Am 30.03.21 um 14:04 schrieb Xi Ruoyao: > On 2021-03-30 03:40 +0800, Xi Ruoyao wrote: >> On 2021-03-29 21:36 +0200, Christian König wrote: >>> Am 29.03.21 um 21:27 schrieb Xi Ruoyao: >>>> Hi Christian, >>>> >>>> I don't think there is any constraint implemented to ensure `num_entries % >>>> AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0`. For example, in `amdgpu_vm_bo_map()`: >>>> >>>> /* validate the parameters */ >>>> if (saddr & AMDGPU_GPU_PAGE_MASK || offset & AMDGPU_GPU_PAGE_MASK >>>> size == 0 || size & AMDGPU_GPU_PAGE_MASK) >>>> return -EINVAL; >>>> >>>> /* snip */ >>>> >>>> saddr /= AMDGPU_GPU_PAGE_SIZE; >>>> eaddr /= AMDGPU_GPU_PAGE_SIZE; >>>> >>>> /* snip */ >>>> >>>> mapping->start = saddr; >>>> mapping->last = eaddr; >>>> >>>> >>>> If we really want to ensure (mapping->last - mapping->start + 1) % >>>> AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0, then we should replace >>>> "AMDGPU_GPU_PAGE_MASK" >>>> in "validate the parameters" with "PAGE_MASK". > It should be "~PAGE_MASK", "PAGE_MASK" has an opposite convention of > "AMDGPU_GPU_PAGE_MASK" :(. > >>> Yeah, good point. >>> >>>> I tried it and it broke userspace: Xorg startup fails with EINVAL with >>>> this >>>> change. >>> Well in theory it is possible that we always fill the GPUVM on a 4k >>> basis while the native page size of the CPU is larger. Let me double >>> check the code. > On my platform, there are two issues both causing the VM corruption. One is > fixed by agd5f/linux@fe001e7. What is fe001e7? A commit id? If yes then that is to short and I can't find it. > Another is in Mesa from userspace: it uses > `dev_info->gart_page_size` as the alignment, but the kernel AMDGPU driver > expects it to use `dev_info->virtual_address_alignment`. Mhm, looking at the kernel code I would rather say Mesa is correct and the kernel driver is broken. The gart_page_size is limited by the CPU page size, but the virtual_address_alignment isn't. > If we can make the change to fill the GPUVM on a 4k basis, we can fix this issue > and make virtual_address_alignment = 4K. Otherwise, we should fortify the > parameter validation, changing "AMDGPU_GPU_PAGE_MASK" to "~PAGE_MASK". Then the > userspace will just get an EINVAL, instead of a slient VM corruption. And > someone should tell Mesa developers to fix the code in this case. I rather see this as a kernel bug. Can you test if this code fragment fixes your issue: diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c index 64beb3399604..e1260b517e1b 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c @@ -780,7 +780,7 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file *filp) } dev_info->virtual_address_alignment = max((int)PAGE_SIZE, AMDGPU_GPU_PAGE_SIZE); dev_info->pte_fragment_size = (1 << adev->vm_manager.fragment_size) * AMDGPU_GPU_PAGE_SIZE; - dev_info->gart_page_size = AMDGPU_GPU_PAGE_SIZE; + dev_info->gart_page_size = dev_info->virtual_address_alignment; dev_info->cu_active_number = adev->gfx.cu_info.number; dev_info->cu_ao_mask = adev->gfx.cu_info.ao_cu_mask; dev_info->ce_ram_size = adev->gfx.ce_ram_size; Thanks, Christian. > -- > Xi Ruoyao <xry111@mengyan1223.wang> > School of Aerospace Science and Technology, Xidian University > _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply related [flat|nested] 54+ messages in thread
* Re: [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems 2021-03-30 12:55 ` Christian König (?) @ 2021-03-30 13:00 ` Dan Horák -1 siblings, 0 replies; 54+ messages in thread From: Dan Horák @ 2021-03-30 13:00 UTC (permalink / raw) To: Christian König Cc: Xi Ruoyao, Christian König, Alex Deucher, David Airlie, Felix Kuehling, linux-kernel, dri-devel, amd-gfx, Daniel Vetter, stable On Tue, 30 Mar 2021 14:55:01 +0200 Christian König <christian.koenig@amd.com> wrote: > Am 30.03.21 um 14:04 schrieb Xi Ruoyao: > > On 2021-03-30 03:40 +0800, Xi Ruoyao wrote: > >> On 2021-03-29 21:36 +0200, Christian König wrote: > >>> Am 29.03.21 um 21:27 schrieb Xi Ruoyao: > >>>> Hi Christian, > >>>> > >>>> I don't think there is any constraint implemented to ensure `num_entries % > >>>> AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0`. For example, in `amdgpu_vm_bo_map()`: > >>>> > >>>> /* validate the parameters */ > >>>> if (saddr & AMDGPU_GPU_PAGE_MASK || offset & AMDGPU_GPU_PAGE_MASK > >>>> size == 0 || size & AMDGPU_GPU_PAGE_MASK) > >>>> return -EINVAL; > >>>> > >>>> /* snip */ > >>>> > >>>> saddr /= AMDGPU_GPU_PAGE_SIZE; > >>>> eaddr /= AMDGPU_GPU_PAGE_SIZE; > >>>> > >>>> /* snip */ > >>>> > >>>> mapping->start = saddr; > >>>> mapping->last = eaddr; > >>>> > >>>> > >>>> If we really want to ensure (mapping->last - mapping->start + 1) % > >>>> AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0, then we should replace > >>>> "AMDGPU_GPU_PAGE_MASK" > >>>> in "validate the parameters" with "PAGE_MASK". > > It should be "~PAGE_MASK", "PAGE_MASK" has an opposite convention of > > "AMDGPU_GPU_PAGE_MASK" :(. > > > >>> Yeah, good point. > >>> > >>>> I tried it and it broke userspace: Xorg startup fails with EINVAL with > >>>> this > >>>> change. > >>> Well in theory it is possible that we always fill the GPUVM on a 4k > >>> basis while the native page size of the CPU is larger. Let me double > >>> check the code. > > On my platform, there are two issues both causing the VM corruption. One is > > fixed by agd5f/linux@fe001e7. > > What is fe001e7? A commit id? If yes then that is to short and I can't > find it. it's a gitlab shortcut for https://gitlab.freedesktop.org/agd5f/linux/-/commit/fe001e70a55d0378328612be1fdc3abfc68b9ccc Dan > > > Another is in Mesa from userspace: it uses > > `dev_info->gart_page_size` as the alignment, but the kernel AMDGPU driver > > expects it to use `dev_info->virtual_address_alignment`. > > Mhm, looking at the kernel code I would rather say Mesa is correct and > the kernel driver is broken. > > The gart_page_size is limited by the CPU page size, but the > virtual_address_alignment isn't. > > > If we can make the change to fill the GPUVM on a 4k basis, we can fix this issue > > and make virtual_address_alignment = 4K. Otherwise, we should fortify the > > parameter validation, changing "AMDGPU_GPU_PAGE_MASK" to "~PAGE_MASK". Then the > > userspace will just get an EINVAL, instead of a slient VM corruption. And > > someone should tell Mesa developers to fix the code in this case. > > I rather see this as a kernel bug. Can you test if this code fragment > fixes your issue: > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > index 64beb3399604..e1260b517e1b 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > @@ -780,7 +780,7 @@ int amdgpu_info_ioctl(struct drm_device *dev, void > *data, struct drm_file *filp) > } > dev_info->virtual_address_alignment = > max((int)PAGE_SIZE, AMDGPU_GPU_PAGE_SIZE); > dev_info->pte_fragment_size = (1 << > adev->vm_manager.fragment_size) * AMDGPU_GPU_PAGE_SIZE; > - dev_info->gart_page_size = AMDGPU_GPU_PAGE_SIZE; > + dev_info->gart_page_size = > dev_info->virtual_address_alignment; > dev_info->cu_active_number = adev->gfx.cu_info.number; > dev_info->cu_ao_mask = adev->gfx.cu_info.ao_cu_mask; > dev_info->ce_ram_size = adev->gfx.ce_ram_size; > > > Thanks, > Christian. > > > -- > > Xi Ruoyao <xry111@mengyan1223.wang> > > School of Aerospace Science and Technology, Xidian University > > > ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems @ 2021-03-30 13:00 ` Dan Horák 0 siblings, 0 replies; 54+ messages in thread From: Dan Horák @ 2021-03-30 13:00 UTC (permalink / raw) To: Christian König Cc: amd-gfx, David Airlie, Christian König, Felix Kuehling, linux-kernel, dri-devel, Xi Ruoyao, Daniel Vetter, Alex Deucher, stable On Tue, 30 Mar 2021 14:55:01 +0200 Christian König <christian.koenig@amd.com> wrote: > Am 30.03.21 um 14:04 schrieb Xi Ruoyao: > > On 2021-03-30 03:40 +0800, Xi Ruoyao wrote: > >> On 2021-03-29 21:36 +0200, Christian König wrote: > >>> Am 29.03.21 um 21:27 schrieb Xi Ruoyao: > >>>> Hi Christian, > >>>> > >>>> I don't think there is any constraint implemented to ensure `num_entries % > >>>> AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0`. For example, in `amdgpu_vm_bo_map()`: > >>>> > >>>> /* validate the parameters */ > >>>> if (saddr & AMDGPU_GPU_PAGE_MASK || offset & AMDGPU_GPU_PAGE_MASK > >>>> size == 0 || size & AMDGPU_GPU_PAGE_MASK) > >>>> return -EINVAL; > >>>> > >>>> /* snip */ > >>>> > >>>> saddr /= AMDGPU_GPU_PAGE_SIZE; > >>>> eaddr /= AMDGPU_GPU_PAGE_SIZE; > >>>> > >>>> /* snip */ > >>>> > >>>> mapping->start = saddr; > >>>> mapping->last = eaddr; > >>>> > >>>> > >>>> If we really want to ensure (mapping->last - mapping->start + 1) % > >>>> AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0, then we should replace > >>>> "AMDGPU_GPU_PAGE_MASK" > >>>> in "validate the parameters" with "PAGE_MASK". > > It should be "~PAGE_MASK", "PAGE_MASK" has an opposite convention of > > "AMDGPU_GPU_PAGE_MASK" :(. > > > >>> Yeah, good point. > >>> > >>>> I tried it and it broke userspace: Xorg startup fails with EINVAL with > >>>> this > >>>> change. > >>> Well in theory it is possible that we always fill the GPUVM on a 4k > >>> basis while the native page size of the CPU is larger. Let me double > >>> check the code. > > On my platform, there are two issues both causing the VM corruption. One is > > fixed by agd5f/linux@fe001e7. > > What is fe001e7? A commit id? If yes then that is to short and I can't > find it. it's a gitlab shortcut for https://gitlab.freedesktop.org/agd5f/linux/-/commit/fe001e70a55d0378328612be1fdc3abfc68b9ccc Dan > > > Another is in Mesa from userspace: it uses > > `dev_info->gart_page_size` as the alignment, but the kernel AMDGPU driver > > expects it to use `dev_info->virtual_address_alignment`. > > Mhm, looking at the kernel code I would rather say Mesa is correct and > the kernel driver is broken. > > The gart_page_size is limited by the CPU page size, but the > virtual_address_alignment isn't. > > > If we can make the change to fill the GPUVM on a 4k basis, we can fix this issue > > and make virtual_address_alignment = 4K. Otherwise, we should fortify the > > parameter validation, changing "AMDGPU_GPU_PAGE_MASK" to "~PAGE_MASK". Then the > > userspace will just get an EINVAL, instead of a slient VM corruption. And > > someone should tell Mesa developers to fix the code in this case. > > I rather see this as a kernel bug. Can you test if this code fragment > fixes your issue: > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > index 64beb3399604..e1260b517e1b 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > @@ -780,7 +780,7 @@ int amdgpu_info_ioctl(struct drm_device *dev, void > *data, struct drm_file *filp) > } > dev_info->virtual_address_alignment = > max((int)PAGE_SIZE, AMDGPU_GPU_PAGE_SIZE); > dev_info->pte_fragment_size = (1 << > adev->vm_manager.fragment_size) * AMDGPU_GPU_PAGE_SIZE; > - dev_info->gart_page_size = AMDGPU_GPU_PAGE_SIZE; > + dev_info->gart_page_size = > dev_info->virtual_address_alignment; > dev_info->cu_active_number = adev->gfx.cu_info.number; > dev_info->cu_ao_mask = adev->gfx.cu_info.ao_cu_mask; > dev_info->ce_ram_size = adev->gfx.ce_ram_size; > > > Thanks, > Christian. > > > -- > > Xi Ruoyao <xry111@mengyan1223.wang> > > School of Aerospace Science and Technology, Xidian University > > > _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems @ 2021-03-30 13:00 ` Dan Horák 0 siblings, 0 replies; 54+ messages in thread From: Dan Horák @ 2021-03-30 13:00 UTC (permalink / raw) To: Christian König Cc: amd-gfx, David Airlie, Christian König, Felix Kuehling, linux-kernel, dri-devel, Xi Ruoyao, Alex Deucher, stable On Tue, 30 Mar 2021 14:55:01 +0200 Christian König <christian.koenig@amd.com> wrote: > Am 30.03.21 um 14:04 schrieb Xi Ruoyao: > > On 2021-03-30 03:40 +0800, Xi Ruoyao wrote: > >> On 2021-03-29 21:36 +0200, Christian König wrote: > >>> Am 29.03.21 um 21:27 schrieb Xi Ruoyao: > >>>> Hi Christian, > >>>> > >>>> I don't think there is any constraint implemented to ensure `num_entries % > >>>> AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0`. For example, in `amdgpu_vm_bo_map()`: > >>>> > >>>> /* validate the parameters */ > >>>> if (saddr & AMDGPU_GPU_PAGE_MASK || offset & AMDGPU_GPU_PAGE_MASK > >>>> size == 0 || size & AMDGPU_GPU_PAGE_MASK) > >>>> return -EINVAL; > >>>> > >>>> /* snip */ > >>>> > >>>> saddr /= AMDGPU_GPU_PAGE_SIZE; > >>>> eaddr /= AMDGPU_GPU_PAGE_SIZE; > >>>> > >>>> /* snip */ > >>>> > >>>> mapping->start = saddr; > >>>> mapping->last = eaddr; > >>>> > >>>> > >>>> If we really want to ensure (mapping->last - mapping->start + 1) % > >>>> AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0, then we should replace > >>>> "AMDGPU_GPU_PAGE_MASK" > >>>> in "validate the parameters" with "PAGE_MASK". > > It should be "~PAGE_MASK", "PAGE_MASK" has an opposite convention of > > "AMDGPU_GPU_PAGE_MASK" :(. > > > >>> Yeah, good point. > >>> > >>>> I tried it and it broke userspace: Xorg startup fails with EINVAL with > >>>> this > >>>> change. > >>> Well in theory it is possible that we always fill the GPUVM on a 4k > >>> basis while the native page size of the CPU is larger. Let me double > >>> check the code. > > On my platform, there are two issues both causing the VM corruption. One is > > fixed by agd5f/linux@fe001e7. > > What is fe001e7? A commit id? If yes then that is to short and I can't > find it. it's a gitlab shortcut for https://gitlab.freedesktop.org/agd5f/linux/-/commit/fe001e70a55d0378328612be1fdc3abfc68b9ccc Dan > > > Another is in Mesa from userspace: it uses > > `dev_info->gart_page_size` as the alignment, but the kernel AMDGPU driver > > expects it to use `dev_info->virtual_address_alignment`. > > Mhm, looking at the kernel code I would rather say Mesa is correct and > the kernel driver is broken. > > The gart_page_size is limited by the CPU page size, but the > virtual_address_alignment isn't. > > > If we can make the change to fill the GPUVM on a 4k basis, we can fix this issue > > and make virtual_address_alignment = 4K. Otherwise, we should fortify the > > parameter validation, changing "AMDGPU_GPU_PAGE_MASK" to "~PAGE_MASK". Then the > > userspace will just get an EINVAL, instead of a slient VM corruption. And > > someone should tell Mesa developers to fix the code in this case. > > I rather see this as a kernel bug. Can you test if this code fragment > fixes your issue: > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > index 64beb3399604..e1260b517e1b 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > @@ -780,7 +780,7 @@ int amdgpu_info_ioctl(struct drm_device *dev, void > *data, struct drm_file *filp) > } > dev_info->virtual_address_alignment = > max((int)PAGE_SIZE, AMDGPU_GPU_PAGE_SIZE); > dev_info->pte_fragment_size = (1 << > adev->vm_manager.fragment_size) * AMDGPU_GPU_PAGE_SIZE; > - dev_info->gart_page_size = AMDGPU_GPU_PAGE_SIZE; > + dev_info->gart_page_size = > dev_info->virtual_address_alignment; > dev_info->cu_active_number = adev->gfx.cu_info.number; > dev_info->cu_ao_mask = adev->gfx.cu_info.ao_cu_mask; > dev_info->ce_ram_size = adev->gfx.ce_ram_size; > > > Thanks, > Christian. > > > -- > > Xi Ruoyao <xry111@mengyan1223.wang> > > School of Aerospace Science and Technology, Xidian University > > > _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems 2021-03-30 13:00 ` Dan Horák (?) @ 2021-03-30 13:02 ` Christian König -1 siblings, 0 replies; 54+ messages in thread From: Christian König @ 2021-03-30 13:02 UTC (permalink / raw) To: Dan Horák Cc: Xi Ruoyao, Christian König, Alex Deucher, David Airlie, Felix Kuehling, linux-kernel, dri-devel, amd-gfx, Daniel Vetter, stable Am 30.03.21 um 15:00 schrieb Dan Horák: > On Tue, 30 Mar 2021 14:55:01 +0200 > Christian König <christian.koenig@amd.com> wrote: > >> Am 30.03.21 um 14:04 schrieb Xi Ruoyao: >>> On 2021-03-30 03:40 +0800, Xi Ruoyao wrote: >>>> On 2021-03-29 21:36 +0200, Christian König wrote: >>>>> Am 29.03.21 um 21:27 schrieb Xi Ruoyao: >>>>>> Hi Christian, >>>>>> >>>>>> I don't think there is any constraint implemented to ensure `num_entries % >>>>>> AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0`. For example, in `amdgpu_vm_bo_map()`: >>>>>> >>>>>> /* validate the parameters */ >>>>>> if (saddr & AMDGPU_GPU_PAGE_MASK || offset & AMDGPU_GPU_PAGE_MASK >>>>>> size == 0 || size & AMDGPU_GPU_PAGE_MASK) >>>>>> return -EINVAL; >>>>>> >>>>>> /* snip */ >>>>>> >>>>>> saddr /= AMDGPU_GPU_PAGE_SIZE; >>>>>> eaddr /= AMDGPU_GPU_PAGE_SIZE; >>>>>> >>>>>> /* snip */ >>>>>> >>>>>> mapping->start = saddr; >>>>>> mapping->last = eaddr; >>>>>> >>>>>> >>>>>> If we really want to ensure (mapping->last - mapping->start + 1) % >>>>>> AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0, then we should replace >>>>>> "AMDGPU_GPU_PAGE_MASK" >>>>>> in "validate the parameters" with "PAGE_MASK". >>> It should be "~PAGE_MASK", "PAGE_MASK" has an opposite convention of >>> "AMDGPU_GPU_PAGE_MASK" :(. >>> >>>>> Yeah, good point. >>>>> >>>>>> I tried it and it broke userspace: Xorg startup fails with EINVAL with >>>>>> this >>>>>> change. >>>>> Well in theory it is possible that we always fill the GPUVM on a 4k >>>>> basis while the native page size of the CPU is larger. Let me double >>>>> check the code. >>> On my platform, there are two issues both causing the VM corruption. One is >>> fixed by agd5f/linux@fe001e7. >> What is fe001e7? A commit id? If yes then that is to short and I can't >> find it. > it's a gitlab shortcut for > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fagd5f%2Flinux%2F-%2Fcommit%2Ffe001e70a55d0378328612be1fdc3abfc68b9ccc&data=04%7C01%7Cchristian.koenig%40amd.com%7Cd16d123aaa01420ebd0808d8f37bbf2f%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637527060812278536%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=5rFVLxSRnfHUGjhoiqx1e6SeROqbg4ZPef%2BxEvgv%2BTg%3D&reserved=0 Ah! Yes I would expect that this patch is fixing something in this use case. Thanks, Christian. > > > Dan >>> Another is in Mesa from userspace: it uses >>> `dev_info->gart_page_size` as the alignment, but the kernel AMDGPU driver >>> expects it to use `dev_info->virtual_address_alignment`. >> Mhm, looking at the kernel code I would rather say Mesa is correct and >> the kernel driver is broken. >> >> The gart_page_size is limited by the CPU page size, but the >> virtual_address_alignment isn't. >> >>> If we can make the change to fill the GPUVM on a 4k basis, we can fix this issue >>> and make virtual_address_alignment = 4K. Otherwise, we should fortify the >>> parameter validation, changing "AMDGPU_GPU_PAGE_MASK" to "~PAGE_MASK". Then the >>> userspace will just get an EINVAL, instead of a slient VM corruption. And >>> someone should tell Mesa developers to fix the code in this case. >> I rather see this as a kernel bug. Can you test if this code fragment >> fixes your issue: >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c >> index 64beb3399604..e1260b517e1b 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c >> @@ -780,7 +780,7 @@ int amdgpu_info_ioctl(struct drm_device *dev, void >> *data, struct drm_file *filp) >> } >> dev_info->virtual_address_alignment = >> max((int)PAGE_SIZE, AMDGPU_GPU_PAGE_SIZE); >> dev_info->pte_fragment_size = (1 << >> adev->vm_manager.fragment_size) * AMDGPU_GPU_PAGE_SIZE; >> - dev_info->gart_page_size = AMDGPU_GPU_PAGE_SIZE; >> + dev_info->gart_page_size = >> dev_info->virtual_address_alignment; >> dev_info->cu_active_number = adev->gfx.cu_info.number; >> dev_info->cu_ao_mask = adev->gfx.cu_info.ao_cu_mask; >> dev_info->ce_ram_size = adev->gfx.ce_ram_size; >> >> >> Thanks, >> Christian. >> >>> -- >>> Xi Ruoyao <xry111@mengyan1223.wang> >>> School of Aerospace Science and Technology, Xidian University >>> ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems @ 2021-03-30 13:02 ` Christian König 0 siblings, 0 replies; 54+ messages in thread From: Christian König @ 2021-03-30 13:02 UTC (permalink / raw) To: Dan Horák Cc: amd-gfx, David Airlie, Christian König, Felix Kuehling, linux-kernel, dri-devel, Xi Ruoyao, Daniel Vetter, Alex Deucher, stable Am 30.03.21 um 15:00 schrieb Dan Horák: > On Tue, 30 Mar 2021 14:55:01 +0200 > Christian König <christian.koenig@amd.com> wrote: > >> Am 30.03.21 um 14:04 schrieb Xi Ruoyao: >>> On 2021-03-30 03:40 +0800, Xi Ruoyao wrote: >>>> On 2021-03-29 21:36 +0200, Christian König wrote: >>>>> Am 29.03.21 um 21:27 schrieb Xi Ruoyao: >>>>>> Hi Christian, >>>>>> >>>>>> I don't think there is any constraint implemented to ensure `num_entries % >>>>>> AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0`. For example, in `amdgpu_vm_bo_map()`: >>>>>> >>>>>> /* validate the parameters */ >>>>>> if (saddr & AMDGPU_GPU_PAGE_MASK || offset & AMDGPU_GPU_PAGE_MASK >>>>>> size == 0 || size & AMDGPU_GPU_PAGE_MASK) >>>>>> return -EINVAL; >>>>>> >>>>>> /* snip */ >>>>>> >>>>>> saddr /= AMDGPU_GPU_PAGE_SIZE; >>>>>> eaddr /= AMDGPU_GPU_PAGE_SIZE; >>>>>> >>>>>> /* snip */ >>>>>> >>>>>> mapping->start = saddr; >>>>>> mapping->last = eaddr; >>>>>> >>>>>> >>>>>> If we really want to ensure (mapping->last - mapping->start + 1) % >>>>>> AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0, then we should replace >>>>>> "AMDGPU_GPU_PAGE_MASK" >>>>>> in "validate the parameters" with "PAGE_MASK". >>> It should be "~PAGE_MASK", "PAGE_MASK" has an opposite convention of >>> "AMDGPU_GPU_PAGE_MASK" :(. >>> >>>>> Yeah, good point. >>>>> >>>>>> I tried it and it broke userspace: Xorg startup fails with EINVAL with >>>>>> this >>>>>> change. >>>>> Well in theory it is possible that we always fill the GPUVM on a 4k >>>>> basis while the native page size of the CPU is larger. Let me double >>>>> check the code. >>> On my platform, there are two issues both causing the VM corruption. One is >>> fixed by agd5f/linux@fe001e7. >> What is fe001e7? A commit id? If yes then that is to short and I can't >> find it. > it's a gitlab shortcut for > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fagd5f%2Flinux%2F-%2Fcommit%2Ffe001e70a55d0378328612be1fdc3abfc68b9ccc&data=04%7C01%7Cchristian.koenig%40amd.com%7Cd16d123aaa01420ebd0808d8f37bbf2f%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637527060812278536%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=5rFVLxSRnfHUGjhoiqx1e6SeROqbg4ZPef%2BxEvgv%2BTg%3D&reserved=0 Ah! Yes I would expect that this patch is fixing something in this use case. Thanks, Christian. > > > Dan >>> Another is in Mesa from userspace: it uses >>> `dev_info->gart_page_size` as the alignment, but the kernel AMDGPU driver >>> expects it to use `dev_info->virtual_address_alignment`. >> Mhm, looking at the kernel code I would rather say Mesa is correct and >> the kernel driver is broken. >> >> The gart_page_size is limited by the CPU page size, but the >> virtual_address_alignment isn't. >> >>> If we can make the change to fill the GPUVM on a 4k basis, we can fix this issue >>> and make virtual_address_alignment = 4K. Otherwise, we should fortify the >>> parameter validation, changing "AMDGPU_GPU_PAGE_MASK" to "~PAGE_MASK". Then the >>> userspace will just get an EINVAL, instead of a slient VM corruption. And >>> someone should tell Mesa developers to fix the code in this case. >> I rather see this as a kernel bug. Can you test if this code fragment >> fixes your issue: >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c >> index 64beb3399604..e1260b517e1b 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c >> @@ -780,7 +780,7 @@ int amdgpu_info_ioctl(struct drm_device *dev, void >> *data, struct drm_file *filp) >> } >> dev_info->virtual_address_alignment = >> max((int)PAGE_SIZE, AMDGPU_GPU_PAGE_SIZE); >> dev_info->pte_fragment_size = (1 << >> adev->vm_manager.fragment_size) * AMDGPU_GPU_PAGE_SIZE; >> - dev_info->gart_page_size = AMDGPU_GPU_PAGE_SIZE; >> + dev_info->gart_page_size = >> dev_info->virtual_address_alignment; >> dev_info->cu_active_number = adev->gfx.cu_info.number; >> dev_info->cu_ao_mask = adev->gfx.cu_info.ao_cu_mask; >> dev_info->ce_ram_size = adev->gfx.ce_ram_size; >> >> >> Thanks, >> Christian. >> >>> -- >>> Xi Ruoyao <xry111@mengyan1223.wang> >>> School of Aerospace Science and Technology, Xidian University >>> _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems @ 2021-03-30 13:02 ` Christian König 0 siblings, 0 replies; 54+ messages in thread From: Christian König @ 2021-03-30 13:02 UTC (permalink / raw) To: Dan Horák Cc: amd-gfx, David Airlie, Christian König, Felix Kuehling, linux-kernel, dri-devel, Xi Ruoyao, Alex Deucher, stable Am 30.03.21 um 15:00 schrieb Dan Horák: > On Tue, 30 Mar 2021 14:55:01 +0200 > Christian König <christian.koenig@amd.com> wrote: > >> Am 30.03.21 um 14:04 schrieb Xi Ruoyao: >>> On 2021-03-30 03:40 +0800, Xi Ruoyao wrote: >>>> On 2021-03-29 21:36 +0200, Christian König wrote: >>>>> Am 29.03.21 um 21:27 schrieb Xi Ruoyao: >>>>>> Hi Christian, >>>>>> >>>>>> I don't think there is any constraint implemented to ensure `num_entries % >>>>>> AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0`. For example, in `amdgpu_vm_bo_map()`: >>>>>> >>>>>> /* validate the parameters */ >>>>>> if (saddr & AMDGPU_GPU_PAGE_MASK || offset & AMDGPU_GPU_PAGE_MASK >>>>>> size == 0 || size & AMDGPU_GPU_PAGE_MASK) >>>>>> return -EINVAL; >>>>>> >>>>>> /* snip */ >>>>>> >>>>>> saddr /= AMDGPU_GPU_PAGE_SIZE; >>>>>> eaddr /= AMDGPU_GPU_PAGE_SIZE; >>>>>> >>>>>> /* snip */ >>>>>> >>>>>> mapping->start = saddr; >>>>>> mapping->last = eaddr; >>>>>> >>>>>> >>>>>> If we really want to ensure (mapping->last - mapping->start + 1) % >>>>>> AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0, then we should replace >>>>>> "AMDGPU_GPU_PAGE_MASK" >>>>>> in "validate the parameters" with "PAGE_MASK". >>> It should be "~PAGE_MASK", "PAGE_MASK" has an opposite convention of >>> "AMDGPU_GPU_PAGE_MASK" :(. >>> >>>>> Yeah, good point. >>>>> >>>>>> I tried it and it broke userspace: Xorg startup fails with EINVAL with >>>>>> this >>>>>> change. >>>>> Well in theory it is possible that we always fill the GPUVM on a 4k >>>>> basis while the native page size of the CPU is larger. Let me double >>>>> check the code. >>> On my platform, there are two issues both causing the VM corruption. One is >>> fixed by agd5f/linux@fe001e7. >> What is fe001e7? A commit id? If yes then that is to short and I can't >> find it. > it's a gitlab shortcut for > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fagd5f%2Flinux%2F-%2Fcommit%2Ffe001e70a55d0378328612be1fdc3abfc68b9ccc&data=04%7C01%7Cchristian.koenig%40amd.com%7Cd16d123aaa01420ebd0808d8f37bbf2f%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637527060812278536%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=5rFVLxSRnfHUGjhoiqx1e6SeROqbg4ZPef%2BxEvgv%2BTg%3D&reserved=0 Ah! Yes I would expect that this patch is fixing something in this use case. Thanks, Christian. > > > Dan >>> Another is in Mesa from userspace: it uses >>> `dev_info->gart_page_size` as the alignment, but the kernel AMDGPU driver >>> expects it to use `dev_info->virtual_address_alignment`. >> Mhm, looking at the kernel code I would rather say Mesa is correct and >> the kernel driver is broken. >> >> The gart_page_size is limited by the CPU page size, but the >> virtual_address_alignment isn't. >> >>> If we can make the change to fill the GPUVM on a 4k basis, we can fix this issue >>> and make virtual_address_alignment = 4K. Otherwise, we should fortify the >>> parameter validation, changing "AMDGPU_GPU_PAGE_MASK" to "~PAGE_MASK". Then the >>> userspace will just get an EINVAL, instead of a slient VM corruption. And >>> someone should tell Mesa developers to fix the code in this case. >> I rather see this as a kernel bug. Can you test if this code fragment >> fixes your issue: >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c >> index 64beb3399604..e1260b517e1b 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c >> @@ -780,7 +780,7 @@ int amdgpu_info_ioctl(struct drm_device *dev, void >> *data, struct drm_file *filp) >> } >> dev_info->virtual_address_alignment = >> max((int)PAGE_SIZE, AMDGPU_GPU_PAGE_SIZE); >> dev_info->pte_fragment_size = (1 << >> adev->vm_manager.fragment_size) * AMDGPU_GPU_PAGE_SIZE; >> - dev_info->gart_page_size = AMDGPU_GPU_PAGE_SIZE; >> + dev_info->gart_page_size = >> dev_info->virtual_address_alignment; >> dev_info->cu_active_number = adev->gfx.cu_info.number; >> dev_info->cu_ao_mask = adev->gfx.cu_info.ao_cu_mask; >> dev_info->ce_ram_size = adev->gfx.ce_ram_size; >> >> >> Thanks, >> Christian. >> >>> -- >>> Xi Ruoyao <xry111@mengyan1223.wang> >>> School of Aerospace Science and Technology, Xidian University >>> _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems 2021-03-30 12:55 ` Christian König (?) @ 2021-03-30 13:02 ` Xi Ruoyao -1 siblings, 0 replies; 54+ messages in thread From: Xi Ruoyao @ 2021-03-30 13:02 UTC (permalink / raw) To: Christian König, Christian König, Alex Deucher Cc: David Airlie, Felix Kuehling, linux-kernel, dri-devel, Dan Horák, amd-gfx, Daniel Vetter, stable On 2021-03-30 14:55 +0200, Christian König wrote: > Am 30.03.21 um 14:04 schrieb Xi Ruoyao: > > On 2021-03-30 03:40 +0800, Xi Ruoyao wrote: > > > On 2021-03-29 21:36 +0200, Christian König wrote: > > > > Am 29.03.21 um 21:27 schrieb Xi Ruoyao: > > > > > Hi Christian, > > > > > > > > > > I don't think there is any constraint implemented to ensure `num_entries > > > > > % > > > > > AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0`. For example, in > > > > > `amdgpu_vm_bo_map()`: > > > > > > > > > > /* validate the parameters */ > > > > > if (saddr & AMDGPU_GPU_PAGE_MASK || offset & > > > > > AMDGPU_GPU_PAGE_MASK > > > > > size == 0 || size & AMDGPU_GPU_PAGE_MASK) > > > > > return -EINVAL; > > > > > > > > > > /* snip */ > > > > > > > > > > saddr /= AMDGPU_GPU_PAGE_SIZE; > > > > > eaddr /= AMDGPU_GPU_PAGE_SIZE; > > > > > > > > > > /* snip */ > > > > > > > > > > mapping->start = saddr; > > > > > mapping->last = eaddr; > > > > > > > > > > > > > > > If we really want to ensure (mapping->last - mapping->start + 1) % > > > > > AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0, then we should replace > > > > > "AMDGPU_GPU_PAGE_MASK" > > > > > in "validate the parameters" with "PAGE_MASK". > > It should be "~PAGE_MASK", "PAGE_MASK" has an opposite convention of > > "AMDGPU_GPU_PAGE_MASK" :(. > > > > > > Yeah, good point. > > > > > > > > > I tried it and it broke userspace: Xorg startup fails with EINVAL with > > > > > this > > > > > change. > > > > Well in theory it is possible that we always fill the GPUVM on a 4k > > > > basis while the native page size of the CPU is larger. Let me double > > > > check the code. > > On my platform, there are two issues both causing the VM corruption. One is > > fixed by agd5f/linux@fe001e7. > > What is fe001e7? A commit id? If yes then that is to short and I can't > find it. > > > Another is in Mesa from userspace: it uses > > `dev_info->gart_page_size` as the alignment, but the kernel AMDGPU driver > > expects it to use `dev_info->virtual_address_alignment`. > > Mhm, looking at the kernel code I would rather say Mesa is correct and > the kernel driver is broken. > > The gart_page_size is limited by the CPU page size, but the > virtual_address_alignment isn't. > > > If we can make the change to fill the GPUVM on a 4k basis, we can fix this > > issue > > and make virtual_address_alignment = 4K. Otherwise, we should fortify the > > parameter validation, changing "AMDGPU_GPU_PAGE_MASK" to "~PAGE_MASK". Then > > the > > userspace will just get an EINVAL, instead of a slient VM corruption. And > > someone should tell Mesa developers to fix the code in this case. > > I rather see this as a kernel bug. Can you test if this code fragment > fixes your issue: > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > index 64beb3399604..e1260b517e1b 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > @@ -780,7 +780,7 @@ int amdgpu_info_ioctl(struct drm_device *dev, void > *data, struct drm_file *filp) > } > dev_info->virtual_address_alignment = > max((int)PAGE_SIZE, AMDGPU_GPU_PAGE_SIZE); > dev_info->pte_fragment_size = (1 << > adev->vm_manager.fragment_size) * AMDGPU_GPU_PAGE_SIZE; > - dev_info->gart_page_size = AMDGPU_GPU_PAGE_SIZE; > + dev_info->gart_page_size = > dev_info->virtual_address_alignment; > dev_info->cu_active_number = adev->gfx.cu_info.number; > dev_info->cu_ao_mask = adev->gfx.cu_info.ao_cu_mask; > dev_info->ce_ram_size = adev->gfx.ce_ram_size; It works. I've seen it at https://github.com/xen0n/linux/commit/84ada72983838bd7ce54bc32f5d34ac5b5aae191 before (with a common sub-expression, though :). -- Xi Ruoyao <xry111@mengyan1223.wang> School of Aerospace Science and Technology, Xidian University ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems @ 2021-03-30 13:02 ` Xi Ruoyao 0 siblings, 0 replies; 54+ messages in thread From: Xi Ruoyao @ 2021-03-30 13:02 UTC (permalink / raw) To: Christian König, Christian König, Alex Deucher Cc: David Airlie, Felix Kuehling, linux-kernel, amd-gfx, Dan Horák, dri-devel, Daniel Vetter, stable On 2021-03-30 14:55 +0200, Christian König wrote: > Am 30.03.21 um 14:04 schrieb Xi Ruoyao: > > On 2021-03-30 03:40 +0800, Xi Ruoyao wrote: > > > On 2021-03-29 21:36 +0200, Christian König wrote: > > > > Am 29.03.21 um 21:27 schrieb Xi Ruoyao: > > > > > Hi Christian, > > > > > > > > > > I don't think there is any constraint implemented to ensure `num_entries > > > > > % > > > > > AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0`. For example, in > > > > > `amdgpu_vm_bo_map()`: > > > > > > > > > > /* validate the parameters */ > > > > > if (saddr & AMDGPU_GPU_PAGE_MASK || offset & > > > > > AMDGPU_GPU_PAGE_MASK > > > > > size == 0 || size & AMDGPU_GPU_PAGE_MASK) > > > > > return -EINVAL; > > > > > > > > > > /* snip */ > > > > > > > > > > saddr /= AMDGPU_GPU_PAGE_SIZE; > > > > > eaddr /= AMDGPU_GPU_PAGE_SIZE; > > > > > > > > > > /* snip */ > > > > > > > > > > mapping->start = saddr; > > > > > mapping->last = eaddr; > > > > > > > > > > > > > > > If we really want to ensure (mapping->last - mapping->start + 1) % > > > > > AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0, then we should replace > > > > > "AMDGPU_GPU_PAGE_MASK" > > > > > in "validate the parameters" with "PAGE_MASK". > > It should be "~PAGE_MASK", "PAGE_MASK" has an opposite convention of > > "AMDGPU_GPU_PAGE_MASK" :(. > > > > > > Yeah, good point. > > > > > > > > > I tried it and it broke userspace: Xorg startup fails with EINVAL with > > > > > this > > > > > change. > > > > Well in theory it is possible that we always fill the GPUVM on a 4k > > > > basis while the native page size of the CPU is larger. Let me double > > > > check the code. > > On my platform, there are two issues both causing the VM corruption. One is > > fixed by agd5f/linux@fe001e7. > > What is fe001e7? A commit id? If yes then that is to short and I can't > find it. > > > Another is in Mesa from userspace: it uses > > `dev_info->gart_page_size` as the alignment, but the kernel AMDGPU driver > > expects it to use `dev_info->virtual_address_alignment`. > > Mhm, looking at the kernel code I would rather say Mesa is correct and > the kernel driver is broken. > > The gart_page_size is limited by the CPU page size, but the > virtual_address_alignment isn't. > > > If we can make the change to fill the GPUVM on a 4k basis, we can fix this > > issue > > and make virtual_address_alignment = 4K. Otherwise, we should fortify the > > parameter validation, changing "AMDGPU_GPU_PAGE_MASK" to "~PAGE_MASK". Then > > the > > userspace will just get an EINVAL, instead of a slient VM corruption. And > > someone should tell Mesa developers to fix the code in this case. > > I rather see this as a kernel bug. Can you test if this code fragment > fixes your issue: > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > index 64beb3399604..e1260b517e1b 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > @@ -780,7 +780,7 @@ int amdgpu_info_ioctl(struct drm_device *dev, void > *data, struct drm_file *filp) > } > dev_info->virtual_address_alignment = > max((int)PAGE_SIZE, AMDGPU_GPU_PAGE_SIZE); > dev_info->pte_fragment_size = (1 << > adev->vm_manager.fragment_size) * AMDGPU_GPU_PAGE_SIZE; > - dev_info->gart_page_size = AMDGPU_GPU_PAGE_SIZE; > + dev_info->gart_page_size = > dev_info->virtual_address_alignment; > dev_info->cu_active_number = adev->gfx.cu_info.number; > dev_info->cu_ao_mask = adev->gfx.cu_info.ao_cu_mask; > dev_info->ce_ram_size = adev->gfx.ce_ram_size; It works. I've seen it at https://github.com/xen0n/linux/commit/84ada72983838bd7ce54bc32f5d34ac5b5aae191 before (with a common sub-expression, though :). -- Xi Ruoyao <xry111@mengyan1223.wang> School of Aerospace Science and Technology, Xidian University _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems @ 2021-03-30 13:02 ` Xi Ruoyao 0 siblings, 0 replies; 54+ messages in thread From: Xi Ruoyao @ 2021-03-30 13:02 UTC (permalink / raw) To: Christian König, Christian König, Alex Deucher Cc: David Airlie, Felix Kuehling, linux-kernel, amd-gfx, Dan Horák, dri-devel, stable On 2021-03-30 14:55 +0200, Christian König wrote: > Am 30.03.21 um 14:04 schrieb Xi Ruoyao: > > On 2021-03-30 03:40 +0800, Xi Ruoyao wrote: > > > On 2021-03-29 21:36 +0200, Christian König wrote: > > > > Am 29.03.21 um 21:27 schrieb Xi Ruoyao: > > > > > Hi Christian, > > > > > > > > > > I don't think there is any constraint implemented to ensure `num_entries > > > > > % > > > > > AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0`. For example, in > > > > > `amdgpu_vm_bo_map()`: > > > > > > > > > > /* validate the parameters */ > > > > > if (saddr & AMDGPU_GPU_PAGE_MASK || offset & > > > > > AMDGPU_GPU_PAGE_MASK > > > > > size == 0 || size & AMDGPU_GPU_PAGE_MASK) > > > > > return -EINVAL; > > > > > > > > > > /* snip */ > > > > > > > > > > saddr /= AMDGPU_GPU_PAGE_SIZE; > > > > > eaddr /= AMDGPU_GPU_PAGE_SIZE; > > > > > > > > > > /* snip */ > > > > > > > > > > mapping->start = saddr; > > > > > mapping->last = eaddr; > > > > > > > > > > > > > > > If we really want to ensure (mapping->last - mapping->start + 1) % > > > > > AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0, then we should replace > > > > > "AMDGPU_GPU_PAGE_MASK" > > > > > in "validate the parameters" with "PAGE_MASK". > > It should be "~PAGE_MASK", "PAGE_MASK" has an opposite convention of > > "AMDGPU_GPU_PAGE_MASK" :(. > > > > > > Yeah, good point. > > > > > > > > > I tried it and it broke userspace: Xorg startup fails with EINVAL with > > > > > this > > > > > change. > > > > Well in theory it is possible that we always fill the GPUVM on a 4k > > > > basis while the native page size of the CPU is larger. Let me double > > > > check the code. > > On my platform, there are two issues both causing the VM corruption. One is > > fixed by agd5f/linux@fe001e7. > > What is fe001e7? A commit id? If yes then that is to short and I can't > find it. > > > Another is in Mesa from userspace: it uses > > `dev_info->gart_page_size` as the alignment, but the kernel AMDGPU driver > > expects it to use `dev_info->virtual_address_alignment`. > > Mhm, looking at the kernel code I would rather say Mesa is correct and > the kernel driver is broken. > > The gart_page_size is limited by the CPU page size, but the > virtual_address_alignment isn't. > > > If we can make the change to fill the GPUVM on a 4k basis, we can fix this > > issue > > and make virtual_address_alignment = 4K. Otherwise, we should fortify the > > parameter validation, changing "AMDGPU_GPU_PAGE_MASK" to "~PAGE_MASK". Then > > the > > userspace will just get an EINVAL, instead of a slient VM corruption. And > > someone should tell Mesa developers to fix the code in this case. > > I rather see this as a kernel bug. Can you test if this code fragment > fixes your issue: > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > index 64beb3399604..e1260b517e1b 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > @@ -780,7 +780,7 @@ int amdgpu_info_ioctl(struct drm_device *dev, void > *data, struct drm_file *filp) > } > dev_info->virtual_address_alignment = > max((int)PAGE_SIZE, AMDGPU_GPU_PAGE_SIZE); > dev_info->pte_fragment_size = (1 << > adev->vm_manager.fragment_size) * AMDGPU_GPU_PAGE_SIZE; > - dev_info->gart_page_size = AMDGPU_GPU_PAGE_SIZE; > + dev_info->gart_page_size = > dev_info->virtual_address_alignment; > dev_info->cu_active_number = adev->gfx.cu_info.number; > dev_info->cu_ao_mask = adev->gfx.cu_info.ao_cu_mask; > dev_info->ce_ram_size = adev->gfx.ce_ram_size; It works. I've seen it at https://github.com/xen0n/linux/commit/84ada72983838bd7ce54bc32f5d34ac5b5aae191 before (with a common sub-expression, though :). -- Xi Ruoyao <xry111@mengyan1223.wang> School of Aerospace Science and Technology, Xidian University _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems 2021-03-30 13:02 ` Xi Ruoyao (?) @ 2021-03-30 13:09 ` Xi Ruoyao -1 siblings, 0 replies; 54+ messages in thread From: Xi Ruoyao @ 2021-03-30 13:09 UTC (permalink / raw) To: Christian König, Christian König, Alex Deucher Cc: David Airlie, Felix Kuehling, linux-kernel, dri-devel, Dan Horák, amd-gfx, Daniel Vetter, stable On 2021-03-30 21:02 +0800, Xi Ruoyao wrote: > On 2021-03-30 14:55 +0200, Christian König wrote: > > > > I rather see this as a kernel bug. Can you test if this code fragment > > fixes your issue: > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > > index 64beb3399604..e1260b517e1b 100644 > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > > @@ -780,7 +780,7 @@ int amdgpu_info_ioctl(struct drm_device *dev, void > > *data, struct drm_file *filp) > > } > > dev_info->virtual_address_alignment = > > max((int)PAGE_SIZE, AMDGPU_GPU_PAGE_SIZE); > > dev_info->pte_fragment_size = (1 << > > adev->vm_manager.fragment_size) * AMDGPU_GPU_PAGE_SIZE; > > - dev_info->gart_page_size = AMDGPU_GPU_PAGE_SIZE; > > + dev_info->gart_page_size = > > dev_info->virtual_address_alignment; > > dev_info->cu_active_number = adev->gfx.cu_info.number; > > dev_info->cu_ao_mask = adev->gfx.cu_info.ao_cu_mask; > > dev_info->ce_ram_size = adev->gfx.ce_ram_size; > > It works. I've seen it at > https://github.com/xen0n/linux/commit/84ada72983838bd7ce54bc32f5d34ac5b5aae191 > before (with a common sub-expression, though :). Some comment: on an old version of Fedora ported by Loongson, Xorg just hangs without this commit. But on the system I built from source, I didn't see any issue before Linux 5.11. So I misbelieved that it was something already fixed. Dan: you can try it on your PPC 64 with non-4K page as well. -- Xi Ruoyao <xry111@mengyan1223.wang> School of Aerospace Science and Technology, Xidian University ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems @ 2021-03-30 13:09 ` Xi Ruoyao 0 siblings, 0 replies; 54+ messages in thread From: Xi Ruoyao @ 2021-03-30 13:09 UTC (permalink / raw) To: Christian König, Christian König, Alex Deucher Cc: David Airlie, Felix Kuehling, linux-kernel, amd-gfx, Dan Horák, dri-devel, Daniel Vetter, stable On 2021-03-30 21:02 +0800, Xi Ruoyao wrote: > On 2021-03-30 14:55 +0200, Christian König wrote: > > > > I rather see this as a kernel bug. Can you test if this code fragment > > fixes your issue: > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > > index 64beb3399604..e1260b517e1b 100644 > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > > @@ -780,7 +780,7 @@ int amdgpu_info_ioctl(struct drm_device *dev, void > > *data, struct drm_file *filp) > > } > > dev_info->virtual_address_alignment = > > max((int)PAGE_SIZE, AMDGPU_GPU_PAGE_SIZE); > > dev_info->pte_fragment_size = (1 << > > adev->vm_manager.fragment_size) * AMDGPU_GPU_PAGE_SIZE; > > - dev_info->gart_page_size = AMDGPU_GPU_PAGE_SIZE; > > + dev_info->gart_page_size = > > dev_info->virtual_address_alignment; > > dev_info->cu_active_number = adev->gfx.cu_info.number; > > dev_info->cu_ao_mask = adev->gfx.cu_info.ao_cu_mask; > > dev_info->ce_ram_size = adev->gfx.ce_ram_size; > > It works. I've seen it at > https://github.com/xen0n/linux/commit/84ada72983838bd7ce54bc32f5d34ac5b5aae191 > before (with a common sub-expression, though :). Some comment: on an old version of Fedora ported by Loongson, Xorg just hangs without this commit. But on the system I built from source, I didn't see any issue before Linux 5.11. So I misbelieved that it was something already fixed. Dan: you can try it on your PPC 64 with non-4K page as well. -- Xi Ruoyao <xry111@mengyan1223.wang> School of Aerospace Science and Technology, Xidian University _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems @ 2021-03-30 13:09 ` Xi Ruoyao 0 siblings, 0 replies; 54+ messages in thread From: Xi Ruoyao @ 2021-03-30 13:09 UTC (permalink / raw) To: Christian König, Christian König, Alex Deucher Cc: David Airlie, Felix Kuehling, linux-kernel, amd-gfx, Dan Horák, dri-devel, stable On 2021-03-30 21:02 +0800, Xi Ruoyao wrote: > On 2021-03-30 14:55 +0200, Christian König wrote: > > > > I rather see this as a kernel bug. Can you test if this code fragment > > fixes your issue: > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > > index 64beb3399604..e1260b517e1b 100644 > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > > @@ -780,7 +780,7 @@ int amdgpu_info_ioctl(struct drm_device *dev, void > > *data, struct drm_file *filp) > > } > > dev_info->virtual_address_alignment = > > max((int)PAGE_SIZE, AMDGPU_GPU_PAGE_SIZE); > > dev_info->pte_fragment_size = (1 << > > adev->vm_manager.fragment_size) * AMDGPU_GPU_PAGE_SIZE; > > - dev_info->gart_page_size = AMDGPU_GPU_PAGE_SIZE; > > + dev_info->gart_page_size = > > dev_info->virtual_address_alignment; > > dev_info->cu_active_number = adev->gfx.cu_info.number; > > dev_info->cu_ao_mask = adev->gfx.cu_info.ao_cu_mask; > > dev_info->ce_ram_size = adev->gfx.ce_ram_size; > > It works. I've seen it at > https://github.com/xen0n/linux/commit/84ada72983838bd7ce54bc32f5d34ac5b5aae191 > before (with a common sub-expression, though :). Some comment: on an old version of Fedora ported by Loongson, Xorg just hangs without this commit. But on the system I built from source, I didn't see any issue before Linux 5.11. So I misbelieved that it was something already fixed. Dan: you can try it on your PPC 64 with non-4K page as well. -- Xi Ruoyao <xry111@mengyan1223.wang> School of Aerospace Science and Technology, Xidian University _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems 2021-03-30 13:09 ` Xi Ruoyao (?) @ 2021-03-30 13:23 ` Dan Horák -1 siblings, 0 replies; 54+ messages in thread From: Dan Horák @ 2021-03-30 13:23 UTC (permalink / raw) To: Xi Ruoyao Cc: Christian König, Christian König, Alex Deucher, David Airlie, Felix Kuehling, linux-kernel, dri-devel, amd-gfx, Daniel Vetter, stable On Tue, 30 Mar 2021 21:09:12 +0800 Xi Ruoyao <xry111@mengyan1223.wang> wrote: > On 2021-03-30 21:02 +0800, Xi Ruoyao wrote: > > On 2021-03-30 14:55 +0200, Christian König wrote: > > > > > > I rather see this as a kernel bug. Can you test if this code fragment > > > fixes your issue: > > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > > > index 64beb3399604..e1260b517e1b 100644 > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > > > @@ -780,7 +780,7 @@ int amdgpu_info_ioctl(struct drm_device *dev, void > > > *data, struct drm_file *filp) > > > } > > > dev_info->virtual_address_alignment = > > > max((int)PAGE_SIZE, AMDGPU_GPU_PAGE_SIZE); > > > dev_info->pte_fragment_size = (1 << > > > adev->vm_manager.fragment_size) * AMDGPU_GPU_PAGE_SIZE; > > > - dev_info->gart_page_size = AMDGPU_GPU_PAGE_SIZE; > > > + dev_info->gart_page_size = > > > dev_info->virtual_address_alignment; > > > dev_info->cu_active_number = adev->gfx.cu_info.number; > > > dev_info->cu_ao_mask = adev->gfx.cu_info.ao_cu_mask; > > > dev_info->ce_ram_size = adev->gfx.ce_ram_size; > > > > It works. I've seen it at > > https://github.com/xen0n/linux/commit/84ada72983838bd7ce54bc32f5d34ac5b5aae191 > > before (with a common sub-expression, though :). > > Some comment: on an old version of Fedora ported by Loongson, Xorg just hangs > without this commit. But on the system I built from source, I didn't see any > issue before Linux 5.11. So I misbelieved that it was something already fixed. > > Dan: you can try it on your PPC 64 with non-4K page as well. yup, looks good here as well, ppc64le (Power9) system with 64KB pages Dan ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems @ 2021-03-30 13:23 ` Dan Horák 0 siblings, 0 replies; 54+ messages in thread From: Dan Horák @ 2021-03-30 13:23 UTC (permalink / raw) To: Xi Ruoyao Cc: David Airlie, Christian König, Felix Kuehling, linux-kernel, dri-devel, amd-gfx, Daniel Vetter, Alex Deucher, stable, Christian König On Tue, 30 Mar 2021 21:09:12 +0800 Xi Ruoyao <xry111@mengyan1223.wang> wrote: > On 2021-03-30 21:02 +0800, Xi Ruoyao wrote: > > On 2021-03-30 14:55 +0200, Christian König wrote: > > > > > > I rather see this as a kernel bug. Can you test if this code fragment > > > fixes your issue: > > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > > > index 64beb3399604..e1260b517e1b 100644 > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > > > @@ -780,7 +780,7 @@ int amdgpu_info_ioctl(struct drm_device *dev, void > > > *data, struct drm_file *filp) > > > } > > > dev_info->virtual_address_alignment = > > > max((int)PAGE_SIZE, AMDGPU_GPU_PAGE_SIZE); > > > dev_info->pte_fragment_size = (1 << > > > adev->vm_manager.fragment_size) * AMDGPU_GPU_PAGE_SIZE; > > > - dev_info->gart_page_size = AMDGPU_GPU_PAGE_SIZE; > > > + dev_info->gart_page_size = > > > dev_info->virtual_address_alignment; > > > dev_info->cu_active_number = adev->gfx.cu_info.number; > > > dev_info->cu_ao_mask = adev->gfx.cu_info.ao_cu_mask; > > > dev_info->ce_ram_size = adev->gfx.ce_ram_size; > > > > It works. I've seen it at > > https://github.com/xen0n/linux/commit/84ada72983838bd7ce54bc32f5d34ac5b5aae191 > > before (with a common sub-expression, though :). > > Some comment: on an old version of Fedora ported by Loongson, Xorg just hangs > without this commit. But on the system I built from source, I didn't see any > issue before Linux 5.11. So I misbelieved that it was something already fixed. > > Dan: you can try it on your PPC 64 with non-4K page as well. yup, looks good here as well, ppc64le (Power9) system with 64KB pages Dan _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems @ 2021-03-30 13:23 ` Dan Horák 0 siblings, 0 replies; 54+ messages in thread From: Dan Horák @ 2021-03-30 13:23 UTC (permalink / raw) To: Xi Ruoyao Cc: David Airlie, Christian König, Felix Kuehling, linux-kernel, dri-devel, amd-gfx, Alex Deucher, stable, Christian König On Tue, 30 Mar 2021 21:09:12 +0800 Xi Ruoyao <xry111@mengyan1223.wang> wrote: > On 2021-03-30 21:02 +0800, Xi Ruoyao wrote: > > On 2021-03-30 14:55 +0200, Christian König wrote: > > > > > > I rather see this as a kernel bug. Can you test if this code fragment > > > fixes your issue: > > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > > > index 64beb3399604..e1260b517e1b 100644 > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > > > @@ -780,7 +780,7 @@ int amdgpu_info_ioctl(struct drm_device *dev, void > > > *data, struct drm_file *filp) > > > } > > > dev_info->virtual_address_alignment = > > > max((int)PAGE_SIZE, AMDGPU_GPU_PAGE_SIZE); > > > dev_info->pte_fragment_size = (1 << > > > adev->vm_manager.fragment_size) * AMDGPU_GPU_PAGE_SIZE; > > > - dev_info->gart_page_size = AMDGPU_GPU_PAGE_SIZE; > > > + dev_info->gart_page_size = > > > dev_info->virtual_address_alignment; > > > dev_info->cu_active_number = adev->gfx.cu_info.number; > > > dev_info->cu_ao_mask = adev->gfx.cu_info.ao_cu_mask; > > > dev_info->ce_ram_size = adev->gfx.ce_ram_size; > > > > It works. I've seen it at > > https://github.com/xen0n/linux/commit/84ada72983838bd7ce54bc32f5d34ac5b5aae191 > > before (with a common sub-expression, though :). > > Some comment: on an old version of Fedora ported by Loongson, Xorg just hangs > without this commit. But on the system I built from source, I didn't see any > issue before Linux 5.11. So I misbelieved that it was something already fixed. > > Dan: you can try it on your PPC 64 with non-4K page as well. yup, looks good here as well, ppc64le (Power9) system with 64KB pages Dan _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems 2021-03-30 13:23 ` Dan Horák (?) @ 2021-03-30 13:24 ` Christian König -1 siblings, 0 replies; 54+ messages in thread From: Christian König @ 2021-03-30 13:24 UTC (permalink / raw) To: Dan Horák, Xi Ruoyao Cc: Christian König, Alex Deucher, David Airlie, Felix Kuehling, linux-kernel, dri-devel, amd-gfx, Daniel Vetter, stable Am 30.03.21 um 15:23 schrieb Dan Horák: > On Tue, 30 Mar 2021 21:09:12 +0800 > Xi Ruoyao <xry111@mengyan1223.wang> wrote: > >> On 2021-03-30 21:02 +0800, Xi Ruoyao wrote: >>> On 2021-03-30 14:55 +0200, Christian König wrote: >>>> I rather see this as a kernel bug. Can you test if this code fragment >>>> fixes your issue: >>>> >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c >>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c >>>> index 64beb3399604..e1260b517e1b 100644 >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c >>>> @@ -780,7 +780,7 @@ int amdgpu_info_ioctl(struct drm_device *dev, void >>>> *data, struct drm_file *filp) >>>> } >>>> dev_info->virtual_address_alignment = >>>> max((int)PAGE_SIZE, AMDGPU_GPU_PAGE_SIZE); >>>> dev_info->pte_fragment_size = (1 << >>>> adev->vm_manager.fragment_size) * AMDGPU_GPU_PAGE_SIZE; >>>> - dev_info->gart_page_size = AMDGPU_GPU_PAGE_SIZE; >>>> + dev_info->gart_page_size = >>>> dev_info->virtual_address_alignment; >>>> dev_info->cu_active_number = adev->gfx.cu_info.number; >>>> dev_info->cu_ao_mask = adev->gfx.cu_info.ao_cu_mask; >>>> dev_info->ce_ram_size = adev->gfx.ce_ram_size; >>> It works. I've seen it at >>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fxen0n%2Flinux%2Fcommit%2F84ada72983838bd7ce54bc32f5d34ac5b5aae191&data=04%7C01%7Cchristian.koenig%40amd.com%7Cf37fddf20a8847edf67808d8f37ef23c%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637527074118791321%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=DZnmee38NGpiWRMX5LmlxOhxAzBMhAusnAWNnCxXTJ0%3D&reserved=0 >>> before (with a common sub-expression, though :). >> Some comment: on an old version of Fedora ported by Loongson, Xorg just hangs >> without this commit. But on the system I built from source, I didn't see any >> issue before Linux 5.11. So I misbelieved that it was something already fixed. >> >> Dan: you can try it on your PPC 64 with non-4K page as well. > yup, looks good here as well, ppc64le (Power9) system with 64KB pages Mhm, as far as I can say this patch never made it to us. Can you please send it to the mailing list with me on CC? Thanks, Christian. > > > Dan ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems @ 2021-03-30 13:24 ` Christian König 0 siblings, 0 replies; 54+ messages in thread From: Christian König @ 2021-03-30 13:24 UTC (permalink / raw) To: Dan Horák, Xi Ruoyao Cc: David Airlie, Christian König, Felix Kuehling, linux-kernel, dri-devel, amd-gfx, Daniel Vetter, Alex Deucher, stable Am 30.03.21 um 15:23 schrieb Dan Horák: > On Tue, 30 Mar 2021 21:09:12 +0800 > Xi Ruoyao <xry111@mengyan1223.wang> wrote: > >> On 2021-03-30 21:02 +0800, Xi Ruoyao wrote: >>> On 2021-03-30 14:55 +0200, Christian König wrote: >>>> I rather see this as a kernel bug. Can you test if this code fragment >>>> fixes your issue: >>>> >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c >>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c >>>> index 64beb3399604..e1260b517e1b 100644 >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c >>>> @@ -780,7 +780,7 @@ int amdgpu_info_ioctl(struct drm_device *dev, void >>>> *data, struct drm_file *filp) >>>> } >>>> dev_info->virtual_address_alignment = >>>> max((int)PAGE_SIZE, AMDGPU_GPU_PAGE_SIZE); >>>> dev_info->pte_fragment_size = (1 << >>>> adev->vm_manager.fragment_size) * AMDGPU_GPU_PAGE_SIZE; >>>> - dev_info->gart_page_size = AMDGPU_GPU_PAGE_SIZE; >>>> + dev_info->gart_page_size = >>>> dev_info->virtual_address_alignment; >>>> dev_info->cu_active_number = adev->gfx.cu_info.number; >>>> dev_info->cu_ao_mask = adev->gfx.cu_info.ao_cu_mask; >>>> dev_info->ce_ram_size = adev->gfx.ce_ram_size; >>> It works. I've seen it at >>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fxen0n%2Flinux%2Fcommit%2F84ada72983838bd7ce54bc32f5d34ac5b5aae191&data=04%7C01%7Cchristian.koenig%40amd.com%7Cf37fddf20a8847edf67808d8f37ef23c%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637527074118791321%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=DZnmee38NGpiWRMX5LmlxOhxAzBMhAusnAWNnCxXTJ0%3D&reserved=0 >>> before (with a common sub-expression, though :). >> Some comment: on an old version of Fedora ported by Loongson, Xorg just hangs >> without this commit. But on the system I built from source, I didn't see any >> issue before Linux 5.11. So I misbelieved that it was something already fixed. >> >> Dan: you can try it on your PPC 64 with non-4K page as well. > yup, looks good here as well, ppc64le (Power9) system with 64KB pages Mhm, as far as I can say this patch never made it to us. Can you please send it to the mailing list with me on CC? Thanks, Christian. > > > Dan _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems @ 2021-03-30 13:24 ` Christian König 0 siblings, 0 replies; 54+ messages in thread From: Christian König @ 2021-03-30 13:24 UTC (permalink / raw) To: Dan Horák, Xi Ruoyao Cc: David Airlie, Christian König, Felix Kuehling, linux-kernel, dri-devel, amd-gfx, Alex Deucher, stable Am 30.03.21 um 15:23 schrieb Dan Horák: > On Tue, 30 Mar 2021 21:09:12 +0800 > Xi Ruoyao <xry111@mengyan1223.wang> wrote: > >> On 2021-03-30 21:02 +0800, Xi Ruoyao wrote: >>> On 2021-03-30 14:55 +0200, Christian König wrote: >>>> I rather see this as a kernel bug. Can you test if this code fragment >>>> fixes your issue: >>>> >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c >>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c >>>> index 64beb3399604..e1260b517e1b 100644 >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c >>>> @@ -780,7 +780,7 @@ int amdgpu_info_ioctl(struct drm_device *dev, void >>>> *data, struct drm_file *filp) >>>> } >>>> dev_info->virtual_address_alignment = >>>> max((int)PAGE_SIZE, AMDGPU_GPU_PAGE_SIZE); >>>> dev_info->pte_fragment_size = (1 << >>>> adev->vm_manager.fragment_size) * AMDGPU_GPU_PAGE_SIZE; >>>> - dev_info->gart_page_size = AMDGPU_GPU_PAGE_SIZE; >>>> + dev_info->gart_page_size = >>>> dev_info->virtual_address_alignment; >>>> dev_info->cu_active_number = adev->gfx.cu_info.number; >>>> dev_info->cu_ao_mask = adev->gfx.cu_info.ao_cu_mask; >>>> dev_info->ce_ram_size = adev->gfx.ce_ram_size; >>> It works. I've seen it at >>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fxen0n%2Flinux%2Fcommit%2F84ada72983838bd7ce54bc32f5d34ac5b5aae191&data=04%7C01%7Cchristian.koenig%40amd.com%7Cf37fddf20a8847edf67808d8f37ef23c%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637527074118791321%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=DZnmee38NGpiWRMX5LmlxOhxAzBMhAusnAWNnCxXTJ0%3D&reserved=0 >>> before (with a common sub-expression, though :). >> Some comment: on an old version of Fedora ported by Loongson, Xorg just hangs >> without this commit. But on the system I built from source, I didn't see any >> issue before Linux 5.11. So I misbelieved that it was something already fixed. >> >> Dan: you can try it on your PPC 64 with non-4K page as well. > yup, looks good here as well, ppc64le (Power9) system with 64KB pages Mhm, as far as I can say this patch never made it to us. Can you please send it to the mailing list with me on CC? Thanks, Christian. > > > Dan _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems 2021-03-30 13:24 ` Christian König (?) @ 2021-03-30 15:46 ` Xi Ruoyao -1 siblings, 0 replies; 54+ messages in thread From: Xi Ruoyao @ 2021-03-30 15:46 UTC (permalink / raw) To: Christian König, Dan Horák Cc: Christian König, Alex Deucher, David Airlie, Felix Kuehling, linux-kernel, dri-devel, amd-gfx, Daniel Vetter, stable On 2021-03-30 15:24 +0200, Christian König wrote: > Am 30.03.21 um 15:23 schrieb Dan Horák: > > On Tue, 30 Mar 2021 21:09:12 +0800 > > Xi Ruoyao <xry111@mengyan1223.wang> wrote: > > > > > On 2021-03-30 21:02 +0800, Xi Ruoyao wrote: > > > > On 2021-03-30 14:55 +0200, Christian König wrote: > > > > > I rather see this as a kernel bug. Can you test if this code fragment > > > > > fixes your issue: > > > > > > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > > > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > > > > > index 64beb3399604..e1260b517e1b 100644 > > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > > > > > @@ -780,7 +780,7 @@ int amdgpu_info_ioctl(struct drm_device *dev, void > > > > > *data, struct drm_file *filp) > > > > > } > > > > > dev_info->virtual_address_alignment = > > > > > max((int)PAGE_SIZE, AMDGPU_GPU_PAGE_SIZE); > > > > > dev_info->pte_fragment_size = (1 << > > > > > adev->vm_manager.fragment_size) * AMDGPU_GPU_PAGE_SIZE; > > > > > - dev_info->gart_page_size = AMDGPU_GPU_PAGE_SIZE; > > > > > + dev_info->gart_page_size = > > > > > dev_info->virtual_address_alignment; > > > > > dev_info->cu_active_number = adev- > > > > > >gfx.cu_info.number; > > > > > dev_info->cu_ao_mask = adev->gfx.cu_info.ao_cu_mask; > > > > > dev_info->ce_ram_size = adev->gfx.ce_ram_size; > > > > It works. I've seen it at > > > > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fxen0n%2Flinux%2Fcommit%2F84ada72983838bd7ce54bc32f5d34ac5b5aae191&data=04%7C01%7Cchristian.koenig%40amd.com%7Cf37fddf20a8847edf67808d8f37ef23c%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637527074118791321%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=DZnmee38NGpiWRMX5LmlxOhxAzBMhAusnAWNnCxXTJ0%3D&reserved=0 > > > > before (with a common sub-expression, though :). > > > Some comment: on an old version of Fedora ported by Loongson, Xorg just > > > hangs > > > without this commit. But on the system I built from source, I didn't see > > > any > > > issue before Linux 5.11. So I misbelieved that it was something already > > > fixed. > > > > > > Dan: you can try it on your PPC 64 with non-4K page as well. > > yup, looks good here as well, ppc64le (Power9) system with 64KB pages > > Mhm, as far as I can say this patch never made it to us. I think maybe its author considers it a "workaround" like me, as there is already a "virtual_address_alignment" which seems correct. > Can you please send it to the mailing list with me on CC? I've sent it, together with my patch for using ~PAGE_MASK in parameter validation. -- Xi Ruoyao <xry111@mengyan1223.wang> School of Aerospace Science and Technology, Xidian University ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems @ 2021-03-30 15:46 ` Xi Ruoyao 0 siblings, 0 replies; 54+ messages in thread From: Xi Ruoyao @ 2021-03-30 15:46 UTC (permalink / raw) To: Christian König, Dan Horák Cc: David Airlie, Christian König, Felix Kuehling, linux-kernel, dri-devel, amd-gfx, Daniel Vetter, Alex Deucher, stable On 2021-03-30 15:24 +0200, Christian König wrote: > Am 30.03.21 um 15:23 schrieb Dan Horák: > > On Tue, 30 Mar 2021 21:09:12 +0800 > > Xi Ruoyao <xry111@mengyan1223.wang> wrote: > > > > > On 2021-03-30 21:02 +0800, Xi Ruoyao wrote: > > > > On 2021-03-30 14:55 +0200, Christian König wrote: > > > > > I rather see this as a kernel bug. Can you test if this code fragment > > > > > fixes your issue: > > > > > > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > > > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > > > > > index 64beb3399604..e1260b517e1b 100644 > > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > > > > > @@ -780,7 +780,7 @@ int amdgpu_info_ioctl(struct drm_device *dev, void > > > > > *data, struct drm_file *filp) > > > > > } > > > > > dev_info->virtual_address_alignment = > > > > > max((int)PAGE_SIZE, AMDGPU_GPU_PAGE_SIZE); > > > > > dev_info->pte_fragment_size = (1 << > > > > > adev->vm_manager.fragment_size) * AMDGPU_GPU_PAGE_SIZE; > > > > > - dev_info->gart_page_size = AMDGPU_GPU_PAGE_SIZE; > > > > > + dev_info->gart_page_size = > > > > > dev_info->virtual_address_alignment; > > > > > dev_info->cu_active_number = adev- > > > > > >gfx.cu_info.number; > > > > > dev_info->cu_ao_mask = adev->gfx.cu_info.ao_cu_mask; > > > > > dev_info->ce_ram_size = adev->gfx.ce_ram_size; > > > > It works. I've seen it at > > > > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fxen0n%2Flinux%2Fcommit%2F84ada72983838bd7ce54bc32f5d34ac5b5aae191&data=04%7C01%7Cchristian.koenig%40amd.com%7Cf37fddf20a8847edf67808d8f37ef23c%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637527074118791321%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=DZnmee38NGpiWRMX5LmlxOhxAzBMhAusnAWNnCxXTJ0%3D&reserved=0 > > > > before (with a common sub-expression, though :). > > > Some comment: on an old version of Fedora ported by Loongson, Xorg just > > > hangs > > > without this commit. But on the system I built from source, I didn't see > > > any > > > issue before Linux 5.11. So I misbelieved that it was something already > > > fixed. > > > > > > Dan: you can try it on your PPC 64 with non-4K page as well. > > yup, looks good here as well, ppc64le (Power9) system with 64KB pages > > Mhm, as far as I can say this patch never made it to us. I think maybe its author considers it a "workaround" like me, as there is already a "virtual_address_alignment" which seems correct. > Can you please send it to the mailing list with me on CC? I've sent it, together with my patch for using ~PAGE_MASK in parameter validation. -- Xi Ruoyao <xry111@mengyan1223.wang> School of Aerospace Science and Technology, Xidian University _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems @ 2021-03-30 15:46 ` Xi Ruoyao 0 siblings, 0 replies; 54+ messages in thread From: Xi Ruoyao @ 2021-03-30 15:46 UTC (permalink / raw) To: Christian König, Dan Horák Cc: David Airlie, Christian König, Felix Kuehling, linux-kernel, dri-devel, amd-gfx, Alex Deucher, stable On 2021-03-30 15:24 +0200, Christian König wrote: > Am 30.03.21 um 15:23 schrieb Dan Horák: > > On Tue, 30 Mar 2021 21:09:12 +0800 > > Xi Ruoyao <xry111@mengyan1223.wang> wrote: > > > > > On 2021-03-30 21:02 +0800, Xi Ruoyao wrote: > > > > On 2021-03-30 14:55 +0200, Christian König wrote: > > > > > I rather see this as a kernel bug. Can you test if this code fragment > > > > > fixes your issue: > > > > > > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > > > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > > > > > index 64beb3399604..e1260b517e1b 100644 > > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > > > > > @@ -780,7 +780,7 @@ int amdgpu_info_ioctl(struct drm_device *dev, void > > > > > *data, struct drm_file *filp) > > > > > } > > > > > dev_info->virtual_address_alignment = > > > > > max((int)PAGE_SIZE, AMDGPU_GPU_PAGE_SIZE); > > > > > dev_info->pte_fragment_size = (1 << > > > > > adev->vm_manager.fragment_size) * AMDGPU_GPU_PAGE_SIZE; > > > > > - dev_info->gart_page_size = AMDGPU_GPU_PAGE_SIZE; > > > > > + dev_info->gart_page_size = > > > > > dev_info->virtual_address_alignment; > > > > > dev_info->cu_active_number = adev- > > > > > >gfx.cu_info.number; > > > > > dev_info->cu_ao_mask = adev->gfx.cu_info.ao_cu_mask; > > > > > dev_info->ce_ram_size = adev->gfx.ce_ram_size; > > > > It works. I've seen it at > > > > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fxen0n%2Flinux%2Fcommit%2F84ada72983838bd7ce54bc32f5d34ac5b5aae191&data=04%7C01%7Cchristian.koenig%40amd.com%7Cf37fddf20a8847edf67808d8f37ef23c%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637527074118791321%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=DZnmee38NGpiWRMX5LmlxOhxAzBMhAusnAWNnCxXTJ0%3D&reserved=0 > > > > before (with a common sub-expression, though :). > > > Some comment: on an old version of Fedora ported by Loongson, Xorg just > > > hangs > > > without this commit. But on the system I built from source, I didn't see > > > any > > > issue before Linux 5.11. So I misbelieved that it was something already > > > fixed. > > > > > > Dan: you can try it on your PPC 64 with non-4K page as well. > > yup, looks good here as well, ppc64le (Power9) system with 64KB pages > > Mhm, as far as I can say this patch never made it to us. I think maybe its author considers it a "workaround" like me, as there is already a "virtual_address_alignment" which seems correct. > Can you please send it to the mailing list with me on CC? I've sent it, together with my patch for using ~PAGE_MASK in parameter validation. -- Xi Ruoyao <xry111@mengyan1223.wang> School of Aerospace Science and Technology, Xidian University _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 54+ messages in thread
end of thread, other threads:[~2021-03-30 16:06 UTC | newest] Thread overview: 54+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-03-29 17:53 [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems Xℹ Ruoyao 2021-03-29 17:53 ` Xℹ Ruoyao 2021-03-29 17:53 ` Xℹ Ruoyao 2021-03-29 18:04 ` Christian König 2021-03-29 18:04 ` Christian König 2021-03-29 18:04 ` Christian König 2021-03-29 18:08 ` Xi Ruoyao 2021-03-29 18:08 ` Xi Ruoyao 2021-03-29 18:08 ` Xi Ruoyao 2021-03-29 18:10 ` Christian König 2021-03-29 18:10 ` Christian König 2021-03-29 18:10 ` Christian König 2021-03-29 18:21 ` Xi Ruoyao 2021-03-29 18:21 ` Xi Ruoyao 2021-03-29 18:21 ` Xi Ruoyao 2021-03-29 18:30 ` Xi Ruoyao 2021-03-29 18:30 ` Xi Ruoyao 2021-03-29 18:30 ` Xi Ruoyao 2021-03-29 19:27 ` Xi Ruoyao 2021-03-29 19:27 ` Xi Ruoyao 2021-03-29 19:27 ` Xi Ruoyao 2021-03-29 19:36 ` Christian König 2021-03-29 19:36 ` Christian König 2021-03-29 19:36 ` Christian König 2021-03-29 19:40 ` Xi Ruoyao 2021-03-29 19:40 ` Xi Ruoyao 2021-03-29 19:40 ` Xi Ruoyao 2021-03-30 12:04 ` Xi Ruoyao 2021-03-30 12:04 ` Xi Ruoyao 2021-03-30 12:04 ` Xi Ruoyao 2021-03-30 12:55 ` Christian König 2021-03-30 12:55 ` Christian König 2021-03-30 12:55 ` Christian König 2021-03-30 13:00 ` Dan Horák 2021-03-30 13:00 ` Dan Horák 2021-03-30 13:00 ` Dan Horák 2021-03-30 13:02 ` Christian König 2021-03-30 13:02 ` Christian König 2021-03-30 13:02 ` Christian König 2021-03-30 13:02 ` Xi Ruoyao 2021-03-30 13:02 ` Xi Ruoyao 2021-03-30 13:02 ` Xi Ruoyao 2021-03-30 13:09 ` Xi Ruoyao 2021-03-30 13:09 ` Xi Ruoyao 2021-03-30 13:09 ` Xi Ruoyao 2021-03-30 13:23 ` Dan Horák 2021-03-30 13:23 ` Dan Horák 2021-03-30 13:23 ` Dan Horák 2021-03-30 13:24 ` Christian König 2021-03-30 13:24 ` Christian König 2021-03-30 13:24 ` Christian König 2021-03-30 15:46 ` Xi Ruoyao 2021-03-30 15:46 ` Xi Ruoyao 2021-03-30 15:46 ` Xi Ruoyao
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.