All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/2] drm/nouveau/bar/gf100: fix hang when calling ->fini() before ->init()
@ 2017-12-04 18:37 ` Guillaume Tucker
  0 siblings, 0 replies; 18+ messages in thread
From: Guillaume Tucker @ 2017-12-04 18:37 UTC (permalink / raw)
  To: Jonathan Hunter, David Airlie
  Cc: dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	Guillaume Tucker, Ben Skeggs

If the firmware fails to load then ->fini() will be called before the
device has been initialised, causing the kernel to hang while trying
to write to a register.  Add a test in ->fini() to avoid this issue.

This fixes a kernel hang on tegra124.

Fixes: b17de35a2ebbe ("drm/nouveau/bar: implement bar1 teardown")
Signed-off-by: Guillaume Tucker <guillaume.tucker-ZGY8ohtN/8qB+jHODAdFcQ@public.gmane.org>
CC: Ben Skeggs <bskeggs-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c b/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c
index a3ba7f50198b..95e2aba64aad 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c
@@ -43,9 +43,12 @@ gf100_bar_bar1_wait(struct nvkm_bar *base)
 }
 
 void
-gf100_bar_bar1_fini(struct nvkm_bar *bar)
+gf100_bar_bar1_fini(struct nvkm_bar *base)
 {
-	nvkm_mask(bar->subdev.device, 0x001704, 0x80000000, 0x00000000);
+	struct nvkm_device *device = base->subdev.device;
+
+	if (base->subdev.oneinit)
+		nvkm_mask(device, 0x001704, 0x80000000, 0x00000000);
 }
 
 void
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 1/2] drm/nouveau/bar/gf100: fix hang when calling ->fini() before ->init()
@ 2017-12-04 18:37 ` Guillaume Tucker
  0 siblings, 0 replies; 18+ messages in thread
From: Guillaume Tucker @ 2017-12-04 18:37 UTC (permalink / raw)
  To: Jonathan Hunter, David Airlie
  Cc: dri-devel, linux-tegra, linux-kernel, linux-arm-kernel,
	Guillaume Tucker, Ben Skeggs

If the firmware fails to load then ->fini() will be called before the
device has been initialised, causing the kernel to hang while trying
to write to a register.  Add a test in ->fini() to avoid this issue.

This fixes a kernel hang on tegra124.

Fixes: b17de35a2ebbe ("drm/nouveau/bar: implement bar1 teardown")
Signed-off-by: Guillaume Tucker <guillaume.tucker@collabora.com>
CC: Ben Skeggs <bskeggs@redhat.com>
---
 drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c b/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c
index a3ba7f50198b..95e2aba64aad 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c
@@ -43,9 +43,12 @@ gf100_bar_bar1_wait(struct nvkm_bar *base)
 }
 
 void
-gf100_bar_bar1_fini(struct nvkm_bar *bar)
+gf100_bar_bar1_fini(struct nvkm_bar *base)
 {
-	nvkm_mask(bar->subdev.device, 0x001704, 0x80000000, 0x00000000);
+	struct nvkm_device *device = base->subdev.device;
+
+	if (base->subdev.oneinit)
+		nvkm_mask(device, 0x001704, 0x80000000, 0x00000000);
 }
 
 void
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 1/2] drm/nouveau/bar/gf100: fix hang when calling ->fini() before ->init()
@ 2017-12-04 18:37 ` Guillaume Tucker
  0 siblings, 0 replies; 18+ messages in thread
From: Guillaume Tucker @ 2017-12-04 18:37 UTC (permalink / raw)
  To: linux-arm-kernel

If the firmware fails to load then ->fini() will be called before the
device has been initialised, causing the kernel to hang while trying
to write to a register.  Add a test in ->fini() to avoid this issue.

This fixes a kernel hang on tegra124.

Fixes: b17de35a2ebbe ("drm/nouveau/bar: implement bar1 teardown")
Signed-off-by: Guillaume Tucker <guillaume.tucker@collabora.com>
CC: Ben Skeggs <bskeggs@redhat.com>
---
 drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c b/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c
index a3ba7f50198b..95e2aba64aad 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c
@@ -43,9 +43,12 @@ gf100_bar_bar1_wait(struct nvkm_bar *base)
 }
 
 void
-gf100_bar_bar1_fini(struct nvkm_bar *bar)
+gf100_bar_bar1_fini(struct nvkm_bar *base)
 {
-	nvkm_mask(bar->subdev.device, 0x001704, 0x80000000, 0x00000000);
+	struct nvkm_device *device = base->subdev.device;
+
+	if (base->subdev.oneinit)
+		nvkm_mask(device, 0x001704, 0x80000000, 0x00000000);
 }
 
 void
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/2] drm/tegra: sor: Fix hang on tegra124 due to NULL clk_out
  2017-12-04 18:37 ` Guillaume Tucker
  (?)
@ 2017-12-04 18:37     ` Guillaume Tucker
  -1 siblings, 0 replies; 18+ messages in thread
From: Guillaume Tucker @ 2017-12-04 18:37 UTC (permalink / raw)
  To: Jonathan Hunter, David Airlie
  Cc: dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	Guillaume Tucker, Thierry Reding

When neither HDMI nor DP is supported such as on the tegra124, the
sor->clk_out is not initialised and remains NULL.  In this case, the
parent clock can't be assigned to it so revert to the previous
behaviour of assigning it to the main sor->clock instead.

This fixes a kernel hang on tegra124.

Fixes: e1335e2f0cfc ("drm/tegra: sor: Reimplement pad clock")
Signed-off-by: Guillaume Tucker <guillaume.tucker-ZGY8ohtN/8qB+jHODAdFcQ@public.gmane.org>
CC: Thierry Reding <treding-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
---
 drivers/gpu/drm/tegra/sor.c | 24 ++++++++++++++----------
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/tegra/sor.c b/drivers/gpu/drm/tegra/sor.c
index b0a1dedac802..8d2e29c9ab2b 100644
--- a/drivers/gpu/drm/tegra/sor.c
+++ b/drivers/gpu/drm/tegra/sor.c
@@ -255,7 +255,7 @@ static int tegra_sor_set_parent_clock(struct tegra_sor *sor, struct clk *parent)
 
 	clk_disable_unprepare(sor->clk);
 
-	err = clk_set_parent(sor->clk_out, parent);
+	err = clk_set_parent(sor->clk_out ? sor->clk_out : sor->clk, parent);
 	if (err < 0)
 		return err;
 
@@ -2698,15 +2698,19 @@ static int tegra_sor_probe(struct platform_device *pdev)
 		sor->clk_pad = NULL;
 	}
 
-	/*
-	 * The bootloader may have set up the SOR such that it's module clock
-	 * is sourced by one of the display PLLs. However, that doesn't work
-	 * without properly having set up other bits of the SOR.
-	 */
-	err = clk_set_parent(sor->clk_out, sor->clk_safe);
-	if (err < 0) {
-		dev_err(&pdev->dev, "failed to use safe clock: %d\n", err);
-		goto remove;
+	if (sor->clk_out) {
+		/*
+		 * The bootloader may have set up the SOR such that its module
+		 * clock is sourced by one of the display PLLs. However, that
+		 * doesn't work without properly having set up other bits of
+		 * the SOR.
+		 */
+		err = clk_set_parent(sor->clk_out, sor->clk_safe);
+		if (err < 0) {
+			dev_err(&pdev->dev, "failed to use safe clock: %d\n",
+				err);
+			goto remove;
+		}
 	}
 
 	platform_set_drvdata(pdev, sor);
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/2] drm/tegra: sor: Fix hang on tegra124 due to NULL clk_out
@ 2017-12-04 18:37     ` Guillaume Tucker
  0 siblings, 0 replies; 18+ messages in thread
From: Guillaume Tucker @ 2017-12-04 18:37 UTC (permalink / raw)
  To: Jonathan Hunter, David Airlie
  Cc: dri-devel, linux-tegra, linux-kernel, linux-arm-kernel,
	Guillaume Tucker, Thierry Reding

When neither HDMI nor DP is supported such as on the tegra124, the
sor->clk_out is not initialised and remains NULL.  In this case, the
parent clock can't be assigned to it so revert to the previous
behaviour of assigning it to the main sor->clock instead.

This fixes a kernel hang on tegra124.

Fixes: e1335e2f0cfc ("drm/tegra: sor: Reimplement pad clock")
Signed-off-by: Guillaume Tucker <guillaume.tucker@collabora.com>
CC: Thierry Reding <treding@nvidia.com>
---
 drivers/gpu/drm/tegra/sor.c | 24 ++++++++++++++----------
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/tegra/sor.c b/drivers/gpu/drm/tegra/sor.c
index b0a1dedac802..8d2e29c9ab2b 100644
--- a/drivers/gpu/drm/tegra/sor.c
+++ b/drivers/gpu/drm/tegra/sor.c
@@ -255,7 +255,7 @@ static int tegra_sor_set_parent_clock(struct tegra_sor *sor, struct clk *parent)
 
 	clk_disable_unprepare(sor->clk);
 
-	err = clk_set_parent(sor->clk_out, parent);
+	err = clk_set_parent(sor->clk_out ? sor->clk_out : sor->clk, parent);
 	if (err < 0)
 		return err;
 
@@ -2698,15 +2698,19 @@ static int tegra_sor_probe(struct platform_device *pdev)
 		sor->clk_pad = NULL;
 	}
 
-	/*
-	 * The bootloader may have set up the SOR such that it's module clock
-	 * is sourced by one of the display PLLs. However, that doesn't work
-	 * without properly having set up other bits of the SOR.
-	 */
-	err = clk_set_parent(sor->clk_out, sor->clk_safe);
-	if (err < 0) {
-		dev_err(&pdev->dev, "failed to use safe clock: %d\n", err);
-		goto remove;
+	if (sor->clk_out) {
+		/*
+		 * The bootloader may have set up the SOR such that its module
+		 * clock is sourced by one of the display PLLs. However, that
+		 * doesn't work without properly having set up other bits of
+		 * the SOR.
+		 */
+		err = clk_set_parent(sor->clk_out, sor->clk_safe);
+		if (err < 0) {
+			dev_err(&pdev->dev, "failed to use safe clock: %d\n",
+				err);
+			goto remove;
+		}
 	}
 
 	platform_set_drvdata(pdev, sor);
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/2] drm/tegra: sor: Fix hang on tegra124 due to NULL clk_out
@ 2017-12-04 18:37     ` Guillaume Tucker
  0 siblings, 0 replies; 18+ messages in thread
From: Guillaume Tucker @ 2017-12-04 18:37 UTC (permalink / raw)
  To: linux-arm-kernel

When neither HDMI nor DP is supported such as on the tegra124, the
sor->clk_out is not initialised and remains NULL.  In this case, the
parent clock can't be assigned to it so revert to the previous
behaviour of assigning it to the main sor->clock instead.

This fixes a kernel hang on tegra124.

Fixes: e1335e2f0cfc ("drm/tegra: sor: Reimplement pad clock")
Signed-off-by: Guillaume Tucker <guillaume.tucker@collabora.com>
CC: Thierry Reding <treding@nvidia.com>
---
 drivers/gpu/drm/tegra/sor.c | 24 ++++++++++++++----------
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/tegra/sor.c b/drivers/gpu/drm/tegra/sor.c
index b0a1dedac802..8d2e29c9ab2b 100644
--- a/drivers/gpu/drm/tegra/sor.c
+++ b/drivers/gpu/drm/tegra/sor.c
@@ -255,7 +255,7 @@ static int tegra_sor_set_parent_clock(struct tegra_sor *sor, struct clk *parent)
 
 	clk_disable_unprepare(sor->clk);
 
-	err = clk_set_parent(sor->clk_out, parent);
+	err = clk_set_parent(sor->clk_out ? sor->clk_out : sor->clk, parent);
 	if (err < 0)
 		return err;
 
@@ -2698,15 +2698,19 @@ static int tegra_sor_probe(struct platform_device *pdev)
 		sor->clk_pad = NULL;
 	}
 
-	/*
-	 * The bootloader may have set up the SOR such that it's module clock
-	 * is sourced by one of the display PLLs. However, that doesn't work
-	 * without properly having set up other bits of the SOR.
-	 */
-	err = clk_set_parent(sor->clk_out, sor->clk_safe);
-	if (err < 0) {
-		dev_err(&pdev->dev, "failed to use safe clock: %d\n", err);
-		goto remove;
+	if (sor->clk_out) {
+		/*
+		 * The bootloader may have set up the SOR such that its module
+		 * clock is sourced by one of the display PLLs. However, that
+		 * doesn't work without properly having set up other bits of
+		 * the SOR.
+		 */
+		err = clk_set_parent(sor->clk_out, sor->clk_safe);
+		if (err < 0) {
+			dev_err(&pdev->dev, "failed to use safe clock: %d\n",
+				err);
+			goto remove;
+		}
 	}
 
 	platform_set_drvdata(pdev, sor);
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] drm/nouveau/bar/gf100: fix hang when calling ->fini() before ->init()
  2017-12-04 18:37 ` Guillaume Tucker
  (?)
@ 2017-12-05 14:30   ` Jon Hunter
  -1 siblings, 0 replies; 18+ messages in thread
From: Jon Hunter @ 2017-12-05 14:30 UTC (permalink / raw)
  To: Guillaume Tucker, David Airlie, Ben Skeggs
  Cc: linux-tegra, linux-kernel, dri-devel, linux-arm-kernel


On 04/12/17 18:37, Guillaume Tucker wrote:
> If the firmware fails to load then ->fini() will be called before the
> device has been initialised, causing the kernel to hang while trying
> to write to a register.  Add a test in ->fini() to avoid this issue.
> 
> This fixes a kernel hang on tegra124.
> 
> Fixes: b17de35a2ebbe ("drm/nouveau/bar: implement bar1 teardown")
> Signed-off-by: Guillaume Tucker <guillaume.tucker@collabora.com>
> CC: Ben Skeggs <bskeggs@redhat.com>
> ---
>  drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c b/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c
> index a3ba7f50198b..95e2aba64aad 100644
> --- a/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c
> +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c
> @@ -43,9 +43,12 @@ gf100_bar_bar1_wait(struct nvkm_bar *base)
>  }
>  
>  void
> -gf100_bar_bar1_fini(struct nvkm_bar *bar)
> +gf100_bar_bar1_fini(struct nvkm_bar *base)
>  {
> -	nvkm_mask(bar->subdev.device, 0x001704, 0x80000000, 0x00000000);
> +	struct nvkm_device *device = base->subdev.device;
> +
> +	if (base->subdev.oneinit)
> +		nvkm_mask(device, 0x001704, 0x80000000, 0x00000000);
>  }
>  
>  void

I have tested this and it works for me. Thanks for fixing this! Would be
good to get Ben's ACK, but you can have my ...

Tested-by: Jon Hunter <jonathanh@nvidia.com>

Cheers
Jon

-- 
nvpublic
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] drm/nouveau/bar/gf100: fix hang when calling ->fini() before ->init()
@ 2017-12-05 14:30   ` Jon Hunter
  0 siblings, 0 replies; 18+ messages in thread
From: Jon Hunter @ 2017-12-05 14:30 UTC (permalink / raw)
  To: Guillaume Tucker, David Airlie, Ben Skeggs
  Cc: dri-devel, linux-tegra, linux-kernel, linux-arm-kernel


On 04/12/17 18:37, Guillaume Tucker wrote:
> If the firmware fails to load then ->fini() will be called before the
> device has been initialised, causing the kernel to hang while trying
> to write to a register.  Add a test in ->fini() to avoid this issue.
> 
> This fixes a kernel hang on tegra124.
> 
> Fixes: b17de35a2ebbe ("drm/nouveau/bar: implement bar1 teardown")
> Signed-off-by: Guillaume Tucker <guillaume.tucker@collabora.com>
> CC: Ben Skeggs <bskeggs@redhat.com>
> ---
>  drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c b/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c
> index a3ba7f50198b..95e2aba64aad 100644
> --- a/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c
> +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c
> @@ -43,9 +43,12 @@ gf100_bar_bar1_wait(struct nvkm_bar *base)
>  }
>  
>  void
> -gf100_bar_bar1_fini(struct nvkm_bar *bar)
> +gf100_bar_bar1_fini(struct nvkm_bar *base)
>  {
> -	nvkm_mask(bar->subdev.device, 0x001704, 0x80000000, 0x00000000);
> +	struct nvkm_device *device = base->subdev.device;
> +
> +	if (base->subdev.oneinit)
> +		nvkm_mask(device, 0x001704, 0x80000000, 0x00000000);
>  }
>  
>  void

I have tested this and it works for me. Thanks for fixing this! Would be
good to get Ben's ACK, but you can have my ...

Tested-by: Jon Hunter <jonathanh@nvidia.com>

Cheers
Jon

-- 
nvpublic

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/2] drm/nouveau/bar/gf100: fix hang when calling ->fini() before ->init()
@ 2017-12-05 14:30   ` Jon Hunter
  0 siblings, 0 replies; 18+ messages in thread
From: Jon Hunter @ 2017-12-05 14:30 UTC (permalink / raw)
  To: linux-arm-kernel


On 04/12/17 18:37, Guillaume Tucker wrote:
> If the firmware fails to load then ->fini() will be called before the
> device has been initialised, causing the kernel to hang while trying
> to write to a register.  Add a test in ->fini() to avoid this issue.
> 
> This fixes a kernel hang on tegra124.
> 
> Fixes: b17de35a2ebbe ("drm/nouveau/bar: implement bar1 teardown")
> Signed-off-by: Guillaume Tucker <guillaume.tucker@collabora.com>
> CC: Ben Skeggs <bskeggs@redhat.com>
> ---
>  drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c b/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c
> index a3ba7f50198b..95e2aba64aad 100644
> --- a/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c
> +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c
> @@ -43,9 +43,12 @@ gf100_bar_bar1_wait(struct nvkm_bar *base)
>  }
>  
>  void
> -gf100_bar_bar1_fini(struct nvkm_bar *bar)
> +gf100_bar_bar1_fini(struct nvkm_bar *base)
>  {
> -	nvkm_mask(bar->subdev.device, 0x001704, 0x80000000, 0x00000000);
> +	struct nvkm_device *device = base->subdev.device;
> +
> +	if (base->subdev.oneinit)
> +		nvkm_mask(device, 0x001704, 0x80000000, 0x00000000);
>  }
>  
>  void

I have tested this and it works for me. Thanks for fixing this! Would be
good to get Ben's ACK, but you can have my ...

Tested-by: Jon Hunter <jonathanh@nvidia.com>

Cheers
Jon

-- 
nvpublic

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] drm/nouveau/bar/gf100: fix hang when calling ->fini() before ->init()
  2017-12-05 14:30   ` Jon Hunter
  (?)
  (?)
@ 2017-12-05 18:32   ` Ben Skeggs
  2017-12-06  9:22       ` Guillaume Tucker
  -1 siblings, 1 reply; 18+ messages in thread
From: Ben Skeggs @ 2017-12-05 18:32 UTC (permalink / raw)
  To: Jon Hunter
  Cc: David Airlie, Guillaume Tucker, linux-kernel, dri-devel,
	linux-tegra, linux-arm-kernel


[-- Attachment #1.1: Type: text/plain, Size: 1814 bytes --]

On Wed, Dec 6, 2017 at 12:30 AM, Jon Hunter <jonathanh@nvidia.com> wrote:

>
> On 04/12/17 18:37, Guillaume Tucker wrote:
> > If the firmware fails to load then ->fini() will be called before the
> > device has been initialised, causing the kernel to hang while trying
> > to write to a register.  Add a test in ->fini() to avoid this issue.
> >
> > This fixes a kernel hang on tegra124.
> >
> > Fixes: b17de35a2ebbe ("drm/nouveau/bar: implement bar1 teardown")
> > Signed-off-by: Guillaume Tucker <guillaume.tucker@collabora.com>
> > CC: Ben Skeggs <bskeggs@redhat.com>
> > ---
> >  drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c | 7 +++++--
> >  1 file changed, 5 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c
> b/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c
> > index a3ba7f50198b..95e2aba64aad 100644
> > --- a/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c
> > +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c
> > @@ -43,9 +43,12 @@ gf100_bar_bar1_wait(struct nvkm_bar *base)
> >  }
> >
> >  void
> > -gf100_bar_bar1_fini(struct nvkm_bar *bar)
> > +gf100_bar_bar1_fini(struct nvkm_bar *base)
> >  {
> > -     nvkm_mask(bar->subdev.device, 0x001704, 0x80000000, 0x00000000);
> > +     struct nvkm_device *device = base->subdev.device;
> > +
> > +     if (base->subdev.oneinit)
> > +             nvkm_mask(device, 0x001704, 0x80000000, 0x00000000);
> >  }
> >
> >  void
>
> I have tested this and it works for me. Thanks for fixing this! Would be
> good to get Ben's ACK, but you can have my ...
>
I'd love to get a good explanation as to why it hangs without this change,
as, on the surface, it's not immediately obvious as to why it's hanging.

Thanks,
Ben.


>
> Tested-by: Jon Hunter <jonathanh@nvidia.com>
>
> Cheers
> Jon
>
> --
> nvpublic
>

[-- Attachment #1.2: Type: text/html, Size: 2859 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] drm/nouveau/bar/gf100: fix hang when calling ->fini() before ->init()
  2017-12-05 18:32   ` Ben Skeggs
@ 2017-12-06  9:22       ` Guillaume Tucker
  0 siblings, 0 replies; 18+ messages in thread
From: Guillaume Tucker @ 2017-12-06  9:22 UTC (permalink / raw)
  To: Ben Skeggs, Jon Hunter
  Cc: David Airlie, dri-devel, linux-tegra, linux-kernel, linux-arm-kernel

On 05/12/17 18:32, Ben Skeggs wrote:
> On Wed, Dec 6, 2017 at 12:30 AM, Jon Hunter <jonathanh@nvidia.com> wrote:
>
>>
>> On 04/12/17 18:37, Guillaume Tucker wrote:
>>> If the firmware fails to load then ->fini() will be called before the
>>> device has been initialised, causing the kernel to hang while trying
>>> to write to a register.  Add a test in ->fini() to avoid this issue.
>>>
>>> This fixes a kernel hang on tegra124.
>>>
>>> Fixes: b17de35a2ebbe ("drm/nouveau/bar: implement bar1 teardown")
>>> Signed-off-by: Guillaume Tucker <guillaume.tucker@collabora.com>
>>> CC: Ben Skeggs <bskeggs@redhat.com>
>>> ---
>>>  drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c | 7 +++++--
>>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c
>> b/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c
>>> index a3ba7f50198b..95e2aba64aad 100644
>>> --- a/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c
>>> +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c
>>> @@ -43,9 +43,12 @@ gf100_bar_bar1_wait(struct nvkm_bar *base)
>>>  }
>>>
>>>  void
>>> -gf100_bar_bar1_fini(struct nvkm_bar *bar)
>>> +gf100_bar_bar1_fini(struct nvkm_bar *base)
>>>  {
>>> -     nvkm_mask(bar->subdev.device, 0x001704, 0x80000000, 0x00000000);
>>> +     struct nvkm_device *device = base->subdev.device;
>>> +
>>> +     if (base->subdev.oneinit)
>>> +             nvkm_mask(device, 0x001704, 0x80000000, 0x00000000);
>>>  }
>>>
>>>  void
>>
>> I have tested this and it works for me. Thanks for fixing this! Would be
>> good to get Ben's ACK, but you can have my ...
>>
> I'd love to get a good explanation as to why it hangs without this change,
> as, on the surface, it's not immediately obvious as to why it's hanging.

To be fair I'm not entirely sure either why this causes a hang, I
haven't read the TRM...  The iomem has been mapped at this point,
so accessing the register should work.  One clue is when you look
at _bar1_init(), the 0x1704 register is initialised with
some (device instance?) memory address.  So it's possible that
the hardware does something special when you set this to 0 as in
_bar1_fini(), which may fail in particular if it was previously
not initialised with a valid address.

This is merely guesswork, would be interested to find out the
real explanation though.

>> Tested-by: Jon Hunter <jonathanh@nvidia.com>

Thanks!

Guillaume

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/2] drm/nouveau/bar/gf100: fix hang when calling ->fini() before ->init()
@ 2017-12-06  9:22       ` Guillaume Tucker
  0 siblings, 0 replies; 18+ messages in thread
From: Guillaume Tucker @ 2017-12-06  9:22 UTC (permalink / raw)
  To: linux-arm-kernel

On 05/12/17 18:32, Ben Skeggs wrote:
> On Wed, Dec 6, 2017 at 12:30 AM, Jon Hunter <jonathanh@nvidia.com> wrote:
>
>>
>> On 04/12/17 18:37, Guillaume Tucker wrote:
>>> If the firmware fails to load then ->fini() will be called before the
>>> device has been initialised, causing the kernel to hang while trying
>>> to write to a register.  Add a test in ->fini() to avoid this issue.
>>>
>>> This fixes a kernel hang on tegra124.
>>>
>>> Fixes: b17de35a2ebbe ("drm/nouveau/bar: implement bar1 teardown")
>>> Signed-off-by: Guillaume Tucker <guillaume.tucker@collabora.com>
>>> CC: Ben Skeggs <bskeggs@redhat.com>
>>> ---
>>>  drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c | 7 +++++--
>>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c
>> b/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c
>>> index a3ba7f50198b..95e2aba64aad 100644
>>> --- a/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c
>>> +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c
>>> @@ -43,9 +43,12 @@ gf100_bar_bar1_wait(struct nvkm_bar *base)
>>>  }
>>>
>>>  void
>>> -gf100_bar_bar1_fini(struct nvkm_bar *bar)
>>> +gf100_bar_bar1_fini(struct nvkm_bar *base)
>>>  {
>>> -     nvkm_mask(bar->subdev.device, 0x001704, 0x80000000, 0x00000000);
>>> +     struct nvkm_device *device = base->subdev.device;
>>> +
>>> +     if (base->subdev.oneinit)
>>> +             nvkm_mask(device, 0x001704, 0x80000000, 0x00000000);
>>>  }
>>>
>>>  void
>>
>> I have tested this and it works for me. Thanks for fixing this! Would be
>> good to get Ben's ACK, but you can have my ...
>>
> I'd love to get a good explanation as to why it hangs without this change,
> as, on the surface, it's not immediately obvious as to why it's hanging.

To be fair I'm not entirely sure either why this causes a hang, I
haven't read the TRM...  The iomem has been mapped at this point,
so accessing the register should work.  One clue is when you look
at _bar1_init(), the 0x1704 register is initialised with
some (device instance?) memory address.  So it's possible that
the hardware does something special when you set this to 0 as in
_bar1_fini(), which may fail in particular if it was previously
not initialised with a valid address.

This is merely guesswork, would be interested to find out the
real explanation though.

>> Tested-by: Jon Hunter <jonathanh@nvidia.com>

Thanks!

Guillaume

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] drm/nouveau/bar/gf100: fix hang when calling ->fini() before ->init()
  2017-12-06  9:22       ` Guillaume Tucker
  (?)
@ 2017-12-06 17:18           ` Jon Hunter
  -1 siblings, 0 replies; 18+ messages in thread
From: Jon Hunter @ 2017-12-06 17:18 UTC (permalink / raw)
  To: Guillaume Tucker, Ben Skeggs
  Cc: David Airlie, dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r


On 06/12/17 09:22, Guillaume Tucker wrote:
> On 05/12/17 18:32, Ben Skeggs wrote:
>> On Wed, Dec 6, 2017 at 12:30 AM, Jon Hunter <jonathanh-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org> wrote:
>>
>>>
>>> On 04/12/17 18:37, Guillaume Tucker wrote:
>>>> If the firmware fails to load then ->fini() will be called before the
>>>> device has been initialised, causing the kernel to hang while trying
>>>> to write to a register.  Add a test in ->fini() to avoid this issue.
>>>>
>>>> This fixes a kernel hang on tegra124.
>>>>
>>>> Fixes: b17de35a2ebbe ("drm/nouveau/bar: implement bar1 teardown")
>>>> Signed-off-by: Guillaume Tucker <guillaume.tucker-ZGY8ohtN/8qB+jHODAdFcQ@public.gmane.org>
>>>> CC: Ben Skeggs <bskeggs-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
>>>> ---
>>>>  drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c | 7 +++++--
>>>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c
>>> b/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c
>>>> index a3ba7f50198b..95e2aba64aad 100644
>>>> --- a/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c
>>>> +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c
>>>> @@ -43,9 +43,12 @@ gf100_bar_bar1_wait(struct nvkm_bar *base)
>>>>  }
>>>>
>>>>  void
>>>> -gf100_bar_bar1_fini(struct nvkm_bar *bar)
>>>> +gf100_bar_bar1_fini(struct nvkm_bar *base)
>>>>  {
>>>> -     nvkm_mask(bar->subdev.device, 0x001704, 0x80000000, 0x00000000);
>>>> +     struct nvkm_device *device = base->subdev.device;
>>>> +
>>>> +     if (base->subdev.oneinit)
>>>> +             nvkm_mask(device, 0x001704, 0x80000000, 0x00000000);
>>>>  }
>>>>
>>>>  void
>>>
>>> I have tested this and it works for me. Thanks for fixing this! Would be
>>> good to get Ben's ACK, but you can have my ...
>>>
>> I'd love to get a good explanation as to why it hangs without this
>> change,
>> as, on the surface, it's not immediately obvious as to why it's hanging.
> 
> To be fair I'm not entirely sure either why this causes a hang, I
> haven't read the TRM...  The iomem has been mapped at this point,
> so accessing the register should work.  One clue is when you look
> at _bar1_init(), the 0x1704 register is initialised with
> some (device instance?) memory address.  So it's possible that
> the hardware does something special when you set this to 0 as in
> _bar1_fini(), which may fail in particular if it was previously
> not initialised with a valid address.
> 
> This is merely guesswork, would be interested to find out the
> real explanation though.

OK, well that's no good. It's a good pointer, but we need to make sure
we understand the root of this hang. I will see if I have sometime to
dig into this further, maybe next week.

Cheers
Jon

-- 
nvpublic

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] drm/nouveau/bar/gf100: fix hang when calling ->fini() before ->init()
@ 2017-12-06 17:18           ` Jon Hunter
  0 siblings, 0 replies; 18+ messages in thread
From: Jon Hunter @ 2017-12-06 17:18 UTC (permalink / raw)
  To: Guillaume Tucker, Ben Skeggs
  Cc: David Airlie, dri-devel, linux-tegra, linux-kernel, linux-arm-kernel


On 06/12/17 09:22, Guillaume Tucker wrote:
> On 05/12/17 18:32, Ben Skeggs wrote:
>> On Wed, Dec 6, 2017 at 12:30 AM, Jon Hunter <jonathanh@nvidia.com> wrote:
>>
>>>
>>> On 04/12/17 18:37, Guillaume Tucker wrote:
>>>> If the firmware fails to load then ->fini() will be called before the
>>>> device has been initialised, causing the kernel to hang while trying
>>>> to write to a register.  Add a test in ->fini() to avoid this issue.
>>>>
>>>> This fixes a kernel hang on tegra124.
>>>>
>>>> Fixes: b17de35a2ebbe ("drm/nouveau/bar: implement bar1 teardown")
>>>> Signed-off-by: Guillaume Tucker <guillaume.tucker@collabora.com>
>>>> CC: Ben Skeggs <bskeggs@redhat.com>
>>>> ---
>>>>  drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c | 7 +++++--
>>>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c
>>> b/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c
>>>> index a3ba7f50198b..95e2aba64aad 100644
>>>> --- a/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c
>>>> +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c
>>>> @@ -43,9 +43,12 @@ gf100_bar_bar1_wait(struct nvkm_bar *base)
>>>>  }
>>>>
>>>>  void
>>>> -gf100_bar_bar1_fini(struct nvkm_bar *bar)
>>>> +gf100_bar_bar1_fini(struct nvkm_bar *base)
>>>>  {
>>>> -     nvkm_mask(bar->subdev.device, 0x001704, 0x80000000, 0x00000000);
>>>> +     struct nvkm_device *device = base->subdev.device;
>>>> +
>>>> +     if (base->subdev.oneinit)
>>>> +             nvkm_mask(device, 0x001704, 0x80000000, 0x00000000);
>>>>  }
>>>>
>>>>  void
>>>
>>> I have tested this and it works for me. Thanks for fixing this! Would be
>>> good to get Ben's ACK, but you can have my ...
>>>
>> I'd love to get a good explanation as to why it hangs without this
>> change,
>> as, on the surface, it's not immediately obvious as to why it's hanging.
> 
> To be fair I'm not entirely sure either why this causes a hang, I
> haven't read the TRM...  The iomem has been mapped at this point,
> so accessing the register should work.  One clue is when you look
> at _bar1_init(), the 0x1704 register is initialised with
> some (device instance?) memory address.  So it's possible that
> the hardware does something special when you set this to 0 as in
> _bar1_fini(), which may fail in particular if it was previously
> not initialised with a valid address.
> 
> This is merely guesswork, would be interested to find out the
> real explanation though.

OK, well that's no good. It's a good pointer, but we need to make sure
we understand the root of this hang. I will see if I have sometime to
dig into this further, maybe next week.

Cheers
Jon

-- 
nvpublic

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/2] drm/nouveau/bar/gf100: fix hang when calling ->fini() before ->init()
@ 2017-12-06 17:18           ` Jon Hunter
  0 siblings, 0 replies; 18+ messages in thread
From: Jon Hunter @ 2017-12-06 17:18 UTC (permalink / raw)
  To: linux-arm-kernel


On 06/12/17 09:22, Guillaume Tucker wrote:
> On 05/12/17 18:32, Ben Skeggs wrote:
>> On Wed, Dec 6, 2017 at 12:30 AM, Jon Hunter <jonathanh@nvidia.com> wrote:
>>
>>>
>>> On 04/12/17 18:37, Guillaume Tucker wrote:
>>>> If the firmware fails to load then ->fini() will be called before the
>>>> device has been initialised, causing the kernel to hang while trying
>>>> to write to a register.? Add a test in ->fini() to avoid this issue.
>>>>
>>>> This fixes a kernel hang on tegra124.
>>>>
>>>> Fixes: b17de35a2ebbe ("drm/nouveau/bar: implement bar1 teardown")
>>>> Signed-off-by: Guillaume Tucker <guillaume.tucker@collabora.com>
>>>> CC: Ben Skeggs <bskeggs@redhat.com>
>>>> ---
>>>> ?drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c | 7 +++++--
>>>> ?1 file changed, 5 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c
>>> b/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c
>>>> index a3ba7f50198b..95e2aba64aad 100644
>>>> --- a/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c
>>>> +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c
>>>> @@ -43,9 +43,12 @@ gf100_bar_bar1_wait(struct nvkm_bar *base)
>>>> ?}
>>>>
>>>> ?void
>>>> -gf100_bar_bar1_fini(struct nvkm_bar *bar)
>>>> +gf100_bar_bar1_fini(struct nvkm_bar *base)
>>>> ?{
>>>> -???? nvkm_mask(bar->subdev.device, 0x001704, 0x80000000, 0x00000000);
>>>> +???? struct nvkm_device *device = base->subdev.device;
>>>> +
>>>> +???? if (base->subdev.oneinit)
>>>> +???????????? nvkm_mask(device, 0x001704, 0x80000000, 0x00000000);
>>>> ?}
>>>>
>>>> ?void
>>>
>>> I have tested this and it works for me. Thanks for fixing this! Would be
>>> good to get Ben's ACK, but you can have my ...
>>>
>> I'd love to get a good explanation as to why it hangs without this
>> change,
>> as, on the surface, it's not immediately obvious as to why it's hanging.
> 
> To be fair I'm not entirely sure either why this causes a hang, I
> haven't read the TRM...? The iomem has been mapped at this point,
> so accessing the register should work.? One clue is when you look
> at _bar1_init(), the 0x1704 register is initialised with
> some (device instance?) memory address.? So it's possible that
> the hardware does something special when you set this to 0 as in
> _bar1_fini(), which may fail in particular if it was previously
> not initialised with a valid address.
> 
> This is merely guesswork, would be interested to find out the
> real explanation though.

OK, well that's no good. It's a good pointer, but we need to make sure
we understand the root of this hang. I will see if I have sometime to
dig into this further, maybe next week.

Cheers
Jon

-- 
nvpublic

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] drm/nouveau/bar/gf100: fix hang when calling ->fini() before ->init()
  2017-12-06 17:18           ` Jon Hunter
  (?)
@ 2017-12-19 22:05               ` Jon Hunter
  -1 siblings, 0 replies; 18+ messages in thread
From: Jon Hunter @ 2017-12-19 22:05 UTC (permalink / raw)
  To: Guillaume Tucker, Ben Skeggs
  Cc: David Airlie, dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r


On 06/12/17 17:18, Jon Hunter wrote:
> 
> On 06/12/17 09:22, Guillaume Tucker wrote:
>> On 05/12/17 18:32, Ben Skeggs wrote:
>>> On Wed, Dec 6, 2017 at 12:30 AM, Jon Hunter <jonathanh-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org> wrote:
>>>
>>>>
>>>> On 04/12/17 18:37, Guillaume Tucker wrote:
>>>>> If the firmware fails to load then ->fini() will be called before the
>>>>> device has been initialised, causing the kernel to hang while trying
>>>>> to write to a register.  Add a test in ->fini() to avoid this issue.
>>>>>
>>>>> This fixes a kernel hang on tegra124.
>>>>>
>>>>> Fixes: b17de35a2ebbe ("drm/nouveau/bar: implement bar1 teardown")
>>>>> Signed-off-by: Guillaume Tucker <guillaume.tucker-ZGY8ohtN/8qB+jHODAdFcQ@public.gmane.org>
>>>>> CC: Ben Skeggs <bskeggs-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
>>>>> ---
>>>>>  drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c | 7 +++++--
>>>>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c
>>>> b/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c
>>>>> index a3ba7f50198b..95e2aba64aad 100644
>>>>> --- a/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c
>>>>> +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c
>>>>> @@ -43,9 +43,12 @@ gf100_bar_bar1_wait(struct nvkm_bar *base)
>>>>>  }
>>>>>
>>>>>  void
>>>>> -gf100_bar_bar1_fini(struct nvkm_bar *bar)
>>>>> +gf100_bar_bar1_fini(struct nvkm_bar *base)
>>>>>  {
>>>>> -     nvkm_mask(bar->subdev.device, 0x001704, 0x80000000, 0x00000000);
>>>>> +     struct nvkm_device *device = base->subdev.device;
>>>>> +
>>>>> +     if (base->subdev.oneinit)
>>>>> +             nvkm_mask(device, 0x001704, 0x80000000, 0x00000000);
>>>>>  }
>>>>>
>>>>>  void
>>>>
>>>> I have tested this and it works for me. Thanks for fixing this! Would be
>>>> good to get Ben's ACK, but you can have my ...
>>>>
>>> I'd love to get a good explanation as to why it hangs without this
>>> change,
>>> as, on the surface, it's not immediately obvious as to why it's hanging.
>>
>> To be fair I'm not entirely sure either why this causes a hang, I
>> haven't read the TRM...  The iomem has been mapped at this point,
>> so accessing the register should work.  One clue is when you look
>> at _bar1_init(), the 0x1704 register is initialised with
>> some (device instance?) memory address.  So it's possible that
>> the hardware does something special when you set this to 0 as in
>> _bar1_fini(), which may fail in particular if it was previously
>> not initialised with a valid address.
>>
>> This is merely guesswork, would be interested to find out the
>> real explanation though.
> 
> OK, well that's no good. It's a good pointer, but we need to make sure
> we understand the root of this hang. I will see if I have sometime to
> dig into this further, maybe next week.

I spent a bit of time looking at this, but I still do not fully
understand the cause of the hang. It appears to hang initialisation of
the FB subdev and it appears to be around the point where the L2 cache
is flushed. I will see if anyone else has a clue what is happening here.

Ben, in the meantime with the holiday season upon us, should we remove
the bar1 teardown for gk20a?

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/bar/base.c
b/drivers/gpu/drm/nouveau/nvkm/subdev/bar/base.c
index 9646adec57cb..243f0a5c8a62 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/bar/base.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/bar/base.c
@@ -73,7 +73,8 @@ struct nvkm_vmm *
 nvkm_bar_fini(struct nvkm_subdev *subdev, bool suspend)
 {
        struct nvkm_bar *bar = nvkm_bar(subdev);
-       bar->func->bar1.fini(bar);
+       if (bar->func->bar1.fini)
+               bar->func->bar1.fini(bar);
        return 0;
 }

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gk20a.c
b/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gk20a.c
index b10077d38839..35878fb538f2 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gk20a.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gk20a.c
@@ -26,7 +26,6 @@
        .dtor = gf100_bar_dtor,
        .oneinit = gf100_bar_oneinit,
        .bar1.init = gf100_bar_bar1_init,
-       .bar1.fini = gf100_bar_bar1_fini,
        .bar1.wait = gf100_bar_bar1_wait,
        .bar1.vmm = gf100_bar_bar1_vmm,
        .flush = g84_bar_flush,

Cheers
Jon

-- 
nvpublic

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] drm/nouveau/bar/gf100: fix hang when calling ->fini() before ->init()
@ 2017-12-19 22:05               ` Jon Hunter
  0 siblings, 0 replies; 18+ messages in thread
From: Jon Hunter @ 2017-12-19 22:05 UTC (permalink / raw)
  To: Guillaume Tucker, Ben Skeggs
  Cc: David Airlie, dri-devel, linux-tegra, linux-kernel, linux-arm-kernel


On 06/12/17 17:18, Jon Hunter wrote:
> 
> On 06/12/17 09:22, Guillaume Tucker wrote:
>> On 05/12/17 18:32, Ben Skeggs wrote:
>>> On Wed, Dec 6, 2017 at 12:30 AM, Jon Hunter <jonathanh@nvidia.com> wrote:
>>>
>>>>
>>>> On 04/12/17 18:37, Guillaume Tucker wrote:
>>>>> If the firmware fails to load then ->fini() will be called before the
>>>>> device has been initialised, causing the kernel to hang while trying
>>>>> to write to a register.  Add a test in ->fini() to avoid this issue.
>>>>>
>>>>> This fixes a kernel hang on tegra124.
>>>>>
>>>>> Fixes: b17de35a2ebbe ("drm/nouveau/bar: implement bar1 teardown")
>>>>> Signed-off-by: Guillaume Tucker <guillaume.tucker@collabora.com>
>>>>> CC: Ben Skeggs <bskeggs@redhat.com>
>>>>> ---
>>>>>  drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c | 7 +++++--
>>>>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c
>>>> b/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c
>>>>> index a3ba7f50198b..95e2aba64aad 100644
>>>>> --- a/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c
>>>>> +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c
>>>>> @@ -43,9 +43,12 @@ gf100_bar_bar1_wait(struct nvkm_bar *base)
>>>>>  }
>>>>>
>>>>>  void
>>>>> -gf100_bar_bar1_fini(struct nvkm_bar *bar)
>>>>> +gf100_bar_bar1_fini(struct nvkm_bar *base)
>>>>>  {
>>>>> -     nvkm_mask(bar->subdev.device, 0x001704, 0x80000000, 0x00000000);
>>>>> +     struct nvkm_device *device = base->subdev.device;
>>>>> +
>>>>> +     if (base->subdev.oneinit)
>>>>> +             nvkm_mask(device, 0x001704, 0x80000000, 0x00000000);
>>>>>  }
>>>>>
>>>>>  void
>>>>
>>>> I have tested this and it works for me. Thanks for fixing this! Would be
>>>> good to get Ben's ACK, but you can have my ...
>>>>
>>> I'd love to get a good explanation as to why it hangs without this
>>> change,
>>> as, on the surface, it's not immediately obvious as to why it's hanging.
>>
>> To be fair I'm not entirely sure either why this causes a hang, I
>> haven't read the TRM...  The iomem has been mapped at this point,
>> so accessing the register should work.  One clue is when you look
>> at _bar1_init(), the 0x1704 register is initialised with
>> some (device instance?) memory address.  So it's possible that
>> the hardware does something special when you set this to 0 as in
>> _bar1_fini(), which may fail in particular if it was previously
>> not initialised with a valid address.
>>
>> This is merely guesswork, would be interested to find out the
>> real explanation though.
> 
> OK, well that's no good. It's a good pointer, but we need to make sure
> we understand the root of this hang. I will see if I have sometime to
> dig into this further, maybe next week.

I spent a bit of time looking at this, but I still do not fully
understand the cause of the hang. It appears to hang initialisation of
the FB subdev and it appears to be around the point where the L2 cache
is flushed. I will see if anyone else has a clue what is happening here.

Ben, in the meantime with the holiday season upon us, should we remove
the bar1 teardown for gk20a?

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/bar/base.c
b/drivers/gpu/drm/nouveau/nvkm/subdev/bar/base.c
index 9646adec57cb..243f0a5c8a62 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/bar/base.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/bar/base.c
@@ -73,7 +73,8 @@ struct nvkm_vmm *
 nvkm_bar_fini(struct nvkm_subdev *subdev, bool suspend)
 {
        struct nvkm_bar *bar = nvkm_bar(subdev);
-       bar->func->bar1.fini(bar);
+       if (bar->func->bar1.fini)
+               bar->func->bar1.fini(bar);
        return 0;
 }

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gk20a.c
b/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gk20a.c
index b10077d38839..35878fb538f2 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gk20a.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gk20a.c
@@ -26,7 +26,6 @@
        .dtor = gf100_bar_dtor,
        .oneinit = gf100_bar_oneinit,
        .bar1.init = gf100_bar_bar1_init,
-       .bar1.fini = gf100_bar_bar1_fini,
        .bar1.wait = gf100_bar_bar1_wait,
        .bar1.vmm = gf100_bar_bar1_vmm,
        .flush = g84_bar_flush,

Cheers
Jon

-- 
nvpublic

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 1/2] drm/nouveau/bar/gf100: fix hang when calling ->fini() before ->init()
@ 2017-12-19 22:05               ` Jon Hunter
  0 siblings, 0 replies; 18+ messages in thread
From: Jon Hunter @ 2017-12-19 22:05 UTC (permalink / raw)
  To: linux-arm-kernel


On 06/12/17 17:18, Jon Hunter wrote:
> 
> On 06/12/17 09:22, Guillaume Tucker wrote:
>> On 05/12/17 18:32, Ben Skeggs wrote:
>>> On Wed, Dec 6, 2017 at 12:30 AM, Jon Hunter <jonathanh@nvidia.com> wrote:
>>>
>>>>
>>>> On 04/12/17 18:37, Guillaume Tucker wrote:
>>>>> If the firmware fails to load then ->fini() will be called before the
>>>>> device has been initialised, causing the kernel to hang while trying
>>>>> to write to a register.? Add a test in ->fini() to avoid this issue.
>>>>>
>>>>> This fixes a kernel hang on tegra124.
>>>>>
>>>>> Fixes: b17de35a2ebbe ("drm/nouveau/bar: implement bar1 teardown")
>>>>> Signed-off-by: Guillaume Tucker <guillaume.tucker@collabora.com>
>>>>> CC: Ben Skeggs <bskeggs@redhat.com>
>>>>> ---
>>>>> ?drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c | 7 +++++--
>>>>> ?1 file changed, 5 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c
>>>> b/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c
>>>>> index a3ba7f50198b..95e2aba64aad 100644
>>>>> --- a/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c
>>>>> +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c
>>>>> @@ -43,9 +43,12 @@ gf100_bar_bar1_wait(struct nvkm_bar *base)
>>>>> ?}
>>>>>
>>>>> ?void
>>>>> -gf100_bar_bar1_fini(struct nvkm_bar *bar)
>>>>> +gf100_bar_bar1_fini(struct nvkm_bar *base)
>>>>> ?{
>>>>> -???? nvkm_mask(bar->subdev.device, 0x001704, 0x80000000, 0x00000000);
>>>>> +???? struct nvkm_device *device = base->subdev.device;
>>>>> +
>>>>> +???? if (base->subdev.oneinit)
>>>>> +???????????? nvkm_mask(device, 0x001704, 0x80000000, 0x00000000);
>>>>> ?}
>>>>>
>>>>> ?void
>>>>
>>>> I have tested this and it works for me. Thanks for fixing this! Would be
>>>> good to get Ben's ACK, but you can have my ...
>>>>
>>> I'd love to get a good explanation as to why it hangs without this
>>> change,
>>> as, on the surface, it's not immediately obvious as to why it's hanging.
>>
>> To be fair I'm not entirely sure either why this causes a hang, I
>> haven't read the TRM...? The iomem has been mapped at this point,
>> so accessing the register should work.? One clue is when you look
>> at _bar1_init(), the 0x1704 register is initialised with
>> some (device instance?) memory address.? So it's possible that
>> the hardware does something special when you set this to 0 as in
>> _bar1_fini(), which may fail in particular if it was previously
>> not initialised with a valid address.
>>
>> This is merely guesswork, would be interested to find out the
>> real explanation though.
> 
> OK, well that's no good. It's a good pointer, but we need to make sure
> we understand the root of this hang. I will see if I have sometime to
> dig into this further, maybe next week.

I spent a bit of time looking at this, but I still do not fully
understand the cause of the hang. It appears to hang initialisation of
the FB subdev and it appears to be around the point where the L2 cache
is flushed. I will see if anyone else has a clue what is happening here.

Ben, in the meantime with the holiday season upon us, should we remove
the bar1 teardown for gk20a?

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/bar/base.c
b/drivers/gpu/drm/nouveau/nvkm/subdev/bar/base.c
index 9646adec57cb..243f0a5c8a62 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/bar/base.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/bar/base.c
@@ -73,7 +73,8 @@ struct nvkm_vmm *
 nvkm_bar_fini(struct nvkm_subdev *subdev, bool suspend)
 {
        struct nvkm_bar *bar = nvkm_bar(subdev);
-       bar->func->bar1.fini(bar);
+       if (bar->func->bar1.fini)
+               bar->func->bar1.fini(bar);
        return 0;
 }

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gk20a.c
b/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gk20a.c
index b10077d38839..35878fb538f2 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gk20a.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gk20a.c
@@ -26,7 +26,6 @@
        .dtor = gf100_bar_dtor,
        .oneinit = gf100_bar_oneinit,
        .bar1.init = gf100_bar_bar1_init,
-       .bar1.fini = gf100_bar_bar1_fini,
        .bar1.wait = gf100_bar_bar1_wait,
        .bar1.vmm = gf100_bar_bar1_vmm,
        .flush = g84_bar_flush,

Cheers
Jon

-- 
nvpublic

^ permalink raw reply related	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2017-12-19 22:05 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-12-04 18:37 [PATCH 1/2] drm/nouveau/bar/gf100: fix hang when calling ->fini() before ->init() Guillaume Tucker
2017-12-04 18:37 ` Guillaume Tucker
2017-12-04 18:37 ` Guillaume Tucker
     [not found] ` <be610a7596ad8fee7da092161888c532c2eb2908.1512411775.git.guillaume.tucker-ZGY8ohtN/8qB+jHODAdFcQ@public.gmane.org>
2017-12-04 18:37   ` [PATCH 2/2] drm/tegra: sor: Fix hang on tegra124 due to NULL clk_out Guillaume Tucker
2017-12-04 18:37     ` Guillaume Tucker
2017-12-04 18:37     ` Guillaume Tucker
2017-12-05 14:30 ` [PATCH 1/2] drm/nouveau/bar/gf100: fix hang when calling ->fini() before ->init() Jon Hunter
2017-12-05 14:30   ` Jon Hunter
2017-12-05 14:30   ` Jon Hunter
2017-12-05 18:32   ` Ben Skeggs
2017-12-06  9:22     ` Guillaume Tucker
2017-12-06  9:22       ` Guillaume Tucker
     [not found]       ` <d5b6a012-52c4-dce3-1cc6-3041558e808b-ZGY8ohtN/8qB+jHODAdFcQ@public.gmane.org>
2017-12-06 17:18         ` Jon Hunter
2017-12-06 17:18           ` Jon Hunter
2017-12-06 17:18           ` Jon Hunter
     [not found]           ` <b6618306-8de5-9b08-f2ee-9939df62aedc-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
2017-12-19 22:05             ` Jon Hunter
2017-12-19 22:05               ` Jon Hunter
2017-12-19 22:05               ` Jon Hunter

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.