All of lore.kernel.org
 help / color / mirror / Atom feed
From: Yong Wu <yong.wu@mediatek.com>
To: Stephen Boyd <swboyd@chromium.org>
Cc: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>,
	"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
	Douglas Anderson <dianders@chromium.org>,
	<linux-kernel@vger.kernel.org>, <linux-arm-msm@vger.kernel.org>,
	<dri-devel@lists.freedesktop.org>,
	<freedreno@lists.freedesktop.org>, Joerg Roedel <joro@8bytes.org>,
	"Will Deacon" <will@kernel.org>,
	Daniel Vetter <daniel.vetter@ffwll.ch>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	Rob Clark <robdclark@gmail.com>,
	Russell King <rmk+kernel@arm.linux.org.uk>,
	Saravana Kannan <saravanak@google.com>,
	<linux-mediatek@lists.infradead.org>,
	<iommu@lists.linux-foundation.org>, <youlin.pei@mediatek.com>
Subject: Re: [PATCH v5 25/32] iommu/mtk: Migrate to aggregate driver
Date: Wed, 12 Jan 2022 17:09:19 +0800	[thread overview]
Message-ID: <ff81bc1fe1f1c2060fcf03ba14f1bef584c47599.camel@mediatek.com> (raw)
In-Reply-To: <CAE-0n53Y3WRy4_QvUm9k9wjjWV7adMDQcK_+1ji4+W25SSeGwg@mail.gmail.com>

On Tue, 2022-01-11 at 16:27 -0800, Stephen Boyd wrote:
> Quoting Yong Wu (2022-01-11 04:22:23)
> > Hi Stephen,
> > 
> > Thanks for helping update here.
> > 
> > On Thu, 2022-01-06 at 13:45 -0800, Stephen Boyd wrote:
> > > Use an aggregate driver instead of component ops so that we can
> > > get
> > > proper driver probe ordering of the aggregate device with respect
> > > to
> > > all
> > > the component devices that make up the aggregate device.
> > > 
> > > Cc: Yong Wu <yong.wu@mediatek.com>
> > > Cc: Joerg Roedel <joro@8bytes.org>
> > > Cc: Will Deacon <will@kernel.org>
> > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> > > Cc: Rob Clark <robdclark@gmail.com>
> > > Cc: Russell King <rmk+kernel@arm.linux.org.uk>
> > > Cc: Saravana Kannan <saravanak@google.com>
> > > Signed-off-by: Stephen Boyd <swboyd@chromium.org>
> > 
> > When I test this on mt8195 which have two IOMMU HWs(calling
> > component_aggregate_regsiter twice), it will abort like this. Then
> > what
> > should we do if we have two instances?
> > 
> 
> Thanks for testing it out. We can't register the struct driver more
> than
> once but this driver is calling the component_aggregate_register()
> function from the driver probe and there are two devices bound to the
> mtk-iommu driver so we try to register it more than once. Sigh!
> 
> I see a couple options. One is to do a deep copy of the driver
> structure
> and change the driver name. Then it's a one to one relationship
> between
> device and driver. That's not very great because it leaves around
> junk
> so it should probably be avoided.
> 
> Another option is to reference count the driver registration calls
> when
> component_aggregate_register() is called multiple times. Then we
> would
> only register the driver once and keep it pinned until the last
> unregister call is made, but still remove devices that are created
> for
> the match table.
> 
> Can you try the attached patch? It is based on the next version of
> this
> patch series so the include part of the patch may not apply cleanly.
> 
> ---8<---
> diff --git a/drivers/base/component.c b/drivers/base/component.c
> index 64ad7478c67a..97f253a41bdf 100644
> --- a/drivers/base/component.c
> +++ b/drivers/base/component.c
> @@ -492,15 +492,30 @@ static struct aggregate_device
> *__aggregate_find(struct device *parent)
>  	return dev ? to_aggregate_device(dev) : NULL;
>  }
> 
> +static DEFINE_MUTEX(aggregate_mutex);
> +
>  static int aggregate_driver_register(struct aggregate_driver *adrv)
>  {
> -	adrv->driver.bus = &aggregate_bus_type;
> -	return driver_register(&adrv->driver);
> +	int ret = 0;
> +
> +	mutex_lock(&aggregate_mutex);
> +	if (!refcount_inc_not_zero(&adrv->count)) {
> +		adrv->driver.bus = &aggregate_bus_type;
> +		ret = driver_register(&adrv->driver);
> +		if (!ret)
> +			refcount_inc(&adrv->count);

This should be refcount_set(&adrv->count, 1)?

Otherwise, it will warning like this:

[    2.654526] ------------[ cut here ]------------
[    2.655558] refcount_t: addition on 0; use-after-free.
[    2.656219] WARNING: CPU: 7 PID: 74 at ../v5.16-
rc1/kernel/mediatek/lib/refcount.c:25
refcount_warn_saturate+0x128/0x148
...
[    2.672227] Call trace:
[    2.672539]  refcount_warn_saturate+0x128/0x148
[    2.673118]  component_aggregate_register+0x388/0x390
[    2.673763]  mtk_iommu_probe+0x638/0x690

[    2.686467] ------------[ cut here ]------------
[    2.687049] refcount_t: saturated; leaking memory.
[    2.687666] WARNING: CPU: 5 PID: 74 at ../v5.16-
rc1/kernel/mediatek/lib/refcount.c:19 refcount_warn_saturate+0xfc/0x148

[    2.703805] Call trace:
[    2.704117]  refcount_warn_saturate+0xfc/0x148
[    2.704685]  component_aggregate_register+0x1fc/0x390
[    2.705330]  mtk_iommu_probe+0x638/0x690

> +	}
> +	mutex_unlock(&aggregate_mutex);
> +
> +	return ret;
>  }
> 
>  static void aggregate_driver_unregister(struct aggregate_driver
> *adrv)
>  {
> -	driver_unregister(&adrv->driver);
> +	if (refcount_dec_and_mutex_lock(&adrv->count,
> &aggregate_mutex)) {
> +		driver_unregister(&adrv->driver);
> +		mutex_unlock(&aggregate_mutex);
> +	}
>  }
> 
>  static struct aggregate_device *aggregate_device_add(struct device
> *parent,
> diff --git a/include/linux/component.h b/include/linux/component.h
> index 53d81203c095..b061341938aa 100644
> --- a/include/linux/component.h
> +++ b/include/linux/component.h
> @@ -4,6 +4,7 @@
> 
>  #include <linux/stddef.h>
>  #include <linux/device.h>
> +#include <linux/refcount.h>
> 
>  struct aggregate_device;
> 
> @@ -66,6 +67,7 @@ struct device *aggregate_device_parent(const struct
> aggregate_device *adev);
> 
>  /**
>   * struct aggregate_driver - Aggregate driver (made up of other
> drivers)
> + * @count: driver registration refcount
>   * @driver: device driver
>   */
>  struct aggregate_driver {
> @@ -101,6 +103,7 @@ struct aggregate_driver {
>  	 */
>  	void (*shutdown)(struct aggregate_device *adev);
> 
> +	refcount_t		count;
>  	struct device_driver	driver;
>  };

After this patch, the aggregate_driver flow looks ok. But our driver
still aborts like this:

[    2.721316] Unable to handle kernel NULL pointer dereference at
virtual address 0000000000000000
...
[    2.731658] pc : mtk_smi_larb_config_port_gen2_general+0xa4/0x138
[    2.732434] lr : mtk_smi_larb_resume+0x54/0x98
...
[    2.742457] Call trace:
[    2.742768]  mtk_smi_larb_config_port_gen2_general+0xa4/0x138
[    2.743496]  pm_generic_runtime_resume+0x2c/0x48
[    2.744090]  __genpd_runtime_resume+0x30/0xa8
[    2.744648]  genpd_runtime_resume+0x94/0x2c8
[    2.745191]  __rpm_callback+0x44/0x150
[    2.745669]  rpm_callback+0x6c/0x78
[    2.746114]  rpm_resume+0x314/0x558
[    2.746559]  __pm_runtime_resume+0x3c/0x88
[    2.747080]  pm_runtime_get_suppliers+0x7c/0x110
[    2.747668]  __driver_probe_device+0x4c/0xe8
[    2.748212]  driver_probe_device+0x44/0x130
[    2.748745]  __device_attach_driver+0x98/0xd0
[    2.749300]  bus_for_each_drv+0x68/0xd0
[    2.749787]  __device_attach+0xec/0x148
[    2.750277]  device_attach+0x14/0x20
[    2.750733]  bus_rescan_devices_helper+0x50/0x90
[    2.751319]  bus_for_each_dev+0x7c/0xd8
[    2.751806]  bus_rescan_devices+0x20/0x30
[    2.752315]  __component_add+0x7c/0xa0
[    2.752795]  component_add+0x14/0x20
[    2.753253]  mtk_smi_larb_probe+0xe0/0x120

This is because the device runtime_resume is called before the bind
operation(In our case this detailed function is mtk_smi_larb_bind).
The issue doesn't happen without this patchset. I'm not sure the right
sequence. If we should fix in mediatek driver, the patch could be:


diff --git a/drivers/memory/mtk-smi.c b/drivers/memory/mtk-smi.c
index b883dcc0bbfa..288841555067 100644
--- a/drivers/memory/mtk-smi.c
+++ b/drivers/memory/mtk-smi.c
@@ -483,8 +483,9 @@ static int __maybe_unused
mtk_smi_larb_resume(struct device *dev)
        if (ret < 0)
                return ret;

-       /* Configure the basic setting for this larb */
-       larb_gen->config_port(dev);
+       /* Configure the basic setting for this larb after it binds
with iommu */
+       if (larb->mmu)
+               larb_gen->config_port(dev);

        return 0;
 }


Another nitpick, the title should be: iommu/mediatek: xxxx



WARNING: multiple messages have this Message-ID (diff)
From: Yong Wu <yong.wu@mediatek.com>
To: Stephen Boyd <swboyd@chromium.org>
Cc: youlin.pei@mediatek.com, Saravana Kannan <saravanak@google.com>,
	Will Deacon <will@kernel.org>,
	Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	Douglas Anderson <dianders@chromium.org>,
	dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org,
	Daniel Vetter <daniel.vetter@ffwll.ch>,
	iommu@lists.linux-foundation.org,
	linux-mediatek@lists.infradead.org,
	linux-arm-msm@vger.kernel.org,
	Russell King <rmk+kernel@arm.linux.org.uk>,
	freedreno@lists.freedesktop.org
Subject: Re: [PATCH v5 25/32] iommu/mtk: Migrate to aggregate driver
Date: Wed, 12 Jan 2022 17:09:19 +0800	[thread overview]
Message-ID: <ff81bc1fe1f1c2060fcf03ba14f1bef584c47599.camel@mediatek.com> (raw)
In-Reply-To: <CAE-0n53Y3WRy4_QvUm9k9wjjWV7adMDQcK_+1ji4+W25SSeGwg@mail.gmail.com>

On Tue, 2022-01-11 at 16:27 -0800, Stephen Boyd wrote:
> Quoting Yong Wu (2022-01-11 04:22:23)
> > Hi Stephen,
> > 
> > Thanks for helping update here.
> > 
> > On Thu, 2022-01-06 at 13:45 -0800, Stephen Boyd wrote:
> > > Use an aggregate driver instead of component ops so that we can
> > > get
> > > proper driver probe ordering of the aggregate device with respect
> > > to
> > > all
> > > the component devices that make up the aggregate device.
> > > 
> > > Cc: Yong Wu <yong.wu@mediatek.com>
> > > Cc: Joerg Roedel <joro@8bytes.org>
> > > Cc: Will Deacon <will@kernel.org>
> > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> > > Cc: Rob Clark <robdclark@gmail.com>
> > > Cc: Russell King <rmk+kernel@arm.linux.org.uk>
> > > Cc: Saravana Kannan <saravanak@google.com>
> > > Signed-off-by: Stephen Boyd <swboyd@chromium.org>
> > 
> > When I test this on mt8195 which have two IOMMU HWs(calling
> > component_aggregate_regsiter twice), it will abort like this. Then
> > what
> > should we do if we have two instances?
> > 
> 
> Thanks for testing it out. We can't register the struct driver more
> than
> once but this driver is calling the component_aggregate_register()
> function from the driver probe and there are two devices bound to the
> mtk-iommu driver so we try to register it more than once. Sigh!
> 
> I see a couple options. One is to do a deep copy of the driver
> structure
> and change the driver name. Then it's a one to one relationship
> between
> device and driver. That's not very great because it leaves around
> junk
> so it should probably be avoided.
> 
> Another option is to reference count the driver registration calls
> when
> component_aggregate_register() is called multiple times. Then we
> would
> only register the driver once and keep it pinned until the last
> unregister call is made, but still remove devices that are created
> for
> the match table.
> 
> Can you try the attached patch? It is based on the next version of
> this
> patch series so the include part of the patch may not apply cleanly.
> 
> ---8<---
> diff --git a/drivers/base/component.c b/drivers/base/component.c
> index 64ad7478c67a..97f253a41bdf 100644
> --- a/drivers/base/component.c
> +++ b/drivers/base/component.c
> @@ -492,15 +492,30 @@ static struct aggregate_device
> *__aggregate_find(struct device *parent)
>  	return dev ? to_aggregate_device(dev) : NULL;
>  }
> 
> +static DEFINE_MUTEX(aggregate_mutex);
> +
>  static int aggregate_driver_register(struct aggregate_driver *adrv)
>  {
> -	adrv->driver.bus = &aggregate_bus_type;
> -	return driver_register(&adrv->driver);
> +	int ret = 0;
> +
> +	mutex_lock(&aggregate_mutex);
> +	if (!refcount_inc_not_zero(&adrv->count)) {
> +		adrv->driver.bus = &aggregate_bus_type;
> +		ret = driver_register(&adrv->driver);
> +		if (!ret)
> +			refcount_inc(&adrv->count);

This should be refcount_set(&adrv->count, 1)?

Otherwise, it will warning like this:

[    2.654526] ------------[ cut here ]------------
[    2.655558] refcount_t: addition on 0; use-after-free.
[    2.656219] WARNING: CPU: 7 PID: 74 at ../v5.16-
rc1/kernel/mediatek/lib/refcount.c:25
refcount_warn_saturate+0x128/0x148
...
[    2.672227] Call trace:
[    2.672539]  refcount_warn_saturate+0x128/0x148
[    2.673118]  component_aggregate_register+0x388/0x390
[    2.673763]  mtk_iommu_probe+0x638/0x690

[    2.686467] ------------[ cut here ]------------
[    2.687049] refcount_t: saturated; leaking memory.
[    2.687666] WARNING: CPU: 5 PID: 74 at ../v5.16-
rc1/kernel/mediatek/lib/refcount.c:19 refcount_warn_saturate+0xfc/0x148

[    2.703805] Call trace:
[    2.704117]  refcount_warn_saturate+0xfc/0x148
[    2.704685]  component_aggregate_register+0x1fc/0x390
[    2.705330]  mtk_iommu_probe+0x638/0x690

> +	}
> +	mutex_unlock(&aggregate_mutex);
> +
> +	return ret;
>  }
> 
>  static void aggregate_driver_unregister(struct aggregate_driver
> *adrv)
>  {
> -	driver_unregister(&adrv->driver);
> +	if (refcount_dec_and_mutex_lock(&adrv->count,
> &aggregate_mutex)) {
> +		driver_unregister(&adrv->driver);
> +		mutex_unlock(&aggregate_mutex);
> +	}
>  }
> 
>  static struct aggregate_device *aggregate_device_add(struct device
> *parent,
> diff --git a/include/linux/component.h b/include/linux/component.h
> index 53d81203c095..b061341938aa 100644
> --- a/include/linux/component.h
> +++ b/include/linux/component.h
> @@ -4,6 +4,7 @@
> 
>  #include <linux/stddef.h>
>  #include <linux/device.h>
> +#include <linux/refcount.h>
> 
>  struct aggregate_device;
> 
> @@ -66,6 +67,7 @@ struct device *aggregate_device_parent(const struct
> aggregate_device *adev);
> 
>  /**
>   * struct aggregate_driver - Aggregate driver (made up of other
> drivers)
> + * @count: driver registration refcount
>   * @driver: device driver
>   */
>  struct aggregate_driver {
> @@ -101,6 +103,7 @@ struct aggregate_driver {
>  	 */
>  	void (*shutdown)(struct aggregate_device *adev);
> 
> +	refcount_t		count;
>  	struct device_driver	driver;
>  };

After this patch, the aggregate_driver flow looks ok. But our driver
still aborts like this:

[    2.721316] Unable to handle kernel NULL pointer dereference at
virtual address 0000000000000000
...
[    2.731658] pc : mtk_smi_larb_config_port_gen2_general+0xa4/0x138
[    2.732434] lr : mtk_smi_larb_resume+0x54/0x98
...
[    2.742457] Call trace:
[    2.742768]  mtk_smi_larb_config_port_gen2_general+0xa4/0x138
[    2.743496]  pm_generic_runtime_resume+0x2c/0x48
[    2.744090]  __genpd_runtime_resume+0x30/0xa8
[    2.744648]  genpd_runtime_resume+0x94/0x2c8
[    2.745191]  __rpm_callback+0x44/0x150
[    2.745669]  rpm_callback+0x6c/0x78
[    2.746114]  rpm_resume+0x314/0x558
[    2.746559]  __pm_runtime_resume+0x3c/0x88
[    2.747080]  pm_runtime_get_suppliers+0x7c/0x110
[    2.747668]  __driver_probe_device+0x4c/0xe8
[    2.748212]  driver_probe_device+0x44/0x130
[    2.748745]  __device_attach_driver+0x98/0xd0
[    2.749300]  bus_for_each_drv+0x68/0xd0
[    2.749787]  __device_attach+0xec/0x148
[    2.750277]  device_attach+0x14/0x20
[    2.750733]  bus_rescan_devices_helper+0x50/0x90
[    2.751319]  bus_for_each_dev+0x7c/0xd8
[    2.751806]  bus_rescan_devices+0x20/0x30
[    2.752315]  __component_add+0x7c/0xa0
[    2.752795]  component_add+0x14/0x20
[    2.753253]  mtk_smi_larb_probe+0xe0/0x120

This is because the device runtime_resume is called before the bind
operation(In our case this detailed function is mtk_smi_larb_bind).
The issue doesn't happen without this patchset. I'm not sure the right
sequence. If we should fix in mediatek driver, the patch could be:


diff --git a/drivers/memory/mtk-smi.c b/drivers/memory/mtk-smi.c
index b883dcc0bbfa..288841555067 100644
--- a/drivers/memory/mtk-smi.c
+++ b/drivers/memory/mtk-smi.c
@@ -483,8 +483,9 @@ static int __maybe_unused
mtk_smi_larb_resume(struct device *dev)
        if (ret < 0)
                return ret;

-       /* Configure the basic setting for this larb */
-       larb_gen->config_port(dev);
+       /* Configure the basic setting for this larb after it binds
with iommu */
+       if (larb->mmu)
+               larb_gen->config_port(dev);

        return 0;
 }


Another nitpick, the title should be: iommu/mediatek: xxxx


_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

WARNING: multiple messages have this Message-ID (diff)
From: Yong Wu <yong.wu@mediatek.com>
To: Stephen Boyd <swboyd@chromium.org>
Cc: youlin.pei@mediatek.com, Saravana Kannan <saravanak@google.com>,
	Will Deacon <will@kernel.org>,
	Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Joerg Roedel <joro@8bytes.org>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	Douglas Anderson <dianders@chromium.org>,
	dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org,
	Daniel Vetter <daniel.vetter@ffwll.ch>,
	iommu@lists.linux-foundation.org,
	linux-mediatek@lists.infradead.org,
	linux-arm-msm@vger.kernel.org,
	Russell King <rmk+kernel@arm.linux.org.uk>,
	freedreno@lists.freedesktop.org
Subject: Re: [PATCH v5 25/32] iommu/mtk: Migrate to aggregate driver
Date: Wed, 12 Jan 2022 17:09:19 +0800	[thread overview]
Message-ID: <ff81bc1fe1f1c2060fcf03ba14f1bef584c47599.camel@mediatek.com> (raw)
In-Reply-To: <CAE-0n53Y3WRy4_QvUm9k9wjjWV7adMDQcK_+1ji4+W25SSeGwg@mail.gmail.com>

On Tue, 2022-01-11 at 16:27 -0800, Stephen Boyd wrote:
> Quoting Yong Wu (2022-01-11 04:22:23)
> > Hi Stephen,
> > 
> > Thanks for helping update here.
> > 
> > On Thu, 2022-01-06 at 13:45 -0800, Stephen Boyd wrote:
> > > Use an aggregate driver instead of component ops so that we can
> > > get
> > > proper driver probe ordering of the aggregate device with respect
> > > to
> > > all
> > > the component devices that make up the aggregate device.
> > > 
> > > Cc: Yong Wu <yong.wu@mediatek.com>
> > > Cc: Joerg Roedel <joro@8bytes.org>
> > > Cc: Will Deacon <will@kernel.org>
> > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> > > Cc: Rob Clark <robdclark@gmail.com>
> > > Cc: Russell King <rmk+kernel@arm.linux.org.uk>
> > > Cc: Saravana Kannan <saravanak@google.com>
> > > Signed-off-by: Stephen Boyd <swboyd@chromium.org>
> > 
> > When I test this on mt8195 which have two IOMMU HWs(calling
> > component_aggregate_regsiter twice), it will abort like this. Then
> > what
> > should we do if we have two instances?
> > 
> 
> Thanks for testing it out. We can't register the struct driver more
> than
> once but this driver is calling the component_aggregate_register()
> function from the driver probe and there are two devices bound to the
> mtk-iommu driver so we try to register it more than once. Sigh!
> 
> I see a couple options. One is to do a deep copy of the driver
> structure
> and change the driver name. Then it's a one to one relationship
> between
> device and driver. That's not very great because it leaves around
> junk
> so it should probably be avoided.
> 
> Another option is to reference count the driver registration calls
> when
> component_aggregate_register() is called multiple times. Then we
> would
> only register the driver once and keep it pinned until the last
> unregister call is made, but still remove devices that are created
> for
> the match table.
> 
> Can you try the attached patch? It is based on the next version of
> this
> patch series so the include part of the patch may not apply cleanly.
> 
> ---8<---
> diff --git a/drivers/base/component.c b/drivers/base/component.c
> index 64ad7478c67a..97f253a41bdf 100644
> --- a/drivers/base/component.c
> +++ b/drivers/base/component.c
> @@ -492,15 +492,30 @@ static struct aggregate_device
> *__aggregate_find(struct device *parent)
>  	return dev ? to_aggregate_device(dev) : NULL;
>  }
> 
> +static DEFINE_MUTEX(aggregate_mutex);
> +
>  static int aggregate_driver_register(struct aggregate_driver *adrv)
>  {
> -	adrv->driver.bus = &aggregate_bus_type;
> -	return driver_register(&adrv->driver);
> +	int ret = 0;
> +
> +	mutex_lock(&aggregate_mutex);
> +	if (!refcount_inc_not_zero(&adrv->count)) {
> +		adrv->driver.bus = &aggregate_bus_type;
> +		ret = driver_register(&adrv->driver);
> +		if (!ret)
> +			refcount_inc(&adrv->count);

This should be refcount_set(&adrv->count, 1)?

Otherwise, it will warning like this:

[    2.654526] ------------[ cut here ]------------
[    2.655558] refcount_t: addition on 0; use-after-free.
[    2.656219] WARNING: CPU: 7 PID: 74 at ../v5.16-
rc1/kernel/mediatek/lib/refcount.c:25
refcount_warn_saturate+0x128/0x148
...
[    2.672227] Call trace:
[    2.672539]  refcount_warn_saturate+0x128/0x148
[    2.673118]  component_aggregate_register+0x388/0x390
[    2.673763]  mtk_iommu_probe+0x638/0x690

[    2.686467] ------------[ cut here ]------------
[    2.687049] refcount_t: saturated; leaking memory.
[    2.687666] WARNING: CPU: 5 PID: 74 at ../v5.16-
rc1/kernel/mediatek/lib/refcount.c:19 refcount_warn_saturate+0xfc/0x148

[    2.703805] Call trace:
[    2.704117]  refcount_warn_saturate+0xfc/0x148
[    2.704685]  component_aggregate_register+0x1fc/0x390
[    2.705330]  mtk_iommu_probe+0x638/0x690

> +	}
> +	mutex_unlock(&aggregate_mutex);
> +
> +	return ret;
>  }
> 
>  static void aggregate_driver_unregister(struct aggregate_driver
> *adrv)
>  {
> -	driver_unregister(&adrv->driver);
> +	if (refcount_dec_and_mutex_lock(&adrv->count,
> &aggregate_mutex)) {
> +		driver_unregister(&adrv->driver);
> +		mutex_unlock(&aggregate_mutex);
> +	}
>  }
> 
>  static struct aggregate_device *aggregate_device_add(struct device
> *parent,
> diff --git a/include/linux/component.h b/include/linux/component.h
> index 53d81203c095..b061341938aa 100644
> --- a/include/linux/component.h
> +++ b/include/linux/component.h
> @@ -4,6 +4,7 @@
> 
>  #include <linux/stddef.h>
>  #include <linux/device.h>
> +#include <linux/refcount.h>
> 
>  struct aggregate_device;
> 
> @@ -66,6 +67,7 @@ struct device *aggregate_device_parent(const struct
> aggregate_device *adev);
> 
>  /**
>   * struct aggregate_driver - Aggregate driver (made up of other
> drivers)
> + * @count: driver registration refcount
>   * @driver: device driver
>   */
>  struct aggregate_driver {
> @@ -101,6 +103,7 @@ struct aggregate_driver {
>  	 */
>  	void (*shutdown)(struct aggregate_device *adev);
> 
> +	refcount_t		count;
>  	struct device_driver	driver;
>  };

After this patch, the aggregate_driver flow looks ok. But our driver
still aborts like this:

[    2.721316] Unable to handle kernel NULL pointer dereference at
virtual address 0000000000000000
...
[    2.731658] pc : mtk_smi_larb_config_port_gen2_general+0xa4/0x138
[    2.732434] lr : mtk_smi_larb_resume+0x54/0x98
...
[    2.742457] Call trace:
[    2.742768]  mtk_smi_larb_config_port_gen2_general+0xa4/0x138
[    2.743496]  pm_generic_runtime_resume+0x2c/0x48
[    2.744090]  __genpd_runtime_resume+0x30/0xa8
[    2.744648]  genpd_runtime_resume+0x94/0x2c8
[    2.745191]  __rpm_callback+0x44/0x150
[    2.745669]  rpm_callback+0x6c/0x78
[    2.746114]  rpm_resume+0x314/0x558
[    2.746559]  __pm_runtime_resume+0x3c/0x88
[    2.747080]  pm_runtime_get_suppliers+0x7c/0x110
[    2.747668]  __driver_probe_device+0x4c/0xe8
[    2.748212]  driver_probe_device+0x44/0x130
[    2.748745]  __device_attach_driver+0x98/0xd0
[    2.749300]  bus_for_each_drv+0x68/0xd0
[    2.749787]  __device_attach+0xec/0x148
[    2.750277]  device_attach+0x14/0x20
[    2.750733]  bus_rescan_devices_helper+0x50/0x90
[    2.751319]  bus_for_each_dev+0x7c/0xd8
[    2.751806]  bus_rescan_devices+0x20/0x30
[    2.752315]  __component_add+0x7c/0xa0
[    2.752795]  component_add+0x14/0x20
[    2.753253]  mtk_smi_larb_probe+0xe0/0x120

This is because the device runtime_resume is called before the bind
operation(In our case this detailed function is mtk_smi_larb_bind).
The issue doesn't happen without this patchset. I'm not sure the right
sequence. If we should fix in mediatek driver, the patch could be:


diff --git a/drivers/memory/mtk-smi.c b/drivers/memory/mtk-smi.c
index b883dcc0bbfa..288841555067 100644
--- a/drivers/memory/mtk-smi.c
+++ b/drivers/memory/mtk-smi.c
@@ -483,8 +483,9 @@ static int __maybe_unused
mtk_smi_larb_resume(struct device *dev)
        if (ret < 0)
                return ret;

-       /* Configure the basic setting for this larb */
-       larb_gen->config_port(dev);
+       /* Configure the basic setting for this larb after it binds
with iommu */
+       if (larb->mmu)
+               larb_gen->config_port(dev);

        return 0;
 }


Another nitpick, the title should be: iommu/mediatek: xxxx



WARNING: multiple messages have this Message-ID (diff)
From: Yong Wu <yong.wu@mediatek.com>
To: Stephen Boyd <swboyd@chromium.org>
Cc: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>,
	"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
	Douglas Anderson <dianders@chromium.org>,
	<linux-kernel@vger.kernel.org>, <linux-arm-msm@vger.kernel.org>,
	<dri-devel@lists.freedesktop.org>,
	<freedreno@lists.freedesktop.org>, Joerg Roedel <joro@8bytes.org>,
	"Will Deacon" <will@kernel.org>,
	Daniel Vetter <daniel.vetter@ffwll.ch>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	Rob Clark <robdclark@gmail.com>,
	Russell King <rmk+kernel@arm.linux.org.uk>,
	Saravana Kannan <saravanak@google.com>,
	<linux-mediatek@lists.infradead.org>,
	<iommu@lists.linux-foundation.org>, <youlin.pei@mediatek.com>
Subject: Re: [PATCH v5 25/32] iommu/mtk: Migrate to aggregate driver
Date: Wed, 12 Jan 2022 17:09:19 +0800	[thread overview]
Message-ID: <ff81bc1fe1f1c2060fcf03ba14f1bef584c47599.camel@mediatek.com> (raw)
In-Reply-To: <CAE-0n53Y3WRy4_QvUm9k9wjjWV7adMDQcK_+1ji4+W25SSeGwg@mail.gmail.com>

On Tue, 2022-01-11 at 16:27 -0800, Stephen Boyd wrote:
> Quoting Yong Wu (2022-01-11 04:22:23)
> > Hi Stephen,
> > 
> > Thanks for helping update here.
> > 
> > On Thu, 2022-01-06 at 13:45 -0800, Stephen Boyd wrote:
> > > Use an aggregate driver instead of component ops so that we can
> > > get
> > > proper driver probe ordering of the aggregate device with respect
> > > to
> > > all
> > > the component devices that make up the aggregate device.
> > > 
> > > Cc: Yong Wu <yong.wu@mediatek.com>
> > > Cc: Joerg Roedel <joro@8bytes.org>
> > > Cc: Will Deacon <will@kernel.org>
> > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> > > Cc: Rob Clark <robdclark@gmail.com>
> > > Cc: Russell King <rmk+kernel@arm.linux.org.uk>
> > > Cc: Saravana Kannan <saravanak@google.com>
> > > Signed-off-by: Stephen Boyd <swboyd@chromium.org>
> > 
> > When I test this on mt8195 which have two IOMMU HWs(calling
> > component_aggregate_regsiter twice), it will abort like this. Then
> > what
> > should we do if we have two instances?
> > 
> 
> Thanks for testing it out. We can't register the struct driver more
> than
> once but this driver is calling the component_aggregate_register()
> function from the driver probe and there are two devices bound to the
> mtk-iommu driver so we try to register it more than once. Sigh!
> 
> I see a couple options. One is to do a deep copy of the driver
> structure
> and change the driver name. Then it's a one to one relationship
> between
> device and driver. That's not very great because it leaves around
> junk
> so it should probably be avoided.
> 
> Another option is to reference count the driver registration calls
> when
> component_aggregate_register() is called multiple times. Then we
> would
> only register the driver once and keep it pinned until the last
> unregister call is made, but still remove devices that are created
> for
> the match table.
> 
> Can you try the attached patch? It is based on the next version of
> this
> patch series so the include part of the patch may not apply cleanly.
> 
> ---8<---
> diff --git a/drivers/base/component.c b/drivers/base/component.c
> index 64ad7478c67a..97f253a41bdf 100644
> --- a/drivers/base/component.c
> +++ b/drivers/base/component.c
> @@ -492,15 +492,30 @@ static struct aggregate_device
> *__aggregate_find(struct device *parent)
>  	return dev ? to_aggregate_device(dev) : NULL;
>  }
> 
> +static DEFINE_MUTEX(aggregate_mutex);
> +
>  static int aggregate_driver_register(struct aggregate_driver *adrv)
>  {
> -	adrv->driver.bus = &aggregate_bus_type;
> -	return driver_register(&adrv->driver);
> +	int ret = 0;
> +
> +	mutex_lock(&aggregate_mutex);
> +	if (!refcount_inc_not_zero(&adrv->count)) {
> +		adrv->driver.bus = &aggregate_bus_type;
> +		ret = driver_register(&adrv->driver);
> +		if (!ret)
> +			refcount_inc(&adrv->count);

This should be refcount_set(&adrv->count, 1)?

Otherwise, it will warning like this:

[    2.654526] ------------[ cut here ]------------
[    2.655558] refcount_t: addition on 0; use-after-free.
[    2.656219] WARNING: CPU: 7 PID: 74 at ../v5.16-
rc1/kernel/mediatek/lib/refcount.c:25
refcount_warn_saturate+0x128/0x148
...
[    2.672227] Call trace:
[    2.672539]  refcount_warn_saturate+0x128/0x148
[    2.673118]  component_aggregate_register+0x388/0x390
[    2.673763]  mtk_iommu_probe+0x638/0x690

[    2.686467] ------------[ cut here ]------------
[    2.687049] refcount_t: saturated; leaking memory.
[    2.687666] WARNING: CPU: 5 PID: 74 at ../v5.16-
rc1/kernel/mediatek/lib/refcount.c:19 refcount_warn_saturate+0xfc/0x148

[    2.703805] Call trace:
[    2.704117]  refcount_warn_saturate+0xfc/0x148
[    2.704685]  component_aggregate_register+0x1fc/0x390
[    2.705330]  mtk_iommu_probe+0x638/0x690

> +	}
> +	mutex_unlock(&aggregate_mutex);
> +
> +	return ret;
>  }
> 
>  static void aggregate_driver_unregister(struct aggregate_driver
> *adrv)
>  {
> -	driver_unregister(&adrv->driver);
> +	if (refcount_dec_and_mutex_lock(&adrv->count,
> &aggregate_mutex)) {
> +		driver_unregister(&adrv->driver);
> +		mutex_unlock(&aggregate_mutex);
> +	}
>  }
> 
>  static struct aggregate_device *aggregate_device_add(struct device
> *parent,
> diff --git a/include/linux/component.h b/include/linux/component.h
> index 53d81203c095..b061341938aa 100644
> --- a/include/linux/component.h
> +++ b/include/linux/component.h
> @@ -4,6 +4,7 @@
> 
>  #include <linux/stddef.h>
>  #include <linux/device.h>
> +#include <linux/refcount.h>
> 
>  struct aggregate_device;
> 
> @@ -66,6 +67,7 @@ struct device *aggregate_device_parent(const struct
> aggregate_device *adev);
> 
>  /**
>   * struct aggregate_driver - Aggregate driver (made up of other
> drivers)
> + * @count: driver registration refcount
>   * @driver: device driver
>   */
>  struct aggregate_driver {
> @@ -101,6 +103,7 @@ struct aggregate_driver {
>  	 */
>  	void (*shutdown)(struct aggregate_device *adev);
> 
> +	refcount_t		count;
>  	struct device_driver	driver;
>  };

After this patch, the aggregate_driver flow looks ok. But our driver
still aborts like this:

[    2.721316] Unable to handle kernel NULL pointer dereference at
virtual address 0000000000000000
...
[    2.731658] pc : mtk_smi_larb_config_port_gen2_general+0xa4/0x138
[    2.732434] lr : mtk_smi_larb_resume+0x54/0x98
...
[    2.742457] Call trace:
[    2.742768]  mtk_smi_larb_config_port_gen2_general+0xa4/0x138
[    2.743496]  pm_generic_runtime_resume+0x2c/0x48
[    2.744090]  __genpd_runtime_resume+0x30/0xa8
[    2.744648]  genpd_runtime_resume+0x94/0x2c8
[    2.745191]  __rpm_callback+0x44/0x150
[    2.745669]  rpm_callback+0x6c/0x78
[    2.746114]  rpm_resume+0x314/0x558
[    2.746559]  __pm_runtime_resume+0x3c/0x88
[    2.747080]  pm_runtime_get_suppliers+0x7c/0x110
[    2.747668]  __driver_probe_device+0x4c/0xe8
[    2.748212]  driver_probe_device+0x44/0x130
[    2.748745]  __device_attach_driver+0x98/0xd0
[    2.749300]  bus_for_each_drv+0x68/0xd0
[    2.749787]  __device_attach+0xec/0x148
[    2.750277]  device_attach+0x14/0x20
[    2.750733]  bus_rescan_devices_helper+0x50/0x90
[    2.751319]  bus_for_each_dev+0x7c/0xd8
[    2.751806]  bus_rescan_devices+0x20/0x30
[    2.752315]  __component_add+0x7c/0xa0
[    2.752795]  component_add+0x14/0x20
[    2.753253]  mtk_smi_larb_probe+0xe0/0x120

This is because the device runtime_resume is called before the bind
operation(In our case this detailed function is mtk_smi_larb_bind).
The issue doesn't happen without this patchset. I'm not sure the right
sequence. If we should fix in mediatek driver, the patch could be:


diff --git a/drivers/memory/mtk-smi.c b/drivers/memory/mtk-smi.c
index b883dcc0bbfa..288841555067 100644
--- a/drivers/memory/mtk-smi.c
+++ b/drivers/memory/mtk-smi.c
@@ -483,8 +483,9 @@ static int __maybe_unused
mtk_smi_larb_resume(struct device *dev)
        if (ret < 0)
                return ret;

-       /* Configure the basic setting for this larb */
-       larb_gen->config_port(dev);
+       /* Configure the basic setting for this larb after it binds
with iommu */
+       if (larb->mmu)
+               larb_gen->config_port(dev);

        return 0;
 }


Another nitpick, the title should be: iommu/mediatek: xxxx

_______________________________________________
Linux-mediatek mailing list
Linux-mediatek@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-mediatek

  reply	other threads:[~2022-01-12  9:09 UTC|newest]

Thread overview: 99+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-06 21:45 [PATCH v5 00/32] component: Make into an aggregate bus Stephen Boyd
2022-01-06 21:45 ` Stephen Boyd
2022-01-06 21:45 ` [PATCH v5 01/32] component: Replace most references to 'master' with 'aggregate device' Stephen Boyd
2022-01-06 21:45   ` Stephen Boyd
2022-01-06 21:45 ` [PATCH v5 02/32] component: Introduce the aggregate bus_type Stephen Boyd
2022-01-06 21:45   ` Stephen Boyd
2022-01-06 21:45 ` [PATCH v5 03/32] component: Move struct aggregate_device out to header file Stephen Boyd
2022-01-06 21:45   ` Stephen Boyd
2022-01-07 13:07   ` Jani Nikula
2022-01-07 20:12     ` Stephen Boyd
2022-01-10 11:23       ` Jani Nikula
2022-01-06 21:45 ` [PATCH v5 04/32] component: Add {bind, unbind}_component() ops that take aggregate device Stephen Boyd
2022-01-06 21:45   ` [PATCH v5 04/32] component: Add {bind,unbind}_component() " Stephen Boyd
2022-01-06 21:45 ` [PATCH v5 05/32] drm/of: Add a drm_of_aggregate_probe() API Stephen Boyd
2022-01-06 21:45   ` Stephen Boyd
2022-01-06 21:45 ` [PATCH v5 06/32] drm/msm: Migrate to aggregate driver Stephen Boyd
2022-01-06 21:45   ` Stephen Boyd
2022-01-06 21:45 ` [PATCH v5 07/32] drm/komeda: " Stephen Boyd
2022-01-06 21:45   ` Stephen Boyd
2022-01-06 21:45 ` [PATCH v5 08/32] drm/arm/hdlcd: " Stephen Boyd
2022-01-06 21:45   ` Stephen Boyd
2022-01-06 21:45 ` [PATCH v5 09/32] drm/malidp: " Stephen Boyd
2022-01-06 21:45   ` Stephen Boyd
2022-01-06 21:45 ` [PATCH v5 10/32] drm/armada: " Stephen Boyd
2022-01-06 21:45   ` Stephen Boyd
2022-01-06 21:45 ` [PATCH v5 11/32] drm/etnaviv: " Stephen Boyd
2022-01-06 21:45   ` Stephen Boyd
2022-01-06 21:45 ` [PATCH v5 12/32] drm/kirin: " Stephen Boyd
2022-01-06 21:45   ` Stephen Boyd
2022-01-06 21:45 ` [PATCH v5 13/32] drm/exynos: " Stephen Boyd
2022-01-06 21:45   ` Stephen Boyd
2022-01-06 21:45 ` [PATCH v5 14/32] drm/imx: " Stephen Boyd
2022-01-06 21:45   ` Stephen Boyd
2022-01-06 21:45 ` [PATCH v5 15/32] drm/ingenic: " Stephen Boyd
2022-01-06 21:45   ` Stephen Boyd
2022-01-06 21:45 ` [PATCH v5 16/32] drm/mcde: " Stephen Boyd
2022-01-06 21:45   ` Stephen Boyd
2022-01-06 21:45 ` [PATCH v5 17/32] drm/mediatek: " Stephen Boyd
2022-01-06 21:45   ` Stephen Boyd
2022-01-06 21:45 ` [PATCH v5 18/32] drm/meson: " Stephen Boyd
2022-01-06 21:45   ` Stephen Boyd
2022-01-06 21:45 ` [PATCH v5 19/32] drm/omap: " Stephen Boyd
2022-01-06 21:45   ` Stephen Boyd
2022-01-06 21:45 ` [PATCH v5 20/32] drm/rockchip: " Stephen Boyd
2022-01-06 21:45   ` Stephen Boyd
2022-01-06 21:45 ` [PATCH v5 21/32] drm/sti: " Stephen Boyd
2022-01-06 21:45   ` Stephen Boyd
2022-01-06 21:45 ` [PATCH v5 22/32] drm/sun4i: " Stephen Boyd
2022-01-06 21:45   ` Stephen Boyd
2022-01-06 21:45 ` [PATCH v5 23/32] drm/tilcdc: " Stephen Boyd
2022-01-06 21:45   ` Stephen Boyd
2022-01-06 21:45 ` [PATCH v5 24/32] drm/vc4: " Stephen Boyd
2022-01-06 21:45   ` Stephen Boyd
2022-01-06 21:45 ` [PATCH v5 25/32] iommu/mtk: " Stephen Boyd
2022-01-06 21:45   ` Stephen Boyd
2022-01-11 12:22   ` Yong Wu
2022-01-11 12:22     ` Yong Wu
2022-01-11 12:22     ` Yong Wu
2022-01-12  0:27     ` Stephen Boyd
2022-01-12  0:27       ` Stephen Boyd
2022-01-12  0:27       ` Stephen Boyd
2022-01-12  9:09       ` Yong Wu [this message]
2022-01-12  9:09         ` Yong Wu
2022-01-12  9:09         ` Yong Wu
2022-01-12  9:09         ` Yong Wu
2022-01-13  4:25         ` Stephen Boyd
2022-01-13  4:25           ` Stephen Boyd
2022-01-13  4:25           ` Stephen Boyd
2022-01-13  4:25           ` Stephen Boyd
2022-01-14  9:06           ` Yong Wu
2022-01-14  9:06             ` Yong Wu
2022-01-14  9:06             ` Yong Wu
2022-01-14  9:06             ` Yong Wu
2022-01-14 21:30             ` Stephen Boyd
2022-01-14 21:30               ` Stephen Boyd
2022-01-14 21:30               ` Stephen Boyd
2022-01-14 21:30               ` Stephen Boyd
2022-01-15  7:39               ` Yong Wu
2022-01-15  7:39                 ` Yong Wu
2022-01-15  7:39                 ` Yong Wu
2022-01-15  7:39                 ` Yong Wu
2022-01-15  7:50                 ` Stephen Boyd
2022-01-15  7:50                   ` Stephen Boyd
2022-01-15  7:50                   ` Stephen Boyd
2022-01-15  7:50                   ` Stephen Boyd
2022-01-06 21:45 ` [PATCH v5 26/32] mei: " Stephen Boyd
2022-01-06 21:45   ` Stephen Boyd
2022-01-06 21:45 ` [PATCH v5 27/32] power: supply: ab8500: " Stephen Boyd
2022-01-06 21:45   ` Stephen Boyd
2022-01-06 21:45 ` [PATCH v5 28/32] fbdev: omap2: " Stephen Boyd
2022-01-06 21:45   ` Stephen Boyd
2022-01-06 21:45 ` [PATCH v5 29/32] sound: hdac: " Stephen Boyd
2022-01-06 21:45   ` Stephen Boyd
2022-01-06 21:45 ` [PATCH v5 30/32] ASoC: codecs: wcd938x: " Stephen Boyd
2022-01-06 21:45   ` Stephen Boyd
2022-01-06 21:45 ` [PATCH v5 31/32] component: Get rid of drm_of_component_probe() Stephen Boyd
2022-01-06 21:45   ` Stephen Boyd
2022-01-06 21:45 ` [PATCH v5 32/32] component: Remove component_master_ops and friends Stephen Boyd
2022-01-06 21:45   ` Stephen Boyd

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ff81bc1fe1f1c2060fcf03ba14f1bef584c47599.camel@mediatek.com \
    --to=yong.wu@mediatek.com \
    --cc=daniel.vetter@ffwll.ch \
    --cc=dianders@chromium.org \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=freedreno@lists.freedesktop.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=iommu@lists.linux-foundation.org \
    --cc=joro@8bytes.org \
    --cc=krzysztof.kozlowski@canonical.com \
    --cc=linux-arm-msm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mediatek@lists.infradead.org \
    --cc=rafael@kernel.org \
    --cc=rmk+kernel@arm.linux.org.uk \
    --cc=robdclark@gmail.com \
    --cc=saravanak@google.com \
    --cc=swboyd@chromium.org \
    --cc=will@kernel.org \
    --cc=youlin.pei@mediatek.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.