From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E8883C433F5 for ; Wed, 12 Jan 2022 09:09:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1351947AbiALJJZ (ORCPT ); Wed, 12 Jan 2022 04:09:25 -0500 Received: from mailgw02.mediatek.com ([210.61.82.184]:35542 "EHLO mailgw02.mediatek.com" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1351945AbiALJJY (ORCPT ); Wed, 12 Jan 2022 04:09:24 -0500 X-UUID: c3dd01af70294ed4829a1750c41c591b-20220112 X-UUID: c3dd01af70294ed4829a1750c41c591b-20220112 Received: from mtkexhb02.mediatek.inc [(172.21.101.103)] by mailgw02.mediatek.com (envelope-from ) (Generic MTA with TLSv1.2 ECDHE-RSA-AES256-SHA384 256/256) with ESMTP id 1440893993; Wed, 12 Jan 2022 17:09:22 +0800 Received: from mtkcas10.mediatek.inc (172.21.101.39) by mtkmbs10n1.mediatek.inc (172.21.101.34) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384) id 15.2.792.15; Wed, 12 Jan 2022 17:09:20 +0800 Received: from mhfsdcap04 (10.17.3.154) by mtkcas10.mediatek.inc (172.21.101.73) with Microsoft SMTP Server id 15.0.1497.2 via Frontend Transport; Wed, 12 Jan 2022 17:09:19 +0800 Message-ID: Subject: Re: [PATCH v5 25/32] iommu/mtk: Migrate to aggregate driver From: Yong Wu To: Stephen Boyd CC: Krzysztof Kozlowski , "Greg Kroah-Hartman" , Douglas Anderson , , , , , Joerg Roedel , "Will Deacon" , Daniel Vetter , "Rafael J. 
Wysocki" , Rob Clark , Russell King , Saravana Kannan , , , Date: Wed, 12 Jan 2022 17:09:19 +0800 In-Reply-To: References: <20220106214556.2461363-1-swboyd@chromium.org> <20220106214556.2461363-26-swboyd@chromium.org> <1a3b368eb891ca55c33265397cffab0b9f128737.camel@mediatek.com> Content-Type: text/plain; charset="UTF-8" X-Mailer: Evolution 3.28.5-0ubuntu0.18.04.2 MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-MTK: N Precedence: bulk List-ID: X-Mailing-List: linux-arm-msm@vger.kernel.org

On Tue, 2022-01-11 at 16:27 -0800, Stephen Boyd wrote:
> Quoting Yong Wu (2022-01-11 04:22:23)
> > Hi Stephen,
> >
> > Thanks for helping update here.
> >
> > On Thu, 2022-01-06 at 13:45 -0800, Stephen Boyd wrote:
> > > Use an aggregate driver instead of component ops so that we can get
> > > proper driver probe ordering of the aggregate device with respect to
> > > all the component devices that make up the aggregate device.
> > >
> > > Cc: Yong Wu
> > > Cc: Joerg Roedel
> > > Cc: Will Deacon
> > > Cc: Daniel Vetter
> > > Cc: "Rafael J. Wysocki"
> > > Cc: Rob Clark
> > > Cc: Russell King
> > > Cc: Saravana Kannan
> > > Signed-off-by: Stephen Boyd
> >
> > When I test this on mt8195, which has two IOMMU HWs (calling
> > component_aggregate_register() twice), it aborts like this. What
> > should we do if we have two instances?
> >
>
> Thanks for testing it out. We can't register the struct driver more than
> once, but this driver calls component_aggregate_register() from the
> driver probe and there are two devices bound to the mtk-iommu driver, so
> we try to register it more than once. Sigh!
>
> I see a couple of options. One is to do a deep copy of the driver
> structure and change the driver name. Then it's a one-to-one
> relationship between device and driver. That's not very great because it
> leaves around junk, so it should probably be avoided.
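For illustration, the constraint described above (the same driver structure cannot be registered twice) and the reference-counting option proposed next can be modeled in userspace C. This is a hedged sketch, not the component.c code: fake_driver, fake_driver_register() and aggregate_register() are hypothetical names, -EBUSY is assumed as the duplicate-registration error, and a pthread mutex stands in for aggregate_mutex. It sets the count to 1 on the first successful registration instead of incrementing it from zero, because refcount_t reports an increment from 0 as a use-after-free. Unlike the patch, which uses refcount_dec_and_mutex_lock() so the lock is only taken for the final put, this model always takes the lock.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of refcounted driver registration.
 * None of these names are kernel APIs.
 */
struct fake_driver {
	const char *name;
	bool registered;	/* stands in for the driver core's own state */
	unsigned int count;	/* registration refcount (refcount_t in the patch) */
};

static pthread_mutex_t aggregate_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Like driver_register(): refuses the same struct twice (-EBUSY assumed). */
static int fake_driver_register(struct fake_driver *drv)
{
	if (drv->registered)
		return -EBUSY;
	drv->registered = true;
	return 0;
}

static void fake_driver_unregister(struct fake_driver *drv)
{
	drv->registered = false;
}

/* First caller registers the driver; later callers only take a reference. */
static int aggregate_register(struct fake_driver *drv)
{
	int ret = 0;

	pthread_mutex_lock(&aggregate_mutex);
	if (drv->count > 0) {
		drv->count++;		/* refcount_inc_not_zero() succeeded */
	} else {
		ret = fake_driver_register(drv);
		if (!ret)
			drv->count = 1;	/* refcount_set(&count, 1), not an increment from 0 */
	}
	pthread_mutex_unlock(&aggregate_mutex);
	return ret;
}

/* Drop a reference; the last caller actually unregisters the driver. */
static void aggregate_unregister(struct fake_driver *drv)
{
	pthread_mutex_lock(&aggregate_mutex);
	if (drv->count > 0 && --drv->count == 0)
		fake_driver_unregister(drv);
	pthread_mutex_unlock(&aggregate_mutex);
}
```

With two IOMMU instances, both aggregate_register() calls succeed, and the fake driver stays registered until the second aggregate_unregister() call drops the count to zero.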
>
> Another option is to reference count the driver registration calls when
> component_aggregate_register() is called multiple times. Then we would
> only register the driver once and keep it pinned until the last
> unregister call is made, but still remove devices that are created for
> the match table.
>
> Can you try the attached patch? It is based on the next version of this
> patch series so the include part of the patch may not apply cleanly.
>
> ---8<---
> diff --git a/drivers/base/component.c b/drivers/base/component.c
> index 64ad7478c67a..97f253a41bdf 100644
> --- a/drivers/base/component.c
> +++ b/drivers/base/component.c
> @@ -492,15 +492,30 @@ static struct aggregate_device *__aggregate_find(struct device *parent)
>  	return dev ? to_aggregate_device(dev) : NULL;
>  }
>
> +static DEFINE_MUTEX(aggregate_mutex);
> +
>  static int aggregate_driver_register(struct aggregate_driver *adrv)
>  {
> -	adrv->driver.bus = &aggregate_bus_type;
> -	return driver_register(&adrv->driver);
> +	int ret = 0;
> +
> +	mutex_lock(&aggregate_mutex);
> +	if (!refcount_inc_not_zero(&adrv->count)) {
> +		adrv->driver.bus = &aggregate_bus_type;
> +		ret = driver_register(&adrv->driver);
> +		if (!ret)
> +			refcount_inc(&adrv->count);

Should this be refcount_set(&adrv->count, 1)? Otherwise, it warns like this:

[    2.654526] ------------[ cut here ]------------
[    2.655558] refcount_t: addition on 0; use-after-free.
[    2.656219] WARNING: CPU: 7 PID: 74 at ../v5.16-rc1/kernel/mediatek/lib/refcount.c:25 refcount_warn_saturate+0x128/0x148
...
[    2.672227] Call trace:
[    2.672539]  refcount_warn_saturate+0x128/0x148
[    2.673118]  component_aggregate_register+0x388/0x390
[    2.673763]  mtk_iommu_probe+0x638/0x690
[    2.686467] ------------[ cut here ]------------
[    2.687049] refcount_t: saturated; leaking memory.
[    2.687666] WARNING: CPU: 5 PID: 74 at ../v5.16-rc1/kernel/mediatek/lib/refcount.c:19 refcount_warn_saturate+0xfc/0x148
[    2.703805] Call trace:
[    2.704117]  refcount_warn_saturate+0xfc/0x148
[    2.704685]  component_aggregate_register+0x1fc/0x390
[    2.705330]  mtk_iommu_probe+0x638/0x690

> +	}
> +	mutex_unlock(&aggregate_mutex);
> +
> +	return ret;
>  }
>
>  static void aggregate_driver_unregister(struct aggregate_driver *adrv)
>  {
> -	driver_unregister(&adrv->driver);
> +	if (refcount_dec_and_mutex_lock(&adrv->count, &aggregate_mutex)) {
> +		driver_unregister(&adrv->driver);
> +		mutex_unlock(&aggregate_mutex);
> +	}
>  }
>
>  static struct aggregate_device *aggregate_device_add(struct device *parent,
> diff --git a/include/linux/component.h b/include/linux/component.h
> index 53d81203c095..b061341938aa 100644
> --- a/include/linux/component.h
> +++ b/include/linux/component.h
> @@ -4,6 +4,7 @@
>
>  #include
>  #include
> +#include
>
>  struct aggregate_device;
>
> @@ -66,6 +67,7 @@ struct device *aggregate_device_parent(const struct aggregate_device *adev);
>
>  /**
>   * struct aggregate_driver - Aggregate driver (made up of other drivers)
> + * @count: driver registration refcount
>   * @driver: device driver
>   */
>  struct aggregate_driver {
> @@ -101,6 +103,7 @@ struct aggregate_driver {
>  	 */
>  	void (*shutdown)(struct aggregate_device *adev);
>
> +	refcount_t count;
>  	struct device_driver driver;
>  };

After this patch, the aggregate_driver flow looks ok. But our driver still aborts like this:

[    2.721316] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
...
[    2.731658] pc : mtk_smi_larb_config_port_gen2_general+0xa4/0x138
[    2.732434] lr : mtk_smi_larb_resume+0x54/0x98
...
[    2.742457] Call trace:
[    2.742768]  mtk_smi_larb_config_port_gen2_general+0xa4/0x138
[    2.743496]  pm_generic_runtime_resume+0x2c/0x48
[    2.744090]  __genpd_runtime_resume+0x30/0xa8
[    2.744648]  genpd_runtime_resume+0x94/0x2c8
[    2.745191]  __rpm_callback+0x44/0x150
[    2.745669]  rpm_callback+0x6c/0x78
[    2.746114]  rpm_resume+0x314/0x558
[    2.746559]  __pm_runtime_resume+0x3c/0x88
[    2.747080]  pm_runtime_get_suppliers+0x7c/0x110
[    2.747668]  __driver_probe_device+0x4c/0xe8
[    2.748212]  driver_probe_device+0x44/0x130
[    2.748745]  __device_attach_driver+0x98/0xd0
[    2.749300]  bus_for_each_drv+0x68/0xd0
[    2.749787]  __device_attach+0xec/0x148
[    2.750277]  device_attach+0x14/0x20
[    2.750733]  bus_rescan_devices_helper+0x50/0x90
[    2.751319]  bus_for_each_dev+0x7c/0xd8
[    2.751806]  bus_rescan_devices+0x20/0x30
[    2.752315]  __component_add+0x7c/0xa0
[    2.752795]  component_add+0x14/0x20
[    2.753253]  mtk_smi_larb_probe+0xe0/0x120

This is because the device's runtime_resume callback is called before the
bind operation (in our case, mtk_smi_larb_bind()). The issue doesn't
happen without this patchset. I'm not sure what the right sequence is.
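The ordering problem above reduces to a small rule: a resume callback that can run before bind must not touch state that bind provides. The sketch below is hypothetical userspace C; fake_larb, fake_bind(), fake_config_port() and fake_larb_resume() are invented names, and the mmu pointer merely stands in for larb->mmu. The guard in the resume path mirrors an "if (larb->mmu)" check; the NULL dereference in fake_config_port() models the crash in the oops, not the real register programming.

```c
/*
 * Hypothetical model of a larb whose runtime-resume callback can run
 * before bind. Names are illustrative, not the mtk-smi.c definitions.
 */
struct fake_larb {
	const unsigned int *mmu;	/* data handed over at bind time; NULL before */
	unsigned int applied_mask;	/* last value written by config_port */
	int config_calls;		/* how many times config_port ran */
};

/* Stand-in for config_port(): it dereferences mmu, so it needs a bound larb. */
static void fake_config_port(struct fake_larb *larb)
{
	larb->applied_mask = *larb->mmu;
	larb->config_calls++;
}

/* Stand-in for bind: publishes the IOMMU-provided data to the larb. */
static void fake_bind(struct fake_larb *larb, const unsigned int *mmu)
{
	larb->mmu = mmu;
}

/* Runtime-resume callback: skip port setup until the larb has been bound. */
static int fake_larb_resume(struct fake_larb *larb)
{
	if (larb->mmu)
		fake_config_port(larb);
	return 0;
}
```

Resuming before fake_bind() leaves config_calls at 0 instead of dereferencing NULL; a resume after binding applies the value.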
If this should be fixed in the MediaTek driver instead, the patch could be:

diff --git a/drivers/memory/mtk-smi.c b/drivers/memory/mtk-smi.c
index b883dcc0bbfa..288841555067 100644
--- a/drivers/memory/mtk-smi.c
+++ b/drivers/memory/mtk-smi.c
@@ -483,8 +483,9 @@ static int __maybe_unused mtk_smi_larb_resume(struct device *dev)
 	if (ret < 0)
 		return ret;
 
-	/* Configure the basic setting for this larb */
-	larb_gen->config_port(dev);
+	/* Configure the basic setting for this larb after it binds with iommu */
+	if (larb->mmu)
+		larb_gen->config_port(dev);
 
 	return 0;
 }

Another nitpick: the title should be "iommu/mediatek: xxxx".
[ 2.656219] WARNING: CPU: 7 PID: 74 at ../v5.16- rc1/kernel/mediatek/lib/refcount.c:25 refcount_warn_saturate+0x128/0x148 ... [ 2.672227] Call trace: [ 2.672539] refcount_warn_saturate+0x128/0x148 [ 2.673118] component_aggregate_register+0x388/0x390 [ 2.673763] mtk_iommu_probe+0x638/0x690 [ 2.686467] ------------[ cut here ]------------ [ 2.687049] refcount_t: saturated; leaking memory. [ 2.687666] WARNING: CPU: 5 PID: 74 at ../v5.16- rc1/kernel/mediatek/lib/refcount.c:19 refcount_warn_saturate+0xfc/0x148 [ 2.703805] Call trace: [ 2.704117] refcount_warn_saturate+0xfc/0x148 [ 2.704685] component_aggregate_register+0x1fc/0x390 [ 2.705330] mtk_iommu_probe+0x638/0x690 > + } > + mutex_unlock(&aggregate_mutex); > + > + return ret; > } > > static void aggregate_driver_unregister(struct aggregate_driver > *adrv) > { > - driver_unregister(&adrv->driver); > + if (refcount_dec_and_mutex_lock(&adrv->count, > &aggregate_mutex)) { > + driver_unregister(&adrv->driver); > + mutex_unlock(&aggregate_mutex); > + } > } > > static struct aggregate_device *aggregate_device_add(struct device > *parent, > diff --git a/include/linux/component.h b/include/linux/component.h > index 53d81203c095..b061341938aa 100644 > --- a/include/linux/component.h > +++ b/include/linux/component.h > @@ -4,6 +4,7 @@ > > #include > #include > +#include > > struct aggregate_device; > > @@ -66,6 +67,7 @@ struct device *aggregate_device_parent(const struct > aggregate_device *adev); > > /** > * struct aggregate_driver - Aggregate driver (made up of other > drivers) > + * @count: driver registration refcount > * @driver: device driver > */ > struct aggregate_driver { > @@ -101,6 +103,7 @@ struct aggregate_driver { > */ > void (*shutdown)(struct aggregate_device *adev); > > + refcount_t count; > struct device_driver driver; > }; After this patch, the aggregate_driver flow looks ok. 
But our driver still aborts like this: [ 2.721316] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 ... [ 2.731658] pc : mtk_smi_larb_config_port_gen2_general+0xa4/0x138 [ 2.732434] lr : mtk_smi_larb_resume+0x54/0x98 ... [ 2.742457] Call trace: [ 2.742768] mtk_smi_larb_config_port_gen2_general+0xa4/0x138 [ 2.743496] pm_generic_runtime_resume+0x2c/0x48 [ 2.744090] __genpd_runtime_resume+0x30/0xa8 [ 2.744648] genpd_runtime_resume+0x94/0x2c8 [ 2.745191] __rpm_callback+0x44/0x150 [ 2.745669] rpm_callback+0x6c/0x78 [ 2.746114] rpm_resume+0x314/0x558 [ 2.746559] __pm_runtime_resume+0x3c/0x88 [ 2.747080] pm_runtime_get_suppliers+0x7c/0x110 [ 2.747668] __driver_probe_device+0x4c/0xe8 [ 2.748212] driver_probe_device+0x44/0x130 [ 2.748745] __device_attach_driver+0x98/0xd0 [ 2.749300] bus_for_each_drv+0x68/0xd0 [ 2.749787] __device_attach+0xec/0x148 [ 2.750277] device_attach+0x14/0x20 [ 2.750733] bus_rescan_devices_helper+0x50/0x90 [ 2.751319] bus_for_each_dev+0x7c/0xd8 [ 2.751806] bus_rescan_devices+0x20/0x30 [ 2.752315] __component_add+0x7c/0xa0 [ 2.752795] component_add+0x14/0x20 [ 2.753253] mtk_smi_larb_probe+0xe0/0x120 This is because the device runtime_resume is called before the bind operation(In our case this detailed function is mtk_smi_larb_bind). The issue doesn't happen without this patchset. I'm not sure the right sequence. 
If we should fix in mediatek driver, the patch could be: diff --git a/drivers/memory/mtk-smi.c b/drivers/memory/mtk-smi.c index b883dcc0bbfa..288841555067 100644 --- a/drivers/memory/mtk-smi.c +++ b/drivers/memory/mtk-smi.c @@ -483,8 +483,9 @@ static int __maybe_unused mtk_smi_larb_resume(struct device *dev) if (ret < 0) return ret; - /* Configure the basic setting for this larb */ - larb_gen->config_port(dev); + /* Configure the basic setting for this larb after it binds with iommu */ + if (larb->mmu) + larb_gen->config_port(dev); return 0; } Another nitpick, the title should be: iommu/mediatek: xxxx _______________________________________________ Linux-mediatek mailing list Linux-mediatek@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-mediatek
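The larb->mmu guard in the mtk-smi.c patch above can be illustrated with a minimal user-space C sketch. It assumes, as the trace suggests, that the crash comes from config_port dereferencing larb->mmu during a runtime resume that runs before the bind callback has set it. Every name here (fake_larb, fake_config_port, fake_larb_bind, fake_larb_resume) is hypothetical and only loosely modeled on the driver; none of it is the real mtk-smi API.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the larb state (assumption, loosely modeled on
 * mtk-smi): "mmu" points at a port-enable mask that only becomes valid once
 * the larb has been bound to the IOMMU.
 */
struct fake_larb {
	const unsigned int *mmu;      /* NULL until fake_larb_bind() runs */
	unsigned int configured_mask; /* last mask written by config_port */
	int config_count;             /* number of config_port invocations */
};

/* Stand-in for larb_gen->config_port(): it reads *larb->mmu, so calling it
 * while mmu is still NULL would be a NULL pointer dereference. */
void fake_config_port(struct fake_larb *larb)
{
	larb->configured_mask = *larb->mmu;
	larb->config_count++;
}

/* Stand-in for the component bind callback: publishes the IOMMU's mask. */
void fake_larb_bind(struct fake_larb *larb, const unsigned int *mask)
{
	larb->mmu = mask;
}

/* Runtime-resume callback with the proposed guard: skip the port setup
 * until the larb has been bound instead of dereferencing a NULL mmu. */
int fake_larb_resume(struct fake_larb *larb)
{
	if (larb->mmu)
		fake_config_port(larb);
	return 0;
}
```

In this sketch a resume that runs before bind (as in the reported pm_runtime_get_suppliers() path) leaves the ports unconfigured rather than faulting, and the first resume after bind performs the setup. That relies on the larb being resumed again after it binds, which is an assumption about the call sequence, not something the trace itself shows.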