linux-kernel.vger.kernel.org archive mirror
* [PATCH v3 0/2] Fix two kernel warnings in glink driver
@ 2021-11-02 23:51 Sujit Kautkar
  2021-11-02 23:51 ` [PATCH v3 1/2] rpmsg: glink: Fix use-after-free in qcom_glink_rpdev_release() Sujit Kautkar
  2021-11-02 23:51 ` [PATCH v3 2/2] rpmsg: glink: Update cdev add/del API in rpmsg_ctrldev_release_device() Sujit Kautkar
  0 siblings, 2 replies; 8+ messages in thread
From: Sujit Kautkar @ 2021-11-02 23:51 UTC (permalink / raw)
  To: Andy Gross, Ohad Ben-Cohen
  Cc: Bjorn Andersson, Sibi Sankar, Matthias Kaehlcke, Stephen Boyd,
	Sujit Kautkar, linux-arm-msm, linux-kernel, linux-remoteproc

These changes address kernel warnings which show up after enabling a
debug kernel. The first one fixes a use-after-free warning and the second
fixes a warning by updating the cdev APIs.

Changes in v3:
- Clear ept pointers in patch 1
- Remove error check in patch 2

Changes in v2:
- Fix typo in commit message

Sujit Kautkar (2):
  rpmsg: glink: Fix use-after-free in qcom_glink_rpdev_release()
  rpmsg: glink: Update cdev add/del API in
    rpmsg_ctrldev_release_device()

 drivers/rpmsg/qcom_glink_native.c | 12 ++++++++++--
 drivers/rpmsg/rpmsg_char.c        | 10 ++--------
 2 files changed, 12 insertions(+), 10 deletions(-)

-- 
2.31.0



* [PATCH v3 1/2] rpmsg: glink: Fix use-after-free in qcom_glink_rpdev_release()
  2021-11-02 23:51 [PATCH v3 0/2] Fix two kernel warnings in glink driver Sujit Kautkar
@ 2021-11-02 23:51 ` Sujit Kautkar
  2021-11-03 16:34   ` Matthias Kaehlcke
  2021-11-02 23:51 ` [PATCH v3 2/2] rpmsg: glink: Update cdev add/del API in rpmsg_ctrldev_release_device() Sujit Kautkar
  1 sibling, 1 reply; 8+ messages in thread
From: Sujit Kautkar @ 2021-11-02 23:51 UTC (permalink / raw)
  To: Andy Gross, Ohad Ben-Cohen
  Cc: Bjorn Andersson, Sibi Sankar, Matthias Kaehlcke, Stephen Boyd,
	Sujit Kautkar, linux-arm-msm, linux-kernel, linux-remoteproc

qcom_glink_rpdev_release() sets channel->rpdev to NULL. However, on a
kernel with debug options enabled, qcom_glink_rpdev_release() is delayed
by the delayed kobject release, so the channel may already have been
released by the time it runs, which triggers the kernel warning below.
To avoid this use-after-free, clear the ept pointers during ept destroy
and channel release, and only access the channel in
qcom_glink_rpdev_release() when rpdev->ept is set.

| BUG: KASAN: use-after-free in qcom_glink_rpdev_release+0x54/0x70
| Write of size 8 at addr ffffffaba438e8d0 by task kworker/6:1/54
|
| CPU: 6 PID: 54 Comm: kworker/6:1 Not tainted 5.4.109-lockdep #16
| Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
| Workqueue: events kobject_delayed_cleanup
| Call trace:
|  dump_backtrace+0x0/0x284
|  show_stack+0x20/0x2c
|  dump_stack+0xd4/0x170
|  print_address_description+0x3c/0x4a8
|  __kasan_report+0x144/0x168
|  kasan_report+0x10/0x18
|  __asan_report_store8_noabort+0x1c/0x24
|  qcom_glink_rpdev_release+0x54/0x70
|  device_release+0x68/0x14c
|  kobject_delayed_cleanup+0x158/0x2cc
|  process_one_work+0x7cc/0x10a4
|  worker_thread+0x80c/0xcec
|  kthread+0x2a8/0x314
|  ret_from_fork+0x10/0x18

Signed-off-by: Sujit Kautkar <sujitka@chromium.org>
---
Changes in v3:
- Clear ept pointers and add extra conditions

Changes in v2:
- Fix typo in commit message

 drivers/rpmsg/qcom_glink_native.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/rpmsg/qcom_glink_native.c b/drivers/rpmsg/qcom_glink_native.c
index e1444fefdd1c0..0c64a6f7a4f09 100644
--- a/drivers/rpmsg/qcom_glink_native.c
+++ b/drivers/rpmsg/qcom_glink_native.c
@@ -269,6 +269,9 @@ static void qcom_glink_channel_release(struct kref *ref)
 	idr_destroy(&channel->riids);
 	spin_unlock_irqrestore(&channel->intent_lock, flags);
 
+	if (channel->rpdev)
+		channel->rpdev->ept = NULL;
+
 	kfree(channel->name);
 	kfree(channel);
 }
@@ -1214,6 +1217,8 @@ static void qcom_glink_destroy_ept(struct rpmsg_endpoint *ept)
 	channel->ept.cb = NULL;
 	spin_unlock_irqrestore(&channel->recv_lock, flags);
 
+	channel->rpdev->ept = NULL;
+
 	/* Decouple the potential rpdev from the channel */
 	channel->rpdev = NULL;
 
@@ -1371,9 +1376,12 @@ static const struct rpmsg_endpoint_ops glink_endpoint_ops = {
 static void qcom_glink_rpdev_release(struct device *dev)
 {
 	struct rpmsg_device *rpdev = to_rpmsg_device(dev);
-	struct glink_channel *channel = to_glink_channel(rpdev->ept);
+	struct glink_channel *channel = NULL;
 
-	channel->rpdev = NULL;
+	if (rpdev->ept) {
+		channel = to_glink_channel(rpdev->ept);
+		channel->rpdev = NULL;
+	}
 	kfree(rpdev);
 }
 
-- 
2.31.0



* [PATCH v3 2/2] rpmsg: glink: Update cdev add/del API in rpmsg_ctrldev_release_device()
  2021-11-02 23:51 [PATCH v3 0/2] Fix two kernel warnings in glink driver Sujit Kautkar
  2021-11-02 23:51 ` [PATCH v3 1/2] rpmsg: glink: Fix use-after-free in qcom_glink_rpdev_release() Sujit Kautkar
@ 2021-11-02 23:51 ` Sujit Kautkar
  2021-11-03 17:16   ` Matthias Kaehlcke
                     ` (2 more replies)
  1 sibling, 3 replies; 8+ messages in thread
From: Sujit Kautkar @ 2021-11-02 23:51 UTC (permalink / raw)
  To: Andy Gross, Ohad Ben-Cohen
  Cc: Bjorn Andersson, Sibi Sankar, Matthias Kaehlcke, Stephen Boyd,
	Sujit Kautkar, linux-kernel, linux-remoteproc

Replace the cdev add/del APIs with cdev_device_add()/cdev_device_del() to
avoid the kernel warning below. This correctly takes a reference to the
parent device so the parent will not get released until all references to
the cdev are released.

| ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x7c
| WARNING: CPU: 7 PID: 19892 at lib/debugobjects.c:488 debug_print_object+0x13c/0x1b0
| CPU: 7 PID: 19892 Comm: kworker/7:4 Tainted: G        W         5.4.147-lockdep #1
| ==================================================================
| Hardware name: Google CoachZ (rev1 - 2) with LTE (DT)
| Workqueue: events kobject_delayed_cleanup
| pstate: 60c00009 (nZCv daif +PAN +UAO)
| pc : debug_print_object+0x13c/0x1b0
| lr : debug_print_object+0x13c/0x1b0
| sp : ffffff83b2ec7970
| x29: ffffff83b2ec7970 x28: dfffffd000000000
| x27: ffffff83d674f000 x26: dfffffd000000000
| x25: ffffffd06b8fa660 x24: dfffffd000000000
| x23: 0000000000000000 x22: ffffffd06b7c5108
| x21: ffffffd06d597860 x20: ffffffd06e2c21c0
| x19: ffffffd06d5974c0 x18: 000000000001dad8
| x17: 0000000000000000 x16: dfffffd000000000
| BUG: KASAN: use-after-free in qcom_glink_rpdev_release+0x54/0x70
| x15: ffffffffffffffff x14: 79616c6564203a74
| x13: 0000000000000000 x12: 0000000000000080
| Write of size 8 at addr ffffff83d95768d0 by task kworker/3:1/150
| x11: 0000000000000001 x10: 0000000000000000
| x9 : fc9e8edec0ad0300 x8 : fc9e8edec0ad0300
|
| x7 : 0000000000000000 x6 : 0000000000000000
| x5 : 0000000000000080 x4 : 0000000000000000
| CPU: 3 PID: 150 Comm: kworker/3:1 Tainted: G        W         5.4.147-lockdep #1
| x3 : ffffffd06c149574 x2 : ffffff83f77f7498
| x1 : ffffffd06d596f60 x0 : 0000000000000061
| Hardware name: Google CoachZ (rev1 - 2) with LTE (DT)
| Call trace:
|  debug_print_object+0x13c/0x1b0
| Workqueue: events kobject_delayed_cleanup
|  __debug_check_no_obj_freed+0x25c/0x3c0
|  debug_check_no_obj_freed+0x18/0x20
| Call trace:
|  slab_free_freelist_hook+0xb4/0x1bc
|  kfree+0xe8/0x2d8
|  dump_backtrace+0x0/0x27c
|  rpmsg_ctrldev_release_device+0x78/0xb8
|  device_release+0x68/0x14c
|  show_stack+0x20/0x2c
|  kobject_cleanup+0x12c/0x298
|  kobject_delayed_cleanup+0x10/0x18
|  dump_stack+0xe0/0x19c
|  process_one_work+0x578/0x92c
|  worker_thread+0x804/0xcf8
|  print_address_description+0x3c/0x4a8
|  kthread+0x2a8/0x314
|  ret_from_fork+0x10/0x18
|  __kasan_report+0x100/0x124

Signed-off-by: Sujit Kautkar <sujitka@chromium.org>
---
Changes in v3:
- Remove unnecessary error check as per Matthias's comment

Changes in v2:
- Fix typo in commit message

 drivers/rpmsg/rpmsg_char.c | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/rpmsg/rpmsg_char.c b/drivers/rpmsg/rpmsg_char.c
index 876ce43df732b..a6a33155ca859 100644
--- a/drivers/rpmsg/rpmsg_char.c
+++ b/drivers/rpmsg/rpmsg_char.c
@@ -458,7 +458,7 @@ static void rpmsg_ctrldev_release_device(struct device *dev)
 
 	ida_simple_remove(&rpmsg_ctrl_ida, dev->id);
 	ida_simple_remove(&rpmsg_minor_ida, MINOR(dev->devt));
-	cdev_del(&ctrldev->cdev);
+	cdev_device_del(&ctrldev->cdev, &ctrldev->dev);
 	kfree(ctrldev);
 }
 
@@ -493,19 +493,13 @@ static int rpmsg_chrdev_probe(struct rpmsg_device *rpdev)
 	dev->id = ret;
 	dev_set_name(&ctrldev->dev, "rpmsg_ctrl%d", ret);
 
-	ret = cdev_add(&ctrldev->cdev, dev->devt, 1);
+	ret = cdev_device_add(&ctrldev->cdev, &ctrldev->dev);
 	if (ret)
 		goto free_ctrl_ida;
 
 	/* We can now rely on the release function for cleanup */
 	dev->release = rpmsg_ctrldev_release_device;
 
-	ret = device_add(dev);
-	if (ret) {
-		dev_err(&rpdev->dev, "device_add failed: %d\n", ret);
-		put_device(dev);
-	}
-
 	dev_set_drvdata(&rpdev->dev, ctrldev);
 
 	return ret;
-- 
2.31.0



* Re: [PATCH v3 1/2] rpmsg: glink: Fix use-after-free in qcom_glink_rpdev_release()
  2021-11-02 23:51 ` [PATCH v3 1/2] rpmsg: glink: Fix use-after-free in qcom_glink_rpdev_release() Sujit Kautkar
@ 2021-11-03 16:34   ` Matthias Kaehlcke
  0 siblings, 0 replies; 8+ messages in thread
From: Matthias Kaehlcke @ 2021-11-03 16:34 UTC (permalink / raw)
  To: Sujit Kautkar
  Cc: Andy Gross, Ohad Ben-Cohen, Bjorn Andersson, Sibi Sankar,
	Stephen Boyd, linux-arm-msm, linux-kernel, linux-remoteproc

Hi Sujit,

On Tue, Nov 02, 2021 at 04:51:49PM -0700, Sujit Kautkar wrote:
> qcom_glink_rpdev_release() sets channel->rpdev to NULL. However, with
> debug enabled kernel, qcom_glink_rpdev_release() gets delayed due to
> delayed kobject release and channel gets released by that time and
> triggers below kernel warning. To avoid this use-after-free, clear ept
> pointers during ept destroy and channel release and add a new condition
> in qcom_glink_rpdev_release() to access channel
> 
> | BUG: KASAN: use-after-free in qcom_glink_rpdev_release+0x54/0x70
> | Write of size 8 at addr ffffffaba438e8d0 by task kworker/6:1/54
> |
> | CPU: 6 PID: 54 Comm: kworker/6:1 Not tainted 5.4.109-lockdep #16
> | Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
> | Workqueue: events kobject_delayed_cleanup
> | Call trace:
> |  dump_backtrace+0x0/0x284
> |  show_stack+0x20/0x2c
> |  dump_stack+0xd4/0x170
> |  print_address_description+0x3c/0x4a8
> |  __kasan_report+0x144/0x168
> |  kasan_report+0x10/0x18
> |  __asan_report_store8_noabort+0x1c/0x24
> |  qcom_glink_rpdev_release+0x54/0x70
> |  device_release+0x68/0x14c
> |  kobject_delayed_cleanup+0x158/0x2cc
> |  process_one_work+0x7cc/0x10a4
> |  worker_thread+0x80c/0xcec
> |  kthread+0x2a8/0x314
> |  ret_from_fork+0x10/0x18
> 
> Signed-off-by: Sujit Kautkar <sujitka@chromium.org>
> ---
> Changes in v3:
> - Clear ept pointers and add extra conditions
> 
> Changes in v2:
> - Fix typo in commit message
> 
>  drivers/rpmsg/qcom_glink_native.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/rpmsg/qcom_glink_native.c b/drivers/rpmsg/qcom_glink_native.c
> index e1444fefdd1c0..0c64a6f7a4f09 100644
> --- a/drivers/rpmsg/qcom_glink_native.c
> +++ b/drivers/rpmsg/qcom_glink_native.c
> @@ -269,6 +269,9 @@ static void qcom_glink_channel_release(struct kref *ref)
>  	idr_destroy(&channel->riids);
>  	spin_unlock_irqrestore(&channel->intent_lock, flags);
>  
> +	if (channel->rpdev)
> +		channel->rpdev->ept = NULL;
> +
>  	kfree(channel->name);
>  	kfree(channel);
>  }
> @@ -1214,6 +1217,8 @@ static void qcom_glink_destroy_ept(struct rpmsg_endpoint *ept)
>  	channel->ept.cb = NULL;
>  	spin_unlock_irqrestore(&channel->recv_lock, flags);
>  
> +	channel->rpdev->ept = NULL;
> +
>  	/* Decouple the potential rpdev from the channel */
>  	channel->rpdev = NULL;
>  
> @@ -1371,9 +1376,12 @@ static const struct rpmsg_endpoint_ops glink_endpoint_ops = {
>  static void qcom_glink_rpdev_release(struct device *dev)
>  {
>  	struct rpmsg_device *rpdev = to_rpmsg_device(dev);
> -	struct glink_channel *channel = to_glink_channel(rpdev->ept);
> +	struct glink_channel *channel = NULL;

No need to initialize the pointer; it is assigned in the path that uses it.

>  
> -	channel->rpdev = NULL;
> +	if (rpdev->ept) {
> +		channel = to_glink_channel(rpdev->ept);
> +		channel->rpdev = NULL;
> +	}
>  	kfree(rpdev);
>  }

Looks like this is already fixed in -next by:

commit 343ba27b6f9d473ec3e602cc648300eb03a7fa05
Author: Chris Lew <clew@codeaurora.org>
Date:   Thu Jul 30 10:48:15 2020 +0530

    rpmsg: glink: Remove channel decouple from rpdev release

    If a channel is being rapidly restarting and the kobj release worker is
    busy, there is a chance the rpdev_release function will run after the
    channel struct itself has been released.

    There should not be a need to decouple the channel from rpdev in the
    rpdev release since that should only happen from the close commands.

    Signed-off-by: Chris Lew <clew@codeaurora.org>
    Signed-off-by: Deepak Kumar Singh <deesin@codeaurora.org>
    Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
    Link: https://lore.kernel.org/r/1596086296-28529-6-git-send-email-deesin@codeaurora.org
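
The ordering problem discussed above can be modelled in plain userspace C.
This is only a rough sketch under stated assumptions: struct model_rpdev,
struct model_channel and the model_*() helpers are hypothetical stand-ins,
not the kernel structures or functions, and the back-pointer is simplified
(in the driver the endpoint is embedded in the channel). It shows the
approach of the v3 patch, where the channel release clears the rpdev's
pointer so that a later, delayed rpdev release does not touch freed memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins; these are not the kernel structures. */
struct model_channel;

struct model_rpdev {
	struct model_channel *ept;	/* back-pointer, cleared when the channel dies */
};

struct model_channel {
	struct model_rpdev *rpdev;
};

/* Analogue of the channel release: detach from the rpdev before freeing. */
static void model_channel_release(struct model_channel *ch)
{
	if (ch->rpdev)
		ch->rpdev->ept = NULL;
	free(ch);
}

/* Analogue of the delayed rpdev release; returns true if it touched the channel. */
static bool model_rpdev_release(struct model_rpdev *rpdev)
{
	bool touched = false;

	if (rpdev->ept) {
		rpdev->ept->rpdev = NULL;
		touched = true;
	}
	free(rpdev);
	return touched;
}

/* Release the channel first, then the rpdev, as a delayed kobject release would. */
int model_delayed_release(void)
{
	struct model_rpdev *rpdev = calloc(1, sizeof(*rpdev));
	struct model_channel *ch = calloc(1, sizeof(*ch));

	if (!rpdev || !ch) {
		free(rpdev);
		free(ch);
		return -1;
	}
	rpdev->ept = ch;
	ch->rpdev = rpdev;

	model_channel_release(ch);
	return model_rpdev_release(rpdev) ? 1 : 0;
}
```

In this model model_delayed_release() returns 0 when the late release skips
the already-freed channel. The commit referenced above takes the simpler
route of not touching the channel in the rpdev release at all.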


* Re: [PATCH v3 2/2] rpmsg: glink: Update cdev add/del API in rpmsg_ctrldev_release_device()
  2021-11-02 23:51 ` [PATCH v3 2/2] rpmsg: glink: Update cdev add/del API in rpmsg_ctrldev_release_device() Sujit Kautkar
@ 2021-11-03 17:16   ` Matthias Kaehlcke
  2021-11-17 18:59   ` Stephen Boyd
  2021-11-17 23:29   ` Bjorn Andersson
  2 siblings, 0 replies; 8+ messages in thread
From: Matthias Kaehlcke @ 2021-11-03 17:16 UTC (permalink / raw)
  To: Sujit Kautkar
  Cc: Andy Gross, Ohad Ben-Cohen, Bjorn Andersson, Sibi Sankar,
	Stephen Boyd, linux-kernel, linux-remoteproc

On Tue, Nov 02, 2021 at 04:51:51PM -0700, Sujit Kautkar wrote:
> Replace cdev add/del APIs with cdev_device_add/cdev_device_del to avoid
> below kernel warning. This correctly takes a reference to the parent
> device so the parent will not get released until all references to the
> cdev are released.
> 
> | ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x7c
> | WARNING: CPU: 7 PID: 19892 at lib/debugobjects.c:488 debug_print_object+0x13c/0x1b0
> | CPU: 7 PID: 19892 Comm: kworker/7:4 Tainted: G        W         5.4.147-lockdep #1
> | ==================================================================
> | Hardware name: Google CoachZ (rev1 - 2) with LTE (DT)
> | Workqueue: events kobject_delayed_cleanup
> | pstate: 60c00009 (nZCv daif +PAN +UAO)
> | pc : debug_print_object+0x13c/0x1b0
> | lr : debug_print_object+0x13c/0x1b0
> | sp : ffffff83b2ec7970
> | x29: ffffff83b2ec7970 x28: dfffffd000000000
> | x27: ffffff83d674f000 x26: dfffffd000000000
> | x25: ffffffd06b8fa660 x24: dfffffd000000000
> | x23: 0000000000000000 x22: ffffffd06b7c5108
> | x21: ffffffd06d597860 x20: ffffffd06e2c21c0
> | x19: ffffffd06d5974c0 x18: 000000000001dad8
> | x17: 0000000000000000 x16: dfffffd000000000
> | BUG: KASAN: use-after-free in qcom_glink_rpdev_release+0x54/0x70
> | x15: ffffffffffffffff x14: 79616c6564203a74
> | x13: 0000000000000000 x12: 0000000000000080
> | Write of size 8 at addr ffffff83d95768d0 by task kworker/3:1/150
> | x11: 0000000000000001 x10: 0000000000000000
> | x9 : fc9e8edec0ad0300 x8 : fc9e8edec0ad0300
> |
> | x7 : 0000000000000000 x6 : 0000000000000000
> | x5 : 0000000000000080 x4 : 0000000000000000
> | CPU: 3 PID: 150 Comm: kworker/3:1 Tainted: G        W         5.4.147-lockdep #1
> | x3 : ffffffd06c149574 x2 : ffffff83f77f7498
> | x1 : ffffffd06d596f60 x0 : 0000000000000061
> | Hardware name: Google CoachZ (rev1 - 2) with LTE (DT)
> | Call trace:
> |  debug_print_object+0x13c/0x1b0
> | Workqueue: events kobject_delayed_cleanup
> |  __debug_check_no_obj_freed+0x25c/0x3c0
> |  debug_check_no_obj_freed+0x18/0x20
> | Call trace:
> |  slab_free_freelist_hook+0xb4/0x1bc
> |  kfree+0xe8/0x2d8
> |  dump_backtrace+0x0/0x27c
> |  rpmsg_ctrldev_release_device+0x78/0xb8
> |  device_release+0x68/0x14c
> |  show_stack+0x20/0x2c
> |  kobject_cleanup+0x12c/0x298
> |  kobject_delayed_cleanup+0x10/0x18
> |  dump_stack+0xe0/0x19c
> |  process_one_work+0x578/0x92c
> |  worker_thread+0x804/0xcf8
> |  print_address_description+0x3c/0x4a8
> |  kthread+0x2a8/0x314
> |  ret_from_fork+0x10/0x18
> |  __kasan_report+0x100/0x124
> 
> Signed-off-by: Sujit Kautkar <sujitka@chromium.org>

Reviewed-by: Matthias Kaehlcke <mka@chromium.org>


* Re: [PATCH v3 2/2] rpmsg: glink: Update cdev add/del API in rpmsg_ctrldev_release_device()
  2021-11-02 23:51 ` [PATCH v3 2/2] rpmsg: glink: Update cdev add/del API in rpmsg_ctrldev_release_device() Sujit Kautkar
  2021-11-03 17:16   ` Matthias Kaehlcke
@ 2021-11-17 18:59   ` Stephen Boyd
  2021-11-17 23:29   ` Bjorn Andersson
  2 siblings, 0 replies; 8+ messages in thread
From: Stephen Boyd @ 2021-11-17 18:59 UTC (permalink / raw)
  To: Andy Gross, Ohad Ben-Cohen, Sujit Kautkar
  Cc: Bjorn Andersson, Sibi Sankar, Matthias Kaehlcke, linux-kernel,
	linux-remoteproc

The subject is a little confusing. Maybe it should be "Use
cdev_device_{add,del}() instead of open coding".

Quoting Sujit Kautkar (2021-11-02 16:51:51)
> Replace cdev add/del APIs with cdev_device_add/cdev_device_del to avoid
> below kernel warning. This correctly takes a reference to the parent
> device so the parent will not get released until all references to the
> cdev are released.
>
> | ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x7c
> | WARNING: CPU: 7 PID: 19892 at lib/debugobjects.c:488 debug_print_object+0x13c/0x1b0
> | CPU: 7 PID: 19892 Comm: kworker/7:4 Tainted: G        W         5.4.147-lockdep #1
> | ==================================================================
> | Hardware name: Google CoachZ (rev1 - 2) with LTE (DT)
> | Workqueue: events kobject_delayed_cleanup
> | pstate: 60c00009 (nZCv daif +PAN +UAO)
> | pc : debug_print_object+0x13c/0x1b0
> | lr : debug_print_object+0x13c/0x1b0
> | sp : ffffff83b2ec7970
> | x29: ffffff83b2ec7970 x28: dfffffd000000000
> | x27: ffffff83d674f000 x26: dfffffd000000000
> | x25: ffffffd06b8fa660 x24: dfffffd000000000
> | x23: 0000000000000000 x22: ffffffd06b7c5108
> | x21: ffffffd06d597860 x20: ffffffd06e2c21c0
> | x19: ffffffd06d5974c0 x18: 000000000001dad8
> | x17: 0000000000000000 x16: dfffffd000000000
> | BUG: KASAN: use-after-free in qcom_glink_rpdev_release+0x54/0x70
> | x15: ffffffffffffffff x14: 79616c6564203a74
> | x13: 0000000000000000 x12: 0000000000000080
> | Write of size 8 at addr ffffff83d95768d0 by task kworker/3:1/150
> | x11: 0000000000000001 x10: 0000000000000000
> | x9 : fc9e8edec0ad0300 x8 : fc9e8edec0ad0300
> |
> | x7 : 0000000000000000 x6 : 0000000000000000
> | x5 : 0000000000000080 x4 : 0000000000000000
> | CPU: 3 PID: 150 Comm: kworker/3:1 Tainted: G        W         5.4.147-lockdep #1
> | x3 : ffffffd06c149574 x2 : ffffff83f77f7498
> | x1 : ffffffd06d596f60 x0 : 0000000000000061
> | Hardware name: Google CoachZ (rev1 - 2) with LTE (DT)
> | Call trace:
> |  debug_print_object+0x13c/0x1b0
> | Workqueue: events kobject_delayed_cleanup
> |  __debug_check_no_obj_freed+0x25c/0x3c0
> |  debug_check_no_obj_freed+0x18/0x20
> | Call trace:
> |  slab_free_freelist_hook+0xb4/0x1bc
> |  kfree+0xe8/0x2d8
> |  dump_backtrace+0x0/0x27c
> |  rpmsg_ctrldev_release_device+0x78/0xb8
> |  device_release+0x68/0x14c
> |  show_stack+0x20/0x2c
> |  kobject_cleanup+0x12c/0x298
> |  kobject_delayed_cleanup+0x10/0x18
> |  dump_stack+0xe0/0x19c
> |  process_one_work+0x578/0x92c
> |  worker_thread+0x804/0xcf8
> |  print_address_description+0x3c/0x4a8
> |  kthread+0x2a8/0x314
> |  ret_from_fork+0x10/0x18
> |  __kasan_report+0x100/0x124
>
> Signed-off-by: Sujit Kautkar <sujitka@chromium.org>
> ---

Reviewed-by: Stephen Boyd <swboyd@chromium.org>


* Re: [PATCH v3 2/2] rpmsg: glink: Update cdev add/del API in rpmsg_ctrldev_release_device()
  2021-11-02 23:51 ` [PATCH v3 2/2] rpmsg: glink: Update cdev add/del API in rpmsg_ctrldev_release_device() Sujit Kautkar
  2021-11-03 17:16   ` Matthias Kaehlcke
  2021-11-17 18:59   ` Stephen Boyd
@ 2021-11-17 23:29   ` Bjorn Andersson
  2021-12-07  0:15     ` Matthias Kaehlcke
  2 siblings, 1 reply; 8+ messages in thread
From: Bjorn Andersson @ 2021-11-17 23:29 UTC (permalink / raw)
  To: Sujit Kautkar
  Cc: Andy Gross, Ohad Ben-Cohen, Sibi Sankar, Matthias Kaehlcke,
	Stephen Boyd, linux-kernel, linux-remoteproc

On Tue 02 Nov 18:51 CDT 2021, Sujit Kautkar wrote:

I like Stephen's suggestion about modifying the $subject.
Also note that the change isn't in the glink driver, so prefix should
reflect that:

$ git log --oneline --no-decorate -- drivers/rpmsg/rpmsg_char.c
f998d48f9b3c rpmsg: glink: Update cdev add/del API in rpmsg_ctrldev_release_device()
bc774a3887cb rpmsg: char: Remove useless include
964e8bedd5a1 rpmsg: char: Return an error if device already open
...

> Replace cdev add/del APIs with cdev_device_add/cdev_device_del to avoid
> below kernel warning. This correctly takes a reference to the parent
> device so the parent will not get released until all references to the
> cdev are released.
> 
> | ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x7c
> | WARNING: CPU: 7 PID: 19892 at lib/debugobjects.c:488 debug_print_object+0x13c/0x1b0
> | CPU: 7 PID: 19892 Comm: kworker/7:4 Tainted: G        W         5.4.147-lockdep #1
> | ==================================================================
> | Hardware name: Google CoachZ (rev1 - 2) with LTE (DT)
> | Workqueue: events kobject_delayed_cleanup
> | pstate: 60c00009 (nZCv daif +PAN +UAO)
> | pc : debug_print_object+0x13c/0x1b0
> | lr : debug_print_object+0x13c/0x1b0
> | sp : ffffff83b2ec7970
> | x29: ffffff83b2ec7970 x28: dfffffd000000000
> | x27: ffffff83d674f000 x26: dfffffd000000000
> | x25: ffffffd06b8fa660 x24: dfffffd000000000
> | x23: 0000000000000000 x22: ffffffd06b7c5108
> | x21: ffffffd06d597860 x20: ffffffd06e2c21c0
> | x19: ffffffd06d5974c0 x18: 000000000001dad8
> | x17: 0000000000000000 x16: dfffffd000000000
> | BUG: KASAN: use-after-free in qcom_glink_rpdev_release+0x54/0x70
> | x15: ffffffffffffffff x14: 79616c6564203a74
> | x13: 0000000000000000 x12: 0000000000000080
> | Write of size 8 at addr ffffff83d95768d0 by task kworker/3:1/150
> | x11: 0000000000000001 x10: 0000000000000000
> | x9 : fc9e8edec0ad0300 x8 : fc9e8edec0ad0300
> |
> | x7 : 0000000000000000 x6 : 0000000000000000
> | x5 : 0000000000000080 x4 : 0000000000000000
> | CPU: 3 PID: 150 Comm: kworker/3:1 Tainted: G        W         5.4.147-lockdep #1
> | x3 : ffffffd06c149574 x2 : ffffff83f77f7498
> | x1 : ffffffd06d596f60 x0 : 0000000000000061
> | Hardware name: Google CoachZ (rev1 - 2) with LTE (DT)
> | Call trace:
> |  debug_print_object+0x13c/0x1b0
> | Workqueue: events kobject_delayed_cleanup
> |  __debug_check_no_obj_freed+0x25c/0x3c0
> |  debug_check_no_obj_freed+0x18/0x20
> | Call trace:
> |  slab_free_freelist_hook+0xb4/0x1bc
> |  kfree+0xe8/0x2d8
> |  dump_backtrace+0x0/0x27c

Why is dump_backtrace() in the call stack here, in between
rpmsg_ctrldev_release_device() and kfree()? Isn't the error that we're
calling kfree() on a chunk of memory that contains a live object?

> |  rpmsg_ctrldev_release_device+0x78/0xb8
> |  device_release+0x68/0x14c
> |  show_stack+0x20/0x2c
> |  kobject_cleanup+0x12c/0x298
> |  kobject_delayed_cleanup+0x10/0x18
> |  dump_stack+0xe0/0x19c
> |  process_one_work+0x578/0x92c
> |  worker_thread+0x804/0xcf8
> |  print_address_description+0x3c/0x4a8
> |  kthread+0x2a8/0x314
> |  ret_from_fork+0x10/0x18
> |  __kasan_report+0x100/0x124
> 
> Signed-off-by: Sujit Kautkar <sujitka@chromium.org>
> ---
> Changes in v3:
> - Remove unnecessary error check as per Matthias's comment
> 
> Changes in v2:
> - Fix typo in commit message
> 
>  drivers/rpmsg/rpmsg_char.c | 10 ++--------
>  1 file changed, 2 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/rpmsg/rpmsg_char.c b/drivers/rpmsg/rpmsg_char.c
> index 876ce43df732b..a6a33155ca859 100644
> --- a/drivers/rpmsg/rpmsg_char.c
> +++ b/drivers/rpmsg/rpmsg_char.c
> @@ -458,7 +458,7 @@ static void rpmsg_ctrldev_release_device(struct device *dev)
>  
>  	ida_simple_remove(&rpmsg_ctrl_ida, dev->id);
>  	ida_simple_remove(&rpmsg_minor_ida, MINOR(dev->devt));
> -	cdev_del(&ctrldev->cdev);
> +	cdev_device_del(&ctrldev->cdev, &ctrldev->dev);

I am not able to find any other instance where cdev_device_del() is
called from the device's release function itself, which tells me that
this probably is not the right thing to do. Instead, the appropriate way
seems to be to put the cdev_device_del() in rpmsg_chrdev_remove().


That said, we already do device_del() in rpmsg_chrdev_remove() so if the
warning is trying to tell us that ctrldev->dev is not deleted I think we
have an unbalanced put_device()?

Regards,
Bjorn

>  	kfree(ctrldev);
>  }
>  
> @@ -493,19 +493,13 @@ static int rpmsg_chrdev_probe(struct rpmsg_device *rpdev)
>  	dev->id = ret;
>  	dev_set_name(&ctrldev->dev, "rpmsg_ctrl%d", ret);
>  
> -	ret = cdev_add(&ctrldev->cdev, dev->devt, 1);
> +	ret = cdev_device_add(&ctrldev->cdev, &ctrldev->dev);
>  	if (ret)
>  		goto free_ctrl_ida;
>  
>  	/* We can now rely on the release function for cleanup */
>  	dev->release = rpmsg_ctrldev_release_device;
>  
> -	ret = device_add(dev);
> -	if (ret) {
> -		dev_err(&rpdev->dev, "device_add failed: %d\n", ret);
> -		put_device(dev);
> -	}
> -
>  	dev_set_drvdata(&rpdev->dev, ctrldev);
>  
>  	return ret;
> -- 
> 2.31.0
> 


* Re: [PATCH v3 2/2] rpmsg: glink: Update cdev add/del API in rpmsg_ctrldev_release_device()
  2021-11-17 23:29   ` Bjorn Andersson
@ 2021-12-07  0:15     ` Matthias Kaehlcke
  0 siblings, 0 replies; 8+ messages in thread
From: Matthias Kaehlcke @ 2021-12-07  0:15 UTC (permalink / raw)
  To: Bjorn Andersson
  Cc: Sujit Kautkar, Andy Gross, Ohad Ben-Cohen, Sibi Sankar,
	Stephen Boyd, linux-kernel, linux-remoteproc

On Wed, Nov 17, 2021 at 05:29:07PM -0600, Bjorn Andersson wrote:
> On Tue 02 Nov 18:51 CDT 2021, Sujit Kautkar wrote:
> 
> I like Stephen's suggestion about modifying the $subject.
> Also note that the change isn't in the glink driver, so prefix should
> reflect that:
> 
> $ git log --oneline --no-decorate -- drivers/rpmsg/rpmsg_char.c
> f998d48f9b3c rpmsg: glink: Update cdev add/del API in rpmsg_ctrldev_release_device()
> bc774a3887cb rpmsg: char: Remove useless include
> 964e8bedd5a1 rpmsg: char: Return an error if device already open
> ...
> 
> > Replace cdev add/del APIs with cdev_device_add/cdev_device_del to avoid
> > below kernel warning. This correctly takes a reference to the parent
> > device so the parent will not get released until all references to the
> > cdev are released.
> > 
> > | ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x7c
> > | WARNING: CPU: 7 PID: 19892 at lib/debugobjects.c:488 debug_print_object+0x13c/0x1b0
> > | CPU: 7 PID: 19892 Comm: kworker/7:4 Tainted: G        W         5.4.147-lockdep #1
> > | ==================================================================
> > | Hardware name: Google CoachZ (rev1 - 2) with LTE (DT)
> > | Workqueue: events kobject_delayed_cleanup
> > | pstate: 60c00009 (nZCv daif +PAN +UAO)
> > | pc : debug_print_object+0x13c/0x1b0
> > | lr : debug_print_object+0x13c/0x1b0
> > | sp : ffffff83b2ec7970
> > | x29: ffffff83b2ec7970 x28: dfffffd000000000
> > | x27: ffffff83d674f000 x26: dfffffd000000000
> > | x25: ffffffd06b8fa660 x24: dfffffd000000000
> > | x23: 0000000000000000 x22: ffffffd06b7c5108
> > | x21: ffffffd06d597860 x20: ffffffd06e2c21c0
> > | x19: ffffffd06d5974c0 x18: 000000000001dad8
> > | x17: 0000000000000000 x16: dfffffd000000000
> > | BUG: KASAN: use-after-free in qcom_glink_rpdev_release+0x54/0x70
> > | x15: ffffffffffffffff x14: 79616c6564203a74
> > | x13: 0000000000000000 x12: 0000000000000080
> > | Write of size 8 at addr ffffff83d95768d0 by task kworker/3:1/150
> > | x11: 0000000000000001 x10: 0000000000000000
> > | x9 : fc9e8edec0ad0300 x8 : fc9e8edec0ad0300
> > |
> > | x7 : 0000000000000000 x6 : 0000000000000000
> > | x5 : 0000000000000080 x4 : 0000000000000000
> > | CPU: 3 PID: 150 Comm: kworker/3:1 Tainted: G        W         5.4.147-lockdep #1
> > | x3 : ffffffd06c149574 x2 : ffffff83f77f7498
> > | x1 : ffffffd06d596f60 x0 : 0000000000000061
> > | Hardware name: Google CoachZ (rev1 - 2) with LTE (DT)
> > | Call trace:
> > |  debug_print_object+0x13c/0x1b0
> > | Workqueue: events kobject_delayed_cleanup
> > |  __debug_check_no_obj_freed+0x25c/0x3c0
> > |  debug_check_no_obj_freed+0x18/0x20
> > | Call trace:
> > |  slab_free_freelist_hook+0xb4/0x1bc
> > |  kfree+0xe8/0x2d8
> > |  dump_backtrace+0x0/0x27c
> 
> Why is dump_backtrace() in the call stack here, in between
> rpmsg_ctrldev_release_device() and kfree()? Isn't the error that we're
> calling kfree() on a chunk of memory that contains a live object?

When I tried to repro there was no dump_backtrace():

  Call trace:
   debug_print_object+0x13c/0x1b0
   __debug_check_no_obj_freed+0x25c/0x3c0
   debug_check_no_obj_freed+0x18/0x20
   slab_free_freelist_hook+0xbc/0x1e4
   kfree+0xfc/0x2f4
   rpmsg_ctrldev_release_device+0x78/0xb8
   device_release+0x84/0x168
   kobject_cleanup+0x12c/0x298
   kobject_delayed_cleanup+0x10/0x18
   process_one_work+0x578/0x92c
   worker_thread+0x804/0xcf8
   kthread+0x2a8/0x314
   ret_from_fork+0x10/0x18

My guess is that Sujit added a dump_backtrace() for debugging and it was
still there when the backtrace of the commit message was generated. That
would also explain the two 'Call trace:' entries in the log.

> > |  rpmsg_ctrldev_release_device+0x78/0xb8
> > |  device_release+0x68/0x14c
> > |  show_stack+0x20/0x2c
> > |  kobject_cleanup+0x12c/0x298
> > |  kobject_delayed_cleanup+0x10/0x18
> > |  dump_stack+0xe0/0x19c
> > |  process_one_work+0x578/0x92c
> > |  worker_thread+0x804/0xcf8
> > |  print_address_description+0x3c/0x4a8
> > |  kthread+0x2a8/0x314
> > |  ret_from_fork+0x10/0x18
> > |  __kasan_report+0x100/0x124
> > 
> > Signed-off-by: Sujit Kautkar <sujitka@chromium.org>
> > ---
> > Changes in v3:
> > - Remove unnecessary error check as per Matthias's comment
> > 
> > Changes in v2:
> > - Fix typo in commit message
> > 
> >  drivers/rpmsg/rpmsg_char.c | 10 ++--------
> >  1 file changed, 2 insertions(+), 8 deletions(-)
> > 
> > diff --git a/drivers/rpmsg/rpmsg_char.c b/drivers/rpmsg/rpmsg_char.c
> > index 876ce43df732b..a6a33155ca859 100644
> > --- a/drivers/rpmsg/rpmsg_char.c
> > +++ b/drivers/rpmsg/rpmsg_char.c
> > @@ -458,7 +458,7 @@ static void rpmsg_ctrldev_release_device(struct device *dev)
> >  
> >  	ida_simple_remove(&rpmsg_ctrl_ida, dev->id);
> >  	ida_simple_remove(&rpmsg_minor_ida, MINOR(dev->devt));
> > -	cdev_del(&ctrldev->cdev);
> > +	cdev_device_del(&ctrldev->cdev, &ctrldev->dev);
> 
> I am not able to find any other instance where cdev_device_del() is
> called from the device's release function itself, which tells me that
> this probably is not the right thing to do. Instead, the appropriate way
> seems to be to put the cdev_device_del() in rpmsg_chrdev_remove().

Yes, it sounds reasonable to me to delete the char device when the control
device is removed.

> That said, we already do device_del() in rpmsg_chrdev_remove() so if the
> warning is trying to tell us that ctrldev->dev is not deleted I think we
> have an unbalanced put_device()?

My understanding is that the situation is analogous to this one:

commit 1413ef638abae4ab5621901cf4d8ef08a4a48ba6
Author: Kevin Hao <haokexin@gmail.com>
Date:   Fri Oct 11 23:00:14 2019 +0800

    i2c: dev: Fix the race between the release of i2c_dev and cdev

    The struct cdev is embedded in the struct i2c_dev. In the current code,
    we would free the i2c_dev struct directly in put_i2c_dev(), but the
    cdev is managed by a kobject, and the release of it is not predictable.
    So it is very possible that the i2c_dev is freed before the cdev is
    entirely released. We can easily get the following call trace with
    CONFIG_DEBUG_KOBJECT_RELEASE and CONFIG_DEBUG_OBJECTS_TIMERS enabled.
      ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x38
      WARNING: CPU: 19 PID: 1 at lib/debugobjects.c:325 debug_print_object+0xb0/0xf0

    ...

    This is a common issue when using cdev embedded in a struct.
    Fortunately, we already have a mechanism to solve this kind of issue.
    Please see commit 233ed09d7fda ("chardev: add helper function to
    register char devs with a struct device") for more detail.

    In this patch, we choose to embed the struct device into the i2c_dev,
    and use the API provided by the commit 233ed09d7fda to make sure that
    the release of i2c_dev and cdev are in sequence.
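
The lifetime rule described in that commit message can be sketched as a
small userspace model. Everything here is an assumption made for
illustration: struct obj, struct model_ctrldev and the obj_*()/model_*()
helpers are hypothetical and only imitate reference counting; they are not
the kernel's kobject, cdev or device APIs. The model has the embedded cdev
pin its parent device, so the containing structure is freed only after the
last cdev reference (for example an open file) is dropped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted kernel object. */
struct obj {
	int refs;
	struct obj *parent;		/* pinned for as long as this object lives */
	void (*release)(struct obj *o);
};

/* Hypothetical container with an embedded device and an embedded cdev. */
struct model_ctrldev {
	struct obj dev;
	struct obj cdev;
};

static int model_freed_count;

static void obj_get(struct obj *o)
{
	o->refs++;
}

static void obj_put(struct obj *o)
{
	assert(o->refs > 0);
	if (--o->refs == 0) {
		/* Save the parent first: release() may free the memory holding o. */
		struct obj *parent = o->parent;

		if (o->release)
			o->release(o);
		if (parent)
			obj_put(parent);
	}
}

/* Release hook of the device: the only place the container is freed. */
static void model_ctrldev_release(struct obj *dev)
{
	struct model_ctrldev *c = (struct model_ctrldev *)
		((char *)dev - offsetof(struct model_ctrldev, dev));

	free(c);
	model_freed_count++;
}

/* Loose analogue of registering the cdev with the device as its parent. */
static void model_cdev_device_add(struct model_ctrldev *c)
{
	c->cdev.refs = 1;
	c->cdev.parent = &c->dev;
	obj_get(&c->dev);		/* the cdev holds a reference on its parent */
}

/* Returns 0 if the container outlives the driver's own references. */
int model_cdev_lifetime(void)
{
	int before = model_freed_count;
	int freed_while_open;
	struct model_ctrldev *c = calloc(1, sizeof(*c));

	if (!c)
		return -1;
	c->dev.refs = 1;
	c->dev.release = model_ctrldev_release;
	model_cdev_device_add(c);

	obj_get(&c->cdev);		/* a user opens the character device */
	obj_put(&c->cdev);		/* remove path: drop the registration reference */
	obj_put(&c->dev);		/* remove path: drop the driver's device reference */
	freed_while_open = model_freed_count - before;

	obj_put(&c->cdev);		/* last close: the cdev goes away, then its parent */
	return (freed_while_open == 0 && model_freed_count - before == 1) ? 0 : 1;
}
```

In this model the deletion happens in the remove path and the release hook
only frees memory, which matches the ordering suggested earlier in the
thread.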


