linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Use after free in bcm2835_spi_remove()
@ 2020-10-13 23:48 Florian Fainelli
  2020-10-14 14:09 ` Lukas Wunner
  0 siblings, 1 reply; 10+ messages in thread
From: Florian Fainelli @ 2020-10-13 23:48 UTC (permalink / raw)
  To: Lukas Wunner, linux-kernel, linux-spi; +Cc: Mark Brown

Hi Lukas,

With KASAN now working on ARM 32-bit, I was able to get the following
trace upon reboot which invokes bcm2835_spi_shutdown() calling
bcm2835_spi_remove(), the same can be triggered by doing a driver unbind:

# pwd
/sys/devices/platform/rdb/47e204800.spi/driver
# echo 47e204800.spi > unbind

How would you go about fixing this? This was not on a Rpi 4 but in
premise the same problem exists there.

Thanks!

[  229.746516]
==================================================================
[  229.754013] BUG: KASAN: use-after-free in bcm2835_dma_release+0x2c/0x260
[  229.760820] Read of size 4 at addr e0f08358 by task reboot/157
[  229.766727]
[  229.768302] CPU: 0 PID: 157 Comm: reboot Not tainted
5.9.0-gdf4dd84a3f7d #27
[  229.775445] Hardware name: Broadcom STB (Flattened Device Tree)
[  229.781448] Backtrace:
[  229.784017] [<c02120b4>] (dump_backtrace) from [<c02123d8>]
(show_stack+0x20/0x24)
[  229.791738]  r9:ffffffff r8:00000080 r7:c298e3c0 r6:400f0093
r5:00000000 r4:c298e3c0
[  229.799655] [<c02123b8>] (show_stack) from [<c08852a0>]
(dump_stack+0xbc/0xe0)
[  229.807050] [<c08851e4>] (dump_stack) from [<c04522bc>]
(print_address_description.constprop.3+0x3c/0x4b0)
[  229.816863]  r10:c2b771c0 r9:e46d9848 r8:e46d9854 r7:00000000
r6:c0b3ea3c r5:eeea5940
[  229.824815]  r4:e0f08358 r3:00000100
[  229.828510] [<c0452280>] (print_address_description.constprop.3) from
[<c0452944>] (kasan_report+0x15c/0x178)
[  229.838575]  r8:e46d9854 r7:00000000 r6:c0b3ea3c r5:0000009d r4:e0f08358
[  229.845411] [<c04527e8>] (kasan_report) from [<c0452f24>]
(__asan_load4+0x6c/0xbc)
[  229.853109]  r7:e0f08380 r6:e0f08000 r5:e0f08358 r4:e0f08380
[  229.858898] [<c0452eb8>] (__asan_load4) from [<c0b3ea3c>]
(bcm2835_dma_release+0x2c/0x260)
[  229.867318] [<c0b3ea10>] (bcm2835_dma_release) from [<c0b3ecd8>]
(bcm2835_spi_remove+0x68/0xa4)
[  229.876166]  r9:e46d9848 r8:e46d9854 r7:e0f083c0 r6:00000000
r5:e0f08000 r4:e0f08380
[  229.884069] [<c0b3ec70>] (bcm2835_spi_remove) from [<c0b3ed30>]
(bcm2835_spi_shutdown+0x1c/0x38)
[  229.892991]  r7:c2fc7f40 r6:e46d9810 r5:c2a1d854 r4:e46d9800
[  229.898788] [<c0b3ed14>] (bcm2835_spi_shutdown) from [<c0a17010>]
(platform_drv_shutdown+0x40/0x44)
[  229.907958]  r5:c2a1d854 r4:e46d9810
[  229.911653] [<c0a16fd0>] (platform_drv_shutdown) from [<c0a0f91c>]
(device_shutdown+0x248/0x35c)
[  229.920561]  r5:e465b810 r4:e46d9814
[  229.924255] [<c0a0f6d4>] (device_shutdown) from [<c0269418>]
(kernel_restart_prepare+0x4c/0x50)
[  229.933103]  r10:01234567 r9:fee1dead r8:dfdb3f60 r7:c2835240
r6:c2806d48 r5:00000000
[  229.941045]  r4:c2806d40
[  229.943675] [<c02693cc>] (kernel_restart_prepare) from [<c0269528>]
(kernel_restart+0x1c/0x60)
[  229.952405]  r5:00000000 r4:00000000
[  229.956084] [<c026950c>] (kernel_restart) from [<c0269810>]
(__do_sys_reboot+0x148/0x260)
[  229.964380]  r5:00000000 r4:bafb67c0
[  229.968057] [<c02696c8>] (__do_sys_reboot) from [<c0269998>]
(sys_reboot+0x18/0x1c)
[  229.975852]  r10:00000058 r9:dfdb0000 r8:c0200228 r7:00000058
r6:00000000 r5:00000004
[  229.983792]  r4:00000002
[  229.986422] [<c0269980>] (sys_reboot) from [<c0200060>]
(ret_fast_syscall+0x0/0x2c)
[  229.994190] Exception stack(0xdfdb3fa8 to 0xdfdb3ff0)
[  229.999350] 3fa0:                   00000002 00000004 fee1dead
28121969 01234567 000a9864
[  230.007669] 3fc0: 00000002 00000004 00000000 00000058 00000000
00000000 aedbe000 00000000
[  230.015974] 3fe0: aecce8f0 b6a81cec 000982d4 aecce910
[  230.021095]
[  230.022636] Allocated by task 20:
[  230.026039]  kasan_save_stack+0x24/0x48
[  230.029962]  __kasan_kmalloc.constprop.1+0xb8/0xc4
[  230.034842]  kasan_kmalloc+0x10/0x14
[  230.038495]  __kmalloc+0x168/0x2f4
[  230.041976]  __spi_alloc_controller+0x30/0xc0
[  230.046421]  bcm2835_spi_probe+0x90/0x4cc
[  230.050514]  platform_drv_probe+0x70/0xc8
[  230.054612]  really_probe+0x184/0x728
[  230.058361]  driver_probe_device+0xa4/0x278
[  230.062637]  __device_attach_driver+0xe8/0x148
[  230.067169]  bus_for_each_drv+0x108/0x158
[  230.071267]  __device_attach+0x190/0x234
[  230.075279]  device_initial_probe+0x1c/0x20
[  230.079551]  bus_probe_device+0xdc/0xec
[  230.083475]  deferred_probe_work_func+0xd4/0x11c
[  230.088196]  process_one_work+0x420/0x8f0
[  230.092293]  worker_thread+0x4fc/0x91c
[  230.096127]  kthread+0x21c/0x22c
[  230.099427]  ret_from_fork+0x14/0x20
[  230.103075]  0x0
[  230.104957]
[  230.106496] Freed by task 157:
[  230.109627]  kasan_save_stack+0x24/0x48
[  230.113542]  kasan_set_track+0x30/0x38
[  230.117375]  kasan_set_free_info+0x28/0x34
[  230.121553]  __kasan_slab_free+0x110/0x144
[  230.125732]  kasan_slab_free+0x14/0x18
[  230.129556]  kfree+0xbc/0x2b8
[  230.132597]  spi_controller_release+0x18/0x1c
[  230.137037]  device_release+0x4c/0xf0
[  230.140781]  kobject_put+0x14c/0x2d8
[  230.144434]  device_unregister+0x44/0x84
[  230.148438]  spi_unregister_controller+0xcc/0x124
[  230.153233]  bcm2835_spi_remove+0x5c/0xa4
[  230.157328]  bcm2835_spi_shutdown+0x1c/0x38
[  230.161593]  platform_drv_shutdown+0x40/0x44
[  230.165949]  device_shutdown+0x248/0x35c
[  230.169953]  kernel_restart_prepare+0x4c/0x50
[  230.174391]  kernel_restart+0x1c/0x60
[  230.178131]  __do_sys_reboot+0x148/0x260
[  230.182132]  sys_reboot+0x18/0x1c
[  230.185519]  ret_fast_syscall+0x0/0x2c
[  230.189335]  0xb6a81cec
[  230.191829]
[  230.193380] The buggy address belongs to the object at e0f08000
[  230.193380]  which belongs to the cache kmalloc-2k of size 2048
[  230.205354] The buggy address is located 856 bytes inside of
[  230.205354]  2048-byte region [e0f08000, e0f08800)
[  230.215907] The buggy address belongs to the page:
[  230.220806] page:b990e388 refcount:1 mapcount:0 mapping:00000000
index:0x0 pfn:0x20f08
[  230.228841] head:b990e388 order:3 compound_mapcount:0 compound_pincount:0
[  230.235731] flags: 0x2010200(slab|head)
[  230.239688] raw: 02010200 00000000 00000100 00000122 e4401800
00000000 80080008 00000000
[  230.247895] raw: ffffffff 00000001
[  230.251358] page dumped because: kasan: bad access detected
[  230.257000]
[  230.258534] Memory state around the buggy address:
[  230.263412]  e0f08200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  230.270038]  e0f08280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  230.276662] >e0f08300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  230.283272]                                             ^
[  230.288759]  e0f08380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  230.295384]  e0f08400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  230.301992]
==================================================================
[  230.309311] Disabling lock debugging due to kernel taint
[  230.325568] reboot: Restarting system

-- 
Florian

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Use after free in bcm2835_spi_remove()
  2020-10-13 23:48 Use after free in bcm2835_spi_remove() Florian Fainelli
@ 2020-10-14 14:09 ` Lukas Wunner
  2020-10-14 19:40   ` Vladimir Oltean
  0 siblings, 1 reply; 10+ messages in thread
From: Lukas Wunner @ 2020-10-14 14:09 UTC (permalink / raw)
  To: Florian Fainelli; +Cc: linux-kernel, linux-spi, Mark Brown

On Tue, Oct 13, 2020 at 04:48:42PM -0700, Florian Fainelli wrote:
> With KASAN now working on ARM 32-bit, I was able to get the following
> trace upon reboot which invokes bcm2835_spi_shutdown() calling
> bcm2835_spi_remove(), the same can be triggered by doing a driver unbind:

Thank you for the report.  Apparently the problem is that
spi_unregister_controller() drops the last ref on the controller,
causing it to be freed, and afterwards we access the controller's
private data, which is part of the same allocation as
struct spi_controller:

bcm2835_spi_remove()
  spi_unregister_controller()
    device_unregister()
      put_device()
        spi_controller_release()  #  spi_master_class.dev_release()
	  kfree(ctlr)
  bcm2835_dma_release(ctlr, bs)
  ...

However, when I submitted commit 9dd277ff92d0, I double-checked that
the kfree() happens after bcm2835_spi_remove() has finished and I
even wrote in the commit message:

   "Note that
    the struct spi_controller as well as the driver-private data are not
    freed until after bcm2835_spi_remove() has finished, so accessing them
    is safe."

I'm puzzled now that it doesn't work as intended.  I do not see any
recent commits which changed the behavior, so I must have made a
mistake and missed something.

The below patch should fix the issue.  Could you verify that?
Unfortunately I do not have access to a RasPi currently.

An alternative to this patch would be a devm function which acquires
a ref on the spi controller on ->probe() and automatically releases it
after ->remove() has finished.  This could be used by other SPI drivers
as well.

Thanks,

Lukas

-- >8 --

diff --git a/drivers/spi/spi-bcm2835.c b/drivers/spi/spi-bcm2835.c
index 41986ac..5254fda 100644
--- a/drivers/spi/spi-bcm2835.c
+++ b/drivers/spi/spi-bcm2835.c
@@ -1377,6 +1377,7 @@ static int bcm2835_spi_remove(struct platform_device *pdev)
 
 	bcm2835_debugfs_remove(bs);
 
+	spi_controller_get(ctlr);
 	spi_unregister_controller(ctlr);
 
 	bcm2835_dma_release(ctlr, bs);
@@ -1386,6 +1387,7 @@ static int bcm2835_spi_remove(struct platform_device *pdev)
 		   BCM2835_SPI_CS_CLEAR_RX | BCM2835_SPI_CS_CLEAR_TX);
 
 	clk_disable_unprepare(bs->clk);
+	spi_controller_put(ctlr);
 
 	return 0;
 }


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: Use after free in bcm2835_spi_remove()
  2020-10-14 14:09 ` Lukas Wunner
@ 2020-10-14 19:40   ` Vladimir Oltean
  2020-10-14 20:25     ` Mark Brown
  0 siblings, 1 reply; 10+ messages in thread
From: Vladimir Oltean @ 2020-10-14 19:40 UTC (permalink / raw)
  To: Lukas Wunner; +Cc: Florian Fainelli, linux-kernel, linux-spi, Mark Brown

On Wed, Oct 14, 2020 at 04:09:12PM +0200, Lukas Wunner wrote:
> Apparently the problem is that spi_unregister_controller() drops the
> last ref on the controller, causing it to be freed, and afterwards we
> access the controller's private data, which is part of the same
> allocation as struct spi_controller:
>
> bcm2835_spi_remove()
>   spi_unregister_controller()
>     device_unregister()
>       put_device()
>         spi_controller_release()  #  spi_master_class.dev_release()
> 	  kfree(ctlr)
>   bcm2835_dma_release(ctlr, bs)
>   ...

Also see these threads:
https://lore.kernel.org/linux-spi/20200922112241.GO4792@sirena.org.uk/T/#t
https://lore.kernel.org/linux-spi/270b94fd1e546d0c17a735c1f55500e58522da04.camel@suse.de/T/#u
And here's how _not_ to fix it:
https://lore.kernel.org/linux-spi/160088764365.36195.16185348610086043664.b4-ty@kernel.org/T/#t
At least without some care to not break other things:
https://lore.kernel.org/linux-spi/20200928080432.GC11648@pengutronix.de/T/#t

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Use after free in bcm2835_spi_remove()
  2020-10-14 19:40   ` Vladimir Oltean
@ 2020-10-14 20:25     ` Mark Brown
  2020-10-14 21:20       ` Florian Fainelli
  2020-10-15  5:38       ` Lukas Wunner
  0 siblings, 2 replies; 10+ messages in thread
From: Mark Brown @ 2020-10-14 20:25 UTC (permalink / raw)
  To: Vladimir Oltean; +Cc: Lukas Wunner, Florian Fainelli, linux-kernel, linux-spi

[-- Attachment #1: Type: text/plain, Size: 1100 bytes --]

On Wed, Oct 14, 2020 at 10:40:35PM +0300, Vladimir Oltean wrote:
> On Wed, Oct 14, 2020 at 04:09:12PM +0200, Lukas Wunner wrote:

> > Apparently the problem is that spi_unregister_controller() drops the
> > last ref on the controller, causing it to be freed, and afterwards we
> > access the controller's private data, which is part of the same
> > allocation as struct spi_controller:

> > bcm2835_spi_remove()
> >   spi_unregister_controller()
> >     device_unregister()
> >       put_device()
> >         spi_controller_release()  #  spi_master_class.dev_release()
> > 	  kfree(ctlr)
> >   bcm2835_dma_release(ctlr, bs)

> Also see these threads:
> https://lore.kernel.org/linux-spi/20200922112241.GO4792@sirena.org.uk/T/#t
> https://lore.kernel.org/linux-spi/270b94fd1e546d0c17a735c1f55500e58522da04.camel@suse.de/T/#u

Right, the proposed patch is yet another way to fix the issue - it all
comes back to the fact that you shouldn't be using the driver data after
unregistering if it was allocated as part of allocating the controller.
This framework feature is unfortunately quite error prone.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Use after free in bcm2835_spi_remove()
  2020-10-14 20:25     ` Mark Brown
@ 2020-10-14 21:20       ` Florian Fainelli
  2020-10-22 12:12         ` Lukas Wunner
  2020-10-15  5:38       ` Lukas Wunner
  1 sibling, 1 reply; 10+ messages in thread
From: Florian Fainelli @ 2020-10-14 21:20 UTC (permalink / raw)
  To: Mark Brown, Vladimir Oltean; +Cc: Lukas Wunner, linux-kernel, linux-spi

[-- Attachment #1: Type: text/plain, Size: 1298 bytes --]

On 10/14/20 1:25 PM, Mark Brown wrote:
> On Wed, Oct 14, 2020 at 10:40:35PM +0300, Vladimir Oltean wrote:
>> On Wed, Oct 14, 2020 at 04:09:12PM +0200, Lukas Wunner wrote:
> 
>>> Apparently the problem is that spi_unregister_controller() drops the
>>> last ref on the controller, causing it to be freed, and afterwards we
>>> access the controller's private data, which is part of the same
>>> allocation as struct spi_controller:
> 
>>> bcm2835_spi_remove()
>>>   spi_unregister_controller()
>>>     device_unregister()
>>>       put_device()
>>>         spi_controller_release()  #  spi_master_class.dev_release()
>>> 	  kfree(ctlr)
>>>   bcm2835_dma_release(ctlr, bs)
> 
>> Also see these threads:
>> https://lore.kernel.org/linux-spi/20200922112241.GO4792@sirena.org.uk/T/#t
>> https://lore.kernel.org/linux-spi/270b94fd1e546d0c17a735c1f55500e58522da04.camel@suse.de/T/#u
> 
> Right, the proposed patch is yet another way to fix the issue - it all
> comes back to the fact that you shouldn't be using the driver data after
> unregistering if it was allocated as part of allocating the controller.
> This framework feature is unfortunately quite error prone.

Lukas, your patch works fine for me and is only two lines, so maybe
better suited for stable. How about the attached patch?
-- 
Florian

[-- Attachment #2: 0001-spi-bcm2835-Fix-use-after-free-in-bcm2835_spi_remove.patch --]
[-- Type: text/x-patch, Size: 4973 bytes --]

From a4ee9da1ef09f9ddb04060e644b9c34fd532c189 Mon Sep 17 00:00:00 2001
From: Florian Fainelli <f.fainelli@gmail.com>
Date: Wed, 14 Oct 2020 14:15:28 -0700
Subject: [PATCH] spi: bcm2835: Fix use-after-free in bcm2835_spi_remove()

In bcm2835_spi_remove(), spi_controller_unregister() will free the ctlr
reference which will lead to an use after free in bcm2835_release_dma().

To avoid this use after free, allocate the bcm2835_spi structure with a
different lifecycle than the spi_controller structure such that we
unregister the SPI controller, free up all the resources and finally let
device managed allocations free the bcm2835_spi structure.

Fixes: 05897c710e8e ("spi: bcm2835: Tear down DMA before turning off SPI controller")
Fixes: 3ecd37edaa2a ("spi: bcm2835: enable dma modes for transfers meeting certain conditions")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/spi/spi-bcm2835.c | 46 +++++++++++++++++++++++----------------
 1 file changed, 27 insertions(+), 19 deletions(-)

diff --git a/drivers/spi/spi-bcm2835.c b/drivers/spi/spi-bcm2835.c
index 41986ac0fbfb..d66358e6b5cd 100644
--- a/drivers/spi/spi-bcm2835.c
+++ b/drivers/spi/spi-bcm2835.c
@@ -847,42 +847,41 @@ static bool bcm2835_spi_can_dma(struct spi_controller *ctlr,
 	return true;
 }
 
-static void bcm2835_dma_release(struct spi_controller *ctlr,
-				struct bcm2835_spi *bs)
+static void bcm2835_dma_release(struct bcm2835_spi *bs,
+				struct dma_chan *dma_tx,
+				struct dma_chan *dma_rx)
 {
 	int i;
 
-	if (ctlr->dma_tx) {
-		dmaengine_terminate_sync(ctlr->dma_tx);
+	if (dma_tx) {
+		dmaengine_terminate_sync(dma_tx);
 
 		if (bs->fill_tx_desc)
 			dmaengine_desc_free(bs->fill_tx_desc);
 
 		if (bs->fill_tx_addr)
-			dma_unmap_page_attrs(ctlr->dma_tx->device->dev,
+			dma_unmap_page_attrs(dma_tx->device->dev,
 					     bs->fill_tx_addr, sizeof(u32),
 					     DMA_TO_DEVICE,
 					     DMA_ATTR_SKIP_CPU_SYNC);
 
-		dma_release_channel(ctlr->dma_tx);
-		ctlr->dma_tx = NULL;
+		dma_release_channel(dma_tx);
 	}
 
-	if (ctlr->dma_rx) {
-		dmaengine_terminate_sync(ctlr->dma_rx);
+	if (dma_rx) {
+		dmaengine_terminate_sync(dma_rx);
 
 		for (i = 0; i < BCM2835_SPI_NUM_CS; i++)
 			if (bs->clear_rx_desc[i])
 				dmaengine_desc_free(bs->clear_rx_desc[i]);
 
 		if (bs->clear_rx_addr)
-			dma_unmap_single(ctlr->dma_rx->device->dev,
+			dma_unmap_single(dma_rx->device->dev,
 					 bs->clear_rx_addr,
 					 sizeof(bs->clear_rx_cs),
 					 DMA_TO_DEVICE);
 
-		dma_release_channel(ctlr->dma_rx);
-		ctlr->dma_rx = NULL;
+		dma_release_channel(dma_rx);
 	}
 }
 
@@ -1010,7 +1009,7 @@ static int bcm2835_dma_init(struct spi_controller *ctlr, struct device *dev,
 	dev_err(dev, "issue configuring dma: %d - not using DMA mode\n",
 		ret);
 err_release:
-	bcm2835_dma_release(ctlr, bs);
+	bcm2835_dma_release(bs, ctlr->dma_tx, ctlr->dma_rx);
 err:
 	/*
 	 * Only report error for deferred probing, otherwise fall back to
@@ -1291,12 +1290,17 @@ static int bcm2835_spi_probe(struct platform_device *pdev)
 	struct bcm2835_spi *bs;
 	int err;
 
+	bs = devm_kzalloc(&pdev->dev, sizeof(*bs), GFP_KERNEL);
+	if (!bs)
+		return -ENOMEM;
+
 	ctlr = spi_alloc_master(&pdev->dev, ALIGN(sizeof(*bs),
 						  dma_get_cache_alignment()));
 	if (!ctlr)
 		return -ENOMEM;
 
-	platform_set_drvdata(pdev, ctlr);
+	spi_controller_set_devdata(ctlr, bs);
+	platform_set_drvdata(pdev, bs);
 
 	ctlr->use_gpio_descriptors = true;
 	ctlr->mode_bits = BCM2835_SPI_MODE_BITS;
@@ -1308,7 +1312,6 @@ static int bcm2835_spi_probe(struct platform_device *pdev)
 	ctlr->prepare_message = bcm2835_spi_prepare_message;
 	ctlr->dev.of_node = pdev->dev.of_node;
 
-	bs = spi_controller_get_devdata(ctlr);
 	bs->ctlr = ctlr;
 
 	bs->regs = devm_platform_ioremap_resource(pdev, 0);
@@ -1362,7 +1365,7 @@ static int bcm2835_spi_probe(struct platform_device *pdev)
 	return 0;
 
 out_dma_release:
-	bcm2835_dma_release(ctlr, bs);
+	bcm2835_dma_release(bs, ctlr->dma_tx, ctlr->dma_rx);
 out_clk_disable:
 	clk_disable_unprepare(bs->clk);
 out_controller_put:
@@ -1372,14 +1375,19 @@ static int bcm2835_spi_probe(struct platform_device *pdev)
 
 static int bcm2835_spi_remove(struct platform_device *pdev)
 {
-	struct spi_controller *ctlr = platform_get_drvdata(pdev);
-	struct bcm2835_spi *bs = spi_controller_get_devdata(ctlr);
+	struct bcm2835_spi *bs = platform_get_drvdata(pdev);
+	struct spi_controller *ctlr = bs->ctlr;
+	struct dma_chan *tx_chan = ctlr->dma_tx;
+	struct dma_chan *rx_chan = ctlr->dma_rx;
 
 	bcm2835_debugfs_remove(bs);
 
 	spi_unregister_controller(ctlr);
 
-	bcm2835_dma_release(ctlr, bs);
+	/* ctlr is freed by spi_unregister_controller() use the cached dma_chan
+	 * references.
+	 */
+	bcm2835_dma_release(bs, tx_chan, rx_chan);
 
 	/* Clear FIFOs, and disable the HW block */
 	bcm2835_wr(bs, BCM2835_SPI_CS,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: Use after free in bcm2835_spi_remove()
  2020-10-14 20:25     ` Mark Brown
  2020-10-14 21:20       ` Florian Fainelli
@ 2020-10-15  5:38       ` Lukas Wunner
  2020-10-15 12:53         ` Mark Brown
  1 sibling, 1 reply; 10+ messages in thread
From: Lukas Wunner @ 2020-10-15  5:38 UTC (permalink / raw)
  To: Mark Brown
  Cc: Vladimir Oltean, Florian Fainelli, linux-kernel, linux-spi, Sascha Hauer

[cc += Sascha]

On Wed, Oct 14, 2020 at 09:25:05PM +0100, Mark Brown wrote:
> > On Wed, Oct 14, 2020 at 04:09:12PM +0200, Lukas Wunner wrote:
> > > Apparently the problem is that spi_unregister_controller() drops the
> > > last ref on the controller, causing it to be freed, and afterwards we
> > > access the controller's private data, which is part of the same
> > > allocation as struct spi_controller:
> 
> Right, the proposed patch is yet another way to fix the issue - it all
> comes back to the fact that you shouldn't be using the driver data after
> unregistering if it was allocated as part of allocating the controller.
> This framework feature is unfortunately quite error prone.

How about holding a ref on the controller as long as the SPI driver
is bound to the controller's parent device?  See below for a patch,
compile-tested only for lack of an SPI-equipped machine.

Makes sense or dumb idea?

If this approach is deemed to be a case of "midlayer fallacy",
we could alternatively do this in a library function which drivers
opt-in to.  Or, given that the majority of drivers seems to be affected,
make it the default and allow drivers to opt-out.

-- >8 --

diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
index 0cab239..5afa275 100644
--- a/drivers/spi/spi.c
+++ b/drivers/spi/spi.c
@@ -2399,6 +2399,11 @@ static ssize_t slave_store(struct device *dev, struct device_attribute *attr,
 extern struct class spi_slave_class;	/* dummy */
 #endif
 
+static void __spi_controller_put(void *ctlr)
+{
+	spi_controller_put(ctlr);
+}
+
 /**
  * __spi_alloc_controller - allocate an SPI master or slave controller
  * @dev: the controller, possibly using the platform_bus
@@ -2414,6 +2419,7 @@ static ssize_t slave_store(struct device *dev, struct device_attribute *attr,
  * This call is used only by SPI controller drivers, which are the
  * only ones directly touching chip registers.  It's how they allocate
  * an spi_controller structure, prior to calling spi_register_controller().
+ * The structure is accessible as long as the SPI driver is bound to @dev.
  *
  * This must be called from context that can sleep.
  *
@@ -2429,6 +2435,7 @@ struct spi_controller *__spi_alloc_controller(struct device *dev,
 {
 	struct spi_controller	*ctlr;
 	size_t ctlr_size = ALIGN(sizeof(*ctlr), dma_get_cache_alignment());
+	int ret;
 
 	if (!dev)
 		return NULL;
@@ -2449,6 +2456,13 @@ struct spi_controller *__spi_alloc_controller(struct device *dev,
 	pm_suspend_ignore_children(&ctlr->dev, true);
 	spi_controller_set_devdata(ctlr, (void *)ctlr + ctlr_size);
 
+	spi_controller_get(ctlr);
+	ret = devm_add_action(dev, __spi_controller_put, ctlr);
+	if (ret) {
+		kfree(ctlr);
+		return NULL;
+	}
+
 	return ctlr;
 }
 EXPORT_SYMBOL_GPL(__spi_alloc_controller);

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: Use after free in bcm2835_spi_remove()
  2020-10-15  5:38       ` Lukas Wunner
@ 2020-10-15 12:53         ` Mark Brown
  2020-10-28  9:59           ` Lukas Wunner
  0 siblings, 1 reply; 10+ messages in thread
From: Mark Brown @ 2020-10-15 12:53 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Vladimir Oltean, Florian Fainelli, linux-kernel, linux-spi, Sascha Hauer

[-- Attachment #1: Type: text/plain, Size: 1504 bytes --]

On Thu, Oct 15, 2020 at 07:38:29AM +0200, Lukas Wunner wrote:
> On Wed, Oct 14, 2020 at 09:25:05PM +0100, Mark Brown wrote:

> > Right, the proposed patch is yet another way to fix the issue - it all
> > comes back to the fact that you shouldn't be using the driver data after
> > unregistering if it was allocated as part of allocating the controller.
> > This framework feature is unfortunately quite error prone.

> How about holding a ref on the controller as long as the SPI driver
> is bound to the controller's parent device?  See below for a patch,
> compile-tested only for lack of an SPI-equipped machine.

> Makes sense or dumb idea?

> If this approach is deemed to be a case of "midlayer fallacy",
> we could alternatively do this in a library function which drivers
> opt-in to.  Or, given that the majority of drivers seems to be affected,
> make it the default and allow drivers to opt-out.

...

> +	spi_controller_get(ctlr);
> +	ret = devm_add_action(dev, __spi_controller_put, ctlr);
> +	if (ret) {
> +		kfree(ctlr);

This feels a bit icky - we're masking a standard use after free bug that
affects devm in general, not just this instance, and so while it will
work it doesn't feel great.  If we did do this it'd need more comments
and should probably be conditional on using the feature.  TBH I'm just
thinking it's better to just remove the feature, it's clearly error
prone and pretty redundant with devm.  I'm not sure any memory savings
it's delivering are worth the sharp edges.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Use after free in bcm2835_spi_remove()
  2020-10-14 21:20       ` Florian Fainelli
@ 2020-10-22 12:12         ` Lukas Wunner
  0 siblings, 0 replies; 10+ messages in thread
From: Lukas Wunner @ 2020-10-22 12:12 UTC (permalink / raw)
  To: Florian Fainelli; +Cc: Mark Brown, Vladimir Oltean, linux-kernel, linux-spi

On Wed, Oct 14, 2020 at 02:20:16PM -0700, Florian Fainelli wrote:
> In bcm2835_spi_remove(), spi_controller_unregister() will free the ctlr
> reference which will lead to an use after free in bcm2835_release_dma().
> 
> To avoid this use after free, allocate the bcm2835_spi structure with a
> different lifecycle than the spi_controller structure such that we
> unregister the SPI controller, free up all the resources and finally let
> device managed allocations free the bcm2835_spi structure.
[...]
> -	if (ctlr->dma_tx) {
> -		dmaengine_terminate_sync(ctlr->dma_tx);
> +	if (dma_tx) {
> +		dmaengine_terminate_sync(dma_tx);
>  
>  		if (bs->fill_tx_desc)
>  			dmaengine_desc_free(bs->fill_tx_desc);
>  
>  		if (bs->fill_tx_addr)
> -			dma_unmap_page_attrs(ctlr->dma_tx->device->dev,
> +			dma_unmap_page_attrs(dma_tx->device->dev,
>  					     bs->fill_tx_addr, sizeof(u32),
>  					     DMA_TO_DEVICE,
>  					     DMA_ATTR_SKIP_CPU_SYNC);
>  
> -		dma_release_channel(ctlr->dma_tx);
> -		ctlr->dma_tx = NULL;
> +		dma_release_channel(dma_tx);
>  	}

You must set ctlr->dma_tx and ctlr->dma_rx to NULL because the driver
checks their value in a couple of places.

E.g. bcm2835_spi_setup() checks ctlr->dma_rx.

Likewise, the error paths of bcm2835_dma_init() and bcm2835_spi_probe()
call bcm2835_dma_release() and the latter checks ctlr->dma_tx and
ctlr->dma_rx to determine whether DMA was set up, hence needs to be
torn down.


> +	bs = devm_kzalloc(&pdev->dev, sizeof(*bs), GFP_KERNEL);
> +	if (!bs)
> +		return -ENOMEM;
> +
>  	ctlr = spi_alloc_master(&pdev->dev, ALIGN(sizeof(*bs),
>  						  dma_get_cache_alignment()));

You can set the second argument to spi_alloc_master() to 0
to conserve memory.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Use after free in bcm2835_spi_remove()
  2020-10-15 12:53         ` Mark Brown
@ 2020-10-28  9:59           ` Lukas Wunner
  2020-10-29 22:24             ` Mark Brown
  0 siblings, 1 reply; 10+ messages in thread
From: Lukas Wunner @ 2020-10-28  9:59 UTC (permalink / raw)
  To: Mark Brown
  Cc: Vladimir Oltean, Florian Fainelli, linux-kernel, linux-spi, Sascha Hauer

On Thu, Oct 15, 2020 at 01:53:35PM +0100, Mark Brown wrote:
> On Thu, Oct 15, 2020 at 07:38:29AM +0200, Lukas Wunner wrote:
> > On Wed, Oct 14, 2020 at 09:25:05PM +0100, Mark Brown wrote:
> > How about holding a ref on the controller as long as the SPI driver
> > is bound to the controller's parent device?  See below for a patch,
> > compile-tested only for lack of an SPI-equipped machine.
[...]
> > +	spi_controller_get(ctlr);
> > +	ret = devm_add_action(dev, __spi_controller_put, ctlr);
> > +	if (ret) {
> > +		kfree(ctlr);
> 
> This feels a bit icky - we're masking a standard use after free bug that
> affects devm in general, not just this instance, and so while it will
> work it doesn't feel great.  If we did do this it'd need more comments
> and should probably be conditional on using the feature.  TBH I'm just
> thinking it's better to just remove the feature, it's clearly error
> prone and pretty redundant with devm.  I'm not sure any memory savings
> it's delivering are worth the sharp edges.

A combined memory allocation for struct spi_controller and the private
data has more benefits than just memory savings:  Having them adjacent
in memory reduces the chance of cache misses.  Also, one can get from
one struct to the other with a cheap subtraction (using container_of())
instead of having to chase pointers.  So it helps performance.  And a
lack of pointers arguably helps security.

Most subsystems embed the controller struct in the private data, but
there *is* precedence for doing it the other way round.  E.g. the IIO
subsystem likewise appends the private data to the controller struct.
So I think that's fine, it need not and should not be changed.

The problem is that the ->probe() and ->remove() code is currently
asymmetric, which is unintuitive:  On ->probe(), there's an allocation
step and a registration step:

	spi_alloc_master()
	spi_register_controller()

Whereas on ->remove(), there's no step to free the master which would
balance the prior alloc step:

	spi_unregister_controller()

That's because the spi_controller struct is ref-counted and the last
ref is usually dropped by spi_unregister_controller().  If the private
data is accessed after the spi_unregister_controller() step, a ref
needs to be held.

I maintain that it would be more intuitive to automatically hold a ref.
We could offer a devm_spi_alloc_master() function which holds this ref
and automatically releases it on unbind.

There are three drivers which call spi_alloc_master() with a size of zero
for the private data.  In these three cases it is fine to free the
spi_controller struct upon spi_unregister_controller().  So these drivers
can continue to use spi_alloc_master().  All other drivers could be
changed to use the new devm_spi_alloc_master(), or I could scrutinize
each of them and convert to the new function only if necessary.

Does this sound more convincing to you?

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Use after free in bcm2835_spi_remove()
  2020-10-28  9:59           ` Lukas Wunner
@ 2020-10-29 22:24             ` Mark Brown
  0 siblings, 0 replies; 10+ messages in thread
From: Mark Brown @ 2020-10-29 22:24 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Vladimir Oltean, Florian Fainelli, linux-kernel, linux-spi, Sascha Hauer

[-- Attachment #1: Type: text/plain, Size: 4078 bytes --]

On Wed, Oct 28, 2020 at 10:59:46AM +0100, Lukas Wunner wrote:
> On Thu, Oct 15, 2020 at 01:53:35PM +0100, Mark Brown wrote:

> > This feels a bit icky - we're masking a standard use after free bug that
> > affects devm in general, not just this instance, and so while it will
> > work it doesn't feel great.  If we did do this it'd need more comments

> A combined memory allocation for struct spi_controller and the private
> data has more benefits than just memory savings:  Having them adjacent
> in memory reduces the chance of cache misses.  Also, one can get from
> one struct to the other with a cheap subtraction (using container_of())
> instead of having to chase pointers.  So it helps performance.  And a
> lack of pointers arguably helps security.

The performance arguments don't seem super compelling either way TBH
given what SPI does, cache misses accessing the private data seem
unlikely to be perceptible when operations boil down to accesses on the
SPI bus.

> Most subsystems embed the controller struct in the private data, but
> there *is* precedence for doing it the other way round.  E.g. the IIO
> subsystem likewise appends the private data to the controller struct.
> So I think that's fine, it need not and should not be changed.

Given their ages I suspect IIO copied SPI; I do think it's this reversal
that's confusing things.

> The problem is that the ->probe() and ->remove() code is currently
> asymmetric, which is unintuitive:  On ->probe(), there's an allocation
> step and a registration step:

> 	spi_alloc_master()
> 	spi_register_controller()

> Whereas on ->remove(), there's no step to free the master which would
> balance the prior alloc step:

> 	spi_unregister_controller()

> That's because the spi_controller struct is ref-counted and the last
> ref is usually dropped by spi_unregister_controller().  If the private
> data is accessed after the spi_unregister_controller() step, a ref
> needs to be held.

I agree that it's the asymmetry here, the disagreement is about how to
fix it.  If we keep the allocations combined then that probably makes
sense but I'm at best unclear on the merit of keeping the allocations
combined.

> I maintain that it would be more intuitive to automatically hold a ref.
> We could offer a devm_spi_alloc_master() function which holds this ref
> and automatically releases it on unbind.

I don't know that it's super intuitive to have to have an explicit free
in the driver - you could equally expect that having registered the
thing allocated with the core's custom allocation function with the core
that the core is now taking ownership of it (which is how SPI devices as
opposed to controllers work).  That's what makes me lean towards just
doing separate allocations, there's no possibility of expectations about
transferring ownership.  If it's *always* done with devm it kind of gets
hidden though so perhaps it's not so bad and my concern goes away...

> There are three drivers which call spi_alloc_master() with a size of zero
> for the private data.  In these three cases it is fine to free the
> spi_controller struct upon spi_unregister_controller().  So these drivers
> can continue to use spi_alloc_master().  All other drivers could be
> changed to use the new devm_spi_alloc_master(), or I could scrutinize
> each of them and convert to the new function only if necessary.

It's only things that explicitly unregister the controller (rather than
using devm) that are going to be affected here, that's a *much* smaller
subset.  Everything else will be done with driver specific code and
hence private data usage before the controler goes away, though it looks
like a bunch (though not all) of them have other issues and are using
devm when they shouldn't.  That's a separate problem which ought to be
fixed anyway though.  The removal paths aren't exactly heavily stressed,
especially not under load.

In any case for your proposal your plan makes sense, I mostly just want
to avoid ending up with people getting confused in the other direction
and introducing another set of bugs.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2020-10-29 22:24 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-13 23:48 Use after free in bcm2835_spi_remove() Florian Fainelli
2020-10-14 14:09 ` Lukas Wunner
2020-10-14 19:40   ` Vladimir Oltean
2020-10-14 20:25     ` Mark Brown
2020-10-14 21:20       ` Florian Fainelli
2020-10-22 12:12         ` Lukas Wunner
2020-10-15  5:38       ` Lukas Wunner
2020-10-15 12:53         ` Mark Brown
2020-10-28  9:59           ` Lukas Wunner
2020-10-29 22:24             ` Mark Brown

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).