* Use after free in bcm2835_spi_remove()
From: Florian Fainelli @ 2020-10-13 23:48 UTC
To: Lukas Wunner, linux-kernel, linux-spi; +Cc: Mark Brown

Hi Lukas,

With KASAN now working on ARM 32-bit, I was able to get the following
trace upon reboot which invokes bcm2835_spi_shutdown() calling
bcm2835_spi_remove(), the same can be triggered by doing a driver unbind:

# pwd
/sys/devices/platform/rdb/47e204800.spi/driver
# echo 47e204800.spi > unbind

How would you go about fixing this? This was not on an RPi 4, but in
principle the same problem exists there. Thanks!

[ 229.746516] ==================================================================
[ 229.754013] BUG: KASAN: use-after-free in bcm2835_dma_release+0x2c/0x260
[ 229.760820] Read of size 4 at addr e0f08358 by task reboot/157
[ 229.766727]
[ 229.768302] CPU: 0 PID: 157 Comm: reboot Not tainted 5.9.0-gdf4dd84a3f7d #27
[ 229.775445] Hardware name: Broadcom STB (Flattened Device Tree)
[ 229.781448] Backtrace:
[ 229.784017] [<c02120b4>] (dump_backtrace) from [<c02123d8>] (show_stack+0x20/0x24)
[ 229.791738]  r9:ffffffff r8:00000080 r7:c298e3c0 r6:400f0093 r5:00000000 r4:c298e3c0
[ 229.799655] [<c02123b8>] (show_stack) from [<c08852a0>] (dump_stack+0xbc/0xe0)
[ 229.807050] [<c08851e4>] (dump_stack) from [<c04522bc>] (print_address_description.constprop.3+0x3c/0x4b0)
[ 229.816863]  r10:c2b771c0 r9:e46d9848 r8:e46d9854 r7:00000000 r6:c0b3ea3c r5:eeea5940
[ 229.824815]  r4:e0f08358 r3:00000100
[ 229.828510] [<c0452280>] (print_address_description.constprop.3) from [<c0452944>] (kasan_report+0x15c/0x178)
[ 229.838575]  r8:e46d9854 r7:00000000 r6:c0b3ea3c r5:0000009d r4:e0f08358
[ 229.845411] [<c04527e8>] (kasan_report) from [<c0452f24>] (__asan_load4+0x6c/0xbc)
[ 229.853109]  r7:e0f08380 r6:e0f08000 r5:e0f08358 r4:e0f08380
[ 229.858898] [<c0452eb8>] (__asan_load4) from [<c0b3ea3c>] (bcm2835_dma_release+0x2c/0x260)
[ 229.867318] [<c0b3ea10>] (bcm2835_dma_release) from [<c0b3ecd8>] (bcm2835_spi_remove+0x68/0xa4)
[ 229.876166]  r9:e46d9848 r8:e46d9854 r7:e0f083c0 r6:00000000 r5:e0f08000 r4:e0f08380
[ 229.884069] [<c0b3ec70>] (bcm2835_spi_remove) from [<c0b3ed30>] (bcm2835_spi_shutdown+0x1c/0x38)
[ 229.892991]  r7:c2fc7f40 r6:e46d9810 r5:c2a1d854 r4:e46d9800
[ 229.898788] [<c0b3ed14>] (bcm2835_spi_shutdown) from [<c0a17010>] (platform_drv_shutdown+0x40/0x44)
[ 229.907958]  r5:c2a1d854 r4:e46d9810
[ 229.911653] [<c0a16fd0>] (platform_drv_shutdown) from [<c0a0f91c>] (device_shutdown+0x248/0x35c)
[ 229.920561]  r5:e465b810 r4:e46d9814
[ 229.924255] [<c0a0f6d4>] (device_shutdown) from [<c0269418>] (kernel_restart_prepare+0x4c/0x50)
[ 229.933103]  r10:01234567 r9:fee1dead r8:dfdb3f60 r7:c2835240 r6:c2806d48 r5:00000000
[ 229.941045]  r4:c2806d40
[ 229.943675] [<c02693cc>] (kernel_restart_prepare) from [<c0269528>] (kernel_restart+0x1c/0x60)
[ 229.952405]  r5:00000000 r4:00000000
[ 229.956084] [<c026950c>] (kernel_restart) from [<c0269810>] (__do_sys_reboot+0x148/0x260)
[ 229.964380]  r5:00000000 r4:bafb67c0
[ 229.968057] [<c02696c8>] (__do_sys_reboot) from [<c0269998>] (sys_reboot+0x18/0x1c)
[ 229.975852]  r10:00000058 r9:dfdb0000 r8:c0200228 r7:00000058 r6:00000000 r5:00000004
[ 229.983792]  r4:00000002
[ 229.986422] [<c0269980>] (sys_reboot) from [<c0200060>] (ret_fast_syscall+0x0/0x2c)
[ 229.994190] Exception stack(0xdfdb3fa8 to 0xdfdb3ff0)
[ 229.999350] 3fa0: 00000002 00000004 fee1dead 28121969 01234567 000a9864
[ 230.007669] 3fc0: 00000002 00000004 00000000 00000058 00000000 00000000 aedbe000 00000000
[ 230.015974] 3fe0: aecce8f0 b6a81cec 000982d4 aecce910
[ 230.021095]
[ 230.022636] Allocated by task 20:
[ 230.026039]  kasan_save_stack+0x24/0x48
[ 230.029962]  __kasan_kmalloc.constprop.1+0xb8/0xc4
[ 230.034842]  kasan_kmalloc+0x10/0x14
[ 230.038495]  __kmalloc+0x168/0x2f4
[ 230.041976]  __spi_alloc_controller+0x30/0xc0
[ 230.046421]  bcm2835_spi_probe+0x90/0x4cc
[ 230.050514]  platform_drv_probe+0x70/0xc8
[ 230.054612]  really_probe+0x184/0x728
[ 230.058361]  driver_probe_device+0xa4/0x278
[ 230.062637]  __device_attach_driver+0xe8/0x148
[ 230.067169]  bus_for_each_drv+0x108/0x158
[ 230.071267]  __device_attach+0x190/0x234
[ 230.075279]  device_initial_probe+0x1c/0x20
[ 230.079551]  bus_probe_device+0xdc/0xec
[ 230.083475]  deferred_probe_work_func+0xd4/0x11c
[ 230.088196]  process_one_work+0x420/0x8f0
[ 230.092293]  worker_thread+0x4fc/0x91c
[ 230.096127]  kthread+0x21c/0x22c
[ 230.099427]  ret_from_fork+0x14/0x20
[ 230.103075]  0x0
[ 230.104957]
[ 230.106496] Freed by task 157:
[ 230.109627]  kasan_save_stack+0x24/0x48
[ 230.113542]  kasan_set_track+0x30/0x38
[ 230.117375]  kasan_set_free_info+0x28/0x34
[ 230.121553]  __kasan_slab_free+0x110/0x144
[ 230.125732]  kasan_slab_free+0x14/0x18
[ 230.129556]  kfree+0xbc/0x2b8
[ 230.132597]  spi_controller_release+0x18/0x1c
[ 230.137037]  device_release+0x4c/0xf0
[ 230.140781]  kobject_put+0x14c/0x2d8
[ 230.144434]  device_unregister+0x44/0x84
[ 230.148438]  spi_unregister_controller+0xcc/0x124
[ 230.153233]  bcm2835_spi_remove+0x5c/0xa4
[ 230.157328]  bcm2835_spi_shutdown+0x1c/0x38
[ 230.161593]  platform_drv_shutdown+0x40/0x44
[ 230.165949]  device_shutdown+0x248/0x35c
[ 230.169953]  kernel_restart_prepare+0x4c/0x50
[ 230.174391]  kernel_restart+0x1c/0x60
[ 230.178131]  __do_sys_reboot+0x148/0x260
[ 230.182132]  sys_reboot+0x18/0x1c
[ 230.185519]  ret_fast_syscall+0x0/0x2c
[ 230.189335]  0xb6a81cec
[ 230.191829]
[ 230.193380] The buggy address belongs to the object at e0f08000
[ 230.193380]  which belongs to the cache kmalloc-2k of size 2048
[ 230.205354] The buggy address is located 856 bytes inside of
[ 230.205354]  2048-byte region [e0f08000, e0f08800)
[ 230.215907] The buggy address belongs to the page:
[ 230.220806] page:b990e388 refcount:1 mapcount:0 mapping:00000000 index:0x0 pfn:0x20f08
[ 230.228841] head:b990e388 order:3 compound_mapcount:0 compound_pincount:0
[ 230.235731] flags: 0x2010200(slab|head)
[ 230.239688] raw: 02010200 00000000 00000100 00000122 e4401800 00000000 80080008 00000000
[ 230.247895] raw: ffffffff 00000001
[ 230.251358] page dumped because: kasan: bad access detected
[ 230.257000]
[ 230.258534] Memory state around the buggy address:
[ 230.263412]  e0f08200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 230.270038]  e0f08280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 230.276662] >e0f08300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 230.283272]                                             ^
[ 230.288759]  e0f08380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 230.295384]  e0f08400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 230.301992] ==================================================================
[ 230.309311] Disabling lock debugging due to kernel taint
[ 230.325568] reboot: Restarting system

--
Florian

* Re: Use after free in bcm2835_spi_remove()
From: Lukas Wunner @ 2020-10-14 14:09 UTC
To: Florian Fainelli; +Cc: linux-kernel, linux-spi, Mark Brown

On Tue, Oct 13, 2020 at 04:48:42PM -0700, Florian Fainelli wrote:
> With KASAN now working on ARM 32-bit, I was able to get the following
> trace upon reboot which invokes bcm2835_spi_shutdown() calling
> bcm2835_spi_remove(), the same can be triggered by doing a driver unbind:

Thank you for the report.

Apparently the problem is that spi_unregister_controller() drops the
last ref on the controller, causing it to be freed, and afterwards we
access the controller's private data, which is part of the same
allocation as struct spi_controller:

bcm2835_spi_remove()
  spi_unregister_controller()
    device_unregister()
      put_device()
        spi_controller_release()  # spi_master_class.dev_release()
          kfree(ctlr)
  bcm2835_dma_release(ctlr, bs)
  ...

However, when I submitted commit 9dd277ff92d0, I double-checked that
the kfree() happens after bcm2835_spi_remove() has finished and I even
wrote in the commit message:

   "Note that the struct spi_controller as well as the driver-private data
    are not freed until after bcm2835_spi_remove() has finished, so
    accessing them is safe."

I'm puzzled now that it doesn't work as intended. I do not see any
recent commits which changed the behavior, so I must have made a
mistake and missed something.

The below patch should fix the issue. Could you verify that?
Unfortunately I do not have access to a RasPi currently.

An alternative to this patch would be a devm function which acquires
a ref on the spi controller on ->probe() and automatically releases it
after ->remove() has finished. This could be used by other SPI drivers
as well.

Thanks,

Lukas

-- >8 --
diff --git a/drivers/spi/spi-bcm2835.c b/drivers/spi/spi-bcm2835.c
index 41986ac..5254fda 100644
--- a/drivers/spi/spi-bcm2835.c
+++ b/drivers/spi/spi-bcm2835.c
@@ -1377,6 +1377,7 @@ static int bcm2835_spi_remove(struct platform_device *pdev)
 
 	bcm2835_debugfs_remove(bs);
 
+	spi_controller_get(ctlr);
 	spi_unregister_controller(ctlr);
 
 	bcm2835_dma_release(ctlr, bs);
@@ -1386,6 +1387,7 @@ static int bcm2835_spi_remove(struct platform_device *pdev)
 		   BCM2835_SPI_CS_CLEAR_RX | BCM2835_SPI_CS_CLEAR_TX);
 
 	clk_disable_unprepare(bs->clk);
+	spi_controller_put(ctlr);
 
 	return 0;
 }
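
[Editorial sketch, not part of the thread.] The effect of the extra
spi_controller_get()/spi_controller_put() pair in the patch above can be
modelled in plain userspace C. This is only a minimal sketch under stated
assumptions: struct model_ctlr and every model_* function are hypothetical
stand-ins, not kernel APIs, and a "freed" flag replaces the real kfree() so
that the model can be inspected without touching released memory.

```c
#include <assert.h>

/* Hypothetical stand-in for struct spi_controller with the driver-private
 * data living in the same allocation. Not kernel code. */
struct model_ctlr {
	int refcount;
	int freed;	/* set instead of calling free(), so the model stays inspectable */
};

void model_get(struct model_ctlr *c)
{
	c->refcount++;
}

void model_put(struct model_ctlr *c)
{
	/* Dropping the last reference releases the whole allocation. */
	if (--c->refcount == 0)
		c->freed = 1;
}

/* Like spi_unregister_controller() in the call chain above: drops one ref. */
void model_unregister(struct model_ctlr *c)
{
	model_put(c);
}

/* Returns 1 if the private data was still alive at DMA teardown time. */
int model_remove(int hold_extra_ref)
{
	struct model_ctlr c = { .refcount = 1, .freed = 0 };
	int alive;

	if (hold_extra_ref)
		model_get(&c);		/* the added spi_controller_get() */
	model_unregister(&c);
	alive = !c.freed;		/* point where bcm2835_dma_release() runs */
	if (hold_extra_ref)
		model_put(&c);		/* the added spi_controller_put() */

	assert(c.freed);		/* either way, nothing is leaked */
	return alive;
}
```

Calling model_remove(0) models the reported bug (teardown runs after the
last reference is gone), while model_remove(1) models the patched remove
path, where the final put is deferred until teardown has finished.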

* Re: Use after free in bcm2835_spi_remove()
From: Vladimir Oltean @ 2020-10-14 19:40 UTC
To: Lukas Wunner; +Cc: Florian Fainelli, linux-kernel, linux-spi, Mark Brown

On Wed, Oct 14, 2020 at 04:09:12PM +0200, Lukas Wunner wrote:
> Apparently the problem is that spi_unregister_controller() drops the
> last ref on the controller, causing it to be freed, and afterwards we
> access the controller's private data, which is part of the same
> allocation as struct spi_controller:
>
> bcm2835_spi_remove()
>   spi_unregister_controller()
>     device_unregister()
>       put_device()
>         spi_controller_release()  # spi_master_class.dev_release()
>           kfree(ctlr)
>   bcm2835_dma_release(ctlr, bs)
>   ...

Also see these threads:
https://lore.kernel.org/linux-spi/20200922112241.GO4792@sirena.org.uk/T/#t
https://lore.kernel.org/linux-spi/270b94fd1e546d0c17a735c1f55500e58522da04.camel@suse.de/T/#u

And here's how _not_ to fix it:
https://lore.kernel.org/linux-spi/160088764365.36195.16185348610086043664.b4-ty@kernel.org/T/#t

At least without some care to not break other things:
https://lore.kernel.org/linux-spi/20200928080432.GC11648@pengutronix.de/T/#t

* Re: Use after free in bcm2835_spi_remove()
From: Mark Brown @ 2020-10-14 20:25 UTC
To: Vladimir Oltean; +Cc: Lukas Wunner, Florian Fainelli, linux-kernel, linux-spi

On Wed, Oct 14, 2020 at 10:40:35PM +0300, Vladimir Oltean wrote:
> On Wed, Oct 14, 2020 at 04:09:12PM +0200, Lukas Wunner wrote:

> > Apparently the problem is that spi_unregister_controller() drops the
> > last ref on the controller, causing it to be freed, and afterwards we
> > access the controller's private data, which is part of the same
> > allocation as struct spi_controller:

> > bcm2835_spi_remove()
> >   spi_unregister_controller()
> >     device_unregister()
> >       put_device()
> >         spi_controller_release()  # spi_master_class.dev_release()
> >           kfree(ctlr)
> >   bcm2835_dma_release(ctlr, bs)

> Also see these threads:
> https://lore.kernel.org/linux-spi/20200922112241.GO4792@sirena.org.uk/T/#t
> https://lore.kernel.org/linux-spi/270b94fd1e546d0c17a735c1f55500e58522da04.camel@suse.de/T/#u

Right, the proposed patch is yet another way to fix the issue - it all
comes back to the fact that you shouldn't be using the driver data after
unregistering if it was allocated as part of allocating the controller.
This framework feature is unfortunately quite error prone.

* Re: Use after free in bcm2835_spi_remove()
From: Florian Fainelli @ 2020-10-14 21:20 UTC
To: Mark Brown, Vladimir Oltean; +Cc: Lukas Wunner, linux-kernel, linux-spi

On 10/14/20 1:25 PM, Mark Brown wrote:
> On Wed, Oct 14, 2020 at 10:40:35PM +0300, Vladimir Oltean wrote:
>> On Wed, Oct 14, 2020 at 04:09:12PM +0200, Lukas Wunner wrote:
>
>>> Apparently the problem is that spi_unregister_controller() drops the
>>> last ref on the controller, causing it to be freed, and afterwards we
>>> access the controller's private data, which is part of the same
>>> allocation as struct spi_controller:
>
>>> bcm2835_spi_remove()
>>>   spi_unregister_controller()
>>>     device_unregister()
>>>       put_device()
>>>         spi_controller_release()  # spi_master_class.dev_release()
>>>           kfree(ctlr)
>>>   bcm2835_dma_release(ctlr, bs)
>
>> Also see these threads:
>> https://lore.kernel.org/linux-spi/20200922112241.GO4792@sirena.org.uk/T/#t
>> https://lore.kernel.org/linux-spi/270b94fd1e546d0c17a735c1f55500e58522da04.camel@suse.de/T/#u
>
> Right, the proposed patch is yet another way to fix the issue - it all
> comes back to the fact that you shouldn't be using the driver data after
> unregistering if it was allocated as part of allocating the controller.
> This framework feature is unfortunately quite error prone.

Lukas, your patch works fine for me and is only two lines, so maybe
better suited for stable. How about the attached patch?
--
Florian

[-- Attachment: 0001-spi-bcm2835-Fix-use-after-free-in-bcm2835_spi_remove.patch --]

From a4ee9da1ef09f9ddb04060e644b9c34fd532c189 Mon Sep 17 00:00:00 2001
From: Florian Fainelli <f.fainelli@gmail.com>
Date: Wed, 14 Oct 2020 14:15:28 -0700
Subject: [PATCH] spi: bcm2835: Fix use-after-free in bcm2835_spi_remove()

In bcm2835_spi_remove(), spi_unregister_controller() will free the ctlr
reference, which will lead to a use-after-free in bcm2835_dma_release().

To avoid this use-after-free, allocate the bcm2835_spi structure with a
different lifecycle than the spi_controller structure such that we
unregister the SPI controller, free up all the resources and finally let
device managed allocations free the bcm2835_spi structure.

Fixes: 05897c710e8e ("spi: bcm2835: Tear down DMA before turning off SPI controller")
Fixes: 3ecd37edaa2a ("spi: bcm2835: enable dma modes for transfers meeting certain conditions")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/spi/spi-bcm2835.c | 46 +++++++++++++++++++++++----------------
 1 file changed, 27 insertions(+), 19 deletions(-)

diff --git a/drivers/spi/spi-bcm2835.c b/drivers/spi/spi-bcm2835.c
index 41986ac0fbfb..d66358e6b5cd 100644
--- a/drivers/spi/spi-bcm2835.c
+++ b/drivers/spi/spi-bcm2835.c
@@ -847,42 +847,41 @@ static bool bcm2835_spi_can_dma(struct spi_controller *ctlr,
 	return true;
 }
 
-static void bcm2835_dma_release(struct spi_controller *ctlr,
-				struct bcm2835_spi *bs)
+static void bcm2835_dma_release(struct bcm2835_spi *bs,
+				struct dma_chan *dma_tx,
+				struct dma_chan *dma_rx)
 {
 	int i;
 
-	if (ctlr->dma_tx) {
-		dmaengine_terminate_sync(ctlr->dma_tx);
+	if (dma_tx) {
+		dmaengine_terminate_sync(dma_tx);
 
 		if (bs->fill_tx_desc)
 			dmaengine_desc_free(bs->fill_tx_desc);
 
 		if (bs->fill_tx_addr)
-			dma_unmap_page_attrs(ctlr->dma_tx->device->dev,
+			dma_unmap_page_attrs(dma_tx->device->dev,
 					     bs->fill_tx_addr, sizeof(u32),
 					     DMA_TO_DEVICE,
 					     DMA_ATTR_SKIP_CPU_SYNC);
 
-		dma_release_channel(ctlr->dma_tx);
-		ctlr->dma_tx = NULL;
+		dma_release_channel(dma_tx);
 	}
 
-	if (ctlr->dma_rx) {
-		dmaengine_terminate_sync(ctlr->dma_rx);
+	if (dma_rx) {
+		dmaengine_terminate_sync(dma_rx);
 
 		for (i = 0; i < BCM2835_SPI_NUM_CS; i++)
 			if (bs->clear_rx_desc[i])
 				dmaengine_desc_free(bs->clear_rx_desc[i]);
 
 		if (bs->clear_rx_addr)
-			dma_unmap_single(ctlr->dma_rx->device->dev,
+			dma_unmap_single(dma_rx->device->dev,
 					 bs->clear_rx_addr,
 					 sizeof(bs->clear_rx_cs),
 					 DMA_TO_DEVICE);
 
-		dma_release_channel(ctlr->dma_rx);
-		ctlr->dma_rx = NULL;
+		dma_release_channel(dma_rx);
 	}
 }
 
@@ -1010,7 +1009,7 @@ static int bcm2835_dma_init(struct spi_controller *ctlr, struct device *dev,
 	dev_err(dev, "issue configuring dma: %d - not using DMA mode\n",
 		ret);
 err_release:
-	bcm2835_dma_release(ctlr, bs);
+	bcm2835_dma_release(bs, ctlr->dma_tx, ctlr->dma_rx);
 err:
 	/*
 	 * Only report error for deferred probing, otherwise fall back to
@@ -1291,12 +1290,17 @@ static int bcm2835_spi_probe(struct platform_device *pdev)
 	struct bcm2835_spi *bs;
 	int err;
 
+	bs = devm_kzalloc(&pdev->dev, sizeof(*bs), GFP_KERNEL);
+	if (!bs)
+		return -ENOMEM;
+
 	ctlr = spi_alloc_master(&pdev->dev, ALIGN(sizeof(*bs),
 						  dma_get_cache_alignment()));
 	if (!ctlr)
 		return -ENOMEM;
 
-	platform_set_drvdata(pdev, ctlr);
+	spi_controller_set_devdata(ctlr, bs);
+	platform_set_drvdata(pdev, bs);
 
 	ctlr->use_gpio_descriptors = true;
 	ctlr->mode_bits = BCM2835_SPI_MODE_BITS;
@@ -1308,7 +1312,6 @@ static int bcm2835_spi_probe(struct platform_device *pdev)
 	ctlr->prepare_message = bcm2835_spi_prepare_message;
 	ctlr->dev.of_node = pdev->dev.of_node;
 
-	bs = spi_controller_get_devdata(ctlr);
 	bs->ctlr = ctlr;
 
 	bs->regs = devm_platform_ioremap_resource(pdev, 0);
@@ -1362,7 +1365,7 @@ static int bcm2835_spi_probe(struct platform_device *pdev)
 	return 0;
 
 out_dma_release:
-	bcm2835_dma_release(ctlr, bs);
+	bcm2835_dma_release(bs, ctlr->dma_tx, ctlr->dma_rx);
 out_clk_disable:
 	clk_disable_unprepare(bs->clk);
 out_controller_put:
@@ -1372,14 +1375,19 @@ static int bcm2835_spi_probe(struct platform_device *pdev)
 
 static int bcm2835_spi_remove(struct platform_device *pdev)
 {
-	struct spi_controller *ctlr = platform_get_drvdata(pdev);
-	struct bcm2835_spi *bs = spi_controller_get_devdata(ctlr);
+	struct bcm2835_spi *bs = platform_get_drvdata(pdev);
+	struct spi_controller *ctlr = bs->ctlr;
+	struct dma_chan *tx_chan = ctlr->dma_tx;
+	struct dma_chan *rx_chan = ctlr->dma_rx;
 
 	bcm2835_debugfs_remove(bs);
 
 	spi_unregister_controller(ctlr);
 
-	bcm2835_dma_release(ctlr, bs);
+	/* ctlr is freed by spi_unregister_controller() use the cached dma_chan
+	 * references.
+	 */
+	bcm2835_dma_release(bs, tx_chan, rx_chan);
 
 	/* Clear FIFOs, and disable the HW block */
 	bcm2835_wr(bs, BCM2835_SPI_CS,
-- 
2.25.1

* Re: Use after free in bcm2835_spi_remove()
From: Lukas Wunner @ 2020-10-22 12:12 UTC
To: Florian Fainelli; +Cc: Mark Brown, Vladimir Oltean, linux-kernel, linux-spi

On Wed, Oct 14, 2020 at 02:20:16PM -0700, Florian Fainelli wrote:
> In bcm2835_spi_remove(), spi_unregister_controller() will free the ctlr
> reference, which will lead to a use-after-free in bcm2835_dma_release().
>
> To avoid this use-after-free, allocate the bcm2835_spi structure with a
> different lifecycle than the spi_controller structure such that we
> unregister the SPI controller, free up all the resources and finally let
> device managed allocations free the bcm2835_spi structure.
[...]
> -	if (ctlr->dma_tx) {
> -		dmaengine_terminate_sync(ctlr->dma_tx);
> +	if (dma_tx) {
> +		dmaengine_terminate_sync(dma_tx);
>
>  		if (bs->fill_tx_desc)
>  			dmaengine_desc_free(bs->fill_tx_desc);
>
>  		if (bs->fill_tx_addr)
> -			dma_unmap_page_attrs(ctlr->dma_tx->device->dev,
> +			dma_unmap_page_attrs(dma_tx->device->dev,
>  					     bs->fill_tx_addr, sizeof(u32),
>  					     DMA_TO_DEVICE,
>  					     DMA_ATTR_SKIP_CPU_SYNC);
>
> -		dma_release_channel(ctlr->dma_tx);
> -		ctlr->dma_tx = NULL;
> +		dma_release_channel(dma_tx);
>  	}

You must set ctlr->dma_tx and ctlr->dma_rx to NULL because the driver
checks their value in a couple of places. E.g. bcm2835_spi_setup()
checks ctlr->dma_rx. Likewise, the error paths of bcm2835_dma_init()
and bcm2835_spi_probe() call bcm2835_dma_release() and the latter
checks ctlr->dma_tx and ctlr->dma_rx to determine whether DMA was
set up, hence needs to be torn down.

> +	bs = devm_kzalloc(&pdev->dev, sizeof(*bs), GFP_KERNEL);
> +	if (!bs)
> +		return -ENOMEM;
> +
>  	ctlr = spi_alloc_master(&pdev->dev, ALIGN(sizeof(*bs),
>  						  dma_get_cache_alignment()));

You can set the second argument to spi_alloc_master() to 0 to conserve
memory.

Thanks,

Lukas
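
[Editorial sketch, not part of the thread.] The review comment about
clearing ctlr->dma_tx and ctlr->dma_rx can be illustrated with a small
userspace model: when the channel pointers double as the "DMA was set up"
flag, the release routine has to reset them, otherwise a second caller (an
error path, say) tears the channels down again. All names below are
hypothetical stand-ins chosen for illustration, not the driver's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a DMA channel; counts how often it is released. */
struct model_chan {
	int release_count;
};

/* Hypothetical stand-in for the two channel pointers in struct spi_controller. */
struct model_ctlr {
	struct model_chan *dma_tx;
	struct model_chan *dma_rx;
};

/* Other code paths use the pointers to decide whether DMA is usable. */
int model_can_dma(const struct model_ctlr *ctlr)
{
	return ctlr->dma_tx != NULL && ctlr->dma_rx != NULL;
}

/* Release each channel once and clear the pointer so later checks see "no DMA". */
void model_dma_release(struct model_ctlr *ctlr)
{
	if (ctlr->dma_tx) {
		ctlr->dma_tx->release_count++;
		ctlr->dma_tx = NULL;
	}
	if (ctlr->dma_rx) {
		ctlr->dma_rx->release_count++;
		ctlr->dma_rx = NULL;
	}
}

/* Returns 1 if two release calls released each channel exactly once. */
int model_double_release(void)
{
	struct model_chan tx = { 0 }, rx = { 0 };
	struct model_ctlr ctlr = { &tx, &rx };

	model_dma_release(&ctlr);
	model_dma_release(&ctlr);	/* second call is a no-op */

	return tx.release_count == 1 && rx.release_count == 1 &&
	       !model_can_dma(&ctlr);
}
```

If the two "= NULL" assignments were dropped from model_dma_release(), the
second call would bump both counters again and model_can_dma() would keep
reporting DMA as available.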

* Re: Use after free in bcm2835_spi_remove()
From: Lukas Wunner @ 2020-10-15 5:38 UTC
To: Mark Brown
Cc: Vladimir Oltean, Florian Fainelli, linux-kernel, linux-spi, Sascha Hauer

[cc += Sascha]

On Wed, Oct 14, 2020 at 09:25:05PM +0100, Mark Brown wrote:
> > On Wed, Oct 14, 2020 at 04:09:12PM +0200, Lukas Wunner wrote:
> > > Apparently the problem is that spi_unregister_controller() drops the
> > > last ref on the controller, causing it to be freed, and afterwards we
> > > access the controller's private data, which is part of the same
> > > allocation as struct spi_controller:
>
> Right, the proposed patch is yet another way to fix the issue - it all
> comes back to the fact that you shouldn't be using the driver data after
> unregistering if it was allocated as part of allocating the controller.
> This framework feature is unfortunately quite error prone.

How about holding a ref on the controller as long as the SPI driver
is bound to the controller's parent device? See below for a patch,
compile-tested only for lack of an SPI-equipped machine.

Makes sense or dumb idea?

If this approach is deemed to be a case of "midlayer fallacy",
we could alternatively do this in a library function which drivers
opt-in to. Or, given that the majority of drivers seems to be affected,
make it the default and allow drivers to opt-out.

-- >8 --
diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
index 0cab239..5afa275 100644
--- a/drivers/spi/spi.c
+++ b/drivers/spi/spi.c
@@ -2399,6 +2399,11 @@ static ssize_t slave_store(struct device *dev, struct device_attribute *attr,
 extern struct class spi_slave_class;	/* dummy */
 #endif
 
+static void __spi_controller_put(void *ctlr)
+{
+	spi_controller_put(ctlr);
+}
+
 /**
  * __spi_alloc_controller - allocate an SPI master or slave controller
  * @dev: the controller, possibly using the platform_bus
@@ -2414,6 +2419,7 @@ static ssize_t slave_store(struct device *dev, struct device_attribute *attr,
  * This call is used only by SPI controller drivers, which are the
  * only ones directly touching chip registers.  It's how they allocate
  * an spi_controller structure, prior to calling spi_register_controller().
+ * The structure is accessible as long as the SPI driver is bound to @dev.
  *
  * This must be called from context that can sleep.
  *
@@ -2429,6 +2435,7 @@ struct spi_controller *__spi_alloc_controller(struct device *dev,
 {
 	struct spi_controller	*ctlr;
 	size_t ctlr_size = ALIGN(sizeof(*ctlr), dma_get_cache_alignment());
+	int ret;
 
 	if (!dev)
 		return NULL;
@@ -2449,6 +2456,13 @@ struct spi_controller *__spi_alloc_controller(struct device *dev,
 	pm_suspend_ignore_children(&ctlr->dev, true);
 	spi_controller_set_devdata(ctlr, (void *)ctlr + ctlr_size);
 
+	spi_controller_get(ctlr);
+	ret = devm_add_action(dev, __spi_controller_put, ctlr);
+	if (ret) {
+		kfree(ctlr);
+		return NULL;
+	}
+
 	return ctlr;
 }
 EXPORT_SYMBOL_GPL(__spi_alloc_controller);
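
[Editorial sketch, not part of the thread.] The idea in the patch above is
to take an extra reference at allocation time and register a deferred "put"
with the parent device, relying on the driver core running such device-managed
actions after ->remove() has returned, in reverse order of registration. A
userspace model of that ordering might look as follows. It is a sketch only:
model_dev, model_add_action() and the other model_* names are hypothetical
stand-ins for struct device, devm_add_action() and friends, and a "freed"
flag replaces the real kfree().

```c
#include <assert.h>

#define MODEL_MAX_ACTIONS 8

/* Hypothetical stand-in for the devres list hanging off a struct device. */
struct model_dev {
	void (*action[MODEL_MAX_ACTIONS])(void *);
	void *data[MODEL_MAX_ACTIONS];
	int nr;
};

/* Hypothetical stand-in for the refcounted controller allocation. */
struct model_ctlr {
	int refcount;
	int freed;	/* set instead of calling free() */
};

/* Models devm_add_action(): remember a callback to run at unbind time. */
int model_add_action(struct model_dev *dev, void (*fn)(void *), void *data)
{
	if (dev->nr == MODEL_MAX_ACTIONS)
		return -1;
	dev->action[dev->nr] = fn;
	dev->data[dev->nr] = data;
	dev->nr++;
	return 0;
}

/* Models devres release: run the callbacks in reverse registration order. */
void model_release_all(struct model_dev *dev)
{
	while (dev->nr > 0) {
		dev->nr--;
		dev->action[dev->nr](dev->data[dev->nr]);
	}
}

/* Models __spi_controller_put() from the patch. */
void model_ctlr_put(void *p)
{
	struct model_ctlr *c = p;

	if (--c->refcount == 0)
		c->freed = 1;
}

/* Returns 1 if the controller outlived remove() and was freed afterwards. */
int model_unbind(void)
{
	struct model_dev dev = { .nr = 0 };
	struct model_ctlr ctlr = { .refcount = 1, .freed = 0 };
	int alive_in_remove;

	/* Allocation: take the extra ref and queue the matching put. */
	ctlr.refcount++;
	if (model_add_action(&dev, model_ctlr_put, &ctlr))
		return -1;

	/* remove(): unregistering drops the original ref. */
	model_ctlr_put(&ctlr);
	alive_in_remove = !ctlr.freed;

	/* After remove() returns, the queued actions run. */
	model_release_all(&dev);

	return alive_in_remove && ctlr.freed;
}
```

In this model the private data stays valid for the whole of remove() without
the driver holding an explicit reference, which is the property the proposed
change to __spi_alloc_controller() is after.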

* Re: Use after free in bcm2835_spi_remove()
From: Mark Brown @ 2020-10-15 12:53 UTC
To: Lukas Wunner
Cc: Vladimir Oltean, Florian Fainelli, linux-kernel, linux-spi, Sascha Hauer

On Thu, Oct 15, 2020 at 07:38:29AM +0200, Lukas Wunner wrote:
> On Wed, Oct 14, 2020 at 09:25:05PM +0100, Mark Brown wrote:

> > Right, the proposed patch is yet another way to fix the issue - it all
> > comes back to the fact that you shouldn't be using the driver data after
> > unregistering if it was allocated as part of allocating the controller.
> > This framework feature is unfortunately quite error prone.

> How about holding a ref on the controller as long as the SPI driver
> is bound to the controller's parent device? See below for a patch,
> compile-tested only for lack of an SPI-equipped machine.

> Makes sense or dumb idea?

> If this approach is deemed to be a case of "midlayer fallacy",
> we could alternatively do this in a library function which drivers
> opt-in to. Or, given that the majority of drivers seems to be affected,
> make it the default and allow drivers to opt-out.

...

> +	spi_controller_get(ctlr);
> +	ret = devm_add_action(dev, __spi_controller_put, ctlr);
> +	if (ret) {
> +		kfree(ctlr);

This feels a bit icky - we're masking a standard use after free bug that
affects devm in general, not just this instance, and so while it will
work it doesn't feel great. If we did do this it'd need more comments
and should probably be conditional on using the feature. TBH I'm just
thinking it's better to just remove the feature, it's clearly error
prone and pretty redundant with devm. I'm not sure any memory savings
it's delivering are worth the sharp edges.

* Re: Use after free in bcm2835_spi_remove()
From: Lukas Wunner @ 2020-10-28 9:59 UTC
To: Mark Brown
Cc: Vladimir Oltean, Florian Fainelli, linux-kernel, linux-spi, Sascha Hauer

On Thu, Oct 15, 2020 at 01:53:35PM +0100, Mark Brown wrote:
> On Thu, Oct 15, 2020 at 07:38:29AM +0200, Lukas Wunner wrote:
> > On Wed, Oct 14, 2020 at 09:25:05PM +0100, Mark Brown wrote:
> > How about holding a ref on the controller as long as the SPI driver
> > is bound to the controller's parent device? See below for a patch,
> > compile-tested only for lack of an SPI-equipped machine.
[...]
> > +	spi_controller_get(ctlr);
> > +	ret = devm_add_action(dev, __spi_controller_put, ctlr);
> > +	if (ret) {
> > +		kfree(ctlr);
>
> This feels a bit icky - we're masking a standard use after free bug that
> affects devm in general, not just this instance, and so while it will
> work it doesn't feel great. If we did do this it'd need more comments
> and should probably be conditional on using the feature. TBH I'm just
> thinking it's better to just remove the feature, it's clearly error
> prone and pretty redundant with devm. I'm not sure any memory savings
> it's delivering are worth the sharp edges.

A combined memory allocation for struct spi_controller and the private
data has more benefits than just memory savings: Having them adjacent
in memory reduces the chance of cache misses. Also, one can get from
one struct to the other with a cheap subtraction (using container_of())
instead of having to chase pointers. So it helps performance. And a
lack of pointers arguably helps security.

Most subsystems embed the controller struct in the private data, but
there *is* precedent for doing it the other way round. E.g. the IIO
subsystem likewise appends the private data to the controller struct.
So I think that's fine, it need not and should not be changed.

The problem is that the ->probe() and ->remove() code is currently
asymmetric, which is unintuitive: On ->probe(), there's an allocation
step and a registration step:

	spi_alloc_master()
	spi_register_controller()

Whereas on ->remove(), there's no step to free the master which would
balance the prior alloc step:

	spi_unregister_controller()

That's because the spi_controller struct is ref-counted and the last
ref is usually dropped by spi_unregister_controller(). If the private
data is accessed after the spi_unregister_controller() step, a ref
needs to be held.

I maintain that it would be more intuitive to automatically hold a ref.
We could offer a devm_spi_alloc_master() function which holds this ref
and automatically releases it on unbind.

There are three drivers which call spi_alloc_master() with a size of zero
for the private data. In these three cases it is fine to free the
spi_controller struct upon spi_unregister_controller(). So these drivers
can continue to use spi_alloc_master(). All other drivers could be
changed to use the new devm_spi_alloc_master(), or I could scrutinize
each of them and convert to the new function only if necessary.

Does this sound more convincing to you?

Thanks,

Lukas
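
[Editorial sketch, not part of the thread.] The combined allocation being
defended here, with the private data appended to the controller struct at a
cache-aligned offset, can be sketched in userspace C. The layout mirrors the
"(void *)ctlr + ctlr_size" expression quoted earlier in the thread; everything
else is an assumption made for illustration: the model_* names, the struct
contents, the 64-byte alignment constant and the example clock value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Round x up to a multiple of a. */
#define MODEL_ALIGN(x, a)	(((x) + (a) - 1) / (a) * (a))
/* Assumed cache alignment for the model; the kernel queries this at runtime. */
#define MODEL_CACHE_ALIGN	64

/* Hypothetical stand-ins for struct spi_controller and the driver's data. */
struct model_ctlr {
	int bus_num;
};

struct model_priv {
	int clk_hz;
};

/* Offset of the private data from the start of the combined allocation. */
size_t model_priv_offset(void)
{
	return MODEL_ALIGN(sizeof(struct model_ctlr), MODEL_CACHE_ALIGN);
}

/* One allocation holds the controller followed by the private data. */
struct model_ctlr *model_alloc(size_t priv_size)
{
	return calloc(1, model_priv_offset() + priv_size);
}

/* Controller to private data: a fixed offset, no stored pointer needed. */
void *model_get_devdata(struct model_ctlr *ctlr)
{
	return (char *)ctlr + model_priv_offset();
}

/* Private data back to controller: the "cheap subtraction". */
struct model_ctlr *model_from_devdata(void *priv)
{
	return (struct model_ctlr *)((char *)priv - model_priv_offset());
}

/* Returns 1 if both conversions agree on the same allocation. */
int model_roundtrip_ok(void)
{
	struct model_ctlr *ctlr = model_alloc(sizeof(struct model_priv));
	struct model_priv *priv;
	int ok;

	if (!ctlr)
		return 0;

	priv = model_get_devdata(ctlr);
	priv->clk_hz = 125000000;	/* arbitrary example value */

	ok = model_from_devdata(priv) == ctlr &&
	     (size_t)((char *)priv - (char *)ctlr) == model_priv_offset();

	free(ctlr);	/* frees controller and private data together */
	return ok;
}
```

The single free() at the end is also the crux of the bug discussed in this
thread: once the controller's last reference is dropped, the private data
goes away with it.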

* Re: Use after free in bcm2835_spi_remove()
From: Mark Brown @ 2020-10-29 22:24 UTC
To: Lukas Wunner
Cc: Vladimir Oltean, Florian Fainelli, linux-kernel, linux-spi, Sascha Hauer

On Wed, Oct 28, 2020 at 10:59:46AM +0100, Lukas Wunner wrote:
> On Thu, Oct 15, 2020 at 01:53:35PM +0100, Mark Brown wrote:

> > This feels a bit icky - we're masking a standard use after free bug that
> > affects devm in general, not just this instance, and so while it will
> > work it doesn't feel great. If we did do this it'd need more comments

> A combined memory allocation for struct spi_controller and the private
> data has more benefits than just memory savings: Having them adjacent
> in memory reduces the chance of cache misses. Also, one can get from
> one struct to the other with a cheap subtraction (using container_of())
> instead of having to chase pointers. So it helps performance. And a
> lack of pointers arguably helps security.

The performance arguments don't seem super compelling either way TBH
given what SPI does, cache misses accessing the private data seem
unlikely to be perceptible when operations boil down to accesses on the
SPI bus.

> Most subsystems embed the controller struct in the private data, but
> there *is* precedent for doing it the other way round. E.g. the IIO
> subsystem likewise appends the private data to the controller struct.
> So I think that's fine, it need not and should not be changed.

Given their ages I suspect IIO copied SPI; I do think it's this reversal
that's confusing things.

> The problem is that the ->probe() and ->remove() code is currently
> asymmetric, which is unintuitive: On ->probe(), there's an allocation
> step and a registration step:

> 	spi_alloc_master()
> 	spi_register_controller()

> Whereas on ->remove(), there's no step to free the master which would
> balance the prior alloc step:

> 	spi_unregister_controller()

> That's because the spi_controller struct is ref-counted and the last
> ref is usually dropped by spi_unregister_controller(). If the private
> data is accessed after the spi_unregister_controller() step, a ref
> needs to be held.

I agree that it's the asymmetry here, the disagreement is about how to
fix it. If we keep the allocations combined then that probably makes
sense but I'm at best unclear on the merit of keeping the allocations
combined.

> I maintain that it would be more intuitive to automatically hold a ref.
> We could offer a devm_spi_alloc_master() function which holds this ref
> and automatically releases it on unbind.

I don't know that it's super intuitive to have to have an explicit free
in the driver - you could equally expect that having registered the
thing allocated with the core's custom allocation function with the core
that the core is now taking ownership of it (which is how SPI devices as
opposed to controllers work). That's what makes me lean towards just
doing separate allocations, there's no possibility of expectations about
transferring ownership. If it's *always* done with devm it kind of gets
hidden though so perhaps it's not so bad and my concern goes away...

> There are three drivers which call spi_alloc_master() with a size of zero
> for the private data. In these three cases it is fine to free the
> spi_controller struct upon spi_unregister_controller(). So these drivers
> can continue to use spi_alloc_master(). All other drivers could be
> changed to use the new devm_spi_alloc_master(), or I could scrutinize
> each of them and convert to the new function only if necessary.

It's only things that explicitly unregister the controller (rather than
using devm) that are going to be affected here, that's a *much* smaller
subset. Everything else will be done with driver specific code and
hence private data usage before the controller goes away, though it looks
like a bunch (though not all) of them have other issues and are using
devm when they shouldn't. That's a separate problem which ought to be
fixed anyway though. The removal paths aren't exactly heavily stressed,
especially not under load.

In any case for your proposal your plan makes sense, I mostly just want
to avoid ending up with people getting confused in the other direction
and introducing another set of bugs.

Thread overview: 10+ messages
2020-10-13 23:48 Use after free in bcm2835_spi_remove() Florian Fainelli
2020-10-14 14:09 ` Lukas Wunner
2020-10-14 19:40   ` Vladimir Oltean
2020-10-14 20:25     ` Mark Brown
2020-10-14 21:20       ` Florian Fainelli
2020-10-22 12:12         ` Lukas Wunner
2020-10-15  5:38       ` Lukas Wunner
2020-10-15 12:53         ` Mark Brown
2020-10-28  9:59           ` Lukas Wunner
2020-10-29 22:24             ` Mark Brown