From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751604AbdAYNgj (ORCPT ); Wed, 25 Jan 2017 08:36:39 -0500
Received: from mx2.suse.de ([195.135.220.15]:57638 "EHLO mx2.suse.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751386AbdAYNgh (ORCPT ); Wed, 25 Jan 2017 08:36:37 -0500
Date: Wed, 25 Jan 2017 14:36:31 +0100
From: "Luis R. Rodriguez"
To: Greg KH
Cc: "Luis R. Rodriguez", ming.lei@canonical.com, keescook@chromium.org,
	linux-kernel-dev@beckhoff.com, jakub.kicinski@netronome.com,
	chris@chris-wilson.co.uk, oss-drivers@netronome.com,
	johannes@sipsolutions.net, j@w1.fi, teg@jklm.no, kay@vrfy.org,
	jwboyer@fedoraproject.org, dmitry.torokhov@gmail.com,
	seth.forshee@canonical.com, bjorn.andersson@linaro.org,
	linux-kernel@vger.kernel.org, wagi@monom.org, stephen.boyd@linaro.org,
	zohar@linux.vnet.ibm.com, tiwai@suse.de, dwmw2@infradead.org,
	fengguang.wu@intel.com, dhowells@redhat.com,
	arend.vanspriel@broadcom.com, kvalo@codeaurora.org, "[3.10+]"
Subject: Re: [PATCH 7/7] firmware: firmware: fix NULL pointer dereference in
	__fw_load_abort()
Message-ID: <20170125133631.GP13946@wotan.suse.de>
References: <20170118200141.GH13946@wotan.suse.de>
	<20170123161111.5925-1-mcgrof@kernel.org>
	<20170123161111.5925-8-mcgrof@kernel.org>
	<20170125105204.GA5919@kroah.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170125105204.GA5919@kroah.com>
User-Agent: Mutt/1.6.0 (2016-04-01)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: 
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Jan 25, 2017 at 11:52:04AM +0100, Greg KH wrote:
> On Mon, Jan 23, 2017 at 08:11:11AM -0800, Luis R. Rodriguez wrote:
> > Since commit 5d47ec02c37ea632398cb251c884e3a488dff794
> > ("firmware: Correct handling of fw_state_wait() return value")
> > fw_load_abort(fw_priv) could be called twice and lead us to a
> > kernel crash.
> > This happens only when the firmware fallback mechanism
> > (regular or custom) is used. The fallback mechanism exposes a sysfs
> > interface for userspace to upload a file and notify the kernel when
> > the file is loaded and ready, or to cancel an upload by echo'ing -1
> > into the loading file:
> >
> > echo -n "-1" > /sys/$DEVPATH/loading
> >
> > This will call fw_load_abort(). Some distributions (Debian, OpenSUSE)
> > actually have a udev rule in place to *always* immediately cancel all
> > firmware fallback mechanism requests:
> >
> > $ cat /lib/udev/rules.d/50-firmware.rules
> > # stub for immediately telling the kernel that userspace firmware loading
> > # failed; necessary to avoid long timeouts with CONFIG_FW_LOADER_USER_HELPER=y
> > SUBSYSTEM=="firmware", ACTION=="add", ATTR{loading}="-1"
> >
> > This was done because udev removed the firmware fallback mechanism a
> > while ago, and because of a long-standing, misunderstood issue with the
> > timeout (now corrected). Distributions with this udev rule would run
> > into this crash only if the fallback mechanism is used. Since most
> > distributions disable the fallback mechanism by default
> > (CONFIG_FW_LOADER_USER_HELPER_FALLBACK), this would typically mean that
> > only the 2 drivers which *require* the fallback mechanism could incur
> > a crash: drivers/firmware/dell_rbu.c and drivers/leds/leds-lp55xx-common.c.
> >
> > Distributions enabling CONFIG_FW_LOADER_USER_HELPER_FALLBACK are clearly
> > more exposed, as every file not found through a firmware request will
> > use the fallback mechanism.
> >
> > The crash happens because, after commit 5b029624948d ("firmware: do not
> > use fw_lock for fw_state protection") and the subsequent fix commit
> > 5d47ec02c37ea6 ("firmware: Correct handling of fw_state_wait() return
> > value"), a race can happen between this cancellation and
> > fw_state_wait_timeout() being woken up by the state change made in
> > fw_load_abort(), which calls swake_up(). When fw_state_wait_timeout()
> > returns an error, fw_load_abort() is called again and triggers a NULL
> > pointer dereference.
> >
> > At first glance we could just fix this with a !buf check in
> > fw_load_abort() before accessing buf->fw_st; however, there is a
> > logical issue in having a state machine for the fallback mechanism
> > and then preventing access to it once we abort, as it lives inside
> > the buf (buf->fw_st).
> >
> > The firmware_class.c code sets buf to NULL to annotate that an abort
> > has occurred. Replace this mechanism by simply checking the state
> > instead. All the other code in place already uses similar checks for
> > aborting, so no further changes are needed.
> >
> > An oops can be reproduced with the new fw_fallback.sh fallback
> > mechanism cancellation test. Cancelling either the regular fallback
> > mechanism or the custom fallback mechanism triggers a crash.
> >
> > mcgrof@piggy ~/linux-next/tools/testing/selftests/firmware
> > (git::20170111-fw-fixes)$ sudo ./fw_fallback.sh
> >
> > ./fw_fallback.sh: timeout works
> > ./fw_fallback.sh: firmware comparison works
> > ./fw_fallback.sh: fallback mechanism works
> >
> > [ this then sits here when it is trying the cancellation test ]
> >
> > Kernel log:
> >
> > [ 36.750521] test_firmware: loading 'nope-test-firmware.bin'
> > [ 36.751144] misc test_firmware: Direct firmware load for nope-test-firmware.bin failed with error -2
> > [ 36.752034] misc test_firmware: Falling back to user helper
> > [ 36.853324] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
> > [ 36.854221] IP: _request_firmware+0xa27/0xad0
> > [ 36.854671] PGD 0
> > [ 36.854672]
> > [ 36.855081] Oops: 0000 [#1] SMP
> > [ 36.855433] Modules linked in: test_firmware(E) ... etc ...
> > [ 36.857802] CPU: 1 PID: 1396 Comm: fw_fallback.sh Tainted: G W E 4.10.0-rc3-next-20170111+ #30
> > [ 36.857802] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.1-0-g8891697-prebuilt.qemu-project.org 04/01/2014
> > [ 36.857802] task: ffff9740b27f4340 task.stack: ffffbb15c0bc8000
> > [ 36.857802] RIP: 0010:_request_firmware+0xa27/0xad0
> > [ 36.857802] RSP: 0018:ffffbb15c0bcbd10 EFLAGS: 00010246
> > [ 36.857802] RAX: 00000000fffffffe RBX: ffff9740afe5aa80 RCX: 0000000000000000
> > [ 36.857802] RDX: ffff9740b27f4340 RSI: 0000000000000283 RDI: 0000000000000000
> > [ 36.857802] RBP: ffffbb15c0bcbd90 R08: ffffbb15c0bcbcd8 R09: 0000000000000000
> > [ 36.857802] R10: 0000000894a0d4b1 R11: 000000000000008c R12: ffffffffc0312480
> > [ 36.857802] R13: 0000000000000005 R14: ffff9740b1c32400 R15: 00000000000003e8
> > [ 36.857802] FS: 00007f8604422700(0000) GS:ffff9740bfc80000(0000) knlGS:0000000000000000
> > [ 36.857802] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 36.857802] CR2: 0000000000000038 CR3: 000000012164c000 CR4: 00000000000006e0
> > [ 36.857802] Call Trace:
> > [ 36.857802] request_firmware+0x37/0x50
> > [ 36.857802] trigger_request_store+0x79/0xd0 [test_firmware]
> > [ 36.857802] dev_attr_store+0x18/0x30
> > [ 36.857802] sysfs_kf_write+0x37/0x40
> > [ 36.857802] kernfs_fop_write+0x110/0x1a0
> > [ 36.857802] __vfs_write+0x37/0x160
> > [ 36.857802] ? _cond_resched+0x1a/0x50
> > [ 36.857802] vfs_write+0xb5/0x1a0
> > [ 36.857802] SyS_write+0x55/0xc0
> > [ 36.857802] ? trace_do_page_fault+0x37/0xd0
> > [ 36.857802] entry_SYSCALL_64_fastpath+0x1e/0xad
> > [ 36.857802] RIP: 0033:0x7f8603f49620
> > [ 36.857802] RSP: 002b:00007fff6287b788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> > [ 36.857802] RAX: ffffffffffffffda RBX: 000055c307b110a0 RCX: 00007f8603f49620
> > [ 36.857802] RDX: 0000000000000016 RSI: 000055c3084d8a90 RDI: 0000000000000001
> > [ 36.857802] RBP: 0000000000000016 R08: 000000000000c0ff R09: 000055c3084d6336
> > [ 36.857802] R10: 000055c307b108b0 R11: 0000000000000246 R12: 000055c307b13c80
> > [ 36.857802] R13: 000055c3084d6320 R14: 0000000000000000 R15: 00007fff6287b950
> > [ 36.857802] Code: 9f 64 84 e8 9c 61 fe ff b8 f4 ff ff ff e9 6b f9 ff
> > ff 48 c7 c7 40 6b 8d 84 89 45 a8 e8 43 84 18 00 49 8b be 00 03 00 00 8b
> > 45 a8 <83> 7f 38 02 74 08 e8 6e ec ff ff 8b 45 a8 49 c7 86 00 03 00 00
> > [ 36.857802] RIP: _request_firmware+0xa27/0xad0 RSP: ffffbb15c0bcbd10
> > [ 36.857802] CR2: 0000000000000038
> > [ 36.872685] ---[ end trace 6d94ac339c133e6f ]---
> >
> > In the above case, the call hierarchy that causes the crash looks as follows:
> >
> > lib/test_firmware.c request_firmware()
> >   -> fw_load_from_user_helper()
> >     -> _request_firmware_load()
> >       -> call fw_state_wait_timeout()
> >
> > Some time later firmware_loading_store() scans a control value of "-1"
> >   -> switch(loading) case -1: will call
> >     -> fw_load_abort(fw_priv) which calls
> >       -> __fw_load_abort(fw_priv->buf)
> >       -> and sets fw_priv->buf = NULL;
> >
> > Upon being woken up via swake_up(), back in _request_firmware_load()
> > fw_state_wait_timeout() returns -ENOENT
> >   -> since the mentioned commit
> >   -> fw_load_abort(fw_priv) is called a second time
> >   -> and this time it would call:
> >     -> __fw_load_abort(NULL /* fw_priv->buf */)
> >     -> and we get: NULL->fw_st.status
> >
> > Fixes: 5d47ec02c37e ("firmware: Correct handling of fw_state_wait() return value")
> > Reported-and-Tested-by: Jakub Kicinski
> > Reported-and-Tested-by: Patrick Bruenn
> > Reported-by: Chris Wilson
> > CC: [3.10+] Note: 3.10+
> > Signed-off-by: Luis R. Rodriguez
> > ---
> >  drivers/base/firmware_class.c | 5 +----
> >  1 file changed, 1 insertion(+), 4 deletions(-)
>
> Why is this patch 7/7?

Without the tests available in a development tree, one cannot easily
reproduce the crash.

> Shouldn't it go into 4.10-final now?  Why wait
> for 4.11-rc1?

Certainly, it should go into 4.10 now, sorry if it seemed otherwise.

  Luis