From: Guenter Roeck <linux@roeck-us.net>
To: Daniel Lezcano <daniel.lezcano@linaro.org>, linux-pm@vger.kernel.org
Cc: "Rafael J . Wysocki" <rafael@kernel.org>,
	Amit Kucheria <amitk@kernel.org>, Zhang Rui <rui.zhang@intel.com>,
	Lukasz Luba <lukasz.luba@arm.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC/RFT PATCH resend] thermal: Protect thermal device operations against thermal device removal
Date: Tue, 4 Oct 2022 07:24:50 -0700
Message-ID: <50423dd3-a190-80e3-79ca-0a0328dfd4c1@roeck-us.net>
In-Reply-To: <fe6c90ea-7b19-36d9-2568-f484c54eafff@linaro.org>

On 10/4/22 04:49, Daniel Lezcano wrote:
> On 04/10/2022 05:39, Guenter Roeck wrote:
>> A call to thermal_zone_device_unregister() results in thermal device
>> removal. While the thermal device itself is reference counted and
>> protected against removal of its associated data structures, the
>> thermal device operations are owned by the calling code and unprotected.
>> This may result in crashes such as
>>
>> BUG: unable to handle page fault for address: ffffffffc04ef420
>>   #PF: supervisor read access in kernel mode
>>   #PF: error_code(0x0000) - not-present page
>> PGD 5d60e067 P4D 5d60e067 PUD 5d610067 PMD 110197067 PTE 0
>> Oops: 0000 [#1] PREEMPT SMP NOPTI
>> CPU: 1 PID: 3209 Comm: cat Tainted: G        W         5.10.136-19389-g615abc6eb807 #1 02df41ac0b12f3a64f4b34245188d8875bb3bce1
>> Hardware name: Google Coral/Coral, BIOS Google_Coral.10068.92.0 11/27/2018
>> RIP: 0010:thermal_zone_get_temp+0x26/0x73
>> Code: 89 c3 eb d3 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 53 48 85 ff 74 50 48 89 fb 48 81 ff 00 f0 ff ff 77 44 48 8b 83 98 03 00 00 <48> 83 78 10 00 74 36 49 89 f6 4c 8d bb d8 03 00 00 4c 89 ff e8 9f
>> RSP: 0018:ffffb3758138fd38 EFLAGS: 00010287
>> RAX: ffffffffc04ef410 RBX: ffff98f14d7fb000 RCX: 0000000000000000
>> RDX: ffff98f17cf90000 RSI: ffffb3758138fd64 RDI: ffff98f14d7fb000
>> RBP: ffffb3758138fd50 R08: 0000000000001000 R09: ffff98f17cf90000
>> R10: 0000000000000000 R11: ffffffff8dacad28 R12: 0000000000001000
>> R13: ffff98f1793a7d80 R14: ffff98f143231708 R15: ffff98f14d7fb018
>> FS:  00007ec166097800(0000) GS:ffff98f1bbd00000(0000) knlGS:0000000000000000
>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: ffffffffc04ef420 CR3: 000000010ee9a000 CR4: 00000000003506e0
>> Call Trace:
>>   temp_show+0x31/0x68
>>   dev_attr_show+0x1d/0x4f
>>   sysfs_kf_seq_show+0x92/0x107
>>   seq_read_iter+0xf5/0x3f2
>>   vfs_read+0x205/0x379
>>   __x64_sys_read+0x7c/0xe2
>>   do_syscall_64+0x43/0x55
>>   entry_SYSCALL_64_after_hwframe+0x61/0xc6
>>
>> if a thermal device is removed while accesses to its device attributes
>> are ongoing.
>>
>> Use the thermal device mutex to protect device operations. Clear the
>> device operations pointer in thermal_zone_device_unregister() under
>> protection of this mutex, and only access it while the mutex is held.
>> Flatten and simplify device mutex operations to only acquire the mutex
>> once and hold it instead of acquiring and releasing it several times
>> during thermal operations. Only validate parameters once at module entry
>> points after acquiring the mutex. Execute governor operations under mutex
>> instead of expecting governors to acquire and release it.
> 
> Does the following series:
> 
> https://lore.kernel.org/lkml/20220805153834.2510142-1-daniel.lezcano@linaro.org/
> 
> go in the same direction as your proposal?
> 

Thanks for the pointer.

The series simplifies the mutex handling, but it doesn't address the
problem I was trying to solve (the one causing the crash above). There
is still no guarantee that thermal device ops are not accessed after
the call to thermal_zone_device_unregister().
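
To illustrate the idea from the patch description (clear the ops pointer
under the device mutex on removal, and only dereference it while holding
that mutex), here is a minimal userspace sketch. It uses pthreads as a
stand-in for the kernel mutex, and every name in it (struct zone,
zone_get_temp(), zone_unregister(), fixed_ops) is hypothetical and not
taken from the patch or from the thermal core. It also assumes the zone
structure itself outlives all callers, which the commit message says is
covered by reference counting.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the kernel's thermal structures. */
struct zone_ops {
	int (*get_temp)(void *data, int *temp);
};

struct zone {
	pthread_mutex_t lock;
	const struct zone_ops *ops;	/* NULL once the zone is unregistered */
	void *data;			/* owned by the registering code */
};

void zone_init(struct zone *z, const struct zone_ops *ops, void *data)
{
	pthread_mutex_init(&z->lock, NULL);
	z->ops = ops;
	z->data = data;
}

/* Read path: ops is checked and dereferenced only while the lock is held. */
int zone_get_temp(struct zone *z, int *temp)
{
	int ret = -ENODEV;

	pthread_mutex_lock(&z->lock);
	if (z->ops && z->ops->get_temp)
		ret = z->ops->get_temp(z->data, temp);
	pthread_mutex_unlock(&z->lock);

	return ret;
}

/*
 * Removal path: clear ops under the same lock. Once this returns, no
 * reader can call into the owner's callbacks, so the owner may free them.
 */
void zone_unregister(struct zone *z)
{
	pthread_mutex_lock(&z->lock);
	z->ops = NULL;
	z->data = NULL;
	pthread_mutex_unlock(&z->lock);
}

/* Example callback, playing the role of the code that owns the ops. */
static int fixed_get_temp(void *data, int *temp)
{
	*temp = *(int *)data;
	return 0;
}

const struct zone_ops fixed_ops = {
	.get_temp = fixed_get_temp,
};
```

With this arrangement, a reader that races with removal either runs the
callback to completion before zone_unregister() can take the lock, or
sees ops == NULL and gets -ENODEV. It never dereferences a stale ops
pointer.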

Thanks,
Guenter
