linux-kernel.vger.kernel.org archive mirror
From: Daniel Scally <djrscally@gmail.com>
To: Andy Shevchenko <andriy.shevchenko@linux.intel.com>,
	Hans de Goede <hdegoede@redhat.com>
Cc: kernel test robot <oliver.sang@intel.com>,
	lkp@lists.01.org, lkp@intel.com, linux-kernel@vger.kernel.org,
	gregkh@linuxfoundation.org, rafael@kernel.org
Subject: Re: [device property] 995fe757ec: BUG:kernel_NULL_pointer_dereference,address
Date: Wed, 17 Nov 2021 00:10:58 +0000
Message-ID: <606a6bf2-e971-ddfe-74b0-cbc2b76935ba@gmail.com>
In-Reply-To: <YZPjf1GfZHR2ZjpD@smile.fi.intel.com>

Hi Hans, Andy

On 16/11/2021 16:59, Andy Shevchenko wrote:
> On Tue, Nov 16, 2021 at 03:55:00PM +0100, Hans de Goede wrote:
>> On 11/16/21 08:41, kernel test robot wrote:
>>> FYI, we noticed the following commit (built with gcc-9):
>>>
>>> commit: 995fe757ecaeac44e023458af64d27655f9dbf73 ("[PATCH] device property: Check fwnode->secondary when finding properties")
>>> url: https://github.com/0day-ci/linux/commits/Daniel-Scally/device-property-Check-fwnode-secondary-when-finding-properties/20211114-044259
>>> base: https://git.kernel.org/cgit/linux/kernel/git/gregkh/driver-core.git b5013d084e03e82ceeab4db8ae8ceeaebe76b0eb
>>> patch link: https://lore.kernel.org/lkml/20211113204141.520924-1-djrscally@gmail.com
>>>
>>> in testcase: boot
>>>
>>> on test machine: qemu-system-i386 -enable-kvm -cpu SandyBridge -smp 2 -m 4G
>>>
>>> caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
>>>
>>>
>>> +---------------------------------------------+------------+------------+
>>> |                                             | b5013d084e | 995fe757ec |
>>> +---------------------------------------------+------------+------------+
>>> | boot_successes                              | 23         | 0          |
>>> | boot_failures                               | 0          | 22         |
>>> | BUG:kernel_NULL_pointer_dereference,address | 0          | 22         |
>>> | Oops:#[##]                                  | 0          | 22         |
>>> | EIP:fwnode_property_get_reference_args      | 0          | 22         |
>>> | Kernel_panic-not_syncing:Fatal_exception    | 0          | 22         |
>>> +---------------------------------------------+------------+------------+
>>>
>>>
>>> If you fix the issue, kindly add following tag
>>> Reported-by: kernel test robot <oliver.sang@intel.com>
>> Ok, so this patch likely needs a v2 which changes the if to this:
>>
>>         if (ret == -EINVAL && !IS_ERR_OR_NULL(fwnode) &&
>>             !IS_ERR_OR_NULL(fwnode->secondary))
>>                 ret = fwnode_call_int_op(fwnode->secondary, get_reference_args,
>>                                          prop, nargs_prop, nargs, index, args);
>>
>>
>> So that we check fwnode before dereferencing it, note this also changes the
>> (ret < 0) check to (ret == -EINVAL), this makes the secondary node handling
>> identical to fwnode_property_read_int_array() and
>> fwnode_property_read_string_array()
>>
>> Danny, can you send a v2 with this change please?
> Hmm... So, you are suggesting that we need to check it only for EINVAL and
> ENOENT in this case the one that brings us to the NULL pointer dereference.
> But I don't understand what's the difference here.


One sticking point: the ACPI version of .get_reference_args() returns
-ENOENT (converted from -EINVAL [1]) if the property you ask for doesn't
exist against that fwnode, which, unless I'm missing something, means a
check for ret == -EINVAL won't work in our use case. This confused me
for a while because we definitely call fwnode_property_read_int_array()
in sensor driver probes through v4l2_fwnode_endpoint_alloc_parse(), but
it turns out the ACPI version of _that_ operation has no matching
conversion of the error code. When it fails to find the property it
sends back -EINVAL, so the check that currently exists in
fwnode_property_read_int_array() works fine.
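
To make the mismatch concrete, here is a minimal userspace sketch. It is
not kernel code: every model_* name is made up for illustration, and the
only behaviour taken from the kernel is the -EINVAL to -ENOENT
conversion at [1] and the lack of one on the integer-array path.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the raw ACPI lookup: missing property gives -EINVAL. */
static int model_raw_lookup(bool present)
{
	return present ? 0 : -EINVAL;
}

/* Models the ACPI .get_reference_args() path, which converts -EINVAL to -ENOENT. */
static int model_get_reference_args(bool present)
{
	int ret = model_raw_lookup(present);

	return ret == -EINVAL ? -ENOENT : ret;
}

/* Models the ACPI integer-array read path, which passes -EINVAL through. */
static int model_read_int_array(bool present)
{
	return model_raw_lookup(present);
}

/* Condition in the style of the existing read_int_array() secondary check. */
static bool model_retry_on_einval(int ret)
{
	return ret == -EINVAL;
}

/* Broader condition: retry on the secondary for any negative return. */
static bool model_retry_on_any_error(int ret)
{
	return ret < 0;
}

/* Small self-check showing which condition reaches the secondary node. */
static void model_errno_self_check(void)
{
	/* Missing int-array property: -EINVAL, so the narrow check retries. */
	assert(model_retry_on_einval(model_read_int_array(false)));
	/* Missing reference property: -ENOENT, so the narrow check does not. */
	assert(!model_retry_on_einval(model_get_reference_args(false)));
	/* The broad check retries in both cases. */
	assert(model_retry_on_any_error(model_get_reference_args(false)));
}
```

With the narrow condition the secondary is never consulted for a missing
reference property, which is exactly the case we care about.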


We could align them all to if (ret < 0 && !IS_ERR_OR_NULL(fwnode) &&
!IS_ERR_OR_NULL(fwnode->secondary)). This is probably my preferred
option, because I can't really see why we'd only want to do the
secondary check on -EINVAL anyway, but maybe I'm missing something here.
Alternatively we can take Hans's suggestion so they all match the
existing code, but that means we have to deal with that -ENOENT
conversion first. I couldn't see from a cursory look that any of the
direct callers check the value of the return beyond "is it 0?", but of
course such a check could be done somewhere in calls to the
fwnode->ops->get_reference_args() callback instead.
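
For reference, a rough userspace model of the first option (ret < 0 plus
the NULL/error-pointer guards) follows. It is only a sketch: struct
model_fwnode, the callbacks and model_is_err_or_null() are hypothetical
simplifications, and returning -EINVAL for an invalid node is an
assumption about how the primary call behaves.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct model_fwnode;
typedef int (*model_get_ref_fn)(const struct model_fwnode *node);

/* Hypothetical, heavily simplified stand-in for struct fwnode_handle. */
struct model_fwnode {
	model_get_ref_fn get_reference_args;
	const struct model_fwnode *secondary;
};

/* Userspace approximation of IS_ERR_OR_NULL(): NULL or the top 4095 addresses. */
static int model_is_err_or_null(const void *ptr)
{
	return !ptr || (uintptr_t)ptr >= (uintptr_t)-4095;
}

/* Callback for a node that lacks the property (ACPI-style -ENOENT). */
static int model_prop_missing(const struct model_fwnode *node)
{
	(void)node;
	return -ENOENT;
}

/* Callback for a node that has the property. */
static int model_prop_found(const struct model_fwnode *node)
{
	(void)node;
	return 0;
}

/* Assumption: an invalid node yields -EINVAL without being dereferenced. */
static int model_call_op(const struct model_fwnode *node)
{
	if (model_is_err_or_null(node))
		return -EINVAL;
	return node->get_reference_args(node);
}

/* Option 1: fall back to the secondary on any error, but only if both are valid. */
static int model_get_reference(const struct model_fwnode *node)
{
	int ret = model_call_op(node);

	if (ret < 0 && !model_is_err_or_null(node) &&
	    !model_is_err_or_null(node->secondary))
		ret = model_call_op(node->secondary);

	return ret;
}

/* Small self-check covering the NULL node, fallback and no-secondary cases. */
static void model_fallback_self_check(void)
{
	const struct model_fwnode sec = { model_prop_found, NULL };
	const struct model_fwnode pri = { model_prop_missing, &sec };
	const struct model_fwnode lone = { model_prop_missing, NULL };

	assert(model_get_reference(NULL) == -EINVAL);  /* no dereference */
	assert(model_get_reference(&pri) == 0);        /* secondary used */
	assert(model_get_reference(&lone) == -ENOENT); /* nothing to try */
}
```

The guard on the node itself is what avoids the dereference the test
robot hit; the ret < 0 part is what lets the -ENOENT case reach the
secondary.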


Thoughts?


[1]
https://elixir.bootlin.com/linux/latest/source/drivers/acpi/property.c#L680

>
>>> [   17.327851][    T7] BUG: kernel NULL pointer dereference, address: 00000000
>>> [   17.329758][    T7] #PF: supervisor read access in kernel mode
>>> [   17.331371][    T7] #PF: error_code(0x0000) - not-present page
>>> [   17.332992][    T7] *pde = 00000000
>>> [   17.334107][    T7] Oops: 0000 [#1] PREEMPT
>>> [   17.335310][    T7] CPU: 0 PID: 7 Comm: kworker/u2:0 Tainted: G S                5.15.0-11191-g995fe757ecae #1
>>> [   17.338036][    T7] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
>>> [   17.340544][    T7] Workqueue: events_unbound deferred_probe_work_func
>>> [ 17.342291][ T7] EIP: fwnode_property_get_reference_args (drivers/base/property.c:486 (discriminator 1)) 
>>> [ 17.344051][ T7] Code: 8b 45 0c 50 8b 45 08 50 89 d8 89 55 f4 ff d6 83 c4 0c 89 c6 85 c0 78 55 8d 65 f8 89 f0 5b 5e 5d c3 8d 74 26 00 be fa ff ff ff <8b> 03 85 c0 74 e8 3d 00 f0 ff ff 77 e1 8b 58 04 85 db 74 37 8b 5b
>>> All code
>>> ========
>>>    0:	8b 45 0c             	mov    0xc(%rbp),%eax
>>>    3:	50                   	push   %rax
>>>    4:	8b 45 08             	mov    0x8(%rbp),%eax
>>>    7:	50                   	push   %rax
>>>    8:	89 d8                	mov    %ebx,%eax
>>>    a:	89 55 f4             	mov    %edx,-0xc(%rbp)
>>>    d:	ff d6                	callq  *%rsi
>>>    f:	83 c4 0c             	add    $0xc,%esp
>>>   12:	89 c6                	mov    %eax,%esi
>>>   14:	85 c0                	test   %eax,%eax
>>>   16:	78 55                	js     0x6d
>>>   18:	8d 65 f8             	lea    -0x8(%rbp),%esp
>>>   1b:	89 f0                	mov    %esi,%eax
>>>   1d:	5b                   	pop    %rbx
>>>   1e:	5e                   	pop    %rsi
>>>   1f:	5d                   	pop    %rbp
>>>   20:	c3                   	retq   
>>>   21:	8d 74 26 00          	lea    0x0(%rsi,%riz,1),%esi
>>>   25:	be fa ff ff ff       	mov    $0xfffffffa,%esi
>>>   2a:*	8b 03                	mov    (%rbx),%eax		<-- trapping instruction
>>>   2c:	85 c0                	test   %eax,%eax
>>>   2e:	74 e8                	je     0x18
>>>   30:	3d 00 f0 ff ff       	cmp    $0xfffff000,%eax
>>>   35:	77 e1                	ja     0x18
>>>   37:	8b 58 04             	mov    0x4(%rax),%ebx
>>>   3a:	85 db                	test   %ebx,%ebx
>>>   3c:	74 37                	je     0x75
>>>   3e:	8b                   	.byte 0x8b
>>>   3f:	5b                   	pop    %rbx
>>>
>>> Code starting with the faulting instruction
>>> ===========================================
>>>    0:	8b 03                	mov    (%rbx),%eax
>>>    2:	85 c0                	test   %eax,%eax
>>>    4:	74 e8                	je     0xffffffffffffffee
>>>    6:	3d 00 f0 ff ff       	cmp    $0xfffff000,%eax
>>>    b:	77 e1                	ja     0xffffffffffffffee
>>>    d:	8b 58 04             	mov    0x4(%rax),%ebx
>>>   10:	85 db                	test   %ebx,%ebx
>>>   12:	74 37                	je     0x4b
>>>   14:	8b                   	.byte 0x8b
>>>   15:	5b                   	pop    %rbx
>>> [   17.350847][    T7] EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: c37cd6d8
>>> [   17.352783][    T7] ESI: ffffffea EDI: f5b5a400 EBP: c4cffd24 ESP: c4cffd14
>>> [   17.354673][    T7] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 EFLAGS: 00010246
>>> [   17.362075][    T7] CR0: 80050033 CR2: 00000000 CR3: 04206000 CR4: 00000690
>>> [   17.363993][    T7] Call Trace:
>>> [ 17.365018][ T7] fwnode_find_reference (drivers/base/property.c:514) 
>>> [ 17.366430][ T7] ? __this_cpu_preempt_check (lib/smp_processor_id.c:67) 
>>> [ 17.367825][ T7] ? lockdep_init_map_type (kernel/locking/lockdep.c:4813) 
>>> [ 17.369325][ T7] ? phylink_run_resolve+0x20/0x20 
>>> [ 17.370897][ T7] ? init_timer_key (kernel/time/timer.c:818) 
>>> [ 17.372228][ T7] fwnode_get_phy_node (drivers/net/phy/phy_device.c:2986) 
>>> [ 17.373574][ T7] phylink_fwnode_phy_connect (drivers/net/phy/phylink.c:1180 drivers/net/phy/phylink.c:1166) 
>>> [ 17.375014][ T7] phylink_of_phy_connect (drivers/net/phy/phylink.c:1152) 
>>> [ 17.376373][ T7] dsa_slave_create (net/dsa/slave.c:1889 net/dsa/slave.c:2036) 
>>> [ 17.377765][ T7] dsa_tree_setup_switches (net/dsa/dsa2.c:477 net/dsa/dsa2.c:977) 
>>> [ 17.379282][ T7] dsa_register_switch (net/dsa/dsa2.c:1065 net/dsa/dsa2.c:1565 net/dsa/dsa2.c:1579) 
>>> [ 17.380762][ T7] dsa_loop_drv_probe (drivers/net/dsa/dsa_loop.c:333) 
>>> [ 17.382137][ T7] mdio_probe (drivers/net/phy/mdio_device.c:157) 
