* Possible case of Race in kobject_get_path()
@ 2022-03-30 10:11 Mukesh Ojha
2022-03-30 10:16 ` Greg KH
0 siblings, 1 reply; 9+ messages in thread
From: Mukesh Ojha @ 2022-03-30 10:11 UTC (permalink / raw)
To: Rafael J. Wysocki, gregkh, linux-remoteproc
Hi All,
We are facing one issue where one driver (p1) is trying to register its
device from driver probe
and from another path (p2) dev_set_name(new name) done from driver probe
of the added device whose
new name length can be more than earlier done in (p1 path) which could
result in redzone overwritten issue.
Can we get your suggestion here ? is this case of a race here ?
p1 p2
device_register()
kobject_get_path()
=> get_kobj_path_length
(length is calculated from this path)
dev_set_name()
=> fill_kobj_path
(path is copied here)
Thanks,
-Mukesh
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Possible case of Race in kobject_get_path()
2022-03-30 10:11 Possible case of Race in kobject_get_path() Mukesh Ojha
@ 2022-03-30 10:16 ` Greg KH
[not found] ` <148f4e9b-0584-ccc1-1443-30d41190dfd3@quicinc.com>
0 siblings, 1 reply; 9+ messages in thread
From: Greg KH @ 2022-03-30 10:16 UTC (permalink / raw)
To: Mukesh Ojha; +Cc: Rafael J. Wysocki, linux-remoteproc
On Wed, Mar 30, 2022 at 03:41:04PM +0530, Mukesh Ojha wrote:
> Hi All,
>
> We are facing one issue where one driver (p1) is trying to register its
> device from driver probe
> and from another path (p2) dev_set_name(new name) done from driver probe of
> the added device whose
> new name length can be more than earlier done in (p1 path) which could
> result in redzone overwritten issue.
I do not understand, what specific driver is doing this so that we can
see an example of this problem?
> Can we get your suggestion here ? is this case of a race here ?
>
> p1 p2
>
> device_register()
> kobject_get_path()
> => get_kobj_path_length
> (length is calculated from this path)
>
> dev_set_name()
>
> => fill_kobj_path
> (path is copied here)
>
I can not understand this example, any specific code you can point me
at?
thanks,
greg k-h
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Possible case of Race in kobject_get_path()
[not found] ` <148f4e9b-0584-ccc1-1443-30d41190dfd3@quicinc.com>
@ 2022-03-30 11:01 ` Greg KH
2022-03-30 11:14 ` Mukesh Ojha
0 siblings, 1 reply; 9+ messages in thread
From: Greg KH @ 2022-03-30 11:01 UTC (permalink / raw)
To: Mukesh Ojha; +Cc: Rafael J. Wysocki, linux-kernel
On Wed, Mar 30, 2022 at 04:14:11PM +0530, Mukesh Ojha wrote:
>
> On 3/30/2022 3:46 PM, Greg KH wrote:
> > On Wed, Mar 30, 2022 at 03:41:04PM +0530, Mukesh Ojha wrote:
> > > Hi All,
> > >
> > > We are facing one issue where one driver (p1) is trying to register its
> > > device from driver probe
> > > and from another path (p2) dev_set_name(new name) done from driver probe of
> > > the added device whose
> > > new name length can be more than earlier done in (p1 path) which could
> > > result in redzone overwritten issue.
> > I do not understand, what specific driver is doing this so that we can
> > see an example of this problem?
>
> >>
>
> trying to paste some logs of the race.
>
> [ 14.235820][ T503] BUG kmalloc-128 (Tainted: G S O ): Left Redzone
> overwritten
What kernel version is this?
> [ 14.244189][ T503]
> -----------------------------------------------------------------------------
> [ 14.255241][ T503] INFO: 0xffffff87caae0f7d-0xffffff87caae0f7f
> @offset=3965. First byte 0x2f instead of 0xcc
> [ 14.265176][ T503] INFO: Allocated in kobject_get_path+0x58/0x100 age=1
> cpu=0 pid=503
> [ 14.273111][ T503] kobject_get_path+0x58/0x100
> [ 14.277747][ T503] kobject_uevent_env+0x120/0x744
> [ 14.282639][ T503] device_add+0x98c/0x1028
> [ 14.286925][ T503] *device_register+0x24/0x38*
> [ 14.291395][ T503] slim_alloc_device+0xdc/0x108 [slimbus]
> [ 14.296992][ T503] slim_register_controller+0x210/0x2ac [slimbus]
> [ 14.303291][ T503] qcom_slim_ngd_probe+0x3c/0x348 [slim_qcom_ngd_ctrl]
> [ 14.310007][ T503] platform_drv_probe+0x60/0x180
> [ 14.314812][ T503] really_probe+0x208/0xb64
> [ 14.319184][ T503] driver_probe_device+0x130/0x254
> [ 14.324159][ T503] __device_attach_driver+0x1e8/0x324
> [ 14.329399][ T503] __device_attach+0x170/0x364
> [ 14.334035][ T503] bus_probe_device+0x4c/0x164
> [ 14.338671][ T503] device_add+0xa60/0x1028
> [ 14.342957][ T503] platform_device_add+0x260/0x2c8
> [ 14.347941][ T503] qcom_slim_ngd_ctrl_probe+0x71c/0x804
> [slim_qcom_ngd_ctrl]
> [ 14.355177][ T503] INFO: Freed in kobject_uevent_env+0x210/0x744 age=1
> cpu=4 pid=518
> [ 14.363018][ T503] do_init_module+0xac/0x494
> [ 14.367475][ T503] load_module+0x35c0/0x3e54
> [ 14.371930][ T503] __se_sys_finit_module+0x188/0x1c8
> [ 14.377086][ T503] __arm64_sys_finit_module+0x20/0x30
> [ 14.382328][ T503] el0_svc_common+0xdc/0x294
> [ 14.386786][ T503] el0_svc+0x38/0x9c
> [ 14.390552][ T503] el0_sync_handler+0x8c/0xf0
>
>
> [ 14.490994*][ T503] Redzone ffffff87caae0f7**0*: cc cc cc cc cc cc cc cc cc
> cc cc cc cc 2f 64 65 .............*/de*
> [ 14.501185][ T503] Object ffffff87caae0f80: 76 69 63 65 73 2f 70 6c 61 74
> 66 6f 72 6d 2f 73 vices/platform/s
> [ 14.511376][ T503] Object ffffff87caae0f90: 6f 63 2f 33 33 34 30 30 30 30
> 2e 73 6c 69 6d 2f oc/3340000.slim/
> [ 14.521566][ T503] Object ffffff87caae0fa0: 71 63 6f 6d 2c 73 6c 69 6d 2d
> 6e 67 64 2e 31 2f qcom,slim-ngd.1/
> [ 14.531757][ T503] Object ffffff87caae0fb0: 62 74 66 6d 73 6c 69 6d 5f 73
> 6c 61 76 65 00 00 *btfmslim_slave*..
>
>
>
> 499 static int btfm_slim_probe(struct slim_device *slim)
I do not see that function in Linus's tree right now. Where does it
come from?
thanks,
greg k-h
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Possible case of Race in kobject_get_path()
2022-03-30 11:01 ` Greg KH
@ 2022-03-30 11:14 ` Mukesh Ojha
2022-03-30 11:19 ` Greg KH
0 siblings, 1 reply; 9+ messages in thread
From: Mukesh Ojha @ 2022-03-30 11:14 UTC (permalink / raw)
To: Greg KH; +Cc: Rafael J. Wysocki, linux-kernel
On 3/30/2022 4:31 PM, Greg KH wrote:
> On Wed, Mar 30, 2022 at 04:14:11PM +0530, Mukesh Ojha wrote:
>> On 3/30/2022 3:46 PM, Greg KH wrote:
>>> On Wed, Mar 30, 2022 at 03:41:04PM +0530, Mukesh Ojha wrote:
>>>> Hi All,
>>>>
>>>> We are facing one issue where one driver (p1) is trying to register its
>>>> device from driver probe
>>>> and from another path (p2) dev_set_name(new name) done from driver probe of
>>>> the added device whose
>>>> new name length can be more than earlier done in (p1 path) which could
>>>> result in redzone overwritten issue.
>>> I do not understand, what specific driver is doing this so that we can
>>> see an example of this problem?
>> trying to paste some logs of the race.
>>
>> [ 14.235820][ T503] BUG kmalloc-128 (Tainted: G S O ): Left Redzone
>> overwritten
> What kernel version is this?
5.10
>
>> [ 14.244189][ T503]
>> -----------------------------------------------------------------------------
>> [ 14.255241][ T503] INFO: 0xffffff87caae0f7d-0xffffff87caae0f7f
>> @offset=3965. First byte 0x2f instead of 0xcc
>> [ 14.265176][ T503] INFO: Allocated in kobject_get_path+0x58/0x100 age=1
>> cpu=0 pid=503
>> [ 14.273111][ T503] kobject_get_path+0x58/0x100
>> [ 14.277747][ T503] kobject_uevent_env+0x120/0x744
>> [ 14.282639][ T503] device_add+0x98c/0x1028
>> [ 14.286925][ T503] *device_register+0x24/0x38*
>> [ 14.291395][ T503] slim_alloc_device+0xdc/0x108 [slimbus]
>> [ 14.296992][ T503] slim_register_controller+0x210/0x2ac [slimbus]
>> [ 14.303291][ T503] qcom_slim_ngd_probe+0x3c/0x348 [slim_qcom_ngd_ctrl]
>> [ 14.310007][ T503] platform_drv_probe+0x60/0x180
>> [ 14.314812][ T503] really_probe+0x208/0xb64
>> [ 14.319184][ T503] driver_probe_device+0x130/0x254
>> [ 14.324159][ T503] __device_attach_driver+0x1e8/0x324
>> [ 14.329399][ T503] __device_attach+0x170/0x364
>> [ 14.334035][ T503] bus_probe_device+0x4c/0x164
>> [ 14.338671][ T503] device_add+0xa60/0x1028
>> [ 14.342957][ T503] platform_device_add+0x260/0x2c8
>> [ 14.347941][ T503] qcom_slim_ngd_ctrl_probe+0x71c/0x804
>> [slim_qcom_ngd_ctrl]
>> [ 14.355177][ T503] INFO: Freed in kobject_uevent_env+0x210/0x744 age=1
>> cpu=4 pid=518
>> [ 14.363018][ T503] do_init_module+0xac/0x494
>> [ 14.367475][ T503] load_module+0x35c0/0x3e54
>> [ 14.371930][ T503] __se_sys_finit_module+0x188/0x1c8
>> [ 14.377086][ T503] __arm64_sys_finit_module+0x20/0x30
>> [ 14.382328][ T503] el0_svc_common+0xdc/0x294
>> [ 14.386786][ T503] el0_svc+0x38/0x9c
>> [ 14.390552][ T503] el0_sync_handler+0x8c/0xf0
>>
>>
>> [ 14.490994*][ T503] Redzone ffffff87caae0f7**0*: cc cc cc cc cc cc cc cc cc
>> cc cc cc cc 2f 64 65 .............*/de*
>> [ 14.501185][ T503] Object ffffff87caae0f80: 76 69 63 65 73 2f 70 6c 61 74
>> 66 6f 72 6d 2f 73 vices/platform/s
>> [ 14.511376][ T503] Object ffffff87caae0f90: 6f 63 2f 33 33 34 30 30 30 30
>> 2e 73 6c 69 6d 2f oc/3340000.slim/
>> [ 14.521566][ T503] Object ffffff87caae0fa0: 71 63 6f 6d 2c 73 6c 69 6d 2d
>> 6e 67 64 2e 31 2f qcom,slim-ngd.1/
>> [ 14.531757][ T503] Object ffffff87caae0fb0: 62 74 66 6d 73 6c 69 6d 5f 73
>> 6c 61 76 65 00 00 *btfmslim_slave*..
>>
>>
>>
>> 499 static int btfm_slim_probe(struct slim_device *slim)
> I do not see that function in Linus's tree right now. Where does it
> come from?
Don't you think, it can be caused by any kernel module.
>
> thanks,
>
> greg k-h
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Possible case of Race in kobject_get_path()
2022-03-30 11:14 ` Mukesh Ojha
@ 2022-03-30 11:19 ` Greg KH
2022-03-30 11:23 ` Mukesh Ojha
0 siblings, 1 reply; 9+ messages in thread
From: Greg KH @ 2022-03-30 11:19 UTC (permalink / raw)
To: Mukesh Ojha; +Cc: Rafael J. Wysocki, linux-kernel
On Wed, Mar 30, 2022 at 04:44:26PM +0530, Mukesh Ojha wrote:
>
> On 3/30/2022 4:31 PM, Greg KH wrote:
> > On Wed, Mar 30, 2022 at 04:14:11PM +0530, Mukesh Ojha wrote:
> > > On 3/30/2022 3:46 PM, Greg KH wrote:
> > > > On Wed, Mar 30, 2022 at 03:41:04PM +0530, Mukesh Ojha wrote:
> > > > > Hi All,
> > > > >
> > > > > We are facing one issue where one driver (p1) is trying to register its
> > > > > device from driver probe
> > > > > and from another path (p2) dev_set_name(new name) done from driver probe of
> > > > > the added device whose
> > > > > new name length can be more than earlier done in (p1 path) which could
> > > > > result in redzone overwritten issue.
> > > > I do not understand, what specific driver is doing this so that we can
> > > > see an example of this problem?
> > > trying to paste some logs of the race.
> > >
> > > [ 14.235820][ T503] BUG kmalloc-128 (Tainted: G S O ): Left Redzone
> > > overwritten
> > What kernel version is this?
> 5.10
That is very old, is this an issue in 5.17?
> > > [ 14.244189][ T503]
> > > -----------------------------------------------------------------------------
> > > [ 14.255241][ T503] INFO: 0xffffff87caae0f7d-0xffffff87caae0f7f
> > > @offset=3965. First byte 0x2f instead of 0xcc
> > > [ 14.265176][ T503] INFO: Allocated in kobject_get_path+0x58/0x100 age=1
> > > cpu=0 pid=503
> > > [ 14.273111][ T503] kobject_get_path+0x58/0x100
> > > [ 14.277747][ T503] kobject_uevent_env+0x120/0x744
> > > [ 14.282639][ T503] device_add+0x98c/0x1028
> > > [ 14.286925][ T503] *device_register+0x24/0x38*
> > > [ 14.291395][ T503] slim_alloc_device+0xdc/0x108 [slimbus]
> > > [ 14.296992][ T503] slim_register_controller+0x210/0x2ac [slimbus]
> > > [ 14.303291][ T503] qcom_slim_ngd_probe+0x3c/0x348 [slim_qcom_ngd_ctrl]
> > > [ 14.310007][ T503] platform_drv_probe+0x60/0x180
> > > [ 14.314812][ T503] really_probe+0x208/0xb64
> > > [ 14.319184][ T503] driver_probe_device+0x130/0x254
> > > [ 14.324159][ T503] __device_attach_driver+0x1e8/0x324
> > > [ 14.329399][ T503] __device_attach+0x170/0x364
> > > [ 14.334035][ T503] bus_probe_device+0x4c/0x164
> > > [ 14.338671][ T503] device_add+0xa60/0x1028
> > > [ 14.342957][ T503] platform_device_add+0x260/0x2c8
> > > [ 14.347941][ T503] qcom_slim_ngd_ctrl_probe+0x71c/0x804
> > > [slim_qcom_ngd_ctrl]
> > > [ 14.355177][ T503] INFO: Freed in kobject_uevent_env+0x210/0x744 age=1
> > > cpu=4 pid=518
> > > [ 14.363018][ T503] do_init_module+0xac/0x494
> > > [ 14.367475][ T503] load_module+0x35c0/0x3e54
> > > [ 14.371930][ T503] __se_sys_finit_module+0x188/0x1c8
> > > [ 14.377086][ T503] __arm64_sys_finit_module+0x20/0x30
> > > [ 14.382328][ T503] el0_svc_common+0xdc/0x294
> > > [ 14.386786][ T503] el0_svc+0x38/0x9c
> > > [ 14.390552][ T503] el0_sync_handler+0x8c/0xf0
> > >
> > >
> > > [ 14.490994*][ T503] Redzone ffffff87caae0f7**0*: cc cc cc cc cc cc cc cc cc
> > > cc cc cc cc 2f 64 65 .............*/de*
> > > [ 14.501185][ T503] Object ffffff87caae0f80: 76 69 63 65 73 2f 70 6c 61 74
> > > 66 6f 72 6d 2f 73 vices/platform/s
> > > [ 14.511376][ T503] Object ffffff87caae0f90: 6f 63 2f 33 33 34 30 30 30 30
> > > 2e 73 6c 69 6d 2f oc/3340000.slim/
> > > [ 14.521566][ T503] Object ffffff87caae0fa0: 71 63 6f 6d 2c 73 6c 69 6d 2d
> > > 6e 67 64 2e 31 2f qcom,slim-ngd.1/
> > > [ 14.531757][ T503] Object ffffff87caae0fb0: 62 74 66 6d 73 6c 69 6d 5f 73
> > > 6c 61 76 65 00 00 *btfmslim_slave*..
> > >
> > >
> > >
> > > 499 static int btfm_slim_probe(struct slim_device *slim)
> > I do not see that function in Linus's tree right now. Where does it
> > come from?
> Don't you think, it can be caused by any kernel module.
Kernel code do not protect themselves against other kernel code doing
abusive things.
Perhaps fix your out-of-tree code to not do this if no in-kernel code is
doing this? Nothing we can do at all about out-of-tree code that we
have never seen.
Best of luck!
greg k-h
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Possible case of Race in kobject_get_path()
2022-03-30 11:19 ` Greg KH
@ 2022-03-30 11:23 ` Mukesh Ojha
2022-03-30 11:31 ` Greg KH
0 siblings, 1 reply; 9+ messages in thread
From: Mukesh Ojha @ 2022-03-30 11:23 UTC (permalink / raw)
To: Greg KH; +Cc: Rafael J. Wysocki, linux-kernel
On 3/30/2022 4:49 PM, Greg KH wrote:
> On Wed, Mar 30, 2022 at 04:44:26PM +0530, Mukesh Ojha wrote:
>> On 3/30/2022 4:31 PM, Greg KH wrote:
>>> On Wed, Mar 30, 2022 at 04:14:11PM +0530, Mukesh Ojha wrote:
>>>> On 3/30/2022 3:46 PM, Greg KH wrote:
>>>>> On Wed, Mar 30, 2022 at 03:41:04PM +0530, Mukesh Ojha wrote:
>>>>>> Hi All,
>>>>>>
>>>>>> We are facing one issue where one driver (p1) is trying to register its
>>>>>> device from driver probe
>>>>>> and from another path (p2) dev_set_name(new name) done from driver probe of
>>>>>> the added device whose
>>>>>> new name length can be more than earlier done in (p1 path) which could
>>>>>> result in redzone overwritten issue.
>>>>> I do not understand, what specific driver is doing this so that we can
>>>>> see an example of this problem?
>>>> trying to paste some logs of the race.
>>>>
>>>> [ 14.235820][ T503] BUG kmalloc-128 (Tainted: G S O ): Left Redzone
>>>> overwritten
>>> What kernel version is this?
>> 5.10
> That is very old, is this an issue in 5.17?
>
>>>> [ 14.244189][ T503]
>>>> -----------------------------------------------------------------------------
>>>> [ 14.255241][ T503] INFO: 0xffffff87caae0f7d-0xffffff87caae0f7f
>>>> @offset=3965. First byte 0x2f instead of 0xcc
>>>> [ 14.265176][ T503] INFO: Allocated in kobject_get_path+0x58/0x100 age=1
>>>> cpu=0 pid=503
>>>> [ 14.273111][ T503] kobject_get_path+0x58/0x100
>>>> [ 14.277747][ T503] kobject_uevent_env+0x120/0x744
>>>> [ 14.282639][ T503] device_add+0x98c/0x1028
>>>> [ 14.286925][ T503] *device_register+0x24/0x38*
>>>> [ 14.291395][ T503] slim_alloc_device+0xdc/0x108 [slimbus]
>>>> [ 14.296992][ T503] slim_register_controller+0x210/0x2ac [slimbus]
>>>> [ 14.303291][ T503] qcom_slim_ngd_probe+0x3c/0x348 [slim_qcom_ngd_ctrl]
>>>> [ 14.310007][ T503] platform_drv_probe+0x60/0x180
>>>> [ 14.314812][ T503] really_probe+0x208/0xb64
>>>> [ 14.319184][ T503] driver_probe_device+0x130/0x254
>>>> [ 14.324159][ T503] __device_attach_driver+0x1e8/0x324
>>>> [ 14.329399][ T503] __device_attach+0x170/0x364
>>>> [ 14.334035][ T503] bus_probe_device+0x4c/0x164
>>>> [ 14.338671][ T503] device_add+0xa60/0x1028
>>>> [ 14.342957][ T503] platform_device_add+0x260/0x2c8
>>>> [ 14.347941][ T503] qcom_slim_ngd_ctrl_probe+0x71c/0x804
>>>> [slim_qcom_ngd_ctrl]
>>>> [ 14.355177][ T503] INFO: Freed in kobject_uevent_env+0x210/0x744 age=1
>>>> cpu=4 pid=518
>>>> [ 14.363018][ T503] do_init_module+0xac/0x494
>>>> [ 14.367475][ T503] load_module+0x35c0/0x3e54
>>>> [ 14.371930][ T503] __se_sys_finit_module+0x188/0x1c8
>>>> [ 14.377086][ T503] __arm64_sys_finit_module+0x20/0x30
>>>> [ 14.382328][ T503] el0_svc_common+0xdc/0x294
>>>> [ 14.386786][ T503] el0_svc+0x38/0x9c
>>>> [ 14.390552][ T503] el0_sync_handler+0x8c/0xf0
>>>>
>>>>
>>>> [ 14.490994*][ T503] Redzone ffffff87caae0f7**0*: cc cc cc cc cc cc cc cc cc
>>>> cc cc cc cc 2f 64 65 .............*/de*
>>>> [ 14.501185][ T503] Object ffffff87caae0f80: 76 69 63 65 73 2f 70 6c 61 74
>>>> 66 6f 72 6d 2f 73 vices/platform/s
>>>> [ 14.511376][ T503] Object ffffff87caae0f90: 6f 63 2f 33 33 34 30 30 30 30
>>>> 2e 73 6c 69 6d 2f oc/3340000.slim/
>>>> [ 14.521566][ T503] Object ffffff87caae0fa0: 71 63 6f 6d 2c 73 6c 69 6d 2d
>>>> 6e 67 64 2e 31 2f qcom,slim-ngd.1/
>>>> [ 14.531757][ T503] Object ffffff87caae0fb0: 62 74 66 6d 73 6c 69 6d 5f 73
>>>> 6c 61 76 65 00 00 *btfmslim_slave*..
>>>>
>>>>
>>>>
>>>> 499 static int btfm_slim_probe(struct slim_device *slim)
>>> I do not see that function in Linus's tree right now. Where does it
>>> come from?
>> Don't you think, it can be caused by any kernel module.
> Kernel code do not protect themselves against other kernel code doing
> abusive things.
>
> Perhaps fix your out-of-tree code to not do this if no in-kernel code is
> doing this? Nothing we can do at all about out-of-tree code that we
> have never seen.
Thanks for the comment, Greg.
Why do we make dev_set_name() make it available to module apart from the
core ?
-Mukesh
>
> Best of luck!
>
> greg k-h
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Possible case of Race in kobject_get_path()
2022-03-30 11:23 ` Mukesh Ojha
@ 2022-03-30 11:31 ` Greg KH
2022-03-30 12:55 ` Mukesh Ojha
0 siblings, 1 reply; 9+ messages in thread
From: Greg KH @ 2022-03-30 11:31 UTC (permalink / raw)
To: Mukesh Ojha; +Cc: Rafael J. Wysocki, linux-kernel
On Wed, Mar 30, 2022 at 04:53:41PM +0530, Mukesh Ojha wrote:
>
> On 3/30/2022 4:49 PM, Greg KH wrote:
> > On Wed, Mar 30, 2022 at 04:44:26PM +0530, Mukesh Ojha wrote:
> > > On 3/30/2022 4:31 PM, Greg KH wrote:
> > > > On Wed, Mar 30, 2022 at 04:14:11PM +0530, Mukesh Ojha wrote:
> > > > > On 3/30/2022 3:46 PM, Greg KH wrote:
> > > > > > On Wed, Mar 30, 2022 at 03:41:04PM +0530, Mukesh Ojha wrote:
> > > > > > > Hi All,
> > > > > > >
> > > > > > > We are facing one issue where one driver (p1) is trying to register its
> > > > > > > device from driver probe
> > > > > > > and from another path (p2) dev_set_name(new name) done from driver probe of
> > > > > > > the added device whose
> > > > > > > new name length can be more than earlier done in (p1 path) which could
> > > > > > > result in redzone overwritten issue.
> > > > > > I do not understand, what specific driver is doing this so that we can
> > > > > > see an example of this problem?
> > > > > trying to paste some logs of the race.
> > > > >
> > > > > [ 14.235820][ T503] BUG kmalloc-128 (Tainted: G S O ): Left Redzone
> > > > > overwritten
> > > > What kernel version is this?
> > > 5.10
> > That is very old, is this an issue in 5.17?
> >
> > > > > [ 14.244189][ T503]
> > > > > -----------------------------------------------------------------------------
> > > > > [ 14.255241][ T503] INFO: 0xffffff87caae0f7d-0xffffff87caae0f7f
> > > > > @offset=3965. First byte 0x2f instead of 0xcc
> > > > > [ 14.265176][ T503] INFO: Allocated in kobject_get_path+0x58/0x100 age=1
> > > > > cpu=0 pid=503
> > > > > [ 14.273111][ T503] kobject_get_path+0x58/0x100
> > > > > [ 14.277747][ T503] kobject_uevent_env+0x120/0x744
> > > > > [ 14.282639][ T503] device_add+0x98c/0x1028
> > > > > [ 14.286925][ T503] *device_register+0x24/0x38*
> > > > > [ 14.291395][ T503] slim_alloc_device+0xdc/0x108 [slimbus]
> > > > > [ 14.296992][ T503] slim_register_controller+0x210/0x2ac [slimbus]
> > > > > [ 14.303291][ T503] qcom_slim_ngd_probe+0x3c/0x348 [slim_qcom_ngd_ctrl]
> > > > > [ 14.310007][ T503] platform_drv_probe+0x60/0x180
> > > > > [ 14.314812][ T503] really_probe+0x208/0xb64
> > > > > [ 14.319184][ T503] driver_probe_device+0x130/0x254
> > > > > [ 14.324159][ T503] __device_attach_driver+0x1e8/0x324
> > > > > [ 14.329399][ T503] __device_attach+0x170/0x364
> > > > > [ 14.334035][ T503] bus_probe_device+0x4c/0x164
> > > > > [ 14.338671][ T503] device_add+0xa60/0x1028
> > > > > [ 14.342957][ T503] platform_device_add+0x260/0x2c8
> > > > > [ 14.347941][ T503] qcom_slim_ngd_ctrl_probe+0x71c/0x804
> > > > > [slim_qcom_ngd_ctrl]
> > > > > [ 14.355177][ T503] INFO: Freed in kobject_uevent_env+0x210/0x744 age=1
> > > > > cpu=4 pid=518
> > > > > [ 14.363018][ T503] do_init_module+0xac/0x494
> > > > > [ 14.367475][ T503] load_module+0x35c0/0x3e54
> > > > > [ 14.371930][ T503] __se_sys_finit_module+0x188/0x1c8
> > > > > [ 14.377086][ T503] __arm64_sys_finit_module+0x20/0x30
> > > > > [ 14.382328][ T503] el0_svc_common+0xdc/0x294
> > > > > [ 14.386786][ T503] el0_svc+0x38/0x9c
> > > > > [ 14.390552][ T503] el0_sync_handler+0x8c/0xf0
> > > > >
> > > > >
> > > > > [ 14.490994*][ T503] Redzone ffffff87caae0f7**0*: cc cc cc cc cc cc cc cc cc
> > > > > cc cc cc cc 2f 64 65 .............*/de*
> > > > > [ 14.501185][ T503] Object ffffff87caae0f80: 76 69 63 65 73 2f 70 6c 61 74
> > > > > 66 6f 72 6d 2f 73 vices/platform/s
> > > > > [ 14.511376][ T503] Object ffffff87caae0f90: 6f 63 2f 33 33 34 30 30 30 30
> > > > > 2e 73 6c 69 6d 2f oc/3340000.slim/
> > > > > [ 14.521566][ T503] Object ffffff87caae0fa0: 71 63 6f 6d 2c 73 6c 69 6d 2d
> > > > > 6e 67 64 2e 31 2f qcom,slim-ngd.1/
> > > > > [ 14.531757][ T503] Object ffffff87caae0fb0: 62 74 66 6d 73 6c 69 6d 5f 73
> > > > > 6c 61 76 65 00 00 *btfmslim_slave*..
> > > > >
> > > > >
> > > > >
> > > > > 499 static int btfm_slim_probe(struct slim_device *slim)
> > > > I do not see that function in Linus's tree right now. Where does it
> > > > come from?
> > > Don't you think, it can be caused by any kernel module.
> > Kernel code do not protect themselves against other kernel code doing
> > abusive things.
> >
> > Perhaps fix your out-of-tree code to not do this if no in-kernel code is
> > doing this? Nothing we can do at all about out-of-tree code that we
> > have never seen.
>
>
> Thanks for the comment, Greg.
> Why do we make dev_set_name() make it available to module apart from the
> core ?
Because busses can be in a module, just like you should be doing :)
Please submit your code upstream for review.
thanks,
greg k-h
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Possible case of Race in kobject_get_path()
2022-03-30 11:31 ` Greg KH
@ 2022-03-30 12:55 ` Mukesh Ojha
2022-03-30 14:22 ` Greg KH
0 siblings, 1 reply; 9+ messages in thread
From: Mukesh Ojha @ 2022-03-30 12:55 UTC (permalink / raw)
To: Greg KH; +Cc: Rafael J. Wysocki, linux-kernel
On 3/30/2022 5:01 PM, Greg KH wrote:
> On Wed, Mar 30, 2022 at 04:53:41PM +0530, Mukesh Ojha wrote:
>> On 3/30/2022 4:49 PM, Greg KH wrote:
>>> On Wed, Mar 30, 2022 at 04:44:26PM +0530, Mukesh Ojha wrote:
>>>> On 3/30/2022 4:31 PM, Greg KH wrote:
>>>>> On Wed, Mar 30, 2022 at 04:14:11PM +0530, Mukesh Ojha wrote:
>>>>>> On 3/30/2022 3:46 PM, Greg KH wrote:
>>>>>>> On Wed, Mar 30, 2022 at 03:41:04PM +0530, Mukesh Ojha wrote:
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> We are facing one issue where one driver (p1) is trying to register its
>>>>>>>> device from driver probe
>>>>>>>> and from another path (p2) dev_set_name(new name) done from driver probe of
>>>>>>>> the added device whose
>>>>>>>> new name length can be more than earlier done in (p1 path) which could
>>>>>>>> result in redzone overwritten issue.
>>>>>>> I do not understand, what specific driver is doing this so that we can
>>>>>>> see an example of this problem?
>>>>>> trying to paste some logs of the race.
>>>>>>
>>>>>> [ 14.235820][ T503] BUG kmalloc-128 (Tainted: G S O ): Left Redzone
>>>>>> overwritten
>>>>> What kernel version is this?
>>>> 5.10
>>> That is very old, is this an issue in 5.17?
>>>
>>>>>> [ 14.244189][ T503]
>>>>>> -----------------------------------------------------------------------------
>>>>>> [ 14.255241][ T503] INFO: 0xffffff87caae0f7d-0xffffff87caae0f7f
>>>>>> @offset=3965. First byte 0x2f instead of 0xcc
>>>>>> [ 14.265176][ T503] INFO: Allocated in kobject_get_path+0x58/0x100 age=1
>>>>>> cpu=0 pid=503
>>>>>> [ 14.273111][ T503] kobject_get_path+0x58/0x100
>>>>>> [ 14.277747][ T503] kobject_uevent_env+0x120/0x744
>>>>>> [ 14.282639][ T503] device_add+0x98c/0x1028
>>>>>> [ 14.286925][ T503] *device_register+0x24/0x38*
>>>>>> [ 14.291395][ T503] slim_alloc_device+0xdc/0x108 [slimbus]
>>>>>> [ 14.296992][ T503] slim_register_controller+0x210/0x2ac [slimbus]
>>>>>> [ 14.303291][ T503] qcom_slim_ngd_probe+0x3c/0x348 [slim_qcom_ngd_ctrl]
>>>>>> [ 14.310007][ T503] platform_drv_probe+0x60/0x180
>>>>>> [ 14.314812][ T503] really_probe+0x208/0xb64
>>>>>> [ 14.319184][ T503] driver_probe_device+0x130/0x254
>>>>>> [ 14.324159][ T503] __device_attach_driver+0x1e8/0x324
>>>>>> [ 14.329399][ T503] __device_attach+0x170/0x364
>>>>>> [ 14.334035][ T503] bus_probe_device+0x4c/0x164
>>>>>> [ 14.338671][ T503] device_add+0xa60/0x1028
>>>>>> [ 14.342957][ T503] platform_device_add+0x260/0x2c8
>>>>>> [ 14.347941][ T503] qcom_slim_ngd_ctrl_probe+0x71c/0x804
>>>>>> [slim_qcom_ngd_ctrl]
>>>>>> [ 14.355177][ T503] INFO: Freed in kobject_uevent_env+0x210/0x744 age=1
>>>>>> cpu=4 pid=518
>>>>>> [ 14.363018][ T503] do_init_module+0xac/0x494
>>>>>> [ 14.367475][ T503] load_module+0x35c0/0x3e54
>>>>>> [ 14.371930][ T503] __se_sys_finit_module+0x188/0x1c8
>>>>>> [ 14.377086][ T503] __arm64_sys_finit_module+0x20/0x30
>>>>>> [ 14.382328][ T503] el0_svc_common+0xdc/0x294
>>>>>> [ 14.386786][ T503] el0_svc+0x38/0x9c
>>>>>> [ 14.390552][ T503] el0_sync_handler+0x8c/0xf0
>>>>>>
>>>>>>
>>>>>> [ 14.490994*][ T503] Redzone ffffff87caae0f7**0*: cc cc cc cc cc cc cc cc cc
>>>>>> cc cc cc cc 2f 64 65 .............*/de*
>>>>>> [ 14.501185][ T503] Object ffffff87caae0f80: 76 69 63 65 73 2f 70 6c 61 74
>>>>>> 66 6f 72 6d 2f 73 vices/platform/s
>>>>>> [ 14.511376][ T503] Object ffffff87caae0f90: 6f 63 2f 33 33 34 30 30 30 30
>>>>>> 2e 73 6c 69 6d 2f oc/3340000.slim/
>>>>>> [ 14.521566][ T503] Object ffffff87caae0fa0: 71 63 6f 6d 2c 73 6c 69 6d 2d
>>>>>> 6e 67 64 2e 31 2f qcom,slim-ngd.1/
>>>>>> [ 14.531757][ T503] Object ffffff87caae0fb0: 62 74 66 6d 73 6c 69 6d 5f 73
>>>>>> 6c 61 76 65 00 00 *btfmslim_slave*..
>>>>>>
>>>>>>
>>>>>>
>>>>>> 499 static int btfm_slim_probe(struct slim_device *slim)
>>>>> I do not see that function in Linus's tree right now. Where does it
>>>>> come from?
>>>> Don't you think, it can be caused by any kernel module.
>>> Kernel code do not protect themselves against other kernel code doing
>>> abusive things.
>>>
>>> Perhaps fix your out-of-tree code to not do this if no in-kernel code is
>>> doing this? Nothing we can do at all about out-of-tree code that we
>>> have never seen.
>>
>> Thanks for the comment, Greg.
>> Why do we make dev_set_name() make it available to module apart from the
>> core ?
> Because busses can be in a module, just like you should be doing :)
>
> Please submit your code upstream for review.
I still did not get why one should not do device name change if
dev_set_name() is available for anyone to do name change while
kobject_get_path() is running?
This is possible race, if we know this is the limitation/issue and would
be difficult to address , that is fine and should be documented.
Thanks,
Mukesh
>
> thanks,
>
> greg k-h
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Possible case of Race in kobject_get_path()
2022-03-30 12:55 ` Mukesh Ojha
@ 2022-03-30 14:22 ` Greg KH
0 siblings, 0 replies; 9+ messages in thread
From: Greg KH @ 2022-03-30 14:22 UTC (permalink / raw)
To: Mukesh Ojha; +Cc: Rafael J. Wysocki, linux-kernel
On Wed, Mar 30, 2022 at 06:25:29PM +0530, Mukesh Ojha wrote:
>
> On 3/30/2022 5:01 PM, Greg KH wrote:
> > On Wed, Mar 30, 2022 at 04:53:41PM +0530, Mukesh Ojha wrote:
> > > On 3/30/2022 4:49 PM, Greg KH wrote:
> > > > On Wed, Mar 30, 2022 at 04:44:26PM +0530, Mukesh Ojha wrote:
> > > > > On 3/30/2022 4:31 PM, Greg KH wrote:
> > > > > > On Wed, Mar 30, 2022 at 04:14:11PM +0530, Mukesh Ojha wrote:
> > > > > > > On 3/30/2022 3:46 PM, Greg KH wrote:
> > > > > > > > On Wed, Mar 30, 2022 at 03:41:04PM +0530, Mukesh Ojha wrote:
> > > > > > > > > Hi All,
> > > > > > > > >
> > > > > > > > > We are facing one issue where one driver (p1) is trying to register its
> > > > > > > > > device from driver probe
> > > > > > > > > and from another path (p2) dev_set_name(new name) done from driver probe of
> > > > > > > > > the added device whose
> > > > > > > > > new name length can be more than earlier done in (p1 path) which could
> > > > > > > > > result in redzone overwritten issue.
> > > > > > > > I do not understand, what specific driver is doing this so that we can
> > > > > > > > see an example of this problem?
> > > > > > > trying to paste some logs of the race.
> > > > > > >
> > > > > > > [ 14.235820][ T503] BUG kmalloc-128 (Tainted: G S O ): Left Redzone
> > > > > > > overwritten
> > > > > > What kernel version is this?
> > > > > 5.10
> > > > That is very old, is this an issue in 5.17?
> > > >
> > > > > > > [ 14.244189][ T503]
> > > > > > > -----------------------------------------------------------------------------
> > > > > > > [ 14.255241][ T503] INFO: 0xffffff87caae0f7d-0xffffff87caae0f7f
> > > > > > > @offset=3965. First byte 0x2f instead of 0xcc
> > > > > > > [ 14.265176][ T503] INFO: Allocated in kobject_get_path+0x58/0x100 age=1
> > > > > > > cpu=0 pid=503
> > > > > > > [ 14.273111][ T503] kobject_get_path+0x58/0x100
> > > > > > > [ 14.277747][ T503] kobject_uevent_env+0x120/0x744
> > > > > > > [ 14.282639][ T503] device_add+0x98c/0x1028
> > > > > > > [ 14.286925][ T503] *device_register+0x24/0x38*
> > > > > > > [ 14.291395][ T503] slim_alloc_device+0xdc/0x108 [slimbus]
> > > > > > > [ 14.296992][ T503] slim_register_controller+0x210/0x2ac [slimbus]
> > > > > > > [ 14.303291][ T503] qcom_slim_ngd_probe+0x3c/0x348 [slim_qcom_ngd_ctrl]
> > > > > > > [ 14.310007][ T503] platform_drv_probe+0x60/0x180
> > > > > > > [ 14.314812][ T503] really_probe+0x208/0xb64
> > > > > > > [ 14.319184][ T503] driver_probe_device+0x130/0x254
> > > > > > > [ 14.324159][ T503] __device_attach_driver+0x1e8/0x324
> > > > > > > [ 14.329399][ T503] __device_attach+0x170/0x364
> > > > > > > [ 14.334035][ T503] bus_probe_device+0x4c/0x164
> > > > > > > [ 14.338671][ T503] device_add+0xa60/0x1028
> > > > > > > [ 14.342957][ T503] platform_device_add+0x260/0x2c8
> > > > > > > [ 14.347941][ T503] qcom_slim_ngd_ctrl_probe+0x71c/0x804
> > > > > > > [slim_qcom_ngd_ctrl]
> > > > > > > [ 14.355177][ T503] INFO: Freed in kobject_uevent_env+0x210/0x744 age=1
> > > > > > > cpu=4 pid=518
> > > > > > > [ 14.363018][ T503] do_init_module+0xac/0x494
> > > > > > > [ 14.367475][ T503] load_module+0x35c0/0x3e54
> > > > > > > [ 14.371930][ T503] __se_sys_finit_module+0x188/0x1c8
> > > > > > > [ 14.377086][ T503] __arm64_sys_finit_module+0x20/0x30
> > > > > > > [ 14.382328][ T503] el0_svc_common+0xdc/0x294
> > > > > > > [ 14.386786][ T503] el0_svc+0x38/0x9c
> > > > > > > [ 14.390552][ T503] el0_sync_handler+0x8c/0xf0
> > > > > > >
> > > > > > >
> > > > > > > [ 14.490994*][ T503] Redzone ffffff87caae0f7**0*: cc cc cc cc cc cc cc cc cc
> > > > > > > cc cc cc cc 2f 64 65 .............*/de*
> > > > > > > [ 14.501185][ T503] Object ffffff87caae0f80: 76 69 63 65 73 2f 70 6c 61 74
> > > > > > > 66 6f 72 6d 2f 73 vices/platform/s
> > > > > > > [ 14.511376][ T503] Object ffffff87caae0f90: 6f 63 2f 33 33 34 30 30 30 30
> > > > > > > 2e 73 6c 69 6d 2f oc/3340000.slim/
> > > > > > > [ 14.521566][ T503] Object ffffff87caae0fa0: 71 63 6f 6d 2c 73 6c 69 6d 2d
> > > > > > > 6e 67 64 2e 31 2f qcom,slim-ngd.1/
> > > > > > > [ 14.531757][ T503] Object ffffff87caae0fb0: 62 74 66 6d 73 6c 69 6d 5f 73
> > > > > > > 6c 61 76 65 00 00 *btfmslim_slave*..
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > 499 static int btfm_slim_probe(struct slim_device *slim)
> > > > > > I do not see that function in Linus's tree right now. Where does it
> > > > > > come from?
> > > > > Don't you think, it can be caused by any kernel module.
> > > > Kernel code do not protect themselves against other kernel code doing
> > > > abusive things.
> > > >
> > > > Perhaps fix your out-of-tree code to not do this if no in-kernel code is
> > > > doing this? Nothing we can do at all about out-of-tree code that we
> > > > have never seen.
> > >
> > > Thanks for the comment, Greg.
> > > Why do we make dev_set_name() make it available to module apart from the
> > > core ?
> > Because busses can be in a module, just like you should be doing :)
> >
> > Please submit your code upstream for review.
>
> I still did not get why one should not do device name change if
> dev_set_name() is available for anyone to do name change while
> kobject_get_path() is running?
I do not see where in the kernel today this is happening. What code
changes the name of a device like this?
> This is possible race, if we know this is the limitation/issue and would be
> difficult to address , that is fine and should be documented.
Again, why are you doing this? What needs to change the device name in
this way and what is it racing with?
Without code examples, speculating about all of this is quite difficult
to discuss.
thanks,
greg k-h
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2022-03-30 14:23 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-30 10:11 Possible case of Race in kobject_get_path() Mukesh Ojha
2022-03-30 10:16 ` Greg KH
[not found] ` <148f4e9b-0584-ccc1-1443-30d41190dfd3@quicinc.com>
2022-03-30 11:01 ` Greg KH
2022-03-30 11:14 ` Mukesh Ojha
2022-03-30 11:19 ` Greg KH
2022-03-30 11:23 ` Mukesh Ojha
2022-03-30 11:31 ` Greg KH
2022-03-30 12:55 ` Mukesh Ojha
2022-03-30 14:22 ` Greg KH
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.