linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* general protection fault in del_gendisk
@ 2021-10-29 19:13 Tadeusz Struk
  2021-11-01 20:01 ` Tadeusz Struk
  0 siblings, 1 reply; 6+ messages in thread
From: Tadeusz Struk @ 2021-10-29 19:13 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block

Hi,
I'm looking at a bug found by the syzkaller robot [1], and I just wanted
to confirm that my understanding is correct, and the issue can be closed.
First, the kernel is configured with some fault injections enabled:

CONFIG_FAULT_INJECTION=y
CONFIG_FAILSLAB=y
CONFIG_FAIL_PAGE_ALLOC=y

The test adds loop devices, which causes some entries in sysfs to be created.
It does some magic with ioctls, which calls:
__device_add_disk() -> register_disk()
which eventually triggers sysfs_create_files() and it crashes there,
in line 627 [2], because the fault injector logic triggers it.
That can be seen in the trace [3]:
[   34.089707][ T1813] FAULT_INJECTION: forcing a failure.

Sysfs code returns a -ENOMEM error, but because the __device_add_disk()
implementation mostly uses void function, and doesn't return on errors [4]
it goes farther, hits some warnings, like:
disk_add_events() -> sysfs_create_files() -> sysfs_create_file_ns() - > WARN()
and eventually triggers general protection fault in sysfs code, and panics there.

I think for this to recover and return an error to the caller via ioctl()
the __device_add_disk() code would need be reworked to handle errors,
and return errors to the caller.
My question is: is it implemented like this by design? Are there any plans
to make it fail more gracefully?

[1] https://syzkaller.appspot.com/bug?id=c234dd5151b92650adff0683a8c567c269fb39e5
[2] https://elixir.bootlin.com/linux/v5.14.15/source/fs/kernfs/dir.c#L583
[3] https://syzkaller.appspot.com/text?tag=CrashLog&x=113d8bfcb00000
[4] https://elixir.bootlin.com/linux/v5.14.15/source/block/genhd.c#L532

-- 
Thanks,
Tadeusz

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: general protection fault in del_gendisk
  2021-10-29 19:13 general protection fault in del_gendisk Tadeusz Struk
@ 2021-11-01 20:01 ` Tadeusz Struk
  2021-11-01 20:03   ` Jens Axboe
  0 siblings, 1 reply; 6+ messages in thread
From: Tadeusz Struk @ 2021-11-01 20:01 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, linux-kernel

On 10/29/21 12:13, Tadeusz Struk wrote:
> Hi,
> I'm looking at a bug found by the syzkaller robot [1], and I just wanted
> to confirm that my understanding is correct, and the issue can be closed.
> First, the kernel is configured with some fault injections enabled:
> 
> CONFIG_FAULT_INJECTION=y
> CONFIG_FAILSLAB=y
> CONFIG_FAIL_PAGE_ALLOC=y
> 
> The test adds loop devices, which causes some entries in sysfs to be created.
> It does some magic with ioctls, which calls:
> __device_add_disk() -> register_disk()
> which eventually triggers sysfs_create_files() and it crashes there,
> in line 627 [2], because the fault injector logic triggers it.
> That can be seen in the trace [3]:
> [   34.089707][ T1813] FAULT_INJECTION: forcing a failure.
> 
> Sysfs code returns a -ENOMEM error, but because the __device_add_disk()
> implementation mostly uses void function, and doesn't return on errors [4]
> it goes farther, hits some warnings, like:
> disk_add_events() -> sysfs_create_files() -> sysfs_create_file_ns() - > WARN()
> and eventually triggers general protection fault in sysfs code, and panics there.
> 
> I think for this to recover and return an error to the caller via ioctl()
> the __device_add_disk() code would need be reworked to handle errors,
> and return errors to the caller.
> My question is: is it implemented like this by design? Are there any plans
> to make it fail more gracefully?

Hi,
Any comments on this one?

-- 
Thanks,
Tadeusz

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: general protection fault in del_gendisk
  2021-11-01 20:01 ` Tadeusz Struk
@ 2021-11-01 20:03   ` Jens Axboe
  2021-11-02  6:45     ` Christoph Hellwig
  0 siblings, 1 reply; 6+ messages in thread
From: Jens Axboe @ 2021-11-01 20:03 UTC (permalink / raw)
  To: Tadeusz Struk; +Cc: linux-block, linux-kernel

On 11/1/21 2:01 PM, Tadeusz Struk wrote:
> On 10/29/21 12:13, Tadeusz Struk wrote:
>> Hi,
>> I'm looking at a bug found by the syzkaller robot [1], and I just wanted
>> to confirm that my understanding is correct, and the issue can be closed.
>> First, the kernel is configured with some fault injections enabled:
>>
>> CONFIG_FAULT_INJECTION=y
>> CONFIG_FAILSLAB=y
>> CONFIG_FAIL_PAGE_ALLOC=y
>>
>> The test adds loop devices, which causes some entries in sysfs to be created.
>> It does some magic with ioctls, which calls:
>> __device_add_disk() -> register_disk()
>> which eventually triggers sysfs_create_files() and it crashes there,
>> in line 627 [2], because the fault injector logic triggers it.
>> That can be seen in the trace [3]:
>> [   34.089707][ T1813] FAULT_INJECTION: forcing a failure.
>>
>> Sysfs code returns a -ENOMEM error, but because the __device_add_disk()
>> implementation mostly uses void function, and doesn't return on errors [4]
>> it goes farther, hits some warnings, like:
>> disk_add_events() -> sysfs_create_files() -> sysfs_create_file_ns() - > WARN()
>> and eventually triggers general protection fault in sysfs code, and panics there.
>>
>> I think for this to recover and return an error to the caller via ioctl()
>> the __device_add_disk() code would need be reworked to handle errors,
>> and return errors to the caller.
>> My question is: is it implemented like this by design? Are there any plans
>> to make it fail more gracefully?
> 
> Hi,
> Any comments on this one?

People will take a look at it, but you sent it out on a Saturday right
before a merge window, doing a 'ping' kind of followup on a Monday is
way too soon.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: general protection fault in del_gendisk
  2021-11-01 20:03   ` Jens Axboe
@ 2021-11-02  6:45     ` Christoph Hellwig
  2021-11-08 19:19       ` Tadeusz Struk
  0 siblings, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2021-11-02  6:45 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Tadeusz Struk, linux-block, linux-kernel

> People will take a look at it, but you sent it out on a Saturday right
> before a merge window, doing a 'ping' kind of followup on a Monday is
> way too soon.

Please retests on 5.16-rc1 once it is out.  Sorting out add_disk error
handling is one of the big changes in this merge window.  And no, it is
not easily backportable.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: general protection fault in del_gendisk
  2021-11-02  6:45     ` Christoph Hellwig
@ 2021-11-08 19:19       ` Tadeusz Struk
  2021-11-12  6:36         ` Christoph Hellwig
  0 siblings, 1 reply; 6+ messages in thread
From: Tadeusz Struk @ 2021-11-08 19:19 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe; +Cc: linux-block, linux-kernel

On 11/1/21 23:45, Christoph Hellwig wrote:
>> People will take a look at it, but you sent it out on a Saturday right
>> before a merge window, doing a 'ping' kind of followup on a Monday is
>> way too soon.
> Please retests on 5.16-rc1 once it is out.  Sorting out add_disk error
> handling is one of the big changes in this merge window.  And no, it is
> not easily backportable.

I have ran this on the latest mainline. I can confirm that the errors are
nicely handled there. It triggers a warning and panics in device_add_disk()
because of:
return WARN_ON_ONCE(ret); /* keep until all callers handle errors */
and
Kernel panic - not syncing: panic_on_warn set ...

I will close the issue.
-- 
Thanks,
Tadeusz

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: general protection fault in del_gendisk
  2021-11-08 19:19       ` Tadeusz Struk
@ 2021-11-12  6:36         ` Christoph Hellwig
  0 siblings, 0 replies; 6+ messages in thread
From: Christoph Hellwig @ 2021-11-12  6:36 UTC (permalink / raw)
  To: Tadeusz Struk; +Cc: Christoph Hellwig, Jens Axboe, linux-block, linux-kernel

On Mon, Nov 08, 2021 at 11:19:35AM -0800, Tadeusz Struk wrote:
> I have ran this on the latest mainline. I can confirm that the errors are
> nicely handled there. It triggers a warning and panics in device_add_disk()
> because of:
> return WARN_ON_ONCE(ret); /* keep until all callers handle errors */
> and
> Kernel panic - not syncing: panic_on_warn set ...

The WARRN_ON will go away once all drivers are fixed.  And no, it does
not panic the kernel unless you set an obscure sysctl asking for it to
panic on warnings, in which case you get what you ask for.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-11-12  6:36 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-29 19:13 general protection fault in del_gendisk Tadeusz Struk
2021-11-01 20:01 ` Tadeusz Struk
2021-11-01 20:03   ` Jens Axboe
2021-11-02  6:45     ` Christoph Hellwig
2021-11-08 19:19       ` Tadeusz Struk
2021-11-12  6:36         ` Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).