[PATCH] /dev/mem: Bail out upon SIGKILL when reading memory.

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] /dev/mem: Bail out upon SIGKILL when reading memory.
@ 2019-08-20 22:06 Tetsuo Handa
  2019-08-20 22:24 ` Greg Kroah-Hartman
                   ` (2 more replies)
  0 siblings, 3 replies; 31+ messages in thread
From: Tetsuo Handa @ 2019-08-20 22:06 UTC (permalink / raw)
  To: Arnd Bergmann, Greg Kroah-Hartman; +Cc: linux-kernel, Tetsuo Handa, syzbot

syzbot found that a thread can stall for minutes inside read_mem()
after that thread was killed by SIGKILL [1]. Reading 2GB at one read()
is legal, but delaying termination of killed thread for minutes is bad.

  [ 1335.912419][T20577] read_mem: sz=4096 count=2134565632
  [ 1335.943194][T20577] read_mem: sz=4096 count=2134561536
  [ 1335.978280][T20577] read_mem: sz=4096 count=2134557440
  [ 1336.011147][T20577] read_mem: sz=4096 count=2134553344
  [ 1336.041897][T20577] read_mem: sz=4096 count=2134549248

[1] https://syzkaller.appspot.com/bug?id=a0e3436829698d5824231251fad9d8e998f94f5e

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: syzbot <syzbot+8ab2d0f39fb79fe6ca40@syzkaller.appspotmail.com>
---
 drivers/char/mem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/char/mem.c b/drivers/char/mem.c
index b08dc50..0f7d4c4 100644
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -135,7 +135,7 @@ static ssize_t read_mem(struct file *file, char __user *buf,
 	if (!bounce)
 		return -ENOMEM;
 
-	while (count > 0) {
+	while (count > 0 && !fatal_signal_pending(current)) {
 		unsigned long remaining;
 		int allowed, probe;
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH] /dev/mem: Bail out upon SIGKILL when reading memory.
  2019-08-20 22:06 [PATCH] /dev/mem: Bail out upon SIGKILL when reading memory Tetsuo Handa
@ 2019-08-20 22:24 ` Greg Kroah-Hartman
  2019-08-21  0:07   ` Tetsuo Handa
  2019-08-22  9:59   ` Tetsuo Handa
  2019-08-22 21:29 ` [PATCH] /dev/mem: Bail out upon SIGKILL when reading memory Linus Torvalds
  2019-08-22 22:08 ` Linus Torvalds
  2 siblings, 2 replies; 31+ messages in thread
From: Greg Kroah-Hartman @ 2019-08-20 22:24 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: Arnd Bergmann, linux-kernel, syzbot

On Wed, Aug 21, 2019 at 07:06:51AM +0900, Tetsuo Handa wrote:
> syzbot found that a thread can stall for minutes inside read_mem()
> after that thread was killed by SIGKILL [1]. Reading 2GB at one read()
> is legal, but delaying termination of killed thread for minutes is bad.
> 
>   [ 1335.912419][T20577] read_mem: sz=4096 count=2134565632
>   [ 1335.943194][T20577] read_mem: sz=4096 count=2134561536
>   [ 1335.978280][T20577] read_mem: sz=4096 count=2134557440
>   [ 1336.011147][T20577] read_mem: sz=4096 count=2134553344
>   [ 1336.041897][T20577] read_mem: sz=4096 count=2134549248
> 
> [1] https://syzkaller.appspot.com/bug?id=a0e3436829698d5824231251fad9d8e998f94f5e
> 
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Reported-by: syzbot <syzbot+8ab2d0f39fb79fe6ca40@syzkaller.appspotmail.com>
> ---
>  drivers/char/mem.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/char/mem.c b/drivers/char/mem.c
> index b08dc50..0f7d4c4 100644
> --- a/drivers/char/mem.c
> +++ b/drivers/char/mem.c
> @@ -135,7 +135,7 @@ static ssize_t read_mem(struct file *file, char __user *buf,
>  	if (!bounce)
>  		return -ENOMEM;
>  
> -	while (count > 0) {
> +	while (count > 0 && !fatal_signal_pending(current)) {
>  		unsigned long remaining;
>  		int allowed, probe;
>  
> -- 
> 1.8.3.1
> 

Oh, nice!  This shouldn't break anything that is assuming that the read
will complete before a signal is delivered, right?

I know userspace handling of "short" reads is almost always not there...

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] /dev/mem: Bail out upon SIGKILL when reading memory.
  2019-08-20 22:24 ` Greg Kroah-Hartman
@ 2019-08-21  0:07   ` Tetsuo Handa
  2019-08-22  9:59   ` Tetsuo Handa
  1 sibling, 0 replies; 31+ messages in thread
From: Tetsuo Handa @ 2019-08-21  0:07 UTC (permalink / raw)
  To: Greg Kroah-Hartman; +Cc: Arnd Bergmann, linux-kernel, syzbot

Greg Kroah-Hartman wrote:
> Oh, nice!  This shouldn't break anything that is assuming that the read
> will complete before a signal is delivered, right?
>
> I know userspace handling of "short" reads is almost always not there...

Since this check will give up upon SIGKILL, userspace won't be able to see
the return value from read(). Thus, returning 0 upon SIGKILL will be safe. ;-)
Maybe we also want to add cond_resched()...

By the way, do we want similar check on write_mem() side?
If aborting "write to /dev/mem" upon SIGKILL (results in partial write) is
unexpected, we might want to ignore SIGKILL for write_mem() case.
But copying data from killed threads (especially when killed by OOM killer
and userspace memory is reclaimed by OOM reaper before write_mem() returns)
would be after all unexpected. Then, it might be preferable to check SIGKILL
on write_mem() side...

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] /dev/mem: Bail out upon SIGKILL when reading memory.
  2019-08-20 22:24 ` Greg Kroah-Hartman
  2019-08-21  0:07   ` Tetsuo Handa
@ 2019-08-22  9:59   ` Tetsuo Handa
  2019-08-22 13:35     ` Greg Kroah-Hartman
  1 sibling, 1 reply; 31+ messages in thread
From: Tetsuo Handa @ 2019-08-22  9:59 UTC (permalink / raw)
  To: Greg Kroah-Hartman; +Cc: Arnd Bergmann, linux-kernel, syzbot, Eric Biggers

Tetsuo Handa wrote:
> Greg Kroah-Hartman wrote:
> > Oh, nice!  This shouldn't break anything that is assuming that the read
> > will complete before a signal is delivered, right?
> >
> > I know userspace handling of "short" reads is almost always not there...
> 
> Since this check will give up upon SIGKILL, userspace won't be able to see
> the return value from read(). Thus, returning 0 upon SIGKILL will be safe. ;-)
> Maybe we also want to add cond_resched()...
> 
> By the way, do we want similar check on write_mem() side?
> If aborting "write to /dev/mem" upon SIGKILL (results in partial write) is
> unexpected, we might want to ignore SIGKILL for write_mem() case.
> But copying data from killed threads (especially when killed by OOM killer
> and userspace memory is reclaimed by OOM reaper before write_mem() returns)
> would be after all unexpected. Then, it might be preferable to check SIGKILL
> on write_mem() side...
> 

Ha, ha. syzbot reported the same problem using write_mem().
https://syzkaller.appspot.com/text?tag=CrashLog&x=1018055a600000
We want fatal_signal_pending() check on both sides.

By the way, write_mem() worries me whether there is possibility of replacing
kernel code/data with user-defined memory data supplied from userspace.
If write_mem() were by chance replaced with code that does

   while (1);

we won't be able to return from write_mem() even if we added fatal_signal_pending() check.
Ditto for replacing local variables with unexpected values...

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] /dev/mem: Bail out upon SIGKILL when reading memory.
  2019-08-22  9:59   ` Tetsuo Handa
@ 2019-08-22 13:35     ` Greg Kroah-Hartman
  2019-08-22 14:00       ` Tetsuo Handa
  0 siblings, 1 reply; 31+ messages in thread
From: Greg Kroah-Hartman @ 2019-08-22 13:35 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: Arnd Bergmann, linux-kernel, syzbot, Eric Biggers

On Thu, Aug 22, 2019 at 06:59:25PM +0900, Tetsuo Handa wrote:
> Tetsuo Handa wrote:
> > Greg Kroah-Hartman wrote:
> > > Oh, nice!  This shouldn't break anything that is assuming that the read
> > > will complete before a signal is delivered, right?
> > >
> > > I know userspace handling of "short" reads is almost always not there...
> > 
> > Since this check will give up upon SIGKILL, userspace won't be able to see
> > the return value from read(). Thus, returning 0 upon SIGKILL will be safe. ;-)
> > Maybe we also want to add cond_resched()...
> > 
> > By the way, do we want similar check on write_mem() side?
> > If aborting "write to /dev/mem" upon SIGKILL (results in partial write) is
> > unexpected, we might want to ignore SIGKILL for write_mem() case.
> > But copying data from killed threads (especially when killed by OOM killer
> > and userspace memory is reclaimed by OOM reaper before write_mem() returns)
> > would be after all unexpected. Then, it might be preferable to check SIGKILL
> > on write_mem() side...
> > 
> 
> Ha, ha. syzbot reported the same problem using write_mem().
> https://syzkaller.appspot.com/text?tag=CrashLog&x=1018055a600000
> We want fatal_signal_pending() check on both sides.

Ok, want to send a patch for that?

And does anything use /dev/mem anymore?  I think X stopped using it a
long time ago.

> By the way, write_mem() worries me whether there is possibility of replacing
> kernel code/data with user-defined memory data supplied from userspace.
> If write_mem() were by chance replaced with code that does
> 
>    while (1);
> 
> we won't be able to return from write_mem() even if we added fatal_signal_pending() check.
> Ditto for replacing local variables with unexpected values...

I'm sorry, I don't really understand what you mean here, but I haven't
had my morning coffee...  Any hints as to an example?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] /dev/mem: Bail out upon SIGKILL when reading memory.
  2019-08-22 13:35     ` Greg Kroah-Hartman
@ 2019-08-22 14:00       ` Tetsuo Handa
  2019-08-22 16:42         ` Greg Kroah-Hartman
  0 siblings, 1 reply; 31+ messages in thread
From: Tetsuo Handa @ 2019-08-22 14:00 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Arnd Bergmann, linux-kernel, syzbot, Eric Biggers, Dmitry Vyukov

On 2019/08/22 22:35, Greg Kroah-Hartman wrote:
> On Thu, Aug 22, 2019 at 06:59:25PM +0900, Tetsuo Handa wrote:
>> Tetsuo Handa wrote:
>>> Greg Kroah-Hartman wrote:
>>>> Oh, nice!  This shouldn't break anything that is assuming that the read
>>>> will complete before a signal is delivered, right?
>>>>
>>>> I know userspace handling of "short" reads is almost always not there...
>>>
>>> Since this check will give up upon SIGKILL, userspace won't be able to see
>>> the return value from read(). Thus, returning 0 upon SIGKILL will be safe. ;-)
>>> Maybe we also want to add cond_resched()...
>>>
>>> By the way, do we want similar check on write_mem() side?
>>> If aborting "write to /dev/mem" upon SIGKILL (results in partial write) is
>>> unexpected, we might want to ignore SIGKILL for write_mem() case.
>>> But copying data from killed threads (especially when killed by OOM killer
>>> and userspace memory is reclaimed by OOM reaper before write_mem() returns)
>>> would be after all unexpected. Then, it might be preferable to check SIGKILL
>>> on write_mem() side...
>>>
>>
>> Ha, ha. syzbot reported the same problem using write_mem().
>> https://syzkaller.appspot.com/text?tag=CrashLog&x=1018055a600000
>> We want fatal_signal_pending() check on both sides.
> 
> Ok, want to send a patch for that?

Yes. But before sending a patch, I'm trying to dump values using debug printk().

> 
> And does anything use /dev/mem anymore?  I think X stopped using it a
> long time ago.
> 
>> By the way, write_mem() worries me whether there is possibility of replacing
>> kernel code/data with user-defined memory data supplied from userspace.
>> If write_mem() were by chance replaced with code that does
>>
>>    while (1);
>>
>> we won't be able to return from write_mem() even if we added fatal_signal_pending() check.
>> Ditto for replacing local variables with unexpected values...
> 
> I'm sorry, I don't really understand what you mean here, but I haven't
> had my morning coffee...  Any hints as to an example?

Probably similar idea: "lockdown: Restrict /dev/{mem,kmem,port} when the kernel is locked down"

https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/drivers/char/mem.c?h=next-20190822&id=9b9d8dda1ed72e9bd560ab0ca93d322a9440510e

Then, syzbot might want to blacklist writing to /dev/mem .

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] /dev/mem: Bail out upon SIGKILL when reading memory.
  2019-08-22 14:00       ` Tetsuo Handa
@ 2019-08-22 16:42         ` Greg Kroah-Hartman
  2019-08-22 17:11           ` Dmitry Vyukov
  0 siblings, 1 reply; 31+ messages in thread
From: Greg Kroah-Hartman @ 2019-08-22 16:42 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Arnd Bergmann, linux-kernel, syzbot, Eric Biggers, Dmitry Vyukov

On Thu, Aug 22, 2019 at 11:00:59PM +0900, Tetsuo Handa wrote:
> On 2019/08/22 22:35, Greg Kroah-Hartman wrote:
> > On Thu, Aug 22, 2019 at 06:59:25PM +0900, Tetsuo Handa wrote:
> >> Tetsuo Handa wrote:
> >>> Greg Kroah-Hartman wrote:
> >>>> Oh, nice!  This shouldn't break anything that is assuming that the read
> >>>> will complete before a signal is delivered, right?
> >>>>
> >>>> I know userspace handling of "short" reads is almost always not there...
> >>>
> >>> Since this check will give up upon SIGKILL, userspace won't be able to see
> >>> the return value from read(). Thus, returning 0 upon SIGKILL will be safe. ;-)
> >>> Maybe we also want to add cond_resched()...
> >>>
> >>> By the way, do we want similar check on write_mem() side?
> >>> If aborting "write to /dev/mem" upon SIGKILL (results in partial write) is
> >>> unexpected, we might want to ignore SIGKILL for write_mem() case.
> >>> But copying data from killed threads (especially when killed by OOM killer
> >>> and userspace memory is reclaimed by OOM reaper before write_mem() returns)
> >>> would be after all unexpected. Then, it might be preferable to check SIGKILL
> >>> on write_mem() side...
> >>>
> >>
> >> Ha, ha. syzbot reported the same problem using write_mem().
> >> https://syzkaller.appspot.com/text?tag=CrashLog&x=1018055a600000
> >> We want fatal_signal_pending() check on both sides.
> > 
> > Ok, want to send a patch for that?
> 
> Yes. But before sending a patch, I'm trying to dump values using debug printk().
> 
> > 
> > And does anything use /dev/mem anymore?  I think X stopped using it a
> > long time ago.
> > 
> >> By the way, write_mem() worries me whether there is possibility of replacing
> >> kernel code/data with user-defined memory data supplied from userspace.
> >> If write_mem() were by chance replaced with code that does
> >>
> >>    while (1);
> >>
> >> we won't be able to return from write_mem() even if we added fatal_signal_pending() check.
> >> Ditto for replacing local variables with unexpected values...
> > 
> > I'm sorry, I don't really understand what you mean here, but I haven't
> > had my morning coffee...  Any hints as to an example?
> 
> Probably similar idea: "lockdown: Restrict /dev/{mem,kmem,port} when the kernel is locked down"
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/drivers/char/mem.c?h=next-20190822&id=9b9d8dda1ed72e9bd560ab0ca93d322a9440510e
> 
> Then, syzbot might want to blacklist writing to /dev/mem .

syzbot should probably blacklist that now, you can do a lot of bad
things writing to that device node :(

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] /dev/mem: Bail out upon SIGKILL when reading memory.
  2019-08-22 16:42         ` Greg Kroah-Hartman
@ 2019-08-22 17:11           ` Dmitry Vyukov
  2019-08-22 21:17             ` Tetsuo Handa
  0 siblings, 1 reply; 31+ messages in thread
From: Dmitry Vyukov @ 2019-08-22 17:11 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Tetsuo Handa, Arnd Bergmann, LKML, syzbot, Eric Biggers

On Thu, Aug 22, 2019 at 9:42 AM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> On Thu, Aug 22, 2019 at 11:00:59PM +0900, Tetsuo Handa wrote:
> > On 2019/08/22 22:35, Greg Kroah-Hartman wrote:
> > > On Thu, Aug 22, 2019 at 06:59:25PM +0900, Tetsuo Handa wrote:
> > >> Tetsuo Handa wrote:
> > >>> Greg Kroah-Hartman wrote:
> > >>>> Oh, nice!  This shouldn't break anything that is assuming that the read
> > >>>> will complete before a signal is delivered, right?
> > >>>>
> > >>>> I know userspace handling of "short" reads is almost always not there...
> > >>>
> > >>> Since this check will give up upon SIGKILL, userspace won't be able to see
> > >>> the return value from read(). Thus, returning 0 upon SIGKILL will be safe. ;-)
> > >>> Maybe we also want to add cond_resched()...
> > >>>
> > >>> By the way, do we want similar check on write_mem() side?
> > >>> If aborting "write to /dev/mem" upon SIGKILL (results in partial write) is
> > >>> unexpected, we might want to ignore SIGKILL for write_mem() case.
> > >>> But copying data from killed threads (especially when killed by OOM killer
> > >>> and userspace memory is reclaimed by OOM reaper before write_mem() returns)
> > >>> would be after all unexpected. Then, it might be preferable to check SIGKILL
> > >>> on write_mem() side...
> > >>>
> > >>
> > >> Ha, ha. syzbot reported the same problem using write_mem().
> > >> https://syzkaller.appspot.com/text?tag=CrashLog&x=1018055a600000
> > >> We want fatal_signal_pending() check on both sides.
> > >
> > > Ok, want to send a patch for that?
> >
> > Yes. But before sending a patch, I'm trying to dump values using debug printk().
> >
> > >
> > > And does anything use /dev/mem anymore?  I think X stopped using it a
> > > long time ago.
> > >
> > >> By the way, write_mem() worries me whether there is possibility of replacing
> > >> kernel code/data with user-defined memory data supplied from userspace.
> > >> If write_mem() were by chance replaced with code that does
> > >>
> > >>    while (1);
> > >>
> > >> we won't be able to return from write_mem() even if we added fatal_signal_pending() check.
> > >> Ditto for replacing local variables with unexpected values...
> > >
> > > I'm sorry, I don't really understand what you mean here, but I haven't
> > > had my morning coffee...  Any hints as to an example?
> >
> > Probably similar idea: "lockdown: Restrict /dev/{mem,kmem,port} when the kernel is locked down"
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/drivers/char/mem.c?h=next-20190822&id=9b9d8dda1ed72e9bd560ab0ca93d322a9440510e
> >
> > Then, syzbot might want to blacklist writing to /dev/mem .
>
> syzbot should probably blacklist that now, you can do a lot of bad
> things writing to that device node :(

Agree. It wasn't supposed to reach it, but it figured out how to mount
devfs and then open "./mem"  bypassing all checks. Fortunately there
is a config to disable /dev/mem, so we are going to turn it off.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] /dev/mem: Bail out upon SIGKILL when reading memory.
  2019-08-22 17:11           ` Dmitry Vyukov
@ 2019-08-22 21:17             ` Tetsuo Handa
  2019-08-22 23:59               ` Dmitry Vyukov
  0 siblings, 1 reply; 31+ messages in thread
From: Tetsuo Handa @ 2019-08-22 21:17 UTC (permalink / raw)
  To: Dmitry Vyukov, Greg Kroah-Hartman, Linus Torvalds
  Cc: Arnd Bergmann, LKML, syzbot, Eric Biggers

On 2019/08/23 2:11, Dmitry Vyukov wrote:
> On Thu, Aug 22, 2019 at 9:42 AM Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
>>>>> By the way, write_mem() worries me whether there is possibility of replacing
>>>>> kernel code/data with user-defined memory data supplied from userspace.
>>>>> If write_mem() were by chance replaced with code that does
>>>>>
>>>>>    while (1);
>>>>>
>>>>> we won't be able to return from write_mem() even if we added fatal_signal_pending() check.
>>>>> Ditto for replacing local variables with unexpected values...
>>>>
>>>> I'm sorry, I don't really understand what you mean here, but I haven't
>>>> had my morning coffee...  Any hints as to an example?
>>>
>>> Probably similar idea: "lockdown: Restrict /dev/{mem,kmem,port} when the kernel is locked down"
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/drivers/char/mem.c?h=next-20190822&id=9b9d8dda1ed72e9bd560ab0ca93d322a9440510e
>>>
>>> Then, syzbot might want to blacklist writing to /dev/mem .
>>
>> syzbot should probably blacklist that now, you can do a lot of bad
>> things writing to that device node :(
> 
> Agree. It wasn't supposed to reach it, but it figured out how to mount
> devfs and then open "./mem"  bypassing all checks. Fortunately there
> is a config to disable /dev/mem, so we are going to turn it off.
> 

Can't we introduce a kernel config which selectively blocks specific actions?
If we don't need to worry about bypassing blacklist checks, we will be able to
enable syz_execute_func() again.

----------
 			ptr = xlate_dev_mem_ptr(p);
 			if (!ptr) {
 				if (written)
 					break;
 				return -EFAULT;
 			}
+#ifndef CONFIG_KERNEL_BUILT_FOR_FUZZ_TESTING
 			copied = copy_from_user(ptr, buf, sz);
+#else
+			copied = 0;
+#endif
 			unxlate_dev_mem_ptr(p, ptr);
----------

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] /dev/mem: Bail out upon SIGKILL when reading memory.
  2019-08-20 22:06 [PATCH] /dev/mem: Bail out upon SIGKILL when reading memory Tetsuo Handa
  2019-08-20 22:24 ` Greg Kroah-Hartman
@ 2019-08-22 21:29 ` Linus Torvalds
  2019-08-22 22:08 ` Linus Torvalds
  2 siblings, 0 replies; 31+ messages in thread
From: Linus Torvalds @ 2019-08-22 21:29 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Arnd Bergmann, Greg Kroah-Hartman, Linux List Kernel Mailing, syzbot

On Tue, Aug 20, 2019 at 3:07 PM Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> -       while (count > 0) {
> +       while (count > 0 && !fatal_signal_pending(current)) {

Please just use the normal pattern of doing

        if (fatal_signal_pending(current))
                return -EINTR;

inside the loop instead.

(Ok, in this case I think it wants

        err = -EINTR;
        if (fatal_signal_pending(current))
                break;

instead, but the point is to make it look like signal handling, just
with the special "fatal signals can sometimes be handled even when
regular signals might not make it through".

              Linus

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] /dev/mem: Bail out upon SIGKILL when reading memory.
  2019-08-20 22:06 [PATCH] /dev/mem: Bail out upon SIGKILL when reading memory Tetsuo Handa
  2019-08-20 22:24 ` Greg Kroah-Hartman
  2019-08-22 21:29 ` [PATCH] /dev/mem: Bail out upon SIGKILL when reading memory Linus Torvalds
@ 2019-08-22 22:08 ` Linus Torvalds
  2019-08-23  9:16   ` Ingo Molnar
  2019-08-23 11:46   ` [PATCH] /dev/mem: Bail out upon SIGKILL when reading memory Tetsuo Handa
  2 siblings, 2 replies; 31+ messages in thread
From: Linus Torvalds @ 2019-08-22 22:08 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Arnd Bergmann, Greg Kroah-Hartman, Linux List Kernel Mailing, syzbot

[-- Attachment #1: Type: text/plain, Size: 1104 bytes --]

On Tue, Aug 20, 2019 at 3:07 PM Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> syzbot found that a thread can stall for minutes inside read_mem()
> after that thread was killed by SIGKILL [1]. Reading 2GB at one read()
> is legal, but delaying termination of killed thread for minutes is bad.

Side note: we might even just allow regular signals to interrupt
/dev/mem reads. We already do that for /dev/zero, and the risk of
breaking something is likely fairly low since nothing should use that
thing anyway.

Also, if it takes minutes to delay killing things, that implies that
we're probably still faulting in pages for the read_mem(). Which
points to another possible thing we could do in general: just don't
bother to handle page faults when a fatal signal is pending.

That situation might happen for other random cases too, and is not
limited to /dev/mem. So maybe it's worth trying? Does that essentially
fix the /dev/mem read case too in practice?

COMPLETELY untested patch attached, it may or may not make a
difference (and it may or may not work at all ;)

                Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 1892 bytes --]

 arch/x86/mm/fault.c | 15 ++++++++++++---
 mm/memory.c         |  5 +++++
 2 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 9ceacd1156db..d6c029a6cb90 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1033,8 +1033,15 @@ static noinline void
 mm_fault_error(struct pt_regs *regs, unsigned long error_code,
 	       unsigned long address, vm_fault_t fault)
 {
-	if (fatal_signal_pending(current) && !(error_code & X86_PF_USER)) {
-		no_context(regs, error_code, address, 0, 0);
+	/*
+	 * If we already have a fatal signal, don't bother adding
+	 * a new one. If it's a kernel access, just make it fail,
+	 * and if it's a user access just return to let the process
+	 * die.
+	 */
+	if (fatal_signal_pending(current)) {
+		if (!(error_code & X86_PF_USER))
+			no_context(regs, error_code, address, 0, 0);
 		return;
 	}

@@ -1389,7 +1396,8 @@ void do_user_addr_fault(struct pt_regs *regs,
 			return;
 		}
 retry:
-		down_read(&mm->mmap_sem);
+		if (down_read_killable(&mm->mmap_sem))
+			goto fatal_signal;
 	} else {
 		/*
 		 * The above down_read_trylock() might have succeeded in
@@ -1455,6 +1463,7 @@ void do_user_addr_fault(struct pt_regs *regs,
 				goto retry;
 		}

+fatal_signal:
 		/* User mode? Just return to handle the fatal exception */
 		if (flags & FAULT_FLAG_USER)
 			return;
diff --git a/mm/memory.c b/mm/memory.c
index e2bb51b6242e..7ad62f96b08e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3988,6 +3988,11 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
 					    flags & FAULT_FLAG_REMOTE))
 		return VM_FAULT_SIGSEGV;

+	if (flags & FAULT_FLAG_KILLABLE) {
+		if (fatal_signal_pending(current))
+			return VM_FAULT_SIGSEGV;
+	}
+
 	/*
 	 * Enable the memcg OOM handling for faults triggered in user
 	 * space.  Kernel faults are handled more gracefully.

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH] /dev/mem: Bail out upon SIGKILL when reading memory.
  2019-08-22 21:17             ` Tetsuo Handa
@ 2019-08-22 23:59               ` Dmitry Vyukov
  2019-08-23  8:17                 ` Tetsuo Handa
  0 siblings, 1 reply; 31+ messages in thread
From: Dmitry Vyukov @ 2019-08-22 23:59 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Greg Kroah-Hartman, Linus Torvalds, Arnd Bergmann, LKML, syzbot,
	Eric Biggers

On Thu, Aug 22, 2019 at 2:17 PM Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> On 2019/08/23 2:11, Dmitry Vyukov wrote:
> > On Thu, Aug 22, 2019 at 9:42 AM Greg Kroah-Hartman
> > <gregkh@linuxfoundation.org> wrote:
> >>>>> By the way, write_mem() worries me whether there is possibility of replacing
> >>>>> kernel code/data with user-defined memory data supplied from userspace.
> >>>>> If write_mem() were by chance replaced with code that does
> >>>>>
> >>>>>    while (1);
> >>>>>
> >>>>> we won't be able to return from write_mem() even if we added fatal_signal_pending() check.
> >>>>> Ditto for replacing local variables with unexpected values...
> >>>>
> >>>> I'm sorry, I don't really understand what you mean here, but I haven't
> >>>> had my morning coffee...  Any hints as to an example?
> >>>
> >>> Probably similar idea: "lockdown: Restrict /dev/{mem,kmem,port} when the kernel is locked down"
> >>>
> >>> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/drivers/char/mem.c?h=next-20190822&id=9b9d8dda1ed72e9bd560ab0ca93d322a9440510e
> >>>
> >>> Then, syzbot might want to blacklist writing to /dev/mem .
> >>
> >> syzbot should probably blacklist that now, you can do a lot of bad
> >> things writing to that device node :(
> >
> > Agree. It wasn't supposed to reach it, but it figured out how to mount
> > devfs and then open "./mem"  bypassing all checks. Fortunately there
> > is a config to disable /dev/mem, so we are going to turn it off.
> >
>
> Can't we introduce a kernel config which selectively blocks specific actions?
> If we don't need to worry about bypassing blacklist checks, we will be able to
> enable syz_execute_func() again.


We can consider this, but we need some set of good use cases first.
For /dev/{mem,kmem} we disable them with config, right? That looks
like the right thing to do because we don't want fuzzer to do anything
with these files anyway. So this won't be a good use case for
CONFIG_KERNEL_BUILT_FOR_FUZZ_TESTING.
Fuzzer can also reliably filter out based on syscall numbers of
top-level argument values. The potential problem is with (1)
pointers/indirect memory and (2) where blacklisting some top-level
argument values would backlist too much (e.g. prohibiting 3rd ioctl
argument 0 entirely).


> ----------
>                         ptr = xlate_dev_mem_ptr(p);
>                         if (!ptr) {
>                                 if (written)
>                                         break;
>                                 return -EFAULT;
>                         }
> +#ifndef CONFIG_KERNEL_BUILT_FOR_FUZZ_TESTING
>                         copied = copy_from_user(ptr, buf, sz);
> +#else
> +                       copied = 0;
> +#endif
>                         unxlate_dev_mem_ptr(p, ptr);
> ----------

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] /dev/mem: Bail out upon SIGKILL when reading memory.
  2019-08-22 23:59               ` Dmitry Vyukov
@ 2019-08-23  8:17                 ` Tetsuo Handa
  2019-08-23 16:47                   ` Dmitry Vyukov
  2019-10-08  9:57                   ` Kernel config for fuzz testing Tetsuo Handa
  0 siblings, 2 replies; 31+ messages in thread
From: Tetsuo Handa @ 2019-08-23  8:17 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Greg Kroah-Hartman, Linus Torvalds, Arnd Bergmann, LKML, syzbot,
	Eric Biggers

On 2019/08/23 8:59, Dmitry Vyukov wrote:
>> Can't we introduce a kernel config which selectively blocks specific actions?
>> If we don't need to worry about bypassing blacklist checks, we will be able to
>> enable syz_execute_func() again.
> 
> 
> We can consider this, but we need some set of good use cases first.
> For /dev/{mem,kmem} we disable them with config, right?

/dev/{mem,kmem} can be disabled by kernel config options. But

>                                                         That looks
> like the right thing to do because we don't want fuzzer to do anything
> with these files anyway.

I don't think so. To examine as corner as possible (e.g. lock dependency),
I consider that even doing

----------
+#ifdef CONFIG_KERNEL_BUILT_FOR_FUZZ_TESTING
+static char dummybuf[PAGE_SIZE];
+#endif
----------

----------
                        ptr = xlate_dev_mem_ptr(p);
                        if (!ptr) {
                                if (written)
                                        break;
                                return -EFAULT;
                        }
+#ifndef CONFIG_KERNEL_BUILT_FOR_FUZZ_TESTING
                        copied = copy_from_user(ptr, buf, sz);
+#else
+                       copied = copy_from_user(dummybuf, buf, min(sizeof(dummybuf), sz));
+#endif
                        unxlate_dev_mem_ptr(p, ptr);
----------

makes sense, for copy_from_user() might find new lock dependency
which would otherwise be unnoticed.

>                          So this won't be a good use case for
> CONFIG_KERNEL_BUILT_FOR_FUZZ_TESTING.
> Fuzzer can also reliably filter out based on syscall numbers of
> top-level argument values. The potential problem is with (1)
> pointers/indirect memory and (2) where blacklisting some top-level
> argument values would backlist too much (e.g. prohibiting 3rd ioctl
> argument 0 entirely).

I consider that functions that freezes processes/filesystems, 
reboots/shutdowns a system, changes console loglevels can be blocked
as well. Trying to examine up to last-second conditional branches will
catch more bugs (e.g. bugs in error recovery paths).


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] /dev/mem: Bail out upon SIGKILL when reading memory.
  2019-08-22 22:08 ` Linus Torvalds
@ 2019-08-23  9:16   ` Ingo Molnar
  2019-08-23 16:39     ` Linus Torvalds
  2019-08-23 11:46   ` [PATCH] /dev/mem: Bail out upon SIGKILL when reading memory Tetsuo Handa
  1 sibling, 1 reply; 31+ messages in thread
From: Ingo Molnar @ 2019-08-23  9:16 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Tetsuo Handa, Arnd Bergmann, Greg Kroah-Hartman,
	Linux List Kernel Mailing, syzbot


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Tue, Aug 20, 2019 at 3:07 PM Tetsuo Handa
> <penguin-kernel@i-love.sakura.ne.jp> wrote:
> >
> > syzbot found that a thread can stall for minutes inside read_mem()
> > after that thread was killed by SIGKILL [1]. Reading 2GB at one read()
> > is legal, but delaying termination of killed thread for minutes is bad.
> 
> Side note: we might even just allow regular signals to interrupt
> /dev/mem reads. We already do that for /dev/zero, and the risk of
> breaking something is likely fairly low since nothing should use that
> thing anyway.
> 
> Also, if it takes minutes to delay killing things, that implies that
> we're probably still faulting in pages for the read_mem(). Which
> points to another possible thing we could do in general: just don't
> bother to handle page faults when a fatal signal is pending.
> 
> That situation might happen for other random cases too, and is not
> limited to /dev/mem. So maybe it's worth trying? Does that essentially
> fix the /dev/mem read case too in practice?
> 
> COMPLETELY untested patch attached, it may or may not make a
> difference (and it may or may not work at all ;)
> 
>                 Linus

>  arch/x86/mm/fault.c | 15 ++++++++++++---
>  mm/memory.c         |  5 +++++
>  2 files changed, 17 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index 9ceacd1156db..d6c029a6cb90 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -1033,8 +1033,15 @@ static noinline void
>  mm_fault_error(struct pt_regs *regs, unsigned long error_code,
>  	       unsigned long address, vm_fault_t fault)
>  {
> -	if (fatal_signal_pending(current) && !(error_code & X86_PF_USER)) {
> -		no_context(regs, error_code, address, 0, 0);
> +	/*
> +	 * If we already have a fatal signal, don't bother adding
> +	 * a new one. If it's a kernel access, just make it fail,
> +	 * and if it's a user access just return to let the process
> +	 * die.
> +	 */
> +	if (fatal_signal_pending(current)) {
> +		if (!(error_code & X86_PF_USER))
> +			no_context(regs, error_code, address, 0, 0);

Since the 'signal' parameter to no_context() is 0, will there be another 
signal generated? I don't think so - so the comment looks wrong to me, 
unless I'm missing something.

Other than that, what we are skipping here if a KILL signal is pending is 
the printout of oops information if the fault is a kernel one.

Not sure that's a good idea in general: carefully timing a KILL signal 
would allow the 'stealth probing' of otherwise oops generating addresses? 
Arguably it would only be a marginally useful facility to any attacker, 
but it doesn't feel right to me to skip the printing of bad faults if 
they occur.

I.e. I'm not sure this hunk is necessary or even a good idea.



>  		return;
>  	}
>  
> @@ -1389,7 +1396,8 @@ void do_user_addr_fault(struct pt_regs *regs,
>  			return;
>  		}
>  retry:
> -		down_read(&mm->mmap_sem);
> +		if (down_read_killable(&mm->mmap_sem))
> +			goto fatal_signal;
>  	} else {
>  		/*
>  		 * The above down_read_trylock() might have succeeded in
> @@ -1455,6 +1463,7 @@ void do_user_addr_fault(struct pt_regs *regs,
>  				goto retry;
>  		}
>  
> +fatal_signal:
>  		/* User mode? Just return to handle the fatal exception */
>  		if (flags & FAULT_FLAG_USER)
>  			return;
> diff --git a/mm/memory.c b/mm/memory.c
> index e2bb51b6242e..7ad62f96b08e 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3988,6 +3988,11 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
>  					    flags & FAULT_FLAG_REMOTE))
>  		return VM_FAULT_SIGSEGV;
>  
> +	if (flags & FAULT_FLAG_KILLABLE) {
> +		if (fatal_signal_pending(current))
> +			return VM_FAULT_SIGSEGV;
> +	}
> +
>  	/*
>  	 * Enable the memcg OOM handling for faults triggered in user
>  	 * space.  Kernel faults are handled more gracefully.

This part looks good to me, assuming someone tests it and it solves the 
bug :-)

Acked-by: Ingo Molnar <mingo@kernel.org>

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] /dev/mem: Bail out upon SIGKILL when reading memory.
  2019-08-22 22:08 ` Linus Torvalds
  2019-08-23  9:16   ` Ingo Molnar
@ 2019-08-23 11:46   ` Tetsuo Handa
  1 sibling, 0 replies; 31+ messages in thread
From: Tetsuo Handa @ 2019-08-23 11:46 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Arnd Bergmann, Greg Kroah-Hartman, Linux List Kernel Mailing, syzbot

On 2019/08/23 7:08, Linus Torvalds wrote:
>> syzbot found that a thread can stall for minutes inside read_mem()
>> after that thread was killed by SIGKILL [1]. Reading 2GB at one read()
>> is legal, but delaying termination of killed thread for minutes is bad.
> 
> Side note: we might even just allow regular signals to interrupt
> /dev/mem reads. We already do that for /dev/zero, and the risk of
> breaking something is likely fairly low since nothing should use that
> thing anyway.
> 
> Also, if it takes minutes to delay killing things, that implies that
> we're probably still faulting in pages for the read_mem(). Which
> points to another possible thing we could do in general: just don't
> bother to handle page faults when a fatal signal is pending.

The cause of stall (by the reproducer) is that read_mem() continues iteration
until copy_to_user() returns -EFAULT (the userspace is asking for 2GB without
allocating 2GB of buffer to receive the result). Also, since the reproducer
concurrently calls preadv() but iteration loop in read_mem() consumes 100% of
CPU time, termination of killed threads is delayed a lot.

Thus, I can confirm that adding cond_resched() and fatal_signal_pending(current)
check into the iteration significantly reduces termination delay.

[ 1414.479373][T82148] read_mem: sz=4096 count=1070538751
[ 1414.481476][T75184] read_mem: sz=4096 count=1064574975
[ 1414.483489][T75184] read_mem: sz=4096 count=1064570879
[ 1414.483495][T82148] read_mem: sz=4096 count=1070534655
[ 1414.485517][T74318] read_mem: sz=4096 count=1060986879
[ 1414.485520][T75184] read_mem: sz=4096 count=1064566783
[ 1414.487978][T75184] read_mem: sz=4096 count=1064562687
[ 1414.487982][T74318] read_mem: sz=4096 count=1060982783
[ 1414.490011][T75184] read_mem: sz=4096 count=1064558591
[ 1414.490015][T74318] read_mem: sz=4096 count=1060978687
[ 1414.492099][T82148] read_mem: sz=4096 count=1070530559
[ 1414.492101][T75184] read_mem: sz=4096 count=1064554495
[ 1414.492104][T74497] read_mem: sz=4096 count=1062944767
[ 1414.494100][T75184] read_mem: sz=4096 count=1064550399
[ 1414.494104][T82148] read_mem: sz=4096 count=1070526463
[ 1414.494107][T74497] read_mem: sz=4096 count=1062940671
[ 1414.496112][T82148] read_mem: sz=4096 count=1070522367
[ 1414.496116][T74497] read_mem: sz=4096 count=1062936575
[ 1414.496119][T75184] read_mem: sz=4096 count=1064546303
[ 1414.498157][T75184] read_mem: sz=4096 count=1064542207
[ 1414.498169][T74497] read_mem: sz=4096 count=1062932479
[ 1414.498276][   T33] INFO: task a.out:74318 can't die for more than 30 seconds.
[ 1414.498278][   T33] a.out           R  running task    13688 74318 112832 0x80004084
[ 1414.498284][   T33] Call Trace:
[ 1414.498292][   T33]  ? __schedule+0x253/0x700
[ 1414.498296][   T33]  preempt_schedule_common+0x1b/0x3a
[ 1414.498298][   T33]  _cond_resched+0x18/0x20
[ 1414.498301][   T33]  kmem_cache_alloc_trace+0x274/0x320
[ 1414.498354][   T33]  ? reserve_memtype+0xd4/0x3e0
[ 1414.498359][   T33]  reserve_memtype+0xd4/0x3e0
[ 1414.498363][   T33]  ? memremap+0xa0/0x1a0
[ 1414.498365][   T33]  __ioremap_caller.isra.10+0xeb/0x2f0
[ 1414.498370][   T33]  memremap+0xa0/0x1a0
[ 1414.498373][   T33]  xlate_dev_mem_ptr+0x20/0x30
[ 1414.498403][   T33]  read_mem+0xf3/0x1f0
[ 1414.498409][   T33]  do_loop_readv_writev+0x47/0x160
[ 1414.498413][   T33]  do_iter_read+0xef/0x120
[ 1414.498417][   T33]  vfs_readv+0x68/0xa0
[ 1414.498423][   T33]  ? ktime_get_coarse_real_ts64+0x66/0xd0
[ 1414.498426][   T33]  ? lockdep_hardirqs_on+0x122/0x1b0
[ 1414.498429][   T33]  ? syscall_trace_enter+0x1f3/0x340
[ 1414.498431][   T33]  ? syscall_trace_enter+0x1f3/0x340
[ 1414.498433][   T33]  ? trace_hardirqs_on_thunk+0x1a/0x20
[ 1414.498436][   T33]  do_preadv+0x97/0xc0
[ 1414.498440][   T33]  do_syscall_64+0x55/0x260
[ 1414.498443][   T33]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1414.498445][   T33] RIP: 0033:0x7f63675c0349
[ 1414.498449][   T33] Code: Bad RIP value.
[ 1414.498450][   T33] RSP: 002b:00007ffd22952c18 EFLAGS: 00000246 ORIG_RAX: 0000000000000127
[ 1414.498452][   T33] RAX: ffffffffffffffda RBX: 00000000000f4240 RCX: 00007f63675c0349
[ 1414.498453][   T33] RDX: 0000000000000002 RSI: 0000000020000740 RDI: 0000000000000003
[ 1414.498453][   T33] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[ 1414.498454][   T33] R10: 00000000febfffff R11: 0000000000000246 R12: 000000000014e1ab
[ 1414.498455][   T33] R13: 00007ffd22952d50 R14: 0000000000000000 R15: 0000000000000000
[ 1414.498463][   T33] INFO: task a.out:74497 can't die for more than 30 seconds.
[ 1414.498464][   T33] a.out           R  running task    13888 74497 112827 0x8000408c
[ 1414.498468][   T33] Call Trace:
[ 1414.498473][   T33]  ? __lock_acquire+0x24b/0x1090
[ 1414.498479][   T33]  ? vprintk_emit+0x19e/0x2e0
[ 1414.498484][   T33]  ? vprintk_emit+0x1d4/0x2e0
[ 1414.498485][   T33]  ? vprintk_emit+0x19e/0x2e0
[ 1414.498490][   T33]  ? printk+0x53/0x6a
[ 1414.498495][   T33]  ? read_mem+0x19a/0x1f0
[ 1414.498499][   T33]  ? do_loop_readv_writev+0x47/0x160
[ 1414.498503][   T33]  ? do_iter_read+0xef/0x120
[ 1414.498506][   T33]  ? vfs_readv+0x68/0xa0
[ 1414.498511][   T33]  ? ktime_get_coarse_real_ts64+0x66/0xd0
[ 1414.498512][   T33]  ? lockdep_hardirqs_on+0x122/0x1b0
[ 1414.498515][   T33]  ? syscall_trace_enter+0x1f3/0x340
[ 1414.498517][   T33]  ? syscall_trace_enter+0x1f3/0x340
[ 1414.498519][   T33]  ? trace_hardirqs_on_thunk+0x1a/0x20
[ 1414.498521][   T33]  ? do_preadv+0x97/0xc0
[ 1414.498525][   T33]  ? do_syscall_64+0x55/0x260
[ 1414.498527][   T33]  ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1414.498535][   T33] INFO: task a.out:74786 can't die for more than 30 seconds.
[ 1414.498535][   T33] a.out           R  running task    13688 74786 112831 0x80004084
[ 1414.498539][   T33] Call Trace:
[ 1414.498542][   T33]  ? __schedule+0x253/0x700
[ 1414.498546][   T33]  preempt_schedule_common+0x1b/0x3a
[ 1414.498548][   T33]  _cond_resched+0x18/0x20
[ 1414.498550][   T33]  kmem_cache_alloc_trace+0x274/0x320
[ 1414.498552][   T33]  ? reserve_memtype+0xd4/0x3e0
[ 1414.498555][   T33]  reserve_memtype+0xd4/0x3e0
[ 1414.498558][   T33]  ? memremap+0xa0/0x1a0
[ 1414.498561][   T33]  __ioremap_caller.isra.10+0xeb/0x2f0
[ 1414.498566][   T33]  memremap+0xa0/0x1a0
[ 1414.498568][   T33]  xlate_dev_mem_ptr+0x20/0x30
[ 1414.498570][   T33]  read_mem+0xf3/0x1f0
[ 1414.498574][   T33]  do_loop_readv_writev+0x47/0x160
[ 1414.498578][   T33]  do_iter_read+0xef/0x120
[ 1414.498581][   T33]  vfs_readv+0x68/0xa0
[ 1414.498586][   T33]  ? ktime_get_coarse_real_ts64+0x66/0xd0
[ 1414.498588][   T33]  ? lockdep_hardirqs_on+0x122/0x1b0
[ 1414.498590][   T33]  ? syscall_trace_enter+0x1f3/0x340
[ 1414.498592][   T33]  ? syscall_trace_enter+0x1f3/0x340
[ 1414.498594][   T33]  ? trace_hardirqs_on_thunk+0x1a/0x20
[ 1414.498596][   T33]  do_preadv+0x97/0xc0
[ 1414.498600][   T33]  do_syscall_64+0x55/0x260
[ 1414.498602][   T33]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1414.498603][   T33] RIP: 0033:0x7f63675c0349
[ 1414.498605][   T33] Code: Bad RIP value.
[ 1414.498606][   T33] RSP: 002b:00007ffd22952c18 EFLAGS: 00000246 ORIG_RAX: 0000000000000127
[ 1414.498607][   T33] RAX: ffffffffffffffda RBX: 00000000000f4240 RCX: 00007f63675c0349
[ 1414.498608][   T33] RDX: 0000000000000002 RSI: 0000000020000740 RDI: 0000000000000003
[ 1414.498609][   T33] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[ 1414.498610][   T33] R10: 00000000febfffff R11: 0000000000000246 R12: 000000000014e74b
[ 1414.498610][   T33] R13: 00007ffd22952d50 R14: 0000000000000000 R15: 0000000000000000
[ 1414.498620][   T33] 
[ 1414.498620][   T33] Showing all locks held in the system:
[ 1414.498624][   T33] 1 lock held by khungtaskd/33:
[ 1414.498625][   T33]  #0: ffffffff96276b40 (rcu_read_lock){....}, at: debug_show_all_locks+0xe/0x1a0
[ 1414.498774][   T33] 2 locks held by agetty/2831:
[ 1414.498775][   T33]  #0: ffffa3a16cd84b18 (&tty->ldisc_sem){++++}, at: tty_ldisc_ref_wait+0x1f/0x50
[ 1414.498803][   T33]  #1: ffffb02b413592e0 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0xd3/0x930
[ 1414.498814][   T33] 2 locks held by bash/2859:
[ 1414.498814][   T33]  #0: ffffa3a16cd81338 (&tty->ldisc_sem){++++}, at: tty_ldisc_ref_wait+0x1f/0x50
[ 1414.498817][   T33]  #1: ffffb02b4135d2e0 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0xd3/0x930
[ 1414.498823][   T33] 2 locks held by bash/2984:
[ 1414.498824][   T33]  #0: ffffa3a17453ddb8 (&tty->ldisc_sem){++++}, at: tty_ldisc_ref_wait+0x1f/0x50
[ 1414.498826][   T33]  #1: ffffb02b41fcd2e0 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0xd3/0x930
[ 1414.498832][   T33] 2 locks held by agetty/4230:
[ 1414.498833][   T33]  #0: ffffa3a16cdf6708 (&tty->ldisc_sem){++++}, at: tty_ldisc_ref_wait+0x1f/0x50
[ 1414.498835][   T33]  #1: ffffb02b461ff2e0 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0xd3/0x930
[ 1414.498843][   T33] 1 lock held by a.out/74497:
[ 1414.498848][   T33] 3 locks held by a.out/78162:
[ 1414.498852][   T33] 1 lock held by a.out/86204:
[ 1414.498856][   T33] 
[ 1414.498857][   T33] =============================================
[ 1414.498857][   T33] 
[ 1414.500709][T75184] read_mem: sz=4096 count=1064538111
[ 1414.505727][T75184] read_mem: sz=4096 count=1064534015
[ 1414.524668][T74318] read_mem: sz=4096 count=1060974591
[ 1414.526675][T74786] read_mem: sz=4096 count=1062191103
[ 1414.529343][T82148] read_mem: sz=4096 count=1070518271
[ 1414.529356][T78162] read_mem: sz=4096 count=1067028479
[ 1414.530675][T74318] read_mem: sz=4096 count=1060970495
[ 1414.532791][T74318] read_mem: sz=4096 count=1060966399
[ 1414.532806][T75184] read_mem: sz=4096 count=1064529919
[ 1414.534823][T75184] read_mem: sz=4096 count=1064525823


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] /dev/mem: Bail out upon SIGKILL when reading memory.
  2019-08-23  9:16   ` Ingo Molnar
@ 2019-08-23 16:39     ` Linus Torvalds
  2019-08-24 16:14       ` Ingo Molnar
  0 siblings, 1 reply; 31+ messages in thread
From: Linus Torvalds @ 2019-08-23 16:39 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Tetsuo Handa, Arnd Bergmann, Greg Kroah-Hartman,
	Linux List Kernel Mailing, syzbot

On Fri, Aug 23, 2019 at 2:16 AM Ingo Molnar <mingo@kernel.org> wrote:
>
> > +      */
> > +     if (fatal_signal_pending(current)) {
> > +             if (!(error_code & X86_PF_USER))
> > +                     no_context(regs, error_code, address, 0, 0);
>
> Since the 'signal' parameter to no_context() is 0, will there be another
> signal generated? I don't think so - so the comment looks wrong to me,
> unless I'm missing something.

The old case only handled the kernel case - which never added a signal
at all, it just failed the access (and results in EFAULT or similar,
but nobody cares since the whole point is that the process is going to
be killed).

The *changed* case handles user space accesses too - by just returning
without any new signal being generated. The old case would fall
through to the "generate SIGSEGV", which seems pointless and wrong
(and also possibly generates misleading messages in the kernel logs).

> Other than that, what we are skipping here if a KILL signal is pending is
> the printout of oops information if the fault is a kernel one.
>
> Not sure that's a good idea in general: carefully timing a KILL signal
> would allow the 'stealth probing' of otherwise oops generating addresses?

That sounds really like a non-issue to me. Plus the old code ALREADY
did that exact thing. The only _new_ case is that it does is silently
for user page faults.

> I.e. I'm not sure this hunk is necessary or even a good idea.

As mentioned, the new code doesn't change the case you are talking about at all.

The new code only changes the case of this happening from user space,
when it doesn't generate a pointless signal and a pointless possible
show_signal_msg() garbage for a user mode access that was denied due
to the new

> > +     if (flags & FAULT_FLAG_KILLABLE) {
> > +             if (fatal_signal_pending(current))
> > +                     return VM_FAULT_SIGSEGV;
> > +     }

code in handle_mm_fault().

And you said that new code looks fine to you, but you didn't seem to
realize that it causes stupid incorrect kernel messages to be printed
("segfault at xyz" etc) that are completely wrong.

Because it's not a "real" SIGSEGV. It gets repressed by the fact that
there's a fatal signal pending.

Otherwise we'd have to add a whole new case of VM_FAULT_xyz..

             Linus

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] /dev/mem: Bail out upon SIGKILL when reading memory.
  2019-08-23  8:17                 ` Tetsuo Handa
@ 2019-08-23 16:47                   ` Dmitry Vyukov
  2019-10-08  9:57                   ` Kernel config for fuzz testing Tetsuo Handa
  1 sibling, 0 replies; 31+ messages in thread
From: Dmitry Vyukov @ 2019-08-23 16:47 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Greg Kroah-Hartman, Linus Torvalds, Arnd Bergmann, LKML, syzbot,
	Eric Biggers

On Fri, Aug 23, 2019 at 1:17 AM Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> On 2019/08/23 8:59, Dmitry Vyukov wrote:
> >> Can't we introduce a kernel config which selectively blocks specific actions?
> >> If we don't need to worry about bypassing blacklist checks, we will be able to
> >> enable syz_execute_func() again.
> >
> >
> > We can consider this, but we need some set of good use cases first.
> > For /dev/{mem,kmem} we disable them with config, right?
>
> /dev/{mem,kmem} can be disabled by kernel config options. But
>
> >                                                         That looks
> > like the right thing to do because we don't want fuzzer to do anything
> > with these files anyway.
>
> I don't think so. To examine as corner as possible (e.g. lock dependency),
> I consider that even doing
>
> ----------
> +#ifdef CONFIG_KERNEL_BUILT_FOR_FUZZ_TESTING
> +static char dummybuf[PAGE_SIZE];
> +#endif
> ----------
>
> ----------
>                         ptr = xlate_dev_mem_ptr(p);
>                         if (!ptr) {
>                                 if (written)
>                                         break;
>                                 return -EFAULT;
>                         }
> +#ifndef CONFIG_KERNEL_BUILT_FOR_FUZZ_TESTING
>                         copied = copy_from_user(ptr, buf, sz);
> +#else
> +                       copied = copy_from_user(dummybuf, buf, min(sizeof(dummybuf), sz));
> +#endif
>                         unxlate_dev_mem_ptr(p, ptr);
> ----------
>
> makes sense, for copy_from_user() might find new lock dependency
> which would otherwise be unnoticed.
>
> >                          So this won't be a good use case for
> > CONFIG_KERNEL_BUILT_FOR_FUZZ_TESTING.
> > Fuzzer can also reliably filter out based on syscall numbers of
> > top-level argument values. The potential problem is with (1)
> > pointers/indirect memory and (2) where blacklisting some top-level
> > argument values would backlist too much (e.g. prohibiting 3rd ioctl
> > argument 0 entirely).
>
> I consider that functions that freezes processes/filesystems,
> reboots/shutdowns a system, changes console loglevels can be blocked
> as well. Trying to examine up to last-second conditional branches will
> catch more bugs (e.g. bugs in error recovery paths).

Well, ok, sounds reasonable. If you can take on upstreaming such
config, we will definitely enable it on syzbot.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] /dev/mem: Bail out upon SIGKILL when reading memory.
  2019-08-23 16:39     ` Linus Torvalds
@ 2019-08-24 16:14       ` Ingo Molnar
  2019-08-24 17:40         ` Linus Torvalds
  0 siblings, 1 reply; 31+ messages in thread
From: Ingo Molnar @ 2019-08-24 16:14 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Tetsuo Handa, Arnd Bergmann, Greg Kroah-Hartman,
	Linux List Kernel Mailing, syzbot


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Fri, Aug 23, 2019 at 2:16 AM Ingo Molnar <mingo@kernel.org> wrote:
> >
> > > +      */
> > > +     if (fatal_signal_pending(current)) {
> > > +             if (!(error_code & X86_PF_USER))
> > > +                     no_context(regs, error_code, address, 0, 0);
> >
> > Since the 'signal' parameter to no_context() is 0, will there be another
> > signal generated? I don't think so - so the comment looks wrong to me,
> > unless I'm missing something.
> 
> The old case only handled the kernel case - which never added a signal
> at all, it just failed the access (and results in EFAULT or similar,
> but nobody cares since the whole point is that the process is going to
> be killed).
> 
> The *changed* case handles user space accesses too - by just returning
> without any new signal being generated. The old case would fall
> through to the "generate SIGSEGV", which seems pointless and wrong
> (and also possibly generates misleading messages in the kernel logs).

Indeed!

> > Other than that, what we are skipping here if a KILL signal is pending is
> > the printout of oops information if the fault is a kernel one.
> >
> > Not sure that's a good idea in general: carefully timing a KILL signal
> > would allow the 'stealth probing' of otherwise oops generating addresses?
> 
> That sounds really like a non-issue to me. Plus the old code ALREADY
> did that exact thing. The only _new_ case is that it does is silently
> for user page faults.
> 
> > I.e. I'm not sure this hunk is necessary or even a good idea.
> 
> As mentioned, the new code doesn't change the case you are talking about at all.
> 
> The new code only changes the case of this happening from user space,
> when it doesn't generate a pointless signal and a pointless possible
> show_signal_msg() garbage for a user mode access that was denied due
> to the new
> 
> > > +     if (flags & FAULT_FLAG_KILLABLE) {
> > > +             if (fatal_signal_pending(current))
> > > +                     return VM_FAULT_SIGSEGV;
> > > +     }
> 
> code in handle_mm_fault().
> 
> And you said that new code looks fine to you, but you didn't seem to
> realize that it causes stupid incorrect kernel messages to be printed
> ("segfault at xyz" etc) that are completely wrong.
> 
> Because it's not a "real" SIGSEGV. It gets repressed by the fact that
> there's a fatal signal pending.
> 
> Otherwise we'd have to add a whole new case of VM_FAULT_xyz..

Indeed :-/

At least I can report that I was able to reproduce a probable variant of 
the bug Tetsuo reported on a regular PC as well, with this command:

  dd if=/dev/mem of=/dev/null status=progress conv=noerror bs=100M

The 'noerror' is needed to skip unreadable holes early in the memory map.

What I noticed is that while reading regular RAM is reasonably fast even 
in very large chunks, accesses become very slow once they hit iomem - and 
delays even longer than the reported 145 seconds become possible if 
bs=100M is increased to say bs=1000M.

With your patch applied these large read chunks are still possible and 
not interruptible within read_mem(), and because the source of the 
slowdown is the iomem access and memcpy()s, not the paging overhead, the 
new page fault code doesn't even trigger and the delays remain.

I.e. this hypothesis is not true, on my testbox at least - and with my 
different testcase:

> > > Also, if it takes minutes to delay killing things, that implies 
> > > that we're probably still faulting in pages for the read_mem(). 
> > > Which points to another possible thing we could do in general: just 
> > > don't bother to handle page faults when a fatal signal is pending.

With Tetsuo's approach the delays are fixed, because the fatal signal is 
noticed within the 4K chunks of read_mem() immediately:

[pid  1674] 1566662105.052485 write(1, "ZZZZ\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 104857600) = 104857600 <0.000008>
[pid  1674] 1566662105.052512 read(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 104857600) = 104857600 <17.006740>
[pid  1674] 1566662122.059306 write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 104857600) = 104857600 <0.000009>
) = 1 <0.000010>662122.059334 write(2, "\r", 1
[pid  1674] 1566662122.059372 write(2, "3145728000 bytes (3.1 GB) copied", 323145728000 bytes (3.1 GB) copied) = 32 <0.000009>
[pid  1674] 1566662122.059411 write(2, ", 18.214699 s, 173 MB/s", 23, 18.214699 s, 173 MB/s) = 23 <0.000008>
[pid  1674] 1566662122.059433 read(0,  <unfinished ...>
[pid  1674] 1566662123.082491 +++ killed by SIGKILL +++

Note how the first 100MB chunk took 17 seconds, the second chunk was 
interrupted within ~2 seconds after I sent a SIGKILL.

Find below a variant of Tetsuo's patch, that uses a more regular signal 
return pattern that you suggested in your other mail:

> (Ok, in this case I think it wants
>
>         err = -EINTR;
>         if (fatal_signal_pending(current))
>                 break;
>
> instead, but the point is to make it look like signal handling, just
> with the special "fatal signals can sometimes be handled even when
> regular signals might not make it through".

Except that I think the regular pattern here is not 'break' but 'goto 
failed' - 'break' would just result in a partial read and 'err' gets 
ignored.

I also agree that we should probably allow regular interrupts to 
interrupt /dev/mem accesses - while the 'dd' process is killable, it's 
not Ctrl-C-able.

If I change the fatal_signal_pending() to signal_pending() it all behaves 
much better IMHO - assuming that no utility learned rely on 
non-interruptibility ...

The second patch is attached below the first one.

Independently of this I also agree with Tetsuo that a cond_resched() 
might be in order to break up this loop on non-preempt kernels. The third 
patch below implements this variant.

Is this closer to what you had in mind?

Thanks,

	Ingo

---
 drivers/char/mem.c |    9 +++++++++
 1 file changed, 9 insertions(+)

Index: linux/drivers/char/mem.c
===================================================================
--- linux.orig/drivers/char/mem.c
+++ linux/drivers/char/mem.c
@@ -175,6 +175,15 @@ static ssize_t read_mem(struct file *fil
 		p += sz;
 		count -= sz;
 		read += sz;
+
+		/*
+		 * Reading from iomem areas of /dev/mem can be slow,
+		 * depending on the hardware, so allow such read()s
+		 * to be interrupted via fatal signals (SIGKILL):
+		 */
+		err = -EINTR;
+		if (fatal_signal_pending(current))
+			goto failed;
 	}
 	kfree(bounce);
 
 
 drivers/char/mem.c |    9 +++++++++
 1 file changed, 9 insertions(+)

Index: linux/drivers/char/mem.c
===================================================================
--- linux.orig/drivers/char/mem.c
+++ linux/drivers/char/mem.c
@@ -175,6 +175,15 @@ static ssize_t read_mem(struct file *fil
 		p += sz;
 		count -= sz;
 		read += sz;
+
+		/*
+		 * Reading from iomem areas of /dev/mem can be slow,
+		 * depending on the hardware, so allow such read()s
+		 * to be interruptible:
+		 */
+		err = -EINTR;
+		if (signal_pending(current))
+			goto failed;
 	}
 	kfree(bounce);
 
Index: linux/drivers/char/mem.c
===================================================================
--- linux.orig/drivers/char/mem.c
+++ linux/drivers/char/mem.c
@@ -175,6 +175,18 @@ static ssize_t read_mem(struct file *fil
 		p += sz;
 		count -= sz;
 		read += sz;
+
+		/*
+		 * Reading from iomem areas of /dev/mem can be slow,
+		 * depending on the hardware, so allow such read()s
+		 * to be interruptible and preemptible:
+		 */
+		err = -EINTR;
+		if (signal_pending(current))
+			goto failed;
+
+		if (need_resched())
+			cond_resched();
 	}
 	kfree(bounce);
 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] /dev/mem: Bail out upon SIGKILL when reading memory.
  2019-08-24 16:14       ` Ingo Molnar
@ 2019-08-24 17:40         ` Linus Torvalds
  2019-08-24 20:22           ` Ingo Molnar
  0 siblings, 1 reply; 31+ messages in thread
From: Linus Torvalds @ 2019-08-24 17:40 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Tetsuo Handa, Arnd Bergmann, Greg Kroah-Hartman,
	Linux List Kernel Mailing, syzbot

On Sat, Aug 24, 2019 at 9:14 AM Ingo Molnar <mingo@kernel.org> wrote:
>
> What I noticed is that while reading regular RAM is reasonably fast even
> in very large chunks, accesses become very slow once they hit iomem - and
> delays even longer than the reported 145 seconds become possible if
> bs=100M is increased to say bs=1000M.

Ok, that makes sense.

So for IOMEM, we actually have two independent issues:

 (a) for normal unmapped IOMEM (ie no device), which is probably the
case your test triggers anyway, it quite possibly ends up having ISA
timings (ie around 1us per read)

 (b) we use "probe_kernel_read()" to a temporary buffer avoid page
faults, which uses __copy_from_user_inatomic(). So far so good. But on
modern x86, because we treat it as just a regular memcpy, we probably
have ERMS and do "rep movsb".

So (a) means that each access is slow to begin with, and then (b)
means that we don't use "copy_from_io()" but a slow byte-by-byte
access.

> With Tetsuo's approach the delays are fixed, because the fatal signal is
> noticed within the 4K chunks of read_mem() immediately:

Yeah. that's the size of the temp buffer.

> Note how the first 100MB chunk took 17 seconds, the second chunk was
> interrupted within ~2 seconds after I sent a SIGKILL.

It looks closer to 1 second from that trace, but that actually is
still within the basic noise above: a 4kB buffer being copied one byte
at a time would take about 4s, but maybe it's not *quite* as bad as
traditional ISA PIO timings.

> Except that I think the regular pattern here is not 'break' but 'goto
> failed' - 'break' would just result in a partial read and 'err' gets
> ignored.

That's actually the correct signal handling pattern: either a partial
read, or -EINTR.

In the case of SIGKILL, the return value doesn't matter, of course,
but *if* we were to decide that we can make it interruptible, then it
would.

> I also agree that we should probably allow regular interrupts to
> interrupt /dev/mem accesses - while the 'dd' process is killable, it's
> not Ctrl-C-able.

I'm a bit nervous about it, but there probably aren't all that many
actual /dev/mem users.

The main ones I can see are things like "dmidecode" etc.

> If I change the fatal_signal_pending() to signal_pending() it all behaves
> much better IMHO - assuming that no utility learned rely on
> non-interruptibility ...

So I cloned the dmidecode git tree, and it does "myread()" on the
/dev/mem file as far as I can tell. And that one correctly hanles
partial reads.

It still makes me a bit nervous, but just plain "signal_pending()" is
not just nicer, it's technically the right thing to do (it's not a
regular file, so partial reads are permissible, and other files like
it - eg /dev/zero - already does exactly that).

I also wonder if we should just not use that crazy
"probe_kernel_read()" to begin with. The /dev/mem code uses it to
avoid faults - and that's the intent of it - but it's not meant for
IO-memory at all.

But "read()" on /dev/mem does that off "xlate_dev_mem_ptr()", which is
a bastardized "kernel address or ioremap" thing. It works, but it's
all kinds of stupid. We'd be a *lot* better off using kmap(), I think.

So my gut feel is that this is something worth trying to do bigger and
more fundamental changes to.

But changing it to use "signal_pending()" at least as a trial looks
ok. The only user *should* be things like dmidecode that apparently
already do the right thing.

And if we changed the bounce buffering to do things at least a 32-bit
word at a time, that would help latency for IO by a factor of 4.

And if the signal_pending() is at the end of the copy, then we'd
guarantee that at least _small_ reads still work the way they did
before, so people who do things like "lspci()" with direct hardware
accesses wouldn't be impacted - if they exist.

So I'd be willing to try that (and then if somebody reports a
regression we can make it use "fatal_signal_pending()" instead)

                    Linus

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] /dev/mem: Bail out upon SIGKILL when reading memory.
  2019-08-24 17:40         ` Linus Torvalds
@ 2019-08-24 20:22           ` Ingo Molnar
  2019-08-24 20:56             ` Linus Torvalds
  2019-08-25  5:49             ` Tetsuo Handa
  0 siblings, 2 replies; 31+ messages in thread
From: Ingo Molnar @ 2019-08-24 20:22 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Tetsuo Handa, Arnd Bergmann, Greg Kroah-Hartman,
	Linux List Kernel Mailing, syzbot


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Sat, Aug 24, 2019 at 9:14 AM Ingo Molnar <mingo@kernel.org> wrote:
> >
> > What I noticed is that while reading regular RAM is reasonably fast even
> > in very large chunks, accesses become very slow once they hit iomem - and
> > delays even longer than the reported 145 seconds become possible if
> > bs=100M is increased to say bs=1000M.
> 
> Ok, that makes sense.
> 
> So for IOMEM, we actually have two independent issues:
> 
>  (a) for normal unmapped IOMEM (ie no device), which is probably the
> case your test triggers anyway, it quite possibly ends up having ISA
> timings (ie around 1us per read)

That makes sense: I measured 17 seconds per 100 MB of data, which is is 
0.16 usecs per byte. The instruction used by 
copy_user_enhanced_fast_string() is REP MOVSB - which supposedly goes as 
high as cacheline size accesses - but perhaps those get broken down for 
physical memory that has no device claiming it?

>  (b) we use "probe_kernel_read()" to a temporary buffer avoid page
> faults, which uses __copy_from_user_inatomic(). So far so good. But on
> modern x86, because we treat it as just a regular memcpy, we probably
> have ERMS and do "rep movsb".
> 
> So (a) means that each access is slow to begin with, and then (b)
> means that we don't use "copy_from_io()" but a slow byte-by-byte
> access.
> 
> > With Tetsuo's approach the delays are fixed, because the fatal signal is
> > noticed within the 4K chunks of read_mem() immediately:
> 
> Yeah. that's the size of the temp buffer.
> 
> > Note how the first 100MB chunk took 17 seconds, the second chunk was
> > interrupted within ~2 seconds after I sent a SIGKILL.
> 
> It looks closer to 1 second from that trace, but that actually is
> still within the basic noise above: a 4kB buffer being copied one byte
> at a time would take about 4s, but maybe it's not *quite* as bad as
> traditional ISA PIO timings.

Yeah - and note that I phrased it imprecisely: the real in-kernel signal 
interruption delay was probably below 1 msec: in my test the SIGKILL was 
manually triggered, with an about 1 second delay caused by the human 
brain, not by the kernel. ;-)

The in-kernel interruption is almost instantaneous - the 4K max chunking 
is good I think.

> > Except that I think the regular pattern here is not 'break' but 'goto
> > failed' - 'break' would just result in a partial read and 'err' gets
> > ignored.
> 
> That's actually the correct signal handling pattern: either a partial
> read, or -EINTR.
> 
> In the case of SIGKILL, the return value doesn't matter, of course,
> but *if* we were to decide that we can make it interruptible, then it
> would.

Yeah. So while error cases and signals are not equivalent, I also went by 
existing precedent within read_kmem(): the 2 other error cases return an 
error (-ENXIO and -EFAULT), while they could also conceivably return a 
partial read of the previously completed read and only return an error if 
the *first* chunk generates an error without making any progress?

Interruption is arguably *not* an 'error', so preserving partial reads 
sounds like the higher quality solution, nevertheless one could argue 
that particual read *could* be returned by read_kmem() if progress was 
made.

> > I also agree that we should probably allow regular interrupts to
> > interrupt /dev/mem accesses - while the 'dd' process is killable, it's
> > not Ctrl-C-able.
> 
> I'm a bit nervous about it, but there probably aren't all that many
> actual /dev/mem users.
> 
> The main ones I can see are things like "dmidecode" etc.
> 
> > If I change the fatal_signal_pending() to signal_pending() it all behaves
> > much better IMHO - assuming that no utility learned rely on
> > non-interruptibility ...
> 
> So I cloned the dmidecode git tree, and it does "myread()" on the
> /dev/mem file as far as I can tell. And that one correctly hanles
> partial reads.
> 
> It still makes me a bit nervous, but just plain "signal_pending()" is
> not just nicer, it's technically the right thing to do (it's not a
> regular file, so partial reads are permissible, and other files like
> it - eg /dev/zero - already does exactly that).
> 
> I also wonder if we should just not use that crazy
> "probe_kernel_read()" to begin with. The /dev/mem code uses it to
> avoid faults - and that's the intent of it - but it's not meant for
> IO-memory at all.
> 
> But "read()" on /dev/mem does that off "xlate_dev_mem_ptr()", which is
> a bastardized "kernel address or ioremap" thing. It works, but it's
> all kinds of stupid. We'd be a *lot* better off using kmap(), I think.

Hm, I think xlate_dev_mem_ptr() and thus ioremap() would also perform a 
cache aliasing conflict check - which kmap() wouldn't do?

I.e. if for example an iomem area is already mapped by a driver with some 
conflicting cache attribute, xlate_dev_mem_ptr() AFAICS will not 
ioremap_cache() it to cached? IIRC some CPUs would triple fault or 
completely misbehave on certain cache attribute conflicts.

This check in mremap() might also trigger:

        if (is_ram == REGION_MIXED) {
                WARN_ONCE(1, "memremap attempted on mixed range %pa size: %#lx\n",
                                &offset, (unsigned long) size);
                return NULL;
        }

So I'd say xlate_dev_mem_ptr() looks messy, but is possibly a bit more 
robust in this regard?

> So my gut feel is that this is something worth trying to do bigger and
> more fundamental changes to.
> 
> But changing it to use "signal_pending()" at least as a trial looks
> ok. The only user *should* be things like dmidecode that apparently
> already do the right thing.
> 
> And if we changed the bounce buffering to do things at least a 32-bit
> word at a time, that would help latency for IO by a factor of 4.
> 
> And if the signal_pending() is at the end of the copy, then we'd
> guarantee that at least _small_ reads still work the way they did
> before, so people who do things like "lspci()" with direct hardware
> accesses wouldn't be impacted - if they exist.

Yeah.

> So I'd be willing to try that (and then if somebody reports a
> regression we can make it use "fatal_signal_pending()" instead)

Ok, will post a changelogged patch (unless Tetsuo beats me to it?).

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] /dev/mem: Bail out upon SIGKILL when reading memory.
  2019-08-24 20:22           ` Ingo Molnar
@ 2019-08-24 20:56             ` Linus Torvalds
  2019-08-30  9:56               ` David Laight
  2019-08-25  5:49             ` Tetsuo Handa
  1 sibling, 1 reply; 31+ messages in thread
From: Linus Torvalds @ 2019-08-24 20:56 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Tetsuo Handa, Arnd Bergmann, Greg Kroah-Hartman,
	Linux List Kernel Mailing, syzbot

On Sat, Aug 24, 2019 at 1:22 PM Ingo Molnar <mingo@kernel.org> wrote:
>
> That makes sense: I measured 17 seconds per 100 MB of data, which is is
> 0.16 usecs per byte. The instruction used by
> copy_user_enhanced_fast_string() is REP MOVSB - which supposedly goes as
> high as cacheline size accesses - but perhaps those get broken down for
> physical memory that has no device claiming it?

All the "rep string" optimizations are _only_ done for regular memory.

When it hits any IO accesses, it will do the accesses at the specified
size (so "movsb" will do it a byte at a time).

0.16 usec per byte is faster than the traditional ISA 'inb', but not
by a huge factor.

> Interruption is arguably *not* an 'error', so preserving partial reads
> sounds like the higher quality solution, nevertheless one could argue
> that particual read *could* be returned by read_kmem() if progress was
> made.

So if we react to regular signals, not just fatal ones, we definitely
need to honor partial reads.

> I.e. if for example an iomem area is already mapped by a driver with some
> conflicting cache attribute, xlate_dev_mem_ptr() AFAICS will not
> ioremap_cache() it to cached? IIRC some CPUs would triple fault or
> completely misbehave on certain cache attribute conflicts.

Yeah, I guess we have the machine check possibility with mixed cached cases.

> This check in mremap() might also trigger:
>
>         if (is_ram == REGION_MIXED) {
>                 WARN_ONCE(1, "memremap attempted on mixed range %pa size: %#lx\n",
>                                 &offset, (unsigned long) size);
>                 return NULL;
>         }
>
> So I'd say xlate_dev_mem_ptr() looks messy, but is possibly a bit more
> robust in this regard?

It clearly does work. Slowly, but work.

At least it works on x86. On some other architectures /dev/mem
definitely cannot possibly handle IO memory correctly, because you
can't even try to just read it as regular memory.

But those architectures are few and likely not relevant anyway (eg early alpha).

             Linus

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] /dev/mem: Bail out upon SIGKILL when reading memory.
  2019-08-24 20:22           ` Ingo Molnar
  2019-08-24 20:56             ` Linus Torvalds
@ 2019-08-25  5:49             ` Tetsuo Handa
  2019-08-25  9:59               ` Ingo Molnar
  2019-08-25 16:54               ` Linus Torvalds
  1 sibling, 2 replies; 31+ messages in thread
From: Tetsuo Handa @ 2019-08-25  5:49 UTC (permalink / raw)
  To: Ingo Molnar, Linus Torvalds
  Cc: Arnd Bergmann, Greg Kroah-Hartman, Linux List Kernel Mailing, syzbot

On 2019/08/25 5:22, Ingo Molnar wrote:
>> So I'd be willing to try that (and then if somebody reports a
>> regression we can make it use "fatal_signal_pending()" instead)
> 
> Ok, will post a changelogged patch (unless Tetsuo beats me to it?).

Here is a patch. This patch also tries to fix handling of return code when
partial read/write happened (because we should return bytes processed when
we return due to -EINTR). But asymmetric between read function and write
function looks messy. Maybe we should just make /dev/{mem,kmem} killable
for now, and defer making /dev/{mem,kmem} interruptible till rewrite of
read/write functions.

 drivers/char/mem.c | 89 ++++++++++++++++++++++++++++++------------------------
 1 file changed, 50 insertions(+), 39 deletions(-)

diff --git a/drivers/char/mem.c b/drivers/char/mem.c
index cb8e653..3c6a3c2 100644
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -108,7 +108,7 @@ static ssize_t read_mem(struct file *file, char __user *buf,
 	ssize_t read, sz;
 	void *ptr;
 	char *bounce;
-	int err;
+	int err = 0;
 
 	if (p != *ppos)
 		return 0;
@@ -132,8 +132,10 @@ static ssize_t read_mem(struct file *file, char __user *buf,
 #endif
 
 	bounce = kmalloc(PAGE_SIZE, GFP_KERNEL);
-	if (!bounce)
-		return -ENOMEM;
+	if (!bounce) {
+		err = -ENOMEM;
+		goto failed;
+	}
 
 	while (count > 0) {
 		unsigned long remaining;
@@ -142,7 +144,7 @@ static ssize_t read_mem(struct file *file, char __user *buf,
 		sz = size_inside_page(p, count);
 		cond_resched();
 		err = -EINTR;
-		if (fatal_signal_pending(current))
+		if (signal_pending(current))
 			goto failed;
 
 		err = -EPERM;
@@ -180,14 +182,11 @@ static ssize_t read_mem(struct file *file, char __user *buf,
 		count -= sz;
 		read += sz;
 	}
+failed:
 	kfree(bounce);
 
 	*ppos += read;
-	return read;
-
-failed:
-	kfree(bounce);
-	return err;
+	return read ? read : err;
 }
 
 static ssize_t write_mem(struct file *file, const char __user *buf,
@@ -197,6 +196,7 @@ static ssize_t write_mem(struct file *file, const char __user *buf,
 	ssize_t written, sz;
 	unsigned long copied;
 	void *ptr;
+	int err = 0;
 
 	if (p != *ppos)
 		return -EFBIG;
@@ -223,13 +223,16 @@ static ssize_t write_mem(struct file *file, const char __user *buf,
 
 		sz = size_inside_page(p, count);
 		cond_resched();
-		if (fatal_signal_pending(current))
-			return -EINTR;
+		err = -EINTR;
+		if (signal_pending(current))
+			break;
 
+		err = -EPERM;
 		allowed = page_is_allowed(p >> PAGE_SHIFT);
 		if (!allowed)
-			return -EPERM;
+			break;
 
+		err = -EFAULT;
 		/* Skip actual writing when a page is marked as restricted. */
 		if (allowed == 1) {
 			/*
@@ -238,19 +241,14 @@ static ssize_t write_mem(struct file *file, const char __user *buf,
 			 * by the kernel or data corruption may occur.
 			 */
 			ptr = xlate_dev_mem_ptr(p);
-			if (!ptr) {
-				if (written)
-					break;
-				return -EFAULT;
-			}
+			if (!ptr)
+				break;
 
 			copied = copy_from_user(ptr, buf, sz);
 			unxlate_dev_mem_ptr(p, ptr);
 			if (copied) {
 				written += sz - copied;
-				if (written)
-					break;
-				return -EFAULT;
+				break;
 			}
 		}
 
@@ -261,7 +259,7 @@ static ssize_t write_mem(struct file *file, const char __user *buf,
 	}
 
 	*ppos += written;
-	return written;
+	return written ? written : err;
 }
 
 int __weak phys_mem_access_prot_allowed(struct file *file,
@@ -459,8 +457,10 @@ static ssize_t read_kmem(struct file *file, char __user *buf,
 		while (low_count > 0) {
 			sz = size_inside_page(p, low_count);
 			cond_resched();
-			if (fatal_signal_pending(current))
-				return -EINTR;
+			if (signal_pending(current)) {
+				err = -EINTR;
+				goto failed;
+			}
 
 			/*
 			 * On ia64 if a page has been mapped somewhere as
@@ -468,11 +468,15 @@ static ssize_t read_kmem(struct file *file, char __user *buf,
 			 * by the kernel or data corruption may occur
 			 */
 			kbuf = xlate_dev_kmem_ptr((void *)p);
-			if (!virt_addr_valid(kbuf))
-				return -ENXIO;
+			if (!virt_addr_valid(kbuf)) {
+				err = -ENXIO;
+				goto failed;
+			}
 
-			if (copy_to_user(buf, kbuf, sz))
-				return -EFAULT;
+			if (copy_to_user(buf, kbuf, sz)) {
+				err = -EFAULT;
+				goto failed;
+			}
 			buf += sz;
 			p += sz;
 			read += sz;
@@ -483,12 +487,14 @@ static ssize_t read_kmem(struct file *file, char __user *buf,
 
 	if (count > 0) {
 		kbuf = (char *)__get_free_page(GFP_KERNEL);
-		if (!kbuf)
-			return -ENOMEM;
+		if (!kbuf) {
+			err = -ENOMEM;
+			goto failed;
+		}
 		while (count > 0) {
 			sz = size_inside_page(p, count);
 			cond_resched();
-			if (fatal_signal_pending(current)) {
+			if (signal_pending(current)) {
 				err = -EINTR;
 				break;
 			}
@@ -510,6 +516,7 @@ static ssize_t read_kmem(struct file *file, char __user *buf,
 		}
 		free_page((unsigned long)kbuf);
 	}
+ failed:
 	*ppos = p;
 	return read ? read : err;
 }
@@ -520,6 +527,7 @@ static ssize_t do_write_kmem(unsigned long p, const char __user *buf,
 {
 	ssize_t written, sz;
 	unsigned long copied;
+	int err = 0;
 
 	written = 0;
 #ifdef __ARCH_HAS_NO_PAGE_ZERO_MAPPED
@@ -539,8 +547,10 @@ static ssize_t do_write_kmem(unsigned long p, const char __user *buf,
 
 		sz = size_inside_page(p, count);
 		cond_resched();
-		if (fatal_signal_pending(current))
-			return -EINTR;
+		if (signal_pending(current)) {
+			err = -EINTR;
+			break;
+		}
 
 		/*
 		 * On ia64 if a page has been mapped somewhere as uncached, then
@@ -548,15 +558,16 @@ static ssize_t do_write_kmem(unsigned long p, const char __user *buf,
 		 * corruption may occur.
 		 */
 		ptr = xlate_dev_kmem_ptr((void *)p);
-		if (!virt_addr_valid(ptr))
-			return -ENXIO;
+		if (!virt_addr_valid(ptr)) {
+			err = -ENXIO;
+			break;
+		}
 
 		copied = copy_from_user(ptr, buf, sz);
 		if (copied) {
 			written += sz - copied;
-			if (written)
-				break;
-			return -EFAULT;
+			err = -EFAULT;
+			break;
 		}
 		buf += sz;
 		p += sz;
@@ -565,7 +576,7 @@ static ssize_t do_write_kmem(unsigned long p, const char __user *buf,
 	}
 
 	*ppos += written;
-	return written;
+	return written ? written : err;
 }
 
 /*
@@ -600,7 +611,7 @@ static ssize_t write_kmem(struct file *file, const char __user *buf,
 			unsigned long n;
 
 			cond_resched();
-			if (fatal_signal_pending(current)) {
+			if (signal_pending(current)) {
 				err = -EINTR;
 				break;
 			}
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH] /dev/mem: Bail out upon SIGKILL when reading memory.
  2019-08-25  5:49             ` Tetsuo Handa
@ 2019-08-25  9:59               ` Ingo Molnar
  2019-08-25 10:35                 ` Tetsuo Handa
  2019-08-25 16:54               ` Linus Torvalds
  1 sibling, 1 reply; 31+ messages in thread
From: Ingo Molnar @ 2019-08-25  9:59 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Linus Torvalds, Arnd Bergmann, Greg Kroah-Hartman,
	Linux List Kernel Mailing, syzbot


* Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote:

> On 2019/08/25 5:22, Ingo Molnar wrote:
> >> So I'd be willing to try that (and then if somebody reports a
> >> regression we can make it use "fatal_signal_pending()" instead)
> > 
> > Ok, will post a changelogged patch (unless Tetsuo beats me to it?).
> 
> Here is a patch. This patch also tries to fix handling of return code when
> partial read/write happened (because we should return bytes processed when
> we return due to -EINTR). But asymmetric between read function and write
> function looks messy. Maybe we should just make /dev/{mem,kmem} killable
> for now, and defer making /dev/{mem,kmem} interruptible till rewrite of
> read/write functions.
> 
>  drivers/char/mem.c | 89 ++++++++++++++++++++++++++++++------------------------
>  1 file changed, 50 insertions(+), 39 deletions(-)
> 
> diff --git a/drivers/char/mem.c b/drivers/char/mem.c
> index cb8e653..3c6a3c2 100644
> --- a/drivers/char/mem.c
> +++ b/drivers/char/mem.c
> @@ -108,7 +108,7 @@ static ssize_t read_mem(struct file *file, char __user *buf,
>  	ssize_t read, sz;
>  	void *ptr;
>  	char *bounce;
> -	int err;
> +	int err = 0;
>  
>  	if (p != *ppos)
>  		return 0;
> @@ -132,8 +132,10 @@ static ssize_t read_mem(struct file *file, char __user *buf,
>  #endif
>  
>  	bounce = kmalloc(PAGE_SIZE, GFP_KERNEL);
> -	if (!bounce)
> -		return -ENOMEM;
> +	if (!bounce) {
> +		err = -ENOMEM;
> +		goto failed;
> +	}

Yeah, so while I agree with the more consistent handling of partial 
reads, I'd suggest the following changes:

 - Please don't use this 4-line error handling variant, use the old short 
   2-line pattern instead. There's no real reason to keep 'err' as a 
   flag, the 'failed' branch will know that 'err' is the error return if 
   there's been no progress.

 - We should probably separate out a third 'fatal error' variant: for 
   example if copying to user-space generates a page fault, then we 
   clearly should not pretend that all is fine and return a short read 
   even if we made some progress, a -EFAULT is more informative, as we 
   might have corrupted (overran) some user buffer on the failed copy as 
   well, and ran off the end into the first unmapped user area.

 - As for the patch series maybe it might make sense to separate the 
   fixes from the semantic changes, in case there's any breakage. I.e. 
   first fix the bug minimally, then add the other changes in a separate 
   commit. If any of them causes problems with applications we'll have a 
   more precise bisection result.

 - Likewise, the changing of the write side interruptability of /dev/mem 
   should probably be a separate patch as well.

I can factor out such a series if you don't have the time, but feel free 
to do it yourself, this is your bug report and your patch. :)

> @@ -180,14 +182,11 @@ static ssize_t read_mem(struct file *file, char __user *buf,
>  		count -= sz;
>  		read += sz;
>  	}
> +failed:
>  	kfree(bounce);
>  
>  	*ppos += read;
> -	return read;
> -
> -failed:
> -	kfree(bounce);
> -	return err;
> +	return read ? read : err;
>  }

Yeah, the merging of the normal and the failure path control flow doesn't 
really help readability and makes the actual iterator more complex - I 
think the old exception handling pattern was fine.

I think the same applies to the write path as well.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] /dev/mem: Bail out upon SIGKILL when reading memory.
  2019-08-25  9:59               ` Ingo Molnar
@ 2019-08-25 10:35                 ` Tetsuo Handa
  2019-08-25 10:48                   ` Ingo Molnar
  0 siblings, 1 reply; 31+ messages in thread
From: Tetsuo Handa @ 2019-08-25 10:35 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Arnd Bergmann, Greg Kroah-Hartman,
	Linux List Kernel Mailing, syzbot

On 2019/08/25 18:59, Ingo Molnar wrote:
>> @@ -132,8 +132,10 @@ static ssize_t read_mem(struct file *file, char __user *buf,
>>  #endif
>>  
>>  	bounce = kmalloc(PAGE_SIZE, GFP_KERNEL);
>> -	if (!bounce)
>> -		return -ENOMEM;
>> +	if (!bounce) {
>> +		err = -ENOMEM;
>> +		goto failed;
>> +	}
> 
> Yeah, so while I agree with the more consistent handling of partial 
> reads, I'd suggest the following changes:
> 
>  - Please don't use this 4-line error handling variant, use the old short 
>    2-line pattern instead. There's no real reason to keep 'err' as a 
>    flag, the 'failed' branch will know that 'err' is the error return if 
>    there's been no progress.

The caller might guarantee count > 0, but for robustness, I decided to
choose 4-line error handling here because I merged the normal and the
failure path control flow; read will remain 0 if count == 0, and thus
err should remain 0.

> 
>  - We should probably separate out a third 'fatal error' variant: for 
>    example if copying to user-space generates a page fault, then we 
>    clearly should not pretend that all is fine and return a short read 
>    even if we made some progress, a -EFAULT is more informative, as we 
>    might have corrupted (overran) some user buffer on the failed copy as 
>    well, and ran off the end into the first unmapped user area.

Is it possible that copy_from_user() corrupts user buffer in a way that userspace
cannot retry when kernel responded with "there was a short write"? It seems that
these functions are difficult to return appropriate errors...

> 
>  - As for the patch series maybe it might make sense to separate the 
>    fixes from the semantic changes, in case there's any breakage. I.e. 
>    first fix the bug minimally, then add the other changes in a separate 
>    commit. If any of them causes problems with applications we'll have a 
>    more precise bisection result.

Yes. I think for now we should just make these functions killable. Then, we
can try changing return code for partial read/write. If no breakage report,
we can make these functions interruptible.

> 
>  - Likewise, the changing of the write side interruptability of /dev/mem 
>    should probably be a separate patch as well.
> 
> I can factor out such a series if you don't have the time, but feel free 
> to do it yourself, this is your bug report and your patch. :)

You can do it. It's a syzbot's bug report. I just forwarded it. ;-)


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] /dev/mem: Bail out upon SIGKILL when reading memory.
  2019-08-25 10:35                 ` Tetsuo Handa
@ 2019-08-25 10:48                   ` Ingo Molnar
  0 siblings, 0 replies; 31+ messages in thread
From: Ingo Molnar @ 2019-08-25 10:48 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Linus Torvalds, Arnd Bergmann, Greg Kroah-Hartman,
	Linux List Kernel Mailing, syzbot


* Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote:

> >  - We should probably separate out a third 'fatal error' variant: for 
> >    example if copying to user-space generates a page fault, then we 
> >    clearly should not pretend that all is fine and return a short read 
> >    even if we made some progress, a -EFAULT is more informative, as we 
> >    might have corrupted (overran) some user buffer on the failed copy as 
> >    well, and ran off the end into the first unmapped user area.
> 
> Is it possible that copy_from_user() corrupts user buffer in a way that userspace
> cannot retry when kernel responded with "there was a short write"? It seems that
> these functions are difficult to return appropriate errors...

In the cleanest implementation I believe should be done is to 
differentiate between 'kernel side errors' and 'user side errors':

 - 'kernel side errors' are conditions that relate to the layout of 
   kernel memory: some areas might not be readable, and there might be 
   cachability restrictions - or the kernel ran out of memory. In this 
   case it's not user-space's "fault" that they ran into an error and 
   returning a partial read might improve the whole read process, as 
   user-space can decide to continue reading at the last offset that was 
   read - and would also be able to extract all information that can be 
   extracted.

 - 'user side errors' on the other hand are conditions that are probably 
   a user-space bug: such as trying to read() too much data into a too 
   small buffer, running off the end of it and potentially generating a 
   -EFAULT. In this case the kernel should not return a short read, but 
   escalate the error immediately - bugs are easier to find the earlier 
   the condition is reported.

So this is why I think it would make sense to have two error labels: 
"failure" and "fatal_failure". The "failure" case would return a partial 
read if possible (and an error if not), the "fatal_failure" would return 
an error immediately.

This is probably a tad over-engineered, but since you asked ... ;-)

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] /dev/mem: Bail out upon SIGKILL when reading memory.
  2019-08-25  5:49             ` Tetsuo Handa
  2019-08-25  9:59               ` Ingo Molnar
@ 2019-08-25 16:54               ` Linus Torvalds
  2019-08-26 10:40                 ` Tetsuo Handa
  1 sibling, 1 reply; 31+ messages in thread
From: Linus Torvalds @ 2019-08-25 16:54 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Ingo Molnar, Arnd Bergmann, Greg Kroah-Hartman,
	Linux List Kernel Mailing, syzbot

On Sat, Aug 24, 2019 at 10:50 PM Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> @@ -142,7 +144,7 @@ static ssize_t read_mem(struct file *file, char __user *buf,
>                 sz = size_inside_page(p, count);
>                 cond_resched();
>                 err = -EINTR;
> -               if (fatal_signal_pending(current))
> +               if (signal_pending(current))
>                         goto failed;
>
>                 err = -EPERM;

So from a "likelihood of breaking" standpoint, I'd really like to make
sure that the "signal_pending()" checks come at the *end* of the loop.

That way, if somebody is doing a 4-byte read from MMIO, he'll never see -EINTR.

I'm specifically thinking of tools like user-space 'lspci' etc, which
I wouldn't be surprised could happen.

Also, just in case things break, I do agree with Ingo that this should
be split up into several patches.

             Linus

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] /dev/mem: Bail out upon SIGKILL when reading memory.
  2019-08-25 16:54               ` Linus Torvalds
@ 2019-08-26 10:40                 ` Tetsuo Handa
  2019-08-26 11:19                   ` Ingo Molnar
  0 siblings, 1 reply; 31+ messages in thread
From: Tetsuo Handa @ 2019-08-26 10:40 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Arnd Bergmann, Greg Kroah-Hartman,
	Linux List Kernel Mailing, syzbot

On 2019/08/26 1:54, Linus Torvalds wrote:
> On Sat, Aug 24, 2019 at 10:50 PM Tetsuo Handa
> <penguin-kernel@i-love.sakura.ne.jp> wrote:
>>
>> @@ -142,7 +144,7 @@ static ssize_t read_mem(struct file *file, char __user *buf,
>>                 sz = size_inside_page(p, count);
>>                 cond_resched();
>>                 err = -EINTR;
>> -               if (fatal_signal_pending(current))
>> +               if (signal_pending(current))
>>                         goto failed;
>>
>>                 err = -EPERM;
> 
> So from a "likelihood of breaking" standpoint, I'd really like to make
> sure that the "signal_pending()" checks come at the *end* of the loop.
> 
> That way, if somebody is doing a 4-byte read from MMIO, he'll never see -EINTR.
> 
> I'm specifically thinking of tools like user-space 'lspci' etc, which
> I wouldn't be surprised could happen.
> 
> Also, just in case things break, I do agree with Ingo that this should
> be split up into several patches.

Thinking from how read_mem() returns error code instead of returning bytes
already processed, any sane users will not try to read so much memory (like 2GB).
If userspace programs want to read so much memory, there must have been attempts
to improve performance. I guess that userspace program somehow knows which region
to read and tries to read only meaningful pages (which would not become hundreds MB).
Thus, I don't think we want to make /dev/{mem,kmem} intrruptible. Just making killable
in case insane userspace program (like fuzzer) tried to read/write so much memory
will be sufficient...


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] /dev/mem: Bail out upon SIGKILL when reading memory.
  2019-08-26 10:40                 ` Tetsuo Handa
@ 2019-08-26 11:19                   ` Ingo Molnar
  2019-08-26 15:02                     ` Rewriting read_kmem()/write_kmem() ? Tetsuo Handa
  0 siblings, 1 reply; 31+ messages in thread
From: Ingo Molnar @ 2019-08-26 11:19 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Linus Torvalds, Arnd Bergmann, Greg Kroah-Hartman,
	Linux List Kernel Mailing, syzbot


* Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote:

> On 2019/08/26 1:54, Linus Torvalds wrote:
> > On Sat, Aug 24, 2019 at 10:50 PM Tetsuo Handa
> > <penguin-kernel@i-love.sakura.ne.jp> wrote:
> >>
> >> @@ -142,7 +144,7 @@ static ssize_t read_mem(struct file *file, char __user *buf,
> >>                 sz = size_inside_page(p, count);
> >>                 cond_resched();
> >>                 err = -EINTR;
> >> -               if (fatal_signal_pending(current))
> >> +               if (signal_pending(current))
> >>                         goto failed;
> >>
> >>                 err = -EPERM;
> > 
> > So from a "likelihood of breaking" standpoint, I'd really like to make
> > sure that the "signal_pending()" checks come at the *end* of the loop.
> > 
> > That way, if somebody is doing a 4-byte read from MMIO, he'll never see -EINTR.
> > 
> > I'm specifically thinking of tools like user-space 'lspci' etc, which
> > I wouldn't be surprised could happen.
> > 
> > Also, just in case things break, I do agree with Ingo that this should
> > be split up into several patches.
> 
> Thinking from how read_mem() returns error code instead of returning bytes
> already processed, any sane users will not try to read so much memory (like 2GB).
> If userspace programs want to read so much memory, there must have been attempts
> to improve performance. I guess that userspace program somehow knows which region
> to read and tries to read only meaningful pages (which would not become hundreds MB).
> Thus, I don't think we want to make /dev/{mem,kmem} intrruptible. Just making killable
> in case insane userspace program (like fuzzer) tried to read/write so much memory
> will be sufficient...

Basically making IO primitives interruptible is the norm and it's a 
quality of implementation issue: it's only a historic accident that 
/dev/mem read()s aren't.

So let's try and make it interruptible as the #3 patch I sent did - of 
course if anything breaks we'll have to undo it. But if we can get away 
with then by all means let's do so - even shorter reads can generate 
nasty long processing latencies.

Ok?

Thanks,

     Ingo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Rewriting read_kmem()/write_kmem() ?
  2019-08-26 11:19                   ` Ingo Molnar
@ 2019-08-26 15:02                     ` Tetsuo Handa
  0 siblings, 0 replies; 31+ messages in thread
From: Tetsuo Handa @ 2019-08-26 15:02 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Arnd Bergmann, Greg Kroah-Hartman,
	Linux List Kernel Mailing

On 2019/08/26 20:19, Ingo Molnar wrote:
> Basically making IO primitives interruptible is the norm and it's a 
> quality of implementation issue: it's only a historic accident that 
> /dev/mem read()s aren't.
> 
> So let's try and make it interruptible as the #3 patch I sent did - of 
> course if anything breaks we'll have to undo it. But if we can get away 
> with then by all means let's do so - even shorter reads can generate 
> nasty long processing latencies.
> 
> Ok?

This is how read_kmem()/write_kmem() could be rewritten (before doing
s/fatal_signal_pending/signal_pending/ change). Only compile tested.
Do we want to try this change using several patches?

 drivers/char/mem.c | 202 +++++++++++++++++++----------------------------------
 1 file changed, 73 insertions(+), 129 deletions(-)

diff --git a/drivers/char/mem.c b/drivers/char/mem.c
index 9eb564c..12bca2a 100644
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -430,114 +430,32 @@ static int mmap_kmem(struct file *file, struct vm_area_struct *vma)
 	return mmap_mem(file, vma);
 }
 
-/*
- * This function reads the *virtual* memory as seen by the kernel.
- */
-static ssize_t read_kmem(struct file *file, char __user *buf,
-			 size_t count, loff_t *ppos)
+static ssize_t do_copy_kmem(unsigned long p, char __user *buf, size_t count,
+			    loff_t *ppos, const bool is_write)
 {
-	unsigned long p = *ppos;
-	ssize_t low_count, read, sz;
-	char *kbuf; /* k-addr because vread() takes vmlist_lock rwlock */
-	int err = 0;
-
-	read = 0;
-	if (p < (unsigned long) high_memory) {
-		low_count = count;
-		if (count > (unsigned long)high_memory - p)
-			low_count = (unsigned long)high_memory - p;
+	ssize_t copied = 0;
+	ssize_t sz;
 
 #ifdef __ARCH_HAS_NO_PAGE_ZERO_MAPPED
-		/* we don't have page 0 mapped on sparc and m68k.. */
-		if (p < PAGE_SIZE && low_count > 0) {
-			sz = size_inside_page(p, low_count);
-			if (clear_user(buf, sz))
-				return -EFAULT;
-			buf += sz;
-			p += sz;
-			read += sz;
-			low_count -= sz;
-			count -= sz;
-		}
-#endif
-		while (low_count > 0) {
-			sz = size_inside_page(p, low_count);
-
-			/*
-			 * On ia64 if a page has been mapped somewhere as
-			 * uncached, then it must also be accessed uncached
-			 * by the kernel or data corruption may occur
-			 */
-			kbuf = xlate_dev_kmem_ptr((void *)p);
-			if (!virt_addr_valid(kbuf))
-				return -ENXIO;
-
-			if (copy_to_user(buf, kbuf, sz))
-				return -EFAULT;
-			buf += sz;
-			p += sz;
-			read += sz;
-			low_count -= sz;
-			count -= sz;
-			if (should_stop_iteration()) {
-				count = 0;
-				break;
-			}
-		}
-	}
-
-	if (count > 0) {
-		kbuf = (char *)__get_free_page(GFP_KERNEL);
-		if (!kbuf)
-			return -ENOMEM;
-		while (count > 0) {
-			sz = size_inside_page(p, count);
-			if (!is_vmalloc_or_module_addr((void *)p)) {
-				err = -ENXIO;
-				break;
-			}
-			sz = vread(kbuf, (char *)p, sz);
-			if (!sz)
-				break;
-			if (copy_to_user(buf, kbuf, sz)) {
-				err = -EFAULT;
-				break;
-			}
-			count -= sz;
-			buf += sz;
-			read += sz;
-			p += sz;
-			if (should_stop_iteration())
-				break;
-		}
-		free_page((unsigned long)kbuf);
-	}
-	*ppos = p;
-	return read ? read : err;
-}
-
-
-static ssize_t do_write_kmem(unsigned long p, const char __user *buf,
-				size_t count, loff_t *ppos)
-{
-	ssize_t written, sz;
-	unsigned long copied;
-
-	written = 0;
-#ifdef __ARCH_HAS_NO_PAGE_ZERO_MAPPED
 	/* we don't have page 0 mapped on sparc and m68k.. */
 	if (p < PAGE_SIZE) {
 		sz = size_inside_page(p, count);
-		/* Hmm. Do something? */
+		if (is_write) {
+			/* Hmm. Do something? */
+		} else {
+			if (clear_user(buf, sz))
+				return -EFAULT;
+		}
 		buf += sz;
 		p += sz;
 		count -= sz;
-		written += sz;
+		copied += sz;
 	}
 #endif
 
 	while (count > 0) {
 		void *ptr;
+		unsigned long n;
 
 		sz = size_inside_page(p, count);
 
@@ -550,78 +468,104 @@ static ssize_t do_write_kmem(unsigned long p, const char __user *buf,
 		if (!virt_addr_valid(ptr))
 			return -ENXIO;
 
-		copied = copy_from_user(ptr, buf, sz);
-		if (copied) {
-			written += sz - copied;
-			if (written)
+		if (is_write)
+			n = copy_from_user(ptr, buf, sz);
+		else
+			n = copy_to_user(buf, buf, sz);
+		if (n) {
+			copied += sz - n;
+			if (copied)
 				break;
 			return -EFAULT;
 		}
 		buf += sz;
 		p += sz;
 		count -= sz;
-		written += sz;
+		copied += sz;
 		if (should_stop_iteration())
 			break;
 	}
 
-	*ppos += written;
-	return written;
+	*ppos += copied;
+	return copied;
 }
 
-/*
- * This function writes to the *virtual* memory as seen by the kernel.
- */
-static ssize_t write_kmem(struct file *file, const char __user *buf,
-			  size_t count, loff_t *ppos)
+static ssize_t copy_kmem(char __user *buf, size_t count, loff_t *ppos,
+			 const bool is_write)
 {
 	unsigned long p = *ppos;
-	ssize_t wrote = 0;
-	ssize_t virtr = 0;
-	char *kbuf; /* k-addr because vwrite() takes vmlist_lock rwlock */
+	ssize_t copied = 0;
 	int err = 0;
 
 	if (p < (unsigned long) high_memory) {
-		unsigned long to_write = min_t(unsigned long, count,
-					       (unsigned long)high_memory - p);
-		wrote = do_write_kmem(p, buf, to_write, ppos);
-		if (wrote != to_write)
-			return wrote;
-		p += wrote;
-		buf += wrote;
-		count -= wrote;
+		unsigned long to_copy = min_t(unsigned long, count,
+					      (unsigned long)high_memory - p);
+
+		copied = do_copy_kmem(p, buf, to_copy, ppos, is_write);
+		if (copied != to_copy)
+			return copied;
+		p += copied;
+		buf += copied;
+		count -= copied;
 	}
 
 	if (count > 0) {
-		kbuf = (char *)__get_free_page(GFP_KERNEL);
+		/* k-addr because vread()/vwrite() takes vmlist_lock rwlock */
+		char *kbuf = (char *)__get_free_page(GFP_KERNEL);
+
 		if (!kbuf)
-			return wrote ? wrote : -ENOMEM;
+			return copied ? copied : -ENOMEM;
 		while (count > 0) {
-			unsigned long sz = size_inside_page(p, count);
-			unsigned long n;
+			ssize_t sz = size_inside_page(p, count);
 
 			if (!is_vmalloc_or_module_addr((void *)p)) {
 				err = -ENXIO;
 				break;
 			}
-			n = copy_from_user(kbuf, buf, sz);
-			if (n) {
-				err = -EFAULT;
-				break;
+			if (is_write) {
+				if (copy_from_user(kbuf, buf, sz)) {
+					err = -EFAULT;
+					break;
+				}
+				vwrite(kbuf, (char *)p, sz);
+			} else {
+				sz = vread(kbuf, (char *)p, sz);
+				if (!sz)
+					break;
+				if (copy_to_user(buf, kbuf, sz)) {
+					err = -EFAULT;
+					break;
+				}
 			}
-			vwrite(kbuf, (char *)p, sz);
 			count -= sz;
 			buf += sz;
-			virtr += sz;
+			copied += sz;
 			p += sz;
 			if (should_stop_iteration())
 				break;
 		}
 		free_page((unsigned long)kbuf);
 	}
-
 	*ppos = p;
-	return virtr + wrote ? : err;
+	return copied ? copied : err;
+}
+
+/*
+ * This function reads the *virtual* memory as seen by the kernel.
+ */
+static ssize_t read_kmem(struct file *file, char __user *buf,
+			 size_t count, loff_t *ppos)
+{
+	return copy_kmem(buf, count, ppos, false);
+}
+
+/*
+ * This function writes to the *virtual* memory as seen by the kernel.
+ */
+static ssize_t write_kmem(struct file *file, const char __user *buf,
+			  size_t count, loff_t *ppos)
+{
+	return copy_kmem((char __user *) buf, count, ppos, true);
 }
 
 static ssize_t read_port(struct file *file, char __user *buf,
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* RE: [PATCH] /dev/mem: Bail out upon SIGKILL when reading memory.
  2019-08-24 20:56             ` Linus Torvalds
@ 2019-08-30  9:56               ` David Laight
  0 siblings, 0 replies; 31+ messages in thread
From: David Laight @ 2019-08-30  9:56 UTC (permalink / raw)
  To: 'Linus Torvalds', Ingo Molnar
  Cc: Tetsuo Handa, Arnd Bergmann, Greg Kroah-Hartman,
	Linux List Kernel Mailing, syzbot

From: Linus Torvalds
> Sent: 24 August 2019 21:57
> On Sat, Aug 24, 2019 at 1:22 PM Ingo Molnar <mingo@kernel.org> wrote:
> >
> > That makes sense: I measured 17 seconds per 100 MB of data, which is is
> > 0.16 usecs per byte. The instruction used by
> > copy_user_enhanced_fast_string() is REP MOVSB - which supposedly goes as
> > high as cacheline size accesses - but perhaps those get broken down for
> > physical memory that has no device claiming it?
> 
> All the "rep string" optimizations are _only_ done for regular memory.

More likely for pages that uses the data cache.
It ought to be possible to map PCIe memory through the data cache.
With care that would allow longer TLPs (especially read TLP) for
cpu 'pio' buffer transfers.

> When it hits any IO accesses, it will do the accesses at the specified
> size (so "movsb" will do it a byte at a time).
> 
> 0.16 usec per byte is faster than the traditional ISA 'inb', but not
> by a huge factor.

That speed depends on the target, IIRC our fpga target takes 128 clocks
at 62.5MHz to process a PCIe read request.
None of the current Intel x86 cpus will issue multiple read TLP from
a single core, so reads never get any overlapping and suffer the
full latency.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Kernel config for fuzz testing.
  2019-08-23  8:17                 ` Tetsuo Handa
  2019-08-23 16:47                   ` Dmitry Vyukov
@ 2019-10-08  9:57                   ` Tetsuo Handa
  1 sibling, 0 replies; 31+ messages in thread
From: Tetsuo Handa @ 2019-10-08  9:57 UTC (permalink / raw)
  To: Dmitry Vyukov, Linus Torvalds
  Cc: Greg Kroah-Hartman, Arnd Bergmann, LKML, syzbot, Eric Biggers

On 2019/08/23 17:17, Tetsuo Handa wrote:
> On 2019/08/23 8:59, Dmitry Vyukov wrote:
>>> Can't we introduce a kernel config which selectively blocks specific actions?
>>> If we don't need to worry about bypassing blacklist checks, we will be able to
>>> enable syz_execute_func() again.
>>
>>
>> We can consider this, but we need some set of good use cases first.
>> For /dev/{mem,kmem} we disable them with config, right?
> 
> /dev/{mem,kmem} can be disabled by kernel config options. But
> 
>>                                                         That looks
>> like the right thing to do because we don't want fuzzer to do anything
>> with these files anyway.
> 
> I don't think so. To examine as corner as possible (e.g. lock dependency),
> I consider that even doing
> 
> ----------
> +#ifdef CONFIG_KERNEL_BUILT_FOR_FUZZ_TESTING
> +static char dummybuf[PAGE_SIZE];
> +#endif
> ----------
> 
> ----------
>                         ptr = xlate_dev_mem_ptr(p);
>                         if (!ptr) {
>                                 if (written)
>                                         break;
>                                 return -EFAULT;
>                         }
> +#ifndef CONFIG_KERNEL_BUILT_FOR_FUZZ_TESTING
>                         copied = copy_from_user(ptr, buf, sz);
> +#else
> +                       copied = copy_from_user(dummybuf, buf, min(sizeof(dummybuf), sz));
> +#endif
>                         unxlate_dev_mem_ptr(p, ptr);
> ----------
> 
> makes sense, for copy_from_user() might find new lock dependency
> which would otherwise be unnoticed.
> 
>>                          So this won't be a good use case for
>> CONFIG_KERNEL_BUILT_FOR_FUZZ_TESTING.
>> Fuzzer can also reliably filter out based on syscall numbers of
>> top-level argument values. The potential problem is with (1)
>> pointers/indirect memory and (2) where blacklisting some top-level
>> argument values would backlist too much (e.g. prohibiting 3rd ioctl
>> argument 0 entirely).
> 
> I consider that functions that freezes processes/filesystems, 
> reboots/shutdowns a system, changes console loglevels can be blocked
> as well. Trying to examine up to last-second conditional branches will
> catch more bugs (e.g. bugs in error recovery paths).
> 

A USB fuzz test is triggering SysRq-t until RCU stall happens. ;-)
Can we introduce a kernel config (for linux.git) which selectively
blocks specific actions?

https://syzkaller.appspot.com/text?tag=CrashLog&x=12eb1e67600000

[  563.439044][    C1] ksoftirqd/1     R  running task    28264    16      2 0x80004008
[  563.447037][    C1] Call Trace:
[  563.450321][    C1]  sched_show_task.cold+0x2e0/0x359
[  563.455502][    C1]  show_state_filter+0x164/0x209
[  563.460421][    C1]  ? fn_caps_on+0x90/0x90
[  563.464730][    C1]  k_spec+0xdc/0x120
[  563.468605][    C1]  kbd_event+0x927/0x3790
[  563.472914][    C1]  ? k_pad+0x720/0x720
[  563.476964][    C1]  ? mark_held_locks+0xe0/0xe0
[  563.481791][    C1]  ? sysrq_filter+0xdf/0xeb0
[  563.486359][    C1]  ? k_pad+0x720/0x720
[  563.490419][    C1]  input_to_handler+0x3b6/0x4c0
[  563.495247][    C1]  input_pass_values.part.0+0x2e3/0x720
[  563.500770][    C1]  input_repeat_key+0x1ee/0x2c0
[  563.505622][    C1]  ? input_dev_suspend+0x80/0x80
[  563.510535][    C1]  ? rcu_read_lock_bh_held+0xb0/0xb0
[  563.515805][    C1]  call_timer_fn+0x179/0x650
[  563.520385][    C1]  ? input_dev_suspend+0x80/0x80
[  563.525389][    C1]  ? msleep_interruptible+0x130/0x130
[  563.531528][    C1]  ? rcu_read_lock_sched_held+0x9c/0xd0
[  563.537058][    C1]  ? rcu_read_lock_bh_held+0xb0/0xb0
[  563.542334][    C1]  ? _raw_spin_unlock_irq+0x24/0x30
[  563.547519][    C1]  ? input_dev_suspend+0x80/0x80
[  563.552443][    C1]  run_timer_softirq+0x5e3/0x1490
[  563.557454][    C1]  ? add_timer+0x7a0/0x7a0
[  563.561853][    C1]  ? rcu_read_lock_sched_held+0x9c/0xd0
[  563.567377][    C1]  ? rcu_read_lock_bh_held+0xb0/0xb0
[  563.572644][    C1]  __do_softirq+0x221/0x912
[  563.577147][    C1]  ? takeover_tasklets+0x720/0x720
[  563.582246][    C1]  run_ksoftirqd+0x1f/0x40
[  563.586638][    C1]  smpboot_thread_fn+0x3e8/0x850
[  563.591640][    C1]  ? smpboot_unregister_percpu_thread+0x190/0x190
[  563.598031][    C1]  ? __kthread_parkme+0x10a/0x1c0
[  563.603031][    C1]  ? smpboot_unregister_percpu_thread+0x190/0x190
[  563.609440][    C1]  kthread+0x318/0x420
[  563.613746][    C1]  ? kthread_create_on_node+0xf0/0xf0
[  563.619092][    C1]  ret_from_fork+0x24/0x30

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2019-10-08  9:58 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-20 22:06 [PATCH] /dev/mem: Bail out upon SIGKILL when reading memory Tetsuo Handa
2019-08-20 22:24 ` Greg Kroah-Hartman
2019-08-21  0:07   ` Tetsuo Handa
2019-08-22  9:59   ` Tetsuo Handa
2019-08-22 13:35     ` Greg Kroah-Hartman
2019-08-22 14:00       ` Tetsuo Handa
2019-08-22 16:42         ` Greg Kroah-Hartman
2019-08-22 17:11           ` Dmitry Vyukov
2019-08-22 21:17             ` Tetsuo Handa
2019-08-22 23:59               ` Dmitry Vyukov
2019-08-23  8:17                 ` Tetsuo Handa
2019-08-23 16:47                   ` Dmitry Vyukov
2019-10-08  9:57                   ` Kernel config for fuzz testing Tetsuo Handa
2019-08-22 21:29 ` [PATCH] /dev/mem: Bail out upon SIGKILL when reading memory Linus Torvalds
2019-08-22 22:08 ` Linus Torvalds
2019-08-23  9:16   ` Ingo Molnar
2019-08-23 16:39     ` Linus Torvalds
2019-08-24 16:14       ` Ingo Molnar
2019-08-24 17:40         ` Linus Torvalds
2019-08-24 20:22           ` Ingo Molnar
2019-08-24 20:56             ` Linus Torvalds
2019-08-30  9:56               ` David Laight
2019-08-25  5:49             ` Tetsuo Handa
2019-08-25  9:59               ` Ingo Molnar
2019-08-25 10:35                 ` Tetsuo Handa
2019-08-25 10:48                   ` Ingo Molnar
2019-08-25 16:54               ` Linus Torvalds
2019-08-26 10:40                 ` Tetsuo Handa
2019-08-26 11:19                   ` Ingo Molnar
2019-08-26 15:02                     ` Rewriting read_kmem()/write_kmem() ? Tetsuo Handa
2019-08-23 11:46   ` [PATCH] /dev/mem: Bail out upon SIGKILL when reading memory Tetsuo Handa

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).