linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* perf: use after free in perf_remove_from_context
@ 2014-05-12 15:42 Sasha Levin
  2014-05-14 16:29 ` Peter Zijlstra
                   ` (2 more replies)
  0 siblings, 3 replies; 46+ messages in thread
From: Sasha Levin @ 2014-05-12 15:42 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, acme; +Cc: LKML, Thomas Gleixner, Dave Jones

Hi all,

While fuzzing with trinity inside a KVM tools guest running the latest -next
kernel I've stumbled on the following spew. Maybe related to the very recent
change in freeing on task exit?

[ 2509.827261] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[ 2509.830379] Dumping ftrace buffer:
[ 2509.830379]    (ftrace buffer empty)
[ 2509.830379] Modules linked in:
[ 2509.830379] CPU: 47 PID: 43306 Comm: trinity-c126 Tainted: G        W     3.15.0-rc5-next-20140512-sasha-00019-ga20bc00-dirty #456
[ 2509.830379] task: ffff880142e03000 ti: ffff880129628000 task.ti: ffff880129628000
[ 2509.830379] RIP: perf_remove_from_context (kernel/events/core.c:1495)
[ 2509.830379] RSP: 0000:ffff880129629bf8  EFLAGS: 00010292
[ 2509.830379] RAX: 0000000000000000 RBX: 6b6b6b6b6b6b6b6b RCX: 0000000000000001
[ 2509.830379] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 6b6b6b6b6b6b6b6b
[ 2509.830379] RBP: ffff880129629c38 R08: 0000000000000000 R09: 0000000000000000
[ 2509.830379] R10: 0000000000000001 R11: 0000000000000000 R12: ffff8801439caaa8
[ 2509.830379] R13: ffff88002cc84a40 R14: 6b6b6b6b6b6b6b6b R15: 0000000000000001
[ 2509.830379] FS:  00007fcd0ca92700(0000) GS:ffff8803a0e00000(0000) knlGS:0000000000000000
[ 2509.830379] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 2509.830379] CR2: 00000000029b7ff8 CR3: 00000004ad510000 CR4: 00000000000006a0
[ 2509.830379] DR0: 00000000006df000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2509.830379] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
[ 2509.830379] Stack:
[ 2509.830379]  6b6b6b6b6b6b6b6b 0000000000000201 ffff880129629c28 6b6b6b6b6b6b6b6b
[ 2509.830379]  ffff8801439caaa8 ffff88002cc84a40 ffff880142e03000 0000000000000001
[ 2509.830379]  ffff880129629ca8 ffffffffa127fa81 ffff880112f77200 ffff88002cc84c38
[ 2509.830379] Call Trace:
[ 2509.830379] perf_event_exit_task (kernel/events/core.c:7426 kernel/events/core.c:7489 kernel/events/core.c:7520)
[ 2509.830379] do_exit (kernel/exit.c:766)
[ 2509.830379] ? debug_smp_processor_id (lib/smp_processor_id.c:57)
[ 2509.830379] ? put_lock_stats.isra.12 (arch/x86/include/asm/preempt.h:98 kernel/locking/lockdep.c:254)
[ 2509.830379] ? _raw_spin_unlock_irq (arch/x86/include/asm/paravirt.h:819 include/linux/spinlock_api_smp.h:168 kernel/locking/spinlock.c:199)
[ 2509.830379] do_group_exit (kernel/exit.c:884)
[ 2509.830379] get_signal_to_deliver (kernel/signal.c:2346)
[ 2509.830379] ? vtime_account_user (kernel/sched/cputime.c:687)
[ 2509.830379] do_signal (arch/x86/kernel/signal.c:698)
[ 2509.830379] ? vtime_account_user (kernel/sched/cputime.c:687)
[ 2509.830379] ? preempt_count_sub (kernel/sched/core.c:2575)
[ 2509.830379] ? context_tracking_user_exit (arch/x86/include/asm/paravirt.h:809 (discriminator 2) kernel/context_tracking.c:182 (discriminator 2))
[ 2509.830379] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
[ 2509.830379] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2557 kernel/locking/lockdep.c:2599)
[ 2509.830379] do_notify_resume (arch/x86/kernel/signal.c:751)
[ 2509.830379] retint_signal (arch/x86/kernel/entry_64.S:1125)
[ 2509.830379] Code: f8 c9 c3 0f 1f 00 55 48 89 e5 41 57 41 89 f7 41 56 49 89 fe 41 55 41 54 53 48 83 ec 18 40 88 75 c8 8b 35 0f c4 ae 05 48 89 7d c0 <4c> 8b a7 d8 01 00 00 85 f6 4d 8b ac 24 50 01 00 00 74 18 49 8d
[ 2509.830379] RIP perf_remove_from_context (kernel/events/core.c:1495)
[ 2509.830379]  RSP <ffff880129629bf8>


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: perf: use after free in perf_remove_from_context
  2014-05-12 15:42 perf: use after free in perf_remove_from_context Sasha Levin
@ 2014-05-14 16:29 ` Peter Zijlstra
  2014-05-14 16:32   ` Sasha Levin
  2014-05-15 18:11 ` eventpoll __list_del_entry corruption (was: perf: use after free in perf_remove_from_context) Peter Zijlstra
  2014-05-16 15:34 ` BUG_ON drivers/char/random.c:986 (Was: perf: use after free in perf_remove_from_context) Peter Zijlstra
  2 siblings, 1 reply; 46+ messages in thread
From: Peter Zijlstra @ 2014-05-14 16:29 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Ingo Molnar, acme, LKML, Thomas Gleixner, Dave Jones

[-- Attachment #1: Type: text/plain, Size: 701 bytes --]

On Mon, May 12, 2014 at 11:42:33AM -0400, Sasha Levin wrote:
> Hi all,
> 
> While fuzzing with trinity inside a KVM tools guest running the latest -next
> kernel I've stumbled on the following spew. Maybe related to the very recent
> change in freeing on task exit?
> 
> [ 2509.827261] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> [ 2509.830379] Dumping ftrace buffer:
> [ 2509.830379]    (ftrace buffer empty)
> [ 2509.830379] Modules linked in:
> [ 2509.830379] CPU: 47 PID: 43306 Comm: trinity-c126 Tainted: G        W     3.15.0-rc5-next-20140512-sasha-00019-ga20bc00-dirty #456

Any particular trinity setup? And would you happen to have the seed of
that run?

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: perf: use after free in perf_remove_from_context
  2014-05-14 16:29 ` Peter Zijlstra
@ 2014-05-14 16:32   ` Sasha Levin
  2014-05-14 16:35     ` Peter Zijlstra
  0 siblings, 1 reply; 46+ messages in thread
From: Sasha Levin @ 2014-05-14 16:32 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, acme, LKML, Thomas Gleixner, Dave Jones

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 05/14/2014 12:29 PM, Peter Zijlstra wrote:
> On Mon, May 12, 2014 at 11:42:33AM -0400, Sasha Levin wrote:
>> Hi all,
>> 
>> While fuzzing with trinity inside a KVM tools guest running the latest -next kernel I've stumbled on the following spew. Maybe related to the very recent change in freeing on task exit?
>> 
>> [ 2509.827261] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC [ 2509.830379] Dumping ftrace buffer: [ 2509.830379]    (ftrace buffer empty) [ 2509.830379] Modules linked in: [ 2509.830379] CPU: 47 PID: 43306 Comm: trinity-c126 Tainted: G        W     3.15.0-rc5-next-20140512-sasha-00019-ga20bc00-dirty #456
> 
> Any particular trinity setup? And would you happen to have the seed of that run?

Nothing special about trinity options. 400 threads and blacklisting some of the
destructive syscalls (umount, reboot, etc).

I don't have the seed, but that problem did reproduce again tonight so I can test
out debug code if you have something in mind.


Thanks,
Sasha

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBAgAGBQJTc5qaAAoJEN6mb/eXdyzcg30P/jQ9XyI1kNd82nGwumNkZoxn
7krZU9M0IcUkBcvrxDFgPtdbcEt3p5K+mbF8+fwkwYSGXmKf1MhWaXjB47x74BMg
uM5DNlBTFYU9YRYtHnheb6UNhtIOhU5/oFfmMcGdxWCh/7PdXqrUBMAaAQDa1XJH
y+YBtOZWu+tKXVVuloCJMdgBvh/AI9Hbk/LqUyolKlKOR69A6oK7qFEhQLQpvk4y
63YYCePqLFcc1wYR0zscCnI53jw0qTqOUDo4pPMvWm+i2e9hjSpjDu0Y8drum4hf
nbrr3A8hR4Voy6+z92niEPcpn/N2OeNpijnjUc8YKprl+n2/i+mFx5sUlsAhiWMx
kzboP4yQQHcERuRaL1KJrQAQuWEtUFxLHVIzk0Ou+GrySCQKFCdL1XYO1sc84rOO
3ucDeSaf9lnjYOEHOydHolPzmEZynvGeOGclD5sxzJGrRmvKDuioI/gb5DIKJSLn
1948AukoekAHgFbrZv2ZXV57BtKcuY01cqpOGAfv4/7K8n4FTBgfTLd+1inpqk+n
P+tFzfI+Co+aB6UBHvCRg9Hr5qELjlorYAfVz98Fgvx8uqwbrM3xFCbwjem6qLOq
G2oHHB7/Wzu3GAabaZfwyMcf9jRLDXbgls5NIZ3jpMSt9BX1ezRTmTF4VhHuy51G
4SpXVJGpn/wrUoRygiop
=Fnmx
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: perf: use after free in perf_remove_from_context
  2014-05-14 16:32   ` Sasha Levin
@ 2014-05-14 16:35     ` Peter Zijlstra
  2014-05-14 16:38       ` Sasha Levin
  2014-05-28 23:52       ` Sasha Levin
  0 siblings, 2 replies; 46+ messages in thread
From: Peter Zijlstra @ 2014-05-14 16:35 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Ingo Molnar, acme, LKML, Thomas Gleixner, Dave Jones

[-- Attachment #1: Type: text/plain, Size: 1464 bytes --]

On Wed, May 14, 2014 at 12:32:26PM -0400, Sasha Levin wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> On 05/14/2014 12:29 PM, Peter Zijlstra wrote:
> > On Mon, May 12, 2014 at 11:42:33AM -0400, Sasha Levin wrote:
> >> Hi all,
> >> 
> >> While fuzzing with trinity inside a KVM tools guest running the latest -next kernel I've stumbled on the following spew. Maybe related to the very recent change in freeing on task exit?
> >> 
> >> [ 2509.827261] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC [ 2509.830379] Dumping ftrace buffer: [ 2509.830379]    (ftrace buffer empty) [ 2509.830379] Modules linked in: [ 2509.830379] CPU: 47 PID: 43306 Comm: trinity-c126 Tainted: G        W     3.15.0-rc5-next-20140512-sasha-00019-ga20bc00-dirty #456
> > 
> > Any particular trinity setup? And would you happen to have the seed of that run?
> 
> Nothing special about trinity options. 400 threads and blacklisting some of the
> destructive syscalls (umount, reboot, etc).
> 
> I don't have the seed, but that problem did reproduce again tonight so I can test
> out debug code if you have something in mind.

Nah, I drew a pretty big blank, which is why I wanted to see if I could
reproduce. If you could share your trinity cmdline I'd be much obliged.
While I did manage to clone (the repo moved since last time) and build
it, I'm not really that handy with it and want to avoid destroying my
machine if possible ;-)

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: perf: use after free in perf_remove_from_context
  2014-05-14 16:35     ` Peter Zijlstra
@ 2014-05-14 16:38       ` Sasha Levin
  2014-05-14 16:52         ` Peter Zijlstra
  2014-05-28 23:52       ` Sasha Levin
  1 sibling, 1 reply; 46+ messages in thread
From: Sasha Levin @ 2014-05-14 16:38 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, acme, LKML, Thomas Gleixner, Dave Jones

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 05/14/2014 12:35 PM, Peter Zijlstra wrote:
> On Wed, May 14, 2014 at 12:32:26PM -0400, Sasha Levin wrote:
>> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
>> 
>> On 05/14/2014 12:29 PM, Peter Zijlstra wrote:
>>> On Mon, May 12, 2014 at 11:42:33AM -0400, Sasha Levin wrote:
>>>> Hi all,
>>>> 
>>>> While fuzzing with trinity inside a KVM tools guest running the latest -next kernel I've stumbled on the following spew. Maybe related to the very recent change in freeing on task exit?
>>>> 
>>>> [ 2509.827261] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC [ 2509.830379] Dumping ftrace buffer: [ 2509.830379]    (ftrace buffer empty) [ 2509.830379] Modules linked in: [ 2509.830379] CPU: 47 PID: 43306 Comm: trinity-c126 Tainted: G        W     3.15.0-rc5-next-20140512-sasha-00019-ga20bc00-dirty #456
>>> 
>>> Any particular trinity setup? And would you happen to have the seed of that run?
>> 
>> Nothing special about trinity options. 400 threads and blacklisting some of the destructive syscalls (umount, reboot, etc).
>> 
>> I don't have the seed, but that problem did reproduce again tonight so I can test out debug code if you have something in mind.
> 
> Nah, I drew a pretty big blank, which is why I wanted to see if I could reproduce. If you could share your trinity cmdline I'd be much obliged. While I did manage to clone (the repo moved since last time) and build it, I'm not really that handy with it and want to avoid destroying my machine if possible ;-)
> 

./trinity -xinit_module -xreboot -xshutdown -xunshare -xnfsservctl -xclock_nanosleep -xuselib -xumount -xmount -m --quiet --dangerous -C 400 -l off

Note that I run it as root in a disposable VM. Running that as root on your
machine will likely kill it.


Thanks,
Sasha
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBAgAGBQJTc5vxAAoJEN6mb/eXdyzckwUQAJDWAzRcDfjQUwJsXHeMG9hb
QMBdTJqijHnbzOiPX/xHnxxALGVsUl8+ZekLxlCmPF6l5o9/AGngavrb2pBG9uQe
qo8B9vB6rxmzfImwWFpaC8LTizGfrwwU60Br1ByQiGCMnKVWGsenhOJgItawH+cu
nm2QsyKwGAU9OvwFXfEuD8jky9yvTMOZVftaZE2oLW85dyjzAu24+PD5ogMpCraJ
6sM4fgU8w3WdBqQEvxj4u3ET0DmUrASV4wLTJ+/K5Mty6IZPYChEpDrdMmBl7kOM
jk8QnJvP9jnflbfYwKvEZ3n0j9WkiVitU+3rITy17RJ7lHglwSR/5IwU/nhYVZUA
ymrWldsYRPx55r0XVvTGyfx/9SkpHyl/mOgtjbFTYq8c4oxGTpC0LQZIVmtVPP0l
j8ZBIV0pZUYK4KPvoe8eozZuJ6q+YdETrekLoFpLnIweo8bCfEU2vhD8Vzxp1aDm
E907cHq1pqe6eRC9Las/gAXC/efy9YhsmtQgdwVpwwDpyaIujznbCF9nTfYgOkgx
VoWED91EXqls8OQONlsNkc+Xy1fMmBZpwWiQeaOI5vLXhiwl0vxUl0ZHcb9VvJ4q
0k8EET64XIoJp+vZVO8d97AYG7B6Ub0W7DAs7Sn2ib5h44pByOXpR4scMZVXR7Hv
zcievjH07ftXEUXOn6Xr
=sJLe
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: perf: use after free in perf_remove_from_context
  2014-05-14 16:38       ` Sasha Levin
@ 2014-05-14 16:52         ` Peter Zijlstra
  2014-05-14 17:09           ` Sasha Levin
  0 siblings, 1 reply; 46+ messages in thread
From: Peter Zijlstra @ 2014-05-14 16:52 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Ingo Molnar, acme, LKML, Thomas Gleixner, Dave Jones

[-- Attachment #1: Type: text/plain, Size: 1518 bytes --]

On Wed, May 14, 2014 at 12:38:10PM -0400, Sasha Levin wrote:
> ./trinity -xinit_module -xreboot -xshutdown -xunshare -xnfsservctl -xclock_nanosleep -xuselib -xumount -xmount -m --quiet --dangerous -C 400 -l off
> 
> Note that I run it as root in a disposable VM. Running that as root on your
> machine will likely kill it.

Right, so i'll start with !root running. That said, I seem to get an
endless stream of double free or corrupt msgs:

trinity$ ./trinity -xinit_module -xreboot -xshutdown -xunshare -xnfsservctl -xclock_nanosleep -xuselib -xumount -xmount -m --quiet  -C 400 -l off

...

*** Error in `./trinity': double free or corruption (top): 0x000000000135af60 ***
[main] Random reseed: 3671679404
[main] Random reseed: 67838733
*** Error in `./trinity': double free or corruption (top): 0x000000000135ef60 ***
[main] Random reseed: 943044433
*** Error in `./trinity': double free or corruption (top): 0x00000000012bb210 ***
[main] Random reseed: 3627649145
*** Error in `./trinity': double free or corruption (top): 0x000000000139dbe0 ***
[main] Random reseed: 935036390
[main] Random reseed: 3922349807
[main] Random reseed: 835563887
*** Error in `./trinity': double free or corruption (top): 0x0000000001323890 ***
[main] Random reseed: 974945617
*** Error in `./trinity': double free or corruption (top): 0x0000000001308190 ***
[main] Random reseed: 2912296953
[main] Random reseed: 681900861


Is this 'expected' behaviour?

trinity$ git describe
1.1-1418-g8bf4840


[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: perf: use after free in perf_remove_from_context
  2014-05-14 16:52         ` Peter Zijlstra
@ 2014-05-14 17:09           ` Sasha Levin
  2014-05-14 17:20             ` Dave Jones
  0 siblings, 1 reply; 46+ messages in thread
From: Sasha Levin @ 2014-05-14 17:09 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, acme, LKML, Thomas Gleixner, Dave Jones

[-- Attachment #1: Type: text/plain, Size: 2003 bytes --]

On 05/14/2014 12:52 PM, Peter Zijlstra wrote:
> On Wed, May 14, 2014 at 12:38:10PM -0400, Sasha Levin wrote:
>> ./trinity -xinit_module -xreboot -xshutdown -xunshare -xnfsservctl -xclock_nanosleep -xuselib -xumount -xmount -m --quiet --dangerous -C 400 -l off
>>
>> Note that I run it as root in a disposable VM. Running that as root on your
>> machine will likely kill it.
> 
> Right, so i'll start with !root running. That said, I seem to get an
> endless stream of double free or corrupt msgs:
> 
> trinity$ ./trinity -xinit_module -xreboot -xshutdown -xunshare -xnfsservctl -xclock_nanosleep -xuselib -xumount -xmount -m --quiet  -C 400 -l off
> 
> ...
> 
> *** Error in `./trinity': double free or corruption (top): 0x000000000135af60 ***
> [main] Random reseed: 3671679404
> [main] Random reseed: 67838733
> *** Error in `./trinity': double free or corruption (top): 0x000000000135ef60 ***
> [main] Random reseed: 943044433
> *** Error in `./trinity': double free or corruption (top): 0x00000000012bb210 ***
> [main] Random reseed: 3627649145
> *** Error in `./trinity': double free or corruption (top): 0x000000000139dbe0 ***
> [main] Random reseed: 935036390
> [main] Random reseed: 3922349807
> [main] Random reseed: 835563887
> *** Error in `./trinity': double free or corruption (top): 0x0000000001323890 ***
> [main] Random reseed: 974945617
> *** Error in `./trinity': double free or corruption (top): 0x0000000001308190 ***
> [main] Random reseed: 2912296953
> [main] Random reseed: 681900861
> 
> 
> Is this 'expected' behaviour?

I've talked about it with Dave a while ago, he said it happens because the threads that
do all the testing sometimes corrupt themselves which result in that error. It shouldn't
be harmful as the thread will just get killed and a fresh one will be started in it's
place.

Since then I set MALLOC_CHECK_=0 to avoid seeing that in my logs.

So yes, it's probably a good thing you're seeing that.


Thanks,
Sasha


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 884 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: perf: use after free in perf_remove_from_context
  2014-05-14 17:09           ` Sasha Levin
@ 2014-05-14 17:20             ` Dave Jones
  2014-05-14 18:37               ` Peter Zijlstra
  0 siblings, 1 reply; 46+ messages in thread
From: Dave Jones @ 2014-05-14 17:20 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Peter Zijlstra, Ingo Molnar, acme, LKML, Thomas Gleixner

On Wed, May 14, 2014 at 01:09:31PM -0400, Sasha Levin wrote:
 
 > > *** Error in `./trinity': double free or corruption (top): 0x000000000135af60 ***
 > > [main] Random reseed: 3671679404
 > > [main] Random reseed: 67838733
 > > *** Error in `./trinity': double free or corruption (top): 0x000000000135ef60 ***
 > > [main] Random reseed: 943044433
 > > *** Error in `./trinity': double free or corruption (top): 0x00000000012bb210 ***
 > > [main] Random reseed: 3627649145
 > > *** Error in `./trinity': double free or corruption (top): 0x000000000139dbe0 ***
 > > [main] Random reseed: 935036390
 > > [main] Random reseed: 3922349807
 > > [main] Random reseed: 835563887
 > > *** Error in `./trinity': double free or corruption (top): 0x0000000001323890 ***
 > > [main] Random reseed: 974945617
 > > *** Error in `./trinity': double free or corruption (top): 0x0000000001308190 ***
 > > [main] Random reseed: 2912296953
 > > [main] Random reseed: 681900861
 > > 
 > > Is this 'expected' behaviour?
 > 
 > I've talked about it with Dave a while ago, he said it happens because the threads that
 > do all the testing sometimes corrupt themselves which result in that error. It shouldn't
 > be harmful as the thread will just get killed and a fresh one will be started in it's
 > place.
 > 
 > Since then I set MALLOC_CHECK_=0 to avoid seeing that in my logs.
 
Pretty strong odds it's mremap causing those double frees.  You can
-xmremap that to see if it goes away.

	Dave


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: perf: use after free in perf_remove_from_context
  2014-05-14 17:20             ` Dave Jones
@ 2014-05-14 18:37               ` Peter Zijlstra
  0 siblings, 0 replies; 46+ messages in thread
From: Peter Zijlstra @ 2014-05-14 18:37 UTC (permalink / raw)
  To: Dave Jones, Sasha Levin, Ingo Molnar, acme, LKML, Thomas Gleixner

On Wed, May 14, 2014 at 01:20:43PM -0400, Dave Jones wrote:
> Pretty strong odds it's mremap causing those double frees.  You can
> -xmremap that to see if it goes away.

unless I'm entirely pebkac atm it doesn't seem to help any.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* eventpoll __list_del_entry corruption (was: perf: use after free in perf_remove_from_context)
  2014-05-12 15:42 perf: use after free in perf_remove_from_context Sasha Levin
  2014-05-14 16:29 ` Peter Zijlstra
@ 2014-05-15 18:11 ` Peter Zijlstra
  2014-05-15 18:16   ` eventpoll __list_del_entry corruption Sasha Levin
                     ` (2 more replies)
  2014-05-16 15:34 ` BUG_ON drivers/char/random.c:986 (Was: perf: use after free in perf_remove_from_context) Peter Zijlstra
  2 siblings, 3 replies; 46+ messages in thread
From: Peter Zijlstra @ 2014-05-15 18:11 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Ingo Molnar, acme, LKML, Thomas Gleixner, Dave Jones, edumazet,
	viro, jbaron

[-- Attachment #1: Type: text/plain, Size: 2455 bytes --]

On Mon, May 12, 2014 at 11:42:33AM -0400, Sasha Levin wrote:
> Hi all,
> 
> While fuzzing with trinity inside a KVM tools guest running the latest -next
> kernel I've stumbled on the following spew. Maybe related to the very recent
> change in freeing on task exit?
> 

While fuzzing to reproduce; I hit this one, is it a known one or should
I go poke the right people about it?

---
[ 5823.689985] ------------[ cut here ]------------
[ 5823.690004] WARNING: CPU: 3 PID: 2508 at /usr/src/linux-2.6/lib/list_debug.c:59 __list_del_entry+0xa1/0xd0()
[ 5823.690004] list_del corruption. prev->next should be ffff880131111de0, but was 6b6b6b6b6b6b6b6b
[ 5823.690004] Modules linked in:
[ 5823.690004] CPU: 3 PID: 2508 Comm: trinity-main Not tainted 3.15.0-rc5-01700-g505011124ad0-dirty #1072
[ 5823.690004] Hardware name: Supermicro X8DTN/X8DTN, BIOS 4.6.3 01/08/2010
[ 5823.690004]  0000000000000009 ffff880432709ca8 ffffffff81681aa2 ffff880432709cf0
[ 5823.690004]  ffff880432709ce0 ffffffff8109807c ffff880131111de0 ffff880131111dc8
[ 5823.690004]  0000000000000286 ffff8800b9dd5618 ffff88023699b720 ffff880432709d40
[ 5823.690004] Call Trace:
[ 5823.690004]  [<ffffffff81681aa2>] dump_stack+0x4e/0x7a
[ 5823.690004]  [<ffffffff8109807c>] warn_slowpath_common+0x8c/0xc0
[ 5823.690004]  [<ffffffff8109816c>] warn_slowpath_fmt+0x4c/0x50
[ 5823.690004]  [<ffffffff810ec8bf>] ? do_raw_spin_lock+0x13f/0x160
[ 5823.690004]  [<ffffffff8138c661>] __list_del_entry+0xa1/0xd0
[ 5823.690004]  [<ffffffff8138c69d>] list_del+0xd/0x30
[ 5823.690004]  [<ffffffff810dfa71>] remove_wait_queue+0x31/0x50
[ 5823.690004]  [<ffffffff812152aa>] ep_unregister_pollwait.isra.9+0x6a/0xb0
[ 5823.690004]  [<ffffffff81215268>] ? ep_unregister_pollwait.isra.9+0x28/0xb0
[ 5823.690004]  [<ffffffff8121531f>] ep_remove+0x2f/0xe0
[ 5823.690004]  [<ffffffff81215705>] eventpoll_release_file+0x65/0xa0
[ 5823.690004]  [<ffffffff811cf259>] __fput+0x1d9/0x1e0
[ 5823.690004]  [<ffffffff811cf2ae>] ____fput+0xe/0x10
[ 5823.690004]  [<ffffffff810b91f4>] task_work_run+0xc4/0xe0
[ 5823.690004]  [<ffffffff8109a544>] do_exit+0x2d4/0xa90
[ 5823.690004]  [<ffffffff813825c4>] ? lockdep_sys_exit_thunk+0x35/0x67
[ 5823.690004]  [<ffffffff8109ae2c>] do_group_exit+0x4c/0xc0
[ 5823.690004]  [<ffffffff8109aeb7>] SyS_exit_group+0x17/0x20
[ 5823.690004]  [<ffffffff8168a2c2>] system_call_fastpath+0x16/0x1b
[ 5823.690004] ---[ end trace 515b7fa3169c0906 ]---

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: eventpoll __list_del_entry corruption
  2014-05-15 18:11 ` eventpoll __list_del_entry corruption (was: perf: use after free in perf_remove_from_context) Peter Zijlstra
@ 2014-05-15 18:16   ` Sasha Levin
  2014-06-16  9:44     ` Eric Wong
  2014-05-21  8:25   ` BUG at /usr/src/linux-2.6/mm/filemap.c:202 (was: perf: use after free in perf_remove_from_context) Peter Zijlstra
  2014-06-03 15:07   ` eventpoll __list_del_entry corruption Jason Baron
  2 siblings, 1 reply; 46+ messages in thread
From: Sasha Levin @ 2014-05-15 18:16 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, acme, LKML, Thomas Gleixner, Dave Jones, edumazet,
	viro, jbaron

[-- Attachment #1: Type: text/plain, Size: 2731 bytes --]

On 05/15/2014 02:11 PM, Peter Zijlstra wrote:
> On Mon, May 12, 2014 at 11:42:33AM -0400, Sasha Levin wrote:
>> Hi all,
>>
>> While fuzzing with trinity inside a KVM tools guest running the latest -next
>> kernel I've stumbled on the following spew. Maybe related to the very recent
>> change in freeing on task exit?
>>
> 
> While fuzzing to reproduce; I hit this one, is it a known one or should
> I go poke the right people about it?
> 
> ---
> [ 5823.689985] ------------[ cut here ]------------
> [ 5823.690004] WARNING: CPU: 3 PID: 2508 at /usr/src/linux-2.6/lib/list_debug.c:59 __list_del_entry+0xa1/0xd0()
> [ 5823.690004] list_del corruption. prev->next should be ffff880131111de0, but was 6b6b6b6b6b6b6b6b
> [ 5823.690004] Modules linked in:
> [ 5823.690004] CPU: 3 PID: 2508 Comm: trinity-main Not tainted 3.15.0-rc5-01700-g505011124ad0-dirty #1072
> [ 5823.690004] Hardware name: Supermicro X8DTN/X8DTN, BIOS 4.6.3 01/08/2010
> [ 5823.690004]  0000000000000009 ffff880432709ca8 ffffffff81681aa2 ffff880432709cf0
> [ 5823.690004]  ffff880432709ce0 ffffffff8109807c ffff880131111de0 ffff880131111dc8
> [ 5823.690004]  0000000000000286 ffff8800b9dd5618 ffff88023699b720 ffff880432709d40
> [ 5823.690004] Call Trace:
> [ 5823.690004]  [<ffffffff81681aa2>] dump_stack+0x4e/0x7a
> [ 5823.690004]  [<ffffffff8109807c>] warn_slowpath_common+0x8c/0xc0
> [ 5823.690004]  [<ffffffff8109816c>] warn_slowpath_fmt+0x4c/0x50
> [ 5823.690004]  [<ffffffff810ec8bf>] ? do_raw_spin_lock+0x13f/0x160
> [ 5823.690004]  [<ffffffff8138c661>] __list_del_entry+0xa1/0xd0
> [ 5823.690004]  [<ffffffff8138c69d>] list_del+0xd/0x30
> [ 5823.690004]  [<ffffffff810dfa71>] remove_wait_queue+0x31/0x50
> [ 5823.690004]  [<ffffffff812152aa>] ep_unregister_pollwait.isra.9+0x6a/0xb0
> [ 5823.690004]  [<ffffffff81215268>] ? ep_unregister_pollwait.isra.9+0x28/0xb0
> [ 5823.690004]  [<ffffffff8121531f>] ep_remove+0x2f/0xe0
> [ 5823.690004]  [<ffffffff81215705>] eventpoll_release_file+0x65/0xa0
> [ 5823.690004]  [<ffffffff811cf259>] __fput+0x1d9/0x1e0
> [ 5823.690004]  [<ffffffff811cf2ae>] ____fput+0xe/0x10
> [ 5823.690004]  [<ffffffff810b91f4>] task_work_run+0xc4/0xe0
> [ 5823.690004]  [<ffffffff8109a544>] do_exit+0x2d4/0xa90
> [ 5823.690004]  [<ffffffff813825c4>] ? lockdep_sys_exit_thunk+0x35/0x67
> [ 5823.690004]  [<ffffffff8109ae2c>] do_group_exit+0x4c/0xc0
> [ 5823.690004]  [<ffffffff8109aeb7>] SyS_exit_group+0x17/0x20
> [ 5823.690004]  [<ffffffff8168a2c2>] system_call_fastpath+0x16/0x1b
> [ 5823.690004] ---[ end trace 515b7fa3169c0906 ]---
> 

Dave reported something similar to that last year(!) and that never got fixed
AFAIK: https://lkml.org/lkml/2013/10/14/353.


Thanks,
Sasha


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 884 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* BUG_ON drivers/char/random.c:986 (Was: perf: use after free in perf_remove_from_context)
  2014-05-12 15:42 perf: use after free in perf_remove_from_context Sasha Levin
  2014-05-14 16:29 ` Peter Zijlstra
  2014-05-15 18:11 ` eventpoll __list_del_entry corruption (was: perf: use after free in perf_remove_from_context) Peter Zijlstra
@ 2014-05-16 15:34 ` Peter Zijlstra
  2014-05-16 16:06   ` H. Peter Anvin
  2 siblings, 1 reply; 46+ messages in thread
From: Peter Zijlstra @ 2014-05-16 15:34 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Ingo Molnar, acme, LKML, Thomas Gleixner, Dave Jones, hpa, tytso,
	price, hannes

[-- Attachment #1: Type: text/plain, Size: 2623 bytes --]


While fuzzing to reproduce my issue I hit the below, its triggered loads
of times and then the machine wedged (needed a power cycle), I can
provide the full console log if people care.

Anybody seen that one before?

---
[  861.777414] ------------[ cut here ]------------
[  861.777416] kernel BUG at /usr/src/linux-2.6/drivers/char/random.c:986!
[  861.777419] invalid opcode: 0000 [#1] PREEMPT SMP 
[  861.777421] Modules linked in:
[  861.777424] CPU: 1 PID: 9851 Comm: trinity-c321 Not tainted 3.15.0-rc5-01700-g505011124ad0-dirty #1076
[  861.777425] Hardware name: Supermicro X8DTN/X8DTN, BIOS 4.6.3 01/08/2010
[  861.777426] task: ffff88022f68a600 ti: ffff88022f716000 task.ti: ffff88022f716000
[  861.777433] RIP: 0010:[<ffffffff81429ce4>]  [<ffffffff81429ce4>] account+0x164/0x170
[  861.777434] RSP: 0000:ffff88022f717e48  EFLAGS: 00010202
[  861.777435] RAX: ffffffff81c90788 RBX: 0000000000000991 RCX: 0000000000000000
[  861.777436] RDX: 0000000000000000 RSI: 0000000000000991 RDI: 0000000000002000
[  861.777437] RBP: ffff88022f717e70 R08: 0000000000000000 R09: 0000000000000000
[  861.777438] R10: 00000000000046c0 R11: 000000c8a18b27e1 R12: ffffffff81c90300
[  861.777439] R13: 0000000000000000 R14: ffffffff8142aff6 R15: ffffffff81a58605
[  861.777441] FS:  00007fba327ff700(0000) GS:ffff880237c20000(0000) knlGS:0000000000000000
[  861.777442] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  861.777443] CR2: 0000000000000010 CR3: 000000022f703000 CR4: 00000000000007e0
[  861.777444] Stack:
[  861.777446]  0000000000000991 0000000000007b44 0000000000000000 ffffffff8142aff6
[  861.777448]  ffffffff81a58605 ffff88022f717ed0 ffffffff8142ae82 0000000000000000
[  861.777450]  ffffffff81c90300 0000000000020000 ffff880233291240 ffff880435788690
[  861.777451] Call Trace:
[  861.777454]  [<ffffffff8142aff6>] ? urandom_read+0x46/0x120
[  861.777455]  [<ffffffff8142ae82>] extract_entropy_user+0x52/0x180
[  861.777457]  [<ffffffff8142aff6>] urandom_read+0x46/0x120
[  861.777462]  [<ffffffff811cdf3b>] vfs_read+0xab/0x180
[  861.777464]  [<ffffffff811ce1b2>] SyS_read+0x52/0xb0
[  861.777466]  [<ffffffff8168c939>] ia32_do_call+0x13/0x13
[  861.777489] Code: c9 81 e8 a0 5e cb ff ba 02 00 02 00 be 1d 00 00 00 48 c7 c7 c0 e5 c6 82 e8 9a 64 db ff 48 89 d8 5b 41 5c 41 5d 41 5e 41 5f 5d c3 <0f> 0b 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 
[  861.777491] RIP  [<ffffffff81429ce4>] account+0x164/0x170
[  861.777492]  RSP <ffff88022f717e48>
[  861.778140] ---[ end trace e70cc3dc70339c62 ]---
[  861.779282] ------------[ cut here ]------------

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: BUG_ON drivers/char/random.c:986 (Was: perf: use after free in perf_remove_from_context)
  2014-05-16 15:34 ` BUG_ON drivers/char/random.c:986 (Was: perf: use after free in perf_remove_from_context) Peter Zijlstra
@ 2014-05-16 16:06   ` H. Peter Anvin
  2014-05-16 16:21     ` Peter Zijlstra
  0 siblings, 1 reply; 46+ messages in thread
From: H. Peter Anvin @ 2014-05-16 16:06 UTC (permalink / raw)
  To: Peter Zijlstra, Sasha Levin
  Cc: Ingo Molnar, acme, LKML, Thomas Gleixner, Dave Jones, tytso,
	price, hannes

On 05/16/2014 08:34 AM, Peter Zijlstra wrote:
> 
> While fuzzing to reproduce my issue I hit the below, its triggered loads
> of times and then the machine wedged (needed a power cycle), I can
> provide the full console log if people care.
> 
> Anybody seen that one before?
> 

I certainly haven't... this bad.  Any idea what the actual system call
looked like?

The BUG_ON() in question indicates that r->entropy_count either
overflowed or underflowed... most likely underflowed I'd guess.

	-hpa


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: BUG_ON drivers/char/random.c:986 (Was: perf: use after free in perf_remove_from_context)
  2014-05-16 16:06   ` H. Peter Anvin
@ 2014-05-16 16:21     ` Peter Zijlstra
  2014-05-17  0:46       ` Hannes Frederic Sowa
  0 siblings, 1 reply; 46+ messages in thread
From: Peter Zijlstra @ 2014-05-16 16:21 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Sasha Levin, Ingo Molnar, acme, LKML, Thomas Gleixner,
	Dave Jones, tytso, price, hannes

[-- Attachment #1: Type: text/plain, Size: 804 bytes --]

On Fri, May 16, 2014 at 09:06:13AM -0700, H. Peter Anvin wrote:
> On 05/16/2014 08:34 AM, Peter Zijlstra wrote:
> > 
> > While fuzzing to reproduce my issue I hit the below, its triggered loads
> > of times and then the machine wedged (needed a power cycle), I can
> > provide the full console log if people care.
> > 
> > Anybody seen that one before?
> > 
> 
> I certainly haven't... this bad.  Any idea what the actual system call
> looked like?

Sadly, no, while I had syscall tracing enabled, the bug I'm chasing is
non fatal so I hadn't set it to dump on bug -- not that dumping 24*1.4M
trace buffers over serial would've completed this side of the century.

I did save the trinity output (including seeds) but my experience so far
is that that is no guarantee for reproduction.

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: BUG_ON drivers/char/random.c:986 (Was: perf: use after free in perf_remove_from_context)
  2014-05-16 16:21     ` Peter Zijlstra
@ 2014-05-17  0:46       ` Hannes Frederic Sowa
  2014-05-17  2:18         ` Theodore Ts'o
  0 siblings, 1 reply; 46+ messages in thread
From: Hannes Frederic Sowa @ 2014-05-17  0:46 UTC (permalink / raw)
  To: Peter Zijlstra, H. Peter Anvin
  Cc: Sasha Levin, Ingo Molnar, acme, LKML, Thomas Gleixner,
	Dave Jones, tytso, price

On Fri, May 16, 2014, at 9:21, Peter Zijlstra wrote:
> On Fri, May 16, 2014 at 09:06:13AM -0700, H. Peter Anvin wrote:
> > On 05/16/2014 08:34 AM, Peter Zijlstra wrote:
> > > 
> > > While fuzzing to reproduce my issue I hit the below, its triggered loads
> > > of times and then the machine wedged (needed a power cycle), I can
> > > provide the full console log if people care.
> > > 
> > > Anybody seen that one before?
> > > 
> > 
> > I certainly haven't... this bad.  Any idea what the actual system call
> > looked like?
> 
> Sadly, no, while I had syscall tracing enabled, the bug I'm chasing is
> non fatal so I hadn't set it to dump on bug -- not that dumping 24*1.4M
> trace buffers over serial would've completed this side of the century.
> 
> I did save the trinity output (including seeds) but my experience so far
> is that that is no guarantee for reproduction.

This should do the trick:
dd if=/dev/urandom of=/dev/zero bs=67108707

I suspect ee1de406ba6eb1 ("random: simplify accounting logic") as the
culprit.

Bye,

  Hannes

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: BUG_ON drivers/char/random.c:986 (Was: perf: use after free in perf_remove_from_context)
  2014-05-17  0:46       ` Hannes Frederic Sowa
@ 2014-05-17  2:18         ` Theodore Ts'o
  2014-05-17 16:24           ` Sasha Levin
  2014-07-15  4:36           ` BUG_ON drivers/char/random.c:986 Dave Jones
  0 siblings, 2 replies; 46+ messages in thread
From: Theodore Ts'o @ 2014-05-17  2:18 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Peter Zijlstra, H. Peter Anvin, Sasha Levin, Ingo Molnar, acme,
	LKML, Thomas Gleixner, Dave Jones, price

On Fri, May 16, 2014 at 05:46:22PM -0700, Hannes Frederic Sowa wrote:
> This should do the trick:
> dd if=/dev/urandom of=/dev/zero bs=67108707
> 
> I suspect ee1de406ba6eb1 ("random: simplify accounting logic") as the
> culprit.

Yep, that it's it.  Thanks for noticing this so quickly!  I'll push
the following patch to Linus.

    	      	    			- Ted

commit 29fb0ca5b3922288fba3f4c975a55032a51df0f0
Author: Theodore Ts'o <tytso@mit.edu>
Date:   Fri May 16 21:40:41 2014 -0400

    random: fix BUG_ON caused by accounting simplification
    
    Commit ee1de406ba6eb1 ("random: simplify accounting logic") simplified
    things too much, in that it allows the following to trigger an
    overflow that results in a BUG_ON crash:
    
    dd if=/dev/urandom of=/dev/zero bs=67108707
    
    Thanks to Peter Zihlstra for discovering the crash, and Hannes
    Frederic for analyizing the root cause.
    
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    Reported-by: Peter Zijlstra <peterz@infradead.org>
    Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
    Cc: Greg Price <price@mit.edu>

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 6b75713..102c50d 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -995,8 +995,11 @@ retry:
 		ibytes = min_t(size_t, ibytes, have_bytes - reserved);
 	if (ibytes < min)
 		ibytes = 0;
-	entropy_count = max_t(int, 0,
-			      entropy_count - (ibytes << (ENTROPY_SHIFT + 3)));
+	if (have_bytes >= ibytes + reserved)
+		entropy_count -= ibytes << (ENTROPY_SHIFT + 3);
+	else
+		entropy_count = reserved << (ENTROPY_SHIFT + 3);
+
 	if (cmpxchg(&r->entropy_count, orig, entropy_count) != orig)
 		goto retry;
 

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: BUG_ON drivers/char/random.c:986 (Was: perf: use after free in perf_remove_from_context)
  2014-05-17  2:18         ` Theodore Ts'o
@ 2014-05-17 16:24           ` Sasha Levin
  2014-05-17 17:00             ` Peter Zijlstra
  2014-07-15  4:36           ` BUG_ON drivers/char/random.c:986 Dave Jones
  1 sibling, 1 reply; 46+ messages in thread
From: Sasha Levin @ 2014-05-17 16:24 UTC (permalink / raw)
  To: Theodore Ts'o, Hannes Frederic Sowa, Peter Zijlstra,
	H. Peter Anvin, Ingo Molnar, acme, LKML, Thomas Gleixner,
	Dave Jones, price

On 05/16/2014 10:18 PM, Theodore Ts'o wrote:
> On Fri, May 16, 2014 at 05:46:22PM -0700, Hannes Frederic Sowa wrote:
>> > This should do the trick:
>> > dd if=/dev/urandom of=/dev/zero bs=67108707
>> > 
>> > I suspect ee1de406ba6eb1 ("random: simplify accounting logic") as the
>> > culprit.
> Yep, that it's it.  Thanks for noticing this so quickly!  I'll push
> the following patch to Linus.

Peter, could you describe the infrastructure you're running on? I don't
think Dave or myself have ever hit this issue, and I wonder what made you
stumble on it.


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: BUG_ON drivers/char/random.c:986 (Was: perf: use after free in perf_remove_from_context)
  2014-05-17 16:24           ` Sasha Levin
@ 2014-05-17 17:00             ` Peter Zijlstra
  0 siblings, 0 replies; 46+ messages in thread
From: Peter Zijlstra @ 2014-05-17 17:00 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Theodore Ts'o, Hannes Frederic Sowa, H. Peter Anvin,
	Ingo Molnar, acme, LKML, Thomas Gleixner, Dave Jones, price

[-- Attachment #1: Type: text/plain, Size: 1187 bytes --]

On Sat, May 17, 2014 at 12:24:32PM -0400, Sasha Levin wrote:
> On 05/16/2014 10:18 PM, Theodore Ts'o wrote:
> > On Fri, May 16, 2014 at 05:46:22PM -0700, Hannes Frederic Sowa wrote:
> >> > This should do the trick:
> >> > dd if=/dev/urandom of=/dev/zero bs=67108707
> >> > 
> >> > I suspect ee1de406ba6eb1 ("random: simplify accounting logic") as the
> >> > culprit.
> > Yep, that it's it.  Thanks for noticing this so quickly!  I'll push
> > the following patch to Linus.
> 
> Peter, could you describe the infrastructure you're running on? I don't
> think Dave or myself have ever hit this issue, and I wonder what made you
> stumble on it.

I'm running it on a WSP-EP, 2 nodes, 6 cores per node, 2 threads per
core for a total of 24 cpus.

I'm running trinity as user, and using:

  MALLOC_CHECK_=0 ./trinity -xinit_module -xreboot -xshutdown -xunshare -xnfsservctl -xclock_nanosleep -xuselib -xumount -xmount -m --quiet -C 400 -l off -xmremap

to run it.

I'm currently also running with syscall and a few other trace bits
enabled, but I suppose that shouldn't affect things too much.

The machine is running Debian wheezy with a few choice testing bits.

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* BUG at /usr/src/linux-2.6/mm/filemap.c:202 (was: perf: use after free in perf_remove_from_context)
  2014-05-15 18:11 ` eventpoll __list_del_entry corruption (was: perf: use after free in perf_remove_from_context) Peter Zijlstra
  2014-05-15 18:16   ` eventpoll __list_del_entry corruption Sasha Levin
@ 2014-05-21  8:25   ` Peter Zijlstra
  2014-05-21 13:02     ` BUG at /usr/src/linux-2.6/mm/filemap.c:202 Sasha Levin
  2014-06-03 15:07   ` eventpoll __list_del_entry corruption Jason Baron
  2 siblings, 1 reply; 46+ messages in thread
From: Peter Zijlstra @ 2014-05-21  8:25 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Ingo Molnar, acme, LKML, Thomas Gleixner, Dave Jones, edumazet,
	viro, jbaron, hughd, mgorman

[-- Attachment #1: Type: text/plain, Size: 5712 bytes --]

On Thu, May 15, 2014 at 08:11:02PM +0200, Peter Zijlstra wrote:
> On Mon, May 12, 2014 at 11:42:33AM -0400, Sasha Levin wrote:
> > Hi all,
> > 
> > While fuzzing with trinity inside a KVM tools guest running the latest -next
> > kernel I've stumbled on the following spew. Maybe related to the very recent
> > change in freeing on task exit?
> > 
> 
> While fuzzing to reproduce; I hit this one, is it a known one or should
> I go poke the right people about it?
> 
> ---
> [ 5823.689985] ------------[ cut here ]------------
> [ 5823.690004] WARNING: CPU: 3 PID: 2508 at /usr/src/linux-2.6/lib/list_debug.c:59 __list_del_entry+0xa1/0xd0()
> [ 5823.690004] list_del corruption. prev->next should be ffff880131111de0, but was 6b6b6b6b6b6b6b6b
> [ 5823.690004] Modules linked in:
> [ 5823.690004] CPU: 3 PID: 2508 Comm: trinity-main Not tainted 3.15.0-rc5-01700-g505011124ad0-dirty #1072
> [ 5823.690004] Hardware name: Supermicro X8DTN/X8DTN, BIOS 4.6.3 01/08/2010
> [ 5823.690004]  0000000000000009 ffff880432709ca8 ffffffff81681aa2 ffff880432709cf0
> [ 5823.690004]  ffff880432709ce0 ffffffff8109807c ffff880131111de0 ffff880131111dc8
> [ 5823.690004]  0000000000000286 ffff8800b9dd5618 ffff88023699b720 ffff880432709d40
> [ 5823.690004] Call Trace:
> [ 5823.690004]  [<ffffffff81681aa2>] dump_stack+0x4e/0x7a
> [ 5823.690004]  [<ffffffff8109807c>] warn_slowpath_common+0x8c/0xc0
> [ 5823.690004]  [<ffffffff8109816c>] warn_slowpath_fmt+0x4c/0x50
> [ 5823.690004]  [<ffffffff810ec8bf>] ? do_raw_spin_lock+0x13f/0x160
> [ 5823.690004]  [<ffffffff8138c661>] __list_del_entry+0xa1/0xd0
> [ 5823.690004]  [<ffffffff8138c69d>] list_del+0xd/0x30
> [ 5823.690004]  [<ffffffff810dfa71>] remove_wait_queue+0x31/0x50
> [ 5823.690004]  [<ffffffff812152aa>] ep_unregister_pollwait.isra.9+0x6a/0xb0
> [ 5823.690004]  [<ffffffff81215268>] ? ep_unregister_pollwait.isra.9+0x28/0xb0
> [ 5823.690004]  [<ffffffff8121531f>] ep_remove+0x2f/0xe0
> [ 5823.690004]  [<ffffffff81215705>] eventpoll_release_file+0x65/0xa0
> [ 5823.690004]  [<ffffffff811cf259>] __fput+0x1d9/0x1e0
> [ 5823.690004]  [<ffffffff811cf2ae>] ____fput+0xe/0x10
> [ 5823.690004]  [<ffffffff810b91f4>] task_work_run+0xc4/0xe0
> [ 5823.690004]  [<ffffffff8109a544>] do_exit+0x2d4/0xa90
> [ 5823.690004]  [<ffffffff813825c4>] ? lockdep_sys_exit_thunk+0x35/0x67
> [ 5823.690004]  [<ffffffff8109ae2c>] do_group_exit+0x4c/0xc0
> [ 5823.690004]  [<ffffffff8109aeb7>] SyS_exit_group+0x17/0x20
> [ 5823.690004]  [<ffffffff8168a2c2>] system_call_fastpath+0x16/0x1b
> [ 5823.690004] ---[ end trace 515b7fa3169c0906 ]---

I just hit this one, which is somewhat similar:

---
[ 4003.295259] ------------[ cut here ]------------
[ 4003.297195] kernel BUG at /usr/src/linux-2.6/mm/filemap.c:202!
[ 4003.297195] invalid opcode: 0000 [#1] PREEMPT SMP 
[ 4003.297195] Modules linked in:
[ 4003.297195] CPU: 0 PID: 9360 Comm: trinity-c92 Not tainted 3.15.0-rc5-01700-g505011124ad0-dirty #1081
[ 4003.297195] Hardware name: Supermicro X8DTN/X8DTN, BIOS 4.6.3 01/08/2010
[ 4003.297195] task: ffff88042a9db900 ti: ffff88042aa7a000 task.ti: ffff88042aa7a000
[ 4003.297195] RIP: 0010:[<ffffffff81174af1>]  [<ffffffff81174af1>] __delete_from_page_cache+0x2a1/0x2b0
[ 4003.297195] RSP: 0018:ffff88042aa7bb30  EFLAGS: 00010046
[ 4003.297195] RAX: 0000000000000000 RBX: 0000000000000003 RCX: ffff88019dcd46a0
[ 4003.297195] RDX: 0000000000000146 RSI: ffffffff81a651f7 RDI: ffffffff81a2e091
[ 4003.297195] RBP: ffff88042aa7bb78 R08: 000000000000004e R09: ffff8801c4efd138
[ 4003.297195] R10: 0000000000000012 R11: ffff88042aa7bb48 R12: ffffea000828c280
[ 4003.297195] R13: ffff8801bc9a0890 R14: 0000000000000000 R15: ffff8801bc9a0898
[ 4003.297195] FS:  00007f984ad54700(0000) GS:ffff880237c00000(0000) knlGS:0000000000000000
[ 4003.297195] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 4003.297195] CR2: 00007f9847813000 CR3: 0000000001c0e000 CR4: 00000000000007f0
[ 4003.297195] Stack:
[ 4003.297195]  ffff8801bc9a08a8 ffff8801bc9a08a8 ffff8801c4efd138 ffff8801c4efd1d0
[ 4003.297195]  ffffea000828c280 ffff8801bc9a08a8 0000000000000000 ffffffffffffffff
[ 4003.297195]  000000000000004e ffff88042aa7bba0 ffffffff81174c98 ffffea000828c280
[ 4003.297195] Call Trace:
[ 4003.297195]  [<ffffffff81174c98>] delete_from_page_cache+0x48/0x80
[ 4003.297195]  [<ffffffff81182d6b>] truncate_inode_page+0x5b/0x90
[ 4003.297195]  [<ffffffff8118d06a>] shmem_undo_range+0x2fa/0x6e0
[ 4003.297195]  [<ffffffff8118d464>] shmem_truncate_range+0x14/0x30
[ 4003.297195]  [<ffffffff8118d67d>] shmem_evict_inode+0xed/0x150
[ 4003.297195]  [<ffffffff811ea377>] evict+0xa7/0x170
[ 4003.297195]  [<ffffffff811eaaa5>] iput+0x105/0x190
[ 4003.297195]  [<ffffffff811e51c8>] dentry_kill+0x268/0x2e0
[ 4003.297195]  [<ffffffff811e54e9>] dput+0x69/0x110
[ 4003.297195]  [<ffffffff811cf66c>] __fput+0x16c/0x1e0
[ 4003.297195]  [<ffffffff811cf72e>] ____fput+0xe/0x10
[ 4003.297195]  [<ffffffff810b91e7>] task_work_run+0xa7/0xe0
[ 4003.297195]  [<ffffffff8109a554>] do_exit+0x2d4/0xa90
[ 4003.297195]  [<ffffffff8168b351>] ? retint_swapgs+0xe/0x13
[ 4003.297195]  [<ffffffff8109ae3c>] do_group_exit+0x4c/0xc0
[ 4003.297195]  [<ffffffff8109aec7>] SyS_exit_group+0x17/0x20
[ 4003.297195]  [<ffffffff8168a742>] system_call_fastpath+0x16/0x1b
[ 4003.297195] Code: 45 d0 75 29 4c 89 30 e9 b0 fe ff ff 66 0f 1f 44 00 00 48 8b 75 c8 4c 89 ff e8 0c 71 20 00 84 c0 0f 85 96 fe ff ff e9 79 fe ff ff <0f> 0b e8 fe a7 50 00 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 
[ 4003.297195] RIP  [<ffffffff81174af1>] __delete_from_page_cache+0x2a1/0x2b0
[ 4003.297195]  RSP <ffff88042aa7bb30>
[ 4003.297195] ---[ end trace 2530b701678d4601 ]---

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: BUG at /usr/src/linux-2.6/mm/filemap.c:202
  2014-05-21  8:25   ` BUG at /usr/src/linux-2.6/mm/filemap.c:202 (was: perf: use after free in perf_remove_from_context) Peter Zijlstra
@ 2014-05-21 13:02     ` Sasha Levin
  0 siblings, 0 replies; 46+ messages in thread
From: Sasha Levin @ 2014-05-21 13:02 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, acme, LKML, Thomas Gleixner, Dave Jones, edumazet,
	viro, jbaron, hughd, mgorman

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 05/21/2014 04:25 AM, Peter Zijlstra wrote:
> On Thu, May 15, 2014 at 08:11:02PM +0200, Peter Zijlstra wrote:
>> On Mon, May 12, 2014 at 11:42:33AM -0400, Sasha Levin wrote:
>>> Hi all,
>>> 
>>> While fuzzing with trinity inside a KVM tools guest running the latest -next kernel I've stumbled on the following spew. Maybe related to the very recent change in freeing on task exit?
>>> 
>> 
>> While fuzzing to reproduce; I hit this one, is it a known one or should I go poke the right people about it?
>> 
>> --- [ 5823.689985] ------------[ cut here ]------------ [ 5823.690004] WARNING: CPU: 3 PID: 2508 at /usr/src/linux-2.6/lib/list_debug.c:59 __list_del_entry+0xa1/0xd0() [ 5823.690004] list_del corruption. prev->next should be ffff880131111de0, but was 6b6b6b6b6b6b6b6b [ 5823.690004] Modules linked in: [ 5823.690004] CPU: 3 PID: 2508 Comm: trinity-main Not tainted 3.15.0-rc5-01700-g505011124ad0-dirty #1072 [ 5823.690004] Hardware name: Supermicro X8DTN/X8DTN, BIOS 4.6.3 01/08/2010 [ 5823.690004]  0000000000000009 ffff880432709ca8 ffffffff81681aa2 ffff880432709cf0 [ 5823.690004]  ffff880432709ce0 ffffffff8109807c ffff880131111de0 ffff880131111dc8 [ 5823.690004]  0000000000000286 ffff8800b9dd5618 ffff88023699b720 ffff880432709d40 [ 5823.690004] Call Trace: [ 5823.690004]  [<ffffffff81681aa2>] dump_stack+0x4e/0x7a [ 5823.690004]  [<ffffffff8109807c>] warn_slowpath_common+0x8c/0xc0 [ 5823.690004]  [<ffffffff8109816c>] warn_slowpath_fmt+0x4c/0x50 [ 5823.690004]  [<ffffffff810ec8!
 bf>] ? do_
raw_spin_lock+0x13f/0x160 [ 5823.690004]  [<ffffffff8138c661>] __list_del_entry+0xa1/0xd0 [ 5823.690004]  [<ffffffff8138c69d>] list_del+0xd/0x30 [ 5823.690004]  [<ffffffff810dfa71>] remove_wait_queue+0x31/0x50 [ 5823.690004]  [<ffffffff812152aa>] ep_unregister_pollwait.isra.9+0x6a/0xb0 [ 5823.690004]  [<ffffffff81215268>] ? ep_unregister_pollwait.isra.9+0x28/0xb0 [ 5823.690004]  [<ffffffff8121531f>] ep_remove+0x2f/0xe0 [ 5823.690004]  [<ffffffff81215705>] eventpoll_release_file+0x65/0xa0 [ 5823.690004]  [<ffffffff811cf259>] __fput+0x1d9/0x1e0 [ 5823.690004]  [<ffffffff811cf2ae>] ____fput+0xe/0x10 [ 5823.690004]  [<ffffffff810b91f4>] task_work_run+0xc4/0xe0 [ 5823.690004]  [<ffffffff8109a544>] do_exit+0x2d4/0xa90 [ 5823.690004]  [<ffffffff813825c4>] ? lockdep_sys_exit_thunk+0x35/0x67 [ 5823.690004]  [<ffffffff8109ae2c>] do_group_exit+0x4c/0xc0 [ 5823.690004]  [<ffffffff8109aeb7>] SyS_exit_group+0x17/0x20 [ 5823.690004]  [<ffffffff8168a2c2>] system_call_fastpath+0x16/0x1b [ 58!
 23.690004]
 ---[ end trace 515b7fa3169c0906 ]---
> 
> I just hit this one, which is somewhat similar:
> 
> --- [ 4003.295259] ------------[ cut here ]------------ [ 4003.297195] kernel BUG at /usr/src/linux-2.6/mm/filemap.c:202! [ 4003.297195] invalid opcode: 0000 [#1] PREEMPT SMP [ 4003.297195] Modules linked in: [ 4003.297195] CPU: 0 PID: 9360 Comm: trinity-c92 Not tainted 3.15.0-rc5-01700-g505011124ad0-dirty #1081 [ 4003.297195] Hardware name: Supermicro X8DTN/X8DTN, BIOS 4.6.3 01/08/2010 [ 4003.297195] task: ffff88042a9db900 ti: ffff88042aa7a000 task.ti: ffff88042aa7a000 [ 4003.297195] RIP: 0010:[<ffffffff81174af1>]  [<ffffffff81174af1>] __delete_from_page_cache+0x2a1/0x2b0 [ 4003.297195] RSP: 0018:ffff88042aa7bb30  EFLAGS: 00010046 [ 4003.297195] RAX: 0000000000000000 RBX: 0000000000000003 RCX: ffff88019dcd46a0 [ 4003.297195] RDX: 0000000000000146 RSI: ffffffff81a651f7 RDI: ffffffff81a2e091 [ 4003.297195] RBP: ffff88042aa7bb78 R08: 000000000000004e R09: ffff8801c4efd138 [ 4003.297195] R10: 0000000000000012 R11: ffff88042aa7bb48 R12: ffffea000828c280 [ 4003.297195] R13: fff!
 f8801bc9a0
890 R14: 0000000000000000 R15: ffff8801bc9a0898 [ 4003.297195] FS:  00007f984ad54700(0000) GS:ffff880237c00000(0000) knlGS:0000000000000000 [ 4003.297195] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 4003.297195] CR2: 00007f9847813000 CR3: 0000000001c0e000 CR4: 00000000000007f0 [ 4003.297195] Stack: [ 4003.297195]  ffff8801bc9a08a8 ffff8801bc9a08a8 ffff8801c4efd138 ffff8801c4efd1d0 [ 4003.297195]  ffffea000828c280 ffff8801bc9a08a8 0000000000000000 ffffffffffffffff [ 4003.297195]  000000000000004e ffff88042aa7bba0 ffffffff81174c98 ffffea000828c280 [ 4003.297195] Call Trace: [ 4003.297195]  [<ffffffff81174c98>] delete_from_page_cache+0x48/0x80 [ 4003.297195]  [<ffffffff81182d6b>] truncate_inode_page+0x5b/0x90 [ 4003.297195]  [<ffffffff8118d06a>] shmem_undo_range+0x2fa/0x6e0 [ 4003.297195]  [<ffffffff8118d464>] shmem_truncate_range+0x14/0x30 [ 4003.297195]  [<ffffffff8118d67d>] shmem_evict_inode+0xed/0x150 [ 4003.297195]  [<ffffffff811ea377>] evict+0xa7/0x170 [ 4003.2971!
 95]  [<fff
fffff811eaaa5>] iput+0x105/0x190 [ 4003.297195]  [<ffffffff811e51c8>] dentry_kill+0x268/0x2e0 [ 4003.297195]  [<ffffffff811e54e9>] dput+0x69/0x110 [ 4003.297195]  [<ffffffff811cf66c>] __fput+0x16c/0x1e0 [ 4003.297195]  [<ffffffff811cf72e>] ____fput+0xe/0x10 [ 4003.297195]  [<ffffffff810b91e7>] task_work_run+0xa7/0xe0 [ 4003.297195]  [<ffffffff8109a554>] do_exit+0x2d4/0xa90 [ 4003.297195]  [<ffffffff8168b351>] ? retint_swapgs+0xe/0x13 [ 4003.297195]  [<ffffffff8109ae3c>] do_group_exit+0x4c/0xc0 [ 4003.297195]  [<ffffffff8109aec7>] SyS_exit_group+0x17/0x20 [ 4003.297195]  [<ffffffff8168a742>] system_call_fastpath+0x16/0x1b [ 4003.297195] Code: 45 d0 75 29 4c 89 30 e9 b0 fe ff ff 66 0f 1f 44 00 00 48 8b 75 c8 4c 89 ff e8 0c 71 20 00 84 c0 0f 85 96 fe ff ff e9 79 fe ff ff <0f> 0b e8 fe a7 50 00 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 [ 4003.297195] RIP  [<ffffffff81174af1>] __delete_from_page_cache+0x2a1/0x2b0 [ 4003.297195]  RSP <ffff88042aa7bb30> [ 4003.297195] ---[ end trac!
 e 2530b701
678d4601 ]---
> 

This one has been known for a while, and still unfixed (https://lkml.org/lkml/2014/4/16/624).


Thanks,
Sasha
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBAgAGBQJTfKPmAAoJEN6mb/eXdyzcVEIP/iVkwGynatlHwnWYrN6ddec4
iNZpgJ+FiTY776BneC2Pvn3TLoHuXmLvZ7V5eV8JQEvmZfyuSCS1E8adAH9WSVgw
/E70rVDbTpo1FeokMLJZp8i3+knGCKjkBZD7S5rr6hkcb/t3ZbfgV1YVVbCVcLy3
0Z2dRIbwFRQUzQSHtQ2o3PjpbU/mcal6eJLrYgE6Zilsj72kEn4PEb5SMC1/smCk
qQUUXksaiT2uR9MmkO0oUppGNhphCbM/kRobpFEV6Hvw3YtauZz8R56Jt5G6XBrQ
cJYfkC4ygATGGA0ypIfoygbwg1PiFf26+czigM3SxVj+d29kRogt3B3ZiDWS2oEv
SHTF+53B0o2qVK7ZaZ74TAMrDjV/X6V+DCDHzd8M0qbiHbdUGrYZ11jfJlCdhbya
fsJGt8dnQj4lP/y9De4pjeUconAYZs2iD9VP+dk3NwbyhxhYMluSxX4mY56fJYTl
Dchopohv+HOoOixYugIo/8p9NT8/947EjKpiLoOq9HqFk+tcVzzMmyT78t6Li67O
BeyJwnbbHNXEdCeyNhpo4pPH+GpVdEzDTzUpbdgywiwkt7z8KPYGyrFg1Qt6r0hH
mYvekXTG+fUwTECJIqGkBy7I8v0HiA/Dg51h79o1zZmTB9gHc//yEIwcWj2ps74B
1DYLq3dz3xE5Z9KgFmUD
=eZ1C
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: perf: use after free in perf_remove_from_context
  2014-05-14 16:35     ` Peter Zijlstra
  2014-05-14 16:38       ` Sasha Levin
@ 2014-05-28 23:52       ` Sasha Levin
  2014-05-29  2:31         ` Sasha Levin
  2014-05-29  7:57         ` Peter Zijlstra
  1 sibling, 2 replies; 46+ messages in thread
From: Sasha Levin @ 2014-05-28 23:52 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, acme, LKML, Thomas Gleixner, Dave Jones

On 05/14/2014 12:35 PM, Peter Zijlstra wrote:
> On Wed, May 14, 2014 at 12:32:26PM -0400, Sasha Levin wrote:
>> > -----BEGIN PGP SIGNED MESSAGE-----
>> > Hash: SHA1
>> > 
>> > On 05/14/2014 12:29 PM, Peter Zijlstra wrote:
>>> > > On Mon, May 12, 2014 at 11:42:33AM -0400, Sasha Levin wrote:
>>>> > >> Hi all,
>>>> > >> 
>>>> > >> While fuzzing with trinity inside a KVM tools guest running the latest -next kernel I've stumbled on the following spew. Maybe related to the very recent change in freeing on task exit?
>>>> > >> 
>>>> > >> [ 2509.827261] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC [ 2509.830379] Dumping ftrace buffer: [ 2509.830379]    (ftrace buffer empty) [ 2509.830379] Modules linked in: [ 2509.830379] CPU: 47 PID: 43306 Comm: trinity-c126 Tainted: G        W     3.15.0-rc5-next-20140512-sasha-00019-ga20bc00-dirty #456
>>> > > 
>>> > > Any particular trinity setup? And would you happen to have the seed of that run?
>> > 
>> > Nothing special about trinity options. 400 threads and blacklisting some of the
>> > destructive syscalls (umount, reboot, etc).
>> > 
>> > I don't have the seed, but that problem did reproduce again tonight so I can test
>> > out debug code if you have something in mind.
> Nah, I drew a pretty big blank, which is why I wanted to see if I could
> reproduce. If you could share your trinity cmdline I'd be much obliged.
> While I did manage to clone (the repo moved since last time) and build
> it, I'm not really that handy with it and want to avoid destroying my
> machine if possible ;-)

Anything I could do to help out with this? It reproduces pretty easily on my
configuration so I'd be happy to test out whatever might help.


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: perf: use after free in perf_remove_from_context
  2014-05-28 23:52       ` Sasha Levin
@ 2014-05-29  2:31         ` Sasha Levin
  2014-05-29  7:59           ` Peter Zijlstra
  2014-05-29  7:57         ` Peter Zijlstra
  1 sibling, 1 reply; 46+ messages in thread
From: Sasha Levin @ 2014-05-29  2:31 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, acme, LKML, Thomas Gleixner, Dave Jones

On 05/28/2014 07:52 PM, Sasha Levin wrote:
> On 05/14/2014 12:35 PM, Peter Zijlstra wrote:
>> On Wed, May 14, 2014 at 12:32:26PM -0400, Sasha Levin wrote:
>>>> -----BEGIN PGP SIGNED MESSAGE-----
>>>> Hash: SHA1
>>>>
>>>> On 05/14/2014 12:29 PM, Peter Zijlstra wrote:
>>>>>> On Mon, May 12, 2014 at 11:42:33AM -0400, Sasha Levin wrote:
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> While fuzzing with trinity inside a KVM tools guest running the latest -next kernel I've stumbled on the following spew. Maybe related to the very recent change in freeing on task exit?
>>>>>>>>
>>>>>>>> [ 2509.827261] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC [ 2509.830379] Dumping ftrace buffer: [ 2509.830379]    (ftrace buffer empty) [ 2509.830379] Modules linked in: [ 2509.830379] CPU: 47 PID: 43306 Comm: trinity-c126 Tainted: G        W     3.15.0-rc5-next-20140512-sasha-00019-ga20bc00-dirty #456
>>>>>>
>>>>>> Any particular trinity setup? And would you happen to have the seed of that run?
>>>>
>>>> Nothing special about trinity options. 400 threads and blacklisting some of the
>>>> destructive syscalls (umount, reboot, etc).
>>>>
>>>> I don't have the seed, but that problem did reproduce again tonight so I can test
>>>> out debug code if you have something in mind.
>> Nah, I drew a pretty big blank, which is why I wanted to see if I could
>> reproduce. If you could share your trinity cmdline I'd be much obliged.
>> While I did manage to clone (the repo moved since last time) and build
>> it, I'm not really that handy with it and want to avoid destroying my
>> machine if possible ;-)
> 
> Anything I could do to help out with this? It reproduces pretty easily on my
> configuration so I'd be happy to test out whatever might help.

I've just had this:


[  591.111854] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[  591.121057] Dumping ftrace buffer:
[  591.121057]    (ftrace buffer empty)
[  591.121057] Modules linked in:
[  591.121057] CPU: 14 PID: 10272 Comm: trinity-c230 Tainted: G        W     3.15.0-rc7-next-20140528-sasha-00019-gd193e91-dirty #545
[  591.121057] task: ffff8802df0d3000 ti: ffff8802df0fe000 task.ti: ffff8802df0fe000
[  591.129130] RIP: perf_event_exit_task (kernel/events/core.c:7492 kernel/events/core.c:7524)
[  591.129130] RSP: 0000:ffff8802df0ffe28  EFLAGS: 00010246
[  591.129130] RAX: 0000000000000000 RBX: ffff8802df0d3000 RCX: 0000000000000001
[  591.129130] RDX: 0000000000000000 RSI: ffffffff97088100 RDI: 0000000000000282
[  591.129130] RBP: ffff8802df0ffe98 R08: 0000000000000000 R09: 0000000000000000
[  591.129130] R10: 0000000000000001 R11: 0000000000000000 R12: 6b6b6b6b6b6b6b6b
[  591.129130] R13: 6b6b6b6b6b6b6b6b R14: ffff8806c9bfd550 R15: 0000000000000001
[  591.129130] FS:  00007f4b26866700(0000) GS:ffff88036ae00000(0000) knlGS:0000000000000000
[  591.129130] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  591.129130] CR2: 0000000001a3a9f8 CR3: 000000001702c000 CR4: 00000000000006a0
[  591.129130] Stack:
[  591.129130]  ffff8802df0ffe48 ffff8802df09cce0 ffff8802fdf99488 ffff8802fdf99290
[  591.129130]  ffff8806c9bfd5a8 ffff8806c9bfd670 ffff8802df0ffe88 6b6b6b6b6b6b6b6b
[  591.129130]  ffff8802df0ffe98 0000000000000001 0000000000000001 ffff8802df0d3000
[  591.129130] Call Trace:
[  591.129130] do_exit (kernel/exit.c:766)
[  591.129130] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
[  591.129130] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2557 kernel/locking/lockdep.c:2599)
[  591.129130] ? trace_hardirqs_on (kernel/locking/lockdep.c:2607)
[  591.129130] do_group_exit (kernel/exit.c:884)
[  591.129130] SyS_exit_group (kernel/exit.c:895)
[  591.129130] tracesys (arch/x86/kernel/entry_64.S:542)
[ 591.129130] Code: a0 02 00 00 49 89 95 a8 02 00 00 4c 8b 45 a0 4c 89 c7 e8 11 c2 2c 03 48 8b 45 a8 48 89 c7 e8 15 7c ff ff 4c 89 ef e8 bd 7b ff ff <49> 8b 45 00 48 89 45 c8 4c 8b 6d c8 e8 cc 19 f6 ff 85 c0 74 0e
All code
========
   0:	a0 02 00 00 49 89 95 	movabs 0x2a8958949000002,%al
   7:	a8 02
   9:	00 00                	add    %al,(%rax)
   b:	4c 8b 45 a0          	mov    -0x60(%rbp),%r8
   f:	4c 89 c7             	mov    %r8,%rdi
  12:	e8 11 c2 2c 03       	callq  0x32cc228
  17:	48 8b 45 a8          	mov    -0x58(%rbp),%rax
  1b:	48 89 c7             	mov    %rax,%rdi
  1e:	e8 15 7c ff ff       	callq  0xffffffffffff7c38
  23:	4c 89 ef             	mov    %r13,%rdi
  26:	e8 bd 7b ff ff       	callq  0xffffffffffff7be8
  2b:*	49 8b 45 00          	mov    0x0(%r13),%rax		<-- trapping instruction
  2f:	48 89 45 c8          	mov    %rax,-0x38(%rbp)
  33:	4c 8b 6d c8          	mov    -0x38(%rbp),%r13
  37:	e8 cc 19 f6 ff       	callq  0xfffffffffff61a08
  3c:	85 c0                	test   %eax,%eax
  3e:	74 0e                	je     0x4e
	...

Code starting with the faulting instruction
===========================================
   0:	49 8b 45 00          	mov    0x0(%r13),%rax
   4:	48 89 45 c8          	mov    %rax,-0x38(%rbp)
   8:	4c 8b 6d c8          	mov    -0x38(%rbp),%r13
   c:	e8 cc 19 f6 ff       	callq  0xfffffffffff619dd
  11:	85 c0                	test   %eax,%eax
  13:	74 0e                	je     0x23
	...
[  591.129130] RIP perf_event_exit_task (kernel/events/core.c:7492 kernel/events/core.c:7524)
[  591.129130]  RSP <ffff8802df0ffe28>


Seems similar enough.


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: perf: use after free in perf_remove_from_context
  2014-05-28 23:52       ` Sasha Levin
  2014-05-29  2:31         ` Sasha Levin
@ 2014-05-29  7:57         ` Peter Zijlstra
  2014-05-29 14:47           ` Sasha Levin
  1 sibling, 1 reply; 46+ messages in thread
From: Peter Zijlstra @ 2014-05-29  7:57 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Ingo Molnar, acme, LKML, Thomas Gleixner, Dave Jones

[-- Attachment #1: Type: text/plain, Size: 2707 bytes --]

On Wed, May 28, 2014 at 07:52:07PM -0400, Sasha Levin wrote:
> On 05/14/2014 12:35 PM, Peter Zijlstra wrote:
> > On Wed, May 14, 2014 at 12:32:26PM -0400, Sasha Levin wrote:
> >> > -----BEGIN PGP SIGNED MESSAGE-----
> >> > Hash: SHA1
> >> > 
> >> > On 05/14/2014 12:29 PM, Peter Zijlstra wrote:
> >>> > > On Mon, May 12, 2014 at 11:42:33AM -0400, Sasha Levin wrote:
> >>>> > >> Hi all,
> >>>> > >> 
> >>>> > >> While fuzzing with trinity inside a KVM tools guest running the latest -next kernel I've stumbled on the following spew. Maybe related to the very recent change in freeing on task exit?
> >>>> > >> 
> >>>> > >> [ 2509.827261] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC [ 2509.830379] Dumping ftrace buffer: [ 2509.830379]    (ftrace buffer empty) [ 2509.830379] Modules linked in: [ 2509.830379] CPU: 47 PID: 43306 Comm: trinity-c126 Tainted: G        W     3.15.0-rc5-next-20140512-sasha-00019-ga20bc00-dirty #456
> >>> > > 
> >>> > > Any particular trinity setup? And would you happen to have the seed of that run?
> >> > 
> >> > Nothing special about trinity options. 400 threads and blacklisting some of the
> >> > destructive syscalls (umount, reboot, etc).
> >> > 
> >> > I don't have the seed, but that problem did reproduce again tonight so I can test
> >> > out debug code if you have something in mind.
> > Nah, I drew a pretty big blank, which is why I wanted to see if I could
> > reproduce. If you could share your trinity cmdline I'd be much obliged.
> > While I did manage to clone (the repo moved since last time) and build
> > it, I'm not really that handy with it and want to avoid destroying my
> > machine if possible ;-)
> 
> Anything I could do to help out with this? It reproduces pretty easily on my
> configuration so I'd be happy to test out whatever might help.

Yeah, it takes me days to test anything, and my last guess panned out to
nothing, at which point I decided I needed to look at the things I'd
neglected for a bit :/

Could you see if the below makes any difference? I'll try and get back
to tracking this.

---
 kernel/events/core.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 9efb1e7858ac..851dc9dc5643 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7497,8 +7497,10 @@ static void perf_event_exit_task_context(struct task_struct *child, int ctxn)
 	 */
 	mutex_lock(&child_ctx->mutex);
 
+	rcu_read_lock();
 	list_for_each_entry_rcu(child_event, &child_ctx->event_list, event_entry)
 		__perf_event_exit_task(child_event, child_ctx, child);
+	rcu_read_unlock();
 
 	mutex_unlock(&child_ctx->mutex);
 

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: perf: use after free in perf_remove_from_context
  2014-05-29  2:31         ` Sasha Levin
@ 2014-05-29  7:59           ` Peter Zijlstra
  0 siblings, 0 replies; 46+ messages in thread
From: Peter Zijlstra @ 2014-05-29  7:59 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Ingo Molnar, acme, LKML, Thomas Gleixner, Dave Jones

[-- Attachment #1: Type: text/plain, Size: 4065 bytes --]

On Wed, May 28, 2014 at 10:31:38PM -0400, Sasha Levin wrote:
> I've just had this:
> 
> 
> [  591.111854] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> [  591.121057] Dumping ftrace buffer:
> [  591.121057]    (ftrace buffer empty)
> [  591.121057] Modules linked in:
> [  591.121057] CPU: 14 PID: 10272 Comm: trinity-c230 Tainted: G        W     3.15.0-rc7-next-20140528-sasha-00019-gd193e91-dirty #545
> [  591.121057] task: ffff8802df0d3000 ti: ffff8802df0fe000 task.ti: ffff8802df0fe000
> [  591.129130] RIP: perf_event_exit_task (kernel/events/core.c:7492 kernel/events/core.c:7524)
> [  591.129130] RSP: 0000:ffff8802df0ffe28  EFLAGS: 00010246
> [  591.129130] RAX: 0000000000000000 RBX: ffff8802df0d3000 RCX: 0000000000000001
> [  591.129130] RDX: 0000000000000000 RSI: ffffffff97088100 RDI: 0000000000000282
> [  591.129130] RBP: ffff8802df0ffe98 R08: 0000000000000000 R09: 0000000000000000
> [  591.129130] R10: 0000000000000001 R11: 0000000000000000 R12: 6b6b6b6b6b6b6b6b
> [  591.129130] R13: 6b6b6b6b6b6b6b6b R14: ffff8806c9bfd550 R15: 0000000000000001
> [  591.129130] FS:  00007f4b26866700(0000) GS:ffff88036ae00000(0000) knlGS:0000000000000000
> [  591.129130] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [  591.129130] CR2: 0000000001a3a9f8 CR3: 000000001702c000 CR4: 00000000000006a0
> [  591.129130] Stack:
> [  591.129130]  ffff8802df0ffe48 ffff8802df09cce0 ffff8802fdf99488 ffff8802fdf99290
> [  591.129130]  ffff8806c9bfd5a8 ffff8806c9bfd670 ffff8802df0ffe88 6b6b6b6b6b6b6b6b
> [  591.129130]  ffff8802df0ffe98 0000000000000001 0000000000000001 ffff8802df0d3000
> [  591.129130] Call Trace:
> [  591.129130] do_exit (kernel/exit.c:766)
> [  591.129130] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
> [  591.129130] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2557 kernel/locking/lockdep.c:2599)
> [  591.129130] ? trace_hardirqs_on (kernel/locking/lockdep.c:2607)
> [  591.129130] do_group_exit (kernel/exit.c:884)
> [  591.129130] SyS_exit_group (kernel/exit.c:895)
> [  591.129130] tracesys (arch/x86/kernel/entry_64.S:542)
> [ 591.129130] Code: a0 02 00 00 49 89 95 a8 02 00 00 4c 8b 45 a0 4c 89 c7 e8 11 c2 2c 03 48 8b 45 a8 48 89 c7 e8 15 7c ff ff 4c 89 ef e8 bd 7b ff ff <49> 8b 45 00 48 89 45 c8 4c 8b 6d c8 e8 cc 19 f6 ff 85 c0 74 0e
> All code
> ========
>    0:	a0 02 00 00 49 89 95 	movabs 0x2a8958949000002,%al
>    7:	a8 02
>    9:	00 00                	add    %al,(%rax)
>    b:	4c 8b 45 a0          	mov    -0x60(%rbp),%r8
>    f:	4c 89 c7             	mov    %r8,%rdi
>   12:	e8 11 c2 2c 03       	callq  0x32cc228
>   17:	48 8b 45 a8          	mov    -0x58(%rbp),%rax
>   1b:	48 89 c7             	mov    %rax,%rdi
>   1e:	e8 15 7c ff ff       	callq  0xffffffffffff7c38
>   23:	4c 89 ef             	mov    %r13,%rdi
>   26:	e8 bd 7b ff ff       	callq  0xffffffffffff7be8
>   2b:*	49 8b 45 00          	mov    0x0(%r13),%rax		<-- trapping instruction
>   2f:	48 89 45 c8          	mov    %rax,-0x38(%rbp)
>   33:	4c 8b 6d c8          	mov    -0x38(%rbp),%r13
>   37:	e8 cc 19 f6 ff       	callq  0xfffffffffff61a08
>   3c:	85 c0                	test   %eax,%eax
>   3e:	74 0e                	je     0x4e
> 	...
> 
> Code starting with the faulting instruction
> ===========================================
>    0:	49 8b 45 00          	mov    0x0(%r13),%rax
>    4:	48 89 45 c8          	mov    %rax,-0x38(%rbp)
>    8:	4c 8b 6d c8          	mov    -0x38(%rbp),%r13
>    c:	e8 cc 19 f6 ff       	callq  0xfffffffffff619dd
>   11:	85 c0                	test   %eax,%eax
>   13:	74 0e                	je     0x23
> 	...
> [  591.129130] RIP perf_event_exit_task (kernel/events/core.c:7492 kernel/events/core.c:7524)
> [  591.129130]  RSP <ffff8802df0ffe28>
> 
> 
> Seems similar enough.

Shiny, you got the code bits scripted ;-)

Yeah, looks about the same, one of the list elements (I checked, its not
the list head itself, or at least it wasn't when I managed to hit) was
freed while still on the list.

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: perf: use after free in perf_remove_from_context
  2014-05-29  7:57         ` Peter Zijlstra
@ 2014-05-29 14:47           ` Sasha Levin
  2014-05-29 15:07             ` Peter Zijlstra
  0 siblings, 1 reply; 46+ messages in thread
From: Sasha Levin @ 2014-05-29 14:47 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, acme, LKML, Thomas Gleixner, Dave Jones

On 05/29/2014 03:57 AM, Peter Zijlstra wrote:
> On Wed, May 28, 2014 at 07:52:07PM -0400, Sasha Levin wrote:
>> On 05/14/2014 12:35 PM, Peter Zijlstra wrote:
>>> On Wed, May 14, 2014 at 12:32:26PM -0400, Sasha Levin wrote:
>>>>> -----BEGIN PGP SIGNED MESSAGE-----
>>>>> Hash: SHA1
>>>>>
>>>>> On 05/14/2014 12:29 PM, Peter Zijlstra wrote:
>>>>>>> On Mon, May 12, 2014 at 11:42:33AM -0400, Sasha Levin wrote:
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> While fuzzing with trinity inside a KVM tools guest running the latest -next kernel I've stumbled on the following spew. Maybe related to the very recent change in freeing on task exit?
>>>>>>>>>
>>>>>>>>> [ 2509.827261] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC [ 2509.830379] Dumping ftrace buffer: [ 2509.830379]    (ftrace buffer empty) [ 2509.830379] Modules linked in: [ 2509.830379] CPU: 47 PID: 43306 Comm: trinity-c126 Tainted: G        W     3.15.0-rc5-next-20140512-sasha-00019-ga20bc00-dirty #456
>>>>>>>
>>>>>>> Any particular trinity setup? And would you happen to have the seed of that run?
>>>>>
>>>>> Nothing special about trinity options. 400 threads and blacklisting some of the
>>>>> destructive syscalls (umount, reboot, etc).
>>>>>
>>>>> I don't have the seed, but that problem did reproduce again tonight so I can test
>>>>> out debug code if you have something in mind.
>>> Nah, I drew a pretty big blank, which is why I wanted to see if I could
>>> reproduce. If you could share your trinity cmdline I'd be much obliged.
>>> While I did manage to clone (the repo moved since last time) and build
>>> it, I'm not really that handy with it and want to avoid destroying my
>>> machine if possible ;-)
>>
>> Anything I could do to help out with this? It reproduces pretty easily on my
>> configuration so I'd be happy to test out whatever might help.
> 
> Yeah, it takes me days to test anything, and my last guess panned out to
> nothing, at which point I decided I needed to look at the things I'd
> neglected for a bit :/
> 
> Could you see if the below makes any difference? I'll try and get back
> to tracking this.
> 
> ---
>  kernel/events/core.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 9efb1e7858ac..851dc9dc5643 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -7497,8 +7497,10 @@ static void perf_event_exit_task_context(struct task_struct *child, int ctxn)
>  	 */
>  	mutex_lock(&child_ctx->mutex);
>  
> +	rcu_read_lock();
>  	list_for_each_entry_rcu(child_event, &child_ctx->event_list, event_entry)
>  		__perf_event_exit_task(child_event, child_ctx, child);
> +	rcu_read_unlock();
>  
>  	mutex_unlock(&child_ctx->mutex);
>  
> 

It doesn't work out well because we later lock a mutex in sync_child_event().


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: perf: use after free in perf_remove_from_context
  2014-05-29 14:47           ` Sasha Levin
@ 2014-05-29 15:07             ` Peter Zijlstra
  2014-05-29 16:44               ` Sasha Levin
  0 siblings, 1 reply; 46+ messages in thread
From: Peter Zijlstra @ 2014-05-29 15:07 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Ingo Molnar, acme, LKML, Thomas Gleixner, Dave Jones

On Thu, May 29, 2014 at 10:47:09AM -0400, Sasha Levin wrote:
> It doesn't work out well because we later lock a mutex in sync_child_event().
> 

Urgh, right you are. I'll go stare at it more. It shouldn't have
mattered, because the mutex we take just before should ensure existence,
but.. you know.. :-)

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: perf: use after free in perf_remove_from_context
  2014-05-29 15:07             ` Peter Zijlstra
@ 2014-05-29 16:44               ` Sasha Levin
  2014-05-29 16:50                 ` Peter Zijlstra
  0 siblings, 1 reply; 46+ messages in thread
From: Sasha Levin @ 2014-05-29 16:44 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, acme, LKML, Thomas Gleixner, Dave Jones

On 05/29/2014 11:07 AM, Peter Zijlstra wrote:
> On Thu, May 29, 2014 at 10:47:09AM -0400, Sasha Levin wrote:
>> It doesn't work out well because we later lock a mutex in sync_child_event().
>>
> 
> Urgh, right you are. I'll go stare at it more. It shouldn't have
> mattered, because the mutex we take just before should ensure existence,
> but.. you know.. :-)
> 

So the only caller to sync_child_event() is that loop. According to what you said
it should be safe to remove that mutex lock, but doing that triggers a list
corruption:

[ 1204.341887] WARNING: CPU: 20 PID: 12839 at lib/list_debug.c:62 __list_del_entry+0xa1/0xe0()
[ 1204.347597] list_del corruption. next->prev should be ffff8806ca68b108, but was ffff88051a67c398
[...]

I don't see how that would happen :/


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: perf: use after free in perf_remove_from_context
  2014-05-29 16:44               ` Sasha Levin
@ 2014-05-29 16:50                 ` Peter Zijlstra
  2014-05-29 16:52                   ` Sasha Levin
  2014-05-29 17:00                   ` Peter Zijlstra
  0 siblings, 2 replies; 46+ messages in thread
From: Peter Zijlstra @ 2014-05-29 16:50 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Ingo Molnar, acme, LKML, Thomas Gleixner, Dave Jones

On Thu, May 29, 2014 at 12:44:23PM -0400, Sasha Levin wrote:
> On 05/29/2014 11:07 AM, Peter Zijlstra wrote:
> > On Thu, May 29, 2014 at 10:47:09AM -0400, Sasha Levin wrote:
> >> It doesn't work out well because we later lock a mutex in sync_child_event().
> >>
> > 
> > Urgh, right you are. I'll go stare at it more. It shouldn't have
> > mattered, because the mutex we take just before should ensure existence,
> > but.. you know.. :-)
> > 
> 
> So the only caller to sync_child_event() is that loop. According to what you said
> it should be safe to remove that mutex lock, but doing that triggers a list
> corruption:
> 
> [ 1204.341887] WARNING: CPU: 20 PID: 12839 at lib/list_debug.c:62 __list_del_entry+0xa1/0xe0()
> [ 1204.347597] list_del corruption. next->prev should be ffff8806ca68b108, but was ffff88051a67c398
> [...]
> 
> I don't see how that would happen :/

No, what I said is that the mutex in perf_event_exit_task() should be
sufficient to guard the list iteration calling __perf_event_exit_task().

Ading the RCU was a bit of paranoia.. 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: perf: use after free in perf_remove_from_context
  2014-05-29 16:50                 ` Peter Zijlstra
@ 2014-05-29 16:52                   ` Sasha Levin
  2014-05-29 17:00                   ` Peter Zijlstra
  1 sibling, 0 replies; 46+ messages in thread
From: Sasha Levin @ 2014-05-29 16:52 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, acme, LKML, Thomas Gleixner, Dave Jones

On 05/29/2014 12:50 PM, Peter Zijlstra wrote:
>> > 
>> > So the only caller to sync_child_event() is that loop. According to what you said
>> > it should be safe to remove that mutex lock, but doing that triggers a list
>> > corruption:
>> > 
>> > [ 1204.341887] WARNING: CPU: 20 PID: 12839 at lib/list_debug.c:62 __list_del_entry+0xa1/0xe0()
>> > [ 1204.347597] list_del corruption. next->prev should be ffff8806ca68b108, but was ffff88051a67c398
>> > [...]
>> > 
>> > I don't see how that would happen :/
> No, what I said is that the mutex in perf_event_exit_task() should be
> sufficient to guard the list iteration calling __perf_event_exit_task().
> 
> Ading the RCU was a bit of paranoia.. 

Oh, my bad, I thought you referred to the mutex lock in sync_child_event().


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: perf: use after free in perf_remove_from_context
  2014-05-29 16:50                 ` Peter Zijlstra
  2014-05-29 16:52                   ` Sasha Levin
@ 2014-05-29 17:00                   ` Peter Zijlstra
  2014-05-29 22:37                     ` Sasha Levin
  2014-06-05 14:38                     ` [tip:perf/core] perf: Fix use after free in perf_remove_from_context() tip-bot for Peter Zijlstra
  1 sibling, 2 replies; 46+ messages in thread
From: Peter Zijlstra @ 2014-05-29 17:00 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Ingo Molnar, acme, LKML, Thomas Gleixner, Dave Jones

On Thu, May 29, 2014 at 06:50:57PM +0200, Peter Zijlstra wrote:
> On Thu, May 29, 2014 at 12:44:23PM -0400, Sasha Levin wrote:
> > On 05/29/2014 11:07 AM, Peter Zijlstra wrote:
> > > On Thu, May 29, 2014 at 10:47:09AM -0400, Sasha Levin wrote:
> > >> It doesn't work out well because we later lock a mutex in sync_child_event().
> > >>
> > > 
> > > Urgh, right you are. I'll go stare at it more. It shouldn't have
> > > mattered, because the mutex we take just before should ensure existence,
> > > but.. you know.. :-)
> > > 
> > 
> > So the only caller to sync_child_event() is that loop. According to what you said
> > it should be safe to remove that mutex lock, but doing that triggers a list
> > corruption:
> > 
> > [ 1204.341887] WARNING: CPU: 20 PID: 12839 at lib/list_debug.c:62 __list_del_entry+0xa1/0xe0()
> > [ 1204.347597] list_del corruption. next->prev should be ffff8806ca68b108, but was ffff88051a67c398
> > [...]
> > 
> > I don't see how that would happen :/
> 
> No, what I said is that the mutex in perf_event_exit_task() should be
> sufficient to guard the list iteration calling __perf_event_exit_task().
> 
> Ading the RCU was a bit of paranoia.. 

Hmm, so can you try this..

While that mutex should guard the elements, it doesn't guard against the
use-after-free that's from list_for_each_entry_rcu().
__perf_event_exit_task() can actually free the event.

And because list addition/deletion is guarded by both ctx->mutex and
ctx->lock, holding ctx->mutex is sufficient for reading the list, so we
don't actually need the rcu list iteration.

---
 kernel/events/core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 689237a0c5e8..2bb45d483325 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7432,7 +7432,7 @@ __perf_event_exit_task(struct perf_event *child_event,
 
 static void perf_event_exit_task_context(struct task_struct *child, int ctxn)
 {
-	struct perf_event *child_event;
+	struct perf_event *child_event, *next;
 	struct perf_event_context *child_ctx;
 	unsigned long flags;
 
@@ -7486,7 +7486,7 @@ static void perf_event_exit_task_context(struct task_struct *child, int ctxn)
 	 */
 	mutex_lock(&child_ctx->mutex);
 
-	list_for_each_entry_rcu(child_event, &child_ctx->event_list, event_entry)
+	list_for_each_entry_safe(child_event, next, &child_ctx->event_list, event_entry)
 		__perf_event_exit_task(child_event, child_ctx, child);
 
 	mutex_unlock(&child_ctx->mutex);

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: perf: use after free in perf_remove_from_context
  2014-05-29 17:00                   ` Peter Zijlstra
@ 2014-05-29 22:37                     ` Sasha Levin
  2014-06-05 14:38                     ` [tip:perf/core] perf: Fix use after free in perf_remove_from_context() tip-bot for Peter Zijlstra
  1 sibling, 0 replies; 46+ messages in thread
From: Sasha Levin @ 2014-05-29 22:37 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, acme, LKML, Thomas Gleixner, Dave Jones

On 05/29/2014 01:00 PM, Peter Zijlstra wrote:
> On Thu, May 29, 2014 at 06:50:57PM +0200, Peter Zijlstra wrote:
>> > On Thu, May 29, 2014 at 12:44:23PM -0400, Sasha Levin wrote:
>>> > > On 05/29/2014 11:07 AM, Peter Zijlstra wrote:
>>>> > > > On Thu, May 29, 2014 at 10:47:09AM -0400, Sasha Levin wrote:
>>>>> > > >> It doesn't work out well because we later lock a mutex in sync_child_event().
>>>>> > > >>
>>>> > > > 
>>>> > > > Urgh, right you are. I'll go stare at it more. It shouldn't have
>>>> > > > mattered, because the mutex we take just before should ensure existence,
>>>> > > > but.. you know.. :-)
>>>> > > > 
>>> > > 
>>> > > So the only caller to sync_child_event() is that loop. According to what you said
>>> > > it should be safe to remove that mutex lock, but doing that triggers a list
>>> > > corruption:
>>> > > 
>>> > > [ 1204.341887] WARNING: CPU: 20 PID: 12839 at lib/list_debug.c:62 __list_del_entry+0xa1/0xe0()
>>> > > [ 1204.347597] list_del corruption. next->prev should be ffff8806ca68b108, but was ffff88051a67c398
>>> > > [...]
>>> > > 
>>> > > I don't see how that would happen :/
>> > 
>> > No, what I said is that the mutex in perf_event_exit_task() should be
>> > sufficient to guard the list iteration calling __perf_event_exit_task().
>> > 
>> > Ading the RCU was a bit of paranoia.. 
> Hmm, so can you try this..
> 
> While that mutex should guard the elements, it doesn't guard against the
> use-after-free that's from list_for_each_entry_rcu().
> __perf_event_exit_task() can actually free the event.
> 
> And because list addition/deletion is guarded by both ctx->mutex and
> ctx->lock, holding ctx->mutex is sufficient for reading the list, so we
> don't actually need the rcu list iteration.

Works for me, thanks!


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: eventpoll __list_del_entry corruption
  2014-05-15 18:11 ` eventpoll __list_del_entry corruption (was: perf: use after free in perf_remove_from_context) Peter Zijlstra
  2014-05-15 18:16   ` eventpoll __list_del_entry corruption Sasha Levin
  2014-05-21  8:25   ` BUG at /usr/src/linux-2.6/mm/filemap.c:202 (was: perf: use after free in perf_remove_from_context) Peter Zijlstra
@ 2014-06-03 15:07   ` Jason Baron
  2014-06-03 15:11     ` Peter Zijlstra
  2 siblings, 1 reply; 46+ messages in thread
From: Jason Baron @ 2014-06-03 15:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Sasha Levin, Ingo Molnar, acme, LKML, Thomas Gleixner,
	Dave Jones, edumazet, viro

On 05/15/2014 02:11 PM, Peter Zijlstra wrote:
> On Mon, May 12, 2014 at 11:42:33AM -0400, Sasha Levin wrote:
>> Hi all,
>>
>> While fuzzing with trinity inside a KVM tools guest running the latest -next
>> kernel I've stumbled on the following spew. Maybe related to the very recent
>> change in freeing on task exit?
>>
> 
> While fuzzing to reproduce; I hit this one, is it a known one or should
> I go poke the right people about it?
> 
> ---
> [ 5823.689985] ------------[ cut here ]------------
> [ 5823.690004] WARNING: CPU: 3 PID: 2508 at /usr/src/linux-2.6/lib/list_debug.c:59 __list_del_entry+0xa1/0xd0()
> [ 5823.690004] list_del corruption. prev->next should be ffff880131111de0, but was 6b6b6b6b6b6b6b6b
> [ 5823.690004] Modules linked in:
> [ 5823.690004] CPU: 3 PID: 2508 Comm: trinity-main Not tainted 3.15.0-rc5-01700-g505011124ad0-dirty #1072
> [ 5823.690004] Hardware name: Supermicro X8DTN/X8DTN, BIOS 4.6.3 01/08/2010
> [ 5823.690004]  0000000000000009 ffff880432709ca8 ffffffff81681aa2 ffff880432709cf0
> [ 5823.690004]  ffff880432709ce0 ffffffff8109807c ffff880131111de0 ffff880131111dc8
> [ 5823.690004]  0000000000000286 ffff8800b9dd5618 ffff88023699b720 ffff880432709d40
> [ 5823.690004] Call Trace:
> [ 5823.690004]  [<ffffffff81681aa2>] dump_stack+0x4e/0x7a
> [ 5823.690004]  [<ffffffff8109807c>] warn_slowpath_common+0x8c/0xc0
> [ 5823.690004]  [<ffffffff8109816c>] warn_slowpath_fmt+0x4c/0x50
> [ 5823.690004]  [<ffffffff810ec8bf>] ? do_raw_spin_lock+0x13f/0x160
> [ 5823.690004]  [<ffffffff8138c661>] __list_del_entry+0xa1/0xd0
> [ 5823.690004]  [<ffffffff8138c69d>] list_del+0xd/0x30
> [ 5823.690004]  [<ffffffff810dfa71>] remove_wait_queue+0x31/0x50
> [ 5823.690004]  [<ffffffff812152aa>] ep_unregister_pollwait.isra.9+0x6a/0xb0
> [ 5823.690004]  [<ffffffff81215268>] ? ep_unregister_pollwait.isra.9+0x28/0xb0
> [ 5823.690004]  [<ffffffff8121531f>] ep_remove+0x2f/0xe0
> [ 5823.690004]  [<ffffffff81215705>] eventpoll_release_file+0x65/0xa0
> [ 5823.690004]  [<ffffffff811cf259>] __fput+0x1d9/0x1e0
> [ 5823.690004]  [<ffffffff811cf2ae>] ____fput+0xe/0x10
> [ 5823.690004]  [<ffffffff810b91f4>] task_work_run+0xc4/0xe0
> [ 5823.690004]  [<ffffffff8109a544>] do_exit+0x2d4/0xa90
> [ 5823.690004]  [<ffffffff813825c4>] ? lockdep_sys_exit_thunk+0x35/0x67
> [ 5823.690004]  [<ffffffff8109ae2c>] do_group_exit+0x4c/0xc0
> [ 5823.690004]  [<ffffffff8109aeb7>] SyS_exit_group+0x17/0x20
> [ 5823.690004]  [<ffffffff8168a2c2>] system_call_fastpath+0x16/0x1b
> [ 5823.690004] ---[ end trace 515b7fa3169c0906 ]---
> 


Hi Peter,

If its possible to reproduce maybe we can apply the following debug
patch to at least have a clue about which wait queue has gotten
corrupted. The bug could also be isolated to the epoll core too (ie
not specific to a particular wait queue), but I think its worth a
shot...

Thanks,

-Jason


diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index af90312..e8d5ea7 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -237,6 +237,9 @@ struct eppoll_entry {
 
 	/* The wait queue head that linked the "wait" wait queue item */
 	wait_queue_head_t *whead;
+
+	/* DEBUG: save address of ep_ptable_queue_proc() caller */
+	unsigned long poll_wait_addr;
 };
 
 /* Wrapper struct used by poll queueing */
@@ -513,6 +516,21 @@ static void ep_poll_safewake(wait_queue_head_t *wq)
 	put_cpu();
 }
 
+static void check_pwq(struct eppoll_entry *pwq)
+{
+	unsigned long flags;
+	struct list_head *prev, *entry;
+
+	spin_lock_irqsave(&pwq->whead->lock, flags);
+	entry = &pwq->wait.task_list;
+	prev = entry->prev;
+	if (prev->next != entry)
+		pr_err("epoll: list corruption: queue caller addr: 0x%lx, "
+			"function: %pS\n", pwq->poll_wait_addr,
+			(void *)pwq->poll_wait_addr);
+	spin_unlock_irqrestore(&pwq->whead->lock, flags);
+}
+
 static void ep_remove_wait_queue(struct eppoll_entry *pwq)
 {
 	wait_queue_head_t *whead;
@@ -520,8 +538,10 @@ static void ep_remove_wait_queue(struct eppoll_entry *pwq)
 	rcu_read_lock();
 	/* If it is cleared by POLLFREE, it should be rcu-safe */
 	whead = rcu_dereference(pwq->whead);
-	if (whead)
+	if (whead) {
+		check_pwq(pwq);
 		remove_wait_queue(whead, &pwq->wait);
+	}
 	rcu_read_unlock();
 }
 
@@ -1101,6 +1121,7 @@ static void ep_ptable_queue_proc(struct file *file, wait_queue_head_t *whead,
 		add_wait_queue(whead, &pwq->wait);
 		list_add_tail(&pwq->llink, &epi->pwqlist);
 		epi->nwait++;
+		pwq->poll_wait_addr = (unsigned long)__builtin_return_address(0);
 	} else {
 		/* We have to signal that an error occurred */
 		epi->nwait = -1;

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: eventpoll __list_del_entry corruption
  2014-06-03 15:07   ` eventpoll __list_del_entry corruption Jason Baron
@ 2014-06-03 15:11     ` Peter Zijlstra
  0 siblings, 0 replies; 46+ messages in thread
From: Peter Zijlstra @ 2014-06-03 15:11 UTC (permalink / raw)
  To: Jason Baron
  Cc: Sasha Levin, Ingo Molnar, acme, LKML, Thomas Gleixner,
	Dave Jones, edumazet, viro

[-- Attachment #1: Type: text/plain, Size: 697 bytes --]

On Tue, Jun 03, 2014 at 11:07:11AM -0400, Jason Baron wrote:
> Hi Peter,
> 
> If its possible to reproduce maybe we can apply the following debug

I think I saw it twice during a week of fuzzing, so it might be a while,
esp. since I'm now not actively fuzzing anymore, but I can sure stuff
this in my tmp/trinity branch just in case.

Sasha and Dave might also want to carry it for a while, since they're
doing far more of this than me.

> patch to at least have a clue about which wait queue has gotten
> corrupted. The bug could also be isolated to the epoll core too (ie
> not specific to a particular wait queue), but I think its worth a
> shot...
> 
> Thanks,
> 
> -Jason

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [tip:perf/core] perf: Fix use after free in perf_remove_from_context()
  2014-05-29 17:00                   ` Peter Zijlstra
  2014-05-29 22:37                     ` Sasha Levin
@ 2014-06-05 14:38                     ` tip-bot for Peter Zijlstra
  1 sibling, 0 replies; 46+ messages in thread
From: tip-bot for Peter Zijlstra @ 2014-06-05 14:38 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, sasha.levin, hpa, mingo, torvalds, peterz, acme,
	davej, tglx

Commit-ID:  ebf905fc7a6e7c99c53b5afc888d8f950da90aff
Gitweb:     http://git.kernel.org/tip/ebf905fc7a6e7c99c53b5afc888d8f950da90aff
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Thu, 29 May 2014 19:00:24 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 5 Jun 2014 12:29:52 +0200

perf: Fix use after free in perf_remove_from_context()

While that mutex should guard the elements, it doesn't guard against the
use-after-free that's from list_for_each_entry_rcu().
__perf_event_exit_task() can actually free the event.

And because list addition/deletion is guarded by both ctx->mutex and
ctx->lock, holding ctx->mutex is sufficient for reading the list, so we
don't actually need the rcu list iteration.

Fixes: 3a497f48637e ("perf: Simplify perf_event_exit_task_context()")
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Tested-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Jones <davej@redhat.com>
Cc: acme@ghostprotocols.net
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140529170024.GA2315@laptop.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/events/core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index ed50b09..a62d142 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7431,7 +7431,7 @@ __perf_event_exit_task(struct perf_event *child_event,
 
 static void perf_event_exit_task_context(struct task_struct *child, int ctxn)
 {
-	struct perf_event *child_event;
+	struct perf_event *child_event, *next;
 	struct perf_event_context *child_ctx;
 	unsigned long flags;
 
@@ -7485,7 +7485,7 @@ static void perf_event_exit_task_context(struct task_struct *child, int ctxn)
 	 */
 	mutex_lock(&child_ctx->mutex);
 
-	list_for_each_entry_rcu(child_event, &child_ctx->event_list, event_entry)
+	list_for_each_entry_safe(child_event, next, &child_ctx->event_list, event_entry)
 		__perf_event_exit_task(child_event, child_ctx, child);
 
 	mutex_unlock(&child_ctx->mutex);

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: eventpoll __list_del_entry corruption
  2014-05-15 18:16   ` eventpoll __list_del_entry corruption Sasha Levin
@ 2014-06-16  9:44     ` Eric Wong
  0 siblings, 0 replies; 46+ messages in thread
From: Eric Wong @ 2014-06-16  9:44 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Peter Zijlstra, Ingo Molnar, acme, LKML, Thomas Gleixner,
	Dave Jones, edumazet, viro, jbaron

Sasha Levin <sasha.levin@oracle.com> wrote:
> On 05/15/2014 02:11 PM, Peter Zijlstra wrote:
> > On Mon, May 12, 2014 at 11:42:33AM -0400, Sasha Levin wrote:
> >> Hi all,
> >>
> >> While fuzzing with trinity inside a KVM tools guest running the latest -next
> >> kernel I've stumbled on the following spew. Maybe related to the very recent
> >> change in freeing on task exit?

> > [ 5823.690004]  [<ffffffff8109a544>] do_exit+0x2d4/0xa90
> > [ 5823.690004]  [<ffffffff813825c4>] ? lockdep_sys_exit_thunk+0x35/0x67
> > [ 5823.690004]  [<ffffffff8109ae2c>] do_group_exit+0x4c/0xc0
> > [ 5823.690004]  [<ffffffff8109aeb7>] SyS_exit_group+0x17/0x20
> > [ 5823.690004]  [<ffffffff8168a2c2>] system_call_fastpath+0x16/0x1b
> > [ 5823.690004] ---[ end trace 515b7fa3169c0906 ]---
> 
> Dave reported something similar to that last year(!) and that never got fixed
> AFAIK: https://lkml.org/lkml/2013/10/14/353.

Both stack traces happen during exit, so it may be a destruction
ordering problem when the eventpoll and files it references are
all released together.

I've never seen this in normal use, perhaps because I'm in the habit of
closing all descriptors before exit to keep valgrind happy :)

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: BUG_ON drivers/char/random.c:986
  2014-05-17  2:18         ` Theodore Ts'o
  2014-05-17 16:24           ` Sasha Levin
@ 2014-07-15  4:36           ` Dave Jones
  2014-07-15 20:29             ` Hannes Frederic Sowa
  1 sibling, 1 reply; 46+ messages in thread
From: Dave Jones @ 2014-07-15  4:36 UTC (permalink / raw)
  To: Linux Kernel
  Cc: Theodore Ts'o, Hannes Frederic Sowa, Peter Zijlstra,
	H. Peter Anvin, Sasha Levin, Ingo Molnar, acme, Thomas Gleixner,
	price

On Fri, May 16, 2014 at 10:18:40PM -0400, Theodore Ts'o wrote:
 > On Fri, May 16, 2014 at 05:46:22PM -0700, Hannes Frederic Sowa wrote:
 > > This should do the trick:
 > > dd if=/dev/urandom of=/dev/zero bs=67108707
 > > 
 > > I suspect ee1de406ba6eb1 ("random: simplify accounting logic") as the
 > > culprit.
 > 
 > Yep, that it's it.  Thanks for noticing this so quickly!  I'll push
 > the following patch to Linus.
 > 
 >     	      	    			- Ted
 > 
 > commit 29fb0ca5b3922288fba3f4c975a55032a51df0f0
 > Author: Theodore Ts'o <tytso@mit.edu>
 > Date:   Fri May 16 21:40:41 2014 -0400
 > 
 >     random: fix BUG_ON caused by accounting simplification

I just hit this bug again on 3.16rc5, which has this patch, so
there's still some corner case that isn't happy..

(old thread is here for reference: https://lkml.org/lkml/2014/5/16/724)

kernel BUG at drivers/char/random.c:986!
invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
CPU: 2 PID: 29845 Comm: trinity-c65 Not tainted 3.16.0-rc5+ #22
task: ffff880096f45880 ti: ffff88017dfe8000 task.ti: ffff88017dfe8000
RIP: 0010:[<ffffffff8d420c2c>]  [<ffffffff8d420c2c>] account+0x14c/0x150
RSP: 0018:ffff88017dfebd38  EFLAGS: 00010206
RAX: 0000000000002000 RBX: 000000000027ae6c RCX: 0000000000000000
RDX: 0000000000000000 RSI: 000000000027ae6c RDI: ffffffff8dcb6840
RBP: ffff88017dfebd98 R08: ffffffff8d423490 R09: 0000000000000000
R10: ffffffff8dcb68a4 R11: 0000000000000000 R12: 00007fe396cec000
R13: 0000000000000000 R14: ffffffff8dcb6840 R15: ffffffff8da85638
FS:  00007fe39a1e2700(0000) GS:ffff88024d100000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000012e5a3000 CR4: 00000000001407e0
DR0: 000000000111b000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
Stack:
 ffffffff8d423340 00000000ffffffff 00000000a4c39e97 ffffffff8dcb6840
 0000000000000002 0000000000000000 00000000a4c39e97 00007fe396cec000
 000000000027ae6c 000000000000005a 000000000027ae6c ffffffff8d423490
Call Trace:
 [<ffffffff8d423340>] ? extract_entropy_user+0x60/0x1b0
 [<ffffffff8d423490>] ? extract_entropy_user+0x1b0/0x1b0
 [<ffffffff8d4234c7>] urandom_read+0x37/0x110
 [<ffffffff8d423490>] ? extract_entropy_user+0x1b0/0x1b0
 [<ffffffff8d1d06f3>] do_loop_readv_writev+0x63/0x90
 [<ffffffff8d423490>] ? extract_entropy_user+0x1b0/0x1b0
 [<ffffffff8d1d2588>] do_readv_writev+0x268/0x280
 [<ffffffff8d11d4eb>] ? __acct_update_integrals+0x8b/0x120
 [<ffffffff8d0a935b>] ? preempt_count_sub+0xab/0x100
 [<ffffffff8d343d53>] ? __this_cpu_preempt_check+0x13/0x20
 [<ffffffff8d1d25d6>] vfs_readv+0x36/0x50
 [<ffffffff8d1d269c>] SyS_readv+0x5c/0x100
 [<ffffffff8d74ff9f>] tracesys+0xdd/0xe2
Code: cb 8d e8 d8 e0 c9 ff ba 02 00 02 00 be 1d 00 00 00 48 c7 c7 40 20 00 8e e8 d2 45 dc ff 48 89 d8 5b 41 5c 41 5d 41 5e 41 5f 5d c3 <0f> 0b 66 90 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 49 89 fe 41 
RIP  [<ffffffff8d420c2c>] account+0x14c/0x150
 RSP <ffff88017dfebd38>


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: BUG_ON drivers/char/random.c:986
  2014-07-15  4:36           ` BUG_ON drivers/char/random.c:986 Dave Jones
@ 2014-07-15 20:29             ` Hannes Frederic Sowa
  2014-07-16  8:33               ` Theodore Ts'o
  0 siblings, 1 reply; 46+ messages in thread
From: Hannes Frederic Sowa @ 2014-07-15 20:29 UTC (permalink / raw)
  To: Dave Jones
  Cc: Linux Kernel, Theodore Ts'o, Peter Zijlstra, H. Peter Anvin,
	Sasha Levin, Ingo Molnar, acme, Thomas Gleixner, price

On Di, 2014-07-15 at 00:36 -0400, Dave Jones wrote:
> On Fri, May 16, 2014 at 10:18:40PM -0400, Theodore Ts'o wrote:
>  > On Fri, May 16, 2014 at 05:46:22PM -0700, Hannes Frederic Sowa wrote:
>  > > This should do the trick:
>  > > dd if=/dev/urandom of=/dev/zero bs=67108707
>  > > 
>  > > I suspect ee1de406ba6eb1 ("random: simplify accounting logic") as the
>  > > culprit.
>  > 
>  > Yep, that it's it.  Thanks for noticing this so quickly!  I'll push
>  > the following patch to Linus.
>  > 
>  >     	      	    			- Ted
>  > 
>  > commit 29fb0ca5b3922288fba3f4c975a55032a51df0f0
>  > Author: Theodore Ts'o <tytso@mit.edu>
>  > Date:   Fri May 16 21:40:41 2014 -0400
>  > 
>  >     random: fix BUG_ON caused by accounting simplification
> 
> I just hit this bug again on 3.16rc5, which has this patch, so
> there's still some corner case that isn't happy..
> 
> (old thread is here for reference: https://lkml.org/lkml/2014/5/16/724)

Looks like an overflow introduced by e33ba5fa7afce1a ("random: fix nasty
entropy accounting bug"). Ted, what do you think about the following
fix?

Bye,
Hannes

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 0a7ac0a..72511e8 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -999,7 +999,9 @@ retry:
 	}
 	if (ibytes < min)
 		ibytes = 0;
-	if ((entropy_count -= ibytes << (ENTROPY_SHIFT + 3)) < 0)
+
+	if ((entropy_count -= ibytes << (ENTROPY_SHIFT + 3)) < 0 ||
+	    entropy_count > orig)
 		entropy_count = 0;
 
 	if (cmpxchg(&r->entropy_count, orig, entropy_count) != orig)



^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: BUG_ON drivers/char/random.c:986
  2014-07-15 20:29             ` Hannes Frederic Sowa
@ 2014-07-16  8:33               ` Theodore Ts'o
  2014-07-16 19:18                 ` [PATCH] random: check for increase of entropy_count because of signed conversion Hannes Frederic Sowa
  0 siblings, 1 reply; 46+ messages in thread
From: Theodore Ts'o @ 2014-07-16  8:33 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Dave Jones, Linux Kernel, Peter Zijlstra, H. Peter Anvin,
	Sasha Levin, Ingo Molnar, acme, Thomas Gleixner, price

On Tue, Jul 15, 2014 at 10:29:10PM +0200, Hannes Frederic Sowa wrote:
> 
> Looks like an overflow introduced by e33ba5fa7afce1a ("random: fix nasty
> entropy accounting bug"). Ted, what do you think about the following
> fix?

Ah, right.  We're only capping ibytes to be no more entropy in the
pool only in the r->limit case.

LGTM.  Can you send a patch with a description with a description and
a Signed-off-by?

						- Ted

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH] random: check for increase of entropy_count because of signed conversion
  2014-07-16  8:33               ` Theodore Ts'o
@ 2014-07-16 19:18                 ` Hannes Frederic Sowa
  2014-07-18 21:25                   ` Theodore Ts'o
  0 siblings, 1 reply; 46+ messages in thread
From: Hannes Frederic Sowa @ 2014-07-16 19:18 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Dave Jones, Linux Kernel, Greg Price

The expression entropy_count -= ibytes << (ENTROPY_SHIFT + 3) could
actually increase entropy_count if during assignment of the unsigned
expression on the RHS (mind the -=) we reduce the value modulo
2^width(int) and assign it to entropy_count. Trinity found this.

Reported-by: Dave Jones <davej@redhat.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Greg Price <price@mit.edu>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
As indicated by credit_entropy_bits entropy_count cannot get negative,
so I don't see any reason to include a check for entropy_count < 0
here. Do you agree?

 drivers/char/random.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 0a7ac0a..cd50c4e 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -981,7 +981,7 @@ static size_t account(struct entropy_store *r, size_t nbytes, int min,
 		      int reserved)
 {
 	int entropy_count, orig;
-	size_t ibytes;
+	size_t ibytes, nfrac;
 
 	BUG_ON(r->entropy_count > r->poolinfo->poolfracbits);
 
@@ -999,7 +999,11 @@ retry:
 	}
 	if (ibytes < min)
 		ibytes = 0;
-	if ((entropy_count -= ibytes << (ENTROPY_SHIFT + 3)) < 0)
+
+	nfrac = ibytes << (ENTROPY_SHIFT + 3);
+	if (entropy_count > nfrac)
+		entropy_count -= nfrac;
+	else
 		entropy_count = 0;
 
 	if (cmpxchg(&r->entropy_count, orig, entropy_count) != orig)
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH] random: check for increase of entropy_count because of signed conversion
  2014-07-16 19:18                 ` [PATCH] random: check for increase of entropy_count because of signed conversion Hannes Frederic Sowa
@ 2014-07-18 21:25                   ` Theodore Ts'o
  2014-07-18 21:43                     ` Hannes Frederic Sowa
  2014-07-18 21:50                     ` Theodore Ts'o
  0 siblings, 2 replies; 46+ messages in thread
From: Theodore Ts'o @ 2014-07-18 21:25 UTC (permalink / raw)
  To: Hannes Frederic Sowa; +Cc: Dave Jones, Linux Kernel, Greg Price

On Wed, Jul 16, 2014 at 09:18:15PM +0200, Hannes Frederic Sowa wrote:
> The expression entropy_count -= ibytes << (ENTROPY_SHIFT + 3) could
> actually increase entropy_count if during assignment of the unsigned
> expression on the RHS (mind the -=) we reduce the value modulo
> 2^width(int) and assign it to entropy_count. Trinity found this.
> 
> Reported-by: Dave Jones <davej@redhat.com>
> Cc: Theodore Ts'o <tytso@mit.edu>
> Cc: Greg Price <price@mit.edu>
> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
> ---
> As indicated by credit_entropy_bits entropy_count cannot get negative,
> so I don't see any reason to include a check for entropy_count < 0
> here. Do you agree?

No, the check is important; after we subtract ibytes << (ENTROPY_SHIFT
+ 3) we could drive entropy_count negative, and we don't want to
trigger the WARN_ON().

I'll modify the patch to keep the check.

						- Ted


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] random: check for increase of entropy_count because of signed conversion
  2014-07-18 21:25                   ` Theodore Ts'o
@ 2014-07-18 21:43                     ` Hannes Frederic Sowa
  2014-07-18 21:50                     ` Theodore Ts'o
  1 sibling, 0 replies; 46+ messages in thread
From: Hannes Frederic Sowa @ 2014-07-18 21:43 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Dave Jones, Linux Kernel, Greg Price

Hi,

On Fri, Jul 18, 2014, at 23:25, Theodore Ts'o wrote:
> On Wed, Jul 16, 2014 at 09:18:15PM +0200, Hannes Frederic Sowa wrote:
> > The expression entropy_count -= ibytes << (ENTROPY_SHIFT + 3) could
> > actually increase entropy_count if during assignment of the unsigned
> > expression on the RHS (mind the -=) we reduce the value modulo
> > 2^width(int) and assign it to entropy_count. Trinity found this.
> > 
> > Reported-by: Dave Jones <davej@redhat.com>
> > Cc: Theodore Ts'o <tytso@mit.edu>
> > Cc: Greg Price <price@mit.edu>
> > Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
> > ---
> > As indicated by credit_entropy_bits entropy_count cannot get negative,
> > so I don't see any reason to include a check for entropy_count < 0
> > here. Do you agree?
> 
> No, the check is important; after we subtract ibytes << (ENTROPY_SHIFT
> + 3) we could drive entropy_count negative, and we don't want to
> trigger the WARN_ON().

If we always enter account() with r->entropy_count > 0 I think the
checks in place after this patch should suffice. The problem should only
arise in case entropy_count < 0 at the beginning of account (or in
between if we need to do a retry).

I didn't find a way how entropy_count could get smaller than 0 without
hitting a WARN_ON and a reset of r->entropy_count = 0.

Thanks,
Hannes

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] random: check for increase of entropy_count because of signed conversion
  2014-07-18 21:25                   ` Theodore Ts'o
  2014-07-18 21:43                     ` Hannes Frederic Sowa
@ 2014-07-18 21:50                     ` Theodore Ts'o
  2014-07-18 22:07                       ` Theodore Ts'o
  1 sibling, 1 reply; 46+ messages in thread
From: Theodore Ts'o @ 2014-07-18 21:50 UTC (permalink / raw)
  To: Hannes Frederic Sowa, Dave Jones, Linux Kernel, Greg Price

On Fri, Jul 18, 2014 at 05:25:04PM -0400, Theodore Ts'o wrote:
> > As indicated by credit_entropy_bits entropy_count cannot get negative,
> > so I don't see any reason to include a check for entropy_count < 0
> > here. Do you agree?
> 
> No, the check is important; after we subtract ibytes << (ENTROPY_SHIFT
> + 3) we could drive entropy_count negative, and we don't want to
> trigger the WARN_ON().
> 
> I'll modify the patch to keep the check.

Never mind, I took a closer look at the your patch, and I now
understand what you were asking.  Since entropy_count should never
_start_ negative, simply checking to see if entropy_count > nfrac is
sufficient.

However, there's something a bit larger hiding here, which is we
shouldn't allow urandom_read to be passed a which is greater than
INT_MAX >> ENTROPY_SHIFT.  Otherwise, the nfrac calcuation will
overflow, which can also result in too little entropy getting removed.

The other problem is that comparing since entropy_count is an int, and
nfrac is a size_t, this is a signed vs. unsigned comparison, which
will raise compiler warnings.  

Let me know what you think of my revised patch, which should hopefully
add enough checks to be sufficiently paranoid.  :-)

							- Ted

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH] random: check for increase of entropy_count because of signed conversion
  2014-07-18 21:50                     ` Theodore Ts'o
@ 2014-07-18 22:07                       ` Theodore Ts'o
  2014-07-18 23:35                         ` Hannes Frederic Sowa
  0 siblings, 1 reply; 46+ messages in thread
From: Theodore Ts'o @ 2014-07-18 22:07 UTC (permalink / raw)
  To: linux-kernel; +Cc: hannes, davej, price, Greg Price, Theodore Ts'o

From: Hannes Frederic Sowa <hannes@stressinduktion.org>

The expression entropy_count -= ibytes << (ENTROPY_SHIFT + 3) could
actually increase entropy_count if during assignment of the unsigned
expression on the RHS (mind the -=) we reduce the value modulo
2^width(int) and assign it to entropy_count. Trinity found this.

[ Commit modified by tytso to add an additional safety check for a
  negative entropy_count -- which should never happen, and to also add
  an additional paranoia check to prevent overly large count values to
  be passed into urandom_read().  ]

Reported-by: Dave Jones <davej@redhat.com>
Cc: Greg Price <price@mit.edu>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
---
 drivers/char/random.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 0a7ac0a..003f744 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -641,7 +641,7 @@ retry:
 		} while (unlikely(entropy_count < pool_size-2 && pnfrac));
 	}
 
-	if (entropy_count < 0) {
+	if (unlikely(entropy_count < 0)) {
 		pr_warn("random: negative entropy/overflow: pool %s count %d\n",
 			r->name, entropy_count);
 		WARN_ON(1);
@@ -981,7 +981,7 @@ static size_t account(struct entropy_store *r, size_t nbytes, int min,
 		      int reserved)
 {
 	int entropy_count, orig;
-	size_t ibytes;
+	size_t ibytes, nfrac;
 
 	BUG_ON(r->entropy_count > r->poolinfo->poolfracbits);
 
@@ -999,7 +999,17 @@ retry:
 	}
 	if (ibytes < min)
 		ibytes = 0;
-	if ((entropy_count -= ibytes << (ENTROPY_SHIFT + 3)) < 0)
+
+	nfrac = ibytes << (ENTROPY_SHIFT + 3);
+	if (entropy_count < 0) {
+		pr_warn("random: negative entropy count: pool %s count %d\n",
+			r->name, entropy_count);
+		WARN_ON(1);
+		entropy_count = 0;
+	}
+	if ((unsigned) entropy_count > nfrac)
+		entropy_count -= nfrac;
+	else
 		entropy_count = 0;
 
 	if (cmpxchg(&r->entropy_count, orig, entropy_count) != orig)
@@ -1376,6 +1386,7 @@ urandom_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos)
 			    "with %d bits of entropy available\n",
 			    current->comm, nonblocking_pool.entropy_total);
 
+	nbytes = min_t(size_t, nbytes, INT_MAX >> ENTROPY_SHIFT);
 	ret = extract_entropy_user(&nonblocking_pool, buf, nbytes);
 
 	trace_urandom_read(8 * nbytes, ENTROPY_BITS(&nonblocking_pool),
-- 
2.0.0


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH] random: check for increase of entropy_count because of signed conversion
  2014-07-18 22:07                       ` Theodore Ts'o
@ 2014-07-18 23:35                         ` Hannes Frederic Sowa
  2014-07-19  5:42                           ` Theodore Ts'o
  0 siblings, 1 reply; 46+ messages in thread
From: Hannes Frederic Sowa @ 2014-07-18 23:35 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: linux-kernel, davej, price

Hi,

On Fr, 2014-07-18 at 18:07 -0400, Theodore Ts'o wrote:
> From: Hannes Frederic Sowa <hannes@stressinduktion.org>
> 
> The expression entropy_count -= ibytes << (ENTROPY_SHIFT + 3) could
> actually increase entropy_count if during assignment of the unsigned
> expression on the RHS (mind the -=) we reduce the value modulo
> 2^width(int) and assign it to entropy_count. Trinity found this.
> 
> [ Commit modified by tytso to add an additional safety check for a
>   negative entropy_count -- which should never happen, and to also add
>   an additional paranoia check to prevent overly large count values to
>   be passed into urandom_read().  ]
> 
> Reported-by: Dave Jones <davej@redhat.com>
> Cc: Greg Price <price@mit.edu>
> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
> ---
>  drivers/char/random.c | 17 ++++++++++++++---
>  1 file changed, 14 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/char/random.c b/drivers/char/random.c
> index 0a7ac0a..003f744 100644
> --- a/drivers/char/random.c
> +++ b/drivers/char/random.c
> @@ -641,7 +641,7 @@ retry:
>  		} while (unlikely(entropy_count < pool_size-2 && pnfrac));
>  	}
>  
> -	if (entropy_count < 0) {
> +	if (unlikely(entropy_count < 0)) {
>  		pr_warn("random: negative entropy/overflow: pool %s count %d\n",
>  			r->name, entropy_count);
>  		WARN_ON(1);
> @@ -981,7 +981,7 @@ static size_t account(struct entropy_store *r, size_t nbytes, int min,
>  		      int reserved)
>  {
>  	int entropy_count, orig;
> -	size_t ibytes;
> +	size_t ibytes, nfrac;
>  
>  	BUG_ON(r->entropy_count > r->poolinfo->poolfracbits);
>  
> @@ -999,7 +999,17 @@ retry:
>  	}
>  	if (ibytes < min)
>  		ibytes = 0;
> -	if ((entropy_count -= ibytes << (ENTROPY_SHIFT + 3)) < 0)
> +
> +	nfrac = ibytes << (ENTROPY_SHIFT + 3);
> +	if (entropy_count < 0) {

Minor nit: maybe also add an unlikely() here?

> +		pr_warn("random: negative entropy count: pool %s count %d\n",
> +			r->name, entropy_count);
> +		WARN_ON(1);
> +		entropy_count = 0;
> +	}
> +	if ((unsigned) entropy_count > nfrac)

(unsigned) -> (size_t)

size_t could also be (unsigned long) so the plain (unsigned) is
misleading.

(Maybe I wouldn't have done the cast at all, as we compile the kernel
with -Wno-sign-compare and we have the < 0 check right above, but I
don't have a strong opinion on that.)

> +		entropy_count -= nfrac;
> +	else
>  		entropy_count = 0;
>  
>  	if (cmpxchg(&r->entropy_count, orig, entropy_count) != orig)
> @@ -1376,6 +1386,7 @@ urandom_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos)
>  			    "with %d bits of entropy available\n",
>  			    current->comm, nonblocking_pool.entropy_total);
>  
> +	nbytes = min_t(size_t, nbytes, INT_MAX >> ENTROPY_SHIFT);

Hmm, not sure, nfracs unit is 1/8 bits, so don't we have to limit nbytes
to INT_MAX >> (ENTROPY_SHIFT + 3) here?

And if we want to be even more correct here, we could switch from
INT_MAX to SIZE_MAX, as we do all nfrac calculations in the size_t
domain.

Maximum read/write size is SSIZE_MAX, so we don't need to care about
that, but if a user on a 64 bit machine requests INT_MAX bytes, we only
account/extract INT_MAX >> (ENTROPY_SHIFT + 3) bytes and cause a partial
read, though we actually could calulcate a correct nfrac for INT_MAX.
Because we don't have such large poolfragbits pools we would still
always end up with 0 while still allowing larger buffers to fill.

Hm, I just see that we should leave the INT_MAX limit just because of
the tracepoint.

Good catch,
Hannes

>  	ret = extract_entropy_user(&nonblocking_pool, buf, nbytes);
>  
>  	trace_urandom_read(8 * nbytes, ENTROPY_BITS(&nonblocking_pool),




^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] random: check for increase of entropy_count because of signed conversion
  2014-07-18 23:35                         ` Hannes Frederic Sowa
@ 2014-07-19  5:42                           ` Theodore Ts'o
  2014-07-19  6:20                             ` Hannes Frederic Sowa
  0 siblings, 1 reply; 46+ messages in thread
From: Theodore Ts'o @ 2014-07-19  5:42 UTC (permalink / raw)
  To: Hannes Frederic Sowa; +Cc: linux-kernel, davej, price

On Sat, Jul 19, 2014 at 01:35:48AM +0200, Hannes Frederic Sowa wrote:
> > +	nfrac = ibytes << (ENTROPY_SHIFT + 3);
> > +	if (entropy_count < 0) {
> 
> Minor nit: maybe also add an unlikely() here?

Yep, done.

> > +	if ((unsigned) entropy_count > nfrac)
> 
> (unsigned) -> (size_t)
>
> size_t could also be (unsigned long) so the plain (unsigned) is
> misleading.

Good point, done.

> (Maybe I wouldn't have done the cast at all, as we compile the kernel
> with -Wno-sign-compare and we have the < 0 check right above, but I
> don't have a strong opinion on that.)

I also wanted to shut up other static code checkers like Coverity.  :-)

> > +	nbytes = min_t(size_t, nbytes, INT_MAX >> ENTROPY_SHIFT);
> 
> Hmm, not sure, nfracs unit is 1/8 bits, so don't we have to limit nbytes
> to INT_MAX >> (ENTROPY_SHIFT + 3) here?

Good catch, done.

> And if we want to be even more correct here, we could switch from
> INT_MAX to SIZE_MAX, as we do all nfrac calculations in the size_t
> domain.

The main reason why I used INT_MAX was as a further safety check to
protect the:

	entropy_count -= nfrac;

calculation, since nfrac is size_t and entropy_count is int.

In fact I think this online change ("nbytes = min_t(size_t, nbytes,
INT_MAX >> (ENTROPY_SHIFT + 3));") would have been enough to fix the
problem all by itself, but the other changes results in code which is
cleaner and easier to understand, and I'm a firm believer in multiple
layers of protection.  :-)

Cheers,

					- Ted

commit 79a8468747c5f95ed3d5ce8376a3e82e0c5857fc
Author: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date:   Fri Jul 18 17:26:41 2014 -0400

    random: check for increase of entropy_count because of signed conversion
    
    The expression entropy_count -= ibytes << (ENTROPY_SHIFT + 3) could
    actually increase entropy_count if during assignment of the unsigned
    expression on the RHS (mind the -=) we reduce the value modulo
    2^width(int) and assign it to entropy_count. Trinity found this.
    
    [ Commit modified by tytso to add an additional safety check for a
      negative entropy_count -- which should never happen, and to also add
      an additional paranoia check to prevent overly large count values to
      be passed into urandom_read().  ]
    
    Reported-by: Dave Jones <davej@redhat.com>
    Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Cc: stable@vger.kernel.org

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 0a7ac0a..71529e1 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -641,7 +641,7 @@ retry:
 		} while (unlikely(entropy_count < pool_size-2 && pnfrac));
 	}
 
-	if (entropy_count < 0) {
+	if (unlikely(entropy_count < 0)) {
 		pr_warn("random: negative entropy/overflow: pool %s count %d\n",
 			r->name, entropy_count);
 		WARN_ON(1);
@@ -981,7 +981,7 @@ static size_t account(struct entropy_store *r, size_t nbytes, int min,
 		      int reserved)
 {
 	int entropy_count, orig;
-	size_t ibytes;
+	size_t ibytes, nfrac;
 
 	BUG_ON(r->entropy_count > r->poolinfo->poolfracbits);
 
@@ -999,7 +999,17 @@ retry:
 	}
 	if (ibytes < min)
 		ibytes = 0;
-	if ((entropy_count -= ibytes << (ENTROPY_SHIFT + 3)) < 0)
+
+	if (unlikely(entropy_count < 0)) {
+		pr_warn("random: negative entropy count: pool %s count %d\n",
+			r->name, entropy_count);
+		WARN_ON(1);
+		entropy_count = 0;
+	}
+	nfrac = ibytes << (ENTROPY_SHIFT + 3);
+	if ((size_t) entropy_count > nfrac)
+		entropy_count -= nfrac;
+	else
 		entropy_count = 0;
 
 	if (cmpxchg(&r->entropy_count, orig, entropy_count) != orig)
@@ -1376,6 +1386,7 @@ urandom_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos)
 			    "with %d bits of entropy available\n",
 			    current->comm, nonblocking_pool.entropy_total);
 
+	nbytes = min_t(size_t, nbytes, INT_MAX >> (ENTROPY_SHIFT + 3));
 	ret = extract_entropy_user(&nonblocking_pool, buf, nbytes);
 
 	trace_urandom_read(8 * nbytes, ENTROPY_BITS(&nonblocking_pool),

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH] random: check for increase of entropy_count because of signed conversion
  2014-07-19  5:42                           ` Theodore Ts'o
@ 2014-07-19  6:20                             ` Hannes Frederic Sowa
  0 siblings, 0 replies; 46+ messages in thread
From: Hannes Frederic Sowa @ 2014-07-19  6:20 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: linux-kernel, davej, price

On Sa, 2014-07-19 at 01:42 -0400, Theodore Ts'o wrote:
> On Sat, Jul 19, 2014 at 01:35:48AM +0200, Hannes Frederic Sowa wrote:
> > > +	nfrac = ibytes << (ENTROPY_SHIFT + 3);
> > > +	if (entropy_count < 0) {
> > 
> > Minor nit: maybe also add an unlikely() here?
> 
> Yep, done.
> 
> > > +	if ((unsigned) entropy_count > nfrac)
> > 
> > (unsigned) -> (size_t)
> >
> > size_t could also be (unsigned long) so the plain (unsigned) is
> > misleading.
> 
> Good point, done.
> 
> > (Maybe I wouldn't have done the cast at all, as we compile the kernel
> > with -Wno-sign-compare and we have the < 0 check right above, but I
> > don't have a strong opinion on that.)
> 
> I also wanted to shut up other static code checkers like Coverity.  :-)
> 
> > > +	nbytes = min_t(size_t, nbytes, INT_MAX >> ENTROPY_SHIFT);
> > 
> > Hmm, not sure, nfracs unit is 1/8 bits, so don't we have to limit nbytes
> > to INT_MAX >> (ENTROPY_SHIFT + 3) here?
> 
> Good catch, done.
> 
> > And if we want to be even more correct here, we could switch from
> > INT_MAX to SIZE_MAX, as we do all nfrac calculations in the size_t
> > domain.
> 
> The main reason why I used INT_MAX was as a further safety check to
> protect the:
> 
> 	entropy_count -= nfrac;
> 
> calculation, since nfrac is size_t and entropy_count is int.
> 
> In fact I think this online change ("nbytes = min_t(size_t, nbytes,
> INT_MAX >> (ENTROPY_SHIFT + 3));") would have been enough to fix the
> problem all by itself, but the other changes results in code which is
> cleaner and easier to understand, and I'm a firm believer in multiple
> layers of protection.  :-)

I see and can agree here. :)

I think the patch is good to go.

Thanks you,
Hannes



^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2014-07-19  6:20 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-05-12 15:42 perf: use after free in perf_remove_from_context Sasha Levin
2014-05-14 16:29 ` Peter Zijlstra
2014-05-14 16:32   ` Sasha Levin
2014-05-14 16:35     ` Peter Zijlstra
2014-05-14 16:38       ` Sasha Levin
2014-05-14 16:52         ` Peter Zijlstra
2014-05-14 17:09           ` Sasha Levin
2014-05-14 17:20             ` Dave Jones
2014-05-14 18:37               ` Peter Zijlstra
2014-05-28 23:52       ` Sasha Levin
2014-05-29  2:31         ` Sasha Levin
2014-05-29  7:59           ` Peter Zijlstra
2014-05-29  7:57         ` Peter Zijlstra
2014-05-29 14:47           ` Sasha Levin
2014-05-29 15:07             ` Peter Zijlstra
2014-05-29 16:44               ` Sasha Levin
2014-05-29 16:50                 ` Peter Zijlstra
2014-05-29 16:52                   ` Sasha Levin
2014-05-29 17:00                   ` Peter Zijlstra
2014-05-29 22:37                     ` Sasha Levin
2014-06-05 14:38                     ` [tip:perf/core] perf: Fix use after free in perf_remove_from_context() tip-bot for Peter Zijlstra
2014-05-15 18:11 ` eventpoll __list_del_entry corruption (was: perf: use after free in perf_remove_from_context) Peter Zijlstra
2014-05-15 18:16   ` eventpoll __list_del_entry corruption Sasha Levin
2014-06-16  9:44     ` Eric Wong
2014-05-21  8:25   ` BUG at /usr/src/linux-2.6/mm/filemap.c:202 (was: perf: use after free in perf_remove_from_context) Peter Zijlstra
2014-05-21 13:02     ` BUG at /usr/src/linux-2.6/mm/filemap.c:202 Sasha Levin
2014-06-03 15:07   ` eventpoll __list_del_entry corruption Jason Baron
2014-06-03 15:11     ` Peter Zijlstra
2014-05-16 15:34 ` BUG_ON drivers/char/random.c:986 (Was: perf: use after free in perf_remove_from_context) Peter Zijlstra
2014-05-16 16:06   ` H. Peter Anvin
2014-05-16 16:21     ` Peter Zijlstra
2014-05-17  0:46       ` Hannes Frederic Sowa
2014-05-17  2:18         ` Theodore Ts'o
2014-05-17 16:24           ` Sasha Levin
2014-05-17 17:00             ` Peter Zijlstra
2014-07-15  4:36           ` BUG_ON drivers/char/random.c:986 Dave Jones
2014-07-15 20:29             ` Hannes Frederic Sowa
2014-07-16  8:33               ` Theodore Ts'o
2014-07-16 19:18                 ` [PATCH] random: check for increase of entropy_count because of signed conversion Hannes Frederic Sowa
2014-07-18 21:25                   ` Theodore Ts'o
2014-07-18 21:43                     ` Hannes Frederic Sowa
2014-07-18 21:50                     ` Theodore Ts'o
2014-07-18 22:07                       ` Theodore Ts'o
2014-07-18 23:35                         ` Hannes Frederic Sowa
2014-07-19  5:42                           ` Theodore Ts'o
2014-07-19  6:20                             ` Hannes Frederic Sowa

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).