All of lore.kernel.org
 help / color / mirror / Atom feed
* kernel warning with 0.0.20170223: entered softirq 3 NET_RX net_rx_action+0x0/0x760 with preempt_count 00000101, exited with 00000100?
@ 2017-02-26 19:47 Timothée Ravier
       [not found] ` <CAHmME9pH5xbVR04b9JLqFfho=i_K-jod6N8tJt0ggDXPfqQ_LA@mail.gmail.com>
  0 siblings, 1 reply; 6+ messages in thread
From: Timothée Ravier @ 2017-02-26 19:47 UTC (permalink / raw)
  To: wireguard

Hi,

With the following setup, both client & server running:
* Arch Linux
* kernel 4.9.11.r201702222257-1-grsec
* wireguard-dkms & wireguard-tools 0.0.20170223

Server:
interface: wg0
  public key: ...
  private key: ...
  listening port: 51820

peer: ...
  endpoint: ...:51820
  allowed ips: 192.168.42.1/32

Client:
interface: wg0
  public key: ...
  private key: ...
  listening port: 51820

peer: ...
  endpoint: ...:51820
  allowed ips: 192.168.42.100/32

I get the following warning in the kernel log for each received packets
(a simple ping triggers it):
softirq: huh, entered softirq 3 NET_RX net_rx_action+0x0/0x760 with
preempt_count 00000101, exited with 00000100?

I am using nftables on the server side but iptables on the client side.

Thanks,

Tim

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: kernel warning with 0.0.20170223: entered softirq 3 NET_RX net_rx_action+0x0/0x760 with preempt_count 00000101, exited with 00000100?
       [not found]     ` <CAApVa_kyD=KZYN3ABZU5yZGExNhC-34ryF+JZAPZ0JAKgmdJLw@mail.gmail.com>
@ 2017-02-27  3:22       ` Jason A. Donenfeld
  2017-02-27 11:53         ` Brad Spengler
  2017-02-27 23:36         ` kernel warning with 0.0.20170223: entered softirq 3 NET_RX net_rx_action+0x0/0x760 with preempt_count 00000101, exited with PaX Team
  0 siblings, 2 replies; 6+ messages in thread
From: Jason A. Donenfeld @ 2017-02-27  3:22 UTC (permalink / raw)
  To: Pipacs; +Cc: Brad Spengler, WireGuard mailing list

Hey Pipacs,

I've been receiving reports of strange bugs from grsec users with
WireGuard. The first set of bugs was a heisenbug crash, and I never
found the root cause, but it seemed to happen in the rx path. Then
today Timoth=C3=A9e emailed another different bug from a grsec box, also
along the rx path. This time it was related to the preemption count
being wrong coming into and going out of the rx softirq. This kind of
preemption mismatch, I figure, might account for the earlier bug I
never solved.

So armed with this new information, I went hunting. I followed the
path inward, surrounding the body of each function with:

int i =3D preempt_count();
function_body...
if (i !=3D preempt_count()) pr_err("LORDHAVEMERCY\n");

Eventually I isolated the bug to an interesting situation like this:

int i =3D preempt_count();
other_function(...);
if (i !=3D preempt_count()) pr_err("This will print out\n");

void other_function(int a)
{
int vla[a];
int i =3D preempt_count();
function_body...
if (i !=3D preempt_count()) pr_err("This will NOT print out\n");
}

Since I only got the outer print, I thought this was strange, so I rearrang=
ed:

void other_function(int a)
{
int i =3D preempt_count();
int vla[a];
if (i !=3D preempt_count()) pr_err("This will print out\n");
function_body...
}

Yay, we found the bug. But wtf, what could possibly be changing the
preempt_count there?

So I went disassembling, and lo and behold the clever PaX stack leak
plugin was adding calls to pax_check_alloca. Very nice! But still, why
the preemption bug situation? I went hunting further:

void __used pax_check_alloca(unsigned long size)
{
 ...
       case STACK_TYPE_IRQ:
               stack_left =3D sp & (IRQ_STACK_SIZE - 1);
               put_cpu();
               break;
 ...
}

Do you see the bug? Looks like somebody snuck in a "put_cpu()" there,
where it really does not belong. "put_cpu()" basically just jiggers
the preempt_count. I can confirm that removing the erroneous call to
"put_cpu()" fixes the bug.

So, either this is by design, and there's some odd subtlety I'm
missing, or this is a bug that should be fixed in grsec/PaX.

In the case of the latter, I believe this introduces a security
vulnerability, since it opens up a whole host of interesting race
conditions that can be exploited.

Thanks,
Jason

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: kernel warning with 0.0.20170223: entered softirq 3 NET_RX net_rx_action+0x0/0x760 with preempt_count 00000101, exited with 00000100?
  2017-02-27  3:22       ` Jason A. Donenfeld
@ 2017-02-27 11:53         ` Brad Spengler
  2017-02-27 15:51           ` Jason A. Donenfeld
  2017-02-27 23:36         ` kernel warning with 0.0.20170223: entered softirq 3 NET_RX net_rx_action+0x0/0x760 with preempt_count 00000101, exited with PaX Team
  1 sibling, 1 reply; 6+ messages in thread
From: Brad Spengler @ 2017-02-27 11:53 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: Pipacs, WireGuard mailing list

[-- Attachment #1: Type: text/plain, Size: 3124 bytes --]

Hi,

From looking at the code, this seems to have been introduced during
the port to 4.9.  In previous kernels (and in both the stable kernels)
a get_cpu was used to look at the per-cpu irq stack pointer value,
which was properly put in all cases.  When the code was reworked
for 4.9 to use some new upstream helper functions to identify the
various stacks (and one which uses this_cpu_ptr to identify the
irq stack) that single put_cpu() wasn't removed with the others.
We'll have this fixed in the next patch.

Thanks!
-Brad

On Mon, Feb 27, 2017 at 04:22:34AM +0100, Jason A. Donenfeld wrote:
> Hey Pipacs,
> 
> I've been receiving reports of strange bugs from grsec users with
> WireGuard. The first set of bugs was a heisenbug crash, and I never
> found the root cause, but it seemed to happen in the rx path. Then
> today Timoth??e emailed another different bug from a grsec box, also
> along the rx path. This time it was related to the preemption count
> being wrong coming into and going out of the rx softirq. This kind of
> preemption mismatch, I figure, might account for the earlier bug I
> never solved.
> 
> So armed with this new information, I went hunting. I followed the
> path inward, surrounding the body of each function with:
> 
> int i = preempt_count();
> function_body...
> if (i != preempt_count()) pr_err("LORDHAVEMERCY\n");
> 
> Eventually I isolated the bug to an interesting situation like this:
> 
> int i = preempt_count();
> other_function(...);
> if (i != preempt_count()) pr_err("This will print out\n");
> 
> void other_function(int a)
> {
> int vla[a];
> int i = preempt_count();
> function_body...
> if (i != preempt_count()) pr_err("This will NOT print out\n");
> }
> 
> Since I only got the outer print, I thought this was strange, so I rearranged:
> 
> void other_function(int a)
> {
> int i = preempt_count();
> int vla[a];
> if (i != preempt_count()) pr_err("This will print out\n");
> function_body...
> }
> 
> Yay, we found the bug. But wtf, what could possibly be changing the
> preempt_count there?
> 
> So I went disassembling, and lo and behold the clever PaX stack leak
> plugin was adding calls to pax_check_alloca. Very nice! But still, why
> the preemption bug situation? I went hunting further:
> 
> void __used pax_check_alloca(unsigned long size)
> {
>  ...
>        case STACK_TYPE_IRQ:
>                stack_left = sp & (IRQ_STACK_SIZE - 1);
>                put_cpu();
>                break;
>  ...
> }
> 
> Do you see the bug? Looks like somebody snuck in a "put_cpu()" there,
> where it really does not belong. "put_cpu()" basically just jiggers
> the preempt_count. I can confirm that removing the erroneous call to
> "put_cpu()" fixes the bug.
> 
> So, either this is by design, and there's some odd subtlety I'm
> missing, or this is a bug that should be fixed in grsec/PaX.
> 
> In the case of the latter, I believe this introduces a security
> vulnerability, since it opens up a whole host of interesting race
> conditions that can be exploited.
> 
> Thanks,
> Jason

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: kernel warning with 0.0.20170223: entered softirq 3 NET_RX net_rx_action+0x0/0x760 with preempt_count 00000101, exited with 00000100?
  2017-02-27 11:53         ` Brad Spengler
@ 2017-02-27 15:51           ` Jason A. Donenfeld
  2017-02-27 17:33             ` Timothée Ravier
  0 siblings, 1 reply; 6+ messages in thread
From: Jason A. Donenfeld @ 2017-02-27 15:51 UTC (permalink / raw)
  To: Brad Spengler; +Cc: Pipacs, WireGuard mailing list

Hey Brad,

Thanks for fixing this!

Regards,
Jason

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: kernel warning with 0.0.20170223: entered softirq 3 NET_RX net_rx_action+0x0/0x760 with preempt_count 00000101, exited with 00000100?
  2017-02-27 15:51           ` Jason A. Donenfeld
@ 2017-02-27 17:33             ` Timothée Ravier
  0 siblings, 0 replies; 6+ messages in thread
From: Timothée Ravier @ 2017-02-27 17:33 UTC (permalink / raw)
  To: Jason A. Donenfeld, Brad Spengler; +Cc: Pipacs, WireGuard mailing list

On 27/02/2017 16:51, Jason A. Donenfeld wrote:
> Hey Brad,
> 
> Thanks for fixing this!
> 
> Regards,
> Jason

I can confirm this is fixed for me.

Thanks!

Tim

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: kernel warning with 0.0.20170223: entered softirq 3 NET_RX net_rx_action+0x0/0x760 with preempt_count 00000101, exited with
  2017-02-27  3:22       ` Jason A. Donenfeld
  2017-02-27 11:53         ` Brad Spengler
@ 2017-02-27 23:36         ` PaX Team
  1 sibling, 0 replies; 6+ messages in thread
From: PaX Team @ 2017-02-27 23:36 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: Brad Spengler, WireGuard mailing list

On 27 Feb 2017 at 4:22, Jason A. Donenfeld wrote:

> void __used pax_check_alloca(unsigned long size)
> {
>  ...
>        case STACK_TYPE_IRQ:
>                stack_left = sp & (IRQ_STACK_SIZE - 1);
>                put_cpu();
>                break;
>  ...
> }
> 
> Do you see the bug? Looks like somebody snuck in a "put_cpu()" there,
> where it really does not belong. "put_cpu()" basically just jiggers
> the preempt_count. I can confirm that removing the erroneous call to
> "put_cpu()" fixes the bug.
> 
> So, either this is by design, and there's some odd subtlety I'm
> missing, or this is a bug that should be fixed in grsec/PaX.

as spender explained it, it's a bug. what happened was that 4.9 introduced
get_stack_info that i then made use of in pax_check_alloca (instead of keeping
my own stack classifier). this old code of mine had to lock the cpu which is
why i had get/put_cpu calls in there and i managed to forget about one of the
latter.

> In the case of the latter, I believe this introduces a security
> vulnerability, since it opens up a whole host of interesting race
> conditions that can be exploited.

now this is the interesting part, isn't it ;). first, the conditions needed to
trigger the bug are that you need an amd64 config with STACKLEAK and PREEMPT
enabled. second, you need functions with big enough stack frames (including
controllable dynamically sized local variables) that get instrumented by STACKLEAK
and executed in IRQ context (on a IRQ stack). once you have these conditions
under control, you can overdecrement the preempt count on a given CPU each time
such a function is executed. note that if this happens under __do_softirq then
the overdecrement will be fixed up (and get logged), so you probably need another
path for abuse. what i don't know (and have no time to figure out) is how many
of these functions there are and how you get something useful out of abusing them.

thanks & cheers,
 PaX Team

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2017-02-27 23:34 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-26 19:47 kernel warning with 0.0.20170223: entered softirq 3 NET_RX net_rx_action+0x0/0x760 with preempt_count 00000101, exited with 00000100? Timothée Ravier
     [not found] ` <CAHmME9pH5xbVR04b9JLqFfho=i_K-jod6N8tJt0ggDXPfqQ_LA@mail.gmail.com>
     [not found]   ` <a80303ca-e841-77f1-5ea6-6833f69b6059@gmail.com>
     [not found]     ` <CAApVa_kyD=KZYN3ABZU5yZGExNhC-34ryF+JZAPZ0JAKgmdJLw@mail.gmail.com>
2017-02-27  3:22       ` Jason A. Donenfeld
2017-02-27 11:53         ` Brad Spengler
2017-02-27 15:51           ` Jason A. Donenfeld
2017-02-27 17:33             ` Timothée Ravier
2017-02-27 23:36         ` kernel warning with 0.0.20170223: entered softirq 3 NET_RX net_rx_action+0x0/0x760 with preempt_count 00000101, exited with PaX Team

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.