CVE-2021-3609: Race condition in net/can/bcm.c leads to local privilege escalation

From: Norbert Slusarek @ 2021-06-19 14:45 UTC (permalink / raw)
  To: linux-can; +Cc: netdev

[-- Attachment #1: Type: text/plain, Size: 2201 bytes --]

Hello,

this is an announcement for the recently reported bug (CVE-2021-3609)
in the CAN BCM networking protocol in the Linux kernel ranging from
version 2.6.25 to mainline 5.13-rc6.
The vulnerability is a race condition in net/can/bcm.c allowing for local
privilege escalation to root. The issue was initially reported by syzbot and
proven to be exploitable by Norbert Slusarek.

The CAN BCM networking protocol allows registering a CAN message receiver for a
specified socket. The function bcm_rx_handler() is run for incoming CAN messages.
While this function is running, the socket can be closed, which calls
bcm_release(). Inside bcm_release(), struct bcm_op and struct bcm_sock are freed
while bcm_rx_handler() is still running, leading to multiple use-after-frees.

Reproduction
------------

- set up an unprivileged user namespace
- set up a vcan network interface
- open two CAN BCM sockets and connect each to the interface
- call sendmsg() on socket 1 with RX_SETUP to set up a CAN receiver
- call sendmsg() on socket 2 to send a message to socket 1

Here comes the race condition:

- bcm_rx_handler() is run automatically for socket 1 to receive the message
- call close() -> bcm_release() on socket 1 to free struct bcm_op and struct bcm_sock

=> bcm_rx_handler() is still running and will access struct bcm_op and struct
   bcm_sock which were previously freed

Exploitation
------------

My exploitation attempt concentrates on kernels with version >= 5.4-rc1, that
is, kernels containing commit bf74aa86e111 ("can: bcm: switch timer to
HRTIMER_MODE_SOFT and remove hrtimer_tasklet"). I didn't investigate exploiting
kernels older than 5.4-rc1, which used tasklets; nevertheless, exploitation on
older kernels looks feasible as well. My specific exploitation approach was
adjusted to work with Ubuntu 20.04.2 LTS, but other common distributions could
also be targeted.

More exploitation details can be found at

https://github.com/nrb547/kernel-exploitation/blob/main/cve-2021-3609/cve-2021-3609.md

or in the attachments (plain text and attached image).

The original posting to oss-security can be found at

https://www.openwall.com/lists/oss-security/2021/06/19/1

Regards,
Norbert Slusarek

[-- Attachment #2: cve-2021-3609-exploitation --]
[-- Type: application/octet-stream, Size: 13809 bytes --]

CVE-2021-3609: CAN BCM local privilege escalation
--------------------------------------------------

This article is about a vulnerability in the Linux kernel labeled CVE-2021-3609. The issue
was initially reported by syzbot. The vulnerable part of the kernel was the CAN BCM networking
protocol in the CAN networking subsystem ranging from kernel version 2.6.25 to 5.13-rc6.
In the following, I am going to cover the vulnerability and my exploitation approach for
kernel version >= 5.4 which led to successful local privilege escalation to root.

Vulnerability
-------------

The vulnerability is a race condition which lets us free struct bcm_op and struct bcm_sock
in bcm_release() while still being used in bcm_rx_handler().

struct bcm_op is a structure which can be allocated by sending a message on a CAN BCM socket with
the opcode RX_SETUP. It is used to set up either specific transmission or reception of CAN messages.
In this particular case, we allocate an operation in bcm_rx_setup() to receive messages.

static int bcm_rx_setup(struct bcm_msg_head *msg_head, struct msghdr *msg,
                        int ifindex, struct sock *sk)
{
	...

        /* check the given can_id */
        op = bcm_find_op(&bo->rx_ops, msg_head, ifindex);
        if (op) {
                /* update existing BCM operation */

		... update struct members of op ...

                /* Only an update -> do not call can_rx_register() */
                do_rx_register = 0;

        } else {
                /* insert new BCM operation for the given can_id */
                op = kzalloc(OPSIZ, GFP_KERNEL);

		... initialization of op ...

		do_rx_register = 1;						[1]

	}

	...

        /* now we can register for can_ids, if we added a new bcm_op */
        if (do_rx_register) {
                if (ifindex) {
                        struct net_device *dev;

                        dev = dev_get_by_index(sock_net(sk), ifindex);
                        if (dev) {
                                err = can_rx_register(sock_net(sk), dev,	[2]
                                                      op->can_id,
                                                      REGMASK(op->can_id),
                                                      bcm_rx_handler, op,
                                                      "bcm", sk);

                                op->rx_reg_dev = dev;
                                dev_put(dev);
                        }
	...
}

The excerpt above makes it clear that we have to allocate a new struct bcm_op [1] in
order to register a new CAN receiver. At [2], we register the receiver on the user-controlled
network interface specified by ifindex. Notice that bcm_rx_handler is passed as an argument, which
means that this function will be called when a message is received.
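
For reference, the user-space side of an RX_SETUP request is a struct bcm_msg_head sent on the
BCM socket. The following is a minimal sketch, assuming the UAPI header <linux/can/bcm.h> is
available. The helper name fill_rx_setup() and the CAN ID used with it are made up for
illustration and are not taken from the exploit.

```c
#include <string.h>
#include <linux/can.h>
#include <linux/can/bcm.h>

/*
 * Hypothetical helper (not from the exploit): prepare the header of an
 * RX_SETUP request that subscribes to a single CAN ID. With nframes == 0
 * no content filter frames follow the header, so RX_FILTER_ID is set to
 * filter on can_id only.
 */
static void fill_rx_setup(struct bcm_msg_head *head, canid_t can_id)
{
	memset(head, 0, sizeof(*head));
	head->opcode  = RX_SETUP;
	head->flags   = RX_FILTER_ID;
	head->can_id  = can_id;
	head->nframes = 0;
}
```

The filled header would then be passed to sendmsg() on a socket(PF_CAN, SOCK_DGRAM, CAN_BCM)
socket that has been connect()ed to the CAN interface.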

Now we have to send a CAN message from another CAN BCM socket which will be broadcast to all
sockets on this network interface. In total, we have one socket for reception (this is the one we are
going to exploit) and another one for transmission. Because we registered the first socket with
RX_SETUP, we can receive the incoming message.
Interestingly enough, TX_SETUP for our sending socket is not required as we already specify the
network interface in connect().

At this point, we have a message incoming so bcm_rx_handler() is called. At the same time, we
close the socket and bcm_release() is run in parallel to our receive handler.

static int bcm_release(struct socket *sock)
{
	...

        /* remove bcm_ops, timer, rx_unregister(), etc. */

        unregister_netdevice_notifier(&bo->notifier);

        lock_sock(sk);							[1]

        list_for_each_entry_safe(op, next, &bo->tx_ops, list)
                bcm_remove_op(op);

        list_for_each_entry_safe(op, next, &bo->rx_ops, list) {
                /*
                 * Don't care if we're bound or not (due to netdev problems)
                 * can_rx_unregister() is always a save thing to do here.
                 */
                if (op->ifindex) {
                        /*
                         * Only remove subscriptions that had not
                         * been removed due to NETDEV_UNREGISTER
                         * in bcm_notifier()
                         */
                        if (op->rx_reg_dev) {
                                struct net_device *dev;

                                dev = dev_get_by_index(net, op->ifindex);
                                if (dev) {
                                        bcm_rx_unreg(dev, op);
                                        dev_put(dev);
                                }
                        }
                }
	...

                bcm_remove_op(op);					[2]
        }

	...

        sock_orphan(sk);
        sock->sk = NULL;

        release_sock(sk);
        sock_put(sk);							[3]

	return 0;
}

In bcm_release(), we take the socket lock [1]. One might ask why there is a race condition if
we take a lock before accessing the socket. The reason is that bcm_rx_handler() takes no
corresponding lock, so nothing forces bcm_release() to wait for bcm_rx_handler() to finish its work.
The patch for this bug does not add a lock to bcm_rx_handler(), however.
Instead, it relies on the RCU read lock which the CAN receiver code already holds when it calls
bcm_rx_handler(). The patch adds a call to synchronize_rcu() right before [2]
in order to wait for all RCU read-side sections to finish before completely closing the socket.
I won't go into detail about how RCU works, but I'm leaving you a link at the bottom of this article.

Because there was no such synchronization prior to the patch, we simply free struct bcm_op at [2] and
drop the reference to the socket at [3]. struct bcm_sock is then freed as well because its
refcount reaches 0.
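
The ordering that the fix enforces (unregister, wait for running handlers, then free) can be
illustrated with a small user-space analogy. This is only a sketch under loud assumptions: it
is not RCU and not kernel code, the counter active_readers merely stands in for RCU read-side
sections, and all names in it are invented for this example.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

struct op { int can_id; };

/* Pointer the "receive path" looks up; stands in for the registered receiver. */
static _Atomic(struct op *) registered_op;
/* Number of handlers currently running; stands in for RCU read-side sections. */
static atomic_int active_readers;

static void op_register(struct op *op)
{
	atomic_store(&registered_op, op);
}

/* Receive path: enter the read-side section first, then look up the op. */
static struct op *handler_begin(void)
{
	atomic_fetch_add(&active_readers, 1);
	return atomic_load(&registered_op);
}

static void handler_end(void)
{
	atomic_fetch_sub(&active_readers, 1);
}

/*
 * Release path: unregister first, then free only if no handler is still
 * running. Returns false when the caller has to wait and retry; the real
 * synchronize_rcu() blocks instead of returning.
 */
static bool release_try(struct op *op)
{
	atomic_store(&registered_op, NULL);
	if (atomic_load(&active_readers) != 0)
		return false;
	free(op);
	return true;
}
```

Without the check on active_readers, release_try() would free the object while a handler
still holds a pointer to it, which is the situation in the unpatched bcm_release().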

Exploitation
------------

So now we are still in bcm_rx_handler(), but how do we exploit this? After many trials, I
found it particularly hard to exploit any of the use-after-frees within bcm_rx_handler(). This is
because bcm_rx_handler() executes quickly, which makes it tricky to overwrite struct bcm_op with
a heap spray. In contrast to my previous CAN ISOTP exploit, there seems to be no good
opportunity to halt execution within bcm_rx_handler() and make the race more reliable.
Instead, I focus on another approach which I will explain in the following.

This particular code in bcm_rx_setup() turned out to be useful:

if (op->flags & SETTIMER) {

	/* set timer value */
	op->ival1 = msg_head->ival1;
	op->ival2 = msg_head->ival2;
	op->kt_ival1 = bcm_timeval_to_ktime(msg_head->ival1);
	op->kt_ival2 = bcm_timeval_to_ktime(msg_head->ival2);
	...
}

When we allocate a new struct bcm_op, we can specify the flag SETTIMER and set up a timer. If the
timer is started, bcm_rx_timeout_handler() will be called once the user-controlled interval
op->kt_ival1 has elapsed.

At the end of bcm_rx_handler(), we have a call to bcm_rx_starttimer() which will start this timer.

/*
 * bcm_rx_starttimer - enable timeout monitoring for CAN frame reception
 */
static void bcm_rx_starttimer(struct bcm_op *op)
{
        if (op->flags & RX_NO_AUTOTIMER)
                return;

        if (op->kt_ival1)							[1]
                hrtimer_start(&op->timer, op->kt_ival1, HRTIMER_MODE_REL_SOFT);
}

If we set a timer in bcm_rx_setup(), it will be started and run for op->kt_ival1 which is
controlled by the user.
In my case, I have set the timer to expire after one second, so bcm_rx_timeout_handler() will be
called one second after hrtimer_start() in bcm_rx_starttimer().
This allowed me to have a sufficient time frame of one second in which I can perform a
reliable heap spray.
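
To make the timer condition concrete, here is a small user-space model of the two pieces
involved: converting the user-supplied interval to nanoseconds and the check performed in
bcm_rx_starttimer(). It is a sketch that assumes <linux/can/bcm.h> is available; the function
names are invented, and the conversion only approximates the kernel's bcm_timeval_to_ktime().

```c
#include <stdint.h>
#include <linux/can/bcm.h>

/* Approximation of bcm_timeval_to_ktime(): seconds and microseconds to ns. */
static int64_t bcm_timeval_to_ns(struct bcm_timeval tv)
{
	return (int64_t)tv.tv_sec * 1000000000LL + (int64_t)tv.tv_usec * 1000LL;
}

/*
 * Model of the decision in bcm_rx_starttimer(): the timer is armed only if
 * RX_NO_AUTOTIMER is not set and the interval is non-zero.
 */
static int rx_timer_would_start(uint32_t flags, int64_t kt_ival1_ns)
{
	if (flags & RX_NO_AUTOTIMER)
		return 0;
	return kt_ival1_ns != 0;
}
```

With an interval of { .tv_sec = 1, .tv_usec = 0 } and only SETTIMER set, the model reports
that the timer would be armed with a one-second timeout.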

For the heap spray, I use the already known technique with setxattr() and userfaultfd() which was
described well by Vitaly Nikolenko. You can find a link to his article at the bottom.

I didn't want to heap spray struct bcm_op because it is heavily used in bcm_rx_handler() where a
reliable heap spray is hard. Instead, I rely on the freed struct bcm_op not being overwritten
while bcm_rx_handler() runs, up to the point where the timer is started in bcm_rx_starttimer().
This works reasonably well because bcm_rx_handler() runs fast, so there is not much time in which
the freed struct bcm_op could be overwritten.

Back to bcm_rx_timeout_handler(), struct bcm_sock has a few function pointers which I could
overwrite with my heap spray. I decided to use the sk_data_ready() pointer which is called in the
following call path:

bcm_rx_timeout_handler() -> bcm_send_to_user() -> sock_queue_rcv_skb() -> __sock_queue_rcv_skb() ->
sk->sk_data_ready(sk)

At this point, the overwritten sk->sk_data_ready pointer is called and we gain arbitrary code
execution in kernel context. Because the function is called with the parameter sk (struct sock *),
the address of our heap sprayed socket will be stored in the RDI register.
This allows me to perform a stack pivot to the beginning of the socket structure and start
executing ROP gadgets.

struct bcm_sock is 872 bytes in size on my system, which means that it is allocated in the generic
kmalloc-1024 slab cache. Because struct bcm_sock does not fill all 1024 bytes, I have 152 unused
bytes (1024 - 872) which I can use to construct a ROP chain.
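
The size arithmetic can be checked with a few lines of C. This is a simplified model that
rounds up to the next power of two; the function names are invented for this sketch, and the
872-byte figure is specific to the system described here.

```c
#include <stddef.h>

/*
 * Simplified model of the generic kmalloc size classes: round up to the
 * next power of two, starting at 8 bytes. Real kernels also have 96- and
 * 192-byte caches, which are ignored here.
 */
static size_t kmalloc_bucket(size_t size)
{
	size_t bucket = 8;

	while (bucket < size)
		bucket <<= 1;
	return bucket;
}

/* Bytes left unused at the end of the slab object. */
static size_t slab_slack(size_t size)
{
	return kmalloc_bucket(size) - size;
}
```

For a size of 872 the model yields the 1024-byte bucket and 152 spare bytes, which is room
for 19 eight-byte values.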

[The attached image is useful here]

The extended ROP chain will overwrite a kernel address where modprobe_path is stored. I've already
used this technique in my CAN ISOTP exploit (article available on my github) and it's explained well
by lkmidas in his article. Check it out in the link at the bottom.

One problem I stumbled upon during exploitation was that I couldn't jump to do_task_dead() to
halt my hijacked kernel thread. Shortly after, I noticed what the issue was:
bcm_rx_timeout_handler() is executed by the task swapper with PID 0.
I obviously can't kill the task with PID 0, so I had to figure out another way to keep the system
stable after executing the ROP chain. Looking at the registers in the kernel panic logs, I noticed
that the register RBP stored an address close to the original RSP.
Recall that I had to change RSP by performing a stack pivot, abandoning the actual kernel stack for
my own malicious one.
The RBP register wasn't touched during execution of the ROP gadgets, so I could use it to move back
to the old kernel stack. Even if RBP were to change during the ROP execution, I could save the
contents of RBP to another register and restore the kernel stack from this register instead.

So after executing ROP gadgets, I can basically reverse the stack pivot by moving RBP into RSP,
then I pop one element off the stack and return back to __sock_queue_rcv_skb(). I also set RAX to 0
for a clean return without errors.

*rop++ = 0xffffffff81087bc3 + kaslr_offset; /* xor rax, rax ; ret */ /* return value */
*rop++ = 0xffffffff81087b0c + kaslr_offset; /* mov rsp, rbp ; pop rbp ; ret */

Finally, all that is left is to execute /tmp/dummy, which in turn runs /tmp/x with root privileges,
and the unprivileged user is added to /etc/sudoers without a password.
Local privilege escalation is done.

Getting the KASLR offset
------------------------

If we run on a system with KASLR enabled, we need to know the KASLR offset in order to return
to valid kernel addresses in the ROP chain. On Ubuntu 20.04.2 LTS, I was able to retrieve a kernel
text address from a warning in dmesg.
If the target machine is 32-bit and KASLR is enabled, you could try CVE-2021-34693 which is an
infoleak of 4 bytes in struct bcm_msg_head. You can find a link to the PoC at the bottom.
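
The offset itself is a plain subtraction: the address observed at run time minus the address
the same symbol has in the unrandomized kernel image. The sketch below uses invented function
names, and any concrete addresses fed into it are placeholders rather than values from a
real system.

```c
#include <stdint.h>

/*
 * KASLR offset = address observed at run time minus the address of the
 * same symbol in the unrandomized kernel image (for example taken from
 * the distribution's System.map).
 */
static uint64_t compute_kaslr_offset(uint64_t leaked_addr, uint64_t static_addr)
{
	return leaked_addr - static_addr;
}

/* Relocate any other static kernel text address with the same offset. */
static uint64_t relocate(uint64_t static_addr, uint64_t kaslr_offset)
{
	return static_addr + kaslr_offset;
}
```

Every static address used later is then adjusted by adding the same offset, as in the
"+ kaslr_offset" terms shown earlier.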

Combining everything together
-----------------------------

At this point, I have covered all the steps, which now have to be combined. The following sequence
is used in my exploit:

- retrieve kernel text address for KASLR offset in dmesg
	- on 32-bit systems CVE-2021-34693 can be used
- set up user namespace
- set up vcan network interface
- open two CAN BCM sockets and connect each to the interface
- call sendmsg() on socket 1 with RX_SETUP, flag SETTIMER and time interval of one second
  to allocate struct bcm_op
- call sendmsg() on socket 2 to send message to socket 1

At the same time:
- bcm_rx_handler() is run in a softirq
	- bcm_rx_starttimer() starts the timer
- close socket 1 -> bcm_release() -> free struct bcm_op and struct bcm_sock

- heap spray struct bcm_sock with the malicious buffer
- bcm_rx_timeout_handler() is run after 1 second due to bcm_rx_starttimer()
- overwritten sk->sk_data_ready(sk) is executed, jump to beginning of struct bcm_sock
- within struct bcm_sock, move to the end of struct bcm_sock and start executing the extended ROP
  chain
- overwrite modprobe_path and return back to __sock_queue_rcv_skb()

- run /tmp/dummy so /tmp/x will be run by root -> unprivileged user is added to /etc/sudoers without
  password

Notice
------

Investigating a syzbot report to find its root cause and prove exploitability was a great
opportunity which taught me a couple of useful tricks. If you have any questions, send me an
e-mail (nslusarek@gmx.net).

Also, I'm currently looking for job and internship opportunities in infosec in Germany/Europe.
In case you are interested, please reach out to me via e-mail.

References
----------

https://www.kernel.org/doc/Documentation/RCU/whatisRCU.txt
https://duasynt.com/blog/linux-kernel-heap-spray
https://lkmidas.github.io/posts/20210223-linux-kernel-pwn-modprobe/
https://github.com/nrb547/kernel-exploitation/tree/main/cve-2021-34693

[-- Attachment #3: cve-2021-3609-spray.png --]
[-- Type: image/png, Size: 31596 bytes --]
