netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [Patch] mqueue: fix the retry logic for netlink_attachskb()
@ 2017-07-07 18:32 Cong Wang
  2017-07-08  0:23 ` Linus Torvalds
  0 siblings, 1 reply; 5+ messages in thread
From: Cong Wang @ 2017-07-07 18:32 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, geneblue.mail, Cong Wang, Andrew Morton,
	Linus Torvalds, Manfred Spraul

The retry logic for netlink_attachskb() inside sys_mq_notify()
is suspicious and vulnerable:

1) The sock refcnt is already released when retry is needed
2) The fd is controllable by user-space because we already
   release the file refcnt

so we when retry and the fd has been closed during this small
window, we end up calling netlink_detachskb() on the error path
which releases the sock again and could lead to a use-after-free.

We don't need to re-get the file pointer in the retry loop, keep
it until we finally succeed.

Reported-by: GeneBlue <geneblue.mail@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 ipc/mqueue.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index e8d41ff..b894d6c 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -1237,15 +1237,15 @@ SYSCALL_DEFINE2(mq_notify, mqd_t, mqdes,
 			/* TODO: add a header? */
 			skb_put(nc, NOTIFY_COOKIE_LEN);
 			/* and attach it to the socket */
-retry:
 			f = fdget(notification.sigev_signo);
 			if (!f.file) {
 				ret = -EBADF;
 				goto out;
 			}
+retry:
 			sock = netlink_getsockbyfilp(f.file);
-			fdput(f);
 			if (IS_ERR(sock)) {
+				fdput(f);
 				ret = PTR_ERR(sock);
 				sock = NULL;
 				goto out;
@@ -1255,6 +1255,7 @@ SYSCALL_DEFINE2(mq_notify, mqd_t, mqdes,
 			ret = netlink_attachskb(sock, nc, &timeo, NULL);
 			if (ret == 1)
 				goto retry;
+			fdput(f);
 			if (ret) {
 				sock = NULL;
 				nc = NULL;
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [Patch] mqueue: fix the retry logic for netlink_attachskb()
  2017-07-07 18:32 [Patch] mqueue: fix the retry logic for netlink_attachskb() Cong Wang
@ 2017-07-08  0:23 ` Linus Torvalds
  2017-07-08 18:04   ` Cong Wang
  0 siblings, 1 reply; 5+ messages in thread
From: Linus Torvalds @ 2017-07-08  0:23 UTC (permalink / raw)
  To: Cong Wang
  Cc: Network Development, Linux Kernel Mailing List, geneblue.mail,
	Andrew Morton, Manfred Spraul

On Fri, Jul 7, 2017 at 11:32 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> The retry logic for netlink_attachskb() inside sys_mq_notify()
> is suspicious and vulnerable:
>
> 1) The sock refcnt is already released when retry is needed
> 2) The fd is controllable by user-space because we already
>    release the file refcnt

Hmm. What's different the second (and third.. and..) time around from
the first time?

I don't dislike your patch (it looks fine), but  avoiding the
fdget/fdput in the retry loop doesn't seem to really change anything -
it's just as if we'd just react to the original thing a bit later.

> so we when retry and the fd has been closed during this small
> window, we end up calling netlink_detachskb() on the error path
> which releases the sock again and could lead to a use-after-free.

So this seems to be a real problem: "sock" is not NULL'ed out in that

                        if (!f.file) {

error case (or alternatively, in the retry case).  Plus, since we did
the "fput()" early, "sock" may be gone by the time we do the
netlink_attachskb() even when it's all successful.

But I don't think this is really so much about the retrying - the
"sock may be gone" case seems to be true even the first time around,
and even if we never retry at all.

Am I reading this correctly?

Basically, I think the patch is fine, but the explanation seems a bit
misleading. This isn't really about the re-trying: that would be fine
if we just cleaned up sock properly.

Can you confirm that? I don't know where the original report is.

And that code is ancient, so we should do a "cc: stable" there too,
and backport it basically forever. I think most of the code in this
area predates the git tree, although Al Viro actually touched some
things around here very recently to make the compat case cleaner.

                 Linus

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Patch] mqueue: fix the retry logic for netlink_attachskb()
  2017-07-08  0:23 ` Linus Torvalds
@ 2017-07-08 18:04   ` Cong Wang
  2017-07-08 18:55     ` Linus Torvalds
  0 siblings, 1 reply; 5+ messages in thread
From: Cong Wang @ 2017-07-08 18:04 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Network Development, Linux Kernel Mailing List, Gene Blue,
	Andrew Morton, Manfred Spraul

On Fri, Jul 7, 2017 at 5:23 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Fri, Jul 7, 2017 at 11:32 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>> so we when retry and the fd has been closed during this small
>> window, we end up calling netlink_detachskb() on the error path
>> which releases the sock again and could lead to a use-after-free.
>
> So this seems to be a real problem: "sock" is not NULL'ed out in that
>
>                         if (!f.file) {
>
> error case (or alternatively, in the retry case).  Plus, since we did
> the "fput()" early, "sock" may be gone by the time we do the
> netlink_attachskb() even when it's all successful.
>
> But I don't think this is really so much about the retrying - the
> "sock may be gone" case seems to be true even the first time around,
> and even if we never retry at all.
>
> Am I reading this correctly?


Yes you are correct.

>
> Basically, I think the patch is fine, but the explanation seems a bit
> misleading. This isn't really about the re-trying: that would be fine
> if we just cleaned up sock properly.
>
> Can you confirm that? I don't know where the original report is.

Yes of course, setting 'sock' to NULL before 'goto retry' is sufficient
to fix it, that is in fact my initial thought. And I realized retry'ing
fdget() can't help anything in this situation but increases the
attack vector, so I decided to get rid of it from the retry loop
instead of just NULL'ing 'sock'.

Or do you prefer the simpler fix? Or should I just resend it with
a improved changelog?

BTW, the original report is here:
https://groups.google.com/forum/#!topic/syzkaller/QsmbsGoYPzA


>
> And that code is ancient, so we should do a "cc: stable" there too,
> and backport it basically forever. I think most of the code in this
> area predates the git tree, although Al Viro actually touched some
> things around here very recently to make the compat case cleaner.
>

Yeah, sorry about forgetting it.

Thanks!

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Patch] mqueue: fix the retry logic for netlink_attachskb()
  2017-07-08 18:04   ` Cong Wang
@ 2017-07-08 18:55     ` Linus Torvalds
  2017-07-09 19:57       ` Cong Wang
  0 siblings, 1 reply; 5+ messages in thread
From: Linus Torvalds @ 2017-07-08 18:55 UTC (permalink / raw)
  To: Cong Wang
  Cc: Network Development, Linux Kernel Mailing List, Gene Blue,
	Andrew Morton, Manfred Spraul

On Sat, Jul 8, 2017 at 11:04 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>
>> Can you confirm that? I don't know where the original report is.
>
> Yes of course, setting 'sock' to NULL before 'goto retry' is sufficient
> to fix it, that is in fact my initial thought. And I realized retry'ing
> fdget() can't help anything in this situation but increases the
> attack vector, so I decided to get rid of it from the retry loop
> instead of just NULL'ing 'sock'.
>
> Or do you prefer the simpler fix? Or should I just resend it with
> a improved changelog?

It was just the combination of that nasty code, your patch, and the
explanation that confused me.

Reading the patch, I actually thought that one of the things you fixed
was moving the "fdput()" later, to after the netlink_attachskb().

And I thought you did that because netlink_attachskb() would need the
file to be still around keeping a reference count to the socket, and
without it the socket could have been dropped in the meantime.

But reading the code more closely, I notice that
netlink_getsockbyfilp() gets a reference to the sock, and it's that
netlink_attachskb() will drop that reference on error or retry.

So the fdput() makes sense after netlink_getsockbyfilp(), but that's
also why the retry code currently includes repeating the fdget()...
And the error handling for the fdget is that then triggers the real
bug.

So the reason you moved the fdput() later wasn't to protect the socket
reference, it was just because of how the whole retry loop needs to
have the file descriptor just to get a new reference to the socket.

That's why I thought you fixed a bug even in the first iteration, but
it turns out that was just me making assumptions based on mis-reading
the patch without looking at all the context and the logic of the
called functions.

Now that I have checked deeper, I realize that your patch description
was actually correct about this only being a retry problem - the first
time around the reference count ends up moving correctly from file to
socket, but then when it repeats and 'sock' may contain a stale
pointer, we may end up doing the wrong thing when the fdget fails.

Honestly, now I feel like either patch is fine, and your original
commit message is fine too - but I just hate that code.

And making it use some nice helper function to clean it up looks
painful too, because the error handling is so odd (ie
mq_netlink_attachskb() will free the skb on error, while the other
error cases won't, so you'd have to have some special handling for the
different errors that can happen).

Honestly, this code is nasty, and right now my feeling is that it
would be good to have a minimal patch that also backports cleanly.
Maybe somebody can clean it up later, but that's a separate windmill
to rail against.

And due to the recent compat cleanups by Al, your bigger patch does
not apply cleanly to current git - but the smaller patch to just
setting 'sock' to NULL before that 'goto retry' should apply cleanly
to all versions of this code.

So purely because of that reason, I think I'd prefer to see that
smaller patch instead. Would you mind re-sending the thing?

Sorry about the whole confusion.

               Linus

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Patch] mqueue: fix the retry logic for netlink_attachskb()
  2017-07-08 18:55     ` Linus Torvalds
@ 2017-07-09 19:57       ` Cong Wang
  0 siblings, 0 replies; 5+ messages in thread
From: Cong Wang @ 2017-07-09 19:57 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Network Development, Linux Kernel Mailing List, Gene Blue,
	Andrew Morton, Manfred Spraul

On Sat, Jul 8, 2017 at 11:55 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Sat, Jul 8, 2017 at 11:04 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>>
>>> Can you confirm that? I don't know where the original report is.
>>
>> Yes of course, setting 'sock' to NULL before 'goto retry' is sufficient
>> to fix it, that is in fact my initial thought. And I realized retry'ing
>> fdget() can't help anything in this situation but increases the
>> attack vector, so I decided to get rid of it from the retry loop
>> instead of just NULL'ing 'sock'.
>>
>> Or do you prefer the simpler fix? Or should I just resend it with
>> a improved changelog?
>
> It was just the combination of that nasty code, your patch, and the
> explanation that confused me.
>
> Reading the patch, I actually thought that one of the things you fixed
> was moving the "fdput()" later, to after the netlink_attachskb().
>
> And I thought you did that because netlink_attachskb() would need the
> file to be still around keeping a reference count to the socket, and
> without it the socket could have been dropped in the meantime.
>
> But reading the code more closely, I notice that
> netlink_getsockbyfilp() gets a reference to the sock, and it's that
> netlink_attachskb() will drop that reference on error or retry.
>
> So the fdput() makes sense after netlink_getsockbyfilp(), but that's
> also why the retry code currently includes repeating the fdget()...
> And the error handling for the fdget is that then triggers the real
> bug.
>
> So the reason you moved the fdput() later wasn't to protect the socket
> reference, it was just because of how the whole retry loop needs to
> have the file descriptor just to get a new reference to the socket.
>
> That's why I thought you fixed a bug even in the first iteration, but
> it turns out that was just me making assumptions based on mis-reading
> the patch without looking at all the context and the logic of the
> called functions.
>
> Now that I have checked deeper, I realize that your patch description
> was actually correct about this only being a retry problem - the first
> time around the reference count ends up moving correctly from file to
> socket, but then when it repeats and 'sock' may contain a stale
> pointer, we may end up doing the wrong thing when the fdget fails.
>
> Honestly, now I feel like either patch is fine, and your original
> commit message is fine too - but I just hate that code.
>
> And making it use some nice helper function to clean it up looks
> painful too, because the error handling is so odd (ie
> mq_netlink_attachskb() will free the skb on error, while the other
> error cases won't, so you'd have to have some special handling for the
> different errors that can happen).
>
> Honestly, this code is nasty, and right now my feeling is that it
> would be good to have a minimal patch that also backports cleanly.
> Maybe somebody can clean it up later, but that's a separate windmill
> to rail against.


Indeed, I spent a few hours to spot this bug, to be honest. ;) We can
always clean up things later.

>
> And due to the recent compat cleanups by Al, your bigger patch does
> not apply cleanly to current git - but the smaller patch to just
> setting 'sock' to NULL before that 'goto retry' should apply cleanly
> to all versions of this code.

Ah, understood, probably because I don't pull your branch so often
but now we are still in merge window...

>
> So purely because of that reason, I think I'd prefer to see that
> smaller patch instead. Would you mind re-sending the thing?

Sure, I will rebase and send the simpler fix.

>
> Sorry about the whole confusion.
>

No worries. Thanks for all the details!

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2017-07-09 19:57 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-07-07 18:32 [Patch] mqueue: fix the retry logic for netlink_attachskb() Cong Wang
2017-07-08  0:23 ` Linus Torvalds
2017-07-08 18:04   ` Cong Wang
2017-07-08 18:55     ` Linus Torvalds
2017-07-09 19:57       ` Cong Wang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).