From: Jens Axboe <axboe@kernel.dk>
To: Jann Horn <jannh@google.com>
Cc: linux-block@vger.kernel.org,
	"David S. Miller" <davem@davemloft.net>,
	Network Development <netdev@vger.kernel.org>
Subject: Re: [PATCH 1/3] io_uring: add support for async work inheriting files table
Date: Fri, 18 Oct 2019 10:36:26 -0600
Message-ID: <20b44cc0-87b1-7bf8-d20e-f6131da9d130@kernel.dk>
In-Reply-To: <CAG48ez12pteHyZasU8Smup-0Mn3BWNMCVjybd1jvXsPrJ7OmYg@mail.gmail.com>

On 10/18/19 10:20 AM, Jann Horn wrote:
> On Fri, Oct 18, 2019 at 5:55 PM Jens Axboe <axboe@kernel.dk> wrote:
>> On 10/18/19 9:00 AM, Jens Axboe wrote:
>>> On 10/18/19 8:52 AM, Jann Horn wrote:
>>>> On Fri, Oct 18, 2019 at 4:43 PM Jens Axboe <axboe@kernel.dk> wrote:
>>>>>
>>>>> On 10/18/19 8:40 AM, Jann Horn wrote:
>>>>>> On Fri, Oct 18, 2019 at 4:37 PM Jens Axboe <axboe@kernel.dk> wrote:
>>>>>>>
>>>>>>> On 10/18/19 8:34 AM, Jann Horn wrote:
>>>>>>>> On Fri, Oct 18, 2019 at 4:01 PM Jens Axboe <axboe@kernel.dk> wrote:
>>>>>>>>> On 10/17/19 8:41 PM, Jann Horn wrote:
>>>>>>>>>> On Fri, Oct 18, 2019 at 4:01 AM Jens Axboe <axboe@kernel.dk> wrote:
>>>>>>>>>>> This is in preparation for adding opcodes that need to modify files
>>>>>>>>>>> in a process file table, either adding new ones or closing old ones.
>>>>>>>> [...]
>>>>>>>>> Updated patch1:
>>>>>>>>>
>>>>>>>>> http://git.kernel.dk/cgit/linux-block/commit/?h=for-5.5/io_uring-test&id=df6caac708dae8ee9a74c9016e479b02ad78d436
>>>>>>>>
>>>>>>>> I don't understand what you're doing with old_files in there. In the
>>>>>>>> "s->files && !old_files" branch, "current->files = s->files" happens
>>>>>>>> without holding task_lock(), but current->files and s->files are also
>>>>>>>> the same already at that point anyway. And what's the intent behind
>>>>>>>> assigning stuff to old_files inside the loop? Isn't that going to
>>>>>>>> cause the workqueue to keep a modified current->files beyond the
>>>>>>>> runtime of the work?
>>>>>>>
>>>>>>> I simply forgot to remove the old block, it should only have this one:
>>>>>>>
>>>>>>> if (s->files && s->files != cur_files) {
>>>>>>>             task_lock(current);
>>>>>>>             current->files = s->files;
>>>>>>>             task_unlock(current);
>>>>>>>             if (cur_files)
>>>>>>>                     put_files_struct(cur_files);
>>>>>>>             cur_files = s->files;
>>>>>>> }
>>>>>>
>>>>>> Don't you still need a put_files_struct() in the case where "s->files
>>>>>> == cur_files"?
>>>>>
>>>>> I want to hold on to the files for as long as I can, to avoid unnecessary
>>>>> shuffling of it. But I take it your worry here is that we'll be calling
>>>>> something that manipulates ->files? Nothing should do that, unless
>>>>> s->files is set. We didn't hide the workqueue ->files[] before this
>>>>> change either.
>>>>
>>>> No, my worry is that the refcount of the files_struct is left too
>>>> high. From what I can tell, the "do" loop in io_sq_wq_submit_work()
>>>> iterates over multiple instances of struct sqe_submit. If there are
>>>> two sqe_submit instances with the same ->files (each holding a
>>>> reference from the get_files_struct() in __io_queue_sqe()), then:
>>>>
>>>> When processing the first sqe_submit instance, current->files and
>>>> cur_files are set to $user_files.
>>>> When processing the second sqe_submit instance, nothing happens
>>>> (s->files == cur_files).
>>>> After the loop, at the end of the function, put_files_struct() is
>>>> called once on $user_files.
>>>>
>>>> So get_files_struct() has been called twice, but put_files_struct()
>>>> has only been called once. That leaves the refcount too high, and by
>>>> repeating this, an attacker can make the refcount wrap around and then
>>>> cause a use-after-free.
>>>
>>> Ah now I see what you are getting at, yes that's clearly a bug! I wonder
>>> how we best safely can batch the drops. We can track the number of times
>>> we've used the same files, and do atomic_sub_and_test() in a
>>> put_files_struct_many() type addition. But that would leave us open to
>>> the issue you describe, where someone could maliciously overflow the
>>> files ref count.
>>>
>>> Probably not worth over-optimizing, as long as we can avoid the
>>> current->files task lock/unlock and shuffle.
>>>
>>> I'll update the patch.
>>
>> Alright, this incremental on top should do it. And full updated patch
>> here:
>>
>> http://git.kernel.dk/cgit/linux-block/commit/?h=for-5.5/io_uring-test&id=40449c5a3d3b16796fa13e9469c69d62986e961c
>>
>> Let me know what you think.
> 
> Ignoring the locking elision, basically the logic is now this:
> 
> static void io_sq_wq_submit_work(struct work_struct *work)
> {
>          struct io_kiocb *req = container_of(work, struct io_kiocb, work);
>          struct files_struct *cur_files = NULL, *old_files;
>          [...]
>          old_files = current->files;
>          [...]
>          do {
>                  struct sqe_submit *s = &req->submit;
>                  [...]
>                  if (cur_files)
>                          /* drop cur_files reference; borrow lifetime must
>                           * end before here */
>                          put_files_struct(cur_files);
>                  /* move reference ownership to cur_files */
>                  cur_files = s->files;
>                  if (cur_files) {
>                          task_lock(current);
>                          /* current->files borrows reference from cur_files;
>                           * existing borrow from previous loop ends here */
>                          current->files = cur_files;
>                          task_unlock(current);
>                  }
> 
>                  [call __io_submit_sqe()]
>                  [...]
>          } while (req);
>          [...]
>          /* existing borrow ends here */
>          task_lock(current);
>          current->files = old_files;
>          task_unlock(current);
>          if (cur_files)
>                  /* drop cur_files reference; borrow lifetime must
>                   * end before here */
>                  put_files_struct(cur_files);
> }
> 
> If you run two iterations of this loop, with a first element that has
> a ->files pointer and a second element that doesn't, then in the
> second run through the loop, the reference to the files_struct will be
> dropped while current->files still points to it; current->files is
> only reset after the loop has ended. If someone accesses
> current->files through procfs directly after that, AFAICS you'd get a
> use-after-free.

Amazing how this is still broken. You are right, and it's especially
annoying since that's exactly the case I originally talked about (not
flipping current->files if we don't have to). I just did it wrong, so
we'll leave a dangling pointer in ->files.
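
To make the failure concrete, here is a minimal user-space sketch of the loop as summarized in the quoted review. It is only a model, not kernel code: toy_files, toy_put() and toy_buggy_loop() are hypothetical names, the "freed" flag stands in for the real free, and locking is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for files_struct: a refcount plus a "freed" marker. */
struct toy_files {
    int refs;
    bool freed;
};

/* Models put_files_struct(): drop one reference, "free" on zero. */
static void toy_put(struct toy_files *f)
{
    assert(f->refs > 0);
    if (--f->refs == 0)
        f->freed = true;    /* real code would free the object here */
}

/*
 * Replays the loop from the review. items[i] is the ->files of the i-th
 * submission (NULL if it has none); each non-NULL entry owns one reference.
 * Returns true if current_files ever pointed at a freed object.
 */
static bool toy_buggy_loop(struct toy_files **items, size_t n)
{
    struct toy_files *old_files = NULL;      /* worker's original table */
    struct toy_files *current_files = old_files;
    struct toy_files *cur_files = NULL;
    bool dangling = false;

    for (size_t i = 0; i < n; i++) {
        if (cur_files)
            toy_put(cur_files);              /* may drop the last reference */
        cur_files = items[i];
        if (cur_files)
            current_files = cur_files;       /* only updated when non-NULL */
        /* A procfs reader could dereference current_files at this point. */
        if (current_files && current_files->freed)
            dangling = true;
    }
    current_files = old_files;               /* reset happens only here */
    if (cur_files)
        toy_put(cur_files);
    return dangling;
}
```

Feeding this model a first item with a files pointer and a second item without one reports a dangling current_files, while a single item with files does not.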

By far the most common case is that if one sqe has a files table it needs
to attach, then other sqes that also have files will use the same table. So
I want to optimize for the case where we only flip current->files once when
we first see the files, and once when we're done with the loop.
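
One possible shape for that, as a user-space sketch only: hold a single reference for as long as current_files borrows the table, drop the extra reference of every later item that carries the same table, and flip back once after the loop. This is an assumption about how it could look, not the updated patch; toy_files, toy_put(), toy_result and toy_batched_loop() are hypothetical names and locking is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for files_struct: a refcount plus a "freed" marker. */
struct toy_files {
    int refs;
    bool freed;
};

/* Models put_files_struct(): drop one reference, "free" on zero. */
static void toy_put(struct toy_files *f)
{
    assert(f->refs > 0);
    if (--f->refs == 0)
        f->freed = true;
}

struct toy_result {
    int flips;        /* number of writes to current_files */
    bool dangling;    /* current_files ever pointed at a freed object */
};

/* items[i] is the i-th ->files (NULL if none); each non-NULL owns one ref. */
static struct toy_result toy_batched_loop(struct toy_files **items, size_t n)
{
    struct toy_files *old_files = NULL;       /* worker's original table */
    struct toy_files *current_files = old_files;
    struct toy_files *cur_files = NULL;       /* reference backing the borrow */
    struct toy_result res = { 0, false };

    for (size_t i = 0; i < n; i++) {
        struct toy_files *f = items[i];

        if (f && f != cur_files) {
            current_files = f;                /* flip first ... */
            res.flips++;
            if (cur_files)
                toy_put(cur_files);           /* ... then drop the old ref */
            cur_files = f;                    /* take over the item's ref */
        } else if (f) {
            toy_put(f);                       /* same table: drop extra ref */
        }
        /* f == NULL: keep holding cur_files, so the borrow stays valid. */
        if (current_files && current_files->freed)
            res.dangling = true;
    }
    if (cur_files) {
        current_files = old_files;            /* end the borrow ... */
        res.flips++;
        toy_put(cur_files);                   /* ... before the final put */
    }
    return res;
}
```

In this model, a batch like {files, none, files, files} sharing one table flips current_files twice in total, every get is matched by a put, and the pointer never outlives its reference.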

Let me see if I can get this right...

-- 
Jens Axboe


