Re: [PATCH 6/6] fuse: Verify userspace asks to requeue interrupt that we really sent

From: Kirill Tkhai <ktkhai@virtuozzo.com>
To: Miklos Szeredi <miklos@szeredi.hu>
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 6/6] fuse: Verify userspace asks to requeue interrupt that we really sent
Date: Wed, 7 Nov 2018 17:25:52 +0300	[thread overview]
Message-ID: <6f27b5a5-0092-b23f-b28e-341ae093a241@virtuozzo.com> (raw)
In-Reply-To: <CAJfpeguDTsG7vEAhH=CHp43vJak70VzR8YH8K6=vZAUXCZZeEQ@mail.gmail.com>

On 07.11.2018 16:55, Miklos Szeredi wrote:
> On Tue, Nov 6, 2018 at 10:31 AM, Kirill Tkhai <ktkhai@virtuozzo.com> wrote:
>> When queue_interrupt() is called from fuse_dev_do_write(),
>> it came from userspace directly. Userspace may pass any
>> request id, even the request's we have not interrupted
>> (or even background's request). This patch adds sanity
>> check to make kernel safe against that.
> 
> Okay, I understand this far.
> 
>> The problem is real interrupt may be queued and requeued
>> by two tasks on two cpus. This case, the requeuer has not
>> guarantees it sees FR_INTERRUPTED bit on its cpu (since
>> we know nothing about the way userspace manages requests
>> between its threads and whether it uses smp barriers).
> 
> This sounds like BS. There's an explicit  smp_mb__after_atomic()
> after the set_bit(FR_INTERRUPTED,...).  Which means FR_INTERRUPTED is
> going to be visible on all CPU's after this, no need to fool around
> with setting FR_INTERRUPTED again, etc...

Hm, but how does it make the bit visible on all CPUS?

The problem is that smp_mb_xxx() barrier on a cpu has a sense
only in pair with the appropriate barrier on the second cpu.
There is no guarantee for visibility, if second cpu does not
have a barrier:

  CPU 1                  CPU2                        CPU3                       CPU4                        CPU5

  set FR_INTERRUPTED     set FR_SENT                                            
  <smp mb>               <smp mb>                          
  test FR_SENT (== 0)    test FR_INTERRUPTED (==1)
                         list_add[&req->intr_entry]  read[req by intr_entry]
                                                     <place to insert a barrier>
                                                     goto userspace
                                                     write in userspace buffer
                                                                                read from userspace buffer  
                                                                                write to userspace buffer
                                                                                                             read from userspace buffer
                                                                                                             enter kernel
                                                                                                             <place to insert a barrier>
                                                                                                             test FR_INTERRUPTED <- Not visible

The sequence:

set_bit(FR_INTERRUPTED, ...)
smp_mb__after_atomic();
test_bit(FR_SENT, &req->flags)

just guarantees the expected order on CPU2, which uses <smp mb>,
but CPU5 does not have any guarantees.

This 5 CPUs picture is a scheme, which illustrates the possible way userspace
may manage interrupts. Tags <place to insert a barrier> show places, where
we not have barriers yet, but where we may introduce them. But even in case
of we introduce them, there is no a way, that such the barriers help against CPU4.
So, this is the reason we have to set FR_INTERRUPTED bit again to make it visible
under the spinlock on CPU5.

Thanks,
Kirill

>>
>> To eliminate this problem, queuer writes FR_INTERRUPTED
>> bit again under fiq->waitq.lock, and this guarantees
>> requeuer will see the bit, when checks it.
>>
>> I try to introduce solution, which does not affect on
>> performance, and which does not force to take more
>> locks. This is the reason, the below solution is worse:
>>
>>    request_wait_answer()
>>    {
>>      ...
>>   +  spin_lock(&fiq->waitq.lock);
>>      set_bit(FR_INTERRUPTED, &req->flags);
>>   +  spin_unlock(&fiq->waitq.lock);
>>      ...
>>    }
>>
>> Also, it does not look a better idea to extend fuse_dev_do_read()
>> with the fix, since it's already a big function:
>>
>>    fuse_dev_do_read()
>>    {
>>      ...
>>      if (test_bit(FR_INTERRUPTED, &req->flags)) {
>>   +      /* Comment */
>>   +      barrier();
>>   +      set_bit(FR_INTERRUPTED, &req->flags);
>>          queue_interrupt(fiq, req);
>>      }
>>      ...
>>    }
>>
>> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
>> ---
>>  fs/fuse/dev.c |   26 +++++++++++++++++++++-----
>>  1 file changed, 21 insertions(+), 5 deletions(-)
>>
>> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
>> index 315d395d5c02..3bfc5ed61c9a 100644
>> --- a/fs/fuse/dev.c
>> +++ b/fs/fuse/dev.c
>> @@ -475,13 +475,27 @@ static void request_end(struct fuse_conn *fc, struct fuse_req *req)
>>         fuse_put_request(fc, req);
>>  }
>>
>> -static void queue_interrupt(struct fuse_iqueue *fiq, struct fuse_req *req)
>> +static int queue_interrupt(struct fuse_iqueue *fiq, struct fuse_req *req)
>>  {
>>         bool kill = false;
>>
>>         if (test_bit(FR_FINISHED, &req->flags))
>> -               return;
>> +               return 0;
>>         spin_lock(&fiq->waitq.lock);
>> +       /* Check for we've sent request to interrupt this req */
>> +       if (unlikely(!test_bit(FR_INTERRUPTED, &req->flags))) {
>> +               spin_unlock(&fiq->waitq.lock);
>> +               return -EINVAL;
>> +       }
>> +       /*
>> +        * Interrupt may be queued from fuse_dev_do_read(), and
>> +        * later requeued on other cpu by fuse_dev_do_write().
>> +        * To make FR_INTERRUPTED bit visible for the requeuer
>> +        * (under fiq->waitq.lock) we write it once again.
>> +        */
>> +       barrier();
>> +       __set_bit(FR_INTERRUPTED, &req->flags);
>> +
>>         if (list_empty(&req->intr_entry)) {
>>                 list_add_tail(&req->intr_entry, &fiq->interrupts);
>>                 /*
>> @@ -492,7 +506,7 @@ static void queue_interrupt(struct fuse_iqueue *fiq, struct fuse_req *req)
>>                 if (test_bit(FR_FINISHED, &req->flags)) {
>>                         list_del_init(&req->intr_entry);
>>                         spin_unlock(&fiq->waitq.lock);
>> -                       return;
>> +                       return 0;
>>                 }
>>                 wake_up_locked(&fiq->waitq);
>>                 kill = true;
>> @@ -500,6 +514,7 @@ static void queue_interrupt(struct fuse_iqueue *fiq, struct fuse_req *req)
>>         spin_unlock(&fiq->waitq.lock);
>>         if (kill)
>>                 kill_fasync(&fiq->fasync, SIGIO, POLL_IN);
>> +       return (int)kill;
>>  }
>>
>>  static void request_wait_answer(struct fuse_conn *fc, struct fuse_req *req)
>> @@ -1959,8 +1974,9 @@ static ssize_t fuse_dev_do_write(struct fuse_dev *fud,
>>                         nbytes = -EINVAL;
>>                 else if (oh.error == -ENOSYS)
>>                         fc->no_interrupt = 1;
>> -               else if (oh.error == -EAGAIN)
>> -                       queue_interrupt(&fc->iq, req);
>> +               else if (oh.error == -EAGAIN &&
>> +                        queue_interrupt(&fc->iq, req) < 0)
>> +                       nbytes = -EINVAL;
>>
>>                 fuse_put_request(fc, req);
>>                 fuse_copy_finish(cs);
>>