* Should PV frontend drivers trust the backends?
@ 2018-04-25 12:42 Juergen Gross
  2018-04-25 13:36 ` Andrew Cooper
  2018-04-25 13:47 ` Paul Durrant
  0 siblings, 2 replies; 11+ messages in thread
From: Juergen Gross @ 2018-04-25 12:42 UTC
  To: xen-devel

This is a followup of a discussion on IRC:

The main question of the discussion was: "Should frontend drivers
trust their backends not to perform malicious actions?"

This IMO includes:

1. The data put by the backend on the ring page(s) is sane and
   consistent, meaning that e.g. the response producer index is always
   ahead of the consumer index.

2. Response data won't be modified by the backend after the producer
   index has been incremented signaling the response is valid.

3. Response data is sane, e.g. an I/O data length is not larger than
   the original buffer.

4. When a response has been sent all grants belonging to the request
   have been unmapped again by the backend, meaning that the frontend
   can assume the grants can be removed without conflict.
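A minimal sketch of what checks for items 1-3 could look like in a frontend's response loop. The ring layout, `struct demo_rsp`, `struct demo_front` and `demo_poll_responses()` are hypothetical simplifications, not the Xen io/ring.h macros, and a real driver would additionally need memory barriers:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified ring; not Xen's io/ring.h macros. */
#define RING_SIZE 32u
static_assert((RING_SIZE & (RING_SIZE - 1)) == 0, "RING_SIZE must be a power of two");

typedef uint32_t ring_idx_t;    /* free-running index, wraps at 2^32 */

struct demo_rsp {
    uint64_t id;                /* request id echoed back by the backend */
    uint32_t len;               /* bytes the backend claims to have written */
    int32_t status;
};

struct demo_shared_ring {
    volatile ring_idx_t rsp_prod;     /* written by the backend */
    struct demo_rsp ring[RING_SIZE];  /* backend may rewrite entries at any time */
};

/* Frontend-private state: not shared, so it can be trusted. */
struct demo_front {
    struct demo_shared_ring *sring;
    ring_idx_t rsp_cons;
    uint32_t buf_len[RING_SIZE];      /* size of the buffer posted for each id */
};

/* Consume up to max responses into out[]. Returns the number consumed,
 * or -1 if the backend violated the protocol (caller should disconnect). */
static int demo_poll_responses(struct demo_front *fe, struct demo_rsp *out, int max)
{
    /* Read the producer index exactly once. */
    ring_idx_t prod = fe->sring->rsp_prod;
    int n = 0;

    /* Item 1: unsigned arithmetic copes with wrap-around; more than
     * RING_SIZE unconsumed responses cannot be legitimate. */
    if ((ring_idx_t)(prod - fe->rsp_cons) > RING_SIZE)
        return -1;
    /* A real driver needs a read barrier here before reading entries. */

    while (fe->rsp_cons != prod && n < max) {
        struct demo_rsp rsp;

        /* Item 2: snapshot the entry; only the private copy is inspected. */
        memcpy(&rsp, &fe->sring->ring[fe->rsp_cons & (RING_SIZE - 1)], sizeof(rsp));

        /* Item 3: the id must be in range and len must fit its buffer. */
        if (rsp.id >= RING_SIZE || rsp.len > fe->buf_len[rsp.id])
            return -1;

        out[n++] = rsp;
        fe->rsp_cons++;
    }
    return n;
}
```

Returning an error instead of calling BUG() leaves the decision of how to react (disconnect, or crash deliberately) to the caller.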

Today most frontend drivers (at least in the Linux kernel) seem to
assume all of the above is true (there are some exceptions, but never
for all items):

- they don't check sanity of ring index values
- they don't copy response data into local memory before looking at it
- they don't verify returned data length (or do so via BUG_ON())
- they BUG() in case of a conflict when trying to remove a grant
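For the last point, a sketch of deferring a grant whose revocation fails instead of calling BUG(). `struct demo_grant`, `demo_end_access()` and `demo_retry_pending()` are hypothetical stand-ins, not the Linux grant-table API; the assumption is simply that the page behind a deferred grant is not reused until a retry succeeds:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a grant entry; not the Linux grant-table API. */
struct demo_grant {
    bool in_use;          /* grant handed to the backend, page not reusable */
    bool mapped_by_peer;  /* backend still has the page mapped */
    bool pending_free;    /* revoke failed, retry later */
};

/* Try to revoke a grant. Returns true if the page may be reused now. */
static bool demo_end_access(struct demo_grant *g)
{
    assert(g->in_use);
    if (g->mapped_by_peer) {
        /* Backend broke assumption 4: defer instead of BUG(). The page
         * stays allocated (leaked) until a later retry succeeds. */
        g->pending_free = true;
        return false;
    }
    g->in_use = false;
    g->pending_free = false;
    return true;
}

/* Retry deferred grants, e.g. from a timer in a real driver.
 * Returns how many were released. */
static size_t demo_retry_pending(struct demo_grant *tab, size_t n)
{
    size_t freed = 0;

    for (size_t i = 0; i < n; i++)
        if (tab[i].pending_free && demo_end_access(&tab[i]))
            freed++;
    return freed;
}
```

The cost of this approach is a temporary memory leak per misbehaving grant, traded against not crashing the guest.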

So the basic question is: should all Linux frontend drivers be modified
in order to be able to tolerate buggy or malicious backends? Or are the
trust assumptions listed above fine?

IMO, even if the frontends do trust the backends to behave sanely, this
doesn't mean driver domains don't make sense. Driver domains still make
a Xen host more robust, as they e.g. protect the host against driver
failures that would normally crash dom0.


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel


* Re: Should PV frontend drivers trust the backends?
  2018-04-25 12:42 Should PV frontend drivers trust the backends? Juergen Gross
@ 2018-04-25 13:36 ` Andrew Cooper
  2018-04-25 13:47 ` Paul Durrant
  1 sibling, 0 replies; 11+ messages in thread
From: Andrew Cooper @ 2018-04-25 13:36 UTC
  To: Juergen Gross, xen-devel

On 25/04/18 13:42, Juergen Gross wrote:
> This is a followup of a discussion on IRC:
>
> The main question of the discussion was: "Should frontend drivers
> trust their backends not doing malicious actions?"
>
> [...]

I think the issue here is that "trust" actually covers two different things.

If we consider "the ring" as an opaque transport layer, then I expect
both sides to be resilient to a buggy/malicious other end.  I realise
this is not currently the case, but I think it should be reasonable to
hook either side up to AFL and not have the other side crash. 
(Declaring the other half insane and transitioning to closed is an
entirely reasonable action.)

When it comes to the data content served over "the opaque ring", trust is a
far more complicated thing.

If blkback is serving the root filesystem (/), then in the default case the
frontend has little option but to trust the other end.  This is clearly how
the frontends have been developed.

However, non-default cases might include using an encrypted filesystem,
at which point the domU isn't in the position of having to trust the
driver domain serving its filesystem, and therefore shouldn't be forced
into trusting the backend simply because it doesn't protect itself
against pitfalls which inherently come from using shared memory interfaces.

~Andrew



* Re: Should PV frontend drivers trust the backends?
  2018-04-25 12:42 Should PV frontend drivers trust the backends? Juergen Gross
  2018-04-25 13:36 ` Andrew Cooper
@ 2018-04-25 13:47 ` Paul Durrant
  2018-04-26  5:59   ` Oleksandr Andrushchenko
  2018-04-26  8:46   ` Petr Tesarik
  1 sibling, 2 replies; 11+ messages in thread
From: Paul Durrant @ 2018-04-25 13:47 UTC
  To: 'Juergen Gross', xen-devel

> -----Original Message-----
> From: Xen-devel [mailto:xen-devel-bounces@lists.xenproject.org] On Behalf
> Of Juergen Gross
> Sent: 25 April 2018 13:43
> To: xen-devel <xen-devel@lists.xenproject.org>
> Subject: [Xen-devel] Should PV frontend drivers trust the backends?
> 
> This is a followup of a discussion on IRC:
> 
> The main question of the discussion was: "Should frontend drivers
> trust their backends not doing malicious actions?"
> 
> [...]

I see the general question as being analogous to 'should a Linux device driver trust its hardware?', and I think the answer for a general-purpose OS like Linux is 'yes'.

Now, having worked on fault-tolerant systems in a past life, there are definitely cases where you want your OS not to implicitly trust its peripheral hardware, and hence special device drivers are used. I think the same would apply to virtual machines in situations where a driver domain is not wholly controlled by a host administrator, or is not trusted to the same extent as dom0 for other reasons; i.e. they should have specialist frontends.

  Paul



* Re: Should PV frontend drivers trust the backends?
  2018-04-25 13:47 ` Paul Durrant
@ 2018-04-26  5:59   ` Oleksandr Andrushchenko
  2018-04-26  8:16     ` Paul Durrant
  2018-04-26  8:46   ` Petr Tesarik
  1 sibling, 1 reply; 11+ messages in thread
From: Oleksandr Andrushchenko @ 2018-04-26  5:59 UTC
  To: Paul Durrant, 'Juergen Gross', xen-devel

On 04/25/2018 04:47 PM, Paul Durrant wrote:
>> -----Original Message-----
>> From: Xen-devel [mailto:xen-devel-bounces@lists.xenproject.org] On Behalf
>> Of Juergen Gross
>> Sent: 25 April 2018 13:43
>> To: xen-devel <xen-devel@lists.xenproject.org>
>> Subject: [Xen-devel] Should PV frontend drivers trust the backends?
>>
>> This is a followup of a discussion on IRC:
>>
>> The main question of the discussion was: "Should frontend drivers
>> trust their backends not doing malicious actions?"
>>
>> [...]
> I see the general question as being analogous to 'should a Linux device driver trust its hardware' and I think the answer for a general purpose OS like linux is 'yes'.
>
> Now, having worked on fault tolerant systems in a past life, there are definitely cases where you want your OS not to implicitly trust its peripheral hardware and hence special device drivers are used.
So what do you do if counters provided by the untrusted HW are ok
and the payload is not?
> I think the same would apply for virtual machines in situations where a driver domain is not wholly controlled by a host administrator or is not trusted to the same extent as dom0 for other reasons; i.e. they should have specialist frontends.
I believe we might be able to express some common strategy for the
frontends. I do understand, though, that it all needs to be decided on a
case-by-case basis, but common things could still be there, e.g. what
should a frontend do if the prod/cons counters are not in sync:
  - should it keep trying to get in sync? This might be a bad idea, as the
    req/resp data may already have become inconsistent (net can probably
    survive this, but not block)
  - should it tear down the connection with the backend? This may result in
    whole-system instability, e.g. imagine tearing down a "/" block device
  - should it BUG_ON() and die?
To me the second option (tear down the connection) seems the most
reasonable: it can still render the guest unusable, but at least it gives
the guest a chance to recover in a proper way.

And, if my assumption is correct, we still do trust the contents of the
requests and responses, i.e. the payload is still trusted. This also calls
the approach into question: if we don't trust the backend's counters, why
do we trust the payload it sends? And there is no obvious way to check the
payload integrity.

So, either we trust the backend and accept the risks, or we need to
develop some complex approach to address the above.

Thank you,
Oleksandr


* Re: Should PV frontend drivers trust the backends?
  2018-04-26  5:59   ` Oleksandr Andrushchenko
@ 2018-04-26  8:16     ` Paul Durrant
  2018-04-26  8:47       ` Oleksandr Andrushchenko
  0 siblings, 1 reply; 11+ messages in thread
From: Paul Durrant @ 2018-04-26  8:16 UTC
  To: 'Oleksandr Andrushchenko', 'Juergen Gross', xen-devel

> -----Original Message-----
> From: Oleksandr Andrushchenko [mailto:andr2000@gmail.com]
> Sent: 26 April 2018 07:00
> To: Paul Durrant <Paul.Durrant@citrix.com>; 'Juergen Gross'
> <jgross@suse.com>; xen-devel <xen-devel@lists.xenproject.org>
> Subject: Re: [Xen-devel] Should PV frontend drivers trust the backends?
> 
> On 04/25/2018 04:47 PM, Paul Durrant wrote:
> >> -----Original Message-----
> >> From: Xen-devel [mailto:xen-devel-bounces@lists.xenproject.org] On
> Behalf
> >> Of Juergen Gross
> >> Sent: 25 April 2018 13:43
> >> To: xen-devel <xen-devel@lists.xenproject.org>
> >> Subject: [Xen-devel] Should PV frontend drivers trust the backends?
> >>
> >> This is a followup of a discussion on IRC:
> >>
> >> The main question of the discussion was: "Should frontend drivers
> >> trust their backends not doing malicious actions?"
> >>
> >> [...]
> > I see the general question as being analogous to 'should a Linux device
> driver trust its hardware' and I think the answer for a general purpose OS like
> linux is 'yes'.
> >
> > Now, having worked on fault tolerant systems in a past life, there are
> definitely cases where you want your OS not to implicitly trust its peripheral
> hardware and hence special device drivers are used.
> So what do you do if counters provided by the untrusted HW are ok
> and the payload is not?

Well, that depends on whether there is actually any way to verify the payload in a driver. Whatever layer in the system is responsible for the data needs to verify its integrity in a fault-tolerant system. Generally the driver can only attempt to verify that its hardware is working as expected and quiesce it if not. For that reason, in the systems I worked on, the driver had the ability to control FETs that disconnected peripheral h/w from the PCI bus.
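To illustrate the layer that owns the data verifying it, a sketch of an upper layer checking a stored CRC-32 on a private copy of a block. `struct demo_block` and `demo_read_verified()` are hypothetical; CRC-32 only catches accidental corruption, so against a malicious backend a keyed MAC or authenticated encryption would be needed instead:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320). */
static uint32_t demo_crc32(const uint8_t *p, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < n; i++) {
        crc ^= p[i];
        for (int b = 0; b < 8; b++)
            crc = (crc >> 1) ^ (0xEDB88320u & -(crc & 1u));
    }
    return ~crc;
}

/* Hypothetical on-disk block: payload plus a checksum written by a
 * party the guest trusts (e.g. the guest itself when it wrote the data). */
struct demo_block {
    uint8_t data[16];
    uint32_t crc;
};

/* Verify a block that arrived via the PV transport. Work on a private
 * copy so the backend cannot change the data between check and use. */
static int demo_read_verified(const struct demo_block *shared, uint8_t out[16])
{
    struct demo_block local;

    assert(shared != NULL && out != NULL);
    memcpy(&local, shared, sizeof(local));
    if (demo_crc32(local.data, sizeof(local.data)) != local.crc)
        return -1;    /* report an I/O error rather than crash */
    memcpy(out, local.data, sizeof(local.data));
    return 0;
}
```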

> > I think the same would apply for virtual machines in situations where a
> driver domain is not wholly controlled by a host administrator or is not
> trusted to the same extent as dom0 for other reasons; i.e. they should have
> specialist frontends.

> I believe we might be able to express some common strategy for the
> frontends.
> I do understand though that it all needs to be decided on case by case
> basis,
> but common things could still be there, e.g. if prod/cons counters are
> not in sync
> what a frontend needs to do:
>   - should it keep trying to get in sync - might be a bad idea as the
> req/resp data
>     may already become inconsistent (net can probably survive, but not
> block)
>   - should it tear down the connection with the backend - this may
> render in the whole
>     system instability, e.g. imagine you tear down a "/" block device
>   - should it BUG_ON and die
> To me the second option (tear down the connection) seems to be
> more reasonable, although it can still render the guest unusable, but at
> least it
> gives a chance for the guest to recover in a proper way
> 

Absolutely that can be done and it's certainly a good idea to be somewhat defensive but, as you say, it's quite likely that the PV pair is part of a critical subsystem for the guest and so a BUG() may well be the best option to make sure that the inevitable guest crash actually contains pertinent information.
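A sketch of how a frontend might choose between the two reactions discussed here, assuming a hypothetical per-device `critical` flag (e.g. set for the disk holding the root filesystem); the enum and function names are illustrative only:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reactions to a backend protocol violation. */
enum demo_action {
    DEMO_TEARDOWN,  /* close the connection; the guest may be able to recover */
    DEMO_CRASH      /* BUG(): the guest is doomed anyway, so crash with context */
};

/* Hypothetical per-device properties. */
struct demo_dev {
    const char *name;
    bool critical;  /* e.g. the block device holding the root filesystem */
};

static enum demo_action demo_on_violation(const struct demo_dev *d)
{
    assert(d != NULL);
    /* Losing a critical device takes the guest down anyway; crashing at
     * the point of detection keeps the evidence in the crash dump. */
    if (d->critical)
        return DEMO_CRASH;
    /* Otherwise prefer an orderly disconnect over killing the guest. */
    return DEMO_TEARDOWN;
}
```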

> And, if my assumption is correct, we still do trust the contents of the
> requests
> and responses, e.g. the payload is still trusted.

Why should the payload be any more trusted than the content of the shared ring? They are both shared with the backend and therefore can be corrupted to the same extent.

> This also questions
> the approach,
> e.g. if we don't trust backend's counters, then why do we trust the
> payload it sends?
> And there is no obvious way to check the payload integrity.

Quite, as I said above. 

> 
> So, either we trust the backend and accept the risks or we need to
> develop some
> complex approach to address the above.
> 

Indeed, hence my position that in the general case it's not a security issue for a frontend to trust its backend.

Cheers,

  Paul



* Re: Should PV frontend drivers trust the backends?
  2018-04-25 13:47 ` Paul Durrant
  2018-04-26  5:59   ` Oleksandr Andrushchenko
@ 2018-04-26  8:46   ` Petr Tesarik
  1 sibling, 0 replies; 11+ messages in thread
From: Petr Tesarik @ 2018-04-26  8:46 UTC
  To: xen-devel

On Wed, 25 Apr 2018 13:47:09 +0000
Paul Durrant <Paul.Durrant@citrix.com> wrote:

> > -----Original Message-----
> > From: Xen-devel [mailto:xen-devel-bounces@lists.xenproject.org] On
> > Behalf Of Juergen Gross
> > Sent: 25 April 2018 13:43
> > To: xen-devel <xen-devel@lists.xenproject.org>
> > Subject: [Xen-devel] Should PV frontend drivers trust the backends?
> > 
> > This is a followup of a discussion on IRC:
> > 
> > The main question of the discussion was: "Should frontend drivers
> > trust their backends not doing malicious actions?"
> > 
>[...]
> I see the general question as being analogous to 'should a Linux
> device driver trust its hardware' and I think the answer for a
> general purpose OS like linux is 'yes'.

I can see how this is analogous, but it's not identical. Traditionally,
hardware has full control of the system anyway, so it makes little
sense to distrust it. It does make sense to validate the data and retry
an invalid operation (if possible) or crash the system (if technically
impossible).
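A sketch of "validate the data and retry, then give up", where the issuing callback, the length bound and the retry count are hypothetical parameters, and `demo_flaky_issue()` is a fake device used only for demonstration:

```c
#include <assert.h>

/* Hypothetical: issue one operation and return the reported length. */
typedef int (*demo_issue_fn)(void *ctx);

/* Validate each result and retry a bounded number of times. Returns the
 * accepted length, or -1 so the caller can fail the I/O or crash. */
static int demo_issue_validated(demo_issue_fn issue, void *ctx,
                                int max_len, int max_tries)
{
    assert(max_tries > 0);
    for (int i = 0; i < max_tries; i++) {
        int len = issue(ctx);
        if (len >= 0 && len <= max_len)
            return len;    /* plausible result: accept it */
    }
    return -1;             /* persistently invalid */
}

/* Fake device: reports an out-of-range length for the first `bad` calls. */
struct demo_flaky { int bad; int calls; };

static int demo_flaky_issue(void *ctx)
{
    struct demo_flaky *f = ctx;

    f->calls++;
    return f->calls <= f->bad ? 1 << 20 : 512;
}
```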

However, a backend driver runs in a domain that is not much different
from the domain running the frontend driver, so it is theoretically
possible to implement some support in Xen itself. Now, if you're asking
whether Xen _should_ add complex handling of resilient domain-to-domain
communication, that's purely a matter of taste.

FWIW my vote is: Do nothing. Keep Xen architecture simple.

Petr T



* Re: Should PV frontend drivers trust the backends?
  2018-04-26  8:16     ` Paul Durrant
@ 2018-04-26  8:47       ` Oleksandr Andrushchenko
  2018-04-30 17:32         ` Marek Marczykowski-Górecki
  0 siblings, 1 reply; 11+ messages in thread
From: Oleksandr Andrushchenko @ 2018-04-26  8:47 UTC
  To: Paul Durrant, 'Juergen Gross', xen-devel

On 04/26/2018 11:16 AM, Paul Durrant wrote:
>> -----Original Message-----
>> From: Oleksandr Andrushchenko [mailto:andr2000@gmail.com]
>> Sent: 26 April 2018 07:00
>> To: Paul Durrant <Paul.Durrant@citrix.com>; 'Juergen Gross'
>> <jgross@suse.com>; xen-devel <xen-devel@lists.xenproject.org>
>> Subject: Re: [Xen-devel] Should PV frontend drivers trust the backends?
>>
>> On 04/25/2018 04:47 PM, Paul Durrant wrote:
>>>> -----Original Message-----
>>>> From: Xen-devel [mailto:xen-devel-bounces@lists.xenproject.org] On
>> Behalf
>>>> Of Juergen Gross
>>>> Sent: 25 April 2018 13:43
>>>> To: xen-devel <xen-devel@lists.xenproject.org>
>>>> Subject: [Xen-devel] Should PV frontend drivers trust the backends?
>>>>
>>>> This is a followup of a discussion on IRC:
>>>>
>>>> The main question of the discussion was: "Should frontend drivers
>>>> trust their backends not doing malicious actions?"
>>>>
>>>> [...]
>>> I see the general question as being analogous to 'should a Linux device
>> driver trust its hardware' and I think the answer for a general purpose OS like
>> linux is 'yes'.
>>> Now, having worked on fault tolerant systems in a past life, there are
>> definitely cases where you want your OS not to implicitly trust its peripheral
>> hardware and hence special device drivers are used.
>> So what do you do if counters provided by the untrusted HW are ok
>> and the payload is not?
> Well, that depends on whether there is actually any way to verify the payload in a driver. Whatever layer in the system is responsible for the data needs to verify its integrity in a fault tolerant system. Generally the driver can only attempt to verify that it's hardware is working as expect and quiesce it if not. For that reason, in the systems I worked on, the driver had the ability to control FETs that disconnected peripheral h/w from the PCI bus.
>
>>> I think the same would apply for virtual machines in situations where a
>> driver domain is not wholly controlled by a host administrator or is not
>> trusted to the same extent as dom0 for other reasons; i.e. they should have
>> specialist frontends.
>> I believe we might be able to express some common strategy for the
>> frontends.
>> I do understand though that it all needs to be decided on case by case
>> basis,
>> but common things could still be there, e.g. if prod/cons counters are
>> not in sync
>> what a frontend needs to do:
>>    - should it keep trying to get in sync - might be a bad idea as the
>> req/resp data
>>      may already become inconsistent (net can probably survive, but not
>> block)
>>    - should it tear down the connection with the backend - this may
>> render in the whole
>>      system instability, e.g. imagine you tear down a "/" block device
>>    - should it BUG_ON and die
>> To me the second option (tear down the connection) seems to be
>> more reasonable, although it can still render the guest unusable, but at
>> least it
>> gives a chance for the guest to recover in a proper way
>>
> Absolutely that can be done and it's certainly a good idea to be somewhat defensive but, as you say, it's quite likely that the PV pair is part of a critical subsystem for the guest and so a BUG() may well be the best option to make sure that the inevitable guest crash actually contains pertinent information.
>
>> And, if my assumption is correct, we still do trust the contents of the
>> requests
>> and responses, e.g. the payload is still trusted.
> Why should the payload be any more trusted than the content of the shared ring? They are both shared with the backend and therefore can be corrupted to the same extent.
This is exactly my point: if we only try to protect against inconsistent
prod/cons counters, this protection is still incomplete, as the payload may
be the source of failure.
>> This also questions
>> the approach,
>> e.g. if we don't trust backend's counters, then why do we trust the
>> payload it sends?
>> And there is no obvious way to check the payload integrity.
> Quite, as I said above.
>
>> So, either we trust the backend and accept the risks or we need to
>> develop some
>> complex approach to address the above.
>>
> Indeed, hence my position that in the general case it's not a security issue for a frontend to trust its backend.
+1
I tend to trust the backends in general, but some critical PV pairs
*may* implement some logic to detect corruption.
> Cheers,
>
>    Paul


* Re: Should PV frontend drivers trust the backends?
  2018-04-26  8:47       ` Oleksandr Andrushchenko
@ 2018-04-30 17:32         ` Marek Marczykowski-Górecki
  2018-05-01  7:55           ` Paul Durrant
  0 siblings, 1 reply; 11+ messages in thread
From: Marek Marczykowski-Górecki @ 2018-04-30 17:32 UTC
  To: Oleksandr Andrushchenko; +Cc: 'Juergen Gross', xen-devel, Paul Durrant



On Thu, Apr 26, 2018 at 11:47:41AM +0300, Oleksandr Andrushchenko wrote:
> On 04/26/2018 11:16 AM, Paul Durrant wrote:
> > > -----Original Message-----
> > > From: Oleksandr Andrushchenko [mailto:andr2000@gmail.com]
> > > Sent: 26 April 2018 07:00
> > > To: Paul Durrant <Paul.Durrant@citrix.com>; 'Juergen Gross'
> > > <jgross@suse.com>; xen-devel <xen-devel@lists.xenproject.org>
> > > Subject: Re: [Xen-devel] Should PV frontend drivers trust the backends?
> > > 
> > > On 04/25/2018 04:47 PM, Paul Durrant wrote:
> > > > > -----Original Message-----
> > > > > From: Xen-devel [mailto:xen-devel-bounces@lists.xenproject.org] On
> > > Behalf
> > > > > Of Juergen Gross
> > > > > Sent: 25 April 2018 13:43
> > > > > To: xen-devel <xen-devel@lists.xenproject.org>
> > > > > Subject: [Xen-devel] Should PV frontend drivers trust the backends?
> > > > > 
> > > > > This is a followup of a discussion on IRC:
> > > > > 
> > > > > The main question of the discussion was: "Should frontend drivers
> > > > > trust their backends not doing malicious actions?"
> > > > > 
> > > > > [...]
> > > > I see the general question as being analogous to 'should a Linux device driver trust its hardware' and I think the answer for a general purpose OS like linux is 'yes'.
> > > > Now, having worked on fault tolerant systems in a past life, there are definitely cases where you want your OS not to implicitly trust its peripheral hardware and hence special device drivers are used.
> > > So what do you do if counters provided by the untrusted HW are ok
> > > and the payload is not?
> > Well, that depends on whether there is actually any way to verify the payload in a driver. Whatever layer in the system is responsible for the data needs to verify its integrity in a fault tolerant system. Generally the driver can only attempt to verify that its hardware is working as expected and quiesce it if not. For that reason, in the systems I worked on, the driver had the ability to control FETs that disconnected peripheral h/w from the PCI bus.
> > 
> > > > I think the same would apply for virtual machines in situations where a driver domain is not wholly controlled by a host administrator or is not trusted to the same extent as dom0 for other reasons; i.e. they should have specialist frontends.
> > > I believe we might be able to express some common strategy for the
> > > frontends. I do understand though that it all needs to be decided on
> > > a case-by-case basis, but common things could still be there, e.g. if
> > > the prod/cons counters are not in sync, what should a frontend do:
> > >    - should it keep trying to get in sync - might be a bad idea as the
> > >      req/resp data may already be inconsistent (net can probably
> > >      survive, but not block)
> > >    - should it tear down the connection with the backend - this may
> > >      result in whole-system instability, e.g. imagine you tear down a
> > >      "/" block device
> > >    - should it BUG_ON and die
> > > To me the second option (tear down the connection) seems to be
> > > more reasonable, although it can still render the guest unusable, but
> > > at least it gives the guest a chance to recover in a proper way
> > > 
> > Absolutely that can be done and it's certainly a good idea to be somewhat defensive but, as you say, it's quite likely that the PV pair is part of a critical subsystem for the guest and so a BUG() may well be the best option to make sure that the inevitable guest crash actually contains pertinent information.

In some cases such a device might indeed be critical. But "quite likely"
IMO isn't good enough to abandon all the other cases and crash the
domain if any device fails.
Tearing down a misbehaving connection is absolutely reasonable (I do not
advocate for some complex recovery algorithm), but crashing the domain
is not.
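One way to "tear down the misbehaving connection" for item 1 of the original list (ring index sanity) is sketched below in plain userspace C. It assumes free-running unsigned 32-bit producer/consumer indices, which is how Xen's shared rings count (`RING_IDX`); the `demo_*` names, the ring size and the state enum are hypothetical and are not the Linux `ring.h` macros. The check rejects a response producer that claims more unconsumed responses than requests we have outstanding, and on failure marks only this device broken instead of calling BUG().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified frontend state; not the real Linux structures. */
#define DEMO_RING_SIZE 32u /* must be a power of two */

enum demo_state { DEMO_CONNECTED, DEMO_BROKEN };

struct demo_front {
    uint32_t rsp_cons;     /* private: next response we will consume */
    uint32_t req_prod_pvt; /* private: requests we have produced so far */
    enum demo_state state;
};

/*
 * Validate a response producer index read once from the shared page.
 * Indices are free-running, so unsigned subtraction handles wrap-around.
 * A sane backend can never have produced more responses than requests
 * we have issued, nor more than fit in the ring.
 */
static bool demo_rsp_prod_is_sane(const struct demo_front *f, uint32_t rsp_prod)
{
    uint32_t unconsumed = rsp_prod - f->rsp_cons;
    uint32_t outstanding = f->req_prod_pvt - f->rsp_cons;

    return unconsumed <= outstanding && unconsumed <= DEMO_RING_SIZE;
}

/*
 * Returns the number of responses that may be consumed, or -1 after
 * marking this device broken (the "tear down" option, instead of BUG()).
 */
static int demo_poll(struct demo_front *f, uint32_t rsp_prod)
{
    if (f->state != DEMO_CONNECTED)
        return -1;
    if (!demo_rsp_prod_is_sane(f, rsp_prod)) {
        f->state = DEMO_BROKEN; /* a real driver would also disconnect via xenbus */
        return -1;
    }
    return (int)(rsp_prod - f->rsp_cons);
}
```

Unsigned subtraction keeps the comparison correct across index wrap-around; a producer value that has gone backwards shows up as a huge `unconsumed` count and is rejected the same way.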

> > 
> > > And, if my assumption is correct, we still do trust the contents of the
> > > requests and responses, i.e. the payload is still trusted.
> > Why should the payload be any more trusted than the content of the shared ring? They are both shared with the backend and therefore can be corrupted to the same extent.
> This is exactly my point: if we only try to protect from inconsistent
> prod/cons then this protection is still incomplete, as the payload may
> be the source of failure.

Well, you can take extra measures, external to the driver, to
protect against a malicious payload (like the encryption mentioned by Andrew,
or dm-verity for block devices). But you can't do the same for the
driver itself (ring handling etc).
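For the part only the driver can protect, the ring handling itself (items 2 and 3 of the original list), a common defensive pattern is to copy each response out of the shared page exactly once, then validate and use only the private copy, so the backend cannot change a field between the check and the use. A minimal sketch, assuming a hypothetical response layout (`demo_rsp`) and per-request bookkeeping (`demo_req_info`); these are not the real blkif/netif structures, and `volatile` merely stands in for whatever single-read primitive a kernel driver would use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical response layout living in memory shared with the backend. */
struct demo_rsp {
    uint16_t id;     /* which request this answers */
    int16_t status;
    uint32_t len;    /* bytes the backend claims to have written */
};

#define DEMO_MAX_REQS 8u

/* Private per-request bookkeeping the backend cannot touch. */
struct demo_req_info {
    bool in_flight;
    uint32_t buf_len; /* size of the buffer we granted for this request */
};

/*
 * Copy the shared response into local memory exactly once, then validate
 * the private copy. Returns true and fills *out only if the response is
 * consistent with a request we actually issued.
 */
static bool demo_fetch_rsp(const volatile struct demo_rsp *shared,
                           struct demo_req_info reqs[DEMO_MAX_REQS],
                           struct demo_rsp *out)
{
    struct demo_rsp local;

    /* Single read of each field; the checks below never look at *shared. */
    local.id = shared->id;
    local.status = shared->status;
    local.len = shared->len;

    if (local.id >= DEMO_MAX_REQS || !reqs[local.id].in_flight)
        return false;  /* unknown or duplicate id */
    if (local.len > reqs[local.id].buf_len)
        return false;  /* longer than the buffer we handed out */

    reqs[local.id].in_flight = false;
    *out = local;
    return true;
}
```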

Of course the backend will be able to perform a DoS to some extent in all
the cases, at least by not responding to requests. But keep in mind
that the root fs is not the only device out there. There are also other
block devices, network interfaces etc. A misbehaving backend should
_not_ be able to take over the frontend domain in those cases. And ideally
it also shouldn't be able to crash it (if the device isn't critical for
domU).
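Item 4 of the original list (grants still mapped when the response arrives) can likewise be handled without crashing the domain: if revoking a grant fails because the backend still has it mapped, the frontend can keep the page out of reuse and retry later instead of calling BUG(). A sketch under assumptions: `demo_grant` and its `mapped_by_backend` flag are hypothetical stand-ins for state the hypervisor tracks, and none of this is the Linux `gnttab_*` API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one grant entry. */
struct demo_grant {
    bool active;             /* we have granted access to the page */
    bool mapped_by_backend;  /* backend still holds a mapping */
    struct demo_grant *next; /* link on the deferred list */
};

/* Grants we could not revoke yet; their pages must not be reused. */
static struct demo_grant *demo_deferred;

/* Try to revoke; on conflict, defer instead of crashing. */
static bool demo_end_access(struct demo_grant *g)
{
    if (g->mapped_by_backend) {
        g->next = demo_deferred;
        demo_deferred = g;
        return false; /* caller must not free or reuse the page */
    }
    g->active = false;
    return true;
}

/* Called periodically: retry deferred grants, return how many remain. */
static size_t demo_retry_deferred(void)
{
    struct demo_grant **pp = &demo_deferred;
    size_t left = 0;

    while (*pp) {
        struct demo_grant *g = *pp;
        if (!g->mapped_by_backend) {
            g->active = false;
            *pp = g->next; /* unlink: the page is now safe to reuse */
            g->next = NULL;
        } else {
            pp = &g->next;
            left++;
        }
    }
    return left;
}
```

The cost of deferring is that pages stay unusable for as long as the backend keeps them mapped, which is a resource-exhaustion DoS confined to this frontend rather than a crash.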

If you want some real-world use cases for this, here are two from Qubes
OS:

1. Block devices - base system devices (/, /home equivalent etc) have
backends in dom0 (*), but there is also an option to use block devices
exported by other domains, for example the one handling USB controllers.
So, when you plug in a USB stick, one domain handles all the nasty USB stuff
and exports it as a plain device to another domain, where the user can mount
a LUKS container stored there. Whatever happens there, nothing from that
USB stick touches dom0 at any time.

2. Network devices - there are no network backends in dom0 at all. There
is one (or more) dedicated domain for handling NICs, then there is
(possibly a tree of) domain(s) routing the traffic. In some cases a VM
facing the actual network (where the backend runs) is considered less
trusted than a VM using that network (where the frontend runs).

BTW, since XSA-155 we have some additional patches for the block and
network frontends, making changes similar to those made to the backends
at that time. I'll resend them in a moment.

(*) we still plan to also support untrusted backends for the base
system, with domU verifying all the data it gets (dm-verity, dm-crypt).
But it isn't there yet.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

* Re: Should PV frontend drivers trust the backends?
  2018-04-30 17:32         ` Marek Marczykowski-Górecki
@ 2018-05-01  7:55           ` Paul Durrant
  2018-05-01 15:00             ` 'Marek Marczykowski-Górecki'
  0 siblings, 1 reply; 11+ messages in thread
From: Paul Durrant @ 2018-05-01  7:55 UTC (permalink / raw)
  To: 'Marek Marczykowski-Górecki', Oleksandr Andrushchenko
  Cc: 'Juergen Gross', xen-devel

> -----Original Message-----
> From: Marek Marczykowski-Górecki
> [mailto:marmarek@invisiblethingslab.com]
> Sent: 30 April 2018 18:33
> To: Oleksandr Andrushchenko <andr2000@gmail.com>
> Cc: Paul Durrant <Paul.Durrant@citrix.com>; 'Juergen Gross'
> <jgross@suse.com>; xen-devel <xen-devel@lists.xenproject.org>
> Subject: Re: [Xen-devel] Should PV frontend drivers trust the backends?
> 
> On Thu, Apr 26, 2018 at 11:47:41AM +0300, Oleksandr Andrushchenko wrote:
> > On 04/26/2018 11:16 AM, Paul Durrant wrote:
> > > [snip]
> 
> In some cases such a device might indeed be critical. But "quite likely"
> IMO isn't good enough to abandon all the other cases and crash the
> domain if any device fails.
> Tearing down a misbehaving connection is absolutely reasonable (I do not
> advocate for some complex recovery algorithm), but crashing the domain
> is not.

So what happens if the backend servicing the VM's boot disk fails? Is it better to:

a) BUG()/BSOD with some meaningful stack and code, such that it's obvious what happened, or
b) cover up and wait until something further up the storage stack crashes the VM, probably with some error that's just a generic timeout

I'm clearly advocating a) but it's possible b) may be more desirable in some scenarios. I think the choice is up to whoever is writing the frontend and no-one else should decide their policy for them.
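Leaving the policy to the frontend author, as suggested here, could amount to a per-device setting consulted at the one place where faults are handled. A minimal sketch with hypothetical names: `DEMO_FAIL_CRASH` models option a) with `abort()` standing in for BUG(), while `DEMO_FAIL_DISCONNECT` logs and disconnects only the affected device.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

enum demo_fail_policy {
    DEMO_FAIL_CRASH,      /* option a): stop with a precise diagnostic */
    DEMO_FAIL_DISCONNECT, /* tear down this connection and carry on */
};

struct demo_dev {
    const char *name;
    enum demo_fail_policy policy; /* chosen by whoever writes the frontend */
    int connected;
};

/* Single place where a detected backend fault is handled. */
static void demo_backend_fault(struct demo_dev *d, const char *why)
{
    fprintf(stderr, "%s: backend fault: %s\n", d->name, why);
    if (d->policy == DEMO_FAIL_CRASH)
        abort();      /* userspace stand-in for BUG() */
    d->connected = 0; /* only this device goes away */
}
```

A frontend could, for instance, pick the crash policy for the boot disk and the disconnect policy for a hot-plugged disk exported by a USB driver domain.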

> 
> > >
> > > > And, if my assumption is correct, we still do trust the contents of the
> > > > requests and responses, i.e. the payload is still trusted.
> > > Why should the payload be any more trusted than the content of the shared ring? They are both shared with the backend and therefore can be corrupted to the same extent.
> > This is exactly my point: if we only try to protect from inconsistent
> > prod/cons then this protection is still incomplete, as the payload may
> > be the source of failure.
> 
> Well, you can take extra measures, external to the driver, to
> protect against a malicious payload (like the encryption mentioned by Andrew,
> or dm-verity for block devices). But you can't do the same for the
> driver itself (ring handling etc).
> 

As I said, verification should be down to the layer that has the relevant information.
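As an illustration of payload verification living in the layer that has the relevant information: the PV driver just moves bytes, and a layer above it that holds expected digests decides whether a block is acceptable. dm-verity, mentioned in the thread, does this with a cryptographic hash tree; the sketch below uses 32-bit FNV-1a purely as a short, non-cryptographic placeholder (it offers no protection against a malicious backend), and `demo_verify_block` and its flat digest table are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a: a NON-cryptographic placeholder for a real digest. */
static uint32_t demo_fnv1a(const uint8_t *p, size_t n)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/*
 * Upper-layer check on data the PV driver delivered: compare against the
 * digest recorded for this block number by whoever owns the data.
 */
static bool demo_verify_block(const uint8_t *data, size_t len,
                              const uint32_t *expected, size_t block_no)
{
    return demo_fnv1a(data, len) == expected[block_no];
}
```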

> Of course the backend will be able to perform a DoS to some extent in all
> the cases, at least by not responding to requests. But keep in mind
> that the root fs is not the only device out there. There are also other
> block devices, network interfaces etc. A misbehaving backend should
> _not_ be able to take over the frontend domain in those cases. And ideally
> it also shouldn't be able to crash it (if the device isn't critical for
> domU).
> 

I still think that is the choice of the frontend. Yes, they can be programmed defensively but for some use cases it may just not be that important.

> If you want some real-world use cases for this, here are two from Qubes
> OS:
> 
> 1. Block devices - base system devices (/, /home equivalent etc) have
> backends in dom0 (*), but there is also an option to use block devices
> exported by other domains, for example the one handling USB controllers.
> So, when you plug in a USB stick, one domain handles all the nasty USB stuff
> and exports it as a plain device to another domain, where the user can mount
> a LUKS container stored there. Whatever happens there, nothing from that
> USB stick touches dom0 at any time.
> 
> 2. Network devices - there are no network backends in dom0 at all. There
> is one (or more) dedicated domain for handling NICs, then there is
> (possibly a tree of) domain(s) routing the traffic. In some cases a VM
> facing the actual network (where the backend runs) is considered less
> trusted than a VM using that network (where the frontend runs).

But, without revocable grants that backend could still DoS the frontend, right?

> 
> BTW, since XSA-155 we have some additional patches for the block and
> network frontends, making changes similar to those made to the backends
> at that time. I'll resend them in a moment.
> 
> (*) we still plan to also support untrusted backends for the base
> system, with domU verifying all the data it gets (dm-verity, dm-crypt).
> But it isn't there yet.

Maybe the frontend should be advised of the trust level of a backend so that it can apply auditing should it wish to. If the backend were running in dom0 then there would be little point, but a frontend may wish to be more careful when e.g. the domain is a trusted driver domain (but with no dm priv). There have also been discussions about skipping the use of grants when the backend has mapping privilege, for performance reasons, so maybe that could be worked in too.
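On the frontend side, such an advisory could reduce to a single decision about whether to enable the defensive checks. Everything in this sketch is hypothetical: the thread does not define any trust key in the PV protocol, so it simply assumes the toolstack advertises one of a few trust levels alongside the backend's domain ID. Unknown values fall back to auditing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical trust levels a toolstack might advertise for a backend. */
enum demo_backend_trust {
    DEMO_TRUST_FULL,   /* e.g. a backend in dom0 */
    DEMO_TRUST_DRIVER, /* driver domain without device-model privilege */
    DEMO_TRUST_NONE,   /* treat as potentially malicious */
};

/* Decide whether to enable defensive checks; unknown values fail safe. */
static bool demo_should_audit(uint16_t backend_domid, int advertised)
{
    if (backend_domid == 0)
        return false; /* little point auditing a dom0 backend */
    return advertised != DEMO_TRUST_FULL;
}
```

The trade-off is that the audited and unaudited paths both have to be maintained and tested.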

  Paul

> 
> --
> Best Regards,
> Marek Marczykowski-Górecki
> Invisible Things Lab
> A: Because it messes up the order in which people normally read text.
> Q: Why is top-posting such a bad thing?

* Re: Should PV frontend drivers trust the backends?
  2018-05-01  7:55           ` Paul Durrant
@ 2018-05-01 15:00             ` 'Marek Marczykowski-Górecki'
  2018-05-01 15:32               ` Paul Durrant
  0 siblings, 1 reply; 11+ messages in thread
From: 'Marek Marczykowski-Górecki' @ 2018-05-01 15:00 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Oleksandr Andrushchenko, 'Juergen Gross', xen-devel


On Tue, May 01, 2018 at 07:55:39AM +0000, Paul Durrant wrote:
> > -----Original Message-----
> > From: Marek Marczykowski-Górecki
> > [mailto:marmarek@invisiblethingslab.com]
> > Sent: 30 April 2018 18:33
> > To: Oleksandr Andrushchenko <andr2000@gmail.com>
> > Cc: Paul Durrant <Paul.Durrant@citrix.com>; 'Juergen Gross'
> > <jgross@suse.com>; xen-devel <xen-devel@lists.xenproject.org>
> > Subject: Re: [Xen-devel] Should PV frontend drivers trust the backends?
> > 
> > On Thu, Apr 26, 2018 at 11:47:41AM +0300, Oleksandr Andrushchenko wrote:
> > > On 04/26/2018 11:16 AM, Paul Durrant wrote:
> > > > [snip]
> > 
> > In some cases such a device might indeed be critical. But "quite likely"
> > IMO isn't good enough to abandon all the other cases and crash the
> > domain if any device fails.
> > Tearing down a misbehaving connection is absolutely reasonable (I do not
> > advocate for some complex recovery algorithm), but crashing the domain
> > is not.
> 
> So what happens if the backend servicing the VM's boot disk fails? Is it better to:
> 
> a) BUG()/BSOD with some meaningful stack and code, such that it's obvious what happened, or
> b) cover up and wait until something further up the storage stack crashes the VM, probably with some error that's just a generic timeout
> 
> I'm clearly advocating a) but it's possible b) may be more desirable in some scenarios. I think the choice is up to whoever is writing the frontend and no-one else should decide their policy for them.

But you know, BUG() isn't the only way to get an error message.
I see proper logging being used in this thread as an excuse for crashing
things - really, this is a very poor excuse. You can use printk, or even
WARN() or such. And if there are cases where the only way to get
meaningful messages is crashing the whole thing, something is _really_
wrong.
In many cases crashing the thing will actually make retrieving messages
harder, not easier (remote systems, console not working etc).
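A detailed error message does not require crashing. Below is a minimal userspace sketch of a "report once, keep running" path, loosely in the spirit of the kernel's printk()/WARN_ONCE() but with hypothetical names: the message carries the device name and the offending ring indices, and it is emitted only the first time per device so a misbehaving backend cannot flood the log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct demo_dev_log {
    const char *name;
    bool warned; /* per-device "once" flag */
};

/* Returns true if a message was emitted, false if it was suppressed. */
static bool demo_report_ring_error(struct demo_dev_log *d,
                                   uint32_t rsp_prod, uint32_t rsp_cons)
{
    if (d->warned)
        return false;
    d->warned = true;
    fprintf(stderr, "%s: bad response producer %u (consumer %u), disconnecting\n",
            d->name, (unsigned)rsp_prod, (unsigned)rsp_cons);
    return true;
}
```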

> > > > > And, if my assumption is correct, we still do trust the contents of the
> > > > > requests and responses, i.e. the payload is still trusted.
> > > > Why should the payload be any more trusted than the content of the shared ring? They are both shared with the backend and therefore can be corrupted to the same extent.
> > > This is exactly my point: if we only try to protect from inconsistent
> > > prod/cons then this protection is still incomplete, as the payload may
> > > be the source of failure.
> > 
> > Well, you can take extra measures, external to the driver, to
> > protect against a malicious payload (like the encryption mentioned by Andrew,
> > or dm-verity for block devices). But you can't do the same for the
> > driver itself (ring handling etc).
> > 
> 
> As I said, verification should be down to the layer that has the relevant information.
> 
> > Of course the backend will be able to perform a DoS to some extent in all
> > the cases, at least by not responding to requests. But keep in mind
> > that the root fs is not the only device out there. There are also other
> > block devices, network interfaces etc. A misbehaving backend should
> > _not_ be able to take over the frontend domain in those cases. And ideally
> > it also shouldn't be able to crash it (if the device isn't critical for
> > domU).
> > 
> 
> I still think that is the choice of the frontend. Yes, they can be programmed defensively but for some use cases it may just not be that important.
> 
> > If you want some real-world use cases for this, here are two from Qubes
> > OS:
> > 
> > 1. Block devices - base system devices (/, /home equivalent etc) have
> > backends in dom0 (*), but there is also an option to use block devices
> > exported by other domains, for example the one handling USB controllers.
> > So, when you plug in a USB stick, one domain handles all the nasty USB stuff
> > and exports it as a plain device to another domain, where the user can mount
> > a LUKS container stored there. Whatever happens there, nothing from that
> > USB stick touches dom0 at any time.
> > 
> > 2. Network devices - there are no network backends in dom0 at all. There
> > is one (or more) dedicated domain for handling NICs, then there is
> > (possibly a tree of) domain(s) routing the traffic. In some cases a VM
> > facing the actual network (where the backend runs) is considered less
> > trusted than a VM using that network (where the frontend runs).
> 
> But, without revocable grants that backend could still DoS the frontend, right?

Yes, but in that case it should be enough to kill the backend (domain)
and the frontend domain should be fine, right?
What I mean is that a malicious/buggy backend should be able to do harm
only to the devices it controls, not crash the whole driver (affecting
all devices of that kind) or the whole system.
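Limiting the damage to the devices a backend controls means the fault state lives in each device instance, not in the driver as a whole. A sketch with hypothetical names: one device is marked broken after its backend misbehaves, and requests to sibling devices served by the same driver keep working.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_NDEVS 3

struct demo_vdev {
    bool broken; /* set only when *this* device's backend misbehaves */
};

struct demo_driver {
    struct demo_vdev devs[DEMO_NDEVS];
};

/* Mark a single device as failed; never touches driver-global state. */
static void demo_mark_broken(struct demo_driver *drv, size_t i)
{
    if (i < DEMO_NDEVS)
        drv->devs[i].broken = true;
}

/* Submitting I/O fails only for the broken device. Returns 0 or -1. */
static int demo_submit(const struct demo_driver *drv, size_t i)
{
    if (i >= DEMO_NDEVS || drv->devs[i].broken)
        return -1;
    return 0;
}
```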

And definitely arbitrary code execution or an info leak should not be
possible either. I hope we agree at least on this point, right?

Of course this is all about the driver itself. If an upper layer is going
to execute any payload it gets, then the PV driver can do nothing about it.
But as you've said, it should be up to the frontend [domain configuration].

> > BTW, since XSA-155 we have some additional patches for the block and
> > network frontends, making changes similar to those made to the backends
> > at that time. I'll resend them in a moment.
> > 
> > (*) we still plan to also support untrusted backends for the base
> > system, with domU verifying all the data it gets (dm-verity, dm-crypt).
> > But it isn't there yet.
> 
> Maybe the frontend should be advised of the trust level of a backend so that it can apply auditing should it wish to. If the backend were running in dom0 then there would be little point, but a frontend may wish to be more careful when e.g. the domain is a trusted driver domain (but with no dm priv). There have also been discussions about skipping the use of grants when the backend has mapping privilege, for performance reasons, so maybe that could be worked in too.

Generally I'd avoid multiple modes (either dom0/non-dom0 or
trusted/untrusted). This almost always leads to some bugs in one of
those branches sooner or later.
 
-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

* Re: Should PV frontend drivers trust the backends?
  2018-05-01 15:00             ` 'Marek Marczykowski-Górecki'
@ 2018-05-01 15:32               ` Paul Durrant
  0 siblings, 0 replies; 11+ messages in thread
From: Paul Durrant @ 2018-05-01 15:32 UTC (permalink / raw)
  To: 'Marek Marczykowski-Górecki'
  Cc: Oleksandr Andrushchenko, 'Juergen Gross', xen-devel

> -----Original Message-----
[snip]
> > So what happens if the backend servicing the VM's boot disk fails? Is it better to:
> >
> > a) BUG()/BSOD with some meaningful stack and code, such that it's obvious what happened, or
> > b) cover up and wait until something further up the storage stack crashes the VM, probably with some error that's just a generic timeout
> >
> > I'm clearly advocating a) but it's possible b) may be more desirable in some scenarios. I think the choice is up to whoever is writing the frontend and no-one else should decide their policy for them.
> 
> But you know, BUG() isn't the only way to get an error message.
> I see proper logging being used in this thread as an excuse for crashing
> things - really, this is a very poor excuse. You can use printk, or even
> WARN() or such.

On Windows? I think not.

> And if there are cases where the only way to get
> meaningful messages is crashing the whole thing, something is _really_
> wrong.

Forcing a BSOD really is sometimes the best option on Windows.

> In many cases crashing the thing will actually make retrieving messages
> harder, not easier (remote systems, console not working etc).
> 

Again, forcing a BSOD in Windows can result in a meaningful crashdump that can take you straight to a diagnosis of the problem. Fixing things up and getting some form of arbitrary 'page in timeout' BSOD a couple of minutes later can make post mortem diagnosis a lot harder.

> > > > > > And, if my assumption is correct, we still do trust the contents of the
> > > > > > requests and responses, e.g. the payload is still trusted.
> > > > > Why should the payload be any more trusted than the content of the
> > > > > shared ring? They are both shared with the backend and therefore can be
> > > > > corrupted to the same extent.
> > > > This is exactly my point: if we only try to protect from inconsistent
> > > > prod/cons then this protection is still incomplete as the payload may be
> > > > the source of failure.
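One common way to treat the payload as untrusted is to copy each response out of the shared page exactly once and validate only the private copy, so the backend cannot change a field between the check and its use. Below is a minimal C sketch, assuming a hypothetical response layout (`struct fe_response`) and a frontend-private table of outstanding requests; none of these names come from the Xen ring headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical response layout written by the backend into a ring slot. */
struct fe_response {
    uint64_t id;      /* which request this answers */
    uint32_t len;     /* bytes the backend claims it wrote */
    int32_t  status;  /* 0 on success */
};

/* Hypothetical record of an outstanding request, in frontend-private memory. */
struct fe_request {
    bool     in_flight;
    uint32_t buf_len;  /* size of the buffer granted to the backend */
};

#define FE_MAX_REQS 32

/* Copy the response out of shared memory once, then validate the private copy.
 * Checking the shared slot directly would let the backend change a field
 * between the check and the use (a double fetch). Kernel code would typically
 * also add a compiler barrier (or use READ_ONCE()) after the copy; plain
 * memcpy() keeps this sketch portable. Returns true if *out is consistent. */
static bool fe_fetch_response(const struct fe_response *shared_slot,
                              struct fe_request reqs[FE_MAX_REQS],
                              struct fe_response *out)
{
    /* assert() checks the frontend's own pointers, never backend data. */
    assert(shared_slot != NULL && reqs != NULL && out != NULL);

    memcpy(out, shared_slot, sizeof(*out));   /* single read of shared data */

    if (out->id >= FE_MAX_REQS)
        return false;                         /* id names no request */
    if (!reqs[out->id].in_flight)
        return false;                         /* duplicate or unsolicited response */
    if (out->status == 0 && out->len > reqs[out->id].buf_len)
        return false;                         /* more data than the buffer holds */

    reqs[out->id].in_flight = false;          /* request completed */
    return true;
}
```

A rejected response leaves the request table untouched, so the caller can then apply whichever failure policy the frontend has chosen.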
> > >
> > > Well, you can take extra measures, external to the driver, to
> > > protect against malicious payload (like encryption mentioned by Andrew,
> > > or dm-verity for block devices). But you can't do the same about the
> > > driver itself (ring handling etc).
> > >
> >
> > As I said, verification should be down to the layer that has the relevant
> > information.
> >
> > > Of course the backend will be able to perform a DoS to some extent in all
> > > the cases, at least by not responding to requests. But keep in mind
> > > that the root fs is not the only device out there. There are also other
> > > block devices, network interfaces etc. And a misbehaving backend should
> > > _not_ be able to take over the frontend domain in those cases. And ideally
> > > it also shouldn't be able to crash it (if the device isn't critical for the
> > > domU).
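For the ring handling itself, checking item 1 of the original list (the response producer index must not run ahead of what the frontend actually asked for) is cheap. Below is a minimal C sketch, assuming free-running 32-bit indices that wrap modulo 2^32 and are masked into a power-of-two ring, as Xen's shared rings do, and a hypothetical ring size of 32. `fe_ring`, `fe_rsp_prod_sane()` and `fe_slot()` are illustrative names, not the macros from Xen's ring.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t ring_idx_t;   /* free-running index, wraps modulo 2^32 */

#define FE_RING_SIZE 32u       /* hypothetical ring size */
static_assert((FE_RING_SIZE & (FE_RING_SIZE - 1)) == 0,
              "ring size must be a power of two so indices can be masked");

/* Frontend-private ring state; only rsp_prod lives in the shared page. */
struct fe_ring {
    ring_idx_t req_prod_pvt;   /* requests the frontend has produced */
    ring_idx_t rsp_cons;       /* responses the frontend has consumed */
};

/* Map a free-running index to a slot in the ring. */
static unsigned fe_slot(ring_idx_t idx)
{
    return idx & (FE_RING_SIZE - 1);
}

/* Sanity-check a producer index read (once) from the shared page.
 * Unsigned subtraction gives the correct distance across wraparound. */
static bool fe_rsp_prod_sane(const struct fe_ring *r, ring_idx_t rsp_prod)
{
    ring_idx_t pending   = rsp_prod - r->rsp_cons;        /* claimed new responses */
    ring_idx_t in_flight = r->req_prod_pvt - r->rsp_cons; /* requests awaiting a reply */

    /* Never more than a ring's worth, and never more than was requested. */
    return pending <= FE_RING_SIZE && pending <= in_flight;
}
```

A producer index behind the consumer shows up as a huge unsigned distance, so it fails the same check. On failure the frontend can apply its chosen policy instead of walking past the responses it is owed.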
> > >
> >
> > I still think that is the choice of the frontend. Yes, they can be programmed
> > defensively but for some use cases it may just not be that important.
> >
> > > If you want some real world use cases for this, here are two from Qubes
> > > OS:
> > >
> > > 1. Block devices - base system devices (/, /home equivalent etc) have
> > > backends in dom0 (*), but there is also an option to use block devices
> > > exported by other domains, for example the one handling USB controllers.
> > > So, when you plug in a USB stick, one domain handles all the nasty USB
> > > stuff and exports it as a plain device to another domain, where the user
> > > can mount the LUKS container stored there. Whatever happens there, nothing
> > > from that USB stick touches dom0 at any time.
> > >
> > > 2. Network devices - there are no network backends in dom0 at all. There
> > > is one (or more) dedicated domain for handling NICs, then there is
> > > (possibly a tree of) domain(s) routing the traffic. In some cases a VM
> > > facing the actual network (where the backend runs) is considered less
> > > trusted than a VM using that network (where the frontend runs).
> >
> > But, without revocable grants that backend could still DoS the frontend,
> > right?
> 
> Yes, but in that case it should be enough to kill the backend (domain)
> and the frontend domain should be fine, right?
> What I mean is, a malicious/buggy backend should be able to do harm only to
> the devices it controls, not crash the whole driver (affecting all
> devices of that kind) or the whole system.
> 
> And definitely arbitrary code execution or info leaks also should not be
> possible. I hope we agree at least on this point, right?

It's a good idea to defend against it...

> 
> Of course this is all about the driver itself. If an upper layer is going
> to execute any payload it gets, then the PV driver can do nothing about it.

...but as you point out here, it will likely always be possible at some level.

  Paul

> But as you've said, it should be up to the frontend [domain configuration].
> 
> > > BTW, since XSA-155 we do have some additional patches for the block and
> > > network frontends, making changes similar to those made to the backends
> > > at that time. I'll resend them in a moment.
> > >
> > > (*) we still plan to also support untrusted backends for the base
> > > system, with domU verifying all the data it gets (dm-verity, dm-crypt).
> > > But it isn't there yet.
> >
> > Maybe the frontend should be advised of the trust level of a backend so that
> > it can apply auditing should it wish to. If the backend were running in dom0
> > then there would be little point, but a frontend may wish to be more careful
> > when e.g. the domain is a trusted driver domain (but with no dm priv). There
> > have also been discussions about skipping the use of grants when the
> > backend has mapping privilege, for performance reasons, so maybe that
> > could be worked in too.
> 
> Generally I'd avoid multiple modes (either dom0/non-dom0 or
> trusted/untrusted). This almost always leads to some bugs in one of
> those branches sooner or later.
> 
> --
> Best Regards,
> Marek Marczykowski-Górecki
> Invisible Things Lab
> A: Because it messes up the order in which people normally read text.
> Q: Why is top-posting such a bad thing?

Thread overview: 11+ messages
2018-04-25 12:42 Should PV frontend drivers trust the backends? Juergen Gross
2018-04-25 13:36 ` Andrew Cooper
2018-04-25 13:47 ` Paul Durrant
2018-04-26  5:59   ` Oleksandr Andrushchenko
2018-04-26  8:16     ` Paul Durrant
2018-04-26  8:47       ` Oleksandr Andrushchenko
2018-04-30 17:32         ` Marek Marczykowski-Górecki
2018-05-01  7:55           ` Paul Durrant
2018-05-01 15:00             ` 'Marek Marczykowski-Górecki'
2018-05-01 15:32               ` Paul Durrant
2018-04-26  8:46   ` Petr Tesarik
