From: Jason Wang <jasowang@redhat.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: kvm@vger.kernel.org, virtualization@lists.linux-foundation.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	Jintack Lim <jintack@cs.columbia.edu>
Subject: Re: [PATCH net V2 4/4] vhost: log dirty page correctly
Date: Thu, 27 Dec 2018 17:32:20 +0800	[thread overview]
Message-ID: <ad02985b-7a3c-06e2-7614-3ac29e69d678@redhat.com> (raw)
In-Reply-To: <20181226083630-mutt-send-email-mst@kernel.org>


On 2018/12/26 9:46 PM, Michael S. Tsirkin wrote:
> On Wed, Dec 26, 2018 at 01:43:26PM +0800, Jason Wang wrote:
>> On 2018/12/26 12:25 AM, Michael S. Tsirkin wrote:
>>> On Tue, Dec 25, 2018 at 05:43:25PM +0800, Jason Wang wrote:
>>>> On 2018/12/25 1:41 AM, Michael S. Tsirkin wrote:
>>>>> On Mon, Dec 24, 2018 at 11:43:31AM +0800, Jason Wang wrote:
>>>>>> On 2018/12/14 9:20 PM, Michael S. Tsirkin wrote:
>>>>>>> On Fri, Dec 14, 2018 at 10:43:03AM +0800, Jason Wang wrote:
>>>>>>>> On 2018/12/13 10:31 PM, Michael S. Tsirkin wrote:
>>>>>>>>>> Just to make sure I understand this. It looks to me we should:
>>>>>>>>>>
>>>>>>>>>> - allow passing GIOVA->GPA through UAPI
>>>>>>>>>>
>>>>>>>>>> - cache GIOVA->GPA somewhere but still use GIOVA->HVA in device IOTLB for
>>>>>>>>>> performance
>>>>>>>>>>
>>>>>>>>>> Is this what you suggest?
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>> Not really. We already have GPA->HVA, so I suggested a flag to pass
>>>>>>>>> GIOVA->GPA in the IOTLB.
>>>>>>>>>
>>>>>>>>> This has advantages for security since a single table needs
>>>>>>>>> then to be validated to ensure guest does not corrupt
>>>>>>>>> QEMU memory.
>>>>>>>>>
>>>>>>>> I wonder how much we can gain through this. Currently, the qemu IOMMU gives
>>>>>>>> the GIOVA->GPA mapping, and the qemu vhost code translates GPA to HVA and
>>>>>>>> then passes GIOVA->HVA to vhost. I see no difference.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>> The difference is in security not in performance.  Getting a bad HVA
>>>>>>> corrupts QEMU memory and it might be guest controlled. Very risky.
>>>>>> How can this be controlled by the guest? The HVA was generated from qemu ram
>>>>>> blocks, which are totally under the control of the qemu memory core, not the guest.
>>>>>>
>>>>>>
>>>>>> Thanks
>>>>> It is ultimately under guest influence as guest supplies IOVA->GPA
>>>>> translations.  qemu translates GPA->HVA and gives the translated result
>>>>> to the kernel.  If it's not buggy and kernel isn't buggy it's all
>>>>> fine.
>>>> If qemu provides a buggy GPA->HVA mapping, we can't work around this. And I
>>>> don't see why we would even want to try. Buggy qemu code can crash itself in
>>>> many ways.
>>>>
>>>>
>>>>> But that's the approach that was proven not to work in the 20th century.
>>>>> In the 21st century we are trying a defence-in-depth approach.
>>>>>
>>>>> My point is that a single code path that is responsible for
>>>>> the HVA translations is better than two.
>>>>>
>>>> So here is the difference depending on whether or not we use the memory
>>>> table information:
>>>>
>>>> Current:
>>>>
>>>> 1) SET_MEM_TABLE: GPA->HVA
>>>>
>>>> 2) Qemu GIOVA->GPA
>>>>
>>>> 3) Qemu GPA->HVA
>>>>
>>>> 4) IOTLB_UPDATE: GIOVA->HVA
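To make the current flow concrete, a rough sketch of the two tables and of the
step-3 lookup as qemu does it today (names and the linear scan are invented for
illustration; the real vhost code keeps these in interval trees):

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical, simplified view of the two tables above. */
struct mem_region {   /* one entry from SET_MEM_TABLE: GPA->HVA */
	uint64_t gpa, hva, size;
};

struct iotlb_entry {  /* one entry from IOTLB_UPDATE: GIOVA->HVA, no GPA kept */
	uint64_t giova, hva, size;
};

/* Step 3, done in qemu today: translate GPA to HVA via the memory
 * table before sending the IOTLB update down to vhost. */
static uint64_t gpa_to_hva(const struct mem_region *mem, size_t n,
			   uint64_t gpa)
{
	for (size_t i = 0; i < n; i++)
		if (gpa >= mem[i].gpa && gpa - mem[i].gpa < mem[i].size)
			return mem[i].hva + (gpa - mem[i].gpa);
	return 0; /* no translation found; must not be passed to vhost */
}
```

This is the 19-line lookup mentioned below; the question is whether moving it
into the kernel's IOTLB_UPDATE path is worth the extra code.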
>>>>
>>>> If I understand correctly, you want to drop step 3, considering it might be
>>>> buggy, which is just 19 lines of code in qemu (vhost_memory_region_lookup()).
>>>> This will end up as:
>>>>
>>>> 1) Doing the GPA->HVA translation in the IOTLB_UPDATE path (I believe we
>>>> won't want to do it during device IOTLB lookup).
>>>>
>>>> 2) Extra bits to enable this capability.
>>>>
>>>> So this looks like it needs more code in the kernel than what qemu did in
>>>> userspace. Is this really worthwhile?
>>>>
>>>> Thanks
>>> So there are several points I would like to make
>>>
>>> 1. At the moment without an iommu it is possible to
>>>      change GPA-HVA mappings and everything keeps working
>>>      because a change in memory tables flushes the rings.
>>
>> Interesting, I didn't know this before. But when can this happen?
>
> It doesn't happen with existing qemu. But it seems like a valid
> thing to do to remap memory at a different address.
>

Ok.


>>>      However I don't see the iotlb cache being invalidated
>>>      on that path - did I miss it? If it is not there it's
>>>      a related minor bug.
>>
>> It might be a bug. But here's a question: consider the case without an IOMMU.
>> We only update the mem table (SET_MEM_TABLE), but not the vring address. This
>> looks like a bug as well?
> I think that without an iommu it can only work without races if the backend is
> stopped, or if the vring isn't in guest memory (with ring aliasing).


Right.


>
>>> 2. qemu already has a GPA. Discarding it and re-calculating
>>>      when logging is on just seems wrong.
>>>      However if you would like to *also* keep the HVA in the iotlb
>>>      to avoid doing extra translations, that sounds like a
>>>      reasonable optimization.
>>
>> Yes, traversing the GPA->HVA mapping seems unnecessary.
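And if the iotlb carried the GPA, the dirty bit could be set from it directly,
with no reverse hva->gpa walk. A minimal sketch of the bitmap math only (the
real vhost writes a userspace log with atomics via log_write(); LOG_PAGE_SHIFT
is an invented name for the 4K log granularity):

```c
#include <stdint.h>

#define LOG_PAGE_SHIFT 12 /* dirty log granularity: 4K pages */

/* Mark [gpa, gpa + len) dirty in a one-bit-per-page bitmap.
 * Sketch only: no atomics, no userspace access handling. */
static void log_dirty_gpa(uint8_t *log, uint64_t gpa, uint64_t len)
{
	uint64_t first = gpa >> LOG_PAGE_SHIFT;
	uint64_t last = (gpa + len - 1) >> LOG_PAGE_SHIFT;

	for (uint64_t p = first; p <= last; p++)
		log[p / 8] |= (uint8_t)(1u << (p % 8));
}
```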
>>
>>
>>> 3. it also means that the hva->gpa translation only runs
>>>      when logging is enabled. That is a rarely exercised
>>>      path so any bugs there will not be caught.
>>
>> I wonder whether some kind of unit test may help here.
>>
>>
>>> So I really would like us long term to move away from
>>> hva->gpa translations, keep them for legacy userspace only
>>> but I don't really mind how we do it.
>>>
>>> How about
>>> - a new flag to pass an iotlb with *both* a gpa and hva
>>> - for legacy userspace, calculate the gpa on iotlb update
>>>     so the device then uses a shared code path
>>>
>>> what do you think?
>>>
>>>
>> I don't object to this idea, so I can try; I just want to figure out why it
>> was a must.
>>
>> Thanks
> Not a must but I think it's a good interface extension.
>

Ok, let me try to do this.

Thanks


Thread overview: 23+ messages
2018-12-12 10:08 [PATCH net V2 0/4] Fix various issue of vhost Jason Wang
2018-12-12 10:08 ` [PATCH net V2 1/4] vhost: make sure used idx is seen before log in vhost_add_used_n() Jason Wang
2018-12-12 14:33   ` Michael S. Tsirkin
2018-12-12 10:08 ` [PATCH net V2 2/4] vhost_net: switch to use mutex_trylock() in vhost_net_busy_poll() Jason Wang
2018-12-12 14:20   ` Michael S. Tsirkin
2018-12-12 10:08 ` [PATCH net V2 3/4] Revert "net: vhost: lock the vqs one by one" Jason Wang
2018-12-12 14:24   ` Michael S. Tsirkin
2018-12-13  2:27     ` Jason Wang
2018-12-12 10:08 ` [PATCH net V2 4/4] vhost: log dirty page correctly Jason Wang
2018-12-12 14:32   ` Michael S. Tsirkin
2018-12-13  2:39     ` Jason Wang
2018-12-13 14:31       ` Michael S. Tsirkin
2018-12-14  2:43         ` Jason Wang
2018-12-14 13:20           ` Michael S. Tsirkin
2018-12-24  3:43             ` Jason Wang
2018-12-24 17:41               ` Michael S. Tsirkin
2018-12-25  9:43                 ` Jason Wang
2018-12-25 16:25                   ` Michael S. Tsirkin
2018-12-26  5:43                     ` Jason Wang
2018-12-26 13:46                       ` Michael S. Tsirkin
2018-12-27  9:32                         ` Jason Wang [this message]
2018-12-12 23:31 ` [PATCH net V2 0/4] Fix various issue of vhost David Miller
2018-12-13  2:42   ` Jason Wang
