From: Yu Zhang <yu.c.zhang@linux.intel.com>
To: George Dunlap <george.dunlap@citrix.com>,
	George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Kevin Tian <kevin.tian@intel.com>,
	Jan Beulich <jbeulich@suse.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>,
	Paul Durrant <paul.durrant@citrix.com>,
	"Lv, Zhiyuan" <zhiyuan.lv@intel.com>,
	Jun Nakajima <jun.nakajima@intel.com>
Subject: Re: [PATCH v10 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.
Date: Thu, 6 Apr 2017 02:02:07 +0800	[thread overview]
Message-ID: <58E5311F.2080205@linux.intel.com> (raw)
In-Reply-To: <58E52922.1030500@linux.intel.com>



On 4/6/2017 1:28 AM, Yu Zhang wrote:
>
>
> On 4/6/2017 1:18 AM, Yu Zhang wrote:
>>
>>
>> On 4/6/2017 1:01 AM, George Dunlap wrote:
>>> On 05/04/17 17:32, Yu Zhang wrote:
>>>>
>>>> On 4/6/2017 12:35 AM, George Dunlap wrote:
>>>>> On 05/04/17 17:22, Yu Zhang wrote:
>>>>>> On 4/5/2017 10:41 PM, George Dunlap wrote:
>>>>>>> On Sun, Apr 2, 2017 at 1:24 PM, Yu Zhang 
>>>>>>> <yu.c.zhang@linux.intel.com>
>>>>>>> wrote:
>>>>>>>> After an ioreq server has unmapped, the remaining p2m_ioreq_server
>>>>>>>> entries need to be reset back to p2m_ram_rw. This patch does this
>>>>>>>> asynchronously with the current p2m_change_entry_type_global()
>>>>>>>> interface.
>>>>>>>>
>>>>>>>> A new field, entry_count, is introduced in struct p2m_domain to
>>>>>>>> record the number of p2m_ioreq_server p2m page table entries. One
>>>>>>>> property of these entries is that they only point to 4K sized page
>>>>>>>> frames, because all p2m_ioreq_server entries originate from
>>>>>>>> p2m_ram_rw ones in p2m_change_type_one(). We therefore do not
>>>>>>>> need to worry about counting 2M/1G sized pages.
>>>>>>> Assuming that all p2m_ioreq_server entries are *created* by
>>>>>>> p2m_change_type_one() may be valid, but can you assume that they
>>>>>>> are only ever *removed* by p2m_change_type_one() (or
>>>>>>> recalculation)?
>>>>>>>
>>>>>>> What happens, for instance, if a guest balloons out one of the ram
>>>>>>> pages?  I don't immediately see anything preventing a 
>>>>>>> p2m_ioreq_server
>>>>>>> page from being ballooned out, nor anything on the
>>>>>>> decrease_reservation() path decreasing p2m->ioreq.entry_count.  
>>>>>>> Or did
>>>>>>> I miss something?
>>>>>>>
>>>>>>> Other than that, only one minor comment...
>>>>>> Thanks for your thorough consideration, George. But I do not think
>>>>>> we need to worry about this:
>>>>>>
>>>>>> If the emulation is in progress, the balloon driver cannot get a
>>>>>> p2m_ioreq_server page - because it is already allocated.
>>>>> In theory, yes, the guest *shouldn't* do this.  But what if the 
>>>>> guest OS
>>>>> makes a mistake?  Or, what if the ioreq server makes a mistake and
>>>>> places a watch on a page that *isn't* allocated by the device 
>>>>> driver, or
>>>>> forgets to change a page type back to ram when the device driver 
>>>>> frees
>>>>> it back to the guest kernel?
>>>> Then the lazy p2m change code will be triggered, and this page is 
>>>> reset
>>>> to p2m_ram_rw
>>>> before being set to p2m_invalid, just like the normal path. Will 
>>>> this be
>>>> a problem?
>>> No, I'm talking about before the ioreq server detaches.
>> Sorry, I do not get it. Take scenario 1 for example:
>>> Scenario 1: Bug in driver
>>> 1. Guest driver allocates page A
>>> 2. dm marks A as p2m_ioreq_server
>> Here in step 2. the ioreq.entry_count increases;
>>> 3. Guest driver accidentally frees A to the kernel
>>> 4. guest kernel balloons out page A; ioreq.entry_count is wrong
>>
>> Here in step 4. the ioreq.entry_count decreases.
>
> Oh, I figured it out. This entry is not invalidated yet if the ioreq
> server is not unmapped. Sorry.
>
>> Isn't this what we are expecting?
>>
>> Yu
>>>
>>> Scenario 2: Bug in the kernel
>>> 1. Guest driver allocates page A
>>> 2. dm marks A as p2m_ioreq_server
>>> 3. Guest kernel tries to balloon out page B, but makes a calculation
>>> mistake and balloons out A instead; now ioreq.entry_count is wrong
>>>
>>> Scenario 3: Off-by-one bug in devicemodel
>>> 1. Guest driver allocates pages A-D
>>> 2. dm makes a mistake and marks pages A-E as p2m_ioreq_server (one 
>>> extra
>>> page)
>>> 3. guest kernel balloons out page E; now ioreq.entry_count is wrong
>>>
>>> Scenario 4: "Leak" in devicemodel
>>> 1. Guest driver allocates page A
>>> 2. dm marks A as p2m_ioreq_server
>>> 3. Guest driver is done with page A, but DM forgets to reset it to
>>> p2m_ram_rw
>>> 4. Guest driver frees A to guest kernel
>>> 5. Guest kernel balloons out page A; now ioreq.entry_count is wrong
>>>
>>> I could keep going on; there are *lots* of bugs in the driver, the
>>> kernel, or the devicemodel which could cause pages marked
>>> p2m_ioreq_server to end up being ballooned out; which under the current
>>> code would make ioreq.entry_count wrong.
>>>
>>> It's the hypervisor's job to do the right thing even when other
>>> components have bugs in them.  This is why I initially suggested 
>>> keeping
>>> count in atomic_write_ept_entry() -- no matter how the entry is 
>>> changed,
>>> we always know exactly how many entries of type p2m_ioreq_server we 
>>> have.
>>>
>
> Well, counting in atomic_write_ept_entry() only works for EPT. Besides,
> it requires interface changes - we would need to pass the p2m.
> Another thought: in XenGT, PoD is already disabled to make sure
> gfn->mfn does not change. So how about we disable ballooning if
> ioreq.entry_count is not 0?

Or maybe just change p2m_ioreq_server entries to p2m_ram_rw before they
are set to p2m_invalid? Like the code below:

diff --git a/xen/common/memory.c b/xen/common/memory.c
index 7dbddda..40e5f63 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -288,6 +288,10 @@ int guest_remove_page(struct domain *d, unsigned long gmfn)
         put_gfn(d, gmfn);
         return 1;
     }
+
+    if ( unlikely(p2mt == p2m_ioreq_server) )
+        p2m_change_type_one(d, gmfn,
+                            p2m_ioreq_server, p2m_ram_rw);
#else
     mfn = gfn_to_mfn(d, _gfn(gmfn));
#endif

Yu

>
> Yu
>>>   -George
>>>
>>>
>>
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xen.org
>> https://lists.xen.org/xen-devel
>
>


