From: Yu Zhang <yu.c.zhang@linux.intel.com>
To: George Dunlap <george.dunlap@citrix.com>,
Jan Beulich <JBeulich@suse.com>
Cc: Kevin Tian <kevin.tian@intel.com>,
George Dunlap <george.dunlap@eu.citrix.com>,
Andrew Cooper <andrew.cooper3@citrix.com>,
Tim Deegan <tim@xen.org>,
xen-devel@lists.xen.org, Paul Durrant <paul.durrant@citrix.com>,
zhiyuan.lv@intel.com, JunNakajima <jun.nakajima@intel.com>
Subject: Re: [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.
Date: Fri, 24 Jun 2016 12:16:08 +0800
Message-ID: <576CB408.90104@linux.intel.com>
In-Reply-To: <b0f86fb1-d408-ac80-060b-274c7559f922@citrix.com>
On 6/23/2016 6:33 PM, George Dunlap wrote:
> On 23/06/16 08:37, Yu Zhang wrote:
>> On 6/22/2016 7:33 PM, George Dunlap wrote:
>>> On 22/06/16 11:07, Yu Zhang wrote:
>>>> On 6/22/2016 5:47 PM, George Dunlap wrote:
>>>>> On 22/06/16 10:29, Jan Beulich wrote:
>>>>>>>>> On 22.06.16 at 11:16, <george.dunlap@citrix.com> wrote:
>>>>>>> On 22/06/16 07:39, Jan Beulich wrote:
>>>>>>>>>>> On 21.06.16 at 16:38, <george.dunlap@citrix.com> wrote:
>>>>>>>>> On 21/06/16 10:47, Jan Beulich wrote:
>>>>>>>>>>>>>> And then - didn't we mean to disable that part of XenGT during
>>>>>>>>>>>>>> migration, i.e. temporarily accept the higher performance
>>>>>>>>>>>>>> overhead without the p2m_ioreq_server entries? In which case
>>>>>>>>>>>>>> flipping everything back to p2m_ram_rw after (completed or
>>>>>>>>>>>>>> canceled) migration would be exactly what we want. The (new
>>>>>>>>>>>>>> or previous) ioreq server should attach only afterwards, and
>>>>>>>>>>>>>> can then freely re-establish any p2m_ioreq_server entries it
>>>>>>>>>>>>>> deems necessary.
>>>>>>>>>>>>>>
>>>>>>>>>>>>> Well, I agree this part of XenGT should be disabled during
>>>>>>>>>>>>> migration. But in such a case I think it's the device model's
>>>>>>>>>>>>> job to trigger the p2m type flipping (i.e. by calling
>>>>>>>>>>>>> HVMOP_set_mem_type).
>>>>>>>>>>>> I agree - this would seem to be the simpler model here, despite
>>>>>>>>>>>> (as George validly says) the more consistent model would be for
>>>>>>>>>>>> the hypervisor to do the cleanup. Such cleanup would imo be
>>>>>>>>>>>> reasonable only if there was an easy way for the hypervisor to
>>>>>>>>>>>> enumerate all p2m_ioreq_server pages.
>>>>>>>>>>> Well, for me, the "easy way" means we should avoid traversing
>>>>>>>>>>> the whole ept paging structure all at once, right?
>>>>>>>>>> Yes.
>>>>>>>>> Does calling p2m_change_entry_type_global() not satisfy this
>>>>>>>>> requirement?
>>>>>>>> Not really - that addresses the "low overhead" aspect, but not the
>>>>>>>> "enumerate all such entries" one.
>>>>>>> I'm sorry, I think I'm missing something here. What do we need the
>>>>>>> enumeration for?
>>>>>> We'd need that if we were to do the cleanup in the hypervisor (as
>>>>>> we can't rely on all p2m entry re-calculation to have happened by
>>>>>> the time a new ioreq server registers for the type).
>>>>> So you're afraid of this sequence of events?
>>>>> 1) Server A de-registered, triggering a ioreq_server -> ram_rw type
>>>>> change
>>>>> 2) gfn N is marked as misconfigured
>>>>> 3) Server B registers and marks gfn N as ioreq_server
>>>>> 4) When N is accessed, the misconfiguration is resolved incorrectly to
>>>>> ram_rw
>>>>>
>>>>> But that can't happen, because misconfigured entries are resolved
>>>>> before setting a p2m entry; so at step 3, gfn N will be first set to
>>>>> (non-misconfigured) ram_rw, then changed to (non-misconfigured)
>>>>> ioreq_server.
>>>>>
>>>>> Or is there another sequence of events that I'm missing?
>>>> Thanks for your reply, George. :)
>>>> If log dirty is not triggered during this process, your sequence is
>>>> correct. However, if log dirty is triggered, we'll meet problems. I
>>>> have described this in previous mails:
>>>>
>>>> http://lists.xenproject.org/archives/html/xen-devel/2016-06/msg02426.html
>>>>
>>>> on Jun 20
>>>>
>>>> and
>>>>
>>>> http://lists.xenproject.org/archives/html/xen-devel/2016-06/msg02575.html
>>>>
>>>> on Jun 21
>>> Right -- sorry, now I see the issue:
>>>
>>> 1. Server A marks gfn X as ioreq_server
>>> 2. Server A deregisters, gfn X misconfigured
>>> 3. Server B registers, marks gfn Y as ioreq_server
>>> 4. Logdirty mode enabled; gfn Y misconfigured
>>> 5. When X or Y are accessed, resolve_misconfigure() has no way of
>>> telling whether the entry is from server A (which should be set to
>>> logdirty) or from server B (which should be left as ioreq_server).
>> Exactly. :)
>> Another, simpler scenario would be:
>> 1. Server A marks gfn X as p2m_ioreq_server;
>> 2. Logdirty mode enabled; gfn X misconfigured;
>> 3. When X is written, it will cause not an EPT violation but an EPT
>>    misconfig, and resolve_misconfig() would set gfn X back to
>>    p2m_ram_rw; thereafter we cannot track accesses to X.
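To make the simpler scenario concrete, here is a minimal sketch in C of a lazily recalculated p2m. Everything in it (the `toy_` names, the flat array, the `logdirty` flag) is a hypothetical stand-in and not Xen's actual code; the resolver is deliberately the naive one that always folds ioreq_server back to ram_rw.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for p2m types; these are NOT the real Xen definitions. */
enum toy_type { TOY_RAM_RW, TOY_RAM_LOGDIRTY, TOY_IOREQ_SERVER };

#define TOY_NR_GFNS 8

struct toy_p2m {
    enum toy_type type[TOY_NR_GFNS];
    bool misconfigured[TOY_NR_GFNS]; /* entry awaits lazy recalculation */
    bool logdirty;                   /* log-dirty mode is active */
};

/* Enabling log-dirty only flags entries; types are fixed up on access. */
static void toy_enable_logdirty(struct toy_p2m *p2m)
{
    p2m->logdirty = true;
    for (size_t i = 0; i < TOY_NR_GFNS; i++)
        p2m->misconfigured[i] = true;
}

/* Naive resolver: any misconfigured ioreq_server entry becomes ram_rw. */
static enum toy_type toy_resolve_on_access(struct toy_p2m *p2m, size_t gfn)
{
    assert(gfn < TOY_NR_GFNS);
    if (p2m->misconfigured[gfn]) {
        if (p2m->type[gfn] == TOY_IOREQ_SERVER)
            p2m->type[gfn] = TOY_RAM_RW;       /* server loses gfn here */
        else if (p2m->type[gfn] == TOY_RAM_RW && p2m->logdirty)
            p2m->type[gfn] = TOY_RAM_LOGDIRTY;
        p2m->misconfigured[gfn] = false;
    }
    return p2m->type[gfn];
}
```

In this toy model, a gfn that server A marked as ioreq_server before log-dirty was enabled comes back as ram_rw on its first access, which is the loss of tracking described in step 3.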
> Right, so this is a reason that simply making misconfigurations always
> resolve ioreq_server into ram_rw isn't compatible with logdirty.
>
>> Note: Not resetting the p2m type for p2m_ioreq_server when
>> p2m->ioreq_server is not NULL is suitable for this simpler scenario,
>> but is not correct if we take your scenario into account.
>>
>> The core reason is I could not find a simple solution in
>> resolve_misconfig() to handle both the outdated p2m_ioreq_server
>> entries and the in-use ones, and to support the logdirty feature at
>> the same time.
> Indeed; and as I said, the real problem is that
> p2m_change_entry_type_global() isn't really properly abstracted; in
> order to use it you need to know how it works and be careful not to use
> it at the wrong time.
>
> Short-term, thinking through a handful of the scenarios we want to
> support should be good enough. Long-term, making it more robust so that
> we don't have to think so hard about it is probably better.
>
>>> In a sense this is a deficiency in the change_entry_type_global()
>>> interface. A common OS principle is "make the common case fast, and the
>>> uncommon case correct". The scenario described above seems to me to be
>>> an uncommon case which is handled quickly but incorrectly; ideally we
>>> should handle it correctly, even if it's not very quick.
>>>
>>> Synchronously resolving a previous misconfig is probably the most
>>> straightforward thing to do. It could be done at point #3, when an M->N
>>> type change is not complete and a new p2m entry of type M is written; it
>>> could be at point #4, when an N->O type change is initiated while an
>>> M->N type change hasn't completed. Or it could be when an N->O type
>>> change happens while there are unfinished M->N transitions *and*
>>> post-type-change M entries.
>> Sorry, I did not quite get it. Could you please elaborate more? Thanks! :)
> Well the basic idea is to make change_entry_type_global() *appear* to
> all external callers as though the change happened immediately. And the
> basic problem is that at the moment, you can start a second
> change_entry_type_global() before the first one has actually finished
> changing all the types it was meant to change. So the improvement is to
> make sure that all the types which need to be changed actually get
> changed before the second invocation starts.
>
> The absolute simplest thing to do would be to make it actually search
> through the p2m table and make the change immediately. But we'd like to
> avoid this if we can because it's so slow.
>
> So the next simplest thing to do would be that when someone calls
> change_entry_type_global() a second time, you go through every
> misconfigured entry and resolve it, so that the new
> change_entry_type_global() starts with a clean slate. This should be
> faster than just sweeping the whole p2m table, since we only need to
> check the p2m entries that haven't been touched since we did the type
> change, but it may still be a lot of work.
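The "next simplest thing" described above can be sketched as a toy model in C. This is only an illustration under stated assumptions: the `sk_` names and the flat array are hypothetical, and the flush here scans a small array rather than walking real EPT structures. The point it shows is that flushing the in-flight change before recording a new one keeps the two changes from being confused.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy types; not the real Xen p2m definitions. */
enum sk_type { SK_RAM_RW, SK_RAM_LOGDIRTY, SK_IOREQ_SERVER };

#define SK_NR_GFNS 8

struct sk_p2m {
    enum sk_type type[SK_NR_GFNS];
    bool stale[SK_NR_GFNS]; /* entry not yet recalculated */
    bool pending;           /* a global type change is in flight */
    enum sk_type from, to;  /* the in-flight change */
};

/* Lazily apply the in-flight change to one entry. */
static void sk_resolve(struct sk_p2m *p2m, size_t gfn)
{
    assert(gfn < SK_NR_GFNS);
    if (!p2m->stale[gfn])
        return;
    if (p2m->pending && p2m->type[gfn] == p2m->from)
        p2m->type[gfn] = p2m->to;
    p2m->stale[gfn] = false;
}

/* Finish the previous change so the next one starts from a clean slate. */
static void sk_flush_pending(struct sk_p2m *p2m)
{
    for (size_t i = 0; i < SK_NR_GFNS; i++)
        sk_resolve(p2m, i);
    p2m->pending = false;
}

/* Start a new global change only after the old one is fully applied. */
static void sk_change_type_global(struct sk_p2m *p2m,
                                  enum sk_type from, enum sk_type to)
{
    sk_flush_pending(p2m);
    p2m->from = from;
    p2m->to = to;
    p2m->pending = true;
    for (size_t i = 0; i < SK_NR_GFNS; i++)
        p2m->stale[i] = true;
}

/* Writing an entry resolves it first, as described earlier in the thread. */
static void sk_set_entry(struct sk_p2m *p2m, size_t gfn, enum sk_type t)
{
    sk_resolve(p2m, gfn);
    p2m->type[gfn] = t;
}
```

With this ordering, server A's leftover entry is converted to ram_rw during the flush and then follows the ram_rw -> logdirty change, while server B's fresh entry is left as ioreq_server.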
>
> So there are other optimizations we might be able to make to try to
> avoid going through and re-syncing things; and those are the examples
> that I gave.
>
>>> For the time being though, this will fail at #4, right? That is,
>>> logdirty mode cannot be enabled while server B is registered?
>>>
>>> That does mean we'd be forced to sort out the situation before we allow
>>> logdirty and ioreq_server to be used at the same time, but that doesn't
>>> really seem like such a bad idea to me.
>> One solution I thought of is to just return failure in
>> hap_enable_log_dirty() if p2m->ioreq.server is not NULL. But I did not
>> choose such an approach, because:
>>
>> 1> I still want to keep the logdirty feature so that XenGT can use it to
>> keep track of the dirty RAM pages when we support live migration in the
>> future;
> But that's not something you can set in stone -- you can have it return
> -EBUSY now, and then at such time as you add dirty vram support for
> ioreq_server p2m types (which I don't think should be hard at all,
> actually), you can remove that restriction.
OK. Returning -EBUSY is fine with me.
In fact, I am not really worried about the tracking of ioreq_server gfns;
I was worried about the normal dirty RAM pages. But if
p2m_change_entry_type() can be enhanced in the future, I can remove that
restriction then. :)
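As a rough illustration of the -EBUSY restriction, a minimal sketch in C follows. The `eb_` structures and the function are hypothetical stand-ins for this mail only; they do not reproduce hap_enable_log_dirty() or the real p2m layout.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an ioreq server handle. */
struct eb_ioreq_server {
    int id;
};

/* Hypothetical stand-in for the per-domain p2m state. */
struct eb_p2m {
    struct eb_ioreq_server *ioreq_server; /* NULL when no server is bound */
    int log_dirty_enabled;
};

/*
 * Refuse to enable log-dirty while a server claims ioreq_server entries,
 * so the broken combination cannot be entered even by accident.
 */
static int eb_enable_log_dirty(struct eb_p2m *p2m)
{
    assert(p2m != NULL);
    if (p2m->ioreq_server != NULL)
        return -EBUSY;
    p2m->log_dirty_enabled = 1;
    return 0;
}
```

The check could be dropped later, once the type-change interface can handle both features at once.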
> The fact is that at the moment, setting logdirty *will not work*; and the
> best interface is one in which broken things cannot happen even by
> accident. Regardless of what we end up deciding wrt Xen changing the
> entries, I think that Xen should refuse to enable logdirty mode when
> there is an ioreq server registered for ioreq_server p2m entries.
>
>> 2> I also agree with Paul's argument: it is the device model's duty to
>> do the p2m type resetting work.
> But that's not really the point. We all agree that people should look
> where they're going and it's generally an individual's responsibility
> not to fall into pits or run into things in the road or on the sidewalk.
> But that doesn't mean that it's therefore OK to dig a pit with spikes
> at the bottom in the middle of where people normally walk or cycle.
> Because even though it is an individual person's responsibility not to
> walk into holes, occasionally people are distracted or don't see well or
> make mistakes; and the consequences for being temporarily distracted
> should never be "falling onto a bed of sharp spikes". :-) If you do
> have to dig a hole in the sidewalk, then at very least you need to put a
> physical barrier around it, so that the consequences for being
> distracted are "runs into a barrier" rather than "falls into a pit".
> But the best of all, if you can manage it, is not to dig the hole at all.
>
> Similarly, even if it could be in theory the device model's duty to
> reset the p2m entries it changed, it's still the case that programmers
> make mistakes. When those mistakes happen, we at very least want it to
> be as easy to figure out what the problem is as possible; and if we can,
> we want to make those mistakes completely harmless.
>
> Making it possible for ioreq server A's entries to remain outstanding
> after ioreq server B connects is the programming equivalent of leaving a
> big open hole in the middle of a sidewalk: it means that when there's a
> mistake made, there aren't any obvious immediate failures that tell you,
> "Server A forgot to release some entries". Instead, you will get random
> failures, as bits of memory behave strangely or run very slowly for no
> apparent reason.
>
> Having it impossible to connect ioreq server B if there are still
> outstanding entries from ioreq server A is the equivalent of digging a
> hole and then putting up a barrier. Now you get a failure when you try
> it, and you're told exactly what the problem is -- the last guy didn't
> release all his entries. If you don't have the code you're still a bit
> stuck, but at least you know it just doesn't work, rather than failing
> in mysterious and difficult to detect ways later.
>
> Having Xen reset the entries just makes this entire problem go away --
> it's like accessing whatever you needed to access by digging sideways
> from the storm drains, rather than digging a hole in the sidewalk.
>
> It's all well and good to say, "It's the device model's responsibility",
> but we need to plan on programmers making mistakes. (And we also need
> to plan for administrators making mistakes, which is why I think
> returning -EBUSY when you try to enable logdirty with
Hah. This is a very good metaphor. I am convinced. :)
Though I have doubts about how to refactor the
p2m_change_entry_type_global() interface,
I'm now willing to take your suggestions:
a> we still need the p2m resetting when an ioreq server is unbound;
b> disable the log dirty feature if an ioreq server is bound.
Does anyone else have a different opinion? Thanks!
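Point a> can be sketched in C as below. This is an assumption-laden toy: the `ub_` names are hypothetical, and it resets entries with an eager sweep only because that is the shortest thing to write down; the thread's preference is to avoid a full sweep in the real hypervisor.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical toy types; not the real Xen p2m definitions. */
enum ub_type { UB_RAM_RW, UB_IOREQ_SERVER };

#define UB_NR_GFNS 8

struct ub_p2m {
    enum ub_type type[UB_NR_GFNS];
    int bound_server; /* 0 means no ioreq server is bound */
};

/* Bind a server to the ioreq_server type; only one at a time. */
static int ub_bind(struct ub_p2m *p2m, int id)
{
    assert(id != 0);
    if (p2m->bound_server != 0)
        return -EBUSY;
    p2m->bound_server = id;
    return 0;
}

/*
 * Unbind the server and reset every entry it left behind, so the next
 * server never inherits stale ioreq_server pages. Returns the number of
 * entries that were reset.
 */
static size_t ub_unbind(struct ub_p2m *p2m)
{
    size_t reset = 0;

    for (size_t i = 0; i < UB_NR_GFNS; i++) {
        if (p2m->type[i] == UB_IOREQ_SERVER) {
            p2m->type[i] = UB_RAM_RW;
            reset++;
        }
    }
    p2m->bound_server = 0;
    return reset;
}
```

Because the reset happens inside the unbind path, a device model that forgets to release its pages cannot affect the server that binds next.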
>>> I'm still open to being convinced, but at the moment it really seems to
>>> me like improving the situation is the better long-term option.
>>>
>> Thanks for all your advice, George. I'm also willing to take other
>> advice; if we have a more acceptable (for you, Jan and other
>> maintainers) resync approach in the hypervisor, I'd like to add it. If
>> the code is too complicated, I can submit it in a separate new
>> patchset. :)
> Well I think sometime in early July I should be able to make some time
> to take a look at it properly. Maybe I can start with a "draft" patch,
> and you can take it and fix it up and make it work. Or maybe I'll find
> it's actually too complicated, and then agree with you that relying on
> the server to clean up after itself is the only option. :-)
Thank you, George. I would definitely like to take on this work.
And before that, I think disabling log dirty could be OK for me (after
all, making vGPU live-migratable requires more features to be added).
Yu