From: "Chen, Tiejun" <tiejun.chen@intel.com>
To: Wei Liu <wei.liu2@citrix.com>
Cc: kevin.tian@intel.com, ian.campbell@citrix.com,
	andrew.cooper3@citrix.com, tim@xen.org, xen-devel@lists.xen.org,
	stefano.stabellini@citrix.com, JBeulich@suse.com,
	yang.z.zhang@intel.com, Ian.Jackson@eu.citrix.com
Subject: Re: [RFC][PATCH 04/13] tools/libxl: detect and avoid conflicts with RDM
Date: Tue, 19 May 2015 14:47:05 +0800
Message-ID: <555ADC69.5060605@intel.com>
In-Reply-To: <20150518200017.GA9893@zion.uk.xensource.com>

On 2015/5/19 4:00, Wei Liu wrote:
> On Thu, May 14, 2015 at 04:27:45PM +0800, Chen, Tiejun wrote:
>> On 2015/5/11 19:32, Wei Liu wrote:
>>> On Mon, May 11, 2015 at 04:09:53PM +0800, Chen, Tiejun wrote:
>>>> On 2015/5/8 22:43, Wei Liu wrote:
>>>>> Sorry for the late review. This series fell through the cracks.
>>>>>
>>>>
>>>> Thanks for your review.
>>>>
>>>>> On Fri, Apr 10, 2015 at 05:21:55PM +0800, Tiejun Chen wrote:
>>>>>> While building a VM, the HVM domain builder provides struct hvm_info_table{}
>>>>>> to help hvmloader. Currently it includes two fields that hvmloader uses to
>>>>>> construct the guest e820 table, low_mem_pgend and high_mem_pgend, so we
>>>>>> should check them to fix any conflict with RAM.
>>>>>>
>>>>>> RMRR can theoretically reside in address space beyond 4G, but we have
>>>>>> never seen this in the real world. So in order to avoid breaking the highmem layout
>>>>>
>>>>> How does this break highmem layout?
>>>>
>>>> In most cases highmem is contiguous, like [4G, ...], while an RMRR beyond
>>>> 4G is theoretically possible but very rare. In particular, we have never
>>>> seen it happen in the real world, so we don't want such a case to break
>>>> the highmem layout.
>>>>
>>>
>>> The problem is that taking this approach just because it rarely happens
>>> *now* is not future proof. It needs to be clearly documented somewhere
>>> in the manual (or any other Intel docs) and be referenced in the code.
>>> Otherwise we might end up in a situation where no-one knows how it is
>>> supposed to work and no-one can fix it if it breaks in the future, that
>>> is, if every single device on earth suddenly requires RMRR > 4G (I'm
>>> exaggerating).
>>>
>>> Or you can just make it work with highmem. How much more work do you
>>> envisage?
>>>
>>> (If my above comment makes no sense do let me know. I only have a very
>>> shallow understanding of RMRR.)
>>
>> Maybe I'm misleading you :)
>>
>> I don't mean that RMRR > 4G is not allowed to work in our implementation.
>> What I'm saying is that our *policy* is deliberately simple for this kind
>> of rare highmem case...
>>
>
> Yes, but this policy is hardcoded (as in, you bail out when detecting a
> conflict in the highmem region). I don't think we have the expertise to
> un-hardcode it in the future (it might require x86-specific insider
> information and specific hardware to test), so I would like to make it as
> future proof as possible.
>
> I know you're limited by hvm_info. I would accept this hardcoded policy
> as a short-term solution, but I would like a commitment from Intel to
> maintain this piece of code and to properly work out a flexible solution
> for the >4G case.

Looks like Kevin has replied to this.

>
>>>
>>>>>
>>>>>> we don't solve highmem conflicts. Note this means a highmem RMRR could
>>>>>> still be supported if there is no conflict.
>>>>>>
>>
>> Like these two sentences above.
>>
>>>>>
>>>>> Aren't you actively trying to avoid conflict in libxl?
>>>>
>>>> RMRR is fixed by the BIOS so we can't avoid conflicts. Here we want to
>>>> adopt some good policies for handling RMRR. In the case of highmem,
>>>> shouldn't that simple policy be enough?
>>>>
>>>
>>> Whatever policy you and HV maintainers agree on. Just clearly document
>>> it.
>>
>> Do you mean I should turn this patch description into a separate
>> document?
>>
>
> Either in code comment or in a separate document.

Okay.

>
>>>
>>>>>
>
> [...]
>
>>>> The caller of xc_device_get_rdm always frees this.
>>>>
>>>
>>> I think I misunderstood how this function works. I thought xrdm was
>>> passed in by the caller, which is clearly not the case. Sorry!
>>>
>>>
>>> In that case, the `if ( rc < 0 )' is not needed because the call should
>>> always return rc < 0. An assert is good enough.
>>
>> assert(rc < 0)? But we can't presume the caller always passes '0' first,
>> and additionally, we may have no RMRR at all.
>>
>> So I guess what you want is,
>>
>> assert( rc <= 0 );
>> if ( !rc )
>>      goto xrdm;
>>
>
> Yes I was thinking about something like this.
>
> How in this case can rc equal 0? Is it because you don't actually have any

Yes.

> regions? If so, please write a comment.
>

Sure.

>> if ( errno == ENOBUFS )
>> ...
>>
>> Right?
>>
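To make the calling convention concrete, here is a self-contained sketch in
plain C of the flow we are converging on. Note that get_rdm() is only a toy
stand-in I made up for the real libxc wrapper, and the canned region is
invented data, so treat all the names and values as illustrative:

#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

struct rdm_entry { uint64_t start_pfn, nr_pages; };

/* Toy stand-in for the real libxc wrapper: it reports the required
 * entry count via *nr and fails with errno == ENOBUFS when the
 * caller's buffer is too small; on a platform with no RDM entries
 * at all it would instead return 0 straight away. */
static int get_rdm(struct rdm_entry *buf, unsigned int *nr)
{
    static const struct rdm_entry table[] = { { 0xad800, 0x10 } };
    const unsigned int have = sizeof(table) / sizeof(table[0]);

    if (buf == NULL || *nr < have) {
        *nr = have;
        errno = ENOBUFS;              /* caller must enlarge and retry */
        return -1;
    }
    for (unsigned int i = 0; i < have; i++)
        buf[i] = table[i];
    *nr = have;
    return 0;
}

int main(void)
{
    unsigned int nr = 0;
    struct rdm_entry *xrdm;
    int rc = get_rdm(NULL, &nr);      /* first pass: size query only */

    assert(rc <= 0);
    if (rc == 0)                      /* no RDM regions: nothing to do */
        return 0;
    if (errno != ENOBUFS)             /* genuine failure */
        return 1;

    xrdm = calloc(nr, sizeof(*xrdm)); /* second pass: fetch the entries */
    if (!xrdm || get_rdm(xrdm, &nr) < 0)
        return 1;
    printf("%u region(s), first at pfn 0x%" PRIx64 "\n",
           nr, xrdm[0].start_pfn);
    free(xrdm);                       /* the caller always frees this */
    return 0;
}
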
>>>
>
> [...]
>
>>>>>
>>>>>> +    memset(d_config->rdms, 0, sizeof(libxl_device_rdm));
>>>>>> +
>>>>>> +    /* Query all RDM entries in this platform */
>>>>>> +    if (type == LIBXL_RDM_RESERVE_TYPE_HOST) {
>>>>>> +        d_config->num_rdms = nr_all_rdms;
>>>>>> +        for (i = 0; i < d_config->num_rdms; i++) {
>>>>>> +            d_config->rdms[i].start =
>>>>>> +                                (uint64_t)xrdm[i].start_pfn << XC_PAGE_SHIFT;
>>>>>> +            d_config->rdms[i].size =
>>>>>> +                                (uint64_t)xrdm[i].nr_pages << XC_PAGE_SHIFT;
>>>>>> +            d_config->rdms[i].flag = d_config->b_info.rdm.reserve;
>>>>>> +        }
>>>>>> +    } else {
>>>>>> +        d_config->num_rdms = 0;
>>>>>> +    }
>>>>>
>>>>> And you should move d_config->rdms = libxl__calloc inside that `if'.
>>>>> That is, don't allocate memory if you don't need it.
>>>>
>>>> We can't, since in all cases we need to preallocate this; we then
>>>> handle it according to our policy.
>>>>
>>>
>>> How would it ever be used again if you set d_config->num_rdms to 0? How
>>> do you know the exact size of your array again?
>>
>> If we don't consider the case where multiple devices share one RDM entry,
>> our workflow can be described as follows:
>>
>> #1. We always preallocate all of rdms[], but zeroed via memset().
>
> Because the number of RDM entries used in a domain never exceeds the
> number of global entries? I.e. we never risk overrunning the array?

Yes.

"global" indicates the number would count all rdm entries. If w/o 
"global" flag, we just count these rdm entries associated to those pci 
devices assigned to this domain. So this means actually we have this 
equation,

The number without "global" <= The number with "global"
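
In code form, that sizing argument is just the allocation the patch already 
performs, roughly (quoting from memory, so the exact line may differ):

    /* Any per-domain subset is at most nr_all_rdms entries long, so
     * sizing the array once for the platform-wide count can never
     * overrun, whichever policy later fills it: */
    d_config->rdms = libxl__calloc(gc, nr_all_rdms,
                                   sizeof(libxl_device_rdm));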

>
>> #2. Then we have two cases for the global rule:
>>
>> #2.1 If we really need all entries according to the global rule, we fill
>> all of rdms[] with the real RDM info and set d_config->num_rdms.
>> #2.2 If there is no such global rule requesting all of them, we just
>> clear d_config->num_rdms.
>>
>
> No, don't do this. In any failure path, if the free / dispose function
> depends on num_rdms to iterate through the whole list to dispose memory
> (if your rdm structure later contains pointers to allocated memory),
> this method leaks memory.
>
> The num_ field should always reflect the actual size of the array.
>
>> #3. Then for the per-device rule:
>>
>> #3.1 Following #2.1, we only need to check whether we should change a
>> given RDM entry's policy when that entry is associated with this device.
>> #3.2 Following #2.2, we just fill rdms[] one by one. Of course, it is
>> quite possible that we don't fill all of rdms[], since the passed-through
>> devices may have no RDM at all or may only occupy a few entries. But
>> either way, we finally sync d_config->num_rdms.
>>
>
> Sorry I don't follow. But if your problem is you don't know how many
> entries you actually need, just use libxl__realloc?
>
>>>
>>>>>
>>>>>> +    free(xrdm);
>>>>>> +
>>>>>> +    /* Query RDM entries per-device */
>>>>>> +    for (i = 0; i < d_config->num_pcidevs; i++) {
>>>>>> +        unsigned int nr_entries = 0;
>>>>
>>>> Maybe I should restate this,
>>>> 	unsigned int nr_entries;
>>>>
>
> [...]
>
>>>>
>>>> As I replied above, we always preallocate this at the beginning.
>>>>
>>>
>>> Ah, OK.
>>>
>>> But please don't do this. It's hard to see that you don't overrun the
>>> buffer. Please allocate memory only when you need it.
>>
>> Sorry, I don't understand this. As I mentioned above, we don't know how
>> many RDM entries we really need to allocate, so that is why we'd like to
>> preallocate all of rdms[] at the beginning. That covers all cases: global
>> policy, (global policy & per device), and per device only. Even when
>> multiple devices share one RDM entry, we also need to avoid duplicating
>> it into a new entry...
>>
>
> Can you use libxl__realloc?
>

Looks like this can expand the size automatically each time, right? If so 
I think we can try it.
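
If it helps, here is the shape I have in mind as a plain-C model (using the
standard realloc() as a stand-in for libxl__realloc, since I haven't
double-checked its exact signature), which also keeps num_rdms in sync with
the allocation as you asked:

#include <stdint.h>
#include <stdlib.h>

struct rdm { uint64_t start, size; uint32_t flag; };

/* Grow the array one entry at a time and bump the count only after the
 * memory exists, so num_rdms always matches what is actually allocated
 * and any failure path can still dispose of the array safely. */
static int add_rdm(struct rdm **rdms, unsigned int *num_rdms,
                   uint64_t start, uint64_t size, uint32_t flag)
{
    struct rdm *tmp = realloc(*rdms, (*num_rdms + 1) * sizeof(**rdms));

    if (!tmp)
        return -1;        /* old array is untouched and still freeable */
    tmp[*num_rdms] = (struct rdm){ .start = start, .size = size,
                                   .flag = flag };
    *rdms = tmp;
    (*num_rdms)++;        /* the count never exceeds live entries */
    return 0;
}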

Thanks
Tiejun
