* Converting heap page_infos to contiguous virtual
@ 2016-07-13 19:44 Boris Ostrovsky
  2016-07-13 20:02 ` Andrew Cooper
  0 siblings, 1 reply; 20+ messages in thread
From: Boris Ostrovsky @ 2016-07-13 19:44 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Andrew Cooper

I would like to clear a bunch of Xen heap pages at once (i.e. not
page-by-page).

Greatly simplifying things, let's say I grab (in common/page_alloc.c)
    pg = page_list_remove_head(&heap(node, zone, order));

and then

    mfn_t mfn = _mfn(page_to_mfn(pg));
    char *va = mfn_to_virt(mfn_x(mfn));
    memset(va, 0, 4096 * (1 << order));


Would it be valid to do this? Do I need to account for the PDX hole?


Thanks.
-boris






* Re: Converting heap page_infos to contiguous virtual
  2016-07-13 19:44 Converting heap page_infos to contiguous virtual Boris Ostrovsky
@ 2016-07-13 20:02 ` Andrew Cooper
  2016-07-13 20:17   ` Boris Ostrovsky
  2016-08-01 12:09   ` Jan Beulich
  0 siblings, 2 replies; 20+ messages in thread
From: Andrew Cooper @ 2016-07-13 20:02 UTC (permalink / raw)
  To: Boris Ostrovsky, xen-devel; +Cc: George Dunlap

On 13/07/16 20:44, Boris Ostrovsky wrote:
> I would like to clear a bunch of Xen heap pages at once (i.e. not
> page-by-page).
>
> Greatly simplifying things, let's say I grab (in common/page_alloc.c)
>     pg = page_list_remove_head(&heap(node, zone, order)
>
> and then
>
>     mfn_t mfn =
> _mfn(page_to_mfn(pg));                                        
>     char *va = mfn_to_virt(mfn_x(mfn));
>     memset(va, 0, 4096 * (1 << order));
>
>
> Would it be valid to this?

In principle, yes.  The frame_table is in order.

However, mfn_to_virt() will blow up for RAM above the 5TB boundary.  You
need to map_domain_page() to get a mapping.
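
Something along these lines, perhaps (rough, untested sketch; the helpers
exist, but double-check the exact signatures):

    /* Clear 2^order frames one page at a time via map_domain_page(),
     * which works even for RAM above the directmap limit. */
    unsigned int i;

    for ( i = 0; i < (1u << order); i++ )
    {
        void *p = map_domain_page(_mfn(mfn_x(mfn) + i));

        clear_page(p);
        unmap_domain_page(p);
    }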

>  Do I need to account for the PDX hole?

Jan is probably the best person to ask about this, but I am fairly sure
there are lurking dragons here.

PDX compression is used to reduce the size of the frametable when there
are large unused ranges of mfns.  Without paying attention to the PDX
shift, you don't know where the discontinuities lie.

However, because the PDX shift is an aligned power of two, there are
likely to be struct page_info*'s in the frame_table which don't point at
real RAM, and won't have a virtual mapping even in the directmap.

~Andrew


* Re: Converting heap page_infos to contiguous virtual
  2016-07-13 20:02 ` Andrew Cooper
@ 2016-07-13 20:17   ` Boris Ostrovsky
  2016-07-13 20:28     ` Boris Ostrovsky
  2016-07-13 20:34     ` Andrew Cooper
  2016-08-01 12:09   ` Jan Beulich
  1 sibling, 2 replies; 20+ messages in thread
From: Boris Ostrovsky @ 2016-07-13 20:17 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel; +Cc: George Dunlap

On 07/13/2016 04:02 PM, Andrew Cooper wrote:
> On 13/07/16 20:44, Boris Ostrovsky wrote:
>> I would like to clear a bunch of Xen heap pages at once (i.e. not
>> page-by-page).
>>
>> Greatly simplifying things, let's say I grab (in common/page_alloc.c)
>>     pg = page_list_remove_head(&heap(node, zone, order)
>>
>> and then
>>
>>     mfn_t mfn =
>> _mfn(page_to_mfn(pg));                                        
>>     char *va = mfn_to_virt(mfn_x(mfn));
>>     memset(va, 0, 4096 * (1 << order));
>>
>>
>> Would it be valid to this?
> In principle, yes.  The frame_table is in order.
>
> However, mfn_to_virt() will blow up for RAM above the 5TB boundary.  You
> need to map_domain_page() to get a mapping.

Right, but that would mean going page-by-page, which I want to avoid.

Now, DIRECTMAP_SIZE is ~128TB (if my math is correct) --- doesn't it
imply that it maps this big a range contiguously (modulo PDX hole)?

>
>>  Do I need to account for the PDX hole?
> Jan is probably the best person to ask about this, but I am failure sure
> there are lurking dragons here.
>
> PDX compression is used to reduce the size of the frametable when there
> are large unused ranges of mfns.  Without paying attention to the PDX
> shift, you don't know where the discontinuities lie.
>
> However, because the PDX shift is an aligned power of two, there are
> likely to be struct page_info*'s in the frame_table which don't point at
> real RAM, and won't have a virtual mapping even in the directmap.

So I would be OK with finding which mfn of my range points to the
beginning of the hole and breaking the mfn range into two sections --- one
below the hole and one above --- in the hope that both ranges can be
mapped contiguously, something I don't know to be true.
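
Roughly what I have in mind, as an untested sketch (treating mfn and nr as
plain frame numbers, and assuming mfn_to_virt() is usable across the whole
range, i.e. everything is below the directmap limit):

    /* Clear [mfn, mfn + nr) in maximal chunks that are contiguous in the
     * directmap, splitting wherever the virtual addresses stop lining up
     * (e.g. at the PDX hole). */
    unsigned long start = mfn, cur;

    for ( cur = mfn + 1; cur <= mfn + nr; cur++ )
    {
        if ( cur == mfn + nr ||
             (char *)mfn_to_virt(cur) !=
             (char *)mfn_to_virt(cur - 1) + PAGE_SIZE )
        {
            memset(mfn_to_virt(start), 0, (cur - start) * PAGE_SIZE);
            start = cur;
        }
    }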

-boris





* Re: Converting heap page_infos to contiguous virtual
  2016-07-13 20:17   ` Boris Ostrovsky
@ 2016-07-13 20:28     ` Boris Ostrovsky
  2016-07-13 20:34     ` Andrew Cooper
  1 sibling, 0 replies; 20+ messages in thread
From: Boris Ostrovsky @ 2016-07-13 20:28 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel; +Cc: George Dunlap

On 07/13/2016 04:17 PM, Boris Ostrovsky wrote:
> On 07/13/2016 04:02 PM, Andrew Cooper wrote:
>> On 13/07/16 20:44, Boris Ostrovsky wrote:
>>> I would like to clear a bunch of Xen heap pages at once (i.e. not
>>> page-by-page).
>>>
>>> Greatly simplifying things, let's say I grab (in common/page_alloc.c)
>>>     pg = page_list_remove_head(&heap(node, zone, order)
>>>
>>> and then
>>>
>>>     mfn_t mfn =
>>> _mfn(page_to_mfn(pg));                                        
>>>     char *va = mfn_to_virt(mfn_x(mfn));
>>>     memset(va, 0, 4096 * (1 << order));
>>>
>>>
>>> Would it be valid to this?
>> In principle, yes.  The frame_table is in order.
>>
>> However, mfn_to_virt() will blow up for RAM above the 5TB boundary.  You
>> need to map_domain_page() to get a mapping.
> Right, but that would mean going page-by-page, which I want to avoid.
>
> Now, DIRECTMAP_SIZE is ~128TB (if my math is correct) --- doesn't it
> imply that it maps this big a range contiguously (modulo PDX hole)?
>
>>>  Do I need to account for the PDX hole?
>> Jan is probably the best person to ask about this, but I am failure sure
>> there are lurking dragons here.
>>
>> PDX compression is used to reduce the size of the frametable when there
>> are large unused ranges of mfns.  Without paying attention to the PDX
>> shift, you don't know where the discontinuities lie.
>>
>> However, because the PDX shift is an aligned power of two, there are
>> likely to be struct page_info*'s in the frame_table which don't point at
>> real RAM, and won't have a virtual mapping even in the directmap.
> So I would be OK with finding which mfn of my range points to beginning
> of the hole and break the mfn range into two sections --- one below the
> hole and one above. With hope that both ranges can be mapped
> contiguously --- something that I don't know whether is true.


In fact, I am OK with breaking the range into many chunks. My goal is to
be able to map something like a few megabytes (maybe a few tens of
megabytes at most) contiguously.

-boris



* Re: Converting heap page_infos to contiguous virtual
  2016-07-13 20:17   ` Boris Ostrovsky
  2016-07-13 20:28     ` Boris Ostrovsky
@ 2016-07-13 20:34     ` Andrew Cooper
  2016-07-13 20:57       ` Boris Ostrovsky
  1 sibling, 1 reply; 20+ messages in thread
From: Andrew Cooper @ 2016-07-13 20:34 UTC (permalink / raw)
  To: Boris Ostrovsky, xen-devel; +Cc: George Dunlap

On 13/07/2016 21:17, Boris Ostrovsky wrote:
> On 07/13/2016 04:02 PM, Andrew Cooper wrote:
>> On 13/07/16 20:44, Boris Ostrovsky wrote:
>>> I would like to clear a bunch of Xen heap pages at once (i.e. not
>>> page-by-page).
>>>
>>> Greatly simplifying things, let's say I grab (in common/page_alloc.c)
>>>     pg = page_list_remove_head(&heap(node, zone, order)
>>>
>>> and then
>>>
>>>     mfn_t mfn =
>>> _mfn(page_to_mfn(pg));                                        
>>>     char *va = mfn_to_virt(mfn_x(mfn));
>>>     memset(va, 0, 4096 * (1 << order));
>>>
>>>
>>> Would it be valid to this?
>> In principle, yes.  The frame_table is in order.
>>
>> However, mfn_to_virt() will blow up for RAM above the 5TB boundary.  You
>> need to map_domain_page() to get a mapping.
> Right, but that would mean going page-by-page, which I want to avoid.
>
> Now, DIRECTMAP_SIZE is ~128TB (if my math is correct) --- doesn't it
> imply that it maps this big a range contiguously (modulo PDX hole)?

Your maths is correct, and yet you will end up with problems if you
trust it.

That is the magic mode for the idle and monitor pagetables.  In the
context of a 64bit PV guest, the cutoff is at 5TB, at which point you
venture into the virtual address space reserved for guest kernel use. 
(It is rather depressing that the 64bit PV guest ABI is the factor
limiting Xen's maximum RAM usage.)

>>>  Do I need to account for the PDX hole?
>> Jan is probably the best person to ask about this, but I am failure sure
>> there are lurking dragons here.
>>
>> PDX compression is used to reduce the size of the frametable when there
>> are large unused ranges of mfns.  Without paying attention to the PDX
>> shift, you don't know where the discontinuities lie.
>>
>> However, because the PDX shift is an aligned power of two, there are
>> likely to be struct page_info*'s in the frame_table which don't point at
>> real RAM, and won't have a virtual mapping even in the directmap.
> So I would be OK with finding which mfn of my range points to beginning
> of the hole and break the mfn range into two sections --- one below the
> hole and one above. With hope that both ranges can be mapped
> contiguously --- something that I don't know whether is true.

If you have a struct page_info * in your hand, and know from the E820
where the next non RAM boundary is, I think you should be safe to clear
memory over a contiguous range of the directmap.  There shouldn't be any
discontinuities over that range.

Be aware of memory_guard() though which does shoot holes in the
directmap.  However, only allocated pages should be guarded, so you
should never be in the position of scrubbing pages with a missing
mapping in the directmap.  For RAM above the 5TB boundary,
map_domain_page() will DTRT, but we might want to see about making a
variant which can make mappings spanning more than 4k.
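
A rough illustration of what such a variant could look like, built on the
existing vmap() machinery (untested; the function name is invented, and it
allocates memory itself, so it probably isn't usable as-is from the depths
of the allocator):

    /* Build a contiguous virtual mapping of 2^order frames, independent of
     * the directmap/PDX layout.  The caller vunmap()s the result. */
    static void *map_frames_for_scrub(mfn_t mfn, unsigned int order)
    {
        unsigned int i, nr = 1u << order;
        mfn_t *mfns = xmalloc_array(mfn_t, nr);
        void *va;

        if ( !mfns )
            return NULL;

        for ( i = 0; i < nr; i++ )
            mfns[i] = _mfn(mfn_x(mfn) + i);

        va = vmap(mfns, nr);
        xfree(mfns);

        return va;
    }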

(You probably want someone else to sanity check my logic here.)

~Andrew


* Re: Converting heap page_infos to contiguous virtual
  2016-07-13 20:34     ` Andrew Cooper
@ 2016-07-13 20:57       ` Boris Ostrovsky
  2016-07-13 21:06         ` Andrew Cooper
  2016-07-14 10:25         ` George Dunlap
  0 siblings, 2 replies; 20+ messages in thread
From: Boris Ostrovsky @ 2016-07-13 20:57 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel; +Cc: George Dunlap

On 07/13/2016 04:34 PM, Andrew Cooper wrote:
> On 13/07/2016 21:17, Boris Ostrovsky wrote:
>> On 07/13/2016 04:02 PM, Andrew Cooper wrote:
>>> On 13/07/16 20:44, Boris Ostrovsky wrote:
>>>> I would like to clear a bunch of Xen heap pages at once (i.e. not
>>>> page-by-page).
>>>>
>>>> Greatly simplifying things, let's say I grab (in common/page_alloc.c)
>>>>     pg = page_list_remove_head(&heap(node, zone, order)
>>>>
>>>> and then
>>>>
>>>>     mfn_t mfn =
>>>> _mfn(page_to_mfn(pg));                                        
>>>>     char *va = mfn_to_virt(mfn_x(mfn));
>>>>     memset(va, 0, 4096 * (1 << order));
>>>>
>>>>
>>>> Would it be valid to this?
>>> In principle, yes.  The frame_table is in order.
>>>
>>> However, mfn_to_virt() will blow up for RAM above the 5TB boundary.  You
>>> need to map_domain_page() to get a mapping.
>> Right, but that would mean going page-by-page, which I want to avoid.
>>
>> Now, DIRECTMAP_SIZE is ~128TB (if my math is correct) --- doesn't it
>> imply that it maps this big a range contiguously (modulo PDX hole)?
> Your maths is correct, and yet you will end up with problems if you
> trust it.
>
> That is the magic mode for the idle and monitor pagetables.  In the
> context of a 64bit PV guest, the cutoff is at 5TB, at which point you
> venture into the virtual address space reserved for guest kernel use. 
> (It is rather depressing that the 64bit PV guest ABI is the factor
> limiting Xen's maximum RAM usage.)

I don't know whether it would make any difference, but the pages that I am
talking about are not in use by any guest; they are free. (This question
is for the scrubbing rewrite that I am working on, which apparently you
figured out, judging by what you are saying below.)


>
>>>>  Do I need to account for the PDX hole?
>>> Jan is probably the best person to ask about this, but I am failure sure
>>> there are lurking dragons here.
>>>
>>> PDX compression is used to reduce the size of the frametable when there
>>> are large unused ranges of mfns.  Without paying attention to the PDX
>>> shift, you don't know where the discontinuities lie.
>>>
>>> However, because the PDX shift is an aligned power of two, there are
>>> likely to be struct page_info*'s in the frame_table which don't point at
>>> real RAM, and won't have a virtual mapping even in the directmap.
>> So I would be OK with finding which mfn of my range points to beginning
>> of the hole and break the mfn range into two sections --- one below the
>> hole and one above. With hope that both ranges can be mapped
>> contiguously --- something that I don't know whether is true.
> If you have a struct page_info * in your hand, and know from the E820
> where the next non RAM boundary is, I think you should be safe to clear
> memory over a contiguous range of the directmap.  There shouldn't be any
> discontinuities over that range.

OK, I'll look at this, thanks.


>
> Be aware of memory_guard() though which does shoot holes in the
> directmap.  However, only allocated pages should be guarded, so you
> should never be in the position of scrubbing pages with a missing
> mapping in the directmap.  For RAM above the 5TB boundary,
> map_domain_page() will DTRT, but we might want to see about making a
> variant which can make mappings spanning more than 4k.

Maybe have it at least attempt to map a larger range, i.e. not try to
cover all corner cases. Something like map_domain_page_fast(order).


-boris

>
> (You probably want someone else to sanity check my logic here.)
>
> ~Andrew
>

* Re: Converting heap page_infos to contiguous virtual
  2016-07-13 20:57       ` Boris Ostrovsky
@ 2016-07-13 21:06         ` Andrew Cooper
  2016-07-13 21:43           ` Boris Ostrovsky
  2016-07-14 10:25         ` George Dunlap
  1 sibling, 1 reply; 20+ messages in thread
From: Andrew Cooper @ 2016-07-13 21:06 UTC (permalink / raw)
  To: Boris Ostrovsky, xen-devel; +Cc: George Dunlap

On 13/07/2016 21:57, Boris Ostrovsky wrote:
> On 07/13/2016 04:34 PM, Andrew Cooper wrote:
>> On 13/07/2016 21:17, Boris Ostrovsky wrote:
>>> On 07/13/2016 04:02 PM, Andrew Cooper wrote:
>>>> On 13/07/16 20:44, Boris Ostrovsky wrote:
>>>>> I would like to clear a bunch of Xen heap pages at once (i.e. not
>>>>> page-by-page).
>>>>>
>>>>> Greatly simplifying things, let's say I grab (in common/page_alloc.c)
>>>>>     pg = page_list_remove_head(&heap(node, zone, order)
>>>>>
>>>>> and then
>>>>>
>>>>>     mfn_t mfn =
>>>>> _mfn(page_to_mfn(pg));                                        
>>>>>     char *va = mfn_to_virt(mfn_x(mfn));
>>>>>     memset(va, 0, 4096 * (1 << order));
>>>>>
>>>>>
>>>>> Would it be valid to this?
>>>> In principle, yes.  The frame_table is in order.
>>>>
>>>> However, mfn_to_virt() will blow up for RAM above the 5TB boundary.  You
>>>> need to map_domain_page() to get a mapping.
>>> Right, but that would mean going page-by-page, which I want to avoid.
>>>
>>> Now, DIRECTMAP_SIZE is ~128TB (if my math is correct) --- doesn't it
>>> imply that it maps this big a range contiguously (modulo PDX hole)?
>> Your maths is correct, and yet you will end up with problems if you
>> trust it.
>>
>> That is the magic mode for the idle and monitor pagetables.  In the
>> context of a 64bit PV guest, the cutoff is at 5TB, at which point you
>> venture into the virtual address space reserved for guest kernel use. 
>> (It is rather depressing that the 64bit PV guest ABI is the factor
>> limiting Xen's maximum RAM usage.)
> I don't know whether it would make any difference but the pages that I am
> talking about are not in use by any guest, they are free. (This question
> is for scrubbing rewrite that I am working on. Which apparently you
> figured out judged by what you are saying below)

Being free is not relevant.  It depends whether current is a 64bit PV
guest or not.  Even in the idle loop, we don't context switch away from
current's pagetables.

Realistically, you must at all times use map_domain_page() (or an
alternative thereabouts), as the 5TB limit with 64bit PV guests turns
into a 3.5TB limit depending on CONFIG_BIGMEM.

>
>
>>>>>  Do I need to account for the PDX hole?
>>>> Jan is probably the best person to ask about this, but I am failure sure
>>>> there are lurking dragons here.
>>>>
>>>> PDX compression is used to reduce the size of the frametable when there
>>>> are large unused ranges of mfns.  Without paying attention to the PDX
>>>> shift, you don't know where the discontinuities lie.
>>>>
>>>> However, because the PDX shift is an aligned power of two, there are
>>>> likely to be struct page_info*'s in the frame_table which don't point at
>>>> real RAM, and won't have a virtual mapping even in the directmap.
>>> So I would be OK with finding which mfn of my range points to beginning
>>> of the hole and break the mfn range into two sections --- one below the
>>> hole and one above. With hope that both ranges can be mapped
>>> contiguously --- something that I don't know whether is true.
>> If you have a struct page_info * in your hand, and know from the E820
>> where the next non RAM boundary is, I think you should be safe to clear
>> memory over a contiguous range of the directmap.  There shouldn't be any
>> discontinuities over that range.
> OK, I'll look at this, thanks.
>
>
>> Be aware of memory_guard() though which does shoot holes in the
>> directmap.  However, only allocated pages should be guarded, so you
>> should never be in the position of scrubbing pages with a missing
>> mapping in the directmap.  For RAM above the 5TB boundary,
>> map_domain_page() will DTRT, but we might want to see about making a
>> variant which can make mappings spanning more than 4k.
> Maybe have it at least attempt to map a larger range, i.e. not try to
> cover all corner cases. Something like map_domain_page_fast(order).

map_domain_page() is fast if it can use the directmap.  (However, it
deliberately doesn't on a debug build, to exercise the highmem logic.)

I expect the exceedingly common case for RAM above the 5TB (or 3.5TB)
boundary will be for it to already be aligned on a 1GB boundary, at which
point 2M or 1G superpages will work just fine for a temporary mapping.
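
For a suitably aligned chunk, something like this might do (hypothetical
sketch: SCRUB_SCRATCH_VIRT is an invented scratch area, and the flags are
from memory):

    /* Map 2M worth of frames in one go; map_pages_to_xen() will use a 2M
     * superpage when the alignment and size allow it. */
    if ( !map_pages_to_xen(SCRUB_SCRATCH_VIRT, mfn_x(mfn),
                           1UL << PAGETABLE_ORDER, PAGE_HYPERVISOR) )
    {
        memset((void *)SCRUB_SCRATCH_VIRT, 0, PAGE_SIZE << PAGETABLE_ORDER);
        destroy_xen_mappings(SCRUB_SCRATCH_VIRT,
                             SCRUB_SCRATCH_VIRT +
                             (PAGE_SIZE << PAGETABLE_ORDER));
    }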

~Andrew


* Re: Converting heap page_infos to contiguous virtual
  2016-07-13 21:06         ` Andrew Cooper
@ 2016-07-13 21:43           ` Boris Ostrovsky
  2016-07-14 13:29             ` Andrew Cooper
  0 siblings, 1 reply; 20+ messages in thread
From: Boris Ostrovsky @ 2016-07-13 21:43 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel; +Cc: George Dunlap

On 07/13/2016 05:06 PM, Andrew Cooper wrote:
> On 13/07/2016 21:57, Boris Ostrovsky wrote:
>> On 07/13/2016 04:34 PM, Andrew Cooper wrote:
>>> On 13/07/2016 21:17, Boris Ostrovsky wrote:
>>>> On 07/13/2016 04:02 PM, Andrew Cooper wrote:
>>>>> On 13/07/16 20:44, Boris Ostrovsky wrote:
>>>>>> I would like to clear a bunch of Xen heap pages at once (i.e. not
>>>>>> page-by-page).
>>>>>>
>>>>>> Greatly simplifying things, let's say I grab (in common/page_alloc.c)
>>>>>>     pg = page_list_remove_head(&heap(node, zone, order)
>>>>>>
>>>>>> and then
>>>>>>
>>>>>>     mfn_t mfn =
>>>>>> _mfn(page_to_mfn(pg));                                        
>>>>>>     char *va = mfn_to_virt(mfn_x(mfn));
>>>>>>     memset(va, 0, 4096 * (1 << order));
>>>>>>
>>>>>>
>>>>>> Would it be valid to this?
>>>>> In principle, yes.  The frame_table is in order.
>>>>>
>>>>> However, mfn_to_virt() will blow up for RAM above the 5TB boundary.  You
>>>>> need to map_domain_page() to get a mapping.
>>>> Right, but that would mean going page-by-page, which I want to avoid.
>>>>
>>>> Now, DIRECTMAP_SIZE is ~128TB (if my math is correct) --- doesn't it
>>>> imply that it maps this big a range contiguously (modulo PDX hole)?
>>> Your maths is correct, and yet you will end up with problems if you
>>> trust it.
>>>
>>> That is the magic mode for the idle and monitor pagetables.  In the
>>> context of a 64bit PV guest, the cutoff is at 5TB, at which point you
>>> venture into the virtual address space reserved for guest kernel use. 
>>> (It is rather depressing that the 64bit PV guest ABI is the factor
>>> limiting Xen's maximum RAM usage.)
>> I don't know whether it would make any difference but the pages that I am
>> talking about are not in use by any guest, they are free. (This question
>> is for scrubbing rewrite that I am working on. Which apparently you
>> figured out judged by what you are saying below)
> Being free is not relevant.  It depends whether current is a 64bit PV
> guest or not.  Even in the idle loop, we don't context switch away from
> current's pagetables.


Can we force a switch to the idle context (i.e. away from a 64-bit PV
guest's pagetables) when we know it would be useful for mapping/scrubbing?
The cost of a TLB flush (if that was the reason) may be small compared to
the advantages brought by fast mapping during scrubbing.


-boris



* Re: Converting heap page_infos to contiguous virtual
  2016-07-13 20:57       ` Boris Ostrovsky
  2016-07-13 21:06         ` Andrew Cooper
@ 2016-07-14 10:25         ` George Dunlap
  2016-07-14 10:34           ` Andrew Cooper
  1 sibling, 1 reply; 20+ messages in thread
From: George Dunlap @ 2016-07-14 10:25 UTC (permalink / raw)
  To: Boris Ostrovsky, Andrew Cooper, xen-devel; +Cc: George Dunlap

On 13/07/16 21:57, Boris Ostrovsky wrote:
> On 07/13/2016 04:34 PM, Andrew Cooper wrote:
>> On 13/07/2016 21:17, Boris Ostrovsky wrote:
>>> On 07/13/2016 04:02 PM, Andrew Cooper wrote:
>>>> On 13/07/16 20:44, Boris Ostrovsky wrote:
>>>>> I would like to clear a bunch of Xen heap pages at once (i.e. not
>>>>> page-by-page).
>>>>>
>>>>> Greatly simplifying things, let's say I grab (in common/page_alloc.c)
>>>>>     pg = page_list_remove_head(&heap(node, zone, order)
>>>>>
>>>>> and then
>>>>>
>>>>>     mfn_t mfn =
>>>>> _mfn(page_to_mfn(pg));                                        
>>>>>     char *va = mfn_to_virt(mfn_x(mfn));
>>>>>     memset(va, 0, 4096 * (1 << order));
>>>>>
>>>>>
>>>>> Would it be valid to this?
>>>> In principle, yes.  The frame_table is in order.
>>>>
>>>> However, mfn_to_virt() will blow up for RAM above the 5TB boundary.  You
>>>> need to map_domain_page() to get a mapping.
>>> Right, but that would mean going page-by-page, which I want to avoid.
>>>
>>> Now, DIRECTMAP_SIZE is ~128TB (if my math is correct) --- doesn't it
>>> imply that it maps this big a range contiguously (modulo PDX hole)?
>> Your maths is correct, and yet you will end up with problems if you
>> trust it.
>>
>> That is the magic mode for the idle and monitor pagetables.  In the
>> context of a 64bit PV guest, the cutoff is at 5TB, at which point you
>> venture into the virtual address space reserved for guest kernel use. 
>> (It is rather depressing that the 64bit PV guest ABI is the factor
>> limiting Xen's maximum RAM usage.)
> 
> I don't know whether it would make any difference but the pages that I am
> talking about are not in use by any guest, they are free. (This question
> is for scrubbing rewrite that I am working on. Which apparently you
> figured out judged by what you are saying below)

Is this start-of-day scrubbing (when there are no guests), or scrubbing
on guest destruction?

If the former, it seems like it might not be too difficult to arrange
that we're in a context that has all the RAM mapped.

 -George



* Re: Converting heap page_infos to contiguous virtual
  2016-07-14 10:25         ` George Dunlap
@ 2016-07-14 10:34           ` Andrew Cooper
  2016-07-14 12:42             ` Julien Grall
  2016-07-15 14:39             ` Boris Ostrovsky
  0 siblings, 2 replies; 20+ messages in thread
From: Andrew Cooper @ 2016-07-14 10:34 UTC (permalink / raw)
  To: George Dunlap, Boris Ostrovsky, xen-devel; +Cc: George Dunlap

On 14/07/16 11:25, George Dunlap wrote:
> On 13/07/16 21:57, Boris Ostrovsky wrote:
>> On 07/13/2016 04:34 PM, Andrew Cooper wrote:
>>> On 13/07/2016 21:17, Boris Ostrovsky wrote:
>>>> On 07/13/2016 04:02 PM, Andrew Cooper wrote:
>>>>> On 13/07/16 20:44, Boris Ostrovsky wrote:
>>>>>> I would like to clear a bunch of Xen heap pages at once (i.e. not
>>>>>> page-by-page).
>>>>>>
>>>>>> Greatly simplifying things, let's say I grab (in common/page_alloc.c)
>>>>>>     pg = page_list_remove_head(&heap(node, zone, order)
>>>>>>
>>>>>> and then
>>>>>>
>>>>>>     mfn_t mfn =
>>>>>> _mfn(page_to_mfn(pg));                                        
>>>>>>     char *va = mfn_to_virt(mfn_x(mfn));
>>>>>>     memset(va, 0, 4096 * (1 << order));
>>>>>>
>>>>>>
>>>>>> Would it be valid to this?
>>>>> In principle, yes.  The frame_table is in order.
>>>>>
>>>>> However, mfn_to_virt() will blow up for RAM above the 5TB boundary.  You
>>>>> need to map_domain_page() to get a mapping.
>>>> Right, but that would mean going page-by-page, which I want to avoid.
>>>>
>>>> Now, DIRECTMAP_SIZE is ~128TB (if my math is correct) --- doesn't it
>>>> imply that it maps this big a range contiguously (modulo PDX hole)?
>>> Your maths is correct, and yet you will end up with problems if you
>>> trust it.
>>>
>>> That is the magic mode for the idle and monitor pagetables.  In the
>>> context of a 64bit PV guest, the cutoff is at 5TB, at which point you
>>> venture into the virtual address space reserved for guest kernel use. 
>>> (It is rather depressing that the 64bit PV guest ABI is the factor
>>> limiting Xen's maximum RAM usage.)
>> I don't know whether it would make any difference but the pages that I am
>> talking about are not in use by any guest, they are free. (This question
>> is for scrubbing rewrite that I am working on. Which apparently you
>> figured out judged by what you are saying below)
> Is this start-of-day scrubbing (when there are no guests), or scrubbing
> on guest destruction?
>
> If the former, it seems like it might not be too difficult to arrange
> that we're in a context that has all the RAM mapped.

This will be runtime scrubbing of pages.  This topic has come up at
several hackathons.

Currently, domain destroy on a 1TB VM takes ~10 mins of synchronously
scrubbing RAM in continuations of the domain_kill() hypercall (and those
database VMs really like their RAM).

ISTR the plan was to have a page_info "dirty" flag and a dirty page list
which is scrubbed while idle (or per-node, more likely). 
alloc_{dom/xen}_heap_pages() can pull off the dirty or free list, doing
a small synchronous scrub if it was dirty and needs to be clean. 
domain_kill() can just do a pagelist_splice() to move all memory onto
the dirty list, and save 10 minutes per TB.  The boot time memory scrub
can then be implemented in terms of setting the dirty flag by default,
rather than being an explicit step.
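
The domain_kill() side of that might look something like this (a sketch
only: the list and helper names are partly invented, and the page-state
bookkeeping and locking are hand-waved):

    /* Per-node lists of pages which are free but not yet scrubbed. */
    static struct page_list_head dirty_list[MAX_NUMNODES];

    static void queue_pages_for_scrub(struct domain *d)
    {
        /* O(1): the idle/per-node scrubber, or a later allocation that
         * needs a clean page, does the actual clearing. */
        spin_lock(&heap_lock);
        page_list_splice(&d->page_list, &dirty_list[domain_to_node(d)]);
        spin_unlock(&heap_lock);
    }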

(Although I really shouldn't be second-guessing what Boris is planning
to implement ;p)

~Andrew


* Re: Converting heap page_infos to contiguous virtual
  2016-07-14 10:34           ` Andrew Cooper
@ 2016-07-14 12:42             ` Julien Grall
  2016-07-14 13:10               ` Andrew Cooper
  2016-07-15 14:39             ` Boris Ostrovsky
  1 sibling, 1 reply; 20+ messages in thread
From: Julien Grall @ 2016-07-14 12:42 UTC (permalink / raw)
  To: Andrew Cooper, George Dunlap, Boris Ostrovsky, xen-devel; +Cc: George Dunlap

Hi,

On 14/07/16 11:34, Andrew Cooper wrote:
> On 14/07/16 11:25, George Dunlap wrote:
>> On 13/07/16 21:57, Boris Ostrovsky wrote:
>>> On 07/13/2016 04:34 PM, Andrew Cooper wrote:
>>>> On 13/07/2016 21:17, Boris Ostrovsky wrote:
>>>>> On 07/13/2016 04:02 PM, Andrew Cooper wrote:
>>>>>> On 13/07/16 20:44, Boris Ostrovsky wrote:
>>>>>>> I would like to clear a bunch of Xen heap pages at once (i.e. not
>>>>>>> page-by-page).
>>>>>>>
>>>>>>> Greatly simplifying things, let's say I grab (in common/page_alloc.c)
>>>>>>>      pg = page_list_remove_head(&heap(node, zone, order)
>>>>>>>
>>>>>>> and then
>>>>>>>
>>>>>>>      mfn_t mfn =
>>>>>>> _mfn(page_to_mfn(pg));
>>>>>>>      char *va = mfn_to_virt(mfn_x(mfn));
>>>>>>>      memset(va, 0, 4096 * (1 << order));
>>>>>>>
>>>>>>>
>>>>>>> Would it be valid to this?
>>>>>> In principle, yes.  The frame_table is in order.
>>>>>>
>>>>>> However, mfn_to_virt() will blow up for RAM above the 5TB boundary.  You
>>>>>> need to map_domain_page() to get a mapping.
>>>>> Right, but that would mean going page-by-page, which I want to avoid.
>>>>>
>>>>> Now, DIRECTMAP_SIZE is ~128TB (if my math is correct) --- doesn't it
>>>>> imply that it maps this big a range contiguously (modulo PDX hole)?
>>>> Your maths is correct, and yet you will end up with problems if you
>>>> trust it.
>>>>
>>>> That is the magic mode for the idle and monitor pagetables.  In the
>>>> context of a 64bit PV guest, the cutoff is at 5TB, at which point you
>>>> venture into the virtual address space reserved for guest kernel use.
>>>> (It is rather depressing that the 64bit PV guest ABI is the factor
>>>> limiting Xen's maximum RAM usage.)
>>> I don't know whether it would make any difference but the pages that I am
>>> talking about are not in use by any guest, they are free. (This question
>>> is for scrubbing rewrite that I am working on. Which apparently you
>>> figured out judged by what you are saying below)
>> Is this start-of-day scrubbing (when there are no guests), or scrubbing
>> on guest destruction?
>>
>> If the former, it seems like it might not be too difficult to arrange
>> that we're in a context that has all the RAM mapped.
>
> This will be runtime scrubbing of pages.  This topic has come up at
> several hackathons.

Is it a feature that will be implemented in common code? If so, bear in
mind that the ARM 32-bit hypervisor does not have all the memory mapped
(actually only the Xen heap is mapped).

Regards,

-- 
Julien Grall


* Re: Converting heap page_infos to contiguous virtual
  2016-07-14 12:42             ` Julien Grall
@ 2016-07-14 13:10               ` Andrew Cooper
  0 siblings, 0 replies; 20+ messages in thread
From: Andrew Cooper @ 2016-07-14 13:10 UTC (permalink / raw)
  To: Julien Grall, George Dunlap, Boris Ostrovsky, xen-devel; +Cc: George Dunlap

On 14/07/16 13:42, Julien Grall wrote:
> Hi,
>
> On 14/07/16 11:34, Andrew Cooper wrote:
>> On 14/07/16 11:25, George Dunlap wrote:
>>> On 13/07/16 21:57, Boris Ostrovsky wrote:
>>>> On 07/13/2016 04:34 PM, Andrew Cooper wrote:
>>>>> On 13/07/2016 21:17, Boris Ostrovsky wrote:
>>>>>> On 07/13/2016 04:02 PM, Andrew Cooper wrote:
>>>>>>> On 13/07/16 20:44, Boris Ostrovsky wrote:
>>>>>>>> I would like to clear a bunch of Xen heap pages at once (i.e. not
>>>>>>>> page-by-page).
>>>>>>>>
>>>>>>>> Greatly simplifying things, let's say I grab (in
>>>>>>>> common/page_alloc.c)
>>>>>>>>      pg = page_list_remove_head(&heap(node, zone, order)
>>>>>>>>
>>>>>>>> and then
>>>>>>>>
>>>>>>>>      mfn_t mfn =
>>>>>>>> _mfn(page_to_mfn(pg));
>>>>>>>>      char *va = mfn_to_virt(mfn_x(mfn));
>>>>>>>>      memset(va, 0, 4096 * (1 << order));
>>>>>>>>
>>>>>>>>
>>>>>>>> Would it be valid to this?
>>>>>>> In principle, yes.  The frame_table is in order.
>>>>>>>
>>>>>>> However, mfn_to_virt() will blow up for RAM above the 5TB
>>>>>>> boundary.  You
>>>>>>> need to map_domain_page() to get a mapping.
>>>>>> Right, but that would mean going page-by-page, which I want to
>>>>>> avoid.
>>>>>>
>>>>>> Now, DIRECTMAP_SIZE is ~128TB (if my math is correct) --- doesn't it
>>>>>> imply that it maps this big a range contiguously (modulo PDX hole)?
>>>>> Your maths is correct, and yet you will end up with problems if you
>>>>> trust it.
>>>>>
>>>>> That is the magic mode for the idle and monitor pagetables.  In the
>>>>> context of a 64bit PV guest, the cutoff is at 5TB, at which point you
>>>>> venture into the virtual address space reserved for guest kernel use.
>>>>> (It is rather depressing that the 64bit PV guest ABI is the factor
>>>>> limiting Xen's maximum RAM usage.)
>>>> I don't know whether it would make any difference but the pages
>>>> that I am
>>>> talking about are not in use by any guest, they are free. (This
>>>> question
>>>> is for scrubbing rewrite that I am working on. Which apparently you
>>>> figured out judged by what you are saying below)
>>> Is this start-of-day scrubbing (when there are no guests), or scrubbing
>>> on guest destruction?
>>>
>>> If the former, it seems like it might not be too difficult to arrange
>>> that we're in a context that has all the RAM mapped.
>>
>> This will be runtime scrubbing of pages.  This topic has come up at
>> several hackathons.
>
> Is it a feature that will be implemented in common code? If so, bear
> in mind that ARM 32-bit hypervisor does not have all the memory mapped
> (actually only Xen heap is mapped).

Nor does x86, on boxes with more than 5TB of RAM.

Where possible, we should try to make it common, but it will depend on
exactly how similar the heap implementations are.

~Andrew


* Re: Converting heap page_infos to contiguous virtual
  2016-07-13 21:43           ` Boris Ostrovsky
@ 2016-07-14 13:29             ` Andrew Cooper
  2016-07-15 14:53               ` Boris Ostrovsky
  0 siblings, 1 reply; 20+ messages in thread
From: Andrew Cooper @ 2016-07-14 13:29 UTC (permalink / raw)
  To: Boris Ostrovsky, xen-devel; +Cc: George Dunlap

On 13/07/16 22:43, Boris Ostrovsky wrote:
> On 07/13/2016 05:06 PM, Andrew Cooper wrote:
>> On 13/07/2016 21:57, Boris Ostrovsky wrote:
>>> On 07/13/2016 04:34 PM, Andrew Cooper wrote:
>>>> On 13/07/2016 21:17, Boris Ostrovsky wrote:
>>>>> On 07/13/2016 04:02 PM, Andrew Cooper wrote:
>>>>>> On 13/07/16 20:44, Boris Ostrovsky wrote:
>>>>>>> I would like to clear a bunch of Xen heap pages at once (i.e. not
>>>>>>> page-by-page).
>>>>>>>
>>>>>>> Greatly simplifying things, let's say I grab (in common/page_alloc.c)
>>>>>>>     pg = page_list_remove_head(&heap(node, zone, order)
>>>>>>>
>>>>>>> and then
>>>>>>>
>>>>>>>     mfn_t mfn =
>>>>>>> _mfn(page_to_mfn(pg));                                        
>>>>>>>     char *va = mfn_to_virt(mfn_x(mfn));
>>>>>>>     memset(va, 0, 4096 * (1 << order));
>>>>>>>
>>>>>>>
>>>>>>> Would it be valid to this?
>>>>>> In principle, yes.  The frame_table is in order.
>>>>>>
>>>>>> However, mfn_to_virt() will blow up for RAM above the 5TB boundary.  You
>>>>>> need to map_domain_page() to get a mapping.
>>>>> Right, but that would mean going page-by-page, which I want to avoid.
>>>>>
>>>>> Now, DIRECTMAP_SIZE is ~128TB (if my math is correct) --- doesn't it
>>>>> imply that it maps this big a range contiguously (modulo PDX hole)?
>>>> Your maths is correct, and yet you will end up with problems if you
>>>> trust it.
>>>>
>>>> That is the magic mode for the idle and monitor pagetables.  In the
>>>> context of a 64bit PV guest, the cutoff is at 5TB, at which point you
>>>> venture into the virtual address space reserved for guest kernel use. 
>>>> (It is rather depressing that the 64bit PV guest ABI is the factor
>>>> limiting Xen's maximum RAM usage.)
>>> I don't know whether it would make any difference but the pages that I am
>>> talking about are not in use by any guest, they are free. (This question
>>> is for scrubbing rewrite that I am working on. Which apparently you
>>> figured out judged by what you are saying below)
>> Being free is not relevant.  It depends whether current is a 64bit PV
>> guest or not.  Even in the idle loop, we don't context switch away from
>> current's pagetables.
>
> Can we force switch to idle (i.e. a non-64b PV guest) when we know
> it would be useful for mapping/scrubbing? The cost of TLB flush (if that
> was the reason) may be small compared to advantages brought by
> fast mapping during scrubbing.

It sounds like a plausible option, but would need some numbers to back
it up.

However, I would recommend getting something functioning first, before
trying to optimise it.

There is probably a lot to be gained simply by improving clear_page().

~Andrew


* Re: Converting heap page_infos to contiguous virtual
  2016-07-14 10:34           ` Andrew Cooper
  2016-07-14 12:42             ` Julien Grall
@ 2016-07-15 14:39             ` Boris Ostrovsky
  1 sibling, 0 replies; 20+ messages in thread
From: Boris Ostrovsky @ 2016-07-15 14:39 UTC (permalink / raw)
  To: Andrew Cooper, George Dunlap, xen-devel; +Cc: George Dunlap

On 07/14/2016 06:34 AM, Andrew Cooper wrote:
> On 14/07/16 11:25, George Dunlap wrote:
>> On 13/07/16 21:57, Boris Ostrovsky wrote:
>>> On 07/13/2016 04:34 PM, Andrew Cooper wrote:
>>>> On 13/07/2016 21:17, Boris Ostrovsky wrote:
>>>>> On 07/13/2016 04:02 PM, Andrew Cooper wrote:
>>>>>> On 13/07/16 20:44, Boris Ostrovsky wrote:
>>>>>>> I would like to clear a bunch of Xen heap pages at once (i.e. not
>>>>>>> page-by-page).
>>>>>>>
>>>>>>> Greatly simplifying things, let's say I grab (in common/page_alloc.c)
>>>>>>>     pg = page_list_remove_head(&heap(node, zone, order)
>>>>>>>
>>>>>>> and then
>>>>>>>
>>>>>>>     mfn_t mfn =
>>>>>>> _mfn(page_to_mfn(pg));                                        
>>>>>>>     char *va = mfn_to_virt(mfn_x(mfn));
>>>>>>>     memset(va, 0, 4096 * (1 << order));
>>>>>>>
>>>>>>>
>>>>>>> Would it be valid to this?
>>>>>> In principle, yes.  The frame_table is in order.
>>>>>>
>>>>>> However, mfn_to_virt() will blow up for RAM above the 5TB boundary.  You
>>>>>> need to map_domain_page() to get a mapping.
>>>>> Right, but that would mean going page-by-page, which I want to avoid.
>>>>>
>>>>> Now, DIRECTMAP_SIZE is ~128TB (if my math is correct) --- doesn't it
>>>>> imply that it maps this big a range contiguously (modulo PDX hole)?
>>>> Your maths is correct, and yet you will end up with problems if you
>>>> trust it.
>>>>
>>>> That is the magic mode for the idle and monitor pagetables.  In the
>>>> context of a 64bit PV guest, the cutoff is at 5TB, at which point you
>>>> venture into the virtual address space reserved for guest kernel use. 
>>>> (It is rather depressing that the 64bit PV guest ABI is the factor
>>>> limiting Xen's maximum RAM usage.)
>>> I don't know whether it would make any difference but the pages that I am
>>> talking about are not in use by any guest, they are free. (This question
>>> is for scrubbing rewrite that I am working on. Which apparently you
>>> figured out judged by what you are saying below)
>> Is this start-of-day scrubbing (when there are no guests), or scrubbing
>> on guest destruction?
>>
>> If the former, it seems like it might not be too difficult to arrange
>> that we're in a context that has all the RAM mapped.
> This will be runtime scrubbing of pages.  

Actually, both. My prototype (apparently mistakenly) assumed a whole-RAM
mapping, and so I used the same clearing code during both system boot and
guest destruction.

In the former case, with a 6TB box, scrubbing time went from minutes to
seconds. This was a while ago, so I don't remember the exact numbers (or
whether the system had 6TB or less). This was using AVX instructions.


> This topic has come up at
> several hackathons.
>
> Currently, domain destroy on a 1TB VM takes ~10 mins of synchronously
> scrubbing RAM in continuations of the domain_kill() hypercall (and those
> databases VMs really like their RAM).
>
> ISTR the plan was to have a page_info "dirty" flag and a dirty page list
> which is scrubbed while idle (or per-node, more likely). 
> alloc_{dom/xen}_heap_pages() can pull off the dirty or free list, doing
> a small synchronous scrub if it was dirty and needs to be clean. 
> domain_kill() can just do a pagelist_splice() to move all memory onto
> the dirty list, and save 10 minutes per TB.  The boot time memory scrub
> can then be implemented in terms of setting the dirty flag by default,
> rather than being an explicit step.
>
> (Although I really shouldn't be second-guessing what Boris is planning
> to implement ;p)

Not exactly this, but something along those lines.

-boris




* Re: Converting heap page_infos to contiguous virtual
  2016-07-14 13:29             ` Andrew Cooper
@ 2016-07-15 14:53               ` Boris Ostrovsky
  2016-07-15 15:19                 ` Andrew Cooper
  2016-07-15 16:04                 ` Konrad Rzeszutek Wilk
  0 siblings, 2 replies; 20+ messages in thread
From: Boris Ostrovsky @ 2016-07-15 14:53 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel; +Cc: George Dunlap

On 07/14/2016 09:29 AM, Andrew Cooper wrote:
>
> However, I would recommend getting something functioning first, before
> trying to optimise it.

There are two fairly independent parts to improving scrubbing: one is
making it asynchronous, and the second is improving clear_page()
performance. A whole-RAM mapping is needed for the latter.

>
> There is probably a lot to be gained simply by improving clear_page().

The biggest improvement comes from switching to AVX(2) when available.
It's been a while since I ran those tests, so I will have to re-measure,
but my recollection is that 4K was too small to see significant changes.

A potential improvement might come from dropping (or, rather, deferring)
sfence in clear_page_sse2. I don't know how much this would buy us though.

-boris




* Re: Converting heap page_infos to contiguous virtual
  2016-07-15 14:53               ` Boris Ostrovsky
@ 2016-07-15 15:19                 ` Andrew Cooper
  2016-07-15 15:35                   ` Boris Ostrovsky
  2016-07-15 16:04                 ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 20+ messages in thread
From: Andrew Cooper @ 2016-07-15 15:19 UTC (permalink / raw)
  To: Boris Ostrovsky, xen-devel; +Cc: George Dunlap

On 15/07/16 15:53, Boris Ostrovsky wrote:
> On 07/14/2016 09:29 AM, Andrew Cooper wrote:
>> However, I would recommend getting something functioning first, before
>> trying to optimise it.
> There are two fairly independent parts to improving scrubbing: one is
> making it asynchronous and second is improving clear_page() performance.
> Whole-RAM mapping is needed for the latter.
>
>> There is probably a lot to be gained simply by improving clear_page().
> The biggest improvement comes from switching to AVX(2) when available.
> It's been a while since I ran those tests so I will have to re-measure
> it but my recollection is that 4K was too small to see significant changes.

There is also the new `clzero` on AMD Zen processors, which looks like
an interesting option which doesn't involve context switching SIMD state.
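
Very roughly, and entirely untested (I don't have the hardware either;
older assemblers may not know the mnemonic and would need a .byte
encoding):

    /* CLZERO zeroes the cache line containing the address in rAX, so a
     * page clear is just a 64-byte-stride loop with no SIMD state to
     * save/restore. */
    static void clear_page_clzero(void *va)
    {
        unsigned int i;

        for ( i = 0; i < PAGE_SIZE; i += 64 )
            asm volatile ( "clzero" :: "a" (va + i) : "memory" );
    }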

>
> A potential improvement might come from dropping (or, rather, deferring)
> sfence in clear_page_sse2. I don't know how much this would buy us though.

The sfence is mandatory because of the movnti.  It cannot be dropped, or
the zeroes might be stuck in the Write Combining buffer when something
else comes to access the page.

~Andrew


* Re: Converting heap page_infos to contiguous virtual
  2016-07-15 15:19                 ` Andrew Cooper
@ 2016-07-15 15:35                   ` Boris Ostrovsky
  0 siblings, 0 replies; 20+ messages in thread
From: Boris Ostrovsky @ 2016-07-15 15:35 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel; +Cc: George Dunlap

On 07/15/2016 11:19 AM, Andrew Cooper wrote:
> On 15/07/16 15:53, Boris Ostrovsky wrote:
>> On 07/14/2016 09:29 AM, Andrew Cooper wrote:
>>> However, I would recommend getting something functioning first, before
>>> trying to optimise it.
>> There are two fairly independent parts to improving scrubbing: one is
>> making it asynchronous and second is improving clear_page() performance.
>> Whole-RAM mapping is needed for the latter.
>>
>>> There is probably a lot to be gained simply by improving clear_page().
>> The biggest improvement comes from switching to AVX(2) when available.
>> It's been a while since I ran those tests so I will have to re-measure
>> it but my recollection is that 4K was too small to see significant changes.
> There is also the new `clzero` on AMD Zen processors, which looks like
> an interesting option which doesn't involve context switching SIMD state.

Yes, this looks like a good option. I am not aware of Zen HW being
available to me in the near future though ;-)

>
>> A potential improvement might come from dropping (or, rather, deferring)
>> sfence in clear_page_sse2. I don't know how much this would buy us though.
> The sfence is mandatory because of the movnti.  It cannot be dropped, or
> the zeroes might be stuck in the Write Combining buffer when something
> else comes to access the page.

Right, but while the chunk of pages is being scrubbed, those pages are
not available for anyone to access. Before they are (as a set) declared
available, an sfence will be issued. That's what I meant by "deferring".
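
I.e. something along these lines (untested sketch, with inline asm standing
in for the real clear_page_sse2 assembly; helper names are invented):

    /* Zero nr_pages pages with non-temporal stores, without fencing. */
    static void clear_pages_nt_nofence(void *va, unsigned long nr_pages)
    {
        unsigned long i;

        for ( i = 0; i < nr_pages * PAGE_SIZE; i += 8 )
            asm volatile ( "movnti %1, %0"
                           : "=m" (*(uint64_t *)(va + i))
                           : "r" (0UL) );
    }

    static void scrub_chunk(void *va, unsigned long nr_pages)
    {
        clear_pages_nt_nofence(va, nr_pages);
        /* One sfence for the whole chunk, just before it is declared clean. */
        asm volatile ( "sfence" ::: "memory" );
    }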

-boris




* Re: Converting heap page_infos to contiguous virtual
  2016-07-15 14:53               ` Boris Ostrovsky
  2016-07-15 15:19                 ` Andrew Cooper
@ 2016-07-15 16:04                 ` Konrad Rzeszutek Wilk
  2016-07-15 16:07                   ` Andrew Cooper
  1 sibling, 1 reply; 20+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-07-15 16:04 UTC (permalink / raw)
  To: Boris Ostrovsky; +Cc: George Dunlap, Andrew Cooper, xen-devel

On Fri, Jul 15, 2016 at 10:53:51AM -0400, Boris Ostrovsky wrote:
> On 07/14/2016 09:29 AM, Andrew Cooper wrote:
> >
> > However, I would recommend getting something functioning first, before
> > trying to optimise it.
> 
> There are two fairly independent parts to improving scrubbing: one is
> making it asynchronous and second is improving clear_page() performance.
> Whole-RAM mapping is needed for the latter.

Attaching a nice graph of different memset implementations on Broadwell
(credit goes to Joao for doing the testing). Skylake is 10% faster than
Broadwell.


> 
> >
> > There is probably a lot to be gained simply by improving clear_page().
> 
> The biggest improvement comes from switching to AVX(2) when available.
> It's been a while since I ran those tests so I will have to re-measure
> it but my recollection is that 4K was too small to see significant changes.
> 
> A potential improvement might come from dropping (or, rather, deferring)
> sfence in clear_page_sse2. I don't know how much this would buy us though.
> 
> -boris
> 
> 
> 

[-- Attachment #2: broadwell_memset.png --]
[-- Type: image/png, Size: 25670 bytes --]


* Re: Converting heap page_infos to contiguous virtual
  2016-07-15 16:04                 ` Konrad Rzeszutek Wilk
@ 2016-07-15 16:07                   ` Andrew Cooper
  0 siblings, 0 replies; 20+ messages in thread
From: Andrew Cooper @ 2016-07-15 16:07 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Boris Ostrovsky; +Cc: George Dunlap, xen-devel

On 15/07/16 17:04, Konrad Rzeszutek Wilk wrote:
> On Fri, Jul 15, 2016 at 10:53:51AM -0400, Boris Ostrovsky wrote:
>> On 07/14/2016 09:29 AM, Andrew Cooper wrote:
>>> However, I would recommend getting something functioning first, before
>>> trying to optimise it.
>> There are two fairly independent parts to improving scrubbing: one is
>> making it asynchronous and second is improving clear_page() performance.
>> Whole-RAM mapping is needed for the latter.
> Attaching a nice graph of different memset on Broadwell (credits go to
> Joao for doing the testing). Skylake is 10% faster than Broadwell.

Graph says memcpy() not memset().

Either way, it looks like we would do very well to "borrow" a different
implementation.

~Andrew


* Re: Converting heap page_infos to contiguous virtual
  2016-07-13 20:02 ` Andrew Cooper
  2016-07-13 20:17   ` Boris Ostrovsky
@ 2016-08-01 12:09   ` Jan Beulich
  1 sibling, 0 replies; 20+ messages in thread
From: Jan Beulich @ 2016-08-01 12:09 UTC (permalink / raw)
  To: Andrew Cooper, Boris Ostrovsky; +Cc: George Dunlap, xen-devel

>>> On 13.07.16 at 22:02, <andrew.cooper3@citrix.com> wrote:
> On 13/07/16 20:44, Boris Ostrovsky wrote:
>>  Do I need to account for the PDX hole?
> 
> Jan is probably the best person to ask about this, but I am failure sure
> there are lurking dragons here.

I don't think there are - contiguous chunks of pages can't have a PDX
hole in their middle. There would be two parts (before and after the
hole), and no allocation can return two such parts.

Furthermore PDX compression actually _reduces_ holes, i.e. there
might only be a case where physical addresses are discontiguous
yet virtual ones are contiguous, but never the other way around.

Jan


