* [Xen-devel] [XEN PATCH v2 1/2] Check zone before merging adjacent blocks in heap
@ 2020-02-04 15:14 Stewart Hildebrand
  2020-02-04 15:14 ` [Xen-devel] [DO NOT APPLY XEN PATCH v2 2/2] Test case for buddy allocator merging issue Stewart Hildebrand
  2020-02-04 15:22 ` [Xen-devel] [XEN PATCH v2 1/2] Check zone before merging adjacent blocks in heap Jan Beulich
  0 siblings, 2 replies; 48+ messages in thread
From: Stewart Hildebrand @ 2020-02-04 15:14 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Jeff Kubascik,
	Jan Beulich, David Woodhouse

From: Jeff Kubascik <jeff.kubascik@dornerworks.com>

The Xen heap is split up into nodes and zones. Each node + zone is
managed as a separate pool of memory.

When returning pages to the heap, free_heap_pages will check adjacent
blocks to see if they can be combined into a larger block. However, the
zone of the adjacent block is not checked. This results in blocks that
migrate from one zone to another.

When a block migrates to the adjacent zone, the avail counters for the
old and new node + zone are not updated accordingly. The avail counter
is used when allocating pages to determine whether to skip over a zone.
With this behavior, it is possible for free pages to collect in a zone
with the avail counter smaller than the actual page count, resulting
in free pages that are not allocable.

This commit adds a check to compare the adjacent block's zone with the
current zone before merging them.

Signed-off-by: Jeff Kubascik <jeff.kubascik@dornerworks.com>
Signed-off-by: Stewart Hildebrand <stewart.hildebrand@dornerworks.com>

---
v2: s/pg-mask/predecessor/
---
 xen/common/page_alloc.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 97902d42c1..735049fe3e 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -1462,6 +1462,7 @@ static void free_heap_pages(
             if ( !mfn_valid(page_to_mfn(predecessor)) ||
                  !page_state_is(predecessor, free) ||
                  (PFN_ORDER(predecessor) != order) ||
+                 (page_to_zone(predecessor) != zone) ||
                  (phys_to_nid(page_to_maddr(predecessor)) != node) )
                 break;
 
@@ -1485,6 +1486,7 @@ static void free_heap_pages(
             if ( !mfn_valid(page_to_mfn(successor)) ||
                  !page_state_is(successor, free) ||
                  (PFN_ORDER(successor) != order) ||
+                 (page_to_zone(successor) != zone) ||
                  (phys_to_nid(page_to_maddr(successor)) != node) )
                 break;
 
-- 
2.25.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel


* [Xen-devel] [DO NOT APPLY XEN PATCH v2 2/2] Test case for buddy allocator merging issue
  2020-02-04 15:14 [Xen-devel] [XEN PATCH v2 1/2] Check zone before merging adjacent blocks in heap Stewart Hildebrand
@ 2020-02-04 15:14 ` Stewart Hildebrand
  2020-02-04 15:22 ` [Xen-devel] [XEN PATCH v2 1/2] Check zone before merging adjacent blocks in heap Jan Beulich
  1 sibling, 0 replies; 48+ messages in thread
From: Stewart Hildebrand @ 2020-02-04 15:14 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Jan Beulich,
	David Woodhouse

Do not apply this patch - it is intended as a test case to show how the
problem presents itself. This was tested on a Xilinx Zynq UltraScale+
MPSoC with CONFIG_DEBUG=y.

Add an assert for merging pages across zones

Revert "Check zone before merging adjacent blocks in heap"

Revert "xen/page_alloc: Keep away MFN 0 from the buddy allocator"

This reverts commit 762b9a2d990bba1f3aefe660cff0c37ad2e375bc.

With this test case patch applied, Xen crashes very early in the boot process:

- UART enabled -
- Boot CPU booting -
- Current EL 00000008 -
- Initialize CPU -
- Turning on paging -
- Zero BSS -
- Ready -
(XEN) Checking for initrd in /chosen
(XEN) RAM: 0000000000000000 - 000000007fffffff
(XEN)
(XEN) MODULE[0]: 0000000000080000 - 00000000001c28f0 Xen
(XEN) MODULE[1]: 0000000000280000 - 0000000000288000 Device Tree
(XEN) MODULE[2]: 0000000000300000 - 0000000001371200 Kernel
(XEN)  RESVD[0]: 0000000000280000 - 0000000000288000
(XEN)
(XEN)
(XEN) Command line: console=dtuart dtuart=/amba/serial@ff000000 dom0_mem=1024M bootscrub=0 loglvl=all/all guest_loglvl=info/all
(XEN) Assertion 'page_to_zone(predecessor) == zone' failed at page_alloc.c:1468
(XEN) ----[ Xen-4.14-unstable  arm64  debug=y   Not tainted ]----
(XEN) CPU:    0
(XEN) PC:     000000000021b4dc page_alloc.c#free_heap_pages+0x34c/0x6a8
(XEN) LR:     000000000021b6cc
(XEN) SP:     00000000002ffcd0
(XEN) CPSR:   200003c9 MODE:64-bit EL2h (Hypervisor, handler)
(XEN)      X0: 0000000000000040  X1: 0000000000000001  X2: 0180000000000000
(XEN)      X3: ffffffffffffffff  X4: 0000000000000000  X5: 0000000000000000
(XEN)      X6: 0000000000000080  X7: 0000000000000001  X8: 0000000000000001
(XEN)      X9: 0000000000000000 X10: 0000000000000001 X11: 0080000000000000
(XEN)     X12: 0180000000000000 X13: 00000000002ad4b8 X14: 0000000000000000
(XEN)     X15: 0000000047ffffff X16: 00000000002ad000 X17: 00000000002ad000
(XEN)     X18: 00000000002ad000 X19: 0000000800000000 X20: 0000000000000001
(XEN)     X21: 0000000800000070 X22: 0000000000000002 X23: 00000000002ad4b8
(XEN)     X24: 6db6db6db6db6db7 X25: 00000000002ad000 X26: 0000000000200200
(XEN)     X27: 0000000000100100 X28: 0000000000000000  FP: 00000000002ffcd0
(XEN)
(XEN)   VTCR_EL2: 80000210
(XEN)  VTTBR_EL2: 0000000106044200
(XEN)
(XEN)  SCTLR_EL2: 30cd183d
(XEN)    HCR_EL2: 000000000000003a
(XEN)  TTBR0_EL2: 00000000001bc000
(XEN)
(XEN)    ESR_EL2: f2000001
(XEN)  HPFAR_EL2: 00000000c8082a90
(XEN)    FAR_EL2: 3238095901188162
(XEN)
(XEN) Xen stack trace from sp=00000000002ffcd0:
(XEN)    00000000002ffd40 000000000021db0c 000000000030ad08 00000008000000a8
(XEN)    0000000000000003 0000000000000080 00000000002ad488 0000000800000000
(XEN)    0000000000000000 0000000000000001 0000000000000000 6db6db6db6db6db7
(XEN)    00000000002ab0c8 00000000002c4030 00000000002ffdb0 00000000002b0c1c
(XEN)    0000000000000000 0000000001371200 0000000080000000 00000000002e03c0
(XEN)    ffffffffffffffff 00000000002e03d8 00000000002e03d0 0000000000000000
(XEN)    0000000080000000 0000000000000001 ffffffffffffffff 000000000030ad58
(XEN)    00000000002ffde0 00000000002c403c 0000000000080000 0000000001371200
(XEN)    0000000080000000 00000000002e03c0 000000007def3c00 00000000002001b8
(XEN)    0000000000080000 ffffffffffe80000 0000000000280000 0000000000000000
(XEN)    0000000000400000 000000007ff9c250 0000000000080040 0000000000000000
(XEN)    0000000000000400 0000000000080000 0000000000000000 0000000000008000
(XEN)    0000000000280000 0000000080000000 00000000002e03c0 00000000002b026c
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000300000000
(XEN)    0000000000000000 00000040ffffffff 00000000ffffffff 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000
(XEN) Xen call trace:
(XEN)    [<000000000021b4dc>] page_alloc.c#free_heap_pages+0x34c/0x6a8 (PC)
(XEN)    [<000000000021b6cc>] page_alloc.c#free_heap_pages+0x53c/0x6a8 (LR)
(XEN)    [<000000000021db0c>] page_alloc.c#init_heap_pages+0x344/0x504
(XEN)    [<00000000002b0c1c>] end_boot_allocator+0xa8/0x19c
(XEN)    [<00000000002c403c>] start_xen+0x3b4/0xc4c
(XEN)    [<00000000002001b8>] arm64/head.o#primary_switched+0x10/0x30
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Assertion 'page_to_zone(predecessor) == zone' failed at page_alloc.c:1468
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...
---
 xen/common/page_alloc.c | 18 ++++--------------
 1 file changed, 4 insertions(+), 14 deletions(-)

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 735049fe3e..cb65668736 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -1462,10 +1462,11 @@ static void free_heap_pages(
             if ( !mfn_valid(page_to_mfn(predecessor)) ||
                  !page_state_is(predecessor, free) ||
                  (PFN_ORDER(predecessor) != order) ||
-                 (page_to_zone(predecessor) != zone) ||
                  (phys_to_nid(page_to_maddr(predecessor)) != node) )
                 break;
 
+            ASSERT(page_to_zone(predecessor) == zone);
+
             check_and_stop_scrub(predecessor);
 
             page_list_del(predecessor, &heap(node, zone, order));
@@ -1486,10 +1487,11 @@ static void free_heap_pages(
             if ( !mfn_valid(page_to_mfn(successor)) ||
                  !page_state_is(successor, free) ||
                  (PFN_ORDER(successor) != order) ||
-                 (page_to_zone(successor) != zone) ||
                  (phys_to_nid(page_to_maddr(successor)) != node) )
                 break;
 
+            ASSERT(page_to_zone(successor) == zone);
+
             check_and_stop_scrub(successor);
 
             /* Update pg's first_dirty if necessary. */
@@ -1773,18 +1775,6 @@ static void init_heap_pages(
     unsigned long i;
     bool idle_scrub = false;
 
-    /*
-     * Keep MFN 0 away from the buddy allocator to avoid crossing zone
-     * boundary when merging two buddies.
-     */
-    if ( !mfn_x(page_to_mfn(pg)) )
-    {
-        if ( nr_pages-- <= 1 )
-            return;
-        pg++;
-    }
-
-
     /*
      * Some pages may not go through the boot allocator (e.g reserved
      * memory at boot but released just after --- kernel, initramfs,
-- 
2.25.0



* Re: [Xen-devel] [XEN PATCH v2 1/2] Check zone before merging adjacent blocks in heap
  2020-02-04 15:14 [Xen-devel] [XEN PATCH v2 1/2] Check zone before merging adjacent blocks in heap Stewart Hildebrand
  2020-02-04 15:14 ` [Xen-devel] [DO NOT APPLY XEN PATCH v2 2/2] Test case for buddy allocator merging issue Stewart Hildebrand
@ 2020-02-04 15:22 ` Jan Beulich
  2020-02-04 15:37   ` George Dunlap
  2020-02-04 15:37   ` Stewart Hildebrand
  1 sibling, 2 replies; 48+ messages in thread
From: Jan Beulich @ 2020-02-04 15:22 UTC (permalink / raw)
  To: Stewart Hildebrand
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Jeff Kubascik,
	xen-devel, David Woodhouse

On 04.02.2020 16:14, Stewart Hildebrand wrote:
> From: Jeff Kubascik <jeff.kubascik@dornerworks.com>
> 
> The Xen heap is split up into nodes and zones. Each node + zone is
> managed as a separate pool of memory.
> 
> When returning pages to the heap, free_heap_pages will check adjacent
> blocks to see if they can be combined into a larger block. However, the
> zone of the adjacent block is not checked. This results in blocks that
> migrate from one zone to another.
> 
> When a block migrates to the adjacent zone, the avail counters for the
> old and new node + zone is not updated accordingly. The avail counter
> is used when allocating pages to determine whether to skip over a zone.
> With this behavior, it is possible for free pages to collect in a zone
> with the avail counter smaller than the actual page count, resulting
> in free pages that are not allocable.

"When a block migrates" - fine. But is this situation possible to
occur, without "xen/page_alloc: Keep away MFN 0 from the buddy
allocator" reverted? If not, there's no bug, no need for a change,
and even less so ...

> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -1462,6 +1462,7 @@ static void free_heap_pages(
>              if ( !mfn_valid(page_to_mfn(predecessor)) ||
>                   !page_state_is(predecessor, free) ||
>                   (PFN_ORDER(predecessor) != order) ||
> +                 (page_to_zone(predecessor) != zone) ||
>                   (phys_to_nid(page_to_maddr(predecessor)) != node) )
>                  break;
>  
> @@ -1485,6 +1486,7 @@ static void free_heap_pages(
>              if ( !mfn_valid(page_to_mfn(successor)) ||
>                   !page_state_is(successor, free) ||
>                   (PFN_ORDER(successor) != order) ||
> +                 (page_to_zone(successor) != zone) ||
>                   (phys_to_nid(page_to_maddr(successor)) != node) )
>                  break;

... for one that slows down many free operations, even if just
slightly. IOW afaict either the change is not needed, or its
description needs updating.

Jan


* Re: [Xen-devel] [XEN PATCH v2 1/2] Check zone before merging adjacent blocks in heap
  2020-02-04 15:22 ` [Xen-devel] [XEN PATCH v2 1/2] Check zone before merging adjacent blocks in heap Jan Beulich
@ 2020-02-04 15:37   ` George Dunlap
  2020-02-05  9:50     ` David Woodhouse
  2020-02-04 15:37   ` Stewart Hildebrand
  1 sibling, 1 reply; 48+ messages in thread
From: George Dunlap @ 2020-02-04 15:37 UTC (permalink / raw)
  To: Jan Beulich, Stewart Hildebrand
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Jeff Kubascik,
	xen-devel, David Woodhouse

On 2/4/20 3:22 PM, Jan Beulich wrote:
> On 04.02.2020 16:14, Stewart Hildebrand wrote:
>> From: Jeff Kubascik <jeff.kubascik@dornerworks.com>
>>
>> The Xen heap is split up into nodes and zones. Each node + zone is
>> managed as a separate pool of memory.
>>
>> When returning pages to the heap, free_heap_pages will check adjacent
>> blocks to see if they can be combined into a larger block. However, the
>> zone of the adjacent block is not checked. This results in blocks that
>> migrate from one zone to another.
>>
>> When a block migrates to the adjacent zone, the avail counters for the
>> old and new node + zone is not updated accordingly. The avail counter
>> is used when allocating pages to determine whether to skip over a zone.
>> With this behavior, it is possible for free pages to collect in a zone
>> with the avail counter smaller than the actual page count, resulting
>> in free pages that are not allocable.
> 
> "When a block migrates" - fine. But is this situation possible to
> occur, without "xen/page_alloc: Keep away MFN 0 from the buddy
> allocator" reverted? If not, there's no bug, no need for a change,
> and even less so ...
> 
>> --- a/xen/common/page_alloc.c
>> +++ b/xen/common/page_alloc.c
>> @@ -1462,6 +1462,7 @@ static void free_heap_pages(
>>              if ( !mfn_valid(page_to_mfn(predecessor)) ||
>>                   !page_state_is(predecessor, free) ||
>>                   (PFN_ORDER(predecessor) != order) ||
>> +                 (page_to_zone(predecessor) != zone) ||
>>                   (phys_to_nid(page_to_maddr(predecessor)) != node) )
>>                  break;
>>  
>> @@ -1485,6 +1486,7 @@ static void free_heap_pages(
>>              if ( !mfn_valid(page_to_mfn(successor)) ||
>>                   !page_state_is(successor, free) ||
>>                   (PFN_ORDER(successor) != order) ||
>> +                 (page_to_zone(successor) != zone) ||
>>                   (phys_to_nid(page_to_maddr(successor)) != node) )
>>                  break;
> 
> ... for one that slows down many free operations, even if just
> slightly. IOW afaict either the change is not needed, or its
> description needs updating.

At very least it's more robust this way; the algorithm is also less
"magic".  We just had a long discussion this morning trying to re-create
the logic for why "Remove MFN 0" was sufficient to prevent this issue,
and even then David wasn't sure it was correct at first.

 -George


* Re: [Xen-devel] [XEN PATCH v2 1/2] Check zone before merging adjacent blocks in heap
  2020-02-04 15:22 ` [Xen-devel] [XEN PATCH v2 1/2] Check zone before merging adjacent blocks in heap Jan Beulich
  2020-02-04 15:37   ` George Dunlap
@ 2020-02-04 15:37   ` Stewart Hildebrand
  1 sibling, 0 replies; 48+ messages in thread
From: Stewart Hildebrand @ 2020-02-04 15:37 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Jeff Kubascik,
	xen-devel, David Woodhouse

On Tuesday, February 4, 2020 10:22 AM, Jan Beulich wrote:
>On 04.02.2020 16:14, Stewart Hildebrand wrote:
>> From: Jeff Kubascik <jeff.kubascik@dornerworks.com>
>>
>> The Xen heap is split up into nodes and zones. Each node + zone is
>> managed as a separate pool of memory.
>>
>> When returning pages to the heap, free_heap_pages will check adjacent
>> blocks to see if they can be combined into a larger block. However, the
>> zone of the adjacent block is not checked. This results in blocks that
>> migrate from one zone to another.
>>
>> When a block migrates to the adjacent zone, the avail counters for the
>> old and new node + zone is not updated accordingly. The avail counter
>> is used when allocating pages to determine whether to skip over a zone.
>> With this behavior, it is possible for free pages to collect in a zone
>> with the avail counter smaller than the actual page count, resulting
>> in free pages that are not allocable.
>
>"When a block migrates" - fine. But is this situation possible to
>occur, without "xen/page_alloc: Keep away MFN 0 from the buddy
>allocator" reverted?

No, not as far as I'm aware, though I have not studied this code in detail so I don't feel fully confident in my "no".

> If not, there's no bug, no need for a change,
>and even less so ...
>
>> --- a/xen/common/page_alloc.c
>> +++ b/xen/common/page_alloc.c
>> @@ -1462,6 +1462,7 @@ static void free_heap_pages(
>>              if ( !mfn_valid(page_to_mfn(predecessor)) ||
>>                   !page_state_is(predecessor, free) ||
>>                   (PFN_ORDER(predecessor) != order) ||
>> +                 (page_to_zone(predecessor) != zone) ||
>>                   (phys_to_nid(page_to_maddr(predecessor)) != node) )
>>                  break;
>>
>> @@ -1485,6 +1486,7 @@ static void free_heap_pages(
>>              if ( !mfn_valid(page_to_mfn(successor)) ||
>>                   !page_state_is(successor, free) ||
>>                   (PFN_ORDER(successor) != order) ||
>> +                 (page_to_zone(successor) != zone) ||
>>                   (phys_to_nid(page_to_maddr(successor)) != node) )
>>                  break;
>
>... for one that slows down many free operations, even if just
>slightly. IOW afaict either the change is not needed, or its
>description needs updating.

Right. An alternative that wouldn't potentially slow things down in production builds would be to apply the ASSERT from patch 2. I don't have any performance metrics regarding exactly how much of a performance hit this would incur.

Stew

>
>Jan

* Re: [Xen-devel] [XEN PATCH v2 1/2] Check zone before merging adjacent blocks in heap
  2020-02-04 15:37   ` George Dunlap
@ 2020-02-05  9:50     ` David Woodhouse
  2020-02-05 10:02       ` Jan Beulich
  2020-02-05 10:22       ` [Xen-devel] [XEN PATCH v2 1/2] Check zone before merging adjacent blocks in heap Julien Grall
  0 siblings, 2 replies; 48+ messages in thread
From: David Woodhouse @ 2020-02-05  9:50 UTC (permalink / raw)
  To: George Dunlap, Jan Beulich, Stewart Hildebrand
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Jeff Kubascik,
	xen-devel



On Tue, 2020-02-04 at 15:37 +0000, George Dunlap wrote:
> At very least it's more robust this way; the algorithm is also less
> "magic".  We just had a long discussion this morning trying to re-create
> the logic for why "Remove MFN 0" was sufficient to prevent this issue,
> and even then David wasn't sure it was correct at first.

Right. So the real reason I'm staring hard at this is because I can't
convince myself there aren't places where memory allocated by the boot
allocator is later freed with free_xenheap_pages().

We have a few pieces of code which decide whether to use the boot
allocator vs. heap based on system_state >= SYS_STATE_boot, and *if*
the rule is "thou shalt not allocate with boot allocator and free
later" then it's *extremely* fragile and probably being violated —
especially because it actually *works* most of the time, except in some
esoteric corner cases like MFN#0, boot allocations which cross
zones/regions, etc.

So because we want to make that *more* likely by allowing vmap() to
happen earlier, I'd like to clean things up by addressing those corner
cases and making it unconditionally OK to free boot-allocated pages
into the heap.

I think it might be as simple as checking for (first_pg)->count_info == 0
in free_xenheap_pages(). That's quick enough, and if the count_info is
zero then I think it does indicate a boot-allocated page, because pages
from alloc_xenheap_pages() would have PGC_xen_heap set?

It would suffice just to pass such pages to init_heap_pages() instead
of directly to free_heap_pages(), I think. Julien?

The straw man version of that looks a bit like this...

--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -2304,6 +2304,12 @@ void free_xenheap_pages(void *v, unsigned int order)
 
     pg = virt_to_page(v);
 
+    /* Pages from the boot allocator need to pass through init_heap_pages() */
+    if ( unlikely(!pg->count_info) )
+    {
+        init_heap_pages(pg, 1 << order);
+        return;
+    }
     for ( i = 0; i < (1u << order); i++ )
         pg[i].count_info &= ~PGC_xen_heap;
 



* Re: [Xen-devel] [XEN PATCH v2 1/2] Check zone before merging adjacent blocks in heap
  2020-02-05  9:50     ` David Woodhouse
@ 2020-02-05 10:02       ` Jan Beulich
  2020-02-05 10:24         ` David Woodhouse
  2020-02-05 10:22       ` [Xen-devel] [XEN PATCH v2 1/2] Check zone before merging adjacent blocks in heap Julien Grall
  1 sibling, 1 reply; 48+ messages in thread
From: Jan Beulich @ 2020-02-05 10:02 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, George Dunlap,
	Jeff Kubascik, Stewart Hildebrand, xen-devel

On 05.02.2020 10:50, David Woodhouse wrote:
> On Tue, 2020-02-04 at 15:37 +0000, George Dunlap wrote:
>> At very least it's more robust this way; the algorithm is also less
>> "magic".  We just had a long discussion this morning trying to re-create
>> the logic for why "Remove MFN 0" was sufficient to prevent this issue,
>> and even then David wasn't sure it was correct at first.
> 
> Right. So the real reason I'm staring hard at this is because I can't
> convince myself there aren't places where memory allocated by the boot
> allocator is later freed with free_xenheap_pages().
> 
> We have a few pieces of code which decide whether to use the boot
> allocator vs. heap based on system_state >= SYS_STATE_boot, and *if*
> the rule is "thou shalt not allocate with boot allocator and free
> later" then it's *extremely* fragile and probably being violated —
> especially because it actually *works* most of the time, except in some
> esoteric corner cases like MFN#0, boot allocations which cross
> zones/regions, etc.
> 
> So because we want to make that *more* likely by allowing vmap() to
> happen earlier, I'd like to clean things up by addressing those corner
> cases and making it unconditionally OK to free boot-allocated pages
> into the heap.
> 
> I think might be as simple as checking for (first_pg)->count_info == 0
> in free_xenheap_pages(). That's quick enough, and if the count_info is
> zero then I think it does indicate a boot-allocated page, because pages
> from alloc_xenheap_pages() would have PGC_xen_heap set?

They would, but that leaves {alloc,free}_domheap_pages() out of
the picture. I.e. ...

> It would suffice just to pass such pages to init_heap_pages() instead
> of directly to free_heap_pages(), I think. Julien?
> 
> The straw man version of that looks a bit like this...
> 
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -2304,6 +2304,12 @@ void free_xenheap_pages(void *v, unsigned int order)
>  
>      pg = virt_to_page(v);
>  
> +    /* Pages from the boot allocator need to pass through init_heap_pages() */
> +    if ( unlikely(!pg->count_info) )

... while I think this check may be fine here, no similar one
can be used in free_domheap_pages(), yet pages getting handed
there isn't less likely than ones getting handed to
free_xenheap_pages() (if we already fear mismatch).

Jan


* Re: [Xen-devel] [XEN PATCH v2 1/2] Check zone before merging adjacent blocks in heap
  2020-02-05  9:50     ` David Woodhouse
  2020-02-05 10:02       ` Jan Beulich
@ 2020-02-05 10:22       ` Julien Grall
  2020-02-05 10:32         ` David Woodhouse
  2020-02-05 11:36         ` David Woodhouse
  1 sibling, 2 replies; 48+ messages in thread
From: Julien Grall @ 2020-02-05 10:22 UTC (permalink / raw)
  To: David Woodhouse, George Dunlap, Jan Beulich, Stewart Hildebrand
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Jeff Kubascik,
	xen-devel

Hi,

On 05/02/2020 09:50, David Woodhouse wrote:
> On Tue, 2020-02-04 at 15:37 +0000, George Dunlap wrote:
>> At very least it's more robust this way; the algorithm is also less
>> "magic".  We just had a long discussion this morning trying to re-create
>> the logic for why "Remove MFN 0" was sufficient to prevent this issue,
>> and even then David wasn't sure it was correct at first.
> 
> Right. So the real reason I'm staring hard at this is because I can't
> convince myself there aren't places where memory allocated by the boot
> allocator is later freed with free_xenheap_pages().
> 
> We have a few pieces of code which decide whether to use the boot
> allocator vs. heap based on system_state >= SYS_STATE_boot, and *if*
> the rule is "thou shalt not allocate with boot allocator and free
> later" then it's *extremely* fragile and probably being violated —
> especially because it actually *works* most of the time, except in some
> esoteric corner cases like MFN#0, boot allocations which cross
> zones/regions, etc.
> 
> So because we want to make that *more* likely by allowing vmap() to
> happen earlier, I'd like to clean things up by addressing those corner
> cases and making it unconditionally OK to free boot-allocated pages
> into the heap.
> 
> I think might be as simple as checking for (first_pg)->count_info == 0
> in free_xenheap_pages(). That's quick enough, and if the count_info is
> zero then I think it does indicate a boot-allocated page, because pages
> from alloc_xenheap_pages() would have PGC_xen_heap set?
> 
> It would suffice just to pass such pages to init_heap_pages() instead
> of directly to free_heap_pages(), I think. Julien?
> 
> The straw man version of that looks a bit like this...
> 
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -2304,6 +2304,12 @@ void free_xenheap_pages(void *v, unsigned int order)
>   
>       pg = virt_to_page(v);
>   
> +    /* Pages from the boot allocator need to pass through init_heap_pages() */
> +    if ( unlikely(!pg->count_info) )

Note that there are two versions of free_xenheap_pages(). This one only 
covers the case where the domheap and xenheap are the same.

But you can't use the same trick when the xenheap is separate (like on 
Arm32) because PGC_xen_heap is not set. So you would be calling 
init_heap_pages() every time.

However, I don't like the idea of relying on count_info == 0. Indeed, 
there are valid cases where count_info == 0 because it means the page is 
in use (PGC_state_inuse).

It might be best if we introduce a new page state that would be the 
default value set in the frametable.

0 may be an option if we assign a different value to PGC_state_inuse, but 
we would need to check whether there are places relying on PGC_state_inuse 
== 0. I know that assign_pages() does rely on it; I sent a patch for 
that yesterday.

Another option is to use an all 1s value and initialize the frametable 
to all 1s. But I am not entirely sure whether the allocator would be 
able to cope with it.

Cheers,

-- 
Julien Grall


* Re: [Xen-devel] [XEN PATCH v2 1/2] Check zone before merging adjacent blocks in heap
  2020-02-05 10:02       ` Jan Beulich
@ 2020-02-05 10:24         ` David Woodhouse
  2020-02-05 10:49           ` Jan Beulich
  0 siblings, 1 reply; 48+ messages in thread
From: David Woodhouse @ 2020-02-05 10:24 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, George Dunlap,
	Jeff Kubascik, Stewart Hildebrand, xen-devel



On Wed, 2020-02-05 at 11:02 +0100, Jan Beulich wrote:
> > +    /* Pages from the boot allocator need to pass through init_heap_pages() */
> > +    if ( unlikely(!pg->count_info) )
> 
> ... while I think this check may be fine here, no similar one
> can be used in free_domheap_pages(), yet pages getting handed
> there isn't less likely than ones getting handed to
> free_xenheap_pages() (if we already fear mismatch).

Do we care about that?

ICBW but I don't think I've seen a case where boot-allocated pages get
handed to free_domheap_pages() later. I've only seen them handed to
free_xenheap_pages(). These are pages which are mapped to Xen, not
domheap pages.

You are already expected *not* to conflate free_xenheap_pages() and
free_domheap_pages().

I think it should be OK to declare that freeing boot-allocated pages
with free_xenheap_pages() is permitted, but freeing them with
free_domheap_pages() isn't.

(Note my straw man patch didn't fix the CONFIG_SEPARATE_XENHEAP case
and doesn't trivially port to that version of free_xenheap_pages()
because the PGC_xen_heap flag isn't used there. But it's fixable
relatively easily.)





* Re: [Xen-devel] [XEN PATCH v2 1/2] Check zone before merging adjacent blocks in heap
  2020-02-05 10:22       ` [Xen-devel] [XEN PATCH v2 1/2] Check zone before merging adjacent blocks in heap Julien Grall
@ 2020-02-05 10:32         ` David Woodhouse
  2020-02-05 11:36         ` David Woodhouse
  1 sibling, 0 replies; 48+ messages in thread
From: David Woodhouse @ 2020-02-05 10:32 UTC (permalink / raw)
  To: Julien Grall, George Dunlap, Jan Beulich, Stewart Hildebrand
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Jeff Kubascik,
	xen-devel


On Wed, 2020-02-05 at 10:22 +0000, Julien Grall wrote:
> Hi,
> 
> On 05/02/2020 09:50, David Woodhouse wrote:
> > On Tue, 2020-02-04 at 15:37 +0000, George Dunlap wrote:
> > > At very least it's more robust this way; the algorithm is also less
> > > "magic".  We just had a long discussion this morning trying to re-create
> > > the logic for why "Remove MFN 0" was sufficient to prevent this issue,
> > > and even then David wasn't sure it was correct at first.
> > 
> > Right. So the real reason I'm staring hard at this is because I can't
> > convince myself there aren't places where memory allocated by the boot
> > allocator is later freed with free_xenheap_pages().
> > 
> > We have a few pieces of code which decide whether to use the boot
> > allocator vs. heap based on system_state >= SYS_STATE_boot, and *if*
> > the rule is "thou shalt not allocate with boot allocator and free
> > later" then it's *extremely* fragile and probably being violated —
> > especially because it actually *works* most of the time, except in some
> > esoteric corner cases like MFN#0, boot allocations which cross
> > zones/regions, etc.
> > 
> > So because we want to make that *more* likely by allowing vmap() to
> > happen earlier, I'd like to clean things up by addressing those corner
> > cases and making it unconditionally OK to free boot-allocated pages
> > into the heap.
> > 
> > I think it might be as simple as checking for (first_pg)->count_info == 0
> > in free_xenheap_pages(). That's quick enough, and if the count_info is
> > zero then I think it does indicate a boot-allocated page, because pages
> > from alloc_xenheap_pages() would have PGC_xen_heap set?
> > 
> > It would suffice just to pass such pages to init_heap_pages() instead
> > of directly to free_heap_pages(), I think. Julien?
> > 
> > The straw man version of that looks a bit like this...
> > 
> > --- a/xen/common/page_alloc.c
> > +++ b/xen/common/page_alloc.c
> > @@ -2304,6 +2304,12 @@ void free_xenheap_pages(void *v, unsigned int order)
> >   
> >       pg = virt_to_page(v);
> >   
> > +    /* Pages from the boot allocator need to pass through init_heap_pages() */
> > +    if ( unlikely(!pg->count_info) )
> 
> Note that there is two versions of free_xenheap_pages(). This one only 
> cover the case where the domheap and xenheap are the same.
> 
> But you can't use the same trick when xenheap is separated (like on 
> Arm32) because PGC_xen_heap is not set. So you would be calling 
> init_heap_pages() everytime.

Right. We'd want to set PGC_xen_heap there too on the corresponding
pages.

> However, I don't like the idea of relying on count_info == 0. Indeed, 
> there are valid cases where count_info == 0, because it means the page 
> is in use (PGC_state_inuse).

For xenheap pages, not just domheap pages?

> It might be best if we introduce a new page state that would be the 
> default value set in the frametable.

... and which is easily set with memset() so it's fast, which I think
you considered since you were suggesting all 0xFF, but which we haven't
explicitly stated out loud.

The other thing to note is that it's easy to make a default state for
pages which *weren't* given out by the boot allocator, but we don't
have a simple way to do anything to the pages which *were* given out by
the boot allocator, since we don't track those and — fairly much by
definition — we don't have the frametable yet when we start the boot
allocator.




* Re: [Xen-devel] [XEN PATCH v2 1/2] Check zone before merging adjacent blocks in heap
  2020-02-05 10:24         ` David Woodhouse
@ 2020-02-05 10:49           ` Jan Beulich
  2020-02-05 11:23             ` David Woodhouse
  0 siblings, 1 reply; 48+ messages in thread
From: Jan Beulich @ 2020-02-05 10:49 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, George Dunlap,
	Jeff Kubascik, Stewart Hildebrand, xen-devel

On 05.02.2020 11:24, David Woodhouse wrote:
> On Wed, 2020-02-05 at 11:02 +0100, Jan Beulich wrote:
>>> +    /* Pages from the boot allocator need to pass through init_heap_pages() */
>>> +    if ( unlikely(!pg->count_info) )
>>
>> ... while I think this check may be fine here, no similar one
>> can be used in free_domheap_pages(), yet pages getting handed
>> there isn't less likely than ones getting handed to
>> free_xenheap_pages() (if we already fear mismatch).
> 
> Do we care about that?
> 
> ICBW but I don't think I've seen a case where boot-allocated pages get
> handed to free_domheap_pages() later. I've only seen them handed to
> free_xenheap_pages(). These are pages which are mapped to Xen, not
> domheap pages.
> 
> You are already expected *not* to conflate free_xenheap_pages() and
> free_domheap_pages().

That's the case now, but with us wanting to get rid of the direct map,
the sole difference will be whether a mapping gets established.
There'll likely be no need for a PGC_xen_heap flag anymore, as we
don't need to tell apart the underlying pages (at least as far as the
allocator is concerned).

> I think it should be OK to declare that freeing boot-allocated pages
> with free_xenheap_pages() is permitted, but freeing them with
> free_domheap_pages() isn't.

I don't think this is going to be a good idea. free_xenheap_pages()
right now expects an address within the direct map to be passed.
alloc_boot_pages(), however, returns an MFN, which may get mapped
to arbitrary linear addresses (or none at all). There's quite a bit
more similarity to alloc_domheap_pages(), as that returns
struct page_info *, which in turn has a reliable translation
to/from MFN. Yet, as you say elsewhere, whether an MFN has an
entry in frame_table[] is entirely unclear, so permitting boot-
allocator pages to be freed via free_domheap_pages() nevertheless
still doesn't look any better an idea to me.

Jan


* Re: [Xen-devel] [XEN PATCH v2 1/2] Check zone before merging adjacent blocks in heap
  2020-02-05 10:49           ` Jan Beulich
@ 2020-02-05 11:23             ` David Woodhouse
  2020-02-05 13:37               ` Jan Beulich
  0 siblings, 1 reply; 48+ messages in thread
From: David Woodhouse @ 2020-02-05 11:23 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, George Dunlap,
	Jeff Kubascik, Stewart Hildebrand, xen-devel


On Wed, 2020-02-05 at 11:49 +0100, Jan Beulich wrote:
> Yet, as you say elsewhere, whether an MFN has an
> entry in frame_table[] is entirely unclear, so permitting boot-
> allocator pages to be freed via free_domheap_pages() nevertheless
> still doesn't look any better an idea to me.

Hm, I don't think I said that, did I? That would be a new and exciting
complication.

I think every MFN handed out by the boot allocator *should* have a
corresponding entry in the frame table. All the PDX setup is done for
those pages, just as it is for heap pages. In fact, some of that setup
is *only* done in init_boot_pages() right now, and if page ranges don't
go through the boot allocator and end up being transferred to the heap
in end_boot_allocator(), things (like badpages= on the command line)
don't work right.


* Re: [Xen-devel] [XEN PATCH v2 1/2] Check zone before merging adjacent blocks in heap
  2020-02-05 10:22       ` [Xen-devel] [XEN PATCH v2 1/2] Check zone before merging adjacent blocks in heap Julien Grall
  2020-02-05 10:32         ` David Woodhouse
@ 2020-02-05 11:36         ` David Woodhouse
  1 sibling, 0 replies; 48+ messages in thread
From: David Woodhouse @ 2020-02-05 11:36 UTC (permalink / raw)
  To: Julien Grall, George Dunlap, Jan Beulich, Stewart Hildebrand
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Jeff Kubascik,
	xen-devel


On Wed, 2020-02-05 at 10:22 +0000, Julien Grall wrote:
> However, I don't like the idea of relying on count_info == 0. Indeed, 
> there are valid cases where count_info == 0, because it means the page 
> is in use (PGC_state_inuse).
> 
> It might be best if we introduce a new page state that would be the 
> default value set in the frametable.
> 
> 0 may be an option if we assign a different value to PGC_state_inuse but 
> we would need to check if there are places relying on PGC_state_inuse == 
> 0. I know that assign_pages() does rely on it, I have sent a patch for 
> it yesterday.

We'd need an extra bit for that.

Perhaps we could recognise that PGC_broken is not valid in combination
with some of the four PGC_state states, and take that bit back to turn
PGC_state into a 3-bit field, which could hold 8 states of which I
think we only currently need 7?

 { zero, inuse, offlining, offlining_broken, offline, offline_broken, free }

> Another option is to use an all 1s value and initialize the frametable 
> to all 1s. But I am not entirely sure whether the allocator would be 
> able to cope with it.

We'd zero it in init_heap_pages() for every page, and it would only be
still set to all 1s for pages from the boot allocator; as long as all
1s *is* proven to be an invalid state (I think it is), the same kind
of trick, checking unlikely(pg->count_info == -1) in the free
functions as in my earlier straw man, would probably work.

I think I prefer zero being the "untouched" state though, rather than
all 1s. 



* Re: [Xen-devel] [XEN PATCH v2 1/2] Check zone before merging adjacent blocks in heap
  2020-02-05 11:23             ` David Woodhouse
@ 2020-02-05 13:37               ` Jan Beulich
  2020-02-05 14:12                 ` David Woodhouse
  0 siblings, 1 reply; 48+ messages in thread
From: Jan Beulich @ 2020-02-05 13:37 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, George Dunlap,
	Jeff Kubascik, Stewart Hildebrand, xen-devel

On 05.02.2020 12:23, David Woodhouse wrote:
> On Wed, 2020-02-05 at 11:49 +0100, Jan Beulich wrote:
>> Yet, as you say elsewhere, whether an MFN has an
>> entry in frame_table[] is entirely unclear, so permitting boot-
>> allocator pages to be freed via free_domheap_pages() nevertheless
>> still doesn't look any better an idea to me.
> 
> Hm, I don't think I said that, did I? That would be a new and exciting
> complication.
> 
> I think every MFN handed out by the boot allocator *should* have a
> corresponding entry in the frame table. All the PDX setup is done for
> those pages, just as it is for heap pages. In fact, some of that setup
> is *only* done in init_boot_pages() right now, and if page ranges don't
> go through the boot allocator and end up being transferred to the heap
> in end_boot_allocator(), things (like badpages= on the command line)
> don't work right.

I guess I should have said "whether an MFN has a properly initialized
entry in frame_table[] is entirely unclear".

Jan



* Re: [Xen-devel] [XEN PATCH v2 1/2] Check zone before merging adjacent blocks in heap
  2020-02-05 13:37               ` Jan Beulich
@ 2020-02-05 14:12                 ` David Woodhouse
  2020-02-07 15:49                   ` David Woodhouse
  0 siblings, 1 reply; 48+ messages in thread
From: David Woodhouse @ 2020-02-05 14:12 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, George Dunlap,
	Jeff Kubascik, Stewart Hildebrand, xen-devel, David Woodhouse



> On 05.02.2020 12:23, David Woodhouse wrote:
>> On Wed, 2020-02-05 at 11:49 +0100, Jan Beulich wrote:
>>> Yet, as you say elsewhere, whether an MFN has an
>>> entry in frame_table[] is entirely unclear, so permitting boot-
>>> allocator pages to be freed via free_domheap_pages() nevertheless
>>> still doesn't look any better an idea to me.
>>
>> Hm, I don't think I said that, did I? That would be a new and exciting
>> complication.
>>
>> I think every MFN handed out by the boot allocator *should* have a
>> corresponding entry in the frame table. All the PDX setup is done for
>> those pages, just as it is for heap pages. In fact, some of that setup
>> is *only* done in init_boot_pages() right now, and if page ranges don't
>> go through the boot allocator and end up being transferred to the heap
>> in end_boot_allocator(), things (like badpages= on the command line)
>> don't work right.
>
> I guess I should have said "whether an MFN has a properly initialized
> entry in frame_table[] is entirely unclear".

Aha, right. Yes, I admit to having said that :)

I think we have a viable path to fixing that, by folding PGC_broken into
the state bits so that we can disambiguate. Will experiment.


-- 
dwmw2



* Re: [Xen-devel] [XEN PATCH v2 1/2] Check zone before merging adjacent blocks in heap
  2020-02-05 14:12                 ` David Woodhouse
@ 2020-02-07 15:49                   ` David Woodhouse
  2020-02-07 15:57                     ` [Xen-devel] [PATCH 1/2] xen/mm: fold PGC_broken into PGC_state bits David Woodhouse
  2020-02-07 15:57                     ` [Xen-devel] [PATCH 2/2] xen/mm: Introduce PG_state_uninitialised David Woodhouse
  0 siblings, 2 replies; 48+ messages in thread
From: David Woodhouse @ 2020-02-07 15:49 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, George Dunlap,
	Jeff Kubascik, Stewart Hildebrand, xen-devel


On Wed, 2020-02-05 at 14:12 +0000, David Woodhouse wrote:
> I think we have a viable path to fixing that, by folding PGC_broken into
> the state bits so that we can disambiguate. Will experiment.

Here, it looks something like this. First we fold PGC_broken into the
state bits giving us 8 possible states of which only 6 are currently in
use.

Then in the second patch we can move PGC_state_inuse from zero (the
default contents of the frame table at startup), and make a new state
PGC_state_uninitialised with the value zero.

We can make free_xenheap_pages() and free_domheap_pages() call
init_heap_pages() instead of free_heap_pages() if they see a page range
which is still in PGC_state_uninitialised.

We need to fix up a couple of asserts, not only to cope with
PGC_state_inuse not being zero (as Julien has already looked at) but
also to accept PGC_state_uninitialised pages.

In assign_pages() that's because we map the multiboot module containing
the initramfs to dom0. That might actually cross node boundaries,
contain MFN#0, etc. — so if/when that gets released by dom0 we'd want
those pages to be passed to init_heap_pages() just the same as boot-
allocated memory.

In share_xen_page_with_guest() it happens because we share all non-RAM
page frames with dom0.

(2 patches follow...)


* [Xen-devel] [PATCH 1/2] xen/mm: fold PGC_broken into PGC_state bits
  2020-02-07 15:49                   ` David Woodhouse
@ 2020-02-07 15:57                     ` David Woodhouse
  2020-02-07 20:27                       ` Julien Grall
  2020-02-20 11:10                       ` Jan Beulich
  2020-02-07 15:57                     ` [Xen-devel] [PATCH 2/2] xen/mm: Introduce PG_state_uninitialised David Woodhouse
  1 sibling, 2 replies; 48+ messages in thread
From: David Woodhouse @ 2020-02-07 15:57 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, George Dunlap,
	Jeff Kubascik, Stewart Hildebrand, xen-devel

From: David Woodhouse <dwmw@amazon.co.uk>

Only PGC_state_offlining and PGC_state_offlined are valid in conjunction
with PGC_broken. The other two states (free and inuse) were never valid
for a broken page.

By folding PGC_broken in, we can have three bits for PGC_state which
allows up to 8 states, of which 6 are currently used and 2 are available
for new use cases.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 xen/arch/x86/domctl.c    |  2 +-
 xen/common/page_alloc.c  | 67 ++++++++++++++++++++++++----------------
 xen/include/asm-arm/mm.h | 26 +++++++++++-----
 xen/include/asm-x86/mm.h | 33 +++++++++++++-------
 4 files changed, 82 insertions(+), 46 deletions(-)

diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 4fa9c91140..17a318e16d 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -415,7 +415,7 @@ long arch_do_domctl(
                 if ( page->u.inuse.type_info & PGT_pinned )
                     type |= XEN_DOMCTL_PFINFO_LPINTAB;
 
-                if ( page->count_info & PGC_broken )
+                if ( page_is_broken(page) )
                     type = XEN_DOMCTL_PFINFO_BROKEN;
             }
 
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 97902d42c1..4084503554 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -1093,7 +1093,7 @@ static int reserve_offlined_page(struct page_info *head)
         struct page_info *pg;
         int next_order;
 
-        if ( page_state_is(cur_head, offlined) )
+        if ( page_is_offlined(cur_head) )
         {
             cur_head++;
             if ( first_dirty != INVALID_DIRTY_IDX && first_dirty )
@@ -1113,7 +1113,7 @@ static int reserve_offlined_page(struct page_info *head)
             for ( i = (1 << cur_order), pg = cur_head + (1 << cur_order );
                   i < (1 << next_order);
                   i++, pg++ )
-                if ( page_state_is(pg, offlined) )
+                if ( page_is_offlined(pg) )
                     break;
             if ( i == ( 1 << next_order) )
             {
@@ -1145,16 +1145,19 @@ static int reserve_offlined_page(struct page_info *head)
 
     for ( cur_head = head; cur_head < head + ( 1UL << head_order); cur_head++ )
     {
-        if ( !page_state_is(cur_head, offlined) )
+        struct page_list_head *list;
+        if ( page_state_is(cur_head, offlined) )
+            list = &page_offlined_list;
+        else if ( page_state_is(cur_head, broken) )
+            list = &page_broken_list;
+        else
             continue;
 
         avail[node][zone]--;
         total_avail_pages--;
         ASSERT(total_avail_pages >= 0);
 
-        page_list_add_tail(cur_head,
-                           test_bit(_PGC_broken, &cur_head->count_info) ?
-                           &page_broken_list : &page_offlined_list);
+        page_list_add_tail(cur_head, list);
 
         count++;
     }
@@ -1404,13 +1407,16 @@ static void free_heap_pages(
         switch ( pg[i].count_info & PGC_state )
         {
         case PGC_state_inuse:
-            BUG_ON(pg[i].count_info & PGC_broken);
             pg[i].count_info = PGC_state_free;
             break;
 
         case PGC_state_offlining:
-            pg[i].count_info = (pg[i].count_info & PGC_broken) |
-                               PGC_state_offlined;
+            pg[i].count_info = PGC_state_offlined;
+            tainted = 1;
+            break;
+
+        case PGC_state_broken_offlining:
+            pg[i].count_info = PGC_state_broken;
             tainted = 1;
             break;
 
@@ -1527,16 +1533,25 @@ static unsigned long mark_page_offline(struct page_info *pg, int broken)
     do {
         nx = x = y;
 
-        if ( ((x & PGC_state) != PGC_state_offlined) &&
-             ((x & PGC_state) != PGC_state_offlining) )
+        nx &= ~PGC_state;
+
+        switch ( x & PGC_state )
         {
-            nx &= ~PGC_state;
-            nx |= (((x & PGC_state) == PGC_state_free)
-                   ? PGC_state_offlined : PGC_state_offlining);
-        }
+        case PGC_state_inuse:
+        case PGC_state_offlining:
+            nx |= broken ? PGC_state_broken_offlining : PGC_state_offlining;
+            break;
+
+        case PGC_state_free:
+        case PGC_state_offlined:
+            nx |= broken ? PGC_state_broken : PGC_state_offlined;
+            break;
 
-        if ( broken )
-            nx |= PGC_broken;
+        case PGC_state_broken_offlining:
+        case PGC_state_broken:
+            /* Already broken: the state does not change. */
+            nx |= x & PGC_state;
+        }
 
         if ( x == nx )
             break;
@@ -1609,7 +1624,7 @@ int offline_page(mfn_t mfn, int broken, uint32_t *status)
      * need to prevent malicious guest access the broken page again.
      * Under such case, hypervisor shutdown guest, preventing recursive mce.
      */
-    if ( (pg->count_info & PGC_broken) && (owner = page_get_owner(pg)) )
+    if ( page_is_broken(pg) && (owner = page_get_owner(pg)) )
     {
         *status = PG_OFFLINE_AGAIN;
         domain_crash(owner);
@@ -1620,7 +1635,7 @@ int offline_page(mfn_t mfn, int broken, uint32_t *status)
 
     old_info = mark_page_offline(pg, broken);
 
-    if ( page_state_is(pg, offlined) )
+    if ( page_is_offlined(pg) )
     {
         reserve_heap_page(pg);
 
@@ -1699,14 +1714,14 @@ unsigned int online_page(mfn_t mfn, uint32_t *status)
     do {
         ret = *status = 0;
 
-        if ( y & PGC_broken )
+        if ( (y & PGC_state) == PGC_state_broken ||
+             (y & PGC_state) == PGC_state_broken_offlining )
         {
             ret = -EINVAL;
             *status = PG_ONLINE_FAILED |PG_ONLINE_BROKEN;
             break;
         }
-
-        if ( (y & PGC_state) == PGC_state_offlined )
+        else if ( (y & PGC_state) == PGC_state_offlined )
         {
             page_list_del(pg, &page_offlined_list);
             *status = PG_ONLINE_ONLINED;
@@ -1747,11 +1762,11 @@ int query_page_offline(mfn_t mfn, uint32_t *status)
 
     pg = mfn_to_page(mfn);
 
-    if ( page_state_is(pg, offlining) )
+    if ( page_is_offlining(pg) )
         *status |= PG_OFFLINE_STATUS_OFFLINE_PENDING;
-    if ( pg->count_info & PGC_broken )
+    if ( page_is_broken(pg) )
         *status |= PG_OFFLINE_STATUS_BROKEN;
-    if ( page_state_is(pg, offlined) )
+    if ( page_is_offlined(pg) )
         *status |= PG_OFFLINE_STATUS_OFFLINED;
 
     spin_unlock(&heap_lock);
@@ -2483,7 +2498,7 @@ __initcall(pagealloc_keyhandler_init);
 
 void scrub_one_page(struct page_info *pg)
 {
-    if ( unlikely(pg->count_info & PGC_broken) )
+    if ( unlikely(page_is_broken(pg)) )
         return;
 
 #ifndef NDEBUG
diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
index 333efd3a60..c9466c8ca0 100644
--- a/xen/include/asm-arm/mm.h
+++ b/xen/include/asm-arm/mm.h
@@ -112,13 +112,22 @@ struct page_info
- /* Page is broken? */
-#define _PGC_broken       PG_shift(7)
-#define PGC_broken        PG_mask(1, 7)
- /* Mutually-exclusive page states: { inuse, offlining, offlined, free }. */
-#define PGC_state         PG_mask(3, 9)
-#define PGC_state_inuse   PG_mask(0, 9)
-#define PGC_state_offlining PG_mask(1, 9)
-#define PGC_state_offlined PG_mask(2, 9)
-#define PGC_state_free    PG_mask(3, 9)
-#define page_state_is(pg, st) (((pg)->count_info&PGC_state) == PGC_state_##st)
+ /*
+  * Mutually-exclusive page states:
+  * { inuse, offlining, offlined, free, broken_offlining, broken }
+  */
+#define PGC_state                  PG_mask(7, 9)
+#define PGC_state_inuse            PG_mask(0, 9)
+#define PGC_state_offlining        PG_mask(1, 9)
+#define PGC_state_offlined         PG_mask(2, 9)
+#define PGC_state_free             PG_mask(3, 9)
+#define PGC_state_broken_offlining PG_mask(4, 9)
+#define PGC_state_broken           PG_mask(5, 9)
+
+#define page_state_is(pg, st)      (((pg)->count_info&PGC_state) == PGC_state_##st)
+#define page_is_broken(pg)         (page_state_is((pg), broken_offlining) ||  \
+                                    page_state_is((pg), broken))
+#define page_is_offlined(pg)       (page_state_is((pg), broken) ||    \
+                                    page_state_is((pg), offlined))
+#define page_is_offlining(pg)      (page_state_is((pg), broken_offlining) || \
+                                    page_state_is((pg), offlining))
 
 /* Count of references to this frame. */
 #define PGC_count_width   PG_shift(9)
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index 2ca8882ad0..3edadf7a7c 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -67,18 +67,27 @@
  /* 3-bit PAT/PCD/PWT cache-attribute hint. */
 #define PGC_cacheattr_base PG_shift(6)
 #define PGC_cacheattr_mask PG_mask(7, 6)
- /* Page is broken? */
-#define _PGC_broken       PG_shift(7)
-#define PGC_broken        PG_mask(1, 7)
- /* Mutually-exclusive page states: { inuse, offlining, offlined, free }. */
-#define PGC_state         PG_mask(3, 9)
-#define PGC_state_inuse   PG_mask(0, 9)
-#define PGC_state_offlining PG_mask(1, 9)
-#define PGC_state_offlined PG_mask(2, 9)
-#define PGC_state_free    PG_mask(3, 9)
-#define page_state_is(pg, st) (((pg)->count_info&PGC_state) == PGC_state_##st)
-
- /* Count of references to this frame. */
+ /*
+  * Mutually-exclusive page states:
+  * { inuse, offlining, offlined, free, broken_offlining, broken }
+  */
+#define PGC_state                  PG_mask(7, 9)
+#define PGC_state_inuse            PG_mask(0, 9)
+#define PGC_state_offlining        PG_mask(1, 9)
+#define PGC_state_offlined         PG_mask(2, 9)
+#define PGC_state_free             PG_mask(3, 9)
+#define PGC_state_broken_offlining PG_mask(4, 9)
+#define PGC_state_broken           PG_mask(5, 9)
+
+#define page_state_is(pg, st)      (((pg)->count_info&PGC_state) == PGC_state_##st)
+#define page_is_broken(pg)         (page_state_is((pg), broken_offlining) ||  \
+                                    page_state_is((pg), broken))
+#define page_is_offlined(pg)       (page_state_is((pg), broken) ||    \
+                                    page_state_is((pg), offlined))
+#define page_is_offlining(pg)      (page_state_is((pg), broken_offlining) || \
+                                    page_state_is((pg), offlining))
+
+/* Count of references to this frame. */
 #define PGC_count_width   PG_shift(9)
 #define PGC_count_mask    ((1UL<<PGC_count_width)-1)
 
-- 
2.21.0



* [Xen-devel] [PATCH 2/2] xen/mm: Introduce PG_state_uninitialised
  2020-02-07 15:49                   ` David Woodhouse
  2020-02-07 15:57                     ` [Xen-devel] [PATCH 1/2] xen/mm: fold PGC_broken into PGC_state bits David Woodhouse
@ 2020-02-07 15:57                     ` David Woodhouse
  2020-02-07 16:30                       ` Xia, Hongyan
  1 sibling, 1 reply; 48+ messages in thread
From: David Woodhouse @ 2020-02-07 15:57 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, George Dunlap,
	Jeff Kubascik, Stewart Hildebrand, xen-devel

From: David Woodhouse <dwmw@amazon.co.uk>

It is possible for pages to enter general circulation without ever
being processed by init_heap_pages().

For example, pages of the multiboot module containing the initramfs may
be assigned via assign_pages() to dom0 as it is created. And some code
including map_pages_to_xen() has checks on 'system_state' to determine
whether to use the boot or the heap allocator, but it seems impossible
to prove that pages allocated by the boot allocator are not subsequently
freed with free_heap_pages().

This actually works fine in the majority of cases; there are only a few
esoteric corner cases which init_heap_pages() handles before handing the
page range off to free_heap_pages():
 • Excluding MFN #0 to avoid inappropriate cross-zone merging.
 • Ensuring that the node information structures exist, when the first
   page(s) of a given node are handled.
 • High order allocations crossing from one node to another.

To handle this case, shift PGC_state_inuse from its current value of
zero to another value. Use zero, which is the initial state of the
entire frame table, as PGC_state_uninitialised.

Fix a couple of assertions which were assuming that PGC_state_inuse is
zero, and make them cope with the PGC_state_uninitialised case too where
appropriate.

Finally, make free_xenheap_pages() and free_domheap_pages() call
through to init_heap_pages() instead of directly to free_heap_pages()
in the case where pages are being freed which have never passed through
init_heap_pages().

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 xen/arch/x86/mm.c        |  3 ++-
 xen/common/page_alloc.c  | 41 ++++++++++++++++++++++++++--------------
 xen/include/asm-arm/mm.h |  3 ++-
 xen/include/asm-x86/mm.h |  3 ++-
 4 files changed, 33 insertions(+), 17 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 9b33829084..bf660ee8eb 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -488,7 +488,8 @@ void share_xen_page_with_guest(struct page_info *page, struct domain *d,
 
     page_set_owner(page, d);
     smp_wmb(); /* install valid domain ptr before updating refcnt. */
-    ASSERT((page->count_info & ~PGC_xen_heap) == 0);
+    ASSERT((page->count_info & ~PGC_xen_heap) == PGC_state_inuse ||
+           (page->count_info & ~PGC_xen_heap) == PGC_state_uninitialised);
 
     /* Only add to the allocation list if the domain isn't dying. */
     if ( !d->is_dying )
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 4084503554..9703a2c664 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -1407,6 +1407,7 @@ static void free_heap_pages(
         switch ( pg[i].count_info & PGC_state )
         {
         case PGC_state_inuse:
+        case PGC_state_uninitialised:
             pg[i].count_info = PGC_state_free;
             break;
 
@@ -1780,11 +1781,10 @@ int query_page_offline(mfn_t mfn, uint32_t *status)
  * latter is not on a MAX_ORDER boundary, then we reserve the page by
  * not freeing it to the buddy allocator.
  */
-static void init_heap_pages(
-    struct page_info *pg, unsigned long nr_pages)
+static void init_heap_pages(struct page_info *pg, unsigned long nr_pages,
+                            bool scrub)
 {
     unsigned long i;
-    bool idle_scrub = false;
 
     /*
      * Keep MFN 0 away from the buddy allocator to avoid crossing zone
@@ -1809,7 +1809,7 @@ static void init_heap_pages(
     spin_unlock(&heap_lock);
 
     if ( system_state < SYS_STATE_active && opt_bootscrub == BOOTSCRUB_IDLE )
-        idle_scrub = true;
+        scrub = true;
 
     for ( i = 0; i < nr_pages; i++ )
     {
@@ -1837,7 +1837,7 @@ static void init_heap_pages(
             nr_pages -= n;
         }
 
-        free_heap_pages(pg + i, 0, scrub_debug || idle_scrub);
+        free_heap_pages(pg + i, 0, scrub_debug || scrub);
     }
 }
 
@@ -1873,7 +1873,7 @@ void __init end_boot_allocator(void)
         if ( (r->s < r->e) &&
              (phys_to_nid(pfn_to_paddr(r->s)) == cpu_to_node(0)) )
         {
-            init_heap_pages(mfn_to_page(_mfn(r->s)), r->e - r->s);
+            init_heap_pages(mfn_to_page(_mfn(r->s)), r->e - r->s, false);
             r->e = r->s;
             break;
         }
@@ -1882,7 +1882,7 @@ void __init end_boot_allocator(void)
     {
         struct bootmem_region *r = &bootmem_region_list[i];
         if ( r->s < r->e )
-            init_heap_pages(mfn_to_page(_mfn(r->s)), r->e - r->s);
+            init_heap_pages(mfn_to_page(_mfn(r->s)), r->e - r->s, false);
     }
     nr_bootmem_regions = 0;
 
@@ -2151,7 +2151,7 @@ void init_xenheap_pages(paddr_t ps, paddr_t pe)
 
     memguard_guard_range(maddr_to_virt(ps), pe - ps);
 
-    init_heap_pages(maddr_to_page(ps), (pe - ps) >> PAGE_SHIFT);
+    init_heap_pages(maddr_to_page(ps), (pe - ps) >> PAGE_SHIFT, false);
 }
 
 
@@ -2174,14 +2174,20 @@ void *alloc_xenheap_pages(unsigned int order, unsigned int memflags)
 
 void free_xenheap_pages(void *v, unsigned int order)
 {
+    struct page_info *pg;
     ASSERT(!in_irq());
 
     if ( v == NULL )
         return;
 
+    pg = virt_to_page(v);
+
     memguard_guard_range(v, 1 << (order + PAGE_SHIFT));
 
-    free_heap_pages(virt_to_page(v), order, false);
+    if ( unlikely(page_state_is(pg, uninitialised)) )
+        init_heap_pages(pg, 1 << order, true);
+    else
+        free_heap_pages(pg, order, false);
 }
 
 #else  /* !CONFIG_SEPARATE_XENHEAP */
@@ -2237,7 +2243,10 @@ void free_xenheap_pages(void *v, unsigned int order)
     for ( i = 0; i < (1u << order); i++ )
         pg[i].count_info &= ~PGC_xen_heap;
 
-    free_heap_pages(pg, order, true);
+    if ( unlikely(page_state_is(pg, uninitialised)) )
+        init_heap_pages(pg, 1 << order, true);
+    else
+        free_heap_pages(pg, order, true);
 }
 
 #endif  /* CONFIG_SEPARATE_XENHEAP */
@@ -2260,7 +2269,7 @@ void init_domheap_pages(paddr_t ps, paddr_t pe)
     if ( mfn_x(emfn) <= mfn_x(smfn) )
         return;
 
-    init_heap_pages(mfn_to_page(smfn), mfn_x(emfn) - mfn_x(smfn));
+    init_heap_pages(mfn_to_page(smfn), mfn_x(emfn) - mfn_x(smfn), false);
 }
 
 
@@ -2301,10 +2310,11 @@ int assign_pages(
     for ( i = 0; i < (1 << order); i++ )
     {
         ASSERT(page_get_owner(&pg[i]) == NULL);
-        ASSERT(!pg[i].count_info);
+        ASSERT(pg[i].count_info == PGC_state_inuse ||
+               pg[i].count_info == PGC_state_uninitialised);
         page_set_owner(&pg[i], d);
         smp_wmb(); /* Domain pointer must be visible before updating refcnt. */
-        pg[i].count_info = PGC_allocated | 1;
+        pg[i].count_info |= PGC_allocated | 1;
         page_list_add_tail(&pg[i], &d->page_list);
     }
 
@@ -2427,7 +2437,10 @@ void free_domheap_pages(struct page_info *pg, unsigned int order)
             scrub = 1;
         }
 
-        free_heap_pages(pg, order, scrub);
+        if ( unlikely(page_state_is(pg, uninitialised)) )
+            init_heap_pages(pg, 1 << order, scrub);
+        else
+            free_heap_pages(pg, order, scrub);
     }
 
     if ( drop_dom_ref )
diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
index c9466c8ca0..c696941600 100644
--- a/xen/include/asm-arm/mm.h
+++ b/xen/include/asm-arm/mm.h
@@ -117,12 +117,13 @@ struct page_info
   * { inuse, offlining, offlined, free, broken_offlining, broken }
   */
 #define PGC_state                  PG_mask(7, 9)
-#define PGC_state_inuse            PG_mask(0, 9)
+#define PGC_state_uninitialised    PG_mask(0, 9)
 #define PGC_state_offlining        PG_mask(1, 9)
 #define PGC_state_offlined         PG_mask(2, 9)
 #define PGC_state_free             PG_mask(3, 9)
 #define PGC_state_broken_offlining PG_mask(4, 9)
 #define PGC_state_broken           PG_mask(5, 9)
+#define PGC_state_inuse            PG_mask(6, 9)
 
 #define page_state_is(pg, st)      (((pg)->count_info&PGC_state) == PGC_state_##st)
 #define page_is_broken(pg)         (page_state_is((pg), broken_offlining) ||  \
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index 3edadf7a7c..645368e6a9 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -72,12 +72,13 @@
   * { inuse, offlining, offlined, free, broken_offlining, broken }
   */
 #define PGC_state                  PG_mask(7, 9)
-#define PGC_state_inuse            PG_mask(0, 9)
+#define PGC_state_uninitialised    PG_mask(0, 9)
 #define PGC_state_offlining        PG_mask(1, 9)
 #define PGC_state_offlined         PG_mask(2, 9)
 #define PGC_state_free             PG_mask(3, 9)
 #define PGC_state_broken_offlining PG_mask(4, 9)
 #define PGC_state_broken           PG_mask(5, 9)
+#define PGC_state_inuse            PG_mask(6, 9)
 
 #define page_state_is(pg, st)      (((pg)->count_info&PGC_state) == PGC_state_##st)
 #define page_is_broken(pg)         (page_state_is((pg), broken_offlining) ||  \
-- 
2.21.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [Xen-devel] [PATCH 2/2] xen/mm: Introduce PG_state_uninitialised
  2020-02-07 15:57                     ` [Xen-devel] [PATCH 2/2] xen/mm: Introduce PG_state_uninitialised David Woodhouse
@ 2020-02-07 16:30                       ` Xia, Hongyan
  2020-02-07 16:32                         ` David Woodhouse
  0 siblings, 1 reply; 48+ messages in thread
From: Xia, Hongyan @ 2020-02-07 16:30 UTC (permalink / raw)
  To: jbeulich, dwmw2
  Cc: sstabellini, julien, wl, konrad.wilk, george.dunlap,
	andrew.cooper3, ian.jackson, george.dunlap, jeff.kubascik,
	stewart.hildebrand, xen-devel

On Fri, 2020-02-07 at 15:57 +0000, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
> 
> ...
> 
> Finally, make free_xenheap_pages() and free_domheap_pages() call
> through to init_heap_pages() instead of directly to free_heap_pages()
> in the case where pages are being freed which have never passed
> through
> init_heap_pages().
> 
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

If both end up calling free_heap_pages, can we just put the check
there?

Hongyan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Xen-devel] [PATCH 2/2] xen/mm: Introduce PG_state_uninitialised
  2020-02-07 16:30                       ` Xia, Hongyan
@ 2020-02-07 16:32                         ` David Woodhouse
  2020-02-07 16:40                           ` Xia, Hongyan
  0 siblings, 1 reply; 48+ messages in thread
From: David Woodhouse @ 2020-02-07 16:32 UTC (permalink / raw)
  To: Xia, Hongyan, jbeulich
  Cc: sstabellini, julien, wl, konrad.wilk, george.dunlap,
	andrew.cooper3, ian.jackson, george.dunlap, jeff.kubascik,
	stewart.hildebrand, xen-devel



On 7 February 2020 16:30:04 GMT, "Xia, Hongyan" <hongyxia@amazon.com> wrote:
>On Fri, 2020-02-07 at 15:57 +0000, David Woodhouse wrote:
>> From: David Woodhouse <dwmw@amazon.co.uk>
>> 
>> ...
>> 
>> Finally, make free_xenheap_pages() and free_domheap_pages() call
>> through to init_heap_pages() instead of directly to free_heap_pages()
>> in the case where pages are being freed which have never passed
>> through
>> init_heap_pages().
>> 
>> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
>
>If both end up calling free_heap_pages, can we just put the check
>there?

Ideally not because init_heap_pages() then calls free_heap_pages() and the "recursion" is best avoided.

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Xen-devel] [PATCH 2/2] xen/mm: Introduce PG_state_uninitialised
  2020-02-07 16:32                         ` David Woodhouse
@ 2020-02-07 16:40                           ` Xia, Hongyan
  2020-02-07 17:06                             ` David Woodhouse
  0 siblings, 1 reply; 48+ messages in thread
From: Xia, Hongyan @ 2020-02-07 16:40 UTC (permalink / raw)
  To: jbeulich, dwmw2
  Cc: sstabellini, julien, wl, konrad.wilk, george.dunlap,
	andrew.cooper3, ian.jackson, george.dunlap, jeff.kubascik,
	stewart.hildebrand, xen-devel

On Fri, 2020-02-07 at 16:32 +0000, David Woodhouse wrote:
> 
> ...
> 
> Ideally not because init_heap_pages() then calls free_heap_pages()
> and the "recursion" is best avoided.

Kind of depends on where we change its pg to initialised. If we do that
in free_heap_pages this does recurse, but if it is done in
init_heap_pages before calling free it does not.

Hongyan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Xen-devel] [PATCH 2/2] xen/mm: Introduce PG_state_uninitialised
  2020-02-07 16:40                           ` Xia, Hongyan
@ 2020-02-07 17:06                             ` David Woodhouse
  2020-02-07 18:04                               ` David Woodhouse
  0 siblings, 1 reply; 48+ messages in thread
From: David Woodhouse @ 2020-02-07 17:06 UTC (permalink / raw)
  To: Xia, Hongyan, jbeulich
  Cc: sstabellini, julien, wl, konrad.wilk, george.dunlap,
	andrew.cooper3, ian.jackson, george.dunlap, jeff.kubascik,
	stewart.hildebrand, xen-devel



On 7 February 2020 16:40:01 GMT, "Xia, Hongyan" <hongyxia@amazon.com> wrote:
>On Fri, 2020-02-07 at 16:32 +0000, David Woodhouse wrote:
>> 
>> ...
>> 
>> Ideally not because init_heap_pages() then calls free_heap_pages()
>> and the "recursion" is best avoided.
>
>Kind of depends on where we change its pg to initialised. If we do that
>in free_heap_pages this does recurse, but if it is done in
>init_heap_pages before calling free it does not.

True. It would *look* like it might recurse, but never should in practice.
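To make the point concrete, here is a toy model of the call shape being discussed (all names and states are simplified stand-ins, not the real Xen code): the diversion from free_heap_pages() into init_heap_pages() happens at most once per page, because the state is flipped away from uninitialised before the second call.

```c
#include <assert.h>

/* Toy model: 0 = uninitialised (the frame table's initial state),
 * 1 = inuse, 2 = free. */
static int state[4];
static int init_calls;

static void free_heap_pages(int idx);

static void init_heap_pages(int idx)
{
    init_calls++;
    state[idx] = 1;        /* mark inuse *before* freeing... */
    free_heap_pages(idx);  /* ...so this call takes the normal path */
}

static void free_heap_pages(int idx)
{
    if ( state[idx] == 0 ) /* uninitialised: divert, exactly once */
    {
        init_heap_pages(idx);
        return;
    }
    state[idx] = 2;        /* normal path: page goes free */
}
```

Freeing an uninitialised page bounces through init_heap_pages() once and then terminates; freeing it again never re-enters the init path.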

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Xen-devel] [PATCH 2/2] xen/mm: Introduce PG_state_uninitialised
  2020-02-07 17:06                             ` David Woodhouse
@ 2020-02-07 18:04                               ` David Woodhouse
  2020-02-20 11:59                                 ` Jan Beulich
  0 siblings, 1 reply; 48+ messages in thread
From: David Woodhouse @ 2020-02-07 18:04 UTC (permalink / raw)
  To: Xia, Hongyan, jbeulich
  Cc: sstabellini, julien, wl, konrad.wilk, george.dunlap,
	andrew.cooper3, ian.jackson, george.dunlap, jeff.kubascik,
	stewart.hildebrand, xen-devel



On Fri, 2020-02-07 at 17:06 +0000, David Woodhouse wrote:
> On 7 February 2020 16:40:01 GMT, "Xia, Hongyan" <hongyxia@amazon.com> wrote:
> > On Fri, 2020-02-07 at 16:32 +0000, David Woodhouse wrote:
> > > 
> > > ...
> > > 
> > > Ideally not because init_heap_pages() then calls free_heap_pages()
> > > and the "recursion" is best avoided.
> > 
> > Kind of depends on where we change its pg to initialised. If we do that
> > in free_heap_pages this does recurse, but if it is done in
> > init_heap_pages before calling free it does not.
> 
> True. It would *look* like it might recurse, but never should in practice.

Looks like this. Less duplication of the 'if (uninitialised)
init_heap_pages() else free_heap_pages()' construct, more scope for
people to naïvely complain that it "recurses". I think I prefer it this
way.



From: David Woodhouse <dwmw@amazon.co.uk>
Date: Fri, 7 Feb 2020 13:01:36 +0000
Subject: [PATCH] xen/mm: Introduce PG_state_uninitialised

It is possible for pages to enter general circulation without ever
being processed by init_heap_pages().

For example, pages of the multiboot module containing the initramfs may
be assigned via assign_pages() to dom0 as it is created. And some code
including map_pages_to_xen() has checks on 'system_state' to determine
whether to use the boot or the heap allocator, but it seems impossible
to prove that pages allocated by the boot allocator are not subsequently
freed with free_heap_pages().

This actually works fine in the majority of cases; there are only a few
esoteric corner cases which init_heap_pages() handles before handing the
page range off to free_heap_pages():
 • Excluding MFN #0 to avoid inappropriate cross-zone merging.
 • Ensuring that the node information structures exist, when the first
   page(s) of a given node are handled.
 • High order allocations crossing from one node to another.

To handle this case, shift PG_state_inuse from its current value of
zero, to another value. Use zero, which is the initial state of the
entire frame table, as PG_state_uninitialised.

Fix a couple of assertions which were assuming that PG_state_inuse is
zero, and make them cope with the PG_state_uninitialised case too where
appropriate.

Finally, make free_heap_pages() call through to init_heap_pages() when
given a page range which has not been initialised. This cannot keep
recursing because init_heap_pages() will set each page state to
PGC_state_inuse before passing it back to free_heap_pages() for the
second time.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 xen/arch/x86/mm.c        |  3 ++-
 xen/common/page_alloc.c  | 40 ++++++++++++++++++++++++++++------------
 xen/include/asm-arm/mm.h |  3 ++-
 xen/include/asm-x86/mm.h |  3 ++-
 4 files changed, 34 insertions(+), 15 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 9b33829084..bf660ee8eb 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -488,7 +488,8 @@ void share_xen_page_with_guest(struct page_info *page, struct domain *d,
 
     page_set_owner(page, d);
     smp_wmb(); /* install valid domain ptr before updating refcnt. */
-    ASSERT((page->count_info & ~PGC_xen_heap) == 0);
+    ASSERT((page->count_info & ~PGC_xen_heap) == PGC_state_inuse ||
+           (page->count_info & ~PGC_xen_heap) == PGC_state_uninitialised);
 
     /* Only add to the allocation list if the domain isn't dying. */
     if ( !d->is_dying )
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 4084503554..95984d6de0 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -252,6 +252,8 @@ struct bootmem_region {
 static struct bootmem_region __initdata
     bootmem_region_list[PAGE_SIZE / sizeof(struct bootmem_region)];
 static unsigned int __initdata nr_bootmem_regions;
+static void init_heap_pages(struct page_info *pg, unsigned long nr_pages,
+                            bool scrub);
 
 struct scrub_region {
     unsigned long offset;
@@ -1389,6 +1391,16 @@ static void free_heap_pages(
     ASSERT(order <= MAX_ORDER);
     ASSERT(node >= 0);
 
+    if ( page_state_is(pg, uninitialised) )
+    {
+        init_heap_pages(pg, 1 << order, need_scrub);
+        /*
+         * init_heap_pages() will call back into free_heap_pages() for
+         * each page but cannot keep recursing because each page will
+         * be set to PGC_state_inuse first.
+         */
+        return;
+    }
     spin_lock(&heap_lock);
 
     for ( i = 0; i < (1 << order); i++ )
@@ -1780,11 +1792,10 @@ int query_page_offline(mfn_t mfn, uint32_t *status)
  * latter is not on a MAX_ORDER boundary, then we reserve the page by
  * not freeing it to the buddy allocator.
  */
-static void init_heap_pages(
-    struct page_info *pg, unsigned long nr_pages)
+static void init_heap_pages(struct page_info *pg, unsigned long nr_pages,
+                            bool scrub)
 {
     unsigned long i;
-    bool idle_scrub = false;
 
     /*
      * Keep MFN 0 away from the buddy allocator to avoid crossing zone
@@ -1809,7 +1820,7 @@ static void init_heap_pages(
     spin_unlock(&heap_lock);
 
     if ( system_state < SYS_STATE_active && opt_bootscrub == BOOTSCRUB_IDLE )
-        idle_scrub = true;
+        scrub = true;
 
     for ( i = 0; i < nr_pages; i++ )
     {
@@ -1837,7 +1848,8 @@ static void init_heap_pages(
             nr_pages -= n;
         }
 
-        free_heap_pages(pg + i, 0, scrub_debug || idle_scrub);
+        pg[i].count_info = PGC_state_inuse;
+        free_heap_pages(pg + i, 0, scrub_debug || scrub);
     }
 }
 
@@ -1873,7 +1885,7 @@ void __init end_boot_allocator(void)
         if ( (r->s < r->e) &&
              (phys_to_nid(pfn_to_paddr(r->s)) == cpu_to_node(0)) )
         {
-            init_heap_pages(mfn_to_page(_mfn(r->s)), r->e - r->s);
+            init_heap_pages(mfn_to_page(_mfn(r->s)), r->e - r->s, false);
             r->e = r->s;
             break;
         }
@@ -1882,7 +1894,7 @@ void __init end_boot_allocator(void)
     {
         struct bootmem_region *r = &bootmem_region_list[i];
         if ( r->s < r->e )
-            init_heap_pages(mfn_to_page(_mfn(r->s)), r->e - r->s);
+            init_heap_pages(mfn_to_page(_mfn(r->s)), r->e - r->s, false);
     }
     nr_bootmem_regions = 0;
 
@@ -2151,7 +2163,7 @@ void init_xenheap_pages(paddr_t ps, paddr_t pe)
 
     memguard_guard_range(maddr_to_virt(ps), pe - ps);
 
-    init_heap_pages(maddr_to_page(ps), (pe - ps) >> PAGE_SHIFT);
+    init_heap_pages(maddr_to_page(ps), (pe - ps) >> PAGE_SHIFT, false);
 }
 
 
@@ -2260,7 +2275,7 @@ void init_domheap_pages(paddr_t ps, paddr_t pe)
     if ( mfn_x(emfn) <= mfn_x(smfn) )
         return;
 
-    init_heap_pages(mfn_to_page(smfn), mfn_x(emfn) - mfn_x(smfn));
+    init_heap_pages(mfn_to_page(smfn), mfn_x(emfn) - mfn_x(smfn), false);
 }
 
 
@@ -2301,10 +2316,11 @@ int assign_pages(
     for ( i = 0; i < (1 << order); i++ )
     {
         ASSERT(page_get_owner(&pg[i]) == NULL);
-        ASSERT(!pg[i].count_info);
+        ASSERT(pg[i].count_info == PGC_state_inuse ||
+               pg[i].count_info == PGC_state_uninitialised);
         page_set_owner(&pg[i], d);
         smp_wmb(); /* Domain pointer must be visible before updating refcnt. */
-        pg[i].count_info = PGC_allocated | 1;
+        pg[i].count_info |= PGC_allocated | 1;
         page_list_add_tail(&pg[i], &d->page_list);
     }
 
diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
index c9466c8ca0..c696941600 100644
--- a/xen/include/asm-arm/mm.h
+++ b/xen/include/asm-arm/mm.h
@@ -117,12 +117,13 @@ struct page_info
   * { inuse, offlining, offlined, free, broken_offlining, broken }
   */
 #define PGC_state                  PG_mask(7, 9)
-#define PGC_state_inuse            PG_mask(0, 9)
+#define PGC_state_uninitialised    PG_mask(0, 9)
 #define PGC_state_offlining        PG_mask(1, 9)
 #define PGC_state_offlined         PG_mask(2, 9)
 #define PGC_state_free             PG_mask(3, 9)
 #define PGC_state_broken_offlining PG_mask(4, 9)
 #define PGC_state_broken           PG_mask(5, 9)
+#define PGC_state_inuse            PG_mask(6, 9)
 
 #define page_state_is(pg, st)      (((pg)->count_info&PGC_state) == PGC_state_##st)
 #define page_is_broken(pg)         (page_state_is((pg), broken_offlining) ||  \
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index 3edadf7a7c..645368e6a9 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -72,12 +72,13 @@
   * { inuse, offlining, offlined, free, broken_offlining, broken }
   */
 #define PGC_state                  PG_mask(7, 9)
-#define PGC_state_inuse            PG_mask(0, 9)
+#define PGC_state_uninitialised    PG_mask(0, 9)
 #define PGC_state_offlining        PG_mask(1, 9)
 #define PGC_state_offlined         PG_mask(2, 9)
 #define PGC_state_free             PG_mask(3, 9)
 #define PGC_state_broken_offlining PG_mask(4, 9)
 #define PGC_state_broken           PG_mask(5, 9)
+#define PGC_state_inuse            PG_mask(6, 9)
 
 #define page_state_is(pg, st)      (((pg)->count_info&PGC_state) == PGC_state_##st)
 #define page_is_broken(pg)         (page_state_is((pg), broken_offlining) ||  \
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [Xen-devel] [PATCH 1/2] xen/mm: fold PGC_broken into PGC_state bits
  2020-02-07 15:57                     ` [Xen-devel] [PATCH 1/2] xen/mm: fold PGC_broken into PGC_state bits David Woodhouse
@ 2020-02-07 20:27                       ` Julien Grall
  2020-02-09 13:22                         ` David Woodhouse
  2020-03-17 21:39                         ` David Woodhouse
  2020-02-20 11:10                       ` Jan Beulich
  1 sibling, 2 replies; 48+ messages in thread
From: Julien Grall @ 2020-02-07 20:27 UTC (permalink / raw)
  To: David Woodhouse, Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, George Dunlap,
	Jeff Kubascik, Stewart Hildebrand, xen-devel

Hi David,

Could you please send the series in a separate thread? That way we don't mix 
the two discussions (i.e. merging, and freeing of boot-allocated pages).

On 07/02/2020 15:57, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
> 
> Only PGC_state_offlining and PGC_state_offlined are valid in conjunction
> with PGC_broken. The other two states (free and inuse) were never valid
> for a broken page.
> 
> By folding PGC_broken in, we can have three bits for PGC_state which
> allows up to 8 states, of which 6 are currently used and 2 are available
> for new use cases.

The idea looks good to me. I have a few mostly-nitpicking comments below.

> 
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> ---
>   xen/arch/x86/domctl.c    |  2 +-
>   xen/common/page_alloc.c  | 67 ++++++++++++++++++++++++----------------
>   xen/include/asm-arm/mm.h | 26 +++++++++++-----
>   xen/include/asm-x86/mm.h | 33 +++++++++++++-------
>   4 files changed, 82 insertions(+), 46 deletions(-)
> 
> diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
> index 4fa9c91140..17a318e16d 100644
> --- a/xen/arch/x86/domctl.c
> +++ b/xen/arch/x86/domctl.c
> @@ -415,7 +415,7 @@ long arch_do_domctl(
>                   if ( page->u.inuse.type_info & PGT_pinned )
>                       type |= XEN_DOMCTL_PFINFO_LPINTAB;
>   
> -                if ( page->count_info & PGC_broken )
> +                if ( page_is_broken(page) )
>                       type = XEN_DOMCTL_PFINFO_BROKEN;
>               }
>   
> diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
> index 97902d42c1..4084503554 100644
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -1093,7 +1093,7 @@ static int reserve_offlined_page(struct page_info *head)
>           struct page_info *pg;
>           int next_order;
>   
> -        if ( page_state_is(cur_head, offlined) )
> +        if ( page_is_offlined(cur_head) )
>           {
>               cur_head++;
>               if ( first_dirty != INVALID_DIRTY_IDX && first_dirty )
> @@ -1113,7 +1113,7 @@ static int reserve_offlined_page(struct page_info *head)
>               for ( i = (1 << cur_order), pg = cur_head + (1 << cur_order );
>                     i < (1 << next_order);
>                     i++, pg++ )
> -                if ( page_state_is(pg, offlined) )
> +                if ( page_is_offlined(pg) )
>                       break;
>               if ( i == ( 1 << next_order) )
>               {
> @@ -1145,16 +1145,19 @@ static int reserve_offlined_page(struct page_info *head)
>   
>       for ( cur_head = head; cur_head < head + ( 1UL << head_order); cur_head++ )
>       {
> -        if ( !page_state_is(cur_head, offlined) )
> +        struct page_list_head *list;

We tend to add a newline after a series of declarations.

> +        if ( page_state_is(cur_head, offlined) )
> +            list = &page_offlined_list;
> +        else if (page_state_is(cur_head, broken) )
> +            list = &page_broken_list;
> +        else
>               continue;
>   
>           avail[node][zone]--;
>           total_avail_pages--;
>           ASSERT(total_avail_pages >= 0);
>   
> -        page_list_add_tail(cur_head,
> -                           test_bit(_PGC_broken, &cur_head->count_info) ?
> -                           &page_broken_list : &page_offlined_list);
> +        page_list_add_tail(cur_head, list);
>   
>           count++;
>       }
> @@ -1404,13 +1407,16 @@ static void free_heap_pages(
>           switch ( pg[i].count_info & PGC_state )
>           {
>           case PGC_state_inuse:
> -            BUG_ON(pg[i].count_info & PGC_broken);
>               pg[i].count_info = PGC_state_free;
>               break;
>   
>           case PGC_state_offlining:
> -            pg[i].count_info = (pg[i].count_info & PGC_broken) |
> -                               PGC_state_offlined;
> +            pg[i].count_info = PGC_state_offlined;
> +            tainted = 1;
> +            break;
> +
> +        case PGC_state_broken_offlining:
> +            pg[i].count_info = PGC_state_broken;
>               tainted = 1;
>               break;
>   
> @@ -1527,16 +1533,25 @@ static unsigned long mark_page_offline(struct page_info *pg, int broken)
>       do {
>           nx = x = y;
>   
> -        if ( ((x & PGC_state) != PGC_state_offlined) &&
> -             ((x & PGC_state) != PGC_state_offlining) )
> +        nx &= ~PGC_state;
> +
> +        switch ( x & PGC_state )
>           {
> -            nx &= ~PGC_state;
> -            nx |= (((x & PGC_state) == PGC_state_free)
> -                   ? PGC_state_offlined : PGC_state_offlining);
> -        }
> +        case PGC_state_inuse:
> +        case PGC_state_offlining:
> +            nx |= broken ? PGC_state_offlining : PGC_state_broken_offlining;
> +            break;
> +
> +        case PGC_state_free:
> +            nx |= broken ? PGC_state_broken : PGC_state_offlined;
>   
> -        if ( broken )
> -            nx |= PGC_broken;
> +        case PGC_state_broken_offlining:
> +            nx |= PGC_state_broken_offlining;
> +
> +        case PGC_state_offlined:
> +        case PGC_state_broken:
> +            nx |= PGC_state_broken;

Shouldn't this be:

case PGC_state_offlined:
     nx |= broken ? PGC_state_offlined : PGC_state_broken;

case PGC_state_broken:
     nx |= PGC_state_broken;

There is also quite a difference with the default case now. Without 
this patch, if you were to add a new state but not handle it here, you 
would transition to PGC_state_offlining. With this patch, we will 
transition to 0 (i.e. PGC_state_inuse for now).

The PGC_state_* values are not an enum, so the compiler can't help catch 
a new state that doesn't have a corresponding case. I would therefore 
suggest adding a default case matching the current behavior, with an 
ASSERT_UNREACHABLE(). Note that I am open to a different kind of 
handling here.

> +        }
>   
>           if ( x == nx )
>               break;
> @@ -1609,7 +1624,7 @@ int offline_page(mfn_t mfn, int broken, uint32_t *status)
>        * need to prevent malicious guest access the broken page again.
>        * Under such case, hypervisor shutdown guest, preventing recursive mce.
>        */
> -    if ( (pg->count_info & PGC_broken) && (owner = page_get_owner(pg)) )
> +    if ( page_is_broken(pg) && (owner = page_get_owner(pg)) )
>       {
>           *status = PG_OFFLINE_AGAIN;
>           domain_crash(owner);
> @@ -1620,7 +1635,7 @@ int offline_page(mfn_t mfn, int broken, uint32_t *status)
>   
>       old_info = mark_page_offline(pg, broken);
>   
> -    if ( page_state_is(pg, offlined) )
> +    if ( page_is_offlined(pg) )
>       {
>           reserve_heap_page(pg);
>   
> @@ -1699,14 +1714,14 @@ unsigned int online_page(mfn_t mfn, uint32_t *status)
>       do {
>           ret = *status = 0;
>   
> -        if ( y & PGC_broken )
> +        if ( (y & PGC_state) == PGC_state_broken ||
> +             (y & PGC_state) == PGC_state_broken_offlining )

It is a bit of a shame that we can't use page_is_broken here. Would it 
make sense to introduce a helper that just takes a count_info?
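One hypothetical shape for such a helper (the name and exact form are made up): a predicate over a raw count_info value, which page_is_broken(pg) could then wrap, and which call sites like online_page() that only hold a snapshot 'y' of count_info could share. The encoding stand-ins below are self-contained and illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for the PGC_state encoding in this series;
 * only the values needed for the broken test are reproduced. */
#define BITS_PER_LONG 64
#define PG_shift(idx)   (BITS_PER_LONG - (idx))
#define PG_mask(x, idx) ((unsigned long)(x) << PG_shift(idx))

#define PGC_state                  PG_mask(7, 9)
#define PGC_state_free             PG_mask(3, 9)
#define PGC_state_broken_offlining PG_mask(4, 9)
#define PGC_state_broken           PG_mask(5, 9)

/* Hypothetical helper: the "is broken" test phrased over a raw
 * count_info value rather than a struct page_info pointer. */
static inline bool state_is_broken(unsigned long count_info)
{
    unsigned long st = count_info & PGC_state;

    return st == PGC_state_broken || st == PGC_state_broken_offlining;
}
```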

>           {
>               ret = -EINVAL;
>               *status = PG_ONLINE_FAILED |PG_ONLINE_BROKEN;
>               break;
>           }
> -
> -        if ( (y & PGC_state) == PGC_state_offlined )
> +        else if ( (y & PGC_state) == PGC_state_offlined )
>           {
>               page_list_del(pg, &page_offlined_list);
>               *status = PG_ONLINE_ONLINED;
> @@ -1747,11 +1762,11 @@ int query_page_offline(mfn_t mfn, uint32_t *status)
>   
>       pg = mfn_to_page(mfn);
>   
> -    if ( page_state_is(pg, offlining) )
> +    if ( page_is_offlining(pg) )
>           *status |= PG_OFFLINE_STATUS_OFFLINE_PENDING;
> -    if ( pg->count_info & PGC_broken )
> +    if ( page_is_broken(pg) )
>           *status |= PG_OFFLINE_STATUS_BROKEN;
> -    if ( page_state_is(pg, offlined) )
> +    if ( page_is_offlined(pg) )
>           *status |= PG_OFFLINE_STATUS_OFFLINED;
>   
>       spin_unlock(&heap_lock);
> @@ -2483,7 +2498,7 @@ __initcall(pagealloc_keyhandler_init);
>   
>   void scrub_one_page(struct page_info *pg)
>   {
> -    if ( unlikely(pg->count_info & PGC_broken) )
> +    if ( unlikely(page_is_broken(pg)) )
>           return;
>   
>   #ifndef NDEBUG
> diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
> index 333efd3a60..c9466c8ca0 100644
> --- a/xen/include/asm-arm/mm.h
> +++ b/xen/include/asm-arm/mm.h
> @@ -112,13 +112,25 @@ struct page_info
>   /* Page is broken? */
>   #define _PGC_broken       PG_shift(7)
>   #define PGC_broken        PG_mask(1, 7)

Shouldn't this be dropped now?

> - /* Mutually-exclusive page states: { inuse, offlining, offlined, free }. */
> -#define PGC_state         PG_mask(3, 9)
> -#define PGC_state_inuse   PG_mask(0, 9)
> -#define PGC_state_offlining PG_mask(1, 9)
> -#define PGC_state_offlined PG_mask(2, 9)
> -#define PGC_state_free    PG_mask(3, 9)
> -#define page_state_is(pg, st) (((pg)->count_info&PGC_state) == PGC_state_##st)
> + /*
> +  * Mutually-exclusive page states:
> +  * { inuse, offlining, offlined, free, broken_offlining, broken }
> +  */
> +#define PGC_state                  PG_mask(7, 9)
> +#define PGC_state_inuse            PG_mask(0, 9)
> +#define PGC_state_offlining        PG_mask(1, 9)
> +#define PGC_state_offlined         PG_mask(2, 9)
> +#define PGC_state_free             PG_mask(3, 9)
> +#define PGC_state_broken_offlining PG_mask(4, 9)
> +#define PGC_state_broken           PG_mask(5, 9)

I agree that making all the values aligned makes them nicer to read, but 
this is increasing the number of "unrelated" changes and makes the review 
more difficult.

I would prefer to leave the indentation alone for the current 
#defines. But I am not going to push for it :).

> +
> +#define page_state_is(pg, st)      (((pg)->count_info&PGC_state) == PGC_state_##st)
> +#define page_is_broken(pg)         (page_state_is((pg), broken_offlining) ||  \
> +                                    page_state_is((pg), broken))
> +#define page_is_offlined(pg)       (page_state_is((pg), broken) ||    \
> +                                    page_state_is((pg), offlined))
> +#define page_is_offlining(pg)      (page_state_is((pg), broken_offlining) || \
> +                                    page_state_is((pg), offlining))
>   
>   /* Count of references to this frame. */
>   #define PGC_count_width   PG_shift(9)
> diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
> index 2ca8882ad0..3edadf7a7c 100644
> --- a/xen/include/asm-x86/mm.h
> +++ b/xen/include/asm-x86/mm.h
> @@ -67,18 +67,27 @@
>    /* 3-bit PAT/PCD/PWT cache-attribute hint. */
>   #define PGC_cacheattr_base PG_shift(6)
>   #define PGC_cacheattr_mask PG_mask(7, 6)
> - /* Page is broken? */
> -#define _PGC_broken       PG_shift(7)
> -#define PGC_broken        PG_mask(1, 7)
> - /* Mutually-exclusive page states: { inuse, offlining, offlined, free }. */
> -#define PGC_state         PG_mask(3, 9)
> -#define PGC_state_inuse   PG_mask(0, 9)
> -#define PGC_state_offlining PG_mask(1, 9)
> -#define PGC_state_offlined PG_mask(2, 9)
> -#define PGC_state_free    PG_mask(3, 9)
> -#define page_state_is(pg, st) (((pg)->count_info&PGC_state) == PGC_state_##st)
> -
> - /* Count of references to this frame. */
> + /*
> +  * Mutually-exclusive page states:
> +  * { inuse, offlining, offlined, free, broken_offlining, broken }
> +  */
> +#define PGC_state                  PG_mask(7, 9)
> +#define PGC_state_inuse            PG_mask(0, 9)
> +#define PGC_state_offlining        PG_mask(1, 9)
> +#define PGC_state_offlined         PG_mask(2, 9)
> +#define PGC_state_free             PG_mask(3, 9)
> +#define PGC_state_broken_offlining PG_mask(4, 9)
> +#define PGC_state_broken           PG_mask(5, 9)
> +
> +#define page_state_is(pg, st)      (((pg)->count_info&PGC_state) == PGC_state_##st)
> +#define page_is_broken(pg)         (page_state_is((pg), broken_offlining) ||  \
> +                                    page_state_is((pg), broken))
> +#define page_is_offlined(pg)       (page_state_is((pg), broken) ||    \
> +                                    page_state_is((pg), offlined))
> +#define page_is_offlining(pg)      (page_state_is((pg), broken_offlining) || \
> +                                    page_state_is((pg), offlining))
> +
> +/* Count of references to this frame. */
>   #define PGC_count_width   PG_shift(9)
>   #define PGC_count_mask    ((1UL<<PGC_count_width)-1)
>   
> 

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Xen-devel] [PATCH 1/2] xen/mm: fold PGC_broken into PGC_state bits
  2020-02-07 20:27                       ` Julien Grall
@ 2020-02-09 13:22                         ` David Woodhouse
  2020-02-09 17:59                           ` Julien Grall
  2020-03-17 21:39                         ` David Woodhouse
  1 sibling, 1 reply; 48+ messages in thread
From: David Woodhouse @ 2020-02-09 13:22 UTC (permalink / raw)
  To: Julien Grall, Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, George Dunlap,
	Jeff Kubascik, Stewart Hildebrand, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 1657 bytes --]

On Fri, 2020-02-07 at 20:27 +0000, Julien Grall wrote:
> Could you please send the series in a separate thread? So we don't mix 
> the two discussions (i.e. merge and free of boot-allocated pages) together.

Well, they're all part of the same mess, really.

There are cases where pages end up in free_heap_pages() which were
never vetted by init_heap_pages(). While that normally works just fine
— to the extent that we've never really cared — the hack excluding MFN0
is one of the things that gets 'missed' for such pages.

I was only looking at this because the early vmap support makes it a
tiny bit more likely that some pages will be freed that way after being
given out by the boot allocator, but there were plenty of reasons it
might happen already.

These patches basically legitimise that — we make it OK for
free_heap_pages() to be given a range which was never in the heap in
the first place. And in so doing, we fix the MFN0/zone merge trick even
when (for example) MFN0 was part of the dom0 initramfs and assigned
directly, but gets freed up later.

But sure, having failed to elicit any screams of "don't do it like
that", I'll fix up the things you pointed out and I'll repost it as
part of the series containing the early vmap() support, since that's
why I'm working on it.

Although at this point I'm half tempted to declare that there are *so*
many ways this can happen already, that the tiny little bit that it's
made more likely by the early vmap() support is basically in the noise.

In that case we can separate these patches out as an independent fix
rather than a dependency of early vmap.






* Re: [Xen-devel] [PATCH 1/2] xen/mm: fold PGC_broken into PGC_state bits
  2020-02-09 13:22                         ` David Woodhouse
@ 2020-02-09 17:59                           ` Julien Grall
  0 siblings, 0 replies; 48+ messages in thread
From: Julien Grall @ 2020-02-09 17:59 UTC (permalink / raw)
  To: David Woodhouse, Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, George Dunlap,
	Jeff Kubascik, Stewart Hildebrand, xen-devel



On 09/02/2020 13:22, David Woodhouse wrote:
> On Fri, 2020-02-07 at 20:27 +0000, Julien Grall wrote:
>> Could you please send the series in a separate thread? So we don't mix
>> the two discussions (i.e. merge and free of boot-allocated pages) together.
> 
> Well, they're all part of the same mess, really.

Sending a series in the middle of another series is always more 
difficult to track :). All the more so if they are handled by two 
different people...

> 
> There are cases where pages end up in free_heap_pages() which were
> never vetted by init_heap_pages(). While that normally works just fine
> — to the extent that we've never really cared — the hack excluding MFN0
> is one of the things that gets 'missed' for such pages.
> 
> I was only looking at this because the early vmap support makes it a
> tiny bit more likely that some pages will be freed that way after being
> given out by the boot allocator, but there were plenty of reasons it
> might happen already.
> 
> These patches basically legitimise that — we make it OK for
> free_heap_pages() to be given a range which was never in the heap in
> the first place. And in so doing, we fix the MFN0/zone merge trick even
> when (for example) MFN0 was part of the dom0 initramfs and assigned
> directly, but gets freed up later.
> 
> But sure, having failed to elicit any screams of "don't do it like
> that", I'll fix up the things you pointed out and I'll repost it as
> part of the series containing the early vmap() support, since that's
> why I'm working on it.
> 
> Although at this point I'm half tempted to declare that there are *so*
> many ways this can happen already, that the tiny little bit that it's
> made more likely by the early vmap() support is basically in the noise.
> 
> In that case we can separate these patches out as an independent fix
> rather than a dependency of early vmap.

I hadn't realized how messy it was because I had Arm in mind and wasn't 
expecting x86 to abuse the interface so much.

For x86, I agree that this is noise as they are abusing the interface 
pretty much everywhere.

However, on Arm there is only one place that abuses the interface. 
It is in the ACPI code, although I think it will just be a leak given 
the implementation of acpi_os_free_memory(). As we don't free 
page-tables yet on Arm, the introduction of the early vmap would not 
introduce any more abuse on Arm.

It would obviously be nice to fix it, but as you said this is noise on x86. 
So that's really up to the x86 folks (Andrew, George, Jan) to decide 
whether yet another abuse is OK for them :).

Cheers,

-- 
Julien Grall



* Re: [Xen-devel] [PATCH 1/2] xen/mm: fold PGC_broken into PGC_state bits
  2020-02-07 15:57                     ` [Xen-devel] [PATCH 1/2] xen/mm: fold PGC_broken into PGC_state bits David Woodhouse
  2020-02-07 20:27                       ` Julien Grall
@ 2020-02-20 11:10                       ` Jan Beulich
  2020-03-17 21:52                         ` David Woodhouse
  1 sibling, 1 reply; 48+ messages in thread
From: Jan Beulich @ 2020-02-20 11:10 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, George Dunlap,
	Jeff Kubascik, Stewart Hildebrand, xen-devel

On 07.02.2020 16:57, David Woodhouse wrote:
> @@ -1145,16 +1145,19 @@ static int reserve_offlined_page(struct page_info *head)
>  
>      for ( cur_head = head; cur_head < head + ( 1UL << head_order); cur_head++ )
>      {
> -        if ( !page_state_is(cur_head, offlined) )
> +        struct page_list_head *list;
> +        if ( page_state_is(cur_head, offlined) )
> +            list = &page_offlined_list;
> +        else if (page_state_is(cur_head, broken) )
> +            list = &page_broken_list;
> +        else
>              continue;
>  
>          avail[node][zone]--;
>          total_avail_pages--;
>          ASSERT(total_avail_pages >= 0);
>  
> -        page_list_add_tail(cur_head,
> -                           test_bit(_PGC_broken, &cur_head->count_info) ?
> -                           &page_broken_list : &page_offlined_list);
> +        page_list_add_tail(cur_head, list);

While I realize it's fewer comparisons this way, I still wonder
whether for the reader's sake it wouldn't better be
page_is_offlined() first and then page_is_broken() down here.

> @@ -1699,14 +1714,14 @@ unsigned int online_page(mfn_t mfn, uint32_t *status)
>      do {
>          ret = *status = 0;
>  
> -        if ( y & PGC_broken )
> +        if ( (y & PGC_state) == PGC_state_broken ||
> +             (y & PGC_state) == PGC_state_broken_offlining )
>          {
>              ret = -EINVAL;
>              *status = PG_ONLINE_FAILED |PG_ONLINE_BROKEN;
>              break;
>          }
> -
> -        if ( (y & PGC_state) == PGC_state_offlined )
> +        else if ( (y & PGC_state) == PGC_state_offlined )

I don't see a need for adding "else" here.

> --- a/xen/include/asm-x86/mm.h
> +++ b/xen/include/asm-x86/mm.h
> @@ -67,18 +67,27 @@
>   /* 3-bit PAT/PCD/PWT cache-attribute hint. */
>  #define PGC_cacheattr_base PG_shift(6)
>  #define PGC_cacheattr_mask PG_mask(7, 6)
> - /* Page is broken? */
> -#define _PGC_broken       PG_shift(7)
> -#define PGC_broken        PG_mask(1, 7)
> - /* Mutually-exclusive page states: { inuse, offlining, offlined, free }. */
> -#define PGC_state         PG_mask(3, 9)
> -#define PGC_state_inuse   PG_mask(0, 9)
> -#define PGC_state_offlining PG_mask(1, 9)
> -#define PGC_state_offlined PG_mask(2, 9)
> -#define PGC_state_free    PG_mask(3, 9)
> -#define page_state_is(pg, st) (((pg)->count_info&PGC_state) == PGC_state_##st)
> -
> - /* Count of references to this frame. */
> + /*
> +  * Mutually-exclusive page states:
> +  * { inuse, offlining, offlined, free, broken_offlining, broken }
> +  */
> +#define PGC_state                  PG_mask(7, 9)
> +#define PGC_state_inuse            PG_mask(0, 9)
> +#define PGC_state_offlining        PG_mask(1, 9)
> +#define PGC_state_offlined         PG_mask(2, 9)
> +#define PGC_state_free             PG_mask(3, 9)
> +#define PGC_state_broken_offlining PG_mask(4, 9)

TBH I'd prefer PGC_state_offlining_broken, as it's not the
offlining which is broken, but a broken page is being
offlined.

> +#define PGC_state_broken           PG_mask(5, 9)
> +
> +#define page_state_is(pg, st)      (((pg)->count_info&PGC_state) == PGC_state_##st)

Blanks around & please.

> +#define page_is_broken(pg)         (page_state_is((pg), broken_offlining) ||  \
> +                                    page_state_is((pg), broken))
> +#define page_is_offlined(pg)       (page_state_is((pg), broken) ||    \
> +                                    page_state_is((pg), offlined))

The inclusion of "broken" here would seem to deserve a (brief)
comment, either here or next to PGC_state_broken's #define.

> +#define page_is_offlining(pg)      (page_state_is((pg), broken_offlining) || \
> +                                    page_state_is((pg), offlining))

Overall I wonder whether the PGC_state_* ordering couldn't be
adjusted such that at least some of these three won't need
two comparisons (by masking off a bit before comparing).

Also for all three - no need for extra parentheses around pg
(or in general macro arguments which get handed on without
being part of an expression).

Jan



* Re: [Xen-devel] [PATCH 2/2] xen/mm: Introduce PG_state_uninitialised
  2020-02-07 18:04                               ` David Woodhouse
@ 2020-02-20 11:59                                 ` Jan Beulich
  2020-02-20 13:27                                   ` Julien Grall
  2020-03-17 22:15                                   ` David Woodhouse
  0 siblings, 2 replies; 48+ messages in thread
From: Jan Beulich @ 2020-02-20 11:59 UTC (permalink / raw)
  To: David Woodhouse
  Cc: sstabellini, julien, wl, konrad.wilk, george.dunlap,
	andrew.cooper3, ian.jackson, george.dunlap, jeff.kubascik, Xia,
	Hongyan, stewart.hildebrand, xen-devel

On 07.02.2020 19:04, David Woodhouse wrote:
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -488,7 +488,8 @@ void share_xen_page_with_guest(struct page_info *page, struct domain *d,
>  
>      page_set_owner(page, d);
>      smp_wmb(); /* install valid domain ptr before updating refcnt. */
> -    ASSERT((page->count_info & ~PGC_xen_heap) == 0);
> +    ASSERT((page->count_info & ~PGC_xen_heap) == PGC_state_inuse ||
> +           (page->count_info & ~PGC_xen_heap) == PGC_state_uninitialised);

Can uninitialized pages really make it here?

> @@ -1389,6 +1391,16 @@ static void free_heap_pages(
>      ASSERT(order <= MAX_ORDER);
>      ASSERT(node >= 0);
>  
> +    if ( page_state_is(pg, uninitialised) )
> +    {
> +        init_heap_pages(pg, 1 << order, need_scrub);
> +        /*
> +         * init_heap_pages() will call back into free_heap_pages() for
> +         * each page but cannot keep recursing because each page will
> +         * be set to PGC_state_inuse first.
> +         */
> +        return;
> +    }
>      spin_lock(&heap_lock);

Can you also add a blank line above here please?

> @@ -1780,11 +1792,10 @@ int query_page_offline(mfn_t mfn, uint32_t *status)
>   * latter is not on a MAX_ORDER boundary, then we reserve the page by
>   * not freeing it to the buddy allocator.
>   */
> -static void init_heap_pages(
> -    struct page_info *pg, unsigned long nr_pages)
> +static void init_heap_pages(struct page_info *pg, unsigned long nr_pages,
> +                            bool scrub)

Is this new parameter strictly needed, i.e. can free_heap_pages()
be called with uninitialized pages which need scrubbing? (The
code change is simple enough, and hence may warrant keeping, but
then the commit message could indicate so in case this isn't a
strict requirement.)

> @@ -2301,10 +2316,11 @@ int assign_pages(
>      for ( i = 0; i < (1 << order); i++ )
>      {
>          ASSERT(page_get_owner(&pg[i]) == NULL);
> -        ASSERT(!pg[i].count_info);
> +        ASSERT(pg[i].count_info == PGC_state_inuse ||
> +               pg[i].count_info == PGC_state_uninitialised);

Same question here: Can uninitialized pages make it here? If
so, wouldn't it be better to correct this, rather than having
the more permissive assertion?

>          page_set_owner(&pg[i], d);
>          smp_wmb(); /* Domain pointer must be visible before updating refcnt. */
> -        pg[i].count_info = PGC_allocated | 1;
> +        pg[i].count_info |= PGC_allocated | 1;

This is too relaxed for my taste: I understand you want to
retain page state, but I suppose other bits would want clearing
nevertheless.

> --- a/xen/include/asm-x86/mm.h
> +++ b/xen/include/asm-x86/mm.h
> @@ -72,12 +72,13 @@
>    * { inuse, offlining, offlined, free, broken_offlining, broken }
>    */
>  #define PGC_state                  PG_mask(7, 9)
> -#define PGC_state_inuse            PG_mask(0, 9)
> +#define PGC_state_uninitialised    PG_mask(0, 9)
>  #define PGC_state_offlining        PG_mask(1, 9)
>  #define PGC_state_offlined         PG_mask(2, 9)
>  #define PGC_state_free             PG_mask(3, 9)
>  #define PGC_state_broken_offlining PG_mask(4, 9)
>  #define PGC_state_broken           PG_mask(5, 9)
> +#define PGC_state_inuse            PG_mask(6, 9)

Would imo be nice if this most common state was actually
either 1 or 7, for easy recognition. But the most suitable
value to pick may also depend on the outcome of one of the
comments on patch 1.

Jan



* Re: [Xen-devel] [PATCH 2/2] xen/mm: Introduce PG_state_uninitialised
  2020-02-20 11:59                                 ` Jan Beulich
@ 2020-02-20 13:27                                   ` Julien Grall
  2020-03-17 22:15                                   ` David Woodhouse
  1 sibling, 0 replies; 48+ messages in thread
From: Julien Grall @ 2020-02-20 13:27 UTC (permalink / raw)
  To: Jan Beulich, David Woodhouse
  Cc: sstabellini, wl, konrad.wilk, george.dunlap, andrew.cooper3,
	ian.jackson, george.dunlap, jeff.kubascik, Xia, Hongyan,
	stewart.hildebrand, xen-devel

Hi,

On 20/02/2020 11:59, Jan Beulich wrote:
> On 07.02.2020 19:04, David Woodhouse wrote:
>> --- a/xen/arch/x86/mm.c
>> +++ b/xen/arch/x86/mm.c
>> @@ -488,7 +488,8 @@ void share_xen_page_with_guest(struct page_info *page, struct domain *d,
>>   
>>       page_set_owner(page, d);
>>       smp_wmb(); /* install valid domain ptr before updating refcnt. */
>> -    ASSERT((page->count_info & ~PGC_xen_heap) == 0);
>> +    ASSERT((page->count_info & ~PGC_xen_heap) == PGC_state_inuse ||
>> +           (page->count_info & ~PGC_xen_heap) == PGC_state_uninitialised);
> 
> Can uninitialized pages really make it here?

IIRC, arch_init_memory() will share the first 1MB of the RAM, but those 
pages were never given to the page allocator.

>> @@ -2301,10 +2316,11 @@ int assign_pages(
>>       for ( i = 0; i < (1 << order); i++ )
>>       {
>>           ASSERT(page_get_owner(&pg[i]) == NULL);
>> -        ASSERT(!pg[i].count_info);
>> +        ASSERT(pg[i].count_info == PGC_state_inuse ||
>> +               pg[i].count_info == PGC_state_uninitialised);
> 
> Same question here: Can uninitialized pages make it here?

Yes, in dom0_construct_pv() when the initrd is assigned to the guest.

> If
> so, wouldn't it be better to correct this, rather than having
> the more permissive assertion?

We would need to rework init_heap_pages() (or create a new function) so 
that the allocator initialises the state but the page is marked as 
"reserved/used" rather than "freed".

Most likely we will need a similar function for live update, because
the pages belonging to guests should be left untouched.

Cheers,

-- 
Julien Grall



* Re: [Xen-devel] [PATCH 1/2] xen/mm: fold PGC_broken into PGC_state bits
  2020-02-07 20:27                       ` Julien Grall
  2020-02-09 13:22                         ` David Woodhouse
@ 2020-03-17 21:39                         ` David Woodhouse
  1 sibling, 0 replies; 48+ messages in thread
From: David Woodhouse @ 2020-03-17 21:39 UTC (permalink / raw)
  To: Julien Grall, Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, George Dunlap,
	Jeff Kubascik, Stewart Hildebrand, xen-devel



On Fri, 2020-02-07 at 20:27 +0000, Julien Grall wrote:

> > +        switch ( x & PGC_state )
> >           {
> > -            nx &= ~PGC_state;
> > -            nx |= (((x & PGC_state) == PGC_state_free)
> > -                   ? PGC_state_offlined : PGC_state_offlining);
> > -        }
> > +        case PGC_state_inuse:
> > +        case PGC_state_offlining:
> > +            nx |= broken ? PGC_state_offlining : PGC_state_broken_offlining;
> > +            break;
> > +
> > +        case PGC_state_free:
> > +            nx |= broken ? PGC_state_broken : PGC_state_offlined;
> >   
> > -        if ( broken )
> > -            nx |= PGC_broken;
> > +        case PGC_state_broken_offlining:
> > +            nx |= PGC_state_broken_offlining;
> > +
> > +        case PGC_state_offlined:
> > +        case PGC_state_broken:
> > +            nx |= PGC_state_broken;
> 
> Shouldn't this be:
> 
> case PGC_state_offlined:
>      nx |= broken ? PGC_state_offlined : PGC_state_broken;
> 
> case PGC_state_broken:
>      nx |= PGC_state_broken;
> 
> There is also quite a difference in the default case now. Without 
> this patch, if you were to add a new state but not handle it here, you 
> would transition to PGC_state_offlining. With this patch, we will 
> transition to 0 (i.e. PGC_state_inuse for now).
> 
> PGC_state_* are not an enum, so the compiler can't help catch a new 
> state that doesn't have a corresponding case. So I would suggest adding 
> a default case matching the current behavior and an 
> ASSERT_UNREACHABLE(). Note that I am open to a different kind of 
> handling here.

I revamped this, taking into account your later suggestion of a helper
that works on the count_info. Looks more like this:


        /* If it was already broken, it stays broken */
        if ( pgc_is_broken(x) )
            broken = 1;

        if ( pgc_is_offlined(x) || pgc_is(x, free) )
            nx |= broken ? PGC_state_broken : PGC_state_offlined;
        else
            nx |= broken ? PGC_state_broken_offlining : PGC_state_offlining;


> > - /* Mutually-exclusive page states: { inuse, offlining, offlined, free }. */
> > -#define PGC_state         PG_mask(3, 9)
> > -#define PGC_state_inuse   PG_mask(0, 9)
> > -#define PGC_state_offlining PG_mask(1, 9)
> > -#define PGC_state_offlined PG_mask(2, 9)
> > -#define PGC_state_free    PG_mask(3, 9)
> > -#define page_state_is(pg, st) (((pg)->count_info&PGC_state) == PGC_state_##st)
> > + /*
> > +  * Mutually-exclusive page states:
> > +  * { inuse, offlining, offlined, free, broken_offlining, broken }
> > +  */
> > +#define PGC_state                  PG_mask(7, 9)
> > +#define PGC_state_inuse            PG_mask(0, 9)
> > +#define PGC_state_offlining        PG_mask(1, 9)
> > +#define PGC_state_offlined         PG_mask(2, 9)
> > +#define PGC_state_free             PG_mask(3, 9)
> > +#define PGC_state_broken_offlining PG_mask(4, 9)
> > +#define PGC_state_broken           PG_mask(5, 9)
> 
> I agree that making all the values aligned makes them nicer to read, but 
> this is increasing the number of "unrelated" changes and makes the review 
> more difficult.
> 
> I would prefer to leave the indentation alone for the current 
> #defines. But I am not going to push for it :).

I'm generally sympathetic to that point of view but at this point, all
those page states are kind of being redefined and it makes sense to
think about them all; having them all change doesn't hurt.





* Re: [Xen-devel] [PATCH 1/2] xen/mm: fold PGC_broken into PGC_state bits
  2020-02-20 11:10                       ` Jan Beulich
@ 2020-03-17 21:52                         ` David Woodhouse
  2020-03-18  9:56                           ` Jan Beulich
  0 siblings, 1 reply; 48+ messages in thread
From: David Woodhouse @ 2020-03-17 21:52 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, George Dunlap,
	Jeff Kubascik, Stewart Hildebrand, xen-devel



On Thu, 2020-02-20 at 12:10 +0100, Jan Beulich wrote:
> On 07.02.2020 16:57, David Woodhouse wrote:
> > @@ -1145,16 +1145,19 @@ static int reserve_offlined_page(struct page_info *head)
> >  
> >      for ( cur_head = head; cur_head < head + ( 1UL << head_order); cur_head++ )
> >      {
> > -        if ( !page_state_is(cur_head, offlined) )
> > +        struct page_list_head *list;
> > +        if ( page_state_is(cur_head, offlined) )
> > +            list = &page_offlined_list;
> > +        else if (page_state_is(cur_head, broken) )
> > +            list = &page_broken_list;
> > +        else
> >              continue;
> >  
> >          avail[node][zone]--;
> >          total_avail_pages--;
> >          ASSERT(total_avail_pages >= 0);
> >  
> > -        page_list_add_tail(cur_head,
> > -                           test_bit(_PGC_broken, &cur_head->count_info) ?
> > -                           &page_broken_list : &page_offlined_list);
> > +        page_list_add_tail(cur_head, list);
> 
> While I realize it's fewer comparisons this way, I still wonder
> whether for the reader's sake it wouldn't better be
> page_is_offlined() first and then page_is_broken() down here.

Nah, that would be worse. This way there are two cases which are
explicitly handled and the list to use for each of them is explicitly
set. The 'if (a||b) …    some_function(a ? thing_for_a : thing_for_b)'
construct is much less comprehensible.



> > @@ -1699,14 +1714,14 @@ unsigned int online_page(mfn_t mfn, uint32_t *status)
> >      do {
> >          ret = *status = 0;
> >  
> > -        if ( y & PGC_broken )
> > +        if ( (y & PGC_state) == PGC_state_broken ||
> > +             (y & PGC_state) == PGC_state_broken_offlining )
> >          {
> >              ret = -EINVAL;
> >              *status = PG_ONLINE_FAILED |PG_ONLINE_BROKEN;
> >              break;
> >          }
> > -
> > -        if ( (y & PGC_state) == PGC_state_offlined )
> > +        else if ( (y & PGC_state) == PGC_state_offlined )
> 
> I don't see a need for adding "else" here.

They are mutually exclusive cases. It makes things a whole lot clearer
to the reader to put the 'else' there, and sometimes helps a naïve
compiler along the way too.


> > --- a/xen/include/asm-x86/mm.h
> > +++ b/xen/include/asm-x86/mm.h
> > @@ -67,18 +67,27 @@
> >   /* 3-bit PAT/PCD/PWT cache-attribute hint. */
> >  #define PGC_cacheattr_base PG_shift(6)
> >  #define PGC_cacheattr_mask PG_mask(7, 6)
> > - /* Page is broken? */
> > -#define _PGC_broken       PG_shift(7)
> > -#define PGC_broken        PG_mask(1, 7)
> > - /* Mutually-exclusive page states: { inuse, offlining, offlined, free }. */
> > -#define PGC_state         PG_mask(3, 9)
> > -#define PGC_state_inuse   PG_mask(0, 9)
> > -#define PGC_state_offlining PG_mask(1, 9)
> > -#define PGC_state_offlined PG_mask(2, 9)
> > -#define PGC_state_free    PG_mask(3, 9)
> > -#define page_state_is(pg, st) (((pg)->count_info&PGC_state) == PGC_state_##st)
> > -
> > - /* Count of references to this frame. */
> > + /*
> > +  * Mutually-exclusive page states:
> > +  * { inuse, offlining, offlined, free, broken_offlining, broken }
> > +  */
> > +#define PGC_state                  PG_mask(7, 9)
> > +#define PGC_state_inuse            PG_mask(0, 9)
> > +#define PGC_state_offlining        PG_mask(1, 9)
> > +#define PGC_state_offlined         PG_mask(2, 9)
> > +#define PGC_state_free             PG_mask(3, 9)
> > +#define PGC_state_broken_offlining PG_mask(4, 9)
> 
> TBH I'd prefer PGC_state_offlining_broken, as it's not the
> offlining which is broken, but a broken page is being
> offlined.

It is the page which is both broken and offlining.
Or indeed it is the page which is both offlining and broken.


> > +#define PGC_state_broken           PG_mask(5, 9)
> > +
> > +#define page_state_is(pg, st)      (((pg)->count_info&PGC_state) == PGC_state_##st)
> 
> Blanks around & please.

That part I hadn't touched but sure, I'll add those while I'm touching
it. I'd already ignored Julien's request *not* to make whitespace
cleanups while I'm here, after all :)

> > +#define page_is_broken(pg)         (page_state_is((pg), broken_offlining) ||  \
> > +                                    page_state_is((pg), broken))
> > +#define page_is_offlined(pg)       (page_state_is((pg), broken) ||    \
> > +                                    page_state_is((pg), offlined))
> 
> The inclusion of "broken" here would seem to deserve a (brief)
> comment, either here or next to PGC_state_broken's #define.

Done, in the version which will be sent shortly.

> > +#define page_is_offlining(pg)      (page_state_is((pg), broken_offlining) || \
> > +                                    page_state_is((pg), offlining))
> 
> Overall I wonder whether the PGC_state_* ordering couldn't be
> adjusted such that at least some of these three won't need
> two comparisons (by masking off a bit before comparing).

The whole point in this exercise is that there isn't a whole bit for
these; they are each *two* states out of the possible 8.

> Also for all three - no need for extra parentheses around pg
> (or in general macro arguments which get handed on without
> being part of an expression).

Yeah, I'll remove some of those.
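To spell that point out: with six states packed into three bits, each of
these predicates covers two of the eight possible encodings, hence the
two comparisons. A standalone sketch (plain values matching the quoted
#defines, ignoring the PG_mask() bit positioning; the helpers are
simplified, not the actual Xen macros):

```c
#include <assert.h>

/* Standalone sketch of the 3-bit state encoding from the quoted patch. */
enum pgc_state {
    PGC_state_inuse            = 0,
    PGC_state_offlining        = 1,
    PGC_state_offlined         = 2,
    PGC_state_free             = 3,
    PGC_state_broken_offlining = 4,
    PGC_state_broken           = 5,
};

#define PGC_STATE_MASK 0x7u /* three bits, eight possible encodings */

/* "Broken" covers two of the eight states, hence two comparisons. */
static int page_is_broken(unsigned int count_info)
{
    unsigned int st = count_info & PGC_STATE_MASK;
    return st == PGC_state_broken_offlining || st == PGC_state_broken;
}

/* Likewise "offlining" covers { offlining, broken_offlining }. */
static int page_is_offlining(unsigned int count_info)
{
    unsigned int st = count_info & PGC_STATE_MASK;
    return st == PGC_state_broken_offlining || st == PGC_state_offlining;
}
```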





_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel


* Re: [Xen-devel] [PATCH 2/2] xen/mm: Introduce PG_state_uninitialised
  2020-02-20 11:59                                 ` Jan Beulich
  2020-02-20 13:27                                   ` Julien Grall
@ 2020-03-17 22:15                                   ` David Woodhouse
  2020-03-18  8:53                                     ` Paul Durrant
  2020-03-18 10:03                                     ` Jan Beulich
  1 sibling, 2 replies; 48+ messages in thread
From: David Woodhouse @ 2020-03-17 22:15 UTC (permalink / raw)
  To: Jan Beulich
  Cc: sstabellini, julien, wl, konrad.wilk, george.dunlap,
	andrew.cooper3, ian.jackson, george.dunlap, jeff.kubascik, Xia,
	Hongyan, stewart.hildebrand, xen-devel



On Thu, 2020-02-20 at 12:59 +0100, Jan Beulich wrote:
> On 07.02.2020 19:04, David Woodhouse wrote:
> > --- a/xen/arch/x86/mm.c
> > +++ b/xen/arch/x86/mm.c
> > @@ -488,7 +488,8 @@ void share_xen_page_with_guest(struct page_info *page, struct domain *d,
> >  
> >      page_set_owner(page, d);
> >      smp_wmb(); /* install valid domain ptr before updating refcnt. */
> > -    ASSERT((page->count_info & ~PGC_xen_heap) == 0);
> > +    ASSERT((page->count_info & ~PGC_xen_heap) == PGC_state_inuse ||
> > +           (page->count_info & ~PGC_xen_heap) == PGC_state_uninitialised);
> 
> Can uninitialized pages really make it here?

Yep, we share the low 1MiB with dom_io.

> > @@ -1389,6 +1391,16 @@ static void free_heap_pages(
> >      ASSERT(order <= MAX_ORDER);
> >      ASSERT(node >= 0);
> >  
> > +    if ( page_state_is(pg, uninitialised) )
> > +    {
> > +        init_heap_pages(pg, 1 << order, need_scrub);
> > +        /*
> > +         * init_heap_pages() will call back into free_heap_pages() for
> > +         * each page but cannot keep recursing because each page will
> > +         * be set to PGC_state_inuse first.
> > +         */
> > +        return;
> > +    }
> >      spin_lock(&heap_lock);
> 
> Can you also add a blank line above here please?

Done.

> > @@ -1780,11 +1792,10 @@ int query_page_offline(mfn_t mfn, uint32_t *status)
> >   * latter is not on a MAX_ORDER boundary, then we reserve the page by
> >   * not freeing it to the buddy allocator.
> >   */
> > -static void init_heap_pages(
> > -    struct page_info *pg, unsigned long nr_pages)
> > +static void init_heap_pages(struct page_info *pg, unsigned long nr_pages,
> > +                            bool scrub)
> 
> Is this new parameter strictly needed, i.e. can free_heap_pages()
> be called with uninitialized pages which need scrubbing? (The
> code change is simple enough, and hence may warrant keeping, but
> then the commit message could indicate so in case this isn't a
> strict requirement.)

Yes, I think it's feasible for the initramfs pages, which are assigned
to dom0 from uninitialised pages, to later get freed, and then they'll
want scrubbing?

There *is* a path into free_heap_pages() with the need_scrub argument
set, and I'd have to *prove* that it can never happen in order to... I
don't know... put a BUG() in that case instead of supporting it? Didn't
seem like that was the thing I wanted to do.

> > @@ -2301,10 +2316,11 @@ int assign_pages(
> >      for ( i = 0; i < (1 << order); i++ )
> >      {
> >          ASSERT(page_get_owner(&pg[i]) == NULL);
> > -        ASSERT(!pg[i].count_info);
> > +        ASSERT(pg[i].count_info == PGC_state_inuse ||
> > +               pg[i].count_info == PGC_state_uninitialised);
> 
> Same question here: Can uninitialized pages make it here? If
> so, wouldn't it be better to correct this, rather than having
> the more permissive assertion?

Yep, Dom0 initrd on x86.


        ASSERT((pg[i].count_info & ~PGC_extra) == PGC_state_inuse ||
               (pg[i].count_info & ~PGC_extra) == PGC_state_uninitialised);
> >          page_set_owner(&pg[i], d);
> >          smp_wmb(); /* Domain pointer must be visible before updating refcnt. */
> > -        pg[i].count_info = PGC_allocated | 1;
> > +        pg[i].count_info |= PGC_allocated | 1;
> 
> This is too relaxed for my taste: I understand you want to
> retain page state, but I suppose other bits would want clearing
> nevertheless.

You seem to have dropped the ASSERT immediately before the code snippet
you cited, in which arbitrary other contents of count_info are not
permitted. I put it back, in its current form after rebasing on top of
Paul's commit c793d13944b45d adding PGC_extra.

> > --- a/xen/include/asm-x86/mm.h
> > +++ b/xen/include/asm-x86/mm.h
> > @@ -72,12 +72,13 @@
> >    * { inuse, offlining, offlined, free, broken_offlining, broken }
> >    */
> >  #define PGC_state                  PG_mask(7, 9)
> > -#define PGC_state_inuse            PG_mask(0, 9)
> > +#define PGC_state_uninitialised    PG_mask(0, 9)
> >  #define PGC_state_offlining        PG_mask(1, 9)
> >  #define PGC_state_offlined         PG_mask(2, 9)
> >  #define PGC_state_free             PG_mask(3, 9)
> >  #define PGC_state_broken_offlining PG_mask(4, 9)
> >  #define PGC_state_broken           PG_mask(5, 9)
> > +#define PGC_state_inuse            PG_mask(6, 9)
> 
> Would imo be nice if this most common state was actually
> either 1 or 7, for easy recognition. But the most suitable
> value to pick may also depend on the outcome of one of the
> comments on patch 1.

Not quite sure why 1 and 7 are easier to recognise than other values.
The important one is that uninitialised has to be zero, since that's
the default (because that's what the frame table is memset to; that is
changeable, but non-trivially so).
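That dependency on the memset default is easy to demonstrate standalone
(the struct below is a made-up stand-in for struct page_info, and the
table size is arbitrary; only the count_info field and the zero-fill
matter):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Made-up stand-in for Xen's struct page_info. */
struct page_info {
    unsigned long count_info;
};

#define PGC_STATE_MASK          0x7ul
#define PGC_state_uninitialised 0x0ul /* must be 0: the memset default */

/* Returns 1 iff a freshly zero-filled table of n entries reads back as
 * PGC_state_uninitialised with no per-page initialisation at all. */
static int fresh_table_is_uninitialised(size_t n)
{
    struct page_info ft[64];
    size_t i;

    if (n > sizeof(ft) / sizeof(ft[0]))
        return 0;
    memset(ft, 0, sizeof(ft));
    for (i = 0; i < n; i++)
        if ((ft[i].count_info & PGC_STATE_MASK) != PGC_state_uninitialised)
            return 0;
    return 1;
}
```

Choosing any non-zero encoding for uninitialised would instead require a
pass over the whole frame table at boot.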



* Re: [Xen-devel] [PATCH 2/2] xen/mm: Introduce PG_state_uninitialised
  2020-03-17 22:15                                   ` David Woodhouse
@ 2020-03-18  8:53                                     ` Paul Durrant
  2020-03-18 10:10                                       ` Jan Beulich
  2020-03-18 10:03                                     ` Jan Beulich
  1 sibling, 1 reply; 48+ messages in thread
From: Paul Durrant @ 2020-03-18  8:53 UTC (permalink / raw)
  To: 'David Woodhouse', 'Jan Beulich'
  Cc: sstabellini, julien, wl, konrad.wilk, george.dunlap,
	andrew.cooper3, ian.jackson, george.dunlap, jeff.kubascik,
	'Xia, Hongyan',
	stewart.hildebrand, xen-devel

> -----Original Message-----
> From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of David Woodhouse
> Sent: 17 March 2020 22:15
> To: Jan Beulich <jbeulich@suse.com>
> Cc: sstabellini@kernel.org; julien@xen.org; wl@xen.org; konrad.wilk@oracle.com;
> george.dunlap@eu.citrix.com; andrew.cooper3@citrix.com; ian.jackson@eu.citrix.com;
> george.dunlap@citrix.com; jeff.kubascik@dornerworks.com; Xia, Hongyan <hongyxia@amazon.com>;
> stewart.hildebrand@dornerworks.com; xen-devel@lists.xenproject.org
> Subject: Re: [Xen-devel] [PATCH 2/2] xen/mm: Introduce PG_state_uninitialised
> 
> On Thu, 2020-02-20 at 12:59 +0100, Jan Beulich wrote:
> > On 07.02.2020 19:04, David Woodhouse wrote:
> > > --- a/xen/arch/x86/mm.c
> > > +++ b/xen/arch/x86/mm.c
> > > @@ -488,7 +488,8 @@ void share_xen_page_with_guest(struct page_info *page, struct domain *d,
> > >
> > >      page_set_owner(page, d);
> > >      smp_wmb(); /* install valid domain ptr before updating refcnt. */
> > > -    ASSERT((page->count_info & ~PGC_xen_heap) == 0);
> > > +    ASSERT((page->count_info & ~PGC_xen_heap) == PGC_state_inuse ||
> > > +           (page->count_info & ~PGC_xen_heap) == PGC_state_uninitialised);
> >
> > Can uninitialized pages really make it here?
> 
> Yep, we share the low 1MiB with dom_io.
> 

OOI anyone know why we do this? Is it actually necessary?

  Paul



* Re: [Xen-devel] [PATCH 1/2] xen/mm: fold PGC_broken into PGC_state bits
  2020-03-17 21:52                         ` David Woodhouse
@ 2020-03-18  9:56                           ` Jan Beulich
  2020-03-18 12:31                             ` Julien Grall
  0 siblings, 1 reply; 48+ messages in thread
From: Jan Beulich @ 2020-03-18  9:56 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, George Dunlap,
	Jeff Kubascik, Stewart Hildebrand, xen-devel

On 17.03.2020 22:52, David Woodhouse wrote:
> On Thu, 2020-02-20 at 12:10 +0100, Jan Beulich wrote:
>> On 07.02.2020 16:57, David Woodhouse wrote:
>>> @@ -1145,16 +1145,19 @@ static int reserve_offlined_page(struct
>>> page_info *head)
>>>   
>>>       for ( cur_head = head; cur_head < head + ( 1UL << head_order);
>>> cur_head++ )
>>>       {
>>> -        if ( !page_state_is(cur_head, offlined) )
>>> +        struct page_list_head *list;
>>> +        if ( page_state_is(cur_head, offlined) )
>>> +            list = &page_offlined_list;
>>> +        else if (page_state_is(cur_head, broken) )
>>> +            list = &page_broken_list;
>>> +        else
>>>               continue;
>>>   
>>>           avail[node][zone]--;
>>>           total_avail_pages--;
>>>           ASSERT(total_avail_pages >= 0);
>>>   
>>> -        page_list_add_tail(cur_head,
>>> -                           test_bit(_PGC_broken, &cur_head->count_info) ?
>>> -                           &page_broken_list : &page_offlined_list);
>>> +        page_list_add_tail(cur_head, list);
>>
>> While I realize it's fewer comparisons this way, I still wonder
>> whether for the reader's sake it wouldn't better be
>> page_is_offlined() first and then page_is_broken() down here.
> 
> Nah, that would be worse. This way there are two cases which are
> explicitly handled and the list to use for each of them is explicitly
> set. The 'if (a||b) …    some_function(a ? thing_for_a : thing_for_b)'
> construct is much less comprehensible.

It's a matter of taste, I agree, and in such a case I generally advise
to see about limiting code churn. For code you then still introduce
anew, yes, taste decisions may typically be to the authors judgement
(there are exceptions, though).

>>> @@ -1699,14 +1714,14 @@ unsigned int online_page(mfn_t mfn,
>>> uint32_t *status)
>>>       do {
>>>           ret = *status = 0;
>>>   
>>> -        if ( y & PGC_broken )
>>> +        if ( (y & PGC_state) == PGC_state_broken ||
>>> +             (y & PGC_state) == PGC_state_broken_offlining )
>>>           {
>>>               ret = -EINVAL;
>>>               *status = PG_ONLINE_FAILED |PG_ONLINE_BROKEN;
>>>               break;
>>>           }
>>> -
>>> -        if ( (y & PGC_state) == PGC_state_offlined )
>>> +        else if ( (y & PGC_state) == PGC_state_offlined )
>>
>> I don't see a need for adding "else" here.
> 
> They are mutually exclusive cases. It makes things a whole lot clearer
> to the reader to put the 'else' there, and sometimes helps a naïve
> compiler along the way too.

Well, I'm afraid I'm going to be pretty strict about this: It's again
a matter of taste, yes, but we generally try to avoid pointless else.
What you consider "a whole lot clearer to the reader" is the opposite
from my pov.

>>> --- a/xen/include/asm-x86/mm.h
>>> +++ b/xen/include/asm-x86/mm.h
>>> @@ -67,18 +67,27 @@
>>>    /* 3-bit PAT/PCD/PWT cache-attribute hint. */
>>>   #define PGC_cacheattr_base PG_shift(6)
>>>   #define PGC_cacheattr_mask PG_mask(7, 6)
>>> - /* Page is broken? */
>>> -#define _PGC_broken       PG_shift(7)
>>> -#define PGC_broken        PG_mask(1, 7)
>>> - /* Mutually-exclusive page states: { inuse, offlining, offlined,
>>> free }. */
>>> -#define PGC_state         PG_mask(3, 9)
>>> -#define PGC_state_inuse   PG_mask(0, 9)
>>> -#define PGC_state_offlining PG_mask(1, 9)
>>> -#define PGC_state_offlined PG_mask(2, 9)
>>> -#define PGC_state_free    PG_mask(3, 9)
>>> -#define page_state_is(pg, st) (((pg)->count_info&PGC_state) ==
>>> PGC_state_##st)
>>> -
>>> - /* Count of references to this frame. */
>>> + /*
>>> +  * Mutually-exclusive page states:
>>> +  * { inuse, offlining, offlined, free, broken_offlining, broken }
>>> +  */
>>> +#define PGC_state                  PG_mask(7, 9)
>>> +#define PGC_state_inuse            PG_mask(0, 9)
>>> +#define PGC_state_offlining        PG_mask(1, 9)
>>> +#define PGC_state_offlined         PG_mask(2, 9)
>>> +#define PGC_state_free             PG_mask(3, 9)
>>> +#define PGC_state_broken_offlining PG_mask(4, 9)
>>
>> TBH I'd prefer PGC_state_offlining_broken, as it's not the
>> offlining which is broken, but a broken page is being
>> offlined.
> 
> It is the page which is both broken and offlining.
> Or indeed it is the page which is both offlining and broken.

I.e. you agree with flipping the two parts around?

>>> +#define page_is_offlining(pg)      (page_state_is((pg), broken_offlining) || \
>>> +                                    page_state_is((pg), offlining))
>>
>> Overall I wonder whether the PGC_state_* ordering couldn't be
>> adjusted such that at least some of these three won't need
>> two comparisons (by masking off a bit before comparing).
> 
> The whole point in this exercise is that there isn't a whole bit for
> these; they are each *two* states out of the possible 8.

Sure. But just consider the more general case: Instead of writing

	if ( i == 6 || i == 7 )

you can as well write

	if ( (i | 1) == 7 )

Similar for multiple == vs a single <= or >=.
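For illustration, here is one hypothetical reordering (not the values in
the patch, and ignoring the PG_mask() bit positioning) under which two
of the three derived predicates collapse to a single masked comparison:

```c
#include <assert.h>

/* Hypothetical reordering of the state values, chosen so that related
 * state pairs differ in a maskable way. Illustrative only. */
enum {
    ST_inuse            = 0,
    ST_offlined         = 1,
    ST_offlining        = 2,
    ST_free             = 3,
    ST_broken_offlining = 6,
    ST_broken           = 7,
};

/* { broken, broken_offlining } = { 7, 6 }: differ only in bit 0. */
static int is_broken(unsigned int st)    { return (st | 1) == 7; }

/* { offlining, broken_offlining } = { 2, 6 }: agree in the low two bits. */
static int is_offlining(unsigned int st) { return (st & 3) == 2; }

/* { offlined, broken } = { 1, 7 } share no usable mask in this layout,
 * so this predicate still needs two comparisons. */
static int is_offlined(unsigned int st)
{
    return st == ST_offlined || st == ST_broken;
}
```

As David notes, with six states in three bits not every two-state
predicate can get its own mask; some ordering has to lose out.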

Jan


* Re: [Xen-devel] [PATCH 2/2] xen/mm: Introduce PG_state_uninitialised
  2020-03-17 22:15                                   ` David Woodhouse
  2020-03-18  8:53                                     ` Paul Durrant
@ 2020-03-18 10:03                                     ` Jan Beulich
  2020-03-18 12:11                                       ` David Woodhouse
  1 sibling, 1 reply; 48+ messages in thread
From: Jan Beulich @ 2020-03-18 10:03 UTC (permalink / raw)
  To: David Woodhouse
  Cc: sstabellini, julien, wl, konrad.wilk, george.dunlap,
	andrew.cooper3, ian.jackson, george.dunlap, jeff.kubascik, Xia,
	Hongyan, stewart.hildebrand, xen-devel

On 17.03.2020 23:15, David Woodhouse wrote:
> On Thu, 2020-02-20 at 12:59 +0100, Jan Beulich wrote:
>> On 07.02.2020 19:04, David Woodhouse wrote:
>          ASSERT((pg[i].count_info & ~PGC_extra) == PGC_state_inuse ||
>                 (pg[i].count_info & ~PGC_extra) == PGC_state_uninitialised);
>>>           page_set_owner(&pg[i], d);
>>>           smp_wmb(); /* Domain pointer must be visible before updating refcnt. */
>>> -        pg[i].count_info = PGC_allocated | 1;
>>> +        pg[i].count_info |= PGC_allocated | 1;
>>
>> This is too relaxed for my taste: I understand you want to
>> retain page state, but I suppose other bits would want clearing
>> nevertheless.
> 
> You seem to have dropped the ASSERT immediately before the code snippet
> you cited, in which arbitrary other contents of count_info are not
> permitted. I put it back, in its current form after rebasing on top of
> Paul's commit c793d13944b45d adding PGC_extra.

But that's only an ASSERT(), i.e. no protection at all in release builds.

>>> --- a/xen/include/asm-x86/mm.h
>>> +++ b/xen/include/asm-x86/mm.h
>>> @@ -72,12 +72,13 @@
>>>     * { inuse, offlining, offlined, free, broken_offlining, broken }
>>>     */
>>>   #define PGC_state                  PG_mask(7, 9)
>>> -#define PGC_state_inuse            PG_mask(0, 9)
>>> +#define PGC_state_uninitialised    PG_mask(0, 9)
>>>   #define PGC_state_offlining        PG_mask(1, 9)
>>>   #define PGC_state_offlined         PG_mask(2, 9)
>>>   #define PGC_state_free             PG_mask(3, 9)
>>>   #define PGC_state_broken_offlining PG_mask(4, 9)
>>>   #define PGC_state_broken           PG_mask(5, 9)
>>> +#define PGC_state_inuse            PG_mask(6, 9)
>>
>> Would imo be nice if this most common state was actually
>> either 1 or 7, for easy recognition. But the most suitable
>> value to pick may also depend on the outcome of one of the
>> comments on patch 1.
> 
> Not quite sure why 1 and 7 are easier to recognise than other values.
> The important one is that uninitialised has to be zero, since that's
> the default (because that's what the frame table is memset to. Which is
> changeable, but non-trivially so).

In particular 7 may imo be easier to recognize, as it has all three
bits set. It sometimes helps to spot such values at (almost) first
glance in e.g. logged data.

Jan


* Re: [Xen-devel] [PATCH 2/2] xen/mm: Introduce PG_state_uninitialised
  2020-03-18  8:53                                     ` Paul Durrant
@ 2020-03-18 10:10                                       ` Jan Beulich
  2020-03-18 10:41                                         ` Paul Durrant
  0 siblings, 1 reply; 48+ messages in thread
From: Jan Beulich @ 2020-03-18 10:10 UTC (permalink / raw)
  To: paul
  Cc: sstabellini, julien, wl, konrad.wilk, george.dunlap,
	andrew.cooper3, ian.jackson, george.dunlap, jeff.kubascik,
	'Xia, Hongyan',
	stewart.hildebrand, xen-devel, 'David Woodhouse'

On 18.03.2020 09:53, Paul Durrant wrote:
>> -----Original Message-----
>> From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of David Woodhouse
>> Sent: 17 March 2020 22:15
>>
>> On Thu, 2020-02-20 at 12:59 +0100, Jan Beulich wrote:
>>> On 07.02.2020 19:04, David Woodhouse wrote:
>>>> --- a/xen/arch/x86/mm.c
>>>> +++ b/xen/arch/x86/mm.c
>>>> @@ -488,7 +488,8 @@ void share_xen_page_with_guest(struct page_info *page, struct domain *d,
>>>>
>>>>       page_set_owner(page, d);
>>>>       smp_wmb(); /* install valid domain ptr before updating refcnt. */
>>>> -    ASSERT((page->count_info & ~PGC_xen_heap) == 0);
>>>> +    ASSERT((page->count_info & ~PGC_xen_heap) == PGC_state_inuse ||
>>>> +           (page->count_info & ~PGC_xen_heap) == PGC_state_uninitialised);
>>>
>>> Can uninitialized pages really make it here?
>>
>> Yep, we share the low 1MiB with dom_io.
>>
> 
> OOI anyone know why we do this? Is it actually necessary?

Yes, for Dom0 to be able to access things like EBDA, IBFT, or data
found in BIOS space.

Jan


* Re: [Xen-devel] [PATCH 2/2] xen/mm: Introduce PG_state_uninitialised
  2020-03-18 10:10                                       ` Jan Beulich
@ 2020-03-18 10:41                                         ` Paul Durrant
  2020-03-18 11:12                                           ` Jan Beulich
  0 siblings, 1 reply; 48+ messages in thread
From: Paul Durrant @ 2020-03-18 10:41 UTC (permalink / raw)
  To: 'Jan Beulich'
  Cc: sstabellini, julien, wl, konrad.wilk, george.dunlap,
	andrew.cooper3, ian.jackson, george.dunlap, jeff.kubascik,
	'Xia, Hongyan',
	stewart.hildebrand, xen-devel, 'David Woodhouse'

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 18 March 2020 10:10
> To: paul@xen.org
> Cc: 'David Woodhouse' <dwmw2@infradead.org>; sstabellini@kernel.org; julien@xen.org; wl@xen.org;
> konrad.wilk@oracle.com; george.dunlap@eu.citrix.com; andrew.cooper3@citrix.com;
> ian.jackson@eu.citrix.com; george.dunlap@citrix.com; jeff.kubascik@dornerworks.com; 'Xia, Hongyan'
> <hongyxia@amazon.com>; stewart.hildebrand@dornerworks.com; xen-devel@lists.xenproject.org
> Subject: Re: [PATCH 2/2] xen/mm: Introduce PG_state_uninitialised
> 
> On 18.03.2020 09:53, Paul Durrant wrote:
> >> -----Original Message-----
> >> From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of David Woodhouse
> >> Sent: 17 March 2020 22:15
> >>
> >> On Thu, 2020-02-20 at 12:59 +0100, Jan Beulich wrote:
> >>> On 07.02.2020 19:04, David Woodhouse wrote:
> >>>> --- a/xen/arch/x86/mm.c
> >>>> +++ b/xen/arch/x86/mm.c
> >>>> @@ -488,7 +488,8 @@ void share_xen_page_with_guest(struct page_info *page, struct domain *d,
> >>>>
> >>>>       page_set_owner(page, d);
> >>>>       smp_wmb(); /* install valid domain ptr before updating refcnt. */
> >>>> -    ASSERT((page->count_info & ~PGC_xen_heap) == 0);
> >>>> +    ASSERT((page->count_info & ~PGC_xen_heap) == PGC_state_inuse ||
> >>>> +           (page->count_info & ~PGC_xen_heap) == PGC_state_uninitialised);
> >>>
> >>> Can uninitialized pages really make it here?
> >>
> >> Yep, we share the low 1MiB with dom_io.
> >>
> >
> > OOI anyone know why we do this? Is it actually necessary?
> 
> Yes, for Dom0 to be able to access things like EBDA, IBFT, or data
> found in BIOS space.
>

Ok. I am still wondering why dom0's low 1MiB of pfn space is not simply mapped 1:1 though. Just historical?

  Paul
 
> Jan



* Re: [Xen-devel] [PATCH 2/2] xen/mm: Introduce PG_state_uninitialised
  2020-03-18 10:41                                         ` Paul Durrant
@ 2020-03-18 11:12                                           ` Jan Beulich
  0 siblings, 0 replies; 48+ messages in thread
From: Jan Beulich @ 2020-03-18 11:12 UTC (permalink / raw)
  To: paul
  Cc: sstabellini, julien, wl, konrad.wilk, george.dunlap,
	andrew.cooper3, ian.jackson, george.dunlap, jeff.kubascik,
	'Xia, Hongyan',
	stewart.hildebrand, xen-devel, 'David Woodhouse'

On 18.03.2020 11:41, Paul Durrant wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: 18 March 2020 10:10
>> To: paul@xen.org
>> Cc: 'David Woodhouse' <dwmw2@infradead.org>; sstabellini@kernel.org; julien@xen.org; wl@xen.org;
>> konrad.wilk@oracle.com; george.dunlap@eu.citrix.com; andrew.cooper3@citrix.com;
>> ian.jackson@eu.citrix.com; george.dunlap@citrix.com; jeff.kubascik@dornerworks.com; 'Xia, Hongyan'
>> <hongyxia@amazon.com>; stewart.hildebrand@dornerworks.com; xen-devel@lists.xenproject.org
>> Subject: Re: [PATCH 2/2] xen/mm: Introduce PG_state_uninitialised
>>
>> On 18.03.2020 09:53, Paul Durrant wrote:
>>>> -----Original Message-----
>>>> From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of David Woodhouse
>>>> Sent: 17 March 2020 22:15
>>>>
>>>> On Thu, 2020-02-20 at 12:59 +0100, Jan Beulich wrote:
>>>>> On 07.02.2020 19:04, David Woodhouse wrote:
>>>>>> --- a/xen/arch/x86/mm.c
>>>>>> +++ b/xen/arch/x86/mm.c
>>>>>> @@ -488,7 +488,8 @@ void share_xen_page_with_guest(struct page_info *page, struct domain *d,
>>>>>>
>>>>>>        page_set_owner(page, d);
>>>>>>        smp_wmb(); /* install valid domain ptr before updating refcnt. */
>>>>>> -    ASSERT((page->count_info & ~PGC_xen_heap) == 0);
>>>>>> +    ASSERT((page->count_info & ~PGC_xen_heap) == PGC_state_inuse ||
>>>>>> +           (page->count_info & ~PGC_xen_heap) == PGC_state_uninitialised);
>>>>>
>>>>> Can uninitialized pages really make it here?
>>>>
>>>> Yep, we share the low 1MiB with dom_io.
>>>>
>>>
>>> OOI anyone know why we do this? Is it actually necessary?
>>
>> Yes, for Dom0 to be able to access things like EBDA, IBFT, or data
>> found in BIOS space.
>>
> 
> Ok. I am still wondering why dom0's low 1MiB of pfn space is not
> simply mapped 1:1 though. Just historical?

Well, in a way perhaps. Using the DomIO approach is less of a special
case than mapping some arbitrary range 1:1. Furthermore, Dom0, being PV,
wouldn't necessarily expect any BIOS in its PFN range there, but rather
views it as normal RAM.

Jan


* Re: [Xen-devel] [PATCH 2/2] xen/mm: Introduce PG_state_uninitialised
  2020-03-18 10:03                                     ` Jan Beulich
@ 2020-03-18 12:11                                       ` David Woodhouse
  2020-03-18 13:27                                         ` Jan Beulich
  0 siblings, 1 reply; 48+ messages in thread
From: David Woodhouse @ 2020-03-18 12:11 UTC (permalink / raw)
  To: Jan Beulich
  Cc: sstabellini, julien, wl, konrad.wilk, george.dunlap,
	andrew.cooper3, ian.jackson, george.dunlap, jeff.kubascik, Xia,
	Hongyan, stewart.hildebrand, xen-devel



On Wed, 2020-03-18 at 11:03 +0100, Jan Beulich wrote:
> On 17.03.2020 23:15, David Woodhouse wrote:
> > On Thu, 2020-02-20 at 12:59 +0100, Jan Beulich wrote:
> > > On 07.02.2020 19:04, David Woodhouse wrote:
> > 
> >           ASSERT((pg[i].count_info & ~PGC_extra) == PGC_state_inuse ||
> >                  (pg[i].count_info & ~PGC_extra) == PGC_state_uninitialised);
> > > >            page_set_owner(&pg[i], d);
> > > >            smp_wmb(); /* Domain pointer must be visible before updating refcnt. */
> > > > -        pg[i].count_info = PGC_allocated | 1;
> > > > +        pg[i].count_info |= PGC_allocated | 1;
> > > 
> > > This is too relaxed for my taste: I understand you want to
> > > retain page state, but I suppose other bits would want clearing
> > > nevertheless.
> > 
> > You seem to have dropped the ASSERT immediately before the code snippet
> > you cited, in which arbitrary other contents of count_info are not
> > permitted. I put it back, in its current form after I rebase on top of
> > Paul's commit c793d13944b45d assing PGC_extra.
> 
> But that' only an ASSERT(), i.e. no protection at all in release builds.

An ASSERT does protect release builds. If the rule is that you must
never call assign_pages() with pages that have the other bits in
count_info set, then the ASSERT helps to catch the cases where people
introduce a bug and start doing precisely that, and the bug never
*makes* it to release builds.

What we're debating here is the behaviour of assign_pages() when
someone introduces such a bug and calls it with inappropriate pages.

Currently, the behaviour is that the other flags are silently cleared.
I've seen no analysis that such clearing is correct or desirable. In
fact, for the PGC_state bits I determined that it now is NOT correct,
which is why I changed it.

While I was at it, I let it preserve the other bits — which, again,
should never be set, and which would trigger the ASSERT in debug builds
if it were to happen.

But I'm not tied to that behaviour. It's still a "can never happen"
case as far as I'm concerned. So let's make it look like this:


    for ( i = 0; i < (1 << order); i++ )
    {
        ASSERT(page_get_owner(&pg[i]) == NULL);
        /*
         * Note: Not using page_state_is() here. The ASSERT requires that
         * all other bits in count_info are zero, in addition to PGC_state
         * being appropriate.
         */
        ASSERT((pg[i].count_info & ~PGC_extra) == PGC_state_inuse ||
               (pg[i].count_info & ~PGC_extra) == PGC_state_uninitialised);
        page_set_owner(&pg[i], d);
        smp_wmb(); /* Domain pointer must be visible before updating refcnt. */
        pg[i].count_info = (pg[i].count_info & PGC_state) | PGC_allocated | 1;
        page_list_add_tail(&pg[i], &d->page_list);
    }

OK?



* Re: [Xen-devel] [PATCH 1/2] xen/mm: fold PGC_broken into PGC_state bits
  2020-03-18  9:56                           ` Jan Beulich
@ 2020-03-18 12:31                             ` Julien Grall
  2020-03-18 13:23                               ` Jan Beulich
  2020-03-18 17:13                               ` David Woodhouse
  0 siblings, 2 replies; 48+ messages in thread
From: Julien Grall @ 2020-03-18 12:31 UTC (permalink / raw)
  To: Jan Beulich, David Woodhouse
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, George Dunlap,
	Jeff Kubascik, Stewart Hildebrand, xen-devel

Hi Jan,

On 18/03/2020 09:56, Jan Beulich wrote:
> On 17.03.2020 22:52, David Woodhouse wrote:
>> On Thu, 2020-02-20 at 12:10 +0100, Jan Beulich wrote:
>>> On 07.02.2020 16:57, David Woodhouse wrote:
>>>> @@ -1145,16 +1145,19 @@ static int reserve_offlined_page(struct
>>>> page_info *head)
>>>>       for ( cur_head = head; cur_head < head + ( 1UL << head_order);
>>>> cur_head++ )
>>>>       {
>>>> -        if ( !page_state_is(cur_head, offlined) )
>>>> +        struct page_list_head *list;
>>>> +        if ( page_state_is(cur_head, offlined) )
>>>> +            list = &page_offlined_list;
>>>> +        else if (page_state_is(cur_head, broken) )
>>>> +            list = &page_broken_list;
>>>> +        else
>>>>               continue;
>>>>           avail[node][zone]--;
>>>>           total_avail_pages--;
>>>>           ASSERT(total_avail_pages >= 0);
>>>> -        page_list_add_tail(cur_head,
>>>> -                           test_bit(_PGC_broken, 
>>>> &cur_head->count_info) ?
>>>> -                           &page_broken_list : &page_offlined_list);
>>>> +        page_list_add_tail(cur_head, list);
>>>
>>> While I realize it's fewer comparisons this way, I still wonder
>>> whether for the reader's sake it wouldn't better be
>>> page_is_offlined() first and then page_is_broken() down here.
>>
>> Nah, that would be worse. This way there are two cases which are
>> explicitly handled and the list to use for each of them is explicitly
>> set. The 'if (a||b) …    some_function(a ? thing_for_a : thing_for_b)'
>> construct is much less comprehensible.
> 
> It's a matter of taste, I agree, and in such a case I generally advise
> to see about limiting code churn. For code you then still introduce
> anew, yes, taste decisions may typically be to the authors judgement
> (there are exceptions, though).
> 
>>>> @@ -1699,14 +1714,14 @@ unsigned int online_page(mfn_t mfn,
>>>> uint32_t *status)
>>>>       do {
>>>>           ret = *status = 0;
>>>> -        if ( y & PGC_broken )
>>>> +        if ( (y & PGC_state) == PGC_state_broken ||
>>>> +             (y & PGC_state) == PGC_state_broken_offlining )
>>>>           {
>>>>               ret = -EINVAL;
>>>>               *status = PG_ONLINE_FAILED |PG_ONLINE_BROKEN;
>>>>               break;
>>>>           }
>>>> -
>>>> -        if ( (y & PGC_state) == PGC_state_offlined )
>>>> +        else if ( (y & PGC_state) == PGC_state_offlined )
>>>
>>> I don't see a need for adding "else" here.
>>
>> They are mutually exclusive cases. It makes things a whole lot clearer
>> to the reader to put the 'else' there, and sometimes helps a naïve
>> compiler along the way too.
> 
> Well, I'm afraid I'm going to be pretty strict about this: It's again
> a matter of taste, yes, but we generally try to avoid pointless else.
> What you consider "a whole lot clearer to the reader" is the opposite
> from my pov.

While I agree the 'else' may be pointless, I don't think it is worth an 
argument. As the author of the patch, it is his choice to write the code 
like that.

> 
>>>> --- a/xen/include/asm-x86/mm.h
>>>> +++ b/xen/include/asm-x86/mm.h
>>>> @@ -67,18 +67,27 @@
>>>>    /* 3-bit PAT/PCD/PWT cache-attribute hint. */
>>>>   #define PGC_cacheattr_base PG_shift(6)
>>>>   #define PGC_cacheattr_mask PG_mask(7, 6)
>>>> - /* Page is broken? */
>>>> -#define _PGC_broken       PG_shift(7)
>>>> -#define PGC_broken        PG_mask(1, 7)
>>>> - /* Mutually-exclusive page states: { inuse, offlining, offlined, free }. */
>>>> -#define PGC_state         PG_mask(3, 9)
>>>> -#define PGC_state_inuse   PG_mask(0, 9)
>>>> -#define PGC_state_offlining PG_mask(1, 9)
>>>> -#define PGC_state_offlined PG_mask(2, 9)
>>>> -#define PGC_state_free    PG_mask(3, 9)
>>>> -#define page_state_is(pg, st) (((pg)->count_info&PGC_state) == PGC_state_##st)
>>>> -
>>>> - /* Count of references to this frame. */
>>>> + /*
>>>> +  * Mutually-exclusive page states:
>>>> +  * { inuse, offlining, offlined, free, broken_offlining, broken }
>>>> +  */
>>>> +#define PGC_state                  PG_mask(7, 9)
>>>> +#define PGC_state_inuse            PG_mask(0, 9)
>>>> +#define PGC_state_offlining        PG_mask(1, 9)
>>>> +#define PGC_state_offlined         PG_mask(2, 9)
>>>> +#define PGC_state_free             PG_mask(3, 9)
>>>> +#define PGC_state_broken_offlining PG_mask(4, 9)
>>>
>>> TBH I'd prefer PGC_state_offlining_broken, as it's not the
>>> offlining which is broken, but a broken page is being
>>> offlined.
>>
>> It is the page which is both broken and offlining.
>> Or indeed it is the page which is both offlining and broken.
> 
> I.e. you agree with flipping the two parts around?
> 
>>>> +#define page_is_offlining(pg)      (page_state_is((pg), broken_offlining) || \
>>>> +                                    page_state_is((pg), offlining))
>>>
>>> Overall I wonder whether the PGC_state_* ordering couldn't be
>>> adjusted such that at least some of these three won't need
>>> two comparisons (by masking off a bit before comparing).
>>
>> The whole point in this exercise is that there isn't a whole bit for
>> these; they are each *two* states out of the possible 8.
> 
> Sure. But just consider the more general case: Instead of writing
> 
>      if ( i == 6 || i == 7 )
> 
> you can as well write
> 
>      if ( (i | 1) == 7 )

I stumbled across a few of those recently, and they are not the most 
obvious things to read. Yes, your code may be faster. But is it really 
worth it compared to the cost of readability and future-proofing?

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Xen-devel] [PATCH 1/2] xen/mm: fold PGC_broken into PGC_state bits
  2020-03-18 12:31                             ` Julien Grall
@ 2020-03-18 13:23                               ` Jan Beulich
  2020-03-18 17:13                               ` David Woodhouse
  1 sibling, 0 replies; 48+ messages in thread
From: Jan Beulich @ 2020-03-18 13:23 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, George Dunlap,
	Jeff Kubascik, Stewart Hildebrand, xen-devel, David Woodhouse

On 18.03.2020 13:31, Julien Grall wrote:
> On 18/03/2020 09:56, Jan Beulich wrote:
>> On 17.03.2020 22:52, David Woodhouse wrote:
>>> On Thu, 2020-02-20 at 12:10 +0100, Jan Beulich wrote:
>>>> On 07.02.2020 16:57, David Woodhouse wrote:
>>>>> +#define page_is_offlining(pg)      (page_state_is((pg), broken_offlining) || \
>>>>> +                                    page_state_is((pg), offlining))
>>>>
>>>> Overall I wonder whether the PGC_state_* ordering couldn't be
>>>> adjusted such that at least some of these three won't need
>>>> two comparisons (by masking off a bit before comparing).
>>>
>>> The whole point in this exercise is that there isn't a whole bit for
>>> these; they are each *two* states out of the possible 8.
>>
>> Sure. But just consider the more general case: Instead of writing
>>
>>      if ( i == 6 || i == 7 )
>>
>> you can as well write
>>
>>      if ( (i | 1) == 7 )
> 
> I stumbled across a few of those recently, and they are not the
> most obvious things to read.

Depends on the reader, I guess.

> Yes, your code may be faster. But is it really worth it compared
> to the cost of readability and future-proofing?

Not sure how being future proof comes into play here. I'm not
suggesting to use literal numbers. I'd also be happy to be proven
wrong in assuming that the compiler still can't do such
transformations itself; it couldn't when I checked a while back.
Reducing the number of conditional branches is, imo, worth at
least some effort at the source level.

Jan


* Re: [Xen-devel] [PATCH 2/2] xen/mm: Introduce PG_state_uninitialised
  2020-03-18 12:11                                       ` David Woodhouse
@ 2020-03-18 13:27                                         ` Jan Beulich
  0 siblings, 0 replies; 48+ messages in thread
From: Jan Beulich @ 2020-03-18 13:27 UTC (permalink / raw)
  To: David Woodhouse
  Cc: sstabellini, julien, wl, konrad.wilk, george.dunlap,
	andrew.cooper3, ian.jackson, george.dunlap, jeff.kubascik, Xia,
	Hongyan, stewart.hildebrand, xen-devel

On 18.03.2020 13:11, David Woodhouse wrote:
> On Wed, 2020-03-18 at 11:03 +0100, Jan Beulich wrote:
>> On 17.03.2020 23:15, David Woodhouse wrote:
>>> On Thu, 2020-02-20 at 12:59 +0100, Jan Beulich wrote:
>>>> On 07.02.2020 19:04, David Woodhouse wrote:
>>>
>>>           ASSERT((pg[i].count_info & ~PGC_extra) == PGC_state_inuse ||
>>>                  (pg[i].count_info & ~PGC_extra) == PGC_state_uninitialised);
>>>>>            page_set_owner(&pg[i], d);
>>>>>            smp_wmb(); /* Domain pointer must be visible before updating refcnt. */
>>>>> -        pg[i].count_info = PGC_allocated | 1;
>>>>> +        pg[i].count_info |= PGC_allocated | 1;
>>>>
>>>> This is too relaxed for my taste: I understand you want to
>>>> retain page state, but I suppose other bits would want clearing
>>>> nevertheless.
>>>
>>> You seem to have dropped the ASSERT immediately before the code snippet
>>> you cited, in which arbitrary other contents of count_info are not
>>> permitted. I put it back, in its current form after I rebase on top of
>>> Paul's commit c793d13944b45d adding PGC_extra.
>>
>> But that's only an ASSERT(), i.e. no protection at all in release builds.
> 
> An ASSERT does protect release builds. If the rule is that you must
> never call assign_pages() with pages that have the other bits in
> count_info set, then the ASSERT helps to catch the cases where people
> introduce a bug and start doing precisely that, and the bug never
> *makes* it to release builds.
> 
> What we're debating here is the behaviour of assign_pages() when
> someone introduces such a bug and calls it with inappropriate pages.
> 
> Currently, the behaviour is that the other flags are silently cleared.
> I've seen no analysis that such clearing is correct or desirable. In
> fact, for the PGC_state bits I determined that it now is NOT correct,
> which is why I changed it.
> 
> While I was at it, I let it preserve the other bits — which, again,
> should never be set, and which would trigger the ASSERT in debug builds
> if it were to happen.
> 
> But I'm not tied to that behaviour. It's still a "can never happen"
> case as far as I'm concerned. So let's make it look like this:
> 
> 
>     for ( i = 0; i < (1 << order); i++ )
>     {
>         ASSERT(page_get_owner(&pg[i]) == NULL);
>         /*
>          * Note: Not using page_state_is() here. The ASSERT requires that
>          * all other bits in count_info are zero, in addition to PGC_state
>          * being appropriate.
>          */
>         ASSERT((pg[i].count_info & ~PGC_extra) == PGC_state_inuse ||
>                (pg[i].count_info & ~PGC_extra) == PGC_state_uninitialised);
>         page_set_owner(&pg[i], d);
>         smp_wmb(); /* Domain pointer must be visible before updating refcnt. */
>         pg[i].count_info = (pg[i].count_info & PGC_state) | PGC_allocated | 1;
>         page_list_add_tail(&pg[i], &d->page_list);
>     }
> 
> OK?

Yes, thanks.

Jan


* Re: [Xen-devel] [PATCH 1/2] xen/mm: fold PGC_broken into PGC_state bits
  2020-03-18 12:31                             ` Julien Grall
  2020-03-18 13:23                               ` Jan Beulich
@ 2020-03-18 17:13                               ` David Woodhouse
  2020-03-19  8:49                                 ` Jan Beulich
  1 sibling, 1 reply; 48+ messages in thread
From: David Woodhouse @ 2020-03-18 17:13 UTC (permalink / raw)
  To: Julien Grall, Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, George Dunlap,
	Jeff Kubascik, Stewart Hildebrand, xen-devel



On Wed, 2020-03-18 at 12:31 +0000, Julien Grall wrote:
> On 18/03/2020 09:56, Jan Beulich wrote:
> > On 17.03.2020 22:52, David Woodhouse wrote:
> > > On Thu, 2020-02-20 at 12:10 +0100, Jan Beulich wrote:
> > > > > @@ -1699,14 +1714,14 @@ unsigned int online_page(mfn_t mfn, uint32_t *status)
> > > > >       do {
> > > > >           ret = *status = 0;
> > > > > -        if ( y & PGC_broken )
> > > > > +        if ( (y & PGC_state) == PGC_state_broken ||
> > > > > +             (y & PGC_state) == PGC_state_broken_offlining )
> > > > >           {
> > > > >               ret = -EINVAL;
> > > > >               *status = PG_ONLINE_FAILED |PG_ONLINE_BROKEN;
> > > > >               break;
> > > > >           }
> > > > > -
> > > > > -        if ( (y & PGC_state) == PGC_state_offlined )
> > > > > +        else if ( (y & PGC_state) == PGC_state_offlined )
> > > > 
> > > > I don't see a need for adding "else" here.
> > > 
> > > They are mutually exclusive cases. It makes things a whole lot clearer
> > > to the reader to put the 'else' there, and sometimes helps a naïve
> > > compiler along the way too.
> > 
> > Well, I'm afraid I'm going to be pretty strict about this: It's again
> > a matter of taste, yes, but we generally try to avoid pointless else.
> > What you consider "a whole lot clearer to the reader" is the opposite
> > from my pov.
> 
> While I agree the 'else' may be pointless, I don't think it is worth an 
> argument. As the author of the patch, it is his choice to write the code 
> like that.

Indeed. While I appreciate your insight, Jan, and your detailed reviews
are undoubtedly helpful — especially to me as I poke around the Xen
code base without knowing where the bodies are buried — I do sometimes
find that it degenerates into what appears to be gratuitous
bikeshedding.

Like *some* others, I'm perfectly capable of responding "I understand
you would have done it differently, but I prefer it this way".

But even for those like me who have the self-confidence (or arrogance?)
to respond in such a way, the end result is often the same — a patch
series which the maintainer doesn't apply because it has "unresolved
issues".

Perfect is the enemy of good. Especially when perfection is so
subjective.

This definitely isn't the kind of welcoming community that I enjoy
trying to get my junior engineers to contribute to. And they aren't
snowflakes; they cope with the Linux community just fine, for the most
part.


Earlier today, I found myself adjusting a patch in order to tweak the
behaviour of a "can never happen" situation, when it was far from clear
that the *existing* behaviour in that situation would have been correct
anyway.

There is a lot of value in your reviews, and they are appreciated. But
the overall effect is seen by some as making the Xen community somewhat
dysfunctional. 

The -MP makefile patch I posted yesterday... I almost didn't bother.
And when I allowed myself to be talked into it, I was entirely
unsurprised when a review came in basically asking me to prove a
negative before the patch could proceed. So as far as I can tell, it'll
fall by the wayside and the build will remain broken any time anyone
removes or renames a header file. Because life's too short to invest
the energy to make improvements like that.

One of these days, I may attempt to revive my series cleaning up the
16-bit and 32-bit boot code. Which was a clear improvement and
simplification, and again you gave extremely valid feedback which
helped to improve it — but again it was interspersed with more
subjective and less helpful comments which basically derailed it. 

Having carefully threaded my way through the existing byzantine code
and made incremental bisectable changes, I ended up with feedback that
basically would have required me to start again from scratch, in order
to satisfy what appeared to be fairly arbitrary and subjective demands.

As is often the case in creating a bisectable series out of complex
changes, I had sometimes moved/refactored code, only to move/refactor
it again in a subsequent patch. Sometimes the ordering of such inter-
related changes can be fairly arbitrary, and I had made my choices as
I'd progressed; the real focus being the end result. At one point you
were picking on intermediate details of how I'd made my overall series
bisectable, and seemed to be demanding that I go back and refactor (the
intermediate stages for no better reason than because you would have
done it differently. 

Again, your attention to detail and your expertise are massively
appreciated. But please let's remember that "perfect is the enemy of
good", and strike a balance which allows forward progress without
blocking improvements.

Sometimes I wonder if you truly realise how much you derail the
progress of a patch series just by raising well-intentioned "queries"
around it.


> > > > > --- a/xen/include/asm-x86/mm.h
> > > > > +++ b/xen/include/asm-x86/mm.h
> > > > > @@ -67,18 +67,27 @@
> > > > >    /* 3-bit PAT/PCD/PWT cache-attribute hint. */
> > > > >   #define PGC_cacheattr_base PG_shift(6)
> > > > >   #define PGC_cacheattr_mask PG_mask(7, 6)
> > > > > - /* Page is broken? */
> > > > > -#define _PGC_broken       PG_shift(7)
> > > > > -#define PGC_broken        PG_mask(1, 7)
> > > > > - /* Mutually-exclusive page states: { inuse, offlining, offlined, free }. */
> > > > > -#define PGC_state         PG_mask(3, 9)
> > > > > -#define PGC_state_inuse   PG_mask(0, 9)
> > > > > -#define PGC_state_offlining PG_mask(1, 9)
> > > > > -#define PGC_state_offlined PG_mask(2, 9)
> > > > > -#define PGC_state_free    PG_mask(3, 9)
> > > > > -#define page_state_is(pg, st) (((pg)->count_info&PGC_state) == PGC_state_##st)
> > > > > -
> > > > > - /* Count of references to this frame. */
> > > > > + /*
> > > > > +  * Mutually-exclusive page states:
> > > > > +  * { inuse, offlining, offlined, free, broken_offlining, broken }
> > > > > +  */
> > > > > +#define PGC_state                  PG_mask(7, 9)
> > > > > +#define PGC_state_inuse            PG_mask(0, 9)
> > > > > +#define PGC_state_offlining        PG_mask(1, 9)
> > > > > +#define PGC_state_offlined         PG_mask(2, 9)
> > > > > +#define PGC_state_free             PG_mask(3, 9)
> > > > > +#define PGC_state_broken_offlining PG_mask(4, 9)
> > > > 
> > > > TBH I'd prefer PGC_state_offlining_broken, as it's not the
> > > > offlining which is broken, but a broken page is being
> > > > offlined.
> > > 
> > > It is the page which is both broken and offlining.
> > > Or indeed it is the page which is both offlining and broken.
> > 
> > I.e. you agree with flipping the two parts around?

I hope I have respectfully made it clear that no, I'm really not happy
with the very concept of such a request.

Perhaps it would be easier for me to acquiesce, in the short term.

But on the whole I think it's better to put my foot down and say 'no',
and focus on real work and things that matter.

> > > > > +#define page_is_offlining(pg)      (page_state_is((pg), broken_offlining) || \
> > > > > +                                    page_state_is((pg), offlining))
> > > > 
> > > > Overall I wonder whether the PGC_state_* ordering couldn't be
> > > > adjusted such that at least some of these three won't need
> > > > two comparisons (by masking off a bit before comparing).
> > > 
> > > The whole point in this exercise is that there isn't a whole bit for
> > > these; they are each *two* states out of the possible 8.
> > 
> > Sure. But just consider the more general case: Instead of writing
> > 
> >      if ( i == 6 || i == 7 )
> > 
> > you can as well write
> > 
> >      if ( (i | 1) == 7 )
> 
> I stumbled across a few of those recently, and they are not the most
> obvious things to read. Yes, your code may be faster. But is it really
> worth it compared to the cost of readability and future-proofing?

No. Just no.

If that kind of change is really a worthwhile win, it'll depend on the
CPU. File a GCC PR with a test case as a missed optimisation. Don't
make the source code gratuitously harder to read.

Honestly, this, right here, is the *epitome* of why I, and others,
sometimes feel that submitting a patch to Xen can be more effort than
it's worth.

This email is not intended as a personal attack; I hope you don't feel
that it is. For about the fifth time: your careful reviews and your
attention to detail are *massively* appreciated. Just a little over the
time sometimes, and I'd like to ask you to take care to be aware of the
overall effect, and that you are not blocking progress.

Thanks.





* Re: [Xen-devel] [PATCH 1/2] xen/mm: fold PGC_broken into PGC_state bits
  2020-03-18 17:13                               ` David Woodhouse
@ 2020-03-19  8:49                                 ` Jan Beulich
  2020-03-19 10:26                                   ` David Woodhouse
  0 siblings, 1 reply; 48+ messages in thread
From: Jan Beulich @ 2020-03-19  8:49 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, George Dunlap,
	Jeff Kubascik, Stewart Hildebrand, xen-devel

On 18.03.2020 18:13, David Woodhouse wrote:
> On Wed, 2020-03-18 at 12:31 +0000, Julien Grall wrote:
>> On 18/03/2020 09:56, Jan Beulich wrote:
>>> On 17.03.2020 22:52, David Woodhouse wrote:
>>>> On Thu, 2020-02-20 at 12:10 +0100, Jan Beulich wrote:
>>>>>> @@ -1699,14 +1714,14 @@ unsigned int online_page(mfn_t mfn, uint32_t *status)
>>>>>>       do {
>>>>>>           ret = *status = 0;
>>>>>> -        if ( y & PGC_broken )
>>>>>> +        if ( (y & PGC_state) == PGC_state_broken ||
>>>>>> +             (y & PGC_state) == PGC_state_broken_offlining )
>>>>>>           {
>>>>>>               ret = -EINVAL;
>>>>>>               *status = PG_ONLINE_FAILED |PG_ONLINE_BROKEN;
>>>>>>               break;
>>>>>>           }
>>>>>> -
>>>>>> -        if ( (y & PGC_state) == PGC_state_offlined )
>>>>>> +        else if ( (y & PGC_state) == PGC_state_offlined )
>>>>>
>>>>> I don't see a need for adding "else" here.
>>>>
>>>> They are mutually exclusive cases. It makes things a whole lot clearer
>>>> to the reader to put the 'else' there, and sometimes helps a naïve
>>>> compiler along the way too.
>>>
>>> Well, I'm afraid I'm going to be pretty strict about this: It's again
>>> a matter of taste, yes, but we generally try to avoid pointless else.
>>> What you consider "a whole lot clearer to the reader" is the opposite
>>> from my pov.
>>
>> While I agree the 'else' may be pointless, I don't think it is worth an 
>> argument. As the author of the patch, it is his choice to write the code 
>> like that.
> 
> Indeed. While I appreciate your insight, Jan, and your detailed reviews
> are undoubtedly helpful — especially to me as I poke around the Xen
> code base without knowing where the bodies are buried — I do sometimes
> find that it degenerates into what appears to be gratuitous
> bikeshedding.
> 
> Like *some* others, I'm perfectly capable of responding "I understand
> you would have done it differently, but I prefer it this way".
> 
> But even for those like me who have the self-confidence (or arrogance?)
> to respond in such a way, the end result is often the same — a patch
> series which the maintainer doesn't apply because it has "unresolved
> issues".
> 
> Perfect is the enemy of good. Especially when perfection is so
> subjective.
> 
> This definitely isn't the kind of welcoming community that I enjoy
> trying to get my junior engineers to contribute to. And they aren't
> snowflakes; they cope with the Linux community just fine, for the most
> part.

I appreciate your open and honest feedback, and having had similar
comments in the past I can assure you that I've already tried to
adjust where I find this acceptable. I take it you realize that
there are two limitations in this - trying doesn't mean succeeding,
and there are boundaries to what I'd consider acceptable to let go
without comment.

Of course there are always two sides to the coin.

As a maintainer of some piece of code, I view it as my
responsibility to look after not only the technical correctness of
that code, but also after its style (in the broadest sense of the
word). Looking at some very bad examples in our tree, many of
which I'm afraid have a Linux origin, I'm in particular of the
opinion that consistent style is a significant aid in code
readability and maintainability. And I hope you agree that _style_
adjustments are pretty easy to make, so I don't view asking for
such as placing a significant burden on the submitter. The
alternative of letting it go uncommented and then take the time
myself to clean up seems quite a bit worse to me, not the least
because of this scaling even less well than the amount of code
review that needs doing.

The mentioned Linux origin of some of the particularly bad
examples in our tree is why I view your "they cope with the Linux
community just fine" as not really applicable. This is despite
our subsequent changes to those files often having made the
situation worse rather than better.

To some degree the same goes for bigger than necessary code churn,
albeit I agree that in a number of cases it is far less objective
to judge than the aim for consistent style. Extra code churn
instead is often making review harder, irrespective of the often
good intentions behind doing so.

> There is a lot of value in your reviews, and they are appreciated. But
> the overall effect is seen by some as making the Xen community somewhat
> dysfunctional. 

In which case I ought to consider, of course after first checking
with my management, to step back as a maintainer. I'd very much
regret doing so, but if it's in the interest of the community ...

(As an aside, likely being among those doing the largest part of
code reviews, helping with that part of the overall workload the
project generates would reduce the number of reviews I'd have to
do, and hence the chances of me giving comments viewed as
unhelpful or worse by submitters. Or, to put it in different,
frank, but hopefully not offending words - I'd like to see you do
a fair amount of code review, including looking after merely
cosmetic aspects in the spirit of our written and unwritten rules,
before you actually comment on me going too far with some of my
feedback. And without me wanting to put too much emphasis on this:
It is my opinion that maintainer views generally have somewhat
higher weight than non-maintainer ones. I'm not going to claim
though there aren't cases where I might go too far and hence abuse
rather than use this, but as per above I can only try to avoid
doing so, I can't promise to succeed. And of course I, like others,
can be convinced to be wrong.)

> The -MP makefile patch I posted yesterday... I almost didn't bother.
> And when I allowed myself to be talked into it, I was entirely
> unsurprised when a review came in basically asking me to prove a
> negative before the patch could proceed. So as far as I can tell, it'll
> fall by the wayside and the build will remain broken any time anyone
> removes or renames a header file. Because life's too short to invest
> the energy to make improvements like that.

So are you saying that as a maintainer I should let go uncommented a
change which I'm unconvinced doesn't have negative side effects,
besides its positive intended behavioral change? The more that here
the workaround is rather trivial? As you may imagine, I've run into
the situation myself a number of times, without considering this a
reason to make any adjustments to the build machinery.

>>>>>> --- a/xen/include/asm-x86/mm.h
>>>>>> +++ b/xen/include/asm-x86/mm.h
>>>>>> @@ -67,18 +67,27 @@
>>>>>>    /* 3-bit PAT/PCD/PWT cache-attribute hint. */
>>>>>>   #define PGC_cacheattr_base PG_shift(6)
>>>>>>   #define PGC_cacheattr_mask PG_mask(7, 6)
>>>>>> - /* Page is broken? */
>>>>>> -#define _PGC_broken       PG_shift(7)
>>>>>> -#define PGC_broken        PG_mask(1, 7)
>>>>>> - /* Mutually-exclusive page states: { inuse, offlining, offlined, free }. */
>>>>>> -#define PGC_state         PG_mask(3, 9)
>>>>>> -#define PGC_state_inuse   PG_mask(0, 9)
>>>>>> -#define PGC_state_offlining PG_mask(1, 9)
>>>>>> -#define PGC_state_offlined PG_mask(2, 9)
>>>>>> -#define PGC_state_free    PG_mask(3, 9)
>>>>>> -#define page_state_is(pg, st) (((pg)->count_info&PGC_state) == PGC_state_##st)
>>>>>> -
>>>>>> - /* Count of references to this frame. */
>>>>>> + /*
>>>>>> +  * Mutually-exclusive page states:
>>>>>> +  * { inuse, offlining, offlined, free, broken_offlining, broken }
>>>>>> +  */
>>>>>> +#define PGC_state                  PG_mask(7, 9)
>>>>>> +#define PGC_state_inuse            PG_mask(0, 9)
>>>>>> +#define PGC_state_offlining        PG_mask(1, 9)
>>>>>> +#define PGC_state_offlined         PG_mask(2, 9)
>>>>>> +#define PGC_state_free             PG_mask(3, 9)
>>>>>> +#define PGC_state_broken_offlining PG_mask(4, 9)
>>>>>
>>>>> TBH I'd prefer PGC_state_offlining_broken, as it's not the
>>>>> offlining which is broken, but a broken page is being
>>>>> offlined.
>>>>
>>>> It is the page which is both broken and offlining.
>>>> Or indeed it is the page which is both offlining and broken.
>>>
>>> I.e. you agree with flipping the two parts around?
> 
> I hope I have respectfully made it clear that no, I'm really not happy
> with the very concept of such a request.
> 
> Perhaps it would be easier for me to acquiesce, in the short term.
> 
> But on the whole I think it's better to put my foot down and say 'no',
> and focus on real work and things that matter.

Well, in the specific case here I've meanwhile realized that my
alternative naming suggestion is in no way less ambiguous. So
stick to what you've chosen, albeit I continue to dislike the
name ambiguously also suggesting that the offlining operation
might be broken (e.g. as in "can't be offlined"), rather than the
page itself. I'm not going to exclude though that this is just
because of not being a native English speaker.

>>>>>> +#define page_is_offlining(pg)      (page_state_is((pg), broken_offlining) || \
>>>>>> +                                    page_state_is((pg), offlining))
>>>>>
>>>>> Overall I wonder whether the PGC_state_* ordering couldn't be
>>>>> adjusted such that at least some of these three won't need
>>>>> two comparisons (by masking off a bit before comparing).
>>>>
>>>> The whole point in this exercise is that there isn't a whole bit for
>>>> these; they are each *two* states out of the possible 8.
>>>
>>> Sure. But just consider the more general case: Instead of writing
>>>
>>>      if ( i == 6 || i == 7 )
>>>
>>> you can as well write
>>>
>>>      if ( (i | 1) == 7 )
>>
>> I stumbled across a few of those recently, and they are not the most
>> obvious things to read. Yes, your code may be faster. But is it really
>> worth it compared to the cost of readability and future-proofing?
> 
> No. Just no.
> 
> If that kind of change is really a worthwhile win, it'll depend on the
> CPU. File a GCC PR with a test case as a missed optimisation.

Your experience may be different, but I hardly ever see any effect from
reporting bugs (not just against gcc) unless they're really bad or really
easy to address. That's why I generally prefer to fix such issues myself,
provided of course that I can find the time.

> Don't make the source code gratuitously harder to read.

That's a very subjective aspect again. Personally I find two comparisons
of the same variable against different constants harder to read.

> Honestly, this, right here, is the *epitome* of why I, and others,
> sometimes feel that submitting a patch to Xen can be more effort than
> it's worth.

Note how I said "I wonder", not "please make".

Jan


* Re: [Xen-devel] [PATCH 1/2] xen/mm: fold PGC_broken into PGC_state bits
  2020-03-19  8:49                                 ` Jan Beulich
@ 2020-03-19 10:26                                   ` David Woodhouse
  2020-03-19 11:59                                     ` Jan Beulich
  0 siblings, 1 reply; 48+ messages in thread
From: David Woodhouse @ 2020-03-19 10:26 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, George Dunlap,
	Jeff Kubascik, Stewart Hildebrand, xen-devel



On Thu, 2020-03-19 at 09:49 +0100, Jan Beulich wrote:
> On 18.03.2020 18:13, David Woodhouse wrote:
> > On Wed, 2020-03-18 at 12:31 +0000, Julien Grall wrote:
> > > On 18/03/2020 09:56, Jan Beulich wrote:
> > > > On 17.03.2020 22:52, David Woodhouse wrote:
> > > > > On Thu, 2020-02-20 at 12:10 +0100, Jan Beulich wrote:
> > > > > > > @@ -1699,14 +1714,14 @@ unsigned int online_page(mfn_t mfn, uint32_t *status)
> > > > > > >       do {
> > > > > > >           ret = *status = 0;
> > > > > > > -        if ( y & PGC_broken )
> > > > > > > +        if ( (y & PGC_state) == PGC_state_broken ||
> > > > > > > +             (y & PGC_state) == PGC_state_broken_offlining )
> > > > > > >           {
> > > > > > >               ret = -EINVAL;
> > > > > > >               *status = PG_ONLINE_FAILED |PG_ONLINE_BROKEN;
> > > > > > >               break;
> > > > > > >           }
> > > > > > > -
> > > > > > > -        if ( (y & PGC_state) == PGC_state_offlined )
> > > > > > > +        else if ( (y & PGC_state) == PGC_state_offlined )
> > > > > > 
> > > > > > I don't see a need for adding "else" here.
> > > > > 
> > > > > They are mutually exclusive cases. It makes things a whole lot clearer
> > > > > to the reader to put the 'else' there, and sometimes helps a naïve
> > > > > compiler along the way too.
> > > > 
> > > > Well, I'm afraid I'm going to be pretty strict about this: It's again
> > > > a matter of taste, yes, but we generally try to avoid pointless else.
> > > > What you consider "a whole lot clearer to the reader" is the opposite
> > > > from my pov.
> > > 
> > > While I agree the 'else' may be pointless, I don't think it is worth an 
> > > argument. As the author of the patch, it is his choice to write the code 
> > > like that.
> > 
> > Indeed. While I appreciate your insight, Jan, and your detailed reviews
> > are undoubtedly helpful — especially to me as I poke around the Xen
> > code base without knowing where the bodies are buried — I do sometimes
> > find that it degenerates into what appears to be gratuitous
> > bikeshedding.
> > 
> > Like *some* others, I'm perfectly capable of responding "I understand
> > you would have done it differently, but I prefer it this way".
> > 
> > But even for those like me who have the self-confidence (or arrogance?)
> > to respond in such a way, the end result is often the same — a patch
> > series which the maintainer doesn't apply because it has "unresolved
> > issues".
> > 
> > Perfect is the enemy of good. Especially when perfection is so
> > subjective.
> > 
> > This definitely isn't the kind of welcoming community that I enjoy
> > trying to get my junior engineers to contribute to. And they aren't
> > snowflakes; they cope with the Linux community just fine, for the most
> > part.
> 
> I appreciate your open and honest feedback, and having had similar
> comments in the past I can assure you that I've already tried to
> adjust where I find this acceptable. I take it you realize that
> there are two limitations in this - trying doesn't mean succeeding,
> and the boundaries of what I'd consider acceptable to let go with
> no comments.
> 
> Of course there are always two sides of the medal.
> 
> As a maintainer of some piece of code, I view it as my
> responsibility to look after not only the technical correctness of
> that code, but also after its style (in the broadest sense of the
> word). Looking at some very bad examples in our tree, many of
> which I'm afraid have a Linux origin, I'm in particular of the
> opinion that consistent style is a significant aid in code
> readability and maintainability. And I hope you agree that _style_
> adjustments are pretty easy to make, so I don't view asking for
> such as placing a significant burden on the submitter. The
> alternative of letting it go uncommented and then take the time
> myself to clean up seems quite a bit worse to me, not the least
> because of this scaling even less well than the amount of code
> review that needs doing.

Yes, 100% agreed. And I'll even concede the point for the cases of
moving around code that happens not to conform to the current style,
where contributors are asked to fix it up as they go.

I was agreeing with you on that point, while simultaneously telling
Julien "nah, I'll fix it while I'm here" when he suggested that I *not*
realign the PGC_state bit definitions.


> The mentioned Linux origin of some of the particularly bad
> examples in our tree is why I view your "they cope with the Linux
> community just fine" as not really applicable. This is despite
> our subsequent changes to those files often having made the
> situation worse rather than better.

That was more about the community effect than technical matters, but
let's not rathole on that.

> To some degree the same goes for bigger than necessary code churn,
> albeit I agree that in a number of cases it is far less objective
> to judge than the aim for consistent style. Extra code churn
> instead is often making review harder, irrespective of the often
> good intentions behind doing so.

Completely agreed.

> > There is a lot of value in your reviews, and they are appreciated. But
> > the overall effect is seen by some as making the Xen community somewhat
> > dysfunctional. 
> 
> In which case I ought to consider, of course after first checking
> with my management, to step back as a maintainer. I'd very much
> regret doing so, but if it's in the interest of the community ...

I definitely don't think that would be in the interest of the
community. As I think I may have mentioned once or twice in my previous
message, your detailed reviews are massively appreciated and useful.

> (As an aside, likely being among those doing the largest part of
> code reviews, helping with that part of the overall workload the
> project generates would reduce the number of reviews I'd have to
> do, and hence the chances of me giving comments viewed as
> unhelpful or worse by submitters. Or, to put it in different,
> frank, but hopefully not offending words - I'd like to see you do
> a fair amount of code review, including looking after merely
> cosmetic aspects in the spirit of our written and unwritten rules,
> before you actually comment on me going too far with some of my
> feedback. And without me wanting to put too much emphasis on this:
> It is my opinion that maintainer views generally have somewhat
> higher weight than non-maintainer ones. I'm not going to claim
> though there aren't cases where I might go too far and hence abuse
> rather than use this, but as per above I can only try to avoid
> doing so, I can't promise to succeed. And of course I, like others,
> can be convinced to be wrong.)

Understood.

> > The -MP makefile patch I posted yesterday... I almost didn't bother.
> > And when I allowed myself to be talked into it, I was entirely
> > unsurprised when a review came in basically asking me to prove a
> > negative before the patch could proceed. So as far as I can tell, it'll
> > fall by the wayside and the build will remain broken any time anyone
> > removes or renames a header file. Because life's too short to invest
> > the energy to make improvements like that.
> 
> So are you saying that as a maintainer I should let go uncommented a
> change which I'm unconvinced doesn't have negative side effects,
> besides its positive intended behavioral change? The more that here
> the workaround is rather trivial? As you may imagine, I've run into
> the situation myself a number of times, without considering this a
> reason to make any adjustments to the build machinery.

Jan, I would respectfully request that you take another look at your
initial response, but put yourself in the shoes of a patch submitter:
https://lists.xenproject.org/archives/html/xen-devel/2020-03/msg01171.html

You mention a "simple" workaround... but the workaround I've been using
is to manually remove the offending .o.d files, one at a time (or at
least one directory at a time), until the broken build starts working
again. Is that what you meant? And you really didn't ever consider that
it should be fixed?

And the substance of the response is basically saying "this is voodoo
and we can't touch it or unspecified things might break, but I have no
idea where to tell you to look."

Looking back I realise that the concern about phony rules overriding
pattern rules didn't even come from you; your concern was more nebulous
and unaddressable. It looks like I came up with a straw man and shot
*that* down in my later analysis (although that wasn't my intent; I
think the concern about pattern rules really did come from somewhere).

You asked a question about "why isn't this default behaviour", which is
kind of a silly question when asking about an option (-MP) that was
added to GCC almost a decade after the initial -MD behaviour was
established. Of *course* they didn't retroactively change the default.


Read that message again from the point of view of a contributor.
Pretend it isn't even me; pretend it's someone attempting to make their
first, trivial, improvement to the Xen ecosystem.

I hope you'll understand why my initial reaction was just a
monosyllabic 'no'.


> > > > > > > +#define PGC_state_broken_offlining PG_mask(4, 9)
> > > > > > 
> > > > > > TBH I'd prefer PGC_state_offlining_broken, as it's not the
> > > > > > offlining which is broken, but a broken page is being
> > > > > > offlined.
> > > > > 
> > > > > It is the page which is both broken and offlining.
> > > > > Or indeed it is the page which is both offlining and broken.
> > > > 
> > > > I.e. you agree with flipping the two parts around?
> > 
> > I hope I have respectfully made it clear that no, I'm really not happy
> > with the very concept of such a request.
> > 
> > Perhaps it would be easier for me to acquiesce, in the short term.
> > 
> > But on the whole I think it's better to put my foot down and say 'no',
> > and focus on real work and things that matter.
> 
> Well, in the specific case here I've meanwhile realized that my
> alternative naming suggested is in no way less ambiguous. So
> stick to what you've chosen, albeit I continue to dislike the
> name ambiguously also suggesting that the offlining operation
> might be broken (e.g. as in "can't be offlined"), rather than the
> page itself. I'm not going to exclude though that this is just
> because of not being a native English speaker.

As a native English speaker, the naming of these bothered me too.
They're too long and redundant. But subsuming the PGC_broken bit into a
3-bit PGC_state makes sense, and we can't abandon that idea purely
because we can't come up with a *name* that fills us with joy.

There wasn't a *good* answer. I vacillated for a while, and picked the
one that offended me least.

And then ended up in a debate about it when it really wasn't important.


> > > > > > > +#define page_is_offlining(pg)      (page_state_is((pg), broken_offlining) || \
> > > > > > > +                                    page_state_is((pg), offlining))
> > > > > > 
> > > > > > Overall I wonder whether the PGC_state_* ordering couldn't be
> > > > > > adjusted such that at least some of these three won't need
> > > > > > two comparisons (by masking off a bit before comparing).
> > > > > 
> > > > > The whole point in this exercise is that there isn't a whole bit for
> > > > > these; they are each *two* states out of the possible 8.
> > > > 
> > > > Sure. But just consider the more general case: Instead of writing
> > > > 
> > > >      if ( i == 6 || i == 7 )
> > > > 
> > > > you can as well write
> > > > 
> > > >      if ( (i | 1) == 7 )
> > > 
> > > I stumbled across a few of those recently and this is not the most
> > > obvious thing to read. Yes, your code may be faster. But is it really
> > > worth it compared to the cost in readability and future-proofness?
> > 
> > No. Just no.
> > 
> > If that kind of change is really a worthwhile win, it'll depend on the
> > CPU. File a GCC PR with a test case as a missed optimisation.
> 
> Your experience may be different, but I hardly ever see any effect from
> reporting bugs (not just against gcc) unless they're really bad or really
> easy to address. That's why I generally prefer to fix such issues myself,
> provided of course that I can find the time.

Perhaps so. But if I *don't* file it, it *certainly* doesn't get fixed.

And I've learned over the years *not* to second-guess the optimisations
that today's compiler might make, with the wind blowing in this
particular direction.

FWIW 'return (x == 6 || x == 7)' ends up being emitted by GCC on x86 as

	subl	$6, %edi
	xorl	%eax, %eax
	cmpl	$1, %edi
	setbe	%al
	ret

And 'return (x == 5 || x == 7)' gives:

	andl	$-3, %edi
	xorl	%eax, %eax
	cmpl	$5, %edi
	sete	%al
	ret

So it does look like GCC is actually doing its job, on this occasion.

But that's entirely beside the point, which is that I'm having some
pointless discussion about compiler optimisation minutiæ when fixing
PGC_broken was *already* deep into yak-shaving for the improvement I
was *actually* trying to make. It's distracting from real work, raising
barriers to getting fixes merged.

> > Don't make the source code gratuitously harder to read.
> 
> That's a very subjective aspect again. Personally I find two comparisons
> of the same variable against different constants harder to read.
> 
> > Honestly, this, right here, is the *epitome* of why I, and others,
> > sometimes feel that submitting a patch to Xen can be more effort than
> > it's worth.
> 
> Note how I said "I wonder", not "please make".

Perspective again. That distinction really doesn't matter. Perhaps you
underestimate the weight your words carry, as a well-respected
maintainer. You can't negate that effect purely by word tricks like
saying 'I wonder'. 

Because understatement is a very common tool in the English language,
especially in British English — and we've all seen people write "I
wonder if you should..." when what was really meant was "I will set
fire to you if you don't...". 

Understatement like that doesn't work. It still derails the patch
review. It just didn't need to be said, in that context.

Let me repeat — because I've only said it once today, I think, that
your reviews are incredibly useful. I'm only asking that you recognise
the weight that your 'wondering' can have, and recognise when something
you are asking for is *subjective*.

A review is not about "is this code precisely how it would look if I
had written it myself", but it is about "is this code correct and
maintainable".

Sometimes, as in the example with the PGC_state_ naming above, there
isn't a "nice" answer. We pick the solution that offends us least. And
I completely understand as a maintainer, what it's like to be on the
receiving end of such a choice. You think "that could be nicer"... and
have to work through the alternatives yourself before you realise that
actually, it was the best of the choices available.

Each of the responses I've identified from you as 'excessive' has some
merit, we can focus on each of them and you can justify them, to a
certain extent. But as a whole, the effect is of a barrage of nitpicks
of questionable utility which really does hinder forward progress.

Let's try to focus on comments which will genuinely make the code
better. It's not that we should deliberately stop paying attention to
detail, or deliberately allow buggy and broken code into the tree. It's
that we should be aware that "perfect is the enemy of good".

For my part, I'll stop whining at you now. If I end up giving responses
to parts of your code review which seem to be along the lines of
"that's nice, dear, but I didn't think so and I did the typing", please
hark back to this conversation. I'll try to phrase them more
appropriately than that, but no promises :)

Thanks.


[-- Attachment #1.2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]



* Re: [Xen-devel] [PATCH 1/2] xen/mm: fold PGC_broken into PGC_state bits
  2020-03-19 10:26                                   ` David Woodhouse
@ 2020-03-19 11:59                                     ` Jan Beulich
  2020-03-19 13:54                                       ` David Woodhouse
  0 siblings, 1 reply; 48+ messages in thread
From: Jan Beulich @ 2020-03-19 11:59 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, George Dunlap,
	Jeff Kubascik, Stewart Hildebrand, xen-devel

On 19.03.2020 11:26, David Woodhouse wrote:
> On Thu, 2020-03-19 at 09:49 +0100, Jan Beulich wrote:
>> On 18.03.2020 18:13, David Woodhouse wrote:
>>> The -MP makefile patch I posted yesterday... I almost didn't bother.
>>> And when I allowed myself to be talked into it, I was entirely
>>> unsurprised when a review came in basically asking me to prove a
>>> negative before the patch could proceed. So as far as I can tell, it'll
>>> fall by the wayside and the build will remain broken any time anyone
>>> removes or renames a header file. Because life's too short to invest
>>> the energy to make improvements like that.
>>
>> So are you saying that as a maintainer I should let go uncommented a
>> change which I'm unconvinced doesn't have negative side effects,
>> besides its positive intended behavioral change? The more that here
>> the workaround is rather trivial? As you may imagine, I've run into
>> the situation myself a number of times, without considering this a
>> reason to make any adjustments to the build machinery.
> 
> Jan, I would respectfully request that you take another look at your
> initial response, but put yourself in the shoes of a patch submitter:
> https://lists.xenproject.org/archives/html/xen-devel/2020-03/msg01171.html
> 
> You mention a "simple" workaround... but the workaround I've been using
> is to manually remove the offending .o.d files, one at a time (or at
> least one directory at a time), until the broken build starts working
> again. Is that what you meant? And you really didn't ever consider that
> it should be fixed?

No, the workaround I've been using (after initially doing the expensive
one you describe) was to simply put in an empty file (or perhaps one
with an #error directive) in the place of the deleted one, rebuild, and
all .*.o.d files would have been updated. I might do so even before
fully deleting the file.

> And the substance of the response is basically saying "this is voodoo
> and we can't touch it or unspecified things might break, but I have no
> idea where to tell you to look."
> 
> Looking back I realise that the concern about phony rules overriding
> pattern rules didn't even come from you; your concern was more nebulous
> and unaddressable. It looks like I came up with a straw man and shot
> *that* down in my later analysis (although that wasn't my intent; I
> think the concern about pattern rules really did come from somewhere).
> 
> You asked a question about "why isn't this default behaviour", which is
> kind of a silly question when asking about an option (-MP) that was
> added to GCC almost a decade after the initial -MD behaviour was
> established. Of *course* they didn't retroactively change the default.

I don't see at all why this would be "of course" - if there really
was no undue side effect, why couldn't they?

> Read that message again from the point of view of a contributor.
> Pretend it isn't even me; pretend it's someone attempting to make their
> first, trivial, improvement to the Xen ecosystem.
> 
> I hope you'll understand why my initial reaction was just a
> monosyllabic 'no'.

To be honest - no, I don't. I didn't say "no way". Instead I asked
back to see whether there's more background to this. It is a useful
piece of information to know that -MP post-dates -MD by 10 or more
years. It's still speculation of why a new option was added rather
than making this default behavior, but I feel less afraid of the
change this way than by an implied "this not going to do any harm"
without really being certain why there is a separate option in the
first place (and gcc doc also not saying anything to this effect).

I can certainly follow your sentiment, not the least because
especially in my early days I also frequently got back replies I
didn't like, in various projects. Yet in a case like this one I'm
afraid it is not the reviewer's job to point out the unsafety of
a change, but it's the submitter who has to (sufficiently) prove
that a change won't break anything. Yes, in the typical case,
when there's a recognizable bug, the reviewer would point this
out. But there are cases where there's no obvious bug, but
experience (and, as so often, insufficient documentation) tells
one to be wary of changes of, possibly, any kind.

Jan



* Re: [Xen-devel] [PATCH 1/2] xen/mm: fold PGC_broken into PGC_state bits
  2020-03-19 11:59                                     ` Jan Beulich
@ 2020-03-19 13:54                                       ` David Woodhouse
  2020-03-19 14:46                                         ` Jan Beulich
  0 siblings, 1 reply; 48+ messages in thread
From: David Woodhouse @ 2020-03-19 13:54 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, George Dunlap,
	Jeff Kubascik, Stewart Hildebrand, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 3256 bytes --]

On Thu, 2020-03-19 at 12:59 +0100, Jan Beulich wrote:
> > Read that message again from the point of view of a contributor.
> > Pretend it isn't even me; pretend it's someone attempting to make
> > their first, trivial, improvement to the Xen ecosystem.
> > 
> > I hope you'll understand why my initial reaction was just a
> > monosyllabic 'no'.
> 
> To be honest- no, I don't. I didn't say "no way".

Then you have completely missed my point about how subtly understating
your 'objections' makes no difference at all to the outcome.

But OK, I'll come back to that at the end. You have made your intent
clear, more than once now, and we should take it on board.

>  Instead I asked back to see whether there's more background to this.
> It is a useful piece of information to know that -MP post-dates -MD
> by 10 or more years. It's still speculation of why a new option was
> added rather than making this default behavior, but I feel less
> afraid of the change this way than by an implied "this not going to
> do any harm" without really being certain why there is a separate
> option in the first place (and gcc doc also not saying anything to
> this effect).

It is not my job to make you feel less afraid of change.

> I can certainly follow your sentiment, not the least because
> especially in my early days I also frequently got back replies I
> didn't like, in various projects. Yet in a case like this one I'm
> afraid it is not the reviewer's job to point out the unsafety of
> a change, but it's the submitter who has to (sufficiently) prove
> that a change won't break anything. 

I'm sure you didn't mean it as such, Jan, but FYI that response could
be construed as being fairly patronising. If you were to direct it
towards someone who's even remotely inclined to feeling patronised,
that is. :)

> Yes, in the typical case, when there's a recognizable bug, the
> reviewer would point this out. But there are cases where there's no
> obvious bug, but experience (and, as so often, insufficient
> documentation) tells one to be wary of changes of, possibly, any
> kind.

I find this response to be purely obstructive and unhelpful. Your
response to my patch was basically asking me to prove a negative, and I
find myself surprised and disappointed that you cannot acknowledge
that. I didn't think our viewpoints were really that far apart; perhaps
I was wrong.

If there was an actual bug — or even the suspicion of a bug — I could
understand it. But this is just voodoo "we're too scared to change
things because we don't understand".

We are better than that. You can be better than that.

But I will take on board your comments about understatement and the
fact that you hadn't actually said "no". In future I shall consider
merely ignoring such interjections unless you explicitly state that you
are blocking the acceptance of a patch. Or, I suppose, resorting to the
style of monosyllabic answer that I had originally given in this case.

I trust that maintainers will take that on board too, and that open
"questions" from you in a thread will not be considered sufficient
reason not to merge a patch.

That seems to be what you're saying is your intent, yes? 


[-- Attachment #1.2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]



* Re: [Xen-devel] [PATCH 1/2] xen/mm: fold PGC_broken into PGC_state bits
  2020-03-19 13:54                                       ` David Woodhouse
@ 2020-03-19 14:46                                         ` Jan Beulich
  0 siblings, 0 replies; 48+ messages in thread
From: Jan Beulich @ 2020-03-19 14:46 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, George Dunlap,
	Jeff Kubascik, Stewart Hildebrand, xen-devel

On 19.03.2020 14:54, David Woodhouse wrote:
> On Thu, 2020-03-19 at 12:59 +0100, Jan Beulich wrote:
>>> Read that message again from the point of view of a contributor.
>>> Pretend it isn't even me; pretend it's someone attempting to make
>>> their first, trivial, improvement to the Xen ecosystem.
>>>
>>> I hope you'll understand why my initial reaction was just a
>>> monosyllabic 'no'.
>>
>> To be honest - no, I don't. I didn't say "no way".
> 
> Then you have completely missed my point about how subtly understating
> your 'objections' makes no difference at all to the outcome.
> 
> But OK, I'll come back to that at the end. You have made your intent
> clear, more than once now, and we should take it on board.
> 
>>  Instead I asked back to see whether there's more background to this.
>> It is a useful piece of information to know that -MP post-dates -MD
>> by 10 or more years. It's still speculation of why a new option was
>> added rather than making this default behavior, but I feel less
>> afraid of the change this way than by an implied "this not going to
>> do any harm" without really being certain why there is a separate
>> option in the first place (and gcc doc also not saying anything to
>> this effect).
> 
> It is not my job to make you feel less afraid of change.
> 
>> I can certainly follow your sentiment, not the least because
>> especially in my early days I also frequently got back replies I
>> didn't like, in various projects. Yet in a case like this one I'm
>> afraid it is not the reviewer's job to point out the unsafety of
>> a change, but it's the submitter who has to (sufficiently) prove
>> that a change won't break anything. 
> 
> I'm sure you didn't mean it as such, Jan, but FYI that response could
> be construed as being fairly patronising. If you were to direct it
> towards someone who's even remotely inclined to feeling patronised,
> that is. :)

I certainly didn't mean to, I apologize. (My dictionary gives me
several very different meanings of "patronize", so I'm somewhat
guessing which meaning you intend here.)

>> Yes, in the typical case, when there's a recognizable bug, the
>> reviewer would point this out. But there are cases where there's no
>> obvious bug, but experience (and, as so often, insufficient
>> documentation) tells one to be wary of changes of, possibly, any
>> kind.
> 
> I find this response to be purely obstructive and unhelpful.

I'm sorry if it feels like this to you.

> Your
> response to my patch was basically asking me to prove a negative, and I
> find myself surprised and disappointed that you cannot acknowledge
> that. I didn't think our viewpoints were really that far apart; perhaps
> I was wrong.

I'm certainly willing to acknowledge that I've asked a question
that may be difficult, if possible at all, to answer in a way that
would leave us fully certain in the end. Yet even after all of the
discussion we've had here I still think the question was
appropriate to ask. It continues to be unobvious to me that
non-default behavior of a tool would imply using this behavior is
going to be free of side effects. The historical aspect you've
dug out afterwards is at least a partial explanation which,
seeing that you've got an unconditional and a conditional ack,
is good enough for me to let the change go in, despite still
not being finally convinced of it being free of side effects.

> If there was an actual bug — or even the suspicion of a bug — I could
> understand it. But this is just voodoo "we're too scared to change
> things because we don't understand".

Not just this, but also because things had been broken in subtle
ways in the past. Until we get a better one, we have to live with
the build system being fragile here and there.

> We are better than that. You can be better than that.
> 
> But I will take on board your comments about understatement and the
> fact that you hadn't actually said "no". In future I shall consider
> merely ignoring such interjections unless you explicitly state that you
> are blocking the acceptance of a patch. Or, I suppose, resorting to the
> style of monosyllabic answer that I had originally given in this case.
> 
> I trust that maintainers will take that on board too, and that open
> "questions" from you in a thread will not be considered sufficient
> reason not to merge a patch.
> 
> That seems to be what you're saying is your intent, yes? 

My intent was to get clarification before the patch would go in.
I didn't mean to block it, but I also couldn't see it go in without
such clarification. I'm struggling to see what's bad in asking
whether you/we are certain enough that a change won't have bad
side effects; if there are, we might trade an easy-to-work-around
situation for one hard to recognize and address. Seeing you reply
just "no" seemed a fair answer to me (while I sensed a certain
level of annoyance), albeit not one that would resolve the
question. In anticipation I did include anyone else who might
know right away. Had I known the answer myself, I of course
wouldn't have asked.

Bottom line - when I say "no", I mean "no". When I ask a question
I expect it to be resolved, at least to a reasonable degree. When
I say "I wonder" I indeed mean just that; to me "may I suggest to
consider as an alternative" is simply more words, which may again
be an effect of English not being my native language. And when I
say "ack", I mean "ack". (I also didn't think I made any comments
about understatement; it was you who brought up that [cultural]
aspect.)

I'm afraid as a result of this discussion I'm now more confused
as to finding common grounds than I was before.

Jan



end of thread, other threads:[~2020-03-19 14:46 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-04 15:14 [Xen-devel] [XEN PATCH v2 1/2] Check zone before merging adjacent blocks in heap Stewart Hildebrand
2020-02-04 15:14 ` [Xen-devel] [DO NOT APPLY XEN PATCH v2 2/2] Test case for buddy allocator merging issue Stewart Hildebrand
2020-02-04 15:22 ` [Xen-devel] [XEN PATCH v2 1/2] Check zone before merging adjacent blocks in heap Jan Beulich
2020-02-04 15:37   ` George Dunlap
2020-02-05  9:50     ` David Woodhouse
2020-02-05 10:02       ` Jan Beulich
2020-02-05 10:24         ` David Woodhouse
2020-02-05 10:49           ` Jan Beulich
2020-02-05 11:23             ` David Woodhouse
2020-02-05 13:37               ` Jan Beulich
2020-02-05 14:12                 ` David Woodhouse
2020-02-07 15:49                   ` David Woodhouse
2020-02-07 15:57                     ` [Xen-devel] [PATCH 1/2] xen/mm: fold PGC_broken into PGC_state bits David Woodhouse
2020-02-07 20:27                       ` Julien Grall
2020-02-09 13:22                         ` David Woodhouse
2020-02-09 17:59                           ` Julien Grall
2020-03-17 21:39                         ` David Woodhouse
2020-02-20 11:10                       ` Jan Beulich
2020-03-17 21:52                         ` David Woodhouse
2020-03-18  9:56                           ` Jan Beulich
2020-03-18 12:31                             ` Julien Grall
2020-03-18 13:23                               ` Jan Beulich
2020-03-18 17:13                               ` David Woodhouse
2020-03-19  8:49                                 ` Jan Beulich
2020-03-19 10:26                                   ` David Woodhouse
2020-03-19 11:59                                     ` Jan Beulich
2020-03-19 13:54                                       ` David Woodhouse
2020-03-19 14:46                                         ` Jan Beulich
2020-02-07 15:57                     ` [Xen-devel] [PATCH 2/2] xen/mm: Introduce PG_state_uninitialised David Woodhouse
2020-02-07 16:30                       ` Xia, Hongyan
2020-02-07 16:32                         ` David Woodhouse
2020-02-07 16:40                           ` Xia, Hongyan
2020-02-07 17:06                             ` David Woodhouse
2020-02-07 18:04                               ` David Woodhouse
2020-02-20 11:59                                 ` Jan Beulich
2020-02-20 13:27                                   ` Julien Grall
2020-03-17 22:15                                   ` David Woodhouse
2020-03-18  8:53                                     ` Paul Durrant
2020-03-18 10:10                                       ` Jan Beulich
2020-03-18 10:41                                         ` Paul Durrant
2020-03-18 11:12                                           ` Jan Beulich
2020-03-18 10:03                                     ` Jan Beulich
2020-03-18 12:11                                       ` David Woodhouse
2020-03-18 13:27                                         ` Jan Beulich
2020-02-05 10:22       ` [Xen-devel] [XEN PATCH v2 1/2] Check zone before merging adjacent blocks in heap Julien Grall
2020-02-05 10:32         ` David Woodhouse
2020-02-05 11:36         ` David Woodhouse
2020-02-04 15:37   ` Stewart Hildebrand
