All of lore.kernel.org
 help / color / mirror / Atom feed
* Add 'info pg' command to monitor
@ 2024-04-15 16:08 Don Porter
  2024-04-15 16:08 ` [PATCH 1/2] monitor: Implement a generic x86 page table iterator Don Porter
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Don Porter @ 2024-04-15 16:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, dave

Hi all,

I am a CS professor (and, newly, a second-time contributor). I have
been using qemu in my courses for over a decade, especially a course
that asks students to write major pieces of an OS kernel from starter
code.

I have some patches, originally from Austin Clements at MIT, that I
have found useful over the years and that may be useful to others.  It
would also be nice not to have to build a custom qemu each semester.  I
have cleared upstreaming these with Austin, the original author.

This patch set adds an 'info pg' command to the monitor, which prints
a nicer view of the page tables.  A project in my graduate OS course
involves implementing x86 page table support, and my students have
found this helpful for debugging.

Thank you in advance for your time,
Don Porter




^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/2] monitor: Implement a generic x86 page table iterator
  2024-04-15 16:08 Add 'info pg' command to monitor Don Porter
@ 2024-04-15 16:08 ` Don Porter
  2024-04-15 16:08 ` [PATCH 2/2] monitor: Add an "info pg" command that prints the current page tables Don Porter
  2024-04-15 16:37 ` Add 'info pg' command to monitor Peter Maydell
  2 siblings, 0 replies; 13+ messages in thread
From: Don Porter @ 2024-04-15 16:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, dave, Austin Clements, Geoffrey Thomas, Don Porter

From: Austin Clements <aclements@csail.mit.edu>

This iterator provides a way to traverse 32-bit, PAE, and 64-bit page
tables by abstracting them as n-ary trees.  A struct describes the
full range of x86 page table layouts and the iterator builds on this
to provide a "successor" function for efficiently traversing the page
table tree.

This code is currently unused, but provides the groundwork for the
following "info pg" patch.  It could also be used to unify and
simplify the implementations of "info mem" and "info tlb".

Signed-off-by: Austin Clements <aclements@csail.mit.edu>
[geofft@ldpreload.com: Rebased on top of 2.9.0]
Signed-off-by: Geoffrey Thomas <geofft@ldpreload.com>
Signed-off-by: Don Porter <porter@cs.unc.edu>
---
 target/i386/monitor.c | 149 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 149 insertions(+)

diff --git a/target/i386/monitor.c b/target/i386/monitor.c
index 3a281dab02..7924de6520 100644
--- a/target/i386/monitor.c
+++ b/target/i386/monitor.c
@@ -34,6 +34,155 @@
 #include "qapi/qapi-commands-misc-target.h"
 #include "qapi/qapi-commands-misc.h"
 
+/**
+ * PTLayout describes the layout of an x86 page table in enough detail
+ * to fully decode up to a 4-level 64-bit page table tree.
+ */
+typedef struct PTLayout {
+    int levels, entsize;
+    int entries[4];             /* Entries in each table level */
+    int shift[4];               /* VA bit shift each each level */
+    bool pse[4];                /* Whether PSE bit is valid */
+    const char *names[4];
+    int vaw, paw;               /* VA and PA width in characters */
+} PTLayout;
+
+/**
+ * PTIter provides a generic way to traverse and decode an x86 page
+ * table tree.
+ */
+typedef struct PTIter {
+    const PTLayout *layout;
+    bool pse;                   /* PSE enabled */
+    int level;                  /* Current level */
+    int i[4];                   /* Index at each level */
+    hwaddr base[4];             /* Physical base pointer */
+
+    uint64_t ent;               /* Current entry */
+    bool present, leaf;
+    target_ulong va;
+    hwaddr pa;
+    target_ulong  size;
+} PTIter;
+
+static bool ptiter_succ(PTIter *it);
+
+/**
+ * Initialize a PTIter to point to the first entry of the page table's
+ * top level.  On failure, prints a message to mon and returns false.
+ */
+static bool
+__attribute__ ((unused))
+ptiter_init(Monitor *mon, PTIter *it)
+{
+    static const PTLayout l32 = {
+        2, 4, {1024, 1024}, {22, 12}, {1, 0}, {"PDE", "PTE"}, 8, 8
+    };
+    static const PTLayout lpae = {
+        3, 8, {4, 512, 512}, {30, 21, 12}, {0, 1, 0},
+        {"PDP", "PDE", "PTE"}, 8, 13
+    };
+#ifdef TARGET_X86_64
+    static const PTLayout l64 = {
+        4, 8, {512, 512, 512, 512}, {39, 30, 21, 12}, {0, 1, 1, 0},
+        {"PML4", "PDP", "PDE", "PTE"}, 12, 13
+    };
+#endif
+    CPUArchState *env;
+
+    env = mon_get_cpu_env(mon);
+
+    if (!(env->cr[0] & CR0_PG_MASK)) {
+        monitor_printf(mon, "PG disabled\n");
+        return false;
+    }
+
+    memset(it, 0, sizeof(*it));
+    if (env->cr[4] & CR4_PAE_MASK) {
+#ifdef TARGET_X86_64
+        if (env->hflags & HF_LMA_MASK) {
+            it->layout = &l64;
+            it->base[0] = env->cr[3] & 0x3fffffffff000ULL;
+        } else
+#endif
+        {
+            it->layout = &lpae;
+            it->base[0] = env->cr[3] & ~0x1f;
+        }
+        it->pse = true;
+    } else {
+        it->layout = &l32;
+        it->base[0] = env->cr[3] & ~0xfff;
+        it->pse = (env->cr[4] & CR4_PSE_MASK);
+    }
+
+    /* Trick ptiter_succ into doing the hard initialization. */
+    it->i[0] = -1;
+    it->leaf = true;
+    ptiter_succ(it);
+    return true;
+}
+
+/**
+ * Move a PTIter to the successor of the current entry.  Specifically:
+ * if the iterator points to a leaf, move to its next sibling, or to
+ * the next sibling of a parent if it has no more siblings.  If the
+ * iterator points to a non-leaf, move to its first child.  If there
+ * is no successor, return false.
+ *
+ * Note that the resulting entry may not be marked present, though
+ * non-present entries are always leafs (within a page
+ * table/directory/etc, this will always visit all entries).
+ */
+static bool ptiter_succ(PTIter *it)
+{
+    int i, l, entsize;
+    uint64_t ent64;
+    uint32_t ent32;
+    bool large;
+
+    if (it->level < 0) {
+        return false;
+    } else if (!it->leaf) {
+        /* Move to this entry's first child */
+        it->level++;
+        it->base[it->level] = it->pa;
+        it->i[it->level] = 0;
+    } else {
+        /* Move forward and, if we hit the end of this level, up */
+        while (++it->i[it->level] == it->layout->entries[it->level]) {
+            if (it->level-- == 0) {
+                /* We're out of page table */
+                return false;
+            }
+        }
+    }
+
+    /* Read this entry */
+    l = it->level;
+    entsize = it->layout->entsize;
+    cpu_physical_memory_read(it->base[l] + it->i[l] * entsize,
+                             entsize == 4 ? (void *)&ent32 : (void *)&ent64,
+                             entsize);
+    if (entsize == 4) {
+        it->ent = le32_to_cpu(ent32);
+    } else {
+        it->ent = le64_to_cpu(ent64);
+    }
+
+    /* Decode the entry */
+    large = (it->pse && it->layout->pse[l] && (it->ent & PG_PSE_MASK));
+    it->present = it->ent & PG_PRESENT_MASK;
+    it->leaf = (large || !it->present || ((l + 1) == it->layout->levels));
+    it->va = 0;
+    for (i = 0; i <= l; i++) {
+        it->va |= (uint64_t)it->i[i] << it->layout->shift[i];
+    }
+    it->pa = it->ent & (large ? 0x3ffffffffc000ULL : 0x3fffffffff000ULL);
+    it->size = 1 << it->layout->shift[l];
+    return true;
+}
+
 /* Perform linear address sign extension */
 static hwaddr addr_canonical(CPUArchState *env, hwaddr addr)
 {
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/2] monitor: Add an "info pg" command that prints the current page tables
  2024-04-15 16:08 Add 'info pg' command to monitor Don Porter
  2024-04-15 16:08 ` [PATCH 1/2] monitor: Implement a generic x86 page table iterator Don Porter
@ 2024-04-15 16:08 ` Don Porter
  2024-04-15 16:37 ` Add 'info pg' command to monitor Peter Maydell
  2 siblings, 0 replies; 13+ messages in thread
From: Don Porter @ 2024-04-15 16:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, dave, Austin Clements, Geoffrey Thomas, Don Porter

From: Austin Clements <aclements@csail.mit.edu>

The new "info pg" monitor command prints the current page table,
including virtual address ranges, flag bits, and snippets of physical
page numbers.  Completely filled regions of the page table with
compatible flags are "folded", with the result that the complete
output for a freshly booted x86-64 Linux VM can fit in a single
terminal window.  The output looks like this:

VPN range             Entry         Flags            Physical page
[7f0000000-7f0000000] PML4[0fe]     ---DA--UWP
  [7f28c0000-7f28fffff]  PDP[0a3]     ---DA--UWP
    [7f28c4600-7f28c47ff]  PDE[023]     ---DA--UWP
      [7f28c4655-7f28c4656]  PTE[055-056] X--D---U-P 0000007f14-0000007f15
      [7f28c465b-7f28c465b]  PTE[05b]     ----A--U-P 0000001cfc
...
[ff8000000-ff8000000] PML4[1ff]     ---DA--UWP
  [ffff80000-ffffbffff]  PDP[1fe]     ---DA---WP
    [ffff81000-ffff81dff]  PDE[008-00e] -GSDA---WP 0000001000-0000001dff
  [ffffc0000-fffffffff]  PDP[1ff]     ---DA--UWP
    [ffffff400-ffffff5ff]  PDE[1fa]     ---DA--UWP
      [ffffff5fb-ffffff5fc]  PTE[1fb-1fc] XG-DACT-WP 00000fec00 00000fee00
    [ffffff600-ffffff7ff]  PDE[1fb]     ---DA--UWP
      [ffffff600-ffffff600]  PTE[000]     -G-DA--U-P 0000001467

Signed-off-by: Austin Clements <aclements@csail.mit.edu>
[geofft@ldpreload.com: Rebased on top of 2.9.0]
Signed-off-by: Geoffrey Thomas <geofft@ldpreload.com>
Signed-off-by: Don Porter <porter@cs.unc.edu>
---
 hmp-commands-info.hx         |  15 +++
 include/monitor/hmp-target.h |   2 +
 target/i386/monitor.c        | 179 ++++++++++++++++++++++++++++++++++-
 3 files changed, 195 insertions(+), 1 deletion(-)

diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
index ad1b1306e3..ae7de74041 100644
--- a/hmp-commands-info.hx
+++ b/hmp-commands-info.hx
@@ -239,6 +239,21 @@ SRST
     Show the active virtual memory mappings.
 ERST
 
+#if defined(TARGET_I386)
+    {
+        .name       = "pg",
+        .args_type  = "",
+        .params     = "",
+        .help       = "show the page table",
+        .cmd        = hmp_info_pg,
+    },
+#endif
+
+SRST                                                                               |
+  ``info pg``                                                                      |
+    Show the active page table.                                                    |
+ERST
+
     {
         .name       = "mtree",
         .args_type  = "flatview:-f,dispatch_tree:-d,owner:-o,disabled:-D",
diff --git a/include/monitor/hmp-target.h b/include/monitor/hmp-target.h
index d78e979f05..68f58d2a31 100644
--- a/include/monitor/hmp-target.h
+++ b/include/monitor/hmp-target.h
@@ -48,6 +48,8 @@ void hmp_info_mem(Monitor *mon, const QDict *qdict);
 void hmp_info_tlb(Monitor *mon, const QDict *qdict);
 void hmp_mce(Monitor *mon, const QDict *qdict);
 void hmp_info_local_apic(Monitor *mon, const QDict *qdict);
+void hmp_info_io_apic(Monitor *mon, const QDict *qdict);
+void hmp_info_pg(Monitor *mon, const QDict *qdict);
 void hmp_info_sev(Monitor *mon, const QDict *qdict);
 void hmp_info_sgx(Monitor *mon, const QDict *qdict);
 void hmp_info_via(Monitor *mon, const QDict *qdict);
diff --git a/target/i386/monitor.c b/target/i386/monitor.c
index 7924de6520..4cf39a4140 100644
--- a/target/i386/monitor.c
+++ b/target/i386/monitor.c
@@ -72,7 +72,6 @@ static bool ptiter_succ(PTIter *it);
  * top level.  On failure, prints a message to mon and returns false.
  */
 static bool
-__attribute__ ((unused))
 ptiter_init(Monitor *mon, PTIter *it)
 {
     static const PTLayout l32 = {
@@ -88,10 +87,16 @@ ptiter_init(Monitor *mon, PTIter *it)
         {"PML4", "PDP", "PDE", "PTE"}, 12, 13
     };
 #endif
+
     CPUArchState *env;
 
     env = mon_get_cpu_env(mon);
 
+    if (!env) {
+        monitor_printf(mon, "No CPU available\n");
+        return false;
+    }
+
     if (!(env->cr[0] & CR0_PG_MASK)) {
         monitor_printf(mon, "PG disabled\n");
         return false;
@@ -200,6 +205,178 @@ static hwaddr addr_canonical(CPUArchState *env, hwaddr addr)
     return addr;
 }
 
+/**
+ * Return true if the page tree rooted at iter is complete and
+ * compatible with compat.  last will be filled with the last entry at
+ * each level.  If false, does not change iter and last can be filled
+ * with anything; if true, returns with iter at the next entry on the
+ * same level, or the next parent entry if iter is on the last entry
+ * of this level.
+ */
+static bool pg_complete(PTIter *root, const PTIter compat[], PTIter last[])
+{
+    PTIter iter = *root;
+
+    if ((root->ent & 0xfff) != (compat[root->level].ent & 0xfff)) {
+        return false;
+    }
+
+    last[root->level] = *root;
+    ptiter_succ(&iter);
+    if (!root->leaf) {
+        /* Are all of the direct children of root complete? */
+        while (iter.level == root->level + 1) {
+            if (!pg_complete(&iter, compat, last)) {
+                return false;
+            }
+        }
+    }
+    assert(iter.level <= root->level);
+    assert(iter.level == root->level ?
+           iter.i[iter.level] == root->i[iter.level] + 1 : 1);
+    *root = iter;
+    return true;
+}
+
+static char *pg_bits(uint64_t ent)
+{
+    static char buf[32];
+    sprintf(buf, "%c%c%c%c%c%c%c%c%c%c",
+            /* TODO: Some of these change depending on level */
+            ent & PG_NX_MASK ? 'X' : '-',
+            ent & PG_GLOBAL_MASK ? 'G' : '-',
+            ent & PG_PSE_MASK ? 'S' : '-',
+            ent & PG_DIRTY_MASK ? 'D' : '-',
+            ent & PG_ACCESSED_MASK ? 'A' : '-',
+            ent & PG_PCD_MASK ? 'C' : '-',
+            ent & PG_PWT_MASK ? 'T' : '-',
+            ent & PG_USER_MASK ? 'U' : '-',
+            ent & PG_RW_MASK ? 'W' : '-',
+            ent & PG_PRESENT_MASK ? 'P' : '-');
+    return buf;
+}
+
+static void pg_print(Monitor *mon, PTIter *s, PTIter *l)
+{
+    int lev = s->level;
+    char buf[128];
+    char *pos = buf, *end = buf + sizeof(buf);
+
+    /* VFN range */
+    pos += sprintf(pos, "%*s[%0*"PRIx64"-%0*"PRIx64"] ",
+                   lev * 2, "",
+                   s->layout->vaw - 3, (uint64_t)s->va >> 12,
+                   s->layout->vaw - 3, ((uint64_t)l->va + l->size - 1) >> 12);
+
+    /* Slot */
+    if (s->i[lev] == l->i[lev]) {
+        pos += sprintf(pos, "%4s[%03x]    ",
+                       s->layout->names[lev], s->i[lev]);
+    } else {
+        pos += sprintf(pos, "%4s[%03x-%03x]",
+                       s->layout->names[lev], s->i[lev], l->i[lev]);
+    }
+
+    /* Flags */
+    pos += sprintf(pos, " %s", pg_bits(s->ent));
+
+    /* Range-compressed PFN's */
+    if (s->leaf) {
+        PTIter iter = *s;
+        int i = 0;
+        bool exhausted = false;
+        while (!exhausted && i++ < 10) {
+            hwaddr pas = iter.pa, pae = iter.pa + iter.size;
+            while (ptiter_succ(&iter) && iter.va <= l->va) {
+                if (iter.level == s->level) {
+                    if (iter.pa == pae) {
+                        pae = iter.pa + iter.size;
+                    } else {
+                        goto print;
+                    }
+                }
+            }
+            exhausted = true;
+
+print:
+            if (pas >> 12 == (pae - 1) >> 12) {
+                pos += snprintf(pos, end - pos, " %0*"PRIx64,
+                                s->layout->paw - 3, (uint64_t)pas >> 12);
+            } else {
+                pos += snprintf(pos, end - pos, " %0*"PRIx64"-%0*"PRIx64,
+                                s->layout->paw - 3, (uint64_t)pas >> 12,
+                                s->layout->paw - 3, (uint64_t)(pae - 1) >> 12);
+            }
+            pos = MIN(pos, end);
+        }
+    }
+
+    /* Trim line to fit screen */
+    if (pos - buf > 79) {
+        strcpy(buf + 77, "..");
+    }
+
+    monitor_printf(mon, "%s\n", buf);
+}
+
+void hmp_info_pg(Monitor *mon, const QDict *qdict)
+{
+    PTIter iter;
+
+    if (!ptiter_init(mon, &iter)) {
+        return;
+    }
+
+    /* Header line */
+    monitor_printf(mon, "%-*s %-13s %-10s %*s%s\n",
+                   3 + 2 * (iter.layout->vaw - 3), "VPN range",
+                   "Entry", "Flags",
+                   2 * (iter.layout->levels - 1), "", "Physical page");
+
+    while (iter.level >= 0) {
+        int i, startLevel, maxLevel;
+        PTIter start[4], last[4], nlast[4];
+        bool compressed = false;
+
+        /* Skip to the next present entry */
+        do { } while (!iter.present && ptiter_succ(&iter));
+        if (iter.level < 0) {
+            break;
+        }
+
+        /*
+         * Find a run of complete entries starting at iter and staying
+         * on the same level.
+         */
+        startLevel = iter.level;
+        memset(start, 0, sizeof(start));
+        do {
+            start[iter.level] = iter;
+        } while (!iter.leaf && ptiter_succ(&iter));
+        maxLevel = iter.level;
+        iter = start[startLevel];
+        while (iter.level == startLevel && pg_complete(&iter, start, nlast)) {
+            compressed = true;
+            memcpy(last, nlast, sizeof(last));
+        }
+
+        if (compressed) {
+            /*
+             * We found a run we can show as a range spanning
+             * [startLevel, maxLevel].  start stores the first entry
+             * at each level and last stores the last entry.
+             */
+            for (i = startLevel; i <= maxLevel; i++) {
+                pg_print(mon, &start[i], &last[i]);
+            }
+        } else {
+            /* No luck finding a range.  Iter hasn't moved. */
+            pg_print(mon, &iter, &iter);
+            ptiter_succ(&iter);
+        }
+    }
+}
+
 static void print_pte(Monitor *mon, CPUArchState *env, hwaddr addr,
                       hwaddr pte, hwaddr mask)
 {
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: Add 'info pg' command to monitor
  2024-04-15 16:08 Add 'info pg' command to monitor Don Porter
  2024-04-15 16:08 ` [PATCH 1/2] monitor: Implement a generic x86 page table iterator Don Porter
  2024-04-15 16:08 ` [PATCH 2/2] monitor: Add an "info pg" command that prints the current page tables Don Porter
@ 2024-04-15 16:37 ` Peter Maydell
  2024-04-16 16:53   ` Don Porter
  2 siblings, 1 reply; 13+ messages in thread
From: Peter Maydell @ 2024-04-15 16:37 UTC (permalink / raw)
  To: Don Porter; +Cc: qemu-devel, pbonzini, dave

On Mon, 15 Apr 2024 at 17:09, Don Porter <porter@cs.unc.edu> wrote:
> I am a CS professor (and, newly, a second-time contributor). I have
> been using qemu in my courses for over a decade, especially a course
> that asks students to write major pieces of an OS kernel from starter
> code.
>
> I have some patches, originally from Austin Clements at MIT, that I
> have found useful over the years and that may be useful to others.  It
> would also be nice not to have to build a custom qemu each semester.  I
> have cleared upstreaming these with Austin, the original author.
>
> This patch set adds an 'info pg' command to the monitor, which prints
> a nicer view of the page tables.  A project in my graduate OS course
> involves implementing x86 page table support, and my students have
> found this helpful for debugging.

So, my issue with this is that it's x86 specific, and it adds
yet another monitor command that is doing "show some kind of debug
info related to the guest page tables", along with "info mem"
and "info tlb". Plus it is yet another lump of code that's
walking the guest page tables and interpreting them.

What I'd really like to see is some infrastructure that is
at least somewhat guest-architecture-agnostic, so we can
minimise what a guest architecture needs to implement (and
then make providing that mandatory).

The other thing I'd like to see is perhaps some investigation of
whether there's any way to implement something useful by
using/extending the existing get_phys_page_attrs_debug() and
similar functions, so that you don't have to write one lot
of page-table-walking code for QEMU to use when it's executing
guest code and a separate lot (that's bound to get out of
sync or not support new functionality/changes) that's only
handling these monitor debug commands. There's a lot of
complexity in figuring out things like permissions in a
modern architecture...

thanks
-- PMM


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Add 'info pg' command to monitor
  2024-04-15 16:37 ` Add 'info pg' command to monitor Peter Maydell
@ 2024-04-16 16:53   ` Don Porter
  2024-04-16 17:03     ` Peter Maydell
  0 siblings, 1 reply; 13+ messages in thread
From: Don Porter @ 2024-04-16 16:53 UTC (permalink / raw)
  To: Peter Maydell; +Cc: qemu-devel, pbonzini, dave

On 4/15/24 12:37, Peter Maydell wrote:
> On Mon, 15 Apr 2024 at 17:09, Don Porter <porter@cs.unc.edu> wrote:
>
> This patch set adds an 'info pg' command to the monitor, which prints
> a nicer view of the page tables.  A project in my graduate OS course
> involves implementing x86 page table support, and my students have
> found this helpful for debugging.
> So, my issue with this is that it's x86 specific, and it adds
> yet another monitor command that is doing "show some kind of debug
> info related to the guest page tables", along with "info mem"
> and "info tlb". Plus it is yet another lump of code that's
> walking the guest page tables and interpreting them.
>
> What I'd really like to see is some infrastructure that is
> at least somewhat guest-architecture-agnostic, so we can
> minimise what a guest architecture needs to implement (and
> then make providing that mandatory).
>
> The other thing I'd like to see is perhaps some investigation of
> whether there's any way to implement something useful by
> using/extending the existing get_phys_page_attrs_debug() and
> similar functions, so that you don't have to write one lot
> of page-table-walking code for QEMU to use when it's executing
> guest code and a separate lot (that's bound to get out of
> sync or not support new functionality/changes) that's only
> handling these monitor debug commands. There's a lot of
> complexity in figuring out things like permissions in a
> modern architecture...
>
> thanks
> -- PMM
>   

Thank you for the feedback!

There is still a lot I am learning about the code base, but it seems 
that qemu_get_guest_memory_mapping() does most of what one would need.  
It currently only returns the "leaves" of the page table tree in a list.

What if I extend this function with an optional argument to either
1) return the interior nodes of the page table in additional lists (and 
then parse+print in the monitor code), or
2) inline the monitor printing in the arch-specific hook, and pass a 
flag to get_guest_memory_mapping() that turns on/off the statements that 
pretty print the page tables?

It looks like most CPUs implement this function as part of checkpointing.

Thoughts?

Don



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Add 'info pg' command to monitor
  2024-04-16 16:53   ` Don Porter
@ 2024-04-16 17:03     ` Peter Maydell
  2024-04-16 18:11       ` Don Porter
  0 siblings, 1 reply; 13+ messages in thread
From: Peter Maydell @ 2024-04-16 17:03 UTC (permalink / raw)
  To: Don Porter; +Cc: qemu-devel, pbonzini, dave

On Tue, 16 Apr 2024 at 17:53, Don Porter <porter@cs.unc.edu> wrote:
> There is still a lot I am learning about the code base, but it seems
> that qemu_get_guest_memory_mapping() does most of what one would need.
> It currently only returns the "leaves" of the page table tree in a list.
>
> What if I extend this function with an optional argument to either
> 1) return the interior nodes of the page table in additional lists (and
> then parse+print in the monitor code), or
> 2) inline the monitor printing in the arch-specific hook, and pass a
> flag to get_guest_memory_mapping() that turns on/off the statements that
> pretty print the page tables?
>
> It looks like most CPUs implement this function as part of checkpointing.

As far as I can see only x86 implements the get_memory_mapping
function, so once again somebody has added some bit of
functionality that does a walk of the page tables that is
x86 only and that shares no code with any of the other
page table walking code :-(

thanks
-- PMM


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Add 'info pg' command to monitor
  2024-04-16 17:03     ` Peter Maydell
@ 2024-04-16 18:11       ` Don Porter
  2024-04-17  8:30         ` Nadav Amit
                           ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Don Porter @ 2024-04-16 18:11 UTC (permalink / raw)
  To: Peter Maydell; +Cc: qemu-devel, pbonzini, dave

On 4/16/24 13:03, Peter Maydell wrote:
> On Tue, 16 Apr 2024 at 17:53, Don Porter <porter@cs.unc.edu> wrote:
>> There is still a lot I am learning about the code base, but it seems
>> that qemu_get_guest_memory_mapping() does most of what one would need.
>> It currently only returns the "leaves" of the page table tree in a list.
>>
>> What if I extend this function with an optional argument to either
>> 1) return the interior nodes of the page table in additional lists (and
>> then parse+print in the monitor code), or
>> 2) inline the monitor printing in the arch-specific hook, and pass a
>> flag to get_guest_memory_mapping() that turns on/off the statements that
>> pretty print the page tables?
>>
>> It looks like most CPUs implement this function as part of checkpointing.
> As far as I can see only x86 implements the get_memory_mapping
> function, so once again somebody has added some bit of
> functionality that does a walk of the page tables that is
> x86 only and that shares no code with any of the other
> page table walking code :-(

My mistake - get_memory_mappings() is only implemented in x86.

In doing some searching of the code, many architectures implement 
mmu_translate() and
get_physical_address() functions, but they are not standardized. I also 
see your larger point
about replicating page walking code in x86.

I imagine you have something in mind that abstracts things like the 
height of the radix tree,
entries per node, checking permissions, printing the contents, etc.

Perhaps I should start by trying to merge the x86 page walking code into 
one set of common
helper functions, get more feedback (perhaps on a new patch thread?), 
and then consider
how to abstract across architectures after getting feedback on this?

In looking at x86 code, I see the following places where there is page 
table walking code to
potentially merge:

* target/i386/monitor.c - existing info commands

* target/i386/helper.c - get_phys_page_attrs_debug

* target/i386/arch_memory_mapping.c - implements get_memory_mapping

* tcg/sysemu/excp_helper.c: implements mmu_translate() and 
get_physical_address()

Thanks,

Don



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Add 'info pg' command to monitor
  2024-04-16 18:11       ` Don Porter
@ 2024-04-17  8:30         ` Nadav Amit
  2024-04-17 20:05           ` Nadav Amit
  2024-04-17 21:03         ` Dr. David Alan Gilbert
  2024-04-19 14:47         ` Peter Maydell
  2 siblings, 1 reply; 13+ messages in thread
From: Nadav Amit @ 2024-04-17  8:30 UTC (permalink / raw)
  To: Don Porter; +Cc: Peter Maydell, qemu-devel, Paolo Bonzini, dave



> On 16 Apr 2024, at 21:11, Don Porter <porter@cs.unc.edu> wrote:
> 
> On 4/16/24 13:03, Peter Maydell wrote:
>> On Tue, 16 Apr 2024 at 17:53, Don Porter <porter@cs.unc.edu> wrote:
>>> There is still a lot I am learning about the code base, but it seems
>>> that qemu_get_guest_memory_mapping() does most of what one would need.
>>> It currently only returns the "leaves" of the page table tree in a list.
>>> 
>>> What if I extend this function with an optional argument to either
>>> 1) return the interior nodes of the page table in additional lists (and
>>> then parse+print in the monitor code), or
>>> 2) inline the monitor printing in the arch-specific hook, and pass a
>>> flag to get_guest_memory_mapping() that turns on/off the statements that
>>> pretty print the page tables?
>>> 
>>> It looks like most CPUs implement this function as part of checkpointing.
>> As far as I can see only x86 implements the get_memory_mapping
>> function, so once again somebody has added some bit of
>> functionality that does a walk of the page tables that is
>> x86 only and that shares no code with any of the other
>> page table walking code :-(
> 
> My mistake - get_memory_mappings() is only implemented in x86.
> 

Hi Don,

Your email popped up, so I just drop my 2 cents.

If you are only interested in x86, you can just build a gdb script that would
parse the page tables and present the data. It might be less efficient, but
easier to maintain.

Regards,
Nadav




^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Add 'info pg' command to monitor
  2024-04-17  8:30         ` Nadav Amit
@ 2024-04-17 20:05           ` Nadav Amit
  0 siblings, 0 replies; 13+ messages in thread
From: Nadav Amit @ 2024-04-17 20:05 UTC (permalink / raw)
  To: Don Porter; +Cc: Peter Maydell, qemu-devel, Paolo Bonzini, dave



> On 17 Apr 2024, at 11:30, Nadav Amit <nadav.amit@gmail.com> wrote:
> 
> 
> 
>> On 16 Apr 2024, at 21:11, Don Porter <porter@cs.unc.edu> wrote:
>> 
>> On 4/16/24 13:03, Peter Maydell wrote:
>>> On Tue, 16 Apr 2024 at 17:53, Don Porter <porter@cs.unc.edu> wrote:
>>>> There is still a lot I am learning about the code base, but it seems
>>>> that qemu_get_guest_memory_mapping() does most of what one would need.
>>>> It currently only returns the "leaves" of the page table tree in a list.
>>>> 
>>>> What if I extend this function with an optional argument to either
>>>> 1) return the interior nodes of the page table in additional lists (and
>>>> then parse+print in the monitor code), or
>>>> 2) inline the monitor printing in the arch-specific hook, and pass a
>>>> flag to get_guest_memory_mapping() that turns on/off the statements that
>>>> pretty print the page tables?
>>>> 
>>>> It looks like most CPUs implement this function as part of checkpointing.
>>> As far as I can see only x86 implements the get_memory_mapping
>>> function, so once again somebody has added some bit of
>>> functionality that does a walk of the page tables that is
>>> x86 only and that shares no code with any of the other
>>> page table walking code :-(
>> 
>> My mistake - get_memory_mappings() is only implemented in x86.
>> 
> 
> Hi Don,
> 
> Your email popped up, so I just drop my 2 cents.
> 
> If you are only interested in x86, you can just build a gdb script that would
> parse the page tables and present the data. It might be less efficient, but
> easier to maintain.

Actually, I think I was wrong. It might not be easily done with GDB, but I did
something similar with QMP.

You can have a look at some related code I once wrote for similar purpose:
https://nadav.amit.zone/downloads/pti_test.py



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Add 'info pg' command to monitor
  2024-04-16 18:11       ` Don Porter
  2024-04-17  8:30         ` Nadav Amit
@ 2024-04-17 21:03         ` Dr. David Alan Gilbert
  2024-04-17 21:26           ` Richard Henderson
  2024-04-19 14:47         ` Peter Maydell
  2 siblings, 1 reply; 13+ messages in thread
From: Dr. David Alan Gilbert @ 2024-04-17 21:03 UTC (permalink / raw)
  To: Don Porter; +Cc: Peter Maydell, qemu-devel, pbonzini

* Don Porter (porter@cs.unc.edu) wrote:
> On 4/16/24 13:03, Peter Maydell wrote:
> > On Tue, 16 Apr 2024 at 17:53, Don Porter <porter@cs.unc.edu> wrote:
> > > There is still a lot I am learning about the code base, but it seems
> > > that qemu_get_guest_memory_mapping() does most of what one would need.
> > > It currently only returns the "leaves" of the page table tree in a list.
> > > 
> > > What if I extend this function with an optional argument to either
> > > 1) return the interior nodes of the page table in additional lists (and
> > > then parse+print in the monitor code), or
> > > 2) inline the monitor printing in the arch-specific hook, and pass a
> > > flag to get_guest_memory_mapping() that turns on/off the statements that
> > > pretty print the page tables?
> > > 
> > > It looks like most CPUs implement this function as part of checkpointing.
> > As far as I can see only x86 implements the get_memory_mapping
> > function, so once again somebody has added some bit of
> > functionality that does a walk of the page tables that is
> > x86 only and that shares no code with any of the other
> > page table walking code :-(
> 
> My mistake - get_memory_mappings() is only implemented in x86.
> 
> In doing some searching of the code, many architectures implement
> mmu_translate() and
> get_physical_address() functions, but they are not standardized. I also see
> your larger point
> about replicating page walking code in x86.
> 
> I imagine you have something in mind that abstracts things like the height
> of the radix tree,
> entries per node, checking permissions, printing the contents, etc.
> 
> Perhaps I should start by trying to merge the x86 page walking code into one
> set of common
> helper functions, get more feedback (perhaps on a new patch thread?), and
> then consider
> how to abstract across architectures after getting feedback on this?
> 
> In looking at x86 code, I see the following places where there is page table
> walking code to
> potentially merge:
> 
> * target/i386/monitor.c - existing info commands
> 
> * target/i386/helper.c - get_phys_page_attrs_debug
> 
> * target/i386/arch_memory_mapping.c - implements get_memory_mapping
> 
> * tcg/sysemu/excp_helper.c: implements mmu_translate() and
> get_physical_address()

One thing to keep in mind (although I don't know the x86 code) is that
you want the monitor command not to change any state, nor to fail if
there's a particularly screwy page table; so no flagging exceptions
or flagging accessed bits or changing the state of the tcg.

Dave

> Thanks,
> 
> Don
> 
-- 
 -----Open up your eyes, open up your mind, open up your code -------   
/ Dr. David Alan Gilbert    |       Running GNU/Linux       | Happy  \ 
\        dave @ treblig.org |                               | In Hex /
 \ _________________________|_____ http://www.treblig.org   |_______/


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Add 'info pg' command to monitor
  2024-04-17 21:03         ` Dr. David Alan Gilbert
@ 2024-04-17 21:26           ` Richard Henderson
  0 siblings, 0 replies; 13+ messages in thread
From: Richard Henderson @ 2024-04-17 21:26 UTC (permalink / raw)
  To: Dr. David Alan Gilbert, Don Porter; +Cc: Peter Maydell, qemu-devel, pbonzini

On 4/17/24 14:03, Dr. David Alan Gilbert wrote:
>> In looking at x86 code, I see the following places where there is page table
>> walking code to
>> potentially merge:
>>
>> * target/i386/monitor.c - existing info commands
>>
>> * target/i386/helper.c - get_phys_page_attrs_debug
>>
>> * target/i386/arch_memory_mapping.c - implements get_memory_mapping
>>
>> * tcg/sysemu/excp_helper.c: implements mmu_translate() and
>> get_physical_address()
> 
> One thing to keep in mind (although I don't know the x86 code) is that
> you want the monitor command not to change any state, nor to fail if
> there's a particularly screwy page table; so no flagging exceptions
> or flagging accessed bits or changing the state of the tcg.

Indeed, the only one of the 4 that *is* allowed to change state is excp_helper.c.


r~



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Add 'info pg' command to monitor
  2024-04-16 18:11       ` Don Porter
  2024-04-17  8:30         ` Nadav Amit
  2024-04-17 21:03         ` Dr. David Alan Gilbert
@ 2024-04-19 14:47         ` Peter Maydell
  2024-04-19 17:05           ` Dr. David Alan Gilbert
  2 siblings, 1 reply; 13+ messages in thread
From: Peter Maydell @ 2024-04-19 14:47 UTC (permalink / raw)
  To: Don Porter; +Cc: qemu-devel, pbonzini, dave

On Tue, 16 Apr 2024 at 19:11, Don Porter <porter@cs.unc.edu> wrote:
>
> On 4/16/24 13:03, Peter Maydell wrote:
> > On Tue, 16 Apr 2024 at 17:53, Don Porter <porter@cs.unc.edu> wrote:
> >> There is still a lot I am learning about the code base, but it seems
> >> that qemu_get_guest_memory_mapping() does most of what one would need.
> >> It currently only returns the "leaves" of the page table tree in a list.
> >>
> >> What if I extend this function with an optional argument to either
> >> 1) return the interior nodes of the page table in additional lists (and
> >> then parse+print in the monitor code), or
> >> 2) inline the monitor printing in the arch-specific hook, and pass a
> >> flag to get_guest_memory_mapping() that turns on/off the statements that
> >> pretty print the page tables?
> >>
> >> It looks like most CPUs implement this function as part of checkpointing.
> > As far as I can see only x86 implements the get_memory_mapping
> > function, so once again somebody has added some bit of
> > functionality that does a walk of the page tables that is
> > x86 only and that shares no code with any of the other
> > page table walking code :-(
>
> My mistake - get_memory_mappings() is only implemented in x86.
>
> In doing some searching of the code, many architectures implement
> mmu_translate() and
> get_physical_address() functions, but they are not standardized. I also
> see your larger point
> about replicating page walking code in x86.
>
> I imagine you have something in mind that abstracts things like the
> height of the radix tree,
> entries per node, checking permissions, printing the contents, etc.
>
> Perhaps I should start by trying to merge the x86 page walking code into
> one set of common
> helper functions, get more feedback (perhaps on a new patch thread?),
> and then consider
> how to abstract across architectures after getting feedback on this?

I think the cross-architecture abstraction is probably the
trickiest part. I would actually be happy for us to drop
'info tlb' and 'info mem' entirely if we have a cross-arch
command that gives basically the same information -- we don't
IMHO need more than one command for this, and we only have
multiple commands for basically legacy reasons. And for the
human monitor (HMP) we don't need to keep things around
for backwards compatibility.

thanks
-- PMM


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Add 'info pg' command to monitor
  2024-04-19 14:47         ` Peter Maydell
@ 2024-04-19 17:05           ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 13+ messages in thread
From: Dr. David Alan Gilbert @ 2024-04-19 17:05 UTC (permalink / raw)
  To: Peter Maydell; +Cc: Don Porter, qemu-devel, pbonzini

* Peter Maydell (peter.maydell@linaro.org) wrote:
> On Tue, 16 Apr 2024 at 19:11, Don Porter <porter@cs.unc.edu> wrote:
> >
> > On 4/16/24 13:03, Peter Maydell wrote:
> > > On Tue, 16 Apr 2024 at 17:53, Don Porter <porter@cs.unc.edu> wrote:
> > >> There is still a lot I am learning about the code base, but it seems
> > >> that qemu_get_guest_memory_mapping() does most of what one would need.
> > >> It currently only returns the "leaves" of the page table tree in a list.
> > >>
> > >> What if I extend this function with an optional argument to either
> > >> 1) return the interior nodes of the page table in additional lists (and
> > >> then parse+print in the monitor code), or
> > >> 2) inline the monitor printing in the arch-specific hook, and pass a
> > >> flag to get_guest_memory_mapping() that turns on/off the statements that
> > >> pretty print the page tables?
> > >>
> > >> It looks like most CPUs implement this function as part of checkpointing.
> > > As far as I can see only x86 implements the get_memory_mapping
> > > function, so once again somebody has added some bit of
> > > functionality that does a walk of the page tables that is
> > > x86 only and that shares no code with any of the other
> > > page table walking code :-(
> >
> > My mistake - get_memory_mappings() is only implemented in x86.
> >
> > In doing some searching of the code, many architectures implement
> > mmu_translate() and
> > get_physical_address() functions, but they are not standardized. I also
> > see your larger point
> > about replicating page walking code in x86.
> >
> > I imagine you have something in mind that abstracts things like the
> > height of the radix tree,
> > entries per node, checking permissions, printing the contents, etc.
> >
> > Perhaps I should start by trying to merge the x86 page walking code into
> > one set of common
> > helper functions, get more feedback (perhaps on a new patch thread?),
> > and then consider
> > how to abstract across architectures after getting feedback on this?
> 
> I think the cross-architecture abstraction is probably the
> trickiest part. I would actually be happy for us to drop
> 'info tlb' and 'info mem' entirely if we have a cross-arch
> command that gives basically the same information -- we don't
> IMHO need more than one command for this, and we only have
> multiple commands for basically legacy reasons. And for the
> human monitor (HMP) we don't need to keep things around
> for backwards compatibility.

I'm not sure what happens for the (MIPS/SPARC ?) where it's not
a traditional table hierarchy.

The other thing you might want (and I'm not sure how it interacts
with any of this) is to specify the root of the MMU tree (i.e. CR3
value for those in Intel thinking) to dump different processes etc.

Dave

> thanks
> -- PMM
-- 
 -----Open up your eyes, open up your mind, open up your code -------   
/ Dr. David Alan Gilbert    |       Running GNU/Linux       | Happy  \ 
\        dave @ treblig.org |                               | In Hex /
 \ _________________________|_____ http://www.treblig.org   |_______/


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2024-04-19 17:06 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-15 16:08 Add 'info pg' command to monitor Don Porter
2024-04-15 16:08 ` [PATCH 1/2] monitor: Implement a generic x86 page table iterator Don Porter
2024-04-15 16:08 ` [PATCH 2/2] monitor: Add an "info pg" command that prints the current page tables Don Porter
2024-04-15 16:37 ` Add 'info pg' command to monitor Peter Maydell
2024-04-16 16:53   ` Don Porter
2024-04-16 17:03     ` Peter Maydell
2024-04-16 18:11       ` Don Porter
2024-04-17  8:30         ` Nadav Amit
2024-04-17 20:05           ` Nadav Amit
2024-04-17 21:03         ` Dr. David Alan Gilbert
2024-04-17 21:26           ` Richard Henderson
2024-04-19 14:47         ` Peter Maydell
2024-04-19 17:05           ` Dr. David Alan Gilbert

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.