* A few QEMU questions
From: a b @ 2022-10-03 21:10 UTC
  To: qemu-devel

Hello, there,

I have a few newbie QEMU questions. I found that mmu_idx in aarch64-softmmu takes the values 8, 10 and 12.

I need some help to understand what they are for.

I cannot find which macros correspond to mmu-idx 8, 10 and 12 in target/arm/cpu.h (https://git.qemu.org/?p=qemu.git;a=blob;f=target/arm/cpu.h;h=89d49cdcb21b6c57de391851d64a523f07bde664;hb=HEAD#l2178). It looks like all the values in the ARMMMUIdx enum (https://git.qemu.org/?p=qemu.git;a=blob;f=target/arm/cpu.h;h=89d49cdcb21b6c57de391851d64a523f07bde664;hb=HEAD#l2262) are greater than 0x10 (ARM_MMU_IDX_A). Am I looking in the wrong place, or missing something about the different MMU modes in aarch64?

I'd appreciate your help.

Regards



* Re: A few QEMU questions
From: Peter Maydell @ 2022-10-04  9:20 UTC
  To: a b; +Cc: qemu-devel

On Tue, 4 Oct 2022 at 02:10, a b <blue_3too@hotmail.com> wrote:
> I have a few newbie QEMU questions. I found that mmu_idx in aarch64-softmmu takes the values 8, 10 and 12.
>
> I need some help to understand what they are for.
>
> I cannot find which macros correspond to mmu-idx 8, 10 and 12 in target/arm/cpu.h. It looks like all the values in the ARMMMUIdx enum are greater than 0x10 (ARM_MMU_IDX_A). Am I looking in the wrong place, or missing something about the different MMU modes in aarch64?

The comment in target/arm/cpu.h and the various enum definitions
should be what you need. Note in particular the part that says
"The ARMMMUIdx and the mmu index value used by the core QEMU
 TLB code are not quite the same" and also the functions in
internals.h arm_to_core_mmu_idx() and core_to_arm_mmu_idx()
which convert between these two representations.
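
To make the correspondence concrete, here is a minimal sketch of the
conversion (not the exact QEMU source -- the constants and the enum
layout vary between QEMU versions, so treat the values below as
illustrative only):

/*
 * Sketch: ARM-specific ARMMMUIdx values carry a profile flag bit
 * (ARM_MMU_IDX_A for A-profile); masking it off yields the small
 * "core" mmu index (e.g. 8, 10, 12) seen by the common TLB code.
 */
#define ARM_MMU_IDX_A            0x10   /* A-profile flag bit */
#define ARM_MMU_IDX_COREIDX_MASK 0x0f   /* low bits: core mmu index */

static inline int arm_to_core_mmu_idx(int arm_mmu_idx)
{
    /* e.g. an ARMMMUIdx value of 0x18 becomes core mmu index 8 */
    return arm_mmu_idx & ARM_MMU_IDX_COREIDX_MASK;
}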

PS: there is a refactoring patch set currently in review which
changes the MMU index allocation (essentially it collapses
the separate Secure and NonSecure MMUIdx values together),
so the specific details will likely change at some point this
release cycle.

thanks
-- PMM



* Re: A few QEMU questions
From: a b @ 2022-10-06  7:34 UTC
  To: Peter Maydell; +Cc: qemu-devel

Thanks a lot, Peter, for the clarification. It is very helpful.

My naive understanding is that each MMU has only one TLB, so why do we need an array of CPUTLBDescFast structures? How do these different CPUTLBDescFast data structures correlate with a hardware TLB?

typedef struct CPUTLB {
    CPUTLBCommon c;
    CPUTLBDesc d[NB_MMU_MODES];
    CPUTLBDescFast f[NB_MMU_MODES];
} CPUTLB;


Why do we want to store a shifted (n_entries-1) in mask?
typedef struct CPUTLBDescFast {
    /* Contains (n_entries - 1) << CPU_TLB_ENTRY_BITS */
    uintptr_t mask;
    /* The array of tlb entries itself. */
    CPUTLBEntry *table;
} CPUTLBDescFast QEMU_ALIGNED(2 * sizeof(void *));


Why doesn't CPUTLBEntry have information like ASID or shared (global) bits? How do we know whether a TLB entry is a match for a particular process?

In include/exec/cpu-defs.h:
typedef struct CPUTLBEntry {
    /* bit TARGET_LONG_BITS to TARGET_PAGE_BITS : virtual address
       bit TARGET_PAGE_BITS-1..4  : Nonzero for accesses that should not
                                    go directly to ram.
       bit 3                      : indicates that the entry is invalid
       bit 2..0                   : zero
    */
    union {
        struct {
            target_ulong addr_read;
            target_ulong addr_write;
            target_ulong addr_code;
            /* Addend to virtual address to get host address.  IO accesses
               use the corresponding iotlb value.  */
            uintptr_t addend;
        };
        /* padding to get a power of two size */
        uint8_t dummy[1 << CPU_TLB_ENTRY_BITS];
    };
} CPUTLBEntry;


Thanks!


* Re: A few QEMU questions
From: Peter Maydell @ 2022-10-06 10:50 UTC
  To: a b; +Cc: qemu-devel

On Thu, 6 Oct 2022 at 08:34, a b <blue_3too@hotmail.com> wrote:
>
> Thanks a lot, Peter, for the clarification. It is very helpful.
>
> My naive understanding is that each MMU has only one TLB, so why do we need an array of CPUTLBDescFast structures? How do these different CPUTLBDescFast data structures correlate with a hardware TLB?
>
> typedef struct CPUTLB {
>     CPUTLBCommon c;
>     CPUTLBDesc d[NB_MMU_MODES];
>     CPUTLBDescFast f[NB_MMU_MODES];
> } CPUTLB;

QEMU's "TLB" doesn't really correlate with a hardware TLB
except in that they're serving vaguely similar purposes.
A hardware TLB is a h/w structure which accelerates the lookup
  virtual-address => (physical-address, permissions)
QEMU's TLB is a software data structure which accelerates
the lookup
  virtual-address => (host virtual address or device MemoryRegion structure)

It's not an emulation of the "real" CPU TLB. (Note that this
means that you can't use QEMU to look at performance behaviour
around whether guest code is hitting or missing in the TLB,
and that the size of QEMU's TLB is unrelated to the size of a
TLB on the real CPU.)

Further, the set of things that can be done fast in hardware
differs from the set of things that can be done fast in
software. In hardware, a TLB is a "content-addressable
memory" that essentially checks every entry in parallel to
find the match in fixed time. In this kind of hardware it's
easy to add checks like "and it should match the right ASID"
or "and it must be an entry for EL2" without it making the
lookup slower. In software, you can't do that kind of parallel
lookup, so we must use a different structure. Instead of
having one TLB that can store entries for multiple contexts
at once and where we check that the context is correct when
we look up the address, we have effectively a separate TLB
for each context, so we can look up the address in an O(1)
data structure that has exactly one entry for the address,
and know that if it is present it is the correct entry.

The aim of the QEMU TLB design is to make the "fast path"
lookup of guest virtual address to host virtual address for
RAM accesses as fast as possible (it is a handful of
instructions directly generated as part of the JIT output);
the slow path for faults, hardware accesses, etc, is handled
in C code and is less performance critical.
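
To make that concrete, here is a rough C rendering of what the
generated fast path does for a load (a sketch only -- the real code
is emitted as host instructions by the tcg/ backends, and the
sketch_* names are invented for illustration):

/*
 * Sketch: conceptual fast path for a guest load from 'addr' under
 * MMU index 'mmu_idx'. NULL means "fall back to the C slow path".
 */
static inline void *sketch_fast_load(CPUArchState *env, int mmu_idx,
                                     target_ulong addr)
{
    /* One direct-mapped table per mmu_idx, i.e. per context. */
    CPUTLBDescFast *fast = &env_tlb(env)->f[mmu_idx];

    /* Pick the single candidate entry for this address; see the
     * discussion of 'mask' below. */
    uintptr_t offset = (addr >> (TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS))
                       & fast->mask;
    CPUTLBEntry *entry = (CPUTLBEntry *)((uintptr_t)fast->table + offset);

    /* The comparator holds the page-aligned VA plus flag bits; any
     * flag (invalid, MMIO, ...) makes the compare fail, forcing the
     * slow path. */
    if ((addr & TARGET_PAGE_MASK) != entry->addr_read) {
        return NULL;
    }
    /* Hit: 'addend' converts the guest VA directly to a host VA. */
    return (void *)((uintptr_t)addr + entry->addend);
}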

> Why do we want to store a shifted (n_entries-1) in mask?
> typedef struct CPUTLBDescFast {
>     /* Contains (n_entries - 1) << CPU_TLB_ENTRY_BITS */
>     uintptr_t mask;
>     /* The array of tlb entries itself. */
>     CPUTLBEntry *table;
> } CPUTLBDescFast QEMU_ALIGNED(2 * sizeof(void *));

The mask field is a pre-calculated value that is going to
be used as part of the "given a virtual address, find the
table entry" lookup. Because the number of entries in the table
varies, the part of the address we need to use as the index
also varies. We pre-calculate the mask in a convenient format
for the generated JIT code because if we stored just n_entries
here it would cost us an extra instruction or two in the fast path.
(To understand these data structures you probably want to also
be looking at the code that generates the lookup code, which
you can find under tcg/, usually in a function named
tcg_out_tlb_load or tcg_out_tlb_read or similar.)
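
For instance (a sketch of the arithmetic, not actual QEMU code):

/*
 * If we stored n_entries, the fast path would need an AND plus an
 * extra shift to build the byte offset into the table:
 *
 *   index  = (addr >> TARGET_PAGE_BITS) & (n_entries - 1);
 *   offset = index << CPU_TLB_ENTRY_BITS;
 *
 * With mask == (n_entries - 1) << CPU_TLB_ENTRY_BITS precomputed,
 * a single shift and AND produce the byte offset directly:
 *
 *   offset = (addr >> (TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS)) & mask;
 *
 * Both forms compute the same offset; the second saves an
 * instruction in the generated fast path.
 */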

> Why doesn't CPUTLBEntry have information like ASID, shared
> (or global) bits?  How do we know if the TLB entry is a match
> for a particular process?

We don't store the ASID because it would be slow to do a check
on it when we got a TLB hit, and it would be too expensive to
have an entire separate TLB per-ASID. Instead we simply flush
the appropriate TLB when the ASID is changed. That means that
we can rely on a TLB hit being for the current context/process.
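
For the ARM target, for example, this is why a TTBR write that
changes the ASID triggers a TLB flush. A simplified sketch along
the lines of vmsa_ttbr_write() in target/arm/helper.c (details
differ between QEMU versions; raw_read()/raw_write() are helper.c
internals):

static void sketch_ttbr_write(CPUARMState *env, const ARMCPRegInfo *ri,
                              uint64_t value)
{
    /* If the ASID field (bits [63:48]) changes, flush this CPU's
     * whole QEMU TLB, so surviving entries always belong to the
     * current ASID. */
    if (extract64(raw_read(env, ri) ^ value, 48, 16) != 0) {
        tlb_flush(env_cpu(env));
    }
    raw_write(env, ri, value);
}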

-- PMM



* Re: A few QEMU questions
From: a b @ 2022-10-07  6:34 UTC
  To: Peter Maydell; +Cc: qemu-devel

Thanks, Peter.

How does QEMU deal with different page sizes? Does a 2MB hugepage have a single corresponding TLB entry, or is it partitioned into 512 4K pages with 512 TLB entries?

Does a CPUTLBDescFast always hold TLB entries for the same single process? Is it always flushed/restored on a context switch?

Is the MMU-IDX for different translation regimes or exception levels?

How about the ITLB? It looks like QEMU has a mixed TLB implementation, since the TLB entries have read/write/execute flags. Am I correct?


I am exploring how to reconstruct a guest TLB (i.e. guest VA --> guest PA) for the running process (I can live with a TLB just for the running process). I found that execlog.c calls qemu_plugin_get_hwaddr to get the guest PA. A quick look at the function suggests it populates data->v.ram.hostaddr with a host VA rather than a guest PA (see line 1699 below). Am I correct?

What is the correct way to construct the guest TLB for the running process from QEMU's data structures at runtime?

1681 bool tlb_plugin_lookup(CPUState *cpu, target_ulong addr, int mmu_idx,
1682                        bool is_store, struct qemu_plugin_hwaddr *data)
1683 {
1684     CPUArchState *env = cpu->env_ptr;
1685     CPUTLBEntry *tlbe = tlb_entry(env, mmu_idx, addr);
1686     uintptr_t index = tlb_index(env, mmu_idx, addr);
1687     target_ulong tlb_addr = is_store ? tlb_addr_write(tlbe) : tlbe->addr_read;
1688
1689     if (likely(tlb_hit(tlb_addr, addr))) {
1690         /* We must have an iotlb entry for MMIO */
1691         if (tlb_addr & TLB_MMIO) {
1692             CPUIOTLBEntry *iotlbentry;
1693             iotlbentry = &env_tlb(env)->d[mmu_idx].iotlb[index];
1694             data->is_io = true;
1695             data->v.io.section = iotlb_to_section(cpu, iotlbentry->addr, iotlbentry->attrs);
1696             data->v.io.offset = (iotlbentry->addr & TARGET_PAGE_MASK) + addr;
1697         } else {
1698             data->is_io = false;
1699             data->v.ram.hostaddr = (void *)((uintptr_t)addr + tlbe->addend);
1700         }
1701         return true;
1702     } else {
1703         SavedIOTLB *saved = &cpu->saved_iotlb;
1704         data->is_io = true;
1705         data->v.io.section = saved->section;
1706         data->v.io.offset = saved->mr_offset;
1707         return true;
1708     }
1709 }

Thanks a bunch!

Regards

