* Re: Why QEMU translates one instruction to a TB?
       [not found] <tencent_EAC696641F035EB7E9885302EAAE37455907@qq.com>
@ 2020-09-17  7:38 ` Philippe Mathieu-Daudé
  2020-09-17  7:45 ` Philippe Mathieu-Daudé
  2020-09-17  8:41 ` Alex Bennée
  2 siblings, 0 replies; 5+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-09-17  7:38 UTC (permalink / raw)
  To: casmac, qemu-devel; +Cc: Peter Maydell

On 9/17/20 8:25 AM, casmac wrote:
> Hi all,
> We are trying to add a DSP architecture to QEMU 4.2. To load the COFF format
> object file, we have added loader code to load content from
> the object file. The rom_add_blob() function is used. We first
> analyze the COFF file to figure out which sections are chained
> together (so each chain forms a "memory blob"), and then allocate the
> memory blobs.
>
> The pseudocode looks like:
>
>     for (i = 0; i < BADTYPE; i++) {
>         if (ary_sect_chain[i].exist) {  /* there is a chain of sections to allocate */
>             ary_sect_chain[i].mem_region = g_new(MemoryRegion, 1);
>             memory_region_init_ram(...);
>             memory_region_add_subregion(sysmem, ....);
>             rom_add_blob(....);
>         }
>     }

Why do this silly mapping when you know your DSP memory map?

> ------------------------------------------------------
> ok.lds file:
> 
> MEMORY  /* MEMORY directive */
> {
>     ROM:       origin = 000000h   length = 001000h   /* 4K 32-bit words on-chip ROM (C31/VC33) */

Per the TI spru031f datasheet, this is external (there is no
on-chip ROM).

I doubt there is actually a ROM mapped here...
Is this linker script used to *test* a BIOS written into SRAM
over JTAG?

> 0„2 0„2 /* 256K 32-bit word off-chip SRAM (D.Module.VC33-150-S2) */
> 0„2 0„2 BIOS:0„20„20„20„2 0„2 origin = 001000h0„20„20„2 0„2 length = 000300h
> 0„2 0„2 CONF_UTL: 0„2 origin = 001300h0„20„20„2 0„2 length = 000800h
> 0„2 0„2 FREE:0„20„20„20„2 0„2 origin = 001B00h0„20„20„2 0„2 length = 03F500h0„2 /* 259328 32-bit
> words */
> 0„2 0„2 RAM_0_1:0„20„2 0„2 origin = 809800h0„2 0„2 length = 000800h0„20„2 0„2 /* 2 x 1K
> 32-bit word on-chip SRAM (C31/VC33) */
> 0„2 0„2 RAM_2_3:0„20„2 0„2 origin = 800000h0„2 0„2 length = 008000h0„20„2 0„2 /* 2 x 16K
> 32-bit word on-chip SRAM (VC33 only) */
> }

You probably want to use:

  memory_region_init_ram(&s->extsram, OBJECT(dev), "eSRAM",
                         256 * KiB, &error_fatal);
  memory_region_add_subregion(get_system_memory(),
                              0x000000, &s->extsram);

  memory_region_init_ram(&s->ocsram, OBJECT(dev), "iSRAM",
                         2 * KiB, &error_fatal);
  memory_region_add_subregion(get_system_memory(),
                              0x809800, &s->ocsram);

Then different areas of the object file will be loaded into
either the iSRAM or the eSRAM.
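
To illustrate how a loader could place blobs into a fixed memory map
instead of creating regions itself, here is a minimal standalone sketch.
It is not QEMU code: BoardRegion, board_map and find_region() are
hypothetical names, and the table simply mirrors the two regions above
with their sizes taken in bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one fixed region of the board memory map. */
typedef struct {
    const char *name;
    uint64_t base;
    uint64_t size;
} BoardRegion;

/* Mirrors the two regions suggested above (sizes in bytes). */
static const BoardRegion board_map[] = {
    { "eSRAM", 0x000000, 256 * 1024 },
    { "iSRAM", 0x809800, 2 * 1024 },
};

/*
 * Return the region that fully contains [addr, addr + len), or NULL if the
 * blob does not fit inside any single region of the fixed map.
 */
static const BoardRegion *find_region(uint64_t addr, uint64_t len)
{
    for (size_t i = 0; i < sizeof(board_map) / sizeof(board_map[0]); i++) {
        const BoardRegion *r = &board_map[i];
        /* Written to avoid overflow in addr + len. */
        if (addr >= r->base && len <= r->size &&
            addr - r->base <= r->size - len) {
            return r;
        }
    }
    return NULL;
}
```

With a table like this, a blob at 0x001000 lands in the eSRAM entry, and
a blob that matches no entry points at a hole in the board model rather
than at something the loader should paper over by allocating RAM.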


* Re: Why QEMU translates one instruction to a TB?
       [not found] <tencent_EAC696641F035EB7E9885302EAAE37455907@qq.com>
  2020-09-17  7:38 ` Why QEMU translates one instruction to a TB? Philippe Mathieu-Daudé
@ 2020-09-17  7:45 ` Philippe Mathieu-Daudé
       [not found]   ` <tencent_6FBC0FD37CA798D4766FE6B2822DAC3E2908@qq.com>
  2020-09-17  8:41 ` Alex Bennée
  2 siblings, 1 reply; 5+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-09-17  7:45 UTC (permalink / raw)
  To: casmac, qemu-devel; +Cc: Alex Bennée, Peter Maydell, Richard Henderson

On 9/17/20 8:25 AM, casmac wrote:
> Hi all,
> We are trying to add a DSP architecture to QEMU 4.2. To load the COFF format
> object file, we have added loader code to load content from
> the object file.
[...]

> The COFF loader works functionally, but we then found that sometimes
> QEMU is downgraded: it treats each instruction as one TB. In version
> 4.2, debugging shows
> that get_page_addr_code_hostp() from accel/tcg/cputlb.c returns -1, as
> shown below.
>
> accel/tcg/cputlb.c:
> tb_page_addr_t get_page_addr_code_hostp(CPUArchState *env, target_ulong addr,
>                                         void **hostp)
> {
>     uintptr_t mmu_idx = cpu_mmu_index(env, true);
>     uintptr_t index = tlb_index(env, mmu_idx, addr);
>     CPUTLBEntry *entry = tlb_entry(env, mmu_idx, addr);
>     void *p;
>
>     //.....
>     if (unlikely(entry->addr_code & TLB_MMIO)) {
>         /* The region is not backed by RAM.  */
>         if (hostp) {
>             *hostp = NULL;
>         }
>         return -1;  /* debugging falls into this branch; after this
>                        point QEMU translates one instruction per TB */
>     }
>     //.......
> }
>
> One interesting fact is that this somehow depends on the linker
> command file. The object file generated by the following linker command
> file (per_instr.lds)
> will "trigger" the problem. But QEMU works well with the object file
> linked by the other linker command file (ok.lds).
> What causes the get_page_addr_code_hostp() function to return -1? I have
> no clue at all. Any advice is appreciated!

Maybe the "execute from small-MMU-region RAM" problem?

See:
https://www.mail-archive.com/qemu-devel@nongnu.org/msg549660.html
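
One way to picture that suspicion is to check whether a RAM region
covers whole target pages; a region that only partly covers a page
cannot be treated as an ordinary page of RAM. The sketch below is
standalone and is not QEMU code: covers_whole_pages() is a hypothetical
helper, and the 4 KiB page size is an assumption (each target defines
its own). Whether this is what actually happened here is not confirmed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed target page size for this sketch; real targets define their own. */
#define SKETCH_PAGE_SIZE 4096u

/*
 * True if the region [base, base + size) starts and ends on page boundaries,
 * i.e. every page it touches is covered completely.
 */
static bool covers_whole_pages(uint64_t base, uint64_t size)
{
    return size != 0 &&
           base % SKETCH_PAGE_SIZE == 0 &&
           size % SKETCH_PAGE_SIZE == 0;
}
```

A region sized to exactly fit a small section (say 0x300 bytes) fails
this check, while a region sized to a full page or more and aligned to
a page boundary passes it.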


* Re: Why QEMU translates one instruction to a TB?
       [not found] <tencent_EAC696641F035EB7E9885302EAAE37455907@qq.com>
  2020-09-17  7:38 ` Why QEMU translates one instruction to a TB? Philippe Mathieu-Daudé
  2020-09-17  7:45 ` Philippe Mathieu-Daudé
@ 2020-09-17  8:41 ` Alex Bennée
  2 siblings, 0 replies; 5+ messages in thread
From: Alex Bennée @ 2020-09-17  8:41 UTC (permalink / raw)
  To: casmac; +Cc: Peter Maydell, qemu-devel


casmac <climber.cui@qq.com> writes:

> Hi all, 
> We are trying to add a DSP architecture to QEMU 4.2. To load the COFF format object file, we have added loader code to load content from
> the object file. The rom_add_blob() function is used. We first analyze the COFF file to figure out which sections are chained
> together (so each chain forms a "memory blob"), and then allocate the memory blobs.
>
> The pseudocode looks like:
>
>     for (i = 0; i < BADTYPE; i++) {
>         if (ary_sect_chain[i].exist) {  /* there is a chain of sections to allocate */
>             ary_sect_chain[i].mem_region = g_new(MemoryRegion, 1);
>             memory_region_init_ram(...);
>             memory_region_add_subregion(sysmem, ....);
>             rom_add_blob(....);
>         }
>     }
>
<snip>

>     if (unlikely(entry->addr_code & TLB_MMIO)) {
>         /* The region is not backed by RAM.  */

This is the crux of it. If the address looked up isn't in a RAM region,
the TLB code can't assume a contiguous page of instructions, or that
the instruction read on one fetch will be the same on the next. So it
will only execute a single instruction at a time and will not cache the
resulting TB either, forcing a fresh re-translation each time.

All TLB_MMIO accesses basically force the slow path.

I suspect there is something wrong in your memory region mappings.
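
The consequence described above can be pictured with a toy model. This
is not QEMU's actual code: ToyPage, toy_max_insns(),
toy_tb_is_cacheable() and the limit of 512 are all made up for
illustration, and only the "RAM or not RAM" decision is taken from the
explanation above.

```c
#include <stdbool.h>

/* Toy stand-in for the result of a code-address lookup. */
typedef struct {
    bool backed_by_ram;   /* false models a lookup flagged TLB_MMIO */
} ToyPage;

#define TOY_MAX_INSNS 512 /* arbitrary per-TB limit for this sketch */

/* How many guest instructions one translation block may contain. */
static int toy_max_insns(const ToyPage *page)
{
    return page->backed_by_ram ? TOY_MAX_INSNS : 1;
}

/* Whether the translated block may be kept and reused later. */
static bool toy_tb_is_cacheable(const ToyPage *page)
{
    return page->backed_by_ram;
}
```

In this model a lookup that is not backed by RAM yields one-instruction
blocks that are thrown away after use, which matches the "one
instruction per TB" behaviour reported in the original question.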

-- 
Alex Bennée


* Re: Why QEMU translates one instruction to a TB?
       [not found]   ` <tencent_6FBC0FD37CA798D4766FE6B2822DAC3E2908@qq.com>
@ 2020-09-18  9:39     ` Peter Maydell
  2020-09-18 10:04     ` Reply: " Alex Bennée
  1 sibling, 0 replies; 5+ messages in thread
From: Peter Maydell @ 2020-09-18  9:39 UTC (permalink / raw)
  To: casmac; +Cc: Alex Bennée, qemu-devel, Philippe Mathieu-Daudé

On Fri, 18 Sep 2020 at 07:12, casmac <climber.cui@qq.com> wrote:
>
> Hello,
>   Thanks for the hints. I modified one parameter of the memory_region_init_ram() call, and the slow-path problem disappeared.
>   What I did was change the RAM size from the exact memory size needed to hold the object file section(s) to the size that the TI C3X user manual memory map specifies.
>   The former size is significantly smaller. But I did not specify the memory mapping elsewhere in the program, so I am still unsure about the cause of the conflict.
>
>             memory_region_init_ram(ary_sect_chain[i].mem_region, NULL, ary_sect_chain[i].s_name,
>                                    /*ary_sect_chain[i].chain_size*4*/  ary_sect_chain[i].region_size,  &error_fatal);      /* region_size is fixed as specified in the CPU manual; region_size > chain_size*4 */

This still looks very strange. You shouldn't be creating
RAM memory regions in your COFF file loader at all. You create
the RAM memory regions for the board in the board model. Then
the file loader only needs to call rom_add_blob() or similar.
Look at the way we handle ELF files -- COFF loading should
work on a similar principle.

thanks
-- PMM


* Re: Reply: Why QEMU translates one instruction to a TB?
       [not found]   ` <tencent_6FBC0FD37CA798D4766FE6B2822DAC3E2908@qq.com>
  2020-09-18  9:39     ` Peter Maydell
@ 2020-09-18 10:04     ` Alex Bennée
  1 sibling, 0 replies; 5+ messages in thread
From: Alex Bennée @ 2020-09-18 10:04 UTC (permalink / raw)
  To: casmac; +Cc: Peter Maydell, qemu-devel, Philippe Mathieu-Daudé


casmac <climber.cui@qq.com> writes:

> Hello,
>
> Thanks for the hints. I modified one parameter of the memory_region_init_ram() call, and the slow-path problem disappeared.
>
> What I did was change the RAM size from the exact memory size needed to hold the object file section(s) to the size that the TI C3X user manual memory map specifies.
>
> The former size is significantly smaller. But I did not specify the memory mapping elsewhere in the program, so I am still unsure about the cause of the conflict.
>

Well, you should be modelling the system, not what is actually loaded.

<snip>
> > One interesting fact is that this somehow depends on the linker
> > command file. The object file generated by the following linker command
> > file (per_instr.lds)
> > will "trigger" the problem. But QEMU works well with the object file
> > linked by the other linker command file (ok.lds).
> > What causes the get_page_addr_code_hostp() function to return -1? I have
> > no clue at all. Any advice is appreciated!
>
> Maybe the "execute from small-MMU-region RAM" problem?
>
> See:
> https://www.mail-archive.com/qemu-devel@nongnu.org/msg549660.html

That is the change that introduced the ability to do this. On some SoCs
you often run small amounts of boot code from device memory (or on-chip
cache) while the main system memory is set up. Usually it's not a large
amount of code, so doing it one instruction at a time isn't a massive
burden.

You have to do it this way because the underlying instruction may change
each time you read that memory. In normal system RAM we have
architectural hints such as flushing events which eventually end up as
tlb-flush events that ensure code is re-translated when needed.

-- 
Alex Bennée

