* Re: Why QEMU translates one instruction to a TB?
[not found] <tencent_EAC696641F035EB7E9885302EAAE37455907@qq.com>
@ 2020-09-17 7:38 ` Philippe Mathieu-Daudé
2020-09-17 7:45 ` Philippe Mathieu-Daudé
2020-09-17 8:41 ` Alex Bennée
2 siblings, 0 replies; 5+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-09-17 7:38 UTC
To: casmac, qemu-devel; +Cc: Peter Maydell
On 9/17/20 8:25 AM, casmac wrote:
> Hi all,
> We are trying to add a DSP architecture to QEMU 4.2. To load the COFF format
> object file, we have added loader code to load content from
> the object file. The rom_add_blob() function is used. We first
> analyze the COFF file to figure out which sections are chained
> together (so each chain forms a "memory blob"), and then allocate the
> memory blobs.
>
> The pseudo code looks like:
>
>     for(i=0; i<BADTYPE; i++){
>         if(ary_sect_chain[i].exist)  //there is a chain of sections to allocate
>         {
>             ary_sect_chain[i].mem_region = g_new(MemoryRegion, 1);
>             memory_region_init_ram(...);
>             memory_region_add_subregion(sysmem, ....);
>             rom_add_blob(....);
>         }
>     }
Why do this silly mapping when you know your DSP memory map?
> ------------------------------------------------------
> ok.lds file:
>
> MEMORY  /* MEMORY directive */
> {
>     ROM:       origin = 000000h   length = 001000h   /* 4K 32-bit words on-chip ROM (C31/VC33) */
Per the TI spru031f datasheet, this is external (there is no
on-chip ROM).
I have my doubts there is actually a ROM mapped here...
Is this linker script used to *test* a BIOS written to SRAM by
some JTAG tool?
>     /* 256K 32-bit word off-chip SRAM (D.Module.VC33-150-S2) */
>     BIOS:      origin = 001000h   length = 000300h
>     CONF_UTL:  origin = 001300h   length = 000800h
>     FREE:      origin = 001B00h   length = 03F500h   /* 259328 32-bit words */
>     RAM_0_1:   origin = 809800h   length = 000800h   /* 2 x 1K 32-bit word on-chip SRAM (C31/VC33) */
>     RAM_2_3:   origin = 800000h   length = 008000h   /* 2 x 16K 32-bit word on-chip SRAM (VC33 only) */
> }
You probably want to use:
    memory_region_init_ram(&s->extsram, OBJECT(dev), "eSRAM",
                           256 * KiB, &error_fatal);
    memory_region_add_subregion(get_system_memory(),
                                0x000000, &s->extsram);

    memory_region_init_ram(&s->ocsram, OBJECT(dev), "iSRAM",
                           2 * KiB, &error_fatal);
    memory_region_add_subregion(get_system_memory(),
                                0x809800, &s->ocsram);
Then different areas of the object file will be loaded into
either the iSRAM or the eSRAM.
* Re: Why QEMU translates one instruction to a TB?
[not found] <tencent_EAC696641F035EB7E9885302EAAE37455907@qq.com>
2020-09-17 7:38 ` Why QEMU translates one instruction to a TB? Philippe Mathieu-Daudé
@ 2020-09-17 7:45 ` Philippe Mathieu-Daudé
[not found] ` <tencent_6FBC0FD37CA798D4766FE6B2822DAC3E2908@qq.com>
2020-09-17 8:41 ` Alex Bennée
2 siblings, 1 reply; 5+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-09-17 7:45 UTC
To: casmac, qemu-devel; +Cc: Alex Bennée, Peter Maydell, Richard Henderson
On 9/17/20 8:25 AM, casmac wrote:
> Hi all,
> We are trying to add a DSP architecture to QEMU 4.2. To load the COFF format
> object file, we have added loader code to load content from
> the object file.
[...]
> The COFF loader works functionally, but we then found that sometimes
> QEMU is degraded: it treats each instruction as one TB. In version
> 4.2, debugging shows
> that get_page_addr_code_hostp() from accel/tcg/cputlb.c returns -1, as
> shown below.
>
> accel/tcg/cputlb.c:
> tb_page_addr_t get_page_addr_code_hostp(CPUArchState *env, target_ulong addr,
>                                         void **hostp)
> {
>     uintptr_t mmu_idx = cpu_mmu_index(env, true);
>     uintptr_t index = tlb_index(env, mmu_idx, addr);
>     CPUTLBEntry *entry = tlb_entry(env, mmu_idx, addr);
>     void *p;
>
>     //.....
>     if (unlikely(entry->addr_code & TLB_MMIO)) {
>         /* The region is not backed by RAM.  */
>         if (hostp) {
>             *hostp = NULL;
>         }
>         return -1;    /* debugging falls into this branch; after this point QEMU translates one instruction per TB */
>     }
>     //.......
> }
>
> One interesting fact is that this somehow depends on the linker
> command file. The object file generated by the following linker command
> file (per_instr.lds)
> will "trigger" the problem. But QEMU works well with the object file
> linked by the other linker command file (ok.lds).
> What causes the get_page_addr_code_hostp() function to return -1? I have
> no clue at all. Any advice is appreciated!!
Maybe the "execute from small-MMU-region RAM" problem?
See:
https://www.mail-archive.com/qemu-devel@nongnu.org/msg549660.html
* Re: Why QEMU translates one instruction to a TB?
[not found] <tencent_EAC696641F035EB7E9885302EAAE37455907@qq.com>
2020-09-17 7:38 ` Why QEMU translates one instruction to a TB? Philippe Mathieu-Daudé
2020-09-17 7:45 ` Philippe Mathieu-Daudé
@ 2020-09-17 8:41 ` Alex Bennée
2 siblings, 0 replies; 5+ messages in thread
From: Alex Bennée @ 2020-09-17 8:41 UTC
To: casmac; +Cc: Peter Maydell, qemu-devel
casmac <climber.cui@qq.com> writes:
> Hi all,
> We are trying to add a DSP architecture to QEMU 4.2. To load the COFF format object file, we have added loader code to load content from
> the object file. The rom_add_blob() function is used. We first analyze the COFF file to figure out which sections are chained
> together (so each chain forms a "memory blob"), and then allocate the memory blobs.
>
> The pseudo code looks like:
>
>     for(i=0; i<BADTYPE; i++){
>         if(ary_sect_chain[i].exist) //there is a chain of sections to allocate
>         {
>             ary_sect_chain[i].mem_region = g_new(MemoryRegion, 1);
>             memory_region_init_ram(...);
>             memory_region_add_subregion(sysmem, ....);
>             rom_add_blob(....);
>         }
>     }
>
<snip>
>     if (unlikely(entry->addr_code & TLB_MMIO)) {
>         /* The region is not backed by RAM.  */
This is the crux of it. If the address looked up isn't in a RAM region,
the TLB code can't assume a contiguous page of instructions, or that
the instruction read on one fetch will be the same on the next. So it
only translates a single instruction at a time and doesn't cache the
resulting TB, forcing a fresh re-translation each time.
All TLB_MMIO accesses basically force the slow path.
I suspect there is something wrong in your memory region mappings.
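
As a rough illustration of the behaviour described above, the following is
a toy decision function in plain C. It is not QEMU's actual code: the
TbPolicy type, choose_tb_policy() and the default_max_insns parameter are
hypothetical names that only summarize the two outcomes (a normal cached TB
for RAM-backed code, a one-instruction uncached TB otherwise).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of how a translation block (TB) is built. */
typedef struct {
    int  max_insns;   /* upper bound on guest instructions in the TB */
    bool cacheable;   /* may the TB be kept and reused? */
} TbPolicy;

/*
 * Toy decision function: RAM-backed code gets a normal, cached TB;
 * anything else is translated one instruction at a time and discarded.
 */
static TbPolicy choose_tb_policy(bool backed_by_ram, int default_max_insns)
{
    assert(default_max_insns >= 1);
    TbPolicy p;
    if (backed_by_ram) {
        p.max_insns = default_max_insns;
        p.cacheable = true;
    } else {
        p.max_insns = 1;
        p.cacheable = false;
    }
    return p;
}
```

In this model, the -1 return from get_page_addr_code_hostp() corresponds to
calling choose_tb_policy() with backed_by_ram set to false.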
--
Alex Bennée
* Re: Why QEMU translates one instruction to a TB?
[not found] ` <tencent_6FBC0FD37CA798D4766FE6B2822DAC3E2908@qq.com>
@ 2020-09-18 9:39 ` Peter Maydell
2020-09-18 10:04 ` Reply: " Alex Bennée
1 sibling, 0 replies; 5+ messages in thread
From: Peter Maydell @ 2020-09-18 9:39 UTC
To: casmac; +Cc: Alex Bennée, qemu-devel, Philippe Mathieu-Daudé
On Fri, 18 Sep 2020 at 07:12, casmac <climber.cui@qq.com> wrote:
>
> Hello,
> thanks for the hints. I modified one parameter of the memory_region_init_ram() call, and the slow-path problem disappeared.
> What I did was change the RAM size from the exact memory size needed to hold the object file section(s) to the size that the TI C3X user manual memory mapping specifies.
> The former size is significantly smaller. But I did not specify the memory mapping elsewhere in the program, so I am still unsure about the cause of the conflict.
>
> memory_region_init_ram(ary_sect_chain[i].mem_region, NULL, ary_sect_chain[i].s_name,
>     /*ary_sect_chain[i].chain_size*4*/ ary_sect_chain[i].region_size, &error_fatal); //region_size is fixed as specified in CPU manual, region_size>chain_size*4
This still looks very strange. You shouldn't be creating
RAM memory regions in your COFF file loader at all. You create
the RAM memory regions for the board in the board model. Then
the file loader only needs to call rom_add_blob() or similar.
Look at the way we handle ELF files -- COFF loading should
work on a similar principle.
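
The split of responsibilities described above can be sketched with a small
self-contained model in plain C. This is not QEMU code and does not use
rom_add_blob(): the BoardRam type and load_blob() are hypothetical names.
The point it tries to show is only that the board owns the RAM, and the
loader copies into RAM that already exists and never allocates any.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical board-owned RAM: created once, sized from the memory map. */
typedef struct {
    uint64_t base;
    size_t   size;
    uint8_t *host;   /* backing storage owned by the board model */
} BoardRam;

/*
 * Toy loader: copy one blob into RAM that the board already created.
 * It never allocates memory; a blob that does not fit is an error.
 */
static bool load_blob(BoardRam *rams, size_t nrams,
                      uint64_t addr, const uint8_t *data, size_t len)
{
    assert(rams != NULL && data != NULL);
    for (size_t i = 0; i < nrams; i++) {
        BoardRam *r = &rams[i];
        /* The blob must lie entirely inside this region. */
        if (addr >= r->base && len <= r->size &&
            addr - r->base <= r->size - len) {
            memcpy(r->host + (addr - r->base), data, len);
            return true;
        }
    }
    return false;   /* no board RAM covers this blob */
}
```

With this shape, the size of a RAM region comes from the board description
and stays the same no matter which object file is loaded.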
thanks
-- PMM
* Re: Reply: Why QEMU translates one instruction to a TB?
[not found] ` <tencent_6FBC0FD37CA798D4766FE6B2822DAC3E2908@qq.com>
2020-09-18 9:39 ` Peter Maydell
@ 2020-09-18 10:04 ` Alex Bennée
1 sibling, 0 replies; 5+ messages in thread
From: Alex Bennée @ 2020-09-18 10:04 UTC
To: casmac; +Cc: Peter Maydell, qemu-devel, Philippe Mathieu-Daudé
casmac <climber.cui@qq.com> writes:
> Hello,
>
> thanks for the hints. I modified one parameter of the memory_region_init_ram() call, and the slow-path problem disappeared.
>
> What I did was change the RAM size from the exact memory size needed to hold the object file section(s) to the size that the TI C3X user manual memory mapping specifies.
>
> The former size is significantly smaller. But I did not specify the memory mapping elsewhere in the program, so I am still unsure about the cause of the conflict.
>
Well you should be modelling the system - not what is actually loaded.
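
One possible reading of the result quoted above, which nobody in this
thread confirms, is that a region sized to the loaded sections no longer
covers whole target pages, so code in it cannot be treated as ordinary
page-backed RAM. Under that assumption, a simple sanity check on each
modelled region could look like the sketch below. It is plain C, not a QEMU
API; region_is_page_granular() is a hypothetical helper, and the page size
is a parameter because it depends on the target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * True if a region starts on a page boundary and spans whole pages.
 * page_size must be a power of two; its value is target-specific.
 */
static bool region_is_page_granular(uint64_t base, uint64_t size,
                                    uint64_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    uint64_t mask = page_size - 1;
    return size != 0 && (base & mask) == 0 && (size & mask) == 0;
}
```

A region sized from the CPU manual's memory map is more likely to pass such
a check than one sized from whatever sections happen to be in the file.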
<snip>
> > One interesting fact is that this somehow depends on the linker
> > command file. The object file generated by the following linker command
> > file (per_instr.lds)
> > will "trigger" the problem. But QEMU works well with the object file
> > linked by the other linker command file (ok.lds).
> > What causes the get_page_addr_code_hostp() function to return -1? I have
> > no clue at all. Any advice is appreciated!!
>
> Maybe the "execute from small-MMU-region RAM" problem?
>
> See:
> https://www.mail-archive.com/qemu-devel@nongnu.org/msg549660.html
That is the change that introduced the ability to do this. On some SoCs
you often run small amounts of boot code from device memory (or on-chip
cache) while the main system memory is set up. Usually it's not a large
amount of code, so doing it one instruction at a time isn't a massive
burden.
You have to do it this way because the underlying instruction may change
each time you read that memory. In normal system RAM we have
architectural hints such as flushing events which eventually end up as
tlb-flush events that ensure code is re-translated when needed.
--
Alex Bennée