From mboxrd@z Thu Jan 1 00:00:00 1970 From: Julien Grall Subject: Re: [PATCH ARM v5 19/20] mini-os: initial ARM support Date: Mon, 30 Jun 2014 22:08:44 +0100 Message-ID: <53B1D1DC.7020005@linaro.org> References: <1403782117-15125-1-git-send-email-talex5@gmail.com> <1403782117-15125-20-git-send-email-talex5@gmail.com> <53AF0A0E.6000504@linaro.org> Mime-Version: 1.0 Content-Type: text/plain; charset="windows-1252"; Format="flowed" Content-Transfer-Encoding: quoted-printable Return-path: Received: from mail6.bemta14.messagelabs.com ([193.109.254.103]) by lists.xen.org with esmtp (Exim 4.72) (envelope-from ) id 1X1ioj-00047o-Ed for xen-devel@lists.xenproject.org; Mon, 30 Jun 2014 21:08:49 +0000 Received: by mail-we0-f174.google.com with SMTP id u57so8840930wes.19 for ; Mon, 30 Jun 2014 14:08:47 -0700 (PDT) In-Reply-To: List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: Thomas Leonard Cc: xen-devel@lists.xenproject.org, Stefano Stabellini , David Scott , Samuel Thibault , Anil Madhavapeddy List-Id: xen-devel@lists.xenproject.org Hi Thomas, On 30/06/14 20:12, Thomas Leonard wrote: > On 28 June 2014 19:31, Julien Grall wrote: >> >> >> On 26/06/14 12:28, Thomas Leonard wrote: >>> >>> On ARM, Mini-OS will boot and display some output on the console. >>> Tested with: >>> >>> make XEN_TARGET_ARCH=3Darm32 CROSS_COMPILE=3Darm-linux-gnueabihf- \ >>> CONFIG_TEST=3Dy CONFIG_START_NETWORK=3Dn CONFIG_BLKFRONT=3Dn \ >>> CONFIG_NETFRONT=3Dn CONFIG_FBFRONT=3Dn CONFIG_KBDFRONT=3Dn \ >>> CONFIG_CONSFRONT=3Dn CONFIG_XC=3Dn -j4 >>> >>> Signed-off-by: Karim Allah Ahmed >>> [talex5@gmail.com: made x86_64 support work again] >>> [talex5@gmail.com: split into multiple patches] >>> [talex5@gmail.com: re-enabled force_evtchn_callback] >>> [talex5@gmail.com: enable regular console] >>> [talex5@gmail.com: fixed initialisation code: >>> - Configure write-back caching in page table. This is needed for >>> reliable hypercalls to Xen (thanks to Julien Grall). >>> - Use "client mode" for access control (domains are deprecated, >>> according to ARM Cortex-A Series Programmer=92s Guide version 4.0, >>> section 9.6.4). >>> - Enable more SCTLR features (icache, branch prediction)] >>> [talex5@gmail.com: use Virtual Count register for monotonic time] >>> [talex5@gmail.com: fixed HYPERVISOR_shutdown] >>> [talex5@gmail.com: get xenstore details from hypervisor] >>> [talex5@gmail.com: use GCC implementation of division] >>> [talex5@gmail.com: cleaned up interrupt handlers and threading] >>> [talex5@gmail.com: call exit_thread when a thread returns] >>> [talex5@gmail.com: implemented block_domain for ARM] >>> [talex5@gmail.com: fixed hang when enabling interrupts] >>> [talex5@gmail.com: added -march=3Darmv7-a to flags] >>> [talex5@gmail.com: CLREX after handling IRQs] >>> [talex5@gmail.com: unbind debug port at shutdown] >>> [talex5@gmail.com: allow unaligned accesses] >>> [talex5@gmail.com: fix zImage header for XSA-95] >>> [talex5@gmail.com: get RAM base and size from the FDT] >>> [talex5@gmail.com: get GIC addresses from FDT] >>> [talex5@gmail.com: added ARM grant table initialisation] >>> [talex5@gmail.com: added missing copyright header to hypercalls32.S] >>> [talex5@gmail.com: moved GIC driver to arm directory] >>> [talex5@gmail.com: fixes suggested by Julien Grall: >>> - Removed unnecessary isb. >>> - Renamed GICD_PRIORITY to GICD_IPRIORITYR. >>> - Change IRQ number type from unsigned char to int. >>> - Added volatile to {set,clear}_bit_non_atomic. >>> - Fixed some comments. >>> - Check compatible properties in DTB.] >>> [talex5@gmail.com: made image relocatable] >>> [talex5@gmail.com: added mfn_to_pfn and pfn_to_mfn] >> >> >> IHMO all these changes doesn't belong to the commit message, i.e should = not >> appear in the commit message when Ian will apply your commit message. > > As I understand it, as I'm not the author of the patch, it's polite to > indicate how I have modified it. I'm trying to follow the guidelines > here: > "Rule (b) allows you to adjust the code, but then it is very impolite > to change one submitter's code and make him endorse your bugs. To > solve this problem, it is recommended that you add a line between the > last Signed-off-by header and yours, indicating the nature of your > changes." I agree with it. But you modify so much things in this patch, that I = think it's not necessary to specify all of them. Anyway it was only a suggestion. >>> Signed-off-by: Thomas Leonard >>> --- >>> extras/mini-os/ARM-TODO.txt | 6 + >>> extras/mini-os/Config.mk | 2 + >>> extras/mini-os/Makefile | 9 ++ >>> extras/mini-os/arch/arm/Makefile | 32 ++++ >>> extras/mini-os/arch/arm/arch.mk | 7 + >>> extras/mini-os/arch/arm/arm32.S | 266 >>> +++++++++++++++++++++++++++++++ >>> extras/mini-os/arch/arm/events.c | 30 ++++ >>> extras/mini-os/arch/arm/gic.c | 222 >>> ++++++++++++++++++++++++++ >>> extras/mini-os/arch/arm/hypercalls32.S | 75 +++++++++ >>> extras/mini-os/arch/arm/minios-arm32.lds | 75 +++++++++ >>> extras/mini-os/arch/arm/mm.c | 134 ++++++++++++++++ >>> extras/mini-os/arch/arm/sched.c | 37 +++++ >>> extras/mini-os/arch/arm/setup.c | 116 ++++++++++++++ >>> extras/mini-os/arch/arm/time.c | 202 +++++++++++++++++++++= ++ >>> extras/mini-os/kernel.c | 2 +- >>> 15 files changed, 1214 insertions(+), 1 deletion(-) >>> create mode 100644 extras/mini-os/ARM-TODO.txt >>> create mode 100755 extras/mini-os/arch/arm/Makefile >>> create mode 100644 extras/mini-os/arch/arm/arch.mk >>> create mode 100644 extras/mini-os/arch/arm/arm32.S >>> create mode 100644 extras/mini-os/arch/arm/events.c >>> create mode 100644 extras/mini-os/arch/arm/gic.c >>> create mode 100644 extras/mini-os/arch/arm/hypercalls32.S >>> create mode 100755 extras/mini-os/arch/arm/minios-arm32.lds >>> create mode 100644 extras/mini-os/arch/arm/mm.c >>> create mode 100644 extras/mini-os/arch/arm/sched.c >>> create mode 100644 extras/mini-os/arch/arm/setup.c >>> create mode 100644 extras/mini-os/arch/arm/time.c >>> >>> diff --git a/extras/mini-os/Config.mk b/extras/mini-os/Config.mk >>> index d61877b..4ecde54 100644 >>> --- a/extras/mini-os/Config.mk >>> +++ b/extras/mini-os/Config.mk >>> @@ -12,6 +12,8 @@ export XEN_INTERFACE_VERSION >>> # If not x86 then use $(XEN_TARGET_ARCH) >>> ifeq ($(findstring x86_,$(XEN_TARGET_ARCH)),x86_) >>> TARGET_ARCH_FAM =3D x86 >>> +else ifeq ($(findstring arm,$(XEN_TARGET_ARCH)),arm) >>> +TARGET_ARCH_FAM =3D arm >>> else >>> TARGET_ARCH_FAM =3D $(XEN_TARGET_ARCH) >>> endif >>> diff --git a/extras/mini-os/Makefile b/extras/mini-os/Makefile >>> index 931cd05..01d8af0 100644 >>> --- a/extras/mini-os/Makefile >>> +++ b/extras/mini-os/Makefile >>> @@ -78,6 +78,9 @@ TARGET :=3D mini-os >>> SUBDIRS :=3D lib xenbus console >>> >>> ifeq ($(XEN_TARGET_ARCH),arm32) >>> +# Need libgcc.a for division helpers >>> +LDLIBS +=3D `$(CC) -print-libgcc-file-name` >> >> >> OOI, how much code does add libgcc for the division helpers? > > Hard to say. libgcc.a contains many files, but the ones with "div" in > the name come to about 67K, so probably less than that. Ok. >>> diff --git a/extras/mini-os/arch/arm/arm32.S >>> b/extras/mini-os/arch/arm/arm32.S >>> new file mode 100644 >>> index 0000000..de74ed9 >>> --- /dev/null >>> +++ b/extras/mini-os/arch/arm/arm32.S >>> @@ -0,0 +1,266 @@ >>> +@ Virtual address of the start of RAM (any value will do, but it must = be >>> +@ section-aligned). Update the lds script if changed. >>> +#define VIRT_BASE 0x400000 >>> + >>> +@ Offset of the kernel within the RAM. This is a zImage convention whi= ch >>> we >>> +@ rely on. >>> +#define ZIMAGE_KERNEL_OFFSET 0x8000 >> >> >> Hmmm... this is not a zImage convention... IIRC Linux is using this offs= et >> to have enough space to create startup page table during boot. > > OK, so this is a convention of Linux only? I found this reference: > > http://www.simtec.co.uk/products/SWLINUX/files/booting_article.html > > "Despite the ability to place zImage anywhere within memory, > convention has it that it is loaded at the base of physical RAM plus > an offset of 0x8000 (32K). This leaves space for the parameter block > usually placed at offset 0x100, zero page exception vectors and page > tables. This convention is *very* common." I think this convention is for kernel which are able to load only in a = specific address (i.e CONFIG_AUTO_ZRELADDR=3Dn). Linux Xen guest need to = have this option set to be able to boot. So Linux will calculate the = relocation address itself: #ifdef CONFIG_AUTO_ZRELADDR @ determine final kernel image address mov r4, pc and r4, r4, #0xf8000000 add r4, r4, #TEXT_OFFSET #else ldr r4, =3Dzreladdr #endif TEXT_OFFSET is equal to 0x8000 here. This code lives in Linux since at least 2010 (last commit in git blame = arch/arm/boot/compressed/head.S). Futhermore, for DOM0 we are using a 2MB-aligned address. I think it's = time to make the same thing for the guest (see the patch I sent earlier = today: https://patches.linaro.org/32742/). This will also help to add support more easily for Xen in new OS. >> The Linux zImage is able to relocate itself in the memory to respect this >> convention. But the zImage itself can be loaded anywhere in the memory. >> >> By looking to your code below your are relying that the kernel will be >> loaded at 0xXXXX8000 which is incorrect. This offset is odd and make oth= er >> kernel (such as FreeBSD) requiring the same trick which is not part of t= he >> Linux boot protocol. > > Sorry, I don't understand this. You're saying that Xen's choice of > 0x8000 forces FreeBSD to support this offset too? But that isn't > caused by anything in Mini-OS. > >> I plan to send a patch to require the start address to be 2MB aligned, so >> kernels will be able to use 2MB (for LPAE) and 1MB section for there ear= ly >> page table. >> >> If the kernel wants another alignment, then you will have to relocate >> yourself. Even though in your case, you don't need this odd 0xXXXX8000. > > It's very convenient to know at least the offset where we will be > loaded. Otherwise, we have to move the image around in memory with > loads and stores. I think there is some work going on to share images > between VMs (for running large numbers of VMs on one machine). I don't > know the details, but presumably it would be easier to support that if > the image's text section can be mapped read-only. I sent a patch to handle 2MB-align with is less odd than the 0x8000. = IHMO, the latter offset has no sense as it's not part of the ABI. >> [..] >> >> >>> + @ Fill in the whole top-level translation table (at page_dir). >>> + @ Populate the whole pagedir with 1MB section descriptors. >>> + >>> + mov r1, r7 @ r1 -> first section entry >>> + add r3, r1, #4*4*1024 @ limit (4 GB address space, 4 >>> byte entries) >>> + orr r0, r8, r9 @ r0 =3D entry mapping section = zero >>> to start of physical RAM >>> +1: >>> + str r0, [r1],#4 @ write the section entry >>> + add r0, r0, #1 << 20 @ next physical page (wraps) >>> + cmp r1, r3 >>> + bne 1b >> >> >> I would document a bit more this part. It took me a bit of time to >> understand that you mapping the whole address space in an odd manner. > > Could you suggest something? It's hard to know what other people will > find confusing. For reference, here's the existing comment in the code > where I tried to explain the scheme: > > @ Problem: the C code wants to be at a known address (VIRT_BASE), > but Xen might > @ load us anywhere. We initialise the MMU (mapping virtual to > physical @ addresses) > @ so everything ends up where the code expects it to be. > @ > @ We calculate the offet between where the linker thought _start > would be and where > @ it actually is and initialise the page tables to have that > offset for every page. > @ > @ When we turn on the MMU, we're still executing at the old > address. We don't want > @ the code to disappear from under us. So we have to do the > mapping in stages: > @ > @ 1. set up a mapping to our current page from both its current > and desired addresses > @ 2. enable the MMU > @ 3. jump to the new address > @ 4. remap all the other pages with the calculated offset Actually, I like the existing comment. It tell the problems and explain = how to solve it. Can you keep it? >> While it's fine for a first version of Mini-OS support for ARM. It think= at >> long term you want to map only necessary bank rank. It will be easier to >> catch programming error and avoid to trap in the hypervisor because the >> physical address doesn't exist. > > Yes, this is just the boot code. Eventually, the C code should make > the .text read-only (and unmap pages that don't exist, although I > don't think there's currently any disadvantage to having the trap > caused by the second stage page tables). I also want to add guard > pages to the thread stacks once these initial patches are in. > >> Futhermore, the data abort sent by the hypervisor is odd (debug smth). > > Could you explain what you mean by "odd" here? Xen is injecting a debug fault when it needs to send a data abort to the = guest. It may be confusing for the developer that doesn't know Xen internal. Anyway, I'm fine with the current solution. >>> +.pushsection .data >>> +_data: >>> +.align 14 >>> +.globl page_dir >>> +@ Each word maps one 1MB virtual section to a physical section >>> +page_dir: >>> + .fill (4*1024), 4, 0x0 >>> + >>> +.align 12 >>> +.globl shared_info_page >>> +shared_info_page: >>> + .fill (1024), 4, 0x0 >>> + >>> +.align 3 >>> +.globl stack >>> +stack: >>> + .fill (4*1024), 4, 0x0 >>> +stack_end: >>> + >>> +.align 3 >>> +irqstack: >>> + .fill (1024), 4, 0x0 >>> +irqstack_end: >>> + >>> +.globl physical_address_offset >>> +physical_address_offset: >>> + .long 0 >>> + >>> +.popsection >> >> >> Any reason to define theses variables in ASM rather than C? > > I normally prefer to define variables in the module that sets them, > but it doesn't make much difference to me. > > Though perhaps they should be moved to the .bss section (except for > the page table, which could be moved to the start of RAM to simplify > things). It would help for ARM64 port as I don't think those variables is 32bit = specific. >>> diff --git a/extras/mini-os/arch/arm/gic.c b/extras/mini-os/arch/arm/gi= c.c >>> new file mode 100644 >>> index 0000000..5641eb0 >>> --- /dev/null >>> +++ b/extras/mini-os/arch/arm/gic.c >>> @@ -0,0 +1,222 @@ >>> +// ARM GIC implementation >>> + >>> +#include >>> +#include >>> +#include >>> + >>> +//#define VGIC_DEBUG >>> +#ifdef VGIC_DEBUG >>> +#define DEBUG(_f, _a...) \ >>> + DEBUG("MINI_OS(file=3Dvgic.c, line=3D%d) " _f , __LINE__, ## _a) >> >> >> Did you intend to use printk here? > > It does look odd. I'll take a look. Thanks. >> [..] >> >> >>> +void gic_init(void) { >>> + gic.gicd_base =3D NULL; >>> + int node =3D 0; >>> + int depth =3D 0; >>> + for (;;) >>> + { >>> + node =3D fdt_next_node(device_tree, node, &depth); >>> + if (node <=3D 0 || depth < 0) >>> + break; >>> + >>> + if (fdt_getprop(device_tree, node, "interrupt-controller", NUL= L)) >>> { >>> + int len =3D 0; >>> + >>> + if (fdt_node_check_compatible(device_tree, node, >>> "arm,cortex-a15-gic") && >>> + fdt_node_check_compatible(device_tree, node, >>> "arm,cortex-a9-gic") && >> >> >> I don't think this compatible is necessary. Cortex A9 doesn't support >> virtualisation. > > OK. I was just matching on the values that Xen provides. a15-gic > should be fine on its own, though. IIRC, the toolstack doesn't provide arm,cortex-a9-gic :). >>> + fdt_node_check_compatible(device_tree, node, >>> "arm,cortex-a7-gic")) { >>> + printk("Skipping incompatible interrupt-controller >>> node\n"); >>> + continue; >>> + } >>> + >>> + const uint64_t *reg =3D fdt_getprop(device_tree, node, "re= g", >>> &len); >>> + if (reg =3D=3D NULL || len !=3D 32) { >> >> >> As asked on the previous version, if you plan to assume specific range s= ize >> for the time-being, please explain the 32. > > OK, so we have two registers (GICC and GICD), each of which contains > two parts (an address and a size), each of which is a 64-bit value (8 > bytes). So 2 * 2 * 8 =3D 32. Should I check this as a minimum, or > require it to match exactly? Match exactly is better. It will catch easily any change in the device = tree provided by the toolstack. Can you add what you said as a comment? >>> + /* TODO: support other formats */ >>> + printk("Bad 'reg' property: %p %d\n", reg, len); >>> + continue; >>> + } >>> + gic.gicd_base =3D to_virt((long) fdt64_to_cpu(reg[0])); >>> + gic.gicc_base =3D to_virt((long) fdt64_to_cpu(reg[2])); >> >> >> AFAIU, your are mapping the GIC region with normal attribute. With this >> attribute, the processor may reorder the write in the device memory, pre= load >> memory, caching.... Which is completely wrong in this case. I'm sa bit >> surprised that it works correctly. You have to map thoses regions as dev= ice >> memory. > > I believe that the ARM uses the most restrictive memory type from the > two stages of the translation table, so if Xen maps it as device > memory then it will work. We can set it here as well, though. > > (see "Overlaying the memory type attribute" in the Architecture > Reference Manual) I don't really want to see this such assumption in Mini-OS. Let say, we = decide to change the memory type attribute in Xen 4.x to "Normal". Then = we screw Mini-OS, and it would be very hard to debug. Regards, -- = Julien Grall