From: Artyom Tarasenko
Date: Mon, 10 May 2010 22:51:52 +0200
Subject: [Qemu-devel] Re: [PATCH 1/2] Pad iommu with an empty slot (necessary for SunOS 4.1.4)
To: Blue Swirl
Cc: qemu-devel@nongnu.org
References: <1273327815-21408-1-git-send-email-atar4qemu@gmail.com>

2010/5/10 Blue Swirl :
> On 5/10/10, Artyom Tarasenko wrote:
>> 2010/5/9 Blue Swirl :
>> > On 5/9/10, Artyom Tarasenko wrote:
>> >> 2010/5/9 Blue Swirl :
>> >>
>> >> > On 5/8/10, Artyom Tarasenko wrote:
>> >> >> On the real hardware (SS-5, LX) the MMU is not padded, but aliased.
>> >> >> Software shouldn't use aliased addresses, but neither should it crash
>> >> >> when it does (on the real hardware it wouldn't). Using empty_slot
>> >> >> instead of aliasing can help with debugging such accesses.
>> >> >
>> >> > The TurboSPARC Microprocessor User's Manual shows that there are
>> >> > additional pages after the main IOMMU for AFX registers. So this is
>> >> > not board specific, but depends on CPU/IOMMU versions.
>> >>
>> >>
>> >> I checked it on the real hw: on the LX and SS-5 these are aliased MMU
>> >> addresses. The SS-20 doesn't have any aliasing.
>> >
>> > But are your machines equipped with TurboSPARC or some other CPU?
>>
>>
>> Good point, I must confess I missed the word "Turbo" in your first
>> answer. The LX and SS-20 don't have one.
>> But the SS-5 must have a TurboSPARC CPU:
>>
>> ok cd /FMI,MB86904
>> ok .attributes
>> context-table           00 00 00 00 03 ff f0 00 00 00 10 00
>> psr-implementation      00000000
>> psr-version             00000004
>> implementation          00000000
>> version                 00000004
>> cache-line-size         00000020
>> cache-nlines            00000200
>> page-size               00001000
>> dcache-line-size        00000010
>> dcache-nlines           00000200
>> dcache-associativity    00000001
>> icache-line-size        00000020
>> icache-nlines           00000200
>> icache-associativity    00000001
>> ncaches                 00000002
>> mmu-nctx                00000100
>> sparc-version           00000008
>> mask_rev                00000026
>> device_type             cpu
>> name                    FMI,MB86904
>>
>> and still it behaves the same as the TI,TMS390S10 in the LX. This was
>> done on the SS-5:
>>
>> ok 10000000 20 spacel@ .
>> 4000009
>> ok 14000000 20 spacel@ .
>> 4000009
>> ok 14000004 20 spacel@ .
>> 23000
>> ok 1f000004 20 spacel@ .
>> 23000
>> ok 10000008 20 spacel@ .
>> 4000009
>> ok 14000028 20 spacel@ .
>> 4000009
>> ok 1000000c 20 spacel@ .
>> 23000
>> ok 10000010 20 spacel@ .
>> 4000009
>>
>>
>> The LX is the same except for the IOMMU version:
>>
>> ok 10000000 20 spacel@ .
>> 4000005
>> ok 14000000 20 spacel@ .
>> 4000005
>> ok 18000000 20 spacel@ .
>> 4000005
>> ok 1f000000 20 spacel@ .
>> 4000005
>> ok 1ff00000 20 spacel@ .
>> 4000005
>> ok 1fff0004 20 spacel@ .
>> 1fe000
>> ok 10000004 20 spacel@ .
>> 1fe000
>> ok 10000108 20 spacel@ .
>> 41000005
>> ok 10000040 20 spacel@ .
>> 41000005
>> ok 1fff0040 20 spacel@ .
>> 41000005
>> ok 1fff0044 20 spacel@ .
>> 1fe000
>> ok 1fff0024 20 spacel@ .
>> 1fe000
>>
>>
>> >> At what address are the additional AFX registers located?
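One way to sanity-check the SS-5 readings quoted above is to tabulate them and test a simple decode hypothesis. The Python sketch below encodes only those eight SS-5 samples and checks whether they are consistent with a decode in which address bit 2 alone selects between the value read at 0x10000000 and the value read at 0x14000004. The function names and the hypothesis are assumptions for illustration; eight samples cannot establish the real address decode, and they say nothing about unsampled offsets such as the write-only flush registers.

```python
from typing import Dict

# SS-5 readings copied from the OBP session quoted above:
# physical address -> value printed by "<addr> 20 spacel@ .".
SS5_SAMPLES: Dict[int, int] = {
    0x10000000: 0x4000009,
    0x14000000: 0x4000009,
    0x14000004: 0x23000,
    0x1F000004: 0x23000,
    0x10000008: 0x4000009,
    0x14000028: 0x4000009,
    0x1000000C: 0x23000,
    0x10000010: 0x4000009,
}


def predicted_value(pa: int, samples: Dict[int, int]) -> int:
    """Value expected at `pa` if (hypothetically) only address bit 2 is decoded.

    Bit 2 clear -> whatever 0x10000000 returned;
    bit 2 set   -> whatever 0x14000004 returned.
    """
    return samples[0x14000004] if pa & 0x4 else samples[0x10000000]


def fits_bit2_hypothesis(samples: Dict[int, int]) -> bool:
    """True if every sample agrees with predicted_value()."""
    return all(predicted_value(pa, samples) == value
               for pa, value in samples.items())


print("SS-5 samples fit bit-2-only decode:", fits_bit2_hypothesis(SS5_SAMPLES))
```

The LX samples are left out on purpose: there, offsets 0x40 and 0x108 return 0x41000005 rather than the 0x4000005 seen at offset 0, so this simple two-value check does not fit them as-is.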
>> >
>> > Here's the complete TurboSPARC IOMMU address map:
>> > PA[30:0]   Register                     Access
>> > 1000_0000  IOMMU Control                R/W
>> > 1000_0004  IOMMU Base Address           R/W
>> > 1000_0014  Flush All IOTLB Entries      W
>> > 1000_0018  Address Flush                W
>> > 1000_1000  Asynchronous Fault Status    R/W
>> > 1000_1004  Asynchronous Fault Address   R/W
>> > 1000_1010  SBus Slot Configuration 0    R/W
>> > 1000_1014  SBus Slot Configuration 1    R/W
>> > 1000_1018  SBus Slot Configuration 2    R/W
>> > 1000_101C  SBus Slot Configuration 3    R/W
>> > 1000_1020  SBus Slot Configuration 4    R/W
>> > 1000_1050  Memory Fault Status          R/W
>> > 1000_1054  Memory Fault Address         R/W
>> > 1000_2000  Module Identification        R/W
>> > 1000_3018  Mask Identification          R
>> > 1000_4000  AFX Queue Level              W
>> > 1000_6000  AFX Queue Level              R
>> > 1000_7000  AFX Queue Status             R
>>
>>
>>
>> But if I read it correctly, 0x12fff294 (which makes SunOS crash with
>> -m 32) is well above this limit.
>
> Oh, so I also misread something. You are not talking about the
> adjacent pages, but 16MB increments.
>
> Earlier I sent a patch for a generic address alias device, would it be
> useful for this?

That should work as well. But I thought empty_slot has less overhead and
is easier to debug.
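To make the trade-off between the options concrete, here is a toy Python model of a read landing in the IOMMU region under three hypothetical policies: fault on anything without a register, answer from an empty slot, or fold the offset back into a small aliased window. The class, the policy names and the read-as-zero behaviour of the empty slot are modelling assumptions, not QEMU's empty_slot code or its API. The register values are the SS-5 readings quoted above, and alias_period=8 is chosen only because it matches those SS-5 samples; it is not a documented value.

```python
from typing import Dict

IOMMU_BASE = 0x10000000  # first register in the TurboSPARC IOMMU map above


class BusError(Exception):
    """Stands in for the fault an unassigned access would raise."""


class IommuRegion:
    """Toy model of three ways to answer a read inside the IOMMU region.

    policy "fault":      offsets without a register raise BusError
    policy "empty_slot": offsets without a register read as 0
    policy "alias":      the offset is first folded modulo alias_period

    All names here are illustrative assumptions, not QEMU code.
    """

    POLICIES = ("fault", "empty_slot", "alias")

    def __init__(self, regs: Dict[int, int], policy: str, alias_period: int = 8):
        if policy not in self.POLICIES:
            raise ValueError(f"unknown policy: {policy}")
        self.regs = dict(regs)          # offset -> current register value
        self.policy = policy
        self.alias_period = alias_period

    def read(self, pa: int) -> int:
        offset = pa - IOMMU_BASE
        if self.policy == "alias":
            offset %= self.alias_period
        if offset in self.regs:
            return self.regs[offset]
        if self.policy == "fault":
            raise BusError(f"unassigned read at {pa:#x}")
        return 0                        # empty slot (or alias hole) reads as zero


# Control and base values as read on the SS-5; 0x12fff294 is the address
# SunOS touches with -m 32.
ss5_regs = {0x0: 0x4000009, 0x4: 0x23000}
for policy in IommuRegion.POLICIES:
    region = IommuRegion(ss5_regs, policy)
    try:
        print(policy, hex(region.read(0x12FFF294)))
    except BusError as exc:
        print(policy, "->", exc)
```

In this model the empty slot turns the stray SunOS access into a harmless read of zero that is easy to spot, while aliasing silently returns an unrelated register and the fault policy reproduces the crash.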

> Maybe we have a general design problem, perhaps unassigned access
> faults should only be triggered inside SBus slots and ignored
> elsewhere. If this is true, generic Sparc32 unassigned access handler
> should just ignore the access and special fault generating slots
> should be installed for empty SBus address ranges.

My impression was that the SS-5 and SS-20 handle unassigned accesses a bit
differently. The current IOMMU implementation fits the SS-20, which has no
aliasing.

>> >> > One approach would be that IOMMU_NREGS would be increased to cover
>> >> > these registers (with the bump in savevm version field) and
>> >> > iommu_init1() should check the version field to see how much MMIO to
>> >> > provide.
>> >>
>> >>
>> >> The problem I see here is that we already have too many registers: we
>> >> emulate the SS-20 IOMMU (I guess), while the SS-5 and LX seem to have
>> >> only 0x20 registers, which are aliased all the way.
>> >>
>> >>
>> >> > But in order to avoid the savevm version change, iommu_init1() could
>> >> > just install dummy MMIO (in the TurboSPARC case), if OBP does not care
>> >> > if the read back data matches what has been written earlier. Because
>> >> > from OBP's point of view this is identical to what your patch results
>> >> > in, I'd suppose this approach would also work.
>> >>
>> >>
>> >> OBP doesn't seem to care about these addresses at all. It's only the
>> >> "MUNIX" SunOS 4.1.4 kernel that does. The "MUNIX" kernel is the only
>> >> kernel available during the installation, so it is currently not
>> >> possible to install 4.1.4.
>> >> Surprisingly, the "GENERIC" kernel, which is on the disk after the
>> >> installation, doesn't try to access these address ranges either, so a
>> >> disk image taken from a live system works.
>> >>
>> >> Actually, access to the non-connected/aliased addresses may also be a
>> >> consequence of the phys_page_find bug I mentioned before. When I run
>> >> the install with -m 64 and -m 256, it tries to access different
>> >> non-connected addresses. It may also be a SunOS bug, of course. 256m
>> >> used to be a lot back then.
>> >
>> > Perhaps with 256MB, memory probing advances blindly from memory to
>> > IOMMU registers. Proll (used before OpenBIOS) did that once, with bad
>> > results :-). If this is true, 64M, 128M and 192M should show identical
>> > results, and the accesses should happen only with RAM close to or
>> > equal to 256M.
>>
>>
>> 32m:  0x12fff294
>> 64m:  0x14fff294
>> 192m: 0x1cfff294
>> 256m: 0x20fff294
>>
>> Memory probing? It would be strange for the OS to do it itself. The OS
>> could just ask OBP how much memory it has.
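The four faulting addresses above track the RAM size one-to-one: subtracting the RAM size from each leaves the same constant, 0x10fff294 (that is, 0x10000000 + 0xfff294). So the accesses happen at every tested size, not only near 256M, which does not match the prediction quoted above. The Python lines below only check that arithmetic against the reported values; they say nothing about why SunOS computes such an address.

```python
MIB = 1 << 20

# -m setting (MiB) -> faulting physical address, as reported above.
REPORTED = {
    32: 0x12FFF294,
    64: 0x14FFF294,
    192: 0x1CFFF294,
    256: 0x20FFF294,
}

# Subtract the RAM size in bytes from each faulting address.
offsets = {mb: addr - mb * MIB for mb, addr in REPORTED.items()}

for mb in sorted(offsets):
    print(f"{mb:3d}m: {REPORTED[mb]:#010x} - {mb * MIB:#010x} = {offsets[mb]:#010x}")

# A single distinct value means the address moves one-to-one with RAM size.
print("constant offset:", len(set(offsets.values())) == 1)
```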
>> Here is the listing where it happens:
>>
>> _swift_vac_rgnflush:           rd      %psr, %g2
>> _swift_vac_rgnflush+4:         andn    %g2, 0x20, %g5
>> _swift_vac_rgnflush+8:         mov     %g5, %psr
>> _swift_vac_rgnflush+0xc:       nop
>> _swift_vac_rgnflush+0x10:      nop
>> _swift_vac_rgnflush+0x14:      mov     0x100, %g5
>> _swift_vac_rgnflush+0x18:      lda     [%g5] 0x4, %g5
>> _swift_vac_rgnflush+0x1c:      sll     %o2, 0x2, %g1
>> _swift_vac_rgnflush+0x20:      sll     %g5, 0x4, %g5
>> _swift_vac_rgnflush+0x24:      add     %g5, %g1, %g5
>> _swift_vac_rgnflush+0x28:      lda     [%g5] 0x20, %g5
>>
>> _swift_vac_rgnflush+0x28: is the fatal one.
>>
>> kadb> $c
>> _swift_vac_rgnflush(?)
>> _vac_rgnflush() + 4
>> _hat_setup_kas(0xc00,0xf0447000,0x43a000,0x400,0xf043a000,0x3c0) + 70
>> _startup(0xfe000000,0x10000000,0xfa000000,0xf00e2bfc,0x10,0xdbc00) + 1414
>> _main(0xf00e0fb4,0xf0007810,0x293ff49f,0xa805209c,0x200,0xf00d1d18) + 14
>>
>> Unfortunately (but not surprisingly) kadb doesn't allow debugging
>> cache-flush code, so I can't check what is in
>> [%g5] (aka sfar) on the real machine when this happens.
>
> Linux code for Swift/TurboSPARC VAC flush should be similar.
>
>> But the bug in phys_page_find would explain these accesses: sfar gets
>> the wrong address, and then the secondary access happens on this wrong
>> address instead of the original one.
>
> I doubt phys_page_find can be buggy, it is so vital for all architectures.
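For readers less used to SPARC assembly, the arithmetic in the listing quoted above, up to the fatal load, can be paraphrased in Python. This is only a transliteration: `andn %g2, 0x20` clears PSR bit 5 (ET, the enable-traps bit in SPARC V8), the word loaded from address 0x100 in ASI 0x4 is shifted left by 4, and %o2 scaled by 4 is added to form the address used by the final `lda ... 0x20`. The function names and the example inputs are hypothetical; what the ASI 0x4 load actually returns is exactly the open question in the thread.

```python
MASK32 = 0xFFFFFFFF   # SPARC V8 integer registers are 32 bits wide
PSR_ET = 0x20         # PSR bit 5: enable traps


def psr_with_traps_disabled(psr: int) -> int:
    """rd %psr, %g2 ; andn %g2, 0x20, %g5 ; mov %g5, %psr"""
    return psr & ~PSR_ET & MASK32


def fatal_load_address(asi4_value: int, o2: int) -> int:
    """Address formed for `lda [%g5] 0x20, %g5` at _swift_vac_rgnflush+0x28.

    asi4_value: the word returned by `lda [%g5] 0x4` with %g5 = 0x100
    o2:         the incoming value of %o2
    Bits shifted past bit 31 are dropped, as in a 32-bit register.
    """
    g1 = (o2 << 2) & MASK32            # sll %o2, 0x2, %g1
    g5 = (asi4_value << 4) & MASK32    # sll %g5, 0x4, %g5
    return (g5 + g1) & MASK32          # add %g5, %g1, %g5


# Hypothetical inputs, only to show the arithmetic.
print(hex(psr_with_traps_disabled(0xE0)))
print(hex(fatal_load_address(0x12345, 3)))
```

Because the final address is (value << 4) + 4 * %o2, any error in the word returned by the ASI 0x4 load is magnified sixteenfold in the address of the secondary access.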

But you've seen the example of buggy behaviour I posted last Friday, right?
If it's not phys_page_find, it's either cpu_physical_memory_rw (which is
also pretty generic) or the way SS-20 registers devices. Could it be that
all the pages must be registered in the proper order?

I think it's a pretty rare use case to have a memory fault (not a
translation fault) on an unknown address. You may get such a fault during
device probing, but in that case you know which address you are probing,
so you don't care about the sync fault address register. Besides, do all
architectures have a sync fault address register?

>> fwiw the routine is called only once on the real hardware. It sort of
>> speaks for your hypothesis about the memory probing. Although it may
>> not necessarily probe for memory...

-- 
Regards,
Artyom Tarasenko

solaris/sparc under qemu blog: http://tyom.blogspot.com/