From mboxrd@z Thu Jan 1 00:00:00 1970
From: Kevin Moraga
Subject: Re: crash on boot with 4.6.1 on fedora 24
Date: Tue, 10 May 2016 10:11:20 -0600
Message-ID: <57320828.1070905@riseup.net>
References: <572FC2F9.3060200@riseup.net> <57307DD402000078000E9622@prv-mh.provo.novell.com> <5730A447.3010505@riseup.net> <5730CE7E02000078000E9A8A@prv-mh.provo.novell.com> <5730BD86.6080407@riseup.net> <5730C5A4.9040008@oracle.com> <5730C73E.8030007@riseup.net> <5730D994.6050100@oracle.com> <5731A87502000078000E9D8C@prv-mh.provo.novell.com>
In-Reply-To: <5731A87502000078000E9D8C@prv-mh.provo.novell.com>
To: Jan Beulich, Boris Ostrovsky
Cc: xen-devel@lists.xen.org
List-Id: xen-devel@lists.xenproject.org

On 05/10/2016 01:23 AM, Jan Beulich wrote:
>>>> On 09.05.16 at 20:40, wrote:
>> On 05/09/2016 01:22 PM, Kevin Moraga wrote:
>>> On 05/09/2016 11:15 AM, Boris Ostrovsky wrote:
>>>> On 05/09/2016 12:40 PM, Kevin Moraga wrote:
>>>>> On 05/09/2016 09:53 AM, Jan Beulich wrote:
>>>>>>>>> On 09.05.16 at 16:52, wrote:
>>>>>>> On 05/09/2016 04:08 AM, Jan Beulich wrote:
>>>>>>>>>>> On 09.05.16 at 00:51, wrote:
>>>>>>>>> I'm try to compile kernel 4.4.8 (using fedora 23) to run with Xen 4.6.0
>>>>>>>>> and Intel Skylake processor (Intel Core i7-6600U)
>>>>>>>>>
>>>>>>>>> This kernel is crashing almost in the same way as explained in this
>>>>>>>>> thread... But my problem is mainly with Skylake. Because the same
>>>>>>>>> configuration works within another machine but with another processor
>>>>>>>>> (Intel Core i5-3340M). Attached are the boot logs.
>>>>>>>> The address the fault occurs on (ffff8000006bdee0) is bogus, so
>>>>>>>> from the register and stack dump alone I don't think we can derive
>>>>>>>> much. What we'd need is access to the kernel binary used (or
>>>>>>>> really the vmlinux accompanying the vmlinuz that was used), in
>>>>>>>> order to see where exactly the kernel died, and hence where this
>>>>>>>> bogus address originates from. As I understand it this is a kernel
>>>>>>>> you built yourself - can you make said binary from exactly that
>>>>>>>> build available somewhere?
>>>>>>> Yes I have it. But I get the same crash on various 4.4.X and also with
>>>>>>> 4.5.3.
>>>>>>>
>>>>>>> https://drive.google.com/open?id=0B6Ol0ob95UxXQV9HM1BWMmhCZ0E
>>>>>> Well, this doesn't contain the file I'm after (vmlinux), and taking
>>>>>> apart vmlinuz would be quite cumbersome.
>>>>>>
>>>>>> Jan
>>>>>>
>>>>> Oh sorry, here is the link to vmlinux
>>>>>
>>>>>
>> https://drive.google.com/file/d/0B6Ol0ob95UxXN0dDMWM1a29vMEk/view?usp=sharing
>>>> This is still vmlinuz but the failure is at
>>>>
>>>> ffffffff81007ef3: 48 3b 1d 4e 2e ec 00  cmp  0xec2e4e(%rip),%rbx  # 0xffffffff81ecad48
>>>> ffffffff81007efa: 73 51                 jae  0xffffffff81007f4d
>>>> ffffffff81007efc: 31 c0                 xor  %eax,%eax
>>>> ffffffff81007efe: 48 8b 15 03 d2 c0 00  mov  0xc0d203(%rip),%rdx  # 0xffffffff81c15108
>>>> ffffffff81007f05: 90                    nop
>>>> ffffffff81007f06: 90                    nop
>>>> ffffffff81007f07: 90                    nop
>>>> ffffffff81007f08: 4c 8b 2c da           mov  (%rdx,%rbx,8),%r13  <======
>>>> ffffffff81007f0c: 90                    nop
>>>> ffffffff81007f0d: 90                    nop
>>>> ffffffff81007f0e: 90                    nop
>>>> ffffffff81007f0f: 85 c0                 test %eax,%eax
>>>> ffffffff81007f11: 78 3a                 js   0xffffffff81007f4d
>>>> ffffffff81007f13: 48 8b 05 ee 11 d2 00  mov  0xd211ee(%rip),%rax  # 0xffffffff81d29108
>>>> ffffffff81007f1a: 49 39 c5              cmp  %rax,%r13
>>>> ffffffff81007f1d: 73 6f                 jae  0xffffffff81007f8e
>>>> ffffffff81007f1f: 48 8b 05 ea 11 d2 00  mov  0xd211ea(%rip),%rax  # 0xffffffff81d29110
>>>> ffffffff81007f26: 4a 8b 04 e8           mov  (%rax,%r13,8),%rax
>>>>
>>>> Any chance you could provide an un-stripped binary or System.map?
>>> Here is the link for System.map
>>>
>>>
>> https://drive.google.com/file/d/0B6Ol0ob95UxXYVE4SzdMcENsWWs/view?usp=sharing
>>
>> So my semi-educated guess at your stack is
>> __early_ioremap
>> -> __early_set_fixmap
>> -> set_pte
>> -> xen_set_pte_init
>> -> mask_rw_pte
>> -> pte_pfn
>> -> pte_val
>> -> xen_pte_val
>> -> pte_mfn_to_pfn
>> -> mfn_to_pfn_no_overrides
>> -> ret = xen_safe_read_ulong(&machine_to_phys_mapping[mfn], &pfn)
>>
>>
>> With ffffffff81007f08 being the faulted address the last one looks
>> plausible:
>>
>>
>> ffffffff81007efe: 48 8b 15 03 d2 c0 00  mov  0xc0d203(%rip),%rdx  # 0xffffffff81c15108
>> ffffffff81007f05: 90                    nop
>> ffffffff81007f06: 90                    nop
>> ffffffff81007f07: 90                    nop
>> ffffffff81007f08: 4c 8b 2c da           mov  (%rdx,%rbx,8),%r13
>>
>> since
>>
>> ostr@workbase> grep ffffffff81c15108 /tmp/System.map-4.4.8-9.pvops.qubes.x86_64
>> ffffffff81c15108 D machine_to_phys_mapping
>> ostr@workbase>
>>
>> But %rdx is not ffffffff81c15108, it is ffff800000000000:
>>
>> (XEN) rax: 0000000000000000 rbx: 00000000000d7bdc rcx: ffff880002059000
>> (XEN) rdx: ffff800000000000 rsi: 80000000d7bdc063 rdi: 80000000d7bdc063
> But that's a MOV above, i.e. %rdx = [0xffffffff81c15108], which
> sensibly is MACH2PHYS_VIRT_START. And the MFN in %rbx
> would then match with the value in %cr2. Question is - where
> does MFN 0xd7bdc come from (it's in a reserved range, and hence
> can only be MMIO, which shouldn't be subject to M2P translation),
> and why is this a problem only on Skylake (or maybe that's not
> CPU related at all, but just dependent on the memory layout
> produced by the firmware).
>
> Obviously, accesses to the sparse[!] M2P prior to a proper #PF
> handler established can't end well. With no RAM present in the
> range 0xc0000000-0xffffffff, the 4th 2Mb M2P page doesn't get
> populated, i.e.
> this page walk
>
> (XEN) Pagetable walk from ffff8000006bdee0:
> (XEN) L4[0x100] = 000000081daf9067 ffffffffffffffff
> (XEN) L3[0x000] = 000000081daf7067 ffffffffffffffff
> (XEN) L2[0x003] = 0000000000000000 ffffffffffffffff
>
> is to be expected.
>
> Anyway, Kevin, it would really make things a lot easier if you
> provided the vmlinux matching the vmlinuz, which you should
> have (assuming my understanding is correct that this is a kernel
> you built yourself). After all what we may need to figure out is
> the caller of __early_ioremap() in the call stack Boris deduced.
>
> Jan

Yep, this is the link:
https://drive.google.com/file/d/0B6Ol0ob95UxXaWl4cVRKR1BUak0/view?usp=sharing

-- 
Sincerely,
Kevin Moraga
PGP: F258EDCB
Fingerprint: 3915 A5A9 959C D18F 0A89 B47E FB4B 55F5 F258 EDCB
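
For anyone following the arithmetic in Jan's explanation, here is a minimal sketch (not from the thread itself) that recomputes the faulting address from the register dump and splits it into page-table indices. The constants are copied from the quoted output; the helper names are hypothetical, and the code assumes standard x86-64 4-level paging with 4 KiB frames and 8-byte M2P entries (the scale factor in the faulting mov).

```python
# Illustrative sketch only: constants come from the register dump and page
# walk quoted in this thread; the helper names are hypothetical.

M2P_BASE = 0xFFFF800000000000  # %rdx in the dump (MACH2PHYS_VIRT_START per Jan)
MFN = 0xD7BDC                  # %rbx in the dump
ENTRY_SIZE = 8                 # scale factor in: mov (%rdx,%rbx,8),%r13
PAGE_SHIFT = 12                # assumption: 4 KiB machine frames


def m2p_entry_address(base, mfn):
    """Virtual address of the M2P entry for a given MFN."""
    return base + mfn * ENTRY_SIZE


def walk_indices(vaddr):
    """Split a virtual address into (L4, L3, L2, L1) indices, 9 bits each."""
    return tuple((vaddr >> shift) & 0x1FF for shift in (39, 30, 21, 12))


def m2p_slot_phys_range(l2_index):
    """Physical range whose M2P entries live in one 2 MiB M2P page.

    Assumes the slot lies in the first 1 GiB of the M2P (L3 index 0).
    """
    entries = (2 << 20) // ENTRY_SIZE          # MFNs covered per 2 MiB page
    first_mfn = l2_index * entries
    last_mfn = first_mfn + entries - 1
    return first_mfn << PAGE_SHIFT, (last_mfn << PAGE_SHIFT) | 0xFFF


fault = m2p_entry_address(M2P_BASE, MFN)
l4, l3, l2, _ = walk_indices(fault)
lo, hi = m2p_slot_phys_range(l2)

print(hex(fault))            # address reported in the page walk / %cr2
print(hex(l4), hex(l3), hex(l2))
print(hex(lo), hex(hi))      # physical range served by that M2P page
```

Under those assumptions the computed address equals the one in the page walk, the L2 index is 3 (the "4th 2Mb M2P page"), and that slot serves exactly the 0xc0000000-0xffffffff range Jan mentions.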