From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:49918) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fYsIz-0004sX-My for qemu-devel@nongnu.org; Fri, 29 Jun 2018 08:15:19 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fYsIy-0001uF-Hm for qemu-devel@nongnu.org; Fri, 29 Jun 2018 08:15:13 -0400 Received: from mail-ot0-x22a.google.com ([2607:f8b0:4003:c0f::22a]:41815) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1fYsIy-0001ss-3g for qemu-devel@nongnu.org; Fri, 29 Jun 2018 08:15:12 -0400 Received: by mail-ot0-x22a.google.com with SMTP id d19-v6so9680377oti.8 for ; Fri, 29 Jun 2018 05:15:12 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: <489d8470-c12f-a441-4e87-b2b3013a3ab8@vivier.eu> References: <20180626165658.31394-1-peter.maydell@linaro.org> <20180626165658.31394-24-peter.maydell@linaro.org> <50e46f22-1e3d-10ce-5b5b-a10af49a95f1@vivier.eu> <489d8470-c12f-a441-4e87-b2b3013a3ab8@vivier.eu> From: Peter Maydell Date: Fri, 29 Jun 2018 13:14:50 +0100 Message-ID: Content-Type: text/plain; charset="UTF-8" Subject: Re: [Qemu-devel] [PULL 23/32] tcg: Support MMU protection regions smaller than TARGET_PAGE_SIZE List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Laurent Vivier Cc: QEMU Developers , Richard Henderson , =?UTF-8?B?QWxleCBCZW5uw6ll?= On 28 June 2018 at 23:26, Laurent Vivier wrote: > ./m68k-softmmu/qemu-system-m68k -M q800 \ > -serial none -serial mon:stdio \ > -kernel vmlinux-4.15.0-2-m68k \ > -nographic Thanks for the test case. I'm still investigating, but there are a couple of things happening here. First, there's a bug in get_page_addr_code()'s "is this a TLB miss?" condition which was introduced in commit 71b9a45330fe22: if (unlikely(env->tlb_table[mmu_idx][index].addr_code != (addr & (TARGET_PAGE_MASK | TLB_INVALID_MASK)))) { takes a (not necessarily page aligned) address, and masks out everything but the page-aligned top half (good) and the TLB_INVALID bit (not good, because that could be either 0 or 1 depending on the address). This means sometimes we'll incorrectly decide we got a miss in the TLB and do an unnecessary refill. The second thing that's going on here is that the m68k target code writes TLB entries for the same address with different prot bits without doing a flush in between: tlb_set_page_with_attrs: vaddr=0029b000 paddr=0x000000000029b000 prot=3 idx=0 tlb_set_page_with_attrs: vaddr=0029b000 paddr=0x000000000029b000 prot=7 idx=0 The tlb_set_page_with_attrs() code isn't expecting this, so we end up with two TLB entries for the same address, one in the main TLB and one in the victim cache TLB. The bug above means that we get this sequence of events: * fill main TLB entry with prot=3 entry * later, fill main TLB with prot=7 entry, and evict prot=3 entry to victim cache * hit on the prot=7 entry in the main TLB * refill condition incorrectly fails, but we hit in the victim cache * so we pull the prot=3 entry from victim to main TLB * prot=3 means "addr_code == -1", so the check of the TLB_RECHECK bit succeeds * in the TLB_RECHECK code we do a tlb_fill() * that fills in the main TLB with a prot=7 entry again, bouncing the prot=3 entry back out to the victim cache * prot=7 means the addr_code is correct, so we find ourselves in the "TLB_RECHECK but this is RAM" abort code path I'm not sure whether it's supposed to be the responsibility of the target code or the common accel/tcg code to ensure that we don't have multiple TLB entries for the same address. thanks -- PMM