From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
In-Reply-To: <20180119095523.GY28161@8bytes.org>
References: <1516120619-1159-1-git-send-email-joro@8bytes.org> <1516120619-1159-3-git-send-email-joro@8bytes.org>
 <20180117091853.GI28161@8bytes.org> <20180119095523.GY28161@8bytes.org>
From: Andy Lutomirski
Date: Fri, 19 Jan 2018 08:30:33 -0800
Subject: Re: [PATCH 02/16] x86/entry/32: Enter the kernel via trampoline stack
To: Joerg Roedel
Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, "H . Peter Anvin",
 X86 ML, LKML, Linux-MM, Linus Torvalds, Dave Hansen, Josh Poimboeuf,
 Juergen Gross, Peter Zijlstra, Borislav Petkov, Jiri Kosina,
 Boris Ostrovsky, Brian Gerst, David Laight, Denys Vlasenko,
 Eduardo Valentin, Greg KH, Will Deacon, "Liguori, Anthony",
 Daniel Gruss, Hugh Dickins, Kees Cook, Andrea Arcangeli, Waiman Long,
 Joerg Roedel
Content-Type: text/plain; charset="UTF-8"
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Jan 19, 2018 at 1:55 AM, Joerg Roedel wrote:
> Hey Andy,
>
> On Wed, Jan 17, 2018 at 10:10:23AM -0800, Andy Lutomirski wrote:
>> On Wed, Jan 17, 2018 at 1:18 AM, Joerg Roedel wrote:
>
>> > Just read up on vm86 mode control transfers and the stack layout then.
>> > Looks like I need to check for eflags.vm=1 and copy four more registers
>> > from/to the entry stack. Thanks for pointing that out.
>>
>> You could just copy those slots unconditionally. After all, you're
>> slowing down entries by an epic amount due to writing CR3 on with PCID
>> off, so four words copied should be entirely lost in the noise. OTOH,
>> checking for VM86 mode is just a single bt against EFLAGS.
>>
>> With the modern (rewritten a year or two ago by Brian Gerst) vm86
>> code, all the slots (those actually in pt_regs) are in the same
>> location regardless of whether we're in VM86 mode or not, but we're
>> still fiddling with the bottom of the stack.
>> Since you're controlling the switch to the kernel thread stack, you
>> can easily just write the frame to the correct location, so you
>> should not need to context switch sp1 -- you can do it sanely and
>> leave sp1 as the actual bottom of the kernel stack no matter what.
>> In fact, you could probably avoid context switching sp0, either,
>> which would be a nice cleanup.
>
> I am not sure what you mean by "not context switching sp0/sp1" ...

You're supposed to read what I meant, not what I said...

I meant that sp0 could have a genuinely constant value per CPU. That
means the entry trampoline ends up with RIP, etc. at a different place
on the entry stack depending on whether VM86 mode was in use, but the
entry trampoline code should be able to handle that. sp1 would still
have a value that varies by task, but it could simply point to the top
of the task's stack instead of being adjusted depending on whether
VM86 mode is in use. Instead, the entry trampoline would offset the
registers as needed to keep pt_regs in the right place.

I think you already figured all of that out, though :)
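For readers following the layout argument, here is a rough C model of the constant-sp0 idea. It is a hypothetical sketch, not code from this patch series or from arch/x86/entry: stacks are modeled as arrays of 32-bit words, and the names cpu_push(), trampoline_copy(), flags_vm86(), HW_WORDS, VM86_WORDS and FRAME_WORDS are made up for illustration. It assumes the 32-bit hardware frame on a privilege change is ip, cs, flags, sp, ss, with es, ds, fs, gs pushed above it when EFLAGS.VM (bit 17) is set, and it only models that hardware part of the register frame. The point it shows: with a fixed sp0 the saved ip sits at a mode-dependent distance from sp0, but at a fixed offset from the post-push stack pointer, so a single bit test is enough to copy the frame to a fixed spot below sp1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, word-granular model of 32-bit x86 kernel entry. */
#define EFLAGS_VM_BIT 17u  /* EFLAGS.VM */
#define HW_WORDS      5u   /* ip, cs, flags, sp, ss: pushed on every ring change */
#define VM86_WORDS    4u   /* es, ds, fs, gs: pushed only when leaving VM86 mode */
#define FRAME_WORDS   (HW_WORDS + VM86_WORDS)

/* C equivalent of "bt $17" on the saved EFLAGS: was the CPU in VM86 mode? */
static int flags_vm86(uint32_t flags)
{
    return (int)((flags >> EFLAGS_VM_BIT) & 1u);
}

/*
 * Model the CPU pushing its frame so that it ends at a constant per-CPU
 * sp0 (an index into stack[]).  frame[] is ip, cs, flags, sp, ss, es, ds,
 * fs, gs, lowest address first.  Returns the new stack pointer index,
 * which points at the saved ip; its distance from sp0 depends on VM86.
 */
static size_t cpu_push(uint32_t *stack, size_t sp0, const uint32_t *frame)
{
    size_t n = flags_vm86(frame[2]) ? FRAME_WORDS : HW_WORDS;

    assert(sp0 >= n);
    memcpy(&stack[sp0 - n], frame, n * sizeof(uint32_t));
    return sp0 - n;
}

/*
 * Model of the trampoline copy.  Relative to esp the layout is fixed
 * (flags is always at esp + 2), so one bit test says how many words to
 * move.  The destination is always FRAME_WORDS below sp1, so the saved
 * ip (and the rest of the frame) lands in the same place whether or not
 * the task was in VM86 mode.  Returns the index of the copied ip.
 */
static size_t trampoline_copy(const uint32_t *entry, size_t esp,
                              uint32_t *task, size_t sp1)
{
    size_t n = flags_vm86(entry[esp + 2]) ? FRAME_WORDS : HW_WORDS;

    assert(sp1 >= FRAME_WORDS);
    memcpy(&task[sp1 - FRAME_WORDS], &entry[esp], n * sizeof(uint32_t));
    return sp1 - FRAME_WORDS;
}
```

In this model the saved ip ends up 5 or 9 words below sp0, yet trampoline_copy() returns the same task-stack index in both cases. That is the property that would let sp0 stay constant per CPU while sp1 simply marks the top of the task stack.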