From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id AAF07C352AA for ; Mon, 3 Oct 2022 22:22:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229896AbiJCWWB (ORCPT ); Mon, 3 Oct 2022 18:22:01 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33970 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229668AbiJCWV5 (ORCPT ); Mon, 3 Oct 2022 18:21:57 -0400 Received: from NAM12-DM6-obe.outbound.protection.outlook.com (mail-dm6nam12on20719.outbound.protection.outlook.com [IPv6:2a01:111:f400:fe59::719]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 238D653037; Mon, 3 Oct 2022 15:21:56 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=iSrbsMt6xnkbgDIFP1lq0e7w+nwMglxtF4et3ifiJDN+LjSYJZ7nGeqemGkyvnM+sP54QL5+KReR/4rGRxElhCNKBDryRSWO7H/qNon0DfjBpg4LhrKrvctP0DDKztXde0fOkx3XVsoUasI7B/PFHkDYDR1OV79DhvVXtdEdNfG1MdCTb4YayH4Uxbs14dVBOpRkUNZURJ+n9bXPbyZjckXOZqNREmwO6MdWdD1LiYO8GKClLlaFI/0GWLclbSyJEyzuuf4ZOkGEZMe/XO96a/flCoCseX41DY6islNf0wxpSIzQeGlvDBwtE7Nb5qVtOCsWaDoyv1B/hQLef9gEaQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=Dpw2FHL2mwzQM0JW5AQf63SBBPNCd74oXpIyO+bDqAs=; b=jFn51eocjLEYCr4Yd6absV9Bv9qwPkAN4M9E143NyM85Yz+gv7Si9n3rSXG73W+OvdQcvArreic1jDos3VvFz0vN+LYimFpSymRnSmulIxMymegYumgySDsPbLMcr5G9Niq6g52MF77MF+lZQSkSFnkWAzRd3q2/kdk2Tmlsb75UsOymGnmWQyEzIRrnRYPUjIvlG8U8daxe/tpMeDBNdepgcrpacqA5HtEV+fB94hCrM9qk5F7dvG+vWxukvoN7I+9peu/V7MB5vxmp4uAPKhit9W7YnZBPBWyGC3wIMf38V5iln1ZDSJyM/zATrZ1xKiVqJ5IAb2xJ9xRS57AMCw== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=bu.edu; dmarc=pass action=none header.from=bu.edu; dkim=pass header.d=bu.edu; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bu.edu; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=Dpw2FHL2mwzQM0JW5AQf63SBBPNCd74oXpIyO+bDqAs=; b=XLAc01olul3ptAPhuQ8o2tbBtazkmV4jmii95zvFPv86r/m80zZIlgXXSzY2rb0GUesbaaD01YbIveDqhw6xNbVYSzi7XycxLVG0+z2arKBd+SFfM1RlBiqRZ0q4tanoeCKiZTsF7e4DuhrtmqN6d+kyOzIwNcMDbUjqMSlQlxrMVk0jZCZOalzVmBIEJgzTVup9XDFN4v1zuoIXA5zsM5y8OA6bpxdc5D8EJXQOiHDw+3OjA3m05PupqliC9Qu9Z7J6Me/AgpE0xbJcvw8STNSdPMeXmhI1AoPH3IZz2zhmi43ODFs44qNjufexVH2P8VPoNiTSou2YLoEUwWKNYw== Authentication-Results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=bu.edu; Received: from BL0PR03MB4129.namprd03.prod.outlook.com (2603:10b6:208:65::33) by MN2PR03MB4928.namprd03.prod.outlook.com (2603:10b6:208:1a7::12) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5676.24; Mon, 3 Oct 2022 22:21:54 +0000 Received: from BL0PR03MB4129.namprd03.prod.outlook.com ([fe80::9e2b:bf05:79ec:581]) by BL0PR03MB4129.namprd03.prod.outlook.com ([fe80::9e2b:bf05:79ec:581%4]) with mapi id 15.20.5676.030; Mon, 3 Oct 2022 22:21:53 +0000 From: Ali Raza To: linux-kernel@vger.kernel.org Cc: corbet@lwn.net, masahiroy@kernel.org, michal.lkml@markovi.net, ndesaulniers@google.com, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, hpa@zytor.com, luto@kernel.org, ebiederm@xmission.com, keescook@chromium.org, peterz@infradead.org, viro@zeniv.linux.org.uk, arnd@arndb.de, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, bristot@redhat.com, vschneid@redhat.com, pbonzini@redhat.com, jpoimboe@kernel.org, linux-doc@vger.kernel.org, linux-kbuild@vger.kernel.org, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-arch@vger.kernel.org, x86@kernel.org, rjones@redhat.com, munsoner@bu.edu, tommyu@bu.edu, drepper@redhat.com, lwoodman@redhat.com, mboydmcse@gmail.com, okrieg@bu.edu, rmancuso@bu.edu, Ali Raza Subject: [RFC UKL 00/10] Unikernel Linux (UKL) Date: Mon, 3 Oct 2022 18:21:23 -0400 Message-Id: <20221003222133.20948-1-aliraza@bu.edu> X-Mailer: git-send-email 2.37.3 Content-Transfer-Encoding: 8bit Content-Type: text/plain X-ClientProxiedBy: MN2PR04CA0031.namprd04.prod.outlook.com (2603:10b6:208:d4::44) To BL0PR03MB4129.namprd03.prod.outlook.com (2603:10b6:208:65::33) MIME-Version: 1.0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: BL0PR03MB4129:EE_|MN2PR03MB4928:EE_ X-MS-Office365-Filtering-Correlation-Id: d1f5fc72-6335-4464-6ad9-08daa58dacc9 X-LD-Processed: d57d32cc-c121-488f-b07b-dfe705680c71,ExtAddr X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: Gyr9k7zkumh12C39MMAH2hnXm4JDYfaxuQ9BXKxA5p8tA9UWvJXEbckDCPq7wzXWwaPDwG2JsLfOD/WL8jRwjUR//nrhXEUlwEcC8Q0TeqzcJWF+0OAaGNM25oy4+CWwOTR4kFZMlaPWGacc+hgHIauGNY4x4K7tz1zVDoQQ3L+rBqzyTqDW/9zr96TEyd6jFeFyi2QbDOqBeBFCInduL+3ZhIlQe4VWVVJ8nIJZEu+OOSWGWbYTlPbSJVJ3qJfiusVrC/QDhUcTymhUQEV7vg7Ash1NFEZr3NnlshdFnCbyktCMBY26ala+EDe/aDk/CGH7h/vAfPgAcUmIgRH/xAS2fjgmA9xg/xp9h8Vc3zfe00An7QbkrMtCWYMKrC6BkcjGUKcg3nZeYZD3+wER+YZFYHga+jSYKVhiPDwW9TfoYC+OPkl9e5QemHbBW7dL1xfRvHvat8hLCkvcamQZIxpLq3RZ+6ykRoQLO1XwhEYhyE5Yn3I2fOANUveqZXJAFGb6o2bXZvzZCMzQj2Cr2wMa4nxOi0ItHL/F+mVAT1vLb0Xv77F7/9KK8K7Xlvy/lPzzzh3eoeTfbCDut6MKXGWd10vT7NlazPDSBP4D5CJEbGVRdz06uAthtpM1l/4AB2SKOJ247QhPWC45AO2asKyWLYux773PZPVRkt3z4MP2sBkJcKftF5rbh6Z5Y9JOcCLwe/EYVDkH8iAb41lanYolax3Akmd81O7YR0D0hdMyIc33O51yCb11QjL2JhG+2q8ObP1WpztBLz/xqTQBo6Tp6x1O/EB786+2gaf31gbPaA7FjWDeHaxOS7Tb6S4TugsSRH+0GOrNZwM1OCXqZA== X-Forefront-Antispam-Report: CIP:255.255.255.255;CTRY:;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:BL0PR03MB4129.namprd03.prod.outlook.com;PTR:;CAT:NONE;SFS:(13230022)(4636009)(376002)(136003)(346002)(366004)(396003)(39860400002)(451199015)(8936002)(7406005)(41300700001)(6666004)(38350700002)(75432002)(66556008)(66476007)(4326008)(8676002)(1076003)(186003)(6512007)(2906002)(2616005)(66946007)(83380400001)(52116002)(6506007)(36756003)(5660300002)(7416002)(26005)(86362001)(966005)(41320700001)(6916009)(38100700002)(316002)(6486002)(478600001)(786003);DIR:OUT;SFP:1102; X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 1 X-MS-Exchange-AntiSpam-MessageData-0: =?us-ascii?Q?uvmjxrLvBWu+BlcM/9SSITiDZ2rZtYib4ZqWXrYp4NF4xFzv0RQ6G9MTwkWf?= =?us-ascii?Q?MMBQHKBBiE8TYNAtDyDVEYGiOQ3tisy709lIE3K3T7+ECfP+z/PV/TxckZac?= =?us-ascii?Q?JCCTahAwBV5EeNaJAOROG/mxWb+pFQcmM5Ey3m8bOZQpgGigEWEJn+4Y6bYj?= =?us-ascii?Q?kgt6PJI/sdOeoDX8Nq779mrMWiYf27PGr62uu0UqVGgq8mq9YA7caDHAG2Ye?= =?us-ascii?Q?q0sstnRZ6G44/eDk3TfzNp7KEJaLFYkgVcD0XrJFht0q+cEGc7IHEmC97wn+?= =?us-ascii?Q?oQrlLrFtd/DK476bBfBJLAP7sWdiEKW1Y4tSpUrN/mlc7xnkiTvvXkbsx4kq?= =?us-ascii?Q?1X8Kw/DJ8kS9JoA6dGX9QoXbL0TX/wJ/bGSSiOjYUsjZNXvyBnP1K6nxkMl7?= =?us-ascii?Q?qIXEVdPbjD/kgLNhFMtNZ4auJ1x1Q6aCvVdZ4ES+EQPndzIVRpi8pAQrsZFb?= =?us-ascii?Q?Nq6gpeUAfdBZho64eDB3gGPJdF/LOU4/JxGK7D74g32oFo6EhpnxT4rD4QJ6?= =?us-ascii?Q?N/oIpSYRKXWWvPnaJuA+aOFK5kHtwCeURwtghhco5UxA3RlNr3iTj+YasH10?= =?us-ascii?Q?4DpsHqU/sebVADpd7t4i5uOUmfB3A6nRKEsA+kkTzh0CgsLoI0Omq3LGXO3y?= =?us-ascii?Q?EbY6EHN8t13M3wxjeiVqDojZI1bq6ci5zPq14vQ+AT3ujyJVj7wqf1d+pWkP?= =?us-ascii?Q?jQKi5gHDQtwFF+fnMZAmcxnOOfYlCeoJXwWLJeBrTp/MAYbabCgu4dP2JfaH?= =?us-ascii?Q?X5DxADRHI5I891q0kjJpWm84Sgd6+o6daq7OKrVrWTe77BgY3pidovQrfRUK?= =?us-ascii?Q?kduQamRWFFySUHAF9ko2F+681B905ItW+qrPf/C3nidsG5I8x7XtNVfDqigZ?= =?us-ascii?Q?U/r3Bbjc/coTMAJfALpK0mG6rOdhI1iFL69gU//B9Bs/odStsm2DORAvUqEY?= =?us-ascii?Q?BwHJ4PGk9Xx7TdoXSGHeuRsqPDhqgvOpopRjas3FIwl7gDfuEIxLwt5xWuMm?= =?us-ascii?Q?Gv+FHRElqUsBK1KpboMM3T1QIBj4m4VOBP5+CB2bUYv0Npf5zI5I6tiY+veJ?= =?us-ascii?Q?NOB/aGiFWPOzuDaLntW7tgsbowPiginN0IOgOM+k69fKugzu6vJMhGPBx0OW?= =?us-ascii?Q?955F2Wrd6QS/JXfSnkvxPSoSfQXovNjPjnSAUzZsgmXBuknWFFLDU2BI9xVE?= =?us-ascii?Q?cB9ZJRizyv+yoiOpNjxl2E6dLc4JN5/qcO3/jkNVAWxW81/Ef32ItBmhlY0V?= =?us-ascii?Q?APxNHlsJr7lKxFCftsxmjLIUKVJ8igFZtz+buZwPhexh5S1EitEt5DTVDgjV?= =?us-ascii?Q?R0yF0CbIwVeDmZ/A3O4OqpDMJnsCndKPY0lsU/JDpOnb3oVluuTJU0dfPcrO?= =?us-ascii?Q?NAWI1Tnirj/KlI3HosFhF2aPPsKhqQQfm9GhA8d9+R0aUETF33WKE92Sgp/7?= =?us-ascii?Q?ku4ETAhb91KC2PHOk1zwKnAKxc5fL1F4cugu5pV+4awIBAPvQK8ngyfWSyYd?= =?us-ascii?Q?0R1h9QFS05xDVBFc3linHYgJmH23YWtfFOwIrnc57KnHK8ujceNw8L/97pvd?= =?us-ascii?Q?keWUoqxNv5jSslwJzhkFZdczCXOZWk7SafJJh02H?= X-OriginatorOrg: bu.edu X-MS-Exchange-CrossTenant-Network-Message-Id: d1f5fc72-6335-4464-6ad9-08daa58dacc9 X-MS-Exchange-CrossTenant-AuthSource: BL0PR03MB4129.namprd03.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-OriginalArrivalTime: 03 Oct 2022 22:21:53.6855 (UTC) X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted X-MS-Exchange-CrossTenant-Id: d57d32cc-c121-488f-b07b-dfe705680c71 X-MS-Exchange-CrossTenant-MailboxType: HOSTED X-MS-Exchange-CrossTenant-UserPrincipalName: uTpEXc9ZRbdUHwjL77z8elePwIwfSNM1PFbC6ieXhXoeaXiQuhR4dLzVZ77Muvl4 X-MS-Exchange-Transport-CrossTenantHeadersStamped: MN2PR03MB4928 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Unikernel Linux (UKL) is a research project aimed at integrating application specific optimizations to the Linux kernel. This RFC aims to introduce this research to the community. Any feedback regarding the idea, goals, implementation and research is highly appreciated. Unikernels are specialized operating systems where an application is linked directly with the kernel and runs in supervisor mode. This allows the developers to implement application specific optimizations to the kernel, which can be directly invoked by the application (without going through the syscall path). An application can control scheduling and resource management and directly access the hardware. Application and the kernel can be co-optimized, e.g., through LTO, PGO, etc. All of these optimizations, and others, provide applications with huge performance benefits over general purpose operating systems. Linux is the de-facto operating system of today. Applications depend on its battle tested code base, large developer community, support for legacy code, a huge ecosystem of tools and utilities, and a wide range of compatible hardware and device drivers. Linux also allows some degree of application specific optimizations through build time config options, runtime configuration, and recently through eBPF. But still, there is a need for even more fine-grained application specific optimizations, and some developers resort to kernel bypass techniques. Unikernel Linux (UKL) aims to get the best of both worlds by bringing application specific optimizations to the Linux ecosystem. This way, unmodified applications can keep getting the benefits of Linux while taking advantage of the unikernel-style optimizations. Optionally, applications can be modified to invoke deeper optimizations. There are two steps to unikernel-izing Linux, i.e., first, equip Linux with a unikernel model, and second, actually use that model to implement application specific optimizations. This patch focuses on the first part. Through this patch, unmodified applications can be built as Linux unikernels, albeit with only modest performance advantages. Like unikernels, UKL would allow an application to be statically linked into the kernel and executed in supervisor mode. However, UKL preserves most of the invariants and design of Linux, including a separate page-able application portion of the address space and a pinned kernel portion, the ability to run multiple processes, and distinct execution modes for application and kernel code. Kernel execution mode and application execution mode are different, e.g., the application execution mode allows application threads to be scheduled, handle signals, etc., which do not apply to kernel threads. Application built as a Linux unikernel will have its text and data loaded with the kernel at boot time, while the rest of the address space would remain unchanged. These applications invoke the system call functionality through a function call into the kernel system call entry point instead of through the syscall assembly instruction. UKL would support a normal userspace so the UKL application can be started, managed, profiled, etc., using normal command line utilities. Once Linux has a unikernel model, different application specific optimizations are possible. We have tried a few, e.g., fast system call transitions, shared stacks to allow LTO, invoking kernel functions directly, etc. We have seen huge performance benefits, details of which are not relevant to this patch and can be found in our paper. (https://arxiv.org/pdf/2206.00789.pdf) UKL differs significantly from previous projects, e.g., UML, KML and LKL. User Mode Linux (UML) is a virtual machine monitor implemented on syscall interface, a very different goal from UKL. Kernel Mode Linux (KML) allows applications to run in kernel mode and replaces syscalls with function calls. While KML stops there, UKL goes further. UKL links applications and kernel together which allows further optimizations e.g., fast system call transitions, shared stacks to allow LTO, invoking kernel functions directly etc. Details can be found in the paper linked above. Linux Kernel Library (LKL) harvests arch independent code from Linux, takes it to userspace as a library to be linked with applications. A host needs to provide arch dependent functionality. This model is very different from UKL. A detailed discussion of related work is present in the paper linked above. See samples/ukl for a simple TCP echo server example which can be built as a normal user space application and also as a UKL application. In the Linux config options, a path to the compiled and partially linked application binary can be specified. Kernel built with UKL enabled will search this location for the binary and link with the kernel. Applications and required libraries need to be compiled with -mno-red-zone -mcmodel=kernel flags because kernel mode execution can trample on application red zones and in order to link with the kernel and be loaded in the high end of the address space, application should have the correct memory model. Examples of other applications like Redis, Memcached etc along with glibc and libgcc etc., can be found at https://github.com/unikernelLinux/ukl List of authors and contributors: ================================= Ali Raza - aliraza@bu.edu Thomas Unger - tommyu@bu.edu Matthew Boyd - mboydmcse@gmail.com Eric Munson - munsoner@bu.edu Parul Sohal - psohal@bu.edu Ulrich Drepper - drepper@redhat.com Richard W.M. Jones - rjones@redhat.com Daniel Bristot de Oliveira - bristot@kernel.org Larry Woodman - lwoodman@redhat.com Renato Mancuso - rmancuso@bu.edu Jonathan Appavoo - jappavoo@bu.edu Orran Krieger - okrieg@bu.edu Ali Raza (9): kbuild: Add sections and symbols to linker script for UKL support x86/boot: Load the PT_TLS segment for Unikernel configs sched: Add task_struct tracking of kernel or application execution x86/entry: Create alternate entry path for system calls x86/uaccess: Make access_ok UKL aware x86/fault: Skip checking kernel mode access to user address space for UKL x86/signal: Adjust signal handler register values and return frame exec: Make exec path for starting UKL application Kconfig: Add config option for enabling and sample for testing UKL Eric B Munson (1): exec: Give userspace a method for starting UKL process Documentation/index.rst | 1 + Documentation/ukl/ukl.rst | 104 +++++++++++++++++++++++ Kconfig | 2 + Makefile | 4 + arch/x86/boot/compressed/misc.c | 3 + arch/x86/entry/entry_64.S | 133 ++++++++++++++++++++++++++++++ arch/x86/include/asm/elf.h | 9 +- arch/x86/include/asm/uaccess.h | 8 ++ arch/x86/kernel/process.c | 13 +++ arch/x86/kernel/process_64.c | 49 ++++++++--- arch/x86/kernel/signal.c | 22 +++-- arch/x86/kernel/vmlinux.lds.S | 98 ++++++++++++++++++++++ arch/x86/mm/fault.c | 7 +- fs/binfmt_elf.c | 28 +++++++ fs/exec.c | 75 +++++++++++++---- include/asm-generic/sections.h | 4 + include/asm-generic/vmlinux.lds.h | 32 ++++++- include/linux/sched.h | 26 ++++++ kernel/Kconfig.ukl | 41 +++++++++ samples/ukl/Makefile | 16 ++++ samples/ukl/README | 17 ++++ samples/ukl/syscall.S | 28 +++++++ samples/ukl/tcp_server.c | 99 ++++++++++++++++++++++ scripts/mod/modpost.c | 4 + 24 files changed, 785 insertions(+), 38 deletions(-) create mode 100644 Documentation/ukl/ukl.rst create mode 100644 kernel/Kconfig.ukl create mode 100644 samples/ukl/Makefile create mode 100644 samples/ukl/README create mode 100644 samples/ukl/syscall.S create mode 100644 samples/ukl/tcp_server.c base-commit: 4fe89d07dcc2804c8b562f6c7896a45643d34b2f -- 2.21.3