Message-ID: <3fe6a794-a578-3564-acec-d1f4684abeee@marvell.com>
Date: Fri, 15 Jan 2021 10:35:14 -0800
From: Alex Belits
Subject: Re: [RFC] tentative prctl task isolation interface
To: Christoph Lameter, Marcelo Tosatti
CC: tglx@linutronix.de, pauld@redhat.com, linux-mm@kvack.org,
 frederic@kernel.org, willy@infradead.org, peterz@infradead.org,
 akpm@linux-foundation.org, Juri Lelli, Daniel Bristot de Oliveira
References: <20201117180356.GT29991@casper.infradead.org>
 <20201117202317.GA282679@fuller.cnet>
 <20201127154845.GA9100@fuller.cnet>
 <87h7p4dwus.fsf@nanos.tec.linutronix.de>
 <12ddb629555590cfd41db5b10854d95c1f154e24.camel@marvell.com>
 <20210113121544.GA16380@fuller.cnet>
 <20210114193430.GA149907@fuller.cnet>

On 1/15/21 05:24, Christoph Lameter wrote:
> On Thu, 14 Jan 2021, Marcelo Tosatti wrote:
>
>>> How does one do a oneshot flush of OS activities?
>>
>>     ret = prctl(PR_TASK_ISOLATION_REQUEST, ISOL_F_QUIESCE, 0, 0, 0);
>>     if (ret == -1) {
>>         perror("prctl PR_TASK_ISOLATION_REQUEST");
>>         exit(0);
>>     }
>>
>>>
>>> I.e. I have a polling loop over numerous shared and I/o devices in user
>>> space and I want to make sure that the system is quite before I enter the
>>> loop.
>>
>> You could configure things in two ways: with syscalls allowed or not.
>
> Well syscalls that do not cause deferred processing like getting the time
> or determining the current cpu should be ok to use.

Some of those syscalls go through the vdso and don't enter the kernel --
nothing specific is necessary to allow them, and it would be pointless
and difficult to prevent them.

For syscalls that do enter the kernel, it's often difficult to predict
whether they will or won't cause deferred processing, so I am afraid it
won't be possible to provide a "safe" class of syscalls for this purpose
without ending up with something minimal like reading /sys and /proc.

Right now isolation only "allows" syscalls that exit isolation. It may
be possible to have the system set up a filter (allowing a few safe
things like reading /proc) and let the user expand it by adding
combinations of syscall / file descriptor. If some device is known to
process operations safely, the user can open it and mark the file
descriptor as allowed, say, for reading.

> And I already said that I want the system to quiet down and allow system
> calls. Some indication that deferred actions have occurred may be useful
> by f.e. resetting the flag.

I think it should be possible to process a syscall and, if any deferred
action occurred, exit isolation on return to userspace. Then there is
the question of how userspace should be notified about isolation being
lost. Normally this happens with a signal, but that is useful if we want
the syscall to fail with EINTR, not to succeed. Make sure that the
signal arrives after a successful syscall return but before the deferred
action happens? Sounds convoluted.
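To illustrate the vdso point made earlier in this reply, here is a
minimal sketch, not taken from any posted patch. It assumes x86-64 with
a typical glibc, where clock_gettime(CLOCK_MONOTONIC) and sched_getcpu()
are normally serviced in userspace; on other architectures or
clocksources they may fall back to a real syscall, so this should be
verified (for example with strace) on the target system. The helper
names now_ns() and current_cpu() are made up for the example.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdint.h>
#include <time.h>

/*
 * Monotonic time in nanoseconds, or 0 on failure.
 * Assumption: on the target system this call is handled by the vDSO
 * and does not trap into the kernel.
 */
static uint64_t now_ns(void)
{
    struct timespec ts;

    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return 0;
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * CPU the caller is currently running on, or -1 on failure.
 * Assumption: glibc resolves this without entering the kernel.
 */
static int current_cpu(void)
{
    return sched_getcpu();
}
```

An isolated polling loop could call these freely for timestamps and
sanity checks, provided the assumption about the vDSO holds.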

Maybe reflecting isolation status in the vdso and having the user check
it there would be a good solution.

When I worked on my implementation I encountered both the problem of
interacting with the rest of the system from an isolated task (at least
for such simple things as logging) and the problem of handling
enter/exit from isolation on a system where a task can be interrupted
soon after entering isolation by various events that were still in
progress on other CPUs. I ended up implementing a manager/helper task
that talks to tasks over a socket (when they are not isolated) and over
ring buffers in shared memory (when they are isolated). While the
current implementation is rather limited, the intention is to delegate
to it everything that an isolated task either can't do at all (like
writing logs) or that would be cumbersome to implement in the task
itself (like monitoring the state of the task, or determining the
presence of deferred work after the task returned to userspace).

It would be great if the complexity and amount of functionality of that
manager/helper task could be reduced; however, I believe that having
such a task is a legitimate way of implementing things that otherwise
would require additional functionality in the kernel.

>
>> 1) Add a new isolation feature ISOL_F_BLOCK_SYSCALLS (to block certain
>> syscalls) along with ISOL_F_SETUP_NOTIF (to notify upon isolation
>> breaking):
>
> Well come up with a use case for that .... I know mine. What you propose
> could be useful for debugging for me but I would prefer the quiet down
> approach where I determine when I use some syscalls or not and will deal
> with the consequences.

For my purposes, breaking isolation on syscalls and sending
notifications about isolation breaking is a very useful approach -- this
is why I kept it exactly as it was in the original implementation by
Chris Metcalf.
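The ring buffer channel between an isolated task and the helper,
mentioned earlier in this reply, can be sketched roughly as below. This
is only an illustration under stated assumptions, not the code from my
patch set: the names (struct ring, ring_push, ring_pop), the slot sizes
and the fixed-size text messages are all made up, and in real use the
structure would live in memory shared between the two processes (for
example a MAP_SHARED mapping set up before entering isolation). It is a
single-producer, single-consumer queue using C11 atomics, so neither
side needs a syscall or a lock.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define RING_SLOTS 64u   /* hypothetical size; must be a power of two */
#define SLOT_BYTES 128u  /* hypothetical fixed message size */

struct ring {
    _Atomic unsigned int head;  /* next slot to write; producer only */
    _Atomic unsigned int tail;  /* next slot to read; consumer only */
    char slot[RING_SLOTS][SLOT_BYTES];
};

static void ring_init(struct ring *r)
{
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Producer (isolated task): never blocks; returns false if full. */
static bool ring_push(struct ring *r, const char *msg)
{
    unsigned int head = atomic_load_explicit(&r->head, memory_order_relaxed);
    unsigned int tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    char *dst;

    if (head - tail == RING_SLOTS)
        return false;
    dst = r->slot[head & (RING_SLOTS - 1)];
    strncpy(dst, msg, SLOT_BYTES - 1);
    dst[SLOT_BYTES - 1] = '\0';
    /* Publish the slot only after its contents are written. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Consumer (helper task): returns false if there is nothing to read. */
static bool ring_pop(struct ring *r, char *out, size_t out_len)
{
    unsigned int tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    unsigned int head = atomic_load_explicit(&r->head, memory_order_acquire);

    if (head == tail || out_len == 0)
        return false;
    strncpy(out, r->slot[tail & (RING_SLOTS - 1)], out_len - 1);
    out[out_len - 1] = '\0';
    /* Release the slot back to the producer after copying it out. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}
```

With something like this the isolated task only ever touches memory,
and the helper on another core does the actual write() to a log file.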

In the applications that I intend to use isolation for, the primary
concern is consistent timing for code running in userspace, so syscalls
should only be issued when the task is specifically not in isolated
mode. If the program issues a syscall by mistake (and that may happen
when some libraries are used, or when thread synchronization primitives
are kept from a non-isolated version of the program, even though
isolated tasks are not supposed to use those), it means not only that
deferred work may cause a delay in the future, but also that additional
time is spent in the kernel. This should be immediately visible to the
developer, and the best way to do that is to break isolation on the
syscall immediately.

>
>>
>>> Features that I think may be needed:
>>>
>>> F_ISOL_QUIESCE -> quiet down now but allow all OS activities. OS
>>> activites reset flag
>>>
>>> F_ISOL_BAREMETAL_HARD -> No OS interruptions. Fault on syscalls that
>>> require such actions in the future.
>>
>> Question: why BAREMETAL ?
>
> To disinguish it from "Realtime". We want the processor for ourselves
> without anything else running on it.
>
>> Two comments:
>>
>> 1) HARD mode could also block activities from different CPUs that can
>> interrupt this isolated CPU (for example CPU hotplug, or increasing
>> per-CPU trace buffer size).
>
> Blocking? The app should fail if any deferred actions are triggered as a
> result of syscalls. It would give a warning with _WARN

There are many supposedly innocent things, nowhere near the scale of CPU
hotplug, that happen in a system and result in synchronization
implemented as an IPI to every online CPU. We should consider them to be
an ordinary occurrence, so there is a choice:

1. Ignore them completely and allow them in isolated mode. This will
delay userspace with no indication and no isolation breaking.

2. Allow them, and notify userspace afterwards (through vdso or through
userspace helper/manager over shared memory).
This may be useful in those rare situations when the consequences of the
delay can be mitigated afterwards.

3. Make them break isolation, with userspace being notified normally
(ex: with a signal in the current implementation). I guess this can be
used if most of the causes are somehow eliminated.

4. Prevent them from reaching the target CPU, and make sure that
whatever synchronization they are intended to cause happens when the
intended target CPU enters the kernel later. Since we may have to
synchronize things like code modification, some of this synchronization
has to happen very early on kernel entry.

I am most interested in (4), so this is what was implemented in my
version of the patch (and currently I am trying to achieve completeness
and, if possible, elegance of the implementation).

I guess, if we want to add more controls, we can allow the user to
choose any of those four options, or a subset of them. In my opinion, if
(4) is available, and the only additional cost is the synchronization
time spent in the isolation breaking procedure, there is not much need
for the other three. Without (4) I don't think the goal of providing a
consistent, interruption-free environment is achieved at all, so not
implementing it would be very bad.

>> 2) For a type of application it is the case that certain interruptions
>> can be tolerated, as long as they do not cross certain thresholds.
>> For example, one loses the flexibility to read/write MSRs
>> on the isolated CPUs (including performance counters,
>> RDT/MBM type MSRs, frequency/power statistics) by
>> forcing a "no interruptions" mode.
>
> Does reading these really cause deferred actions by the OS? AFAICT you
> could map these into memory as well as read them without OS activities.

Access to those is hardware/architecture-specific, and in many cases,
indeed, there is no need to issue a syscall at all.
However, for many applications the model with a helper task performing
interactions with the OS on a different core and exchanging data over
shared memory may be sufficient, and it will also provide a clear
separation between operations that do require consistent timing and
those that don't.

>
> "Interruptions that can be tolerated".... Well that is the wild west of
> "realtime" where you can define how much of a time slice is "real" and how
> much can be use by other processes. I do not think that any of that should
> come into this API.
>

To be honest, I have no idea what can and cannot be tolerated by
applications other than the ones I am familiar with. The applications
that I know require no interruptions at all, so I want to implement
that.

I assume someone already uses existing CPU isolation for the purpose of
providing a "nearly interrupt-less" environment. I can imagine something
like a task controlling a large, slow-updating LED display by
bit-banging a strictly timed long serial message representing a frame or
a frame update. If interrupted, it may, depending on the protocol,
corrupt the state of a single LED or fail to update until the end of the
screen, but the next start of message will reset the state, and
everything will work until the next interrupt. Maybe there are more
realistic or useful examples.

--
Alex