Date: Mon, 18 Jan 2021 12:18:19 -0300
From: Marcelo Tosatti
To: Christoph Lameter
Cc: Alex Belits, "tglx@linutronix.de", "pauld@redhat.com",
 "linux-mm@kvack.org", "frederic@kernel.org", "willy@infradead.org",
 "peterz@infradead.org", "akpm@linux-foundation.org", Juri Lelli,
 Daniel Bristot de Oliveira
Subject: Re: [RFC] tentative prctl task isolation interface
Message-ID: <20210118151819.GA83656@fuller.cnet>
References:
 <20201117202317.GA282679@fuller.cnet>
 <20201127154845.GA9100@fuller.cnet>
 <87h7p4dwus.fsf@nanos.tec.linutronix.de>
 <12ddb629555590cfd41db5b10854d95c1f154e24.camel@marvell.com>
 <20210113121544.GA16380@fuller.cnet>
 <20210114193430.GA149907@fuller.cnet>

On Fri, Jan 15, 2021 at 01:24:10PM +0000, Christoph Lameter wrote:
> On Thu, 14 Jan 2021, Marcelo Tosatti wrote:
>
> > > How does one do a oneshot flush of OS activities?
> >
> > 	ret = prctl(PR_TASK_ISOLATION_REQUEST, ISOL_F_QUIESCE, 0, 0, 0);
> > 	if (ret == -1) {
> > 		perror("prctl PR_TASK_ISOLATION_REQUEST");
> > 		exit(0);
> > 	}
> >
> > >
> > > I.e. I have a polling loop over numerous shared and I/o devices in user
> > > space and I want to make sure that the system is quite before I enter the
> > > loop.
> >
> > You could configure things in two ways: with syscalls allowed or not.
>
> Well syscalls that do not cause deferred processing like getting the time
> or determining the current cpu should be ok to use.

Yes.

> And I already said that I want the system to quiet down and allow system
> calls.

Also see that as being useful.

> Some indication that deferred actions have occurred may be useful
> by f.e. resetting the flag.

OK: will implement on next patchset.

> > 1) Add a new isolation feature ISOL_F_BLOCK_SYSCALLS (to block certain
> > syscalls) along with ISOL_F_SETUP_NOTIF (to notify upon isolation
> > breaking):
>
> Well come up with a use case for that .... I know mine.

Trying to come up with an interface that accommodates all known use
cases. Maybe passing the allowed list of syscalls is overkill, but Alex
seems interested in the notification to break isolation.

> What you propose
> could be useful for debugging for me but I would prefer the quiet down
> approach where I determine when I use some syscalls or not and will deal
> with the consequences.

Trying to cover the use cases that Alex mentioned on this thread...

> > > Features that I think may be needed:
> > >
> > > F_ISOL_QUIESCE -> quiet down now but allow all OS activities. OS
> > > activites reset flag
> > >
> > > F_ISOL_BAREMETAL_HARD -> No OS interruptions. Fault on syscalls that
> > > require such actions in the future.
> >
> > Question: why BAREMETAL ?
>
> To disinguish it from "Realtime". We want the processor for ourselves
> without anything else running on it.

OK.

> > Two comments:
> >
> > 1) HARD mode could also block activities from different CPUs that can
> > interrupt this isolated CPU (for example CPU hotplug, or increasing
> > per-CPU trace buffer size).
>
> Blocking?

Block CPU hotplug for example:

# echo 0 > /sys/devices/system/cpu/cpu3/online

returns -EBUSY with message saying: "Can't offline cpu3: reason: cpu9 is
isolated by application APP".

> The app should fail if any deferred actions are triggered as a
> result of syscalls. It would give a warning with _WARN
>
> > 2) For a type of application it is the case that certain interruptions
> > can be tolerated, as long as they do not cross certain thresholds.
> > For example, one loses the flexibility to read/write MSRs
> > on the isolated CPUs (including performance counters,
> > RDT/MBM type MSRs, frequency/power statistics) by
> > forcing a "no interruptions" mode.
>
> Does reading these really cause deferred actions by the OS? AFAICT you
> could map these into memory as well as read them without OS activities.

AFAIK you can't for MSRs.

> "Interruptions that can be tolerated".... Well that is the wild west of
> "realtime" where you can define how much of a time slice is "real" and how
> much can be use by other processes. I do not think that any of that should
> come into this API.

Understood.
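
To illustrate the MSR point above: a minimal sketch, assuming an x86
system with the msr driver loaded and sufficient privileges, of how
userspace reads an MSR that belongs to another CPU through
/dev/cpu/N/msr. The helper name read_msr() is made up for this example.
As far as I know the driver executes the read on the target CPU through
a cross-CPU function call, so each read briefly interrupts that CPU.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* for pread() under strict -std= modes */
#endif
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/*
 * read_msr() is a hypothetical helper name, not part of any patchset.
 * Returns 0 and fills *val on success, or a negative errno value.
 */
static int read_msr(int cpu, uint32_t msr, uint64_t *val)
{
	char path[64];
	ssize_t n;
	int fd, ret;

	snprintf(path, sizeof(path), "/dev/cpu/%d/msr", cpu);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -errno;

	/* The file offset selects the MSR; each MSR is read as 8 bytes. */
	n = pread(fd, val, sizeof(*val), (off_t)msr);
	if (n == (ssize_t)sizeof(*val))
		ret = 0;
	else if (n < 0)
		ret = -errno;
	else
		ret = -EIO;

	close(fd);
	return ret;
}
```

For example, read_msr(3, 0x10, &tsc) would read IA32_TIME_STAMP_COUNTER
on CPU 3. If CPU 3 is the isolated CPU, this is the kind of access that a
strict "no interruptions" mode would have to forbid.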