From: Thomas Gleixner
To: Ming Lei
Cc: linux-kernel@vger.kernel.org, Ingo Molnar, Peter Zijlstra, Peter Xu, Juri Lelli
Subject: Re: [PATCH] sched/isolation: isolate from handling managed interrupt
Date: Fri, 17 Jan 2020 01:23:03 +0100
Message-ID: <87d0bjdj88.fsf@nanos.tec.linutronix.de>
In-Reply-To: <20200116215814.GA24827@ming.t460p>
References: <20200116094806.25372-1-ming.lei@redhat.com> <875zhbwqmm.fsf@nanos.tec.linutronix.de> <20200116215814.GA24827@ming.t460p>

Ming,

Ming Lei writes:
> On Thu, Jan 16, 2020 at 01:08:17PM +0100, Thomas Gleixner wrote:
>> Ming Lei writes:
>> > -	ret = chip->irq_set_affinity(data, mask, force);
>> > +	zalloc_cpumask_var(&tmp_mask, GFP_ATOMIC);
>>
>> I clearly told you:
>>
>>   "That's wrong. This code is called with interrupts disabled, so
>>    GFP_KERNEL is wrong. And NO, we won't do a GFP_ATOMIC allocation
>>    here."
>>
>> Is that last sentence unclear in any way?
>
> Yeah, it is clear.
>
> But GFP_ATOMIC is usually allowed in atomic context, could you
> explain a bit why it can't be done in this case?

You could have asked that question before sending this patch :)

> We still can fall back to the current behavior if the allocation fails.

Allocation failure is not the main concern here.

In general we avoid atomic allocations wherever we can, but even more so
in contexts which are fully atomic even on PREEMPT_RT, simply because
PREEMPT_RT cannot support that.

Regular spin_lock(); GFP_ATOMIC; spin_unlock(); sections are not a
problem because spinlocks turn into sleepable 'spinlocks' on RT and are
preemptible.

irq_desc::lock is a raw spinlock, which is a true spinlock on RT for
obvious reasons, and there any form of memory allocation is a NONO.

> Or could you suggest another way to solve the issue if GFP_ATOMIC
> can't be done?

Just use the same mechanism as irq_setup_affinity() for now. Not pretty,
but it does the job. I have a plan for how to avoid that, but that's a
larger surgery.
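For reference, the mechanism in question is irq_setup_affinity()'s use of a
static scratch cpumask protected by a static raw spinlock, which avoids any
allocation while irq_desc::lock is held. A minimal sketch of how that could
look when applied to the managed-interrupt filtering discussed in this thread
(the function name is illustrative only, and HK_FLAG_MANAGED_IRQ is the flag
introduced by the patch under review):

#include <linux/cpumask.h>
#include <linux/irq.h>
#include <linux/sched/isolation.h>
#include <linux/spinlock.h>

/*
 * Sketch only: program a managed interrupt's affinity without any
 * memory allocation.  As in irq_setup_affinity(), a static scratch
 * cpumask protected by a static raw spinlock is used instead of
 * zalloc_cpumask_var().  The raw spinlock nests under irq_desc::lock
 * (also a raw spinlock), so this remains valid on PREEMPT_RT.
 */
static int managed_irq_program_affinity(struct irq_data *data,
					const struct cpumask *mask,
					bool force)
{
	static DEFINE_RAW_SPINLOCK(tmp_mask_lock);
	static struct cpumask tmp_mask;
	struct irq_chip *chip = irq_data_get_irq_chip(data);
	const struct cpumask *hk_mask = housekeeping_cpumask(HK_FLAG_MANAGED_IRQ);
	const struct cpumask *prog_mask = mask;
	int ret;

	raw_spin_lock(&tmp_mask_lock);
	if (irqd_affinity_is_managed(data) &&
	    cpumask_intersects(mask, hk_mask)) {
		/* Drop the isolated CPUs, keep only the housekeeping ones. */
		cpumask_and(&tmp_mask, mask, hk_mask);
		prog_mask = &tmp_mask;
	}
	ret = chip->irq_set_affinity(data, prog_mask, force);
	raw_spin_unlock(&tmp_mask_lock);

	return ret;
}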
Let me give you a few comments on the rest of the patch while at it.

> --- a/kernel/irq/manage.c
> +++ b/kernel/irq/manage.c
> @@ -20,6 +20,7 @@
>  #include
>  #include
>  #include
> +#include <linux/sched/isolation.h>

Can you please move this include next to the other linux/sched/ one?

> @@ -212,12 +213,29 @@ int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
>  {
>  	struct irq_desc *desc = irq_data_to_desc(data);
>  	struct irq_chip *chip = irq_data_get_irq_chip(data);
> +	const struct cpumask *housekeeping_mask =
> +		housekeeping_cpumask(HK_FLAG_MANAGED_IRQ);

Bah. This is unreadable garbage. What's wrong with defining the variable
and retrieving the pointer later on? Especially as this is only required
when there is an actual managed interrupt.

> +	/*
> +	 * Userspace can't change managed irq's affinity, make sure that
> +	 * isolated CPU won't be selected as the effective CPU if this
> +	 * irq's affinity includes at least one housekeeping CPU.
> +	 *
> +	 * This way guarantees that isolated CPU won't be interrupted if
> +	 * IO isn't submitted from isolated CPU.

This comment is more confusing than helpful. What about:

	/*
	 * If this is a managed interrupt, check whether the requested
	 * affinity mask intersects with a housekeeping CPU. If so, then
	 * remove the isolated CPUs from the mask and just keep the
	 * housekeeping CPU(s). This prevents the affinity setter from
	 * routing the interrupt to an isolated CPU, so that I/O
	 * submitted from a housekeeping CPU does not cause interrupts
	 * on an isolated one.
	 *
	 * If the masks do not intersect, then keep the requested mask.
	 * The isolated target CPUs only receive interrupts when the
	 * I/O operation was submitted directly from them.
	 */

Or something to that effect. Hmm?

> +	 */
> +	if (irqd_affinity_is_managed(data) && tmp_mask &&
> +	    cpumask_intersects(mask, housekeeping_mask))
> +		cpumask_and(tmp_mask, mask, housekeeping_mask);

Now, while writing the above comment, the following interesting scenario
came to my mind:

   Housekeeping CPUs 0-2, isolated CPUs 3-7.

   The device has 4 queues, so the spreading results in:

     q0: CPU 0/1, q1: CPU 2/3, q2: CPU 4/5, q3: CPU 6/7

q1 is the interesting one. It's the one which gets the housekeeping mask
applied, so CPU 3 is removed.

If CPU 2 is offline when the mask is applied, then the result of the
affinity setting operation is going to be an error code, because the
resulting mask is empty.

Admittedly this is a weird corner case, but it's bound to happen, and the
resulting bug report is going to be hard to decode unless the reporter
can provide a 100% reproducer or a very accurate description of the
scenario.

Thanks,

        tglx
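As for the corner case above: one way to harden the masking is to keep the
reduced mask only if it still contains at least one online CPU, and otherwise
fall back to the requested mask so the affinity setter does not fail on an
empty mask. A sketch under that assumption, with an illustrative helper name
and a caller-provided scratch cpumask:

#include <linux/cpumask.h>
#include <linux/sched/isolation.h>

/*
 * Sketch: choose the mask to program for a managed interrupt.  Reduce
 * the requested mask to the housekeeping CPUs, but only if the
 * reduction still contains an online CPU; otherwise keep the requested
 * mask (e.g. q1 = CPU 2/3 above with CPU 2 offline would otherwise
 * yield an empty mask and an error from the affinity setter).
 */
static const struct cpumask *
managed_irq_pick_mask(const struct cpumask *mask, struct cpumask *scratch)
{
	const struct cpumask *hk_mask = housekeeping_cpumask(HK_FLAG_MANAGED_IRQ);

	cpumask_and(scratch, mask, hk_mask);
	if (cpumask_intersects(scratch, cpu_online_mask))
		return scratch;

	/* All housekeeping CPUs in the request are offline: keep the request. */
	return mask;
}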