All of lore.kernel.org
 help / color / mirror / Atom feed
From: Marc Zyngier <maz@kernel.org>
To: John Garry <john.garry@huawei.com>
Cc: Ming Lei <ming.lei@redhat.com>, <tglx@linutronix.de>,
	<chenxiang66@hisilicon.com>, <bigeasy@linutronix.de>,
	<linux-kernel@vger.kernel.org>, <hare@suse.com>, <hch@lst.de>,
	<axboe@kernel.dk>, <bvanassche@acm.org>, <peterz@infradead.org>,
	<mingo@redhat.com>
Subject: Re: [PATCH RFC 1/1] genirq: Make threaded handler use irq affinity  for managed interrupt
Date: Tue, 10 Dec 2019 18:32:35 +0000	[thread overview]
Message-ID: <5caa8414415ab35e74662ac0a30bb4ac@www.loen.fr> (raw)
In-Reply-To: <06d1e2ff-9ec7-2262-25a0-4503cb204b0b@huawei.com>

On 2019-12-10 12:05, John Garry wrote:
> On 10/12/2019 11:36, Marc Zyngier wrote:
>> On 2019-12-10 10:59, John Garry wrote:
>>>>>
>>>>> There is no lockup, just a potential performance boost in this 
>>>>> change.
>>>>>
>>>>> My colleague Xiang Chen can provide specifics of the test, as he 
>>>>> is
>>>>> the one running it.
>>>>>
>>>>> But one key bit of info - which I did not think most relevant 
>>>>> before
>>>>> - that is we have 2x SAS controllers running the throughput test 
>>>>> on
>>>>> the same host.
>>>>>
>>>>> As such, the completion queue interrupts would be spread 
>>>>> identically
>>>>> over the CPUs for each controller. I notice that ARM GICv3 ITS
>>>>> interrupt controller (which we use) does not use the generic irq
>>>>> matrix allocator, which I think would really help with this.
>>>>>
>>>>> Hi Marc,
>>>>>
>>>>> Is there any reason for which we couldn't utilise of the generic 
>>>>> irq
>>>>> matrix allocator for GICv3?
>>>>
>>>
>>> Hi Marc,
>>>
>>>> For a start, the ITS code predates the matrix allocator by about 
>>>> three
>>>> years. Also, my understanding of this allocator is that it allows
>>>> x86 to cope with a very small number of possible interrupt vectors
>>>> per CPU. The ITS doesn't have such issue, as:
>>>> 1) the namespace is global, and not per CPU
>>>> 2) the namespace is *huge*
>>>> Now, what property of the matrix allocator is the ITS code 
>>>> missing?
>>>> I'd be more than happy to improve it.
>>>
>>> I think specifically the property that the matrix allocator will 
>>> try
>>> to find a CPU for irq affinity which "has the lowest number of 
>>> managed
>>> IRQs allocated" - I'm quoting the comment on 
>>> matrix_find_best_cpu_managed().
>> But that decision is due to allocation constraints. You can have at 
>> most
>> 256 interrupts per CPU, so the allocator tries to balance it.
>> On the contrary, the ITS does care about how many interrupt target 
>> any
>> given CPU. The whole 2^24 interrupt namespace can be thrown at a 
>> single
>> CPU.
>>
>>> The ITS code will make the lowest online CPU in the affinity mask 
>>> the
>>> target CPU for the interrupt, which may result in some CPUs 
>>> handling
>>> so many interrupts.
>> If what you want is for the *default* affinity to be spread around,
>> that should be achieved pretty easily. Let me have a think about how
>> to do that.
>
> Cool, I anticipate that it should help my case.
>
> I can also seek out some NVMe cards to see how it would help a more
> "generic" scenario.

Can you give the following a go? It probably has all kind of warts on
top of the quality debug information, but I managed to get my D05 and
a couple of guests to boot with it. It will probably eat your data,
so use caution! ;-)

Thanks,

         M.

diff --git a/drivers/irqchip/irq-gic-v3-its.c 
b/drivers/irqchip/irq-gic-v3-its.c
index e05673bcd52b..301ee3bc0602 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -177,6 +177,8 @@ static DEFINE_IDA(its_vpeid_ida);
  #define gic_data_rdist_rd_base()	(gic_data_rdist()->rd_base)
  #define gic_data_rdist_vlpi_base()	(gic_data_rdist_rd_base() + 
SZ_128K)

+static DEFINE_PER_CPU(atomic_t, cpu_lpi_count);
+
  static u16 get_its_list(struct its_vm *vm)
  {
  	struct its_node *its;
@@ -1287,42 +1289,76 @@ static void its_unmask_irq(struct irq_data *d)
  	lpi_update_config(d, 0, LPI_PROP_ENABLED);
  }

+static int its_pick_target_cpu(struct its_device *its_dev, const 
struct cpumask *cpu_mask)
+{
+	unsigned int cpu = nr_cpu_ids, tmp;
+	int count = S32_MAX;
+
+	for_each_cpu_and(tmp, cpu_mask, cpu_online_mask) {
+		int this_count = per_cpu(cpu_lpi_count, tmp).counter;
+		if (this_count < count) {
+			cpu = tmp;
+		        count = this_count;
+		}
+	}
+
+	return cpu;
+}
+
  static int its_set_affinity(struct irq_data *d, const struct cpumask 
*mask_val,
  			    bool force)
  {
-	unsigned int cpu;
-	const struct cpumask *cpu_mask = cpu_online_mask;
  	struct its_device *its_dev = irq_data_get_irq_chip_data(d);
-	struct its_collection *target_col;
+	int ret = IRQ_SET_MASK_OK_DONE;
  	u32 id = its_get_event_id(d);
+	cpumask_var_t tmpmask;

  	/* A forwarded interrupt should use irq_set_vcpu_affinity */
  	if (irqd_is_forwarded_to_vcpu(d))
  		return -EINVAL;

-       /* lpi cannot be routed to a redistributor that is on a foreign 
node */
-	if (its_dev->its->flags & ITS_FLAGS_WORKAROUND_CAVIUM_23144) {
-		if (its_dev->its->numa_node >= 0) {
-			cpu_mask = cpumask_of_node(its_dev->its->numa_node);
-			if (!cpumask_intersects(mask_val, cpu_mask))
-				return -EINVAL;
+	if (!alloc_cpumask_var(&tmpmask, GFP_KERNEL))
+		return -ENOMEM;
+
+	cpumask_and(tmpmask, mask_val, cpu_online_mask);
+
+	if (its_dev->its->numa_node >= 0)
+		cpumask_and(tmpmask, tmpmask, 
cpumask_of_node(its_dev->its->numa_node));
+
+	if (cpumask_empty(tmpmask)) {
+		/* LPI cannot be routed to a redistributor that is on a foreign node 
*/
+		if (its_dev->its->flags & ITS_FLAGS_WORKAROUND_CAVIUM_23144) {
+			ret = -EINVAL;
+			goto out;
  		}
+
+		cpumask_copy(tmpmask, cpu_online_mask);
  	}

-	cpu = cpumask_any_and(mask_val, cpu_mask);
+	if (!cpumask_test_cpu(its_dev->event_map.col_map[id], tmpmask)) {
+		struct its_collection *target_col;
+		int cpu;

-	if (cpu >= nr_cpu_ids)
-		return -EINVAL;
+		cpu = its_pick_target_cpu(its_dev, tmpmask);
+		if (cpu >= nr_cpu_ids) {
+			ret = -EINVAL;
+			goto out;
+		}

-	/* don't set the affinity when the target cpu is same as current one 
*/
-	if (cpu != its_dev->event_map.col_map[id]) {
+		pr_info("IRQ%d CPU%d -> CPU%d\n",
+			d->irq, its_dev->event_map.col_map[id], cpu);
+		atomic_inc(per_cpu_ptr(&cpu_lpi_count, cpu));
+		atomic_dec(per_cpu_ptr(&cpu_lpi_count,
+				       its_dev->event_map.col_map[id]));
  		target_col = &its_dev->its->collections[cpu];
  		its_send_movi(its_dev, target_col, id);
  		its_dev->event_map.col_map[id] = cpu;
  		irq_data_update_effective_affinity(d, cpumask_of(cpu));
  	}

-	return IRQ_SET_MASK_OK_DONE;
+out:
+	free_cpumask_var(tmpmask);
+	return ret;
  }

  static u64 its_irq_get_msi_base(struct its_device *its_dev)
@@ -2773,22 +2809,28 @@ static int its_irq_domain_activate(struct 
irq_domain *domain,
  {
  	struct its_device *its_dev = irq_data_get_irq_chip_data(d);
  	u32 event = its_get_event_id(d);
-	const struct cpumask *cpu_mask = cpu_online_mask;
-	int cpu;
+	int cpu = nr_cpu_ids;

-	/* get the cpu_mask of local node */
-	if (its_dev->its->numa_node >= 0)
-		cpu_mask = cpumask_of_node(its_dev->its->numa_node);
+	/* Find the least loaded CPU on the local node */
+	if (its_dev->its->numa_node >= 0) {
+		cpu = its_pick_target_cpu(its_dev,
+					  cpumask_of_node(its_dev->its->numa_node));
+		if (cpu < 0)
+			return cpu;

-	/* Bind the LPI to the first possible CPU */
-	cpu = cpumask_first_and(cpu_mask, cpu_online_mask);
-	if (cpu >= nr_cpu_ids) {
-		if (its_dev->its->flags & ITS_FLAGS_WORKAROUND_CAVIUM_23144)
+		if (cpu >= nr_cpu_ids &&
+		    (its_dev->its->flags & ITS_FLAGS_WORKAROUND_CAVIUM_23144))
  			return -EINVAL;
+	}

-		cpu = cpumask_first(cpu_online_mask);
+	if (cpu >= nr_cpu_ids) {
+		cpu = its_pick_target_cpu(its_dev, cpu_online_mask);
+		if (cpu < 0)
+			return cpu;
  	}

+	pr_info("picked CPU%d IRQ%d\n", cpu, d->irq);
+	atomic_inc(per_cpu_ptr(&cpu_lpi_count, cpu));
  	its_dev->event_map.col_map[event] = cpu;
  	irq_data_update_effective_affinity(d, cpumask_of(cpu));

@@ -2803,6 +2845,8 @@ static void its_irq_domain_deactivate(struct 
irq_domain *domain,
  	struct its_device *its_dev = irq_data_get_irq_chip_data(d);
  	u32 event = its_get_event_id(d);

+	atomic_dec(per_cpu_ptr(&cpu_lpi_count,
+			       its_dev->event_map.col_map[event]));
  	/* Stop the delivery of interrupts */
  	its_send_discard(its_dev, event);
  }

-- 
Jazz is not dead. It just smells funny...

  reply	other threads:[~2019-12-10 18:32 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-06 14:35 [PATCH RFC 0/1] Threaded handler uses irq affinity for when the interrupt is managed John Garry
2019-12-06 14:35 ` [PATCH RFC 1/1] genirq: Make threaded handler use irq affinity for managed interrupt John Garry
2019-12-06 15:22   ` Marc Zyngier
2019-12-06 16:16     ` John Garry
2019-12-07  8:03   ` Ming Lei
2019-12-09 14:30     ` John Garry
2019-12-09 15:09       ` Hannes Reinecke
2019-12-09 15:17         ` Marc Zyngier
2019-12-09 15:25           ` Hannes Reinecke
2019-12-09 15:36             ` Marc Zyngier
2019-12-09 15:49           ` Qais Yousef
2019-12-09 15:55             ` Marc Zyngier
2019-12-10  1:43       ` Ming Lei
2019-12-10  9:45         ` John Garry
2019-12-10 10:06           ` Ming Lei
2019-12-10 10:28           ` Marc Zyngier
2019-12-10 10:59             ` John Garry
2019-12-10 11:36               ` Marc Zyngier
2019-12-10 12:05                 ` John Garry
2019-12-10 18:32                   ` Marc Zyngier [this message]
2019-12-11  9:41                     ` John Garry
2019-12-13 10:07                       ` John Garry
2019-12-13 10:31                         ` Marc Zyngier
2019-12-13 12:08                           ` John Garry
2019-12-14 10:59                             ` Marc Zyngier
2019-12-11 17:09         ` John Garry
2019-12-12 22:38           ` Ming Lei
2019-12-13 11:12             ` John Garry
2019-12-13 13:18               ` Ming Lei
2019-12-13 15:43                 ` John Garry
2019-12-13 17:12                   ` Ming Lei
2019-12-13 17:50                     ` John Garry
2019-12-14 13:56                   ` Marc Zyngier
2019-12-16 10:47                     ` John Garry
2019-12-16 11:40                       ` Marc Zyngier
2019-12-16 14:17                         ` John Garry
2019-12-16 18:00                           ` Marc Zyngier
2019-12-16 18:50                             ` John Garry
2019-12-20 11:30                               ` John Garry
2019-12-20 14:43                                 ` Marc Zyngier
2019-12-20 15:38                                   ` John Garry
2019-12-20 16:16                                     ` Marc Zyngier
2019-12-20 23:31                                     ` Ming Lei
2019-12-23  9:07                                       ` Marc Zyngier
2019-12-23 10:26                                         ` John Garry
2019-12-23 10:47                                           ` Marc Zyngier
2019-12-23 11:35                                             ` John Garry
2019-12-24  1:59                                             ` Ming Lei
2019-12-24 11:20                                               ` Marc Zyngier
2019-12-25  0:48                                                 ` Ming Lei
2020-01-02 10:35                                                   ` John Garry
2020-01-03  0:46                                                     ` Ming Lei
2020-01-03 10:41                                                       ` John Garry
2020-01-03 11:29                                                         ` Ming Lei
2020-01-03 11:50                                                           ` John Garry
2020-01-04 12:03                                                             ` Ming Lei
2020-05-30  7:46 ` [tip: irq/core] irqchip/gic-v3-its: Balance initial LPI affinity across CPUs tip-bot2 for Marc Zyngier

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5caa8414415ab35e74662ac0a30bb4ac@www.loen.fr \
    --to=maz@kernel.org \
    --cc=axboe@kernel.dk \
    --cc=bigeasy@linutronix.de \
    --cc=bvanassche@acm.org \
    --cc=chenxiang66@hisilicon.com \
    --cc=hare@suse.com \
    --cc=hch@lst.de \
    --cc=john.garry@huawei.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=ming.lei@redhat.com \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.