linux-kernel.vger.kernel.org archive mirror
* iommu/amd: lock splitting & GFP_KERNEL allocation
@ 2018-02-23 22:27 Sebastian Andrzej Siewior
  2018-02-23 22:27 ` [PATCH 01/10] iommu/amd: take into account that alloc_dev_data() may return NULL Sebastian Andrzej Siewior
                   ` (10 more replies)
  0 siblings, 11 replies; 21+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-02-23 22:27 UTC (permalink / raw)
  To: iommu; +Cc: linux-kernel, Joerg Roedel, tglx

Hi,

I have no idea why, but my A10 box suddenly complained loudly about
locking and memory allocations within the iommu code under RT. Looking
at the code, it has been like this for quite some time, so the iommu
must have appeared only recently (there was a BIOS upgrade due to other
issues, which might have enabled it).

The goal here is to perform the memory allocation in get_irq_table()
without disabled interrupts, and to use as few raw_spin_locks as
possible while still using them where the caller already holds one
(like desc->lock during IRQ-affinity changes).

The patches were boot tested on an AMD EPYC 7601.

Sebastian

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 01/10] iommu/amd: take into account that alloc_dev_data() may return NULL
  2018-02-23 22:27 iommu/amd: lock splitting & GFP_KERNEL allocation Sebastian Andrzej Siewior
@ 2018-02-23 22:27 ` Sebastian Andrzej Siewior
  2018-02-23 22:27 ` [PATCH 02/10] iommu/amd: turn dev_data_list into a lock less list Sebastian Andrzej Siewior
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 21+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-02-23 22:27 UTC (permalink / raw)
  To: iommu
  Cc: linux-kernel, Joerg Roedel, tglx, Sebastian Andrzej Siewior, Baoquan He

find_dev_data() does not check whether the return value of
alloc_dev_data() is NULL. This used to be okay because the pointer was
simply returned as-is. Since commit df3f7a6e8e85 ("iommu/amd: Use
is_attach_deferred call-back") the pointer is dereferenced within
find_dev_data(), so a NULL check is required.

Cc: Baoquan He <bhe@redhat.com>
Fixes: df3f7a6e8e85 ("iommu/amd: Use is_attach_deferred call-back")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/iommu/amd_iommu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 74788fdeb773..8b591c192daf 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -310,6 +310,8 @@ static struct iommu_dev_data *find_dev_data(u16 devid)
 
 	if (dev_data == NULL) {
 		dev_data = alloc_dev_data(devid);
+		if (!dev_data)
+			return NULL;
 
 		if (translation_pre_enabled(iommu))
 			dev_data->defer_attach = true;
-- 
2.16.1

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 02/10] iommu/amd: turn dev_data_list into a lock less list
  2018-02-23 22:27 iommu/amd: lock splitting & GFP_KERNEL allocation Sebastian Andrzej Siewior
  2018-02-23 22:27 ` [PATCH 01/10] iommu/amd: take into account that alloc_dev_data() may return NULL Sebastian Andrzej Siewior
@ 2018-02-23 22:27 ` Sebastian Andrzej Siewior
  2018-02-23 22:27 ` [PATCH 03/10] iommu/amd: split domain id out of amd_iommu_devtable_lock Sebastian Andrzej Siewior
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 21+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-02-23 22:27 UTC (permalink / raw)
  To: iommu; +Cc: linux-kernel, Joerg Roedel, tglx, Sebastian Andrzej Siewior

alloc_dev_data() adds new items to dev_data_list and search_dev_data()
searches for items in this list. Both protect access to the list with a
spinlock.
The list is only ever traversed forwards and no specific item is ever
deleted from it. This qualifies the list for conversion to a lockless
list (llist), and as part of this, the spinlock can be removed.
With this change the ordering of items within the list changes: before,
new items were added to the end of the list; now they are added to the
front. I don't think it matters, but I wanted to mention it.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/iommu/amd_iommu.c       | 28 ++++++++++------------------
 drivers/iommu/amd_iommu_types.h |  2 +-
 2 files changed, 11 insertions(+), 19 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 8b591c192daf..53aece41ddf2 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -83,8 +83,7 @@
 static DEFINE_RWLOCK(amd_iommu_devtable_lock);
 
 /* List of all available dev_data structures */
-static LIST_HEAD(dev_data_list);
-static DEFINE_SPINLOCK(dev_data_list_lock);
+static LLIST_HEAD(dev_data_list);
 
 LIST_HEAD(ioapic_map);
 LIST_HEAD(hpet_map);
@@ -203,40 +202,33 @@ static struct dma_ops_domain* to_dma_ops_domain(struct protection_domain *domain
 static struct iommu_dev_data *alloc_dev_data(u16 devid)
 {
 	struct iommu_dev_data *dev_data;
-	unsigned long flags;
 
 	dev_data = kzalloc(sizeof(*dev_data), GFP_KERNEL);
 	if (!dev_data)
 		return NULL;
 
 	dev_data->devid = devid;
-
-	spin_lock_irqsave(&dev_data_list_lock, flags);
-	list_add_tail(&dev_data->dev_data_list, &dev_data_list);
-	spin_unlock_irqrestore(&dev_data_list_lock, flags);
-
 	ratelimit_default_init(&dev_data->rs);
 
+	llist_add(&dev_data->dev_data_list, &dev_data_list);
 	return dev_data;
 }
 
 static struct iommu_dev_data *search_dev_data(u16 devid)
 {
 	struct iommu_dev_data *dev_data;
-	unsigned long flags;
+	struct llist_node *node;
 
-	spin_lock_irqsave(&dev_data_list_lock, flags);
-	list_for_each_entry(dev_data, &dev_data_list, dev_data_list) {
+	if (llist_empty(&dev_data_list))
+		return NULL;
+
+	node = dev_data_list.first;
+	llist_for_each_entry(dev_data, node, dev_data_list) {
 		if (dev_data->devid == devid)
-			goto out_unlock;
+			return dev_data;
 	}
 
-	dev_data = NULL;
-
-out_unlock:
-	spin_unlock_irqrestore(&dev_data_list_lock, flags);
-
-	return dev_data;
+	return NULL;
 }
 
 static int __last_alias(struct pci_dev *pdev, u16 alias, void *data)
diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
index 6a877ebd058b..62d6f42b57c9 100644
--- a/drivers/iommu/amd_iommu_types.h
+++ b/drivers/iommu/amd_iommu_types.h
@@ -627,7 +627,7 @@ struct devid_map {
  */
 struct iommu_dev_data {
 	struct list_head list;		  /* For domain->dev_list */
-	struct list_head dev_data_list;	  /* For global dev_data_list */
+	struct llist_node dev_data_list;  /* For global dev_data_list */
 	struct protection_domain *domain; /* Domain the device is bound to */
 	u16 devid;			  /* PCI Device ID */
 	u16 alias;			  /* Alias Device ID */
-- 
2.16.1

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 03/10] iommu/amd: split domain id out of amd_iommu_devtable_lock
  2018-02-23 22:27 iommu/amd: lock splitting & GFP_KERNEL allocation Sebastian Andrzej Siewior
  2018-02-23 22:27 ` [PATCH 01/10] iommu/amd: take into account that alloc_dev_data() may return NULL Sebastian Andrzej Siewior
  2018-02-23 22:27 ` [PATCH 02/10] iommu/amd: turn dev_data_list into a lock less list Sebastian Andrzej Siewior
@ 2018-02-23 22:27 ` Sebastian Andrzej Siewior
  2018-02-23 22:27 ` [PATCH 04/10] iommu/amd: split irq_lookup_table out of the amd_iommu_devtable_lock Sebastian Andrzej Siewior
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 21+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-02-23 22:27 UTC (permalink / raw)
  To: iommu; +Cc: linux-kernel, Joerg Roedel, tglx, Sebastian Andrzej Siewior

domain_id_alloc() and domain_id_free() are used for id management. Those
two functions share a bitmap (amd_iommu_pd_alloc_bitmap) and set/clear
bits based on id allocation. There is no need to share this with
amd_iommu_devtable_lock; it can use its own lock for this operation.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/iommu/amd_iommu.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 53aece41ddf2..958efe311057 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -81,6 +81,7 @@
 #define AMD_IOMMU_PGSIZES	((~0xFFFUL) & ~(2ULL << 38))
 
 static DEFINE_RWLOCK(amd_iommu_devtable_lock);
+static DEFINE_SPINLOCK(pd_bitmap_lock);
 
 /* List of all available dev_data structures */
 static LLIST_HEAD(dev_data_list);
@@ -1599,29 +1600,26 @@ static void del_domain_from_list(struct protection_domain *domain)
 
 static u16 domain_id_alloc(void)
 {
-	unsigned long flags;
 	int id;
 
-	write_lock_irqsave(&amd_iommu_devtable_lock, flags);
+	spin_lock(&pd_bitmap_lock);
 	id = find_first_zero_bit(amd_iommu_pd_alloc_bitmap, MAX_DOMAIN_ID);
 	BUG_ON(id == 0);
 	if (id > 0 && id < MAX_DOMAIN_ID)
 		__set_bit(id, amd_iommu_pd_alloc_bitmap);
 	else
 		id = 0;
-	write_unlock_irqrestore(&amd_iommu_devtable_lock, flags);
+	spin_unlock(&pd_bitmap_lock);
 
 	return id;
 }
 
 static void domain_id_free(int id)
 {
-	unsigned long flags;
-
-	write_lock_irqsave(&amd_iommu_devtable_lock, flags);
+	spin_lock(&pd_bitmap_lock);
 	if (id > 0 && id < MAX_DOMAIN_ID)
 		__clear_bit(id, amd_iommu_pd_alloc_bitmap);
-	write_unlock_irqrestore(&amd_iommu_devtable_lock, flags);
+	spin_unlock(&pd_bitmap_lock);
 }
 
 #define DEFINE_FREE_PT_FN(LVL, FN)				\
-- 
2.16.1

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 04/10] iommu/amd: split irq_lookup_table out of the amd_iommu_devtable_lock
  2018-02-23 22:27 iommu/amd: lock splitting & GFP_KERNEL allocation Sebastian Andrzej Siewior
                   ` (2 preceding siblings ...)
  2018-02-23 22:27 ` [PATCH 03/10] iommu/amd: split domain id out of amd_iommu_devtable_lock Sebastian Andrzej Siewior
@ 2018-02-23 22:27 ` Sebastian Andrzej Siewior
  2018-03-15 12:54   ` Joerg Roedel
  2018-02-23 22:27 ` [PATCH 05/10] iommu/amd: remove the special case from get_irq_table() Sebastian Andrzej Siewior
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 21+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-02-23 22:27 UTC (permalink / raw)
  To: iommu; +Cc: linux-kernel, Joerg Roedel, tglx, Sebastian Andrzej Siewior

The function get_irq_table() reads/writes irq_lookup_table while holding
the amd_iommu_devtable_lock. It also modifies
amd_iommu_dev_table[].data[2].
set_dte_entry() uses amd_iommu_dev_table[].data[0|1] (under the
domain->lock), so that should be okay. Access to the iommu itself is
serialized by the iommu's own lock.

So split get_irq_table() out from under the amd_iommu_devtable_lock and
give it its own lock. The new lock is a raw_spin_lock because
modify_irte_ga() is called while desc->lock is held (which is raw).

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/iommu/amd_iommu.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 958efe311057..72487ac43eef 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -82,6 +82,7 @@
 
 static DEFINE_RWLOCK(amd_iommu_devtable_lock);
 static DEFINE_SPINLOCK(pd_bitmap_lock);
+static DEFINE_RAW_SPINLOCK(iommu_table_lock);
 
 /* List of all available dev_data structures */
 static LLIST_HEAD(dev_data_list);
@@ -3594,7 +3595,7 @@ static struct irq_remap_table *get_irq_table(u16 devid, bool ioapic)
 	unsigned long flags;
 	u16 alias;
 
-	write_lock_irqsave(&amd_iommu_devtable_lock, flags);
+	raw_spin_lock_irqsave(&iommu_table_lock, flags);
 
 	iommu = amd_iommu_rlookup_table[devid];
 	if (!iommu)
@@ -3659,7 +3660,7 @@ static struct irq_remap_table *get_irq_table(u16 devid, bool ioapic)
 	iommu_completion_wait(iommu);
 
 out_unlock:
-	write_unlock_irqrestore(&amd_iommu_devtable_lock, flags);
+	raw_spin_unlock_irqrestore(&iommu_table_lock, flags);
 
 	return table;
 }
-- 
2.16.1

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 05/10] iommu/amd: remove the special case from get_irq_table()
  2018-02-23 22:27 iommu/amd: lock splitting & GFP_KERNEL allocation Sebastian Andrzej Siewior
                   ` (3 preceding siblings ...)
  2018-02-23 22:27 ` [PATCH 04/10] iommu/amd: split irq_lookup_table out of the amd_iommu_devtable_lock Sebastian Andrzej Siewior
@ 2018-02-23 22:27 ` Sebastian Andrzej Siewior
  2018-03-15 12:53   ` Joerg Roedel
  2018-02-23 22:27 ` [PATCH 06/10] iommu/amd: use `table' instead `irt' as variable name in amd_iommu_update_ga() Sebastian Andrzej Siewior
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 21+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-02-23 22:27 UTC (permalink / raw)
  To: iommu; +Cc: linux-kernel, Joerg Roedel, tglx, Sebastian Andrzej Siewior

get_irq_table() has a special `ioapic' argument. If set, the function
pre-allocates / reserves the first 32 indexes. The argument is true at
only one call site, and get_irq_table() becomes a little simpler if we
extract the special handling to that caller.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/iommu/amd_iommu.c | 42 ++++++++++++++++++++++++------------------
 1 file changed, 24 insertions(+), 18 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 72487ac43eef..19de479fe21c 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -3588,7 +3588,7 @@ static void set_dte_irq_entry(u16 devid, struct irq_remap_table *table)
 	amd_iommu_dev_table[devid].data[2] = dte;
 }
 
-static struct irq_remap_table *get_irq_table(u16 devid, bool ioapic)
+static struct irq_remap_table *get_irq_table(u16 devid)
 {
 	struct irq_remap_table *table = NULL;
 	struct amd_iommu *iommu;
@@ -3622,10 +3622,6 @@ static struct irq_remap_table *get_irq_table(u16 devid, bool ioapic)
 	/* Initialize table spin-lock */
 	spin_lock_init(&table->lock);
 
-	if (ioapic)
-		/* Keep the first 32 indexes free for IOAPIC interrupts */
-		table->min_index = 32;
-
 	table->table = kmem_cache_alloc(amd_iommu_irq_cache, GFP_ATOMIC);
 	if (!table->table) {
 		kfree(table);
@@ -3640,12 +3636,6 @@ static struct irq_remap_table *get_irq_table(u16 devid, bool ioapic)
 		memset(table->table, 0,
 		       (MAX_IRQS_PER_TABLE * (sizeof(u64) * 2)));
 
-	if (ioapic) {
-		int i;
-
-		for (i = 0; i < 32; ++i)
-			iommu->irte_ops->set_allocated(table, i);
-	}
 
 	irq_lookup_table[devid] = table;
 	set_dte_irq_entry(devid, table);
@@ -3675,7 +3665,7 @@ static int alloc_irq_index(u16 devid, int count, bool align)
 	if (!iommu)
 		return -ENODEV;
 
-	table = get_irq_table(devid, false);
+	table = get_irq_table(devid);
 	if (!table)
 		return -ENODEV;
 
@@ -3726,7 +3716,7 @@ static int modify_irte_ga(u16 devid, int index, struct irte_ga *irte,
 	if (iommu == NULL)
 		return -EINVAL;
 
-	table = get_irq_table(devid, false);
+	table = get_irq_table(devid);
 	if (!table)
 		return -ENOMEM;
 
@@ -3759,7 +3749,7 @@ static int modify_irte(u16 devid, int index, union irte *irte)
 	if (iommu == NULL)
 		return -EINVAL;
 
-	table = get_irq_table(devid, false);
+	table = get_irq_table(devid);
 	if (!table)
 		return -ENOMEM;
 
@@ -3783,7 +3773,7 @@ static void free_irte(u16 devid, int index)
 	if (iommu == NULL)
 		return;
 
-	table = get_irq_table(devid, false);
+	table = get_irq_table(devid);
 	if (!table)
 		return;
 
@@ -4103,10 +4093,26 @@ static int irq_remapping_alloc(struct irq_domain *domain, unsigned int virq,
 		return ret;
 
 	if (info->type == X86_IRQ_ALLOC_TYPE_IOAPIC) {
-		if (get_irq_table(devid, true))
+		struct irq_remap_table *table;
+		struct amd_iommu *iommu;
+
+		table = get_irq_table(devid);
+		if (table) {
+			if (!table->min_index) {
+				/*
+				 * Keep the first 32 indexes free for IOAPIC
+				 * interrupts.
+				 */
+				table->min_index = 32;
+				iommu = amd_iommu_rlookup_table[devid];
+				for (i = 0; i < 32; ++i)
+					iommu->irte_ops->set_allocated(table, i);
+			}
 			index = info->ioapic_pin;
-		else
+			WARN_ON(table->min_index != 32);
+		} else {
 			ret = -ENOMEM;
+		}
 	} else {
 		bool align = (info->type == X86_IRQ_ALLOC_TYPE_MSI);
 
@@ -4386,7 +4392,7 @@ int amd_iommu_update_ga(int cpu, bool is_run, void *data)
 	if (!iommu)
 		return -ENODEV;
 
-	irt = get_irq_table(devid, false);
+	irt = get_irq_table(devid);
 	if (!irt)
 		return -ENODEV;
 
-- 
2.16.1

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 06/10] iommu/amd: use `table' instead `irt' as variable name in amd_iommu_update_ga()
  2018-02-23 22:27 iommu/amd: lock splitting & GFP_KERNEL allocation Sebastian Andrzej Siewior
                   ` (4 preceding siblings ...)
  2018-02-23 22:27 ` [PATCH 05/10] iommu/amd: remove the special case from get_irq_table() Sebastian Andrzej Siewior
@ 2018-02-23 22:27 ` Sebastian Andrzej Siewior
  2018-02-23 22:27 ` [PATCH 07/10] iommu/amd: factor out setting the remap table for a devid Sebastian Andrzej Siewior
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 21+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-02-23 22:27 UTC (permalink / raw)
  To: iommu; +Cc: linux-kernel, Joerg Roedel, tglx, Sebastian Andrzej Siewior

The variable of type struct irq_remap_table is always named `table',
except in amd_iommu_update_ga() where it is called `irt'. Make it
consistent and name it `table' there as well.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/iommu/amd_iommu.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 19de479fe21c..d3a05d794218 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -4378,7 +4378,7 @@ int amd_iommu_update_ga(int cpu, bool is_run, void *data)
 {
 	unsigned long flags;
 	struct amd_iommu *iommu;
-	struct irq_remap_table *irt;
+	struct irq_remap_table *table;
 	struct amd_ir_data *ir_data = (struct amd_ir_data *)data;
 	int devid = ir_data->irq_2_irte.devid;
 	struct irte_ga *entry = (struct irte_ga *) ir_data->entry;
@@ -4392,11 +4392,11 @@ int amd_iommu_update_ga(int cpu, bool is_run, void *data)
 	if (!iommu)
 		return -ENODEV;
 
-	irt = get_irq_table(devid);
-	if (!irt)
+	table = get_irq_table(devid);
+	if (!table)
 		return -ENODEV;
 
-	spin_lock_irqsave(&irt->lock, flags);
+	spin_lock_irqsave(&table->lock, flags);
 
 	if (ref->lo.fields_vapic.guest_mode) {
 		if (cpu >= 0)
@@ -4405,7 +4405,7 @@ int amd_iommu_update_ga(int cpu, bool is_run, void *data)
 		barrier();
 	}
 
-	spin_unlock_irqrestore(&irt->lock, flags);
+	spin_unlock_irqrestore(&table->lock, flags);
 
 	iommu_flush_irt(iommu, devid);
 	iommu_completion_wait(iommu);
-- 
2.16.1

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 07/10] iommu/amd: factor out setting the remap table for a devid
  2018-02-23 22:27 iommu/amd: lock splitting & GFP_KERNEL allocation Sebastian Andrzej Siewior
                   ` (5 preceding siblings ...)
  2018-02-23 22:27 ` [PATCH 06/10] iommu/amd: use `table' instead `irt' as variable name in amd_iommu_update_ga() Sebastian Andrzej Siewior
@ 2018-02-23 22:27 ` Sebastian Andrzej Siewior
  2018-02-23 22:27 ` [PATCH 08/10] iommu/amd: drop the lock while allocating new irq remap table Sebastian Andrzej Siewior
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 21+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-02-23 22:27 UTC (permalink / raw)
  To: iommu; +Cc: linux-kernel, Joerg Roedel, tglx, Sebastian Andrzej Siewior

Setting the IRQ remap table for a specific devid (or its alias devid)
consists of three steps which are repeated each time this is done.
Introduce a new helper function, move those steps into it and use that
function instead. The compiler can still decide whether it is worth
inlining.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/iommu/amd_iommu.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index d3a05d794218..b763fcbd790d 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -3588,6 +3588,14 @@ static void set_dte_irq_entry(u16 devid, struct irq_remap_table *table)
 	amd_iommu_dev_table[devid].data[2] = dte;
 }
 
+static void set_remap_table_entry(struct amd_iommu *iommu, u16 devid,
+				  struct irq_remap_table *table)
+{
+	irq_lookup_table[devid] = table;
+	set_dte_irq_entry(devid, table);
+	iommu_flush_dte(iommu, devid);
+}
+
 static struct irq_remap_table *get_irq_table(u16 devid)
 {
 	struct irq_remap_table *table = NULL;
@@ -3608,9 +3616,7 @@ static struct irq_remap_table *get_irq_table(u16 devid)
 	alias = amd_iommu_alias_table[devid];
 	table = irq_lookup_table[alias];
 	if (table) {
-		irq_lookup_table[devid] = table;
-		set_dte_irq_entry(devid, table);
-		iommu_flush_dte(iommu, devid);
+		set_remap_table_entry(iommu, devid, table);
 		goto out;
 	}
 
@@ -3637,14 +3643,9 @@ static struct irq_remap_table *get_irq_table(u16 devid)
 		       (MAX_IRQS_PER_TABLE * (sizeof(u64) * 2)));
 
 
-	irq_lookup_table[devid] = table;
-	set_dte_irq_entry(devid, table);
-	iommu_flush_dte(iommu, devid);
-	if (devid != alias) {
-		irq_lookup_table[alias] = table;
-		set_dte_irq_entry(alias, table);
-		iommu_flush_dte(iommu, alias);
-	}
+	set_remap_table_entry(iommu, devid, table);
+	if (devid != alias)
+		set_remap_table_entry(iommu, alias, table);
 
 out:
 	iommu_completion_wait(iommu);
-- 
2.16.1

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 08/10] iommu/amd: drop the lock while allocating new irq remap table
  2018-02-23 22:27 iommu/amd: lock splitting & GFP_KERNEL allocation Sebastian Andrzej Siewior
                   ` (6 preceding siblings ...)
  2018-02-23 22:27 ` [PATCH 07/10] iommu/amd: factor out setting the remap table for a devid Sebastian Andrzej Siewior
@ 2018-02-23 22:27 ` Sebastian Andrzej Siewior
  2018-03-15 12:58   ` Joerg Roedel
  2018-02-23 22:27 ` [PATCH 09/10] iommu/amd: declare irq_remap_table's and amd_iommu's lock as a raw_spin_lock Sebastian Andrzej Siewior
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 21+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-02-23 22:27 UTC (permalink / raw)
  To: iommu; +Cc: linux-kernel, Joerg Roedel, tglx, Sebastian Andrzej Siewior

The irq_remap_table is allocated while the iommu_table_lock is held with
interrupts disabled. While this works, it makes RT scream very loudly.
From looking at the call sites, all callers are in the early device
initialisation path (apic_bsp_setup(), pci_enable_device(),
pci_enable_msi()), so it makes sense to drop the lock (which also
enables interrupts) and try to allocate that memory with GFP_KERNEL
instead of GFP_ATOMIC.

Since the iommu_table_lock is dropped during the allocation, we need to
recheck whether the table exists after the lock has been reacquired. I
*think* it is impossible for the "devid" entry to appear in
irq_lookup_table while the lock is dropped, since the same device can
only be probed once. It is more likely that another device added an
`alias' entry. However, I check for both cases, just to be sure.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/iommu/amd_iommu.c | 65 +++++++++++++++++++++++++++++++++--------------
 1 file changed, 46 insertions(+), 19 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index b763fcbd790d..de6cc41d6cd2 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -3588,6 +3588,30 @@ static void set_dte_irq_entry(u16 devid, struct irq_remap_table *table)
 	amd_iommu_dev_table[devid].data[2] = dte;
 }
 
+static struct irq_remap_table *alloc_irq_table(void)
+{
+	struct irq_remap_table *table;
+
+	table = kzalloc(sizeof(*table), GFP_KERNEL);
+	if (!table)
+		return NULL;
+
+	table->table = kmem_cache_alloc(amd_iommu_irq_cache, GFP_KERNEL);
+	if (!table->table) {
+		kfree(table);
+		return NULL;
+	}
+	spin_lock_init(&table->lock);
+
+	if (!AMD_IOMMU_GUEST_IR_GA(amd_iommu_guest_ir))
+		memset(table->table, 0,
+		       MAX_IRQS_PER_TABLE * sizeof(u32));
+	else
+		memset(table->table, 0,
+		       (MAX_IRQS_PER_TABLE * (sizeof(u64) * 2)));
+	return table;
+}
+
 static void set_remap_table_entry(struct amd_iommu *iommu, u16 devid,
 				  struct irq_remap_table *table)
 {
@@ -3599,6 +3623,7 @@ static void set_remap_table_entry(struct amd_iommu *iommu, u16 devid,
 static struct irq_remap_table *get_irq_table(u16 devid)
 {
 	struct irq_remap_table *table = NULL;
+	struct irq_remap_table *new_table = NULL;
 	struct amd_iommu *iommu;
 	unsigned long flags;
 	u16 alias;
@@ -3617,42 +3642,44 @@ static struct irq_remap_table *get_irq_table(u16 devid)
 	table = irq_lookup_table[alias];
 	if (table) {
 		set_remap_table_entry(iommu, devid, table);
-		goto out;
+		goto out_wait;
 	}
+	raw_spin_unlock_irqrestore(&iommu_table_lock, flags);
 
 	/* Nothing there yet, allocate new irq remapping table */
-	table = kzalloc(sizeof(*table), GFP_ATOMIC);
-	if (!table)
+	new_table = alloc_irq_table();
+	if (!new_table)
+		return NULL;
+
+	raw_spin_lock_irqsave(&iommu_table_lock, flags);
+
+	table = irq_lookup_table[devid];
+	if (table)
 		goto out_unlock;
 
-	/* Initialize table spin-lock */
-	spin_lock_init(&table->lock);
-
-	table->table = kmem_cache_alloc(amd_iommu_irq_cache, GFP_ATOMIC);
-	if (!table->table) {
-		kfree(table);
-		table = NULL;
-		goto out_unlock;
+	table = irq_lookup_table[alias];
+	if (table) {
+		set_remap_table_entry(iommu, devid, table);
+		goto out_wait;
 	}
 
-	if (!AMD_IOMMU_GUEST_IR_GA(amd_iommu_guest_ir))
-		memset(table->table, 0,
-		       MAX_IRQS_PER_TABLE * sizeof(u32));
-	else
-		memset(table->table, 0,
-		       (MAX_IRQS_PER_TABLE * (sizeof(u64) * 2)));
-
+	table = new_table;
+	new_table = NULL;
 
 	set_remap_table_entry(iommu, devid, table);
 	if (devid != alias)
 		set_remap_table_entry(iommu, alias, table);
 
-out:
+out_wait:
 	iommu_completion_wait(iommu);
 
 out_unlock:
 	raw_spin_unlock_irqrestore(&iommu_table_lock, flags);
 
+	if (new_table) {
+		kmem_cache_free(amd_iommu_irq_cache, new_table->table);
+		kfree(new_table);
+	}
 	return table;
 }
 
-- 
2.16.1

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 09/10] iommu/amd: declare irq_remap_table's and amd_iommu's lock as a raw_spin_lock
  2018-02-23 22:27 iommu/amd: lock splitting & GFP_KERNEL allocation Sebastian Andrzej Siewior
                   ` (7 preceding siblings ...)
  2018-02-23 22:27 ` [PATCH 08/10] iommu/amd: drop the lock while allocating new irq remap table Sebastian Andrzej Siewior
@ 2018-02-23 22:27 ` Sebastian Andrzej Siewior
  2018-02-23 22:27 ` [PATCH 10/10] iommu/amd: make amd_iommu_devtable_lock a spin_lock Sebastian Andrzej Siewior
  2018-03-15 13:01 ` iommu/amd: lock splitting & GFP_KERNEL allocation Joerg Roedel
  10 siblings, 0 replies; 21+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-02-23 22:27 UTC (permalink / raw)
  To: iommu; +Cc: linux-kernel, Joerg Roedel, tglx, Sebastian Andrzej Siewior

The IRQ affinity setting code is invoked while desc->lock is held.
desc->lock is a raw_spin_lock which is acquired with interrupts
disabled. The call chain involves modify_irte_ga(), which needs to take
irq_remap_table->lock in order to update the entry, and later
iommu->lock in order to update and flush the iommu.
The latter is also required during table allocation.

I currently don't see a feasible way of getting this done without
turning both locks into raw locks, so here it is.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/iommu/amd_iommu.c       | 30 +++++++++++++++---------------
 drivers/iommu/amd_iommu_init.c  |  2 +-
 drivers/iommu/amd_iommu_types.h |  4 ++--
 3 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index de6cc41d6cd2..04b7d263523f 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -1052,9 +1052,9 @@ static int iommu_queue_command_sync(struct amd_iommu *iommu,
 	unsigned long flags;
 	int ret;
 
-	spin_lock_irqsave(&iommu->lock, flags);
+	raw_spin_lock_irqsave(&iommu->lock, flags);
 	ret = __iommu_queue_command_sync(iommu, cmd, sync);
-	spin_unlock_irqrestore(&iommu->lock, flags);
+	raw_spin_unlock_irqrestore(&iommu->lock, flags);
 
 	return ret;
 }
@@ -1080,7 +1080,7 @@ static int iommu_completion_wait(struct amd_iommu *iommu)
 
 	build_completion_wait(&cmd, (u64)&iommu->cmd_sem);
 
-	spin_lock_irqsave(&iommu->lock, flags);
+	raw_spin_lock_irqsave(&iommu->lock, flags);
 
 	iommu->cmd_sem = 0;
 
@@ -1091,7 +1091,7 @@ static int iommu_completion_wait(struct amd_iommu *iommu)
 	ret = wait_on_sem(&iommu->cmd_sem);
 
 out_unlock:
-	spin_unlock_irqrestore(&iommu->lock, flags);
+	raw_spin_unlock_irqrestore(&iommu->lock, flags);
 
 	return ret;
 }
@@ -3601,7 +3601,7 @@ static struct irq_remap_table *alloc_irq_table(void)
 		kfree(table);
 		return NULL;
 	}
-	spin_lock_init(&table->lock);
+	raw_spin_lock_init(&table->lock);
 
 	if (!AMD_IOMMU_GUEST_IR_GA(amd_iommu_guest_ir))
 		memset(table->table, 0,
@@ -3700,7 +3700,7 @@ static int alloc_irq_index(u16 devid, int count, bool align)
 	if (align)
 		alignment = roundup_pow_of_two(count);
 
-	spin_lock_irqsave(&table->lock, flags);
+	raw_spin_lock_irqsave(&table->lock, flags);
 
 	/* Scan table for free entries */
 	for (index = ALIGN(table->min_index, alignment), c = 0;
@@ -3727,7 +3727,7 @@ static int alloc_irq_index(u16 devid, int count, bool align)
 	index = -ENOSPC;
 
 out:
-	spin_unlock_irqrestore(&table->lock, flags);
+	raw_spin_unlock_irqrestore(&table->lock, flags);
 
 	return index;
 }
@@ -3748,7 +3748,7 @@ static int modify_irte_ga(u16 devid, int index, struct irte_ga *irte,
 	if (!table)
 		return -ENOMEM;
 
-	spin_lock_irqsave(&table->lock, flags);
+	raw_spin_lock_irqsave(&table->lock, flags);
 
 	entry = (struct irte_ga *)table->table;
 	entry = &entry[index];
@@ -3759,7 +3759,7 @@ static int modify_irte_ga(u16 devid, int index, struct irte_ga *irte,
 	if (data)
 		data->ref = entry;
 
-	spin_unlock_irqrestore(&table->lock, flags);
+	raw_spin_unlock_irqrestore(&table->lock, flags);
 
 	iommu_flush_irt(iommu, devid);
 	iommu_completion_wait(iommu);
@@ -3781,9 +3781,9 @@ static int modify_irte(u16 devid, int index, union irte *irte)
 	if (!table)
 		return -ENOMEM;
 
-	spin_lock_irqsave(&table->lock, flags);
+	raw_spin_lock_irqsave(&table->lock, flags);
 	table->table[index] = irte->val;
-	spin_unlock_irqrestore(&table->lock, flags);
+	raw_spin_unlock_irqrestore(&table->lock, flags);
 
 	iommu_flush_irt(iommu, devid);
 	iommu_completion_wait(iommu);
@@ -3805,9 +3805,9 @@ static void free_irte(u16 devid, int index)
 	if (!table)
 		return;
 
-	spin_lock_irqsave(&table->lock, flags);
+	raw_spin_lock_irqsave(&table->lock, flags);
 	iommu->irte_ops->clear_allocated(table, index);
-	spin_unlock_irqrestore(&table->lock, flags);
+	raw_spin_unlock_irqrestore(&table->lock, flags);
 
 	iommu_flush_irt(iommu, devid);
 	iommu_completion_wait(iommu);
@@ -4424,7 +4424,7 @@ int amd_iommu_update_ga(int cpu, bool is_run, void *data)
 	if (!table)
 		return -ENODEV;
 
-	spin_lock_irqsave(&table->lock, flags);
+	raw_spin_lock_irqsave(&table->lock, flags);
 
 	if (ref->lo.fields_vapic.guest_mode) {
 		if (cpu >= 0)
@@ -4433,7 +4433,7 @@ int amd_iommu_update_ga(int cpu, bool is_run, void *data)
 		barrier();
 	}
 
-	spin_unlock_irqrestore(&table->lock, flags);
+	raw_spin_unlock_irqrestore(&table->lock, flags);
 
 	iommu_flush_irt(iommu, devid);
 	iommu_completion_wait(iommu);
diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 4e4a615bf13f..904c575d1677 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -1474,7 +1474,7 @@ static int __init init_iommu_one(struct amd_iommu *iommu, struct ivhd_header *h)
 {
 	int ret;
 
-	spin_lock_init(&iommu->lock);
+	raw_spin_lock_init(&iommu->lock);
 
 	/* Add IOMMU to internal data structures */
 	list_add_tail(&iommu->list, &amd_iommu_list);
diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
index 62d6f42b57c9..1c9b080276c9 100644
--- a/drivers/iommu/amd_iommu_types.h
+++ b/drivers/iommu/amd_iommu_types.h
@@ -408,7 +408,7 @@ extern bool amd_iommu_iotlb_sup;
 #define IRQ_TABLE_ALIGNMENT	128
 
 struct irq_remap_table {
-	spinlock_t lock;
+	raw_spinlock_t lock;
 	unsigned min_index;
 	u32 *table;
 };
@@ -490,7 +490,7 @@ struct amd_iommu {
 	int index;
 
 	/* locks the accesses to the hardware */
-	spinlock_t lock;
+	raw_spinlock_t lock;
 
 	/* Pointer to PCI device of this IOMMU */
 	struct pci_dev *dev;
-- 
2.16.1

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 10/10] iommu/amd: make amd_iommu_devtable_lock a spin_lock
  2018-02-23 22:27 iommu/amd: lock splitting & GFP_KERNEL allocation Sebastian Andrzej Siewior
                   ` (8 preceding siblings ...)
  2018-02-23 22:27 ` [PATCH 09/10] iommu/amd: declare irq_remap_table's and amd_iommu's lock as a raw_spin_lock Sebastian Andrzej Siewior
@ 2018-02-23 22:27 ` Sebastian Andrzej Siewior
  2018-03-15 13:01 ` iommu/amd: lock splitting & GFP_KERNEL allocation Joerg Roedel
  10 siblings, 0 replies; 21+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-02-23 22:27 UTC (permalink / raw)
  To: iommu; +Cc: linux-kernel, Joerg Roedel, tglx, Sebastian Andrzej Siewior

Before commit 0bb6e243d7fb ("iommu/amd: Support IOMMU_DOMAIN_DMA type
allocation") amd_iommu_devtable_lock had a read_lock() user, but now
there are none. In fact, since the mentioned commit we have only
write_lock() users of the lock. Since there is no reason to keep it as
a writer lock, change its type to a spin_lock.
I *think* that we might even be able to remove the lock because all its
current users seem to have their own protection.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/iommu/amd_iommu.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 04b7d263523f..cabb02cf6d42 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -80,7 +80,7 @@
  */
 #define AMD_IOMMU_PGSIZES	((~0xFFFUL) & ~(2ULL << 38))
 
-static DEFINE_RWLOCK(amd_iommu_devtable_lock);
+static DEFINE_SPINLOCK(amd_iommu_devtable_lock);
 static DEFINE_SPINLOCK(pd_bitmap_lock);
 static DEFINE_RAW_SPINLOCK(iommu_table_lock);
 
@@ -2096,9 +2096,9 @@ static int attach_device(struct device *dev,
 	}
 
 skip_ats_check:
-	write_lock_irqsave(&amd_iommu_devtable_lock, flags);
+	spin_lock_irqsave(&amd_iommu_devtable_lock, flags);
 	ret = __attach_device(dev_data, domain);
-	write_unlock_irqrestore(&amd_iommu_devtable_lock, flags);
+	spin_unlock_irqrestore(&amd_iommu_devtable_lock, flags);
 
 	/*
 	 * We might boot into a crash-kernel here. The crashed kernel
@@ -2148,9 +2148,9 @@ static void detach_device(struct device *dev)
 	domain   = dev_data->domain;
 
 	/* lock device table */
-	write_lock_irqsave(&amd_iommu_devtable_lock, flags);
+	spin_lock_irqsave(&amd_iommu_devtable_lock, flags);
 	__detach_device(dev_data);
-	write_unlock_irqrestore(&amd_iommu_devtable_lock, flags);
+	spin_unlock_irqrestore(&amd_iommu_devtable_lock, flags);
 
 	if (!dev_is_pci(dev))
 		return;
@@ -2813,7 +2813,7 @@ static void cleanup_domain(struct protection_domain *domain)
 	struct iommu_dev_data *entry;
 	unsigned long flags;
 
-	write_lock_irqsave(&amd_iommu_devtable_lock, flags);
+	spin_lock_irqsave(&amd_iommu_devtable_lock, flags);
 
 	while (!list_empty(&domain->dev_list)) {
 		entry = list_first_entry(&domain->dev_list,
@@ -2821,7 +2821,7 @@ static void cleanup_domain(struct protection_domain *domain)
 		__detach_device(entry);
 	}
 
-	write_unlock_irqrestore(&amd_iommu_devtable_lock, flags);
+	spin_unlock_irqrestore(&amd_iommu_devtable_lock, flags);
 }
 
 static void protection_domain_free(struct protection_domain *domain)
-- 
2.16.1


* Re: [PATCH 05/10] iommu/amd: remove the special case from get_irq_table()
  2018-02-23 22:27 ` [PATCH 05/10] iommu/amd: remove the special case from get_irq_table() Sebastian Andrzej Siewior
@ 2018-03-15 12:53   ` Joerg Roedel
  2018-03-15 13:01     ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 21+ messages in thread
From: Joerg Roedel @ 2018-03-15 12:53 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: iommu, linux-kernel, tglx

On Fri, Feb 23, 2018 at 11:27:31PM +0100, Sebastian Andrzej Siewior wrote:
> @@ -4103,10 +4093,26 @@ static int irq_remapping_alloc(struct irq_domain *domain, unsigned int virq,
>  		return ret;
>  
>  	if (info->type == X86_IRQ_ALLOC_TYPE_IOAPIC) {
> -		if (get_irq_table(devid, true))
> +		struct irq_remap_table *table;
> +		struct amd_iommu *iommu;
> +
> +		table = get_irq_table(devid);
> +		if (table) {
> +			if (!table->min_index) {
> +				/*
> +				 * Keep the first 32 indexes free for IOAPIC
> +				 * interrupts.
> +				 */
> +				table->min_index = 32;
> +				iommu = amd_iommu_rlookup_table[devid];
> +				for (i = 0; i < 32; ++i)
> +					iommu->irte_ops->set_allocated(table, i);
> +			}

I think this needs to be protected by the table->lock.

* Re: [PATCH 04/10] iommu/amd: split irq_lookup_table out of the amd_iommu_devtable_lock
  2018-02-23 22:27 ` [PATCH 04/10] iommu/amd: split irq_lookup_table out of the amd_iommu_devtable_lock Sebastian Andrzej Siewior
@ 2018-03-15 12:54   ` Joerg Roedel
  0 siblings, 0 replies; 21+ messages in thread
From: Joerg Roedel @ 2018-03-15 12:54 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: iommu, linux-kernel, tglx

On Fri, Feb 23, 2018 at 11:27:30PM +0100, Sebastian Andrzej Siewior wrote:
> The function get_irq_table() reads/writes irq_lookup_table while holding
> the amd_iommu_devtable_lock. It also modifies
> amd_iommu_dev_table[].data[2].
> set_dte_entry() is using amd_iommu_dev_table[].data[0|1] (under the
> domain->lock) so it should be okay. The access to the iommu is
> serialized with its own (iommu's) lock.
> 
> So move get_irq_table() out from under the amd_iommu_devtable_lock. The
> new lock is a raw_spin_lock because modify_irte_ga() is called while
> desc->lock is held (which is raw).
> 
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

This patch likely conflicts with changes already in the iommu tree,
which contains a patch splitting the get_irq_table() function. Can
you rebase your series against the x86/amd branch?


Thanks,

	Joerg

* Re: [PATCH 08/10] iommu/amd: drop the lock while allocating new irq remap table
  2018-02-23 22:27 ` [PATCH 08/10] iommu/amd: drop the lock while allocating new irq remap table Sebastian Andrzej Siewior
@ 2018-03-15 12:58   ` Joerg Roedel
  0 siblings, 0 replies; 21+ messages in thread
From: Joerg Roedel @ 2018-03-15 12:58 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: iommu, linux-kernel, tglx

On Fri, Feb 23, 2018 at 11:27:34PM +0100, Sebastian Andrzej Siewior wrote:
> The irq_remap_table is allocated while the iommu_table_lock is held with
> interrupts disabled. While this works it makes RT scream very loudly.
> From looking at the call sites, all callers are in the early device
> initialisation (apic_bsp_setup(), pci_enable_device(),
> pci_enable_msi()), so it makes sense to drop the lock, which also
> enables interrupts, and try to allocate that memory with GFP_KERNEL
> instead of GFP_ATOMIC.
> 
> Since during the allocation the iommu_table_lock is dropped, we need to
> recheck if the table exists after the lock has been reacquired. I *think*
> that it is impossible that the "devid" entry appears in irq_lookup_table
> while the lock is dropped since the same device can only be probed once.
> It is more likely that another device added an `alias' entry. However I
> check for both cases, just to be sure.
> 
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

This looks like it conflicts with the x86/amd branch as well.

* Re: iommu/amd: lock splitting & GFP_KERNEL allocation
  2018-02-23 22:27 iommu/amd: lock splitting & GFP_KERNEL allocation Sebastian Andrzej Siewior
                   ` (9 preceding siblings ...)
  2018-02-23 22:27 ` [PATCH 10/10] iommu/amd: make amd_iommu_devtable_lock a spin_lock Sebastian Andrzej Siewior
@ 2018-03-15 13:01 ` Joerg Roedel
  10 siblings, 0 replies; 21+ messages in thread
From: Joerg Roedel @ 2018-03-15 13:01 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: iommu, linux-kernel, tglx

Hi Sebastian,

On Fri, Feb 23, 2018 at 11:27:26PM +0100, Sebastian Andrzej Siewior wrote:
> I have no idea why but suddenly my A10 box complained loudly about
> locking and memory allocations within the iommu code under RT. Looking
> at the code it has been like this for a longer time so the iommu must
> have appeared recently (well there was a bios upgrade due to other
> issues so it might have enabled it).
> 
> The goal here is to make the memory allocation in get_irq_table() not
> with disabled interrupts and having as little raw_spin_lock as possible
> while having them if the caller is also holding one (like desc->lock
> during IRQ-affinity changes).
> 
> The patches were boot tested on an AMD EPYC 7601.

Thanks for these patches, I really like the cleanups and the improved
locking. Please rebase against x86/amd in the iommu branch and address
the other comment I have, then I put them into my tree.


Thanks,

	Joerg

* Re: [PATCH 05/10] iommu/amd: remove the special case from get_irq_table()
  2018-03-15 12:53   ` Joerg Roedel
@ 2018-03-15 13:01     ` Sebastian Andrzej Siewior
  2018-03-15 13:07       ` Joerg Roedel
  0 siblings, 1 reply; 21+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-03-15 13:01 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: iommu, linux-kernel, tglx

On 2018-03-15 13:53:42 [+0100], Joerg Roedel wrote:
> On Fri, Feb 23, 2018 at 11:27:31PM +0100, Sebastian Andrzej Siewior wrote:
> > @@ -4103,10 +4093,26 @@ static int irq_remapping_alloc(struct irq_domain *domain, unsigned int virq,
> >  		return ret;
> >  
> >  	if (info->type == X86_IRQ_ALLOC_TYPE_IOAPIC) {
> > -		if (get_irq_table(devid, true))
> > +		struct irq_remap_table *table;
> > +		struct amd_iommu *iommu;
> > +
> > +		table = get_irq_table(devid);
> > +		if (table) {
> > +			if (!table->min_index) {
> > +				/*
> > +				 * Keep the first 32 indexes free for IOAPIC
> > +				 * interrupts.
> > +				 */
> > +				table->min_index = 32;
> > +				iommu = amd_iommu_rlookup_table[devid];
> > +				for (i = 0; i < 32; ++i)
> > +					iommu->irte_ops->set_allocated(table, i);
> > +			}
> 
> I think this needs to be protected by the table->lock.

Which part and why? The !->min_index part plus extending (setting it to 32)?

Sebastian

* Re: [PATCH 05/10] iommu/amd: remove the special case from get_irq_table()
  2018-03-15 13:01     ` Sebastian Andrzej Siewior
@ 2018-03-15 13:07       ` Joerg Roedel
  2018-03-15 14:15         ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 21+ messages in thread
From: Joerg Roedel @ 2018-03-15 13:07 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: iommu, linux-kernel, tglx

On Thu, Mar 15, 2018 at 02:01:53PM +0100, Sebastian Andrzej Siewior wrote:
> On 2018-03-15 13:53:42 [+0100], Joerg Roedel wrote:
> > On Fri, Feb 23, 2018 at 11:27:31PM +0100, Sebastian Andrzej Siewior wrote:
> > > @@ -4103,10 +4093,26 @@ static int irq_remapping_alloc(struct irq_domain *domain, unsigned int virq,
> > >  		return ret;
> > >  
> > >  	if (info->type == X86_IRQ_ALLOC_TYPE_IOAPIC) {
> > > -		if (get_irq_table(devid, true))
> > > +		struct irq_remap_table *table;
> > > +		struct amd_iommu *iommu;
> > > +
> > > +		table = get_irq_table(devid);
> > > +		if (table) {
> > > +			if (!table->min_index) {
> > > +				/*
> > > +				 * Keep the first 32 indexes free for IOAPIC
> > > +				 * interrupts.
> > > +				 */
> > > +				table->min_index = 32;
> > > +				iommu = amd_iommu_rlookup_table[devid];
> > > +				for (i = 0; i < 32; ++i)
> > > +					iommu->irte_ops->set_allocated(table, i);
> > > +			}
> > 
> > I think this needs to be protected by the table->lock.
> 
> Which part and why? The !->min_index part plus extending (setting it to 32)?

In particular the set_allocated part; when get_irq_table() returns, the
table is visible to other users as well. I have not checked the
irq-layer locking to be sure whether that can happen, though.

* Re: [PATCH 05/10] iommu/amd: remove the special case from get_irq_table()
  2018-03-15 13:07       ` Joerg Roedel
@ 2018-03-15 14:15         ` Sebastian Andrzej Siewior
  2018-03-15 15:19           ` Joerg Roedel
  0 siblings, 1 reply; 21+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-03-15 14:15 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: iommu, linux-kernel, tglx

On 2018-03-15 14:07:23 [+0100], Joerg Roedel wrote:
> On Thu, Mar 15, 2018 at 02:01:53PM +0100, Sebastian Andrzej Siewior wrote:
> > On 2018-03-15 13:53:42 [+0100], Joerg Roedel wrote:
> > > On Fri, Feb 23, 2018 at 11:27:31PM +0100, Sebastian Andrzej Siewior wrote:
> > > > @@ -4103,10 +4093,26 @@ static int irq_remapping_alloc(struct irq_domain *domain, unsigned int virq,
> > > >  		return ret;
> > > >  
> > > >  	if (info->type == X86_IRQ_ALLOC_TYPE_IOAPIC) {
> > > > -		if (get_irq_table(devid, true))
> > > > +		struct irq_remap_table *table;
> > > > +		struct amd_iommu *iommu;
> > > > +
> > > > +		table = get_irq_table(devid);
> > > > +		if (table) {
> > > > +			if (!table->min_index) {
> > > > +				/*
> > > > +				 * Keep the first 32 indexes free for IOAPIC
> > > > +				 * interrupts.
> > > > +				 */
> > > > +				table->min_index = 32;
> > > > +				iommu = amd_iommu_rlookup_table[devid];
> > > > +				for (i = 0; i < 32; ++i)
> > > > +					iommu->irte_ops->set_allocated(table, i);
> > > > +			}
> > > 
> > > I think this needs to be protected by the table->lock.
> > 
> > Which part and why? The !->min_index part plus extending (setting it to 32)?
> 
> In particular the set_allocated part, when get_irq_table() returns the
> table is visible to other users as well. I have not checked the
> irq-layer locking to be sure that can happen, though.

->set_allocated() operates only on 0…31, while the other entries could be
used at the same time. However, 0…31 should not be accessed by any other
user before they are ready.

irq_remapping_alloc() is the ->alloc() callback invoked via
irq_domain_alloc_irqs_hierarchy(), and each caller has to hold the
irq_domain_mutex. So we should not see those invocations in parallel.

Is it possible to have those entries accessed before the setup is
complete? My understanding is that this setup is performed once during
boot (especially that ioapic part) and not again.

From looking at that hunk, it should not hurt to add that lock, I just
wanted to check whether it is really needed.

Sebastian

* Re: [PATCH 05/10] iommu/amd: remove the special case from get_irq_table()
  2018-03-15 14:15         ` Sebastian Andrzej Siewior
@ 2018-03-15 15:19           ` Joerg Roedel
  2018-03-15 15:25             ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 21+ messages in thread
From: Joerg Roedel @ 2018-03-15 15:19 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: iommu, linux-kernel, tglx

On Thu, Mar 15, 2018 at 03:15:41PM +0100, Sebastian Andrzej Siewior wrote:
> ->set_allocated() operates only on 0…31, while the other entries could be
> used at the same time. However, 0…31 should not be accessed by any other
> user before they are ready.
> 
> irq_remapping_alloc() is the ->alloc() callback invoked via
> irq_domain_alloc_irqs_hierarchy(), and each caller has to hold the
> irq_domain_mutex. So we should not see those invocations in parallel.
> 
> Is it possible to have those entries accessed before the setup is
> complete? My understanding is that this setup is performed once during
> boot (especially that ioapic part) and not again.
> 
> From looking at that hunk, it should not hurt to add that lock, I just
> wanted to check whether it is really needed.

Okay, if the irq-layer does the needed locking, then we don't need
another lock here. There is the modify_irte_ga() path for the
virtualized irq routing into KVM guests, but there should be no KVM
guests running when setup the ioapic routing entries.



	Joerg

* Re: [PATCH 05/10] iommu/amd: remove the special case from get_irq_table()
  2018-03-15 15:19           ` Joerg Roedel
@ 2018-03-15 15:25             ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 21+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-03-15 15:25 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: iommu, linux-kernel, tglx

On 2018-03-15 16:19:17 [+0100], Joerg Roedel wrote:
> Okay, if the irq-layer does the needed locking, then we don't need
> another lock here. There is the modify_irte_ga() path for the
> virtualized irq routing into KVM guests, but there should be no KVM
> guests running when setup the ioapic routing entries.

Okay then. Let me rebase the queue on top of whatever you have now, and
if the box still boots I will post them again.

> 	Joerg

Sebastian

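The conclusion of the thread above — the first 32 indexes are reserved for the IOAPIC exactly once, guarded only by the !min_index check, with the irq-domain layer serializing the callers — can be sketched in plain C. All demo_* names below are hypothetical stand-ins, not the kernel's actual structures, and the locking is deliberately omitted to mirror the agreed-upon outcome:

```c
#include <stdbool.h>

#define MAX_IRQS_PER_TABLE 256

/* Hypothetical stand-in for the kernel's irq_remap_table: a bitmap of
 * allocated slots plus the min_index watermark discussed above. */
struct demo_irq_table {
	unsigned int min_index;
	bool allocated[MAX_IRQS_PER_TABLE];
};

/* Mirror of what iommu->irte_ops->set_allocated() conceptually does. */
static void demo_set_allocated(struct demo_irq_table *t, int index)
{
	t->allocated[index] = true;
}

/* Reserve the first 32 indexes for IOAPIC interrupts, exactly once:
 * a second caller sees min_index != 0 and skips the loop. */
static void demo_reserve_ioapic_range(struct demo_irq_table *t)
{
	int i;

	if (t->min_index)
		return;
	t->min_index = 32;
	for (i = 0; i < 32; ++i)
		demo_set_allocated(t, i);
}

/* First-fit allocation starting at min_index, roughly what a regular
 * MSI/MSI-X user would get; returns -1 when the table is full. */
static int demo_alloc_index(struct demo_irq_table *t)
{
	unsigned int i;

	for (i = t->min_index; i < MAX_IRQS_PER_TABLE; ++i) {
		if (!t->allocated[i]) {
			t->allocated[i] = true;
			return (int)i;
		}
	}
	return -1;
}
```

After the reservation, regular allocations start at index 32, and repeating the reservation is a no-op — the behaviour the !min_index guard is meant to provide.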

* [PATCH 08/10] iommu/amd: drop the lock while allocating new irq remap table
  2018-03-22 15:22 [PATCH 00/10 v3] " Sebastian Andrzej Siewior
@ 2018-03-22 15:22 ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 21+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-03-22 15:22 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: iommu, linux-kernel, tglx, Scott Wood, Sebastian Andrzej Siewior

The irq_remap_table is allocated while the iommu_table_lock is held with
interrupts disabled.
From looking at the call sites, all callers are in the early device
initialisation (apic_bsp_setup(), pci_enable_device(),
pci_enable_msi()), so it makes sense to drop the lock, which also
enables interrupts, and try to allocate that memory with GFP_KERNEL
instead of GFP_ATOMIC.

Since during the allocation the iommu_table_lock is dropped, we need to
recheck if the table exists after the lock has been reacquired. I *think*
that it is impossible that the "devid" entry appears in irq_lookup_table
while the lock is dropped since the same device can only be probed once.
However I check for both cases, just to be sure.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/iommu/amd_iommu.c | 65 +++++++++++++++++++++++++++++++++--------------
 1 file changed, 46 insertions(+), 19 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 11ea2d656be8..c493d345b3ef 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -3617,6 +3617,30 @@ static struct irq_remap_table *get_irq_table(u16 devid)
 	return table;
 }
 
+static struct irq_remap_table *__alloc_irq_table(void)
+{
+	struct irq_remap_table *table;
+
+	table = kzalloc(sizeof(*table), GFP_KERNEL);
+	if (!table)
+		return NULL;
+
+	table->table = kmem_cache_alloc(amd_iommu_irq_cache, GFP_KERNEL);
+	if (!table->table) {
+		kfree(table);
+		return NULL;
+	}
+	raw_spin_lock_init(&table->lock);
+
+	if (!AMD_IOMMU_GUEST_IR_GA(amd_iommu_guest_ir))
+		memset(table->table, 0,
+		       MAX_IRQS_PER_TABLE * sizeof(u32));
+	else
+		memset(table->table, 0,
+		       (MAX_IRQS_PER_TABLE * (sizeof(u64) * 2)));
+	return table;
+}
+
 static void set_remap_table_entry(struct amd_iommu *iommu, u16 devid,
 				  struct irq_remap_table *table)
 {
@@ -3628,6 +3652,7 @@ static void set_remap_table_entry(struct amd_iommu *iommu, u16 devid,
 static struct irq_remap_table *alloc_irq_table(u16 devid)
 {
 	struct irq_remap_table *table = NULL;
+	struct irq_remap_table *new_table = NULL;
 	struct amd_iommu *iommu;
 	unsigned long flags;
 	u16 alias;
@@ -3646,42 +3671,44 @@ static struct irq_remap_table *alloc_irq_table(u16 devid)
 	table = irq_lookup_table[alias];
 	if (table) {
 		set_remap_table_entry(iommu, devid, table);
-		goto out;
+		goto out_wait;
 	}
+	spin_unlock_irqrestore(&iommu_table_lock, flags);
 
 	/* Nothing there yet, allocate new irq remapping table */
-	table = kzalloc(sizeof(*table), GFP_ATOMIC);
-	if (!table)
+	new_table = __alloc_irq_table();
+	if (!new_table)
+		return NULL;
+
+	spin_lock_irqsave(&iommu_table_lock, flags);
+
+	table = irq_lookup_table[devid];
+	if (table)
 		goto out_unlock;
 
-	/* Initialize table spin-lock */
-	raw_spin_lock_init(&table->lock);
-
-	table->table = kmem_cache_alloc(amd_iommu_irq_cache, GFP_ATOMIC);
-	if (!table->table) {
-		kfree(table);
-		table = NULL;
-		goto out_unlock;
+	table = irq_lookup_table[alias];
+	if (table) {
+		set_remap_table_entry(iommu, devid, table);
+		goto out_wait;
 	}
 
-	if (!AMD_IOMMU_GUEST_IR_GA(amd_iommu_guest_ir))
-		memset(table->table, 0,
-		       MAX_IRQS_PER_TABLE * sizeof(u32));
-	else
-		memset(table->table, 0,
-		       (MAX_IRQS_PER_TABLE * (sizeof(u64) * 2)));
-
+	table = new_table;
+	new_table = NULL;
 
 	set_remap_table_entry(iommu, devid, table);
 	if (devid != alias)
 		set_remap_table_entry(iommu, alias, table);
 
-out:
+out_wait:
 	iommu_completion_wait(iommu);
 
 out_unlock:
 	spin_unlock_irqrestore(&iommu_table_lock, flags);
 
+	if (new_table) {
+		kmem_cache_free(amd_iommu_irq_cache, new_table->table);
+		kfree(new_table);
+	}
 	return table;
 }
 
-- 
2.16.2

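The core pattern of the patch above — look up under the lock, drop the lock for the expensive allocation, re-acquire, re-check the slot before publishing, and free the spare copy on a lost race — is sketched below in user space. get_or_alloc_table() and the pthread mutex are hypothetical stand-ins for alloc_irq_table() and iommu_table_lock, not the kernel code itself:

```c
#include <pthread.h>
#include <stdlib.h>

#define NTABLES 16

/* Miniature of alloc_irq_table(): a lookup array protected by a lock,
 * where the allocation happens with the lock dropped and the slot is
 * re-checked after re-acquiring it. */
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static int *lookup_table[NTABLES];

static int *get_or_alloc_table(unsigned int id)
{
	int *table, *new_table;

	pthread_mutex_lock(&table_lock);
	table = lookup_table[id];
	pthread_mutex_unlock(&table_lock);
	if (table)
		return table;

	/* Allocate with the lock dropped; in the kernel patch this is
	 * what allows GFP_ATOMIC to become GFP_KERNEL. */
	new_table = calloc(1, sizeof(*new_table));
	if (!new_table)
		return NULL;

	pthread_mutex_lock(&table_lock);
	table = lookup_table[id];
	if (!table) {
		/* Still empty: publish our allocation. */
		lookup_table[id] = new_table;
		table = new_table;
		new_table = NULL;
	}
	pthread_mutex_unlock(&table_lock);

	/* Lost the race (someone filled the slot meanwhile): discard
	 * the spare copy, exactly like the out_unlock path above. */
	free(new_table);
	return table;
}
```

On a lost race the freshly allocated table is thrown away and the winner's table is returned, so concurrent callers for the same id always end up sharing one table.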

end of thread (newest message: 2018-03-22 15:23 UTC)

Thread overview: 21+ messages
2018-02-23 22:27 iommu/amd: lock splitting & GFP_KERNEL allocation Sebastian Andrzej Siewior
2018-02-23 22:27 ` [PATCH 01/10] iommu/amd: take into account that alloc_dev_data() may return NULL Sebastian Andrzej Siewior
2018-02-23 22:27 ` [PATCH 02/10] iommu/amd: turn dev_data_list into a lock less list Sebastian Andrzej Siewior
2018-02-23 22:27 ` [PATCH 03/10] iommu/amd: split domain id out of amd_iommu_devtable_lock Sebastian Andrzej Siewior
2018-02-23 22:27 ` [PATCH 04/10] iommu/amd: split irq_lookup_table out of the amd_iommu_devtable_lock Sebastian Andrzej Siewior
2018-03-15 12:54   ` Joerg Roedel
2018-02-23 22:27 ` [PATCH 05/10] iommu/amd: remove the special case from get_irq_table() Sebastian Andrzej Siewior
2018-03-15 12:53   ` Joerg Roedel
2018-03-15 13:01     ` Sebastian Andrzej Siewior
2018-03-15 13:07       ` Joerg Roedel
2018-03-15 14:15         ` Sebastian Andrzej Siewior
2018-03-15 15:19           ` Joerg Roedel
2018-03-15 15:25             ` Sebastian Andrzej Siewior
2018-02-23 22:27 ` [PATCH 06/10] iommu/amd: use `table' instead `irt' as variable name in amd_iommu_update_ga() Sebastian Andrzej Siewior
2018-02-23 22:27 ` [PATCH 07/10] iommu/amd: factor out setting the remap table for a devid Sebastian Andrzej Siewior
2018-02-23 22:27 ` [PATCH 08/10] iommu/amd: drop the lock while allocating new irq remap table Sebastian Andrzej Siewior
2018-03-15 12:58   ` Joerg Roedel
2018-02-23 22:27 ` [PATCH 09/10] iommu/amd: declare irq_remap_table's and amd_iommu's lock as a raw_spin_lock Sebastian Andrzej Siewior
2018-02-23 22:27 ` [PATCH 10/10] iommu/amd: make amd_iommu_devtable_lock a spin_lock Sebastian Andrzej Siewior
2018-03-15 13:01 ` iommu/amd: lock splitting & GFP_KERNEL allocation Joerg Roedel
2018-03-22 15:22 [PATCH 00/10 v3] " Sebastian Andrzej Siewior
2018-03-22 15:22 ` [PATCH 08/10] iommu/amd: drop the lock while allocating new irq remap table Sebastian Andrzej Siewior
