linux-kernel.vger.kernel.org archive mirror
* [PATCH] irqchip/gic-v3-its: Lock its device list during find and create its device
@ 2019-01-26  6:16 Zheng Xiang
  2019-01-26 11:38 ` Marc Zyngier
  0 siblings, 1 reply; 10+ messages in thread
From: Zheng Xiang @ 2019-01-26  6:16 UTC (permalink / raw)
  To: linux-kernel; +Cc: tglx, jason, marc.zyngier, wanghaibin.wang, Zheng Xiang

Currently all PCI devices under a PCI bridge share the same device id
and hence the same ITS device. Assume two such PCI devices call
its_msi_prepare concurrently, each trying to find and, failing that,
create its ITS device. There is a window in which the later caller
cannot find the ITS device because the other one has not yet created
it. The later caller then creates a second, distinct ITS device even
though both devices share the same device_id.

Signed-off-by: Zheng Xiang <zhengxiang9@huawei.com>
---
 drivers/irqchip/irq-gic-v3-its.c | 52 +++++++++++++++-------------------------
 1 file changed, 19 insertions(+), 33 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index db20e99..397edc8 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -2205,25 +2205,6 @@ static void its_cpu_init_collections(void)
 	raw_spin_unlock(&its_lock);
 }
 
-static struct its_device *its_find_device(struct its_node *its, u32 dev_id)
-{
-	struct its_device *its_dev = NULL, *tmp;
-	unsigned long flags;
-
-	raw_spin_lock_irqsave(&its->lock, flags);
-
-	list_for_each_entry(tmp, &its->its_device_list, entry) {
-		if (tmp->device_id == dev_id) {
-			its_dev = tmp;
-			break;
-		}
-	}
-
-	raw_spin_unlock_irqrestore(&its->lock, flags);
-
-	return its_dev;
-}
-
 static struct its_baser *its_get_baser(struct its_node *its, u32 type)
 {
 	int i;
@@ -2321,7 +2302,7 @@ static bool its_alloc_vpe_table(u32 vpe_id)
 static struct its_device *its_create_device(struct its_node *its, u32 dev_id,
 					    int nvecs, bool alloc_lpis)
 {
-	struct its_device *dev;
+	struct its_device *dev = NULL, *tmp;
 	unsigned long *lpi_map = NULL;
 	unsigned long flags;
 	u16 *col_map = NULL;
@@ -2331,6 +2312,24 @@ static struct its_device *its_create_device(struct its_node *its, u32 dev_id,
 	int nr_ites;
 	int sz;
 
+	raw_spin_lock_irqsave(&its->lock, flags);
+	list_for_each_entry(tmp, &its->its_device_list, entry) {
+		if (tmp->device_id == dev_id) {
+			dev = tmp;
+			break;
+		}
+	}
+	if (dev) {
+		/*
+		 * We already have seen this ID, probably through
+		 * another alias (PCI bridge of some sort). No need to
+		 * create the device.
+		 */
+		pr_debug("Reusing ITT for devID %x\n", dev_id);
+		raw_spin_unlock_irqrestore(&its->lock, flags);
+		return dev;
+	}
+
 	if (!its_alloc_device_table(its, dev_id))
 		return NULL;
 
@@ -2378,7 +2377,6 @@ static struct its_device *its_create_device(struct its_node *its, u32 dev_id,
 	dev->device_id = dev_id;
 	INIT_LIST_HEAD(&dev->entry);
 
-	raw_spin_lock_irqsave(&its->lock, flags);
 	list_add(&dev->entry, &its->its_device_list);
 	raw_spin_unlock_irqrestore(&its->lock, flags);
 
@@ -2443,23 +2441,11 @@ static int its_msi_prepare(struct irq_domain *domain, struct device *dev,
 		return -EINVAL;
 	}
 
-	its_dev = its_find_device(its, dev_id);
-	if (its_dev) {
-		/*
-		 * We already have seen this ID, probably through
-		 * another alias (PCI bridge of some sort). No need to
-		 * create the device.
-		 */
-		pr_debug("Reusing ITT for devID %x\n", dev_id);
-		goto out;
-	}
-
 	its_dev = its_create_device(its, dev_id, nvec, true);
 	if (!its_dev)
 		return -ENOMEM;
 
 	pr_debug("ITT %d entries, %d bits\n", nvec, ilog2(nvec));
-out:
 	info->scratchpad[0].ptr = its_dev;
 	return 0;
 }
-- 
1.8.3.1




* Re: [PATCH] irqchip/gic-v3-its: Lock its device list during find and create its device
  2019-01-26  6:16 [PATCH] irqchip/gic-v3-its: Lock its device list during find and create its device Zheng Xiang
@ 2019-01-26 11:38 ` Marc Zyngier
  2019-01-28  7:13   ` Zheng Xiang
  0 siblings, 1 reply; 10+ messages in thread
From: Marc Zyngier @ 2019-01-26 11:38 UTC (permalink / raw)
  To: Zheng Xiang; +Cc: linux-kernel, tglx, jason, wanghaibin.wang

Hi Zheng,

On Sat, 26 Jan 2019 06:16:24 +0000,
Zheng Xiang <zhengxiang9@huawei.com> wrote:
> 
> Currently each PCI device under a PCI Bridge shares the same device id
> and ITS device. Assume there are two PCI devices call its_msi_prepare
> concurrently and they are both going to find and create their ITS
> device. There is a chance that the later one couldn't find ITS device
> before the other one creating the ITS device. It will cause the later
> one to create a different ITS device even if they have the same
> device_id.

Interesting finding. Is this something you've actually seen in practice
with two devices being probed in parallel? Or something that you found
by inspection?

The whole RID aliasing is such a mess, I wish we never supported
it. Anyway, comments below.

> 
> Signed-off-by: Zheng Xiang <zhengxiang9@huawei.com>
> ---
>  drivers/irqchip/irq-gic-v3-its.c | 52 +++++++++++++++-------------------------
>  1 file changed, 19 insertions(+), 33 deletions(-)
> 
> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
> index db20e99..397edc8 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -2205,25 +2205,6 @@ static void its_cpu_init_collections(void)
>  	raw_spin_unlock(&its_lock);
>  }
>  
> -static struct its_device *its_find_device(struct its_node *its, u32 dev_id)
> -{
> -	struct its_device *its_dev = NULL, *tmp;
> -	unsigned long flags;
> -
> -	raw_spin_lock_irqsave(&its->lock, flags);
> -
> -	list_for_each_entry(tmp, &its->its_device_list, entry) {
> -		if (tmp->device_id == dev_id) {
> -			its_dev = tmp;
> -			break;
> -		}
> -	}
> -
> -	raw_spin_unlock_irqrestore(&its->lock, flags);
> -
> -	return its_dev;
> -}
> -
>  static struct its_baser *its_get_baser(struct its_node *its, u32 type)
>  {
>  	int i;
> @@ -2321,7 +2302,7 @@ static bool its_alloc_vpe_table(u32 vpe_id)
>  static struct its_device *its_create_device(struct its_node *its, u32 dev_id,
>  					    int nvecs, bool alloc_lpis)
>  {
> -	struct its_device *dev;
> +	struct its_device *dev = NULL, *tmp;
>  	unsigned long *lpi_map = NULL;
>  	unsigned long flags;
>  	u16 *col_map = NULL;
> @@ -2331,6 +2312,24 @@ static struct its_device *its_create_device(struct its_node *its, u32 dev_id,
>  	int nr_ites;
>  	int sz;
>  
> +	raw_spin_lock_irqsave(&its->lock, flags);
> +	list_for_each_entry(tmp, &its->its_device_list, entry) {
> +		if (tmp->device_id == dev_id) {
> +			dev = tmp;
> +			break;
> +		}
> +	}
> +	if (dev) {
> +		/*
> +		 * We already have seen this ID, probably through
> +		 * another alias (PCI bridge of some sort). No need to
> +		 * create the device.
> +		 */
> +		pr_debug("Reusing ITT for devID %x\n", dev_id);
> +		raw_spin_unlock_irqrestore(&its->lock, flags);
> +		return dev;
> +	}
> +
>  	if (!its_alloc_device_table(its, dev_id))

You're now performing all sort of allocations in an atomic context,
which is pretty horrible (and the kernel will shout at you for doing
so).

We could probably keep the current logic and wrap it around a mutex
instead, which would give us the appropriate guarantees WRT allocations.
Something along those lines (untested):

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index db20e992a40f..99feb62e63ba 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -97,9 +97,14 @@ struct its_device;
  * The ITS structure - contains most of the infrastructure, with the
  * top-level MSI domain, the command queue, the collections, and the
  * list of devices writing to it.
+ *
+ * alloc_lock has to be taken for any allocation that can happen at
+ * run time, while the spinlock must be taken to parse data structures
+ * such as the device list.
  */
 struct its_node {
 	raw_spinlock_t		lock;
+	struct mutex		alloc_lock;
 	struct list_head	entry;
 	void __iomem		*base;
 	phys_addr_t		phys_base;
@@ -2421,6 +2426,7 @@ static int its_msi_prepare(struct irq_domain *domain, struct device *dev,
 	struct its_device *its_dev;
 	struct msi_domain_info *msi_info;
 	u32 dev_id;
+	int err = 0;
 
 	/*
 	 * We ignore "dev" entierely, and rely on the dev_id that has
@@ -2443,6 +2449,7 @@ static int its_msi_prepare(struct irq_domain *domain, struct device *dev,
 		return -EINVAL;
 	}
 
+	mutex_lock(&its->alloc_lock);
 	its_dev = its_find_device(its, dev_id);
 	if (its_dev) {
 		/*
@@ -2455,11 +2462,14 @@ static int its_msi_prepare(struct irq_domain *domain, struct device *dev,
 	}
 
 	its_dev = its_create_device(its, dev_id, nvec, true);
-	if (!its_dev)
-		return -ENOMEM;
+	if (!its_dev) {
+		err = -ENOMEM;
+		goto out;
+	}
 
 	pr_debug("ITT %d entries, %d bits\n", nvec, ilog2(nvec));
 out:
+	mutex_unlock(&its->alloc_lock);
 	info->scratchpad[0].ptr = its_dev;
 	return 0;
 }
@@ -3516,6 +3526,7 @@ static int __init its_probe_one(struct resource *res,
 	}
 
 	raw_spin_lock_init(&its->lock);
+	mutex_init(&its->alloc_lock);
 	INIT_LIST_HEAD(&its->entry);
 	INIT_LIST_HEAD(&its->its_device_list);
 	typer = gic_read_typer(its_base + GITS_TYPER);

I still feel that the issue you're seeing here is much more generic.
Overall, there is no guarantee that for a given MSI domain, no two
allocation will take place in parallel, and maybe that's what we should
enforce instead.

Thanks,

	M.

-- 
Jazz is not dead, it just smell funny.


* Re: [PATCH] irqchip/gic-v3-its: Lock its device list during find and create its device
  2019-01-26 11:38 ` Marc Zyngier
@ 2019-01-28  7:13   ` Zheng Xiang
  2019-01-28 13:51     ` Marc Zyngier
  0 siblings, 1 reply; 10+ messages in thread
From: Zheng Xiang @ 2019-01-28  7:13 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: linux-kernel, tglx, jason, wanghaibin.wang

Hi Marc,

Thanks for your review.

On 2019/1/26 19:38, Marc Zyngier wrote:
> Hi Zheng,
> 
> On Sat, 26 Jan 2019 06:16:24 +0000,
> Zheng Xiang <zhengxiang9@huawei.com> wrote:
>>
>> Currently each PCI device under a PCI Bridge shares the same device id
>> and ITS device. Assume there are two PCI devices call its_msi_prepare
>> concurrently and they are both going to find and create their ITS
>> device. There is a chance that the later one couldn't find ITS device
>> before the other one creating the ITS device. It will cause the later
>> one to create a different ITS device even if they have the same
>> device_id.
> 
> Interesting finding. Is this something you've actually seen in practice
> with two devices being probed in parallel? Or something that you found
> by inspection?

Yes, I found this problem while analyzing why a VM hung. In the end I
found that the virtio-gpu could not receive its MSI interrupts because it
shared the same event_id as the virtio-serial device.

See https://lkml.org/lkml/2019/1/10/299 for the bug report.

This problem can be reproduced with high probability by booting a Qemu/KVM
VM with a virtio-serial controller and a virtio-gpu attached to a PCI
bridge, and also adding some delay before creating the ITS device.

> 
> The whole RID aliasing is such a mess, I wish we never supported
> it. Anyway, comments below.
> 
>>
>> Signed-off-by: Zheng Xiang <zhengxiang9@huawei.com>
>> ---
>>  drivers/irqchip/irq-gic-v3-its.c | 52 +++++++++++++++-------------------------
>>  1 file changed, 19 insertions(+), 33 deletions(-)
>>
>> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
>> index db20e99..397edc8 100644
>> --- a/drivers/irqchip/irq-gic-v3-its.c
>> +++ b/drivers/irqchip/irq-gic-v3-its.c
>> @@ -2205,25 +2205,6 @@ static void its_cpu_init_collections(void)
>>  	raw_spin_unlock(&its_lock);
>>  }
>>  
>> -static struct its_device *its_find_device(struct its_node *its, u32 dev_id)
>> -{
>> -	struct its_device *its_dev = NULL, *tmp;
>> -	unsigned long flags;
>> -
>> -	raw_spin_lock_irqsave(&its->lock, flags);
>> -
>> -	list_for_each_entry(tmp, &its->its_device_list, entry) {
>> -		if (tmp->device_id == dev_id) {
>> -			its_dev = tmp;
>> -			break;
>> -		}
>> -	}
>> -
>> -	raw_spin_unlock_irqrestore(&its->lock, flags);
>> -
>> -	return its_dev;
>> -}
>> -
>>  static struct its_baser *its_get_baser(struct its_node *its, u32 type)
>>  {
>>  	int i;
>> @@ -2321,7 +2302,7 @@ static bool its_alloc_vpe_table(u32 vpe_id)
>>  static struct its_device *its_create_device(struct its_node *its, u32 dev_id,
>>  					    int nvecs, bool alloc_lpis)
>>  {
>> -	struct its_device *dev;
>> +	struct its_device *dev = NULL, *tmp;
>>  	unsigned long *lpi_map = NULL;
>>  	unsigned long flags;
>>  	u16 *col_map = NULL;
>> @@ -2331,6 +2312,24 @@ static struct its_device *its_create_device(struct its_node *its, u32 dev_id,
>>  	int nr_ites;
>>  	int sz;
>>  
>> +	raw_spin_lock_irqsave(&its->lock, flags);
>> +	list_for_each_entry(tmp, &its->its_device_list, entry) {
>> +		if (tmp->device_id == dev_id) {
>> +			dev = tmp;
>> +			break;
>> +		}
>> +	}
>> +	if (dev) {
>> +		/*
>> +		 * We already have seen this ID, probably through
>> +		 * another alias (PCI bridge of some sort). No need to
>> +		 * create the device.
>> +		 */
>> +		pr_debug("Reusing ITT for devID %x\n", dev_id);
>> +		raw_spin_unlock_irqrestore(&its->lock, flags);
>> +		return dev;
>> +	}
>> +
>>  	if (!its_alloc_device_table(its, dev_id))
> 
> You're now performing all sort of allocations in an atomic context,
> which is pretty horrible (and the kernel will shout at you for doing
> so).
> 
> We could probably keep the current logic and wrap it around a mutex
> instead, which would give us the appropriate guarantees WRT allocations.
> Something along those lines (untested):
>
> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
> index db20e992a40f..99feb62e63ba 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -97,9 +97,14 @@ struct its_device;
>   * The ITS structure - contains most of the infrastructure, with the
>   * top-level MSI domain, the command queue, the collections, and the
>   * list of devices writing to it.
> + *
> + * alloc_lock has to be taken for any allocation that can happen at
> + * run time, while the spinlock must be taken to parse data structures
> + * such as the device list.
>   */
>  struct its_node {
>  	raw_spinlock_t		lock;
> +	struct mutex		alloc_lock;
>  	struct list_head	entry;
>  	void __iomem		*base;
>  	phys_addr_t		phys_base;
> @@ -2421,6 +2426,7 @@ static int its_msi_prepare(struct irq_domain *domain, struct device *dev,
>  	struct its_device *its_dev;
>  	struct msi_domain_info *msi_info;
>  	u32 dev_id;
> +	int err = 0;
>  
>  	/*
>  	 * We ignore "dev" entierely, and rely on the dev_id that has
> @@ -2443,6 +2449,7 @@ static int its_msi_prepare(struct irq_domain *domain, struct device *dev,
>  		return -EINVAL;
>  	}
>  
> +	mutex_lock(&its->alloc_lock);
>  	its_dev = its_find_device(its, dev_id);
>  	if (its_dev) {
>  		/*
> @@ -2455,11 +2462,14 @@ static int its_msi_prepare(struct irq_domain *domain, struct device *dev,
>  	}
>  
>  	its_dev = its_create_device(its, dev_id, nvec, true);
> -	if (!its_dev)
> -		return -ENOMEM;
> +	if (!its_dev) {
> +		err = -ENOMEM;
> +		goto out;
> +	}
>  
>  	pr_debug("ITT %d entries, %d bits\n", nvec, ilog2(nvec));
>  out:
> +	mutex_unlock(&its->alloc_lock);
>  	info->scratchpad[0].ptr = its_dev;
>  	return 0;

Should it return *err* here?

>  }
> @@ -3516,6 +3526,7 @@ static int __init its_probe_one(struct resource *res,
>  	}
>  
>  	raw_spin_lock_init(&its->lock);
> +	mutex_init(&its->alloc_lock);
>  	INIT_LIST_HEAD(&its->entry);
>  	INIT_LIST_HEAD(&its->its_device_list);
>  	typer = gic_read_typer(its_base + GITS_TYPER);
> 
> I still feel that the issue you're seeing here is much more generic.
> Overall, there is no guarantee that for a given MSI domain, no two
> allocation will take place in parallel, and maybe that's what we should
> enforce instead.
> 
> Thanks,
> 
> 	M.
> 
-- 

Thanks,
Xiang




* Re: [PATCH] irqchip/gic-v3-its: Lock its device list during find and create its device
  2019-01-28  7:13   ` Zheng Xiang
@ 2019-01-28 13:51     ` Marc Zyngier
  2019-01-29  5:42       ` Zheng Xiang
  0 siblings, 1 reply; 10+ messages in thread
From: Marc Zyngier @ 2019-01-28 13:51 UTC (permalink / raw)
  To: Zheng Xiang; +Cc: linux-kernel, tglx, jason, wanghaibin.wang

On 28/01/2019 07:13, Zheng Xiang wrote:
> Hi Marc,
> 
> Thanks for your review.
> 
> On 2019/1/26 19:38, Marc Zyngier wrote:
>> Hi Zheng,
>>
>> On Sat, 26 Jan 2019 06:16:24 +0000,
>> Zheng Xiang <zhengxiang9@huawei.com> wrote:
>>>
>>> Currently each PCI device under a PCI Bridge shares the same device id
>>> and ITS device. Assume there are two PCI devices call its_msi_prepare
>>> concurrently and they are both going to find and create their ITS
>>> device. There is a chance that the later one couldn't find ITS device
>>> before the other one creating the ITS device. It will cause the later
>>> one to create a different ITS device even if they have the same
>>> device_id.
>>
>> Interesting finding. Is this something you've actually seen in practice
>> with two devices being probed in parallel? Or something that you found
>> by inspection?
> 
> Yes, I find this problem after analyzing the reason of VM hung. At last, I
> find that the virtio-gpu cannot receive the MSI interrupts due to sharing
> a same event_id as virtio-serial.
> 
> See https://lkml.org/lkml/2019/1/10/299 for the bug report.
> 
> This problem can be reproducted with high probability by booting a Qemu/KVM
> VM with a virtio-serial controller and a virtio-gpu adding to a PCI Bridge
> and also adding some delay before creating ITS device.

Fair enough. Do you mind sharing your QEMU command line? It'd be useful
if I could reproduce it here (and would give me a way to check that it
doesn't regress).

> 
>>
>> The whole RID aliasing is such a mess, I wish we never supported
>> it. Anyway, comments below.
>>
>>>
>>> Signed-off-by: Zheng Xiang <zhengxiang9@huawei.com>
>>> ---
>>>  drivers/irqchip/irq-gic-v3-its.c | 52 +++++++++++++++-------------------------
>>>  1 file changed, 19 insertions(+), 33 deletions(-)
>>>
>>> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
>>> index db20e99..397edc8 100644
>>> --- a/drivers/irqchip/irq-gic-v3-its.c
>>> +++ b/drivers/irqchip/irq-gic-v3-its.c
>>> @@ -2205,25 +2205,6 @@ static void its_cpu_init_collections(void)
>>>  	raw_spin_unlock(&its_lock);
>>>  }
>>>  
>>> -static struct its_device *its_find_device(struct its_node *its, u32 dev_id)
>>> -{
>>> -	struct its_device *its_dev = NULL, *tmp;
>>> -	unsigned long flags;
>>> -
>>> -	raw_spin_lock_irqsave(&its->lock, flags);
>>> -
>>> -	list_for_each_entry(tmp, &its->its_device_list, entry) {
>>> -		if (tmp->device_id == dev_id) {
>>> -			its_dev = tmp;
>>> -			break;
>>> -		}
>>> -	}
>>> -
>>> -	raw_spin_unlock_irqrestore(&its->lock, flags);
>>> -
>>> -	return its_dev;
>>> -}
>>> -
>>>  static struct its_baser *its_get_baser(struct its_node *its, u32 type)
>>>  {
>>>  	int i;
>>> @@ -2321,7 +2302,7 @@ static bool its_alloc_vpe_table(u32 vpe_id)
>>>  static struct its_device *its_create_device(struct its_node *its, u32 dev_id,
>>>  					    int nvecs, bool alloc_lpis)
>>>  {
>>> -	struct its_device *dev;
>>> +	struct its_device *dev = NULL, *tmp;
>>>  	unsigned long *lpi_map = NULL;
>>>  	unsigned long flags;
>>>  	u16 *col_map = NULL;
>>> @@ -2331,6 +2312,24 @@ static struct its_device *its_create_device(struct its_node *its, u32 dev_id,
>>>  	int nr_ites;
>>>  	int sz;
>>>  
>>> +	raw_spin_lock_irqsave(&its->lock, flags);
>>> +	list_for_each_entry(tmp, &its->its_device_list, entry) {
>>> +		if (tmp->device_id == dev_id) {
>>> +			dev = tmp;
>>> +			break;
>>> +		}
>>> +	}
>>> +	if (dev) {
>>> +		/*
>>> +		 * We already have seen this ID, probably through
>>> +		 * another alias (PCI bridge of some sort). No need to
>>> +		 * create the device.
>>> +		 */
>>> +		pr_debug("Reusing ITT for devID %x\n", dev_id);
>>> +		raw_spin_unlock_irqrestore(&its->lock, flags);
>>> +		return dev;
>>> +	}
>>> +
>>>  	if (!its_alloc_device_table(its, dev_id))
>>
>> You're now performing all sort of allocations in an atomic context,
>> which is pretty horrible (and the kernel will shout at you for doing
>> so).
>>
>> We could probably keep the current logic and wrap it around a mutex
>> instead, which would give us the appropriate guarantees WRT allocations.
>> Something along those lines (untested):
>>
>> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
>> index db20e992a40f..99feb62e63ba 100644
>> --- a/drivers/irqchip/irq-gic-v3-its.c
>> +++ b/drivers/irqchip/irq-gic-v3-its.c
>> @@ -97,9 +97,14 @@ struct its_device;
>>   * The ITS structure - contains most of the infrastructure, with the
>>   * top-level MSI domain, the command queue, the collections, and the
>>   * list of devices writing to it.
>> + *
>> + * alloc_lock has to be taken for any allocation that can happen at
>> + * run time, while the spinlock must be taken to parse data structures
>> + * such as the device list.
>>   */
>>  struct its_node {
>>  	raw_spinlock_t		lock;
>> +	struct mutex		alloc_lock;
>>  	struct list_head	entry;
>>  	void __iomem		*base;
>>  	phys_addr_t		phys_base;
>> @@ -2421,6 +2426,7 @@ static int its_msi_prepare(struct irq_domain *domain, struct device *dev,
>>  	struct its_device *its_dev;
>>  	struct msi_domain_info *msi_info;
>>  	u32 dev_id;
>> +	int err = 0;
>>  
>>  	/*
>>  	 * We ignore "dev" entierely, and rely on the dev_id that has
>> @@ -2443,6 +2449,7 @@ static int its_msi_prepare(struct irq_domain *domain, struct device *dev,
>>  		return -EINVAL;
>>  	}
>>  
>> +	mutex_lock(&its->alloc_lock);
>>  	its_dev = its_find_device(its, dev_id);
>>  	if (its_dev) {
>>  		/*
>> @@ -2455,11 +2462,14 @@ static int its_msi_prepare(struct irq_domain *domain, struct device *dev,
>>  	}
>>  
>>  	its_dev = its_create_device(its, dev_id, nvec, true);
>> -	if (!its_dev)
>> -		return -ENOMEM;
>> +	if (!its_dev) {
>> +		err = -ENOMEM;
>> +		goto out;
>> +	}
>>  
>>  	pr_debug("ITT %d entries, %d bits\n", nvec, ilog2(nvec));
>>  out:
>> +	mutex_unlock(&its->alloc_lock);
>>  	info->scratchpad[0].ptr = its_dev;
>>  	return 0;
> 
> Should it return *err* here?

Absolutely. Does it fix the problem for you?

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...


* Re: [PATCH] irqchip/gic-v3-its: Lock its device list during find and create its device
  2019-01-28 13:51     ` Marc Zyngier
@ 2019-01-29  5:42       ` Zheng Xiang
  2019-01-31 14:47         ` Zheng Xiang
  0 siblings, 1 reply; 10+ messages in thread
From: Zheng Xiang @ 2019-01-29  5:42 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: linux-kernel, tglx, jason, wanghaibin.wang

On 2019/1/28 21:51, Marc Zyngier wrote:
> On 28/01/2019 07:13, Zheng Xiang wrote:
>> Hi Marc,
>>
>> Thanks for your review.
>>
>> On 2019/1/26 19:38, Marc Zyngier wrote:
>>> Hi Zheng,
>>>
>>> On Sat, 26 Jan 2019 06:16:24 +0000,
>>> Zheng Xiang <zhengxiang9@huawei.com> wrote:
>>>>
>>>> Currently each PCI device under a PCI Bridge shares the same device id
>>>> and ITS device. Assume there are two PCI devices call its_msi_prepare
>>>> concurrently and they are both going to find and create their ITS
>>>> device. There is a chance that the later one couldn't find ITS device
>>>> before the other one creating the ITS device. It will cause the later
>>>> one to create a different ITS device even if they have the same
>>>> device_id.
>>>
>>> Interesting finding. Is this something you've actually seen in practice
>>> with two devices being probed in parallel? Or something that you found
>>> by inspection?
>>
>> Yes, I find this problem after analyzing the reason of VM hung. At last, I
>> find that the virtio-gpu cannot receive the MSI interrupts due to sharing
>> a same event_id as virtio-serial.
>>
>> See https://lkml.org/lkml/2019/1/10/299 for the bug report.
>>
>> This problem can be reproducted with high probability by booting a Qemu/KVM
>> VM with a virtio-serial controller and a virtio-gpu adding to a PCI Bridge
>> and also adding some delay before creating ITS device.
> 
> Fair enough. Do you mind sharing your QEMU command line? It'd be useful
> if I could reproduce it here (and would give me a way to check that it
> doesn't regress).

Yes of course, my QEMU command line is below:

qemu-system-aarch64 \
    -name guest=arm64 \
    -machine virt,accel=kvm,usb=off,gic-version=3 \
    -cpu host \
    -bios /usr/share/edk2/aarch64/QEMU_EFI.fd \
    -nodefaults \
    -m 2048 \
    -smp 1 \
    -device ioh3420,port=0x8,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x1 \
    -device i82801b11-bridge,id=pci.2,bus=pcie.0,addr=0x2 \
    -device pci-bridge,chassis_nr=3,id=pci.3,bus=pci.2,addr=0x0 \
    -device ioh3420,port=0x9,chassis=4,id=pci.4,bus=pcie.0,addr=0x1.0x1 \
    -device virtio-scsi-pci,id=scsi0,bus=pci.4,addr=0x0 \
    -drive file=/home/zhengxiang/tmp.raw,format=raw,if=none,id=drive-scsi0-0-0-0,cache=none,aio=threads \
    -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 \
    -drive file=/dev/mapper/VolGroup00-lvol_7,format=raw,if=none,id=drive-virtio-disk0,cache=none,aio=threads \
    -device virtio-blk-pci,scsi=off,bus=pci.1,addr=0x0,drive=drive-virtio-disk0,id=virtio-disk0 \
    -device virtio-gpu-pci,id=video0,bus=pci.3,addr=0x2 \
    -device virtio-serial-pci,id=virtio-serial0,bus=pci.3,addr=0x3 \
    -device usb-ehci,id=usb,bus=pci.3,addr=0x1 \
    -device usb-kbd,id=input1,bus=usb.0,port=2 \
    -monitor telnet:0.0.0.0:22222,server,nowait \
    -vnc 0.0.0.0:8 \
    -msg timestamp=on \
    -serial stdio \

Add an *msleep* between *its_find_device* and *its_create_device* to
increase the probability of hitting the race.
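For illustration, the debug delay might be inserted along these lines (a
sketch against the pre-patch its_msi_prepare(), not part of the proposed fix;
the 100 ms value is arbitrary and linux/delay.h is assumed to be included):

```
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ static int its_msi_prepare(struct irq_domain *domain, struct device *dev,
 	its_dev = its_find_device(its, dev_id);
+	/* debug only: widen the window between find and create */
+	msleep(100);
 	if (its_dev) {
```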

> 
>>
>>>
>>> The whole RID aliasing is such a mess, I wish we never supported
>>> it. Anyway, comments below.
>>>
>>>>
>>>> Signed-off-by: Zheng Xiang <zhengxiang9@huawei.com>
>>>> ---
>>>>  drivers/irqchip/irq-gic-v3-its.c | 52 +++++++++++++++-------------------------
>>>>  1 file changed, 19 insertions(+), 33 deletions(-)
>>>>
>>>> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
>>>> index db20e99..397edc8 100644
>>>> --- a/drivers/irqchip/irq-gic-v3-its.c
>>>> +++ b/drivers/irqchip/irq-gic-v3-its.c
>>>> @@ -2205,25 +2205,6 @@ static void its_cpu_init_collections(void)
>>>>  	raw_spin_unlock(&its_lock);
>>>>  }
>>>>  
>>>> -static struct its_device *its_find_device(struct its_node *its, u32 dev_id)
>>>> -{
>>>> -	struct its_device *its_dev = NULL, *tmp;
>>>> -	unsigned long flags;
>>>> -
>>>> -	raw_spin_lock_irqsave(&its->lock, flags);
>>>> -
>>>> -	list_for_each_entry(tmp, &its->its_device_list, entry) {
>>>> -		if (tmp->device_id == dev_id) {
>>>> -			its_dev = tmp;
>>>> -			break;
>>>> -		}
>>>> -	}
>>>> -
>>>> -	raw_spin_unlock_irqrestore(&its->lock, flags);
>>>> -
>>>> -	return its_dev;
>>>> -}
>>>> -
>>>>  static struct its_baser *its_get_baser(struct its_node *its, u32 type)
>>>>  {
>>>>  	int i;
>>>> @@ -2321,7 +2302,7 @@ static bool its_alloc_vpe_table(u32 vpe_id)
>>>>  static struct its_device *its_create_device(struct its_node *its, u32 dev_id,
>>>>  					    int nvecs, bool alloc_lpis)
>>>>  {
>>>> -	struct its_device *dev;
>>>> +	struct its_device *dev = NULL, *tmp;
>>>>  	unsigned long *lpi_map = NULL;
>>>>  	unsigned long flags;
>>>>  	u16 *col_map = NULL;
>>>> @@ -2331,6 +2312,24 @@ static struct its_device *its_create_device(struct its_node *its, u32 dev_id,
>>>>  	int nr_ites;
>>>>  	int sz;
>>>>  
>>>> +	raw_spin_lock_irqsave(&its->lock, flags);
>>>> +	list_for_each_entry(tmp, &its->its_device_list, entry) {
>>>> +		if (tmp->device_id == dev_id) {
>>>> +			dev = tmp;
>>>> +			break;
>>>> +		}
>>>> +	}
>>>> +	if (dev) {
>>>> +		/*
>>>> +		 * We already have seen this ID, probably through
>>>> +		 * another alias (PCI bridge of some sort). No need to
>>>> +		 * create the device.
>>>> +		 */
>>>> +		pr_debug("Reusing ITT for devID %x\n", dev_id);
>>>> +		raw_spin_unlock_irqrestore(&its->lock, flags);
>>>> +		return dev;
>>>> +	}
>>>> +
>>>>  	if (!its_alloc_device_table(its, dev_id))
>>>
>>> You're now performing all sort of allocations in an atomic context,
>>> which is pretty horrible (and the kernel will shout at you for doing
>>> so).
>>>
>>> We could probably keep the current logic and wrap it around a mutex
>>> instead, which would give us the appropriate guarantees WRT allocations.
>>> Something along those lines (untested):
>>>
>>> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
>>> index db20e992a40f..99feb62e63ba 100644
>>> --- a/drivers/irqchip/irq-gic-v3-its.c
>>> +++ b/drivers/irqchip/irq-gic-v3-its.c
>>> @@ -97,9 +97,14 @@ struct its_device;
>>>   * The ITS structure - contains most of the infrastructure, with the
>>>   * top-level MSI domain, the command queue, the collections, and the
>>>   * list of devices writing to it.
>>> + *
>>> + * alloc_lock has to be taken for any allocation that can happen at
>>> + * run time, while the spinlock must be taken to parse data structures
>>> + * such as the device list.
>>>   */
>>>  struct its_node {
>>>  	raw_spinlock_t		lock;
>>> +	struct mutex		alloc_lock;
>>>  	struct list_head	entry;
>>>  	void __iomem		*base;
>>>  	phys_addr_t		phys_base;
>>> @@ -2421,6 +2426,7 @@ static int its_msi_prepare(struct irq_domain *domain, struct device *dev,
>>>  	struct its_device *its_dev;
>>>  	struct msi_domain_info *msi_info;
>>>  	u32 dev_id;
>>> +	int err = 0;
>>>  
>>>  	/*
>>>  	 * We ignore "dev" entierely, and rely on the dev_id that has
>>> @@ -2443,6 +2449,7 @@ static int its_msi_prepare(struct irq_domain *domain, struct device *dev,
>>>  		return -EINVAL;
>>>  	}
>>>  
>>> +	mutex_lock(&its->alloc_lock);
>>>  	its_dev = its_find_device(its, dev_id);
>>>  	if (its_dev) {
>>>  		/*
>>> @@ -2455,11 +2462,14 @@ static int its_msi_prepare(struct irq_domain *domain, struct device *dev,
>>>  	}
>>>  
>>>  	its_dev = its_create_device(its, dev_id, nvec, true);
>>> -	if (!its_dev)
>>> -		return -ENOMEM;
>>> +	if (!its_dev) {
>>> +		err = -ENOMEM;
>>> +		goto out;
>>> +	}
>>>  
>>>  	pr_debug("ITT %d entries, %d bits\n", nvec, ilog2(nvec));
>>>  out:
>>> +	mutex_unlock(&its->alloc_lock);
>>>  	info->scratchpad[0].ptr = its_dev;
>>>  	return 0;
>>
>> Should it return *err* here?
> 
> Absolutely. Does it fix the problem for you?

Yes, the VM no longer hangs after thousands of boot/reboot cycles.

> 
> Thanks,
> 
> 	M.
> 
-- 

Thanks,
Xiang




* Re: [PATCH] irqchip/gic-v3-its: Lock its device list during find and create its device
  2019-01-29  5:42       ` Zheng Xiang
@ 2019-01-31 14:47         ` Zheng Xiang
  2019-01-31 15:12           ` Marc Zyngier
  0 siblings, 1 reply; 10+ messages in thread
From: Zheng Xiang @ 2019-01-31 14:47 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: linux-kernel, tglx, jason, wanghaibin.wang

Hi Marc,

On 2019/1/29 13:42, Zheng Xiang wrote:
> On 2019/1/28 21:51, Marc Zyngier wrote:
>> On 28/01/2019 07:13, Zheng Xiang wrote:
>>> Hi Marc,
>>>
>>> Thanks for your review.
>>>
>>> On 2019/1/26 19:38, Marc Zyngier wrote:
>>>> Hi Zheng,
>>>>
>>>> On Sat, 26 Jan 2019 06:16:24 +0000,
>>>> Zheng Xiang <zhengxiang9@huawei.com> wrote:
>>>>>
>>>>> Currently each PCI device under a PCI Bridge shares the same device id
>>>>> and ITS device. Assume there are two PCI devices call its_msi_prepare
>>>>> concurrently and they are both going to find and create their ITS
>>>>> device. There is a chance that the later one couldn't find ITS device
>>>>> before the other one creating the ITS device. It will cause the later
>>>>> one to create a different ITS device even if they have the same
>>>>> device_id.
>>>>
>>>> Interesting finding. Is this something you've actually seen in practice
>>>> with two devices being probed in parallel? Or something that you found
>>>> by inspection?
>>>
>>> Yes, I found this problem after analyzing why a VM hung. In the end, I
>>> found that the virtio-gpu could not receive MSI interrupts because it
>>> shared the same event_id as the virtio-serial device.
>>>
>>> See https://lkml.org/lkml/2019/1/10/299 for the bug report.
>>>
>>> This problem can be reproduced with high probability by booting a QEMU/KVM
>>> VM with a virtio-serial controller and a virtio-gpu added to a PCI Bridge,
>>> and also adding some delay before creating the ITS device.
>>
>> Fair enough. Do you mind sharing your QEMU command line? It'd be useful
>> if I could reproduce it here (and would give me a way to check that it
>> doesn't regress).
> 

Have you reproduced it with my QEMU command line?

If so, should I send a V2 patch with your suggestion?

> Yes of course, my QEMU command line is below:
> 
> qemu-system-aarch64 \
>     -name guest=arm64 \
>     -machine virt,accel=kvm,usb=off,gic-version=3 \
>     -cpu host \
>     -bios /usr/share/edk2/aarch64/QEMU_EFI.fd \
>     -nodefaults \
>     -m 2048 \
>     -smp 1 \
>     -device ioh3420,port=0x8,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x1 \
>     -device i82801b11-bridge,id=pci.2,bus=pcie.0,addr=0x2 \
>     -device pci-bridge,chassis_nr=3,id=pci.3,bus=pci.2,addr=0x0 \
>     -device ioh3420,port=0x9,chassis=4,id=pci.4,bus=pcie.0,addr=0x1.0x1 \
>     -device virtio-scsi-pci,id=scsi0,bus=pci.4,addr=0x0 \
>     -drive file=/home/zhengxiang/tmp.raw,format=raw,if=none,id=drive-scsi0-0-0-0,cache=none,aio=threads \
>     -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 \
>     -drive file=/dev/mapper/VolGroup00-lvol_7,format=raw,if=none,id=drive-virtio-disk0,cache=none,aio=threads \
>     -device virtio-blk-pci,scsi=off,bus=pci.1,addr=0x0,drive=drive-virtio-disk0,id=virtio-disk0 \
>     -device virtio-gpu-pci,id=video0,bus=pci.3,addr=0x2 \
>     -device virtio-serial-pci,id=virtio-serial0,bus=pci.3,addr=0x3 \
>     -device usb-ehci,id=usb,bus=pci.3,addr=0x1 \
>     -device usb-kbd,id=input1,bus=usb.0,port=2 \
>     -monitor telnet:0.0.0.0:22222,server,nowait \
>     -vnc 0.0.0.0:8 \
>     -msg timestamp=on \
>     -serial stdio \
> 
> Add an *msleep* between *its_find_device* and *its_create_device* to
> increase the probability of hitting the race.
> 
>>
>>>
>>>>
>>>> The whole RID aliasing is such a mess, I wish we never supported
>>>> it. Anyway, comments below.
>>>>
>>>>>
>>>>> Signed-off-by: Zheng Xiang <zhengxiang9@huawei.com>
>>>>> ---
>>>>>  drivers/irqchip/irq-gic-v3-its.c | 52 +++++++++++++++-------------------------
>>>>>  1 file changed, 19 insertions(+), 33 deletions(-)
>>>>>
>>>>> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
>>>>> index db20e99..397edc8 100644
>>>>> --- a/drivers/irqchip/irq-gic-v3-its.c
>>>>> +++ b/drivers/irqchip/irq-gic-v3-its.c
>>>>> @@ -2205,25 +2205,6 @@ static void its_cpu_init_collections(void)
>>>>>  	raw_spin_unlock(&its_lock);
>>>>>  }
>>>>>  
>>>>> -static struct its_device *its_find_device(struct its_node *its, u32 dev_id)
>>>>> -{
>>>>> -	struct its_device *its_dev = NULL, *tmp;
>>>>> -	unsigned long flags;
>>>>> -
>>>>> -	raw_spin_lock_irqsave(&its->lock, flags);
>>>>> -
>>>>> -	list_for_each_entry(tmp, &its->its_device_list, entry) {
>>>>> -		if (tmp->device_id == dev_id) {
>>>>> -			its_dev = tmp;
>>>>> -			break;
>>>>> -		}
>>>>> -	}
>>>>> -
>>>>> -	raw_spin_unlock_irqrestore(&its->lock, flags);
>>>>> -
>>>>> -	return its_dev;
>>>>> -}
>>>>> -
>>>>>  static struct its_baser *its_get_baser(struct its_node *its, u32 type)
>>>>>  {
>>>>>  	int i;
>>>>> @@ -2321,7 +2302,7 @@ static bool its_alloc_vpe_table(u32 vpe_id)
>>>>>  static struct its_device *its_create_device(struct its_node *its, u32 dev_id,
>>>>>  					    int nvecs, bool alloc_lpis)
>>>>>  {
>>>>> -	struct its_device *dev;
>>>>> +	struct its_device *dev = NULL, *tmp;
>>>>>  	unsigned long *lpi_map = NULL;
>>>>>  	unsigned long flags;
>>>>>  	u16 *col_map = NULL;
>>>>> @@ -2331,6 +2312,24 @@ static struct its_device *its_create_device(struct its_node *its, u32 dev_id,
>>>>>  	int nr_ites;
>>>>>  	int sz;
>>>>>  
>>>>> +	raw_spin_lock_irqsave(&its->lock, flags);
>>>>> +	list_for_each_entry(tmp, &its->its_device_list, entry) {
>>>>> +		if (tmp->device_id == dev_id) {
>>>>> +			dev = tmp;
>>>>> +			break;
>>>>> +		}
>>>>> +	}
>>>>> +	if (dev) {
>>>>> +		/*
>>>>> +		 * We already have seen this ID, probably through
>>>>> +		 * another alias (PCI bridge of some sort). No need to
>>>>> +		 * create the device.
>>>>> +		 */
>>>>> +		pr_debug("Reusing ITT for devID %x\n", dev_id);
>>>>> +		raw_spin_unlock_irqrestore(&its->lock, flags);
>>>>> +		return dev;
>>>>> +	}
>>>>> +
>>>>>  	if (!its_alloc_device_table(its, dev_id))
>>>>
>>>> You're now performing all sort of allocations in an atomic context,
>>>> which is pretty horrible (and the kernel will shout at you for doing
>>>> so).
>>>>
>>>> We could probably keep the current logic and wrap it around a mutex
>>>> instead, which would give us the appropriate guarantees WRT allocations.
>>>> Something along those lines (untested):
>>>>
>>>> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
>>>> index db20e992a40f..99feb62e63ba 100644
>>>> --- a/drivers/irqchip/irq-gic-v3-its.c
>>>> +++ b/drivers/irqchip/irq-gic-v3-its.c
>>>> @@ -97,9 +97,14 @@ struct its_device;
>>>>   * The ITS structure - contains most of the infrastructure, with the
>>>>   * top-level MSI domain, the command queue, the collections, and the
>>>>   * list of devices writing to it.
>>>> + *
>>>> + * alloc_lock has to be taken for any allocation that can happen at
>>>> + * run time, while the spinlock must be taken to parse data structures
>>>> + * such as the device list.
>>>>   */
>>>>  struct its_node {
>>>>  	raw_spinlock_t		lock;
>>>> +	struct mutex		alloc_lock;
>>>>  	struct list_head	entry;
>>>>  	void __iomem		*base;
>>>>  	phys_addr_t		phys_base;
>>>> @@ -2421,6 +2426,7 @@ static int its_msi_prepare(struct irq_domain *domain, struct device *dev,
>>>>  	struct its_device *its_dev;
>>>>  	struct msi_domain_info *msi_info;
>>>>  	u32 dev_id;
>>>> +	int err = 0;
>>>>  
>>>>  	/*
>>>>  	 * We ignore "dev" entierely, and rely on the dev_id that has
>>>> @@ -2443,6 +2449,7 @@ static int its_msi_prepare(struct irq_domain *domain, struct device *dev,
>>>>  		return -EINVAL;
>>>>  	}
>>>>  
>>>> +	mutex_lock(&its->alloc_lock);
>>>>  	its_dev = its_find_device(its, dev_id);
>>>>  	if (its_dev) {
>>>>  		/*
>>>> @@ -2455,11 +2462,14 @@ static int its_msi_prepare(struct irq_domain *domain, struct device *dev,
>>>>  	}
>>>>  
>>>>  	its_dev = its_create_device(its, dev_id, nvec, true);
>>>> -	if (!its_dev)
>>>> -		return -ENOMEM;
>>>> +	if (!its_dev) {
>>>> +		err = -ENOMEM;
>>>> +		goto out;
>>>> +	}
>>>>  
>>>>  	pr_debug("ITT %d entries, %d bits\n", nvec, ilog2(nvec));
>>>>  out:
>>>> +	mutex_unlock(&its->alloc_lock);
>>>>  	info->scratchpad[0].ptr = its_dev;
>>>>  	return 0;
>>>
>>> Should it return *err* here?
>>
>> Absolutely. Does it fix the problem for you?
> 
> Yes, the VM doesn't hang anymore after thousands of boot/reboot cycles.
> 
>>
>> Thanks,
>>
>> 	M.
>>
-- 

Thanks,
Xiang




* Re: [PATCH] irqchip/gic-v3-its: Lock its device list during find and create its device
  2019-01-31 14:47         ` Zheng Xiang
@ 2019-01-31 15:12           ` Marc Zyngier
  2019-02-01  6:41             ` Zheng Xiang
  0 siblings, 1 reply; 10+ messages in thread
From: Marc Zyngier @ 2019-01-31 15:12 UTC (permalink / raw)
  To: Zheng Xiang; +Cc: linux-kernel, tglx, jason, wanghaibin.wang

Hi Zheng,

On 31/01/2019 14:47, Zheng Xiang wrote:
> Hi Marc,
> 
> On 2019/1/29 13:42, Zheng Xiang wrote:
>> On 2019/1/28 21:51, Marc Zyngier wrote:
>>> On 28/01/2019 07:13, Zheng Xiang wrote:
>>>> Hi Marc,
>>>>
>>>> Thanks for your review.
>>>>
>>>> On 2019/1/26 19:38, Marc Zyngier wrote:
>>>>> Hi Zheng,
>>>>>
>>>>> On Sat, 26 Jan 2019 06:16:24 +0000,
>>>>> Zheng Xiang <zhengxiang9@huawei.com> wrote:
>>>>>>
>>>>>> Currently each PCI device under a PCI Bridge shares the same device id
>>>>>> and ITS device. Assume there are two PCI devices call its_msi_prepare
>>>>>> concurrently and they are both going to find and create their ITS
>>>>>> device. There is a chance that the later one couldn't find ITS device
>>>>>> before the other one creating the ITS device. It will cause the later
>>>>>> one to create a different ITS device even if they have the same
>>>>>> device_id.
>>>>>
>>>>> Interesting finding. Is this something you've actually seen in practice
>>>>> with two devices being probed in parallel? Or something that you found
>>>>> by inspection?
>>>>
>>>> Yes, I found this problem after analyzing why a VM hung. In the end, I
>>>> found that the virtio-gpu could not receive MSI interrupts because it
>>>> shared the same event_id as the virtio-serial device.
>>>>
>>>> See https://lkml.org/lkml/2019/1/10/299 for the bug report.
>>>>
>>>> This problem can be reproduced with high probability by booting a QEMU/KVM
>>>> VM with a virtio-serial controller and a virtio-gpu added to a PCI Bridge,
>>>> and also adding some delay before creating the ITS device.
>>>
>>> Fair enough. Do you mind sharing your QEMU command line? It'd be useful
>>> if I could reproduce it here (and would give me a way to check that it
>>> doesn't regress).
>>
> 
> Have you reproduced it with my QEMU command line?
> 
> If so, should I send a V2 patch with your suggestion?

I've queued the following, much more complete patch:

https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/commit/?h=irq/irqchip-next&id=9791ec7df0e7b4d80706ccea8f24b6542f6059e9

Can you check that it works for you? I didn't manage to get the right
timing conditions, but I also had issues getting virtio-gpu running on
my TX2, so one might explain the other.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...


* Re: [PATCH] irqchip/gic-v3-its: Lock its device list during find and create its device
  2019-01-31 15:12           ` Marc Zyngier
@ 2019-02-01  6:41             ` Zheng Xiang
  2019-02-01  9:28               ` Marc Zyngier
  0 siblings, 1 reply; 10+ messages in thread
From: Zheng Xiang @ 2019-02-01  6:41 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: linux-kernel, tglx, jason, wanghaibin.wang


On 2019/1/31 23:12, Marc Zyngier wrote:
> Hi Zheng,
> 
> On 31/01/2019 14:47, Zheng Xiang wrote:
>> Hi Marc,
>>
>> On 2019/1/29 13:42, Zheng Xiang wrote:
>>> On 2019/1/28 21:51, Marc Zyngier wrote:
>>>> On 28/01/2019 07:13, Zheng Xiang wrote:
>>>>> Hi Marc,
>>>>>
>>>>> Thanks for your review.
>>>>>
>>>>> On 2019/1/26 19:38, Marc Zyngier wrote:
>>>>>> Hi Zheng,
>>>>>>
>>>>>> On Sat, 26 Jan 2019 06:16:24 +0000,
>>>>>> Zheng Xiang <zhengxiang9@huawei.com> wrote:
>>>>>>>
>>>>>>> Currently each PCI device under a PCI Bridge shares the same device id
>>>>>>> and ITS device. Assume there are two PCI devices call its_msi_prepare
>>>>>>> concurrently and they are both going to find and create their ITS
>>>>>>> device. There is a chance that the later one couldn't find ITS device
>>>>>>> before the other one creating the ITS device. It will cause the later
>>>>>>> one to create a different ITS device even if they have the same
>>>>>>> device_id.
>>>>>>
>>>>>> Interesting finding. Is this something you've actually seen in practice
>>>>>> with two devices being probed in parallel? Or something that you found
>>>>>> by inspection?
>>>>>
>>>>> Yes, I found this problem after analyzing why a VM hung. In the end, I
>>>>> found that the virtio-gpu could not receive MSI interrupts because it
>>>>> shared the same event_id as the virtio-serial device.
>>>>>
>>>>> See https://lkml.org/lkml/2019/1/10/299 for the bug report.
>>>>>
>>>>> This problem can be reproduced with high probability by booting a QEMU/KVM
>>>>> VM with a virtio-serial controller and a virtio-gpu added to a PCI Bridge,
>>>>> and also adding some delay before creating the ITS device.
>>>>
>>>> Fair enough. Do you mind sharing your QEMU command line? It'd be useful
>>>> if I could reproduce it here (and would give me a way to check that it
>>>> doesn't regress).
>>>
>>
>> Have you reproduced it with my QEMU command line?
>>
>> If so, should I send a V2 patch with your suggestion?
> 
> I've queued the following, much more complete patch:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/commit/?h=irq/irqchip-next&id=9791ec7df0e7b4d80706ccea8f24b6542f6059e9
> 
> Can you check that it works for you? I didn't manage to get the right
> timing conditions, but I also had issues getting virtio-gpu running on
> my TX2, so one might explain the other.
> 

It works for my case, but I'm worried that the lines below may cause a
memory leak.

@@ -2627,8 +2640,14 @@ static void its_irq_domain_free(struct irq_domain *domain, unsigned int virq,
 		irq_domain_reset_irq_data(data);
 	}

-	/* If all interrupts have been freed, start mopping the floor */
-	if (bitmap_empty(its_dev->event_map.lpi_map,
+	mutex_lock(&its->dev_alloc_lock);
+
+	/*
+	 * If all interrupts have been freed, start mopping the
+	 * floor. This is conditionned on the device not being shared.
+	 */
+	if (!its_dev->shared &&
+	    bitmap_empty(its_dev->event_map.lpi_map,
 			 its_dev->event_map.nr_lpis)) {
 		its_lpi_free(its_dev->event_map.lpi_map,
 			     its_dev->event_map.lpi_base,

It seems that the shared its_dev would never be freed since the value of
its_dev->shared is always *true*.


> Thanks,
> 
> 	M.
> 
-- 

Thanks,
Xiang




* Re: [PATCH] irqchip/gic-v3-its: Lock its device list during find and create its device
  2019-02-01  6:41             ` Zheng Xiang
@ 2019-02-01  9:28               ` Marc Zyngier
  2019-02-02  1:51                 ` Zheng Xiang
  0 siblings, 1 reply; 10+ messages in thread
From: Marc Zyngier @ 2019-02-01  9:28 UTC (permalink / raw)
  To: Zheng Xiang; +Cc: linux-kernel, tglx, jason, wanghaibin.wang

On 01/02/2019 06:41, Zheng Xiang wrote:
> 
> On 2019/1/31 23:12, Marc Zyngier wrote:
>> Hi Zheng,
>>
>> On 31/01/2019 14:47, Zheng Xiang wrote:
>>> Hi Marc,
>>>
>>> On 2019/1/29 13:42, Zheng Xiang wrote:
>>>> On 2019/1/28 21:51, Marc Zyngier wrote:
>>>>> On 28/01/2019 07:13, Zheng Xiang wrote:
>>>>>> Hi Marc,
>>>>>>
>>>>>> Thanks for your review.
>>>>>>
>>>>>> On 2019/1/26 19:38, Marc Zyngier wrote:
>>>>>>> Hi Zheng,
>>>>>>>
>>>>>>> On Sat, 26 Jan 2019 06:16:24 +0000,
>>>>>>> Zheng Xiang <zhengxiang9@huawei.com> wrote:
>>>>>>>>
>>>>>>>> Currently each PCI device under a PCI Bridge shares the same device id
>>>>>>>> and ITS device. Assume there are two PCI devices call its_msi_prepare
>>>>>>>> concurrently and they are both going to find and create their ITS
>>>>>>>> device. There is a chance that the later one couldn't find ITS device
>>>>>>>> before the other one creating the ITS device. It will cause the later
>>>>>>>> one to create a different ITS device even if they have the same
>>>>>>>> device_id.
>>>>>>>
>>>>>>> Interesting finding. Is this something you've actually seen in practice
>>>>>>> with two devices being probed in parallel? Or something that you found
>>>>>>> by inspection?
>>>>>>
>>>>>> Yes, I found this problem after analyzing why a VM hung. In the end, I
>>>>>> found that the virtio-gpu could not receive MSI interrupts because it
>>>>>> shared the same event_id as the virtio-serial device.
>>>>>>
>>>>>> See https://lkml.org/lkml/2019/1/10/299 for the bug report.
>>>>>>
>>>>>> This problem can be reproduced with high probability by booting a QEMU/KVM
>>>>>> VM with a virtio-serial controller and a virtio-gpu added to a PCI Bridge,
>>>>>> and also adding some delay before creating the ITS device.
>>>>>
>>>>> Fair enough. Do you mind sharing your QEMU command line? It'd be useful
>>>>> if I could reproduce it here (and would give me a way to check that it
>>>>> doesn't regress).
>>>>
>>>
>>> Have you reproduced it with my QEMU command line?
>>>
>>> If so, should I send a V2 patch with your suggestion?
>>
>> I've queued the following, much more complete patch:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/commit/?h=irq/irqchip-next&id=9791ec7df0e7b4d80706ccea8f24b6542f6059e9
>>
>> Can you check that it works for you? I didn't manage to get the right
>> timing conditions, but I also had issues getting virtio-gpu running on
>> my TX2, so one might explain the other.
>>
> 
> It works for my case, but I'm worried that the lines below may cause a
> memory leak.
> 
> @@ -2627,8 +2640,14 @@ static void its_irq_domain_free(struct irq_domain *domain, unsigned int virq,
>  		irq_domain_reset_irq_data(data);
>  	}
> 
> -	/* If all interrupts have been freed, start mopping the floor */
> -	if (bitmap_empty(its_dev->event_map.lpi_map,
> +	mutex_lock(&its->dev_alloc_lock);
> +
> +	/*
> +	 * If all interrupts have been freed, start mopping the
> +	 * floor. This is conditionned on the device not being shared.
> +	 */
> +	if (!its_dev->shared &&
> +	    bitmap_empty(its_dev->event_map.lpi_map,
>  			 its_dev->event_map.nr_lpis)) {
>  		its_lpi_free(its_dev->event_map.lpi_map,
>  			     its_dev->event_map.lpi_base,
> 
> It seems that the shared its_dev would never be freed since the value of
> its_dev->shared is always *true*.

Yes, and that is on purpose. As we don't refcount the number of
interrupts that have been requested in the prepare phase, there is a
race between free and alloc. We can have the following situation:

CPU0:               CPU1:

msi_prepare:
mutex_lock()
find device()
  -> found
store its_dev
mutex_unlock()

                    its_irq_domain_free:
                    mutex_lock()
                    free_device()
                    mutex_unlock()

its_irq_domain_alloc:
use its_dev -> boom.


So the trick is not to free the its_dev structure if it shares a devid.
It is not really a leak, as the next device sharing the same devid will
pick up the same structure.

Does it make sense?

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...
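Marc's explanation of why a shared its_dev is deliberately never freed can be modelled in a few lines of userspace C. This is only a sketch under assumed names (prepare/domain_free, a one-entry "device list"), not the queued patch itself:

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

struct dev_model {
	unsigned int devid;
	bool shared;		/* set when a second alias picks it up */
	unsigned int events;	/* stand-in for the LPI bitmap */
};

static pthread_mutex_t dev_alloc_lock = PTHREAD_MUTEX_INITIALIZER;
static struct dev_model *cached;	/* one-entry "device list" */

struct dev_model *prepare(unsigned int devid)
{
	struct dev_model *d;

	pthread_mutex_lock(&dev_alloc_lock);
	if (cached && cached->devid == devid) {
		cached->shared = true;	/* second user of the same devid */
		d = cached;
	} else {
		d = calloc(1, sizeof(*d));
		if (d) {
			d->devid = devid;
			cached = d;
		}
	}
	pthread_mutex_unlock(&dev_alloc_lock);
	return d;
}

/* Returns true if the device was actually torn down. */
bool domain_free(struct dev_model *d)
{
	bool freed = false;

	pthread_mutex_lock(&dev_alloc_lock);
	if (!d->shared && d->events == 0) {	/* the !shared check */
		if (cached == d)
			cached = NULL;
		free(d);
		freed = true;
	}
	pthread_mutex_unlock(&dev_alloc_lock);
	return freed;
}
```

Because the free path keeps any device whose devid was ever shared, a later prepare for the same devid picks up the same structure instead of racing with the teardown, which is exactly the trade-off described above: kept memory, not a leak.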


* Re: [PATCH] irqchip/gic-v3-its: Lock its device list during find and create its device
  2019-02-01  9:28               ` Marc Zyngier
@ 2019-02-02  1:51                 ` Zheng Xiang
  0 siblings, 0 replies; 10+ messages in thread
From: Zheng Xiang @ 2019-02-02  1:51 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: linux-kernel, tglx, jason, wanghaibin.wang


On 2019/2/1 17:28, Marc Zyngier wrote:
> On 01/02/2019 06:41, Zheng Xiang wrote:
>>
>> On 2019/1/31 23:12, Marc Zyngier wrote:
>>> Hi Zheng,
>>>
>>> On 31/01/2019 14:47, Zheng Xiang wrote:
>>>> Hi Marc,
>>>>
>>>> On 2019/1/29 13:42, Zheng Xiang wrote:
>>>>> On 2019/1/28 21:51, Marc Zyngier wrote:
>>>>>> On 28/01/2019 07:13, Zheng Xiang wrote:
>>>>>>> Hi Marc,
>>>>>>>
>>>>>>> Thanks for your review.
>>>>>>>
>>>>>>> On 2019/1/26 19:38, Marc Zyngier wrote:
>>>>>>>> Hi Zheng,
>>>>>>>>
>>>>>>>> On Sat, 26 Jan 2019 06:16:24 +0000,
>>>>>>>> Zheng Xiang <zhengxiang9@huawei.com> wrote:
>>>>>>>>>
>>>>>>>>> Currently each PCI device under a PCI Bridge shares the same device id
>>>>>>>>> and ITS device. Assume there are two PCI devices call its_msi_prepare
>>>>>>>>> concurrently and they are both going to find and create their ITS
>>>>>>>>> device. There is a chance that the later one couldn't find ITS device
>>>>>>>>> before the other one creating the ITS device. It will cause the later
>>>>>>>>> one to create a different ITS device even if they have the same
>>>>>>>>> device_id.
>>>>>>>>
>>>>>>>> Interesting finding. Is this something you've actually seen in practice
>>>>>>>> with two devices being probed in parallel? Or something that you found
>>>>>>>> by inspection?
>>>>>>>
>>>>>>> Yes, I found this problem after analyzing why a VM hung. In the end, I
>>>>>>> found that the virtio-gpu could not receive MSI interrupts because it
>>>>>>> shared the same event_id as the virtio-serial device.
>>>>>>>
>>>>>>> See https://lkml.org/lkml/2019/1/10/299 for the bug report.
>>>>>>>
>>>>>>> This problem can be reproduced with high probability by booting a QEMU/KVM
>>>>>>> VM with a virtio-serial controller and a virtio-gpu added to a PCI Bridge,
>>>>>>> and also adding some delay before creating the ITS device.
>>>>>>
>>>>>> Fair enough. Do you mind sharing your QEMU command line? It'd be useful
>>>>>> if I could reproduce it here (and would give me a way to check that it
>>>>>> doesn't regress).
>>>>>
>>>>
>>>> Have you reproduced it with my QEMU command line?
>>>>
>>>> If so, should I send a V2 patch with your suggestion?
>>>
>>> I've queued the following, much more complete patch:
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/commit/?h=irq/irqchip-next&id=9791ec7df0e7b4d80706ccea8f24b6542f6059e9
>>>
>>> Can you check that it works for you? I didn't manage to get the right
>>> timing conditions, but I also had issues getting virtio-gpu running on
>>> my TX2, so one might explain the other.
>>>
>>
>> It works for my case, but I'm worried that the lines below may cause a
>> memory leak.
>>
>> @@ -2627,8 +2640,14 @@ static void its_irq_domain_free(struct irq_domain *domain, unsigned int virq,
>>  		irq_domain_reset_irq_data(data);
>>  	}
>>
>> -	/* If all interrupts have been freed, start mopping the floor */
>> -	if (bitmap_empty(its_dev->event_map.lpi_map,
>> +	mutex_lock(&its->dev_alloc_lock);
>> +
>> +	/*
>> +	 * If all interrupts have been freed, start mopping the
>> +	 * floor. This is conditionned on the device not being shared.
>> +	 */
>> +	if (!its_dev->shared &&
>> +	    bitmap_empty(its_dev->event_map.lpi_map,
>>  			 its_dev->event_map.nr_lpis)) {
>>  		its_lpi_free(its_dev->event_map.lpi_map,
>>  			     its_dev->event_map.lpi_base,
>>
>> It seems that the shared its_dev would never be freed since the value of
>> its_dev->shared is always *true*.
> 
> Yes, and that is on purpose. As we don't refcount the number of
> interrupts that have been requested in the prepare phase, there is a
> race between free and alloc. We can have the following situation:
> 
> CPU0:               CPU1:
> 
> msi_prepare:
> mutex_lock()
> find device()
>   -> found
> store its_dev
> mutex_unlock()
> 
>                     its_irq_domain_free:
>                     mutex_lock()
>                     free_device()
>                     mutex_unlock()
> 
> its_irq_domain_alloc:
> use its_dev -> boom.
> 
> 
> So the trick is not to free the its_dev structure if it shares a devid.
> It is not really a leak, as the next device sharing the same devid will
> pick up the same structure.
> 
> Does it make sense?

Yes, thanks a lot!


-- 

Thanks,
Xiang




end of thread, other threads:[~2019-02-02  1:52 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-26  6:16 [PATCH] irqchip/gic-v3-its: Lock its device list during find and create its device Zheng Xiang
2019-01-26 11:38 ` Marc Zyngier
2019-01-28  7:13   ` Zheng Xiang
2019-01-28 13:51     ` Marc Zyngier
2019-01-29  5:42       ` Zheng Xiang
2019-01-31 14:47         ` Zheng Xiang
2019-01-31 15:12           ` Marc Zyngier
2019-02-01  6:41             ` Zheng Xiang
2019-02-01  9:28               ` Marc Zyngier
2019-02-02  1:51                 ` Zheng Xiang
