linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] iommu/amd: Fix event counter availability check
@ 2020-05-29 20:07 Alexander Monakov
  2020-05-31  7:22 ` Alexander Monakov
  2020-06-01  7:37 ` Suravee Suthikulpanit
  0 siblings, 2 replies; 19+ messages in thread
From: Alexander Monakov @ 2020-05-29 20:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: Alexander Monakov, Joerg Roedel, Suravee Suthikulpanit, iommu

The driver performs an extra check if the IOMMU's capabilities advertise
presence of performance counters: it verifies that counters are writable
by writing a hard-coded value to a counter and testing that reading that
counter gives back the same value.

Unfortunately it does so quite early, even before pci_enable_device is
called for the IOMMU, i.e. when accessing its MMIO space is not
guaranteed to work. On Ryzen 4500U CPU, this actually breaks the test:
the driver assumes the counters are not writable, and disables the
functionality.

Moving init_iommu_perf_ctr just after iommu_flush_all_caches resolves
the issue. This is the earliest point in amd_iommu_init_pci where the
call succeeds on my laptop.

Signed-off-by: Alexander Monakov <amonakov@ispras.ru>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: iommu@lists.linux-foundation.org
---

PS. I'm seeing another hiccup with IOMMU probing on my system:
pci 0000:00:00.2: can't derive routing for PCI INT A
pci 0000:00:00.2: PCI INT A: not connected

Hopefully I can figure it out, but I'd appreciate hints.

 drivers/iommu/amd_iommu_init.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 5b81fd16f5fa..1b7ec6b6a282 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -1788,8 +1788,6 @@ static int __init iommu_init_pci(struct amd_iommu *iommu)
 	if (iommu->cap & (1UL << IOMMU_CAP_NPCACHE))
 		amd_iommu_np_cache = true;
 
-	init_iommu_perf_ctr(iommu);
-
 	if (is_rd890_iommu(iommu->dev)) {
 		int i, j;
 
@@ -1891,8 +1889,10 @@ static int __init amd_iommu_init_pci(void)
 
 	init_device_table_dma();
 
-	for_each_iommu(iommu)
+	for_each_iommu(iommu) {
 		iommu_flush_all_caches(iommu);
+		init_iommu_perf_ctr(iommu);
+	}
 
 	if (!ret)
 		print_iommu_info();

base-commit: 75caf310d16cc5e2f851c048cd597f5437013368
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH] iommu/amd: Fix event counter availability check
  2020-05-29 20:07 [PATCH] iommu/amd: Fix event counter availability check Alexander Monakov
@ 2020-05-31  7:22 ` Alexander Monakov
  2020-06-01  2:48   ` Paul Menzel
  2020-06-02 23:51   ` Shuah Khan
  2020-06-01  7:37 ` Suravee Suthikulpanit
  1 sibling, 2 replies; 19+ messages in thread
From: Alexander Monakov @ 2020-05-31  7:22 UTC (permalink / raw)
  To: linux-kernel; +Cc: Joerg Roedel, Suravee Suthikulpanit, iommu, Shuah Khan

Hi,

Adding Shuah Khan to Cc: I've noticed you've seen this issue on Ryzen 2400GE;
can you have a look at the patch? Would be nice to know if it fixes the
problem for you too.

Thanks.
Alexander

On Fri, 29 May 2020, Alexander Monakov wrote:

> The driver performs an extra check if the IOMMU's capabilities advertise
> presence of performance counters: it verifies that counters are writable
> by writing a hard-coded value to a counter and testing that reading that
> counter gives back the same value.
> 
> Unfortunately it does so quite early, even before pci_enable_device is
> called for the IOMMU, i.e. when accessing its MMIO space is not
> guaranteed to work. On Ryzen 4500U CPU, this actually breaks the test:
> the driver assumes the counters are not writable, and disables the
> functionality.
> 
> Moving init_iommu_perf_ctr just after iommu_flush_all_caches resolves
> the issue. This is the earliest point in amd_iommu_init_pci where the
> call succeeds on my laptop.
> 
> Signed-off-by: Alexander Monakov <amonakov@ispras.ru>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> Cc: iommu@lists.linux-foundation.org
> ---
> 
> PS. I'm seeing another hiccup with IOMMU probing on my system:
> pci 0000:00:00.2: can't derive routing for PCI INT A
> pci 0000:00:00.2: PCI INT A: not connected
> 
> Hopefully I can figure it out, but I'd appreciate hints.
> 
>  drivers/iommu/amd_iommu_init.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
> index 5b81fd16f5fa..1b7ec6b6a282 100644
> --- a/drivers/iommu/amd_iommu_init.c
> +++ b/drivers/iommu/amd_iommu_init.c
> @@ -1788,8 +1788,6 @@ static int __init iommu_init_pci(struct amd_iommu *iommu)
>  	if (iommu->cap & (1UL << IOMMU_CAP_NPCACHE))
>  		amd_iommu_np_cache = true;
>  
> -	init_iommu_perf_ctr(iommu);
> -
>  	if (is_rd890_iommu(iommu->dev)) {
>  		int i, j;
>  
> @@ -1891,8 +1889,10 @@ static int __init amd_iommu_init_pci(void)
>  
>  	init_device_table_dma();
>  
> -	for_each_iommu(iommu)
> +	for_each_iommu(iommu) {
>  		iommu_flush_all_caches(iommu);
> +		init_iommu_perf_ctr(iommu);
> +	}
>  
>  	if (!ret)
>  		print_iommu_info();
> 
> base-commit: 75caf310d16cc5e2f851c048cd597f5437013368
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] iommu/amd: Fix event counter availability check
  2020-05-31  7:22 ` Alexander Monakov
@ 2020-06-01  2:48   ` Paul Menzel
  2021-02-21 13:44     ` Paul Menzel
  2020-06-02 23:51   ` Shuah Khan
  1 sibling, 1 reply; 19+ messages in thread
From: Paul Menzel @ 2020-06-01  2:48 UTC (permalink / raw)
  To: Alexander Monakov, linux-kernel
  Cc: iommu, Shuah Khan, Jörg Rödel, Suravee Suthikulpanit

Dear Alexander,


Thank you very much for the patch.


Am 31.05.20 um 09:22 schrieb Alexander Monakov:

> Adding Shuah Khan to Cc: I've noticed you've seen this issue on Ryzen 2400GE;
> can you have a look at the patch? Would be nice to know if it fixes the
> problem for you too.

> On Fri, 29 May 2020, Alexander Monakov wrote:
> 
>> The driver performs an extra check if the IOMMU's capabilities advertise
>> presence of performance counters: it verifies that counters are writable
>> by writing a hard-coded value to a counter and testing that reading that
>> counter gives back the same value.
>>
>> Unfortunately it does so quite early, even before pci_enable_device is
>> called for the IOMMU, i.e. when accessing its MMIO space is not
>> guaranteed to work. On Ryzen 4500U CPU, this actually breaks the test:
>> the driver assumes the counters are not writable, and disables the
>> functionality.
>>
>> Moving init_iommu_perf_ctr just after iommu_flush_all_caches resolves
>> the issue. This is the earliest point in amd_iommu_init_pci where the
>> call succeeds on my laptop.
>>
>> Signed-off-by: Alexander Monakov <amonakov@ispras.ru>
>> Cc: Joerg Roedel <joro@8bytes.org>
>> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
>> Cc: iommu@lists.linux-foundation.org
>> ---
>>
>> PS. I'm seeing another hiccup with IOMMU probing on my system:
>> pci 0000:00:00.2: can't derive routing for PCI INT A
>> pci 0000:00:00.2: PCI INT A: not connected
>>
>> Hopefully I can figure it out, but I'd appreciate hints.

I guess it’s a firmware bug, but I contacted the linux-pci folks [1].

>>   drivers/iommu/amd_iommu_init.c | 6 +++---
>>   1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
>> index 5b81fd16f5fa..1b7ec6b6a282 100644
>> --- a/drivers/iommu/amd_iommu_init.c
>> +++ b/drivers/iommu/amd_iommu_init.c
>> @@ -1788,8 +1788,6 @@ static int __init iommu_init_pci(struct amd_iommu *iommu)
>>   	if (iommu->cap & (1UL << IOMMU_CAP_NPCACHE))
>>   		amd_iommu_np_cache = true;
>>   
>> -	init_iommu_perf_ctr(iommu);
>> -
>>   	if (is_rd890_iommu(iommu->dev)) {
>>   		int i, j;
>>   
>> @@ -1891,8 +1889,10 @@ static int __init amd_iommu_init_pci(void)
>>   
>>   	init_device_table_dma();
>>   
>> -	for_each_iommu(iommu)
>> +	for_each_iommu(iommu) {
>>   		iommu_flush_all_caches(iommu);
>> +		init_iommu_perf_ctr(iommu);
>> +	}
>>   
>>   	if (!ret)
>>   		print_iommu_info();
>>
>> base-commit: 75caf310d16cc5e2f851c048cd597f5437013368

Thank you very much for fixing this issue, which is almost two years old 
for me.

Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>
MSI MSI MS-7A37/B350M MORTAR with AMD Ryzen 3 2200G
Link: https://lore.kernel.org/linux-iommu/20180727102710.GA6738@8bytes.org/


Kind regards,

Paul


[1]: 
https://lore.kernel.org/linux-pci/8579bd14-e369-1141-917b-204d20cff528@molgen.mpg.de/

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] iommu/amd: Fix event counter availability check
  2020-05-29 20:07 [PATCH] iommu/amd: Fix event counter availability check Alexander Monakov
  2020-05-31  7:22 ` Alexander Monakov
@ 2020-06-01  7:37 ` Suravee Suthikulpanit
  2020-06-01  9:01   ` Alexander Monakov
  1 sibling, 1 reply; 19+ messages in thread
From: Suravee Suthikulpanit @ 2020-06-01  7:37 UTC (permalink / raw)
  To: Alexander Monakov, linux-kernel; +Cc: Joerg Roedel, iommu

Hi Alexander,

On 5/30/20 3:07 AM, Alexander Monakov wrote:
> The driver performs an extra check if the IOMMU's capabilities advertise
> presence of performance counters: it verifies that counters are writable
> by writing a hard-coded value to a counter and testing that reading that
> counter gives back the same value.
> 
> Unfortunately it does so quite early, even before pci_enable_device is
> called for the IOMMU, i.e. when accessing its MMIO space is not
> guaranteed to work. On Ryzen 4500U CPU, this actually breaks the test:
> the driver assumes the counters are not writable, and disables the
> functionality.
> 
> Moving init_iommu_perf_ctr just after iommu_flush_all_caches resolves
> the issue. This is the earliest point in amd_iommu_init_pci where the
> call succeeds on my laptop.

According to your description, it should just need to be anywhere after the pci_enable_device() is called for the IOMMU 
device, isn't it? So, on your system, what if we just move the init_iommu_perf_ctr() here:

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 5b81fd16f5fa..17b9ac9491e0 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -1875,6 +1875,7 @@ static int __init amd_iommu_init_pci(void)
                 ret = iommu_init_pci(iommu);
                 if (ret)
                         break;
+               init_iommu_perf_ctr(iommu);
         }

         /*
--
2.17.1


Does this works?

Thanks,
Suravee

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH] iommu/amd: Fix event counter availability check
  2020-06-01  7:37 ` Suravee Suthikulpanit
@ 2020-06-01  9:01   ` Alexander Monakov
  2020-06-01 15:10     ` Suravee Suthikulpanit
  0 siblings, 1 reply; 19+ messages in thread
From: Alexander Monakov @ 2020-06-01  9:01 UTC (permalink / raw)
  To: Suravee Suthikulpanit; +Cc: linux-kernel, Joerg Roedel, iommu

On Mon, 1 Jun 2020, Suravee Suthikulpanit wrote:

> > Moving init_iommu_perf_ctr just after iommu_flush_all_caches resolves
> > the issue. This is the earliest point in amd_iommu_init_pci where the
> > call succeeds on my laptop.
> 
> According to your description, it should just need to be anywhere after the
> pci_enable_device() is called for the IOMMU device, isn't it? So, on your
> system, what if we just move the init_iommu_perf_ctr() here:

No, this doesn't work, as I already said in the paragraph you are responding
to. See my last sentence in the quoted part.

So the implication is init_device_table_dma together with subsequent cache
flush is also setting up something that is necessary for counters to be
writable.

Alexander

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] iommu/amd: Fix event counter availability check
  2020-06-01  9:01   ` Alexander Monakov
@ 2020-06-01 15:10     ` Suravee Suthikulpanit
  2020-06-01 16:09       ` Alexander Monakov
  2020-06-15 20:48       ` Alexander Monakov
  0 siblings, 2 replies; 19+ messages in thread
From: Suravee Suthikulpanit @ 2020-06-01 15:10 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: linux-kernel, Joerg Roedel, iommu

Alexander

On 6/1/20 4:01 PM, Alexander Monakov wrote:
> On Mon, 1 Jun 2020, Suravee Suthikulpanit wrote:
> 
>>> Moving init_iommu_perf_ctr just after iommu_flush_all_caches resolves
>>> the issue. This is the earliest point in amd_iommu_init_pci where the
>>> call succeeds on my laptop.
>>
>> According to your description, it should just need to be anywhere after the
>> pci_enable_device() is called for the IOMMU device, isn't it? So, on your
>> system, what if we just move the init_iommu_perf_ctr() here:
> 
> No, this doesn't work, as I already said in the paragraph you are responding
> to. See my last sentence in the quoted part.
> 
> So the implication is init_device_table_dma together with subsequent cache
> flush is also setting up something that is necessary for counters to be
> writable.
> 
> Alexander
> 

Instead of blindly moving the code around to a spot that would just work,
I am trying to understand what might be required here. In this case,
the init_device_table_dma()should not be needed. I suspect it's the IOMMU
invalidate all command that's also needed here.

I'm also checking with the HW and BIOS team. Meanwhile, could you please give
the following change a try:

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 5b81fd16f5fa..b07458cc1b0b 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -1875,6 +1875,8 @@ static int __init amd_iommu_init_pci(void)
                 ret = iommu_init_pci(iommu);
                 if (ret)
                         break;
+               iommu_flush_all_caches(iommu);
+               init_iommu_perf_ctr(iommu);
         }

         /*
--
2.17.1

Thanks,
Suravee

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH] iommu/amd: Fix event counter availability check
  2020-06-01 15:10     ` Suravee Suthikulpanit
@ 2020-06-01 16:09       ` Alexander Monakov
  2020-06-15 20:48       ` Alexander Monakov
  1 sibling, 0 replies; 19+ messages in thread
From: Alexander Monakov @ 2020-06-01 16:09 UTC (permalink / raw)
  To: Suravee Suthikulpanit; +Cc: linux-kernel, Joerg Roedel, iommu

> Instead of blindly moving the code around to a spot that would just work,
> I am trying to understand what might be required here. In this case,
> the init_device_table_dma()should not be needed. I suspect it's the IOMMU
> invalidate all command that's also needed here.
> 
> I'm also checking with the HW and BIOS team. Meanwhile, could you please give
> the following change a try:

Yes, this also fixes the problem for me.
Alexander

> 
> diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
> index 5b81fd16f5fa..b07458cc1b0b 100644
> --- a/drivers/iommu/amd_iommu_init.c
> +++ b/drivers/iommu/amd_iommu_init.c
> @@ -1875,6 +1875,8 @@ static int __init amd_iommu_init_pci(void)
>                 ret = iommu_init_pci(iommu);
>                 if (ret)
>                         break;
> +               iommu_flush_all_caches(iommu);
> +               init_iommu_perf_ctr(iommu);
>         }
> 
>         /*

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] iommu/amd: Fix event counter availability check
  2020-05-31  7:22 ` Alexander Monakov
  2020-06-01  2:48   ` Paul Menzel
@ 2020-06-02 23:51   ` Shuah Khan
  2020-06-03  6:54     ` Alexander Monakov
  1 sibling, 1 reply; 19+ messages in thread
From: Shuah Khan @ 2020-06-02 23:51 UTC (permalink / raw)
  To: Alexander Monakov, linux-kernel
  Cc: Joerg Roedel, Suravee Suthikulpanit, iommu, skhan

On 5/31/20 1:22 AM, Alexander Monakov wrote:
> Hi,
> 
> Adding Shuah Khan to Cc: I've noticed you've seen this issue on Ryzen 2400GE;
> can you have a look at the patch? Would be nice to know if it fixes the
> problem for you too.
> 

I am not seeing any change in behavior on my system. I still see:

I can't read perf counters.

The question I asked in my previous thread on this:

--------------------------------------------------------------------
I see 2 banks and 4 counters on my system. Is it sufficient to check
the first bank and first counter? In other words, if the first one
isn't writable, are all counters non-writable?

Should we read the config first and then, try to see if any of the
counters are writable? I have a patch that does that, I can send it
out for review.

I changed the logic to read config to get max banks and counters
before checking if counters are writable and tried writing to all.
The result is the same and all of them aren't writable. However,
when disable the writable check and assume they are, I can run

perf stat -e 'amd_iommu_0 on all events and get data.

perf stat -e 'amd_iommu_0/cmd_processed/' sleep 10

  Performance counter stats for 'system wide':

                 56      amd_iommu_0/cmd_processed/

       10.001525171 seconds time elapsed


perf stat -a -e amd_iommu/mem_trans_total/ sleep 10

  Performance counter stats for 'system wide':

              2,696      amd_iommu/mem_trans_total/

       10.001465115 seconds time elapsed

I tried all possible events listed under amd_iommu_0 and I can get
data on all of them. No problems in dmesg.
--------------------------------------------------------------------

This patch doesn't really address that question.

thanks,
-- Shuah

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] iommu/amd: Fix event counter availability check
  2020-06-02 23:51   ` Shuah Khan
@ 2020-06-03  6:54     ` Alexander Monakov
  2021-02-26 21:44       ` Paul Menzel
  0 siblings, 1 reply; 19+ messages in thread
From: Alexander Monakov @ 2020-06-03  6:54 UTC (permalink / raw)
  To: Shuah Khan; +Cc: linux-kernel, Joerg Roedel, Suravee Suthikulpanit, iommu

On Tue, 2 Jun 2020, Shuah Khan wrote:

> I changed the logic to read config to get max banks and counters
> before checking if counters are writable and tried writing to all.
> The result is the same and all of them aren't writable. However,
> when disable the writable check and assume they are, I can run
[snip]

This is similar to what I did. I also noticed that counters can
be successfully used with perf if the initial check is ignored.
I was considering sending a patch to remove the check and adjust
the event counting logic to use counters as read-only, but after
a bit more investigation I've noticed how late pci_enable_device
is done, and came up with this patch. It's a path of less resistance:
I'd expect maintainers to be more averse to removing the check
rather than fixing it so it works as intended (even though I think
the check should not be there in the first place).

However:

The ability to modify the counters is needed only for sampling the
events (getting an interrupt when a counter overflows). There's no
code to do that for these AMD IOMMU counters. A solution I would
prefer is to not write to those counters at all. It would simplify or
even remove a bunch of code. I can submit a corresponding patch if
there's general agreement this path is ok.

What do you think?

Alexander

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] iommu/amd: Fix event counter availability check
  2020-06-01 15:10     ` Suravee Suthikulpanit
  2020-06-01 16:09       ` Alexander Monakov
@ 2020-06-15 20:48       ` Alexander Monakov
  2020-06-16  9:35         ` Suravee Suthikulpanit
  1 sibling, 1 reply; 19+ messages in thread
From: Alexander Monakov @ 2020-06-15 20:48 UTC (permalink / raw)
  To: Suravee Suthikulpanit; +Cc: linux-kernel, Joerg Roedel, iommu

On Mon, 1 Jun 2020, Suravee Suthikulpanit wrote:

> Alexander
> 
> On 6/1/20 4:01 PM, Alexander Monakov wrote:
> > On Mon, 1 Jun 2020, Suravee Suthikulpanit wrote:
> > 
> > > > Moving init_iommu_perf_ctr just after iommu_flush_all_caches resolves
> > > > the issue. This is the earliest point in amd_iommu_init_pci where the
> > > > call succeeds on my laptop.
> > > 
> > > According to your description, it should just need to be anywhere after
> > > the
> > > pci_enable_device() is called for the IOMMU device, isn't it? So, on your
> > > system, what if we just move the init_iommu_perf_ctr() here:
> > 
> > No, this doesn't work, as I already said in the paragraph you are responding
> > to. See my last sentence in the quoted part.
> > 
> > So the implication is init_device_table_dma together with subsequent cache
> > flush is also setting up something that is necessary for counters to be
> > writable.
> > 
> > Alexander
> > 
> 
> Instead of blindly moving the code around to a spot that would just work,
> I am trying to understand what might be required here. In this case,
> the init_device_table_dma()should not be needed. I suspect it's the IOMMU
> invalidate all command that's also needed here.
> 
> I'm also checking with the HW and BIOS team. Meanwhile, could you please give
> the following change a try:

Hello. Can you give any update please?

Alexander

> diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
> index 5b81fd16f5fa..b07458cc1b0b 100644
> --- a/drivers/iommu/amd_iommu_init.c
> +++ b/drivers/iommu/amd_iommu_init.c
> @@ -1875,6 +1875,8 @@ static int __init amd_iommu_init_pci(void)
>                 ret = iommu_init_pci(iommu);
>                 if (ret)
>                         break;
> +               iommu_flush_all_caches(iommu);
> +               init_iommu_perf_ctr(iommu);
>         }
> 
>         /*
> --
> 2.17.1
> 
> Thanks,
> Suravee
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] iommu/amd: Fix event counter availability check
  2020-06-15 20:48       ` Alexander Monakov
@ 2020-06-16  9:35         ` Suravee Suthikulpanit
  2020-06-30 19:22           ` Alexander Monakov
  2020-09-17 17:55           ` Alexander Monakov
  0 siblings, 2 replies; 19+ messages in thread
From: Suravee Suthikulpanit @ 2020-06-16  9:35 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: linux-kernel, Joerg Roedel, iommu



On 6/16/20 3:48 AM, Alexander Monakov wrote:
>> Alexander
>>
>> On 6/1/20 4:01 PM, Alexander Monakov wrote:
>>> On Mon, 1 Jun 2020, Suravee Suthikulpanit wrote:
>>>
>>>>> Moving init_iommu_perf_ctr just after iommu_flush_all_caches resolves
>>>>> the issue. This is the earliest point in amd_iommu_init_pci where the
>>>>> call succeeds on my laptop.
>>>> According to your description, it should just need to be anywhere after
>>>> the
>>>> pci_enable_device() is called for the IOMMU device, isn't it? So, on your
>>>> system, what if we just move the init_iommu_perf_ctr() here:
>>> No, this doesn't work, as I already said in the paragraph you are responding
>>> to. See my last sentence in the quoted part.
>>>
>>> So the implication is init_device_table_dma together with subsequent cache
>>> flush is also setting up something that is necessary for counters to be
>>> writable.
>>>
>>> Alexander
>>>
>> Instead of blindly moving the code around to a spot that would just work,
>> I am trying to understand what might be required here. In this case,
>> the init_device_table_dma()should not be needed. I suspect it's the IOMMU
>> invalidate all command that's also needed here.
>>
>> I'm also checking with the HW and BIOS team. Meanwhile, could you please give
>> the following change a try:
> Hello. Can you give any update please?
> 
> Alexander
> 

Sorry for late reply. I have a reproducer and working with the HW team to understand the issue.
I should be able to provide update with solution by the end of this week.

Suravee

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] iommu/amd: Fix event counter availability check
  2020-06-16  9:35         ` Suravee Suthikulpanit
@ 2020-06-30 19:22           ` Alexander Monakov
  2020-09-17 17:55           ` Alexander Monakov
  1 sibling, 0 replies; 19+ messages in thread
From: Alexander Monakov @ 2020-06-30 19:22 UTC (permalink / raw)
  To: Suravee Suthikulpanit; +Cc: linux-kernel, Joerg Roedel, iommu

On Tue, 16 Jun 2020, Suravee Suthikulpanit wrote:

> > > On 6/1/20 4:01 PM, Alexander Monakov wrote:
> > > > On Mon, 1 Jun 2020, Suravee Suthikulpanit wrote:
> > > > 
> > > > > > Moving init_iommu_perf_ctr just after iommu_flush_all_caches
> > > > > > resolves the issue. This is the earliest point in amd_iommu_init_pci
> > > > > > where the call succeeds on my laptop.
> > > > > According to your description, it should just need to be anywhere
> > > > > after the pci_enable_device() is called for the IOMMU device, isn't
> > > > > it? So, on your system, what if we just move the init_iommu_perf_ctr()
> > > > > here:
> > > > No, this doesn't work, as I already said in the paragraph you are
> > > > responding to. See my last sentence in the quoted part.
> > > > 
> > > > So the implication is init_device_table_dma together with subsequent
> > > > cache flush is also setting up something that is necessary for counters
> > > > to be writable.
> > > > 
> > > > Alexander
> > > > 
> > > Instead of blindly moving the code around to a spot that would just work,
> > > I am trying to understand what might be required here. In this case,
> > > the init_device_table_dma()should not be needed. I suspect it's the IOMMU
> > > invalidate all command that's also needed here.
> > > 
> > > I'm also checking with the HW and BIOS team. Meanwhile, could you please
> > > give the following change a try:
> > Hello. Can you give any update please?
> > 
> > Alexander
> > 
> 
> Sorry for late reply. I have a reproducer and working with the HW team to
> understand the issue.
> I should be able to provide update with solution by the end of this week.

Hi, can you share any information (two more weeks have passed)?

Alexander

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] iommu/amd: Fix event counter availability check
  2020-06-16  9:35         ` Suravee Suthikulpanit
  2020-06-30 19:22           ` Alexander Monakov
@ 2020-09-17 17:55           ` Alexander Monakov
  2021-02-21 13:49             ` Paul Menzel
  1 sibling, 1 reply; 19+ messages in thread
From: Alexander Monakov @ 2020-09-17 17:55 UTC (permalink / raw)
  To: Suravee Suthikulpanit; +Cc: linux-kernel, Joerg Roedel, iommu

On Tue, 16 Jun 2020, Suravee Suthikulpanit wrote:

> > > Instead of blindly moving the code around to a spot that would just work,
> > > I am trying to understand what might be required here. In this case,
> > > the init_device_table_dma()should not be needed. I suspect it's the IOMMU
> > > invalidate all command that's also needed here.
> > > 
> > > I'm also checking with the HW and BIOS team. Meanwhile, could you please
> > > give
> > > the following change a try:
> > Hello. Can you give any update please?
> > 
> > Alexander
> > 
> 
> Sorry for late reply. I have a reproducer and working with the HW team to
> understand the issue.
> I should be able to provide update with solution by the end of this week.

Hello, hope you are doing well. Has this investigation found anything?

Thanks.
Alexander

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] iommu/amd: Fix event counter availability check
  2020-06-01  2:48   ` Paul Menzel
@ 2021-02-21 13:44     ` Paul Menzel
  0 siblings, 0 replies; 19+ messages in thread
From: Paul Menzel @ 2021-02-21 13:44 UTC (permalink / raw)
  To: Alexander Monakov, linux-kernel
  Cc: iommu, Shuah Khan, Jörg Rödel, Suravee Suthikulpanit

Dear Alexander,


Am 01.06.20 um 04:48 schrieb Paul Menzel:

[…]

> Am 31.05.20 um 09:22 schrieb Alexander Monakov:
> 
>> Adding Shuah Khan to Cc: I've noticed you've seen this issue on Ryzen 2400GE;
>> can you have a look at the patch? Would be nice to know if it fixes the
>> problem for you too.
> 
>> On Fri, 29 May 2020, Alexander Monakov wrote:
>>
>>> The driver performs an extra check if the IOMMU's capabilities advertise
>>> presence of performance counters: it verifies that counters are writable
>>> by writing a hard-coded value to a counter and testing that reading that
>>> counter gives back the same value.
>>>
>>> Unfortunately it does so quite early, even before pci_enable_device is
>>> called for the IOMMU, i.e. when accessing its MMIO space is not
>>> guaranteed to work. On Ryzen 4500U CPU, this actually breaks the test:
>>> the driver assumes the counters are not writable, and disables the
>>> functionality.
>>>
>>> Moving init_iommu_perf_ctr just after iommu_flush_all_caches resolves
>>> the issue. This is the earliest point in amd_iommu_init_pci where the
>>> call succeeds on my laptop.
>>>
>>> Signed-off-by: Alexander Monakov <amonakov@ispras.ru>
>>> Cc: Joerg Roedel <joro@8bytes.org>
>>> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
>>> Cc: iommu@lists.linux-foundation.org
>>> ---
>>>
>>> PS. I'm seeing another hiccup with IOMMU probing on my system:
>>> pci 0000:00:00.2: can't derive routing for PCI INT A
>>> pci 0000:00:00.2: PCI INT A: not connected
>>>
>>> Hopefully I can figure it out, but I'd appreciate hints.
> 
> I guess it’s a firmware bug, but I contacted the linux-pci folks [1].

Unfortunately, it’s still present in Linux 5.11.

>>>   drivers/iommu/amd_iommu_init.c | 6 +++---
>>>   1 file changed, 3 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/iommu/amd_iommu_init.c 
>>> b/drivers/iommu/amd_iommu_init.c
>>> index 5b81fd16f5fa..1b7ec6b6a282 100644
>>> --- a/drivers/iommu/amd_iommu_init.c
>>> +++ b/drivers/iommu/amd_iommu_init.c
>>> @@ -1788,8 +1788,6 @@ static int __init iommu_init_pci(struct 
>>> amd_iommu *iommu)
>>>       if (iommu->cap & (1UL << IOMMU_CAP_NPCACHE))
>>>           amd_iommu_np_cache = true;
>>> -    init_iommu_perf_ctr(iommu);
>>> -
>>>       if (is_rd890_iommu(iommu->dev)) {
>>>           int i, j;
>>> @@ -1891,8 +1889,10 @@ static int __init amd_iommu_init_pci(void)
>>>       init_device_table_dma();
>>> -    for_each_iommu(iommu)
>>> +    for_each_iommu(iommu) {
>>>           iommu_flush_all_caches(iommu);
>>> +        init_iommu_perf_ctr(iommu);
>>> +    }
>>>       if (!ret)
>>>           print_iommu_info();
>>>
>>> base-commit: 75caf310d16cc5e2f851c048cd597f5437013368
> 
> Thank you very much for fixing this issue, which is almost two years old 
> for me.
> 
> Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>
> MSI MSI MS-7A37/B350M MORTAR with AMD Ryzen 3 2200G
> Link: https://lore.kernel.org/linux-iommu/20180727102710.GA6738@8bytes.org/

Just a small note, that I am applying your patch, but it looks like 
there is still some timing issue. At least today, I noticed it during 
one boot with Linux 5.11. (Before I never noticed it again in the 
several years, but I am not always paying attention and do not save the 
logs.)


Kind regards,

Paul


> [1]: https://lore.kernel.org/linux-pci/8579bd14-e369-1141-917b-204d20cff528@molgen.mpg.de/

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] iommu/amd: Fix event counter availability check
  2020-09-17 17:55           ` Alexander Monakov
@ 2021-02-21 13:49             ` Paul Menzel
  2021-02-22 17:59               ` Suravee Suthikulpanit
  0 siblings, 1 reply; 19+ messages in thread
From: Paul Menzel @ 2021-02-21 13:49 UTC (permalink / raw)
  To: Suravee Suthikulpanit
  Cc: linux-kernel, iommu, Alexander Monakov, Joerg Roedel

Dear Suravee,


Am 17.09.20 um 19:55 schrieb Alexander Monakov:
> On Tue, 16 Jun 2020, Suravee Suthikulpanit wrote:
> 
>>>> Instead of blindly moving the code around to a spot that would just work,
>>>> I am trying to understand what might be required here. In this case,
>>>> the init_device_table_dma()should not be needed. I suspect it's the IOMMU
>>>> invalidate all command that's also needed here.
>>>>
>>>> I'm also checking with the HW and BIOS team. Meanwhile, could you please
>>>> give
>>>> the following change a try:
>>> Hello. Can you give any update please?

[…]

>> Sorry for late reply. I have a reproducer and working with the HW team to
>> understand the issue.
>> I should be able to provide update with solution by the end of this week.
> 
> Hello, hope you are doing well. Has this investigation found anything?

I am wondering the same. It’d be great to have this fixed in the 
upstream Linux kernel.


Kind regards,

Paul

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] iommu/amd: Fix event counter availability check
  2021-02-21 13:49             ` Paul Menzel
@ 2021-02-22 17:59               ` Suravee Suthikulpanit
  2021-02-24 20:35                 ` Paul Menzel
  0 siblings, 1 reply; 19+ messages in thread
From: Suravee Suthikulpanit @ 2021-02-22 17:59 UTC (permalink / raw)
  To: Paul Menzel; +Cc: linux-kernel, iommu, Alexander Monakov, Joerg Roedel

This fix has been accepted in the upstream recently.

https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git/commit/?h=x86/amd

Could you please give this a try?

Thanks,
Suravee

On 2/21/21 8:49 PM, Paul Menzel wrote:
> Dear Suravee,
> 
> 
> Am 17.09.20 um 19:55 schrieb Alexander Monakov:
>> On Tue, 16 Jun 2020, Suravee Suthikulpanit wrote:
>>
>>>>> Instead of blindly moving the code around to a spot that would just work,
>>>>> I am trying to understand what might be required here. In this case,
>>>>> the init_device_table_dma()should not be needed. I suspect it's the IOMMU
>>>>> invalidate all command that's also needed here.
>>>>>
>>>>> I'm also checking with the HW and BIOS team. Meanwhile, could you please
>>>>> give
>>>>> the following change a try:
>>>> Hello. Can you give any update please?
> 
> […]
> 
>>> Sorry for late reply. I have a reproducer and working with the HW team to
>>> understand the issue.
>>> I should be able to provide update with solution by the end of this week.
>>
>> Hello, hope you are doing well. Has this investigation found anything?
> 
> I am wondering the same. It’d be great to have this fixed in the upstream Linux kernel.
> 
> 
> Kind regards,
> 
> Paul

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] iommu/amd: Fix event counter availability check
  2021-02-22 17:59               ` Suravee Suthikulpanit
@ 2021-02-24 20:35                 ` Paul Menzel
  0 siblings, 0 replies; 19+ messages in thread
From: Paul Menzel @ 2021-02-24 20:35 UTC (permalink / raw)
  To: Suravee Suthikulpanit
  Cc: linux-kernel, iommu, Alexander Monakov, Joerg Roedel

Dear Suravee,


Thank you for your reply.


Am 22.02.21 um 18:59 schrieb Suravee Suthikulpanit:
> This fix has been accepted in the upstream recently.
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git/commit/?h=x86/amd 

Indeed. Linux pulled also pulled this [1].

> Could you please give this a try?

Yes, I did give it a try, but, unfortunately, the problem is unfixed. I 
commented on the Linux Bugzilla bug report #201753 [1].


Kind regards,

Paul


PS: It’d be great if you didn’t top post, and used interleaved style for 
responses.

[1]: https://bugzilla.kernel.org/show_bug.cgi?id=201753
      "AMD-Vi: Unable to write to IOMMU perf counter"

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] iommu/amd: Fix event counter availability check
  2020-06-03  6:54     ` Alexander Monakov
@ 2021-02-26 21:44       ` Paul Menzel
  2021-02-26 21:55         ` Shuah Khan
  0 siblings, 1 reply; 19+ messages in thread
From: Paul Menzel @ 2021-02-26 21:44 UTC (permalink / raw)
  To: Alexander Monakov, Shuah Khan, Suravee Suthikulpanit
  Cc: iommu, linux-kernel, David Coe, Joerg Roedel, Tj (Elloe Linux)

[cc: +suravee, +jörg]

Dear Alex, dear Shuah, dear Suravee, dear Jörg,


Am 03.06.20 um 08:54 schrieb Alexander Monakov:
> On Tue, 2 Jun 2020, Shuah Khan wrote:
> 
>> I changed the logic to read config to get max banks and counters
>> before checking if counters are writable and tried writing to all.
>> The result is the same and all of them aren't writable. However,
>> when disable the writable check and assume they are, I can run
> [snip]
> 
> This is similar to what I did. I also noticed that counters can
> be successfully used with perf if the initial check is ignored.
> I was considering sending a patch to remove the check and adjust
> the event counting logic to use counters as read-only, but after
> a bit more investigation I've noticed how late pci_enable_device
> is done, and came up with this patch. It's a path of less resistance:
> I'd expect maintainers to be more averse to removing the check
> rather than fixing it so it works as intended (even though I think
> the check should not be there in the first place).
> 
> However:
> 
> The ability to modify the counters is needed only for sampling the
> events (getting an interrupt when a counter overflows). There's no
> code to do that for these AMD IOMMU counters. A solution I would
> prefer is to not write to those counters at all. It would simplify or
> even remove a bunch of code. I can submit a corresponding patch if
> there's general agreement this path is ok.
> 
> What do you think?

I like this idea. Suravee, Jörg, what do you think?

Commit 6778ff5b21b (iommu/amd: Fix performance counter initialization) 
delays the boot up to 100 ms, which is over 20 % on fast systems, and 
also just a workaround, and does not seem to work always. The delay is 
also not mentioned in the commit message.


Kind regards,

Paul


[1]: 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6778ff5b21bd8e78c8bd547fd66437cf2657fd9b

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] iommu/amd: Fix event counter availability check
  2021-02-26 21:44       ` Paul Menzel
@ 2021-02-26 21:55         ` Shuah Khan
  0 siblings, 0 replies; 19+ messages in thread
From: Shuah Khan @ 2021-02-26 21:55 UTC (permalink / raw)
  To: Paul Menzel, Alexander Monakov, Suravee Suthikulpanit
  Cc: iommu, linux-kernel, David Coe, Joerg Roedel, Tj (Elloe Linux),
	Shuah Khan

On 2/26/21 2:44 PM, Paul Menzel wrote:
> [cc: +suravee, +jörg]
> 
> Dear Alex, dear Shuah, dear Suravee, dear Jörg,
> 
> 
> Am 03.06.20 um 08:54 schrieb Alexander Monakov:
>> On Tue, 2 Jun 2020, Shuah Khan wrote:
>>
>>> I changed the logic to read config to get max banks and counters
>>> before checking if counters are writable and tried writing to all.
>>> The result is the same and all of them aren't writable. However,
>>> when disable the writable check and assume they are, I can run
>> [snip]
>>
>> This is similar to what I did. I also noticed that counters can
>> be successfully used with perf if the initial check is ignored.
>> I was considering sending a patch to remove the check and adjust
>> the event counting logic to use counters as read-only, but after
>> a bit more investigation I've noticed how late pci_enable_device
>> is done, and came up with this patch. It's a path of less resistance:
>> I'd expect maintainers to be more averse to removing the check
>> rather than fixing it so it works as intended (even though I think
>> the check should not be there in the first place).
>>
>> However:
>>
>> The ability to modify the counters is needed only for sampling the
>> events (getting an interrupt when a counter overflows). There's no
>> code to do that for these AMD IOMMU counters. A solution I would
>> prefer is to not write to those counters at all. It would simplify or
>> even remove a bunch of code. I can submit a corresponding patch if
>> there's general agreement this path is ok.
>>
>> What do you think?
> 
> I like this idea. Suravee, Jörg, what do you think?
> 
> Commit 6778ff5b21b (iommu/amd: Fix performance counter initialization) 
> delays the boot up to 100 ms, which is over 20 % on fast systems, and 
> also just a workaround, and does not seem to work always. The delay is 
> also not mentioned in the commit message.
> 
>

Sounds good to me. I can test this out on my system.

thanks,
-- Shuah


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2021-02-26 21:56 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-29 20:07 [PATCH] iommu/amd: Fix event counter availability check Alexander Monakov
2020-05-31  7:22 ` Alexander Monakov
2020-06-01  2:48   ` Paul Menzel
2021-02-21 13:44     ` Paul Menzel
2020-06-02 23:51   ` Shuah Khan
2020-06-03  6:54     ` Alexander Monakov
2021-02-26 21:44       ` Paul Menzel
2021-02-26 21:55         ` Shuah Khan
2020-06-01  7:37 ` Suravee Suthikulpanit
2020-06-01  9:01   ` Alexander Monakov
2020-06-01 15:10     ` Suravee Suthikulpanit
2020-06-01 16:09       ` Alexander Monakov
2020-06-15 20:48       ` Alexander Monakov
2020-06-16  9:35         ` Suravee Suthikulpanit
2020-06-30 19:22           ` Alexander Monakov
2020-09-17 17:55           ` Alexander Monakov
2021-02-21 13:49             ` Paul Menzel
2021-02-22 17:59               ` Suravee Suthikulpanit
2021-02-24 20:35                 ` Paul Menzel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).