iommu.lists.linux-foundation.org archive mirror
 help / color / mirror / Atom feed
* Some questions about arm_lpae_install_table
@ 2020-11-03  3:00 Kunkun Jiang
  2020-11-03  9:11 ` Will Deacon
  0 siblings, 1 reply; 5+ messages in thread
From: Kunkun Jiang @ 2020-11-03  3:00 UTC (permalink / raw)
  To: robin.murphy, iommu, Will Deacon; +Cc: wanghaibin.wang, zhukeqian1

Hi Robin,

Recently, I have read and learned the code related to io-pgtable-arm.c. 
There
are two question on arm_lpae_install_table.

1、the first

> static arm_lpae_iopte arm_lpae_install_table(arm_lpae_iopte *table,
>                                              arm_lpae_iopte *ptep,
>                                              arm_lpae_iopte curr,
>                                              struct io_pgtable_cfg *cfg)
> {
>         arm_lpae_iopte old, new;
>
>         new = __pa(table) | ARM_LPAE_PTE_TYPE_TABLE;
>         if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_NS)
>                 new |= ARM_LPAE_PTE_NSTABLE;
>
>        /*
>          * Ensure the table itself is visible before its PTE can be.
>          * Whilst we could get away with cmpxchg64_release below, this
>          * doesn't have any ordering semantics when !CONFIG_SMP.
>          */
>         dma_wmb();
>
>         old = cmpxchg64_relaxed(ptep, curr, new);
>
>         if (cfg->coherent_walk || (old & ARM_LPAE_PTE_SW_SYNC))
>                 return old;
>
>         /* Even if it's not ours, there's no point waiting; just kick 
> it */
>         __arm_lpae_sync_pte(ptep, cfg);
>         if (old == curr)
>                 WRITE_ONCE(*ptep, new | ARM_LPAE_PTE_SW_SYNC);
>
>         return old;
> }

If another thread changes the ptep between cmpxchg64_relaxed and
WRITE_ONCE(*ptep, new | ARM_LPAE_PTE_SW_SYNC), the operation
WRITE_ONCE will overwrite the change.

2、the second

> for (i = 0; i < tablesz / sizeof(pte); i++, blk_paddr += split_sz) {
>                 /* Unmap! */
>                 if (i == unmap_idx)
> continue;
>
>                 __arm_lpae_init_pte(data, blk_paddr, pte, lvl, 
> &tablep[i]);
> }
>
> pte = arm_lpae_install_table(tablep, ptep, blk_pte, cfg);

When altering a translation table descriptor include split a block into
constituent granules, the Armv8-A and SMMUv3 architectures require
a break-before-make procedure. But in the function arm_lpae_split_blk_unmap,
it changes a block descriptor to an equivalent span of page translations
directly. Is it appropriate to do so?

The above two questions may be my wrong thinking. Coude you please
give me more hints?

Thanks,
Kunkun Jiang






_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Some questions about arm_lpae_install_table
  2020-11-03  3:00 Some questions about arm_lpae_install_table Kunkun Jiang
@ 2020-11-03  9:11 ` Will Deacon
  2020-11-03 10:21   ` Robin Murphy
  0 siblings, 1 reply; 5+ messages in thread
From: Will Deacon @ 2020-11-03  9:11 UTC (permalink / raw)
  To: Kunkun Jiang; +Cc: wanghaibin.wang, iommu, robin.murphy, zhukeqian1

On Tue, Nov 03, 2020 at 11:00:27AM +0800, Kunkun Jiang wrote:
> Recently, I have read and learned the code related to io-pgtable-arm.c.
> There
> are two question on arm_lpae_install_table.
> 
> 1、the first
> 
> > static arm_lpae_iopte arm_lpae_install_table(arm_lpae_iopte *table,
> >                                              arm_lpae_iopte *ptep,
> >                                              arm_lpae_iopte curr,
> >                                              struct io_pgtable_cfg *cfg)
> > {
> >         arm_lpae_iopte old, new;
> > 
> >         new = __pa(table) | ARM_LPAE_PTE_TYPE_TABLE;
> >         if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_NS)
> >                 new |= ARM_LPAE_PTE_NSTABLE;
> > 
> >        /*
> >          * Ensure the table itself is visible before its PTE can be.
> >          * Whilst we could get away with cmpxchg64_release below, this
> >          * doesn't have any ordering semantics when !CONFIG_SMP.
> >          */
> >         dma_wmb();
> > 
> >         old = cmpxchg64_relaxed(ptep, curr, new);
> > 
> >         if (cfg->coherent_walk || (old & ARM_LPAE_PTE_SW_SYNC))
> >                 return old;
> > 
> >         /* Even if it's not ours, there's no point waiting; just kick it
> > */
> >         __arm_lpae_sync_pte(ptep, cfg);
> >         if (old == curr)
> >                 WRITE_ONCE(*ptep, new | ARM_LPAE_PTE_SW_SYNC);
> > 
> >         return old;
> > }
> 
> If another thread changes the ptep between cmpxchg64_relaxed and
> WRITE_ONCE(*ptep, new | ARM_LPAE_PTE_SW_SYNC), the operation
> WRITE_ONCE will overwrite the change.

Can you please provide an example of a code path where this happens? The
idea is that CPUs can race on the cmpxchg(), but there will only be one
winner.

> 2、the second
> 
> > for (i = 0; i < tablesz / sizeof(pte); i++, blk_paddr += split_sz) {
> >                 /* Unmap! */
> >                 if (i == unmap_idx)
> > continue;
> > 
> >                 __arm_lpae_init_pte(data, blk_paddr, pte, lvl,
> > &tablep[i]);
> > }
> > 
> > pte = arm_lpae_install_table(tablep, ptep, blk_pte, cfg);
> 
> When altering a translation table descriptor include split a block into
> constituent granules, the Armv8-A and SMMUv3 architectures require
> a break-before-make procedure. But in the function arm_lpae_split_blk_unmap,
> it changes a block descriptor to an equivalent span of page translations
> directly. Is it appropriate to do so?

Break-before-make doesn't really work for the SMMU because faults are
generally fatal.

Are you seeing problems in practice with this code?

Will
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Some questions about arm_lpae_install_table
  2020-11-03  9:11 ` Will Deacon
@ 2020-11-03 10:21   ` Robin Murphy
  2020-11-04  7:17     ` Kunkun Jiang
  0 siblings, 1 reply; 5+ messages in thread
From: Robin Murphy @ 2020-11-03 10:21 UTC (permalink / raw)
  To: Will Deacon, Kunkun Jiang; +Cc: wanghaibin.wang, iommu, zhukeqian1

On 2020-11-03 09:11, Will Deacon wrote:
> On Tue, Nov 03, 2020 at 11:00:27AM +0800, Kunkun Jiang wrote:
>> Recently, I have read and learned the code related to io-pgtable-arm.c.
>> There
>> are two question on arm_lpae_install_table.
>>
>> 1、the first
>>
>>> static arm_lpae_iopte arm_lpae_install_table(arm_lpae_iopte *table,
>>>                                               arm_lpae_iopte *ptep,
>>>                                               arm_lpae_iopte curr,
>>>                                               struct io_pgtable_cfg *cfg)
>>> {
>>>          arm_lpae_iopte old, new;
>>>
>>>          new = __pa(table) | ARM_LPAE_PTE_TYPE_TABLE;
>>>          if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_NS)
>>>                  new |= ARM_LPAE_PTE_NSTABLE;
>>>
>>>         /*
>>>           * Ensure the table itself is visible before its PTE can be.
>>>           * Whilst we could get away with cmpxchg64_release below, this
>>>           * doesn't have any ordering semantics when !CONFIG_SMP.
>>>           */
>>>          dma_wmb();
>>>
>>>          old = cmpxchg64_relaxed(ptep, curr, new);
>>>
>>>          if (cfg->coherent_walk || (old & ARM_LPAE_PTE_SW_SYNC))
>>>                  return old;
>>>
>>>          /* Even if it's not ours, there's no point waiting; just kick it
>>> */
>>>          __arm_lpae_sync_pte(ptep, cfg);
>>>          if (old == curr)
>>>                  WRITE_ONCE(*ptep, new | ARM_LPAE_PTE_SW_SYNC);
>>>
>>>          return old;
>>> }
>>
>> If another thread changes the ptep between cmpxchg64_relaxed and
>> WRITE_ONCE(*ptep, new | ARM_LPAE_PTE_SW_SYNC), the operation
>> WRITE_ONCE will overwrite the change.
> 
> Can you please provide an example of a code path where this happens? The
> idea is that CPUs can race on the cmpxchg(), but there will only be one
> winner.

The only way a table entry can suddenly disappear is in a race that 
involves mapping or unmapping a whole block/table-sized region, while 
simultaneously mapping a page *within* that region. Yes, if someone uses 
the API in a nonsensical and completely invalid way that cannot have a 
predictable outcome, they get an unpredictable outcome. Whoop de do...

>> 2、the second
>>
>>> for (i = 0; i < tablesz / sizeof(pte); i++, blk_paddr += split_sz) {
>>>                  /* Unmap! */
>>>                  if (i == unmap_idx)
>>> continue;
>>>
>>>                  __arm_lpae_init_pte(data, blk_paddr, pte, lvl,
>>> &tablep[i]);
>>> }
>>>
>>> pte = arm_lpae_install_table(tablep, ptep, blk_pte, cfg);
>>
>> When altering a translation table descriptor include split a block into
>> constituent granules, the Armv8-A and SMMUv3 architectures require
>> a break-before-make procedure. But in the function arm_lpae_split_blk_unmap,
>> it changes a block descriptor to an equivalent span of page translations
>> directly. Is it appropriate to do so?
> 
> Break-before-make doesn't really work for the SMMU because faults are
> generally fatal.
> 
> Are you seeing problems in practice with this code?

TBH I do still wonder if we shouldn't just get rid of split_blk_unmap 
and all its complexity. Other drivers treat an unmap of a page from a 
block entry as simply unmapping the whole block, and that's the 
behaviour VFIO seems to expect. My only worry is that it's been around 
long enough that there might be some horrible out-of-tree code that *is* 
relying on it, despite the fact that it's impossible to implement in a 
way that's definitely 100% safe :/

Robin.
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Some questions about arm_lpae_install_table
  2020-11-03 10:21   ` Robin Murphy
@ 2020-11-04  7:17     ` Kunkun Jiang
  2020-11-04 11:15       ` Robin Murphy
  0 siblings, 1 reply; 5+ messages in thread
From: Kunkun Jiang @ 2020-11-04  7:17 UTC (permalink / raw)
  To: Robin Murphy, Will Deacon; +Cc: wanghaibin.wang, iommu, zhukeqian1

Hi Will and Robin,

Sorry for the late reply.

On 2020/11/3 18:21, Robin Murphy wrote:
> On 2020-11-03 09:11, Will Deacon wrote:
>> On Tue, Nov 03, 2020 at 11:00:27AM +0800, Kunkun Jiang wrote:
>>> Recently, I have read and learned the code related to io-pgtable-arm.c.
>>> There
>>> are two question on arm_lpae_install_table.
>>>
>>> 1、the first
>>>
>>>> static arm_lpae_iopte arm_lpae_install_table(arm_lpae_iopte *table,
>>>>                                               arm_lpae_iopte *ptep,
>>>>                                               arm_lpae_iopte curr,
>>>>                                               struct io_pgtable_cfg 
>>>> *cfg)
>>>> {
>>>>          arm_lpae_iopte old, new;
>>>>
>>>>          new = __pa(table) | ARM_LPAE_PTE_TYPE_TABLE;
>>>>          if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_NS)
>>>>                  new |= ARM_LPAE_PTE_NSTABLE;
>>>>
>>>>         /*
>>>>           * Ensure the table itself is visible before its PTE can be.
>>>>           * Whilst we could get away with cmpxchg64_release below, 
>>>> this
>>>>           * doesn't have any ordering semantics when !CONFIG_SMP.
>>>>           */
>>>>          dma_wmb();
>>>>
>>>>          old = cmpxchg64_relaxed(ptep, curr, new);
>>>>
>>>>          if (cfg->coherent_walk || (old & ARM_LPAE_PTE_SW_SYNC))
>>>>                  return old;
>>>>
>>>>          /* Even if it's not ours, there's no point waiting; just 
>>>> kick it
>>>> */
>>>>          __arm_lpae_sync_pte(ptep, cfg);
>>>>          if (old == curr)
>>>>                  WRITE_ONCE(*ptep, new | ARM_LPAE_PTE_SW_SYNC);
>>>>
>>>>          return old;
>>>> }
>>>
>>> If another thread changes the ptep between cmpxchg64_relaxed and
>>> WRITE_ONCE(*ptep, new | ARM_LPAE_PTE_SW_SYNC), the operation
>>> WRITE_ONCE will overwrite the change.
>>
>> Can you please provide an example of a code path where this happens? The
>> idea is that CPUs can race on the cmpxchg(), but there will only be one
>> winner.
>
> The only way a table entry can suddenly disappear is in a race that 
> involves mapping or unmapping a whole block/table-sized region, while 
> simultaneously mapping a page *within* that region. Yes, if someone 
> uses the API in a nonsensical and completely invalid way that cannot 
> have a predictable outcome, they get an unpredictable outcome. Whoop 
> de do...
>
Yes, as Robin mentioned, there will be an unpredictable outcome in extreme
cases. Now, we have two APIs mapping and unmapping to alter a block/page
descriptor. I worry about that it may be  increasingly difficult to add 
API in
the future.

>>> 2、the second
>>>
>>>> for (i = 0; i < tablesz / sizeof(pte); i++, blk_paddr += split_sz) {
>>>>                  /* Unmap! */
>>>>                  if (i == unmap_idx)
>>>> continue;
>>>>
>>>>                  __arm_lpae_init_pte(data, blk_paddr, pte, lvl,
>>>> &tablep[i]);
>>>> }
>>>>
>>>> pte = arm_lpae_install_table(tablep, ptep, blk_pte, cfg);
>>>
>>> When altering a translation table descriptor include split a block into
>>> constituent granules, the Armv8-A and SMMUv3 architectures require
>>> a break-before-make procedure. But in the function 
>>> arm_lpae_split_blk_unmap,
>>> it changes a block descriptor to an equivalent span of page 
>>> translations
>>> directly. Is it appropriate to do so?
>>
>> Break-before-make doesn't really work for the SMMU because faults are
>> generally fatal.
>>
>> Are you seeing problems in practice with this code?
>
> TBH I do still wonder if we shouldn't just get rid of split_blk_unmap 
> and all its complexity. Other drivers treat an unmap of a page from a 
> block entry as simply unmapping the whole block, and that's the 
> behaviour VFIO seems to expect. My only worry is that it's been around 
> long enough that there might be some horrible out-of-tree code that 
> *is* relying on it, despite the fact that it's impossible to implement 
> in a way that's definitely 100% safe :/
>
> Robin.
> .
I have not encountered a problem. I mean SMMU has supported BBML feature
in SMMUv3.2. Maybe we can use this feature to update translation table 
without
break-before-make.

Look forward to your reply.

Thanks,
Kunkun Jiang
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Some questions about arm_lpae_install_table
  2020-11-04  7:17     ` Kunkun Jiang
@ 2020-11-04 11:15       ` Robin Murphy
  0 siblings, 0 replies; 5+ messages in thread
From: Robin Murphy @ 2020-11-04 11:15 UTC (permalink / raw)
  To: Kunkun Jiang, Will Deacon; +Cc: wanghaibin.wang, iommu, zhukeqian1

On 2020-11-04 07:17, Kunkun Jiang wrote:
> Hi Will and Robin,
> 
> Sorry for the late reply.
> 
> On 2020/11/3 18:21, Robin Murphy wrote:
>> On 2020-11-03 09:11, Will Deacon wrote:
>>> On Tue, Nov 03, 2020 at 11:00:27AM +0800, Kunkun Jiang wrote:
>>>> Recently, I have read and learned the code related to io-pgtable-arm.c.
>>>> There
>>>> are two question on arm_lpae_install_table.
>>>>
>>>> 1、the first
>>>>
>>>>> static arm_lpae_iopte arm_lpae_install_table(arm_lpae_iopte *table,
>>>>>                                               arm_lpae_iopte *ptep,
>>>>>                                               arm_lpae_iopte curr,
>>>>>                                               struct io_pgtable_cfg 
>>>>> *cfg)
>>>>> {
>>>>>          arm_lpae_iopte old, new;
>>>>>
>>>>>          new = __pa(table) | ARM_LPAE_PTE_TYPE_TABLE;
>>>>>          if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_NS)
>>>>>                  new |= ARM_LPAE_PTE_NSTABLE;
>>>>>
>>>>>         /*
>>>>>           * Ensure the table itself is visible before its PTE can be.
>>>>>           * Whilst we could get away with cmpxchg64_release below, 
>>>>> this
>>>>>           * doesn't have any ordering semantics when !CONFIG_SMP.
>>>>>           */
>>>>>          dma_wmb();
>>>>>
>>>>>          old = cmpxchg64_relaxed(ptep, curr, new);
>>>>>
>>>>>          if (cfg->coherent_walk || (old & ARM_LPAE_PTE_SW_SYNC))
>>>>>                  return old;
>>>>>
>>>>>          /* Even if it's not ours, there's no point waiting; just 
>>>>> kick it
>>>>> */
>>>>>          __arm_lpae_sync_pte(ptep, cfg);
>>>>>          if (old == curr)
>>>>>                  WRITE_ONCE(*ptep, new | ARM_LPAE_PTE_SW_SYNC);
>>>>>
>>>>>          return old;
>>>>> }
>>>>
>>>> If another thread changes the ptep between cmpxchg64_relaxed and
>>>> WRITE_ONCE(*ptep, new | ARM_LPAE_PTE_SW_SYNC), the operation
>>>> WRITE_ONCE will overwrite the change.
>>>
>>> Can you please provide an example of a code path where this happens? The
>>> idea is that CPUs can race on the cmpxchg(), but there will only be one
>>> winner.
>>
>> The only way a table entry can suddenly disappear is in a race that 
>> involves mapping or unmapping a whole block/table-sized region, while 
>> simultaneously mapping a page *within* that region. Yes, if someone 
>> uses the API in a nonsensical and completely invalid way that cannot 
>> have a predictable outcome, they get an unpredictable outcome. Whoop 
>> de do...
>>
> Yes, as Robin mentioned, there will be an unpredictable outcome in extreme
> cases. Now, we have two APIs mapping and unmapping to alter a block/page
> descriptor. I worry about that it may be  increasingly difficult to add 
> API in
> the future.

I still don't understand your concern. If two threads try to 
simultaneously create a block mapping and a page mapping *for the same 
virtual address*, the result is that one of those mappings will end up 
in place, depending on who managed to write to the PTE last. This code 
is self-consistent - it doesn't crash, nor end up with any mapping that 
*wasn't* requested - it just doesn't go far out of its way to 
accommodate obviously invalid usage. By comparison, say instead those 
two threads simultaneously call kfree() on the same pointer; the outcome 
of that is considerably worse.

What kind of additional new API are you imagining where such 
deliberately racy behaviour would be anything other than broken nonsense?

>>>> 2、the second
>>>>
>>>>> for (i = 0; i < tablesz / sizeof(pte); i++, blk_paddr += split_sz) {
>>>>>                  /* Unmap! */
>>>>>                  if (i == unmap_idx)
>>>>> continue;
>>>>>
>>>>>                  __arm_lpae_init_pte(data, blk_paddr, pte, lvl,
>>>>> &tablep[i]);
>>>>> }
>>>>>
>>>>> pte = arm_lpae_install_table(tablep, ptep, blk_pte, cfg);
>>>>
>>>> When altering a translation table descriptor include split a block into
>>>> constituent granules, the Armv8-A and SMMUv3 architectures require
>>>> a break-before-make procedure. But in the function 
>>>> arm_lpae_split_blk_unmap,
>>>> it changes a block descriptor to an equivalent span of page 
>>>> translations
>>>> directly. Is it appropriate to do so?
>>>
>>> Break-before-make doesn't really work for the SMMU because faults are
>>> generally fatal.
>>>
>>> Are you seeing problems in practice with this code?
>>
>> TBH I do still wonder if we shouldn't just get rid of split_blk_unmap 
>> and all its complexity. Other drivers treat an unmap of a page from a 
>> block entry as simply unmapping the whole block, and that's the 
>> behaviour VFIO seems to expect. My only worry is that it's been around 
>> long enough that there might be some horrible out-of-tree code that 
>> *is* relying on it, despite the fact that it's impossible to implement 
>> in a way that's definitely 100% safe :/
>>
>> Robin.
>> .
> I have not encountered a problem. I mean SMMU has supported BBML feature
> in SMMUv3.2. Maybe we can use this feature to update translation table 
> without
> break-before-make.

We could. But then what do we do for all the existing hardware without 
BBML, or on x86 or other drivers which don't split blocks anyway?

Robin.
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2020-11-04 11:15 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-03  3:00 Some questions about arm_lpae_install_table Kunkun Jiang
2020-11-03  9:11 ` Will Deacon
2020-11-03 10:21   ` Robin Murphy
2020-11-04  7:17     ` Kunkun Jiang
2020-11-04 11:15       ` Robin Murphy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).