All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Sricharan" <sricharan@codeaurora.org>
To: 'Robin Murphy' <robin.murphy@arm.com>,
	will.deacon@arm.com, joro@8bytes.org,
	iommu@lists.linux-foundation.org,
	linux-arm-kernel@lists.infradead.org,
	linux-arm-msm@vger.kernel.org, laurent.pinchart@ideasonboard.com,
	m.szyprowski@samsung.com, tfiga@chromium.org,
	srinivas.kandagatla@linaro.org,
	'Lorenzo Pieralisi' <lorenzo.pieralisi@arm.com>
Subject: RE: [PATCH V3 0/8] IOMMU probe deferral support
Date: Mon, 28 Nov 2016 23:12:08 +0530	[thread overview]
Message-ID: <000001d2499e$df7f2d80$9e7d8880$@codeaurora.org> (raw)
In-Reply-To: <f2ad6976-0f8f-d116-bc43-37337dd9f41f@arm.com>

Hi Robin,

>> <snip..>
>>
>>>
>>>>>>>> iommu_group_get_for_dev which gets called in the add_device
>>>>>>>> callback, increases the reference count of the iommu_group,
>>>>>>>> so we do an iommu_group_put after that. iommu_group_get_for_dev
>>>>>>>> inturn calls device_group callback and in the case of arm-smmu
>>>>>>>> we call generic_device_group/pci_device_group which takes
>>>>>>>> care of increasing the group's reference. But when we return
>>>>>>>> an already existing group(when multiple devices have same group)
>>>>>>>> the reference is not incremented, resulting in issues when the
>>>>>>>> remove_device callback for the devices is invoked.
>>>>>>>> Fixing the same here.
>>>>>>>
>>>>>>> Bah, yes, this does look like my fault - after flip-flopping between
>>>>>>> about 3 different ways to keep refcounts for the S2CR entries, none of
>>>>>>> which would quite work, I ripped it all out but apparently still got
>>>>>>> things wrong, oh well. Thanks for figuring it out.
>>>>>>>
>>>>>>> On the probe-deferral angle, whilst it's useful to have uncovered this
>>>>>>> bug, I don't think we should actually be calling remove_device() from
>>>>>>> DMA teardown. I think it's preferable from a user perspective if group
>>>>>>> numbering remains stable, rather than changing depending on the order in
>>>>>>> which they unbind/rebind VFIO drivers. I'm really keen to try and get
>>>>>>> this in shape for 4.10, so I've taken the liberty of hacking up my own
>>>>>>> branch (iommu/defer) based on v3 - would you mind taking a look at the
>>>>>>> two "iommu/of:" commits to see what you think? (Ignore the PCI changes
>>>>>>> to your later patches - that was an experiment which didn't really work out)
>>>>>>
>>>>>> Ok, will take a look at this now and respond more on this.
>>>>>>
>>>>> Sorry for the delayed response on this. I was OOO for the last few days.
>>>>> So i tested this branch and it worked fine. I tested it with a pci device
>>>>> for both normal and deferred probe cases.  The of/iommu patches
>>>>> are the cleanup/preparation patches and it looks fine. One thing is without
>>>>> calling the remove_device callback, the resources like (smes for exmaple)
>>>>> and the group association of the device all remain allocated. That does not
>>>>> feel correct, given that the associated device does not exist. So to
>>>>> understand that, what happens with VFIO in this case which makes the
>>>>> group renumbering/rebinding a problem ?
>>>>>
>>>>
>>>> Would it be ok if i post a V4 based on your branch above ?
>>>
>>> Sure, as long as none of the hacks slip through :) - I've just pushed
>>> out a mild rework based on Lorenzo's v9, which I hope shouldn't break
>>> anything for you.
>>>
>>
>> Ok sure, i will test and just the post out the stuff from your branch then
>> mostly by tomorrow.
>
>Cool. We're rather hoping that the ACPI stuff is good to go for 4.10
>now, so it's probably worth pulling the rest of that in (beyond the one
>patch I picked) to make sure the of_dma_configure/acpi_dma_configure
>paths don't inadvertently diverge.
>

I rebased and was testing your branch with Lorenzo's series. One thing
i am still trying to get right is the acpi_dma_configure call. With your
series dma_configure calls pci_dma/of_dma configure, so i am just adding
acpi_dma_configure call there for non-pci ACPI devices as well. I see that
acpi_dma_configure right now is called from acpi_bind_one and 
iort_add_smmu_platform_device, both go through the really_probe function
path, so moving acpi_dma_configure from the above the two functions
to dma_configure. I remember we discussed this on another thread, so
hopefully it is correct. I do not have an platform to test the ACPI though.
I will take some testing help on V4 for this.

>>> Having thought a bit more about the add/remove thing, I'm inclined to
>>> agree that the group numbering itself may not be that big an issue in
>>> practice - sure, it could break my little script, but it looks like QEMU
>>> and such work with the device ID rather than the group number directly,
>>> so might not even notice. However, the fact remains that the callbacks
>>> are intended to handle a device being added to/removed from its bus, and
>>> will continue to do so on other platforms, so I don't like the idea of
>>> introducing needlessly different behaviour. If you unbind a driver, the
>>> stream IDs and everything don't stop existing at the hardware level; the
>>> struct device to which the in-kernel data belongs still exists and
>>> doesn't stop being associated with its bus. There's no good reason for
>>> freeing SMEs that we'll only reallocate again (inadequately-specced
>>> hardware with not enough SMRs/contexts is not a *good* reason), and
>>
>> ok, so SMRs/contexts was the reason i was adding the remove_dev
>> callback, but if thats not good enough then there was no other
>> intention.
>>
>>> there are also some strong arguments against letting any stream IDs the
>>> kernel knows about go back to bypass after a driver has been bound - by
>>
>> ok, but not sure why is this so ?
>
>Any device the kernel is in control of, having bound a driver to it,
>definitely should not be doing DMA after that driver is unbound...
>

ok.

>>> keeping groups around as expected that's something we can implement
>>> quite easily without having to completely lock down bypass for stream
>>> IDs the kernel *doesn't* know about.
>>>
>>
>> So do you mean in this case to keep the unbound device's group/context bank
>> to bypass rather than resetting the streamids ?
>
>...which we can easily enforce by keeping the device attached to its
>default domain, in which nothing should be mapped by that point (we
>could even have a group notifier switch its S2CRs to faulting entries
>for extreme paranoia). Freeing the SMRs means those stream IDs would
>instead fall back to the default "unmatched" behaviour, which in general
>is going to be bypass, and thus allow DMA attacks.
>
>It's harder to disable unmatched bypass in general, because we may have
>devices which physically master through the SMMU but want low latency
>more than they want translation (at the moment the best we can do is
>leave the kernel unaware of those stream IDs), or there could be unknown
>devices under control of the firmware or other agents which we would
>disrupt by hitting a system-wide switch.
>

ok thanks, understand the point now. Agree that putting the previously bound
devices to fault is right than putting it to bypass. So the notifier to switch the
S2CRs to faulting entries can be added as a separate patch on top of this.

Regards,
 Sricharan

WARNING: multiple messages have this Message-ID (diff)
From: sricharan@codeaurora.org (Sricharan)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH V3 0/8] IOMMU probe deferral support
Date: Mon, 28 Nov 2016 23:12:08 +0530	[thread overview]
Message-ID: <000001d2499e$df7f2d80$9e7d8880$@codeaurora.org> (raw)
In-Reply-To: <f2ad6976-0f8f-d116-bc43-37337dd9f41f@arm.com>

Hi Robin,

>> <snip..>
>>
>>>
>>>>>>>> iommu_group_get_for_dev which gets called in the add_device
>>>>>>>> callback, increases the reference count of the iommu_group,
>>>>>>>> so we do an iommu_group_put after that. iommu_group_get_for_dev
>>>>>>>> inturn calls device_group callback and in the case of arm-smmu
>>>>>>>> we call generic_device_group/pci_device_group which takes
>>>>>>>> care of increasing the group's reference. But when we return
>>>>>>>> an already existing group(when multiple devices have same group)
>>>>>>>> the reference is not incremented, resulting in issues when the
>>>>>>>> remove_device callback for the devices is invoked.
>>>>>>>> Fixing the same here.
>>>>>>>
>>>>>>> Bah, yes, this does look like my fault - after flip-flopping between
>>>>>>> about 3 different ways to keep refcounts for the S2CR entries, none of
>>>>>>> which would quite work, I ripped it all out but apparently still got
>>>>>>> things wrong, oh well. Thanks for figuring it out.
>>>>>>>
>>>>>>> On the probe-deferral angle, whilst it's useful to have uncovered this
>>>>>>> bug, I don't think we should actually be calling remove_device() from
>>>>>>> DMA teardown. I think it's preferable from a user perspective if group
>>>>>>> numbering remains stable, rather than changing depending on the order in
>>>>>>> which they unbind/rebind VFIO drivers. I'm really keen to try and get
>>>>>>> this in shape for 4.10, so I've taken the liberty of hacking up my own
>>>>>>> branch (iommu/defer) based on v3 - would you mind taking a look at the
>>>>>>> two "iommu/of:" commits to see what you think? (Ignore the PCI changes
>>>>>>> to your later patches - that was an experiment which didn't really work out)
>>>>>>
>>>>>> Ok, will take a look at this now and respond more on this.
>>>>>>
>>>>> Sorry for the delayed response on this. I was OOO for the last few days.
>>>>> So i tested this branch and it worked fine. I tested it with a pci device
>>>>> for both normal and deferred probe cases.  The of/iommu patches
>>>>> are the cleanup/preparation patches and it looks fine. One thing is without
>>>>> calling the remove_device callback, the resources like (smes for exmaple)
>>>>> and the group association of the device all remain allocated. That does not
>>>>> feel correct, given that the associated device does not exist. So to
>>>>> understand that, what happens with VFIO in this case which makes the
>>>>> group renumbering/rebinding a problem ?
>>>>>
>>>>
>>>> Would it be ok if i post a V4 based on your branch above ?
>>>
>>> Sure, as long as none of the hacks slip through :) - I've just pushed
>>> out a mild rework based on Lorenzo's v9, which I hope shouldn't break
>>> anything for you.
>>>
>>
>> Ok sure, i will test and just the post out the stuff from your branch then
>> mostly by tomorrow.
>
>Cool. We're rather hoping that the ACPI stuff is good to go for 4.10
>now, so it's probably worth pulling the rest of that in (beyond the one
>patch I picked) to make sure the of_dma_configure/acpi_dma_configure
>paths don't inadvertently diverge.
>

I rebased and was testing your branch with Lorenzo's series. One thing
i am still trying to get right is the acpi_dma_configure call. With your
series dma_configure calls pci_dma/of_dma configure, so i am just adding
acpi_dma_configure call there for non-pci ACPI devices as well. I see that
acpi_dma_configure right now is called from acpi_bind_one and 
iort_add_smmu_platform_device, both go through the really_probe function
path, so moving acpi_dma_configure from the above the two functions
to dma_configure. I remember we discussed this on another thread, so
hopefully it is correct. I do not have an platform to test the ACPI though.
I will take some testing help on V4 for this.

>>> Having thought a bit more about the add/remove thing, I'm inclined to
>>> agree that the group numbering itself may not be that big an issue in
>>> practice - sure, it could break my little script, but it looks like QEMU
>>> and such work with the device ID rather than the group number directly,
>>> so might not even notice. However, the fact remains that the callbacks
>>> are intended to handle a device being added to/removed from its bus, and
>>> will continue to do so on other platforms, so I don't like the idea of
>>> introducing needlessly different behaviour. If you unbind a driver, the
>>> stream IDs and everything don't stop existing at the hardware level; the
>>> struct device to which the in-kernel data belongs still exists and
>>> doesn't stop being associated with its bus. There's no good reason for
>>> freeing SMEs that we'll only reallocate again (inadequately-specced
>>> hardware with not enough SMRs/contexts is not a *good* reason), and
>>
>> ok, so SMRs/contexts was the reason i was adding the remove_dev
>> callback, but if thats not good enough then there was no other
>> intention.
>>
>>> there are also some strong arguments against letting any stream IDs the
>>> kernel knows about go back to bypass after a driver has been bound - by
>>
>> ok, but not sure why is this so ?
>
>Any device the kernel is in control of, having bound a driver to it,
>definitely should not be doing DMA after that driver is unbound...
>

ok.

>>> keeping groups around as expected that's something we can implement
>>> quite easily without having to completely lock down bypass for stream
>>> IDs the kernel *doesn't* know about.
>>>
>>
>> So do you mean in this case to keep the unbound device's group/context bank
>> to bypass rather than resetting the streamids ?
>
>...which we can easily enforce by keeping the device attached to its
>default domain, in which nothing should be mapped by that point (we
>could even have a group notifier switch its S2CRs to faulting entries
>for extreme paranoia). Freeing the SMRs means those stream IDs would
>instead fall back to the default "unmatched" behaviour, which in general
>is going to be bypass, and thus allow DMA attacks.
>
>It's harder to disable unmatched bypass in general, because we may have
>devices which physically master through the SMMU but want low latency
>more than they want translation (at the moment the best we can do is
>leave the kernel unaware of those stream IDs), or there could be unknown
>devices under control of the firmware or other agents which we would
>disrupt by hitting a system-wide switch.
>

ok thanks, understand the point now. Agree that putting the previously bound
devices to fault is right than putting it to bypass. So the notifier to switch the
S2CRs to faulting entries can be added as a separate patch on top of this.

Regards,
 Sricharan

  reply	other threads:[~2016-11-28 17:43 UTC|newest]

Thread overview: 118+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CGME20161004170414eucas1p141bebe16e1bf241862833e7ad0270c72@eucas1p1.samsung.com>
2016-10-04 17:03 ` [PATCH V3 0/8] IOMMU probe deferral support Sricharan R
2016-10-04 17:03   ` Sricharan R
     [not found]   ` <1475600632-21289-1-git-send-email-sricharan-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
2016-10-04 17:03     ` [PATCH V3 1/8] arm: dma-mapping: Don't override dma_ops in arch_setup_dma_ops() Sricharan R
2016-10-04 17:03       ` Sricharan R
2016-10-04 17:03     ` [PATCH V3 4/8] drivers: platform: Configure dma operations at probe time Sricharan R
2016-10-04 17:03       ` Sricharan R
2016-10-26 14:07       ` Robin Murphy
2016-10-26 14:07         ` Robin Murphy
2016-10-26 15:04         ` Sricharan
2016-10-26 15:04           ` Sricharan
2016-10-27 10:49           ` Lorenzo Pieralisi
2016-10-27 10:49             ` Lorenzo Pieralisi
2016-11-02  7:05             ` Sricharan
2016-11-02  7:05               ` Sricharan
2016-10-04 17:03     ` [PATCH V3 6/8] arm: dma-mapping: Reset the device's dma_ops Sricharan R
2016-10-04 17:03       ` Sricharan R
2016-10-26 15:07       ` Robin Murphy
2016-10-26 15:07         ` Robin Murphy
     [not found]         ` <a3d4533f-165d-f444-7681-141479617a18-5wv7dgnIgG8@public.gmane.org>
2016-10-27  3:37           ` Sricharan
2016-10-27  3:37             ` Sricharan
2017-05-23 16:25             ` Russell King - ARM Linux
2017-05-23 16:25               ` Russell King - ARM Linux
     [not found]               ` <20170523162507.GA1729-l+eeeJia6m9URfEZ8mYm6t73F7V6hmMc@public.gmane.org>
2017-05-23 16:55                 ` Robin Murphy
2017-05-23 16:55                   ` Robin Murphy
2017-05-23 17:53                   ` Russell King - ARM Linux
2017-05-23 17:53                     ` Russell King - ARM Linux
     [not found]                     ` <20170523175319.GA22219-l+eeeJia6m9URfEZ8mYm6t73F7V6hmMc@public.gmane.org>
2017-05-23 21:46                       ` Laurent Pinchart
2017-05-23 21:46                         ` Laurent Pinchart
2017-05-23 22:42                         ` Russell King - ARM Linux
2017-05-23 22:42                           ` Russell King - ARM Linux
     [not found]                           ` <20170523224216.GI22219-l+eeeJia6m9URfEZ8mYm6t73F7V6hmMc@public.gmane.org>
2017-05-24 10:31                             ` Sricharan R
2017-05-24 10:31                               ` Sricharan R
     [not found]                               ` <c4ad7341-fa9f-81b7-a41c-417144c4f842-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
2017-05-24 11:26                                 ` Laurent Pinchart
2017-05-24 11:26                                   ` Laurent Pinchart
2017-05-24 11:38                                   ` Sricharan R
2017-05-24 11:38                                     ` Sricharan R
2017-05-25 15:05                                   ` Russell King - ARM Linux
2017-05-25 15:05                                     ` Russell King - ARM Linux
     [not found]                                     ` <20170525150540.GJ22219-l+eeeJia6m9URfEZ8mYm6t73F7V6hmMc@public.gmane.org>
2017-05-26  5:18                                       ` Sricharan R
2017-05-26  5:18                                         ` Sricharan R
2017-05-26 14:04                                       ` Laurent Pinchart
2017-05-26 14:04                                         ` Laurent Pinchart
2016-10-10 12:36     ` [PATCH V3 0/8] IOMMU probe deferral support Marek Szyprowski
2016-10-10 12:36       ` Marek Szyprowski
2016-10-17  6:58       ` Sricharan
2016-10-17  6:58         ` Sricharan
     [not found]       ` <12cfb59f-f7ca-d4df-eb7f-42348e357979-Sze3O3UU22JBDgjK7y7TUQ@public.gmane.org>
2016-10-12  6:24         ` Sricharan
2016-10-12  6:24           ` Sricharan
2016-10-24  6:34           ` Marek Szyprowski
2016-10-24  6:34             ` Marek Szyprowski
     [not found]             ` <b9e4e81f-3b3e-951f-df62-d640275aae71-Sze3O3UU22JBDgjK7y7TUQ@public.gmane.org>
2016-10-24 12:30               ` Sricharan
2016-10-24 12:30                 ` Sricharan
2016-10-17  7:02         ` Sricharan
2016-10-17  7:02           ` Sricharan
2016-10-25  6:25     ` Archit Taneja
2016-10-25  6:25       ` Archit Taneja
2016-10-04 17:03   ` [PATCH V3 2/8] of: dma: Move range size workaround to of_dma_get_range() Sricharan R
2016-10-04 17:03     ` Sricharan R
2016-10-04 17:03   ` [PATCH V3 3/8] of: dma: Make of_dma_deconfigure() public Sricharan R
2016-10-04 17:03     ` Sricharan R
2016-10-04 17:03   ` [PATCH V3 5/8] iommu: of: Handle IOMMU lookup failure with deferred probing or error Sricharan R
2016-10-04 17:03     ` Sricharan R
     [not found]     ` <1475600632-21289-6-git-send-email-sricharan-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
2016-10-26 14:52       ` Robin Murphy
2016-10-26 14:52         ` Robin Murphy
     [not found]         ` <f08e65b4-f755-897c-f776-40f0d6788251-5wv7dgnIgG8@public.gmane.org>
2016-10-27  2:55           ` Sricharan
2016-10-27  2:55             ` Sricharan
2016-10-04 17:03   ` [PATCH V3 7/8] arm/arm64: dma-mapping: Call iommu's remove_device callback during device detach Sricharan R
2016-10-04 17:03     ` Sricharan R
     [not found]     ` <1475600632-21289-8-git-send-email-sricharan-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
2016-10-26 15:16       ` Robin Murphy
2016-10-26 15:16         ` Robin Murphy
2016-10-27  5:16         ` Sricharan
2016-10-27  5:16           ` Sricharan
2016-10-04 17:03   ` [PATCH V3 8/8] arm64: dma-mapping: Remove the notifier trick to handle early setting of dma_ops Sricharan R
2016-10-04 17:03     ` Sricharan R
     [not found]     ` <1475600632-21289-9-git-send-email-sricharan-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
2016-10-07 15:40       ` Sricharan
2016-10-07 15:40         ` Sricharan
2016-10-26 15:34       ` Robin Murphy
2016-10-26 15:34         ` Robin Murphy
2016-10-27  5:19         ` Sricharan
2016-10-27  5:19           ` Sricharan
2016-10-25 14:35   ` [PATCH V3 0/8] IOMMU probe deferral support Robin Murphy
2016-10-25 14:35     ` Robin Murphy
     [not found]     ` <60ee8066-f167-e9df-ae3e-4138f1133bad-5wv7dgnIgG8@public.gmane.org>
2016-10-26 14:44       ` Sricharan
2016-10-26 14:44         ` Sricharan
2016-10-26 17:14         ` Robin Murphy
2016-10-26 17:14           ` Robin Murphy
     [not found]           ` <421e2b14-0231-d376-02a0-097423120b3d-5wv7dgnIgG8@public.gmane.org>
2016-10-27  8:37             ` Sricharan
2016-10-27  8:37               ` Sricharan
2016-11-03 22:25           ` Sricharan
2016-11-03 22:25             ` Sricharan
2016-11-04 15:16           ` Sricharan
2016-11-04 15:16             ` Sricharan
2016-11-07 19:13             ` Will Deacon
2016-11-07 19:13               ` Will Deacon
2016-11-07 19:22             ` Robin Murphy
2016-11-07 19:22               ` Robin Murphy
2016-11-09  6:24               ` Sricharan
2016-11-09  6:24                 ` Sricharan
2016-11-09 16:59                 ` Will Deacon
2016-11-09 16:59                   ` Will Deacon
2016-11-14  3:41               ` Sricharan
2016-11-14  3:41                 ` Sricharan
2016-11-20 15:11               ` Sricharan
2016-11-20 15:11                 ` Sricharan
2016-11-23 19:54                 ` Robin Murphy
2016-11-23 19:54                   ` Robin Murphy
     [not found]                   ` <918128b9-cdb0-1454-000a-146cee7a05ea-5wv7dgnIgG8@public.gmane.org>
2016-11-24 16:10                     ` Sricharan
2016-11-24 16:10                       ` Sricharan
2016-11-24 19:11                       ` Robin Murphy
2016-11-24 19:11                         ` Robin Murphy
2016-11-28 17:42                         ` Sricharan [this message]
2016-11-28 17:42                           ` Sricharan
2016-11-28 18:13                           ` Lorenzo Pieralisi
2016-11-28 18:13                             ` Lorenzo Pieralisi
2016-11-30  0:34                             ` Sricharan
2016-11-30  0:34                               ` Sricharan
2016-11-30 12:07                               ` Lorenzo Pieralisi
2016-11-30 12:07                                 ` Lorenzo Pieralisi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='000001d2499e$df7f2d80$9e7d8880$@codeaurora.org' \
    --to=sricharan@codeaurora.org \
    --cc=iommu@lists.linux-foundation.org \
    --cc=joro@8bytes.org \
    --cc=laurent.pinchart@ideasonboard.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-arm-msm@vger.kernel.org \
    --cc=lorenzo.pieralisi@arm.com \
    --cc=m.szyprowski@samsung.com \
    --cc=robin.murphy@arm.com \
    --cc=srinivas.kandagatla@linaro.org \
    --cc=tfiga@chromium.org \
    --cc=will.deacon@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.