From: Lu Baolu <baolu.lu@linux.intel.com>
To: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Cc: baolu.lu@linux.intel.com, "David Woodhouse" <dwmw2@infradead.org>,
	"Joerg Roedel" <joro@8bytes.org>,
	iommu@lists.linux-foundation.org,
	intel-gfx@lists.freedesktop.org, linux-kernel@vger.kernel.org,
	"Michał Wajdeczko" <michal.wajdeczko@intel.com>
Subject: Re: [RFC PATCH] iommu/vt-d: Fix IOMMU field not populated on device hot re-plug
Date: Wed, 28 Aug 2019 08:56:18 +0800
Message-ID: <8f505c10-6256-c561-1aea-b3817388c5b2@linux.intel.com>
In-Reply-To: <29020717.Hl6jQjRASr@jkrzyszt-desk.ger.corp.intel.com>

Hi Janusz,

On 8/27/19 5:35 PM, Janusz Krzysztofik wrote:
> Hi Lu,
> 
> On Monday, August 26, 2019 10:29:12 AM CEST Lu Baolu wrote:
>> Hi Janusz,
>>
>> On 8/26/19 4:15 PM, Janusz Krzysztofik wrote:
>>> Hi Lu,
>>>
>>> On Friday, August 23, 2019 3:51:11 AM CEST Lu Baolu wrote:
>>>> Hi,
>>>>
>>>> On 8/22/19 10:29 PM, Janusz Krzysztofik wrote:
>>>>> When a perfectly working i915 device is hot unplugged (via sysfs) and
>>>>> hot re-plugged again, its dev->archdata.iommu field is not populated
>>>>> again with an IOMMU pointer.  As a result, the device probe fails on
>>>>> DMA mapping error during scratch page setup.
>>>>>
>>>>> It looks like that happens because devices are not detached from their
>>>>> IOMMU bus before they are removed on device unplug.  Then, when an
>>>>> already registered device/IOMMU association is identified by the
>>>>> reinstantiated device's bus and function IDs on IOMMU bus re-attach
>>>>> attempt, the device's archdata is not populated with IOMMU information
>>>>> and the bad happens.
>>>>>
>>>>> I'm not sure if this is a proper fix but it works for me so at least it
>>>>> confirms correctness of my analysis results, I believe.  So far I
>>>>> haven't been able to identify a good place where the possibly missing
>>>>> IOMMU bus detach on device unplug operation could be added.
>>>>
>>>> Which kernel version are you testing with? Does it contain the commit below?
>>>>
>>>> commit 458b7c8e0dde12d140e3472b80919cbb9ae793f4
>>>> Author: Lu Baolu <baolu.lu@linux.intel.com>
>>>> Date:   Thu Aug 1 11:14:58 2019 +0800
>>>
>>> I was using an internal branch based on drm-tip which didn't contain this
>>> commit yet.  Fortunately it was merged into drm-tip over the last
>>> weekend and has effectively fixed the issue.
>>
>> Thanks for testing this.
> 
> My testing turned out not to be exhaustive enough. The fix did resolve the
> issue I initially found, not being able to rebind the i915 driver to a
> re-plugged device. However, it brought another, probably more serious
> problem to light.
> 
> When an open i915 device is hot unplugged, the IOMMU bus notifier now cleans
> up IOMMU info for the device on PCI device remove while the i915 driver is
> still not released, kept by open file descriptors.  Then, on the last device
> close, cleanup attempts lead to a kernel panic raised from intel_unmap() on
> an unresolved IOMMU domain.

We should avoid a kernel panic when intel_unmap() is called against
a non-existent domain. But we shouldn't expect the IOMMU driver to skip
cleaning up the domain info when a device remove notification arrives and
instead wait until all file descriptors are closed, right?
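
As a rough illustration of the first point, here is a minimal userspace
sketch in C of an unmap path that tolerates a missing domain. It is not
the intel-iommu code: every mock_* name is hypothetical, and the model
only assumes that the remove notifier clears the per-device domain
pointer before the last close triggers a late unmap.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel's real types. */
struct mock_domain {
	int mapped_pages;		/* pages currently mapped via this domain */
};

struct mock_device {
	struct mock_domain *domain;	/* NULL once the remove notifier ran */
};

/* Models the bus notifier dropping IOMMU info on device removal. */
static void mock_remove_notifier(struct mock_device *dev)
{
	assert(dev != NULL);
	dev->domain = NULL;
}

/*
 * Models a late unmap issued on the last close of the device.
 * Returns 0 on success, -1 if no domain is attached any more.
 */
static int mock_unmap(struct mock_device *dev, int pages)
{
	struct mock_domain *domain;

	assert(dev != NULL);
	domain = dev->domain;
	if (domain == NULL)
		return -1;	/* bail out instead of dereferencing NULL */

	domain->mapped_pages -= pages;
	return 0;
}
```

In this model, an unmap issued after mock_remove_notifier() returns an
error and leaves the old domain's state untouched, rather than crashing.
Whether a real fix should warn, fail silently, or defer the cleanup is
exactly the open question in this thread.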

Best regards,
Baolu

> 
> With commit 458b7c8e0dde reverted and my fix applied, both late device close
> and device re-plug work for me.  However, I realize that's probably still
> not a complete solution, possibly missing some protection against reuse of a
> removed device other than for cleanup.  If you think that's the right way to
> go, I can work more on that.
> 
> I've had a look at other drivers and found that AMD is using a somewhat
> similar approach.  On the other hand, looking at the IOMMU common code I
> couldn't identify any arrangement that would support deferred device cleanup.
> 
> If that approach is not acceptable for Intel IOMMU, please suggest a way you'd
> like to have it resolved and I can try to implement it.
> 
> Thanks,
> Janusz
> 
>> Best regards,
>> Lu Baolu
>>


Thread overview: 42+ messages
2019-08-22 14:29 [RFC PATCH] iommu/vt-d: Fix IOMMU field not populated on device hot re-plug Janusz Krzysztofik
2019-08-22 14:29 ` Janusz Krzysztofik
2019-08-22 15:34 ` ✓ Fi.CI.BAT: success for " Patchwork
2019-08-23  1:51 ` [RFC PATCH] " Lu Baolu
2019-08-23  1:51   ` Lu Baolu
2019-08-26  8:15   ` Janusz Krzysztofik
2019-08-26  8:15     ` Janusz Krzysztofik
2019-08-26  8:29     ` Lu Baolu
2019-08-26  8:29       ` Lu Baolu
2019-08-27  9:35       ` Janusz Krzysztofik
2019-08-27  9:35         ` Janusz Krzysztofik
2019-08-27  9:35         ` Janusz Krzysztofik
2019-08-28  0:56         ` Lu Baolu [this message]
2019-08-28  0:56           ` Lu Baolu
2019-08-28 14:17           ` Janusz Krzysztofik
2019-08-28 14:17             ` Janusz Krzysztofik
2019-08-29  1:43             ` Lu Baolu
2019-08-29  1:43               ` Lu Baolu
2019-08-29  1:43               ` Lu Baolu
2019-08-29  7:58               ` Janusz Krzysztofik
2019-08-29  7:58                 ` Janusz Krzysztofik
2019-08-29  7:58                 ` Janusz Krzysztofik
2019-08-29  9:08                 ` Lu Baolu
2019-08-29  9:08                   ` Lu Baolu
2019-09-02  8:37                   ` Janusz Krzysztofik
2019-09-02  8:37                     ` Janusz Krzysztofik
2019-09-03  1:29                     ` Lu Baolu
2019-09-03  1:29                       ` Lu Baolu
2019-09-03  7:41                       ` Janusz Krzysztofik
2019-09-03  7:41                         ` Janusz Krzysztofik
2019-10-01 15:01                         ` Janusz Krzysztofik
2019-10-01 15:01                           ` Janusz Krzysztofik
2019-10-08  2:27                           ` Lu Baolu
2019-10-08  2:27                             ` Lu Baolu
2019-10-08  2:27                             ` Lu Baolu
2019-10-11  6:54                         ` Lu Baolu
2019-10-11  6:54                           ` Lu Baolu
2019-10-11  6:54                           ` Lu Baolu
2019-10-11 10:27                           ` Janusz Krzysztofik
2019-10-11 10:27                             ` Janusz Krzysztofik
2019-10-11 10:27                             ` Janusz Krzysztofik
2019-08-23  8:47 ` ✓ Fi.CI.IGT: success for " Patchwork
