linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
To: "Christian König" <christian.koenig@amd.com>,
	"Sergei Miroshnichenko" <s.miroshnichenko@yadro.com>
Cc: "Alexander.Deucher@amd.com" <Alexander.Deucher@amd.com>,
	"anatoli.antonovitch@amd.com" <anatoli.antonovitch@amd.com>,
	"helgaas@kernel.org" <helgaas@kernel.org>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	"linux@yadro.com" <linux@yadro.com>
Subject: Re: Question about supporting AMD eGPU hot plug case
Date: Mon, 15 Mar 2021 13:11:59 -0400	[thread overview]
Message-ID: <bd621789-d9de-aa2e-4a83-f8154e868325@amd.com> (raw)
In-Reply-To: <546ba5d4-d27a-4f81-cf1c-222c5f95899a@amd.com>


On 2021-03-15 12:55 p.m., Christian König wrote:
>
>
> Am 15.03.21 um 17:21 schrieb Andrey Grodzovsky:
>>
>> On 2021-03-15 12:10 p.m., Christian König wrote:
>>> Am 12.03.21 um 16:34 schrieb Andrey Grodzovsky:
>>>>
>>>>
>>>> On 2021-03-12 4:03 a.m., Christian König wrote:
>>>>> Am 11.03.21 um 23:40 schrieb Andrey Grodzovsky:
>>>>>> [SNIP]
>>>>>>>> The expected result is they all move closer to the start of PCI 
>>>>>>>> address
>>>>>>>> space.
>>>>>>>>
>>>>>>>
>>>>>>> Ok, I updated as you described. Also I removed PCI conf command 
>>>>>>> to stop
>>>>>>> address decoding and restart later as I noticed PCI core does it 
>>>>>>> itself
>>>>>>> when needed.
>>>>>>> I tested now also with graphic desktop enabled while submitting
>>>>>>> 3d draw commands and seems like under this scenario everything 
>>>>>>> still
>>>>>>> works. Again, this all needs to be tested with VRAM BAR move as 
>>>>>>> then
>>>>>>> I believe I will see more issues like handling of MMIO mapped 
>>>>>>> VRAM objects (like GART table). In case you do have an AMD card 
>>>>>>> you could also maybe give it a try. In the meanwhile I will add 
>>>>>>> support to ioremapping of those VRAM objects.
>>>>>>>
>>>>>>> Andrey
>>>>>>
>>>>>> Just an update, added support for unmaping/remapping of all VRAM
>>>>>> objects, both user space mmaped and kernel ioremaped. Seems to work
>>>>>> ok but again, without forcing VRAM BAR to move I can't be sure.
>>>>>> Alex, Chsristian - take a look when you have some time to give me 
>>>>>> some
>>>>>> initial feedback on the amdgpu side.
>>>>>>
>>>>>> The code is at 
>>>>>> https://cgit.freedesktop.org/~agrodzov/linux/log/?h=yadro%2Fpcie_hotplug%2Fmovable_bars_v9.1 
>>>>>>
>>>>>
>>>>> Mhm, that let's userspace busy retry until the BAR movement is done.
>>>>>
>>>>> Not sure if that can't live lock somehow.
>>>>>
>>>>> Christian.
>>>>
>>>> In my testing it didn't but, I can instead route them to some
>>>> global static dummy page while BARs are moving and then when 
>>>> everything
>>>> done just invalidate the device address space again and let the
>>>> pagefaults fill in valid PFNs again.
>>>
>>> Well that won't work because the reads/writes which are done in the 
>>> meantime do need to wait for the BAR to be available again.
>>>
>>> So waiting for the BAR move to finish is correct, but what we should 
>>> do is to use a lock instead of an SRCU because that makes lockdep 
>>> complain when we do something nasty.
>>>
>>> Christian.
>>
>>
>> Spinlock I assume ? We can't sleep there - it's an interrupt.
>
> Mhm, the BAR movement is in interrupt context?


No, BARs move is in task context I believe. The page faults are in 
interrupt context and so we can only lock a spinlock there I assume, not 
a mutex which might sleep. But we can't lock
spinlock for the entire BAR move because HW suspend + asic reset is a 
long process with some sleeps/context switches inside it probably.


>
> Well that is rather bad. I was hoping to rename the GPU reset rw_sem 
> into device_access rw_sem and then use the same lock for both (It's 
> essentially the same problem).


I was thinking about it from day 1 but what looked to me different is 
that in GPU reset case there is no technical need to block MMIO accesses 
as the BARs are not moving
and so the page table entries remain valid. It's true that while the 
device in reset those MMIO accesses are meaninglessness - so this indeed 
could be good reason to block
access even during GPU reset.

Andrey


>
> But when we need to move the BAR in atomic/interrupt context that 
> makes things a bit more complicated.
>
> Christian.
>
>>
>> Andrey
>>
>>
>>>
>>>>
>>>> Andrey
>>>>
>>>>>
>>>>>>
>>>>>> Andrey
>>>>>
>>>
>

  reply	other threads:[~2021-03-15 17:13 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <ddef2da4-4726-7321-40fe-5f90788cc836@amd.com>
     [not found] ` <MN2PR12MB44880052F5C4ECF3EF22B58EF7B69@MN2PR12MB4488.namprd12.prod.outlook.com>
     [not found]   ` <ffb816ae8336ff2373ec5fcf695e698900c3d557.camel@yadro.com>
     [not found]     ` <9c41221f-ecfa-5554-f2ea-6f72cfe7dc7e@amd.com>
     [not found]       ` <dae8dfd8-3a99-620d-f0aa-ceb39923b807@amd.com>
     [not found]         ` <7d9e947648ce47a4ba8d223cdec08197@yadro.com>
     [not found]           ` <c82919f3-5296-cd0a-0b8f-c33614ca3ea9@amd.com>
     [not found]             ` <340062dba82b813b311b2c78022d2d3d0e6f414d.camel@yadro.com>
     [not found]               ` <927d7fbe-756f-a4f1-44cd-c68aecd906d7@amd.com>
     [not found]                 ` <dc2de228b92c4e27819c7681f650e1e5@yadro.com>
     [not found]                   ` <a6af29ed-179a-7619-3dde-474542c180f4@amd.com>
     [not found]                     ` <8f53f1403f0c4121885398487bbfa241@yadro.com>
     [not found]                       ` <fc2ea091-8470-9501-242d-8d82714adecb@amd.com>
     [not found]                         ` <50afd1079dbabeba36fd35fdef84e6e15470ef45.camel@yadro.com>
     [not found]                           ` <c53c23b5-dd37-44f1-dffd-ff9699018a82@amd.com>
     [not found]                             ` <8d7e2d7b7982d8d78c0ecaa74b9d40ace4db8453.camel@yadro.com>
2021-03-04 19:49                               ` Question about supporting AMD eGPU hot plug case Andrey Grodzovsky
2021-03-05 16:08                                 ` Sergei Miroshnichenko
2021-03-05 17:13                                   ` Andrey Grodzovsky
2021-03-05 19:12                                     ` Sergei Miroshnichenko
2021-03-05 23:13                                       ` Andrey Grodzovsky
2021-03-11 22:40                                         ` Andrey Grodzovsky
2021-03-12  9:03                                           ` Christian König
2021-03-12 15:34                                             ` Andrey Grodzovsky
2021-03-15 16:00                                               ` Andrey Grodzovsky
2021-03-15 16:10                                               ` Christian König
2021-03-15 16:21                                                 ` Andrey Grodzovsky
2021-03-15 16:55                                                   ` Christian König
2021-03-15 17:11                                                     ` Andrey Grodzovsky [this message]
2021-03-16 16:17                                                       ` Christian König
2021-03-16 17:39                                                         ` Andrey Grodzovsky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bd621789-d9de-aa2e-4a83-f8154e868325@amd.com \
    --to=andrey.grodzovsky@amd.com \
    --cc=Alexander.Deucher@amd.com \
    --cc=anatoli.antonovitch@amd.com \
    --cc=christian.koenig@amd.com \
    --cc=helgaas@kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=linux@yadro.com \
    --cc=s.miroshnichenko@yadro.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).