All of lore.kernel.org
 help / color / mirror / Atom feed
* KFD event handling questions
@ 2017-09-22  2:36 Felix Kuehling
       [not found] ` <0c2b2d6b-7466-6a2a-4da7-728756c020bb-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Felix Kuehling @ 2017-09-22  2:36 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Oded Gabbay, Hari Thangirala

[Hari, can you comment on the last paragraph below regarding the events
limit? Would it hurt the runtime much to lose 8 signals out of 4096?]

Hi Oded,

I'm looking into upstreaming some event handling enhancements. I see
that you actually worked on increasing the number of events to 4096 and
working around some problems with debug events when you were at AMD.

As I'm looking at this, I'm confused by the code that allocates signal
pages. It allocates a single signal page that's big enough to
accommodate all signals. But allocate_signal_page looks like it was
meant to allocate smaller signal pages incrementally that are
accumulated in a kfd_process::signal_event_pages list. However, this
feature never gets used. It would also break user mode because the Thunk
expects to be able to map a single signal page for all its signals.

Therefore I'm inclined to simplify allocate_signal_page and related code
to only deal with a single allocation. Do you have any concerns or
objections about that?

Somewhat related to that, you also added a workaround that increases the
signal number to 4096+512 to accommodate 8 debug events while
maintaining page alignment. However, the signal page is allocated with
__get_free_pages, which allocates in powers of 2. So in fact it needs to
allocate 8192 signals, wasting about 28KB per process. It's not a lot of
waste. But an alternative would be to instead sacrifice 8 user signals.
The runtime has a fallback for when it runs out of signals, and the
difference between 4096 and 4088 seems insignificant. What do you think?

Regards,
  Felix

-- 
F e l i x   K u e h l i n g
PMTS Software Development Engineer | Vertical Workstation/Compute
1 Commerce Valley Dr. East, Markham, ON L3T 7X6 Canada
(O) +1(289)695-1597
   _     _   _   _____   _____
  / \   | \ / | |  _  \  \ _  |
 / A \  | \M/ | | |D) )  /|_| |
/_/ \_\ |_| |_| |_____/ |__/ \|   facebook.com/AMD | amd.com

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 6+ messages in thread

* RE: KFD event handling questions
       [not found] ` <0c2b2d6b-7466-6a2a-4da7-728756c020bb-5C7GfCeVMHo@public.gmane.org>
@ 2017-09-23  0:44   ` Thangirala, Hari
       [not found]     ` <CY4PR12MB1320F4926B8A9286421CEB89E2640-rpdhrqHFk05WqhXzhz0uLQdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Thangirala, Hari @ 2017-09-23  0:44 UTC (permalink / raw)
  To: Kuehling, Felix, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Oded Gabbay

Hi Felix,

Event IDs reserved for debug events was an artifact of a debugger design. We changed our tools strategy so those limitations do not exist anymore. There's no need to reserve special event-IDs for debugger. 4096 is good enough and the new debug trap handler will use events from that pool.

Thanks
Hari



-----Original Message-----
From: Kuehling, Felix 
Sent: Thursday, September 21, 2017 9:36 PM
To: amd-gfx@lists.freedesktop.org; Oded Gabbay <oded.gabbay@gmail.com>; Thangirala, Hari <Hari.Thangirala@amd.com>
Subject: KFD event handling questions

[Hari, can you comment on the last paragraph below regarding the events limit? Would it hurt the runtime much to lose 8 signals out of 4096?]

Hi Oded,

I'm looking into upstreaming some event handling enhancements. I see that you actually worked on increasing the number of events to 4096 and working around some problems with debug events when you were at AMD.

As I'm looking at this, I'm confused by the code that allocates signal pages. It allocates a single signal page that's big enough to accommodate all signals. But allocate_signal_page looks like it was meant to allocate smaller signal pages incrementally that are accumulated in a kfd_process::signal_event_pages list. However, this feature never gets used. It would also break user mode because the Thunk expects to be able to map a single signal page for all its signals.

Therefore I'm inclined to simplify allocate_signal_page and related code to only deal with a single allocation. Do you have any concerns or objections about that?

Somewhat related to that, you also added a workaround that increases the signal number to 4096+512 to accommodate 8 debug events while maintaining page alignment. However, the signal page is allocated with __get_free_pages, which allocates in powers of 2. So in fact it needs to allocate 8192 signals, wasting about 28KB per process. It's not a lot of waste. But an alternative would be to instead sacrifice 8 user signals.
The runtime has a fallback for when it runs out of signals, and the difference between 4096 and 4088 seems insignificant. What do you think?

Regards,
  Felix

-- 
F e l i x   K u e h l i n g
PMTS Software Development Engineer | Vertical Workstation/Compute
1 Commerce Valley Dr. East, Markham, ON L3T 7X6 Canada
(O) +1(289)695-1597
   _     _   _   _____   _____
  / \   | \ / | |  _  \  \ _  |
 / A \  | \M/ | | |D) )  /|_| |
/_/ \_\ |_| |_| |_____/ |__/ \|   facebook.com/AMD | amd.com

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: KFD event handling questions
       [not found]     ` <CY4PR12MB1320F4926B8A9286421CEB89E2640-rpdhrqHFk05WqhXzhz0uLQdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
@ 2017-09-24 10:11       ` Oded Gabbay
  2017-10-02 13:22       ` Kuehling, Felix
  1 sibling, 0 replies; 6+ messages in thread
From: Oded Gabbay @ 2017-09-24 10:11 UTC (permalink / raw)
  To: Thangirala, Hari
  Cc: Kuehling, Felix, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On Sat, Sep 23, 2017 at 3:44 AM, Thangirala, Hari
<Hari.Thangirala@amd.com> wrote:
> Hi Felix,
>
> Event IDs reserved for debug events was an artifact of a debugger design. We changed our tools strategy so those limitations do not exist anymore. There's no need to reserve special event-IDs for debugger. 4096 is good enough and the new debug trap handler will use events from that pool.
>
> Thanks
> Hari
>
>
>
> -----Original Message-----
> From: Kuehling, Felix
> Sent: Thursday, September 21, 2017 9:36 PM
> To: amd-gfx@lists.freedesktop.org; Oded Gabbay <oded.gabbay@gmail.com>; Thangirala, Hari <Hari.Thangirala@amd.com>
> Subject: KFD event handling questions
>
> [Hari, can you comment on the last paragraph below regarding the events limit? Would it hurt the runtime much to lose 8 signals out of 4096?]
>
> Hi Oded,
>
> I'm looking into upstreaming some event handling enhancements. I see that you actually worked on increasing the number of events to 4096 and working around some problems with debug events when you were at AMD.
>
> As I'm looking at this, I'm confused by the code that allocates signal pages. It allocates a single signal page that's big enough to accommodate all signals. But allocate_signal_page looks like it was meant to allocate smaller signal pages incrementally that are accumulated in a kfd_process::signal_event_pages list. However, this feature never gets used. It would also break user mode because the Thunk expects to be able to map a single signal page for all its signals.
>
> Therefore I'm inclined to simplify allocate_signal_page and related code to only deal with a single allocation. Do you have any concerns or objections about that?

Feel free to simplify that code. At the time I believe we didn't know
how many events will be needed so we prepared a more generic mechanism
for allocation. I guess we are now smarter (after almost 4 years) so
simplification seems like the right thing to do if it matches the rest
of the stack.

>
> Somewhat related to that, you also added a workaround that increases the signal number to 4096+512 to accommodate 8 debug events while maintaining page alignment. However, the signal page is allocated with __get_free_pages, which allocates in powers of 2. So in fact it needs to allocate 8192 signals, wasting about 28KB per process. It's not a lot of waste. But an alternative would be to instead sacrifice 8 user signals.
> The runtime has a fallback for when it runs out of signals, and the difference between 4096 and 4088 seems insignificant. What do you think?

I believe Hari already answered that and I don't have anything more to add.

Thanks,
Oded

>
> Regards,
>   Felix
>
> --
> F e l i x   K u e h l i n g
> PMTS Software Development Engineer | Vertical Workstation/Compute
> 1 Commerce Valley Dr. East, Markham, ON L3T 7X6 Canada
> (O) +1(289)695-1597
>    _     _   _   _____   _____
>   / \   | \ / | |  _  \  \ _  |
>  / A \  | \M/ | | |D) )  /|_| |
> /_/ \_\ |_| |_| |_____/ |__/ \|   facebook.com/AMD | amd.com
>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 6+ messages in thread

* RE: KFD event handling questions
       [not found]     ` <CY4PR12MB1320F4926B8A9286421CEB89E2640-rpdhrqHFk05WqhXzhz0uLQdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
  2017-09-24 10:11       ` Oded Gabbay
@ 2017-10-02 13:22       ` Kuehling, Felix
       [not found]         ` <DM5PR1201MB02354ADD8A56B16EF16A544F927D0-grEf7a3NxMBd8L2jMOIKKmrFom/aUZj6nBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
  1 sibling, 1 reply; 6+ messages in thread
From: Kuehling, Felix @ 2017-10-02 13:22 UTC (permalink / raw)
  To: Thangirala, Hari, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Oded Gabbay

Hi Hari,

Is the "new debug trap handler" already working? It seems right now I'm breaking the "old" debugger backend test. However, given the current status of that debugger, I guess we can disable those tests for now?

Can you speak on behalf of the debugger team, or should I consult someone else on their end as well?

Regards,
  Felix

-----Original Message-----
From: Thangirala, Hari 
Sent: Friday, September 22, 2017 8:44 PM
To: Kuehling, Felix; amd-gfx@lists.freedesktop.org; Oded Gabbay
Subject: RE: KFD event handling questions

Hi Felix,

Event IDs reserved for debug events was an artifact of a debugger design. We changed our tools strategy so those limitations do not exist anymore. There's no need to reserve special event-IDs for debugger. 4096 is good enough and the new debug trap handler will use events from that pool.

Thanks
Hari



-----Original Message-----
From: Kuehling, Felix 
Sent: Thursday, September 21, 2017 9:36 PM
To: amd-gfx@lists.freedesktop.org; Oded Gabbay <oded.gabbay@gmail.com>; Thangirala, Hari <Hari.Thangirala@amd.com>
Subject: KFD event handling questions

[Hari, can you comment on the last paragraph below regarding the events limit? Would it hurt the runtime much to lose 8 signals out of 4096?]

Hi Oded,

I'm looking into upstreaming some event handling enhancements. I see that you actually worked on increasing the number of events to 4096 and working around some problems with debug events when you were at AMD.

As I'm looking at this, I'm confused by the code that allocates signal pages. It allocates a single signal page that's big enough to accommodate all signals. But allocate_signal_page looks like it was meant to allocate smaller signal pages incrementally that are accumulated in a kfd_process::signal_event_pages list. However, this feature never gets used. It would also break user mode because the Thunk expects to be able to map a single signal page for all its signals.

Therefore I'm inclined to simplify allocate_signal_page and related code to only deal with a single allocation. Do you have any concerns or objections about that?

Somewhat related to that, you also added a workaround that increases the signal number to 4096+512 to accommodate 8 debug events while maintaining page alignment. However, the signal page is allocated with __get_free_pages, which allocates in powers of 2. So in fact it needs to allocate 8192 signals, wasting about 28KB per process. It's not a lot of waste. But an alternative would be to instead sacrifice 8 user signals.
The runtime has a fallback for when it runs out of signals, and the difference between 4096 and 4088 seems insignificant. What do you think?

Regards,
  Felix

-- 
F e l i x   K u e h l i n g
PMTS Software Development Engineer | Vertical Workstation/Compute
1 Commerce Valley Dr. East, Markham, ON L3T 7X6 Canada
(O) +1(289)695-1597
   _     _   _   _____   _____
  / \   | \ / | |  _  \  \ _  |
 / A \  | \M/ | | |D) )  /|_| |
/_/ \_\ |_| |_| |_____/ |__/ \|   facebook.com/AMD | amd.com

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: KFD event handling questions
       [not found]         ` <DM5PR1201MB02354ADD8A56B16EF16A544F927D0-grEf7a3NxMBd8L2jMOIKKmrFom/aUZj6nBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
@ 2017-10-02 15:05           ` Jay Cornwall
       [not found]             ` <1506956716.122072.1125123352.5E99623F-2RFepEojUI2N1INw9kWLP6GC3tUn3ZHUQQ4Iyu8u01E@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Jay Cornwall @ 2017-10-02 15:05 UTC (permalink / raw)
  To: Kuehling, Felix, Thangirala, Hari,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Oded Gabbay

On Mon, Oct 2, 2017, at 08:22, Kuehling, Felix wrote:
> Is the "new debug trap handler" already working? It seems right now I'm
> breaking the "old" debugger backend test. However, given the current
> status of that debugger, I guess we can disable those tests for now?
> 
> Can you speak on behalf of the debugger team, or should I consult someone
> else on their end as well?

The existing debug API was designed to interact with live wavefronts on
the device. This created problems for the scheduler, which needs to be
able to remove wavefronts at any time without notice. The unprivileged
debugger could interfere with privileged operations (e.g. debugging in
the presence of oversubscribed processes, world switch in SR-IOV). It
wasn't developed beyond internal test cases and the ioctls should not
have been upstreamed.

The debugger was redesigned to work with offline wavefront state
collected through wavefront context save (already implemented in the
scheduler and controlled through hsaKmtUpdateQueue). This respects the
privilege model and is robust in all scheduling scenarios.

There are stil some yet-to-be-determined interfaces to control
per-process debugging features. This could extend/repurpose an existing
ioctl or require a new one.
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 6+ messages in thread

* RE: KFD event handling questions
       [not found]             ` <1506956716.122072.1125123352.5E99623F-2RFepEojUI2N1INw9kWLP6GC3tUn3ZHUQQ4Iyu8u01E@public.gmane.org>
@ 2017-10-02 17:40               ` Thangirala, Hari
  0 siblings, 0 replies; 6+ messages in thread
From: Thangirala, Hari @ 2017-10-02 17:40 UTC (permalink / raw)
  To: Jay Cornwall, Kuehling, Felix,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Oded Gabbay

Yes, should be okay to disable debugger tests as we don't support old debugger post-1.5 release. I've sent an email to Tools team and cc'd so we give them a heads-up. As Jay said, we're actively working on the new debugger design and don't have a need for hard-coding special event IDs anymore.  

Thanks
Hari

-----Original Message-----
From: Jay Cornwall [mailto:jay@jcornwall.com] 
Sent: Monday, October 2, 2017 10:05 AM
To: Kuehling, Felix <Felix.Kuehling@amd.com>; Thangirala, Hari <Hari.Thangirala@amd.com>; amd-gfx@lists.freedesktop.org; Oded Gabbay <oded.gabbay@gmail.com>
Subject: Re: KFD event handling questions

On Mon, Oct 2, 2017, at 08:22, Kuehling, Felix wrote:
> Is the "new debug trap handler" already working? It seems right now 
> I'm breaking the "old" debugger backend test. However, given the 
> current status of that debugger, I guess we can disable those tests for now?
> 
> Can you speak on behalf of the debugger team, or should I consult 
> someone else on their end as well?

The existing debug API was designed to interact with live wavefronts on the device. This created problems for the scheduler, which needs to be able to remove wavefronts at any time without notice. The unprivileged debugger could interfere with privileged operations (e.g. debugging in the presence of oversubscribed processes, world switch in SR-IOV). It wasn't developed beyond internal test cases and the ioctls should not have been upstreamed.

The debugger was redesigned to work with offline wavefront state collected through wavefront context save (already implemented in the scheduler and controlled through hsaKmtUpdateQueue). This respects the privilege model and is robust in all scheduling scenarios.

There are stil some yet-to-be-determined interfaces to control per-process debugging features. This could extend/repurpose an existing ioctl or require a new one.
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2017-10-02 17:40 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-22  2:36 KFD event handling questions Felix Kuehling
     [not found] ` <0c2b2d6b-7466-6a2a-4da7-728756c020bb-5C7GfCeVMHo@public.gmane.org>
2017-09-23  0:44   ` Thangirala, Hari
     [not found]     ` <CY4PR12MB1320F4926B8A9286421CEB89E2640-rpdhrqHFk05WqhXzhz0uLQdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2017-09-24 10:11       ` Oded Gabbay
2017-10-02 13:22       ` Kuehling, Felix
     [not found]         ` <DM5PR1201MB02354ADD8A56B16EF16A544F927D0-grEf7a3NxMBd8L2jMOIKKmrFom/aUZj6nBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
2017-10-02 15:05           ` Jay Cornwall
     [not found]             ` <1506956716.122072.1125123352.5E99623F-2RFepEojUI2N1INw9kWLP6GC3tUn3ZHUQQ4Iyu8u01E@public.gmane.org>
2017-10-02 17:40               ` Thangirala, Hari

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.