* [PATCH 00/83] AMD HSA kernel driver
@ 2014-07-10 21:45 Oded Gabbay
From: Oded Gabbay @ 2014-07-10 21:45 UTC
  To: David Airlie, Alex Deucher, Jerome Glisse
  Cc: Andrew Morton, linux-kernel, dri-devel, John Bridgman,
	Andrew Lewycky, Joerg Roedel, Oded Gabbay

This patch set implements a Heterogeneous System Architecture (HSA) driver 
for radeon-family GPUs. 

HSA allows different processor types (CPUs, DSPs, GPUs, etc.) to share
system resources more effectively via HW features including shared pageable 
memory, userspace-accessible work queues, and platform-level atomics. In 
addition to the memory protection mechanisms in GPUVM and IOMMUv2, the Sea 
Islands family of GPUs also performs HW-level validation of commands passed 
in through the queues (aka rings).

The code in this patch set is intended to serve both as a sample driver for 
other HSA-compatible hardware devices and as a production driver for 
radeon-family processors. The code is architected to support multiple CPUs 
each with connected GPUs, although the current implementation focuses on a 
single Kaveri/Berlin APU, and works alongside the existing radeon kernel 
graphics driver (kgd). 

AMD GPUs designed for use with HSA (Sea Islands and up) share some hardware 
functionality between HSA compute and regular gfx/compute (memory, 
interrupts, registers), while other functionality has been added 
specifically for HSA compute (hw scheduler for virtualized compute rings).
All shared hardware is owned by the radeon graphics driver, and an interface 
between kfd and kgd allows the kfd to make use of those shared resources, 
while HSA-specific functionality is managed directly by kfd by submitting 
packets into an HSA-specific command queue (the "HIQ").

During kfd module initialization a char device node (/dev/kfd) is created 
(surviving until module exit), with ioctls for queue creation & management, 
and data structures are initialized for managing HSA device topology. 
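
As an illustration (not code from these patches), a minimal userspace
sketch of opening the node and issuing a queue-creation ioctl might look
like the following. The struct layout and ioctl request code below are
invented placeholders; the real definitions live in
include/uapi/linux/kfd_ioctl.h from this patch set.

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>

struct create_queue_args {          /* hypothetical layout */
	uint64_t ring_base_address; /* userspace ring buffer */
	uint32_t ring_size;         /* ring size in bytes */
	uint32_t gpu_id;            /* topology node to target */
	uint32_t queue_id;          /* filled in by the driver */
};

int main(void)
{
	struct create_queue_args args = {0};
	int fd = open("/dev/kfd", O_RDWR);

	if (fd < 0) {
		perror("open /dev/kfd");
		return 1;
	}
	/* 'K'/2 is a placeholder request code, not the real one */
	if (ioctl(fd, _IOWR('K', 2, struct create_queue_args), &args) < 0)
		perror("create queue");
	return 0;
}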

The rest of the initialization is driven by calls from the radeon kgd at 
the following points:

- radeon_init (kfd_init)
- radeon_exit (kfd_fini)
- radeon_driver_load_kms (kfd_device_probe, kfd_device_init)
- radeon_driver_unload_kms (kfd_device_fini)

During the probe and init processing per-device data structures are 
established which connect to the associated graphics kernel driver. This 
information is exposed to userspace via sysfs, along with a version number 
allowing userspace to determine if a topology change has occurred while it 
was reading from sysfs. 
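
As an illustration of how userspace might consume that version number (the
sysfs path here is an assumption made for the sketch; the actual attribute
layout is defined by kfd_topology.c in this set):

#include <stdio.h>

static long read_generation(void)
{
	long gen = -1;
	/* assumed path; see kfd_topology.c for the real layout */
	FILE *f = fopen("/sys/devices/virtual/kfd/kfd/topology/generation_id", "r");

	if (f) {
		if (fscanf(f, "%ld", &gen) != 1)
			gen = -1;
		fclose(f);
	}
	return gen;
}

int main(void)
{
	long before, after;

	do {
		before = read_generation();
		/* ... enumerate the topology nodes here ... */
		after = read_generation();
	} while (before != after); /* changed mid-read: start over */
	return 0;
}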

The interface between kfd and kgd also allows the kfd to request buffer 
management services from kgd, and allows kgd to route interrupt requests to 
kfd code since the interrupt block is shared between regular 
graphics/compute and HSA compute subsystems in the GPU.

The kfd code works with an open source usermode library ("libhsakmt") which 
is in the final stages of IP review and should be published in a separate 
repo over the next few days. 

The code operates in one of three modes, selectable via the sched_policy 
module parameter:

- sched_policy=0 uses a hardware scheduler running in the MEC block within 
CP, and allows oversubscription (more queues than HW slots) 

- sched_policy=1 also uses HW scheduling but does not allow 
oversubscription, so create_queue requests fail when we run out of HW slots 

- sched_policy=2 does not use HW scheduling, so the driver manually assigns 
queues to HW slots by programming registers

The "no HW scheduling" option is for debug & new hardware bringup only, so 
has less test coverage than the other options. Default in the current code 
is "HW scheduling without oversubscription" since that is where we have the 
most test coverage but we expect to change the default to "HW scheduling 
with oversubscription" after further testing. This effectively removes the 
HW limit on the number of work queues available to applications.
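
As a sketch (standard kernel module boilerplate; only the parameter name
and the default come from the description above), the declaration would
look roughly like:

#include <linux/module.h>

/* default: HW scheduling without oversubscription */
static int sched_policy = 1;
module_param(sched_policy, int, 0444);
MODULE_PARM_DESC(sched_policy,
	"0 = HWS with oversubscription, 1 = HWS without, 2 = no HWS");

The policy is then selected at load time, e.g. "modprobe <kfd module>
sched_policy=0".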

Programs running on the GPU are associated with an address space through the 
VMID field, which is translated to a unique PASID at access time via a set 
of 16 VMID-to-PASID mapping registers. The available VMIDs (currently 16) 
are partitioned (under control of the radeon kgd) between current 
gfx/compute and HSA compute, with each getting 8 in the current code. The 
VMID-to-PASID mapping registers are updated by the HW scheduler when used, 
and by driver code if HW scheduling is not being used.  
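
A sketch of the split and of a driver-side mapping update (used only when
HW scheduling is off); the register name and the write helper below are
hypothetical, not the actual CIK definitions:

#include <stdint.h>

#define TOTAL_VMIDS      16
#define FIRST_HSA_VMID    8   /* VMIDs 0-7 stay with gfx/compute,
			       * VMIDs 8-15 go to HSA compute */
#define VMID_PASID_MAP_0  0x0 /* hypothetical register offset */

extern void write_reg(uint32_t offset, uint32_t value); /* hypothetical */

static void map_vmid_to_pasid(uint32_t vmid, uint32_t pasid)
{
	/* one 32-bit mapping register per VMID */
	write_reg(VMID_PASID_MAP_0 + vmid * sizeof(uint32_t), pasid);
}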

The Sea Islands compute queues use a new "doorbell" mechanism instead of the 
earlier kernel-managed write pointer registers. Doorbells use a separate BAR 
dedicated for this purpose, and pages within the doorbell aperture are 
mapped to userspace (each page mapped to only one user address space). 
Writes to the doorbell aperture are intercepted by GPU hardware, allowing 
userspace code to safely manage work queues (rings) without requiring a 
kernel call for every ring update. 
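
A userspace-side sketch of such a submission (all names invented for
illustration; the doorbell slot comes from mmap()ing the doorbell
aperture, and real code would also need memory barriers between the ring
write and the doorbell write):

#include <stdint.h>

struct user_queue {
	uint32_t *ring;              /* ring buffer in userspace memory */
	uint32_t size_dw;            /* ring size in dwords (power of 2) */
	volatile uint32_t *wptr;     /* shared write pointer */
	volatile uint32_t *doorbell; /* one slot in the mapped page */
};

static void submit_packet(struct user_queue *q, const uint32_t *pkt,
			  uint32_t ndw)
{
	uint32_t w = *q->wptr;
	uint32_t i;

	for (i = 0; i < ndw; i++)
		q->ring[(w + i) & (q->size_dw - 1)] = pkt[i];
	*q->wptr = w + ndw;
	/* intercepted by GPU hardware; no kernel call needed */
	*q->doorbell = *q->wptr;
}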

The first step for an application process is to open the kfd device. Calls
open create a kfd "process" structure only for the first thread of the 
process. Subsequent open calls are checked to see if they are from processes 
using the same mm_struct and, if so, don't do anything. The kfd per-process 
data lives as long as the mm_struct exists. Each mm_struct is associated 
with a unique PASID, allowing the IOMMUv2 to make userspace process memory 
accessible to the GPU. 
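
A kernel-side sketch of that open-time behaviour (the structure and helper
names are assumptions for illustration, not the actual kfd code):

#include <linux/err.h>
#include <linux/fs.h>
#include <linux/sched.h>

struct kfd_process; /* opaque here */
extern struct kfd_process *find_process_by_mm(struct mm_struct *mm);
extern struct kfd_process *create_process(struct task_struct *task);

static int kfd_open(struct inode *inode, struct file *filep)
{
	struct kfd_process *p;

	p = find_process_by_mm(current->mm); /* hypothetical lookup */
	if (!p)
		/* first open for this mm: allocate the per-process data
		 * and bind a fresh PASID to the mm_struct */
		p = create_process(current);

	return PTR_ERR_OR_ZERO(p);
}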

The next step is for the application to collect topology information via
sysfs.
This gives userspace enough information to be able to identify specific 
nodes (processors) in subsequent queue management calls. Application 
processes can create queues on multiple processors, and processors support 
queues from multiple processes. 

At this point the application can create work queues in userspace memory and 
pass them through the usermode library to kfd to have them mapped onto HW 
queue slots so that commands written to the queues can be executed by the 
GPU. Queue operations specify a processor node, and so the bulk of this code 
is device-specific. 

Written by John Bridgman <John.Bridgman@amd.com>

Alexey Skidanov (4):
  hsa/radeon: 32-bit processes support
  hsa/radeon: NULL pointer dereference bug workaround
  hsa/radeon: HSA64/HSA32 modes support
  hsa/radeon: Add local memory to topology

Andrew Lewycky (3):
  hsa/radeon: Make binding of process to device permanent
  hsa/radeon: Implement hsaKmtSetMemoryPolicy
  mm: Change timing of notification to IOMMUs about a page to be
    invalidated

Ben Goz (20):
  hsa/radeon: Add queue and hw_pointer_store modules
  hsa/radeon: Add support allocating kernel doorbells
  hsa/radeon: Add mqd_manager module
  hsa/radeon: Add kernel queue support for KFD
  hsa/radeon: Add module parameter of scheduling policy
  hsa/radeon: Add packet manager module
  hsa/radeon: Add process queue manager module
  hsa/radeon: Add device queue manager module
  hsa/radeon: Switch to new queue scheduler
  hsa/radeon: Add IOCTL for update queue
  hsa/radeon: Queue Management integration with Memory Management
  hsa/radeon: update queue fault handling
  hsa/radeon: fixing a bug to support 32b processes
  hsa/radeon: Fix number of pipes per ME
  hsa/radeon: Removing hw pointer store module
  hsa/radeon: Adding some error messages
  hsa/radeon: Fixing minor issues with kernel queues (DIQ)
  drm/radeon: Add register access functions to kfd2kgd interface
  hsa/radeon: Eliminating all direct register accesses
  drm/radeon: Remove lock functions from kfd2kgd interface

Evgeny Pinchuk (9):
  hsa/radeon: fix the OEMID assignment in kfd_topology
  drm/radeon: extending kfd-kgd interface
  hsa/radeon: implementing IOCTL for clock counters
  drm/radeon: adding synchronization for GRBM GFX
  hsa/radeon: fixing clock counters bug
  drm/radeon: Extending kfd interface
  hsa/radeon: Adding max clock speeds to topology
  hsa/radeon: Alternating the source of max clock
  hsa/radeon: Exclusive access for perf. counters

Michael Varga (1):
  hsa/radeon: debugging print statements

Oded Gabbay (45):
  mm: Add kfd_process pointer to mm_struct
  drm/radeon: reduce number of free VMIDs and pipes in KV
  drm/radeon: Report doorbell configuration to kfd
  drm/radeon: Add radeon <--> kfd interface
  drm/radeon: Add kfd-->kgd interface to get virtual ram size
  drm/radeon: Add kfd-->kgd interfaces of memory allocation/mapping
  drm/radeon: Add kfd-->kgd interface of locking srbm_gfx_cntl register
  drm/radeon: Add calls to initialize and finalize kfd from radeon
  hsa/radeon: Add code base of hsa driver for AMD's GPUs
  hsa/radeon: Add initialization and unmapping of doorbell aperture
  hsa/radeon: Add scheduler code
  hsa/radeon: Add kfd mmap handler
  hsa/radeon: Add 2 new IOCTL to kfd, CREATE_QUEUE and DESTROY_QUEUE
  hsa/radeon: Update MAINTAINERS and CREDITS files
  hsa/radeon: Add interrupt handling module
  hsa/radeon: Add the isr function of the KFD scheduler
  hsa/radeon: Handle deactivation of queues using interrupts
  hsa/radeon: Enable interrupts in KFD scheduler
  hsa/radeon: Enable/Disable KFD interrupt module
  hsa/radeon: Add interrupt callback function to kgd2kfd interface
  hsa/radeon: Add kgd-->kfd interfaces for suspend and resume
  drm/radeon: Add calls to suspend and resume of kfd driver
  drm/radeon/cik: Don't touch int of pipes 1-7
  drm/radeon/cik: Call kfd isr function
  hsa/radeon: Fix memory size allocated for HPD
  hsa/radeon: Fix list of supported devices
  hsa/radeon: Fix coding style in cik_int.h
  hsa/radeon: Print ioctl command only in debug mode
  hsa/radeon: Print ISR info only in debug mode
  hsa/radeon: Workaround for a bug in amd_iommu
  hsa/radeon: Eliminate warnings in compilation
  hsa/radeon: Various kernel styling fixes
  hsa/radeon: Rearrange structures in kfd_ioctl.h
  hsa/radeon: change another pr_info to pr_debug
  hsa/radeon: Fix timeout calculation in sync_with_hw
  hsa/radeon: Update module information and version
  hsa/radeon: Update module version to 0.6.0
  hsa/radeon: Fix initialization of sh_mem registers
  hsa/radeon: Fix compilation warnings
  hsa/radeon: Remove old scheduler code
  hsa/radeon: Static analysis (smatch) fixes
  hsa/radeon: Check oversubscription before destroying runlist
  hsa/radeon: Don't verify cksum when parsing CRAT table
  hsa/radeon: Update module version to 0.6.1
  hsa/radeon: Update module version to 0.6.2

Yair Shachar (1):
  hsa/radeon: Adding qcm fence return status

 CREDITS                                            |    7 +
 MAINTAINERS                                        |    8 +
 drivers/Kconfig                                    |    2 +
 drivers/gpu/Makefile                               |    1 +
 drivers/gpu/drm/radeon/Makefile                    |    1 +
 drivers/gpu/drm/radeon/cik.c                       |  156 +--
 drivers/gpu/drm/radeon/cikd.h                      |   51 +-
 drivers/gpu/drm/radeon/radeon.h                    |    9 +
 drivers/gpu/drm/radeon/radeon_device.c             |   32 +
 drivers/gpu/drm/radeon/radeon_drv.c                |    6 +
 drivers/gpu/drm/radeon/radeon_kfd.c                |  630 ++++++++++
 drivers/gpu/drm/radeon/radeon_kms.c                |    9 +
 drivers/gpu/hsa/Kconfig                            |   20 +
 drivers/gpu/hsa/Makefile                           |    1 +
 drivers/gpu/hsa/radeon/Makefile                    |   12 +
 drivers/gpu/hsa/radeon/cik_int.h                   |   50 +
 drivers/gpu/hsa/radeon/cik_mqds.h                  |  250 ++++
 drivers/gpu/hsa/radeon/cik_regs.h                  |  220 ++++
 drivers/gpu/hsa/radeon/kfd_aperture.c              |  123 ++
 drivers/gpu/hsa/radeon/kfd_chardev.c               |  530 +++++++++
 drivers/gpu/hsa/radeon/kfd_crat.h                  |  294 +++++
 drivers/gpu/hsa/radeon/kfd_device.c                |  244 ++++
 drivers/gpu/hsa/radeon/kfd_device_queue_manager.c  |  981 ++++++++++++++++
 drivers/gpu/hsa/radeon/kfd_device_queue_manager.h  |  101 ++
 drivers/gpu/hsa/radeon/kfd_doorbell.c              |  242 ++++
 drivers/gpu/hsa/radeon/kfd_interrupt.c             |  177 +++
 drivers/gpu/hsa/radeon/kfd_kernel_queue.c          |  305 +++++
 drivers/gpu/hsa/radeon/kfd_kernel_queue.h          |   66 ++
 drivers/gpu/hsa/radeon/kfd_module.c                |  130 +++
 drivers/gpu/hsa/radeon/kfd_mqd_manager.c           |  290 +++++
 drivers/gpu/hsa/radeon/kfd_mqd_manager.h           |   54 +
 drivers/gpu/hsa/radeon/kfd_packet_manager.c        |  488 ++++++++
 drivers/gpu/hsa/radeon/kfd_pasid.c                 |   97 ++
 drivers/gpu/hsa/radeon/kfd_pm4_headers.h           |  682 +++++++++++
 drivers/gpu/hsa/radeon/kfd_pm4_opcodes.h           |  107 ++
 drivers/gpu/hsa/radeon/kfd_priv.h                  |  475 ++++++++
 drivers/gpu/hsa/radeon/kfd_process.c               |  391 +++++++
 drivers/gpu/hsa/radeon/kfd_process_queue_manager.c |  343 ++++++
 drivers/gpu/hsa/radeon/kfd_queue.c                 |  109 ++
 drivers/gpu/hsa/radeon/kfd_scheduler.h             |   72 ++
 drivers/gpu/hsa/radeon/kfd_topology.c              | 1207 ++++++++++++++++++++
 drivers/gpu/hsa/radeon/kfd_topology.h              |  168 +++
 drivers/gpu/hsa/radeon/kfd_vidmem.c                |   97 ++
 include/linux/mm_types.h                           |   14 +
 include/linux/radeon_kfd.h                         |  106 ++
 include/uapi/linux/kfd_ioctl.h                     |  133 +++
 mm/rmap.c                                          |    8 +-
 47 files changed, 9402 insertions(+), 97 deletions(-)
 create mode 100644 drivers/gpu/drm/radeon/radeon_kfd.c
 create mode 100644 drivers/gpu/hsa/Kconfig
 create mode 100644 drivers/gpu/hsa/Makefile
 create mode 100644 drivers/gpu/hsa/radeon/Makefile
 create mode 100644 drivers/gpu/hsa/radeon/cik_int.h
 create mode 100644 drivers/gpu/hsa/radeon/cik_mqds.h
 create mode 100644 drivers/gpu/hsa/radeon/cik_regs.h
 create mode 100644 drivers/gpu/hsa/radeon/kfd_aperture.c
 create mode 100644 drivers/gpu/hsa/radeon/kfd_chardev.c
 create mode 100644 drivers/gpu/hsa/radeon/kfd_crat.h
 create mode 100644 drivers/gpu/hsa/radeon/kfd_device.c
 create mode 100644 drivers/gpu/hsa/radeon/kfd_device_queue_manager.c
 create mode 100644 drivers/gpu/hsa/radeon/kfd_device_queue_manager.h
 create mode 100644 drivers/gpu/hsa/radeon/kfd_doorbell.c
 create mode 100644 drivers/gpu/hsa/radeon/kfd_interrupt.c
 create mode 100644 drivers/gpu/hsa/radeon/kfd_kernel_queue.c
 create mode 100644 drivers/gpu/hsa/radeon/kfd_kernel_queue.h
 create mode 100644 drivers/gpu/hsa/radeon/kfd_module.c
 create mode 100644 drivers/gpu/hsa/radeon/kfd_mqd_manager.c
 create mode 100644 drivers/gpu/hsa/radeon/kfd_mqd_manager.h
 create mode 100644 drivers/gpu/hsa/radeon/kfd_packet_manager.c
 create mode 100644 drivers/gpu/hsa/radeon/kfd_pasid.c
 create mode 100644 drivers/gpu/hsa/radeon/kfd_pm4_headers.h
 create mode 100644 drivers/gpu/hsa/radeon/kfd_pm4_opcodes.h
 create mode 100644 drivers/gpu/hsa/radeon/kfd_priv.h
 create mode 100644 drivers/gpu/hsa/radeon/kfd_process.c
 create mode 100644 drivers/gpu/hsa/radeon/kfd_process_queue_manager.c
 create mode 100644 drivers/gpu/hsa/radeon/kfd_queue.c
 create mode 100644 drivers/gpu/hsa/radeon/kfd_scheduler.h
 create mode 100644 drivers/gpu/hsa/radeon/kfd_topology.c
 create mode 100644 drivers/gpu/hsa/radeon/kfd_topology.h
 create mode 100644 drivers/gpu/hsa/radeon/kfd_vidmem.c
 create mode 100644 include/linux/radeon_kfd.h
 create mode 100644 include/uapi/linux/kfd_ioctl.h

-- 
1.9.1



* Re: [PATCH 00/83] AMD HSA kernel driver
From: Jerome Glisse @ 2014-07-10 22:24 UTC
  To: Oded Gabbay
  Cc: David Airlie, Alex Deucher, Andrew Morton, linux-kernel,
	dri-devel, John Bridgman, Andrew Lewycky, Joerg Roedel,
	Oded Gabbay

On Fri, Jul 11, 2014 at 12:45:27AM +0300, Oded Gabbay wrote:
> This patch set implements a Heterogeneous System Architecture (HSA) driver 
> for radeon-family GPUs. 

These are just quick comments on a few things. Given the size of this,
people will need time to review it.

> 
> HSA allows different processor types (CPUs, DSPs, GPUs, etc.) to share
> system resources more effectively via HW features including shared pageable 
> memory, userspace-accessible work queues, and platform-level atomics. In 
> addition to the memory protection mechanisms in GPUVM and IOMMUv2, the Sea 
> Islands family of GPUs also performs HW-level validation of commands passed 
> in through the queues (aka rings).
> 
> The code in this patch set is intended to serve both as a sample driver for 
> other HSA-compatible hardware devices and as a production driver for 
> radeon-family processors. The code is architected to support multiple CPUs 
> each with connected GPUs, although the current implementation focuses on a 
> single Kaveri/Berlin APU, and works alongside the existing radeon kernel 
> graphics driver (kgd). 
> 
> AMD GPUs designed for use with HSA (Sea Islands and up) share some hardware 
> functionality between HSA compute and regular gfx/compute (memory, 
> interrupts, registers), while other functionality has been added 
> specifically for HSA compute (hw scheduler for virtualized compute rings).
> All shared hardware is owned by the radeon graphics driver, and an interface 
> between kfd and kgd allows the kfd to make use of those shared resources, 
> while HSA-specific functionality is managed directly by kfd by submitting 
> packets into an HSA-specific command queue (the "HIQ").
> 
> During kfd module initialization a char device node (/dev/kfd) is created 
> (surviving until module exit), with ioctls for queue creation & management, 
> and data structures are initialized for managing HSA device topology. 
> 
> The rest of the initialization is driven by calls from the radeon kgd at 
> the following points:
> 
> - radeon_init (kfd_init)
> - radeon_exit (kfd_fini)
> - radeon_driver_load_kms (kfd_device_probe, kfd_device_init)
> - radeon_driver_unload_kms (kfd_device_fini)
> 
> During the probe and init processing per-device data structures are 
> established which connect to the associated graphics kernel driver. This 
> information is exposed to userspace via sysfs, along with a version number 
> allowing userspace to determine if a topology change has occurred while it 
> was reading from sysfs. 
> 
> The interface between kfd and kgd also allows the kfd to request buffer 
> management services from kgd, and allows kgd to route interrupt requests to 
> kfd code since the interrupt block is shared between regular 
> graphics/compute and HSA compute subsystems in the GPU.
> 
> The kfd code works with an open source usermode library ("libhsakmt") which 
> is in the final stages of IP review and should be published in a separate 
> repo over the next few days. 
> 
> The code operates in one of three modes, selectable via the sched_policy 
> module parameter:
> 
> - sched_policy=0 uses a hardware scheduler running in the MEC block within 
> CP, and allows oversubscription (more queues than HW slots) 
> 
> - sched_policy=1 also uses HW scheduling but does not allow 
> oversubscription, so create_queue requests fail when we run out of HW slots 
> 
> - sched_policy=2 does not use HW scheduling, so the driver manually assigns 
> queues to HW slots by programming registers
> 
> The "no HW scheduling" option is for debug & new hardware bringup only, so 
> has less test coverage than the other options. Default in the current code 
> is "HW scheduling without oversubscription" since that is where we have the 
> most test coverage but we expect to change the default to "HW scheduling 
> with oversubscription" after further testing. This effectively removes the 
> HW limit on the number of work queues available to applications.
> 
> Programs running on the GPU are associated with an address space through the 
> VMID field, which is translated to a unique PASID at access time via a set 
> of 16 VMID-to-PASID mapping registers. The available VMIDs (currently 16) 
> are partitioned (under control of the radeon kgd) between current 
> gfx/compute and HSA compute, with each getting 8 in the current code. The 
> VMID-to-PASID mapping registers are updated by the HW scheduler when used, 
> and by driver code if HW scheduling is not being used.  
> 
> The Sea Islands compute queues use a new "doorbell" mechanism instead of the 
> earlier kernel-managed write pointer registers. Doorbells use a separate BAR 
> dedicated for this purpose, and pages within the doorbell aperture are 
> mapped to userspace (each page mapped to only one user address space). 
> Writes to the doorbell aperture are intercepted by GPU hardware, allowing 
> userspace code to safely manage work queues (rings) without requiring a 
> kernel call for every ring update. 
> 
> The first step for an application process is to open the kfd device. Calls
> open create a kfd "process" structure only for the first thread of the 
> process. Subsequent open calls are checked to see if they are from processes 
> using the same mm_struct and, if so, don't do anything. The kfd per-process 
> data lives as long as the mm_struct exists. Each mm_struct is associated 
> with a unique PASID, allowing the IOMMUv2 to make userspace process memory 
> accessible to the GPU. 
> 
> The next step is for the application to collect topology information via
> sysfs.
> This gives userspace enough information to be able to identify specific 
> nodes (processors) in subsequent queue management calls. Application 
> processes can create queues on multiple processors, and processors support 
> queues from multiple processes. 

I am not a fan of using sysfs to discover topology.

> 
> At this point the application can create work queues in userspace memory and 
> pass them through the usermode library to kfd to have them mapped onto HW 
> queue slots so that commands written to the queues can be executed by the 
> GPU. Queue operations specify a processor node, and so the bulk of this code 
> is device-specific. 
> 
> Written by John Bridgman <John.Bridgman@amd.com>

A general comment is that you need to collapse many patches; things like
patch 58, which fixes kernel style, should be squashed, i.e. fix all the
earlier patches that introduced the broken style.

Even worse are things like patch 71, removing code you added only a few
patches earlier in the patchset. This means that if I start reviewing in
patch order, I might spend time on code that is later
deleted/modified/fixed, i.e. time spent understanding and reading some
code might simply be wasted.

I will come back with individual patch comments and also general remarks.

Cheers,
Jérôme

* Re: [PATCH 00/83] AMD HSA kernel driver
From: Gabbay, Oded @ 2014-07-10 22:51 UTC
  To: j.glisse
  Cc: linux-kernel, Bridgman, John, Deucher, Alexander, Lewycky,
	Andrew, joro, akpm, dri-devel, airlied, oded.gabbay


On Thu, 2014-07-10 at 18:24 -0400, Jerome Glisse wrote:
> On Fri, Jul 11, 2014 at 12:45:27AM +0300, Oded Gabbay wrote:
> >  This patch set implements a Heterogeneous System Architecture 
> > (HSA) driver
> >  for radeon-family GPUs.
>  
> This is just quick comments on few things. Given size of this, people
> will need to have time to review things.
>  
> >  HSA allows different processor types (CPUs, DSPs, GPUs, etc..) to 
> > share
> >  system resources more effectively via HW features including 
> > shared pageable
> >  memory, userspace-accessible work queues, and platform-level 
> > atomics. In
> >  addition to the memory protection mechanisms in GPUVM and 
> > IOMMUv2, the Sea
> >  Islands family of GPUs also performs HW-level validation of 
> > commands passed
> >  in through the queues (aka rings).
> >  The code in this patch set is intended to serve both as a sample 
> > driver for
> >  other HSA-compatible hardware devices and as a production driver 
> > for
> >  radeon-family processors. The code is architected to support 
> > multiple CPUs
> >  each with connected GPUs, although the current implementation 
> > focuses on a
> >  single Kaveri/Berlin APU, and works alongside the existing radeon 
> > kernel
> >  graphics driver (kgd).
> >  AMD GPUs designed for use with HSA (Sea Islands and up) share 
> > some hardware
> >  functionality between HSA compute and regular gfx/compute (memory,
> >  interrupts, registers), while other functionality has been added
> >  specifically for HSA compute  (hw scheduler for virtualized 
> > compute rings).
> >  All shared hardware is owned by the radeon graphics driver, and 
> > an interface
> >  between kfd and kgd allows the kfd to make use of those shared 
> > resources,
> >  while HSA-specific functionality is managed directly by kfd by 
> > submitting
> >  packets into an HSA-specific command queue (the "HIQ").
> >  During kfd module initialization a char device node (/dev/kfd) is 
> > created
> >  (surviving until module exit), with ioctls for queue creation & 
> > management,
> >  and data structures are initialized for managing HSA device 
> > topology.
> >  The rest of the initialization is driven by calls from the radeon 
> > kgd at
> >  the following points :
> >  - radeon_init (kfd_init)
> >  - radeon_exit (kfd_fini)
> >  - radeon_driver_load_kms (kfd_device_probe, kfd_device_init)
> >  - radeon_driver_unload_kms (kfd_device_fini)
> >  During the probe and init processing per-device data structures 
> > are
> >  established which connect to the associated graphics kernel 
> > driver. This
> >  information is exposed to userspace via sysfs, along with a 
> > version number
> >  allowing userspace to determine if a topology change has occurred 
> > while it
> >  was reading from sysfs.
> >  The interface between kfd and kgd also allows the kfd to request 
> > buffer
> >  management services from kgd, and allows kgd to route interrupt 
> > requests to
> >  kfd code since the interrupt block is shared between regular
> >  graphics/compute and HSA compute subsystems in the GPU.
> >  The kfd code works with an open source usermode library 
> > ("libhsakmt") which
> >  is in the final stages of IP review and should be published in a 
> > separate
> >  repo over the next few days.
> >  The code operates in one of three modes, selectable via the 
> > sched_policy
> >  module parameter :
> >  - sched_policy=0 uses a hardware scheduler running in the MEC 
> > block within
> >  CP, and allows oversubscription (more queues than HW slots)
> >  - sched_policy=1 also uses HW scheduling but does not allow
> >  oversubscription, so create_queue requests fail when we run out 
> > of HW slots
> >  - sched_policy=2 does not use HW scheduling, so the driver 
> > manually assigns
> >  queues to HW slots by programming registers
> >  The "no HW scheduling" option is for debug & new hardware bringup 
> > only, so
> >  has less test coverage than the other options. Default in the 
> > current code
> >  is "HW scheduling without oversubscription" since that is where 
> > we have the
> >  most test coverage but we expect to change the default to "HW 
> > scheduling
> >  with oversubscription" after further testing. This effectively 
> > removes the
> >  HW limit on the number of work queues available to applications.
> >  Programs running on the GPU are associated with an address space 
> > through the
> >  VMID field, which is translated to a unique PASID at access time 
> > via a set
> >  of 16 VMID-to-PASID mapping registers. The available VMIDs 
> > (currently 16)
> >  are partitioned (under control of the radeon kgd) between current
> >  gfx/compute and HSA compute, with each getting 8 in the current 
> > code. The
> >  VMID-to-PASID mapping registers are updated by the HW scheduler 
> > when used,
> >  and by driver code if HW scheduling is not being used.
> >  The Sea Islands compute queues use a new "doorbell" mechanism 
> > instead of the
> >  earlier kernel-managed write pointer registers. Doorbells use a 
> > separate BAR
> >  dedicated for this purpose, and pages within the doorbell 
> > aperture are
> >  mapped to userspace (each page mapped to only one user address 
> > space).
> >  Writes to the doorbell aperture are intercepted by GPU hardware, 
> > allowing
> >  userspace code to safely manage work queues (rings) without 
> > requiring a
> >  kernel call for every ring update.
> >  First step for an application process is to open the kfd device. 
> > Calls to
> >  open create a kfd "process" structure only for the first thread 
> > of the
> >  process. Subsequent open calls are checked to see if they are 
> > from processes
> >  using the same mm_struct and, if so, don't do anything. The kfd 
> > per-process
> >  data lives as long as the mm_struct exists. Each mm_struct is 
> > associated
> >  with a unique PASID, allowing the IOMMUv2 to make userspace 
> > process memory
> >  accessible to the GPU.
> >  Next step is for the application to collect topology information 
> > via sysfs.
> >  This gives userspace enough information to be able to identify 
> > specific
> >  nodes (processors) in subsequent queue management calls. 
> > Application
> >  processes can create queues on multiple processors, and 
> > processors support
> >  queues from multiple processes.
>  
> I am not a fan to use sysfs to discover topoly.
>  
> >  At this point the application can create work queues in userspace 
> > memory and
> >  pass them through the usermode library to kfd to have them mapped 
> > onto HW
> >  queue slots so that commands written to the queues can be 
> > executed by the
> >  GPU. Queue operations specify a processor node, and so the bulk 
> > of this code
> >  is device-specific.
> >  Written by John Bridgman <John.Bridgman@amd.com>
>  
> So a general comment is that you need to collapse many patches.
> Things like patch 58, which fixes kernel style, should be collapsed,
> i.e. fix all the previous patches that have broken style.
>  
> Even worse are things like patch 71, which removes code you added
> just a few patches earlier in the patchset.
Not quite; the code was added in patch 11.

> This means that if I start reviewing in patch order, I might spend
> time on code that is later deleted/modified/fixed, i.e. time I spend
> understanding and reading some code might just be wasted.
Quick answer is that you are of course right, but having said that, I
think that squashing would not be simple at all.
I squashed what I could, and I can probably do a little more (like
58), but the major problem is that one of the main modules of the
driver - the scheduler (QCM) - was completely re-written (patches
46-53). Therefore, from patch 1 to 53 we use the old scheduler code,
and from 54 we use the new QCM (the old scheduler code was completely
removed in patch 71). So I could maybe squash 71 into 54, but that
won't help you much, I think.

So, the current advice I can give is to completely ignore the
following files, because they do not exist in the final commit:
- kfd_sched_cik_static.c (stopped being used in patch 54)
- kfd_registers.c (stopped being used in patch 81, because we moved
all register writes to radeon)
- kfd_hw_pointer_store.c (alive from patch 46 to 67)
- kfd_hw_pointer_store.h (alive from patch 46 to 67)

        Oded 
> I will come back with individual patch comments and also general 
> remarks.
>  
> Cheers,
> Jérôme
>  
>  
> >  Alexey Skidanov (4):
> >    hsa/radeon: 32-bit processes support
> >    hsa/radeon: NULL pointer dereference bug workaround
> >    hsa/radeon: HSA64/HSA32 modes support
> >    hsa/radeon: Add local memory to topology
> >  Andrew Lewycky (3):
> >    hsa/radeon: Make binding of process to device permanent
> >    hsa/radeon: Implement hsaKmtSetMemoryPolicy
> >    mm: Change timing of notification to IOMMUs about a page to be
> >      invalidated
> >  Ben Goz (20):
> >    hsa/radeon: Add queue and hw_pointer_store modules
> >    hsa/radeon: Add support allocating kernel doorbells
> >    hsa/radeon: Add mqd_manager module
> >    hsa/radeon: Add kernel queue support for KFD
> >    hsa/radeon: Add module parameter of scheduling policy
> >    hsa/radeon: Add packet manager module
> >    hsa/radeon: Add process queue manager module
> >    hsa/radeon: Add device queue manager module
> >    hsa/radeon: Switch to new queue scheduler
> >    hsa/radeon: Add IOCTL for update queue
> >    hsa/radeon: Queue Management integration with Memory Management
> >    hsa/radeon: update queue fault handling
> >    hsa/radeon: fixing a bug to support 32b processes
> >    hsa/radeon: Fix number of pipes per ME
> >    hsa/radeon: Removing hw pointer store module
> >    hsa/radeon: Adding some error messages
> >    hsa/radeon: Fixing minor issues with kernel queues (DIQ)
> >    drm/radeon: Add register access functions to kfd2kgd interface
> >    hsa/radeon: Eliminating all direct register accesses
> >    drm/radeon: Remove lock functions from kfd2kgd interface
> >  Evgeny Pinchuk (9):
> >    hsa/radeon: fix the OEMID assignment in kfd_topology
> >    drm/radeon: extending kfd-kgd interface
> >    hsa/radeon: implementing IOCTL for clock counters
> >    drm/radeon: adding synchronization for GRBM GFX
> >    hsa/radeon: fixing clock counters bug
> >    drm/radeon: Extending kfd interface
> >    hsa/radeon: Adding max clock speeds to topology
> >    hsa/radeon: Alternating the source of max clock
> >    hsa/radeon: Exclusive access for perf. counters
> >  Michael Varga (1):
> >    hsa/radeon: debugging print statements
> >  Oded Gabbay (45):
> >    mm: Add kfd_process pointer to mm_struct
> >    drm/radeon: reduce number of free VMIDs and pipes in KV
> >    drm/radeon: Report doorbell configuration to kfd
> >    drm/radeon: Add radeon <--> kfd interface
> >    drm/radeon: Add kfd-->kgd interface to get virtual ram size
> >    drm/radeon: Add kfd-->kgd interfaces of memory 
> > allocation/mapping
> >    drm/radeon: Add kfd-->kgd interface of locking srbm_gfx_cntl 
> > register
> >    drm/radeon: Add calls to initialize and finalize kfd from radeon
> >    hsa/radeon: Add code base of hsa driver for AMD's GPUs
> >    hsa/radeon: Add initialization and unmapping of doorbell 
> > aperture
> >    hsa/radeon: Add scheduler code
> >    hsa/radeon: Add kfd mmap handler
> >    hsa/radeon: Add 2 new IOCTL to kfd, CREATE_QUEUE and 
> > DESTROY_QUEUE
> >    hsa/radeon: Update MAINTAINERS and CREDITS files
> >    hsa/radeon: Add interrupt handling module
> >    hsa/radeon: Add the isr function of the KFD scehduler
> >    hsa/radeon: Handle deactivation of queues using interrupts
> >    hsa/radeon: Enable interrupts in KFD scheduler
> >    hsa/radeon: Enable/Disable KFD interrupt module
> >    hsa/radeon: Add interrupt callback function to kgd2kfd interface
> >    hsa/radeon: Add kgd-->kfd interfaces for suspend and resume
> >    drm/radeon: Add calls to suspend and resume of kfd driver
> >    drm/radeon/cik: Don't touch int of pipes 1-7
> >    drm/radeon/cik: Call kfd isr function
> >    hsa/radeon: Fix memory size allocated for HPD
> >    hsa/radeon: Fix list of supported devices
> >    hsa/radeon: Fix coding style in cik_int.h
> >    hsa/radeon: Print ioctl commnad only in debug mode
> >    hsa/radeon: Print ISR info only in debug mode
> >    hsa/radeon: Workaround for a bug in amd_iommu
> >    hsa/radeon: Eliminate warnings in compilation
> >    hsa/radeon: Various kernel styling fixes
> >    hsa/radeon: Rearrange structures in kfd_ioctl.h
> >    hsa/radeon: change another pr_info to pr_debug
> >    hsa/radeon: Fix timeout calculation in sync_with_hw
> >    hsa/radeon: Update module information and version
> >    hsa/radeon: Update module version to 0.6.0
> >    hsa/radeon: Fix initialization of sh_mem registers
> >    hsa/radeon: Fix compilation warnings
> >    hsa/radeon: Remove old scheduler code
> >    hsa/radeon: Static analysis (smatch) fixes
> >    hsa/radeon: Check oversubscription before destroying runlist
> >    hsa/radeon: Don't verify cksum when parsing CRAT table
> >    hsa/radeon: Update module version to 0.6.1
> >    hsa/radeon: Update module version to 0.6.2
> >  Yair Shachar (1):
> >    hsa/radeon: Adding qcm fence return status
> >   CREDITS                                            |    7 +
> >   MAINTAINERS                                        |    8 +
> >   drivers/Kconfig                                    |    2 +
> >   drivers/gpu/Makefile                               |    1 +
> >   drivers/gpu/drm/radeon/Makefile                    |    1 +
> >   drivers/gpu/drm/radeon/cik.c                       |  156 +--
> >   drivers/gpu/drm/radeon/cikd.h                      |   51 +-
> >   drivers/gpu/drm/radeon/radeon.h                    |    9 +
> >   drivers/gpu/drm/radeon/radeon_device.c             |   32 +
> >   drivers/gpu/drm/radeon/radeon_drv.c                |    6 +
> >   drivers/gpu/drm/radeon/radeon_kfd.c                |  630 
> > ++++++++++
> >   drivers/gpu/drm/radeon/radeon_kms.c                |    9 +
> >   drivers/gpu/hsa/Kconfig                            |   20 +
> >   drivers/gpu/hsa/Makefile                           |    1 +
> >   drivers/gpu/hsa/radeon/Makefile                    |   12 +
> >   drivers/gpu/hsa/radeon/cik_int.h                   |   50 +
> >   drivers/gpu/hsa/radeon/cik_mqds.h                  |  250 ++++
> >   drivers/gpu/hsa/radeon/cik_regs.h                  |  220 ++++
> >   drivers/gpu/hsa/radeon/kfd_aperture.c              |  123 ++
> >   drivers/gpu/hsa/radeon/kfd_chardev.c               |  530 
> > +++++++++
> >   drivers/gpu/hsa/radeon/kfd_crat.h                  |  294 +++++
> >   drivers/gpu/hsa/radeon/kfd_device.c                |  244 ++++
> >   drivers/gpu/hsa/radeon/kfd_device_queue_manager.c  |  981 
> > ++++++++++++++++
> >   drivers/gpu/hsa/radeon/kfd_device_queue_manager.h  |  101 ++
> >   drivers/gpu/hsa/radeon/kfd_doorbell.c              |  242 ++++
> >   drivers/gpu/hsa/radeon/kfd_interrupt.c             |  177 +++
> >   drivers/gpu/hsa/radeon/kfd_kernel_queue.c          |  305 +++++
> >   drivers/gpu/hsa/radeon/kfd_kernel_queue.h          |   66 ++
> >   drivers/gpu/hsa/radeon/kfd_module.c                |  130 +++
> >   drivers/gpu/hsa/radeon/kfd_mqd_manager.c           |  290 +++++
> >   drivers/gpu/hsa/radeon/kfd_mqd_manager.h           |   54 +
> >   drivers/gpu/hsa/radeon/kfd_packet_manager.c        |  488 
> > ++++++++
> >   drivers/gpu/hsa/radeon/kfd_pasid.c                 |   97 ++
> >   drivers/gpu/hsa/radeon/kfd_pm4_headers.h           |  682 
> > +++++++++++
> >   drivers/gpu/hsa/radeon/kfd_pm4_opcodes.h           |  107 ++
> >   drivers/gpu/hsa/radeon/kfd_priv.h                  |  475 
> > ++++++++
> >   drivers/gpu/hsa/radeon/kfd_process.c               |  391 +++++++
> >   drivers/gpu/hsa/radeon/kfd_process_queue_manager.c |  343 ++++++
> >   drivers/gpu/hsa/radeon/kfd_queue.c                 |  109 ++
> >   drivers/gpu/hsa/radeon/kfd_scheduler.h             |   72 ++
> >   drivers/gpu/hsa/radeon/kfd_topology.c              | 1207 
> > ++++++++++++++++++++
> >   drivers/gpu/hsa/radeon/kfd_topology.h              |  168 +++
> >   drivers/gpu/hsa/radeon/kfd_vidmem.c                |   97 ++
> >   include/linux/mm_types.h                           |   14 +
> >   include/linux/radeon_kfd.h                         |  106 ++
> >   include/uapi/linux/kfd_ioctl.h                     |  133 +++
> >   mm/rmap.c                                          |    8 +-
> >   47 files changed, 9402 insertions(+), 97 deletions(-)
> >   create mode 100644 drivers/gpu/drm/radeon/radeon_kfd.c
> >   create mode 100644 drivers/gpu/hsa/Kconfig
> >   create mode 100644 drivers/gpu/hsa/Makefile
> >   create mode 100644 drivers/gpu/hsa/radeon/Makefile
> >   create mode 100644 drivers/gpu/hsa/radeon/cik_int.h
> >   create mode 100644 drivers/gpu/hsa/radeon/cik_mqds.h
> >   create mode 100644 drivers/gpu/hsa/radeon/cik_regs.h
> >   create mode 100644 drivers/gpu/hsa/radeon/kfd_aperture.c
> >   create mode 100644 drivers/gpu/hsa/radeon/kfd_chardev.c
> >   create mode 100644 drivers/gpu/hsa/radeon/kfd_crat.h
> >   create mode 100644 drivers/gpu/hsa/radeon/kfd_device.c
> >   create mode 100644 
> > drivers/gpu/hsa/radeon/kfd_device_queue_manager.c
> >   create mode 100644 
> > drivers/gpu/hsa/radeon/kfd_device_queue_manager.h
> >   create mode 100644 drivers/gpu/hsa/radeon/kfd_doorbell.c
> >   create mode 100644 drivers/gpu/hsa/radeon/kfd_interrupt.c
> >   create mode 100644 drivers/gpu/hsa/radeon/kfd_kernel_queue.c
> >   create mode 100644 drivers/gpu/hsa/radeon/kfd_kernel_queue.h
> >   create mode 100644 drivers/gpu/hsa/radeon/kfd_module.c
> >   create mode 100644 drivers/gpu/hsa/radeon/kfd_mqd_manager.c
> >   create mode 100644 drivers/gpu/hsa/radeon/kfd_mqd_manager.h
> >   create mode 100644 drivers/gpu/hsa/radeon/kfd_packet_manager.c
> >   create mode 100644 drivers/gpu/hsa/radeon/kfd_pasid.c
> >   create mode 100644 drivers/gpu/hsa/radeon/kfd_pm4_headers.h
> >   create mode 100644 drivers/gpu/hsa/radeon/kfd_pm4_opcodes.h
> >   create mode 100644 drivers/gpu/hsa/radeon/kfd_priv.h
> >   create mode 100644 drivers/gpu/hsa/radeon/kfd_process.c
> >   create mode 100644 
> > drivers/gpu/hsa/radeon/kfd_process_queue_manager.c
> >   create mode 100644 drivers/gpu/hsa/radeon/kfd_queue.c
> >   create mode 100644 drivers/gpu/hsa/radeon/kfd_scheduler.h
> >   create mode 100644 drivers/gpu/hsa/radeon/kfd_topology.c
> >   create mode 100644 drivers/gpu/hsa/radeon/kfd_topology.h
> >   create mode 100644 drivers/gpu/hsa/radeon/kfd_vidmem.c
> >   create mode 100644 include/linux/radeon_kfd.h
> >   create mode 100644 include/uapi/linux/kfd_ioctl.h
> >  --
> >  1.9.1

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 00/83] AMD HSA kernel driver
  2014-07-10 22:51     ` Gabbay, Oded
@ 2014-07-11 21:18       ` Jerome Glisse
  -1 siblings, 0 replies; 51+ messages in thread
From: Jerome Glisse @ 2014-07-11 21:18 UTC (permalink / raw)
  To: Gabbay, Oded
  Cc: linux-kernel, Bridgman, John, Deucher, Alexander, Lewycky,
	Andrew, joro, akpm, dri-devel, airlied, oded.gabbay

On Thu, Jul 10, 2014 at 10:51:29PM +0000, Gabbay, Oded wrote:
> On Thu, 2014-07-10 at 18:24 -0400, Jerome Glisse wrote:
> > On Fri, Jul 11, 2014 at 12:45:27AM +0300, Oded Gabbay wrote:
> > >  This patch set implements a Heterogeneous System Architecture 
> > > (HSA) driver
> > >  for radeon-family GPUs.
> >  
> > This is just quick comments on few things. Given size of this, people
> > will need to have time to review things.
> >  
> > >  HSA allows different processor types (CPUs, DSPs, GPUs, etc..) to 
> > > share
> > >  system resources more effectively via HW features including 
> > > shared pageable
> > >  memory, userspace-accessible work queues, and platform-level 
> > > atomics. In
> > >  addition to the memory protection mechanisms in GPUVM and 
> > > IOMMUv2, the Sea
> > >  Islands family of GPUs also performs HW-level validation of 
> > > commands passed
> > >  in through the queues (aka rings).
> > >  The code in this patch set is intended to serve both as a sample 
> > > driver for
> > >  other HSA-compatible hardware devices and as a production driver 
> > > for
> > >  radeon-family processors. The code is architected to support 
> > > multiple CPUs
> > >  each with connected GPUs, although the current implementation 
> > > focuses on a
> > >  single Kaveri/Berlin APU, and works alongside the existing radeon 
> > > kernel
> > >  graphics driver (kgd).
> > >  AMD GPUs designed for use with HSA (Sea Islands and up) share 
> > > some hardware
> > >  functionality between HSA compute and regular gfx/compute (memory,
> > >  interrupts, registers), while other functionality has been added
> > >  specifically for HSA compute  (hw scheduler for virtualized 
> > > compute rings).
> > >  All shared hardware is owned by the radeon graphics driver, and 
> > > an interface
> > >  between kfd and kgd allows the kfd to make use of those shared 
> > > resources,
> > >  while HSA-specific functionality is managed directly by kfd by 
> > > submitting
> > >  packets into an HSA-specific command queue (the "HIQ").
> > >  During kfd module initialization a char device node (/dev/kfd) is 
> > > created
> > >  (surviving until module exit), with ioctls for queue creation & 
> > > management,
> > >  and data structures are initialized for managing HSA device 
> > > topology.
> > >  The rest of the initialization is driven by calls from the radeon 
> > > kgd at
> > >  the following points :
> > >  - radeon_init (kfd_init)
> > >  - radeon_exit (kfd_fini)
> > >  - radeon_driver_load_kms (kfd_device_probe, kfd_device_init)
> > >  - radeon_driver_unload_kms (kfd_device_fini)
> > >  During the probe and init processing per-device data structures 
> > > are
> > >  established which connect to the associated graphics kernel 
> > > driver. This
> > >  information is exposed to userspace via sysfs, along with a 
> > > version number
> > >  allowing userspace to determine if a topology change has occurred 
> > > while it
> > >  was reading from sysfs.
> > >  The interface between kfd and kgd also allows the kfd to request 
> > > buffer
> > >  management services from kgd, and allows kgd to route interrupt 
> > > requests to
> > >  kfd code since the interrupt block is shared between regular
> > >  graphics/compute and HSA compute subsystems in the GPU.
> > >  The kfd code works with an open source usermode library 
> > > ("libhsakmt") which
> > >  is in the final stages of IP review and should be published in a 
> > > separate
> > >  repo over the next few days.
> > >  The code operates in one of three modes, selectable via the 
> > > sched_policy
> > >  module parameter :
> > >  - sched_policy=0 uses a hardware scheduler running in the MEC 
> > > block within
> > >  CP, and allows oversubscription (more queues than HW slots)
> > >  - sched_policy=1 also uses HW scheduling but does not allow
> > >  oversubscription, so create_queue requests fail when we run out 
> > > of HW slots
> > >  - sched_policy=2 does not use HW scheduling, so the driver 
> > > manually assigns
> > >  queues to HW slots by programming registers
> > >  The "no HW scheduling" option is for debug & new hardware bringup 
> > > only, so
> > >  has less test coverage than the other options. Default in the 
> > > current code
> > >  is "HW scheduling without oversubscription" since that is where 
> > > we have the
> > >  most test coverage but we expect to change the default to "HW 
> > > scheduling
> > >  with oversubscription" after further testing. This effectively 
> > > removes the
> > >  HW limit on the number of work queues available to applications.
> > >  Programs running on the GPU are associated with an address space 
> > > through the
> > >  VMID field, which is translated to a unique PASID at access time 
> > > via a set
> > >  of 16 VMID-to-PASID mapping registers. The available VMIDs 
> > > (currently 16)
> > >  are partitioned (under control of the radeon kgd) between current
> > >  gfx/compute and HSA compute, with each getting 8 in the current 
> > > code. The
> > >  VMID-to-PASID mapping registers are updated by the HW scheduler 
> > > when used,
> > >  and by driver code if HW scheduling is not being used.
> > >  The Sea Islands compute queues use a new "doorbell" mechanism 
> > > instead of the
> > >  earlier kernel-managed write pointer registers. Doorbells use a 
> > > separate BAR
> > >  dedicated for this purpose, and pages within the doorbell 
> > > aperture are
> > >  mapped to userspace (each page mapped to only one user address 
> > > space).
> > >  Writes to the doorbell aperture are intercepted by GPU hardware, 
> > > allowing
> > >  userspace code to safely manage work queues (rings) without 
> > > requiring a
> > >  kernel call for every ring update.
> > >  First step for an application process is to open the kfd device. 
> > > Calls to
> > >  open create a kfd "process" structure only for the first thread 
> > > of the
> > >  process. Subsequent open calls are checked to see if they are 
> > > from processes
> > >  using the same mm_struct and, if so, don't do anything. The kfd 
> > > per-process
> > >  data lives as long as the mm_struct exists. Each mm_struct is 
> > > associated
> > >  with a unique PASID, allowing the IOMMUv2 to make userspace 
> > > process memory
> > >  accessible to the GPU.
> > >  Next step is for the application to collect topology information 
> > > via sysfs.
> > >  This gives userspace enough information to be able to identify 
> > > specific
> > >  nodes (processors) in subsequent queue management calls. 
> > > Application
> > >  processes can create queues on multiple processors, and 
> > > processors support
> > >  queues from multiple processes.
> >  
> > I am not a fan of using sysfs to discover topology.
> >  
> > >  At this point the application can create work queues in userspace 
> > > memory and
> > >  pass them through the usermode library to kfd to have them mapped 
> > > onto HW
> > >  queue slots so that commands written to the queues can be 
> > > executed by the
> > >  GPU. Queue operations specify a processor node, and so the bulk 
> > > of this code
> > >  is device-specific.
> > >  Written by John Bridgman <John.Bridgman@amd.com>
> >  
> > So a general comment is that you need to collapse many patches.
> > Things like patch 58, which fixes kernel style, should be collapsed,
> > i.e. fix all the previous patches that have broken style.
> >  
> > Even worse are things like patch 71, which removes code you added
> > just a few patches earlier in the patchset.
> Not quite; the code was added in patch 11.
> 
> > This means that if I start reviewing in patch order, I might spend
> > time on code that is later deleted/modified/fixed, i.e. time I spend
> > understanding and reading some code might just be wasted.
> Quick answer is that you are of course right, but having said that, I
> think that squashing would not be simple at all.
> I squashed what I could, and I can probably do a little more (like
> 58), but the major problem is that one of the main modules of the
> driver - the scheduler (QCM) - was completely re-written (patches
> 46-53). Therefore, from patch 1 to 53 we use the old scheduler code,
> and from 54 we use the new QCM (the old scheduler code was completely
> removed in patch 71). So I could maybe squash 71 into 54, but that
> won't help you much, I think.
> 
> So, the current advice I can give is to completely ignore the
> following files, because they do not exist in the final commit:
> - kfd_sched_cik_static.c (stopped being used in patch 54)
> - kfd_registers.c (stopped being used in patch 81, because we moved
> all register writes to radeon)
> - kfd_hw_pointer_store.c (alive from patch 46 to 67)
> - kfd_hw_pointer_store.h (alive from patch 46 to 67)
> 
>         Oded 

OK, I'll try, but this is driving me crazy. I am juggling between the
fully applied patchset and individual patches, going back and forth
trying to find which patch is the one I want to comment on. Not to
mention that people would need to ack a bad patch just because they
know a later good patch fixes the badness.

This patchset can be reduced to fewer than 10 big patches. I would
first do the core patches that touch core things like the iommu and
are preparatory to the patchset. Then the kfd patches that do not
touch anything in radeon but implement the skeleton and core
infrastructure. Then the radeon-specific patches.

This leads me to the design: all the core kfd helpers should live in
the hsa directory, and should be helpers, while all the radeon-specific
bits should most likely be part of the radeon gfx kernel module. I am
not against a radeon_kfd, but really I do not see the point. There
should be an hsa.ko just like there is a drm.ko.
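
(A minimal sketch of what such a split could look like; every name
below is hypothetical, purely to illustrate the proposed layering of
core helpers in hsa.ko over device-specific code in radeon.ko.)

#include <linux/types.h>

/* Exported by a core hsa.ko; implemented per device by e.g. radeon.ko */
struct hsa_device_ops {
	int  (*init_device)(void *dev_priv);
	int  (*map_queue)(void *dev_priv, u32 queue_id, u64 ring_addr);
	void (*fini_device)(void *dev_priv);
};

int hsa_register_device(const struct hsa_device_ops *ops, void *dev_priv);
void hsa_unregister_device(void *dev_priv);
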

Also, I think it would be beneficial to rename kfd to hsa, because it
is the most common name and the name that will bring valid hits in a
web search.

Each patch should have a proper commit message that is neither too
cryptic nor too chatty.

Cheers,
Jérôme

> > I will come back with individual patch comments and also general 
> > remarks.
> >  
> > Cheers,
> > Jérôme
> >  
> >  
> > >  Alexey Skidanov (4):
> > >    hsa/radeon: 32-bit processes support
> > >    hsa/radeon: NULL pointer dereference bug workaround
> > >    hsa/radeon: HSA64/HSA32 modes support
> > >    hsa/radeon: Add local memory to topology
> > >  Andrew Lewycky (3):
> > >    hsa/radeon: Make binding of process to device permanent
> > >    hsa/radeon: Implement hsaKmtSetMemoryPolicy
> > >    mm: Change timing of notification to IOMMUs about a page to be
> > >      invalidated
> > >  Ben Goz (20):
> > >    hsa/radeon: Add queue and hw_pointer_store modules
> > >    hsa/radeon: Add support allocating kernel doorbells
> > >    hsa/radeon: Add mqd_manager module
> > >    hsa/radeon: Add kernel queue support for KFD
> > >    hsa/radeon: Add module parameter of scheduling policy
> > >    hsa/radeon: Add packet manager module
> > >    hsa/radeon: Add process queue manager module
> > >    hsa/radeon: Add device queue manager module
> > >    hsa/radeon: Switch to new queue scheduler
> > >    hsa/radeon: Add IOCTL for update queue
> > >    hsa/radeon: Queue Management integration with Memory Management
> > >    hsa/radeon: update queue fault handling
> > >    hsa/radeon: fixing a bug to support 32b processes
> > >    hsa/radeon: Fix number of pipes per ME
> > >    hsa/radeon: Removing hw pointer store module
> > >    hsa/radeon: Adding some error messages
> > >    hsa/radeon: Fixing minor issues with kernel queues (DIQ)
> > >    drm/radeon: Add register access functions to kfd2kgd interface
> > >    hsa/radeon: Eliminating all direct register accesses
> > >    drm/radeon: Remove lock functions from kfd2kgd interface
> > >  Evgeny Pinchuk (9):
> > >    hsa/radeon: fix the OEMID assignment in kfd_topology
> > >    drm/radeon: extending kfd-kgd interface
> > >    hsa/radeon: implementing IOCTL for clock counters
> > >    drm/radeon: adding synchronization for GRBM GFX
> > >    hsa/radeon: fixing clock counters bug
> > >    drm/radeon: Extending kfd interface
> > >    hsa/radeon: Adding max clock speeds to topology
> > >    hsa/radeon: Alternating the source of max clock
> > >    hsa/radeon: Exclusive access for perf. counters
> > >  Michael Varga (1):
> > >    hsa/radeon: debugging print statements
> > >  Oded Gabbay (45):
> > >    mm: Add kfd_process pointer to mm_struct
> > >    drm/radeon: reduce number of free VMIDs and pipes in KV
> > >    drm/radeon: Report doorbell configuration to kfd
> > >    drm/radeon: Add radeon <--> kfd interface
> > >    drm/radeon: Add kfd-->kgd interface to get virtual ram size
> > >    drm/radeon: Add kfd-->kgd interfaces of memory 
> > > allocation/mapping
> > >    drm/radeon: Add kfd-->kgd interface of locking srbm_gfx_cntl 
> > > register
> > >    drm/radeon: Add calls to initialize and finalize kfd from radeon
> > >    hsa/radeon: Add code base of hsa driver for AMD's GPUs
> > >    hsa/radeon: Add initialization and unmapping of doorbell 
> > > aperture
> > >    hsa/radeon: Add scheduler code
> > >    hsa/radeon: Add kfd mmap handler
> > >    hsa/radeon: Add 2 new IOCTL to kfd, CREATE_QUEUE and 
> > > DESTROY_QUEUE
> > >    hsa/radeon: Update MAINTAINERS and CREDITS files
> > >    hsa/radeon: Add interrupt handling module
> > >    hsa/radeon: Add the isr function of the KFD scehduler
> > >    hsa/radeon: Handle deactivation of queues using interrupts
> > >    hsa/radeon: Enable interrupts in KFD scheduler
> > >    hsa/radeon: Enable/Disable KFD interrupt module
> > >    hsa/radeon: Add interrupt callback function to kgd2kfd interface
> > >    hsa/radeon: Add kgd-->kfd interfaces for suspend and resume
> > >    drm/radeon: Add calls to suspend and resume of kfd driver
> > >    drm/radeon/cik: Don't touch int of pipes 1-7
> > >    drm/radeon/cik: Call kfd isr function
> > >    hsa/radeon: Fix memory size allocated for HPD
> > >    hsa/radeon: Fix list of supported devices
> > >    hsa/radeon: Fix coding style in cik_int.h
> > >    hsa/radeon: Print ioctl commnad only in debug mode
> > >    hsa/radeon: Print ISR info only in debug mode
> > >    hsa/radeon: Workaround for a bug in amd_iommu
> > >    hsa/radeon: Eliminate warnings in compilation
> > >    hsa/radeon: Various kernel styling fixes
> > >    hsa/radeon: Rearrange structures in kfd_ioctl.h
> > >    hsa/radeon: change another pr_info to pr_debug
> > >    hsa/radeon: Fix timeout calculation in sync_with_hw
> > >    hsa/radeon: Update module information and version
> > >    hsa/radeon: Update module version to 0.6.0
> > >    hsa/radeon: Fix initialization of sh_mem registers
> > >    hsa/radeon: Fix compilation warnings
> > >    hsa/radeon: Remove old scheduler code
> > >    hsa/radeon: Static analysis (smatch) fixes
> > >    hsa/radeon: Check oversubscription before destroying runlist
> > >    hsa/radeon: Don't verify cksum when parsing CRAT table
> > >    hsa/radeon: Update module version to 0.6.1
> > >    hsa/radeon: Update module version to 0.6.2
> > >  Yair Shachar (1):
> > >    hsa/radeon: Adding qcm fence return status
> > >   CREDITS                                            |    7 +
> > >   MAINTAINERS                                        |    8 +
> > >   drivers/Kconfig                                    |    2 +
> > >   drivers/gpu/Makefile                               |    1 +
> > >   drivers/gpu/drm/radeon/Makefile                    |    1 +
> > >   drivers/gpu/drm/radeon/cik.c                       |  156 +--
> > >   drivers/gpu/drm/radeon/cikd.h                      |   51 +-
> > >   drivers/gpu/drm/radeon/radeon.h                    |    9 +
> > >   drivers/gpu/drm/radeon/radeon_device.c             |   32 +
> > >   drivers/gpu/drm/radeon/radeon_drv.c                |    6 +
> > >   drivers/gpu/drm/radeon/radeon_kfd.c                |  630 
> > > ++++++++++
> > >   drivers/gpu/drm/radeon/radeon_kms.c                |    9 +
> > >   drivers/gpu/hsa/Kconfig                            |   20 +
> > >   drivers/gpu/hsa/Makefile                           |    1 +
> > >   drivers/gpu/hsa/radeon/Makefile                    |   12 +
> > >   drivers/gpu/hsa/radeon/cik_int.h                   |   50 +
> > >   drivers/gpu/hsa/radeon/cik_mqds.h                  |  250 ++++
> > >   drivers/gpu/hsa/radeon/cik_regs.h                  |  220 ++++
> > >   drivers/gpu/hsa/radeon/kfd_aperture.c              |  123 ++
> > >   drivers/gpu/hsa/radeon/kfd_chardev.c               |  530 
> > > +++++++++
> > >   drivers/gpu/hsa/radeon/kfd_crat.h                  |  294 +++++
> > >   drivers/gpu/hsa/radeon/kfd_device.c                |  244 ++++
> > >   drivers/gpu/hsa/radeon/kfd_device_queue_manager.c  |  981 
> > > ++++++++++++++++
> > >   drivers/gpu/hsa/radeon/kfd_device_queue_manager.h  |  101 ++
> > >   drivers/gpu/hsa/radeon/kfd_doorbell.c              |  242 ++++
> > >   drivers/gpu/hsa/radeon/kfd_interrupt.c             |  177 +++
> > >   drivers/gpu/hsa/radeon/kfd_kernel_queue.c          |  305 +++++
> > >   drivers/gpu/hsa/radeon/kfd_kernel_queue.h          |   66 ++
> > >   drivers/gpu/hsa/radeon/kfd_module.c                |  130 +++
> > >   drivers/gpu/hsa/radeon/kfd_mqd_manager.c           |  290 +++++
> > >   drivers/gpu/hsa/radeon/kfd_mqd_manager.h           |   54 +
> > >   drivers/gpu/hsa/radeon/kfd_packet_manager.c        |  488 
> > > ++++++++
> > >   drivers/gpu/hsa/radeon/kfd_pasid.c                 |   97 ++
> > >   drivers/gpu/hsa/radeon/kfd_pm4_headers.h           |  682 
> > > +++++++++++
> > >   drivers/gpu/hsa/radeon/kfd_pm4_opcodes.h           |  107 ++
> > >   drivers/gpu/hsa/radeon/kfd_priv.h                  |  475 
> > > ++++++++
> > >   drivers/gpu/hsa/radeon/kfd_process.c               |  391 +++++++
> > >   drivers/gpu/hsa/radeon/kfd_process_queue_manager.c |  343 ++++++
> > >   drivers/gpu/hsa/radeon/kfd_queue.c                 |  109 ++
> > >   drivers/gpu/hsa/radeon/kfd_scheduler.h             |   72 ++
> > >   drivers/gpu/hsa/radeon/kfd_topology.c              | 1207 
> > > ++++++++++++++++++++
> > >   drivers/gpu/hsa/radeon/kfd_topology.h              |  168 +++
> > >   drivers/gpu/hsa/radeon/kfd_vidmem.c                |   97 ++
> > >   include/linux/mm_types.h                           |   14 +
> > >   include/linux/radeon_kfd.h                         |  106 ++
> > >   include/uapi/linux/kfd_ioctl.h                     |  133 +++
> > >   mm/rmap.c                                          |    8 +-
> > >   47 files changed, 9402 insertions(+), 97 deletions(-)
> > >   create mode 100644 drivers/gpu/drm/radeon/radeon_kfd.c
> > >   create mode 100644 drivers/gpu/hsa/Kconfig
> > >   create mode 100644 drivers/gpu/hsa/Makefile
> > >   create mode 100644 drivers/gpu/hsa/radeon/Makefile
> > >   create mode 100644 drivers/gpu/hsa/radeon/cik_int.h
> > >   create mode 100644 drivers/gpu/hsa/radeon/cik_mqds.h
> > >   create mode 100644 drivers/gpu/hsa/radeon/cik_regs.h
> > >   create mode 100644 drivers/gpu/hsa/radeon/kfd_aperture.c
> > >   create mode 100644 drivers/gpu/hsa/radeon/kfd_chardev.c
> > >   create mode 100644 drivers/gpu/hsa/radeon/kfd_crat.h
> > >   create mode 100644 drivers/gpu/hsa/radeon/kfd_device.c
> > >   create mode 100644 
> > > drivers/gpu/hsa/radeon/kfd_device_queue_manager.c
> > >   create mode 100644 
> > > drivers/gpu/hsa/radeon/kfd_device_queue_manager.h
> > >   create mode 100644 drivers/gpu/hsa/radeon/kfd_doorbell.c
> > >   create mode 100644 drivers/gpu/hsa/radeon/kfd_interrupt.c
> > >   create mode 100644 drivers/gpu/hsa/radeon/kfd_kernel_queue.c
> > >   create mode 100644 drivers/gpu/hsa/radeon/kfd_kernel_queue.h
> > >   create mode 100644 drivers/gpu/hsa/radeon/kfd_module.c
> > >   create mode 100644 drivers/gpu/hsa/radeon/kfd_mqd_manager.c
> > >   create mode 100644 drivers/gpu/hsa/radeon/kfd_mqd_manager.h
> > >   create mode 100644 drivers/gpu/hsa/radeon/kfd_packet_manager.c
> > >   create mode 100644 drivers/gpu/hsa/radeon/kfd_pasid.c
> > >   create mode 100644 drivers/gpu/hsa/radeon/kfd_pm4_headers.h
> > >   create mode 100644 drivers/gpu/hsa/radeon/kfd_pm4_opcodes.h
> > >   create mode 100644 drivers/gpu/hsa/radeon/kfd_priv.h
> > >   create mode 100644 drivers/gpu/hsa/radeon/kfd_process.c
> > >   create mode 100644 
> > > drivers/gpu/hsa/radeon/kfd_process_queue_manager.c
> > >   create mode 100644 drivers/gpu/hsa/radeon/kfd_queue.c
> > >   create mode 100644 drivers/gpu/hsa/radeon/kfd_scheduler.h
> > >   create mode 100644 drivers/gpu/hsa/radeon/kfd_topology.c
> > >   create mode 100644 drivers/gpu/hsa/radeon/kfd_topology.h
> > >   create mode 100644 drivers/gpu/hsa/radeon/kfd_vidmem.c
> > >   create mode 100644 include/linux/radeon_kfd.h
> > >   create mode 100644 include/uapi/linux/kfd_ioctl.h
> > >  --
> > >  1.9.1

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 00/83] AMD HSA kernel driver
@ 2014-07-11 21:18       ` Jerome Glisse
  0 siblings, 0 replies; 51+ messages in thread
From: Jerome Glisse @ 2014-07-11 21:18 UTC (permalink / raw)
  To: Gabbay, Oded
  Cc: oded.gabbay, Lewycky, Andrew, linux-kernel, dri-devel, Deucher,
	Alexander, akpm

On Thu, Jul 10, 2014 at 10:51:29PM +0000, Gabbay, Oded wrote:
> On Thu, 2014-07-10 at 18:24 -0400, Jerome Glisse wrote:
> > On Fri, Jul 11, 2014 at 12:45:27AM +0300, Oded Gabbay wrote:
> > >  This patch set implements a Heterogeneous System Architecture 
> > > (HSA) driver
> > >  for radeon-family GPUs.
> >  
> > This is just quick comments on few things. Given size of this, people
> > will need to have time to review things.
> >  
> > >  HSA allows different processor types (CPUs, DSPs, GPUs, etc..) to 
> > > share
> > >  system resources more effectively via HW features including 
> > > shared pageable
> > >  memory, userspace-accessible work queues, and platform-level 
> > > atomics. In
> > >  addition to the memory protection mechanisms in GPUVM and 
> > > IOMMUv2, the Sea
> > >  Islands family of GPUs also performs HW-level validation of 
> > > commands passed
> > >  in through the queues (aka rings).
> > >  The code in this patch set is intended to serve both as a sample 
> > > driver for
> > >  other HSA-compatible hardware devices and as a production driver 
> > > for
> > >  radeon-family processors. The code is architected to support 
> > > multiple CPUs
> > >  each with connected GPUs, although the current implementation 
> > > focuses on a
> > >  single Kaveri/Berlin APU, and works alongside the existing radeon 
> > > kernel
> > >  graphics driver (kgd).
> > >  AMD GPUs designed for use with HSA (Sea Islands and up) share 
> > > some hardware
> > >  functionality between HSA compute and regular gfx/compute (memory,
> > >  interrupts, registers), while other functionality has been added
> > >  specifically for HSA compute  (hw scheduler for virtualized 
> > > compute rings).
> > >  All shared hardware is owned by the radeon graphics driver, and 
> > > an interface
> > >  between kfd and kgd allows the kfd to make use of those shared 
> > > resources,
> > >  while HSA-specific functionality is managed directly by kfd by 
> > > submitting
> > >  packets into an HSA-specific command queue (the "HIQ").
> > >  During kfd module initialization a char device node (/dev/kfd) is 
> > > created
> > >  (surviving until module exit), with ioctls for queue creation & 
> > > management,
> > >  and data structures are initialized for managing HSA device 
> > > topology.
> > >  The rest of the initialization is driven by calls from the radeon 
> > > kgd at
> > >  the following points :
> > >  - radeon_init (kfd_init)
> > >  - radeon_exit (kfd_fini)
> > >  - radeon_driver_load_kms (kfd_device_probe, kfd_device_init)
> > >  - radeon_driver_unload_kms (kfd_device_fini)
> > >  During the probe and init processing per-device data structures 
> > > are
> > >  established which connect to the associated graphics kernel 
> > > driver. This
> > >  information is exposed to userspace via sysfs, along with a 
> > > version number
> > >  allowing userspace to determine if a topology change has occurred 
> > > while it
> > >  was reading from sysfs.
> > >  The interface between kfd and kgd also allows the kfd to request 
> > > buffer
> > >  management services from kgd, and allows kgd to route interrupt 
> > > requests to
> > >  kfd code since the interrupt block is shared between regular
> > >  graphics/compute and HSA compute subsystems in the GPU.
> > >  The kfd code works with an open source usermode library 
> > > ("libhsakmt") which
> > >  is in the final stages of IP review and should be published in a 
> > > separate
> > >  repo over the next few days.
> > >  The code operates in one of three modes, selectable via the 
> > > sched_policy
> > >  module parameter :
> > >  - sched_policy=0 uses a hardware scheduler running in the MEC 
> > > block within
> > >  CP, and allows oversubscription (more queues than HW slots)
> > >  - sched_policy=1 also uses HW scheduling but does not allow
> > >  oversubscription, so create_queue requests fail when we run out 
> > > of HW slots
> > >  - sched_policy=2 does not use HW scheduling, so the driver 
> > > manually assigns
> > >  queues to HW slots by programming registers
> > >  The "no HW scheduling" option is for debug & new hardware bringup 
> > > only, so
> > >  has less test coverage than the other options. Default in the 
> > > current code
> > >  is "HW scheduling without oversubscription" since that is where 
> > > we have the
> > >  most test coverage but we expect to change the default to "HW 
> > > scheduling
> > >  with oversubscription" after further testing. This effectively 
> > > removes the
> > >  HW limit on the number of work queues available to applications.
> > >  Programs running on the GPU are associated with an address space 
> > > through the
> > >  VMID field, which is translated to a unique PASID at access time 
> > > via a set
> > >  of 16 VMID-to-PASID mapping registers. The available VMIDs 
> > > (currently 16)
> > >  are partitioned (under control of the radeon kgd) between current
> > >  gfx/compute and HSA compute, with each getting 8 in the current 
> > > code. The
> > >  VMID-to-PASID mapping registers are updated by the HW scheduler 
> > > when used,
> > >  and by driver code if HW scheduling is not being used.
> > >  The Sea Islands compute queues use a new "doorbell" mechanism 
> > > instead of the
> > >  earlier kernel-managed write pointer registers. Doorbells use a 
> > > separate BAR
> > >  dedicated for this purpose, and pages within the doorbell 
> > > aperture are
> > >  mapped to userspace (each page mapped to only one user address 
> > > space).
> > >  Writes to the doorbell aperture are intercepted by GPU hardware, 
> > > allowing
> > >  userspace code to safely manage work queues (rings) without 
> > > requiring a
> > >  kernel call for every ring update.
> > >  First step for an application process is to open the kfd device. 
> > > Calls to
> > >  open create a kfd "process" structure only for the first thread 
> > > of the
> > >  process. Subsequent open calls are checked to see if they are 
> > > from processes
> > >  using the same mm_struct and, if so, don't do anything. The kfd 
> > > per-process
> > >  data lives as long as the mm_struct exists. Each mm_struct is 
> > > associated
> > >  with a unique PASID, allowing the IOMMUv2 to make userspace 
> > > process memory
> > >  accessible to the GPU.
> > >  Next step is for the application to collect topology information 
> > > via sysfs.
> > >  This gives userspace enough information to be able to identify 
> > > specific
> > >  nodes (processors) in subsequent queue management calls. 
> > > Application
> > >  processes can create queues on multiple processors, and 
> > > processors support
> > >  queues from multiple processes.
> >  
> > I am not a fan of using sysfs to discover topology.
> >  
> > > At this point the application can create work queues in userspace
> > > memory and pass them through the usermode library to kfd to have them
> > > mapped onto HW queue slots so that commands written to the queues can
> > > be executed by the GPU. Queue operations specify a processor node,
> > > and so the bulk of this code is device-specific.
> > > Written by John Bridgman <John.Bridgman@amd.com>
> >  
> > So the general comment is that you need to collapse many patches.
> > Things like patch 58, which fixes kernel coding style, should be
> > squashed into the earlier patches that introduced the broken style.
> >  
> > Even worse are things like patch 71, which removes code you added only
> > a few patches earlier in the patchset.
> Not quite, the code was added in patch 11.
> 
> > This means that if I start reviewing in patch order, I might spend
> > time on code that is later deleted/modified/fixed, i.e. the time I
> > spend reading and understanding some code might simply be wasted.
> The quick answer is that you are of course right but, having said that,
> I think that squashing would not be simple at all. I squashed what I
> could, and I can probably do a little more (like 58), but the major
> problem is that one of the main modules of the driver - the scheduler
> (QCM) - was completely re-written (patches 46-53). Therefore, from patch
> 1 to 53 we use the old scheduler code, and from 54 we use the new QCM
> (the old scheduler code was completely removed in 71). So I could maybe
> squash 71 into 54, but that won't help you much, I think.
> 
> So, the current advice I can give is to completely ignore the 
> following files because they do not exist in the final commit:
> - kfd_sched_cik_static.c (no longer used as of patch 54)
> - kfd_registers.c (no longer used as of patch 81, because we moved all
> register writes to radeon)
> - kfd_hw_pointer_store.c (alive from patch 46 to 67)
> - kfd_hw_pointer_store.h (alive from patch 46 to 67)
> 
>         Oded 

Ok, I'll try, but this is driving me crazy: I am juggling between the
fully applied patchset and the individual patches, going back and forth
trying to find which patch is the one I want to comment on. Not to
mention that people would then need to ack a bad patch just because they
know a later good patch fixes the badness.

This patchset can be reduced to fewer than 10 big patches. I would first
do the core patches that touch core things like the iommu and are
preparatory to the patchset, then the kfd patches that do not touch
anything in radeon but implement the skeleton and core infrastructure,
and then the radeon-specific patches.

This leads me to the design: all the core kfd code should live in the
hsa directory, and it should be helpers, while all the radeon-specific
bits should most likely be part of the radeon gfx kernel module. I am
not against a radeon_kfd but I really do not see the point. There should
be an hsa.ko just as there is a drm.ko.
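
Roughly this kind of split, say (all the names below are invented for
illustration, nothing like this exists in the patchset today):

    /* hsa.ko: owns /dev/hsa, the common ioctls and the topology code. */
    struct hsa_device_ops {
            /* hooks a hardware driver has to provide */
            int (*create_queue)(struct hsa_device *hdev,
                                struct hsa_queue_properties *props);
            int (*destroy_queue)(struct hsa_device *hdev, unsigned int qid);
    };

    struct hsa_device {
            const struct hsa_device_ops *ops;
            struct device *parent;          /* PCI device of the GPU */
    };

    /* called by radeon.ko (or any other GPU driver) at probe time */
    int hsa_device_register(struct hsa_device *hdev);
    void hsa_device_unregister(struct hsa_device *hdev);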

Also, I think it would be beneficial to rename kfd to hsa, because that
is the most common name and the name that will bring up valid hits in a
web search.

Each patch should have a proper commit message that is neither too
cryptic nor too chatty.
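
Something like this, say (a made-up example, only to show the level of
detail I mean):

    hsa/radeon: Add doorbell aperture initialization

    Map the doorbell BAR during device init and reserve the first page
    for kernel queues. This has to happen before any userspace queue is
    created, because each process is later given its own doorbell page.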

Cheers,
Jérôme

> > I will come back with individual patch comments and also general 
> > remarks.
> >  
> > Cheers,
> > Jérôme
> >  
> >  
> > >  Alexey Skidanov (4):
> > >    hsa/radeon: 32-bit processes support
> > >    hsa/radeon: NULL pointer dereference bug workaround
> > >    hsa/radeon: HSA64/HSA32 modes support
> > >    hsa/radeon: Add local memory to topology
> > >  Andrew Lewycky (3):
> > >    hsa/radeon: Make binding of process to device permanent
> > >    hsa/radeon: Implement hsaKmtSetMemoryPolicy
> > >    mm: Change timing of notification to IOMMUs about a page to be
> > >      invalidated
> > >  Ben Goz (20):
> > >    hsa/radeon: Add queue and hw_pointer_store modules
> > >    hsa/radeon: Add support allocating kernel doorbells
> > >    hsa/radeon: Add mqd_manager module
> > >    hsa/radeon: Add kernel queue support for KFD
> > >    hsa/radeon: Add module parameter of scheduling policy
> > >    hsa/radeon: Add packet manager module
> > >    hsa/radeon: Add process queue manager module
> > >    hsa/radeon: Add device queue manager module
> > >    hsa/radeon: Switch to new queue scheduler
> > >    hsa/radeon: Add IOCTL for update queue
> > >    hsa/radeon: Queue Management integration with Memory Management
> > >    hsa/radeon: update queue fault handling
> > >    hsa/radeon: fixing a bug to support 32b processes
> > >    hsa/radeon: Fix number of pipes per ME
> > >    hsa/radeon: Removing hw pointer store module
> > >    hsa/radeon: Adding some error messages
> > >    hsa/radeon: Fixing minor issues with kernel queues (DIQ)
> > >    drm/radeon: Add register access functions to kfd2kgd interface
> > >    hsa/radeon: Eliminating all direct register accesses
> > >    drm/radeon: Remove lock functions from kfd2kgd interface
> > >  Evgeny Pinchuk (9):
> > >    hsa/radeon: fix the OEMID assignment in kfd_topology
> > >    drm/radeon: extending kfd-kgd interface
> > >    hsa/radeon: implementing IOCTL for clock counters
> > >    drm/radeon: adding synchronization for GRBM GFX
> > >    hsa/radeon: fixing clock counters bug
> > >    drm/radeon: Extending kfd interface
> > >    hsa/radeon: Adding max clock speeds to topology
> > >    hsa/radeon: Alternating the source of max clock
> > >    hsa/radeon: Exclusive access for perf. counters
> > >  Michael Varga (1):
> > >    hsa/radeon: debugging print statements
> > >  Oded Gabbay (45):
> > >    mm: Add kfd_process pointer to mm_struct
> > >    drm/radeon: reduce number of free VMIDs and pipes in KV
> > >    drm/radeon: Report doorbell configuration to kfd
> > >    drm/radeon: Add radeon <--> kfd interface
> > >    drm/radeon: Add kfd-->kgd interface to get virtual ram size
> > >    drm/radeon: Add kfd-->kgd interfaces of memory allocation/mapping
> > >    drm/radeon: Add kfd-->kgd interface of locking srbm_gfx_cntl register
> > >    drm/radeon: Add calls to initialize and finalize kfd from radeon
> > >    hsa/radeon: Add code base of hsa driver for AMD's GPUs
> > >    hsa/radeon: Add initialization and unmapping of doorbell aperture
> > >    hsa/radeon: Add scheduler code
> > >    hsa/radeon: Add kfd mmap handler
> > >    hsa/radeon: Add 2 new IOCTL to kfd, CREATE_QUEUE and DESTROY_QUEUE
> > >    hsa/radeon: Update MAINTAINERS and CREDITS files
> > >    hsa/radeon: Add interrupt handling module
> > >    hsa/radeon: Add the isr function of the KFD scehduler
> > >    hsa/radeon: Handle deactivation of queues using interrupts
> > >    hsa/radeon: Enable interrupts in KFD scheduler
> > >    hsa/radeon: Enable/Disable KFD interrupt module
> > >    hsa/radeon: Add interrupt callback function to kgd2kfd interface
> > >    hsa/radeon: Add kgd-->kfd interfaces for suspend and resume
> > >    drm/radeon: Add calls to suspend and resume of kfd driver
> > >    drm/radeon/cik: Don't touch int of pipes 1-7
> > >    drm/radeon/cik: Call kfd isr function
> > >    hsa/radeon: Fix memory size allocated for HPD
> > >    hsa/radeon: Fix list of supported devices
> > >    hsa/radeon: Fix coding style in cik_int.h
> > >    hsa/radeon: Print ioctl commnad only in debug mode
> > >    hsa/radeon: Print ISR info only in debug mode
> > >    hsa/radeon: Workaround for a bug in amd_iommu
> > >    hsa/radeon: Eliminate warnings in compilation
> > >    hsa/radeon: Various kernel styling fixes
> > >    hsa/radeon: Rearrange structures in kfd_ioctl.h
> > >    hsa/radeon: change another pr_info to pr_debug
> > >    hsa/radeon: Fix timeout calculation in sync_with_hw
> > >    hsa/radeon: Update module information and version
> > >    hsa/radeon: Update module version to 0.6.0
> > >    hsa/radeon: Fix initialization of sh_mem registers
> > >    hsa/radeon: Fix compilation warnings
> > >    hsa/radeon: Remove old scheduler code
> > >    hsa/radeon: Static analysis (smatch) fixes
> > >    hsa/radeon: Check oversubscription before destroying runlist
> > >    hsa/radeon: Don't verify cksum when parsing CRAT table
> > >    hsa/radeon: Update module version to 0.6.1
> > >    hsa/radeon: Update module version to 0.6.2
> > >  Yair Shachar (1):
> > >    hsa/radeon: Adding qcm fence return status
> > >   CREDITS                                            |    7 +
> > >   MAINTAINERS                                        |    8 +
> > >   drivers/Kconfig                                    |    2 +
> > >   drivers/gpu/Makefile                               |    1 +
> > >   drivers/gpu/drm/radeon/Makefile                    |    1 +
> > >   drivers/gpu/drm/radeon/cik.c                       |  156 +--
> > >   drivers/gpu/drm/radeon/cikd.h                      |   51 +-
> > >   drivers/gpu/drm/radeon/radeon.h                    |    9 +
> > >   drivers/gpu/drm/radeon/radeon_device.c             |   32 +
> > >   drivers/gpu/drm/radeon/radeon_drv.c                |    6 +
> > >   drivers/gpu/drm/radeon/radeon_kfd.c                |  630 ++++++++++
> > >   drivers/gpu/drm/radeon/radeon_kms.c                |    9 +
> > >   drivers/gpu/hsa/Kconfig                            |   20 +
> > >   drivers/gpu/hsa/Makefile                           |    1 +
> > >   drivers/gpu/hsa/radeon/Makefile                    |   12 +
> > >   drivers/gpu/hsa/radeon/cik_int.h                   |   50 +
> > >   drivers/gpu/hsa/radeon/cik_mqds.h                  |  250 ++++
> > >   drivers/gpu/hsa/radeon/cik_regs.h                  |  220 ++++
> > >   drivers/gpu/hsa/radeon/kfd_aperture.c              |  123 ++
> > >   drivers/gpu/hsa/radeon/kfd_chardev.c               |  530 +++++++++
> > >   drivers/gpu/hsa/radeon/kfd_crat.h                  |  294 +++++
> > >   drivers/gpu/hsa/radeon/kfd_device.c                |  244 ++++
> > >   drivers/gpu/hsa/radeon/kfd_device_queue_manager.c  |  981 ++++++++++++++++
> > >   drivers/gpu/hsa/radeon/kfd_device_queue_manager.h  |  101 ++
> > >   drivers/gpu/hsa/radeon/kfd_doorbell.c              |  242 ++++
> > >   drivers/gpu/hsa/radeon/kfd_interrupt.c             |  177 +++
> > >   drivers/gpu/hsa/radeon/kfd_kernel_queue.c          |  305 +++++
> > >   drivers/gpu/hsa/radeon/kfd_kernel_queue.h          |   66 ++
> > >   drivers/gpu/hsa/radeon/kfd_module.c                |  130 +++
> > >   drivers/gpu/hsa/radeon/kfd_mqd_manager.c           |  290 +++++
> > >   drivers/gpu/hsa/radeon/kfd_mqd_manager.h           |   54 +
> > >   drivers/gpu/hsa/radeon/kfd_packet_manager.c        |  488 ++++++++
> > >   drivers/gpu/hsa/radeon/kfd_pasid.c                 |   97 ++
> > >   drivers/gpu/hsa/radeon/kfd_pm4_headers.h           |  682 +++++++++++
> > >   drivers/gpu/hsa/radeon/kfd_pm4_opcodes.h           |  107 ++
> > >   drivers/gpu/hsa/radeon/kfd_priv.h                  |  475 ++++++++
> > >   drivers/gpu/hsa/radeon/kfd_process.c               |  391 +++++++
> > >   drivers/gpu/hsa/radeon/kfd_process_queue_manager.c |  343 ++++++
> > >   drivers/gpu/hsa/radeon/kfd_queue.c                 |  109 ++
> > >   drivers/gpu/hsa/radeon/kfd_scheduler.h             |   72 ++
> > >   drivers/gpu/hsa/radeon/kfd_topology.c              | 1207 ++++++++++++++++++++
> > >   drivers/gpu/hsa/radeon/kfd_topology.h              |  168 +++
> > >   drivers/gpu/hsa/radeon/kfd_vidmem.c                |   97 ++
> > >   include/linux/mm_types.h                           |   14 +
> > >   include/linux/radeon_kfd.h                         |  106 ++
> > >   include/uapi/linux/kfd_ioctl.h                     |  133 +++
> > >   mm/rmap.c                                          |    8 +-
> > >   47 files changed, 9402 insertions(+), 97 deletions(-)
> > >   create mode 100644 drivers/gpu/drm/radeon/radeon_kfd.c
> > >   create mode 100644 drivers/gpu/hsa/Kconfig
> > >   create mode 100644 drivers/gpu/hsa/Makefile
> > >   create mode 100644 drivers/gpu/hsa/radeon/Makefile
> > >   create mode 100644 drivers/gpu/hsa/radeon/cik_int.h
> > >   create mode 100644 drivers/gpu/hsa/radeon/cik_mqds.h
> > >   create mode 100644 drivers/gpu/hsa/radeon/cik_regs.h
> > >   create mode 100644 drivers/gpu/hsa/radeon/kfd_aperture.c
> > >   create mode 100644 drivers/gpu/hsa/radeon/kfd_chardev.c
> > >   create mode 100644 drivers/gpu/hsa/radeon/kfd_crat.h
> > >   create mode 100644 drivers/gpu/hsa/radeon/kfd_device.c
> > >   create mode 100644 drivers/gpu/hsa/radeon/kfd_device_queue_manager.c
> > >   create mode 100644 drivers/gpu/hsa/radeon/kfd_device_queue_manager.h
> > >   create mode 100644 drivers/gpu/hsa/radeon/kfd_doorbell.c
> > >   create mode 100644 drivers/gpu/hsa/radeon/kfd_interrupt.c
> > >   create mode 100644 drivers/gpu/hsa/radeon/kfd_kernel_queue.c
> > >   create mode 100644 drivers/gpu/hsa/radeon/kfd_kernel_queue.h
> > >   create mode 100644 drivers/gpu/hsa/radeon/kfd_module.c
> > >   create mode 100644 drivers/gpu/hsa/radeon/kfd_mqd_manager.c
> > >   create mode 100644 drivers/gpu/hsa/radeon/kfd_mqd_manager.h
> > >   create mode 100644 drivers/gpu/hsa/radeon/kfd_packet_manager.c
> > >   create mode 100644 drivers/gpu/hsa/radeon/kfd_pasid.c
> > >   create mode 100644 drivers/gpu/hsa/radeon/kfd_pm4_headers.h
> > >   create mode 100644 drivers/gpu/hsa/radeon/kfd_pm4_opcodes.h
> > >   create mode 100644 drivers/gpu/hsa/radeon/kfd_priv.h
> > >   create mode 100644 drivers/gpu/hsa/radeon/kfd_process.c
> > >   create mode 100644 drivers/gpu/hsa/radeon/kfd_process_queue_manager.c
> > >   create mode 100644 drivers/gpu/hsa/radeon/kfd_queue.c
> > >   create mode 100644 drivers/gpu/hsa/radeon/kfd_scheduler.h
> > >   create mode 100644 drivers/gpu/hsa/radeon/kfd_topology.c
> > >   create mode 100644 drivers/gpu/hsa/radeon/kfd_topology.h
> > >   create mode 100644 drivers/gpu/hsa/radeon/kfd_vidmem.c
> > >   create mode 100644 include/linux/radeon_kfd.h
> > >   create mode 100644 include/uapi/linux/kfd_ioctl.h
> > >  --
> > >  1.9.1


* Re: [PATCH 00/83] AMD HSA kernel driver
  2014-07-11 21:18       ` Jerome Glisse
@ 2014-07-12  9:24       ` Christian König
  2014-07-12 11:10           ` Daniel Vetter
  -1 siblings, 1 reply; 51+ messages in thread
From: Christian König @ 2014-07-12  9:24 UTC (permalink / raw)
  To: Jerome Glisse, Gabbay, Oded
  Cc: oded.gabbay, Lewycky, Andrew, linux-kernel, dri-devel, Deucher,
	Alexander, akpm

Am 11.07.2014 23:18, schrieb Jerome Glisse:
> [...]
>
> This leads me to the design: all the core kfd code should live in the
> hsa directory, and it should be helpers, while all the radeon-specific
> bits should most likely be part of the radeon gfx kernel module. I am
> not against a radeon_kfd but I really do not see the point. There should
> be an hsa.ko just as there is a drm.ko.

Yeah totally agree with Jerome here.

Maybe I can explain a bit further what the difference in the design is.
First of all, for other good examples, see not only the DRM
infrastructure but also V4L and other Linux driver subsystems.

In those subsystems it's not the helper functions that control the
driver, but instead the driver that controls the helper functions. E.g.
if HSA wants to be a new subsystem with a standardized IOCTL interface,
it should provide functions that assist drivers with implementing that
interface, instead of implementing it itself and then trying to talk to
the drivers for the necessary resources/operations.
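
To make the direction of control concrete (invented names, only a
sketch):

    /* Core-controls-driver, as in the posted kfd: the core implements
     * the ioctl and reaches into the driver for resources. */
    static long kfd_ioctl_create_queue(struct kfd_dev *dev, void __user *args)
    {
            return dev->kfd2kgd->create_queue(dev->kgd, args);
    }

    /* Driver-controls-helpers, DRM/V4L style: the driver implements
     * the ioctl and calls optional helpers from the core. */
    static long radeon_hsa_create_queue(struct radeon_device *rdev,
                                        void __user *args)
    {
            return hsa_helper_create_queue(&rdev->hsa_dev, args);
    }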

Coming back to the structure of the patches: one of the very first
patches should be the definition of the IOCTL interface a driver needs
to implement to be HSA-conformant.
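
I.e. start with something of this shape (a sketch only; the actual
layout is exactly what should get reviewed first):

    /* include/uapi/linux/hsa_ioctl.h -- illustrative, not the layout of
     * the patchset's kfd_ioctl.h */
    #include <linux/ioctl.h>
    #include <linux/types.h>

    struct hsa_create_queue_args {
            __u64 ring_base_address;        /* in: userspace ring buffer */
            __u32 ring_size;                /* in: ring size in bytes */
            __u32 gpu_id;                   /* in: topology node to use */
            __u32 queue_id;                 /* out: assigned queue */
    };

    #define HSA_IOC_MAGIC           'H'
    #define HSA_IOC_CREATE_QUEUE \
            _IOWR(HSA_IOC_MAGIC, 0x01, struct hsa_create_queue_args)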

I think it would be a good idea to split out changes to core structures
like the IOMMU into their own separately submitted patches, with only a
note that this functionality is needed by the following "core HSA" and
"HSA for radeon" patchsets.

Regards,
Christian.


* Re: [PATCH 00/83] AMD HSA kernel driver
  2014-07-12  9:24       ` Christian König
@ 2014-07-12 11:10           ` Daniel Vetter
  0 siblings, 0 replies; 51+ messages in thread
From: Daniel Vetter @ 2014-07-12 11:10 UTC (permalink / raw)
  To: Christian König
  Cc: Jerome Glisse, Gabbay, Oded, oded.gabbay, Lewycky, Andrew,
	linux-kernel, dri-devel, Deucher, Alexander, akpm, Jesse Barnes

On Sat, Jul 12, 2014 at 11:24:49AM +0200, Christian König wrote:
> Am 11.07.2014 23:18, schrieb Jerome Glisse:
> > [...]
> >
> >This leads me to the design: all the core kfd code should live in the
> >hsa directory, and it should be helpers, while all the radeon-specific
> >bits should most likely be part of the radeon gfx kernel module. I am
> >not against a radeon_kfd but I really do not see the point. There should
> >be an hsa.ko just as there is a drm.ko.
> 
> Yeah totally agree with Jerome here.
> 
> Maybe I can explain a bit further what the difference in the design is.
> First of all, for other good examples, see not only the DRM
> infrastructure but also V4L and other Linux driver subsystems.
> 
> In those subsystems it's not the helper functions that control the
> driver, but instead the driver that controls the helper functions. E.g.
> if HSA wants to be a new subsystem with a standardized IOCTL interface,
> it should provide functions that assist drivers with implementing that
> interface, instead of implementing it itself and then trying to talk to
> the drivers for the necessary resources/operations.
> 
> Coming back to the structure of the patches: one of the very first
> patches should be the definition of the IOCTL interface a driver needs
> to implement to be HSA-conformant.
> 
> I think it would be a good idea to split out changes to core structures
> like the IOMMU into their own separately submitted patches, with only a
> note that this functionality is needed by the following "core HSA" and
> "HSA for radeon" patchsets.

Hm, so the hsa part is a completely new driver/subsystem, not just an
additional ioctl tacked onto radeon? The history of drm is littered with
"generic" ioctls that turned out to be useful for exactly one driver,
which is why _all_ command submission is now done with driver-private
ioctls.
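
That pattern looks like this in today's uapi headers (simplified from
include/uapi/drm/radeon_drm.h; the exact numbers and structs are per
driver):

    #define DRM_RADEON_CS           0x26
    #define DRM_IOCTL_RADEON_CS \
            DRM_IOW(DRM_COMMAND_BASE + DRM_RADEON_CS, struct drm_radeon_cs)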

I'd be quite a bit surprised if that suddenly worked differently, so
before we bless a generic hsa interface I really want to see an
implementation from a different vendor (e.g. nvidia or intel) using the
same ioctls. Otherwise we just repeat history, and I'm not terribly
inclined to keep on cleaning up cruft forever - one drm legacy is
enough ;-)

Jesse is the guy from our side to talk to about this.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 00/83] AMD HSA kernel driver
@ 2014-07-12 11:10           ` Daniel Vetter
  0 siblings, 0 replies; 51+ messages in thread
From: Daniel Vetter @ 2014-07-12 11:10 UTC (permalink / raw)
  To: Christian König
  Cc: oded.gabbay, Lewycky, Andrew, linux-kernel, dri-devel, Deucher,
	Alexander, Jesse Barnes, akpm

On Sat, Jul 12, 2014 at 11:24:49AM +0200, Christian König wrote:
> Am 11.07.2014 23:18, schrieb Jerome Glisse:
> >On Thu, Jul 10, 2014 at 10:51:29PM +0000, Gabbay, Oded wrote:
> >>On Thu, 2014-07-10 at 18:24 -0400, Jerome Glisse wrote:
> >>>On Fri, Jul 11, 2014 at 12:45:27AM +0300, Oded Gabbay wrote:
> >>>>  This patch set implements a Heterogeneous System Architecture
> >>>>(HSA) driver
> >>>>  for radeon-family GPUs.
> >>>This is just quick comments on few things. Given size of this, people
> >>>will need to have time to review things.
> >>>>  HSA allows different processor types (CPUs, DSPs, GPUs, etc..) to
> >>>>share
> >>>>  system resources more effectively via HW features including
> >>>>shared pageable
> >>>>  memory, userspace-accessible work queues, and platform-level
> >>>>atomics. In
> >>>>  addition to the memory protection mechanisms in GPUVM and
> >>>>IOMMUv2, the Sea
> >>>>  Islands family of GPUs also performs HW-level validation of
> >>>>commands passed
> >>>>  in through the queues (aka rings).
> >>>>  The code in this patch set is intended to serve both as a sample
> >>>>driver for
> >>>>  other HSA-compatible hardware devices and as a production driver
> >>>>for
> >>>>  radeon-family processors. The code is architected to support
> >>>>multiple CPUs
> >>>>  each with connected GPUs, although the current implementation
> >>>>focuses on a
> >>>>  single Kaveri/Berlin APU, and works alongside the existing radeon
> >>>>kernel
> >>>>  graphics driver (kgd).
> >>>>  AMD GPUs designed for use with HSA (Sea Islands and up) share
> >>>>some hardware
> >>>>  functionality between HSA compute and regular gfx/compute (memory,
> >>>>  interrupts, registers), while other functionality has been added
> >>>>  specifically for HSA compute  (hw scheduler for virtualized
> >>>>compute rings).
> >>>>  All shared hardware is owned by the radeon graphics driver, and
> >>>>an interface
> >>>>  between kfd and kgd allows the kfd to make use of those shared
> >>>>resources,
> >>>>  while HSA-specific functionality is managed directly by kfd by
> >>>>submitting
> >>>>  packets into an HSA-specific command queue (the "HIQ").
> >>>>  During kfd module initialization a char device node (/dev/kfd) is
> >>>>created
> >>>>  (surviving until module exit), with ioctls for queue creation &
> >>>>management,
> >>>>  and data structures are initialized for managing HSA device
> >>>>topology.
> >>>>  The rest of the initialization is driven by calls from the radeon
> >>>>kgd at
> >>>>  the following points :
> >>>>  - radeon_init (kfd_init)
> >>>>  - radeon_exit (kfd_fini)
> >>>>  - radeon_driver_load_kms (kfd_device_probe, kfd_device_init)
> >>>>  - radeon_driver_unload_kms (kfd_device_fini)
> >>>>  During the probe and init processing per-device data structures
> >>>>are
> >>>>  established which connect to the associated graphics kernel
> >>>>driver. This
> >>>>  information is exposed to userspace via sysfs, along with a
> >>>>version number
> >>>>  allowing userspace to determine if a topology change has occurred
> >>>>while it
> >>>>  was reading from sysfs.
> >>>>  The interface between kfd and kgd also allows the kfd to request
> >>>>buffer
> >>>>  management services from kgd, and allows kgd to route interrupt
> >>>>requests to
> >>>>  kfd code since the interrupt block is shared between regular
> >>>>  graphics/compute and HSA compute subsystems in the GPU.
> >>>>  The kfd code works with an open source usermode library
> >>>>("libhsakmt") which
> >>>>  is in the final stages of IP review and should be published in a
> >>>>separate
> >>>>  repo over the next few days.
> >>>>  The code operates in one of three modes, selectable via the
> >>>>sched_policy
> >>>>  module parameter :
> >>>>  - sched_policy=0 uses a hardware scheduler running in the MEC
> >>>>block within
> >>>>  CP, and allows oversubscription (more queues than HW slots)
> >>>>  - sched_policy=1 also uses HW scheduling but does not allow
> >>>>  oversubscription, so create_queue requests fail when we run out
> >>>>of HW slots
> >>>>  - sched_policy=2 does not use HW scheduling, so the driver
> >>>>manually assigns
> >>>>  queues to HW slots by programming registers
> >>>>  The "no HW scheduling" option is for debug & new hardware bringup
> >>>>only, so
> >>>>  has less test coverage than the other options. Default in the
> >>>>current code
> >>>>  is "HW scheduling without oversubscription" since that is where
> >>>>we have the
> >>>>  most test coverage but we expect to change the default to "HW
> >>>>scheduling
> >>>>  with oversubscription" after further testing. This effectively
> >>>>removes the
> >>>>  HW limit on the number of work queues available to applications.
> >>>>  Programs running on the GPU are associated with an address space
> >>>>through the
> >>>>  VMID field, which is translated to a unique PASID at access time
> >>>>via a set
> >>>>  of 16 VMID-to-PASID mapping registers. The available VMIDs
> >>>>(currently 16)
> >>>>  are partitioned (under control of the radeon kgd) between current
> >>>>  gfx/compute and HSA compute, with each getting 8 in the current
> >>>>code. The
> >>>>  VMID-to-PASID mapping registers are updated by the HW scheduler
> >>>>when used,
> >>>>  and by driver code if HW scheduling is not being used.
> >>>>  The Sea Islands compute queues use a new "doorbell" mechanism
> >>>>instead of the
> >>>>  earlier kernel-managed write pointer registers. Doorbells use a
> >>>>separate BAR
> >>>>  dedicated for this purpose, and pages within the doorbell
> >>>>aperture are
> >>>>  mapped to userspace (each page mapped to only one user address
> >>>>space).
> >>>>  Writes to the doorbell aperture are intercepted by GPU hardware,
> >>>>allowing
> >>>>  userspace code to safely manage work queues (rings) without
> >>>>requiring a
> >>>>  kernel call for every ring update.
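The doorbell mapping boils down to handing a process one uncached MMIO page of the doorbell BAR. A sketch of what such an mmap path typically looks like (the function and parameter names are assumptions, not the patchset's):

    #include <linux/mm.h>

    /* 'doorbell_addr' would be the bus address of the doorbell page already
     * reserved for this process within the doorbell BAR. */
    static int kfd_doorbell_mmap(struct vm_area_struct *vma,
                                 phys_addr_t doorbell_addr)
    {
            if (vma->vm_end - vma->vm_start != PAGE_SIZE)
                    return -EINVAL;

            /* MMIO: uncached, never copied on fork, never dumped. */
            vma->vm_flags |= VM_IO | VM_DONTCOPY | VM_DONTEXPAND | VM_DONTDUMP;
            vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);

            return io_remap_pfn_range(vma, vma->vm_start,
                                      doorbell_addr >> PAGE_SHIFT,
                                      PAGE_SIZE, vma->vm_page_prot);
    }

Because the hardware intercepts the write and knows which queue the page belongs to, a plain 32-bit store to this mapping is all a ring update costs userspace.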
> >>>>  First step for an application process is to open the kfd device.
> >>>>Calls to
> >>>>  open create a kfd "process" structure only for the first thread
> >>>>of the
> >>>>  process. Subsequent open calls are checked to see if they are
> >>>>from processes
> >>>>  using the same mm_struct and, if so, don't do anything. The kfd
> >>>>per-process
> >>>>  data lives as long as the mm_struct exists. Each mm_struct is
> >>>>associated
> >>>>  with a unique PASID, allowing the IOMMUv2 to make userspace
> >>>>process memory
> >>>>  accessible to the GPU.
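A sketch of that open() logic, keyed on current->mm (the lookup and creation helpers here are hypothetical):

    #include <linux/err.h>
    #include <linux/fs.h>
    #include <linux/mutex.h>
    #include <linux/sched.h>

    static DEFINE_MUTEX(kfd_processes_mutex);

    static int kfd_open(struct inode *inode, struct file *filp)
    {
            struct kfd_process *p;

            mutex_lock(&kfd_processes_mutex);

            /* Opens from threads sharing the same mm_struct reuse the
             * existing per-process structure; only the first open creates
             * it (and binds a fresh PASID to this mm for the IOMMUv2). */
            p = find_process_by_mm(current->mm);    /* hypothetical helper */
            if (!p)
                    p = create_process(current);    /* hypothetical helper */

            mutex_unlock(&kfd_processes_mutex);

            return IS_ERR(p) ? PTR_ERR(p) : 0;
    }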
> >>>>  Next step is for the application to collect topology information
> >>>>via sysfs.
> >>>>  This gives userspace enough information to be able to identify
> >>>>specific
> >>>>  nodes (processors) in subsequent queue management calls.
> >>>>Application
> >>>>  processes can create queues on multiple processors, and
> >>>>processors support
> >>>>  queues from multiple processes.
> >>>I am not a fan of using sysfs to discover topology.
> >>>>  At this point the application can create work queues in userspace
> >>>>memory and
> >>>>  pass them through the usermode library to kfd to have them mapped
> >>>>onto HW
> >>>>  queue slots so that commands written to the queues can be
> >>>>executed by the
> >>>>  GPU. Queue operations specify a processor node, and so the bulk
> >>>>of this code
> >>>>  is device-specific.
> >>>>  Written by John Bridgman <John.Bridgman@amd.com>
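To make the queue-creation flow concrete: what libhsakmt ultimately does is an ioctl on /dev/kfd roughly along these lines. The ioctl number and argument struct below are illustrative only; the real ABI lives in include/uapi/linux/kfd_ioctl.h from the patchset.

    #include <fcntl.h>
    #include <stdint.h>
    #include <sys/ioctl.h>

    /* Illustrative stand-in for the real kfd_ioctl.h definitions. */
    struct create_queue_args {
            uint64_t ring_base_address;     /* in: userspace ring buffer */
            uint32_t ring_size;             /* in: bytes */
            uint32_t gpu_id;                /* in: node from sysfs topology */
            uint32_t queue_id;              /* out: assigned by kfd */
            uint64_t doorbell_offset;       /* out: mmap offset of doorbell */
    };
    #define DEMO_IOC_CREATE_QUEUE _IOWR('K', 1, struct create_queue_args)

    int demo_create_queue(int kfd_fd, struct create_queue_args *args)
    {
            /* kfd_fd comes from open("/dev/kfd", O_RDWR) and stays open
             * for the life of the process. */
            return ioctl(kfd_fd, DEMO_IOC_CREATE_QUEUE, args);
    }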
> >>>A general comment is that you need to collapse many patches; things
> >>>like 58, fixing kernel style, should be collapsed, i.e. fix all the
> >>>previous patches that have broken style.
> >>>Even worse are things like 71, removing code you just added a few
> >>>patches earlier in the patchset.
> >>Not quite, the code was added on patch 11.
> >>
> >>>This means that if I start reviewing in patch order, I might spend
> >>>time on code that is later deleted/modified/fixed, i.e. the time I
> >>>spend understanding and reading some code might just be wasted.
> >>The quick answer is that you are of course right, but having said
> >>that, I think it would not be simple at all to do that squashing.
> >>I squashed what I could, and probably I can do a little more (like
> >>58), but the major problem is that one of the main modules of the
> >>driver - the scheduler (QCM) - was completely re-written (patches
> >>46-53). Therefore, from patch 1 to 53 we use the old scheduler code,
> >>and from 54 we use the new QCM (the old scheduler code was completely
> >>removed at 71). So I could maybe squash 71 into 54, but that won't
> >>help you much, I think.
> >>
> >>So, the current advice I can give is to completely ignore the
> >>following files because they do not exist in the final commit:
> >>- kfd_sched_cik_static.c (stopped using in 54)
> >>- kfd_registers.c (stopped using in 81 because we moved all register
> >>writes to radeon)
> >>- kfd_hw_pointer_store.c (alive from 46 to 67)
> >>- kfd_hw_pointer_store.h (alive from 46 to 67)
> >>
> >>         Oded
> >Ok, I'll try, but this is driving me crazy; I am juggling between the
> >fully applied patchset and individual patches, going back and forth
> >trying to find which patch is the one I want to comment on. Not even
> >mentioning that, after that, people would need to ack bad patches just
> >because they know a later good patch fixes the badness.
> >
> >This patchset can be reduced to fewer than 10 big patches. I would
> >first do core patches that touch core things like the iommu and are
> >preparatory to the patchset, then kfd patches that do not touch
> >anything in radeon but implement the skeleton and core infrastructure,
> >and then the radeon-specific patches.
> >
> >This leads me to the design: all the core kfd helpers should be in an
> >hsa directory, and should be helpers, while all the radeon-specific
> >bits should most likely be part of the radeon gfx kernel module. I am
> >not against a radeon_kfd, but really I do not see the point. There
> >should be a hsa.ko like there is a drm.ko.
> 
> Yeah totally agree with Jerome here.
> 
> Maybe I can explain a bit further what the difference in the design is.
> First of all, for other good examples, see not only the DRM
> infrastructure but also V4L and other Linux driver subsystems.
> 
> In those subsystems it's not the helper functions that control the driver,
> but instead the driver that controls the helper functions. E.g. if HSA
> wants to be a new subsystem with a standardized IOCTL interface, it should
> provide functions that assist drivers with implementing that interface,
> instead of implementing the interface itself and then trying to talk to
> the drivers for the necessary resources/operations.
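In code terms, the distinction is roughly this: the driver keeps ownership of its ioctl entry point and calls into shared helpers, rather than a core module owning the device node and calling back into the driver. All names below are invented for illustration:

    /* Helper style (DRM/V4L model): radeon owns the ioctl entry point and
     * delegates the common HSA ABI to shared helper functions. */
    static long radeon_hsa_ioctl(struct file *filp, unsigned int cmd,
                                 unsigned long arg)
    {
            switch (cmd) {
            case HSA_IOC_CREATE_QUEUE:      /* hypothetical shared ABI */
                    return hsa_helper_create_queue(filp, arg);
            case HSA_IOC_DESTROY_QUEUE:
                    return hsa_helper_destroy_queue(filp, arg);
            default:
                    return -ENOTTY; /* driver-private cmds handled here */
            }
    }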
> 
> Coming back to the structure of the patches, one of the very first
> patches should be the IOCTL definition a driver needs to implement to
> be HSA conformant.
> 
> I think it would be a good idea to split out changes to core structures
> like the IOMMU into their own separately submitted patches, with a note
> that this functionality is needed by the following "core HSA" and "HSA
> for radeon" patchsets.

Hm, so the hsa part is a completely new driver/subsystem, not just an
additional ioctl tacked onto radeon? The history of drm is littered with
"generic" ioctls that turned out to be useful for exactly one driver,
which is why _all_ command submission is now done with driver-private
ioctls.

I'd be quite a bit surprised if that suddenly works differently, so before
we bless a generic hsa interface I really want to see some implementation
from a different vendor (e.g. Nvidia or Intel) using the same ioctls.
Otherwise we just repeat history, and I'm not terribly inclined to keep
cleaning up cruft forever - one drm legacy is enough ;-)

Jesse is the guy from our side to talk to about this.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 00/83] AMD HSA kernel driver
  2014-07-12 11:10           ` Daniel Vetter
@ 2014-07-12 16:49           ` Jerome Glisse
  2014-07-13  9:42               ` Daniel Vetter
  -1 siblings, 1 reply; 51+ messages in thread
From: Jerome Glisse @ 2014-07-12 16:49 UTC (permalink / raw)
  To: Christian König, Gabbay, Oded, oded.gabbay, Lewycky, Andrew,
	linux-kernel, dri-devel, Deucher, Alexander, akpm, Jesse Barnes

On Sat, Jul 12, 2014 at 01:10:32PM +0200, Daniel Vetter wrote:
> On Sat, Jul 12, 2014 at 11:24:49AM +0200, Christian König wrote:
> > On 11.07.2014 at 23:18, Jerome Glisse wrote:
> > >On Thu, Jul 10, 2014 at 10:51:29PM +0000, Gabbay, Oded wrote:
> > >>On Thu, 2014-07-10 at 18:24 -0400, Jerome Glisse wrote:
> > >>>On Fri, Jul 11, 2014 at 12:45:27AM +0300, Oded Gabbay wrote:
> > >>>>  This patch set implements a Heterogeneous System Architecture
> > >>>>(HSA) driver
> > >>>>  for radeon-family GPUs.
> > >>>These are just quick comments on a few things. Given the size of
> > >>>this, people will need time to review things.
> > >>>>  HSA allows different processor types (CPUs, DSPs, GPUs, etc..) to
> > >>>>share
> > >>>>  system resources more effectively via HW features including
> > >>>>shared pageable
> > >>>>  memory, userspace-accessible work queues, and platform-level
> > >>>>atomics. In
> > >>>>  addition to the memory protection mechanisms in GPUVM and
> > >>>>IOMMUv2, the Sea
> > >>>>  Islands family of GPUs also performs HW-level validation of
> > >>>>commands passed
> > >>>>  in through the queues (aka rings).
> > >>>>  The code in this patch set is intended to serve both as a sample
> > >>>>driver for
> > >>>>  other HSA-compatible hardware devices and as a production driver
> > >>>>for
> > >>>>  radeon-family processors. The code is architected to support
> > >>>>multiple CPUs
> > >>>>  each with connected GPUs, although the current implementation
> > >>>>focuses on a
> > >>>>  single Kaveri/Berlin APU, and works alongside the existing radeon
> > >>>>kernel
> > >>>>  graphics driver (kgd).
> > >>>>  AMD GPUs designed for use with HSA (Sea Islands and up) share
> > >>>>some hardware
> > >>>>  functionality between HSA compute and regular gfx/compute (memory,
> > >>>>  interrupts, registers), while other functionality has been added
> > >>>>  specifically for HSA compute  (hw scheduler for virtualized
> > >>>>compute rings).
> > >>>>  All shared hardware is owned by the radeon graphics driver, and
> > >>>>an interface
> > >>>>  between kfd and kgd allows the kfd to make use of those shared
> > >>>>resources,
> > >>>>  while HSA-specific functionality is managed directly by kfd by
> > >>>>submitting
> > >>>>  packets into an HSA-specific command queue (the "HIQ").
> > >>>>  During kfd module initialization a char device node (/dev/kfd) is
> > >>>>created
> > >>>>  (surviving until module exit), with ioctls for queue creation &
> > >>>>management,
> > >>>>  and data structures are initialized for managing HSA device
> > >>>>topology.
> > >>>>  The rest of the initialization is driven by calls from the radeon
> > >>>>kgd at
> > >>>>  the following points :
> > >>>>  - radeon_init (kfd_init)
> > >>>>  - radeon_exit (kfd_fini)
> > >>>>  - radeon_driver_load_kms (kfd_device_probe, kfd_device_init)
> > >>>>  - radeon_driver_unload_kms (kfd_device_fini)
> > >>>>  During the probe and init processing per-device data structures
> > >>>>are
> > >>>>  established which connect to the associated graphics kernel
> > >>>>driver. This
> > >>>>  information is exposed to userspace via sysfs, along with a
> > >>>>version number
> > >>>>  allowing userspace to determine if a topology change has occurred
> > >>>>while it
> > >>>>  was reading from sysfs.
> > >>>>  The interface between kfd and kgd also allows the kfd to request
> > >>>>buffer
> > >>>>  management services from kgd, and allows kgd to route interrupt
> > >>>>requests to
> > >>>>  kfd code since the interrupt block is shared between regular
> > >>>>  graphics/compute and HSA compute subsystems in the GPU.
> > >>>>  The kfd code works with an open source usermode library
> > >>>>("libhsakmt") which
> > >>>>  is in the final stages of IP review and should be published in a
> > >>>>separate
> > >>>>  repo over the next few days.
> > >>>>  The code operates in one of three modes, selectable via the
> > >>>>sched_policy
> > >>>>  module parameter :
> > >>>>  - sched_policy=0 uses a hardware scheduler running in the MEC
> > >>>>block within
> > >>>>  CP, and allows oversubscription (more queues than HW slots)
> > >>>>  - sched_policy=1 also uses HW scheduling but does not allow
> > >>>>  oversubscription, so create_queue requests fail when we run out
> > >>>>of HW slots
> > >>>>  - sched_policy=2 does not use HW scheduling, so the driver
> > >>>>manually assigns
> > >>>>  queues to HW slots by programming registers
> > >>>>  The "no HW scheduling" option is for debug & new hardware bringup
> > >>>>only, so
> > >>>>  has less test coverage than the other options. Default in the
> > >>>>current code
> > >>>>  is "HW scheduling without oversubscription" since that is where
> > >>>>we have the
> > >>>>  most test coverage but we expect to change the default to "HW
> > >>>>scheduling
> > >>>>  with oversubscription" after further testing. This effectively
> > >>>>removes the
> > >>>>  HW limit on the number of work queues available to applications.
> > >>>>  Programs running on the GPU are associated with an address space
> > >>>>through the
> > >>>>  VMID field, which is translated to a unique PASID at access time
> > >>>>via a set
> > >>>>  of 16 VMID-to-PASID mapping registers. The available VMIDs
> > >>>>(currently 16)
> > >>>>  are partitioned (under control of the radeon kgd) between current
> > >>>>  gfx/compute and HSA compute, with each getting 8 in the current
> > >>>>code. The
> > >>>>  VMID-to-PASID mapping registers are updated by the HW scheduler
> > >>>>when used,
> > >>>>  and by driver code if HW scheduling is not being used.
> > >>>>  The Sea Islands compute queues use a new "doorbell" mechanism
> > >>>>instead of the
> > >>>>  earlier kernel-managed write pointer registers. Doorbells use a
> > >>>>separate BAR
> > >>>>  dedicated for this purpose, and pages within the doorbell
> > >>>>aperture are
> > >>>>  mapped to userspace (each page mapped to only one user address
> > >>>>space).
> > >>>>  Writes to the doorbell aperture are intercepted by GPU hardware,
> > >>>>allowing
> > >>>>  userspace code to safely manage work queues (rings) without
> > >>>>requiring a
> > >>>>  kernel call for every ring update.
> > >>>>  First step for an application process is to open the kfd device.
> > >>>>Calls to
> > >>>>  open create a kfd "process" structure only for the first thread
> > >>>>of the
> > >>>>  process. Subsequent open calls are checked to see if they are
> > >>>>from processes
> > >>>>  using the same mm_struct and, if so, don't do anything. The kfd
> > >>>>per-process
> > >>>>  data lives as long as the mm_struct exists. Each mm_struct is
> > >>>>associated
> > >>>>  with a unique PASID, allowing the IOMMUv2 to make userspace
> > >>>>process memory
> > >>>>  accessible to the GPU.
> > >>>>  Next step is for the application to collect topology information
> > >>>>via sysfs.
> > >>>>  This gives userspace enough information to be able to identify
> > >>>>specific
> > >>>>  nodes (processors) in subsequent queue management calls.
> > >>>>Application
> > >>>>  processes can create queues on multiple processors, and
> > >>>>processors support
> > >>>>  queues from multiple processes.
> > >>>I am not a fan of using sysfs to discover topology.
> > >>>>  At this point the application can create work queues in userspace
> > >>>>memory and
> > >>>>  pass them through the usermode library to kfd to have them mapped
> > >>>>onto HW
> > >>>>  queue slots so that commands written to the queues can be
> > >>>>executed by the
> > >>>>  GPU. Queue operations specify a processor node, and so the bulk
> > >>>>of this code
> > >>>>  is device-specific.
> > >>>>  Written by John Bridgman <John.Bridgman@amd.com>
> > >>>A general comment is that you need to collapse many patches; things
> > >>>like 58, fixing kernel style, should be collapsed, i.e. fix all the
> > >>>previous patches that have broken style.
> > >>>Even worse are things like 71, removing code you just added a few
> > >>>patches earlier in the patchset.
> > >>Not quite, the code was added on patch 11.
> > >>
> > >>>This means that if I start reviewing in patch order, I might spend
> > >>>time on code that is later deleted/modified/fixed, i.e. the time I
> > >>>spend understanding and reading some code might just be wasted.
> > >>The quick answer is that you are of course right, but having said
> > >>that, I think it would not be simple at all to do that squashing.
> > >>I squashed what I could, and probably I can do a little more (like
> > >>58), but the major problem is that one of the main modules of the
> > >>driver - the scheduler (QCM) - was completely re-written (patches
> > >>46-53). Therefore, from patch 1 to 53 we use the old scheduler code,
> > >>and from 54 we use the new QCM (the old scheduler code was completely
> > >>removed at 71). So I could maybe squash 71 into 54, but that won't
> > >>help you much, I think.
> > >>
> > >>So, the current advice I can give is to completely ignore the
> > >>following files because they do not exist in the final commit:
> > >>- kfd_sched_cik_static.c (stopped using in 54)
> > >>- kfd_registers.c (stopped using in 81 because we moved all register
> > >>writes to radeon)
> > >>- kfd_hw_pointer_store.c (alive from 46 to 67)
> > >>- kfd_hw_pointer_store.h (alive from 46 to 67)
> > >>
> > >>         Oded
> > >Ok, I'll try, but this is driving me crazy; I am juggling between the
> > >fully applied patchset and individual patches, going back and forth
> > >trying to find which patch is the one I want to comment on. Not even
> > >mentioning that, after that, people would need to ack bad patches just
> > >because they know a later good patch fixes the badness.
> > >
> > >This patchset can be reduced to fewer than 10 big patches. I would
> > >first do core patches that touch core things like the iommu and are
> > >preparatory to the patchset, then kfd patches that do not touch
> > >anything in radeon but implement the skeleton and core infrastructure,
> > >and then the radeon-specific patches.
> > >
> > >This leads me to the design: all the core kfd helpers should be in an
> > >hsa directory, and should be helpers, while all the radeon-specific
> > >bits should most likely be part of the radeon gfx kernel module. I am
> > >not against a radeon_kfd, but really I do not see the point. There
> > >should be a hsa.ko like there is a drm.ko.
> > 
> > Yeah totally agree with Jerome here.
> > 
> > Maybe I can explain a bit further what the difference in the design is.
> > First of all, for other good examples, see not only the DRM
> > infrastructure but also V4L and other Linux driver subsystems.
> > 
> > In those subsystems it's not the helper functions that control the driver,
> > but instead the driver that controls the helper functions. E.g. if HSA
> > wants to be a new subsystem with a standardized IOCTL interface, it
> > should provide functions that assist drivers with implementing that
> > interface, instead of implementing the interface itself and then trying
> > to talk to the drivers for the necessary resources/operations.
> > 
> > Coming back to the structure of the patches, one of the very first
> > patches should be the IOCTL definition a driver needs to implement to
> > be HSA conformant.
> > 
> > I think it would be a good idea to split out changes to core structures
> > like the IOMMU into their own separately submitted patches, with a note
> > that this functionality is needed by the following "core HSA" and "HSA
> > for radeon" patchsets.
> 
> Hm, so the hsa part is a completely new driver/subsystem, not just an
> additional ioctl tacked onto radeon? The history of drm is littered with
> "generic" ioctls that turned out to be useful for exactly one driver,
> which is why _all_ command submission is now done with driver-private
> ioctls.
> 
> I'd be quite a bit surprised if that suddenly works differently, so before
> we bless a generic hsa interface I really want to see some implementation
> from a different vendor (e.g. Nvidia or Intel) using the same ioctls.
> Otherwise we just repeat history, and I'm not terribly inclined to keep
> cleaning up cruft forever - one drm legacy is enough ;-)
> 
> Jesse is the guy from our side to talk to about this.
> -Daniel

I am not worried about that side; the HSA foundation has pretty strict
guidelines on what HSA-compliant hardware is, i.e. the hw needs to
understand the PM4 packet format of radeon (well, a small subset of it).
But of course this requires HSA-compliant hardware, and from the member
list I am guessing ARM Mali, ImgTech, Qualcomm, ... so unless Intel and
NVidia join HSA you will not see it for their hardware.

So yes, for once the same ioctls would apply to different hardware. The
only thing that is different is the shader ISA. The hsafoundation site has
some PDFs explaining all that, but someone thought that slideshare would
be a good idea; personally I would not register with any website just to
get the PDFs.

So to sum up, I am ok with having a new device file that presents a
uniform set of ioctls. It would actually be a lot easier for userspace:
just open this fixed device file and ask for the list of compliant
hardware.

Then the radeon kernel driver would register itself as a provider, so all
the ioctl decoding/marshalling would be shared, which makes sense.
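A minimal sketch of the provider model being proposed here (every name is hypothetical; nothing like this exists in the posted patchset):

    #include <linux/types.h>

    struct hsa_queue_args;      /* uniform ioctl payload, owned by hsa.ko */

    /* Per-device callbacks a provider (e.g. radeon) would implement. */
    struct hsa_device_ops {
            int (*create_queue)(void *drvdata, struct hsa_queue_args *args);
            int (*destroy_queue)(void *drvdata, u32 queue_id);
    };

    /* hsa.ko owns the fixed device file, decodes/marshals the uniform
     * ioctls, enumerates compliant devices, and dispatches into the
     * registered provider's callbacks. */
    int hsa_register_device(const struct hsa_device_ops *ops, void *drvdata);
    void hsa_unregister_device(void *drvdata);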

Cheers,
Jérôme Glisse

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 00/83] AMD HSA kernel driver
  2014-07-11 21:18       ` Jerome Glisse
@ 2014-07-12 21:55         ` Gabbay, Oded
  -1 siblings, 0 replies; 51+ messages in thread
From: Gabbay, Oded @ 2014-07-12 21:55 UTC (permalink / raw)
  To: j.glisse
  Cc: linux-kernel, Bridgman, John, Deucher, Alexander, Lewycky,
	Andrew, joro, akpm, dri-devel, airlied, oded.gabbay


On Fri, 2014-07-11 at 17:18 -0400, Jerome Glisse wrote:
> On Thu, Jul 10, 2014 at 10:51:29PM +0000, Gabbay, Oded wrote:
> >  On Thu, 2014-07-10 at 18:24 -0400, Jerome Glisse wrote:
> > >  On Fri, Jul 11, 2014 at 12:45:27AM +0300, Oded Gabbay wrote:
> > > >   This patch set implements a Heterogeneous System Architecture
> > > >  (HSA) driver
> > > >   for radeon-family GPUs.
> > >  These are just quick comments on a few things. Given the size of
> > >  this, people will need time to review things.
> > > >   HSA allows different processor types (CPUs, DSPs, GPUs, 
> > > > etc..) to
> > > >  share
> > > >   system resources more effectively via HW features including
> > > >  shared pageable
> > > >   memory, userspace-accessible work queues, and platform-level
> > > >  atomics. In
> > > >   addition to the memory protection mechanisms in GPUVM and
> > > >  IOMMUv2, the Sea
> > > >   Islands family of GPUs also performs HW-level validation of
> > > >  commands passed
> > > >   in through the queues (aka rings).
> > > >   The code in this patch set is intended to serve both as a 
> > > > sample
> > > >  driver for
> > > >   other HSA-compatible hardware devices and as a production 
> > > > driver
> > > >  for
> > > >   radeon-family processors. The code is architected to support
> > > >  multiple CPUs
> > > >   each with connected GPUs, although the current implementation
> > > >  focuses on a
> > > >   single Kaveri/Berlin APU, and works alongside the existing 
> > > > radeon
> > > >  kernel
> > > >   graphics driver (kgd).
> > > >   AMD GPUs designed for use with HSA (Sea Islands and up) share
> > > >  some hardware
> > > >   functionality between HSA compute and regular gfx/compute 
> > > > (memory,
> > > >   interrupts, registers), while other functionality has been 
> > > > added
> > > >   specifically for HSA compute  (hw scheduler for virtualized
> > > >  compute rings).
> > > >   All shared hardware is owned by the radeon graphics driver, 
> > > > and
> > > >  an interface
> > > >   between kfd and kgd allows the kfd to make use of those 
> > > > shared
> > > >  resources,
> > > >   while HSA-specific functionality is managed directly by kfd 
> > > > by
> > > >  submitting
> > > >   packets into an HSA-specific command queue (the "HIQ").
> > > >   During kfd module initialization a char device node 
> > > > (/dev/kfd) is
> > > >  created
> > > >   (surviving until module exit), with ioctls for queue 
> > > > creation &
> > > >  management,
> > > >   and data structures are initialized for managing HSA device
> > > >  topology.
> > > >   The rest of the initialization is driven by calls from the 
> > > > radeon
> > > >  kgd at
> > > >   the following points :
> > > >   - radeon_init (kfd_init)
> > > >   - radeon_exit (kfd_fini)
> > > >   - radeon_driver_load_kms (kfd_device_probe, kfd_device_init)
> > > >   - radeon_driver_unload_kms (kfd_device_fini)
> > > >   During the probe and init processing per-device data 
> > > > structures
> > > >  are
> > > >   established which connect to the associated graphics kernel
> > > >  driver. This
> > > >   information is exposed to userspace via sysfs, along with a
> > > >  version number
> > > >   allowing userspace to determine if a topology change has 
> > > > occurred
> > > >  while it
> > > >   was reading from sysfs.
> > > >   The interface between kfd and kgd also allows the kfd to 
> > > > request
> > > >  buffer
> > > >   management services from kgd, and allows kgd to route 
> > > > interrupt
> > > >  requests to
> > > >   kfd code since the interrupt block is shared between regular
> > > >   graphics/compute and HSA compute subsystems in the GPU.
> > > >   The kfd code works with an open source usermode library
> > > >  ("libhsakmt") which
> > > >   is in the final stages of IP review and should be published 
> > > > in a
> > > >  separate
> > > >   repo over the next few days.
> > > >   The code operates in one of three modes, selectable via the
> > > >  sched_policy
> > > >   module parameter :
> > > >   - sched_policy=0 uses a hardware scheduler running in the MEC
> > > >  block within
> > > >   CP, and allows oversubscription (more queues than HW slots)
> > > >   - sched_policy=1 also uses HW scheduling but does not allow
> > > >   oversubscription, so create_queue requests fail when we run 
> > > > out
> > > >  of HW slots
> > > >   - sched_policy=2 does not use HW scheduling, so the driver
> > > >  manually assigns
> > > >   queues to HW slots by programming registers
> > > >   The "no HW scheduling" option is for debug & new hardware 
> > > > bringup
> > > >  only, so
> > > >   has less test coverage than the other options. Default in the
> > > >  current code
> > > >   is "HW scheduling without oversubscription" since that is 
> > > > where
> > > >  we have the
> > > >   most test coverage but we expect to change the default to "HW
> > > >  scheduling
> > > >   with oversubscription" after further testing. This 
> > > > effectively
> > > >  removes the
> > > >   HW limit on the number of work queues available to 
> > > > applications.
> > > >   Programs running on the GPU are associated with an address 
> > > > space
> > > >  through the
> > > >   VMID field, which is translated to a unique PASID at access 
> > > > time
> > > >  via a set
> > > >   of 16 VMID-to-PASID mapping registers. The available VMIDs
> > > >  (currently 16)
> > > >   are partitioned (under control of the radeon kgd) between 
> > > > current
> > > >   gfx/compute and HSA compute, with each getting 8 in the 
> > > > current
> > > >  code. The
> > > >   VMID-to-PASID mapping registers are updated by the HW 
> > > > scheduler
> > > >  when used,
> > > >   and by driver code if HW scheduling is not being used.
> > > >   The Sea Islands compute queues use a new "doorbell" mechanism
> > > >  instead of the
> > > >   earlier kernel-managed write pointer registers. Doorbells 
> > > > use a
> > > >  separate BAR
> > > >   dedicated for this purpose, and pages within the doorbell
> > > >  aperture are
> > > >   mapped to userspace (each page mapped to only one user 
> > > > address
> > > >  space).
> > > >   Writes to the doorbell aperture are intercepted by GPU 
> > > > hardware,
> > > >  allowing
> > > >   userspace code to safely manage work queues (rings) without
> > > >  requiring a
> > > >   kernel call for every ring update.
> > > >   First step for an application process is to open the kfd 
> > > > device.
> > > >  Calls to
> > > >   open create a kfd "process" structure only for the first 
> > > > thread
> > > >  of the
> > > >   process. Subsequent open calls are checked to see if they are
> > > >  from processes
> > > >   using the same mm_struct and, if so, don't do anything. The 
> > > > kfd
> > > >  per-process
> > > >   data lives as long as the mm_struct exists. Each mm_struct is
> > > >  associated
> > > >   with a unique PASID, allowing the IOMMUv2 to make userspace
> > > >  process memory
> > > >   accessible to the GPU.
> > > >   Next step is for the application to collect topology 
> > > > information
> > > >  via sysfs.
> > > >   This gives userspace enough information to be able to 
> > > > identify
> > > >  specific
> > > >   nodes (processors) in subsequent queue management calls.
> > > >  Application
> > > >   processes can create queues on multiple processors, and
> > > >  processors support
> > > >   queues from multiple processes.
> > >  I am not a fan of using sysfs to discover topology.
> > > >   At this point the application can create work queues in 
> > > > userspace
> > > >  memory and
> > > >   pass them through the usermode library to kfd to have them 
> > > > mapped
> > > >  onto HW
> > > >   queue slots so that commands written to the queues can be
> > > >  executed by the
> > > >   GPU. Queue operations specify a processor node, and so the 
> > > > bulk
> > > >  of this code
> > > >   is device-specific.
> > > >   Written by John Bridgman <John.Bridgman@amd.com>
> > >  A general comment is that you need to collapse many patches; things
> > >  like 58, fixing kernel style, should be collapsed, i.e. fix all the
> > >  previous patches that have broken style.
> > >  Even worse are things like 71, removing code you just added a few
> > >  patches earlier in the patchset.
> >  Not quite, the code was added on patch 11.
> > >  This means that if I start reviewing in patch order, I might spend
> > >  time on code that is later deleted/modified/fixed, i.e. the time I
> > >  spend understanding and reading some code might just be wasted.
> >  The quick answer is that you are of course right, but having said
> >  that, I think it would not be simple at all to do that squashing.
> >  I squashed what I could, and probably I can do a little more (like
> >  58), but the major problem is that one of the main modules of the
> >  driver - the scheduler (QCM) - was completely re-written (patches
> >  46-53). Therefore, from patch 1 to 53 we use the old scheduler code,
> >  and from 54 we use the new QCM (the old scheduler code was completely
> >  removed at 71). So I could maybe squash 71 into 54, but that won't
> >  help you much, I think.
> >  So, the current advice I can give is to completely ignore the
> >  following files because they do not exist in the final commit:
> >  - kfd_sched_cik_static.c (stopped using in 54)
> >  - kfd_registers.c (stopped using in 81 because we moved all 
> > register
> >  writes to radeon)
> >  - kfd_hw_pointer_store.c (alive from 46 to 67)
> >  - kfd_hw_pointer_store.h (alive from 46 to 67)
> >          Oded
>  
> Ok, I'll try, but this is driving me crazy; I am juggling between the
> fully applied patchset and individual patches, going back and forth
> trying to find which patch is the one I want to comment on. Not even
> mentioning that, after that, people would need to ack bad patches just
> because they know a later good patch fixes the badness.
>  
> This patchset can be reduced to fewer than 10 big patches. I would
> first do core patches that touch core things like the iommu and are
> preparatory to the patchset, then kfd patches that do not touch
> anything in radeon but implement the skeleton and core infrastructure,
> and then the radeon-specific patches.
Hi Jerome,
I thought about what you said and I would like to make a suggestion. I
believe I can restructure the patchset into 10-20 patches, organized
logically, which will make them easier to review.

The downside is that you will lose all references to the current 
patchset and hence, all the review work you (and other people) have 
done so far will be somewhat wasted. Also, the entire git history of 
the driver development will be lost (at least externally).

John suggested something a bit different in the email thread of PATCH 
07/83. He said to squash everything from patch 2 to 54 (including 71) 
and leave the remaining patches as they were, with maybe some 
additional small squashing.

Please tell me which option you prefer:

A. Continue with the review of the current patchset.
B. Go with my suggestion.
C. Go with John's suggestion.

I'm going to prepare options B & C tomorrow (Sunday), but I need your
input before publishing anything. So, if you could reply by Monday
morning my time, that would be great, as it will allow me to publish
the new set (if chosen) by Monday morning, EST.

And thanks again for the time you dedicate to this review. This is 
highly appreciated.

        Oded
>  
> This leads me to the design: all the core kfd helpers should be in an
> hsa directory, and should be helpers, while all the radeon-specific
> bits should most likely be part of the radeon gfx kernel module. I am
> not against a radeon_kfd, but really I do not see the point. There
> should be a hsa.ko like there is a drm.ko.
>  
> Also, I think it would be beneficial to rename kfd to hsa, because it
> is the most common name and the name that will bring valid hits in a
> web search.
>  
> Each of the patches should have a proper commit message that is not
> too cryptic, without being too chatty either.
>  
> Cheers,
> Jérôme
>  
> > >  I will come back with individual patch comments and also general
> > >  remarks.
> > >  Cheers,
> > >  Jérôme
> > > >   Alexey Skidanov (4):
> > > >     hsa/radeon: 32-bit processes support
> > > >     hsa/radeon: NULL pointer dereference bug workaround
> > > >     hsa/radeon: HSA64/HSA32 modes support
> > > >     hsa/radeon: Add local memory to topology
> > > >   Andrew Lewycky (3):
> > > >     hsa/radeon: Make binding of process to device permanent
> > > >     hsa/radeon: Implement hsaKmtSetMemoryPolicy
> > > >     mm: Change timing of notification to IOMMUs about a page to be
> > > >       invalidated
> > > >   Ben Goz (20):
> > > >     hsa/radeon: Add queue and hw_pointer_store modules
> > > >     hsa/radeon: Add support allocating kernel doorbells
> > > >     hsa/radeon: Add mqd_manager module
> > > >     hsa/radeon: Add kernel queue support for KFD
> > > >     hsa/radeon: Add module parameter of scheduling policy
> > > >     hsa/radeon: Add packet manager module
> > > >     hsa/radeon: Add process queue manager module
> > > >     hsa/radeon: Add device queue manager module
> > > >     hsa/radeon: Switch to new queue scheduler
> > > >     hsa/radeon: Add IOCTL for update queue
> > > >     hsa/radeon: Queue Management integration with Memory Management
> > > >     hsa/radeon: update queue fault handling
> > > >     hsa/radeon: fixing a bug to support 32b processes
> > > >     hsa/radeon: Fix number of pipes per ME
> > > >     hsa/radeon: Removing hw pointer store module
> > > >     hsa/radeon: Adding some error messages
> > > >     hsa/radeon: Fixing minor issues with kernel queues (DIQ)
> > > >     drm/radeon: Add register access functions to kfd2kgd interface
> > > >     hsa/radeon: Eliminating all direct register accesses
> > > >     drm/radeon: Remove lock functions from kfd2kgd interface
> > > >   Evgeny Pinchuk (9):
> > > >     hsa/radeon: fix the OEMID assignment in kfd_topology
> > > >     drm/radeon: extending kfd-kgd interface
> > > >     hsa/radeon: implementing IOCTL for clock counters
> > > >     drm/radeon: adding synchronization for GRBM GFX
> > > >     hsa/radeon: fixing clock counters bug
> > > >     drm/radeon: Extending kfd interface
> > > >     hsa/radeon: Adding max clock speeds to topology
> > > >     hsa/radeon: Alternating the source of max clock
> > > >     hsa/radeon: Exclusive access for perf. counters
> > > >   Michael Varga (1):
> > > >     hsa/radeon: debugging print statements
> > > >   Oded Gabbay (45):
> > > >     mm: Add kfd_process pointer to mm_struct
> > > >     drm/radeon: reduce number of free VMIDs and pipes in KV
> > > >     drm/radeon: Report doorbell configuration to kfd
> > > >     drm/radeon: Add radeon <--> kfd interface
> > > >     drm/radeon: Add kfd-->kgd interface to get virtual ram size
> > > >     drm/radeon: Add kfd-->kgd interfaces of memory allocation/mapping
> > > >     drm/radeon: Add kfd-->kgd interface of locking srbm_gfx_cntl register
> > > >     drm/radeon: Add calls to initialize and finalize kfd from radeon
> > > >     hsa/radeon: Add code base of hsa driver for AMD's GPUs
> > > >     hsa/radeon: Add initialization and unmapping of doorbell aperture
> > > >     hsa/radeon: Add scheduler code
> > > >     hsa/radeon: Add kfd mmap handler
> > > >     hsa/radeon: Add 2 new IOCTL to kfd, CREATE_QUEUE and DESTROY_QUEUE
> > > >     hsa/radeon: Update MAINTAINERS and CREDITS files
> > > >     hsa/radeon: Add interrupt handling module
> > > >     hsa/radeon: Add the isr function of the KFD scehduler
> > > >     hsa/radeon: Handle deactivation of queues using interrupts
> > > >     hsa/radeon: Enable interrupts in KFD scheduler
> > > >     hsa/radeon: Enable/Disable KFD interrupt module
> > > >     hsa/radeon: Add interrupt callback function to kgd2kfd interface
> > > >     hsa/radeon: Add kgd-->kfd interfaces for suspend and resume
> > > >     drm/radeon: Add calls to suspend and resume of kfd driver
> > > >     drm/radeon/cik: Don't touch int of pipes 1-7
> > > >     drm/radeon/cik: Call kfd isr function
> > > >     hsa/radeon: Fix memory size allocated for HPD
> > > >     hsa/radeon: Fix list of supported devices
> > > >     hsa/radeon: Fix coding style in cik_int.h
> > > >     hsa/radeon: Print ioctl commnad only in debug mode
> > > >     hsa/radeon: Print ISR info only in debug mode
> > > >     hsa/radeon: Workaround for a bug in amd_iommu
> > > >     hsa/radeon: Eliminate warnings in compilation
> > > >     hsa/radeon: Various kernel styling fixes
> > > >     hsa/radeon: Rearrange structures in kfd_ioctl.h
> > > >     hsa/radeon: change another pr_info to pr_debug
> > > >     hsa/radeon: Fix timeout calculation in sync_with_hw
> > > >     hsa/radeon: Update module information and version
> > > >     hsa/radeon: Update module version to 0.6.0
> > > >     hsa/radeon: Fix initialization of sh_mem registers
> > > >     hsa/radeon: Fix compilation warnings
> > > >     hsa/radeon: Remove old scheduler code
> > > >     hsa/radeon: Static analysis (smatch) fixes
> > > >     hsa/radeon: Check oversubscription before destroying runlist
> > > >     hsa/radeon: Don't verify cksum when parsing CRAT table
> > > >     hsa/radeon: Update module version to 0.6.1
> > > >     hsa/radeon: Update module version to 0.6.2
> > > >   Yair Shachar (1):
> > > >     hsa/radeon: Adding qcm fence return status
> > > >    CREDITS                                            |    7 +
> > > >    MAINTAINERS                                        |    8 +
> > > >    drivers/Kconfig                                    |    2 +
> > > >    drivers/gpu/Makefile                               |    1 +
> > > >    drivers/gpu/drm/radeon/Makefile                    |    1 +
> > > >    drivers/gpu/drm/radeon/cik.c                       |  156 +--
> > > >    drivers/gpu/drm/radeon/cikd.h                      |   51 +-
> > > >    drivers/gpu/drm/radeon/radeon.h                    |    9 +
> > > >    drivers/gpu/drm/radeon/radeon_device.c             |   32 +
> > > >    drivers/gpu/drm/radeon/radeon_drv.c                |    6 +
> > > >    drivers/gpu/drm/radeon/radeon_kfd.c                |  630 ++++++++++
> > > >    drivers/gpu/drm/radeon/radeon_kms.c                |    9 +
> > > >    drivers/gpu/hsa/Kconfig                            |   20 +
> > > >    drivers/gpu/hsa/Makefile                           |    1 +
> > > >    drivers/gpu/hsa/radeon/Makefile                    |   12 +
> > > >    drivers/gpu/hsa/radeon/cik_int.h                   |   50 +
> > > >    drivers/gpu/hsa/radeon/cik_mqds.h                  |  250 ++
> > > >    drivers/gpu/hsa/radeon/cik_regs.h                  |  220 ++
> > > >    drivers/gpu/hsa/radeon/kfd_aperture.c              |  123 ++
> > > >    drivers/gpu/hsa/radeon/kfd_chardev.c               |  530 +++++++++
> > > >    drivers/gpu/hsa/radeon/kfd_crat.h                  |  294 +++++
> > > >    drivers/gpu/hsa/radeon/kfd_device.c                |  244 ++
> > > >    drivers/gpu/hsa/radeon/kfd_device_queue_manager.c  |  981 ++++++++++++++++
> > > >    drivers/gpu/hsa/radeon/kfd_device_queue_manager.h  |  101 ++
> > > >    drivers/gpu/hsa/radeon/kfd_doorbell.c              |  242 ++
> > > >    drivers/gpu/hsa/radeon/kfd_interrupt.c             |  177 +++
> > > >    drivers/gpu/hsa/radeon/kfd_kernel_queue.c          |  305 +++++
> > > >    drivers/gpu/hsa/radeon/kfd_kernel_queue.h          |   66 ++
> > > >    drivers/gpu/hsa/radeon/kfd_module.c                |  130 +++
> > > >    drivers/gpu/hsa/radeon/kfd_mqd_manager.c           |  290 +++++
> > > >    drivers/gpu/hsa/radeon/kfd_mqd_manager.h           |   54 +
> > > >    drivers/gpu/hsa/radeon/kfd_packet_manager.c        |  488 ++++++++
> > > >    drivers/gpu/hsa/radeon/kfd_pasid.c                 |   97 ++
> > > >    drivers/gpu/hsa/radeon/kfd_pm4_headers.h           |  682 +++++++++++
> > > >    drivers/gpu/hsa/radeon/kfd_pm4_opcodes.h           |  107 ++
> > > >    drivers/gpu/hsa/radeon/kfd_priv.h                  |  475 ++++++++
> > > >    drivers/gpu/hsa/radeon/kfd_process.c               |  391 +++++++
> > > >    drivers/gpu/hsa/radeon/kfd_process_queue_manager.c |  343 ++++++
> > > >    drivers/gpu/hsa/radeon/kfd_queue.c                 |  109 ++
> > > >    drivers/gpu/hsa/radeon/kfd_scheduler.h             |   72 ++
> > > >    drivers/gpu/hsa/radeon/kfd_topology.c              | 1207 ++++++++++++++++++++
> > > >    drivers/gpu/hsa/radeon/kfd_topology.h              |  168 +++
> > > >    drivers/gpu/hsa/radeon/kfd_vidmem.c                |   97 ++
> > > >    include/linux/mm_types.h                           |   14 +
> > > >    include/linux/radeon_kfd.h                         |  106 ++
> > > >    include/uapi/linux/kfd_ioctl.h                     |  133 +++
> > > >    mm/rmap.c                                          |    8 +-
> > > >    47 files changed, 9402 insertions(+), 97 deletions(-)
> > > >    create mode 100644 drivers/gpu/drm/radeon/radeon_kfd.c
> > > >    create mode 100644 drivers/gpu/hsa/Kconfig
> > > >    create mode 100644 drivers/gpu/hsa/Makefile
> > > >    create mode 100644 drivers/gpu/hsa/radeon/Makefile
> > > >    create mode 100644 drivers/gpu/hsa/radeon/cik_int.h
> > > >    create mode 100644 drivers/gpu/hsa/radeon/cik_mqds.h
> > > >    create mode 100644 drivers/gpu/hsa/radeon/cik_regs.h
> > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_aperture.c
> > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_chardev.c
> > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_crat.h
> > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_device.c
> > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_device_queue_manager.c
> > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_device_queue_manager.h
> > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_doorbell.c
> > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_interrupt.c
> > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_kernel_queue.c
> > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_kernel_queue.h
> > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_module.c
> > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_mqd_manager.c
> > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_mqd_manager.h
> > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_packet_manager.c
> > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_pasid.c
> > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_pm4_headers.h
> > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_pm4_opcodes.h
> > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_priv.h
> > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_process.c
> > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_process_queue_manager.c
> > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_queue.c
> > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_scheduler.h
> > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_topology.c
> > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_topology.h
> > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_vidmem.c
> > > >    create mode 100644 include/linux/radeon_kfd.h
> > > >    create mode 100644 include/uapi/linux/kfd_ioctl.h
> > > >   --
> > > >   1.9.1

^ permalink raw reply	[flat|nested] 51+ messages in thread

> > > >     hsa/radeon: Enable interrupts in KFD scheduler
> > > >     hsa/radeon: Enable/Disable KFD interrupt module
> > > >     hsa/radeon: Add interrupt callback function to kgd2kfd 
> > > > interface
> > > >     hsa/radeon: Add kgd-->kfd interfaces for suspend and resume
> > > >     drm/radeon: Add calls to suspend and resume of kfd driver
> > > >     drm/radeon/cik: Don't touch int of pipes 1-7
> > > >     drm/radeon/cik: Call kfd isr function
> > > >     hsa/radeon: Fix memory size allocated for HPD
> > > >     hsa/radeon: Fix list of supported devices
> > > >     hsa/radeon: Fix coding style in cik_int.h
> > > >     hsa/radeon: Print ioctl commnad only in debug mode
> > > >     hsa/radeon: Print ISR info only in debug mode
> > > >     hsa/radeon: Workaround for a bug in amd_iommu
> > > >     hsa/radeon: Eliminate warnings in compilation
> > > >     hsa/radeon: Various kernel styling fixes
> > > >     hsa/radeon: Rearrange structures in kfd_ioctl.h
> > > >     hsa/radeon: change another pr_info to pr_debug
> > > >     hsa/radeon: Fix timeout calculation in sync_with_hw
> > > >     hsa/radeon: Update module information and version
> > > >     hsa/radeon: Update module version to 0.6.0
> > > >     hsa/radeon: Fix initialization of sh_mem registers
> > > >     hsa/radeon: Fix compilation warnings
> > > >     hsa/radeon: Remove old scheduler code
> > > >     hsa/radeon: Static analysis (smatch) fixes
> > > >     hsa/radeon: Check oversubscription before destroying 
> > > > runlist
> > > >     hsa/radeon: Don't verify cksum when parsing CRAT table
> > > >     hsa/radeon: Update module version to 0.6.1
> > > >     hsa/radeon: Update module version to 0.6.2
> > > >   Yair Shachar (1):
> > > >     hsa/radeon: Adding qcm fence return status
> > > >    CREDITS                                            |    7 +
> > > >    MAINTAINERS                                        |    8 +
> > > >    drivers/Kconfig                                    |    2 +
> > > >    drivers/gpu/Makefile                               |    1 +
> > > >    drivers/gpu/drm/radeon/Makefile                    |    1 +
> > > >    drivers/gpu/drm/radeon/cik.c                       |  156 
> > > > +--
> > > >    drivers/gpu/drm/radeon/cikd.h                      |   51 +-
> > > >    drivers/gpu/drm/radeon/radeon.h                    |    9 +
> > > >    drivers/gpu/drm/radeon/radeon_device.c             |   32 +
> > > >    drivers/gpu/drm/radeon/radeon_drv.c                |    6 +
> > > >    drivers/gpu/drm/radeon/radeon_kfd.c                |  630
> > > >  ++++++++++
> > > >    drivers/gpu/drm/radeon/radeon_kms.c                |    9 +
> > > >    drivers/gpu/hsa/Kconfig                            |   20 +
> > > >    drivers/gpu/hsa/Makefile                           |    1 +
> > > >    drivers/gpu/hsa/radeon/Makefile                    |   12 +
> > > >    drivers/gpu/hsa/radeon/cik_int.h                   |   50 +
> > > >    drivers/gpu/hsa/radeon/cik_mqds.h                  |  250 
> > > > ++++
> > > >    drivers/gpu/hsa/radeon/cik_regs.h                  |  220 
> > > > ++++
> > > >    drivers/gpu/hsa/radeon/kfd_aperture.c              |  123 ++
> > > >    drivers/gpu/hsa/radeon/kfd_chardev.c               |  530
> > > >  +++++++++
> > > >    drivers/gpu/hsa/radeon/kfd_crat.h                  |  294 
> > > > +++++
> > > >    drivers/gpu/hsa/radeon/kfd_device.c                |  244 
> > > > ++++
> > > >    drivers/gpu/hsa/radeon/kfd_device_queue_manager.c  |  981
> > > >  ++++++++++++++++
> > > >    drivers/gpu/hsa/radeon/kfd_device_queue_manager.h  |  101 ++
> > > >    drivers/gpu/hsa/radeon/kfd_doorbell.c              |  242 
> > > > ++++
> > > >    drivers/gpu/hsa/radeon/kfd_interrupt.c             |  177 
> > > > +++
> > > >    drivers/gpu/hsa/radeon/kfd_kernel_queue.c          |  305 
> > > > +++++
> > > >    drivers/gpu/hsa/radeon/kfd_kernel_queue.h          |   66 ++
> > > >    drivers/gpu/hsa/radeon/kfd_module.c                |  130 
> > > > +++
> > > >    drivers/gpu/hsa/radeon/kfd_mqd_manager.c           |  290 
> > > > +++++
> > > >    drivers/gpu/hsa/radeon/kfd_mqd_manager.h           |   54 +
> > > >    drivers/gpu/hsa/radeon/kfd_packet_manager.c        |  488
> > > >  ++++++++
> > > >    drivers/gpu/hsa/radeon/kfd_pasid.c                 |   97 ++
> > > >    drivers/gpu/hsa/radeon/kfd_pm4_headers.h           |  682
> > > >  +++++++++++
> > > >    drivers/gpu/hsa/radeon/kfd_pm4_opcodes.h           |  107 ++
> > > >    drivers/gpu/hsa/radeon/kfd_priv.h                  |  475
> > > >  ++++++++
> > > >    drivers/gpu/hsa/radeon/kfd_process.c               |  391 
> > > > +++++++
> > > >    drivers/gpu/hsa/radeon/kfd_process_queue_manager.c |  343 
> > > > ++++++
> > > >    drivers/gpu/hsa/radeon/kfd_queue.c                 |  109 ++
> > > >    drivers/gpu/hsa/radeon/kfd_scheduler.h             |   72 ++
> > > >    drivers/gpu/hsa/radeon/kfd_topology.c              | 1207
> > > >  ++++++++++++++++++++
> > > >    drivers/gpu/hsa/radeon/kfd_topology.h              |  168 
> > > > +++
> > > >    drivers/gpu/hsa/radeon/kfd_vidmem.c                |   97 ++
> > > >    include/linux/mm_types.h                           |   14 +
> > > >    include/linux/radeon_kfd.h                         |  106 ++
> > > >    include/uapi/linux/kfd_ioctl.h                     |  133 
> > > > +++
> > > >    mm/rmap.c                                          |    8 +-
> > > >    47 files changed, 9402 insertions(+), 97 deletions(-)
> > > >    create mode 100644 drivers/gpu/drm/radeon/radeon_kfd.c
> > > >    create mode 100644 drivers/gpu/hsa/Kconfig
> > > >    create mode 100644 drivers/gpu/hsa/Makefile
> > > >    create mode 100644 drivers/gpu/hsa/radeon/Makefile
> > > >    create mode 100644 drivers/gpu/hsa/radeon/cik_int.h
> > > >    create mode 100644 drivers/gpu/hsa/radeon/cik_mqds.h
> > > >    create mode 100644 drivers/gpu/hsa/radeon/cik_regs.h
> > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_aperture.c
> > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_chardev.c
> > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_crat.h
> > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_device.c
> > > >    create mode 100644
> > > >  drivers/gpu/hsa/radeon/kfd_device_queue_manager.c
> > > >    create mode 100644
> > > >  drivers/gpu/hsa/radeon/kfd_device_queue_manager.h
> > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_doorbell.c
> > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_interrupt.c
> > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_kernel_queue.c
> > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_kernel_queue.h
> > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_module.c
> > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_mqd_manager.c
> > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_mqd_manager.h
> > > >    create mode 100644 
> > > > drivers/gpu/hsa/radeon/kfd_packet_manager.c
> > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_pasid.c
> > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_pm4_headers.h
> > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_pm4_opcodes.h
> > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_priv.h
> > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_process.c
> > > >    create mode 100644
> > > >  drivers/gpu/hsa/radeon/kfd_process_queue_manager.c
> > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_queue.c
> > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_scheduler.h
> > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_topology.c
> > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_topology.h
> > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_vidmem.c
> > > >    create mode 100644 include/linux/radeon_kfd.h
> > > >    create mode 100644 include/uapi/linux/kfd_ioctl.h
> > > >   --
> > > >   1.9.1
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 00/83] AMD HSA kernel driver
  2014-07-12 21:55         ` Gabbay, Oded
@ 2014-07-13  3:55           ` Jerome Glisse
  -1 siblings, 0 replies; 51+ messages in thread
From: Jerome Glisse @ 2014-07-13  3:55 UTC (permalink / raw)
  To: Gabbay, Oded
  Cc: linux-kernel, Bridgman, John, Deucher, Alexander, Lewycky,
	Andrew, joro, akpm, dri-devel, airlied, oded.gabbay

On Sat, Jul 12, 2014 at 09:55:49PM +0000, Gabbay, Oded wrote:
> On Fri, 2014-07-11 at 17:18 -0400, Jerome Glisse wrote:
> > On Thu, Jul 10, 2014 at 10:51:29PM +0000, Gabbay, Oded wrote:
> > >  On Thu, 2014-07-10 at 18:24 -0400, Jerome Glisse wrote:
> > > >  On Fri, Jul 11, 2014 at 12:45:27AM +0300, Oded Gabbay wrote:
> > > > >   This patch set implements a Heterogeneous System Architecture
> > > > >  (HSA) driver
> > > > >   for radeon-family GPUs.
> > > > These are just quick comments on a few things. Given the size of
> > > > this, people will need time to review things.
> > > > >   HSA allows different processor types (CPUs, DSPs, GPUs, 
> > > > > etc..) to
> > > > >  share
> > > > >   system resources more effectively via HW features including
> > > > >  shared pageable
> > > > >   memory, userspace-accessible work queues, and platform-level
> > > > >  atomics. In
> > > > >   addition to the memory protection mechanisms in GPUVM and
> > > > >  IOMMUv2, the Sea
> > > > >   Islands family of GPUs also performs HW-level validation of
> > > > >  commands passed
> > > > >   in through the queues (aka rings).
> > > > >   The code in this patch set is intended to serve both as a 
> > > > > sample
> > > > >  driver for
> > > > >   other HSA-compatible hardware devices and as a production 
> > > > > driver
> > > > >  for
> > > > >   radeon-family processors. The code is architected to support
> > > > >  multiple CPUs
> > > > >   each with connected GPUs, although the current implementation
> > > > >  focuses on a
> > > > >   single Kaveri/Berlin APU, and works alongside the existing 
> > > > > radeon
> > > > >  kernel
> > > > >   graphics driver (kgd).
> > > > >   AMD GPUs designed for use with HSA (Sea Islands and up) share
> > > > >  some hardware
> > > > >   functionality between HSA compute and regular gfx/compute 
> > > > > (memory,
> > > > >   interrupts, registers), while other functionality has been 
> > > > > added
> > > > >   specifically for HSA compute  (hw scheduler for virtualized
> > > > >  compute rings).
> > > > >   All shared hardware is owned by the radeon graphics driver, 
> > > > > and
> > > > >  an interface
> > > > >   between kfd and kgd allows the kfd to make use of those 
> > > > > shared
> > > > >  resources,
> > > > >   while HSA-specific functionality is managed directly by kfd 
> > > > > by
> > > > >  submitting
> > > > >   packets into an HSA-specific command queue (the "HIQ").
> > > > >   During kfd module initialization a char device node 
> > > > > (/dev/kfd) is
> > > > >  created
> > > > >   (surviving until module exit), with ioctls for queue 
> > > > > creation &
> > > > >  management,
> > > > >   and data structures are initialized for managing HSA device
> > > > >  topology.
> > > > >   The rest of the initialization is driven by calls from the 
> > > > > radeon
> > > > >  kgd at
> > > > >   the following points :
> > > > >   - radeon_init (kfd_init)
> > > > >   - radeon_exit (kfd_fini)
> > > > >   - radeon_driver_load_kms (kfd_device_probe, kfd_device_init)
> > > > >   - radeon_driver_unload_kms (kfd_device_fini)
> > > > >   During the probe and init processing per-device data 
> > > > > structures
> > > > >  are
> > > > >   established which connect to the associated graphics kernel
> > > > >  driver. This
> > > > >   information is exposed to userspace via sysfs, along with a
> > > > >  version number
> > > > >   allowing userspace to determine if a topology change has 
> > > > > occurred
> > > > >  while it
> > > > >   was reading from sysfs.
> > > > >   The interface between kfd and kgd also allows the kfd to 
> > > > > request
> > > > >  buffer
> > > > >   management services from kgd, and allows kgd to route 
> > > > > interrupt
> > > > >  requests to
> > > > >   kfd code since the interrupt block is shared between regular
> > > > >   graphics/compute and HSA compute subsystems in the GPU.
> > > > >   The kfd code works with an open source usermode library
> > > > >  ("libhsakmt") which
> > > > >   is in the final stages of IP review and should be published 
> > > > > in a
> > > > >  separate
> > > > >   repo over the next few days.
> > > > >   The code operates in one of three modes, selectable via the
> > > > >  sched_policy
> > > > >   module parameter :
> > > > >   - sched_policy=0 uses a hardware scheduler running in the MEC
> > > > >  block within
> > > > >   CP, and allows oversubscription (more queues than HW slots)
> > > > >   - sched_policy=1 also uses HW scheduling but does not allow
> > > > >   oversubscription, so create_queue requests fail when we run 
> > > > > out
> > > > >  of HW slots
> > > > >   - sched_policy=2 does not use HW scheduling, so the driver
> > > > >  manually assigns
> > > > >   queues to HW slots by programming registers
> > > > >   The "no HW scheduling" option is for debug & new hardware 
> > > > > bringup
> > > > >  only, so
> > > > >   has less test coverage than the other options. Default in the
> > > > >  current code
> > > > >   is "HW scheduling without oversubscription" since that is 
> > > > > where
> > > > >  we have the
> > > > >   most test coverage but we expect to change the default to "HW
> > > > >  scheduling
> > > > >   with oversubscription" after further testing. This 
> > > > > effectively
> > > > >  removes the
> > > > >   HW limit on the number of work queues available to 
> > > > > applications.
> > > > >   Programs running on the GPU are associated with an address 
> > > > > space
> > > > >  through the
> > > > >   VMID field, which is translated to a unique PASID at access 
> > > > > time
> > > > >  via a set
> > > > >   of 16 VMID-to-PASID mapping registers. The available VMIDs
> > > > >  (currently 16)
> > > > >   are partitioned (under control of the radeon kgd) between 
> > > > > current
> > > > >   gfx/compute and HSA compute, with each getting 8 in the 
> > > > > current
> > > > >  code. The
> > > > >   VMID-to-PASID mapping registers are updated by the HW 
> > > > > scheduler
> > > > >  when used,
> > > > >   and by driver code if HW scheduling is not being used.
> > > > >   The Sea Islands compute queues use a new "doorbell" mechanism
> > > > >  instead of the
> > > > >   earlier kernel-managed write pointer registers. Doorbells 
> > > > > use a
> > > > >  separate BAR
> > > > >   dedicated for this purpose, and pages within the doorbell
> > > > >  aperture are
> > > > >   mapped to userspace (each page mapped to only one user 
> > > > > address
> > > > >  space).
> > > > >   Writes to the doorbell aperture are intercepted by GPU 
> > > > > hardware,
> > > > >  allowing
> > > > >   userspace code to safely manage work queues (rings) without
> > > > >  requiring a
> > > > >   kernel call for every ring update.
> > > > >   First step for an application process is to open the kfd 
> > > > > device.
> > > > >  Calls to
> > > > >   open create a kfd "process" structure only for the first 
> > > > > thread
> > > > >  of the
> > > > >   process. Subsequent open calls are checked to see if they are
> > > > >  from processes
> > > > >   using the same mm_struct and, if so, don't do anything. The 
> > > > > kfd
> > > > >  per-process
> > > > >   data lives as long as the mm_struct exists. Each mm_struct is
> > > > >  associated
> > > > >   with a unique PASID, allowing the IOMMUv2 to make userspace
> > > > >  process memory
> > > > >   accessible to the GPU.
> > > > >   Next step is for the application to collect topology 
> > > > > information
> > > > >  via sysfs.
> > > > >   This gives userspace enough information to be able to 
> > > > > identify
> > > > >  specific
> > > > >   nodes (processors) in subsequent queue management calls.
> > > > >  Application
> > > > >   processes can create queues on multiple processors, and
> > > > >  processors support
> > > > >   queues from multiple processes.
> > > > I am not a fan of using sysfs to discover the topology.
> > > > >   At this point the application can create work queues in 
> > > > > userspace
> > > > >  memory and
> > > > >   pass them through the usermode library to kfd to have them 
> > > > > mapped
> > > > >  onto HW
> > > > >   queue slots so that commands written to the queues can be
> > > > >  executed by the
> > > > >   GPU. Queue operations specify a processor node, and so the 
> > > > > bulk
> > > > >  of this code
> > > > >   is device-specific.
> > > > >   Written by John Bridgman <John.Bridgman@amd.com>
> > > > So a general comment is that you need to collapse many patches:
> > > > things like 58, which fixes kernel style, should be collapsed,
> > > > i.e. fix all the previous patches that have broken style.
> > > > Even worse are things like 71, removing code you just added a few
> > > > patches earlier in the patchset.
> > > Not quite, the code was added in patch 11.
> > > > This means that if I start reviewing in patch order, I might
> > > > spend time on code that is later deleted/modified/fixed, i.e. the
> > > > time I spend understanding and reading some code might just be
> > > > wasted.
> > > The quick answer is that you are of course right, but having said
> > > that, I think it would not be simple at all to do that squashing.
> > > I squashed what I could, and I can probably squash a little more
> > > (like 58), but the major problem is that one of the main modules of
> > > the driver - the scheduler (QCM) - was completely re-written
> > > (patches 46-53). Therefore, from patch 1 to 53 we use the old
> > > scheduler code, and from 54 we use the new QCM (the old scheduler
> > > code was completely removed in 71). So I could maybe squash 71 into
> > > 54, but that won't help you much, I think.
> > > So, the current advice I can give is to completely ignore the
> > > following files, because they do not exist in the final commit:
> > > - kfd_sched_cik_static.c (stopped being used in 54)
> > > - kfd_registers.c (stopped being used in 81, because we moved all
> > > register writes to radeon)
> > > - kfd_hw_pointer_store.c (alive from 46 to 67)
> > > - kfd_hw_pointer_store.h (alive from 46 to 67)
> > >         Oded
> >  
> > OK, I'll try, but this is driving me crazy: I am juggling between the
> > fully applied patchset and the individual patches, going back and
> > forth trying to find which patch is the one I want to comment on. Not
> > even mentioning that, after that, people would need to ack a bad
> > patch just because they know a later good patch fixes the badness.
> > 
> > This patchset can be reduced to fewer than 10 big patches. I would
> > first do core patches that touch core things like the iommu and are
> > preparatory to the patchset. Then kfd patches that do not touch
> > anything in radeon but implement the skeleton and core
> > infrastructure. Then radeon-specific patches.
> Hi Jerome,
> I thought about what you said and I would like to make a suggestion: I
> believe I can restructure the patchset into 10-20 patches, organized
> logically, which will make it easier to review.
> 
> The downside is that you will lose all references to the current 
> patchset and hence, all the review work you (and other people) have 
> done so far will be somewhat wasted. Also, the entire git history of 
> the driver development will be lost (at least externally).

This history does not matter much.

> 
> John suggested something a bit different in the email thread of PATCH 
> 07/83. He said to squash everything from patch 2 to 54 (including 71) 
> and leave the remaining patches as they were, with maybe some 
> additional small squashing.
> 
> Please tell me which option you prefer:
> 
> A. Continue with the review of the current patchset.
> B. Go with my suggestion.
> C. Go with John's suggestion.
> 
> I'm going to prepare options B & C tomorrow (Sunday), but I need your
> input before publishing anything. So, if you could reply by Monday
> morning my time, that would be great, as it will allow me to publish
> the new set (if chosen) by Monday morning, EST.
> 
> And thanks again for the time you dedicate to this review. This is 
> highly appreciated.
> 
>         Oded

Squashing patches together will not be enough; what really needs to
happen is an hsa module, like the drm module, with the radeon module
registering itself into this hsa module.

First patch: the whole ioctl set of hsa.
Second patch: the core hsa code and an ioctl skeleton that returns
ENODEV or the like.
Third patch: the ioctl marshaling that parses ioctl parameters and
handles the common ioctls.
Fourth patch: add the driver hookup to core hsa and define the driver
API.
Fifth patch: implement said API for radeon.

Yes, most of those 5 patches will be massive additions of single files,
but that is really what makes sense here.
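
To make the layering concrete, here is a minimal sketch of what such a
drm.ko-style split could look like. Every identifier below (hsa_device,
hsa_driver_ops, hsa_register_device, the stub ioctl) is hypothetical
and only illustrates the idea of a core hsa.ko that drivers register
into; it is not an existing or proposed API:

/* Hypothetical core hsa module (hsa.ko), modeled on drm.ko.
 * All names are illustrative, not an existing kernel API.
 */
#include <linux/module.h>
#include <linux/fs.h>
#include <linux/list.h>
#include <linux/miscdevice.h>
#include <linux/mutex.h>

struct hsa_device;

/* Per-driver hooks: radeon (or any other HSA-capable driver)
 * fills these in when it registers (the "fourth patch"). */
struct hsa_driver_ops {
	int (*create_queue)(struct hsa_device *hdev, void __user *args);
	int (*destroy_queue)(struct hsa_device *hdev, void __user *args);
};

struct hsa_device {
	const struct hsa_driver_ops *ops;
	struct list_head list;
};

static LIST_HEAD(hsa_devices);
static DEFINE_MUTEX(hsa_lock);

/* The "fourth patch": drivers hook themselves into the core. */
int hsa_register_device(struct hsa_device *hdev)
{
	mutex_lock(&hsa_lock);
	list_add_tail(&hdev->list, &hsa_devices);
	mutex_unlock(&hsa_lock);
	return 0;
}
EXPORT_SYMBOL_GPL(hsa_register_device);

/* The "second patch": an ioctl skeleton that returns -ENODEV until the
 * marshaling (the "third patch") dispatches to a registered driver. */
static long hsa_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
{
	return -ENODEV;
}

static const struct file_operations hsa_fops = {
	.owner		= THIS_MODULE,
	.unlocked_ioctl	= hsa_ioctl,
};

static struct miscdevice hsa_misc = {
	.minor	= MISC_DYNAMIC_MINOR,
	.name	= "hsa",
	.fops	= &hsa_fops,
};

static int __init hsa_init(void)
{
	return misc_register(&hsa_misc);
}

static void __exit hsa_exit(void)
{
	misc_deregister(&hsa_misc);
}

module_init(hsa_init);
module_exit(hsa_exit);
MODULE_LICENSE("GPL");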

But before those 5 patches, you most likely want to do some groundwork
on the iommu front.
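
For reference, the groundwork in question is presumably around
IOMMUv2/PASID handling, since that is what lets the GPU access pageable
process memory. A rough sketch using the in-kernel amd_iommu v2 API
(amd_iommu_init_device / amd_iommu_bind_pasid); the surrounding driver
structure and the HSA_MAX_PASIDS value are assumptions for
illustration:

/* Bind a process's address space to a device PASID so that device
 * accesses tagged with the PASID are translated through the CPU page
 * tables. Error paths and driver plumbing are elided. */
#include <linux/amd-iommu.h>
#include <linux/pci.h>
#include <linux/sched.h>

#define HSA_MAX_PASIDS 16	/* illustrative, not from the patchset */

/* Once per device: declare how many concurrent PASIDs it will use. */
static int hsa_iommu_init(struct pci_dev *pdev)
{
	return amd_iommu_init_device(pdev, HSA_MAX_PASIDS);
}

/* Once per process: associate @pasid with the current task's mm. */
static int hsa_iommu_bind(struct pci_dev *pdev, int pasid)
{
	return amd_iommu_bind_pasid(pdev, pasid, current);
}

static void hsa_iommu_unbind(struct pci_dev *pdev, int pasid)
{
	amd_iommu_unbind_pasid(pdev, pasid);
}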

Cheers,
Jérôme

> >  
> > This leads me to the design: all the core kfd helpers should be in
> > an hsa directory, and should be helpers, while all the
> > radeon-specific bits should most likely be part of the radeon gfx
> > kernel module. I am not against a radeon_kfd, but really I do not
> > see the point. There should be a hsa.ko just like there is a drm.ko.
> >  
> > Also, I think it would be beneficial to rename kfd to hsa, because
> > it is the most common name and the name that will bring valid hits
> > in a web search.
> >  
> > Each of the patches should have a proper commit message that is not
> > too cryptic, without being too chatty either.
> >  
> > Cheers,
> > Jérôme
> >  
> > > >  I will come back with individual patch comments and also general
> > > >  remarks.
> > > >  Cheers,
> > > >  Jérôme
> > > > >   Alexey Skidanov (4):
> > > > >     hsa/radeon: 32-bit processes support
> > > > >     hsa/radeon: NULL pointer dereference bug workaround
> > > > >     hsa/radeon: HSA64/HSA32 modes support
> > > > >     hsa/radeon: Add local memory to topology
> > > > >   Andrew Lewycky (3):
> > > > >     hsa/radeon: Make binding of process to device permanent
> > > > >     hsa/radeon: Implement hsaKmtSetMemoryPolicy
> > > > >     mm: Change timing of notification to IOMMUs about a page 
> > > > > to be
> > > > >       invalidated
> > > > >   Ben Goz (20):
> > > > >     hsa/radeon: Add queue and hw_pointer_store modules
> > > > >     hsa/radeon: Add support allocating kernel doorbells
> > > > >     hsa/radeon: Add mqd_manager module
> > > > >     hsa/radeon: Add kernel queue support for KFD
> > > > >     hsa/radeon: Add module parameter of scheduling policy
> > > > >     hsa/radeon: Add packet manager module
> > > > >     hsa/radeon: Add process queue manager module
> > > > >     hsa/radeon: Add device queue manager module
> > > > >     hsa/radeon: Switch to new queue scheduler
> > > > >     hsa/radeon: Add IOCTL for update queue
> > > > >     hsa/radeon: Queue Management integration with Memory 
> > > > > Management
> > > > >     hsa/radeon: update queue fault handling
> > > > >     hsa/radeon: fixing a bug to support 32b processes
> > > > >     hsa/radeon: Fix number of pipes per ME
> > > > >     hsa/radeon: Removing hw pointer store module
> > > > >     hsa/radeon: Adding some error messages
> > > > >     hsa/radeon: Fixing minor issues with kernel queues (DIQ)
> > > > >     drm/radeon: Add register access functions to kfd2kgd 
> > > > > interface
> > > > >     hsa/radeon: Eliminating all direct register accesses
> > > > >     drm/radeon: Remove lock functions from kfd2kgd interface
> > > > >   Evgeny Pinchuk (9):
> > > > >     hsa/radeon: fix the OEMID assignment in kfd_topology
> > > > >     drm/radeon: extending kfd-kgd interface
> > > > >     hsa/radeon: implementing IOCTL for clock counters
> > > > >     drm/radeon: adding synchronization for GRBM GFX
> > > > >     hsa/radeon: fixing clock counters bug
> > > > >     drm/radeon: Extending kfd interface
> > > > >     hsa/radeon: Adding max clock speeds to topology
> > > > >     hsa/radeon: Alternating the source of max clock
> > > > >     hsa/radeon: Exclusive access for perf. counters
> > > > >   Michael Varga (1):
> > > > >     hsa/radeon: debugging print statements
> > > > >   Oded Gabbay (45):
> > > > >     mm: Add kfd_process pointer to mm_struct
> > > > >     drm/radeon: reduce number of free VMIDs and pipes in KV
> > > > >     drm/radeon: Report doorbell configuration to kfd
> > > > >     drm/radeon: Add radeon <--> kfd interface
> > > > >     drm/radeon: Add kfd-->kgd interface to get virtual ram size
> > > > >     drm/radeon: Add kfd-->kgd interfaces of memory
> > > > >  allocation/mapping
> > > > >     drm/radeon: Add kfd-->kgd interface of locking 
> > > > > srbm_gfx_cntl
> > > > >  register
> > > > >     drm/radeon: Add calls to initialize and finalize kfd from 
> > > > > radeon
> > > > >     hsa/radeon: Add code base of hsa driver for AMD's GPUs
> > > > >     hsa/radeon: Add initialization and unmapping of doorbell
> > > > >  aperture
> > > > >     hsa/radeon: Add scheduler code
> > > > >     hsa/radeon: Add kfd mmap handler
> > > > >     hsa/radeon: Add 2 new IOCTL to kfd, CREATE_QUEUE and
> > > > >  DESTROY_QUEUE
> > > > >     hsa/radeon: Update MAINTAINERS and CREDITS files
> > > > >     hsa/radeon: Add interrupt handling module
> > > > >     hsa/radeon: Add the isr function of the KFD scehduler
> > > > >     hsa/radeon: Handle deactivation of queues using interrupts
> > > > >     hsa/radeon: Enable interrupts in KFD scheduler
> > > > >     hsa/radeon: Enable/Disable KFD interrupt module
> > > > >     hsa/radeon: Add interrupt callback function to kgd2kfd 
> > > > > interface
> > > > >     hsa/radeon: Add kgd-->kfd interfaces for suspend and resume
> > > > >     drm/radeon: Add calls to suspend and resume of kfd driver
> > > > >     drm/radeon/cik: Don't touch int of pipes 1-7
> > > > >     drm/radeon/cik: Call kfd isr function
> > > > >     hsa/radeon: Fix memory size allocated for HPD
> > > > >     hsa/radeon: Fix list of supported devices
> > > > >     hsa/radeon: Fix coding style in cik_int.h
> > > > >     hsa/radeon: Print ioctl commnad only in debug mode
> > > > >     hsa/radeon: Print ISR info only in debug mode
> > > > >     hsa/radeon: Workaround for a bug in amd_iommu
> > > > >     hsa/radeon: Eliminate warnings in compilation
> > > > >     hsa/radeon: Various kernel styling fixes
> > > > >     hsa/radeon: Rearrange structures in kfd_ioctl.h
> > > > >     hsa/radeon: change another pr_info to pr_debug
> > > > >     hsa/radeon: Fix timeout calculation in sync_with_hw
> > > > >     hsa/radeon: Update module information and version
> > > > >     hsa/radeon: Update module version to 0.6.0
> > > > >     hsa/radeon: Fix initialization of sh_mem registers
> > > > >     hsa/radeon: Fix compilation warnings
> > > > >     hsa/radeon: Remove old scheduler code
> > > > >     hsa/radeon: Static analysis (smatch) fixes
> > > > >     hsa/radeon: Check oversubscription before destroying 
> > > > > runlist
> > > > >     hsa/radeon: Don't verify cksum when parsing CRAT table
> > > > >     hsa/radeon: Update module version to 0.6.1
> > > > >     hsa/radeon: Update module version to 0.6.2
> > > > >   Yair Shachar (1):
> > > > >     hsa/radeon: Adding qcm fence return status
> > > > >    CREDITS                                            |    7 +
> > > > >    MAINTAINERS                                        |    8 +
> > > > >    drivers/Kconfig                                    |    2 +
> > > > >    drivers/gpu/Makefile                               |    1 +
> > > > >    drivers/gpu/drm/radeon/Makefile                    |    1 +
> > > > >    drivers/gpu/drm/radeon/cik.c                       |  156 
> > > > > +--
> > > > >    drivers/gpu/drm/radeon/cikd.h                      |   51 +-
> > > > >    drivers/gpu/drm/radeon/radeon.h                    |    9 +
> > > > >    drivers/gpu/drm/radeon/radeon_device.c             |   32 +
> > > > >    drivers/gpu/drm/radeon/radeon_drv.c                |    6 +
> > > > >    drivers/gpu/drm/radeon/radeon_kfd.c                |  630
> > > > >  ++++++++++
> > > > >    drivers/gpu/drm/radeon/radeon_kms.c                |    9 +
> > > > >    drivers/gpu/hsa/Kconfig                            |   20 +
> > > > >    drivers/gpu/hsa/Makefile                           |    1 +
> > > > >    drivers/gpu/hsa/radeon/Makefile                    |   12 +
> > > > >    drivers/gpu/hsa/radeon/cik_int.h                   |   50 +
> > > > >    drivers/gpu/hsa/radeon/cik_mqds.h                  |  250 
> > > > > ++++
> > > > >    drivers/gpu/hsa/radeon/cik_regs.h                  |  220 
> > > > > ++++
> > > > >    drivers/gpu/hsa/radeon/kfd_aperture.c              |  123 ++
> > > > >    drivers/gpu/hsa/radeon/kfd_chardev.c               |  530
> > > > >  +++++++++
> > > > >    drivers/gpu/hsa/radeon/kfd_crat.h                  |  294 
> > > > > +++++
> > > > >    drivers/gpu/hsa/radeon/kfd_device.c                |  244 
> > > > > ++++
> > > > >    drivers/gpu/hsa/radeon/kfd_device_queue_manager.c  |  981
> > > > >  ++++++++++++++++
> > > > >    drivers/gpu/hsa/radeon/kfd_device_queue_manager.h  |  101 ++
> > > > >    drivers/gpu/hsa/radeon/kfd_doorbell.c              |  242 
> > > > > ++++
> > > > >    drivers/gpu/hsa/radeon/kfd_interrupt.c             |  177 
> > > > > +++
> > > > >    drivers/gpu/hsa/radeon/kfd_kernel_queue.c          |  305 
> > > > > +++++
> > > > >    drivers/gpu/hsa/radeon/kfd_kernel_queue.h          |   66 ++
> > > > >    drivers/gpu/hsa/radeon/kfd_module.c                |  130 
> > > > > +++
> > > > >    drivers/gpu/hsa/radeon/kfd_mqd_manager.c           |  290 
> > > > > +++++
> > > > >    drivers/gpu/hsa/radeon/kfd_mqd_manager.h           |   54 +
> > > > >    drivers/gpu/hsa/radeon/kfd_packet_manager.c        |  488
> > > > >  ++++++++
> > > > >    drivers/gpu/hsa/radeon/kfd_pasid.c                 |   97 ++
> > > > >    drivers/gpu/hsa/radeon/kfd_pm4_headers.h           |  682
> > > > >  +++++++++++
> > > > >    drivers/gpu/hsa/radeon/kfd_pm4_opcodes.h           |  107 ++
> > > > >    drivers/gpu/hsa/radeon/kfd_priv.h                  |  475
> > > > >  ++++++++
> > > > >    drivers/gpu/hsa/radeon/kfd_process.c               |  391 
> > > > > +++++++
> > > > >    drivers/gpu/hsa/radeon/kfd_process_queue_manager.c |  343 
> > > > > ++++++
> > > > >    drivers/gpu/hsa/radeon/kfd_queue.c                 |  109 ++
> > > > >    drivers/gpu/hsa/radeon/kfd_scheduler.h             |   72 ++
> > > > >    drivers/gpu/hsa/radeon/kfd_topology.c              | 1207
> > > > >  ++++++++++++++++++++
> > > > >    drivers/gpu/hsa/radeon/kfd_topology.h              |  168 
> > > > > +++
> > > > >    drivers/gpu/hsa/radeon/kfd_vidmem.c                |   97 ++
> > > > >    include/linux/mm_types.h                           |   14 +
> > > > >    include/linux/radeon_kfd.h                         |  106 ++
> > > > >    include/uapi/linux/kfd_ioctl.h                     |  133 
> > > > > +++
> > > > >    mm/rmap.c                                          |    8 +-
> > > > >    47 files changed, 9402 insertions(+), 97 deletions(-)
> > > > >    create mode 100644 drivers/gpu/drm/radeon/radeon_kfd.c
> > > > >    create mode 100644 drivers/gpu/hsa/Kconfig
> > > > >    create mode 100644 drivers/gpu/hsa/Makefile
> > > > >    create mode 100644 drivers/gpu/hsa/radeon/Makefile
> > > > >    create mode 100644 drivers/gpu/hsa/radeon/cik_int.h
> > > > >    create mode 100644 drivers/gpu/hsa/radeon/cik_mqds.h
> > > > >    create mode 100644 drivers/gpu/hsa/radeon/cik_regs.h
> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_aperture.c
> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_chardev.c
> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_crat.h
> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_device.c
> > > > >    create mode 100644
> > > > >  drivers/gpu/hsa/radeon/kfd_device_queue_manager.c
> > > > >    create mode 100644
> > > > >  drivers/gpu/hsa/radeon/kfd_device_queue_manager.h
> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_doorbell.c
> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_interrupt.c
> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_kernel_queue.c
> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_kernel_queue.h
> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_module.c
> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_mqd_manager.c
> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_mqd_manager.h
> > > > >    create mode 100644 
> > > > > drivers/gpu/hsa/radeon/kfd_packet_manager.c
> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_pasid.c
> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_pm4_headers.h
> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_pm4_opcodes.h
> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_priv.h
> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_process.c
> > > > >    create mode 100644
> > > > >  drivers/gpu/hsa/radeon/kfd_process_queue_manager.c
> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_queue.c
> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_scheduler.h
> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_topology.c
> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_topology.h
> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_vidmem.c
> > > > >    create mode 100644 include/linux/radeon_kfd.h
> > > > >    create mode 100644 include/uapi/linux/kfd_ioctl.h
> > > > >   --
> > > > >   1.9.1

^ permalink raw reply	[flat|nested] 51+ messages in thread

> > > > >    include/linux/radeon_kfd.h                         |  106 ++
> > > > >    include/uapi/linux/kfd_ioctl.h                     |  133 
> > > > > +++
> > > > >    mm/rmap.c                                          |    8 +-
> > > > >    47 files changed, 9402 insertions(+), 97 deletions(-)
> > > > >    create mode 100644 drivers/gpu/drm/radeon/radeon_kfd.c
> > > > >    create mode 100644 drivers/gpu/hsa/Kconfig
> > > > >    create mode 100644 drivers/gpu/hsa/Makefile
> > > > >    create mode 100644 drivers/gpu/hsa/radeon/Makefile
> > > > >    create mode 100644 drivers/gpu/hsa/radeon/cik_int.h
> > > > >    create mode 100644 drivers/gpu/hsa/radeon/cik_mqds.h
> > > > >    create mode 100644 drivers/gpu/hsa/radeon/cik_regs.h
> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_aperture.c
> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_chardev.c
> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_crat.h
> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_device.c
> > > > >    create mode 100644
> > > > >  drivers/gpu/hsa/radeon/kfd_device_queue_manager.c
> > > > >    create mode 100644
> > > > >  drivers/gpu/hsa/radeon/kfd_device_queue_manager.h
> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_doorbell.c
> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_interrupt.c
> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_kernel_queue.c
> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_kernel_queue.h
> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_module.c
> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_mqd_manager.c
> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_mqd_manager.h
> > > > >    create mode 100644 
> > > > > drivers/gpu/hsa/radeon/kfd_packet_manager.c
> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_pasid.c
> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_pm4_headers.h
> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_pm4_opcodes.h
> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_priv.h
> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_process.c
> > > > >    create mode 100644
> > > > >  drivers/gpu/hsa/radeon/kfd_process_queue_manager.c
> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_queue.c
> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_scheduler.h
> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_topology.c
> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_topology.h
> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_vidmem.c
> > > > >    create mode 100644 include/linux/radeon_kfd.h
> > > > >    create mode 100644 include/uapi/linux/kfd_ioctl.h
> > > > >   --
> > > > >   1.9.1

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 00/83] AMD HSA kernel driver
  2014-07-12 16:49           ` Jerome Glisse
@ 2014-07-13  9:42               ` Daniel Vetter
  0 siblings, 0 replies; 51+ messages in thread
From: Daniel Vetter @ 2014-07-13  9:42 UTC (permalink / raw)
  To: Jerome Glisse
  Cc: Christian König, Gabbay, Oded, oded.gabbay, Lewycky, Andrew,
	linux-kernel, dri-devel, Deucher, Alexander, akpm, Jesse Barnes

On Sat, Jul 12, 2014 at 6:49 PM, Jerome Glisse <j.glisse@gmail.com> wrote:
>> Hm, so the hsa part is a completely new driver/subsystem, not just an
>> additional ioctl tacked onto radeon? The history of drm is littered with
>> "generic" ioctls that turned out to be useful for exactly one driver.
>> Which is why _all_ the command submission is now done with driver-private
>> ioctls.
>>
>> I'd be quite a bit surprised if that suddenly works differently, so before
>> we bless a generic hsa interface I really want to see some implementation
>> from a different vendor (i.e. nvidia or intel) using the same ioctls.
>> Otherwise we just repeat history and I'm not terribly inclined to keep on
>> cleaning up cruft forever - one drm legacy is enough ;-)
>>
>> Jesse is the guy from our side to talk to about this.
>> -Daniel
>
> I am not worried about that side, the hsa foundation has pretty strict
> guidelines on what is hsa compliant hardware, i.e. the hw needs to
> understand the pm4 packet format of radeon (well, a small subset of it).
> But of course this requires hsa compliant hardware, and from the member
> list i am guessing ARM Mali, ImgTech, Qualcomm, ... so unless Intel and
> NVidia join hsa you will not see it for those hardware.
>
> So yes, for once the same ioctl would apply to different hardware. The only
> thing that is different is the shader isa. The hsafoundation site has some
> pdf explaining all that, but someone thought that slideshare would be a
> good idea; personally i would not register to any website just to get the pdf.
>
> So to sum up, i am ok with having a new device file that presents a uniform
> set of ioctls. It would actually be a lot easier for userspace: just open
> this fixed device file and ask for the list of compliant hardware.
>
> Then the radeon kernel driver would register itself as a provider. So all
> ioctl decoding/marshalling would be shared, which makes sense.

There's also the other side, namely that preparing the cp ring in
userspace and submitting the entire pile through a doorbell to the hw
scheduler isn't really hsa exclusive. And for a solid platform with
seamless gpu/cpu integration that means we need standard ways to set
gpu context priorities and get at useful stats like gpu time used by a
given context.

To get there I guess intel/nvidia need to reuse the hsa subsystem with
the command submission adjusted a bit. Kinda like drm, where kms and
buffer sharing are common and cs is driver specific.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
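
A rough sketch of what such standard, vendor-neutral ioctls could look
like -- every name below is hypothetical, invented for illustration, and
none of it is taken from the posted patchset or from any existing driver:

/* hypothetical vendor-neutral uapi sketch -- illustration only */
#include <linux/ioctl.h>
#include <linux/types.h>

struct hsa_set_ctx_priority_args {
	__u32 ctx_id;		/* to driver: context to adjust */
	__u32 priority;		/* to driver: e.g. 0 (lowest) .. 15 (highest) */
};

struct hsa_get_ctx_stats_args {
	__u64 gpu_time_ns;	/* from driver: gpu time used by this context */
	__u32 ctx_id;		/* to driver: context to query */
	__u32 pad;
};

#define HSA_IOC_MAGIC			'H'
#define HSA_IOC_SET_CTX_PRIORITY	_IOW(HSA_IOC_MAGIC, 1, struct hsa_set_ctx_priority_args)
#define HSA_IOC_GET_CTX_STATS		_IOWR(HSA_IOC_MAGIC, 2, struct hsa_get_ctx_stats_args)

The point is only that priority and accounting ioctls like these could be
decoded once in common code for every vendor, while command submission
itself stays behind driver-specific paths, matching the kms/cs split
Daniel describes.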

^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: [PATCH 00/83] AMD HSA kernel driver
  2014-07-13  3:55           ` Jerome Glisse
@ 2014-07-13 15:34             ` Bridgman, John
  -1 siblings, 0 replies; 51+ messages in thread
From: Bridgman, John @ 2014-07-13 15:34 UTC (permalink / raw)
  To: Jerome Glisse, Gabbay, Oded
  Cc: linux-kernel, Deucher, Alexander, Lewycky, Andrew, joro, akpm,
	dri-devel, airlied, oded.gabbay



>-----Original Message-----
>From: Jerome Glisse [mailto:j.glisse@gmail.com]
>Sent: Saturday, July 12, 2014 11:56 PM
>To: Gabbay, Oded
>Cc: linux-kernel@vger.kernel.org; Bridgman, John; Deucher, Alexander;
>Lewycky, Andrew; joro@8bytes.org; akpm@linux-foundation.org; dri-
>devel@lists.freedesktop.org; airlied@linux.ie; oded.gabbay@gmail.com
>Subject: Re: [PATCH 00/83] AMD HSA kernel driver
>
>On Sat, Jul 12, 2014 at 09:55:49PM +0000, Gabbay, Oded wrote:
>> On Fri, 2014-07-11 at 17:18 -0400, Jerome Glisse wrote:
>> > On Thu, Jul 10, 2014 at 10:51:29PM +0000, Gabbay, Oded wrote:
>> > >  On Thu, 2014-07-10 at 18:24 -0400, Jerome Glisse wrote:
>> > > >  On Fri, Jul 11, 2014 at 12:45:27AM +0300, Oded Gabbay wrote:
>> > > > >   This patch set implements a Heterogeneous System
>> > > > > Architecture
>> > > > >  (HSA) driver
>> > > > >   for radeon-family GPUs.
>> > > >  These are just quick comments on a few things. Given the size of
>> > > > this, people will need time to review things.
>> > > > >   HSA allows different processor types (CPUs, DSPs, GPUs,
>> > > > > etc..) to
>> > > > >  share
>> > > > >   system resources more effectively via HW features including
>> > > > > shared pageable
>> > > > >   memory, userspace-accessible work queues, and platform-level
>> > > > > atomics. In
>> > > > >   addition to the memory protection mechanisms in GPUVM and
>> > > > > IOMMUv2, the Sea
>> > > > >   Islands family of GPUs also performs HW-level validation of
>> > > > > commands passed
>> > > > >   in through the queues (aka rings).
>> > > > >   The code in this patch set is intended to serve both as a
>> > > > > sample  driver for
>> > > > >   other HSA-compatible hardware devices and as a production
>> > > > > driver  for
>> > > > >   radeon-family processors. The code is architected to support
>> > > > > multiple CPUs
>> > > > >   each with connected GPUs, although the current
>> > > > > implementation  focuses on a
>> > > > >   single Kaveri/Berlin APU, and works alongside the existing
>> > > > > radeon  kernel
>> > > > >   graphics driver (kgd).
>> > > > >   AMD GPUs designed for use with HSA (Sea Islands and up)
>> > > > > share  some hardware
>> > > > >   functionality between HSA compute and regular gfx/compute
>> > > > > (memory,
>> > > > >   interrupts, registers), while other functionality has been
>> > > > > added
>> > > > >   specifically for HSA compute  (hw scheduler for virtualized
>> > > > > compute rings).
>> > > > >   All shared hardware is owned by the radeon graphics driver,
>> > > > > and  an interface
>> > > > >   between kfd and kgd allows the kfd to make use of those
>> > > > > shared  resources,
>> > > > >   while HSA-specific functionality is managed directly by kfd
>> > > > > by  submitting
>> > > > >   packets into an HSA-specific command queue (the "HIQ").
>> > > > >   During kfd module initialization a char device node
>> > > > > (/dev/kfd) is
>> > > > >  created
>> > > > >   (surviving until module exit), with ioctls for queue
>> > > > > creation &  management,
>> > > > >   and data structures are initialized for managing HSA device
>> > > > > topology.
>> > > > >   The rest of the initialization is driven by calls from the
>> > > > > radeon  kgd at
>> > > > >   the following points :
>> > > > >   - radeon_init (kfd_init)
>> > > > >   - radeon_exit (kfd_fini)
>> > > > >   - radeon_driver_load_kms (kfd_device_probe, kfd_device_init)
>> > > > >   - radeon_driver_unload_kms (kfd_device_fini)
>> > > > >   During the probe and init processing per-device data
>> > > > > structures  are
>> > > > >   established which connect to the associated graphics kernel
>> > > > > driver. This
>> > > > >   information is exposed to userspace via sysfs, along with a
>> > > > > version number
>> > > > >   allowing userspace to determine if a topology change has
>> > > > > occurred  while it
>> > > > >   was reading from sysfs.
>> > > > >   The interface between kfd and kgd also allows the kfd to
>> > > > > request  buffer
>> > > > >   management services from kgd, and allows kgd to route
>> > > > > interrupt  requests to
>> > > > >   kfd code since the interrupt block is shared between regular
>> > > > >   graphics/compute and HSA compute subsystems in the GPU.
>> > > > >   The kfd code works with an open source usermode library
>> > > > >  ("libhsakmt") which
>> > > > >   is in the final stages of IP review and should be published
>> > > > > in a  separate
>> > > > >   repo over the next few days.
>> > > > >   The code operates in one of three modes, selectable via the
>> > > > > sched_policy
>> > > > >   module parameter :
>> > > > >   - sched_policy=0 uses a hardware scheduler running in the
>> > > > > MEC  block within
>> > > > >   CP, and allows oversubscription (more queues than HW slots)
>> > > > >   - sched_policy=1 also uses HW scheduling but does not allow
>> > > > >   oversubscription, so create_queue requests fail when we run
>> > > > > out  of HW slots
>> > > > >   - sched_policy=2 does not use HW scheduling, so the driver
>> > > > > manually assigns
>> > > > >   queues to HW slots by programming registers
>> > > > >   The "no HW scheduling" option is for debug & new hardware
>> > > > > bringup  only, so
>> > > > >   has less test coverage than the other options. Default in
>> > > > > the  current code
>> > > > >   is "HW scheduling without oversubscription" since that is
>> > > > > where  we have the
>> > > > >   most test coverage but we expect to change the default to
>> > > > > "HW  scheduling
>> > > > >   with oversubscription" after further testing. This
>> > > > > effectively  removes the
>> > > > >   HW limit on the number of work queues available to
>> > > > > applications.
>> > > > >   Programs running on the GPU are associated with an address
>> > > > > space  through the
>> > > > >   VMID field, which is translated to a unique PASID at access
>> > > > > time  via a set
>> > > > >   of 16 VMID-to-PASID mapping registers. The available VMIDs
>> > > > > (currently 16)
>> > > > >   are partitioned (under control of the radeon kgd) between
>> > > > > current
>> > > > >   gfx/compute and HSA compute, with each getting 8 in the
>> > > > > current  code. The
>> > > > >   VMID-to-PASID mapping registers are updated by the HW
>> > > > > scheduler  when used,
>> > > > >   and by driver code if HW scheduling is not being used.
>> > > > >   The Sea Islands compute queues use a new "doorbell"
>> > > > > mechanism  instead of the
>> > > > >   earlier kernel-managed write pointer registers. Doorbells
>> > > > > use a  separate BAR
>> > > > >   dedicated for this purpose, and pages within the doorbell
>> > > > > aperture are
>> > > > >   mapped to userspace (each page mapped to only one user
>> > > > > address  space).
>> > > > >   Writes to the doorbell aperture are intercepted by GPU
>> > > > > hardware,  allowing
>> > > > >   userspace code to safely manage work queues (rings) without
>> > > > > requiring a
>> > > > >   kernel call for every ring update.
>> > > > >   First step for an application process is to open the kfd
>> > > > > device.
>> > > > >  Calls to
>> > > > >   open create a kfd "process" structure only for the first
>> > > > > thread  of the
>> > > > >   process. Subsequent open calls are checked to see if they
>> > > > > are  from processes
>> > > > >   using the same mm_struct and, if so, don't do anything. The
>> > > > > kfd  per-process
>> > > > >   data lives as long as the mm_struct exists. Each mm_struct
>> > > > > is  associated
>> > > > >   with a unique PASID, allowing the IOMMUv2 to make userspace
>> > > > > process memory
>> > > > >   accessible to the GPU.
>> > > > >   Next step is for the application to collect topology
>> > > > > information  via sysfs.
>> > > > >   This gives userspace enough information to be able to
>> > > > > identify  specific
>> > > > >   nodes (processors) in subsequent queue management calls.
>> > > > >  Application
>> > > > >   processes can create queues on multiple processors, and
>> > > > > processors support
>> > > > >   queues from multiple processes.
>> > > >  I am not a fan of using sysfs to discover topology.
>> > > > >   At this point the application can create work queues in
>> > > > > userspace  memory and
>> > > > >   pass them through the usermode library to kfd to have them
>> > > > > mapped  onto HW
>> > > > >   queue slots so that commands written to the queues can be
>> > > > > executed by the
>> > > > >   GPU. Queue operations specify a processor node, and so the
>> > > > > bulk  of this code
>> > > > >   is device-specific.
>> > > > >   Written by John Bridgman <John.Bridgman@amd.com>
>> > > >  So a general comment is that you need to collapse many patches:
>> > > > things like 58, fixing kernel style, should be collapsed, i.e. fix
>> > > > all previous patches that have broken style.
>> > > >  Even worse are things like 71, removing code you just added a few
>> > > > patches earlier in the patchset.
>> > >  Not quite, the code was added in patch 11.
>> > > >  This means that if i start reviewing following patch order i might
>> > > > spend time on code that is later deleted/modified/fixed, i.e. time
>> > > > i spend understanding and reading some code might be just wasted.
>> > >  Quick answer is that you are of course right, but having said that,
>> > >  I think it would not be simple at all to do that squashing.
>> > >  I squashed what I could, and probably I can do a little more (like
>> > >  58), but the major problem is that one of the main modules of the
>> > >  driver - the scheduler (QCM) - was completely re-written (patches
>> > >  46-53). Therefore, from patch 1 to 53 we use the old scheduler code,
>> > >  and from 54 we use the new QCM (the old scheduler code was completely
>> > >  removed in 71). So I could maybe squash 71 into 54, but that won't
>> > >  help you much, I think.
>> > >  So, the current advice I can give is to completely ignore the
>> > >  following files because they do not exist in the final commit:
>> > >  - kfd_sched_cik_static.c (stopped being used in 54)
>> > >  - kfd_registers.c (stopped being used in 81 because we moved all
>> > >    register writes to radeon)
>> > >  - kfd_hw_pointer_store.c (alive from 46 to 67)
>> > >  - kfd_hw_pointer_store.h (alive from 46 to 67)
>> > >          Oded
>> >
>> > Ok i try, but this is driving me crazy: i am juggling between the fully
>> > applied patchset and individual patches, going back and forth trying to
>> > find which patch is the one i want to comment on. Not even mentioning
>> > that after that people would need to ack bad patches just because they
>> > know a later good patch fixes the badness.
>> >
>> > This patchset can be reduced to less than 10 big patches. I would first
>> > do core patches that touch core things like iommu and are preparatory
>> > to the patchset. Then kfd patches that do not touch anything in radeon
>> > but implement the skeleton and core infrastructure. Then radeon
>> > specific patches.
>> Hi Jerome,
>> I thought about what you said and I would like to make a suggestion. I
>> believe I can restructure the patchset into 10-20 patches, organized
>> logically, which will make it easier to review.
>>
>> The downside is that you will lose all references to the current
>> patchset and hence all the review work you (and other people) have
>> done so far will be somewhat wasted. Also, the entire git history of
>> the driver development will be lost (at least externally).
>
>This history does not matter much.
>
>>
>> John suggested something a bit different in the email thread of PATCH
>> 07/83. He said to squash everything from patch 2 to 54 (including 71)
>> and leave the remaining patches as they were, with maybe some
>> additional small squashing.
>>
>> Please tell me which option you prefer:
>>
>> A. Continue with the review of the current patchset.
>> B. Go with my suggestion.
>> C. Go with John's suggestion.
>>
>> I'm going tomorrow (Sunday) to prepare options B & C, but I need your
>> input before publishing anything. So, if you could reply by Monday
>> morning my time, that would be great as it will allow me to publish
>> (if chosen) the new set by Monday morning, EST.
>>
>> And thanks again for the time you dedicate to this review. This is
>> highly appreciated.
>>
>>         Oded
>
>Squashing patches together will not be enough; what really needs to happen
>is an hsa module, like the drm module, with the radeon module registering
>itself into this hsa module.

Hi Jerome;

Agree completely that this needs to end up as a cross-vendor core framework, but our HSAF partners are not ready to focus on standardizing the low-level implementation at this point. That left us with two options -- (a) push out a framework now in the hope that other vendors would use it in the future (not a trivial thing given the wide range of hardware that can be used), or (b) push out a minimal driver with only the IOCTLs we felt could be maintained going forward, while (and this is the important part) making sure we avoided code and/or interfaces which would get in the way of making a standard "core HSA" framework in the future. Our feeling was that option (b) had a much greater chance of success and a much lower risk of codifying things in the kernel which ended up not being used in the future. 

Sorry for not including this in the initial writeup -- it was getting too long already but maybe this wasn't the right thing to leave out.

The current implementation (specifically the interface between kfd and kgd) is designed to support multiple gfx drivers for AMD hardware, and we tried to avoid any IOCTLs which would need to be replaced with multi-vendor-under-a-single-framework IOCTLs in the future, so I think we are meeting the abstract goal of not pushing something upstream that will leave litter in the kernel we all get stuck with later... in fact I really do believe this approach has the lowest risk of doing that.
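
A rough sketch of the kind of kfd-->kgd function table this refers to --
the names below are hypothetical illustrations, not the actual interface
from the posted patches:

/* hypothetical kfd-->kgd services table -- illustration only */
#include <linux/types.h>

struct kgd_dev;	/* opaque per-device handle owned by the gfx (kgd) driver */

struct kfd2kgd_calls {
	/* buffer management services provided by the gfx driver */
	int (*allocate_mem)(struct kgd_dev *kgd, size_t size, void **mem);
	void (*free_mem)(struct kgd_dev *kgd, void *mem);

	/* access to hardware blocks shared with regular gfx/compute */
	u64 (*get_vmem_size)(struct kgd_dev *kgd);
	void (*write_register)(struct kgd_dev *kgd, u32 offset, u32 value);
};

Everything above such a table is a candidate for a future core-HSA
framework; everything behind it stays vendor code.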

That said, at minimum we should probably identify points in the code where a "core vs HW-specific" break would live, and comment that in the code so we have an RFC of sorts for that but without trying to implement and upstream a framework in advance of sufficient cross-vendor participation. 

Would something like that be sufficient for now?

Thanks,
JB
>
>First patch: the whole ioctl set of hsa.
>Second patch: core hsa code and an ioctl skeleton that returns ENODEV or alike.
>Third: ioctl marshalling that parses ioctl parameters and handles common ioctls.
>Fourth: add the driver hookup to core hsa and define the driver api.
>Fifth: implement said api for radeon.
>
>Yes, most of those 5 patches will be massive additions of single files but
>really that's what makes sense here.
>
>But before those 5 patches you most likely want to do some ground work on
>the iommu front.
>
>Cheers,
>Jérôme
>
>> >
>> > This leads me to the design: all the core kfd helpers should be in the
>> > hsa directory, and should be helpers, while all the bits specific to
>> > radeon should most likely be part of the radeon gfx kernel module. I am
>> > not against a radeon_kfd but really i do not see the point. There
>> > should be a hsa.ko like there is a drm.ko.
>> >
>> > Also i think it would be beneficial to rename kfd to hsa because it is
>> > the most common name and the name that will bring valid hits with web
>> > searches.
>> >
>> > Each of the patches should have a proper commit message that is not
>> > too cryptic, without being too chatty either.
>> >
>> > Cheers,
>> > Jérôme
>> >
>> > > >  I will come back with individual patch comments and also general
>> > > >  remarks.
>> > > >  Cheers,
>> > > >  Jérôme
>> > > > >   Alexey Skidanov (4):
>> > > > >     hsa/radeon: 32-bit processes support
>> > > > >     hsa/radeon: NULL pointer dereference bug workaround
>> > > > >     hsa/radeon: HSA64/HSA32 modes support
>> > > > >     hsa/radeon: Add local memory to topology
>> > > > >   Andrew Lewycky (3):
>> > > > >     hsa/radeon: Make binding of process to device permanent
>> > > > >     hsa/radeon: Implement hsaKmtSetMemoryPolicy
>> > > > >     mm: Change timing of notification to IOMMUs about a page to be invalidated
>> > > > >   Ben Goz (20):
>> > > > >     hsa/radeon: Add queue and hw_pointer_store modules
>> > > > >     hsa/radeon: Add support allocating kernel doorbells
>> > > > >     hsa/radeon: Add mqd_manager module
>> > > > >     hsa/radeon: Add kernel queue support for KFD
>> > > > >     hsa/radeon: Add module parameter of scheduling policy
>> > > > >     hsa/radeon: Add packet manager module
>> > > > >     hsa/radeon: Add process queue manager module
>> > > > >     hsa/radeon: Add device queue manager module
>> > > > >     hsa/radeon: Switch to new queue scheduler
>> > > > >     hsa/radeon: Add IOCTL for update queue
>> > > > >     hsa/radeon: Queue Management integration with Memory Management
>> > > > >     hsa/radeon: update queue fault handling
>> > > > >     hsa/radeon: fixing a bug to support 32b processes
>> > > > >     hsa/radeon: Fix number of pipes per ME
>> > > > >     hsa/radeon: Removing hw pointer store module
>> > > > >     hsa/radeon: Adding some error messages
>> > > > >     hsa/radeon: Fixing minor issues with kernel queues (DIQ)
>> > > > >     drm/radeon: Add register access functions to kfd2kgd interface
>> > > > >     hsa/radeon: Eliminating all direct register accesses
>> > > > >     drm/radeon: Remove lock functions from kfd2kgd interface
>> > > > >   Evgeny Pinchuk (9):
>> > > > >     hsa/radeon: fix the OEMID assignment in kfd_topology
>> > > > >     drm/radeon: extending kfd-kgd interface
>> > > > >     hsa/radeon: implementing IOCTL for clock counters
>> > > > >     drm/radeon: adding synchronization for GRBM GFX
>> > > > >     hsa/radeon: fixing clock counters bug
>> > > > >     drm/radeon: Extending kfd interface
>> > > > >     hsa/radeon: Adding max clock speeds to topology
>> > > > >     hsa/radeon: Alternating the source of max clock
>> > > > >     hsa/radeon: Exclusive access for perf. counters
>> > > > >   Michael Varga (1):
>> > > > >     hsa/radeon: debugging print statements
>> > > > >   Oded Gabbay (45):
>> > > > >     mm: Add kfd_process pointer to mm_struct
>> > > > >     drm/radeon: reduce number of free VMIDs and pipes in KV
>> > > > >     drm/radeon: Report doorbell configuration to kfd
>> > > > >     drm/radeon: Add radeon <--> kfd interface
>> > > > >     drm/radeon: Add kfd-->kgd interface to get virtual ram size
>> > > > >     drm/radeon: Add kfd-->kgd interfaces of memory allocation/mapping
>> > > > >     drm/radeon: Add kfd-->kgd interface of locking srbm_gfx_cntl register
>> > > > >     drm/radeon: Add calls to initialize and finalize kfd from radeon
>> > > > >     hsa/radeon: Add code base of hsa driver for AMD's GPUs
>> > > > >     hsa/radeon: Add initialization and unmapping of doorbell aperture
>> > > > >     hsa/radeon: Add scheduler code
>> > > > >     hsa/radeon: Add kfd mmap handler
>> > > > >     hsa/radeon: Add 2 new IOCTL to kfd, CREATE_QUEUE and DESTROY_QUEUE
>> > > > >     hsa/radeon: Update MAINTAINERS and CREDITS files
>> > > > >     hsa/radeon: Add interrupt handling module
>> > > > >     hsa/radeon: Add the isr function of the KFD scehduler
>> > > > >     hsa/radeon: Handle deactivation of queues using interrupts
>> > > > >     hsa/radeon: Enable interrupts in KFD scheduler
>> > > > >     hsa/radeon: Enable/Disable KFD interrupt module
>> > > > >     hsa/radeon: Add interrupt callback function to kgd2kfd interface
>> > > > >     hsa/radeon: Add kgd-->kfd interfaces for suspend and resume
>> > > > >     drm/radeon: Add calls to suspend and resume of kfd driver
>> > > > >     drm/radeon/cik: Don't touch int of pipes 1-7
>> > > > >     drm/radeon/cik: Call kfd isr function
>> > > > >     hsa/radeon: Fix memory size allocated for HPD
>> > > > >     hsa/radeon: Fix list of supported devices
>> > > > >     hsa/radeon: Fix coding style in cik_int.h
>> > > > >     hsa/radeon: Print ioctl commnad only in debug mode
>> > > > >     hsa/radeon: Print ISR info only in debug mode
>> > > > >     hsa/radeon: Workaround for a bug in amd_iommu
>> > > > >     hsa/radeon: Eliminate warnings in compilation
>> > > > >     hsa/radeon: Various kernel styling fixes
>> > > > >     hsa/radeon: Rearrange structures in kfd_ioctl.h
>> > > > >     hsa/radeon: change another pr_info to pr_debug
>> > > > >     hsa/radeon: Fix timeout calculation in sync_with_hw
>> > > > >     hsa/radeon: Update module information and version
>> > > > >     hsa/radeon: Update module version to 0.6.0
>> > > > >     hsa/radeon: Fix initialization of sh_mem registers
>> > > > >     hsa/radeon: Fix compilation warnings
>> > > > >     hsa/radeon: Remove old scheduler code
>> > > > >     hsa/radeon: Static analysis (smatch) fixes
>> > > > >     hsa/radeon: Check oversubscription before destroying runlist
>> > > > >     hsa/radeon: Don't verify cksum when parsing CRAT table
>> > > > >     hsa/radeon: Update module version to 0.6.1
>> > > > >     hsa/radeon: Update module version to 0.6.2
>> > > > >   Yair Shachar (1):
>> > > > >     hsa/radeon: Adding qcm fence return status
>> > > > >    CREDITS                                            |    7 +
>> > > > >    MAINTAINERS                                        |    8 +
>> > > > >    drivers/Kconfig                                    |    2 +
>> > > > >    drivers/gpu/Makefile                               |    1 +
>> > > > >    drivers/gpu/drm/radeon/Makefile                    |    1 +
>> > > > >    drivers/gpu/drm/radeon/cik.c                       |  156 +--
>> > > > >    drivers/gpu/drm/radeon/cikd.h                      |   51 +-
>> > > > >    drivers/gpu/drm/radeon/radeon.h                    |    9 +
>> > > > >    drivers/gpu/drm/radeon/radeon_device.c             |   32 +
>> > > > >    drivers/gpu/drm/radeon/radeon_drv.c                |    6 +
>> > > > >    drivers/gpu/drm/radeon/radeon_kfd.c                |  630 ++++++++++
>> > > > >    drivers/gpu/drm/radeon/radeon_kms.c                |    9 +
>> > > > >    drivers/gpu/hsa/Kconfig                            |   20 +
>> > > > >    drivers/gpu/hsa/Makefile                           |    1 +
>> > > > >    drivers/gpu/hsa/radeon/Makefile                    |   12 +
>> > > > >    drivers/gpu/hsa/radeon/cik_int.h                   |   50 +
>> > > > >    drivers/gpu/hsa/radeon/cik_mqds.h                  |  250 ++++
>> > > > >    drivers/gpu/hsa/radeon/cik_regs.h                  |  220 ++++
>> > > > >    drivers/gpu/hsa/radeon/kfd_aperture.c              |  123 ++
>> > > > >    drivers/gpu/hsa/radeon/kfd_chardev.c               |  530 +++++++++
>> > > > >    drivers/gpu/hsa/radeon/kfd_crat.h                  |  294 +++++
>> > > > >    drivers/gpu/hsa/radeon/kfd_device.c                |  244 ++++
>> > > > >    drivers/gpu/hsa/radeon/kfd_device_queue_manager.c  |  981 ++++++++++++++++
>> > > > >    drivers/gpu/hsa/radeon/kfd_device_queue_manager.h  |  101 ++
>> > > > >    drivers/gpu/hsa/radeon/kfd_doorbell.c              |  242 ++++
>> > > > >    drivers/gpu/hsa/radeon/kfd_interrupt.c             |  177 +++
>> > > > >    drivers/gpu/hsa/radeon/kfd_kernel_queue.c          |  305 +++++
>> > > > >    drivers/gpu/hsa/radeon/kfd_kernel_queue.h          |   66 ++
>> > > > >    drivers/gpu/hsa/radeon/kfd_module.c                |  130 +++
>> > > > >    drivers/gpu/hsa/radeon/kfd_mqd_manager.c           |  290 +++++
>> > > > >    drivers/gpu/hsa/radeon/kfd_mqd_manager.h           |   54 +
>> > > > >    drivers/gpu/hsa/radeon/kfd_packet_manager.c        |  488 ++++++++
>> > > > >    drivers/gpu/hsa/radeon/kfd_pasid.c                 |   97 ++
>> > > > >    drivers/gpu/hsa/radeon/kfd_pm4_headers.h           |  682 +++++++++++
>> > > > >    drivers/gpu/hsa/radeon/kfd_pm4_opcodes.h           |  107 ++
>> > > > >    drivers/gpu/hsa/radeon/kfd_priv.h                  |  475 ++++++++
>> > > > >    drivers/gpu/hsa/radeon/kfd_process.c               |  391 +++++++
>> > > > >    drivers/gpu/hsa/radeon/kfd_process_queue_manager.c |  343 ++++++
>> > > > >    drivers/gpu/hsa/radeon/kfd_queue.c                 |  109 ++
>> > > > >    drivers/gpu/hsa/radeon/kfd_scheduler.h             |   72 ++
>> > > > >    drivers/gpu/hsa/radeon/kfd_topology.c              | 1207 ++++++++++++++++++++
>> > > > >    drivers/gpu/hsa/radeon/kfd_topology.h              |  168 +++
>> > > > >    drivers/gpu/hsa/radeon/kfd_vidmem.c                |   97 ++
>> > > > >    include/linux/mm_types.h                           |   14 +
>> > > > >    include/linux/radeon_kfd.h                         |  106 ++
>> > > > >    include/uapi/linux/kfd_ioctl.h                     |  133 +++
>> > > > >    mm/rmap.c                                          |    8 +-
>> > > > >    47 files changed, 9402 insertions(+), 97 deletions(-)
>> > > > >    create mode 100644 drivers/gpu/drm/radeon/radeon_kfd.c
>> > > > >    create mode 100644 drivers/gpu/hsa/Kconfig
>> > > > >    create mode 100644 drivers/gpu/hsa/Makefile
>> > > > >    create mode 100644 drivers/gpu/hsa/radeon/Makefile
>> > > > >    create mode 100644 drivers/gpu/hsa/radeon/cik_int.h
>> > > > >    create mode 100644 drivers/gpu/hsa/radeon/cik_mqds.h
>> > > > >    create mode 100644 drivers/gpu/hsa/radeon/cik_regs.h
>> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_aperture.c
>> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_chardev.c
>> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_crat.h
>> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_device.c
>> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_device_queue_manager.c
>> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_device_queue_manager.h
>> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_doorbell.c
>> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_interrupt.c
>> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_kernel_queue.c
>> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_kernel_queue.h
>> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_module.c
>> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_mqd_manager.c
>> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_mqd_manager.h
>> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_packet_manager.c
>> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_pasid.c
>> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_pm4_headers.h
>> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_pm4_opcodes.h
>> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_priv.h
>> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_process.c
>> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_process_queue_manager.c
>> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_queue.c
>> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_scheduler.h
>> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_topology.c
>> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_topology.h
>> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_vidmem.c
>> > > > >    create mode 100644 include/linux/radeon_kfd.h
>> > > > >    create mode 100644 include/uapi/linux/kfd_ioctl.h
>> > > > >   --
>> > > > >   1.9.1
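
A rough sketch of the provider registration Jerome proposes above (an
hsa.ko core that vendor drivers plug into, much like drivers plug into
drm.ko) -- all names below are hypothetical, and nothing here is taken
from the posted patches:

/* hypothetical core-hsa provider registration -- illustration only */
#include <linux/list.h>
#include <linux/module.h>
#include <linux/mutex.h>

struct hsa_device_ops {
	/* driver-specific backend behind the common ioctls */
	int (*create_queue)(void *drv_priv, unsigned int node_id);
	void (*destroy_queue)(void *drv_priv, unsigned int queue_id);
};

struct hsa_device {
	struct list_head list;
	const struct hsa_device_ops *ops;
	void *drv_priv;		/* e.g. radeon's per-device data */
};

static LIST_HEAD(hsa_devices);
static DEFINE_MUTEX(hsa_devices_lock);

/* called by a vendor driver (e.g. radeon) at probe time */
int hsa_register_device(struct hsa_device *hdev)
{
	mutex_lock(&hsa_devices_lock);
	list_add_tail(&hdev->list, &hsa_devices);
	mutex_unlock(&hsa_devices_lock);
	return 0;
}
EXPORT_SYMBOL_GPL(hsa_register_device);

void hsa_unregister_device(struct hsa_device *hdev)
{
	mutex_lock(&hsa_devices_lock);
	list_del(&hdev->list);
	mutex_unlock(&hsa_devices_lock);
}
EXPORT_SYMBOL_GPL(hsa_unregister_device);

The common char device and ioctl decoding/marshalling would then live in
the core module, which dispatches each ioctl to the registered device it
targets; radeon would only implement the ops.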

^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: [PATCH 00/83] AMD HSA kernel driver
@ 2014-07-13 15:34             ` Bridgman, John
  0 siblings, 0 replies; 51+ messages in thread
From: Bridgman, John @ 2014-07-13 15:34 UTC (permalink / raw)
  To: Jerome Glisse, Gabbay, Oded
  Cc: Lewycky, Andrew, linux-kernel, dri-devel, Deucher, Alexander, akpm



>-----Original Message-----
>From: Jerome Glisse [mailto:j.glisse@gmail.com]
>Sent: Saturday, July 12, 2014 11:56 PM
>To: Gabbay, Oded
>Cc: linux-kernel@vger.kernel.org; Bridgman, John; Deucher, Alexander;
>Lewycky, Andrew; joro@8bytes.org; akpm@linux-foundation.org; dri-
>devel@lists.freedesktop.org; airlied@linux.ie; oded.gabbay@gmail.com
>Subject: Re: [PATCH 00/83] AMD HSA kernel driver
>
>On Sat, Jul 12, 2014 at 09:55:49PM +0000, Gabbay, Oded wrote:
>> On Fri, 2014-07-11 at 17:18 -0400, Jerome Glisse wrote:
>> > On Thu, Jul 10, 2014 at 10:51:29PM +0000, Gabbay, Oded wrote:
>> > >  On Thu, 2014-07-10 at 18:24 -0400, Jerome Glisse wrote:
>> > > >  On Fri, Jul 11, 2014 at 12:45:27AM +0300, Oded Gabbay wrote:
>> > > > >   This patch set implements a Heterogeneous System
>> > > > > Architecture
>> > > > >  (HSA) driver
>> > > > >   for radeon-family GPUs.
>> > > >  This is just quick comments on few things. Given size of this,
>> > > > people  will need to have time to review things.
>> > > > >   HSA allows different processor types (CPUs, DSPs, GPUs,
>> > > > > etc..) to
>> > > > >  share
>> > > > >   system resources more effectively via HW features including
>> > > > > shared pageable
>> > > > >   memory, userspace-accessible work queues, and platform-level
>> > > > > atomics. In
>> > > > >   addition to the memory protection mechanisms in GPUVM and
>> > > > > IOMMUv2, the Sea
>> > > > >   Islands family of GPUs also performs HW-level validation of
>> > > > > commands passed
>> > > > >   in through the queues (aka rings).
>> > > > >   The code in this patch set is intended to serve both as a
>> > > > > sample  driver for
>> > > > >   other HSA-compatible hardware devices and as a production
>> > > > > driver  for
>> > > > >   radeon-family processors. The code is architected to support
>> > > > > multiple CPUs
>> > > > >   each with connected GPUs, although the current
>> > > > > implementation  focuses on a
>> > > > >   single Kaveri/Berlin APU, and works alongside the existing
>> > > > > radeon  kernel
>> > > > >   graphics driver (kgd).
>> > > > >   AMD GPUs designed for use with HSA (Sea Islands and up)
>> > > > > share  some hardware
>> > > > >   functionality between HSA compute and regular gfx/compute
>> > > > > (memory,
>> > > > >   interrupts, registers), while other functionality has been
>> > > > > added
>> > > > >   specifically for HSA compute  (hw scheduler for virtualized
>> > > > > compute rings).
>> > > > >   All shared hardware is owned by the radeon graphics driver,
>> > > > > and  an interface
>> > > > >   between kfd and kgd allows the kfd to make use of those
>> > > > > shared  resources,
>> > > > >   while HSA-specific functionality is managed directly by kfd
>> > > > > by  submitting
>> > > > >   packets into an HSA-specific command queue (the "HIQ").
>> > > > >   During kfd module initialization a char device node
>> > > > > (/dev/kfd) is
>> > > > >  created
>> > > > >   (surviving until module exit), with ioctls for queue
>> > > > > creation &  management,
>> > > > >   and data structures are initialized for managing HSA device
>> > > > > topology.
>> > > > >   The rest of the initialization is driven by calls from the
>> > > > > radeon  kgd at
>> > > > >   the following points :
>> > > > >   - radeon_init (kfd_init)
>> > > > >   - radeon_exit (kfd_fini)
>> > > > >   - radeon_driver_load_kms (kfd_device_probe, kfd_device_init)
>> > > > >   - radeon_driver_unload_kms (kfd_device_fini)
>> > > > >   During the probe and init processing per-device data
>> > > > > structures  are
>> > > > >   established which connect to the associated graphics kernel
>> > > > > driver. This
>> > > > >   information is exposed to userspace via sysfs, along with a
>> > > > > version number
>> > > > >   allowing userspace to determine if a topology change has
>> > > > > occurred  while it
>> > > > >   was reading from sysfs.
>> > > > >   The interface between kfd and kgd also allows the kfd to
>> > > > > request  buffer
>> > > > >   management services from kgd, and allows kgd to route
>> > > > > interrupt  requests to
>> > > > >   kfd code since the interrupt block is shared between regular
>> > > > >   graphics/compute and HSA compute subsystems in the GPU.
>> > > > >   The kfd code works with an open source usermode library
>> > > > >  ("libhsakmt") which
>> > > > >   is in the final stages of IP review and should be published
>> > > > > in a  separate
>> > > > >   repo over the next few days.
>> > > > >   The code operates in one of three modes, selectable via the
>> > > > > sched_policy
>> > > > >   module parameter :
>> > > > >   - sched_policy=0 uses a hardware scheduler running in the
>> > > > > MEC  block within
>> > > > >   CP, and allows oversubscription (more queues than HW slots)
>> > > > >   - sched_policy=1 also uses HW scheduling but does not allow
>> > > > >   oversubscription, so create_queue requests fail when we run
>> > > > > out  of HW slots
>> > > > >   - sched_policy=2 does not use HW scheduling, so the driver
>> > > > > manually assigns
>> > > > >   queues to HW slots by programming registers
>> > > > >   The "no HW scheduling" option is for debug & new hardware
>> > > > > bringup  only, so
>> > > > >   has less test coverage than the other options. Default in
>> > > > > the  current code
>> > > > >   is "HW scheduling without oversubscription" since that is
>> > > > > where  we have the
>> > > > >   most test coverage but we expect to change the default to
>> > > > > "HW  scheduling
>> > > > >   with oversubscription" after further testing. This
>> > > > > effectively  removes the
>> > > > >   HW limit on the number of work queues available to
>> > > > > applications.
>> > > > >   Programs running on the GPU are associated with an address
>> > > > > space  through the
>> > > > >   VMID field, which is translated to a unique PASID at access
>> > > > > time  via a set
>> > > > >   of 16 VMID-to-PASID mapping registers. The available VMIDs
>> > > > > (currently 16)
>> > > > >   are partitioned (under control of the radeon kgd) between
>> > > > > current
>> > > > >   gfx/compute and HSA compute, with each getting 8 in the
>> > > > > current  code. The
>> > > > >   VMID-to-PASID mapping registers are updated by the HW
>> > > > > scheduler  when used,
>> > > > >   and by driver code if HW scheduling is not being used.
>> > > > >   The Sea Islands compute queues use a new "doorbell"
>> > > > > mechanism  instead of the
>> > > > >   earlier kernel-managed write pointer registers. Doorbells
>> > > > > use a  separate BAR
>> > > > >   dedicated for this purpose, and pages within the doorbell
>> > > > > aperture are
>> > > > >   mapped to userspace (each page mapped to only one user
>> > > > > address  space).
>> > > > >   Writes to the doorbell aperture are intercepted by GPU
>> > > > > hardware,  allowing
>> > > > >   userspace code to safely manage work queues (rings) without
>> > > > > requiring a
>> > > > >   kernel call for every ring update.
>> > > > >   First step for an application process is to open the kfd
>> > > > > device.
>> > > > >  Calls to
>> > > > >   open create a kfd "process" structure only for the first
>> > > > > thread  of the
>> > > > >   process. Subsequent open calls are checked to see if they
>> > > > > are  from processes
>> > > > >   using the same mm_struct and, if so, don't do anything. The
>> > > > > kfd  per-process
>> > > > >   data lives as long as the mm_struct exists. Each mm_struct
>> > > > > is  associated
>> > > > >   with a unique PASID, allowing the IOMMUv2 to make userspace
>> > > > > process memory
>> > > > >   accessible to the GPU.
>> > > > >   Next step is for the application to collect topology
>> > > > > information  via sysfs.
>> > > > >   This gives userspace enough information to be able to
>> > > > > identify  specific
>> > > > >   nodes (processors) in subsequent queue management calls.
>> > > > >  Application
>> > > > >   processes can create queues on multiple processors, and
>> > > > > processors support
>> > > > >   queues from multiple processes.
>> > > >  I am not a fan to use sysfs to discover topoly.
>> > > > >   At this point the application can create work queues in
>> > > > > userspace  memory and
>> > > > >   pass them through the usermode library to kfd to have them
>> > > > > mapped  onto HW
>> > > > >   queue slots so that commands written to the queues can be
>> > > > > executed by the
>> > > > >   GPU. Queue operations specify a processor node, and so the
>> > > > > bulk  of this code
>> > > > >   is device-specific.
>> > > > >   Written by John Bridgman <John.Bridgman@amd.com>
>> > > >  So general comment is you need to collapse many patches things
>> > > > like
>> > > >  58 fixing
>> > > >  kernel style should be collapsed ie fix all previous patch that
>> > > > have  broken  style.
>> > > >  Even worse is thing like 71, removing code you just added few
>> > > > patch  earlier  in the patchset.
>> > >  Not quite, the code was added on patch 11.
>> > > >  This means that if i start reviewing following  patch order  i
>> > > > might spend time on code that is later deleted/modified/fixed ie
>> > > > time i  spend understanding and reading some code might be just
>> > > > wasted.
>> > >  Quick answer is that you are of course right, but having said
>> > > that, I
>> > >  think it would be not simple at all to do that squashing.
>> > >  I squashed what I could, and probably I can do a little more (like
>> > >  58), but the major problem is that one of the main modules of the
>> > >  driver - the scheduler (QCM) - was completely re-written (patches
>> > >  46-53). Therefore, from patch 1 to 53, we use the old scheduler
>> > > code
>> > >  and from 54 we use the new QCM (and the old scheduler code was
>> > >  completely remove at 71). So I could maybe squash 71 into 54, but
>> > > that
>> > >  won't help you much, I think.
>> > >  So, the current advice I can give is to completely ignore the
>> > >  following files because they do not exist in the final commit:
>> > >  - kfd_sched_cik_static.c (stopped using in 54)
>> > >  - kfd_registers.c (stopped using in 81 because we moved all
>> > > register
>> > >  writes to radeon)
>> > >  - kfd_hw_pointer_store.c (alive from 46 to 67)
>> > >  - kfd_hw_pointer_store.h (alive from 46 to 67)
>> > >          Oded
>> >
>> > Ok i try but this is driving me crazy, i am jungling btw full applied
>> > patchset and individual patch going back and forth trying to find
>> > which
>> > patch is the one i want to comment on. Not even mentioning that after
>> > that people would need to ack bad patch just because they know a
>> > latter
>> > good patch fix the badness.
>> >
>> > This patchset can be reduced to less then 10 big patches. I would
>> > first
>> > do core patches that touch core things like iommu and are preparatory
>> > to the patchset. Then kfd patches that do not touch anything in
>> > radeon
>> > but implement the skeleton and core infrastructure. Then radeon
>> > specific
>> > patches.
>> Hi Jerome,
>> I thought about what you said and I would like to make a suggestion. I
>> believe I can restructure the patchset into 10-20 patches, organized
>> logically and will be easier to review.
>>
>> The downside is that you will lose all references to the current
>> patchset and hence, all the review work you (and other people) have
>> done so far will be somewhat wasted. Also, the entire git history of
>> the driver development will be lost (at least externally).
>
>This history does not matter much.
>
>>
>> John suggested something a bit different in the email thread of PATCH
>> 07/83. He said to squash everything from patch 2 to 54 (including 71)
>> and leave the remaining patches as they were, with maybe some
>> additional small squashing.
>>
>> Please tell me which option do you prefer:
>>
>> A. Continue with the review of the current patchset.
>> B. Go with my suggestion.
>> C. Go with John's suggestion.
>>
>> I'm going tomorrow (Sunday) to prepare options B & C, but I need your
>> input before publishing anything. So, if you could reply by Monday
>> morning my time, that would be great as it will allow me to publish
>> (if chosen) the new set by Monday morning, EST.
>>
>> And thanks again for the time you dedicate to this review. This is
>> highly appreciated.
>>
>>         Oded
>
>Squashing patch together will not be enough, what really needs to happen
>is a hsa module like drm module and radeon module registering itself into
>this hsa module.

Hi Jerome;

Agree completely that this needs to end up as a cross-vendor core framework, but our HSAF partners are not ready to focus on standardizing the low-level implementation this point. That left us with two options -- (a) push out a framework now in the hope that other vendors would use it in the future (not a trivial thing given the wide range of hardware that can be used), or (b) push out a minimal driver with only the IOCTLs we felt could be maintained going forward, while (and this is the important part) making sure we avoided code and/or interfaces which would get in the way of making a standard "core HSA" framework in the future. Our feeling was that option (b) had a much greater chance of success and much lower risk of codifying things in the kernel which ended up not being used in the future. 

Sorry for not including this in the initial writeup -- it was getting too long already but maybe this wasn't the right thing to leave out.

The current implementation (specifically the interface between kfd and kgd) is designed to support multiple gfx drivers for AMD hardware, and we tried to avoid any IOCTLs which would need to be replaced with multi-vendor-under-a-single-framework IOCTLs in the future, so I think we are meeting the abstract goal of not pushing something upstream that will leave litter in the kernel we all get stuck with later... in fact I really do believe this approach has the lowest risk of doing that.

That said, at minimum we should probably identify points in the code where a "core vs HW-specific" break would live, and comment that in the code so we have an RFC of sorts for that but without trying to implement and upstream a framework in advance of sufficient cross-vendor participation. 

Would something like that be sufficient for now ? 

Thanks,
JB
>
>First patch whole ioctl set of hsa.
>Second patch core hsa code and ioctl sekeleton that returns ENODEV or alike.
>Third ioctl marshaling that parse ioctl parameter and handle common ioctl.
>Fourth add the driver hookup to core hsa and define the driver api.
>Fith implement said api for radeon.
>
>Yes most of those 5 patches will be massive addition of single files but
>really that what makes sense here.
>
>But before those 5 patches you most likely want to do some ground work on
>the iommu front.
>
>Cheers,
>Jérôme
>
>> >
>> > This lead me to the design, all the core kfd helper should be in hsa
>> > directory, and should be helper, while all specific bits to radeon
>> > should most likely be part of the radeon gfx kernel module. I am not
>> > against a radeon_kfd but realy i do not see the point. There should
>> > be a hsa.ko like there is a drm.ko.
>> >
>> > Also i think it would be benefical to rename kfd to hsa because it is
>> > the most common name and the name that will bring valid hit with web
>> > search.
>> >
>> > Each of the patch should have a proper commit message that is not too
>> > cryptic without being too chatty either.
>> >
>> > Cheers,
>> > Jérôme
>> >
>> > > >  I will come back with individual patch comments and also general
>> > > >  remarks.
>> > > >  Cheers,
>> > > >  Jérôme
>> > > > >   Alexey Skidanov (4):
>> > > > >     hsa/radeon: 32-bit processes support
>> > > > >     hsa/radeon: NULL pointer dereference bug workaround
>> > > > >     hsa/radeon: HSA64/HSA32 modes support
>> > > > >     hsa/radeon: Add local memory to topology
>> > > > >   Andrew Lewycky (3):
>> > > > >     hsa/radeon: Make binding of process to device permanent
>> > > > >     hsa/radeon: Implement hsaKmtSetMemoryPolicy
>> > > > >     mm: Change timing of notification to IOMMUs about a page to be invalidated
>> > > > >   Ben Goz (20):
>> > > > >     hsa/radeon: Add queue and hw_pointer_store modules
>> > > > >     hsa/radeon: Add support allocating kernel doorbells
>> > > > >     hsa/radeon: Add mqd_manager module
>> > > > >     hsa/radeon: Add kernel queue support for KFD
>> > > > >     hsa/radeon: Add module parameter of scheduling policy
>> > > > >     hsa/radeon: Add packet manager module
>> > > > >     hsa/radeon: Add process queue manager module
>> > > > >     hsa/radeon: Add device queue manager module
>> > > > >     hsa/radeon: Switch to new queue scheduler
>> > > > >     hsa/radeon: Add IOCTL for update queue
>> > > > >     hsa/radeon: Queue Management integration with Memory Management
>> > > > >     hsa/radeon: update queue fault handling
>> > > > >     hsa/radeon: fixing a bug to support 32b processes
>> > > > >     hsa/radeon: Fix number of pipes per ME
>> > > > >     hsa/radeon: Removing hw pointer store module
>> > > > >     hsa/radeon: Adding some error messages
>> > > > >     hsa/radeon: Fixing minor issues with kernel queues (DIQ)
>> > > > >     drm/radeon: Add register access functions to kfd2kgd interface
>> > > > >     hsa/radeon: Eliminating all direct register accesses
>> > > > >     drm/radeon: Remove lock functions from kfd2kgd interface
>> > > > >   Evgeny Pinchuk (9):
>> > > > >     hsa/radeon: fix the OEMID assignment in kfd_topology
>> > > > >     drm/radeon: extending kfd-kgd interface
>> > > > >     hsa/radeon: implementing IOCTL for clock counters
>> > > > >     drm/radeon: adding synchronization for GRBM GFX
>> > > > >     hsa/radeon: fixing clock counters bug
>> > > > >     drm/radeon: Extending kfd interface
>> > > > >     hsa/radeon: Adding max clock speeds to topology
>> > > > >     hsa/radeon: Alternating the source of max clock
>> > > > >     hsa/radeon: Exclusive access for perf. counters
>> > > > >   Michael Varga (1):
>> > > > >     hsa/radeon: debugging print statements
>> > > > >   Oded Gabbay (45):
>> > > > >     mm: Add kfd_process pointer to mm_struct
>> > > > >     drm/radeon: reduce number of free VMIDs and pipes in KV
>> > > > >     drm/radeon: Report doorbell configuration to kfd
>> > > > >     drm/radeon: Add radeon <--> kfd interface
>> > > > >     drm/radeon: Add kfd-->kgd interface to get virtual ram size
>> > > > >     drm/radeon: Add kfd-->kgd interfaces of memory allocation/mapping
>> > > > >     drm/radeon: Add kfd-->kgd interface of locking srbm_gfx_cntl register
>> > > > >     drm/radeon: Add calls to initialize and finalize kfd from radeon
>> > > > >     hsa/radeon: Add code base of hsa driver for AMD's GPUs
>> > > > >     hsa/radeon: Add initialization and unmapping of doorbell aperture
>> > > > >     hsa/radeon: Add scheduler code
>> > > > >     hsa/radeon: Add kfd mmap handler
>> > > > >     hsa/radeon: Add 2 new IOCTL to kfd, CREATE_QUEUE and DESTROY_QUEUE
>> > > > >     hsa/radeon: Update MAINTAINERS and CREDITS files
>> > > > >     hsa/radeon: Add interrupt handling module
>> > > > >     hsa/radeon: Add the isr function of the KFD scehduler
>> > > > >     hsa/radeon: Handle deactivation of queues using interrupts
>> > > > >     hsa/radeon: Enable interrupts in KFD scheduler
>> > > > >     hsa/radeon: Enable/Disable KFD interrupt module
>> > > > >     hsa/radeon: Add interrupt callback function to kgd2kfd interface
>> > > > >     hsa/radeon: Add kgd-->kfd interfaces for suspend and resume
>> > > > >     drm/radeon: Add calls to suspend and resume of kfd driver
>> > > > >     drm/radeon/cik: Don't touch int of pipes 1-7
>> > > > >     drm/radeon/cik: Call kfd isr function
>> > > > >     hsa/radeon: Fix memory size allocated for HPD
>> > > > >     hsa/radeon: Fix list of supported devices
>> > > > >     hsa/radeon: Fix coding style in cik_int.h
>> > > > >     hsa/radeon: Print ioctl commnad only in debug mode
>> > > > >     hsa/radeon: Print ISR info only in debug mode
>> > > > >     hsa/radeon: Workaround for a bug in amd_iommu
>> > > > >     hsa/radeon: Eliminate warnings in compilation
>> > > > >     hsa/radeon: Various kernel styling fixes
>> > > > >     hsa/radeon: Rearrange structures in kfd_ioctl.h
>> > > > >     hsa/radeon: change another pr_info to pr_debug
>> > > > >     hsa/radeon: Fix timeout calculation in sync_with_hw
>> > > > >     hsa/radeon: Update module information and version
>> > > > >     hsa/radeon: Update module version to 0.6.0
>> > > > >     hsa/radeon: Fix initialization of sh_mem registers
>> > > > >     hsa/radeon: Fix compilation warnings
>> > > > >     hsa/radeon: Remove old scheduler code
>> > > > >     hsa/radeon: Static analysis (smatch) fixes
>> > > > >     hsa/radeon: Check oversubscription before destroying runlist
>> > > > >     hsa/radeon: Don't verify cksum when parsing CRAT table
>> > > > >     hsa/radeon: Update module version to 0.6.1
>> > > > >     hsa/radeon: Update module version to 0.6.2
>> > > > >   Yair Shachar (1):
>> > > > >     hsa/radeon: Adding qcm fence return status
>> > > > >    CREDITS                                            |    7 +
>> > > > >    MAINTAINERS                                        |    8 +
>> > > > >    drivers/Kconfig                                    |    2 +
>> > > > >    drivers/gpu/Makefile                               |    1 +
>> > > > >    drivers/gpu/drm/radeon/Makefile                    |    1 +
>> > > > >    drivers/gpu/drm/radeon/cik.c                       |  156 +--
>> > > > >    drivers/gpu/drm/radeon/cikd.h                      |   51 +-
>> > > > >    drivers/gpu/drm/radeon/radeon.h                    |    9 +
>> > > > >    drivers/gpu/drm/radeon/radeon_device.c             |   32 +
>> > > > >    drivers/gpu/drm/radeon/radeon_drv.c                |    6 +
>> > > > >    drivers/gpu/drm/radeon/radeon_kfd.c                |  630 ++++++++++
>> > > > >    drivers/gpu/drm/radeon/radeon_kms.c                |    9 +
>> > > > >    drivers/gpu/hsa/Kconfig                            |   20 +
>> > > > >    drivers/gpu/hsa/Makefile                           |    1 +
>> > > > >    drivers/gpu/hsa/radeon/Makefile                    |   12 +
>> > > > >    drivers/gpu/hsa/radeon/cik_int.h                   |   50 +
>> > > > >    drivers/gpu/hsa/radeon/cik_mqds.h                  |  250 ++++
>> > > > >    drivers/gpu/hsa/radeon/cik_regs.h                  |  220 ++++
>> > > > >    drivers/gpu/hsa/radeon/kfd_aperture.c              |  123 ++
>> > > > >    drivers/gpu/hsa/radeon/kfd_chardev.c               |  530 +++++++++
>> > > > >    drivers/gpu/hsa/radeon/kfd_crat.h                  |  294 +++++
>> > > > >    drivers/gpu/hsa/radeon/kfd_device.c                |  244 ++++
>> > > > >    drivers/gpu/hsa/radeon/kfd_device_queue_manager.c  |  981 ++++++++++++++++
>> > > > >    drivers/gpu/hsa/radeon/kfd_device_queue_manager.h  |  101 ++
>> > > > >    drivers/gpu/hsa/radeon/kfd_doorbell.c              |  242 ++++
>> > > > >    drivers/gpu/hsa/radeon/kfd_interrupt.c             |  177 +++
>> > > > >    drivers/gpu/hsa/radeon/kfd_kernel_queue.c          |  305 +++++
>> > > > >    drivers/gpu/hsa/radeon/kfd_kernel_queue.h          |   66 ++
>> > > > >    drivers/gpu/hsa/radeon/kfd_module.c                |  130 +++
>> > > > >    drivers/gpu/hsa/radeon/kfd_mqd_manager.c           |  290 +++++
>> > > > >    drivers/gpu/hsa/radeon/kfd_mqd_manager.h           |   54 +
>> > > > >    drivers/gpu/hsa/radeon/kfd_packet_manager.c        |  488 ++++++++
>> > > > >    drivers/gpu/hsa/radeon/kfd_pasid.c                 |   97 ++
>> > > > >    drivers/gpu/hsa/radeon/kfd_pm4_headers.h           |  682 +++++++++++
>> > > > >    drivers/gpu/hsa/radeon/kfd_pm4_opcodes.h           |  107 ++
>> > > > >    drivers/gpu/hsa/radeon/kfd_priv.h                  |  475 ++++++++
>> > > > >    drivers/gpu/hsa/radeon/kfd_process.c               |  391 +++++++
>> > > > >    drivers/gpu/hsa/radeon/kfd_process_queue_manager.c |  343 ++++++
>> > > > >    drivers/gpu/hsa/radeon/kfd_queue.c                 |  109 ++
>> > > > >    drivers/gpu/hsa/radeon/kfd_scheduler.h             |   72 ++
>> > > > >    drivers/gpu/hsa/radeon/kfd_topology.c              | 1207 ++++++++++++++++++++
>> > > > >    drivers/gpu/hsa/radeon/kfd_topology.h              |  168 +++
>> > > > >    drivers/gpu/hsa/radeon/kfd_vidmem.c                |   97 ++
>> > > > >    include/linux/mm_types.h                           |   14 +
>> > > > >    include/linux/radeon_kfd.h                         |  106 ++
>> > > > >    include/uapi/linux/kfd_ioctl.h                     |  133 +++
>> > > > >    mm/rmap.c                                          |    8 +-
>> > > > >    47 files changed, 9402 insertions(+), 97 deletions(-)
>> > > > >    create mode 100644 drivers/gpu/drm/radeon/radeon_kfd.c
>> > > > >    create mode 100644 drivers/gpu/hsa/Kconfig
>> > > > >    create mode 100644 drivers/gpu/hsa/Makefile
>> > > > >    create mode 100644 drivers/gpu/hsa/radeon/Makefile
>> > > > >    create mode 100644 drivers/gpu/hsa/radeon/cik_int.h
>> > > > >    create mode 100644 drivers/gpu/hsa/radeon/cik_mqds.h
>> > > > >    create mode 100644 drivers/gpu/hsa/radeon/cik_regs.h
>> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_aperture.c
>> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_chardev.c
>> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_crat.h
>> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_device.c
>> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_device_queue_manager.c
>> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_device_queue_manager.h
>> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_doorbell.c
>> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_interrupt.c
>> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_kernel_queue.c
>> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_kernel_queue.h
>> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_module.c
>> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_mqd_manager.c
>> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_mqd_manager.h
>> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_packet_manager.c
>> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_pasid.c
>> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_pm4_headers.h
>> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_pm4_opcodes.h
>> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_priv.h
>> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_process.c
>> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_process_queue_manager.c
>> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_queue.c
>> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_scheduler.h
>> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_topology.c
>> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_topology.h
>> > > > >    create mode 100644 drivers/gpu/hsa/radeon/kfd_vidmem.c
>> > > > >    create mode 100644 include/linux/radeon_kfd.h
>> > > > >    create mode 100644 include/uapi/linux/kfd_ioctl.h
>> > > > >   --
>> > > > >   1.9.1

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 00/83] AMD HSA kernel driver
  2014-07-13  9:42               ` Daniel Vetter
@ 2014-07-13 16:40                 ` Jerome Glisse
  -1 siblings, 0 replies; 51+ messages in thread
From: Jerome Glisse @ 2014-07-13 16:40 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Christian König, Gabbay, Oded, oded.gabbay, Lewycky, Andrew,
	linux-kernel, dri-devel, Deucher, Alexander, akpm, Jesse Barnes

On Sun, Jul 13, 2014 at 11:42:58AM +0200, Daniel Vetter wrote:
> On Sat, Jul 12, 2014 at 6:49 PM, Jerome Glisse <j.glisse@gmail.com> wrote:
> >> Hm, so the hsa part is a completely new driver/subsystem, not just an
> >> additional ioctl tacked onto radeon? The history of drm is littered with
> >> "generic" ioctls that turned out to be useful for exactly one driver.
> >> Which is why _all_ the command submission is now done with driver-private
> >> ioctls.
> >>
> >> I'd be quite a bit surprised if that suddenly works differently, so before
> >> we bless a generic hsa interface I really want to see some implementation
> >> from a different vendor (i.e. nvidia or intel) using the same ioctls.
> >> Otherwise we just repeat history and I'm not terribly inclined to keep on
> >> cleaning up cruft forever - one drm legacy is enough ;-)
> >>
> >> Jesse is the guy from our side to talk to about this.
> >> -Daniel
> >
> > I am not worried about that side; the hsa foundation has pretty strict
> > guidelines on what is hsa-compliant hardware, i.e. the hw needs to understand
> > the pm4 packet format of radeon (well, a small subset of it). But of course
> > this requires hsa-compliant hardware, and from the membership I am guessing
> > ARM Mali, ImgTech, Qualcomm, ... so unless Intel and NVidia join hsa you will
> > not see it for those hardware.
> >
> > So yes, for once the same ioctls would apply to different hardware. The only
> > thing that is different is the shader isa. The hsafoundation site has some
> > pdfs explaining all that, but someone thought that slideshare would be a good
> > idea; personally I would not register on any of those websites just to get the pdf.
> >
> > So to sum up, I am ok with having a new device file that presents a uniform
> > set of ioctls. It would actually be a lot easier for userspace: just open this
> > fixed device file and ask for the list of compliant hardware.
> >
> > Then the radeon kernel driver would register itself as a provider, so all the
> > ioctl decoding and marshalling would be shared, which makes sense.
> 
> There's also the other side, namely that preparing the cp ring in
> userspace and submitting the entire pile through a doorbell to the hw
> scheduler isn't really hsa exclusive. And for a solid platform with
> seamless gpu/cpu integration that means we need standard ways to set
> gpu context priorities and get at useful stats like gpu time used by a
> given context.
> 
> To get there I guess intel/nvidia need to reuse the hsa subsystem with
> the command submission adjusted a bit. Kinda like drm where kms and
> buffer sharing is common and cs driver specific.

An HSA module would be for HSA-compliant hardware, and thus the hardware would
need to follow the HSA specification, which again is pretty clear on what
the hardware needs to provide. So if Intel and NVidia want to join HSA,
I am sure they would be welcome, the more the merrier :)

So I would not block the HSA kernel ioctl design in order to please non-HSA
hardware, especially if at this point in time neither Intel nor NVidia can
share anything concrete on the design and how these things should be set up
for their hardware.

When Intel or NVidia present their own API, they should provide their
own set of ioctls through their own platform.

Cheers,
Jérôme Glisse

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 00/83] AMD HSA kernel driver
  2014-07-13 15:34             ` Bridgman, John
@ 2014-07-13 16:49               ` Jerome Glisse
  -1 siblings, 0 replies; 51+ messages in thread
From: Jerome Glisse @ 2014-07-13 16:49 UTC (permalink / raw)
  To: Bridgman, John
  Cc: Gabbay, Oded, Lewycky, Andrew, linux-kernel, dri-devel, Deucher,
	Alexander, akpm

On Sun, Jul 13, 2014 at 03:34:12PM +0000, Bridgman, John wrote:
> >From: Jerome Glisse [mailto:j.glisse@gmail.com]
> >Sent: Saturday, July 12, 2014 11:56 PM
> >To: Gabbay, Oded
> >Cc: linux-kernel@vger.kernel.org; Bridgman, John; Deucher, Alexander;
> >Lewycky, Andrew; joro@8bytes.org; akpm@linux-foundation.org; dri-
> >devel@lists.freedesktop.org; airlied@linux.ie; oded.gabbay@gmail.com
> >Subject: Re: [PATCH 00/83] AMD HSA kernel driver
> >
> >On Sat, Jul 12, 2014 at 09:55:49PM +0000, Gabbay, Oded wrote:
> >> On Fri, 2014-07-11 at 17:18 -0400, Jerome Glisse wrote:
> >> > On Thu, Jul 10, 2014 at 10:51:29PM +0000, Gabbay, Oded wrote:
> >> > >  On Thu, 2014-07-10 at 18:24 -0400, Jerome Glisse wrote:
> >> > > >  On Fri, Jul 11, 2014 at 12:45:27AM +0300, Oded Gabbay wrote:
> >> > > > >   This patch set implements a Heterogeneous System
> >> > > > > Architecture
> >> > > > >  (HSA) driver
> >> > > > >   for radeon-family GPUs.
> >> > > >  This is just quick comments on few things. Given size of this,
> >> > > > people  will need to have time to review things.
> >> > > > >   HSA allows different processor types (CPUs, DSPs, GPUs,
> >> > > > > etc..) to
> >> > > > >  share
> >> > > > >   system resources more effectively via HW features including
> >> > > > > shared pageable
> >> > > > >   memory, userspace-accessible work queues, and platform-level
> >> > > > > atomics. In
> >> > > > >   addition to the memory protection mechanisms in GPUVM and
> >> > > > > IOMMUv2, the Sea
> >> > > > >   Islands family of GPUs also performs HW-level validation of
> >> > > > > commands passed
> >> > > > >   in through the queues (aka rings).
> >> > > > >   The code in this patch set is intended to serve both as a
> >> > > > > sample  driver for
> >> > > > >   other HSA-compatible hardware devices and as a production
> >> > > > > driver  for
> >> > > > >   radeon-family processors. The code is architected to support
> >> > > > > multiple CPUs
> >> > > > >   each with connected GPUs, although the current
> >> > > > > implementation  focuses on a
> >> > > > >   single Kaveri/Berlin APU, and works alongside the existing
> >> > > > > radeon  kernel
> >> > > > >   graphics driver (kgd).
> >> > > > >   AMD GPUs designed for use with HSA (Sea Islands and up)
> >> > > > > share  some hardware
> >> > > > >   functionality between HSA compute and regular gfx/compute
> >> > > > > (memory,
> >> > > > >   interrupts, registers), while other functionality has been
> >> > > > > added
> >> > > > >   specifically for HSA compute  (hw scheduler for virtualized
> >> > > > > compute rings).
> >> > > > >   All shared hardware is owned by the radeon graphics driver,
> >> > > > > and  an interface
> >> > > > >   between kfd and kgd allows the kfd to make use of those
> >> > > > > shared  resources,
> >> > > > >   while HSA-specific functionality is managed directly by kfd
> >> > > > > by  submitting
> >> > > > >   packets into an HSA-specific command queue (the "HIQ").
> >> > > > >   During kfd module initialization a char device node
> >> > > > > (/dev/kfd) is
> >> > > > >  created
> >> > > > >   (surviving until module exit), with ioctls for queue
> >> > > > > creation &  management,
> >> > > > >   and data structures are initialized for managing HSA device
> >> > > > > topology.
> >> > > > >   The rest of the initialization is driven by calls from the
> >> > > > > radeon  kgd at
> >> > > > >   the following points :
> >> > > > >   - radeon_init (kfd_init)
> >> > > > >   - radeon_exit (kfd_fini)
> >> > > > >   - radeon_driver_load_kms (kfd_device_probe, kfd_device_init)
> >> > > > >   - radeon_driver_unload_kms (kfd_device_fini)
> >> > > > >   During the probe and init processing per-device data
> >> > > > > structures  are
> >> > > > >   established which connect to the associated graphics kernel
> >> > > > > driver. This
> >> > > > >   information is exposed to userspace via sysfs, along with a
> >> > > > > version number
> >> > > > >   allowing userspace to determine if a topology change has
> >> > > > > occurred  while it
> >> > > > >   was reading from sysfs.
> >> > > > >   The interface between kfd and kgd also allows the kfd to
> >> > > > > request  buffer
> >> > > > >   management services from kgd, and allows kgd to route
> >> > > > > interrupt  requests to
> >> > > > >   kfd code since the interrupt block is shared between regular
> >> > > > >   graphics/compute and HSA compute subsystems in the GPU.
> >> > > > >   The kfd code works with an open source usermode library
> >> > > > >  ("libhsakmt") which
> >> > > > >   is in the final stages of IP review and should be published
> >> > > > > in a  separate
> >> > > > >   repo over the next few days.
> >> > > > >   The code operates in one of three modes, selectable via the
> >> > > > > sched_policy
> >> > > > >   module parameter :
> >> > > > >   - sched_policy=0 uses a hardware scheduler running in the
> >> > > > > MEC  block within
> >> > > > >   CP, and allows oversubscription (more queues than HW slots)
> >> > > > >   - sched_policy=1 also uses HW scheduling but does not allow
> >> > > > >   oversubscription, so create_queue requests fail when we run
> >> > > > > out  of HW slots
> >> > > > >   - sched_policy=2 does not use HW scheduling, so the driver
> >> > > > > manually assigns
> >> > > > >   queues to HW slots by programming registers
> >> > > > >   The "no HW scheduling" option is for debug & new hardware
> >> > > > > bringup  only, so
> >> > > > >   has less test coverage than the other options. Default in
> >> > > > > the  current code
> >> > > > >   is "HW scheduling without oversubscription" since that is
> >> > > > > where  we have the
> >> > > > >   most test coverage but we expect to change the default to
> >> > > > > "HW  scheduling
> >> > > > >   with oversubscription" after further testing. This
> >> > > > > effectively  removes the
> >> > > > >   HW limit on the number of work queues available to
> >> > > > > applications.
> >> > > > >   Programs running on the GPU are associated with an address
> >> > > > > space  through the
> >> > > > >   VMID field, which is translated to a unique PASID at access
> >> > > > > time  via a set
> >> > > > >   of 16 VMID-to-PASID mapping registers. The available VMIDs
> >> > > > > (currently 16)
> >> > > > >   are partitioned (under control of the radeon kgd) between
> >> > > > > current
> >> > > > >   gfx/compute and HSA compute, with each getting 8 in the
> >> > > > > current  code. The
> >> > > > >   VMID-to-PASID mapping registers are updated by the HW
> >> > > > > scheduler  when used,
> >> > > > >   and by driver code if HW scheduling is not being used.
> >> > > > >   The Sea Islands compute queues use a new "doorbell"
> >> > > > > mechanism  instead of the
> >> > > > >   earlier kernel-managed write pointer registers. Doorbells
> >> > > > > use a  separate BAR
> >> > > > >   dedicated for this purpose, and pages within the doorbell
> >> > > > > aperture are
> >> > > > >   mapped to userspace (each page mapped to only one user
> >> > > > > address  space).
> >> > > > >   Writes to the doorbell aperture are intercepted by GPU
> >> > > > > hardware,  allowing
> >> > > > >   userspace code to safely manage work queues (rings) without
> >> > > > > requiring a
> >> > > > >   kernel call for every ring update.
> >> > > > >   First step for an application process is to open the kfd
> >> > > > > device.
> >> > > > >  Calls to
> >> > > > >   open create a kfd "process" structure only for the first
> >> > > > > thread  of the
> >> > > > >   process. Subsequent open calls are checked to see if they
> >> > > > > are  from processes
> >> > > > >   using the same mm_struct and, if so, don't do anything. The
> >> > > > > kfd  per-process
> >> > > > >   data lives as long as the mm_struct exists. Each mm_struct
> >> > > > > is  associated
> >> > > > >   with a unique PASID, allowing the IOMMUv2 to make userspace
> >> > > > > process memory
> >> > > > >   accessible to the GPU.
> >> > > > >   Next step is for the application to collect topology
> >> > > > > information  via sysfs.
> >> > > > >   This gives userspace enough information to be able to
> >> > > > > identify  specific
> >> > > > >   nodes (processors) in subsequent queue management calls.
> >> > > > >  Application
> >> > > > >   processes can create queues on multiple processors, and
> >> > > > > processors support
> >> > > > >   queues from multiple processes.
>> > > >  I am not a fan of using sysfs to discover topology.
> >> > > > >   At this point the application can create work queues in
> >> > > > > userspace  memory and
> >> > > > >   pass them through the usermode library to kfd to have them
> >> > > > > mapped  onto HW
> >> > > > >   queue slots so that commands written to the queues can be
> >> > > > > executed by the
> >> > > > >   GPU. Queue operations specify a processor node, and so the
> >> > > > > bulk  of this code
> >> > > > >   is device-specific.
> >> > > > >   Written by John Bridgman <John.Bridgman@amd.com>
> >> > > >  So general comment is you need to collapse many patches things
> >> > > > like
> >> > > >  58 fixing
> >> > > >  kernel style should be collapsed ie fix all previous patch that
> >> > > > have  broken  style.
> >> > > >  Even worse is thing like 71, removing code you just added few
> >> > > > patch  earlier  in the patchset.
> >> > >  Not quite, the code was added on patch 11.
> >> > > >  This means that if i start reviewing following  patch order  i
> >> > > > might spend time on code that is later deleted/modified/fixed ie
> >> > > > time i  spend understanding and reading some code might be just
> >> > > > wasted.
> >> > >  Quick answer is that you are of course right, but having said
> >> > > that, I
> >> > >  think it would be not simple at all to do that squashing.
> >> > >  I squashed what I could, and probably I can do a little more (like
> >> > >  58), but the major problem is that one of the main modules of the
> >> > >  driver - the scheduler (QCM) - was completely re-written (patches
> >> > >  46-53). Therefore, from patch 1 to 53, we use the old scheduler
> >> > > code
> >> > >  and from 54 we use the new QCM (and the old scheduler code was
> >> > >  completely remove at 71). So I could maybe squash 71 into 54, but
> >> > > that
> >> > >  won't help you much, I think.
> >> > >  So, the current advice I can give is to completely ignore the
> >> > >  following files because they do not exist in the final commit:
> >> > >  - kfd_sched_cik_static.c (stopped using in 54)
> >> > >  - kfd_registers.c (stopped using in 81 because we moved all
> >> > > register
> >> > >  writes to radeon)
> >> > >  - kfd_hw_pointer_store.c (alive from 46 to 67)
> >> > >  - kfd_hw_pointer_store.h (alive from 46 to 67)
> >> > >          Oded
> >> >
>> > Ok, I'll try, but this is driving me crazy: I am juggling between the fully
>> > applied patchset and individual patches, going back and forth trying to find
>> > which patch is the one I want to comment on. Not even mentioning that after
>> > that, people would need to ack bad patches just because they know a later
>> > good patch fixes the badness.
> >> >
>> > This patchset can be reduced to fewer than 10 big patches. I would first
>> > do core patches that touch core things like iommu and are preparatory
>> > to the patchset. Then kfd patches that do not touch anything in radeon
>> > but implement the skeleton and core infrastructure. Then radeon-specific
>> > patches.
> >> Hi Jerome,
> >> I thought about what you said and I would like to make a suggestion. I
> >> believe I can restructure the patchset into 10-20 patches, organized
>> >> logically, which will be easier to review.
> >>
> >> The downside is that you will lose all references to the current
> >> patchset and hence, all the review work you (and other people) have
> >> done so far will be somewhat wasted. Also, the entire git history of
> >> the driver development will be lost (at least externally).
> >
> >This history does not matter much.
> >
> >>
> >> John suggested something a bit different in the email thread of PATCH
> >> 07/83. He said to squash everything from patch 2 to 54 (including 71)
> >> and leave the remaining patches as they were, with maybe some
> >> additional small squashing.
> >>
> >> Please tell me which option do you prefer:
> >>
> >> A. Continue with the review of the current patchset.
> >> B. Go with my suggestion.
> >> C. Go with John's suggestion.
> >>
> >> I'm going tomorrow (Sunday) to prepare options B & C, but I need your
> >> input before publishing anything. So, if you could reply by Monday
> >> morning my time, that would be great as it will allow me to publish
> >> (if chosen) the new set by Monday morning, EST.
> >>
> >> And thanks again for the time you dedicate to this review. This is
> >> highly appreciated.
> >>
> >>         Oded
> >
> >Squashing patch together will not be enough, what really needs to happen
> >is a hsa module like drm module and radeon module registering itself into
> >this hsa module.
> 
> Hi Jerome;
> 
> Agree completely that this needs to end up as a cross-vendor core framework, but our HSAF partners are not ready to focus on standardizing the low-level implementation at this point. That left us with two options -- (a) push out a framework now in the hope that other vendors would use it in the future (not a trivial thing, given the wide range of hardware that can be used), or (b) push out a minimal driver with only the IOCTLs we felt could be maintained going forward, while (and this is the important part) making sure we avoided code and/or interfaces which would get in the way of building a standard "core HSA" framework in the future. Our feeling was that option (b) had a much greater chance of success and a much lower risk of codifying things in the kernel which end up not being used.
> 
> Sorry for not including this in the initial writeup -- it was getting too long already, but maybe this wasn't the right thing to leave out.
> 
> The current implementation (specifically the interface between kfd and kgd) is designed to support multiple gfx drivers for AMD hardware, and we tried to avoid any IOCTLs which would need to be replaced with multi-vendor-under-a-single-framework IOCTLs in the future. So I think we are meeting the abstract goal of not pushing something upstream that will leave litter in the kernel we all get stuck with later -- in fact, I really do believe this approach has the lowest risk of doing that.
> 
> That said, at minimum we should probably identify the points in the code where a "core vs HW-specific" split would live, and comment them in the code, so we have an RFC of sorts for that without trying to implement and upstream a framework in advance of sufficient cross-vendor participation.
> 
> Would something like that be sufficient for now?
> 
> Thanks,
> JB

So I need to go over the HSA specification once more, but from memory doorbells
and ring buffers with the pm4 packet format are mandatory, and thus the same
ioctls can be used to set this up no matter what hardware is behind them.
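
For reference, a pm4 packet header is a single dword, something along the lines of the radeon driver's PACKET3() macro; the exact bit layout below is from memory and should be checked against the docs:

/*
 * PM4 type-3 header sketch: [31:30] packet type, [29:16] dword count,
 * [15:8] opcode. 'ndw' is the number of dwords following the header,
 * minus one. Opcode values would come from the pm4 opcode list.
 */
#define PM4_TYPE3		3u
#define PM4_HEADER(op, ndw) \
	((PM4_TYPE3 << 30) | (((ndw) & 0x3fffu) << 16) | (((op) & 0xffu) << 8))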

I would rather avoid having temporary ioctls. Common ioctls can be designed
with enough flexibility for future changes. The first field of each ioctl can
be a version field describing the structure version.
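
As a sketch only (the names and fields here are invented, this is not the actual kfd_ioctl.h), that could look like:

#include <linux/ioctl.h>
#include <linux/types.h>

/* Common, versioned ioctl argument block; the version field comes first. */
struct hsa_create_queue_args {
	__u32 version;         /* structure version, always the first field */
	__u32 node_id;         /* topology node to create the queue on */
	__u64 ring_base_addr;  /* userspace ring buffer holding pm4 packets */
	__u64 ring_size;       /* ring size in bytes */
	__u64 doorbell_offset; /* out: mmap offset of the doorbell page */
	__u32 queue_id;        /* out: assigned queue id */
	__u32 pad;             /* keep the struct 64-bit aligned */
};

#define HSA_IOC_CREATE_QUEUE _IOWR('H', 0x01, struct hsa_create_queue_args)

The handler checks the version field first and translates or rejects older layouts, which is what lets a common ioctl survive hardware and spec revisions.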

I can understand the reluctance to enforce a framework so as not to bother
your partners, but this is the Linux kernel, and if hardware manufacturers
want to have their hardware supported they will have to abide by whatever
rules the kernel community decides.

I vote for an HSA module that exposes the ioctls and acts as an intermediary
with the kernel driver that handles the hardware. This gives a single entry
point for HSA hardware, and yes, this enforces things for any hardware
manufacturer. I am more than happy to tell them that this is it and nothing
else if they want to get upstream.
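
Concretely, the registration shape I have in mind would be something like this (all names invented, this is just the drm-like model):

#include <linux/types.h>
#include <linux/compiler.h>

struct hsa_device;

/* Per-driver callbacks the core dispatches to after decoding the ioctl. */
struct hsa_device_ops {
	int (*create_queue)(struct hsa_device *hdev, void __user *uargs);
	int (*destroy_queue)(struct hsa_device *hdev, unsigned int queue_id);
};

struct hsa_device {
	const struct hsa_device_ops *ops; /* filled in by the provider */
	void *drvdata;                    /* e.g. a struct radeon_device * */
};

/*
 * Called by radeon (or any other provider) at device init/teardown; the
 * core owns the device node and routes each request to the right device.
 */
int hsa_register_device(struct hsa_device *hdev);
void hsa_unregister_device(struct hsa_device *hdev);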

Cheers,
Jérôme

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 00/83] AMD HSA kernel driver
  2014-07-13 16:49               ` Jerome Glisse
@ 2014-07-14  8:37                 ` Christian König
  -1 siblings, 0 replies; 51+ messages in thread
From: Christian König @ 2014-07-14  8:37 UTC (permalink / raw)
  To: Jerome Glisse, Bridgman, John
  Cc: Lewycky, Andrew, linux-kernel, dri-devel, Gabbay, Oded, Deucher,
	Alexander, akpm, Daniel Vetter

> I vote for an HSA module that exposes the ioctls and acts as an intermediary
> with the kernel driver that handles the hardware. This gives a single entry
> point for HSA hardware, and yes, this enforces things for any hardware
> manufacturer. I am more than happy to tell them that this is it and nothing
> else if they want to get upstream.
I think we should still discuss this single point of entry a bit more.

Just to make it clear: the plan is to expose all physical HSA-capable
devices to userspace through a single /dev/hsa device node.

While this obviously makes device enumeration much easier, it's still quite
a hard break with Unix traditions. Essentially we now expose all devices of
one kind through a single device node instead of creating independent nodes
for each physical or logical device.

What makes it even worse is that we want to expose different drivers
through the same device node.

Because of this, any effort by a system administrator to limit access to
HSA is reduced to an on/off decision. It's simply not possible any more
to apply ordinary file system access semantics to individual hardware devices.

Just imagine you are an administrator with a bunch of different compute
cards in a system and you want to restrict access to one of them
because it's faulty or has a security problem or something like that. Or
you have several hardware devices and want to assign each of them to a
distinct container.
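
For comparison, per-device nodes would keep the usual tools working; a rough sketch of that alternative (hypothetical names, error handling omitted):

#include <linux/device.h>
#include <linux/err.h>
#include <linux/kdev_t.h>

static struct class *hsa_class;	/* created once at module init */
static int hsa_major;		/* from alloc_chrdev_region() */

/*
 * One /dev/hsaN node per physical device, so chmod/chown, ACLs and
 * device cgroups can be applied to each card individually.
 */
static int hsa_create_node(void *drvdata, int minor)
{
	struct device *dev;

	dev = device_create(hsa_class, NULL, MKDEV(hsa_major, minor),
			    drvdata, "hsa%d", minor);
	return PTR_ERR_OR_ZERO(dev);
}

Restricting a faulty card is then just a permission change on its node instead of an all-or-nothing switch.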

Just some thoughts,
Christian.

Am 13.07.2014 18:49, schrieb Jerome Glisse:
> On Sun, Jul 13, 2014 at 03:34:12PM +0000, Bridgman, John wrote:
>>> From: Jerome Glisse [mailto:j.glisse@gmail.com]
>>> Sent: Saturday, July 12, 2014 11:56 PM
>>> To: Gabbay, Oded
>>> Cc: linux-kernel@vger.kernel.org; Bridgman, John; Deucher, Alexander;
>>> Lewycky, Andrew; joro@8bytes.org; akpm@linux-foundation.org; dri-
>>> devel@lists.freedesktop.org; airlied@linux.ie; oded.gabbay@gmail.com
>>> Subject: Re: [PATCH 00/83] AMD HSA kernel driver
>>>
>>> On Sat, Jul 12, 2014 at 09:55:49PM +0000, Gabbay, Oded wrote:
>>>> On Fri, 2014-07-11 at 17:18 -0400, Jerome Glisse wrote:
>>>>> On Thu, Jul 10, 2014 at 10:51:29PM +0000, Gabbay, Oded wrote:
>>>>>>   On Thu, 2014-07-10 at 18:24 -0400, Jerome Glisse wrote:
>>>>>>>   On Fri, Jul 11, 2014 at 12:45:27AM +0300, Oded Gabbay wrote:
>>>>>>>>    This patch set implements a Heterogeneous System
>>>>>>>> Architecture
>>>>>>>>   (HSA) driver
>>>>>>>>    for radeon-family GPUs.
>>>>>>>   This is just quick comments on few things. Given size of this,
>>>>>>> people  will need to have time to review things.
>>>>>>>>    HSA allows different processor types (CPUs, DSPs, GPUs,
>>>>>>>> etc..) to
>>>>>>>>   share
>>>>>>>>    system resources more effectively via HW features including
>>>>>>>> shared pageable
>>>>>>>>    memory, userspace-accessible work queues, and platform-level
>>>>>>>> atomics. In
>>>>>>>>    addition to the memory protection mechanisms in GPUVM and
>>>>>>>> IOMMUv2, the Sea
>>>>>>>>    Islands family of GPUs also performs HW-level validation of
>>>>>>>> commands passed
>>>>>>>>    in through the queues (aka rings).
>>>>>>>>    The code in this patch set is intended to serve both as a
>>>>>>>> sample  driver for
>>>>>>>>    other HSA-compatible hardware devices and as a production
>>>>>>>> driver  for
>>>>>>>>    radeon-family processors. The code is architected to support
>>>>>>>> multiple CPUs
>>>>>>>>    each with connected GPUs, although the current
>>>>>>>> implementation  focuses on a
>>>>>>>>    single Kaveri/Berlin APU, and works alongside the existing
>>>>>>>> radeon  kernel
>>>>>>>>    graphics driver (kgd).
>>>>>>>>    AMD GPUs designed for use with HSA (Sea Islands and up)
>>>>>>>> share  some hardware
>>>>>>>>    functionality between HSA compute and regular gfx/compute
>>>>>>>> (memory,
>>>>>>>>    interrupts, registers), while other functionality has been
>>>>>>>> added
>>>>>>>>    specifically for HSA compute  (hw scheduler for virtualized
>>>>>>>> compute rings).
>>>>>>>>    All shared hardware is owned by the radeon graphics driver,
>>>>>>>> and  an interface
>>>>>>>>    between kfd and kgd allows the kfd to make use of those
>>>>>>>> shared  resources,
>>>>>>>>    while HSA-specific functionality is managed directly by kfd
>>>>>>>> by  submitting
>>>>>>>>    packets into an HSA-specific command queue (the "HIQ").
>>>>>>>>    During kfd module initialization a char device node
>>>>>>>> (/dev/kfd) is
>>>>>>>>   created
>>>>>>>>    (surviving until module exit), with ioctls for queue
>>>>>>>> creation &  management,
>>>>>>>>    and data structures are initialized for managing HSA device
>>>>>>>> topology.
>>>>>>>>    The rest of the initialization is driven by calls from the
>>>>>>>> radeon  kgd at
>>>>>>>>    the following points :
>>>>>>>>    - radeon_init (kfd_init)
>>>>>>>>    - radeon_exit (kfd_fini)
>>>>>>>>    - radeon_driver_load_kms (kfd_device_probe, kfd_device_init)
>>>>>>>>    - radeon_driver_unload_kms (kfd_device_fini)
>>>>>>>>    During the probe and init processing per-device data
>>>>>>>> structures  are
>>>>>>>>    established which connect to the associated graphics kernel
>>>>>>>> driver. This
>>>>>>>>    information is exposed to userspace via sysfs, along with a
>>>>>>>> version number
>>>>>>>>    allowing userspace to determine if a topology change has
>>>>>>>> occurred  while it
>>>>>>>>    was reading from sysfs.
>>>>>>>>    The interface between kfd and kgd also allows the kfd to
>>>>>>>> request  buffer
>>>>>>>>    management services from kgd, and allows kgd to route
>>>>>>>> interrupt  requests to
>>>>>>>>    kfd code since the interrupt block is shared between regular
>>>>>>>>    graphics/compute and HSA compute subsystems in the GPU.
>>>>>>>>    The kfd code works with an open source usermode library
>>>>>>>>   ("libhsakmt") which
>>>>>>>>    is in the final stages of IP review and should be published
>>>>>>>> in a  separate
>>>>>>>>    repo over the next few days.
>>>>>>>>    The code operates in one of three modes, selectable via the
>>>>>>>> sched_policy
>>>>>>>>    module parameter :
>>>>>>>>    - sched_policy=0 uses a hardware scheduler running in the
>>>>>>>> MEC  block within
>>>>>>>>    CP, and allows oversubscription (more queues than HW slots)
>>>>>>>>    - sched_policy=1 also uses HW scheduling but does not allow
>>>>>>>>    oversubscription, so create_queue requests fail when we run
>>>>>>>> out  of HW slots
>>>>>>>>    - sched_policy=2 does not use HW scheduling, so the driver
>>>>>>>> manually assigns
>>>>>>>>    queues to HW slots by programming registers
>>>>>>>>    The "no HW scheduling" option is for debug & new hardware
>>>>>>>> bringup  only, so
>>>>>>>>    has less test coverage than the other options. Default in
>>>>>>>> the  current code
>>>>>>>>    is "HW scheduling without oversubscription" since that is
>>>>>>>> where  we have the
>>>>>>>>    most test coverage but we expect to change the default to
>>>>>>>> "HW  scheduling
>>>>>>>>    with oversubscription" after further testing. This
>>>>>>>> effectively  removes the
>>>>>>>>    HW limit on the number of work queues available to
>>>>>>>> applications.
>>>>>>>>    Programs running on the GPU are associated with an address
>>>>>>>> space  through the
>>>>>>>>    VMID field, which is translated to a unique PASID at access
>>>>>>>> time  via a set
>>>>>>>>    of 16 VMID-to-PASID mapping registers. The available VMIDs
>>>>>>>> (currently 16)
>>>>>>>>    are partitioned (under control of the radeon kgd) between
>>>>>>>> current
>>>>>>>>    gfx/compute and HSA compute, with each getting 8 in the
>>>>>>>> current  code. The
>>>>>>>>    VMID-to-PASID mapping registers are updated by the HW
>>>>>>>> scheduler  when used,
>>>>>>>>    and by driver code if HW scheduling is not being used.
>>>>>>>>    The Sea Islands compute queues use a new "doorbell"
>>>>>>>> mechanism  instead of the
>>>>>>>>    earlier kernel-managed write pointer registers. Doorbells
>>>>>>>> use a  separate BAR
>>>>>>>>    dedicated for this purpose, and pages within the doorbell
>>>>>>>> aperture are
>>>>>>>>    mapped to userspace (each page mapped to only one user
>>>>>>>> address  space).
>>>>>>>>    Writes to the doorbell aperture are intercepted by GPU
>>>>>>>> hardware,  allowing
>>>>>>>>    userspace code to safely manage work queues (rings) without
>>>>>>>> requiring a
>>>>>>>>    kernel call for every ring update.
>>>>>>>>    First step for an application process is to open the kfd
>>>>>>>> device.
>>>>>>>>   Calls to
>>>>>>>>    open create a kfd "process" structure only for the first
>>>>>>>> thread  of the
>>>>>>>>    process. Subsequent open calls are checked to see if they
>>>>>>>> are  from processes
>>>>>>>>    using the same mm_struct and, if so, don't do anything. The
>>>>>>>> kfd  per-process
>>>>>>>>    data lives as long as the mm_struct exists. Each mm_struct
>>>>>>>> is  associated
>>>>>>>>    with a unique PASID, allowing the IOMMUv2 to make userspace
>>>>>>>> process memory
>>>>>>>>    accessible to the GPU.
>>>>>>>>    Next step is for the application to collect topology
>>>>>>>> information  via sysfs.
>>>>>>>>    This gives userspace enough information to be able to
>>>>>>>> identify  specific
>>>>>>>>    nodes (processors) in subsequent queue management calls.
>>>>>>>>   Application
>>>>>>>>    processes can create queues on multiple processors, and
>>>>>>>> processors support
>>>>>>>>    queues from multiple processes.
>>>>>>>   I am not a fan of using sysfs to discover topology.
>>>>>>>>    At this point the application can create work queues in
>>>>>>>> userspace  memory and
>>>>>>>>    pass them through the usermode library to kfd to have them
>>>>>>>> mapped  onto HW
>>>>>>>>    queue slots so that commands written to the queues can be
>>>>>>>> executed by the
>>>>>>>>    GPU. Queue operations specify a processor node, and so the
>>>>>>>> bulk  of this code
>>>>>>>>    is device-specific.
>>>>>>>>    Written by John Bridgman <John.Bridgman@amd.com>
>>>>>>>   So a general comment is that you need to collapse many patches:
>>>>>>> things like 58, fixing kernel style, should be collapsed, ie fix all
>>>>>>> previous patches that have broken style.
>>>>>>>   Even worse are things like 71, removing code you just added a few
>>>>>>> patches earlier in the patchset.
>>>>>>   Not quite, the code was added in patch 11.
>>>>>>>   This means that if i start reviewing in patch order i might spend
>>>>>>> time on code that is later deleted/modified/fixed, ie the time i
>>>>>>> spend understanding and reading some code might just be wasted.
>>>>>>   Quick answer is that you are of course right, but having said that,
>>>>>>   I think it would not be simple at all to do that squashing.
>>>>>>   I squashed what I could, and probably I can do a little more (like
>>>>>>   58), but the major problem is that one of the main modules of the
>>>>>>   driver - the scheduler (QCM) - was completely re-written (patches
>>>>>>   46-53). Therefore, from patch 1 to 53, we use the old scheduler code,
>>>>>>   and from 54 we use the new QCM (and the old scheduler code was
>>>>>>   completely removed in 71). So I could maybe squash 71 into 54, but
>>>>>>   that won't help you much, I think.
>>>>>>   So, the current advice I can give is to completely ignore the
>>>>>>   following files because they do not exist in the final commit:
>>>>>>   - kfd_sched_cik_static.c (stopped using in 54)
>>>>>>   - kfd_registers.c (stopped using in 81 because we moved all
>>>>>> register
>>>>>>   writes to radeon)
>>>>>>   - kfd_hw_pointer_store.c (alive from 46 to 67)
>>>>>>   - kfd_hw_pointer_store.h (alive from 46 to 67)
>>>>>>           Oded
>>>>> Ok i try, but this is driving me crazy; i am juggling between the
>>>>> fully applied patchset and individual patches, going back and forth
>>>>> trying to find which patch is the one i want to comment on. Not to
>>>>> mention that people would then need to ack bad patches just because
>>>>> they know a later good patch fixes the badness.
>>>>>
>>>>> This patchset can be reduced to fewer than 10 big patches. I would
>>>>> first do core patches that touch core things like the iommu and are
>>>>> preparatory to the patchset. Then kfd patches that do not touch
>>>>> anything in radeon but implement the skeleton and core infrastructure.
>>>>> Then radeon-specific patches.
>>>> Hi Jerome,
>>>> I thought about what you said and I would like to make a suggestion. I
>>>> believe I can restructure the patchset into 10-20 patches, organized
>>>> logically and will be easier to review.
>>>>
>>>> The downside is that you will lose all references to the current
>>>> patchset and hence, all the review work you (and other people) have
>>>> done so far will be somewhat wasted. Also, the entire git history of
>>>> the driver development will be lost (at least externally).
>>> This history does not matter much.
>>>
>>>> John suggested something a bit different in the email thread of PATCH
>>>> 07/83. He said to squash everything from patch 2 to 54 (including 71)
>>>> and leave the remaining patches as they were, with maybe some
>>>> additional small squashing.
>>>>
>>>> Please tell me which option you prefer:
>>>>
>>>> A. Continue with the review of the current patchset.
>>>> B. Go with my suggestion.
>>>> C. Go with John's suggestion.
>>>>
>>>> I'm going tomorrow (Sunday) to prepare options B & C, but I need your
>>>> input before publishing anything. So, if you could reply by Monday
>>>> morning my time, that would be great as it will allow me to publish
>>>> (if chosen) the new set by Monday morning, EST.
>>>>
>>>> And thanks again for the time you dedicate to this review. This is
>>>> highly appreciated.
>>>>
>>>>          Oded
>>> Squashing patches together will not be enough; what really needs to
>>> happen is an hsa module, like the drm module, with the radeon module
>>> registering itself into this hsa module.
>> Hi Jerome;
>>
>> Agree completely that this needs to end up as a cross-vendor core framework, but our HSAF partners are not ready to focus on standardizing the low-level implementation at this point. That left us with two options -- (a) push out a framework now in the hope that other vendors would use it in the future (not a trivial thing given the wide range of hardware that can be used), or (b) push out a minimal driver with only the IOCTLs we felt could be maintained going forward, while (and this is the important part) making sure we avoided code and/or interfaces which would get in the way of making a standard "core HSA" framework in the future. Our feeling was that option (b) had a much greater chance of success and a much lower risk of codifying things in the kernel that would end up not being used.
>>
>> Sorry for not including this in the initial writeup -- it was getting too long already but maybe this wasn't the right thing to leave out.
>>
>> The current implementation (specifically the interface between kfd and kgd) is designed to support multiple gfx drivers for AMD hardware, and we tried to avoid any IOCTLs which would need to be replaced with multi-vendor-under-a-single-framework IOCTLs in the future, so I think we are meeting the abstract goal of not pushing something upstream that will leave litter in the kernel we all get stuck with later... in fact I really do believe this approach has the lowest risk of doing that.
>>
>> That said, at minimum we should probably identify points in the code where a "core vs HW-specific" break would live, and comment that in the code so we have an RFC of sorts, without trying to implement and upstream a framework in advance of sufficient cross-vendor participation.
>>
>> Would something like that be sufficient for now ?
>>
>> Thanks,
>> JB
> So i need to go over the HSA specifics once more, but from memory the
> doorbell and ring buffer with PM4 packet format are mandatory, and thus
> the same ioctl can be used to set this up no matter what hardware is
> behind it.
>
> I would rather avoid having temporary ioctls. Common ioctls can be designed
> with enough flexibility for future change. The first field of each ioctl can
> be a version field describing the structure version.
>
> I can understand the reluctance to enforce some framework so as not to
> bother your partners, but this is the Linux kernel, and if hardware
> manufacturers want their hardware supported they will have to abide by
> whatever rules the kernel community decides.
>
> I vote for an HSA module that exposes the ioctls and is an intermediary
> with the kernel drivers that handle the hardware. This gives a single
> entry point for HSA hardware, and yes, this enforces things for any
> hardware manufacturer. I am more than happy to tell them that this is it
> and nothing else if they want to get upstream.
>
> Cheers,
> Jérôme
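
To make the versioned-ioctl suggestion above concrete, here is a minimal
sketch; all names, fields, and the ioctl number are hypothetical, not the
actual kfd ABI:

/*
 * Sketch only: a version-tagged ioctl argument. The kernel can dispatch
 * on struct_version (and sanity-check struct_size), so the same ioctl
 * number keeps working as the structure grows, instead of adding
 * temporary ioctls per hardware generation.
 */
#include <linux/ioctl.h>
#include <stdint.h>

struct hsa_create_queue_args {
	uint32_t struct_version;	/* always first, as suggested above */
	uint32_t struct_size;		/* sizeof() as compiled in userspace */
	uint32_t node_id;		/* topology node to create the queue on */
	uint32_t ring_size_bytes;
	uint64_t ring_base_address;	/* userspace ring of PM4 packets */
	uint64_t doorbell_offset;	/* out: mmap offset of the doorbell page */
};

#define HSA_IOC_CREATE_QUEUE _IOWR('H', 0x01, struct hsa_create_queue_args)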


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 00/83] AMD HSA kernel driver
  2014-07-14  8:37                 ` Christian König
@ 2014-07-15  4:35                   ` Dave Airlie
  -1 siblings, 0 replies; 51+ messages in thread
From: Dave Airlie @ 2014-07-15  4:35 UTC (permalink / raw)
  To: Christian König
  Cc: Jerome Glisse, Bridgman, John, Lewycky, Andrew, linux-kernel,
	dri-devel, Deucher, Alexander, akpm

On 14 July 2014 18:37, Christian König <deathsimple@vodafone.de> wrote:
>> I vote for an HSA module that exposes the ioctls and is an intermediary
>> with the kernel drivers that handle the hardware. This gives a single
>> entry point for HSA hardware, and yes, this enforces things for any
>> hardware manufacturer. I am more than happy to tell them that this is it
>> and nothing else if they want to get upstream.
>
> I think we should still discuss this single point of entry a bit more.
>
> Just to make it clear the plan is to expose all physical HSA capable devices
> through a single /dev/hsa device node to userspace.

This is why we don't design kernel interfaces in secret foundations,
and expect anyone to like them.

So before we go any further, how is this stuff planned to work for
multiple GPUs/accelerators?

Do we have a userspace to exercise this interface so we can see how
such a thing would look?

Dave.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 00/83] AMD HSA kernel driver
  2014-07-15  4:35                   ` Dave Airlie
@ 2014-07-15 14:29                     ` Jerome Glisse
  -1 siblings, 0 replies; 51+ messages in thread
From: Jerome Glisse @ 2014-07-15 14:29 UTC (permalink / raw)
  To: Dave Airlie
  Cc: Christian König, Bridgman, John, Lewycky, Andrew,
	linux-kernel, dri-devel, Deucher, Alexander, akpm

On Tue, Jul 15, 2014 at 02:35:19PM +1000, Dave Airlie wrote:
> On 14 July 2014 18:37, Christian König <deathsimple@vodafone.de> wrote:
> >> I vote for an HSA module that exposes the ioctls and is an intermediary
> >> with the kernel drivers that handle the hardware. This gives a single
> >> entry point for HSA hardware, and yes, this enforces things for any
> >> hardware manufacturer. I am more than happy to tell them that this is it
> >> and nothing else if they want to get upstream.
> >
> > I think we should still discuss this single point of entry a bit more.
> >
> > Just to make it clear the plan is to expose all physical HSA capable devices
> > through a single /dev/hsa device node to userspace.
> 
> This is why we don't design kernel interfaces in secret foundations,
> and expect anyone to like them.
> 

I think at this time this is unlikely to get into 3.17. But Christian had a
point on having multiple device files. So something like /dev/hsa/*

> So before we go any further, how is this stuff planned to work for
> multiple GPUs/accelerators?

My understanding is that you create queues, and each queue is associated
with a device. You can create several queues for the same device and give
them different priorities.

Btw, queue here means a ring buffer that understands a common set of PM4
packets.
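
As a rough sketch of that model (reusing the hypothetical
HSA_IOC_CREATE_QUEUE definition from earlier in the thread; none of this
is the actual kfd interface), userspace owns the ring and only enters the
kernel to create the queue:

/*
 * Sketch only: create one queue on one device. The ring buffer lives in
 * userspace memory; the doorbell is mmap'ed so that later PM4 packet
 * submission needs no kernel call.
 */
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/types.h>

#define RING_BYTES (64 * 1024)

static int create_queue(int kfd_fd, uint32_t node_id,
			volatile uint32_t **doorbell)
{
	struct hsa_create_queue_args args;
	void *ring, *db;

	/* Allocate the ring entirely in userspace memory. */
	ring = mmap(NULL, RING_BYTES, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (ring == MAP_FAILED)
		return -1;

	memset(&args, 0, sizeof(args));
	args.struct_version = 1;
	args.struct_size = sizeof(args);
	args.node_id = node_id;		/* which device gets this queue */
	args.ring_base_address = (uint64_t)(uintptr_t)ring;
	args.ring_size_bytes = RING_BYTES;
	if (ioctl(kfd_fd, HSA_IOC_CREATE_QUEUE, &args) < 0)
		return -1;

	/* Doorbell writes are intercepted by the GPU hardware, so ring
	 * updates after this point need no further kernel calls. */
	db = mmap(NULL, 4096, PROT_WRITE, MAP_SHARED, kfd_fd,
		  (off_t)args.doorbell_offset);
	if (db == MAP_FAILED)
		return -1;
	*doorbell = (volatile uint32_t *)db;
	return 0;
}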

> Do we have a userspace to exercise this interface so we can see how
> such a thing would look?

I think we need to wait a bit before freezing and accepting the kernel
API, until we have seen enough userspace bits to be comfortable. Moreover,
if AMD wants a common API for HSA, I also think that they at the very
least need their HSA partners to make public comments on the kernel API.

Cheers,
Jérôme



^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: [PATCH 00/83] AMD HSA kernel driver
  2014-07-15  4:35                   ` Dave Airlie
@ 2014-07-15 17:06                     ` Bridgman, John
  -1 siblings, 0 replies; 51+ messages in thread
From: Bridgman, John @ 2014-07-15 17:06 UTC (permalink / raw)
  To: Dave Airlie, Christian König
  Cc: Jerome Glisse, Lewycky, Andrew, linux-kernel, dri-devel, Deucher,
	Alexander, akpm

>-----Original Message-----
>From: Dave Airlie [mailto:airlied@gmail.com]
>Sent: Tuesday, July 15, 2014 12:35 AM
>To: Christian König
>Cc: Jerome Glisse; Bridgman, John; Lewycky, Andrew; linux-
>kernel@vger.kernel.org; dri-devel@lists.freedesktop.org; Deucher,
>Alexander; akpm@linux-foundation.org
>Subject: Re: [PATCH 00/83] AMD HSA kernel driver
>
>On 14 July 2014 18:37, Christian König <deathsimple@vodafone.de> wrote:
>>> I vote for an HSA module that exposes the ioctls and is an intermediary
>>> with the kernel drivers that handle the hardware. This gives a single
>>> entry point for HSA hardware, and yes, this enforces things for any
>>> hardware manufacturer. I am more than happy to tell them that this is it
>>> and nothing else if they want to get upstream.
>>
>> I think we should still discuss this single point of entry a bit more.
>>
>> Just to make it clear the plan is to expose all physical HSA capable
>> devices through a single /dev/hsa device node to userspace.
>
>This is why we don't design kernel interfaces in secret foundations, and
>expect anyone to like them.

Understood and agree. In this case though this isn't a cross-vendor interface designed by a secret committee, it's supposed to be more of an inoffensive little single-vendor interface designed *for* a secret committee. I'm hoping that's better ;)

>
>So before we go any further, how is this stuff planned to work for multiple
>GPUs/accelerators?

Three classes of "multiple" :

1. Single CPU with IOMMUv2 and multiple GPUs:

- all devices accessible via /dev/kfd
- topology information identifies CPU + GPUs, each has "node ID" at top of userspace API, "global ID" at user/kernel interface
 (don't think we've implemented CPU part yet though)
- userspace builds snapshot from sysfs info & exposes to HSAIL runtime, which in turn exposes the "standard" API
- kfd sets up ATC aperture so GPUs can access system RAM via IOMMUv2 (fast for APU, relatively less so for dGPU over PCIE)
- to-be-added memory operations allow allocation & residency control (within existing gfx driver limits) of buffers in VRAM & carved-out system RAM
- queue operations specify a node ID to the userspace library, which translates to "global ID" before calling kfd (see the sketch after this list)

2. Multiple CPUs connected via fabric (eg HyperTransport) each with 0 or more GPUs:

- topology information exposes CPUs & GPUs, along with affinity info showing what is connected to what
- everything else works as in (1) above

3. Multiple CPUs not connected via fabric (eg a blade server) each with 0 or more GPUs

- no attempt to cover this with HSA topology, each CPU and associated GPUs is accessed independently via separate /dev/kfd instances
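
A sketch of the node-ID to "global ID" translation from (1); the sysfs
layout below is an assumption for illustration, not the actual topology
file names:

/*
 * Sketch only: snapshot the sysfs topology once, then map the dense
 * "node IDs" handed out at the top of the userspace API to the
 * "global IDs" used at the user/kernel boundary.
 */
#include <stdint.h>
#include <stdio.h>

#define MAX_NODES 64

static uint64_t global_ids[MAX_NODES];
static int node_count;

static int topology_snapshot(void)
{
	char path[128];

	for (node_count = 0; node_count < MAX_NODES; node_count++) {
		unsigned long long gid;
		FILE *f;

		/* hypothetical path; the real layout may differ */
		snprintf(path, sizeof(path),
			 "/sys/class/kfd/topology/nodes/%d/gpu_id",
			 node_count);
		f = fopen(path, "r");
		if (!f)
			break;	/* no more nodes */
		if (fscanf(f, "%llu", &gid) == 1)
			global_ids[node_count] = gid;
		fclose(f);
	}
	return node_count;	/* caller re-snapshots on topology changes */
}

/* Queue operations take a node ID; translate it before calling kfd. */
static uint64_t node_to_global(uint32_t node_id)
{
	return node_id < (uint32_t)node_count ? global_ids[node_id] : 0;
}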

>
>Do we have a userspace to exercise this interface so we can see how such a
>thing would look?

Yes -- initial IP review done, legal stuff done, sanitizing WIP, hoping for final approval this week

There's a separate test harness to exercise the userspace lib calls; we haven't started IP review or sanitizing for that, but the legal stuff is done

>
>Dave.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: [PATCH 00/83] AMD HSA kernel driver
  2014-07-15 17:06                     ` Bridgman, John
@ 2014-07-15 17:23                       ` Bridgman, John
  -1 siblings, 0 replies; 51+ messages in thread
From: Bridgman, John @ 2014-07-15 17:23 UTC (permalink / raw)
  To: Dave Airlie, Christian König
  Cc: Lewycky, Andrew, linux-kernel, dri-devel, Deucher, Alexander, akpm

>-----Original Message-----
>From: dri-devel [mailto:dri-devel-bounces@lists.freedesktop.org] On Behalf
>Of Bridgman, John
>Sent: Tuesday, July 15, 2014 1:07 PM
>To: Dave Airlie; Christian König
>Cc: Lewycky, Andrew; linux-kernel@vger.kernel.org; dri-
>devel@lists.freedesktop.org; Deucher, Alexander; akpm@linux-
>foundation.org
>Subject: RE: [PATCH 00/83] AMD HSA kernel driver
>
>
>
>>-----Original Message-----
>>From: Dave Airlie [mailto:airlied@gmail.com]
>>Sent: Tuesday, July 15, 2014 12:35 AM
>>To: Christian König
>>Cc: Jerome Glisse; Bridgman, John; Lewycky, Andrew; linux-
>>kernel@vger.kernel.org; dri-devel@lists.freedesktop.org; Deucher,
>>Alexander; akpm@linux-foundation.org
>>Subject: Re: [PATCH 00/83] AMD HSA kernel driver
>>
>>On 14 July 2014 18:37, Christian König <deathsimple@vodafone.de> wrote:
>>>> I vote for an HSA module that exposes the ioctls and is an intermediary
>>>> with the kernel drivers that handle the hardware. This gives a single
>>>> entry point for HSA hardware, and yes, this enforces things for any
>>>> hardware manufacturer. I am more than happy to tell them that this is
>>>> it and nothing else if they want to get upstream.
>>>
>>> I think we should still discuss this single point of entry a bit more.
>>>
>>> Just to make it clear the plan is to expose all physical HSA capable
>>> devices through a single /dev/hsa device node to userspace.
>>
>>This is why we don't design kernel interfaces in secret foundations,
>>and expect anyone to like them.
>
>Understood and agree. In this case though this isn't a cross-vendor interface
>designed by a secret committee, it's supposed to be more of an inoffensive
>little single-vendor interface designed *for* a secret committee. I'm hoping
>that's better ;)
>
>>
>>So before we go any further, how is this stuff planned to work for
>>multiple GPUs/accelerators?
>
>Three classes of "multiple" :
>
>1. Single CPU with IOMMUv2 and multiple GPUs:
>
>- all devices accessible via /dev/kfd
>- topology information identifies CPU + GPUs, each has "node ID" at top of
>userspace API, "global ID" at user/kernel interface  (don't think we've
>implemented CPU part yet though)
>- userspace builds snapshot from sysfs info & exposes to HSAIL runtime,
>which in turn exposes the "standard" API
>- kfd sets up ATC aperture so GPUs can access system RAM via IOMMUv2 (fast
>for APU, relatively less so for dGPU over PCIE)
>- to-be-added memory operations allow allocation & residency control
>(within existing gfx driver limits) of buffers in VRAM & carved-out system
>RAM
>- queue operations specify a node ID to userspace library, which translates to
>"global ID" before calling kfd
>
>2. Multiple CPUs connected via fabric (eg HyperTransport) each with 0 or
>more GPUs:
>
>- topology information exposes CPUs & GPUs, along with affinity info
>showing what is connected to what
>- everything else works as in (1) above

This is probably a good point to stress that HSA topology is only intended as an OS-independent way of communicating system info up to higher levels of the HSA stack, not as a new and competing way to *manage* system properties inside Linux or any other OS.

>
>3. Multiple CPUs not connected via fabric (eg a blade server) each with 0 or
>more GPUs
>
>- no attempt to cover this with HSA topology, each CPU and associated GPUs
>is accessed independently via separate /dev/kfd instances
>
>>
>>Do we have a userspace to exercise this interface so we can see how
>>such a thing would look?
>
>Yes -- initial IP review done, legal stuff done, sanitizing WIP, hoping for final
>approval this week
>
>There's a separate test harness to exercise the userspace lib calls, haven't
>started IP review or sanitizing for that but legal stuff is done
>
>>
>>Dave.
>_______________________________________________
>dri-devel mailing list
>dri-devel@lists.freedesktop.org
>http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 00/83] AMD HSA kernel driver
  2014-07-15 17:06                     ` Bridgman, John
@ 2014-07-15 17:37                       ` Jerome Glisse
  -1 siblings, 0 replies; 51+ messages in thread
From: Jerome Glisse @ 2014-07-15 17:37 UTC (permalink / raw)
  To: Bridgman, John
  Cc: Dave Airlie, Christian König, Lewycky, Andrew, linux-kernel,
	dri-devel, Deucher, Alexander, akpm

On Tue, Jul 15, 2014 at 05:06:56PM +0000, Bridgman, John wrote:
> >From: Dave Airlie [mailto:airlied@gmail.com]
> >Sent: Tuesday, July 15, 2014 12:35 AM
> >To: Christian König
> >Cc: Jerome Glisse; Bridgman, John; Lewycky, Andrew; linux-
> >kernel@vger.kernel.org; dri-devel@lists.freedesktop.org; Deucher,
> >Alexander; akpm@linux-foundation.org
> >Subject: Re: [PATCH 00/83] AMD HSA kernel driver
> >
> >On 14 July 2014 18:37, Christian König <deathsimple@vodafone.de> wrote:
> >>> I vote for an HSA module that exposes the ioctls and is an intermediary
> >>> with the kernel drivers that handle the hardware. This gives a single
> >>> entry point for HSA hardware, and yes, this enforces things for any
> >>> hardware manufacturer. I am more than happy to tell them that this is
> >>> it and nothing else if they want to get upstream.
> >>
> >> I think we should still discuss this single point of entry a bit more.
> >>
> >> Just to make it clear the plan is to expose all physical HSA capable
> >> devices through a single /dev/hsa device node to userspace.
> >
> >This is why we don't design kernel interfaces in secret foundations, and
> >expect anyone to like them.
> 
> Understood and agree. In this case though this isn't a cross-vendor interface designed by a secret committee, it's supposed to be more of an inoffensive little single-vendor interface designed *for* a secret committee. I'm hoping that's better ;)
> 
> >
> >So before we go any further, how is this stuff planned to work for multiple
> >GPUs/accelerators?
> 
> Three classes of "multiple" :
> 
> 1. Single CPU with IOMMUv2 and multiple GPUs:
> 
> - all devices accessible via /dev/kfd
> - topology information identifies CPU + GPUs, each has "node ID" at top of userspace API, "global ID" at user/kernel interface
>  (don't think we've implemented CPU part yet though)
> - userspace builds snapshot from sysfs info & exposes to HSAIL runtime, which in turn exposes the "standard" API

This is why i do not like the sysfs approach; it would be a lot nicer to
have a device file per provider, so hsail can listen for device file
events and discover when hardware is appearing or vanishing. Periodically
rescanning sysfs files is not the right way to do that.
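
For illustration, that event-driven discovery could be built on libudev
instead of rescanning sysfs; the "hsa" subsystem name here is an
assumption (link with -ludev):

/*
 * Sketch only: listen for add/remove events on a per-provider device
 * node instead of periodically rescanning sysfs.
 */
#include <libudev.h>
#include <poll.h>
#include <stdio.h>

static int watch_hsa_devices(void)
{
	struct udev *udev = udev_new();
	struct udev_monitor *mon;
	struct pollfd pfd;

	if (!udev)
		return -1;
	mon = udev_monitor_new_from_netlink(udev, "udev");
	udev_monitor_filter_add_match_subsystem_devtype(mon, "hsa", NULL);
	udev_monitor_enable_receiving(mon);

	pfd.fd = udev_monitor_get_fd(mon);
	pfd.events = POLLIN;

	for (;;) {
		struct udev_device *dev;
		const char *node;

		if (poll(&pfd, 1, -1) <= 0)
			continue;
		dev = udev_monitor_receive_device(mon);
		if (!dev)
			continue;
		node = udev_device_get_devnode(dev);	/* e.g. /dev/hsa/... */
		printf("%s %s\n", udev_device_get_action(dev),	/* "add"/"remove" */
		       node ? node : udev_device_get_syspath(dev));
		udev_device_unref(dev);
	}
}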

> - kfd sets up ATC aperture so GPUs can access system RAM via IOMMUv2 (fast for APU, relatively less so for dGPU over PCIE)
> - to-be-added memory operations allow allocation & residency control (within existing gfx driver limits) of buffers in VRAM & carved-out system RAM
> - queue operations specify a node ID to userspace library, which translates to "global ID" before calling kfd
> 
> 2. Multiple CPUs connected via fabric (eg HyperTransport) each with 0 or more GPUs:
> 
> - topology information exposes CPUs & GPUs, along with affinity info showing what is connected to what
> - everything else works as in (1) above
> 

This is supposed to be part of HSA? This is a lot broader than i thought.

> 3. Multiple CPUs not connected via fabric (eg a blade server) each with 0 or more GPUs
> 
> - no attempt to cover this with HSA topology, each CPU and associated GPUs is accessed independently via separate /dev/kfd instances
> 
> >
> >Do we have a userspace to exercise this interface so we can see how such a
> >thing would look?
> 
> Yes -- initial IP review done, legal stuff done, sanitizing WIP, hoping for final approval this week
> 
> There's a separate test harness to exercise the userspace lib calls, haven't started IP review or sanitizing for that but legal stuff is done
> 
> >
> >Dave.

* RE: [PATCH 00/83] AMD HSA kernel driver
  2014-07-15 17:37                       ` Jerome Glisse
@ 2014-07-15 17:53                         ` Bridgman, John
  -1 siblings, 0 replies; 51+ messages in thread
From: Bridgman, John @ 2014-07-15 17:53 UTC (permalink / raw)
  To: Jerome Glisse
  Cc: Dave Airlie, Christian König, Lewycky, Andrew, linux-kernel,
	dri-devel, Deucher, Alexander, akpm



>-----Original Message-----
>From: Jerome Glisse [mailto:j.glisse@gmail.com]
>Sent: Tuesday, July 15, 2014 1:37 PM
>To: Bridgman, John
>Cc: Dave Airlie; Christian König; Lewycky, Andrew; linux-
>kernel@vger.kernel.org; dri-devel@lists.freedesktop.org; Deucher,
>Alexander; akpm@linux-foundation.org
>Subject: Re: [PATCH 00/83] AMD HSA kernel driver
>
>On Tue, Jul 15, 2014 at 05:06:56PM +0000, Bridgman, John wrote:
>> >From: Dave Airlie [mailto:airlied@gmail.com]
>> >Sent: Tuesday, July 15, 2014 12:35 AM
>> >To: Christian König
>> >Cc: Jerome Glisse; Bridgman, John; Lewycky, Andrew; linux-
>> >kernel@vger.kernel.org; dri-devel@lists.freedesktop.org; Deucher,
>> >Alexander; akpm@linux-foundation.org
>> >Subject: Re: [PATCH 00/83] AMD HSA kernel driver
>> >
>> >On 14 July 2014 18:37, Christian König <deathsimple@vodafone.de> wrote:
>> >>> I vote for HSA module that expose ioctl and is an intermediary
>> >>> with the kernel driver that handle the hardware. This gives a
>> >>> single point for HSA hardware and yes this enforce things for any
>> >>> hardware
>> >manufacturer.
>> >>> I am more than happy to tell them that this is it and nothing else
>> >>> if they want to get upstream.
>> >>
>> >> I think we should still discuss this single point of entry a bit more.
>> >>
>> >> Just to make it clear the plan is to expose all physical HSA
>> >> capable devices through a single /dev/hsa device node to userspace.
>> >
>> >This is why we don't design kernel interfaces in secret foundations,
>> >and expect anyone to like them.
>>
>> Understood and agree. In this case though this isn't a cross-vendor
>> interface designed by a secret committee, it's supposed to be more of
>> an inoffensive little single-vendor interface designed *for* a secret
>> committee. I'm hoping that's better ;)
>>
>> >
>> >So before we go any further, how is this stuff planned to work for
>> >multiple GPUs/accelerators?
>>
>> Three classes of "multiple" :
>>
>> 1. Single CPU with IOMMUv2 and multiple GPUs:
>>
>> - all devices accessible via /dev/kfd
>> - topology information identifies CPU + GPUs, each has "node ID" at
>> top of userspace API, "global ID" at user/kernel interface  (don't
>> think we've implemented CPU part yet though)
>> - userspace builds snapshot from sysfs info & exposes to HSAIL
>> runtime, which in turn exposes the "standard" API
>
>This is why i do not like the sysfs approach, it would be lot nicer to have
>device file per provider and thus hsail can listen on device file event and
>discover if hardware is vanishing or appearing. Periodicaly going over sysfs
>files is not the right way to do that.

Agree that wouldn't be good. There's an event mechanism still to come - mostly for communicating fences and shader interrupts back to userspace, but also used for "device change" notifications, so no polling of sysfs.
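
On the userspace side this should end up as a plain poll()/read() loop on
the device file. A rough sketch of the consumption model (the event record
and the read() semantics below are invented for illustration; the real
interface isn't final yet):

#include <fcntl.h>
#include <poll.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Made-up event record, just to show the shape of the loop. */
struct kfd_event {
        uint32_t type;          /* fence / shader interrupt / device change */
        uint64_t payload;
};

int main(void)
{
        int fd = open("/dev/kfd", O_RDWR);
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        struct kfd_event ev;

        while (poll(&pfd, 1, -1) > 0) {
                if (read(fd, &ev, sizeof(ev)) != sizeof(ev))
                        break;
                printf("event type %u payload %#llx\n",
                       ev.type, (unsigned long long)ev.payload);
        }
        return 0;
}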

>
>> - kfd sets up ATC aperture so GPUs can access system RAM via IOMMUv2
>> (fast for APU, relatively less so for dGPU over PCIE)
>> - to-be-added memory operations allow allocation & residency control
>> (within existing gfx driver limits) of buffers in VRAM & carved-out
>> system RAM
>> - queue operations specify a node ID to userspace library, which
>> translates to "global ID" before calling kfd
>>
>> 2. Multiple CPUs connected via fabric (eg HyperTransport) each with 0 or
>more GPUs:
>>
>> - topology information exposes CPUs & GPUs, along with affinity info
>> showing what is connected to what
>> - everything else works as in (1) above
>>
>
>This is suppose to be part of HSA ? This is lot broader than i thought.

Yes, although it can be skipped on most systems. We figured that topology needed to cover everything that would be handled by a single OS image, so in a NUMA system it would need to cover all the CPUs. I think that is still the right scope, do you agree?

>
>> 3. Multiple CPUs not connected via fabric (eg a blade server) each
>> with 0 or more GPUs
>>
>> - no attempt to cover this with HSA topology, each CPU and associated
>> GPUs is accessed independently via separate /dev/kfd instances
>>
>> >
>> >Do we have a userspace to exercise this interface so we can see how
>> >such a thing would look?
>>
>> Yes -- initial IP review done, legal stuff done, sanitizing WIP,
>> hoping for final approval this week
>>
>> There's a separate test harness to exercise the userspace lib calls,
>> haven't started IP review or sanitizing for that but legal stuff is
>> done
>>
>> >
>> >Dave.

* Re: [PATCH 00/83] AMD HSA kernel driver
  2014-07-15 17:53                         ` Bridgman, John
@ 2014-07-15 18:04                           ` Jerome Glisse
  -1 siblings, 0 replies; 51+ messages in thread
From: Jerome Glisse @ 2014-07-15 18:04 UTC (permalink / raw)
  To: Bridgman, John
  Cc: Dave Airlie, Christian König, Lewycky, Andrew, linux-kernel,
	dri-devel, Deucher, Alexander, akpm

On Tue, Jul 15, 2014 at 05:53:32PM +0000, Bridgman, John wrote:
> >From: Jerome Glisse [mailto:j.glisse@gmail.com]
> >Sent: Tuesday, July 15, 2014 1:37 PM
> >To: Bridgman, John
> >Cc: Dave Airlie; Christian König; Lewycky, Andrew; linux-
> >kernel@vger.kernel.org; dri-devel@lists.freedesktop.org; Deucher,
> >Alexander; akpm@linux-foundation.org
> >Subject: Re: [PATCH 00/83] AMD HSA kernel driver
> >
> >On Tue, Jul 15, 2014 at 05:06:56PM +0000, Bridgman, John wrote:
> >> >From: Dave Airlie [mailto:airlied@gmail.com]
> >> >Sent: Tuesday, July 15, 2014 12:35 AM
> >> >To: Christian König
> >> >Cc: Jerome Glisse; Bridgman, John; Lewycky, Andrew; linux-
> >> >kernel@vger.kernel.org; dri-devel@lists.freedesktop.org; Deucher,
> >> >Alexander; akpm@linux-foundation.org
> >> >Subject: Re: [PATCH 00/83] AMD HSA kernel driver
> >> >
> >> >On 14 July 2014 18:37, Christian König <deathsimple@vodafone.de> wrote:
> >> >>> I vote for HSA module that expose ioctl and is an intermediary
> >> >>> with the kernel driver that handle the hardware. This gives a
> >> >>> single point for HSA hardware and yes this enforce things for any
> >> >>> hardware
> >> >manufacturer.
> >> >>> I am more than happy to tell them that this is it and nothing else
> >> >>> if they want to get upstream.
> >> >>
> >> >> I think we should still discuss this single point of entry a bit more.
> >> >>
> >> >> Just to make it clear the plan is to expose all physical HSA
> >> >> capable devices through a single /dev/hsa device node to userspace.
> >> >
> >> >This is why we don't design kernel interfaces in secret foundations,
> >> >and expect anyone to like them.
> >>
> >> Understood and agree. In this case though this isn't a cross-vendor
> >> interface designed by a secret committee, it's supposed to be more of
> >> an inoffensive little single-vendor interface designed *for* a secret
> >> committee. I'm hoping that's better ;)
> >>
> >> >
> >> >So before we go any further, how is this stuff planned to work for
> >> >multiple GPUs/accelerators?
> >>
> >> Three classes of "multiple" :
> >>
> >> 1. Single CPU with IOMMUv2 and multiple GPUs:
> >>
> >> - all devices accessible via /dev/kfd
> >> - topology information identifies CPU + GPUs, each has "node ID" at
> >> top of userspace API, "global ID" at user/kernel interface  (don't
> >> think we've implemented CPU part yet though)
> >> - userspace builds snapshot from sysfs info & exposes to HSAIL
> >> runtime, which in turn exposes the "standard" API
> >
> >This is why i do not like the sysfs approach, it would be lot nicer to have
> >device file per provider and thus hsail can listen on device file event and
> >discover if hardware is vanishing or appearing. Periodicaly going over sysfs
> >files is not the right way to do that.
> 
> Agree that wouldn't be good. There's an event mechanism still to come - mostly
> for communicating fences and shader interrupts back to userspace, but also used
> for "device change" notifications, so no polling of sysfs.
> 

My point being: do not use sysfs. Use /dev/hsa/device* and have the HSAIL
runtime listen for file events on the /dev/hsa/ directory. HSAIL would then
be informed of new devices and of devices that are unloaded. It would do a
first pass opening each device file and querying its capabilities through a
standardized ioctl.
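
A minimal sketch of that first pass (everything here is hypothetical: no
/dev/hsa/device* exists today, and HSA_IOC_GET_CAPS is a made-up ioctl
standing in for whatever would get standardized):

#include <fcntl.h>
#include <glob.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/ioctl.h>
#include <unistd.h>

struct hsa_caps {                       /* made-up capability record */
        unsigned num_queues;
        unsigned flags;
};
#define HSA_IOC_GET_CAPS _IOR('H', 0x01, struct hsa_caps)  /* made up */

int main(void)
{
        glob_t g;
        size_t i;

        if (glob("/dev/hsa/device*", 0, NULL, &g) != 0)
                return 0;
        for (i = 0; i < g.gl_pathc; i++) {
                struct hsa_caps caps;
                int fd = open(g.gl_pathv[i], O_RDWR);

                if (fd < 0)
                        continue;
                if (ioctl(fd, HSA_IOC_GET_CAPS, &caps) == 0)
                        printf("%s: %u queues\n", g.gl_pathv[i],
                               caps.num_queues);
                close(fd);
        }
        globfree(&g);
        return 0;
}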

Though maybe sysfs is OK, given that CPU NUMA topology is already exposed through sysfs.

> >
> >> - kfd sets up ATC aperture so GPUs can access system RAM via IOMMUv2
> >> (fast for APU, relatively less so for dGPU over PCIE)
> >> - to-be-added memory operations allow allocation & residency control
> >> (within existing gfx driver limits) of buffers in VRAM & carved-out
> >> system RAM
> >> - queue operations specify a node ID to userspace library, which
> >> translates to "global ID" before calling kfd
> >>
> >> 2. Multiple CPUs connected via fabric (eg HyperTransport) each with 0 or
> >more GPUs:
> >>
> >> - topology information exposes CPUs & GPUs, along with affinity info
> >> showing what is connected to what
> >> - everything else works as in (1) above
> >>
> >
> >This is suppose to be part of HSA ? This is lot broader than i thought.
> 
> Yes although it can be skipped on most systems. We figured that topology
> needed to cover everything that would be handled by a single OS image, so
> in a NUMA system it would need to cover all the CPUs. I think that is still
> the right scope, do you agree ?

I think it is a bad idea to duplicate the CPU information. I would rather
have each device report its affinity against each CPU, and for CPUs just
keep the existing kernel API that exposes this through sysfs, IIRC.

> 
> >
> >> 3. Multiple CPUs not connected via fabric (eg a blade server) each
> >> with 0 or more GPUs
> >>
> >> - no attempt to cover this with HSA topology, each CPU and associated
> >> GPUs is accessed independently via separate /dev/kfd instances
> >>
> >> >
> >> >Do we have a userspace to exercise this interface so we can see how
> >> >such a thing would look?
> >>
> >> Yes -- initial IP review done, legal stuff done, sanitizing WIP,
> >> hoping for final approval this week
> >>
> >> There's a separate test harness to exercise the userspace lib calls,
> >> haven't started IP review or sanitizing for that but legal stuff is
> >> done
> >>
> >> >
> >> >Dave.

* Re: [PATCH 00/83] AMD HSA kernel driver
  2014-07-15 17:53                         ` Bridgman, John
@ 2014-07-16  8:21                           ` Daniel Vetter
  -1 siblings, 0 replies; 51+ messages in thread
From: Daniel Vetter @ 2014-07-16  8:21 UTC (permalink / raw)
  To: Bridgman, John
  Cc: Jerome Glisse, Lewycky, Andrew, linux-kernel, dri-devel, Deucher,
	Alexander, akpm, Greg KH, Maarten Lankhorst, linaro-mm-sig

On Tue, Jul 15, 2014 at 7:53 PM, Bridgman, John <John.Bridgman@amd.com> wrote:
[snip away the discussion about hsa device discovery, I'm hijacking
this thread just for the event/fence stuff here.]

> ... There's an event mechanism still to come - mostly for communicating fences and shader interrupts back to userspace, but also used for "device change" notifications, so no polling of sysfs.

That would be interesting. On i915 my plan is to internally use
the recently added struct fence from Maarten. For the external
interface for userspace that wants explicit control over fences I'm
leaning towards polishing the android syncpt stuff (currently in
staging). But in any case I _really_ want to avoid that we end up with
multiple different and incompatible explicit fencing interfaces on
linux.
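
For reference, the internal side boils down to embedding a struct fence in
the driver and signalling it from the completion interrupt. A rough sketch
from memory against the fence.h currently in linux-next, so treat the exact
signatures as approximate:

#include <linux/fence.h>
#include <linux/spinlock.h>

static DEFINE_SPINLOCK(demo_fence_lock);

static const char *demo_get_driver_name(struct fence *f)
{
        return "demo";
}

static const char *demo_get_timeline_name(struct fence *f)
{
        return "demo-ring";
}

static bool demo_enable_signaling(struct fence *f)
{
        return true;    /* completion irq is always armed */
}

static const struct fence_ops demo_fence_ops = {
        .get_driver_name   = demo_get_driver_name,
        .get_timeline_name = demo_get_timeline_name,
        .enable_signaling  = demo_enable_signaling,
        .wait              = fence_default_wait,
};

/* At submission time: tie the fence to a (context, seqno) pair. */
void demo_fence_attach(struct fence *f, unsigned context, unsigned seqno)
{
        fence_init(f, &demo_fence_ops, &demo_fence_lock, context, seqno);
}

/* From the completion interrupt handler. */
void demo_fence_complete(struct fence *f)
{
        fence_signal(f);
}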

Adding relevant people.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

* Re: [PATCH 00/83] AMD HSA kernel driver
  2014-07-15 18:04                           ` Jerome Glisse
@ 2014-07-16  8:27                             ` Daniel Vetter
  -1 siblings, 0 replies; 51+ messages in thread
From: Daniel Vetter @ 2014-07-16  8:27 UTC (permalink / raw)
  To: Jerome Glisse
  Cc: Bridgman, John, Lewycky, Andrew, linux-kernel, dri-devel,
	Deucher, Alexander, akpm

On Tue, Jul 15, 2014 at 8:04 PM, Jerome Glisse <j.glisse@gmail.com> wrote:
>> Yes although it can be skipped on most systems. We figured that topology
>> needed to cover everything that would be handled by a single OS image, so
>> in a NUMA system it would need to cover all the CPUs. I think that is still
>> the right scope, do you agree ?
>
> I think it is a bad idea to duplicate the CPU information. I would rather
> have each device report its affinity against each CPU, and for CPUs just
> keep the existing kernel API that exposes this through sysfs, IIRC.

It's all there already if we fix up the hsa dev-node model to expose
one dev node per underlying device instead of one for everything:
- cpus already expose the full numa topology in sysfs
- pci devices have a numa_node file in sysfs to display the link
- we can easily add similar stuff for platform devices on arm socs
without pci devices.

Then the only thing userspace needs to do is follow the device link in
the hsa instance node in sysfs and we have all the information
exposed. Iff we expose one hsa driver instance to userspace per
physical device (which is the normal linux device driver model
anyway).
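
E.g. resolving the NUMA link of a pci device needs nothing new at all (the
device address below is just an example):

#include <stdio.h>

/* Returns the numa node of a pci device, or -1 if none/unknown;
 * numa_node is a standard sysfs attribute. */
int pci_numa_node(const char *bdf)
{
        char path[128];
        FILE *f;
        int node = -1;

        snprintf(path, sizeof(path),
                 "/sys/bus/pci/devices/%s/numa_node", bdf);
        f = fopen(path, "r");
        if (!f)
                return -1;
        if (fscanf(f, "%d", &node) != 1)
                node = -1;
        fclose(f);
        return node;
}

/* e.g. pci_numa_node("0000:01:00.0") */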

I don't see a need to add anything hsa specific here at all (well
maybe some description of the cache architecture on the hsa device
itself, the spec seems to have provisions for that).
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

* Re: [PATCH 00/83] AMD HSA kernel driver
  2014-07-16  8:27                             ` Daniel Vetter
@ 2014-07-16 14:52                               ` Jerome Glisse
  -1 siblings, 0 replies; 51+ messages in thread
From: Jerome Glisse @ 2014-07-16 14:52 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Bridgman, John, Lewycky, Andrew, linux-kernel, dri-devel,
	Deucher, Alexander, akpm

On Wed, Jul 16, 2014 at 10:27:42AM +0200, Daniel Vetter wrote:
> On Tue, Jul 15, 2014 at 8:04 PM, Jerome Glisse <j.glisse@gmail.com> wrote:
> >> Yes although it can be skipped on most systems. We figured that topology
> >> needed to cover everything that would be handled by a single OS image, so
> >> in a NUMA system it would need to cover all the CPUs. I think that is still
> >> the right scope, do you agree ?
> >
> > I think it is a bad idea to duplicate the CPU information. I would rather
> > have each device report its affinity against each CPU, and for CPUs just
> > keep the existing kernel API that exposes this through sysfs, IIRC.
> 
> It's all there already if we fix up the hsa dev-node model to expose
> one dev node per underlying device instead of one for everything:
> - cpus already expose the full numa topology in sysfs
> - pci devices have a numa_node file in sysfs to display the link
> - we can easily add similar stuff for platform devices on arm socs
> without pci devices.
> 
> Then the only thing userspace needs to do is follow the device link in
> the hsa instance node in sysfs and we have all the information
> exposed. Iff we expose one hsa driver instance to userspace per
> physical device (which is the normal linux device driver model
> anyway).
> 
> I don't see a need to add anything hsa specific here at all (well
> maybe some description of the cache architecture on the hsa device
> itself, the spec seems to have provisions for that).
> -Daniel

What is HSA specific is the userspace command queue, in the form of a
common ring buffer execution queue where everything shares a common
packet format. So yes, I see a reason for an HSA class that provides
common ioctls through one dev file per device. Note that I am not a fan
of userspace command queues, given that Linux ioctl overhead is small
and having the kernel do the submission would allow a really "infinite"
number of userspace contexts, while right now the limit is
DOORBELL_APERTURE_SIZE/PAGE_SIZE.
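
To illustrate: each queue owns a ring buffer plus one page in the doorbell
aperture, which is where that limit comes from (e.g. a hypothetical 8 MiB
aperture with 4 KiB pages caps you at 2048 queues). A sketch of what the
submission path looks like; the struct layout below is invented, the real
packet formats are defined by the HSA specs:

#include <stdint.h>

/* Invented layout: one mmap()ed ring buffer plus one mmap()ed page in
 * the doorbell aperture per queue. */
struct user_queue {
        uint32_t *ring;                 /* shared with the hw scheduler */
        volatile uint32_t *doorbell;    /* one page per queue */
        uint32_t wptr;                  /* write index, in dwords */
        uint32_t size_dw;               /* ring size, in dwords */
};

/* Submit one packet without entering the kernel: copy it into the ring,
 * then ring the doorbell so the hw scheduler picks it up. */
static void submit_packet(struct user_queue *q, const uint32_t *pkt,
                          uint32_t ndw)
{
        uint32_t i;

        for (i = 0; i < ndw; i++)
                q->ring[(q->wptr + i) % q->size_dw] = pkt[i];
        q->wptr = (q->wptr + ndw) % q->size_dw;
        __sync_synchronize();   /* packet visible before the doorbell */
        *q->doorbell = q->wptr;
}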

No, CPUs should not be included, and neither should the NUMA topology
of the device. And yes, all NUMA topology should use the existing
kernel interfaces. I do however understand that a second, GPU-specific
topology might make sense, i.e. if you have a specialized link between
some discrete GPUs.

So if Intel wants to join the HSA foundation, fine, but unless you are
ready to implement what is needed I do not see the value of forcing
your wishes on another group that is trying to standardize something.

Cheers,
Jérôme

* Re: [PATCH 00/83] AMD HSA kernel driver
  2014-07-16 14:52                               ` Jerome Glisse
@ 2014-07-16 15:25                                 ` Daniel Vetter
  -1 siblings, 0 replies; 51+ messages in thread
From: Daniel Vetter @ 2014-07-16 15:25 UTC (permalink / raw)
  To: Jerome Glisse
  Cc: Daniel Vetter, Bridgman, John, Lewycky, Andrew, linux-kernel,
	dri-devel, Deucher, Alexander, akpm

On Wed, Jul 16, 2014 at 10:52:56AM -0400, Jerome Glisse wrote:
> On Wed, Jul 16, 2014 at 10:27:42AM +0200, Daniel Vetter wrote:
> > On Tue, Jul 15, 2014 at 8:04 PM, Jerome Glisse <j.glisse@gmail.com> wrote:
> > >> Yes although it can be skipped on most systems. We figured that topology
> > >> needed to cover everything that would be handled by a single OS image, so
> > >> in a NUMA system it would need to cover all the CPUs. I think that is still
> > >> the right scope, do you agree ?
> > >
> > > I think it is a bad idea to duplicate the CPU information. I would rather
> > > have each device report its affinity against each CPU, and for CPUs just
> > > keep the existing kernel API that exposes this through sysfs, IIRC.
> > 
> > It's all there already if we fix up the hsa dev-node model to expose
> > one dev node per underlying device instead of one for everything:
> > - cpus already expose the full numa topology in sysfs
> > - pci devices have a numa_node file in sysfs to display the link
> > - we can easily add similar stuff for platform devices on arm socs
> > without pci devices.
> > 
> > Then the only thing userspace needs to do is follow the device link in
> > the hsa instance node in sysfs and we have all the information
> > exposed. Iff we expose one hsa driver instance to userspace per
> > physical device (which is the normal linux device driver model
> > anyway).
> > 
> > I don't see a need to add anything hsa specific here at all (well
> > maybe some description of the cache architecture on the hsa device
> > itself, the spec seems to have provisions for that).
> > -Daniel
> 
> What is HSA specific is userspace command queue in form of common ring
> buffer execution queue all sharing common packet format. So yes i see
> a reason for an HSA class that provide common ioctl through one dev file
> per device. Note that i am not a fan of userspace command queue given
> that linux ioctl overhead is small and having kernel do stuff would
> allow for really "infinite" number of userspace context while right
> now limit is DOORBELL_APERTURE_SIZE/PAGE_SIZE.
> 
> No, CPU should not be included, neither should the numa topology of
> device. And yes all numa topology should use existing kernel interface.
> I however understand that a second GPU specific topology might make
> sense ie if you have specialize link btw some discrete GPU.
> 
> So if Intel wants to join the HSA foundation fine, but unless you are
> ready to implement what is needed i do not see the value of forcing
> your wish on another group that is trying to standardize something.

You're mixing up my replies ;-) This was really just a comment on the
proposed hsa interfaces for exposing the topology - we already have all
the stuff exposed in sysfs for cpus and pci devices, so exposing this
again through a hsa specific interface doesn't make much sense imo.

What Intel does or does not do is completely irrelevant to my comment, i.e.
I've written the above with my drm hacker hat on, not with my Intel hat
on.

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

* Re: [PATCH 00/83] AMD HSA kernel driver
  2014-07-16  8:21                           ` Daniel Vetter
@ 2014-07-16 19:02                             ` Greg KH
  -1 siblings, 0 replies; 51+ messages in thread
From: Greg KH @ 2014-07-16 19:02 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Bridgman, John, Jerome Glisse, Lewycky, Andrew, linux-kernel,
	dri-devel, Deucher, Alexander, akpm, Maarten Lankhorst,
	linaro-mm-sig

On Wed, Jul 16, 2014 at 10:21:14AM +0200, Daniel Vetter wrote:
> On Tue, Jul 15, 2014 at 7:53 PM, Bridgman, John <John.Bridgman@amd.com> wrote:
> [snip away the discussion about hsa device discover, I'm hijacking
> this thread just for the event/fence stuff here.]
> 
> > ... There's an event mechanism still to come - mostly for communicating fences and shader interrupts back to userspace, but also used for "device change" notifications, so no polling of sysfs.
> 
> That could would be interesting. On i915 my plan is to internally use
> the recently added struct fence from Maarten. For the external
> interface for userspace that wants explicit control over fences I'm
> leaning towards polishing the android syncpt stuff (currently in
> staging). But in any case I _really_ want to avoid that we end up with
> multiple different and incompatible explicit fencing interfaces on
> linux.

I agree, and I'll say it stronger than that, we WILL NOT have different
and incompatible fencing interfaces in the kernel.  That way lies
madness.

John, take a look at what is now in linux-next, it should provide what
you need here, right?

thanks,

greg k-h

* Re: [PATCH 00/83] AMD HSA kernel driver
  2014-07-13 16:40                 ` Jerome Glisse
  (?)
@ 2014-07-23 23:57                 ` Jesse Barnes
  2014-07-24  7:16                   ` Daniel Vetter
  -1 siblings, 1 reply; 51+ messages in thread
From: Jesse Barnes @ 2014-07-23 23:57 UTC (permalink / raw)
  To: dri-devel

On Sun, 13 Jul 2014 12:40:32 -0400
j.glisse@gmail.com (Jerome Glisse) wrote:

> On Sun, Jul 13, 2014 at 11:42:58AM +0200, Daniel Vetter wrote:
> > On Sat, Jul 12, 2014 at 6:49 PM, Jerome Glisse <j.glisse@gmail.com> wrote:
> > >> Hm, so the hsa part is a completely new driver/subsystem, not just an
> > >> additional ioctl tacked onto radoen? The history of drm is littered with
> > >> "generic" ioctls that turned out to be useful for exactly one driver.
> > >> Which is why _all_ the command submission is now done with driver-private
> > >> ioctls.
> > >>
> > >> I'd be quite a bit surprised if that suddenly works differently, so before
> > >> we bless a generic hsa interface I really want to see some implementation
> > >> from a different vendor (i.e. nvdidia or intel) using the same ioctls.
> > >> Otherwise we just repeat history and I'm not terribly inclined to keep on
> > >> cleanup up cruft forever - one drm legacy is enough ;-)
> > >>
> > >> Jesse is the guy from our side to talk to about this.
> > >> -Daniel
> > >
> > > I am not worried about that side, the hsa foundation has pretty strict
> > > guidelines on what is hsa compliant hardware ie the hw needs to understand
> > > the pm4 packet format of radeon (well small subset of it). But of course
> > > this require hsa compliant hardware and from member i am guessing ARM Mali,
> > > ImgTech, Qualcomm, ... so unless Intel and NVidia joins hsa you will not
> > > see it for those hardware.
> > >
> > > So yes for once same ioctl would apply to different hardware. The only things
> > > that is different is the shader isa. The hsafoundation site has some pdf
> > > explaining all that but someone thought that slideshare would be a good idea
> > > personnaly i would not register to any of the website just to get the pdf.
> > >
> > > So to sumup i am ok with having a new device file that present uniform set
> > > of ioctl. It would actualy be lot easier for userspace, just open this fix
> > > device file and ask for list of compliant hardware.
> > >
> > > Then radeon kernel driver would register itself as a provider. So all ioctl
> > > decoding marshalling would be share which makes sense.
> > 
> > There's also the other side namely that preparing the cp ring in
> > userspace and submitting the entire pile through a doorbell to the hw
> > scheduler isn't really hsa exclusive. And for a solid platform with
> > seamless gpu/cpu integration that means we need standard ways to set
> > gpu context priorities and get at useful stats like gpu time used by a
> > given context.
> > 
> > To get there I guess intel/nvidia need to reuse the hsa subsystem with
> > the command submission adjusted a bit. Kinda like drm where kms and
> > buffer sharing is common and cs driver specific.
> 
> HSA module would be for HSA compliant hardware and thus hardware would
> need to follow HSA specification which again is pretty clear on what
> the hardware need to provide. So if Intel and NVidia wants to join HSA
> i am sure they would be welcome, the more the merrier :)
> 
> So i would not block HSA kernel ioctl design in order to please non HSA
> hardware especialy if at this point in time nor Intel or NVidia can
> share anything concret on the design and how this things should be setup
> for there hardware.
> 
> When Intel or NVidia present their own API they should provide their
> own set of ioctl through their own platform.

Yeah things are different enough that a uniform ioctl doesn't make
sense.  If/when all the vendors decide on a single standard, we can use
that, but until then I don't see a nice way to share our doorbell &
submission scheme with HSA, and I assume nvidia is the same.

Using HSA as a basis for non-HSA systems seems like it would add a lot
of complexity, since non-HSA hardware would have to intercept the queue
writes and manage the submission requests etc as bytecodes in the
kernel driver, or maybe as a shim layer library that wraps that stuff.

Probably not worth the effort given that the command sets themselves
are all custom as well, driven by specific user level drivers like GL,
CL, and libva.

-- 
Jesse Barnes, Intel Open Source Technology Center

* Re: [PATCH 00/83] AMD HSA kernel driver
  2014-07-23 23:57                 ` Jesse Barnes
@ 2014-07-24  7:16                   ` Daniel Vetter
  0 siblings, 0 replies; 51+ messages in thread
From: Daniel Vetter @ 2014-07-24  7:16 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: dri-devel

On Wed, Jul 23, 2014 at 04:57:48PM -0700, Jesse Barnes wrote:
> On Sun, 13 Jul 2014 12:40:32 -0400
j.glisse@gmail.com (Jerome Glisse) wrote:
> 
> > On Sun, Jul 13, 2014 at 11:42:58AM +0200, Daniel Vetter wrote:
> > > On Sat, Jul 12, 2014 at 6:49 PM, Jerome Glisse <j.glisse@gmail.com> wrote:
> > > >> Hm, so the hsa part is a completely new driver/subsystem, not just an
> > > >> additional ioctl tacked onto radoen? The history of drm is littered with
> > > >> "generic" ioctls that turned out to be useful for exactly one driver.
> > > >> Which is why _all_ the command submission is now done with driver-private
> > > >> ioctls.
> > > >>
> > > >> I'd be quite a bit surprised if that suddenly works differently, so before
> > > >> we bless a generic hsa interface I really want to see some implementation
> > > >> from a different vendor (i.e. nvdidia or intel) using the same ioctls.
> > > >> Otherwise we just repeat history and I'm not terribly inclined to keep on
> > > >> cleanup up cruft forever - one drm legacy is enough ;-)
> > > >>
> > > >> Jesse is the guy from our side to talk to about this.
> > > >> -Daniel
> > > >
> > > > I am not worried about that side, the hsa foundation has pretty strict
> > > > guidelines on what is hsa compliant hardware ie the hw needs to understand
> > > > the pm4 packet format of radeon (well small subset of it). But of course
> > > > this require hsa compliant hardware and from member i am guessing ARM Mali,
> > > > ImgTech, Qualcomm, ... so unless Intel and NVidia joins hsa you will not
> > > > see it for those hardware.
> > > >
> > > > So yes for once same ioctl would apply to different hardware. The only things
> > > > that is different is the shader isa. The hsafoundation site has some pdf
> > > > explaining all that but someone thought that slideshare would be a good idea
> > > > personnaly i would not register to any of the website just to get the pdf.
> > > >
> > > > So to sumup i am ok with having a new device file that present uniform set
> > > > of ioctl. It would actualy be lot easier for userspace, just open this fix
> > > > device file and ask for list of compliant hardware.
> > > >
> > > > Then radeon kernel driver would register itself as a provider. So all ioctl
> > > > decoding marshalling would be share which makes sense.
> > > 
> > > There's also the other side namely that preparing the cp ring in
> > > userspace and submitting the entire pile through a doorbell to the hw
> > > scheduler isn't really hsa exclusive. And for a solid platform with
> > > seamless gpu/cpu integration that means we need standard ways to set
> > > gpu context priorities and get at useful stats like gpu time used by a
> > > given context.
> > > 
> > > To get there I guess intel/nvidia need to reuse the hsa subsystem with
> > > the command submission adjusted a bit. Kinda like drm where kms and
> > > buffer sharing is common and cs driver specific.
> > 
> > HSA module would be for HSA compliant hardware and thus hardware would
> > need to follow HSA specification which again is pretty clear on what
> > the hardware need to provide. So if Intel and NVidia wants to join HSA
> > i am sure they would be welcome, the more the merrier :)
> > 
> > So i would not block HSA kernel ioctl design in order to please non HSA
> > hardware especialy if at this point in time nor Intel or NVidia can
> > share anything concret on the design and how this things should be setup
> > for there hardware.
> > 
> > When Intel or NVidia present their own API they should provide their
> > own set of ioctl through their own platform.
> 
> Yeah things are different enough that a uniform ioctl doesn't make
> sense.  If/when all the vendors decide on a single standard, we can use
> that, but until then I don't see a nice way to share our doorbell &
> submission scheme with HSA, and I assume nvidia is the same.
> 
> Using HSA as a basis for non-HSA systems seems like it would add a lot
> of complexity, since non-HSA hardware would have to intercept the queue
> writes and manage the submission requests etc as bytecodes in the
> kernel driver, or maybe as a shim layer library that wraps that stuff.
> 
> Probably not worth the effort given that the command sets themselves
> are all custom as well, driven by specific user level drivers like GL,
> CL, and libva.

Well I know that - drm also has the split between shared management stuff
like prime and driver private cmd submission. I still think that some
common interfaces would benefit us. I want things like a gputop (and also
perf counters and all that) to work the same way with the same tooling on
all svm/gpgpu stuff. So a shared namespace/create for svm contexts or
something like that.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

end of thread

Thread overview:
2014-07-10 21:45 [PATCH 00/83] AMD HSA kernel driver Oded Gabbay
2014-07-10 22:24 ` Jerome Glisse
2014-07-10 22:51   ` Gabbay, Oded
2014-07-11 21:18     ` Jerome Glisse
2014-07-12  9:24       ` Christian König
2014-07-12 11:10         ` Daniel Vetter
2014-07-12 16:49           ` Jerome Glisse
2014-07-13  9:42             ` Daniel Vetter
2014-07-13 16:40               ` Jerome Glisse
2014-07-23 23:57                 ` Jesse Barnes
2014-07-24  7:16                   ` Daniel Vetter
2014-07-12 21:55       ` Gabbay, Oded
2014-07-13  3:55         ` Jerome Glisse
2014-07-13 15:34           ` Bridgman, John
2014-07-13 16:49             ` Jerome Glisse
2014-07-14  8:37               ` Christian König
2014-07-15  4:35                 ` Dave Airlie
2014-07-15 14:29                   ` Jerome Glisse
2014-07-15 17:06                   ` Bridgman, John
2014-07-15 17:23                     ` Bridgman, John
2014-07-15 17:37                     ` Jerome Glisse
2014-07-15 17:53                       ` Bridgman, John
2014-07-15 18:04                         ` Jerome Glisse
2014-07-16  8:27                           ` Daniel Vetter
2014-07-16 14:52                             ` Jerome Glisse
2014-07-16 15:25                               ` Daniel Vetter
2014-07-16  8:21                         ` Daniel Vetter
2014-07-16 19:02                           ` Greg KH
