linux-kernel.vger.kernel.org archive mirror
* [git pull] habanalabs pull request for kernel 5.15
From: Oded Gabbay @ 2021-08-19 11:02 UTC
  To: gregkh; +Cc: linux-kernel

Hi Greg,

This is the habanalabs pull request for the merge window of kernel 5.15.
The commits divide roughly 50/50 between adding new features, such
as peer-to-peer support with DMA-BUF or signaling from within a graph,
and fixing various bugs, small improvements, etc.

Full details are in the tag.

Thanks,
Oded

The following changes since commit b2159182dd498fdb0f49e371ccc94efbc12d1f8e:

  lkdtm: remove IDE_CORE_CP crashpoint (2021-08-19 07:40:22 +0200)

are available in the Git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/linux.git tags/misc-habanalabs-next-2021-08-19

for you to fetch changes up to a3f369db34e33236f994d4ca3f13655109394d06:

  habanalabs/gaudi: invalidate PMMU mem cache on init (2021-08-19 11:43:43 +0300)

----------------------------------------------------------------
This tag contains habanalabs driver changes for v5.15:

- Add a new uAPI (under the memory ioctl) that lets the user request the
  driver to export a DMA-BUF object representing a memory region in the
  device's DRAM. This is needed to enable peer-to-peer transfers over PCIe
  between a habana device and an RDMA adapter (e.g. an mlx5 adapter).

- Add a new uAPI (under the cs ioctl) to enable the user to reserve
  signals and signal them from within its workloads, while the driver
  performs the waiting. This allows finer granularity in pipelining
  between the different engines and in resource utilization.

- Add a new uAPI (under the wait_for_cs ioctl) to allow waiting
  on multiple command submissions (workloads) at the same time. This
  is an optimization for the user process, so it won't need to call
  the wait_for_cs ioctl multiple times.

- Add a new "state dump" feature, which can be triggered through a new
  debugfs node. The concept is similar to the kernel panic dump: the
  mechanism retrieves information from the device in case one of the
  workloads sent by the user gets stuck, which is very helpful for
  debugging the hang.

- Add a new debugfs node to look up user pointers that are mapped to
  the habana device's PMMU.

- Fix the tracking of the user process when running inside a container.

- Allow the user to map more than 4GB of memory to the device MMU in a
  single IOCTL call.

- Minimize the number of register reads done in GAUDI during user operation.

- Allow the user to retrieve the type of server the device is
  connected to.

- Several fixes to the code that waits on interrupts on behalf of the
  user.

- Fixes and improvements to the hint mechanism in our VA allocation.

- Update the firmware header files to the latest version while
  maintaining backward compatibility with older firmware versions.

- Multiple fixes to various bugs.
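To illustrate the multi-CS wait item above, here is a minimal C sketch of the kind of completion check a single "wait on several command submissions" call could perform. It is a toy model under stated assumptions, not the habanalabs uAPI or driver code: the names struct cs_stream, cs_wait_multi_poll() and MAX_MULTI_CS are hypothetical, each stream is assumed to complete its command submissions in sequence order, and the result is assumed to be reported as one bit per requested CS.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical limit: one bit per requested CS in a 32-bit completion map. */
#define MAX_MULTI_CS 32

/*
 * Assumed model: a stream completes its CSs in order, so any sequence
 * number <= last_completed_seq on that stream has finished.
 */
struct cs_stream {
	uint64_t last_completed_seq;
};

/*
 * Check n command submissions in one call. seqs[i] is the sequence number
 * of the i-th CS and stream_of[i] is the index of the stream it was
 * submitted on. Bit i of the returned map is set if that CS has completed.
 */
static uint32_t cs_wait_multi_poll(const struct cs_stream *streams,
				   const unsigned int *stream_of,
				   const uint64_t *seqs, unsigned int n)
{
	uint32_t map = 0;

	assert(n <= MAX_MULTI_CS);
	for (unsigned int i = 0; i < n; i++)
		if (seqs[i] <= streams[stream_of[i]].last_completed_seq)
			map |= UINT32_C(1) << i;
	return map;
}
```

In this model, a blocking wait would sleep until the returned map is non-zero (or a timeout expires) and then hand the map back to the user, so one call replaces several single-CS waits.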

----------------------------------------------------------------
Alon Mizrahi (1):
      habanalabs/gaudi: add monitored SOBs to state dump

Koby Elbaz (2):
      habanalabs: fix race between soft reset and heartbeat
      habanalabs: clear msg_to_cpu_reg to avoid misread after reset

Oded Gabbay (25):
      habanalabs: rename enum vm_type_t to vm_type
      habanalabs: re-init completion object upon retry
      habanalabs: release pending user interrupts on device fini
      habanalabs: handle case of interruptable wait
      habanalabs: user mappings can be 64-bit
      habanalabs: allow disabling huge page use
      habanalabs: use get_task_pid() to take PID
      habanalabs: fix type of variable
      habanalabs: add asic property of host dma offset
      habanalabs: set dma max segment size
      habanalabs/gaudi: fix information printed on SM event
      habanalabs: update firmware header to latest version
      habanalabs/goya: add missing initialization
      habanalabs: revise prints on FD close
      habanalabs: remove redundant warning message
      habanalabs: expose server type in INFO IOCTL
      habanalabs: define uAPI to export FD for DMA-BUF
      habanalabs/gaudi: move scrubbing to late init
      habanalabs/gaudi: minimize number of register reads
      habanalabs: update to latest firmware headers
      habanalabs/gaudi: increase boot fit timeout
      habanalabs/gaudi: restore user registers when context opens
      habanalabs/gaudi: define DC POWER for secured PMC
      habanalabs/gaudi: size should be printed in decimal
      habanalabs/gaudi: invalidate PMMU mem cache on init

Ofir Bitton (6):
      habanalabs: update firmware header files
      habanalabs: missing mutex_unlock in process kill procedure
      habanalabs/gaudi: trigger state dump in case of SM errors
      habanalabs: add validity check for event ID received from F/W
      habanalabs/gaudi: scrub HBM to a specific value
      habanalabs/gaudi: fetch TPC/MME ECC errors from F/W

Ohad Sharabi (5):
      habanalabs: get multiple fences under same cs_lock
      habanalabs: add wait-for-multi-CS uAPI
      habanalabs: convert PCI BAR offset to u64
      habanalabs: make set_pci_regions asic function
      habanalabs: modify multi-CS to wait on stream masters

Tomer Tayar (4):
      habanalabs: fix nullifying of destroyed mmu pgt pool
      habanalabs: mark linux image as not loaded after hw_fini
      habanalabs: add support for dma-buf exporter
      habanalabs/gaudi: unmask out of bounds SLM access interrupt

Yuri Nudelman (7):
      habanalabs: allow fail on inability to respect hint
      habanalabs: expose state dump
      habanalabs: state dump monitors and fences infrastructure
      habanalabs/gaudi: implement state dump
      habanalabs: save pid per userptr
      habanalabs: fix mmu node address resolution in debugfs
      habanalabs: add userptr_lookup node in debugfs

Zvika Yehudai (1):
      habanalabs: rename cb_mmap to mmap

farah kassabri (4):
      habanalabs: support hint addresses range reservation
      habanalabs: signal/wait change sync object reset flow
      habanalabs: add support for encapsulated signals reservation
      habanalabs: add support for encapsulated signals submission

 .../ABI/testing/debugfs-driver-habanalabs          |   19 +
 drivers/misc/habanalabs/Kconfig                    |    1 +
 drivers/misc/habanalabs/common/Makefile            |    3 +-
 drivers/misc/habanalabs/common/command_buffer.c    |    2 +-
 .../misc/habanalabs/common/command_submission.c    | 1347 +++++++++++++++-----
 drivers/misc/habanalabs/common/context.c           |  146 ++-
 drivers/misc/habanalabs/common/debugfs.c           |  184 ++-
 drivers/misc/habanalabs/common/device.c            |  133 +-
 drivers/misc/habanalabs/common/firmware_if.c       |   56 +-
 drivers/misc/habanalabs/common/habanalabs.h        |  422 +++++-
 drivers/misc/habanalabs/common/habanalabs_drv.c    |    3 +-
 drivers/misc/habanalabs/common/habanalabs_ioctl.c  |    2 +
 drivers/misc/habanalabs/common/hw_queue.c          |  198 ++-
 drivers/misc/habanalabs/common/memory.c            |  689 +++++++++-
 drivers/misc/habanalabs/common/mmu/mmu_v1.c        |   12 +-
 drivers/misc/habanalabs/common/pci/pci.c           |    2 +
 drivers/misc/habanalabs/common/state_dump.c        |  718 +++++++++++
 drivers/misc/habanalabs/gaudi/gaudi.c              |  684 ++++++++--
 drivers/misc/habanalabs/gaudi/gaudiP.h             |   17 +
 drivers/misc/habanalabs/gaudi/gaudi_coresight.c    |    5 -
 drivers/misc/habanalabs/goya/goya.c                |   88 +-
 drivers/misc/habanalabs/include/common/cpucp_if.h  |  103 +-
 .../misc/habanalabs/include/common/hl_boot_if.h    |   62 +-
 .../habanalabs/include/gaudi/asic_reg/gaudi_regs.h |    3 +
 .../misc/habanalabs/include/gaudi/gaudi_masks.h    |   17 +
 .../misc/habanalabs/include/gaudi/gaudi_reg_map.h  |    2 -
 include/uapi/misc/habanalabs.h                     |  211 ++-
 27 files changed, 4370 insertions(+), 759 deletions(-)
 create mode 100644 drivers/misc/habanalabs/common/state_dump.c

* Re: [git pull] habanalabs pull request for kernel 5.15
From: Greg KH @ 2021-08-19 17:04 UTC
  To: Oded Gabbay; +Cc: linux-kernel

On Thu, Aug 19, 2021 at 02:02:09PM +0300, Oded Gabbay wrote:
> Hi Greg,
> 
> This is the habanalabs pull request for the merge window of kernel 5.15.
> The commits divide roughly 50/50 between adding new features, such
> as peer-to-peer support with DMA-BUF or signaling from within a graph,
> and fixing various bugs, small improvements, etc.

Pulled and pushed out, thanks!

greg k-h

* Re: [git pull] habanalabs pull request for kernel 5.15
From: Dave Airlie @ 2021-08-19 18:48 UTC
  To: Greg Kroah-Hartman, Jason Gunthorpe, Daniel Vetter, Linus Torvalds
  Cc: Oded Gabbay, LKML

On Fri, 20 Aug 2021 at 03:07, Greg KH <gregkh@linuxfoundation.org> wrote:
>
> On Thu, Aug 19, 2021 at 02:02:09PM +0300, Oded Gabbay wrote:
> > Hi Greg,
> >
> > This is the habanalabs pull request for the merge window of kernel 5.15.
> > The commits divide roughly 50/50 between adding new features, such
> > as peer-to-peer support with DMA-BUF or signaling from within a graph,
> > and fixing various bugs, small improvements, etc.
>
> Pulled and pushed out, thanks!

NAK for adding dma-buf or p2p support to this driver in the upstream
kernel. There needs to be a hard line between
"I-can't-believe-its-not-a-drm-driver" drivers which bypass our
userspace requirements, and I consider this the line.

This driver was merged into misc on the grounds it wasn't really a
drm/gpu driver and so didn't have to accept our userspace rules.

Adding dma-buf/p2p support to this driver is showing it really fits
the gpu driver model and should be under the drivers/gpu rules since
what are most GPUs except accelerators.

We are opening a major can of worms (some would say merging habanalabs
driver opened it), but this places us in the situation that if a GPU
vendor just claims their hw is a "vector" accelerator they can use
Greg to bypass all the work that has been done to ensure we have
maintainability long term. I don't want drivers in the tree using
dma-buf to interact with other drivers when we don't have access to a
userspace project to validate the kernel driver assumptions.

Dave.

* Re: [git pull] habanalabs pull request for kernel 5.15
From: Daniel Vetter @ 2021-08-20  6:43 UTC
  To: Dave Airlie
  Cc: Greg Kroah-Hartman, Jason Gunthorpe, Linus Torvalds, Oded Gabbay, LKML

On Thu, Aug 19, 2021 at 8:48 PM Dave Airlie <airlied@gmail.com> wrote:
> On Fri, 20 Aug 2021 at 03:07, Greg KH <gregkh@linuxfoundation.org> wrote:
> >
> > On Thu, Aug 19, 2021 at 02:02:09PM +0300, Oded Gabbay wrote:
> > > Hi Greg,
> > >
> > > This is the habanalabs pull request for the merge window of kernel 5.15.
> > > The commits divide roughly 50/50 between adding new features, such
> > > as peer-to-peer support with DMA-BUF or signaling from within a graph,
> > > and fixing various bugs, small improvements, etc.
> >
> > Pulled and pushed out, thanks!
>
> NAK for adding dma-buf or p2p support to this driver in the upstream
> kernel. There needs to be a hard line between
> "I-can't-believe-its-not-a-drm-driver" drivers which bypass our
> userspace requirements, and I consider this the line.
>
> This driver was merged into misc on the grounds it wasn't really a
> drm/gpu driver and so didn't have to accept our userspace rules.
>
> Adding dma-buf/p2p support to this driver is showing it really fits
> the gpu driver model and should be under the drivers/gpu rules since
> what are most GPUs except accelerators.
>
> We are opening a major can of worms (some would say merging habanalabs
> driver opened it), but this places us in the situation that if a GPU
> vendor just claims their hw is a "vector" accelerator they can use
> Greg to bypass all the work that has been done to ensure we have
> maintainability long term. I don't want drivers in the tree using
> dma-buf to interact with other drivers when we don't have access to a
> userspace project to validate the kernel driver assumptions.

I think everything that can be said has been said over the last few
years, here on m-l and at plumbers, so just for the record my +1.

There's no point in negotiating for years with accel companies in the
background if the guy next door just gleefully offers to get pulled
over the table, no questions asked.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [git pull] habanalabs pull request for kernel 5.15
From: Greg Kroah-Hartman @ 2021-08-20 10:02 UTC
  To: Dave Airlie
  Cc: Jason Gunthorpe, Daniel Vetter, Linus Torvalds, Oded Gabbay, LKML

On Fri, Aug 20, 2021 at 04:48:18AM +1000, Dave Airlie wrote:
> On Fri, 20 Aug 2021 at 03:07, Greg KH <gregkh@linuxfoundation.org> wrote:
> >
> > On Thu, Aug 19, 2021 at 02:02:09PM +0300, Oded Gabbay wrote:
> > > Hi Greg,
> > >
> > > This is habanalabs pull request for the merge window of kernel 5.15.
> > > The commits divide roughly 50/50 between adding new features, such
> > > as peer-to-peer support with DMA-BUF or signaling from within a graph,
> > > and fixing various bugs, small improvements, etc.
> >
> > Pulled and pushed out, thanks!
> 
> NAK for adding dma-buf or p2p support to this driver in the upstream
> kernel. There needs to be a hard line between
> "I-can't-believe-its-not-a-drm-driver" drivers which bypass our
> userspace requirements, and I consider this the line.

Damn, the first time I try to take a vacation in 1 1/2 years, and stuff
like this happens :(

I've now dropped this from my branch and will write more next week when
I get a chance. Sorry, I don't have the time to right now.

thanks,

greg k-h

* Re: [git pull] habanalabs pull request for kernel 5.15
From: Laurent Pinchart @ 2021-08-22 23:06 UTC
  To: Dave Airlie
  Cc: Greg Kroah-Hartman, Jason Gunthorpe, Daniel Vetter,
	Linus Torvalds, Oded Gabbay, LKML

On Fri, Aug 20, 2021 at 04:48:18AM +1000, Dave Airlie wrote:
> On Fri, 20 Aug 2021 at 03:07, Greg KH wrote:
> > On Thu, Aug 19, 2021 at 02:02:09PM +0300, Oded Gabbay wrote:
> > > Hi Greg,
> > >
> > > This is habanalabs pull request for the merge window of kernel 5.15.
> > > The commits divide roughly 50/50 between adding new features, such
> > > as peer-to-peer support with DMA-BUF or signaling from within a graph,
> > > and fixing various bugs, small improvements, etc.
> >
> > Pulled and pushed out, thanks!
> 
> NAK for adding dma-buf or p2p support to this driver in the upstream
> kernel. There needs to be a hard line between
> "I-can't-believe-its-not-a-drm-driver" drivers which bypass our
> userspace requirements, and I consider this the line.
> 
> This driver was merged into misc on the grounds it wasn't really a
> drm/gpu driver and so didn't have to accept our userspace rules.
> 
> Adding dma-buf/p2p support to this driver is showing it really fits
> the gpu driver model and should be under the drivers/gpu rules since
> what are most GPUs except accelerators.
> 
> We are opening a major can of worms (some would say merging habanalabs
> driver opened it), but this places us in the situation that if a GPU
> vendor just claims their hw is a "vector" accelerator they can use
> Greg to bypass all the work that has been done to ensure we have
> maintainability long term. I don't want drivers in the tree using
> dma-buf to interact with other drivers when we don't have access to a
> userspace project to validate the kernel driver assumptions.

I can only voice the strongest agreement here. This is a situation that
is only too familiar and that we're facing in the camera world as well.
For the past ten years, the camera community has worked hard to build
bridges with hardware vendors. The public development in the kernel tree
is only the visible part of the iceberg; a lot of effort has been put
into reaching out, teaching and helping. A few years ago the libcamera
project got started to offer a userspace framework to device vendors
where they can contribute code, similar to Mesa for graphics (and
related) acceleration.

I can't emphasize strongly enough how much effort it took to start
getting vendors on board, and the situation is still fragile at best. If
we now send a message that all of this can be bypassed by merging code
that ignores all rules in drivers/misc/, it would be ten years of
completely wasted work. Besides the technical impact, the effect on the
motivation of the kernel and userspace communities we have slowly built
over time would be catastrophic.

-- 
Regards,

Laurent Pinchart

* Re: [git pull] habanalabs pull request for kernel 5.15
From: Jeffrey Hugo @ 2021-08-25  1:16 UTC
  To: Dave Airlie, Greg Kroah-Hartman, Jason Gunthorpe, Daniel Vetter,
	Linus Torvalds
  Cc: Oded Gabbay, LKML

On 8/19/2021 12:48 PM, Dave Airlie wrote:
> On Fri, 20 Aug 2021 at 03:07, Greg KH <gregkh@linuxfoundation.org> wrote:
>>
>> On Thu, Aug 19, 2021 at 02:02:09PM +0300, Oded Gabbay wrote:
>>> Hi Greg,
>>>
>>> This is habanalabs pull request for the merge window of kernel 5.15.
>>> The commits divide roughly 50/50 between adding new features, such
>>> as peer-to-peer support with DMA-BUF or signaling from within a graph,
>>> and fixing various bugs, small improvements, etc.
>>
>> Pulled and pushed out, thanks!
> 
> NAK for adding dma-buf or p2p support to this driver in the upstream
> kernel. There needs to be a hard line between
> "I-can't-believe-its-not-a-drm-driver" drivers which bypass our
> userspace requirements, and I consider this the line.
> 
> This driver was merged into misc on the grounds it wasn't really a
> drm/gpu driver and so didn't have to accept our userspace rules.
> 
> Adding dma-buf/p2p support to this driver is showing it really fits
> the gpu driver model and should be under the drivers/gpu rules since
> what are most GPUs except accelerators.

Care to elaborate?  I'm not trying to be cute, but all I see here is 
that dma-buf/p2p using drivers must be in drivers/gpu, yet many drivers 
outside of the gpu area use those features.  Surely your position can't 
be that only drivers/gpu can use dma-buf or p2p (which is part of the 
PCIe spec).

> We are opening a major can of worms (some would say merging habanalabs
> driver opened it), but this places us in the situation that if a GPU
> vendor just claims their hw is a "vector" accelerator they can use
> Greg to bypass all the work that has been done to ensure we have
> maintainability long term. I don't want drivers in the tree using
> dma-buf to interact with other drivers when we don't have access to a
> userspace project to validate the kernel driver assumptions.

Umm, isn't that [1]?  The Habana device has the most open userspace I'm 
aware of.  Seems disingenuous to claim you don't have access to a 
userspace project for this driver.

[1] - https://github.com/HabanaAI/hl-thunk
