All of lore.kernel.org
 help / color / mirror / Atom feed
* Feature Request: Ability to decode bus/dma address back into physical address
@ 2017-08-01 10:07 Tom St Denis
       [not found] ` <8379cf5a-7539-e221-c678-20f617fb4337-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 24+ messages in thread
From: Tom St Denis @ 2017-08-01 10:07 UTC (permalink / raw)
  To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Hi,

We're working on a user space debugger for AMDGPU devices and are trying 
to figure out a "proper" way of taking mapped pages and converting them 
back to physical addresses so the debugger can read memory that was sent 
to the GPU.

The debugger largely operates at arm's length from the application being 
debugged, so it has no knowledge of the buffers other than which PCI 
device mapped them and the mapped addresses.

We also use the debugger to read back kernel programmed buffers (ring 
buffers and the like) so we really need a kernel level solution.

As a prototype I put a tracepoint in the AMDGPU driver where 
pci_map_page() is called, which records the physical and DMA addresses. 
That works, but it's obviously a bit of a hack and it misses any pages 
mapped before the trace is enabled.
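
Roughly, the prototype boils down to something like this (simplified 
sketch only; the event and field names here are illustrative, not the 
actual patch, and the usual trace header boilerplate is omitted):

#include <linux/tracepoint.h>

/* Fired from the driver right after pci_map_page() succeeds. */
TRACE_EVENT(amdgpu_map_page,
	TP_PROTO(u64 phys, u64 dma),
	TP_ARGS(phys, dma),
	TP_STRUCT__entry(
		__field(u64, phys)
		__field(u64, dma)
	),
	TP_fast_assign(
		__entry->phys = phys;
		__entry->dma  = dma;
	),
	TP_printk("phys=0x%llx dma=0x%llx", __entry->phys, __entry->dma)
);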

Ideally, some form of debugfs interface would be nice.

Is there an existing interface I can take advantage of?  I've 
tried enabling the map/unmap tracepoints before loading amdgpu and they 
produced no traffic in the trace file.

Thanks,
Tom

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Feature Request: Ability to decode bus/dma address back into physical address
       [not found] ` <8379cf5a-7539-e221-c678-20f617fb4337-5C7GfCeVMHo@public.gmane.org>
@ 2017-08-01 17:25   ` Jerome Glisse
       [not found]     ` <30eb1ecb-c86f-4d3b-cd49-e002f46e582d@amd.com>
  0 siblings, 1 reply; 24+ messages in thread
From: Jerome Glisse @ 2017-08-01 17:25 UTC (permalink / raw)
  To: Tom St Denis; +Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On Tue, Aug 01, 2017 at 06:07:48AM -0400, Tom St Denis wrote:
> Hi,
> 
> We're working on a user space debugger for AMDGPU devices and are trying to
> figure out a "proper" way of taking mapped pages and converting them back to
> physical addresses so the debugger can read memory that was sent to the GPU.
> 
> The debugger largely operates at arms length from the application being
> debugged so it has no knowledge of the buffers other than which PCI device
> mapped it and the mapped address.

There is your issue; you should reconsider your design. You should add a
debugging/tracing API to the amdgpu kernel driver that allows one process
to snoop on another process's buffers. That would be a much more realistic
user space interface.

> 
> We also use the debugger to read back kernel programmed buffers (ring
> buffers and the like) so we really need a kernel level solution.

Not going to happen; this would be like /dev/mem on steroids and a big
security risk.

> 
> As a prototype I put a trace point in the AMDGPU driver when pci_map_page()
> is called which preserves the physical and dma address and that works but
> obviously is a bit of a hack and doesn't work if pages are mapped before the
> trace is enabled.
> 
> Ideally, some form of debugfs interface would be nice.
> 
> Is there any sort of interface already I can take advantage of?  I've tried
> enabling the map/unmap tracepoints before loading amdgpu and it produced no
> traffic in the trace file.

I think you need to reconsider how to achieve your goal. It makes a lot
more sense to add a new API to the amdgpu driver than to ask the kernel to
give you access to arbitrary physical memory.

Jérôme

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Feature Request: Ability to decode bus/dma address back into physical address
       [not found]       ` <30eb1ecb-c86f-4d3b-cd49-e002f46e582d-5C7GfCeVMHo@public.gmane.org>
@ 2017-08-01 18:04         ` Jerome Glisse
       [not found]           ` <483ecda0-2977-d2ea-794c-320e429d7645@amd.com>
  0 siblings, 1 reply; 24+ messages in thread
From: Jerome Glisse @ 2017-08-01 18:04 UTC (permalink / raw)
  To: Tom St Denis; +Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On Tue, Aug 01, 2017 at 01:32:35PM -0400, Tom St Denis wrote:
> On 01/08/17 01:25 PM, Jerome Glisse wrote:
> > On Tue, Aug 01, 2017 at 06:07:48AM -0400, Tom St Denis wrote:
> > > Hi,
> > > 
> > > We're working on a user space debugger for AMDGPU devices and are trying to
> > > figure out a "proper" way of taking mapped pages and converting them back to
> > > physical addresses so the debugger can read memory that was sent to the GPU.
> > > 
> > > The debugger largely operates at arms length from the application being
> > > debugged so it has no knowledge of the buffers other than which PCI device
> > > mapped it and the mapped address.
> > 
> > There is your issue, you should reconsider your design. You should add a
> > debuging/tracing API to the amdgpu kernel driver that allow one process
> > to snoop on another process buffers. It would be a lot more realistic user
> > space interface.
> 
> It's funny you should say that:
> 
> https://lists.freedesktop.org/archives/amd-gfx/2017-August/011653.html
> 
> That approach works but it's less than ideal for several reasons.
> 
> Another angle is to add a debugfs interface into TTM so we can search its
> page tables.  But again that would only work for pages mapped through a TTM
> interface.  Anything mapped inside the driver with its own pci_alloc_*()
> wouldn't be searchable/traceable there.  So at every single place we would
> have to trace/log/etc. the (physical, DMA) pair.

You misunderstood what I meant. The patch you are pointing to is wrong.
The kind of API I have in mind is high level, not low level. You would
register with the amdgpu kernel driver as a snooper for a given process
or file descriptor. That would let you get a list of all the GEM objects
of the process you are snooping. From there you could snapshot those GEM
objects or listen for events on them. You could also ask to listen for
GPU command submissions and get an event when one happens. No need to
expose a complex API like the one you are trying to add. Something like:

struct amdgpu_snoop_process_ioctl {
	uint32_t	pid;
};

struct amdgpu_snoop_bo_info {
	uint32_t	handle;
	uint32_t	size;
	uint64_t	flags;
	...
};

struct amdgpu_snoop_list_bo_ioctl {
	uint32_t			nbos;
	uint32_t			mbos;
	struct amdgpu_snoop_bo_info	*bos; // bos[mbos] in size
};

struct amdgpu_snoop_snapshot_bo {
	uint32_t	*uptr;
	uint32_t	handle;
	uint32_t	offset;
	uint32_t	size;
};

struct amdgpu_snoop_cmd {
	uint32_t	size;
	uint32_t	*uptr;
};

...

You would need to leverage something like uevent to get an event when
something happens, such as a BO being destroyed or a command submission ...
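
To make the flow concrete, the debugger side could look roughly like this
(purely illustrative: none of these ioctl numbers exist today, and
includes/error handling are omitted):

/* Hypothetical debugger-side flow for the API sketched above. */
int fd = open("/dev/dri/renderD128", O_RDWR);

struct amdgpu_snoop_process_ioctl attach = { .pid = target_pid };
ioctl(fd, AMDGPU_SNOOP_PROCESS, &attach);          /* hypothetical */

struct amdgpu_snoop_bo_info bos[256];
struct amdgpu_snoop_list_bo_ioctl list = { .mbos = 256, .bos = bos };
ioctl(fd, AMDGPU_SNOOP_LIST_BO, &list);            /* hypothetical */

for (uint32_t i = 0; i < list.nbos && i < list.mbos; i++) {
	uint32_t *copy = malloc(bos[i].size);
	struct amdgpu_snoop_snapshot_bo snap = {
		.uptr   = copy,
		.handle = bos[i].handle,
		.offset = 0,
		.size   = bos[i].size,
	};
	ioctl(fd, AMDGPU_SNOOP_SNAPSHOT_BO, &snap);  /* hypothetical */
	/* ... decode/compare the bo contents ... */
	free(copy);
}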


> > > We also use the debugger to read back kernel programmed buffers (ring
> > > buffers and the like) so we really need a kernel level solution.
> > 
> > Not going to happen, this would be like /dev/mem on steroid and a big
> > security risk.
> 
> Not looking to rehash old debates but from our point of view a user with
> read/write access to debugfs can already do "bad things (tm)" so it's a moot
> point (for instance, they could program an SDMA job on the GPU to read/write
> anywhere in memory...).

That is wrong and it should not be allowed! You need to fix that.


> > > As a prototype I put a trace point in the AMDGPU driver when pci_map_page()
> > > is called which preserves the physical and dma address and that works but
> > > obviously is a bit of a hack and doesn't work if pages are mapped before the
> > > trace is enabled.
> > > 
> > > Ideally, some form of debugfs interface would be nice.
> > > 
> > > Is there any sort of interface already I can take advantage of?  I've tried
> > > enabling the map/unmap tracepoints before loading amdgpu and it produced no
> > > traffic in the trace file.
> > 
> > I think you need to reconsider how to achieve your goal. It is a lot more
> > sensefull to add new API to amdgpu driver than asking kernel to provide you
> > with access to random physical memory.
> 
> We already have physical memory access (through /dev/mem or our own
> /dev/fmem clone).  The issue is translating bus addresses.  With the IOMMU
> enabled addresses programmed into the GPU are not physical anymore and the
> debugger cannot read GPU related packets.

CONFIG_STRICT_DEVMEM is enabled by many distributions, so /dev/mem is root
only, and even as root there are restrictions. What you are asking for
is insane.

> 
> The existing tracer patch "works" but it's not ideal and the maintainers of
> the AMDGPU driver are legitimately desiring a better solution.

Again, redesign what you are trying to do. There is no legitimate need to
track individual page mappings. All you need is access to all the buffer
objects a process has created and to be informed of anything the process
does through the amdgpu device file.

Jérôme

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Feature Request: Ability to decode bus/dma address back into physical address
       [not found]             ` <483ecda0-2977-d2ea-794c-320e429d7645-5C7GfCeVMHo@public.gmane.org>
@ 2017-08-01 19:03               ` Jerome Glisse
       [not found]                 ` <42c5fe2b-f179-cb71-03d3-7ae991543edb@amd.com>
  0 siblings, 1 reply; 24+ messages in thread
From: Jerome Glisse @ 2017-08-01 19:03 UTC (permalink / raw)
  To: Tom St Denis; +Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On Tue, Aug 01, 2017 at 02:25:02PM -0400, Tom St Denis wrote:
> On 01/08/17 02:04 PM, Jerome Glisse wrote:
> > On Tue, Aug 01, 2017 at 01:32:35PM -0400, Tom St Denis wrote:
> > > On 01/08/17 01:25 PM, Jerome Glisse wrote:
> > > > On Tue, Aug 01, 2017 at 06:07:48AM -0400, Tom St Denis wrote:
> > > > > Hi,
> > > > > 
> > > > > We're working on a user space debugger for AMDGPU devices and are trying to
> > > > > figure out a "proper" way of taking mapped pages and converting them back to
> > > > > physical addresses so the debugger can read memory that was sent to the GPU.
> > > > > 
> > > > > The debugger largely operates at arms length from the application being
> > > > > debugged so it has no knowledge of the buffers other than which PCI device
> > > > > mapped it and the mapped address.
> > > > 
> > > > There is your issue, you should reconsider your design. You should add a
> > > > debuging/tracing API to the amdgpu kernel driver that allow one process
> > > > to snoop on another process buffers. It would be a lot more realistic user
> > > > space interface.
> > > 
> > > It's funny you should say that:
> > > 
> > > https://lists.freedesktop.org/archives/amd-gfx/2017-August/011653.html
> > > 
> > > That approach works but it's less than ideal for several reasons.
> > > 
> > > Another angle is to add a debugfs interface into TTM so we can search their
> > > page tables.  But again that would only work for pages mapped through a TTM
> > > interface.  Anything mapped inside the driver with its own pci_alloc_*()
> > > wouldn't be searchable/traceable here.  So at every single place we would
> > > have to trace/log/etc the pair.
> > 
> > You miss-understood what i mean. The patch you are pointing too is wrong.
> > The kind of API i have in mind is high level not low level. You would
> > register with the amdgpu kernel driver as a snooper for a given process
> > or file descriptor. This would allow you to get list of all gem objects
> > of the process you are snooping. From there you could snap shot those
> > gem object on listen on event on those. You could also ask to listen on
> > GPU command submission and get event when that happen. No need to expose
> > complex API like you are trying to do. Something like:
> > 
> > struct amdgpu_snoop_process_ioctl {
> > 	uint32_t	pid;
> > };
> > 
> > struct amdgpu_snoop_bo_info {
> > 	uint32_t	handle;
> > 	uint32_t	size;
> > 	uint64_t	flags;
> > 	...
> > };
> > 
> > struct amdgpu_snoop_list_bo_ioctl {
> > 	uint32_t			nbos;
> > 	uint32_t			mbos;
> > 	struct amdgpu_snoop_bo_info	*bos; // bos[mbos] in size
> > };
> > 
> > struct amdgpu_snoop_snapshot_bo {
> > 	uint32_t	*uptr;
> > 	uint32_t	handle;
> > 	uint32_t	offset;
> > 	uint32_t	size;
> > };
> > 
> > struct amdgpu_snoop_cmd {
> > 	uint32_t	size;
> > 	uint32_t	*uptr;
> > };
> > 
> > ...
> > 
> > You would need to leverage thing like uevent to get event when something
> > happen like a bo being destroy or command submission ...
> 
> The problem with this approach is when I'm reading an IB I'm not given user
> space addresses but bus addresses.  So I can't correlate anything I'm seeing
> in the hardware with the user task if I wanted to.
> 
> In fact, to augment [say] OpenGL debugging I would have to correlate a
> buffer handle/pointer's page backing with the bus address in the IB so I
> could correlate the two (e.g. dump an IB and print out user process variable
> names that correspond to the IB contents...).

When you read an IB you are provided with GPU virtual addresses; you can get
the GPU virtual address from the same snoop ioctl, just add a field to bo_info
above. So I don't see any issue here.

> 
> > > Not looking to rehash old debates but from our point of view a user with
> > > read/write access to debugfs can already do "bad things (tm)" so it's a moot
> > > point (for instance, they could program an SDMA job on the GPU to read/write
> > > anywhere in memory...).
> > 
> > That's is wrong and it should not be allowed ! You need to fix that.
> 
> Again, not looking to rehash this debate.  AMDGPU has had register level
> debugfs access for nearly two years now.  At the point you can write to
> hardware registers you have to be root anyways.  I could just as easily load
> a tainted AMDGPU driver if I wanted to at that point which then achieves the
> same debugfs-free badness for me.

You also have the IOMMU, which restricts what you can do. Even if you can
write registers to program a DMA operation you are still likely behind an
IOMMU, so you can't write anywhere you like (unless the IOMMU is disabled
or in pass-through, i.e. 1:1 mapping).

> 
> > > > > As a prototype I put a trace point in the AMDGPU driver when pci_map_page()
> > > > > is called which preserves the physical and dma address and that works but
> > > > > obviously is a bit of a hack and doesn't work if pages are mapped before the
> > > > > trace is enabled.
> > > > > 
> > > > > Ideally, some form of debugfs interface would be nice.
> > > > > 
> > > > > Is there any sort of interface already I can take advantage of?  I've tried
> > > > > enabling the map/unmap tracepoints before loading amdgpu and it produced no
> > > > > traffic in the trace file.
> > > > 
> > > > I think you need to reconsider how to achieve your goal. It is a lot more
> > > > sensefull to add new API to amdgpu driver than asking kernel to provide you
> > > > with access to random physical memory.
> > > 
> > > We already have physical memory access (through /dev/mem or our own
> > > /dev/fmem clone).  The issue is translating bus addresses.  With the IOMMU
> > > enabled addresses programmed into the GPU are not physical anymore and the
> > > debugger cannot read GPU related packets.
> > 
> > CONFIG_STRICT_DEVMEM is enabled by many distributions so /dev/mem is root
> > only and even when root there are restrictions. What you are asking for
> > is insane.
> 
> It really isn't.  When you debug a user process do you not have access to
> its stack and heap?  Why shouldn't I have access to memory the GPU is
> working on?  And we use tools like "umr" during bringup as well so at those
> stages there's not even a complete kernel driver let alone userspace to go
> with the new hardware.

I am saying you can have access to this memory with a sane API, not with
something insane. What is wrong with what I outlined above?

> 
> At the point I'm root I can attach to any process, change memory contents,
> do whatever I want.  Unless the kernel has such broken vfs security that it
> allows non-root users to read/write debugfs/etc it shouldn't be a problem.

With a sane API you would not need to be root to debug/trace the GPU activity
of your own process, just like with gdb.

> > > The existing tracer patch "works" but it's not ideal and the maintainers of
> > > the AMDGPU driver are legitimately desiring a better solution.
> > 
> > Again re-design what you are trying to do. There is no legitimate need to
> > track individual page mapping. All you need is access to all buffer object
> > a process have created and to be inform on anything a process do through
> > the amdgpu device file.
> 
> Again, the problem is that the hardware is programmed with the DMA address.
> So if I want to correlate what the hardware is doing with what the process
> is doing, I need the mappings.

You only need the GPU virtual address, and you can get it with the API I outlined.

Jérôme

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Feature Request: Ability to decode bus/dma address back into physical address
       [not found]                   ` <42c5fe2b-f179-cb71-03d3-7ae991543edb-5C7GfCeVMHo@public.gmane.org>
@ 2017-08-01 19:55                     ` Jerome Glisse
       [not found]                       ` <20170801195556.GD3443-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 24+ messages in thread
From: Jerome Glisse @ 2017-08-01 19:55 UTC (permalink / raw)
  To: Tom St Denis
  Cc: Deucher, Alexander,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Koenig,
	Christian

On Tue, Aug 01, 2017 at 03:28:26PM -0400, Tom St Denis wrote:
> Adding the AMDGPU maintainers to get their opinions.
> 
> Context:
> https://lists.linuxfoundation.org/pipermail/iommu/2017-August/023489.html
> 
> (and others in case you missed it on the list).
> 
> 
> On 01/08/17 03:03 PM, Jerome Glisse wrote:
> > > > You would need to leverage thing like uevent to get event when something
> > > > happen like a bo being destroy or command submission ...
> > > 
> > > The problem with this approach is when I'm reading an IB I'm not given user
> > > space addresses but bus addresses.  So I can't correlate anything I'm seeing
> > > in the hardware with the user task if I wanted to.
> > > 
> > > In fact, to augment [say] OpenGL debugging I would have to correlate a
> > > buffer handle/pointer's page backing with the bus address in the IB so I
> > > could correlate the two (e.g. dump an IB and print out user process variable
> > > names that correspond to the IB contents...).
> > 
> > When you read IB you are provided with GPU virtual address, you can get the
> > GPU virtual address from the same snoop ioctl just add a field in bo_info
> > above. So i don't see any issue here.
> 
> This is effectively what I'm doing with the patch except via a trace not a
> bolted on snooping ioctl.
> 
> Tracers are a bit better since there's not really any overhead in the normal
> case which is desirable (and the code is relatively very simple).
> 
> In an ideal case we could simply search the ttm pages for a given device to
> look for a match for a given dma address.  But that also has the problem
> that ...
> 
> This will fail for all the other allocations e.g. for our GART, ring
> buffers, etc.
> 
> So a "complete" solution would be nice where any bus address mapping that is
> still valid for a device could be resolved to the physical page that is
> backing it.

So you agree that you can get what you want with ioctls? Now you object
that the ioctl overhead is too high?

I do not believe ioctl overhead is an issue. As I said, in the model I
advised you would mix ioctls with something like uevent or trace so you get
"real time" notification of what is going on.

Why do I say that your approach of tracking individual page mappings is
insane? There are several obstacles:
  - to get from a GPU virtual address to a physical memory address you first
    need to walk the GPU page table to get the GPU PTE. From that you get
    either an address inside GPU memory (VRAM of the PCIe device) or a DMA
    bus address. If it is the latter (a DMA bus address) you then need to
    walk the IOMMU page table to find the physical address of the memory.
    So this is two page table walks (see the kernel-side sketch below)
  - neither of those walks can happen while holding enough locks to keep the
    GPU page table from changing behind your back (especially the second
    walk), so you are open to races
  - GPU device memory is not necessarily accessible (due to the PCIe BAR size
    limit; I know there are patches to allow remapping all of GPU memory)
  - you need to track a lot of information in userspace, i.e. what memory is
    behind each 4k of GPU virtual address space you are interested in, so you
    are basically duplicating information that already exists in kernel space
    into userland
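
Just to illustrate the second walk: once you have the DMA address from the
GPU PTE, the kernel-side lookup would boil down to something like this
(sketch only; it glosses over identity/pass-through domains and still races
with concurrent unmaps):

#include <linux/iommu.h>

static phys_addr_t bus_to_phys_sketch(struct device *dev, dma_addr_t dma)
{
	struct iommu_domain *dom = iommu_get_domain_for_dev(dev);

	if (!dom)
		return (phys_addr_t)dma;        /* no IOMMU translation */
	return iommu_iova_to_phys(dom, dma);    /* second page table walk */
}

There is no sane way to hand that walk to userspace.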

With what I outline you rely on the kernel's information to get to the
things you want. You could even add something like:

struct amdgpu_snoop_snapshot_ioctl {
	uint64_t	gpu_virtual_address;
	uint64_t	size;
	uint64_t	uptr;
};

To snapshot a range of the GPU virtual address space, so you do not even
have to worry about GPU virtual address -> bo handle -> offset into the
bo. You can go even further and add a /dev/dri/card0-snoop device file
that you can mmap to access GPU virtual address space. A process opens
the card0-snoop file and registers as a snooper against another process;
from that point on, a GPU virtual address in the snooped process becomes
an offset into the card0-snoop file. So now you can read/write that device
file to access the GPU memory of a process without doing ioctls. The changes
to amdgpu are not that big.
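
From the debugger that would reduce to something like this (again purely
hypothetical, no such device file or ioctl exists today):

/* Hypothetical: attach, then read the snooped process's GPU VA space. */
int fd = open("/dev/dri/card0-snoop", O_RDWR);

struct amdgpu_snoop_process_ioctl attach = { .pid = target_pid };
ioctl(fd, AMDGPU_SNOOP_PROCESS, &attach);     /* hypothetical ioctl */

uint8_t page[4096];
pread(fd, page, sizeof(page), gpu_va);        /* GPU VA as file offset */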

> 
> > > Again, not looking to rehash this debate.  AMDGPU has had register level
> > > debugfs access for nearly two years now.  At the point you can write to
> > > hardware registers you have to be root anyways.  I could just as easily load
> > > a tainted AMDGPU driver if I wanted to at that point which then achieves the
> > > same debugfs-free badness for me.
> > 
> > You also have IOMMU that restrict what you can do. Even if you can write
> > register to program a DMA operation you are still likely behind an IOMMU
> > and thus you can't write anywhere (unless IOMMU is disabled or in pass
> > through ie 1 on 1 mapping).
> 
> Fair point.  All I was saying is at the point you have MMIO access you can
> typically do bad things if you wanted to (like crash the system).
> 
> > I am saying you can have access to this memory with a sane API and not with
> > something insane. What is wrong with what i outlined above ?
> 
> Well aside from the fact it doesn't cover all mappings, I can't see how it's
> materially different than what I was already doing.  For this to work I'd
> have to be able to mmap objects from another process into the debugger.  I
> don't know if drm is really designed for that sort of flow and I suspect it
> would take a lot of work to add that.

The amount of work does not matter. We are aiming for a sane solution, not
an insane thing. We are aiming for something that is safe from a security
point of view. We are aiming for a design that lowers the barrier (i.e. no
need to be root to debug/trace the GPU). This is what matters.

Quite frankly what I outline is not a lot of work. The snoop device file
scheme, for instance, is pretty easy to achieve. Every time there is a change
to a ttm object or to the GPU virtual address space you would need to update
any mmap of that file accordingly. Overall I would say this is in the
500-1000 lines of code range.

GPU device memory is slightly harder; you probably need to do it as a
snapshot, i.e. you copy device memory to system memory and map the system
memory.


> 
> > > At the point I'm root I can attach to any process, change memory contents,
> > > do whatever I want.  Unless the kernel has such broken vfs security that it
> > > allows non-root users to read/write debugfs/etc it shouldn't be a problem.
> > 
> > With sane API you would not need to be root to debug/trace GPU activity of
> > your own process like gdb.
> 
> You need to be root to read GPU registers though (simply reading some
> registers can cause a GPU hang).  So if you want to see which waves are
> active, or the contents of a ring (or it's IBs) you can't do that as a
> normal user.  There's a very tiny subset of registers you can access through
> libdrm and often we need more than that (especially for npi work).
> 
> Indeed we recommend that on devel machines people +s umr so they can invoke
> it from normal user processes.  You wouldn't normally install umr unless you
> were a developer anyways so for customers/users it's not an issue.
> 
> It's not exactly as confined as all-software which is where some compromises
> are made.

We want things like debuggers to work out of the box on any distribution, so
the design must be sane and you should not need to ask for special modules
or special kernel config options, or to be root. There is a way to do what
you want without any of those requirements, and thus that should be what you
aim for.

> 
> > > > > The existing tracer patch "works" but it's not ideal and the maintainers of
> > > > > the AMDGPU driver are legitimately desiring a better solution.
> > > > 
> > > > Again re-design what you are trying to do. There is no legitimate need to
> > > > track individual page mapping. All you need is access to all buffer object
> > > > a process have created and to be inform on anything a process do through
> > > > the amdgpu device file.
> > > 
> > > Again the problem is the hardware is programmed with the dma address. So I
> > > want to correlate what the hardware is doing with what the process is doing
> > > I need the mappings.
> > 
> > You only need the GPU virtual address and you can get it with the API i outline.
> 
> Which is effectively what my tracer is.  But it (and your design) can't
> translate all mappings which is why we even engaged with iommu list in the
> first place at all.

My design does not need the reverse mapping at all, that is my point.
No need to engage the iommu folks, no need to ask them for something, as
you have all the information from inside the amdgpu driver.

> 
> Adding drm support to allow another process to GEM_MAP BOs (and a search
> ioctl) might work but seems like a lot more work (and will likely reduce
> performance) than the tracer and has the same limitation.  And ultimately we
> find direct memory access useful in NPI work so that's unlikely to be
> removed.

You have no proof that it would be slower, and I outlined the special file
idea that avoids any ioctl.

> 
> Thanks for your time Jerome.  I do understand where you're coming from but
> I'm not sure a drm update is both politically or technically feasible and we
> need this functionality (limited though it is) sooner than later.

You have to understand that the kernel is not about compromising design so
that people can reach their schedule. Sane design does matter, and if a sane
design requires a lot of work then you should size your team accordingly to
face the challenges.

What you want to do can be done with very little code and no changes outside
the amdgpu kernel driver, so I do not see any reason not to do so.

Jérôme

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Feature Request: Ability to decode bus/dma address back into physical address
       [not found]                       ` <20170801195556.GD3443-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2017-08-01 20:26                         ` Tom St Denis
       [not found]                           ` <77e557d2-aa75-46c4-88a7-cca5448ea08e-5C7GfCeVMHo@public.gmane.org>
  2017-08-01 20:43                         ` Alex Williamson
  1 sibling, 1 reply; 24+ messages in thread
From: Tom St Denis @ 2017-08-01 20:26 UTC (permalink / raw)
  To: Jerome Glisse
  Cc: Deucher, Alexander,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Koenig,
	Christian

Was trying to walk away from this ... :-) (all in good fun).


On 01/08/17 03:55 PM, Jerome Glisse wrote:
> On Tue, Aug 01, 2017 at 03:28:26PM -0400, Tom St Denis wrote:
>> Adding the AMDGPU maintainers to get their opinions.
>>
>> Context:
>> https://lists.linuxfoundation.org/pipermail/iommu/2017-August/023489.html
>>
>> (and others in case you missed it on the list).
>>
>>
>> On 01/08/17 03:03 PM, Jerome Glisse wrote:
>>>>> You would need to leverage thing like uevent to get event when something
>>>>> happen like a bo being destroy or command submission ...
>>>>
>>>> The problem with this approach is when I'm reading an IB I'm not given user
>>>> space addresses but bus addresses.  So I can't correlate anything I'm seeing
>>>> in the hardware with the user task if I wanted to.
>>>>
>>>> In fact, to augment [say] OpenGL debugging I would have to correlate a
>>>> buffer handle/pointer's page backing with the bus address in the IB so I
>>>> could correlate the two (e.g. dump an IB and print out user process variable
>>>> names that correspond to the IB contents...).
>>>
>>> When you read IB you are provided with GPU virtual address, you can get the
>>> GPU virtual address from the same snoop ioctl just add a field in bo_info
>>> above. So i don't see any issue here.
>>
>> This is effectively what I'm doing with the patch except via a trace not a
>> bolted on snooping ioctl.
>>
>> Tracers are a bit better since there's not really any overhead in the normal
>> case which is desirable (and the code is relatively very simple).
>>
>> In an ideal case we could simply search the ttm pages for a given device to
>> look for a match for a given dma address.  But that also has the problem
>> that ...
>>
>> This will fail for all the other allocations e.g. for our GART, ring
>> buffers, etc.
>>
>> So a "complete" solution would be nice where any bus address mapping that is
>> still valid for a device could be resolved to the physical page that is
>> backing it.
> 
> So you agree that you can get what you want with ioctl ? Now you object
> that the ioctl overhead is too important ?

No, I object that the overhead of keeping track of it in some sort of 
snoop structure is too much (and adds too much new code to the driver 
for the sole purpose of this one feature).

Not to mention we can't really invent ioctls without co-ordinating with 
the other drm users (re: not just AMD).  umr is a lot less coupled since 
it's not really versioned just yet (we're still bringing up a lot of 
features, so users use git, not packages).

> I do not believe ioctl overhead to be an issue. Like i say in the model
> i adviced you mix ioctl and thing like uevent or trace so you get "real
> time" notification of what is going on.

Unless you can nop that out in a config-invariant fashion (like you can 
with tracers) that's a NAK from the get-go.  And we'd need to buffer the 
events to be practical since you might run the debugger out of sync with 
the application (e.g. the app hangs, then you fire up umr to see what's 
going on).
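
By config-invariant I mean the way a disabled tracepoint call site costs 
essentially nothing, roughly the static-key pattern below (snoop_enabled 
and record_mapping are made-up names, just to illustrate):

#include <linux/types.h>
#include <linux/jump_label.h>

extern void record_mapping(u64 phys, u64 dma);	/* made-up helper */

static DEFINE_STATIC_KEY_FALSE(snoop_enabled);

static void amdgpu_snoop_hook(u64 phys, u64 dma)
{
	/* Compiles to a patched-out branch until someone flips the key
	 * with static_branch_enable(&snoop_enabled). */
	if (static_branch_unlikely(&snoop_enabled))
		record_mapping(phys, dma);
}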

> Why do i say that your approach of tracking individual page mapping is
> insane. There is several obstacle:
>    - to get from GPU virtual address to physical memory address you need
>      to first walk down the GPU page table to get the GPU pte entry. From
>      that you either get an address inside the GPU memory (VRAM of PCIE
>      device) or a DMA bus address. If it is the latter (DMA bus address)
>      you need to walk down the IOMMU page table to find the physical address
>      of the memory. So this is 2 page table walk down

We already do this in umr.  Given a GPUVM address we can decode it down 
to a PTE on both AI and VI platforms.  It's decoding the PTE to a 
physical page (outside of VRAM) that is the issue.

In fact being able to VM decode is important to the kernel team who have 
to debug VM issues from time to time.

>    - none of the above walk down can happen while holding enough lock so
>      that the GPU page table do not change behind you back (especialy the
>      second walk down) so you are open to race

Which is fine because you'd only really do this on a stalled/halted GPU 
anyways.  Just like you don't inspect variables of a program that is 
currently running (and not blocked/asleep/etc).  We have the ability to 
halt currently active waves/etc.

>    - GPU device memory is not necessary accessible (due to PCIE bar size
>      limit i know that there is patches to allow to remap all GPU memory).

We have ways of accessing all of VRAM from basic MMIO accesses :)
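
(Roughly the indirect path the driver's debugfs vram interface takes; 
sketch only, per-dword, register names from the public amdgpu headers:)

static u32 vram_read_dword(struct amdgpu_device *adev, u64 pos)
{
	unsigned long flags;
	u32 value;

	spin_lock_irqsave(&adev->mmio_idx_lock, flags);
	WREG32(mmMM_INDEX, ((u32)pos) | 0x80000000); /* low bits + aperture bit */
	WREG32(mmMM_INDEX_HI, pos >> 31);
	value = RREG32(mmMM_DATA);
	spin_unlock_irqrestore(&adev->mmio_idx_lock, flags);

	return value;
}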

>    - you need to track a lot of informations in userspace ie what memory
>      is behind each 4k of GPU virtual address space you are interested in
>      so you are basicaly duplicating informations that already exist in
>      kernel space into userland

Agreed, if we had a TTM query API that'd be better but we don't.  And we 
work out what the memory is by inspecting the ring.  The ring has 
pointers to IB buffers which then have their own contents (e.g. CE/DE 
content).

> With what i outline you rely on the kernel informations to get to the
> thing you want. You can even add something like:
> 
> struct amdgpu_snoop_snapshot_ioctl {
> 	uint64_t	gpu_virtual_address;
> 	uint64_t	size;
> 	uint64_t	uptr;
> };
> 
> To snapshot a range of the GPU virtual address space so you not even
> have to worry about GPU virtual address -> bo handle -> offset into
> bo. You can even go further and add a /dev/dri/card0-snoop device file
> that you can mmap to access GPU virtual address space. Process open
> the card0 file register as snoop against another process from that point
> on a GPU virtual address in the snooped process becomes an offset into
> the card0-snoop file. So now you can read/write on that device file
> to access the GPU memory of a process without doing ioctl. The changes
> to amdgpu are not that big.

Honestly I'm not qualified to agree/disagree with your last sentence.  I 
suspect it's a lot more complicated than a trace though and as I said a 
few times already this won't cover allocations done privately in the kernel.

>> Well aside from the fact it doesn't cover all mappings, I can't see how it's
>> materially different than what I was already doing.  For this to work I'd
>> have to be able to mmap objects from another process into the debugger.  I
>> don't know if drm is really designed for that sort of flow and I suspect it
>> would take a lot of work to add that.
> 
> The amount of work does not matter. We are aiming for sane solution not
> insane thing. We are aiming for thing that are safe from security point
> of view. We are aiming for design that lower barrier (ie no need to be
> root to debug/trace GPU). This is what matters.

Except that you're approaching this from the angle that the KMD is 
perfect and we're only debugging the UMD.  umr is meant to debug both.  So 
things like raw access to VRAM/GART are necessary.  For instance, to 
debug VM mappings, to read IBs, to read frame buffers, etc...

> Quite frankly what i outline is not a lot of work. The snoop device file
> scheme for instance is pretty easy to achieve. Everytime there is a change
> to ttm object or GPU virtual address space you would need to update
> accordingly any mmap of that file. Overall i would say this is in the
> 500/1000 lines of code range.

Again I'm not familiar enough with the DRM/TTM code design to comment here.

> GPU device memory is slightly harder, you probably need to do it as a
> snapshot ie you copy device memory to system memory and map the system
> memory.

Which is unnecessary since we can directly read VRAM from the debugger :-)

> We want thing like debugger to work out of the box on any distribution so
> design must be sane and you should not need either to ask special modules
> or special kernel config option or to be root. There is way to do what you
> want without any of this requirement and thus that should be what you aim
> for.

Except again you're looking at this through the lens that the KMD is 
working perfectly.  We use umr (our debugger) to debug the kernel driver 
itself.  That means you need to be able to read/write MMIO registers, 
peek at VRAM, decode VM addresses, read rings, etc...

Since we already have an entire list of "security breaches" we might as 
well make use of them for UMD debugging too.  Again, you're not going to 
give random logins access to umr on your machine.  I mean in reality you 
wouldn't do that anyways for UMD tasks (you can easily lock up a GPU 
from a user task with no privileges...).

> My design do not need to get the reverse mapping at all that is my point.
> No need to engage the iommu folks no need to ask them for something as
> you have all the information from inside the amdgpu driver.

Which again works iff you're debugging UMD only.

>> Adding drm support to allow another process to GEM_MAP BOs (and a search
>> ioctl) might work but seems like a lot more work (and will likely reduce
>> performance) than the tracer and has the same limitation.  And ultimately we
>> find direct memory access useful in NPI work so that's unlikely to be
>> removed.
> 
> You have not proof that it would be slower and i outline the special file
> idea that avoid any ioctl.

By "slower" I don't mean the syscall for ioctl.  I mean the housekeeping 
we'd have to do in the driver on top of normal TTM housekeeping.  Unless 
you can nop it all out by default (which I suspect you can) it will get 
NAK'ed immediately by the maintainers.

Setting aside the fact of adding 500+ lines of code so you could attach 
to other libdrm processes and rifle through their BOs is code they have 
to maintain.

>> Thanks for your time Jerome.  I do understand where you're coming from but
>> I'm not sure a drm update is both politically or technically feasible and we
>> need this functionality (limited though it is) sooner than later.
> 
> You have to understand that kernel is not about compromising design so that
> people can reach their schedule. Sane design do matter and if sane design
> require lot of work than you should size your team accordingly to face the
> challenges.

Except the real world doesn't work that way.  And we have legitimate 
reasons for being able to peek/poke at the hardware.  Philosophies like 
yours lead companies to write private debug tools and not share them 
with the public.  Whereas we're trying to come up with a debug tool (and 
kernel module) that is practically safe and effective at its job without 
overly mucking about in the kernel.

You can't legitimately debug hardware from a userspace tool without 
having some knobs/etc that would otherwise be ill advised to give to 
everyone.  In a normal distribution your users run unprivileged and 
wouldn't have umr installed.  So they have zero access to any of our 
debugfs facilities and can only talk to the device via libdrm (via 
GL/VK/etc).

I still haven't seen an explanation for why having root-only access to 
the device is bad other than "it's insane!"  At the point you can run 
privileged processes you can simply compile and install a tainted kernel 
module to do the same thing.

Ideally, though, if we went this route it'd be a drm/ttm change, not an 
amdgpu change, since I can't believe the other vendors wouldn't be able 
to make use of this functionality too.

> What you want to do can be done with very little code and no change outside
> amdgpu kernel driver so i do not see any reason to not do so.

Maybe Alex or Christian can weigh in at this point?

I'm not saying your proposal isn't possible; I just don't have enough 
clarity on how I'd implement it, nor enough confidence that it would be 
materially any better (keep in mind we are NOT removing our debugfs 
facilities).

Tom

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Feature Request: Ability to decode bus/dma address back into physical address
       [not found]                       ` <20170801195556.GD3443-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2017-08-01 20:26                         ` Tom St Denis
@ 2017-08-01 20:43                         ` Alex Williamson
       [not found]                           ` <20170801144320.63bda17d-DGNDKt5SQtizQB+pC5nmwQ@public.gmane.org>
  1 sibling, 1 reply; 24+ messages in thread
From: Alex Williamson @ 2017-08-01 20:43 UTC (permalink / raw)
  To: Jerome Glisse
  Cc: Tom St Denis, Deucher, Alexander,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Koenig,
	Christian


Pardon a brief administrative interruption; somehow Tom got added to
the discard group for this list, so we've been losing his half of the
conversation.  I've pruned that discard list to cleanup a few other
legitimate looking emails and moved Tom to the accepted group to avoid
losing further replies.  The problem should be resolved now, but the
archives will be incomplete.  Sorry for the inconvenience.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: Feature Request: Ability to decode bus/dma address back into physical address
       [not found]                           ` <77e557d2-aa75-46c4-88a7-cca5448ea08e-5C7GfCeVMHo@public.gmane.org>
@ 2017-08-01 20:43                             ` Deucher, Alexander
  2017-08-01 21:01                             ` Jerome Glisse
  1 sibling, 0 replies; 24+ messages in thread
From: Deucher, Alexander @ 2017-08-01 20:43 UTC (permalink / raw)
  To: StDenis, Tom, Jerome Glisse
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Koenig, Christian

> -----Original Message-----
> From: StDenis, Tom
> Sent: Tuesday, August 01, 2017 4:27 PM
> To: Jerome Glisse
> Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org; Deucher, Alexander; Koenig, Christian
> Subject: Re: Feature Request: Ability to decode bus/dma address back into
> physical address
> 
> [snip -- full quote of the previous message trimmed]

I haven't followed this entire thread that closely, but the whole point of
umr is to be able to dump the hw state directly.  E.g., decode all rings,
IBs, GPUVM, clock and powergating state, etc. based on the hw state.  That
way we get as close as possible to the hw's view of the current state.  If
we have to involve sw, we can get into issues if we hit sw bugs.

Alex

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Feature Request: Ability to decode bus/dma address back into physical address
       [not found]                           ` <77e557d2-aa75-46c4-88a7-cca5448ea08e-5C7GfCeVMHo@public.gmane.org>
  2017-08-01 20:43                             ` Deucher, Alexander
@ 2017-08-01 21:01                             ` Jerome Glisse
       [not found]                               ` <20170801210105.GE3443-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  1 sibling, 1 reply; 24+ messages in thread
From: Jerome Glisse @ 2017-08-01 21:01 UTC (permalink / raw)
  To: Tom St Denis
  Cc: Deucher, Alexander,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Koenig,
	Christian

On Tue, Aug 01, 2017 at 04:26:38PM -0400, Tom St Denis wrote:
> Was trying to walk away from this ... :-) (all in good fun).
> 
> 
> On 01/08/17 03:55 PM, Jerome Glisse wrote:
> > On Tue, Aug 01, 2017 at 03:28:26PM -0400, Tom St Denis wrote:
> > > Adding the AMDGPU maintainers to get their opinions.
> > > 
> > > Context:
> > > https://lists.linuxfoundation.org/pipermail/iommu/2017-August/023489.html
> > > 
> > > (and others in case you missed it on the list).
> > > 
> > > 
> > > On 01/08/17 03:03 PM, Jerome Glisse wrote:
> > > > > > You would need to leverage thing like uevent to get event when something
> > > > > > happen like a bo being destroy or command submission ...
> > > > > 
> > > > > The problem with this approach is when I'm reading an IB I'm not given user
> > > > > space addresses but bus addresses.  So I can't correlate anything I'm seeing
> > > > > in the hardware with the user task if I wanted to.
> > > > > 
> > > > > In fact, to augment [say] OpenGL debugging I would have to correlate a
> > > > > buffer handle/pointer's page backing with the bus address in the IB so I
> > > > > could correlate the two (e.g. dump an IB and print out user process variable
> > > > > names that correspond to the IB contents...).
> > > > 
> > > > When you read IB you are provided with GPU virtual address, you can get the
> > > > GPU virtual address from the same snoop ioctl just add a field in bo_info
> > > > above. So i don't see any issue here.
> > > 
> > > This is effectively what I'm doing with the patch except via a trace not a
> > > bolted on snooping ioctl.
> > > 
> > > Tracers are a bit better since there's not really any overhead in the normal
> > > case which is desirable (and the code is relatively very simple).
> > > 
> > > In an ideal case we could simply search the ttm pages for a given device to
> > > look for a match for a given dma address.  But that also has the problem
> > > that ...
> > > 
> > > This will fail for all the other allocations e.g. for our GART, ring
> > > buffers, etc.
> > > 
> > > So a "complete" solution would be nice where any bus address mapping that is
> > > still valid for a device could be resolved to the physical page that is
> > > backing it.
> > 
> > So you agree that you can get what you want with ioctl ? Now you object
> > that the ioctl overhead is too important ?
> 
> No, I object that the overhead of keeping track of it in some sort of snoop
> structure is too much (and adds too much new code to the driver for the sole
> purpose of this one feature).
> 
> Not to mention we can't really invent ioctls without co-ordinating with with
> the other drm users (re: not just AMD).  umr is a lot less coupled since
> it's not really versioned just yet (still bringing up a lot of features so
> users use git not packages).

You can do this outside drm, and if the API makes sense to others it can be
added later to drm.

> 
> > I do not believe ioctl overhead to be an issue. Like i say in the model
> > i adviced you mix ioctl and thing like uevent or trace so you get "real
> > time" notification of what is going on.
> 
> Unless you can nop that in a config invariant fashion (like you can for
> tracers) that's a NAK from the get go.  And we'd need to buffer them to be
> practical since you might run the debugger out of sync with the application
> (e.g. app hangs then you fire up umr to see what's going on).

Again, when you start snooping you can access all the existing bo
information and start monitoring events to keep userspace in sync
with any changes from the first snapshot you take.

> 
> > Why do i say that your approach of tracking individual page mapping is
> > insane. There is several obstacle:
> >    - to get from GPU virtual address to physical memory address you need
> >      to first walk down the GPU page table to get the GPU pte entry. From
> >      that you either get an address inside the GPU memory (VRAM of PCIE
> >      device) or a DMA bus address. If it is the latter (DMA bus address)
> >      you need to walk down the IOMMU page table to find the physical address
> >      of the memory. So this is 2 page table walk down
> 
> We already do this in umr.  give a GPUVM address we can decode it down to a
> PTE on both AI and VI platforms.  It's decoding the PTE to a physical page
> (outside of vram) that is the issue.
> 
> In fact being able to VM decode is important to the kernel team who have to
> debug VM issues from time to time.
> 
> >    - none of the above walk down can happen while holding enough lock so
> >      that the GPU page table do not change behind you back (especialy the
> >      second walk down) so you are open to race
> 
> Which is fine because you'd only really do this on a stalled/halted GPU
> anyways.  Just like you don't inspect variables of a program that is
> currently running (and not blocked/asleep/etc).  We have the ability to halt
> currently active waves/etc.

So you want to add something to the kernel just for a corner case, i.e.
debugging a GPU hang, and not for more generic tools. I don't think
that the work needed for that is worth the effort for such a
small use case.

> 
> >    - GPU device memory is not necessary accessible (due to PCIE bar size
> >      limit i know that there is patches to allow to remap all GPU memory).
> 
> We have ways of accessing all of VRAM from basic MMIO accesses :)

Yes, and it is insane to have to write the VRAM address into one register and
read the VRAM value from another; insane from a tediousness point of view :)

> 
> >    - you need to track a lot of informations in userspace ie what memory
> >      is behind each 4k of GPU virtual address space you are interested in
> >      so you are basicaly duplicating informations that already exist in
> >      kernel space into userland
> 
> Agreed, if we had a TTM query API that'd be better but we don't.  And we
> work out what the memory is by inspecting the ring.  The ring has pointers
> to IB buffers which then have their own contents (e.g. CE/DE content).
>
> > With what i outline you rely on the kernel informations to get to the
> > thing you want. You can even add something like:
> > 
> > struct amdgpu_snoop_snapshot_ioctl {
> > 	uint64_t	gpu_virtual_address;
> > 	uint64_t	size;
> > 	uint64_t	uptr;
> > };
> > 
> > To snapshot a range of the GPU virtual address space so you not even
> > have to worry about GPU virtual address -> bo handle -> offset into
> > bo. You can even go further and add a /dev/dri/card0-snoop device file
> > that you can mmap to access GPU virtual address space. Process open
> > the card0 file register as snoop against another process from that point
> > on a GPU virtual address in the snooped process becomes an offset into
> > the card0-snoop file. So now you can read/write on that device file
> > to access the GPU memory of a process without doing ioctl. The changes
> > to amdgpu are not that big.
> 
> Honestly I'm not qualified to agree/disagree with your last sentence.  I
> suspect it's a lot more complicated than a trace though and as I said a few
> times already this won't cover allocations done privately in the kernel.

Kernel allocation should be out of your reach really.

> 
> > > Well aside from the fact it doesn't cover all mappings, I can't see how it's
> > > materially different than what I was already doing.  For this to work I'd
> > > have to be able to mmap objects from another process into the debugger.  I
> > > don't know if drm is really designed for that sort of flow and I suspect it
> > > would take a lot of work to add that.
> > 
> > The amount of work does not matter. We are aiming for sane solution not
> > insane thing. We are aiming for thing that are safe from security point
> > of view. We are aiming for design that lower barrier (ie no need to be
> > root to debug/trace GPU). This is what matters.
> 
> Except that you're approaching this from the angle that the KMD is perfect
> and we're debugging the UMD.  umr is meant to debug both.  So things like
> raw access to VRAM/GART is necessary.  For instance, to debug VM mappings,
> to read IBs, to read frame buffers, etc...

Then you can use the snoop device file to access gart and use mmio regs to
access device memory. Have 2 files: one that gives you the gpu pte for each 4k
and another that allows you to access gart objects either through mmap
or through read/write.

> 
> > Quite frankly what i outline is not a lot of work. The snoop device file
> > scheme for instance is pretty easy to achieve. Everytime there is a change
> > to ttm object or GPU virtual address space you would need to update
> > accordingly any mmap of that file. Overall i would say this is in the
> > 500/1000 lines of code range.
> 
> Again I'm not familiar enough with the DRM/TTM code design to comment here.
> 
> > GPU device memory is slightly harder, you probably need to do it as a
> > snapshot ie you copy device memory to system memory and map the system
> > memory.
> 
> Which is unnecessary since we can directly read VRAM from the debugger :-)
> 
> > We want thing like debugger to work out of the box on any distribution so
> > design must be sane and you should not need either to ask special modules
> > or special kernel config option or to be root. There is way to do what you
> > want without any of this requirement and thus that should be what you aim
> > for.
> 
> Except again you're looking at this from the lens that the KMD is working
> perfectly.  We use umr (our debugger) to debug the kernel driver itself.
> That means you need to be able to read/write MMIO registers, peek at VRAM,
> decode VM addresses, read rings, etc...

And you can do that with the scheme i propose.

> 
> Since we already have an entire list of "security breaches" we might as well
> make use of them for UMD debugging too.  Again, you're not going to give
> random logins access to umr on your machine.  I mean in reality you wouldn't
> do that anyways for UMD tasks (you can easily lock up a GPU from a user task
> with no privileges...).

Locking up is one thing; allowing access to data that should not be accessible
from userspace is another issue entirely. The former is bad, the latter is
worse.

> 
> > My design do not need to get the reverse mapping at all that is my point.
> > No need to engage the iommu folks no need to ask them for something as
> > you have all the information from inside the amdgpu driver.
> 
> Which again works iff you're debugging UMD only.

No, it also works if you are debugging the kernel side, i.e. anything that is
mapped to the GPU, either into a userspace GPU virtual address space or into
the kernel GPU virtual address space.

> 
> > > Adding drm support to allow another process to GEM_MAP BOs (and a search
> > > ioctl) might work but seems like a lot more work (and will likely reduce
> > > performance) than the tracer and has the same limitation.  And ultimately we
> > > find direct memory access useful in NPI work so that's unlikely to be
> > > removed.
> > 
> > You have not proof that it would be slower and i outline the special file
> > idea that avoid any ioctl.
> 
> By "slower" I don't mean the syscall for ioctl.  I mean the housekeeping
> we'd have to do in the driver on top of normal TTM housekeeping.  Unless you
> can nop it all out by default (which I suspect you can) it will get NAK'ed
> immediately by the maintainers.

No, I don't think it would be anything big unless they object to an if()
in a hotpath.

> 
> Setting aside the fact of adding 500+ lines of code so you could attach to
> other libdrm processes and rifle through their BOs is code they have to
> maintain.
> 
> > > Thanks for your time Jerome.  I do understand where you're coming from but
> > > I'm not sure a drm update is both politically or technically feasible and we
> > > need this functionality (limited though it is) sooner than later.
> > 
> > You have to understand that kernel is not about compromising design so that
> > people can reach their schedule. Sane design do matter and if sane design
> > require lot of work than you should size your team accordingly to face the
> > challenges.
> 
> Except the real world doesn't work that way.  And we have legitimate reasons
> for being able to peek/poke at the hardware.  Philosophies like yours lead
> companies to write private debug tools and not share them with the public.
> Whereas we're trying to come up with a debug tool (and kernel module) that
> is practically safe and effective at its job without overly mucking about in
> the kernel.
> 
> You can't legitimately debug hardware from a userspace tool without having
> some knobs/etc that would otherwise be ill advised to give to everyone.  In
> a normal distribution your users run unprivileged and wouldn't have umr
> installed.  So they have zero access to any of our debugfs facilities and
> can only talk to the device via libdrm (via GL/VK/etc).
> 
> I still haven't seen an explanation for why having root-only access to the
> device is bad other than "it's insane!"  At the point you can run privileged
> processes you can simply compile and install a tainted kernel module to do
> the same thing.

I was saying that your design is restrictive and cannot be used for other
things, and thus it is hard to justify the work for a single use case.

> 
> Ideally, though if we went this route it'd be a drm/ttm change not an amdgpu
> change since I can't believe the other vendors wouldn't be able to make use
> of this functionality either.
> 
> > What you want to do can be done with very little code and no change outside
> > amdgpu kernel driver so i do not see any reason to not do so.
> 
> Maybe Alex or Christian can weigh in at this point?
> 
> I'm not saying your proposal isn't possible I just don't have enough clarity
> on how I'd implement that nor enough confidence that it would be materially
> any better (keep in mind we are NOT removing our debugfs facilities).

Well, maybe I will code it up later today on the train back home.

Jérôme

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Feature Request: Ability to decode bus/dma address back into physical address
       [not found]                               ` <20170801210105.GE3443-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2017-08-01 21:38                                 ` Tom St Denis
       [not found]                                   ` <2cd345ee-d5ad-1ad7-508a-86225e65621c-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 24+ messages in thread
From: Tom St Denis @ 2017-08-01 21:38 UTC (permalink / raw)
  To: Jerome Glisse
  Cc: Deucher, Alexander,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Koenig,
	Christian

On 01/08/17 05:01 PM, Jerome Glisse wrote:
>> Unless you can nop that in a config invariant fashion (like you can for
>> tracers) that's a NAK from the get go.  And we'd need to buffer them to be
>> practical since you might run the debugger out of sync with the application
>> (e.g. app hangs then you fire up umr to see what's going on).
> 
> Again when you start snooping you can access all the existing bo
> informations and start monitoring event to keep userspace in sync
> with any changes from the first snapshot you take.

You'd have to be able to do this post-mortem as well though.  And 
ideally I don't really care about getting live updates (at least not yet).  
The way it would work now is you read the ring and get IB pointers 
which you then have to fetch and decode.  So all I care about now is 
what the pointer points to right now.

Though at this point I'm decoding a GPUVM address, not a DMA or physical 
address, so I first have to decode the GPUVM address "somehow" (assuming I 
don't use my "unsafe" methods); then I can proceed to look up the buffer 
based on the DMA address.

(* being notified of changes might be useful later on.  We do have more 
grandiose plans to integrate umr into gdb/etc functionality).

> So you want to add something to the kernel just for a corner case ie
> when debugging GPU hang. Not for more generic tools. I don't think
> that the work needed for that is worth the effort just for such a
> small usecase.

You can't really read GPU data when it's in flight anyways.  Ask a GFX 
hardware person about reading registers when the GPU is running and 
they'll say "NAK."

So even if the application hasn't crashed if you want to inspect things 
you need to halt the waves (effectively halting the GPU).

>>>     - GPU device memory is not necessary accessible (due to PCIE bar size
>>>       limit i know that there is patches to allow to remap all GPU memory).
>>
>> We have ways of accessing all of VRAM from basic MMIO accesses :)
> 
> Yes and it is insane to have to write VRAM address in one register and
> read the VRAM value in another, insane from tedious point of view :)

I'd think double buffering **all** of VRAM, in case you might want parts 
of it, to be more insane.

Typical VRAM reads to decode a GPUVM address for instance involve 
reading 16 or 40 bytes total.  It's pretty quick to be honest.
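
For context, the register-pair access in question is only a handful of lines 
from userspace.  A rough sketch (the register offsets, the missing 
high-address/aperture handling, and the debugfs path are placeholders, not 
the actual umr code):

/*
 * Illustrative sketch only: read one dword of VRAM through an MMIO
 * index/data register pair exposed via an amdgpu register debugfs file.
 */
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

#define REG_MM_INDEX  0x0   /* placeholder: "address" register (dword offset) */
#define REG_MM_DATA   0x1   /* placeholder: "data" register (dword offset) */

static uint32_t vram_read32(int regs_fd, uint32_t vram_addr)
{
	uint32_t value = 0;

	/* select the VRAM address, then read the dword back */
	pwrite(regs_fd, &vram_addr, 4, REG_MM_INDEX * 4);
	pread(regs_fd, &value, 4, REG_MM_DATA * 4);
	return value;
}

/* usage (path is an assumption):
 * fd = open("/sys/kernel/debug/dri/0/amdgpu_regs", O_RDWR);
 */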

> Kernel allocation should be out of your reach really.

Not if you're debugging ... the kernel (or more specifically your module).

>> Except again you're looking at this from the lens that the KMD is working
>> perfectly.  We use umr (our debugger) to debug the kernel driver itself.
>> That means you need to be able to read/write MMIO registers, peek at VRAM,
>> decode VM addresses, read rings, etc...
> 
> And you can do that with the scheme i propose.

Not everything is a BO though ...

> Locking up is one thing, allowing to access data that should not be accessible
> from userspace is another issue entirely. The former is bad, the latter is
> worse.

Depends on what you're doing.  Locking up a medical device that somehow 
allows unprivileged users on it ... :-)

> No it works also if you are debugging kernel side ie anything that is mapped
> to the GPU either into userspace GPU virtual address space or into kernel
> GPU virtual address space.

Not all kernel allocations go through DRM though.  You can just as 
easily pci_alloc_*() some memory and pass the dma address to the GPU.
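
As a minimal illustration of that point (not actual amdgpu code; the helper 
name is made up), such a buffer never touches TTM/GEM at all:

/*
 * Illustrative only: a kernel-side buffer the device can DMA to/from that
 * is not an amdgpu_bo and is invisible to the BO housekeeping.
 */
#include <linux/dma-mapping.h>
#include <linux/gfp.h>

static void *alloc_non_bo_buffer(struct device *dev, size_t size,
				 dma_addr_t *dma_addr)
{
	/* returns a CPU virtual address; *dma_addr is what the GPU gets */
	return dma_alloc_coherent(dev, size, dma_addr, GFP_KERNEL);
}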

> No i don't think it would be anything big unless they object to an if()
> in a hotpath.

They probably wouldn't.

> I was saying that your design is restrictive and can not be use for other
> thing and thus it is hard to justify the work for a single use case.

I'm not married to my trace idea.  It just happens to work for user 
applications and is better than nothing (and is very unobtrusive).  If 
you want to take a crack at a proper interface that accomplishes the 
same (and more) by all means I'll gladly adopt using it.

Just keep in mind we have no plans to remove our existing debugfs 
facilities.

> Well maybe i will code it down latter today on the train back home.

Good luck Fermat.  :-)

Cheers,
Tom

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Feature Request: Ability to decode bus/dma address back into physical address
       [not found]                           ` <20170801144320.63bda17d-DGNDKt5SQtizQB+pC5nmwQ@public.gmane.org>
@ 2017-08-01 21:38                             ` Tom St Denis
       [not found]                               ` <aa08a829-b472-cc39-b1a3-6a7d01f64da1-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 24+ messages in thread
From: Tom St Denis @ 2017-08-01 21:38 UTC (permalink / raw)
  To: Alex Williamson, Jerome Glisse
  Cc: Deucher, Alexander,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Koenig,
	Christian

On 01/08/17 04:43 PM, Alex Williamson wrote:
> 
> Pardon a brief administrative interruption; somehow Tom got added to
> the discard group for this list, so we've been losing his half of the
> conversation.  I've pruned that discard list to cleanup a few other
> legitimate looking emails and moved Tom to the accepted group to avoid
> losing further replies.  The problem should be resolved now, but the
> archives will be incomplete.  Sorry for the inconvenience.  Thanks,
> 
> Alex
> 

Thanks.  I tried subscribing this morning and after 20 or so minutes of 
no confirmation email I just emailed the list anyways :-)

Tom

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Feature Request: Ability to decode bus/dma address back into physical address
       [not found]                               ` <aa08a829-b472-cc39-b1a3-6a7d01f64da1-5C7GfCeVMHo@public.gmane.org>
@ 2017-08-01 22:42                                 ` Alex Williamson
       [not found]                                   ` <20170801164250.3bae6436-DGNDKt5SQtizQB+pC5nmwQ@public.gmane.org>
  0 siblings, 1 reply; 24+ messages in thread
From: Alex Williamson @ 2017-08-01 22:42 UTC (permalink / raw)
  To: Tom St Denis
  Cc: Deucher, Alexander,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Koenig,
	Christian

On Tue, 1 Aug 2017 17:38:43 -0400
Tom St Denis <tom.stdenis-5C7GfCeVMHo@public.gmane.org> wrote:

> On 01/08/17 04:43 PM, Alex Williamson wrote:
> > 
> > Pardon a brief administrative interruption; somehow Tom got added to
> > the discard group for this list, so we've been losing his half of the
> > conversation.  I've pruned that discard list to cleanup a few other
> > legitimate looking emails and moved Tom to the accepted group to avoid
> > losing further replies.  The problem should be resolved now, but the
> > archives will be incomplete.  Sorry for the inconvenience.  Thanks,
> > 
> > Alex
> >   
> 
> Thanks.  I tried subscribing this morning and after 20 or so minutes of 
> no confirmation email I just emailed the list anyways :-)

Generally that's fine, your post will get delayed until Joerg or I
approve it, but it should get there eventually.  Probably just a
mis-click caused you to end up on the wrong list.  I still don't see
you on the subscriber list, but I can add you if you like.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Feature Request: Ability to decode bus/dma address back into physical address
       [not found]                                   ` <2cd345ee-d5ad-1ad7-508a-86225e65621c-5C7GfCeVMHo@public.gmane.org>
@ 2017-08-02  4:42                                     ` Jerome Glisse
       [not found]                                       ` <20170802044214.GA6285-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 24+ messages in thread
From: Jerome Glisse @ 2017-08-02  4:42 UTC (permalink / raw)
  To: Tom St Denis
  Cc: Deucher, Alexander,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Koenig,
	Christian

On Tue, Aug 01, 2017 at 05:38:05PM -0400, Tom St Denis wrote:
> On 01/08/17 05:01 PM, Jerome Glisse wrote:
> > > Unless you can nop that in a config invariant fashion (like you can for
> > > tracers) that's a NAK from the get go.  And we'd need to buffer them to be
> > > practical since you might run the debugger out of sync with the application
> > > (e.g. app hangs then you fire up umr to see what's going on).
> > 
> > Again when you start snooping you can access all the existing bo
> > informations and start monitoring event to keep userspace in sync
> > with any changes from the first snapshot you take.
> 
> You'd have to be able to do this post-mortem as well though.  And ideally I
> don't really care about getting live updates (at least not yet though...).
> The way it would work now is you read the ring and get IB pointers which you
> then have to fetch and decode.  So all I care about now is what does the
> pointer point to right now.
> 
> Though at this point I'm decoding a GPUVM address not a DMA nor physical
> address so I then have to decode the GPUVM address "somehow" (assuming I
> don't use my "unsafe" methods) then I can proceed to look up the buffer
> based on the DMA address.

Everything is a GPU virtual address in the first place. Then it is translated
to either a GPU VRAM address or a bus address. IIRC some of the DMA engines do
not use the GPU VM but directly bus addresses and VRAM, but this can also fit
in what I am proposing. See at the end.

> 
> (* being notified of changes might be useful later on.  We do have more
> grandiose plans to integrate umr into gdb/etc functionality).
> 
> > So you want to add something to the kernel just for a corner case ie
> > when debugging GPU hang. Not for more generic tools. I don't think
> > that the work needed for that is worth the effort just for such a
> > small usecase.
> 
> You can't really read GPU data when it's in flight anyways.  Ask a GFX
> hardware person about reading registers when the GPU is running and they'll
> say "NAK."
> 
> So even if the application hasn't crashed if you want to inspect things you
> need to halt the waves (effectively halting the GPU).

I thought you were also doing tracing/perf monitoring tools, so I wrongly
assumed that you wanted things to keep going.


> > > >     - GPU device memory is not necessary accessible (due to PCIE bar size
> > > >       limit i know that there is patches to allow to remap all GPU memory).
> > > 
> > > We have ways of accessing all of VRAM from basic MMIO accesses :)
> > 
> > Yes and it is insane to have to write VRAM address in one register and
> > read the VRAM value in another, insane from tedious point of view :)
> 
> I'd think double buffering **all** of VRAM in case you might want parts of
> it to be more insane.

Not saying all of VRAM, only the things that are accessed; think of it as a
temporary bounce buffer of a couple of megabytes.

[...]

> > > Except again you're looking at this from the lens that the KMD is working
> > > perfectly.  We use umr (our debugger) to debug the kernel driver itself.
> > > That means you need to be able to read/write MMIO registers, peek at VRAM,
> > > decode VM addresses, read rings, etc...
> > 
> > And you can do that with the scheme i propose.
> 
> Not everything is a BO though ...

What isn't a bo? Everything inside the amdgpu driver is an amdgpu_bo unless I
missed something. I am ignoring amdkfd here as this one is easy.

[...]

> > I was saying that your design is restrictive and can not be use for other
> > thing and thus it is hard to justify the work for a single use case.
> 
> I'm not married to my trace idea.  It just happens to work for user
> applications and is better than nothing (and is very unobtrusive).  If you
> want to take a crack at a proper interface that accomplishes the same (and
> more) by all means I'll gladly adopt using it.
> 
> Just keep in mind we have no plans to remove our existing debugfs
> facilities.
> 
> > Well maybe i will code it down latter today on the train back home.
> 
> Good luck Fermat.  :-)

So here it is

https://cgit.freedesktop.org/~glisse/linux/log/?h=amdgpu-debug

most of the plumbing is there; I am puzzled by the absence of an obvious
lock that protects the virtual address range from concurrent insert/remove.

I haven't finished the read method; what is missing is accessing the ttm.tt
object if any, or falling back to mmio reads otherwise. Probably a couple
hundred lines of code to add this.

It is untested as I don't have the hardware.

Jérôme

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Feature Request: Ability to decode bus/dma address back into physical address
       [not found]                                       ` <20170802044214.GA6285-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2017-08-02  8:26                                         ` Christian König
       [not found]                                           ` <4a2b004b-ebc2-9331-84c4-4e6672dd7b97-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 24+ messages in thread
From: Christian König @ 2017-08-02  8:26 UTC (permalink / raw)
  To: Jerome Glisse, Tom St Denis
  Cc: Deucher, Alexander, iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Hi Jerome,

sorry for being a bit late to the discussion and the top posting.

But I think you miss a very important point here, which makes the whole 
discussion on how to implement it completely superfluous:

We already have functionality to access the content of BOs in a 
process for debugging purposes, which works similarly to what you described 
and just uses the BO housekeeping structures in the driver to access the 
pages and VRAM locations in question.

See here for the implementation:
1. drm/ttm: Implement vm_operations_struct.access v2 
(http://www.spinics.net/lists/dri-devel/msg147302.html)
2. drm/amdgpu: Implement ttm_bo_driver.access_memory callback v2 
(http://www.spinics.net/lists/dri-devel/msg147303.html)

Those patches allow you to just attach gdb to a process and access the 
content of every CPU mapped buffer, even when that buffer is in CPU 
invisible VRAM.

So the primary goal of that effort is *NOT* to make the BO content 
accessible to the debugger through the BO housekeeping, because that's 
something we already have.

The goal here is to walk the different page tables and hardware mapping 
functionalities to access the data in just the same way the GPU would, 
to catch problems.

That works fine as long as the IOMMU is disabled, but when it is enabled the 
additional mapping breaks our neck and we don't know if the stuff the 
page table dumper is producing is correct or not.

So what we need is just a way to translate dma addresses back to struct 
pages to check their validity.
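
For the pure IOMMU case there is already a lookup in that direction in the 
IOMMU API; a minimal sketch, assuming the device actually sits behind an 
IOMMU domain and deliberately ignoring SWIOTLB bounce buffers and plain bus 
offsets, would be something like:

/*
 * Minimal sketch, not a real patch: translate a dma/bus address back to
 * the struct page behind it when the device sits behind an IOMMU.
 * SWIOTLB and bus-offset cases are ignored here.
 */
#include <linux/iommu.h>
#include <linux/mm.h>

static struct page *dma_addr_to_page(struct device *dev, dma_addr_t dma_addr)
{
	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
	phys_addr_t phys;

	if (domain)
		phys = iommu_iova_to_phys(domain, dma_addr);
	else
		phys = (phys_addr_t)dma_addr;	/* assume 1:1 without an IOMMU */

	if (!phys || !pfn_valid(phys >> PAGE_SHIFT))
		return NULL;

	return pfn_to_page(phys >> PAGE_SHIFT);
}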

I've considered just adding this information to amdgpu_gem_info, but then we 
get the page->dma-address mapping instead of the dma-address->page mapping we want.

Regards,
Christian.

Am 02.08.2017 um 06:42 schrieb Jerome Glisse:
> On Tue, Aug 01, 2017 at 05:38:05PM -0400, Tom St Denis wrote:
>> On 01/08/17 05:01 PM, Jerome Glisse wrote:
>>>> Unless you can nop that in a config invariant fashion (like you can for
>>>> tracers) that's a NAK from the get go.  And we'd need to buffer them to be
>>>> practical since you might run the debugger out of sync with the application
>>>> (e.g. app hangs then you fire up umr to see what's going on).
>>> Again when you start snooping you can access all the existing bo
>>> informations and start monitoring event to keep userspace in sync
>>> with any changes from the first snapshot you take.
>> You'd have to be able to do this post-mortem as well though.  And ideally I
>> don't really care about getting live updates (at least not yet though...).
>> The way it would work now is you read the ring and get IB pointers which you
>> then have to fetch and decode.  So all I care about now is what does the
>> pointer point to right now.
>>
>> Though at this point I'm decoding a GPUVM address not a DMA nor physical
>> address so I then have to decode the GPUVM address "somehow" (assuming I
>> don't use my "unsafe" methods) then I can proceed to look up the buffer
>> based on the DMA address.
> Everything is a GPU virtual address in first place. Then it is translated
> to either GPU VRAM address or bus address. IIRC some of the DMA engine do
> not use GPU VM but directly bus address and VRAM but this can also fit in
> what i am proposing. See at the end.
>
>> (* being notified of changes might be useful later on.  We do have more
>> grandiose plans to integrate umr into gdb/etc functionality).
>>
>>> So you want to add something to the kernel just for a corner case ie
>>> when debugging GPU hang. Not for more generic tools. I don't think
>>> that the work needed for that is worth the effort just for such a
>>> small usecase.
>> You can't really read GPU data when it's in flight anyways.  Ask a GFX
>> hardware person about reading registers when the GPU is running and they'll
>> say "NAK."
>>
>> So even if the application hasn't crashed if you want to inspect things you
>> need to halt the waves (effectively halting the GPU).
> I thought you were also doing a tracing/perf monitor tools so i wrongly
> assume that you wanted thing to keep going.
>
>
>>>>>      - GPU device memory is not necessary accessible (due to PCIE bar size
>>>>>        limit i know that there is patches to allow to remap all GPU memory).
>>>> We have ways of accessing all of VRAM from basic MMIO accesses :)
>>> Yes and it is insane to have to write VRAM address in one register and
>>> read the VRAM value in another, insane from tedious point of view :)
>> I'd think double buffering **all** of VRAM in case you might want parts of
>> it to be more insane.
> Not saying all vram only thing that is accessed think of it as temporary
> bounce buffer of couple mega byte.
>
> [...]
>
>>>> Except again you're looking at this from the lens that the KMD is working
>>>> perfectly.  We use umr (our debugger) to debug the kernel driver itself.
>>>> That means you need to be able to read/write MMIO registers, peek at VRAM,
>>>> decode VM addresses, read rings, etc...
>>> And you can do that with the scheme i propose.
>> Not everything is a BO though ...
> What isn't a bo ? Everything inside amdgpu driver is an amdgpu_bo unless i missed
> something. I am ignoring amdkfd here as this one is easy.
>
> [...]
>
>>> I was saying that your design is restrictive and can not be use for other
>>> thing and thus it is hard to justify the work for a single use case.
>> I'm not married to my trace idea.  It just happens to work for user
>> applications and is better than nothing (and is very unobtrusive).  If you
>> want to take a crack at a proper interface that accomplishes the same (and
>> more) by all means I'll gladly adopt using it.
>>
>> Just keep in mind we have no plans to remove our existing debugfs
>> facilities.
>>
>>> Well maybe i will code it down latter today on the train back home.
>> Good luck Fermat.  :-)
> So here it is
>
> https://cgit.freedesktop.org/~glisse/linux/log/?h=amdgpu-debug
>
> most of the plumbing is there, i am puzzle by the absence of an obvious
> lock that protect the virtual address range from concurrent insert/remove.
>
> I haven't finished the read method but what is missing is accessing ttm.tt
> object if any or falling back to mmio read otherwise. Probably couple hundred
> line of code to add this.
>
> It is untested as i don't have hardware.
>
> Jérôme


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Feature Request: Ability to decode bus/dma address back into physical address
       [not found]                                           ` <4a2b004b-ebc2-9331-84c4-4e6672dd7b97-5C7GfCeVMHo@public.gmane.org>
@ 2017-08-02 16:43                                             ` Jerome Glisse
       [not found]                                               ` <20170802164343.GA3105-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 24+ messages in thread
From: Jerome Glisse @ 2017-08-02 16:43 UTC (permalink / raw)
  To: Christian König
  Cc: Tom St Denis, Deucher, Alexander,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On Wed, Aug 02, 2017 at 10:26:40AM +0200, Christian König wrote:
> Hi Jerome,
> 
> sorry for being a bit late to the discussion and the top posting.
> 
> But I think you miss a very important point here, which makes the whole
> discussion on how to implement completely superfluous:
> 
> We already have a functionality to access the content of BOs in a process
> for debugging purpose which works similar as you described and just uses the
> BO housekeeping structures in the driver to access the pages and VRAM
> locations in question.
> 
> See here for the implementation:
> 1. drm/ttm: Implement vm_operations_struct.access v2
> (http://www.spinics.net/lists/dri-devel/msg147302.html)
> 2. drm/amdgpu: Implement ttm_bo_driver.access_memory callback v2
> (http://www.spinics.net/lists/dri-devel/msg147303.html)
> 
> Those patches allow you to just attach gdb to a process and access the
> content of every CPU mapped buffer, even when that buffer is in CPU
> invisible VRAM.
> 
> So the primary goal of that effort is *NOT* to make the BO content
> accessible to the debugger through the BO housekeeping, cause that's
> something we already have.
> 
> The goal here is to walk the different page tables and hardware mapping
> functionalities to access the data just in the same way the GPU would do to
> catch problems.
> 
> That works fine as long as IOMMU is disabled, but when it is enabled the
> additional mapping breaks our neck and we don't knew if the stuff the page
> table dumper is producing is correct or not.
> 
> So what we need is just a way to translate dma addresses back to struct
> pages to check their validity.
> 
> I've considered to just add this information to amdgpu_gem_info, but then we
> get page->dma-address mapping instead dma-address->page as we want.
> 

So to summarize, you are saying you do not trust the value you get from
pci_map_page()?

If not, then I stress again that you have all the information you need
inside the amdgpu driver. You can take the same scheme I proposed to
dump ttm.dma_address[] and compare it against the content of the GPU page table.

Jérôme

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Feature Request: Ability to decode bus/dma address back into physical address
       [not found]                                               ` <20170802164343.GA3105-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2017-08-02 17:05                                                 ` Christian König
       [not found]                                                   ` <5ecbb5c2-fe4e-fd84-43b5-67ae06c5a032-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 24+ messages in thread
From: Christian König @ 2017-08-02 17:05 UTC (permalink / raw)
  To: Jerome Glisse
  Cc: Tom St Denis, Deucher, Alexander,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Am 02.08.2017 um 18:43 schrieb Jerome Glisse:
> On Wed, Aug 02, 2017 at 10:26:40AM +0200, Christian König wrote:
>> [SNIP]
> So to summarize you are saying you do not trust the value you get from
> pci_map_page() ?

Well, what we don't trust is that we actually get this value correctly 
into our page tables.

> If not then i stress again that you have all the informations you need
> inside the amdgpu driver. You can take the same scheme i propose to
> dump ttm.dma_address[] and compare against content of GPU page table.

Yes, exactly. But then again we have the mapping from page to dma-address 
(because that is what drivers usually need), while what we need for 
debugging is a map with the info dma-address to page.

I mean we can obviously build the reverse table in the driver ourselves, 
but that is just a waste of memory if you ask me because the IOMMU driver 
should have that info in its tables anyway.

In addition to that, we have at least some plans to get away from BOs and 
towards HMM. So it would be nice to have a central debugging feature to 
access the iommu map.

I'm actually surprised that the IOMMU subsystem doesn't already have 
something like that somewhere. I mean how do you guys validate that 
what's written into the IOMMU tables is actually correct?

Isn't there some way (debugfs/sysfs/tools?) to dump them?

Regards,
Christian.

>
> Jérôme


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Feature Request: Ability to decode bus/dma address back into physical address
       [not found]                                                   ` <5ecbb5c2-fe4e-fd84-43b5-67ae06c5a032-5C7GfCeVMHo@public.gmane.org>
@ 2017-08-02 17:13                                                     ` Jerome Glisse
       [not found]                                                       ` <20170802171303.GB3105-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2017-08-02 18:42                                                     ` Robin Murphy
  1 sibling, 1 reply; 24+ messages in thread
From: Jerome Glisse @ 2017-08-02 17:13 UTC (permalink / raw)
  To: Christian König
  Cc: Tom St Denis, Deucher, Alexander,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On Wed, Aug 02, 2017 at 07:05:11PM +0200, Christian König wrote:
> Am 02.08.2017 um 18:43 schrieb Jerome Glisse:
> > On Wed, Aug 02, 2017 at 10:26:40AM +0200, Christian König wrote:
> > > [SNIP]
> > So to summarize you are saying you do not trust the value you get from
> > pci_map_page() ?
> 
> Well, what we don't trust is that we actually get this value correctly into
> our page tables.
> 
> > If not then i stress again that you have all the informations you need
> > inside the amdgpu driver. You can take the same scheme i propose to
> > dump ttm.dma_address[] and compare against content of GPU page table.
> 
> Yes, exactly. But then again we have the mapping page to dma-address
> (because that is what drivers usually need), but what we need for debugging
> is a map with the info dma-address to page.

Why would you need the reverse? You have a GPU virtual address, so do the following:
GPU vaddr -> GPU page table entry -> bus address
GPU vaddr -> bo_va_mapping -> bo_va -> bo -> page -> dma/bus address

Compare the 2 values: if they are the same then the GPU is looking where
you wanted it to look. If they are different then you know you have a bug;
knowing the physical page address does not give you any more information
here, does it?
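
In code terms the check would look roughly like this (both helpers are
hypothetical stand-ins for the existing amdgpu walk/lookup code, written out
only to show the idea):

/*
 * Rough sketch of the proposed consistency check: for one GPU virtual
 * address, compare the bus address the GPU page table actually holds with
 * the bus address the driver's bo housekeeping says it programmed.
 * example_walk_gpu_page_table() and example_lookup_bo_dma_addr() are
 * hypothetical helpers, not real amdgpu functions.
 */
static bool gpuvm_mapping_matches(struct amdgpu_vm *vm, u64 gpu_vaddr)
{
	dma_addr_t from_pte = example_walk_gpu_page_table(vm, gpu_vaddr); /* hw view */
	dma_addr_t from_ttm = example_lookup_bo_dma_addr(vm, gpu_vaddr);  /* sw view */

	return from_pte == from_ttm;
}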

Jérôme

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Feature Request: Ability to decode bus/dma address back into physical address
       [not found]                                                       ` <20170802171303.GB3105-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2017-08-02 17:23                                                         ` Christian König
       [not found]                                                           ` <d4a6ad38-522c-2928-2ef1-1f7cd2267c4e-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 24+ messages in thread
From: Christian König @ 2017-08-02 17:23 UTC (permalink / raw)
  To: Jerome Glisse
  Cc: Tom St Denis, Deucher, Alexander,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Am 02.08.2017 um 19:13 schrieb Jerome Glisse:
> On Wed, Aug 02, 2017 at 07:05:11PM +0200, Christian König wrote:
>> Am 02.08.2017 um 18:43 schrieb Jerome Glisse:
>>> On Wed, Aug 02, 2017 at 10:26:40AM +0200, Christian König wrote:
>>>> [SNIP]
>>> So to summarize you are saying you do not trust the value you get from
>>> pci_map_page() ?
>> Well, what we don't trust is that we actually get this value correctly into
>> our page tables.
>>
>>> If not then i stress again that you have all the informations you need
>>> inside the amdgpu driver. You can take the same scheme i propose to
>>> dump ttm.dma_address[] and compare against content of GPU page table.
>> Yes, exactly. But then again we have the mapping page to dma-address
>> (because that is what drivers usually need), but what we need for debugging
>> is a map with the info dma-address to page.
> Why would you need the reverse ? You have a GPU virtual address do the following:
> GPU vaddr -> GPU page table entrie -> bus address
> GPU vaddr -> bo_va_mapping -> bo_va -> bo -> page -> dma/bus address

First of all, the VM housekeeping structures keep the state as it is 
supposed to be on the next command submission, i.e. the top of the 
pipeline, but the state of the page tables represents the bottom of the 
pipeline.

Second, you can't access the BO from its virtual address; the BO mapping 
is protected by the BO reservation lock. So when I want to look up the BO 
I would need to lock the BO first - a chicken and egg problem. That is of 
course solvable, but not something I would really like to do for a 
debugging feature.

Christian.

>
> Compage the 2 values if they are the same then the GPU is looking where
> you wanted it to look. If they are different then you know you have a bug
> knowing the physical page addres do not give you any more informations
> here ?
>
> Jérôme


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Feature Request: Ability to decode bus/dma address back into physical address
       [not found]                                                           ` <d4a6ad38-522c-2928-2ef1-1f7cd2267c4e-5C7GfCeVMHo@public.gmane.org>
@ 2017-08-02 17:33                                                             ` Jerome Glisse
       [not found]                                                               ` <20170802173311.GC3105-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 24+ messages in thread
From: Jerome Glisse @ 2017-08-02 17:33 UTC (permalink / raw)
  To: Christian König
  Cc: Tom St Denis, Deucher, Alexander,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On Wed, Aug 02, 2017 at 07:23:58PM +0200, Christian König wrote:
> Am 02.08.2017 um 19:13 schrieb Jerome Glisse:
> > On Wed, Aug 02, 2017 at 07:05:11PM +0200, Christian König wrote:
> > > Am 02.08.2017 um 18:43 schrieb Jerome Glisse:
> > > > On Wed, Aug 02, 2017 at 10:26:40AM +0200, Christian König wrote:
> > > > > [SNIP]
> > > > So to summarize you are saying you do not trust the value you get from
> > > > pci_map_page() ?
> > > Well, what we don't trust is that we actually get this value correctly into
> > > our page tables.
> > > 
> > > > If not then i stress again that you have all the informations you need
> > > > inside the amdgpu driver. You can take the same scheme i propose to
> > > > dump ttm.dma_address[] and compare against content of GPU page table.
> > > Yes, exactly. But then again we have the mapping page to dma-address
> > > (because that is what drivers usually need), but what we need for debugging
> > > is a map with the info dma-address to page.
> > Why would you need the reverse ? You have a GPU virtual address do the following:
> > GPU vaddr -> GPU page table entrie -> bus address
> > GPU vaddr -> bo_va_mapping -> bo_va -> bo -> page -> dma/bus address
> 
> First of all the VM housekeeping structures keep the state as it is supposed
> to be on the next command submission and so the top of the pipeline, but the
> state of the page tables represent to bottom of the pipeline.
> 
> Second you can't access the BO from it's virtual address, the BO mapping is
> protected by the BO reservation lock. So when I want to lockup the BO I
> would need to lock the BO first - chicken and egg problem. That is of course
> solvable, but not something I would really like to do for a debugging
> feature.

Tom said that the GPU is stopped and thus there is nothing happening to the vm
nor any of the bos, right? So there is no need to protect anything. If you
still allow things to change the vm (map/unmap bo ...) how do you expect to debug?

Jérôme

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Feature Request: Ability to decode bus/dma address back into physical address
       [not found]                                                               ` <20170802173311.GC3105-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2017-08-02 18:18                                                                 ` Christian König
       [not found]                                                                   ` <4f00e033-7a08-e54f-4b64-2813d5f25b5b-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 24+ messages in thread
From: Christian König @ 2017-08-02 18:18 UTC (permalink / raw)
  To: Jerome Glisse
  Cc: Tom St Denis, Deucher, Alexander,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Am 02.08.2017 um 19:33 schrieb Jerome Glisse:
> On Wed, Aug 02, 2017 at 07:23:58PM +0200, Christian König wrote:
>> Am 02.08.2017 um 19:13 schrieb Jerome Glisse:
>>> On Wed, Aug 02, 2017 at 07:05:11PM +0200, Christian König wrote:
>>>> Am 02.08.2017 um 18:43 schrieb Jerome Glisse:
>>>>> On Wed, Aug 02, 2017 at 10:26:40AM +0200, Christian König wrote:
>>>>>> [SNIP]
>>>>> So to summarize you are saying you do not trust the value you get from
>>>>> pci_map_page() ?
>>>> Well, what we don't trust is that we actually get this value correctly into
>>>> our page tables.
>>>>
>>>>> If not then i stress again that you have all the informations you need
>>>>> inside the amdgpu driver. You can take the same scheme i propose to
>>>>> dump ttm.dma_address[] and compare against content of GPU page table.
>>>> Yes, exactly. But then again we have the mapping page to dma-address
>>>> (because that is what drivers usually need), but what we need for debugging
>>>> is a map with the info dma-address to page.
>>> Why would you need the reverse ? You have a GPU virtual address do the following:
>>> GPU vaddr -> GPU page table entrie -> bus address
>>> GPU vaddr -> bo_va_mapping -> bo_va -> bo -> page -> dma/bus address
>> First of all the VM housekeeping structures keep the state as it is supposed
>> to be on the next command submission and so the top of the pipeline, but the
>> state of the page tables represent to bottom of the pipeline.
>>
>> Second you can't access the BO from it's virtual address, the BO mapping is
>> protected by the BO reservation lock. So when I want to lockup the BO I
>> would need to lock the BO first - chicken and egg problem. That is of course
>> solvable, but not something I would really like to do for a debugging
>> feature.
> Tom said that the GPU is stop and thus there is nothing happening to the vm
> nor any of the bo right ? So there is no need to protect anything. If you
> still allow thing to change vm (map/unmap bo ...) how do you expect to debug ?

Well, the GPU is stuck (or manually stopped), but keep in mind that this 
is a very deep pipeline we are talking about here.

So even if there isn't any processing happening in the hardware any more, 
there can still be state changes queued up waiting for processing (or even 
new ones added).

Christian.

>
> Jérôme


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Feature Request: Ability to decode bus/dma address back into physical address
       [not found]                                                                   ` <4f00e033-7a08-e54f-4b64-2813d5f25b5b-5C7GfCeVMHo@public.gmane.org>
@ 2017-08-02 18:36                                                                     ` Jerome Glisse
  0 siblings, 0 replies; 24+ messages in thread
From: Jerome Glisse @ 2017-08-02 18:36 UTC (permalink / raw)
  To: Christian König
  Cc: Tom St Denis, Deucher, Alexander,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On Wed, Aug 02, 2017 at 08:18:02PM +0200, Christian König wrote:
> Am 02.08.2017 um 19:33 schrieb Jerome Glisse:
> > On Wed, Aug 02, 2017 at 07:23:58PM +0200, Christian König wrote:
> > > Am 02.08.2017 um 19:13 schrieb Jerome Glisse:
> > > > On Wed, Aug 02, 2017 at 07:05:11PM +0200, Christian König wrote:
> > > > > Am 02.08.2017 um 18:43 schrieb Jerome Glisse:
> > > > > > On Wed, Aug 02, 2017 at 10:26:40AM +0200, Christian König wrote:
> > > > > > > [SNIP]
> > > > > > So to summarize you are saying you do not trust the value you get from
> > > > > > pci_map_page() ?
> > > > > Well, what we don't trust is that we actually get this value correctly into
> > > > > our page tables.
> > > > > 
> > > > > > If not then i stress again that you have all the informations you need
> > > > > > inside the amdgpu driver. You can take the same scheme i propose to
> > > > > > dump ttm.dma_address[] and compare against content of GPU page table.
> > > > > Yes, exactly. But then again we have the mapping page to dma-address
> > > > > (because that is what drivers usually need), but what we need for debugging
> > > > > is a map with the info dma-address to page.
> > > > Why would you need the reverse ? You have a GPU virtual address do the following:
> > > > GPU vaddr -> GPU page table entrie -> bus address
> > > > GPU vaddr -> bo_va_mapping -> bo_va -> bo -> page -> dma/bus address
> > > First of all the VM housekeeping structures keep the state as it is supposed
> > > to be on the next command submission and so the top of the pipeline, but the
> > > state of the page tables represent to bottom of the pipeline.
> > > 
> > > Second you can't access the BO from it's virtual address, the BO mapping is
> > > protected by the BO reservation lock. So when I want to lockup the BO I
> > > would need to lock the BO first - chicken and egg problem. That is of course
> > > solvable, but not something I would really like to do for a debugging
> > > feature.
> > Tom said that the GPU is stop and thus there is nothing happening to the vm
> > nor any of the bo right ? So there is no need to protect anything. If you
> > still allow thing to change vm (map/unmap bo ...) how do you expect to debug ?
> 
> Well the GPU is stuck (or manually stopped), but keep in mind that this is a
> very deep pipeline we are talking about here.
> 
> So even if there isn't any processing happening in the hardware any more
> there can still state changes queued up waiting for processing (or even new
> one added).
> 

I am familiar with how it all works. VMs are protected by fences, so if the
GPU is stopped a vm that is in use will be protected from bind/unbind, and
updates to the virtual address space of that vm might also be blocked if
they were to happen through some GPU ring and not from the CPU.

To me it looks like you want to know if the GPU is accessing what was meant
to be accessed, and I believe what I have outlined allows that. Compare the
current GPU page table entry with the valid mapping.

Thus I do not see any value in trying to get bus -> page, especially since
once you get the page you can't know which bo it belongs to without going
over all ttm tt objects, and even then it might be an already "freed" page
from ttm's point of view.

Jérôme

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Feature Request: Ability to decode bus/dma address back into physical address
       [not found]                                                   ` <5ecbb5c2-fe4e-fd84-43b5-67ae06c5a032-5C7GfCeVMHo@public.gmane.org>
  2017-08-02 17:13                                                     ` Jerome Glisse
@ 2017-08-02 18:42                                                     ` Robin Murphy
       [not found]                                                       ` <2f9da712-1559-6593-e512-c0508e21d747-5wv7dgnIgG8@public.gmane.org>
  1 sibling, 1 reply; 24+ messages in thread
From: Robin Murphy @ 2017-08-02 18:42 UTC (permalink / raw)
  To: Christian König, Jerome Glisse
  Cc: Tom St Denis, Deucher, Alexander,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On 02/08/17 18:05, Christian König wrote:
> Am 02.08.2017 um 18:43 schrieb Jerome Glisse:
>> On Wed, Aug 02, 2017 at 10:26:40AM +0200, Christian König wrote:
>>> [SNIP]
>> So to summarize you are saying you do not trust the value you get from
>> pci_map_page() ?
> 
> Well, what we don't trust is that we actually get this value correctly
> into our page tables.
> 
>> If not then i stress again that you have all the informations you need
>> inside the amdgpu driver. You can take the same scheme i propose to
>> dump ttm.dma_address[] and compare against content of GPU page table.
> 
> Yes, exactly. But then again we have the mapping page to dma-address
> (because that is what drivers usually need), but what we need for
> debugging is a map with the info dma-address to page.
> 
> I mean we can obviously build the reverse table in the driver ourself,
> but that is just a waste of memory if you ask me cause the IOMMU driver
> should have that info in it's tables anyway.
> 
> Additional to that we have at least some plans to get away from BOs and
> towards HMM. So it would be nice to have a central debugging feature to
> access the iommu map.

Note that the underlying issue here doesn't have anything specifically to do
with IOMMUs - yes, the DMA API ops implemented by IOMMU drivers provide
the most obvious breakage of the assumption that DMA address == physical
address, but there are also various other cases (bus offsets, SWIOTLB,
etc.) for which it doesn't always hold. If the amdgpu-specific path
comes to a dead end, any acceptably general solution would probably have
to be implemented at the DMA API level.

> I'm actually surprised that the IOMMU subsystem doesn't already have
> something like that somewhere. I mean how do you guys validate that
> what's written into the IOMMU tables is actually correct?

In terms of DMA API implementation, experience says that you can usually
rely on everything going to hell pretty much instantly if it isn't ;)

The IOMMU API itself has tracepoints as mentioned at the top of the
thread, but everyone other than arm/arm64 implements their IOMMU-based
DMA ops directly without going via that abstraction. Adding DMA API
tracepoints would be a possibility, but would be awkward to do
generically because it's all static inlines in a header which ends up
getting pulled into the middle of other trace event headers (yes, I
tried...)

> Isn't there some way (debugfs/sysfs/tools?) to dump them?

FWIW I'm not aware of anything that does exactly what you want (other
than some trick patches I keep around that only work for arm64), but it
does come to mind that it ought to be fairly straightforward to give
dma-debug the ability to dump out the information it already captures.
It wouldn't be viable for deployment in non-development kernels, but
it's an idea - for production kernels, the simple answer is probably
"boot with iommu=off|pt when GPU debugging" anyway.

Robin.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Feature Request: Ability to decode bus/dma address back into physical address
       [not found]                                                       ` <2f9da712-1559-6593-e512-c0508e21d747-5wv7dgnIgG8@public.gmane.org>
@ 2017-08-02 19:25                                                         ` Tom St Denis
  0 siblings, 0 replies; 24+ messages in thread
From: Tom St Denis @ 2017-08-02 19:25 UTC (permalink / raw)
  To: Robin Murphy, Christian König, Jerome Glisse
  Cc: Deucher, Alexander, iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On 02/08/17 02:42 PM, Robin Murphy wrote:

> FWIW I'm not aware of anything that does exactly what you want (other
> than some trick patches I keep around that only work for arm64), but it
> does come to mind that it ought to be fairly straightforward to give
> dma-debug the ability to dump out the information it already captures.
> It wouldn't be viable for deployment in non-development kernels, but
> it's a idea - for production kernels, the simple answer is probably
> "boot with iommu=off|pt when GPU debugging" anyway.

The genesis of this request is that the IOMMU is a requirement in a lot 
of new development (for instance, virtualization), and newer systems 
cannot operate with it disabled.

Previously, yeah, I'd tell umr (our debugger) users to turn it off, but 
that's becoming increasingly hard to sell :-)

Right now our tracepoint approach seems to work for relatively light 
workloads, but it can become somewhat laggy under higher loads, and I 
fear that on data-center-type loads it will become completely unusable.
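
For reference, the kind of map-time tracepoint in question could be as
simple as the sketch below - event and field names are invented for
illustration, not taken from the actual prototype:

/*
 * The driver would call trace_dbg_map_page(phys, dma, size) right after
 * each successful mapping, so the debugger can rebuild the reverse map
 * from the trace buffer.
 */
#undef TRACE_SYSTEM
#define TRACE_SYSTEM dbg

#if !defined(_DBG_TRACE_H) || defined(TRACE_HEADER_MULTI_READ)
#define _DBG_TRACE_H

#include <linux/tracepoint.h>

TRACE_EVENT(dbg_map_page,
	TP_PROTO(u64 phys, u64 dma, u32 size),
	TP_ARGS(phys, dma, size),
	TP_STRUCT__entry(
		__field(u64, phys)
		__field(u64, dma)
		__field(u32, size)
	),
	TP_fast_assign(
		__entry->phys = phys;
		__entry->dma = dma;
		__entry->size = size;
	),
	TP_printk("phys=0x%llx dma=0x%llx size=%u",
		  __entry->phys, __entry->dma, __entry->size)
);

#endif /* _DBG_TRACE_H */

/* Must be outside the guard; needs the usual TRACE_INCLUDE_* defines. */
#include <trace/define_trace.h>

It fires on every mapping, which is presumably where the lag under
heavier loads comes from.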

Tom

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Feature Request: Ability to decode bus/dma address back into physical address
       [not found]                                   ` <20170801164250.3bae6436-DGNDKt5SQtizQB+pC5nmwQ@public.gmane.org>
@ 2017-08-04  9:43                                     ` Joerg Roedel
  0 siblings, 0 replies; 24+ messages in thread
From: Joerg Roedel @ 2017-08-04  9:43 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Tom St Denis, Deucher, Alexander,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Koenig,
	Christian

On Tue, Aug 01, 2017 at 04:42:50PM -0600, Alex Williamson wrote:
> Generally that's fine, your post will get delayed until Joerg or I
> approve it, but it should get there eventually.  Probably just a
> mis-click caused you to end up on the wrong list.  I still don't see
> you on the subscriber list, but I can add you if you like.  Thanks,

Oh sorry, it was probably my fault. I usually add legitimate
email senders to the allowed list when accepting their first mail. But
I probably mis-clicked there and Tom ended up on the discard list.

Sorry again,


	Joerg

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2017-08-04  9:43 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-01 10:07 Feature Request: Ability to decode bus/dma address back into physical address Tom St Denis
     [not found] ` <8379cf5a-7539-e221-c678-20f617fb4337-5C7GfCeVMHo@public.gmane.org>
2017-08-01 17:25   ` Jerome Glisse
     [not found]     ` <30eb1ecb-c86f-4d3b-cd49-e002f46e582d@amd.com>
     [not found]       ` <30eb1ecb-c86f-4d3b-cd49-e002f46e582d-5C7GfCeVMHo@public.gmane.org>
2017-08-01 18:04         ` Jerome Glisse
     [not found]           ` <483ecda0-2977-d2ea-794c-320e429d7645@amd.com>
     [not found]             ` <483ecda0-2977-d2ea-794c-320e429d7645-5C7GfCeVMHo@public.gmane.org>
2017-08-01 19:03               ` Jerome Glisse
     [not found]                 ` <42c5fe2b-f179-cb71-03d3-7ae991543edb@amd.com>
     [not found]                   ` <42c5fe2b-f179-cb71-03d3-7ae991543edb-5C7GfCeVMHo@public.gmane.org>
2017-08-01 19:55                     ` Jerome Glisse
     [not found]                       ` <20170801195556.GD3443-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2017-08-01 20:26                         ` Tom St Denis
     [not found]                           ` <77e557d2-aa75-46c4-88a7-cca5448ea08e-5C7GfCeVMHo@public.gmane.org>
2017-08-01 20:43                             ` Deucher, Alexander
2017-08-01 21:01                             ` Jerome Glisse
     [not found]                               ` <20170801210105.GE3443-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2017-08-01 21:38                                 ` Tom St Denis
     [not found]                                   ` <2cd345ee-d5ad-1ad7-508a-86225e65621c-5C7GfCeVMHo@public.gmane.org>
2017-08-02  4:42                                     ` Jerome Glisse
     [not found]                                       ` <20170802044214.GA6285-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2017-08-02  8:26                                         ` Christian König
     [not found]                                           ` <4a2b004b-ebc2-9331-84c4-4e6672dd7b97-5C7GfCeVMHo@public.gmane.org>
2017-08-02 16:43                                             ` Jerome Glisse
     [not found]                                               ` <20170802164343.GA3105-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2017-08-02 17:05                                                 ` Christian König
     [not found]                                                   ` <5ecbb5c2-fe4e-fd84-43b5-67ae06c5a032-5C7GfCeVMHo@public.gmane.org>
2017-08-02 17:13                                                     ` Jerome Glisse
     [not found]                                                       ` <20170802171303.GB3105-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2017-08-02 17:23                                                         ` Christian König
     [not found]                                                           ` <d4a6ad38-522c-2928-2ef1-1f7cd2267c4e-5C7GfCeVMHo@public.gmane.org>
2017-08-02 17:33                                                             ` Jerome Glisse
     [not found]                                                               ` <20170802173311.GC3105-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2017-08-02 18:18                                                                 ` Christian König
     [not found]                                                                   ` <4f00e033-7a08-e54f-4b64-2813d5f25b5b-5C7GfCeVMHo@public.gmane.org>
2017-08-02 18:36                                                                     ` Jerome Glisse
2017-08-02 18:42                                                     ` Robin Murphy
     [not found]                                                       ` <2f9da712-1559-6593-e512-c0508e21d747-5wv7dgnIgG8@public.gmane.org>
2017-08-02 19:25                                                         ` Tom St Denis
2017-08-01 20:43                         ` Alex Williamson
     [not found]                           ` <20170801144320.63bda17d-DGNDKt5SQtizQB+pC5nmwQ@public.gmane.org>
2017-08-01 21:38                             ` Tom St Denis
     [not found]                               ` <aa08a829-b472-cc39-b1a3-6a7d01f64da1-5C7GfCeVMHo@public.gmane.org>
2017-08-01 22:42                                 ` Alex Williamson
     [not found]                                   ` <20170801164250.3bae6436-DGNDKt5SQtizQB+pC5nmwQ@public.gmane.org>
2017-08-04  9:43                                     ` Joerg Roedel
