All of lore.kernel.org
 help / color / mirror / Atom feed
* dma_alloc_coherent bug in latest ntb_tool changes
@ 2018-06-07 23:10 Logan Gunthorpe
  2018-06-07 23:17 ` Dave Jiang
  0 siblings, 1 reply; 8+ messages in thread
From: Logan Gunthorpe @ 2018-06-07 23:10 UTC (permalink / raw)
  To: linux-ntb, Jon Mason, Serge Semin, Dave Jiang, Allen Hubbe

Hey,

I'm just running ntb_test on recent kernels with switchtec hardware and
found that ntb_tool and ntb_perf no longer work. I get this kernel warning:

[   64.077406] WARNING: CPU: 3 PID: 2231 at
./include/linux/dma-mapping.h:516 tool_mw_trans_write+0x16d/0x332 [ntb_tool]

Indicating that the dma_coherent mask is not set on the device. (Not
sure how much testing that patch set got before it was merged upstream...)

It seems one of the changes that slipped through is that it now
allocates DMA memory using the NTB struct device instead of the PCI
struct device that was used before the patchset.

Seeing nothing in the NTB tree sets the coherent mask on the NTB device
we get a warning and fail to allocate DMA-able memory.

I'm not sure the proper fix here: should we be using the NTB device for
DMA operations or roll back to using the PCI device? It seems cleaner to
use the NTB device but then we at least have to set the coherent_mask to
something sane (probably safe to set it to 64bits for NTB hardware).
However, I'm not sure if, for all arches, DMA ops depend on anything in
the device besides the mask.

Logan

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: dma_alloc_coherent bug in latest ntb_tool changes
  2018-06-07 23:10 dma_alloc_coherent bug in latest ntb_tool changes Logan Gunthorpe
@ 2018-06-07 23:17 ` Dave Jiang
  2018-06-07 23:57   ` Serge Semin
  2018-06-08  0:04   ` Serge Semin
  0 siblings, 2 replies; 8+ messages in thread
From: Dave Jiang @ 2018-06-07 23:17 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-ntb, Jon Mason, Serge Semin, Allen Hubbe



On 06/07/2018 04:10 PM, Logan Gunthorpe wrote:
> Hey,
> 
> I'm just running ntb_test on recent kernels with switchtec hardware and
> found that ntb_tool and ntb_perf no longer work. I get this kernel warning:
> 
> [   64.077406] WARNING: CPU: 3 PID: 2231 at
> ./include/linux/dma-mapping.h:516 tool_mw_trans_write+0x16d/0x332 [ntb_tool]
> 
> Indicating that the dma_coherent mask is not set on the device. (Not
> sure how much testing that patch set got before it was merged upstream...)
> 
> It seems one of the changes that slipped through is that it now
> allocates DMA memory using the NTB struct device instead of the PCI
> struct device that was used before the patchset.
> 
> Seeing nothing in the NTB tree sets the coherent mask on the NTB device
> we get a warning and fail to allocate DMA-able memory.
> 
> I'm not sure the proper fix here: should we be using the NTB device for
> DMA operations or roll back to using the PCI device? It seems cleaner to
> use the NTB device but then we at least have to set the coherent_mask to
> something sane (probably safe to set it to 64bits for NTB hardware).
> However, I'm not sure if, for all arches, DMA ops depend on anything in
> the device besides the mask.
> 
> Logan
> 

I seem to recall having this discussion previously but don't remember
the outcome. I still think we should be dealing with the PCI device when
it comes to hardware stuff like DMA since that is the actual device. The
NTB device is just a an abstract device to make it work with the NTB bus
isn't it?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: dma_alloc_coherent bug in latest ntb_tool changes
  2018-06-07 23:17 ` Dave Jiang
@ 2018-06-07 23:57   ` Serge Semin
  2018-06-08  0:05     ` Logan Gunthorpe
  2018-06-08  0:04   ` Serge Semin
  1 sibling, 1 reply; 8+ messages in thread
From: Serge Semin @ 2018-06-07 23:57 UTC (permalink / raw)
  To: Dave Jiang; +Cc: Logan Gunthorpe, linux-ntb, Jon Mason, Allen Hubbe

On Thu, Jun 07, 2018 at 04:17:02PM -0700, Dave Jiang <dave.jiang@intel.com> wrote:
> 
> 
> On 06/07/2018 04:10 PM, Logan Gunthorpe wrote:
> > Hey,
> > 
> > I'm just running ntb_test on recent kernels with switchtec hardware and
> > found that ntb_tool and ntb_perf no longer work. I get this kernel warning:
> > 
> > [   64.077406] WARNING: CPU: 3 PID: 2231 at
> > ./include/linux/dma-mapping.h:516 tool_mw_trans_write+0x16d/0x332 [ntb_tool]
> > 
> > Indicating that the dma_coherent mask is not set on the device. (Not
> > sure how much testing that patch set got before it was merged upstream...)
> > 
> > It seems one of the changes that slipped through is that it now
> > allocates DMA memory using the NTB struct device instead of the PCI
> > struct device that was used before the patchset.
> > 
> > Seeing nothing in the NTB tree sets the coherent mask on the NTB device
> > we get a warning and fail to allocate DMA-able memory.
> > 
> > I'm not sure the proper fix here: should we be using the NTB device for
> > DMA operations or roll back to using the PCI device? It seems cleaner to
> > use the NTB device but then we at least have to set the coherent_mask to
> > something sane (probably safe to set it to 64bits for NTB hardware).
> > However, I'm not sure if, for all arches, DMA ops depend on anything in
> > the device besides the mask.
> > 
> > Logan
> > 
> 
> I seem to recall having this discussion previously but don't remember
> the outcome. I still think we should be dealing with the PCI device when
> it comes to hardware stuff like DMA since that is the actual device. The
> NTB device is just a an abstract device to make it work with the NTB bus
> isn't it?

Hi

The outcome was to use dma_coerce_mask_and_coherent() to set the proper mask
of the NTB device (copy of PCI-device mask) and use NTB device for DMA allocations.
My patchset concerned IDT, Intel and AMD hw code only in this matter. The change
worked for all of them. It wasn't tested on AMDs though, since well to be honest
we don't even know whether such hardware exists. I can't remember why the Switchtec driver
wasn't altered. It might be, the code hadn't been merged yet before I sent/prepared
the patchset.

Logan, you can see the reference of the function usage in the rest of the hardware
drivers.

-Sergey

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: dma_alloc_coherent bug in latest ntb_tool changes
  2018-06-07 23:17 ` Dave Jiang
  2018-06-07 23:57   ` Serge Semin
@ 2018-06-08  0:04   ` Serge Semin
  1 sibling, 0 replies; 8+ messages in thread
From: Serge Semin @ 2018-06-08  0:04 UTC (permalink / raw)
  To: Dave Jiang; +Cc: Logan Gunthorpe, linux-ntb, Jon Mason, Allen Hubbe

On Thu, Jun 07, 2018 at 04:17:02PM -0700, Dave Jiang <dave.jiang@intel.com> wrote:
> 
> 
> On 06/07/2018 04:10 PM, Logan Gunthorpe wrote:
> > Hey,
> > 
> > I'm just running ntb_test on recent kernels with switchtec hardware and
> > found that ntb_tool and ntb_perf no longer work. I get this kernel warning:
> > 
> > [   64.077406] WARNING: CPU: 3 PID: 2231 at
> > ./include/linux/dma-mapping.h:516 tool_mw_trans_write+0x16d/0x332 [ntb_tool]
> > 
> > Indicating that the dma_coherent mask is not set on the device. (Not
> > sure how much testing that patch set got before it was merged upstream...)
> > 
> > It seems one of the changes that slipped through is that it now
> > allocates DMA memory using the NTB struct device instead of the PCI
> > struct device that was used before the patchset.
> > 
> > Seeing nothing in the NTB tree sets the coherent mask on the NTB device
> > we get a warning and fail to allocate DMA-able memory.
> > 
> > I'm not sure the proper fix here: should we be using the NTB device for
> > DMA operations or roll back to using the PCI device? It seems cleaner to
> > use the NTB device but then we at least have to set the coherent_mask to
> > something sane (probably safe to set it to 64bits for NTB hardware).
> > However, I'm not sure if, for all arches, DMA ops depend on anything in
> > the device besides the mask.
> > 
> > Logan
> > 
> 
> I seem to recall having this discussion previously but don't remember
> the outcome. I still think we should be dealing with the PCI device when
> it comes to hardware stuff like DMA since that is the actual device. The
> NTB device is just a an abstract device to make it work with the NTB bus
> isn't it?

>> The NTB device is just a an abstract device to make it work with the NTB bus isn't it?

It is, and due to being abstraction, the client drivers shouldn't know, what
is hidden behind the scene. It might be PCIe, but may be something else. That's
why we decided to use dma_coerce_mask_and_coherent() and to use NTB device for
DMA allocations. I even had to alter ntb.c file so the ntb_register_device()
wouldn't memset the ntb-structure with zeros.

-Sergey


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: dma_alloc_coherent bug in latest ntb_tool changes
  2018-06-07 23:57   ` Serge Semin
@ 2018-06-08  0:05     ` Logan Gunthorpe
  2018-06-08  0:24       ` Serge Semin
  0 siblings, 1 reply; 8+ messages in thread
From: Logan Gunthorpe @ 2018-06-08  0:05 UTC (permalink / raw)
  To: Serge Semin, Dave Jiang; +Cc: linux-ntb, Jon Mason, Allen Hubbe



On 07/06/18 05:57 PM, Serge Semin wrote:
> The outcome was to use dma_coerce_mask_and_coherent() to set the proper mask
> of the NTB device (copy of PCI-device mask) and use NTB device for DMA allocations.
> My patchset concerned IDT, Intel and AMD hw code only in this matter. The change
> worked for all of them. It wasn't tested on AMDs though, since well to be honest
> we don't even know whether such hardware exists. I can't remember why the Switchtec driver
> wasn't altered. It might be, the code hadn't been merged yet before I sent/prepared
> the patchset.

Well, I kind of agree with Dave and should probably stick with using the
PCI device as we don't know what else all arches need for properly
allocating the DMA memory.

If we want to stick with this solution, I'd suggest we set the coherent
mask in ntb_register given that all the drivers are doing the exact same
thing. Then we won't have to worry about missing one.

> I can't remember why the Switchtec driver
> wasn't altered. It might be, the code hadn't been merged yet before I sent/prepared
> the patchset.

The switchtec driver was in ntb-next well before this patch set
materialized. In fact, the code in kernel v4.15 works fine.

It would have also been nice for this type of change to be in it's own
commit with a full description of the decisions made. Then, when we
bisect to find these issues we don't have to scratch our heads and
wonder what we were thinking.

Logan

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: dma_alloc_coherent bug in latest ntb_tool changes
  2018-06-08  0:05     ` Logan Gunthorpe
@ 2018-06-08  0:24       ` Serge Semin
  2018-06-08  0:33         ` Logan Gunthorpe
  0 siblings, 1 reply; 8+ messages in thread
From: Serge Semin @ 2018-06-08  0:24 UTC (permalink / raw)
  To: Logan Gunthorpe; +Cc: Dave Jiang, linux-ntb, Jon Mason, Allen Hubbe

On Thu, Jun 07, 2018 at 06:05:20PM -0600, Logan Gunthorpe <logang@deltatee.com> wrote:
> 
> 
> On 07/06/18 05:57 PM, Serge Semin wrote:
> > The outcome was to use dma_coerce_mask_and_coherent() to set the proper mask
> > of the NTB device (copy of PCI-device mask) and use NTB device for DMA allocations.
> > My patchset concerned IDT, Intel and AMD hw code only in this matter. The change
> > worked for all of them. It wasn't tested on AMDs though, since well to be honest
> > we don't even know whether such hardware exists. I can't remember why the Switchtec driver
> > wasn't altered. It might be, the code hadn't been merged yet before I sent/prepared
> > the patchset.
> 
> Well, I kind of agree with Dave and should probably stick with using the
> PCI device as we don't know what else all arches need for properly
> allocating the DMA memory.
> 

You also said:
>> It seems cleaner to use the NTB device but then we at least have to set the
>> coherent_mask to something sane ...
It doesn't seem clearer now?

> If we want to stick with this solution, I'd suggest we set the coherent
> mask in ntb_register given that all the drivers are doing the exact same
> thing. Then we won't have to worry about missing one.
> 

Only hardware driver knows what dma address range is supported by device. Since
we have to initialize the coherence_mask anyway, seeing the field exists in the
standard device structure.
Regarding the ntb_register_device(). Since all the currently supported NTB devices
are PCIe-based, we can temporarily move the coherency mask setting to ntb.c. But
NTB API is still an abstraction, so I wouldn't do this, since there might be
non-PCIe-based devices.

> > I can't remember why the Switchtec driver
> > wasn't altered. It might be, the code hadn't been merged yet before I sent/prepared
> > the patchset.
> 
> The switchtec driver was in ntb-next well before this patch set
> materialized. In fact, the code in kernel v4.15 works fine.
> 

Ok. you got a chance to fix it know.

> It would have also been nice for this type of change to be in it's own
> commit with a full description of the decisions made. Then, when we
> bisect to find these issues we don't have to scratch our heads and
> wonder what we were thinking.
> 

You don't have to:
https://github.com/torvalds/linux/commit/417cf39cfea9c680aa7c278c8d8a0ca879cacf0a#diff-4cc577115709fbd1fb24e8803f9d629d

> Logan

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: dma_alloc_coherent bug in latest ntb_tool changes
  2018-06-08  0:24       ` Serge Semin
@ 2018-06-08  0:33         ` Logan Gunthorpe
  2018-06-08  0:54           ` Serge Semin
  0 siblings, 1 reply; 8+ messages in thread
From: Logan Gunthorpe @ 2018-06-08  0:33 UTC (permalink / raw)
  To: Serge Semin; +Cc: Dave Jiang, linux-ntb, Jon Mason, Allen Hubbe



On 07/06/18 06:24 PM, Serge Semin wrote:
> You also said:
>>> It seems cleaner to use the NTB device but then we at least have to set the
>>> coherent_mask to something sane ...
> It doesn't seem clearer now?

It still seems cleaner... But I'm not convinced it's safe. I'm on the
fence really.

> Only hardware driver knows what dma address range is supported by device. 

Well given that all the hardware drivers only copy what's set in the pci
device I don't think that applies.

> Regarding the ntb_register_device(). Since all the currently supported NTB devices
> are PCIe-based, we can temporarily move the coherency mask setting to ntb.c. But
> NTB API is still an abstraction, so I wouldn't do this, since there might be
> non-PCIe-based devices.

NTB is a long way from being non-pci based. Especially seeing it's a PCI
standard. If and when that day comes where someone wants to create some
non-PCI NTB we'd probably have an ntb_register for pci devices and one
for other devices -- but then it's that person's problem. The general
rule of thumb for kernel development is not to write code for an
imagined future but to write good code for now.
> You don't have to:
> https://github.com/torvalds/linux/commit/417cf39cfea9c680aa7c278c8d8a0ca879cacf0a#diff-4cc577115709fbd1fb24e8803f9d629d

That's the commit that fixed an issue. It didn't cause the issue or even
talk about the change that would break it. Nor did it come up when a
bisect was done. That's why changes should be done in small chunks. It
would be different if there was a commit that made the change from pdevs
to ntb devs right after this commit explaining the consequences and that
it was fixed in the previous commit.

Logan


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: dma_alloc_coherent bug in latest ntb_tool changes
  2018-06-08  0:33         ` Logan Gunthorpe
@ 2018-06-08  0:54           ` Serge Semin
  0 siblings, 0 replies; 8+ messages in thread
From: Serge Semin @ 2018-06-08  0:54 UTC (permalink / raw)
  To: Logan Gunthorpe; +Cc: Dave Jiang, linux-ntb, Jon Mason, Allen Hubbe

On Thu, Jun 07, 2018 at 06:33:48PM -0600, Logan Gunthorpe <logang@deltatee.com> wrote:
> 
> 
> On 07/06/18 06:24 PM, Serge Semin wrote:
> > You also said:
> >>> It seems cleaner to use the NTB device but then we at least have to set the
> >>> coherent_mask to something sane ...
> > It doesn't seem clearer now?
> 
> It still seems cleaner... But I'm not convinced it's safe. I'm on the
> fence really.
> 
> > Only hardware driver knows what dma address range is supported by device. 
> 
> Well given that all the hardware drivers only copy what's set in the pci
> device I don't think that applies.
> 
> > Regarding the ntb_register_device(). Since all the currently supported NTB devices
> > are PCIe-based, we can temporarily move the coherency mask setting to ntb.c. But
> > NTB API is still an abstraction, so I wouldn't do this, since there might be
> > non-PCIe-based devices.
> 
> NTB is a long way from being non-pci based. Especially seeing it's a PCI
> standard. If and when that day comes where someone wants to create some
> non-PCI NTB we'd probably have an ntb_register for pci devices and one
> for other devices -- but then it's that person's problem. The general
> rule of thumb for kernel development is not to write code for an
> imagined future but to write good code for now.
> > You don't have to:
> > https://github.com/torvalds/linux/commit/417cf39cfea9c680aa7c278c8d8a0ca879cacf0a#diff-4cc577115709fbd1fb24e8803f9d629d
> 
> That's the commit that fixed an issue. It didn't cause the issue or even
> talk about the change that would break it. Nor did it come up when a
> bisect was done. That's why changes should be done in small chunks. It
> would be different if there was a commit that made the change from pdevs
> to ntb devs right after this commit explaining the consequences and that
> it was fixed in the previous commit.
> 
> Logan
> 

You saw the commit, lets halt the lecture about thumbs and chunks. Topic is about
the DMA mask setting. I still think, that it's more logical to place the
masks initialization to the hardware drivers code, since it's the hardware-specific
setting. I am also convinced, that we should proper set NTB device masks, so they
would correspond to the real NTB devices hidden behind the API. That's it.

-Sergey

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2018-06-08  0:54 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-07 23:10 dma_alloc_coherent bug in latest ntb_tool changes Logan Gunthorpe
2018-06-07 23:17 ` Dave Jiang
2018-06-07 23:57   ` Serge Semin
2018-06-08  0:05     ` Logan Gunthorpe
2018-06-08  0:24       ` Serge Semin
2018-06-08  0:33         ` Logan Gunthorpe
2018-06-08  0:54           ` Serge Semin
2018-06-08  0:04   ` Serge Semin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.