* [PATCH] TC: Set DMA masks for devices
@ 2018-10-03 12:21 Maciej W. Rozycki
  2018-10-04 16:57 ` Fredrik Noring
  0 siblings, 1 reply; 9+ messages in thread
From: Maciej W. Rozycki @ 2018-10-03 12:21 UTC (permalink / raw)
  To: Ralf Baechle; +Cc: linux-mips, linux-kernel

Fix a TURBOchannel support regression with commit 205e1b7f51e4 
("dma-mapping: warn when there is no coherent_dma_mask") that caused 
coherent DMA allocations to produce a warning such as:

defxx: v1.11 2014/07/01  Lawrence V. Stefani and others
tc1: DEFTA at MMIO addr = 0x1e900000, IRQ = 20, Hardware addr = 08-00-2b-a3-a3-29
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1 at ./include/linux/dma-mapping.h:516 dfx_dev_register+0x670/0x678
Modules linked in:
CPU: 0 PID: 1 Comm: swapper Not tainted 4.19.0-rc6 #2
Stack : ffffffff8009ffc0 fffffffffffffec0 0000000000000000 ffffffff80647650
        0000000000000000 0000000000000000 ffffffff806f5f80 ffffffffffffffff
        0000000000000000 0000000000000000 0000000000000001 ffffffff8065d4e8
        98000000031b6300 ffffffff80563478 ffffffff805685b0 ffffffffffffffff
        0000000000000000 ffffffff805d6720 0000000000000204 ffffffff80388df8
        0000000000000000 0000000000000009 ffffffff8053efd0 ffffffff806657d0
        0000000000000000 ffffffff803177f8 0000000000000000 ffffffff806d0000
        9800000003078000 980000000307b9e0 000000001e900000 ffffffff80067940
        0000000000000000 ffffffff805d6720 0000000000000204 ffffffff80388df8
        ffffffff805176c0 ffffffff8004dc78 0000000000000000 ffffffff80067940
        ...
Call Trace:
[<ffffffff8004dc78>] show_stack+0xa0/0x130
[<ffffffff80067940>] __warn+0x128/0x170
---[ end trace b1d1e094f67f3bb2 ]---

This is because the TURBOchannel bus driver fails to set the coherent 
DMA mask for devices enumerated.

Set the regular and coherent DMA masks for TURBOchannel devices then, 
observing that the bus protocol supports a 34-bit (16GiB) DMA address 
space, by interpreting the value presented in the address cycle across 
the 32 `ad' lines as a 32-bit word rather than byte address[1].  The 
architectural size of the TURBOchannel DMA address space exceeds the 
maximum amount of RAM any actual TURBOchannel system in existence may 
have, hence both masks are the same.

This removes the warning shown above.

References:

[1] "TURBOchannel Hardware Specification", EK-369AA-OD-007B, Digital 
    Equipment Corporation, January 1993, Section "DMA", pp. 1-15 -- 1-17

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Fixes: 205e1b7f51e4 ("dma-mapping: warn when there is no coherent_dma_mask")
Cc: stable@vger.kernel.org # 4.16+
---
 drivers/tc/tc.c    |    8 +++++++-
 include/linux/tc.h |    1 +
 2 files changed, 8 insertions(+), 1 deletion(-)

linux-tc-dma-mask.patch
Index: linux-20180930-4maxp64/drivers/tc/tc.c
===================================================================
--- linux-20180930-4maxp64.orig/drivers/tc/tc.c
+++ linux-20180930-4maxp64/drivers/tc/tc.c
@@ -2,7 +2,7 @@
  *	TURBOchannel bus services.
  *
  *	Copyright (c) Harald Koerfgen, 1998
- *	Copyright (c) 2001, 2003, 2005, 2006  Maciej W. Rozycki
+ *	Copyright (c) 2001, 2003, 2005, 2006, 2018  Maciej W. Rozycki
  *	Copyright (c) 2005  James Simmons
  *
  *	This file is subject to the terms and conditions of the GNU
@@ -10,6 +10,7 @@
  *	directory of this archive for more details.
  */
 #include <linux/compiler.h>
+#include <linux/dma-mapping.h>
 #include <linux/errno.h>
 #include <linux/init.h>
 #include <linux/ioport.h>
@@ -92,6 +93,11 @@ static void __init tc_bus_add_devices(st
 		tdev->dev.bus = &tc_bus_type;
 		tdev->slot = slot;
 
+		/* TURBOchannel has 34-bit DMA addressing (16GiB space). */
+		tdev->dma_mask = DMA_BIT_MASK(34);
+		tdev->dev.dma_mask = &tdev->dma_mask;
+		tdev->dev.coherent_dma_mask = DMA_BIT_MASK(34);
+
 		for (i = 0; i < 8; i++) {
 			tdev->firmware[i] =
 				readb(module + offset + TC_FIRM_VER + 4 * i);
Index: linux-20180930-4maxp64/include/linux/tc.h
===================================================================
--- linux-20180930-4maxp64.orig/include/linux/tc.h
+++ linux-20180930-4maxp64/include/linux/tc.h
@@ -84,6 +84,7 @@ struct tc_dev {
 					   device. */
 	struct device	dev;		/* Generic device interface. */
 	struct resource	resource;	/* Address space of this device. */
+	u64		dma_mask;	/* DMA addressable range. */
 	char		vendor[9];
 	char		name[9];
 	char		firmware[9];


* Re: [PATCH] TC: Set DMA masks for devices
  2018-10-03 12:21 [PATCH] TC: Set DMA masks for devices Maciej W. Rozycki
@ 2018-10-04 16:57 ` Fredrik Noring
  2018-10-04 17:55   ` Fredrik Noring
  2018-10-04 20:09   ` Maciej W. Rozycki
  0 siblings, 2 replies; 9+ messages in thread
From: Fredrik Noring @ 2018-10-04 16:57 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: Ralf Baechle, linux-mips, Jürgen Urban

Hi Maciej,

> Fix a TURBOchannel support regression with commit 205e1b7f51e4 
> ("dma-mapping: warn when there is no coherent_dma_mask") that caused 
> coherent DMA allocations to produce a warning such as:
> 
> defxx: v1.11 2014/07/01  Lawrence V. Stefani and others
> tc1: DEFTA at MMIO addr = 0x1e900000, IRQ = 20, Hardware addr = 08-00-2b-a3-a3-29
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 1 at ./include/linux/dma-mapping.h:516 dfx_dev_register+0x670/0x678
> Modules linked in:
> CPU: 0 PID: 1 Comm: swapper Not tainted 4.19.0-rc6 #2
> Stack : ffffffff8009ffc0 fffffffffffffec0 0000000000000000 ffffffff80647650
>         0000000000000000 0000000000000000 ffffffff806f5f80 ffffffffffffffff
>         0000000000000000 0000000000000000 0000000000000001 ffffffff8065d4e8
>         98000000031b6300 ffffffff80563478 ffffffff805685b0 ffffffffffffffff
>         0000000000000000 ffffffff805d6720 0000000000000204 ffffffff80388df8
>         0000000000000000 0000000000000009 ffffffff8053efd0 ffffffff806657d0
>         0000000000000000 ffffffff803177f8 0000000000000000 ffffffff806d0000
>         9800000003078000 980000000307b9e0 000000001e900000 ffffffff80067940
>         0000000000000000 ffffffff805d6720 0000000000000204 ffffffff80388df8
>         ffffffff805176c0 ffffffff8004dc78 0000000000000000 ffffffff80067940
>         ...
> Call Trace:
> [<ffffffff8004dc78>] show_stack+0xa0/0x130
> [<ffffffff80067940>] __warn+0x128/0x170
> ---[ end trace b1d1e094f67f3bb2 ]---
> 
> This is because the TURBOchannel bus driver fails to set the coherent 
> DMA mask for devices enumerated.

Interesting! This warning is also triggered by the PS2 OHCI driver. Robin
Murphy proposed the patch

https://lkml.org/lkml/2018/7/3/507

that relaxed it and a related warning. Half of the patch was merged in
commit d27fb99f62af7 while the other half (related to this warning) was
rejected by Christoph Hellwig. The PS2 OHCI triggers the following trace:

------------[ cut here ]------------
WARNING: CPU: 0 PID: 62 at ./include/linux/dma-mapping.h:516 ohci_setup+0x41c/0x424 [ohci_hcd]
Modules linked in: ohci_ps2(+) ohci_hcd usbcore usb_common sd_mod iop iop_fio iop_module iop_memory sif
CPU: 0 PID: 62 Comm: modprobe Not tainted 4.16.0+ #1533
Stack : 00000000 00000000 80747392 00000037 81c6eb0c 804f32e7 80493b24 0000003e
        80743498 00000204 00000001 c01c0000 802a2fa0 10058c00 81ea5a68 804facc0
        00000000 00000000 80740000 00000007 00000000 00000060 00000000 00000000
        3a6d6d6f 00000000 0000005f 646f6d20 80000000 00000000 c01e66e8 c01e813c
        00000009 00000204 00000001 c01c0000 00000018 80278fe0 0007579f 00000001
        ...
Call Trace:
[<8001d6e4>] show_stack+0x74/0x104
[<800323a8>] __warn+0x118/0x120
[<8003246c>] warn_slowpath_null+0x44/0x58
[<c01e66e8>] ohci_setup+0x41c/0x424 [ohci_hcd]
[<c01f209c>] ohci_ps2_reset+0x30/0x70 [ohci_ps2]
[<c01a8aec>] usb_add_hcd+0x2d4/0x89c [usbcore]
[<c01f2360>] ohci_hcd_ps2_probe+0x284/0x2a4 [ohci_ps2]
[<802a8a74>] platform_drv_probe+0x2c/0x68
[<802a70b4>] driver_probe_device+0x22c/0x2e4
[<802a71f0>] __driver_attach+0x84/0xc8
[<802a53fc>] bus_for_each_dev+0x60/0x90
[<802a6580>] bus_add_driver+0x1b8/0x200
[<802a7980>] driver_register+0xc0/0x100
[<800106bc>] do_one_initcall+0x17c/0x190
[<800841f4>] do_init_module+0x74/0x1f0
[<80082f30>] load_module+0x1680/0x2044
[<80083adc>] SyS_finit_module+0xa0/0xb8
[<8002190c>] syscall_common+0x34/0x58
---[ end trace e71738b5fa6bf9aa ]---

> Set the regular and coherent DMA masks for TURBOchannel devices then, 
> observing that the bus protocol supports a 34-bit (16GiB) DMA address 
> space, by interpreting the value presented in the address cycle across 
> the 32 `ad' lines as a 32-bit word rather than byte address[1].  The 
> architectural size of the TURBOchannel DMA address space exceeds the 
> maximum amount of RAM any actual TURBOchannel system in existence may 
> have, hence both masks are the same.

A complication with the PS2 OHCI is that DMA addresses 0-0x200000 map to
0x1c000000-0x1c200000 as seen by the kernel. Robin suggested that the mask
might correspond to the effective addressing capability, which would be
DMA_BIT_MASK(21), but it does not seem to be entirely clear, since his
commit message said that

    A somewhat similar line of reasoning also applies at the other end for
    the mask check in dma_alloc_attrs() too - indeed, a device which cannot
    access anything other than its own local memory probably *shouldn't*
    have a valid mask for the general coherent DMA API.

A special circumstance here is the use of HCD_LOCAL_MEM that is a kind of
DMA bounce buffer. Are you using anything similar with your DEFTA driver?

Fredrik


* Re: [PATCH] TC: Set DMA masks for devices
  2018-10-04 16:57 ` Fredrik Noring
@ 2018-10-04 17:55   ` Fredrik Noring
  2018-10-04 20:09   ` Maciej W. Rozycki
  1 sibling, 0 replies; 9+ messages in thread
From: Fredrik Noring @ 2018-10-04 17:55 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: Ralf Baechle, linux-mips, Jürgen Urban

Hi Maciej,

> > Set the regular and coherent DMA masks for TURBOchannel devices then, 
> > observing that the bus protocol supports a 34-bit (16GiB) DMA address 
> > space, by interpreting the value presented in the address cycle across 
> > the 32 `ad' lines as a 32-bit word rather than byte address[1].  The 
> > architectural size of the TURBOchannel DMA address space exceeds the 
> > maximum amount of RAM any actual TURBOchannel system in existence may 
> > have, hence both masks are the same.
> 
> A complication with the PS2 OHCI is that DMA addresses 0-0x200000 map to
> 0x1c000000-0x1c200000 as seen by the kernel. Robin suggested that the mask
> might correspond to the effective addressing capability, which would be
> DMA_BIT_MASK(21), but it does not seem to be entirely clear, since his
> commit message said that
> 
>     A somewhat similar line of reasoning also applies at the other end for
>     the mask check in dma_alloc_attrs() too - indeed, a device which cannot
>     access anything other than its own local memory probably *shouldn't*
>     have a valid mask for the general coherent DMA API.
> 
> A special circumstance here is the use of HCD_LOCAL_MEM that is a kind of
> DMA bounce buffer. Are you using anything similar with your DEFTA driver?

Sorry, I didn't interpret your comment properly. With TURBOchannel DMA
address space exceeding any practical amount of RAM, bounce buffers aren't
needed for that system. The situation is the reverse with the PS2 OHCI.

Fredrik


* Re: [PATCH] TC: Set DMA masks for devices
  2018-10-04 16:57 ` Fredrik Noring
  2018-10-04 17:55   ` Fredrik Noring
@ 2018-10-04 20:09   ` Maciej W. Rozycki
  2018-10-05 14:56     ` Fredrik Noring
  1 sibling, 1 reply; 9+ messages in thread
From: Maciej W. Rozycki @ 2018-10-04 20:09 UTC (permalink / raw)
  To: Fredrik Noring; +Cc: Ralf Baechle, linux-mips, Jürgen Urban

Hi Fredrik,

> A complication with the PS2 OHCI is that DMA addresses 0-0x200000 map to
> 0x1c000000-0x1c200000 as seen by the kernel. Robin suggested that the mask
> might correspond to the effective addressing capability, which would be
> DMA_BIT_MASK(21),

 I take it you mean 0-0x1fffff obviously; let's be accurate in a technical 
discussion and avoid ambiguous cases.

 Well, the need to map between the CPU and the DMA address space is not 
uncommon.  As I recall the Galileo/Marvell GT-64xxx system controllers 
have a BAR for PCI master accesses to local DRAM (so that multiple such 
controllers can coexist in a NUMA system) and any non-identity mapping has 
to be taken into account with DMA of course.

 And indeed e.g. `dma_map_single' does handle that and given a CPU-side 
physical memory address returns a corresponding DMA-side address.  And the 
DMA mask has to reflect that and describe the DMA side, as it's the device 
side that has an address space limitation here and any offset resulting 
from a non-identity mapping does not change that limitation, although the 
offset does have of course to be taken into account by `dma_map_single', 
etc. in determining whether the memory area requested for use by a DMA 
device can be used directly or whether a bounce buffer will be required 
for that mapping.
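
The point above, that the mask describes the DMA side while the offset only 
enters when a CPU-side address is translated, can be sketched as a small 
userspace model.  This is not kernel code: `struct model_bus', 
`model_phys_to_dma' and `model_dma_capable' are hypothetical names made up 
for illustration, and only the DMA_BIT_MASK definition mirrors the kernel 
macro.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same definition as the kernel macro in <linux/dma-mapping.h>. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical model: DMA address = CPU physical address - cpu_base. */
struct model_bus {
	uint64_t cpu_base;	/* CPU-side start of the DMA-visible window */
	uint64_t dma_mask;	/* device-side addressing limit */
};

/* Translate a CPU physical address to the address the device uses. */
static inline uint64_t model_phys_to_dma(const struct model_bus *bus,
					 uint64_t phys)
{
	assert(phys >= bus->cpu_base);
	return phys - bus->cpu_base;
}

/*
 * True if the device can reach the whole buffer directly.  False means
 * a bounce buffer would be needed in this model.  The mask is compared
 * against the DMA-side address, i.e. after the offset has been applied.
 */
static inline bool model_dma_capable(const struct model_bus *bus,
				     uint64_t phys, size_t size)
{
	uint64_t dma;

	if (size == 0 || phys < bus->cpu_base)
		return false;
	dma = phys - bus->cpu_base;
	return dma + size - 1 <= bus->dma_mask;
}
```

With cpu_base set to 0x1c000000 and a mask of DMA_BIT_MASK(21), a 4-byte 
buffer at CPU address 0x1c1ffffc is reachable, while one at 0x1c200000 is 
not.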

> but it does not seem to be entirely clear, since his
> commit message said that
> 
>     A somewhat similar line of reasoning also applies at the other end for
>     the mask check in dma_alloc_attrs() too - indeed, a device which cannot
>     access anything other than its own local memory probably *shouldn't*
>     have a valid mask for the general coherent DMA API.

 Well, how can such a device use the DMA API in the first place?  If the 
device has local memory, then the driver has to manage it itself somehow 
if needed, and then arrange copying it to main memory, either by a CPU or 
a third-party DMA controller (data mover) if available.  Of course in the 
latter case a driver for the DMA controller may have to use the DMA API.

 I'll be resubmitting a driver for such a device shortly, the DEFZA (the 
previous submission can be found here: 
<https://marc.info/?l=linux-netdev&m=139841853827404>).  It is interesting 
in that the FDDI engine supports host DMA on the reception side (and 
consequently the driver uses the DMA API to handle that), while on the 
transmission side (as well as with a couple of maintenance queues) it only 
does DMA with its onboard buffer memory, the contents of which need to be 
copied by the CPU.  So there's no use of the DMA API on the transmission 
or maintenance side.  However usual DMA rings (all located in board memory 
too) are used for all data moves.

 The DEFTA is a follow-up and an upgrade to the DEFZA, more integrated 
(the DEFZA uses a pair of PCBs while the DEFTA fits on one, of the size of 
each in the former pair), and with the extra silicon space gained it was 
possible to squeeze in circuitry required to do host DMA for all data 
moves, and also the DMA rings.

> A special circumstance here is the use of HCD_LOCAL_MEM that is a kind of
> DMA bounce buffer. Are you using anything similar with your DEFTA driver?

 The driver does need either an IOMMU or bounce buffers in system RAM in 
the case of 64-bit PCI systems, as the PFI PCI ASIC that the FDDI PDQ ASIC 
interfaces on the DEFPA does not AFAIK support 64-bit addressing (be it 
directly or with the use of DAC), although the PDQ itself does support 
48-bit addressing (i.e. DMA descriptor addresses hold bits 47:2 of host 
addresses), which would be sufficient for the usual cases.

 Not in the DEFTA (or for that matter DEFEA; possibly the only EISA device 
using the DMA API) case though, as the most equipped TURBOchannel systems, 
i.e. the DEC 3000 AXP models 500, 800 and 900 only support up to 1GiB of 
memory, which is well below the 34-bit addressing limit.
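
The arithmetic behind the 34-bit figure can be checked with a short 
userspace sketch.  The helper names `tc_word_to_byte_addr' and 
`addr_bits_for' are hypothetical and exist only for this illustration; the 
DMA_BIT_MASK definition mirrors the kernel macro.

```c
#include <assert.h>
#include <stdint.h>

/* Same definition as the kernel macro in <linux/dma-mapping.h>. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/*
 * The 32 `ad' lines carry a 32-bit word address, so the byte address
 * is that value shifted left by two, giving 34 bits in total.
 */
static inline uint64_t tc_word_to_byte_addr(uint32_t word_addr)
{
	return (uint64_t)word_addr << 2;
}

/* Number of address bits needed to reach the last byte of `bytes'. */
static inline unsigned int addr_bits_for(uint64_t bytes)
{
	unsigned int bits = 0;

	assert(bytes > 0);
	for (uint64_t last = bytes - 1; last != 0; last >>= 1)
		bits++;
	return bits;
}
```

The highest word address, 0xffffffff, maps to byte address 0x3fffffffc, 
whose last byte is DMA_BIT_MASK(34), while 1GiB of RAM needs only 30 
address bits.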

 The PDQ ASIC was used to interface FDDI to many host buses and in 
addition to the 3 bus attachments mentioned above, all of which we have 
support for in Linux, it was also used for Q-bus (the DEFQA) and FutureBus 
(the DEFAA).  We may have support for the DEFQA one day as I have both 
such a board and a suitable system to use it with.  We are unlikely to 
have support for the DEFAA, as FutureBus was only used in high-end VAX and 
Alpha systems, the size of a full 19" rack at the very least, but it is 
there I believe only that the full PDQ addressing capability was actually 
utilised.

 NB I sat on this fix from 2014, well before the warning was introduced in 
the first place, and it's only now that I got to unloading my patch queue. 
:(

  Maciej


* Re: [PATCH] TC: Set DMA masks for devices
  2018-10-04 20:09   ` Maciej W. Rozycki
@ 2018-10-05 14:56     ` Fredrik Noring
  2018-10-05 22:52       ` Maciej W. Rozycki
  0 siblings, 1 reply; 9+ messages in thread
From: Fredrik Noring @ 2018-10-05 14:56 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: Ralf Baechle, linux-mips, Jürgen Urban

Hi Maciej,

> > A complication with the PS2 OHCI is that DMA addresses 0-0x200000 map to
> > 0x1c000000-0x1c200000 as seen by the kernel. Robin suggested that the mask
> > might correspond to the effective addressing capability, which would be
> > DMA_BIT_MASK(21),
> 
>  I take it you mean 0-0x1fffff obviously; let's be accurate in a technical 
> discussion and avoid ambiguous cases.

That's interesting. :) 0x1fffff is not a valid DMA address due to alignment
restrictions, so if one wants to indicate a closed [inclusive] DMA address
interval it would be 0-0x1ffffc, since the 32-bit word rather than the byte
is the unit of the IOP DMA. In mathematics and programming languages it is
often convenient to work with half-open intervals denoted by "[0,0x200000)"
in this case. I think both notations are technically accurate, but they do
emphasize different aspects of addresses and memory. I can switch to your
byte-centric notation if that helps. :)
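
The three ways of writing the same range can be put side by side in a small 
sketch.  The constants and helper names below are hypothetical and chosen 
only for illustration; the 2MiB size and the 32-bit transfer unit are taken 
from the discussion above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IOP_DMA_SIZE 0x200000u	/* exclusive upper bound of the range */
#define IOP_DMA_UNIT 4u		/* the IOP DMA unit is a 32-bit word */

/* Last byte of the range: closed interval, byte-centric view. */
static inline uint32_t last_byte(uint32_t size)
{
	assert(size > 0);
	return size - 1;
}

/* Last valid transfer address: closed interval, unit-centric view. */
static inline uint32_t last_unit(uint32_t size, uint32_t unit)
{
	assert(unit > 0 && size >= unit && size % unit == 0);
	return size - unit;
}

/* Membership test for the half-open interval [0, size). */
static inline bool in_half_open(uint32_t addr, uint32_t size)
{
	return addr < size;
}
```

All three describe the same memory: last_byte() yields 0x1fffff, last_unit() 
yields 0x1ffffc, and in_half_open() accepts exactly the byte addresses below 
0x200000.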

>  Well, the need to map between the CPU and the DMA address space is not 
> uncommon.  As I recall the Galileo/Marvell GT-64xxx system controllers 
> have a BAR for PCI master accesses to local DRAM (so that multiple such 
> controllers can coexist in a NUMA system) and any non-identity mapping has 
> to be taken into account with DMA of course.
> 
>  And indeed e.g. `dma_map_single' does handle that and given a CPU-side 
> physical memory address returns a corresponding DMA-side address.  And the 
> DMA mask has to reflect that and describe the DMA side, as it's the device 
> side that has an address space limitation here and any offset resulting 
> from a non-identity mapping does not change that limitation, although the 
> offset does have of course to be taken into account by `dma_map_single', 
> etc. in determining whether the memory area requested for use by a DMA 
> device can be used directly or whether a bounce buffer will be required 
> for that mapping.

Ah... memory that is known to be DMA compatible is allocated separately,
and then handed over to the DMA subsystem using dma_declare_coherent_memory.
This is done once during driver initialisation. The drivers ohci-sm501.c and
ohci-tmio.c do that too, which is why I suspect they might be broken as well.

The SM501 driver has this explanation:

	/* The sm501 chip is equipped with local memory that may be used
	 * by on-chip devices such as the video controller and the usb host.
	 * This driver uses dma_declare_coherent_memory() to make sure
	 * usb allocations with dma_alloc_coherent() allocate from
	 * this local memory. The dma_handle returned by dma_alloc_coherent()
	 * will be an offset starting from 0 for the first local memory byte.
	 *
	 * So as long as data is allocated using dma_alloc_coherent() all is
	 * fine. This is however not always the case - buffers may be allocated
	 * using kmalloc() - so the usb core needs to be told that it must copy
	 * data into our local memory if the buffers happen to be placed in
	 * regular memory. The HCD_LOCAL_MEM flag does just that.
	 */

	retval = dma_declare_coherent_memory(dev, mem->start,
					 mem->start - mem->parent->start,
					 resource_size(mem),
					 DMA_MEMORY_EXCLUSIVE);

The corresponding code in the PS2 OHCI driver does

	ps2priv->iop_dma_addr = iop_alloc(size);
	if (ps2priv->iop_dma_addr == 0) {
		dev_err(dev, "iop_alloc failed\n");
		return -ENOMEM;
	}

	if (dma_declare_coherent_memory(dev,
			iop_bus_to_phys(ps2priv->iop_dma_addr),
			ps2priv->iop_dma_addr, size, flags)) {
		dev_err(dev, "dma_declare_coherent_memory failed\n");
		iop_free(ps2priv->iop_dma_addr);
		ps2priv->iop_dma_addr = 0;
		return -ENOMEM;
	}

where iop_alloc is a special IOP memory allocation function and its return
value stored in iop_dma_addr is handed over to dma_declare_coherent_memory.
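
To illustrate how a declared region pairs a CPU-side address with a 
device-side address, here is a minimal userspace model.  It is only a 
sketch under simplifying assumptions: `struct model_coherent_region' and 
`model_alloc' are hypothetical names, the allocator is a plain bump 
allocator with no freeing, and the kernel's real implementation is more 
elaborate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a region of device-local memory. */
struct model_coherent_region {
	uint64_t phys_base;	/* CPU-side physical address of the region */
	uint64_t dev_base;	/* the same memory as seen by the device */
	size_t size;		/* total size of the region in bytes */
	size_t used;		/* bytes handed out so far */
};

/*
 * Hand out `len' bytes from the region.  On success return 0 and store
 * the CPU-side address in *phys and the device-side handle in *dma.
 * Return -1 once the fixed-size pool is exhausted; nothing is ever
 * taken from ordinary system memory in this model.
 */
static inline int model_alloc(struct model_coherent_region *r, size_t len,
			      uint64_t *phys, uint64_t *dma)
{
	assert(r && phys && dma);
	if (len == 0 || len > r->size - r->used)
		return -1;
	*phys = r->phys_base + r->used;
	*dma = r->dev_base + r->used;
	r->used += len;
	return 0;
}
```

With dev_base set to 0, the handles returned are offsets from the first byte 
of local memory, which matches what the SM501 comment describes.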

> > but it does not seem to be entirely clear, since his
> > commit message said that
> > 
> >     A somewhat similar line of reasoning also applies at the other end for
> >     the mask check in dma_alloc_attrs() too - indeed, a device which cannot
> >     access anything other than its own local memory probably *shouldn't*
> >     have a valid mask for the general coherent DMA API.
> 
>  Well, how can such a device use the DMA API in the first place?  If the 
> device has local memory, then the driver has to manage it itself somehow 
> if needed, and then arrange copying it to main memory, either by a CPU or 
> a third-party DMA controller (data mover) if available.  Of course in the 
> latter case a driver for the DMA controller may have to use the DMA API.

The coherently declared memory given to the DMA subsystem is used for a
fixed sized DMA pool and no additional allocations are permitted. One could
choose a DMA mask that pretends to be reasonable, or the opposite, a mask
such as 1 that is unreasonable on purpose, as Robin writes:

	Alternatively, there is perhaps some degree of argument for
	deliberately picking a nonzero but useless value like 1,
	although it looks like the MIPS allocator (at least the dma-
	default one) never actually checks whether the page it gets
	is within range of the device's coherent mask, which it
	probably should do.

	https://lkml.org/lkml/2018/7/6/697

>  I'll be resubmitting a driver for such a device shortly, the DEFZA (the 
> previous submission can be found here: 
> <https://marc.info/?l=linux-netdev&m=139841853827404>).  It is interesting 
> in that the FDDI engine supports host DMA on the reception side (and 
> consequently the driver uses the DMA API to handle that), while on the 
> transmission side (as well as with a couple of maintenance queues) it only 
> does DMA with its onboard buffer memory, the contents of which need to be 
> copied by the CPU.  So there's no use of the DMA API on the transmission 
> or maintenance side.  However usual DMA rings (all located in board memory 
> too) are used for all data moves.

The DMA for its onboard buffer memory appears to be very similar to the
IOP and its DMA? That memory is currently copied by the EE, but there are
other DMA controllers that could handle that, possibly synchronised using
DMA chaining, which would assist the EE significantly.

Apart from USB, the IOP does networking, FireWire, harddisks, etc. Some
or all of the peripherals could be accelerated with DMA, which is an
interesting challenge.

>  The DEFTA is a follow-up and an upgrade to the DEFZA, more integrated 
> (the DEFZA uses a pair of PCBs while the DEFTA fits on one, of the size of 
> each in the former pair), and with the extra silicon space gained it was 
> possible to squeeze in circuitry required to do host DMA for all data 
> moves, and also the DMA rings.

Nice. :)

> > A special circumstance here is the use of HCD_LOCAL_MEM that is a kind of
> > DMA bounce buffer. Are you using anything similar with your DEFTA driver?
> 
>  The driver does need either an IOMMU or bounce buffers in system RAM in 
> the case of 64-bit PCI systems, as the PFI PCI ASIC that the FDDI PDQ ASIC 
> interfaces on the DEFPA does not AFAIK support 64-bit addressing (be it 
> directly or with the use of DAC), although the PDQ itself does support 
> 48-bit addressing (i.e. DMA descriptor addresses hold bits 47:2 of host 
> addresses), which would be sufficient for the usual cases.
> 
>  Not in the DEFTA (or for that matter DEFEA; possibly the only EISA device 
> using the DMA API) case though, as the most equipped TURBOchannel systems, 
> i.e. the DEC 3000 AXP models 500, 800 and 900 only support up to 1GiB of 
> memory, which is well below the 34-bit addressing limit.
> 
>  The PDQ ASIC was used to interface FDDI to many host buses and in 
> addition to the 3 bus attachments mentioned above, all of which we have 
> support for in Linux, it was also used for Q-bus (the DEFQA) and FutureBus 
> (the DEFAA).  We may have support for the DEFQA one day as I have both 
> such a board and a suitable system to use it with.  We are unlikely to 
> have support for the DEFAA, as FutureBus was only used in high-end VAX and 
> Alpha systems, the size of a full 19" rack at the very least, but it is 
> there I believe only that the full PDQ addressing capability was actually 
> utilised.

Thanks! By the way, is it possible to find spare parts for such vintage
hardware these days in case of irreparable failures?

>  NB I sat on this fix from 2014, well before the warning was introduced in 
> the first place, and it's only now that I got to unloading my patch queue. 
> :(

Do you have the latest kernel running on your DECstation machines now? :)

Fredrik


* Re: [PATCH] TC: Set DMA masks for devices
  2018-10-05 14:56     ` Fredrik Noring
@ 2018-10-05 22:52       ` Maciej W. Rozycki
  2018-10-06  9:21         ` Fredrik Noring
  0 siblings, 1 reply; 9+ messages in thread
From: Maciej W. Rozycki @ 2018-10-05 22:52 UTC (permalink / raw)
  To: Fredrik Noring; +Cc: Ralf Baechle, linux-mips, Jürgen Urban

Hi Fredrik,

> >  I take it you mean 0-0x1fffff obviously; let's be accurate in a technical 
> > discussion and avoid ambiguous cases.
> 
> That's interesting. :) 0x1fffff is not a valid DMA address due to alignment
> restrictions, so if one wants to indicate a closed [inclusive] DMA address
> interval it would be 0-0x1ffffc, since the 32-bit word rather than the byte
> is the unit of the IOP DMA. In mathematics and programming languages it is
> often convenient to work with half-open intervals denoted by "[0,0x200000)"
> in this case. I think both notations are technically accurate, but they do
> emphasize different aspects of addresses and memory. I can switch to your
> byte-centric notation if that helps. :)

 Well, the byte at 0x1fffff may not be individually addressable by this 
particular DMA engine, but surely it is there included in DMA transfers 
accessing the location that spans it.  If instead you prefer to use the 
mathematical notation to specify inclusive/exclusive ranges, then of 
course I'm fine with that too.

> >  And indeed e.g. `dma_map_single' does handle that and given a CPU-side 
> > physical memory address returns a corresponding DMA-side address.  And the 
> > DMA mask has to reflect that and describe the DMA side, as it's the device 
> > side that has an address space limitation here and any offset resulting 
> > from a non-identity mapping does not change that limitation, although the 
> > offset does have of course to be taken into account by `dma_map_single', 
> > etc. in determining whether the memory area requested for use by a DMA 
> > device can be used directly or whether a bounce buffer will be required 
> > for that mapping.
> 
> Ah... memory that is known to be DMA compatible is allocated separately,
> and then handed over to the DMA subsystem using dma_declare_coherent_memory.

 Well, that does specify both a CPU-side and a corresponding DMA-side 
address too.

> This is done once during driver initialisation. The drivers ohci-sm501.c and
> ohci-tmio.c do that too, which is why I suspect they might be broken as well.
> 
> The SM501 driver has this explanation:
> 
> 	/* The sm501 chip is equipped with local memory that may be used
> 	 * by on-chip devices such as the video controller and the usb host.
> 	 * This driver uses dma_declare_coherent_memory() to make sure
> 	 * usb allocations with dma_alloc_coherent() allocate from
> 	 * this local memory. The dma_handle returned by dma_alloc_coherent()
> 	 * will be an offset starting from 0 for the first local memory byte.

 From the description I take it that this is MMIO memory rather than host 
memory.  I fail to see how these calls are supposed to work for non-system 
memory, which any MMIO memory certainly is, and which surely is not under 
the supervision of the kernel memory allocator.

 There are calls for MMIO memory defined in the DMA API, specifically 
`dma_map_resource' and `dma_unmap_resource'.  I've never used them myself, 
and I gather they provide you with a way for CPUs to access MMIO memory 
with caching enabled and without the need to use the MMIO accessors only, 
such as `readl', `writel', etc., which are expected to avoid going through 
any CPU cache.  Maybe these are what you're after?

 But maybe I'm missing something.

> 	 *
> 	 * So as long as data is allocated using dma_alloc_coherent() all is
> 	 * fine. This is however not always the case - buffers may be allocated
> 	 * using kmalloc() - so the usb core needs to be told that it must copy
> 	 * data into our local memory if the buffers happen to be placed in
> 	 * regular memory. The HCD_LOCAL_MEM flag does just that.
> 	 */

 This raises a hack alert to me TBH.

> >  Well, how can such a device use the DMA API in the first place?  If the 
> > device has local memory, then the driver has to manage it itself somehow 
> > if needed, and then arrange copying it to main memory, either by a CPU or 
> > a third-party DMA controller (data mover) if available.  Of course in the 
> > latter case a driver for the DMA controller may have to use the DMA API.
> 
> The coherently declared memory given to the DMA subsystem is used for a
> fixed sized DMA pool and no additional allocations are permitted. One could
> choose a DMA mask that pretends to be reasonable, or the opposite, a mask
> such as 1 that is unreasonable on purpose, as Robin writes:
> 
> 	Alternatively, there is perhaps some degree of argument for
> 	deliberately picking a nonzero but useless value like 1,
> 	although it looks like the MIPS allocator (at least the dma-
> 	default one) never actually checks whether the page it gets
> 	is within range of the device's coherent mask, which it
> 	probably should do.
> 
> 	https://lkml.org/lkml/2018/7/6/697

 It does look like an API abuse to me, as I noted above.

> >  I'll be resubmitting a driver for such a device shortly, the DEFZA (the 
> > previous submission can be found here: 
> > <https://marc.info/?l=linux-netdev&m=139841853827404>).  It is interesting 
> > in that the FDDI engine supports host DMA on the reception side (and 
> > consequently the driver uses the DMA API to handle that), while on the 
> > transmission side (as well as with a couple of maintenance queues) it only 
> > does DMA with its onboard buffer memory, the contents of which need to be 
> > copied by the CPU.  So there's no use of the DMA API on the transmission 
> > or maintenance side.  However usual DMA rings (all located in board memory 
> > too) are used for all data moves.
> 
> The DMA for its onboard buffer memory appears to be very similar to the
> IOP and its DMA? That memory is currently copied by the EE, but there are
> other DMA controllers that could handle that, possibly synchronised using
> DMA chaining, which would assist the EE significantly.

 Mind that the DEFZA runs its own RTOS for initialization and management 
support, including in particular SMT (Station Management).  This is run on 
an MC68000 processor.  That processor is interfaced to a bus where board 
memory is attached as well as the RMC (Ring Memory Controller) chip, which 
acts as a DMA master on that bus, like does the host bus interface.  Also 
certain control register writes from the host raise interrupts to the 
MC68000 for special situations to handle.

 All the PDQ-based FDDI adapters also have an M68000 which runs an RTOS, 
however the presence of the PDQ ASIC makes their architecture slightly 
different as the FDDI chipset does host DMA via the PDQ ASIC, which acts 
as a master on the host bus (possibly through a bridge chip like the PFI, 
though TURBOchannel for example is interfaced directly).

 These adapters went through several revisions, all using the Motorola 
FDDI chipset (originally designed by DEC and then sold to Motorola for 
fabrication and marketing, with DEC retaining an unlimited licence to 
use), but with the PDQ (Packet Data Queue, I believe; not officially 
confirmed) replacing the FSI (FDDI System Interface) block, and the CAMEL 
(MAC and ELM (Media Access Controller and Elasticity Buffer and Link 
Management)) and FCG (FDDI Clock Generator) blocks both retained.

> >  The PDQ ASIC was used to interface FDDI to many host buses and in 
> > addition to the 3 bus attachments mentioned above, all of which we have 
> > support for in Linux, it was also used for Q-bus (the DEFQA) and FutureBus 
> > (the DEFAA).  We may have support for the DEFQA one day as I have both 
> > such a board and a suitable system to use it with.  We are unlikely to 
> > have support for the DEFAA, as FutureBus was only used in high-end VAX and 
> > Alpha systems, the size of a full 19" rack at the very least, but it is 
> > there I believe only that the full PDQ addressing capability was actually 
> > utilised.
> 
> Thanks! By the way, is it possible to find spare parts for such vintage
> hardware these days in case of irreparable failures?

 What do you mean by spare parts?  ICs?  Complete modules can certainly be 
chased, though obviously there are the more common ones, and then there 
are the exotic ones.

 The biggest challenge has turned out to be electrolytic capacitor 
failures in power supplies.  Unfortunately in late 1980s to mid 1990s 
several lines of low-ESR capacitors, used in output filters in switch-mode 
PSUs, were made with a new electrolyte formula based on a quaternary 
ammonium salt.  They have all turned out to suffer from excessive 
corrosion caused by that electrolyte, shortening the lifespan of those 
parts well below the expectations even in the enhanced lines specifically 
made with long life in mind.  Consequently those parts start leaking even 
if unused (or indeed never used) and then obviously cause PSU breakage if 
powered up.

 Those were all from reputable manufacturers, such as Chemi-con, Nichicon 
or Panasonic; not to be confused with the bulged capacitor problem, aka 
capacitor plague, which many ATX PSUs have suffered from mid 1990s to mid 
2000s where cheap parts were used from less reputable manufacturers.

 Sadly I have ruined a couple of PSUs before I realised what the problem 
was and I have been struggling since with tracking down other parts that 
have failed as a result.  I plan to get back to it sometime.

 Some DECstation models are affected, as is other DEC (and non-DEC) 
hardware:

* The 5000/200, /240 and /260 are not affected.

* The 2100 and 3100 are not if stored in their working orientation, as the 
  capacitors are mounted leads up in their PSUs and corrosion only breaks 
  the seal and not the aluminium can.

* The 5000/120, /125, /133 and /150 are all affected and are better 
  recapped -- all SXF Chemi-con parts have to be replaced at the very 
  least.  Newer PSUs use newer LXF Chemi-con parts that haven't failed for 
  me (yet?), but are expected to too.

* I can't speak of the 5000/20, /25, /33, /50 as I haven't got one of 
  these.

* Other pieces of hardware would have to be inspected by their respective 
  owners, e.g. I had a case where I had to recap the PSU of a small Cisco
  Ethernet switch with an FDDI bridge module from that era (that actually 
  used a stock industrial PSU you can still buy new, although at ~£500 + 
  VAT -- not exactly cheaply).

 Other parts that have been failing are the usual Dallas RTC chips having 
an integrated Lithium coin cell depleted; either the DS1287 or the DS1287A 
depending on the specific model of hardware.  DECstations have these chips 
located in the TURBOchannel slot area with little clearance around them.  
Therefore I have been slowly converting them to a version with a discrete 
coin cell embedded in the IC case instead, as photographically documented 
here: <ftp://ftp.linux-mips.org/pub/linux/mips/people/macro/ds1287/>.

 You can still get recently manufactured brand new DS12887 or DS12887A 
parts from Maxim through the usual distribution channels, however for 
reference systems, such as I consider mine, I prefer to use original parts 
to avoid surprises, as the DS12887/A chips have 104 bytes of general NVRAM 
as opposed to 50 bytes with the DS1287/A.

 NB according to HP end of sales for the DEFPA was only 2004-2005 and 
based on occasional enquiries I get as the maintainer it remains deployed 
in production environments.  These boards remain readily available on the 
second-hand market; sometimes you can get at unused old stock even.  
Unless you look for the less common SMF variants, that is.  I own a couple 
of universal-PCI DEFPA boards that use the most recent PFI-3 ASIC (earlier 
versions were 5V-only), some of which have HP recorded as the vendor in 
the subsystem ID.

 Also new TURBOchannel option hardware has been designed and manufactured 
recently, see: <http://www.flxd.de/tc-usb/>. :)  We'll get a Linux driver 
sometime.

> >  NB I sat on this fix from 2014, well before the warning was introduced in 
> > the first place, and it's only now that I got to unloading my patch queue. 
> > :(
> 
> Do you have the latest kernel running on your DECstation machines now? :)

 Yep:

Linux version 4.19.0-rc6 (macro@tp) (gcc version 4.1.2) #3 Mon Oct 1 00:22:03 BST 2018
bootconsole [prom0] enabled
This is a DECstation 5000/2x0
CPU0 revision is: 00000440 (R4400SC)
FPU revision is: 00000500
Checking for the multiply/shift bug... no.
Checking for the daddiu bug... yes, workaround... yes.
Determined physical RAM map:
 memory: 0000000004000000 @ 0000000000000000 (usable)
Primary instruction cache 16kB, VIPT, direct mapped, linesize 16 bytes.
Primary data cache 16kB, direct mapped, VIPT, no aliases, linesize 16 bytes
Unified secondary cache 1024kB direct mapped, linesize 32 bytes.
Zone ranges:
  Normal   [mem 0x0000000000000000-0x0000000003ffffff]
Movable zone start for each node
Early memory node ranges
  node   0: [mem 0x0000000000000000-0x0000000003ffffff]
Initmem setup node 0 [mem 0x0000000000000000-0x0000000003ffffff]
On node 0 totalpages: 4096
  Normal zone: 14 pages used for memmap
  Normal zone: 0 pages reserved
  Normal zone: 4096 pages, LIFO batch:0
pcpu-alloc: s0 r0 d32768 u32768 alloc=1*32768
pcpu-alloc: [0] 0
Built 1 zonelists, mobility grouping off.  Total pages: 4082
Kernel command line: rw console=ttyS3 debug panic=60 ip=bootp root=/dev/nfs
Dentry cache hash table entries: 8192 (order: 2, 65536 bytes)
Inode-cache hash table entries: 4096 (order: 1, 32768 bytes)
Memory: 57632K/65536K available (5279K kernel code, 338K rwdata, 1004K rodata, 272K init, 216K bss, 7904K reserved, 0K cma-reserved)
NR_IRQS: 128
I/O ASIC clock frequency 24999536Hz
clocksource: dec-ioasic: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 76451836814 ns
sched_clock: 32 bits at 24MHz, resolution 40ns, wraps every 85900940267ns
MIPS counter frequency 60000464Hz
clocksource: MIPS: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 31854094440 ns
sched_clock: 32 bits at 60MHz, resolution 16ns, wraps every 35791117303ns
Console: colour dummy device 160x64
console [ttyS3] enabled
bootconsole [prom0] disabled
Calibrating delay loop... 59.33 BogoMIPS (lpj=231424)
pid_max: default: 32768 minimum: 301
Mount-cache hash table entries: 2048 (order: 0, 16384 bytes)
Mountpoint-cache hash table entries: 2048 (order: 0, 16384 bytes)
Checking for the daddi bug... no.
random: get_random_u32 called from bucket_table_alloc+0xbc/0x2e8 with crng_init=0
clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 14931722236523437 ns
futex hash table entries: 256 (order: -2, 6144 bytes)
NET: Registered protocol family 16
Can't analyze schedule() prologue at (____ptrval____)
HugeTLB registered 32.0 MiB page size, pre-allocated 0 pages
SCSI subsystem initialized
tc: TURBOchannel rev. 1 at 25.0 MHz (without parity)
tc0: DEC      PMAG-AA  V1.0a
tc1: DEC      PMAF-FD  V3.1D
tc2: DEC      PMAF-AA  T5.2P-
clocksource: Switched to clocksource MIPS
NET: Registered protocol family 2
tcp_listen_portaddr_hash hash table entries: 1024 (order: 0, 16384 bytes)
TCP established hash table entries: 2048 (order: 0, 16384 bytes)
TCP bind hash table entries: 2048 (order: 0, 16384 bytes)
TCP: Hash tables configured (established 2048 bind 2048)
UDP hash table entries: 512 (order: 0, 16384 bytes)
UDP-Lite hash table entries: 512 (order: 0, 16384 bytes)
NET: Registered protocol family 1
RPC: Registered named UNIX socket transport module.
RPC: Registered udp transport module.
RPC: Registered tcp transport module.
RPC: Registered tcp NFSv4.1 backchannel transport module.
workingset: timestamp_bits=62 max_order=12 bucket_order=0
Block layer SCSI generic (bsg) driver version 0.4 loaded (major 253)
io scheduler noop registered
io scheduler deadline registered
io scheduler cfq registered (default)
Console: switching to mono frame buffer device 160x64
fb0: PMAG-AA frame buffer device at tc0
DECstation Z85C30 serial driver version 0.10
ttyS0 at MMIO 0x1f900008 (irq = 14, base_baud = 460800) is a Z85C30 SCC
ttyS1 at MMIO 0x1f900000 (irq = 14, base_baud = 460800) is a Z85C30 SCC
ttyS2 at MMIO 0x1f980008 (irq = 15, base_baud = 460800) is a Z85C30 SCC
ttyS3 at MMIO 0x1f980000 (irq = 15, base_baud = 460800) is a Z85C30 SCC
ms02-nv.c: v.1.0.0  13 Aug 2001  Maciej W. Rozycki.
mtd0: DEC MS02-NV NVRAM at 0x07000000, size 1MiB.
declance.c: v0.011 by Linux MIPS DECstation task force
declance0: IOASIC onboard LANCE, addr = 08:00:2b:35:62:c1, irq = 16
declance0: registered as eth0.
defxx: v1.11 2014/07/01  Lawrence V. Stefani and others
random: fast init done
tc1: DEFTA at MMIO addr = 0x1e900000, IRQ = 20, Hardware addr = 08-00-2b-a3-a3-29
tc1: registered as fddi0
defza: v.1.1.4  Oct  2 2018  Maciej W. Rozycki
tc2: DEC FDDIcontroller 700 or 700-C at 0x1f000000, irq 21
tc2: resetting the board...
tc2: OK
tc2: model 700 (DEFZA-AA), MMF PMD, address 08-00-2b-2e-6d-75
tc2: ROM rev. 1.0, firmware rev. 1.2, RMC rev. A, SMT ver. 1
tc2: link unavailable
tc2: registered as fddi1
mousedev: PS/2 mouse device common for all mice
rtc_cmos rtc_cmos: registered as rtc0
rtc_cmos rtc_cmos: no alarms, 50 bytes nvram
NET: Registered protocol family 10
Segment Routing with IPv6
sit: IPv6, IPv4 and MPLS over IPv4 tunneling driver
NET: Registered protocol family 17
rtc_cmos rtc_cmos: setting system clock to 2018-10-01 00:45:12 UTC (1538354712)
Sending BOOTP requests . OK
IP-Config: Got BOOTP answer from xxx.xxx.xxx.xxx, my address is xxx.xxx.xxx.xxx
IP-Config: Complete:
     device=eth0, hwaddr=08:00:2b:35:62:c1, ipaddr=xxx.xxx.xxx.xxx, mask=xxx.xxx.xxx.xxx, gw=xxx.xxx.xxx.xxx
fddi1: link available
     host=hhh.hhh.hhh.hhh, domain=, nis-domain=(none)
     bootserver=xxx.xxx.xxx.xxx, rootserver=xxx.xxx.xxx.xxx, rootpath=/ddd/ddd
     nameserver0=xxx.xxx.xxx.xxx
fddi1: link unavailable
VFS: Mounted root (nfs filesystem) on device 0:11.
Freeing unused PROM memory: 112k freed
Freeing unused kernel memory: 272K
This architecture does not have kernel memory protection.
Run /sbin/init as init process
[...]

I had to revert recent changes forcing the minimum of GCC 4.6, and then 
patch up the breakage that was the motivation for the version bump, as I 
cannot easily upgrade my compiler (GCC 4.1.2 is the newest one I was able 
to make work without NPTL); upgrading will be a process.

 Still 4.18 can be used pristine with CONFIG_32BIT, except for a recent 
build breakage with the RTC driver, my small fix for which has already 
been accepted.  I think 4.17 will build and boot just fine out of the box, 
and I expect the RTC fix to be backported to 4.18 too.

 For CONFIG_64BIT a fix for memory corruption with `memset' is required 
that applies to 4.17 and later versions, and is pending maintainer's 
acceptance.  So I think 4.16 will work just fine, but you need the 
toolchain (GCC+binutils) from my site with the DADDI and DADDIU workarounds 
implemented to build such a kernel.  I think the workarounds will never 
make it upstream due to their intrusiveness, but I mean to maintain them 
indefinitely (though as I mentioned above it'll take me a little while yet 
to get beyond GCC 4.1.2).

  Maciej

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] TC: Set DMA masks for devices
  2018-10-05 22:52       ` Maciej W. Rozycki
@ 2018-10-06  9:21         ` Fredrik Noring
  2018-10-14 23:51             ` Maciej W. Rozycki
  0 siblings, 1 reply; 9+ messages in thread
From: Fredrik Noring @ 2018-10-06  9:21 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: Ralf Baechle, linux-mips, Jürgen Urban

Hi Maciej,

> > Ah... memory that is known to be DMA compatible is allocated separately,
> > and then handed over to the DMA subsystem using dma_declare_coherent_memory.
> 
>  Well, that does specify both a CPU-side and a corresponding DMA-side 
> address too.

Yes, side-stepping any practical use of a DMA mask, which is why it
probably could have an arbitrary value except 0 that causes this warning.
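
To illustrate the mask values discussed here, a minimal userspace sketch
follows.  The DMA_BIT_MASK() definition mirrors the kernel macro of the same
name; the fits_dma_mask() helper is hypothetical and only models the range
check Robin says the MIPS allocator probably should do.  The 34-bit figure
is the TURBOchannel limit quoted in the patch description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same definition as the kernel's DMA_BIT_MASK() macro. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/*
 * Hypothetical helper (not a kernel API): does the buffer
 * [addr, addr + size) lie entirely within the range that a device
 * with the given DMA mask can address?
 */
static inline bool fits_dma_mask(uint64_t addr, uint64_t size, uint64_t mask)
{
	assert(size > 0);
	/* Written this way so that addr + size cannot overflow. */
	return addr <= mask && size - 1 <= mask - addr;
}
```

With a 34-bit mask any buffer below 16GiB passes, whereas a mask of 1
rejects every page-sized buffer, which is what makes it "nonzero but
useless" if the allocator were to check it.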

> > This is done once during driver initialisation. The drivers ohci-sm501.c and
> > ohci-tmio.c do that too, which is why I suspect they might be broken as well.
> > 
> > The SM501 driver has this explanation:
> > 
> > 	/* The sm501 chip is equipped with local memory that may be used
> > 	 * by on-chip devices such as the video controller and the usb host.
> > 	 * This driver uses dma_declare_coherent_memory() to make sure
> > 	 * usb allocations with dma_alloc_coherent() allocate from
> > 	 * this local memory. The dma_handle returned by dma_alloc_coherent()
> > 	 * will be an offset starting from 0 for the first local memory byte.
> 
>  From the description I take it it is some MMIO memory rather than host 
> memory.  I fail to see how it is supposed to work with these calls for 
> non-system memory, which certainly any MMIO memory is, which surely is not 
> under the supervision of the kernel memory allocator.

I agree, this is obscure to me too.

>  There are calls for MMIO memory defined in the DMA API, specifically 
> `dma_map_resource' and `dma_unmap_resource'.  I've never used them myself, 
> and I gather they provide you with a way for CPUs to access MMIO memory 
> with caching enabled and without the need to use the MMIO accessors only, 
> such as `readl', `writel', etc., which are expected to avoid going through 
> any CPU cache.  Maybe these are what you're after?
> 
>  But maybe I'm missing something.

That is handled within the USB OHCI subsystem. I don't know the details,
actually.

> > 	 *
> > 	 * So as long as data is allocated using dma_alloc_coherent() all is
> > 	 * fine. This is however not always the case - buffers may be allocated
> > 	 * using kmalloc() - so the usb core needs to be told that it must copy
> > 	 * data into our local memory if the buffers happen to be placed in
> > 	 * regular memory. The HCD_LOCAL_MEM flag does just that.
> > 	 */
> 
>  This raises a hack alert to me TBH.

Christoph Hellwig raised concerns too, but I don't know how an OHCI driver
could do things differently given the circumstances, at least for a simple
initial implementation. For sure, the IOP has the capability and was most
likely designed for handling USB devices and other peripherals to a much
greater extent than allowed by the current PS2 OHCI driver, where the EE
manipulates the OHCI registers directly, which is quite inefficient.

> > The DMA for its onboard buffer memory appears to be very similar to the
> > IOP and its DMA? That memory is currently copied by the EE, but there are
> > other DMA controllers that could handle that, possibly synchronised using
> > DMA chaining, which would assist the EE significantly.
> 
>  Mind that the DEFZA runs its own RTOS for initialization and management 
> support, including in particular SMT (Station Management).  This is run on 
> an MC68000 processor.  That processor is interfaced to a bus where board 
> memory is attached as well as the RMC (Ring Memory Controller) chip, which 
> acts as a DMA master on that bus, like does the host bus interface.  Also 
> certain control register writes from the host raise interrupts to the 
> MC68000 for special situations to handle.
> 
>  All the PDQ-based FDDI adapters also have an M68000 which runs an RTOS, 
> however the presence of the PDQ ASIC makes their architecture slightly 
> different as the FDDI chipset does host DMA via the PDQ ASIC, which acts 
> as a master on the host bus (possibly through a bridge chip like the PFI, 
> though TURBOchannel for example is interfaced directly).

How is its firmware handled? The Linux MIPS wiki entry for the DECstation
firmware

https://www.linux-mips.org/wiki/DECstation#Firmware

is a TODO. :) The main reason I'm asking is that the IOP is a MIPS R3000
(apparently in later product models replaced with a PowerPC 405GP and its
DECKARD software emulator) that also needs firmware. The IOP most likely
ought to handle multiple firmware files, in the IRX format, depending on
its set of services.

Have you implemented sysfs structures to inspect the DEFZA RTOS? That is
something I would like to do for the IOP.

>  The biggest challenge has turned out to be electrolytic capacitor 
> failures in power supplies.  Unfortunately in late 1980s to mid 1990s 
> several lines of low-ESR capacitors, used in output filters in switch-mode 
> PSUs, were made with a new electrolyte formula based on a quaternary 
> ammonium salt.  They have all turned out to suffer from excessive 
> corrosion caused by that electrolyte, shortening the lifespan of those 
> parts well below the expectations even in the enhanced lines specifically 
> made with long life in mind.  Consequently those parts start leaking even 
> if unused (or indeed never used) and then obviously cause PSU breakage if 
> powered up.
> 
>  Those were all from reputable manufacturers, such as Chemi-con, Nichicon 
> or Panasonic; not to be confused with the bulged capacitor problem, aka 
> capacitor plague, which many ATX PSUs have suffered from mid 1990s to mid 
> 2000s where cheap parts were used from less reputable manufacturers.

Interesting!

> This is a DECstation 5000/2x0
> CPU0 revision is: 00000440 (R4400SC)
> FPU revision is: 00000500
> Checking for the multiply/shift bug... no.
> Checking for the daddiu bug... yes, workaround... yes.
> Determined physical RAM map:
>  memory: 0000000004000000 @ 0000000000000000 (usable)

Considering the amount of memory, how do you compile for it?

Fredrik

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] TC: Set DMA masks for devices
@ 2018-10-14 23:51             ` Maciej W. Rozycki
  0 siblings, 0 replies; 9+ messages in thread
From: Maciej W. Rozycki @ 2018-10-14 23:51 UTC (permalink / raw)
  To: Fredrik Noring; +Cc: Ralf Baechle, linux-mips, Jürgen Urban

Hi Fredrik,

> >  From the description I take it it is some MMIO memory rather than host 
> > memory.  I fail to see how it is supposed to work with these calls for 
> > non-system memory, which certainly any MMIO memory is, which surely is not 
> > under the supervision of the kernel memory allocator.
> 
> I agree, this is obscure to me too.

 I can't be bothered (sorry!) to study this code or the datasheet for the 
IC to figure out what the arrangement is, but I do encourage you to do so 
if you want to make any changes here.

> >  Mind that the DEFZA runs its own RTOS for initialization and management 
> > support, including in particular SMT (Station Management).  This is run on 
> > an MC68000 processor.  That processor is interfaced to a bus where board 
> > memory is attached as well as the RMC (Ring Memory Controller) chip, which 
> > acts as a DMA master on that bus, like does the host bus interface.  Also 
> > certain control register writes from the host raise interrupts to the 
> > MC68000 for special situations to handle.
> > 
> >  All the PDQ-based FDDI adapters also have an M68000 which runs an RTOS, 
> > however the presence of the PDQ ASIC makes their architecture slightly 
> > different as the FDDI chipset does host DMA via the PDQ ASIC, which acts 
> > as a master on the host bus (possibly through a bridge chip like the PFI, 
> > though TURBOchannel for example is interfaced directly).
> 
> How is its firmware handled? The Linux MIPS wiki entry for the DECstation
> firmware
> 
> https://www.linux-mips.org/wiki/DECstation#Firmware
> 
> is a TODO. :)

 I'm not sure who actually created that entry and what they had in mind.  
Likely the console firmware and any of its peculiarities related to Linux.

> The main reason I'm asking is that the IOP is a MIPS R3000
> (apparently in later product models replaced with a PowerPC 405GP and its
> DECKARD software emulator) that also needs firmware. The IOP most likely
> ought to handle multiple firmware files, in the IRX format, depending on
> its set of services.

 The firmware of these FDDI boards is stored in flash memory onboard, so 
you don't need to do anything to load it as it boots by itself.

 There is a documented way to flash a firmware image by fiddling with the 
control registers appropriately, downloading the new image to board RAM 
and then requesting the board to transfer the image to onboard flash.  
From documentation I gather this process is done entirely by board 
circuitry with no software involved on the board side, that is a failed 
firmware flashing process does not preclude another attempt.

 Normally to start initializing the board you just assert/deassert RESET 
with one of the control registers and the board boots.

 It takes the DEFZA 10s to boot (the documented amount of time for the 
driver to wait for the bootstrap to complete is 30s).  This is why I made 
initialisation messages so verbose, so that the user is not confused and 
does not conclude the kernel has hung.

 You need to boot the board to retrieve its MAC address as the onboard 
PROM chip holding the address is not accessible from the host side and 
the address is only returned by the INIT command (NB there is no way to 
override it either).  There is an undocumented quicker way that the board's 
console support code uses for presentation purposes in a system's console 
monitor, but that's the board's internal protocol and I didn't want to risk 
an incompatibility with some board revision out there.

 Therefore the board driver requests its interrupt right away, sets a 
timer, cycles RESET and puts the driver to sleep so that the system does 
not become frozen if the driver is loaded as a module during normal Linux 
operation.  Then either a state change interrupt from the board or the 
timer fires and the driver resumes from there accordingly.

 After reboot a command has to be sent to the board to initialise the DMA 
rings and it also takes a while, though not as much.  My measurements 
indicate 160ms, but it's obviously still too long for the driver to just 
busy-wait there twiddling thumbs, so it puts itself to sleep too.

 An unfortunate side effect of this design is that the IRQ handler is 
called `tcX' rather than `fddiX', as observed in /proc/interrupts.  Maybe 
I'll propose a `rename_irq' API, however I'm not sure if it's worth it.

 The board also has to be reset during normal operation if the so called 
PC Trace (Physical Connection Trace) event has happened in the course of 
FDDI ring fault recovery (i.e. when the token has been lost and could not 
have been restored with beaconing).  That event causes the board to switch 
into the halted state (the link status LED changes from green to red to 
signify the problem) and the board has to be rebooted by the driver to 
verify it's not this board that is the FDDI station having caused the ring 
fault.

 Then all the usual commands have to be sent to initialise the board, set 
FDDI link parameters, add any CAM entries that were set before the reboot 
and set the promiscuous mode if in use, and then finally join the ring.  
So this is handled with an interrupt-driven state machine as otherwise 
again the driver would have to freeze the system for the duration of all 
this processing.
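
 As a rough illustration only, the bring-up sequence described above could 
be modelled as the transition function below.  All the state and event 
names are made up for this sketch and are not taken from the `defza' 
driver, and the ordering of the steps is an assumption read off the 
description: each step is started, the driver goes to sleep, and either a 
board interrupt advances the machine or the timer expiring fails it.

```c
#include <assert.h>

/* Hypothetical states, one per step the driver waits on. */
enum bringup_state {
	ST_RESET_WAIT,		/* RESET cycled, board booting */
	ST_INIT_WAIT,		/* DMA ring initialisation command sent */
	ST_PARAM_WAIT,		/* FDDI link parameters being set */
	ST_CAM_WAIT,		/* CAM entries being restored */
	ST_PROMISC_WAIT,	/* promiscuous mode being restored */
	ST_JOIN_WAIT,		/* joining the ring */
	ST_RUNNING,		/* normal operation */
	ST_FAILED,		/* a step timed out */
};

/* Hypothetical events: a board interrupt or the watchdog timer. */
enum bringup_event {
	EV_BOARD_IRQ,
	EV_TIMER,
};

/*
 * Advance the machine by one event.  Nothing here blocks, so it could
 * be called from an interrupt or timer handler without stalling the
 * rest of the system.
 */
static inline enum bringup_state bringup_step(enum bringup_state state,
					      enum bringup_event event)
{
	assert(state <= ST_FAILED);
	if (state == ST_RUNNING || state == ST_FAILED)
		return state;		/* terminal states ignore events */
	if (event == EV_TIMER)
		return ST_FAILED;	/* the board did not respond in time */
	return (enum bringup_state)(state + 1);	/* step done, start the next */
}
```

 A real driver would also issue the next command on each transition and 
re-arm the timer; that part is left out here.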

 The PDQ-based adapters are much quicker, they boot in ~1s.  However the 
current `defxx' driver is flawed in that it does not handle that PC Trace 
event with a state machine and it does freeze the system if that happens, 
remaining in the hardirq context throughout.  Also it may fail DMA buffer 
allocation in the course of the reboot as it (unnecessarily) frees all the 
buffers previously allocated and requests new ones instead.

 I need to fix this all, modelling the solution after `defza', however I 
want to upstream the latter driver first.  Fortunately PC Trace events are 
not that common, but earlier this year someone has already complained 
about this issue with `defxx' causing unacceptable latency problems with 
their system, so I do need to look into it.

> Have you implemented sysfs structures to inspect the DEFZA RTOS? That is
> something I would like to do for the IOP.

 There is no (documented) way to access the internals of board firmware 
(except for the request to flash it).  You only have access to the 
onboard 1MiB of RAM and a bunch of control/status registers.  Likewise 
with the PDQ-based adapters, although their use of RAM is not clearly 
documented (the PFI has a separate BAR for board RAM access) -- I find it 
hard to believe they'd put 1MiB of RAM there only to support firmware 
upgrades, so I think it is still used as a temporary packet buffer and 
other operational purposes.

> > This is a DECstation 5000/2x0
> > CPU0 revision is: 00000440 (R4400SC)
> > FPU revision is: 00000500
> > Checking for the multiply/shift bug... no.
> > Checking for the daddiu bug... yes, workaround... yes.
> > Determined physical RAM map:
> >  memory: 0000000004000000 @ 0000000000000000 (usable)
> 
> Considering the amount of memory, how do you compile for it?

 The kernel can be cross-compiled easily and with no pitfalls, so this is 
what I have always been doing.

 With userland builds most software packages can be cross-compiled, but I 
prefer native builds indeed, as these do not require manual tweaking of 
any parameters that cannot be inferred in cross-compilation (fortunately 
modern versions of Autoconf are able to figure out what the sizes of data 
types are even if cross-compiling, as setting these manually used to be a 
real pain).

 For those I usually use my Broadcom SWARM board, which is clocked at 
800MHz and currently has 3200MiB of RAM (pending a firmware fix of DRAM 
controller initialisation that will hopefully allow for full 4GiB possible 
with modules available on the market out of 8GiB theoretical maximum).  
The SWARM has switchable endianness with the line to control it at reset 
wired to a PCB header used with a jumper as shipped.  I have instead wired 
it to an external switch mounted on a cover plate of an unused option 
slot, so that I don't have to pull the system apart to change the 
endianness.

 I have better equipped DECstations at my remote site though; the maximum 
amount of RAM the /200, /240 and /260 models accept is 480MiB.  The 
remaining 32MiB of space addressable via the KSEG0/KSEG1 spaces is used 
for system ROM and MMIO (for onboard I/O circuitry and TURBOchannel).  
TURBOchannel can also be accessed from 0x20000000 physical up (not with 
the /200), for 3 slots of 512MiB of MMIO space each, however due to an API 
shortcoming system firmware cannot cope with that (as documented on the 
DECstation wiki).
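
 The arithmetic behind these figures can be sketched as follows.  The 
512MiB size of the KSEG0/KSEG1 windows is a property of the MIPS 
architecture; the macro and function names are made up for this example, 
and the assumption that the three high slots are laid out contiguously in 
ascending order from 0x20000000 is mine.

```c
#include <assert.h>
#include <stdint.h>

#define MiB			(UINT64_C(1) << 20)

/* KSEG0 and KSEG1 each map the low 512MiB of physical address space. */
#define KSEG_WINDOW_SIZE	(512 * MiB)
/* Maximum RAM accepted by the /200, /240 and /260 models. */
#define MAX_RAM_SIZE		(480 * MiB)

/* High TURBOchannel window: 3 slots of 512MiB each (not on the /200). */
#define TC_HIGH_BASE		UINT64_C(0x20000000)
#define TC_HIGH_SLOT_SIZE	(512 * MiB)
#define TC_HIGH_SLOTS		3

/* What is left of the KSEG0/KSEG1 window for system ROM and MMIO. */
static inline uint64_t kseg_io_space(void)
{
	return KSEG_WINDOW_SIZE - MAX_RAM_SIZE;
}

/* Assumed layout: slot N starts N * 512MiB above 0x20000000. */
static inline uint64_t tc_high_slot_base(unsigned int slot)
{
	assert(slot < TC_HIGH_SLOTS);
	return TC_HIGH_BASE + (uint64_t)slot * TC_HIGH_SLOT_SIZE;
}
```

 Under these assumptions kseg_io_space() gives the 32MiB mentioned above, 
and the high window ends just below 0x80000000.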

  Maciej

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2018-10-14 23:51 UTC | newest]

Thread overview: 9+ messages
2018-10-03 12:21 [PATCH] TC: Set DMA masks for devices Maciej W. Rozycki
2018-10-04 16:57 ` Fredrik Noring
2018-10-04 17:55   ` Fredrik Noring
2018-10-04 20:09   ` Maciej W. Rozycki
2018-10-05 14:56     ` Fredrik Noring
2018-10-05 22:52       ` Maciej W. Rozycki
2018-10-06  9:21         ` Fredrik Noring
2018-10-14 23:51           ` Maciej W. Rozycki
2018-10-14 23:51             ` Maciej W. Rozycki
