* Regression in 543cea9a - was: Re: Kernel problem on rx2800 i2
@ 2019-06-25  6:40 John Paul Adrian Glaubitz
  2019-06-25  6:42 ` Christoph Hellwig
                   ` (28 more replies)
  0 siblings, 29 replies; 30+ messages in thread
From: John Paul Adrian Glaubitz @ 2019-06-25  6:40 UTC (permalink / raw)
  To: linux-ia64

Hi Christoph!

On 6/21/19 10:08 PM, Frank Scheiner wrote:
> recent testing of a Debian v4.19.37 kernel showed a problem on my rx2800
> i2 happening during kernel boot:
> (...)
> [1]:
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=543cea9accd9804307541cb93d3ed7ec94b07237

Do you have any idea what could be the reason for the issue introduced
by your above commit? James Clarke has guessed that it might be GFP_DMA32
which isn't being set properly anymore for the affected machines.

Do you think we could test a kernel which just sets the flag unconditionally
to see whether this is the problem that causes the issues on these machines?

Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaubitz@debian.org
`. `'   Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Regression in 543cea9a - was: Re: Kernel problem on rx2800 i2
  2019-06-25  6:40 Regression in 543cea9a - was: Re: Kernel problem on rx2800 i2 John Paul Adrian Glaubitz
@ 2019-06-25  6:42 ` Christoph Hellwig
  2019-06-25  6:46 ` John Paul Adrian Glaubitz
                   ` (27 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: Christoph Hellwig @ 2019-06-25  6:42 UTC (permalink / raw)
  To: linux-ia64

On Tue, Jun 25, 2019 at 08:40:01AM +0200, John Paul Adrian Glaubitz wrote:
> Hi Christoph!
> 
> On 6/21/19 10:08 PM, Frank Scheiner wrote:
> > recent testing of a Debian v4.19.37 kernel showed a problem on my rx2800
> > i2 happening during kernel boot:
> > (...)
> > [1]:
> > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=543cea9accd9804307541cb93d3ed7ec94b07237
> 
> Do you have any idea what could be the reason for the issue introduced
> by your above commit? James Clarke has guessed that it might be GFP_DMA32
> which isn't being set properly anymore for the affected machines.
> 
> Do you think we could test a kernel which just sets the flag unconditionally
> to see whether this is the problem that causes the issues on these machines?

Might be worth a test.  Do you know what device failed?  Might be one
with a dma mask < 32-bit?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Regression in 543cea9a - was: Re: Kernel problem on rx2800 i2
  2019-06-25  6:40 Regression in 543cea9a - was: Re: Kernel problem on rx2800 i2 John Paul Adrian Glaubitz
  2019-06-25  6:42 ` Christoph Hellwig
@ 2019-06-25  6:46 ` John Paul Adrian Glaubitz
  2019-06-25  6:50 ` Christoph Hellwig
                   ` (26 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: John Paul Adrian Glaubitz @ 2019-06-25  6:46 UTC (permalink / raw)
  To: linux-ia64

On 6/25/19 8:42 AM, Christoph Hellwig wrote:
>> Do you think we could test a kernel which just sets the flag unconditionally
>> to see whether this is the problem that causes the issues on these machines?
> 
> Might be worth a test.  Do you know what device failed?  Might be one
> with a dma mask < 32-bit?

I can reproduce the crash when trying to load the module for the USB controllers,
for example. Loading the kernel module for the SATA controllers provokes the
backtrace as well.

I have skimmed through the code a bit, but I'm not sure whether I understand
the code in kernel/dma/direct.c correctly, so my suggestion would be to just
set GFP_DMA32 in __dma_direct_alloc_pages() unconditionally for a test. Would
that be enough for a test?

Adrian

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Regression in 543cea9a - was: Re: Kernel problem on rx2800 i2
  2019-06-25  6:40 Regression in 543cea9a - was: Re: Kernel problem on rx2800 i2 John Paul Adrian Glaubitz
  2019-06-25  6:42 ` Christoph Hellwig
  2019-06-25  6:46 ` John Paul Adrian Glaubitz
@ 2019-06-25  6:50 ` Christoph Hellwig
  2019-06-25  6:54 ` John Paul Adrian Glaubitz
                   ` (25 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: Christoph Hellwig @ 2019-06-25  6:50 UTC (permalink / raw)
  To: linux-ia64

On Tue, Jun 25, 2019 at 08:46:57AM +0200, John Paul Adrian Glaubitz wrote:
> I can reproduce the crash when trying to load the module for the USB controllers,
> for example. Loading the kernel module for the SATA controllers provokes the
> backtrace as well.
> 
> I have skimmed through the code a bit, but I'm not sure whether I understand
> the code in kernel/dma/direct.c correctly, so my suggestion would be to just
> set GFP_DMA32 in __dma_direct_alloc_pages() unconditionally for a test. Would
> that be enough for a test?

Yes.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Regression in 543cea9a - was: Re: Kernel problem on rx2800 i2
  2019-06-25  6:40 Regression in 543cea9a - was: Re: Kernel problem on rx2800 i2 John Paul Adrian Glaubitz
                   ` (2 preceding siblings ...)
  2019-06-25  6:50 ` Christoph Hellwig
@ 2019-06-25  6:54 ` John Paul Adrian Glaubitz
  2019-06-25  6:59 ` Christoph Hellwig
                   ` (24 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: John Paul Adrian Glaubitz @ 2019-06-25  6:54 UTC (permalink / raw)
  To: linux-ia64

On 6/25/19 8:50 AM, Christoph Hellwig wrote:
> On Tue, Jun 25, 2019 at 08:46:57AM +0200, John Paul Adrian Glaubitz wrote:
>> I can reproduce the crash when trying to load the module for the USB controllers,
>> for example. Loading the kernel module for the SATA controllers provokes the
>> backtrace as well.
>>
>> I have skimmed through the code a bit, but I'm not sure whether I understand
>> the code in kernel/dma/direct.c correctly, so my suggestion would be to just
>> set GFP_DMA32 in __dma_direct_alloc_pages() unconditionally for a test. Would
>> that be enough for a test?
> 
> Yes.

Okay, thanks. I'll whip up a patch for Frank to test.

Adrian

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Regression in 543cea9a - was: Re: Kernel problem on rx2800 i2
  2019-06-25  6:40 Regression in 543cea9a - was: Re: Kernel problem on rx2800 i2 John Paul Adrian Glaubitz
                   ` (3 preceding siblings ...)
  2019-06-25  6:54 ` John Paul Adrian Glaubitz
@ 2019-06-25  6:59 ` Christoph Hellwig
  2019-06-25  7:26 ` Frank Scheiner
                   ` (23 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: Christoph Hellwig @ 2019-06-25  6:59 UTC (permalink / raw)
  To: linux-ia64

On Tue, Jun 25, 2019 at 08:54:11AM +0200, John Paul Adrian Glaubitz wrote:
> Okay, thanks. I'll whip up a patch for Frank to test.

The one below should do it, but from looking at the ia64 zone
initialization I'm not sure this will be the culprit.

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 2c2772e9702a..3e802f4580b3 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -82,9 +82,7 @@ static gfp_t __dma_direct_optimal_gfp_mask(struct device *dev, u64 dma_mask,
 	 */
 	if (*phys_mask <= DMA_BIT_MASK(ARCH_ZONE_DMA_BITS))
 		return GFP_DMA;
-	if (*phys_mask <= DMA_BIT_MASK(32))
-		return GFP_DMA32;
-	return 0;
+	return GFP_DMA32;
 }
 
 static bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size)
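
For readers following along, the effect of this test patch can be sketched
in plain userspace C. This is only a rough model, not the kernel code: the
real function also derives phys_mask from the device and bus masks, the
enum values below are hypothetical stand-ins for the gfp_t zone modifiers,
and the 24-bit value for ARCH_ZONE_DMA_BITS is assumed to be the generic
default.

```c
#include <stdint.h>

/* Same shape as the kernel's DMA_BIT_MASK(): a mask with the low n bits set. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Assumption: 24 bits, used when an architecture does not override it. */
#define ARCH_ZONE_DMA_BITS 24

/* Hypothetical stand-ins for GFP_DMA, GFP_DMA32 and "no zone modifier". */
enum zone_hint { HINT_NONE = 0, HINT_DMA, HINT_DMA32 };

/* Zone choice before the test patch: depends on how far the device can address. */
enum zone_hint optimal_hint_original(uint64_t phys_mask)
{
	if (phys_mask <= DMA_BIT_MASK(ARCH_ZONE_DMA_BITS))
		return HINT_DMA;
	if (phys_mask <= DMA_BIT_MASK(32))
		return HINT_DMA32;
	return HINT_NONE;
}

/* Zone choice with the test patch: everything above ZONE_DMA gets DMA32. */
enum zone_hint optimal_hint_test_patch(uint64_t phys_mask)
{
	if (phys_mask <= DMA_BIT_MASK(ARCH_ZONE_DMA_BITS))
		return HINT_DMA;
	return HINT_DMA32;
}
```

In this model a device with a 64-bit mask gets HINT_NONE from the original
logic and HINT_DMA32 from the patched one, which is the only behavioural
difference the test is meant to exercise.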

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: Regression in 543cea9a - was: Re: Kernel problem on rx2800 i2
  2019-06-25  6:40 Regression in 543cea9a - was: Re: Kernel problem on rx2800 i2 John Paul Adrian Glaubitz
                   ` (4 preceding siblings ...)
  2019-06-25  6:59 ` Christoph Hellwig
@ 2019-06-25  7:26 ` Frank Scheiner
  2019-06-25  8:16 ` Frank Scheiner
                   ` (22 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: Frank Scheiner @ 2019-06-25  7:26 UTC (permalink / raw)
  To: linux-ia64

On 6/25/19 08:59, Christoph Hellwig wrote:
> On Tue, Jun 25, 2019 at 08:54:11AM +0200, John Paul Adrian Glaubitz wrote:
>> Okay, thanks. I'll whip up a patch for Frank to test.
>
> The one below should do it, but from looking at the ia64 zone
> initialization I'm not sure this will be the culprit.
>
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index 2c2772e9702a..3e802f4580b3 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -82,9 +82,7 @@ static gfp_t __dma_direct_optimal_gfp_mask(struct device *dev, u64 dma_mask,
>   	 */
>   	if (*phys_mask <= DMA_BIT_MASK(ARCH_ZONE_DMA_BITS))
>   		return GFP_DMA;
> -	if (*phys_mask <= DMA_BIT_MASK(32))
> -		return GFP_DMA32;
> -	return 0;
> +	return GFP_DMA32;
>   }
>
>   static bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size)
>

Ok, will apply that to the most recent non-rc kernel source and give it
a try. Should take about 45 mins or so.

Cheers,
Frank

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Regression in 543cea9a - was: Re: Kernel problem on rx2800 i2
  2019-06-25  6:40 Regression in 543cea9a - was: Re: Kernel problem on rx2800 i2 John Paul Adrian Glaubitz
                   ` (5 preceding siblings ...)
  2019-06-25  7:26 ` Frank Scheiner
@ 2019-06-25  8:16 ` Frank Scheiner
  2019-06-25  8:18 ` Christoph Hellwig
                   ` (21 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: Frank Scheiner @ 2019-06-25  8:16 UTC (permalink / raw)
  To: linux-ia64

On 6/25/19 09:26, Frank Scheiner wrote:
> On 6/25/19 08:59, Christoph Hellwig wrote:
>> On Tue, Jun 25, 2019 at 08:54:11AM +0200, John Paul Adrian Glaubitz wrote:
>>> Okay, thanks. I'll whip up a patch for Frank to test.
>>
>> The one below should do it, but from looking at the ia64 zone
>> initialization I'm not sure this will be the culprit.
>>
>> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
>> index 2c2772e9702a..3e802f4580b3 100644
>> --- a/kernel/dma/direct.c
>> +++ b/kernel/dma/direct.c
>> @@ -82,9 +82,7 @@ static gfp_t __dma_direct_optimal_gfp_mask(struct device *dev, u64 dma_mask,
>>        */
>>       if (*phys_mask <= DMA_BIT_MASK(ARCH_ZONE_DMA_BITS))
>>           return GFP_DMA;
>> -    if (*phys_mask <= DMA_BIT_MASK(32))
>> -        return GFP_DMA32;
>> -    return 0;
>> +    return GFP_DMA32;
>>   }
>>   static bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size)
>>
>
> Ok, will apply that to the most recent non-rc kernel source and give it
> a try. Should take about 45 mins or so.

Looks like this patch is not enough or not related, a kernel v5.1.15
with that patch applied yields the following:

```
Linux version 5.1.15-dirty (root@rx2800-i2) (gcc version 7.3.0 (Gentoo
7.3.0-r3 p1.4)) #1 SMP Tue Jun 25 09:59:06 CEST 2019
EFI v2.10 by HP:
efi:  SALsystab=0xdfdd63a18  ACPI 2.0=0x3d3c4014  HCDP=0xdffff8798
SMBIOS=0x3d368000
booting generic kernel on platform dig
PCDP: v3 at 0xdffff8798
earlycon: uart8250 at I/O port 0x4000 (options '115200n8')
printk: bootconsole [uart8250] enabled
ACPI: Early table checksum verification disabled
ACPI: RSDP 0x000000003D3C4014 000024 (v02 HP    )
ACPI: XSDT 0x000000003D3C4580 000124 (v01 HP     RX2800-2 00000001
01000013)
[...]
Trying to unpack rootfs image as initramfs...
[...]
Detecting Adaptec I2O RAID controllers...
ahci 0000:00:1f.2: AHCI 0001.0200 32 slots 6 ports 3 Gbps 0x3f impl SATA
mode
ahci 0000:00:1f.2: flags: 64bit ncq sntf pm led clo pio slum part ccc ems
Unable to handle kernel NULL pointer dereference (address 0000000000001688)
swapper/0[1]: Oops 11012296146944 [1]
Modules linked in:

CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.1.15-dirty #1
Hardware name: hp Integrity rx2800 i2, BIOS 01.93 09/12/2012
psr : 00001210084a6010 ifs : 8000000000001734 ip  : [<a00000010017b901>]
    Not tainted (5.1.15-dirty)
ip is at __alloc_pages_nodemask+0x281/0x17a0
unat: 0000000000000000 pfs : 0000000000001734 rsc : 0000000000000003
rnat: 00000003d8598c41 bsps: 000000000001003e pr  : 0000000000011269
ldrs: 0000000000000000 ccv : 000000038d5f0ad4 fpsr: 0009804c8a70433f
csd : 0000000000000000 ssd : 0000000000000000
b0  : a00000010017b8c0 b6  : a00000010003a740 b7  : a0000001007fe990
f6  : 1003e0000000000000000 f7  : 1000fb27f800000000000
f8  : 1003e0000000000003480 f9  : 1003e000000000000000f
f10 : 1003e0000000000000400 f11 : 1003e0000000000003c00
r1  : a0000001015a9e80 r2  : a000000101339e94 r3  : 00000000007fffff
r8  : 0000000000001680 r9  : 0000000000002500 r10 : fffffffffffc04b8
r11 : e000000001519980 r12 : e000000d8339fce0 r13 : e000000d83398000
r14 : ffffffffffd90014 r15 : 0000000000000001 r16 : 0000000000000008
r17 : e000000001519990 r18 : 0000000000001680 r19 : 0000000000000000
r20 : 0000000000000000 r21 : 0000000000000000 r22 : 0000000000000000
r23 : 0000000000000000 r24 : ffffffffffd90000 r25 : a000000101339e80
r26 : 0000000000000000 r27 : 0000000000000000 r28 : 0000000000000000
r29 : 0000000000001688 r30 : 0000000000000000 r31 : 0000000000000081

Call Trace:
  [<a000000100013820>] show_stack+0x40/0x90
                                 sp=e000000d8339f930 bsp=e000000d833998c0
  [<a0000001000141a0>] show_regs+0x930/0x940
                                 sp=e000000d8339fb00 bsp=e000000d83399850
  [<a0000001000245e0>] die+0x1a0/0x2f0
                                 sp=e000000d8339fb00 bsp=e000000d83399810
  [<a00000010004bab0>] ia64_do_page_fault+0x7e0/0x9e0
                                 sp=e000000d8339fb00 bsp=e000000d83399778
  [<a00000010000c580>] ia64_leave_kernel+0x0/0x270
                                 sp=e000000d8339fb10 bsp=e000000d83399778
  [<a00000010017b900>] __alloc_pages_nodemask+0x280/0x17a0
                                 sp=e000000d8339fce0 bsp=e000000d833995d0
  [<a00000010010eab0>] __dma_direct_alloc_pages+0x190/0x320
                                 sp=e000000d8339fd50 bsp=e000000d83399550
  [<a00000010010ec70>] dma_direct_alloc_pages+0x30/0x170
                                 sp=e000000d8339fd50 bsp=e000000d83399510
  [<a00000010003a790>] arch_dma_alloc+0x30/0x50
                                 sp=e000000d8339fd50 bsp=e000000d833994d0
  [<a00000010010ef10>] dma_direct_alloc+0x60/0xa0
                                 sp=e000000d8339fd50 bsp=e000000d83399490
  [<a00000010010c570>] dma_alloc_attrs+0x150/0x1e0
                                 sp=e000000d8339fd50 bsp=e000000d83399440
  [<a00000010010c670>] dmam_alloc_attrs+0x70/0x100
                                 sp=e000000d8339fd50 bsp=e000000d833993e8
  [<a0000001009a9b90>] ahci_port_start+0x2e0/0x4a0
                                 sp=e000000d8339fd50 bsp=e000000d833993a0
  [<a000000100969460>] ata_host_start+0x300/0x460
                                 sp=e000000d8339fd60 bsp=e000000d83399340
  [<a0000001009758a0>] ata_host_activate+0x20/0x280
                                 sp=e000000d8339fd60 bsp=e000000d833992e0
  [<a0000001009aa070>] ahci_host_activate+0x320/0x330
                                 sp=e000000d8339fd60 bsp=e000000d83399270
  [<a0000001009a3410>] ahci_init_one+0x1a70/0x1e10
                                 sp=e000000d8339fd60 bsp=e000000d833991b8
  [<a0000001006df4b0>] local_pci_probe+0x90/0x140
                                 sp=e000000d8339fdc0 bsp=e000000d83399178
  [<a0000001006e09d0>] pci_device_probe+0x2f0/0x310
                                 sp=e000000d8339fdc0 bsp=e000000d83399140
  [<a00000010083a380>] really_probe+0x4a0/0x6b0
                                 sp=e000000d8339fde0 bsp=e000000d833990d8
  [<a00000010083aa40>] driver_probe_device+0x1e0/0x1f0
                                 sp=e000000d8339fde0 bsp=e000000d833990a0
  [<a00000010083aee0>] device_driver_attach+0xb0/0x100
                                 sp=e000000d8339fde0 bsp=e000000d83399070
  [<a00000010083b110>] __driver_attach+0x1e0/0x1f0
                                 sp=e000000d8339fde0 bsp=e000000d83399040
  [<a0000001008363d0>] bus_for_each_dev+0xd0/0x130
                                 sp=e000000d8339fde0 bsp=e000000d83399000
  [<a000000100839490>] driver_attach+0x40/0x60
                                 sp=e000000d8339fdf0 bsp=e000000d83398fd8
  [<a000000100838860>] bus_add_driver+0x3b0/0x450
                                 sp=e000000d8339fdf0 bsp=e000000d83398f88
  [<a00000010083c070>] driver_register+0x220/0x2b0
                                 sp=e000000d8339fdf0 bsp=e000000d83398f60
  [<a0000001006deb30>] __pci_register_driver+0xa0/0xc0
                                 sp=e000000d8339fdf0 bsp=e000000d83398f30
  [<a0000001011442d0>] ahci_pci_driver_init+0x50/0x70
                                 sp=e000000d8339fdf0 bsp=e000000d83398f18
  [<a00000010000a7d0>] do_one_initcall+0x100/0x2c0
                                 sp=e000000d8339fdf0 bsp=e000000d83398ee0
  [<a0000001010f9cc0>] kernel_init_freeable+0x410/0x470
                                 sp=e000000d8339fe30 bsp=e000000d83398e78
  [<a000000100ddd660>] kernel_init+0x20/0x280
                                 sp=e000000d8339fe30 bsp=e000000d83398e58
  [<a00000010000c370>] call_payload+0x50/0x80
                                 sp=e000000d8339fe30 bsp=e000000d83398e40
Disabling lock debugging due to kernel taint
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
---[ end Kernel panic - not syncing: Attempted to kill init!
exitcode=0x0000000b ]---
```

During compilation I noticed the following messages:

```
[...]
   CC      arch/ia64/kernel/dma-mapping.o
In file included from ./include/linux/cpumask.h:12:0,
                  from ./include/linux/rcupdate.h:31,
                  from ./include/linux/rculist.h:11,
                  from ./include/linux/pid.h:5,
                  from ./include/linux/sched.h:14,
                  from kernel/sched/sched.h:5,
                  from kernel/sched/core.c:8:
In function ‘bitmap_zero’,
     inlined from ‘cpumask_clear’ at ./include/linux/cpumask.h:390:2,
     inlined from ‘get_mmu_context’ at
./arch/ia64/include/asm/mmu_context.h:92:3,
     inlined from ‘activate_context’ at
./arch/ia64/include/asm/mmu_context.h:170:11,
     inlined from ‘activate_mm’ at
./arch/ia64/include/asm/mmu_context.h:194:2,
     inlined from ‘idle_task_exit’ at kernel/sched/core.c:5575:3:
./include/linux/bitmap.h:218:2: warning: ‘memset’ writing 8 bytes into a
region of size 0 overflows the destination [-Wstringop-overflow=]
   memset(dst, 0, len);
   ^~~~~~~~~~~~~~~~~~~
[...]
```

...though I can't say whether I have seen this before, as I don't check
the whole make output when it exits with 0.

Cheers,
Frank

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Regression in 543cea9a - was: Re: Kernel problem on rx2800 i2
  2019-06-25  6:40 Regression in 543cea9a - was: Re: Kernel problem on rx2800 i2 John Paul Adrian Glaubitz
                   ` (6 preceding siblings ...)
  2019-06-25  8:16 ` Frank Scheiner
@ 2019-06-25  8:18 ` Christoph Hellwig
  2019-06-25  8:38 ` Frank Scheiner
                   ` (20 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: Christoph Hellwig @ 2019-06-25  8:18 UTC (permalink / raw)
  To: linux-ia64

On Tue, Jun 25, 2019 at 10:16:22AM +0200, Frank Scheiner wrote:
> Looks like this patch is not enough or not related, a kernel v5.1.15
> with that patch applied yields the following:

Can you use gdb to disassemble the faulting address?

Something like:

gdb vmlinux

Then in gdb:

l *(__alloc_pages_nodemask+0x281)
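
The symbol+offset arithmetic behind that gdb expression can be sketched in
userspace C. The helper names are made up for illustration. It assumes the
usual ia64 convention that instruction bundles are 16 bytes and that the
"ip" printed in the oops is the bundle address plus the slot number, which
would explain why the call trace shows +0x280 while the ip line shows +0x281.

```c
#include <stdint.h>

/* Hypothetical helper type: an ia64 ip split into bundle address and slot. */
struct ia64_ip {
	uint64_t bundle;	/* 16-byte aligned address of the instruction bundle */
	unsigned int slot;	/* instruction slot within the bundle (0..2) */
};

/* Split an ip value as printed in the oops into bundle address and slot. */
struct ia64_ip split_ip(uint64_t ip)
{
	struct ia64_ip r;

	r.bundle = ip & ~(uint64_t)0xf;
	r.slot = (unsigned int)(ip & 0xf);
	return r;
}

/* Start address of the function, given "ip is at symbol+offset". */
uint64_t symbol_base(uint64_t ip, uint64_t offset)
{
	return ip - offset;
}
```

With the values from the oops, symbol_base(0xa00000010017b901, 0x281) gives
0xa00000010017b680, and split_ip() on the same ip gives bundle
0xa00000010017b900, slot 1.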

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Regression in 543cea9a - was: Re: Kernel problem on rx2800 i2
  2019-06-25  6:40 Regression in 543cea9a - was: Re: Kernel problem on rx2800 i2 John Paul Adrian Glaubitz
                   ` (7 preceding siblings ...)
  2019-06-25  8:18 ` Christoph Hellwig
@ 2019-06-25  8:38 ` Frank Scheiner
  2019-06-25  9:30 ` Frank Scheiner
                   ` (19 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: Frank Scheiner @ 2019-06-25  8:38 UTC (permalink / raw)
  To: linux-ia64

On 6/25/19 10:18, Christoph Hellwig wrote:
> On Tue, Jun 25, 2019 at 10:16:22AM +0200, Frank Scheiner wrote:
>> Looks like this patch is not enough or not related, a kernel v5.1.15
>> with that patch applied yields the following:
>
> Can you use gdb to disassemble the faulting address?
>
> Something like:
>
> gdb vmlinux
>
> Then in gdb:
>
> l *(__alloc_pages_nodemask+0x281)

Will do. I didn't have gdb installed so it might take some time to
emerge it. Will report back with the requested information then.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Regression in 543cea9a - was: Re: Kernel problem on rx2800 i2
  2019-06-25  6:40 Regression in 543cea9a - was: Re: Kernel problem on rx2800 i2 John Paul Adrian Glaubitz
                   ` (8 preceding siblings ...)
  2019-06-25  8:38 ` Frank Scheiner
@ 2019-06-25  9:30 ` Frank Scheiner
  2019-06-25 10:32 ` Christoph Hellwig
                   ` (18 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: Frank Scheiner @ 2019-06-25  9:30 UTC (permalink / raw)
  To: linux-ia64

On 6/25/19 10:38, Frank Scheiner wrote:
> On 6/25/19 10:18, Christoph Hellwig wrote:
>> On Tue, Jun 25, 2019 at 10:16:22AM +0200, Frank Scheiner wrote:
>>> Looks like this patch is not enough or not related, a kernel v5.1.15
>>> with that patch applied yields the following:
>>
>> Can you use gdb to disassemble the faulting address?
>>
>> Something like:
>>
>> gdb vmlinux
>>
>> Then in gdb:
>>
>> l *(__alloc_pages_nodemask+0x281)
>
> Will do. I didn't have gdb installed so it might take some time to
> emerge it. Will report back with the requested information then.

Here's what I get:

```
# gdb ./vmlinux
GNU gdb (Gentoo 8.1 p1) 8.1
[...]
Reading symbols from ./vmlinux...done.
[...]
(gdb) l *(__alloc_pages_nodemask+0x281)
0xa00000010017b901 is in __alloc_pages_nodemask (./include/linux/mmzone.h:993).
988	 */
989	static __always_inline struct zoneref *next_zones_zonelist(struct zoneref *z,
990						enum zone_type highest_zoneidx,
991						nodemask_t *nodes)
992	{
993		if (likely(!nodes && zonelist_zone_idx(z) <= highest_zoneidx))
994			return z;
995		return __next_zones_zonelist(z, highest_zoneidx, nodes);
996	}
997
```

Sorry, it took longer than expected, as I was compiling in a ramdisk and
I once again forgot to save that state **before** the reboot with the
v5.1.15 kernel. So I had to recompile the kernel, too (the faulting
address stays the same with the newly compiled kernel!). :-/ But maybe
that was needed anyhow, as my original `.config` had `CONFIG_DEBUG_INFO`
unset.
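
One possible reading of the small fault address (0x1688) together with the
gdb listing is a member access through a NULL base pointer: the fault
address is then just the offset of the member. The sketch below illustrates
that arithmetic only. The struct layout is hypothetical, with padding picked
so that the numbers line up with r8 = 0x1680 and the fault address in the
oops; the real layout for this kernel build is not shown in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a zoneref: a 64-bit pointer followed by an index. */
struct toy_zoneref {
	uint64_t zone;		/* models a 64-bit "struct zone *" */
	int zone_idx;		/* the member read by zonelist_zone_idx() */
};

/* Hypothetical per-node structure; only the offsets matter here. */
struct toy_node {
	char earlier_members[0x1680];	/* padding chosen to match r8 in the oops */
	struct toy_zoneref zonerefs[4];
};

/* Address touched when reading zonerefs[0].zone_idx through 'base'. */
uintptr_t fault_address(uintptr_t base)
{
	return base + offsetof(struct toy_node, zonerefs[0].zone_idx);
}
```

With a NULL base, fault_address(0) is 0x1688 in this toy layout, which is
why a fault at such a low address usually points at a NULL structure
pointer rather than at a wild one.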

Cheers,
Frank

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Regression in 543cea9a - was: Re: Kernel problem on rx2800 i2
  2019-06-25  6:40 Regression in 543cea9a - was: Re: Kernel problem on rx2800 i2 John Paul Adrian Glaubitz
                   ` (9 preceding siblings ...)
  2019-06-25  9:30 ` Frank Scheiner
@ 2019-06-25 10:32 ` Christoph Hellwig
  2019-06-25 10:46 ` Frank Scheiner
                   ` (17 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: Christoph Hellwig @ 2019-06-25 10:32 UTC (permalink / raw)
  To: linux-ia64

Thanks Frank.

It seems like there is something odd going on with the zonelist
on your system.  Maybe there is no actual ZONE_DMA32, or something
is messed up with the NUMA node setup.  Below is a band aid patch to
try theory number two above:


diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index fcdb23e8d2fc..8e3f7b8bdb33 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -119,7 +119,7 @@ struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
 		}
 	}
 	if (!page)
-		page = alloc_pages_node(dev_to_node(dev), gfp, page_order);
+		page = alloc_pages(gfp, page_order);
 
 	if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
 		__free_pages(page, page_order);
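
As a toy illustration of theory number two, the difference between asking
for a specific node and letting the allocator choose can be modelled in
userspace C. Everything here is hypothetical: the node table, the toy_*
names and the assumption that the device reports a node whose data was
never set up. The thread does not establish what dev_to_node() returns on
this machine.

```c
#include <stddef.h>

#define TOY_MAX_NODES	4
#define TOY_NO_NODE	(-1)

/* Hypothetical per-node data; a NULL table entry models a node that was never set up. */
struct toy_node_data {
	int has_zonelist;
};

static struct toy_node_data toy_node0 = { 1 };
static struct toy_node_data *toy_nodes[TOY_MAX_NODES] = {
	&toy_node0, NULL, NULL, NULL
};

/*
 * Models a node-specific allocation. Returns the node that served the
 * request, or -1 where the real kernel would dereference missing node data.
 */
int toy_alloc_on_node(int nid)
{
	if (nid == TOY_NO_NODE)
		nid = 0;	/* fall back to the local node, assumed to be node 0 */
	if (nid < 0 || nid >= TOY_MAX_NODES || toy_nodes[nid] == NULL)
		return -1;
	return nid;
}

/* Models an allocation without a node hint, as in the band aid patch. */
int toy_alloc_any(void)
{
	return toy_alloc_on_node(TOY_NO_NODE);
}
```

In this model toy_alloc_on_node(1) fails while toy_alloc_any() succeeds on
node 0, at the cost of no longer honouring the device's node.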

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: Regression in 543cea9a - was: Re: Kernel problem on rx2800 i2
  2019-06-25  6:40 Regression in 543cea9a - was: Re: Kernel problem on rx2800 i2 John Paul Adrian Glaubitz
                   ` (10 preceding siblings ...)
  2019-06-25 10:32 ` Christoph Hellwig
@ 2019-06-25 10:46 ` Frank Scheiner
  2019-06-25 10:47 ` Christoph Hellwig
                   ` (16 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: Frank Scheiner @ 2019-06-25 10:46 UTC (permalink / raw)
  To: linux-ia64



On 6/25/19 12:32, Christoph Hellwig wrote:
> Thanks Frank.
>
> It seems like there is something odd going on with the zonelist
> on your system.  Maybe there is no actual ZONE_DMA32, or something
> is messed up with the NUMA node setup.

Do you suspect a firmware issue? Because the firmware of that machine is
actually quite old (the model was retired in 2015):

```
***********************************************************
* ROM Version : 01.93
* ROM Date    : Wed Sep 12 22:10:03 PDT 2012
***********************************************************
```

...but since HP does not seem to consider firmware part of the hardware,
which the manufacturer should "correct" in any case without asking for
large amounts of money, I have no means to upgrade it.

>  Below is a band aid patch to
> try theory number two above:
>
>
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index fcdb23e8d2fc..8e3f7b8bdb33 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -119,7 +119,7 @@ struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
>   		}
>   	}
>   	if (!page)
> -		page = alloc_pages_node(dev_to_node(dev), gfp, page_order);
> +		page = alloc_pages(gfp, page_order);
>
>   	if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
>   		__free_pages(page, page_order);
>

Ok, will try that patch - actually (1) in addition or (2) without the
first one?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Regression in 543cea9a - was: Re: Kernel problem on rx2800 i2
  2019-06-25  6:40 Regression in 543cea9a - was: Re: Kernel problem on rx2800 i2 John Paul Adrian Glaubitz
                   ` (11 preceding siblings ...)
  2019-06-25 10:46 ` Frank Scheiner
@ 2019-06-25 10:47 ` Christoph Hellwig
  2019-06-25 11:19 ` Frank Scheiner
                   ` (15 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: Christoph Hellwig @ 2019-06-25 10:47 UTC (permalink / raw)
  To: linux-ia64

On Tue, Jun 25, 2019 at 12:46:39PM +0200, Frank Scheiner wrote:
> Do you suspect a firmware issue? Because the firmware of that machine is
> actually quite old (the model was retired in 2015):

No, probably something in the Linux ia64-specific code.

>>   	if (!page)
>> -		page = alloc_pages_node(dev_to_node(dev), gfp, page_order);
>> +		page = alloc_pages(gfp, page_order);
>>
>>   	if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
>>   		__free_pages(page, page_order);
>>
>
> Ok, will try that patch - actually (1) in addition or (2) without the
> first one?

Instead.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Regression in 543cea9a - was: Re: Kernel problem on rx2800 i2
  2019-06-25  6:40 Regression in 543cea9a - was: Re: Kernel problem on rx2800 i2 John Paul Adrian Glaubitz
                   ` (12 preceding siblings ...)
  2019-06-25 10:47 ` Christoph Hellwig
@ 2019-06-25 11:19 ` Frank Scheiner
  2019-06-25 11:21 ` John Paul Adrian Glaubitz
                   ` (14 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: Frank Scheiner @ 2019-06-25 11:19 UTC (permalink / raw)
  To: linux-ia64



On 6/25/19 12:47, Christoph Hellwig wrote:
> On Tue, Jun 25, 2019 at 12:46:39PM +0200, Frank Scheiner wrote:
>> Do you suspect a firmware issue? Because the firmware of that machine is
>> actually quite old (the model was retired in 2015):
>
> No, probably something in the Linux ia64-specific code.
>
>>>    	if (!page)
>>> -		page = alloc_pages_node(dev_to_node(dev), gfp, page_order);
>>> +		page = alloc_pages(gfp, page_order);
>>>
>>>    	if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
>>>    		__free_pages(page, page_order);
>>>
>>
>> Ok, will try that patch - actually (1) in addition or (2) without the
>> first one?
>
> Instead.

Ok, that looks much better now with the second patch:

```
Linux version 5.1.15-dirty (root@rx2800-i2) (gcc version 7.3.0 (Gentoo
7.3.0-r3 p1.4)) #2 SMP Tue Jun 25 13:11:38 CEST 2019
EFI v2.10 by HP:
efi:  SALsystab=0xdfdd63a18  ACPI 2.0=0x3d3c4014  HCDP=0xdffff8798
SMBIOS=0x3d368000
booting generic kernel on platform dig
PCDP: v3 at 0xdffff8798
earlycon: uart8250 at I/O port 0x4000 (options '115200n8')
printk: bootconsole [uart8250] enabled
ACPI: Early table checksum verification disabled
ACPI: RSDP 0x000000003D3C4014 000024 (v02 HP    )
ACPI: XSDT 0x000000003D3C4580 000124 (v01 HP     RX2800-2 00000001
01000013)
[...]
Trying to unpack rootfs image as initramfs...
[...]
Detecting Adaptec I2O RAID controllers...
ahci 0000:00:1f.2: AHCI 0001.0200 32 slots 6 ports 3 Gbps 0x3f impl SATA
mode
ahci 0000:00:1f.2: flags: 64bit ncq sntf pm led clo pio slum part ccc ems
scsi host0: ahci
scsi host1: ahci
scsi host2: ahci
scsi host3: ahci
scsi host4: ahci
scsi host5: ahci
[...]
INIT: version 2.93 booting

    OpenRC 0.41.2 is starting up Gentoo Linux (ia64)
[...]
This is rx2800-i2.[...] (Linux ia64 5.1.15-dirty) 13:23:57

rx2800-i2 login:
```

...even after a second reboot for verification. Great!

I assume this won't affect UMA Itaniums or should I check on one of my
other Integrities if this change breaks the kernel on them?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Regression in 543cea9a - was: Re: Kernel problem on rx2800 i2
  2019-06-25  6:40 Regression in 543cea9a - was: Re: Kernel problem on rx2800 i2 John Paul Adrian Glaubitz
                   ` (13 preceding siblings ...)
  2019-06-25 11:19 ` Frank Scheiner
@ 2019-06-25 11:21 ` John Paul Adrian Glaubitz
  2019-06-25 12:00 ` Christoph Hellwig
                   ` (13 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: John Paul Adrian Glaubitz @ 2019-06-25 11:21 UTC (permalink / raw)
  To: linux-ia64

On 6/25/19 1:19 PM, Frank Scheiner wrote:
>> Instead.
> 
> Ok, that looks much better now with the second patch:
> ...even after a second reboot for verification. Great!
> (...)
> I assume this won't affect UMA Itaniums or should I check on one of my
> other Integrities if this change breaks the kernel on them?

Nice! I just assume we won't be able to use the patch "as is" as it would
potentially break other architectures if I'm not mistaken.

Adrian

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Regression in 543cea9a - was: Re: Kernel problem on rx2800 i2
  2019-06-25  6:40 Regression in 543cea9a - was: Re: Kernel problem on rx2800 i2 John Paul Adrian Glaubitz
                   ` (14 preceding siblings ...)
  2019-06-25 11:21 ` John Paul Adrian Glaubitz
@ 2019-06-25 12:00 ` Christoph Hellwig
  2019-06-25 12:08 ` Frank Scheiner
                   ` (12 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: Christoph Hellwig @ 2019-06-25 12:00 UTC (permalink / raw)
  To: linux-ia64

On Tue, Jun 25, 2019 at 01:21:38PM +0200, John Paul Adrian Glaubitz wrote:
> > Ok, that looks much better now with the second patch:
> > ...even after a second reboot for verification. Great!
> > (...)
> > I assume this won't affect UMA Itaniums or should I check on one of my
> > other Integrities if this change breaks the kernel on them?
> 
> Nice! I just assume we won't be able to use the patch "as is" as it would
> potentially break other architectures if I'm not mistaken.

It doesn't actually _break_ anything, but it regresses in not doing
node local allocations.  Give me some time to dig through the ia64
code to figure out if I can make sense of this.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Regression in 543cea9a - was: Re: Kernel problem on rx2800 i2
  2019-06-25  6:40 Regression in 543cea9a - was: Re: Kernel problem on rx2800 i2 John Paul Adrian Glaubitz
                   ` (15 preceding siblings ...)
  2019-06-25 12:00 ` Christoph Hellwig
@ 2019-06-25 12:08 ` Frank Scheiner
  2019-06-25 14:40 ` Christoph Hellwig
                   ` (11 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: Frank Scheiner @ 2019-06-25 12:08 UTC (permalink / raw)
  To: linux-ia64



On 6/25/19 14:00, Christoph Hellwig wrote:
> On Tue, Jun 25, 2019 at 01:21:38PM +0200, John Paul Adrian Glaubitz wrote:
>>> Ok, that looks much better now with the second patch:
>>> ...even after a second reboot for verification. Great!
>>> (...)
>>> I assume this won't affect UMA Itaniums or should I check on one of my
>>> other Integrities if this change breaks the kernel on them?
>>
>> Nice! I just assume we won't be able to use the patch "as is" as it would
>> potentially break other architectures if I'm not mistaken.
>
> It doesn't actually _break_ anything, but it regresses in not doing
> node local allocations.  Give me some time to dig through the ia64
> code to figure out if I can make sense of this.

Thanks for your help and support. I'm happy to test what you come up with.

Cheers,
Frank

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Regression in 543cea9a - was: Re: Kernel problem on rx2800 i2
  2019-06-25  6:40 Regression in 543cea9a - was: Re: Kernel problem on rx2800 i2 John Paul Adrian Glaubitz
                   ` (16 preceding siblings ...)
  2019-06-25 12:08 ` Frank Scheiner
@ 2019-06-25 14:40 ` Christoph Hellwig
  2019-06-25 15:52 ` Frank Scheiner
                   ` (10 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: Christoph Hellwig @ 2019-06-25 14:40 UTC (permalink / raw)
  To: linux-ia64

Please try this patch instead of the previous one:

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 2c2772e9702a..3516a543450e 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -118,9 +118,10 @@ struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
 			page = NULL;
 		}
 	}
-	if (!page)
-		page = alloc_pages_node(dev_to_node(dev), gfp, page_order);
-
+	if (!page) {
+		page = alloc_pages_node(local_memory_node(dev_to_node(dev)),
+					gfp, page_order);
+	}
 	if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
 		__free_pages(page, page_order);
 		page = NULL;

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: Regression in 543cea9a - was: Re: Kernel problem on rx2800 i2
  2019-06-25  6:40 Regression in 543cea9a - was: Re: Kernel problem on rx2800 i2 John Paul Adrian Glaubitz
                   ` (17 preceding siblings ...)
  2019-06-25 14:40 ` Christoph Hellwig
@ 2019-06-25 15:52 ` Frank Scheiner
  2019-06-28  6:26 ` Christoph Hellwig
                   ` (9 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: Frank Scheiner @ 2019-06-25 15:52 UTC (permalink / raw)
  To: linux-ia64



On 6/25/19 16:40, Christoph Hellwig wrote:
> Please try this patch instead of the previous one:
>
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index 2c2772e9702a..3516a543450e 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -118,9 +118,10 @@ struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
>   			page = NULL;
>   		}
>   	}
> -	if (!page)
> -		page = alloc_pages_node(dev_to_node(dev), gfp, page_order);
> -
> +	if (!page) {
> +		page = alloc_pages_node(local_memory_node(dev_to_node(dev)),
> +					gfp, page_order);
> +	}
>   	if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
>   		__free_pages(page, page_order);
>   		page = NULL;
>

Took me a while as I lost two tries because of two problems, after each
of which the machine was no longer responsive: once during recompilation
of the changed files and once during installation of kernel modules.

This is what I saw, not sure if it is related to the changes or the
newer kernel version, but I can't remember seeing such messages before:

```
## 1st problem:

BUG: Bad page state in process kworker/u33:1  pfn:36304b
bad because of flags: 0x800(arch_1)

## 2nd problem:

BUG: Bad page state in process kworker/u32:5  pfn:3630f7
bad because of flags: 0x800(arch_1)
```

Using the v4.19.37 with the reverts mentioned in the initial mail I was
able to create the new kernel, install the kernel modules and build the
initramfs.

Using the third patch the resulting kernel sadly panics again:
```
Linux version 5.1.15-dirty (root@rx2800-i2) (gcc version 7.3.0 (Gentoo
7.3.0-r3 p1.4)) #3 SMP Tue Jun 25 17:41:55 CEST 2019
EFI v2.10 by HP:
efi:  SALsystab=0xdfdd63a18  ACPI 2.0=0x3d3c4014  HCDP=0xdffff8798
SMBIOS=0x3d368000
booting generic kernel on platform dig
PCDP: v3 at 0xdffff8798
earlycon: uart8250 at I/O port 0x4000 (options '115200n8')
printk: bootconsole [uart8250] enabled
ACPI: Early table checksum verification disabled
ACPI: RSDP 0x000000003D3C4014 000024 (v02 HP    )
ACPI: XSDT 0x000000003D3C4580 000124 (v01 HP     RX2800-2 00000001
01000013)
[...]
Trying to unpack rootfs image as initramfs...
[...]
Detecting Adaptec I2O RAID controllers...
ahci 0000:00:1f.2: AHCI 0001.0200 32 slots 6 ports 3 Gbps 0x3f impl SATA
mode
ahci 0000:00:1f.2: flags: 64bit ncq sntf pm led clo pio slum part ccc ems
Unable to handle kernel NULL pointer dereference (address 0000000000001688)
swapper/0[1]: Oops 11012296146944 [1]
Modules linked in:

CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.1.15-dirty #3
Hardware name: hp Integrity rx2800 i2, BIOS 01.93 09/12/2012
psr : 00001210084a6010 ifs : 8000000000000207 ip  : [<a00000010017e591>]
    Not tainted (5.1.15-dirty)
ip is at local_memory_node+0x51/0xd0
unat: 0000000000000000 pfs : 0000000000000814 rsc : 0000000000000003
rnat: 4905ad66a46b1a31 bsps: 6330dc59462bf692 pr  : 000000000001aa55
ldrs: 0000000000000000 ccv : 000000038df5dd8b fpsr: 0009804c8a70433f
csd : 0000000000000000 ssd : 0000000000000000
b0  : a00000010010ea70 b6  : a00000010003a740 b7  : a0000001007fe9b0
f6  : 1003e00000000000164ff f7  : 1000fb27f800000000000
f8  : 1003e0000000000003480 f9  : 1003e000000000000000f
f10 : 1003e0000000000000400 f11 : 1003e0000000000003c00
r1  : a0000001015a9e80 r2  : e000000001519980 r3  : e000000001519988
r8  : 0000000000000008 r9  : e000000001519990 r10 : 0000000000000000
r11 : 0000000000001688 r12 : e000000d8339fd50 r13 : e000000d83398000
r14 : fffffffffffc04b8 r15 : 0000000000000000 r16 : ffffffffffffffff
r17 : ffffffffffffffff r18 : 0000000000ffffff r19 : e000000d80010180
r20 : fffffffffffd01b0 r21 : 0000000000000010 r22 : e0000000011101b0
r23 : 0000000000000001 r24 : e0000000011101bc r25 : 0000000000000001
r26 : 000000000000006c r27 : e000000d846679d0 r28 : e000000d846679c0
r29 : 0000000000000370 r30 : 0000000000000000 r31 : 0000000000000081

Call Trace:
  [<a000000100013820>] show_stack+0x40/0x90
                                 sp=e000000d8339f9a0 bsp=e000000d83399750
  [<a0000001000141a0>] show_regs+0x930/0x940
                                 sp=e000000d8339fb70 bsp=e000000d833996e0
  [<a0000001000245e0>] die+0x1a0/0x2f0
                                 sp=e000000d8339fb70 bsp=e000000d833996a0
  [<a00000010004bab0>] ia64_do_page_fault+0x7e0/0x9e0
                                 sp=e000000d8339fb70 bsp=e000000d83399610
  [<a00000010000c580>] ia64_leave_kernel+0x0/0x270
                                 sp=e000000d8339fb80 bsp=e000000d83399610
  [<a00000010017e590>] local_memory_node+0x50/0xd0
                                 sp=e000000d8339fd50 bsp=e000000d833995d0
  [<a00000010010ea70>] __dma_direct_alloc_pages+0x150/0x340
                                 sp=e000000d8339fd50 bsp=e000000d83399550
  [<a00000010010ec90>] dma_direct_alloc_pages+0x30/0x170
                                 sp=e000000d8339fd50 bsp=e000000d83399510
  [<a00000010003a790>] arch_dma_alloc+0x30/0x50
                                 sp=e000000d8339fd50 bsp=e000000d833994d0
  [<a00000010010ef30>] dma_direct_alloc+0x60/0xa0
                                 sp=e000000d8339fd50 bsp=e000000d83399490
  [<a00000010010c570>] dma_alloc_attrs+0x150/0x1e0
                                 sp=e000000d8339fd50 bsp=e000000d83399440
  [<a00000010010c670>] dmam_alloc_attrs+0x70/0x100
                                 sp=e000000d8339fd50 bsp=e000000d833993e8
  [<a0000001009a9bb0>] ahci_port_start+0x2e0/0x4a0
                                 sp=e000000d8339fd50 bsp=e000000d833993a0
  [<a000000100969480>] ata_host_start+0x300/0x460
                                 sp=e000000d8339fd60 bsp=e000000d83399340
  [<a0000001009758c0>] ata_host_activate+0x20/0x280
                                 sp=e000000d8339fd60 bsp=e000000d833992e0
  [<a0000001009aa090>] ahci_host_activate+0x320/0x330
                                 sp=e000000d8339fd60 bsp=e000000d83399270
  [<a0000001009a3430>] ahci_init_one+0x1a70/0x1e10
                                 sp=e000000d8339fd60 bsp=e000000d833991b8
  [<a0000001006df4d0>] local_pci_probe+0x90/0x140
                                 sp=e000000d8339fdc0 bsp=e000000d83399178
  [<a0000001006e09f0>] pci_device_probe+0x2f0/0x310
                                 sp=e000000d8339fdc0 bsp=e000000d83399140
  [<a00000010083a3a0>] really_probe+0x4a0/0x6b0
                                 sp=e000000d8339fde0 bsp=e000000d833990d8
  [<a00000010083aa60>] driver_probe_device+0x1e0/0x1f0
                                 sp=e000000d8339fde0 bsp=e000000d833990a0
  [<a00000010083af00>] device_driver_attach+0xb0/0x100
                                 sp=e000000d8339fde0 bsp=e000000d83399070
  [<a00000010083b130>] __driver_attach+0x1e0/0x1f0
                                 sp=e000000d8339fde0 bsp=e000000d83399040
  [<a0000001008363f0>] bus_for_each_dev+0xd0/0x130
                                 sp=e000000d8339fde0 bsp=e000000d83399000
  [<a0000001008394b0>] driver_attach+0x40/0x60
                                 sp=e000000d8339fdf0 bsp=e000000d83398fd8
  [<a000000100838880>] bus_add_driver+0x3b0/0x450
                                 sp=e000000d8339fdf0 bsp=e000000d83398f88
  [<a00000010083c090>] driver_register+0x220/0x2b0
                                 sp=e000000d8339fdf0 bsp=e000000d83398f60
  [<a0000001006deb50>] __pci_register_driver+0xa0/0xc0
                                 sp=e000000d8339fdf0 bsp=e000000d83398f30
  [<a0000001011442d0>] ahci_pci_driver_init+0x50/0x70
                                 sp=e000000d8339fdf0 bsp=e000000d83398f18
  [<a00000010000a7d0>] do_one_initcall+0x100/0x2c0
                                 sp=e000000d8339fdf0 bsp=e000000d83398ee0
  [<a0000001010f9cc0>] kernel_init_freeable+0x410/0x470
                                 sp=e000000d8339fe30 bsp=e000000d83398e78
  [<a000000100ddd680>] kernel_init+0x20/0x280
                                 sp=e000000d8339fe30 bsp=e000000d83398e58
  [<a00000010000c370>] call_payload+0x50/0x80
                                 sp=e000000d8339fe30 bsp=e000000d83398e40
Disabling lock debugging due to kernel taint
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
---[ end Kernel panic - not syncing: Attempted to kill init!
exitcode=0x0000000b ]---
```

gdb shows the same as last time for the "new" faulting address:

```
# gdb ./vmlinux
[...]
(gdb) l *(local_memory_node+0x51)
0xa00000010017e591 is in local_memory_node (./include/linux/mmzone.h:993).
988	 */
989	static __always_inline struct zoneref *next_zones_zonelist(struct
zoneref *z,
990						enum zone_type highest_zoneidx,
991						nodemask_t *nodes)
992	{
993		if (likely(!nodes && zonelist_zone_idx(z) <= highest_zoneidx))
994			return z;
995		return __next_zones_zonelist(z, highest_zoneidx, nodes);
996	}
997
```

Cheers,
Frank

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Regression in 543cea9a - was: Re: Kernel problem on rx2800 i2
  2019-06-25  6:40 Regression in 543cea9a - was: Re: Kernel problem on rx2800 i2 John Paul Adrian Glaubitz
                   ` (18 preceding siblings ...)
  2019-06-25 15:52 ` Frank Scheiner
@ 2019-06-28  6:26 ` Christoph Hellwig
  2019-06-28  7:35 ` Frank Scheiner
                   ` (8 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: Christoph Hellwig @ 2019-06-28  6:26 UTC (permalink / raw)
  To: linux-ia64

Btw, as a workaround you could try to disable CONFIG_NUMA in
your .config - as far as I can tell the rx2800 i2 is not actually
a NUMA system.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Regression in 543cea9a - was: Re: Kernel problem on rx2800 i2
  2019-06-25  6:40 Regression in 543cea9a - was: Re: Kernel problem on rx2800 i2 John Paul Adrian Glaubitz
                   ` (19 preceding siblings ...)
  2019-06-28  6:26 ` Christoph Hellwig
@ 2019-06-28  7:35 ` Frank Scheiner
  2019-08-05  7:10 ` Christoph Hellwig
                   ` (7 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: Frank Scheiner @ 2019-06-28  7:35 UTC (permalink / raw)
  To: linux-ia64

On 6/28/19 08:26, Christoph Hellwig wrote:
> Btw, as a workaround you could try to disable CONFIG_NUMA in
> your .config - as far as I can tell the rx2800 i2 is not actually
> a NUMA system.

Actually I think it is a NUMA system. Maybe it's not acting like one in
my configuration, as I only have a single processor installed. But the
Tukwila has its memory controller integrated into the processor and
interconnects with other processors or chipset components via QPI like
the Nehalem. This can be compared to Opterons and HyperTransport.

Cheers,
Frank

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Regression in 543cea9a - was: Re: Kernel problem on rx2800 i2
  2019-06-25  6:40 Regression in 543cea9a - was: Re: Kernel problem on rx2800 i2 John Paul Adrian Glaubitz
                   ` (20 preceding siblings ...)
  2019-06-28  7:35 ` Frank Scheiner
@ 2019-08-05  7:10 ` Christoph Hellwig
  2019-08-05  8:16 ` Frank Scheiner
                   ` (6 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: Christoph Hellwig @ 2019-08-05  7:10 UTC (permalink / raw)
  To: linux-ia64

Seems like we dropped the ball on this..

Did I give you a patch like this (for 5.2 and probably earlier, won't
apply to 5.3-rc) to test before as that is another idea?

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 2c2772e9702a..e471158c7c6e 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -119,7 +119,8 @@ struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
 		}
 	}
 	if (!page)
-		page = alloc_pages_node(dev_to_node(dev), gfp, page_order);
+		page = alloc_pages_node(local_memory_node(dev_to_node(dev)),
+				gfp, page_order);
 
 	if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
 		__free_pages(page, page_order);

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: Regression in 543cea9a - was: Re: Kernel problem on rx2800 i2
  2019-06-25  6:40 Regression in 543cea9a - was: Re: Kernel problem on rx2800 i2 John Paul Adrian Glaubitz
                   ` (21 preceding siblings ...)
  2019-08-05  7:10 ` Christoph Hellwig
@ 2019-08-05  8:16 ` Frank Scheiner
  2020-08-05 19:43 ` John Paul Adrian Glaubitz
                   ` (5 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: Frank Scheiner @ 2019-08-05  8:16 UTC (permalink / raw)
  To: linux-ia64

On 8/5/19 09:10, Christoph Hellwig wrote:
> Seems like we dropped the ball on this..

I still need to test the possible "disable CONFIG_NUMA" workaround. If
that works for my single processor rx2800 i2 it could be a good
workaround for now, as I assume the older Itanium systems (<= Montvale)
won't be affected by such a config change unless they're using those
ccNUMA sx1000/sx2000 chipsets.

>
> Did I give you a patch like this (for 5.2 and probably earlier, won't
> apply to 5.3-rc) to test before as that is another idea?
>
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index 2c2772e9702a..e471158c7c6e 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -119,7 +119,8 @@ struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
>   		}
>   	}
>   	if (!page)
> -		page = alloc_pages_node(dev_to_node(dev), gfp, page_order);
> +		page = alloc_pages_node(local_memory_node(dev_to_node(dev)),
> +				gfp, page_order);
>
>   	if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
>   		__free_pages(page, page_order);
>

It's not the exact same patch as from [1], but the resulting code is
identical if I didn't make an error.

Cheers,
Frank

[1]: https://marc.info/?l=linux-ia64&m=156147364328197&w=2

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Regression in 543cea9a - was: Re: Kernel problem on rx2800 i2
  2019-06-25  6:40 Regression in 543cea9a - was: Re: Kernel problem on rx2800 i2 John Paul Adrian Glaubitz
                   ` (22 preceding siblings ...)
  2019-08-05  8:16 ` Frank Scheiner
@ 2020-08-05 19:43 ` John Paul Adrian Glaubitz
  2020-08-05 20:27 ` Jessica Clarke
                   ` (4 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: John Paul Adrian Glaubitz @ 2020-08-05 19:43 UTC (permalink / raw)
  To: linux-ia64

Hi Christoph!

On 8/5/19 9:10 AM, Christoph Hellwig wrote:
> Seems like we dropped the ball on this..
> 
> Did I give you a patch like this (for 5.2 and probably earlier, won't
> apply to 5.3-rc) to test before as that is another idea?
> 
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index 2c2772e9702a..e471158c7c6e 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -119,7 +119,8 @@ struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
>  		}
>  	}
>  	if (!page)
> -		page = alloc_pages_node(dev_to_node(dev), gfp, page_order);
> +		page = alloc_pages_node(local_memory_node(dev_to_node(dev)),
> +				gfp, page_order);
>  
>  	if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
>  		__free_pages(page, page_order);

I just applied this patch on top of 4.19.137 and it's crashing with this trace
when trying to load the "hpsa" module. It definitely looks like an issue with
dma_direct_alloc():

[    2.352364] HP HPSA Driver (v 3.4.20-125)
[    2.386832] hpsa 0000:02:00.0: Logical aborts not supported
[    2.420644] hpsa 0000:02:00.0: HP SSD Smart Path aborts not supported
[    2.482838] Unable to handle kernel NULL pointer dereference (address 0000000000001688)
[    2.531221] swapper/0[1]: Oops 11012296146944 [1]
[    2.535221] Modules linked in:
[    2.535221]
[    2.535221] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.19.137-dirty #7
[    2.535221] Hardware name: hp Integrity BL870c i4 nPar, BIOS 02.64 03/03/2016
[    2.535221] psr : 00001210084a6010 ifs : 8000000000000207 ip  : [<a00000010019acc1>]    Not tainted (4.19.137-dirty)
[    2.535221] ip is at local_memory_node+0x51/0xd0
[    2.535221] unat: 0000000000000000 pfs : 0000000000000793 rsc : 0000000000000003
[    2.535221] rnat: c00000005805cc60 bsps: 0000000000000000 pr  : a6a6aa956aaa9959
[    2.535221] ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70433f
[    2.535221] csd : 0000000000000000 ssd : 0000000000000000
[    2.535221] b0  : a00000010011aab0 b6  : a00000010011c870 b7  : a00000010003a5e0
[    2.535221] f6  : 10012bffff00000000000 f7  : 1003e00000000000bffff
[    2.535221] f8  : 1003e0000000000002560 f9  : 1003e0000000000000054
[    2.535221] f10 : 1003e00000000000000c0 f11 : 1003e0000000000003f00
[    2.535221] r1  : a0000001013dd060 r2  : e000000001919980 r3  : e000000001919988
[    2.535221] r8  : 0000000000000008 r9  : e000000001919990 r10 : 0000000000000000
[    2.535221] r11 : 0000000000001688 r12 : e000000f8363fcc0 r13 : e000000f83638000
[    2.535221] r14 : fffffffffffc04b8 r15 : 00000000ffffffff r16 : 0000000000ffffff
[    2.535221] r17 : 0000000000000000 r18 : 0000000000000000 r19 : 0000000000000000
[    2.535221] r20 : fffffffffffcdd50 r21 : 0000000000000010 r22 : e00000000114dd50
[    2.535221] r23 : 0000000000000003 r24 : 0000000000000015 r25 : 0000000000000015
[    2.535221] r26 : 0000000000000800 r27 : 0000000000000c00 r28 : a000000100f1f0e8
[    2.535221] r29 : a00000010003a5e0 r30 : a00000010011c870 r31 : 0000000000000081
[    2.535221]
[    2.535221] Call Trace:
[    2.535221]  [<a000000100013570>] show_stack+0x40/0x90
[    2.535221]                                 sp=e000000f8363f870 bsp=e000000f83639838
[    2.535221]  [<a000000100013f80>] show_regs+0x9c0/0x9f0
[    2.535221]                                 sp=e000000f8363fa40 bsp=e000000f836397d0
[    2.535221]  [<a000000100024530>] die+0x1a0/0x2f0
[    2.535221]                                 sp=e000000f8363fa60 bsp=e000000f83639790
[    3.750837] random: crng init done
[    2.535221]  [<a00000010004bb80>] ia64_do_page_fault+0x830/0x9d0
[    2.535221]                                 sp=e000000f8363fa60 bsp=e000000f836396f8
[    2.535221]  [<a00000010000c460>] ia64_leave_kernel+0x0/0x270
[    2.535221]                                 sp=e000000f8363faf0 bsp=e000000f836396f8
[    2.535221]  [<a00000010019acc0>] local_memory_node+0x50/0xd0
[    2.535221]                                 sp=e000000f8363fcc0 bsp=e000000f836396c0
[    2.535221]  [<a00000010011aab0>] dma_direct_alloc+0x110/0x280
[    2.535221]                                 sp=e000000f8363fcc0 bsp=e000000f83639648
[    2.535221]  [<a00000010011c8c0>] swiotlb_alloc+0x50/0x2a0
[    2.535221]                                 sp=e000000f8363fcc0 bsp=e000000f836395e8
[    2.535221]  [<a0000001007b4550>] hpsa_init_one+0x25f0/0x4670
[    2.535221]                                 sp=e000000f8363fcc0 bsp=e000000f83639320
[    2.535221]  [<a00000010056f6f0>] local_pci_probe+0x90/0x150
[    2.535221]                                 sp=e000000f8363fd40 bsp=e000000f836392e0
[    2.535221]  [<a000000100570dc0>] pci_device_probe+0x300/0x320
[    2.535221]                                 sp=e000000f8363fd40 bsp=e000000f836392a8
[    2.535221]  [<a0000001006c7a70>] really_probe+0x480/0x680
[    2.535221]                                 sp=e000000f8363fd60 bsp=e000000f83639240
[    2.535221]  [<a0000001006c8130>] driver_probe_device+0x1e0/0x1f0
[    2.535221]                                 sp=e000000f8363fd60 bsp=e000000f83639208
[    2.535221]  [<a0000001006c82d0>] __driver_attach+0x190/0x230
[    2.535221]                                 sp=e000000f8363fd60 bsp=e000000f836391d0
[    2.535221]  [<a0000001006c3950>] bus_for_each_dev+0xd0/0x130
[    2.535221]                                 sp=e000000f8363fd60 bsp=e000000f83639190
[    2.535221]  [<a0000001006c6ba0>] driver_attach+0x40/0x60
[    2.535221]                                 sp=e000000f8363fd70 bsp=e000000f83639170
[    2.535221]  [<a0000001006c60a0>] bus_add_driver+0x400/0x4a0
[    2.535221]                                 sp=e000000f8363fd70 bsp=e000000f83639120
[    2.535221]  [<a0000001006c9600>] driver_register+0x220/0x2b0
[    2.535221]                                 sp=e000000f8363fd70 bsp=e000000f836390f8
[    2.535221]  [<a00000010056ef80>] __pci_register_driver+0xa0/0xc0
[    2.535221]                                 sp=e000000f8363fd70 bsp=e000000f836390c8
[    2.535221]  [<a000000100f930b0>] hpsa_init+0x80/0xc0
[    2.535221]                                 sp=e000000f8363fd70 bsp=e000000f836390a0
[    2.535221]  [<a00000010000a6a0>] do_one_initcall+0x100/0x2d0
[    2.535221]                                 sp=e000000f8363fd70 bsp=e000000f83639068
[    2.535221]  [<a000000100f49c60>] kernel_init_freeable+0x5c0/0x5d0
[    2.535221]                                 sp=e000000f8363fdb0 bsp=e000000f83639000
[    2.535221]  [<a000000100c6a880>] kernel_init+0x20/0x280
[    2.535221]                                 sp=e000000f8363fe30 bsp=e000000f83638fd8
[    2.535221]  [<a00000010000c250>] call_payload+0x50/0x80
[    2.535221]                                 sp=e000000f8363fe30 bsp=e000000f83638fc0
[    2.535221] Disabling lock debugging due to kernel taint
[    5.711378] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[    5.711378] 
[    5.766837] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[    5.766837]  ]---

Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaubitz@debian.org
`. `'   Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Regression in 543cea9a - was: Re: Kernel problem on rx2800 i2
  2019-06-25  6:40 Regression in 543cea9a - was: Re: Kernel problem on rx2800 i2 John Paul Adrian Glaubitz
                   ` (23 preceding siblings ...)
  2020-08-05 19:43 ` John Paul Adrian Glaubitz
@ 2020-08-05 20:27 ` Jessica Clarke
  2020-08-05 21:27 ` John Paul Adrian Glaubitz
                   ` (3 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: Jessica Clarke @ 2020-08-05 20:27 UTC (permalink / raw)
  To: linux-ia64

On 5 Aug 2020, at 20:43, John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> wrote:
> 
> Hi Christoph!
> 
> On 8/5/19 9:10 AM, Christoph Hellwig wrote:
>> Seems like we dropped the ball on this..
>> 
>> Did I give you a patch like this (for 5.2 and probably earlier, won't
>> apply to 5.3-rc) to test before as that is another idea?
>> 
>> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
>> index 2c2772e9702a..e471158c7c6e 100644
>> --- a/kernel/dma/direct.c
>> +++ b/kernel/dma/direct.c
>> @@ -119,7 +119,8 @@ struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
>> 		}
>> 	}
>> 	if (!page)
>> -		page = alloc_pages_node(dev_to_node(dev), gfp, page_order);
>> +		page = alloc_pages_node(local_memory_node(dev_to_node(dev)),
>> +				gfp, page_order);
>> 
>> 	if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
>> 		__free_pages(page, page_order);
> 
> I just applied this patch on top of 4.19.137 and it's crashing with this trace
> when trying to load the "hpsa" module. It definitely looks like an issue with
> dma_direct_alloc():

My guess is dev_to_node gave NUMA_NO_NODE for a random PCI device. Try:

    int nid = dev_to_node(dev);
    if (nid >= 0)
        nid = local_memory_node(nid);

and then pass nid to alloc_pages_node instead.
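
For illustration only, the guard can be modelled in plain userspace C. This
is a rough sketch, not kernel code: mock_local_memory_node(), MAX_NODES and
the nearest_memory_node[] table are made-up stand-ins, and the only thing
taken from the kernel is that NUMA_NO_NODE is -1. It shows the intent of
the check (never hand a negative node id to the lookup), not that this is
the actual cause of the oops.

```c
#include <assert.h>

#define NUMA_NO_NODE (-1)  /* same value as in the kernel */
#define MAX_NODES 4        /* hypothetical node count for this mock */

/* Hypothetical table: nearest node that has memory, per node id. */
static const int nearest_memory_node[MAX_NODES] = { 0, 0, 2, 2 };

/* Stand-in for local_memory_node(); the assert models "must not be
 * called with an invalid node id". */
static int mock_local_memory_node(int nid)
{
	assert(nid >= 0 && nid < MAX_NODES);
	return nearest_memory_node[nid];
}

/* The suggested guard: only translate real node ids, and pass
 * NUMA_NO_NODE through untouched for the allocator to handle. */
static int pick_alloc_node(int dev_nid)
{
	if (dev_nid >= 0)
		return mock_local_memory_node(dev_nid);
	return dev_nid;
}
```

With this mock, a device without node affinity keeps NUMA_NO_NODE, while a
device on memoryless node 1 is redirected to node 0.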

Jess

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Regression in 543cea9a - was: Re: Kernel problem on rx2800 i2
  2019-06-25  6:40 Regression in 543cea9a - was: Re: Kernel problem on rx2800 i2 John Paul Adrian Glaubitz
                   ` (24 preceding siblings ...)
  2020-08-05 20:27 ` Jessica Clarke
@ 2020-08-05 21:27 ` John Paul Adrian Glaubitz
  2020-08-05 22:41 ` John Paul Adrian Glaubitz
                   ` (2 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: John Paul Adrian Glaubitz @ 2020-08-05 21:27 UTC (permalink / raw)
  To: linux-ia64

On 8/5/20 10:27 PM, Jessica Clarke wrote:
> My guess is dev_to_node gave NUMA_NO_NODE for a random PCI device. Try:
> 
>     int nid = dev_to_node(dev);
>     if (nid >= 0)
>         nid = local_memory_node(nid);
> 
> and then pass nid to alloc_pages_node instead.

I tried:

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 1d2f147f737d..3efa759bbb3e 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -85,8 +85,13 @@ void *dma_direct_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
                        page = NULL;
                }
        }
+
+       int nid = dev_to_node(dev);
+       if (nid >= 0)
+               nid = local_memory_node(nid);
+
        if (!page)
-               page = alloc_pages_node(dev_to_node(dev), gfp, page_order);
+               page = alloc_pages_node(nid, gfp, page_order);
 
        if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
                __free_pages(page, page_order);

with the same result. That's on 4.19.137. Can try a newer kernel tomorrow.

[    2.319109] Detecting Adaptec I2O RAID controllers...                                                                                                                    
[    2.351357] HP HPSA Driver (v 3.4.20-125)                                                                                                                                
[    2.392710] hpsa 0000:02:00.0: Logical aborts not supported                                                                                                              
[    2.427375] hpsa 0000:02:00.0: HP SSD Smart Path aborts not supported                                                                                                    
[    2.492710] Unable to handle kernel NULL pointer dereference (address 0000000000001688)                                                                                  
[    2.543228] swapper/0[1]: Oops 11012296146944 [1]                                                                                                                        
[    2.543228] Modules linked in:                                                                                                                                           
[    2.543228]                                                                                                                                                              
[    2.543228] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.19.137-dirty #8                                                                                                  
[    2.543228] Hardware name: hp Integrity BL870c i4 nPar, BIOS 02.64 03/03/2016                                                                                            
[    2.543228] psr : 00001210084a6010 ifs : 8000000000000207 ip  : [<a00000010019acf1>]    Not tainted (4.19.137-dirty)                                                     
[    2.543228] ip is at local_memory_node+0x51/0xd0                                                                                                                         
[    2.543228] unat: 0000000000000000 pfs : 0000000000000793 rsc : 0000000000000003                                                                                         
[    2.543228] rnat: c00000005805cc60 bsps: 0000000000000000 pr  : a6a6aa956aaa9959                                                                                         
[    2.543228] ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70433f                                                                                         
[    2.543228] csd : 0000000000000000 ssd : 0000000000000000                                                                                                                
[    2.543228] b0  : a00000010011aad0 b6  : a00000010011c8a0 b7  : a00000010003a5e0                                                                                         
[    2.543228] f6  : 10012bffff00000000000 f7  : 1003e00000000000bffff                                                                                                      
[    2.543228] f8  : 1003e0000000000003f00 f9  : 1003e0000000000000054                                                                                                      
[    2.543228] f10 : 1003e00000000000000c0 f11 : 1003e0000000000003f00                                                                                                      
[    2.543228] r1  : a0000001013dd060 r2  : e000000001919980 r3  : e000000001919988                                                                                         
[    2.543228] r8  : 0000000000000008 r9  : e000000001919990 r10 : 0000000000000000                                                                                         
[    2.543228] r11 : 0000000000001688 r12 : e000000f8363fcc0 r13 : e000000f83638000                                                                                         
[    2.543228] r14 : fffffffffffc04b8 r15 : 00000000ffffffff r16 : 0000000000ffffff                                                                                         
[    2.543228] r17 : 0000000000000000 r18 : 0000000000000000 r19 : 0000000000000000                                                                                         
[    2.543228] r20 : fffffffffffcdd50 r21 : 0000000000000010 r22 : e00000000110dd50                                                                                         
[    2.543228] r23 : 0000000000000003 r24 : 0000000000480220 r25 : 0000000000000028                                                                                         
[    2.543228] r26 : 0000000000000800 r27 : 0000000000000c00 r28 : a000000100f1f0e8                                                                                         
[    2.543228] r29 : a00000010003a5e0 r30 : a00000010011c8a0 r31 : 0000000000000081                                                                                         
[    2.543228]                                                                                                                                                              
[    2.543228] Call Trace:                                                                                                                                                  
[    2.543228]  [<a000000100013570>] show_stack+0x40/0x90                                                                                                                   
[    2.543228]                                 sp=e000000f8363f870 bsp=e000000f83639838
[    2.543228]  [<a000000100013f80>] show_regs+0x9c0/0x9f0
[    2.543228]                                 sp=e000000f8363fa40 bsp=e000000f836397d0
[    3.747675] random: crng init done
[    2.543228]  [<a000000100024530>] die+0x1a0/0x2f0
[    2.543228]                                 sp=e000000f8363fa60 bsp=e000000f83639790
[    2.543228]  [<a00000010004bb80>] ia64_do_page_fault+0x830/0x9d0
[    2.543228]                                 sp=e000000f8363fa60 bsp=e000000f836396f8
[    2.543228]  [<a00000010000c460>] ia64_leave_kernel+0x0/0x270
[    2.543228]                                 sp=e000000f8363faf0 bsp=e000000f836396f8
[    2.543228]  [<a00000010019acf0>] local_memory_node+0x50/0xd0
[    2.543228]                                 sp=e000000f8363fcc0 bsp=e000000f836396c0
[    2.543228]  [<a00000010011aad0>] dma_direct_alloc+0x130/0x2b0
[    2.543228]                                 sp=e000000f8363fcc0 bsp=e000000f83639648
[    2.543228]  [<a00000010011c8f0>] swiotlb_alloc+0x50/0x2a0
[    2.543228]                                 sp=e000000f8363fcc0 bsp=e000000f836395e8
[    2.543228]  [<a0000001007b4580>] hpsa_init_one+0x25f0/0x4670
[    2.543228]                                 sp=e000000f8363fcc0 bsp=e000000f83639320
[    2.543228]  [<a00000010056f720>] local_pci_probe+0x90/0x150
[    2.543228]                                 sp=e000000f8363fd40 bsp=e000000f836392e0
[    2.543228]  [<a000000100570df0>] pci_device_probe+0x300/0x320
[    2.543228]                                 sp=e000000f8363fd40 bsp=e000000f836392a8
[    2.543228]  [<a0000001006c7aa0>] really_probe+0x480/0x680
[    2.543228]                                 sp=e000000f8363fd60 bsp=e000000f83639240
[    2.543228]  [<a0000001006c8160>] driver_probe_device+0x1e0/0x1f0
[    2.543228]                                 sp=e000000f8363fd60 bsp=e000000f83639208
[    2.543228]  [<a0000001006c8300>] __driver_attach+0x190/0x230
[    2.543228]                                 sp=e000000f8363fd60 bsp=e000000f836391d0
[    2.543228]  [<a0000001006c3980>] bus_for_each_dev+0xd0/0x130
[    2.543228]                                 sp=e000000f8363fd60 bsp=e000000f83639190
[    2.543228]  [<a0000001006c6bd0>] driver_attach+0x40/0x60
[    2.543228]                                 sp=e000000f8363fd70 bsp=e000000f83639170
[    2.543228]  [<a0000001006c60d0>] bus_add_driver+0x400/0x4a0
[    2.543228]                                 sp=e000000f8363fd70 bsp=e000000f83639120
[    2.543228]  [<a0000001006c9630>] driver_register+0x220/0x2b0
[    2.543228]                                 sp=e000000f8363fd70 bsp=e000000f836390f8
[    2.543228]  [<a00000010056efb0>] __pci_register_driver+0xa0/0xc0
[    2.543228]                                 sp=e000000f8363fd70 bsp=e000000f836390c8
[    2.543228]  [<a000000100f930b0>] hpsa_init+0x80/0xc0
[    2.543228]                                 sp=e000000f8363fd70 bsp=e000000f836390a0
[    2.543228]  [<a00000010000a6a0>] do_one_initcall+0x100/0x2d0
[    2.543228]                                 sp=e000000f8363fd70 bsp=e000000f83639068
[    2.543228]  [<a000000100f49c60>] kernel_init_freeable+0x5c0/0x5d0
[    2.543228]                                 sp=e000000f8363fdb0 bsp=e000000f83639000
[    2.543228]  [<a000000100c6a8c0>] kernel_init+0x20/0x280
[    2.543228]                                 sp=e000000f8363fe30 bsp=e000000f83638fd8
[    2.543228]  [<a00000010000c250>] call_payload+0x50/0x80
[    2.543228]                                 sp=e000000f8363fe30 bsp=e000000f83638fc0
[    2.543228] Disabling lock debugging due to kernel taint
[    5.750115] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[    5.750115] 
[    5.750115] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[    5.750115]  ]---
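
A note on reading the oops: the fault address 0x1688 (also visible in r11) is
small and nonzero, which is the typical signature of reading a struct field
through a NULL base pointer. One possible reading, not confirmed anywhere in
this thread, is that the per-node data for the chosen nid is NULL even though
nid >= 0. The sketch below only demonstrates the arithmetic; struct fake_pgdat
and its padding are hypothetical and chosen so the field lands at 0x1688. It is
not the real pg_data_t layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout: the padding stands in for whatever members come
 * before the field being read. NOT the kernel's pg_data_t.
 */
struct fake_pgdat {
	char earlier_fields[0x1688];
	void *zonelists;
};

static_assert(offsetof(struct fake_pgdat, zonelists) == 0x1688,
	      "field offset chosen to match the fault address in the oops");

/*
 * Address a load of base->zonelists would touch, computed as plain
 * integer arithmetic so that nothing is actually dereferenced.
 */
uintptr_t field_address(uintptr_t base)
{
	return base + offsetof(struct fake_pgdat, zonelists);
}
```

field_address(0) gives 0x1688, so a NULL base plus that field offset would
produce exactly the address reported in the "NULL pointer dereference" line.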

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaubitz@debian.org
`. `'   Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: Regression in 543cea9a - was: Re: Kernel problem on rx2800 i2
  2019-06-25  6:40 Regression in 543cea9a - was: Re: Kernel problem on rx2800 i2 John Paul Adrian Glaubitz
                   ` (25 preceding siblings ...)
  2020-08-05 21:27 ` John Paul Adrian Glaubitz
@ 2020-08-05 22:41 ` John Paul Adrian Glaubitz
  2021-03-23 15:02 ` John Paul Adrian Glaubitz
  2021-03-23 15:14 ` Frank Scheiner
  28 siblings, 0 replies; 30+ messages in thread
From: John Paul Adrian Glaubitz @ 2020-08-05 22:41 UTC (permalink / raw)
  To: linux-ia64

On 8/5/20 11:27 PM, John Paul Adrian Glaubitz wrote:
> with the same result. That's on 4.19.137. Can try a newer kernel tomorrow.

Looking at the change [1], I noticed that the ia64-specific "ia64_swiotlb_alloc_coherent()"
contained an additional check whether the DMA_BIT_MASK is 64 bits, so I added that back into
swiotlb_alloc():

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 2a8c41f12d45..e51654180189 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -1018,6 +1018,9 @@ void *swiotlb_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
        if (gfp & __GFP_NOWARN)
                attrs |= DMA_ATTR_NO_WARN;
 
+       if (dev->coherent_dma_mask != DMA_BIT_MASK(64))
+               gfp |= GFP_DMA32;
+
        /*
         * Don't print a warning when the first allocation attempt fails.
         * swiotlb_alloc_coherent() will print a warning when the DMA memory

No success, unfortunately, even though this additional check was the only
ia64-specific part of the old code. But I assume the problem is also that
swiotlb_alloc_coherent() got replaced by swiotlb_alloc()?
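
For reference, the effect of the re-added check can be shown in isolation. This
is a userspace sketch: DMA_BIT_MASK() is written the way the kernel defines it,
but FAKE_GFP_DMA32 is a placeholder bit (not the kernel's real GFP_DMA32 value)
and adjust_gfp() is a hypothetical helper name used only here.

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as the kernel's DMA_BIT_MASK(): n low bits set, all 64 for n == 64 */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Placeholder flag bit for illustration; NOT the real GFP_DMA32 value */
#define FAKE_GFP_DMA32 0x4u

static_assert(DMA_BIT_MASK(32) == 0xffffffffULL, "32-bit mask has 32 low bits set");

/*
 * Mirror of the check in the diff above: any device whose coherent mask
 * is narrower than 64 bits gets its allocation steered to low memory.
 */
unsigned int adjust_gfp(uint64_t coherent_dma_mask, unsigned int gfp)
{
	if (coherent_dma_mask != DMA_BIT_MASK(64))
		gfp |= FAKE_GFP_DMA32;
	return gfp;
}
```

A device with a full 64-bit mask keeps its flags unchanged, while a device with
a 32-bit mask gets the low-memory flag added. If the affected controller
advertises a 64-bit coherent mask, this check would not change anything for it,
which is one way the patch could be a no-op on these machines.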

Adrian

> [1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=543cea9accd9804307541cb93d3ed7ec94b07237

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaubitz@debian.org
`. `'   Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: Regression in 543cea9a - was: Re: Kernel problem on rx2800 i2
  2019-06-25  6:40 Regression in 543cea9a - was: Re: Kernel problem on rx2800 i2 John Paul Adrian Glaubitz
                   ` (26 preceding siblings ...)
  2020-08-05 22:41 ` John Paul Adrian Glaubitz
@ 2021-03-23 15:02 ` John Paul Adrian Glaubitz
  2021-03-23 15:14 ` Frank Scheiner
  28 siblings, 0 replies; 30+ messages in thread
From: John Paul Adrian Glaubitz @ 2021-03-23 15:02 UTC (permalink / raw)
  To: linux-ia64

Hi!

On 6/25/19 8:40 AM, John Paul Adrian Glaubitz wrote:
> On 6/21/19 10:08 PM, Frank Scheiner wrote:
>> recent testing of a Debian v4.19.37 kernel showed a problem on my rx2800
>> i2 happening during kernel boot:
>> (...)
>> [1]:
>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=543cea9accd9804307541cb93d3ed7ec94b07237
> 
> Do you have any idea what could be the reason for the issue introduced
> by your above commit? James Clarke has guess that it might be GFP_DMA32
> which isn't being set properly anymore for the affected machines.
> 
> Do you think we could test a kernel which just sets the flag unconditionally
> to see whether this is the problem that causes the issues on these machines?

Just as a heads-up: This issue has magically fixed itself and a current kernel
with some additional minor fixes applied boots fine on these machines again [1].

Adrian

> [1] https://marc.info/?l=linux-ia64&m=161651097316856&w=2

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaubitz@debian.org
`. `'   Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Regression in 543cea9a - was: Re: Kernel problem on rx2800 i2
  2019-06-25  6:40 Regression in 543cea9a - was: Re: Kernel problem on rx2800 i2 John Paul Adrian Glaubitz
                   ` (27 preceding siblings ...)
  2021-03-23 15:02 ` John Paul Adrian Glaubitz
@ 2021-03-23 15:14 ` Frank Scheiner
  28 siblings, 0 replies; 30+ messages in thread
From: Frank Scheiner @ 2021-03-23 15:14 UTC (permalink / raw)
  To: linux-ia64

Hi Adrian,

On 23.03.21 16:02, John Paul Adrian Glaubitz wrote:
> Hi!
>
> On 6/25/19 8:40 AM, John Paul Adrian Glaubitz wrote:
>> On 6/21/19 10:08 PM, Frank Scheiner wrote:
>>> recent testing of a Debian v4.19.37 kernel showed a problem on my rx2800
>>> i2 happening during kernel boot:
>>> (...)
>>> [1]:
>>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=543cea9accd9804307541cb93d3ed7ec94b07237
>>
>> Do you have any idea what could be the reason for the issue introduced
>> by your above commit? James Clarke has guess that it might be GFP_DMA32
>> which isn't being set properly anymore for the affected machines.
>>
>> Do you think we could test a kernel which just sets the flag unconditionally
>> to see whether this is the problem that causes the issues on these machines?
>
> Just as a heads-up: This issue has magically fixed itself and a current kernel
> with some additional minor fixes applied boots fine on these machines again [1].

Thanks for the pointer, I already noticed your email to the debian-ia64
list some minutes ago. That's great news! :-)

If time allows today I might give it a try in my rx2800 i2, which ATM
just sits a meter away from me, but I'm still testing kernels on one of
my V245 machines.

Cheers,
Frank

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2021-03-23 15:14 UTC | newest]

Thread overview: 30+ messages
2019-06-25  6:40 Regression in 543cea9a - was: Re: Kernel problem on rx2800 i2 John Paul Adrian Glaubitz
2019-06-25  6:42 ` Christoph Hellwig
2019-06-25  6:46 ` John Paul Adrian Glaubitz
2019-06-25  6:50 ` Christoph Hellwig
2019-06-25  6:54 ` John Paul Adrian Glaubitz
2019-06-25  6:59 ` Christoph Hellwig
2019-06-25  7:26 ` Frank Scheiner
2019-06-25  8:16 ` Frank Scheiner
2019-06-25  8:18 ` Christoph Hellwig
2019-06-25  8:38 ` Frank Scheiner
2019-06-25  9:30 ` Frank Scheiner
2019-06-25 10:32 ` Christoph Hellwig
2019-06-25 10:46 ` Frank Scheiner
2019-06-25 10:47 ` Christoph Hellwig
2019-06-25 11:19 ` Frank Scheiner
2019-06-25 11:21 ` John Paul Adrian Glaubitz
2019-06-25 12:00 ` Christoph Hellwig
2019-06-25 12:08 ` Frank Scheiner
2019-06-25 14:40 ` Christoph Hellwig
2019-06-25 15:52 ` Frank Scheiner
2019-06-28  6:26 ` Christoph Hellwig
2019-06-28  7:35 ` Frank Scheiner
2019-08-05  7:10 ` Christoph Hellwig
2019-08-05  8:16 ` Frank Scheiner
2020-08-05 19:43 ` John Paul Adrian Glaubitz
2020-08-05 20:27 ` Jessica Clarke
2020-08-05 21:27 ` John Paul Adrian Glaubitz
2020-08-05 22:41 ` John Paul Adrian Glaubitz
2021-03-23 15:02 ` John Paul Adrian Glaubitz
2021-03-23 15:14 ` Frank Scheiner
